Month: July 2016

Apache Camel – Throughput or Performance Improvement while writing into files

Hi,

 

I bet that the example that i am going to share with you in this article is going to help you a lot.

 

A very common use case for a process would be to read from a file or to write into a file. If you are using a camel route for this purpose, you would notice that the file reading works perfectly fine but the file writing will take longer than what you’ve expected. Well that’s because when using the file component the IO operation is very expensive and in a production environment this means your application is expensive for the admin. A simple file writing process to generate a file with 0.1 million records would easily take around 10-15 mins.

In this example, i will try to explain the performance improvement of file writing by comparison technique. The comparison is between camel file component v/s camel aggregator v/s camel stream and finally we will which one is a best fit for performance improvement.

Let’s put these three options into our routes:


<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:camel="http://camel.apache.org/schema/spring"
xsi:schemaLocation="
http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans.xsd
http://camel.apache.org/schema/spring
http://camel.apache.org/schema/spring/camel-spring.xsd">

<bean id="shutdownInitiator" class="foo.beans.ShutdownInitiator" />
<bean id="counter" class="foo.beans.CounterIncrementer" />
<bean id="fileAggregationStrategy" class="foo.aggregator.FileAggregationStrategy" />

<camelContext xmlns="http://camel.apache.org/schema/spring">

<!-- Camel file writing using regular file component -->
<camel:route id="routeUsingFileComponent" autoStartup="false">
<camel:from uri="file:src/txt?fileName=LongFile1.txt&amp;noop=true" />
<camel:convertBodyTo type="java.lang.String" />
<camel:log message="${date:now:yyyy-MM-dd HH:mm:ss} File writing started..." />
<camel:split>
<camel:tokenize token="\r\n" />
<camel:bean ref="counter" />
<camel:to uri="file:src/txt1?fileName=ProcessedRecs.txt&amp;autoCreate=true&amp;fileExist=Append" />
</camel:split>
<camel:log message="${date:now:yyyy-MM-dd HH:mm:ss} File writing finished..." />
<camel:bean ref="shutdownInitiator" />
</camel:route>

<!-- camel route using aggregator -->
<camel:route id="routeUsingAggregator" autoStartup="false">
<camel:from uri="file:src/txt?fileName=LongFile1.txt&amp;noop=true" />
<camel:convertBodyTo type="java.lang.String" />
<camel:log message="${date:now:yyyy-MM-dd HH:mm:ss} File writing started..." />
<camel:split>
<camel:tokenize token="\r\n" />
<camel:bean ref="counter" />
<camel:aggregate strategyRef="fileAggregationStrategy" completionInterval="3000">
<camel:correlationExpression>
<camel:simple>${header.CamelFileName}</camel:simple>
</camel:correlationExpression>
<camel:to uri="file:src/txt1?fileName=ProcessedRecs.txt&amp;autoCreate=true&amp;fileExist=Append" />
</camel:aggregate>
</camel:split>
<camel:log message="${date:now:yyyy-MM-dd HH:mm:ss} File writing finished..." />
<camel:bean ref="shutdownInitiator" />
</camel:route>

<!-- camel route using camel stream -->
<camel:route id="routeUsingStream" autoStartup="false">
<camel:from uri="file:src/txt?fileName=LongFile1.txt&amp;noop=true" />
<camel:convertBodyTo type="java.lang.String" />
<camel:log message="${date:now:yyyy-MM-dd HH:mm:ss} File writing started..." />
<camel:split>
<camel:tokenize token="\r\n" />
<camel:bean ref="counter" />
<camel:to uri="stream:file?fileName=src/txt1/ProcessedRecs.txt" />
</camel:split>
<camel:log message="${date:now:yyyy-MM-dd HH:mm:ss} File writing finished..." />
<camel:bean ref="shutdownInitiator" />
</camel:route>
</camelContext>
</beans>
  1. The logic inside bean “counter”is simple. For every single record sent by the splitter EIP, it will prepend the date time to the string.
  2. The File aggregation strategy simply combines the old and new exchange.
  3. Finally the bean “shutdownInitiator” is a fancy way to tell my main class that the camel route execution is completed.

And the main class is a stand alone java code that will load my camel context and start the route i want to.


package foo;

import org.apache.camel.CamelContext;
import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

import foo.util.AppConstants;

public class MainApp {

public static void main(String[] args) throws Exception {
 ApplicationContext springContext = new ClassPathXmlApplicationContext("META-INF/spring/camel-context.xml");
 CamelContext context = springContext.getBean(CamelContext.class);

 context.start();
context.startRoute("routeUsingFileComponent");

 while(!AppConstants.shutdown) 
 Thread.sleep(1000);

 System.out.println("shutting down camel...");
 context.stop();

}

}

I have loaded all the routes but instructed the container not to start them. I will be executing the main class 3 times and each time i will starting a different route. The input file (LongFile1.txt) contains the 0.1 million lines.

 

So, here are the results from the log messages:

When the route “routeUsingFileComponent” is started, it took  28 mins (whoa..)

routeUsingFileComponent INFO 2016-07-10 02:09:50 File writing started…
routeUsingFileComponent INFO 2016-07-10 02:37:08 File writing finished…
shutting down camel…

Using the route “routeUsingAggregator”,  it took 46 seconds.

routeUsingAggregator INFO 2016-07-10 02:02:59 File writing started…
routeUsingAggregator INFO 2016-07-10 02:03:45 File writing finished…
shutting down camel…

And finally using the route “routeUsingStream”, it took 17 seconds.

routeUsingStream INFO 2016-07-10 02:07:01 File writing started…
routeUsingStream INFO 2016-07-10 02:07:18 File writing finished…
shutting down camel…

 

Finally the winner is: camel stream component which took around 17 seconds.

 

By fine tuning the completion interval for the aggregator in “routeUsingAggreator” we can still improve the performance but the point here is camel stream would do just fine without the need for an aggregator (you will have to provide an aggregation strategy, correlation expression, completion predicate)

Sure camel stream component is the winner but there are few disadvantages with this. Camel file component provides you a lot of options when writing into the files like force writing, setting permission on the file, etc which are not handy when it comes to the stream component. Finally it boils down to performance v/s integrity and of course the decision is always business driven.

 

References:

http://camel.apache.org/file2.html

http://camel.apache.org/stream.html

http://fabian-kostadinov.github.io/2016/01/10/reading-from-and-writing-to-files-in-apache-camel/

http://www.catify.com/2012/07/09/parsing-large-files-with-apache-camel/

 

Thanks,

Kalyan

 

Apache Camel Interview Questions – Basics Part2

Hi,

 

This is continuation of my previous article.

 

Q) How many Messages can exist in an exchange?

A) Only one message.

Q) What are global and local routes in camel?

A) A local rule is a rule written inside <route> tags (in case of spring DSL). A global rule starts with a special processor type such as intercept(), exception(), or errorHandler().

A local rule always starts by defining a consumer endpoint, using from(“endPointURI”), and typically (but not always) ends by defining a producer endpoint using to(“endPointURI”).

 

Q) What is an event-driven consumer and a polling consumer?

A) “direct” component is an example of an event-driven consumer and file, ftp, etc are examples of a polling consumer. An event driven consumer responds for an event(action) whereas a polling consumer will keep on polling the source location for any messages.

Q) What are the message exchange patterns provided in camel?

A) The exchange object holds the message object which has a property attached to it called Message Exchange Pattern also referred to as MEP. To the end user, the exchange pattern is simply a way to identify the type of exchange object that we are dealing with.

InOnly (Fire and forget)

InOut (Request – Response)

Q) If we have multiple routes defined, what is the order in which these routes are  executed?

A) Camel routes are always executed in the order they are defined. Though this can be controlled by passing the attribute “startupOrder” to the camel:route tag in the spring.

Q) How do we start the execution of camel routes?

A) The execution can be started by loading the object of camel context (camel runtime or container) that contains all the routes. If it is a standalone application, an additional step of starting the camel context is required.

Q) How can we start and stop a single route at runtime?

A) camel context provides overloaded versions of the methods startRoute() and stopRoute() that can be used to start and stop routes at run time.

Q) Can we have two routes to read from the same file?

A) No. Camel does not allow to have more than one consumer endpoint trying to read from the same source location.

Q) Can we print the headers or properties of an exchange object in the camel route?

A) Yes. Camel supports various languages to write expressions and predicates in your routes. Some of the supported languages are Spel, XPath, XQuery, Tokenizer, etc. We can use Simple file language (simple) and spring expression language (Spel) in the route DSL to print the exchange information.

 

References:

http://camel.apache.org/architecture.html

http://bigdataconsultants.blogspot.in/2015/02/what-is-apache-camel.html

 

Thanks,

Kalyan