An aspiring software craftsman journey, By Mahmoud Ben Hassine
Disclaimer: I am the author of Easy Batch and, for those who don’t know me, I tend to be a constructive person.
In this post, I will objectively compare the Easy Batch and Spring Batch frameworks in terms of features, ease of use and developer productivity. The goal is not to say that Easy Batch is better than Spring Batch or vice versa; it is to show in which situations one framework is a better fit than the other.
First of all, let me say that I am a VERY big fan of the Spring Framework and all related projects (Spring Data, Spring Batch, etc.). Spring Batch is an awesome framework, really! Entire books have been written about Spring Batch thanks to the very rich feature set it offers. Nevertheless, in my honest opinion, Spring Batch (and JSR 352) has some shortcomings. Before giving my own point of view, I tried to collect some feedback from Spring Batch users on the net:
"I got a little overwhelmed by the complexity and amount of configuration needed for even a simple example"
"What should we think of the Spring Batch solution? Complex. Obviously, it looks more complicated than the simple approaches. This is typical of a framework: the learning curve is steeper."
"For individuals with no prior Spring Batch knowledge, implementing Spring Batch and creating a functioning job may take as little as a couple workdays. After the initial setup, building fairly complex batch jobs can take around another day"
"Spring Batch application grows pretty quick and involves configuring a lot of stuff that, at the outset, it just doesn't seem like you should need to configure. A "job repository" to track the status and history of job executions, which itself requires a data source - just to get started? Wow, that's a bit heavy handed"
"I recently evaluated Spring Batch, and quickly rejected it once I realized that it added nothing to my project aside from bloat and overhead"
"Verbose configuration. I differ with the choice of the defaults. I would have gone with a non-persistent job repository and a resourceless transaction manager. But then they chose the defaults that showcase their salient features"
"You have to configure the component that launches a batch, the 'jobLauncher'. Simple, but you can see that it needs a 'jobRepository', which makes it possible to track and resume the progress of tasks. You can also see that it needs a transaction manager. This property is mandatory, which in my view is a pity for simple cases like ours where we don't use transactions."
"Spring Batch or How Not to Design an API. Why do I Need a Transaction Manager? Why do I Need a Job Repository?"
Most of these posts are quite recent. A couple of them may seem outdated, but their points still hold for the latest version of Spring Batch (v3.0.3 at the time of writing).
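These reactions from the community can be summarized in 3 points: a steep learning curve, a complex configuration, and the obligation to configure components that are not needed.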
Personally, a steep learning curve is not a problem if it is worth it (and it is for Spring Batch!). A complex configuration is also something I can accept. But the most annoying thing, in my opinion, is being forced to configure components that I don’t need, such as a job repository and a transaction manager.
The Interface Segregation Principle states that a client should not be forced to depend on methods it does not use. With Spring Batch, I am forced to use features that I don’t need, so I believe this is, in a broader sense, a violation of this principle.
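To make the principle concrete, here is a small Java illustration. All type names are hypothetical and do not come from Spring Batch or Easy Batch: a job that only transforms records should depend on nothing more than a `RecordHandler`, whereas a “fat” interface forces it to implement transaction and execution-tracking methods it never calls.

```java
import java.util.Locale;

// Hypothetical names for illustration only; none of these types come from Spring Batch or Easy Batch.

/** A "fat" interface: every implementation must provide all three capabilities. */
interface FatBatchComponent {
    String process(String line);

    void beginTransaction();

    void saveExecutionState(String jobName, String status);
}

/** Segregated interfaces: each client depends only on what it actually uses. */
interface RecordHandler {
    String process(String line);
}

interface TransactionAware {
    void beginTransaction();
}

interface ExecutionTracker {
    void saveExecutionState(String jobName, String status);
}

/** A simple in-memory job only needs RecordHandler. */
class UpperCaseHandler implements RecordHandler {
    @Override
    public String process(String line) {
        return line.toUpperCase(Locale.ROOT);
    }
}

/** Against the fat interface, the same job has to stub out methods it never uses. */
class UpperCaseFatComponent implements FatBatchComponent {
    @Override
    public String process(String line) {
        return line.toUpperCase(Locale.ROOT);
    }

    @Override
    public void beginTransaction() {
        // nothing to do: this job touches no transactional resource
    }

    @Override
    public void saveExecutionState(String jobName, String status) {
        // nothing to do: this job does not need an execution history
    }
}
```

The empty method bodies in `UpperCaseFatComponent` are exactly the symptom the principle warns against; splitting the interface lets each client pick only the capabilities it needs.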
That said, Spring Batch is well suited for use cases where you really need advanced features like retry on failure, remoting, flows, etc. If your batch application does not require such advanced features, using Spring Batch is like using a hammer to kill a fly. In such situations, in-house solutions are usually written from scratch. This is where Easy Batch comes into play, as a lightweight middle ground between Spring Batch and the “Do It Yourself” approach.
In this post, I will implement a “hello world” batch application that processes a CSV file containing tweets, and then try to evaluate the effort needed to configure it with each framework.
The data source is a CSV file containing the following tweets:
The goal is to print each tweet to the console in upper case.
Records should be mapped to the following Tweet domain object:
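A minimal sketch of such a POJO, assuming each record has the layout `id,user,message`. The field names and the `fromCsv` helper are assumptions for illustration only; both frameworks can perform this mapping for you from configuration, so the helper just makes the mapping step visible.

```java
/** Plain POJO: no framework annotations or base class required. Field names are assumptions. */
class Tweet {
    private final int id;
    private final String user;
    private final String message;

    Tweet(int id, String user, String message) {
        this.id = id;
        this.user = user;
        this.message = message;
    }

    /** Hypothetical helper: maps one "id,user,message" line to a Tweet. */
    static Tweet fromCsv(String line) {
        // limit = 3 keeps any commas inside the message in the last field
        String[] fields = line.split(",", 3);
        if (fields.length != 3) {
            throw new IllegalArgumentException("Expected 3 fields but got " + fields.length + ": " + line);
        }
        // Integer.parseInt throws NumberFormatException (an IllegalArgumentException) for a bad id
        return new Tweet(Integer.parseInt(fields[0].trim()), fields[1].trim(), fields[2].trim());
    }

    int getId() {
        return id;
    }

    String getUser() {
        return user;
    }

    String getMessage() {
        return message;
    }

    @Override
    public String toString() {
        return "Tweet{id=" + id + ", user='" + user + "', message='" + message + "'}";
    }
}
```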
With Spring Batch, let’s first create a tweet processor that implements the application’s business logic:
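A sketch of what the Spring Batch processor could look like, under two assumptions: the `Tweet` fields shown here, and that the processor turns the message to upper case while the writer prints it. To keep the example compilable without Spring on the classpath, it declares a local interface with the same shape as Spring Batch’s `ItemProcessor<I, O>` (`O process(I item) throws Exception`).

```java
import java.util.Locale;

/** Local stand-in with the same shape as org.springframework.batch.item.ItemProcessor. */
interface ItemProcessor<I, O> {
    O process(I item) throws Exception;
}

/** Minimal domain object; the field names are assumptions. */
class Tweet {
    final int id;
    final String user;
    final String message;

    Tweet(int id, String user, String message) {
        this.id = id;
        this.user = user;
        this.message = message;
    }
}

/** Business logic: turns each tweet's message into upper case. */
class TweetProcessor implements ItemProcessor<Tweet, Tweet> {
    @Override
    public Tweet process(Tweet tweet) {
        // Returns a new Tweet rather than mutating the input.
        // (In Spring Batch, returning null would filter the item out.)
        return new Tweet(tweet.id, tweet.user, tweet.message.toUpperCase(Locale.ROOT));
    }
}
```

In a real project, you would implement `org.springframework.batch.item.ItemProcessor` directly and drop the local interface.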
Then, let’s create a writer:
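A matching writer sketch, again using a local stand-in with the shape of Spring Batch 3.x’s `ItemWriter<T>` (`void write(List<? extends T> items) throws Exception`); since Spring Batch 5, `write` receives a `Chunk<? extends T>` instead of a `List`. The `ConsoleTweetWriter` name and the injectable `PrintStream` (used so the output can be captured in tests) are assumptions.

```java
import java.io.PrintStream;
import java.util.List;

/** Local stand-in with the same shape as Spring Batch 3.x's org.springframework.batch.item.ItemWriter. */
interface ItemWriter<T> {
    void write(List<? extends T> items) throws Exception;
}

/** Minimal domain object; the field names are assumptions. */
class Tweet {
    final int id;
    final String user;
    final String message;

    Tweet(int id, String user, String message) {
        this.id = id;
        this.user = user;
        this.message = message;
    }
}

/** Prints every tweet of a chunk to the given stream (System.out by default). */
class ConsoleTweetWriter implements ItemWriter<Tweet> {
    private final PrintStream out;

    ConsoleTweetWriter() {
        this(System.out);
    }

    ConsoleTweetWriter(PrintStream out) {
        this.out = out;
    }

    @Override
    public void write(List<? extends Tweet> items) {
        // Called once per chunk with all of the chunk's items, not once per item
        for (Tweet tweet : items) {
            out.println(tweet.message);
        }
    }
}
```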
Finally, let’s configure the application:
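Conceptually, the chunk-oriented step declared in such a configuration reads items one at a time, runs each through the processor and passes the results to the writer in chunks of `commit-interval` items. The sketch below shows only that loop in plain Java; it is not Spring Batch code, the `ChunkLoop.runChunkStep` name is hypothetical, and it deliberately leaves out the job repository, transaction manager and job launcher that the real configuration also has to declare.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

/** Plain-Java illustration of chunk-oriented processing; not Spring Batch code. */
final class ChunkLoop {
    private ChunkLoop() {
    }

    /**
     * Reads up to commitInterval items at a time, processes each of them and hands
     * the processed items of that chunk to the writer. A null result from the processor
     * drops the item, mirroring Spring Batch's filtering convention.
     *
     * @return the number of chunks handed to the writer
     */
    static <I, O> int runChunkStep(Iterator<I> reader,
                                   Function<I, O> processor,
                                   Consumer<List<O>> writer,
                                   int commitInterval) {
        if (commitInterval < 1) {
            throw new IllegalArgumentException("commitInterval must be at least 1");
        }
        int chunksWritten = 0;
        while (reader.hasNext()) {
            // Read and process up to commitInterval items
            List<O> chunk = new ArrayList<>();
            for (int read = 0; read < commitInterval && reader.hasNext(); read++) {
                O processed = processor.apply(reader.next());
                if (processed != null) {
                    chunk.add(processed);
                }
            }
            // Write the chunk as a whole. In Spring Batch, reading, processing and writing
            // a chunk run in one transaction; this sketch has no transactions at all.
            if (!chunk.isEmpty()) {
                writer.accept(chunk);
                chunksWritten++;
            }
        }
        return chunksWritten;
    }
}
```

Everything this sketch omits (restartability, execution history, transactions) is what the job repository and transaction manager provide, which is why Spring Batch asks for them even in simple cases.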
Here is the class to launch the application with Spring Batch:
Note that in this case we have written more lines of configuration than lines of business logic.
With Easy Batch, you usually only have to implement the business logic and let the framework take care of the boilerplate code for reading, parsing and mapping data to domain objects. So let’s create a tweet processor that implements the batch business logic:
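A sketch of an Easy Batch-style processor, where the business logic and the printing live in the same class. The `RecordProcessor` interface below is a hypothetical local stand-in, loosely modeled on Easy Batch’s interface of the same name; the exact signature differs between Easy Batch versions, so check the version you use. The `Tweet` fields are assumptions as well.

```java
import java.io.PrintStream;
import java.util.Locale;

/** Hypothetical local stand-in, loosely modeled on Easy Batch's RecordProcessor (real signature varies by version). */
interface RecordProcessor<I, O> {
    O processRecord(I input) throws Exception;
}

/** Minimal domain object; the field names are assumptions. */
class Tweet {
    final int id;
    final String user;
    final String message;

    Tweet(int id, String user, String message) {
        this.id = id;
        this.user = user;
        this.message = message;
    }
}

/** All the business logic of the job: print each message in upper case. */
class TweetProcessor implements RecordProcessor<Tweet, Tweet> {
    private final PrintStream out;

    TweetProcessor() {
        this(System.out);
    }

    TweetProcessor(PrintStream out) {
        this.out = out;
    }

    @Override
    public Tweet processRecord(Tweet tweet) {
        out.println(tweet.message.toUpperCase(Locale.ROOT));
        return tweet; // pass the record along unchanged
    }
}
```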
Then, let’s create a main method to launch the application with Easy Batch:
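The launcher mainly tells the framework how to read, filter and map the file before handing each record to the processor. The plain-Java sketch below only illustrates that read, header-skip, map and process pipeline under stated assumptions; it does not use the Easy Batch API, and `RecordPipeline`, `TweetLauncher` and the sample input are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Locale;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical sketch of the read -> filter -> map -> process pipeline a batch framework runs for you. */
final class RecordPipeline {
    private RecordPipeline() {
    }

    /** Returns the number of records handed to the processor. */
    static <T> int run(Reader source, boolean skipHeader,
                       Function<String, T> mapper, Consumer<T> processor) throws IOException {
        int processed = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            boolean headerPending = skipHeader;
            while ((line = reader.readLine()) != null) {
                if (headerPending) {           // filter: drop the header line once
                    headerPending = false;
                    continue;
                }
                if (line.trim().isEmpty()) {   // filter: ignore blank lines
                    continue;
                }
                T item = mapper.apply(line);   // map: raw line -> domain object
                processor.accept(item);        // process: business logic
                processed++;
            }
        }
        return processed;
    }
}

class TweetLauncher {
    public static void main(String[] args) throws IOException {
        // Hypothetical sample input with an assumed "id,user,message" header
        String csv = "id,user,message\n1,foo,hello world\n";
        RecordPipeline.run(new StringReader(csv), true,
                (String line) -> line.split(",", 3)[2],   // keep only the message field
                (String message) -> System.out.println(message.toUpperCase(Locale.ROOT)));
    }
}
```

With a framework, the pipeline part is the code you no longer write: only the mapping configuration and the processor remain.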
The Tweet class is common to both scenarios, so it is left out of the comparison. The following table summarizes, for both solutions, the size of the runtime dependencies, the number of lines of code (ignoring imports and empty lines) and the number of lines of XML configuration:
| | Spring Batch | Easy Batch |
|---|---|---|
| Runtime dependencies | 14 jars (4.43 MB) | 2 jars (130 KB) |
| Lines of code | 23 | 19 |
As you can see, even for a simple application, Spring Batch still requires you to configure a lot of technical components that you don’t really need, which is not the case with Easy Batch. Note that even if Spring Batch were configured in Java instead of XML, you would still need roughly the same amount of code to express the XML configuration above.
That said, even if Easy Batch is more lightweight and may be easier to learn, configure and use, this does not make it suitable for all use cases. Here is a side-by-side comparison of the features of both frameworks:
| Feature | Spring Batch | Easy Batch |
|---|---|---|
| POJO-based development | Yes | Yes |
| Real-time monitoring | Yes | Yes |
| Job configuration | Java, XML, annotations | Java |
| Transaction management | Declarative, programmatic | Declarative, programmatic |
| Retry on failure | Yes | Yes (using a listener) |
| Remote job administration | Yes | No |
| Implements JSR 352 | Yes | No |
There is no doubt that Spring Batch outclasses Easy Batch in terms of features, but this comes at a cost: a complex configuration and a steep learning curve. The goal of Easy Batch is to keep the framework small and simple while remaining extensible and flexible, with smart defaults.
In this post, I tried to show that Easy Batch can be useful in situations where Spring Batch is too heavy for the requirements and developing a new solution from scratch is not an option either. So let me conclude by saying that we should be pragmatic: choose the right tool for the job! If your application requires advanced features like retry on failure, remoting or flows, then go for Spring Batch (or an implementation of JSR 352). If you don’t need all this advanced stuff, then Easy Batch can be very handy to simplify your batch application development.
The source code of this comparison can be found here.