An aspiring software craftsman journey, By Mahmoud Ben Hassine
In the previous post, I compared Easy Batch and Spring Batch in terms of features, ease of use and developer productivity. I came to the conclusion that, without a doubt, Spring Batch provides a richer feature set and allows you to do much more than Easy Batch does.
In this post, I will compare Spring Batch and Easy Batch in terms of performance. The goal of the benchmark is to process customer data in the following CSV flat file:
I will measure the performance of reading, parsing and mapping data to the following domain object:
The processing logic depends heavily on the use case, so it will be omitted.
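To make "reading, parsing and mapping" concrete, here is a minimal plain-Java sketch of the three steps, written without either framework. The `Customer` fields (`id`, `firstName`, `lastName`), the comma delimiter and the class name `CsvMappingSketch` are assumptions made for illustration only. They are not the actual layout used in the benchmark.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

class CsvMappingSketch {

    // Hypothetical domain object: the benchmark's real Customer has its own fields.
    static final class Customer {
        final int id;
        final String firstName;
        final String lastName;

        Customer(int id, String firstName, String lastName) {
            this.id = id;
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    // Parse one delimited line and map its fields to a Customer.
    static Customer mapLine(String line) {
        String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
        if (fields.length != 3) {
            throw new IllegalArgumentException("Expected 3 fields but got " + fields.length + ": " + line);
        }
        return new Customer(Integer.parseInt(fields[0].trim()), fields[1].trim(), fields[2].trim());
    }

    // Read every non-blank line from the reader and map it to a Customer.
    static List<Customer> readAll(BufferedReader reader) {
        return reader.lines()
                .filter(line -> !line.trim().isEmpty())
                .map(CsvMappingSketch::mapLine)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String sample = "1,John,Doe\n2,Jane,Roe\n";
        List<Customer> customers = readAll(new BufferedReader(new StringReader(sample)));
        System.out.println(customers.size() + " customers mapped");
    }
}
```

Both frameworks do considerably more than this (error handling, listeners, reporting and so on), so the sketch only shows the kind of work being measured.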
I will use the Random Beans library to generate several files of different sizes for the benchmark: 10,000, 100,000, 1,000,000 and 10,000,000 customers.
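As a rough idea of what such a generator looks like, here is a sketch that writes random customer records using only `java.util.Random`. It is a stand-in for Random Beans, not its API, and the record layout (`id,firstName,lastName`) and the class name `CustomerFileGeneratorSketch` are assumptions for illustration.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Random;

class CustomerFileGeneratorSketch {

    // Build a random lowercase string of the given length.
    static String randomName(Random random, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + random.nextInt(26)));
        }
        return sb.toString();
    }

    // Write `count` records in the assumed layout "id,firstName,lastName".
    static void generate(Writer out, int count, long seed) {
        Random random = new Random(seed); // fixed seed gives reproducible files
        try {
            for (int id = 1; id <= count; id++) {
                out.write(id + "," + randomName(random, 8) + "," + randomName(random, 10) + "\n");
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // For a real benchmark file, pass a buffered file writer instead of a StringWriter.
        StringWriter out = new StringWriter();
        generate(out, 5, 42L);
        System.out.print(out);
    }
}
```

Calling `generate` with 10,000 up to 10,000,000 as the count would produce the four input sizes, each written to its own file.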
The configuration of the Spring Batch and Easy Batch applications is pretty much the same as in the Hello World application of the previous post; only the domain object has been changed to Customer. Here is the main class to launch Easy Batch:
And here is the main class to launch Spring Batch:
The complete source code of this benchmark is available on GitHub here.
The benchmark results are the average of 5 executions, run on the following hardware/software configuration:
The commit-interval is an important parameter for Spring Batch performance, so I used different values for it: 10, 100 and 1000 (shown as CI in the table). The following table summarizes the number of input records, the file size and the processing time for each framework:
| Number of records (file size) | Easy Batch (s) | Spring Batch CI = 10 (s) | Spring Batch CI = 100 (s) | Spring Batch CI = 1000 (s) |
|---|---|---|---|---|
| 10,000 (964 KB) | <1 | 3 | 3 | 2 |
| 100,000 (9.4 MB) | 1 | 10 | 6 | 5 |
| 1,000,000 (94 MB) | 5 | 64 | 37 | 35 |
| 10,000,000 (983 MB) | 52 | 417 | 343 | 337 |
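The commit-interval columns in the table refer to the chunk size: Spring Batch buffers that many items and then writes and commits them together, so a larger interval means fewer commits for the same input. The following is a simplified plain-Java sketch of that idea, not Spring Batch's actual implementation. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.IntStream;

class CommitIntervalSketch {

    // Consume items one by one, buffer them, and "commit" each time the buffer
    // reaches commitInterval items. Returns the number of commits performed.
    static <T> int processInChunks(Iterator<T> items, int commitInterval) {
        if (commitInterval <= 0) {
            throw new IllegalArgumentException("commitInterval must be positive");
        }
        List<T> chunk = new ArrayList<>(commitInterval);
        int commits = 0;
        while (items.hasNext()) {
            chunk.add(items.next());
            if (chunk.size() == commitInterval) {
                commits++;     // stand-in for writing the chunk and committing
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            commits++;         // last, partially filled chunk
        }
        return commits;
    }

    public static void main(String[] args) {
        for (int commitInterval : new int[] {10, 100, 1000}) {
            int commits = processInChunks(IntStream.range(0, 10_000).iterator(), commitInterval);
            System.out.println("CI = " + commitInterval + ": " + commits + " commits for 10,000 records");
        }
    }
}
```

For 10,000 records this gives 1000, 100 and 10 commits respectively, which is consistent with the per-chunk overhead shrinking as the interval grows.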
It’s always better to see results in a chart, so here it is:
The difference is more pronounced for very large data sets:
Please note that this is a macro benchmark, not a micro benchmark at the nanosecond level. The goal is to have a rough idea of the whole execution time for both applications.
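A macro measurement of this kind, averaged over several runs, can be sketched as follows. This is a minimal illustration rather than the harness used for the numbers in this post. The `averageSeconds` helper and the placeholder job are hypothetical, and the real benchmark would run the Easy Batch or Spring Batch application in place of the placeholder.

```java
class MacroBenchmarkSketch {

    // Run the job `runs` times and return the average wall-clock time in seconds.
    static double averageSeconds(Runnable job, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            job.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1_000_000_000.0 / runs;
    }

    public static void main(String[] args) {
        // Placeholder job: replace with the launch of the batch application to measure.
        Runnable placeholderJob = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            if (sum < 0) {
                System.out.println("unreachable, keeps the loop from being optimized away");
            }
        };
        System.out.println("Average over 5 runs: " + averageSeconds(placeholderJob, 5) + " s");
    }
}
```

No warm-up or JIT isolation is attempted here, which is acceptable for whole-run timings measured in seconds but would not be for a micro benchmark.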
Easy Batch is about 6x faster than Spring Batch at reading, parsing and mapping flat data to domain objects. There may be other configurations and use cases where Spring Batch performs better than Easy Batch. If you have done other comparisons, please feel free to share your results.
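For reference, the speedup factors implied by the results table can be recomputed with a few lines of Java. The timings below are copied from the table. The 10,000-record row is left out because the Easy Batch time is only given as "<1". The class name `SpeedupSketch` is just for illustration.

```java
import java.util.Locale;

class SpeedupSketch {

    // Ratio of Spring Batch time to Easy Batch time for the same input.
    static double speedup(double springBatchSeconds, double easyBatchSeconds) {
        if (easyBatchSeconds <= 0) {
            throw new IllegalArgumentException("easyBatchSeconds must be positive");
        }
        return springBatchSeconds / easyBatchSeconds;
    }

    public static void main(String[] args) {
        String[] records = {"100,000", "1,000,000", "10,000,000"};
        double[] easyBatch = {1, 5, 52};                // seconds, from the table
        double[][] springBatch = {                      // seconds, for CI = 10, 100, 1000
            {10, 6, 5},
            {64, 37, 35},
            {417, 343, 337}
        };
        int[] commitIntervals = {10, 100, 1000};

        for (int row = 0; row < records.length; row++) {
            for (int col = 0; col < commitIntervals.length; col++) {
                System.out.printf(Locale.ROOT, "%s records, CI = %d: %.1fx%n",
                        records[row], commitIntervals[col],
                        speedup(springBatch[row][col], easyBatch[row]));
            }
        }
    }
}
```

On the largest file the factor ranges from about 6.5 with a commit-interval of 1000 to about 8 with a commit-interval of 10, so the "about 6x" figure corresponds to the most favorable Spring Batch setting.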
Another interesting benchmark would be to compare parallel versions of the two applications, but that will be done in another post, so stay tuned.