Start programming with SPRING BATCH
Introduction
In your personal or professional projects, you sometimes need to process large volumes of data. Batch processing is an efficient way to do this: the data is collected, processed, and the batch results are produced in a single run. It can be applied to many use cases; a common one is transforming a large set of CSV or JSON files into a structured format ready for further processing.
In this tutorial, we will see how to set up this architecture with Spring Boot, a framework that facilitates the development of Spring-based applications.
What is Spring-Batch?
Spring Batch is an open source framework for batch processing. It is a lightweight, comprehensive solution designed to enable the development of robust batch applications, often found in modern enterprise systems. Its development is the result of a collaboration between SpringSource and Accenture.
It makes it possible to overcome recurring problems during batch development:
- productivity
- management of large volumes of data
- reliability
- reinvention of the wheel.
NB: In IT, a batch is a standalone program that carries out a set of processing operations on a volume of data.
Basic Architecture of Spring Batch
To manage batch data, we mainly use the following three tools:
JobLauncher: This is the component responsible for launching/starting the batch program. It can be configured to trigger itself or to be triggered by an external event (manual launch). In the Spring Batch workflow, the JobLauncher is responsible for executing a Job.
Job: This is the component that represents the task in charge of the business need addressed by the program. It is responsible for launching one or more Steps sequentially.
Step: This is the component that encapsulates the very heart of the business need to be addressed. It is built from three subcomponents structured as follows:
ItemReader: This is the component responsible for reading the input data to be processed. The data can come from various sources (databases, flat files (CSV, XML, XLS, etc.), queues);
ItemProcessor: This is the component responsible for transforming the data that was read. This is where all the business rules are implemented.
ItemWriter: This is the component that saves the data transformed by the processor in one or more desired containers (databases, flat files (CSV, XML, XLS, etc.), cloud).
JobRepository: This is the component responsible for recording statistics from monitoring the JobLauncher, the Job and the Step(s) at each execution. It offers two techniques for storing these statistics: a database or a Map. When the statistics are stored in a database, and therefore persisted durably, the batch can be monitored continuously over time in order to analyze possible problems in the event of failure. Conversely, when they are stored in a Map, the statistics are lost at the end of each batch execution instance. In all cases, one or the other must be configured.
For more information, I advise you to consult the Spring website.
After this brief explanation of the Spring Batch architecture, let's now see how to set up a Spring Batch job that reads data from a CSV file and then inserts it into a database. "Let's get into coding!"
Project setup
The easiest way to generate a Spring Boot project is to use Spring Initializr, with the steps below:
- Go to the Spring Initializr website
- Select Maven Project and Java language
- Add Spring Batch, JPA, Lombok, H2 Database
- Enter the group name as "com.example" and the artifact as "SpringBatch"
- Click the generate button
Once the project is generated, you must unzip it then import it into your IDE.
Technologies used:
- JDK 1.8
- Maven
- IntelliJ
- Lombok
- Spring data JPA
- H2 Database
Project dependencies
All project dependencies are in the pom.xml file. The three letters POM are the acronym for Project Object Model. Its XML representation is translated by Maven into a data structure that represents the project model.
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.3.3.RELEASE</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>com.pathus</groupId>
    <artifactId>SpringBatchExample</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>SpringBatchExample</name>
    <description>Demo of spring batch project</description>

    <properties>
        <java.version>1.8</java.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-batch</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-jpa</artifactId>
        </dependency>
        <dependency>
            <groupId>com.h2database</groupId>
            <artifactId>h2</artifactId>
            <scope>runtime</scope>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
            <exclusions>
                <exclusion>
                    <groupId>org.junit.vintage</groupId>
                    <artifactId>junit-vintage-engine</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.springframework.batch</groupId>
            <artifactId>spring-batch-test</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>
Project Structure
The structure of the project is as follows:
Job configuration
To enable batch processing, we need to annotate the configuration class with @EnableBatchProcessing. We must then create a reader to read our CSV file, a processor to transform the input data before writing, and a writer to write to the database.
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.LineMapper;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;
import org.springframework.scheduling.annotation.EnableScheduling;

import com.pathus90.springbatchexample.batch.StudentProcessor;
import com.pathus90.springbatchexample.batch.StudentWriter;
import com.pathus90.springbatchexample.model.Student;
import com.pathus90.springbatchexample.model.StudentFieldSetMapper;

@Configuration
@EnableBatchProcessing
@EnableScheduling
public class BatchConfig {

    private static final String FILE_NAME = "results.csv";
    private static final String JOB_NAME = "listStudentsJob";
    private static final String STEP_NAME = "processingStep";
    private static final String READER_NAME = "studentItemReader";

    @Value("${header.names}")
    private String names;

    @Value("${line.delimiter}")
    private String delimiter;

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Bean
    public Step studentStep() {
        return stepBuilderFactory.get(STEP_NAME)
                .<Student, Student>chunk(5)
                .reader(studentItemReader())
                .processor(studentItemProcessor())
                .writer(studentItemWriter())
                .build();
    }

    @Bean
    public Job listStudentsJob(Step step1) {
        return jobBuilderFactory.get(JOB_NAME)
                .start(step1)
                .build();
    }

    @Bean
    public ItemReader<Student> studentItemReader() {
        FlatFileItemReader<Student> reader = new FlatFileItemReader<>();
        reader.setResource(new ClassPathResource(FILE_NAME));
        reader.setName(READER_NAME);
        reader.setLinesToSkip(1);
        reader.setLineMapper(lineMapper());
        return reader;
    }

    @Bean
    public LineMapper<Student> lineMapper() {
        final DefaultLineMapper<Student> defaultLineMapper = new DefaultLineMapper<>();
        final DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer();
        lineTokenizer.setDelimiter(delimiter);
        lineTokenizer.setStrict(false);
        lineTokenizer.setNames(names.split(delimiter));
        final StudentFieldSetMapper fieldSetMapper = new StudentFieldSetMapper();
        defaultLineMapper.setLineTokenizer(lineTokenizer);
        defaultLineMapper.setFieldSetMapper(fieldSetMapper);
        return defaultLineMapper;
    }

    @Bean
    public ItemProcessor<Student, Student> studentItemProcessor() {
        return new StudentProcessor();
    }

    @Bean
    public ItemWriter<Student> studentItemWriter() {
        return new StudentWriter();
    }
}
Configuration of the job and the Step
The first method defines the job and the second defines a single step. Jobs are built from steps, where each step can involve a reader, a processor and a writer. In the step definition, we define the chunk size, i.e. how many records are written at a time; in our case, up to 5 records. Then we configure the reader, the processor and the writer using the beans defined previously. A job can chain several steps in a precise order; here, the step studentStep is executed by the job listStudentsJob.
@Bean
public Step studentStep() {
    return stepBuilderFactory.get(STEP_NAME)
            .<Student, Student>chunk(5)
            .reader(studentItemReader())
            .processor(studentItemProcessor())
            .writer(studentItemWriter())
            .build();
}

@Bean
public Job listStudentsJob(Step step1) {
    return jobBuilderFactory.get(JOB_NAME)
            .start(step1)
            .build();
}
Definition of Reader
In our batch configuration, the Reader reads a data source and is called successively within a step and returns objects for which it is defined (Student in our case).
@Bean
public ItemReader<Student> studentItemReader() {
    FlatFileItemReader<Student> reader = new FlatFileItemReader<>();
    reader.setResource(new ClassPathResource(FILE_NAME));
    reader.setName(READER_NAME);
    reader.setLinesToSkip(1);
    reader.setLineMapper(lineMapper());
    return reader;
}
The FlatFileItemReader class uses the DefaultLineMapper class, which in turn uses the DelimitedLineTokenizer class. The role of DelimitedLineTokenizer is to break each line down into a FieldSet object; the names property describes the file header and allows the fields of each line to be identified. The names property is used by the class that transforms each FieldSet into a business object, i.e. the class set through the fieldSetMapper property (StudentFieldSetMapper).
import org.springframework.batch.item.file.mapping.FieldSetMapper;
import org.springframework.batch.item.file.transform.FieldSet;

public class StudentFieldSetMapper implements FieldSetMapper<Student> {

    @Override
    public Student mapFieldSet(FieldSet fieldSet) {
        return Student.builder()
                .rank(fieldSet.readString(0))
                .firstName(fieldSet.readString(1))
                .lastName(fieldSet.readString(2))
                .center(fieldSet.readString(3))
                .pv(fieldSet.readString(4))
                .origin(fieldSet.readString(5))
                .mention(fieldSet.readString(6))
                .build();
    }
}
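The Student class itself does not appear in the snippets of this article. As a rough sketch (the field types, the generated id and the Lombok/JPA annotations are assumptions on my part), a Lombok-based JPA entity matching the seven columns read by the mapper could look like this:

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;

import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Data;
import lombok.NoArgsConstructor;

// Hypothetical entity: the field names follow the FieldSetMapper above,
// but the types, the generated id and the annotations are assumptions.
@Entity
@Data
@Builder
@NoArgsConstructor
@AllArgsConstructor
public class Student {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String rank;
    private String firstName;
    private String lastName;
    private String center;
    private String pv;
    private String origin;
    private String mention;
}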
The LineMapper interface is used to map lines (strings) to objects; it is generally used to map lines read from a file:
@Bean
public LineMapper<Student> lineMapper() {
    final DefaultLineMapper<Student> defaultLineMapper = new DefaultLineMapper<>();
    final DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer();
    lineTokenizer.setDelimiter(delimiter);
    lineTokenizer.setStrict(false);
    lineTokenizer.setNames(names.split(delimiter));
    final StudentFieldSetMapper fieldSetMapper = new StudentFieldSetMapper();
    defaultLineMapper.setLineTokenizer(lineTokenizer);
    defaultLineMapper.setFieldSetMapper(fieldSetMapper);
    return defaultLineMapper;
}
Processor Definition
Unlike the Reader, the Processor exists mainly for functional needs. It is optional, and we can do without it if no business transformation is required. In our example, we wrote a simple processor which only converts a few attributes of our Student object to uppercase; more concrete functional cases can of course go beyond this example.
import org.springframework.batch.item.ItemProcessor;

import com.pathus90.springbatchexample.model.Student;

public class StudentProcessor implements ItemProcessor<Student, Student> {

    @Override
    public Student process(Student student) {
        student.setFirstName(student.getFirstName().toUpperCase());
        student.setLastName(student.getLastName().toUpperCase());
        student.setCenter(student.getCenter().toUpperCase());
        student.setOrigin(student.getOrigin().toUpperCase());
        student.setMention(student.getMention().toUpperCase());
        return student;
    }
}
Definition of Writer
The Writer writes the data coming from the processor (or, when there is no processor, directly from the Reader). In our case, it receives the transformed objects from the processor; each object is then persisted in our database and the transaction is committed.
import java.util.List;

import org.springframework.batch.item.ItemWriter;
import org.springframework.beans.factory.annotation.Autowired;

import com.pathus90.springbatchexample.model.Student;
import com.pathus90.springbatchexample.service.IStudentService;

import lombok.extern.slf4j.Slf4j;

@Slf4j
public class StudentWriter implements ItemWriter<Student> {

    @Autowired
    private IStudentService studentService;

    @Override
    public void write(List<? extends Student> students) {
        students.stream().forEach(student -> {
            log.info("Saving the object {} to the database", student);
            studentService.insertStudent(student);
        });
    }
}
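The IStudentService used by the writer is not reproduced in the article. A minimal sketch of what it might look like, assuming a Spring Data JPA repository behind it (the repository and implementation names are assumptions; only insertStudent is required by the writer above):

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.stereotype.Service;

// Hypothetical Spring Data repository for the Student entity.
interface StudentRepository extends JpaRepository<Student, Long> {
}

public interface IStudentService {
    void insertStudent(Student student);
}

// Hypothetical implementation that simply delegates to the repository.
@Service
class StudentServiceImpl implements IStudentService {

    @Autowired
    private StudentRepository studentRepository;

    @Override
    public void insertStudent(Student student) {
        studentRepository.save(student);
    }
}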
Batch configuration file (application.properties)
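The properties file is not reproduced in the article; at a minimum it must define the header.names and line.delimiter keys injected into BatchConfig. A plausible sketch, assuming a comma-delimited file and an in-memory H2 database (the datasource values, the H2 console flag and the DDL setting are assumptions):

# Column names of the CSV header and the delimiter used to split each line
header.names=rank,firstName,lastName,center,pv,origin,mention
line.delimiter=,

# Assumed H2/JPA settings so the console mentioned below is reachable
spring.datasource.url=jdbc:h2:mem:testdb
spring.datasource.driverClassName=org.h2.Driver
spring.datasource.username=sa
spring.datasource.password=
spring.h2.console.enabled=true
spring.jpa.hibernate.ddl-auto=create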
CSV file to write to database
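The original results.csv is not included here. An illustrative sample with the same seven columns (the rows below are made-up placeholder data, assuming the comma delimiter from the properties sketch above) would look like:

rank,firstName,lastName,center,pv,origin,mention
1,Awa,Diallo,Dakar,PV001,Senegal,Tres Bien
2,Mamadou,Barry,Conakry,PV002,Guinee,Bien
3,Fatou,Ndiaye,Thies,PV003,Senegal,Assez Bien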
Launching the application
Once we have finished setting up the batch configuration, let's now check that everything described above works.
To run the application, open the class annotated with @SpringBootApplication, which is the entry point of our application.
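That main class is not reproduced in the article; it is the standard Spring Boot entry point, roughly as follows (the class name is an assumption based on the artifact name):

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

// Standard Spring Boot entry point; the class name is assumed.
@SpringBootApplication
public class SpringBatchExampleApplication {

    public static void main(String[] args) {
        SpringApplication.run(SpringBatchExampleApplication.class, args);
    }
}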
Launching the main above will start our job and the batch launcher looks like this:
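The launcher class itself is not reproduced in the article; a possible sketch, assuming a @Scheduled method that submits the job to the JobLauncher (the class and field names are assumptions, and a time-based parameter is added so that each run creates a new job instance):

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// Hypothetical launcher: runs the listStudentsJob on a fixed delay.
@Component
public class BatchLauncher {

    @Autowired
    private JobLauncher jobLauncher;

    @Autowired
    private Job job;

    @Scheduled(fixedDelay = 8000)
    public void perform() throws Exception {
        // A unique parameter per run, otherwise Spring Batch refuses to
        // re-run an already completed job instance.
        JobParameters params = new JobParametersBuilder()
                .addLong("launchTime", System.currentTimeMillis())
                .toJobParameters();
        jobLauncher.run(job, params);
    }
}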
A scheduler has been set up so that the batch is triggered automatically. In this example, once the application is launched, the batch runs every 8 seconds. You can play with it by changing the fixedDelay value (in milliseconds).
In addition to running the main class above to start the batch, you can also run the command mvn spring-boot:run from a command prompt.
You can also launch the application from the JAR archive file; in this case you must:
- Go to the parent folder of the project with a command prompt and execute mvn clean package, which packages the project.
- A jar file is created in the target folder.
- Run the application with the command java -jar target/generated_file_name-0.0.1-SNAPSHOT.jar
Also note that the H2 console starts when our Spring Batch application is launched; the database is generated automatically, along with the Student table.
We can clearly see that our file has been properly loaded into our database.
N.B.: If we want to start the batch manually, without going through the scheduler that triggers it according to our settings, we can expose an API with a controller that calls the Spring Batch job.
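The controller is not reproduced in the article. A possible sketch, assuming spring-boot-starter-web is added to the pom and reusing the JobLauncher pattern above (the class name is an assumption; only the /load path comes from the article):

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

// Hypothetical controller exposing the job on /load.
@RestController
public class JobController {

    @Autowired
    private JobLauncher jobLauncher;

    @Autowired
    private Job job;

    @GetMapping("/load")
    public String load() throws Exception {
        JobParameters params = new JobParametersBuilder()
                .addLong("launchTime", System.currentTimeMillis())
                .toJobParameters();
        jobLauncher.run(job, params);
        return "Batch job has been invoked";
    }
}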
Just open the URL http://localhost:8080/load and the batch will run.
We have reached the end of our first learning about batch programming using the Spring framework. Leave comments or questions if you have any!
Happy learning everyone and I hope that this first tutorial will be beneficial to you.
You will find the source code available here
References
- https://spring.io/guides/gs/batch-processing/
- https://jeremy-jeanne.developpez.com/tutoriels/spring/spring-batch/#LIII-B-3
- https://www.baeldung.com/introduction-to-spring-batch