Spring Boot - Batch Processing Notes - Raghu-224-249
Batch:-- Multiple operations are executed as one task, or one large task is executed
step by step.
=>Every task is called a "JOB" and a sub-task is called a "STEP".
=>One JOB can contain one or more Steps (a Step is also called a Sub-Task).
=>One Step contains:
a. ItemReader (reads data from a source).
b. ItemProcessor (performs calculations, logic, operations, etc.).
c. ItemWriter (provides output to the next Step or the final output).
Step Implementation:--
=>In a Job (work) we can define one or multiple Steps, which are executed in order
(step by step).
=>A Job may contain 1 step, 2 steps, ... or many Steps; so finally, 1 Job = * (many) Steps.
=>Every Step has 3 execution stages:
a. ItemReader<T>
b. ItemProcessor<I, O>
c. ItemWriter<T>
a>ItemReader<T> :-- It reads data item by item from one source (file, DB, etc.).
b>ItemProcessor<I, O> :-- It is used to process the input data given by the Reader and
returns data in a modified (or the same) format.
c>ItemWriter<T> :-- It reads a bulk of Items from the Processor at a time and writes
them to one destination. The destination can even be a file (ex: DB, Text file, CSV,
Excel, XML, etc.).
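The read -> process -> write-in-chunks cycle described above can be sketched without Spring as plain Java; the sample data and chunk size here are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual (non-Spring) sketch of one Step: read item by item,
// process each item, and write in chunks.
public class MiniStep {
    public static void main(String[] args) {
        String[] source = {"a", "b", "c", "d", "e"}; // assumed sample source
        int chunk = 2;                               // assumed chunk size
        int index = 0;
        List<String> buffer = new ArrayList<>();
        while (index < source.length) {
            String item = source[index++];           // like ItemReader.read()
            buffer.add(item.toUpperCase());          // like ItemProcessor.process()
            if (buffer.size() == chunk || index == source.length) {
                System.out.println(buffer);          // like ItemWriter.write(chunk)
                buffer.clear();
            }
        }
    }
}
```

Note how the writer fires once per full chunk (and once for the final partial chunk), while read and process run once per item.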
=>To start a Job, we need one JobLauncher, which works either in a general way or
scheduling-based.
=>Details like Job, Steps, Launcher, etc. are stored in one memory area called the
JobRepository [Ex: H2 DB, MySQL, or any one DB].
NOTE:--
1. An Item can be a String, an Array, an Object of any class, or a Collection (List/Set...).
2. The ItemReader reads Items one by one from the source. For example, if the source
has 10 items, the ItemReader is executed 10 times.
3. The ItemReader GenericType must match the ItemProcessor input GenericType.
4. The ItemProcessor reads item by item from the Reader and does some processing
(calculate, convert, check conditions, convert to another class object, etc.).
5. The ItemProcessor provides output (which may be the same as the input type),
called the Transformed Type.
6. The ItemWriter collects one chunk of output Items from the Processor into one List
at a time.
7. The ItemWriter writes data to any destination.
=>Here T/I/O can be a String, an Object of any class, or even a Collection (List, Set...).
=>Here the ItemReader reads data from the source with the help of the Step.
=>The Reader and Processor are executed for every item, but the Writer is called based
on the chunk size.
Ex:-- No. of Items = 200, chunk = 50: the Reader and Processor are executed 200 times
each, but the Writer is called once for every 50 processed items (here, 4 times in total).
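The count in the example above (200 items, chunk 50, writer called 4 times) is just ceiling division; a quick illustration:

```java
// Number of ItemWriter calls = ceil(items / chunk)
public class ChunkMath {
    static int writerCalls(int items, int chunk) {
        return (items + chunk - 1) / chunk; // integer ceiling division
    }
    public static void main(String[] args) {
        System.out.println(writerCalls(200, 50)); // the example from the notes
    }
}
```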
=>The ItemWriter writes data to the destination with the help of the Step.
=>The programmer can define Impl classes for the above interfaces, or can also use
existing (pre-defined) classes.
=>Reader, Writer, and Processor are functional interfaces, so we can define them using
Lambda Expressions and even method references.
2>Job Creation:-- One Job is a collection of Steps executed in order (one by one).
=>A Job may even contain only one Step. Here, Job is an interface whose instance is
constructed using "JobBuilderFactory (C)" and Step (I) instances.
=>To execute any logic before or after the Job, define an Impl class for
"JobExecutionListener (I)", which has the methods:
beforeJob(...) : void and
afterJob(...) : void
=>The Listener is optional. It may be used to find the current status of the Batch (Ex:
COMPLETED, STOPPED, FAILED...), start date and time, end date and time, etc.
3>Job Execution:-- Once the Steps and Job are configured, we need to start them
using the "JobLauncher (I)" run(...) method.
=>This run(...) method takes 2 parameters:
a. Job (I) and
b. JobParameters (C)
=>Here, JobParameters (C) are inputs given to the Job while starting it.
Ex:-- Parameters are: server date and time, customer name, flags (true/false), task
name, etc.
=>The JobParameters (C) object is created using "JobParametersBuilder" and its
method toJobParameters().
Step:-- One Step can be constructed using the StepBuilderFactory (sf) class by
providing a name, chunk size, reader, processor, and writer.
StepBuilderFactory (sf):--
sf.get("step1")             =>Provide the name of the Step.
.<String, String> chunk(1)  =>No. of Items to be processed per write.
.reader(readerObj)          =>Any Impl class of ItemReader<T> (I)
.processor(processorObj)    =>Any Impl class of ItemProcessor<I, O> (I)
.writer(writerObj)          =>Any Impl class of ItemWriter<T> (I)
.build();                   =>Converts to a Step (impl class) object.
UML Notation:-- (diagram not reproduced in these notes)
JobBuilderFactory (jf):--
jf.get("jobA")                  =>Job name
.incrementer(runIdIncrementer)  =>Incrementer
.listener(jobExListener)        =>Job Execution Listener
.start(stepA)                   =>First Step
.build();                       =>Converts to a Job object.
JobExecutionListener (I):--
=>This interface is provided by the Spring Batch f/w and gets called automatically for
our Job.
=>For one Job, one Listener can be configured.
=>It is an interface which provides two abstract methods:
a. beforeJob(JobExecution) : void
b. afterJob(JobExecution) : void
=>If we write any impl class for the above Listener (I), we need to implement both
abstract methods in our class.
=>Sometimes only one method is required in our class file, e.g. only the afterJob()
method. Then go for the support class (JobExecutionListenerSupport), which provides
default (empty) impl logic for both beforeJob() and afterJob() methods.
=>JobExecution is a class which is used to find current Job details like jobParameters,
BatchStatus, stepExecutions, etc.
Step#1:-- Define one Spring Starter Project and select Spring Batch, or else add the
below dependency.
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-batch</artifactId>
</dependency>
Coding order:--
1. Reader
2. Processor
3. Writer
4. Step configuration using StepBuilderFactory
5. JobExecutionListener
6. Job configuration using JobBuilderFactory
7. JobParameters using JobParametersBuilder
8. JobLauncher using CommandLineRunner
9. ** Add the key in application.properties:
spring.batch.job.enabled=false
=>To avoid execution of the Job multiple times (once by the Starter class).
application.properties:--
#Disable this, otherwise the job is executed once by Spring Boot on startup
#and one more time by our launcher
spring.batch.job.enabled=false
spring.batch.initialize-schema=always
spring.datasource.driverClassName=oracle.jdbc.driver.OracleDriver
spring.datasource.url=jdbc:oracle:thin:@localhost:1521:xe
spring.datasource.username=system
spring.datasource.password=system
1. DataReader.java:--
package com.app.batch.reader;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.NonTransientResourceException;
import org.springframework.batch.item.ParseException;
import org.springframework.batch.item.UnexpectedInputException;
import org.springframework.stereotype.Component;

@Component
public class DataReader implements ItemReader<String> {
	//sample source data (assumed; the original values are not shown in these notes)
	private String[] message = { "hello", "welcome", "to", "spring", "batch" };
	private int index = 0;

	@Override
	public String read() throws Exception, UnexpectedInputException,
			ParseException, NonTransientResourceException {
		if (index < message.length) {
			return message[index++];
		} else {
			index = 0; //reset for re-use; returning null tells the Step "no more items"
		}
		return null;
	}
}
2. DataProcessor.java:--
package com.app.batch.processor;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.stereotype.Component;

@Component
public class DataProcessor implements ItemProcessor<String, String> {
	@Override
	public String process(String item) throws Exception {
		return item.toUpperCase();
	}
}
3. DataWriter.java:--
package com.app.batch.writer;
import java.util.List;
import org.springframework.batch.item.ItemWriter;
import org.springframework.stereotype.Component;

@Component
public class DataWriter implements ItemWriter<String> {
	@Override
	public void write(List<? extends String> items) throws Exception {
		System.out.println(items); //one chunk of processed items at a time
	}
}
4. BatchConfig.java:--
@Configuration
@EnableBatchProcessing
public class BatchConfig {
	@Autowired
	private JobBuilderFactory jobBuilderFactory;
	@Autowired
	private StepBuilderFactory stepBuilderFactory;

	//stepA() and listener() are @Bean methods of this class (the Step is built from
	//stepBuilderFactory with the reader, processor, and writer defined above)
	@Bean
	public Job jobA() {
		return jobBuilderFactory.get("jobA")
			.incrementer(new RunIdIncrementer())
			.listener(listener())
			.start(stepA())
			//.next(stepB()) -- to chain more steps
			.build();
	}
}
5. MyJobListener.java:--
@Component
public class MyJobListener implements JobExecutionListener {
	@Override
	public void beforeJob(JobExecution jobExecution) {
		System.out.println(jobExecution.getStartTime());
		System.out.println(jobExecution.getStatus());
	}
	@Override
	public void afterJob(JobExecution jobExecution) {
		System.out.println(jobExecution.getEndTime());
		System.out.println(jobExecution.getStatus());
	}
}
6. MyJobLauncher.java:--
package com.app.batch.runner;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.stereotype.Component;

@Component
public class MyJobLauncher implements CommandLineRunner {
	@Autowired
	private JobLauncher jobLauncher;
	@Autowired
	private Job job;

	@Override
	public void run(String... args) throws Exception {
		JobParameters jobParameters = new JobParametersBuilder()
			.addLong("time", System.currentTimeMillis())
			.toJobParameters();
		jobLauncher.run(job, jobParameters);
	}
}
2. Spring Boot Batch Processing: Converting .csv file data to a Database table:--
=>Consider that the input data given by the CSV file is related to products, having
product id, name, and cost; using one ItemReader, convert the .csv file data to
Product class objects.
=>Define one Processor class to calculate the GST (Goods and Services Tax) and
discount of the product.
=>Finally, the Product should have id, name, cost, gst, and discount.
=>Use one ItemWriter class to convert one object into one row in the DB table.
=>In real time, CSV (Comma Separated Values) files are used to hold large amounts of
data separated by a delimiter/tokenizer symbol, usually ',' (it can also be '.', '-', '\', '/',
etc.).
=>This data is converted to one Model class (T) object format.
=>It means one line of data (id, code, cost...) is converted to one Java class object by
the Reader.
=>Calculate the discount, GST, etc. using the Processor class. The Processor may
return the same class or a different object (i.e. I may equal O).
=>Based on the chunk(int) size, all objects returned by the Processor are collected into
one List by the Writer.
=>Every object in the List is converted into its equivalent "INSERT SQL...".
=>Multiple SQLs are combined into one JDBC Batch and sent to the DB at a time.
=>The number of calls between the Writer and the Database depends on the chunk
size (and the number of Items).
FlatFileItemReader:--
=>This class is provided by Spring Batch to read data from any text-related file
(.txt, .csv, ...).
Execution Flow:--
=>The Spring Boot Batch f/w has provided pre-defined ItemReaders and ItemWriters.
=>FlatFileItemReader<T> is used to load any file (source) as input to read data from,
e.g. .txt, .csv, etc.
=>It reads one line of data at a time based on the LineMapper (\n).
=>One line of data is divided into multiple values based on the Tokenizer (Delimiter = ,).
=>These values are mapped to one class (T) type object, also known as the Target.
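A plain-Java sketch (not Spring Batch itself) of what the Tokenizer and field mapping do to one CSV line; the sample line and field order are assumptions:

```java
// One CSV line -> tokenize on ',' -> map fields to typed values,
// mirroring DelimitedLineTokenizer + FieldSetMapper.
public class LineMapping {
    static String[] tokenize(String line) {
        return line.split(","); // Delimiter = ','
    }
    public static void main(String[] args) {
        String[] fields = tokenize("101,Pen,20.0"); // assumed sample line: id,name,cost
        int id = Integer.parseInt(fields[0]);
        String name = fields[1];
        double cost = Double.parseDouble(fields[2]);
        System.out.println(id + "|" + name + "|" + cost);
    }
}
```

In the real framework, BeanWrapperFieldSetMapper performs this conversion into the Target class (T) using the column names configured on the tokenizer.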
=>Here, spring.batch.job.enabled=false disables the one-time execution of the Job on
app startup by the Starter class.
=>spring.batch.initialize-schema=always allows Spring Batch to communicate with the
DB to hold its Repository details (like Job, Step, current status details...).
=>***In the case of an Embedded Database, initialize-schema is not required.
application.properties:--
spring.batch.job.enabled=false
spring.batch.initialize-schema=always
spring.datasource.driverClassName=oracle.jdbc.driver.OracleDriver
spring.datasource.url=jdbc:oracle:thin:@localhost:1521:xe
spring.datasource.username=system
spring.datasource.password=system
1. Product.java:--
@Data
public class Product {
	private Integer prodId;
	private String prodName;
	private Double prodCost;
	private Double prodGst;
	private Double prodDisc;
}
2. ProductProcessor.java:--
@Component
public class ProductProcessor implements ItemProcessor<Product, Product> {
	@Override
	public Product process(Product item) throws Exception {
		item.setProdGst(item.getProdCost() * 12 / 100.0);  //GST = 12% of cost
		item.setProdDisc(item.getProdCost() * 25 / 100.0); //discount = 25% of cost
		return item;
	}
}
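As a quick standalone check of the processor's arithmetic (the sample cost of 200.0 is made up, not from the notes):

```java
// GST = 12% of cost, discount = 25% of cost, as in the processor above
public class GstMath {
    public static void main(String[] args) {
        double cost = 200.0;                 // assumed sample cost
        double gst = cost * 12 / 100.0;
        double disc = cost * 25 / 100.0;
        System.out.println(gst + "," + disc);
    }
}
```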
3. BatchConfig.java:--
package com.app.config;
import javax.sql.DataSource;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;
import org.springframework.jdbc.datasource.DriverManagerDataSource;
import com.app.model.Product;
import com.app.process.ProductProcessor;
@Configuration
@EnableBatchProcessing
public class BatchConfig {
	//STEP
	@Autowired
	private StepBuilderFactory sf;

	@Bean
	public Step stepA() {
		return sf.get("stepA")
			.<Product, Product> chunk(3)
			.reader(reader())
			.processor(processor())
			.writer(writer())
			.build();
	}
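The excerpt jumps straight to stepA(); the reader(), processor(), and writer() beans it calls are not shown. Below is a minimal sketch consistent with the imports listed above; the file name products.csv and the CSV column order are assumptions, not from the notes:

```java
	//Reader: load CSV lines and map them to Product objects
	@Bean
	public FlatFileItemReader<Product> reader() {
		FlatFileItemReader<Product> reader = new FlatFileItemReader<>();
		reader.setResource(new ClassPathResource("products.csv")); //assumed file name

		DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
		tokenizer.setNames("prodId", "prodName", "prodCost"); //assumed column order

		BeanWrapperFieldSetMapper<Product> fieldSetMapper = new BeanWrapperFieldSetMapper<>();
		fieldSetMapper.setTargetType(Product.class);

		DefaultLineMapper<Product> lineMapper = new DefaultLineMapper<>();
		lineMapper.setLineTokenizer(tokenizer);
		lineMapper.setFieldSetMapper(fieldSetMapper);
		reader.setLineMapper(lineMapper);
		return reader;
	}

	//Processor: the ProductProcessor shown earlier
	@Bean
	public ItemProcessor<Product, Product> processor() {
		return new ProductProcessor();
	}

	//Writer: one INSERT per object, sent to the DB as a JDBC batch per chunk
	@Bean
	public ItemWriter<Product> writer() {
		JdbcBatchItemWriter<Product> writer = new JdbcBatchItemWriter<>();
		writer.setDataSource(dataSource());
		writer.setSql("INSERT INTO prodstab (PID, PNAME, PCOST, PGST, PDISC) "
			+ "VALUES (:prodId, :prodName, :prodCost, :prodGst, :prodDisc)");
		writer.setItemSqlParameterSourceProvider(
			new BeanPropertyItemSqlParameterSourceProvider<>());
		return writer;
	}
```

The named parameters (:prodId etc.) are resolved from the Product getters by BeanPropertyItemSqlParameterSourceProvider.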
	//JOB
	@Autowired
	private JobBuilderFactory jf;

	@Bean
	public Job jobA() {
		return jf.get("jobA")
			.incrementer(new RunIdIncrementer())
			.start(stepA())
			.build();
	}
	//dataSource -- creates the DB connection
	@Bean
	public DataSource dataSource() {
		DriverManagerDataSource ds = new DriverManagerDataSource();
		ds.setDriverClassName("oracle.jdbc.driver.OracleDriver");
		ds.setUrl("jdbc:oracle:thin:@localhost:1521:xe");
		ds.setUsername("system");
		ds.setPassword("system");
		return ds;
	}
}
4. MyJobLauncher.java:--
@Component
public class MyJobLauncher implements CommandLineRunner {
	@Autowired
	private JobLauncher jobLauncher;
	@Autowired
	private Job job;

	@Override
	public void run(String... args) throws Exception {
		jobLauncher.run(job, new JobParametersBuilder()
			.addLong("time", System.currentTimeMillis())
			.toJobParameters());
	}
}
----------------------------------------------------------------------------------------------------------------
DB Table:-- Execute the below SQL query before running the program to create the table.
SQL>CREATE TABLE prodstab (PID NUMBER(10), PNAME VARCHAR2(50), PCOST NUMBER,
PGST NUMBER, PDISC NUMBER);
Task:--
#1. Write a Spring Boot Batch application to read data from a database (Oracle DB)
using "JdbcCursorItemReader" and write the data to a CSV file using
"FlatFileItemWriter".
#2. Read data from MongoDB using "MongoItemReader" and write the data to a JSON
file using "JsonFileItemWriter".
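A possible starting point for Task #1, as a sketch only; the bean wiring, output file path, SQL, and column aliases below are assumptions, not part of the notes:

```java
	//Reader: stream rows from the table with a DB cursor, one Product per row
	@Bean
	public JdbcCursorItemReader<Product> dbReader() {
		JdbcCursorItemReader<Product> reader = new JdbcCursorItemReader<>();
		reader.setDataSource(dataSource());
		//aliases are assumed so BeanPropertyRowMapper can match Product properties
		reader.setSql("SELECT PID AS prodId, PNAME AS prodName, PCOST AS prodCost, "
			+ "PGST AS prodGst, PDISC AS prodDisc FROM prodstab");
		reader.setRowMapper(new BeanPropertyRowMapper<>(Product.class));
		return reader;
	}

	//Writer: one comma-separated line per Product object
	@Bean
	public FlatFileItemWriter<Product> csvWriter() {
		FlatFileItemWriter<Product> writer = new FlatFileItemWriter<>();
		writer.setResource(new FileSystemResource("products-out.csv")); //assumed path

		BeanWrapperFieldExtractor<Product> extractor = new BeanWrapperFieldExtractor<>();
		extractor.setNames(new String[] {
			"prodId", "prodName", "prodCost", "prodGst", "prodDisc" });

		DelimitedLineAggregator<Product> aggregator = new DelimitedLineAggregator<>();
		aggregator.setDelimiter(",");
		aggregator.setFieldExtractor(extractor);
		writer.setLineAggregator(aggregator);
		return writer;
	}
```

A Step would then chain dbReader() and csvWriter() exactly as stepA() does in the CSV-to-DB example above.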
=>ItemReader, ItemProcessor, and ItemWriter are functional interfaces (each contains
only one (1) abstract method), so logic can be provided to them using Lambda
Expressions.
Lambda Exp:--
@Bean
ItemProcessor<Product, Product> process() {
	return (p) -> {
		double cost = p.getCost();
		p.setDisc(cost * 3 / 100.0);
		p.setGst(cost * 12 / 100.0);
		return p;
	};
}
Naresh IT, Hyderabad P: 040-2374 6666,9000994007 /08] Page 246
[Raghu Sir] [NareshIT, Hyd]
***Different ways of creating an object and calling a method:--
Consider the below class:--
class Sample {
	Sample() {
		System.out.println("Constructor");
	}
	void show() {
		System.out.println("Method");
	}
}
Test class:--
public class WayOfCreatingObject {
	public static void main(String[] args) {
		//3. Creating an object (adding extra code / overriding methods) and calling the method
		new Sample() {
			public void show() {
				System.out.println("NEW LOGIC");
			}
		}.show();
	}
}
package com.app;
public class Test {
	public static void main