ETL and Event Sourcing
Integration Architecture: Best Practice and Case Study
Marc Siegel - Panorama Education - Wed Feb 6 2019
ETL pipelines from external systems
Prerequisite knowledge
● Familiarity with traditional ETL architectures: software systems that Extract data from external systems, Transform it, and Load the resulting data sets into internal systems, most often relational databases
● Dissatisfaction with traditional ETL architectures, or curiosity to learn about and consider an alternative architecture
What you’ll learn
● How Event Sourcing can be applied to ETL
● How Determinism can be a property of a system
● The value of treating the Past as First Class
What is ETL?
ETL in a nutshell
Traditional ETL Process: External System → Extract → Transform → Load → Internal Database
Q: What is the System of Record?
What is the Source of Truth?
System of Record (the External System): the authoritative data source for a given data element or piece of information (1)
Source of Truth (the Internal Database): a trusted data source that gives a complete picture of the data object as a whole (2)
ETL Challenges
● Operational
● Domain Modelling
● Selective Attention
ETL Challenges: Operational
● Must rerun a long ETL job to test an edge case
● Running an ETL job can overwrite history
Missing Interests:
● Decoupling
● Determinism
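To make "running an ETL job can overwrite history" concrete, here is a minimal sketch. It assumes an in-memory array stands in for the target table. `etl_run`, `extract_log`, and the sample rows are hypothetical and not from the talk. The traditional run replaces its previous result, while an append-only log of raw extracts keeps every run available for later interpretation.

```ruby
# Hypothetical sketch: names and data are illustrative, not from the talk.

# Traditional ETL: each run transforms the extract and replaces the target.
def etl_run(extract)
  extract.map { |row| row.merge(grade: row[:grade].upcase) } # stand-in transform
end

monday  = [{ student_id: 123, course_id: "abc", grade: "b+" }]
tuesday = [{ student_id: 123, course_id: "abc", grade: "c" }]

target = etl_run(monday)
target = etl_run(tuesday) # Monday's result is gone from the target

# Append-only alternative: keep every raw extract, in order.
extract_log = []
extract_log << monday.freeze
extract_log << tuesday.freeze

# A later interpretation can still re-read Monday's raw extract.
puts target.length                                          # prints 1
puts extract_log.map { |e| e.first[:grade].upcase }.inspect # prints ["B+", "C"]
```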
Interests and Positions, compared across ETL, ELT, and Event Sourcing:
● Decoupling
● Determinism
● Modeling State Explicitly
● Past as First Class
● Low Cost
ETL Challenges: Domain Modelling
● Must create one true schema to load into
● Tends toward the lowest common denominator OR a superset of all external model features
Missing Interests:
● Decoupling (of each interpretation)
● Modeling State Explicitly
ETL Challenges: Selective Attention
From psychology: the act of focusing on a particular object while ignoring irrelevant information
→ Can’t re-interpret past extracts
Missing Interests:
● Past as First Class
ETL Problems: Awareness Tests
YouTube:
● Basketball
● Monkey Business
How many passes did the team in white make?
ETL Advantage
Not just problems. Positive trade-offs of ETL?
● Low Costs: Training, framing, explaining
○ Training: Low cost to train new engineers in ETL concepts
○ Framing: No requirement for explicit domain modeling
○ Explaining: Intuitive to explain to non-engineers
What is ELT?
ETL and ELT
Traditional ETL Process: External System → Extract → Transform → Load → Internal Database
EL Process: External System → Extract → Load → Data Lake or Blob or File Store
T Process(es): do anything here! Many vendors offer various solutions.
What is Event Sourcing?
ETL and Event Sourcing
Traditional ETL Process: External System → Extract → Transform → Load → Internal Database
EL Process: External System → Extract → Load → Immutable & Sequential Store
TeTL Process(es): Immutable & Sequential Store → Transform → Domain Events → Transform → Load → Read Model(s)
1) Decouple extractions
2) Source of Truth: the extracts
3) Deterministic transforms: to events, and then to the model
Regular expression mnemonic: from /(ETL)/ to /E{1}T*L*/ ← Extract once, Transform & Load infinitely
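The "Extract once, Transform & Load infinitely" pipeline can be sketched in a few lines. This is a minimal sketch under stated assumptions. The immutable, sequential store is an in-memory array of frozen extracts. The domain-event transform is deliberately naive. `ImmutableStore`, `to_domain_events`, and `load_read_model` are hypothetical names. The point is that the TL steps only read the stored extracts, so they can be rerun as often as needed and give the same result.

```ruby
# Hypothetical sketch of "Extract once, Transform & Load infinitely".
# ImmutableStore, to_domain_events, and load_read_model are illustrative names.

# Extract once: raw extracts are appended, in order, and frozen.
class ImmutableStore
  def initialize
    @entries = []
  end

  def append(extract)
    @entries << extract.map { |row| row.dup.freeze }.freeze
    self
  end

  def each(&block)
    @entries.each(&block)
  end
end

# Transform: raw extracts -> domain events (deliberately naive here).
def to_domain_events(store)
  events = []
  store.each do |extract|
    extract.each do |row|
      events << { type: :grade_updated, student_id: row[:student_id],
                  course_id: row[:course_id], grade: row[:grade] }
    end
  end
  events
end

# Transform & Load: domain events -> a read model rebuilt from scratch.
def load_read_model(events)
  events.each_with_object({}) do |event, model|
    model[[event[:student_id], event[:course_id]]] = event[:grade]
  end
end

store = ImmutableStore.new
store.append([{ student_id: 123, course_id: "abc", grade: "B+" }])
store.append([{ student_id: 123, course_id: "abc", grade: "A-" }])

# The TL steps can be rerun as often as needed against the same extracts.
first_run  = load_read_model(to_domain_events(store))
second_run = load_read_model(to_domain_events(store))
puts first_run == second_run # prints true
```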
Event Sourcing Challenge
Not just advantages. Negative trade-offs of ES?
● High Costs: Training, framing, explaining
○ Training: Higher cost to train new engineers in ES concepts
○ Framing: Requirement for (lots of) explicit domain modeling
○ Explaining: Not necessarily intuitive to explain to non-engineers
How does Event Sourcing work?
Event Sourcing Basics: Events
GradeCreated { student_id: 123, course_id: abc, grade: B+ }
GradeUpdated { student_id: 123, course_id: abc, grade: C }
GradeUpdated { student_id: 123, course_id: abc, grade: A- }
State transitions are an important part of our problem space and should be modeled within our domain.
Event Sourcing says all state is transient and you only store facts.
Event: something that happened in the past; a fact; a state transition.
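Because an event is a fact about the past, it helps to make events immutable once recorded. This is a minimal sketch that assumes plain Ruby `Struct`s rather than any event-store library. `GradeCreated` and `GradeUpdated` mirror the slide. The `record` helper is hypothetical.

```ruby
# Sketch: events as immutable value objects (Struct is an assumption here).
GradeCreated = Struct.new(:student_id, :course_id, :grade, keyword_init: true)
GradeUpdated = Struct.new(:student_id, :course_id, :grade, keyword_init: true)

# Hypothetical helper: build an event and freeze it, since facts don't change.
def record(event_class, **attrs)
  event_class.new(**attrs).freeze
end

events = [
  record(GradeCreated, student_id: 123, course_id: "abc", grade: "B+"),
  record(GradeUpdated, student_id: 123, course_id: "abc", grade: "C"),
  record(GradeUpdated, student_id: 123, course_id: "abc", grade: "A-")
].freeze

begin
  events.last.grade = "A" # attempting to rewrite history
rescue FrozenError => e
  puts "refused: #{e.class}" # prints refused: FrozenError
end
```

A change in the grade is recorded as a new `GradeUpdated` event, never as an edit to an old one.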
Event Sourcing Basics: Events → Read Models
Read Model row after each event is applied, in order:
event         student_id  course_id  grade
GradeCreated  123         abc        B+
GradeUpdated  123         abc        C
GradeUpdated  123         abc        A-
Event Sourcing Basics: Read Models
Event Sourcing takes the term Read Model from CQRS.
A Read Model is an interpretation of a sequence of events that is optimized for answering a given set of queries (reads).
Read Models: independent representations of state that we deterministically regenerate from events using projections.
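Read Models are independent, so one event sequence can feed several of them, each shaped for its own queries. This is a minimal sketch that assumes in-memory hashes as read models. The `current_grades` and `change_counts` read models are hypothetical examples, not ones from the talk.

```ruby
# Sketch: two independent Read Models derived from one event sequence.
Event = Struct.new(:type, :student_id, :course_id, :grade, keyword_init: true)

EVENTS = [
  Event.new(type: :grade_created, student_id: 123, course_id: "abc", grade: "B+"),
  Event.new(type: :grade_updated, student_id: 123, course_id: "abc", grade: "C"),
  Event.new(type: :grade_updated, student_id: 123, course_id: "abc", grade: "A-")
].freeze

# Hypothetical Read Model 1: "what is the grade now?"
def current_grades(events)
  events.each_with_object({}) do |e, model|
    model[[e.student_id, e.course_id]] = e.grade
  end
end

# Hypothetical Read Model 2: "how many times did each student's grades change?"
def change_counts(events)
  events.each_with_object(Hash.new(0)) do |e, model|
    model[e.student_id] += 1 if e.type == :grade_updated
  end
end

p current_grades(EVENTS) # one entry: [123, "abc"] => "A-"
p change_counts(EVENTS)  # one entry: 123 => 2
```

Adding a third Read Model later only needs a new projection over the same events. The existing ones are untouched.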
Event Sourcing Basics: Projections
Events (GradeCreated, GradeUpdated, GradeUpdated) → Projection → Read Model
def f(state, event)
  # Assumes state is an ActiveRecord model or relation; find_or_initialize_by
  # also handles GradeCreated, when no matching row exists yet
  state.find_or_initialize_by(
    student_id: event.student_id,
    course_id: event.course_id
  ).update(grade: event.grade)
end
Resulting Read Model:
student_id  course_id  grade
123         abc        A-
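The projection can also run without a database, which makes it easy to test. This is a minimal sketch that assumes the "state" is an in-memory array of row hashes. Here `f` returns a new array instead of updating rows in place, so each earlier state stays intact. `GradeEvent` is a hypothetical stand-in for both event types.

```ruby
# Sketch: the projection f(state, event) over an in-memory array of rows.
GradeEvent = Struct.new(:student_id, :course_id, :grade, keyword_init: true)

def f(state, event)
  key = { student_id: event.student_id, course_id: event.course_id }
  others = state.reject { |row| row.slice(:student_id, :course_id) == key }
  others + [key.merge(grade: event.grade)] # insert or replace the matching row
end

events = [
  GradeEvent.new(student_id: 123, course_id: "abc", grade: "B+"), # GradeCreated
  GradeEvent.new(student_id: 123, course_id: "abc", grade: "C"),  # GradeUpdated
  GradeEvent.new(student_id: 123, course_id: "abc", grade: "A-")  # GradeUpdated
]

state = []
events.each { |event| state = f(state, event) }
state.each { |row| puts row.values.join(" ") } # prints: 123 abc A-
```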
Event Sourcing Basics: Projections
When we talk about Event Sourcing, current state is a left fold of previous behaviors.
We play back a stream of events, applying a function
$f(\text{state}_n, \text{event}_n) \rightarrow \text{state}_{n+1}$
Projection: a function through which we apply events in sequence to deterministically derive the state of our application
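In Ruby, the left fold described above maps directly onto `Enumerable#reduce`. This is a minimal sketch. `project` is a hypothetical stand-in projection that tracks the latest grade per (student_id, course_id). `replay` is also a hypothetical name. Replaying the same history always yields the same state. Replaying a prefix of it yields the state as of that point in the past.

```ruby
# Sketch: current state as a left fold of the event stream.
def project(state, event)
  state.merge({ [event[:student_id], event[:course_id]] => event[:grade] })
end

def replay(events, initial = {})
  events.reduce(initial) { |state, event| project(state, event) }
end

history = [
  { student_id: 123, course_id: "abc", grade: "B+" },
  { student_id: 123, course_id: "abc", grade: "C" },
  { student_id: 123, course_id: "abc", grade: "A-" }
].freeze

p replay(history) == replay(history) # prints true (deterministic)
p replay(history.first(2))           # state after the 2nd event: grade "C"
```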
Event Sourcing Basics: Review
● Event: something that happened in the past; a fact; a state transition.
● Projection: a function through which we apply events in sequence to deterministically derive the state of our application.
● Read Models: independent representations of state that we deterministically regenerate from events using projections.
Applying Event Sourcing to ETL
Q: How do we get from ETL to explicitly modeled Domain Events?
A: Build an Observational Event Sourced system.
Applying Event Sourcing to ETL: Observational
When capturing observations of external systems using Event Sourcing, the events in our domain are the observations we capture.
Transforming a sequence of observations into explicitly modeled domain events is the first projection.
Observational: an Event Sourced system where the event history consists of captured observations, and all state is derived from them.
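The "first projection" turns a sequence of observations into domain events. This is a minimal sketch that assumes each observation is a full snapshot of an external grades table from one extract. It emits `grade_created` the first time a (student, course) pair is seen and `grade_updated` only when the observed grade changes. `observations_to_events` and the snapshot shape are hypothetical.

```ruby
# Hypothetical sketch of the first projection: observations -> domain events.
# Each snapshot is the full set of rows seen in one extract (an assumption).
def observations_to_events(snapshots)
  last_seen = {} # most recently observed grade per [student_id, course_id]
  snapshots.each_with_object([]) do |snapshot, events|
    snapshot.each do |row|
      key = [row[:student_id], row[:course_id]]
      known = last_seen.key?(key)
      next if known && last_seen[key] == row[:grade] # unchanged: no new fact

      last_seen[key] = row[:grade]
      events << { type: known ? :grade_updated : :grade_created,
                  student_id: row[:student_id], course_id: row[:course_id],
                  grade: row[:grade] }
    end
  end
end

snapshots = [
  [{ student_id: 123, course_id: "abc", grade: "B+" }], # first extract
  [{ student_id: 123, course_id: "abc", grade: "B+" }], # nothing changed
  [{ student_id: 123, course_id: "abc", grade: "A-" }]  # grade changed
]

observations_to_events(snapshots).each { |e| puts "#{e[:type]} #{e[:grade]}" }
# prints:
# grade_created B+
# grade_updated A-
```

Rows that disappear from a later snapshot are not turned into events here. A real projection would have to decide what such a disappearance means in the domain.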
Applying Event Sourcing to ETL
Observations → Immutable & Sequential Store → TeTL Process(es): Transform → Domain Events → Transform → Load → Read Model(s)
● Observation: a captured row (student_id 123, course_id abc, grade A-)
● Domain Event: GradeUpdated { student_id: 123, course_id: abc, grade: A- }
Case study: Event Sourcing ETL
Observation events → projection → domain events:
● GradeUpdated { student_id: 1, date: Oct 11, course: Biology, grade: B- }
● GradeUpdated { student_id: 1, date: Oct 12, course: Biology, grade: B+ }
Domain events → projection → read models (InProgressGrades), which are then queried
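The case-study flow from domain events to the queried InProgressGrades read model might look like this. This is a minimal sketch under stated assumptions. The projection keeps the latest grade per student and course and relies on the event stream already being in order. Dates stay as the slide's plain strings. `InProgressGrade` and `project_in_progress_grades` are hypothetical names, not the case study's actual code.

```ruby
# Hypothetical sketch of an InProgressGrades-style read model.
GradeUpdated = Struct.new(:student_id, :date, :course, :grade, keyword_init: true)
InProgressGrade = Struct.new(:student_id, :course, :grade, :as_of, keyword_init: true)

def project_in_progress_grades(events)
  rows = events.each_with_object({}) do |event, by_key|
    by_key[[event.student_id, event.course]] = InProgressGrade.new(
      student_id: event.student_id, course: event.course,
      grade: event.grade, as_of: event.date
    )
  end
  rows.values
end

events = [
  GradeUpdated.new(student_id: 1, date: "Oct 11", course: "Biology", grade: "B-"),
  GradeUpdated.new(student_id: 1, date: "Oct 12", course: "Biology", grade: "B+")
]

# Queried read model: one row per student and course.
project_in_progress_grades(events).each do |row|
  puts "#{row.student_id} #{row.course} #{row.grade} (as of #{row.as_of})"
end
# prints: 1 Biology B+ (as of Oct 12)
```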
Case study: Event Sourcing ETL
Past as First Class (figure panels: "First" and "Later interpretation")
Case study: Event Sourcing ETL
Determinism
● Read Models regenerated nightly from the source of truth
○ Given the same history, we regenerate the same Read Models
● On-demand Read Model Comparison tool
○ Ensures no Read Model changes across larger code refactors
Read Model Comparison: Before and After Regeneration
● Clone the Read Model DB → batch_BEFORE
● Regenerations run (same DB, but later)
● Clone the Read Model again → batch_AFTER
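The before/after comparison above can be approximated in a few lines. This is a minimal sketch that assumes in-memory hashes stand in for the cloned databases. `regenerate` and `compare_read_models` are hypothetical names. The deck does not describe the real tool's internals beyond cloning the Read Model before and after regeneration.

```ruby
# Hypothetical sketch of a before/after Read Model comparison.
def regenerate(history)
  history.each_with_object({}) do |event, model|
    model[[event[:student_id], event[:course_id]]] = event[:grade]
  end
end

# Returns one entry per key whose value differs between the two batches.
def compare_read_models(before, after)
  (before.keys | after.keys).each_with_object([]) do |key, diffs|
    next if before[key] == after[key]

    diffs << { key: key, before: before[key], after: after[key] }
  end
end

history = [
  { student_id: 123, course_id: "abc", grade: "B+" },
  { student_id: 123, course_id: "abc", grade: "A-" },
  { student_id: 456, course_id: "abc", grade: "B" }
].freeze

read_model   = regenerate(history)
batch_before = read_model.dup.freeze # clone the Read Model
read_model   = regenerate(history)   # regenerations run (e.g. after a refactor)
batch_after  = read_model.dup.freeze # clone the Read Model again

diffs = compare_read_models(batch_before, batch_after)
puts diffs.empty? ? "no Read Model changes" : "changed: #{diffs.length} rows"
```

An empty diff after a refactor is the signal that the refactored projections still derive the same state from the same history.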
Case study: Event Sourcing ETL
Trade-off: Investment in Training
● 5 x (1 hr training video + 1 hr discussion) = 10 hrs
● Gentle ramp-up with pairing and joint designs (weeks)
● Set the expectation that the architecture will feel different
Lessons Learned
At the two-year mark
● Thinnest extractions possible
● Extracted files as Source of Truth
● Many iterations on transformations
● Why TL must be fast and run often
Lessons learned: Thinnest extractions possible
My first version of converting [one type of] XML to CSV was silently dropping rows, and we would have lost all that data if not for the ability to replace it from the original extract.
Lessons learned: Extracted files as Source of Truth
A real-world example: changing an incorrect foreign key reference (which had previously overlapped almost entirely with the correct one).
Lessons learned: Many iterations on interpretations
It is very natural to handle the changes, big and small, that appear in the format and content of the data we have extracted. New features also sometimes mean new or changed interpretations.
Lessons learned: Why TL must be fast and run often
Consider “nightly restores from backups”, which prove that you can actually restore from backups. Our nightly regeneration is the equivalent practice, and it lives in our application rather than in our tools. If regeneration ever gets too slow to complete overnight, we could lose this.
Summary and Review
What we covered
● How Event Sourcing can be applied to ETL
● How Determinism can be a property of a system
● The value of treating the Past as First Class
Learn More
Resources
● DDD, CQRS, and Event Sourcing videos by Greg Young
● CQRS documentation site by Edument AB
● Domain Driven Design book by Eric Evans
Keep in touch!
● twitter: @ms_ati
● email: msiegel@panoramaed.com

More Related Content

PDF
A Thorough Comparison of Delta Lake, Iceberg and Hudi
PPTX
Data Lakehouse, Data Mesh, and Data Fabric (r2)
PDF
Introduction SQL Analytics on Lakehouse Architecture
PPTX
Architecting a datalake
PDF
Observability & Datadog
PDF
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
PDF
DataOps: An Agile Method for Data-Driven Organizations
DOC
Data warehouse concepts
A Thorough Comparison of Delta Lake, Iceberg and Hudi
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Introduction SQL Analytics on Lakehouse Architecture
Architecting a datalake
Observability & Datadog
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
DataOps: An Agile Method for Data-Driven Organizations
Data warehouse concepts

What's hot (20)

PDF
Considerations for Data Access in the Lakehouse
PDF
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
PPTX
Data Lakehouse, Data Mesh, and Data Fabric (r1)
PPTX
Practical Enterprise Architecture - Introducing CSVLOD EA Model
PDF
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
PDF
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...
PDF
Intro to Delta Lake
PDF
Data platform architecture
PPTX
Data Warehousing Trends, Best Practices, and Future Outlook
PPT
Date warehousing concepts
PDF
Spark with Delta Lake
PDF
Lecture1 introduction to big data
PDF
ETL VS ELT.pdf
PPTX
Azure Data Lake Intro (SQLBits 2016)
PDF
Data Engineering Basics
PPTX
Free Training: How to Build a Lakehouse
PPTX
Informatica Powercenter Architecture
PDF
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
PDF
Introduction to Azure Data Factory
Considerations for Data Access in the Lakehouse
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Practical Enterprise Architecture - Introducing CSVLOD EA Model
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...
Intro to Delta Lake
Data platform architecture
Data Warehousing Trends, Best Practices, and Future Outlook
Date warehousing concepts
Spark with Delta Lake
Lecture1 introduction to big data
ETL VS ELT.pdf
Azure Data Lake Intro (SQLBits 2016)
Data Engineering Basics
Free Training: How to Build a Lakehouse
Informatica Powercenter Architecture
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
Introduction to Azure Data Factory
Ad

Similar to ETL and Event Sourcing (20)

PPTX
Proven ETL Developer Interview Questions to Assess and Hire ETL Developers
PDF
Etl testing contents
PPTX
DMDW 7. Student Presentation - Pentaho Data Integration (Kettle)
PDF
Introduction to ETL and Data Integration
PPTX
Ask On Data Uses NLP to Simplify ETL.pptx
PDF
How Ask On Data Uses NLP to Simplify ETL.pdf
PPTX
Our ETL testing training program in Hyderabad covers comprehensive topics suc...
PPTX
Entity Framework 4
PDF
Spark Summit EU talk by Bas Geerdink
DOCX
What is an ETL plan that Ralph Kimball identifies from the 34 Subsyste.docx
PDF
ETL testing training program in Hyderabad covers comprehensive topics
DOCX
Etl techniques
PPT
NEOOUG 2010 Oracle Data Integrator Presentation
PPTX
Data Warehouse - What you know about etl process is wrong
PPTX
etl testing training in hyderabad.......
PDF
Etl testing training institute in hyderabad
PPTX
What is ETL?
PDF
Big data analytics beyond beer and diapers
PPTX
“Extract, Load, Transform,” is another type of data integration process
PPTX
Extract Transformation Load (3) (1).pptx
Proven ETL Developer Interview Questions to Assess and Hire ETL Developers
Etl testing contents
DMDW 7. Student Presentation - Pentaho Data Integration (Kettle)
Introduction to ETL and Data Integration
Ask On Data Uses NLP to Simplify ETL.pptx
How Ask On Data Uses NLP to Simplify ETL.pdf
Our ETL testing training program in Hyderabad covers comprehensive topics suc...
Entity Framework 4
Spark Summit EU talk by Bas Geerdink
What is an ETL plan that Ralph Kimball identifies from the 34 Subsyste.docx
ETL testing training program in Hyderabad covers comprehensive topics
Etl techniques
NEOOUG 2010 Oracle Data Integrator Presentation
Data Warehouse - What you know about etl process is wrong
etl testing training in hyderabad.......
Etl testing training institute in hyderabad
What is ETL?
Big data analytics beyond beer and diapers
“Extract, Load, Transform,” is another type of data integration process
Extract Transformation Load (3) (1).pptx
Ad

Recently uploaded (20)

PDF
Java Basics-Introduction and program control
PPTX
CONTRACTS IN CONSTRUCTION PROJECTS: TYPES
PPT
Chapter 1 - Introduction to Manufacturing Technology_2.ppt
PDF
Unit1 - AIML Chapter 1 concept and ethics
PDF
August -2025_Top10 Read_Articles_ijait.pdf
PPTX
ASME PCC-02 TRAINING -DESKTOP-NLE5HNP.pptx
PPTX
CN_Unite_1 AI&DS ENGGERING SPPU PUNE UNIVERSITY
PDF
Computer organization and architecuture Digital Notes....pdf
PDF
Present and Future of Systems Engineering: Air Combat Systems
PPTX
Chapter 2 -Technology and Enginerring Materials + Composites.pptx
PPTX
Graph Data Structures with Types, Traversals, Connectivity, and Real-Life App...
PDF
First part_B-Image Processing - 1 of 2).pdf
PPTX
Micro1New.ppt.pptx the mai themes of micfrobiology
PDF
LOW POWER CLASS AB SI POWER AMPLIFIER FOR WIRELESS MEDICAL SENSOR NETWORK
PPTX
A Brief Introduction to IoT- Smart Objects: The "Things" in IoT
DOC
T Pandian CV Madurai pandi kokkaf illaya
PPTX
Management Information system : MIS-e-Business Systems.pptx
PPTX
Amdahl’s law is explained in the above power point presentations
PPTX
mechattonicsand iotwith sensor and actuator
PPTX
"Array and Linked List in Data Structures with Types, Operations, Implementat...
Java Basics-Introduction and program control
CONTRACTS IN CONSTRUCTION PROJECTS: TYPES
Chapter 1 - Introduction to Manufacturing Technology_2.ppt
Unit1 - AIML Chapter 1 concept and ethics
August -2025_Top10 Read_Articles_ijait.pdf
ASME PCC-02 TRAINING -DESKTOP-NLE5HNP.pptx
CN_Unite_1 AI&DS ENGGERING SPPU PUNE UNIVERSITY
Computer organization and architecuture Digital Notes....pdf
Present and Future of Systems Engineering: Air Combat Systems
Chapter 2 -Technology and Enginerring Materials + Composites.pptx
Graph Data Structures with Types, Traversals, Connectivity, and Real-Life App...
First part_B-Image Processing - 1 of 2).pdf
Micro1New.ppt.pptx the mai themes of micfrobiology
LOW POWER CLASS AB SI POWER AMPLIFIER FOR WIRELESS MEDICAL SENSOR NETWORK
A Brief Introduction to IoT- Smart Objects: The "Things" in IoT
T Pandian CV Madurai pandi kokkaf illaya
Management Information system : MIS-e-Business Systems.pptx
Amdahl’s law is explained in the above power point presentations
mechattonicsand iotwith sensor and actuator
"Array and Linked List in Data Structures with Types, Operations, Implementat...

ETL and Event Sourcing

  • 1. ETL and Event Sourcing Integration Architecture: Best Practice and Case Study Marc Siegel - Panorama Education - Wed Feb 6 2019
  • 2. ETL pipelines from external systems
  • 3. ETL and Event Sourcing Prerequisite knowledge Familiarity with traditional ETL architectures: Software systems that Extract data from external systems, Transform them, and Load the resulting data sets into internal systems, most often relational databases Dissatisfaction with traditional ETL architectures / curiosity to learn about and consider an alternative architecture
  • 4. ETL and Event Sourcing What you’ll learn How Event Sourcing can be applied to ETL How Determinism can be a property of a system Value of treating the Past as First Class
  • 8. ETL Traditional ETL Process Extract In a nutshell External System
  • 9. ETL Traditional ETL Process Extract Transform In a nutshell External System
  • 10. ETL Traditional ETL Process Extract Transform Load In a nutshell External System
  • 11. ETL Traditional ETL Process Extract Transform Load Internal Database In a nutshell External System
  • 12. ETL Traditional ETL Process Extract Transform Load Internal Database In a nutshell External System Q: What is the System of Record? What is the Source of Truth?
  • 13. ETL In a nutshell External System System of Record The authoritative data source for a given data element or piece of information (1)
  • 14. ETL Internal Database In a nutshell Source of Truth A trusted data source that gives a complete picture of the data object as a whole (2)
  • 15. ETL Traditional ETL Process Extract Transform Load Internal Database In a nutshell External System
  • 18. ETL Challenges Operational Domain Modelling Selective Attention Must rerun long ETL job to test edge case Missing Interests: ● Decoupling
  • 19. ETL Challenges Operational Domain Modelling Selective Attention Must rerun long ETL job to test edge case Running ETL job can overwrite history Missing Interests: ● Decoupling ● Determinism
  • 20. Interests and Positions ETL ELT Event Sourcing Decoupling Determinism Modeling State Explicitly Past as First Class Low Cost ETL Challenges
  • 21. ETL Challenges Operational Domain Modelling Selective Attention Must create one true schema to load into Missing Interests: ● Decoupling (of each interpretation)
  • 22. ETL Challenges Operational Domain Modelling Selective Attention Must create one true schema to load into Tend toward lowest common denominator OR superset of all external model features Missing Interests: ● Decoupling (of each interpretation) ● Modeling State Explicitly
  • 23. Interests and Positions ETL ELT Event Sourcing Decoupling Determinism Modeling State Explicitly Past as First Class Low Cost ETL Challenges
  • 24. ETL Challenges Operational Domain Modelling Selective Attention From Psychology: the act of focusing on a particular object while ignoring irrelevant information → Can’t re-interpret past extracts Missing Interests: ● Past as First Class
  • 25. ETL Problems Awareness Tests YouTube: ● Basketball ● Monkey Business How many passes did the team in white make?
  • 26. Interests and Positions ETL ELT Event Sourcing Decoupling Determinism Modeling State Explicitly Past as First Class Low Cost ETL Challenges
  • 27. ETL Advantage Not just problems. Positive trade-offs of ETL? ● Low Costs: Training, framing, explaining ○ Training: Low cost to train new engineers in ETL concepts ○ Framing: No requirement for explicit domain modeling ○ Explaining: Intuitive to explain to non-engineers
  • 28. Interests and Positions ETL ELT Event Sourcing Decoupling Determinism Modeling State Explicitly Past as First Class Low Cost ETL Challenges
  • 31. ETL Traditional ETL Process Extract Transform Load Internal Database In a nutshell External System
  • 32. ETL and ELT Traditional ETL Process Extract Transform Load Internal Database External System
  • 33. ETL and ELT EL Process Extract Traditional ETL Process Extract Transform Load Internal Database Load External System
  • 34. ETL and ELT EL Process Extract Data Lake or Blob or File Store Traditional ETL Process Extract Transform Load Internal Database Load External System
  • 35. ETL and ELT EL Process Extract Data Lake or Blob or File Store T Process Do anything here! Many vendors offering various solutions. Traditional ETL Process Extract Transform Load Internal Database Load External System
  • 36. ETL and ELT EL Process Extract Data Lake or Blob or File Store T Process(es) Do anything here! Many vendors offering various solutions. Traditional ETL Process Extract Transform Load Internal Database Load External System
  • 37. Interests and Positions ETL ELT Event Sourcing Decoupling Determinism Modeling State Explicitly Past as First Class Low Cost ETL and ELT
  • 43. What is Event Sourcing?
  • 46-53. ETL and Event Sourcing [Diagram, built up over eight slides: the Traditional ETL Process (Extract → Transform → Load → Internal Database) is shown alongside an EL Process that extracts (Ex) from the External System and loads (Lo) into an Immutable & Sequential Store. TeTL Process(es) then transform (Tr) the stored extracts into Domain Events, transform (Tr) those into Read Model(s), and load (Lo) them.] Key points: 1) Decouple extractions 2) Source of Truth: the extracts 3) Deterministic transform: to events + to model. Regular expression mnemonic: from /(ETL)/ to /E{1}T*L*/ ← Extract once, Transform & Load infinitely.
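A minimal Ruby sketch of "Extract once, Transform & Load infinitely", assuming an in-memory array stands in for the Immutable & Sequential Store and that each extract is a small CSV file held as a string. `ExtractStore`, `regenerate`, and `latest_grade` are hypothetical names for illustration, not code from the talk.

```ruby
# Hypothetical stand-in for the Immutable & Sequential Store: each extract
# is appended once, in order, and frozen so it cannot be edited later.
class ExtractStore
  include Enumerable

  Extract = Struct.new(:sequence, :raw, keyword_init: true)

  def initialize
    @extracts = []
  end

  # E happens exactly once per extraction: keep the raw file as received.
  def append(raw)
    extract = Extract.new(sequence: @extracts.size, raw: raw.dup.freeze).freeze
    @extracts << extract
    extract
  end

  # Replay the history in the order it was captured.
  def each(&block)
    @extracts.each(&block)
  end
end

# T and L can run any number of times: rebuild the read model from scratch
# by applying a transform to every row of every stored extract, in sequence.
def regenerate(store, transform)
  store.each_with_object({}) do |extract, read_model|
    header, *rows = extract.raw.lines.map { |line| line.chomp.split(",") }
    rows.each { |values| transform.call(read_model, header.zip(values).to_h) }
  end
end

store = ExtractStore.new
store.append("student_id,course_id,grade\n123,abc,B+\n")
store.append("student_id,course_id,grade\n123,abc,A-\n")

# Hypothetical transform: keep the latest grade per (student_id, course_id).
latest_grade = ->(model, row) { model[[row["student_id"], row["course_id"]]] = row["grade"] }

regenerate(store, latest_grade) # returns {["123", "abc"] => "A-"}
```

Because the stored extracts never change, a corrected or new transform can be rerun over the whole history at any time.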
  • 54. Interests and Positions (ETL, ELT, and Event Sourcing) [Table: the positions ETL, ELT, and Event Sourcing, compared against the interests Decoupling, Determinism, Modeling State Explicitly, Past as First Class, and Low Cost]
  • 59. Event Sourcing Challenge Not just advantages. Negative trade-offs of ES? ● High Costs: Training, framing, explaining ○ Training: Higher cost to train new engineers in ES concepts ○ Framing: Requirement for (lots of) explicit domain modeling ○ Explaining: Not necessarily intuitive to explain to non-engineers
  • 62. How does Event Sourcing work?
  • 63. Event Sourcing Basics: Events. GradeCreated (student_id: 123, course_id: abc, grade: B+); GradeUpdated (student_id: 123, course_id: abc, grade: C); GradeUpdated (student_id: 123, course_id: abc, grade: A-)
  • 64. Event Sourcing Basics Events State transitions are an important part of our problem space and should be modeled within our domain.
  • 65. Event Sourcing Basics Events State transitions are an important part of our problem space and should be modeled within our domain. Event Sourcing says all state is transient and you only store facts.
  • 66. Event Sourcing Basics Events State transitions are an important part of our problem space and should be modeled within our domain. Event Sourcing says all state is transient and you only store facts. Event: something that happened in the past; a fact; a state transition.
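To make "you only store facts" concrete, here is a minimal Ruby sketch of the grade events from the slides as frozen value objects appended to a history. The `Event` struct and the `record` helper are assumptions for illustration; a real event store would also persist and order events durably.

```ruby
# Hypothetical event shape for the grade example. An event is a fact about
# the past, so each instance is frozen as soon as it is created.
Event = Struct.new(:type, :student_id, :course_id, :grade, keyword_init: true)

# Append-only history: events are added at the end and never edited.
def record(history, type, **attributes)
  history << Event.new(type: type, **attributes).freeze
  history
end

history = []
record(history, :GradeCreated, student_id: 123, course_id: "abc", grade: "B+")
record(history, :GradeUpdated, student_id: 123, course_id: "abc", grade: "C")
record(history, :GradeUpdated, student_id: 123, course_id: "abc", grade: "A-")

# Changing a recorded fact is an error; a correction would be a new event.
begin
  history.first.grade = "A"
rescue FrozenError => e
  puts "refused: #{e.class}"
end
```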
  • 67-70. Event Sourcing Basics: Events and Read Models. Events: GradeCreated (student_id: 123, course_id: abc, grade: B+), GradeUpdated (student_id: 123, course_id: abc, grade: C), GradeUpdated (student_id: 123, course_id: abc, grade: A-). Read Model row (student_id 123, course_id abc): grade B+ after the first event, C after the second, and A- after the third.
  • 71. Event Sourcing Basics Read Models Event Sourcing takes the term Read Model from CQRS.
  • 72. Event Sourcing Basics Read Models Event Sourcing takes the term Read Model from CQRS. A Read Model is an interpretation of a sequence of events that is optimized for answering a given set of queries (reads).
  • 73. Event Sourcing Basics Read Models Event Sourcing takes the term Read Model from CQRS. A Read Model is an interpretation of a sequence of events that is optimized for answering a given set of queries (reads). Read Models: independent representations of state that we deterministically regenerate from events using projections.
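Since Read Models are independent interpretations of the same events, several can be regenerated side by side. A minimal Ruby sketch, assuming the three grade events from the slides: `current_grades` mirrors the slides' table, while `course_roster` is a hypothetical second read model optimized for a different query ("which students have a grade in this course?").

```ruby
Event = Struct.new(:type, :student_id, :course_id, :grade, keyword_init: true)

EVENTS = [
  Event.new(type: :GradeCreated, student_id: 123, course_id: "abc", grade: "B+"),
  Event.new(type: :GradeUpdated, student_id: 123, course_id: "abc", grade: "C"),
  Event.new(type: :GradeUpdated, student_id: 123, course_id: "abc", grade: "A-")
].freeze

# Read model 1: current grade per (student_id, course_id), as on the slides.
def current_grades(events)
  events.each_with_object({}) do |event, rows|
    rows[[event.student_id, event.course_id]] = event.grade
  end
end

# Read model 2 (hypothetical): students with any grade, indexed by course.
def course_roster(events)
  events.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |event, roster|
    students = roster[event.course_id]
    students << event.student_id unless students.include?(event.student_id)
  end
end

current_grades(EVENTS) # returns {[123, "abc"] => "A-"}
course_roster(EVENTS)  # returns {"abc" => [123]}
```

Neither read model depends on the other; either can be dropped and rebuilt from the same events.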
  • 74. Event Sourcing Basics: Projections. Events: GradeCreated (student_id: 123, course_id: abc, grade: B+), GradeUpdated (student_id: 123, course_id: abc, grade: C), GradeUpdated (student_id: 123, course_id: abc, grade: A-). Projection:
      def f(state, event)
        # find_or_initialize_by also covers GradeCreated, when no row exists yet
        state.find_or_initialize_by(
          student_id: event.student_id,
          course_id: event.course_id
        ).update(grade: event.grade)
      end
    Resulting row: student_id 123, course_id abc, grade A-
  • 75. Event Sourcing Basics Projections When we talk about Event Sourcing, current state is a left-fold of previous behaviors.
  • 76. Event Sourcing Basics Projections When we talk about Event Sourcing, current state is a left-fold of previous behaviors. We play back a stream of events, applying a function $f(\mathit{state}_n, \mathit{event}_n) \rightarrow \mathit{state}_{n+1}$
  • 77. Event Sourcing Basics Projections When we talk about Event Sourcing, current state is a left-fold of previous behaviors. We play back a stream of events, applying a function $f(\mathit{state}_n, \mathit{event}_n) \rightarrow \mathit{state}_{n+1}$. Projection: a function through which we apply events in sequence to deterministically derive the state of our application.
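The left fold can be written directly with Ruby's `Enumerable#reduce`. This is a minimal sketch assuming in-memory events and a plain Hash as the read model, rather than the database-backed `state` used on the slides; returning a new frozen Hash from `f` keeps the projection free of side effects, so replaying the same events always yields the same state.

```ruby
Event = Struct.new(:type, :student_id, :course_id, :grade, keyword_init: true)

EVENTS = [
  Event.new(type: :GradeCreated, student_id: 123, course_id: "abc", grade: "B+"),
  Event.new(type: :GradeUpdated, student_id: 123, course_id: "abc", grade: "C"),
  Event.new(type: :GradeUpdated, student_id: 123, course_id: "abc", grade: "A-")
].freeze

# Projection f(state_n, event_n) -> state_{n+1}. It never mutates its input;
# it returns a new frozen Hash keyed by [student_id, course_id].
def f(state, event)
  case event.type
  when :GradeCreated, :GradeUpdated
    state.merge({ [event.student_id, event.course_id] => event.grade }).freeze
  else
    state # event types this projection ignores leave the state as-is
  end
end

# Current state is a left fold of the events, starting from an empty state.
def project(events)
  events.reduce({}.freeze) { |state, event| f(state, event) }
end

project(EVENTS) # returns {[123, "abc"] => "A-"}
```

Folding a prefix of the history gives the state as of that point: `project(EVENTS.first(1))` and `project(EVENTS.first(2))` hold the grades B+ and C, matching the read model progression in slides 68 to 70.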
  • 78. Event Sourcing Basics: Events, Projections, Read Models. Events: GradeCreated (student_id: 123, course_id: abc, grade: B+), GradeUpdated (student_id: 123, course_id: abc, grade: C), GradeUpdated (student_id: 123, course_id: abc, grade: A-). Projection:
      def f(state, event)
        # find_or_initialize_by also covers GradeCreated, when no row exists yet
        state.find_or_initialize_by(
          student_id: event.student_id,
          course_id: event.course_id
        ).update(grade: event.grade)
      end
    Read Model row: student_id 123, course_id abc, grade A-
  • 79. Event Sourcing Basics Review Event: something that happened in the past; a fact; a state transition. Projection: a function through which we apply events in sequence to deterministically derive the state of our application. Read Models: independent representations of state that we deterministically regenerate from events using projections.
  • 82. Applying Event Sourcing to ETL Q: How do we get from ETL to explicitly modeled Domain Events?
  • 83. Applying Event Sourcing to ETL Q: How do we get from ETL to explicitly modeled Domain Events? [Diagram: Immutable & Sequential Store → TeTL Process(es) → Domain Events → Read Model(s), with the steps Tr, Tr, Lo]
  • 84. Applying Event Sourcing to ETL Q: How do we get from ETL to explicitly modeled Domain Events? A: Build an Observational Event Sourced system. [Diagram: Immutable & Sequential Store → TeTL Process(es) → Domain Events → Read Model(s), with the steps Tr, Tr, Lo]
  • 85. Applying Event Sourcing to ETL [Diagram: Observations (row: student_id 123, course_id abc, grade A-) → Domain Events (GradeUpdated: student_id 123, course_id abc, grade A-) → Read Models]
  • 86. Applying Event Sourcing to ETL Observational When capturing observations of external systems using Event Sourcing, the events in our domain are the observations we capture.
  • 87. Applying Event Sourcing to ETL Observational When capturing observations of external systems using Event Sourcing, the events in our domain are the observations we capture. Transforming a sequence of observations into explicitly modeled domain events is the first projection.
  • 88. Applying Event Sourcing to ETL Observational When capturing observations of external systems using Event Sourcing, the events in our domain are the observations we capture. Transforming a sequence of observations into explicitly modeled domain events is the first projection. Observational: an Event Sourced system where the event history is of captured observations, and all state is derived from them.
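A minimal Ruby sketch of that first projection, assuming each observation is a full snapshot of the external system's grade rows taken at one extraction. The snapshot assumption, the `Observation` and `DomainEvent` structs, `to_domain_events`, and the example dates are hypothetical. The sketch emits an event only when an observed grade is new or differs from the previously observed one, and it does not handle rows that disappear.

```ruby
# Hypothetical observation: a full snapshot of the external grade rows,
# each row being [student_id, course_id, grade], captured at one time.
Observation = Struct.new(:observed_at, :rows, keyword_init: true)
DomainEvent = Struct.new(:type, :student_id, :course_id, :grade, :observed_at, keyword_init: true)

# First projection: walk the observations in order, remember the last
# observed grade per (student_id, course_id), and emit a domain event
# only when something actually changed.
def to_domain_events(observations)
  last_seen = {}
  observations.flat_map do |observation|
    observation.rows.map do |student_id, course_id, grade|
      key = [student_id, course_id]
      previous = last_seen[key]
      next nil if previous == grade # unchanged: no new domain fact

      last_seen[key] = grade
      DomainEvent.new(
        type: previous.nil? ? :GradeCreated : :GradeUpdated,
        student_id: student_id, course_id: course_id,
        grade: grade, observed_at: observation.observed_at
      )
    end.compact
  end
end

observations = [
  Observation.new(observed_at: "Oct 10", rows: [[123, "abc", "B+"]]),
  Observation.new(observed_at: "Oct 11", rows: [[123, "abc", "B+"]]),
  Observation.new(observed_at: "Oct 12", rows: [[123, "abc", "C"]])
]

to_domain_events(observations).map { |e| [e.type, e.grade, e.observed_at] }
# returns [[:GradeCreated, "B+", "Oct 10"], [:GradeUpdated, "C", "Oct 12"]]
```

The observations stay the source of truth; if this projection turns out to be wrong, it can be fixed and rerun over the same observations.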
  • 89-92. Applying Event Sourcing to ETL [Same diagram (Observations → Domain Events → Read Models), with the architecture labels overlaid over the following slides: Immutable & Sequential Store, TeTL Process(es), Domain Events, Read Model(s), and the steps Tr, Tr, Lo]
  • 93. Case study: Event Sourcing ETL
  • 94. Case study: Event Sourcing ETL [Diagram: observation events → projection → domain events: GradeUpdated (student_id: 1, date: Oct 11, course: Biology, grade: B-), GradeUpdated (student_id: 1, date: Oct 12, course: Biology, grade: B+)]
  • 95. Case study: Event Sourcing ETL [Diagram: domain events GradeUpdated (student_id: 1, date: Oct 11, course: Biology, grade: B-) and GradeUpdated (student_id: 1, date: Oct 12, course: Biology, grade: B+) → projection → InProgressGrades read models]
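A minimal Ruby sketch of the second projection in the case study, folding the GradeUpdated domain events from the slide into an InProgressGrades read model and then querying it. The row shape (`InProgressGrade` with an `as_of` date) and the `in_progress_grades` and `grades_for_student` helpers are assumptions, since the talk does not show the read model's columns. Events are assumed to be applied in their stored order, so the latest one wins.

```ruby
GradeUpdated = Struct.new(:student_id, :date, :course, :grade, keyword_init: true)

# Hypothetical row shape for the InProgressGrades read model.
InProgressGrade = Struct.new(:student_id, :course, :grade, :as_of, keyword_init: true)

# Projection: one row per (student_id, course); later events overwrite earlier ones.
def in_progress_grades(events)
  events.each_with_object({}) do |event, rows|
    rows[[event.student_id, event.course]] = InProgressGrade.new(
      student_id: event.student_id, course: event.course,
      grade: event.grade, as_of: event.date
    )
  end
end

# Example query against the read model: all in-progress grades for one student.
def grades_for_student(read_model, student_id)
  read_model.values.select { |row| row.student_id == student_id }
end

events = [
  GradeUpdated.new(student_id: 1, date: "Oct 11", course: "Biology", grade: "B-"),
  GradeUpdated.new(student_id: 1, date: "Oct 12", course: "Biology", grade: "B+")
]

read_model = in_progress_grades(events)
grades_for_student(read_model, 1).map { |row| [row.course, row.grade, row.as_of] }
# returns [["Biology", "B+", "Oct 12"]]
```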
  • 96. Case study: Event Sourcing ETL [Diagram: the InProgressGrades read models are queried]
  • 97. Case study: Event Sourcing ETL: Past as First Class [Diagram: a first interpretation of the history, and a later interpretation of the same history]
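Because the full history is kept, a new interpretation can be applied to events captured long before that interpretation was written. A minimal Ruby sketch, assuming the two Biology events from the case study; `latest_grade` stands in for a first interpretation and `grade_changes` for a hypothetical later one, neither taken from the talk.

```ruby
GradeUpdated = Struct.new(:student_id, :date, :course, :grade, keyword_init: true)

# The stored past: never rewritten, only re-read.
HISTORY = [
  GradeUpdated.new(student_id: 1, date: "Oct 11", course: "Biology", grade: "B-"),
  GradeUpdated.new(student_id: 1, date: "Oct 12", course: "Biology", grade: "B+")
].freeze

# First interpretation: only the latest grade per student and course.
def latest_grade(events)
  events.each_with_object({}) { |e, rows| rows[[e.student_id, e.course]] = e.grade }
end

# Later interpretation (hypothetical new feature): every change of grade.
# It can be computed for the whole past because the past was kept.
def grade_changes(events)
  previous = {}
  events.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |e, changes|
    key = [e.student_id, e.course]
    changes[key] << { from: previous[key], to: e.grade, on: e.date } if previous.key?(key)
    previous[key] = e.grade
  end
end

latest_grade(HISTORY)  # returns {[1, "Biology"] => "B+"}
grade_changes(HISTORY) # returns {[1, "Biology"] => [{from: "B-", to: "B+", on: "Oct 12"}]}
```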
  • 100. Case study: Event Sourcing ETL Determinism
  • 101. Case study: Event Sourcing ETL Determinism ● Read Models regenerated nightly from source of truth ○ Given the same history, we regenerate the same Read Models
  • 102. Case study: Event Sourcing ETL Determinism ● Read Models regenerated nightly from source of truth ○ Given the same history, we regenerate the same Read Models ● On-demand Read Model Comparison tool ○ Ensure no Read Model changes across larger code refactors
  • 103. Case study: Event Sourcing ETL Determinism Read Model Comparison - Before and After Regeneration [Diagram: clone the Read Model DB (batch_BEFORE); regenerations run; later, clone the same DB again (batch_AFTER)]
  • 104-105. Case study: Event Sourcing ETL Determinism Read Model Comparison - Before and After Regeneration [Diagram: Read Model DB; regenerations run; same DB, but later]
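The comparison tool's idea (snapshot batch_BEFORE, regenerate, snapshot batch_AFTER, diff) can be sketched in Ruby with in-memory snapshots. This assumes read model rows are plain Hashes; `regenerate_v1`, `regenerate_v2` (a hypothetical refactor), and `compare_read_models` are illustrative names, and a real tool would diff database clones instead.

```ruby
HISTORY = [
  [1, "Biology", "B-"],
  [1, "Biology", "B+"]
].freeze

# Original projection: latest grade per (student_id, course), as sorted rows.
def regenerate_v1(history)
  history
    .each_with_object({}) { |(student_id, course, grade), rows| rows[[student_id, course]] = grade }
    .map { |(student_id, course), grade| { student_id: student_id, course: course, grade: grade } }
    .sort_by { |row| [row[:student_id], row[:course]] }
end

# Hypothetical refactor: same rules, different implementation.
def regenerate_v2(history)
  history
    .group_by { |student_id, course, _grade| [student_id, course] }
    .map { |(student_id, course), rows| { student_id: student_id, course: course, grade: rows.last[2] } }
    .sort_by { |row| [row[:student_id], row[:course]] }
end

# Report rows present in only one of the two snapshots.
def compare_read_models(batch_before, batch_after)
  { only_before: batch_before - batch_after, only_after: batch_after - batch_before }
end

batch_before = regenerate_v1(HISTORY)
batch_after = regenerate_v2(HISTORY)
compare_read_models(batch_before, batch_after) # returns {only_before: [], only_after: []}
```

Empty differences on both sides indicate the refactor preserved the read model for this history; any row listed points at a behavior change.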
  • 106. Case study: Event Sourcing ETL Trade-off: Investment in Training
  • 107. Case study: Event Sourcing ETL Trade-off: Investment in Training ● 5 x (1 hr training video + 1 hr discussion) = 10 hrs
  • 108. Case study: Event Sourcing ETL Trade-off: Investment in Training ● 5 x (1 hr training video + 1 hr discussion) = 10 hrs ● Gentle ramp up w/ pairing and joint designs (weeks)
  • 109. Case study: Event Sourcing ETL Trade-off: Investment in Training ● 5 x (1 hr training video + 1 hr discussion) = 10 hrs ● Gentle ramp up w/ pairing and joint designs (weeks) ● Set expectation that architecture will feel different
  • 110. Lessons Learned At the two year mark ● Lessons learned: Thinnest extractions possible ● Lessons learned: Extracted files as Source of Truth ● Lessons learned: Many iterations on transformations ● Lessons learned: Why TL must be fast and run often
  • 111. Lessons Learned At the two year mark Lessons learned: Thinnest extractions possible. My first version of converting [one type of] XML to CSV was silently dropping rows; we would have lost all that data if not for the ability to reprocess the original extract.
  • 112. Lessons Learned At the two year mark Lessons learned: Extracted files as Source of Truth. A real-world example: correcting an incorrect foreign key reference (which had previously overlapped almost entirely with the correct one).
  • 113. Lessons Learned At the two year mark Lessons learned: Many iterations on interpretations Very natural to handle the changes, big and small, that appear in the format and content of the data we have extracted. Also, new features sometimes mean new or changed interpretations.
  • 114. Lessons Learned At the two year mark Lessons learned: Why TL must be fast and run often. Compare this to the practice of "nightly restores from backups", which proves that you can actually restore from backups. Here, the practice lives in our application rather than in our tools. If regeneration ever gets too slow to complete overnight, we could lose this.
  • 115. Summary and Review What we covered How Event Sourcing can be applied to ETL How Determinism can be a property of a system Value of treating the Past as First Class
  • 116. Learn More Resources ● DDD, CQRS, and Event Sourcing videos by Greg Young ● CQRS documentation site by Edument AB ● Domain Driven Design book by Eric Evans Keep in touch! ● twitter: @ms_ati