1 | Kafka Connect /Debezium - Stream MySQL events to Kafka
Kafka Connect - Debezium
Stream MySQL events to Kafka
About me
Kasun Don
Software Engineer - London
AWIN AG | Eichhornstraße 3 | 10785 Berlin
Telephone +49 (0)30 5096910 | info@awin.com | www.awin.com
• Automation & DevOps enthusiast
• Hands on Big Data Engineering
• Open Source Contributor
Why Stream MySQL Events (CDC)?
• Integrations with Legacy Applications
Avoid dual writes when integrating with legacy systems.
• Smart Cache Invalidation
Automatically invalidate entries in a cache as soon as the record(s) for entries change or are removed.
• Monitoring Data Changes
Immediately react to data changes committed by application/user.
• Data Warehousing
Atomic operation synchronizations for ETL-type solutions.
• Event Sourcing (CQRS)
Totally ordered collection of events to asynchronously update the read-only views while writes can be recorded as normal.
Apache Kafka
Kafka is a distributed publish-subscribe messaging system that is designed to be fast, scalable,
and durable.
[Diagram: several producers publish to Kafka; several consumers read from Kafka]
Kafka Connect
Connectors – A logical process responsible for managing the copying of data between Kafka and
another system.
There are two types of connectors:
• Source connectors import data from another system into Kafka
• Sink connectors export data from Kafka to another system
Workers – The processes that schedule and run connectors and tasks.
There are two main types of workers: standalone and distributed.
Tasks – The units of work that handle the workload assigned to them by a connector.
The connector configuration sets the maximum number of tasks that a connector may run.
Kafka Connect - Overview
[Diagram: Data Source → Kafka Connect → Kafka → Kafka Connect → Data Sink]
Kafka Connect – Configuration
Common Connector Configuration
• name - Unique name for the connector. Attempting to register again with the same name will
fail.
• connector.class - The Java class for the connector
• tasks.max - The maximum number of tasks that should be created for this connector. The
connector may create fewer tasks if it cannot achieve this level of parallelism.
Please note that connector configuration varies between connectors; see the documentation of the
specific connector for more information.
Distributed Mode - Worker Configuration
bootstrap.servers - A list of host/port pairs to use for establishing the initial connection to the Kafka cluster.
group.id - A unique string that identifies the Connect cluster group this worker belongs to.
config.storage.topic - The topic to store connector and task configuration data in. This must be the same for all
workers with the same group.id.
offset.storage.topic - The topic to store offset data for connectors in. This must be the same for all workers with the
same group.id
status.storage.topic - The name of the topic where connector and task configuration status updates are stored.
For more distributed mode worker configuration: http://docs.confluent.io/current/connect/userguide.html#configuring-workers
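As a rough illustration of the worker settings listed above, the sketch below renders them into the key=value text of a worker properties file and refuses to do so if one of the keys named on this slide is missing. The function name and the example values (taken from the docker example later in this deck) are assumptions for illustration, not part of Kafka Connect itself, and a real worker also needs further settings such as converters.

```python
# Worker settings named on this slide for distributed mode.
# (A real worker needs more, e.g. key/value converters.)
SLIDE_WORKER_KEYS = (
    "bootstrap.servers",
    "group.id",
    "config.storage.topic",
    "offset.storage.topic",
    "status.storage.topic",
)


def render_worker_properties(settings: dict) -> str:
    """Return the settings as properties text, one key=value per line."""
    missing = [key for key in SLIDE_WORKER_KEYS if key not in settings]
    if missing:
        raise ValueError("missing worker settings: " + ", ".join(missing))
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"


# Hypothetical values, reused from the docker example in this deck.
example_worker = {
    "bootstrap.servers": "kafka-broker:9092",
    "group.id": "group_1",
    "config.storage.topic": "kafka-connect-config",
    "offset.storage.topic": "kafka-connect-offset",
    "status.storage.topic": "kafka-connect-status",
}

if __name__ == "__main__":
    print(render_worker_properties(example_worker), end="")
```

Every worker that should join the same Connect cluster would be started with the same group.id and the same three storage topics.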
Kafka Connect – Running an Instance
It is recommended to run Kafka Connect under a container or cluster manager such as Kubernetes, Mesos, Docker Swarm, or
YARN.
In distributed mode, Kafka Connect serves its management REST interface on port 8083 by default.
"Kafka Connect does not automatically handle restarting or scaling workers which means your existing clustering solutions can continue to be used transparently." (Confluent.io)
$ docker run -d \
    --name=kafka-connect \
    --net=host \
    -e CONNECT_BOOTSTRAP_SERVERS="kafka-broker:9092" \
    -e CONNECT_GROUP_ID="group_1" \
    -e CONNECT_CONFIG_STORAGE_TOPIC="kafka-connect-config" \
    -e CONNECT_OFFSET_STORAGE_TOPIC="kafka-connect-offset" \
    -e CONNECT_STATUS_STORAGE_TOPIC="kafka-connect-status" \
    -e CONNECT_KEY_CONVERTER="org.apache.kafka.connect.json.JsonConverter" \
    -e CONNECT_VALUE_CONVERTER="org.apache.kafka.connect.json.JsonConverter" \
    -e CONNECT_INTERNAL_KEY_CONVERTER="org.apache.kafka.connect.json.JsonConverter" \
    -e CONNECT_INTERNAL_VALUE_CONVERTER="org.apache.kafka.connect.json.JsonConverter" \
    -e CONNECT_LOG4J_LOGGERS="io.debezium.connector.mysql=INFO" \
    -v /opt/kafka-connect/jars:/etc/kafka-connect/jars \
    --restart always \
    confluentinc/cp-kafka-connect:3.3.0
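The environment variables in the command above follow a visible pattern: the worker property name is upper-cased, its dots become underscores, and CONNECT_ is put in front. The sketch below applies that pattern to turn a dictionary of worker properties into docker -e flags. The helper names are hypothetical, and the mapping is only the simple case seen on this slide; property names containing underscores or dashes may need different handling.

```python
def to_env_name(property_name: str) -> str:
    """Map a worker property such as 'group.id' to 'CONNECT_GROUP_ID'."""
    return "CONNECT_" + property_name.upper().replace(".", "_")


def docker_env_flags(settings: dict) -> list:
    """Build the '-e NAME=value' arguments for 'docker run'."""
    flags = []
    for key, value in settings.items():
        flags += ["-e", f"{to_env_name(key)}={value}"]
    return flags


if __name__ == "__main__":
    # Hypothetical subset of the settings used in the command above.
    worker = {"bootstrap.servers": "kafka-broker:9092", "group.id": "group_1"}
    print(" ".join(docker_env_flags(worker)))
```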
Debezium Connector
What is Debezium?
Debezium is an open source distributed platform for change data capture; for MySQL it reads the row-level
binary logs. Debezium is built on top of the Kafka Connect API framework, so it gets fault tolerance and high
availability from the Apache Kafka ecosystem. Debezium records every row-level change committed to each
database table in a transaction log.
Supported Databases
Debezium currently supports the following databases:
• MySQL
• MongoDB
• PostgreSQL
For more information: http://debezium.io/docs/connectors/
Debezium Connector – MySQL Configuration
Enable binary logs
server-id = 1000001
log_bin = mysql-bin
binlog_format = row
binlog_row_image = full
expire_logs_days = 5
Optionally, enable GTIDs (in addition to binary logs)
gtid_mode = on
enforce_gtid_consistency = on
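The binary log settings above can be checked mechanically before a connector is registered. The following is a minimal sketch that assumes the settings have already been read into a dictionary (for example from my.cnf); the function name and the dictionary shape are assumptions for illustration, and the expected values are simply the ones shown on this slide.

```python
# Values shown on this slide for row-level change capture.
REQUIRED_VALUES = {"binlog_format": "row", "binlog_row_image": "full"}


def find_binlog_problems(settings: dict) -> list:
    """Return human-readable problems; an empty list means the settings look fine."""
    problems = []
    # These two only need to be present with some non-empty value.
    for name in ("server-id", "log_bin"):
        if not settings.get(name):
            problems.append(f"{name} is not set")
    # These must match the expected value (compared case-insensitively).
    for name, expected in REQUIRED_VALUES.items():
        actual = str(settings.get(name, "")).lower()
        if actual != expected:
            problems.append(f"{name} should be {expected}, found {actual or 'nothing'}")
    return problems


if __name__ == "__main__":
    slide_settings = {
        "server-id": "1000001",
        "log_bin": "mysql-bin",
        "binlog_format": "row",
        "binlog_row_image": "full",
    }
    print(find_binlog_problems(slide_settings))
```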
MySQL user with sufficient privileges
GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION
CLIENT ON *.* TO 'debezium' IDENTIFIED BY 'password';
Supported MySQL topologies
• MySQL standalone
• MySQL master and slave
• Highly Available MySQL clusters
• Multi-Master MySQL
• Hosted MySQL, e.g. Amazon RDS and Amazon Aurora
Debezium Connector – MySQL Connector Configuration
Example Configuration
{
  "name": "example-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "tasks.max": "1",
    "database.hostname": "127.0.0.1",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "dbz",
    "database.server.id": "184054",
    "database.server.name": "mysql-example",
    "database.whitelist": "db1",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "dbhistory.mysql-example"
  }
}
For more configuration: http://debezium.io/docs/connectors/mysql/
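With a configuration like the one above, the Debezium MySQL connector writes the change events of each table to a Kafka topic named after the logical server name, the database and the table, joined with dots. The sketch below derives that topic name from the configuration; the function name and the table name customers are hypothetical and only serve the illustration.

```python
def change_topic(config: dict, database: str, table: str) -> str:
    """Return the topic name in the form serverName.databaseName.tableName."""
    server_name = config["database.server.name"]
    return f"{server_name}.{database}.{table}"


# The relevant part of the example configuration from this slide.
example_config = {
    "database.server.name": "mysql-example",
    "database.whitelist": "db1",
}

if __name__ == "__main__":
    # 'customers' is a made-up table name.
    print(change_topic(example_config, "db1", "customers"))
```

A consumer interested in a single table would subscribe to exactly that topic.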
Debezium Connector – Add Connector to Kafka Connect
For more configuration: http://debezium.io/docs/connectors/mysql/
More REST Endpoints: https://docs.confluent.io/current/connect/managing.html#using-the-rest-interface
List available connector plugins
$ curl -s http://kafka-connect:8083/connector-plugins
[
{
"class": "io.confluent.connect.jdbc.JdbcSinkConnector"
},
{
"class": "io.confluent.connect.jdbc.JdbcSourceConnector"
},
{
"class": "io.debezium.connector.mysql.MySqlConnector"
},
{
"class": "org.apache.kafka.connect.file.FileStreamSinkConnector"
},
{
"class": "org.apache.kafka.connect.file.FileStreamSourceConnector"
}
]
Add connector
$ curl -s -X POST -H "Content-Type: application/json" --data @connector-config.json http://kafka-connect:8083/conn
Remove connector
$ curl -X DELETE -H "Content-Type: application/json" http://kafka-connect:8083/connectors
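The same REST calls can be prepared from Python with the standard library. This is a sketch under assumptions: the host name kafka-connect:8083 is the one used in the curl examples, the helper names are made up, and the paths follow the Kafka Connect REST interface, where a connector is created with POST /connectors and deleted with DELETE /connectors/{name}. The functions only build the requests; nothing is sent until a request is passed to urllib.request.urlopen.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Host and port taken from the curl examples on this slide.
BASE_URL = "http://kafka-connect:8083"


def add_connector_request(connector: dict) -> Request:
    """Build the POST request that registers a connector definition."""
    body = json.dumps(connector).encode("utf-8")
    return Request(
        f"{BASE_URL}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def remove_connector_request(name: str) -> Request:
    """Build the DELETE request that removes the connector with this name."""
    return Request(f"{BASE_URL}/connectors/{quote(name, safe='')}", method="DELETE")


if __name__ == "__main__":
    request = remove_connector_request("example-connector")
    print(request.get_method(), request.full_url)
```

Sending is left to the caller, for example with urllib.request.urlopen(request), so that the request can be inspected or logged first.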
Debezium Connector – Sample CDC Event
{
"schema": {},
"payload": {
"before": null,
"after": {
"id": 1004,
"first_name": "Anne Marie",
"last_name": "Kretchmar",
"email": "annek@noanswer.org"
},
"source": {
"name": "mysql-server-1",
"server_id": 223344,
"ts_sec": 1465581,
"gtid": null,
"file": "mysql-bin.000003",
"pos": 805,
"row": 0,
"snapshot": null
},
"op": "c",
"ts_ms": 1465581902461
}
}
{
"schema": {},
"payload": {
"before": {
"id": 1004,
"first_name": "Anne Marie",
"last_name": "Kretchmar",
"email": "annek@noanswer.org"
},
"after": null,
"source": {
"name": "mysql-server-1",
"server_id": 223344,
"ts_sec": 1465889,
"gtid": null,
"file": "mysql-bin.000003",
"pos": 806,
"row": 0,
"snapshot": null
},
"op": "d",
"ts_ms": 1465581902500
}
}
The first event is an INSERT (before is null); the second is a DELETE (after is null).
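A consumer can tell the kind of change from the op field of the payload: Debezium uses "c" for a created row, "u" for an update and "d" for a delete. The sketch below reads such an event and applies it to an in-memory cache, in the spirit of the cache invalidation use case from the start of this deck. The function names, the plain dictionary used as a cache and the choice of id as the key are assumptions for illustration.

```python
# Debezium operation codes for row changes.
OP_NAMES = {"c": "INSERT", "u": "UPDATE", "d": "DELETE"}


def describe_event(event: dict) -> tuple:
    """Return (operation name, row state) for one change event."""
    payload = event["payload"]
    operation = OP_NAMES.get(payload["op"], "UNKNOWN")
    # Deletes carry the row in 'before'; inserts and updates in 'after'.
    row = payload["after"] if payload["after"] is not None else payload["before"]
    return operation, row


def apply_to_cache(cache: dict, event: dict, key_field: str = "id") -> None:
    """Keep a dict cache in step with the table: drop deleted rows, refresh others."""
    operation, row = describe_event(event)
    if row is None:
        return  # nothing to act on
    if operation == "DELETE":
        cache.pop(row[key_field], None)
    else:
        cache[row[key_field]] = row


if __name__ == "__main__":
    row = {"id": 1004, "first_name": "Anne Marie"}
    cache = {}
    apply_to_cache(cache, {"payload": {"before": None, "after": row, "op": "c"}})
    apply_to_cache(cache, {"payload": {"before": row, "after": None, "op": "d"}})
    print(cache)
```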
Useful Links
Kafka Connect – User Guide
http://docs.confluent.io/2.0.0/connect/userguide.html
Debezium – Interactive tutorial
http://debezium.io/docs/tutorial/
Debezium – MySQL connector
http://debezium.io/docs/connectors/mysql/
Kafka Connect – REST Endpoints
http://docs.confluent.io/2.0.0/connect/userguide.html#rest-interface
Debezium Support/User Group
User :: https://gitter.im/debezium/user
Dev :: https://gitter.im/debezium/dev
Kafka Connect – Connectors
https://www.confluent.io/product/connectors/
Q & A
Thank you
http://linkedin.com/in/kasundon
