Kafka
Kafka Architecture: Below we discuss the core components of Apache Kafka.
1) Topics: A stream of messages belonging to a particular category is called a topic. It is a
logical feed name to which records are published, similar to a table in a DB (records are
the messages here). A topic is uniquely identified by its name, and we can create as many
topics as we want.
Partitions:
Topics are split into partitions. All the messages within a partition are ordered and
immutable. Each message within a partition has a unique id associated with it, known as its offset.
Replica/Replication:
Replicas are backups of a partition. Clients never read from or write to replicas directly;
they are used to prevent data loss (fault tolerance).
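For example, a replicated topic could be created like this, a sketch assuming a 3-broker
cluster and the hypothetical topic name myReplicatedTopic (topic commands are covered
in detail later):
bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic myReplicatedTopic
--partitions 3 --replication-factor 3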
2) Producers:
Producers are applications which write/publish data to the topics within a cluster
using the Producer APIs.
Producers can write data either at the topic level (to all the partitions of that topic in a
round-robin manner) or to specific partitions of that topic.
3) Consumers:
Consumers are applications which read/consume data from the topics within a
cluster using the Consumer APIs.
Consumers can read data either at the topic level (from all the partitions of that topic) or
from specific partitions of that topic.
Consumers are always associated with exactly one consumer group, which is a
group of consumers that work together to consume a topic.
4) Brokers:
Brokers, also known as Kafka servers, are software processes that maintain and
manage the published messages.
Brokers also manage the consumer-offsets and are responsible for the delivery of
messages to the right consumers.
A set of brokers that communicate with each other to perform these
management and maintenance tasks is collectively known as a Kafka cluster.
We can add more brokers to an already running Kafka cluster without any downtime,
which ensures horizontal scalability.
5) Zookeeper:
Zookeeper is used to monitor the Kafka cluster and coordinate with each broker.
It keeps all the metadata related to the Kafka cluster in the form of key-value
pairs.
Metadata Includes:
Configuration Information
Health status of each broker.
It is used for the controller election within the Kafka cluster.
A set of Zookeeper nodes working together to manage other distributed systems is
known as a Zookeeper cluster or Zookeeper ensemble.
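A minimal ensemble configuration sketch, assuming three local Zookeeper nodes (ports
and data directories are illustrative; each node gets its own copy of zoo.cfg with its own
dataDir and clientPort):
# zoo.cfg for node 1 of a 3-node local ensemble
tickTime=2000
initLimit=5
syncLimit=2
dataDir=/tmp/zookeeper-1
clientPort=2181
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890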
Kafka Features:
1) Scalable: Horizontal scaling is done by adding new brokers to the existing clusters.
2) Fault Tolerance: Kafka Clusters can handle failures because of its distributed nature.
3) Performance: Kafka has high throughput for both publishing and subscribing to
messages.
4) No Data Loss: It ensures no data loss if we configure it properly.
5) Zero Down Time: It ensures zero downtime when the required number of brokers is
present in the cluster.
Zookeeper:
a) Start the Zookeeper:
cd $ZOOKEEPER_HOME
bin/zkServer.sh start
b) Validate whether Zookeeper is running:
echo stat | nc localhost 2181
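Note: on newer Zookeeper versions the four-letter-word commands are disabled by
default; if stat (or dump, used below for Kafka) is refused, whitelist them in zoo.cfg:
4lw.commands.whitelist=stat, dump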
c) Stop the Zookeeper:
bin/zkServer.sh stop
Kafka:
a) Start the Kafka:
cd $KAFKA_HOME
bin/kafka-server-start.sh config/server.properties
b) Validate whether Kafka is running:
echo dump | nc localhost 2181 | grep brokers
c) Stop the Kafka:
bin/kafka-server-stop.sh
1) Create Topic:
bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic myTopic
--partitions 1 --replication-factor 1
2) List Topic:
bin/kafka-topics.sh --bootstrap-server localhost:9092 --list
3) Describe Topic:
bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic myTopic
4) To create a Producer:
bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic myTopic
5) To create a Consumer:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic myTopic
--from-beginning
6) Let's verify how many Consumer Groups are now available:
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list
Push data from the producer and see the messages arrive in your user-defined consumer group.
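A minimal sketch of that flow, assuming the hypothetical group name myGroup (the
--group flag attaches the console consumer to a user-defined consumer group):
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic myTopic --group myGroup
bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic myTopic
Messages typed into the producer should show up in the consumer, and myGroup should
appear in the --list output above.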
Before starting multi-node Zookeeper & Kafka you need to create the tmp directories and
add the broker ids as mentioned in the config files (see the sketch after the commands below).
mkdir /tmp/zookeeper-1
mkdir /tmp/zookeeper-2
mkdir /tmp/zookeeper-3
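Each Zookeeper node also needs a myid file in its data directory, and each Kafka broker
needs its own properties file with a unique broker id; a sketch with illustrative ids, ports,
and file names:
echo 1 > /tmp/zookeeper-1/myid
echo 2 > /tmp/zookeeper-2/myid
echo 3 > /tmp/zookeeper-3/myid
# config/server-1.properties (repeat with unique values for each broker)
broker.id=1
listeners=PLAINTEXT://localhost:9092
log.dirs=/tmp/kafka-logs-1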
bin/kafka-server-start.sh config/server.properties
echo dump | nc localhost 2181 | grep brokers
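The describe command below assumes the topic already exists; a sketch of creating it
across the three brokers (partition and replication counts are illustrative):
bin/kafka-topics.sh --bootstrap-server localhost:9092,localhost:9093,localhost:9094
--create --topic myMultiTopic --partitions 3 --replication-factor 3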
Note: If you stop one broker, the in-sync replica (ISR) set will no longer match the full
replica set; it becomes a subset of the replicas, and the leader may change.
bin/kafka-topics.sh --bootstrap-server localhost:9092,localhost:9093,localhost:9094
--describe --topic myMultiTopic
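The per-partition output should look roughly like this (broker ids and assignments are
illustrative; after stopping broker 3, Isr would shrink to a subset such as 1,2):
Topic: myMultiTopic  Partition: 0  Leader: 1  Replicas: 1,2,3  Isr: 1,2,3
Topic: myMultiTopic  Partition: 1  Leader: 2  Replicas: 2,3,1  Isr: 2,3,1
Topic: myMultiTopic  Partition: 2  Leader: 3  Replicas: 3,1,2  Isr: 3,1,2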
Internals of Producer:
Offsets:
The records in the partitions are assigned a sequential id number called offset that
uniquely identifies each record within the partition.
1) Log-end offset: Offset of the last message written to a log/partition.
2) Current offset: Pointer to the last record that Kafka has already sent to a consumer in
the most recent poll.
3) Committed offset: Marking an offset as consumed is called committing an offset.
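These offsets can be observed per partition with kafka-consumer-groups.sh --describe
--group <group>; its output includes columns along these lines (values illustrative):
GROUP    TOPIC    PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG
myGroup  myTopic  0          42              45              3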
Metadata, i.e. information about the data (such as the partition and offset of each record),
is returned to the producer when messages are sent to a topic.
Note: When the key is null, the producer always sends messages in a round-robin fashion
across the partitions. If the key has some value, the messages are sent to the partition
determined by that key (a hash of the key), so records with the same key land in the same partition.
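A sketch of sending keyed messages from the console producer; the separator character
is a choice, and input is then typed as key:value:
bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic myTopic
--property parse.key=true --property key.separator=: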
Kafka Consumer Group:
A consumer group is a logical entity in the Kafka ecosystem that provides parallel
processing and scalable message consumption to consumer clients.
1) Each consumer must be associated with some consumer group.
2) Kafka makes sure there is no duplicate consumption among consumers that are part
of the same consumer group: each partition is assigned to exactly one consumer in the group.
Consumer Group Rebalancing:
The process of re-distributing partitions to the consumers within a consumer group is
known as Consumer Group Rebalancing.
Create Producer:
bin/kafka-console-producer.sh --bootstrap-server
localhost:9092,localhost:9093,localhost:9094 --topic myConsumerGroupRebalancing
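Before describing the group, start the first consumer in it (the same command is used
again below when adding more consumers; here the group name matches the topic name):
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic
myConsumerGroupRebalancing --group myConsumerGroupRebalancing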
Describe:
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group
myConsumerGroupRebalancing
You will see that all 3 partitions are consumed by the same consumer; check the
CONSUMER-ID column.
Create One more Consumer in the same Group & check Rebalancing:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic
myConsumerGroupRebalancing --group myConsumerGroupRebalancing
Describe again and see the partition rebalancing.
Try: Now send messages from one producer and observe their distribution in a round-robin
fashion to the consumers in the group.
Try: Add one more consumer to the group and see which one sits idle, since we have 3
partitions but 4 consumers in the same consumer group.
Try: Kill one consumer and send data from the producer again; the data will be
redistributed in a round-robin fashion among the remaining consumers.
Delete Topic:
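A sketch of the standard deletion command (assuming delete.topic.enable has not been
set to false on the brokers):
bin/kafka-topics.sh --bootstrap-server localhost:9092 --delete --topic myTopic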