KAFKA
Introduction:
Kafka is a distributed publish-subscribe messaging system that is designed to be
fast, scalable, and durable.
• Kafka maintains feeds of messages in categories called topics.
• Kafka messages are generated by processes called producers.
• The processes that subscribe to topics and process the feed of published
messages are called consumers.
• Kafka is run as a cluster of one or more servers, each of which is called a
broker.
Quick Start:
1. Create a topic
/usr/bin/kafka-topics --create --zookeeper zookeeperIP:2181 \
--replication-factor 1 --partitions 1 --topic testTopic
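You can optionally confirm the topic was created by describing it (zookeeperIP is the same placeholder as above):
/usr/bin/kafka-topics --describe --zookeeper zookeeperIP:2181 --topic testTopic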
2. Publish a message via the producer to a topic
/usr/bin/kafka-console-producer --broker-list producerIP:9092 \
--topic testTopic
This is first kafka message
3. Start a consumer
/usr/bin/kafka-console-consumer --zookeeper zookeeperIP:2181 \
--topic testTopic --from-beginning
If you start several consumers in different PuTTY sessions, you can see
messages being delivered to all of the consumers as soon as the producer
publishes them.
A bit more details:
A topic is a category or feed name to which messages are published. For each
topic, the Kafka cluster maintains a partitioned log (sketched below).
Each partition is an ordered, immutable sequence of messages that is continually
appended to; in effect, a commit log. Each message in a partition is assigned a
sequential id number called the offset that uniquely identifies the message
within the partition.
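As an illustrative sketch, a topic with three partitions might look like this; each cell is a message labelled with its offset, and new messages are appended on the right:
Partition 0: | 0 | 1 | 2 | 3 | 4 | 5 | ...
Partition 1: | 0 | 1 | 2 | 3 | ...
Partition 2: | 0 | 1 | 2 | 3 | 4 | ...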
The Kafka cluster retains all published messages, whether or not they have been
consumed, for a configurable period of time. Log retention can be configured in
two ways: time-based retention or size-based retention. Kafka's performance is
effectively constant with respect to data size, so retaining a lot of data is not a
problem.
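On the broker side, both retention styles map to settings in server.properties; a minimal sketch (the values are illustrative, not recommendations):
# time-based retention: delete log segments older than 7 days
log.retention.hours=168
# size-based retention: keep at most roughly 1 GB per partition
log.retention.bytes=1073741824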
QnA:
1. What type of messages can be sent from a producer?
The producer class takes two generic parameters, i.e.
Producer<K, V>
V: the type of the message
K: the type of the optional key associated with the message
So any kind of message can be sent, for example String, JSON, or Avro.
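For example, the key can be supplied explicitly through the three-argument KeyedMessage constructor (a sketch against the same old producer API used later in this document; the topic and key values are just placeholders):
// The optional key K ("truck-2" here) is also what the partitioner uses
// to decide which partition the message lands in.
KeyedMessage<String, String> keyed =
    new KeyedMessage<String, String>("event", "truck-2", message);
producer.send(keyed);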
2. How can a consumer start reading from a particular offset?
Kafka does not keep track of the offset up to which a particular consumer has
already read. The consumer has to manage the offset on its side; the
information about the offset up to which it has consumed messages has to be
stored elsewhere, e.g. in HDFS, a database, or HBase.
Kafka itself only provides two starting points for reading: from the beginning OR from the latest message.
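With the old high-level consumer, that choice is expressed through the auto.offset.reset property; a minimal sketch (the group.id value here is a hypothetical name):
Properties props = new Properties();
props.put("zookeeper.connect", "zookeeperIP:2181");
props.put("group.id", "myGroup");
// "smallest" = start from the beginning, "largest" = start from the latest
// message; applies when no offset has been stored yet for this group.
props.put("auto.offset.reset", "smallest");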
3. So when to use Kafka?
Cloudera recommends using Kafka if the data will be consumed by multiple
applications.
API Examples:
• A sample Producer
import java.sql.Timestamp;
import java.util.Date;
import java.util.Properties;
import java.util.Random;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

Properties props = new Properties();
props.put("metadata.broker.list", args[0]);   // broker host:port list
props.put("zk.connect", args[1]);             // ZooKeeper connect string
props.put("serializer.class", "kafka.serializer.StringEncoder");
props.put("request.required.acks", "1");      // leader must ack each message

String TOPIC = "event";
ProducerConfig config = new ProducerConfig(props);
Producer<String, String> producer = new Producer<String, String>(config);

Random random = new Random();
String[] events = {"Normal", "Normal", "Normal" /* … */};
String[] truckIds = {"1", "2", "3", "4"};
String[] driverIds = {"11", "12", "13", "14"};

// Pipe-delimited event record: timestamp|truckId|driverId|eventType
String message = new Timestamp(new Date().getTime()) + "|"
    + truckIds[2] + "|" + driverIds[2] + "|" + events[random.nextInt(events.length)];

try {
    KeyedMessage<String, String> data = new KeyedMessage<String, String>(TOPIC, message);
    producer.send(data);
    Thread.sleep(1000);
} catch (Exception e) {
    e.printStackTrace();
}
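Note that request.required.acks=1 means a send is considered successful once the partition leader has written the message; it does not wait for replicas, so a leader failure at the wrong moment can still lose a recently acknowledged message.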
• A sample Consumer
Kafka provides a simple consumer which can be modified as per requirements.
Steps for using a SimpleConsumer:
• Find an active broker and find out which broker is the leader for your topic
and partition (a leader-lookup sketch follows this list)
• Determine who the replica brokers are for your topic and partition
• Build the request defining what data you are interested in
• Fetch the data
• Identify and recover from leader changes
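A sketch of the first two steps, using the same 0.8-era SimpleConsumer API as the fetch code below (seedBroker, port, a_topic, and a_partition are assumed placeholders):
// Ask one known broker for topic metadata; the response carries leader
// and replica information for every partition of the topic.
SimpleConsumer metadataConsumer =
    new SimpleConsumer(seedBroker, port, 100000, 64 * 1024, "leaderLookup");
TopicMetadataRequest metadataRequest =
    new TopicMetadataRequest(Collections.singletonList(a_topic));
TopicMetadataResponse metadataResponse = metadataConsumer.send(metadataRequest);

Broker leader = null;
for (TopicMetadata topicMetadata : metadataResponse.topicsMetadata()) {
    for (PartitionMetadata partitionMetadata : topicMetadata.partitionsMetadata()) {
        if (partitionMetadata.partitionId() == a_partition) {
            leader = partitionMetadata.leader();   // may be null during a leader election
            // partitionMetadata.replicas() lists the replica brokers
        }
    }
}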
Data fetch pseudo code:
// Build a fetch request for (topic, partition) starting at readOffset,
// asking for up to 100000 bytes of messages.
FetchRequest req = new FetchRequestBuilder()
    .clientId(clientName)
    .addFetch(a_topic, a_partition, readOffset, 100000)
    .build();
FetchResponse fetchResponse = consumer.fetch(req);
if (fetchResponse.hasError()) {
    // Error handling code here (e.g. look up the new leader on a leader change)
}
for (MessageAndOffset messageAndOffset : fetchResponse.messageSet(a_topic, a_partition)) {
    long currentOffset = messageAndOffset.offset();
    // A fetch into a compressed message set can return messages below
    // the requested offset; skip those.
    if (currentOffset < readOffset) {
        // Proper logger here
        continue;
    }
    readOffset = messageAndOffset.nextOffset();
    ByteBuffer payload = messageAndOffset.message().payload();
    byte[] bytes = new byte[payload.limit()];
    payload.get(bytes);
    System.out.println(String.valueOf(messageAndOffset.offset()) + ": " + new String(bytes, "UTF-8"));
    numRead++;
    a_maxReads--;
}
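The starting readOffset itself can be fetched from Kafka with an offset request; a minimal sketch, where whichTime is kafka.api.OffsetRequest.EarliestTime() to read from the beginning or LatestTime() to read only new messages:
TopicAndPartition topicAndPartition = new TopicAndPartition(a_topic, a_partition);
Map<TopicAndPartition, PartitionOffsetRequestInfo> requestInfo =
    new HashMap<TopicAndPartition, PartitionOffsetRequestInfo>();
requestInfo.put(topicAndPartition, new PartitionOffsetRequestInfo(whichTime, 1));
kafka.javaapi.OffsetRequest request = new kafka.javaapi.OffsetRequest(
    requestInfo, kafka.api.OffsetRequest.CurrentVersion(), clientName);
OffsetResponse response = consumer.getOffsetsBefore(request);
long[] offsets = response.offsets(a_topic, a_partition);
long readOffset = offsets[0];   // first offset matching whichTime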
Conclusion:
As you can see, Kafka has a unique design that makes it very useful for solving a
wide range of architectural challenges. It is important to make sure you use the
right approach for your use case and use it correctly to ensure high throughput,
low latency, high availability, and no loss of data.
