Apache Storm Tutorial Point

Uploaded by

ashokmvanjare

Apache Storm

About the Tutorial

Storm was originally created by Nathan Marz and his team at BackType, a social analytics
company. Later, BackType was acquired by Twitter, which open-sourced Storm. In a short time,
Apache Storm became a standard for distributed real-time processing, allowing you to process
large amounts of data, similar to Hadoop. Apache Storm is written in Java and Clojure, and it
continues to be a leader in real-time analytics.

This tutorial explores the principles of Apache Storm, distributed messaging, installation,
creating Storm topologies and deploying them to a Storm cluster, the workflow of Trident, and
real-time applications, and concludes with some useful examples.

Audience
This tutorial has been prepared for professionals aspiring to make a career in Big Data Analytics
using the Apache Storm framework. It will give you enough understanding to create and deploy
a Storm cluster in a distributed environment.

Prerequisites
Before proceeding with this tutorial, you must have a good understanding of Core Java and any
of the Linux flavors.

Copyright & Disclaimer

All the content and graphics published in this e-book are the property of Tutorials Point (I) Pvt.
Ltd. The user of this e-book is prohibited to reuse, retain, copy, distribute or republish any
contents or a part of contents of this e-book in any manner without written consent of the
publisher.

We strive to update the contents of our website and tutorials as timely and as precisely as
possible, however, the contents may contain inaccuracies or errors. Tutorials Point (I) Pvt. Ltd.
provides no guarantee regarding the accuracy, timeliness or completeness of our website or its
contents including this tutorial. If you discover any errors on our website or in this tutorial,
please notify us at [email protected]


Table of Contents
About the Tutorial
Audience
Prerequisites
Copyright & Disclaimer
Table of Contents

1. APACHE STORM – INTRODUCTION
   What is Apache Storm?
   Apache Storm vs Hadoop
   Use-Cases of Apache Storm
   Apache Storm – Benefits

2. APACHE STORM – CORE CONCEPTS
   Topology
   Tasks
   Workers
   Stream Grouping

3. STORM – CLUSTER ARCHITECTURE

4. APACHE STORM – WORKFLOW

5. STORM – DISTRIBUTED MESSAGING SYSTEM
   What is Distributed Messaging System?
   Thrift Protocol

6. APACHE STORM – INSTALLATION
   Step 1: Verifying Java Installation
   Step 2: ZooKeeper Framework Installation
   Step 3: Apache Storm Framework Installation

7. APACHE STORM – WORKING EXAMPLE
   Scenario – Mobile Call Log Analyzer
   Spout Creation
   Bolt Creation
   Call log Creator Bolt
   Call log Counter Bolt
   Creating Topology
   Local Cluster
   Building and Running the Application
   Non-JVM languages

8. APACHE STORM – TRIDENT
   Trident Topology
   Trident Tuples
   Trident Spout
   Trident Operations
   State Maintenance
   Distributed RPC
   When to Use Trident?
   Working Example of Trident
   Building and Running the Application

9. APACHE STORM IN TWITTER
   Twitter
   Hashtag Reader Bolt
   Hashtag Counter Bolt
   Submitting a Topology
   Building and Running the Application

10. APACHE STORM IN YAHOO! FINANCE
   Spout Creation
   Bolt Creation
   Submitting a Topology
   Building and Running the Application

11. APACHE STORM – APPLICATIONS
   Klout
   The Weather Channel
   Telecom Industry

1. Apache Storm – Introduction

What is Apache Storm?

Apache Storm is a distributed real-time big data processing system. Storm is designed to
process vast amounts of data in a fault-tolerant and horizontally scalable manner. It is a
streaming data framework capable of very high ingestion rates. Though Storm is stateless, it
manages the distributed environment and cluster state via Apache ZooKeeper. It is simple, and
you can execute all kinds of manipulations on real-time data in parallel.

Apache Storm continues to be a leader in real-time data analytics. Storm is easy to set up and
operate, and it guarantees that every message will be processed through the topology at least
once.

Apache Storm vs Hadoop

Both Hadoop and Storm are frameworks used for analyzing big data. They complement each
other and differ in some aspects. Apache Storm does all the operations except persistency,
while Hadoop is good at everything but lags in real-time computation. The following table
compares the attributes of Storm and Hadoop.

Storm | Hadoop
Real-time stream processing | Batch processing
Stateless | Stateful
Master/slave architecture with ZooKeeper-based coordination. The master node is called nimbus and the slaves are supervisors. | Master/slave architecture with or without ZooKeeper-based coordination. The master node is the job tracker and the slave node is the task tracker.
A Storm streaming process can access tens of thousands of messages per second on a cluster. | Hadoop Distributed File System (HDFS) uses the MapReduce framework to process vast amounts of data, which takes minutes or hours.
A Storm topology runs until it is shut down by the user or hits an unexpected unrecoverable failure. | MapReduce jobs are executed in sequential order and complete eventually.
Both are distributed and fault-tolerant. | Both are distributed and fault-tolerant.
If nimbus or a supervisor dies, restarting makes it continue from where it stopped, hence nothing gets affected. | If the JobTracker dies, all the running jobs are lost.

Use-Cases of Apache Storm

Apache Storm is well known for real-time big data stream processing. For this reason, many
companies use Storm as an integral part of their systems. Some notable examples are as
follows:

Twitter – Twitter uses Apache Storm for its range of “Publisher Analytics” products, which
process every tweet and click on the Twitter platform. Apache Storm is deeply integrated
with Twitter’s infrastructure.

NaviSite – NaviSite uses Storm for its event log monitoring/auditing system. Every log
generated in the system goes through Storm. Storm checks each message against the
configured set of regular expressions and, if there is a match, saves that message to the
database.

Wego – Wego is a travel metasearch engine located in Singapore. Travel-related data comes
from many sources all over the world with different timing. Storm helps Wego search real-time
data, resolve concurrency issues, and find the best match for the end user.

Apache Storm – Benefits

Here is a list of the benefits that Apache Storm offers:

• Storm is open source, robust, and user friendly. It can be used in small companies
as well as large corporations.

• Storm is fault tolerant, flexible, reliable, and supports any programming language.

• Storm allows real-time stream processing.

• Storm is very fast because it has enormous data-processing power.

• Storm can maintain performance under increasing load by adding resources
linearly. It is highly scalable.

• Storm performs data refresh and end-to-end delivery response in seconds or minutes,
depending on the problem. It has very low latency.

• Storm has operational intelligence.

• Storm provides guaranteed data processing even if any of the connected nodes in the
cluster die or messages are lost.

2. Apache Storm – Core Concepts

Apache Storm reads a raw stream of real-time data at one end, passes it through a sequence
of small processing units, and outputs the processed, useful information at the other end.

The following diagram depicts the core concept of Apache Storm.

Let us now have a closer look at the components of Apache Storm:

Component | Description
Tuple | The tuple is the main data structure in Storm. It is a list of ordered elements. By default, a tuple supports all data types. Generally, it is modelled as a set of comma-separated values and passed to a Storm cluster.
Stream | A stream is an unbounded sequence of tuples.
Spouts | A spout is the source of a stream. Generally, Storm accepts input data from raw data sources like the Twitter Streaming API, an Apache Kafka queue, a Kestrel queue, etc. Otherwise, you can write spouts to read data from datasources. “ISpout” is the core interface for implementing spouts. Commonly used interfaces and classes include IRichSpout, BaseRichSpout, and KafkaSpout.
Bolts | Bolts are logical processing units. Spouts pass data to bolts, and bolts process it and produce a new output stream. Bolts can perform filtering, aggregation, joining, and interaction with data sources and databases. A bolt receives data and emits to one or more bolts. “IBolt” is the core interface for implementing bolts. Some of the common interfaces are IRichBolt, IBasicBolt, etc.

Let’s take a real-time example of “Twitter Analysis” and see how it can be modelled in Apache
Storm. The following diagram depicts the structure.

The input for the “Twitter Analysis” comes from the Twitter Streaming API. The spout reads the
tweets of the users through the Twitter Streaming API and outputs them as a stream of tuples.
A single tuple from the spout holds a Twitter username and a single tweet as comma-separated
values. This stream of tuples is then forwarded to the bolt, which splits each tweet into
individual words, calculates the word count, and persists the information to a configured
datasource. Now, we can easily get the result by querying the datasource.
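
The flow described above can be sketched in plain Java. This is only an illustration that uses the standard library and not the Storm API: the class name, the method names, and the sample tweets are all assumptions made for the example, and the counts are printed instead of being persisted to a datasource.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the "Twitter Analysis" flow; it does not use the Storm API.
class TwitterAnalysisSketch {

    // Stands in for the spout: each tuple is "username,tweet" as comma-separated values.
    // The usernames and tweets are made-up sample data.
    static List<String> sampleTuples() {
        return Arrays.asList(
            "alice,storm is fast",
            "bob,storm is fun");
    }

    // Stands in for the bolt: split each tweet into words and count them.
    static Map<String, Integer> countWords(List<String> tuples) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String tuple : tuples) {
            // Split only on the first comma so commas inside the tweet survive.
            String[] fields = tuple.split(",", 2);
            if (fields.length < 2) {
                continue; // malformed tuple without a tweet part
            }
            for (String word : fields[1].trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word.toLowerCase(), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // A real bolt would persist the counts to a datasource; here we only print them.
        System.out.println(countWords(sampleTuples()));
    }
}
```

In a real topology the two methods would live in separate spout and bolt classes, and Storm would move the tuples between them.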


Topology
Spouts and bolts are connected together to form a topology. Real-time application logic is
specified inside a Storm topology. In simple words, a topology is a directed graph where the
vertices are computations and the edges are streams of data.

A simple topology starts with spouts. A spout emits data to one or more bolts. A bolt
represents a node in the topology holding the smallest unit of processing logic, and the output
of a bolt can be emitted into another bolt as input.

Storm keeps a topology running until you kill it. Apache Storm’s main job is to run topologies,
and it can run any number of topologies at a given time.
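
To make the "directed graph" idea concrete, here is a minimal plain-Java sketch of a topology as vertices and edges. It is not the Storm API: the class, its methods, and the component names (tweet-spout, split-bolt, count-bolt) are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of a topology as a directed graph; not the Storm API.
class TopologyGraphSketch {
    // Vertex name -> names of the downstream vertices it emits to (the edges).
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    void addComponent(String name) {
        edges.putIfAbsent(name, new ArrayList<>());
    }

    // Declares that 'from' emits its output stream to 'to'.
    void connect(String from, String to) {
        addComponent(from);
        addComponent(to);
        edges.get(from).add(to);
    }

    List<String> downstreamOf(String name) {
        return edges.getOrDefault(name, new ArrayList<>());
    }

    // Hypothetical topology: one spout feeding a chain of two bolts.
    static TopologyGraphSketch example() {
        TopologyGraphSketch topology = new TopologyGraphSketch();
        topology.connect("tweet-spout", "split-bolt");
        topology.connect("split-bolt", "count-bolt");
        return topology;
    }

    public static void main(String[] args) {
        TopologyGraphSketch topology = example();
        System.out.println("tweet-spout emits to " + topology.downstreamOf("tweet-spout"));
        System.out.println("split-bolt emits to " + topology.downstreamOf("split-bolt"));
    }
}
```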

Tasks
Now you have a basic idea of spouts and bolts. They are the smallest logical units of a
topology, and a topology is built from one or more spouts and a set of bolts. They must be
executed in a particular order for the topology to run successfully. Each execution of a spout
or bolt by Storm is called a “task”. In simple words, a task is the execution of either a spout
or a bolt. At any given time, each spout and bolt can have multiple instances running in
separate threads.

Workers
A topology runs in a distributed manner on multiple worker nodes. Storm spreads the tasks
evenly across all the worker nodes. A worker node’s role is to listen for jobs and to start or
stop processes whenever a new job arrives.

Stream Grouping
Streams of data flow from spouts to bolts or from one bolt to another. Stream grouping
controls how tuples are routed in the topology and helps us understand the flow of tuples
through it. Four of the built-in groupings are explained below.

Shuffle Grouping


In shuffle grouping, tuples are distributed randomly across all of the workers executing the
bolt, in such a way that each worker receives an equal number of tuples. The following
diagram depicts the structure.
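
One way to get a distribution that is both random and even is to shuffle the list of targets and hand out tuples in that order, reshuffling after each full pass. The plain-Java sketch below illustrates that idea under stated assumptions: it is not taken from Storm's source, the class and method names are hypothetical, and the target count is assumed to be greater than zero.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Plain-Java sketch of shuffle grouping; an illustration, not Storm's implementation.
class ShuffleGroupingSketch {
    private final List<Integer> order = new ArrayList<>();
    private final Random random;
    private int position = 0;

    // targetCount must be greater than zero.
    ShuffleGroupingSketch(int targetCount, long seed) {
        for (int i = 0; i < targetCount; i++) {
            order.add(i);
        }
        random = new Random(seed);
        Collections.shuffle(order, random);
    }

    // Returns the index of the target that receives the next tuple.
    int chooseTarget() {
        if (position == order.size()) {
            // Every target got one tuple in this pass; pick a new random order.
            Collections.shuffle(order, random);
            position = 0;
        }
        return order.get(position++);
    }

    // Sends tupleCount tuples and reports how many each target received.
    static int[] distribute(int tupleCount, int targetCount, long seed) {
        ShuffleGroupingSketch grouping = new ShuffleGroupingSketch(targetCount, seed);
        int[] received = new int[targetCount];
        for (int i = 0; i < tupleCount; i++) {
            received[grouping.chooseTarget()]++;
        }
        return received;
    }

    public static void main(String[] args) {
        int[] received = distribute(12, 3, 42L);
        for (int i = 0; i < received.length; i++) {
            System.out.println("target " + i + " received " + received[i] + " tuples");
        }
    }
}
```

Because each pass visits every target exactly once, the counts are exactly equal whenever the number of tuples is a multiple of the number of targets.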


Field Grouping
In field grouping, the stream is partitioned by the values of one or more tuple fields. Tuples
with the same field values are sent to the same worker executing the bolt. For example, if the
stream is grouped by the field “word”, then all tuples with the same string, “Hello”, go to the
same worker. The following diagram shows how Field Grouping works.
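
A common way to route equal field values to the same target is to hash the value and take the result modulo the number of targets. The following plain-Java sketch shows that technique; it is an assumption about how such routing can be done rather than a copy of Storm's code, and the names are hypothetical.

```java
// Plain-Java sketch of field grouping by hashing the grouping field; not the Storm API.
class FieldGroupingSketch {

    // Maps a field value to a target index in the range 0..targetCount-1.
    // Equal values always produce the same index. targetCount must be greater than zero.
    static int chooseTarget(String fieldValue, int targetCount) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(fieldValue.hashCode(), targetCount);
    }

    public static void main(String[] args) {
        String[] words = {"Hello", "World", "Hello"};
        for (String word : words) {
            System.out.println(word + " -> target " + chooseTarget(word, 4));
        }
    }
}
```

Both occurrences of “Hello” resolve to the same target index, which is what lets a counting bolt keep a correct per-word total.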

Global Grouping
In global grouping, all the streams are grouped and forwarded to one bolt instance. This
grouping sends tuples generated by all instances of the source to a single target instance
(specifically, the worker with the lowest ID).


All Grouping
All Grouping sends a single copy of each tuple to all instances of the receiving bolt. This kind
of grouping is used to send signals to bolts, and it is also useful for join operations.
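
The difference between Global Grouping and All Grouping comes down to which targets are selected for each tuple. The plain-Java sketch below contrasts the two; it does not use the Storm API, and the class, method names, and target IDs are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Plain-Java sketch contrasting global grouping and all grouping; not the Storm API.
class GlobalAndAllGroupingSketch {

    // Global grouping: every tuple goes to the single target with the lowest ID.
    static int globalTarget(List<Integer> targetIds) {
        return Collections.min(targetIds);
    }

    // All grouping: every target receives its own copy of the tuple.
    static List<Integer> allTargets(List<Integer> targetIds) {
        return new ArrayList<>(targetIds);
    }

    public static void main(String[] args) {
        List<Integer> targetIds = Arrays.asList(7, 3, 5); // made-up target IDs
        System.out.println("global grouping sends to " + globalTarget(targetIds));
        System.out.println("all grouping sends to " + allTargets(targetIds));
    }
}
```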

3. Storm – Cluster Architecture

One of the main highlights of Apache Storm is that it is a fault-tolerant, fast, distributed
application with no “Single Point of Failure” (SPOF). We can install Apache Storm on as many
systems as needed to increase the capacity of the application.

Let’s have a look at how the Apache Storm cluster is designed and its internal architecture.
The following diagram depicts the cluster design.

Apache Storm has two types of nodes: Nimbus (master node) and Supervisor (worker node).
Nimbus is the central component of Apache Storm. The main job of Nimbus is to run the
Storm topology. Nimbus analyzes the topology and gathers the tasks to be executed. Then, it
distributes the tasks to the available supervisors.

A supervisor has one or more worker processes and delegates tasks to them. Each worker
process spawns as many executors as needed to run the tasks. Apache Storm uses an internal
distributed messaging system for communication between nimbus and the supervisors.

Component | Description
Nimbus | Nimbus is the master node of a Storm cluster. All other nodes in the cluster are called worker nodes. The master node is responsible for distributing data among the worker nodes, assigning tasks to worker nodes, and monitoring failures.
Supervisor | The nodes that follow the instructions given by nimbus are called supervisors. A supervisor has multiple worker processes and governs them to complete the tasks assigned by nimbus.
Worker process | A worker process executes tasks related to a specific topology. A worker process does not run a task by itself; instead, it creates executors and asks them to perform particular tasks. A worker process has multiple executors.
Executor | An executor is a single thread spawned by a worker process. An executor runs one or more tasks, but only for a specific spout or bolt.
Task | A task performs the actual data processing. So, it is either a spout or a bolt.
ZooKeeper framework | Apache ZooKeeper is a service used by a cluster (group of nodes) to coordinate among themselves and maintain shared data with robust synchronization techniques. Nimbus is stateless, so it depends on ZooKeeper to monitor the working node status. ZooKeeper helps the supervisors interact with nimbus. It is responsible for maintaining the state of nimbus and the supervisors.
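
The relationship between executors and tasks can be illustrated with a small plain-Java sketch that spreads the tasks of one spout or bolt across its executor threads. The round-robin split used here is an assumption chosen for simplicity, not a statement of how Storm assigns tasks, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of executors each running one or more tasks of a single component.
class ExecutorTaskSketch {

    // Spreads task IDs 0..taskCount-1 of ONE spout or bolt across its executors.
    // Round-robin placement is an illustrative assumption. executorCount must be > 0.
    static List<List<Integer>> assignTasks(int taskCount, int executorCount) {
        List<List<Integer>> executors = new ArrayList<>();
        for (int e = 0; e < executorCount; e++) {
            executors.add(new ArrayList<>());
        }
        for (int task = 0; task < taskCount; task++) {
            executors.get(task % executorCount).add(task);
        }
        return executors;
    }

    public static void main(String[] args) {
        List<List<Integer>> executors = assignTasks(4, 2);
        for (int e = 0; e < executors.size(); e++) {
            System.out.println("executor " + e + " runs tasks " + executors.get(e));
        }
    }
}
```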

Storm is stateless in nature. Even though being stateless has its own disadvantages, it
helps Storm process real-time data in the best possible and quickest way.

Storm is not entirely stateless, though. It stores its state in Apache ZooKeeper. Since the state
is available in Apache ZooKeeper, a failed nimbus can be restarted and made to work from
where it left off. Usually, service monitoring tools like monit will monitor Nimbus and restart
it if there is any failure.

Apache Storm also has an advanced topology called Trident Topology with state
maintenance, and it also provides a high-level API like Pig. We will discuss all these features
in the coming chapters.

4. Apache Storm – Workflow

A working Storm cluster should have one nimbus and one or more supervisors. Another
important node is Apache ZooKeeper, which is used for coordination between the nimbus and
the supervisors.

Let us now take a close look at the workflow of Apache Storm:

• Initially, the nimbus waits for a “Storm Topology” to be submitted to it.

• Once a topology is submitted, the nimbus processes it and gathers all the tasks that
are to be carried out and the order in which they are to be executed.

• Then, the nimbus evenly distributes the tasks to all the available supervisors.

• At a particular time interval, all supervisors send heartbeats to the nimbus to
inform it that they are still alive.

• When a supervisor dies and stops sending heartbeats to the nimbus, the nimbus
assigns its tasks to another supervisor.

• When the nimbus itself dies, the supervisors keep working on their already assigned
tasks without any issue.

• Once all the tasks are completed, a supervisor waits for a new task to come in.

• In the meantime, the dead nimbus is restarted automatically by service monitoring
tools.

• The restarted nimbus continues from where it stopped. Similarly, a dead
supervisor can also be restarted automatically. Since both the nimbus and the
supervisors can be restarted automatically and both continue as before, Storm is
guaranteed to process all the tasks at least once.

• Once all the topologies are processed, the nimbus waits for a new topology to arrive,
and similarly the supervisors wait for new tasks.
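
The heartbeat and reassignment steps in the workflow above can be sketched in plain Java. This is a simplified illustration, not Storm's implementation: the class, the timeout value, the supervisor and task names, and the policy of moving tasks to the first live supervisor are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of heartbeat tracking and task reassignment; not the Storm API.
class HeartbeatSketch {
    private final long timeoutMillis;
    private final Map<String, Long> lastHeartbeat = new LinkedHashMap<>();
    private final Map<String, List<String>> assignments = new LinkedHashMap<>();

    HeartbeatSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // A supervisor reports that it is alive at time nowMillis.
    void heartbeat(String supervisor, long nowMillis) {
        lastHeartbeat.put(supervisor, nowMillis);
        assignments.putIfAbsent(supervisor, new ArrayList<>());
    }

    void assign(String supervisor, String task) {
        assignments.computeIfAbsent(supervisor, k -> new ArrayList<>()).add(task);
    }

    // A supervisor counts as alive if its last heartbeat is within the timeout.
    boolean isAlive(String supervisor, long nowMillis) {
        Long last = lastHeartbeat.get(supervisor);
        return last != null && nowMillis - last <= timeoutMillis;
    }

    // Moves the tasks of every timed-out supervisor to the first live supervisor found.
    void reassignDeadSupervisors(long nowMillis) {
        String target = null;
        for (String supervisor : lastHeartbeat.keySet()) {
            if (isAlive(supervisor, nowMillis)) {
                target = supervisor;
                break;
            }
        }
        if (target == null) {
            return; // no live supervisor to take over the work
        }
        for (Map.Entry<String, List<String>> entry : assignments.entrySet()) {
            if (!isAlive(entry.getKey(), nowMillis) && !entry.getValue().isEmpty()) {
                assignments.get(target).addAll(entry.getValue());
                entry.getValue().clear();
            }
        }
    }

    List<String> tasksOf(String supervisor) {
        return assignments.getOrDefault(supervisor, new ArrayList<>());
    }

    public static void main(String[] args) {
        HeartbeatSketch nimbus = new HeartbeatSketch(1000);
        nimbus.heartbeat("sup-a", 0);
        nimbus.heartbeat("sup-b", 0);
        nimbus.assign("sup-a", "task-1");
        nimbus.assign("sup-b", "task-2");
        nimbus.heartbeat("sup-b", 1500);      // sup-a stays silent
        nimbus.reassignDeadSupervisors(2000); // sup-a has timed out by now
        System.out.println("sup-a: " + nimbus.tasksOf("sup-a"));
        System.out.println("sup-b: " + nimbus.tasksOf("sup-b"));
    }
}
```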

By default, there are two modes in a Storm cluster:

• Local mode: This mode is used for development, testing, and debugging because it
is the easiest way to see all the topology components working together. In this mode,
we can adjust parameters that enable us to see how our topology runs in different
Storm configuration environments. In local mode, Storm topologies run on the local
machine in a single JVM.

• Production mode: In this mode, we submit our topology to a working Storm
cluster, which is composed of many processes, usually running on different machines.


As discussed in the workflow of Storm, a working cluster will run indefinitely until it is
shut down.

Biot-Savart Law and Applications
2 pages
A3977 Datasheet
No ratings yet
A3977 Datasheet
17 pages
Voltage-Gated Potassium Channels: Gavin Y. Oudit and Peter H. Backx
No ratings yet
Voltage-Gated Potassium Channels: Gavin Y. Oudit and Peter H. Backx
13 pages
Chemical Engineering s7 & s8
No ratings yet
Chemical Engineering s7 & s8
337 pages
Lesson 2 The Concept of Logic Circuits
No ratings yet
Lesson 2 The Concept of Logic Circuits
23 pages
Chapter 35 - Interference Hints-5/23/11: Reference Line N A
No ratings yet
Chapter 35 - Interference Hints-5/23/11: Reference Line N A
7 pages
Chapter 7-Hydroelectric Power Plant
No ratings yet
Chapter 7-Hydroelectric Power Plant
72 pages
Basic-Probability-Concepts
No ratings yet
Basic-Probability-Concepts
6 pages
Chapter 4 - Determinants Revision Notes
No ratings yet
Chapter 4 - Determinants Revision Notes
9 pages
Machine Learning Notes
No ratings yet
Machine Learning Notes
17 pages
Aggregate Planning Graphical Method
No ratings yet
Aggregate Planning Graphical Method
16 pages
EC8701 Antennas and Microwave Engineering PDF
0% (1)
EC8701 Antennas and Microwave Engineering PDF
42 pages
VF DOC ADC 1301 VF Service Release Notes Rev.A17
No ratings yet
VF DOC ADC 1301 VF Service Release Notes Rev.A17
25 pages
Introduction To Parallel Programming: Parallel Methods For Matrix Multiplication
No ratings yet
Introduction To Parallel Programming: Parallel Methods For Matrix Multiplication
50 pages
B G1025 Pages: 2: Answer Any Two Full Questions, Each Carries 15 Marks
No ratings yet
B G1025 Pages: 2: Answer Any Two Full Questions, Each Carries 15 Marks
2 pages
Plate Versus Tension-Band Wire Fixation For Olecranon Fractures
No ratings yet
Plate Versus Tension-Band Wire Fixation For Olecranon Fractures
13 pages
Codigos de SMD Capasitores
No ratings yet
Codigos de SMD Capasitores
26 pages
Thermo: Lab-Line Bench Top Incubated Shaker
No ratings yet
Thermo: Lab-Line Bench Top Incubated Shaker
30 pages
HGE Formulas
No ratings yet
HGE Formulas
29 pages
Biofarmasetika BCS
No ratings yet
Biofarmasetika BCS
19 pages

Apache Storm Tutorial Point

Uploaded by

Apache Storm Tutorial Point

Uploaded by

Copyright & Disclaimer
Table of Contents

1. APACHE STORM – INTRODUCTION
What is Apache Storm?
Apache Storm vs Hadoop
Use-Cases of Apache Storm
Apache Storm – Benefits

2. APACHE STORM – CORE CONCEPTS
Stream Grouping

3. STORM – CLUSTER ARCHITECTURE

4. APACHE STORM – WORKFLOW

5. STORM – DISTRIBUTED MESSAGING SYSTEM
What is Distributed Messaging System?
Thrift Protocol

6. APACHE STORM – INSTALLATION
Step 1: Verifying Java Installation
Step 2: ZooKeeper Framework Installation
Step 3: Apache Storm Framework Installation

7. APACHE STORM – WORKING EXAMPLE
Scenario – Mobile Call Log Analyzer
Spout Creation
Bolt Creation
Call log Creator Bolt
Call log Counter Bolt
Creating Topology
Building and Running the Application

8. APACHE STORM – TRIDENT
Trident Topology
Trident Tuples
Trident Spout
Trident Operations
State Maintenance
Distributed RPC
When to Use Trident?
Working Example of Trident
Building and Running the Application

9. APACHE STORM IN TWITTER
Hashtag Reader Bolt
Hashtag Counter Bolt
Building and Running the Application

10. APACHE STORM IN YAHOO! FINANCE
Spout Creation
Bolt Creation
Building and Running the Application

11. APACHE STORM – APPLICATIONS
The Weather Channel

What is Apache Storm?

Apache Storm vs Hadoop

Storm                                       Hadoop
Real-time stream processing                 Batch processing
Master/slave architecture with ZooKeeper    Master-slave architecture with/without
Both are distributed and fault-tolerant

Use-Cases of Apache Storm

Apache Storm – Benefits

• Allows real-time stream processing.

• Storm has operational intelligence.

The following diagram depicts the core concept of Apache Storm.

Let us now have a closer look at the components of Apache Storm:

Tuple: A tuple is the main data structure in Storm. It is a list of ordered elements.

Stream: A stream is an unordered sequence of tuples.
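The two definitions above can be sketched in plain Java. This is a minimal illustration only, not Storm's own API: `TupleSketch`, `SimpleTuple`, `sampleStream`, and `totalDuration` are hypothetical names, and the `from`, `to`, and `duration` fields and their values are made-up call-log data. It shows a tuple as an ordered list of values addressed by position or by field name, and a stream as a sequence of such tuples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative only: these classes are hypothetical and are NOT part of the Storm API.
public class TupleSketch {

    /** A tuple modelled as an ordered list of values with matching field names. */
    public static final class SimpleTuple {
        private final List<String> fields;
        private final List<Object> values;

        public SimpleTuple(List<String> fields, List<Object> values) {
            if (fields.size() != values.size()) {
                throw new IllegalArgumentException("fields and values must have the same length");
            }
            // Defensive copies keep the tuple immutable after construction.
            this.fields = Collections.unmodifiableList(new ArrayList<>(fields));
            this.values = Collections.unmodifiableList(new ArrayList<>(values));
        }

        /** Positional access: the elements of a tuple are ordered. */
        public Object getValue(int index) {
            return values.get(index);
        }

        /** Named access: look up the position of the field, then read the value. */
        public Object getValueByField(String name) {
            int index = fields.indexOf(name);
            if (index < 0) {
                throw new IllegalArgumentException("unknown field: " + name);
            }
            return values.get(index);
        }

        public int size() {
            return values.size();
        }
    }

    /** A stream modelled as a sequence of tuples that share the same fields (assumed sample data). */
    public static List<SimpleTuple> sampleStream() {
        List<String> fields = Arrays.asList("from", "to", "duration");
        List<SimpleTuple> stream = new ArrayList<>();
        stream.add(new SimpleTuple(fields, Arrays.<Object>asList("1234123401", "1234123402", 30)));
        stream.add(new SimpleTuple(fields, Arrays.<Object>asList("1234123402", "1234123403", 45)));
        stream.add(new SimpleTuple(fields, Arrays.<Object>asList("1234123401", "1234123403", 15)));
        return stream;
    }

    /** Sums the "duration" field; the result does not depend on the order of the tuples. */
    public static long totalDuration(List<SimpleTuple> stream) {
        long total = 0;
        for (SimpleTuple tuple : stream) {
            total += ((Number) tuple.getValueByField("duration")).longValue();
        }
        return total;
    }

    public static void main(String[] args) {
        List<SimpleTuple> stream = sampleStream();
        System.out.println("first caller: " + stream.get(0).getValue(0));
        System.out.println("total duration: " + totalDuration(stream));
    }
}
```

Summing the durations gives the same result whatever order the tuples arrive in, which is why per-tuple processing can treat the stream as unordered while each tuple keeps its own element order.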

Nimbus is the master node of a Storm cluster. All other nodes in the

A worker process will execute tasks related to a specific topology. A

An executor is a single thread spawned by a worker

A task performs actual data processing. So, it is either a spout or a

Apache ZooKeeper is a service used by a cluster (group of nodes)

ZooKeeper helps the supervisor to interact with the nimbus. It is

Let us now take a close look at the workflow of Apache Storm:

• In the meantime, the dead nimbus will be restarted automatically by service

By default, there are two modes in a Storm cluster:
