Machine
Learning Basics
An Introduction
What’s in it for you?
Big Data Challenges
What is HDFS?
HDFS Cluster Architecture
HDFS Data Blocks
Data Node Failure
Rack Awareness
General Architecture of HDFS
Read/Write Mechanism
What’s in it for you?
Why Hadoop?
What is Hadoop?
Hadoop HDFS
Hadoop MapReduce
Hadoop YARN
Use case of Hadoop
Demo on HDFS, MapReduce and YARN
Why Hadoop?
In a town far away..
Tim sells food grains in his shop
The customers were happy as Tim was very quick
with the orders
Tim sensed a good demand for other products, so he
thought of expanding his business
He started selling fruits, vegetables, meat, and dairy
products in addition to food grains
But it wasn’t as easy as he expected it to be. The
number of customers increased, and he was not
able to cater to their needs on time
He had to look into assisting his customers with
each of their orders and billing. It was too difficult for
him to manage alone
To start delivering orders on time and to manage the
customers’ demands, Tim hired 3 more people to
work with him
Matt took care of the fruit and vegetable section.
Luke handled the dairy and meat section. Ann was
appointed as the cashier
However, this was still not a solution to Tim’s problem
as there was not enough space in the shop for all the
items
The storage was a bottleneck since storing and accessing
became more and more difficult with increased supply and
demand
Tim came up with an idea to overcome this issue. He
decided to expand the storage area and distribute each
category of product on different floors
Now, customers were happy. After picking up their
products from the respective sections, they went to the cashier to be billed
Now, let us compare this story to big data
Earlier, data was generated at a moderate rate, and all the
data was structured in nature. One processor was enough to
process all of it
With the increase in data generation, different types of data
were generated at high speed. It became difficult for a single
processor to process different types of data
A massive amount of data of different types, which cannot be
processed and stored using traditional databases, is known as
big data
To overcome this issue, multiple processors were used to
process each type of data
But now the problem was that one storage system was
accessed by all the processors and the storage became the
bottleneck
Just like how Tim adopted the distributed approach, the
storage system was also distributed and by doing so, the data
was stored in individual databases
Through this story, we see the two approaches that are
used by Hadoop: HDFS and MapReduce
HDFS refers to the distributed storage space, just like how Tim distributed the
storage space amongst the various sections
Each person took care of a separate section, and at the end the customers
went to the cashier for the final billing. This organized the process and made it
easier. This is how Hadoop MapReduce works
This was a rough story of big data
generation and why Hadoop is required. I
will now explain in detail as to what
Hadoop is
This sounds interesting. I would like
to know more about Hadoop
What is Hadoop?
Hadoop is a framework which stores and processes big data in a distributed and parallel fashion
That sounds interesting, so how
does Hadoop store and process all
of this big data?
Hadoop has individual components, which
are used for storing and processing big
data
Components of Hadoop
• HDFS: The storage unit of Hadoop
• MapReduce: The processing unit of Hadoop
• YARN: The resource management unit of Hadoop
Hadoop HDFS
What is HDFS?
Hadoop Distributed File System (HDFS) is known for its distributed storage method.
It distributes the data amongst many computers (DataNodes). In addition to this, the
data is replicated to avoid loss of data
Data is stored in blocks. Each block holds 128 MB of data by default and is stored on
multiple systems
Let us now see how 500 MB of data is stored in the traditional method
Here, the entire set of data is stored in one
database. This overloads the database, and if it
crashes, we lose all our data
Using Hadoop HDFS, this problem is taken care of, as the 500 MB of data is distributed amongst
many systems
By doing so, no single system is
overloaded
Hadoop Distributed File System (HDFS) is specially designed for
storing massive datasets on commodity hardware
HDFS has two main components that help
with its storage: NameNode and DataNode
NameNode
• NameNode is the master of the
system
• It stores all the metadata
• NameNode manages all the
DataNodes
DataNode
• DataNode is known as the slave
node. There are multiple
DataNodes
• It performs the read/write
operations and stores the actual
data
• The DataNodes send signals
known as heartbeats to the
NameNode. This signal gives the
status of the DataNode
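The heartbeat idea can be illustrated with a small sketch. This is not HDFS code: the class name `HeartbeatMonitor` and the 30-second timeout are assumptions chosen for illustration, and times are passed in explicitly so the example is deterministic.

```python
class HeartbeatMonitor:
    """Toy stand-in for the NameNode's view of DataNode liveness."""

    def __init__(self, timeout_s=30.0):
        # timeout_s is a hypothetical value, not an HDFS default
        self.timeout_s = timeout_s
        self.last_seen = {}  # DataNode name -> time of last heartbeat

    def heartbeat(self, datanode, now):
        """Record that a DataNode reported in at time `now` (seconds)."""
        self.last_seen[datanode] = now

    def live_nodes(self, now):
        """DataNodes whose last heartbeat is within the timeout."""
        return sorted(dn for dn, t in self.last_seen.items()
                      if now - t <= self.timeout_s)

    def dead_nodes(self, now):
        """DataNodes that have been silent for longer than the timeout."""
        return sorted(dn for dn, t in self.last_seen.items()
                      if now - t > self.timeout_s)


monitor = HeartbeatMonitor()
monitor.heartbeat("DataNode 1", now=0)
monitor.heartbeat("DataNode 2", now=0)
monitor.heartbeat("DataNode 1", now=25)   # DataNode 2 stays silent

print("live:", monitor.live_nodes(now=40))
print("dead:", monitor.dead_nodes(now=40))
```

A master that notices a silent DataNode can then treat the blocks on it as lost replicas, which is where replication comes in.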
As mentioned earlier, the actual data is stored in DataNodes. Data is stored in the
form of blocks here. The default size of each block is 128 MB
Now, let’s consider storing a file of size 530
MB in HDFS
File.txt (530 MB) is split into five blocks:
Block A   128 MB
Block B   128 MB
Block C   128 MB
Block D   128 MB
Block E   18 MB
The final block uses
only the remaining
space for storage
All these data blocks are stored
in DataNodes (computers), here DataNode 1 to DataNode 5
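The 530 MB example is plain arithmetic: four full blocks of 128 MB (512 MB) plus a final block holding the remaining 18 MB. A minimal sketch, where the function name `split_into_blocks` is made up for illustration:

```python
BLOCK_SIZE_MB = 128  # default block size stated in the slides


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the sizes (in MB) of the blocks a file of this size occupies."""
    if file_size_mb < 0 or block_size_mb <= 0:
        raise ValueError("sizes must be non-negative, block size positive")
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        # the final block only uses the space that is left over
        sizes.append(remainder)
    return sizes


blocks = split_into_blocks(530)
for name, size in zip("ABCDE", blocks):
    print(f"Block {name}: {size} MB")
```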
What happens if the computer that
contains block A crashes? Do we lose
the data in block A?
No, we don’t. That’s the beauty of Hadoop
HDFS. It uses replication to prevent the
loss of data
Replication in HDFS
HDFS overcomes the issue of DataNode failure by creating copies of the
data; this is known as the replication method
Block A is replicated. The
replication factor is 3. The
replicas are stored in
different DataNodes
Two replicas of the same block cannot be stored on the same DataNode
Similarly, every other
block is replicated. In the example, the DataNodes (DN) are spread over Rack 1 to Rack 5:
Block     Replicas stored on
Block A   DN 1, DN 5, DN 6
Block B   DN 4, DN 8, DN 9
Block C   DN 7, DN 11, DN 15
Block D   DN 2, DN 10, DN 14
Block E   DN 3, DN 12, DN 13
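The rule that replicas of a block land on different DataNodes can be sketched as below. This is a simplified illustration, not HDFS's actual placement policy: the function `place_replicas`, the preference for a second rack, and the rack-to-DataNode layout are all assumptions made for the example.

```python
def place_replicas(racks, first_rack, replication=3):
    """Pick `replication` distinct DataNodes for one block.

    racks: dict mapping a rack name to a list of DataNode names.
    The first replica goes to the first DataNode of `first_rack`; the
    rest prefer other racks, so a whole-rack failure cannot lose the block.
    """
    first = [(first_rack, dn) for dn in racks[first_rack][:1]]
    other_racks = [(rack, dn) for rack in sorted(racks) if rack != first_rack
                   for dn in racks[rack]]
    same_rack = [(first_rack, dn) for dn in racks[first_rack][1:]]
    chosen = (first + other_racks + same_rack)[:replication]
    if len(chosen) < replication:
        raise ValueError("not enough DataNodes for the replication factor")
    return chosen


# Hypothetical cluster layout
racks = {
    "Rack 1": ["DN 1", "DN 2", "DN 3"],
    "Rack 2": ["DN 4", "DN 5", "DN 6"],
}
placement = place_replicas(racks, first_rack="Rack 1")
for rack, dn in placement:
    print(f"Block A replica on {dn} ({rack})")
```

Because every candidate DataNode appears only once in the list, no two replicas can share a DataNode.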
Architecture of HDFS
• The NameNode stores the metadata (name, replicas, ...)
• The DataNodes store the data. A rack is a collection
of DataNodes, and blocks are replicated between racks (Rack 1, Rack 2, ...)
• The client sends metadata operations, such as a read
request, to the NameNode
• The NameNode grants read permission ("Okay, read data from
DataNodes"), and the client then reads the data from the DataNodes
• For a write, the client writes the data to the DataNodes, where it is replicated
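The read and write flow can be mimicked in a few lines. The classes below are toy stand-ins, not the Hadoop client API: `NameNode`, `DataNode`, `write_file` and `read_file` are hypothetical names, and the write is simplified so that the client copies each block to every replica itself.

```python
class DataNode:
    def __init__(self, name):
        self.name = name
        self.blocks = {}   # block id -> bytes
        self.alive = True

    def write(self, block_id, data):
        self.blocks[block_id] = data

    def read(self, block_id):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self.blocks[block_id]


class NameNode:
    """Holds only metadata: which blocks make up a file and where they live."""

    def __init__(self):
        self.metadata = {}  # path -> list of (block id, [DataNode, ...])

    def add_block(self, path, block_id, datanodes):
        self.metadata.setdefault(path, []).append((block_id, datanodes))

    def get_block_locations(self, path):
        return self.metadata[path]


def write_file(namenode, path, chunks, datanodes, replication=3):
    for i, chunk in enumerate(chunks):
        block_id = f"{path}#blk{i}"
        # choose `replication` distinct DataNodes, rotating the start node
        targets = [datanodes[(i + k) % len(datanodes)] for k in range(replication)]
        for dn in targets:
            dn.write(block_id, chunk)
        namenode.add_block(path, block_id, targets)


def read_file(namenode, path):
    data = b""
    for block_id, replicas in namenode.get_block_locations(path):
        for dn in replicas:          # try replicas until one answers
            try:
                data += dn.read(block_id)
                break
            except ConnectionError:
                continue
        else:
            raise IOError(f"all replicas of {block_id} are unavailable")
    return data


namenode = NameNode()
datanodes = [DataNode(f"DataNode {n}") for n in range(1, 5)]
write_file(namenode, "File.txt", [b"Welcome ", b"to ", b"Hadoop"], datanodes)
datanodes[0].alive = False          # simulate a DataNode failure
result = read_file(namenode, "File.txt")
print(result.decode())
```

The NameNode never touches the file contents here; it only answers the question of where each block lives, and the read falls back to another replica when a DataNode is down.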
Features of HDFS
• Fault tolerant: HDFS is fault tolerant as
multiple copies of data are
made
• Data security: Provides end-to-end
encryption that protects
data
• Scalability: Multiple nodes can be
added to the cluster
depending on the
requirement
• Flexibility: Hadoop is flexible in storing any type
of data, like structured, semi-structured
or unstructured data
Now that we have stored data in
HDFS, how can we process it?
For processing data, Hadoop has a unit
known as MapReduce
Why MapReduce?
In the traditional approach, big data was processed at the master node
This was a disadvantage as it consumed more time to process various types of
data
To overcome this issue, data was processed at each slave node. This approach
is known as MapReduce
Hadoop MapReduce
What is MapReduce?
Hadoop MapReduce is a programming technique in which huge data is processed in a parallel and
distributed fashion
A MapReduce job consists of Map tasks and Reduce tasks
Map and Reduce steps: Input Data -> map() -> Shuffle and Sort -> reduce() -> Output Data
• The input data is divided to form the input splits
• The Map phase is the first phase. Here, the data in each split is passed to a mapping function to produce output
values
• In the Shuffle and Sort phase, the output of the Map phase is taken and similar data
is grouped
• In the Reduce phase, the output values from the shuffling phase are aggregated, returning
a single output value for each group
Let us now see how MapReduce works with an example
Input data
Welcome to Hadoop
Hadoop is interesting
Hadoop is easy
Input Splits (one per line of input)
Split 1: Welcome to Hadoop
Split 2: Hadoop is interesting
Split 3: Hadoop is easy
Map phase
Split 1: Welcome, 1 | to, 1 | Hadoop, 1
Split 2: Hadoop, 1 | is, 1 | interesting, 1
Split 3: Hadoop, 1 | is, 1 | easy, 1
Shuffle and Sort phase
easy, 1
Hadoop, 1 | Hadoop, 1 | Hadoop, 1
interesting, 1
is, 1 | is, 1
to, 1
Welcome, 1
Reducer phase and Final Output
easy, 1
Hadoop, 3
interesting, 1
is, 2
to, 1
Welcome, 1
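The word count example can be reproduced in plain Python to make the three phases concrete. This is a single-process sketch, not Hadoop MapReduce code: the function names are made up, and the case-insensitive sort is an assumption chosen to match the order shown in the slides.

```python
from collections import defaultdict


def map_phase(split):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in split.split()]


def shuffle_and_sort(mapped_outputs):
    """Group the values of all map outputs by key, sorted by key."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for word, count in pairs:
            grouped[word].append(count)
    return dict(sorted(grouped.items(), key=lambda item: item[0].lower()))


def reduce_phase(grouped):
    """Aggregate each group into a single value per key."""
    return {word: sum(counts) for word, counts in grouped.items()}


splits = ["Welcome to Hadoop", "Hadoop is interesting", "Hadoop is easy"]
mapped = [map_phase(split) for split in splits]   # one map() call per split
grouped = shuffle_and_sort(mapped)
counts = reduce_phase(grouped)

for word, count in counts.items():
    print(f"{word}, {count}")
```

On a cluster, each `map_phase` call would run on the node that holds the split, which is the point of processing the data where it is stored.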
Features of MapReduce
• Good load balancing: Splitting the stages into Map and
Reduce tasks improves the load
balancing
• Re-execution of tasks: A task is automatically re-executed if
it fails
• Simple programming model: MapReduce has one of the simplest
programming models (Map task + Reduce task), and it is based on
Java, which is a very common
programming language
HDFS and MapReduce were the two units
of Hadoop 1.0
Hadoop 1.0 was also known as
MapReduce Version 1
The disadvantage of this version was
that the JobTracker handled both the
processing of data and resource
allocation
As a result, the JobTracker was overburdened
by handling both job scheduling and
resource management
To overcome this issue, Hadoop 2
introduced YARN as the resource management layer,
which supports many processing frameworks
Hadoop YARN
What is YARN?
Yet Another Resource Negotiator (YARN) acts as the resource management
unit of Hadoop
Apache YARN consists of
• Resource Manager: It is the master daemon. It manages the assignment of
resources such as CPU and memory
• Node Manager: It is the slave daemon. It reports the resource
usage to the Resource Manager
• Application Master: It negotiates resources from the Resource
Manager and works with the Node Manager
[Diagram: clients submit applications to the Resource Manager, which coordinates several Node Managers. Each Node Manager hosts containers, and some containers run an App Master]
A container is a collection of physical resources such as CPU and RAM
The App Master requests containers from the Resource Manager. It uses the
containers allocated by the Node Manager
How an application runs on YARN:
• The client program sends an application request to the Resource
Manager
• The Node Managers report the status of their nodes to the Resource
Manager
• The Resource Manager contacts the Node Managers to request
resources (containers). The Node Managers grant the request
• The App Master contacts the Node Manager to use the containers, and itself runs in
one of the containers allocated on one of the nodes
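The flow above can be sketched as a toy simulation. This is not the YARN API: all class and method names, and the resource figures, are invented for illustration, and the scheduling is reduced to picking the first node with enough capacity.

```python
from dataclasses import dataclass


@dataclass
class Container:
    """A bundle of resources granted on one node."""
    node: str
    vcores: int
    memory_mb: int


@dataclass
class NodeManager:
    name: str
    free_vcores: int
    free_memory_mb: int

    def report_status(self):
        """Status update sent to the Resource Manager."""
        return {"vcores": self.free_vcores, "memory_mb": self.free_memory_mb}

    def launch(self, vcores, memory_mb):
        """Grant the request by carving a container out of free resources."""
        self.free_vcores -= vcores
        self.free_memory_mb -= memory_mb
        return Container(self.name, vcores, memory_mb)


class ResourceManager:
    def __init__(self, node_managers):
        self.node_managers = node_managers

    def allocate(self, vcores, memory_mb):
        """Return a container from the first node with enough capacity."""
        for nm in self.node_managers:
            status = nm.report_status()
            if status["vcores"] >= vcores and status["memory_mb"] >= memory_mb:
                return nm.launch(vcores, memory_mb)
        return None  # the cluster is full

    def submit_application(self, n_tasks, vcores=1, memory_mb=1024):
        # the App Master itself runs in a container
        app_master = self.allocate(vcores=1, memory_mb=512)
        if app_master is None:
            raise RuntimeError("no capacity for the App Master")
        # the App Master then asks for one container per task
        requested = [self.allocate(vcores, memory_mb) for _ in range(n_tasks)]
        return app_master, [c for c in requested if c is not None]


rm = ResourceManager([
    NodeManager("node-1", free_vcores=2, free_memory_mb=2048),
    NodeManager("node-2", free_vcores=2, free_memory_mb=4096),
])
app_master, task_containers = rm.submit_application(n_tasks=3)
print("App Master on", app_master.node)
print("Task containers on", [c.node for c in task_containers])
```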
Features of YARN
• Job scheduling: YARN is responsible for
processing job requests and
allocating resources
• Multitenancy: Different versions of MapReduce
can run on YARN. This makes
upgrading MapReduce
manageable
• Scalability: Depending on the requirement, the
number of nodes can be increased
Many companies use Hadoop for storing
and processing data. Now, let me tell you
about one such company
Use case - Pinterest
You would have probably heard of the
popular image sharing website Pinterest
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop Tutorial | Simplilearn
Pinterest is a social media platform which allows you to pin any
interesting information you find on its site
Pinterest has more than 250 million users and nearly 30 billion pins. All of this
amounts to big data for Pinterest
Problem
• Pinterest faced a challenge in processing a tremendous amount of data
• It was difficult to analyze which data should be displayed in a user’s
personalized discovery engine
Solution
• Pinterest uses Hadoop to process and analyze big data in a way that helps
the company show the most relevant content to its users
• Through continuous analysis of the data, Pinterest can provide its users with
features such as related pins, guided search and so on
This is how Pinterest benefited from
Hadoop. Let’s also start using Hadoop to
put an end to the big data challenges we
are facing
Demo on HDFS, MapReduce
and YARN
Key Takeaways
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop Tutorial | Simplilearn

More Related Content

What's hot (20)

Apache hive introduction
Apache hive introductionApache hive introduction
Apache hive introduction
Mahmood Reza Esmaili Zand
 
Hadoop Tutorial For Beginners
Hadoop Tutorial For BeginnersHadoop Tutorial For Beginners
Hadoop Tutorial For Beginners
Dataflair Web Services Pvt Ltd
 
Analysing of big data using map reduce
Analysing of big data using map reduceAnalysing of big data using map reduce
Analysing of big data using map reduce
Paladion Networks
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
sravya raju
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
Prashant Gupta
 
Big data and Hadoop
Big data and HadoopBig data and Hadoop
Big data and Hadoop
Rahul Agarwal
 
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Simplilearn
 
HDFS Architecture
HDFS ArchitectureHDFS Architecture
HDFS Architecture
Jeff Hammerbacher
 
Hadoop technology
Hadoop technologyHadoop technology
Hadoop technology
tipanagiriharika
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
Dr. C.V. Suresh Babu
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
Philippe Julio
 
Hadoop Installation presentation
Hadoop Installation presentationHadoop Installation presentation
Hadoop Installation presentation
puneet yadav
 
Introduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-SystemIntroduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-System
Md. Hasan Basri (Angel)
 
Big Data Ecosystem
Big Data EcosystemBig Data Ecosystem
Big Data Ecosystem
Lucian Neghina
 
Big data Hadoop presentation
Big data  Hadoop  presentation Big data  Hadoop  presentation
Big data Hadoop presentation
Shivanee garg
 
Introduction to HDFS
Introduction to HDFSIntroduction to HDFS
Introduction to HDFS
Bhavesh Padharia
 
03 hive query language (hql)
03 hive query language (hql)03 hive query language (hql)
03 hive query language (hql)
Subhas Kumar Ghosh
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
Prashant Gupta
 
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Simplilearn
 
Hadoop
Hadoop Hadoop
Hadoop
ABHIJEET RAJ
 
Analysing of big data using map reduce
Analysing of big data using map reduceAnalysing of big data using map reduce
Analysing of big data using map reduce
Paladion Networks
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
sravya raju
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
Prashant Gupta
 
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Simplilearn
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
Philippe Julio
 
Hadoop Installation presentation
Hadoop Installation presentationHadoop Installation presentation
Hadoop Installation presentation
puneet yadav
 
Introduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-SystemIntroduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-System
Md. Hasan Basri (Angel)
 
Big data Hadoop presentation
Big data  Hadoop  presentation Big data  Hadoop  presentation
Big data Hadoop presentation
Shivanee garg
 
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Simplilearn
 

Similar to Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop Tutorial | Simplilearn (20)

Hadoop
HadoopHadoop
Hadoop
RittikaBaksi
 
Chapter2.pdf
Chapter2.pdfChapter2.pdf
Chapter2.pdf
WasyihunSema2
 
Big Data-Session, data engineering and scala
Big Data-Session, data engineering and scalaBig Data-Session, data engineering and scala
Big Data-Session, data engineering and scala
ssusera3b277
 
Big Data & Hadoop
Big Data & HadoopBig Data & Hadoop
Big Data & Hadoop
Ankan Banerjee
 
Introduction to HDFS and MapReduce
Introduction to HDFS and MapReduceIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduce
Derek Chen
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
Subhas Kumar Ghosh
 
Big Data Analytics With Hadoop
Big Data Analytics With HadoopBig Data Analytics With Hadoop

Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop Tutorial | Simplilearn

  • 2. What’s in it for you?
  • 3. What’s in it for you? Why Hadoop?
  • 4. What’s in it for you? Why Hadoop? What is Hadoop?
  • 5. What’s in it for you? Why Hadoop? What is Hadoop? Hadoop HDFS
  • 6. What’s in it for you? Why Hadoop? What is Hadoop? Hadoop HDFS Hadoop MapReduce
  • 7. What’s in it for you? Why Hadoop? What is Hadoop? Hadoop HDFS Hadoop MapReduce Hadoop YARN
  • 8. What’s in it for you? Why Hadoop? What is Hadoop? Hadoop HDFS Hadoop MapReduce Hadoop YARN Use case of Hadoop
  • 9. What’s in it for you? Why Hadoop? What is Hadoop? Hadoop HDFS Hadoop MapReduce Hadoop YARN Use case of Hadoop Demo on HDFS, MapReduce and YARN
  • 10. Why Hadoop?
  • 11. In a town far away...
  • 12. Tim sold food grains in his shop
  • 13. The customers were happy as Tim was very quick with the orders
  • 14. Tim sensed a good demand for other products, so he thought of expanding his business
  • 15. He started selling fruits, vegetables, meat, and dairy products in addition to food grains
  • 16. But it wasn’t as easy as he expected it to be. The number of customers increased, and he was not able to cater to their needs on time
  • 17. He had to look into assisting his customers with each of their orders and billing. It was too difficult for him to manage alone
  • 18. To start delivering orders on time and to manage the customers’ demands, Tim hired 3 more people to work with him
  • 19. Matt took care of the fruits and vegetables section. Luke handled the dairy and meat section. Ann was appointed as the cashier
  • 20. However, this was still not a solution to Tim’s problem as there was not enough space in the shop for all the items
  • 21. The storage area was a bottleneck since storing and accessing items became more and more difficult with increased supply and demand
  • 22. Tim came up with an idea to overcome this issue. He decided to expand the storage area and distribute each category of product on different floors
  • 23. Now the customers were happy: after they picked up their products from the respective sections, their orders were billed
  • 24. Now, let us compare this story to big data
  • 25. Earlier, data was generated at a moderate rate, and all the data was structured in nature. One processor was enough to process all of it
  • 26. With the increase in data generation, different types of data were generated at high speed. It became difficult for a single processor to process different types of data
  • 27. A massive amount of data of different types, which cannot be stored and processed using traditional databases, is known as big data
  • 28. To overcome this issue, multiple processors were used to process each type of data
  • 29. But now the problem was that one storage system was accessed by all the processors and the storage became the bottleneck
  • 30. Just like how Tim adopted the distributed approach, the storage system was also distributed and by doing so, the data was stored in individual databases
  • 31. Through this story, we see the two approaches used by Hadoop: HDFS and MapReduce
  • 32. HDFS refers to the distributed storage space just like how Tim distributed the storage space amongst the various sections
  • 33. Each person took care of a separate section, and at the end the customers went to the cashier for the final billing. This sorted out the process and made it easier. This is how Hadoop MapReduce works
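The shop analogy can be sketched in a few lines of Python. This is only a single-process illustration of the map, group, and reduce steps, not Hadoop's actual API: the order data, the prices, and the function names `map_item`, `shuffle`, `reduce_section`, and `run_job` are all made up for this example. Each section is handled separately (map and group), and the cashier adds up the section subtotals at the end (reduce).

```python
from collections import defaultdict

# Made-up order for illustration only: (section, item, price)
ORDER = [
    ("fruits and vegetables", "apples", 3),
    ("dairy and meat", "milk", 2),
    ("fruits and vegetables", "carrots", 1),
    ("dairy and meat", "chicken", 7),
]


def map_item(record):
    """Map step: emit a (key, value) pair, here (section, price)."""
    section, _item, price = record
    return section, price


def shuffle(pairs):
    """Group all values that share a key, between the map and reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_section(section, prices):
    """Reduce step: combine the values for one key into a single subtotal."""
    return section, sum(prices)


def run_job(records):
    """Run map, shuffle, and reduce in sequence (a real cluster runs them in parallel)."""
    pairs = [map_item(record) for record in records]
    grouped = shuffle(pairs)
    return dict(reduce_section(key, values) for key, values in grouped.items())


subtotals = run_job(ORDER)
print(subtotals)                # subtotal per section
print(sum(subtotals.values()))  # the cashier's final bill
```

In a real Hadoop job the map and reduce tasks would run on different machines; here they run one after another only to show the flow of data.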
  • 34. This was a rough story of big data generation and why Hadoop is required. I will now explain in detail as to what Hadoop is
  • 35. This sounds interesting. I would like to know more about Hadoop
  • 36. What is Hadoop?
  • 37. What is Hadoop? Hadoop is a framework which stores and processes big data in a distributed and parallel fashion
  • 38. What is Hadoop? Hadoop is a framework which stores and processes big data in a distributed and parallel fashion BIG DATA
  • 39. That sounds interesting, so how does Hadoop store and process all of this big data?
  • 40. Hadoop has individual components, which are used for storing and processing big data
  • 41. Components of Hadoop: HDFS (the storage unit of Hadoop), MapReduce, YARN
  • 42. Components of Hadoop: HDFS (the storage unit of Hadoop), MapReduce (the processing unit of Hadoop), YARN
  • 43. Components of Hadoop: HDFS (the storage unit of Hadoop), MapReduce (the processing unit of Hadoop), YARN (the resource management unit of Hadoop)
  • 44. Hadoop HDFS
  • 45. What is HDFS? Hadoop Distributed File System (HDFS) is known for its distributed storage method. It distributes the data amongst many computers. In addition, the data is replicated to avoid data loss. Each block of data holds 128 MB by default and is stored on multiple systems
  • 46. What is HDFS? Let us now see how 500 MB of data is stored in the traditional method
  • 47. What is HDFS? Let us now see how 500 MB of data is stored in the traditional method
  • 48. What is HDFS? Let us now see how 500 MB of data is stored in the traditional method. Here, the entire set of data is stored in one database. This overloads the database, and if it crashes, we lose all our data
  • 49. What is HDFS? Using Hadoop HDFS, this problem is taken care of as data is distributed amongst many systems
  • 50. What is HDFS? Using Hadoop HDFS, this problem is taken care of as the 500 MB of data is distributed amongst many systems. By doing so, no single system is overloaded
  • 51. What is HDFS? Hadoop Distributed File System (HDFS) is specially designed for storing massive datasets on commodity hardware
  • 52. What is HDFS? Hadoop Distributed File System (HDFS) is specially designed for storing massive datasets on commodity hardware. HDFS has two main components that help with its storage: NameNode and DataNode
  • 53. What is HDFS? NameNode is the master of the system. It stores all the metadata
  • 54. What is HDFS? NameNode is the master of the system. It stores all the metadata. DataNode is known as the slave node. There are multiple DataNodes. Each DataNode performs the read/write operations and stores the actual data
  • 55. What is HDFS? NameNode manages all the DataNodes. The DataNodes send signals known as heartbeats to the NameNode. This signal gives the status of the DataNode
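The heartbeat idea can be illustrated with a small Python sketch. It assumes a simple rule that is not stated on the slides: a DataNode counts as alive if its last heartbeat arrived within a chosen timeout. The class name `NameNodeMonitor`, its methods, and the 10-second timeout are hypothetical and are not HDFS's real implementation or default settings.

```python
import time


class NameNodeMonitor:
    """Toy model of a NameNode tracking DataNode heartbeats (illustrative only)."""

    def __init__(self, timeout_seconds):
        # Assumed rule: a node is alive if it reported within this many seconds.
        self.timeout_seconds = timeout_seconds
        self.last_heartbeat = {}

    def receive_heartbeat(self, datanode_id, now=None):
        """Record the time at which a DataNode last reported its status."""
        self.last_heartbeat[datanode_id] = time.monotonic() if now is None else now

    def live_datanodes(self, now=None):
        """Return the DataNodes whose last heartbeat is recent enough."""
        now = time.monotonic() if now is None else now
        return sorted(
            node for node, seen in self.last_heartbeat.items()
            if now - seen <= self.timeout_seconds
        )

    def dead_datanodes(self, now=None):
        """Return the DataNodes that have been silent for longer than the timeout."""
        now = time.monotonic() if now is None else now
        return sorted(
            node for node, seen in self.last_heartbeat.items()
            if now - seen > self.timeout_seconds
        )


monitor = NameNodeMonitor(timeout_seconds=10)
monitor.receive_heartbeat("DataNode 1", now=0)
monitor.receive_heartbeat("DataNode 2", now=0)
monitor.receive_heartbeat("DataNode 1", now=8)  # DataNode 2 stays silent
print(monitor.live_datanodes(now=12))
print(monitor.dead_datanodes(now=12))
```

Passing `now` explicitly keeps the example deterministic; a long-running monitor would rely on the clock instead.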
  • 56. What is HDFS? As mentioned earlier, the actual data is stored in DataNodes. Data is stored in the form of blocks here. The default size of each block is 128 MB
  • 57. What is HDFS? Now, let’s consider storing a file of size 530 MB in HDFS
  • 58. What is HDFS? [Diagram: File.txt, 530 MB]
  • 59. What is HDFS? File.txt is split into blocks: Block A, Block B, Block C, and Block D, each 128 MB
  • 60. What is HDFS? The remaining 18 MB goes into Block E
  • 61. What is HDFS? The final block uses only the remaining space for storage
  • 62. What is HDFS? [Diagram: DataNode 1, DataNode 2, DataNode 3, DataNode 4, DataNode 5]
  • 63. What is HDFS? All these data blocks are stored in DataNodes, which are computers
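
The block arithmetic for the 530 MB file can be checked with a few lines of Python. This is only an illustrative sketch: the function name `split_into_blocks` is made up for this example and is not part of any Hadoop API; the 128 MB default comes from the slides.

```python
def split_into_blocks(file_size_mb, block_size_mb=128):
    """Return the size in MB of each block needed for a file (hypothetical helper)."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        # The final block only takes up the space it actually needs.
        sizes.append(remainder)
    return sizes


blocks = split_into_blocks(530)
for name, size in zip("ABCDE", blocks):
    print(f"Block {name}: {size} MB")
```

For 530 MB this gives four full 128 MB blocks (A to D) and a fifth block (E) of 18 MB, matching the slides.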
  • 64. What happens if the computer that contains block A crashes? Do we lose the data in block A?
  • 65. No, we don’t. That’s the beauty of Hadoop HDFS. It uses replication to prevent the loss of data
  • 66. Replication in HDFS: HDFS overcomes the issue of DataNode failure by creating copies of the data; this is known as the replication method [Diagram: Rack 1, Block A on DN 1]
  • 67. Replication in HDFS: Block A is replicated. The replication factor is 3. The replicas are stored in different DataNodes; two replicas cannot be stored on the same DataNode [Diagram: Rack 1 and Rack 2, Block A on DN 1, DN 5, and DN 6]
  • 68. Replication in HDFS: Similarly, every other block is replicated [Diagram: Rack 1 to Rack 5. Block A on DN 1, DN 5, DN 6; Block B on DN 4, DN 8, DN 9; Block C on DN 7, DN 11, DN 15; Block D on DN 2, DN 10, DN 14; Block E on DN 3, DN 12, DN 13]
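
The rule that no two replicas of a block share a DataNode can be sketched as a toy placement function. Everything here is an assumption made for illustration: the layout of 5 racks with 3 DataNodes each, the name `place_replicas`, and the simple "first replica on one rack, the rest on the next rack" rule. The real HDFS placement policy is more involved and is not reproduced here.

```python
# Hypothetical cluster layout (an assumption): 5 racks, 3 DataNodes per rack.
RACKS = {
    f"Rack {r}": [f"DN {3 * (r - 1) + i}" for i in (1, 2, 3)]
    for r in range(1, 6)
}


def place_replicas(block_index, racks, replication=3):
    """Choose `replication` distinct DataNodes for one block (toy rule).

    The first replica goes to one rack; the remaining replicas go to
    distinct DataNodes on the next rack.
    """
    rack_names = list(racks)
    if len(rack_names) < 2:
        raise ValueError("this sketch needs at least two racks")
    first_rack = racks[rack_names[block_index % len(rack_names)]]
    other_rack = racks[rack_names[(block_index + 1) % len(rack_names)]]
    if len(other_rack) < replication:
        raise ValueError("not enough DataNodes on the second rack")
    return [first_rack[0]] + other_rack[1:replication]


for index, name in enumerate("ABCDE"):
    print(f"Block {name}: {place_replicas(index, RACKS)}")
```

Because each replica lands on a different DataNode, losing any single DataNode still leaves two copies of every block it held.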
  • 69. Architecture of HDFS: The NameNode stores metadata (name, replicas, ...)
  • 70. Architecture of HDFS: [Diagram: the NameNode, which stores metadata, alongside the DataNodes]
  • 71. Architecture of HDFS: A rack is a collection of DataNodes [Diagram: replication between DataNodes in Rack 1 and Rack 2]
  • 72. Architecture of HDFS: The client sends a read request to the NameNode (metadata ops)
  • 73. Architecture of HDFS: The NameNode grants read permission: "Okay, read data from DataNodes"
  • 74. Architecture of HDFS: The client reads the data from the DataNodes: "Here is the data that is read"
  • 75. Architecture of HDFS: [Diagram: one client reads data from the DataNodes, another client writes to the DataNodes, and blocks are replicated between DataNodes]
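
The read path described above (ask the NameNode where the blocks are, then fetch them from the DataNodes) can be sketched with a toy in-memory model. The class and method names below are hypothetical and chosen for this example only; they are not the Hadoop client API.

```python
class NameNode:
    """Holds only metadata: which blocks make up a file and where they live."""

    def __init__(self):
        self.block_map = {}  # file name -> list of (block id, [DataNode names])

    def add_file(self, name, blocks):
        self.block_map[name] = blocks

    def get_block_locations(self, name):
        return self.block_map[name]


class DataNode:
    """Holds the actual block contents."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> bytes


def read_file(name, namenode, datanodes):
    """Client-side read: look up block locations, then fetch each block."""
    data = b""
    for block_id, locations in namenode.get_block_locations(name):
        # Use the first listed DataNode that still holds the block.
        holder = next(dn for dn in locations if block_id in datanodes[dn].blocks)
        data += datanodes[holder].blocks[block_id]
    return data


datanodes = {n: DataNode(n) for n in ("DN 1", "DN 5", "DN 6")}
for dn in datanodes.values():
    dn.blocks["A"] = b"Welcome to Hadoop"
namenode = NameNode()
namenode.add_file("File.txt", [("A", ["DN 1", "DN 5", "DN 6"])])

del datanodes["DN 1"].blocks["A"]  # simulate losing one replica
print(read_file("File.txt", namenode, datanodes))
```

The NameNode never touches file contents in this model; it only answers "where are the blocks?", which is why a lost replica on DN 1 does not prevent the read.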
  • 76. Features of HDFS: fault tolerant, data security, scalability, flexibility. Fault tolerant: HDFS is fault tolerant as multiple copies of data are made
  • 77. Features of HDFS: Data security: Provides end-to-end encryption that protects data
  • 78. Features of HDFS: Scalability: Multiple nodes can be added to the cluster depending on the requirement
  • 79. Features of HDFS: Flexibility: Hadoop is flexible in storing any type of data, like structured, semi-structured, or unstructured data
  • 80. Now that we have stored data in HDFS, how can we process it?
  • 81. For processing data, Hadoop has a unit known as MapReduce
  • 82. Why MapReduce? In the traditional approach, big data was processed at the master node
  • 83. Why MapReduce? [Diagram: big data arriving at one master node with four slave nodes]
  • 84. Why MapReduce? This was a disadvantage as it consumed more time to process various types of data
  • 85. Why MapReduce? To overcome this issue, data was processed at each slave node. This approach is known as MapReduce
  • 86. Hadoop MapReduce
  • 87. What is MapReduce? Hadoop MapReduce is a programming technique in which huge data is processed in a parallel and distributed fashion
  • 88. What is MapReduce? MapReduce tasks: Map tasks and Reduce tasks
  • 89. What is MapReduce? Map and Reduce steps: Input Data, map(), Shuffle and Sort, reduce(), Output Data. Input data is divided to form the input splits
  • 90. What is MapReduce? The map phase is the first phase; here, the data in each split is passed to map() to produce output values
  • 91. What is MapReduce? In the shuffle and sort phase, the output of the mapping phase is taken and similar data is grouped
  • 92. What is MapReduce? In the reduce phase, the output values from the shuffling phase are aggregated. It then returns a single output value for each group
  • 93. What is MapReduce? Let us now see how MapReduce works with an example
  • 94. What is MapReduce? Input data: "Welcome to Hadoop", "Hadoop is interesting", "Hadoop is easy"
  • 95. What is MapReduce? Input splits: (1) Welcome to Hadoop; (2) Hadoop is interesting; (3) Hadoop is easy
  • 96. What is MapReduce? Map phase: split 1 gives (Welcome, 1), (to, 1), (Hadoop, 1); split 2 gives (Hadoop, 1), (is, 1), (interesting, 1); split 3 gives (Hadoop, 1), (is, 1), (easy, 1)
  • 97. What is MapReduce? Shuffle and sort phase: (easy, 1); (Hadoop, 1), (Hadoop, 1), (Hadoop, 1); (interesting, 1); (is, 1), (is, 1); (to, 1); (Welcome, 1)
  • 98. What is MapReduce? Reducer phase: (easy, 1), (Hadoop, 3), (interesting, 1), (is, 2), (to, 1), (Welcome, 1)
  • 99. What is MapReduce? Final output: easy 1, Hadoop 3, interesting 1, is 2, to 1, Welcome 1
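
The word-count example can be traced in plain Python. This is a single-process sketch of the three phases, not Hadoop code: the function names are made up for illustration, and a real MapReduce job would run the map and reduce tasks in parallel across the cluster.

```python
from collections import defaultdict


def map_phase(split):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in split.split()]


def shuffle_and_sort(mapped_pairs):
    """Group the values of all pairs by word, with words in sorted order."""
    groups = defaultdict(list)
    for word, count in mapped_pairs:
        groups[word].append(count)
    # Case-insensitive sort so that "Hadoop" sorts between "easy" and "interesting".
    return dict(sorted(groups.items(), key=lambda item: item[0].lower()))


def reduce_phase(groups):
    """Aggregate the grouped values into one count per word."""
    return {word: sum(counts) for word, counts in groups.items()}


splits = ["Welcome to Hadoop", "Hadoop is interesting", "Hadoop is easy"]
mapped = [pair for split in splits for pair in map_phase(split)]
result = reduce_phase(shuffle_and_sort(mapped))
print(result)
```

Running it on the three input splits gives Hadoop a count of 3, "is" a count of 2, and 1 for each remaining word.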
  • 100. Features of MapReduce: good load balancing, re-execution of tasks, simple programming model. Good load balancing: Splitting the stages into Map and Reduce tasks improves the load balancing
  • 101. Features of MapReduce: Re-execution of tasks: There is an automatic re-execution if a certain task fails
  • 102. Features of MapReduce: Simple programming model: MapReduce has one of the simplest programming models, which is based on Java. Java is a very common programming language
  • 103. HDFS and MapReduce were the two units of Hadoop 1.0
  • 104. Hadoop 1.0 was also known as MapReduce Version 1
  • 105. The disadvantage with this version was that the JobTracker did both the processing of data and resource allocation
  • 106. As a result, the JobTracker was overburdened by handling both job scheduling and resource management
  • 107. To overcome this issue, Hadoop 2 introduced YARN as the resource management layer, which supports many processing frameworks
  • 108. Hadoop YARN
  • 109. What is YARN? Yet Another Resource Negotiator (YARN) acts as the resource management unit of Hadoop
  • 110. What is YARN? Apache YARN consists of the Resource Manager: It is the master daemon. It manages the assignment of resources such as CPU and memory
  • 111. What is YARN? Apache YARN consists of the Node Manager: It is the slave daemon. It reports the resource usage to the Resource Manager
  • 112. What is YARN? Apache YARN consists of the Application Master: It negotiates resources from the Resource Manager and works with the Node Manager
  • 116. What is YARN? A container is a collection of physical resources such as CPU and RAM [Diagram: two clients, one Resource Manager, and three Node Managers hosting containers]
  • 117. What is YARN? The App Master requests containers from the Resource Manager. It uses the containers allocated by the Node Manager
  • 118. What is YARN? The client program sends an application request to the Resource Manager
  • 119. What is YARN? Each Node Manager reports the status of its node to the Resource Manager
  • 120. What is YARN? The Resource Manager contacts the Node Manager to request resources (containers). The Node Manager grants the request
  • 121. What is YARN? The App Master contacts the Node Manager to use the container, and it runs in one of the containers allocated on one of the nodes
  • 122. Features of YARN: job scheduling, multitenancy, scalability. Job scheduling: YARN is responsible for processing job requests and allocating resources
  • 123. Features of YARN: Multitenancy: Different versions of MapReduce can run on YARN. This makes upgrading MapReduce manageable
  • 124. Features of YARN: Scalability: Depending on the requirement, the number of nodes can be increased
  • 125. Many companies use Hadoop for storing and processing data. Now, let me tell you about one such company
  • 126. Use case - Pinterest
  • 127. You have probably heard of the popular image-sharing website Pinterest
  • 129. Pinterest is a social media platform which allows you to pin any interesting information you find on its site
  • 130. Pinterest has more than 250 million users and nearly 30 billion pins. All of this amounts to big data for Pinterest
  • 131. Problem: Pinterest faced a challenge in processing a tremendous amount of data
  • 132. Problem: It was difficult to analyze which data needs to be displayed in a user’s personalized discovery engine
  • 134. Solution: Pinterest uses Hadoop to process and analyze big data in a way that helps the company show the most relevant content to its users
  • 135. Solution: Through continuous analysis of the data, Pinterest can provide its users with features such as related pins, guided search, and so on
  • 136. This is how Pinterest benefited from Hadoop. Let’s also start using Hadoop to put an end to the big data challenges we are facing
  • 137. Demo on HDFS, MapReduce and YARN

Editor's Notes