Hadoop

Hadoop is an open-source software framework for distributed storage and processing of large datasets across clusters of computers. It uses MapReduce as its programming model and HDFS as its distributed file system. HDFS stores large files across the cluster and replicates data for reliability, while MapReduce processes datasets in parallel in a fault-tolerant manner. A typical Hadoop cluster integrates these components: a master node runs the JobTracker and NameNode, and slave nodes run TaskTrackers and DataNodes.

Hadoop/MapReduce

Object-oriented framework presentation

CSCI 5448
Casey McTaggart
What is Apache Hadoop?
• Large scale, open source software framework
▫ Yahoo! has been the largest contributor to date
• Dedicated to scalable, distributed, data-intensive
computing
• Handles thousands of nodes and petabytes of data
• Supports applications under a free license
• 3 Hadoop subprojects:
▫ Hadoop Common: common utilities package
▫ HDFS: Hadoop Distributed File System with high
throughput access to application data
▫ MapReduce: A software framework for distributed
processing of large data sets on computer clusters
Hadoop MapReduce
• MapReduce is a programming model and software
framework first developed by Google (Google’s
MapReduce paper was published in 2004)
• Intended to facilitate and simplify the processing of
vast amounts of data in parallel on large clusters of
commodity hardware in a reliable, fault-tolerant
manner
▫ Petabytes of data
▫ Thousands of nodes
• Computational processing occurs on both:
▫ Unstructured data: filesystem
▫ Structured data: database
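
The programming model above can be illustrated with a minimal word-count sketch in plain Python. It does not use Hadoop's API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and everything runs in a single process. On a real cluster the framework would run many map and reduce tasks in parallel and perform the grouping step itself.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key (done by the framework between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of text records."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog", "the fox"]))
```

Because each map call sees only one record and each reduce call sees only one key, both phases can be spread over many machines without the tasks needing to coordinate with each other.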
Hadoop Distributed File System (HDFS)
• Inspired by Google File System
• Scalable, distributed, portable filesystem written in Java for
Hadoop framework
▫ Primary distributed storage used by Hadoop applications
• HDFS can be part of a Hadoop cluster or can be a stand-alone
general purpose distributed file system
• An HDFS cluster primarily consists of
▫ NameNode that manages file system metadata
▫ DataNode that stores actual data
• Stores very large files in blocks across machines in a large
cluster
▫ Reliability and fault tolerance ensured by replicating data across
multiple hosts
• Has data awareness between nodes
• Designed to be deployed on low-cost hardware
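
The block storage and replication described above can be sketched as follows. This is a simplified model, not HDFS code: the block size and replication factor are assumed values (128 MB and 3 are common defaults, but both are configurable), the function names are hypothetical, and the round-robin placement is a stand-in for the real placement policy, which also takes rack layout into account.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size in bytes
REPLICATION = 3                 # assumed replication factor


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of file_size bytes is cut into."""
    full_blocks, remainder = divmod(file_size, block_size)
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Assign each block to `replication` distinct DataNodes, round-robin.

    This mapping (block id -> DataNodes) is the kind of metadata the NameNode
    keeps; the DataNodes hold the block contents themselves.
    """
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication")
    return {
        block_id: [datanodes[(block_id + r) % len(datanodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(300 * 1024 * 1024)
    print(len(blocks), "blocks")
    print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
```

With every block held on several hosts, losing a single DataNode leaves at least one other copy of each block it stored.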
More on Hadoop file systems
• Hadoop can work directly with any distributed
file system which can be mounted by the
underlying OS
• However, doing this means a loss of locality as
Hadoop needs to know which servers are closest
to the data
• Hadoop-specific file systems like HDFS are
developed for locality, speed, fault tolerance,
integration with Hadoop, and reliability
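
Why locality matters can be shown with a small scheduling sketch. It is illustrative only and not Hadoop's scheduler: `pick_node` and its arguments are hypothetical names. The idea is that a task is preferably sent to a node that already stores the block it reads, and only falls back to a remote node (which must fetch the data over the network) when no such node has a free slot.

```python
def pick_node(block_id, block_locations, free_slots):
    """Choose a node for the task that reads block_id.

    block_locations: dict mapping block id -> list of nodes holding a replica
    free_slots: dict mapping node -> number of free task slots
    Returns (node, is_local), or (None, False) if no slot is free anywhere.
    """
    # Prefer a node that already holds the data (data-local task).
    for node in block_locations.get(block_id, []):
        if free_slots.get(node, 0) > 0:
            return node, True
    # Otherwise fall back to any node with a free slot (remote read).
    for node, slots in free_slots.items():
        if slots > 0:
            return node, False
    return None, False


if __name__ == "__main__":
    locations = {0: ["dn1", "dn2"]}
    print(pick_node(0, locations, {"dn1": 0, "dn2": 1, "dn3": 2}))
```

A file system that cannot report which servers hold which blocks leaves `block_locations` empty, so every task takes the remote fallback path.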
Typical Hadoop cluster integrates
MapReduce and HDFS
• Master/slave architecture
• Master node contains
▫ Job tracker node (MapReduce layer)
▫ Task tracker node (MapReduce layer)
▫ Name node (HDFS layer)
▫ Data node (HDFS layer)
• Multiple slave nodes contain
▫ Task tracker node (MapReduce layer)
▫ Data node (HDFS layer)
• MapReduce layer has job and task tracker nodes
• HDFS layer has name and data nodes
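Hadoop simple cluster graphic
[Figure: one master node and 1..* slave nodes. The master node runs JobTracker and TaskTracker (MapReduce layer) and NameNode and DataNode (HDFS layer). Each slave node runs TaskTracker (MapReduce layer) and DataNode (HDFS layer).]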
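
The simple cluster layout above can also be written down as a small data model. This is only a sketch of the roles listed on the slide, with hypothetical names (`Node`, `build_cluster`, `nodes_running`); it does not start or configure any Hadoop daemon.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One machine in the cluster and the daemons it runs."""
    name: str
    daemons: set = field(default_factory=set)


def build_cluster(num_slaves):
    """Build the simple layout: one master plus num_slaves slave nodes."""
    master = Node("master", {"JobTracker", "TaskTracker", "NameNode", "DataNode"})
    slaves = [
        Node(f"slave{i}", {"TaskTracker", "DataNode"})
        for i in range(1, num_slaves + 1)
    ]
    return master, slaves


def nodes_running(daemon, master, slaves):
    """List the names of all nodes that run the given daemon."""
    return [node.name for node in [master, *slaves] if daemon in node.daemons]


if __name__ == "__main__":
    master, slaves = build_cluster(2)
    print(nodes_running("NameNode", master, slaves))
    print(nodes_running("DataNode", master, slaves))
```

In this layout the JobTracker and NameNode exist only on the master, while TaskTrackers and DataNodes appear on every node.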
