Lesson 1
Hadoop
“Big Data Analytics”, Ch.02 L01: Introduction To Hadoop, 2019
Raj Kamal and Preeti Saxena, © McGraw-Hill Higher Edu. India
Big Data Programming Model
• Pieces of code, as well as the data,
are distributed across the computing nodes
• Distributed data storage systems do not
use the concept of joins
• Hadoop provides that model
Big Data Distributed Computing Model in Hadoop
• A distributed model that requires no
sharing between data nodes
• The multiple tasks of an application are
also distributed: they run on the machines
associated with multiple data nodes
and execute at the same time, in
parallel
Big Data Storage Model in Hadoop
• Data is partitioned into data blocks,
which are written to one set of nodes
• The blocks are replicated at multiple
nodes to guard against network faults
(when a network fault occurs, a node
holding a replica makes the data
available)
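The storage model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`split_into_blocks`, `place_replicas`, `readable_blocks`), the node names, the tiny block size and the round-robin placement are all hypothetical, and real HDFS uses a rack-aware placement policy rather than this one.

```python
from typing import Dict, List, Set


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Partition data into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: List[str], replication: int) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes, round-robin (assumed policy)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
    return placement


def readable_blocks(placement: Dict[int, List[str]], failed: Set[str]) -> List[int]:
    """Return the ids of blocks that still have a replica on a reachable node."""
    return [b for b, holders in placement.items() if any(n not in failed for n in holders)]


blocks = split_into_blocks(b"abcdefghij", block_size=4)  # blocks of 4, 4 and 2 bytes
placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"], replication=3)
# With node2 unreachable, every block is still served by another replica.
available = readable_blocks(placement, failed={"node2"})
print(placement)
print(available)
```

Because each block lives on three different nodes, losing any single node leaves at least two copies of every block reachable.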
Big Data Computing Model
• Fault tolerant due to replication
• Follows the CAP theorem: of the three
properties (consistency, availability
and partition tolerance), a distributed
system can guarantee at most two at the
same time
Hadoop
• Hadoop consists of two components:
one stores data in blocks in the
clusters, and the other runs
computations at each individual cluster
in parallel with the others
• Hadoop system uses the Big Data
programming and storage models
Hadoop
• Jobs or tasks are assigned and
scheduled on the same servers that hold
the data
• The system provides faster results
from Big Data and from unstructured
data as well
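A minimal sketch of scheduling tasks on the servers that hold the data, assuming a toy map from block names to the nodes storing them. The function `schedule_tasks`, the block and node names, and the least-loaded tie-breaking rule are hypothetical and are not Hadoop's actual scheduler.

```python
from typing import Dict, List


def schedule_tasks(block_locations: Dict[str, List[str]]) -> Dict[str, str]:
    """Assign one task per block to the least-loaded node that already holds the block."""
    load: Dict[str, int] = {}
    assignment: Dict[str, str] = {}
    for block, holders in block_locations.items():
        # Only nodes that store the block are candidates, so no data has to move.
        # On a tie, min() keeps the first holder listed.
        chosen = min(holders, key=lambda node: load.get(node, 0))
        assignment[block] = chosen
        load[chosen] = load.get(chosen, 0) + 1
    return assignment


locations = {
    "block-0": ["node1", "node2"],
    "block-1": ["node1", "node3"],
    "block-2": ["node2", "node3"],
}
assignment = schedule_tasks(locations)
print(assignment)
```

Every task lands on a node that already stores its block, which is why the computation does not wait on a network transfer of the input.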
Hadoop Infrastructure
• Execution of instructions in two
interrelated entities, such as a query
and the database
• Cloud for clusters
• A cluster consists of sets of computers
or PCs
Hadoop Platform
• Provides a low-cost Big Data
platform, which is open source and
uses cloud services
Hadoop
• Processing terabytes of data takes
just a few minutes
• Hadoop enables distributed processing
of large datasets (above 10 million
bytes) across clusters of computers
using a programming model called
MapReduce.
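The MapReduce idea can be imitated in a single Python process as a word count. This is a sketch under stated assumptions, not the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, the input splits are short strings, and a thread pool stands in for the cluster nodes that would run map tasks in parallel.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def map_phase(split: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine the values of one key into a single count."""
    return key, sum(values)


def word_count(splits: List[str]) -> Dict[str, int]:
    with ThreadPoolExecutor() as pool:
        # Each split is mapped independently, as a separate task.
        mapped = pool.map(map_phase, splits)
        pairs = [pair for result in mapped for pair in result]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data hadoop", "hadoop stores big data", "hadoop"])
print(counts)
```

The map tasks never need to see each other's input, which is what lets the same program spread over many machines.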
Hadoop System Characteristics
• Scalable
• Self-manageable
• Self-healing
• Distributed file system
Scalability
• Means the system can be scaled up
(enhanced) by adding storage and
processing units as per the requirements
Self Manageability
• Means the system itself creates the
storage and processing resources, and
schedules, reduces or increases them
as they are used
Self Healing
• Means faults are taken care of by the
system itself
• Keeps the system functioning and its
resources available
• Software detects and handles failures
at the task level, and also enables
task execution to continue after a
communication failure
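Task-level failure handling can be sketched as retrying a task on another node when one node fails. Everything here is a hypothetical illustration: the `NodeFailure` exception, the `run_with_retry` helper, the node names and the simulated failure on `node1` are assumptions, not Hadoop's actual mechanism.

```python
from typing import Callable, List


class NodeFailure(Exception):
    """Raised by a (hypothetical) node that cannot run a task."""


def run_with_retry(task: Callable[[str], str], nodes: List[str]) -> str:
    """Try the task on each candidate node in turn until one succeeds."""
    errors = []
    for node in nodes:
        try:
            return task(node)
        except NodeFailure as exc:
            # Record the failure and fall through to the next node.
            errors.append(f"{node}: {exc}")
    raise RuntimeError("task failed on all nodes: " + "; ".join(errors))


def count_records(node: str) -> str:
    # Simulate a communication failure on node1 only.
    if node == "node1":
        raise NodeFailure("connection lost")
    return f"done on {node}"


result = run_with_retry(count_records, ["node1", "node2", "node3"])
print(result)
```

The caller only sees the final result; the failure on the first node is absorbed by the retry loop.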
Hadoop Hardware Need
• The hardware scales up from a single
server to thousands of machines that
form the clusters
• Each cluster stores a large number of
data blocks in racks. The default data
block size is 64 MB (in Hadoop 1.x;
128 MB from Hadoop 2.x onwards).
• IBM BigInsights, built on Hadoop,
uses a default block size of 128 MB.
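A short worked calculation shows how the block size determines the number of blocks a file occupies. The helper `block_count` and the 1 GB and 200 MB file sizes are illustrative assumptions; the 64 MB and 128 MB block sizes are the ones quoted above.

```python
MB = 1024 * 1024


def block_count(file_size_bytes: int, block_size_bytes: int) -> int:
    """Number of blocks needed for a file; the last block may be partly filled."""
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes


one_gb = 1024 * MB
print(block_count(one_gb, 64 * MB))     # 16 blocks of 64 MB
print(block_count(one_gb, 128 * MB))    # 8 blocks of 128 MB
print(block_count(200 * MB, 64 * MB))   # 4 blocks: 64 + 64 + 64 + 8 MB
print(block_count(200 * MB, 128 * MB))  # 2 blocks: 128 + 72 MB
```

Doubling the block size halves the number of blocks for a large file, so there are fewer blocks to track for the same data.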
Big Data analytics applications
• Software applications that leverage
large-scale data
• The applications analyze Big Data
using massive parallel processing
frameworks
• Hadoop provides that framework
Hadoop Framework
• Provides the computing features of a
system of distributed, flexible,
scalable, fault tolerant computing with
high computing power
• Provides an efficient platform for the
distributed storage and processing of
large amounts of data
Hadoop Big Data storage and cluster computing
• Manages large-sized data, both
structured and unstructured, in
different formats such as XML, JSON
and text, efficiently and effectively
• Performs better with clusters of many
servers when the focus is on
horizontal scalability
Figure 2.1 Core components of Hadoop
Hadoop
• Open Source Framework
• Java and Linux based: Hadoop uses Java
interfaces
• The base is Linux, but Hadoop supports
its own set of shell commands
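To give a flavour of Hadoop's own shell commands, the sketch below builds a few `hdfs dfs` command lines in Python and prints them without running anything. The helper `hdfs_command`, the local file name and the `/user/demo/input` directory are hypothetical; the `-mkdir`, `-put`, `-ls` and `-cat` options belong to the Hadoop file system shell.

```python
from typing import List


def hdfs_command(*args: str) -> List[str]:
    """Build the argument list for an `hdfs dfs` file system shell command."""
    return ["hdfs", "dfs", *args]


commands = [
    hdfs_command("-mkdir", "-p", "/user/demo/input"),           # create a directory
    hdfs_command("-put", "localfile.txt", "/user/demo/input"),  # copy a local file into HDFS
    hdfs_command("-ls", "/user/demo/input"),                    # list the directory
    hdfs_command("-cat", "/user/demo/input/localfile.txt"),     # print the file contents
]

# Dry run: print each command instead of executing it.
for cmd in commands:
    print(" ".join(cmd))
```

On a machine with Hadoop installed, each list could be passed to `subprocess.run` to execute the command.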
Figure 2.2 Hadoop main components and
ecosystem components
Summary
We learnt
• The Hadoop distributed model, with
pieces of code as well as the data at
the computing nodes, which requires no
sharing between data nodes
• How Hadoop distributes multiple tasks,
which run on the machines associated
with the data nodes and execute at the
same time, in parallel
Summary
We learnt
• Partitionability
• Replication of Data
• Java and Linux base, and Hadoop shell
commands
• Hadoop Core Components and
Ecosystem Tools
End of Lesson 1 on
Hadoop