Big Data Analytics (15CS82)
Venugopala Rao A S
Dept. of CSE, SMVITM, Bantakal
Module 1
• During any ongoing data transfer, the NameNode monitors the
DataNodes by listening for heartbeats sent from them.
• If the NameNode does not receive a heartbeat from a specific
DataNode, this indicates a potential node failure.
• In such a case, the NameNode will start re-replicating the now-
missing blocks.
• Because the file system is redundant, DataNodes can be taken
offline (decommissioned) for maintenance by informing the
NameNode which DataNodes to exclude from the HDFS pool, as
sketched below.
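• A minimal decommissioning sketch, assuming the NameNode's
dfs.hosts.exclude property points at /etc/hadoop/conf/dfs.exclude
(the path and hostname below are illustrative):

    # Add the DataNode to the exclude file read by the NameNode
    echo "dn07.cluster.local" >> /etc/hadoop/conf/dfs.exclude

    # Tell the NameNode to re-read its host lists; decommissioning begins
    hdfs dfsadmin -refreshNodes

    # Watch the node move from "Decommission in progress" to "Decommissioned"
    hdfs dfsadmin -report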
• The mappings between data blocks and the physical
DataNodes are not kept in persistent storage on the NameNode.
• When the NameNode starts up, each DataNode provides a
block report (built from the block data each DataNode keeps in
its own persistent storage) to the NameNode.
• The block reports are sent every 10 heartbeats (this interval is
configurable).
• The reports enable the NameNode to keep an up-to-date
account of all data blocks in the cluster.
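• The intervals involved can be inspected with the hdfs getconf
tool; a small sketch, assuming the standard Hadoop 2.x property
names:

    # Seconds between DataNode heartbeats
    hdfs getconf -confKey dfs.heartbeat.interval

    # Milliseconds between full block reports
    hdfs getconf -confKey dfs.blockreport.intervalMsec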
• SecondaryNameNode
• Available in almost all Hadoop deployments.
• Not explicitly required by the NameNode, but recommended.
• The name “SecondaryNameNode” (CheckPointNode) is
somewhat misleading.
• It is not an active failover node and cannot replace the primary
NameNode in case of its failure.
• Its purpose is to perform periodic checkpoints that evaluate
the status of the NameNode.
• Note that the NameNode keeps all system metadata in memory
for fast access.
• It also has two disk files that track changes to the metadata:
• The first is an image of the file system state when the NameNode
was started.
• This file begins with fsimage_* and is used only at startup by
the NameNode.
• The second is a series of modifications made to the file system
after the NameNode started.
• These files begin with edits_* and reflect the changes made
after the fsimage_* file was read, as shown in the listing below.
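• Both kinds of files can be seen by listing the NameNode's
metadata directory; a sketch, assuming a typical
dfs.namenode.name.dir location (path and transaction IDs are
illustrative):

    ls /var/lib/hadoop-hdfs/namenode/current
    # fsimage_0000000000000000042                    <- state at last checkpoint
    # fsimage_0000000000000000042.md5
    # edits_0000000000000000043-0000000000000000051  <- finalized edit segments
    # edits_inprogress_0000000000000000052           <- changes since the last roll
    # seen_txid
    # VERSION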
• The SecondaryNameNode periodically downloads the fsimage and
edits files, merges them into a new fsimage, and uploads the new
fsimage file to the NameNode.
• Thus, when the NameNode restarts, the fsimage file is
reasonably up to date and requires only the edit logs recorded
since the last checkpoint to be applied.
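• A checkpoint can also be forced by hand with dfsadmin; a
minimal sketch (entering safe mode pauses namespace changes, so
this is best tried on a quiet or test cluster):

    hdfs dfsadmin -safemode enter    # stop accepting namespace changes
    hdfs dfsadmin -saveNamespace     # merge the edits into a new on-disk fsimage
    hdfs dfsadmin -safemode leave    # resume normal operation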
• Thus, in the absence of a SecondaryNameNode, a restart of the
NameNode could take a long time due to the number of changes
made to the file system since the last checkpoint.
• To summarize the various roles in HDFS:
• HDFS uses a master/slave model designed for large file
reading/streaming.
• The NameNode is a metadata server or “data traffic cop.”
• HDFS provides a single namespace that is managed by the
NameNode.
• Data is redundantly stored on DataNodes; there is no data on
the NameNode.
• The SecondaryNameNode performs checkpoints of the
NameNode's file system state but is not a failover node.
• HDFS Block Replication
• We saw that when HDFS writes a file, the file is replicated across
the cluster.
• The amount of replication is based on the value of
dfs.replication in the hdfs-site.xml file.
• This default value can be overridden with the hdfs dfs -setrep
command (see the sketch after this list).
• For a Hadoop cluster
  • containing more than eight DataNodes, the replication value is usually
  set to 3
  • containing eight or fewer DataNodes (but more than one), the
  replication factor may be set to 2
• For a single machine, the replication factor is set to 1
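• For example, the replication factor of a single file can be
changed and verified as follows (the path /user/demo/data.txt is
illustrative):

    # Set replication to 2 and wait (-w) for re-replication to finish
    hdfs dfs -setrep -w 2 /user/demo/data.txt

    # Confirm the change; %r prints the file's replication factor
    hdfs dfs -stat %r /user/demo/data.txt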
• If several machines are involved in serving a file and any one
of those machines goes down, the file could be rendered
unavailable.
• HDFS overcomes this problem by replicating each block
across a number of machines (three is the default).
• The HDFS default block size is often 64MB.
• Note that the HDFS default block size is not the minimum
block size.
• If a 20KB file is written to HDFS, it will create a block that is
approximately 20KB in size.
• On the other hand, if a file of size 80MB is written to HDFS, a
64MB block and a 16MB block will be created
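• The blocks behind a file can be inspected with fsck; for an
80MB file and a 64MB block size, the report shows two blocks,
roughly as sketched here (file name and block IDs are
illustrative):

    hdfs fsck /user/demo/80mb.dat -files -blocks
    # /user/demo/80mb.dat 83886080 bytes, 2 block(s):  OK
    # 0. BP-...:blk_..._... len=67108864
    # 1. BP-...:blk_..._... len=16777216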
• HDFS blocks are not exactly the same as the data splits used
by the MapReduce process.
• The HDFS blocks are based on size, while the splits are based
on a logical partitioning of the data.
• That is, if a file contains discrete records, the logical split
ensures that a record is not split across two separate servers
during processing.
• Each HDFS block may consist of one or more splits (see the
job-level sketch below).
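• Because splits belong to the MapReduce job rather than to
HDFS, their size can be tuned per job; a sketch using the stock
wordcount example (the jar and HDFS paths are assumptions):

    yarn jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
        wordcount \
        -D mapreduce.input.fileinputformat.split.minsize=134217728 \
        /user/demo/input /user/demo/output
    # With a 64MB block size, a 128MB minimum split size makes each map
    # task process two HDFS blocks as a single logical split.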
• An example of HDFS block replication is shown in the
accompanying figure.