HADOOP FILE SYSTEM

The document outlines the data flow for reading and writing files in HDFS, detailing the interactions between the client, namenode, and datanodes. For reading, the client calls open() to obtain an FSDataInputStream; for writing, it calls create() to establish a new file and obtain an FSDataOutputStream. It also describes the master/slave architecture of HDFS, highlighting the roles of the namenode and datanodes in managing file storage and access.

Uploaded by

rbsraja
Copyright
© All Rights Reserved
DATAFLOW OF FILE READ IN HDFS
To get an idea of how data flows between the client
interacting with HDFS, the namenode, and the datanodes,
consider the diagram below, which shows the main
sequence of events when reading a file.
The client opens the file it wishes to read by calling
open() on the FileSystem object, which for HDFS is an
instance of DistributedFileSystem.

The DistributedFileSystem returns an FSDataInputStream
to the client for it to read data from.
FSDataInputStream in turn wraps a DFSInputStream,
which manages the datanode and namenode I/O.
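
The layering described above can be sketched with a toy in-memory model. This is not the Hadoop API: ToyReadNamenode, ToyBlockInputStream, and ToyReadDemo are hypothetical names, each datanode is a plain map, and java.io.DataInputStream merely stands in for the outer wrapper stream. The sketch assumes the inner stream asks the namenode where each block lives and then fetches the bytes from that datanode.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these classes belong to Hadoop.
class ToyReadNamenode {
    // file path -> ordered block IDs (metadata only, never file contents)
    final Map<String, List<String>> blocksOfFile = new HashMap<>();
    // block ID -> name of the datanode that stores it
    final Map<String, String> datanodeOfBlock = new HashMap<>();
}

// Plays the part of the inner stream: it consults the namenode for block
// locations and fetches the bytes from the datanodes, one block at a time.
class ToyBlockInputStream extends InputStream {
    private final List<String> blockIds;
    private final ToyReadNamenode namenode;
    private final Map<String, Map<String, byte[]>> datanodes;
    private int nextBlock = 0;
    private byte[] current = new byte[0];
    private int offset = 0;

    ToyBlockInputStream(String path, ToyReadNamenode namenode,
                        Map<String, Map<String, byte[]>> datanodes) {
        this.blockIds = namenode.blocksOfFile.get(path);
        if (blockIds == null) {
            throw new IllegalArgumentException("no such file: " + path);
        }
        this.namenode = namenode;
        this.datanodes = datanodes;
    }

    @Override
    public int read() {
        while (offset == current.length) {
            if (nextBlock == blockIds.size()) {
                return -1; // every block has been consumed
            }
            String id = blockIds.get(nextBlock++);
            current = datanodes.get(namenode.datanodeOfBlock.get(id)).get(id);
            offset = 0;
        }
        return current[offset++] & 0xFF;
    }
}

class ToyReadDemo {
    // Stand-in for open(): hands the client a wrapper around the inner stream.
    static DataInputStream open(String path, ToyReadNamenode namenode,
                                Map<String, Map<String, byte[]>> datanodes) {
        return new DataInputStream(new ToyBlockInputStream(path, namenode, datanodes));
    }

    // Builds a two-datanode cluster holding one two-block file, then reads it.
    static String readSampleFile() {
        Map<String, Map<String, byte[]>> datanodes = new HashMap<>();
        datanodes.put("dn1", new HashMap<>());
        datanodes.put("dn2", new HashMap<>());
        datanodes.get("dn1").put("blk_1", "hello ".getBytes(StandardCharsets.UTF_8));
        datanodes.get("dn2").put("blk_2", "hdfs".getBytes(StandardCharsets.UTF_8));

        ToyReadNamenode namenode = new ToyReadNamenode();
        namenode.blocksOfFile.put("/demo/greeting.txt", Arrays.asList("blk_1", "blk_2"));
        namenode.datanodeOfBlock.put("blk_1", "dn1");
        namenode.datanodeOfBlock.put("blk_2", "dn2");

        try (DataInputStream in = open("/demo/greeting.txt", namenode, datanodes)) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            int b;
            while ((b = in.read()) != -1) {
                buffer.write(b);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The client only ever touches the wrapper returned by open(); the hop from the first datanode to the second happens inside the inner stream, which is the point of the wrapping.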
[Diagram: dataflow of a file read in HDFS]
DATAFLOW OF FILE WRITE IN HDFS
We consider the case of creating a new file, writing data
to it, and then closing the file.
The client creates the file by calling create() on
DistributedFileSystem, which asks the namenode to create
the new file in the filesystem namespace.
The namenode performs various checks to make sure the
file doesn’t already exist, and that the client has the right
permissions to create the file. If these checks pass, the
namenode makes a record of the new file.
The DistributedFileSystem returns an FSDataOutputStream for
the client to start writing data to. Just as in the read case,
FSDataOutputStream wraps a DFSOutputStream, which
handles communication with the datanodes and
namenode.
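
The create-time checks can be sketched with a toy model. This is not the Hadoop API: ToyWriteNamenode and ToyWriteClient are hypothetical names, the permission check is reduced to a set of allowed user names (an assumption made for brevity), and a ByteArrayOutputStream stands in for datanode storage. It shows the order of events only: the namenode checks and records the file first, and only then does the client receive a stream to write to.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

// Toy model only: none of these classes belong to Hadoop.
class ToyWriteNamenode {
    private final Set<String> namespace = new HashSet<>();
    // Simplified permission model (assumption): a set of users allowed to create files.
    private final Set<String> usersAllowedToCreate = new HashSet<>();

    void allow(String user) {
        usersAllowedToCreate.add(user);
    }

    boolean exists(String path) {
        return namespace.contains(path);
    }

    // Runs both checks; only if they pass is the new file recorded.
    void recordNewFile(String path, String user) {
        if (namespace.contains(path)) {
            throw new IllegalStateException("file already exists: " + path);
        }
        if (!usersAllowedToCreate.contains(user)) {
            throw new SecurityException("permission denied for user: " + user);
        }
        namespace.add(path);
    }
}

class ToyWriteClient {
    // Stand-in for create(): ask the namenode first, then return a wrapper stream.
    static DataOutputStream create(ToyWriteNamenode namenode, String path,
                                   String user, ByteArrayOutputStream datanodeStorage) {
        namenode.recordNewFile(path, user);
        return new DataOutputStream(datanodeStorage);
    }

    // Create, write, close: the three steps of the case described above.
    static byte[] writeFile(ToyWriteNamenode namenode, String path,
                            String user, String text) {
        ByteArrayOutputStream datanodeStorage = new ByteArrayOutputStream();
        try (DataOutputStream out = create(namenode, path, user, datanodeStorage)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return datanodeStorage.toByteArray();
    }
}
```

In this sketch a second create() on the same path fails with IllegalStateException, and a user who was never passed to allow() gets a SecurityException before any file is recorded.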
NAMENODE AND DATANODES
• Master/slave architecture
• An HDFS cluster consists of a single Namenode, a master
server that manages the file system namespace and
regulates access to files by clients.
• There are a number of DataNodes, usually one per node in
the cluster.
• The DataNodes manage storage attached to the nodes
that they run on.
• HDFS exposes a file system namespace and allows user
data to be stored in files.
• A file is split into one or more blocks, and these blocks are
stored in DataNodes.
• DataNodes serve read and write requests and perform block
creation, deletion, and replication upon instruction from
the Namenode.
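
Splitting a file into blocks and spreading them over DataNodes can be sketched as follows. ToyBlockPlacement is a hypothetical name, and the round-robin assignment is an assumption chosen for simplicity; it is not the placement policy HDFS uses, and replication is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: not Hadoop code, and round-robin is not HDFS's placement policy.
class ToyBlockPlacement {
    // Splits file contents into blocks of at most blockSize bytes each.
    static List<byte[]> split(byte[] data, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<byte[]> blocks = new ArrayList<>();
        int start = 0;
        while (start < data.length) {
            int end = start + Math.min(blockSize, data.length - start);
            blocks.add(Arrays.copyOfRange(data, start, end));
            start = end;
        }
        return blocks;
    }

    // Returns, for each block index, the index of the datanode chosen to store it.
    static int[] place(int blockCount, int datanodeCount) {
        if (datanodeCount <= 0) {
            throw new IllegalArgumentException("datanodeCount must be positive");
        }
        int[] assignment = new int[blockCount];
        for (int i = 0; i < blockCount; i++) {
            assignment[i] = i % datanodeCount; // round-robin over the datanodes
        }
        return assignment;
    }
}
```

With a 4-byte block size, a 10-byte file becomes three blocks of 4, 4, and 2 bytes, and on a two-node cluster place(3, 2) assigns them to datanodes 0, 1, and 0.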
