Recover From Namenode Failure
This document discusses recovering from Namenode failure in Hadoop. It explains that the Namenode is a single point of failure and stores all HDFS metadata. If it fails, the entire cluster becomes unavailable. However, checkpoints of the metadata are periodically taken by the Secondary Namenode and stored on its local disk. These checkpoints can be used to recover the Namenode by copying the latest fsimage file from the Secondary Namenode to the primary Namenode before restarting it and the HDFS services.
Agenda
What is Namenode
Responsibility of Namenode
Single point of failure
Causes of Namenode failures
Namenode recovery
Role of Secondary Namenode
FsImage & Edits files
Checkpoints in Hadoop
Creating checkpoints
Recovery with the help of checkpoint

What is Namenode
The Namenode is a process which runs on the master machine of a Hadoop cluster.
We need to contact the Namenode for any read/write operation in HDFS.
The Namenode keeps metadata of the data which is stored in HDFS.
The Namenode coordinates with Datanodes to read/write data in HDFS.
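Every HDFS client command goes through the Namenode first. A small sketch from the shell (the file and path names are illustrative):

  # Write: the client asks the Namenode where to place the blocks,
  # then streams the data to the Datanodes the Namenode chose.
  hadoop fs -put data.txt /user/hadoop/data.txt

  # Read: the client asks the Namenode for the block locations,
  # then fetches the blocks directly from the Datanodes.
  hadoop fs -cat /user/hadoop/data.txt

Responsibility of a Namenode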
The Namenode keeps a block map of all the files in HDFS.
It contacts each Datanode and asks for a block report.
It builds a cluster-wide block report from all the Datanode block reports.
It keeps a list of live nodes and dead nodes.
It balances the storage of the Hadoop cluster.
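A quick way to see the Namenode's view of the cluster from the command line:

  # Print the Namenode's view of the cluster: total and used
  # capacity, plus a per-Datanode section with its status
  # (live or dead) and last contact time.
  hadoop dfsadmin -report

  # Walk the namespace and show how each file maps to blocks
  # and which Datanodes hold each replica.
  hadoop fsck / -files -blocks -locations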
Single point of failure
The Namenode is a single point of failure in a Hadoop cluster.
The Hadoop cluster is not accessible if the Namenode is down.
We can't do any read/write operation, even though the Datanodes still have all the data.
Hot backup is not yet supported in Hadoop.
Causes of Namenode failure
The master machine can stop working due to a hardware problem.
The Namenode metadata can get corrupted.
Without metadata, the Namenode is not capable of finding the data in HDFS.
We can't contact the Datanodes directly for data.
A checkpoint can be used to recover the metadata.
Namenode recovery
The Namenode must be recovered in order to access HDFS.
The Hadoop cluster will remain offline until we recover the Namenode.
The Secondary Namenode can help the Namenode to recover.
We can only recover up to the last saved checkpoint; any changes made after that checkpoint are lost.
Still, stale data is far better than no data. A recovery sketch follows below.
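A minimal recovery sketch, assuming a Hadoop 1.x layout where dfs.name.dir points at /data/dfs/name on the Namenode and fs.checkpoint.dir points at /data/dfs/namesecondary on the Secondary Namenode (both paths, and the 192.168.1.2 address from the configuration below, are illustrative):

  # On the repaired or replacement Namenode machine, make sure
  # the name directory exists and is empty.
  mkdir -p /data/dfs/name

  # Option 1: copy the latest checkpoint from the Secondary
  # Namenode's checkpoint directory into dfs.name.dir, then
  # restart HDFS; the Namenode loads the copied fsimage at startup.
  scp -r 192.168.1.2:/data/dfs/namesecondary/current/* /data/dfs/name/current/
  bin/start-dfs.sh

  # Option 2: let the Namenode pull the checkpoint itself.
  # -importCheckpoint reads the checkpoint from fs.checkpoint.dir
  # and saves it into dfs.name.dir (which must hold no image yet).
  hadoop namenode -importCheckpoint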
Role of Secondary Namenode
The Secondary Namenode must be on a separate machine in a Hadoop production cluster.
Add the following entry to hdfs-site.xml to run the Secondary Namenode on another machine:

  <property>
    <name>dfs.secondary.http.address</name>
    <value>192.168.1.2:50090</value>
    <description>The Secondary Namenode address and port.</description>
  </property>
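After restarting HDFS, a quick sanity check is that something answers on the configured address and port, assuming the machines can reach each other:

  # The Secondary Namenode's HTTP server should be listening
  # on the address and port set above.
  curl http://192.168.1.2:50090/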
Checkpoints, which are stored on the Secondary Namenode, help in Namenode recovery. A checkpoint can also be forced by hand, as shown below.
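A sketch of forcing a checkpoint from the command line, assuming a Hadoop 1.x installation and that it is run on the Secondary Namenode machine:

  # Ask the Secondary Namenode to merge the current edits file
  # into fsimage now, instead of waiting for the next scheduled
  # checkpoint; "force" skips the edits-size threshold check.
  hadoop secondarynamenode -checkpoint force

FsImage & Edits files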
FsImage contains a snapshot of the HDFS metadata.
The Namenode loads the FsImage at startup.
The FsImage is not updated after every read/write operation.
Instead, all the changes are recorded in the edits file.
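For orientation, in a Hadoop 1.x layout both files live under the Namenode's dfs.name.dir (the /data/dfs/name path is illustrative):

  # Inspect the Namenode's storage directory; 'current' holds the
  # latest on-disk checkpoint (fsimage) plus the edits log that
  # records every namespace change made since that checkpoint.
  ls /data/dfs/name/current
  # Typical contents: VERSION  edits  fsimage  fstime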