Job tracker
• MapReduce daemons:
• Job tracker
• Task tracker
Intro…
• Each cluster can have multiple task trackers but
only a single job tracker.
• Job tracker is a master process whereas task
tracker is a slave process.
• On a small cluster, the name node and the job
tracker usually run on the same machine.
• On a large cluster, the name node and the job
tracker usually run on different machines.
Working…
• When a client submits a job to the job tracker,
JT first divides the job into tasks and then decides
which task should run on which node.
• The process of assigning tasks to worker nodes
is called task scheduling.
• JT is also responsible for monitoring the
health of all the worker nodes within a cluster
and also for monitoring the tasks assigned to
them.
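The division and scheduling steps above can be sketched roughly as follows. This is a minimal illustration, not Hadoop's actual scheduler: the function names `split_job` and `schedule`, the task naming pattern, and the "most free slots first" policy are all assumptions made for the example.

```python
from collections import defaultdict


def split_job(job_id, num_splits):
    """Divide a job into one map task per input split (hypothetical naming)."""
    return [f"{job_id}_m_{i:06d}" for i in range(num_splits)]


def schedule(tasks, free_slots):
    """Assign each task to the node that currently has the most free slots.

    free_slots maps node name -> number of free task slots (not modified).
    Returns (assignments, pending): node -> list of tasks, plus the tasks
    that could not be placed because every slot is taken.
    """
    remaining = dict(free_slots)
    assignments = defaultdict(list)
    pending = []
    for task in tasks:
        # Pick the least-loaded node; ties go to the first name in sorted order.
        node = max(sorted(remaining), key=lambda n: remaining[n], default=None)
        if node is None or remaining[node] == 0:
            pending.append(task)
            continue
        assignments[node].append(task)
        remaining[node] -= 1
    return dict(assignments), pending
```

For example, `schedule(split_job("job_1", 5), {"node-a": 2, "node-b": 2})` places four tasks and leaves the fifth pending until a slot frees up.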
Contd..
• JT communicates with the clients and the TTs
using RPC (remote procedure calls).
• JT keeps track of all the jobs and their tasks
in main memory.
• Hence the memory requirement of the job
tracker is large.
• It depends on the number of tasks, and the number
of tasks differs from one job to another.
Task tracker
• It is a slave process.
• It is the responsibility of the TT to execute all the
tasks assigned by JT.
• Each worker node runs only a single task
tracker process.
• Each task tracker is configured with a number of map
and reduce slots.
• These slots determine how many map and reduce
tasks the task tracker can execute
simultaneously.
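The limit on simultaneous execution can be pictured as a slot counter kept for each task tracker. The sketch below is illustrative only: the class name `TaskTrackerSlots`, its methods, and the default of two slots per kind are assumptions, not Hadoop's API.

```python
class TaskTrackerSlots:
    """Tracks how many map and reduce tasks one task tracker may run at once."""

    def __init__(self, map_slots=2, reduce_slots=2):
        # Slot counts are illustrative values, not taken from the slides.
        self.capacity = {"map": map_slots, "reduce": reduce_slots}
        self.running = {"map": set(), "reduce": set()}

    def free(self, kind):
        """Number of unused slots of the given kind ("map" or "reduce")."""
        return self.capacity[kind] - len(self.running[kind])

    def start(self, kind, task_id):
        """Occupy a slot; return False if all slots of this kind are busy."""
        if self.free(kind) == 0:
            return False
        self.running[kind].add(task_id)
        return True

    def finish(self, kind, task_id):
        """Release the slot held by a finished (or failed) task."""
        self.running[kind].discard(task_id)
```

A tracker created with `TaskTrackerSlots(map_slots=2)` accepts two map tasks and refuses a third until one of them finishes.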
Contd..
• TT periodically reports the current
status and the progress of all its tasks
to JT through a mechanism called the
heartbeat.
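A heartbeat can be thought of as a small status report sent at regular intervals. The sketch below shows one plausible shape for such a report. The function `build_heartbeat` and every field name in it are hypothetical and do not reflect Hadoop's wire format.

```python
import time


def build_heartbeat(tracker_name, tasks, free_map_slots, free_reduce_slots, now=None):
    """Build one heartbeat report from a task tracker.

    tasks maps task id -> (state, progress), with progress in [0.0, 1.0].
    All field names are hypothetical.
    """
    return {
        "tracker": tracker_name,
        "timestamp": time.time() if now is None else now,
        "free_map_slots": free_map_slots,
        "free_reduce_slots": free_reduce_slots,
        # One entry per task, sorted by task id for a stable order.
        "task_reports": [
            {"task": task_id, "state": state, "progress": progress}
            for task_id, (state, progress) in sorted(tasks.items())
        ],
    }
```

Including the free slot counts in the same message lets the job tracker decide, on each heartbeat, whether the tracker can take more work.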
Job tracker characteristics…
• Client applications submit jobs to the Job
tracker.
• The JobTracker talks to the Name Node to
determine the location of the data.
• The JobTracker locates TaskTracker nodes with
available slots at or near the data.
• The JobTracker submits the work to the
chosen TaskTracker nodes.
• The TaskTracker nodes are monitored.
• If they do not submit heartbeat signals often
enough, they are deemed to have failed and the
work is scheduled on a different TaskTracker.
• A TaskTracker will notify the JobTracker when a
task fails.
• The JobTracker decides what to do then: it may
resubmit the task elsewhere, it may mark that
specific record as something to avoid, and it may
even blacklist the TaskTracker as unreliable.
• When the work is completed, the JobTracker
updates its status.
• Client applications can poll the JobTracker for
information.
• The JobTracker is a point of
failure for the Hadoop MapReduce service. If
it goes down, all running jobs are halted.
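The step of locating a TaskTracker "at or near the data" can be sketched as a preference order. In this sketch, "near" is interpreted as "in the same rack", which is an assumption; the function name `pick_tracker` and its parameters are hypothetical as well.

```python
def pick_tracker(block_hosts, free_slots, rack_of):
    """Choose a task tracker for a task whose input block lives on block_hosts.

    Preference order: a node holding the data, then a node in the same rack
    as the data, then any node with a free slot. Returns None if no slot is free.
    """
    candidates = sorted(n for n, free in free_slots.items() if free > 0)
    data_racks = {rack_of[h] for h in block_hosts if h in rack_of}
    for node in candidates:
        if node in block_hosts:
            return node  # data-local: the task runs where the block is stored
    for node in candidates:
        if rack_of.get(node) in data_racks:
            return node  # rack-local: the block is one switch hop away
    return candidates[0] if candidates else None
```

Preferring data-local placement avoids copying input blocks across the network, which is the main reason the JobTracker consults the Name Node before scheduling.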
Functionality of Task Tracker
• The Task Tracker runs on a Data Node.
• Mapper and Reducer tasks are executed on
Data Nodes administered by Task Trackers.
• Task Trackers will be assigned Mapper and
Reducer tasks to execute by Job Tracker.
• Task Tracker will be in constant
communication with the Job Tracker signaling
the progress of the task in execution.
• Task Tracker failure is not considered fatal.
• When a Task Tracker becomes unresponsive,
Job Tracker will assign the tasks it was
executing to another node.
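The failure handling described above can be sketched as a timeout on the last heartbeat received from each Task Tracker. This is a simplified illustration, not Hadoop's implementation: the class `HeartbeatMonitor`, its methods, and the timeout value are assumptions.

```python
class HeartbeatMonitor:
    """Job-tracker-side view of which task trackers are still alive."""

    def __init__(self, timeout_seconds=600.0):
        # The timeout value is an assumption chosen for illustration.
        self.timeout = timeout_seconds
        self.last_seen = {}  # tracker name -> time of last heartbeat
        self.assigned = {}   # tracker name -> set of task ids running there

    def heartbeat(self, tracker, now):
        """Record that a heartbeat arrived from this tracker at time `now`."""
        self.last_seen[tracker] = now
        self.assigned.setdefault(tracker, set())

    def assign(self, tracker, task_id):
        """Remember that a task was handed to this tracker."""
        self.assigned.setdefault(tracker, set()).add(task_id)

    def expire(self, now):
        """Drop unresponsive trackers and return their tasks for rescheduling."""
        lost = [t for t, seen in self.last_seen.items() if now - seen > self.timeout]
        to_reschedule = []
        for tracker in lost:
            to_reschedule.extend(sorted(self.assigned.pop(tracker, ())))
            del self.last_seen[tracker]
        return to_reschedule
```

The task ids returned by `expire` would then be handed back to the scheduler and placed on other nodes, which is why the loss of a single Task Tracker does not fail the job.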