
Job Tracker and Task Tracker Overview

The job tracker is the master process that assigns and monitors map and reduce tasks within a Hadoop cluster. It divides jobs into tasks and assigns them to task trackers running on worker nodes. The job tracker is also responsible for monitoring task and node health through task trackers' heartbeat messages. Each cluster has one job tracker that may run on the same or different machine than the name node, depending on cluster size.

Uploaded by

Lord Drawnzer
Copyright
© All Rights Reserved

Job tracker

• MapReduce daemons:
• Job tracker
• Task tracker
Intro…
• Each cluster can have multiple task trackers but only a single job tracker.
• The job tracker is a master process, whereas each task tracker is a slave process.
• On a small cluster, the name node and the job tracker run on the same machine.
• On a large cluster, the name node and the job tracker run on different machines.
Working….
• When a client submits a job to the job tracker (JT), the JT first divides the job into tasks and then decides which task should run on which node.
• The process of assigning tasks to worker nodes is called task scheduling.
• The JT is also responsible for monitoring the health of all worker nodes in the cluster and the tasks assigned to them.
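The two steps above (divide the job, then assign tasks to nodes) can be sketched as follows. This is a minimal illustration, not the JobTracker's actual scheduler: the function names, the task-id format, and the round-robin policy are all assumptions made for the example.

```python
from itertools import cycle

def split_job(job_id, num_splits):
    """Divide a job into one map task per input split (hypothetical id format)."""
    return [f"{job_id}_m_{i:06d}" for i in range(num_splits)]

def schedule(tasks, worker_nodes):
    """Assign tasks to worker nodes round-robin; returns {node: [task, ...]}.

    Round-robin is a stand-in policy chosen for brevity.
    """
    assignment = {node: [] for node in worker_nodes}
    for task, node in zip(tasks, cycle(worker_nodes)):
        assignment[node].append(task)
    return assignment

tasks = split_job("job_0001", 5)
plan = schedule(tasks, ["node-a", "node-b"])
for node, assigned in plan.items():
    print(node, assigned)
```

Here node-a receives three of the five tasks and node-b the remaining two.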
Contd..

• The JT communicates with clients and task trackers (TTs) using RPC (remote procedure calls).
• The JT keeps track of all jobs and their tasks in main memory.
• Hence the memory requirements of the job tracker are large.
• They depend on the number of tasks, which differs from one job to another.
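To see why the number of tasks differs from job to job, the rough sketch below estimates the number of map tasks for a job from its input size. It assumes one map task per input split and a 64 MB split size; both are assumptions for illustration (split size is configurable), and the function name is hypothetical.

```python
def estimate_map_tasks(input_bytes, split_bytes=64 * 1024 * 1024):
    """Estimate map tasks as ceil(input size / split size), using integer math.

    Assumes one map task per input split; split_bytes is an illustrative value.
    """
    return (input_bytes + split_bytes - 1) // split_bytes

GIB = 1024 ** 3
small_job = estimate_map_tasks(1 * GIB)     # 16 map tasks
large_job = estimate_map_tasks(100 * GIB)   # 1600 map tasks
print(small_job, large_job)
```

Since the JT holds state for every task, a job with more input tends to occupy proportionally more JT memory.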
Task tracker
• It is a slave process.
• The TT is responsible for executing all the tasks assigned by the JT.
• Each worker node runs only a single task tracker process.
• Each task tracker is configured with a number of map and reduce slots.
• These slots determine how many map and reduce tasks the task tracker can execute simultaneously.
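The way slots limit simultaneous execution can be sketched as a simple counter per task kind. The class and method names below are hypothetical and the slot counts are illustrative; this is a sketch of the idea, not Hadoop's implementation.

```python
class TaskTracker:
    """Tracks how many map and reduce tasks are running against fixed slot limits."""

    def __init__(self, map_slots, reduce_slots):
        self.capacity = {"map": map_slots, "reduce": reduce_slots}
        self.running = {"map": set(), "reduce": set()}

    def free_slots(self, kind):
        """Number of unused slots of the given kind ("map" or "reduce")."""
        return self.capacity[kind] - len(self.running[kind])

    def try_start(self, kind, task_id):
        """Start a task only if a slot of that kind is free; return success."""
        if self.free_slots(kind) <= 0:
            return False
        self.running[kind].add(task_id)
        return True

    def finish(self, kind, task_id):
        """Release the slot held by a finished task."""
        self.running[kind].discard(task_id)

tt = TaskTracker(map_slots=2, reduce_slots=1)
print(tt.try_start("map", "m1"), tt.try_start("map", "m2"), tt.try_start("map", "m3"))
tt.finish("map", "m1")
print(tt.try_start("map", "m3"))
```

The third map task is refused while both map slots are busy, and accepted once one of them is released.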
Contd..
• The TT periodically reports its current health status and the progress of all its tasks to the JT through a mechanism called a heartbeat.
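A heartbeat can be pictured as a small status message sent at regular intervals. The sketch below is an assumption-laden illustration: the message fields, the `build_heartbeat` function, and the `JobTrackerView` class are hypothetical and do not reflect Hadoop's actual RPC format.

```python
import time

def build_heartbeat(tracker_id, task_progress, now=None):
    """Build the status report a task tracker would send to the job tracker."""
    return {
        "tracker": tracker_id,
        "sent_at": time.time() if now is None else now,
        "tasks": dict(task_progress),  # task id -> fraction complete (0.0 to 1.0)
    }

class JobTrackerView:
    """Job-tracker-side record of the latest heartbeat data."""

    def __init__(self):
        self.last_seen = {}  # tracker id -> time of most recent heartbeat
        self.progress = {}   # task id -> most recently reported progress

    def receive(self, heartbeat):
        self.last_seen[heartbeat["tracker"]] = heartbeat["sent_at"]
        self.progress.update(heartbeat["tasks"])

jt = JobTrackerView()
jt.receive(build_heartbeat("tt-1", {"m_000000": 0.5, "r_000000": 0.0}, now=100.0))
jt.receive(build_heartbeat("tt-1", {"m_000000": 1.0}, now=103.0))
print(jt.last_seen, jt.progress)
```

Each new heartbeat overwrites the previous timestamp for that tracker, so the JT always knows when it last heard from each TT.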
Job tracker characteristics…
• Client applications submit jobs to the JobTracker.
• The JobTracker talks to the Name Node to determine the location of the data.
• The JobTracker locates TaskTracker nodes with available slots at or near the data.
• The JobTracker submits the work to the chosen TaskTracker nodes.
• The TaskTracker nodes are monitored.
• If they do not submit heartbeat signals often enough, they are deemed to have failed and the work is scheduled on a different TaskTracker.
• A TaskTracker will notify the JobTracker when a task fails.
• The JobTracker then decides what to do: it may resubmit the job elsewhere, it may mark that specific record as something to avoid, and it may even blacklist the TaskTracker as unreliable.
• When the work is completed, the JobTracker updates its status.
• Client applications can poll the JobTracker for information.
• The JobTracker is a single point of failure for the Hadoop MapReduce service. If it goes down, all running jobs are halted.
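The step of locating a TaskTracker "with available slots at or near the data" can be sketched as a preference order: a node holding the data first, then a node in the same rack, then any other node. The function name, the rack map, and the three-level distance are assumptions for illustration, not the JobTracker's real code.

```python
def pick_tracker(block_hosts, free_slots, rack_of, blacklist=frozenset()):
    """Choose the closest non-blacklisted tracker that has a free slot.

    block_hosts: hosts that store the input block (as reported by the name node)
    free_slots:  {tracker host: number of free slots}
    rack_of:     {host: rack id}
    Returns a host name, or None if no tracker is eligible.
    """
    candidates = [t for t, free in free_slots.items()
                  if free > 0 and t not in blacklist]
    if not candidates:
        return None
    data_racks = {rack_of[h] for h in block_hosts if h in rack_of}

    def distance(tracker):
        if tracker in block_hosts:
            return 0  # data-local
        if rack_of.get(tracker) in data_racks:
            return 1  # rack-local
        return 2      # off-rack

    # Ties are broken by host name so the choice is deterministic.
    return min(candidates, key=lambda t: (distance(t), t))

rack_of = {"n1": "rack1", "n2": "rack1", "n3": "rack2"}
free_slots = {"n1": 0, "n2": 1, "n3": 2}
print(pick_tracker({"n1"}, free_slots, rack_of))                    # n2
print(pick_tracker({"n1"}, free_slots, rack_of, blacklist={"n2"}))  # n3
```

In the first call n1 holds the data but has no free slot, so the rack-local n2 is chosen; once n2 is blacklisted, the work falls back to the off-rack n3.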
Functionality of Task Tracker

• The Task Tracker runs on a Data Node.
• Mapper and Reducer tasks are executed on Data Nodes administered by Task Trackers.
• The Job Tracker assigns Mapper and Reducer tasks to Task Trackers for execution.
• Each Task Tracker is in constant communication with the Job Tracker, signaling the progress of the tasks in execution.
• Task Tracker failure is not considered fatal.
• When a Task Tracker becomes unresponsive, the Job Tracker assigns the tasks it was executing to another node.
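The recovery behaviour described above can be sketched as: find the trackers whose last heartbeat is too old, then hand their tasks to the least-loaded live tracker. The function name, the data layout, and the 600-second timeout are assumptions for illustration; the real expiry interval is a cluster configuration setting.

```python
def reassign_lost_tasks(last_heartbeat, assignments, now, timeout_s=600.0):
    """Move tasks from unresponsive trackers to live ones.

    last_heartbeat: {tracker: time of last heartbeat, in seconds}
    assignments:    {tracker: [task id, ...]}
    Returns (new assignments for live trackers, sorted list of expired trackers).
    """
    expired = {t for t, seen in last_heartbeat.items() if now - seen > timeout_s}
    live = sorted(t for t in last_heartbeat if t not in expired)
    result = {t: list(assignments.get(t, [])) for t in live}
    if not live:
        # Nothing can be rescheduled until some tracker reports in again.
        return result, sorted(expired)
    for tracker in sorted(expired):
        for task in assignments.get(tracker, []):
            # Pick the live tracker with the fewest tasks; ties broken by name.
            target = min(live, key=lambda t: (len(result[t]), t))
            result[target].append(task)
    return result, sorted(expired)

last_heartbeat = {"tt1": 1000.0, "tt2": 1590.0, "tt3": 1595.0}
assignments = {"tt1": ["m1", "m2"], "tt2": ["m3"], "tt3": []}
new_plan, lost = reassign_lost_tasks(last_heartbeat, assignments, now=1700.0)
print(lost, new_plan)
```

Here tt1 has been silent for 700 seconds, so its two tasks are spread across tt2 and tt3 while the job itself keeps running.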
