Job Scheduling in MapReduce
Definition
Job Scheduling in MapReduce is the mechanism by which Hadoop decides the order in which submitted
jobs run and how cluster resources (such as CPU and memory) are allocated to them. It plays a vital role in ensuring that resources are
fairly and efficiently distributed among multiple users and their applications. The goal is to maximize
throughput, minimize response time, and provide fairness and resource guarantees when necessary.
MapReduce Algorithm
MapReduce is a data processing paradigm that allows for distributed computation on large datasets across a
cluster of machines. A job runs in three phases:
- Map Phase: Input splits are processed in parallel, producing intermediate key-value pairs.
- Shuffle and Sort Phase: Intermediate data is sorted and grouped by key.
- Reduce Phase: Aggregates the values associated with each key to produce the final output.
Job scheduling in this context ensures that tasks in each phase are executed efficiently on available nodes.
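To make the phases concrete, the sketch below implements the canonical word-count job against the standard org.apache.hadoop.mapreduce API. Input and output paths come from the command line; the shuffle-and-sort step is performed implicitly by the framework between the mapper and the reducer.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every token in the input split.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer it = new StringTokenizer(value.toString());
      while (it.hasMoreTokens()) {
        word.set(it.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: by now the framework has shuffled and sorted by key,
  // so each call receives one word together with all of its counts.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}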
Hadoop Schedulers
Schedulers in Hadoop manage how jobs are assigned to resources. They aim to enforce policies such as
fairness, prioritization, and guaranteed capacities. Hadoop supports different types of schedulers to match
different workload patterns and organizational needs.
1. FIFO Scheduler
- Jobs are placed in a single queue and executed in the order of submission.
- Lacks fairness and may delay short jobs if long jobs are submitted earlier.
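A toy simulation (plain Java, not Hadoop code) makes the FIFO drawback visible: with a single queue and one execution slot, a 2-minute job submitted just after a 60-minute job cannot finish before t=62 minutes. The job names and durations below are invented for illustration.

import java.util.ArrayDeque;
import java.util.Queue;

public class FifoSimulation {
  record SimJob(String name, int minutes) {}

  public static void main(String[] args) {
    Queue<SimJob> queue = new ArrayDeque<>();
    queue.add(new SimJob("nightly-etl", 60));  // long job, submitted first
    queue.add(new SimJob("ad-hoc-query", 2));  // short job, submitted second

    int clock = 0;
    while (!queue.isEmpty()) {
      SimJob job = queue.poll();               // strictly in submission order
      clock += job.minutes();
      System.out.printf("%s finishes at t=%d min%n", job.name(), clock);
    }
    // Output: nightly-etl finishes at t=60, ad-hoc-query at t=62,
    // even though the second job needed only 2 minutes of work.
  }
}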
2. Capacity Scheduler
- Developed by Yahoo! to let multiple organizations share a large cluster.
- Resources are divided among named queues, each configured with a guaranteed fraction of cluster capacity.
- Queues can elastically borrow capacity that other queues are not using; borrowed resources are given back as demand rises.
- Within a single queue, jobs are scheduled in FIFO order.
- Best suited for organizations that need predictable capacity guarantees per team or department.
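On the job side, directing work at a particular queue is a one-line configuration. A minimal sketch, assuming an administrator has already defined a queue (the name "analytics" here is hypothetical) via properties such as yarn.scheduler.capacity.root.queues and yarn.scheduler.capacity.root.analytics.capacity in capacity-scheduler.xml:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class QueueSubmission {
  public static Job jobForQueue(String queueName) throws Exception {
    Configuration conf = new Configuration();
    // Standard MapReduce property naming the target scheduler queue;
    // jobs from different teams land in different queues with guaranteed shares.
    conf.set("mapreduce.job.queuename", queueName);
    // The job name "report-generation" is illustrative only.
    Job job = Job.getInstance(conf, "report-generation");
    // ... set mapper/reducer/input/output as usual, then submit the job.
    return job;
  }
}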
3. Fair Scheduler
- Developed by Facebook to provide fair sharing of resources among all running jobs.
- Ensures all users/jobs get approximately equal resource share over time.
- Supports job pools, each with guaranteed minimum and fair shares.
- Allows preemption: tasks of jobs running above their fair share may be killed so that pools below their guaranteed or fair share can reclaim resources.
- Best suited for environments with mixed workloads and multiple users.
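The "approximately equal share" behavior is an instance of max-min fairness: split capacity evenly, give fully satisfied pools only what they demand, and redistribute the leftovers among pools that still want more. The toy Java sketch below (not Hadoop code; pool names and demands are invented) computes such an allocation:

import java.util.LinkedHashMap;
import java.util.Map;

public class MaxMinFairShare {
  // Allocate `capacity` slots among pools according to max-min fairness.
  public static Map<String, Double> allocate(double capacity, Map<String, Double> demand) {
    Map<String, Double> share = new LinkedHashMap<>();
    Map<String, Double> pending = new LinkedHashMap<>(demand);
    double remaining = capacity;
    while (!pending.isEmpty() && remaining > 1e-9) {
      double equal = remaining / pending.size();   // equal split of what's left
      // Pools whose demand fits under the equal split get exactly their demand...
      var satisfied = pending.entrySet().stream()
          .filter(e -> e.getValue() <= equal)
          .map(e -> Map.entry(e.getKey(), e.getValue()))
          .toList();
      if (satisfied.isEmpty()) {
        // ...otherwise every remaining pool gets the equal share and we are done.
        for (String pool : pending.keySet()) share.put(pool, equal);
        return share;
      }
      for (var e : satisfied) {
        share.put(e.getKey(), e.getValue());
        remaining -= e.getValue();
        pending.remove(e.getKey());
      }
    }
    for (String pool : pending.keySet()) share.put(pool, 0.0);
    return share;
  }

  public static void main(String[] args) {
    // 100 slots, three pools: the small pool gets all 10 it asks for,
    // the two big pools split the remaining 90 evenly (45 each).
    System.out.println(allocate(100, Map.of("adhoc", 10.0, "etl", 80.0, "ml", 70.0)));
  }
}

This mirrors the outcome the Fair Scheduler converges toward over time: no pool can gain more without taking capacity from a pool that already has less.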
Advantages
- Elastic resource sharing (e.g., the Capacity Scheduler lets queues borrow unused capacity).
- Short jobs are no longer starved behind long-running jobs, improving average response time.
- Multiple users and organizations can share one cluster with predictable, guaranteed shares.
Disadvantages
- Monitoring and managing multiple queues and pools can add overhead.
- Improper setup may lead to inefficient cluster usage or unfair resource distribution.