
Job Scheduling in MapReduce

Definition

Job Scheduling in MapReduce is the mechanism by which Hadoop determines the order and allocation of cluster resources (such as CPU and memory) to submitted jobs. It plays a vital role in ensuring that resources are fairly and efficiently distributed among multiple users and their applications. The goal is to maximize throughput, minimize response time, and provide fairness and resource guarantees when necessary.

MapReduce Algorithm

MapReduce is a data processing paradigm that enables distributed computation on large datasets across a Hadoop cluster. It is composed of:

- Map Phase: Processes input data to generate intermediate key-value pairs.
- Shuffle and Sort Phase: Intermediate data is sorted and grouped by key.
- Reduce Phase: Aggregates the values associated with each key to produce the final output.

Job scheduling in this context ensures that tasks in each phase are executed efficiently on available nodes.
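The three phases above can be sketched as a minimal single-process word count in Python. This is an illustration of the paradigm only, not Hadoop's actual API; the function names are invented for this example.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    """Map: emit an intermediate (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle_and_sort(pairs):
    """Shuffle/sort: sort intermediate pairs by key and group values per key."""
    for key, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield key, [v for _, v in group]

def reduce_phase(grouped):
    """Reduce: aggregate the values for each key into a final count."""
    return {key: sum(values) for key, values in grouped}

lines = ["the quick brown fox", "the lazy dog", "the fox"]
counts = reduce_phase(shuffle_and_sort(map_phase(lines)))
print(counts["the"])  # 3
```

In a real cluster each phase runs as many parallel tasks on different nodes, which is exactly what the scheduler must place efficiently.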

Hadoop Schedulers

Schedulers in Hadoop manage how jobs are assigned to resources. They aim to enforce policies such as fairness, prioritization, and guaranteed capacities. Hadoop supports different types of schedulers to match varying workload and resource-sharing requirements.

Types of Job Scheduling in MapReduce

1. FIFO Scheduler

- First-In-First-Out (FIFO) was the default scheduler in early Hadoop versions.


- Jobs are placed in a single queue and executed in the order of submission.
- Simple to understand and easy to implement.
- Lacks fairness and may delay short jobs if long jobs are submitted earlier.
- Does not support multi-user or multi-tenant environments.
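The short-job delay problem can be seen in a toy simulation of FIFO scheduling (this is an illustration, not Hadoop's implementation): jobs run to completion in submission order, so a short job submitted after a long one waits for the entire long job.

```python
from collections import deque

def fifo_schedule(jobs):
    """jobs: list of (name, duration) pairs in submission order.
    Returns (name, finish_time) for each job, assuming one job runs at a time."""
    queue = deque(jobs)
    clock, finished = 0, []
    while queue:
        name, duration = queue.popleft()  # strictly first-in, first-out
        clock += duration
        finished.append((name, clock))
    return finished

# A 1-unit job submitted just after a 100-unit job finishes only at t=101.
print(fifo_schedule([("long", 100), ("short", 1)]))
# [('long', 100), ('short', 101)]
```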

2. Capacity Scheduler

- Designed to allow sharing of cluster resources among multiple organizations.
- The cluster is divided into multiple queues, each with a guaranteed capacity.
- Queues can have sub-queues to provide more granular resource control.
- Unused capacity in one queue can be temporarily allocated to other queues.
- Supports user-based access control and job priorities.
- Encourages multi-tenancy and fair resource distribution.
- Suitable for enterprise environments with strict capacity guarantees.
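The elastic-sharing idea can be sketched in a few lines of Python. The queue names and capacities below are illustrative (not a real Capacity Scheduler configuration, which is set in XML): each queue first receives up to its guaranteed share, then capacity left idle by under-loaded queues is lent to queues with excess demand.

```python
def allocate(total_slots, queues):
    """queues: {name: {"guaranteed": fraction_of_cluster, "demand": slots}}.
    Returns the number of slots granted to each queue."""
    # First pass: every queue gets up to its guaranteed capacity.
    grants = {}
    for name, q in queues.items():
        guarantee = int(q["guaranteed"] * total_slots)
        grants[name] = min(q["demand"], guarantee)
    # Second pass: lend leftover (unused) capacity to still-hungry queues.
    leftover = total_slots - sum(grants.values())
    for name, q in queues.items():
        if leftover <= 0:
            break
        extra = min(q["demand"] - grants[name], leftover)
        grants[name] += extra
        leftover -= extra
    return grants

queues = {
    "research": {"guaranteed": 0.5, "demand": 80},  # wants more than its 50%
    "prod":     {"guaranteed": 0.5, "demand": 10},  # leaves capacity idle
}
print(allocate(100, queues))  # research borrows prod's unused slots
```

In the real scheduler the borrowed capacity is reclaimed when the lending queue's own demand returns; this sketch only shows a single allocation round.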

3. Fair Scheduler

- Developed by Facebook to provide fair sharing of resources among all running jobs.
- Ensures all users/jobs get approximately equal resource shares over time.
- Supports job pools, each with guaranteed minimum and fair shares.
- Allows preemption: if a job exceeds its share, its running tasks may be killed and rescheduled later.
- Can be configured to support priorities, deadlines, and interactive responsiveness.
- Best suited for environments with mixed workloads and multiple users.
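A toy model of the fair-share and preemption ideas, with made-up pool names (the real Fair Scheduler also handles weights, minimum shares, and hierarchical pools, all omitted here): each pool's fair share is an equal split of the cluster, and a pool running above its share becomes a preemption candidate.

```python
def fair_shares(total_slots, pools):
    """Equal split of the cluster among active pools
    (ignores weights and guaranteed minimums for simplicity)."""
    share = total_slots // len(pools)
    return {pool: share for pool in pools}

def preemption_candidates(usage, shares):
    """Pools currently running more tasks than their fair share."""
    return [pool for pool, used in usage.items() if used > shares[pool]]

shares = fair_shares(90, ["etl", "adhoc", "reports"])
print(shares)                                   # each pool gets 30 slots
usage = {"etl": 70, "adhoc": 15, "reports": 5}  # etl is over its share
print(preemption_candidates(usage, shares))     # ['etl']
```

Killing tasks from the over-share pool frees slots so that under-served pools converge toward their fair share, which is what keeps short interactive jobs responsive.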

Advantages

- Ensures fair sharing of resources among users.
- Supports priorities, allowing urgent jobs to run first.
- Enhances resource utilization and system throughput.
- Supports multi-tenancy and queue-based resource management.
- Elastic resource sharing (e.g., the Capacity Scheduler allows borrowing unused capacity).
- The Fair Scheduler improves responsiveness of short, interactive jobs.
- Adaptable to both small and large-scale cluster environments.

Disadvantages

- FIFO does not support fairness or job prioritization.
- Configuration of the Capacity and Fair schedulers can be complex.
- Requires careful tuning to avoid resource starvation or imbalance.
- Preemption may disrupt long-running tasks, affecting stability.
- Monitoring and managing multiple queues and pools adds overhead.
- Improper setup may lead to inefficient cluster usage or unfair resource distribution.
