Unit 3
MapReduce API Framework:
The MapReduce API framework, primarily associated with Apache Hadoop, provides a
programming model for processing large datasets in a distributed and parallel manner across a
cluster of machines. It simplifies the complexities of distributed programming by abstracting
away details like data distribution, fault tolerance, and inter-process communication.
The core components of the MapReduce API framework include:
• JobContext Interface
• Job Class
• Mapper Class
• Reducer Class
• InputFormat
• OutputFormat
• Partitioner
• Combiner (Optional)
JobContext Interface
The JobContext interface is the super-interface of all the classes that define different jobs in
MapReduce. It gives the tasks a read-only view of the job while they are running.
The following are the sub-interfaces of the JobContext interface.
S.No.  Sub-interface                                    Description
1.     MapContext<KEYIN, VALUEIN, KEYOUT, VALUEOUT>     Defines the context that is given to the Mapper.
2.     ReduceContext<KEYIN, VALUEIN, KEYOUT, VALUEOUT>  Defines the context that is passed to the Reducer.
The Job class is the main class that implements the JobContext interface.
Job Class
The Job class is the most important class in the MapReduce API. It allows the user to configure
the job, submit it, control its execution, and query its state. The set methods work only until the
job is submitted; afterwards they throw an IllegalStateException.
Following is the constructor summary of the Job class.
S.No  Constructor
1     Job()
2     Job(Configuration conf)
3     Job(Configuration conf, String jobName)
Methods
Some of the important methods of the Job class are as follows −
S.No  Method                       Description
1     getJobName()                 Returns the user-specified job name.
2     getJobState()                Returns the current state of the Job.
3     isComplete()                 Checks whether the job is finished or not.
4     setInputFormatClass()        Sets the InputFormat for the job.
5     setJobName(String name)      Sets the user-specified job name.
6     setOutputFormatClass()       Sets the OutputFormat for the job.
7     setMapperClass(Class)        Sets the Mapper for the job.
8     setReducerClass(Class)       Sets the Reducer for the job.
9     setPartitionerClass(Class)   Sets the Partitioner for the job.
10    setCombinerClass(Class)      Sets the Combiner for the job.
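To make these methods concrete, the following is a minimal driver sketch for a word-count style job. The class names WordCountDriver, WordCountMapper and IntSumReducer are user-defined placeholders (the mapper and reducer are sketched later in this unit), not part of the Hadoop API; everything else comes from Hadoop's org.apache.hadoop.mapreduce packages.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Job.getInstance(conf, name) is the recommended factory method in
        // current Hadoop releases; the constructors listed above are deprecated.
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);

        // Wire up the user-supplied classes (placeholder names, defined by the user).
        job.setMapperClass(WordCountMapper.class);
        job.setCombinerClass(IntSumReducer.class);   // local aggregation before the shuffle
        job.setReducerClass(IntSumReducer.class);
        // job.setPartitionerClass(...) could be called here to override the default partitioner.

        // Declare the final output key/value types.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Input/output formats (these are the defaults, set explicitly here for clarity).
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        // Input and output locations, typically HDFS paths.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Submit the job and wait for it to finish. Any set*() call made after
        // submission throws IllegalStateException, as noted above; getJobState()
        // and isComplete() can be used to monitor progress after submission.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Because the word-count reduce function (summation) is associative and commutative, the same reducer class can safely be registered as the combiner, as shown above.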
Mapper Class:
Users implement the map method within this class. The map method processes a single input
key-value pair and generates zero or more intermediate key-value pairs. This phase focuses on
data transformation and filtering.
Reducer Class:
Users implement the reduce method within this class. The reduce method receives a key and an
iterable list of values associated with that key (which have been grouped and sorted by the
framework). It then aggregates or combines these values to produce the final output key-value
pairs.
InputFormat:
This defines how the input data is read and split into records that are fed to the mappers. It
determines the input key-value pairs for the map phase.
OutputFormat:
This defines how the output of the reduce phase is written to the desired location, typically
HDFS.
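As an illustration (a sketch, not tied to any particular job in this unit), the choice of InputFormat determines the key-value types the map phase receives: the default TextInputFormat delivers the byte offset of each line (LongWritable) as the key and the line itself (Text) as the value, while KeyValueTextInputFormat splits each line into a Text key and a Text value. Both the input format and the matching output format are set on the Job:

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class FormatConfig {
    // Configures a job to read tab-separated "key<TAB>value" lines and to
    // write its final results back out as plain text lines.
    public static void configure(Job job) {
        job.setInputFormatClass(KeyValueTextInputFormat.class); // map input: (Text, Text)
        job.setOutputFormatClass(TextOutputFormat.class);       // reduce output written as text
    }
}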
Partitioner:
This optional component determines which reducer receives which intermediate key-value pair
from the mappers. By default, it uses hash-based partitioning (Hadoop's HashPartitioner) to
distribute data evenly across reducers, but custom partitioners can be implemented for specific needs.
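As a sketch of a custom Partitioner (a hypothetical example, not part of the Hadoop distribution), the class below routes words beginning with a–m to one reducer and all other words to another, overriding the default hash-based behaviour:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        String word = key.toString();
        // With a single reducer (or an empty key) there is only one valid partition.
        if (numPartitions == 1 || word.isEmpty()) {
            return 0;
        }
        char first = Character.toLowerCase(word.charAt(0));
        // Words starting with a-m go to partition 0, everything else to partition 1.
        return (first >= 'a' && first <= 'm') ? 0 : 1;
    }
}

Such a class is registered on the job with job.setPartitionerClass(FirstLetterPartitioner.class); the value returned by getPartition must always lie between 0 and numPartitions - 1.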
Combiner (Optional):
This is an optional "mini-reducer" that runs on the mapper side before the data is shuffled to the
reducers. It performs local aggregation to reduce the amount of data transferred over the
network, improving efficiency.
The framework handles the entire workflow, including splitting input data, distributing tasks to
nodes, managing communication and data transfers between map and reduce phases (shuffle and
sort), and ensuring fault tolerance by re-executing failed tasks. The key and value classes used
throughout the process must implement the Writable interface for serialization, and key classes
also need to implement WritableComparable for sorting.
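For example, the following is a sketch of a hypothetical composite key type, YearTempKey, that can be used as a MapReduce key because it implements WritableComparable: write() and readFields() give the framework serialization, and compareTo() gives it the sort order used during the shuffle.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

public class YearTempKey implements WritableComparable<YearTempKey> {
    private int year;
    private int temperature;

    public YearTempKey() { }                        // no-arg constructor required by the framework

    public YearTempKey(int year, int temperature) {
        this.year = year;
        this.temperature = temperature;
    }

    @Override
    public void write(DataOutput out) throws IOException {    // serialization
        out.writeInt(year);
        out.writeInt(temperature);
    }

    @Override
    public void readFields(DataInput in) throws IOException { // deserialization
        year = in.readInt();
        temperature = in.readInt();
    }

    @Override
    public int compareTo(YearTempKey other) {                 // sort order used during shuffle and sort
        int cmp = Integer.compare(year, other.year);
        return (cmp != 0) ? cmp : Integer.compare(temperature, other.temperature);
    }

    @Override
    public int hashCode() {                                    // used by the default hash partitioning
        return 31 * year + temperature;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof YearTempKey)) return false;
        YearTempKey k = (YearTempKey) o;
        return year == k.year && temperature == k.temperature;
    }

    @Override
    public String toString() {
        return year + "\t" + temperature;
    }
}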
Features of MapReduce:
MapReduce is a programming model and software framework used for processing large datasets
in a distributed and parallel manner across a cluster of computers. Its core features contribute to
its effectiveness in big data processing:
Key Features of MapReduce:
Scalability:
MapReduce can handle massive datasets by distributing computations across a large number of
commodity machines. As data volume grows, more nodes can be added to the cluster to
maintain performance.
Fault Tolerance:
It is inherently designed to handle failures. If a node or task fails, the framework automatically
detects the failure and re-executes the task on another available node, ensuring job completion
without manual intervention.
Data Locality:
MapReduce aims to process data where it resides (on the same node or rack), minimizing
network traffic and improving efficiency. This is achieved by scheduling tasks on nodes that
store the relevant data blocks.
Parallel Processing:
The framework enables parallel execution of tasks. The "Map" phase processes data
independently across multiple nodes, and the "Reduce" phase aggregates the results in parallel.
Simplicity (for Developers):
Developers primarily need to implement two functions: map() and reduce(). The complex
details of distributed processing, fault tolerance, and scheduling are handled by the MapReduce
framework.
Cost-Effectiveness:
It leverages commodity hardware, making it a cost-efficient solution for large-scale data
processing compared to traditional high-performance computing systems.
Mapper Class
The Mapper class defines the Map job. It maps input key-value pairs to a set of intermediate
key-value pairs. Maps are the individual tasks that transform the input records into intermediate
records. The transformed intermediate records need not be of the same type as the input records.
A given input pair may map to zero or many output pairs.
Method: map is the most prominent method of the Mapper class.
The syntax is defined below −
map(KEYIN key, VALUEIN value, org.apache.hadoop.mapreduce.Mapper.Context context)
This method is called once for each key-value pair in the input split.
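A minimal Mapper sketch that follows this signature is shown below. It assumes the default TextInputFormat, so the input key is the byte offset of a line (LongWritable) and the input value is the line itself (Text); it emits an intermediate (word, 1) pair for every token, which may be zero pairs for an empty line.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Called once per input record: split the line into tokens and emit
        // an intermediate (word, 1) pair for each one.
        StringTokenizer tokenizer = new StringTokenizer(value.toString());
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, ONE);
        }
    }
}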
Reducer Class
The Reducer class defines the Reduce job in MapReduce. It reduces a set of intermediate values
that share a key to a smaller set of values. Reducer implementations can access the Configuration
for a job via the JobContext.getConfiguration() method.
A Reducer has three primary phases − Shuffle, Sort, and Reduce.
Shuffle − The Reducer copies the sorted output from each Mapper using HTTP across the
network.
Sort − The framework merge-sorts the Reducer inputs by keys (since different Mappers may
have output the same key). The shuffle and sort phases occur simultaneously, i.e., while outputs
are being fetched, they are merged.
Reduce − In this phase the reduce(Object, Iterable, Context) method is called for each <key,
(collection of values)> pair in the sorted inputs.
Method: reduce is the most prominent method of the Reducer class.
The syntax is defined below −
reduce(KEYIN key, Iterable<VALUEIN> values,
org.apache.hadoop.mapreduce.Reducer.Context context)
This method is called once for each key, with the grouped collection of values associated with that key.
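A minimal Reducer sketch that follows this signature is shown below; it sums the 1s emitted for each word by the mapper sketched earlier and writes the final (word, count) pair.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class IntSumReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Called once per key with all of that key's grouped, sorted values.
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        result.set(sum);
        context.write(key, result);     // final (word, count) output pair
    }
}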