MapReduce Practical Assignment

The document discusses questions related to a word count MapReduce program. It includes questions about: 1) Copying input files and displaying file contents 2) Writing and running a word count program with output in a mapred_output directory 3) Modifying the code to not perform aggregation 4) Routing specific words to different reducers 5) Routing words to different reducers based on key length 6) Explaining behavior of a custom partitioner routing to reducers 0 and 1.

Uploaded by

gokesex621

Assignment - Week 2

We have an input directory named input_data, which contains two text files.

Qu 1) Copy the input_data directory to a directory named mapred_input
residing in the home directory of HDFS.
Display the contents of the mapred_input directory in a single command, and
also the contents of the copied files. Share a snapshot of the same.

Qu 2)
• Write and execute a wordcount program to count the frequency of
each word in these two input files.
• Create a runnable jar file named wordcount.jar from the above code
and execute that jar on the given input files. The output should be
generated in a directory called mapred_output inside the home
directory of HDFS.
• Share a snapshot of the command used to run the jar file in the
terminal.
• Display the contents of the mapred_output directory and the
reducer part file generated. Share a snapshot of the same.

Qu 3) What change will you make in the above code if we do not want
any aggregation at the end? Just mention the change in the code.

Qu 4) Write the code so that the words Hadoop and Elephant, which are
present in the input files, go to one reducer and all remaining words
go to the second reducer.
• Create a runnable jar file wc_part.jar and execute the jar file. Share a
snapshot of the command.
• Place the output in the wc_part_out directory inside the home directory of
HDFS. Share a snapshot of the output generated.

Qu 5) Write the code such that:
a) if the key length < 3, the output goes to reducer 1
b) if the key length = 3, the output goes to reducer 2
c) if the key length > 3, the output goes to reducer 3
Note: You can comment out the previous code of Question 4 and write the
logic there itself.
• Create a runnable jar file wc_custom.jar. The output directory
should be named wc_custom_dir in HDFS. Execute the jar, and share a
snapshot of the command for running the jar in the terminal and a
snapshot of the output.
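The routing rule in Qu 5 can be sketched in plain Java before wiring it into a job. This is only an illustration under stated assumptions: the class name KeyLengthPartitionSketch and the method partitionFor are hypothetical, and "reducer 1/2/3" is assumed to mean the zero-based partition indices 0/1/2. In an actual job the same decision would sit inside a class extending org.apache.hadoop.mapreduce.Partitioner, in its getPartition(key, value, numPartitions) method.

```java
// Plain-Java sketch of the key-length routing rule from Qu 5.
// Assumption: "reducer 1/2/3" correspond to partition indices 0/1/2.
// The class and method names are hypothetical, chosen for illustration.
class KeyLengthPartitionSketch {

    // Returns the partition index for a word, based only on its length.
    static int partitionFor(String key) {
        int length = key.length();
        if (length < 3) {
            return 0; // "reducer 1"
        } else if (length == 3) {
            return 1; // "reducer 2"
        } else {
            return 2; // "reducer 3"
        }
    }

    public static void main(String[] args) {
        // Sample words are made up; they are not taken from the input files.
        for (String word : new String[] {"to", "the", "Hadoop"}) {
            System.out.println(word + " -> partition " + partitionFor(word));
        }
    }
}
```

With a rule like this, the job would also need three reduce tasks configured, since the rule can return three distinct partition indices.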

Qu 6) For the above program, what will happen if you use 3 reducers and the
partitioner class has the condition below?
if key length is less than 4 then return 0
else return 1
Please explain.

Qu 7) What will happen if you use 2 reducers and the partitioner class has
the condition below?
if key length is less than 4 then return 0
if key length >= 4 and < 6 then return 1
else return 2
Please explain.

Qu 8) For the word count problem, suppose the code in your reducer is:

    long count = 0;
    for (IntWritable value : values) {
        count = count + 1;
    }
    context.write(key, new LongWritable(count));

A) Do you expect correct output if you run this code without a combiner,
and why? Please explain.
B) Do you expect correct output if you run this code with a combiner
class, and why? Please explain.
C) In the above problem, how will you make sure that the output is correct
while keeping the right optimization? What changes will you make?
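To reason about Qu 8, it can help to compare "counting the values" with "summing the values" on a small made-up example. The sketch below runs without Hadoop and rests on assumptions: plain Integer lists stand in for the IntWritable values, the combiner (if used) is assumed to pre-sum the 1s emitted by each mapper, and all names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch comparing two reducer strategies on hypothetical data.
class CombinerCountSketch {

    // Same logic as the reducer in the question: counts how many values arrive.
    static long countValues(List<Integer> values) {
        long count = 0;
        for (int ignored : values) {
            count = count + 1;
        }
        return count;
    }

    // Alternative logic: adds up the values themselves.
    static long sumValues(List<Integer> values) {
        long sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical case: a word occurs 3 times in one split and 2 times in another.
        // Without a combiner the reducer receives five 1s.
        List<Integer> withoutCombiner = Arrays.asList(1, 1, 1, 1, 1);
        // With an assumed summing combiner the reducer receives the partial sums 3 and 2.
        List<Integer> withCombiner = Arrays.asList(3, 2);

        System.out.println(countValues(withoutCombiner)); // prints 5
        System.out.println(countValues(withCombiner));    // prints 2
        System.out.println(sumValues(withCombiner));      // prints 5
    }
}
```

The middle result differs from the true total of 5 because counting treats each partial sum as a single occurrence, whereas summing gives the same answer with or without the pre-aggregation step.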

Qu 9)
A) What is a use case in which a reducer is not required? Please
explain one such use case.
B) Is it good in terms of performance if a reducer is not required?
C) Will shuffle and sort come into play when there is no reducer? Please
explain why.

Qu 10)
In Java:

java.lang.Math.random() returns a pseudo-random number of type double
that is greater than or equal to 0.0 and less than 1.0. By default, the
random number is therefore always generated between 0 and 1.

If you want a specific range of values, you have to multiply the
returned value by the magnitude of the range. For example, if you
want a random number between 0 and 20, the returned value has to
be multiplied by 20 to get the desired result.
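The scaling described above can be shown in a few lines. This is a minimal sketch; the class name RandomRangeSketch and the helper scale are hypothetical and exist only to make the multiplication explicit.

```java
// Minimal sketch of scaling Math.random() to a wider range.
// The names here are hypothetical, chosen for illustration.
class RandomRangeSketch {

    // Maps a draw r in [0.0, 1.0) to the range [0.0, upperBound).
    static double scale(double r, double upperBound) {
        return r * upperBound;
    }

    public static void main(String[] args) {
        double r = Math.random();      // 0.0 <= r < 1.0
        double scaled = scale(r, 20);  // 0.0 <= scaled < 20.0
        System.out.println("draw = " + r + ", scaled to [0, 20) = " + scaled);
    }
}
```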

In the word count problem, consider that you are using 2 reducers and the
custom partitioning logic is written as below:

    if (key.length() + Math.random()*5 < 5)
        return 0;
    else
        return 1;

What is the behaviour of the above code? Please explain what you think
and why. Do you suggest any changes to the above code?
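One way to explore the question is to evaluate the same condition outside Hadoop with the random draw passed in explicitly. The sketch below is an assumption-laden illustration: the class name RandomPartitionSketch, the method partitionFor, and the sample word are hypothetical, and a plain String stands in for the Hadoop key type.

```java
// Plain-Java sketch of the partitioning condition from Qu 10.
// The random draw is a parameter so that fixed draws can be compared.
class RandomPartitionSketch {

    // Same condition as in the question; randomDraw plays the role of Math.random().
    static int partitionFor(String key, double randomDraw) {
        if (key.length() + randomDraw * 5 < 5) {
            return 0;
        } else {
            return 1;
        }
    }

    public static void main(String[] args) {
        String key = "the"; // hypothetical word of length 3

        // Two fixed draws for the same key.
        System.out.println(partitionFor(key, 0.1)); // 3 + 0.5 = 3.5, so prints 0
        System.out.println(partitionFor(key, 0.9)); // 3 + 4.5 = 7.5, so prints 1

        // Live draws: tally how often each partition is chosen for the same key.
        int[] hits = new int[2];
        for (int i = 0; i < 1000; i++) {
            hits[partitionFor(key, Math.random())]++;
        }
        System.out.println("partition 0: " + hits[0] + ", partition 1: " + hits[1]);
    }
}
```

The two fixed draws show that the same key can be assigned to different partitions on different calls, which is the property worth weighing when answering the question.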

**************
