Wordcount

The document provides a Java implementation of a Word Count program using Hadoop's MapReduce framework. It includes the necessary code for the Mapper and Reducer classes, as well as instructions for compiling and running the program in a Hadoop environment. The example demonstrates how to create input data, execute the job, and view the output results.

Implement a word count program in Hadoop

Java:
Open Notepad or any editor and save the program below as WordCount.java

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every cleaned, lowercased token in the input line.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                // Strip non-word characters and normalize case before counting.
                word.set(itr.nextToken().replaceAll("\\W+", "").toLowerCase());
                if (!word.toString().isEmpty()) {
                    context.write(word, one);
                }
            }
        }
    }

    // Reducer (also used as the combiner): sums all counts for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: WordCount <input path> <output path>");
            System.exit(-1);
        }

        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");

        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
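The token clean-up step in TokenizerMapper can be checked locally in plain Java with no Hadoop dependencies. This is an illustrative sketch; the class and helper names (`NormalizeDemo`, `normalize`) are not part of the program above:

```java
// Standalone sketch of the token clean-up used in TokenizerMapper:
// strip non-word characters, then lowercase. `NormalizeDemo` and
// `normalize` are illustrative names only.
public class NormalizeDemo {
    static String normalize(String token) {
        return token.replaceAll("\\W+", "").toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(normalize("Apple,"));   // apple
        System.out.println(normalize("BANANA!"));  // banana
        System.out.println(normalize("--"));       // empty string; the mapper skips it
    }
}
```

Tokens that reduce to an empty string (pure punctuation) are dropped by the `isEmpty()` check in the mapper, which is why they never appear in the output.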

In Command Prompt (always run CMD as Administrator):


C:\Users\KUMARESH\Documents>mkdir classes

C:\Users\KUMARESH\Documents>javac -classpath "C:\hadoop\share\hadoop\common\*;C:\hadoop\share\hadoop\common\lib\*;C:\hadoop\share\hadoop\mapreduce\*;C:\hadoop\share\hadoop\mapreduce\lib\*;C:\hadoop\share\hadoop\hdfs\*;C:\hadoop\share\hadoop\hdfs\lib\*" -d classes WordCount.java

C:\Users\KUMARESH\Documents>jar -cvf wordcount.jar -C classes/ .


added manifest
adding: WordCount$IntSumReducer.class(in = 1739) (out= 742)(deflated 57%)
adding: WordCount$TokenizerMapper.class(in = 1926) (out= 857)(deflated 55%)
adding: WordCount.class(in = 1656) (out= 918)(deflated 44%)

c:\hadoop>start-all.cmd
This script is Deprecated. Instead use start-dfs.cmd and start-yarn.cmd
starting yarn daemons

c:\hadoop>hdfs dfs -mkdir -p /input


Create a sample text file (file.txt) in a local folder named 'input'.
Sample data:
apple banana apple
orange banana apple
grape orange banana
apple banana apple
grape orange banana
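The expected counts for this sample can be verified locally with a plain-Java sketch that mirrors the map step (tokenize, clean, lowercase) and the reduce step (sum per word). The class and method names (`LocalWordCount`, `count`) are illustrative and not part of the Hadoop job:

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Local sketch of the MapReduce logic: tokenize each line (map),
// then sum the counts per word (reduce). Illustrative only; the real
// job distributes this work across Hadoop tasks.
public class LocalWordCount {
    static Map<String, Integer> count(String[] lines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted, like reducer output
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                String w = itr.nextToken().replaceAll("\\W+", "").toLowerCase();
                if (!w.isEmpty()) counts.merge(w, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] sample = {
            "apple banana apple",
            "orange banana apple",
            "grape orange banana",
            "apple banana apple",
            "grape orange banana"
        };
        System.out.println(count(sample)); // {apple=5, banana=5, grape=2, orange=3}
    }
}
```

A `TreeMap` is used so words come out in sorted order, matching how a single reducer emits keys.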

c:\hadoop>hdfs dfs -put input/file.txt /input

c:\Users\KUMARESH\Documents>hadoop jar wordcount.jar WordCount /input /output2

c:\Users\KUMARESH\Documents>hdfs dfs -cat /output2/part-r-00000


apple 5
banana 5
grape 2
orange 3
