Wordcount

The document provides a Java implementation of a Word Count program using Hadoop's MapReduce framework. It includes the necessary code for the Mapper and Reducer classes, as well as instructions for compiling and running the program in a Hadoop environment. The example demonstrates how to create input data, execute the job, and view the output results.

Implement a word count program in Hadoop

Java:
Open Notepad or any editor and save the program below as WordCount.java

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every cleaned, lowercased token in the input line.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                // Strip non-word characters and normalize case before counting.
                word.set(itr.nextToken().replaceAll("\\W+", "").toLowerCase());
                if (!word.toString().isEmpty()) {
                    context.write(word, one);
                }
            }
        }
    }

    // Reducer (also used as the combiner): sums all counts for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: WordCount <input path> <output path>");
            System.exit(-1);
        }

        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");

        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
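The token clean-up step in TokenizerMapper can be checked locally in plain Java with no Hadoop dependencies. This is an illustrative sketch; the class and helper names (`NormalizeDemo`, `normalize`) are not part of the program above:

```java
// Standalone sketch of the token clean-up used in TokenizerMapper:
// strip non-word characters, then lowercase. `NormalizeDemo` and
// `normalize` are illustrative names only.
public class NormalizeDemo {
    static String normalize(String token) {
        return token.replaceAll("\\W+", "").toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(normalize("Apple,"));   // apple
        System.out.println(normalize("BANANA!"));  // banana
        System.out.println(normalize("--"));       // empty string; the mapper skips it
    }
}
```

Tokens that reduce to an empty string (pure punctuation) are dropped by the `isEmpty()` check in the mapper, which is why they never appear in the output.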

In Command Prompt (always run CMD as Administrator):


C:\Users\KUMARESH\Documents>mkdir classes

C:\Users\KUMARESH\Documents>javac -classpath "C:\hadoop\share\hadoop\common\*;C:\hadoop\share\hadoop\common\lib\*;C:\hadoop\share\hadoop\mapreduce\*;C:\hadoop\share\hadoop\mapreduce\lib\*;C:\hadoop\share\hadoop\hdfs\*;C:\hadoop\share\hadoop\hdfs\lib\*" -d classes WordCount.java

C:\Users\KUMARESH\Documents>jar -cvf wordcount.jar -C classes/ .


added manifest
adding: WordCount$IntSumReducer.class(in = 1739) (out= 742)(deflated 57%)
adding: WordCount$TokenizerMapper.class(in = 1926) (out= 857)(deflated 55%)
adding: WordCount.class(in = 1656) (out= 918)(deflated 44%)

c:\hadoop>start-all.cmd
This script is Deprecated. Instead use start-dfs.cmd and start-yarn.cmd
starting yarn daemons

c:\hadoop>hdfs dfs -mkdir -p /input


Create a sample text file (file.txt) in a local folder named 'input'.
Sample data:
apple banana apple
orange banana apple
grape orange banana
apple banana apple
grape orange banana
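The expected counts for this sample can be verified locally with a plain-Java sketch that mirrors the map step (tokenize, clean, lowercase) and the reduce step (sum per word). The class and method names (`LocalWordCount`, `count`) are illustrative and not part of the Hadoop job:

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Local sketch of the MapReduce logic: tokenize each line (map),
// then sum the counts per word (reduce). Illustrative only; the real
// job distributes this work across Hadoop tasks.
public class LocalWordCount {
    static Map<String, Integer> count(String[] lines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted, like reducer output
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                String w = itr.nextToken().replaceAll("\\W+", "").toLowerCase();
                if (!w.isEmpty()) counts.merge(w, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] sample = {
            "apple banana apple",
            "orange banana apple",
            "grape orange banana",
            "apple banana apple",
            "grape orange banana"
        };
        System.out.println(count(sample)); // {apple=5, banana=5, grape=2, orange=3}
    }
}
```

A `TreeMap` is used so words come out in sorted order, matching how a single reducer emits keys.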

c:\hadoop>hdfs dfs -put input/file.txt /input

c:\Users\KUMARESH\Documents>hadoop jar wordcount.jar WordCount /input /output2

c:\Users\KUMARESH\Documents>hdfs dfs -cat /output2/part-r-00000


apple 5
banana 5
grape 2
orange 3
