EXPERIMENT: WORD COUNT IN HADOOP (USING HDFS)

1. Create a Java Project in Eclipse

Open Eclipse IDE
Go to File → New → Java Project
Name the project: wordCountJob

2. Add Classes to the Project

Right-click on wordCountJob → New → Class
Create the following three classes:
i. WordCount.java
ii. WordMapper.java
iii. WordReducer.java

3. Add Code to Each File

Prog 1: WordCount.java

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static void main(String[] args) throws Exception {
        // Expect exactly two arguments: the input path and the output directory.
        if (args.length != 2) {
            System.out.printf("Usage: WordCount <input dir> <output dir>\n");
            System.exit(-1);
        }

        // Old-style (Hadoop 1.x) job construction, matching hadoop-core-1.2.1 used below.
        Job job = new Job();
        job.setJarByClass(WordCount.class);
        job.setJobName("wordCount");

        // Wire the input and output paths from the command-line arguments.
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Plug in the mapper and reducer classes defined in Prog 2 and Prog 3.
        job.setMapperClass(WordMapper.class);
        job.setReducerClass(WordReducer.class);

        // Both the intermediate and the final records are (word, count) pairs.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Submit the job and block until it finishes; exit 0 only on success.
        boolean success = job.waitForCompletion(true);
        System.exit(success ? 0 : 1);
    }
}
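
Note: the job log in step 8 warns that "Applications should implement Tool". This is optional, but for reference, here is a hedged sketch of the same driver reworked around ToolRunner so that generic options such as -D key=value are parsed for you (WordCountTool is a hypothetical name; the job wiring is otherwise identical):

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountTool extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.printf("Usage: WordCountTool <input dir> <output dir>\n");
            return -1;
        }
        // getConf() carries any generic options ToolRunner parsed from the command line.
        Job job = new Job(getConf(), "wordCount");
        job.setJarByClass(WordCountTool.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(WordMapper.class);
        job.setReducerClass(WordReducer.class);
        // Map output types match the final output types, so one pair of setters suffices.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new WordCountTool(), args));
    }
}

It runs the same way as in step 8: hadoop jar wordCount.jar WordCountTool <input dir> <output dir>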

Prog 2: WordMapper.java

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        // Split the line on runs of non-word characters and emit (word, 1) per token.
        for (String word : line.split("\\W+")) {
            if (word.length() > 0) {
                // Lower-case each token so counting is case-insensitive,
                // which matches the sample output shown in step 9.
                context.write(new Text(word.toLowerCase()), new IntWritable(1));
            }
        }
    }
}
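
The mapper above allocates a new Text and IntWritable for every word. That is correct, but a common Hadoop idiom is to reuse writable instances across calls, since the framework serializes the key and value during context.write(). A hedged variant of the same mapper:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Reuse one Text and one constant IntWritable instead of allocating new
    // objects per token; Hadoop copies the bytes out during context.write().
    private static final IntWritable ONE = new IntWritable(1);
    private final Text wordText = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String word : value.toString().split("\\W+")) {
            if (word.length() > 0) {
                wordText.set(word.toLowerCase());
                context.write(wordText, ONE);
            }
        }
    }
}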

Prog 3: WordReducer.java

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum the 1s emitted by the mapper for this word.
        int wordCount = 0;
        for (IntWritable value : values) {
            wordCount += value.get();
        }
        context.write(key, new IntWritable(wordCount));
    }
}
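
Optional: because the reduce function just sums integers (an associative and commutative operation), WordReducer can also be registered as a combiner to pre-aggregate counts on the map side; the "Combine input records=0" counter in step 8 shows that no combiner is configured by default. One extra line in the driver (WordCount.java, Prog 1) enables it:

job.setCombinerClass(WordReducer.class);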

4. Add External Hadoop Libraries

Right-click on wordCountJob → Build Path → Configure Build Path
Go to Libraries → Add External JARs
Navigate to: /usr/lib/hadoop/
Add core Hadoop jars (like hadoop-core-1.2.1.jar or similar)
Also include JARs from /usr/lib/hadoop/lib/ if required
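
If you prefer building outside Eclipse, compilation from the terminal should look roughly like this (run from the src directory; the exact JAR name under /usr/lib/hadoop/ may differ on your installation):

mkdir classes
javac -classpath /usr/lib/hadoop/hadoop-core-1.2.1.jar -d classes WordCount.java WordMapper.java WordReducer.java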

5. Export Project to JAR File

Right-click on the project → Export
Choose Java → JAR File
Click Next
Choose export destination (e.g., Desktop or src folder)
Name the file: wordCount.jar
Click Finish
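
Equivalently, the JAR can be packaged from the terminal (assuming the classes directory from the command-line build sketched in step 4):

jar cvf wordCount.jar -C classes .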

6. Copy JAR and Sample File to Workspace (Linux Terminal)

cd ~/workspace/wordCountJob/src
cp /path/to/wordCount.jar ./

Create sample.txt with the three lines below (for example, via a heredoc):

cat > sample.txt << 'EOF'
Hello world
I am eshaan vaswani
I am from BE COMPS TSEC
EOF

7. Upload Input File to HDFS

hadoop fs -mkdir -p /user/training/hadoop_eshaan
hadoop fs -put sample.txt /user/training/hadoop_eshaan
hadoop fs -ls /user/training/hadoop_eshaan
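
To confirm the upload, you can print the file back out of HDFS:

hadoop fs -cat /user/training/hadoop_eshaan/sample.txt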

8. Run the JAR File using Hadoop

[training@localhost src]$ hadoop jar wordCount.jar WordCount /user/training/hadoop_eshaan/sample.txt output2
25/08/04 23:20:45 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
25/08/04 23:20:45 INFO input.FileInputFormat: Total input paths to process : 1
25/08/04 23:20:45 WARN snappy.LoadSnappy: Snappy native library is available
25/08/04 23:20:45 INFO util.NativeCodeLoader: Loaded the native-hadoop library
25/08/04 23:20:45 INFO snappy.LoadSnappy: Snappy native library loaded
25/08/04 23:20:45 INFO mapred.JobClient: Running job: job_202508010234_0006
25/08/04 23:20:46 INFO mapred.JobClient: map 0% reduce 0%
25/08/04 23:20:48 INFO mapred.JobClient: map 100% reduce 0%
25/08/04 23:20:55 INFO mapred.JobClient: map 100% reduce 100%
25/08/04 23:20:55 INFO mapred.JobClient: Job complete: job_202508010234_0006
25/08/04 23:20:55 INFO mapred.JobClient: Counters: 22
25/08/04 23:20:55 INFO mapred.JobClient: Job Counters
25/08/04 23:20:55 INFO mapred.JobClient: Launched reduce tasks=1
25/08/04 23:20:55 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=1318
25/08/04 23:20:55 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
25/08/04 23:20:55 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
25/08/04 23:20:55 INFO mapred.JobClient: Launched map tasks=1
25/08/04 23:20:55 INFO mapred.JobClient: Data-local map tasks=1
25/08/04 23:20:55 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=7519
25/08/04 23:20:55 INFO mapred.JobClient: FileSystemCounters
25/08/04 23:20:55 INFO mapred.JobClient: FILE_BYTES_READ=134
25/08/04 23:20:55 INFO mapred.JobClient: HDFS_BYTES_READ=176
25/08/04 23:20:55 INFO mapred.JobClient: FILE_BYTES_WRITTEN=110360
25/08/04 23:20:55 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=71
25/08/04 23:20:55 INFO mapred.JobClient: Map-Reduce Framework
25/08/04 23:20:55 INFO mapred.JobClient: Reduce input groups=10
25/08/04 23:20:55 INFO mapred.JobClient: Combine output records=0
25/08/04 23:20:55 INFO mapred.JobClient: Map input records=3
25/08/04 23:20:55 INFO mapred.JobClient: Reduce shuffle bytes=134
25/08/04 23:20:55 INFO mapred.JobClient: Reduce output records=10
25/08/04 23:20:55 INFO mapred.JobClient: Spilled Records=24
25/08/04 23:20:55 INFO mapred.JobClient: Map output bytes=104
25/08/04 23:20:55 INFO mapred.JobClient: Combine input records=0
25/08/04 23:20:55 INFO mapred.JobClient: Map output records=12
25/08/04 23:20:55 INFO mapred.JobClient: SPLIT_RAW_BYTES=120
25/08/04 23:20:55 INFO mapred.JobClient: Reduce input records=12
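
These counters line up with the input: Map input records=3 is the three lines of sample.txt, Map output records=12 is the twelve words the mapper emitted, and Reduce input groups=10 matches the ten distinct words in the final output.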

9. View Output

[training@localhost src]$ hadoop fs -ls output2
Found 3 items
-rw-r--r-- 1 training supergroup 0 2025-08-04 23:20 /user/training/output2/_SUCCESS
drwxr-xr-x - training supergroup 0 2025-08-04 23:20 /user/training/output2/_logs
-rw-r--r-- 1 training supergroup 71 2025-08-04 23:20 /user/training/output2/part-r-00000

[training@localhost src]$ hadoop fs -cat output2/part-r-00000
am 2
be 1
comps 1
eshaan 1
from 1
hello 1
i 2
tsec 1
vaswani 1
world 1
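
Note that the output is sorted: the MapReduce framework sorts keys before the reduce phase, so part-r-00000 lists the words in alphabetical order, each with its count.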
