EXPERIMENT: WORD COUNT IN HADOOP (USING HDFS)
1. Create a Java Project in Eclipse
Open Eclipse IDE
Go to File → New → Java Project
Name the project: wordCountJob
2. Add Classes to the Project
Right-click on wordCountJob → New → Class
Create the following three classes:
i. WordCount.java
ii. WordMapper.java
iii. WordReducer.java
3. Add Code to Each File
Prog 1: WordCount.java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WordCount {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.printf("Usage: WordCount <input dir> <output dir>\n");
            System.exit(-1);
        }
        Job job = new Job();
        job.setJarByClass(WordCount.class);
        job.setJobName("wordCount");
        // Input and output locations are taken from the command line (HDFS paths).
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Wire up the mapper/reducer classes and their key/value types.
        job.setMapperClass(WordMapper.class);
        job.setReducerClass(WordReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        boolean success = job.waitForCompletion(true);
        System.exit(success ? 0 : 1);
    }
}
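Note: this driver matches the Hadoop 1.x setup used here (hadoop-core-1.2.1); on Hadoop 2.x and later the no-argument Job constructor is deprecated and Job.getInstance(new Configuration(), "wordCount") is used instead. Because WordReducer only sums counts, it could optionally also be registered as a combiner so partial sums are computed on the map side; this tweak is not used in this run, which is why the Combine counters in the job log below are 0:
job.setCombinerClass(WordReducer.class);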
Prog 2: WordMapper.java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        // Split on runs of non-word characters and emit (word, 1) for every token;
        // lower-casing makes the counts case-insensitive, matching the output in step 9.
        for (String word : line.split("\\W+")) {
            if (word.length() > 0) {
                context.write(new Text(word.toLowerCase()), new IntWritable(1));
            }
        }
    }
}
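To see what the mapper emits without starting a cluster, the tokenization logic can be exercised in a plain Java main. This is a standalone sketch for illustration only (TokenizeDemo is not part of the project):
public class TokenizeDemo {
    public static void main(String[] args) {
        String line = "I am from BE COMPS TSEC";   // one line of sample.txt
        // Same split as WordMapper: break the line on runs of non-word characters.
        for (String word : line.split("\\W+")) {
            if (word.length() > 0) {
                // WordMapper would emit the pair (word, 1) for each of these tokens.
                System.out.println(word.toLowerCase() + "\t1");
            }
        }
    }
}
Running it prints i, am, from, be, comps, and tsec, each paired with a count of 1; the real mapper writes the same pairs to the MapReduce framework via context.write.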
Prog 3: WordReducer.java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class WordReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum the 1s emitted by the mappers for this word.
        int wordCount = 0;
        for (IntWritable value : values) {
            wordCount += value.get();
        }
        context.write(key, new IntWritable(wordCount));
    }
}
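After the shuffle phase, all values for a given key arrive at the reducer together. For the sample file the mappers emit (am, 1) twice, so the reducer receives am with the value list [1, 1] and writes am 2, which is exactly the line that appears in the final output in step 9.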
4. Add External Hadoop Libraries
Right-click on wordCountJob → Build Path → Configure Build Path
Go to Libraries → Add External JARs
Navigate to:
/usr/lib/hadoop/
Add the core Hadoop JAR (e.g., hadoop-core-1.2.1.jar, or the equivalent for your Hadoop version)
Also add the JARs under /usr/lib/hadoop/lib/ if the compiler still reports missing classes
5. Export Project to JAR File
Right-click on the project → Export
Choose Java → JAR File
Click Next
Choose an export destination (e.g., the Desktop or the project's src folder)
Name the file: wordCount.jar
Click Finish
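If Eclipse is unavailable, the same JAR can typically be produced from the terminal instead, assuming the hadoop classpath command is available on the machine (a sketch, run from the directory holding the three .java files):
mkdir classes
javac -classpath "$(hadoop classpath)" -d classes WordCount.java WordMapper.java WordReducer.java
jar -cvf wordCount.jar -C classes .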
6. Copy JAR and Sample File to Workspace (Linux Terminal)
cd ~/workspace/wordCountJob/src
cp /path/to/wordCount.jar ./
# Create sample.txt with the test input:
cat > sample.txt << 'EOF'
Hello world
I am eshaan vaswani
I am from BE COMPS TSEC
EOF
7. Upload Input File to HDFS
hadoop fs -mkdir -p /user/training/hadoop_eshaan
hadoop fs -put sample.txt /user/training/hadoop_eshaan
hadoop fs -ls /user/training/hadoop_eshaan
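To double-check the upload, the file can be printed straight back from HDFS:
hadoop fs -cat /user/training/hadoop_eshaan/sample.txt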
8. Run the JAR File using Hadoop
[training@localhost src]$ hadoop jar wordCount.jar WordCount /user/training/hadoop_eshaan/sample.txt output2
25/08/04 23:20:45 WARN mapred.JobClient: Use GenericOptionsParser for parsing the
arguments. Applications should implement Tool for the same.
25/08/04 23:20:45 INFO input.FileInputFormat: Total input paths to process : 1
25/08/04 23:20:45 WARN snappy.LoadSnappy: Snappy native library is available
25/08/04 23:20:45 INFO util.NativeCodeLoader: Loaded the native-hadoop library
25/08/04 23:20:45 INFO snappy.LoadSnappy: Snappy native library loaded
25/08/04 23:20:45 INFO mapred.JobClient: Running job: job_202508010234_0006
25/08/04 23:20:46 INFO mapred.JobClient: map 0% reduce 0%
25/08/04 23:20:48 INFO mapred.JobClient: map 100% reduce 0%
25/08/04 23:20:55 INFO mapred.JobClient: map 100% reduce 100%
25/08/04 23:20:55 INFO mapred.JobClient: Job complete: job_202508010234_0006
25/08/04 23:20:55 INFO mapred.JobClient: Counters: 22
25/08/04 23:20:55 INFO mapred.JobClient: Job Counters
25/08/04 23:20:55 INFO mapred.JobClient: Launched reduce tasks=1
25/08/04 23:20:55 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=1318
25/08/04 23:20:55 INFO mapred.JobClient: Total time spent by all reduces waiting after
reserving slots (ms)=0
25/08/04 23:20:55 INFO mapred.JobClient: Total time spent by all maps waiting after
reserving slots (ms)=0
25/08/04 23:20:55 INFO mapred.JobClient: Launched map tasks=1
25/08/04 23:20:55 INFO mapred.JobClient: Data-local map tasks=1
25/08/04 23:20:55 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=7519
25/08/04 23:20:55 INFO mapred.JobClient: FileSystemCounters
25/08/04 23:20:55 INFO mapred.JobClient: FILE_BYTES_READ=134
25/08/04 23:20:55 INFO mapred.JobClient: HDFS_BYTES_READ=176
25/08/04 23:20:55 INFO mapred.JobClient: FILE_BYTES_WRITTEN=110360
25/08/04 23:20:55 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=71
25/08/04 23:20:55 INFO mapred.JobClient: Map-Reduce Framework
25/08/04 23:20:55 INFO mapred.JobClient: Reduce input groups=10
25/08/04 23:20:55 INFO mapred.JobClient: Combine output records=0
25/08/04 23:20:55 INFO mapred.JobClient: Map input records=3
25/08/04 23:20:55 INFO mapred.JobClient: Reduce shuffle bytes=134
25/08/04 23:20:55 INFO mapred.JobClient: Reduce output records=10
25/08/04 23:20:55 INFO mapred.JobClient: Spilled Records=24
25/08/04 23:20:55 INFO mapred.JobClient: Map output bytes=104
25/08/04 23:20:55 INFO mapred.JobClient: Combine input records=0
25/08/04 23:20:55 INFO mapred.JobClient: Map output records=12
25/08/04 23:20:55 INFO mapred.JobClient: SPLIT_RAW_BYTES=120
25/08/04 23:20:55 INFO mapred.JobClient: Reduce input records=12
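The WARN line at the top of the log is only advisory: Hadoop recommends that drivers implement the Tool interface so that generic options such as -D, -files, and -libjars are parsed automatically. A possible Tool-based variant of the driver is sketched below; the class name WordCountTool is illustrative, and this change is not required for the experiment:
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountTool extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: WordCountTool <input dir> <output dir>");
            return -1;
        }
        Job job = new Job(getConf());              // reuse the configuration parsed by ToolRunner
        job.setJarByClass(WordCountTool.class);
        job.setJobName("wordCount");
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(WordMapper.class);
        job.setReducerClass(WordReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new WordCountTool(), args));
    }
}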
9. View Output
[training@localhost src]$ hadoop fs -ls output2
Found 3 items
-rw-r--r-- 1 training supergroup 0 2025-08-04 23:20 /user/training/output2/_SUCCESS
drwxr-xr-x - training supergroup 0 2025-08-04 23:20 /user/training/output2/_logs
-rw-r--r-- 1 training supergroup 71 2025-08-04 23:20 /user/training/output2/part-r-00000
[training@localhost src]$ hadoop fs -cat output2/part-r-00000
am 2
be 1
comps 1
eshaan 1
from 1
hello 1
i 2
tsec 1
vaswani 1
world 1
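Note: MapReduce refuses to write into an output directory that already exists, so to re-run the job either pass a new directory name or delete the old one first, e.g. hadoop fs -rmr output2 on this Hadoop 1.x setup (hadoop fs -rm -r output2 on Hadoop 2.x and later).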