Exp 9 - Merged

Uploaded by Abhishek Tiwari

EXPERIMENT NUMBER – 06

AIM: To develop a MapReduce program to find the grades of students.

import java.util.Scanner;

public class Main {
    public static void main(String[] args) {
        int[] marks = new int[6];
        float total = 0, avg;
        Scanner scanner = new Scanner(System.in);
        for (int i = 0; i < 6; i++) {
            System.out.print("Enter Marks of Subject " + (i + 1) + ": ");
            marks[i] = scanner.nextInt();
            total = total + marks[i];
        }
        scanner.close();
        avg = total / 6;
        System.out.print("The student grade is: ");
        if (avg >= 80)
            System.out.println("A");
        else if (avg >= 60)
            System.out.println("B");
        else if (avg >= 40)
            System.out.println("C");
        else
            System.out.println("D");
    }
}
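The grading thresholds above can be isolated into a small helper so they are easy to check in one place (a sketch; the class and method names `Grade` and `gradeOf` are ours, not from the lab sheet):

```java
// Grade.java - the grading thresholds from the program above,
// pulled into a helper method so they can be tested directly.
public class Grade {
    // Returns the letter grade for an average mark.
    public static String gradeOf(float avg) {
        if (avg >= 80) return "A";
        else if (avg >= 60) return "B";   // 60 <= avg < 80
        else if (avg >= 40) return "C";   // 40 <= avg < 60
        else return "D";                  // avg < 40
    }

    public static void main(String[] args) {
        int[] marks = {75, 82, 68, 90, 71, 66};  // sample marks for 6 subjects
        float total = 0;
        for (int m : marks) total += m;
        System.out.println("The student grade is: " + gradeOf(total / 6));
    }
}
```

Note that the boundary values fall into the higher band: an average of exactly 80 is an A, exactly 60 a B, and exactly 40 a C.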
Loading the dataset and initiating Pig:
Filtering & Sorting operation:
Grouping & Splitting operation:
EXPERIMENT NUMBER – 09

AIM: To perform matrix multiplication using a MapReduce program.

Prerequisites:
1. Hadoop setup in Cloudera: Ensure that Hadoop is installed and configured in
Cloudera.

2. Eclipse IDE installed: Eclipse should be installed with the appropriate plugins for
Java development.

3. Hadoop Libraries: Ensure that the Hadoop libraries are added to your project build
path in Eclipse.

Step 1: Set Up Your Eclipse Project:


1. Open Eclipse IDE and create a new Java project.

o Go to File > New > Java Project.

o Name your project, e.g., MatrixMultiplication.

o Click Finish.

2. Add Hadoop Libraries to the Build Path:


o Right-click on the project in the Project Explorer and select Build Path >
Configure Build Path.
o Under the Libraries tab, click Add External JARs.

o Navigate to your Hadoop installation directory, usually located at


/usr/lib/hadoop/ in Cloudera.
o Add all the required JAR files from hadoop-common, hadoop-hdfs, hadoop-mapreduce-client-core, and hadoop-yarn.

Step 2: Create Input Matrices:


Matrix multiplication requires two matrices as input. Store them in HDFS in a sparse format, one element per line, as matrixName,row,column,value:

matrixA:
A,0,0,3
A,0,1,4
A,1,0,2
….

matrixB:
B,0,0,1
B,1,0,2
B,0,1,5
…..
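As a reference for what the job should ultimately compute, here is the same multiplication done with a plain (non-distributed) triple loop. The sample values reuse the entries listed above; the elided entries (A(1,1) and B(1,1)) are filled with made-up values purely for illustration:

```java
// LocalMultiply.java - plain in-memory matrix multiplication, as a
// reference for the result the MapReduce job should produce.
public class LocalMultiply {
    // c[i][k] = sum over j of a[i][j] * b[j][k]
    public static int[][] multiply(int[][] a, int[][] b) {
        int m = a.length, n = b.length, p = b[0].length;
        int[][] c = new int[m][p];
        for (int i = 0; i < m; i++)
            for (int k = 0; k < p; k++)
                for (int j = 0; j < n; j++)
                    c[i][k] += a[i][j] * b[j][k];
        return c;
    }

    public static void main(String[] args) {
        int[][] a = {{3, 4}, {2, 1}};   // matrixA; A(1,1)=1 is a made-up filler
        int[][] b = {{1, 5}, {2, 6}};   // matrixB; B(1,1)=6 is a made-up filler
        int[][] c = multiply(a, b);
        for (int[] row : c)
            System.out.println(row[0] + " " + row[1]);
    }
}
```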

Step 3: Write the MapReduce Code


1. Create a Mapper Class:

o Right-click on your project, select New > Class, and name it MatrixMapper.

o Write the following code:


The code is as follows:
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.IOException;

public class MatrixMapper extends Mapper<Object, Text, Text, Text> {
    private Text outputKey = new Text();
    private Text outputValue = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the input line into tokens
        String[] tokens = value.toString().split(",");
        // Expect 4 tokens: matrixName, i, j, value_ij
        if (tokens.length == 4) {
            String matrixName = tokens[0];
            int i = Integer.parseInt(tokens[1]);
            int j = Integer.parseInt(tokens[2]);
            int value_ij = Integer.parseInt(tokens[3]);
            if (matrixName.equals("A")) {
                // A(i,j) contributes to every output cell (i,k), k = 0..p-1
                for (int k = 0; k < context.getConfiguration().getInt("p", 0); k++) {
                    outputKey.set(i + "," + k);
                    outputValue.set("A," + j + "," + value_ij);
                    context.write(outputKey, outputValue);
                }
            } else if (matrixName.equals("B")) {
                // B(i,j) contributes to every output cell (k,j), k = 0..m-1
                for (int k = 0; k < context.getConfiguration().getInt("m", 0); k++) {
                    outputKey.set(k + "," + j);
                    outputValue.set("B," + i + "," + value_ij);
                    context.write(outputKey, outputValue);
                }
            }
        } else {
            // Log or handle the case of an incorrect input format
            System.err.println("Invalid input format: " + value.toString());
        }
    }
}
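The mapper's key-replication scheme can be checked outside Hadoop with a small plain-Java simulation. Here `emitFor`, `m`, and `p` are illustrative stand-ins for the mapper's Context and Configuration, not Hadoop API:

```java
import java.util.ArrayList;
import java.util.List;

// MapperSim.java - simulates what MatrixMapper emits for one input line,
// so the key replication can be inspected without running a Hadoop job.
public class MapperSim {
    // An A(i,j) element is sent to every output cell (i,k) for k in [0,p);
    // a B(i,j) element to every output cell (k,j) for k in [0,m).
    public static List<String> emitFor(String line, int m, int p) {
        List<String> out = new ArrayList<>();
        String[] t = line.split(",");
        String name = t[0];
        int i = Integer.parseInt(t[1]);
        int j = Integer.parseInt(t[2]);
        int v = Integer.parseInt(t[3]);
        if (name.equals("A")) {
            for (int k = 0; k < p; k++)
                out.add(i + "," + k + " -> A," + j + "," + v);
        } else if (name.equals("B")) {
            for (int k = 0; k < m; k++)
                out.add(k + "," + j + " -> B," + i + "," + v);
        }
        return out;
    }

    public static void main(String[] args) {
        // With p = 2, A(0,1)=4 is replicated to output cells (0,0) and (0,1).
        System.out.println(emitFor("A,0,1,4", 2, 2));
    }
}
```

This makes the shuffle visible: every record tagged with key "i,k" ends up at the reducer responsible for output cell C(i,k).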
2. Create a Reducer class:

o Similarly, create a new class named MatrixReducer with the following code:

The code is as follows:


import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class MatrixReducer extends Reducer<Text, Text, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        Map<Integer, Integer> aMap = new HashMap<>();
        Map<Integer, Integer> bMap = new HashMap<>();
        // Populate aMap and bMap with values from matrices A and B respectively
        for (Text val : values) {
            String[] tokens = val.toString().split(",");
            String matrixName = tokens[0];
            int index = Integer.parseInt(tokens[1]);
            int matrixValue = Integer.parseInt(tokens[2]);
            if (matrixName.equals("A")) {
                aMap.put(index, matrixValue);
            } else if (matrixName.equals("B")) {
                bMap.put(index, matrixValue);
            }
        }
        // Multiply matching entries and sum the products
        int sum = 0;
        for (Map.Entry<Integer, Integer> entry : aMap.entrySet()) {
            int index = entry.getKey();
            int aVal = entry.getValue();
            if (bMap.containsKey(index)) {
                int bVal = bMap.get(index);
                sum += aVal * bVal;
            }
        }
        result.set(sum);
        context.write(key, result);
    }
}
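The reducer's core computation is a dot product: for one output cell, the values tagged A are paired with the values tagged B by their shared index and the products are summed. That logic can be exercised in plain Java (the class and method names `ReducerSim` and `dotProduct` are ours; like the reducer above, this assumes at most one A value and one B value per index for a given cell):

```java
import java.util.HashMap;
import java.util.Map;

// ReducerSim.java - the reducer's dot-product logic on plain strings,
// so it can be checked without a Hadoop cluster.
public class ReducerSim {
    public static int dotProduct(String[] values) {
        Map<Integer, Integer> aMap = new HashMap<>();
        Map<Integer, Integer> bMap = new HashMap<>();
        // Each value is "A,index,element" or "B,index,element"
        for (String val : values) {
            String[] t = val.split(",");
            int index = Integer.parseInt(t[1]);
            int v = Integer.parseInt(t[2]);
            if (t[0].equals("A")) aMap.put(index, v);
            else if (t[0].equals("B")) bMap.put(index, v);
        }
        // Pair A and B entries by index and sum the products
        int sum = 0;
        for (Map.Entry<Integer, Integer> e : aMap.entrySet()) {
            Integer b = bMap.get(e.getKey());
            if (b != null) sum += e.getValue() * b;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Row [3, 4] of A times column [1, 2] of B: 3*1 + 4*2 = 11
        System.out.println(dotProduct(new String[]{"A,0,3", "A,1,4", "B,0,1", "B,1,2"}));
    }
}
```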

3. Create the Driver Class:

• Finally, create the main driver class, named MatrixMultiplication:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MatrixMultiplication {
    public static void main(String[] args) throws Exception {
        // Ensure the correct number of arguments are passed
        if (args.length < 5) {
            System.err.println("Usage: MatrixMultiplication <input path> <output path> <m> <n> <p>");
            System.exit(-1);
        }
        Configuration conf = new Configuration();
        conf.setInt("m", Integer.parseInt(args[2])); // rows of A
        conf.setInt("n", Integer.parseInt(args[3])); // columns of A (and rows of B)
        conf.setInt("p", Integer.parseInt(args[4])); // columns of B
        // Validate matrix dimensions
        if (conf.getInt("m", 0) <= 0 || conf.getInt("n", 0) <= 0 || conf.getInt("p", 0) <= 0) {
            System.err.println("Invalid matrix dimensions: m, n, and p must be positive integers.");
            System.exit(-1);
        }
        Job job = Job.getInstance(conf, "Matrix Multiplication");
        job.setJarByClass(MatrixMultiplication.class);
        job.setMapperClass(MatrixMapper.class);
        job.setReducerClass(MatrixReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Step 4: Compile and Run the Program


1. Export the JAR file:

o Right-click on the project and select Export > JAR file.

o Choose a location to save the JAR file and click Finish.

2. Copy the Input Files to HDFS:

o Use the Cloudera terminal to create directories in HDFS and copy the input
matrices:
hadoop fs -mkdir -p /user/matrix/input

hadoop fs -put /path/to/matrixA.txt /user/matrix/input

hadoop fs -put /path/to/matrixB.txt /user/matrix/input

3. Run the MapReduce job:


• In the terminal, run the following command. The driver expects the matrix dimensions m, n, and p as its last three arguments (pass the main class name, MatrixMultiplication, if it is not set in the JAR manifest):

hadoop jar /path/to/MatrixMultiplication.jar MatrixMultiplication /user/matrix/input /user/matrix/output <m> <n> <p>

4. Check the Output:

• After the job completes, check the output in HDFS:

hadoop fs -cat /user/matrix/output/part-r-00000

Step 5: Debugging and Logs


• If the job fails, check the logs in Cloudera Manager or use the yarn logs command to
troubleshoot errors.

Step 6: Visualize the Results


• Use any tool or script to visualize the output matrix, depending on your needs.
Outputs:
