1. Downloading and Installing Hadoop; Understanding Different Hadoop Modes, Startup Scripts, and Configuration Files
Aim:
To install Hadoop and understand its modes of operation, startup scripts, and configuration
files.
Description:
Hadoop can operate in three modes: standalone (local), pseudo-distributed (single node), and fully distributed (multi-node cluster). Startup scripts manage the Hadoop daemons, while configuration files define system
behavior.
Procedure:
1. Download Hadoop:
o Visit the Apache Hadoop website, download the stable release, and extract the archive.
2. Set up Environment Variables:
o Configure HADOOP_HOME and update PATH in .bashrc (sample entries are sketched after this list).
3. Understand Modes:
o Modify configuration files (core-site.xml, hdfs-site.xml) for pseudo-distributed or fully distributed mode (a minimal pseudo-distributed sketch follows this list).
4. Start Services:
o start-dfs.sh
o start-yarn.sh
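For steps 2 and 3, a minimal single-node (pseudo-distributed) sketch follows. The install path /usr/local/hadoop, the port 9000, and the replication factor of 1 are assumptions to be adapted to the local setup.
Lines appended to ~/.bashrc (step 2):
export HADOOP_HOME=/usr/local/hadoop   # assumed install path
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
core-site.xml (step 3), setting the default filesystem:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
hdfs-site.xml (step 3), using replication factor 1 on a single node:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
Before the first start, the NameNode is formatted once with hdfs namenode -format.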
Result:
Hadoop installed and modes configured successfully.
2. Hadoop Implementation of File Management Tasks
Aim:
To perform file management tasks such as adding, retrieving, and deleting files in Hadoop
Distributed File System (HDFS).
Description:
HDFS allows distributed storage and management of files. Common tasks include adding
files to HDFS, retrieving files, and deleting them.
Procedure:
1. Start Hadoop Services:
o start-dfs.sh
2. File Operations:
o Add file:
o hdfs dfs -put localfile.txt /hdfs_directory/
o Retrieve file:
o hdfs dfs -get /hdfs_directory/file.txt localdir/
o Delete file:
o hdfs dfs -rm /hdfs_directory/file.txt
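Each operation can be confirmed by listing the target directory (the directory name is the same placeholder used above):
hdfs dfs -ls /hdfs_directory/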
Result:
File management tasks were successfully executed in HDFS.
3. Implementation of Matrix Multiplication with Hadoop MapReduce
Aim:
To implement matrix multiplication using Hadoop's MapReduce programming model.
Description:
Matrix multiplication is expressed as a MapReduce job: each matrix element is emitted as key-value pairs keyed by the output cell it contributes to, and reducers multiply the matching elements and sum the products.
Procedure:
1. Write Mapper and Reducer Classes:
o Mapper processes matrix elements and emits them keyed by output cell.
o Reducer combines the intermediate products to generate the final matrix (a sketch follows this list).
2. Run the MapReduce Job:
o hadoop jar MatrixMultiply.jar input output
3. View Output:
o hdfs dfs -cat /output/*
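A minimal sketch of the mapper and reducer for step 1 follows, using the one-step algorithm. It assumes input lines of the form A,i,k,value or B,k,j,value, and that a driver class (not shown) registers these classes, sets Text key/value types, and stores the dimensions m (rows of A) and p (columns of B) in the job configuration; these details are assumptions rather than part of the original procedure.
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class MatrixMultiply {
  // Emits every matrix element once per output cell (i, j) it contributes to.
  public static class MatMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context ctx)
        throws IOException, InterruptedException {
      Configuration conf = ctx.getConfiguration();
      int m = conf.getInt("m", 1);               // rows of A (assumed conf key)
      int p = conf.getInt("p", 1);               // columns of B (assumed conf key)
      String[] t = value.toString().split(",");  // e.g. "A,0,1,5.0"
      if (t[0].equals("A")) {                    // A[i][k] feeds cells (i, 0..p-1)
        for (int j = 0; j < p; j++)
          ctx.write(new Text(t[1] + "," + j), new Text("A," + t[2] + "," + t[3]));
      } else {                                   // B[k][j] feeds cells (0..m-1, j)
        for (int i = 0; i < m; i++)
          ctx.write(new Text(i + "," + t[2]), new Text("B," + t[1] + "," + t[3]));
      }
    }
  }

  // For one output cell, pairs A[i][k] with B[k][j] on the shared index k
  // and sums the products.
  public static class MatReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context ctx)
        throws IOException, InterruptedException {
      Map<Integer, Double> a = new HashMap<>();
      Map<Integer, Double> b = new HashMap<>();
      for (Text v : values) {
        String[] t = v.toString().split(",");
        if (t[0].equals("A")) a.put(Integer.parseInt(t[1]), Double.parseDouble(t[2]));
        else                  b.put(Integer.parseInt(t[1]), Double.parseDouble(t[2]));
      }
      double sum = 0;
      for (Map.Entry<Integer, Double> e : a.entrySet())
        sum += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
      ctx.write(key, new Text(Double.toString(sum)));
    }
  }
}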
Result:
Matrix multiplication was successfully executed.
4. Run a Basic Word Count MapReduce Program
Aim:
To implement a Word Count program using Hadoop MapReduce.
Description:
Word Count identifies the frequency of words in a text file. It demonstrates the MapReduce
paradigm of splitting, mapping, shuffling, and reducing.
Procedure:
1. Write the Word Count Program:
o Mapper emits each word as a key with the value 1.
o Reducer sums the values for each key (a sketch follows this list).
2. Run the Job:
o hadoop jar WordCount.jar input output
3. View Results:
o hdfs dfs -cat /output/*
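A minimal sketch of the program for step 1 follows; it mirrors the classic Hadoop example, and the class and variable names are illustrative rather than prescribed by the procedure.
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
  // Mapper: emit (word, 1) for every token in the input line.
  public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();
    @Override
    protected void map(LongWritable key, Text value, Context ctx)
        throws IOException, InterruptedException {
      StringTokenizer it = new StringTokenizer(value.toString());
      while (it.hasMoreTokens()) {
        word.set(it.nextToken());
        ctx.write(word, ONE);
      }
    }
  }

  // Reducer: sum the 1s emitted for each word.
  public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) sum += v.get();
      ctx.write(key, new IntWritable(sum));
    }
  }

  // Driver: wires the classes together for "hadoop jar WordCount.jar input output".
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenMapper.class);
    job.setCombinerClass(SumReducer.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}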
Result:
Word frequencies were successfully calculated using MapReduce.
5. Installation of Hive Along with Practice Examples
Aim:
To install Apache Hive and practice querying structured data in Hadoop.
Description:
Hive provides a SQL-like interface to query data stored in HDFS. It simplifies querying large
datasets.
Procedure:
1. Install Hive:
o Download Hive and set HIVE_HOME environment variable.
2. Start the Hive Shell:
o hive
3. Create and Query a Table:
o CREATE TABLE sample(id INT, name STRING);
o LOAD DATA LOCAL INPATH 'data.txt' INTO TABLE sample;
o SELECT * FROM sample;
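The format of data.txt is not specified above; if it holds comma-separated id,name pairs (an assumption), the table definition needs a matching delimiter, and a couple of further queries can then be practised on it:
CREATE TABLE sample(id INT, name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
SELECT COUNT(*) FROM sample;
SELECT name FROM sample WHERE id > 10;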
Result:
Hive installed and queries executed successfully.
6. Installation of HBase and Thrift with Practice Examples
Aim:
To install HBase and Thrift to manage NoSQL data and perform basic operations.
Description:
HBase is a column-oriented NoSQL database for real-time read/write access to large datasets, and Thrift provides a language-independent interface through which external clients communicate with HBase.
Procedure:
1. Install HBase:
o Download HBase and configure HBASE_HOME.
2. Start HBase Services:
o start-hbase.sh
3. Perform Operations in the HBase Shell:
o create 'table1', 'cf1'
o put 'table1', 'row1', 'cf1:col1', 'value1'
o scan 'table1'
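The shell steps above exercise HBase only. To bring up the Thrift interface named in the aim, the bundled Thrift gateway can be started and stopped with the standard daemon script (it listens on port 9090 by default), after which Thrift-based clients can connect to it:
hbase-daemon.sh start thrift
hbase-daemon.sh stop thrift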
Result:
HBase and Thrift installed successfully, and operations executed.
7. Practice Importing and Exporting Data
Aim:
To import and export data between databases using Cassandra, Hadoop, Java, Pig, Hive, and
HBase.
Description:
Tools such as Sqoop, Hive, and Pig move data between HDFS and external databases, while Cassandra manages NoSQL data efficiently.
Procedure:
1. Install Required Software:
o Ensure Hadoop, Sqoop, Hive, Pig, and Cassandra are installed.
2. Import/Export Data Using Sqoop:
o Import:
o sqoop import --connect jdbc:mysql://localhost/db --table table1 --target-dir /hdfs_dir
o Export:
o sqoop export --connect jdbc:mysql://localhost/db --table table1 --export-dir /hdfs_dir
3. Verify Data:
o Use respective database shells (Hive, Cassandra, etc.) to confirm.
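For example (using the placeholder directory and table names from step 2), the imported files can be listed and read back from HDFS, and the exported rows counted in the MySQL shell:
hdfs dfs -ls /hdfs_dir
hdfs dfs -cat /hdfs_dir/part-*
SELECT COUNT(*) FROM table1;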
Result:
Data successfully imported and exported using the mentioned tools.