BDA Lab

The document outlines the installation and configuration of Hadoop, including its operation
modes and file management tasks in HDFS. It also details the implementation of matrix
multiplication and a word count program using Hadoop MapReduce, as well as the installation of
Hive and HBase for querying and managing data. Additionally, it covers the process of importing
and exporting data between databases using tools such as Sqoop, Hive, and Cassandra.

1. Downloading and Installing Hadoop; Understanding Different Hadoop Modes, Startup Scripts,
and Configuration Files

Aim:

To install Hadoop and understand its modes of operation, startup scripts, and configuration
files.

Description:

Hadoop can operate in three modes: standalone (local), pseudo-distributed (a single node
simulating a cluster), and fully distributed (a multi-node cluster). Startup scripts manage the
Hadoop services, while configuration files define system behavior.

Procedure:

1. Download Hadoop:
   o Visit the Apache Hadoop website, download the stable release, and extract it.
2. Set up Environment Variables:
   o Configure HADOOP_HOME and update PATH in .bashrc (see the sketch after this list).
3. Understand Modes:
   o Modify the configuration files (core-site.xml, hdfs-site.xml) for pseudo-distributed or
     fully distributed mode (see the sketch after this list).
4. Start Services:
   o start-dfs.sh
   o start-yarn.sh
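
A minimal sketch of steps 2 and 3 for a single-node (pseudo-distributed) setup. The install path
/usr/local/hadoop and the port 9000 are assumptions used for illustration, not requirements:

   # ~/.bashrc
   export HADOOP_HOME=/usr/local/hadoop
   export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

   <!-- core-site.xml: point the default filesystem at a local HDFS NameNode -->
   <configuration>
     <property>
       <name>fs.defaultFS</name>
       <value>hdfs://localhost:9000</value>
     </property>
   </configuration>

   <!-- hdfs-site.xml: a single node can hold only one replica of each block -->
   <configuration>
     <property>
       <name>dfs.replication</name>
       <value>1</value>
     </property>
   </configuration>

Before running start-dfs.sh for the first time, HDFS is formatted once with hdfs namenode -format.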

Result:

Hadoop installed and modes configured successfully.

2. Hadoop Implementation of File Management Tasks

Aim:

To perform file management tasks such as adding, retrieving, and deleting files in Hadoop
Distributed File System (HDFS).

Description:

HDFS allows distributed storage and management of files. Common tasks include adding
files to HDFS, retrieving files, and deleting them.

Procedure:

1. Start Hadoop Services:
   o start-dfs.sh
2. File Operations (a quick way to verify each one is sketched after this list):
   o Add a file:
     hdfs dfs -put localfile.txt /hdfs_directory/
   o Retrieve a file:
     hdfs dfs -get /hdfs_directory/file.txt localdir/
   o Delete a file:
     hdfs dfs -rm /hdfs_directory/file.txt
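
A verification sketch for the operations above, assuming the same /hdfs_directory path:

   hdfs dfs -mkdir -p /hdfs_directory            # create the target directory if it does not exist
   hdfs dfs -ls /hdfs_directory                  # list the directory before and after put/rm
   hdfs dfs -cat /hdfs_directory/localfile.txt   # print the uploaded file's contents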

Result:

File management tasks were successfully executed in HDFS.

3. Implementation of Matrix Multiplication with Hadoop MapReduce

Aim:

To implement matrix multiplication using Hadoop's MapReduce programming model.

Description:

Matrix multiplication is performed by emitting each matrix element as key-value pairs keyed by
the output cells it contributes to; the mappers emit these pairs, and the reducers multiply the
matching elements and sum them.

Procedure:

1. Write Mapper and Reducer Classes (a minimal sketch appears after this list):
   o The mapper tags each element of the input matrices with the output cells it
     contributes to.
   o The reducer multiplies the matching elements and sums them to produce each cell of
     the final matrix.
2. Run the MapReduce Job:
   o hadoop jar MatrixMultiply.jar input output
3. View Output:
   o hdfs dfs -cat /output/*
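
A minimal sketch of the two classes, assuming each input line has the form
matrixName,row,col,value (e.g. A,0,1,3.0) and that the dimensions I, K, J below are known in
advance; the Job driver is omitted, since it is the same boilerplate as in the WordCount example
of the next experiment:

import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class MatrixMultiply {

    // Assumed dimensions: A is I x K, B is K x J (small example values for illustration).
    static final int I = 2, K = 2, J = 2;

    public static class MatMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        public void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String[] f = value.toString().split(",");          // e.g. "A,0,1,3.0"
            if (f[0].equals("A")) {
                int i = Integer.parseInt(f[1]), k = Integer.parseInt(f[2]);
                for (int j = 0; j < J; j++)                    // A[i][k] contributes to every C[i][j]
                    ctx.write(new Text(i + "," + j), new Text("A," + k + "," + f[3]));
            } else {
                int k = Integer.parseInt(f[1]), j = Integer.parseInt(f[2]);
                for (int i = 0; i < I; i++)                    // B[k][j] contributes to every C[i][j]
                    ctx.write(new Text(i + "," + j), new Text("B," + k + "," + f[3]));
            }
        }
    }

    public static class MatReducer extends Reducer<Text, Text, Text, DoubleWritable> {
        @Override
        public void reduce(Text key, Iterable<Text> values, Context ctx)
                throws IOException, InterruptedException {
            double[] a = new double[K], b = new double[K];
            for (Text v : values) {                            // group A and B entries by inner index k
                String[] f = v.toString().split(",");
                int k = Integer.parseInt(f[1]);
                if (f[0].equals("A")) a[k] = Double.parseDouble(f[2]);
                else                  b[k] = Double.parseDouble(f[2]);
            }
            double sum = 0;
            for (int k = 0; k < K; k++) sum += a[k] * b[k];    // C[i][j] = sum over k of A[i][k]*B[k][j]
            ctx.write(key, new DoubleWritable(sum));
        }
    }
}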

Result:

Matrix multiplication was successfully executed.

4. Run a Basic Word Count MapReduce Program

Aim:

To implement a Word Count program using Hadoop MapReduce.

Description:

Word Count identifies the frequency of words in a text file. It demonstrates the MapReduce
paradigm of splitting, mapping, shuffling, and reducing.

Procedure:

1. Write the Word Count Program (the complete program is sketched after this list):
   o Mapper emits words as keys and 1 as values.
   o Reducer sums the values for each key.
2. Run the Job:
   o hadoop jar WordCount.jar input output
3. View Results:
   o hdfs dfs -cat /output/*
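
A complete sketch of the program, closely following the standard Hadoop WordCount example; the
class and job names are the conventional ones and can be changed freely:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);                     // emit (word, 1) for every token
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) sum += val.get();  // total occurrences of this word
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);            // reuse the reducer for local aggregation
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}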

Result:

Word frequencies were successfully calculated using MapReduce.

5. Installation of Hive Along with Practice Examples

Aim:

To install Apache Hive and practice querying structured data in Hadoop.

Description:

Hive provides a SQL-like interface to query data stored in HDFS. It simplifies querying large
datasets.

Procedure:

1. Install Hive:
   o Download Hive and set the HIVE_HOME environment variable.
2. Start the Hive Shell:
   o hive
3. Create and Query a Table (a note on field delimiters follows this list):
   o CREATE TABLE sample(id INT, name STRING);
   o LOAD DATA LOCAL INPATH 'data.txt' INTO TABLE sample;
   o SELECT * FROM sample;
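
As declared above, the table uses Hive's default field delimiter (Ctrl-A, \001). If data.txt is
delimited differently, for example comma-separated, the table can be created with an explicit
delimiter so that LOAD DATA parses the file correctly; a sketch assuming a two-column CSV file:

   CREATE TABLE sample(id INT, name STRING)
   ROW FORMAT DELIMITED
   FIELDS TERMINATED BY ','
   STORED AS TEXTFILE;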

Result:

Hive installed and queries executed successfully.

6. Installation of HBase and Thrift with Practice Examples

Aim:

To install HBase and Thrift to manage NoSQL data and perform basic operations.

Description:

HBase is a NoSQL database for real-time data storage, and the Thrift server is used for
client-server communication between HBase and external client programs.

Procedure:

1. Install HBase:
   o Download HBase and configure HBASE_HOME.
2. Start HBase Services:
   o start-hbase.sh
3. Perform Operations in the HBase Shell (started with hbase shell; starting the Thrift server
   is sketched after this list):
   o create 'table1', 'cf1'
   o put 'table1', 'row1', 'cf1:col1', 'value1'
   o scan 'table1'
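
The experiment also calls for Thrift, which the steps above never start. A minimal sketch,
assuming the HBase binaries are on PATH; the Thrift gateway listens on port 9090 by default:

   hbase thrift start                  # run the Thrift gateway in the foreground
   # or: hbase-daemon.sh start thrift  # run it as a background daemon
   get 'table1', 'row1'                # back in the HBase shell: returns cf1:col1 => 'value1'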

Result:

HBase and Thrift installed successfully, and operations executed.

7. Practice Importing and Exporting Data

Aim:

To import and export data between databases using Cassandra, Hadoop, Java, Pig, Hive, and
HBase.

Description:

Sqoop transfers data between relational databases and HDFS, while tools like Hive and Pig query
and process the data once it is in Hadoop. Cassandra manages NoSQL data efficiently.

Procedure:

1. Install Required Software:
   o Ensure Hadoop, Hive, Pig, Sqoop, and Cassandra are installed.
2. Import/Export Data Using Sqoop:
   o Import:
     sqoop import --connect jdbc:mysql://localhost/db --table table1 --target-dir /hdfs_dir
   o Export:
     sqoop export --connect jdbc:mysql://localhost/db --table table1 --export-dir /hdfs_dir
3. Verify Data:
   o Use the respective database shells (Hive, Cassandra, etc.) to confirm the transfer
     (a quick check is sketched after this list).
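
In practice, the Sqoop commands above also need database credentials and the MySQL JDBC driver
on Sqoop's classpath. A sketch of a more complete import and a quick check of its output; the
user name hadoop is a hypothetical example:

   sqoop import --connect jdbc:mysql://localhost/db --table table1 \
         --username hadoop -P --target-dir /hdfs_dir    # -P prompts for the password
   hdfs dfs -cat /hdfs_dir/part-m-*                      # imported rows land in part-m-NNNNN files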

Result:

Data successfully imported and exported using the mentioned tools.
