BDA Lab
Aim:
To install Hadoop and understand its modes of operation, startup scripts, and configuration
files.
Description:
Hadoop can operate in three modes: standalone, pseudo-distributed, and fully distributed.
Startup scripts help manage Hadoop services, while configuration files define system
behavior.
Procedure:
1. Download Hadoop:
   o Visit the Apache Hadoop website, download the stable release, and extract it.
2. Set Up Environment Variables:
   o Configure HADOOP_HOME and update PATH in .bashrc.
3. Understand Modes:
   o Modify the configuration files (core-site.xml, hdfs-site.xml) for pseudo-distributed or fully distributed mode (see the sample configuration after this list).
4. Start Services:
   o start-dfs.sh
   o start-yarn.sh
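As a minimal illustration of step 3 for pseudo-distributed mode, the default filesystem is pointed at a local NameNode and the replication factor is set to 1. fs.defaultFS and dfs.replication are standard Hadoop property names; the port 9000 is a conventional choice for this sketch, not a requirement.

core-site.xml:

   <configuration>
     <property>
       <name>fs.defaultFS</name>
       <value>hdfs://localhost:9000</value>
     </property>
   </configuration>

hdfs-site.xml:

   <configuration>
     <property>
       <name>dfs.replication</name>
       <value>1</value>
     </property>
   </configuration>

After running the startup scripts, the jps command should list daemons such as NameNode, DataNode, ResourceManager, and NodeManager.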
Result:
Aim:
To perform file management tasks such as adding, retrieving, and deleting files in Hadoop
Distributed File System (HDFS).
Description:
HDFS allows distributed storage and management of files. Common tasks include adding
files to HDFS, retrieving files, and deleting them.
Procedure:
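A minimal sketch of the three tasks using the standard hadoop fs commands; the directory and file names below are placeholders.

1. Add a file to HDFS:
   o hadoop fs -mkdir -p /user/input
   o hadoop fs -put localfile.txt /user/input
2. Retrieve a file from HDFS:
   o hadoop fs -cat /user/input/localfile.txt
   o hadoop fs -get /user/input/localfile.txt retrieved.txt
3. Delete a file from HDFS:
   o hadoop fs -rm /user/input/localfile.txt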
Result:
Aim:
To implement matrix multiplication using the MapReduce programming model in Hadoop.
Description:
Matrix multiplication is performed by splitting input matrices into key-value pairs processed
by mappers and reducers.
Procedure:
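A sketch of the standard single-pass algorithm, assuming each input line has the form matrixName,row,col,value (e.g. A,0,1,5.0) and that the matrix dimensions are passed through the job configuration. The class names, the input format, and the rows.A / cols.B configuration keys are illustrative assumptions, not standard Hadoop names.

MatrixMapper.java:

   import java.io.IOException;
   import org.apache.hadoop.io.*;
   import org.apache.hadoop.mapreduce.*;

   public class MatrixMapper extends Mapper<LongWritable, Text, Text, Text> {
       // Emits one record per output cell that this matrix element contributes to.
       @Override
       protected void map(LongWritable key, Text value, Context ctx)
               throws IOException, InterruptedException {
           int m = ctx.getConfiguration().getInt("rows.A", 0); // rows of A (assumed key)
           int n = ctx.getConfiguration().getInt("cols.B", 0); // columns of B (assumed key)
           String[] f = value.toString().split(",");
           if (f[0].equals("A")) {            // A[i][k] contributes to cells (i, 0..n-1)
               for (int j = 0; j < n; j++)
                   ctx.write(new Text(f[1] + "," + j), new Text("A," + f[2] + "," + f[3]));
           } else {                           // B[k][j] contributes to cells (0..m-1, j)
               for (int i = 0; i < m; i++)
                   ctx.write(new Text(i + "," + f[2]), new Text("B," + f[1] + "," + f[3]));
           }
       }
   }

MatrixReducer.java:

   import java.io.IOException;
   import java.util.HashMap;
   import java.util.Map;
   import org.apache.hadoop.io.*;
   import org.apache.hadoop.mapreduce.*;

   public class MatrixReducer extends Reducer<Text, Text, Text, DoubleWritable> {
       // For output cell (i,j), pairs A[i][k] with B[k][j] by the shared index k
       // and sums the products: C[i][j] = sum over k of A[i][k] * B[k][j].
       @Override
       protected void reduce(Text key, Iterable<Text> values, Context ctx)
               throws IOException, InterruptedException {
           Map<Integer, Double> a = new HashMap<>();
           Map<Integer, Double> b = new HashMap<>();
           for (Text v : values) {
               String[] f = v.toString().split(",");
               (f[0].equals("A") ? a : b).put(Integer.parseInt(f[1]), Double.parseDouble(f[2]));
           }
           double sum = 0.0;
           for (Map.Entry<Integer, Double> e : a.entrySet())
               sum += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
           ctx.write(key, new DoubleWritable(sum));
       }
   }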
Result:
Aim:
To implement the Word Count program using MapReduce to find the frequency of words in a text file.
Description:
Word Count identifies the frequency of words in a text file. It demonstrates the MapReduce
paradigm of splitting, mapping, shuffling, and reducing.
Procedure:
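A minimal sketch of the classic mapper and reducer; the class names are illustrative. Packaged into a jar together with a driver class (not shown here), the job runs against an input and output directory in HDFS.

WordCountMapper.java:

   import java.io.IOException;
   import java.util.StringTokenizer;
   import org.apache.hadoop.io.*;
   import org.apache.hadoop.mapreduce.*;

   public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
       private static final IntWritable ONE = new IntWritable(1);
       // Splits each input line into words and emits (word, 1) for every occurrence.
       @Override
       protected void map(LongWritable key, Text value, Context ctx)
               throws IOException, InterruptedException {
           StringTokenizer it = new StringTokenizer(value.toString());
           while (it.hasMoreTokens())
               ctx.write(new Text(it.nextToken()), ONE);
       }
   }

WordCountReducer.java:

   import java.io.IOException;
   import org.apache.hadoop.io.*;
   import org.apache.hadoop.mapreduce.*;

   public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
       // Sums the per-word counts produced by all mappers after the shuffle phase.
       @Override
       protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
               throws IOException, InterruptedException {
           int sum = 0;
           for (IntWritable v : values)
               sum += v.get();
           ctx.write(key, new IntWritable(sum));
       }
   }

A typical submission, where the jar, class, and directory names are placeholders:

   hadoop jar wordcount.jar WordCountDriver /user/input /user/output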
Result:
Aim:
To install Hive and run basic SQL-like queries on data stored in HDFS.
Description:
Hive provides a SQL-like interface to query data stored in HDFS. It simplifies querying large
datasets.
Procedure:
1. Install Hive:
   o Download Hive and set the HIVE_HOME environment variable.
2. Start the Hive Shell:
   o hive
3. Create and Query a Table (a note on the row format follows this list):
   o CREATE TABLE sample(id INT, name STRING);
   o LOAD DATA LOCAL INPATH 'data.txt' INTO TABLE sample;
   o SELECT * FROM sample;
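The CREATE TABLE statement above uses Hive's default field delimiter. If data.txt were comma-separated, for example, the table definition would need a matching row format; a sketch under that assumption:

   CREATE TABLE sample(id INT, name STRING)
   ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';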
Result:
Aim:
To install HBase and Thrift to manage NoSQL data and perform basic operations.
Description:
HBase is a NoSQL database for real-time data storage, and Thrift is used for client-server
communication.
Procedure:
1. Install HBase:
   o Download HBase and configure HBASE_HOME.
2. Start HBase Services:
   o start-hbase.sh
3. Perform Operations in the HBase Shell (further operations follow this list):
   o create 'table1', 'cf1'
   o put 'table1', 'row1', 'cf1:col1', 'value1'
   o scan 'table1'
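Since the aim also covers Thrift, the Thrift server can be started alongside HBase so that non-Java clients can connect (9090 is its default port):

   hbase thrift start

A few more shell operations on the same example table, for reading a row and cleaning up:

   get 'table1', 'row1'
   delete 'table1', 'row1', 'cf1:col1'
   disable 'table1'
   drop 'table1'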
Result:
Aim:
To import and export data between databases using Cassandra, Hadoop, Java, Pig, Hive, and
HBase.
Description:
Tools like Hive and Pig can move data between HDFS and other databases. Cassandra
manages NoSQL data efficiently.
Procedure:
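A sketch of three common paths, with placeholder table, keyspace, and file names throughout. First, exporting a Hive table from HDFS to a local comma-separated directory:

   INSERT OVERWRITE LOCAL DIRECTORY '/tmp/export'
   ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
   SELECT * FROM sample;

Second, moving data out of and back into Cassandra with the cqlsh COPY utility command:

   COPY keyspace1.users TO '/tmp/users.csv';
   COPY keyspace1.users FROM '/tmp/users.csv';

Third, loading a delimited file from HDFS into an existing HBase table with Pig's HBaseStorage; by convention the first field of the relation becomes the row key, and the listed column ('cf1:name' here) receives the second field:

   data = LOAD '/user/input/data.txt' USING PigStorage(',') AS (id:int, name:chararray);
   STORE data INTO 'hbase://table1' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf1:name');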
Result: