Hadoop Setup Guide for Developers

1. The document provides instructions for installing Hadoop and related components like Hive on a single-node Ubuntu virtual machine. It includes prerequisites, steps to set up Hadoop and Hive, and the related configuration files.
2. The setup involves installing Java, SSH, Eclipse, MySQL and other prerequisites before installing Hadoop. Hadoop is installed in a directory, environment variables are configured, and HDFS, YARN and related services are configured and started.
3. Hive is then installed in another directory. The hive-site.xml configuration file is edited to configure Hive for local, Derby and MySQL modes by changing the metastore database connection properties.

===================================================================================

===================================================================================
===============================
Hadoop All In One Installations
===================================================================================
===================================================================================
===============================

Pre-requisites:
Software:
OOP concepts, preferably with Core Java (knowledge of Java is strongly recommended)
Linux usage and File System operations
Shell Scripting

Hardware:
Laptop or Desktop with at least:
HDD - 20+ GB free space
RAM - 4+ GB
CPU - dual-core or better
OS - Windows/Linux/Mac OS

1. Using VMware Workstation 12.0


2. Download ubuntu-16.04.2-desktop-amd64.iso from http://releases.ubuntu.com/
3. Install Ubuntu on VMware

===================================================================================
===================================================================================
==============================
HADOOP PRE-REQUISITES INSTALLATIONS:
===================================================================================
===================================================================================
==============================

1. sudo apt-get update

2. java:
-------
sudo apt-get install openjdk-8-jdk

3. ssh:
-------
sudo apt-get install ssh

4. eclipse:
----------
sudo apt-get install eclipse

5. mysql:
---------
sudo apt-get install mysql-server mysql-client

===================================================================================
===================================================================================
===============================
Hadoop Single Node setup
===================================================================================
===================================================================================
===============================

1. create a new 'hadoop' user in ubuntu.
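For example, the user can be created from a terminal (a minimal sketch; adjust the group/sudo setup to your environment):

sudo adduser hadoop
sudo usermod -aG sudo hadoop
su - hadoop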

2. create '/home/hadoop/work' and '/home/hadoop/work/hadoopdata' folders

mkdir /home/hadoop/work
mkdir /home/hadoop/work/hadoopdata

3. Download the 'hadoop-2.6.0.tar.gz' file from the Apache mirrors at
https://archive.apache.org/dist/hadoop/core/hadoop-2.6.0/, copy 'hadoop-2.6.0.tar.gz' into the '/home/hadoop/work' directory and extract the tar file in the same directory.

tar -xvzf hadoop-2.6.0.tar.gz

4. Open the '~/.bashrc' file, add the following lines at the end and save:
command: gedit ~/.bashrc

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
export HADOOP_HOME=/home/hadoop/work/hadoop-2.6.0
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

4.1 source ~/.bashrc

5. Enter the below commands on terminal:


ssh localhost
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
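If 'ssh localhost' still prompts for a password, a common fix (assuming the default ~/.ssh layout) is to tighten the permissions and retry:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
ssh localhost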

6. Update the 'hadoop-env.sh', 'core-site.xml', 'hdfs-site.xml', 'mapred-env.sh', 'mapred-site.xml', 'yarn-env.sh', 'yarn-site.xml' and 'slaves' files in the '/home/hadoop/work/hadoop-2.6.0/etc/hadoop' folder as per the below configurations

hadoop-env.sh
=============
# The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64

core-site.xml
=============
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:8020</value>
</property>
</configuration>

hdfs-site.xml
=============
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/work/hadoopdata/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/work/hadoopdata/dfs/data</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>file:/home/hadoop/work/hadoopdata/dfs/namesecondary</value>
</property>
</configuration>

mapred-env.sh
=============
# export JAVA_HOME=/home/y/libexec/jdk1.8.0/
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64

mapred-site.xml
===============
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

yarn-env.sh
===========
# export JAVA_HOME=/home/y/libexec/jdk1.8.0/
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64

yarn-site.xml
=============
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>localhost</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>

slaves
======
localhost
7. Format the 'namenode' from the current machine using this command:
hdfs namenode -format    (or the older form: hadoop namenode -format)

8. Start hadoop using these commands on the current machine:

start-dfs.sh
start-yarn.sh
(or start-all.sh, which is deprecated)

9. Stop hadoop using these commands on the current machine:

stop-dfs.sh
stop-yarn.sh
(or stop-all.sh, which is deprecated)

10. Verify the running daemons using the 'jps' command.
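With HDFS and YARN started as above, 'jps' should list roughly the following daemons (process ids will differ):

NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
Jps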

===================================================================================
===================================================================================
==============================
Hive 1.2.1 Installation
===================================================================================
===================================================================================
==============================

1. Download the 'apache-hive-1.2.1-bin.tar.gz' file from the Apache mirrors at
https://archive.apache.org/dist/hive/hive-1.2.1/, copy it into the '/home/hadoop/work' directory and extract the tar file in the same directory.
Using terminal: tar -xvzf apache-hive-1.2.1-bin.tar.gz

2. Open the '~/.bashrc' file using the command 'gedit ~/.bashrc' in the terminal and add the below lines at the end of the file:
export HIVE_HOME=/home/hadoop/work/apache-hive-1.2.1-bin
export PATH=$HIVE_HOME/bin:$PATH
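Then reload the environment and confirm that the hive launcher resolves (a quick sanity check):

source ~/.bashrc
hive --version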

3. Copy the mysql-connector-java-5.1.38 JAR to /home/hadoop/work/apache-hive-1.2.1-bin/lib

Copy /home/hadoop/work/db-derby-10.11.1.1-bin/lib/derbyclient.jar to
/home/hadoop/work/apache-hive-1.2.1-bin/lib

4. The following configuration files are required for Hive to run in different modes:
1. hive-site.xml (main configuration file)
2. hive-site.xml_local (for local mode, copy this script into the main configuration file)
3. hive-site.xml_derby (for Derby mode, copy this script into the main configuration file)
4. hive-site.xml_mysql (for MySQL mode, copy this script into the main configuration file)

5. Open hive-site.xml from /home/hadoop/work/apache-hive-1.2.1-bin/conf and edit the configuration with the below properties.

Local Mode:
=========
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/home/hadoop/work/warehouse</value>
<description> Local or HDFS directory where Hive keeps table
contents.</description>
</property>
<property>
<name>hive.metastore.local</name>
<value>true</value>
<description> Use false if a production metastore server is
used.</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby:;databaseName=/home/hadoop/work/warehouse/metastore_db;create=true</value>
<description> The JDBC connection URL.</description>
</property>

6. Open hive-site.xml from /home/hadoop/work/apache-hive-1.2.1-bin/conf, copy the following script from hive-site.xml_derby into the main configuration file (hive-site.xml) and edit the configuration with the below properties.

Derby Mode:
===========
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/home/hadoop/work/warehouse</value>
<description>Local or HDFS directory where Hive keeps table
contents</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby://localhost:1527/myderby;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.apache.derby.jdbc.ClientDriver</value>
<description>Driver class name for a JDBC metastore</description>
</property>

7. Open hive-site.xml from /home/hadoop/work/apache-hive-1.2.1-bin/conf, copy the following script from hive-site.xml_mysql into the main configuration file (hive-site.xml) and edit the configuration with the below properties.

MySQL Mode:
===========
<configuration>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/home/hadoop/work/warehouse</value>
<description>Local or HDFS directory where Hive keeps table
contents</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive_db?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hadoop</value>
</property>
</configuration>
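Before starting Hive in MySQL mode, the metastore database and grants can be prepared; a minimal sketch assuming the root/hadoop credentials configured above (createDatabaseIfNotExist=true will also create hive_db on first use):

mysql -u root -p
mysql> CREATE DATABASE IF NOT EXISTS hive_db;
mysql> GRANT ALL PRIVILEGES ON hive_db.* TO 'root'@'localhost' IDENTIFIED BY 'hadoop';
mysql> FLUSH PRIVILEGES;
mysql> EXIT;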

8. Type 'startNetworkServer -h 0.0.0.0 &' to start the Derby network server.

If you get a socket permission error, then
type cd $JAVA_HOME
then type cd jre/lib/security
then type sudo gedit java.policy
This opens the policy file in an editor. Find the below line in the file:

// allows anyone to listen on dynamic ports
permission java.net.SocketPermission "localhost:0", "listen";

Copy the above line, paste it underneath and modify it as below, so that the two lines read:

// allows anyone to listen on dynamic ports
permission java.net.SocketPermission "localhost:0", "listen";
permission java.net.SocketPermission "localhost:1527", "listen,resolve";

9. To check whether MySQL is present on the system, type 'mysql -u root -p' in the terminal.
To install MySQL:
sudo apt-get install mysql-server
To start MySQL manually:
sudo service mysql start

ERROR:
======
hive> show databases;
FAILED: Error in metadata: java.lang.RuntimeException: Unable to instantiate
org.apache.hadoop.hive.metastore.HiveMetaStoreClient
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

If this error is encountered, start hadoop first and then start hive.

Bug Fix (Hive):
===============
Try again after removing the jline-0.9.94.jar file under the path
$HADOOP_HOME/share/hadoop/yarn/lib/jline-0.9.94.jar
===================================================================================
===================================================================================
==============================
DB Derby 10.11.1.1 Installation
===================================================================================
===================================================================================
==============================

1. Download the 'db-derby-10.11.1.1-bin.tar.gz' file from the Apache mirror at
https://archive.apache.org/dist/db/derby/db-derby-10.11.1.1/, copy it into the '/home/hadoop/work' directory and extract the tar file in the same directory.
Using terminal: tar -xvzf db-derby-10.11.1.1-bin.tar.gz

2. Open the '~/.bashrc' file using the command 'gedit ~/.bashrc' in the terminal and add the below lines at the end of the file:
export DERBY_HOME=/home/hadoop/work/db-derby-10.11.1.1-bin
export PATH=$DERBY_HOME/bin:$PATH

3. Copy the mysql-connector-java-5.1.38 JAR to /home/hadoop/work/db-derby-10.11.1.1-bin/lib

4. Verify with: echo $DERBY_HOME

===================================================================================
===================================================================================
==============================
PIG 0.15.0 Installation
===================================================================================
===================================================================================
==============================

TWO MODES:
==========
1.LOCAL MODE
2.MAPREDUCE MODE

INSTALLATION:
=============
1. Download the 'pig-0.15.0.tar.gz' file from https://archive.apache.org/dist/pig/pig-0.15.0/, copy 'pig-0.15.0.tar.gz' into the '/home/hadoop/work' directory and extract the tar file in the same directory.
Using Terminal: tar -xvzf pig-0.15.0.tar.gz

2. Check & Set PIG_HOME in .bashrc file.

export PIG_HOME=/home/hadoop/work/pig-0.15.0
export PATH=$PIG_HOME/bin:$PATH

LOCAL MODE:
============
1.In local mode, it is not mandatory to start hadoop.
2.To start pig in local mode ---> pig -x local

MAPREDUCE MODE:
===============
3.In mapreduce mode, it is mandatory to start hadoop.
4.To start pig in mapreduce mode ---> pig -x mapreduce
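A quick local-mode smoke test (a sketch that uses /etc/passwd, which exists on any Ubuntu machine):

pig -x local
grunt> users = LOAD '/etc/passwd' USING PigStorage(':') AS (name:chararray);
grunt> names = FOREACH users GENERATE name;
grunt> DUMP names;
grunt> quit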

===================================================================================
===================================================================================
===============================
Sqoop 1.4.3 Installation
===================================================================================
===================================================================================
===============================

1. Download the 'sqoop-1.4.3.bin__hadoop-2.0.0-alpha.tar.gz' file from
https://archive.apache.org/dist/sqoop/1.4.3/, copy it into the '/home/hadoop/work' directory and extract the tar file in the same directory.
Using Terminal: tar -xvzf sqoop-1.4.3.bin__hadoop-2.0.0-alpha.tar.gz

2. Check & Set SQOOP_HOME in .bashrc file.


export SQOOP_HOME=/home/hadoop/work/sqoop-1.4.3.bin__hadoop-2.0.0-alpha
export PATH=$SQOOP_HOME/bin:$PATH

3. Download mysql-connector-java-5.1.18-bin.jar and mysql-connector-java-5.1.38 and paste them in the '$SQOOP_HOME/lib' folder.
4. Start hadoop

5. To verify Sqoop --> type echo $SQOOP_HOME in the terminal.
(/home/hadoop/work/sqoop-1.4.3.bin__hadoop-2.0.0-alpha)

6. To verify sqoop version --> type sqoop version in the terminal.
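With the MySQL connector JAR in place, connectivity can be smoke-tested against the local MySQL server; a sketch assuming the root/hadoop credentials used in the Hive section:

sqoop list-databases --connect jdbc:mysql://localhost:3306/ --username root --password hadoop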

===================================================================================
===================================================================================
===============================
Flume 1.6.0 Installation
===================================================================================
===================================================================================
===============================

1.Download 'apache-flume-1.6.0-bin.tar.gz' from
https://www.apache.org/dist/flume/1.6.0/ and paste it in the '/home/hadoop/work' folder and extract.
Using Terminal: tar -xvzf apache-flume-1.6.0-bin.tar.gz

2.Check & Set FLUME_HOME in .bashrc file.


export FLUME_HOME=/home/hadoop/work/apache-flume-1.6.0-bin
export PATH=$FLUME_HOME/bin:$PATH

3.Type echo $FLUME_HOME in the terminal to verify.

4.Download flume-sources-1.0-SNAPSHOT.jar and paste it in /home/hadoop/work/apache-flume-1.6.0-bin/lib

5.Copy any other agent .conf files and paste them in /home/hadoop/work/apache-flume-1.6.0-bin/conf
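If no .conf files are available yet, a minimal netcat-to-logger agent (hypothetical file name netcat-example.conf, saved in the conf folder above) can be used to verify the install:

a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.channels.c1.type = memory
a1.sources.r1.channels = c1
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1

flume-ng agent --conf $FLUME_HOME/conf --conf-file $FLUME_HOME/conf/netcat-example.conf --name a1 -Dflume.root.logger=INFO,console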

===================================================================================
===================================================================================
===============================
Zookeeper 3.4.6 Installation
===================================================================================
===================================================================================
===============================

1. Download the 'zookeeper-3.4.6.tar.gz' file from
https://archive.apache.org/dist/zookeeper/zookeeper-3.4.6/, copy it into the '/home/hadoop/work' directory and extract the tar file in the same directory.
Using terminal: tar -xvzf zookeeper-3.4.6.tar.gz

2. Open the '~/.bashrc' file using the command 'gedit ~/.bashrc' in the terminal and add the below lines at the end of the file:
export ZOOKEEPER_HOME=/home/hadoop/work/zookeeper-3.4.6
export PATH=$ZOOKEEPER_HOME/bin:$PATH

3. Copy /home/hadoop/work/zookeeper-3.4.6/conf/zoo_sample.cfg to a newly created zoo.cfg in the same location, with the following settings:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/hadoop/work/zoo_data
clientPort=2181
server.1=localhost:2888:3888

4. In /home/hadoop/work create a new folder zoo_data and create a file inside it named myid. Enter the value 1 inside myid.

5. zookeeper start command: zkServer.sh start


zookeeper stop command: zkServer.sh stop
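To verify the server after starting it (standard ZooKeeper client commands):

zkServer.sh status
zkCli.sh -server localhost:2181
ls /
quit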

===================================================================================
===================================================================================
===============================
Hbase 1.1.2 Installation
===================================================================================
===================================================================================
===============================

1. Download hbase-1.1.2-bin.tar.gz from
https://archive.apache.org/dist/hbase/1.1.2/ and paste it in the '/home/hadoop/work' folder and extract the file.

2. update '~/.bashrc' file with below changes


export HBASE_HOME=/home/hadoop/work/hbase-1.1.2
export PATH=$HBASE_HOME/bin:$PATH

3. re-open the terminal and enter: echo $HBASE_HOME

4. update 'hbase-site.xml' in the '$HBASE_HOME/conf' folder with the required properties (see the sketch below)
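A minimal sketch of 'hbase-site.xml' for this pseudo-distributed setup, assuming the HDFS address and ZooKeeper data directory configured in the earlier sections:

<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:8020/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>localhost</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/work/zoo_data</value>
</property>
</configuration>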

5. hbase start command: start-hbase.sh


hbase stop command: stop-hbase.sh
hbase verify command: hbase shell
===================================================================================
===================================================================================
===============================
Phoenix 4.6.0 Installation
===================================================================================
===================================================================================
===============================
1. Download phoenix-4.6.0-HBase-1.1-bin from
https://archive.apache.org/dist/phoenix/phoenix-4.6.0-HBase-1.1/bin/ and paste it in the '/home/hadoop/work' folder and extract the file.

2. update '~/.bashrc' file with below changes


export PHOENIX_HOME=/home/hadoop/work/phoenix-4.6.0-HBase-1.1-bin
export PATH=$PHOENIX_HOME/bin:$PATH

cp $PHOENIX_HOME/phoenix-4.6.0-HBase-1.1-server.jar $HBASE_HOME/lib
cp $PHOENIX_HOME/phoenix-4.6.0-HBase-1.1-client.jar $HBASE_HOME/lib

Note: restart the hbase after copying the jars

psql.py localhost $PHOENIX_HOME/examples/WEB_STAT.sql $PHOENIX_HOME/examples/WEB_STAT.csv $PHOENIX_HOME/examples/WEB_STAT_QUERIES.sql

Now you can connect to HBase using sqlline:


===========================================
sqlline.py localhost
!tables
select count(*) from web_stat;
select host, sum(active_visitor) from web_stat group by host;

CREATE VIEW "test" ( pk VARCHAR PRIMARY KEY, "cf"."a" VARCHAR, "cf"."b" VARCHAR,
"cf"."c" VARCHAR );
select * from "test";

UPSERT INTO "test" VALUES ('row4','mya','myb','myc');

UPSERT INTO "test" (pk, "cf"."a","cf"."b","cf"."c") VALUES


('row5','mya','myb','myc');

CREATE VIEW "cars" ( pk VARCHAR PRIMARY KEY, "vi"."make" VARCHAR, "vi"."model"


VARCHAR,"vi"."year" VARCHAR );
select * from "cars";

CREATE TABLE IF NOT EXISTS STUDENT (NAME VARCHAR NOT NULL PRIMARY KEY, ID INTEGER,
YEAR INTEGER);
UPSERT INTO STUDENT VALUES ('ARUN',1,2012);
UPSERT INTO STUDENT VALUES ('RAJ',2,2014);
UPSERT INTO STUDENT VALUES ('VENKAT',3,2013);
UPSERT INTO STUDENT VALUES ('ANIL',4,2012);
SELECT * FROM STUDENT;
===================================================================================
===================================================================================
===============================
Squirrel 3.6 Installation
===================================================================================
===================================================================================
===============================
1. Download squirrel-sql-3.6-standard.jar from
https://sourceforge.net/projects/squirrel-sql/files/1-stable/3.6.0/ and paste it in the '/home/hadoop/work' folder.

Using SQuirreL with Phoenix:
===========================
java -jar squirrel-sql-3.6-standard.jar
cp $PHOENIX_HOME/phoenix-4.6.0-HBase-1.1-client.jar ~/squirrel-sql-3.6/lib
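In SQuirreL, register a new driver that points at the copied Phoenix client jar; the usual Phoenix settings are:

Driver class: org.apache.phoenix.jdbc.PhoenixDriver
Example URL: jdbc:phoenix:localhost:2181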

===================================================================================
===================================================================================
===============================
Kafka 0.9.0.0 Installation
===================================================================================
===================================================================================
===============================

1. Download kafka_2.11-0.9.0.0.tgz from
https://archive.apache.org/dist/kafka/0.9.0.0/ and paste it in the '/home/hadoop/work' folder and extract the file.

2. update '~/.bashrc' file with below changes


export KAFKA_HOME=/home/hadoop/work/kafka_2.11-0.9.0.0
export PATH=$KAFKA_HOME/bin:$PATH

3. re-open the terminal and enter: echo $KAFKA_HOME
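A quick end-to-end check using the standard Kafka 0.9 scripts in $KAFKA_HOME/bin (this assumes ZooKeeper from the earlier section is running on localhost:2181):

kafka-server-start.sh $KAFKA_HOME/config/server.properties
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
kafka-console-producer.sh --broker-list localhost:9092 --topic test
kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning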

===================================================================================
===================================================================================
==============================
Spark 1.6.3 Installation
===================================================================================
===================================================================================
==============================

1. Install Java, Python, R


sudo apt-get install openjdk-8-jdk
sudo apt-get install python
sudo apt-get install r-base

2.Download spark-1.6.3-bin-hadoop2.6.tar.gz from
https://spark.apache.org/downloads.html and paste it in the '/home/hadoop/work' folder and extract the file.

3.Check & Set SPARK_HOME in .bashrc file.


export SPARK_HOME=/home/hadoop/work/spark-1.6.3-bin-hadoop2.6
export PATH=$SPARK_HOME/bin:$PATH
4.Type echo $SPARK_HOME in the terminal to verify.

5.Type spark-shell

-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
-------------------------------
Spark Command Line Examples
-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
-------------------------------
1. "scala" is the command to start the "scala"
"python" is the command to start the "python"
"R" is the command to start the "R"

Verify the all are installed or not

2. spark-shell, pyspark, sparkR and spark-sql are the main commands to work with Spark.

execute the "spark-shell" command, it will move to the "scala" prompt
execute the "pyspark" command, it will move to the "python" prompt
execute the "sparkR" command, it will move to the "R" prompt

===================================================================================
===================================================================================
==============================
Scala 2.11.7 Installation
===================================================================================
===================================================================================
==============================

1. Download scala-2.11.7.tar.gz from
https://www.scala-lang.org/download/2.11.7.html and paste it in the /home/hadoop/work folder and extract.
Using Terminal: tar -xvzf scala-2.11.7.tar.gz

2. Check & Set SCALA_HOME in .bashrc file.


export SCALA_HOME=/home/hadoop/work/scala-2.11.7
export PATH=$SCALA_HOME/bin:$PATH

3. Type echo $SCALA_HOME in the terminal to verify.

4. verify using "scala -version"

===================================================================================
===================================================================================
==============================
Maven 3.3.9 Installation
===================================================================================
===================================================================================
==============================

1.Download apache-maven-3.3.9-bin.tar.gz from https://maven.apache.org/download.cgi and paste it in the /home/hadoop/work folder and extract.
Using Terminal: tar -xvzf apache-maven-3.3.9-bin.tar.gz

2.Check & Set M2_HOME in .bashrc file.


export M2_HOME=/home/hadoop/work/apache-maven-3.3.9
export PATH=$M2_HOME/bin:$PATH

3.Type echo $M2_HOME in the terminal to verify.

===================================================================================
===================================================================================
==============================
SBT 0.13.9 Installation
===================================================================================
===================================================================================
==============================

1.Download sbt-0.13.9.tgz from http://www.scala-sbt.org/download.html and paste it in the /home/hadoop/work folder and extract the file.
Using Terminal: tar -xvzf sbt-0.13.9.tgz

2.Check & Set SBT_HOME in .bashrc file.


export SBT_HOME=/home/hadoop/work/sbt
export PATH=$SBT_HOME/bin:$PATH

3.Type echo $SBT_HOME in the terminal to verify.

4.verify using "sbt help"

===================================================================================
===================================================================================
==============================
Cassandra Installation
===================================================================================
===================================================================================
==============================
1. Download apache-cassandra-2.1.5-bin.tar.gz from
https://archive.apache.org/dist/cassandra/2.1.5/ and paste it in /home/hadoop/work
folder and extract the file.

2. update the "~/.bashrc" file with below content.


export CASSANDRA_HOME=/home/hadoop/work/apache-cassandra-2.1.5-bin
export PATH=$CASSANDRA_HOME/bin:$PATH

4. Re-open the new terminal

5. start cassndra server using "cassandra" command

6. start cassandra cli using "cassandra-cli" command

7. start cassandra cql using "cqlsh" command

8. stop cassandra using "pkill -f CassandraDaemon" command

===================================================================================
===================================================================================
==============================
Mongo DB Installation
===================================================================================
===================================================================================
==============================

1. Download mongodb-linux-x86_64-rhel57-2.4.12.tgz from
https://www.mongodb.org/dl/linux/x86_64-rhel and paste it in the /home/hadoop/work folder and extract the file.

2. export the environment variables


export MONGODB_HOME=/home/hadoop/work/mongodb-linux-x86_64-rhel57-2.4.12
export PATH=$MONGODB_HOME/bin:$PATH

3. Make directory: mkdir /home/hadoop/work/mongodb_data

4. To start the mongodb server:
mongod --dbpath=/home/hadoop/work/mongodb_data

5. To connect to the mongodb server:
mongo
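A short shell session to confirm the server is reachable (hypothetical database and collection names):

> use testdb
> db.students.insert({name: "arun", year: 2012})
> db.students.find()
> exit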

===================================================================================
===================================================================================
==============================
Nifi Installation
===================================================================================
===================================================================================
==============================

1.Download nifi-1.2.0.tar.gz from the apache mirrors at
https://archive.apache.org/dist/nifi/1.2.0/ and paste it in the /home/hadoop/work folder and extract the file.
Using Terminal: tar -xvzf nifi-1.2.0.tar.gz

2.Check & Set NIFI_HOME in .bashrc file.


export NIFI_HOME=/home/hadoop/work/nifi-1.2.0
export PATH=$NIFI_HOME/bin:$PATH

3.Type echo $NIFI_HOME in the terminal to verify.

4.Type nifi.sh start

5.Type nifi.sh stop

6.Type nifi.sh status

===================================================================================
===================================================================================
==============================
Ganglia Installation
===================================================================================
===================================================================================
==============================
sudo apt-get update && sudo apt-get -y upgrade
sudo apt-get install apache2 libapache2-mod-php5 php5-gd rrdtool

Server Installation
========================
sudo apt-get install -y ganglia-monitor rrdtool gmetad ganglia-webfrontend
sudo cp /etc/ganglia-webfrontend/apache.conf /etc/apache2/sites-enabled/ganglia.conf

sudo gedit /etc/ganglia/gmetad.conf


------------------------------------------
data_source "my cluster" 60 localhost
data_source "my cluster" 60 192.168.1.1:8649

sudo gedit /etc/ganglia/gmond.conf


------------------------------------------
cluster {
name = "my cluster" ## use the name from gmetad.conf
owner = "unspecified"
latlong = "unspecified"
url = "unspecified"
}

udp_send_channel {
#mcast_join = 239.2.11.71 ## comment out
host = localhost
#host = 192.168.1.1
port = 8649
ttl = 1
}

udp_recv_channel {
#mcast_join = 239.2.11.71 ## comment out
port = 8649
#bind = 239.2.11.71 ## comment out
}

restart the services


------------------------------------------
sudo service ganglia-monitor restart && sudo service gmetad restart && sudo service apache2 restart

open the below url


------------------------------------------
http://localhost/ganglia/

Client Installation
========================
sudo apt-get install -y ganglia-monitor

sudo gedit /etc/ganglia/gmond.conf


------------------------------------------
cluster {
name = "my cluster" ## use the name from gmetad.conf
owner = "unspecified"
latlong = "unspecified"
url = "unspecified"
}

udp_send_channel {
#mcast_join = 239.2.11.71 ## Comment
host = 192.168.1.1 ## IP address of master node
port = 8649
ttl = 1
}

/*
udp_recv_channel {
mcast_join = 239.2.11.71
port = 8649
bind = 239.2.11.71
}
*/

restart the services


------------------------------------------
sudo service ganglia-monitor restart

===================================================================================
===================================================================================
==============================
Oozie 4.2.0 Installation
===================================================================================
===================================================================================
==============================

1. Download "oozie-4.2.0.tar.gz" file & extract the file into "/home/hadoop/work"


folder

2. rename "oozie-4.2.0" to "oozie-4.2.0_build"

3. update '~/.bashrc' file with below changes

export OOZIE_BUILD=/home/hadoop/work/oozie-4.2.0_build
export PATH=$OOZIE_BUILD/bin:$PATH

4. re-open the terminal

echo $OOZIE_BUILD

5. execute the below commands to build oozie with hadoop version

$OOZIE_BUILD/bin/mkdistro.sh -DskipTests

$OOZIE_BUILD/bin/mkdistro.sh -P hadoop-2 -DskipTests

6. copy the file "$OOZIE_BUILD/distro/target/oozie-4.2.0-distro.tar.gz" to the "work" folder and then extract it.

7. update '~/.bashrc' file with below changes

export OOZIE_HOME=/home/hadoop/work/oozie-4.2.0
export PATH=$OOZIE_HOME/bin:$PATH

8. re-open the terminal

echo $OOZIE_HOME

9. create "$OOZIE_HOME/libext" folder


10. copy "ext-2.2.zip" to "$OOZIE_HOME/libext" folder

11. copy "hadooplib-2.6.0.oozie-4.2.0/*.jar" files to "$OOZIE_HOME/libext"

12. $OOZIE_HOME/bin/oozie-setup.sh prepare-war -d $OOZIE_HOME/libext

13. $OOZIE_HOME/bin/ooziedb.sh create -sqlfile oozie.sql -run

14. update "$HADOOP_HOME/etc/hadoop/core-site.xml" file with below properties

<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>

15. restart the "hadoop" using "stop-all.sh & start-all.sh"

16. update "$OOZIE_HOME/conf/oozie-site.xml" file with below property

<property>
<name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
<value>*=/home/hadoop/work/hadoop-2.6.0/etc/hadoop</value>
</property>
<property>
<name>oozie.processing.timezone</name>
<value>GMT+0530</value>
</property>
<property>
<name>oozie.service.coord.check.maximum.frequency</name>
<value>false</value>
</property>

17. Start the Oozie Server using one of the below commands

$OOZIE_HOME/bin/oozie-start.sh
$OOZIE_HOME/bin/oozie-run.sh

$OOZIE_HOME/bin/oozied.sh start
$OOZIE_HOME/bin/oozied.sh run

18. Verify the "oozie" status with below commands

$OOZIE_HOME/bin/oozie admin -oozie http://localhost:11000/oozie -status

19. extract "$OOZIE_HOME/oozie-examples.tar.gz" & "$OOZIE_HOME/oozie-sharelib-


4.2.0.tar.gz" files in "$OOZIE_HOME" folder

20. execute below commands to work with "oozie" examples

find $OOZIE_HOME/examples/ -name "job.properties" -exec sed -i "s/localhost:8020/localhost:8020/g" '{}' \;
find $OOZIE_HOME/examples/ -name "job.properties" -exec sed -i "s/localhost:8021/localhost:8032/g" '{}' \;

21. copy data from "local" to "hdfs" using below commands

hadoop fs -put $OOZIE_HOME/examples examples
hadoop fs -put $OOZIE_HOME/share share

22. verify "share lib" in "oozie" using below commands

oozie admin -shareliblist -oozie http://localhost:11000/oozie


oozie admin -sharelibupdate -oozie http://localhost:11000/oozie

23. run "oozie" command

$OOZIE_HOME/bin/oozie job -oozie http://localhost:11000/oozie -config


$OOZIE_HOME/examples/apps/map-reduce/job.properties -run

$OOZIE_HOME/bin/oozie job -oozie http://localhost:11000/oozie -config


$OOZIE_HOME/examples/apps/hive/job.properties -run

$OOZIE_HOME/bin/oozie job -oozie http://localhost:11000/oozie -config


$OOZIE_HOME/examples/apps/pig/job.properties -run

24. kill "oozie" command

$OOZIE_HOME/bin/oozie job -oozie http://localhost:11000/oozie -kill 0000017-


160315201745286-oozie-hado-C

25. new examples in oozie

hadoop fs -rmr kalyan-examples
hadoop fs -put $OOZIE_HOME/kalyan-examples kalyan-examples

$OOZIE_HOME/bin/oozie job -oozie http://localhost:11000/oozie -config $OOZIE_HOME/kalyan-examples/apps/wordcount-wf/job.properties -run
$OOZIE_HOME/bin/oozie job -oozie http://localhost:11000/oozie -config $OOZIE_HOME/kalyan-examples/apps/wordcount-cron/job.properties -run
$OOZIE_HOME/bin/oozie job -oozie http://localhost:11000/oozie -config $OOZIE_HOME/kalyan-examples/apps/wordcount-event/job.properties -run

26. If jobs fail to run, the issue is usually YARN resource allocation.
Update "yarn-site.xml" with the below properties and restart "yarn":

<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>512</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>20480</value>
</property>
===================================================================================
===================================================================================
=================================
Cloudera
===================================================================================
===================================================================================
===============================

===================================================================================
===================================================================================
=================================
Hortonworks
===================================================================================
===================================================================================
===============================

===================================================================================
===================================================================================
=================================
MapR
===================================================================================
===================================================================================
===============================

===================================================================================
===================================================================================
=================================
Twitter
===================================================================================
===================================================================================
===============================
