Hadoop Setup Guide for Developers

1. The document provides instructions for installing Hadoop and related components like Hive on a single-node Ubuntu virtual machine. It includes prerequisites, steps to set up Hadoop and Hive, and the related configuration files.
2. The setup involves installing Java, SSH, Eclipse, MySQL and other prerequisites before installing Hadoop. Hadoop is installed in a directory, environment variables are configured, and HDFS, YARN and related services are configured and started.
3. Hive is then installed in another directory. The hive-site.xml configuration file is edited to configure Hive for local, Derby and MySQL modes by changing the metastore database connection properties.

===================================================================================

===================================================================================
===============================
Hadoop All In One Installations
===================================================================================
===================================================================================
===============================

Pre-requisites:
Software:
OOP concepts, preferably with Core Java (knowledge of Java is strongly recommended)
Linux usage and File System operations
Shell Scripting

Hardware:
Laptop or Desktop with at least:
HDD - 20+ GB free space
RAM - 4+ GB
CPU - dual-core or better
OS - Windows/Linux/Mac OS

1. Using VMware Workstation 12.0


2. Download ubuntu-16.04.2-desktop-amd64.iso from http://releases.ubuntu.com/
3. Install Ubuntu on VMware

===================================================================================
===================================================================================
==============================
HADOOP PRE-REQUISITES INSTALLATIONS:
===================================================================================
===================================================================================
==============================

1. sudo apt-get update

2. java:
-------
sudo apt-get install openjdk-8-jdk

3. ssh:
-------
sudo apt-get install ssh

4. eclipse:
----------
sudo apt-get install eclipse

5. mysql:
---------
sudo apt-get install mysql-server mysql-client

===================================================================================
===================================================================================
===============================
Hadoop Single Node setup
===================================================================================
===================================================================================
===============================

1. create a new 'hadoop' user in ubuntu.
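For example, the user can be created from a terminal (a minimal sketch; adjust the group/sudo setup to your environment):

sudo adduser hadoop
sudo usermod -aG sudo hadoop
su - hadoop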

2. create '/home/hadoop/work' and '/home/hadoop/work/hadoopdata' folders

mkdir /home/hadoop/work
mkdir /home/hadoop/work/hadoopdata

3. Download the 'hadoop-2.6.0.tar.gz' file from the Apache mirrors at
https://archive.apache.org/dist/hadoop/core/hadoop-2.6.0/, copy 'hadoop-2.6.0.tar.gz' into the '/home/hadoop/work' directory and extract the tar file in the same directory.

tar -xvzf hadoop-2.6.0.tar.gz

4. Open the '~/.bashrc' file, add the following lines at the end and save:
command: gedit ~/.bashrc

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
export HADOOP_HOME=/home/hadoop/work/hadoop-2.6.0
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

4.1 source ~/.bashrc

5. Enter the below commands on terminal:


ssh localhost
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
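If 'ssh localhost' still prompts for a password, a common fix (assuming the default ~/.ssh layout) is to tighten the permissions and retry:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
ssh localhost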

6. Update the 'hadoop-env.sh', 'core-site.xml', 'hdfs-site.xml', 'mapred-env.sh', 'mapred-site.xml', 'yarn-env.sh', 'yarn-site.xml' and 'slaves' files in the '/home/hadoop/work/hadoop-2.6.0/etc/hadoop' folder as per the below configurations

hadoop-env.sh
=============
# The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64

core-site.xml
=============
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:8020</value>
</property>
</configuration>

hdfs-site.xml
=============
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/work/hadoopdata/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/work/hadoopdata/dfs/data</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>file:/home/hadoop/work/hadoopdata/dfs/namesecondary</value>
</property>
</configuration>

mapred-env.sh
=============
# export JAVA_HOME=/home/y/libexec/jdk1.8.0/
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64

mapred-site.xml
===============
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

yarn-env.sh
===========
# export JAVA_HOME=/home/y/libexec/jdk1.8.0/
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64

yarn-site.xml
=============
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>localhost</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>

slaves
======
localhost
7. Format the 'namenode' from the current machine using this command:
hdfs namenode -format    (or the older form: hadoop namenode -format)

8. Start hadoop using these commands on the current machine:

start-dfs.sh
start-yarn.sh
(or start-all.sh, which is deprecated)

9. Stop hadoop using these commands on the current machine:

stop-dfs.sh
stop-yarn.sh
(or stop-all.sh, which is deprecated)

10. Verify the running daemons using the 'jps' command.
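With HDFS and YARN started as above, 'jps' should list roughly the following daemons (process ids will differ):

NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
Jps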

===================================================================================
===================================================================================
==============================
Hive 1.2.1 Installation
===================================================================================
===================================================================================
==============================

1. Download the 'apache-hive-1.2.1-bin.tar.gz' file from the Apache mirrors at
https://archive.apache.org/dist/hive/hive-1.2.1/, copy it into the '/home/hadoop/work' directory and extract the tar file in the same directory.
Using terminal: tar -xvzf apache-hive-1.2.1-bin.tar.gz

2. Open the '~/.bashrc' file using the command 'gedit ~/.bashrc' in the terminal and add the below lines at the end of the file:
export HIVE_HOME=/home/hadoop/work/apache-hive-1.2.1-bin
export PATH=$HIVE_HOME/bin:$PATH
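Then reload the environment and confirm that the hive launcher resolves (a quick sanity check):

source ~/.bashrc
hive --version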

3. Copy the mysql-connector-java-5.1.38 JAR to /home/hadoop/work/apache-hive-1.2.1-bin/lib

Copy /home/hadoop/work/db-derby-10.11.1.1-bin/lib/derbyclient.jar to
/home/hadoop/work/apache-hive-1.2.1-bin/lib

4. The following configuration files are required for Hive to run in different modes:
1. hive-site.xml (main configuration file)
2. hive-site.xml_local (for local mode, copy this script into the main configuration file)
3. hive-site.xml_derby (for Derby mode, copy this script into the main configuration file)
4. hive-site.xml_mysql (for MySQL mode, copy this script into the main configuration file)

5. Open hive-site.xml from /home/hadoop/work/apache-hive-1.2.1-bin/conf and edit the configuration with the below properties.

Local Mode:
=========
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/home/hadoop/work/warehouse</value>
<description> Local or HDFS directory where Hive keeps table
contents.</description>
</property>
<property>
<name>hive.metastore.local</name>
<value>true</value>
<description> Use false if a production metastore server is
used.</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby:;databaseName=/home/hadoop/work/warehouse/metastore_db;create=true</value>
<description> The JDBC connection URL.</description>
</property>

6. Open hive-site.xml from /home/hadoop/work/apache-hive-1.2.1-bin/conf, copy the following script from hive-site.xml_derby into the main configuration file (hive-site.xml) and edit the configuration with the below properties.

Derby Mode:
===========
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/home/hadoop/work/warehouse</value>
<description>Local or HDFS directory where Hive keeps table
contents</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby://localhost:1527/myderby;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.apache.derby.jdbc.ClientDriver</value>
<description>Driver class name for a JDBC metastore</description>
</property>

7. Open hive-site.xml from /home/hadoop/work/apache-hive-1.2.1-bin/conf, copy the following script from hive-site.xml_mysql into the main configuration file (hive-site.xml) and edit the configuration with the below properties.

MySQL Mode:
===========
<configuration>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/home/hadoop/work/warehouse</value>
<description>Local or HDFS directory where Hive keeps table
contents</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive_db?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hadoop</value>
</property>
</configuration>
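Before starting Hive in MySQL mode, the metastore database and grants can be prepared; a minimal sketch assuming the root/hadoop credentials configured above (createDatabaseIfNotExist=true will also create hive_db on first use):

mysql -u root -p
mysql> CREATE DATABASE IF NOT EXISTS hive_db;
mysql> GRANT ALL PRIVILEGES ON hive_db.* TO 'root'@'localhost' IDENTIFIED BY 'hadoop';
mysql> FLUSH PRIVILEGES;
mysql> EXIT;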

8. Type 'startNetworkServer -h 0.0.0.0 &' to start the Derby network server.

If you get a socket permission error, then
type cd $JAVA_HOME
then type cd jre/lib/security
then type sudo gedit java.policy
This opens the policy file in an editor. Find the below line in the file:

// allows anyone to listen on dynamic ports
permission java.net.SocketPermission "localhost:0", "listen";

Copy the above line, paste it underneath and modify it as below, so that the two lines read:

// allows anyone to listen on dynamic ports
permission java.net.SocketPermission "localhost:0", "listen";
permission java.net.SocketPermission "localhost:1527", "listen,resolve";

9. To check whether MySQL is present on the system, type 'mysql -u root -p' in the terminal.
To install MySQL:
sudo apt-get install mysql-server
To start MySQL manually:
sudo service mysql start

ERROR:
======
hive> show databases;
FAILED: Error in metadata: java.lang.RuntimeException: Unable to instantiate
org.apache.hadoop.hive.metastore.HiveMetaStoreClient
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

If this error is encountered, start hadoop first and then start hive.

Bug Fix (Hive):
===============
Try again after removing the jline-0.9.94.jar file under the path
$HADOOP_HOME/share/hadoop/yarn/lib/jline-0.9.94.jar
===================================================================================
===================================================================================
==============================
DB Derby 10.11.1.1 Installation
===================================================================================
===================================================================================
==============================

1. Download the 'db-derby-10.11.1.1-bin.tar.gz' file from the Apache mirror at
https://archive.apache.org/dist/db/derby/db-derby-10.11.1.1/, copy it into the '/home/hadoop/work' directory and extract the tar file in the same directory.
Using terminal: tar -xvzf db-derby-10.11.1.1-bin.tar.gz

2. Open the '~/.bashrc' file using the command 'gedit ~/.bashrc' in the terminal and add the below lines at the end of the file:
export DERBY_HOME=/home/hadoop/work/db-derby-10.11.1.1-bin
export PATH=$DERBY_HOME/bin:$PATH

3. Copy the mysql-connector-java-5.1.38 JAR to /home/hadoop/work/db-derby-10.11.1.1-bin/lib

4. Verify with: echo $DERBY_HOME

===================================================================================
===================================================================================
==============================
PIG 0.15.0 Installation
===================================================================================
===================================================================================
==============================

TWO MODES:
==========
1.LOCAL MODE
2.MAPREDUCE MODE

INSTALLATION:
=============
1. Download the 'pig-0.15.0.tar.gz' file from https://archive.apache.org/dist/pig/pig-0.15.0/, copy 'pig-0.15.0.tar.gz' into the '/home/hadoop/work' directory and extract the tar file in the same directory.
Using Terminal: tar -xvzf pig-0.15.0.tar.gz

2. Check & Set PIG_HOME in .bashrc file.

export PIG_HOME=/home/hadoop/work/pig-0.15.0
export PATH=$PIG_HOME/bin:$PATH

LOCAL MODE:
============
1.In local mode, it is not mandatory to start hadoop.
2.To start pig in local mode ---> pig -x local

MAPREDUCE MODE:
===============
3.In mapreduce mode, it is mandatory to start hadoop.
4.To start pig in mapreduce mode ---> pig -x mapreduce
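A quick local-mode smoke test (a sketch that uses /etc/passwd, which exists on any Ubuntu machine):

pig -x local
grunt> users = LOAD '/etc/passwd' USING PigStorage(':') AS (name:chararray);
grunt> names = FOREACH users GENERATE name;
grunt> DUMP names;
grunt> quit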

===================================================================================
===================================================================================
===============================
Sqoop 1.4.3 Installation
===================================================================================
===================================================================================
===============================

1. Download the 'sqoop-1.4.3.bin__hadoop-2.0.0-alpha.tar.gz' file from
https://archive.apache.org/dist/sqoop/1.4.3/, copy it into the '/home/hadoop/work' directory and extract the tar file in the same directory.
Using Terminal: tar -xvzf sqoop-1.4.3.bin__hadoop-2.0.0-alpha.tar.gz

2. Check & Set SQOOP_HOME in .bashrc file.


export SQOOP_HOME=/home/hadoop/work/sqoop-1.4.3.bin__hadoop-2.0.0-alpha
export PATH=$SQOOP_HOME/bin:$PATH

3. Download mysql-connector-java-5.1.18-bin.jar and mysql-connector-java-5.1.38 and paste them in the '$SQOOP_HOME/lib' folder.
4. Start hadoop

5. To verify Sqoop --> type echo $SQOOP_HOME in the terminal.
(/home/hadoop/work/sqoop-1.4.3.bin__hadoop-2.0.0-alpha)

6. To verify sqoop version --> type sqoop version in the terminal.
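With the MySQL connector JAR in place, connectivity can be smoke-tested against the local MySQL server; a sketch assuming the root/hadoop credentials used in the Hive section:

sqoop list-databases --connect jdbc:mysql://localhost:3306/ --username root --password hadoop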

===================================================================================
===================================================================================
===============================
Flume 1.6.0 Installation
===================================================================================
===================================================================================
===============================

1.Download 'apache-flume-1.6.0-bin.tar.gz' from
https://www.apache.org/dist/flume/1.6.0/ and paste it in the '/home/hadoop/work' folder and extract.
Using Terminal: tar -xvzf apache-flume-1.6.0-bin.tar.gz

2.Check & Set FLUME_HOME in .bashrc file.


export FLUME_HOME=/home/hadoop/work/apache-flume-1.6.0-bin
export PATH=$FLUME_HOME/bin:$PATH

3.Type echo $FLUME_HOME in the terminal to verify.

4.Download flume-sources-1.0-SNAPSHOT.jar and paste it in /home/hadoop/work/apache-flume-1.6.0-bin/lib

5.Copy any other agent .conf files and paste them in /home/hadoop/work/apache-flume-1.6.0-bin/conf
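If no .conf files are available yet, a minimal netcat-to-logger agent (hypothetical file name netcat-example.conf, saved in the conf folder above) can be used to verify the install:

a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.channels.c1.type = memory
a1.sources.r1.channels = c1
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1

flume-ng agent --conf $FLUME_HOME/conf --conf-file $FLUME_HOME/conf/netcat-example.conf --name a1 -Dflume.root.logger=INFO,console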

===================================================================================
===================================================================================
===============================
Zookeeper 3.4.6 Installation
===================================================================================
===================================================================================
===============================

1. Download the 'zookeeper-3.4.6.tar.gz' file from
https://archive.apache.org/dist/zookeeper/zookeeper-3.4.6/, copy it into the '/home/hadoop/work' directory and extract the tar file in the same directory.
Using terminal: tar -xvzf zookeeper-3.4.6.tar.gz

2. Open the '~/.bashrc' file using the command 'gedit ~/.bashrc' in the terminal and add the below lines at the end of the file:
export ZOOKEEPER_HOME=/home/hadoop/work/zookeeper-3.4.6
export PATH=$ZOOKEEPER_HOME/bin:$PATH

3. Copy /home/hadoop/work/zookeeper-3.4.6/conf/zoo_sample.cfg to a newly created zoo.cfg in the same location, with the following settings:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/hadoop/work/zoo_data
clientPort=2181
server.1=localhost:2888:3888

4. In /home/hadoop/work create a new folder zoo_data and create a file inside it named myid. Enter the value 1 inside myid.

5. zookeeper start command: zkServer.sh start


zookeeper stop command: zkServer.sh stop
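To verify the server after starting it (standard ZooKeeper client commands):

zkServer.sh status
zkCli.sh -server localhost:2181
ls /
quit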

===================================================================================
===================================================================================
===============================
Hbase 1.1.2 Installation
===================================================================================
===================================================================================
===============================

1. Download hbase-1.1.2-bin.tar.gz from
https://archive.apache.org/dist/hbase/1.1.2/ and paste it in the '/home/hadoop/work' folder and extract the file.

2. update '~/.bashrc' file with below changes


export HBASE_HOME=/home/hadoop/work/hbase-1.1.2
export PATH=$HBASE_HOME/bin:$PATH

3. re-open the terminal and enter: echo $HBASE_HOME

4. update 'hbase-site.xml' in the '$HBASE_HOME/conf' folder with the required properties (see the sketch below)
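A minimal sketch of 'hbase-site.xml' for this pseudo-distributed setup, assuming the HDFS address and ZooKeeper data directory configured in the earlier sections:

<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:8020/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>localhost</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/work/zoo_data</value>
</property>
</configuration>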

5. hbase start command: start-hbase.sh


hbase stop command: stop-hbase.sh
hbase verify command: hbase shell
===================================================================================
===================================================================================
===============================
Phoenix 4.6.0 Installation
===================================================================================
===================================================================================
===============================
1. Download phoenix-4.6.0-HBase-1.1-bin from
https://archive.apache.org/dist/phoenix/phoenix-4.6.0-HBase-1.1/bin/ and paste it in the '/home/hadoop/work' folder and extract the file.

2. update '~/.bashrc' file with below changes


export PHOENIX_HOME=/home/hadoop/work/phoenix-4.6.0-HBase-1.1-bin
export PATH=$PHOENIX_HOME/bin:$PATH

cp $PHOENIX_HOME/phoenix-4.6.0-HBase-1.1-server.jar $HBASE_HOME/lib
cp $PHOENIX_HOME/phoenix-4.6.0-HBase-1.1-client.jar $HBASE_HOME/lib

Note: restart the hbase after copying the jars

psql.py localhost $PHOENIX_HOME/examples/WEB_STAT.sql $PHOENIX_HOME/examples/WEB_STAT.csv $PHOENIX_HOME/examples/WEB_STAT_QUERIES.sql

Now you can connect to HBase using sqlline:


===========================================
sqlline.py localhost
!tables
select count(*) from web_stat;
select host, sum(active_visitor) from web_stat group by host;

CREATE VIEW "test" ( pk VARCHAR PRIMARY KEY, "cf"."a" VARCHAR, "cf"."b" VARCHAR,
"cf"."c" VARCHAR );
select * from "test";

UPSERT INTO "test" VALUES ('row4','mya','myb','myc');

UPSERT INTO "test" (pk, "cf"."a","cf"."b","cf"."c") VALUES


('row5','mya','myb','myc');

CREATE VIEW "cars" ( pk VARCHAR PRIMARY KEY, "vi"."make" VARCHAR, "vi"."model"


VARCHAR,"vi"."year" VARCHAR );
select * from "cars";

CREATE TABLE IF NOT EXISTS STUDENT (NAME VARCHAR NOT NULL PRIMARY KEY, ID INTEGER,
YEAR INTEGER);
UPSERT INTO STUDENT VALUES ('ARUN',1,2012);
UPSERT INTO STUDENT VALUES ('RAJ',2,2014);
UPSERT INTO STUDENT VALUES ('VENKAT',3,2013);
UPSERT INTO STUDENT VALUES ('ANIL',4,2012);
SELECT * FROM STUDENT;
===================================================================================
===================================================================================
===============================
Squirrel 3.6 Installation
===================================================================================
===================================================================================
===============================
1. Download squirrel-sql-3.6-standard.jar from
https://sourceforge.net/projects/squirrel-sql/files/1-stable/3.6.0/ and paste it in the '/home/hadoop/work' folder.

Using SQuirreL with Phoenix:
===========================
java -jar squirrel-sql-3.6-standard.jar
cp $PHOENIX_HOME/phoenix-4.6.0-HBase-1.1-client.jar ~/squirrel-sql-3.6/lib
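In SQuirreL, register a new driver that points at the copied Phoenix client jar; the usual Phoenix settings are:

Driver class: org.apache.phoenix.jdbc.PhoenixDriver
Example URL: jdbc:phoenix:localhost:2181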

===================================================================================
===================================================================================
===============================
Kafka 0.9.0.0 Installation
===================================================================================
===================================================================================
===============================

1. Download kafka_2.11-0.9.0.0.tgz from
https://archive.apache.org/dist/kafka/0.9.0.0/ and paste it in the '/home/hadoop/work' folder and extract the file.

2. update '~/.bashrc' file with below changes


export KAFKA_HOME=/home/hadoop/work/kafka_2.11-0.9.0.0
export PATH=$KAFKA_HOME/bin:$PATH

3. re-open the terminal and enter: echo $KAFKA_HOME
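A quick end-to-end check using the standard Kafka 0.9 scripts in $KAFKA_HOME/bin (this assumes ZooKeeper from the earlier section is running on localhost:2181):

kafka-server-start.sh $KAFKA_HOME/config/server.properties
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
kafka-console-producer.sh --broker-list localhost:9092 --topic test
kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning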

===================================================================================
===================================================================================
==============================
Spark 1.6.3 Installation
===================================================================================
===================================================================================
==============================

1. Install Java, Python, R


sudo apt-get install openjdk-8-jdk
sudo apt-get install python
sudo apt-get install r-base

2.Download spark-1.6.3-bin-hadoop2.6.tar.gz from
https://spark.apache.org/downloads.html and paste it in the '/home/hadoop/work' folder and extract the file.

3.Check & Set SPARK_HOME in .bashrc file.


export SPARK_HOME=/home/hadoop/work/spark-1.6.3-bin-hadoop2.6
export PATH=$SPARK_HOME/bin:$PATH
4.Type echo $SPARK_HOME in the terminal to verify.

5.Type spark-shell

-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
-------------------------------
Spark Command Line Examples
-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
-------------------------------
1. "scala" is the command to start the "scala"
"python" is the command to start the "python"
"R" is the command to start the "R"

Verify the all are installed or not

2. spark-shell, pyspark, sparkR and spark-sql are the main commands to work with Spark.

execute the "spark-shell" command, it will move to the "scala" prompt
execute the "pyspark" command, it will move to the "python" prompt
execute the "sparkR" command, it will move to the "R" prompt

===================================================================================
===================================================================================
==============================
Scala 2.11.7 Installation
===================================================================================
===================================================================================
==============================

1. Download scala-2.11.7.tar.gz from
https://www.scala-lang.org/download/2.11.7.html and paste it in the /home/hadoop/work folder and extract.
Using Terminal: tar -xvzf scala-2.11.7.tar.gz

2. Check & Set SCALA_HOME in .bashrc file.


export SCALA_HOME=/home/hadoop/work/scala-2.11.7
export PATH=$SCALA_HOME/bin:$PATH

3. Type echo $SCALA_HOME in the terminal to verify.

4. verify using "scala -version"

===================================================================================
===================================================================================
==============================
Maven 3.3.9 Installation
===================================================================================
===================================================================================
==============================

1.Download apache-maven-3.3.9-bin.tar.gz from https://maven.apache.org/download.cgi and paste it in the /home/hadoop/work folder and extract.
Using Terminal: tar -xvzf apache-maven-3.3.9-bin.tar.gz

2.Check & Set M2_HOME in .bashrc file.


export M2_HOME=/home/hadoop/work/apache-maven-3.3.9
export PATH=$M2_HOME/bin:$PATH

3.Type echo $M2_HOME in the terminal to verify.

===================================================================================
===================================================================================
==============================
SBT 0.13.9 Installation
===================================================================================
===================================================================================
==============================

1.Download sbt-0.13.9.tgz from http://www.scala-sbt.org/download.html and paste it in the /home/hadoop/work folder and extract the file.
Using Terminal: tar -xvzf sbt-0.13.9.tgz

2.Check & Set SBT_HOME in .bashrc file.


export SBT_HOME=/home/hadoop/work/sbt
export PATH=$SBT_HOME/bin:$PATH

3.Type echo $SBT_HOME in the terminal to verify.

4.verify using "sbt help"

===================================================================================
===================================================================================
==============================
Cassandra Installation
===================================================================================
===================================================================================
==============================
1. Download apache-cassandra-2.1.5-bin.tar.gz from
https://archive.apache.org/dist/cassandra/2.1.5/ and paste it in /home/hadoop/work
folder and extract the file.

2. update the "~/.bashrc" file with below content.


export CASSANDRA_HOME=/home/hadoop/work/apache-cassandra-2.1.5-bin
export PATH=$CASSANDRA_HOME/bin:$PATH

4. Re-open the new terminal

5. start cassndra server using "cassandra" command

6. start cassandra cli using "cassandra-cli" command

7. start cassandra cql using "cqlsh" command

8. stop cassandra using "pkill -f CassandraDaemon" command

===================================================================================
===================================================================================
==============================
Mongo DB Installation
===================================================================================
===================================================================================
==============================

1. Download mongodb-linux-x86_64-rhel57-2.4.12.tgz from
https://www.mongodb.org/dl/linux/x86_64-rhel and paste it in the /home/hadoop/work folder and extract the file.

2. export the environment variables


export MONGODB_HOME=/home/hadoop/work/mongodb-linux-x86_64-rhel57-2.4.12
export PATH=$MONGODB_HOME/bin:$PATH

3. Make directory: mkdir /home/hadoop/work/mongodb_data

4. To start the mongodb server:
mongod --dbpath=/home/hadoop/work/mongodb_data

5. To connect to the mongodb server:
mongo
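A short shell session to confirm the server is reachable (hypothetical database and collection names):

> use testdb
> db.students.insert({name: "arun", year: 2012})
> db.students.find()
> exit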

===================================================================================
===================================================================================
==============================
Nifi Installation
===================================================================================
===================================================================================
==============================

1.Download nifi-1.2.0.tar.gz from the apache mirrors at
https://archive.apache.org/dist/nifi/1.2.0/ and paste it in the /home/hadoop/work folder and extract the file.
Using Terminal: tar -xvzf nifi-1.2.0.tar.gz

2.Check & Set NIFI_HOME in .bashrc file.


export NIFI_HOME=/home/hadoop/work/nifi-1.2.0
export PATH=$NIFI_HOME/bin:$PATH

3.Type echo $NIFI_HOME in the terminal to verify.

4.Type nifi.sh start

5.Type nifi.sh stop

6.Type nifi.sh status

===================================================================================
===================================================================================
==============================
Ganglia Installation
===================================================================================
===================================================================================
==============================
sudo apt-get update && sudo apt-get -y upgrade
sudo apt-get install apache2 libapache2-mod-php5 php5-gd rrdtool

Server Installation
========================
sudo apt-get install -y ganglia-monitor rrdtool gmetad ganglia-webfrontend
sudo cp /etc/ganglia-webfrontend/apache.conf /etc/apache2/sites-enabled/ganglia.conf

sudo gedit /etc/ganglia/gmetad.conf


------------------------------------------
data_source "my cluster" 60 localhost
data_source "my cluster" 60 192.168.1.1:8649

sudo gedit /etc/ganglia/gmond.conf


------------------------------------------
cluster {
name = "my cluster" ## use the name from gmetad.conf
owner = "unspecified"
latlong = "unspecified"
url = "unspecified"
}

udp_send_channel {
#mcast_join = 239.2.11.71 ## comment out
host = localhost
#host = 192.168.1.1
port = 8649
ttl = 1
}

udp_recv_channel {
#mcast_join = 239.2.11.71 ## comment out
port = 8649
#bind = 239.2.11.71 ## comment out
}

restart the services


------------------------------------------
sudo service ganglia-monitor restart && sudo service gmetad restart && sudo service apache2 restart

open the below url


------------------------------------------
http://localhost/ganglia/

Client Installation
========================
sudo apt-get install -y ganglia-monitor

sudo gedit /etc/ganglia/gmond.conf


------------------------------------------
cluster {
name = "my cluster" ## use the name from gmetad.conf
owner = "unspecified"
latlong = "unspecified"
url = "unspecified"
}

udp_send_channel {
#mcast_join = 239.2.11.71 ## Comment
host = 192.168.1.1 ## IP address of master node
port = 8649
ttl = 1
}

/*
udp_recv_channel {
mcast_join = 239.2.11.71
port = 8649
bind = 239.2.11.71
}
*/

restart the services


------------------------------------------
sudo service ganglia-monitor restart

===================================================================================
===================================================================================
==============================
Oozie 4.2.0 Installation
===================================================================================
===================================================================================
==============================

1. Download "oozie-4.2.0.tar.gz" file & extract the file into "/home/hadoop/work"


folder

2. rename "oozie-4.2.0" to "oozie-4.2.0_build"

3. update '~/.bashrc' file with below changes

export OOZIE_BUILD=/home/hadoop/work/oozie-4.2.0_build
export PATH=$OOZIE_BUILD/bin:$PATH

4. re-open the terminal

echo $OOZIE_BUILD

5. execute the below commands to build oozie with hadoop version

$OOZIE_BUILD/bin/mkdistro.sh -DskipTests

$OOZIE_BUILD/bin/mkdistro.sh -P hadoop-2 -DskipTests

6. copy the file "$OOZIE_BUILD/distro/target/oozie-4.2.0-distro.tar.gz" to the "work" folder and then extract it.

7. update '~/.bashrc' file with below changes

export OOZIE_HOME=/home/hadoop/work/oozie-4.2.0
export PATH=$OOZIE_HOME/bin:$PATH

8. re-open the terminal

echo $OOZIE_HOME

9. create "$OOZIE_HOME/libext" folder


10. copy "ext-2.2.zip" to "$OOZIE_HOME/libext" folder

11. copy "hadooplib-2.6.0.oozie-4.2.0/*.jar" files to "$OOZIE_HOME/libext"

12. $OOZIE_HOME/bin/oozie-setup.sh prepare-war -d $OOZIE_HOME/libext

13. $OOZIE_HOME/bin/ooziedb.sh create -sqlfile oozie.sql -run

14. update "$HADOOP_HOME/etc/hadoop/core-site.xml" file with below properties

<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>

15. restart the "hadoop" using "stop-all.sh & start-all.sh"

16. update "$OOZIE_HOME/conf/oozie-site.xml" file with below property

<property>
<name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
<value>*=/home/hadoop/work/hadoop-2.6.0/etc/hadoop</value>
</property>
<property>
<name>oozie.processing.timezone</name>
<value>GMT+0530</value>
</property>
<property>
<name>oozie.service.coord.check.maximum.frequency</name>
<value>false</value>
</property>

17. Start the Oozie Server using one of the below commands

$OOZIE_HOME/bin/oozie-start.sh
$OOZIE_HOME/bin/oozie-run.sh

$OOZIE_HOME/bin/oozied.sh start
$OOZIE_HOME/bin/oozied.sh run

18. Verify the "oozie" status with below commands

$OOZIE_HOME/bin/oozie admin -oozie http://localhost:11000/oozie -status

19. extract "$OOZIE_HOME/oozie-examples.tar.gz" & "$OOZIE_HOME/oozie-sharelib-


4.2.0.tar.gz" files in "$OOZIE_HOME" folder

20. execute below commands to work with "oozie" examples

find $OOZIE_HOME/examples/ -name "job.properties" -exec sed -i "s/localhost:8020/localhost:8020/g" '{}' \;
find $OOZIE_HOME/examples/ -name "job.properties" -exec sed -i "s/localhost:8021/localhost:8032/g" '{}' \;

21. copy data from "local" to "hdfs" using below commands

hadoop fs -put $OOZIE_HOME/examples examples
hadoop fs -put $OOZIE_HOME/share share

22. verify "share lib" in "oozie" using below commands

oozie admin -shareliblist -oozie http://localhost:11000/oozie


oozie admin -sharelibupdate -oozie http://localhost:11000/oozie

23. run "oozie" command

$OOZIE_HOME/bin/oozie job -oozie http://localhost:11000/oozie -config


$OOZIE_HOME/examples/apps/map-reduce/job.properties -run

$OOZIE_HOME/bin/oozie job -oozie http://localhost:11000/oozie -config


$OOZIE_HOME/examples/apps/hive/job.properties -run

$OOZIE_HOME/bin/oozie job -oozie http://localhost:11000/oozie -config


$OOZIE_HOME/examples/apps/pig/job.properties -run

24. kill "oozie" command

$OOZIE_HOME/bin/oozie job -oozie http://localhost:11000/oozie -kill 0000017-


160315201745286-oozie-hado-C

25. new examples in oozie

hadoop fs -rmr kalyan-examples
hadoop fs -put $OOZIE_HOME/kalyan-examples kalyan-examples

$OOZIE_HOME/bin/oozie job -oozie http://localhost:11000/oozie -config $OOZIE_HOME/kalyan-examples/apps/wordcount-wf/job.properties -run
$OOZIE_HOME/bin/oozie job -oozie http://localhost:11000/oozie -config $OOZIE_HOME/kalyan-examples/apps/wordcount-cron/job.properties -run
$OOZIE_HOME/bin/oozie job -oozie http://localhost:11000/oozie -config $OOZIE_HOME/kalyan-examples/apps/wordcount-event/job.properties -run

26. If jobs fail to run, the issue is usually YARN resource allocation.
Update "yarn-site.xml" with the below properties and restart "yarn":

<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>512</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>20480</value>
</property>
===================================================================================
===================================================================================
=================================
Cloudera
===================================================================================
===================================================================================
===============================

===================================================================================
===================================================================================
=================================
Hortonworks
===================================================================================
===================================================================================
===============================

===================================================================================
===================================================================================
=================================
MapR
===================================================================================
===================================================================================
===============================

===================================================================================
===================================================================================
=================================
Twitter
===================================================================================
===================================================================================
===============================
