Hadoop Installation Final
Exercise 2.6 Find the procedure to set up a one-node Hadoop cluster.
Check whether the installed Java version is 1.8 or above.
nsp@nspublin:~$ java -version
nsp@nspublin:~$ sudo apt-get install ssh
Suggested packages:
ssh-askpass rssh molly-guard monkeysphere
The following NEW packages will be installed:
ncurses-term openssh-server openssh-sftp-server ssh ssh-import-id
0 upgraded, 5 newly installed, 0 to remove and 19 not upgraded.
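The session below switches to a dedicated hduser account in a hadoop group; a sketch of creating them (run as root; useradd is used here as a non-interactive variant of adduser, and the names are taken from the prompts below):

```shell
# Create the 'hadoop' group and the 'hduser' account used throughout
# this session (run as root).
groupadd -f hadoop                                  # -f: succeed if it already exists
id hduser >/dev/null 2>&1 || useradd -m -g hadoop hduser
id -gn hduser                                       # primary group should be: hadoop
```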
The following commands are used to find the locations of ‘ssh’ and ‘sshd’:
nsp@nspublin:~$ which ssh
/usr/bin/ssh
nsp@nspublin:~$ which sshd
/usr/sbin/sshd
nsp@nspublin:~$ su hduser
Password:
4. Configuring ssh
hduser@nspublin:/home/nsp$ ssh-keygen -t rsa -P ""
SHA256:M9m4RAbYhO6ThJYxwQMevGjfDAtzz9kYzhUQwXEfVO8 hduser@nspublin
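The keygen step above can be scripted end to end; a sketch (assumes openssh is installed; the empty -P "" passphrase matches the session above):

```shell
# Generate an RSA key with an empty passphrase (skipped if one exists),
# then authorize it for password-less logins to this same machine.
mkdir -p "$HOME/.ssh"
[ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -P "" -f "$HOME/.ssh/id_rsa"
cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
chmod 600 "$HOME/.ssh/authorized_keys"
# Verify afterwards with: ssh localhost
```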
nsp@nspublin:~$ wget
http://mirrors.sonic.net/apache/hadoop/common/hadoop-2.6.0/hadoop-
2.6.0.tar.gz
--2016-06-08 00:38:10--
http://mirrors.sonic.net/apache/hadoop/common/hadoop-2.6.0/hadoop-
2.6.0.tar.gz
hadoop-2.6.0.tar.gz 100%
[=======================================================
nsp@nspublin:~$ tar -xzvf hadoop-2.6.0.tar.gz
hadoop-2.6.0/
hadoop-2.6.0/etc/
hadoop-2.6.0/etc/hadoop/
hadoop-2.6.0/etc/hadoop/hdfs-site.xml
hadoop-2.6.0/etc/hadoop/hadoop-metrics2.properties
…
hadoop-2.6.0/LICENSE.txt
hadoop-2.6.0/README.txt
hadoop-2.6.0/bin/
nsp@nspublin:/usr/local$ ls
bin etc games globus-5.0.5 include lib man sbin share src
6b. Create a new directory ‘hadoop’
nsp@nspublin:/usr/local$ sudo mkdir hadoop
nsp@nspublin:/usr/local$ ls had*
nsp@nspublin:/usr/local$ ls
bin etc games globus-5.0.5 hadoop include lib man sbin share src
nsp@nspublin:~$ su hduser
Password:
hduser@ksrietcsevb:/usr/local$ ls
bin etc games hadoop include lib man sbin share src
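Between the empty hadoop directory above and the populated /usr/local/hadoop below, the extracted release has to be moved in and handed to hduser. A sketch (SRC and DEST default to scratch paths here so the commands can be rehearsed anywhere; on the real machine use the directory where the tarball was unpacked and DEST=/usr/local/hadoop, with sudo):

```shell
# Move the extracted release into the install directory, then (real
# machine only) give hduser ownership of it.
SRC=${SRC:-/tmp/hadoop-2.6.0-demo}; DEST=${DEST:-/tmp/hadoop-demo}
mkdir -p "$SRC/bin" "$DEST"        # scaffolding so the sketch runs anywhere
mv "$SRC"/* "$DEST"/
ls "$DEST"
# sudo chown -R hduser:hadoop /usr/local/hadoop    # real step, needs sudo
```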
hduser@ksrietcsevb:/usr/local/etc$ cd ..
hduser@ksrietcsevb:/usr/local$ ls
bin etc games hadoop include lib man sbin share src
hduser@ksrietcsevb:/usr/local$ cd hadoop/
hduser@ksrietcsevb:/usr/local/hadoop$ ls
bin etc include lib libexec LICENSE.txt NOTICE.txt README.txt sbin
share
hduser@ksrietcsevb:/usr/local/hadoop$ cd etc
hduser@ksrietcsevb:/usr/local/hadoop/etc$ ls
hadoop
hduser@ksrietcsevb:/usr/local/hadoop/etc$ cd hadoop/
hduser@ksrietcsevb:/usr/local/hadoop/etc/hadoop$ ls
capacity-scheduler.xml httpfs-env.sh mapred-env.sh
configuration.xsl httpfs-log4j.properties mapred-queues.xml.template
container-executor.cfg httpfs-signature.secret mapred-site.xml.template
core-site.xml httpfs-site.xml slaves
hadoop-env.cmd kms-acls.xml ssl-client.xml.example
hadoop-env.sh kms-env.sh ssl-server.xml.example
hadoop-metrics2.properties kms-log4j.properties yarn-env.cmd
hadoop-metrics.properties kms-site.xml yarn-env.sh
hadoop-policy.xml log4j.properties yarn-site.xml
hdfs-site.xml mapred-env.cmd
hduser@ksrietcsevb:/usr/local/hadoop/etc/hadoop$ sudo nano core-site.xml
hduser@ksrietcsevb:/usr/local/hadoop/etc/hadoop$ cat core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
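These configuration files can also be written non-interactively; a sketch that writes the same core-site.xml body as above (CONF defaults to a scratch directory here; point HADOOP_CONF_DIR at /usr/local/hadoop/etc/hadoop to write the real file):

```shell
# Write core-site.xml without an editor; the body matches the file above.
CONF=${HADOOP_CONF_DIR:-/tmp/hadoop-conf-demo}
mkdir -p "$CONF"
cat > "$CONF/core-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
grep 'hdfs://localhost:9000' "$CONF/core-site.xml"
```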
hduser@ksrietcsevb:/usr/local/hadoop/etc/hadoop$ sudo nano hdfs-site.xml
hduser@ksrietcsevb:/usr/local/hadoop/etc/hadoop$ cat hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
-->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>
</configuration>
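The dfs.namenode.name.dir and dfs.datanode.data.dir paths above must exist before the NameNode is formatted. A sketch (STORE defaults to a scratch path here; on the real machine create /usr/local/hadoop_store with sudo and chown it to hduser:hadoop):

```shell
# Create the NameNode/DataNode storage directories referenced in
# hdfs-site.xml above.
STORE=${HADOOP_STORE:-/tmp/hadoop_store-demo}
mkdir -p "$STORE/hdfs/namenode" "$STORE/hdfs/datanode"
ls "$STORE/hdfs"
# sudo chown -R hduser:hadoop /usr/local/hadoop_store    # real step
```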
hduser@ksrietcsevb:/usr/local/hadoop/etc/hadoop$ sudo nano yarn-site.xml
hduser@ksrietcsevb:/usr/local/hadoop/etc/hadoop$ cat yarn-site.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
-->
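The session above does not show the body of yarn-site.xml; a single-node YARN setup conventionally enables the MapReduce shuffle auxiliary service. A sketch under that assumption (CONF defaults to a scratch directory; point HADOOP_CONF_DIR at /usr/local/hadoop/etc/hadoop to write the real file):

```shell
# Write the conventional single-node yarn-site.xml body (assumed, since
# the session above does not show it).
CONF=${HADOOP_CONF_DIR:-/tmp/hadoop-conf-demo}
mkdir -p "$CONF"
cat > "$CONF/yarn-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF
grep 'mapreduce_shuffle' "$CONF/yarn-site.xml"
```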
hduser@ksrietcsevb:/usr/local/hadoop/etc/hadoop$ sudo cp
/usr/local/hadoop/etc/hadoop/mapred-site.xml.template
/usr/local/hadoop/etc/hadoop/mapred-site.xml
hduser@ksrietcsevb:/usr/local/hadoop/etc/hadoop$ sudo nano mapred-site.xml
hduser@ksrietcsevb:/usr/local/hadoop/etc/hadoop$ cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
-->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
For comparison, Hadoop 1.x (before YARN) used the mapred.job.tracker setting instead:
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
(Before the first start, format the NameNode once with: hdfs namenode -format.)
hduser@nspublin:/usr/local/hadoop/sbin$ start-all.sh
hduser@nspublin:/usr/local/hadoop/sbin$ jps
7040 NameNode
7505 ResourceManager
7177 DataNode
7356 SecondaryNameNode
7804 NodeManager
7919 Jps
hduser@nspublin:/usr/local/hadoop/sbin$ pwd
/usr/local/hadoop/sbin
hduser@nspublin:/usr/local/hadoop/sbin$ hadoop version
Hadoop 2.6.0
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r
e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1
Compiled by jenkins on 2014-11-13T21:10Z
Compiled with protoc 2.5.0
From source with checksum 18e43357c8f927c0695f1e9522859d6a
This command was run using
/usr/local/hadoop/share/hadoop/common/hadoop-common-2.6.0.jar
18d. Create a directory in HDFS
hduser@nspublin:/usr/local/hadoop/sbin$ hadoop fs -mkdir /user
2.7 Mount HDFS using FUSE
wget http://archive.cloudera.com/one-click-install/maverick/cdh3-repository_1.0_all.deb
sudo dpkg -i cdh3-repository_1.0_all.deb
sudo apt-get update
sudo apt-get install hadoop-0.20-fuse
sudo wget 'https://archive.cloudera.com/cdh5/debian/wheezy/amd64/cdh/cloudera.list' -O /etc/apt/sources.list.d/cloudera.list
sudo apt-get update
sudo apt-get install hadoop-hdfs-fuse
Once fuse-dfs is installed, go ahead and mount HDFS using FUSE as follows.
sudo hadoop-fuse-dfs dfs://<name_node_hostname>:<namenode_port>
<mount_point>
Once HDFS has been mounted at <mount_point>, you can use most of the
traditional filesystem operations (e.g., cp, rm, cat, mv, mkdir, rmdir, more, scp).
However, random-write operations such as rsync, and permission-related
operations such as chmod and chown, are not supported in FUSE-mounted HDFS.
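Since the FUSE mount behaves like an ordinary directory, the operations above can be rehearsed anywhere; a sketch (MNT stands in for your actual <mount_point> and defaults to a scratch path here):

```shell
# Ordinary shell tools work on a FUSE-mounted HDFS path; a scratch
# directory stands in for the mount point in this sketch.
MNT=${HDFS_MOUNT:-/tmp/hdfs-mount-demo}
mkdir -p "$MNT/books"
echo "hello hdfs" > "$MNT/books/a.txt"
cat "$MNT/books/a.txt"
mv "$MNT/books/a.txt" "$MNT/books/b.txt"
ls "$MNT/books"
```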
2.8 Write a program that uses the Hadoop APIs to interact with Hadoop: display
the contents of a file that exists in HDFS.
/home/hduser/HadoopFScat.java:
import java.io.InputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
hduser@nspublin:/usr/local/hadoop/sbin$ ls /home/hduser/fscat
HadoopFScat.class
2.9 Write a wordcount program to demonstrate the use of Map and Reduce
tasks
/home/hduser/WordCount.java:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
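For intuition before compiling: the map (tokenize), shuffle (sort), and reduce (count per key) phases of WordCount mirror a familiar shell pipeline. An illustration, not part of the lab:

```shell
# Map: split lines into words; shuffle: group identical words together;
# reduce: count each group -- the same shape as WordCount's phases.
printf 'hello world\nhello hadoop\n' | tr ' ' '\n' | sort | uniq -c
```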
hduser@nspublin:/usr/local/hadoop/sbin$ sudo /usr/lib/jvm/java-8-oracle/bin/javac -classpath /home/hduser/hadoop-core-1.2.1.jar -d /home/hduser/wc /home/hduser/WordCount.java
The compiled classes are then packaged into a jar with the jar tool:
added manifest
adding: WordCount$IntSumReducer.class(in = 1739) (out= 739)(deflated 57%)
adding: WordCount$TokenizerMapper.class(in = 1736) (out= 753)(deflated 56%)
adding: WordCount.class(in = 1491) (out= 814)(deflated 45%)