Hadoop Installation Final

This document provides a step-by-step guide to setting up a single-node Hadoop cluster, including verifying Java version, creating a Hadoop user group and user, installing SSH, and configuring necessary environment variables. It details the downloading and extraction of the Hadoop bundle, as well as the configuration of essential files like 'hadoop-env.sh' and 'core-site.xml'. The instructions ensure that the Hadoop environment is correctly set up for operation on a local machine.


Hadoop

Exercise 2.6 Find the procedure to set up a one-node Hadoop cluster.
Check whether the installed Java version is 1.8 or above.
nsp@nspublin:~$ java -version

java version "1.8.0_91"


Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
Java HotSpot(TM) Server VM (build 25.91-b14, mixed mode)
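If a suitable Java is not installed, it can be added first. A minimal sketch assuming an Ubuntu machine using the OpenJDK 8 packages (this guide itself uses the Oracle JDK located at /usr/lib/jvm/java-8-oracle):

sudo apt-get update
# install OpenJDK 8 (the full JDK, since later steps also need javac)
sudo apt-get install openjdk-8-jdk
# confirm the version afterwards
java -version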

1. Create a separate user group for Hadoop:


nsp@nspublin:~$ sudo addgroup hadoop

Adding group `hadoop' (GID 1001) ...


Done.
2. Create a separate user for Hadoop:
nsp@nspublin:~$ sudo adduser --ingroup hadoop hduser

Adding user `hduser' ...


Adding new user `hduser' (1001) with group `hadoop' ...
Creating home directory `/home/hduser' ...
Copying files from `/etc/skel' ...
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for hduser
Enter the new value, or press ENTER for the default
Full Name []: hadoop user
Room Number []:
Work Phone []:
Home Phone []:
Other []:
Is the information correct? [Y/n] y
3. ssh Installation:
nsp@nspublin:~$ sudo apt-get install ssh

Reading package lists... Done


Building dependency tree
Reading state information... Done

Suggested packages:
ssh-askpass rssh molly-guard monkeysphere
The following NEW packages will be installed:
ncurses-term openssh-server openssh-sftp-server ssh ssh-import-id
0 upgraded, 5 newly installed, 0 to remove and 19 not upgraded.

Need to get 691 kB of archives.


After this operation, 5,420 kB of additional disk space will be used.
Do you want to continue? [Y/n] y

Get:1 http://in.archive.ubuntu.com/ubuntu xenial-updates/main i386 openssh-sftp-server i386


1:7.2p2-4ubuntu1 [44.0 kB]

The following commands are used to find the locations of ‘ssh’ and ‘sshd’:

3a. To find the location of ssh


nsp@nspublin:~$ which ssh
/usr/bin/ssh
To find the location of sshd
nsp@nspublin:~$ which sshd
/usr/sbin/sshd

3b. Switch to the newly created user ‘hduser’

nsp@nspublin:~$ su hduser
Password:

4. Configuring SSH
hduser@nspublin:/home/nsp$ ssh-keygen -t rsa -P ""

Generating public/private rsa key pair.


Enter file in which to save the key (/home/hduser/.ssh/id_rsa):
Created directory '/home/hduser/.ssh'.
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.

The key fingerprint is:

SHA256:M9m4RAbYhO6ThJYxwQMevGjfDAtzz9kYzhUQwXEfVO8 hduser@nspublin

The key's randomart image is:


+---[RSA 2048]----+
|o+...XB..o.. |
|..* o.oo. . . |
|...B +. . |
|.=+oo. + + . |
|..=oX.* S . E |
| o+X o + |
| . . |
| |

4a. Append the RSA public key to ‘authorized_keys’

hduser@nspublin:/home/nsp$ cat /home/hduser/.ssh/id_rsa.pub >> /home/hduser/.ssh/authorized_keys
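This appends the public key to ‘authorized_keys’ so that ‘hduser’ can SSH to localhost without a password. If the later ‘ssh localhost’ steps still prompt for a password, the usual cause is over-permissive key files; a quick fix (standard OpenSSH permission requirements, not shown in the original transcript):

# the .ssh directory must be accessible only by its owner
chmod 700 /home/hduser/.ssh
# likewise the authorized_keys file
chmod 600 /home/hduser/.ssh/authorized_keys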

4b. Change back to the user ‘nsp’


hduser@nspublin:/home/nsp$ su nsp
Password:

5. Downloading Hadoop Bundle

nsp@nspublin:~$ wget
http://mirrors.sonic.net/apache/hadoop/common/hadoop-2.6.0/hadoop-
2.6.0.tar.gz
--2016-06-08 00:38:10--
http://mirrors.sonic.net/apache/hadoop/common/hadoop-2.6.0/hadoop-
2.6.0.tar.gz

Resolving mirrors.sonic.net (mirrors.sonic.net)... 69.12.162.27


Connecting to mirrors.sonic.net (mirrors.sonic.net)|
69.12.162.27|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 195257604 (186M) [application/x-gzip]
Saving to: ‘hadoop-2.6.0.tar.gz’

hadoop-2.6.0.tar.gz 100%
[=======================================================

6. Extracting Hadoop Bundle

nsp@nspublin:~$ tar xvzf hadoop-2.6.0.tar.gz

hadoop-2.6.0/
hadoop-2.6.0/etc/
hadoop-2.6.0/etc/hadoop/
hadoop-2.6.0/etc/hadoop/hdfs-site.xml
hadoop-2.6.0/etc/hadoop/hadoop-metrics2.properties
………………………………………..

….

hadoop-2.6.0/LICENSE.txt
hadoop-2.6.0/README.txt
hadoop-2.6.0/bin/

6a. Change into ‘hadoop-2.6.0/’, then into ‘/usr/local’


nsp@nspublin:~$ cd hadoop-2.6.0/
nsp@nspublin:~/hadoop-2.6.0$ cd /usr/local

nsp@nspublin:/usr/local$ ls
bin etc games globus-5.0.5 include lib man sbin share src
6b. Create a new directory ‘hadoop’
nsp@nspublin:/usr/local$ sudo mkdir hadoop
nsp@nspublin:/usr/local$ ls had*

nsp@nspublin:/usr/local$ ls
bin etc games globus-5.0.5 hadoop include lib man sbin share src

6c. Change back to ‘~/hadoop-2.6.0/’


nsp@nspublin:/usr/local$ cd /
nsp@nspublin:/$ cd home/
nsp@nspublin:/$ cd nsp/
nsp@nspublin:~$ cd hadoop-2.6.0/
nsp@nspublin:~/hadoop-2.6.0$ ls
bin etc include lib libexec LICENSE.txt NOTICE.txt README.txt sbin
share

7. Moving the extracted Hadoop files to /usr/local/hadoop:

nsp@nspublin:~/hadoop-2.6.0$ sudo mv * /usr/local/hadoop


nsp@nspublin:~/hadoop-2.6.0$

7a. SSH to localhost


nsp@nspublin:~$ ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is
SHA256:c5h0xofdEClBcHD0n8fAi+fLbDSmjD5dSIX1YnceSa8.
Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'localhost' (ECDSA) to the list of

7b. Change to ‘hduser’

nsp@nspublin:~$ su hduser
Password:

7c. SSH to localhost


hduser@nspublin:/home/nsp$ ssh localhost

The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is
SHA256:c5h0xofdEClBcHD0n8fAi+fLbDSmjD5dSIX1YnceSa8.
Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'localhost' (ECDSA) to the list of

7d. Change to ‘nsp’


hduser@nspublin:~$ su nsp
Password:

8. Adding administrator (sudo) privileges to hduser

nsp@nspublin:/home/hduser$ sudo adduser hduser sudo


[sudo] password for nsp:

Adding user `hduser' to group `sudo' ...


Adding user hduser to group sudo
Done.

9. Configure ~/.bashrc as below:


hduser@nspublin:/home/nsp/hadoop-2.6.0$ gedit ~/.bashrc

# ~/.bashrc: executed by bash(1) for non-login shells.


# see /usr/share/doc/bash/examples/startup-files (in the package bash-doc)
# for examples

# If not running interactively, don't do anything


case $- in
*i*) ;;
*) return;;
………………
…………….
if ! shopt -oq posix; then
if [ -f /usr/share/bash-completion/bash_completion ]; then
. /usr/share/bash-completion/bash_completion
elif [ -f /etc/bash_completion ]; then
. /etc/bash_completion
fi
fi

# -- HADOOP ENVIRONMENT VARIABLES START -- #


export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
# -- HADOOP ENVIRONMENT VARIABLES END -- #
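The new variables only apply to shells started after ~/.bashrc is edited. To load them into the current shell and sanity-check the paths (a quick sketch based on the values configured above):

# re-read ~/.bashrc in the current shell
source ~/.bashrc
# should print /usr/local/hadoop
echo $HADOOP_HOME
# should resolve to /usr/local/hadoop/bin/hadoop
which hadoop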

9a. Change to ‘hduser’


nsp@nspublin:~$ su hduser
Password:

9b. Change directory ‘/usr/local’


hduser@ksrietcsevb:~$ cd /usr/local/

hduser@ksrietcsevb:/usr/local$ ls
bin etc games hadoop include lib man sbin share src

9c. Change directory ‘etc’


hduser@ksrietcsevb:/usr/local$ cd etc
hduser@ksrietcsevb:/usr/local/etc$ ls

hduser@ksrietcsevb:/usr/local/etc$ cd..
cd..: command not found
hduser@ksrietcsevb:/usr/local/etc$ ^C

hduser@ksrietcsevb:/usr/local/etc$ cd ..
hduser@ksrietcsevb:/usr/local$ ls
bin etc games hadoop include lib man sbin share src

9d. Change directory ‘hadoop/etc/hadoop’

hduser@ksrietcsevb:/usr/local$ cd hadoop/
hduser@ksrietcsevb:/usr/local/hadoop$ ls
bin etc include lib libexec LICENSE.txt NOTICE.txt README.txt sbin
share

hduser@ksrietcsevb:/usr/local/hadoop$ cd etc
hduser@ksrietcsevb:/usr/local/hadoop/etc$ ls
hadoop

hduser@ksrietcsevb:/usr/local/hadoop/etc$ cd hadoop/
hduser@ksrietcsevb:/usr/local/hadoop/etc/hadoop$ ls
capacity-scheduler.xml httpfs-env.sh mapred-env.sh
configuration.xsl httpfs-log4j.properties mapred-queues.xml.template
container-executor.cfg httpfs-signature.secret mapred-site.xml.template
core-site.xml httpfs-site.xml slaves
hadoop-env.cmd kms-acls.xml ssl-client.xml.example
hadoop-env.sh kms-env.sh ssl-server.xml.example
hadoop-metrics2.properties kms-log4j.properties yarn-env.cmd
hadoop-metrics.properties kms-site.xml yarn-env.sh
hadoop-policy.xml log4j.properties yarn-site.xml
hdfs-site.xml mapred-env.cmd

10. Configure ‘hadoop-env.sh’


hduser@ksrietcsevb:/usr/local/hadoop/etc/hadoop$ nano hadoop-env.sh

hduser@ksrietcsevb:/usr/local/hadoop/etc/hadoop$ sudo nano hadoop-env.sh
[sudo] password for hduser:

hduser@ksrietcsevb:/usr/local/hadoop/etc/hadoop$ cat hadoop-env.sh

# Licensed to the Apache Software Foundation (ASF) under one


# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME. All others are


# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use.


export JAVA_HOME=/usr/lib/jvm/java-8-oracle

# The jsvc implementation to use. Jsvc is required to run secure datanodes


# that bind to privileged ports to provide authentication of data transfer
……………..
…………………..
export HADOOP_PID_DIR=${HADOOP_PID_DIR}
export HADOOP_SECURE_DN_PID_DIR=${HADOOP_PID_DIR}

# A string representing this instance of hadoop. $USER by default.


export HADOOP_IDENT_STRING=$USER

11. Configure ‘core-site.xml’, ‘hdfs-site.xml’, ‘yarn-site.xml’ and ‘mapred-site.xml’

hduser@ksrietcsevb:/usr/local/hadoop/etc/hadoop$ sudo nano core-site.xml

hduser@ksrietcsevb:/usr/local/hadoop/etc/hadoop$ cat core-site.xml

<?xml version="1.0" encoding="UTF-8"?>


<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software


distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
hduser@ksrietcsevb:/usr/local/hadoop/etc/hadoop$ sudo nano hdfs-site.xml
hduser@ksrietcsevb:/usr/local/hadoop/etc/hadoop$ cat hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software


distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>
</configuration>
hduser@ksrietcsevb:/usr/local/hadoop/etc/hadoop$ sudo nano yarn-site.xml
hduser@ksrietcsevb:/usr/local/hadoop/etc/hadoop$ cat yarn-site.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software


distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>

<!-- Site specific YARN configuration properties -->


<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>

hduser@ksrietcsevb:/usr/local/hadoop/etc/hadoop$ sudo cp
/usr/local/hadoop/etc/hadoop/mapred-site.xml.template
/usr/local/hadoop/etc/hadoop/mapred-site.xml
hduser@ksrietcsevb:/usr/local/hadoop/etc/hadoop$ sudo nano mapred-site.xml
hduser@ksrietcsevb:/usr/local/hadoop/etc/hadoop$ cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software


distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

………………………………………………………………..
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
………………………………………………………………………………..

12. Create the namenode and datanode directories and set their ownership


hduser@nspublin:/home/nsp$ sudo mkdir -p
/usr/local/hadoop_store/hdfs/namenode
hduser@nspublin:/home/nsp$ sudo mkdir -p
/usr/local/hadoop_store/hdfs/datanode
hduser@nspublin:/home/nsp$ sudo chown -R hduser:hadoop
/usr/local/hadoop_store
hduser@nspublin:/home/nsp$ sudo chown -R hduser /usr/local/hadoop/
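A quick sanity check that the directories exist and are now owned by hduser (using the paths created above):

# both entries should list hduser as owner and hadoop as group
ls -ld /usr/local/hadoop_store/hdfs/namenode /usr/local/hadoop_store/hdfs/datanode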

13. Format the namenode (a preliminary step before using HDFS):


hduser@nspublin:/home/nsp$ hdfs namenode -format

16/06/08 09:05:26 INFO namenode.NameNode: STARTUP_MSG:


/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = nspublin/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.6.0
STARTUP_MSG: classpath =
/usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/lib/
java-xmlbuilder-0.4.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-
jaxrs-1.9.13.jar:/usr/local/hadoop/share/hadoop/common/lib/htrace-core-
…………………….
………………………..

16/06/08 09:05:29 INFO common.Storage: Storage directory


/usr/local/hadoop_store/hdfs/namenode has been successfully formatted.
16/06/08 09:05:30 INFO namenode.NNStorageRetentionManager: Going to retain
1 images with txid >= 0
16/06/08 09:05:30 INFO util.ExitUtil: Exiting with status 0
16/06/08 09:05:30 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at nspublin/127.0.1.1
************************************************************/

14. Starting Hadoop:


hduser@nspublin:/home/nsp$ cd /usr/local/hadoop/sbin/

hduser@nspublin:/usr/local/hadoop/sbin$ start-all.sh

This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh


16/06/08 09:08:01 WARN util.NativeCodeLoader: Unable to load native-
hadoop library for your platform... using builtin-java classes where
applicable
Starting namenodes on [localhost]
15. Checking Hadoop:
hduser@nspublin:/usr/local/hadoop/sbin$ jps

7040 NameNode
7505 ResourceManager
7177 DataNode
7356 SecondaryNameNode
7804 NodeManager
7919 Jps

hduser@nspublin:/usr/local/hadoop/sbin$ netstat -plten | grep java

(Not all processes could be identified, non-owned process info


will not be shown, you would have to be root to see it all.)
tcp 0 0 0.0.0.0:50020 0.0.0.0:* LISTEN 1001
42035 7177/java

hduser@nspublin:/usr/local/hadoop/sbin$ pwd
/usr/local/hadoop/sbin

To check in a browser that Hadoop has started, open http://localhost:50070
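The web interfaces can also be probed from the command line; a small sketch using the default Hadoop 2.x ports (50070 for the NameNode UI, 8088 for the YARN ResourceManager UI):

# NameNode web UI
curl -s http://localhost:50070 | head
# ResourceManager web UI
curl -s http://localhost:8088 | head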

16. Stopping Hadoop:


hduser@nspublin:/usr/local/hadoop/sbin$ stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
16/06/08 09:10:34 WARN util.NativeCodeLoader: Unable to load
native-hadoop library for your platform... using builtin-java classes
where applicable
Stopping namenodes on [localhost]
hduser@localhost's password:
17. Checking the Hadoop version:
hduser@nspublin:/usr/local/hadoop/sbin$ hadoop version

Hadoop 2.6.0
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r
e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1
Compiled by jenkins on 2014-11-13T21:10Z
Compiled with protoc 2.5.0
From source with checksum 18e43357c8f927c0695f1e9522859d6a
This command was run using
/usr/local/hadoop/share/hadoop/common/hadoop-common-2.6.0.jar

18. HDFS Input Directory:


18a. Creating HDFS Directory

hduser@nspublin:/usr/local/hadoop/sbin$ hadoop fs -mkdir /user

16/06/08 13:08:46 WARN util.NativeCodeLoader: Unable to load native-


hadoop library for your platform... using builtin-java classes where
applicable

18b. Listing files in the directory

hduser@nspublin:/usr/local/hadoop/sbin$ hadoop fs -ls /

16/06/08 13:09:09 WARN util.NativeCodeLoader: Unable to load native-


hadoop library for your platform... using builtin-java classes where
applicable
Found 1 items
drwxr-xr-x - hduser supergroup 0 2016-06-08 13:08 /user

18c. Creating HDFS Subdirectory

hduser@nspublin:/usr/local/hadoop/sbin$ hadoop fs -mkdir /user/input

16/06/08 13:09:29 WARN util.NativeCodeLoader: Unable to load native-


hadoop library for your platform... using builtin-java classes where
applicable

18d. Attempting to create an existing directory:
hduser@nspublin:/usr/local/hadoop/sbin$ hadoop fs -mkdir /user

16/06/08 13:09:53 WARN util.NativeCodeLoader: Unable to load native-


hadoop library for your platform... using builtin-java classes where
applicable
mkdir: `/user': File exists

18e. Copying a file into the HDFS directory

hduser@nspublin:/usr/local/hadoop/sbin$ hadoop fs -put /home/nsp/file.txt /user/input

16/06/08 13:14:19 WARN util.NativeCodeLoader: Unable to load native-


hadoop library for your platform... using builtin-java classes where
applicable

hduser@nspublin:/usr/local/hadoop/sbin$ hadoop fs -ls /user/input

16/06/08 13:14:54 WARN util.NativeCodeLoader: Unable to load native-


hadoop library for your platform... using builtin-java classes where
applicable
Found 1 items
-rw-r--r-- 1 hduser supergroup 780 2016-06-08 13:14
/user/input/file.txt
18f. Displaying the content of a file in the HDFS directory

hduser@nspublin:/usr/local/hadoop/sbin$ hadoop fs -cat /user/input/file.txt
16/06/08 13:15:12 WARN util.NativeCodeLoader: Unable to load
native-hadoop library for your platform... using builtin-java classes
where applicable
Alzheimer's virtual reality app simulates dementia
2 June 2016 Last updated at 19:13 BST
A virtual reality app has been launched to provide a sense of what it
is like to live with different forms of dementia.

2.7 Mount the one-node Hadoop cluster using FUSE.


HDFS does not natively behave like a local filesystem; however, one can leverage
FUSE to write a userland application that exposes HDFS through a traditional
filesystem interface. fuse-dfs is one such FUSE-based application, which allows you
to mount HDFS as if it were a traditional Linux filesystem. If you would like to mount
HDFS on Linux, install fuse-dfs along with FUSE as follows (commands for two
Cloudera package generations are shown below):

For CDH3 (older releases):

wget http://archive.cloudera.com/one-click-install/maverick/cdh3-repository_1.0_all.deb
sudo dpkg -i cdh3-repository_1.0_all.deb
sudo apt-get update
sudo apt-get install hadoop-0.20-fuse

For CDH5 (newer releases):

sudo wget 'https://archive.cloudera.com/cdh5/debian/wheezy/amd64/cdh/cloudera.list' -O /etc/apt/sources.list.d/cloudera.list
sudo apt-get update
sudo apt-get install hadoop-hdfs-fuse

Once fuse-dfs is installed, go ahead and mount HDFS using FUSE as follows.
sudo hadoop-fuse-dfs dfs://<name_node_hostname>:<namenode_port>
<mount_point>
Once HDFS has been mounted at <mount_point>, you can use most of the
traditional filesystem operations (e.g., cp, rm, cat, mv, mkdir, rmdir, more, scp).
However, operations that require random writes (such as rsync) and permission-related
operations such as chmod and chown are not supported on FUSE-mounted HDFS.
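Putting the mount step and a few supported operations together, a minimal sketch assuming the NameNode from this guide runs on localhost and using a hypothetical mount point /mnt/hdfs:

# create the mount point and mount HDFS through FUSE
sudo mkdir -p /mnt/hdfs
sudo hadoop-fuse-dfs dfs://localhost:<namenode_port> /mnt/hdfs
# ordinary filesystem tools now work against the mounted tree
ls /mnt/hdfs/user/input
cat /mnt/hdfs/user/input/file.txt
cp /mnt/hdfs/user/input/file.txt /tmp/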

2.8 Write a program that uses the Hadoop APIs to interact with HDFS – to display
the contents of a file stored in HDFS.
/home/hduser/HadoopFScat.java:
import java.io.InputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HadoopFScat {

  public static void main(String[] args) throws Exception {
    // HDFS path of the file to display, passed as the first command-line argument
    String uri = args[0];
    Configuration conf = new Configuration();
    // obtain a handle to the filesystem named in the URI
    FileSystem fileSystem = FileSystem.get(URI.create(uri), conf);
    InputStream inputStream = null;
    try {
      // open the file and stream its contents to standard output
      inputStream = fileSystem.open(new Path(uri));
      IOUtils.copyBytes(inputStream, System.out, 4096, false);
    } finally {
      IOUtils.closeStream(inputStream);
    }
  }
}
Download the jar file:
Download hadoop-core-1.2.1.jar, which is used to compile the programs below. Visit
the following link
http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-core/1.2.1 to
download the jar. Let us assume the jar is downloaded to /home/hduser/.
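Alternatively, instead of compiling against the standalone hadoop-core jar, the programs below can be compiled against the classpath of the installed Hadoop itself; a hedged sketch using the ‘hadoop classpath’ command available in Hadoop 2.x:

# print the classpath of the local Hadoop installation and hand it to javac
javac -classpath "$(hadoop classpath)" -d /home/hduser/fscat /home/hduser/HadoopFScat.java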

Creating a directory to collect class files:


hduser@nspublin:/usr/local/hadoop/sbin$ mkdir /home/hduser/fscat

Compiling the Java file HadoopFScat.java:


hduser@nspublin:/usr/local/hadoop/sbin$ sudo
/usr/lib/jvm/java-8-oracle/bin/javac -classpath /home/hduser/hadoop-core-
1.2.1.jar -d /home/hduser/fscat /home/hduser/HadoopFScat.java

(Or, if OpenJDK is installed instead of the Oracle JDK:)

sudo /usr/lib/jvm/java-8-openjdk-amd64/bin/javac -classpath /home/hduser/hadoop-core-1.2.1.jar -d /home/hduser/fscat /home/hduser/HadoopFScat.java

hduser@nspublin:/usr/local/hadoop/sbin$ ls /home/hduser/fscat
HadoopFScat.class

Creating jar file for HadoopFScat.java:


hduser@nspublin:/usr/local/hadoop/sbin$ jar -cvf /home/hduser/fscat.jar -C
/home/hduser/fscat/ .
added manifest
adding: HadoopFScat.class(in = 1224) (out= 667)(deflated 45%)

Executing jar file for HadoopFScat.java:


hduser@nspublin:/usr/local/hadoop/sbin$ hadoop jar /home/hduser/fscat.jar
HadoopFScat /user/input/file.txt
16/06/08 15:29:03 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
Alzheimer's virtual reality app simulates dementia
2 June 2016 Last updated at 19:13 BST
A virtual reality app has been launched to provide a sense of what it is like to live
with different forms of dementia.
A Walk Through Dementia was created by the charity Alzheimer's Research UK.
It has been welcomed by other experts in the field.
We will increasingly be asked for help by people with dementia, and having had
some insight into what may be happening for them will improve how we can help,
said Tula Brannelly from the University of Southampton.
A woman living with the condition and her husband told the Today programme
why they supported the Android app's creation.
Visitors to St Pancras International station in London can try out the app until
1700 on Saturday 4 June.

2.9 Write a wordcount program to demonstrate the use of Map and Reduce
tasks
/home/hduser/WordCount.java:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
       extends Reducer<Text,IntWritable,Text,IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

hduser@nspublin:/usr/local/hadoop/sbin$ mkdir /home/hduser/wc

hduser@nspublin:/usr/local/hadoop/sbin$ sudo
/usr/lib/jvm/java-8-oracle/bin/javac -classpath /home/hduser/hadoop-core-
1.2.1.jar -d /home/hduser/wc /home/hduser/WordCount.java

(Or, if OpenJDK is installed instead of the Oracle JDK:)

sudo /usr/lib/jvm/java-8-openjdk-amd64/bin/javac -classpath /home/hduser/hadoop-core-1.2.1.jar -d /home/hduser/wc /home/hduser/WordCount.java

hduser@nspublin:/usr/local/hadoop/sbin$ jar -cvf /home/hduser/wc.jar -C /home/hduser/wc/ .

added manifest
adding: WordCount$IntSumReducer.class(in = 1739) (out= 739)(deflated 57%)
adding: WordCount$TokenizerMapper.class(in = 1736) (out= 753)(deflated 56%)
adding: WordCount.class(in = 1491) (out= 814)(deflated 45%)

hduser@nspublin:/usr/local/hadoop/sbin$ hadoop fs -put /home/nsp/file.txt /user/input
hadoop fs -put /home/hduser/samplemapin.txt /user/input

16/06/08 14:26:10 WARN util.NativeCodeLoader: Unable to load native-hadoop


library for your platform... using builtin-java classes where applicable

hduser@nspublin:/usr/local/hadoop/sbin$ hadoop fs -cat /user/input/file.txt

16/06/08 14:26:22 WARN util.NativeCodeLoader: Unable to load native-hadoop


library for your platform... using builtin-java classes where applicable
Alzheimer's virtual reality app simulates dementia
2 June 2016 Last updated at 19:13 BST
A virtual reality app has been launched to provide a sense of what it is like to live
with different forms of dementia.
A Walk Through Dementia was created by the charity Alzheimer's Research UK.
It has been welcomed by other experts in the field.
We will increasingly be asked for help by people with dementia, and having had
some insight into what may be happening for them will improve how we can help,
said Tula Brannelly from the University of Southampton.
A woman living with the condition and her husband told the Today programme
why they supported the Android app's creation.
Visitors to St Pancras International station in London can try out the app until
1700 on Saturday 4 June.

hduser@nspublin:/usr/local/hadoop/sbin$ hadoop fs -ls /user

16/06/08 14:26:36 WARN util.NativeCodeLoader: Unable to load native-hadoop


library for your platform... using builtin-java classes where applicable
Found 1 items
drwxr-xr-x - hduser supergroup 0 2016-06-08 14:26 /user/input

hduser@nspublin:/usr/local/hadoop/sbin$ hadoop jar /home/hduser/wc.jar


WordCount /user/input /user/output

16/06/08 14:26:52 WARN util.NativeCodeLoader: Unable to load native-hadoop


library for your platform... using builtin-java classes where applicable
16/06/08 14:26:53 INFO Configuration.deprecation: session.id is deprecated.
Instead, use dfs.metrics.session-id
16/06/08 14:26:53 INFO jvm.JvmMetrics: Initializing JVM Metrics with
processName=JobTracker, sessionId=
16/06/08 14:26:53 WARN mapreduce.JobSubmitter: Hadoop command-line
option parsing not performed. Implement the Tool interface and execute your
application with ToolRunner to remedy this.
16/06/08 14:26:53 INFO input.FileInputFormat: Total input paths to process : 1
16/06/08 14:26:53 INFO mapreduce.JobSubmitter: number of splits:1
16/06/08 14:26:54 INFO mapreduce.JobSubmitter: Submitting tokens for job:
job_local1838907415_0001
16/06/08 14:26:54 INFO mapreduce.Job: The url to track the job:
http://localhost:8080/
16/06/08 14:26:54 INFO mapreduce.Job: Running job:
job_local1838907415_0001
16/06/08 14:26:54 INFO mapred.LocalJobRunner: OutputCommitter set in config
null
16/06/08 14:26:54 INFO mapred.LocalJobRunner: OutputCommitter is
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
16/06/08 14:26:54 INFO mapred.LocalJobRunner: Waiting for map tasks
16/06/08 14:26:54 INFO mapred.LocalJobRunner: Starting task:
attempt_local1838907415_0001_m_000000_0
16/06/08 14:26:54 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
16/06/08 14:26:54 INFO mapred.MapTask: Processing split:
hdfs://localhost:54310/user/input/file.txt:0+780
16/06/08 14:26:55 INFO mapreduce.Job: Job job_local1838907415_0001 running
in uber mode : false
16/06/08 14:26:55 INFO mapred.MapTask: (EQUATOR) 0 kvi
26214396(104857584)
16/06/08 14:26:55 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
16/06/08 14:26:55 INFO mapred.MapTask: soft limit at 83886080
16/06/08 14:26:55 INFO mapreduce.Job: map 0% reduce 0%
16/06/08 14:26:55 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
16/06/08 14:26:55 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
16/06/08 14:26:55 INFO mapred.MapTask: Map output collector class =
org.apache.hadoop.mapred.MapTask$MapOutputBuffer
16/06/08 14:26:56 INFO mapred.LocalJobRunner:
16/06/08 14:26:56 INFO mapred.MapTask: Starting flush of map output
16/06/08 14:26:56 INFO mapred.MapTask: Spilling map output
16/06/08 14:26:56 INFO mapred.MapTask: bufstart = 0; bufend = 1319; bufvoid =
104857600
16/06/08 14:26:56 INFO mapred.MapTask: kvstart = 26214396(104857584);
kvend = 26213860(104855440); length = 537/6553600
16/06/08 14:26:56 INFO mapred.MapTask: Finished spill 0
16/06/08 14:26:56 INFO mapred.Task:
Task:attempt_local1838907415_0001_m_000000_0 is done. And is in the process
of committing
16/06/08 14:26:56 INFO mapred.LocalJobRunner: map
16/06/08 14:26:56 INFO mapred.Task: Task
'attempt_local1838907415_0001_m_000000_0' done.
16/06/08 14:26:56 INFO mapred.LocalJobRunner: Finishing task:
attempt_local1838907415_0001_m_000000_0
16/06/08 14:26:56 INFO mapred.LocalJobRunner: map task executor complete.
16/06/08 14:26:56 INFO mapred.LocalJobRunner: Waiting for reduce tasks
16/06/08 14:26:56 INFO mapred.LocalJobRunner: Starting task:
attempt_local1838907415_0001_r_000000_0
16/06/08 14:26:56 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
16/06/08 14:26:56 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin:
org.apache.hadoop.mapreduce.task.reduce.Shuffle@1175c7e
16/06/08 14:26:56 INFO reduce.MergeManagerImpl: MergerManager:
memoryLimit=334154944, maxSingleShuffleLimit=83538736,
mergeThreshold=220542272, ioSortFactor=10,
memToMemMergeOutputsThreshold=10
16/06/08 14:26:56 INFO reduce.EventFetcher:
attempt_local1838907415_0001_r_000000_0 Thread started: EventFetcher for
fetching Map Completion Events
16/06/08 14:26:56 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle
output of map attempt_local1838907415_0001_m_000000_0 decomp: 1282 len:
1286 to MEMORY
16/06/08 14:26:56 INFO reduce.InMemoryMapOutput: Read 1282 bytes from
map-output for attempt_local1838907415_0001_m_000000_0
16/06/08 14:26:56 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-
output of size: 1282, inMemoryMapOutputs.size() -> 1, commitMemory -> 0,
usedMemory ->1282
16/06/08 14:26:56 INFO reduce.EventFetcher: EventFetcher is interrupted..
Returning
16/06/08 14:26:56 INFO mapred.LocalJobRunner: 1 / 1 copied.
16/06/08 14:26:56 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-
memory map-outputs and 0 on-disk map-outputs
16/06/08 14:26:56 INFO mapred.Merger: Merging 1 sorted segments
16/06/08 14:26:56 INFO mapred.Merger: Down to the last merge-pass, with 1
segments left of total size: 1275 bytes
16/06/08 14:26:56 INFO reduce.MergeManagerImpl: Merged 1 segments, 1282
bytes to disk to satisfy reduce memory limit
16/06/08 14:26:56 INFO reduce.MergeManagerImpl: Merging 1 files, 1286 bytes
from disk
16/06/08 14:26:56 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes
from memory into reduce
16/06/08 14:26:56 INFO mapred.Merger: Merging 1 sorted segments
16/06/08 14:26:56 INFO mapred.Merger: Down to the last merge-pass, with 1
segments left of total size: 1275 bytes
16/06/08 14:26:56 INFO mapred.LocalJobRunner: 1 / 1 copied.
16/06/08 14:26:56 INFO Configuration.deprecation: mapred.skip.on is
deprecated. Instead, use mapreduce.job.skiprecords
16/06/08 14:26:57 INFO mapred.Task:
Task:attempt_local1838907415_0001_r_000000_0 is done. And is in the process
of committing
16/06/08 14:26:57 INFO mapred.LocalJobRunner: 1 / 1 copied.
16/06/08 14:26:57 INFO mapred.Task: Task
attempt_local1838907415_0001_r_000000_0 is allowed to commit now
16/06/08 14:26:57 INFO output.FileOutputCommitter: Saved output of task
'attempt_local1838907415_0001_r_000000_0' to
hdfs://localhost:54310/user/output/_temporary/0/task_local1838907415_0001_
r_000000
16/06/08 14:26:57 INFO mapred.LocalJobRunner: reduce > reduce
16/06/08 14:26:57 INFO mapred.Task: Task
'attempt_local1838907415_0001_r_000000_0' done.
16/06/08 14:26:57 INFO mapred.LocalJobRunner: Finishing task:
attempt_local1838907415_0001_r_000000_0
16/06/08 14:26:57 INFO mapred.LocalJobRunner: reduce task executor complete.
16/06/08 14:26:57 INFO mapreduce.Job: map 100% reduce 100%
16/06/08 14:26:57 INFO mapreduce.Job: Job job_local1838907415_0001
completed successfully
16/06/08 14:26:57 INFO mapreduce.Job: Counters: 38
File System Counters
FILE: Number of bytes read=9074
FILE: Number of bytes written=511304
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1560
HDFS: Number of bytes written=860
HDFS: Number of read operations=15
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Map-Reduce Framework
Map input records=9
Map output records=135
Map output bytes=1319
Map output materialized bytes=1286
Input split bytes=107
Combine input records=135
Combine output records=105
Reduce input groups=105
Reduce shuffle bytes=1286
Reduce input records=105
Reduce output records=105
Spilled Records=210
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=0
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=398458880
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=780
File Output Format Counters
Bytes Written=860

hduser@nspublin:/usr/local/hadoop/sbin$ hadoop fs -ls /user/output

16/06/08 14:27:19 WARN util.NativeCodeLoader: Unable to load native-hadoop


library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r-- 1 hduser supergroup 0 2016-06-08 14:26 /user/output/_SUCCESS
-rw-r--r-- 1 hduser supergroup 860 2016-06-08 14:26 /user/output/part-r-
00000

hduser@nspublin:/usr/local/hadoop/sbin$ hadoop fs -cat /user/output/*

16/06/08 14:27:35 WARN util.NativeCodeLoader: Unable to load native-hadoop


library for your platform... using builtin-java classes where applicable
1700 1
19:13 1
2 1
2016 1
4 1
A 3
Alzheimer's 2
Android 1
BST 1
Brannelly 1
Dementia 1
International 1
It 1
June 1
June. 1
Last 1
London 1
Pancras 1
Research 1
Saturday 1
Southampton. 1
St 1
Through 1
Today 1
Tula 1
UK. 1
University 1
Visitors 1
Walk 1
We 1
a 1
and 2
app 3
app's 1
asked 1
at 1
be 2
been 2
by 3
can 2
charity 1
condition 1
created 1
creation. 1
dementia 1
dementia, 1
dementia. 1
different 1
experts 1
field. 1
for 2
forms 1
from 1
had 1
happening 1
has 2
having 1
help 1
help, 1
her 1
how 1
husband 1
improve 1
in 2
increasingly 1
insight 1
into 1
is 1
it 1
launched 1
like 1
live 1
living 1
may 1
of 3
on 1
other 1
out 1
people 1
programme 1
provide 1
reality 2
said 1
sense 1
simulates 1
some 1
station 1
supported 1
the 7
them 1
they 1
to 3
told 1
try 1
until 1
updated 1
virtual 2
was 1
we 1
welcomed 1
what 2
why 1
will 2
with 3
woman 1
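To keep a local copy of the result, the reducer output can be copied out of HDFS; a small sketch using standard HDFS shell commands (the local file name here is arbitrary):

# copy the output file from HDFS to the local filesystem
hadoop fs -get /user/output/part-r-00000 /home/hduser/wordcount-result.txt
# the output directory must be removed (or a new one chosen) before the job is re-run
hadoop fs -rm -r /user/output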
