Wednesday, October 16, 2013

Steps to install Hadoop 2.2.0 Stable release (Single Node Cluster)

Steps to install the Hadoop 2.2.0 release (YARN) as a single-node cluster

1. Prerequisites:

  1. Java 6 or later
  2. A dedicated unix user (hadoop) for Hadoop
  3. SSH configured for passwordless login to localhost
  4. Hadoop 2.x tarball ( hadoop-2.2.0.tar.gz )

Lazy? Check this install-hadoop script instead.
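Prerequisite 3 usually means passwordless SSH to localhost for the hadoop user, since the start/stop scripts log in over SSH. A minimal sketch, assuming the OpenSSH default key paths:

```shell
# Generate an RSA key pair without a passphrase (only if one does not exist yet),
# then authorize it for logins to this machine.
KEY="$HOME/.ssh/id_rsa"
mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
[ -f "$KEY" ] || ssh-keygen -t rsa -N "" -f "$KEY" -q
cat "$KEY.pub" >> "$HOME/.ssh/authorized_keys"
chmod 600 "$HOME/.ssh/authorized_keys"
```

After this, `ssh localhost` should log in without prompting for a password.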

2. Installation

$ tar -xvzf hadoop-2.2.0.tar.gz
$ mkdir -p /home/hadoop/yarn
$ mv hadoop-2.2.0 /home/hadoop/yarn/hadoop-2.2.0
$ cd /home/hadoop/yarn
$ sudo chown -R hadoop:hadoop hadoop-2.2.0
$ sudo chmod -R 755 hadoop-2.2.0
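A quick sanity check after extraction can save debugging later. This is a sketch: `check_layout` is a hypothetical helper, and the path comes from the steps above.

```shell
# Print OK if the directory looks like an unpacked Hadoop 2.x distribution.
check_layout() {
  [ -e "$1/bin/hadoop" ] && [ -d "$1/etc/hadoop" ] && echo "OK: $1"
}
check_layout /home/hadoop/yarn/hadoop-2.2.0
```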


3. Set Up Environment Variables in .bashrc (later steps use $YARN_HOME)

# Environment setup for Hadoop 2.2.0.

export HADOOP_HOME=$HOME/yarn/hadoop-2.2.0
export HADOOP_MAPRED_HOME=$HOME/yarn/hadoop-2.2.0
export HADOOP_COMMON_HOME=$HOME/yarn/hadoop-2.2.0
export HADOOP_HDFS_HOME=$HOME/yarn/hadoop-2.2.0
export YARN_HOME=$HOME/yarn/hadoop-2.2.0
export HADOOP_CONF_DIR=$HOME/yarn/hadoop-2.2.0/etc/hadoop

After adding these lines at the bottom of the .bashrc file, reload it:
$ source .bashrc
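To confirm the variables took effect, a loop like this should print each variable with the install directory as its value (variable names taken from the export lines above):

```shell
# Each line printed should end in hadoop-2.2.0, except HADOOP_CONF_DIR,
# which points at the etc/hadoop subdirectory.
for v in HADOOP_HOME HADOOP_MAPRED_HOME HADOOP_COMMON_HOME \
         HADOOP_HDFS_HOME YARN_HOME HADOOP_CONF_DIR; do
  eval "echo $v=\$$v"
done
```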

4. Create Hadoop Data Directories

# Two directories, one for the NameNode and one for the DataNode.

$ mkdir -p $HOME/yarn/yarn_data/hdfs/namenode
$ mkdir -p $HOME/yarn/yarn_data/hdfs/datanode

5. Configuration

# Base Directory .

$ cd $YARN_HOME
$ vi etc/hadoop/yarn-site.xml
Add the following inside the <configuration> tag:
# etc/hadoop/yarn-site.xml .

<property>
   <name>yarn.nodemanager.aux-services</name>
   <value>mapreduce_shuffle</value>
</property>
<property>
   <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
   <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
$ vi etc/hadoop/core-site.xml
Add the following inside the <configuration> tag:
# etc/hadoop/core-site.xml .

<property>
   <name>fs.defaultFS</name>
   <value>hdfs://localhost:9000</value>
</property>
$ vi etc/hadoop/hdfs-site.xml
Add the following inside the <configuration> tag:
# etc/hadoop/hdfs-site.xml .

<property>
   <name>dfs.replication</name>
   <value>1</value>
</property>
<property>
   <name>dfs.namenode.name.dir</name>
   <value>file:/home/hadoop/yarn/yarn_data/hdfs/namenode</value>
</property>
<property>
   <name>dfs.datanode.data.dir</name>
   <value>file:/home/hadoop/yarn/yarn_data/hdfs/datanode</value>
</property>

$ vi etc/hadoop/mapred-site.xml
If this file does not exist, create it (it can be copied from etc/hadoop/mapred-site.xml.template) and paste the content provided below:
# etc/hadoop/mapred-site.xml .

<?xml version="1.0"?>
<configuration>
   <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
   </property>
</configuration>
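Since mapred-site.xml usually has to be created from scratch, it can also be written non-interactively with a here-document instead of vi. A sketch, assuming HADOOP_CONF_DIR is set as in step 3 (it falls back to a /tmp path purely for illustration):

```shell
# Write mapred-site.xml in one step.
CONF="${HADOOP_CONF_DIR:-/tmp/hadoop-conf}"
mkdir -p "$CONF"
cat > "$CONF/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
   <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
   </property>
</configuration>
EOF
```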

6. Format the NameNode (One-time Process)

# Command for formatting the NameNode.

$ bin/hdfs namenode -format

7. Starting the HDFS and MapReduce Processes

# HDFS (NameNode & DataNode).

$ sbin/hadoop-daemon.sh start namenode
$ sbin/hadoop-daemon.sh start datanode
# YARN & MR (ResourceManager, NodeManager & JobHistoryServer).

$ sbin/yarn-daemon.sh start resourcemanager
$ sbin/yarn-daemon.sh start nodemanager
$ sbin/mr-jobhistory-daemon.sh start historyserver

8. Verifying the Installation

$ jps
# Console Output.

22844 Jps
28711 DataNode
29281 JobHistoryServer
28887 ResourceManager
29022 NodeManager
28180 NameNode
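The check can be scripted. `check_daemons` below is a hypothetical helper that reports every expected daemon missing from a `jps` listing:

```shell
# Report every expected Hadoop daemon that is absent from the given jps output.
check_daemons() {
  for d in NameNode DataNode ResourceManager NodeManager JobHistoryServer; do
    echo "$1" | grep -qw "$d" || echo "missing: $d"
  done
}
# Usage on a live cluster: check_daemons "$(jps)"
```

No output means all five daemons are running.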

Running the Word Count Example Program

$ mkdir input
$ cat > input/file
This is word count example
using hadoop 2.2.0
Copy the input directory to HDFS:
$ bin/hdfs dfs -copyFromLocal input /input
Note: `bin/hadoop fs -copyFromLocal input /input` works as well.
Run the wordcount example jar provided in HADOOP_HOME:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /input /output
Check the output from the NameNode web interface:
http://localhost:50070
Browse the HDFS filesystem for the /output folder.
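For reference, what the wordcount job computes can be reproduced locally with standard shell tools. This is a rough equivalent on the sample input, not the exact MapReduce output format:

```shell
# Count word occurrences: split on spaces, sort, then count duplicates.
# Prints one "<count> <word>" pair per line.
printf 'This is word count example\nusing hadoop 2.2.0\n' \
  | tr -s ' ' '\n' | sort | uniq -c
```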


9. Stopping the Daemons

# Commands.

$ sbin/hadoop-daemon.sh stop namenode
$ sbin/hadoop-daemon.sh stop datanode
$ sbin/yarn-daemon.sh stop resourcemanager
$ sbin/yarn-daemon.sh stop nodemanager
$ sbin/mr-jobhistory-daemon.sh stop historyserver