Wednesday, October 16, 2013

Steps to install Hadoop 2.2.0 Stable release (Single Node Cluster)

How to install the Hadoop 2.2.0 release (YARN) as a single-node cluster.

1. Prerequisites:

  1. Java 6 or later
  2. A dedicated Unix user (hadoop) for Hadoop
  3. SSH configured for passwordless login to localhost (see the sketch below)
  4. Hadoop 2.x tarball (hadoop-2.2.0.tar.gz)

Lazy? Check this: install-hadoop
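
If passwordless SSH to localhost is not yet configured for the hadoop user, a minimal sketch (assuming OpenSSH):

$ ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys
$ ssh localhost
# The last command should log you in without a password prompt.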

2. Installation

$ tar -xvzf hadoop-2.2.0.tar.gz
$ mkdir -p /home/hadoop/yarn
$ mv hadoop-2.2.0 /home/hadoop/yarn/hadoop-2.2.0
$ cd /home/hadoop/yarn
$ sudo chown -R hadoop:hadoop hadoop-2.2.0
$ sudo chmod -R 755 hadoop-2.2.0
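
Hadoop's scripts also need JAVA_HOME. If it is not already exported system-wide, set it in etc/hadoop/hadoop-env.sh; the JDK path below is only an example, so point it at your own install:

$ vi hadoop-2.2.0/etc/hadoop/hadoop-env.sh
# Example path; adjust to wherever your JDK lives.
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64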


3. Set Up Environment Variables in .bashrc (the later steps assume these are set)

# Setup for Hadoop 2.2.0.

export HADOOP_HOME=$HOME/yarn/hadoop-2.2.0
export HADOOP_MAPRED_HOME=$HOME/yarn/hadoop-2.2.0
export HADOOP_COMMON_HOME=$HOME/yarn/hadoop-2.2.0
export HADOOP_HDFS_HOME=$HOME/yarn/hadoop-2.2.0
export YARN_HOME=$HOME/yarn/hadoop-2.2.0
export HADOOP_CONF_DIR=$HOME/yarn/hadoop-2.2.0/etc/hadoop

After adding these lines at the bottom of the .bashrc file, reload it:
$ source .bashrc
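
To confirm the variables took effect:

$ echo $HADOOP_HOME
$ $HADOOP_HOME/bin/hadoop version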

4. Create Hadoop Data Directories

# Two directories, one for the NameNode and one for the DataNode.

$ mkdir -p $HOME/yarn/yarn_data/hdfs/namenode
$ mkdir -p $HOME/yarn/yarn_data/hdfs/datanode

5. Configuration

# Base directory.

$ cd $YARN_HOME
$ vi etc/hadoop/yarn-site.xml

Add the following inside the <configuration> tag. (Since 2.2.0 the shuffle aux-service name uses underscores, and the .class property key must embed that same name.)
# etc/hadoop/yarn-site.xml .

<property>
   <name>yarn.nodemanager.aux-services</name>
   <value>mapreduce_shuffle</value>
</property>
<property>
   <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
   <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
$ vi etc/hadoop/core-site.xml

Add the following inside the <configuration> tag (fs.defaultFS replaces fs.default.name, which is deprecated in 2.x):
# etc/hadoop/core-site.xml .

<property>
   <name>fs.defaultFS</name>
   <value>hdfs://localhost:9000</value>
</property>
$ vi etc/hadoop/hdfs-site.xml

Add the following inside the <configuration> tag (the two paths must match the directories created in step 4):
# etc/hadoop/hdfs-site.xml .

<property>
   <name>dfs.replication</name>
   <value>1</value>
</property>
<property>
   <name>dfs.namenode.name.dir</name>
   <value>file:/home/hadoop/yarn/yarn_data/hdfs/namenode</value>
</property>
<property>
   <name>dfs.datanode.data.dir</name>
   <value>file:/home/hadoop/yarn/yarn_data/hdfs/datanode</value>
</property>

$ vi etc/hadoop/mapred-site.xml

If this file does not exist, create it (or copy the bundled template; see below) and paste in the following:
# etc/hadoop/mapred-site.xml .

<?xml version="1.0"?>
<configuration>
   <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
   </property>
</configuration>
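
Alternatively, the 2.2.0 binary tarball ships a template next to the other config files (assuming the stock layout), so you can copy it instead of creating the file by hand:

$ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml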

6. Format the NameNode (One-Time Process)

# Command for formatting the NameNode. ("bin/hadoop namenode -format" also works but is deprecated in 2.x.)

$ bin/hdfs namenode -format

7. Starting the HDFS and MapReduce Processes

# HDFS (NameNode & DataNode).

$ sbin/hadoop-daemon.sh start namenode
$ sbin/hadoop-daemon.sh start datanode

# MapReduce (ResourceManager, NodeManager & JobHistoryServer).

$ sbin/yarn-daemon.sh start resourcemanager
$ sbin/yarn-daemon.sh start nodemanager
$ sbin/mr-jobhistory-daemon.sh start historyserver
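
With the stock 2.2.0 port configuration, each daemon also exposes a web UI:

# NameNode: http://localhost:50070
# ResourceManager: http://localhost:8088
# JobHistoryServer: http://localhost:19888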

8. Verifying the Installation

$ jps
# Console Output.

22844 Jps
28711 DataNode
29281 JobHistoryServer
28887 ResourceManager
29022 NodeManager
28180 NameNode
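
If one of these daemons is missing from the jps output, check its log under $HADOOP_HOME/logs (file names follow hadoop-<user>-<daemon>-<host>.log, or yarn-... for the YARN daemons), e.g.:

$ tail -n 50 $HADOOP_HOME/logs/*namenode*.log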

9. Running the Word Count Example Program

$ mkdir input
$ cat > input/file
This is word count example
using hadoop 2.2.0
(Press Ctrl+D to end the input.)

Add the input directory to HDFS:
$ bin/hdfs dfs -copyFromLocal input /input
Note: "bin/hadoop hdfs ..." is not a valid invocation; use the hdfs script with dfs as above, or "bin/hadoop fs -copyFromLocal input /input".

Run the wordcount example jar provided in HADOOP_HOME:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /input /output

Check the output in the web interface at http://localhost:50070 by browsing the HDFS directory for the /output folder.
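
To read the result from the command line instead (the reducer's output file is normally named part-r-00000):

$ bin/hdfs dfs -cat /output/part-r-00000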


10. Stopping the Daemons

# Commands to stop all the daemons.

$ sbin/hadoop-daemon.sh stop namenode
$ sbin/hadoop-daemon.sh stop datanode
$ sbin/yarn-daemon.sh stop resourcemanager
$ sbin/yarn-daemon.sh stop nodemanager
$ sbin/mr-jobhistory-daemon.sh stop historyserver
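
Run jps again to confirm that everything has stopped; only Jps itself should remain:

$ jps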

8 comments:

  1. Hi Naveen,

    Thanks for the installation details. I followed the steps, but when I invoked the command
    bin/hadoop hdfs -copyFromLocal input /input I obtained the following error:
    "Error: Could not find or load main class hdfs"
    In a subsequent step I also appended all the jar directories to the HADOOP_CLASSPATH variable, but that didn't help.
    Any idea what causes this?
    However, when I used the deprecated command
    "hadoop dfs -copyFromLocal input /input" the input directory was copied over to HDFS.
    Thanks,
    Wim

    Replies
    1. Try the put command ("hadoop hdfs -put input /input").
       From which directory were you executing the first command?

    2. Hi Naveen,

      Thanks for your reply.
      I executed the command from my $HOME directory because I have appended the bin directory where hadoop resides to my PATH environment variable. As a test, I removed the bin directory from my $PATH variable, but that didn't help. Thanks,
      Wim

  2. @W
    I had the same error as you, and think this might be a version/deprecation issue: "hadoop hdfs" is not the way to set up/load the HDFS store now. You use the hdfs script instead:

    hdfs dfs -copyFromLocal input /input

    This got things going for me, and the example worked too.

  3. By the way, thanks Naveen for this guide. I'd been struggling getting hadoop running after successfully compiling from the source. There appears to be no published steps on what to do to finish the install, and it's complicated (/etc/conf files everywhere, lots of hadoop exes all over the place, no libexec directories specified etc. etc.). I downloaded the compiled hadoop-2.2.0 with its /bin, /sbin etc. all set up, and was able to get things going with your instructions. Thanks again.

  4. Hi Naveen,

    Thanks for the post. With the help of your post I'm able to configure a single-node cluster successfully. It works fine.
    Ranjith

  5. Everything is going right for me, but I am getting a problem after submitting the job: at http://localhost:50070 I am not getting my output. Also, the link provided during job submission, saying "you can see job progress here: //some link//", shows progress that neither starts nor stops. What is this? Can you help me?

    Replies
    1. Are you getting the error in the word count MR code or in your own MR class? And can you explain the steps to reproduce it?
