Hadoop Installation on Ubuntu


This page describes the steps required to set up a single-node Hadoop cluster, backed by the Hadoop Distributed File System (HDFS), running on Ubuntu Linux.

You need to download Ubuntu and VMware Workstation before proceeding further.

1) Download the Ubuntu ISO from the Ubuntu website.
2) Download VMware Workstation (required for running the Ubuntu operating system on your laptop).

Ubuntu Installation using VMware:

1) Install VMware Workstation.
2) Open VMware, click "Create a New Virtual Machine" with typical settings, and select the Ubuntu ISO file when prompted.
3) Allocate 20 GB of hard disk and 1 or 2 GB of RAM.
4) Make sure you are connected to the internet at the time of installation.
5) Once the setup completes, log on to Ubuntu with the credentials created during setup.
6) Open a terminal (the shortcut key is Ctrl+Alt+T).
7) Install the OpenSSH server:
   sudo apt-get install openssh-server
8) Install Java:
   sudo apt-get install openjdk-6-jdk    ( for Hadoop 2.0, install Java 7: sudo apt-get install openjdk-7-jdk )
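
If you want a quick sanity check that both packages installed correctly (the exact version strings will vary on your machine):

   ssh -V           # prints the installed OpenSSH version
   java -version    # should report OpenJDK 1.6 (or 1.7 for Hadoop 2.0)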

Hadoop Installation:

1) Download the latest Hadoop 1.x or 2.x binary, based on your requirement, from the official Apache Hadoop website.
2) Copy the Hadoop binary tar into Ubuntu under some folder.
3) Untar Hadoop:
 sudo tar xzf <path to the Hadoop tar file here>
4) Edit the .bashrc file and add the environment variables below:
export HADOOP_HOME=<path to the Hadoop directory here>
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64   ( adjust if your OpenJDK is installed elsewhere; if you installed Java 7, use the Java 7 path )
export PATH=$PATH:$HADOOP_HOME/bin
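
After saving .bashrc, reload it and confirm Hadoop is on the PATH (a quick sanity check, assuming the paths above are correct for your machine):

source ~/.bashrc
echo $HADOOP_HOME     # should print your Hadoop directory
hadoop version        # should print the installed Hadoop version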

SSH configuration:

Hadoop requires SSH to manage its nodes. From the terminal, execute:
ssh-keygen -t rsa -P ""
The key's randomart image is displayed on the screen after the command completes.
Now append id_rsa.pub to authorized_keys (this enables passwordless SSH access to your local machine):
cd ~/.ssh
cat id_rsa.pub >> authorized_keys
Check whether "ssh localhost" is working.
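
The first connection prompts you to confirm the host key; after that, it should log you in without asking for a password:

ssh localhost    # type 'yes' at the host key prompt on the first attempt
exit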

Hadoop configuration for 1.x:

1) Open the Hadoop installation directory in the file browser.
2) Go to the conf directory.
3) Edit the hadoop-env.sh file and set JAVA_HOME to the same path configured in the .bashrc file.
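
For example, using the OpenJDK 6 path from earlier (adjust for your Java version and machine):

export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64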
4) Edit core-site.xml and add the lines below:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>

This configures the default file system URI and port.
5) Edit mapred-site.xml and add the lines below:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
</configuration>

This configuration sets the JobTracker host and port.

6) Edit hdfs-site.xml and set the replication factor to 1, as this is a single-node configuration:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>

Note: if the <configuration></configuration> tags already exist in a file, do not add them again, as duplicate tags can cause issues with Hadoop start-up.

Formatting file system and starting Hadoop daemons:

1) Open a fresh terminal and execute the command below (this formats the file system):
hadoop namenode -format
2) Now execute 'start-all.sh' to start all the daemons.
3) Type 'jps' and press Enter to check which daemons are running.
4) 'stop-all.sh' will stop all the daemons.
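
On a healthy single-node Hadoop 1.x setup, jps should list something like the following (the process IDs will differ on your machine):

4851 NameNode
5021 DataNode
5237 SecondaryNameNode
5322 JobTracker
5497 TaskTracker
5601 Jps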
Note: To avoid having to format the Hadoop filesystem after every Ubuntu restart, add the lines below to core-site.xml (by default, HDFS files are stored under /tmp, which is cleared on reboot):

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>enter a persistent directory path here</value>
  </property>
</configuration>
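
The directory must exist and be writable by your user before you start Hadoop; for example (using the same placeholder path as the Hadoop 2.0 section below):

mkdir -p /home/xxxx/hdfs-tmp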

For Hadoop 2.0, make the changes below in the configuration files (which live under etc/hadoop rather than conf in the 2.x layout):

1) Edit mapred-site.xml and add the property below:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

2) Edit core-site.xml and add the properties below:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/xxxx/hdfs-tmp</value>
  </property>
</configuration>

3) Edit hdfs-site.xml and add the property below:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

4) Edit yarn-site.xml and add the property below:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
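
Note that the start-up commands also differ slightly in Hadoop 2.0; a minimal sketch using the scripts shipped with the 2.x distribution (you may need to run them from Hadoop's sbin directory):

hdfs namenode -format    # 2.x syntax for formatting the file system
start-dfs.sh             # starts NameNode, DataNode and SecondaryNameNode
start-yarn.sh            # starts ResourceManager and NodeManager
jps                      # verify the daemons are up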

You are done. Enjoy Hadooping 🙂
