Saturday, December 15, 2012

Installing Hadoop on Windows

Having worked on mobile technologies for quite some time, I have spent the past few days on Big Data and have been trying out quite a few related frameworks. It has been a while since I blogged, so I thought of writing this step-by-step tutorial on installing Hadoop on Windows.
If you are taking big data seriously, I definitely suggest setting up Hadoop on Linux rather than Windows, especially if you are looking for a production-like environment.

Apache Hadoop is an open-source framework for distributed processing: it performs computations on large data sets across a cluster by distributing the work to each node. The framework mainly consists of the Hadoop kernel, the ability to run distributed MapReduce jobs, and a filesystem, HDFS.
There are many tutorials that help you install Hadoop on Windows, but most of them have some issues. After referring to a few of them, I am writing this one to cover what the others miss.
Since Hadoop contains a lot of shell scripts that have to be executed, we need a *nix shell on Windows. Cygwin is one such shell, and the best one as well. Download it from here.
Run setup.exe as an administrator and, after selecting the download mirror, remember to select the ssh package for installation (see the image below):


Once done, open the Cygwin terminal as an administrator.

Step 1: Configure ssh using the command ssh-host-config
Here I have configured a username and password.


Once you have given the password, you will receive the confirmation shown in the image above.
Now that ssh is configured, you can test it using the command ssh localhost

Now generate a key so that ssh can authenticate you without asking for a password every time you invoke it:
 ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Once done, append the public key to the list of authorized keys using the command cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys as shown above.
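If you end up running the copy step more than once, the same key gets appended again each time. Here is a small sketch that avoids that. The function name authorize_key is made up for illustration, and it assumes the key pair already exists at the paths used in the commands above.

```shell
# authorize_key is a hypothetical helper, not part of OpenSSH or Cygwin.
# $1 = public key file, $2 = authorized_keys file.
authorize_key() {
  ak_pub="$1"
  ak_auth="$2"
  # Create the target directory and file if they are missing.
  mkdir -p "$(dirname "$ak_auth")"
  touch "$ak_auth"
  # sshd's StrictModes check can reject an authorized_keys file that
  # other users are able to write to, so keep it private.
  chmod 600 "$ak_auth"
  # Append the key only if the exact same line is not already present.
  if ! grep -qxFf "$ak_pub" "$ak_auth"; then
    cat "$ak_pub" >> "$ak_auth"
  fi
}

# Typical call, using the paths from the commands above:
#   authorize_key ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys
# Passwordless login can then be checked without any prompt:
#   ssh -o BatchMode=yes localhost true && echo "key login works"
```

One thing to keep in mind: recent OpenSSH releases no longer accept DSA keys by default, so on a current Cygwin you may need -t rsa instead of -t dsa.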

Now it is time to download the Hadoop framework. I used this mirror and selected the hadoop-1.1.1 release. From http://mirror.catn.com/pub/apache/hadoop/common/hadoop-1.1.1/ download hadoop-1.1.1-bin.tar.gz

(For a list of other mirrors, you can always check http://www.apache.org/dyn/closer.cgi/hadoop/common/ )

Now extract the hadoop-1.1.1-bin.tar.gz file to c:\cygwin\usr\local and rename the c:\cygwin\usr\local\hadoop-1.1.1 folder to c:\cygwin\usr\local\hadoop
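The same extract-and-rename step can also be done from the Cygwin terminal. This is just a sketch: extract_hadoop is a made-up helper name, and it assumes Cygwin lives in c:\cygwin, so that c:\cygwin\usr\local shows up as /usr/local inside the shell.

```shell
# extract_hadoop is a hypothetical helper name.
# $1 = path to the downloaded tarball, $2 = install prefix (e.g. /usr/local).
extract_hadoop() {
  eh_tarball="$1"
  eh_prefix="$2"
  # Refuse to overwrite an existing installation.
  if [ -e "$eh_prefix/hadoop" ]; then
    echo "already exists: $eh_prefix/hadoop" >&2
    return 1
  fi
  mkdir -p "$eh_prefix" || return 1
  # The archive unpacks into a versioned directory, hadoop-1.1.1.
  tar -xzf "$eh_tarball" -C "$eh_prefix" || return 1
  # Rename it so later paths do not depend on the version number.
  mv "$eh_prefix/hadoop-1.1.1" "$eh_prefix/hadoop"
}

# Typical call from the directory that holds the download:
#   extract_hadoop hadoop-1.1.1-bin.tar.gz /usr/local
```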
Now go to C:\cygwin\usr\local\hadoop\conf and open hadoop-env.sh
Go to line 9 and you will find an entry for export JAVA_HOME=something
Change it to
export JAVA_HOME=/cygdrive/c/Program\ Files/Java/jdk1.6.0_11 
Do not forget to uncomment the line (remove the # from the beginning of the line).
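If you would rather script this edit than do it by hand, something like the sketch below should work. The helper name set_java_home is made up, and it assumes the file contains a single, possibly commented-out, export JAVA_HOME= line. It writes the path in double quotes, which gives the shell the same value as the backslash-escaped form above.

```shell
# set_java_home is a hypothetical helper name.
# $1 = path to hadoop-env.sh
# $2 = JDK directory, written WITHOUT backslash escapes (spaces are fine).
set_java_home() {
  sj_file="$1"
  sj_jdk="$2"
  sj_tmp="$sj_file.tmp"
  # Replace the first (possibly commented-out) "export JAVA_HOME=" line
  # with an uncommented, quoted one; copy every other line unchanged.
  awk -v jdk="$sj_jdk" '
    !replaced && /^#?[ \t]*export JAVA_HOME=/ {
      print "export JAVA_HOME=\"" jdk "\""
      replaced = 1
      next
    }
    { print }
  ' "$sj_file" > "$sj_tmp" && mv "$sj_tmp" "$sj_file"
}

# Typical call:
#   set_java_home /usr/local/hadoop/conf/hadoop-env.sh \
#     "/cygdrive/c/Program Files/Java/jdk1.6.0_11"
```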

If you want to avoid the hassle of escaping the space in "Program Files", you can always install the JDK in a path without spaces, such as c:\java\jre. Otherwise use the escaped form /cygdrive/c/Program\ Files/Java/jdk1.6.0_11 as above. It worked for me!

Below are a few snapshots of the errors you might get if you don't configure your JAVA_HOME properly.



(JAVA_HOME errors)
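A quick way to catch a wrong JAVA_HOME before Hadoop complains about it is to check that the directory really contains a java executable. The sketch below is only an illustration, and check_java_home is a made-up name. On Cygwin the test for bin/java should also find bin/java.exe, since Cygwin resolves the .exe suffix for you.

```shell
# check_java_home is a hypothetical helper name.
# Succeeds only if the given directory contains an executable bin/java.
check_java_home() {
  cj_home="$1"
  if [ -z "$cj_home" ]; then
    echo "JAVA_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$cj_home/bin/java" ]; then
    echo "no executable found at $cj_home/bin/java" >&2
    return 1
  fi
  return 0
}

# Typical call after editing hadoop-env.sh:
#   . /usr/local/hadoop/conf/hadoop-env.sh
#   check_java_home "$JAVA_HOME" && "$JAVA_HOME/bin/java" -version
```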
Once your JAVA_HOME is configured, open C:\cygwin\usr\local\hadoop\conf\hdfs-site.xml to configure HDFS. Add the property tags between the configuration tags so that your file looks like the one below.
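For reference, here is a rough sketch of a minimal single-node hdfs-site.xml, written out from the shell. The exact properties I used are not reproduced in the text, so treat this as an assumption: it sets only dfs.replication to 1, which is the value Hadoop's own single-node setup guide uses. The helper name write_hdfs_site is made up.

```shell
# write_hdfs_site is a hypothetical helper name.
# $1 = Hadoop conf directory.
# dfs.replication=1 is an assumed value: with a single DataNode there is
# nowhere to store extra copies of a block.
write_hdfs_site() {
  cat > "$1/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
}

# Typical call (this overwrites the existing file):
#   write_hdfs_site /usr/local/hadoop/conf
```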


Then open C:\cygwin\usr\local\hadoop\conf\mapred-site.xml to configure the MapReduce service:
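Again as a sketch only, this is what a minimal mapred-site.xml for a single machine could look like. The property mapred.job.tracker tells MapReduce where the JobTracker runs. The address localhost:9001 is an assumption taken from Hadoop's single-node setup guide, not necessarily what I had in my file, and write_mapred_site is a made-up helper name.

```shell
# write_mapred_site is a hypothetical helper name.
# $1 = Hadoop conf directory
# $2 = JobTracker host:port (optional; localhost:9001 is an assumed default)
write_mapred_site() {
  wm_conf="$1"
  wm_tracker="${2:-localhost:9001}"
  # Unquoted here-document so that $wm_tracker is expanded.
  cat > "$wm_conf/mapred-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>$wm_tracker</value>
  </property>
</configuration>
EOF
}

# Typical call (this overwrites the existing file):
#   write_mapred_site /usr/local/hadoop/conf
```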




Once this is done, we can format the HDFS filesystem using the command
bin/hadoop namenode -format 





Then start the DFS subsystems using the command
bin/start-dfs.sh
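To see whether the daemons really came up, you can look at the output of jps, the JDK tool that lists running Java processes as "pid name" lines. The sketch below filters that output and prints any daemon that is missing. The helper name missing_daemons is made up, and the three daemon names are what start-dfs.sh is expected to launch in Hadoop 1.x.

```shell
# missing_daemons is a hypothetical helper name.
# Reads jps-style output ("<pid> <name>" per line) on stdin and prints
# each expected daemon name that does not appear in it.
missing_daemons() {
  # Keep only the process names (second column).
  md_seen=$(awk '{ print $2 }')
  for md_name in "$@"; do
    # -x matches whole lines, so NameNode does not match SecondaryNameNode.
    if ! printf '%s\n' "$md_seen" | grep -qxF "$md_name"; then
      echo "$md_name"
    fi
  done
}

# Typical call once start-dfs.sh has returned:
#   jps | missing_daemons NameNode DataNode SecondaryNameNode
# No output means all three were found.
```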


You can then see Hadoop running at http://localhost:yourportnumber/dfshealth.jsp (the NameNode web port is 50070 by default in Hadoop 1.x).



We have now successfully installed Hadoop on Windows. I will post more as and when I learn and explore.

Comments:

Mathew Stephen said...

I configured it your way and it is working. But when I run my code, the map phase completes at 100% while reduce stays at 0%. I have been facing this issue for a long time. Please help me.

