Hadoop 1.2.1 installation - University of Michigan
Before installation
• Where to get hadoop
  – http://ftp.twaren.net/Unix/Web/apache/hadoop/common/hadoop-1.2.1/
  – or my ftp://hadoop:[email protected]
  – Please download: hadoop-1.2.1.tar.gz
• GUI mode may help for typing commands.
• The list of commands is also on the ftp:
  – the file cmd
• In this ppt, commands are shown in italic and purple, e.g.:
  – mkdir hadoop
After login, install the required packages first:
sudo apt-get install libssl-dev rsync g++
Type “y” when asked.
Download the files:
cd Downloads
wget ftp://hadoop:[email protected]/hadoop-1.2.1.tar.gz
wget ftp://hadoop:[email protected]/jdk-7u45-linux-x64.gz
Install Java: reference website
(under the Downloads folder)
tar -zxvf jdk-7u45-linux-x64.gz
sudo mkdir /usr/lib/jdk
sudo cp -r jdk1.7.0_45 /usr/lib/jdk/
Edit the profile: sudo vim /etc/profile
(add these four lines at the end of the profile)
export JAVA_HOME=/usr/lib/jdk/jdk1.7.0_45
export JRE_HOME=/usr/lib/jdk/jdk1.7.0_45/jre
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH
export CLASSPATH=$CLASSPATH:.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
Configure Java:
sudo update-alternatives --install /usr/bin/java java /usr/lib/jdk/jdk1.7.0_45/bin/java 300
sudo update-alternatives --install /usr/bin/javac javac /usr/lib/jdk/jdk1.7.0_45/bin/javac 300
sudo update-alternatives --config java
sudo update-alternatives --config javac
Test it by printing the version:
java -version
You will see the version information if it succeeded.
SSH setting: this is optional but recommended if you don’t want to enter a password every time.
Generate an RSA key:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Copy the public key:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
SSH test:
ssh hadoop@localhost
Remember to exit:
exit
You will be asked to confirm the host’s authenticity the first time; after that connection there are no more prompts.
If the setting failed, you will be asked for a password.
Install hadoop:
tar -zxvf hadoop-1.2.1.tar.gz
mv hadoop-1.2.1 ~/hadoop    (move it under the home directory for convenience)
vim ~/hadoop/conf/hadoop-env.sh    (edit the hadoop environment shell script)
export JAVA_HOME=/usr    (add this line)
Set the environment PATH:
sudo vim ~/.bashrc    (configure the bash settings)
export PATH=/home/hadoop/hadoop/bin:$PATH    (add this line so the hadoop command can be used everywhere)
Log out and then log back in; the setting should take effect.
Type “hadoop” to try it.
Standalone mode: test whether hadoop is available. ref: website
cd ~/hadoop
mkdir input
cp conf/*.xml input
hadoop jar hadoop-examples-1.2.1.jar grep input output 'dfs[a-z.]+'
cat output/part-00000
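The grep example above scans the copied XML files for every string matching the regular expression 'dfs[a-z.]+' and counts the occurrences of each match. A minimal local sketch of that computation in Python (the sample lines below are illustrative stand-ins for conf/*.xml content, not real output):

```python
import re
from collections import Counter

# Illustrative lines standing in for the contents of conf/*.xml.
lines = [
    "<name>dfs.replication</name>",
    "<name>dfs.permissions</name>",
    "<value>dfs.replication is 1</value>",
]

# The grep job counts every occurrence of the pattern 'dfs[a-z.]+'.
counts = Counter()
for line in lines:
    for match in re.findall(r"dfs[a-z.]+", line):
        counts[match] += 1

# Print matches sorted by count, like the job's part-00000 output.
for word, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(n, word)
```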
Pseudo-distributed configuration: reference website
Edit the 3 .xml files under the conf folder: core-site.xml, hdfs-site.xml, mapred-site.xml (these files may also be downloaded from the ftp).
cd ~/hadoop
vim conf/core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

vim conf/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

vim conf/mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
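A malformed config file is a common cause of startup failures, so it is worth checking that the edited XML parses cleanly before formatting HDFS. A small Python sketch that extracts the name/value pairs the way a configuration loader would (the embedded string mirrors the core-site.xml above):

```python
import xml.etree.ElementTree as ET

# Mirrors the core-site.xml shown above.
core_site = """
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
"""

# Parsing fails loudly here if the XML is malformed.
root = ET.fromstring(core_site)

# Collect each <property> as a name -> value pair.
props = {p.findtext("name"): p.findtext("value")
         for p in root.findall("property")}
print(props)
```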
HDFS format:
hadoop namenode -format
Start hadoop in pseudo-distributed mode:
start-all.sh
Type jps to see what’s running.
If you need to enter a password, that’s fine, just inconvenient. To solve this, please refer to the SSH setting in the previous slides.
Let’s run an example! There is an example jar file under ~/hadoop:
hadoop-examples-1.2.1.jar
hadoop jar hadoop-examples-1.2.1.jar    (run this to get more information)
Now suppose we want to run the wordcount example.
First, put the input data on HDFS (you have to create your own input.txt first):
hadoop dfs -put input.txt /input.txt
Next, execute the wordcount example:
hadoop jar hadoop-examples-1.2.1.jar wordcount /input.txt /test_out
Finally, get the results:
hadoop dfs -get /test_out test_out
The result file part-r-00000 shows up in the directory test_out.
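Conceptually, the wordcount job maps each word to a count of 1, groups the pairs by word during the shuffle, and sums the counts in the reducer. A self-contained Python sketch of that pipeline (the input text is made up):

```python
from collections import defaultdict

# Made-up input standing in for input.txt.
text = "hello hadoop hello world"

# Map phase: emit a (word, 1) pair for every word.
pairs = [(word, 1) for word in text.split()]

# Shuffle phase: group the emitted values by key.
grouped = defaultdict(list)
for word, one in pairs:
    grouped[word].append(one)

# Reduce phase: sum the counts for each word.
counts = {word: sum(ones) for word, ones in grouped.items()}
print(counts)  # {'hello': 2, 'hadoop': 1, 'world': 1}
```

The real job does the same thing, except the map and reduce phases run as distributed tasks and the grouped pairs travel through HDFS and the shuffle.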
Run hadoop with C
• We need to use the Pipes interface provided by hadoop.
• Really slow!
Recompile the library: website
vim ~/hadoop/src/c++/pipes/impl/HadoopPipes.cc
#include <unistd.h>    (add this line)
cd ~/hadoop/src/c++/utils
chmod 755 configure
./configure
make install

cd ~/hadoop/src/c++/pipes
export LIBS=-lcrypto
chmod 755 configure
./configure
make install

The new library will appear in ~/hadoop/src/c++/install.
Compile the wordcount example: website
wget -r -np -nH ftp://hadoop:[email protected]/wordcount
make wordcount
hadoop dfs -mkdir test
hadoop dfs -put wordcount test/wordcount
hadoop dfs -put testdata.txt test/testdata.txt

hadoop pipes -D hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriter=true -input test/testdata.txt -output test/output -program test/wordcount

hadoop dfs -get test/output output
cat output/part-00000
Congratulations!!