CDH3 Single Node Installation Guide
Dell Server Configuration Guide
Ubuntu 10.04.4 LTS Desktop Installation Guide
Created: 01-12-2015
Author: Hyun Kim
Last Updated: 01-12-2015
Version Number: 0.1
Contact info: [email protected]
Downloading Ubuntu 10.04.4 LTS Desktop
1. In order to run CDH3, we need an operating system. For this
particular demonstration, we are going to use Ubuntu Desktop. No,
you don’t need to remove your current operating system. Ubuntu is
quite light and you can install it alongside your current operating
system. The best part is, if you don’t like it, you can easily
remove it. No hard feelings. Sounds good? Let’s get started.
2. Before you do ANYTHING, and I mean ANYTHING, you need to check which
operating systems CDH3 supports. Our ultimate goal is to install
CDH3 on Ubuntu. You can check the requirements for CDH3 by
clicking the link below:
http://www.cloudera.com/content/cloudera/en/documentation
/archives/cdh3/v3u6/CDH3-Quick-Start/cdh3qs_topic_2.html
3. For this demonstration, we are going to install Ubuntu 10.04 LTS
(Lucid Lynx) Desktop 64-bit. CDH3 also supports 32-bit operating
systems, but according to Cloudera, “for production environments,
64-bit packages are recommended”. Therefore, be aware.
4. Ubuntu is a free operating system and you can download it from the
link below. Click the “64-bit PC (AMD64) desktop CD” link and it
will automatically start downloading.
http://old-releases.ubuntu.com/releases/lucid/
5. If you downloaded the Ubuntu disk image file in Google
Chrome like I did, it will be saved in your Downloads folder.
However, if you are unsure where the file is saved, click the down
arrow button next to the download icon and it will give you a list
of options. Select ‘Show in folder’, which will open up the folder
where the Ubuntu disk image was downloaded.
6. Done!
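Before moving on, it is worth verifying that the download is not corrupted by comparing its checksum against the MD5SUMS file published on the same releases page. The sketch below demonstrates the mechanism on a tiny stand-in file; the ISO filename in the comment is an assumption based on the release we chose:

```shell
# Checksum verification demo on a stand-in file. With the real ISO you
# would run something like:
#   md5sum ubuntu-10.04.4-desktop-amd64.iso
# and compare the result against the MD5SUMS file on the releases page.
printf 'hello\n' > /tmp/demo.iso
sum=$(md5sum /tmp/demo.iso | awk '{print $1}')
echo "$sum"
```

If the two hashes do not match, re-download the image before writing it to the USB drive.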
Creating a bootable Ubuntu USB Flash Drive
1. We downloaded Ubuntu and now we need to install it. In this
tutorial, I’m installing Ubuntu on a server, and there are a couple
of ways to do that. I have a laptop with Windows 7 installed on it,
and I happen to have a 7 GB USB flash drive. If you looked at the
picture above, you know what I’m going to do: we are going to create
a bootable Ubuntu USB flash drive! This is already well explained on
the official Ubuntu website. I will leave some links below.
Creating a bootable USB stick on Windows.
http://www.ubuntu.com/download/desktop/create-a-usb-stick-on-windows
Creating a bootable USB stick on Ubuntu.
http://www.ubuntu.com/download/desktop/create-a-usb-stick-on-ubuntu
Creating a bootable USB stick on OS X
http://www.ubuntu.com/download/desktop/create-a-usb-stick-on-mac-osx
Download Universal USB Installer
http://www.pendrivelinux.com/universal-usb-installer-easy-as-1-2-3/#button
2. Since I’m currently using Windows 7, I will click on the first link
and follow the instructions. However, Universal USB Installer failed
to recognize my USB flash drive, so I had to enable the
“Now Showing All Drives” option in order to select it. On my
computer, F: is the USB flash drive.
3. If you have anything on your USB flash drive, enable the format
option. As a matter of fact, my USB drive was already formatted,
but just to be safe, I formatted it again in the installer. Now you
may click the “Create” button to create a bootable Ubuntu USB flash
drive.
4. Once the installation is successfully done, you will see what’s
shown in the picture below. You’ve created a bootable Ubuntu USB
Flash Drive.
Creating New Virtual Disk
1. In this tutorial, I’m using a Dell PowerEdge server. Turn on the
server and press Ctrl+R to run the configuration utility. You will
see the screen below.
2. Press ‘F2’ while ‘Controller 0’ is selected. Select “Create New VD”
and press Enter.
3. I’m going to set RAID Level: RAID-1. To select drives, use the
spacebar. Use the Tab key to go to “Basic Settings” and name the VD.
I left the “Advanced Settings” unchanged, but you may configure the
“Advanced Settings” as you wish if your server allows it. Select OK
and press Enter to create a new virtual disk.
4. Let the virtual disk initialize. This may take a couple of hours,
but it’s better to get it done now than later. Once it’s done,
restart the server by using the “Ctrl+Alt+Delete” key command.
Installing Ubuntu
1. Insert the bootable Ubuntu USB flash drive into the server, then
press the F11 key to run the BIOS Boot Manager after you restart
the server.
2. You will see the options shown in the picture below. Select
“Hard Drive C:” by using the down arrow key, then select the
“From USB:” option on the list. This will boot from your USB flash
drive.
3. Select Install Ubuntu and press Enter.
4. Change the settings appropriately and press the “Continue” button
until the installation starts.
5. Wait until the installation is completed. Once the installation is
done, we are almost ready to install CDH3.
Download JDK and Install it
1. Download JDK from the link below.
http://www.oracle.com/technetwork/java/javase/downloads/java-archive-downloads-javase6-419409.html#jdk-6u26-oth-JPR
2. Since “Cloudera recommends version 1.6.0_26”, we will be installing
that version of the JDK. To extract and install
jdk-6u26-linux-x64.bin, open Terminal and do the following.
3. Copy the file to /usr/local by using the commands below
$ cd Downloads
(Assuming that the JDK file is saved on Downloads folder)
$ sudo cp jdk-6u26-linux-x64.bin /usr/local
(this copies the file to /usr/local)
$ cd /usr/local
$ sudo sh jdk-6u26-linux-x64.bin
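The self-extracting .bin unpacks into a jdk1.6.0_26 directory under /usr/local (the same path this guide uses later for JAVA_HOME). A quick sanity check is to run that JDK’s java binary directly and confirm the reported version. The sketch below shows the version-matching pattern on a stand-in string, since the exact output format can vary:

```shell
# In practice you would run:
#   /usr/local/jdk1.6.0_26/bin/java -version
# and expect the first line to contain 1.6.0_26. Stand-in check:
ver='java version "1.6.0_26"'
case "$ver" in
  *1.6.0_26*) result="ok" ;;
  *)          result="wrong version" ;;
esac
echo "$result"
```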
Download and installing CDH3 package
1. Now we are finally ready to install CDH3. Click the link below to
download the CDH3 package. We installed Ubuntu 10.04 (Lucid Lynx).
http://www.cloudera.com/content/cloudera/en/documentation/archives/cdh3/v3u6/CDH3-Installation-Guide/cdh3ig_topic_4_4.html
$ cd
$ sudo dpkg -i Downloads/cdh3-repository_1.0_all.deb
Install CDH3 on all hosts
$ sudo apt-get update
$ apt-cache search hadoop
$ sudo apt-get install hadoop-0.20 hadoop-0.20-native
(Press y and then enter to continue installation)
Install the daemon types you need. However, this is a tutorial for a
single node cluster. Therefore, I will be installing all of them.
$ sudo apt-get install hadoop-0.20-namenode
$ sudo apt-get install hadoop-0.20-datanode
$ sudo apt-get install hadoop-0.20-secondarynamenode
$ sudo apt-get install hadoop-0.20-tasktracker
$ sudo apt-get install hadoop-0.20-jobtracker
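Since the five daemon packages differ only in their suffix, the installs above can also be scripted. This sketch just builds the package list from the common prefix; prepend `sudo apt-get install` to each name to actually install:

```shell
# Build the list of CDH3 daemon packages from the shared hadoop-0.20 prefix.
pkgs=""
for d in namenode datanode secondarynamenode tasktracker jobtracker; do
  pkgs="$pkgs hadoop-0.20-$d"
done
echo "$pkgs"
```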
Add CDH3 Repository
Create a file by entering this command below:
$ sudo nano /etc/apt/sources.list.d/cloudera.list
Edit the file by adding these two lines:
deb http://archive.cloudera.com/debian lucid-cdh3 contrib
deb-src http://archive.cloudera.com/debian lucid-cdh3 contrib
Press Ctrl+X, then Y, then Enter to save and exit the file.
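If you prefer not to edit the file interactively in nano, the same two lines can be written with a here-document. The sketch below writes to a /tmp stand-in; for the real file you would pipe through `sudo tee /etc/apt/sources.list.d/cloudera.list` instead, since that path needs root:

```shell
# Write the repository lines non-interactively (stand-in path; the real
# target /etc/apt/sources.list.d/cloudera.list requires sudo).
list=/tmp/cloudera.list
cat > "$list" <<'EOF'
deb http://archive.cloudera.com/debian lucid-cdh3 contrib
deb-src http://archive.cloudera.com/debian lucid-cdh3 contrib
EOF
wc -l < "$list"
```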
Add Repository Key
$ sudo apt-get install curl
(‘curl’ is not installed by default, so install it first)
$ curl -s http://archive.cloudera.com/debian/archive.key | sudo apt-key add -
Set JAVA_HOME and HADOOP_HOME
1. Now we’ve installed all the hadoop packages. Are we done? No, not
quite yet. We need to set JAVA_HOME and HADOOP_HOME so that the
system can recognize what’s installed.
$ cd /usr/lib/hadoop-0.20/bin
$ nano ~/.bashrc
Once the file is opened, on the bottom of the file, copy and paste
these:
export HADOOP_HOME=/usr/lib/hadoop
export PATH=$PATH:/usr/lib/hadoop/bin
export JAVA_HOME=/usr/local/jdk1.6.0_26
export PATH=$PATH:/usr/local/jdk1.6.0_26/bin
Leave everything else unchanged.
Your java path might be different if you are using different Ubuntu
version or different version of jdk.
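To see what the two PATH lines actually do: each appends a directory to the search path, so the binaries inside it become reachable by name alone. A small demo of the same append pattern used in the .bashrc lines (it works on a copy, so your real PATH is untouched):

```shell
# Append the JDK bin directory to a copy of PATH, as the .bashrc line does,
# then show the last entry on the search path.
demo_path="$PATH:/usr/local/jdk1.6.0_26/bin"
last=$(echo "$demo_path" | tr ':' '\n' | tail -n 1)
echo "$last"
```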
2. Run the following commands to see if JAVA_HOME and HADOOP_HOME are
set correctly.
$ echo $JAVA_HOME
$ echo $HADOOP_HOME
3. If the commands output nothing, run “source ~/.bashrc”, or close the
terminal and try the commands again in a new terminal.
4. If you still don’t get any output from the commands, go back to the
previous step and check whether you misspelled anything or left any
extra characters when you edited the .bashrc file.
5. If you see what’s in the picture below, you’ve set JAVA_HOME and
HADOOP_HOME properly.
Hadoop and Java Version
$ hadoop version
$ java -version
If it prints out something similar to what’s shown in the picture
above, you’ve done everything correctly so far.
Edit hadoop-env.sh
$ sudo gedit /usr/lib/hadoop/conf/hadoop-env.sh
I didn’t delete anything. I just added these two lines and that’s
good enough.
export JAVA_HOME=/usr/local/jdk1.6.0_26
export HADOOP_HOME=/usr/lib/hadoop
Adding Dedicated users to Hadoop Group
$ sudo gpasswd -a hdfs hadoop
$ sudo gpasswd -a mapred hadoop
Edit core-site.xml
$ sudo gedit /usr/lib/hadoop/conf/core-site.xml
add these properties between <configuration> </configuration>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/lib/hadoop/tmp</value>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:8020</value>
</property>
$ sudo mkdir /usr/lib/hadoop/tmp
$ cd /usr/lib/hadoop/
$ sudo chmod 750 /usr/lib/hadoop/tmp/
$ sudo chown hdfs:hadoop /usr/lib/hadoop/tmp/
See how the tmp folder is no longer owned by “root root”? Instead, it
is owned by “hdfs hadoop”, which is what the chown command above did.
To see this on your machine, use the command below:
$ ls -la /usr/lib/hadoop/
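The mode 750 used above means the owner (hdfs) gets read/write/execute, the group (hadoop) gets read/execute, and everyone else gets nothing. A throwaway demo of the same chmod round trip on a scratch directory, so you can see the octal mode reported back:

```shell
# Apply mode 750 to a scratch directory and read it back (GNU stat).
scratch=$(mktemp -d)
chmod 750 "$scratch"
mode=$(stat -c '%a' "$scratch")
echo "$mode"
rmdir "$scratch"
```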
hdfs-site.xml
$ sudo gedit /usr/lib/hadoop/conf/hdfs-site.xml
add these properties between <configuration> </configuration>
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>/storage/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/storage/data</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
$ cd
$ sudo mkdir /storage
$ sudo chmod 775 /storage/
$ sudo chown hdfs:hadoop /storage/
$ ls -la /
(Note: /storage must be created at the root of the filesystem so that
it matches the dfs.name.dir and dfs.data.dir values above.)
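The dfs.name.dir and dfs.data.dir values above resolve to /storage/name and /storage/data, which become the namenode metadata and datanode block directories once hdfs can write under /storage. A sketch of that layout using a /tmp stand-in root (the real /storage needs sudo and the hdfs:hadoop ownership shown above):

```shell
# Create the nested name/data layout (stand-in root; real root is /storage).
root=/tmp/storage_demo
mkdir -p "$root/name" "$root/data"
ls "$root"
```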
mapred-site.xml
$ sudo gedit /usr/lib/hadoop/conf/mapred-site.xml
add these properties between <configuration> </configuration>
<property>
<name>mapred.job.tracker</name>
<value>hdfs://localhost:8021</value>
</property>
<property>
<name>mapred.system.dir</name>
<value>/home/cdh3/mapred/system</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/home/cdh3/mapred/local</value>
</property>
<property>
<name>mapred.temp.dir</name>
<value>/home/cdh3/mapred/temp</value>
</property>
$ cd
$ sudo mkdir -p /home/cdh3/mapred
$ sudo chmod 775 /home/cdh3/mapred
$ sudo chown mapred:hadoop /home/cdh3/mapred
User Assignment
Format namenode
Type and enter the commands below.
$ cd /usr/lib/hadoop/bin/
$ sudo -u hdfs hadoop namenode -format
When I tried to format the namenode, an error occurred. In this case,
we just need to edit a few things so that hadoop-config can find
jdk1.6.0_26. No big deal.
First, open hadoop-config.sh:
$ cd /usr/lib/hadoop/bin/
$ sudo gedit hadoop-config.sh
Add this line near the top of the file:
export JAVA_HOME=/usr/local/jdk1.6.0_26
This should fix the problem. Save and try again.
$ sudo -u hdfs hadoop namenode -format
will give you this screen below
Start Daemons
$ sudo /etc/init.d/hadoop-0.20-namenode start
$ sudo /etc/init.d/hadoop-0.20-secondarynamenode start
$ sudo /etc/init.d/hadoop-0.20-jobtracker start
$ sudo /etc/init.d/hadoop-0.20-datanode start
$ sudo /etc/init.d/hadoop-0.20-tasktracker start
$ netstat -ptlen
Checking UI
On your internet browser, type
“localhost:50030”
to open the “Map/Reduce administration” page.
On your internet browser, type
“localhost:50070”
to open the “NameNode” page.
If you see the pages above, you have successfully set up a
single node CDH3 cluster.