
Post on 02-Aug-2020



CDH3 Single Node Installation Guide

Dell Server Configuration Guide

Ubuntu 10.04.4 LTS Desktop Installation Guide

Created: 01-12-2015

Author: Hyun Kim

Last Updated: 01-12-2015

Version Number: 0.1

Contact info: hyunk@loganbright.com

Krish@loganbriht.com

Downloading Ubuntu 10.04.4 LTS Desktop

1. In order to run CDH3, we need an operating system. For this demonstration, we are going to use Ubuntu Desktop. No, you don’t need to remove your current operating system. Ubuntu is quite light, and you can install it ON your current operating system. The best part is that if you don’t like it, you can easily remove it. No hard feelings. Sounds good? Let’s get started.

2. Before you do ANYTHING, and I mean ANYTHING, you need to check which operating systems CDH3 supports. Our ultimate goal is to install CDH3 on Ubuntu. You can check the requirements for CDH3 at the link below:

http://www.cloudera.com/content/cloudera/en/documentation/archives/cdh3/v3u6/CDH3-Quick-Start/cdh3qs_topic_2.html

3. For this demonstration, we are going to install Ubuntu 10.04 LTS (Lucid Lynx) Desktop 64-bit. CDH3 also supports 32-bit operating systems, but according to Cloudera, “for production environments, 64-bit packages are recommended”. Keep this in mind when you choose your download.

4. Ubuntu is a free operating system and you can download it from the link below. Click the “64-bit PC (AMD64) desktop CD” link and it will automatically start downloading.

http://old-releases.ubuntu.com/releases/lucid/

5. If you downloaded the Ubuntu disk image with Google Chrome like I did, it is saved in your Downloads folder. If you are unsure where the file was saved, click the down arrow next to the download icon to get a list of options, then select ‘Show in folder’, which opens the folder where the Ubuntu disk image was downloaded.

6. Done!

Creating a bootable Ubuntu USB Flash Drive

1. We downloaded Ubuntu, and now we need to install it. In this tutorial, I’m installing Ubuntu on a server, and there are a couple of ways to do this. I have a laptop with Windows 7 installed on it, and I happen to have a 7 GB USB flash drive. If you looked at the picture above, you know what I’m going to do: we are going to create a bootable Ubuntu USB flash drive! This is already well explained on the official Ubuntu website, so I will leave some links below.

Creating a bootable USB stick on Windows:

http://www.ubuntu.com/download/desktop/create-a-usb-stick-on-windows

Creating a bootable USB stick on Ubuntu:

http://www.ubuntu.com/download/desktop/create-a-usb-stick-on-ubuntu

Creating a bootable USB stick on OS X:

http://www.ubuntu.com/download/desktop/create-a-usb-stick-on-mac-osx

Download Universal USB Installer:

http://www.pendrivelinux.com/universal-usb-installer-easy-as-1-2-3/#button

2. Since I’m currently using Windows 7, I clicked the first link and followed the instructions. However, Universal USB Installer failed to recognize my USB flash drive, so I had to activate the “Now Showing All Drives” option in order to select it. On my computer, F: is the USB flash drive.

3. If there is anything on your USB flash drive, activate the format option. My USB drive was already formatted, but just to be safe, I formatted it again in the installer. Now click the “Create” button to create a bootable Ubuntu USB flash drive.

4. Once the process finishes successfully, you will see the screen shown in the picture below. You’ve created a bootable Ubuntu USB flash drive.

Creating New Virtual Disk

1. In this tutorial, I’m using a Dell PowerEdge server. Turn on the server and press Ctrl+R to run the RAID controller’s configuration utility. You will see the screen below.

2. Press ‘F2’ while ‘Controller 0’ is selected. Select “Create New VD” and press Enter.

3. I’m setting the RAID level to RAID-1. To select drives, use the spacebar. Use the Tab key to go to “Basic Settings” and name the VD. I left the “Advanced Settings” unchanged, but you may configure them as you wish if your server allows it. Select OK and press Enter to create the new virtual disk.

4. Let the virtual disk initialize. This may take a couple of hours, but it’s better to get it done now than later. Once it’s done, restart the server with the “Ctrl+Alt+Delete” key combination.

Installing Ubuntu

1. Insert the bootable Ubuntu USB flash drive into the server, then press the F11 key while the server restarts to run the BIOS Boot Manager.

2. You will see the options shown in the picture below. Use the down arrow key to select “Hard Drive C:”, then select the “From USB:” option in the list. This boots the server from your USB flash drive.

3. Select Install Ubuntu and press Enter.

4. Adjust the settings as appropriate and press the “Continue” button until the installation starts.

5. Wait until the installation completes. Once it is done, we are almost ready to install CDH3.

Downloading and Installing the JDK

1. Download JDK from the link below.

http://www.oracle.com/technetwork/java/javase/downloads/java-archive-downloads-javase6-419409.html#jdk-6u26-oth-JPR

2. Since “Cloudera recommends version 1.6.0_26”, we will install that version of the JDK. To extract and install jdk-6u26-linux-x64.bin, open a terminal and do the following.

3. Copy the file to /usr/local and run it there with the commands below.

$ cd ~/Downloads

(assuming the JDK file was saved in your Downloads folder)

$ sudo cp jdk-6u26-linux-x64.bin /usr/local

(this copies the file to /usr/local)

$ cd /usr/local

$ sudo sh jdk-6u26-linux-x64.bin

(this extracts the JDK into /usr/local/jdk1.6.0_26)
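The installer should leave a jdk1.6.0_26 directory under /usr/local. The short shell sketch below checks for that, assuming the default directory name; check_jdk is a hypothetical helper written for this guide, not something the JDK provides.

```shell
# check_jdk: confirm that a JDK directory contains executable java and javac.
# The default path assumes the installer extracted to /usr/local/jdk1.6.0_26.
check_jdk() {
  jdk_dir="${1:-/usr/local/jdk1.6.0_26}"
  for tool in java javac; do
    if [ ! -x "$jdk_dir/bin/$tool" ]; then
      echo "missing: $jdk_dir/bin/$tool" >&2
      return 1
    fi
  done
  echo "JDK found at $jdk_dir"
}
```

After pasting the function into your terminal, run `check_jdk /usr/local/jdk1.6.0_26`; a “missing:” line means the extraction did not finish or landed somewhere else.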

Downloading and Installing the CDH3 Package

1. Now we are finally ready to install CDH3. Click the link below to download the CDH3 repository package. We installed Ubuntu 10.04 (Lucid Lynx).

http://www.cloudera.com/content/cloudera/en/documentation/archives/cdh3/v3u6/CDH3-Installation-Guide/cdh3ig_topic_4_4.html

$ cd

$ sudo dpkg -i Downloads/cdh3-repository_1.0_all.deb

Install CDH3 on all hosts

$ sudo apt-get update

$ apt-cache search hadoop

$ sudo apt-get install hadoop-0.20 hadoop-0.20-native

(Press Y and then Enter to continue the installation)

Install the daemon types you need. Because this is a tutorial for a single-node cluster, I will install all of them.

$ sudo apt-get install hadoop-0.20-namenode

$ sudo apt-get install hadoop-0.20-datanode

$ sudo apt-get install hadoop-0.20-secondarynamenode

$ sudo apt-get install hadoop-0.20-tasktracker

$ sudo apt-get install hadoop-0.20-jobtracker
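apt-get accepts several package names in one command, so the five daemon installs above can also be done in a single pass. This is a sketch: install_daemons and its DRY_RUN switch are hypothetical names, and DRY_RUN=1 only prints the command so you can review it before running it for real.

```shell
# install_daemons: install all five CDH3 Hadoop 0.20 daemon packages with a
# single apt-get call. DRY_RUN=1 (hypothetical switch) only prints the command.
install_daemons() {
  pkgs=""
  for daemon in namenode datanode secondarynamenode tasktracker jobtracker; do
    pkgs="$pkgs hadoop-0.20-$daemon"
  done
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "sudo apt-get install$pkgs"
  else
    # $pkgs is left unquoted on purpose so each package name is a separate argument
    sudo apt-get install $pkgs
  fi
}
```

Run `DRY_RUN=1 install_daemons` first to see the command, then `install_daemons` to install.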

Add CDH3 Repository

Create the file by entering the command below:

$ sudo nano /etc/apt/sources.list.d/cloudera.list

Add these two lines to the file:

deb http://archive.cloudera.com/debian lucid-cdh3 contrib

deb-src http://archive.cloudera.com/debian lucid-cdh3 contrib

Press “Ctrl+X”, then “Y” and Enter, to save the file and exit nano.
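If you prefer not to edit the file by hand in nano, the same two repository lines can be written non-interactively. This sketch assumes Ubuntu 10.04 (codename lucid); cloudera_list is a hypothetical helper that only prints the lines, and `sudo tee` does the actual writing.

```shell
# cloudera_list: print the two CDH3 repository lines for an Ubuntu codename.
# Defaults to lucid (Ubuntu 10.04); pass another codename only if Cloudera
# publishes a CDH3 repository for that release.
cloudera_list() {
  codename="${1:-lucid}"
  printf 'deb http://archive.cloudera.com/debian %s-cdh3 contrib\n' "$codename"
  printf 'deb-src http://archive.cloudera.com/debian %s-cdh3 contrib\n' "$codename"
}
```

Usage: `cloudera_list | sudo tee /etc/apt/sources.list.d/cloudera.list > /dev/null`, followed by `sudo apt-get update` so apt reads the new source.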

Add Repository Key

$ sudo apt-get install curl

(the key command below gave me an error explaining that ‘curl’ needs to be installed, so install it first)

$ curl -s http://archive.cloudera.com/debian/archive.key | sudo apt-key add -

Set JAVA_HOME and HADOOP_HOME

1. Now we’ve installed all the Hadoop packages. Are we done? No, not quite yet. We need to set JAVA_HOME and HADOOP_HOME so that the system can find what’s installed.

$ cd /usr/lib/hadoop-0.20/bin

$ nano ~/.bashrc

Once the file is open, paste these lines at the bottom of the file:

export HADOOP_HOME=/usr/lib/hadoop

export PATH=$PATH:/usr/lib/hadoop/bin

export JAVA_HOME=/usr/local/jdk1.6.0_26

export PATH=$PATH:/usr/local/jdk1.6.0_26/bin

Leave everything else unchanged.

Your Java path may be different if you are using a different Ubuntu version or a different JDK version.
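Pasting the export lines a second time (for example, when you repeat this step) leaves duplicates in ~/.bashrc. One way to avoid that is an append-if-missing helper. append_once and add_hadoop_exports are hypothetical names, and the paths assume /usr/lib/hadoop and /usr/local/jdk1.6.0_26 as above.

```shell
# append_once: append LINE to FILE only if that exact line is not already there.
append_once() {
  line="$1"
  file="$2"
  # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
  if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$file"
  fi
}

# add_hadoop_exports: add the four export lines from this guide to FILE.
# Single quotes keep $PATH unexpanded here, so it is expanded each time the
# file is read by a new shell.
add_hadoop_exports() {
  target="$1"
  for entry in \
    'export HADOOP_HOME=/usr/lib/hadoop' \
    'export PATH=$PATH:/usr/lib/hadoop/bin' \
    'export JAVA_HOME=/usr/local/jdk1.6.0_26' \
    'export PATH=$PATH:/usr/local/jdk1.6.0_26/bin'
  do
    append_once "$entry" "$target"
  done
}
```

Usage: `add_hadoop_exports ~/.bashrc`. Running it again leaves the file unchanged.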

2. Run the following commands to see if JAVA_HOME and HADOOP_HOME are set correctly.

$ echo $JAVA_HOME

$ echo $HADOOP_HOME

3. If the commands print nothing, close the terminal, open a new one, and run the commands again (.bashrc is only read when a new shell starts).

4. If you still don’t get any output, go back to the previous step and check whether you misspelled anything or left stray characters, such as an extra comma or spaces around the = sign, when you edited the .bashrc file.

5. If you see what’s in the picture below, you’ve set JAVA_HOME and HADOOP_HOME properly.
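Instead of interpreting blank output, you can have the shell say explicitly whether each variable is set. This sketch uses printenv, which only reports exported variables (the ones Hadoop’s scripts will inherit); show_home_var is a hypothetical helper.

```shell
# show_home_var: print NAME=value for an exported variable, or report it as unset.
show_home_var() {
  # "|| true" keeps a missing variable from aborting shells running with set -e
  value="$(printenv "$1" || true)"
  if [ -z "$value" ]; then
    echo "$1 is not set (try: source ~/.bashrc)" >&2
    return 1
  fi
  echo "$1=$value"
}
```

Usage: `source ~/.bashrc`, then `show_home_var JAVA_HOME` and `show_home_var HADOOP_HOME`.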

Hadoop and Java Version

1. Check the installed versions:

$ hadoop version

$ java -version

If they print something similar to what’s shown in the picture above, you’ve done everything correctly so far.
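To confirm that the java on your PATH is the 1.6.0_26 build Cloudera recommends, you can pull the version string out of `java -version`, which writes to standard error. The helpers below are a hypothetical sketch and assume the first output line has the form java version "1.6.0_26".

```shell
# java_version_of: read `java -version` output on stdin and print the quoted
# version from the first line, e.g. 1.6.0_26 from: java version "1.6.0_26"
java_version_of() {
  head -n 1 | sed -n 's/.*version "\([^"]*\)".*/\1/p'
}

# java_version_is: succeed only if the version read on stdin equals $1.
java_version_is() {
  actual="$(java_version_of)"
  [ "$actual" = "$1" ]
}
```

Usage: `java -version 2>&1 | java_version_is 1.6.0_26 && echo "JDK version OK"`.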

Edit hadoop-env.sh

$ sudo gedit /usr/lib/hadoop/conf/hadoop-env.sh

I didn’t delete anything. I just added these two lines and that’s

good enough.

export JAVA_HOME=/usr/local/jdk1.6.0_26

export HADOOP_HOME=/usr/lib/hadoop

Adding the Dedicated Users to the hadoop Group

$ sudo gpasswd -a hdfs hadoop

$ sudo gpasswd -a mapred hadoop

Edit core-site.xml

$ sudo gedit /usr/lib/hadoop/conf/core-site.xml

Add these properties between <configuration> and </configuration>:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/lib/hadoop/tmp</value>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:8020</value>
</property>
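Property blocks like these are easy to mistype by hand. As an alternative sketch, a small generator can print a complete *-site.xml from NAME=VALUE pairs. hadoop_conf is a hypothetical helper; it does not escape XML special characters (&, <, >), and writing its output replaces the whole file, so back up the original first.

```shell
# hadoop_conf: print a Hadoop *-site.xml with one <property> per NAME=VALUE argument.
hadoop_conf() {
  printf '<?xml version="1.0"?>\n'
  printf '<configuration>\n'
  for pair in "$@"; do
    name="${pair%%=*}"   # text before the first '='
    value="${pair#*=}"   # text after the first '='
    printf '  <property>\n'
    printf '    <name>%s</name>\n' "$name"
    printf '    <value>%s</value>\n' "$value"
    printf '  </property>\n'
  done
  printf '</configuration>\n'
}
```

Usage: `sudo cp /usr/lib/hadoop/conf/core-site.xml /usr/lib/hadoop/conf/core-site.xml.bak`, then `hadoop_conf hadoop.tmp.dir=/usr/lib/hadoop/tmp fs.default.name=hdfs://localhost:8020 | sudo tee /usr/lib/hadoop/conf/core-site.xml > /dev/null`.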

$ sudo mkdir /usr/lib/hadoop/tmp

$ cd /usr/lib/hadoop/

$ sudo chmod 750 /usr/lib/hadoop/tmp/

$ sudo chown hdfs:hadoop /usr/lib/hadoop/tmp/

To check the ownership on your machine, use the command below:

$ ls -la /usr/lib/hadoop/

The tmp folder should no longer be owned by “root root”. Instead, it is owned by “hdfs hadoop”, which is what the chown command above did.
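The mkdir, chmod, and chown steps repeat for every directory Hadoop needs, so they can be wrapped in one helper. make_owned_dir is a hypothetical sketch; chown needs root, so run it from a root shell (for example, one opened with `sudo -s`).

```shell
# make_owned_dir: create DIR (plus missing parents), set MODE, and, if OWNER
# is non-empty (e.g. hdfs:hadoop), change its owner and group.
make_owned_dir() {
  dir="$1"
  mode="$2"
  owner="$3"
  mkdir -p "$dir" || return 1
  chmod "$mode" "$dir" || return 1
  if [ -n "$owner" ]; then
    chown "$owner" "$dir" || return 1
  fi
}
```

Usage from a root shell: `make_owned_dir /usr/lib/hadoop/tmp 750 hdfs:hadoop`. The same pattern applies to the /storage (775, hdfs:hadoop) and /home/cdh3/mapred (775, mapred:hadoop) directories used below.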

Edit hdfs-site.xml

$ sudo gedit /usr/lib/hadoop/conf/hdfs-site.xml

Add these properties between <configuration> and </configuration>:

<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>/storage/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/storage/data</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>

Create the /storage directory that dfs.name.dir and dfs.data.dir point to, and give it to the hdfs user:

$ sudo mkdir /storage

$ sudo chmod 775 /storage/

$ sudo chown hdfs:hadoop /storage/

$ ls -la /
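To confirm the edits landed, you can read a property back out of a configuration file. This is a rough, hypothetical sketch (conf_value) that uses tr and sed rather than a real XML parser, so it assumes each <name> is directly followed by its <value>, as in the snippets above.

```shell
# conf_value: print the <value> that follows <name>NAME</name> in a Hadoop
# *-site.xml file. Newlines are removed first so a property may span lines.
# Text matching only: XML comments or unusual layouts can fool it.
conf_value() {
  file="$1"
  # Escape dots so a name like dfs.replication matches literally in the regex.
  pat="$(printf '%s' "$2" | sed 's/\./\\./g')"
  tr -d '\n' < "$file" |
    sed -n "s|.*<name>$pat</name>[[:space:]]*<value>\([^<]*\)</value>.*|\1|p"
}
```

Usage: `conf_value /usr/lib/hadoop/conf/hdfs-site.xml dfs.replication` should print 1 with the settings above.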

Edit mapred-site.xml

$ sudo gedit /usr/lib/hadoop/conf/mapred-site.xml

Add these properties between <configuration> and </configuration>:

<property>
  <name>mapred.job.tracker</name>
  <value>hdfs://localhost:8021</value>
</property>
<property>
  <name>mapred.system.dir</name>
  <value>/home/cdh3/mapred/system</value>
</property>
<property>
  <name>mapred.local.dir</name>
  <value>/home/cdh3/mapred/local</value>
</property>
<property>
  <name>mapred.temp.dir</name>
  <value>/home/cdh3/mapred/temp</value>
</property>

$ sudo mkdir -p /home/cdh3/mapred

$ sudo chmod 775 /home/cdh3/mapred

$ sudo chown mapred:hadoop /home/cdh3/mapred

User Assignment

Format namenode

Type and enter the commands below.

$ cd /usr/lib/hadoop/bin/

$ sudo -u hdfs hadoop namenode -format

When I tried to format the NameNode, an error occurred. In this case, we just need to edit a few things in hadoop-config.sh so that it can find jdk1.6.0_26. No big deal.

First, open hadoop-config.sh:

$ cd /usr/lib/hadoop/bin/

$ sudo gedit hadoop-config.sh

This should fix the problem. Save the file and try again.

$ sudo -u hdfs hadoop namenode -format

This time, you will see the screen below.

Start Daemons

$ sudo /etc/init.d/hadoop-0.20-namenode start

$ sudo /etc/init.d/hadoop-0.20-secondarynamenode start

$ sudo /etc/init.d/hadoop-0.20-jobtracker start

$ sudo /etc/init.d/hadoop-0.20-datanode start

$ sudo /etc/init.d/hadoop-0.20-tasktracker start

$ netstat -ptlen

(this lists the listening TCP ports, so you can confirm that the daemons are running)
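The five start commands and the netstat check can be scripted. In this sketch, start_daemons and listening_ports are hypothetical helpers: DRY_RUN=1 prints the start commands instead of running them, and listening_ports reduces `netstat -ptlen` output to port numbers. With the configuration above, the ports to look for include 8020 (fs.default.name), 8021 (mapred.job.tracker), 50070, and 50030.

```shell
# start_daemons: start the five CDH3 daemons in the same order as the commands above.
# DRY_RUN=1 (a hypothetical switch) prints each command instead of running it.
start_daemons() {
  for svc in namenode secondarynamenode jobtracker datanode tasktracker; do
    cmd="/etc/init.d/hadoop-0.20-$svc start"
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "sudo $cmd"
    else
      # $cmd is unquoted on purpose: the script path and "start" are separate arguments
      sudo $cmd || echo "failed: $cmd" >&2
    fi
  done
}

# listening_ports: read `netstat -ptlen` output on stdin and print each local
# port in LISTEN state once, sorted numerically. Column 6 is the State column.
listening_ports() {
  awk '$6 == "LISTEN" { n = split($4, parts, ":"); print parts[n] }' | sort -n -u
}
```

Usage: `start_daemons`, then `netstat -ptlen | listening_ports`.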

Checking UI

In your web browser, go to

“localhost:50030”

to open the “Map/Reduce administration” (JobTracker) page.

In your web browser, go to

“localhost:50070”

to open the “NameNode” page.

If you see the pages above, you have