Post on 11-Jul-2015
Apache Hadoop cluster on Macintosh OSX
The Trigger #DIY
The Kitchen Setup
The Network
Master Chef a.k.a Namenode
Helpers a.k.a Datanode(s)
The Base Ingredients
Versions used: OSX 10.7.5, Hadoop 2.4.0, Hive 0.13.0
Other figures shown on the slide (labels not recoverable): 0.9.5, 1.7.0.55, 5.6.17, 200 MB/s
Basics
• Ensure that all the namenode and datanode machines are running the same OSX version
• For this POC, I selected OSX 10.7.5. All sample commands are specific to this OS version; you may need to tweak them to suit yours
• I am a Homebrew fan, so I used this old-but-gold Ruby-based package manager to download all the software needed to run the POC. You may well opt to download the installers individually and adjust the process
• You will need a fair bit of understanding of OSX and Hadoop to follow along. If not, no worries: most of it can be looked up with a simple Google search
• The “Namenode” machine needs more RAM than the “Datanode” machines. Please configure the namenode machine with at least 8 GB RAM
The Cooking
• Ensure that ALL datanode and namenode machines are running the same OSX version and preferably follow a regulated software update strategy (i.e. automatic software updates disabled)
• Disable the automatic “sleep” options on the machines (from System Preferences) so that they do not go into hibernation
• Download and install “Xcode command line tools for Lion” (skip if Xcode is already present)
• As of today, Hadoop is not IPv6 friendly. So, please disable IPv6 on all machines:
“networksetup -listallnetworkservices” displays all the network service names that your machine uses to connect to your network (e.g. Ethernet, Wi-Fi)
“networksetup -setv6off Ethernet” disables IPv6 over Ethernet (change the service name if yours is different)
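If a machine has several network services, the two commands above can be combined into a loop. This is a minimal sketch, not part of the original setup: the function names are made up for illustration, and it assumes the first line printed by "networksetup -listallnetworkservices" is an explanatory note and that disabled services are prefixed with "*".

```shell
#!/bin/sh
# Print the name of every network service, one per line.
# Assumption: line 1 of the listing is a note, and a leading "*" marks a
# disabled service, so both are stripped here.
list_services() {
    networksetup -listallnetworkservices | sed -e '1d' -e 's/^\*//'
}

# Turn IPv6 off on each service in turn.
disable_ipv6_everywhere() {
    list_services | while IFS= read -r service; do
        [ -n "$service" ] || continue
        echo "Disabling IPv6 on: $service"
        # </dev/null keeps the command from consuming the service list.
        sudo networksetup -setv6off "$service" </dev/null
    done
}

# Usage (run on every machine):
#   disable_ipv6_everywhere
```

Run it once per machine and re-check with "networksetup -listallnetworkservices" that no service was missed.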
• Give logical names to ALL machines, e.g. namenode.local, datanode01.local, datanode02.local et al. (from System Preferences -> Sharing -> Computer Name)
• Enable the following services from the Sharing panel of System Preferences
– File Sharing
– Remote Login
– Remote Management
• Create one universal username (with Administrator privileges) on all machines, e.g. hadoopuser. Preferably use the same password everywhere
• For the rest of the steps, please log in as this user and execute the commands
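• On the namenode, run the command:
sudo vi /etc/hosts
• Add all datanode hostnames, one host per line (each line is an IP address followed by the hostname)
• On each of the datanodes, run the command:
sudo vi /etc/hosts
• Add the namenode hostname in the same way
• To allow password-less sudo for the Hadoop user, run the command:
sudo visudo
• Add the following entry as the last line of the file:
hadoopuser ALL=(ALL) NOPASSWD: ALL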
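Instead of editing /etc/hosts by hand on every machine, the entries can be appended from the terminal. The helper below is a hypothetical sketch; "add_host" is an invented name and the IP addresses in the usage lines are placeholders, so substitute the real addresses of your machines.

```shell
#!/bin/sh
# add_host FILE IP NAME
# Append "IP<TAB>NAME" to FILE unless NAME is already listed as a hostname.
add_host() {
    file=$1
    ip=$2
    name=$3
    # Field 1 is the address; any later field is a hostname or alias.
    if awk -v n="$name" \
        '{ for (i = 2; i <= NF; i++) if ($i == n) found = 1 } END { exit !found }' \
        "$file" 2>/dev/null; then
        echo "$name already present in $file"
    else
        # tee -a under sudo, because /etc/hosts is owned by root.
        printf '%s\t%s\n' "$ip" "$name" | sudo tee -a "$file" >/dev/null
    fi
}

# Usage on the namenode (placeholder addresses):
#   add_host /etc/hosts 192.168.1.11 datanode01.local
#   add_host /etc/hosts 192.168.1.12 datanode02.local
# Usage on each datanode:
#   add_host /etc/hosts 192.168.1.10 namenode.local
```

Because the helper skips names that are already present, it is safe to run more than once.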
Coffee Time
• Install the Java JDK and JRE on all the machines from the Oracle site (http://bit.ly/1s2i7VC)
• Set $JAVA_HOME on ALL machines. It is usually best to configure it in your .profile file. Run the following command to open your .profile:
vi ~/.profile
• Paste the following line in the file and save it:
export JAVA_HOME="`/System/Library/Frameworks/JavaVM.framework/Versions/Current/Commands/java_home`"
• You may additionally paste the following lines in the same file:
export PATH=$PATH:/usr/local/sbin
PS1="\H : \d \t: \w :"
This is helpful for housekeeping activities
The Brewing
• Install “brew” and then the other components through it. Run on terminal:
ruby -e "$(curl -fsSL https://raw.github.com/Homebrew/homebrew/go/install)"
[the quotes need to be there]
Run the following command on terminal to ensure that it has been installed properly:
brew doctor
Run the following commands on terminal, in this order:
brew install makedepend
brew install wget
brew install ssh-copy-id
brew install hadoop
Run the following commands on the “namenode” machine only:
brew install hive
brew install mysql
[the assumption is that the namenode will host the resourcemanager, jobtracker, Hive metastore and hiveserver.
brew installs the software in the “/usr/local/Cellar” location]
Run the following command to set up keyless login from the namenode to ALL datanodes. Run it on the namenode:
ssh-keygen
[press the Enter key at each prompt to accept the default RSA key file and an empty passphrase]
Run the following command once for EACH datanode hostname. Run it on the namenode:
ssh-copy-id hadoopuser@datanode01.local
Provide the password when prompted. The command is verbose and tells you whether the key was installed properly. You may validate this by executing the command:
ssh hadoopuser@datanode01.local
It should NOT ask you for a password anymore.
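With more than a couple of datanodes, the key copy and the check can be looped. This is a minimal sketch under assumptions: the DATANODES list and both function names are illustrative, and the hostnames are the example names used earlier.

```shell
#!/bin/sh
# Hypothetical list of datanode hostnames; replace with your own.
DATANODES="datanode01.local datanode02.local"

# Copy the namenode's public key to every datanode.
# Each host prompts once for the hadoopuser password.
push_key_to_all() {
    for host in $DATANODES; do
        echo "Installing key on $host"
        ssh-copy-id "hadoopuser@$host" || echo "FAILED: $host" >&2
    done
}

# Check that login now works without a password.
verify_keyless_login() {
    for host in $DATANODES; do
        # BatchMode makes ssh fail instead of prompting for a password.
        if ssh -o BatchMode=yes "hadoopuser@$host" true; then
            echo "$host: ok"
        else
            echo "$host: still asks for a password"
        fi
    done
}

# Usage (on the namenode, after ssh-keygen):
#   push_key_to_all
#   verify_keyless_login
```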
After the requisite software has been installed, the next step is to configure the different components one at a time. Hadoop works in distributed mode, with the “namenode” as the central hub of the cluster. That is reason enough to create the common configuration files on the namenode first and then copy them to all the datanodes in an automated manner. Let’s start with the .profile changes on the namenode machine.
The Saute
We are going to configure Hive to use MySQL as the metastore for this POC. All we need to do is create a DB user “hiveuser” with a valid password in the MySQL DB installed and running on the namenode, AND copy the MySQL driver jar into the Hive lib directory
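The two Hive/MySQL steps might look roughly like this. Everything not named in the text is an assumption for illustration: the database name "metastore", the password placeholder, the function name, the connector jar filename, and the Hive lib path (inferred from the Homebrew layout used in this POC).

```shell
#!/bin/sh
# Assumed location of the Hive lib directory for a Homebrew install of 0.13.0.
HIVE_LIB=/usr/local/Cellar/hive/0.13.0/libexec/lib

# hive_metastore_sql DB USER PASSWORD
# Print the SQL that creates the metastore database and its user.
hive_metastore_sql() {
    db=$1
    user=$2
    password=$3
    cat <<EOF
CREATE DATABASE IF NOT EXISTS $db;
CREATE USER '$user'@'localhost' IDENTIFIED BY '$password';
GRANT ALL PRIVILEGES ON $db.* TO '$user'@'localhost';
FLUSH PRIVILEGES;
EOF
}

# Usage (on the namenode, with MySQL running):
#   hive_metastore_sql metastore hiveuser 'change-me' | mysql -u root -p
# Then copy the MySQL JDBC driver jar you downloaded into Hive's lib directory:
#   cp mysql-connector-java-<version>-bin.jar "$HIVE_LIB/"
```

Printing the SQL first, rather than running it directly, lets you review the statements before piping them into mysql.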
On the namenode, go to your HADOOP_CONF_DIR location:
cd /usr/local/Cellar/hadoop/2.4.0/libexec/etc/hadoop
Here, we need to create/modify the following set of files:
slaves
core-site.xml
hdfs-site.xml
mapred-site.xml
yarn-site.xml
log4j.properties
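The original configuration files are not reproduced here, so the following is only a bare-bones sketch of what a minimal set could contain for this layout. The port 9000, the replication factor 2 and the function name are assumptions; the property names are standard Hadoop 2.x ones. log4j.properties is left at its default.

```shell
#!/bin/sh
# write_hadoop_conf DIR: write a minimal set of Hadoop config files into DIR.
write_hadoop_conf() {
    conf_dir=$1
    mkdir -p "$conf_dir"

    # slaves: one datanode hostname per line.
    printf '%s\n' datanode01.local datanode02.local > "$conf_dir/slaves"

    # core-site.xml: where clients find HDFS (port 9000 is an assumption).
    cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.local:9000</value>
  </property>
</configuration>
EOF

    # hdfs-site.xml: two copies of each block, matching two datanodes.
    cat > "$conf_dir/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
EOF

    # mapred-site.xml: run MapReduce jobs on YARN.
    cat > "$conf_dir/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF

    # yarn-site.xml: resourcemanager lives on the namenode.
    cat > "$conf_dir/yarn-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>namenode.local</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF
}

# Usage: try it in a scratch directory first, then review and copy the files.
#   write_hadoop_conf /tmp/hadoop-conf-sample
```

Writing into a scratch directory first avoids overwriting files that brew installed in HADOOP_CONF_DIR.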
On the namenode, go to your HIVE_CONF_DIR location:
cd /usr/local/Cellar/hive/0.13.0/libexec/conf
Here, we need to create/modify the following set of files:
hive-site.xml
hive-log4j.properties
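For hive-site.xml, the part that matters for a MySQL metastore is the set of JDBC connection properties. The sketch below is an assumption-laden example, not the original file: the database name "metastore" and the function name are invented, while the javax.jdo.option.* property names are the standard Hive ones.

```shell
#!/bin/sh
# write_hive_site DIR PASSWORD
# Write a minimal hive-site.xml pointing the metastore at local MySQL.
# Note: a password containing XML special characters (&, <, >) would need
# escaping, which this sketch does not do.
write_hive_site() {
    conf_dir=$1
    password=$2
    cat > "$conf_dir/hive-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>$password</value>
  </property>
</configuration>
EOF
}

# Usage (scratch directory first, then review and copy into HIVE_CONF_DIR):
#   mkdir -p /tmp/hive-conf-sample
#   write_hive_site /tmp/hive-conf-sample 'change-me'
```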
The Slow cooking
Please find attached a simple script that, if installed on the namenode, can help you copy your config files to ALL datanodes (I call it the config-push)
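For readers without the attachment, here is a minimal sketch of what such a config-push could look like. It is not the author's script. It assumes keyless SSH is already set up, that Hadoop sits at the same Homebrew path on every machine, and that the slaves file lists the datanodes; the function name is invented.

```shell
#!/bin/sh
# Default config location for the Homebrew install used in this POC.
HADOOP_CONF_DIR=/usr/local/Cellar/hadoop/2.4.0/libexec/etc/hadoop

# config_push [DIR]: copy the config files in DIR to the same path on
# every host listed in DIR/slaves.
config_push() {
    conf_dir=${1:-$HADOOP_CONF_DIR}
    # Skip blank lines and comments in the slaves file.
    grep -v -e '^[[:space:]]*$' -e '^#' "$conf_dir/slaves" |
    while IFS= read -r host; do
        echo "Pushing config to $host"
        # </dev/null keeps scp from consuming the remaining host list.
        scp -q "$conf_dir"/*.xml "$conf_dir"/*.properties "$conf_dir/slaves" \
            "hadoopuser@$host:$conf_dir/" </dev/null ||
            echo "FAILED: $host" >&2
    done
}

# Usage (on the namenode):
#   config_push
```

Reading the host list from the slaves file means a new datanode only has to be added in one place.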
Please find attached another simple script that I use for rebooting all the datanodes.
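A reboot helper can follow the same pattern. Again, this is a hypothetical sketch rather than the attached script. It relies on the keyless SSH login and the NOPASSWD sudoers entry configured earlier, and the function name is invented.

```shell
#!/bin/sh
# reboot_datanodes SLAVES_FILE: reboot every host listed in SLAVES_FILE.
reboot_datanodes() {
    slaves_file=$1
    # Skip blank lines and comments.
    grep -v -e '^[[:space:]]*$' -e '^#' "$slaves_file" |
    while IFS= read -r host; do
        echo "Rebooting $host"
        # -n stops ssh from reading stdin and swallowing the host list.
        ssh -n "hadoopuser@$host" 'sudo shutdown -r now' ||
            echo "FAILED: $host" >&2
    done
}

# Usage (on the namenode):
#   reboot_datanodes /usr/local/Cellar/hadoop/2.4.0/libexec/etc/hadoop/slaves
```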
The Plating
You may wish to take the following next steps:
• Install ZooKeeper
• Configure and run journalnodes
• Go for a High Availability cluster implementation with multiple Namenodes
Leave feedback if you would like to see the Hadoop configuration samples
The Garnishing
Disclaimer: Don’t sue me for any damage/infringement, I am not rich