Yassmeen Abu Hasson - 2555788 Extra Lab 2
Part #1: Install Hadoop

Step 1: Install Homebrew
$ ruby -e "$(curl -fsSL https://raw.github.com/mxcl/homebrew/go/install)"
The command line tools are installed automatically.
Step 2: Install Hadoop

Step 3: Configure Hadoop
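The install command for this step appears only in the screenshot; assuming the Homebrew of that era, it would have been along these lines:

```shell
# Install Hadoop via Homebrew (formula name assumed; at the time this
# lab was written, the formula pulled in the Hadoop 1.x line)
brew install hadoop
```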
Note: this installed Hadoop version 1.2.1.
Step 3: Continue…

Add the following line to conf/hadoop-env.sh:

export HADOOP_OPTS="-Djava.security.krb5.realm= -Djava.security.krb5.kdc="
Add the following lines to conf/core-site.xml inside the configuration tags:

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
Add the following lines to conf/hdfs-site.xml inside the configuration tags:

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
Add the following lines to conf/mapred-site.xml inside the configuration tags:

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
</property>
Step 4: Enable SSH to localhost

Go to System Preferences > Sharing and make sure "Remote Login" is checked.

$ ssh-keygen -t rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
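ssh-keygen normally prompts for a key path and passphrase; a non-interactive sketch of the same two commands (the /tmp paths here are illustrative scratch locations, not the real ~/.ssh files):

```shell
# Clear any earlier scratch key so ssh-keygen does not prompt to overwrite
rm -f /tmp/lab_id_rsa /tmp/lab_id_rsa.pub
# -N "" = empty passphrase, -f = key file path, -q = quiet
ssh-keygen -t rsa -N "" -f /tmp/lab_id_rsa -q
# Append the public key to an authorized_keys file (scratch copy here)
cat /tmp/lab_id_rsa.pub >> /tmp/lab_authorized_keys
```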
Step 5: Format the Hadoop filesystem

$ bin/hadoop namenode -format
Step 6: Start Hadoop

$ bin/start-all.sh

Make sure that all Hadoop processes are running:

$ jps
Run a Hadoop example:
$ bin/hadoop jar /usr/local/Cellar/hadoop/1.2.1/libexec/hadoop-examples-1.2.1.jar pi 10 100
Step 7: Verify that Hadoop started properly (the output must be 6: the five Hadoop daemon processes plus the grep process itself):

$ ps ax | grep hadoop | wc -l
Part #2: Install and Run Eclipse
Step 1: Create a Java project (wordcount)

Step 2: Configure the project
- Select the project WordCount in the Package Explorer.
- Select File > Properties.
- Select Java Build Path.
- Select Libraries.
- Press Add External JARs and select the following file:
Step 3: Add a Java class to the project (WordCount.java)

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "WordCount");
        job.setJarByClass(WordCount.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(Map.class);
        job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
    }
}
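The Mapper/Reducer pair above amounts to tokenize-and-count. As a sanity check of that logic, the same counting can be sketched in plain Java without the Hadoop runtime (class and method names here are illustrative, not part of the lab):

```java
import java.util.*;

// Plain-Java sketch of the tokenize-and-count logic that the
// Map and Reduce classes implement together.
public class WordCountSketch {
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        // Same tokenization as the Mapper: split on whitespace
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            // Same accumulation as the Reducer: sum one per occurrence
            counts.merge(tokenizer.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> c = count("to be or not to be");
        System.out.println(c.get("to") + " " + c.get("be") + " " + c.get("or"));
    }
}
```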
Step 4: Configure the application

* Select the project WordCount in the Package Explorer.
* Select Run > Run Configurations…
* Select Java Application.
* Press the icon for New launch configuration.
* Enter wordcount as the name.
* Enter WordCount as the main class.
* Select Arguments.
* Add the following line to Program arguments: input output
* Add the following line to VM arguments: -Djava.security.krb5.realm= -Djava.security.krb5.kdc=
* Press Apply.
* Press Close.
Step 5: Create input files

$ cd ~/Documents/workspace/WordCount
$ mkdir input
$ curl http://www.gutenberg.org/cache/epub/1342/pg1342.txt > input/pg1342.txt
$ curl http://www.gutenberg.org/cache/epub/4300/pg4300.txt > input/pg4300.txt
$ curl http://www.gutenberg.org/cache/epub/5000/pg5000.txt > input/pg5000.txt
$ curl http://www.gutenberg.org/cache/epub/20417/pg20417.txt > input/pg20417.txt
Step 6: Run the application

This will create the output files _SUCCESS and part-r-00000 in a folder named output.
Examine the output:

$ cat output/part-r-00000
- Create my own file and run it using the WordCount program:
* First, I removed the output folder before rerunning the application:

$ rm -rf output
* This is the output I got when I ran my own file in Eclipse: