Yassmeen Abu Hasson - 2555788 Extra Lab 2
Part #1: Install Hadoop

Step 1: Install Homebrew
$ ruby -e "$(curl -fsSL https://raw.github.com/mxcl/homebrew/go/install)"
The command line tools are installed automatically.
Step 2: Install Hadoop

Step 3: Configure Hadoop
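The install command for this step appears only in the screenshot; assuming the Homebrew of that era, it would have been along these lines:

```shell
# Install Hadoop via Homebrew (formula name assumed; at the time this
# lab was written, the formula pulled in the Hadoop 1.x line)
brew install hadoop
```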
Note: this installed Hadoop version 1.2.1.
Step 3: Continue…

Add the following line to conf/hadoop-env.sh:

export HADOOP_OPTS="-Djava.security.krb5.realm= -Djava.security.krb5.kdc="
Add the following lines to conf/core-site.xml inside the configuration tags:

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
Add the following lines to conf/hdfs-site.xml inside the configuration tags:

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
Add the following lines to conf/mapred-site.xml inside the configuration tags:

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
</property>
Step 4: Enable SSH to localhost

Go to System Preferences > Sharing and make sure "Remote Login" is checked.

$ ssh-keygen -t rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
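ssh-keygen normally prompts for a key path and passphrase; a non-interactive sketch of the same two commands (the /tmp paths here are illustrative scratch locations, not the real ~/.ssh files):

```shell
# Clear any earlier scratch key so ssh-keygen does not prompt to overwrite
rm -f /tmp/lab_id_rsa /tmp/lab_id_rsa.pub
# -N "" = empty passphrase, -f = key file path, -q = quiet
ssh-keygen -t rsa -N "" -f /tmp/lab_id_rsa -q
# Append the public key to an authorized_keys file (scratch copy here)
cat /tmp/lab_id_rsa.pub >> /tmp/lab_authorized_keys
```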
Step 5: Format the Hadoop filesystem

$ bin/hadoop namenode -format
Step 6: Start Hadoop

$ bin/start-all.sh

Make sure that all Hadoop processes are running:

$ jps
Run a Hadoop example:
$ bin/hadoop jar /usr/local/Cellar/hadoop/1.2.1/libexec/hadoop-examples-1.2.1.jar pi 10 100
Step 7: Verify that Hadoop started properly (the output must be 6: the five Hadoop daemon processes plus the grep process itself):

$ ps ax | grep hadoop | wc -l
Part #2: Install and Run Eclipse
Step 1: Create a Java project (wordcount)

Step 2: Configure the project
- Select the project WordCount in the Package Explorer.
- Select File > Properties.
- Select Java Build Path.
- Select Libraries.
- Press Add External JARs and select the following file:
Step 3: Add a Java class to the project (WordCount.java)

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "WordCount");
        job.setJarByClass(WordCount.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(Map.class);
        job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
    }
}
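The Mapper/Reducer pair above amounts to tokenize-and-count. As a sanity check of that logic, the same counting can be sketched in plain Java without the Hadoop runtime (class and method names here are illustrative, not part of the lab):

```java
import java.util.*;

// Plain-Java sketch of the tokenize-and-count logic that the
// Map and Reduce classes implement together.
public class WordCountSketch {
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        // Same tokenization as the Mapper: split on whitespace
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            // Same accumulation as the Reducer: sum one per occurrence
            counts.merge(tokenizer.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> c = count("to be or not to be");
        System.out.println(c.get("to") + " " + c.get("be") + " " + c.get("or"));
    }
}
```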
Step 4: Configure the application

* Select the project WordCount in the Package Explorer.
* Select Run > Run Configurations…
* Select Java Application.
* Press the icon for New launch configuration.
* Enter wordcount as the name.
* Enter WordCount as the main class.
* Select Arguments.
* Add the following line to Program arguments: input output
* Add the following line to VM arguments: -Djava.security.krb5.realm= -Djava.security.krb5.kdc=
* Press Apply.
* Press Close.
Step 5: Create input files

$ cd ~/Documents/workspace/WordCount
$ mkdir input
$ curl http://www.gutenberg.org/cache/epub/1342/pg1342.txt > input/pg1342.txt
$ curl http://www.gutenberg.org/cache/epub/4300/pg4300.txt > input/pg4300.txt
$ curl http://www.gutenberg.org/cache/epub/5000/pg5000.txt > input/pg5000.txt
$ curl http://www.gutenberg.org/cache/epub/20417/pg20417.txt > input/pg20417.txt
Step 6: Run the application

This will create the output files _SUCCESS and part-r-00000 in a folder named output.
Examine the output:

$ cat output/part-r-00000
- Create my own file and run it using the WordCount program:
* First, I removed the output folder before rerunning the application:

$ rm -rf output
* This is the output I got when I ran my own file in Eclipse: