cis.csuohio.edu/~sschung/cis612/Lab_MRHadoop_Um.pdf

Yassmeen Abu Hasson - 2555788
Extra Lab 2

Part #1: Install Hadoop

Step 1: Install Homebrew

$ ruby -e "$(curl -fsSL https://raw.github.com/mxcl/homebrew/go/install)"

The command line tools were installed automatically.

Step 2: Installing Hadoop

Note: The installed version is Hadoop 1.2.1.

Step 3: Configure Hadoop

Step 3 (continued)

Add the following line to conf/hadoop-env.sh:

export HADOOP_OPTS="-Djava.security.krb5.realm= -Djava.security.krb5.kdc="

Add the following lines to conf/core-site.xml inside the configuration tags:

<property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
</property>

Add the following lines to conf/hdfs-site.xml inside the configuration tags:

<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>

Add the following lines to conf/mapred-site.xml inside the configuration tags:

<property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
</property>
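A typo in one of these XML files is easy to miss until a daemon fails to start. As a rough, optional check, the sketch below parses a Hadoop-style configuration fragment with the JDK's built-in XML parser and lists its name/value pairs. The class name HadoopConfigSketch and the method readProperties are hypothetical names chosen for this illustration; they are not part of Hadoop, and the sample string simply repeats the mapred-site.xml property shown above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Illustration only: a hypothetical helper, not a Hadoop API.
class HadoopConfigSketch {

    // Parses <property><name>..</name><value>..</value></property> entries
    // and returns them in document order.
    static Map<String, String> readProperties(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                String name = property.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = property.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            // Malformed XML or a property without name/value ends up here.
            throw new IllegalArgumentException("Could not read configuration: " + e, e);
        }
    }

    public static void main(String[] args) {
        // Same content as the mapred-site.xml fragment above.
        String xml = "<configuration>"
                + "<property>"
                + "<name>mapred.job.tracker</name>"
                + "<value>localhost:9001</value>"
                + "</property>"
                + "</configuration>";
        readProperties(xml).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

The same check could be applied to the contents of conf/core-site.xml and conf/hdfs-site.xml by passing their text to readProperties.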

Step 4: Enable SSH to localhost

Go to System Preferences > Sharing. Make sure “Remote Login” is checked.

$ ssh-keygen -t rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Step 5: Format Hadoop filesystem

$ bin/hadoop namenode -format

Step 6: Start Hadoop

$ bin/start-all.sh

Make sure that all Hadoop processes are running:

$ jps

Run a Hadoop example:

$ bin/hadoop jar /usr/local/Cellar/hadoop/1.2.1/libexec/hadoop-examples-1.2.1.jar pi 10 100

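Step 7: Verify that Hadoop started properly. The output must be 6 (typically the five Hadoop 1.x daemons, NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker, plus the grep process itself):

$ ps ax | grep hadoop | wc -l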

Part #2: Install and Run Eclipse

Step 1: Create a Java project (wordcount)

Step 2: Configure the project

- Select the project WordCount in the Package Explorer.
- Select File > Properties.
- Select Java Build Path.
- Select Libraries.
- Press Add External JARs and select the following file:

Step 3: Add a Java class to the project (WordCount.java)

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every whitespace-separated token of a line.
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sums the counts received for each word.
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "WordCount");
        job.setJarByClass(WordCount.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // The reducer doubles as a combiner because summing is associative.
        job.setMapperClass(Map.class);
        job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        // args[0] is the input folder, args[1] the output folder.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);
    }
}
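To make the data flow of WordCount.java easier to follow, here is a minimal plain-Java sketch of what its map and reduce steps compute. It uses no Hadoop classes, so it runs on its own. The class name WordCountSketch, the methods mapPhase and reducePhase, and the two sample sentences are all invented for this illustration. It mimics only the logic, not how Hadoop distributes or shuffles the work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Illustration only: no Hadoop classes are used here.
class WordCountSketch {

    // "Map" step: emit one (word, 1) pair per whitespace-separated token,
    // using the same StringTokenizer splitting as WordCount.Map.
    static List<Map.Entry<String, Integer>> mapPhase(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                pairs.add(Map.entry(tokenizer.nextToken(), 1));
            }
        }
        return pairs;
    }

    // "Reduce" step: sum the values that share a key.
    // A TreeMap keeps the words in sorted order.
    static Map<String, Integer> reducePhase(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample input, standing in for the lines of an input file.
        List<String> lines = List.of("the quick brown fox", "the lazy dog and the fox");
        Map<String, Integer> counts = reducePhase(mapPhase(lines));
        // Print one word<TAB>count pair per line.
        counts.forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

Because the tokenizer splits on whitespace only, words that differ in case or carry punctuation (for example "Fox" and "fox,") are counted as separate keys, both here and in WordCount.java.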

 

Step 4: Configure the application

* Select the project WordCount in the Package Explorer.
* Select Run > Run Configurations…
* Select Java Application.
* Press the icon for New launch configuration.
* Enter wordcount as the name.
* Enter WordCount as the main class.
* Select Arguments.
* Add the following line to Program arguments:

input output

* Add the following line to VM arguments:

-Djava.security.krb5.realm= -Djava.security.krb5.kdc=

* Press Apply.
* Press Close.

Step 5: Create input files

$ cd ~/Documents/workspace/WordCount
$ mkdir input
$ curl http://www.gutenberg.org/cache/epub/1342/pg1342.txt > input/pg1342.txt
$ curl http://www.gutenberg.org/cache/epub/4300/pg4300.txt > input/pg4300.txt
$ curl http://www.gutenberg.org/cache/epub/5000/pg5000.txt > input/pg5000.txt
$ curl http://www.gutenberg.org/cache/epub/20417/pg20417.txt > input/pg20417.txt

Step 6: Run the application

Running the application creates the output files _SUCCESS and part-r-00000 in a folder named output.

Examine the output:

$ cat output/part-r-00000
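The part-r-00000 file written by TextOutputFormat holds one word and its count per line, separated by a tab, so a long result is hard to scan with cat alone. The sketch below shows one possible way to list the most frequent words. The class TopWordsSketch, its nested WordCountEntry type, and the method topWords are hypothetical names used only for this illustration, and the built-in sample lines are made up.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustration only: a hypothetical helper for reading WordCount output.
class TopWordsSketch {

    // One parsed output line of the form word<TAB>count.
    static final class WordCountEntry {
        final String word;
        final int count;

        WordCountEntry(String word, int count) {
            this.word = word;
            this.count = count;
        }
    }

    // Returns up to n entries, highest count first; ties are ordered by word.
    static List<WordCountEntry> topWords(List<String> lines, int n) {
        List<WordCountEntry> entries = new ArrayList<>();
        for (String line : lines) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue; // skip lines that are not word<TAB>count
            }
            String word = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            entries.add(new WordCountEntry(word, count));
        }
        entries.sort(Comparator.comparingInt((WordCountEntry e) -> e.count).reversed()
                .thenComparing((WordCountEntry e) -> e.word));
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }

    public static void main(String[] args) throws IOException {
        // Pass a path such as output/part-r-00000 as the first argument;
        // without arguments, made-up sample lines are used instead.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : List.of("the\t3", "fox\t2", "dog\t1");
        for (WordCountEntry entry : topWords(lines, 10)) {
            System.out.println(entry.count + "\t" + entry.word);
        }
    }
}
```

Run from the project folder, it could be pointed at the real result with output/part-r-00000 as its only argument.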

                                                               

 

- Create my own file and run it using the WordCount program:

* First, I removed the output folder before rerunning the application:

$ rm -rf output

* This is the output I got when I ran my own file in Eclipse
