Installation and setup hadoop published
DIPENDRA KUSI 2/1/17
https://www.linkedin.com/in/er-dipendra-kusi-b3674193
HADOOP SETUP
Installation and setup Hadoop
Step 1: Go to the VirtualBox site and download VirtualBox:
https://www.virtualbox.org/wiki/Downloads
Step 2: Go to the Cloudera site and download the Cloudera QuickStart VM:
http://www.cloudera.com/downloads/quickstart_vms/5-8.html
Step 3: Run the Cloudera VM in VirtualBox.
Step 4:
Now check from the terminal that Hadoop is installed and available:
$ hadoop version
Step 5:
Also, check the Hadoop configuration through the browser.
Step 6:
Now go to the site:
http://tiny.cloudera.com/hadoopTutorialSample
Download the word count source code from there and extract it.
Step 7:
Now open a terminal in the directory that contains wordcount.jar.
Create your own folders for the input data:
$ hadoop fs -mkdir /user/cloudera/Hadoop_data /user/cloudera/Hadoop_data/input
Step 8:
Now put the file to be processed into the /user/cloudera/Hadoop_data/input folder:
$ hadoop fs -put file0 /user/cloudera/Hadoop_data/input
Step 9:
Now run the word count jar in Hadoop to count the words in file0:
$ hadoop jar wordcount.jar /user/cloudera/Hadoop_data/input /user/cloudera/Hadoop_data/output
Running this command fails with a "ClassNotFoundException". This means the jar file does not
explicitly define its main class, so name the class to run, org.myorg.WordCount, on the command line.
Now wordcount.jar runs in Hadoop:
$ hadoop jar wordcount.jar org.myorg.WordCount /user/cloudera/Hadoop_data/input /user/cloudera/Hadoop_data/output
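The ClassNotFoundException follows from how "hadoop jar" picks the class to run: if the jar's manifest has a Main-Class entry, that class is used; otherwise the first argument after the jar name is taken as the class name, so an HDFS path in that position is looked up as a class and not found. The sketch below is a simplified model of that rule in plain Java, not Hadoop's own code; the class MainClassResolution and the method resolveMainClass are hypothetical names used only for illustration.

```java
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Simplified model of main-class selection; hypothetical names, not Hadoop source code.
final class MainClassResolution {

    // Returns the class name that would be launched for the given manifest and arguments.
    static String resolveMainClass(Manifest manifest, String[] args) {
        String declared = manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        if (declared != null) {
            return declared.trim(); // entry point stored in the jar itself
        }
        if (args.length == 0) {
            throw new IllegalArgumentException("No Main-Class in manifest and no class argument");
        }
        return args[0]; // first argument is treated as the class name
    }

    public static void main(String[] unused) {
        String[] pathsOnly = {
            "/user/cloudera/Hadoop_data/input", "/user/cloudera/Hadoop_data/output"};
        String[] withClass = {
            "org.myorg.WordCount",
            "/user/cloudera/Hadoop_data/input", "/user/cloudera/Hadoop_data/output"};

        // Jar without a Main-Class entry: the input path is mistaken for a class name.
        Manifest noMainClass = new Manifest();
        System.out.println(resolveMainClass(noMainClass, pathsOnly));
        // Same jar, class named explicitly as the first argument.
        System.out.println(resolveMainClass(noMainClass, withClass));

        // Jar whose manifest names the entry point: no class argument is needed.
        Manifest withEntryPoint = new Manifest();
        withEntryPoint.getMainAttributes().put(Attributes.Name.MAIN_CLASS, "WordCount");
        System.out.println(resolveMainClass(withEntryPoint, pathsOnly));
    }
}
```

The first call returns the input path, which is the value Hadoop then fails to load as a class; the other two calls return a real class name.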
Step 10:
Now check the contents of the output:
$ hadoop fs -cat /user/cloudera/Hadoop_data/output/*
So, the output contains the word counts as expected.
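If the counts need further processing, the printed result can be read back line by line. The sketch below assumes the job uses Hadoop's default text output, where each line is a word, a tab character, and its count. The class WordCountOutput, its methods, and the sample lines in main are hypothetical and are not taken from the real job output.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for reading word count results; assumes "word<TAB>count" lines.
final class WordCountOutput {

    // Parses output lines into a map that keeps the original line order.
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                throw new IllegalArgumentException("Not a word<TAB>count line: " + line);
            }
            String word = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            counts.put(word, count);
        }
        return counts;
    }

    // Returns the word with the highest count (the first one wins on ties), or null if empty.
    static String mostFrequent(Map<String, Integer> counts) {
        String best = null;
        int bestCount = Integer.MIN_VALUE;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up sample lines in the assumed format.
        List<String> sample = List.of("Hadoop\t2", "Hello\t1", "hadoop\t1");
        Map<String, Integer> counts = parse(sample);
        System.out.println(counts);
        System.out.println("Most frequent: " + mostFrequent(counts));
    }
}
```

In practice the lines would come from saving the output of the -cat command above to a local file and reading it with java.nio.file.Files.readAllLines.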
Create Jar file and run in Hadoop
Step 11: Now let's create a Java file in Eclipse, export it to a jar, and run it in Hadoop.
First create the project Hadoop_first_project in Eclipse.
Now create the WordCount class and paste in the code below:
import java.io.IOException;
import java.util.regex.Pattern;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.log4j.Logger;

public class WordCount extends Configured implements Tool {

    private static final Logger LOG = Logger.getLogger(WordCount.class);

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new WordCount(), args);
        System.exit(res);
    }

    // args[0] is the HDFS input path, args[1] the HDFS output path (must not exist yet)
    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "wordcount");
        job.setJarByClass(this.getClass());
        // Use TextInputFormat, the default unless job.setInputFormatClass is used
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    // Mapper: emits (word, 1) for every non-empty token of each input line
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private static final Pattern WORD_BOUNDARY = Pattern.compile("\\s*\\b\\s*");

        @Override
        public void map(LongWritable offset, Text lineText, Context context)
                throws IOException, InterruptedException {
            String line = lineText.toString();
            Text currentWord = new Text();
            for (String word : WORD_BOUNDARY.split(line)) {
                if (word.isEmpty()) {
                    continue;
                }
                currentWord = new Text(word);
                context.write(currentWord, one);
            }
        }
    }

    // Reducer: sums the counts emitted for each word
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text word, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable count : counts) {
                sum += count.get();
            }
            context.write(word, new IntWritable(sum));
        }
    }
}
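To see what the mapper and reducer compute without a cluster, the same logic can be imitated in plain Java. This is only a sketch for illustration and not part of the Eclipse project: it reuses the WORD_BOUNDARY pattern from the Map class, while the class LocalWordCount, its methods, and the sample lines are hypothetical. It also shows why the mapper skips empty strings, since splitting on a word boundary produces them between tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hadoop-free imitation of the word count logic; hypothetical helper for illustration.
final class LocalWordCount {

    // Same pattern as in the Map class of the job.
    private static final Pattern WORD_BOUNDARY = Pattern.compile("\\s*\\b\\s*");

    // "Map" step: split one line into tokens, dropping the empty strings the split produces.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        for (String token : WORD_BOUNDARY.split(line)) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // "Shuffle" and "reduce" steps: group equal tokens and sum one per occurrence.
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by key for stable printing
        for (String line : lines) {
            for (String token : tokenize(line)) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up input lines; the job would read them from the HDFS input folder.
        List<String> lines = List.of("Hello Hadoop", "hello hadoop Hadoop");
        countWords(lines).forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

Counting is case-sensitive, as in the job itself, so "Hadoop" and "hadoop" are reported separately.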
Here the Hadoop libraries are missing from the project, so Eclipse reports errors; let's add the required libraries.
Go to the project properties:
Go to Java Build Path and then Libraries.
Now click Add External JARs and add the jars from the following location:
File System -> usr -> lib -> hadoop
Add all the jar files found there.
Go to the client-0.20 folder and add all the jars from there as well.
Go to the lib folder and add all the jars from there as well.
Click OK. All the errors will disappear.
Now export the project to a jar file.
Right-click on the project -> Export
Click on JAR file -> Next
Now select the project, choose the export location of the jar file, and click Next and then Next again.
Click Browse to select the main class.
Click OK -> Finish
Now go to the location of the exported mywordcount.jar.
Delete the output folder that was created previously:
$ hadoop fs -rm -r /user/cloudera/Hadoop_data/output
Then run the jar in Hadoop (there is no need to name the class, since the main class was already set as
the entry point during the export):
$ hadoop jar mywordcount.jar /user/cloudera/Hadoop_data/input /user/cloudera/Hadoop_data/output
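The entry point chosen during the export is stored as a Main-Class line in the jar's META-INF/MANIFEST.MF file. A small check like the one below can confirm that a jar carries it before the jar is run. This is a sketch using only the standard java.util.jar API; the class JarEntryPoint, its methods, and the demo jar it writes are hypothetical, and the class name WordCount assumes the class was created in the default package as in the listing above.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Hypothetical helper for inspecting a jar's entry point; not part of the tutorial project.
final class JarEntryPoint {

    // Returns the Main-Class value from the jar's manifest, or null if none is declared.
    static String mainClassOf(Path jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Manifest manifest = jarFile.getManifest();
            return manifest == null
                    ? null
                    : manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    // Writes an otherwise empty jar whose manifest names the given main class (demo only).
    static Path writeDemoJar(String mainClass) throws IOException {
        Manifest manifest = new Manifest();
        // The manifest version must be set, or the main attributes are not written out.
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        manifest.getMainAttributes().put(Attributes.Name.MAIN_CLASS, mainClass);
        Path jar = Files.createTempFile("demo-wordcount", ".jar");
        try (OutputStream out = Files.newOutputStream(jar);
             JarOutputStream jarOut = new JarOutputStream(out, manifest)) {
            // No class files are added; only the manifest matters for this check.
        }
        return jar;
    }

    public static void main(String[] args) throws IOException {
        // Pass a real jar path (for example mywordcount.jar) or fall back to a demo jar.
        Path jar = args.length > 0 ? Paths.get(args[0]) : writeDemoJar("WordCount");
        System.out.println("Main-Class: " + mainClassOf(jar));
    }
}
```

Run against the wordcount.jar downloaded earlier, this would be expected to print null, which matches the need to pass org.myorg.WordCount on the command line for that jar.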