Pig Setup and Test run
By Kannan Kalidasan
Pig Introduction
Pig is a data flow language (Pig Latin) for writing Hadoop operations without writing MapReduce Java code. Pig is a layer of abstraction on top of Hadoop that simplifies its use by giving a SQL-like interface for processing data on Hadoop.
It helps increase productivity because you do not have to write many lines of Java code. It supports a variety of data types and also supports user-defined functions (UDFs) for writing custom operations in Java, Python and JavaScript.
To learn Pig, I recommend the book Programming Pig by Alan Gates.
The author explains the concepts in a clear and simple way.
The Pig prompt is called Grunt (pig grunts …). Running pig without a script opens it:
$ pig
grunt>
A Pig session has two modes.
Local Mode: runs on a single machine. All files are read from and written to your local host and local file system. This mode helps you debug a Pig script before running it on the cluster. The -x flag specifies the mode:
pig -x local
MapReduce Mode: requires access to a Hadoop cluster and an HDFS installation. MapReduce mode is the default. To add the Hadoop configuration directory to Pig's classpath:
export PIG_CLASSPATH=$HADOOP_HOME/conf/
Both commands below are equivalent and start the Pig session in MapReduce mode:
pig
pig -x mapreduce
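To make the two modes concrete, here is a small Python sketch that only assembles the command line and environment for each mode. pig_invocation is a hypothetical helper, not part of Pig, and the /usr/local/hadoop path used in the check is an assumption based on the directory listing shown later in these slides.

```python
import os
import posixpath

# Values accepted by the -x flag in these slides; mapreduce is the default mode.
VALID_MODES = ("local", "mapreduce")


def pig_invocation(script=None, mode="mapreduce", hadoop_home=None, base_env=None):
    """Return (argv, env) to start Pig in the given mode (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown Pig mode: {mode!r}")
    argv = ["pig", "-x", mode]
    if script is not None:
        argv.append(script)  # run a script file; without it, Pig opens the Grunt shell
    env = dict(os.environ if base_env is None else base_env)
    if mode == "mapreduce" and hadoop_home:
        # Equivalent of: export PIG_CLASSPATH=$HADOOP_HOME/conf/
        env["PIG_CLASSPATH"] = posixpath.join(hadoop_home, "conf") + "/"
    return argv, env


argv, env = pig_invocation("pig_test.pig", mode="local", base_env={})
print(argv)  # ['pig', '-x', 'local', 'pig_test.pig']
```

To actually launch Pig, both values can be passed to subprocess.run(argv, env=env); keeping the builder separate from the launch makes the mode logic easy to check without a cluster.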
Notes to Remember ...
● Hadoop services must be running before you start Pig in MapReduce mode, so that it can connect to HDFS and we can proceed with our work.
● Pig translates Pig Latin scripts into MapReduce jobs internally and runs them on the Hadoop cluster.
● In MapReduce mode, Pig reads input files from HDFS only and stores the results back to HDFS.
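To picture what "translates into MapReduce jobs" means, the sketch below hand-codes the map, shuffle and reduce phases of a year count in plain Python. It is only a conceptual model, not the code Pig generates, and it runs locally without Hadoop; the input lines are the sample values used in the test run later in these slides.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit (year, 1) for each input line, as a hand-written Mapper would."""
    for line in lines:
        year = line.split(";")[0]  # first ';'-separated field
        yield year, 1


def shuffle_phase(pairs):
    """Shuffle/sort: gather all values for the same key, as Hadoop does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    """Reduce: sum the ones for each key and format the result as key:count."""
    return [f"{key}:{sum(values)}" for key, values in groups.items()]


lines = ['"2006";', '"2007";', '"2008";', '"2008";', '"2008";', '"2008";', '"2007";']
print(reduce_phase(shuffle_phase(map_phase(lines))))
# ['"2006":1', '"2007":2', '"2008":4']
```

Writing and packaging this as Java Mapper and Reducer classes is the work that the four-line Pig script in the test run avoids.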
Pig Installation
1. Download the stable version of the tarball.
http://mirror.nexcess.net/apache/pig/pig-0.12.0/
pig-0.12.0.tar.gz
Release notes link
http://pig.apache.org/releases.html#Download
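The download URL follows a simple pattern, so a short Python sketch can build it for any version. This assumes the pig-<version>/pig-<version>.tar.gz layout shown in the mirror link above. pig_tarball_url is a hypothetical helper, and the mirror base is copied from that link purely as an example, since mirrors change over time.

```python
from urllib.parse import urljoin

# Base copied from the mirror link above; treat it as an example, mirrors come and go.
MIRROR_BASE = "http://mirror.nexcess.net/apache/pig/"


def pig_tarball_url(version, base=MIRROR_BASE):
    """Build .../pig-<version>/pig-<version>.tar.gz under the given base URL."""
    if not base.endswith("/"):
        base += "/"  # without the slash, urljoin would replace the last path segment
    name = f"pig-{version}"
    return urljoin(base, f"{name}/{name}.tar.gz")


print(pig_tarball_url("0.12.0"))
# http://mirror.nexcess.net/apache/pig/pig-0.12.0/pig-0.12.0.tar.gz
```

If a mirror no longer carries an older release, the Apache archive (archive.apache.org/dist/pig/) can usually be passed as base instead; the file itself can then be fetched with urllib.request.urlretrieve(url, "pig-0.12.0.tar.gz").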
Pig Installation ...
2. Copy the downloaded package to /usr/local. (The listing and commands below use pig-0.11.1; substitute the version you downloaded.)
kannan@kannandreams:/usr/local$ ls -ltr
total 119460
-rwxr-xr-x 1 root root 63851630 Nov 11 02:11 hadoop-1.2.1.tar.gz
drwxr-xr-x 16 hduser hadoop 4096 Nov 11 23:47 hadoop
-rwxrwxrwx 1 root root 58433159 Dec 3 00:55 pig-0.11.1.tar.gz
kannan@kannandreams:/usr/local$
Pig Installation ...
3. Extract the tarball, rename the directory and change its owner.
sudo tar xzf pig-0.11.1.tar.gz
sudo mv pig-0.11.1 pig
sudo chown -R hduser:hadoop pig
The chown command changes the owner of the pig directory from root to the Hadoop user hduser (group hadoop).
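For readers who script their setup, the same three commands can be sketched with Python's standard library. install_pig is a hypothetical helper; the chown part only runs when an owner or group is given and, like the sudo in the original commands, normally needs root.

```python
import os
import shutil
import tarfile


def install_pig(tarball, dest_dir, version, owner=None, group=None):
    """Extract the Pig tarball into dest_dir, rename pig-<version> to pig, optionally chown it.

    Python equivalent of: tar xzf ...; mv pig-<version> pig; chown -R owner:group pig
    """
    with tarfile.open(tarball, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest_dir, filter="data")  # safer extraction on newer Pythons
        else:
            tar.extractall(dest_dir)  # older Pythons: only extract archives you trust
    target = os.path.join(dest_dir, "pig")
    os.rename(os.path.join(dest_dir, f"pig-{version}"), target)
    if owner or group:
        # Recursive chown, like chown -R; changing ownership normally requires root.
        shutil.chown(target, owner, group)
        for root, dirs, files in os.walk(target):
            for name in dirs + files:
                shutil.chown(os.path.join(root, name), owner, group)
    return target
```

Run as root, install_pig("pig-0.11.1.tar.gz", "/usr/local", "0.11.1", "hduser", "hadoop") would mirror the three commands above.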
4. Log in as the Hadoop user hduser and set the environment variables.
kannan@kannandreams:/usr/local$ su - hduser
Add the two lines below to the ~/.bashrc file.
export PIG_HOME="/usr/local/pig"
export PATH=$PATH:$PIG_HOME/bin
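Step 4 can also be scripted. The Python sketch below appends the two export lines to a shell startup file only when they are missing, so running it twice does not duplicate them. add_pig_env is a hypothetical helper; it should be run as hduser so it edits that user's ~/.bashrc.

```python
import os

# The two lines from step 4, exactly as they should appear in ~/.bashrc.
PIG_LINES = [
    'export PIG_HOME="/usr/local/pig"',
    "export PATH=$PATH:$PIG_HOME/bin",
]


def add_pig_env(bashrc_path, lines=PIG_LINES):
    """Append any of `lines` not already in the file; return the lines that were added."""
    content = ""
    if os.path.exists(bashrc_path):
        with open(bashrc_path, encoding="utf-8") as f:
            content = f.read()
    existing = set(content.splitlines())
    missing = [line for line in lines if line not in existing]
    if missing:
        with open(bashrc_path, "a", encoding="utf-8") as f:
            if content and not content.endswith("\n"):
                f.write("\n")  # keep the new exports on their own lines
            f.write("\n".join(missing) + "\n")
    return missing
```

Typical use is add_pig_env(os.path.expanduser("~/.bashrc")), followed by sourcing the file as in step 5.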
Pig Installation ...
5. Source the profile file so the changes take effect in the current shell.
hduser@kannandreams:~$ . .bashrc
hduser@kannandreams:~$
6. Check the pig command. The output shown below is truncated.
hduser@kannandreams:~$ pig -help
Warning: $HADOOP_HOME is deprecated.
Apache Pig version 0.11.1 (r1459641)
compiled Mar 22 2013, 02:13:53
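If pig -help reports "command not found", the usual cause is that $PIG_HOME/bin is not on PATH yet. The Python sketch below checks exactly that and locates the pig launcher. pig_bin_on_path and find_pig are hypothetical helpers; the PATH handling assumes a Unix shell (':' separator).

```python
import os
import posixpath
import shutil


def pig_bin_on_path(env):
    """True if $PIG_HOME/bin appears in PATH, i.e. the .bashrc changes took effect."""
    pig_home = env.get("PIG_HOME")
    if not pig_home:
        return False
    pig_bin = posixpath.join(pig_home, "bin")
    return pig_bin in env.get("PATH", "").split(":")


def find_pig(env=None):
    """Return the full path of the pig launcher found on PATH, or None."""
    env = os.environ if env is None else env
    return shutil.which("pig", path=env.get("PATH"))


print(pig_bin_on_path(os.environ), find_pig())
```

Run from the same shell after sourcing ~/.bashrc, this should print True and /usr/local/pig/bin/pig with the paths used in these slides.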
Test Run ...
7. Create a sample file for processing (file name: pigcsv).
The file extension doesn't matter for plain-text input: Pig parses the file with the load function named in the LOAD statement (PigStorage here), not according to the file name.
Sample file: create a file in the HDFS directory /user/hduser/piginput with the contents below, one value per line.
"2006";
"2007";
"2008";
"2008";
"2008";
"2008";
"2007";
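The sample file can be generated locally with a few lines of Python and then copied into HDFS, for example with hadoop fs -mkdir /user/hduser/piginput followed by hadoop fs -put pigcsv /user/hduser/piginput/ (standard Hadoop shell commands; adjust the paths to your setup). write_sample is a hypothetical helper. Its optional header argument is an assumption: it reproduces the "Year" row that the output later in these slides suggests the original file contained.

```python
SAMPLE_YEARS = ["2006", "2007", "2008", "2008", "2008", "2008", "2007"]


def write_sample(path, years=SAMPLE_YEARS, header=None):
    """Write the sample pigcsv file: one double-quoted value per line, each ending with ';'."""
    with open(path, "w", encoding="utf-8") as f:
        if header is not None:
            f.write(f'"{header}";\n')  # optional header row, e.g. header="Year"
        for year in years:
            f.write(f'"{year}";\n')
```

Calling write_sample("pigcsv") in the current directory produces the file shown above. The double quotes are written as part of the data, which is why they also appear in the final output.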
Test Run ...
8. Pig Scripts
Method 1 to run the Pig script: save the Pig statements in a file named <<filename>>.pig (in my case, pig_test.pig) and run it with
$ pig -x mapreduce pig_test.pig
OR
$ pig pig_test.pig
SampleRecord = LOAD '/user/hduser/piginput/pigcsv' USING PigStorage(';') AS (Year:chararray);
GroupByYear = GROUP SampleRecord BY Year;
CountByYear = FOREACH GroupByYear
    GENERATE CONCAT((chararray)$0, CONCAT(':', (chararray)COUNT($1)));
STORE CountByYear INTO '/user/hduser/pigoutput' USING PigStorage('\t');
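Before submitting the script to the cluster, it helps to know what it should produce. The Python sketch below models the script's logic on an in-memory list of lines. run_year_count is a hypothetical name, and the sorted output order mirrors the run shown in these slides rather than something Pig guarantees in general.

```python
from collections import Counter


def run_year_count(lines, delimiter=";"):
    """Plain-Python model of the Pig script above (LOAD, GROUP, FOREACH ... GENERATE)."""
    # LOAD ... USING PigStorage(';') AS (Year:chararray): keep the first field of each line.
    # PigStorage does not strip quotes, so "2006" keeps its double quotes.
    years = [line.rstrip("\n").split(delimiter)[0] for line in lines]
    # GROUP SampleRecord BY Year, then COUNT($1): number of records per distinct Year.
    counts = Counter(years)
    # GENERATE CONCAT($0, CONCAT(':', COUNT($1))): one "key:count" string per group.
    return [f"{year}:{count}" for year, count in sorted(counts.items())]


lines = ['"Year";', '"2006";', '"2007";', '"2008";', '"2008";', '"2008";', '"2008";', '"2007";']
print(run_year_count(lines))
# ['"2006":1', '"2007":2', '"2008":4', '"Year":1']
```

Because GENERATE produces a single field, the tab delimiter in STORE ... USING PigStorage('\t') never appears in this output; it would only separate fields if the script generated more than one.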
Test Run ...
Method 2 to run the Pig script: type the statements at the Grunt prompt. A statement ends at the semicolon (;), so one statement can span several lines; Grunt shows the >> prompt while it waits for the rest.
grunt> SampleRecord = LOAD '/user/hduser/piginput/pigcsv'
>> USING PigStorage(';') AS (Year:chararray);
grunt> GroupByYear = GROUP SampleRecord BY Year;
grunt> CountByYear = FOREACH GroupByYear
>> GENERATE CONCAT((chararray)$0, CONCAT(':', (chararray)COUNT($1)));
grunt> STORE CountByYear
>> INTO '/user/hduser/pigoutput' USING PigStorage('\t');
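The "statement ends at ;" rule can be modeled in a few lines of Python. This is a simplified sketch, not Grunt's actual parser: split_statements is a hypothetical helper that keeps collecting lines until one ends with a semicolon outside single quotes. It deliberately ignores comments and escaped quotes. The quote tracking matters because PigStorage(';') contains a semicolon that must not end the statement.

```python
def split_statements(lines):
    """Join input lines into statements; a line ending in ';' outside quotes ends one.

    Returns (statements, pending) where pending holds lines of an unfinished statement,
    i.e. the point at which Grunt would keep showing its '>>' prompt.
    """
    statements, buffer, in_quote = [], [], False
    for line in lines:
        buffer.append(line.strip())
        ends_statement = False
        for ch in line.rstrip():
            if ch == "'":
                in_quote = not in_quote
                ends_statement = False
            elif ch == ";" and not in_quote:
                ends_statement = True
            else:
                ends_statement = False  # any later character means ';' was not last
        if ends_statement:
            statements.append(" ".join(buffer))
            buffer = []
    return statements, buffer
```

For the first two lines typed above, this yields the single statement SampleRecord = LOAD '/user/hduser/piginput/pigcsv' USING PigStorage(';') AS (Year:chararray);.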
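Test Run ...
9. Output:
hduser@kannandreams:/usr/local/hadoop/bin$ hadoop fs -cat /user/hduser/pigoutput/part-r-00000
Warning: $HADOOP_HOME is deprecated.
"2006":1
"2007":2
"2008":4
"Year":1
hduser@kannandreams:/usr/local/hadoop/bin$
The "Year":1 line shows that the input file used for this run also contained a "Year"; header line. PigStorage does not skip header rows, so the header is counted like any other value.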
Script Explanation
Load the file into a relation (SampleRecord), specifying the delimiter (';') and the field name and its type. To load more than one column from the file, list the fields separated by commas in the AS clause, for example AS (Year:chararray, Month:int) for a two-column file. By default, PigStorage loads files delimited by tab, so any other delimiter character must be specified explicitly.
SampleRecord = LOAD '/user/hduser/piginput/pigcsv' USING PigStorage(';') AS (Year:chararray);
Group the loaded records by year.
GroupByYear = GROUP SampleRecord BY Year;
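It helps to know what GROUP actually produces, because the next step refers to its fields by position. The Python sketch below models a grouped relation as a list of (group, bag) tuples: $0 is the group key and $1 is the bag of original records. group_by is a hypothetical helper, and a Python list stands in for Pig's bag, which is unordered.

```python
def group_by(records, key_index=0):
    """Model of GROUP rel BY field: one (group, bag) tuple per distinct key.

    In the grouped relation, $0 is the group key and $1 is the bag of original tuples.
    """
    groups = {}
    for record in records:
        groups.setdefault(record[key_index], []).append(record)
    return list(groups.items())


sample_record = [('"2006"',), ('"2007"',), ('"2008"',), ('"2007"',)]
for group_tuple in group_by(sample_record):
    key, bag = group_tuple[0], group_tuple[1]  # $0 and $1
    print(key, len(bag))  # len(bag) plays the role of COUNT($1)
```

Each input record ends up inside exactly one bag, which is why COUNT($1) gives the number of records per year.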
Script Explanation ...
Count the records in each group and generate the output as Key:Value. You can format the output however you like. In the grouped relation, $0 is the group key (the year) and $1 is the bag of SampleRecord tuples in that group; COUNT($1) returns the number of tuples in the bag.
CountByYear = FOREACH GroupByYear
    GENERATE CONCAT((chararray)$0, CONCAT(':', (chararray)COUNT($1)));
Store the relation in a file:
STORE CountByYear INTO '/user/hduser/pigoutput' USING PigStorage('\t');
For the complete list of script commands, refer to
http://pig.apache.org/docs/r0.10.0/start.html#data-results
Pig in Cloudera
The Pig Editor in Cloudera is explained in my blog.
http://kannandreams.wordpress.com/2013/12/03/pig-editor-in-cloudera/#!
Thank You !!!
@kannanpoem on Twitter
Blog: http://kannandreams.wordpress.com/about/
FB Community: www.facebook.com/groups/huge360/
HUGE - Hadoop User Group & Enthusiasts
Huge, yes, it's all about "BIG" data. This group was created to build expertise in Hadoop and Big Data and to bring experts together.