Pig workshop
![Page 1: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/1.jpg)
Pig Workshop
Sudar Muthu
http://sudarmuthu.com
http://twitter.com/sudarmuthu
https://github.com/sudar
![Page 2: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/2.jpg)
- Research Engineer by profession
- I mine useful information from data
- You might recognize me from other HasGeek events
- Blog at http://sudarmuthu.com
- Builds robots as a hobby ;)
Who am I?
![Page 3: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/3.jpg)
HasGeek
Special Thanks
![Page 4: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/4.jpg)
What I will not cover
![Page 5: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/5.jpg)
- What is Big Data, and why is it needed?
- What is MapReduce?
- What is Hadoop?
- Internal architecture of Pig
http://sudarmuthu.com/blog/getting-started-with-hadoop-and-pig
What I will not cover
![Page 6: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/6.jpg)
What we will see today
![Page 7: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/7.jpg)
- What is Pig
- How to use it
- Loading and storing data
- Pig Latin
- SQL vs Pig
- Writing UDFs
- Debugging Pig scripts
- Optimizing Pig scripts
- When to use Pig
What we will see today
![Page 8: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/8.jpg)
So, all of you have Pig installed right? ;)
![Page 9: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/9.jpg)
“Platform for analyzing large sets of data”
What is Pig?
![Page 10: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/10.jpg)
- Pig shell (Grunt)
- Pig language (Pig Latin)
- Libraries (PiggyBank)
- User Defined Functions (UDFs)
Components of Pig
![Page 11: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/11.jpg)
- It is a data flow language
- Provides standard data processing operations
- Insulates you from Hadoop's complexity
- Abstracts away MapReduce
- Increases programmer productivity
… but there are cases where Pig is not suitable.
Why Pig?
![Page 12: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/12.jpg)
Pig Modes
![Page 13: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/13.jpg)
For this workshop, we will be using Pig only in local mode
![Page 14: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/14.jpg)
Getting to know your Pig shell
![Page 15: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/15.jpg)
Similar to Python’s shell
`pig -x local`
![Page 16: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/16.jpg)
- Inline in the shell
- From a file
- Streaming through other executables
- Embedded in scripts written in other languages
Different ways of executing Pig Scripts
![Page 17: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/17.jpg)
Pigs eat anything
Loading and Storing data
![Page 18: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/18.jpg)
file = LOAD 'data/dropbox-policy.txt' AS (line);
data = LOAD 'data/tweets.csv' USING PigStorage(',');
data = LOAD 'data/tweets.csv' USING PigStorage(',') AS (field1, field2, field3);
Loading Data into Pig
![Page 19: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/19.jpg)
- PigStorage – for most cases
- TextLoader – to load text files
- JsonLoader – to load JSON files
- Custom loaders – you can write your own custom loaders as well
Loading Data into Pig
![Page 20: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/20.jpg)
DUMP data;
Very useful for debugging, but don’t use it on huge datasets
Viewing Data
![Page 21: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/21.jpg)
STORE data INTO 'output_location';
STORE data INTO 'output_location' USING PigStorage();
STORE data INTO 'output_location' USING PigStorage(',');
STORE data INTO 'output_location' USING BinStorage();
Storing Data from Pig
![Page 22: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/22.jpg)
- Similar to `LOAD`, a lot of options are available
- Can store locally or in HDFS
- You can write your own custom storage functions as well
Storing Data
![Page 23: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/23.jpg)
data = LOAD 'data/data-bag.txt' USING PigStorage(',');
STORE data INTO 'data/output/load-store' USING PigStorage('|');
https://github.com/sudar/pig-samples/load-store.pig
Load and Store example
![Page 24: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/24.jpg)
Pig Latin
![Page 25: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/25.jpg)
- Scalar types
- Complex types
Data Types
![Page 26: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/26.jpg)
- int, long – 32-bit and 64-bit integers
- float, double – 32-bit and 64-bit floating point
- boolean (true/false)
- chararray (UTF-8 string)
- bytearray (blob; DataByteArray in Java)
If you don’t specify a type, bytearray is used by default
Scalar Types
![Page 27: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/27.jpg)
- tuple – ordered set of fields
- bag – collection of tuples
- map – set of key/value pairs
Complex Types
![Page 28: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/28.jpg)
- Row with one or more fields
- Fields can be of any data type
- Ordering is important
- Enclosed in parentheses ()
E.g.: (Sudar, Muthu, Haris, Dinesh) and (Sudar, 176, 80.2F)
Tuple
![Page 29: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/29.jpg)
- Collection of tuples
- SQL equivalent is a table
- Each tuple can have a different set of fields
- Can have duplicates
- An inner bag is enclosed in curly braces {}
- An outer bag has no enclosing delimiters
Bag
![Page 30: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/30.jpg)
Outer bag
(1,2,3)
(1,2,4)
(2,3,4)
(3,4,5)
(4,5,6)
https://github.com/sudar/pig-samples/data-bag.pig
Bag - Example
![Page 31: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/31.jpg)
Inner bag
(1,{(1,2,3),(1,2,4)})
(2,{(2,3,4)})
(3,{(3,4,5)})
(4,{(4,5,6)})
https://github.com/sudar/pig-samples/data-bag.pig
Bag - Example
![Page 32: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/32.jpg)
- Set of key/value pairs
- Similar to HashMap in Java
- Keys must be unique
- Keys must be of chararray data type
- Values can be of any type
- Key and value are separated by #
- A map is enclosed in []
Map
![Page 33: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/33.jpg)
[name#sudar, height#176, weight#80.5F]
[name#(sudar, muthu), height#176, weight#80.5F]
[name#(sudar, muthu), languages#(Java, Pig, Python)]
Map - Example
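For intuition, the complex types can be loosely compared to Python built-ins. This is an analogy only, not Pig's internal representation: a tuple becomes a Python tuple, a bag a list of tuples (duplicates allowed, like a bag), and a map a dict with string keys. The helper `is_pig_map` is hypothetical and simply encodes the rule that map keys must be chararrays.

```python
# Rough analogy only: Pig complex types expressed with Python built-ins.
#   tuple -> tuple, bag -> list of tuples, map -> dict with str keys

def is_pig_map(value):
    """Hypothetical check: Pig map keys must be chararrays (str here)."""
    return isinstance(value, dict) and all(isinstance(k, str) for k in value)

# Tuple: ordered fields of any type, like (Sudar, 176, 80.2F)
person = ("Sudar", 176, 80.2)

# Bag: a collection of tuples; duplicates are allowed
bag_with_duplicates = [(1, 2, 3), (1, 2, 3), (2, 3, 4)]

# Map: key/value pairs, like [name#sudar, height#176, weight#80.5F]
profile = {"name": "sudar", "height": 176, "weight": 80.5}

# Values may themselves be complex, like [name#(sudar, muthu), ...]
nested = {"name": ("sudar", "muthu"), "languages": ("Java", "Pig", "Python")}
```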
![Page 34: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/34.jpg)
- Similar to SQL
- Denotes that the value of a data element is unknown
- Any data type can be null
Null
![Page 35: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/35.jpg)
We can specify a schema (field names and their data types) in `LOAD` statements
data = LOAD 'data/data-bag.txt' USING PigStorage(',') AS (f1:int, f2:int, f3:int);
data = LOAD 'data/nested-schema.txt' AS (f1:int, f2:bag{t:tuple(n1:int, n2:int)}, f3:map[]);
Schemas in Load statement
![Page 36: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/36.jpg)
Fields can be looked up by:
- Position
- Name
- Map lookup
Expressions
![Page 37: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/37.jpg)
data = LOAD 'data/nested-schema.txt' AS (f1:int, f2:bag{t:tuple(n1:int, n2:int)}, f3:map[]);
by_pos = FOREACH data GENERATE $0;
DUMP by_pos;
by_field = FOREACH data GENERATE f2;
DUMP by_field;
by_map = FOREACH data GENERATE f3#'name';
DUMP by_map;
https://github.com/sudar/pig-samples/lookup.pig
Expressions - Example
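The three lookup styles can be mimicked in plain Python (a model, not Pig): a schema turns a field name into a position, and a map lookup returns null (None here) when the key is absent, as Pig does. The sample record is hypothetical, since the contents of `nested-schema.txt` are not shown.

```python
# Plain-Python model of positional, named and map lookups (not Pig itself).
# Schema mirrors the slide: (f1:int, f2:bag{t:tuple(n1:int, n2:int)}, f3:map[])
SCHEMA = ["f1", "f2", "f3"]

def by_position(record, index):
    """Like $0, $1, ...: fields are addressed by ordinal position."""
    return record[index]

def by_name(record, name):
    """Like f2: the schema translates a field name into a position."""
    return record[SCHEMA.index(name)]

def map_lookup(record, map_field, key):
    """Like f3#'name': a null map or a missing key yields null (None)."""
    m = by_name(record, map_field)
    if m is None:
        return None
    return m.get(key)

# Hypothetical sample record, not taken from the workshop's data files
record = (1, [(1, 2), (3, 4)], {"name": "sudar"})
```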
![Page 38: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/38.jpg)
Operators
![Page 39: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/39.jpg)
All usual arithmetic operators are supported
Addition (+) Subtraction (-) Multiplication (*) Division (/) Modulo (%)
Arithmetic Operators
![Page 40: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/40.jpg)
All usual boolean operators are supported
AND OR NOT
Boolean Operators
![Page 41: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/41.jpg)
All usual comparison operators are supported
== != < > <= >=
Comparison Operators
![Page 42: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/42.jpg)
FOREACH, FLATTEN, GROUP, FILTER, COUNT, ORDER BY, DISTINCT, LIMIT, JOIN
Relational Operators
![Page 43: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/43.jpg)
Generates data transformations based on columns of data
x = FOREACH data GENERATE *;
x = FOREACH data GENERATE $0, $1;
x = FOREACH data GENERATE $0 AS first, $1 AS second;
FOREACH
![Page 44: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/44.jpg)
Un-nests tuples and bags. Flattening a bag usually results in a cross product
(a, (b, c)) => (a,b,c)
({(a,b),(d,e)}) => (a,b) and (d,e)
(a, {(b,c), (d,e)}) => (a, b, c) and (a, d, e)
FLATTEN
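The three FLATTEN cases above can be reproduced with a small plain-Python model (not Pig's implementation). It assumes every nested field in the row is wrapped in FLATTEN: a nested tuple is spliced into the row, and each bag contributes one output row per element, so two bags in the same row produce their cross product. An empty bag yields no output rows, matching Pig's behaviour of dropping a row whose flattened bag is empty.

```python
# Plain-Python model of GENERATE FLATTEN(...) on every nested field.
from itertools import product

def flatten_row(row):
    """Expand one row: tuples are spliced in place, bags fan out.

    Bags are modelled as lists of tuples, tuples as Python tuples.
    Several bags in one row yield the cross product of their elements.
    """
    choices = []
    for field in row:
        if isinstance(field, list):        # bag -> one choice per inner tuple
            choices.append([tuple(t) for t in field])
        elif isinstance(field, tuple):     # tuple -> its fields, spliced in
            choices.append([field])
        else:                              # scalar -> a one-field tuple
            choices.append([(field,)])
    # Concatenate the chosen pieces of every combination into one flat tuple
    return [sum(combo, ()) for combo in product(*choices)]
```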
![Page 45: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/45.jpg)
- Groups data in one or more relations
- Groups tuples that have the same group key
- Similar to the SQL GROUP BY operator
outerbag = LOAD 'data/data-bag.txt' USING PigStorage(',') AS (f1:int, f2:int, f3:int);
DUMP outerbag;
innerbag = GROUP outerbag BY f1;
DUMP innerbag;
https://github.com/sudar/pig-samples/group-by.pig
GROUP
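Conceptually, GROUP keeps every input tuple and collects the tuples sharing a key into an inner bag, which is how the outer bag turns into the inner-bag output shown earlier. The plain-Python model below (not Pig itself) reproduces that output from the same rows; note that Pig does not guarantee the order of groups, whereas this model keeps first-seen order.

```python
# Plain-Python model of GROUP relation BY field (not Pig itself).
from collections import OrderedDict

def group_by(relation, key_index):
    """Return (group, bag) pairs, like GROUP relation BY $key_index.

    Every input tuple is kept whole inside the bag for its key, which is
    why the result contains inner bags rather than aggregated values.
    """
    groups = OrderedDict()
    for row in relation:
        groups.setdefault(row[key_index], []).append(row)
    return list(groups.items())

# Same rows as the outer-bag example
outerbag = [(1, 2, 3), (1, 2, 4), (2, 3, 4), (3, 4, 5), (4, 5, 6)]
innerbag = group_by(outerbag, 0)   # key index 0 plays the role of f1
```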
![Page 46: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/46.jpg)
Selects tuples from a relation based on some condition
data = LOAD 'data/data-bag.txt' USING PigStorage(',') AS (f1:int, f2:int, f3:int);
DUMP data;
filtered = FILTER data BY f1 == 1;
DUMP filtered;
https://github.com/sudar/pig-samples/filter-by.pig
FILTER
![Page 47: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/47.jpg)
Counts the number of tuples in a bag, such as each group produced by GROUP
data = LOAD 'data/data-bag.txt' USING PigStorage(',') AS (f1:int, f2:int, f3:int);
grouped = GROUP data BY f2;
counted = FOREACH grouped GENERATE group, COUNT(data);
DUMP counted;
https://github.com/sudar/pig-samples/count.pig
COUNT
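The GROUP + COUNT combination can be modelled in plain Python as below (an illustration, not Pig). Using the same rows as the outer-bag example, grouping by f2 gives one count per distinct f2 value. The model also encodes a documented subtlety: COUNT ignores tuples whose first field is null, while COUNT_STAR counts every tuple.

```python
# Plain-Python model of GROUP ... BY + FOREACH ... GENERATE group, COUNT(bag).
from collections import OrderedDict

def pig_count(bag):
    """Like COUNT: skips tuples whose first field is null (None)."""
    return sum(1 for t in bag if t[0] is not None)

def pig_count_star(bag):
    """Like COUNT_STAR: counts every tuple, nulls included."""
    return len(bag)

def group_and_count(relation, key_index, counter=pig_count):
    """GROUP relation BY $key_index, then GENERATE group, counter(bag)."""
    groups = OrderedDict()
    for row in relation:
        groups.setdefault(row[key_index], []).append(row)
    return [(key, counter(bag)) for key, bag in groups.items()]

# Same rows as the outer-bag example; key index 1 plays the role of f2
data = [(1, 2, 3), (1, 2, 4), (2, 3, 4), (3, 4, 5), (4, 5, 6)]
counted = group_and_count(data, 1)
```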
![Page 48: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/48.jpg)
Sort a relation based on one or more fields. Similar to SQL order by
data = LOAD 'data/nested-sample.txt' USING PigStorage(',') AS (f1:int, f2:int, f3:int);
DUMP data;
ordera = ORDER data BY f1 ASC;
DUMP ordera;
orderd = ORDER data BY f1 DESC;
DUMP orderd;
https://github.com/sudar/pig-samples/order-by.pig
ORDER By
![Page 49: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/49.jpg)
Removes duplicates from a relation
data = LOAD 'data/data-bag.txt' USING PigStorage(',') AS (f1:int, f2:int, f3:int);
DUMP data;
unique = DISTINCT data;
DUMP unique;
https://github.com/sudar/pig-samples/distinct.pig
DISTINCT
![Page 50: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/50.jpg)
Limits the number of tuples in the output.
data = LOAD 'data/data-bag.txt' USING PigStorage(',') AS (f1:int, f2:int, f3:int);
DUMP data;
limited = LIMIT data 3;
DUMP limited;
https://github.com/sudar/pig-samples/limit.pig
LIMIT
![Page 51: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/51.jpg)
Joins relations based on a field. Both inner and outer joins are supported
a = LOAD 'data/data-bag.txt' USING PigStorage(',') AS (f1:int, f2:int, f3:int);
DUMP a;
b = LOAD 'data/simple-tuples.txt' USING PigStorage(',') AS (t1:int, t2:int);
DUMP b;
joined = JOIN a BY f1, b BY t1;
DUMP joined;
https://github.com/sudar/pig-samples/join.pig
JOIN
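An inner join can be pictured as a hash join: index one relation by its join key, then probe it with each tuple of the other. The Python model below is only a conceptual sketch (it is not how Pig executes the join on Hadoop), and the contents of `b` are hypothetical because `simple-tuples.txt` is not shown. It also illustrates that null keys never match in an inner join, one reason the optimization tips recommend dropping nulls before a join.

```python
# Plain-Python model of an inner equi-join: JOIN a BY f1, b BY t1.

def inner_join(a, a_key, b, b_key):
    """Hash join: index b by its key, then probe with each tuple of a.

    Output tuples are a's fields followed by b's fields. Null keys are
    skipped on both sides because null never equals anything.
    """
    index = {}
    for row in b:
        key = row[b_key]
        if key is not None:
            index.setdefault(key, []).append(row)
    joined = []
    for row in a:
        key = row[a_key]
        if key is None:
            continue
        for match in index.get(key, []):
            joined.append(row + match)
    return joined

a = [(1, 2, 3), (1, 2, 4), (2, 3, 4), (3, 4, 5), (4, 5, 6)]
b = [(1, 10), (3, 30), (None, 99)]   # hypothetical stand-in for simple-tuples.txt
joined = inner_join(a, 0, b, 0)
```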
![Page 52: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/52.jpg)
- FROM table – LOAD file(s)
- SELECT – FOREACH GENERATE
- WHERE – FILTER BY
- GROUP BY – GROUP BY + FOREACH GENERATE
- HAVING – FILTER BY
- ORDER BY – ORDER BY
- DISTINCT – DISTINCT
SQL vs Pig
![Page 53: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/53.jpg)
Count the number of words in a text file
Let’s see a complete example
https://github.com/sudar/pig-samples/count-words.pig
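The linked script is not reproduced here. As a hedged illustration, this is the common Pig word-count pipeline (TOKENIZE, then FLATTEN, then GROUP and COUNT) modelled in plain Python. The separator set assumes TOKENIZE's documented default delimiters (space, double quote, comma, parentheses and asterisk); check the documentation for your Pig version.

```python
# Plain-Python model of a Pig word count (not the workshop's script).
import re
from collections import Counter

# Assumed default TOKENIZE separators: space, double quote, comma, ( ) and *
_SEPARATORS = re.compile(r'[ ",()*]+')

def tokenize(line):
    """Model of TOKENIZE: split one chararray into a bag of words."""
    return [word for word in _SEPARATORS.split(line) if word]

def count_words(lines):
    """FLATTEN(TOKENIZE(line)), then GROUP BY word, then COUNT."""
    words = [word for line in lines for word in tokenize(line)]  # FLATTEN
    return Counter(words)                                        # GROUP + COUNT
```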
![Page 54: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/54.jpg)
Extending Pig - UDF
![Page 55: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/55.jpg)
- Do operations on more than one field
- Do more than grouping and filtering
- The programmer is more comfortable expressing the logic in a general-purpose language
- Want to reuse existing logic
Traditionally, UDFs could be written only in Java. Now other languages, such as Python, are also supported
Why UDF?
![Page 56: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/56.jpg)
- Eval functions
- Filter functions
- Load functions
- Store functions
Different types of UDFs
![Page 57: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/57.jpg)
- Can be used in a FOREACH statement
- Most common type of UDF
- Can return simple types or tuples
b = FOREACH a GENERATE udf.Function($0);
b = FOREACH a GENERATE udf.Function($0, $1);
Eval Functions
![Page 58: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/58.jpg)
- Extend the EvalFunc<T> abstract class
- The generic type <T> is the return type
- Input comes as a Tuple
- Should check for empty and null input
- Override exec(), which should return the value
- Override getArgToFuncMapping() to let Pig know about argument mapping
- Override outputSchema() to let Pig know about the output schema
Eval Functions
![Page 59: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/59.jpg)
- Create a jar file that contains your UDF classes
- Register the jar at the top of the Pig script
- Register other jars if needed
- Define the UDF function
- Use your UDF function
Using Java UDF in Pig Scripts
![Page 60: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/60.jpg)
Let’s see an example which returns a string
https://github.com/sudar/pig-samples/strip-quote.pig
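Since Python UDFs are also supported, here is a hypothetical Python (Jython) counterpart of a string-returning eval function; it is not the workshop's Java StripQuote. When the file is registered with Pig's Jython engine, Pig supplies the `@outputSchema` decorator; the try/except fallback is an assumption of this sketch so the file can also be imported and tested with plain Python.

```python
# strip_quote.py: hypothetical Jython UDF, not the workshop's Java StripQuote.
try:
    outputSchema  # provided by Pig when registered USING jython
except NameError:
    # Assumed fallback so the module can be imported and tested outside Pig
    def outputSchema(schema):
        def decorate(func):
            func.outputSchema = schema
            return func
        return decorate


@outputSchema("stripped:chararray")
def strip_quote(value):
    """Remove one pair of surrounding double quotes; pass nulls through."""
    if value is None:  # null in, null out
        return None
    if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
        return value[1:-1]
    return value
```

It could then be used with something like `REGISTER 'strip_quote.py' USING jython AS myudfs;` and `FOREACH data GENERATE myudfs.strip_quote((chararray)$6);`. The `myudfs` namespace is arbitrary, and the cast matters because an untyped field arrives as a bytearray rather than a string.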
![Page 61: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/61.jpg)
Let’s see an example which returns a Tuple
https://github.com/sudar/pig-samples/get-twitter-names.pig
![Page 62: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/62.jpg)
- Can be used in FILTER statements
- Returns a boolean value
E.g.: vim_tweets = FILTER data BY FromVim(StripQuote($6));
Filter Functions
![Page 63: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/63.jpg)
- Extend FilterFunc, which is an EvalFunc<Boolean>
- Should return a boolean
- Input is the same as for EvalFunc<T>
- Should check for empty and null input
- Override getArgToFuncMapping() to let Pig know about argument mapping
Filter Functions
![Page 64: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/64.jpg)
Let’s see an example which returns a Boolean
https://github.com/sudar/pig-samples/from-vim.pig
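Python UDFs are plain functions rather than FilterFunc subclasses, but a function declared to return a boolean should be usable as a FILTER condition in Pig versions that have the boolean type. The sketch below is hypothetical (not the workshop's Java FromVim): it assumes the tweet-source field is a chararray, and the "mentions vim" rule is invented for illustration. The `outputSchema` fallback is, as before, an assumption for testing outside Pig.

```python
# from_vim.py: hypothetical Jython filter UDF, not the workshop's Java FromVim.
try:
    outputSchema  # provided by Pig when registered USING jython
except NameError:
    # Assumed fallback so the module can be imported and tested outside Pig
    def outputSchema(schema):
        def decorate(func):
            func.outputSchema = schema
            return func
        return decorate


@outputSchema("is_vim:boolean")
def from_vim(source):
    """True if the (hypothetical) tweet-source field mentions vim.

    Empty or null input yields False, so FILTER simply drops the row.
    """
    if not source:
        return False
    return "vim" in source.lower()
```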
![Page 65: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/65.jpg)
- If the error affects only a particular row, return null.
- If the error affects other rows but is recoverable, throw an IOException.
- If the error affects other rows and is not recoverable, also throw an IOException. Pig and Hadoop will quit if there are too many IOExceptions.
Error Handling in UDF
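For Python UDFs, the first rule (return null for a bad row) can be packaged as a reusable decorator. This is a hypothetical helper, not part of Pig: exceptions the UDF author declares as row-local become None, which Pig treats as null, while any other exception still propagates and fails the task, roughly like throwing from a Java UDF. `parse_height` is a made-up example; in a registered Jython UDF, Pig's `@outputSchema` decorator would be stacked above it.

```python
import functools

def null_on_bad_row(*row_errors):
    """Hypothetical decorator: turn row-local errors into a null result.

    Exceptions listed in row_errors only affect the current row, so the
    wrapped UDF returns None (null) for that row. Any other exception
    propagates unchanged and fails the task.
    """
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args):
            try:
                return func(*args)
            except row_errors:
                return None
        return wrapper
    return decorate


@null_on_bad_row(ValueError)
def parse_height(value):
    """Made-up UDF body: convert a chararray such as '176' to an int."""
    if value is None:  # null input stays null
        return None
    return int(value.strip())
```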
![Page 66: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/66.jpg)
Can we try to write some more UDFs?
![Page 67: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/67.jpg)
Writing UDF in other languages
![Page 68: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/68.jpg)
Streaming
![Page 69: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/69.jpg)
- The entire data set is passed through an external task
- The external task can be written in any language
- Even a shell script works
- Uses the `STREAM` operator
Streaming
![Page 70: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/70.jpg)
data = LOAD 'data/tweets.csv' USING PigStorage(',');
filtered = STREAM data THROUGH `cut -f6,8`;
DUMP filtered;
https://github.com/sudar/pig-samples/stream-shell-script.pig
Stream through shell script
![Page 71: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/71.jpg)
data = LOAD 'data/tweets.csv' USING PigStorage(',');
filtered = STREAM data THROUGH `strip.py`;
DUMP filtered;
https://github.com/sudar/pig-samples/stream-python.pig
Stream through Python
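The workshop's `strip.py` is not shown, so the script below is a hypothetical stand-in. By default, STREAM sends each tuple to the external program as one line of tab-separated fields on stdin and parses the program's stdout the same way; that same convention is why `cut -f6,8` in the shell example needs no `-d` option. This sketch strips surrounding whitespace and double quotes from every field.

```python
#!/usr/bin/env python
# strip.py: hypothetical streaming script, not the workshop's own strip.py.
# Pig's default STREAM format: one tuple per line, fields separated by tabs.
import sys

def strip_fields(line):
    """Strip surrounding whitespace and double quotes from every field."""
    fields = line.rstrip("\n").split("\t")
    return "\t".join(field.strip().strip('"') for field in fields)

def main(stdin=sys.stdin, stdout=sys.stdout):
    # Read tuples from Pig on stdin and write the cleaned tuples to stdout
    for line in stdin:
        stdout.write(strip_fields(line) + "\n")

if __name__ == "__main__":
    main()
```

In local mode the script has to be executable (for example via `chmod +x strip.py`); on a cluster it typically also needs to be shipped to the task nodes, e.g. with a `DEFINE ... SHIP('strip.py')` statement.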
![Page 72: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/72.jpg)
- DUMP is your friend, but use it with LIMIT
- DESCRIBE – prints the schema of a relation
- ILLUSTRATE – shows how sample data flows through each step of the script
- In UDFs, we can use the warn() function. It supports up to 15 different debug levels
- Use Penny – https://cwiki.apache.org/PIG/pennytoollibrary.html
Debugging Pig Scripts
![Page 73: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/73.jpg)
- Project early and often
- Filter early and often
- Drop nulls before a join
- Prefer DISTINCT over GROUP BY
- Use the right data structure
Optimizing Pig Scripts
![Page 74: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/74.jpg)
- -p key=value – substitutes a single key/value pair
- -m file.ini – substitutes using an ini file
- %default – provides default values
http://sudarmuthu.com/blog/passing-command-line-arguments-to-pig-scripts
Using Param substitution
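Parameter substitution happens before the script is parsed: each `$name` is replaced textually. The Python function below is a deliberately simplified model of that preprocessing: `%default` lines supply fallbacks and `-p`-style values override them. Real Pig also handles `%declare`, parameter files and shell-command values, and reports undefined parameters as errors; none of that is modelled here. The parameter name `input_file` is made up for the example.

```python
# Simplified model of Pig's parameter-substitution preprocessor.
import re

_DEFAULT_LINE = re.compile(r"^\s*%default\s+(\w+)\s+'([^']*)'\s*;?\s*$")
_PARAM_REF = re.compile(r"\$([A-Za-z_]\w*)")  # $name, but not positional $0

def substitute(script, params):
    """Replace $name references; -p style params override %default values.

    Unknown names are left untouched here, whereas Pig reports them as errors.
    """
    values = {}
    body = []
    for line in script.splitlines():
        match = _DEFAULT_LINE.match(line)
        if match:
            values[match.group(1)] = match.group(2)
        else:
            body.append(line)
    values.update(params)  # command-line values win over defaults

    def replace(ref):
        return values.get(ref.group(1), ref.group(0))

    return "\n".join(_PARAM_REF.sub(replace, line) for line in body)
```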
![Page 75: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/75.jpg)
Anything data related
Problems that can be solved using Pig
![Page 76: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/76.jpg)
- A lot of custom logic needs to be implemented
- You need to do a lot of cross lookups
- Data is mostly binary (e.g. processing image files)
- Real-time processing of data is needed
When not to use Pig?
![Page 77: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/77.jpg)
PiggyBank - https://cwiki.apache.org/PIG/piggybank.html
DataFu – LinkedIn Pig Library - https://github.com/linkedin/datafu
Elephant Bird – Twitter Pig Library - https://github.com/kevinweil/elephant-bird
External Libraries
![Page 78: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/78.jpg)
- Pig homepage – http://pig.apache.org/
- My blog about Pig – http://sudarmuthu.com/blog/category/hadoop-pig
- Sample code – https://github.com/sudar/pig-samples
- Slides – http://slideshare.net/sudar
Useful Links
![Page 79: Pig workshop](https://reader035.fdocuments.us/reader035/viewer/2022062220/554f62bab4c905c8088b4b4a/html5/thumbnails/79.jpg)
Thank you