Pig Installation Guide and Practical Example Presented by Priagung Khusumanegara Prof. Kyungbaek Kim


Pig Installation Guide and Practical Example

Presented by Priagung Khusumanegara
Prof. Kyungbaek Kim


Installation Guide

• Requirements
  - Java 1.6 or later (this example uses java-7-openjdk)
  - Hadoop 0.23.x, 1.2.x, or 2.5.x (this example uses Hadoop 1.2.1)


Configuration

• Make sure Hadoop is installed and runs correctly
• Download the stable version of Pig (0.13)
  $ wget http://apache.tt.co.kr/pig/pig-0.13.0/pig-0.13.0.tar.gz
• Unpack the downloaded Pig distribution and move it to a preferred directory (this example uses /usr/local/pig/)
  $ tar -xvzf pig-0.13.0.tar.gz
  $ mv pig-0.13.0 /usr/local/pig

• Edit ~/.bashrc and add the following statements at the end of the file
  export PIG_HOME=/usr/local/pig
  export PATH=$PATH:$PIG_HOME/bin

• Test the Pig installation with a simple command
  $ pig -help


Practical Example

Objective: Sum the packet lengths sent from each source IP address to each destination IP address in the network traffic

• Run Hadoop

• Download the input file and copy it to HDFS
  - $ wget https://www.dropbox.com/s/k6li67bha12geet/input.txt?dl=1 -O input.txt
  - $ hadoop dfs -copyFromLocal input.txt /input/input.txt

Note: the input file was captured with tcpdump: tcpdump -n -i wlan0 >> input.txt


Screenshot Input File (input.txt)

• Enter the Grunt shell
  $ pig -x mapreduce


• Load the text file into a bag, putting each entire line into the element 'line' of type 'chararray'

RAW_LOGS = LOAD '/input/input.txt' AS (line:chararray);

• Apply a schema to the raw data

LOGS_BASE = FOREACH RAW_LOGS GENERATE FLATTEN(
  (tuple(CHARARRAY,CHARARRAY,LONG))REGEX_EXTRACT_ALL(line,'.+\\s(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}).+\\s(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}).+length\\s+(\\d+)')
) AS (IPS:chararray, IPD:chararray, S:long);
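To see what the regular expression in this step captures, the pattern can be tried outside Pig. The following is a minimal Python sketch, not part of the original slides: the names PATTERN and parse_line and the sample line are hypothetical, and the sample is only written in the style of tcpdump -n output rather than taken from input.txt. It uses a full-line match on the assumption that REGEX_EXTRACT_ALL only returns groups when the whole line matches.

```python
import re

# Same pattern as in the Pig statement. Pig's '\\s' and '\\d' become
# '\s' and '\d' in a Python raw string.
PATTERN = re.compile(
    r'.+\s(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
    r'.+\s(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
    r'.+length\s+(\d+)'
)


def parse_line(line):
    """Return (IPS, IPD, S) for a matching line, or None if it does not match."""
    match = PATTERN.fullmatch(line)
    if match is None:
        return None
    ips, ipd, size = match.groups()
    return ips, ipd, int(size)  # S is cast to a number, like the long in Pig


# Hypothetical sample line (an assumption, not copied from input.txt)
sample = ("12:34:56.789012 IP 192.168.0.10.51234 > 10.0.0.5.80: "
          "Flags [P.], seq 1:101, ack 1, win 229, length 100")

if __name__ == "__main__":
    print(parse_line(sample))
```

Each capture group stops after four dot-separated numbers, so the port that tcpdump appends to an address (for example .51234) is left out of IPS and IPD.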

• Group traffic information by source IP addresses and destination IP addresses

FLOW = GROUP LOGS_BASE BY (IPS, IPD);
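The effect of grouping by the (IPS, IPD) pair can be sketched in plain Python. This is an illustration under assumptions, not the way Pig executes the statement: group_by_flow and the sample records are hypothetical, and a dictionary of lists stands in for Pig's relation of (group, bag) tuples.

```python
from collections import defaultdict


def group_by_flow(records):
    """Collect records into one list per (IPS, IPD) key, similar to GROUP ... BY."""
    flow = defaultdict(list)
    for ips, ipd, size in records:
        flow[(ips, ipd)].append((ips, ipd, size))
    return dict(flow)


# Hypothetical parsed records in the form (IPS, IPD, S)
records = [
    ("192.168.0.10", "10.0.0.5", 100),
    ("10.0.0.5", "192.168.0.10", 40),
    ("192.168.0.10", "10.0.0.5", 200),
]

if __name__ == "__main__":
    for key, bag in group_by_flow(records).items():
        print(key, bag)
```

Because the key is an ordered pair, traffic from A to B and traffic from B to A end up in two separate groups.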


• Sum the packet lengths for each pair of source and destination IP addresses

TRAFFIC = FOREACH FLOW {
  sorted = ORDER LOGS_BASE BY S DESC;
  GENERATE group, SUM(LOGS_BASE.S);
}
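The aggregation in this step amounts to adding up S over each group's bag. A small Python sketch of that arithmetic follows; sum_lengths and the sample data are hypothetical and only meant to show the expected shape of the result. The sorted alias in the Pig block is not referenced by GENERATE, so the sketch leaves the sort out.

```python
def sum_lengths(flow):
    """Map each (IPS, IPD) key to the total of S over its bag, like SUM(LOGS_BASE.S)."""
    return {key: sum(size for _, _, size in bag) for key, bag in flow.items()}


# Hypothetical grouped data: (IPS, IPD) -> list of (IPS, IPD, S) records
flow = {
    ("192.168.0.10", "10.0.0.5"): [
        ("192.168.0.10", "10.0.0.5", 100),
        ("192.168.0.10", "10.0.0.5", 200),
    ],
    ("10.0.0.5", "192.168.0.10"): [
        ("10.0.0.5", "192.168.0.10", 40),
    ],
}

traffic = sum_lengths(flow)

if __name__ == "__main__":
    for (ips, ipd), total in traffic.items():
        print(ips, ipd, total)
```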

• Store the output data in HDFS (/output)

STORE TRAFFIC INTO '/output';


SCREENSHOT EACH PROCESS


• Screenshot Output File