Classification of Applications in HTTP Tunnels By Gajen Piraisoody, Changcheng Huang,Biswajit Nandy,...
Classification of Applications in HTTP Tunnels
By
Gajen Piraisoody, Changcheng Huang, Biswajit Nandy, Nabil Seddigh
Electrical and Computer Engineering, Carleton University, Ottawa, ON, Canada
12 November 2013
Slide 2
Outline
• Overview
• Motivation
• Problem Statement
• Contribution
• Approach to classification
• Evaluation
• Conclusion
Slide 3
Overview – HTTP Tunnel
What is HTTP Tunnelled Traffic?
• The HTTP port was originally used to carry web traffic
• Non-HTTP applications are wrapped in the HTTP protocol
• The HTTP port now tunnels email, chat, video, image, audio, file-transfer and
peer-to-peer traffic
Why HTTP-tunnel non-HTTP applications?
• HTTP clients (browsers) are readily available and deployable
• Tunneling lets applications bypass restricted network connectivity
in the form of firewalls, proxies and NAT
Slide 4
Motivation
HTTP Traffic Classification
• HTTP accounts for about 80% of the traffic in an entire network
• HTTP-tunneled traffic cannot be identified by port numbers alone
• Tunneled traffic such as YouTube and Netflix is increasing in cloud networks
• Information on tunneled traffic helps cloud-centre management with planning,
provisioning and ensuring quality of service
Why flow-based rather than DPI (deep packet inspection) classification?
• Provides a scalable software solution (less CPU consumption)
• Can classify encrypted traffic, since no payload inspection is needed
Slide 5
Problem Statement
Given: network traffic measured with NetFlow
Find: a way to classify HTTP-tunneled traffic
• Audio (radio & music), video and file-transfer
The proposed algorithm needs no training dataset
Use only information available from NetFlow
Slide 6
Contribution
The proposed scheme classifies HTTP-tunneled traffic: audio (radio
& music), video and file-transfer
The proposed scheme helps audio classification by using an
‘occupancy’ feature
The proposed scheme enhances classification performance by
including a flow-group found using flows from content
servers (subnet-masked IP of the long-flow)
Slide 7
Approach in detail
Identify long-flow HTTP traffic (Parameter: BPF)
Classify radio traffic (Parameters: BPF, BPP, BPS, Occupancy)
Classify music traffic (Parameters: BPF, BPP, BPS, Occupancy)
Classify video traffic (Parameters: BPF, BPP, BPS, Flow-group)
Classify file-transfer traffic (Parameters: BPF, BPP, BPS, Flow-group)
Bytes-per-second (BPS), Bytes-per-flow (BPF), Bytes-per-packet (BPP)
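As a minimal sketch, the three flow features could be computed from a NetFlow-style record as below. The `FlowRecord` fields and the 1 ms duration floor are hypothetical choices for illustration, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class FlowRecord:
    """Hypothetical subset of a NetFlow record (field names are illustrative)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    start: float   # flow start time in seconds
    end: float     # flow end time in seconds
    packets: int
    octets: int    # total bytes carried by the flow


def flow_features(f: FlowRecord) -> dict:
    """Return bytes-per-flow, bytes-per-packet and bytes-per-second for one flow."""
    # Assumed floor so that zero-length flows do not divide by zero.
    duration = max(f.end - f.start, 1e-3)
    return {
        "BPF": f.octets,                                     # bytes-per-flow
        "BPP": f.octets / f.packets if f.packets else 0.0,   # bytes-per-packet
        "BPS": f.octets / duration,                          # bytes-per-second
    }
```

For a 100-second flow of 1000 packets and 1,400,000 bytes this gives BPF = 1,400,000, BPP = 1400 and BPS = 14,000.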
Slide 8
Approach to Classification
Identify Long-flow HTTP Traffic
Classify Audio Traffic
Classify Video & File-transfer Traffic
Slide 9
Identify Long-flow HTTP Traffic
Identifying HTTP Traffic
• A long-flow has a byte size larger than a threshold
• Audio, video and file-transfer flows are generally long-flows
• HTTP_PORTS: 80, 443, 1935, 8008, 8080, 8088, 8090
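A minimal sketch of the long-flow HTTP filter, assuming a flow counts as HTTP when either endpoint uses one of the listed ports. The slide does not give the long-flow byte threshold, so `LONG_FLOW_BYTES` below is a hypothetical placeholder.

```python
HTTP_PORTS = {80, 443, 1935, 8008, 8080, 8088, 8090}  # port list from the slide

# Hypothetical value: the long-flow byte threshold is not stated on the slide.
LONG_FLOW_BYTES = 500_000


def is_long_flow_http(src_port: int, dst_port: int, octets: int,
                      threshold: int = LONG_FLOW_BYTES) -> bool:
    """True if the flow uses an HTTP port on either side and exceeds the byte threshold."""
    uses_http_port = src_port in HTTP_PORTS or dst_port in HTTP_PORTS
    return uses_http_port and octets > threshold
```

Checking both ports covers server-to-client and client-to-server flow records alike.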
Slide 10
Approach
Identify Long-flow HTTP Traffic
Classify Audio Traffic
Classify Video & File-transfer Traffic
Slide 11
Classify Audio Traffic
99.4% of radio rates are between 20 and 320 Kbps (statistics from 3683 online radio web sites)
98% of online music rates are between 64 and 320 Kbps (statistics from more than 20 online music sites)
The 95% confidence interval of radio bytes-per-packet is 900 to 1470 bytes (Kaoprakhon et al. [1])
The 95% confidence interval of music bytes-per-packet is 1260 to 1500 bytes (Kaoprakhon et al. [1])
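The rate and bytes-per-packet ranges above could act as a first-pass audio filter, as in the sketch below. Treating the ranges as hard, inclusive bounds is an assumption made here for illustration; a flow may match both audio types, which is why occupancy is needed next.

```python
# Ranges from the slide; applying them as inclusive hard bounds is an assumption.
AUDIO_RANGES = {
    # audio type: ((min_kbps, max_kbps), (min_bytes_per_packet, max_bytes_per_packet))
    "radio": ((20, 320), (900, 1470)),
    "music": ((64, 320), (1260, 1500)),
}


def candidate_audio_types(octets: int, packets: int, duration_s: float) -> list:
    """Return the audio types whose rate and bytes-per-packet ranges the flow fits."""
    rate_kbps = octets * 8 / duration_s / 1000
    bpp = octets / packets
    return [name for name, ((rlo, rhi), (blo, bhi)) in AUDIO_RANGES.items()
            if rlo <= rate_kbps <= rhi and blo <= bpp <= bhi]
```

For example, a 120-second flow of 1,200,000 bytes in 1000 packets (80 Kbps, 1200 bytes-per-packet) fits only the radio ranges.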
Slide 12
Classify Audio Traffic
Behavioral analysis: an online audio listener typically listens to
audio for more than 5 minutes
There are two distinct audio types: radio and music (songs)
New concept: occupancy helps classify audio. Occupancy is the ratio of the
flow duration to the entire duration of a chunk of time.
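A minimal sketch of the occupancy ratio for one observation window. Merging overlapping flows before summing, so that the ratio never exceeds 1, is an assumption; the slide only defines occupancy as flow duration over window duration.

```python
def occupancy(flows, window_start: float, window_end: float) -> float:
    """Fraction of [window_start, window_end] covered by flows given as (start, end) seconds.

    Overlapping flows are merged first (assumption) so the result stays within [0, 1].
    """
    # Clip every flow to the window and drop flows entirely outside it.
    clipped = sorted((max(s, window_start), min(e, window_end))
                     for s, e in flows if e > window_start and s < window_end)
    covered = 0.0
    cur_s = cur_e = None
    for s, e in clipped:
        if cur_e is None or s > cur_e:
            # Disjoint interval: close the previous run and start a new one.
            if cur_e is not None:
                covered += cur_e - cur_s
            cur_s, cur_e = s, e
        else:
            # Overlapping or touching interval: extend the current run.
            cur_e = max(cur_e, e)
    if cur_e is not None:
        covered += cur_e - cur_s
    return covered / (window_end - window_start)
```

Back-to-back radio flows filling a 300-second window give an occupancy of 1.0, while two 30-second song downloads in the same window give 0.2.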
[Figure: average download rate (Mbps) of music (Grooveshark), radio (Hdradio) and video (CTV)]
Slide 13
Classify Audio Traffic
Difference between radio and music
• Continuous: radio content appears to download during every second of the flow
• Dirac (bursty): songs in a playlist are downloaded and played one at a time
The maximum/minimum size of a radio flow depends on the maximum flow-period configuration and the offered radio rates
The maximum/minimum size of a music flow depends on the maximum/minimum song duration and the offered online music rates
The 95% confidence interval of radio occupancy from DS-1, DS-2, SME-6, SME-7 and SME-8 is [82%, 100%]
The 95% confidence interval of music occupancy from DS-1, DS-2, SME-6, SME-7 and SME-8 is [0%, 55%]
Assumption: the minimum number of radio flows is two (at least 5 minutes)
Assumption: the minimum number of music flows is two (at least 5 minutes)
Assumption: the maximum radio-phase timeout is based on a flow-period (120 seconds)
The maximum music-phase timeout is based on the maximum song duration (382 seconds)
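One way to combine the phase timeouts, the two-flow minimum and the occupancy intervals is sketched below. Using the confidence-interval bounds (82% and 55%) as decision thresholds, and testing for radio before music, are assumptions made for illustration rather than the authors' stated procedure.

```python
RADIO_TIMEOUT_S = 120   # maximum radio-phase gap (one flow-period), from the slide
MUSIC_TIMEOUT_S = 382   # maximum music-phase gap (longest song), from the slide
MIN_FLOWS = 2           # at least two flows (about 5 minutes), from the slide
RADIO_MIN_OCC = 0.82    # assumed threshold: lower bound of the radio occupancy interval
MUSIC_MAX_OCC = 0.55    # assumed threshold: upper bound of the music occupancy interval


def split_phases(flows, timeout):
    """Group (start, end) flows into phases; a gap longer than timeout starts a new phase."""
    phases = []
    for s, e in sorted(flows):
        if phases and s - max(pe for _, pe in phases[-1]) <= timeout:
            phases[-1].append((s, e))
        else:
            phases.append([(s, e)])
    return phases


def merged_length(intervals):
    """Total time covered by possibly overlapping (start, end) intervals."""
    total, cur_s, cur_e = 0.0, None, None
    for s, e in sorted(intervals):
        if cur_e is None or s > cur_e:
            if cur_e is not None:
                total += cur_e - cur_s
            cur_s, cur_e = s, e
        else:
            cur_e = max(cur_e, e)
    return total + (cur_e - cur_s if cur_e is not None else 0.0)


def phase_occupancy(phase):
    """Occupancy of a phase: covered time divided by the phase's total span."""
    span = max(e for _, e in phase) - min(s for s, _ in phase)
    return merged_length(phase) / span if span > 0 else 0.0


def classify_audio(flows):
    """flows: (start, end) tuples for one server/client pair that passed the rate/BPP filters."""
    for phase in split_phases(flows, RADIO_TIMEOUT_S):
        if len(phase) >= MIN_FLOWS and phase_occupancy(phase) >= RADIO_MIN_OCC:
            return "radio"
    for phase in split_phases(flows, MUSIC_TIMEOUT_S):
        if len(phase) >= MIN_FLOWS and phase_occupancy(phase) <= MUSIC_MAX_OCC:
            return "music"
    return "unknown"
```

Three back-to-back 120-second flows form one continuous phase and are labelled radio; three 20-second downloads spaced 240 seconds apart only join into a phase under the longer music timeout, with occupancy 0.12, and are labelled music.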
Slide 14
Approach
Identify Long-flow HTTP Traffic
Classify Audio Traffic
Classify Video & File-transfer Traffic
Slide 15
Background
• Multimedia Distribution (3 types)
[Figure: multimedia delivery through a CDN. Components: client (web browser and media player), listening HTTP server, the CDN's authoritative DNS server, and CDN_1 through CDN_n]
1) Client clicks on audio/video hyperlink
2) Metafile sent to client
3) Metafile
4) Request multimedia content
5) Responds with CDN site
6) From DNS lookup, request sent to CDN admin
7) Responds with address of all contents on all CDNs
8) Request multimedia content 1
9) Request multimedia content 2
10) Content 1
11) Content 2
Slide 16
Classify Video & File-transfer Traffic
Video flow-attributes (bytes-per-packet, bytes-per-flow, download rates)
and the flow-group (FG) technique are used to classify video and file-transfers
Flow-group (FG)
• A video flow is accompanied by flows for meta-data, style sheets and advertisements
• Takeshita et al. [3] defined FG as the number of flows that occur within a few
seconds of the video-flow and have the same destination-IP address
• Our expanded flow-group also includes flows that occur within a
longer duration and have the same subnet-masked source-IP
address and the same destination-IP address
An Example
Slide 17
[Figure: Flow Size, log10(Bytes), and Flow Duration, Time (seconds), plotted against flow-index 1 to 36]
Example (cont'd)
Slide 18
[Figure: Bytes-per-packet against flow index 1 to 36, and Type of Flow against flow-index, with each flow marked as video-flow, flow-group or signal-flow]
Slide 19
Classify Video & File-transfer Traffic
[Figure: flow-group range (seconds) relative to the video-flow, with marks at -60, -4, 0, 1 and 10]
• Takeshita et al.'s flow-group: 98% of flow-group flows occur within 4 seconds before the video-flow and 97.8% within 1 second after it
• Improved flow-group: 94.4% of flow-group flows occur within 60 seconds before the video-flow and 94.1% within 10 seconds after it
All flow-group statistics are estimated from datasets DS-4 and DS-5
• 92.6% of flow-group flows have bytes-per-flow above 1000 and below 500000
• Almost 100% of flow-group flows have bytes-per-packet above 200
Slide 20
Classify Video & File-transfer Traffic
Start
Gather potential V/F flows:
• flow > 0.5 MB
• & > 1260 bytes-per-packet
• & > 128 Kbps
• & ordered by destination-IP and flow start time
For every potential V/F flow, gather potential flow-group (FG) flows when:
• FG flow start-time > V/F start-time - 4 s
• & FG flow start-time < V/F start-time + 1 s
• & FG flow and V/F flow have the same dest-IP
• & FG flow is between 1000 B and 0.5 MB
• & FG flow is between 200 and 1500 BPP
If FG == true: increment FG counter
For the V/F-phase, gather potential FG flows when:
• Same source IP address subnet
• & same destination IP address
• & FG flow start-time > V/F start-time - 60 s
• & FG flow start-time < V/F start-time + 10 s
• & FG flow is between 1000 B and 0.5 MB
• & FG flow is between 200 and 1500 BPP
If FG == true: increment FG counter
If FG > 0: label video; else: label file-transfer
End
In the original flowchart, green marks the original flow-group (FG) and yellow the improved flow-group. Both FG searches are run.
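The flowchart could be sketched in Python as follows, assuming the source IP is the content server and the destination IP the client. The `/24` subnet mask is a hypothetical choice (the slides do not give the mask length), and the improved window is applied per V/F flow rather than per V/F-phase to keep the sketch short.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Flow:
    """Hypothetical NetFlow-derived record: src is the content server, dst the client."""
    src_ip: str
    dst_ip: str
    start: float   # seconds
    end: float     # seconds
    packets: int
    octets: int


SUBNET_PREFIX = 24  # assumption: subnet mask length is not stated on the slides


def bpp(f):
    return f.octets / f.packets


def rate_kbps(f):
    return f.octets * 8 / max(f.end - f.start, 1e-3) / 1000


def is_potential_vf(f):
    # > 0.5 MB, > 1260 bytes-per-packet and > 128 Kbps
    return f.octets > 500_000 and bpp(f) > 1260 and rate_kbps(f) > 128


def is_fg_size(f):
    # 1000 B to 0.5 MB and 200 to 1500 bytes-per-packet
    return 1000 <= f.octets <= 500_000 and 200 <= bpp(f) <= 1500


def same_subnet(a, b, prefix=SUBNET_PREFIX):
    net = ipaddress.ip_network(f"{a}/{prefix}", strict=False)
    return ipaddress.ip_address(b) in net


def classify_vf(flows):
    """Return (flow, label) pairs for every potential video/file-transfer flow."""
    candidates = sorted((f for f in flows if is_potential_vf(f)),
                        key=lambda f: (f.dst_ip, f.start))
    fg_pool = [f for f in flows if is_fg_size(f)]
    results = []
    for vf in candidates:
        fg = 0
        for g in fg_pool:
            if g.dst_ip != vf.dst_ip:
                continue
            # Original flow-group window: 4 s before to 1 s after the V/F start.
            if vf.start - 4 < g.start < vf.start + 1:
                fg += 1
            # Improved flow-group window: same server subnet, 60 s before to 10 s after.
            if same_subnet(vf.src_ip, g.src_ip) and vf.start - 60 < g.start < vf.start + 10:
                fg += 1
        results.append((vf, "video" if fg > 0 else "file-transfer"))
    return results
```

A flow-group flow may be counted by both windows; since only `fg > 0` matters for the label, the double count does not change the outcome.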
Slide 21
Evaluation
Datasets used to test algorithms
Accuracy measurement assessment
• Precision is the fraction of the system's positive predictions that are correct: precision = TP / (TP + FP)
• Recall is the fraction of actual positives that the system correctly predicts: recall = TP / (TP + FN)
• F-measure is the harmonic mean of precision and recall: F-measure = 2 * Precision * Recall / (Precision + Recall)
• Accuracy is the fraction of true results: accuracy = (TP + TN) / (TP + FP + FN + TN)
Compare against other algorithms
• Naïve Bayes
• SVM (Support Vector Machine)
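The four measures can be computed from confusion-matrix counts as in this short sketch. Returning 0.0 when a denominator is zero is a convention chosen here, not something the slides specify.

```python
def evaluation_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Precision, recall, F-measure and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall.
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "accuracy": accuracy}
```

With TP = 8, FP = 2, FN = 2 and TN = 88, precision, recall and F-measure are all 0.8 while accuracy is 0.96; the gap shows why F-measure is reported alongside accuracy when one class dominates.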
Slide 22
Evaluation – Datasets
                      SME-6          SME-7          SME-8
Date                  1/7/2013       1/22/2013      1/23/2013
Duration (s)          24723          28207          13628
Start-time (GMT-5)    10:18:04       10:29:04       10:56:20
Flows                 249822         287616         198409
Packets               13376109       15351639       10170693
Bytes                 11158181285    13589511746    8728052938
HTTP Flows            75485          87181          63951
HTTP Packets          7346663        8814438        5628558
HTTP Bytes            10456335955    12545720613    7982629610
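The HTTP share of each dataset follows directly from the dataset figures. The sketch below simply divides the HTTP rows by the totals; the dictionary layout is an illustrative choice.

```python
# (flows, bytes, HTTP flows, HTTP bytes) per dataset, copied from the dataset table
DATASETS = {
    "SME-6": (249822, 11158181285, 75485, 10456335955),
    "SME-7": (287616, 13589511746, 87181, 12545720613),
    "SME-8": (198409, 8728052938, 63951, 7982629610),
}


def http_shares(datasets):
    """Fraction of flows and of bytes that are HTTP, per dataset."""
    return {name: {"flows": hf / f, "bytes": hb / b}
            for name, (f, b, hf, hb) in datasets.items()}


for name, share in http_shares(DATASETS).items():
    print(f"{name}: HTTP flows {share['flows']:.1%}, HTTP bytes {share['bytes']:.1%}")
```

In all three datasets HTTP carries roughly 30% of the flows but over 90% of the bytes.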
Slide 23
Evaluation – Results
F-Measure: Naive Bayes, SVM and Proposed Algorithm
[Figure: bar chart of F-measure for SME6-Audio, SME6-File, SME6-Video, SME7-Audio, SME7-File, SME7-Video, SME8-Audio, SME8-File and SME8-Video]
Bar labels, as listed: 27.5%, 59.5%, 39.4%, 56.1%, 79.7%, 70.8%, 66.5%, 64.0%, 86.6%, 16.8%, 23.2%, 42.6%, 21.6%, 12.5%, 40.4%, 60.4%, 49.1%, 43.1%, 84.9%, 60.8%, 72.9%, 93.0%, 93.6%, 82.5%, 85.1%, 89.7%, 94.2%
Slide 24
Evaluation – Results
Accuracy
                      SME-6    SME-7    SME-8
Naive Bayes           39.1%    73.5%    71.4%
SVM                   17.8%    16.3%    42.0%
Proposed Algorithm    70.5%    89.9%    90.9%
Slide 25
Conclusion
• The proposed algorithm uses a flow-based approach and classifies a high percentage of tunneled traffic: audio, video and file-transfer
• Proposed audio algorithm:
• Uses a concept called occupancy to classify radio and music traffic
• Proposed video and file-transfer algorithm:
• Uses an improved flow-group method to help increase classification accuracy of video and file-transfer traffic
• The proposed scheme's F-measure is at least 10% higher than that of Naive Bayes and SVM
Slide 26
References
[1] S. Kaoprakhon and V. Visoottiviseth, "Classification of Audio and Video Traffic over HTTP Protocol," in Proc. 9th International Symposium on Communications and Information Technology (ISCIT 2009), Sept. 2009.
[2] M. Twardos, "The Information Diet," 2011. [Online]. Available: http://theinformationdiet.blogspot.ca/2011/11/probability-distribution-of-song-length.html. [Accessed: 2013].
[3] K. Takeshita, T. Kurosawa, M. Tsujino and M. Iwashita, "Evaluation of HTTP Video Classification Method Using Flow Group Information," in Proc. 14th International Telecommunications Network Strategy and Planning Symposium (NETWORKS 2010), Sept. 2010.
[4] H. Kim, K. Claffy, M. Fomenkov, D. Barman, M. Faloutsos and K. Lee, "Internet Traffic Classification Demystified: Myths, Caveats, and the Best Practices," in Proc. ACM CoNEXT, 2008.
[5] D. M. W. Powers, "Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation," Journal of Machine Learning Technologies, vol. 2, no. 1, pp. 37-63, 2011.