![Page 1: Internet Traffic Classification: On the Discriminative Power of Traffic Flow Features AAF Workshop Cairo, Egypt, 2009.5.15 (Fri) Hyun-chul Kim hkim@mmlab.snu.ac.kr.](https://reader035.fdocuments.us/reader035/viewer/2022062304/56649e9e5503460f94b9f6ff/html5/thumbnails/1.jpg)
Internet Traffic Classification: On the Discriminative Power of
Traffic Flow Features
AAF Workshop Cairo, Egypt, 2009.5.15 (Fri)
Hyun-chul Kim hkim@mmlab.snu.ac.kr
Joint work with Yeon-sup Lim, Jiwoong Jeong
Seoul National Univ.
MOTIVATION
Why Internet Traffic Classification?
Big struggles over the Internet
• File sharing tussle – File sharing community vs. intellectual property representatives (e.g., RIAA and MPAA).
• Cyber-security battle – Malicious hackers vs. security management companies.
• Network neutrality debate – ISPs vs. content/service providers.
All the tussles boil down to...
Internet (application) traffic classification
(Broadly speaking, Internet System Measurement)
The emergence of Napster traffic in 1999-2000 [Plonka 01]
Internet Traffic Classification: Approaches so far
• Port number-based.
• Payload-based.
• Host-behavior-based.
• Flow-features-based.
O(100) papers over the last 5-6 years.
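As a rough illustration of the simplest approach in the list, port number-based classification amounts to a table lookup. The sketch below is not taken from any tool named in these slides; the function name and the tiny port table are assumptions for illustration (the port-to-service pairs themselves are standard well-known assignments).

```python
# Hypothetical, heavily truncated port table; real tools use far larger rule sets.
WELL_KNOWN_PORTS = {
    ("tcp", 22): "ssh",
    ("tcp", 25): "smtp",
    ("udp", 53): "dns",
    ("tcp", 53): "dns",
    ("tcp", 80): "http",
    ("tcp", 443): "https",
}


def classify_by_port(protocol, src_port, dst_port):
    """Return an application label based on port numbers, or 'unknown'."""
    proto = protocol.lower()
    # Try the destination port first, then the source port, so that both
    # directions of a client/server exchange map to the same application.
    for port in (dst_port, src_port):
        label = WELL_KNOWN_PORTS.get((proto, port))
        if label is not None:
            return label
    return "unknown"
```

Applications that use ephemeral or non-standard ports fall through to "unknown", which is the usual weakness attributed to this approach.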
PROBLEM DEFINITION
What problem have we addressed?
Key Questions
• The best traffic classification approach?
  – Under what conditions? (backbone? edge? bandwidth? …)
  – For what applications? (p2p? Web? games? streaming? …)
  – Why?
• What are the marginal contributions and limitations of each approach?
State of the Art Answer (2007)
“Who knows ?????” :-x
“One of the foremost challenges of traffic classification currently is effectively comparing between the many proposed approaches.” [Erman 07]
Why? [Erman 07, Moore 07]
Every traffic classification approach/technique
- is evaluated using different (local) traces, often without payload.
- tracks different features and tunes different parameters.
- even uses different definitions of the traffic unit and/or application category.
盲人摸象 (Blind Men Guessing an Elephant)
METHODOLOGIES
Shared code, data, & expertise
We conducted
• Comprehensive evaluation of
  – Port-based approach.
  – Host-behavior-based approach.
  – Flow-features-based approach.
• Using 7 traces with payload
  – 3 backbone and 4 edge traces.
  – From Japan, Korea, Trans-pacific, and US.
Datasets

| Trace (Country) | Link type | Date (Local time) | Start time & duration (Local time) | Average utilization (Mbps) | Payload bytes per packet |
|---|---|---|---|---|---|
| PAIX-I (US) | OC48 Backbone, uni-directional | 2004.2.25 Wed | 11:00, 2h | 104 | 16 |
| PAIX-II (US) | OC48 Backbone | 2004.4.21 Wed | 19:59, 2h 2m | 997 | 16 |
| WIDE (US-JP) | 100 ME Backbone | 2006.3.3 Fri | 22:45, 55m | 35 | 40 |
| KEIO-I (JP) | 1 GE Edge | 2006.8.8 Tue | 19:43, 30m | 75 | 40 |
| KEIO-II (JP) | 1 GE Edge | 2006.8.10 Thu | 01:18, 30m | 75 | 40 |
| KAIST-I (KR) | 1 GE Edge | 2006.9.10 Sun | 02:52, 48h 12m | 24 | 40 |
| KAIST-II (KR) | 1 GE Edge | 2006.9.14 Thu | 16:37, 21h 16m | 28 | 40 |
Tools evaluated
• CoralReef: port-number-based classification
  – Version 3.8 (or later).
• BLINC: host-behavior-based classification
• WEKA: a collection of machine learning algorithms
  – 7 most often used / well-known algorithms.
  – Flow attribute selection.
  – Training set size vs. performance (Accuracy / F-Measure).
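For readers unfamiliar with the two performance metrics named above, here is a minimal sketch of how overall accuracy and per-application F-measure can be computed from true and predicted labels. Counting per flow and the function names are assumptions for illustration; the slides do not say how the evaluation code was written.

```python
from collections import Counter


def accuracy(true_labels, predicted_labels):
    """Fraction of instances whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def f_measure_per_class(true_labels, predicted_labels):
    """Per-class F-measure: harmonic mean of precision and recall."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(true_labels, predicted_labels):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was really class t
            fn[t] += 1  # an instance of class t was missed
    scores = {}
    for label in set(true_labels) | set(predicted_labels):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores
```

Accuracy summarizes the whole trace, while the per-class F-measure exposes applications that a classifier handles poorly even when overall accuracy looks high.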
Machine learning algorithms
Supervised machine learning algorithms
Categories: Bayesian, Decision Trees, Rules, Functions, Lazy
• Bayesian: Naïve Bayes [Moore 05, Williams 05]; Naïve Bayes Kernel Estimation [Moore 05, Williams 05]; Bayesian Network
• Decision Trees: C4.5
• Functions: Support Vector Machine; Neural Net.
• Lazy: k-Nearest Neighbors
Other works cited on this slide: [Williams 06A], [Williams 06B], [Li 07], [Bennett 00], [Roughan 04], [Auld 07], [Nogueira 06]
RESULTS and LESSONS LEARNED (in brief)
Key Lessons Learned (~2008)
• Port numbers as key features
  – Still useful in identifying many conventional applications.
  – Very powerful when used with (the first few) packet size info.
  – The first work to show that a uni-directional traffic flow feature set is good enough for accurate traffic classification.
• Support Vector Machine algorithm worked the best
  – Requires the smallest training set to achieve high accuracy.
• Scientifically grounded (reproducible) traffic classification research requires that researchers share tools, algorithms, and data sets from a wide range of Internet links to reproduce results.
More (Fundamental) Questions
Q) If an algorithm A performs very well (with >90/95/99% classification accuracy)…
– Why does the algorithm work that well?
– i.e., where does the good performance really come from?
A1) Is it because the algorithm itself is smart enough?
A2) Or is it because traffic classification itself is a rather easy (not that complicated) pattern classification problem?
How much performance is gained from each of (A1) and (A2)? How do we quantify them?
Selected key flow features

| Trace | Marks (V) under Protocol / srcport / dstport / Payloaded or not / Min pkt size | TCP flags | Size of n-th pkt (n) |
|---|---|---|---|
| Keio-I | V V V | PUSH | 2, 8 |
| Keio-II | V V V | PUSH | 1, 4 |
| WIDE | V V V V | SYN, PUSH | 4, 7 |
| KAIST-I | V V V V | SYN, RST, PUSH, ECN | 3, 5 |
| KAIST-II | V V V V | SYN, PUSH | 2, 3, 7 |
| PAIX-I | V V | SYN, ECN | 2, 9 |
| PAIX-II | V V V V | SYN, CWR | 1, 4 |

* CFS (Correlation-based Feature Selection [Williams 06]) was used.
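The slides do not spell out how CFS scores a candidate feature subset. For orientation, the commonly cited CFS merit heuristic is given below; the symbols are introduced here for illustration and do not appear in the slides.

```latex
\[
\mathrm{Merit}_S \;=\; \frac{k\,\overline{r_{cf}}}{\sqrt{k + k(k-1)\,\overline{r_{ff}}}}
\]
% S               : candidate feature subset containing k features
% \overline{r_{cf}} : mean correlation between each feature in S and the class
% \overline{r_{ff}} : mean pairwise correlation among the features in S
```

A subset scores well when its features correlate strongly with the application class but weakly with each other, which is consistent with the small, non-redundant feature sets listed in the table.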
Accuracy with each traffic flow feature
Using the K-Nearest Neighbors method.
Accuracy with the size of the first n packets in traffic flows
• Sizes of the first 4-5 packets alone: ~85% accuracy.
• This shows the feasibility of accurate real-time traffic classification.
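A minimal sketch of the feature behind this result: group packets into uni-directional flows and keep only the sizes of the first n packets of each flow. The packet record layout (the dict keys), the function name, and the zero-padding of short flows are assumptions for illustration, not details taken from the slides.

```python
from collections import defaultdict


def first_n_packet_sizes(packets, n=5):
    """Map each uni-directional flow to the sizes of its first n packets.

    `packets` is an iterable of dicts with hypothetical keys: src_ip, dst_ip,
    src_port, dst_port, protocol, size. Packets are assumed to be given in
    capture order.
    """
    flows = defaultdict(list)
    for pkt in packets:
        # Uni-directional 5-tuple: A->B and B->A are treated as different flows.
        key = (pkt["src_ip"], pkt["dst_ip"],
               pkt["src_port"], pkt["dst_port"], pkt["protocol"])
        if len(flows[key]) < n:
            flows[key].append(pkt["size"])
    # Pad short flows with zeros so every feature vector has length n.
    return {key: sizes + [0] * (n - len(sizes)) for key, sizes in flows.items()}
```

Because only the first few packets are needed, a feature vector is available almost as soon as a flow starts, which is what makes real-time classification plausible.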
How many packets do we have to go through to identify specific TCP apps?
• Sizes of the first 4-5 packets alone: ~80-85% accuracy.
How many packets do we have to go through to identify specific UDP apps?
• Sizes of the first 1-2 packets alone: ~80-100% accuracy!
Concluding Remarks
• The measured discriminative power of features
1. The sizes of the first 4-5 packets: ~85% accuracy.
   • Doesn’t have to be bi-directional TCP connection flows [Bernaille ‘06]
2. (1) + Protocol + Ports: ~88.6% accuracy.
   – Even without any algorithmic intelligence/model (we did nothing but distance calculation in the Euclidean feature space).
3. Real-time traffic classification with one-directional flows.
4. OK then, now we’ve got 11-17% left to achieve
   • Which algorithm(s), based on what basic theory, are best for obtaining/maximizing the additional performance gain?
Developing an open-source traffic classification benchmark
Open-source, Plug & Play Framework
References
[Kim ‘08] Kim et al., “Internet Traffic Classification Demystified: Myths, Caveats, and the Best Practices,” ACM CoNEXT, Madrid, Spain, December 2008.
[CoralReef ‘07] CoralReef. http://www.caida.org/tools/measurement/coralreef
[Erman ‘06] Erman et al., “Traffic Classification Using Clustering Algorithms,” ACM SIGCOMM Workshop on Mining Network Data (MineNet), Pisa, Italy, September 2006.
[Karagiannis ‘05] Karagiannis et al., “BLINC: Multi-level Traffic Classification in the Dark,” ACM SIGCOMM 2005, Philadelphia, PA, August 2005.
[Won ‘06] Won et al., “A Hybrid Approach for Accurate Application Traffic Identification,” IEEE/IFIP E2EMON, April 2006.
[Bernaille ‘06] Bernaille et al., “Early Application Identification,” ACM CoNEXT, Lisboa, Portugal, December 2006.
k-Nearest Neighbors
[Figure: training instances for class A and class B, and testing instances to classify, plotted in a feature space with axes Feature X (e.g., 1st packet size, …) and Feature Y.]
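The idea pictured on this slide can be sketched in a few lines: a testing instance takes the majority label of its k closest training instances, measured by Euclidean distance in the feature space. This is a generic illustration rather than the WEKA implementation used in the study; the function name and the sample feature values in the usage note are hypothetical.

```python
import math
from collections import Counter


def knn_classify(training, query, k=1):
    """Classify `query` by majority vote among its k nearest training instances.

    `training` is a list of (feature_vector, label) pairs; feature vectors are
    equal-length sequences of numbers.
    """
    # Rank training instances by Euclidean distance to the query point.
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote over the labels of the k nearest instances.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

For example, with hypothetical two-feature training data such as `[((60, 1460), "web"), ((120, 130), "dns")]`, a query of `(62, 1450)` lands next to the "web" instance and is labeled accordingly. No model is fitted; all the work is the distance calculation.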