Statistical Identification of Encrypted Web-Browsing Traffic
Qixiang Sun
Stanford University
Daniel R. Simon, Yi-Min Wang, Wilf Russell, Venkata N. Padmanabhan, Lili Qiu
Microsoft Research
Outline
• Motivation & Problem
• Intuition
• Hypothetical Attacker
• Attacker’s Success Rate
• Countermeasures
• Conclusion
Anonymous Web Browsing
• Protect personal information from Attacker’s Inference
– Medical (Online support group)
– Questionable Activities
• Question: Is this REALLY anonymous?
[Figure: chain of routers R1, R2, R3, R4]
What’s Different?
In anonymous Web browsing:
– The chain of routers is used for both sending and receiving data
⇒ Can link HTTP requests and responses!
– The target Web pages are publicly accessible
⇒ Responses are known!
Implication: The first link/router is an exploitable weakness.
What Information is Available?
[Figure: HTTP GET requests and responses exchanged between the browser and the 1st router, followed by the router chain R1, R2, R3, R4]
• Number of objects
• Object sizes
• Ordering of the objects
• Delay between packets
Intuition
• Number of objects and object sizes are sufficient to identify a Web page!
– On average, a Web page has 11 objects, with each object yielding 8.4 bits of information
$8.4 \times 11 - \log_2(11!) \approx 67$ bits $\Rightarrow$ about $10^{20}$ possibilities!!
– Currently, there are about $10^9$ Web pages
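The arithmetic on this slide can be checked with a short sketch. It takes the slide's figures (11 objects, 8.4 bits per object, about 10^9 pages) as given and assumes the log2(11!) term is subtracted because the object sizes are treated as an unordered collection. The constant names are illustrative only.

```python
import math

# Figures quoted on the slide, taken here as given inputs.
OBJECTS_PER_PAGE = 11
BITS_PER_OBJECT = 8.4
PAGES_ON_WEB = 10**9

# Assumption: sizes are observed as an unordered collection, so the bits
# that the ordering of 11 objects would carry, log2(11!), are subtracted.
ordering_bits = math.log2(math.factorial(OBJECTS_PER_PAGE))
pattern_bits = OBJECTS_PER_PAGE * BITS_PER_OBJECT - ordering_bits
possible_patterns = 2 ** pattern_bits

print(f"ordering bits:     {ordering_bits:.1f}")
print(f"pattern bits:      {pattern_bits:.1f}")
print(f"possible patterns: {possible_patterns:.1e}")
print(f"patterns per page: {possible_patterns / PAGES_ON_WEB:.1e}")
```

Under these inputs the number of possible patterns exceeds the number of pages by many orders of magnitude, which is the intuition behind the claim that object counts and sizes can suffice to single out a page.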
A Hypothetical Attacker
[Figure: attacker architecture]
• Offline: list of target sensitive-site URLs → programmatic access to each URL & traffic recording → traffic-pattern construction & database update → traffic-pattern database (with history)
• Online: traffic recording & pattern construction on the Browser–R1 link → observed traffic pattern → similarity-score calculation against the database → decision module → positive / negative
Guts of the Pattern Matching
• Given two multisets of object sizes $S_1$ and $S_2$:
$\mathrm{Sim}(S_1, S_2) = \dfrac{|S_1 \cap S_2|}{|S_1 \cup S_2|}$
• Decision module uses an absolute threshold.
[Figure: the observed traffic pattern and the traffic-pattern database feed the similarity-score calculation, whose output goes to the decision module]
For example: $S_1$ = {3KB, 3KB, 5KB}, $S_2$ = {3KB, 5KB, 5KB}
$\mathrm{Sim}(S_1, S_2) = \dfrac{|\{3\text{KB}, 5\text{KB}\}|}{|\{3\text{KB}, 3\text{KB}, 5\text{KB}, 5\text{KB}\}|} = \dfrac{2}{4} = 0.5$
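The similarity score above can be sketched in Python with `collections.Counter`, whose `&` and `|` operators give multiset intersection (minimum counts) and union (maximum counts). The function names, the `>=` comparison, and the default threshold value are assumptions made for illustration, not details taken from the slides.

```python
from collections import Counter

def similarity(s1, s2):
    """Multiset Jaccard score: |S1 ∩ S2| / |S1 ∪ S2|."""
    c1, c2 = Counter(s1), Counter(s2)
    intersection = sum((c1 & c2).values())  # min count per size
    union = sum((c1 | c2).values())         # max count per size
    return intersection / union if union else 0.0

def is_match(observed, stored, threshold=0.5):
    """Hypothetical decision rule: accept when the score reaches the threshold."""
    return similarity(observed, stored) >= threshold

# The slide's example, with sizes in KB.
s1 = [3, 3, 5]
s2 = [3, 5, 5]
print(similarity(s1, s2))  # 0.5
print(is_match(s1, s2))    # True
```

Representing each page as a list of sizes keeps duplicates, which matters here: as plain sets the two example pages would be identical and score 1.0.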
Experiment Setup
• Approximately 100,000 Web pages in total (URLs obtained from the Open Directory Project).
• The hypothetical attacker chooses about 2200 pages as target pages.
• Goal: Can these 2200 pages be identified without causing many false positives?
What is a Success and Failure?
• Successful Identification:
– A target page passes the similarity threshold and is not confused with other pages in the target set.
• False Positive:
– A non-target page is incorrectly identified as one of the target pages.
• Potential False Positive:
– A page passes the similarity threshold when compared with a single selected target page.
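These three definitions can be made concrete with a small sketch. It is one reading of the definitions, not the authors' evaluation code: the page names and size lists are invented toy data, the helper names are hypothetical, and the multiset similarity score is repeated so the block runs on its own.

```python
from collections import Counter

def similarity(s1, s2):
    """Multiset Jaccard score between two collections of object sizes."""
    c1, c2 = Counter(s1), Counter(s2)
    union = sum((c1 | c2).values())
    return sum((c1 & c2).values()) / union if union else 0.0

def matching_targets(observed, targets, threshold):
    """Names of target pages whose stored pattern passes the threshold."""
    return {name for name, pattern in targets.items()
            if similarity(observed, pattern) >= threshold}

def potential_false_positives(target_pattern, other_pages, threshold):
    """Names of other pages that pass the threshold against one target."""
    return {name for name, pattern in other_pages.items()
            if similarity(pattern, target_pattern) >= threshold}

# Toy data (sizes in KB); entirely made up for illustration.
targets = {"clinic": [3, 3, 5, 12], "forum": [1, 2, 40]}
others = {"news": [3, 5, 12, 7], "blog": [9, 9, 9]}
T = 0.5

# Successful identification: the page matches itself and no other target.
identified = {n for n, p in targets.items()
              if matching_targets(p, targets, T) == {n}}
# False positive: a non-target page matches at least one target.
false_pos = {n for n, p in others.items() if matching_targets(p, targets, T)}

print(identified)
print(false_pos)
print({n: len(potential_false_positives(p, others, T))
       for n, p in targets.items()})
```

In this toy data "news" scores 3/5 = 0.6 against "clinic", so it is both a false positive and the single potential false positive for that target, while "forum" has none.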
Attacker’s Success Rate
• A threshold of 0.5 is sufficient.
[Figure: identification rate (2,191 target pages) and actual false-positive rate (98,496 non-target pages), as % of pages (0 to 90), versus absolute threshold (0.2 to 0.9). Annotated values: 80.4% identification rate, 2.1% false-positive rate.]
• Is this small enough?
A Detailed Look Inside
• False positives are NOT generated uniformly!
[Figure: % of target pages (70 to 100) versus number of potential false positives (0 to 1200), with annotations for 0-identifiable pages, HTTP 404s, and common-looking pages]
Dynamism in Web Pages
• Most pages are relatively static
One-day-old pattern database is sufficient
[Figure: % of target pages (0 to 100) versus self-similarity score (0 to 1)]
Countermeasures
• Padding
– Individual objects
– Add random-sized objects
• Morphing
– Pipelining the HTTP GET requests
– Pre-fetching
• Mimicking
– Common templates or Web-hosting services
Padding Object Size
• Linear – Nearest multiple of padding size
• Exponential – Nearest power of 2
[Figure: % of 0-identifiable pages (0 to 60) versus minimum object size (128 to 16384), for linear padding and exponential padding]
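The two padding schemes on this slide can be sketched as follows. The sketch assumes that "nearest" means rounding up, since padding can only add bytes, and that the exponential scheme starts from a minimum object size that is itself a power of two, as the chart's axis values suggest. The function names are illustrative.

```python
def pad_linear(size, bucket):
    """Round size up to the next multiple of bucket (ceiling division)."""
    return -(-size // bucket) * bucket

def pad_exponential(size, minimum):
    """Round size up to the next power of two, never below minimum.

    Assumes minimum is itself a power of two (128, 256, ...).
    """
    padded = minimum
    while padded < size:
        padded *= 2
    return padded

sizes = [3000, 3100, 5000]
print([pad_linear(s, 1024) for s in sizes])      # [3072, 4096, 5120]
print([pad_exponential(s, 128) for s in sizes])  # [4096, 4096, 8192]
```

Exponential padding maps more distinct sizes onto the same padded value, at the cost of sending up to roughly twice the original bytes per object.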
Padding Random Objects
[Figure: % of 0-identifiable pages (0 to 45) versus absolute threshold (0.3 to 0.7); series: Multiple of 10]
Two-chunk Pipelining
• Approximately 36% of the target pages are 0-identifiable.
– Very close to the theoretical limit of 1/e (assuming traffic patterns are random)
• Implication: Can harness the total entropy in the Web page traffic patterns.
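One way a 1/e limit can arise, offered as a sketch rather than the authors' derivation: assume (this is not stated on the slide) that each of $N$ pages is independently assigned one of $N$ equally likely traffic patterns, and call a page 0-identifiable when none of the other $N-1$ pages shares its pattern. Then:

```latex
\Pr[\text{page is 0-identifiable}]
  = \left(1 - \frac{1}{N}\right)^{N-1}
  \;\xrightarrow{\,N \to \infty\,}\; \frac{1}{e} \approx 0.368
```

Under that assumption about 36.8% of pages would be uniquely identifiable, which is close to the 36% reported above.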
One-chunk Pipelining
[Figure: % of K-identifiable pages (0 to 12) versus K, the number of potential false positives (0 to 12)]
Conclusion
• Encrypted Web browsing can be identified by the target page’s “unique” traffic pattern.
Linear Padding
[Figure: % of identifiable sites (0 to 70) versus padding bucket size; series: 0-identifiable, 1-identifiable, 2-identifiable]
Exponential Padding
[Figure: % of identifiable sites (0 to 40) versus minimum padding size (128 to 16384); series: 0-identifiable, 1-identifiable, 2-identifiable]
Pad Random Objects
[Figure: % of identifiable sites (0 to 45) versus absolute threshold (0.3 to 0.7); series: Multiple of 10, Multiple of 15, Multiple of 20]