Motivation
Today’s traces help build tomorrow’s systems
ISPs view raw network traces as a liability
Traces can compromise user privacy
Protecting users’ privacy is increasingly important
Trace anonymization mitigates these issues
Offline Anonymization
Trace anonymized after raw data is collected
Privacy risk until raw data is deleted
Today’s traces require deep packet inspection
Headers insufficient to understand phishing or P2P
Payload traces pose a serious privacy risk
Risk to user privacy is too high
Two universities rejected offline anonymization
Offline’s Privacy Vulnerabilities
Two types of attacks:
1. Traditional: Network intrusion attacks
2. New: Raw data can be subpoenaed
Both universities required that a subpoena must not be able to compromise user privacy
Online Anonymization
Trace anonymized while tracing
Raw data resides in RAM only
Difficult to meet performance demands
Extraction and anonymization must be done at line speeds
Code is frequently buggy and difficult to maintain
Low-level languages (e.g. C) + “Home-made” parsers
Small bugs cause large amounts of data loss
Introduces consistent bias against long-lived flows
Simple Tasks can be Very Slow
Regular expression for phishing:
" ((password)|(<form)|(<input)|(PIN)|(username)|(<script)|(user id)|(sign in)|(log in)|(login)|(signin)|(log on)|(sign on)|(signon)|(passcode)|(logon)|(account)|(activate)|(verify)|(payment)|(personal)|(address)|(card)|(credit)|(error)|(terminated)|(suspend))[^A-Za-z]"
libpcre: 5.5 s to scan 30 MB = 44 Mbps max
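The 44 Mbps ceiling follows directly from the measured scan time. A small worked check, assuming the slide's "30 M" means 30 million bytes of payload (the function name `max_throughput_mbps` is ours, not part of any tool mentioned here):

```python
def max_throughput_mbps(num_bytes: int, seconds: float) -> float:
    """Sustained throughput in megabits per second (decimal units)."""
    return num_bytes * 8 / seconds / 1e6


# Assumption: 30 million bytes scanned in 5.5 seconds, as quoted on the slide.
rate = max_throughput_mbps(30_000_000, 5.5)
print(f"{rate:.1f} Mbps")  # prints "43.6 Mbps", which rounds to the slide's 44 Mbps
```

At that rate a single regular expression cannot keep up with a link much faster than about 44 Mbps, which is the point of the slide.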
Our solution: Bunker
Combines best of both worlds
Same privacy benefits as online anonymization
Same engineering benefits as offline anonymization
Pre-load analysis and anonymization code
Lock it and throw away the key (tamper resistance)
Threat Model
Accidental disclosure:
Risk is substantial whenever humans are handling data
Subpoenas:
Attacker has physical access to tracing system
Subpoenas force researchers and ISPs to cooperate
As long as cooperation is not “unduly burdensome”
Implication: Nobody can have access to raw data
It Depends on Intent of Use
Developing Bunker is like developing encryption
Must consider purpose and uses of Bunker
Developing Bunker for user privacy is legal
Misuse of Bunker to bypass law is illegal
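Logical Design
[Figure: logical design. Online stage: capture from the capture hardware. Offline stages: assemble, parse, anonymize (using the Anon. Key). Anonymized data leaves through a one-way interface.]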
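The offline stages of the logical design can be illustrated with a small sketch. This is not Bunker's code: the keyed-hash scheme, the field names, and the helpers `anonymize` and `process_flow` are assumptions chosen for illustration, and the slides do not specify the anonymization policy.

```python
import hashlib
import hmac
import secrets

# Hypothetical stand-in for the "Anon. Key": random and kept in memory only.
ANON_KEY = secrets.token_bytes(32)


def anonymize(value: str, key: bytes = ANON_KEY) -> str:
    """Replace an identifier with a keyed, truncated HMAC-SHA256 pseudonym."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def process_flow(raw_flow: dict) -> dict:
    """Toy parse-and-anonymize step: export only fields that are safe to release."""
    return {
        "src": anonymize(raw_flow["src_ip"]),
        "dst": anonymize(raw_flow["dst_ip"]),
        "num_bytes": len(raw_flow["payload"]),  # size only, never the payload
    }


record = process_flow(
    {"src_ip": "192.0.2.10", "dst_ip": "198.51.100.7", "payload": b"GET / HTTP/1.1\r\n"}
)
print(record["num_bytes"])  # prints 16
```

Only the returned record would cross the one-way interface; the raw flow and the key stay on the closed side. The same address always maps to the same pseudonym under one key, so flows can still be correlated in the anonymized trace.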
VM-based Implementation
[Figure: VM-based implementation. Online, a closed-box VM on the hypervisor captures traffic from the capture hardware and encrypts the raw data to disk with the Enc. Key. Offline, it decrypts, assembles, parses, and anonymizes the data using the Anon. Key. Anonymized output crosses a one-way socket to an open-box VM, which has an open-box NIC and handles saving the trace, logging, and maintenance.]
Benefits
Strong privacy properties
Raw trace and other sensitive data cannot be leaked
Trace processing done offline
Can use your favorite language!
Parsing can be done with off-the-shelf components
Key Technologies
“Closed-box” VM protects sensitive data
Contains all raw trace data & processing code
No interactive access to closed-box (e.g. no console)
Encryption protects on-disk data
Randomly generated key held in volatile memory
Data cannot be decrypted upon reboot
“Safe-on-reboot” VM mitigates hardware attacks
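A minimal sketch of the "key held in volatile memory" idea, under stated assumptions. The class name `EphemeralKeyStore` is hypothetical, and the SHA-256 counter-mode keystream is a toy stand-in for a vetted cipher: the slides do not say which cipher Bunker uses, and this construction provides no integrity protection.

```python
import hashlib
import secrets


class EphemeralKeyStore:
    """Holds a random key in process memory only; it is never written to disk."""

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)

    def _keystream(self, nonce: bytes, length: int) -> bytes:
        # Toy keystream: SHA-256 over key, nonce, and a block counter.
        out = bytearray()
        counter = 0
        while len(out) < length:
            block = hashlib.sha256(self._key + nonce + counter.to_bytes(8, "big"))
            out += block.digest()
            counter += 1
        return bytes(out[:length])

    def encrypt(self, plaintext: bytes) -> bytes:
        nonce = secrets.token_bytes(16)  # fresh nonce per record
        stream = self._keystream(nonce, len(plaintext))
        return nonce + bytes(a ^ b for a, b in zip(plaintext, stream))

    def decrypt(self, blob: bytes) -> bytes:
        nonce, body = blob[:16], blob[16:]
        stream = self._keystream(nonce, len(body))
        return bytes(a ^ b for a, b in zip(body, stream))


before_reboot = EphemeralKeyStore()
on_disk = before_reboot.encrypt(b"raw packet bytes")
recovered = before_reboot.decrypt(on_disk)  # works while the key is in memory

after_reboot = EphemeralKeyStore()  # simulates a reboot: new key, old key lost
garbage = after_reboot.decrypt(on_disk)  # no longer yields the raw data
```

Because the key exists only in memory, whatever was written to disk before a reboot becomes unreadable afterwards, which is the property the slide describes.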
Software Engineering Benefits
One order of magnitude between online and offline
Development time: Bunker - 2 months, UW/Toronto - years
[Figure: lines of code, split into C and Python, for the UW, Toronto, and Bunker tracers. Bar labels: 63,382; 53,995; 1,350; 5,512. Axis: 0 to 60,000.]
Work Deferral
Don’t do now what you can do later
[Figure: queue size (GB), on a 0 to 200 scale, versus time of day from 12:00 PM to 12:00 PM the following day.]
Error Recovery
Small bugs lead to small errors in the trace -- not huge gaps
[Figure: % of flows by outcome (parsing OK, parsing errors, collateral damage) for the online tracer and the tamper-resistant tracer. Bar labels: 31.72%, 68.20%, 0.08% (online tracer); 99.92%, 0.08% (tamper-resistant tracer). Axis: 0% to 100%.]
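One way offline processing can keep a parser bug from turning into a large gap is to handle each flow independently. The sketch below is illustrative only: `parse_flow` is a hypothetical toy parser, and the slides do not describe Bunker's actual recovery mechanism.

```python
def parse_flow(raw: bytes) -> dict:
    """Hypothetical parser: expects b'<method> <path>' and raises on malformed input."""
    method, path = raw.decode("ascii").split(" ", 1)
    return {"method": method, "path": path}


def parse_all(flows: list[bytes]) -> tuple[list[dict], int]:
    """Parse each flow on its own so one bad flow cannot abort the whole run."""
    parsed, errors = [], 0
    for raw in flows:
        try:
            parsed.append(parse_flow(raw))
        except (UnicodeDecodeError, ValueError):
            errors += 1  # record the failure and continue with the next flow
    return parsed, errors


parsed, errors = parse_all(
    [b"GET /index.html", b"\xff\xfe", b"POST /login", b"malformed"]
)
print(len(parsed), errors)  # prints "2 2"
```

The two malformed flows are counted as parsing errors, while the flows around them are still parsed, so there is no collateral damage to unrelated flows.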
Phishing is Bad
Costs U.S. economy hundreds of millions
Affects 1+ million U.S. Internet users
2004 - mid 2006: # of phishing sites grew 10x
Banks claim phishing is #1 source of fraud
Phishing messages now personalized
Harder to filter
Two Day Hotmail Trace
Tues Jan 29/08 11:15am - Thurs Jan 31 11:23am, University of Toronto at Mississauga

                                  Hotmail
Users                             3,062
# of E-mails Received             13,438
# of From Addresses               7,422
# of To Addresses                 25,456
Median # of Words in E-mail Body  130
Questions
How often are URLs present in e-mails?
How often do people click on links in e-mails?
Do people verify an e-mail for legitimacy before clicking on a link?
Links in Email
[Figure: bars for users and emails showing % with URLs, % with clicks, and % with clicks <= 2 s. Bar labels: 1.53%, 0.54%, 5.86%, 90.80%, 18.70%, 78.80%. Axis: 0% to 100%.]
Conclusions
Today’s tracing experiments need to look “deep” into network activity
IP-level trace vs. email and browsing history
Serious privacy concerns
Physical security isn’t enough: subpoenas
Bunker provides the safety of online anonymization and the simplicity of offline anonymization
Acknowledgments
Andrew Miklas (U. of Toronto)
Alec Wolman (Microsoft Research)
Angela Demke Brown (U. of Toronto)
Design
[Figure: design on the Xen hypervisor. Labels: closed-box VM (Domain0), open-box VM (DomainU), online software, offline software, untrusted software, Anon. Key, Enc. Key, capture NIC, open NIC, encrypted raw trace, one-way interface.]
Phishy Mail Leaks through Filters
[Figure: % of emails (axis 0% to 20%) matching each filter rule: SARE_EBAY_SPOOF_NAME, SARE_SPOOF_BADURL, SARE_BANK_URI_IP, HTML_OBFUSCATE_10_20, HTML_OBFUSCATE_05_10, MURTY_PHISHING3, SCREENTIP, NORMAL_HTTP_TO_IP, MURTY_PHISHING1. Bar labels: 0.22%, 0.03%, 0.10%, 0.42%, 0.85%, 2.93%, 4.33%, 17.10%, 0.01%.]
[Figure: two builds of the implementation diagram. A commodity VM (save trace, logging, maintenance) receives the anonymized trace over a one-way socket from an inaccessible VM (capture, assemble, parse, anonymize, Anon. Key) running on the hypervisor with the capture hardware. The second build adds encrypt, decrypt, the Enc. Key, and the encrypted raw trace.]