Automated Tracking of Online Service Policies
J. Trent Adams (1), Kevin Bauer (2), Asa Hardcastle (3), Dirk Grunwald (2), Douglas Sicker (2)
(1) The Internet Society; (2) University of Colorado; (3) OpenLiberty.org
38th Research Conference on Communication, Information and Internet Policy
TPRC 2010: Automated Tracking of Online Service Policies
What They Know
• Search queries
• Web browsing habits
• Shopping habits
• Social relationships
• Offline behaviors
• Personal interests
• Possible medical conditions
• Financial status
User Tracking is Easy and Common
When a user visits a website…
Implicit information revealed:
– IP address
– HTTP request headers (user-agent, operating system, local time and language, referrer)
This information alone can be used to construct an identifying, trackable profile [EFF’s Panopticlick, PETS ’10]
Additional tracking elements: Sites often embed cookies and other tools to explicitly identify and track users
dictionary.reference.com
Source: http://blogs.wsj.com/wtk
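To illustrate how implicit request data alone can yield a trackable profile, here is a minimal Python sketch. The function name `fingerprint` and the particular choice of header fields are assumptions made for illustration; fingerprinting studies such as Panopticlick rely on many more signals than these.

```python
import hashlib
from typing import Mapping


def fingerprint(ip_address: str, headers: Mapping[str, str]) -> str:
    """Combine implicitly revealed request attributes into one identifier."""
    # Hypothetical selection of headers; a real tracker would use more signals.
    fields = ("User-Agent", "Accept-Language", "Accept")
    parts = [ip_address] + [headers.get(name, "") for name in fields]
    # Join with a separator that cannot appear in header values, then hash
    # so that the profile key has a fixed length.
    joined = "\x1f".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()
```

Two requests that carry the same address and headers map to the same identifier, which is what lets a site link visits together without setting a cookie.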
The Need for Clear Policy Articulation
Given the inherent privacy risks in ordinary web browsing, most sites explicitly explain how they handle sensitive user data (PII) in a human-readable, natural language privacy policy or terms of service document
Pros of natural language policies
– Near universal deployment
Cons of natural language policies
– Users must find, read, and comprehend the policies
– Comprehension is poor for natural language policies [McDonald et al., PETS ’09]
Structured Policy Formats: P3P
• The Platform for Privacy Preferences (P3P) is a machine-readable XML schema for encoding:
  – What kind of user information is collected
  – How any collected user information is used
  – How long user information is stored
• P3P files can be automatically parsed and semantically analyzed by the web browser
• Users can specify their own preferences and interact only with sites with compatible policies
• Policy information can be transformed into “standardized” formats to improve policy comprehension
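As a rough illustration of automatic parsing and preference matching, the Python sketch below reads a simplified policy fragment and reports the declared practices a user has chosen to block. The fragment is only modeled on P3P: it omits the XML namespace and most required elements, and the names `SAMPLE_POLICY`, `summarize`, and `violations` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free fragment modeled on a P3P policy (an assumption
# for illustration; real P3P files carry more structure than this).
SAMPLE_POLICY = """
<POLICY name="sample">
  <STATEMENT>
    <PURPOSE><current/><telemarketing/></PURPOSE>
    <RETENTION><indefinitely/></RETENTION>
    <DATA-GROUP><DATA ref="#user.home-info.online.email"/></DATA-GROUP>
  </STATEMENT>
</POLICY>
"""


def summarize(policy_xml):
    """Return one dict per STATEMENT: purposes, retention, and data collected."""
    root = ET.fromstring(policy_xml.strip())
    summary = []
    for statement in root.iter("STATEMENT"):
        summary.append({
            "purposes": {c.tag for c in statement.findall("PURPOSE/*")},
            "retention": {c.tag for c in statement.findall("RETENTION/*")},
            "data": {d.get("ref") for d in statement.findall("DATA-GROUP/DATA")},
        })
    return summary


def violations(policy_xml, blocked_purposes, blocked_retention):
    """List declared practices that conflict with the user's preferences."""
    problems = []
    for statement in summarize(policy_xml):
        problems += sorted(statement["purposes"] & blocked_purposes)
        problems += sorted(statement["retention"] & blocked_retention)
    return problems
```

A browser-side agent could refuse to interact, or warn the user, whenever `violations` returns a non-empty list for the visited site.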
P3P and Standardized Policy Formats
Structured policy formats (like P3P) can be summarized and displayed to users in standardized, easy-to-read formats
[Figure: “Privacy Finder” P3P search engine result]
Slow Adoption for P3P
A study by Cranor et al. found that the most popular web sites tend to be more likely to offer P3P, but overall deployment is very low
Source: Cranor et al., Electronic Commerce Research and Applications 2008
2006: Only 10.25% offer P3P
2008: Only 13.59% offer P3P
Our Goal: Make Interacting with Natural Language Policies Easier
P3P adoption is limited, but human-readable policies are prevalent
This is a stop-gap measure: Until a structured policy format is widely adopted, we must interact with natural language policies
Our contribution: Design and implement the Policy Audit System, which:
- Aggregates natural language policies for a wide variety of websites
- Periodically checks these policy documents for updates
- Enables distribution of policies to interested users
- Notifies users about specific changes in policies
[Figure: P3P, natural language policy tracking, … new structured policy format?]
Policy Audit System: Architecture
Key Components:
- Policy Monitor: Periodically fetches known policy documents for a large set of websites; checks policies for changes
- Policy Library: The collection of policy documents for each site over time
- Policy Library Mirrors: Copies of the policy library hosted by third parties
- Clients: Offer a way for users to obtain current or past policy information
Policy Monitor
• Periodically fetches a set of policy document URLs
• Extracts relevant policy text using standard text parsing techniques
• Compares the latest version to the previously seen version to detect changes
• Records the latest version (if changed)
• Based on the EFF’s TOSBack service (http://www.tosback.org)
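The extract, compare, and record steps can be sketched in Python as follows. This is an illustrative sketch, not the Policy Monitor's or TOSBack's actual code: the helper names (`extract_text`, `check_for_change`) are hypothetical, text extraction is reduced to stripping markup with the standard library, and fetching is left out (a caller might obtain `fetched_html` with `urllib.request.urlopen`).

```python
import difflib
import hashlib
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def extract_text(html):
    """Strip markup and collapse whitespace so layout changes are ignored."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.chunks)).strip()


def digest(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def check_for_change(previous_text, fetched_html):
    """Return (changed, new_text, diff_lines) for one policy document."""
    new_text = extract_text(fetched_html)
    if previous_text is not None and digest(new_text) == digest(previous_text):
        return False, new_text, []
    # Diff sentence by sentence so the report points at specific changes.
    old_sentences = (previous_text or "").split(". ")
    new_sentences = new_text.split(". ")
    diff = list(difflib.unified_diff(old_sentences, new_sentences, lineterm="", n=0))
    return True, new_text, diff
```

Comparing digests of the extracted text, rather than of the raw page, avoids flagging a change when only scripts or whitespace differ.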
Policy Library
• The Policy Monitor produces a library of policy documents, as they change over time
• The Policy Library is a directory structure available via the web:
  – A list of tracked websites
  – Policy text snapshots, or previous versions
  – Various metadata to help find the latest document version
• The master library is hosted by the University of Colorado
• Currently tracking 76 distinct policies (more coming soon)
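One way such a directory structure could be organized is sketched below in Python. The layout (`<site>/<policy>/<timestamp>.txt` plus a `latest.json` metadata file) and the function names are assumptions chosen for illustration; the slides do not specify the actual on-disk format.

```python
import json
from pathlib import Path


def record_snapshot(library_root, site, policy_name, text, timestamp):
    """Store one policy version and update the pointer to the latest one."""
    policy_dir = Path(library_root) / site / policy_name
    policy_dir.mkdir(parents=True, exist_ok=True)
    snapshot = policy_dir / f"{timestamp}.txt"
    snapshot.write_text(text, encoding="utf-8")

    # Hypothetical metadata file: names the newest snapshot and lists all versions.
    meta_path = policy_dir / "latest.json"
    meta = {"latest": snapshot.name, "versions": []}
    if meta_path.exists():
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
    if snapshot.name not in meta["versions"]:
        meta["versions"].append(snapshot.name)
    meta["latest"] = snapshot.name
    meta_path.write_text(json.dumps(meta, indent=2), encoding="utf-8")
    return snapshot


def list_sites(library_root):
    """The list of tracked websites is simply the top-level directories."""
    return sorted(p.name for p in Path(library_root).iterdir() if p.is_dir())


def latest_text(library_root, site, policy_name):
    """Follow the metadata pointer to the newest snapshot of a policy."""
    policy_dir = Path(library_root) / site / policy_name
    meta = json.loads((policy_dir / "latest.json").read_text(encoding="utf-8"))
    return (policy_dir / meta["latest"]).read_text(encoding="utf-8")
```

Because the library is plain files, a layout like this can be served by any web server and copied to mirrors without special software.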
Policy Library Mirrors
• Policy Library copies that are distributed among trusted parties
• The Electronic Frontier Foundation (EFF), the Center for Democracy and Technology (CDT), and the University of Colorado host Policy Library mirrors
Clients
• Generically, a client offers an interface to the Policy Library, providing access to policy data
• A client could offer the ability to search the library or automate change notification via Twitter, Atom, RSS, or e-mail
• We developed a client as a Firefox plug-in that displays policy information (and notification of changes) for the site the user is currently visiting
Example Client: Firefox Browser Plug-in*
• Accesses the Policy Library and alerts the user when they visit a website that publishes a policy that the Policy Monitor is tracking
Alert Icons
– Visiting a site that’s not tracked
– Visiting a tracked site, but no change in policy since last visit
– Visiting a tracked site with an updated policy since last visit
– Visiting a tracked site with an unread policy
* sponsored by
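The four alert states can be expressed as a small decision function. The sketch below is in Python for illustration only (the plug-in itself runs in Firefox), and the names `Alert` and `alert_for`, along with the two lookup tables, are hypothetical.

```python
from enum import Enum


class Alert(Enum):
    NOT_TRACKED = "site is not tracked"
    UNCHANGED = "tracked, no change in policy since last visit"
    UPDATED = "tracked, policy updated since last visit"
    UNREAD = "tracked, policy not yet read"


def alert_for(site, latest_versions, seen_versions):
    """Pick the alert icon for a site.

    latest_versions: site -> newest version id in the Policy Library.
    seen_versions:   site -> version id the user last read (absent if never).
    """
    if site not in latest_versions:
        return Alert.NOT_TRACKED
    if site not in seen_versions:
        return Alert.UNREAD
    if seen_versions[site] != latest_versions[site]:
        return Alert.UPDATED
    return Alert.UNCHANGED
```

Keeping the user's last-read version on the client side means the library itself never needs to know which sites a given user visits.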
Plug-in: Visiting a Tracked Site
Menu lists tracked policies
Plug-in: Visiting a Tracked Site with Policy Changes
Plug-in: Discovering Third Party Information Disclosure
Current policies for a visited page: www.apple.com/itunes
Notify user of third-party page elements
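A simple way to find third-party page elements is to compare the host of each embedded resource with the host of the page. The Python sketch below illustrates the idea under stated assumptions: it inspects only `src` attributes of `script`, `img`, and `iframe` tags, and its `site_of` helper naively treats the last two DNS labels as the site, where a careful implementation would consult the Public Suffix List. All names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _SrcCollector(HTMLParser):
    """Collects absolute URLs of embedded scripts, images, and frames."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "img", "iframe"):
            src = dict(attrs).get("src")
            if src:
                # Resolve relative references against the page URL.
                self.urls.append(urljoin(self.base_url, src))


def site_of(host):
    # Naive assumption: the registrable domain is the last two labels.
    return ".".join(host.lower().split(".")[-2:])


def third_party_hosts(page_url, html):
    """Return the hosts of embedded elements served from another site."""
    collector = _SrcCollector(page_url)
    collector.feed(html)
    page_site = site_of(urlparse(page_url).hostname or "")
    hosts = {urlparse(u).hostname for u in collector.urls}
    return sorted(h for h in hosts if h and site_of(h) != page_site)
```

Each host returned could then be looked up in the Policy Library so the user sees the policies of the embedded parties alongside the policy of the visited page.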
Summary and Conclusion
• Given the absence of a widely adopted structured policy format, we argue that steps should be taken to make natural language policies easier for users to understand
• To this end, we present the Policy Audit System to track natural language policy documents and notify users of policy updates
• Our hope is that this work helps individuals make sense of natural language policies while we wait for a structured policy data format to be widely adopted
For more information
Project overview: http://www.policymonitor.org/about
Development community: http://www.policymonitor.org/sourcecode
Firefox plug-in download: http://www.policymonitor.org/auditplugin
Thank you
[email protected]