Privacy and Anonymity in the Internet
1
Part 2: Observation and analytics techniques
Aspects:
• Collection of data about the client (user)
• Recognition of a specific client and combination of the collected information
• Techniques for analytics
• Browser identification techniques (browser fingerprinting)
BaSoTI 2014, Privacy and Anonymity in the Internet
2
Aim of User Behavior Analytics
Analytics: collect data about the client (user) and combine the information
Questions:
• Which websites were visited?
• Relations to other users
• Identification of a user and assignment to a user profile (can be pseudonymized)
Aims:
• Improvement of service
• Selective advertisement
• Placement of products, profit increase
• Selling of user profiles (anonymous statistics, personal data)
3
Web Usage Analytics
Interest in:
• which websites were visited, and in which order
• which links were followed
• how long a user stayed on a web page
Recognize a user:
• by IP address: only of limited value
• by web browser identification: gives more information than one would expect
• by cookies, web storage and scripts: the typical technology used by analytics tools
Privacy can be easily violated by these analytic techniques.
4
How a user leaves traces
Different scenarios: a user leaves traces (data) in the network
• voluntarily, for the purpose of a service (in blogs, social networks, search engines)
• unintentionally (log entries, cookies, referer, geographic position) and without having prohibited it
• as stealthily collected data, without the permission of the user
Possibility of server information leakage: exploitation of bugs by attackers, weak server configuration
5
Access Logging
IP address, protocol action (e.g. HTTP request), access time, user agent (i.e. browser)
[Diagram: the browser sends an HTTP GET request to the web server, which returns an HTTP response; one log entry per request]
103.161.195.112 - - [24/Dec/2011:23:34:01 +0100] "GET /images/holiday-greetings.jpg HTTP/1.1" 200 4437 "http://www.greetings.com" "Mozilla/5.0 (Windows NT 6.1; rv:13.0) Gecko/20100101 Firefox/13.0.1"
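A minimal sketch of how such a log entry can be split into its fields. It assumes the entry follows the Apache "combined" log format (client, two placeholder fields, timestamp, request line, status, size, referer, user agent); the function name parseLogEntry and the field names are illustrative choices, not part of the slides.

```javascript
// Regular expression for the Apache "combined" log format (an assumption
// about the format of the entry shown on the slide).
const COMBINED_LOG =
  /^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (\S+) (\S+)" (\d{3}) (\d+|-) "([^"]*)" "([^"]*)"$/;

// Returns an object with the logged fields, or null if the line does not match.
function parseLogEntry(line) {
  const m = COMBINED_LOG.exec(line);
  if (m === null) return null;
  return {
    clientIp: m[1],
    time: m[4],
    method: m[5],
    path: m[6],
    protocol: m[7],
    status: Number(m[8]),
    bytes: m[9] === "-" ? 0 : Number(m[9]),
    referer: m[10],
    userAgent: m[11],
  };
}

const entry = parseLogEntry(
  '103.161.195.112 - - [24/Dec/2011:23:34:01 +0100] ' +
  '"GET /images/holiday-greetings.jpg HTTP/1.1" 200 4437 ' +
  '"http://www.greetings.com" ' +
  '"Mozilla/5.0 (Windows NT 6.1; rv:13.0) Gecko/20100101 Firefox/13.0.1"'
);
console.log(entry.clientIp, entry.path, entry.referer);
```

Even this single entry reveals who asked (IP address), what was requested, when, from which page the user came, and which browser was used.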
6
Client-side Techniques and Cookies
The most common way to track web browsers is via HTTP cookies that are set by third-party analytics and advertising domains.
Cookies are key-value pairs, for example:
key="VisitorRecognition"
value="gid=964971c6-7c5f-4487-bd0b-cec3fc5fe3da7&mid=0&usl="
Technical principle:
• cookies are stored by the web browser and allow the recognition of a web client
• cookies are transferred with every HTTP request to the web server domain they belong to
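To illustrate the key-value principle, the following sketch splits a Cookie request header into names and values. The function name parseCookieHeader is an illustrative choice, and reading the example value as a query string is an assumption based on its "a=b&c=d" shape.

```javascript
// Split a Cookie request header ("name=value; name2=value2") into pairs.
function parseCookieHeader(header) {
  const cookies = {};
  for (const part of header.split(";")) {
    const eq = part.indexOf("=");   // only the first "=" separates name and value
    if (eq === -1) continue;        // skip malformed fragments
    const name = part.slice(0, eq).trim();
    const value = part.slice(eq + 1).trim();
    cookies[name] = value;
  }
  return cookies;
}

const header =
  "VisitorRecognition=gid=964971c6-7c5f-4487-bd0b-cec3fc5fe3da7&mid=0&usl=";
const cookies = parseCookieHeader(header);

// The example value looks like a query string, so it can be decoded further.
const fields = new URLSearchParams(cookies.VisitorRecognition);
console.log(fields.get("gid"));
```

The gid field is the kind of identifier that lets a server recognize the same browser on a later request.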
7
Client-side Techniques and Cookies
Cookies are typically managed by server-side scripts (PHP or ASP): the web server sets a cookie and reads it again later.
<?php
$cookie_present = false;
foreach ($_COOKIE as $k => $v) {
    if (strcmp($k, "MY_ANALYTICS_COOKIE") == 0) {
        $cookie_value = $v;   // PHP has no strcpy(); assignment copies the string
        $cookie_present = true;
    }
}
if ($cookie_present) {
    /* work with $cookie_value */
} else {
    // not seen before: set the cookie, valid for one hour
    setcookie("MY_ANALYTICS_COOKIE", "followed_my_link", time() + 3600);
}
?>
Limitation: cookies can be read only by the server that set them. However, third-party components can be involved.
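The same read-or-set decision, sketched in JavaScript as a pure function so that it can be tried without a web server. The function handleVisit and its return shape are hypothetical; only the cookie name, value and one-hour lifetime are taken from the PHP example.

```javascript
const COOKIE_NAME = "MY_ANALYTICS_COOKIE";

// cookies: object mapping cookie names to values, as received with the request.
// Returns the recognized value, or the Set-Cookie header value to send back.
function handleVisit(cookies) {
  if (Object.prototype.hasOwnProperty.call(cookies, COOKIE_NAME)) {
    return { recognized: true, value: cookies[COOKIE_NAME], setCookie: null };
  }
  // Max-Age=3600 mirrors the one-hour lifetime (time()+3600) of the PHP code.
  return {
    recognized: false,
    value: null,
    setCookie: `${COOKIE_NAME}=followed_my_link; Max-Age=3600`,
  };
}

console.log(handleVisit({}));                                          // first visit
console.log(handleVisit({ MY_ANALYTICS_COOKIE: "followed_my_link" })); // later visit
```

On the first visit the server answers with a Set-Cookie header; on every later request within the hour the browser sends the cookie back and the client is recognized.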
8
Client-side techniques and cookies
Third-party analysis tools, such as Google Analytics, AdChoices, etc.
[Diagram: client, server and analytics server]
• The web site applies analytics by including scripts provided by the analytics service
• JavaScript generates a request to the analytics server, with parameters
1: Analytics server gets the client's IP address
2: Analytics server is able to install cookies
9
Analytics
[Diagram] Normal situation without analytics: the browser loads web page 1, web page 2 and web page 3 from web server A, web server B and web server A; each web server writes one log entry per page.
Situation with analytics: the analytics server receives 3 entries.
10
Analytics
Web analytics tools use page tags (typically JavaScript) embedded into web pages
• to collect visitor data
• to store it (client-side)
• to transmit it to a remote database by pretending to load a graphic item and transferring the collected data as parameters of the request
• to track IP addresses (not permitted by German law)
If a website uses an analytics tool together with an account (such as Google Analytics and a Google account for www.blogger.com, www.youtube.com), the analytics service has sufficient information to identify and track a user.
NOTE: EU law requires a web site to get the user's permission to store non-essential cookies.
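The "pretend to load a graphic item" trick can be sketched as follows: the collected data is appended as query parameters to the address of an image. The endpoint analytics.example/pixel.gif and the parameter names are made up for illustration and do not belong to any real analytics service.

```javascript
// Hypothetical collection endpoint; not a real analytics service.
const ENDPOINT = "https://analytics.example/pixel.gif";

// Encode the collected data as query parameters of the image request.
function buildPixelUrl(data) {
  const url = new URL(ENDPOINT);
  for (const [key, value] of Object.entries(data)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

const pixelUrl = buildPixelUrl({
  page: "/products/index.html",
  ref: "http://www.greetings.com",
  w: 1366,
});
console.log(pixelUrl);
// In a browser, the page tag would then trigger the request with:
//   new Image().src = pixelUrl;
```

The analytics server never needs to deliver a meaningful image; the request itself carries the data, together with the client's IP address and any cookies for the analytics domain.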
11
DOM storage (supercookies)
Data can be stored in the web browser, similarly to cookies but using a different technique:
• storage objects provide key-value storage
• the objects are accessed via the DOM of the browser:
  window.localStorage object (persistent, only accessible for scripts from one domain)
  window.sessionStorage object (deleted after a session)
• some browsers allow storing 5 MB per domain
DOM storage bypasses the design principle that JavaScript scripts cannot access the storage of the client.
12
DOM storage (supercookies)
JavaScript example: keep track of the number of times that a user visits all pages of your domain:
window.globalStorage['basoti.org'].setItem("visits", parseInt(window.globalStorage['basoti.org'].getItem("visits") || 0) + 1);
Supercookies can be exploited for analytics/identification even when cookies are deactivated.
Supercookies can be deactivated as well, for example in the Firefox configuration: about:config -> dom.storage.enabled = {true | false}
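The slide's example uses window.globalStorage, a Firefox-specific object that, to my knowledge, is no longer available in current browsers. The sketch below shows the same visit counter against the standard Storage interface (getItem/setItem) instead; in a browser one would pass window.localStorage. The helper createMemoryStorage is a hypothetical stand-in so the example also runs outside a browser.

```javascript
// Increment and return a visit counter kept in a Storage-like object.
// In a browser, pass window.localStorage.
function countVisit(storage) {
  const visits = parseInt(storage.getItem("visits") || "0", 10) + 1;
  storage.setItem("visits", String(visits));
  return visits;
}

// Minimal in-memory stand-in for the Storage interface (illustrative only).
function createMemoryStorage() {
  const items = new Map();
  return {
    getItem: (key) => (items.has(key) ? items.get(key) : null),
    setItem: (key, value) => { items.set(key, String(value)); },
  };
}

const storage = createMemoryStorage();
countVisit(storage);               // first visit
console.log(countVisit(storage));  // second visit
```

Because the counter lives in DOM storage rather than in a cookie, deleting or blocking cookies alone does not reset it.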
13
Browser Identification Techniques
Principle: recognize a user by information that is revealed by the web browser (browser fingerprinting)
Two ways:
• The web browser sends a lot of information along with an HTTP request
• Further information can be collected by JavaScript code that runs within the web browser, e.g. screen resolution
Browser fingerprinting can be used when cookies do not work.
Browser tracking analysis tool: panopticlick.eff.org
14
Browser Identification: $_SERVER
A server-side script can use the PHP built-in array $_SERVER, which contains a lot of information including your browser's identification.
User agent: many browsers send their name and version number in a header line of the HTTP request, e.g.
GET /website.html HTTP/1.1
Accept: text/html
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:13.0) Gecko/20100101 Firefox/13.0.1
Host: webserver.com
It can be read using a PHP inline script: $_SERVER["HTTP_USER_AGENT"]
The web browser's identification string contains the browser name, the version number, and other information. Example:
Mozilla/5.0 (Windows NT 6.1; rv:13.0) Gecko/20100101 Firefox/13.0.1
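A rough sketch of how the identification string can be taken apart. It assumes the common "name/version (comment) name/version" layout of the example; real User-Agent strings vary, so this is not a complete parser, and the function name parseUserAgent is illustrative.

```javascript
function parseUserAgent(ua) {
  // Product tokens of the form name/version, e.g. "Firefox/13.0.1"
  const products = {};
  for (const m of ua.matchAll(/([A-Za-z][\w-]*)\/([\w.]+)/g)) {
    products[m[1]] = m[2];
  }
  // The first parenthesized comment usually carries platform details.
  const comment = /\(([^)]*)\)/.exec(ua);
  const platform = comment ? comment[1].split(";").map((s) => s.trim()) : [];
  return { products, platform };
}

const ua = parseUserAgent(
  "Mozilla/5.0 (Windows NT 6.1; rv:13.0) Gecko/20100101 Firefox/13.0.1"
);
console.log(ua.products.Firefox, ua.platform[0]);
```

Browser name, exact version and operating system version are each small pieces of information; together they already narrow down the set of possible clients.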
15
Browser Identification: $_SERVER
Much more can be found out …
$_SERVER['REMOTE_ADDR'] … the IP address of the requesting client
$_SERVER['REMOTE_PORT'] … the port number of the requesting process (typically changes)
$_SERVER['HTTP_REFERER'] … the URL of the website that provided a link for the current request, in other words: which website the user visited before
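For comparison, the same request data is available to server-side JavaScript. The sketch assumes a request object shaped like the one Node.js passes to an HTTP request handler (req.socket.remoteAddress, req.socket.remotePort, lower-cased req.headers); the function describeClient and the sample values are illustrative only.

```javascript
// req is expected to look like the request object of a Node.js HTTP handler.
function describeClient(req) {
  return {
    remoteAddr: req.socket.remoteAddress,          // cf. $_SERVER['REMOTE_ADDR']
    remotePort: req.socket.remotePort,             // cf. $_SERVER['REMOTE_PORT']
    referer: req.headers["referer"] || null,       // cf. $_SERVER['HTTP_REFERER']
    userAgent: req.headers["user-agent"] || null,  // cf. $_SERVER['HTTP_USER_AGENT']
  };
}

// Stand-in request object with made-up values, for illustration only.
const fakeRequest = {
  socket: { remoteAddress: "103.161.195.112", remotePort: 51234 },
  headers: {
    referer: "http://www.greetings.com",
    "user-agent":
      "Mozilla/5.0 (Windows NT 6.1; rv:13.0) Gecko/20100101 Firefox/13.0.1",
  },
};
console.log(describeClient(fakeRequest));
```

None of this requires any cooperation from the user: the values arrive with every ordinary request.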
When JavaScript is enabled at the client side, the browser may execute scripts that interact with the server via AJAX …
16
Browser identification via JavaScript
When a web site is displayed in the browser, it may contain JavaScript that is processed by the browser. These scripts can access client-side information, e.g. the screen configuration:
<script type="text/javascript">
var info = {};                        // plain object used as a key-value map
info["depth"] = screen.colorDepth;    // color depth in bits per pixel
info["width"] = screen.width;
info["height"] = screen.height;
info["cdepth"] = screen.colorDepth;
// …
// prepare an AJAX request to any web server
// send info as parameters to web server
</script>
17
Browser identification via JavaScript
Further information that can be directly read out by JavaScript code:
• timezone
• browser plugins, plugin versions (PluginDetect JavaScript library)
• supported MIME types
• whether a specified font is installed or not
Combined with request data that is generated by the browser, a so-called browser fingerprint is left.
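How the individual attributes are combined into one fingerprint can be sketched as follows. The attribute names and values are made up, and the FNV-1a hash is only a compact choice for this example; the slides do not say which hash function real fingerprinting tools use.

```javascript
// FNV-1a, a simple non-cryptographic 32-bit hash, applied here to the
// UTF-16 code units of the text (chosen for brevity, not for security).
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Combine all collected attributes into one identifier.
function fingerprint(attributes) {
  // Sort the keys so the same attributes always give the same string.
  const canonical = Object.keys(attributes)
    .sort()
    .map((k) => `${k}=${attributes[k]}`)
    .join("|");
  return fnv1a(canonical);
}

// Example attribute values (assumed, not measured).
const id = fingerprint({
  userAgent: "Mozilla/5.0 (Windows NT 6.1; rv:13.0) Gecko/20100101 Firefox/13.0.1",
  screen: "1366x768x24",
  timezoneOffset: -60,
  fonts: "Arial,Calibri,Verdana",
});
console.log(id);
```

As long as none of the attributes changes, the same identifier is computed on every visit, without storing anything on the client.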
20
Browser fingerprints
A few results from the Electronic Frontier Foundation:
• S = 18.1 bits (entropy of the fingerprint distribution)
• observed 470,161 browsers
• on average, only one in 286,777 browsers shares a fingerprint
• 83.6 % with a unique fingerprint
• 5.3 % with an anonymity set of 2
Focusing on browsers with installed Flash plugins shows a slightly different picture: 94.2 % of browsers with a unique fingerprint
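The relation between "bits" and "one in N browsers" can be checked with a few lines. The function names are illustrative; the only figure taken from the slide is 286,777, which corresponds to roughly 18.13 bits, close to the quoted value of 18.1.

```javascript
// Surprisal (self-information) in bits of an outcome with probability p.
function surprisalBits(p) {
  return -Math.log2(p);
}

// Shannon entropy in bits of a distribution given as occurrence counts.
function entropyBits(counts) {
  const total = counts.reduce((a, b) => a + b, 0);
  return counts
    .filter((c) => c > 0)
    .reduce((h, c) => h - (c / total) * Math.log2(c / total), 0);
}

// A fingerprint shared by one in 286,777 browsers:
console.log(surprisalBits(1 / 286777).toFixed(2)); // prints "18.13"

// Toy distribution: four fingerprints, each seen equally often (2 bits).
console.log(entropyBits([1, 1, 1, 1]));
```

Each additional bit halves the group of browsers a user can hide in, which is why seemingly harmless attributes add up quickly.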
21
Browser fingerprints
The next slides show results of fingerprint analysis taken on an arbitrary notebook using the Firefox browser and the Tor Browser.
tested: 6 August 2012
try it on: panopticlick.eff.org
22
Browser fingerprints
The result shows a unique fingerprint, i.e. the browser can be identified unambiguously.
23
Browser fingerprints
A large amount of information is provided by browser plugins of other installed programs.
24
Browser fingerprints
Valuable information is provided by the fonts that are installed on the system.
25
Browser fingerprint: Tor Browser
The Tor anonymization infrastructure provides a special browser that delivers very common values, which makes identification hard.
26
How to protect Privacy?
Assumption of an honest and friendly environment: analytics are done, but not misused
• one strategy would be to strictly avoid data collection and analytics
• another to allow analytics, but publish identity-related information sparingly
Assumption of observers/attackers that use private information to compromise users:
• strictly avoid data collection and analytics
• use identity-related data very carefully
• try to act in an unobservable way (use anonymity infrastructures)
Problem: Who knows which assumption is true? Things can change!
27
How to protect Privacy?
For suspicious web sites:
• browser configuration: block (automated) access to these web sites, and block cookies from them
• avoid fingerprinting …
  • most strictly done by deactivation of JavaScript
  • use a widely used browser version
In general:
• be aware that analytics are done
• publish your real identity only to trusted domains