Privacy and Anonymity in the Internet


1

Part 2: Observation and analytics techniques

Aspects:
• Collection of data about the client (user)
• Recognition of a specific client and combination of the collected information
• Techniques for analytics
• Browser identification techniques (browser fingerprinting)

BaSoTI 2014, Privacy and Anonymity in the Internet

2

Aim of User Behavior Analytics
Analytics: Collect data about the client (user) and combine the information

Questions:
• Which websites were visited?
• Relations to other users
• Identification of a user and assignment to a user profile (can be pseudonymized)

Aims:
• Improvement of service
• Selective advertisement
• Placement of products, profit increase
• Selling of user profiles (anonymous statistics, personal data)

3

Web Usage Analytics
Interest in:
• which websites were visited, and in which order
• which links were followed
• how long a user stayed on a web page

Recognize a user:
• by IP address (of only limited value)
• by web browser identification (gives more information than one would expect)
• by cookies, web storage and scripts (the typical technology used by analytics tools)

Privacy can easily be violated by these analytics techniques.

4

How a user leaves traces
Different scenarios:

A user leaves traces (data) in the network
• voluntarily, for the purpose of a service (in blogs, social networks, search engines)
• unintentionally (log entries, cookies, referer, geographic position) and without having objected to it
• as stealthily collected data, without the permission of the user

Possibility of server information leakage:
• exploitation of bugs by attackers, weak server configuration

5

Access Logging
Logged per request: IP address, protocol action (e.g. HTTP request), access time, user agent (i.e. browser)

[Diagram: the browser sends an HTTP GET request to the web server and receives an HTTP response; one log entry per request]

Example log entry:
103.161.195.112 - [24/Dec/2011:23:34:01 +0100] "GET /images/holiday-greetings.jpg HTTP/1.1" 200 4437 "http://www.greetings.com" "Mozilla/5.0 (Windows NT 6.1; rv:13.0) Gecko/20100101 Firefox/13.0.1"
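To make the content of such a log entry concrete, here is a minimal sketch that splits one entry into named fields. The helper name `parseLogLine` and the pattern are assumptions for illustration; the sample is the entry above written with plain ASCII quotes, and real servers can be configured to log other layouts.

```javascript
// Hypothetical helper: split one access-log line into named fields.
// The pattern assumes the field order of the sample entry from the slide.
const LOG_PATTERN =
  /^(\S+) .*?\[([^\]]+)\] "(\S+) (\S+) (\S+)" (\d{3}) (\d+|-) "([^"]*)" "([^"]*)"$/;

function parseLogLine(line) {
  const m = LOG_PATTERN.exec(line);
  if (m === null) return null; // line does not match the assumed layout
  return {
    ip: m[1],
    time: m[2],
    method: m[3],
    path: m[4],
    protocol: m[5],
    status: Number(m[6]),
    bytes: m[7] === "-" ? 0 : Number(m[7]),
    referer: m[8],
    userAgent: m[9],
  };
}

const sample =
  '103.161.195.112 - [24/Dec/2011:23:34:01 +0100] ' +
  '"GET /images/holiday-greetings.jpg HTTP/1.1" 200 4437 ' +
  '"http://www.greetings.com" ' +
  '"Mozilla/5.0 (Windows NT 6.1; rv:13.0) Gecko/20100101 Firefox/13.0.1"';

const entry = parseLogLine(sample);
console.log(entry.ip, entry.path, entry.referer);
```

Every field is available to the site operator without any cooperation from the user: who asked (IP address), for what (path), coming from where (referer) and with which browser (user agent).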

6

Client-side Techniques and Cookies
The most common way to track web browsers is via HTTP cookies that are set by third-party analytics and advertising domains.

Cookies are key-value pairs, for example:
key="VisitorRecognition"
value="gid=964971c6-7c5f-4487-bd0b-cec3fc5fe3da7&mid=0&usl="

Technical principle:
• cookies are stored by the web browser and allow the recognition of a web client
• cookies are transferred with every HTTP request to the web server domain they belong to
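As an illustration of what the server receives, the following sketch parses a Cookie request header into key-value pairs. The helper `parseCookieHeader` is hypothetical, and the second cookie (`lang=en`) is made up; only the VisitorRecognition cookie comes from the example above.

```javascript
// Hypothetical helper: turn a "Cookie" request header into an object.
function parseCookieHeader(header) {
  const cookies = {};
  for (const part of header.split(";")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue; // skip malformed fragments
    const name = part.slice(0, eq).trim();
    const value = part.slice(eq + 1).trim();
    if (name !== "") cookies[name] = value;
  }
  return cookies;
}

// Header as a browser would send it ("lang=en" is a made-up second cookie).
const header =
  "VisitorRecognition=gid=964971c6-7c5f-4487-bd0b-cec3fc5fe3da7&mid=0&usl=; lang=en";
const cookies = parseCookieHeader(header);

// The cookie value is itself a parameter list; gid looks like a visitor id.
const fields = new URLSearchParams(cookies["VisitorRecognition"]);
console.log(fields.get("gid"), fields.get("mid"));
```

Only the first `=` of each pair separates name from value, which is why a value that itself contains `=` and `&` survives intact.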

7

Client-side Techniques and Cookies
Cookies are typically managed by server-side scripts (PHP or ASP): the web server sets a cookie and reads it on a later request.

<?php
// Check whether the analytics cookie came with this request.
$cookie_present = isset($_COOKIE["MY_ANALYTICS_COOKIE"]);
if ($cookie_present) {
    $cookie_value = $_COOKIE["MY_ANALYTICS_COOKIE"];
    /* work with $cookie_value */
} else {
    // First visit: set the cookie, valid for one hour.
    // setcookie() must be called before any output is sent.
    setcookie("MY_ANALYTICS_COOKIE", "followed_my_link", time() + 3600);
}
?>

Limitation: cookies can be read solely by the server that set the cookie. However, third-party components can be involved.
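The same check-or-set logic can be sketched in JavaScript, for example for a Node.js server. The function name `checkOrSetCookie` and the shape of its return value are assumptions; it only computes what the server would do and does not send anything.

```javascript
const COOKIE_NAME = "MY_ANALYTICS_COOKIE";

// Returns the cookie value if the request carried the cookie,
// otherwise the Set-Cookie header value the server should send.
function checkOrSetCookie(cookieHeader) {
  const pairs = (cookieHeader || "").split(";").map((p) => p.trim());
  const hit = pairs.find((p) => p.startsWith(COOKIE_NAME + "="));
  if (hit !== undefined) {
    return { present: true, value: hit.slice(COOKIE_NAME.length + 1) };
  }
  // Max-Age=3600 keeps the cookie for one hour, like time()+3600 in the PHP version.
  return {
    present: false,
    setCookie: COOKIE_NAME + "=followed_my_link; Max-Age=3600",
  };
}

const firstRequest = checkOrSetCookie("");
const laterRequest = checkOrSetCookie("a=b; MY_ANALYTICS_COOKIE=followed_my_link");
console.log(firstRequest, laterRequest);
```

In a Node.js request handler the result could be applied with `response.setHeader("Set-Cookie", firstRequest.setCookie)`; the browser then returns the cookie on its next request to the same domain.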

8

Client-side Techniques and Cookies
Third-party analysis tools, such as Google Analytics, AdChoices, etc.

[Diagram: client, web server and analytics server. The web site applies analytics by including scripts provided by the analytics service. JavaScript generates a request, with parameters, to the analytics server.]

1: The analytics server gets the client's IP address
2: The analytics server is able to install cookies

9

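Analytics
[Diagram: the browser requests web page 1 from web server A, web page 2 from web server B and web page 3 from web server A.
Normal situation without analytics: each web server writes one log entry per request.
Situation with analytics: the analytics server additionally receives one entry per page, 3 entries in total.]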

10

Analytics
Web analytics tools use page tags (typically JavaScript) embedded into web pages
• to collect visitor data
• to store it (on the client side)
• to transmit it to a remote database by pretending to load a graphic item and transferring the collected data as parameters of the request
• to track IP addresses (not permitted by German law)

If a website uses an analytics tool together with an account (such as Google Analytics and a Google account for www.blogger.com, www.youtube.com), the analytics service has sufficient information to identify and track a user.

NOTE: EU law requires a web site to get the user's permission to store non-essential cookies.
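The "pretend to load a graphic item" technique can be sketched as follows. The endpoint `analytics.example/collect.gif` and the parameter names are invented for illustration; each real analytics service defines its own.

```javascript
// Hypothetical endpoint; a real analytics service defines its own URL and parameters.
const COLLECT_URL = "https://analytics.example/collect.gif";

// Encode the collected data as query parameters of an image request.
function buildBeaconUrl(data) {
  const url = new URL(COLLECT_URL);
  for (const [key, value] of Object.entries(data)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

const beacon = buildBeaconUrl({
  page: "/products/index.html",
  ref: "http://www.greetings.com",
  sw: 1366,
  sh: 768,
});
console.log(beacon);

// In a browser the request is triggered by "loading an image";
// outside a browser (e.g. in Node.js) this step is skipped.
if (typeof Image !== "undefined") {
  new Image().src = beacon;
}
```

The server answers with a tiny image, but the interesting part is the request itself: its parameters, its IP address and any cookies for the analytics domain all arrive at the analytics server.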

11

DOM storage (supercookies)
• data can be stored in the web browser, similarly to cookies but using a different technique
• storage objects provide key-value storage
• the objects are accessed via the DOM of the browser:
  window.localStorage object (persistent, only accessible to scripts from one domain)
  window.sessionStorage object (deleted after a session)
• some browsers allow storing 5 MB per domain

DOM storage bypasses the design principle that JavaScript scripts cannot access the storage of the client.

12

DOM storage (supercookies)
JavaScript example: keep track of the number of times that a user visits any page of your domain:

// localStorage replaces the obsolete, Firefox-only window.globalStorage['basoti.org']
window.localStorage.setItem("visits", parseInt(window.localStorage.getItem("visits") || 0, 10) + 1);

Supercookies can be exploited for analytics/identification even when cookies are deactivated.

Supercookies can be deactivated as well, for example in the Firefox configuration:
about:config -> dom.storage.enabled = {true | false}
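How DOM storage can stand in for a cookie is sketched below: a visitor id is created on the first visit and found again on every later one. The key name `visitor_id` and both function names are assumptions, and the in-memory object only mimics the `getItem`/`setItem` part of `window.localStorage` so that the sketch also runs outside a browser.

```javascript
// Minimal in-memory stand-in with the getItem/setItem shape of window.localStorage.
function createMemoryStorage() {
  const items = new Map();
  return {
    getItem: (key) => (items.has(key) ? items.get(key) : null),
    setItem: (key, value) => items.set(key, String(value)),
  };
}

// Returns the stored visitor id, creating one on the first visit.
function getOrCreateVisitorId(storage) {
  let id = storage.getItem("visitor_id"); // hypothetical key name
  if (id === null) {
    // Not cryptographically strong; good enough for a sketch.
    id = Math.random().toString(16).slice(2) + Date.now().toString(16);
    storage.setItem("visitor_id", id);
  }
  return id;
}

// Use the real localStorage in a browser, the stand-in elsewhere.
const storage =
  typeof window !== "undefined" && window.localStorage
    ? window.localStorage
    : createMemoryStorage();

const firstVisit = getOrCreateVisitorId(storage);
const secondVisit = getOrCreateVisitorId(storage);
console.log(firstVisit === secondVisit); // true
```

No cookie is involved at any point, so blocking or deleting cookies alone does not remove this identifier.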

13

Browser Identification Techniques
Principle: Recognize a user by information that is revealed by the web browser

Browser fingerprinting

Two ways:
• The web browser sends a lot of information along with an HTTP request.
• Further information can be collected by JavaScript code that runs within the web browser, e.g. the screen resolution.

Browser fingerprinting can be used when cookies do not work.
Browser tracking analysis tool: panopticlick.eff.org

14

Browser Identification: $_SERVER
A server-side script can use the PHP built-in array $_SERVER, which contains a lot of information, including your browser's identification.

User agent: Many browsers send their name and version number in a header line of the HTTP request, e.g.
GET /website.html HTTP/1.1
Accept: text/html
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:13.0) Gecko/20100101 Firefox/13.0.1
Host: webserver.com

It can be read using a PHP inline script:
$_SERVER["HTTP_USER_AGENT"]

The web browser's identification string contains the browser name, the version number, and other information. Example:

Mozilla/5.0 (Windows NT 6.1; rv:13.0) Gecko/20100101 Firefox/13.0.1
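To show where the user agent string comes from, this sketch parses the request from the example into its request line and headers. `parseRequestHead` is a hypothetical helper, and the final regular expression is a heuristic that only fits Firefox-style strings ending in `Name/version`.

```javascript
// Hypothetical helper: parse the request line and header lines of an HTTP/1.1 request.
function parseRequestHead(raw) {
  const lines = raw.split("\r\n");
  const [method, target, version] = lines[0].split(" ");
  const headers = {};
  for (const line of lines.slice(1)) {
    if (line === "") break; // an empty line ends the header section
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    // Header names are case-insensitive, so store them in lower case.
    headers[line.slice(0, colon).toLowerCase()] = line.slice(colon + 1).trim();
  }
  return { method, target, version, headers };
}

const raw =
  "GET /website.html HTTP/1.1\r\n" +
  "Accept: text/html\r\n" +
  "User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:13.0) Gecko/20100101 Firefox/13.0.1\r\n" +
  "Host: webserver.com\r\n" +
  "\r\n";

const request = parseRequestHead(raw);
console.log(request.headers["user-agent"]);

// Heuristic: take the trailing "Name/version" token as browser name and version.
const match = /(\w+)\/([\d.]+)$/.exec(request.headers["user-agent"]);
console.log(match[1], match[2]);
```

The operating system (Windows NT 6.1) and the rendering engine are visible in the same string, which is why the user agent alone already narrows down the set of possible clients.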

15

Browser Identification: $_SERVER
Much more can be found out …

$_SERVER['REMOTE_ADDR'] … the IP address of the requesting client
$_SERVER['REMOTE_PORT'] … the port number of the requesting process (typically changes)
$_SERVER['HTTP_REFERER'] … the URL of the website that provided a link for the current request, in other words: which website the user visited before

When JavaScript is enabled on the client side, the browser may execute scripts that interact with the server via AJAX …
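The same details are available in other server environments. As a rough sketch, a Node.js server could read them from the request object it receives; `describeClient` is a hypothetical helper, and the request object below is a stand-in with made-up values so that the sketch runs without starting a server.

```javascript
// Collects client details from a Node.js http.IncomingMessage-like object.
function describeClient(req) {
  return {
    remoteAddr: req.socket.remoteAddress, // counterpart of REMOTE_ADDR
    remotePort: req.socket.remotePort, // counterpart of REMOTE_PORT
    referer: req.headers["referer"] || null, // header may be absent
    userAgent: req.headers["user-agent"] || null,
  };
}

// Stand-in request object with made-up values.
const fakeRequest = {
  socket: { remoteAddress: "103.161.195.112", remotePort: 51234 },
  headers: {
    referer: "http://www.greetings.com",
    "user-agent":
      "Mozilla/5.0 (Windows NT 6.1; rv:13.0) Gecko/20100101 Firefox/13.0.1",
  },
};

const client = describeClient(fakeRequest);
console.log(client);
```

Node.js delivers header names in lower case, hence `referer` and `user-agent` as keys.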

16

Browser identification via JavaScript
When a web site is displayed in the browser, it may contain JavaScript that is processed by the browser. These scripts can access client-side information, e.g. the screen configuration:

<script type="text/javascript">
// collect screen properties in a plain object
var info = {};
info["width"] = screen.width;
info["height"] = screen.height;
info["cdepth"] = screen.colorDepth;
…
// prepare an AJAX request to any web server
// send info as parameters to web server
</script>

17

Browser identification via JavaScript

Further information that can be directly read out by JavaScript code:

• timezone
• browser plugins, plugin versions (PluginDetect JavaScript library)
• supported MIME types
• whether a specified font is installed or not

Combined with the request data generated by the browser, this leaves a so-called browser fingerprint.
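One simple way to condense such attributes into a single fingerprint value is to serialize them in a fixed order and hash the result. The sketch below uses the FNV-1a hash purely for brevity; the attribute names and values are made up, and real fingerprinting tools may combine attributes differently.

```javascript
// FNV-1a, a simple non-cryptographic 32-bit hash, applied to UTF-16 code units.
function fnv1a32(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Serialize attributes in sorted key order so the result
// does not depend on the order in which they were collected.
function canonicalize(attributes) {
  return Object.keys(attributes)
    .sort()
    .map((key) => key + "=" + String(attributes[key]))
    .join("|");
}

function fingerprint(attributes) {
  return fnv1a32(canonicalize(attributes));
}

// Made-up attribute values for illustration.
const browserA = {
  userAgent:
    "Mozilla/5.0 (Windows NT 6.1; rv:13.0) Gecko/20100101 Firefox/13.0.1",
  screen: "1366x768x24",
  timezoneOffset: -60,
  fonts: ["Arial", "Calibri", "Verdana"].join(","),
};
const browserB = { ...browserA, fonts: "Arial,Verdana" };

console.log(fingerprint(browserA), fingerprint(browserB));
```

A single differing attribute, here one missing font, changes the serialized form and therefore almost certainly the hash, while the same browser produces the same value on every visit.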


20

Browser fingerprints
A few results from the Electronic Frontier Foundation:

• 470,161 browsers were observed
• S = 18.1 bits
• on average, only one in 286,777 browsers shares a given fingerprint
• 83.6 % had a unique fingerprint
• 5.3 % had an anonymity set of 2

Focusing on browsers with an installed Flash plugin shows a slightly different picture:
• 94.2 % of these browsers had a unique fingerprint
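The two headline figures can be related by reading "one in 286,777" as a probability and converting it to bits of self-information, S = -log2(p). This is a sketch of that arithmetic under that reading, not a reproduction of the EFF analysis.

```javascript
// Self-information (surprisal) in bits of an outcome with probability p.
function surprisalBits(p) {
  return -Math.log2(p);
}

// "One in 286,777 browsers shares a fingerprint" read as a probability.
const bits = surprisalBits(1 / 286777);
console.log(bits.toFixed(1)); // "18.1"

// Conversely, 18.1 bits distinguish on the order of 2^18.1 browsers.
const distinguishable = Math.round(Math.pow(2, 18.1));
console.log(distinguishable);
```

Each additional bit halves the group of browsers that look alike, so a few more revealing attributes are enough to make a fingerprint unique among several hundred thousand browsers.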

21

Browser fingerprints

The next slides show results of a fingerprint analysis taken on an arbitrary notebook using the Firefox browser and the Tor Browser.

tested: 6 August 2012

try it on: panopticlick.eff.org

22

Browser fingerprints
The result shows a unique fingerprint, i.e. the browser can be identified unambiguously.

23

Browser fingerprints
A large amount of information is provided by browser plugins of other installed programs.

24

Browser fingerprints
Valuable information is provided by the fonts that are installed on the system.

25

Browser fingerprint – Tor Browser
The Tor anonymization infrastructure provides a special browser that delivers very common values, which makes identification hard.

26

How to protect Privacy?
Assumption of an honest and friendly environment: analytics are done, but not misused
• one strategy would be to strictly avoid data collection and analytics
• another is to allow analytics, but publish identity-related information sparingly

Assumption of observers/attackers that use private information to compromise users:
• strictly avoid data collection and analytics
• use identity-related data very carefully
• try to act in an unobservable way (use anonymity infrastructures)

Problem: Who knows which assumption is true? Things can change!

27

How to protect Privacy?
For suspicious web sites:
• browser configuration: block (automated) access to these web sites, and block cookies from them
• avoid fingerprinting …
  most strictly done by deactivating JavaScript
  use a widely used browser version

In general:
• be aware that analytics are done
• publish your real identity only to trusted domains

28

How to protect Privacy?
Example: Firefox Browser
• checks website certificates and applies blacklisting
• allows cookie/DOM storage deletion

→ hinders analytics, but does not completely protect against it
→ does not protect against traffic observation

from 2011: http://www.mozilla.org/en/firefox/features/