Collecting user-data-socially-responsibly
![Page 1: Collecting user-data-socially-responsibly](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f2ecfe1a28abde318b4657/html5/thumbnails/1.jpg)
“Collecting User's Data in a Socially-Responsible Manner.” Photograph: Daniel Beltra/Greenpeace
Konark Modi @konarkmodi
Josep M. Pujol @solso
![Page 2: Collecting user-data-socially-responsibly](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f2ecfe1a28abde318b4657/html5/thumbnails/2.jpg)
About Cliqz
• 80+ - Team size
• 500,000 - DAU
• 3 Million+ - Downloads (Germany only)
• 1 billion+ - Indexed pages (We do not believe in indexing the web.)
• 5 TB - Indexed in memory (based on open-source and in-house-built NoSQL stores).
• 10x more coverage for anti-phishing protection - Compared to other players such as Google Safe Browsing.
• Upcoming products such as anti-tracking.
![Page 3: Collecting user-data-socially-responsibly](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f2ecfe1a28abde318b4657/html5/thumbnails/3.jpg)
About Cliqz
![Page 4: Collecting user-data-socially-responsibly](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f2ecfe1a28abde318b4657/html5/thumbnails/4.jpg)
We Love Data …
![Page 5: Collecting user-data-socially-responsibly](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f2ecfe1a28abde318b4657/html5/thumbnails/5.jpg)
Let's step back a bit in time, to get the context.
![Page 6: Collecting user-data-socially-responsibly](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f2ecfe1a28abde318b4657/html5/thumbnails/6.jpg)
Source: http://thehumanfaceofbigdata.com
“Data is the new oil” - Clive Humby (2006)
![Page 7: Collecting user-data-socially-responsibly](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f2ecfe1a28abde318b4657/html5/thumbnails/7.jpg)
Data is still being collected without enough controls & measures.
Is privacy the new Green ?
![Page 8: Collecting user-data-socially-responsibly](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f2ecfe1a28abde318b4657/html5/thumbnails/8.jpg)
The biggest by-product of which is SESSIONS.
Is privacy the new Green ?
![Page 9: Collecting user-data-socially-responsibly](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f2ecfe1a28abde318b4657/html5/thumbnails/9.jpg)
How?
[Diagram. Server-Side: Alice, Alice, Bob; MAP/REDUCE :D. Client-Side: Alice, Alice, Bob; uncharted water.]
![Page 10: Collecting user-data-socially-responsibly](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f2ecfe1a28abde318b4657/html5/thumbnails/10.jpg)
Instead …
[Diagram. Server-Side: Alice, Alice, Bob; uncharted water. Client-Side: Alice, Alice, Bob; MAP/REDUCE :D shown three times.]
![Page 11: Collecting user-data-socially-responsibly](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f2ecfe1a28abde318b4657/html5/thumbnails/11.jpg)
Who is responsible ?
Is there a conspiracy theory or an evil plan ?
![Page 12: Collecting user-data-socially-responsibly](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f2ecfe1a28abde318b4657/html5/thumbnails/12.jpg)
Well, we have a simpler explanation:
It is the consequence of common development practices, which result in trading users' data, knowingly or unknowingly!
![Page 13: Collecting user-data-socially-responsibly](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f2ecfe1a28abde318b4657/html5/thumbnails/13.jpg)
Demo
![Page 14: Collecting user-data-socially-responsibly](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f2ecfe1a28abde318b4657/html5/thumbnails/14.jpg)
This looks like a toy example ?
![Page 15: Collecting user-data-socially-responsibly](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f2ecfe1a28abde318b4657/html5/thumbnails/15.jpg)
Let’s take a more complex case:
Which queries perform so badly that people are forced to redo the same query elsewhere?
![Page 16: Collecting user-data-socially-responsibly](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f2ecfe1a28abde318b4657/html5/thumbnails/16.jpg)
[Diagram, Client-Side: Alice sends the query "apache big data conf" to search engine 1 and to search engine 2.]
![Page 17: Collecting user-data-socially-responsibly](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f2ecfe1a28abde318b4657/html5/thumbnails/17.jpg)
[Diagram: Alice's query "apache big data conf" sent to search engine 1 and search engine 2, shown twice. Labels: Client-Side, Server-Side, Map-Reduce, Uncharted water.]
![Page 18: Collecting user-data-socially-responsibly](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f2ecfe1a28abde318b4657/html5/thumbnails/18.jpg)
![Page 19: Collecting user-data-socially-responsibly](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f2ecfe1a28abde318b4657/html5/thumbnails/19.jpg)
[Diagram: Alice's query "apache big data conf" sent to search engine 1 and search engine 2, shown three times, two of them marked Map-Reduce. Labels: Client-Side, Server-Side, Uncharted water.]
![Page 20: Collecting user-data-socially-responsibly](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f2ecfe1a28abde318b4657/html5/thumbnails/20.jpg)
![Page 21: Collecting user-data-socially-responsibly](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f2ecfe1a28abde318b4657/html5/thumbnails/21.jpg)
As we mentioned before, we believe in data and are not against collecting it.
• Stopping data collection altogether would be foolish and dangerous. It would also mean stopping the wheels of innovation.
• Who would benefit the most from supporting a ban on advertisements for tobacco products?
![Page 22: Collecting user-data-socially-responsibly](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f2ecfe1a28abde318b4657/html5/thumbnails/22.jpg)
![Page 23: Collecting user-data-socially-responsibly](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f2ecfe1a28abde318b4657/html5/thumbnails/23.jpg)
“Socially responsible manner” is an analogy: it means ensuring that the events being collected are free from pollutants such as explicit IDs and implicit IDs, and that they reach home securely.
![Page 24: Collecting user-data-socially-responsibly](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f2ecfe1a28abde318b4657/html5/thumbnails/24.jpg)
Why does CLIQZ Care ?
![Page 25: Collecting user-data-socially-responsibly](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f2ecfe1a28abde318b4657/html5/thumbnails/25.jpg)
German Data Privacy Laws
Security breaches
When government knocks on your door
![Page 26: Collecting user-data-socially-responsibly](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f2ecfe1a28abde318b4657/html5/thumbnails/26.jpg)
So what do we bring to the table?
![Page 27: Collecting user-data-socially-responsibly](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f2ecfe1a28abde318b4657/html5/thumbnails/27.jpg)
HUMAN WEB
• We have developed HumanWeb to balance the right to privacy with the need to build products that improve the web and allow for more openness.
• It ensures that data from which sessions or links to navigation patterns could be inferred is not collected.
• It does not collect so much data that individuals could be identified.
• We do not want to know who "YOU" are, what "YOU" searched and when "YOU" searched.
• It is designed so that a "malicious/untrustworthy" actor, or for that matter anyone at Cliqz, who gets access to the raw data flow cannot infer or identify individuals.
![Page 28: Collecting user-data-socially-responsibly](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f2ecfe1a28abde318b4657/html5/thumbnails/28.jpg)
Sample events:
{
"action": action of the message,
"ver": version name,
"type": "humanweb",
"payload": { }, //the actual data
"ts": UTC time capped to the day, e.g. 20150909
}
• Sample event for Page
• Sample event for Query
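The envelope above can be sketched in a few lines of Python. This is only an illustration of the field layout and of capping `ts` to the UTC day; the function name `build_event`, the `"page"` action value and the `"1.0"` version string are assumptions made for the example, not names from HumanWeb itself.

```python
from datetime import datetime, timezone


def build_event(action, payload, now=None, ver="1.0"):
    """Wrap a payload in the envelope shown on the slide.

    The timestamp keeps only the UTC day (YYYYMMDD), so the hour,
    minute and second of the user's activity never leave the client.
    """
    now = now or datetime.now(timezone.utc)
    return {
        "action": action,      # hypothetical value, e.g. "page"
        "ver": ver,            # hypothetical version string
        "type": "humanweb",
        "payload": payload,    # the actual data
        "ts": now.astimezone(timezone.utc).strftime("%Y%m%d"),
    }


event = build_event(
    "page",
    {},
    now=datetime(2015, 9, 9, 17, 42, 5, tzinfo=timezone.utc),
)
```

Two events created at different times on the same day carry the same `ts`, which removes one easy way of ordering them into a session.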
![Page 29: Collecting user-data-socially-responsibly](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f2ecfe1a28abde318b4657/html5/thumbnails/29.jpg)
HumanWeb
[Diagram, client-side pipeline. Components shown:
• Local storage | Structural data about webpages
• Map-Reduce: aggregations, heuristics, filtering, hashing
• Events: [{event1}, {event2}, {event3}]
• Event Queue | Scheduled to ensure events are not sent in a batch
• Final checks: filtering, sanitisation / masking
• Secure Channel]
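One way to read "schedule to ensure not sent in batch" is that each queued event gets its own send time. The sketch below assumes an independent random delay per event; the slide does not say how the schedule is chosen, so the function `schedule_sends`, the `max_delay_s` parameter and the uniform distribution are all assumptions.

```python
import random


def schedule_sends(events, max_delay_s=600.0, rng=None):
    """Assign each event an independent random delay (in seconds).

    Returns (delay, event) pairs sorted by delay, so a sender can
    emit them one at a time instead of as a single batch.
    """
    rng = rng or random.Random()
    timed = [(rng.uniform(0.0, max_delay_s), event) for event in events]
    # Sort on the delay only; the events themselves are not comparable.
    return sorted(timed, key=lambda pair: pair[0])


scheduled = schedule_sends(
    [{"id": 1}, {"id": 2}, {"id": 3}],
    max_delay_s=60.0,
    rng=random.Random(0),
)
```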
![Page 30: Collecting user-data-socially-responsibly](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f2ecfe1a28abde318b4657/html5/thumbnails/30.jpg)
Privacy breaches on the way home
To achieve total privacy, we must rely on a network of proxies that remove any network-related data such as cookies, IP addresses and headers, so that fingerprinting is not possible.
![Page 31: Collecting user-data-socially-responsibly](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f2ecfe1a28abde318b4657/html5/thumbnails/31.jpg)
SecureChannel : Protection from network fingerprinting
![Page 32: Collecting user-data-socially-responsibly](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f2ecfe1a28abde318b4657/html5/thumbnails/32.jpg)
SecureChannel : What do we encrypt ?
• The queries from the user (initiated by the user's activity in the address bar of Cliqz’s instrumented Firefox).
• All telemetry signals (initiated by Cliqz’s instrumented Firefox).
• All messages regarding the HumanWeb data collection effort.
Also, before reaching our infrastructure, the encrypted messages are routed through a mesh of proxies.
![Page 33: Collecting user-data-socially-responsibly](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f2ecfe1a28abde318b4657/html5/thumbnails/33.jpg)
SecureChannel : How do we encrypt ?
Life-cycle of hashes / keys:
• AES: Hash keys used with AES are used only once, even if the user types the same query.
• Public / private key pair (client):
  • The keys on the client side are all short-lived; we continuously generate keys on the client side.
  • The public/private key pair of the client (the extension) is meant to be used only once and then thrown away. Key pairs are regenerated to fill a pool while the browser is idle.
• Public / private key pair (server):
  • Only the public part of this key is shared with the extension.
  • The client uses it while encrypting the request. This is a long-lived key, currently changed only if it is compromised.
Client side: 128-bit symmetric AES encryption, OpenSSL RSA 1024-bit encryption.
EventLogger: 128-bit symmetric AES encryption, OpenSSL RSA 4096-bit encryption.
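The "use once, then throw away, refill while idle" life-cycle of the client key pairs can be sketched as a small pool. The class name `OneTimeKeyPool` and its methods are hypothetical, and the generator used here produces random hex strings as a stand-in; it does not generate real RSA key pairs.

```python
import secrets
from collections import deque


class OneTimeKeyPool:
    """Pool of pre-generated keys; each key is handed out exactly once."""

    def __init__(self, generate, target_size):
        self._generate = generate      # callable returning a fresh key
        self._target = target_size
        self._pool = deque()

    def refill(self):
        """Top the pool up to its target size (e.g. while the browser is idle)."""
        while len(self._pool) < self._target:
            self._pool.append(self._generate())

    def take(self):
        """Remove and return one key; it is never put back."""
        if not self._pool:
            self.refill()
        return self._pool.popleft()

    def __len__(self):
        return len(self._pool)


# Stand-in generator: random hex strings, NOT real RSA key pairs.
pool = OneTimeKeyPool(lambda: secrets.token_hex(16), target_size=4)
pool.refill()
first = pool.take()
second = pool.take()
```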
![Page 34: Collecting user-data-socially-responsibly](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f2ecfe1a28abde318b4657/html5/thumbnails/34.jpg)
SecureChannel : How do we encrypt ? (Extension)
encryptedRequest(iv:encryptedMsg:encryptedKey)
iv: Initialization Vector
msg = (originalRequest + ExtensionPublicKey)
key = md5(msg)
encryptedMsg = AES.encrypt(msg, key, {mode: CBC, padding: PKCS7, iv: iv})
encryptedKey = sign(EventLoggerPublicKey, key)
Each request to be encrypted has the following components:
• Message / request to encrypt: query or data.
• ExtensionPublicKey: chosen from a pool of public keys for that user on the machine (the key is used only once and then discarded).
• Initialisation vector: derived from a word array of 16 bytes.
• EventLoggerPublicKey: our public key, shared with the extension.
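The assembly of the `iv:encryptedMsg:encryptedKey` envelope can be sketched as follows. Python's standard library has no AES or RSA, so both ciphers are passed in as callables and the ones used in the example are loudly labelled placeholders that provide no security. The hex encoding of the three fields and the function names `build_envelope` and `parse_envelope` are assumptions made for this sketch.

```python
import hashlib
import os


def build_envelope(original_request, extension_public_key,
                   aes_cbc_encrypt, rsa_encrypt):
    """Assemble iv:encryptedMsg:encryptedKey as hex fields (encoding assumed)."""
    iv = os.urandom(16)                       # 16 random bytes for CBC
    msg = original_request + extension_public_key
    key = hashlib.md5(msg).digest()           # 16 bytes, i.e. a 128-bit key
    encrypted_msg = aes_cbc_encrypt(msg, key, iv)
    encrypted_key = rsa_encrypt(key)          # under the EventLogger public key
    return ":".join(part.hex() for part in (iv, encrypted_msg, encrypted_key))


def parse_envelope(envelope):
    """Split the envelope back into its three raw byte fields."""
    iv, encrypted_msg, encrypted_key = (
        bytes.fromhex(field) for field in envelope.split(":")
    )
    return iv, encrypted_msg, encrypted_key


# PLACEHOLDERS ONLY: these are not encryption. A real client would plug in
# AES-128-CBC with PKCS7 padding and RSA from a cryptography library.
def placeholder_aes(msg, key, iv):
    return msg[::-1]


def placeholder_rsa(key):
    return key


envelope_a = build_envelope(b"apache big data conf", b"PUBKEY-1",
                            placeholder_aes, placeholder_rsa)
envelope_b = build_envelope(b"apache big data conf", b"PUBKEY-2",
                            placeholder_aes, placeholder_rsa)
```

Because `msg` includes the one-time ExtensionPublicKey, `key = md5(msg)` differs between `envelope_a` and `envelope_b` even though the query is identical, which matches the statement that AES keys are never reused for the same query.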
![Page 35: Collecting user-data-socially-responsibly](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f2ecfe1a28abde318b4657/html5/thumbnails/35.jpg)
SecureChannel : Routing ? (Extension)
• The extension maintains a list of proxies that are healthy at that point in time.
• When sending a request / message, the extension picks the end-point in round-robin fashion (for now).
• To avoid the risk of proxies being malicious with the message, we scramble the message and split it into a random ‘n’ parts just before sending it from the extension.
• The value of ‘n’ is determined by the extension; we expect ‘n’ to be 1, 2, 4 or 8 for the time being. The value of ‘n’ is not known to the proxies, so a proxy cannot tell whether it holds all the parts.
• The only way to tamper with a message is to have all the parts needed to decrypt it. Since messages are scrambled, split and sent through different proxies, they are safe from the proxies.
• The Event Logger waits for all the parts of a message; only by combining them at our (secure) Event Logger can the message be decrypted.
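The routing steps above can be sketched as: cut the encrypted message into n parts, scramble their order, and hand the parts to the proxies in round-robin order. How the extension actually scrambles messages is not described on the slide, so shuffling the order of the parts is an assumption, as are the function name `split_and_route` and the proxy names.

```python
import random
from itertools import cycle


def split_and_route(message, proxies, n, rng=None):
    """Split `message` into n parts, scramble them, assign proxies round-robin.

    Returns a list of (proxy, part) pairs. Parts carry no index, so a
    single proxy cannot tell how many parts exist or where its part fits.
    """
    rng = rng or random.Random()
    size = -(-len(message) // n)              # ceiling division
    parts = [message[i * size:(i + 1) * size] for i in range(n)]
    rng.shuffle(parts)                        # scrambling step (assumed)
    ring = cycle(proxies)
    return [(next(ring), part) for part in parts]


routed = split_and_route(
    b"0123456789",
    proxies=["proxy-1", "proxy-2", "proxy-3"],   # hypothetical proxy names
    n=4,
    rng=random.Random(7),
)
```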
![Page 36: Collecting user-data-socially-responsibly](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f2ecfe1a28abde318b4657/html5/thumbnails/36.jpg)
SecureChannel : How do we decrypt ? (Server)
EncryptedRequest = iv:encryptedMsg:encryptedKey
key = unlock(EventLoggerPrivateKey, encryptedKey)
msg = AES.decrypt(encryptedMsg, key, {mode: CBC, padding: PKCS7, iv: iv})
request = msg.data
ExtensionPublicKey = msg.pk (we need it to sign the response)
Important:
• Because the server receives messages in parts, we rely on combinations to get the key and the message.
• The message itself is scrambled, so even once it is decrypted we need to stitch it together by trying different combinations.
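Stitching by "trying different combinations" can be sketched as a search over orderings of the received parts, stopping at the first ordering that passes a validity check. The slide does not say how the server recognises the correct ordering; the check used below (comparing against a known MD5 digest) is an assumption for illustration, and `reassemble` is a hypothetical name. With n at most 8 there are at most 40,320 orderings to try.

```python
import hashlib
from itertools import permutations


def reassemble(parts, is_valid):
    """Try every ordering of `parts`; return the first one that validates."""
    for order in permutations(parts):
        candidate = b"".join(order)
        if is_valid(candidate):
            return candidate
    return None                                # no ordering passed the check


original = b"0123456789"
expected_digest = hashlib.md5(original).digest()
received = [b"678", b"9", b"012", b"345"]      # parts arrive out of order

restored = reassemble(
    received,
    is_valid=lambda c: hashlib.md5(c).digest() == expected_digest,
)
```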
![Page 37: Collecting user-data-socially-responsibly](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f2ecfe1a28abde318b4657/html5/thumbnails/37.jpg)
All talk and no play makes Jack a dull boy!
Demo
![Page 38: Collecting user-data-socially-responsibly](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f2ecfe1a28abde318b4657/html5/thumbnails/38.jpg)
Thank You http://www.cliqz.com/en
We believe it’s possible; we are actually doing it.
photo: projectsecretidentity.org