Airavat : Security and Privacy for MapReduce
![Page 1: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/1.jpg)
Airavat: Security and Privacy for MapReduce
Indrajit Roy, Srinath T.V. Setty, Ann Kilzer, Vitaly Shmatikov, Emmett Witchel
The University of Texas at Austin
![Page 2: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/2.jpg)
2
Computing in the year 201X
• Illusion of infinite resources
• Pay only for resources used
• Quickly scale up or scale down …
[Figure label: Data]
![Page 3: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/3.jpg)
3
Programming model in year 201X
• Frameworks available to ease cloud programming
• MapReduce: Parallel processing on clusters of machines
[Figure: Data → Map → Reduce → Output]
• Data mining
• Genomic computation
• Social networks
![Page 4: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/4.jpg)
4
Programming model in year 201X
• Thousands of users upload their data
  • Healthcare, shopping transactions, census, click stream
• Multiple third parties mine the data for better service
• Example: Healthcare data
  • Incentive to contribute: Cheaper insurance policies, new drug research, inventory control in drugstores…
  • Fear: What if someone targets my personal data?
    • Insurance company can find my illness and increase premium
![Page 5: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/5.jpg)
5
Privacy in the year 201X?
[Figure: Health Data → Untrusted MapReduce program → Output. Information leak?]
• Data mining
• Genomic computation
• Social networks
![Page 6: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/6.jpg)
6
Use de-identification?
• Achieves ‘privacy’ by syntactic transformations
  • Scrubbing, k-anonymity …
• Insecure against attackers with external information
  • Privacy fiascoes: AOL search logs, Netflix dataset
Run untrusted code on the original data?
How do we ensure privacy of the users?
![Page 7: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/7.jpg)
7
Audit the untrusted code?
• Audit all MapReduce programs for correctness?
  • Hard to do! Enlightenment?
  • Also, where is the source code?
• Aim: Confine the code instead of auditing
![Page 8: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/8.jpg)
8
This talk: Airavat
Framework for privacy-preserving MapReduce computations with untrusted code.
Airavat is the elephant of the clouds (Indian mythology).
[Figure labels: Untrusted Program, Protected Data, Airavat]
![Page 9: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/9.jpg)
9
Airavat guarantee
Bounded information leak* about any individual data after performing a MapReduce computation.
*Differential privacy
[Figure labels: Untrusted Program, Protected Data, Airavat]
![Page 10: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/10.jpg)
10
Outline
• Motivation
• Overview
• Enforcing privacy
• Evaluation
• Summary
![Page 11: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/11.jpg)
11
Background: MapReduce
map(k1, v1) → list(k2, v2)
reduce(k2, list(v2)) → list(v2)
[Figure: Data 1 to Data 4 → Map phase → Reduce phase → Output]
![Page 12: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/12.jpg)
12
MapReduce example
Counts no. of iPads sold
Map(input) { if (input has iPad) print (iPad, 1) }
Reduce(key, list(v)) { print (key + “,” + SUM(v)) }
[Figure: inputs iPad, Tablet PC, iPad, Laptop → Map phase emits (iPad, 1), (iPad, 1) → Reduce phase (SUM) → (iPad, 2)]
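The iPad-counting example on this slide can be sketched as a small single-process Python program. This is only an illustration of the map and reduce steps, not Hadoop or Airavat code; the names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical, and each input record is assumed to be a plain product name.

```python
from collections import defaultdict


def map_fn(record):
    """Emit ("iPad", 1) for every sales record that is an iPad."""
    if record == "iPad":
        yield ("iPad", 1)


def reduce_fn(key, values):
    """Sum all counts emitted for one key."""
    return (key, sum(values))


def run_mapreduce(records):
    # Map phase: apply the mapper to each record and group values by key.
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            grouped[key].append(value)
    # Reduce phase: apply the reducer once per key.
    return [reduce_fn(key, values) for key, values in grouped.items()]


sales = ["iPad", "Tablet PC", "iPad", "Laptop"]
print(run_mapreduce(sales))  # [('iPad', 2)]
```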
![Page 13: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/13.jpg)
13
Airavat model
• Airavat framework runs on the cloud infrastructure
  • Cloud infrastructure: Hardware + VM
  • Airavat: Modified MapReduce + DFS + JVM + SELinux
[Figure: (1) Airavat framework, trusted, on top of the cloud infrastructure]
![Page 14: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/14.jpg)
14
Airavat model
• Data provider uploads her data on Airavat
  • Sets up certain privacy parameters
[Figure: (2) Data provider uploads to (1) the trusted Airavat framework on the cloud infrastructure]
![Page 15: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/15.jpg)
15
Airavat model
• Computation provider writes data mining algorithm
  • Untrusted, possibly malicious
[Figure: (3) Computation provider submits a Program to (1) the trusted Airavat framework, which holds the data from (2) the Data provider, and receives the Output]
![Page 16: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/16.jpg)
16
Threat model
• Airavat runs the computation, and still protects the privacy of the data providers
[Figure: (1) trusted Airavat framework on the cloud infrastructure, (2) Data provider, (3) Computation provider with Program and Output, marked as the threat]
![Page 17: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/17.jpg)
17
Roadmap
• What is the programming model?
• How do we enforce privacy?
• What computations can be supported in Airavat?
![Page 18: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/18.jpg)
18
Programming model
• Split MapReduce into untrusted mapper + trusted reducer
• Limited set of stock reducers
[Figure: MapReduce program for data mining = Untrusted Mapper + Trusted Reducer, run by Airavat on the data; no need to audit]
![Page 19: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/19.jpg)
19
Programming model
• Need to confine the mappers!
• Guarantee: Protect the privacy of data providers
[Figure: MapReduce program for data mining = Untrusted Mapper + Trusted Reducer, run by Airavat on the data; no need to audit]
![Page 20: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/20.jpg)
20
Challenge 1: Untrusted mapper
• Untrusted mapper code copies data, sends it over the network
• Leaks using system resources
[Figure: data of Peter, Meg, and Chris → Map → Reduce; Peter's record also leaves the mapper through the network]
![Page 21: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/21.jpg)
21
Challenge 2: Untrusted mapper
• Output of the computation is also an information channel
• Example: Output 1 million if Peter bought Vi*gra
[Figure: data of Peter, Meg, and Chris → Map → Reduce]
![Page 22: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/22.jpg)
22
Airavat mechanisms
• Mandatory access control: Prevent leaks through storage channels like network connections, files…
• Differential privacy: Prevent leaks through the output of the computation
[Figure: Data → Map → Reduce → Output]
![Page 23: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/23.jpg)
23
Back to the roadmap
• What is the programming model? Untrusted mapper + Trusted reducer
• How do we enforce privacy?
  • Leaks through system resources
  • Leaks through the output
• What computations can be supported in Airavat?
![Page 24: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/24.jpg)
Airavat confines the untrusted code
[Figure: software stack, top to bottom; the lower layers are labeled Airavat]
• Untrusted program: Given by the computation provider
• MapReduce + DFS: Add mandatory access control (MAC)
• SELinux: Add MAC policy
![Page 25: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/25.jpg)
Airavat confines the untrusted code
• We add mandatory access control to the MapReduce framework
• Label input, intermediate values, output
• Malicious code cannot leak labeled data
[Figure: stack of Untrusted program, MapReduce + DFS, SELinux; Data 1 to Data 3 → MapReduce → Output, each carrying an access control label]
![Page 26: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/26.jpg)
Airavat confines the untrusted code
• SELinux policy to enforce MAC
• Creates trusted and untrusted domains
• Processes and files are labeled to restrict interaction
• Mappers reside in untrusted domain
  • Denied network access, limited file system interaction
[Figure: stack of Untrusted program, MapReduce + DFS, SELinux]
![Page 27: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/27.jpg)
27
But access control is not enough
• Labels can prevent the output from being read
• When can we remove the labels?
Malicious mapper: if (input belongs-to Peter) print (iPad, 1000000)
• Output leaks the presence of Peter!
[Figure: inputs iPad, Tablet PC, iPad, Laptop, one of them Peter's and all carrying an access control label → Map phase emits (iPad, 1000001) and (iPad, 1) → Reduce phase (SUM) → (iPad, 1000002) instead of (iPad, 2)]
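A minimal sketch of this output channel, assuming each record is a hypothetical `(buyer, item)` pair (the slide does not specify a record layout). The malicious mapper obeys every access control rule and still reveals Peter's presence through the size of the final sum.

```python
def honest_map(record):
    """Count iPad purchases: 1 for an iPad, 0 otherwise."""
    _buyer, item = record
    return 1 if item == "iPad" else 0


def malicious_map(record):
    """Same as honest_map, but emits a huge value for Peter's record."""
    buyer, item = record
    if buyer == "Peter":
        return 1_000_000
    return 1 if item == "iPad" else 0


def total(mapper, records):
    """Reduce step: plain SUM over all mapper outputs."""
    return sum(mapper(record) for record in records)


with_peter = [("Peter", "Laptop"), ("Meg", "iPad"), ("Chris", "iPad")]
without_peter = [("Meg", "iPad"), ("Chris", "iPad")]

print(total(malicious_map, with_peter))     # 1000002
print(total(malicious_map, without_peter))  # 2
```

Anyone who sees the released sum can tell whether Peter's record was in the input, which is why labels alone cannot decide when the output is safe to release.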
![Page 28: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/28.jpg)
28
But access control is not enough
Need mechanisms to enforce that the output does not violate an individual’s privacy.
![Page 29: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/29.jpg)
29
Background: Differential privacy
A mechanism is differentially private if every output is produced with similar probability whether any given input is included or not
Cynthia Dwork. Differential Privacy. ICALP 2006
![Page 30: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/30.jpg)
30
Differential privacy (intuition)
A mechanism is differentially private if every output is produced with similar probability whether any given input is included or not
[Figure: inputs A, B, C → F(x) → output distribution]
Cynthia Dwork. Differential Privacy. ICALP 2006
![Page 31: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/31.jpg)
31
Differential privacy (intuition)
A mechanism is differentially private if every output is produced with similar probability whether any given input is included or not
[Figure: inputs A, B, C and inputs A, B, C, D each → F(x) → similar output distributions]
Bounded risk for D if she includes her data!
Cynthia Dwork. Differential Privacy. ICALP 2006
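The intuition above is usually stated formally as follows. The symbols are not on the slides and are introduced here for illustration: a randomized mechanism \mathcal{M} is \varepsilon-differentially private if, for every pair of datasets D and D' that differ in one individual's record and every set S of possible outputs,

```latex
\[
  \Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S]
\]
```

A smaller \varepsilon forces the two output distributions closer together, which corresponds to a stronger privacy guarantee.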
![Page 32: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/32.jpg)
32
Achieving differential privacy
• A simple differentially private mechanism
[Figure: analyst asks “Tell me f(x)” of a database x1, …, xn and receives f(x) + noise]
• How much noise should one add?
![Page 33: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/33.jpg)
33
Achieving differential privacy
• Function sensitivity (intuition): Maximum effect of any single input on the output
  • Aim: Need to conceal this effect to preserve privacy
• Example: Computing the average height of the people in this room has low sensitivity
  • Any single person’s height does not affect the final average by too much
  • Calculating the maximum height has high sensitivity
![Page 34: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/34.jpg)
34
Achieving differential privacy
• Function sensitivity (intuition): Maximum effect of any single input on the output
  • Aim: Need to conceal this effect to preserve privacy
• Example: SUM over input elements drawn from [0, M]
[Figure: X1, X2, X3, X4 → SUM]
• Sensitivity = M: Max. effect of any input element is M
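The sensitivity examples can be written out as a short worked derivation. It assumes that two datasets D and D' are neighbors when they have the same size n and differ in the value of exactly one element; that convention is chosen here for illustration and is not stated on the slide.

```latex
\[
  \Delta f \;=\; \max_{D \sim D'} \bigl| f(D) - f(D') \bigr|
\]
\[
  \Delta(\mathrm{SUM}) \;=\; \max_{x,\,x' \in [0, M]} |x - x'| \;=\; M
\]
\[
  \Delta(\mathrm{MEAN}) \;=\; \max_{x,\,x' \in [0, M]} \frac{|x - x'|}{n} \;=\; \frac{M}{n}
\]
```

Under this convention the mean over many inputs has low sensitivity because the effect of one element is divided by n, which matches the average-height example.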
![Page 35: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/35.jpg)
35
Achieving differential privacy
• A simple differentially private mechanism
[Figure: analyst asks “Tell me f(x)” of a database x1, …, xn and receives f(x) + Lap(∆(f))]
• Intuition: Noise needed to mask the effect of a single input
• Lap = Laplace distribution
• ∆(f) = sensitivity
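A minimal sketch of this mechanism for a SUM over values in [0, M], using only the Python standard library. The slide writes the noise as Lap(∆(f)); in the standard formulation the Laplace scale is ∆(f)/ε, where ε is the privacy parameter. The parameter name `epsilon` and the function names below are assumptions made for this example, not Airavat's API.

```python
import random


def laplace_noise(scale, rng=random):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_sum(values, upper_bound, epsilon, rng=random):
    """Return SUM(values) plus Laplace noise calibrated to the sensitivity.

    Assumes every value lies in [0, upper_bound], so the sensitivity of
    SUM is upper_bound.
    """
    sensitivity = upper_bound
    return sum(values) + laplace_noise(sensitivity / epsilon, rng)


# Example: the true sum is 10; each call returns a differently perturbed answer.
print(noisy_sum([1, 2, 3, 4], upper_bound=5, epsilon=1.0))
```

A larger `epsilon` means less noise and a weaker guarantee; a larger declared range means more noise for the same guarantee.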
![Page 36: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/36.jpg)
36
Back to the roadmap
• What is the programming model? Untrusted mapper + Trusted reducer
• How do we enforce privacy?
  • Leaks through system resources: MAC
  • Leaks through the output
• What computations can be supported in Airavat?
![Page 37: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/37.jpg)
37
Enforcing differential privacy
• Mapper can be any piece of Java code (“black box”) but…
• Range of mapper outputs must be declared in advance
  • Used to estimate “sensitivity” (how much does a single input influence the output?)
  • Determines how much noise is added to outputs to ensure differential privacy
• Example: Consider mapper range [0, M]
  • SUM has the estimated sensitivity of M
![Page 38: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/38.jpg)
38
Enforcing differential privacy
• Malicious mappers may output values outside the range
• If a mapper produces a value outside the range, it is replaced by a value inside the range
  • User not notified… otherwise possible information leak
• Ensures that code is not more sensitive than declared
[Figure: Data 1 to Data 4 → Mappers → Range enforcers → Reducer + Noise]
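A sketch of the range-enforcement idea, assuming that an out-of-range value is clamped to the nearest bound. The slide only says the value is "replaced by a value inside the range", so clamping is one possible choice, and the names `enforce_range` and `bounded_sum` are hypothetical.

```python
def enforce_range(value, lower, upper):
    """Silently clamp a mapper output into the declared range [lower, upper].

    No error is raised and nothing is reported back, since a notification
    would itself reveal something about the input.
    """
    return min(max(value, lower), upper)


def bounded_sum(mapper_outputs, lower, upper):
    """SUM over range-enforced values.

    After enforcement, a single input can shift the sum by at most
    upper - lower, matching the declared sensitivity.
    """
    return sum(enforce_range(v, lower, upper) for v in mapper_outputs)


# A mapper that tried to emit 1,000,000 for one record contributes only 1.
print(bounded_sum([1, 1_000_000, 1], lower=0, upper=1))  # 3
```

The noise added by the reducer is calibrated to the declared range, so enforcement is what makes that calibration valid for arbitrary mapper code.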
![Page 39: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/39.jpg)
39
Enforcing sensitivity
• All mapper invocations must be independent
• Mapper may not store an input and use it later when processing another input
  • Otherwise, range-based sensitivity estimates may be incorrect
• We modify JVM to enforce mapper independence
  • Each object is assigned an invocation number
  • JVM instrumentation prevents reuse of objects from previous invocation
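The following sketch illustrates why independence matters. It is written in Python rather than Java and does not model the JVM instrumentation; the `StatefulMapper` class and the `(buyer, item)` record layout are assumptions for illustration. Every output stays inside the declared range [0, 1], yet one input changes the sum by more than 1 because the mapper remembers it across invocations.

```python
class StatefulMapper:
    """A mapper that breaks independence by keeping state between calls."""

    def __init__(self):
        self.seen_peter = False

    def __call__(self, record):
        buyer, _item = record
        if buyer == "Peter":
            self.seen_peter = True
        # Once Peter has been seen, every later record also emits 1.
        return 1 if self.seen_peter else 0


def bounded_sum(mapper, records, lower=0, upper=1):
    """SUM with each mapper output clamped into [lower, upper]."""
    return sum(min(max(mapper(r), lower), upper) for r in records)


with_peter = [("Peter", "Laptop"), ("Meg", "iPad"), ("Chris", "iPad")]
without_peter = [("Meg", "iPad"), ("Chris", "iPad")]

print(bounded_sum(StatefulMapper(), with_peter))     # 3
print(bounded_sum(StatefulMapper(), without_peter))  # 0
```

The declared range suggests a sensitivity of 1, but Peter's record shifts the result by 3, so noise calibrated to the range would be too small.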
![Page 40: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/40.jpg)
40
Roadmap. One last time
• What is the programming model? Untrusted mapper + Trusted reducer
• How do we enforce privacy?
  • Leaks through system resources: MAC
  • Leaks through the output: Differential Privacy
• What computations can be supported in Airavat?
![Page 41: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/41.jpg)
41
What can we compute?
• Reducers are responsible for enforcing privacy
  • Add an appropriate amount of random noise to the outputs
• Reducers must be trusted
  • Sample reducers: SUM, COUNT, THRESHOLD
  • Sufficient to perform data mining algorithms, search log processing, recommender system etc.
• With trusted mappers, more general computations are possible
  • Use exact sensitivity instead of range based estimates
![Page 42: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/42.jpg)
42
Sample computations
• Many queries can be done with untrusted mappers
  • How many iPads were sold today? (Sum)
  • What is the average score of male students at UT? (Mean)
  • Output the frequency of security books that sold more than 25 copies today. (Threshold)
• … others require trusted mapper code
  • List all items and their quantity sold
  • Malicious mapper can encode information in item names
![Page 43: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/43.jpg)
43
Revisiting Airavat guarantees
• Allows differentially private MapReduce computations
  • Even when the code is untrusted
• Differential privacy => mathematical bound on information leak
• What is a safe bound on information leak?
  • Depends on the context, dataset
  • Not our problem
![Page 44: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/44.jpg)
44
Outline
• Motivation
• Overview
• Enforcing privacy
• Evaluation
• Summary
![Page 45: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/45.jpg)
45
Implementation details
• SELinux policy
  • Domains for trusted and untrusted programs
  • Apply restrictions on each domain
• MapReduce
  • Modifications to support mandatory access control
  • Set of trusted reducers
• JVM
  • Modifications to enforce mapper independence
Code sizes shown on the slide: 450 LoC, 5000 LoC, 500 LoC (LoC = Lines of Code)
![Page 46: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/46.jpg)
46
Evaluation: Our benchmarks
• Experiments on 100 Amazon EC2 instances
  • 1.2 GHz, 7.5 GB RAM running Fedora 8

| Benchmark | Privacy grouping | Reducer primitive | MapReduce operations | Accuracy metric |
| --- | --- | --- | --- | --- |
| AOL queries | Users | THRESHOLD, SUM | Multiple | % queries released |
| kNN recommender | Individual rating | COUNT, SUM | Multiple | RMSE |
| K-Means | Individual points | COUNT, SUM | Multiple, till convergence | Intra-cluster variance |
| Naïve Bayes | Individual articles | SUM | Multiple | Misclassification rate |
![Page 47: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/47.jpg)
47
Performance overhead
[Figure: normalized execution time (axis from 0 to 1.4) for the AOL, Cov. Matrix, k-Means, and N-Bayes benchmarks, broken down into Copy, Reduce, Sort, Map, and SELinux]
Overheads are less than 32%
![Page 48: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/48.jpg)
48
Evaluation: accuracy
• Accuracy increases with decrease in privacy guarantee
• Reducer: COUNT, SUM
[Figure: accuracy (%) from 0 to 100 versus privacy parameter from 0 to 1.4, for k-Means and Naïve Bayes; annotations: “No information leak” and “Decrease in privacy guarantee”]
*Refer to the paper for remaining benchmark results
![Page 49: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/49.jpg)
49
Related work: PINQ [McSherry SIGMOD 2009]
• Set of trusted LINQ primitives
• Airavat confines untrusted code and ensures that its outputs preserve privacy
  • PINQ requires rewriting code with trusted primitives
• Airavat provides end-to-end guarantee across the software stack
  • PINQ guarantees are language level
![Page 50: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/50.jpg)
50
Airavat in brief
• Airavat is a framework for privacy preserving MapReduce computations
• Confines untrusted code
• First to integrate mandatory access control with differential privacy for end-to-end enforcement
[Figure labels: Untrusted Program, Protected, Airavat]
![Page 51: Airavat : Security and Privacy for MapReduce](https://reader036.fdocuments.us/reader036/viewer/2022070500/568168b5550346895ddf8af4/html5/thumbnails/51.jpg)
THANK YOU