A Framework for Protecting Worker Location Privacy in Spatial Crowdsourcing
Nov 12 2014 Hien To, Gabriel Ghinita, Cyrus Shahabi
VLDB 2014 1
Motivation
Ubiquity of mobile users
• 6.5 billion mobile subscriptions, 93.5% of the world population [1]
Technology advances on mobiles
• Network bandwidth improvements: from 2.5G (up to 384 Kbps) to 3G (up to 14.7 Mbps) and recently 4G (up to 100 Mbps)
• Smartphone sensors, e.g., video cameras
[1] http://mobithinking.com/mobile-marketing-tools/latest-mobile-stats/
Spatial Crowdsourcing
• Crowdsourcing: outsourcing a set of tasks to a set of workers
• Spatial Crowdsourcing (SC): crowdsourcing a set of spatial tasks to a set of workers
  – A spatial task is related to a location, e.g., taking pictures
Location privacy is one of the major impediments that may hinder workers from participating in SC
Problem Statement
[Figure: Workers, Requesters, and SC-server; workers report locations to the SC-server]
Current solutions require the workers to disclose their locations to untrustworthy entities, i.e., the SC-server.
Proposed solution: a framework for protecting the privacy of worker locations, whereby the SC-server only has access to data sanitized according to differential privacy.
Outline
• Background • Privacy Framework • Worker PSD (Private Spatial Decomposition) • Task Assignment • Experiments
Utility-Privacy Trade-off
[Figure: utility plotted against privacy, each axis running from 0% to 100%]
Related Work
• Pseudonymity (using a fake identity)
  – e.g., fake identity + location == resident of the home
• K-anonymity model (a record cannot be distinguished among k other records)
  – When identities are known, location k-anonymity fails to prevent the location of a subject from being identified
  – When all k users reside in the exact same location, k-anonymity does not provide rigorous privacy
• Cryptography
  – Such techniques are computationally expensive => not suitable for SC applications
Differential Privacy (DP)
DP ensures that an adversary cannot tell from the sanitized data whether an individual is present or not in the original data.
DP allows only aggregate queries, e.g., count, sum.

ε-distinguishability [Dwork'06]: A database produces transcript U on a set of queries QS. Transcript U satisfies ε-distinguishability if for every pair of sibling datasets $D_1$ and $D_2$, such that $|D_1| = |D_2|$ and they differ in only one record, it holds that

$$\ln \frac{\Pr[QS^{D_1} = U]}{\Pr[QS^{D_2} = U]} \le \varepsilon$$

ε: privacy budget

$L_1$-sensitivity: Given neighboring datasets $D_1$ and $D_2$, the sensitivity of query set QS is the maximum change in their query results:

$$\sigma(QS) = \max_{D_1, D_2} \sum_{i=1}^{q} \left| QS_i(D_1) - QS_i(D_2) \right|$$

where $QS_i$ is the i-th of the q queries in QS.

[Dwork'06] shows that it is sufficient to achieve ε-DP by adding random Laplace noise with zero mean and scale $\lambda = \sigma(QS)/\varepsilon$ to each query result.
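The Laplace mechanism on this slide can be illustrated with a minimal Python sketch. It assumes a single count query with sensitivity 1; the function names `laplace_noise` and `noisy_count` are hypothetical and not taken from the paper. The noise is drawn as the difference of two exponential samples, which follows a zero-mean Laplace distribution with the requested scale.

```python
import random

def laplace_noise(scale, rng):
    """Draw one sample from a zero-mean Laplace distribution with the given scale."""
    # The difference of two i.i.d. exponential variables with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(true_count, epsilon, sensitivity=1.0, rng=None):
    """Sanitize a count query: add Laplace noise with scale sensitivity / epsilon."""
    rng = rng or random.Random()
    scale = sensitivity / epsilon  # lambda = sigma(QS) / epsilon
    return true_count + laplace_noise(scale, rng)

# Assumed example: a cell with 100 workers, sanitized with epsilon = 0.5.
rng = random.Random(7)
samples = [noisy_count(100, epsilon=0.5, rng=rng) for _ in range(20000)]
mean_estimate = sum(samples) / len(samples)
mean_abs_error = sum(abs(s - 100) for s in samples) / len(samples)
```

A smaller ε gives a larger scale and therefore noisier counts, which is the utility-privacy trade-off the framework has to manage.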
Outline
• Background • Privacy Framework • Worker PSD (Private Spatial Decomposition) • Task Assignment • Experiments
Privacy Framework
[Figure: framework components Workers, Requesters, SC-Server, Cell Service Provider, and Worker Database; arrows labeled 0. Report Locations, 1. Sanitized Release (PSD), 2. Task Request t, 3. Geocast {t, GR}, 4. Consent]
0. Workers send their locations to a trusted cell service provider (CSP)
1. The CSP releases a PSD according to privacy budget ε. The PSD is accessed by the SC-server
2. The SC-server receives tasks from requesters
3. When the SC-server receives task t, it queries the PSD to determine a geocast region (GR) that encloses sufficient workers. Then the SC-server initiates geocast communication to disseminate t to all workers within the GR
4. Workers confirm their availability to perform the assigned task
Workers trust the CSP
Workers do not trust the SC-server and requesters
Focus on private task assignment rather than post-assignment
Design Goal and Performance Metrics
Protecting worker location may reduce the effectiveness and efficiency of worker-task matching, captured by the following metrics:
• Assignment Success Rate (ASR): the ratio of tasks accepted by workers to the total number of task requests
• Worker Travel Distance (WTD): the average travel distance of all workers
• System Overhead: the average number of notified workers (ANW). ANW affects both the communication overhead required to geocast task requests and the computation overhead of the matching algorithm
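The three metrics can be computed from per-task logs, as in the sketch below. The record layout (`TaskOutcome`) and the sample values are assumptions made for illustration. In particular, WTD is averaged here over the workers who accepted a task, which is one possible reading of "all workers".

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TaskOutcome:
    notified_workers: int        # workers inside the geocast region for this task
    travel_km: Optional[float]   # distance travelled by the accepting worker, None if nobody accepted

def summarize(outcomes):
    """Return (ASR, WTD, ANW) for a list of task outcomes."""
    accepted = [o for o in outcomes if o.travel_km is not None]
    asr = len(accepted) / len(outcomes)
    wtd = sum(o.travel_km for o in accepted) / len(accepted) if accepted else 0.0
    anw = sum(o.notified_workers for o in outcomes) / len(outcomes)
    return asr, wtd, anw

# Made-up sample: four task requests, one of which nobody accepted.
outcomes = [
    TaskOutcome(12, 1.5),
    TaskOutcome(30, None),
    TaskOutcome(8, 0.5),
    TaskOutcome(10, 1.0),
]
asr, wtd, anw = summarize(outcomes)
```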
Outline
• Background • Privacy Framework • Worker PSD (Private Spatial Decomposition) • Task Assignment • Experiments
Adaptive Grid (Worker PSD)
[Qardaji'13]
[Figure: two-level adaptive grid. Level 1 has cells A, B, C, D with noisy counts $N'_A = 100$, $N'_B = 100$, $N'_C = 100$, $N'_D = 200$; level 2 has cells $c_1$ to $c_{21}$]
Level 1: creates a coarse-grained, fixed-size $m_1 \times m_1$ grid over the data domain, then issues $m_1^2$ count queries, one for each level-1 cell, using $\varepsilon_1$

$$m_1 = \max\left(10, \left\lceil \frac{1}{4} \sqrt{\frac{N \times \varepsilon}{k_1}} \right\rceil\right)$$

Level 2: partitions each level-1 cell into $m_2 \times m_2$ level-2 cells, where $m_2$ is adaptively chosen based on the noisy count $N'$ of the level-1 cell

$$m_2 = \left\lceil \sqrt{\frac{N' \times \varepsilon_2}{k_2}} \right\rceil$$

$$\varepsilon = \varepsilon_1 + \varepsilon_2$$
Customized AG
Expected number of workers (noisy count) in level-2 cells: $n = N'/m_2^2 = k_2/\varepsilon_2$
A large $n$ leads to high communication cost
Increase $m_2$ to decrease overhead, but only to the point where there is at least one worker in a cell
The probability that the real count is larger than zero:

$$p_h = 1 - \frac{1}{2} \exp\left(-\frac{count_{PSD}}{1/\varepsilon_2}\right)$$

Example with $N' = 100$:

Customized AG ($k_2 = \sqrt{2}$, $p_h = 88\%$)
ε      ε_2     m_2    n
1      0.5     6      2.8
0.5    0.25    5      5.6
0.1    0.05    2      28

Original AG ($k_2 = 5$)
ε      ε_2     m_2    n
1      0.5     3      11
0.5    0.25    2      25
0.1    0.05    1      100
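The level-2 granularity, the expected count per level-2 cell, and the probability $p_h$ can be evaluated numerically. The sketch below assumes the formulas $m_2 = \lceil \sqrt{N' \varepsilon_2 / k_2} \rceil$, $n = k_2/\varepsilon_2$ and $p_h = 1 - \frac{1}{2}\exp(-\varepsilon_2 \cdot count_{PSD})$; the function names are hypothetical.

```python
import math

def level2_granularity(noisy_count, eps2, k2):
    """m_2 = ceil(sqrt(N' * eps2 / k2)) for one level-1 cell."""
    return math.ceil(math.sqrt(noisy_count * eps2 / k2))

def expected_workers_per_cell(eps2, k2):
    """n = k2 / eps2, the expected noisy count of a level-2 cell."""
    return k2 / eps2

def prob_nonempty(count_psd, eps2):
    """p_h = 1 - 0.5 * exp(-count_psd / (1 / eps2))."""
    return 1.0 - 0.5 * math.exp(-count_psd * eps2)

k2 = math.sqrt(2)  # customized AG setting from the slide
rows = []
for eps2 in (0.5, 0.25, 0.05):
    m2 = level2_granularity(100, eps2, k2)   # N' = 100 as in the slide example
    n = expected_workers_per_cell(eps2, k2)
    rows.append((eps2, m2, n, prob_nonempty(n, eps2)))
```

Because $n \cdot \varepsilon_2 = k_2$, the value of $p_h$ at the expected count depends only on $k_2$, which is why a single $p_h$ is quoted for the customized setting regardless of the budget.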
Customized AG
• Both original AG and customized AG adapt to data distributions
• Original AG minimizes the overall estimation error of region queries, while customized AG increases the number of 2nd-level cells
[Figure: original AG and customized AG grids on the Yelp dataset]
Outline
• Background • Privacy Framework • Worker PSD (Private Spatial Decomposition) • Task Assignment • Experiments
Analytical Utility Model
The SC-server establishes an Expected Utility ($EU$) threshold, which is the targeted success rate for a task, with $EU > p_a$.
$X$ is a random variable for the event that a worker accepts a received task: $P(X = True) = p_a$; $P(X = False) = 1 - p_a$
Assuming $w$ independent workers, $X \sim Binomial(w, p_a) \Rightarrow U = 1 - (1 - p_a)^w$, where $U$ is the probability that at least one worker accepts the task
We define the acceptance rate as a decreasing function of task-worker distance (e.g., linear, Zipfian): $p_a = F(d)$; $0 \le p_a \le 1$
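The utility formula $U = 1 - (1 - p_a)^w$ can be inverted to estimate how many workers must be notified to reach a given EU threshold. The sketch below assumes a single acceptance rate shared by all workers; `utility` and `workers_needed` are illustrative names.

```python
import math

def utility(p_a, w):
    """Probability that at least one of w independent workers accepts."""
    return 1.0 - (1.0 - p_a) ** w

def workers_needed(p_a, eu):
    """Smallest w with utility(p_a, w) >= eu, for 0 < p_a < 1 and 0 < eu < 1."""
    w = max(1, math.ceil(math.log(1.0 - eu) / math.log(1.0 - p_a)))
    while utility(p_a, w) < eu:  # guard against floating-point rounding
        w += 1
    return w

# Assumed values: acceptance rate 0.1 and a target success rate of 0.9.
w_min = workers_needed(0.1, 0.9)
```

The required number of workers grows quickly as the acceptance rate falls, which is why the geocast region must enclose enough workers for the EU threshold to be met.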
Acceptance Rate Functions
[Figure: acceptance rate versus distance; the distance axis runs from 0 to MTD and the acceptance-rate axis is marked at 0.5]
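One possible linear acceptance rate function is sketched below. The exact shape is an assumption: the rate starts at a maximum value (`max_ar`) at distance 0 and falls linearly to zero at MTD. The slides only state that the function decreases with distance, so this is an illustration rather than the authors' definition.

```python
def linear_acceptance(d, max_ar, mtd):
    """Assumed linear model: max_ar at d = 0, decreasing to 0 at d = mtd."""
    if d < 0:
        raise ValueError("distance must be non-negative")
    if d >= mtd:
        return 0.0
    return max_ar * (1.0 - d / mtd)

# Assumed values: maximum acceptance rate 0.4 and MTD of 3.6 km.
rates = [linear_acceptance(d, 0.4, 3.6) for d in (0.0, 1.8, 3.6, 5.0)]
```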
Geocast Region Construction
Determines a small region that contains sufficient workers
Greedy Algorithm (GDY)
1. Init GR = {}, max-heap of candidates Q = {the cell that contains t}
2. $c_i \leftarrow Q$ (remove the top candidate from Q and add it to GR)
3. $U \leftarrow 1 - (1 - U)(1 - U_{c_i})$
4. If $U \ge EU$, return GR
5. neighbors = ({c_i's neighbors} - GR) ∩ MTD (neighbors of c_i that are not yet in GR and lie within MTD of t)
6. Q = Q ∪ neighbors; go to 2
[Figure: level-2 grid with cells $c_1$ to $c_{21}$ and task t]
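The greedy steps can be sketched in Python with `heapq`. The data layout is an assumption: each cell is described by its noisy count and its distance to the task, adjacency is given as a dictionary, and the heap is ordered by per-cell utility. All names and sample values are hypothetical.

```python
import heapq

def cell_utility(noisy_count, p_a):
    """Probability that at least one worker in the cell accepts: 1 - (1 - p_a)^w."""
    workers = max(noisy_count, 0)  # sanitized counts can be negative, so clamp at zero
    return 1.0 - (1.0 - p_a) ** workers

def greedy_geocast(start, cells, neighbors, eu, mtd, accept):
    """cells: id -> (noisy_count, distance); neighbors: id -> list of ids."""
    def u_of(cell_id):
        count, dist = cells[cell_id]
        return cell_utility(count, accept(dist))

    gr, u = [], 0.0
    heap = [(-u_of(start), start)]  # heapq is a min-heap, so utilities are negated
    queued = {start}
    while heap:
        neg_u_c, c = heapq.heappop(heap)        # step 2: take the best candidate
        gr.append(c)
        u = 1.0 - (1.0 - u) * (1.0 + neg_u_c)   # step 3: 1 + neg_u_c equals 1 - U_c
        if u >= eu:                              # step 4
            return gr, u
        for nb in neighbors[c]:                  # steps 5 and 6
            if nb not in queued and cells[nb][1] <= mtd:
                queued.add(nb)
                heapq.heappush(heap, (-u_of(nb), nb))
    return gr, u  # candidates exhausted before reaching EU

# Toy instance with four cells; D lies beyond MTD and is never added.
cells = {"A": (1, 0.0), "B": (2, 0.5), "C": (1, 1.5), "D": (3, 3.0)}
neighbors = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
accept = lambda d: 0.5 if d < 1.0 else 0.25
gr, u = greedy_geocast("A", cells, neighbors, eu=0.9, mtd=2.0, accept=accept)
```

In this toy instance the region grows from the task's cell A to B and then C before the accumulated utility crosses the 0.9 threshold.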
Partial Cell Selection
The number of workers can still be large with AG, especially when $\varepsilon_2$ is small
Allow partial cell inclusion on the last added cell $c_i$
[Figure: splitting the last added cell $c_i$ ($c_7$ in the example grid of cells $c_1$ to $c_{21}$) into a sub-cell $c_i'$]
Communication Cost
The more compact the GR, the lower the cost
Infrastructure-based mode vs. infrastructure-less mode
[Figure: network types (Internet, WLAN, Cellular, Mobile Ad-hoc Networks) and the example grid with cells $c_1$ to $c_{21}$ and task t]
Measurement:

$$\text{Hop count} = \frac{\text{Farthest distance between two workers}}{2 \times \text{Communication range}}$$

Digital Compactness Measurement [Kim'84]:

$$DCM = \frac{area(GR)}{area(\text{MIN BALL})}$$
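Both measurements are simple ratios, as the sketch below shows. It assumes planar worker coordinates and takes MIN BALL to be the smallest circle enclosing the GR, with its radius supplied by the caller; the function names and sample values are illustrative.

```python
import math
from itertools import combinations

def hop_count(worker_xy, comm_range):
    """Farthest distance between two workers divided by twice the communication range."""
    farthest = max(math.dist(a, b) for a, b in combinations(worker_xy, 2))
    return farthest / (2.0 * comm_range)

def dcm(gr_area, min_ball_radius):
    """Area of the GR divided by the area of its (assumed) minimum enclosing circle."""
    return gr_area / (math.pi * min_ball_radius ** 2)

# Assumed example: three workers and a communication range of 0.5 units.
hops = hop_count([(0, 0), (3, 4), (1, 1)], comm_range=0.5)

# Assumed example: a 2 x 2 square region, whose enclosing circle has radius sqrt(2).
compactness = dcm(4.0, math.sqrt(2))
```

A DCM value closer to 1 indicates a more compact region, which under the hop-count model corresponds to fewer hops in infrastructure-less mode.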
Geocast Regions
[Figure: example geocast regions, panels A, B, C, D]
Outline
• Background • Privacy Framework • Worker PSD (Private Spatial Decomposition) • Task Assignment • Experiments
Experimental Setup
• Datasets

Name       #Tasks     #Workers    MTD (km)
Gowalla    151,075    6,160       3.6
Yelp       15,583     70,817      13.5

• Assumptions
  – Gowalla and Yelp users are workers
  – Check-in points (i.e., of restaurants) are task locations
• Parameter settings
  – ε = {0.1, 0.4, 0.7, 1}
  – EU = {0.3, 0.5, 0.7, 0.9}
  – MaxAR = {0.1, 0.4, 0.7, 1}
• 1000 random tasks x 10 seeds
GR Construction Heuristics (Gow.-Linear)
[Figure: three panels (ANW, WTD-FC, HOP) comparing GDY, G-GR, G-PA, and G-GP for Eps = 0.1, 0.4, 0.7, 1]
• GDY = geocast (GREedy algorithm) + original Adaptive Grid (AG) [Qardaji'13]
• G-GR = geocast + AG with customized GRanularity
• G-PA = geocast with PArtial cell selection + original Adaptive Grid (AG)
• G-GP = geocast with Partial cell selection + AG with customized Granularity
Effect of Grid Size on ASR
[Figure: average ASR over all values of budget when varying $k_2$ from 0.1 to 25.6 (x-axis: $k_2$; y-axis: ASR, 50 to 100); series Gowalla-Linear, Gowalla-Zipf, Yelp-Linear, Yelp-Zipf; regions annotated as over-provision and under-provision]
Compactness-based Heuristics (Yelp-Zipf)
[Figure: two panels (HOP, ANW) comparing G-GP-Pure, G-GP-Hybrid, and G-GP-Compact for Eps = 0.1, 0.4, 0.7, 1]
Overhead of Achieving Privacy (Gow.-Zipf)
[Figure: three panels (ANW, WTD-FC, ASR) comparing Privacy and Non-Privacy for Eps = 0.1, 0.4, 0.7, 1]
Effect of Varying MAR (Yelp-Linear)
[Figure: three panels (ANW, CELL, WTD-FC) for AR = 0.1, 0.4, 0.7, 1, with one series per budget Eps = 0.1, 0.4, 0.7, 1]
Effect of Varying EU (Yelp-Linear)
[Figure: three panels (ANW, CELL, WTD-FC) for EU = 30, 50, 70, 90, with one series per budget Eps = 0.1, 0.4, 0.7, 1]
Demo
https://www.youtube.com/watch?v=4zkiJ9gk79s
http://geocast.azurewebsites.net/geocast/
Conclusion
• Identified geocasting as a needed step to preserve privacy prior to workers consenting to a task
• Introduced a novel privacy-aware framework in SC, which enables worker participation without compromising location privacy
• Provided heuristics and optimizations for determining effective geocast regions that achieve a high assignment success rate with low overhead
• Experimental results on real datasets show that the proposed techniques are effective and the cost of privacy is practical
References
Hien To, Gabriel Ghinita, Cyrus Shahabi. A Framework for Protecting Worker Location Privacy in Spatial Crowdsourcing. In Proceedings of the 40th International Conference on Very Large Data Bases (VLDB 2014)
Hien To, Gabriel Ghinita, Cyrus Shahabi. PriGeoCrowd: A Toolbox for Private Spatial Crowdsourcing. (demo) In Proceedings of the 31st IEEE International Conference on Data Engineering (ICDE 2015)