Differentially Private Publication of Location Entropy
1
Differentially Private Publication of Location Entropy
Hien To, Kien Nguyen, Cyrus Shahabi
University of Southern California
2
Outline
• Introduction
– Location Popularity
– Location Entropy
• Publish Location Entropy Privately
• Differential Privacy Adoption
• Methods
• Evaluation and Experiments
3
Location-Enriched Datasets
• Popularity of Location-Based Services
– Twitter: 10M+ geo-tagged tweets/day
– Foursquare: 5M check-ins/day
[Figure: Geo-Tagged Tweets on Map, by Twitter; panels: New York City, Tokyo, Europe]
Sources: mashable.com, venturebeat.com/2015/08/09/
4
Location Popularity
• Used in various applications:
– Spatial crowdsourcing [To et al., 2015]
– Geosocial networks [Cranshaw et al., 2010]
– Multi-agent systems [Van Dyke Parunak, Brueckner, 2001]
– Wireless sensor networks [Wang et al., 2004]
– Personalized web search [Leung et al., 2010]
– Image retrieval [Yanai et al., 2009]
5
Location Popularity Measurements
• Frequency of visits:
– Number of visits
• Diversity of visits:
– Number of unique users visiting
• Location Entropy (LE):
– Captures both frequency and diversity
– Intuitively: expected number of users visiting a location
6
Location Entropy
[Figure: two example locations]
Location 1: Less popular, LE = 0.566
Location 2: More popular, LE = 1.099
Frequency = 12, Diversity = 3
7
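Location Entropy Calculation
Given geospatial coordinates $L = \{l_1, l_2, \ldots, l_{|L|}\}$, each location $l_i$ is visited by a set of users $U = \{u_1, u_2, \ldots, u_{|U|}\}$.
Location Entropy (Shannon, 1948): $H(l) = -\sum_{u \in U_l} p_{l,u} \log p_{l,u}$
• $U_l$: the set of distinct users that visited $l$
• $p_{l,u}$: fraction of visits to $l$ that belongs to user $u$, $p_{l,u} = \frac{c_{l,u}}{c_l}$
• $c_l$: the total number of visits to $l$
Visit counts $c_{l,u}$ (columns: locations, rows: users):
$$\begin{array}{c|cccc}
 & l_1 & l_2 & \cdots & l_{|L|} \\ \hline
u_1 & c_{1,1} & c_{2,1} & \cdots & c_{|L|,1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
u_{|U|} & c_{1,|U|} & c_{2,|U|} & \cdots & c_{|L|,|U|}
\end{array}$$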
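The entropy definition can be sketched in a few lines of Python. The visit splits used here (10/1/1 and 4/4/4) are an assumption: the slides give only Frequency = 12 and Diversity = 3, and these splits happen to reproduce the two LE values shown on the previous slide when the natural logarithm is used. The function name `location_entropy` is likewise hypothetical.

```python
import math

def location_entropy(visits_per_user):
    """Shannon entropy (natural log) of one location.

    visits_per_user maps a user id to that user's visit count c_{l,u}.
    """
    counts = [c for c in visits_per_user.values() if c > 0]
    total = sum(counts)  # c_l, the total number of visits to the location
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts)

# Assumed splits: both have 12 visits from 3 users.
skewed = {"u1": 10, "u2": 1, "u3": 1}  # one user dominates
even = {"u1": 4, "u2": 4, "u3": 4}     # visits spread evenly

print(round(location_entropy(skewed), 3))  # 0.566
print(round(location_entropy(even), 3))    # 1.099
```

Both locations have the same frequency and diversity, so only the entropy separates them.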
8
Privacy Motivation
• Location data is sensitive!
• “Uniqueness in the crowd” [Nature 2013]
– 4 spatio-temporal points uniquely identify 95% of individuals
• No need to know absolute location data to know location popularity
→ GOAL: Publish entropy of all locations $L$ without compromising users’ raw location data
9
Adoption of Differential Privacy
“iOS 10 adds Siri intelligence into QuickType and Photos, …, and opens up Siri, Maps, Phone and Messages to developers - while increasing security and privacy with powerful technologies like Differential Privacy.” Craig Federighi, Apple’s senior vice president of Software Engineering, June 13, 2016.
“RAPPOR enables learning statistics about the behavior of users’ software while guaranteeing client privacy. The guarantees of differential privacy, which are widely accepted as being the strongest form of privacy, …. RAPPOR introduces a practical method to achieve those guarantees.” Úlfar Erlingsson, Google’s Tech Lead Manager, Security Research
Apple and Google have adapted DP to discover usage patterns from a large number of users.
10
Differential Privacy (DP)
Differential Privacy [*]: Algorithm $A$ gives $\epsilon$-differential privacy if for all pairs of datasets $D_1$ and $D_2$ differing in one element, and all subsets $S$ of possible outputs,
$$\frac{\Pr[A(D_1) \in S]}{\Pr[A(D_2) \in S]} \le \exp(\epsilon),$$
where $\epsilon$ is the measure of privacy loss.
[Figure: Database $D$ → Privatized Analysis $A$ → Privatized Results $R$, with $A(D) = R$]
Adversary does not know whether an individual is present or not in the original data: $A(D_{+\text{Individual}}) \sim A(D_{-\text{Individual}})$
[*] Dwork, Nissim, McSherry, Smith, Calibrating Noise to Sensitivity in Private Data Analysis, 2006
11
Achieving Differential Privacy
• For query $Q$, add randomly distributed Laplace noise scaled to $\frac{\Delta Q}{\epsilon}$ [*]
• Sensitivity of $Q$: $\Delta Q = \max_{(D_1, D_2)} \lVert Q(D_1) - Q(D_2) \rVert_1$
• High sensitivity $\Delta Q$ requires more noise to be injected
→ Less accurate output
→ Want low sensitivity
[*] Dwork, Nissim, McSherry, Smith, Calibrating Noise to Sensitivity in Private Data Analysis, 2006
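A minimal sketch of the Laplace mechanism described above, using only the Python standard library. The function names, the counting query and its sensitivity of 1 are illustrative assumptions, not taken from the slides. A Laplace(0, b) sample is drawn here as b times the difference of two independent Exp(1) draws.

```python
import random

def laplace_noise(scale, rng):
    """One Laplace(0, scale) sample: scale * (Exp(1) - Exp(1))."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def laplace_mechanism(true_answer, sensitivity, epsilon, rng):
    """Return true_answer perturbed with Laplace noise of scale sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_answer + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
true_count = 42         # hypothetical counting query with sensitivity 1
noisy = [laplace_mechanism(true_count, 1.0, 0.5, rng) for _ in range(20000)]

# The mean absolute error of Laplace noise equals its scale (here 1 / 0.5 = 2),
# so doubling the sensitivity or halving epsilon doubles the expected error.
mean_abs_error = sum(abs(x - true_count) for x in noisy) / len(noisy)
print(mean_abs_error)
```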
12
Challenges
DP has been used to compute Shannon entropy [Blum et al. PODS’05]
DP for location entropy:
• Removing one user removes ALL her visits
→ high sensitivity of LE
o A location may have only a few users (diversity)
o One user can visit a location many times (frequency)
o One user can visit multiple locations (frequency & diversity)
13
Global Sensitivity of Location Entropy
Derive tight bound for the (global) sensitivity of LE ($\Delta H$)
Global sensitivity of LE is: $\Delta H = \max\{\ln 2,\ \ln C - \ln(\ln C) - 1\}$
• $\ln 2$: worst-case diversity; $\ln C - \ln(\ln C) - 1$: worst-case frequency
• $C$ is the maximum visits a user contributes to a location ($C \ge 1$)
Impact of a single user to all locations: $M_{max} \Delta H$
• $M_{max}$ is the maximum number of locations visited by a user
Baseline adds Laplace noise with scale $M_{max} \Delta H$:
Publish: $H(l) + Lap\left(\frac{M_{max} \Delta H}{\epsilon}\right)$ [*]
[*] Dwork, Nissim, McSherry, Smith, Calibrating Noise to Sensitivity in Private Data Analysis, 2006
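To get a feel for the magnitudes, the bound $\max\{\ln 2, \ln C - \ln(\ln C) - 1\}$ and the resulting Baseline noise scale can be evaluated directly. This is a sketch with hypothetical function names. Treating $C = 1$ as $\ln 2$ is an assumption made here because $\ln(\ln 1)$ is undefined. The parameter values are the ones listed on the sensitivity comparison slide.

```python
import math

def global_sensitivity(C):
    """max{ln 2, ln C - ln(ln C) - 1} for an integer per-location visit cap C >= 1."""
    if C < 1:
        raise ValueError("C must be at least 1")
    if C == 1:
        # Assumption: ln(ln 1) is undefined, so fall back to the ln 2 term.
        return math.log(2)
    log_c = math.log(C)
    return max(math.log(2), log_c - math.log(log_c) - 1)

def noise_scale(M, C, epsilon):
    """Laplace scale M * Delta_H / epsilon when a user may touch up to M locations."""
    return M * global_sensitivity(C) / epsilon

print(global_sensitivity(10))     # ln 2, since the second term is smaller here
print(global_sensitivity(20))     # roughly 0.90
print(noise_scale(100, 1000, 5))  # Baseline with M_max = 100, C_max = 1000: roughly 79.5
print(noise_scale(5, 20, 5))      # thresholded M = 5, C = 20: roughly 0.90
```

Since entropies of a few thousand users are single-digit numbers, a noise scale near 80 swamps the signal, which is the motivation for reducing $M$ and $C$.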
14
Reducing the Global Sensitivity of LE
Baseline injects excessively high amount of noise ~ $M_{max} \Delta H$
• linearly increases with $M$ and monotonically increases with $C$
[Figure: global sensitivity $\max\{\ln 2,\ \ln C - \ln(\ln C) - 1\}$ (y-axis, 0.7 to 1.3) versus $C$ (x-axis, 0 to 40); the value 14.561 is marked]
Reduce noise by thresholding $M$ and $C$
Limit limits activity of users (satisfies $\epsilon$-DP)
• who visit more than $M$ locations
• who have contributed more than $C$ visits to a location
Limit reduces perturbation error at the cost of increasing approximation error
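The thresholding step of Limit can be sketched as a simple truncation of the visit table. The slides do not say which locations are kept when a user exceeds $M$, so keeping the first $M$ in iteration order is an assumption of this sketch, as are the function and variable names.

```python
def limit_visits(visits, M, C):
    """Truncate user activity: at most M locations per user, at most C visits each.

    visits maps user -> {location: visit count}. Returns a new, truncated mapping.
    """
    limited = {}
    for user, per_location in visits.items():
        # Assumption: keep the first M locations in iteration order.
        kept = list(per_location.items())[:M]
        limited[user] = {loc: min(count, C) for loc, count in kept}
    return limited

raw = {
    "alice": {"l1": 30, "l2": 2, "l3": 1},  # too many locations and too many visits to l1
    "bob": {"l1": 1},
}
print(limit_visits(raw, M=2, C=20))
# {'alice': {'l1': 20, 'l2': 2}, 'bob': {'l1': 1}}
```

The entropy computed on the truncated table differs from the true one (approximation error), but the Laplace noise now only needs scale $M \Delta H / \epsilon$ with the smaller $M$ and $C$ (perturbation error).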
15
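Relaxation with Smooth Sensitivity
• Uses local sensitivity (LS) bound of each location
• depends on $C$ and the number of users visiting the location ($n$)
Local sensitivity of a particular location $l$ with $n$ users is:
• $\ln 2$ when $n = 1$
• $\ln\frac{n+1}{n}$ when $C = 1$
• otherwise
$$\max\left\{ \ln\frac{n-1}{n-1+C} + \frac{C}{n-1+C}\ln C,\ \ \ln\frac{n}{n+C} + \frac{C}{n+C}\ln C,\ \ \ln\left(1 + \frac{1}{\exp\left(\ln(n-1) - \frac{\ln C}{C-1} + \ln\frac{\ln C}{C-1} + 1\right)}\right) \right\}$$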
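The local sensitivity bound can be evaluated numerically to see how quickly it falls as more users visit a location. This sketch assumes the three-term expression reads as $\ln\frac{n-1}{n-1+C} + \frac{C}{n-1+C}\ln C$, $\ln\frac{n}{n+C} + \frac{C}{n+C}\ln C$ and $\ln\left(1 + 1/\exp\left(\ln(n-1) - \frac{\ln C}{C-1} + \ln\frac{\ln C}{C-1} + 1\right)\right)$; that reading of the slide formula, and the function name, are assumptions.

```python
import math

def local_sensitivity(n, C):
    """Local sensitivity bound of a location with n users and per-user visit cap C."""
    if n < 1 or C < 1:
        raise ValueError("n and C must be at least 1")
    if n == 1:
        return math.log(2)
    if C == 1:
        return math.log((n + 1) / n)
    log_c = math.log(C)
    term_a = math.log((n - 1) / (n - 1 + C)) + C / (n - 1 + C) * log_c
    term_b = math.log(n / (n + C)) + C / (n + C) * log_c
    ratio = log_c / (C - 1)
    inner = math.log(n - 1) - ratio + math.log(ratio) + 1
    term_c = math.log(1 + 1 / math.exp(inner))
    return max(term_a, term_b, term_c)

for n in (25, 50, 100):
    print(n, round(local_sensitivity(n, 20), 4))
```

For very small $n$ the third term can exceed the global bound $\Delta H$, so a caller may want to cap the result at $\Delta H$; that cap is a suggestion of this sketch, not something stated on the slide.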
16
Relaxation with Smooth Sensitivity
• Smooth sensitivity $S$ is an upper bound on the local sensitivity
• Limit-SS satisfies $(\epsilon, \delta)$-differential privacy
• Publish $H(l) + Lap\left(\frac{M \cdot 2 \cdot S}{\epsilon}\right)$ [*]
• SS can be pre-computed, regardless of the dataset
[*] Nissim, Raskhodnikova, Smith. Smooth Sensitivity and Sampling in Private Data Analysis, STOC 2007
17
Relaxation with Crowd-Blending Privacy
• Sensitivity is low if dataset is diverse (locations visited by many different users)
– removing one user not too costly
• Limit-CB publishes only locations with at least $k$ users
• Limit-CB satisfies $(k, \epsilon)$-crowd-blending privacy [*]
Global sensitivity of location entropy for locations with at least $k$ users ($k \ge \frac{C}{\ln C - 1} + 1$) is the local sensitivity at $n = k$.
[Figure: global sensitivity (y-axis, 0 to 1) versus number of users $n$ (x-axis, 0 to 100), for C = 10 and C = 20]
[*] Gehrke, Hay, Lui, Pass. Crowd-Blending Privacy, CRYPTO 2012
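The two checks Limit-CB needs before publishing can be sketched as follows: the smallest crowd size allowed by the condition $k \ge \frac{C}{\ln C - 1} + 1$, and the filter that keeps only locations with at least $k$ users. Function names and the toy data are hypothetical. The sketch rejects $C$ with $\ln C \le 1$, where the expression is undefined or negative, which is an assumption about how to handle that case.

```python
import math

def min_crowd_size(C):
    """Smallest integer k satisfying k >= C / (ln C - 1) + 1."""
    if math.log(C) <= 1:
        # Assumption: the condition is not meaningful when ln C <= 1.
        raise ValueError("condition requires ln C > 1")
    return math.ceil(C / (math.log(C) - 1) + 1)

def publishable_locations(users_per_location, k):
    """Keep only locations visited by at least k distinct users."""
    return {loc for loc, users in users_per_location.items() if len(users) >= k}

print(min_crowd_size(20))  # 12
print(min_crowd_size(10))  # 9

toy = {"cafe": {"u1", "u2", "u3"}, "alley": {"u1"}}
print(publishable_locations(toy, k=2))  # {'cafe'}
```

With $C = 20$ the value $k = 25$ used in the sensitivity comparison is comfortably above the minimum of 12.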
18
Sensitivity Comparison
Sensitivity in natural log scale:
• $\epsilon = 5$, $C_{max} = 1000$, $M_{max} = 100$, $C = 20$, $M = 5$, $\delta = 10^{-�}$, $k = 25$
[Figure: sensitivity of each method]
Summary:
• Standard DP not practical
• Limit C (frequency) and M (diversity) already very effective
• Smooth-Sensitivity best for dense datasets
• Crowd-Blending best for sparse datasets
19
Experiments
• Synthetic datasets:
– #locations: 10k
– #users:
  • Sparse: 100k
  • Dense: 10M
– Power-law distribution
• Real dataset:
– Gowalla New York (sparser than Sparse synthetic dataset)
• Measurements:
– KL-Divergence
– MSE
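The two measurements can be computed between the vector of actual entropies and the vector of published (noisy) entropies. The slides do not say how the entropy vectors are turned into distributions for the KL-divergence, so the normalisation and the small positive floor for non-positive noisy values below are assumptions of this sketch, as are the function names and toy numbers.

```python
import math

def mse(actual, noisy):
    """Mean squared error between actual and noisy entropies."""
    return sum((a - b) ** 2 for a, b in zip(actual, noisy)) / len(actual)

def kl_divergence(actual, noisy, floor=1e-12):
    """KL(P || Q), where P and Q are the two entropy vectors normalised to sum to 1.

    Assumption: values are clamped to a small positive floor first, because
    Laplace noise can push a published entropy below zero.
    """
    p = [max(a, floor) for a in actual]
    q = [max(b, floor) for b in noisy]
    p_sum, q_sum = sum(p), sum(q)
    return sum((a / p_sum) * math.log((a / p_sum) / (b / q_sum)) for a, b in zip(p, q))

actual = [1.0, 2.0, 3.0]  # hypothetical true entropies of three locations
noisy = [1.5, 1.5, 3.5]   # hypothetical published values
print(mse(actual, noisy))
print(kl_divergence(actual, noisy))
```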
20
Distributions of Noisy vs. Actual LE
[Figure: four panels plotting entropy (y-axis, 0 to 18) against location id (x-axis, 0 to 10000): Actual distribution (actual entropy), Limit, Limit-SS and Limit-CB (noisy entropy)]
21
Thank you
Questions?