Post on 29-Jul-2020
Reversegeocodingandimplica1onsforgeospa1alprivacy
PaulZandbergenDepartmentofGeographyUniversityofNewMexico
Outline
• Geospatial privacy • Geocoding / reverse geocoding • Experimental design • Results and Conclusion
Geospa1alPrivacy
• Underincreasingthreatfromnewtechnologies,highresolu1ondataandeasy‐to‐usetools
How easy is it to hack this map?
Geocoding
• Widelyemployed
• Wellunderstood• Substan1alerrors
Toolofahacker:Reversegeocoding
• Geocodinginreverse• Rela1veeasy,rela1velynew• Keytoolfor“hacking”publishedmaps• Notwellunderstood
ID Address ZIP 101 123 Main St 12345 102 456 Central Ave 12346 … … …
Howtoprotectspa1alconfiden1ality?
• Geographicmasking
• Buthowtodothismosteffec1vely?• NeedbeRerunderstandingofreversegeocoding
randompointwithincircle
randompointoncircle
randomdistanceanddirec1on
donutmasking
donutmaskingwithexclusion
ReviewoftheStateoftheArt
Panel of confidentiality issues arising from the integration of remotely sensed and self-identifying data
“No known technical strategy [….] for managing linked spatial-social data adequately resolves conflicts among the objectives of data linkage, open
access, data quality, and confidentiality protection across datasets and across
uses.” (Conclusion 3)
National Research Council, 2007
ResearchQues1ons
What are the capabili5es of reverse geocoding to iden5fy individuals from published loca5ons?
How does this vary with the methods employed for geocoding and reverse geocoding?
How does this vary with popula5on density?
ExperimentalDesign
• Actualbuildingloca1onswithknownaddresses– TravisCounty,TX(Aus1n)– Sampleof2,500residen1alloca1ons
– Stra1fiedacross5popula1ondensityclasses• Geocodeusing5differentgeocoders• Reversegeocodeusing3differenttechniques• Determineaccuracyofreversegeocoding
AddressPoints
Geocoders
• TeleAtlasAddressPoints(commercial)• TeleAtlasStreets(commercial)
• GoogleMaps(freeAPI)
• StreetMapUSAPro2007inArcGIS
• Geoly1cs2007(usingTIGER2007data)
StudyArea–TravisCounty,TX
Popula1onDensityZones Sampleofresiden1aladdresspoints
ReverseGeocoding&Accuracy
• Originalresiden1aladdresspoints– Snaptonearestresiden1albuilding
• TeleAtlasreversestreetgeocoding– Submitforcommercialprocessing
• GoogleMapsreversegeocoding– FreeAPI
• Accuracyofreversematches1. Perfectmatch(streetnameandnumber)
2. Closematch(numberwithin10)
3. Samestreetonly
Results–MatchRates(%)
GeocodingTechnique Popula1ondensity(people/km2)
<50 50to250 250to1000 1000to2500 >2500 Total
TeleAtlasAP 43.6 70.2 91.8 92.8 93.6 78.4
TeleAtlasStreet 93.2 92.8 98.0 99.8 96.8 96.1
GoogleMaps 92.2 95.6 98.2 99.0 96.2 96.2
StreetMapPro 81.2 83.6 95.8 99.0 95.8 91.1
Geoly1cs 77.0 80.2 92.4 96.2 90.2 87.2
Combined 31.4 59.0 86.0 89.6 86.8 70.6
Results–SameStreetMatches
ReverseGeocodingTechnique
Aus1nAP GoogleMaps TeleAtlasStreet
GeocodingTechnique
Aus1nAP 100.0 96.7 90.1
TeleAtlasAP 99.5 92.0 89.6
GoogleMaps 99.2 32.0 90.5
TeleAtlasStreet 88.2 92.5 99.5
StreetMapPro 89.1 76.8 82.5
Geoly1cs 54.5 54.7 56.9
Results–CloseReverseMatches
ReverseGeocodingTechnique
Aus1nAP GoogleMaps TeleAtlasStreet
GeocodingTechnique
Aus1nAP 100.0 95.5 23.1
TeleAtlasAP 99.1 91.8 24.3
GoogleMaps 98.8 28.6 24.3
TeleAtlasStreet 69.8 63.2 92.5
StreetMapPro 60.8 42.9 44.0
Geoly1cs 33.8 27.4 29.3
Results–PerfectReverseMatches
ReverseGeocodingTechnique
Aus1nAP GoogleMaps TeleAtlasStreet
GeocodingTechnique
Aus1nAP 100.0 94.2 9.1
TeleAtlasAP 97.9 91.8 9.0
GoogleMaps 97.2 27.8 9.1
TeleAtlasStreet 18.7 9.4 55.0
StreetMapPro 17.0 7.3 8.2
Geoly1cs 7.1 2.3 2.6
EffectofPopula1onDensity–PercentPerfectMatches
Geocoding Reverse Popula1ondensity(people/km2)
<50 50to250 250to1000 1000to2500 >2500
StreetMapPro Aus1nAP 22.9 16.9 15.3 16.1 17.3
Geoly1cs Aus1nAP 14.0 7.1 7.0 5.4 6.5
Aus1nAP TeleAtlasStreet 2.5 6.8 11.6 11.4 8.3
TeleAtlasStreet TeleAtlasStreet 51.6 63.7 56.7 53.8 49.8
GoogleMaps TeleAtlasStreet 4.5 6.4 12.3 11.2 7.4
TeleAtlasStreet GoogleMaps 9.6 9.8 8.8 9.6 9.4
Geoly1cs GoogleMaps 1.9 2.0 2.6 2.5 2.3
ResultsSummary
• Accuracyofreversegeocodingvariesgreatly• Buildinglevel(reverse)geocodingistypicallymostaccurate• Streetgeocodingisquitenoisy
– Easytogettherightstreet– Veryfewperfectmatches
• Accuracyissubstan1allyimprovedifknowledgeoftheoriginalgeocodingtechniqueisavailable!
• NoclearpaRernwithpopula1ondensity
RoopopGeocodinginGoogleMapsandVirtualEarth
Commercial Address Points - TeleAtlas
Source: TeleAtlas, 2009
Licensed to Google, Virtual Earth, ArcGIS Business Analyst, Pitney Bowes / Group 1 / MapInfo
51 million address points in the US
ReverseGeocoding
ReversegeocodingnowsupportedinGoogleMapsandMicrosopVirtualEarth
AlsosupportedinlatestversionofArcGIS–requiressomecustomiza1onorArcWebservices
Numerousfreeeasy‐to‐useonlineu1li1es
Conclusions
• Accuracyofreversegeocoding– Variesgreatlywithgeocoder/reversegeocodercombina1on
– Between2%and98%perfectreversematches
• Knowledgeoforiginalgeocodingmethodiscri1cal– noisyresultsfromstreetgeocodingcanbereversecoded
• Trends:– Addresspointsarethenewstandardingeocoding– Reversegeocodingisrela1velyeasy
• Techniquestoprotectprivacymayneedtoassumeaworst‐casescenario:veryhighresolu1onaddressdata
FutureResearch
• Replicateinotherstudyareas• Examineurban/ruralgradientsmoreclosely• Experimentwithdifferentmaskingtechniques• Developaframeworkforspa1alκ‐anonymity
Acknowledgements
• Na1onalScienceFounda1on• UNMResearchAlloca1onCommiRee• AmericanCivilLiber1esUnion