A Computational Movement Analysis Framework For Exploring Anonymity In Human Mobility Trajectories
Jennifer A. Miller
2015 UT CID Report #1512
This UT CID research was supported in part by the following organizations:
identity.utexas.edu
Background
Advancements in tracking technologies such as global positioning systems (GPS), radio frequency identification (RFID), cellular phone networks, and WiFi hotspots have resulted in significant increases in the availability of highly accurate data on moving objects, with unprecedented high spatial and temporal resolution. Within geographic information science (GIScience), ‘computational movement analysis’ (CMA) has recently emerged as a subfield that focuses on the development and application of computational techniques for collecting, managing, and analyzing movement data in order to better understand the processes that are associated with them (Gudmundsson et al. 2012). As these technologies facilitate the collection of near-seamless (in some cases sub-second) movement tracks, the ‘spatiotemporal footprint’ of an individual’s movement can be explored using CMA techniques.
These location data are often studied as ‘trajectories’, composed of a series of time-stamped sequential locations. Depending upon the collection method, the location information can be represented by precise latitude and longitude coordinates (e.g., GPS data from a smartphone or other device) or the unique catchment area of a single cellular tower (e.g., call detail records from cellular phones). These relatively low cost location data are used to explore human mobility patterns related to, for example, urban planning, transportation infrastructure, disaster planning/evacuation strategies, potential disease spread, and many other applications (Becker et al. 2013).
The ability to study human mobility and issues related to interaction with the environment or other individuals, and the behaviors these interactions suggest, has been greatly enhanced by technological advancements that facilitate the collection of high quality location data at unprecedented spatial and temporal resolutions. However, as often happens with technological advancements, the collection of these data has preceded extensive study on how and what they can (or should) be used for, as well as the privacy implications associated with distributing information on an individual’s location. The research presented here explores issues related to privacy and identity associated with more recently available high resolution GPS location data. The analysis focuses on using methods from movement pattern analysis and spatial statistical methods to address the following issues:
• Can activity “hot spots” be identified from movement data and how can their spatiotemporal structure be explored?
• How “unique” are anonymized movement trajectories? How is their uniqueness affected by spatial and temporal resolution? Can movement characteristics such as speed be used to uniquely characterize trajectories?
Using movement pattern analysis to identify potential activity “hot spots” from GPS trajectory data: a case study using taxicab data in San Francisco.
An important application using location data involves exploring the spatiotemporal pattern of activity they represent. Previous examples focused predominantly on call detail records (CDR) that were aggregated to their nearest cellular tower (see Gao 2015 for review). Spatial autocorrelation analysis was used to identify “source” areas with more outgoing calls and “sink” areas, where more incoming calls occurred. More recently, GPS data have been used to explore movement activity of taxis in Shanghai (Deng and Ji 2011), taxis in New York City (Qian et al. 2015), and cement trucks in Athens (Orellana et al. 2010). While there are certain predictable spatial patterns of taxicab location and movement related to city structure (e.g., greater activity in the central business district) or time of day (e.g., towards and away from the CBD in morning and evening, respectively), there are also stochastic elements associated with other factors that can often be related to ephemeral activities and passenger behaviors.
I hypothesized that the spatiotemporal structure of the collective movement of the taxicabs could be used to infer points-of-interest (POI) or activity “hot spots”, and that some hot spots would emerge or disappear depending on the time of day.
Research questions: How can movement pattern analysis and spatial statistics be used to identify collective points of interest from GPS location data? How can the spatiotemporal structure of these movement activities be explicitly analyzed and visualized?
Data
San Francisco Cab Dataset (http://crawdad.org/epfl/mobility/20090224/). I used 40 cabs and extracted data for one weekday (Wednesday June 4, 2008) to examine how movement analysis and spatial statistics can be used to explore potential points of interest (POI). The temporal resolution was approximately 1 minute. GPS locations for each of the 40 cabs were partitioned into one of three temporal bins: morning (7-10 am, n=4634), afternoon (4-7 pm, n=6009), and evening (9 pm-12 midnight, n=6087).
Methods
Two different methods were used to explore hot spot activities: the first method involved aggregating the taxi locations to a 250 meter x 250 meter square (size was selected because it is greater than 1 city block but less than 2 blocks) for a subset of downtown San Francisco (peak activity). The number of taxi locations for each square and for each of the three time periods (morning, afternoon, evening) was counted and analyzed using global and local Moran’s I.
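For reference, global Moran’s I is conventionally written as below. This is the standard textbook form, given here under the assumption that it matches the notation of this report; n (the number of cells) and x̄ (the mean count) are symbols introduced for the illustration.

```latex
I = \frac{n}{\sum_{i}\sum_{j} w_{ij}} \,
    \frac{\sum_{i}\sum_{j} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}
         {\sum_{i}(x_i - \bar{x})^{2}}
```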
In Moran’s I, x is the count of taxicabs in a cell and w_ij is the spatial weights matrix used to represent what is “near”. I used both 1st and 2nd order (row-standardized) contiguity for the spatial weights matrix here. Moran’s I ranges from -1, indicating extreme negative spatial autocorrelation, to +1, indicating extreme positive spatial autocorrelation, with values near 0 indicating no autocorrelation.
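A minimal sketch of how such a statistic could be computed, assuming NumPy, a small regular grid of counts, and first-order rook contiguity (an assumption: the report does not say whether rook or queen contiguity was used). The function names and the toy grid are hypothetical, and the permutation test mirrors the kind of Monte Carlo test reported in the results rather than reproducing the author's software.

```python
import numpy as np

def rook_weights(n_rows, n_cols):
    """Row-standardized first-order (rook) contiguity weights for a grid.

    Assumes the grid has at least two cells, so every cell has a neighbor.
    """
    n = n_rows * n_cols
    w = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    w[i, rr * n_cols + cc] = 1.0
    return w / w.sum(axis=1, keepdims=True)

def morans_i(x, w):
    """Global Moran's I for values x and spatial weights matrix w."""
    z = np.asarray(x, dtype=float) - np.mean(x)
    return (len(z) / w.sum()) * (z @ w @ z) / (z @ z)

def permutation_test(x, w, n_perm=499, seed=0):
    """Observed I and a one-sided pseudo p-value from random permutations."""
    rng = np.random.default_rng(seed)
    observed = morans_i(x, w)
    sims = np.array([morans_i(rng.permutation(x), w) for _ in range(n_perm)])
    p_value = (np.sum(sims >= observed) + 1) / (n_perm + 1)
    return observed, p_value

# Hypothetical 4 x 4 grid of counts: busy left half, quiet right half.
counts = np.array([[10, 10, 0, 0]] * 4).ravel()
w = rook_weights(4, 4)
observed_i, p_value = permutation_test(counts, w)
```

With row-standardized weights the sum of all weights equals the number of cells, so the leading ratio in the formula is 1 and a perfect checkerboard of counts yields I = -1.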
Anselin (1995) introduced a local statistic that decomposes the global Moran’s I into a local measure (LISA, local indicators of spatial association), for which a value is calculated for each observation. A single statistic is no longer reported with LISA, but the values can be mapped and the spatial distribution of spatial autocorrelation can be explored.
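The local statistic is commonly written in the following form. This is the standard expression after Anselin (1995); treating it as the equation the text later calls equation 2.0 is an assumption, as are the symbols n, x̄ and m_2.

```latex
I_i = \frac{(x_i - \bar{x})}{m_2} \sum_{j} w_{ij}\,(x_j - \bar{x}),
\qquad
m_2 = \frac{1}{n}\sum_{i}(x_i - \bar{x})^{2}
```

When the weights are row-standardized, the mean of the local values I_i equals the global Moran’s I, which is the sense in which the local measure decomposes the global one.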
Figures 2-4 show the local spatial autocorrelation of the taxicabs for the morning (fig. 2), afternoon (fig. 3), and evening (fig. 4) time periods, along with the raw counts. There is a core of relatively high counts in the upper right of the study area that is maintained for all time periods, but the magnitude of this core is different for each time period, ranging from a small cluster of high-high values in the morning (fig. 2b) to the largest cluster for the afternoon (fig. 3b). While the global Moran’s I was positive and statistically significant for all time periods (using Monte Carlo permutations, n=499), indicating that the overall pattern was that near values were similar to each other, there were outliers for each time period. There were 7 high-low cells in the morning (fig. 2b), that is, cells that had a high count surrounded by neighbors with low counts, which could indicate an isolated area of high activity. Additionally, a single statistically significant positive value for (global) Moran’s I indicates overall positive spatial autocorrelation, but cannot differentiate between clusters of high values and clusters of low values. Mapping the local I_i values illustrates that, in addition to the core of high taxi activity, there is a core of low taxi activity in the bottom left for all time periods, as well as pockets of negative spatial autocorrelation (high-low and low-high).
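The four cluster and outlier labels discussed above can be derived from the sign of each cell's deviation from the mean and the sign of its neighbors' average deviation. The sketch below assumes NumPy and a hand-written, row-standardized weights matrix for a hypothetical strip of five cells; the function names are illustrative, and the significance testing that decides which cells are actually mapped is omitted.

```python
import numpy as np

def local_morans_i(x, w):
    """Local Moran's I_i plus the deviations z and their spatial lag."""
    z = np.asarray(x, dtype=float) - np.mean(x)
    m2 = (z @ z) / len(z)
    lag = w @ z
    return z * lag / m2, z, lag

def quadrant_labels(z, lag):
    """Label each cell as (own value)-(neighbors' average) relative to the mean."""
    return np.where(
        z >= 0,
        np.where(lag >= 0, "high-high", "high-low"),
        np.where(lag >= 0, "low-high", "low-low"),
    )

# Hypothetical strip of five cells; each cell neighbors the adjacent cells.
counts = [10, 10, 0, 0, 10]
w = np.array([
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0, 0.0],
])
local_i, z, lag = local_morans_i(counts, w)
labels = quadrant_labels(z, lag).tolist()
```

In this toy strip the last cell has a high count next to a low-count neighbor, so it is labelled high-low, the same pattern as the isolated high-activity cells described for the morning period.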
The local statistic LISA can be further extended to measure cross-correlation between the value of a variable for a target cell compared to the lagged value of a different variable for its neighbors (in equation 2.0 above, the x variables on the right side of the equation would represent a different variable). Bivariate LISA statistics are particularly useful for studying the change in a variable across time periods (Anselin et al. 2007). Figure 5a shows the bivariate LISA for morning counts as the target compared to the neighboring cells’ counts for the afternoon. High-high and low-low cells would be interpreted as areas of high or low activity, respectively, across both time periods, while a low-high would identify a cell that had low activity in the morning compared to high activity among its neighbors in the afternoon. Conversely, a high-low would indicate a cell that had high morning activity compared to the low afternoon activity of its neighbors. The low-high cells fringing the CBD show that the area of activity increases from morning to afternoon.
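As an illustration of the bivariate case, the sketch below pairs each cell's standardized morning count with the average standardized afternoon count of its neighbors. It assumes NumPy, a hypothetical strip of four cells with hand-written row-standardized weights, and made-up counts; the function name is illustrative and no significance test is included.

```python
import numpy as np

def bivariate_local_moran(x, y, w):
    """Standardized x at each cell times the spatial lag of standardized y."""
    zx = (np.asarray(x, dtype=float) - np.mean(x)) / np.std(x)
    zy = (np.asarray(y, dtype=float) - np.mean(y)) / np.std(y)
    lag_y = w @ zy
    labels = np.where(
        zx >= 0,
        np.where(lag_y >= 0, "high-high", "high-low"),
        np.where(lag_y >= 0, "low-high", "low-low"),
    )
    return zx * lag_y, labels

# Hypothetical strip of four cells with row-standardized neighbor weights.
w = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
])
morning = [1, 1, 8, 9]
afternoon = [2, 9, 9, 9]
stat, labels = bivariate_local_moran(morning, afternoon, w)
```

The first cell is quiet in the morning while its only neighbor is busy in the afternoon, so it is labelled low-high, the pattern the text associates with the fringe of the CBD.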
Figure 6a shows the afternoon-evening pattern, where a hot spot emerges in the southeastern part of the study area near a major freeway (Bayshore). Figure 7a compares morning counts to evening, and this area is a large hot spot, confirming that it is an area of high activity in the morning and evening, but relatively low activity in the afternoon. A high-low cell here represents an area of high activity in the morning that is less active in the evening, and low-high cells are those for which activity is higher in the evening compared to morning.
A more recent extension to bivariate LISA is the directional Moran scatterplot, which allows for better visualization of the dynamics between changing spatial patterns across time periods (Rey 2014). The directional LISA shows the movement of the statistics across two time periods, and therefore incorporates information from two different Moran scatterplots. For example, figure 5b shows the change in LISA statistic for each cell from morning to afternoon: each vector ‘starts’ in its position for morning (from figure 2b) and ‘ends’ in its position for afternoon (from figure 3b). The small arrows in the top left low-high quadrant represent cells that were low activity surrounded by high activity in both morning and afternoon. Figure 6b shows that there was much more variation in LISA statistics in afternoon and evening. The vector that is highlighted with a yellow star represents a cell that was a ‘cold spot’, or area of low activity in the afternoon, but became a hot spot in the evening.
In addition to measuring the spatial pattern of aggregated counts to identify likely activity hot spots, a more novel method involves measuring the spatial autocorrelation of movement parameters, specifically speed. Orellana and Wachowicz (2011) used LISA to analyze pedestrian movement in order to uncover “movement suspension” (low-low clusters) that they suggested would indicate points of interest or activity hot spots. After testing different nearest neighbor spatial weights matrices, one that considers only the 10 closest neighbors to be “near” was used to measure spatial autocorrelation for all points within each of the three time periods. As the variable of interest is now speed, a low-low could be used to suggest an area of interest or a hot spot, while high-high points would most likely be associated with freeways. Low-highs and high-lows would likely be difficult to disentangle from variable traffic patterns (e.g., “stop and go” traffic). Figure 8a shows the pattern for the morning period (fig. 8b is zoomed in to downtown San Francisco). Figures 9a and 9b are for the afternoon period and figures 10a and 10b are for the evening. The spatial distribution of these ‘suspended movement’ areas represented by low-lows varies with the time of day. In order to more easily visualize the differences in the spatial distribution of low-lows, the kernel density of just the low-low points was calculated and a ‘home range’, or area where most of them occurred, was extracted. While the afternoon and evening low-low clusters are in similar places, the morning low-low clusters extend farther downtown. Also, the afternoon and evening clusters at the bottom right represent the San Francisco Airport, where there was less slow speed among taxicabs in the morning period.
Examining the ‘uniqueness’ of human mobility trajectories: a case study using smartphone data from Beijing
Aggregating counts of GPS locations to a larger area approximates the spatial resolution available when these studies were done with cell phone towers and call counts were aggregated to the Voronoi polygon drawn around each cellular tower. However, there are important privacy issues associated with dealing with actual GPS locations that are often overlooked. Location data are often released after they have been ‘anonymized’, which means that the trajectory has been stripped of any obvious identifying information such as name, address, phone number, etc. However, personal points of interest (home, work) can still be identified by mining trajectory data for movement patterns, and these points of interest are often associated with unique individuals. Additional locations may be resolved that could have negative implications (e.g., repeated visits to a medical clinic may be a cause for concern for employers).
Due to data availability, most of the previous work on “unicity”, or measuring the uniqueness of movement traces or trajectories, has been with much coarser scaled cell phone data. Surprisingly, even relatively coarse spatial resolution location data such as that associated with call detail records (CDR), where ‘location’ is an area defined by its proximity to a specific cell phone tower, can be used to uniquely identify an individual. Locations of cell phone towers or antennae are based on population density and the area associated with each one varies considerably. In their study in a small European country, de Montjoye et al. (2013) found that the reception or catchment area for an antenna ranged from 0.15 km² in urban areas to 15 km² in rural areas.
Zang and Bolot (2011) used anonymized CDR from 25 million individuals across the U.S. to determine the “top N” locations at which calls were recorded for each of three months. They found that when N=2 (typically corresponding to work and home), up to 35% of the individuals could be uniquely identified. When N=3 (they suggested the 3rd location typically represented a school or shopping related location), 50% could be uniquely identified.
In their seminal study, de Montjoye et al. (2013) used fifteen months of anonymized mobile phone data (CDR) for 1.5 million individuals in a western European country and found that four randomly selected spatiotemporal points were sufficient to uniquely identify 95% of the individuals. Perhaps more troubling, they found that over 50% of individuals were uniquely identifiable from just two randomly selected locations (typically also corresponding to home and work). Song et al. (2014) found similar results with a dataset of one week of mobility data for 1.14 million people (total 56 million records): with just two random points, 60% of the trajectories were unique.
It is important to note that uniqueness does not equate to re-identifiability and the objectives of these studies were to examine how unique individual trajectories were, not to actually deanonymize them or re-attach an individual’s information to a unique trajectory. However, the ability to determine uniqueness of trajectories is an important prerequisite for re-identification (which would involve correlation with an ancillary dataset) and therefore represents a potential threat to individual privacy.
The degree of uniqueness of trajectories can vary as a function of factors such as typical commuting patterns, transportation modes, and geographical region (which affects commuting patterns and transportation modes). There have been several methods proposed to quantify the ‘anonymity’ of a database. The most commonly used method, k-anonymity, was introduced by Sweeney (2002) as a measure to increase anonymity for non-spatial databases. When applied to spatial databases, it ensures that any set of records (locations) for an individual is the same as that of at least k-1 other individuals. Generally, k=2, ensuring that at least two trajectories are equivalent, but as k increases, so too does the anonymity. Extensions of k-anonymity include l-diversity and t-closeness (Li et al. 2007). These measures are generally used to manage trajectory datasets (i.e., data would be manipulated so that the level of anonymity reached the reported k level), but in order to quantify the actual level of anonymity of trajectory datasets, a rigorous analysis comparing random points from each trajectory to all other trajectories has to be conducted. With trajectory datasets now available at one second intervals, the volume of these data can result in computationally intensive analysis.
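To make the k-anonymity idea concrete, the sketch below reports the largest k for which a set of records is k-anonymous, namely the size of the smallest group of identical records. It is a simplified illustration: the record format (a tuple of coarsened cell labels per individual) and the function name are assumptions, not part of Sweeney's (2002) formulation or of this study's method.

```python
from collections import Counter

def k_anonymity(records):
    """Largest k for which the records are k-anonymous.

    Each record is a hashable summary of one individual's locations, for
    example a tuple of coarsened grid cells. The result is the size of the
    smallest group of identical records. Assumes at least one record.
    """
    return min(Counter(records).values())

# Hypothetical coarsened trajectories for five individuals.
records = [
    ("cell_12", "cell_40"),
    ("cell_12", "cell_40"),
    ("cell_07", "cell_33"),
    ("cell_07", "cell_33"),
    ("cell_07", "cell_33"),
]
k = k_anonymity(records)
```

Here every record is shared by at least two individuals, so the toy dataset is 2-anonymous; adding a single individual with a record no one else shares would drop k to 1.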
De Montjoye et al. (2013) measured ‘unicity’ as the percentage of 2500 random traces that were unique given p random points (p ranged from 2 to 5). Song et al. (2014) defined uniqueness of trajectories as the percentage of all available trajectories that were uniquely associated with p random points, which they varied from 2 to 6. While anonymity (or lack thereof) has been studied with CDR data, as the previous examples show, it has not yet been addressed with the finer spatiotemporal resolution available as GPS locations from, e.g., smartphones. These datasets could potentially be far more unique and therefore more difficult to anonymize.
In this study, I conduct an extensive analysis of the unicity of GPS movement trajectories, testing the effect of spatial resolution and temporal resolution. In addition to location, I also explore how effective movement parameters such as speed could be for uniquely identifying a trajectory. This is one of the first studies to measure unicity of trajectories composed of GPS locations.
I hypothesized that the unicity of trajectories will be greater for GPS locations than the coarser scaled CDR locations. I also hypothesized that the attenuating effects of coarsening spatial and temporal resolution will have less impact than they would with CDR locations. I expect that speed will also be effective for uniquely identifying a trajectory when several data points are used.
Research questions: How can anonymity be quantified for different types of trajectories and how is it affected by spatial and temporal resolution? Can movement characteristics such as speed be used to uniquely characterize a trajectory in the absence of actual locations?
Data
Microsoft GeoLife Trajectories (http://research.microsoft.com/en-us/downloads/b16d359dd164-469e-9fd4-daa38f2b2e13/). This is an extremely dense dataset, with temporal resolution of ~1-5 seconds and spatial resolution of ~5-10 meters. We used only one year of data (January 2009-December 2009) and used a spatial mask of Beijing (39.6° to 40.2° N latitude, 116° to 116.8° E longitude) to remove users who traveled outside of the city during this time period. This resulted in 71 users who had a total of 7,243 daily trajectories (number of locations visited within trajectories varied but the mean was 1600).
Methods
The basis of our unicity test involved extracting 500 sets of points of size n from each user and counting how many trajectories they are found in. The percentage of the 500 sets of points that matched only one trajectory was calculated, and this was done for each of the 71 users for four different set sizes (n=2, 3, 4, and 5). Our measure of unicity, u, is the percentage of 500 random point sets of size n that are contained in only one trajectory, averaged across all 71 users. A unicity value close to 100 indicates a highly unique trajectory that could theoretically be deanonymized, or re-connected with identifying user information, more easily; a low unicity value suggests that the random set of points is contained in several different trajectories, which would make de-anonymizing trajectories far more challenging. The amount of information from each point was varied: we used just location (x and y), location and time (x, y, and t), and the absolute angle (the absolute angle for point i is measured between the x direction and the step built by relocations i and i+1).
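The unicity test described above can be sketched roughly as follows. This is a simplified illustration rather than the author's implementation: it assumes one trajectory per identifier (the study drew point sets per user across many daily trajectories), exact matching of points, and hypothetical names throughout.

```python
import random

def unicity(trajectories, n_points, n_sets=500, seed=0):
    """Mean percentage of random point sets found in exactly one trajectory.

    trajectories maps an identifier to a list of hashable, sortable points,
    such as (x, y) or (x, y, t) tuples. Trajectories with fewer than
    n_points distinct points are skipped; at least one must be long enough.
    """
    rng = random.Random(seed)
    point_sets = {tid: set(points) for tid, points in trajectories.items()}
    percentages = []
    for points in point_sets.values():
        if len(points) < n_points:
            continue
        pool = sorted(points)  # fixed order keeps the sampling reproducible
        unique = 0
        for _ in range(n_sets):
            sample = rng.sample(pool, n_points)
            # Count every trajectory (including the source) holding all points.
            matches = sum(1 for other in point_sets.values()
                          if other.issuperset(sample))
            unique += matches == 1
        percentages.append(100.0 * unique / n_sets)
    return sum(percentages) / len(percentages)

# Hypothetical toy data: two trajectories sharing two of their three points.
toy = {
    "a": [(0, 0), (1, 1), (2, 2)],
    "b": [(0, 0), (1, 1), (3, 3)],
}
u = unicity(toy, n_points=2)
```

In the toy data, two of the three possible point pairs from each trajectory occur nowhere else, so the estimate should fall near two thirds; fully disjoint trajectories give 100 and identical trajectories give 0.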
The original latitude and longitude coordinates for these locations have a spatial precision of six decimal places (~0.1 m). In order to test how spatial and temporal resolution affected measurement of unicity, the geographic coordinates were coarsened first to four decimal places (~10 m) and the temporal resolution was coarsened to 30 seconds, then further coarsened to three decimal places (~100 m) and 60 seconds. Additionally, the precision of the absolute angle measure was decreased from the original (five decimal places) to three decimal places.
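A minimal sketch of this kind of coarsening, under stated assumptions: coordinates are rounded (the report does not say whether values were rounded or truncated), timestamps are integer seconds floored to the bin start, and angles are in radians computed on planar x/y coordinates. The function names and the sample fix are hypothetical.

```python
import math

def coarsen(lat, lon, t, decimals=4, time_step=30):
    """Round coordinates to `decimals` places and floor t (in seconds)
    to a multiple of `time_step`."""
    return (round(lat, decimals), round(lon, decimals),
            (t // time_step) * time_step)

def absolute_angles(points, decimals=None):
    """Angle in radians between the x direction and each step i -> i+1."""
    angles = [math.atan2(y2 - y1, x2 - x1)
              for (x1, y1), (x2, y2) in zip(points, points[1:])]
    if decimals is not None:
        angles = [round(a, decimals) for a in angles]
    return angles

# Hypothetical fix inside the Beijing mask; t is in seconds.
fine = (39.984702, 116.318417, 1224730384)
first_level = coarsen(*fine, decimals=4, time_step=30)
second_level = coarsen(*fine, decimals=3, time_step=60)

# One step due east, then one step due north, on planar coordinates.
angles = absolute_angles([(0, 0), (1, 0), (1, 1)], decimals=3)
```

Coarsened tuples like these can be compared directly for equality, which is why reducing precision makes points from different trajectories more likely to coincide.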
Figure 12 displays the trajectories for two different users for a single day. The zoomed in subset helps to illustrate the importance of location precision: there are several locations that would be the same for both users if the location precision was coarsened ten- or 100-fold. Figure 13 shows three different users’ daily trajectories in a space-time cube. The red and green users overlap in time, but not space, while the red and blue users overlap in space but not time. The use of all three pieces of coordinate information (x, y, and t) can be extremely important for uniquely associating a single trajectory.
Results
Table 1 shows the unicity values associated with the size of each random point set. The mean was the average unicity across all 71 users, while the minimum and maximum show the variation in unicity among users. In general, the locations of points on a trajectory were highly unique. 90% of the random sets of just two points composed of only location (no timestamp) were associated with only one trajectory. Adding the timestamp increased the unicity of two points to 97%. When five points with location and timestamp were used, the unicity increased to almost 99%. Somewhat surprisingly, the angle of movement alone has fairly high unicity: when the angles of three points are tested, the unicity is similar to the unicity of location for CDR as found in de Montjoye et al. (2013) and Song et al. (2014). Five angle values could uniquely identify a trajectory 73% of the time.
Table 1: unicity results for location (x, y), location and time (x, y, t) and the absolute angle of a point. Means and ranges are reported for 500 sets of random points for each of 71 users.
Unicity values for coarsened location, time, and absolute angle are shown in table 2. When just two points (no timestamp) are used at the coarser resolution (spatial precision reduced tenfold to ~10 m), the points are associated with a unique trajectory only 68.5% of the time. When the less precise (30 second) timestamp is added, the unicity is similar to the original resolution, and with five points with location and timestamp, unicity increases to ~94%. The unicity of the absolute angle degrades substantially: even using a set of five points results in less than 5% unicity.
Table 2: unicity results for location (x, y), location and time (x, y, t) and the absolute angle of a point. Spatial resolution has been coarsened to 4 decimal places; temporal resolution has been coarsened to 30 seconds; and absolute angle coarsened to three decimal places. Means and ranges are reported for 500 sets of random points for each of 71 users.
Table 3 shows unicity values for the coarsest location coordinates: the spatial resolution of an x/y pair is now ~100 m and the temporal resolution was coarsened to one minute. The spatial resolution here is closer to the resolution of the antenna reception areas used in the de Montjoye et al. (2013) paper (where spatial resolution ranged from 115 m to 15 km), but the coarsened temporal resolution is still much more precise than the one used in the CDR studies. As a result, using location and time for just two points still results in a high unicity (mean 80.3%), while five points increases the mean unicity to almost 88%. Using just location (no timestamp), the unicity degrades to 32% for two points and 66% for five points.
Table 3: unicity results for location (x, y), location and time (x, y, t). Spatial resolution has been coarsened to 3 decimal places; temporal resolution has been coarsened to 60 seconds. Means and ranges are reported for 500 sets of random points for each of 71 users.
The mean unicity values for location and location+time, with different levels of coarsening, are summarized in figure 14. With the much higher precision and spatial resolution of GPS data currently available, two x/y locations are sufficient to be uniquely associated with a single trajectory 90% of the time; adding the timestamp matches a single trajectory 97% of the time. The three pieces of information (x, y, time) are so specific that increasing the number of points to match to five increases the unicity very little because it is already so high using just two points. The first level of coarsening for x, y, t (~10 m spatial, 30 seconds temporal) has similar unicity to the original resolution for just x, y coordinates, and when four or five points are used, the coarsened x, y, t has slightly higher mean unicity. The most coarsened level for x, y, t (~100 m, 60 seconds) still has a high unicity (80% for two points). The x, y coordinates (no timestamp) show the greatest increase in unicity when more points are used for matching. This suggests that there is a trade-off between location resolution and amount of information (location points) available.
Discussion/Future Work
While each of the issues addressed here focuses on a single dataset for the case study, I would expect the results to be generally applicable to other similar mobility datasets. The hot spot analysis illustrated that local spatial statistics can be used to identify “hot spots” of movement activity, and spatial statistics visualization tools are useful for exploring how these hot spots change through time. The spatial statistics used here were all extensions of the Moran’s I index, which requires a variable of interest that is measured on a ratio or interval scale, and locations of points do not meet this condition. Therefore, the points were aggregated to polygons and the counts were used as the variable of interest. In the second part of this study, speed (m/sec) was calculated for each point (based on the distance from and time since the previous location) and used as the variable of interest. Spatial autocorrelation of the speed associated with each point was classified into high-high (likely associated with highways), low-high, high-low, and low-low, the last of which was used here to indicate potential areas of interest.
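The per-point speed described above (distance from, and time since, the previous location) could be computed along the following lines. The haversine great-circle distance on a spherical Earth is an assumption, since the report does not state which distance measure was used, and the function names and sample fixes are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def speeds_m_per_s(track):
    """Speed at each fix after the first, from the distance and time since
    the previous fix. track is a time-ordered list of (t_seconds, lat, lon).
    Fixes with no elapsed time get None."""
    result = []
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(track, track[1:]):
        dt = t1 - t0
        if dt > 0:
            result.append(haversine_m(lat0, lon0, lat1, lon1) / dt)
        else:
            result.append(None)
    return result

# Hypothetical one-minute fixes for a single cab: stationary, then moving north.
track = [
    (0, 37.7749, -122.4194),
    (60, 37.7749, -122.4194),
    (120, 37.7839, -122.4194),
]
speeds = speeds_m_per_s(track)
```

The first interval yields a speed of zero (a candidate ‘suspended movement’ point), while the second covers roughly one kilometer in a minute.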
A better understanding of the spatiotemporal structure of human mobility could also increase the predictability of movement. For example, a pattern of high activity or relatively slow movement in certain locations at certain times of the day could be used to infer future movement at the same locations.
In both of the above examples, location and relative speed were used as proxies for behavior, respectively. It is also important to note that these variables represented collective behavior, as all points for all 40 taxicabs were considered together. There are interesting future directions for this research, particularly comparing the utility of associating relative speed with points-of-interest for different types of moving entities. Automobiles, and taxicabs in particular, move differently (and sometimes slow down for different reasons) compared to pedestrians and wildlife, and even regular vehicles. It would be interesting to also test how useful other movement parameters such as relative and absolute angle and step length would be for identifying points-of-interest. This is only applicable for entities that can move more freely across space and are less confined to street networks or sidewalks.
The unicity study has particularly important implications for privacy and the increasing availability of ‘anonymized’ trajectory datasets. This is one of the first studies to explore unicity and anonymity with higher resolution GPS data and it should be troubling how unique a set of two location points can be. Decreasing the spatial and temporal resolution reduces the unicity, but five points with x, y coordinates at the coarsest resolution tested here were still uniquely associated with a single trajectory more than 60% of the time. Movement parameters such as speed, angle, and step length have not previously been tested as potential identifiers of trajectories, but the case study here focusing on absolute angle highlights their potential importance. Five absolute angle data points were uniquely associated with a single trajectory 72% of the time. This suggests that individual movement, irrespective of absolute geographic location, can be identifiable with a sufficient level of precision of angle measurements and data points. Future work should focus specifically on how movement parameters could be used singly or together to identify a trajectory.
It is also important to note here that the focus of this study was not to re-attach user information to trajectories; it was just to examine how unique trajectories were based on different factors. The privacy issues associated with higher quality GPS location data should be addressed with the assumption that if a trajectory can be uniquely described with 2-5 GPS points, the trajectory could eventually be de-anonymized.
References
Anselin L (1995) Local Indicators of Spatial Association–LISA. Geographical Analysis, 27(2), 93–115.
Anselin L, Sridharan S and Gholston S (2007) Using Exploratory Spatial Data Analysis to Leverage Social Indicator Databases: The Discovery of Interesting Patterns. Social Indicators Research, 82(2), 287–309.
Becker R, Cáceres R, Hanson K, et al. (2013) Human Mobility Characterization from Cellular Network Data. Commun. ACM, 56(1), 74–82.
de Montjoye Y-A, Hidalgo CA, Verleysen M, et al. (2013) Unique in the Crowd: The privacy bounds of human mobility. Scientific Reports, 3,
Deng Z and Ji M (2011) Spatiotemporal structure of taxi services in Shanghai: Using exploratory spatial data analysis. In: IEEE, pp. 1–5.
Gao S (2015) Spatio-Temporal Analytics for Exploring Human Mobility Patterns and Urban Dynamics in the Mobile Age. Spatial Cognition & Computation, 15(2), 86–114.
Gudmundsson J, Laube P and Wolle T (2012) Computational movement analysis. In: Kresse W and Danko DM (eds), Springer Handbook of Geographic Information, Springer Berlin Heidelberg, pp. 423–438.
Li N, Li T and Venkatasubramanian S (2007) t-Closeness: Privacy Beyond k-Anonymity and l-Diversity. In: IEEE 23rd International Conference on Data Engineering, 2007. ICDE 2007, pp. 106–115.
Orellana D and Wachowicz M (2011) Exploring Patterns of Movement Suspension in Pedestrian Mobility. Geographical Analysis, 43(3), 241–260.
Orellana DA, Wachowicz M, Knegt de HJ, et al. (2010) Uncovering patterns of suspension of movement.
Piorkowski, M., Sarafijanovic-Djukic, N., and Grossglauser, M. CRAWDAD dataset epfl/mobility (v. 2009-02-24), downloaded from http://crawdad.org/epfl/mobility/20090224, doi:10.15783/C7J010, Feb 2009.
Qian X, Zhan X and Ukkusuri SV (2015) Characterizing Urban Dynamics Using Large Scale Taxicab Data. Springer International Publishing.
Rey SJ (2014) Spatial Dynamics and Space-Time Data Analysis. Springer Berlin Heidelberg.
Sweeney L (2002) k-Anonymity: A Model for Protecting Privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(05), 557–570.
Zang H and Bolot J (2011) Anonymization of Location Data Does Not Work: A Large-scale Measurement Study. In: Proceedings of the 17th Annual International Conference on Mobile Computing and Networking, MobiCom ’11, New York, NY, USA: ACM, pp. 145–156, Available from: http://doi.acm.org.ezproxy.lib.utexas.edu/10.1145/2030613.2030630 (accessed 15 May 2015).
Zheng, Y., Lizhu Zhang, Xing Xie, Wei-Ying Ma. Mining interesting locations and travel sequences from GPS trajectories. In Proceedings of International conference on World Wide Web (WWW 2009), Madrid Spain. ACM Press: 791-800.
Zheng, Y., Quannan Li, Yukun Chen, Xing Xie, Wei-Ying Ma. Understanding Mobility Based on GPS Data. In Proceedings of ACM conference on Ubiquitous Computing (UbiComp 2008), Seoul, Korea. ACM Press: 312-321.
Zheng, Y., Xing Xie, Wei-Ying Ma, GeoLife: A Collaborative Social Networking Service among User, location and trajectory. Invited paper, in IEEE Data Engineering Bulletin. 33, 2, 2010, pp. 32-40.
Appendix:
© 2015 Proprietary, The University of Texas at Austin, All Rights Reserved.
For more information on Center for Identity research, resources and information, visit identity.utexas.edu.