
A Computational Movement Analysis Framework For Exploring Anonymity In Human Mobility Trajectories

Jennifer A. Miller

2015

UT CID Report #1512

This UT CID research was supported in part by the following organizations:  

identity.utexas.edu  


A COMPUTATIONAL MOVEMENT ANALYSIS FRAMEWORK FOR EXPLORING ANONYMITY IN HUMAN MOBILITY TRAJECTORIES

Background

Advancements in tracking technologies such as global positioning systems (GPS), radio frequency identification (RFID), cellular phone networks, and WiFi hotspots have resulted in significant increases in the availability of highly accurate data on moving objects, with unprecedentedly high spatial and temporal resolution. Within geographic information science (GIScience), ‘computational movement analysis’ (CMA) has recently emerged as a subfield that focuses on the development and application of computational techniques for collecting, managing, and analyzing movement data in order to better understand the processes that are associated with them (Gudmundsson et al. 2012). As these technologies facilitate the collection of near-seamless (in some cases sub-second) movement tracks, the ‘spatiotemporal footprint’ of an individual’s movement can be explored using CMA techniques.

These location data are often studied as ‘trajectories’, comprised of a series of time-stamped sequential locations. Depending upon the collection method, the location information can be represented by precise latitude and longitude coordinates (e.g., GPS data from a smartphone or other device) or the unique catchment area of a single cellular tower (e.g., call detail records from cellular phones). These relatively low cost location data are used to explore human mobility patterns related to, for example, urban planning, transportation infrastructure, disaster planning/evacuation strategies, potential disease spread, and many other applications (Becker et al. 2013).

The ability to study human mobility and issues related to interaction with the environment or other individuals, and the behaviors these interactions suggest, has been greatly enhanced by technological advancements that facilitate the collection of high quality location data at unprecedented spatial and temporal resolutions. However, as often happens with technological advancements, the collection of these data has preceded extensive study on how and what they can (or should) be used for, as well as the privacy implications associated with distributing information on an individual’s location. The research presented here explores issues related to privacy and identity associated with more recently available high resolution GPS location data. The analysis focuses on using methods from movement pattern analysis and spatial statistics to address the following issues:

• Can activity “hotspots” be identified from movement data and how can their spatiotemporal structure be explored?

• How “unique” are anonymized movement trajectories? How is their uniqueness affected by spatial and temporal resolution? Can movement characteristics such as speed be used to uniquely characterize trajectories?

Using movement pattern analysis to identify potential activity “hotspots” from GPS trajectory data: a case study using taxicab data in San Francisco.

An important application using location data involves exploring the spatiotemporal pattern of activity they represent. Previous examples focused predominantly on call detail records (CDR) that were aggregated to their nearest cellular tower (see Gao 2015 for review). Spatial autocorrelation analysis was used to identify “source” areas with more outgoing calls and “sink” areas, where more incoming calls occurred. More recently, GPS data have been used to explore movement activity of taxis in Shanghai (Deng and Ji 2011), taxis in New York City (Qian et al. 2015), and cement trucks in Athens (Orellana et al. 2010). While there are certain predictable spatial patterns of taxicab location and movement related to city structure (e.g., greater activity in the central business district, or CBD) or time of day (e.g., towards and away from the CBD in morning and evening, respectively), there are also stochastic elements associated with other factors that can often be related to ephemeral activities and passenger behaviors.

I hypothesized that the spatiotemporal structure of the collective movement of the taxicabs could be used to infer points-of-interest (POI) or activity “hotspots”, and that some hotspots would emerge or disappear depending on the time of day.

Research questions: How can movement pattern analysis and spatial statistics be used to identify collective points of interest from GPS location data? How can the spatiotemporal structure of these movement activities be explicitly analyzed and visualized?

Data

San Francisco Cab Dataset (http://crawdad.org/epfl/mobility/20090224/). I used 40 cabs and extracted data for one weekday (Wednesday June 4, 2008) to examine how movement analysis and spatial statistics can be used to explore potential points of interest (POI). The temporal resolution was approximately 1 minute. GPS locations for each of the 40 cabs were partitioned into one of three temporal bins: morning (7-10 am, n=4634), afternoon (4-7 pm, n=6009), and evening (9 pm-12 midnight, n=6087).
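The partitioning into temporal bins can be sketched as follows. This is a minimal illustration, not the author's code: the tuple layout `(cab_id, timestamp, lat, lon)`, the function names, the half-open hour intervals, and the assumption that timestamps have already been converted to local time are all hypothetical choices made for the example.

```python
from datetime import datetime
from typing import Optional

# Hour ranges [start, end) for the three windows named in the text.
# Half-open intervals and local-time timestamps are assumptions of this sketch.
TEMPORAL_BINS = {
    "morning": (7, 10),     # 7-10 am
    "afternoon": (16, 19),  # 4-7 pm
    "evening": (21, 24),    # 9 pm-12 midnight
}


def assign_bin(timestamp: datetime) -> Optional[str]:
    """Return the name of the bin containing the timestamp, or None."""
    for name, (start, end) in TEMPORAL_BINS.items():
        if start <= timestamp.hour < end:
            return name
    return None


def partition_fixes(fixes):
    """Group (cab_id, timestamp, lat, lon) tuples by temporal bin.

    Fixes that fall outside all three windows are dropped.
    """
    binned = {name: [] for name in TEMPORAL_BINS}
    for fix in fixes:
        name = assign_bin(fix[1])
        if name is not None:
            binned[name].append(fix)
    return binned
```

Fixes recorded between the windows (for example at noon) are simply discarded, which matches the description of three separate bins rather than a full partition of the day.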

Methods

Two different methods were used to explore hotspot activities: the first method involved aggregating the taxi locations to a 250 meter x 250 meter square (size was selected because it is greater than 1 city block but less than 2 blocks) for a subset of downtown San Francisco (peak activity). The number of taxi locations for each square and for each of the three time periods (morning, afternoon, evening) was counted and analyzed using global and local Moran’s I.
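The equation for global Moran’s I that this passage refers to is not present in this copy of the report. As a reference point, the statistic is conventionally written as below. The symbols n (the number of grid cells) and x̄ (the mean count over all cells) are assumed here, and the notation may differ from the author's original.

```latex
I = \frac{n}{\sum_{i}\sum_{j} w_{ij}} \cdot
    \frac{\sum_{i}\sum_{j} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}
         {\sum_{i} (x_i - \bar{x})^{2}}
```

With a row-standardized weights matrix each row of w sums to 1, so the first factor reduces to 1.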


In the global Moran’s I statistic, x is the count of taxicabs and wij is the spatial weights matrix used to represent what is “near”. I used both 1st and 2nd order (row-standardized) contiguity for the spatial weights matrix here. Moran’s I ranges from -1, indicating extreme negative spatial autocorrelation, to +1, indicating extreme positive spatial autocorrelation, with values near 0 indicating no autocorrelation.
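A small sketch may help make the aggregation and the global statistic concrete. It assumes point coordinates already projected to metres, uses edge-sharing ("rook") neighbors as the first-order contiguity definition, and computes a one-sided permutation pseudo p-value; these choices and all function names are assumptions of the example, not details confirmed by the report.

```python
import random

CELL_SIZE_M = 250.0  # grid cell edge length given in the text


def grid_counts(points, n_rows, n_cols, cell=CELL_SIZE_M):
    """Count projected (x, y) points, in metres, per (row, col) grid cell."""
    counts = {(r, c): 0 for r in range(n_rows) for c in range(n_cols)}
    for x, y in points:
        key = (int(y // cell), int(x // cell))
        if key in counts:  # points outside the grid are ignored
            counts[key] += 1
    return counts


def rook_neighbors(n_rows, n_cols):
    """First-order contiguity: cells that share an edge (an assumption)."""
    nbrs = {}
    for r in range(n_rows):
        for c in range(n_cols):
            around = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            nbrs[(r, c)] = [(rr, cc) for rr, cc in around
                            if 0 <= rr < n_rows and 0 <= cc < n_cols]
    return nbrs


def morans_i(values, nbrs):
    """Global Moran's I with row-standardized weights.

    Assumes every cell has at least one neighbor and the values are not
    all identical. Row standardization makes the n / sum(w) factor 1.
    """
    mean = sum(values.values()) / len(values)
    z = {k: v - mean for k, v in values.items()}
    cross = sum(z[i] * sum(z[j] for j in nbrs[i]) / len(nbrs[i]) for i in z)
    return cross / sum(d * d for d in z.values())


def permutation_p_value(values, nbrs, n_perm=499, seed=0):
    """Observed I and a one-sided pseudo p-value from random relabelling."""
    rng = random.Random(seed)
    observed = morans_i(values, nbrs)
    keys, shuffled = list(values), list(values.values())
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(dict(zip(keys, shuffled)), nbrs) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_perm + 1)
```

On a checkerboard of alternating counts this returns exactly -1, and on a grid whose left half is busy and right half is empty it returns a clearly positive value, which matches the interpretation of the range given above.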

Anselin (1995) introduced a local statistic that decomposed the global Moran’s I to a local measure (LISA, local indicator of spatial association) as:
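The local equation is also missing from this copy. One common way of writing the local Moran statistic, using the same x and w as the global statistic, is shown below. The scaling constant m2 and the symbols n and x̄ are assumptions of this reconstruction, and the form may not match the author's notation. It presumably corresponds to the expression the text later calls equation 2.0.

```latex
I_i = \frac{(x_i - \bar{x})}{m_2} \sum_{j} w_{ij}\,(x_j - \bar{x}),
\qquad
m_2 = \frac{1}{n}\sum_{i} (x_i - \bar{x})^{2}
```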

Here a value is calculated for each observation. A single statistic is no longer reported with LISA, but the values can be mapped and the spatial distribution of spatial autocorrelation can be explored.
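As an illustration of how each observation receives its own value and a quadrant label, the sketch below computes a local Moran statistic for a one-dimensional row of cells. The chain-of-cells neighborhood, the function names, and the convention that a spatial lag of exactly zero counts as "low" are assumptions made to keep the example short. Significance testing, which the mapped LISA clusters would normally rely on, is omitted.

```python
def chain_neighbors(n):
    """Neighbors of cell i in a single row of n cells (illustrative only)."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def local_moran(values, nbrs):
    """Return a list of (I_i, quadrant) pairs, one per observation.

    Uses row-standardized weights, so the spatial lag is the mean
    deviation of the neighbors. The quadrant label reads
    "<own value>-<neighbors' value>", e.g. "high-low".
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n
    results = []
    for i, zi in enumerate(z):
        lag = sum(z[j] for j in nbrs[i]) / len(nbrs[i])
        label = ("high" if zi > 0 else "low") + "-" + ("high" if lag > 0 else "low")
        results.append((zi * lag / m2, label))
    return results
```

For counts of `[0, 0, 12, 0, 0, 0]` the busy cell is labelled high-low and its two neighbors low-high, which is the kind of isolated high-activity outlier discussed in the next paragraph.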

Figures 2-4 show the local spatial autocorrelation of the taxicabs for the morning (fig. 2), afternoon (fig. 3), and evening (fig. 4) time periods, along with the raw counts. There is a core of relatively high counts in the upper right of the study area that is maintained for all time periods, but the magnitude of this core is different for each time period, ranging from a small cluster of high-high values in the morning (fig. 2b) to the largest cluster for the afternoon (fig. 3b). While the global Moran’s I was positive and statistically significant for all time periods (using Monte Carlo permutations, n=499), indicating that the overall pattern was that near values were similar to each other, there were outliers for each time period. There were 7 high-low cells in the morning (fig. 2b), that is, cells that had a high count surrounded by neighbors with low counts, which could indicate an isolated area of high activity. Additionally, a single statistically significant positive value for (global) Moran’s I indicates overall positive spatial autocorrelation, but cannot differentiate between clusters of high values and clusters of low values. Mapping the local Ii values illustrates that, in addition to the core of high taxi activity, there is a core of low taxi activity in the bottom left for all time periods, as well as pockets of negative spatial autocorrelation (high-low and low-high).


The local statistic LISA can be further extended to measure cross-correlation between the value of a variable for a target cell compared to the lagged value of a different variable for its neighbors (in equation 2.0 above, the x variables on the right side of the equation would represent a different variable). Bivariate LISA statistics are particularly useful for studying the change in a variable across time periods (Anselin et al. 2007). Figure 5a shows the bivariate LISA for morning counts as the target compared to the neighboring cells’ counts for the afternoon. High-high and low-low cells would be interpreted as areas of high or low activity, respectively, across both time periods, while a low-high would identify a cell that had low activity in the morning compared to high activity among its neighbors in the afternoon. Conversely, a high-low would indicate a cell that had high morning activity compared to the low afternoon activity of its neighbors. The low-high cells fringing the CBD show that the area of activity increases from morning to afternoon.
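The bivariate idea can be sketched in a few lines: the target cell's own value comes from one period and the neighbors' lagged value from another. The example below is a simplified illustration on a single row of cells, with both variables standardized to z-scores first. The function names and the chain neighborhood are hypothetical, and no significance test is included.

```python
import math


def chain_neighbors(n):
    """Neighbors of cell i in a single row of n cells (illustrative only)."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def standardize(values):
    """Convert values to z-scores (population standard deviation)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def bivariate_local_moran(target, other, nbrs):
    """Pair each cell's own `target` z-score with its neighbors' mean
    `other` z-score; return (statistic, quadrant) for every cell."""
    zx, zy = standardize(target), standardize(other)
    results = []
    for i, zi in enumerate(zx):
        lag = sum(zy[j] for j in nbrs[i]) / len(nbrs[i])
        label = ("high" if zi > 0 else "low") + "-" + ("high" if lag > 0 else "low")
        results.append((zi * lag, label))
    return results


# Hypothetical counts for six cells in two periods.
morning = [1, 1, 1, 9, 9, 9]
afternoon = [1, 9, 9, 9, 9, 9]
quadrants = [q for _, q in bivariate_local_moran(morning, afternoon, chain_neighbors(6))]
```

In this toy case the third cell is quiet in the morning but has busy neighbors in the afternoon, so it falls in the low-high quadrant, mirroring the fringe cells described above.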

Figure 6a shows the afternoon-evening pattern, where a hotspot emerges in the southeastern part of the study area near a major freeway (Bayshore). Figure 7a compares morning counts to evening, and this area is a large hotspot, confirming that it is an area of high activity in the morning and evening, but relatively low activity in the afternoon. A high-low cell here represents an area of high activity in the morning that is less active in the evening, and low-high cells are those for which activity is higher in the evening compared to morning.

A more recent extension to bivariate LISA is the directional Moran scatterplot, which allows for better visualization of the dynamics between changing spatial patterns across time periods (Rey 2014). The directional LISA shows the movement of the statistics across two time periods, and therefore incorporates information from two different Moran scatterplots. For example, figure 5b shows the change in LISA statistic for each cell from morning to afternoon: each vector ‘starts’ in its position for morning (from figure 2b) and ‘ends’ in its position for afternoon (from figure 3b). The small arrows in the top left low-high quadrant represent cells that were low activity surrounded by high activity in both morning and afternoon. Figure 6b shows that there was much more variation in LISA statistics in afternoon and evening. The vector that is highlighted with a yellow star represents a cell that was a ‘coldspot’, or area of low activity in the afternoon, but became a hotspot in the evening.

In addition to measuring the spatial pattern of aggregated counts to identify likely activity hotspots, a more novel method involves measuring the spatial autocorrelation of movement parameters, specifically speed. Orellana and Wachowicz (2011) used LISA to analyze pedestrian movement in order to uncover “movement suspension” (low-low clusters) that they suggested would indicate points of interest or activity hotspots. After testing different nearest neighbor spatial weights matrices, one that considers only the 10 closest neighbors to be “near” was used to measure spatial autocorrelation for all points within each of the three time periods. As the variable of interest is now speed, a low-low could be used to suggest an area of interest or a hotspot, while high-high points would most likely be associated with freeways. Low-highs and high-lows would likely be difficult to disentangle from variable traffic patterns (e.g., “stop and go” traffic). Figure 8a shows the pattern for the morning period (fig. 8b is zoomed in to downtown San Francisco). Figures 9a and 9b are for the afternoon period and figures 10a and 10b are evening. The spatial distribution of these ‘suspended movement’ areas represented by low-lows varies with the time of day. In order to more easily visualize the differences in the spatial distribution of low-lows, the kernel density of just the low-low points was calculated and a ‘home range’, or area where most of them occurred, was extracted. While the afternoon and evening low-low clusters are in similar places, the morning low-low clusters extend farther downtown. Also, the afternoon and evening clusters at the bottom right represent the San Francisco Airport, where there was less slow speed among taxicabs in the morning period.

Examining the ‘uniqueness’ of human mobility trajectories: a case study using smartphone data from Beijing

Aggregating counts of GPS locations to a larger area approximates the spatial resolution available when these studies were done with cell phone towers and call counts were aggregated to the Voronoi polygon drawn around each cellular tower. However, there are important privacy issues associated with dealing with actual GPS locations that are often overlooked. Location data are often released after they have been ‘anonymized’, which means that the trajectory has been stripped of any obvious identifying information such as name, address, phone number, etc. However, personal points of interest (home, work) can still be identified by mining trajectory data for movement patterns, and these points of interest are often associated with unique individuals. Additional locations may be resolved that could have negative implications (e.g., repeated visits to a medical clinic may be a cause for concern for employers).

Due to data availability, most of the previous work on “unicity”, or measuring the uniqueness of movement traces or trajectories, has been with much coarser scaled cell phone data. Surprisingly, even relatively coarse spatial resolution location data such as that associated with call detail records (CDR), where ‘location’ is an area defined by its proximity to a specific cell phone tower, can be used to uniquely identify an individual. Locations of cell phone towers or antennae are based on population density and the area associated with each one varies considerably. In their study in a small European country, de Montjoye et al. (2013) found that the reception or catchment area for an antenna ranged from 0.15 km² in urban areas to 15 km² in rural areas.

Zang and Bolot (2011) used anonymized CDR from 25 million individuals across the U.S. to determine the “top N” locations at which calls were recorded for each of three months. They found that when N=2 (typically corresponding to work and home), up to 35% of the individuals could be uniquely identified. When N=3 (they suggested the 3rd location typically represented a school or shopping related location), 50% could be uniquely identified.

In their seminal study, de Montjoye et al. (2013) used fifteen months of anonymized mobile phone data (CDR) for 1.5 million individuals in a western European country and found that four randomly selected spatiotemporal points were sufficient to uniquely identify 95% of the individuals. Perhaps more troubling, they found that over 50% of individuals were uniquely identifiable from just two randomly selected locations (typically also corresponding to home and work). Song et al. (2014) found similar results with a dataset of one week of mobility data for 1.14 million people (total 56 million records): with just two random points, 60% of the trajectories were unique.

It is important to note that uniqueness does not equate to re-identifiability, and the objectives of these studies were to examine how unique individual trajectories were, not to actually de-anonymize them or re-attach an individual’s information to a unique trajectory. However, the ability to determine uniqueness of trajectories is an important prerequisite for re-identification (which would involve correlation with an ancillary dataset) and therefore represents a potential threat to individual privacy.

The degree of uniqueness of trajectories can vary as a function of factors such as typical commuting patterns, transportation modes, and geographical region (which affects commuting patterns and transportation modes). There have been several methods proposed to quantify the ‘anonymity’ of a database. The most commonly used method, k-anonymity, was introduced by Sweeney (2002) as a measure to increase anonymity for non-spatial databases. When applied to spatial databases, it ensures that any set of records (locations) for an individual is the same as that of at least k-1 other individuals. Generally, k=2, ensuring that at least two trajectories are equivalent, but as k increases, so too does the anonymity. Extensions of k-anonymity include l-diversity and t-closeness (Li et al. 2007).
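To make the k-anonymity condition concrete, the sketch below checks what value of k a small set of trajectories actually satisfies by grouping identical trajectories. It is an illustrative check rather than an anonymization procedure. The dictionary layout, the cell labels, and the function names are hypothetical, and trajectories are treated as equivalent only when their location sequences are identical.

```python
from collections import Counter


def k_anonymity(trajectories):
    """Largest k for which the dataset is k-anonymous.

    trajectories: dict mapping an individual id to a sequence of hashable
    locations. The result is the size of the smallest group of individuals
    sharing an identical sequence. Raises ValueError on an empty dataset.
    """
    groups = Counter(tuple(t) for t in trajectories.values())
    return min(groups.values())


def unique_individuals(trajectories):
    """Ids whose trajectory is shared with no one else (k = 1 cases)."""
    groups = Counter(tuple(t) for t in trajectories.values())
    return sorted(uid for uid, t in trajectories.items()
                  if groups[tuple(t)] == 1)


# Hypothetical generalized trajectories (sequences of grid-cell labels).
data = {
    "a": ["c1", "c2"], "b": ["c1", "c2"],
    "c": ["c3", "c4"], "d": ["c3", "c4"],
}
k_before = k_anonymity(data)        # every trajectory is shared by two people
data["e"] = ["c1", "c4"]            # one person with a trajectory of their own
k_after = k_anonymity(data)
```

Adding a single individual with a distinct trajectory drops the dataset from k=2 to k=1, which is why high-resolution trajectories are so hard to protect this way.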

These measures are generally used to manage trajectory datasets (i.e., data would be manipulated so that the level of anonymity reached the reported k level), but in order to quantify the actual level of anonymity of trajectory datasets, a rigorous analysis comparing random points from each trajectory to all other trajectories has to be conducted. With trajectory datasets now available at one second intervals, the volume of these data can result in computationally intensive analysis.

De Montjoye et al. (2013) measured ‘unicity’ as the percentage of 2500 random traces that were unique given p random points (p ranged from 2 to 5). Song et al. (2014) defined uniqueness of trajectories as the percentage of all available trajectories that were uniquely associated with p random points, which they varied from 2 to 6. While anonymity (or lack thereof) has been studied with CDR data, as the previous examples show, it has not yet been addressed with the finer spatiotemporal resolution available as GPS locations from, e.g., smartphones. These datasets could potentially be far more unique and therefore more difficult to anonymize.

In this study, I conduct an extensive analysis of the unicity of GPS movement trajectories, testing the effect of spatial resolution and temporal resolution. In addition to location, I also explore how effective movement parameters such as speed could be for uniquely identifying a trajectory. This is one of the first studies to measure unicity of trajectories composed of GPS locations.


I hypothesized that the unicity of trajectories will be greater for GPS locations than the coarser scaled CDR locations. I also hypothesized that the attenuating effects of coarsening spatial and temporal resolution will have less impact than they would with CDR locations. I expect that speed will also be effective for uniquely identifying a trajectory when several data points are used.

Research questions: How can anonymity be quantified for different types of trajectories and how is it affected by spatial and temporal resolution? Can movement characteristics such as speed be used to uniquely characterize a trajectory in the absence of actual locations?

Data

Microsoft GeoLife Trajectories (http://research.microsoft.com/en-us/downloads/b16d359dd164-469e-9fd4-daa38f2b2e13/). This is an extremely dense dataset, with temporal resolution of ~1-5 seconds and spatial resolution of ~5-10 meters. We used only one year of data (January 2009-December 2009) and used a spatial mask of Beijing (39.6° to 40.2° N latitude, 116° to 116.8° E longitude) to remove users who traveled outside of the city during this time period. This resulted in 71 users who had a total of 7,243 daily trajectories (the number of locations visited within trajectories varied but the mean was 1600).

Methods

The basis of our unicity test involved extracting 500 sets of points of size n from each user and counting how many other trajectories they are found in. The percentage of the 500 sets of points that matched only one trajectory was calculated, and this was done for each of the 71 users for four different set sizes (n = 2, 3, 4, and 5). Our measure of unicity, u, is the percentage of 500 random point sets of size n that are contained in only one trajectory, averaged across all 71 users. A unicity value close to 100 indicates a highly unique trajectory that could theoretically be de-anonymized, or re-connected with identifying user information, more easily; a low unicity value suggests that the random set of points is contained in several different trajectories and therefore would make de-anonymizing trajectories far more challenging. The amount of information from each point was varied: we used just location (x and y), location and time (x, y, and t), and the absolute angle (the absolute angle for point i is measured between the x direction and the step built by relocations i and i+1).
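The unicity test described above can be sketched roughly as follows. This is a simplified reading of the procedure, not the author's implementation: it assumes each random set is drawn from one randomly chosen trajectory of the user, that a match means every sampled point appears somewhere in another trajectory, and that points are hashable tuples such as `(x, y)` or `(x, y, t)`. The function and parameter names are hypothetical.

```python
import random


def unicity(users, n_points, n_sets=500, seed=0):
    """Mean percentage of random point sets found in exactly one trajectory.

    users: dict mapping a user id to a list of trajectories, where each
    trajectory is a list of hashable, orderable points.
    """
    rng = random.Random(seed)
    all_trajectories = [set(t) for trajs in users.values() for t in trajs]
    per_user = []
    for trajs in users.values():
        # Only trajectories with enough distinct points can be sampled.
        candidates = [sorted(set(t)) for t in trajs if len(set(t)) >= n_points]
        if not candidates:
            continue
        unique = 0
        for _ in range(n_sets):
            sample = rng.sample(rng.choice(candidates), n_points)
            matches = sum(1 for t in all_trajectories if t.issuperset(sample))
            if matches == 1:  # only the source trajectory contains the set
                unique += 1
        per_user.append(100.0 * unique / n_sets)
    if not per_user:
        raise ValueError("no trajectory has enough distinct points")
    return sum(per_user) / len(per_user)
```

Two users with identical trajectories give a unicity of 0, two users with no points in common give 100, and partially overlapping trajectories fall in between. The nested loop over all trajectories is the computationally intensive comparison the text refers to, so a real dataset of several thousand dense trajectories would likely need an index rather than this brute-force scan.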

The original latitude and longitude coordinates for these locations have a spatial precision of six decimal places (~0.1 m). In order to test how spatial and temporal resolution affected measurement of unicity, the geographic coordinates were coarsened first to four decimal places (~10 m) and the temporal resolution was coarsened to 30 seconds, then further coarsened to three decimal places (~100 m) and 60 seconds. Additionally, the precision of the absolute angle measure was decreased from the original (five decimal places) to three decimal places.
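The coarsening step, and the rough ground distances quoted for each number of decimal places, can be illustrated with the short sketch below. Whether the original analysis rounded or truncated is not stated, so rounding of coordinates and flooring of timestamps to the start of their bin are assumptions here, as are the function names and the approximate figure of 111,320 m per degree of latitude.

```python
import math

METRES_PER_DEGREE_LAT = 111_320.0  # approximate; an assumption of this sketch


def coarsen_fix(lat, lon, t_seconds, decimals, time_bin_s):
    """Round coordinates to `decimals` places and floor time to its bin."""
    return (round(lat, decimals),
            round(lon, decimals),
            int(t_seconds // time_bin_s) * time_bin_s)


def ground_resolution_m(decimals, lat_deg):
    """Approximate north-south and east-west size, in metres, of one unit
    in the last retained decimal place at the given latitude."""
    step_deg = 10.0 ** -decimals
    north_south = step_deg * METRES_PER_DEGREE_LAT
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


# A hypothetical fix near Beijing, coarsened to 4 decimals and 30 seconds.
coarse = coarsen_fix(39.984702, 116.318417, 95, 4, 30)
```

At Beijing's latitude of roughly 40° N, six decimal places correspond to about 0.11 m north-south, four to about 11 m, and three to about 111 m, consistent with the approximate values given above. East-west distances are smaller by a factor of cos(latitude).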

Figure 12 displays the trajectories for two different users for a single day. The zoomed in subset helps to illustrate the importance of location precision: there are several locations that would be the same for both users if the location precision was coarsened ten- or 100-fold. Figure 13 shows three different users’ daily trajectories in a space-time cube. The red and green users overlap in time, but not space, while the red and blue users overlap in space but not time. The use of all three pieces of coordinate information (x, y, and t) can be extremely important for uniquely associating a single trajectory.

Results

Table 1 shows the unicity values associated with the size of each random point set. The mean was the average unicity across all 71 users, while the minimum and maximum show the variation in unicity among users. In general, the locations of points on a trajectory were highly unique. 90% of the random sets of just two points composed of only location (no timestamp) were associated with only one trajectory. Adding the timestamp increased the unicity of two points to 97%. When five points with location and timestamp were used, the unicity increased to almost 99%. Somewhat surprisingly, the angle of movement alone has fairly high unicity: when the angles of three points are tested, the unicity is similar to the unicity of location for CDR as found in de Montjoye et al. (2013) and Song et al. (2014). Five angle values could uniquely identify a trajectory 73% of the time.

Table 1: unicity results for location (x, y), location and time (x, y, t) and the absolute angle of a point. Means and ranges are reported for 500 sets of random points for each of 71 users.

Unicity values for coarsened location, time, and absolute angle are shown in table 2. When just two points (no timestamp) are used at the coarser resolution (spatial precision reduced to ~10 m), the points are associated with a unique trajectory only 68.5% of the time. When the less precise (30-second) timestamp is added, the unicity is similar to the original resolution, and with five points with location and timestamp, unicity increases to ~94%. The unicity of the absolute angle degrades substantially: even using a set of five points results in less than 5% unicity.


Table 2: unicity results for location (x, y), location and time (x, y, t) and the absolute angle of a point. Spatial resolution has been coarsened to 4 decimal places; temporal resolution has been coarsened to 30 seconds; and absolute angle coarsened to three decimal places. Means and ranges are reported for 500 sets of random points for each of 71 users.

Table 3 shows unicity values for the coarsest location coordinates: the spatial resolution of an x/y pair is now ~100 m and the temporal resolution was coarsened to one minute. The spatial resolution here is closer to the resolution of the antenna reception areas used in the de Montjoye et al. (2013) paper (where spatial resolution ranged from 115 m to 15 km), but the coarsened temporal resolution is still much more precise than the one used in the CDR studies. As a result, using location and time for just two points still results in a high unicity (mean 80.3%), while five points increases the mean unicity to almost 88%. Using just location (no timestamp), the unicity degrades to 32% for two points and 66% for five points.

Table 3: unicity results for location (x, y), location and time (x, y, t). Spatial resolution has been coarsened to 3 decimal places; temporal resolution has been coarsened to 60 seconds. Means and ranges are reported for 500 sets of random points for each of 71 users.

The mean unicity values for location and location + time, with different levels of coarsening, are summarized in figure 14. With the much higher precision and spatial resolution of GPS data currently available, two x/y locations are sufficient to be uniquely associated with a single trajectory 90% of the time; adding the timestamp matches a single trajectory 97% of the time. The three pieces of information (x, y, time) are so specific that increasing the number of points to match to five increases the unicity very little because it is already so high using just two points. The first level of coarsening for x, y, t (~10 m spatial, 30 seconds temporal) has similar unicity to the original resolution for just x, y coordinates, and when four or five points are used, the coarsened x, y, t has slightly higher mean unicity. The most coarsened level for x, y, t (~100 m, 60 seconds) still has a high unicity (80% for two points). The x, y coordinates (no timestamp) show the greatest increase in unicity when more points are used for matching. This suggests that there is a trade-off between location resolution and amount of information (location points) available.

Discussion/Future Work

While each of the issues addressed here focuses on a single dataset for the case study, I would expect the results to be generally applicable to other similar mobility datasets. The hotspot analysis illustrated that local spatial statistics can be used to identify “hotspots” of movement activity, and spatial statistics visualization tools are useful for exploring how these hotspots change through time. The spatial statistics used here were all extensions of Moran’s I index, which requires a variable of interest that is measured on a ratio or interval scale, and locations of points do not meet this condition. Therefore, the points were aggregated to polygons and the counts were used as the variable of interest. In the second part of this study, speed (m/sec) was calculated for each point (based on the distance from and time since the previous location) and used as the variable of interest. Spatial autocorrelation of the speed associated with each point was classified into high-high (likely associated with highways), low-high, high-low, and low-low, the last of which was used here to indicate potential areas of interest.
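The speed variable described above, distance from the previous location divided by the time since it, might be computed along the following lines. The use of the haversine great-circle distance, the mean Earth radius of 6,371 km, the tuple layout `(t_seconds, lat, lon)`, and the function names are assumptions of this sketch; the report does not say how distances were measured.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; an assumption of this sketch


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def speeds_m_per_s(fixes):
    """Speed at each fix after the first, from the previous fix.

    fixes: list of (t_seconds, lat, lon) sorted by time. Returns None
    where the time difference is not positive (e.g. duplicate timestamps).
    """
    out = []
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(fixes, fixes[1:]):
        dt = t1 - t0
        out.append(haversine_m(lat0, lon0, lat1, lon1) / dt if dt > 0 else None)
    return out
```

A cab that moves 0.001° of latitude (about 111 m) in 60 seconds gets a speed of roughly 1.85 m/s, the sort of low value that would contribute to a low-low cluster if its neighbors are similarly slow.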

A better understanding of the spatiotemporal structure of human mobility could also increase the predictability of movement. For example, a pattern of high activity or relatively slow movement in certain locations at certain times of the day could be used to infer future movement at the same locations.

In both of the above examples, location and relative speed, respectively, were used as proxies for behavior. It is also important to note that these variables represented collective behavior, as all points for all 40 taxicabs were considered together. There are interesting future directions for this research, particularly comparing the utility of associating relative speed with points-of-interest for different types of moving entities. Automobiles, and taxicabs in particular, move differently (and sometimes slow down for different reasons) compared to pedestrians and wildlife, and even regular vehicles. It would also be interesting to test how useful other movement parameters such as relative and absolute angle and step length would be for identifying points-of-interest. This is only applicable for entities that can move more freely across space and are less confined to street networks or sidewalks.


The unicity study has particularly important implications for privacy and the increasing availability of ‘anonymized’ trajectory datasets. This is one of the first studies to explore unicity and anonymity with higher resolution GPS data and it should be troubling how unique a set of two location points can be. Decreasing the spatial and temporal resolution reduces the unicity, but five points with x, y coordinates at the coarsest resolution tested here were still uniquely associated with a single trajectory more than 60% of the time.

Movement parameters such as speed, angle, and step length have not previously been tested as potential identifiers of trajectories, but the case study here focusing on absolute angle highlights their potential importance. Five absolute angle data points were uniquely associated with a single trajectory 72% of the time. This suggests that individual movement, irrespective of absolute geographic location, can be identifiable with a sufficient level of precision of angle measurements and data points. Future work should focus specifically on how movement parameters could be used singly or together to identify a trajectory.
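For readers unfamiliar with these movement parameters, the sketch below derives the absolute angle and step length for each step of a track. It assumes planar (projected) x, y coordinates and angles in radians measured counter-clockwise from the positive x direction. The function names are hypothetical, and the rounding helper simply mirrors the reduction to three decimal places mentioned in the methods.

```python
import math


def absolute_angles(track):
    """Angle, in radians, between the x direction and each step i -> i+1.

    track: list of (x, y) positions in a projected coordinate system.
    """
    return [math.atan2(y1 - y0, x1 - x0)
            for (x0, y0), (x1, y1) in zip(track, track[1:])]


def step_lengths(track):
    """Straight-line length of each step i -> i+1."""
    return [math.hypot(x1 - x0, y1 - y0)
            for (x0, y0), (x1, y1) in zip(track, track[1:])]


def coarsen_angles(angles, decimals=3):
    """Reduce angle precision, e.g. from five to three decimal places."""
    return [round(a, decimals) for a in angles]


# A hypothetical track: east, then north, then west, one unit each.
track = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
angles = absolute_angles(track)          # 0, pi/2, pi
coarse_angles = coarsen_angles(angles)   # 0.0, 1.571, 3.142
```

Because the angles depend only on the direction of each step, two tracks with the same shape in different parts of the city produce the same sequence, which is what makes angle a location-free descriptor of movement.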

It is also important to note here that the focus of this study was not to re-attach user information to trajectories; it was just to examine how unique trajectories were based on different factors. The privacy issues associated with higher quality GPS location data should be addressed with the assumption that if a trajectory can be uniquely described with 2-5 GPS points, the trajectory could eventually be de-anonymized.

References

Anselin L (1995) Local Indicators of Spatial Association—LISA. Geographical Analysis, 27(2), 93–115.

Anselin L, Sridharan S and Gholston S (2006) Using Exploratory Spatial Data Analysis to Leverage Social Indicator Databases: The Discovery of Interesting Patterns. Social Indicators Research, 82(2), 287–309.

Becker R, Cáceres R, Hanson K, et al. (2013) Human Mobility Characterization from Cellular Network Data. Commun. ACM, 56(1), 74–82.

de Montjoye Y-A, Hidalgo CA, Verleysen M, et al. (2013) Unique in the Crowd: The privacy bounds of human mobility. Scientific Reports, 3.

Deng Z and Ji M (2011) Spatiotemporal structure of taxi services in Shanghai: Using exploratory spatial data analysis. In: IEEE, pp. 1–5.

Gao S (2015) Spatio-Temporal Analytics for Exploring Human Mobility Patterns and Urban Dynamics in the Mobile Age. Spatial Cognition & Computation, 15(2), 86–114.

Gudmundsson J, Laube P and Wolle T (2012) Computational movement analysis. In: Kresse W and Danko DM (eds), Springer Handbook of Geographic Information, Springer Berlin Heidelberg, pp. 423–438.


Li N, Li T and Venkatasubramanian S (2007) t-Closeness: Privacy Beyond k-Anonymity and l-Diversity. In: IEEE 23rd International Conference on Data Engineering, 2007. ICDE 2007, pp. 106–115.

Orellana D and Wachowicz M (2011) Exploring Patterns of Movement Suspension in Pedestrian Mobility. Geographical Analysis, 43(3), 241–260.

Orellana DA, Wachowicz M, Knegt de HJ, et al. (2010) Uncovering patterns of suspension of movement.

Piorkowski, M., Sarafijanovic-Djukic, N., and Grossglauser, M. CRAWDAD dataset epfl/mobility (v. 2009-02-24), downloaded from http://crawdad.org/epfl/mobility/20090224, doi:10.15783/C7J010, Feb 2009.

Qian X, Zhan X and Ukkusuri SV (2015) Characterizing Urban Dynamics Using Large Scale Taxicab Data. Springer International Publishing.

Rey SJ (2014) Spatial Dynamics and Space-Time Data Analysis. Springer Berlin Heidelberg.

Sweeney L (2002) k-ANONYMITY: A MODEL FOR PROTECTING PRIVACY. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(05), 557–570.

Zang H and Bolot J (2011) Anonymization of Location Data Does Not Work: A Large-scale Measurement Study. In: Proceedings of the 17th Annual International Conference on Mobile Computing and Networking, MobiCom ’11, New York, NY, USA: ACM, pp. 145–156, Available from: http://doi.acm.org.ezproxy.lib.utexas.edu/10.1145/2030613.2030630 (accessed 15 May 2015).

Zheng, Y., Lizhu Zhang, Xing Xie, Wei-Ying Ma. Mining interesting locations and travel sequences from GPS trajectories. In Proceedings of International conference on World Wide Web (WWW 2009), Madrid Spain. ACM Press: 791-800.

Zheng, Y., Quannan Li, Yukun Chen, Xing Xie, Wei-Ying Ma. Understanding Mobility Based on GPS Data. In Proceedings of ACM conference on Ubiquitous Computing (UbiComp 2008), Seoul, Korea. ACM Press: 312-321.

Zheng, Y., Xing Xie, Wei-Ying Ma. GeoLife: A Collaborative Social Networking Service among User, location and trajectory. Invited paper, in IEEE Data Engineering Bulletin. 33, 2, 2010, pp. 32-40.


Appendix:


© 2015 Proprietary, The University of Texas at Austin, All Rights Reserved.

For more information on Center for Identity research, resources and information, visit identity.utexas.edu.
