Privacy Preserving Publication of Moving Object Data

Privacy Preserving Publication of Moving Object Data Joey Lei CS295 Francesco Bonchi Yahoo! Research Avinguda Diagonal 177, Barcelona, Spain 03/22/22 1 CS295 - Privacy and Data Management


Joey Lei, CS295

Francesco Bonchi, Yahoo! Research

Avinguda Diagonal 177, Barcelona, Spain

04/19/23 CS295 - Privacy and Data Management

Outline

• Intro & Background
• Clustering and Perturbation Techniques
• Spatio-Temporal Cloaking (Generalization) Techniques
• Future Research


Location Privacy

• Growing prevalence of location-aware devices
  – Mobile phones and GPS devices
• Two analysis groups
  – Online
    • Real-time monitoring of moving objects and motion patterns
    • Development of location-based services (LBS)
      – Google Maps on the iPhone
  – Offline
    • Collection of traces left by moving objects
    • Offline analysis to extract behavioral knowledge
      – Public transportation


Privacy Concerns

• Location data allows for intrusive inferences
  – Reveals habits
  – Social customs
  – Religious and sexual preferences
  – Unauthorized advertisement
  – User profiling


Offline Analysis

• Traffic Management Application
  – Paths (trajectories) of vehicles with GPS are recorded
• Geographic Privacy-aware Knowledge Discovery and Delivery (GeoPKDD)
  – Traffic data published for the city of Milan (Italy)
  – Car identifiers were replaced with pseudonyms
• Daily Commute Example
  – Bob’s home and workplace are traceable by location systems and act as quasi-identifiers (QIDs)
  – Joining the data with a telephone directory can then reveal his identity


Definitions

• Anonymity Preserving Data Publishing of Moving Objects Databases
  – How to transform published location data while maintaining utility
• Moving Object Database (MOD)
  – A set of individuals, time points, and trajectories


Background: Location Based Services

• Ideals
  – Provide service without learning the user’s exact position
  – Location data is forgotten once service is provided
• k-anonymity definition
  – A response to a request for location data is k-anonymous when it is indistinguishable from the spatial and temporal information of at least k – 1 other responses sent from different users
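The k-anonymity definition above can be sketched as a simple count. This is a minimal illustration, assuming a cloaked response is an axis-aligned space-time box and that two requests are indistinguishable when both fall inside the same box; the names `Request`, `Cloak` and `is_location_k_anonymous` are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    user: str
    x: float
    y: float
    t: float


@dataclass(frozen=True)
class Cloak:
    """Assumed form of a cloaked response: a rectangle plus a time interval."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float
    t_min: float
    t_max: float

    def covers(self, r: Request) -> bool:
        return (self.x_min <= r.x <= self.x_max
                and self.y_min <= r.y <= self.y_max
                and self.t_min <= r.t <= self.t_max)


def is_location_k_anonymous(cloak, requests, k):
    """True if requests from at least k distinct users fall inside the cloak."""
    users = {r.user for r in requests if cloak.covers(r)}
    return len(users) >= k
```

Counting distinct users rather than requests matters: several requests from the same user inside the box do not make that user harder to single out.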


LBS: Location k-Anonymity

• Spatial Requirements
  – Ubiquity: a user visits at least k regions
  – Congestion: the number of users is at least k
• One Way to Achieve This: Mix Zones
  – An area where LBS providers cannot trace a specific user’s movement
  – Identity is replaced with pseudonyms
    • Users entering these zones at the same time are mixed together


LBS: Location Based Quasi-Identifier

• A spatio-temporal pattern that can uniquely identify one individual
  – Set of spatial areas and time intervals plus a recurrence formula
  – AreaCondominium [7am, 8am], AreaOfficeBldg [8am, 9am], AreaOfficeBldg [4pm, 6pm], AreaCondominium [5pm, 7pm]
  – Recurrence: 3.Weekdays * 2.Weeks
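A rough sketch of how such a quasi-identifier could be matched against one user's location history. It assumes that "3.Weekdays * 2.Weeks" means the ordered pattern is observed on at least 3 weekdays of a week, in at least 2 weeks; the data layout (a list of `(datetime, area)` pairs) and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# The slide's pattern as (area, start hour, end hour), in visiting order.
PATTERN = [
    ("AreaCondominium", 7, 8),
    ("AreaOfficeBldg", 8, 9),
    ("AreaOfficeBldg", 16, 18),
    ("AreaCondominium", 17, 19),
]


def day_matches(observations, pattern):
    """True if one day's observations contain the pattern as an ordered subsequence."""
    i = 0
    for ts, area in sorted(observations):
        if i == len(pattern):
            break
        p_area, start, end = pattern[i]
        hour = ts.hour + ts.minute / 60
        if area == p_area and start <= hour <= end:
            i += 1
    return i == len(pattern)


def matches_lbqid(observations, pattern, days_per_week=3, weeks=2):
    """Check the recurrence: enough matching weekdays in enough distinct weeks."""
    by_day = defaultdict(list)
    for ts, area in observations:
        if ts.weekday() < 5:  # Monday to Friday only
            by_day[ts.date()].append((ts, area))
    matching_days = defaultdict(int)
    for day, obs in by_day.items():
        if day_matches(obs, pattern):
            iso = day.isocalendar()
            matching_days[(iso[0], iso[1])] += 1  # key: (ISO year, ISO week)
    good_weeks = sum(1 for n in matching_days.values() if n >= days_per_week)
    return good_weeks >= weeks
```

A history that matches reveals the commute pattern, so it can act as an identifier even when the user's name has been replaced by a pseudonym.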


LBS: Historical k-Anonymity

• In the offline context
  – A set of requests satisfies historical k-anonymity if there exist k – 1 personal histories of locations (trajectories) belonging to k – 1 different users such that they are location-time consistent (indistinguishable)


Outline

• Intro & Background
• Clustering and Perturbation Techniques
• Spatio-Temporal Cloaking (Generalization) Techniques
• Conclusions


Clustering and Perturbation

• C&P ignores the inherent problems with location QIDs:
  – Each individual can have their own QIDs, which makes it difficult to create a QID for all individuals
  – Area(Home,Office,??)[??am- ??pm]
  – Recurrence: 7.Weekdays * 52.Weeks
• Solution: anonymize trajectories instead
  – Microaggregation / k-member anonymity


Clustering and Perturbation

• Trajectories are modeled not as polylines but as cylindrical volumes with radius δ (the uncertainty radius)

• If another trajectory moves within the cylinder of the given trajectory, then the two trajectories are indistinguishable from each other ((k, δ)-anonymity set)
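The cylinder test can be sketched as follows. This is a minimal reading of the slide, assuming all trajectories are sampled at the same timestamps and that a (k, δ)-anonymity set is a group of at least k trajectories in which every pair stays within distance δ at every timestamp; the function names are hypothetical.

```python
from itertools import combinations
from math import dist


def colocalized(tr1, tr2, delta):
    """True if two trajectories stay within delta of each other at every timestamp.

    Each trajectory is a list of (x, y) positions sampled at shared timestamps.
    """
    if len(tr1) != len(tr2):
        raise ValueError("trajectories must share the same timestamps")
    return all(dist(p, q) <= delta for p, q in zip(tr1, tr2))


def is_anonymity_set(trajectories, k, delta):
    """True if there are at least k trajectories and all pairs are co-localized."""
    return (len(trajectories) >= k
            and all(colocalized(a, b, delta)
                    for a, b in combinations(trajectories, 2)))
```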


Clustering and Perturbation

Figure: (a) Uncertainty trajectory; (b) Anonymity set for two trajectories


Achieving (k, δ)-anonymity

• Achieved by space translation: slightly moving some observations in space

• Step One: cluster trajectories of similar sizes
  – NWA (Never Walk Alone)
    • All equivalence classes have the same time span and special timestamp requirements π (e.g. π = 60, only full hours, from 1:00PM-2:00PM)
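One way to picture the π requirement is as a trimming step before clustering. The sketch below assumes integer timestamps in minutes, trims each trajectory so that it starts and ends on a multiple of π, and groups trajectories by the resulting time span. The helper names are hypothetical and this is not claimed to be NWA's exact preprocessing.

```python
from collections import defaultdict


def trim_to_pi(trajectory, pi):
    """Trim a trajectory so its span starts and ends on multiples of pi.

    trajectory: list of (t, x, y) sorted by integer timestamp t.
    Returns ((start, end), points), or None if no full period fits.
    """
    t_first, t_last = trajectory[0][0], trajectory[-1][0]
    start = -(-t_first // pi) * pi  # round the first timestamp up
    end = (t_last // pi) * pi       # round the last timestamp down
    if start >= end:
        return None
    points = [p for p in trajectory if start <= p[0] <= end]
    return (start, end), points


def equivalence_classes(trajectories, pi):
    """Group trimmed trajectories by their (start, end) time span."""
    classes = defaultdict(list)
    for tr in trajectories:
        trimmed = trim_to_pi(tr, pi)
        if trimmed is not None:
            span, points = trimmed
            classes[span].append(points)
    return dict(classes)
```

With π = 60 and minutes since midnight, a trajectory recorded from 12:55 to 14:05 and one recorded from 13:00 to 14:10 both land in the class (780, 840), i.e. 1:00PM-2:00PM.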


Achieving (k, δ)-anonymity

• Step Two: perturb trajectories within uncertainty radius δ (i.e. transformation into anonymity set)
  – Grouping and Reconstruction
    • Finding the nearest matching points to group
    • Reconstruct a generalization for utility
    • Multi TGA and Fast TGA Algorithms
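The space translation in Step Two can be illustrated with a simplified sketch. It assumes a cluster of trajectories sampled at shared timestamps and uses the per-timestamp centroid as the cluster center (an assumption made here for illustration): any point farther than δ/2 from the center is pulled onto the circle of radius δ/2, so every pair of points ends up at most δ apart.

```python
from math import dist


def translate_cluster(cluster, delta):
    """Move points so each lies within delta/2 of the per-timestamp centroid.

    cluster: list of trajectories, each a list of (x, y) at shared timestamps.
    Returns new trajectories; the input is left unchanged.
    """
    radius = delta / 2
    n = len(cluster)
    result = [list(tr) for tr in cluster]
    for i in range(len(cluster[0])):
        cx = sum(tr[i][0] for tr in cluster) / n
        cy = sum(tr[i][1] for tr in cluster) / n
        for j, tr in enumerate(cluster):
            d = dist(tr[i], (cx, cy))
            if d > radius:
                s = radius / d  # shrink factor toward the centroid
                result[j][i] = (cx + (tr[i][0] - cx) * s,
                                cy + (tr[i][1] - cy) * s)
    return result
```

Points already inside the δ/2 disk are not moved, which keeps the distortion (and therefore the utility loss) limited to outlying observations.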


Outline

• Intro & Background
• Clustering and Perturbation Techniques
• Spatio-Temporal Cloaking (Generalization) Techniques
• Conclusions


Trajectory Generalization

Figure: Anonymization of three trajectories tr1, tr2 and tr3, based on point matching and removal, and spatio-temporal generalization
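The generalization step in the figure can be sketched in a few lines. The sketch assumes point matching and removal have already been done, so the trajectories have equal length and the i-th points correspond; each group of matched points is then replaced by its bounding box in x, y and time. The function name `generalize` is hypothetical.

```python
def generalize(trajectories):
    """Replace each group of matched points by its spatio-temporal bounding box.

    trajectories: equal-length lists of (x, y, t), matched position by position.
    Returns one ((x_min, x_max), (y_min, y_max), (t_min, t_max)) box per position.
    """
    boxes = []
    for points in zip(*trajectories):
        xs, ys, ts = zip(*points)
        boxes.append(((min(xs), max(xs)),
                      (min(ys), max(ys)),
                      (min(ts), max(ts))))
    return boxes
```

All matched trajectories publish the same sequence of boxes, so they cannot be told apart from the released data alone.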

Trajectory Reconstruction

Reference: Aggarwal, C.C., Yu, P.S.: A condensation approach to privacy preserving data mining.


Quasi-identifier Methods

• QIDs are a sequence of locations with multiple sensitive values (locations)
  – Values are different from the perspective of each adversary
• Yet, must consider linkage attacks from all adversaries


Quasi-identifier Methods

• Possible Attack
  – T5 and t5: a match! We know that person visited b1


Space Generalization

• Each position is an exact point on a grid
• Generalizations become rectangles of nearby points


Attack Graph

• Privacy Breach on prior example
• Definitions
  – I-Nodes (Individuals)
  – O-Nodes (Moving Object IDs)


Attack Graph

• If I1 is mapped to O2, there is no clear mapping for I2 or I3
  – Both I2 and I3 would then have to map to O3
• Conclusion
  – O1 must map to I1
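The reasoning on this slide amounts to discarding edges that cannot be part of any one-to-one assignment of individuals to object IDs. The sketch below brute-forces this for a small graph; the example graph is a hypothetical one chosen to be consistent with the slide's description (the actual figure is not available), and the function name is an assumption.

```python
from itertools import permutations


def feasible_edges(graph):
    """Return the edges that appear in at least one perfect matching.

    graph: dict mapping each I-node to the set of O-nodes it may correspond to.
    Brute force over assignments, so only suitable for small examples.
    """
    individuals = sorted(graph)
    objects = sorted(set().union(*graph.values()))
    feasible = set()
    for assignment in permutations(objects, len(individuals)):
        if all(o in graph[i] for i, o in zip(individuals, assignment)):
            feasible.update(zip(individuals, assignment))
    return feasible


# Hypothetical attack graph: every I-node has two candidate O-nodes.
graph = {
    "I1": {"O1", "O2"},
    "I2": {"O2", "O3"},
    "I3": {"O2", "O3"},
}
edges = feasible_edges(graph)
# The edge (I1, O2) is infeasible: it would leave only O3 for both I2 and I3.
i1_candidates = {o for i, o in edges if i == "I1"}
```

Here `i1_candidates` contains only "O1", so I1 is re-identified even though each individual appears to have two candidates.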


Attack Graph

• Shortcomings of the basic k-anonymity definition
  – Standard k-anonymity states there should be at least k paths originating from I (based on grouping)
  – What if we group O to have at least k paths?


Attack Graph

• Privacy Breach
  – Assume I2, O5 are a pair
  – I1 maps to both O1, O2, but this is impossible!
    • I5 must map to O5


Final k-Anonymity Definition

• Every I-node has degree k or more
• The attack graph is symmetric
  – For edge (Ii, Oj) there is also an edge (Ij, Oi)
• 2-anonymous attack graph: (figure)
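The two conditions of the final definition translate directly into a check. In this sketch the attack graph is assumed to be stored as a dict from index i to the set of indices j with an edge (Ii, Oj); the representation and the function name are hypothetical.

```python
def is_k_anonymous(graph, k):
    """Check the two conditions: every I-node has degree >= k, and symmetry.

    graph: dict mapping index i to the set of indices j with an edge (I_i, O_j).
    """
    degree_ok = all(len(objs) >= k for objs in graph.values())
    symmetric = all(i in graph.get(j, set())
                    for i, objs in graph.items()
                    for j in objs)
    return degree_ok and symmetric
```

A graph in which every I-node has two edges can still fail the check: with edges {1: {1, 2}, 2: {2, 3}, 3: {2, 3}}, the edge (I1, O2) has no counterpart (I2, O1), so the symmetry condition rejects it.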


Future Research

• Ad-hoc anonymization techniques for intended use of data
• Privacy Preserving Data Mining
  – Focus on the analysis methods instead of the publishing


Questions?
