Geo-Friends Recommendation in GPS-Based Cyber-Physical Social Network

Data and Information Systems Laboratory University of Illinois Urbana-Champaign ASONAM 2011 July, 2011 Geo-Friends Recommendation in GPS-Based Cyber-Physical Social Network Xiao Yu, Ang Pan, Lu-An Tang, Zhenhui Li, Jiawei Han University of Illinois at Urbana-Champaign Acknowledgements: NSF, ARL, NASA, AFOSR (MURI), IBM & Boeing



Roadmap

• Motivation

• Background and Preliminaries

• Geo-friend Finding Framework
  • GPS Pattern Extraction
  • Build Pattern-Based Information Network
  • Random Walk with Restart on Heterogeneous Information Network

• Experiments

• Conclusions


Motivation: Popularity of Mobile Devices

• Mobile devices: Very popular, a major medium of communication

• Data from mobile devices (such as real-time GPS locations and moving trajectories): Reflect users’ daily activities and real-life social interactions

• Social network services: Allow users to store and share locations and trajectories collected from their mobile devices

A List of Major Location-Based Social Network Services

Foursquare, Facebook Places, Google Latitude, Twitter Location Update, Yelp Check-in, Google+, ...


Motivation: Geo-Friends Recommendation

• A social network with data collected from sensors is usually referred to as a Cyber-Physical Social Network

• Problem to be solved: Friend recommendation in GPS-based cyber-physical social networks, by combining GPS data with social network information

• Our method discovers real-life friends on web-based social networks

• Geo-Friends: Potential real life friends, who have both social similarities and geographical correlation


A Geo-Friend Finding Example

• Real-life friends play an important role in off-line social events, while most virtual on-line friends cannot fulfill such a social function

• Alex needs geo-friends to join him in a local charity event

• Bob is a college friend who now lives in another country

• Carlos is a co-worker but has no social network similarity with Alex

• David shares common friends with Alex and goes to the same gym and the same game store

David is more likely to be Alex’s geo-friend, but we cannot get this information by analyzing only the social network or only the GPS data.


Contribution

• Propose the geo-friend recommendation problem, and discuss how it differs from the previously studied link prediction problem

• Define and generate a set of GPS patterns to describe people’s real-life social interactions and correlations

• Propose a random walk-based statistical framework for geo-friend recommendation

• Design and conduct a series of experiments on both synthetic and real-world datasets

• Demonstrate the power of our method in various situations


Data Model

• GPS Trajectory: The GPS records of a particular user, connected sequentially in ascending order of timestamps

• GPS-Based Cyber-Physical Social Network: G(S, V, E)
  • V: Set of people in the network
  • E: Set of edges, representing all the links between people
  • S: Set of GPS trajectories associated with people
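
As an illustration only, G(S, V, E) could be held in memory roughly as follows. The class and field names are hypothetical, and treating person-to-person links as undirected is an assumption the slides do not state.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class GpsRecord:
    """One GPS fix of a user (hypothetical record layout)."""
    timestamp: float
    lat: float
    lon: float


@dataclass
class CyberPhysicalSocialNetwork:
    """Sketch of G(S, V, E); links between people are assumed undirected."""
    people: Set[str] = field(default_factory=set)                           # V
    edges: Set[FrozenSet[str]] = field(default_factory=set)                 # E
    trajectories: Dict[str, List[GpsRecord]] = field(default_factory=dict)  # S

    def add_friendship(self, a: str, b: str) -> None:
        self.people.update((a, b))
        self.edges.add(frozenset((a, b)))

    def add_record(self, person: str, record: GpsRecord) -> None:
        self.people.add(person)
        trajectory = self.trajectories.setdefault(person, [])
        trajectory.append(record)
        # A trajectory keeps its records in ascending timestamp order.
        trajectory.sort(key=lambda r: r.timestamp)

    def are_friends(self, a: str, b: str) -> bool:
        return frozenset((a, b)) in self.edges
```

Re-sorting on every insert keeps the sketch short; a real loader would more likely sort each trajectory once after reading all records.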


Problem Definition

• Given G(S, V, E), and a particular query posed by person v∗

• Return a ranked list of people nodes in V such that, for each element v′ in the list: $\langle v^{*}, v' \rangle \notin E$

• What’s more, the ranking score in the process should be relevant to both the GPS trajectory set S and the social network (V, E)


Geo-Friends Finding Framework: 3 Steps

• GPS pattern extraction
  • Convert raw, noisy GPS data to meaningful and representative GPS patterns

• Pattern-based heterogeneous information network building
  • Combine geographical and social information together in one network

• Random walk with restart on the network
  • Use random walk score to measure similarity between people vertices


GPS Pattern Extraction

• Based on empirical observations and heuristics, we propose four different GPS patterns to capture this information

• First, convert the raw GPS trajectory dataset S to a categorical dataset Scat and a sequential dataset Sseq
  • Scat : Discard temporal information and keep the discretized locations as an unordered set
  • Sseq : Locations are sequentially connected in the order of their timestamps
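
The conversion can be sketched as below. The slides do not say how locations are discretized, so the fixed latitude/longitude grid, the `cell_deg` parameter, and the collapsing of consecutive repeats are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]
Record = Tuple[float, float, float]  # (timestamp, lat, lon), hypothetical layout


def to_cell(lat: float, lon: float, cell_deg: float = 0.01) -> Cell:
    """Map a coordinate to a grid cell (assumed discretization scheme)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def build_sseq(trajectories: Dict[str, List[Record]],
               cell_deg: float = 0.01) -> Dict[str, List[Cell]]:
    """Sseq: per person, the visited cells in ascending timestamp order."""
    sseq: Dict[str, List[Cell]] = {}
    for person, records in trajectories.items():
        cells: List[Cell] = []
        for _, lat, lon in sorted(records):  # tuples sort by timestamp first
            cell = to_cell(lat, lon, cell_deg)
            if not cells or cells[-1] != cell:  # collapse consecutive repeats
                cells.append(cell)
        sseq[person] = cells
    return sseq


def build_scat(sseq: Dict[str, List[Cell]]) -> Dict[str, Set[Cell]]:
    """Scat: per person, the unordered set of visited cells (time discarded)."""
    return {person: set(cells) for person, cells in sseq.items()}
```

Deriving Scat from Sseq keeps both datasets on the same discretization, so a location in one always has a counterpart in the other.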


FL-Pattern

• FL-Pattern: A closed frequent pattern with support ≥ 2 in Scat is defined as a Frequent Location Pattern

• Frequent patterns in Scat can be generated using FP-Growth

• Heuristic: GPS locations can reflect people’s interests, and people tend to go to their interest-related locations more often

• If two people share common locations, which suggests they might share common interests, the probability that they become friends is higher.
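
To make the definition concrete, the sketch below enumerates closed frequent location patterns on a toy Scat. It is a brute-force stand-in, not FP-Growth: it relies on the fact that every closed itemset is the intersection of the location sets of some group of people. The function name and example data are hypothetical.

```python
from typing import Dict, FrozenSet, Hashable, Set


def closed_frequent_location_patterns(
        scat: Dict[str, Set[Hashable]],
        min_support: int = 2) -> Dict[FrozenSet[Hashable], int]:
    """Return {pattern: support} for closed patterns with support >= min_support.

    Support is the number of people whose location set contains the pattern.
    """
    # Closure under intersection: after processing each person, `closed`
    # holds the intersection of every non-empty group of people seen so far.
    closed: Set[FrozenSet[Hashable]] = set()
    for locations in scat.values():
        current = frozenset(locations)
        closed |= {current & other for other in closed}
        closed.add(current)

    result: Dict[FrozenSet[Hashable], int] = {}
    for pattern in closed:
        if not pattern:
            continue  # the empty pattern carries no information
        support = sum(1 for locations in scat.values() if pattern <= locations)
        if support >= min_support:
            result[pattern] = support
    return result


example_scat = {
    "alex": {"gym", "game_store", "cafe"},
    "david": {"gym", "game_store"},
    "carlos": {"gym", "office"},
}
fl_patterns = closed_frequent_location_patterns(example_scat)
```

This enumeration can grow exponentially with the number of people, which is why a dedicated miner such as FP-Growth is the practical choice on real data.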


FT-Pattern

• FT-Pattern: A closed sequential pattern with support ≥ 2 and length ≥ 2 in Sseq is a Frequent Trajectory Pattern

• Sequential patterns in Sseq can be generated using PrefixSpan

• Heuristic: GPS trajectory segments indicate people’s habits and routines

• People who share similar routines tend to become friends
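
The FT-Pattern definition can be illustrated with a small brute-force miner. This is not PrefixSpan: it enumerates subsequences directly, and the `max_length` cap is an assumption added to keep the enumeration bounded, so closedness is only checked among patterns up to that length.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Sequence, Tuple

Pattern = Tuple[Hashable, ...]


def is_subsequence(pattern: Sequence[Hashable], seq: Sequence[Hashable]) -> bool:
    """True if `pattern` occurs in `seq` in order (gaps allowed)."""
    it = iter(seq)
    return all(item in it for item in pattern)


def closed_frequent_trajectory_patterns(
        sseq: Dict[str, List[Hashable]],
        min_support: int = 2,
        min_length: int = 2,
        max_length: int = 4) -> Dict[Pattern, int]:
    """Return {pattern: support} for closed sequential patterns."""
    # Candidate patterns: every subsequence of an allowed length.
    candidates = set()
    for seq in sseq.values():
        for k in range(min_length, min(max_length, len(seq)) + 1):
            for idx in combinations(range(len(seq)), k):
                candidates.add(tuple(seq[i] for i in idx))

    # Support = number of people whose sequence contains the pattern.
    frequent = {}
    for pattern in candidates:
        support = sum(is_subsequence(pattern, s) for s in sseq.values())
        if support >= min_support:
            frequent[pattern] = support

    # Closed = no longer frequent pattern contains it with the same support.
    return {
        p: s for p, s in frequent.items()
        if not any(len(q) > len(p) and t == s and is_subsequence(p, q)
                   for q, t in frequent.items())
    }


example_sseq = {
    "alex": ["home", "gym", "cafe", "office"],
    "david": ["home", "gym", "office"],
    "bob": ["cafe", "home"],
}
ft_patterns = closed_frequent_trajectory_patterns(example_sseq)
```

In the toy data, the three length-2 patterns shared by Alex and David are absorbed by the longer routine home, gym, office, which has the same support.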


FLT-Pattern

• FLT-Pattern: For each FL-Pattern, if its locations share the same timestamps in all corresponding GPS trajectories, and no super-pattern with the same support can be generated by adding another time-constrained location, the pattern is a Frequent Location with Time Constraint Pattern

• Heuristic: If two people share the same locations at the same timestamps in their GPS trajectories, they should be geographically related.


FTT-Pattern

• FTT-Pattern: Similarly to the FLT-Pattern, a Frequent Trajectory with Time Constraint Pattern is defined as a closed sequential pattern with support ≥ 2 and length ≥ 2 in Sseq that covers the same time period in the corresponding GPS trajectories

• Heuristic: Two people sharing the same routine in a specific time period indicates that they are hanging out during that period

• If two people hang out, the probability of their becoming geo-friends is higher


Pattern-Based Social Network

• Build a pattern-based heterogeneous information network by combining GPS patterns and social network structures

• Given G(S, V, E), first discard raw GPS trajectory set S

• Then for each GPS pattern, create an additional node p, and link corresponding person node v with p if this GPS pattern exists in person v’s GPS trajectory history


Pattern-Based Social Network (2)

• Create a new edge <v, p>, and add it to E′. The set E′ contains three types of edges: edges between people, edges from person nodes to pattern nodes, and edges from pattern nodes to person nodes.
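
A minimal sketch of this construction follows. The node tagging (`"person"` / `"pattern"`), the function name, and the choice to store each person-to-person link in both directions are illustrative assumptions.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Node = Tuple[str, Hashable]   # ("person", id) or ("pattern", id)
Edge = Tuple[Node, Node]      # directed (source, target)


def build_pattern_network(
        friend_edges: Iterable[Tuple[str, str]],
        person_patterns: Dict[str, Iterable[Hashable]],
) -> Tuple[Set[Node], Set[Edge]]:
    """Combine the social network and GPS patterns into one directed graph."""
    edges: Set[Edge] = set()

    # Type 1: edges between people (kept in both directions here).
    for a, b in friend_edges:
        edges.add((("person", a), ("person", b)))
        edges.add((("person", b), ("person", a)))

    # Types 2 and 3: person -> pattern and pattern -> person, added for every
    # pattern that occurs in the person's GPS trajectory history.
    for person, patterns in person_patterns.items():
        for pattern in patterns:
            edges.add((("person", person), ("pattern", pattern)))
            edges.add((("pattern", pattern), ("person", person)))

    nodes: Set[Node] = {node for edge in edges for node in edge}
    nodes |= {("person", person) for person in person_patterns}
    return nodes, edges
```

Two people who are not friends but share a pattern node become reachable from each other in two hops, which is what lets geographical evidence influence the later random walk.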


Pattern Refinement

• Adding a large number of GPS patterns without selection may badly degrade performance

• Common locations, e.g., bus stops and hospitals, carry no social similarity

• Instead of manually refining patterns, we employ an entropy-based thresholding measure* to refine and select discriminative GPS patterns

• This method filters out patterns with high frequency and low length

* J.N. Kapur, P.K. Sahoo and A.K.C. Wong. A new method for gray-level picture thresholding using the entropy of the histogram. In Computer Vision, Graphics, and Image Processing, March 1985.
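
The cited thresholding idea picks the histogram split that maximizes the summed entropies of the two sides. The sketch below implements that criterion on a plain list of bin counts. How the slides map patterns onto a histogram is not stated, so `refine_by_support`, which bins patterns by support and drops those above the threshold, is purely an assumed usage.

```python
import math
from typing import Dict, Hashable, List, Optional


def kapur_threshold(hist: List[int]) -> Optional[int]:
    """Return the bin index t maximizing H(bins <= t) + H(bins > t)."""
    total = sum(hist)
    probs = [h / total for h in hist]
    best_t, best_score = None, float("-inf")
    cum = 0.0
    for t in range(len(probs) - 1):
        cum += probs[t]
        if cum <= 0.0 or cum >= 1.0:
            continue  # one side would be empty
        low = -sum((p / cum) * math.log(p / cum)
                   for p in probs[: t + 1] if p > 0)
        high = -sum((p / (1 - cum)) * math.log(p / (1 - cum))
                    for p in probs[t + 1:] if p > 0)
        if low + high > best_score:
            best_t, best_score = t, low + high
    return best_t


def refine_by_support(pattern_support: Dict[Hashable, int]) -> Dict[Hashable, int]:
    """Assumed usage: keep patterns whose support is at or below the threshold."""
    hist = [0] * (max(pattern_support.values()) + 1)
    for support in pattern_support.values():
        hist[support] += 1
    t = kapur_threshold(hist)
    if t is None:
        return dict(pattern_support)
    return {p: s for p, s in pattern_support.items() if s <= t}
```

On a toy input where two patterns are shared by nearly everyone (say a bus stop and a hospital) and the rest by two or three people, the threshold falls between the two groups and the ubiquitous patterns are dropped.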


Edge Weights: between Pattern Nodes and Person Nodes

• After the construction of the heterogeneous information network, edge weights between nodes need to be defined

• From different types of GPS pattern nodes to person nodes

• Nbp(v) is the set of pattern nodes connected to person node v

• length(p) denotes the length of pattern p

• timespan(p) denotes the time span of a time-constrained pattern p

• Parameters α, β, γ and θ control pattern importance


Edge Weights (2)

• From pattern nodes to person nodes
  • Nbv(p) denotes the set of person nodes connected to pattern node p

• From person nodes to person nodes
  • Nbv(v) denotes the set of person nodes connected to person node v


Transition Matrix

• In order to apply random walk with restart on the network, we need to convert the network into a transition matrix and then normalize the edge weights of pattern nodes

• Pr(V) is a |V| × |V| matrix representing the transition probability from person nodes to person nodes

• Pr(A) is a |P| × |V| matrix representing the transition probability from GPS pattern nodes to person nodes

• Pr(B) is a |V| × |P| matrix representing the transition probability from person nodes to GPS pattern nodes
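
One plausible way to put the three blocks together is a single (|V| + |P|) × (|V| + |P|) matrix with people first and patterns second, and no pattern-to-pattern transitions. The block layout and the plain row normalization below are assumptions; the slides do not show the exact normalization.

```python
from typing import List

Matrix = List[List[float]]


def row_normalize(matrix: Matrix) -> Matrix:
    """Scale each row to sum to 1; all-zero rows are left unchanged."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([x / total for x in row] if total > 0 else list(row))
    return normalized


def assemble_transition_matrix(w_vv: Matrix, w_vp: Matrix, w_pv: Matrix) -> Matrix:
    """Stack raw edge weights into [[V->V, V->P], [P->V, 0]] and normalize rows.

    w_vv: |V| x |V| person-to-person weights  (becomes Pr(V))
    w_vp: |V| x |P| person-to-pattern weights (becomes Pr(B))
    w_pv: |P| x |V| pattern-to-person weights (becomes Pr(A))
    """
    n_v, n_p = len(w_vv), len(w_pv)
    top = [list(w_vv[i]) + list(w_vp[i]) for i in range(n_v)]
    bottom = [list(w_pv[j]) + [0.0] * n_p for j in range(n_p)]
    return row_normalize(top + bottom)
```

For example, with two people who are friends and one pattern held only by the first person, `assemble_transition_matrix([[0, 1], [1, 0]], [[1], [0]], [[1, 1]])` yields a 3 × 3 row-stochastic matrix.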


Why Choose Random Walk with Restart

• Random Walk with Restart can simulate the following aspects of friend finding in a GPS-based social network:

• If a GPS pattern contains more geographical information, the in-coming probability from person nodes to this pattern should be higher, which increases the probability of moving from one person to another via this GPS pattern

• If two people share more GPS patterns, the overall probability for one person to link to another via these GPS pattern nodes is higher

• If a GPS pattern is rare, the out-going probability of its node is larger, so that people connected to this pattern have a higher probability of being linked together


Random Walk with Restart

• Denote the query person as v∗. The random walk process can be represented as:

• $R_N$ is a vector that represents the link relevance from all the nodes to the query person v*

• $R_N^{(t)}$ represents the link relevance of each node at the t-th iteration

• We assign $R_N^{(0)}(v^{*}) = 1$, where v* is the query node, and set all the other elements to 0
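
The update formula itself is not reproduced in this transcript. As a hedged illustration, the sketch below uses the textbook random-walk-with-restart iteration, in which the walker follows the row-stochastic transition matrix with probability 1 - c and jumps back to the query node with probability c. The restart value, tolerance, and the `recommend` helper are assumptions, not the authors' settings.

```python
from typing import List, Set

Matrix = List[List[float]]


def random_walk_with_restart(P: Matrix, query: int, restart: float = 0.15,
                             tol: float = 1e-10, max_iter: int = 1000) -> List[float]:
    """Iterate r <- c * e + (1 - c) * P^T r until the L1 change is below tol."""
    n = len(P)
    e = [0.0] * n
    e[query] = 1.0          # initial vector: 1 at the query node, 0 elsewhere
    r = e[:]
    for _ in range(max_iter):
        nxt = [restart * e[j]
               + (1 - restart) * sum(r[i] * P[i][j] for i in range(n))
               for j in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, r)) < tol:
            return nxt
        r = nxt
    return r


def recommend(scores: List[float], query: int, n_people: int,
              friends_of_query: Set[int], k: int = 5) -> List[int]:
    """Rank person nodes (indices 0..n_people-1) that are not yet linked to the query."""
    candidates = [v for v in range(n_people)
                  if v != query and v not in friends_of_query]
    return sorted(candidates, key=lambda v: scores[v], reverse=True)[:k]


# Toy network: people 0, 1, 2 and pattern node 3.
# 0 and 1 are friends; 0 and 2 share pattern 3.
toy_P = [
    [0.0, 0.5, 0.0, 0.5],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.5, 0.0, 0.5, 0.0],
]
toy_scores = random_walk_with_restart(toy_P, query=0)
toy_recommendation = recommend(toy_scores, query=0, n_people=3, friends_of_query={1})
```

Excluding existing friends in `recommend` mirrors the problem definition, where every returned v′ must not already be linked to v*.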


Datasets

• We generate 4 synthetic datasets with different sizes, attributes and distributions in order to cover different scenarios and thoroughly test our framework

• We also apply our method to the MIT Reality Mining dataset


Competitor Methods

• Random: random selection

• Same Edge: choose friends based on the number of common friends

• GPS Similarity: choose friends by measuring GPS location and trajectory similarity

• Random Walk without GPS Patterns: Recommend friends by applying random walk with restart on the original social network

• Bluetooth (only MIT dataset): Recommend friends by returning people who share high meeting frequency


Performance (1)

[Figure: precision and recall on the gpsnet120 dataset]


Performance (2)

[Figure: precision and recall on the MIT dataset]


Performance (3)

[Figure: precision-recall curves on the gpsnet120 dataset and the MIT dataset, comparing Random Walk with Restart without GPS information against our method]

Please refer to the paper for more experimental results and analysis


Conclusions

• Propose the problem of identifying geographically related friends, together with a three-step statistical framework that combines geo-information with social analysis

• Future work
  • Domain-oriented GPS pattern definition
  • Friend recommendation based on the user and his/her interests
  • Real-time friend recommendation by tracking user GPS usage on the fly