Vincent W. Zheng, Yu Zheng, Xing Xie, Qiang Yang Hong Kong University of Science and Technology...

25
Collaborative Location and Activity Recommendations with GPS History Data Vincent W. Zheng , Yu Zheng , Xing Xie , Qiang Yang Hong Kong University of Science and Technology Microsoft Research Asia This work was done when Vincent was doing internship in Microsoft Research A 1

Transcript of Vincent W. Zheng, Yu Zheng, Xing Xie, Qiang Yang Hong Kong University of Science and Technology...

Page 1: Vincent W. Zheng, Yu Zheng, Xing Xie, Qiang Yang Hong Kong University of Science and Technology Microsoft Research Asia This work was done when Vincent.

1

Collaborative Location and Activity Recommendations

with GPS History Data

Vincent W. Zheng†, Yu Zheng‡, Xing Xie‡, Qiang Yang†

†Hong Kong University of Science and Technology‡Microsoft Research Asia

This work was done when Vincent was doing internship in Microsoft Research Asia.

Page 2: Vincent W. Zheng, Yu Zheng, Xing Xie, Qiang Yang Hong Kong University of Science and Technology Microsoft Research Asia This work was done when Vincent.

2

Introduction and MotivationUsers now sharing GPS trajectories on the WebWisdom of crowd: incorporating users’ knowledge

A GPS trajectory

A comment

Travel experience:Some places are more popular than the othersUser activities:“The food is delicious” --> dining at that place

Page 3: Vincent W. Zheng, Yu Zheng, Xing Xie, Qiang Yang Hong Kong University of Science and Technology Microsoft Research Asia This work was done when Vincent.

3

Goal: To Answer 2 Typical Questions

A recommended location

Recommended activity list

Recommended location list

Location query

Activity query

Q2: where should I go if I want to do something?

(Location recommendation given activity query)

Q1: what can I do there if I visit some place?

(Activity recommendation given location query)

Page 4: Vincent W. Zheng, Yu Zheng, Xing Xie, Qiang Yang Hong Kong University of Science and Technology Microsoft Research Asia This work was done when Vincent.

4

Problem DefinitionHow to well model the location-activity relation

Encode it into a matrix

Example

ActivitiesL

ocat

ion

sActivities

Loc

atio

ns

Location recommendation Activity recommendation

An entry denotes how popular an activity is performed at a location

Ranking along theColumns or rows

5

4

1

5 3 2

5 4 2

4 3 1

1 2 6

Forbidden City

Bird’s Nest

Zhongguancun

Location recommendationTourism:Forbidden City > Bird’s Nest > Zhongguancun

TourismExhibitionShopping

Activity recommendationForbidden City:Tourism > Exhibition > Shopping

Page 5: Vincent W. Zheng, Yu Zheng, Xing Xie, Qiang Yang Hong Kong University of Science and Technology Microsoft Research Asia This work was done when Vincent.

5

ContributionsIn practice, it’s sparse!

User comments are few (in out dataset, <0.6% entries are filled)

Features

Loca

tion

s

Activities

Loca

tion

s

Activities

Act

ivit

ies

Location functionalities Activity correlations

5 ? ?

? 1 ?

1 ? 6

Forbidden City

TourismExhibitionShopping

Bird’s Nest

Zhongguancun

?

Page 6: Vincent W. Zheng, Yu Zheng, Xing Xie, Qiang Yang Hong Kong University of Science and Technology Microsoft Research Asia This work was done when Vincent.

6

System Architecture

GPS Log

Grid-based Clustering

Collaborative Location and Activity Recommender

Laptops and PCs

PDAs and Smart-phones

1

2

3

54

1 Data Inputs

6

2 Stay Region Extration 3 Location-Activity Extraction 4 Location-Feature5 Activity Correlation Mining 6 Collaborative Loc. & Act. Recommendations

POI Category Database

World Wide Web

Location Feature Extraction

Activity Correlation Mining

Location-based Activity Statistics

Stay Regions Location-Feature Matrix

Activity-Activity Matrix

Location-Activity Matrix

Extraction

Page 7: Vincent W. Zheng, Yu Zheng, Xing Xie, Qiang Yang Hong Kong University of Science and Technology Microsoft Research Asia This work was done when Vincent.

7

Road Map

GPS Log

Grid-based Clustering

Collaborative Location and Activity Recommender

Laptops and PCs

PDAs and Smart-phones

1

2

3

54

1 Data Inputs

6

2 Stay Region Extration 3 Location-Activity Extraction 4 Location-Feature5 Activity Correlation Mining 6 Collaborative Loc. & Act. Recommendations

POI Category Database

World Wide Web

Location Feature Extraction

Activity Correlation Mining

Location-based Activity Statistics

Stay Regions Location-Feature Matrix

Activity-Activity Matrix

Location-Activity Matrix

Extraction

?

Page 8: Vincent W. Zheng, Yu Zheng, Xing Xie, Qiang Yang Hong Kong University of Science and Technology Microsoft Research Asia This work was done when Vincent.

8

GPS Log ProcessingGPS trajectories*

p4

p3

p5

p6

p7

a stay point sp1

P2

Latitude, Longitude, Arrival Timestampp1: 39.975, 116.331, 9/9/2009 17:54p2: 39.978, 116.308, 9/9/2009 18:08 …pK: 39.992, 116.333, 9/12/2009 13:56

a GPS trajectory

stay region r

Raw GPS points Stay points

• Stand for a geo-spot where a user has stayed for a while• Preserve the sequence and vicinity info

Stay regions

• Stand for a geo-region that we may recommend • Discover the meaningful locations

* In GPS logs, we have some user comments associated with the trajectories. Shown later.

Page 9: Vincent W. Zheng, Yu Zheng, Xing Xie, Qiang Yang Hong Kong University of Science and Technology Microsoft Research Asia This work was done when Vincent.

9

Stay Region ExtractionGrid-based clustering

Greedy algorithmEasy, fast and effective

O(n log n), due to sorting Return fixed-size regions

ExampleA big shopping area (“Zhonggancun”) in west Beijing,

>6km2

3 10 2

5 1 8 6

4

10

8

(2) DBSCAN[ε=0.001, MinPts = 4]

(1) K-means[K = 200]

(3) OPTICS[ε=0.05, MinPts = 4]

(4) Grid clustering[d=300]

Page 10: Vincent W. Zheng, Yu Zheng, Xing Xie, Qiang Yang Hong Kong University of Science and Technology Microsoft Research Asia This work was done when Vincent.

10

Location-Activity ExtractionLocation-activity matrix

Activity: tourism

“We took a tour bus to see around along the forbidden city moat …”

GPS: “39.903, 116.391, 14/9/2009 15:25”

Stay Region: “39.910, 116.400 (Forbidden City)”

+1Forbidden City

Tourism

Zhongguancun

Food

Location-Activity Matrix

User comments are few -> this matrix is sparse!Our objective: to fill this matrix.

Page 11: Vincent W. Zheng, Yu Zheng, Xing Xie, Qiang Yang Hong Kong University of Science and Technology Microsoft Research Asia This work was done when Vincent.

11

Road Map

GPS Log

Grid-based Clustering

Collaborative Location and Activity Recommender

Laptops and PCs

PDAs and Smart-phones

1

2

3

54

1 Data Inputs

6

2 Stay Region Extration 3 Location-Activity Extraction 4 Location-Feature5 Activity Correlation Mining 6 Collaborative Loc. & Act. Recommendations

POI Category Database

World Wide Web

Location Feature Extraction

Activity Correlation Mining

Location-based Activity Statistics

Stay Regions Location-Feature Matrix

Activity-Activity Matrix

Location-Activity Matrix

Extraction

?

Page 12: Vincent W. Zheng, Yu Zheng, Xing Xie, Qiang Yang Hong Kong University of Science and Technology Microsoft Research Asia This work was done when Vincent.

12

Location Feature ExtractionLocation features: Points of Interests (POIs)

restaurant

bank

shopping mall

restaurant

Stay Region: “39.980, 116.306 (Zhongguancun)”

[restaurant, bank, shop] = [3, 1, 1]

TF-IDF style normalization*: feature = [0.13, 0.32, 0.18]restaurant

TF-IDF (Term-Frequency Inverse Document Frequency):

Example: Assume in 10 locations, 8 have restaurants (less distinguishing), while 2 have banks and 4 have shops:

tf-idf(restaurant) = (3/5)*log(10/8) = 0.13tf-idf(bank) = (1/5)*log(10/2) = 0.32tf-idf(shop) = (1/5)*log(10/4) = 0.18

0.13

0.32

Forbidden City

restaurantbank

Location-Feature Matrix

Zhongguancun

Page 13: Vincent W. Zheng, Yu Zheng, Xing Xie, Qiang Yang Hong Kong University of Science and Technology Microsoft Research Asia This work was done when Vincent.

13

Road Map

GPS Log

Grid-based Clustering

Collaborative Location and Activity Recommender

Laptops and PCs

PDAs and Smart-phones

1

2

3

54

1 Data Inputs

6

2 Stay Region Extration 3 Location-Activity Extraction 4 Location-Feature5 Activity Correlation Mining 6 Collaborative Loc. & Act. Recommendations

POI Category Database

World Wide Web

Location Feature Extraction

Activity Correlation Mining

Location-based Activity Statistics

Stay Regions Location-Feature Matrix

Activity-Activity Matrix

Location-Activity Matrix

Extraction

?

Page 14: Vincent W. Zheng, Yu Zheng, Xing Xie, Qiang Yang Hong Kong University of Science and Technology Microsoft Research Asia This work was done when Vincent.

14

Activity Correlation ExtractionHow possible for one activity to happen, if another activity

happens? Automatically mined from the Web, potentially useful when

#(act) is large

Most mined correlations are reasonable. Example: “Tourism” with other activities.

Web search (from Bing) Human design (average on 8 subjects)-0.0999999999999994

5.82867087928207E-16

0.100000000000001

0.200000000000001

0.300000000000001

0.400000000000001

0.500000000000001

0.600000000000001

FoodSports

Movie

Shopping

-0.0999999999999994

5.82867087928207E-16

0.100000000000001

0.200000000000001

0.300000000000001

0.400000000000001

0.500000000000001

0.600000000000001

Food

SportsMovie

Shopping

“Tourism and Amusement” and

“Food and Drink”

Correlation = h(1.16M),where h is a normalization func.

Tourism-Shoppingmore likely to happen

together thanTourism-Sports

Page 15: Vincent W. Zheng, Yu Zheng, Xing Xie, Qiang Yang Hong Kong University of Science and Technology Microsoft Research Asia This work was done when Vincent.

15

Road Map

GPS Log

Grid-based Clustering

Collaborative Location and Activity Recommender

Laptops and PCs

PDAs and Smart-phones

1

2

3

54

1 Data Inputs

6

2 Stay Region Extration 3 Location-Activity Extraction 4 Location-Feature5 Activity Correlation Mining 6 Collaborative Loc. & Act. Recommendations

POI Category Database

World Wide Web

Location Feature Extraction

Activity Correlation Mining

Location-based Activity Statistics

Stay Regions Location-Feature Matrix

Activity-Activity Matrix

Location-Activity Matrix

Extraction

Page 16: Vincent W. Zheng, Yu Zheng, Xing Xie, Qiang Yang Hong Kong University of Science and Technology Microsoft Research Asia This work was done when Vincent.

16

Solution: Collaborative Location and Activity Recommendation (CLAR)Collaborative filtering, with collective matrix

factorization

Low rank approximation, by minimizing

After getting U* and V*, reconstruct the incomplete XEfficient: complexity is linear to #(loc), can handle large data

where U, V and W are the low-dimensional representations for the locations, activities and location features, respectively. I is an indicatory matrix.

Loc

atio

ns

Features Activities

X = UVTY = UWT Z = VVTU V

Activities

Loc

atio

ns

Act

ivit

ies

Page 17: Vincent W. Zheng, Yu Zheng, Xing Xie, Qiang Yang Hong Kong University of Science and Technology Microsoft Research Asia This work was done when Vincent.

17

ExperimentsData

2.5 years (2007.4-2009.10) 162 users 13K GPS trajectories, 4M GPS

points, 140K kilometers 530 comments

Evaluation Invite 5 subjects to give ratings

independently Location recommendation

Measured on top 10 returned locations for each of the 5 activities

Activity recommendation Measured on the 5 activities for top

20 popular locations with most visits

Normalized discounted cumulative gain (nDCG)

8%

40%45%

7%

age<=22 22<age<=25

26<=age<29 age>=30

16%

16%

6%62%

Microsoft employeesOther companies' employeesGovernment staffsCollege students

Example: for a rating <1,3,0,2>

Page 18: Vincent W. Zheng, Yu Zheng, Xing Xie, Qiang Yang Hong Kong University of Science and Technology Microsoft Research Asia This work was done when Vincent.

18

Result 1: System performancesImpact of location feature information (i.e. λ1 @

Fig.11)Impact of activity correlation information (i.e. λ2

@ Fig.12)Observations

The weight for each information source should be moderate

Using both sources outperforms using single source (i.e. λ1=0, λ2=0)

Page 19: Vincent W. Zheng, Yu Zheng, Xing Xie, Qiang Yang Hong Kong University of Science and Technology Microsoft Research Asia This work was done when Vincent.

19

Result 2: Baseline ComparisonSingle collaborative filtering (SCF)

Using only the location-activity matrix

Unifying collaborative filtering (UCF)Using all 3 matrices, but in a different wayFor each missing entry, combine the entries belonging to

the top N similar locations × top N similar activities in a weighted way

One-tail t-test p1<0.01, two-tail t-test p2<0.01

Page 20: Vincent W. Zheng, Yu Zheng, Xing Xie, Qiang Yang Hong Kong University of Science and Technology Microsoft Research Asia This work was done when Vincent.

20

Result 3: Impact of stay region sizeStay region: cluster of stay points, i.e. “locations”We propose a grid-based clustering algorithm to get

stay regionsd=300 implies a size of 300 × 300 m2

Stay region size Should not be too small (two regions refer to same place) or

too big (hard to find)

One-tail t-test p1<0.05, two-tail t-test p2<0.05

Page 21: Vincent W. Zheng, Yu Zheng, Xing Xie, Qiang Yang Hong Kong University of Science and Technology Microsoft Research Asia This work was done when Vincent.

21

Result 4: Impact of user number#(user)↑ -> data↑ -> #(loc)↑

Run on PC with dual core CPU, 2.33GHz, 2G RAMRunning time is linear to #(loc), converge fast (<300

iterations)

#(stay point) does not necessary linearly increase w.r.t. #(user)

#(s

tay

poin

t)

#(user) We are here.

perf

orm

an

ce

#(user)

Expected:

Page 22: Vincent W. Zheng, Yu Zheng, Xing Xie, Qiang Yang Hong Kong University of Science and Technology Microsoft Research Asia This work was done when Vincent.

22

Discussion 1Impact of the location types to activity

recommendationRecommend 5 activities for top 20 locations with most

visits Aggregate the evaluations and pick top 2 activities as

location types

0.8 0.85 0.9 0.95 1

nDCG@5

food & sports area

sports & tourism area

shopping & movie area

• More often happen• Strong dependency on location features - More likely to have many restaurant POIs - “sports” with parks and stadium

• Usually more comments for ”tourism”

• Usually also suitable for food hunting and sometimes tourism, dominated by them• Fewer comments

Page 23: Vincent W. Zheng, Yu Zheng, Xing Xie, Qiang Yang Hong Kong University of Science and Technology Microsoft Research Asia This work was done when Vincent.

23

Discussion 2Impact of the activity types to location

recommendationRecommend top 10 locations for each activity

0.7 0.75 0.8 0.85 0.9 0.95 1

nDCG@10

tourism & amusement

sports & exercises

food & drinks

movie & shows

shopping

• Popular places with higher scores are more likely to be recommended• Many of them are available for shows/movies

• Usually more comments

• More often happen

• Popular places are usually not suitable for “sports & exercises”• Usually fewer comments

Page 24: Vincent W. Zheng, Yu Zheng, Xing Xie, Qiang Yang Hong Kong University of Science and Technology Microsoft Research Asia This work was done when Vincent.

24

ConclusionWe show how to mine knowledge from the real-world GPS

data to answer two typical questions:If we want to do something, where shall we go?If we visit some place, what can we do there?

We evaluated our system on a large GPS dataset>7% improvement on activity recommendation>20% improvement on location recommendation

over the simple baseline without exploiting any additional infoFuture Work

Incorporate user features to provide personalized recommendation

Establish a comprehensive social network based on user activity and location history

Page 25: Vincent W. Zheng, Yu Zheng, Xing Xie, Qiang Yang Hong Kong University of Science and Technology Microsoft Research Asia This work was done when Vincent.

25

Thanks!Questions?

Vincent W. [email protected]

http://www.cse.ust.hk/~vincentz