
Page 1: Wang-Chien Lee Pervasive Data Access ( i PDA ) Group Pennsylvania State University wlee@cse.psu.edu

Mining Social Network Big Data

Wang-Chien Lee
Intelligent Pervasive Data Access (iPDA) Group
Pennsylvania State University
wlee@cse.psu.edu


Research Dimensions

Intelligent Pervasive Data Access
- Networks
- Mobility
- Data

Industry Day, 4/3/14


Research Agenda

Developing data management techniques for supporting complex services in networking and mobile environments:
- Location-Based Services
- Road/Transportation Networks
- Sensor Data Management
- Peer-to-Peer Data Management
- Wireless Data Broadcast and Mobile Access
- Social Networks


Big Data Landscape


Social Media


Location-based Social Networks

Important aspects:
- Users (social network)
- Places (locations)
- Who visits where, in the form of check-in and trajectory logs


LBSN Applications & Research Opportunities

- LBSN users can track and share their locations and relevant information.
- Collective social intelligence can be leveraged from user-generated location data to enable novel applications.
- LBSN applications
  - Suggesting the best restaurants, finding popular hiking routes, or forming a biking community.
  - Recommendation services for locations, activities, trip planning, friends, etc.
- Research opportunities
  - Techniques for LBSN applications, social network analysis, user profiling, data management and mining, pervasive computing, etc., are urgently needed.


Point-of-Interest Recommendation

- POI recommendation
  - Helps a user explore new POIs
  - Helps local businesses gain customers
- Example: where to have dinner tonight?
- Requirements
  - Interests, e.g., seafood
  - Geo-proximity, e.g., not too far away
  - Real-time, i.e., time is money


Collaborative Filtering

- Treats POIs as items.
- The idea is that a user's preferences can be inferred from other users who exhibited similar visiting behavior toward POIs in previous check-in activities.
- The key issue is to find similar users and similar places/POIs effectively and efficiently.


Social & Geo Influences

- POI recommendation in LBSNs is more than an item recommendation problem.
- Social network
  - People may turn to friends for suggestions.
- Geographical proximity
  - Tobler's First Law of Geography: "Everything is related to everything else, but near things are more related than distant things."
  - People may go to places near their home, office, or favored places.


Our Approach

Incorporate the following three factors:
- User preference.
- Social influence from friends, who play a role in a user's activities.
- Geographical influence that exists in user activities.

[Figure: check-ins feed a DB; the POI recommendation system combines user preference, social influence, and geo influence.]


User Preference

- Recommendation based on user preference, i.e., a pure collaborative filtering (CF) approach.
- Users with similar preferences are identified from the user-POI matrix.

User-POI matrix:

POI1 POI2 POI3 POI4 POI5
User1 X X X
User2 X X
User3 X X
User4 X X X
User5 X X
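A minimal sketch of user-based CF over a binary user-POI matrix may help make this concrete. The check-in data, the cosine similarity, and the names `cosine` and `cf_scores` are illustrative assumptions, not the matrix or the exact formula from the slide.

```python
from math import sqrt

# Hypothetical check-in data: user -> set of visited POIs (not the slide's matrix).
checkins = {
    "u1": {"p1", "p2", "p3"},
    "u2": {"p1", "p2"},
    "u3": {"p4", "p5"},
    "u4": {"p2", "p3", "p4"},
}

def cosine(a, b):
    """Cosine similarity between two binary check-in vectors stored as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def cf_scores(user, checkins):
    """Score each unvisited POI by the similarity-weighted votes of other users."""
    scores = {}
    total = 0.0
    for other, pois in checkins.items():
        if other == user:
            continue
        w = cosine(checkins[user], pois)
        total += w
        for p in pois - checkins[user]:
            scores[p] = scores.get(p, 0.0) + w
    if total == 0:
        return {}
    return {p: s / total for p, s in scores.items()}

scores_u2 = cf_scores("u2", checkins)
best_for_u2 = max(scores_u2, key=scores_u2.get)
```

Here `u2` is most similar to `u1`, so the POI that `u1` visited and `u2` has not (`p3`) receives the highest score.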


Social Influence

- Recommendation based on social influence, i.e., a social-influenced CF approach.
- The similarity function considers both the strength of the social tie and check-in similarity …

Friend-POI matrix:

POI1 POI2 POI3 POI4 POI5
User1 X X X
User2 X X
User4 X X X

User-POI matrix:

POI1 POI2 POI3 POI4 POI5
User1 X X X
User2 X X
User3 X X
User4 X X X
User5 X X X

[Figure: social graph over user1, user2, user3, user4, and user5.]
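One plausible shape for such a similarity function is a weighted mix of two Jaccard overlaps, sketched below. The slide does not give the formula, so the Jaccard measures, the weight `eta`, and all data are assumptions for illustration.

```python
def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def social_similarity(u, f, friends, checkins, eta=0.5):
    """Blend social-tie strength with check-in similarity (eta is a hypothetical weight)."""
    tie = jaccard(friends[u], friends[f])        # assumed proxy for tie strength
    visits = jaccard(checkins[u], checkins[f])   # check-in similarity
    return eta * tie + (1 - eta) * visits

# Hypothetical friendships and check-ins.
friends = {"u1": {"u2", "u4"}, "u2": {"u1", "u4"}, "u4": {"u1", "u2"}}
visited = {"u1": {"p1", "p2", "p3"}, "u2": {"p1", "p2"}, "u4": {"p2", "p3", "p4"}}

sim_u1_u2 = social_similarity("u1", "u2", friends, visited)
```

Restricting the CF neighborhood to a user's friends and weighting them by this value yields a friend-based variant of the user-preference score.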


Social Influence Selection Model

- User u picks a friend f, who may be u herself (i.e., f = u). This step captures social influence.
- User f generates a latent topic z. This step captures user preference.
- Latent topic z generates an item i and a descriptive word w.
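The three generative steps can be sketched as a sampling routine. This is only an illustration: the uniform choice of friend, the distribution tables, and the function name `generate_checkin` are assumptions; the actual model presumably learns these distributions from data.

```python
import random

def generate_checkin(u, friends, topic_dist, item_dist, word_dist, rng):
    """Sample (f, z, i, w) following the three steps of the generative story."""
    # Step 1: u picks an influencer f among her friends and herself (uniform here, an assumption).
    f = rng.choice(sorted(friends[u] | {u}))
    # Step 2: f's preference generates a latent topic z.
    topics, topic_w = zip(*sorted(topic_dist[f].items()))
    z = rng.choices(topics, weights=topic_w)[0]
    # Step 3: topic z generates an item i and a descriptive word w.
    items, item_w = zip(*sorted(item_dist[z].items()))
    words, word_w = zip(*sorted(word_dist[z].items()))
    i = rng.choices(items, weights=item_w)[0]
    w = rng.choices(words, weights=word_w)[0]
    return f, z, i, w

# Hypothetical distributions.
friends = {"u": {"f1", "f2"}}
topic_dist = {
    "u": {"food": 0.7, "outdoors": 0.3},
    "f1": {"food": 0.2, "outdoors": 0.8},
    "f2": {"food": 0.5, "outdoors": 0.5},
}
item_dist = {"food": {"seafood_place": 0.6, "noodle_bar": 0.4}, "outdoors": {"trailhead": 1.0}}
word_dist = {"food": {"dinner": 0.5, "seafood": 0.5}, "outdoors": {"hiking": 1.0}}

f, z, i, w = generate_checkin("u", friends, topic_dist, item_dist, word_dist, random.Random(0))
```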


Geographical Influence

- Users' check-ins exhibit spatial clustering.
- How likely are two of a user's check-in POIs to be at a given distance from each other?
- Let p1 and p2 denote two POIs and d(p1, p2) their distance; the probability is denoted Pr[d(p1, p2)].
- The observed distribution follows a power law.
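A power law Pr[d] = a * d^b is linear in log-log space, so its parameters can be estimated with ordinary least squares on (log d, log Pr). The sketch below assumes that fitting procedure and uses synthetic data; neither is taken from the slides.

```python
from math import exp, log

def fit_power_law(distances, probs):
    """Fit Pr[d] = a * d**b by least squares on log-transformed data."""
    xs = [log(d) for d in distances]
    ys = [log(p) for p in probs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                  # slope in log-log space = exponent
    a = exp(mean_y - b * mean_x)   # intercept in log-log space = log(a)
    return a, b

# Synthetic data generated from a known power law, for illustration only.
distances = [1.0, 2.0, 4.0, 8.0]
probs = [0.2 * d ** -1.5 for d in distances]
a, b = fit_power_law(distances, probs)
```

A negative exponent `b` corresponds to the clustering phenomenon: pairs of check-ins become rapidly less likely as the distance between them grows.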


Geographical Influence

Exploiting Geographical Influence for Recommendation

- User i's check-in history: Pi = {p1, p2, …}
- Which POI is the best candidate to explore?
  - Pr[q1|Pi] = ?
  - Pr[q2|Pi] = ?
  - Pr[q3|Pi] = ?

[Figure: user i's visited POIs p1 to p5 and candidate POIs q1, q2, q3.]
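One way to answer the question is to score a candidate q by multiplying the distance likelihoods between q and every POI in the history, treating the pairwise distances as independent. That independence assumption, the power-law parameters, the planar coordinates, and the function names below are all hypothetical; the score is unnormalized and only meant for ranking.

```python
from math import hypot

def pr_distance(d, a=0.2, b=-1.5, d_min=0.1):
    """Power-law distance likelihood with hypothetical parameters; d is clamped to avoid d = 0."""
    return a * max(d, d_min) ** b

def geo_score(q, history, coords):
    """Unnormalized score for candidate q: product of Pr[d(q, p)] over visited POIs p."""
    qx, qy = coords[q]
    score = 1.0
    for p in history:
        px, py = coords[p]
        score *= pr_distance(hypot(qx - px, qy - py))
    return score

# Hypothetical planar coordinates (e.g., kilometers on a local grid).
coords = {"p1": (0.0, 0.0), "p2": (1.0, 0.0), "q1": (1.0, 1.0), "q2": (5.0, 5.0)}
history = ["p1", "p2"]
ranked = sorted(["q1", "q2"], key=lambda q: geo_score(q, history, coords), reverse=True)
```

The candidate close to the user's past check-ins (`q1`) ranks ahead of the distant one.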


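Fusion Framework

- User's own preference gives score Su.
- Social influence gives score Ss.
- Geographical influence gives score Sg.
- Fusion combines Su, Ss, and Sg into a final score S.

[Figure: ranked lists of candidates q1, q2, q3 under Su, Ss, and Sg, fused into a single ranked list under S.]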
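A simple way to fuse the three scores is a convex combination, sketched below. The slide does not state the fusion rule, so the linear form, the weights `alpha` and `beta`, and the example scores are assumptions.

```python
def fuse(su, ss, sg, alpha=0.3, beta=0.3):
    """Linear fusion: S = (1 - alpha - beta) * Su + alpha * Ss + beta * Sg."""
    if alpha < 0 or beta < 0 or alpha + beta > 1:
        raise ValueError("alpha and beta must be non-negative and sum to at most 1")
    pois = set(su) | set(ss) | set(sg)
    return {
        p: (1 - alpha - beta) * su.get(p, 0.0)
        + alpha * ss.get(p, 0.0)
        + beta * sg.get(p, 0.0)
        for p in pois
    }

# Hypothetical per-factor scores for three candidate POIs.
su = {"q1": 0.9, "q2": 0.5, "q3": 0.1}   # user preference
ss = {"q1": 0.6, "q2": 0.8, "q3": 0.3}   # social influence
sg = {"q1": 0.6, "q2": 0.2, "q3": 0.7}   # geographical influence

fused = fuse(su, ss, sg)
ranking = sorted(fused, key=fused.get, reverse=True)
```

Each factor favors a different candidate here, and the fused score settles the final order. Setting `alpha = beta = 0` falls back to pure user-preference CF.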


Semantic Annotation of Places

- Tags are very useful! Tags can support:
  1) Location search
  2) Recommendation service
  3) Data cleaning
  4) …
- Tags are missing.

[Pie chart: places missing tags vs. places with tags; slices of 32.00% and 68.00%.]

The chart shows statistics summarized from our dataset collected from Whrrl. Statistics in our Foursquare dataset are similar.


Problem Description

- Given a database of user check-in logs <who, where, when> in which some places are tagged, infer tags for the remaining places (i.e., the places marked with a question mark in the figure).
- Automatically labeling places with appropriate tags is a very challenging issue!
- Our approach is to reduce the place semantic annotation problem to a classification problem.


The SAP Framework

- How to learn the classifier for a tag (or tag type)?
- Feature extraction is very important
  - Features explicitly describing places
  - Features implicitly correlating similar places (i.e., places with same/similar tags)
- Feature source? Check-in logs.

Classification process:

Place -> Feature Extraction Component (using check-in logs) -> binary classifiers for tags t1, t2, …, tm -> decisions for t1, t2, …, tm
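The classification process amounts to running one binary classifier per tag on the same feature vector and keeping the tags whose classifier says yes. In the sketch below, hand-written rules stand in for trained classifiers; the rules, feature names, and the function `annotate` are hypothetical.

```python
def annotate(features, classifiers):
    """Run every per-tag binary classifier and keep the tags that fire."""
    return sorted(tag for tag, accepts in classifiers.items() if accepts(features))

# Hypothetical hand-written rules standing in for trained binary classifiers.
classifiers = {
    "bar": lambda f: f["night_share"] > 0.6,
    "restaurant": lambda f: f["mealtime_share"] > 0.5,
    "gym": lambda f: f["max_checkins_single_user"] >= 10,
}

# Hypothetical feature vector for one untagged place.
place = {"night_share": 0.7, "mealtime_share": 0.2, "max_checkins_single_user": 3}
tags = annotate(place, classifiers)
```

Because each tag has its own decision, a place can receive several tags or none.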


Explicit Patterns (EP) Extraction

What are the explicit patterns associated with individual places?

EP feature list:
- Total number of check-ins
- Total number of unique visitors
- Maximum number of check-ins by a single user
- Daily probability of check-in
- Hourly probability of check-in
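The EP features can be computed directly from <who, where, when> records, as in this sketch. It assumes "daily probability" means the share of check-ins per day of week and "hourly probability" the share per hour of day; the log entries and the function name `ep_features` are illustrative.

```python
from collections import Counter
from datetime import datetime

def ep_features(place, logs):
    """Compute explicit-pattern features for one place from (who, where, when) logs."""
    visits = [(who, when) for who, where, when in logs if where == place]
    per_user = Counter(who for who, _ in visits)
    by_day = Counter(when.weekday() for _, when in visits)   # 0 = Monday
    by_hour = Counter(when.hour for _, when in visits)
    total = len(visits)
    return {
        "total_checkins": total,
        "unique_visitors": len(per_user),
        "max_checkins_single_user": max(per_user.values(), default=0),
        "daily_prob": [by_day[d] / total if total else 0.0 for d in range(7)],
        "hourly_prob": [by_hour[h] / total if total else 0.0 for h in range(24)],
    }

# Hypothetical check-in log.
logs = [
    ("alice", "bar", datetime(2014, 3, 28, 22, 0)),
    ("alice", "bar", datetime(2014, 3, 29, 23, 30)),
    ("bob", "bar", datetime(2014, 3, 28, 21, 15)),
    ("bob", "cafe", datetime(2014, 3, 31, 9, 0)),
]
bar = ep_features("bar", logs)
```

A place whose check-ins concentrate late at night and toward the weekend produces a profile like the one above, which a per-tag classifier can then pick up.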


Implicit Relatedness (IR) Extraction

- Are places really correlated? If yes, how do we extract the IR between places?
- Places that a user checks into at around the same time are probably in the same category.

[Figure: check-in log of a user over Day 1 to Day 8, with time of day from 00:00 to 23:59 on the other axis; check-ins are labeled Bars, Restaurant, Shopping, Gym, Health, Beauty, and Spa, and one untagged check-in is marked "?".]


Network of Related Places (NRP)

- Build an NRP by exploring the regularities in user-place and time-place interactions.

[Figure: Users-Places and Times-Places interactions -> Random Walk with Restart -> relatedness between places -> Network of Related Places (NRP).]
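Random walk with restart can be sketched on a small user-place graph as follows. The graph, the restart probability, the fixed iteration count, and the handling of dangling nodes are assumptions for illustration; the stationary visiting probabilities serve as relatedness to the start place.

```python
def random_walk_with_restart(adj, start, restart=0.15, iters=100):
    """Return approximate stationary visiting probabilities of a walk that restarts at `start`."""
    nodes = sorted(adj)
    p = {n: 1.0 if n == start else 0.0 for n in nodes}
    for _ in range(iters):
        nxt = {n: (restart if n == start else 0.0) for n in nodes}
        for n in nodes:
            total = sum(adj[n].values())
            if total == 0:
                # Dangling node: send its mass back to the start node.
                nxt[start] += (1 - restart) * p[n]
                continue
            for m, w in adj[n].items():
                nxt[m] += (1 - restart) * p[n] * w / total
        p = nxt
    return p

# Hypothetical bipartite user-place graph with unit edge weights.
edges = [
    ("user:1", "place:A"), ("user:1", "place:B"),
    ("user:2", "place:B"), ("user:2", "place:C"),
]
adj = {}
for a, b in edges:
    adj.setdefault(a, {})[b] = 1.0
    adj.setdefault(b, {})[a] = 1.0

relatedness = random_walk_with_restart(adj, "place:A")
```

Place B shares a visitor with place A, so it ends up more related to A than place C, which is reachable only through B. Linking each place to its most related places gives a network in the spirit of the NRP.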


Label Propagation on NRP

IR features:
- Tag 1: score1
- Tag 2: score2
- …
- Tag k: scorek

[Figure: an NRP whose places are tagged restaurant or shopping, with one untagged place marked "?"; label propagation assigns the untagged place Restaurant 0.66, Shopping 0.34.]
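A minimal label propagation sketch: tagged places keep their labels, and each untagged place repeatedly takes the edge-weighted average of its neighbors' tag distributions. The graph, the unit weights, and the function name `propagate_labels` are assumptions, and the example is not meant to reproduce the numbers in the slide's figure.

```python
def propagate_labels(adj, seeds, tags, iters=50):
    """Propagate tag distributions over a weighted graph; seed nodes stay clamped."""
    dist = {n: {t: (1.0 if seeds.get(n) == t else 0.0) for t in tags} for n in adj}
    for _ in range(iters):
        new = {}
        for n in adj:
            if n in seeds:
                new[n] = dist[n]          # tagged places keep their label
                continue
            total = sum(adj[n].values())
            new[n] = {
                t: (sum(w * dist[m][t] for m, w in adj[n].items()) / total if total else 0.0)
                for t in tags
            }
        dist = new
    return dist

# Hypothetical NRP: untagged place "x" linked to two restaurants and one shop.
adj = {
    "x": {"r1": 1.0, "r2": 1.0, "s1": 1.0},
    "r1": {"x": 1.0},
    "r2": {"x": 1.0},
    "s1": {"x": 1.0},
}
seeds = {"r1": "restaurant", "r2": "restaurant", "s1": "shopping"}
scores = propagate_labels(adj, seeds, ["restaurant", "shopping"])
ir_features = scores["x"]
```

The resulting per-tag scores for an untagged place play the role of the IR features listed above.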


Conclusion

- LBSNs have received a lot of attention from the research community.
- LBSN data have rich social and location information.
- Novel applications can be developed from the rich user-generated data in LBSNs.
- We have incorporated social and geographical influences into a collaborative filtering technique for POI recommendation.
- To address the semantic annotation problem in LBSNs, we extract explicit patterns (EP) of individual places and implicit relatedness (IR) among places to classify the missing tags.
- New applications and more research are forthcoming.
