
On the Enrichment of a RDF Repository of City Points of Interest based on Social Data

Zied Sellami*, Gianluca Quercini**, Chantal Reynaud*

*IASI Team, Université Paris-Sud 11, France

{sellami, reynaud}@lri.fr

**E3S Team, Supélec, France

[email protected]

WOD’2013 - Paris - 3rd June 2013


Outline

1. Introduction and Related Issues

2. Reconciliation of POI Data and Social Data

3. Enrichment based on Opinion Mining

4. Experiments and Results

5. Conclusion and Future Work


Introduction and Related Issues

Points of interest (POI): geographic locations
Restaurants, museums, hotels, theatres, landmarks, etc.

Formalized as an RDF repository in the context of the DataBridges project (Quercini et al., 2012)

A POI is described by facets (or attributes): name, type, category, address, longitude and latitude.

Example of a POI: Louvre Museum
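For reference, a minimal Java sketch of how such a POI record could be held in memory by the enrichment tool (the field names are illustrative assumptions; in the repository these facets are RDF properties):

```java
// Minimal sketch of a POI record carrying the facets listed above.
// Field names are illustrative; the repository stores them as RDF triples.
public class Poi {
    public final String name;       // e.g. "Louvre Museum"
    public final String type;       // e.g. "museum"
    public final String category;   // e.g. "Arts & Entertainment"
    public final String address;    // postal address, possibly incomplete
    public final double longitude;
    public final double latitude;

    public Poi(String name, String type, String category,
               String address, double longitude, double latitude) {
        this.name = name;
        this.type = type;
        this.category = category;
        this.address = address;
        this.longitude = longitude;
        this.latitude = latitude;
    }
}
```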


Introduction and Related Issues

POIs are automatically obtained by extracting data from Google Fusion Tables (GFT) (Quercini et al., 2012)

Some extracted POIs contain few attributes
Some extracted POIs do not contain precise attributes (incomplete address, imprecise geographic location)
Lack of valuable indications in the extracted POIs (user reviews, official Web site, e-mail, etc.)

Enrich and correct POIs
Additional elements: phone number, e-mail, official Web site…
Useful indications for potential visitors (good and bad aspects of the place)

Enrich POIs using what?
Using Social Networking Systems (Social Data)


Matching POIs Across Social Networks

Accessing and searching social Web pages concerning POIs

1. Yelp (http://www.yelp.com/): social networking site for retrieving and reviewing POIs

2. Foursquare (https://foursquare.com/): application combining geolocation and social guidance

Similar search method for both sites
Input: name and geographic position
Output: list of Web pages of POIs related to the geographic position and to the words included in the query

Filtering the list to select only pertinent Web pages
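To make the search step concrete, here is a hedged Java sketch of such a query, modelled on the Foursquare v2 venue search (the endpoint, parameter names and credentials are assumptions for illustration, not the authors' exact calls; Yelp offers a comparable name-plus-location search):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

// Sketch: search a social networking site for venues (Web pages) around a POI.
// The endpoint and parameters mimic the Foursquare v2 venue search and are
// assumptions for illustration only.
public class VenueSearch {
    public static String search(String poiName, double lat, double lon,
                                String clientId, String clientSecret) throws Exception {
        String url = "https://api.foursquare.com/v2/venues/search"
                + "?ll=" + lat + "," + lon
                + "&query=" + URLEncoder.encode(poiName, "UTF-8")
                + "&client_id=" + clientId
                + "&client_secret=" + clientSecret
                + "&v=20130603";
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        StringBuilder json = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) json.append(line);
        }
        return json.toString(); // JSON list of candidate venues, to be filtered next
    }
}
```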


Matching POIs Across Social Networks

Selecting the appropriate Web pages for a POI
Computing a similarity value

Several parameters can be used: name, address, category, longitude and latitude

Definition of a similarity formula


Matching POIs Across Social Networks: Similarity Measure

Two parameters used: name; longitude and latitude
Different social data sources describe the category in different ways
Example: Eiffel Tower (Monument, Garden, etc. vs. Landmarks, Historical Building)
Social data are uncontrolled: the string address may be incomplete or wrong
Example: O Pelicano (Portugal); Restaurante O Requinte (Portugal); etc.

String techniques for name pruning and name comparison
Stemming with the Porter algorithm; stop-word lists
Levenshtein distance and Jaccard distance
Filtering results using distance proximity
Computing the geographic distance between the POI and the Web page using longitude and latitude
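A minimal Java sketch of the name pruning step, assuming a small illustrative stop-word list; in the actual pipeline a Porter stemmer (from an external library) would additionally be applied to each kept token:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of name pruning before comparison: lower-casing, punctuation removal
// and stop-word filtering. Stemming each kept token with the Porter algorithm
// (via an external stemmer) is omitted here for brevity.
public class NamePruning {
    private static final Set<String> STOP_WORDS = new HashSet<>(
            Arrays.asList("the", "of", "a", "an", "le", "la", "de", "du", "des"));

    public static List<String> prune(String name) {
        List<String> tokens = new ArrayList<>();
        for (String t : name.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                tokens.add(t); // a Porter stemmer would be applied here
            }
        }
        return tokens;
    }
}
```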


Matching POIs Across Social Networks: Similarity Measure

Similarity measure
WP(x).name: name of an entity x in a Web page
p.name: name of a POI
Combination of Levenshtein and Jaccard
Boosts the similarity score between names that employ the same words, even in a different order
Example: Museum of Louvre; Louvre Museum
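The exact formula is given in the paper; below is a hedged Java sketch of one plausible reading, combining a normalized Levenshtein similarity on the pruned names with a Jaccard similarity on their token sets and keeping the larger of the two, so names sharing the same words in a different order still score high (taking the maximum is an assumption, not the authors' definition):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of the name similarity between WP(x).name and p.name.
// Combining normalized Levenshtein and token-set Jaccard via max() is an
// assumption; the paper defines the exact formula.
public class NameSimilarity {

    public static double similarity(List<String> wpName, List<String> poiName) {
        double lev = levenshteinSim(String.join(" ", wpName), String.join(" ", poiName));
        double jac = jaccard(new HashSet<>(wpName), new HashSet<>(poiName));
        return Math.max(lev, jac); // "museum of louvre" vs "louvre museum" -> high Jaccard
    }

    static double levenshteinSim(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++)
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1));
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) d[a.length()][b.length()] / max;
    }

    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 1.0;
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) inter.size() / union.size();
    }
}
```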


Matching POIs Across Social Networks: Filtering Measure

Filtering measure
δ1 and δ2: name similarity thresholds
distmax: distance threshold
Threshold values fixed after some experiments:
δ1 = 0.9 and δ2 = 0.7
distmax = 1000 meters
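How the three thresholds combine is not spelled out in the transcript; a plausible sketch is to keep a Web page outright when the name similarity is very high (≥ δ1), and to keep moderately similar names (≥ δ2) only when the page lies within distmax of the POI. The haversine distance used below is also an assumption about how the geographic distance is computed:

```java
// Sketch of the filtering step. The way the two name thresholds and the
// distance threshold are combined is an assumption; the paper gives the
// exact filtering measure.
public class CandidateFilter {
    static final double DELTA1 = 0.9;        // high name-similarity threshold
    static final double DELTA2 = 0.7;        // lower name-similarity threshold
    static final double DIST_MAX = 1000.0;   // metres

    public static boolean keep(double nameSim, double lat1, double lon1,
                               double lat2, double lon2) {
        if (nameSim >= DELTA1) return true;
        return nameSim >= DELTA2 && haversineMetres(lat1, lon1, lat2, lon2) <= DIST_MAX;
    }

    // Great-circle distance between the POI and the Web page location, in metres.
    static double haversineMetres(double lat1, double lon1, double lat2, double lon2) {
        double r = 6_371_000.0;
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * r * Math.asin(Math.sqrt(a));
    }
}
```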


Opinion Mining

Evaluation of the POI from reviews and comments
Notation: Good, Very Good, Bad, Very Bad, etc.
Useful information for a potential visitor:
What is interesting? (food, ambiance, place, etc.)
What is to be avoided? (drink, person, etc.)
Going further than conventional sentiment analysis
Tweet classification (positive, negative or undetermined) (Pak and Paroubek, 2010)
http://smm.streamcrab.com/
http://www.sentiment140.com/

Linguistic approach for opinion mining


Opinion Mining: Principle

Identification of positive and negative expressions
Using verbs and adjectives (Chesley et al., 2006) (Moghaddam and Popowich, 2010) (Li et al., 2012)
Example: great food, not good place, I like the place, etc.

Generating a lexicon of positive and negative verbs and adjectives

Processing a lexicon of positive words and negative words with TreeTagger
http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html
Positive adjectives (1467) / Negative adjectives (1609)
Positive verbs (421) / Negative verbs (1243)
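A minimal sketch of loading such a lexicon into memory, assuming plain one-word-per-line files (the file names are hypothetical placeholders for the lists produced from the opinion lexicon above after POS tagging with TreeTagger):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

// Sketch: load positive/negative adjective and verb lists into sets.
// File names are hypothetical placeholders.
public class OpinionLexicon {
    final Set<String> positive = new HashSet<>();
    final Set<String> negative = new HashSet<>();

    OpinionLexicon() throws IOException {
        load("positive-adjectives.txt", positive);
        load("positive-verbs.txt", positive);
        load("negative-adjectives.txt", negative);
        load("negative-verbs.txt", negative);
    }

    private static void load(String file, Set<String> target) throws IOException {
        for (String line : Files.readAllLines(Paths.get(file), StandardCharsets.UTF_8)) {
            String word = line.trim().toLowerCase();
            if (!word.isEmpty()) target.add(word);
        }
    }
}
```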


Opinion Mining: Phrase Extraction

Definition of lexico-syntactic patterns to identify pertinent expressions

Expressions describing objects
1. (NOT)* ADJ OBJECT (great food, not interesting place, etc.)
2. OBJECT BE ADJ (sandwich is good, restaurant is nice, etc.)
Expressions describing sentiments or advice
1. ITS ADJ (it’s interesting, it’s happy, etc.)
2. I FEEL OR SUGGEST OBJECT (I like this place, I advise you to test the hotel, etc.)
3. I FEEL (NOT)* ADJ (I feel happy, I feel very hungry, etc.)

Implementation with Java Regex
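Since the implementation uses Java Regex, here is a hedged sketch of one pattern of the first family, (NOT)* ADJ OBJECT, written directly against raw review text with the adjective alternation built from the lexicon (a simplification: the actual patterns operate on TreeTagger-tagged text):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of one lexico-syntactic pattern: (NOT)* ADJ OBJECT,
// e.g. "great food", "not interesting place". The adjective alternation comes
// from the opinion lexicon; the object is any following word-like token.
// Simplification: the real patterns run on POS-tagged (TreeTagger) output.
public class PhraseExtractor {
    public static List<String> extractAdjObject(String review, Set<String> adjectives) {
        String adjAlternation = String.join("|", adjectives);
        Pattern p = Pattern.compile(
                "\\b(?:(not)\\s+)?(" + adjAlternation + ")\\s+(\\p{L}+)\\b",
                Pattern.CASE_INSENSITIVE);
        List<String> phrases = new ArrayList<>();
        Matcher m = p.matcher(review);
        while (m.find()) {
            phrases.add(m.group().toLowerCase()); // e.g. "great food", "not good place"
        }
        return phrases;
    }
}
```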


Repository Enrichment: Notation of a POI

Notation measure

Scale for giving an appreciation to a POI, from -10 to 10:
-10 : Very Bad
-6.6 : Bad
-3.3 : Medium
0 : Undetermined
3.3 : Fairly Good
6.6 to 10 : Very Good
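Reading the scale so that each score is assigned to the nearest labelled tick is one interpretation of the slide (the paper defines the notation measure precisely); under that assumption the mapping from score to label can be sketched as:

```java
// Sketch: map a notation score in [-10, 10] to its label, assigning each score
// to the nearest tick of the scale above. The midpoint cutoffs are assumptions.
public class NotationScale {
    public static String label(double score) {
        if (score <= -8.30) return "Very Bad";      // around tick -10
        if (score <= -4.95) return "Bad";           // around tick -6.6
        if (score <= -1.65) return "Medium";        // around tick -3.3
        if (score <=  1.65) return "Undetermined";  // around tick 0
        if (score <=  4.95) return "Fairly Good";   // around tick 3.3
        return "Very Good";                         // tick 6.6 and above
    }
}
```

With this reading, the scores reported later (Louvre Museum 7.23, Eiffel Tower 8.58) both map to Very Good, consistent with the results slide.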


Repository Enrichment: Identifying Useful Information

1. General assessment
Expressions describing sentiments
Expressions describing objects concerning the place, the name of the POI, or one of the POI categories
2. Tips
Expressions describing advice
3. Specific ideas
Expressions describing objects other than the place, the name or a category of the POI
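A minimal sketch of this routing step; the Expression container below and the exact matching against the POI name and category are hypothetical illustrations of the three-way split, not the authors' code:

```java
// Sketch: route an extracted expression into one of the three enrichment fields.
// The Expression type is a hypothetical container for what the patterns extract.
public class InformationRouter {
    enum Field { GENERAL_ASSESSMENT, TIP, SPECIFIC_IDEA }

    static class Expression {
        final String text;        // e.g. "magnificent place"
        final String object;      // e.g. "place", "sandwich"; empty for pure sentiments
        final boolean advice;     // matched an advice pattern ("I advise you to ...")
        final boolean sentiment;  // matched a sentiment pattern ("it's interesting")
        Expression(String text, String object, boolean advice, boolean sentiment) {
            this.text = text; this.object = object;
            this.advice = advice; this.sentiment = sentiment;
        }
    }

    static Field route(Expression e, String poiName, String poiCategory) {
        if (e.advice) return Field.TIP;
        if (e.sentiment) return Field.GENERAL_ASSESSMENT;
        String obj = e.object.toLowerCase();
        boolean aboutPlace = obj.equals("place")
                || poiName.toLowerCase().contains(obj)
                || poiCategory.toLowerCase().contains(obj);
        return aboutPlace ? Field.GENERAL_ASSESSMENT : Field.SPECIFIC_IDEA;
    }
}
```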


Evaluation of the Similarity Measure

Dataset: 600 POIs compared with Foursquare data
Comparing our formula with Levenshtein and Jaccard
The combination of Levenshtein and Jaccard improves the similarity precision
Our formula and Levenshtein have the same F-measure; precision is the more important parameter here

Formula            | Precision | Recall | F-measure
Name + Levenshtein | 0.84      | 0.68   | 0.75
Name + Jaccard     | 0.85      | 0.66   | 0.74
Our formula        | 0.86      | 0.66   | 0.75


Evaluation of the Opinion Mining Approach

40 Yelp reviews of the Louvre Museum and the Eiffel Tower
Louvre Museum notation: Very Good (7.23)
Eiffel Tower notation: Very Good (8.58)

Louvre Museum
General assessment:
  Positive: magnificent place, beautiful place, good museum, prestigious museum
  Negative: crowded place, hard museum, uncomfortable museum
Tips: go basement, visit basement, not use pyramid entrance
Specific ideas:
  Positive: contemporary art, contemporary sculpture, original decor, real mummy
  Negative: sketchy people, strange marble sculpture, massive crowd, grumpy folk

Eiffel Tower
General assessment:
  Positive: great place, funny place, beautiful monument
  Negative: (none)
Tips: go top
Specific ideas:
  Positive: good view, panoramic view, light show
  Negative: slow elevator, crazy line, illegal Eiffel tower souvenir


Evaluation of the Opinion Mining Approach

Comparison with sentiment140 (statistical approach)
Analysis of 20 tweets concerning the Louvre Museum and 14 tweets concerning the Eiffel Tower

Polarity of Louvre Museum tweets

             | sentiment140 | Our approach
Positive     | 13           | 10
Negative     | 2            | 0
Undetermined | 5            | 10

Polarity of Eiffel Tower tweets

             | sentiment140 | Our approach
Positive     | 11           | 10
Negative     | 1            | 1
Undetermined | 2            | 3

The results are not contradictory
Our approach identified 3 sentiments that were not identified by sentiment140
2 tweets were analyzed differently; our approach identified the correct polarity


Conclusion

Original approach for POI data enrichment
Definition of a similarity formula to compare POI data
Linguistic approach to analyze reviews and comments
Complete tool implemented in Java

Experiments show promising results
About 86% similarity precision
Linguistic approach able to precisely identify positive and negative aspects of the POI


Future Work

1. Similarity measure optimisation
Comparing the selected Web pages for the POI
2. Filtering positive and negative expressions
Using metrics such as frequency
3. Learning new positive and negative verbs and adjectives
Using SentiWordNet (Baccianella et al., 2010)
4. Using adverbs in the opinion mining approach (Benamara et al., 2007)
"Very good food" is stronger than "good food"


Thank you!
