SSSW 2013 - Feeding Recommender Systems with Linked Open Data

36
T OOLS AND TECHNIQUES FEEDING RECOMMENDER S YSTEMS WITH LINKED OPEN D ATATommaso Di Noia [email protected] Politecnico di Bari ITALY SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web

description

Basic introduction to recommender systems + Implementing a content-based recommender system by leveraging knowledge encoded into Linked Open Data datasets

Transcript of SSSW 2013 - Feeding Recommender Systems with Linked Open Data

Page 1: SSSW 2013 - Feeding Recommender Systems with Linked Open Data

TOOLS AND TECHNIQUES

“FEEDING RECOMMENDER SYSTEMS WITH LINKED OPEN DATA”

Tommaso Di Noia [email protected]

Politecnico di Bari ITALY

SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web

Page 2: SSSW 2013 - Feeding Recommender Systems with Linked Open Data

Recommender Systems

Input Data: A set of users U={u1, …, uM} A set of items X={x1, …, xN} The rating matrix R=[ru,i]

Problem Definition:

Given user u and target item i Predict the rating ru,i

A definition Recommender Systems (RSs) are software tools and techniques providing suggestions for items to be of use to a user. [F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, editors. Recommender Systems Handbook. Springer, 2011.]

SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web

Page 3: SSSW 2013 - Feeding Recommender Systems with Linked Open Data

The rating matrix

5 1 2 4 3 ??

2 4 5 3 5 2

4 3 2 4 1 3

3 5 1 5 2 4

4 4 5 3 5 2

The

Mat

rix

Tita

nic

I lo

ve s

ho

pp

ing

Arg

o

Love

Act

ual

ly

The

han

gove

r

Tommaso

Enrico

Sean

Natasha

Valentina

SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web

Page 4: SSSW 2013 - Feeding Recommender Systems with Linked Open Data

The rating matrix (in the real world)

5 4 3 ??

2 4 5 5

3 4 3

3 5 5 2

4 4 5 5 2

The

Mat

rix

Tita

nic

I lo

ve s

ho

pp

ing

Arg

o

Love

Act

ual

ly

The

han

gove

r

Tommaso

Enrico

Sean

Natasha

Valentina

SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web

Page 5: SSSW 2013 - Feeding Recommender Systems with Linked Open Data

Collaborative Recommender Systems

Collaborative RSs recommend items to a user by identifying other users with a similar profile

Recommender System

User profile

Users

Item7 Item15 Item11 …

Top-N Recommendations

Item1, 5 Item2, 1 Item5, 4 Item10, 5 ….

….

Item1, 4 Item2, 2 Item5, 5 Item10, 3 ….

Item1, 4 Item2, 2 Item5, 5 Item10, 3 ….

Item1, 4 Item2, 2 Item5, 5 Item10, 3 ….

SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web

Page 6: SSSW 2013 - Feeding Recommender Systems with Linked Open Data

Content-based Recommender Systems

Recommender System

User profile

Item7 Item15 Item11 …

Top-N Recommendations

Item1, 5 Item2, 1 Item5, 4 Item10, 5 ….

Items

Item1 Item2

Item100 Item’s

descriptions

….

CB-RSs recommend items to a user based on their description and on the profile of the user’s interests

SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web

Page 7: SSSW 2013 - Feeding Recommender Systems with Linked Open Data

Knowledge-based Recommender Systems

Recommender System

User profile

Item7 Item15 Item11 …

Top-N Recommendations Item1, 5 Item2, 1 Item5, 4 Item10, 5 ….

Items

Item1 Item2

Item100 Item’s descriptions

….

KB-RSs recommend items to a user based on their description and domain knowledge encoded in a knowledge base

Knowledge-base

SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web

Page 8: SSSW 2013 - Feeding Recommender Systems with Linked Open Data

User-based Collaborative Recommendation

5 1 2 4 3 ??

2 4 5 3 5 2

4 3 2 4 1 3

3 5 1 5 2 4

4 4 5 3 5 2

The

Mat

rix

Tita

nic

I lo

ve s

ho

pp

ing

Arg

o

Love

Act

ual

ly

The

han

gove

r

Tommaso

Enrico

Sean

Natasha

Valentina

𝑠𝑖𝑚 𝑢𝑖 , 𝑢𝑗 = 𝑟𝑢𝑖,𝑥 − 𝑟𝑢𝑖

∗ 𝑟𝑗,𝑥 − 𝑟𝑢𝑗𝑥∈𝑋

𝑟𝑢𝑖,𝑥 − 𝑟𝑢𝑖

2

𝑥∈𝑋 ∗ 𝑟𝑢𝑗,𝑥 − 𝑟𝑢𝑗

2

𝑥∈𝑋

Pearson’s correlation coefficient

Rate prediction

𝑟 𝑢𝑖 , 𝑥′ = 𝑟𝑢𝑖

+ 𝑠𝑖𝑚 𝑢𝑖 , 𝑢𝑗 ∗ 𝑟𝑢𝑗,𝑥

′ − 𝑟𝑢𝑗𝑢𝑗

𝑠𝑖𝑚(𝑢𝑖 , 𝑢𝑗)𝑢𝑗

= 𝑋

SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web

Page 9: SSSW 2013 - Feeding Recommender Systems with Linked Open Data

Item-based Collaborative Recommendation

5 1 2 4 3 ??

2 4 5 3 5 2

4 3 2 4 1 3

3 5 1 5 2 4

4 4 5 3 5 2

The

Mat

rix

Tita

nic

I lo

ve s

ho

pp

ing

Arg

o

Love

Act

ual

ly

The

han

gove

r

Tommaso

Enrico

Sean

Natasha

Valentina

𝑠𝑖𝑚 𝑥𝑖 , 𝑥𝑗 = 𝑥𝑖 ⋅ 𝑥𝑗

|𝑥𝑖| ∗ |𝑥𝑗|=

𝑟𝑢,𝑥𝑖∗ 𝑟𝑢,𝑥𝑗𝑢

𝑟𝑢,𝑥𝑖2

𝑢 ∗ 𝑟𝑢,𝑥2

𝑢

Cosine Similarity

Rate prediction

𝑟 𝑢𝑖 , 𝑥′ =

𝑠𝑖𝑚 𝑥 , 𝑥 ′ ∗ 𝑟𝑥,𝑢𝑖𝑥∈𝑋𝑢𝑖

𝑠𝑖𝑚 𝑥 , 𝑥 ′𝑥∈𝑋𝑢𝑖

𝑠𝑖𝑚 𝑥𝑖 , 𝑥𝑗 = 𝑟𝑢,𝑥𝑖

− 𝑟𝑢 ∗ 𝑟𝑢,𝑥𝑗− 𝑟𝑢 𝑢

𝑟𝑢,𝑥𝑖− 𝑟𝑢

2𝑢 ∗ 𝑟𝑢,𝑥𝑗

− 𝑟𝑢 2

𝑢

Adjusted Cosine Similarity

= 𝑋𝑢𝑖

SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web

Page 10: SSSW 2013 - Feeding Recommender Systems with Linked Open Data

Content-Based Recommender Systems

SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web

P. Lops, M. de Gemmis, G. Semeraro. Content-based recommender Systems: State of the Art and Trends. In: P. Kantor, F. Ricci, L. Rokach, B. Shapira, editors, Recommender Systems Hankbook: A complete Guide for Research Scientists & Practitioners

Page 11: SSSW 2013 - Feeding Recommender Systems with Linked Open Data

Content-Based Recommender Systems

• Items are described in terms of attributes/features

• A finite set of values is associated to each feature

• Items representation is a (Boolean) vector

SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web

Page 12: SSSW 2013 - Feeding Recommender Systems with Linked Open Data

Content-Based Recommender Systems

• Compute similarity between items

𝑠𝑖𝑚 𝑥𝑖 , 𝑥𝑗 = |𝑥𝑖 ∩ 𝑥𝑗|

|𝑥𝑖 ∪ 𝑥𝑗|

Jaccard similarity

SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web

Page 13: SSSW 2013 - Feeding Recommender Systems with Linked Open Data

Content-Based Recommender Systems

• Compute similarity between items

𝑠𝑖𝑚 𝑥𝑖 , 𝑥𝑗 = 𝑥𝑖 ⋅ 𝑥𝑗

𝑥𝑖 ∗ |𝑥𝑗|

Cosine similarity

SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web

Page 14: SSSW 2013 - Feeding Recommender Systems with Linked Open Data

Content-Based Recommender Systems

• Compute similarity between items

𝑠𝑖𝑚 𝑥𝑖 , 𝑥𝑗 = 𝑥𝑖 ⋅ 𝑥𝑗

𝑥𝑖 ∗ |𝑥𝑗|

Cosine similarity and TF-IDF

𝑇𝐹 𝑣, 𝑥 = # 𝑣 𝑎𝑝𝑝𝑒𝑎𝑟𝑠 𝑖𝑛 𝑡ℎ𝑒 𝑑𝑒𝑠𝑐𝑟𝑖𝑝𝑡𝑖𝑜𝑛 𝑜𝑓 𝑥

𝐼𝐷𝐹 𝑣 = log|𝑋|

#𝑣 𝑎𝑝𝑝𝑒𝑎𝑟𝑠 𝑖𝑛 𝑋

𝑥 = (𝑇𝐹 𝑣1, 𝑥 ∗ 𝐼𝐷𝐹 𝑣1 , … , 𝑇𝐹 𝑣𝑛, 𝑥 ∗ 𝐼𝐷𝐹 𝑣𝑛 )

SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web

Page 15: SSSW 2013 - Feeding Recommender Systems with Linked Open Data

Content-Based Recommender Systems

Rate prediction

𝑟 𝑢𝑖 , 𝑥′ =

𝑠𝑖𝑚 𝑥 , 𝑥 ′ ∗ 𝑟𝑥,𝑢𝑖𝑥∈𝑋𝑢𝑖

𝑠𝑖𝑚 𝑥 , 𝑥 ′𝑥∈𝑋𝑢𝑖

SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web

Page 16: SSSW 2013 - Feeding Recommender Systems with Linked Open Data

Content-Based Recommender Systems

• Nearest neighbors

– Given a set of items representing the user profile, select their most similar items which are not in the user profile

– Predict the rate only for the N nearest neighbors

SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web

Page 17: SSSW 2013 - Feeding Recommender Systems with Linked Open Data

Content-Based Recommender Systems

• Nearest neighbors with kNN

𝑥𝑖,1 𝑥𝑖,2 𝑥𝑖,3

𝑥𝑖,4

𝑥𝑖,5

𝑥𝑖,6

k = 5

SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web

Page 18: SSSW 2013 - Feeding Recommender Systems with Linked Open Data

Content-Based Recommender Systems

SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web

Page 19: SSSW 2013 - Feeding Recommender Systems with Linked Open Data

Need of domain knowledge! We need rich descriptions of the items!

No suggestion is available if the analyzed content does not contain enough information to discriminate items the user might like from items the user might not like.*

(*) P. Lops, M. de Gemmis, G. Semeraro. Content-based Recommender Systems: State of the Art and Trends. In: P. Kantor, F. Ricci, L. Rokach and B. Shapira, editors, Recommender Systems Handbook: A Complete Guide for Research Scientists & Practitioners

The quality of CB recommendations are correlated with the quality of the features that are explicitly associated with the items.

Main CB RSs Drawback: Limited Content Analysis

SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web

Page 20: SSSW 2013 - Feeding Recommender Systems with Linked Open Data

A solution based on Linked Data

Use Linked Data to mitigate the limited content analysis issue

• Plenty of structured data available

• No Content Analyzer required

SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web

Page 21: SSSW 2013 - Feeding Recommender Systems with Linked Open Data

FRED: Transforming Natural Language text to RDF/OWL graphs

input

SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web

Courtesy of Valentina Presutti

Page 22: SSSW 2013 - Feeding Recommender Systems with Linked Open Data

SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web

FRED: RDF/OWL graph output

Resolution and linking Vocabulary alignment

Frames/events

Taxonomy induction

Semantic roles Co-reference

Custom namespace

Types

Induced types (sense tagging)

22

Courtesy of Valentina Presutti

Page 23: SSSW 2013 - Feeding Recommender Systems with Linked Open Data

Linked Data as structured information source for items descriptions

Rich items descriptions

SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web

Page 24: SSSW 2013 - Feeding Recommender Systems with Linked Open Data

Select the domain(s) of your RS

SELECT count(?i) AS ?num ?c

WHERE {

?i a ?c .

FILTER(regex(?c, "^http://dbpedia.org/ontology")) .

}

ORDER BY DESC(?num)

SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web

Page 25: SSSW 2013 - Feeding Recommender Systems with Linked Open Data

Tìpalo: automatic typing of DBpedia entities

Input: natural language definition of an entity

SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web

Courtesy of Valentina Presutti

Page 26: SSSW 2013 - Feeding Recommender Systems with Linked Open Data

26

Linking to WN supersenses

DBpedia resource Extracted type

Disambiguated sense

Inferred superclass

Linked resource

Disambiguated sense

A chaise longue is an upholstered sofa in

the shape of a chair that is long enough

to support the legs.

A chaise longue (English /ˌʃeɪz ˈlɔːŋ/;[1] French

pronunciation: [ʃɛzlɔ̃ŋɡ(ə)], "long chair") is an

upholstered sofa in the shape of a chair that is long

enough to support the legs.

Linking to DOLCE

http://en.wikipedia.org/wiki/Chaise_longue

Rich taxonomical data!

SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web

Courtesy of Valentina Presutti

Page 27: SSSW 2013 - Feeding Recommender Systems with Linked Open Data

Select the sub-graph you are interested in

select ?m ?p ?o1

where{

?m ?p ?o1 .

?m dcterms:subject ?o.

dbpedia:The_Matrix dcterms:subject ?o.

?m a dbpedia-owl:Film.

filter(regex(?p,"^http://dbpedia.org/ontology/")).

}

order by ?m

SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web

Page 28: SSSW 2013 - Feeding Recommender Systems with Linked Open Data

Enriching Annotations with Resources From the Web

Watson: find ontologies on the Semantic Web for enriching the representation of your data

Sindice: find other entities for enriching your knowledge base

SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web

SameAs.org: find other entities to link to from your knowledge base

Courtesy of Valentina Presutti

Page 29: SSSW 2013 - Feeding Recommender Systems with Linked Open Data

Computing similarity in LOD datasets

SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web

Page 30: SSSW 2013 - Feeding Recommender Systems with Linked Open Data

Vector Space Model for LOD

Righteous Kill

starring director

subject/broader genre

Heat

Ro

be

rt D

e N

iro

Joh

n A

vnet

Seri

al k

ille

r fi

lms

Dra

ma

Al P

acin

o

Bri

an D

en

ne

hy

He

ist

film

s C

rim

e f

ilms

starring

Ro

be

rt D

e N

iro

A

l Pac

ino

B

rian

De

nn

eh

y

Righteous Kill Heat

… …

SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web

Page 31: SSSW 2013 - Feeding Recommender Systems with Linked Open Data

Vector Space Model for LOD

Righteous Kill

STARRING Al Pacino

(v1)

Robert De Niro

(v2)

Brian Dennehy

(v3)

Righteous Kill (m1)

X X X

Heat (m2) X X

Heat

Righteous Kill (x1) wv1,x1 wv2,x1 wv3,x1

Heat (x2) wv1,x2 wv2,x2 0

𝑤𝐴𝑙𝑃𝑎𝑐𝑖𝑛𝑜,𝐻𝑒𝑎𝑡 = 𝑡𝑓𝐴𝑙𝑃𝑎𝑐𝑖𝑛𝑜,𝐻𝑒𝑎𝑡 ∗ 𝑖𝑑𝑓𝐴𝑙𝑃𝑎𝑐𝑖𝑛𝑜

SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web

Page 32: SSSW 2013 - Feeding Recommender Systems with Linked Open Data

Vector Space Model for LOD

Righteous Kill

STARRING Al Pacino

(v1)

Robert De Niro

(v2)

Brian Dennehy

(v3)

Righteous Kill (m1)

X X X

Heat (m2) X X

Heat

Righteous Kill (x1) wv1,x1 wv2,x1 wv3,x1

Heat (x2) wv1,x2 wv2,x2 0

𝑤𝐴𝑙𝑃𝑎𝑐𝑖𝑛𝑜,𝐻𝑒𝑎𝑡 = 𝑡𝑓𝐴𝑙𝑃𝑎𝑐𝑖𝑛𝑜,𝐻𝑒𝑎𝑡 ∗ 𝑖𝑑𝑓𝐴𝑙𝑃𝑎𝑐𝑖𝑛𝑜

𝑡𝑓 ∈ {0,1}

SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web

Page 33: SSSW 2013 - Feeding Recommender Systems with Linked Open Data

Vector Space Model for LOD

+

+

+

… =

𝒔𝒊𝒎𝒔𝒕𝒂𝒓𝒓𝒊𝒏𝒈(𝒙𝒊, 𝒙𝒋) = 𝒘𝒗𝟏,𝒙𝒊

∗ 𝒘𝒗𝟏,𝒙𝒋+ 𝒘𝒗𝟐,𝒙𝒊

∗ 𝒘𝒗𝟐,𝒙𝒋+ 𝒘𝒗𝟑,𝒙𝒊

∗ 𝒘𝒗𝟑,𝒙𝒋

𝒘𝒗𝟏,𝒙𝒊𝟐 + 𝒘𝒗𝟐,𝒙𝒊

𝟐 + 𝒘𝒗𝟑,𝒙𝒊𝟐 ∗ 𝒘𝒗𝟏,𝒙𝒋

𝟐 + 𝒘𝒗𝟐,𝒙𝒋𝟐 + 𝒘𝒗𝟑,𝒙𝒋

𝟐

𝜶𝒔𝒕𝒂𝒓𝒓𝒊𝒏𝒈 ∗ 𝒔𝒊𝒎𝒔𝒕𝒂𝒓𝒓𝒊𝒏𝒈(𝒙𝒊, 𝒙𝒋)

𝜶𝒅𝒊𝒓𝒆𝒄𝒕𝒐𝒓 ∗ 𝒔𝒊𝒎𝒅𝒊𝒓𝒆𝒄𝒕𝒐𝒓(𝒙𝒊, 𝒙𝒋)

𝜶𝒔𝒖𝒃𝒋𝒆𝒄𝒕 ∗ 𝒔𝒊𝒎𝒔𝒖𝒃𝒋𝒆𝒄𝒕(𝒙𝒊, 𝒙𝒋)

𝒔𝒊𝒎𝒔𝒕𝒂𝒓𝒓𝒊𝒏𝒈(𝒙𝒊, 𝒙𝒋)

SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web

Page 34: SSSW 2013 - Feeding Recommender Systems with Linked Open Data

A LOD-based CB RS

Given a user profile defined as:

We can predict the rating using a Nearest Neighbor Classifier wherein the similarity measure is a linear combination of local property similarities

𝑿𝒖𝒊= { < 𝒙𝒊, 𝒓𝒙𝒊,𝒖𝒊

> }

𝑟 𝑢𝑖 , 𝑥′ =

𝛼𝑝 ∗ 𝑠𝑖𝑚𝑝(𝑥𝑖 , 𝑥 ′)𝑝

𝑃∗ 𝑟𝑥𝑖,𝑢𝑖<𝑥𝑖,𝑟𝑥𝑖,𝑢𝑖

> ∈ 𝑋𝑢𝑖

|𝑋𝑢𝑖|

SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web

Page 35: SSSW 2013 - Feeding Recommender Systems with Linked Open Data

We’ve just scratched the surface…

Missing in this presentation (not a complete list)

• Learning 𝛼𝑝

• Explanation

– Quite straight with a content-based approach

• Hybrid approaches

– Mix a content based approach with a collaborative one

– Collaborative similarity as a further vector space (property) of the content based approach?

– Exploit the graph-based nature of both the rating matrix and the RDF graph

• … SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web

Page 36: SSSW 2013 - Feeding Recommender Systems with Linked Open Data

Many thanks to Vito Claudio Ostuni and Roberto Mirizzi

SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web