Discovery Hub: on-the-fly linked data exploratory search

Post on 15-Jul-2015


Nicolas Marie, Fabien Gandon, Myriam Ribière, Florentin Rodio, Damien Legrand

CONTEXT | PROPOSITION | EVALUATION | CONCLUSION

Search: lookup vs. exploratory

Lookup: precise information need, e.g. « members » + « The Beatles »

Exploratory: fuzzy information need, e.g. ???

you are here

related work

System | Aemoo | Kaminskas & al. | LED | MORE | Seevl | Yovisto
Purpose | Exploratory search | Cross-domain recommendation | Exploratory search on ICT domain | Film recommendation | Musical recommendation | Video exploratory search
Data | DBpedia EN + external services | DBpedia EN subset | DBpedia + external services | DBpedia EN subset | DBpedia EN subset | DBpedia EN+DE subset
Multi-domain | Yes | Cross two domains | No | No, cinema | No, music | Yes
Query | Entity search | Entity selection in a pre-processed list | Entity search | Entity search | Entity recognition from YouTube | Entity recognition in keywords
Algorithm | EKP filtered view | Weighted activation | DBpedia Ranker | sVSM algorithm | DBrec algorithm | Set of heuristics
Ranking | No | Yes | Yes | Yes | Yes | Yes
Explanations | Wikipedia-based | Path-based | No | Shared properties | Shared properties | No
Offline proc. | Yes, EKP part | Yes | Yes | Yes | Yes | Yes

goal: domain-independent, customizable, on the fly, remote sources

composite interest queries

Knowing my interest in X and Y, what can I discover or learn that is related to all of these resources?

Example: The Beatles + Ken Loach

PROPOSITION

principle

1. results selection
2. ranking
3. sorting/categorization
4. explanations

http://dbpedia.org/resource/Ken_Loach

…dbpedia.org/resource/The_Beatles

research questions

1. How can we discover linked resources of interest to be explored?

2. How to address remote LOD sources for this?

3. How to present and explain the results to the user for an exploratory objective?

http://fr.dbpedia.org/sparql

http://es.dbpedia.org/sparql

http://it.dbpedia.org/sparql

semantic adaptation of spreading activation

[figure: graph whose nodes are annotated with activation values ranging from 0.1 to 1]

example of semantic spreading activation

propagation domain: Album, Band, Film, Musical Artist, Music Genre, Person, Radio Station, Single, Song, Television Show

propagation domain: Company, Election, Film, Journalist, Musical Artist, Newspaper, Office Holder, Organisation, Politician, School, Single, Television Show, Writer
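A minimal Python sketch of spreading activation restricted to a propagation domain. The toy graph, the type labels, the `decay` factor, the even split among neighbors, and the function name `spread` are all assumptions made for illustration; the slides do not give the actual activation formula or the w(i,o) weighting.

```python
from collections import defaultdict

# Hypothetical toy graph: node -> neighbors, and node -> type.
EDGES = {
    "The_Beatles": ["Abbey_Road", "John_Lennon", "Liverpool"],
    "Abbey_Road": ["The_Beatles", "George_Martin"],
    "John_Lennon": ["The_Beatles", "Imagine"],
    "Liverpool": ["The_Beatles"],
    "George_Martin": ["Abbey_Road"],
    "Imagine": ["John_Lennon"],
}
TYPES = {
    "The_Beatles": "Band",
    "Abbey_Road": "Album",
    "John_Lennon": "Musical Artist",
    "Liverpool": "City",
    "George_Martin": "Person",
    "Imagine": "Single",
}


def spread(seed, domain, max_pulse=3, decay=0.5):
    """Propagate activation from seed, only to nodes whose type is in domain."""
    activation = {seed: 1.0}
    for _ in range(max_pulse):
        incoming = defaultdict(float)
        for node, value in activation.items():
            # Nodes typed outside the propagation domain never receive activation.
            targets = [n for n in EDGES.get(node, []) if TYPES.get(n) in domain]
            for target in targets:
                # Assumed rule: a decayed share is split evenly among allowed neighbors.
                incoming[target] += decay * value / len(targets)
        for node, value in incoming.items():
            activation[node] = activation.get(node, 0.0) + value
    activation.pop(seed, None)  # the seed itself is not a result
    return sorted(activation.items(), key=lambda kv: kv[1], reverse=True)
```

Calling `spread("The_Beatles", {"Album", "Band", "Musical Artist", "Person", "Single"})` ranks the album, artist, person and single nodes and never reaches `Liverpool`, because its type is outside the domain.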

research question 2: How to address remote LOD sources for this?

sampling algorithm

sparql endpoint = http://xxx/sparql
seeds = xxx/The_Beatles, xxx/Ken_Loach
compute the propagation domain (w(i,o))
find a path between the seeds
import path nodes & their neighbors
for (i = 1; i <= maxPulse; i++) {
    pulse();
    if (sampleSize <= maxSampleSize) {
        extend the sample
    }
}

iterative import

Local Kgram instance

Online LOD source
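The iterative import can be sketched in Python as follows. This is only an illustration under stated assumptions: an in-memory dictionary `REMOTE` stands in for the online LOD source, `fetch_neighbors`, `find_path`, `import_node` and `sample_graph` are hypothetical names, the sample size is counted in nodes, and the activation update of each pulse is left out.

```python
from collections import deque

# Stand-in for the online LOD source; a real system would query a SPARQL endpoint.
REMOTE = {
    "The_Beatles": ["A_Hard_Days_Night", "John_Lennon"],
    "A_Hard_Days_Night": ["The_Beatles", "British_cinema"],
    "British_cinema": ["A_Hard_Days_Night", "Ken_Loach"],
    "Ken_Loach": ["British_cinema", "Kes"],
    "John_Lennon": ["The_Beatles", "Imagine"],
    "Kes": ["Ken_Loach"],
    "Imagine": ["John_Lennon"],
}


def fetch_neighbors(node):
    """Hypothetical remote call: return the neighbors of node."""
    return REMOTE.get(node, [])


def find_path(start, goal):
    """Breadth-first search for a path between the two seeds."""
    queue, parents = deque([start]), {start: None}
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in fetch_neighbors(node):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return []


def import_node(sample, node):
    """Copy a node and its outgoing edges into the local sample."""
    sample.setdefault(node, set()).update(fetch_neighbors(node))


def sample_graph(seeds, max_pulse=3, max_sample_size=20):
    sample = {}
    for node in find_path(*seeds):  # path nodes and their neighbors
        import_node(sample, node)
    for _ in range(max_pulse):
        # The pulse (activation update) would run here; omitted in this sketch.
        frontier = {n for nbrs in sample.values() for n in nbrs} - set(sample)
        for node in sorted(frontier):
            if len(sample) >= max_sample_size:  # extend only while under the limit
                break
            import_node(sample, node)
    return sample
```

With `seeds = ("The_Beatles", "Ken_Loach")`, the local sample starts from the path between the two seeds and grows by one ring of neighbors per pulse until `max_sample_size` is reached.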

magic numbers

maxPulse and maxSampleSize: the two thresholds used in the sampling algorithm above

[figure: Sample size influence on top 100 results (maxSampleSize). x-axis: triples loading limit (0 to 20000); y-axes: Kendall Tau (0 to 1) and response time (0 to 4500)]

[figure: Convergence, top 100 results (maxPulse). x-axis: iterations (1 to 10); y-axes: Kendall Tau (0 to 1) and shared results (0 to 100)]

[figure: Queries response time histogram, in seconds]
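The two measures plotted for the top 100 results, Kendall Tau and shared results, can be computed as in the Python sketch below. It assumes rankings without ties and compares only the items present in both lists; how the authors handled items missing from one list is not stated in the slides, and the function names are illustrative.

```python
from itertools import combinations


def kendall_tau(ranking_a, ranking_b):
    """Kendall tau between two rankings, over the items they share (no ties)."""
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    common = [item for item in ranking_a if item in pos_b]  # kept in ranking_a order
    if len(common) < 2:
        raise ValueError("need at least two shared items")
    pairs = list(combinations(common, 2))
    # A pair is concordant when both rankings order its two items the same way.
    concordant = sum(1 for x, y in pairs if pos_b[x] < pos_b[y])
    discordant = len(pairs) - concordant
    return (concordant - discordant) / len(pairs)


def shared_results(ranking_a, ranking_b, k=100):
    """Number of items that appear in the top k of both rankings."""
    return len(set(ranking_a[:k]) & set(ranking_b[:k]))
```

For example, comparing the top 100 obtained with a small sample against the top 100 obtained with a larger one gives a tau close to 1 when the smaller sample already yields nearly the same ordering.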

research question 3: How to present and explain the results to the user for an exploratory objective?

Discovery Hub 1.0

1. Start from what you like or are interested in

2. Explore, understand, discover

3. Be redirected to third-party platforms to continue the discovery experience

Discovery Hub 1.0

short demo

EVALUATION

composite queries

• queries built by randomly combining the Facebook likes of 12 users
• two queries per participant, each judging the top 20 results on two statements:
  - The result interests me [Strongly Disagree … Strongly Agree]
  - The result is unexpected [Strongly Disagree … Strongly Agree]

[figure: ratings on a scale from "Not interesting at all" to "Very interesting"]

overall

• 61.6% of the results were rated as strongly relevant or relevant by the participants.
• 65% of the results were rated as strongly unexpected or unexpected.
• 35.42% of the results were rated both as strongly relevant or relevant and as strongly unexpected or unexpected.
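How such percentages follow from the individual judgments can be sketched in Python. The ratings below are invented for illustration and are not the study's data; the 5-point coding and the threshold of 4 for "agree" are assumptions as well.

```python
# Hypothetical ratings on a 5-point scale: (interest, unexpectedness) per result.
RATINGS = [(5, 4), (4, 2), (2, 5), (4, 4), (1, 1), (3, 4)]

AGREE = 4  # assumed coding: 4 = Agree, 5 = Strongly Agree


def share(ratings, predicate):
    """Percentage of ratings that satisfy predicate."""
    return 100.0 * sum(1 for r in ratings if predicate(r)) / len(ratings)


relevant = share(RATINGS, lambda r: r[0] >= AGREE)
unexpected = share(RATINGS, lambda r: r[1] >= AGREE)
both = share(RATINGS, lambda r: r[0] >= AGREE and r[1] >= AGREE)
```

The third figure counts a result only when the same judgment rates it as both relevant and unexpected, so it cannot be derived from the first two percentages alone.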

Explanatory features evaluation

[figure: helpfulness of the common-property, Wikipedia-based and graph-based explanations, and overall, on a scale from "Not helpful at all" to "Very Helpful"]

comparison: SSA (Discovery Hub) vs. sVSM (MORE)

• Hypothesis 1: SSA gives results at least as relevant as sVSM.
• Hypothesis 2: SSA degrades less than sVSM down the list (better ends of lists).
• Hypothesis 3: results at the end of the lists are less relevant but newer to users.
• Hypothesis 4: advanced search gives better results than the standard query.

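Measure   | Algo | Rank  | Mean | St. Dev.
Relevance | SSA  | 1-10  | 1.54 | 0.305
Relevance | SSA  | 11-20 | 1.28 | 0.243
Relevance | sVSM | 1-10  | 1.42 | 0.294
Relevance | sVSM | 11-20 | 0.93 | 0.228
Discovery | SSA  | 1-10  | 1.10 | 0.247
Discovery | SSA  | 11-20 | 1.21 | 0.228
Discovery | sVSM | 1-10  | 1.14 | 0.251
Discovery | sVSM | 11-20 | 1.50 | 0.205

[figure: score (0 to 2) of SSA and sVSM for the queries 2001, Erin, Term, Princess, Fight, and overall]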
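The reported means can be read against Hypotheses 2 and 3 by comparing ranks 11-20 with ranks 1-10. The short Python sketch below only does that arithmetic on the means from the table; it says nothing about statistical significance, and the helper name `change` is illustrative.

```python
# Mean scores copied from the table: (ranks 1-10, ranks 11-20).
MEANS = {
    ("Relevance", "SSA"): (1.54, 1.28),
    ("Relevance", "sVSM"): (1.42, 0.93),
    ("Discovery", "SSA"): (1.10, 1.21),
    ("Discovery", "sVSM"): (1.14, 1.50),
}


def change(measure, algo):
    """Mean for ranks 11-20 minus mean for ranks 1-10, rounded to two decimals."""
    top, end = MEANS[(measure, algo)]
    return round(end - top, 2)


for key in MEANS:
    print(key, change(*key))
```

On these means, relevance drops by 0.26 for SSA and by 0.49 for sVSM between the two halves of the list, while the discovery score rises by 0.11 for SSA and by 0.36 for sVSM.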

CONCLUSION

• semantic spreading activation algorithm coupled with graph sampling to address remote LOD sources
• faceted browsing and multiple explanations of the results
• on-going extensive user evaluation
• publicly available: http://discoveryhub.co

Discovery Hub: enabling exploratory search starting from several interests using linked data sources


current work:
- propagation over multiple data sources in parallel
- redesign of the interface: Discovery Hub 2.0 released

perspective: other applications of semantic spreading activation

multi-lingual mode

dbpedia:Charles_Baudelaire (English) sameAs fr.dbpedia:Charles_Baudelaire (French)

http://discoveryhub.co/

@discovery_hub

werarediscoveryhub@gmail.com