SEMashup - ENsEN in Aimashup2014 by M.Alsarem and P.Portier

How to enhance Web snippets with Linked Data? Mazen Alsarem & Pierre-Edouard Portier, Laboratory LIRIS, INSA de Lyon, France.


Transcript of SEMashup - ENsEN in Aimashup2014 by M.Alsarem and P.Portier

Page 1: SEMashup - ENsEN in Aimashup2014 by M.Alsarem and P.Portier

SEMashup -Mazen Alsarem & Pierre-Edouard Portier 1

How to enhance Web snippets with Linked Data?

Mazen Alsarem & Pierre-Edouard Portier

Laboratory LIRIS, INSA de Lyon, France

SEMashup

Page 2: SEMashup - ENsEN in Aimashup2014 by M.Alsarem and P.Portier

Given the query “epimenides knossos paradox”, among the first results returned by the Google search engine (SE), we find these snippets:

Page 3: SEMashup - ENsEN in Aimashup2014 by M.Alsarem and P.Portier

We enhance these snippets:

Page 4: SEMashup - ENsEN in Aimashup2014 by M.Alsarem and P.Portier

Our snippet highlights an alternative excerpt to better summarize the conceptual content of the document.

Page 5: SEMashup - ENsEN in Aimashup2014 by M.Alsarem and P.Portier

Alternative excerpt:

Page 6: SEMashup - ENsEN in Aimashup2014 by M.Alsarem and P.Portier

Our snippet also accentuates concepts that are present in the document and related to the user's information need as expressed by her query.

Page 7: SEMashup - ENsEN in Aimashup2014 by M.Alsarem and P.Portier

Important concepts:

Page 8: SEMashup - ENsEN in Aimashup2014 by M.Alsarem and P.Portier

After clicking the concept “Epimenides”:

Page 9: SEMashup - ENsEN in Aimashup2014 by M.Alsarem and P.Portier

Auto-scrolling to an instance of the concept “Epimenides” in the underlying document:

Page 10: SEMashup - ENsEN in Aimashup2014 by M.Alsarem and P.Portier

How is it done?

Page 11: SEMashup - ENsEN in Aimashup2014 by M.Alsarem and P.Portier

A mashup of Web of Data services

We use the DBpedia Spotlight service to extract concepts from the document.
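
As an illustration of this extraction step, here is a minimal Python sketch. It assumes the public DBpedia Spotlight REST endpoint and its JSON response shape (a "Resources" list with "@URI", "@surfaceForm" and "@offset" fields); the slides do not say which deployment or parameters we actually used, so the URL, the confidence value, and all function names are assumptions. The sample response is hand-written so the parsing can be tried offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the public Spotlight endpoint; the slides do not name one.
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"


def build_annotate_request(text, confidence=0.5):
    """Build (without sending) a request asking Spotlight to annotate `text`."""
    query = urlencode({"text": text, "confidence": confidence})
    return Request(f"{SPOTLIGHT_URL}?{query}",
                   headers={"Accept": "application/json"})


def extract_concepts(response_json):
    """Group the (offset, surface form) instances by DBpedia URI."""
    concepts = {}
    for res in response_json.get("Resources", []):
        concepts.setdefault(res["@URI"], []).append(
            (int(res["@offset"]), res["@surfaceForm"])
        )
    return concepts


def annotate(text, confidence=0.5):
    """Send the request and parse the answer (needs network access)."""
    with urlopen(build_annotate_request(text, confidence)) as resp:
        return extract_concepts(json.load(resp))


# Hand-written sample in the assumed response shape, for offline use.
SAMPLE_RESPONSE = {
    "Resources": [
        {"@URI": "http://dbpedia.org/resource/Epimenides",
         "@surfaceForm": "Epimenides", "@offset": "0"},
        {"@URI": "http://dbpedia.org/resource/Knossos",
         "@surfaceForm": "Knossos", "@offset": "14"},
    ]
}
```

Keeping the offsets of each instance is what later makes it possible to scroll to an occurrence of a concept in the document.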

Page 12: SEMashup - ENsEN in Aimashup2014 by M.Alsarem and P.Portier

Page 13: SEMashup - ENsEN in Aimashup2014 by M.Alsarem and P.Portier

A mashup of Web of Data services

We use the DBpedia Spotlight service to extract concepts from the document.

We query a DBpedia SPARQL endpoint to find existing triples between the concepts.
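
One possible way to phrase such a query is sketched below in Python. The function name and the exact query shape are assumptions for illustration; the slides do not show the query we send. The resulting string could be submitted to a DBpedia SPARQL endpoint such as the public one at https://dbpedia.org/sparql.

```python
def build_links_query(concept_uris):
    """Return a SPARQL query selecting the triples whose subject and
    object both belong to `concept_uris` (hypothetical helper)."""
    values = " ".join(f"<{uri}>" for uri in sorted(set(concept_uris)))
    return (
        "SELECT ?s ?p ?o WHERE {\n"
        f"  VALUES ?s {{ {values} }}\n"
        f"  VALUES ?o {{ {values} }}\n"
        "  ?s ?p ?o .\n"
        "  FILTER (?s != ?o)\n"
        "}"
    )


# Example with two resources extracted from the document.
EXAMPLE_QUERY = build_links_query([
    "http://dbpedia.org/resource/Bertrand_Russell",
    "http://dbpedia.org/resource/Logic",
])
```

Restricting both ?s and ?o to the extracted concepts keeps the result a small graph over the document's own concepts.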

Page 14: SEMashup - ENsEN in Aimashup2014 by M.Alsarem and P.Portier

Figure: example graph of DBpedia resources (dbp_res:Bertrand_Russell, dbp_res:Logic, dbp_res:Mathematics, dbp_res:Zondervan, dbp_res:Grand_Rapids,_Michigan, dbp_res:Callimachus, dbp_res:Alexandria) linked by the predicates dbp_ont:mainInterest, dbp_prop:deathPlace and dbp_prop:headquarters.

Page 15: SEMashup - ENsEN in Aimashup2014 by M.Alsarem and P.Portier

To benefit from Linked Data, we need to select which concepts to extend.

We propose to rank the concepts by their importance relative to the user's information need.

To do this efficiently, we cannot rely only on the small graph we built; we also need to go back to the textual content of the document.

Therefore, we introduce a new iterative SVD algorithm.

Page 16: SEMashup - ENsEN in Aimashup2014 by M.Alsarem and P.Portier

To each concept, we associate a text made of its abstract and of the sentences of the document that contain its instances.

We build a concept-stem matrix whose entries are frequencies.

We compute a first SVD.

We give more importance to the concepts and the stems close to the query, then compute a second SVD.

In the reduced SVD space, we measure how the norms of the concepts and the stems evolved.
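
The steps above can be sketched with NumPy as follows. This is only an illustration under stated assumptions, not our implementation: whitespace-separated lowercase tokens stand in for stems, a stem is "close to the query" when it occurs in the query, a concept is close when its text contains such a stem, and the `boost` factor and rank `k` are hypothetical parameters.

```python
import numpy as np


def concept_stem_matrix(concept_texts):
    """Build the concept-stem frequency matrix (tokens stand in for stems)."""
    stems = sorted({w for t in concept_texts.values() for w in t.lower().split()})
    col = {w: j for j, w in enumerate(stems)}
    concepts = list(concept_texts)
    m = np.zeros((len(concepts), len(stems)))
    for i, c in enumerate(concepts):
        for w in concept_texts[c].lower().split():
            m[i, col[w]] += 1
    return concepts, stems, m


def reduced_norms(m, k):
    """Norms of the concepts and of the stems in the rank-k SVD space."""
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    concept_norms = np.linalg.norm(u[:, :k] * s[:k], axis=1)
    stem_norms = np.linalg.norm(vt[:k].T * s[:k], axis=1)
    return concept_norms, stem_norms


def norm_evolution(concept_texts, query, k=2, boost=2.0):
    """Change of each norm between the first and the second SVD."""
    concepts, stems, m = concept_stem_matrix(concept_texts)
    c1, s1 = reduced_norms(m, k)
    query_stems = set(query.lower().split())
    # Assumption: stems occurring in the query are the ones to stress.
    col_w = np.array([boost if w in query_stems else 1.0 for w in stems])
    # Assumption: a concept is close to the query if its text has such a stem.
    row_w = np.where((m * (col_w > 1.0)).sum(axis=1) > 0, boost, 1.0)
    c2, s2 = reduced_norms(m * col_w * row_w[:, None], k)
    return dict(zip(concepts, c2 - c1)), dict(zip(stems, s2 - s1))
```

Calling `norm_evolution` with one text per concept and the user's query returns two dictionaries, one for the concepts and one for the stems, giving how much each norm changed.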

Page 17: SEMashup - ENsEN in Aimashup2014 by M.Alsarem and P.Portier

Evolution of the norms of the concepts in the reduced SVD space, between iterations 1 and 2 (figure; concepts shown: dbp:Epimenides, dbp:Knossos, dbp:Paradox).

Page 18: SEMashup - ENsEN in Aimashup2014 by M.Alsarem and P.Portier

The stems and the concepts that moved the most will be stressed at the next iteration; the stems that barely moved will be removed.

Concepts linked by a predicate to a concept selected for stressing will also be stressed.
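
A small Python sketch of this selection rule follows. The function name, the `top` cutoff and the `eps` threshold are hypothetical; the slides do not give the actual criteria for "moved the most" or "barely moved".

```python
def select_for_next_iteration(concept_delta, stem_delta, triples, top=2, eps=1e-6):
    """Return (stressed concepts, stressed stems, removed stems).

    concept_delta / stem_delta map each item to the change of its norm;
    triples is an iterable of (subject, predicate, object) between concepts.
    """
    def ranked(delta):
        # Largest absolute movement first.
        return sorted(delta, key=lambda item: abs(delta[item]), reverse=True)

    stressed_concepts = set(ranked(concept_delta)[:top])

    # Stems that (nearly) did not move are dropped from the next iteration.
    removed_stems = {s for s, d in stem_delta.items() if abs(d) <= eps}
    kept = {s: d for s, d in stem_delta.items() if s not in removed_stems}
    stressed_stems = set(ranked(kept)[:top])

    # Concepts linked by a predicate to a stressed concept are stressed too.
    linked = set()
    for s, _p, o in triples:
        if s in stressed_concepts:
            linked.add(o)
        if o in stressed_concepts:
            linked.add(s)
    linked &= set(concept_delta)

    return stressed_concepts | linked, stressed_stems, removed_stems
```

The propagation is limited to one hop here, which is one reading of the rule; a different depth would be an equally plausible choice.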

Page 19: SEMashup - ENsEN in Aimashup2014 by M.Alsarem and P.Portier

Page 20: SEMashup - ENsEN in Aimashup2014 by M.Alsarem and P.Portier

We use a DBpedia SPARQL endpoint to find new triples about the most important resources.

In a pre-processing step, we kept only the DBpedia predicates that carry enough information: we discarded the predicates whose objects, when concatenated, had a low entropy.
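
To make the entropy filter concrete, here is a minimal Python sketch. It assumes Shannon entropy computed over the whitespace-separated tokens of the concatenated objects, and a hypothetical `min_entropy` threshold; the slides specify neither the unit of the entropy computation nor the cutoff.

```python
import math
from collections import Counter, defaultdict


def object_entropy(objects):
    """Shannon entropy (in bits) of the tokens of the concatenated objects."""
    counts = Counter(" ".join(objects).split())
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def informative_predicates(triples, min_entropy=1.0):
    """Keep the predicates whose objects reach the entropy threshold."""
    objects_by_predicate = defaultdict(list)
    for _s, p, o in triples:
        objects_by_predicate[p].append(o)
    return {
        p for p, objs in objects_by_predicate.items()
        if object_entropy(objs) >= min_entropy
    }
```

A predicate whose objects are almost always the same value gets an entropy near zero and is discarded, while a predicate with varied objects is kept.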

Page 21: SEMashup - ENsEN in Aimashup2014 by M.Alsarem and P.Portier

To rank the triples of the extended graph and build the snippet, we do a tensor decomposition (CP) of the graph.

To take into account the types of the predicates, we choose a tensor decomposition instead of a decomposition of the adjacency matrix: each horizontal slice of the tensor represents the adjacency matrix for one given predicate.
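
The following NumPy sketch illustrates the idea. It builds a tensor with one adjacency-matrix slice per predicate and fits a CP model with a plain alternating least squares loop. Several points are assumptions: the slice layout (the first axis indexes predicates), the rank, the iteration count, and above all the scoring of a triple by its value in the reconstructed tensor, which is only one plausible way to derive a ranking and is not described in the slides.

```python
import numpy as np


def build_tensor(triples):
    """Tensor x[p, s, o] = 1 when the triple (s, p, o) is in the graph."""
    entities = sorted({e for s, _p, o in triples for e in (s, o)})
    predicates = sorted({p for _s, p, _o in triples})
    e_idx = {e: i for i, e in enumerate(entities)}
    p_idx = {p: i for i, p in enumerate(predicates)}
    x = np.zeros((len(predicates), len(entities), len(entities)))
    for s, p, o in triples:
        x[p_idx[p], e_idx[s], e_idx[o]] = 1.0
    return x, predicates, entities


def cp_als(x, rank, iters=50, seed=0):
    """Rank-`rank` CP factors of a 3-way tensor by alternating least squares."""
    rng = np.random.default_rng(seed)
    a, b, c = (rng.random((n, rank)) for n in x.shape)
    for _ in range(iters):
        # Each update solves a least-squares problem with the other two fixed.
        a = np.einsum("ijk,jr,kr->ir", x, b, c) @ np.linalg.pinv((b.T @ b) * (c.T @ c))
        b = np.einsum("ijk,ir,kr->jr", x, a, c) @ np.linalg.pinv((a.T @ a) * (c.T @ c))
        c = np.einsum("ijk,ir,jr->kr", x, a, b) @ np.linalg.pinv((a.T @ a) * (b.T @ b))
    return a, b, c


def reconstruct(a, b, c):
    """Tensor approximated by the CP factors."""
    return np.einsum("ir,jr,kr->ijk", a, b, c)


def rank_triples(triples, rank=2):
    """Sort triples by their reconstructed value (assumed scoring rule)."""
    x, predicates, entities = build_tensor(triples)
    approx = reconstruct(*cp_als(x, rank))
    p_idx = {p: i for i, p in enumerate(predicates)}
    e_idx = {e: i for i, e in enumerate(entities)}
    scored = [(float(approx[p_idx[p], e_idx[s], e_idx[o]]), (s, p, o))
              for s, p, o in triples]
    return sorted(scored, reverse=True)
```

With this layout, `x[i]` is the adjacency matrix of the i-th predicate, so two resources linked by different predicates contribute to different slices instead of being merged into a single adjacency matrix.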

Page 22: SEMashup - ENsEN in Aimashup2014 by M.Alsarem and P.Portier

Thank you!

And, please, come see the live demo!

http://demo.ensen-insa.org