Post on 14-Dec-2015

User Centered and Ontology Based Information Retrieval System for Life Sciences

Sylvie Ranwez, Vincent Ranwez, Mohameth-François Sy, Jacky Montmain, Michel Crampes

LGI2P Research Center / Ecole des Mines d'Alès, France
ISEM – CNRS / Montpellier II University, France

User Centered and Ontology Based Information Retrieval System for Life Sciences – S. Ranwez

Overview

Context and objectives

Ontology based information retrieval

Relevance calculus between a document index and a query
• Similarity between two concepts
• Relevance of a document with respect to a concept
• Relevance of a document with respect to a query

Results visualization

Conclusion and perspectives

Context: usual information retrieval engine

Boolean search
+ Results are easy to understand
- Exact term matching
- Number of results
- Rough measurement: "match" or "does not match"
- Limited interaction
- Aggregating operators (AND, OR…) are not used

Hard to grasp even with clustering

Context: information retrieval based on a concept hierarchy

Boolean search + specialization
+ Extends the query
- Number of results
- Results are difficult to understand: which concepts are taken into account? Which ones have been added?
- No relevance assessment

Loss of the first query context

Context: information retrieval using ontologies

Query: Organelle organization (GO_0006996) ? Cardiac muscle fiber development (GO_0048739)

Number of retrieved genes
        GenBank   Ensembl
AND     0         0
OR      13        15

Objectives

Take better advantage of ontologies during the information retrieval process (indexing/query matching)
• Expand the query if necessary
• Measure document/query adequacy by identifying added concepts

Help the user grasp the overall results
• Explain why a document has been selected
• Give an overall view of the results
• If a selected document is not relevant, identify why in order to reformulate the query conveniently

Take user preferences into account

Favor interactions and an iterative querying process


Ontology based information retrieval

Hyponyms and hypernyms are used to avoid silences
Documents that match the query more or less closely are mixed
The selection may be difficult to understand

[Figure: excerpt of the Gene Ontology. Concept labels: Biological process (GO_0008150), Cellular component organization (GO_0016043), Organelle organization (GO_0006996), Mitochondrion organisation (GO_0007005), Cytoskeleton organization (GO_0007010), Cellular process (GO_0009987), Muscle fiber development (GO_0048747), Cardiac muscle fiber development (GO_0048739), Skeletal muscle fiber development (GO_0048741)]

Document index: Mitochondrion organisation (GO_0007005), Muscle fiber development (GO_0048747)
Query: Organelle organization (GO_0006996) ? Cardiac muscle fiber development (GO_0048739)
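As a rough illustration of how hyponyms and hypernyms avoid silences, the sketch below expands each query concept before matching. The is-a edges in `PARENT` are an assumption read off the flattened figure (a single parent per concept, whereas the real Gene Ontology is a graph), and the function names are hypothetical, not part of the presented system.

```python
# Hypothetical is-a edges (child -> parent), read off the flattened figure.
PARENT = {
    "GO_0016043": "GO_0008150",  # Cellular component organization
    "GO_0009987": "GO_0008150",  # Cellular process
    "GO_0006996": "GO_0016043",  # Organelle organization
    "GO_0007005": "GO_0006996",  # Mitochondrion organisation
    "GO_0007010": "GO_0006996",  # Cytoskeleton organization
    "GO_0048747": "GO_0009987",  # Muscle fiber development
    "GO_0048739": "GO_0048747",  # Cardiac muscle fiber development
    "GO_0048741": "GO_0048747",  # Skeletal muscle fiber development
}


def hypernyms(concept):
    """Strict ancestors of a concept (assumes one parent per concept)."""
    found = set()
    while concept in PARENT:
        concept = PARENT[concept]
        found.add(concept)
    return found


def hyponyms(concept):
    """Strict descendants of a concept."""
    return {child for child in PARENT if concept in hypernyms(child)}


def expand(concept):
    """The concept together with its hyponyms and hypernyms."""
    return {concept} | hyponyms(concept) | hypernyms(concept)


def matches_exact(query, doc_index):
    """Boolean AND on the exact query concepts."""
    return all(q in doc_index for q in query)


def matches_expanded(query, doc_index):
    """Boolean AND after expanding every query concept."""
    return all(expand(q) & set(doc_index) for q in query)


QUERY = ["GO_0006996", "GO_0048739"]
DOC_INDEX = ["GO_0007005", "GO_0048747"]

print(matches_exact(QUERY, DOC_INDEX))     # False: no exact term in the index
print(matches_expanded(QUERY, DOC_INDEX))  # True: a hyponym and a hypernym match
```

Because the expansion reaches up to the root concept, it also admits loosely related documents, which is one way to see why the selection becomes hard to understand without a graded relevance score.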


Relevance calculus between a document index and a query

[Figure: system overview. Labels: INTERFACE, Query Q, Domain ontology, Relevance calculus between a document index and a query, Selected documents, Semantic map, ?]

$\pi(Q, D)$: relevance of $D$ with respect to $Q$, with

$$Q = \{Q_1, Q_2, \ldots, Q_t, \ldots, Q_n\}, \qquad D = \{D_1, D_2, \ldots, D_i, \ldots, D_m\}$$

Relevance calculus between a document index and a query

Three-level relevance calculus
• $\pi(Q_t, D_i)$: similarity between two concepts, a concept from the document index and a concept from the query
• $\pi(Q_t, D)$: relevance of a document (i.e. the set of its indexing concepts) with respect to a concept from the query
• $\pi(Q, D)$: relevance of a document with respect to a query, a fuzzy aggregation of relevance measures

Advantages
• Ranking of documents with respect to their relevance
• Detailed explanation of the document selection

Relevance calculus between a document index and a query: similarity between two concepts, $\pi(Q_t, D_i)$

• Several similarity measures have been proposed in the literature; this one is easy to understand (percentage of mutual hyponyms)

$$
\pi_{JD}(C_1, C_2) =
\begin{cases}
\dfrac{|hypo(C_1) \cap hypo(C_2)|}{|hypo(C_1) \cup hypo(C_2)|}, & \text{if } C_1 \in hypo(C_2) \text{ or } C_2 \in hypo(C_1) \\
0, & \text{otherwise}
\end{cases}
$$

where $hypo(C)$ is the set of hyponyms of concept $C$.
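The "percentage of mutual hyponyms" similarity can be sketched in a few lines. This is a minimal illustration under stated assumptions: the toy hierarchy `PARENT` is hypothetical, and `hypo` is assumed to include the concept itself, which the slide does not specify.

```python
# Hypothetical is-a edges (child -> parent) of a toy ontology.
PARENT = {
    "GO_0006996": "GO_0016043",  # Organelle organization
    "GO_0007005": "GO_0006996",  # Mitochondrion organisation
    "GO_0007010": "GO_0006996",  # Cytoskeleton organization
}


def ancestors(concept):
    """Strict ancestors of a concept in the toy hierarchy."""
    found = set()
    while concept in PARENT:
        concept = PARENT[concept]
        found.add(concept)
    return found


def hypo(concept):
    """Hyponyms of a concept, the concept itself included (assumption)."""
    return {concept} | {c for c in PARENT if concept in ancestors(c)}


def similarity(c1, c2):
    """Shared hyponyms over all hyponyms, or 0 if neither subsumes the other."""
    h1, h2 = hypo(c1), hypo(c2)
    if c1 not in h2 and c2 not in h1:
        return 0.0
    return len(h1 & h2) / len(h1 | h2)


print(similarity("GO_0006996", "GO_0007005"))  # one shared hyponym out of three
print(similarity("GO_0007005", "GO_0007010"))  # 0.0: sibling concepts
```

With this definition, two concepts on different branches always score 0, so only specializations and generalizations of a query concept contribute to the relevance.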

Relevance calculus between a document index and a query: relevance of a document with respect to a concept, $\pi(Q_t, D)$

• Best relevance between the indexing concepts of document $D$ and a query concept $Q_t$

$$\pi(Q_t, D) = \max_{0 < i \le |D|} \pi(Q_t, D_i)$$

• May be generalized by weighting the concepts $D_i$ (using evidence codes in the Gene Ontology, for example)
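Taking the best match over the indexing concepts is a one-line maximum. The sketch below also returns which index concept achieved it, since that is what allows a selection to be explained. The similarity values in `TOY_SIMILARITY` are made-up illustrative numbers, and the function name is hypothetical.

```python
# Made-up concept-to-concept similarities, for illustration only.
TOY_SIMILARITY = {
    ("GO_0006996", "GO_0007005"): 1 / 3,
    ("GO_0006996", "GO_0048747"): 0.0,
    ("GO_0048739", "GO_0007005"): 0.0,
    ("GO_0048739", "GO_0048747"): 1 / 3,
}


def relevance_to_concept(query_concept, doc_index, similarity):
    """Best similarity between a query concept and any indexing concept.

    Returns the winning index concept together with its score.
    """
    best = max(doc_index, key=lambda d: similarity(query_concept, d))
    return best, similarity(query_concept, best)


def toy_similarity(c1, c2):
    """Look up the made-up similarity table (0 when the pair is absent)."""
    return TOY_SIMILARITY.get((c1, c2), 0.0)


doc_index = ["GO_0007005", "GO_0048747"]
for q in ["GO_0006996", "GO_0048739"]:
    print(q, relevance_to_concept(q, doc_index, toy_similarity))
```

The weighted generalization mentioned on the slide is not sketched here, because the slide does not say how the weights enter the maximum.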

Relevance calculus between a document index and a query: relevance of a document with respect to a query, $\pi(Q, D)$

• Combine the individual relevance scores to estimate an overall relevance of the document
• Take user preferences into account: decision theory
• Yager operator (with $q \in \mathbb{R}$)

$$Y\big(\pi(Q_1, D), \ldots, \pi(Q_{|Q|}, D)\big) = \left( \sum_{t=1}^{|Q|} \frac{\pi(Q_t, D)^q}{|Q|} \right)^{1/q}$$

$q = 1$: arithmetic mean
$q \to 0$: geometric mean
$q = -1$: harmonic mean
$q \to +\infty$: max (OR generalization)
$q \to -\infty$: min (AND generalization)
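The aggregation above is a power mean, and its special cases are easy to check numerically. The following sketch is one possible implementation, not the authors' code; the explicit handling of the limits ($q \to 0$, $q \to \pm\infty$) and of zero scores with negative $q$ is a choice made here.

```python
import math


def yager(scores, q):
    """Power mean of relevance scores in [0, 1] with exponent q."""
    n = len(scores)
    if math.isinf(q):
        # q -> +inf behaves like OR (max), q -> -inf like AND (min).
        return max(scores) if q > 0 else min(scores)
    if q <= 0 and any(s == 0 for s in scores):
        # Limit value: a single zero score drives the mean to zero.
        return 0.0
    if q == 0:
        # Limit q -> 0: geometric mean.
        return math.exp(sum(math.log(s) for s in scores) / n)
    return (sum(s ** q for s in scores) / n) ** (1 / q)


scores = [0.2, 0.8]
for q in (1, 0, -1, math.inf, -math.inf):
    print(q, round(yager(scores, q), 3))
```

Moving `q` from large negative to large positive values slides the aggregation from a strict AND toward a permissive OR, which is how a user preference can be expressed with a single parameter.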


Visualization

A document may be selected even if its index contains no terms of the query

Explain the selection to the user: pictograms
Each concept $Q_t$ of the query is associated with a bar:
• Its height is proportional to its relevance $\pi(Q_t, D)$
• Its color says whether it
  - indexes the document
  - specializes (is a hyponym of)
  - generalizes (is a hypernym of)

[Figure: pictogram example for a document $D$ with index concepts $D_i$ Mitochondrion organisation (GO_0007005) and Muscle fiber development (GO_0048747), and query concepts $Q_t$ Organelle organization (GO_0006996) and Cardiac muscle fiber development (GO_0048739); the bars show $\pi(Q_t, D)$]
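The data behind one pictogram can be sketched as one record per query concept: a bar height and the relation that would drive the bar color. Everything here is an assumption made for illustration: the toy hierarchy, the hyponym-based similarity with the concept counted among its own hyponyms, and the relation labels (the actual colors are not recoverable from the transcript).

```python
# Hypothetical is-a edges (child -> parent) of a toy ontology.
PARENT = {
    "GO_0006996": "GO_0016043",  # Organelle organization
    "GO_0007005": "GO_0006996",  # Mitochondrion organisation
    "GO_0007010": "GO_0006996",  # Cytoskeleton organization
    "GO_0048747": "GO_0009987",  # Muscle fiber development
    "GO_0048739": "GO_0048747",  # Cardiac muscle fiber development
    "GO_0048741": "GO_0048747",  # Skeletal muscle fiber development
}


def ancestors(concept):
    found = set()
    while concept in PARENT:
        concept = PARENT[concept]
        found.add(concept)
    return found


def hypo(concept):
    """Hyponyms of a concept, the concept itself included (assumption)."""
    return {concept} | {c for c in PARENT if concept in ancestors(c)}


def similarity(c1, c2):
    h1, h2 = hypo(c1), hypo(c2)
    if c1 not in h2 and c2 not in h1:
        return 0.0
    return len(h1 & h2) / len(h1 | h2)


def bar(query_concept, doc_index):
    """Height and relation of the bar drawn for one query concept."""
    closest = max(doc_index, key=lambda d: similarity(query_concept, d))
    height = similarity(query_concept, closest)
    if height == 0:
        relation = "unrelated"
    elif closest == query_concept:
        relation = "exact"
    elif query_concept in ancestors(closest):
        relation = "index concept is a hyponym"
    else:
        relation = "index concept is a hypernym"
    return {"closest": closest, "height": height, "relation": relation}


DOC_INDEX = ["GO_0007005", "GO_0048747"]
for q in ["GO_0006996", "GO_0048739"]:
    print(q, bar(q, DOC_INDEX))
```

On the example of the slide, the first query concept is matched through a more specific index concept and the second through a more general one, even though neither query term appears in the index.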

Visualization

Pictograms are displayed on a semantic map
Their physical distance to the query reflects their relevance score $\pi(Q, D)$
Visualization and navigation: fit human cognitive limits (lens, number of results, relevance threshold…) and help the user (selection of concepts for the query…)


Conclusion and perspectives

Results
• Find more documents (avoid silences)
• Improve the relevance: document ranking
• Explain the relevance calculus (diagnosis)
• Visualize the overall results
• Interaction with the list of retrieved documents: customize user preferences
• Iterative improvement of the query

Perspectives
Improve CHI
Suggest query reformulation
• From the user's selection of documents (weighting + complement)
• Highlight the query terms that are discriminating
Test several semantic distance measures on different benchmarks (TREC, MuchMore…)
Improve visualization
• Filter the displayed results using sub-ontology extraction
• Propose a view of the results that highlights clusters
Propose an online version

User Centered and Ontology Based Information Retrieval System for Life Sciences

sylvie.ranwez@mines-ales.fr
vincent.ranwez@univ-montp2.fr
mohameth.sy@mines-ales.fr
jacky.montmain@mines-ales.fr
michel.crampes@mines-ales.fr

Visualization

OBIRS online: http://www.ontotoolkit.mines-ales.fr/ObirsClient/

Ontology based information retrieval

[Figure: ontology based information retrieval process. Labels: Documents, Indexation, Documents' index, Query, Analyze/Indexation, Query index, Match, Relevant documents, REFORMULATION, Domain ontology, ?]

Visualization


Foreword: feedback on the relevance of results

Relevance learned from annotations
Can pool the indexings of some databases
Tagging is possible with personal, hierarchical keywords

but

The list of results can be long
No justification of the relevance
Filters, but not semantic ones
No overall view

Relevance calculus of a document with respect to a query

Distance measures between sets of concepts exist
• Between indexings made with GO, for example

However, the relevance measure of a document with respect to a query
• Must not be symmetric
• Must allow the score of each query term to be detailed
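A small numeric check makes the two requirements concrete. In the sketch below, the relevance keeps one score per query term and averages them; the arithmetic mean and the exact-match similarity `exact` are placeholder choices made here for brevity, and the concept names `A` and `B` are hypothetical.

```python
def per_term_scores(query, document, similarity):
    """Best score of each query concept against the document index."""
    return {q: max(similarity(q, d) for d in document) for q in query}


def relevance(query, document, similarity):
    """Overall relevance (arithmetic mean here) plus the per-term detail."""
    scores = per_term_scores(query, document, similarity)
    return sum(scores.values()) / len(scores), scores


def exact(c1, c2):
    """Placeholder similarity: 1 for identical concepts, 0 otherwise."""
    return 1.0 if c1 == c2 else 0.0


print(relevance(["A"], ["A", "B"], exact))  # (1.0, {'A': 1.0})
print(relevance(["A", "B"], ["A"], exact))  # (0.5, {'A': 1.0, 'B': 0.0})
```

Swapping the roles of query and document changes the score: a document that covers the whole query is fully relevant even if it is indexed by extra concepts, whereas a query that asks for more than the document offers is only partly satisfied.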

• GO:0006996
  Ensembl Gene ID: ENSG00000115204 ENSG00000115204 ENSG00000025708 ENSG00000025708 ENSG00000025708 ENSG00000025708 ENSG00000151729 ENSG00000078142 ENSG00000139112 ENSG00000170296

• GO:0048739
  Ensembl Gene ID: ENSG00000154639 ENSG00000197616 ENSG00000197616 ENSG00000133454 ENSG00000133392 ENSG00000173991
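With the two gene lists above, the behavior of a purely Boolean search can be reproduced with set operations. This is only a sketch on the identifiers listed on this slide, counting distinct gene IDs; it is not meant to reproduce the exact figures returned by GenBank or Ensembl, and the function names are hypothetical.

```python
# Gene identifiers as listed on the slide (duplicates kept as in the source).
ANNOTATIONS = {
    "GO:0006996": [
        "ENSG00000115204", "ENSG00000115204", "ENSG00000025708",
        "ENSG00000025708", "ENSG00000025708", "ENSG00000025708",
        "ENSG00000151729", "ENSG00000078142", "ENSG00000139112",
        "ENSG00000170296",
    ],
    "GO:0048739": [
        "ENSG00000154639", "ENSG00000197616", "ENSG00000197616",
        "ENSG00000133454", "ENSG00000133392", "ENSG00000173991",
    ],
}


def boolean_and(concepts):
    """Genes annotated with every concept of the query."""
    return set.intersection(*(set(ANNOTATIONS[c]) for c in concepts))


def boolean_or(concepts):
    """Genes annotated with at least one concept of the query."""
    return set.union(*(set(ANNOTATIONS[c]) for c in concepts))


query = ["GO:0006996", "GO:0048739"]
print(len(boolean_and(query)))  # 0: no gene carries both annotations
print(len(boolean_or(query)))   # 11 distinct genes, all ranked equally
```

The AND query returns nothing and the OR query returns an unranked list, which is the gap a graded relevance score is meant to fill.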


Ontology: example taken from MeSH

Nervous System Diseases

Central Nervous System Diseases

Brain Diseases

Headache Disorder, Primary

Migraine = Migraine Disorder

Signs and Symptoms

Headache

Neurologic Manifestations

Migraine Disorder with Aura

Migraine Disorder without Aura

Headache Disorder

Pain

Pathological Conditions, Signs and Symptoms