User Centered and Ontology Based Information Retrieval System for Life Sciences

Sylvie Ranwez, Vincent Ranwez, Mohameth-François Sy, Jacky Montmain, Michel Crampes

LGI2P Research Center / Ecole des Mines d'Alès, France
ISEM – CNRS / Montpellier II University, France


User Centered and Ontology Based Information Retrieval System for Life Sciences – S. Ranwez

Overview

Context and objectives

Ontology based information retrieval

Relevance calculus between a document index and a query
  - Similarity between two concepts
  - Relevance of a document with respect to a concept
  - Relevance of a document with respect to a query

Results visualization

Conclusion and perspectives


Context: usual information retrieval engine

Boolean search
+ Results are easy to understand
- Exact term matching
- Number of results: hard to grasp, even with clustering
- Rough measurement: "match" or "does not match"
- Limited interaction
- Aggregating operators (AND, OR…) are not used
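A minimal sketch of the Boolean exact-term matching described above, on hypothetical toy data (the document ids and index terms are invented for illustration). Each document either matches or does not, and nothing in the answer says how well it matches.

```python
# Hypothetical toy index: document id -> set of index terms (exact strings).
INDEX = {
    "doc1": {"mitochondrion organisation", "muscle fiber development"},
    "doc2": {"organelle organization"},
    "doc3": {"cardiac muscle fiber development"},
}

def boolean_search(query_terms, operator="AND"):
    """Return the ids of documents matching the query by exact term equality.

    AND: every query term must be in the index; OR: at least one must be.
    The answer is binary: a document matches or it does not.
    """
    terms = set(query_terms)
    if operator == "AND":
        hits = [d for d, idx in INDEX.items() if terms <= idx]
    elif operator == "OR":
        hits = [d for d, idx in INDEX.items() if terms & idx]
    else:
        raise ValueError("operator must be 'AND' or 'OR'")
    return sorted(hits)

query = ["organelle organization", "cardiac muscle fiber development"]
print(boolean_search(query, "AND"))  # []
print(boolean_search(query, "OR"))   # ['doc2', 'doc3']
```

Note that doc1 is never retrieved, although both of its concepts are closely related to the query concepts.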


Context: information retrieval based on a concept hierarchy

Boolean search + specialization
+ Extends the query
- Number of results
- Results are difficult to understand: which concepts are taken into account? Which ones have been added?
- No relevance assessment
- Loss of the initial query context
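A sketch of hierarchy-based query expansion by specialization: each query concept is replaced by itself plus all of its hyponyms, then an OR search is run. The hierarchy is a hypothetical fragment modeled on the Gene Ontology concepts used in these slides, and the documents and function names are illustrative. The output shows the drawbacks listed above: it does not say which concepts were matched directly and which were added, and a document indexed only by a hypernym of the query concept is still missed.

```python
# Hypothetical concept hierarchy: parent -> direct children (hyponyms).
CHILDREN = {
    "organelle organization": ["mitochondrion organisation",
                               "cytoskeleton organization"],
    "muscle fiber development": ["cardiac muscle fiber development",
                                 "skeletal muscle fiber development"],
}

# Hypothetical document index.
INDEX = {
    "doc1": {"mitochondrion organisation", "muscle fiber development"},
    "doc2": {"cytoskeleton organization"},
}

def descendants(concept):
    """Return the concept together with all of its (transitive) hyponyms."""
    result, stack = set(), [concept]
    while stack:
        c = stack.pop()
        if c not in result:
            result.add(c)
            stack.extend(CHILDREN.get(c, []))
    return result

def expanded_or_search(query_concepts):
    """OR search after expanding every query concept with its hyponyms."""
    expanded = set().union(*(descendants(q) for q in query_concepts))
    return sorted(d for d, idx in INDEX.items() if idx & expanded)

print(expanded_or_search(["organelle organization"]))            # ['doc1', 'doc2']
print(expanded_or_search(["cardiac muscle fiber development"]))  # []
```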


Context: information retrieval using ontologies

Query: Organelle organization (GO_0006996) ? Cardiac muscle fiber development (GO_0048739), where ? is the Boolean operator

Number of retrieved genes
Operator   GenBank   Ensembl
AND        0         0
OR         13        15


Objectives

Make better use of ontologies during the information retrieval process (indexing/query matching)
  - Expand the query if necessary
  - Measure document/query adequacy by identifying the added concepts

Help the user grasp the results as a whole
  - Explain why a document has been selected
  - Give an overall view of the results
  - If a selected document is not relevant, identify why, so that the query can be reformulated accordingly

Take user preferences into account

Favor interaction and an iterative querying process


Ontology based information retrieval

Use hyponyms and hypernyms to avoid silence (relevant documents that are not retrieved)
- Mixes documents that match the query more or less closely: the selection may be difficult to understand

[Figure: excerpt of the Gene Ontology under Biological process (GO_0008150), with Cellular component organization (GO_0016043) and Cellular process (GO_0009987). Organelle organization (GO_0006996) has the hyponyms Mitochondrion organisation (GO_0007005) and Cytoskeleton organization (GO_0007010); Muscle fiber development (GO_0048747) has the hyponyms Cardiac muscle fiber development (GO_0048739) and Skeletal muscle fiber development (GO_0048741). Document index: Mitochondrion organisation (GO_0007005), Muscle fiber development (GO_0048747). Query: Organelle organization (GO_0006996) ? Cardiac muscle fiber development (GO_0048739).]
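The relations in this example can be checked with a small sketch. Assumptions: the GO fragment is encoded as a hypothetical child -> parents map restricted to the two sub-trees shown (the real Gene Ontology is a much larger graph), and `relation` is an illustrative helper, not part of any GO library. For each (query concept, index concept) pair it reports whether the index concept is identical to, a hyponym of (specializes) or a hypernym of (generalizes) the query concept.

```python
# Hypothetical child -> parents map for the two GO sub-trees of the example.
PARENTS = {
    "GO_0007005": ["GO_0006996"],  # mitochondrion organisation -> organelle organization
    "GO_0007010": ["GO_0006996"],  # cytoskeleton organization -> organelle organization
    "GO_0048739": ["GO_0048747"],  # cardiac muscle fiber dev. -> muscle fiber development
    "GO_0048741": ["GO_0048747"],  # skeletal muscle fiber dev. -> muscle fiber development
}

def ancestors(concept):
    """All hypernyms of a concept (transitive), excluding the concept itself."""
    seen, stack = set(), list(PARENTS.get(concept, []))
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(PARENTS.get(c, []))
    return seen

def relation(query_concept, index_concept):
    """Describe how an index concept relates to a query concept."""
    if index_concept == query_concept:
        return "exact"
    if query_concept in ancestors(index_concept):
        return "hyponym"   # the index concept specializes the query concept
    if index_concept in ancestors(query_concept):
        return "hypernym"  # the index concept generalizes the query concept
    return "unrelated"

document = ["GO_0007005", "GO_0048747"]
query = ["GO_0006996", "GO_0048739"]
for q in query:
    print(q, {d: relation(q, d) for d in document})
# GO_0006996 {'GO_0007005': 'hyponym', 'GO_0048747': 'unrelated'}
# GO_0048739 {'GO_0007005': 'unrelated', 'GO_0048747': 'hypernym'}
```

Neither query concept appears in the document index, yet each one is linked to an indexing concept, which is exactly the case exact matching misses.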


[Figure: interface overview linking the query Q, the domain ontology, the relevance calculus between a document index and a query, the selected documents and the semantic map.]

Given a query $Q = \{Q_1, Q_2, \ldots, Q_t, \ldots, Q_n\}$ and a document index $D = \{D_1, D_2, \ldots, D_i, \ldots, D_m\}$, how should $\mathrm{rel}(Q, D)$, the relevance of $D$ with respect to $Q$, be computed?


Relevance calculus between a document index and a query

Three-level relevance calculus
- Similarity between two concepts, a concept from the document index and a concept from the query: $\mathrm{rel}(Q_t, D_i)$
- Relevance of a document (i.e., the set of its indexing concepts) with respect to a concept from the query: $\mathrm{rel}(Q_t, D)$
- Relevance of a document with respect to a query, a fuzzy aggregation of the relevance measures: $\mathrm{rel}(Q, D)$

Advantages
• Ranking of documents with respect to their relevance
• Detailed explanation of the document selection
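A compact sketch of how the three levels compose, and why they yield both a ranking and an explanation. To keep it short, the level-1 similarities are supplied as a hypothetical lookup table (in the approach above they come from an ontology-based measure); level 2 takes the maximum over the indexing concepts, and level 3 uses the arithmetic mean, i.e. the Yager operator with q = 1. Document names and scores are illustrative only.

```python
from statistics import mean

# Hypothetical level-1 similarities rel(Q_t, D_i); pairs not listed count as 0.
SIM = {
    ("Q1", "a"): 1.0, ("Q1", "b"): 0.25,
    ("Q2", "b"): 0.5, ("Q2", "c"): 0.75,
}
# Hypothetical documents, each given by its indexing concepts.
DOCS = {"docA": ["a", "c"], "docB": ["b"], "docC": ["d"]}
QUERY = ["Q1", "Q2"]

def concept_relevance(q, doc_concepts):
    """Level 2: best level-1 similarity between q and any indexing concept."""
    return max((SIM.get((q, d), 0.0) for d in doc_concepts), default=0.0)

def explain(doc_concepts):
    """Per-query-concept scores: the detailed explanation of a selection."""
    return {q: concept_relevance(q, doc_concepts) for q in QUERY}

def rank(docs):
    """Level 3 with the arithmetic mean (Yager q = 1), sorted by decreasing score."""
    results = []
    for name, concepts in docs.items():
        detail = explain(concepts)
        results.append((name, mean(detail.values()), detail))
    return sorted(results, key=lambda r: r[1], reverse=True)

for name, score, detail in rank(DOCS):
    print(f"{name}: {score:.3f} {detail}")
# docA: 0.875 {'Q1': 1.0, 'Q2': 0.75}
# docB: 0.375 {'Q1': 0.25, 'Q2': 0.5}
# docC: 0.000 {'Q1': 0.0, 'Q2': 0.0}
```

The overall score orders the documents, while the per-concept dictionary is what a user would inspect to understand (or contest) a document's position.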

Level 1: similarity between two concepts, a concept $D_i$ from the document index and a concept $Q_t$ from the query

• Several similarity measures have been proposed in the literature; this one is easy to understand (percentage of shared hyponyms):

$$
\mathrm{rel}_{JD}(C_1, C_2) =
\begin{cases}
\dfrac{|\mathrm{hypo}(C_1) \cap \mathrm{hypo}(C_2)|}{|\mathrm{hypo}(C_1) \cup \mathrm{hypo}(C_2)|} & \text{if } C_1 \in \mathrm{hypo}(C_2) \text{ or } C_2 \in \mathrm{hypo}(C_1) \\
0 & \text{otherwise}
\end{cases}
$$

where $\mathrm{hypo}(C)$ is the set of hyponyms of concept $C$.
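A direct transcription of the level-1 formula as a sketch. Assumptions: hypo(C) is taken to include C itself (otherwise two identical concepts would score 0 under the stated condition), and the hierarchy is a hypothetical encoding of the GO fragment from the earlier example, so the values only hold for that fragment, not for the full Gene Ontology.

```python
# Hypothetical parent -> children map (GO fragment from the earlier example).
CHILDREN = {
    "GO_0006996": ["GO_0007005", "GO_0007010"],  # organelle organization
    "GO_0048747": ["GO_0048739", "GO_0048741"],  # muscle fiber development
}

def hypo(concept):
    """Set of hyponyms of a concept, including the concept itself (assumption)."""
    result, stack = set(), [concept]
    while stack:
        c = stack.pop()
        if c not in result:
            result.add(c)
            stack.extend(CHILDREN.get(c, []))
    return result

def rel_jd(c1, c2):
    """Share of common hyponyms if one concept subsumes the other, else 0."""
    h1, h2 = hypo(c1), hypo(c2)
    if c1 in h2 or c2 in h1:
        return len(h1 & h2) / len(h1 | h2)
    return 0.0

print(rel_jd("GO_0006996", "GO_0007005"))  # 0.3333333333333333
print(rel_jd("GO_0006996", "GO_0006996"))  # 1.0
print(rel_jd("GO_0006996", "GO_0048747"))  # 0.0 (neither subsumes the other)
```

Because one concept subsumes the other whenever the score is non-zero, the intersection is the hyponym set of the more specific concept and the union is that of the more general one, so the ratio reads directly as "what fraction of the general concept the specific one covers".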

Level 2: relevance of a document (i.e., the set of its indexing concepts) with respect to a concept $Q_t$ from the query

• The best similarity between an indexing concept $D_i$ of document $D$ and the query concept $Q_t$:

$$
\mathrm{rel}(Q_t, D) = \max_{0 < i \le |D|} \mathrm{rel}(Q_t, D_i)
$$

• Can be generalized by weighting the concepts $D_i$ (for example, using the evidence codes of the Gene Ontology)
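A sketch of level 2, assuming the level-1 similarity is available as a function. The weighted variant is only one possible reading of the generalization mentioned above: each indexing concept D_i receives a hypothetical weight in [0, 1] (for instance derived from its GO evidence code) that multiplies its similarity before the maximum is taken. The weights and the `toy_sim` function are illustrative, not taken from the original system.

```python
from typing import Callable, Dict, Iterable, Optional

Sim = Callable[[str, str], float]

def rel_concept(q: str, doc: Iterable[str], sim: Sim,
                weights: Optional[Dict[str, float]] = None) -> float:
    """rel(Q_t, D) = max_i w(D_i) * sim(Q_t, D_i); w defaults to 1 (plain max)."""
    weights = weights or {}
    return max((weights.get(d, 1.0) * sim(q, d) for d in doc), default=0.0)

# Hypothetical level-1 similarities for a single query concept "Qt".
def toy_sim(q: str, d: str) -> float:
    return {"D1": 0.9, "D2": 0.4}.get(d, 0.0)

doc = ["D1", "D2", "D3"]
print(rel_concept("Qt", doc, toy_sim))                          # 0.9
print(rel_concept("Qt", doc, toy_sim, {"D1": 0.3, "D2": 1.0}))  # 0.4
```

With a low weight on D1 (a weakly supported annotation in this hypothetical setting), the best match switches to D2.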

Level 3: relevance of a document with respect to a query, a fuzzy aggregation of the relevance measures $\mathrm{rel}(Q_t, D)$

• Combine the individual relevance scores to estimate the overall relevance of the document
• Take user preferences into account (decision theory): Yager operator, with $q \in \mathbb{R}$

$$
\mathrm{rel}(Q, D) = Y_q\big(\mathrm{rel}(Q_1, D), \ldots, \mathrm{rel}(Q_{|Q|}, D)\big) = \left( \frac{1}{|Q|} \sum_{t=1}^{|Q|} \mathrm{rel}(Q_t, D)^q \right)^{1/q}
$$

- $q = 1$: arithmetic mean
- $q \to 0$: geometric mean
- $q = -1$: harmonic mean
- $q \to +\infty$: max (generalization of OR)
- $q \to -\infty$: min (generalization of AND)
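A sketch of the aggregation formula as a power (generalized) mean. The limiting cases q → 0, q → +∞ and q → −∞ are handled explicitly as the geometric mean, max and min listed above, since the formula cannot be evaluated directly there. It assumes all scores lie in [0, 1]; for q ≤ 0, a zero score drives the result to 0 (one unmatched query concept vetoes the document, as with AND), which the code returns directly instead of dividing by zero.

```python
import math
from typing import Sequence

def yager(scores: Sequence[float], q: float) -> float:
    """Power mean (sum(s**q) / n) ** (1/q) of relevance scores in [0, 1].

    q = 1: arithmetic mean, q = -1: harmonic mean,
    q = 0: geometric mean (limit), q = +inf: max (OR), q = -inf: min (AND).
    """
    if not scores:
        raise ValueError("at least one score is required")
    if q == math.inf:
        return max(scores)
    if q == -math.inf:
        return min(scores)
    if q <= 0 and min(scores) == 0.0:
        return 0.0  # limit of the formula when one query concept is unmatched
    if q == 0:
        return math.exp(sum(math.log(s) for s in scores) / len(scores))
    return (sum(s ** q for s in scores) / len(scores)) ** (1.0 / q)

scores = [0.2, 0.8]
for q in (-math.inf, -1, 0, 1, math.inf):
    print(q, round(yager(scores, q), 4))
# -inf 0.2
# -1 0.32
# 0 0.4
# 1 0.5
# inf 0.8
```

Moving q from −∞ to +∞ moves the user's attitude from "every query concept must be matched" to "one matched concept is enough", with the usual means in between.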



Visualization

A document may be selected even if its index contains no terms of the query

Explain the selection to the user with pictograms: each concept $Q_t$ of the query is associated with a bar
• Its height is proportional to its relevance $\mathrm{rel}(Q_t, D)$
• Its color indicates whether $Q_t$ indexes the document ($Q_t \in D$), specializes an indexing concept ($Q_t$ is a hyponym of $D_i$) or generalizes it ($Q_t$ is a hypernym of $D_i$)

[Example pictogram: document indexed by Mitochondrion organisation (GO_0007005) and Muscle fiber development (GO_0048747); query: Organelle organization (GO_0006996) ? Cardiac muscle fiber development (GO_0048739).]
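A sketch of the data behind one pictogram, using the example above. Assumptions: the GO fragment is a hypothetical parent -> children map restricted to the concepts shown, the bar height is the level-2 score computed with the shared-hyponym similarity, and the category names (`indexes`, `specializes`, `generalizes`) stand in for the bar colors, which are not recoverable here. When a query concept relates to several indexing concepts, the category of the best-scoring one is kept.

```python
# Hypothetical parent -> children map (GO fragment from the example).
CHILDREN = {
    "GO_0006996": ["GO_0007005", "GO_0007010"],  # organelle organization
    "GO_0048747": ["GO_0048739", "GO_0048741"],  # muscle fiber development
}

def hypo(c):
    """Hyponyms of c, including c itself."""
    out, stack = set(), [c]
    while stack:
        x = stack.pop()
        if x not in out:
            out.add(x)
            stack.extend(CHILDREN.get(x, []))
    return out

def sim(c1, c2):
    """Shared-hyponym similarity (level 1)."""
    h1, h2 = hypo(c1), hypo(c2)
    return len(h1 & h2) / len(h1 | h2) if (c1 in h2 or c2 in h1) else 0.0

def bar(q, doc):
    """Return (height, category) of the bar drawn for query concept q."""
    best = max(doc, key=lambda d: sim(q, d))  # doc must be non-empty
    height = sim(q, best)
    if height == 0.0:
        return 0.0, "none"
    if q == best:
        category = "indexes"       # Q_t belongs to the document index
    elif q in hypo(best):
        category = "specializes"   # Q_t is a hyponym of D_i
    else:
        category = "generalizes"   # Q_t is a hypernym of D_i
    return height, category

doc = ["GO_0007005", "GO_0048747"]      # document index
for q in ["GO_0006996", "GO_0048739"]:  # query concepts
    print(q, bar(q, doc))
# GO_0006996 (0.3333333333333333, 'generalizes')
# GO_0048739 (0.3333333333333333, 'specializes')
```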


Visualization

Pictograms are displayed on a semantic map; their physical distance to the query is proportional to their relevance score $\mathrm{rel}(Q, D)$.

Visualization and navigation: fit human cognitive limits (lens, number of results, relevance threshold…) and help the user (selection of concepts for the query…).
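A sketch of how the number of results and a relevance threshold can limit what the map shows: only documents whose overall score reaches a threshold are kept, most relevant first, up to a fixed cap. The threshold, the cap, the document ids and the scores are hypothetical parameters chosen for illustration; the lens and the map layout are not modeled.

```python
from typing import Dict, List, Tuple

def select_for_display(scores: Dict[str, float], threshold: float = 0.3,
                       max_results: int = 3) -> List[Tuple[str, float]]:
    """Keep documents with rel(Q, D) >= threshold, best first, at most max_results."""
    kept = [(doc, s) for doc, s in scores.items() if s >= threshold]
    kept.sort(key=lambda item: (-item[1], item[0]))  # score desc, then id for ties
    return kept[:max_results]

scores = {"d1": 0.9, "d2": 0.1, "d3": 0.55, "d4": 0.3, "d5": 0.7}
print(select_for_display(scores))
# [('d1', 0.9), ('d5', 0.7), ('d3', 0.55)]
```

Exposing both parameters to the user lets them trade completeness against readability of the map.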


Conclusion and perspectives

Results
- Find more documents (avoid silence)
- Improve relevance: document ranking
- Explain the relevance calculus (diagnosis)
- Visualize the overall results
- Interact with the list of retrieved documents: customize user preferences
- Iteratively improve the query

Perspectives
- Improve CHI (computer-human interaction)
- Suggest query reformulations
  • From the documents selected by the user (weighting + complement)
  • Highlight the query terms that are discriminating
- Test several semantic distance measures on different benchmarks (TREC, MuchMore…)
- Improve visualization
  • Filter the displayed results by extracting sub-ontologies
  • Propose a view of the results that highlights clusters
- Propose an online version


Visualization

OBIRS online: http://www.ontotoolkit.mines-ales.fr/ObirsClient/

Ontology based information retrieval

[Figure: ontology based information retrieval pipeline with the components Documents, Indexation, Documents' index, Query, Analyze/Indexation, Query index, Match, Relevant documents, Reformulation (?) and Domain ontology.]


Visualization


Foreword: feedback on the relevance of results

+ Relevance learned from annotations
+ Can pool the indexing of certain databases
+ Tagging possible with personal hierarchical keywords
but
- The list of results can be long
- No justification of the relevance
- Filters, but not semantic ones
- No overall view


Relevance calculus of a document with respect to a query

Distance measures between sets of concepts exist
  - For example, between indexings made with GO
However, the relevance measure of a document with respect to a query
  - Must not be symmetric
  - Must allow the score of each query term to be detailed
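The two requirements can be illustrated with a sketch that swaps the roles of query and document. Assumptions: the level-1 similarities are a hypothetical symmetric table (the shared-hyponym measure is itself symmetric, so the asymmetry comes from levels 2 and 3: the maximum is taken over the document's concepts for each query concept, and the mean over the query concepts), and level 3 uses the arithmetic mean (Yager q = 1). The per-concept dictionary is the detailed score of each query term.

```python
from statistics import mean

# Hypothetical symmetric level-1 similarities; unlisted pairs score 0.
PAIRS = {frozenset({"A", "X"}): 1.0, frozenset({"B", "X"}): 0.5}

def sim(c1, c2):
    """Symmetric concept similarity: 1 for identical concepts, else table lookup."""
    return 1.0 if c1 == c2 else PAIRS.get(frozenset({c1, c2}), 0.0)

def relevance(query, doc):
    """Per-query-concept scores (max over doc concepts) and their mean."""
    detail = {q: max(sim(q, d) for d in doc) for q in query}
    return mean(detail.values()), detail

q_concepts, d_concepts = ["X"], ["A", "B", "C"]
print(relevance(q_concepts, d_concepts))  # (1.0, {'X': 1.0})
print(relevance(d_concepts, q_concepts))  # (0.5, {'A': 1.0, 'B': 0.5, 'C': 0.0})
```

A document that fully covers a one-concept query scores 1, while the reverse direction scores only 0.5 because two of the three "query" concepts are poorly covered.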


Ensembl Gene IDs listed for each GO concept:
• GO:0006996: ENSG00000115204, ENSG00000115204, ENSG00000025708, ENSG00000025708, ENSG00000025708, ENSG00000025708, ENSG00000151729, ENSG00000078142, ENSG00000139112, ENSG00000170296
• GO:0048739: ENSG00000154639, ENSG00000197616, ENSG00000197616, ENSG00000133454, ENSG00000133392, ENSG00000173991


Ontology: example taken from MeSH

[Figure: excerpt of the MeSH hierarchy with the concepts Nervous System Diseases; Central Nervous System Diseases; Brain Diseases; Headache Disorder; Headache Disorder, Primary; Migraine (= Migraine Disorder); Migraine Disorder with Aura; Migraine Disorder without Aura; Pathological Conditions, Signs and Symptoms; Signs and Symptoms; Neurologic Manifestations; Pain; Headache.]