User Centered and Ontology Based Information Retrieval System for Life Sciences
Sylvie Ranwez, Vincent Ranwez, Mohameth-François Sy, Jacky Montmain, Michel Crampes
LGI2P Research Center / Ecole des Mines d'Alès, France
ISEM – CNRS / Montpellier II University, France
User Centered and Ontology Based Information Retrieval System for Life Sciences – S. Ranwez
Overview
Context and objectives
Ontology based information retrieval
Relevance calculus between a document index and a query
  • Similarity between two concepts
  • Relevance of a document with respect to a concept
  • Relevance of a document with respect to a query
Results visualization
Conclusion and perspectives
Context: usual information retrieval engine
Boolean search
+ Results are easy to understand
- Exact term matching
- Number of results: hard to grasp, even with clustering
- Rough measurement: "match" or "does not match"
- Limited interaction: aggregating operators (AND, OR…) are not used
Context: information retrieval based on a concept hierarchy
Boolean search + specialization
+ Extends the query
- Number of results
- Results are difficult to understand: which concepts are taken into account? Which ones have been added?
- No relevance assessment
- Loss of the initial query context
Context: information retrieval using ontologies
Query: Organelle organization (GO_0006996) ? Cardiac muscle fiber development (GO_0048739)

Number of retrieved genes | GenBank | Ensembl
AND | 0 | 0
OR | 13 | 15
Objectives
Take better advantage of ontologies during the information retrieval process (indexing/query matching)
  • Expand the query if necessary
  • Measure document/query adequacy by identifying added concepts
Help the user grasp the overall results
  • Explain why a document has been selected
  • Give an overall view of the results
  • If a selected document is not relevant, identify why, so that the query can be reformulated appropriately
Take user preferences into account
Favor interaction and an iterative querying process
Ontology based information retrieval
Hyponyms and hypernyms are used to avoid silence (relevant documents that are missed)
- Documents that match the query to very different degrees are mixed together
- The selection may be difficult to understand
[Figure: excerpt of the Gene Ontology under Biological process (GO_0008150), showing Cellular process (GO_0009987), Cellular component organization (GO_0016043), Organelle organization (GO_0006996) with its hyponyms Mitochondrion organisation (GO_0007005) and Cytoskeleton organization (GO_0007010), and Muscle fiber development (GO_0048747) with its hyponyms Cardiac muscle fiber development (GO_0048739) and Skeletal muscle fiber development (GO_0048741). A document indexed by Mitochondrion organisation (GO_0007005) and Muscle fiber development (GO_0048747) is compared with the query Organelle organization (GO_0006996) ? Cardiac muscle fiber development (GO_0048739).]
Relevance calculus between a document index and a query
[Figure: the user enters a query Q through the interface; the relevance of each document index with respect to Q is computed using the domain ontology, and the selected documents are displayed on a semantic map.]
$\rho(Q, D)$: relevance of $D$ with respect to $Q$, where $Q = \{Q_1, Q_2, \ldots, Q_t, \ldots, Q_n\}$ and $D = \{D_1, D_2, \ldots, D_i, \ldots, D_m\}$
Relevance calculus between a document index and a query
Three-level relevance calculus
• Similarity between two concepts, a concept $D_i$ from the document index and a concept $Q_t$ from the query: $\sigma(Q_t, D_i)$
• Relevance of a document (i.e. the set of its indexing concepts) with respect to a concept from the query: $\pi(Q_t, D)$
• Relevance of a document with respect to a query, by fuzzy aggregation of the relevance measures: $\rho(Q, D)$
Advantages
• Documents are ranked according to their relevance
• The document selection can be explained in detail
Level 1: similarity between two concepts, $\sigma(Q_t, D_i)$
• Several similarity measures have been proposed in the literature; this one is easy to understand (proportion of shared hyponyms)
$$\sigma_{JD}(C_1, C_2) = \begin{cases} \dfrac{|\mathrm{hypo}(C_1) \cap \mathrm{hypo}(C_2)|}{|\mathrm{hypo}(C_1) \cup \mathrm{hypo}(C_2)|}, & \text{if } C_1 \in \mathrm{hypo}(C_2) \text{ or } C_2 \in \mathrm{hypo}(C_1) \\ 0, & \text{otherwise} \end{cases}$$
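A minimal Python sketch of $\sigma_{JD}$, assuming that $\mathrm{hypo}(C)$ contains $C$ itself plus all of its descendants, and that the ontology is given as a map from each concept to its direct parents. The names `PARENTS`, `hypo` and `sim_jd` are hypothetical, and the three-concept hierarchy is a toy excerpt modeled on the Gene Ontology example, not data used by OBIRS.

```python
# Hypothetical toy ontology: each concept maps to the set of its direct parents.
PARENTS: dict[str, set[str]] = {
    "GO_0006996": set(),            # Organelle organization (root of this excerpt)
    "GO_0007005": {"GO_0006996"},   # Mitochondrion organisation
    "GO_0007010": {"GO_0006996"},   # Cytoskeleton organization
}


def hypo(concept: str, parents: dict[str, set[str]]) -> set[str]:
    """Return the concept and all its descendants (assumption: hypo(C) includes C)."""
    found: set[str] = set()
    stack = [concept]
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            # Push every concept that lists `node` as a direct parent.
            stack.extend(child for child, ps in parents.items() if node in ps)
    return found


def sim_jd(c1: str, c2: str, parents: dict[str, set[str]]) -> float:
    """Share of common hyponyms when one concept subsumes the other, else 0."""
    h1, h2 = hypo(c1, parents), hypo(c2, parents)
    if c1 in h2 or c2 in h1:
        return len(h1 & h2) / len(h1 | h2)
    return 0.0


print(sim_jd("GO_0007005", "GO_0006996", PARENTS))  # 0.3333333333333333 (1 shared out of 3)
print(sim_jd("GO_0007005", "GO_0007010", PARENTS))  # 0.0 (siblings: neither subsumes the other)
```

With this reflexive convention, a concept compared with itself scores 1, and a concept compared with one of its ancestors scores $|\mathrm{hypo}(\text{child})| / |\mathrm{hypo}(\text{ancestor})|$, since the child's hyponyms are all hyponyms of the ancestor.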
Level 2: relevance of a document with respect to a query concept, $\pi(Q_t, D)$
• Highest similarity between the query concept $Q_t$ and the indexing concepts of document $D$
• Can be generalized by weighting the concepts $D_i$ (for example, using the evidence codes of the Gene Ontology)
$$\pi(Q_t, D) = \max_{0 < i \le |D|} \sigma(Q_t, D_i)$$
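The max aggregation can be sketched as follows, assuming the concept similarity is passed in as a function (for instance $\sigma_{JD}$). The optional `weights` mapping is a hypothetical way to realize the weighting generalization mentioned above: each $\sigma(Q_t, D_i)$ is multiplied by a weight in $[0, 1]$ before taking the maximum. The slides do not specify how weights are combined, and `relevance_to_concept`, `TABLE` and `table_sim` are illustrative names and values only.

```python
from typing import Callable, Iterable, Mapping, Optional

Similarity = Callable[[str, str], float]


def relevance_to_concept(
    q_t: str,
    doc_index: Iterable[str],
    sim: Similarity,
    weights: Optional[Mapping[str, float]] = None,
) -> float:
    """pi(Q_t, D): best (optionally weighted) similarity between Q_t and the concepts D_i of D."""
    scores = [
        (weights.get(d_i, 1.0) if weights is not None else 1.0) * sim(q_t, d_i)
        for d_i in doc_index
    ]
    # An empty index is treated as irrelevant (assumption made for this sketch).
    return max(scores, default=0.0)


# Hypothetical similarity values between a query concept "Q1" and index concepts D1, D2.
TABLE = {("Q1", "D1"): 0.2, ("Q1", "D2"): 0.7}


def table_sim(a: str, b: str) -> float:
    return TABLE.get((a, b), 0.0)


print(relevance_to_concept("Q1", ["D1", "D2"], table_sim))               # 0.7
print(relevance_to_concept("Q1", ["D1", "D2"], table_sim, {"D2": 0.5}))  # 0.35
```

Halving the weight of D2 lowers its contribution to 0.35, which still beats D1's 0.2, so the best match is unchanged but its score drops.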
Level 3: relevance of a document with respect to a query, $\rho(Q, D)$
• Combine the individual relevance scores to estimate the overall relevance of the document
• Take user preferences into account (decision theory): Yager operator, with $q \in \mathbb{R}$
  $q = 1$: arithmetic mean
  $q \to 0$: geometric mean
  $q = -1$: harmonic mean
  $q \to +\infty$: max (generalization of OR)
  $q \to -\infty$: min (generalization of AND)
$$\rho(Q, D) = Y\big(\pi(Q_1, D), \ldots, \pi(Q_n, D)\big) = \left( \frac{1}{|Q|} \sum_{t=1}^{|Q|} \pi(Q_t, D)^q \right)^{1/q}$$
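The aggregation step can be sketched as a generalized (power) mean, with the limit cases listed above handled explicitly. This is a minimal sketch assuming every $\pi(Q_t, D)$ lies in $[0, 1]$; the function name `yager` and the choice to return the limit value 0 when a score is zero and $q \le 0$ (where $0^q$ is undefined or the log diverges) are made for this illustration, not taken from OBIRS.

```python
import math
from typing import Sequence


def yager(scores: Sequence[float], q: float) -> float:
    """Aggregate per-concept relevance scores pi(Q_t, D) in [0, 1] into rho(Q, D)."""
    if not scores:
        raise ValueError("the query must contain at least one concept")
    n = len(scores)
    if q == math.inf:        # limit q -> +inf: max, a generalized OR
        return max(scores)
    if q == -math.inf:       # limit q -> -inf: min, a generalized AND
        return min(scores)
    if q <= 0 and any(s == 0.0 for s in scores):
        # For q <= 0 a single zero score drives the mean to its limit value 0.
        return 0.0
    if q == 0:               # limit q -> 0: geometric mean
        return math.exp(sum(math.log(s) for s in scores) / n)
    return (sum(s ** q for s in scores) / n) ** (1.0 / q)


scores = [0.25, 1.0]  # hypothetical pi(Q_1, D) and pi(Q_2, D)
for q in (-math.inf, -1, 0, 1, math.inf):
    print(q, round(yager(scores, q), 4))
```

For these scores the loop prints 0.25, 0.4, 0.5, 0.625 and 1.0: lowering $q$ makes the aggregation stricter (closer to requiring every query concept), raising it makes it more lenient (closer to requiring any one of them).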
Visualization
A document may be selected even if its index contains none of the query terms.
Explain the selection to the user with pictograms: each concept $Q_t$ of the query is associated with a bar
• its height is proportional to its relevance $\pi(Q_t, D)$
• its color indicates whether $Q_t$ indexes the document ($Q_t \in D$), specializes a concept $D_i$ of the index (is a hyponym of $D_i$) or generalizes it (is a hypernym of $D_i$)
[Figure: pictogram for a document indexed by Mitochondrion organisation (GO_0007005) and Muscle fiber development (GO_0048747), with respect to the query Organelle organization (GO_0006996) ? Cardiac muscle fiber development (GO_0048739).]
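Putting the first two levels together on the Gene Ontology example, the sketch below computes, for each query concept, the bar height $\pi(Q_t, D)$ and the relationship that would drive the bar's color. It assumes the toy hierarchy written in the code (an excerpt with the two branches treated as separate roots), $\mathrm{hypo}(C)$ including $C$, and $\sigma_{JD}$ as the concept similarity. The function names and the extra "unrelated" label for a zero score are hypothetical; the slide only names three cases.

```python
# Hypothetical toy excerpt of the GO hierarchy from the example: concept -> direct parents.
PARENTS = {
    "GO_0006996": set(),            # Organelle organization
    "GO_0007005": {"GO_0006996"},   # Mitochondrion organisation
    "GO_0007010": {"GO_0006996"},   # Cytoskeleton organization
    "GO_0048747": set(),            # Muscle fiber development
    "GO_0048739": {"GO_0048747"},   # Cardiac muscle fiber development
    "GO_0048741": {"GO_0048747"},   # Skeletal muscle fiber development
}


def hypo(c):
    """c and all its descendants (assumption: hypo(C) includes C)."""
    found, stack = set(), [c]
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(ch for ch, ps in PARENTS.items() if node in ps)
    return found


def sim(c1, c2):
    """sigma_JD: share of common hyponyms if one concept subsumes the other, else 0."""
    h1, h2 = hypo(c1), hypo(c2)
    return len(h1 & h2) / len(h1 | h2) if (c1 in h2 or c2 in h1) else 0.0


def bar(q_t, doc_index):
    """Return (height, relation) of the pictogram bar for query concept q_t."""
    if not doc_index:
        return 0.0, "unrelated"
    best = max(doc_index, key=lambda d_i: sim(q_t, d_i))
    height = sim(q_t, best)            # pi(Q_t, D)
    if q_t in doc_index:
        return height, "indexes"
    if height == 0.0:
        return height, "unrelated"     # hypothetical fourth case, not on the slide
    if q_t in hypo(best):
        return height, "specializes"   # Q_t is a hyponym of D_i
    return height, "generalizes"       # Q_t is a hypernym of D_i


QUERY = ["GO_0006996", "GO_0048739"]
DOC = ["GO_0007005", "GO_0048747"]
for q_t in QUERY:
    print(q_t, bar(q_t, DOC))
```

On this excerpt both bars have height 1/3: Organelle organization generalizes Mitochondrion organisation, while Cardiac muscle fiber development specializes Muscle fiber development, so the document is selected although it shares no concept with the query.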
Visualization
Pictograms are displayed on a semantic map; their distance to the query reflects their relevance score $\rho(Q, D)$.
Visualization and navigation: respect human cognitive limits (lens, number of results, relevance threshold…) and help the user (selection of concepts for the query…).
Conclusion and perspectives
Results
• Find more documents (avoid silence)
• Improve relevance: document ranking
• Explain the relevance calculus (diagnosis)
• Visualize the overall results
• Interact with the list of retrieved documents: customize user preferences
• Iteratively improve the query
Perspectives
• Improve CHI (computer-human interaction)
• Suggest query reformulations
  • based on the documents selected by the user (weighting + complement)
  • highlight the discriminating query terms
• Test several semantic distance measures on different benchmarks (TREC, MuchMore…)
• Improve visualization
  • filter the displayed results by extracting sub-ontologies
  • propose a view of the results that highlights clusters
• Propose an online version
Visualization
OBIRS online: http://www.ontotoolkit.mines-ales.fr/ObirsClient/
Ontology based information retrieval
[Figure: documents are indexed, and the query is analyzed and indexed, using the domain ontology; the query index is matched against the documents' index to retrieve relevant documents, which may lead to a reformulation of the query.]
Visualization
Foreword: feedback on the relevance of results
+ Relevance learned from annotations
+ Can pool the indexing of some databases
+ Tagging possible with personal hierarchical keywords
but
- The list of results can be long
- No justification of relevance
- Filters, but not semantic ones
- No overall view
Relevance calculus of a document with respect to a query
Distance measures between sets of concepts exist
• Between indexings made with GO, for example
However, the relevance measure of a document with respect to a query
• must not be symmetric
• must allow the score of each query term to be detailed
• GO:0006996, Ensembl Gene IDs: ENSG00000115204, ENSG00000115204, ENSG00000025708, ENSG00000025708, ENSG00000025708, ENSG00000025708, ENSG00000151729, ENSG00000078142, ENSG00000139112, ENSG00000170296
• GO:0048739, Ensembl Gene IDs: ENSG00000154639, ENSG00000197616, ENSG00000197616, ENSG00000133454, ENSG00000133392, ENSG00000173991
Ontology: example taken from MeSH
[Figure: excerpt of MeSH showing Nervous System Diseases, Central Nervous System Diseases, Brain Diseases, Headache Disorder, Headache Disorder, Primary and Migraine (= Migraine Disorder) with its hyponyms Migraine Disorder with Aura and Migraine Disorder without Aura, alongside Pathological Conditions, Signs and Symptoms, Signs and Symptoms, Neurologic Manifestations, Pain and Headache.]