George Paliouras, May 2014 www.bioasq.org
BioASQ
A challenge on large-scale biomedical semantic indexing and question answering
George Paliouras
National Center for Scientific Research ‘Demokritos’
Intelligent Information Management Targeted Competition Framework ICT-2011.4.4(d)
Biomedical articles per year
Questions of biomedical experts (I): yes/no question
Question: “Are there any DNMT3 proteins present in plants?”
Exact answer: “Yes”
Ideal answer: “Yes. The plant DOMAINS REARRANGED METHYLTRANSFERASE2 (DRM2) is a homolog of the mammalian de novo methyltransferase DNMT3. DRM2 contains a novel arrangement of the motifs required for DNA methyltransferase catalytic activity.”
Questions of biomedical experts (II): factoid question
Question: “What is the methyl donor of DNA (cytosine-5)-methyltransferases?”
Exact answer: “S-adenosyl-L-methionine”
Ideal answer: “S-adenosyl-L-methionine (AdoMet, SAM) is the methyl donor of DNA (cytosine-5)-methyltransferases. DNA (cytosine-5)-methyltransferases catalyze the transfer of a methyl group from S-adenosyl-L-methionine to the C-5 position of cytosine residues in DNA.”
Questions of biomedical experts (III): list question
Question: “Which species may be used for the biotechnological production of itaconic acid?”
Exact answer: “Aspergillus terreus”, “Aspergillus niger”, “Ustilago maydis”
Ideal answer: “In 1955, the production of itaconic acid was first described for Ustilago maydis. Some Aspergillus species, like A. itaconicus and A. terreus, show the ability to synthesize this organic acid and A. terreus can secrete significant amounts to the media. Itaconic acid is mainly supplied by biotechnological processes with the fungus Aspergillus terreus. Cloning of the cadA gene into the citric acid producing fungus A. niger showed that it is possible to produce itaconic acid also in a different host organism.”
Questions of biomedical experts (IV): summary question
Question: “How do histone methyltransferases cause histone modification?”
Exact answer: none (summary questions have only an ideal answer)
Ideal answer: “Histone methyltransferases (HMTs) are responsible for the site-specific addition of covalent modifications on the histone tails, which serve as markers for the recruitment of chromatin organization complexes. There are two major types of HMTs: histone-lysine N-methyltransferases and histone-arginine N-methyltransferases. The former methylate specific lysine (K) residues such as 4, 9, 27, 36, and 79 on histone H3 and residue 20 on histone H4. The latter methylate arginine (R) residues such as 2, 8, 17, and 26 on histone H3 and residue 3 on histone H4. Depending on what residue is modified and the degree of methylation (mono-, di- and tri-methylation), lysine methylation of histones is linked to either transcriptionally active or silent chromatin.”
Finding relevant snippets
Not only texts: ontologies, linked data, …
Information from structured data: list question
Question: “Which forms of cancer is the Tpl2 gene associated with?”
Related concepts:
• http://www.disease-ontology.org/api/metadata/DOID:162 (cancer)
• http://www.uniprot.org/uniprot/M3K8_RAT (TPL2 synonym)
Related RDF triple:
• Subject: http://www4.wiwiss.fu-berlin.de/diseasome/resource/diseases/3003 (lung cancer)
• Predicate: http://www4.wiwiss.fu-berlin.de/diseasome/resource/diseasome/associatedGene
• Object: http://www4.wiwiss.fu-berlin.de/diseasome/resource/genes/TPL2
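To illustrate how a triple like the one above can answer a list question, here is a minimal Python sketch that stores triples as plain (subject, predicate, object) tuples and matches a pattern in which unspecified positions act as wildcards. The in-memory store and the `match` helper are hypothetical and hold only the single triple shown on the slide; a real system would more likely send a SPARQL query to the data source.

```python
from typing import Optional

# Base URI shared by the Diseasome resources in the triple above.
DISEASOME = "http://www4.wiwiss.fu-berlin.de/diseasome/resource/"

# Hypothetical in-memory triple store containing only the slide's triple.
TRIPLES = [
    (
        DISEASOME + "diseases/3003",             # lung cancer
        DISEASOME + "diseasome/associatedGene",
        DISEASOME + "genes/TPL2",
    ),
]


def match(subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> list[tuple[str, str, str]]:
    """Return every triple that agrees with the non-None positions of the pattern."""
    return [
        (s, p, o)
        for (s, p, o) in TRIPLES
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# "Which forms of cancer is the Tpl2 gene associated with?" becomes the pattern
# (?disease, associatedGene, TPL2); the answers are the matching subjects.
answers = [s for (s, _, _) in match(predicate=DISEASOME + "diseasome/associatedGene",
                                    obj=DISEASOME + "genes/TPL2")]
print(answers)  # ['http://www4.wiwiss.fu-berlin.de/diseasome/resource/diseases/3003']
```

Each matching subject is one item of the list answer; mapping the URI to the label “lung cancer” would need a further lookup in the data source.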
BioASQ Vision
• Make sure this knowledge is used to the benefit of patients
• Need to make it accessible to biomedical experts
• Search is not effective enough
• Push research in automated answering of questions
• A challenge for such systems can achieve a multiplying effect
What is BioASQ?
A challenge funded by the European Union (FP7).
Task a: Hierarchical text classification
• Organizers distribute new unclassified PubMed articles.
• Participants assign MeSH terms to the articles.
• Evaluation based on annotations of PubMed curators.
Task b: IR, QA, summarization, …
• Organizers distribute English biomedical questions.
• Participants provide relevant articles, snippets, concepts, triples, “exact” answers and “ideal” answers.
• Evaluation: both automatic (GMAP, MRR, ROUGE, etc.) and manual (by biomedical experts).
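ROUGE, one of the automatic measures listed above, scores a system's “ideal” answer by its n-gram overlap with an expert's reference answer. The following is a minimal sketch of ROUGE-N recall against a single reference, assuming naive lower-case whitespace tokenization with no stemming or stop-word removal; the ROUGE variants and preprocessing actually used in BioASQ are not stated here, so this only illustrates the idea.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count the contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: str, reference: str, n: int = 2) -> float:
    """Clipped n-gram overlap divided by the number of reference n-grams.

    Assumption: lower-cased whitespace tokenization, one reference answer.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram counts at most as often as it appears in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


reference = "the methyl donor is S-adenosyl-L-methionine"   # hypothetical expert answer
candidate = "S-adenosyl-L-methionine is the methyl donor"   # hypothetical system answer
print(rouge_n_recall(candidate, reference))  # 0.5
```

Only two of the four reference bigrams (“the methyl”, “methyl donor”) reappear in the candidate, so ROUGE-2 recall is 0.5 even though every word matches; ROUGE-1 would give 1.0.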
The challenge (figure: Task a, Task b)
Behind the scenes
BioASQ Platform
Datasets
Task b data contain gold articles, snippets, concepts, triples, “exact” and “ideal” answers prepared by biomedical experts from around Europe.
Task a (articles)     1st challenge   2nd challenge
Training              10,876,004      12,628,968
Test                  83,490          71,950

Task b (questions)    1st challenge   2nd challenge
Training              29              310
Test                  281             500
Data sources
They include both text and structured information:
► PubMed abstracts, PubMed Central articles, MeSH.
► Gene Ontology, UniProt, Jochem, Disease Ontology.
Annotation: questions and queries
Annotation: snippets
Annotation: answers
Assessment: relevance of material
Assessment: information in answers
BioASQ social network
Oracle
Two cycles
2013 Schedule
• March 2013: Evaluation infrastructure & dry-run data
• June 2013: Start of the challenge
• August 2013: End of the challenge
• September 2013: BioASQ workshop
2014 Schedule
• February 2014: Start of Task 2a
• March 2014: Start of Task 2b
• May 2014: End of the challenge
• September 2014: BioASQ workshop
The official challenge is over, but…
► Task a continues to run each week.
► An oracle for Task b will be available soon.
► Oracles will remain available.
► A third cycle is being designed…
Challenge participants so far
Challenge participants in each cycle
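Evaluation measures
Task a: Hierarchical text classification
• Flat measures for multi-label classification: Accuracy, micro-averaged F-measure (MiF), macro-averaged F-measure (MaF), example-based F-measure (EBF)
• Hierarchical measures: Lowest Common Ancestor F-measure (LCA-F, new), hierarchical F-measure (HF)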
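The flat and hierarchical Task a measures can be illustrated with a short Python sketch. It uses the common textbook definitions (MiF pools counts over all labels, MaF averages per-label F1, EBF averages per-document F1, Accuracy is per-document |gold ∩ predicted| / |gold ∪ predicted|, HF is micro-averaged F1 after adding every label's ancestors to both sets); these are assumptions and may differ in detail from BioASQ's official evaluation code. LCA-F is more involved and is not sketched. The toy taxonomy and labels are hypothetical.

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R), written as 2TP / (2TP + FP + FN); 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def flat_measures(gold: list[set[str]], pred: list[set[str]]) -> dict[str, float]:
    """Flat multi-label measures; gold[i] and pred[i] are the label sets of document i."""
    pairs = list(zip(gold, pred))
    n = len(pairs)
    # MiF: pool true positives, false positives and false negatives over everything.
    tp = sum(len(g & p) for g, p in pairs)
    fp = sum(len(p - g) for g, p in pairs)
    fn = sum(len(g - p) for g, p in pairs)
    # MaF: one F1 per label (labels seen in gold or predictions), then the plain average.
    labels = set().union(*gold, *pred)
    per_label = [
        f1(sum(lab in g and lab in p for g, p in pairs),
           sum(lab in p and lab not in g for g, p in pairs),
           sum(lab in g and lab not in p for g, p in pairs))
        for lab in labels
    ]
    return {
        "MiF": f1(tp, fp, fn),
        "MaF": sum(per_label) / len(per_label) if per_label else 0.0,
        # EBF: F1 of each document, averaged over documents.
        "EBF": sum(f1(len(g & p), len(p - g), len(g - p)) for g, p in pairs) / n,
        # Accuracy: Jaccard overlap per document (assumes every gold set is non-empty).
        "Acc": sum(len(g & p) / len(g | p) for g, p in pairs) / n,
    }


def augment(labels: set[str], parent: dict[str, str]) -> set[str]:
    """Add all ancestors of each label; top-level labels have no entry in `parent`."""
    out = set(labels)
    for lab in labels:
        node = lab
        while node in parent:
            node = parent[node]
            out.add(node)
    return out


def hierarchical_f(gold: list[set[str]], pred: list[set[str]], parent: dict[str, str]) -> float:
    """HF: micro-averaged F1 over ancestor-augmented gold and predicted sets."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g_aug, p_aug = augment(g, parent), augment(p, parent)
        tp += len(g_aug & p_aug)
        fp += len(p_aug - g_aug)
        fn += len(g_aug - p_aug)
    return f1(tp, fp, fn)


# Hypothetical toy taxonomy: A1 and A2 are children of A, B1 is a child of B.
parent = {"A1": "A", "A2": "A", "B1": "B"}
gold = [{"A1"}, {"B1"}]
pred = [{"A2"}, {"B1", "A1"}]
print(flat_measures(gold, pred))
print(hierarchical_f(gold, pred, parent))  # 0.6
```

Here MiF is 0.4 but HF is 0.6: predicting A2 instead of its sibling A1 earns no flat credit, yet the shared ancestor A is rewarded once the sets are augmented, which is the motivation for hierarchical measures.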
Task b: IR, QA, summarization, …
Phase A: standard IR measures, i.e. mean precision, mean recall, mean F-measure, MAP (used for winner selection) and GMAP
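MAP and GMAP both summarize per-question average precision (AP), but GMAP takes a geometric mean, so a single question with near-zero AP drags the score down sharply. The sketch below assumes the TREC-style AP (normalized by the total number of gold items) and a smoothing constant eps = 0.01 for GMAP; both choices are assumptions, and BioASQ's official normalization may differ. Document ids are hypothetical.

```python
import math


def average_precision(ranked: list[str], relevant: set[str]) -> float:
    """Mean of precision@k over the ranks k that hold a relevant item.

    Assumption: normalized by the total number of gold items (TREC-style)
    and the ranked list contains no duplicates.
    """
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / k  # precision at this rank
    return total / len(relevant)


def mean_ap(aps: list[float]) -> float:
    """MAP: arithmetic mean of the per-question AP values."""
    return sum(aps) / len(aps)


def geometric_map(aps: list[float], eps: float = 0.01) -> float:
    """GMAP: geometric mean of (AP + eps); eps (an assumed value) avoids log(0)."""
    return math.exp(sum(math.log(ap + eps) for ap in aps) / len(aps))


# Two hypothetical questions.
aps = [
    average_precision(["d1", "d2", "d3"], {"d1", "d3"}),  # (1/1 + 2/3) / 2
    average_precision(["d4", "d5"], {"d6"}),              # nothing relevant retrieved
]
print(mean_ap(aps), geometric_map(aps))
```

With these inputs MAP is about 0.42 while GMAP is about 0.09: the failed second question barely hurts the arithmetic mean but dominates the geometric one, so GMAP rewards systems that perform consistently across questions.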
Phase B:
• ‘Exact’ answers (depending on question type): accuracy (yes/no); strict/lenient accuracy and MRR (factoid); mean F-measure (list)
• ‘Ideal’ answers: manual scores from the experts (Readability, Repetition, Information Precision, Information Recall), plus ROUGE
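The Phase B “exact answer” measures can be sketched as follows. Strict accuracy credits a factoid question only when the first candidate is correct, lenient accuracy credits it when any returned candidate is correct, and MRR averages the reciprocal rank of the first correct candidate; list questions are scored by F-measure and averaged. Case-insensitive exact string matching, gold answers given as sets of synonyms, and a cut-off of five factoid candidates are assumptions here, not the official matching rules. The wrong candidate “DNMT1” and the list-question system output are hypothetical.

```python
def norm(s: str) -> str:
    """Assumed normalization: strip whitespace and lower-case."""
    return s.strip().lower()


def yes_no_accuracy(gold: list[str], pred: list[str]) -> float:
    """Fraction of yes/no questions answered correctly."""
    return sum(norm(g) == norm(p) for g, p in zip(gold, pred)) / len(gold)


def factoid_scores(gold: list[set[str]], ranked: list[list[str]],
                   max_candidates: int = 5) -> tuple[float, float, float]:
    """(strict accuracy, lenient accuracy, MRR); gold[i] holds synonyms of the answer."""
    strict = lenient = rr = 0.0
    for synonyms, candidates in zip(gold, ranked):
        accepted = {norm(s) for s in synonyms}
        for rank, cand in enumerate(candidates[:max_candidates], start=1):
            if norm(cand) in accepted:
                if rank == 1:
                    strict += 1.0       # correct answer ranked first
                lenient += 1.0          # correct answer anywhere in the list
                rr += 1.0 / rank        # reciprocal rank of the first correct answer
                break
    n = len(gold)
    return strict / n, lenient / n, rr / n


def list_mean_f(gold: list[set[str]], pred: list[list[str]]) -> float:
    """Mean over list questions of the F-measure between returned and gold items."""
    scores = []
    for g, p in zip(gold, pred):
        g_set, p_set = {norm(x) for x in g}, {norm(x) for x in p}
        tp = len(g_set & p_set)
        denom = len(g_set) + len(p_set)  # equals 2TP + FP + FN
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


print(yes_no_accuracy(["yes"], ["Yes"]))  # 1.0
print(factoid_scores([{"S-adenosyl-L-methionine", "AdoMet", "SAM"}],
                     [["DNMT1", "SAM"]]))  # (0.0, 1.0, 0.5)
print(list_mean_f([{"Aspergillus terreus", "Aspergillus niger", "Ustilago maydis"}],
                  [["Aspergillus terreus", "Aspergillus niger", "Aspergillus itaconicus"]]))
```

In the factoid example the correct answer appears at rank 2, so it fails strict accuracy, passes lenient accuracy and contributes 1/2 to MRR; the list example finds two of three gold species plus one wrong one, giving F = 2/3.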
First year technology/results overview
• Task 1a
– Mainly SVMs and learning-to-rank.
– Mostly flat classification, ignoring the class taxonomy.
– Mediocre results from hierarchical methods.
– One of the systems outperformed NLM’s system.
• Task 1b
– Phase A (retrieve relevant documents, concepts, snippets, triples): low performance (compared to baselines).
– Phase B (formulate ‘exact’ and ‘ideal’ answers): poor performance for ‘exact’ answers (except for yes/no questions); high performance for ‘ideal’ answers (paragraph-sized summaries), but starting from gold documents, snippets etc.
• Large scope for improvements, especially in Task 1b.
“Exact” answer results (batch 2/3)
“Ideal” answer results (batch 2/3)
Results – task a – flat measures
Results – task a – hierarchical measures
First challenge prizes
Sustainability
Making the challenge viable, at very low cost, after the end of the project:
• BioASQ Oracle
• Software release and installation instructions
• Benchmark datasets
• BioASQ social network
• Involvement of the biomedical community in the process
• Attracting sponsors for prizes
Project Consortium
1. National Centre for Scientific Research “Demokritos” – NCSR “D” (EL)
2. Transinsight GmbH – TI (D)
3. Université Joseph Fourier – UJF (F)
4. University Leipzig – ULEI (D)
5. Université Pierre et Marie Curie Paris 6 – UPMC (F)
6. Athens University of Economics and Business – Research Centre – AUEB-RC (EL)
Get in touch!
BioASQ workshop @CLEF (Sheffield, September 2014)
Visit www.bioasq.org
Follow @BioASQ
Useful Links
• BioASQ Annotation & assessment tools:
– http://at.bioasq.org/
– http://assess.bioasq.org/
– https://github.com/AKSW/BioASQ-AT
• BioASQ social network:
– http://sn.bioasq.org/
– https://github.com/AKSW/BioASQ-SN
• BioASQ platform:
– http://bioasq.lip6.fr/
• BioASQ Oracles:
– http://bioasq.lip6.fr/oracle/
A. Kosmopoulos, I. Partalas, E. Gaussier, G. Paliouras, I. Androutsopoulos, Evaluation Measures for Hierarchical Classification: a unified view and novel approaches. Data Mining and Knowledge Discovery (To appear)