Moving beyond free text
Authors
Old Paradigm:
Scientist does research
Scientist publishes research results in journal article
Want:
All genes involved in seed development (name, species, protein sequence)
Read 3,404 articles???
Read 592,000 articles???
Old Paradigm - extended:
Scientist does research
Scientist publishes research results as free text
Results extracted from free text and converted to a structured format (ontology annotations)
Manual curation (+ NLP…?)
Database
Structured data combined with other data for queries, further analysis
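Once results are stored as ontology annotations, the "want" above (all genes involved in seed development, with name, species and protein sequence) becomes a database join instead of a reading list. The sketch below assumes a hypothetical two-table layout (gene-to-term annotations plus gene records), a hand-written parent map standing in for the ontology hierarchy, and GO:0048316 as the Gene Ontology ID for "seed development". All gene IDs, names, sequences and HYP: terms are placeholders, not real data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    gene_id: str
    term_id: str  # ontology term a curator attached to the gene

@dataclass(frozen=True)
class GeneRecord:
    gene_id: str
    name: str
    species: str
    protein_sequence: str

# Assumed GO ID for "seed development"; verify against a current GO release.
SEED_DEVELOPMENT = "GO:0048316"

def descendants(term_id, parents):
    """Return term_id plus every term below it in the hierarchy.

    `parents` maps each child term to the set of its direct parent terms.
    """
    found = {term_id}
    changed = True
    while changed:
        changed = False
        for child, direct_parents in parents.items():
            if child not in found and direct_parents & found:
                found.add(child)
                changed = True
    return found

def genes_for_term(term_id, annotations, genes, parents):
    """Join annotations with gene records: (name, species, protein sequence)
    for each gene annotated to term_id or to any of its descendants."""
    wanted = descendants(term_id, parents)
    by_id = {g.gene_id: g for g in genes}
    hits = {}  # dict keeps first-seen order and drops duplicate genes
    for a in annotations:
        if a.term_id in wanted and a.gene_id in by_id:
            g = by_id[a.gene_id]
            hits[g.gene_id] = (g.name, g.species, g.protein_sequence)
    return list(hits.values())

# Hypothetical toy data: IDs, names, terms and sequences are placeholders.
parents = {"HYP:0000001": {SEED_DEVELOPMENT}}
genes = [
    GeneRecord("g1", "geneA", "Arabidopsis thaliana", "MAGQK"),
    GeneRecord("g2", "geneB", "Zea mays", "MSTLE"),
    GeneRecord("g3", "geneC", "Oryza sativa", "MKVLA"),
]
annotations = [
    Annotation("g1", SEED_DEVELOPMENT),
    Annotation("g2", "HYP:0000001"),  # child of seed development
    Annotation("g3", "HYP:0000002"),  # unrelated term
]

print(genes_for_term(SEED_DEVELOPMENT, annotations, genes, parents))
```

Expanding the query term to its descendants matters: a gene annotated only to a more specific child term would otherwise be missed.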
Example – Journal article about gene function
The goal: an annotation that captures the result
Manual curation: time-consuming, does not scale well
NLP: very challenging
Example – phylogenetic treatment
http://www.mobot.org/mobot/research/apweb/welcome.html
Relatively high degree of structure compared to a journal article
May be more amenable to natural language processing, but the information is complex and still very challenging to extract
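To make "more amenable, but still challenging" concrete, here is a minimal sketch of the simplest possible approach, a dictionary lookup (a swapped-in illustration, not a method from this presentation). The lexicon, its plural forms and the example sentence are hypothetical; the Plant Ontology IDs are the ones used in the mutant phenotype example in this presentation.

```python
import re

# Hypothetical lexicon: plant-part words mapped to Plant Ontology IDs.
LEXICON = {
    "leaf": "PO:0025034",
    "leaves": "PO:0025034",
    "ovule": "PO:0020003",
    "ovules": "PO:0020003",
    "seed": "PO:0009010",
    "seeds": "PO:0009010",
}

def tag_terms(text, lexicon=LEXICON):
    """Return (start, end, matched word, term ID) for each lexicon word in text.

    Whole-word, case-insensitive matching only: no negation handling, no
    linking of qualities to structures, no synonyms beyond the lexicon.
    """
    words = sorted(lexicon, key=len, reverse=True)  # try longer forms first
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE
    )
    return [
        (m.start(), m.end(), m.group(0), lexicon[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]

# Hypothetical description sentence in a treatment-like style.
sentence = "Leaves alternate; seeds 2 per fruit, ovules not seen."
for span in tag_terms(sentence):
    print(span)
```

The structure words are found easily, but "ovules not seen" is still tagged as an ovule: the matcher has no notion of negation, counts, or which property belongs to which structure, which is where the real difficulty lies.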
Scientist does research
Scientist publishes research results as free text
Results extracted from free text and converted to a structured format (ontology annotations)
Manual curation (+ NLP)
Can we get authors involved?
Database
Structured data combined with other data for queries, further analysis
Link to external resource
Scientific publishers are interested in this problem…
Science Direct: http://www.sciencedirect.com/science/article/pii/S0378111910001502
Databases are interested in this problem…
What if we had a good general tool for authors to do this themselves?
http://herbarium.usu.edu/webmanual/
Example: Morphological description of species
Example: Mutant phenotype description
PO:0025034 (leaf), PATO:0000599 (decreased width)
PO:0020003 (ovule), PATO:0000460 (abnormal)
PO:0009010 (seed), PATO:0001997 (reduced)
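Each phenotype line pairs a plant structure from the Plant Ontology (PO) with a quality from PATO, an entity-quality pattern. A minimal sketch of how an authoring tool might hold such a pair as a typed, validated record, assuming a hypothetical `PhenotypeAnnotation` class and a seven-digit local ID format; the labels are copied from the example, not looked up from the ontologies.

```python
import re
from dataclasses import dataclass

# Assumption: PO and PATO IDs look like PREFIX:NNNNNNN (seven digits).
CURIE = re.compile(r"^(?P<prefix>[A-Z]+):(?P<local>\d{7})$")

# Labels copied from the example; a real tool would query the ontologies.
LABELS = {
    "PO:0025034": "leaf",
    "PO:0020003": "ovule",
    "PO:0009010": "seed",
    "PATO:0000599": "decreased width",
    "PATO:0000460": "abnormal",
    "PATO:0001997": "reduced",
}

@dataclass(frozen=True)
class PhenotypeAnnotation:
    """Entity-quality pair: a PO structure plus a PATO quality."""
    entity: str
    quality: str

    def __post_init__(self):
        # Reject IDs from the wrong ontology, e.g. entity and quality swapped.
        for field_name, expected_prefix in (("entity", "PO"), ("quality", "PATO")):
            value = getattr(self, field_name)
            match = CURIE.match(value)
            if match is None or match.group("prefix") != expected_prefix:
                raise ValueError(
                    f"{field_name} must be a {expected_prefix} ID, got {value!r}"
                )

def describe(annotation):
    """Human-readable form: 'structure (ID): quality (ID)'."""
    return (f"{LABELS[annotation.entity]} ({annotation.entity}): "
            f"{LABELS[annotation.quality]} ({annotation.quality})")

annotations = [
    PhenotypeAnnotation("PO:0025034", "PATO:0000599"),
    PhenotypeAnnotation("PO:0020003", "PATO:0000460"),
    PhenotypeAnnotation("PO:0009010", "PATO:0001997"),
]
for a in annotations:
    print(describe(a))
```

Validating at construction time is one way a tool could catch author mistakes, such as swapping the structure and quality, before the annotation is published.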
New Paradigm:
Scientist does research
Scientist publishes research results as free text and as annotations using ontology terms
Benefit to scientist – wider exposure and reuse of results
Benefit to publishers – tagged text allows enhanced presentation for subscribers
Benefit to research community – better access to data
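One way "tagged text allows enhanced presentation" could work: the author supplies character offsets for each annotated phrase (stand-off annotation), and the publisher turns them into links. A minimal sketch, assuming non-overlapping spans, a hypothetical `render_tagged` helper and example sentence, and OBO PURL-style term links (`http://purl.obolibrary.org/obo/PO_0009010`).

```python
from html import escape

OBO_PURL = "http://purl.obolibrary.org/obo/"

def term_url(term_id):
    """OBO PURL style: 'PO:0009010' -> '.../obo/PO_0009010'."""
    return OBO_PURL + term_id.replace(":", "_")

def render_tagged(text, spans):
    """Wrap each (start, end, term_id) span of text in a link to its term.

    Offsets are character positions in text; spans must not overlap.
    All text is HTML-escaped.
    """
    out = []
    pos = 0
    for start, end, term_id in sorted(spans):
        if start < pos:
            raise ValueError("overlapping spans are not supported")
        out.append(escape(text[pos:start]))
        out.append(
            f'<a class="term" href="{escape(term_url(term_id))}" '
            f'title="{escape(term_id)}">{escape(text[start:end])}</a>'
        )
        pos = end
    out.append(escape(text[pos:]))
    return "".join(out)

# Hypothetical author-supplied annotations for one sentence.
text = "Mutant seeds are reduced in size."
spans = []
for word, term in (("seeds", "PO:0009010"), ("reduced", "PATO:0001997")):
    start = text.index(word)
    spans.append((start, start + len(word), term))

print(render_tagged(text, spans))
```

Keeping annotations as offsets next to the unchanged free text means the article reads the same as before, while the same spans can also be exported to a database.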