Zhiyong Lu, Earl Stadtman Investigator National Center for Biotechnology Information (NCBI) National Library of Medicine (NLM) National Institutes of Health (NIH)
[Figure: number of Swiss-Prot proteins (y-axis, 0 to 120,000) over time (x-axis, 1/02 to 1/07); series: proteins missing a FUNCTION comment, proteins gaining a FUNCTION comment]
[Figure: proportion of queries (y-axis, 0.00 to 0.40); labeled values: 10.6%, 19.0%, 19.9%; group labels: Bibliographic, Non-bibliographic]
Neveol, Dogan, Lu, Semi-automatic semantic annotation of PubMed queries: A study on quality, efficiency, satisfaction, Journal of Biomedical Informatics, 2010
• Disease: diabetes mellitus; DM; type 2 diabetes
• Genomic variation: c.77A>C; c.77A->C; A77C; AC
• Gene/Protein: TP53; tumor protein p53; p53; BCC7; LFS1
• Species: Arabidopsis thaliana; thale-cress; AT
• Chemical/Drug: Aspirin; 2-(Acetyloxy)benzoic Acid; Acetysal
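The name variants above can be illustrated with a toy lookup. This is only a sketch: the lexicon is copied from the slide, the function names `build_index` and `lookup` are hypothetical, and a plain dictionary match is an assumption made for illustration, not how the tools named in this talk recognize concepts.

```python
# Toy lexicon copied from the slide. Real concept recognizers rely on
# curated vocabularies and statistical models, not a plain lookup table.
LEXICON = {
    "Disease": ["diabetes mellitus", "DM", "type 2 diabetes"],
    "Genomic variation": ["c.77A>C", "c.77A->C", "A77C", "AC"],
    "Gene/Protein": ["TP53", "tumor protein p53", "p53", "BCC7", "LFS1"],
    "Species": ["Arabidopsis thaliana", "thale-cress", "AT"],
    "Chemical/Drug": ["Aspirin", "2-(Acetyloxy)benzoic Acid", "Acetysal"],
}


def build_index(lexicon):
    """Map each lower-cased name variant to the set of concept types it can denote."""
    index = {}
    for concept_type, names in lexicon.items():
        for name in names:
            index.setdefault(name.lower(), set()).add(concept_type)
    return index


def lookup(mention, index):
    """Return the sorted concept types for a mention, or an empty list if unknown."""
    return sorted(index.get(mention.strip().lower(), set()))


INDEX = build_index(LEXICON)
print(lookup("p53", INDEX))
print(lookup("Thale-Cress", INDEX))
print(lookup("unknown term", INDEX))
```

Short abbreviations such as DM, AC, and AT show why a lookup alone is not enough: the same string can mean different things depending on context.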
Disease: DNorm – 80.90%
Mutation: tmVar – 91.39%
Gene/Protein: GenNorm – 84.50%
Species: SR4GN – 85.42%
Chemical/Drug: tmChem – 88.27%
All numbers are F1 scores.
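For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below uses the standard definition; the function name `f1_score` and the counts in the example are made up for illustration and are not taken from any of the evaluations above.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall, computed from raw counts."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 80 correct mentions, 20 spurious, 20 missed.
print(f"{f1_score(80, 20, 20):.2%}")
```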
• Freely available & open source
• High performance:
  DNorm: Best in 2013 ShARe/CLEF shared task on Disease Normalization
  tmChem: Best in 2013 BioCreative IV Chemical Entity Mention task
  GenNorm: Best in 2010 BioCreative III Gene Normalization Task
• BioC format compatible for improved interoperability
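As a rough illustration of what BioC compatibility offers, the sketch below reads annotations from a small BioC-style XML document using only the Python standard library. The sample document, its identifier, and the annotation type labels are invented for this example, and the element layout (collection, document, passage, annotation, infon, location) reflects a general understanding of the BioC XML format rather than the output of any specific tool; `read_annotations` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Invented sample in the general shape of a BioC XML collection.
SAMPLE = """<collection>
  <source>example</source>
  <date>2013</date>
  <key>example.key</key>
  <document>
    <id>0000000</id>
    <passage>
      <infon key="type">title</infon>
      <offset>0</offset>
      <text>TP53 mutations and aspirin response</text>
      <annotation id="0">
        <infon key="type">Gene</infon>
        <location offset="0" length="4"/>
        <text>TP53</text>
      </annotation>
      <annotation id="1">
        <infon key="type">Chemical</infon>
        <location offset="19" length="7"/>
        <text>aspirin</text>
      </annotation>
    </passage>
  </document>
</collection>"""


def read_annotations(xml_text):
    """Return (document id, annotation type, covered text) for each annotation."""
    root = ET.fromstring(xml_text)
    rows = []
    for document in root.iter("document"):
        doc_id = document.findtext("id")
        for passage in document.iter("passage"):
            passage_text = passage.findtext("text") or ""
            passage_offset = int(passage.findtext("offset") or 0)
            for annotation in passage.iter("annotation"):
                # The annotation type is stored in an infon with key="type".
                ann_type = next(
                    (i.text for i in annotation.findall("infon")
                     if i.get("key") == "type"),
                    None,
                )
                location = annotation.find("location")
                # Location offsets are document-level; shift to passage-level.
                start = int(location.get("offset")) - passage_offset
                length = int(location.get("length"))
                rows.append((doc_id, ann_type, passage_text[start:start + length]))
    return rows


for row in read_annotations(SAMPLE):
    print(row)
```

Because the covered text is recovered from offsets rather than copied from the annotation element, the same reader works for any tool that emits this layout, which is the interoperability benefit the slide refers to.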
DNorm: www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/Dnorm/
tmVar: www.ncbi.nlm.nih.gov/CBBresearch/Lu/pub/tmVar/
SR4GN: www.ncbi.nlm.nih.gov/CBBresearch/Lu/downloads/SR4GN/
GenNorm: http://ikmbio.csie.ncku.edu.tw/GN/
tmChem: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmChem/
To make it easy for biocurators, we have already applied all these tools to PubMed abstracts and stored the results in our Web-based annotation tool, PubTator!
1. Web-based; no installation required; in sync with PubMed
2. One-stop curation service from literature search to annotation
3. Curator friendly (PubMed-like) interface; easy to use
4. Integrates competition-winning text-mining tools for automatic pre-annotations
5. Easy to adapt and customize to different curation tasks
Wei, Kao, & Lu: PubTator: a Web-based text-mining tool for assisting biocuration, to appear in Nucleic Acids Research, 2013
Bio-concept annotation
Bio-relation annotation
Document triage
BioCreative Workshop | Location | Workshop date
BC I | Granada, Spain | Mar 2004
BC II | Madrid, Spain | Apr 2007
BC II.5 | Madrid, Spain | Oct 2009
BC III | Bethesda, USA | Sep 2010
BC 2012 | DC, USA | Oct 2012
BC IV | Bethesda, USA | Oct 2013
Tasks (columns in the original table): GM, GN, GO, PPI, IAT, CTD, Curation Workflow, BioC, CHEMDNER
www.biocreative.org
Task: manually annotating genes in 50 abstracts
Experimental settings (25 abstracts each)
1. PubMed + spreadsheet (baseline)
2. PubTator + computer-generated gene results
Results: 40% decrease in curation time & slightly higher accuracy
Wei, Harris, … Lu. Accelerating literature curation with text mining tools: a case study of using PubTator to curate genes in PubMed abstracts. Database, 2012; bas041
Arighi, et al., An overview of the BioCreative 2012 Workshop Track III: interactive text mining task. Database, 2013. bas056
“PubTator substantially reduces the manual data input involved, reflected in both time-savings and reduction in physical fatigue of keyboard typing.” – Mindy C.
eCuration: computer-assisted curation can improve productivity
Future directions
Working with ontologies
Working with full-text
What would you do with PubTator?
My Team
Rezarta Dogan
Bethany Harris
Ritu Khare
Aurelie Neveol
Yuqing Mao
Robert Leaman
Jiao Li
Chih-Hsuan Wei
BioCreative
Lynette Hirschman, MITRE
Kevin Cohen, U of Colorado
Alfonso Valencia; Martin Krallinger, CNIO
Cecilia Arighi, Cathy Wu, U of Delaware
Carolyn Mattingly; Tom Wiegers, NCSU
Supported by NIH Intramural Research Program, National Library of Medicine.
Pacific Symposium on Biocomputing (PSB) 2015
Robert Leaman and Zhiyong Lu, NCBI/NLM/NIH
Ben Good and Andrew Su, Scripps Research Institute
January 4 – 8, 2015
The Big Island of Hawaii
Crowdsourcing and Mining Crowd Data
Crowdsourcing techniques
microtask environments
games with a purpose
workflow sequestration
Crowd data
human genomics sequence data
electronic health records
social media data