The Gene Ontology and its insertion into UMLS
Jane Lomax
The Gene Ontology
Set of three structured vocabularies
Provide functional annotation of gene products
Dynamic
Cross-references to external databases
The vocabularies
Molecular function: elemental activity or task
• nuclease, DNA binding, microtubule motor
Biological process: broad objective or goal
• mitosis, signal transduction, metabolism
Cellular component: location or complex
• nucleus, ribosome
GO structure
Directed acyclic graph (DAG)
• Allows multiple parentage
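As a minimal sketch of what multiple parentage in a DAG means, the Python below stores each term with a set of parents and collects every ancestor of a term. The toy graph and the names `PARENTS`, `ancestors`, and `is_acyclic` are illustrative assumptions, not the real GO graph or any GO tool's API.

```python
from typing import Dict, Set

# Hypothetical toy graph: child term -> set of parent terms.
PARENTS: Dict[str, Set[str]] = {
    "cellular component": set(),
    "cell": {"cellular component"},
    "nucleus": {"cell"},
    "chromosome": {"cell"},
    "nuclear chromosome": {"nucleus", "chromosome"},  # two parents
}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return every term reachable by following parent links from `term`."""
    seen: Set[str] = set()
    stack = list(parents[term])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen


def is_acyclic(parents: Dict[str, Set[str]]) -> bool:
    """A graph is acyclic if no term is its own ancestor."""
    return all(term not in ancestors(term, parents) for term in parents)
```

In this toy graph `nuclear chromosome` has two parents, so its ancestors are reached along more than one path, which a strict tree could not represent.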
True-path rule
Every path from a node back to the root must be biologically accurate
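One practical consequence of the true-path rule is that a gene product annotated to a term is implicitly annotated to every ancestor of that term. A minimal sketch of that propagation, assuming a hypothetical toy hierarchy and a hypothetical function name `implied_terms`:

```python
from typing import Dict, Iterable, Set

# Hypothetical toy hierarchy: child term -> set of parent terms.
PROCESS_PARENTS: Dict[str, Set[str]] = {
    "biological process": set(),
    "cell cycle": {"biological process"},
    "M phase": {"cell cycle"},
    "mitosis": {"M phase"},
}


def implied_terms(direct: Iterable[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the direct annotations plus every ancestor they imply."""
    implied: Set[str] = set()
    stack = list(direct)
    while stack:
        term = stack.pop()
        if term not in implied:
            implied.add(term)
            stack.extend(parents[term])
    return implied
```

If any term in the returned set would be a biologically wrong statement about the gene product, some link on the path to the root violates the rule and needs to be revised.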
Relationship types
is_a
• subclass: a is a type of b
part_of
• physical part of (component)
• sub-process of (process)
What makes up a GO term?
• term name
• go_id
• definition and definition dbxref
• GO synonym
• general dbxref
• comment
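The fields listed above map naturally onto a small record type. The sketch below is one possible representation; the class name `GOTerm` and its attribute names are assumptions chosen to mirror the slide, not a format defined by the GO Consortium. The only check it makes is that a GO identifier has the form `GO:` followed by seven digits.

```python
import re
from dataclasses import dataclass, field
from typing import List

GO_ID_PATTERN = re.compile(r"GO:\d{7}")


@dataclass
class GOTerm:
    """Hypothetical container for the parts of a GO term."""
    name: str
    go_id: str
    definition: str = ""
    definition_dbxrefs: List[str] = field(default_factory=list)
    synonyms: List[str] = field(default_factory=list)
    dbxrefs: List[str] = field(default_factory=list)
    comment: str = ""

    def __post_init__(self) -> None:
        # Reject identifiers that are not "GO:" plus seven digits.
        if not GO_ID_PATTERN.fullmatch(self.go_id):
            raise ValueError(f"malformed GO id: {self.go_id!r}")


# The definition is left empty here rather than invented.
nucleus = GOTerm(name="nucleus", go_id="GO:0005634")
```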
GO cross-links
Cross-references within GO
• EC
• RESID
• MetaCyc
Mappings
• SWISS-PROT keywords
Links in other databases
• InterPro
• UMLS/MeSH (in progress)
Why insert GO into UMLS?
A rich, widely used source for expanding UMLS
• Can be used to improve areas of MeSH
Potential for ‘non-fuzzy’ text mining using GO terms
• MeSH terms manually assigned to papers
Unified Medical Language System (UMLS)
Research project maintained by the National Library of Medicine (NLM)
Aims to
• allow computers to ‘understand’ biomedical meaning
• improve retrieval and integration of computer-readable info
Has three ‘Knowledge sources’:
• UMLS Metathesaurus
• SPECIALIST lexicon
• semantic network
Knowledge sources
UMLS Metathesaurus
• links multiple source vocabularies into unified concepts, includes MeSH (Medical Subject Headings)
• GO to become source vocabulary
SPECIALIST lexicon
• provides biomedical/English lexical info
semantic network
• for categorizing concepts
Inserting GO into UMLS
inversion
• converting GO to correct format for UMLS
insertion
• inserting GO using matching algorithms
editing
• all concepts containing GO term reviewed by hand
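The slides do not describe the matching algorithms used during insertion. Purely as an illustration of the idea, the sketch below matches GO term names to existing concept names by normalized exact string comparison. The function names, the concept identifiers (`C-EX-1`, `C-EX-2`), and the concept names are hypothetical and are not taken from the Metathesaurus.

```python
import re
from typing import Dict, List, Tuple


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", name.lower()).split())


def insert_terms(
    go_terms: Dict[str, str],
    concepts: Dict[str, List[str]],
) -> Tuple[Dict[str, str], List[str]]:
    """Match GO term names against the names already attached to concepts.

    Returns (matched, unmatched): `matched` maps a GO id to the existing
    concept it would join; `unmatched` lists GO ids that would need a
    new concept of their own.
    """
    index: Dict[str, str] = {}
    for concept_id, names in concepts.items():
        for name in names:
            index.setdefault(normalize(name), concept_id)

    matched: Dict[str, str] = {}
    unmatched: List[str] = []
    for go_id, name in go_terms.items():
        concept_id = index.get(normalize(name))
        if concept_id is None:
            unmatched.append(go_id)
        else:
            matched[go_id] = concept_id
    return matched, unmatched


# Hypothetical existing concepts and two GO terms to insert.
EXAMPLE_CONCEPTS = {"C-EX-1": ["Cell Nucleus", "Nucleus"], "C-EX-2": ["Ribosomes"]}
EXAMPLE_GO_TERMS = {"GO:0005634": "nucleus", "GO:0005840": "ribosome"}
```

With this data `nucleus` joins `C-EX-1`, but `ribosome` fails to match the plural `Ribosomes`. Exact matching both misses true matches and can merge terms that only share a name, which is one reason a manual editing pass follows the automated step.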
Statistics
Approximately 23% of GO terms ‘match’ something in another source vocabulary
• 23.03%: GO terms in concepts with other sources
• 76.97%: GO terms in concepts where they are the only source
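The two figures above are complementary shares of all GO terms. A minimal sketch of how such a split could be computed, assuming a hypothetical mapping from each GO term to the set of source vocabularies present in its concept (the identifiers and data are invented toy values and do not reproduce the figures on the slide):

```python
from typing import Dict, Set, Tuple


def source_split(term_sources: Dict[str, Set[str]]) -> Tuple[float, float]:
    """Return (% of terms sharing a concept with another source, % GO-only)."""
    if not term_sources:
        raise ValueError("no GO terms supplied")
    shared = sum(1 for sources in term_sources.values() if sources - {"GO"})
    shared_pct = 100.0 * shared / len(term_sources)
    return shared_pct, 100.0 - shared_pct


# Hypothetical toy data: GO term -> sources found in the same concept.
TOY_TERM_SOURCES = {
    "GO:A": {"GO", "MSH"},
    "GO:B": {"GO"},
    "GO:C": {"GO"},
    "GO:D": {"GO"},
}
```

A term counts as shared as soon as its concept contains any source other than GO, so a term matched by several vocabularies is still counted once.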
Statistics
% of GO terms in concepts with other sources, by GO vocabulary
[Chart. Categories: biological process, molecular function, cellular component. Value labels: 4.6%, 27.8%, 45.2%]
Statistics
% of GO terms in concepts with other sources, by source
• CSP2002 (Computer Retrieval of Information on Scientific Projects Thesaurus): 7.34%
• MSH2003_2002_08_14 (Medical Subject Headings): 19.74%
• SNMI98 (Systematized Nomenclature of Human and Veterinary Medicine): 11.05%
[Diagram labels: GO, CRISP, MeSH, SNOMED]
[Annotated figure. Labels: concept name, concept id, GO atoms, MeSH atoms, EC number, contexts, relationships to other concepts, definition]
Challenges with insertion
GO synonyms
• As GO has evolved, not all GO synonyms are now true synonyms
GO enzymes
• GO separates enzyme function from enzyme ‘complexes’; most vocabularies don’t
Semantic types
• What semantic types now apply to concepts with GO atoms?
Future of insertion
Hoped that GO can be released with UMLS early next year
• dependent on ironing out problems
Maintenance of insertion
• GO changing continually, so large differences between UMLS releases
www.geneontology.org
• FlyBase & Berkeley Drosophila Genome Project
• Saccharomyces Genome Database
• PomBase (Sanger Institute)
• Rat Genome Database
• Genome Knowledge Base (CSHL)
• The Institute for Genomic Research
• Compugen, Inc
• The Arabidopsis Information Resource
• WormBase
• DictyBase
• Mouse Genome Informatics
• Swiss-Prot/TrEMBL/InterPro
• Pathogen Sequencing Unit (Sanger Institute)
• National Library of Medicine
• Alexa McCray
• Stuart Nelson
• Bill Hole
• Oak Ridge Institute for Science and Education
• National Library of Medicine
• U. S. Department of Energy
The Gene Ontology Consortium is supported by an R01 grant from the National Human Genome Research Institute (NHGRI) [grant HG02273]. SGD is supported by a P41, National Resources, grant from the NHGRI [grant HG01315]; MGD by a P41 from the NHGRI [grant HG00330]; GXD by the National Institute of Child Health and Human Development [grant HD33745]; FlyBase by a P41 from the NHGRI [grant HG00739] and by the Medical Research Council, London. TAIR is supported by the National Science Foundation [grant DBI-9978564]. WormBase is supported by a P41, National Resources, grant from the NHGRI [grant HG02223]; RGD is supported by an R01 grant from the NHLBI [grant HL64541]; DictyBase is supported by an R01 grant from the NIGMS [grant GM064426].