GIGA2 Structuring Phenotype Data
![Page 1: GIGA2 Structuring Phenotype Data](https://reader034.fdocuments.us/reader034/viewer/2022042522/55a8e1551a28abb94e8b46a5/html5/thumbnails/1.jpg)
GIGA2, Munich, March 2015
STRUCTURING PHENOTYPE DATA:
Lessons from vertebrate genomes
Chris Mungall
LBNL, Berkeley
Gene Ontology
![Page 3: GIGA2 Structuring Phenotype Data](https://reader034.fdocuments.us/reader034/viewer/2022042522/55a8e1551a28abb94e8b46a5/html5/thumbnails/3.jpg)
Genome structures are highly amenable to comparison
Desvignes, T., Pontarotti, P., & Bobe, J. (2010). Nme gene family evolutionary history reveals pre-metazoan origins and high conservation between humans and the sea anemone, Nematostella vectensis. PLoS ONE, 5(11). doi:10.1371/journal.pone.0015506
![Page 4: GIGA2 Structuring Phenotype Data](https://reader034.fdocuments.us/reader034/viewer/2022042522/55a8e1551a28abb94e8b46a5/html5/thumbnails/4.jpg)
Can we compute over the architecture of phenomes as we do
for genome architecture?
o What genes affect distal appendage length or shape?
o What genes are expressed in the mouth during development?
o What structures develop using the same gene regulatory networks as
in bilaterian mouths?
Current methods
o Text-based search of the literature, with results gathered manually
Time-consuming
Hard to automate
COMPUTING OVER PHENOTYPES
![Page 5: GIGA2 Structuring Phenotype Data](https://reader034.fdocuments.us/reader034/viewer/2022042522/55a8e1551a28abb94e8b46a5/html5/thumbnails/5.jpg)
[Figure: a Gene linked into the space of "every phenotype ever to have existed". Example links: expressed in mouth; affects appendage length; regulates EMT …]
![Page 6: GIGA2 Structuring Phenotype Data](https://reader034.fdocuments.us/reader034/viewer/2022042522/55a8e1551a28abb94e8b46a5/html5/thumbnails/6.jpg)
PHENOTYPES: ENDLESS FORMS
[Figure: image panels labeled Peytoia nathorsti, Amphipholis squamata, Petromyzon marinus, Bugula, Homo sapiens (with cleft palate), Mysticeti, Aplysina aerophoba, Gastrula (Metazoan)]
[Structure labels: mouth, anus, osculum, blastopore, cleft lip and palate]
![Page 7: GIGA2 Structuring Phenotype Data](https://reader034.fdocuments.us/reader034/viewer/2022042522/55a8e1551a28abb94e8b46a5/html5/thumbnails/7.jpg)
FREE TEXT != STRUCTURED
[Figure: a Gene annotated with free-text descriptions]
“expressed in mouth”
“expressed around oral opening”
“expressed in anterior end of gut tube”
“affects appendage length”
“long tentacles”
“elongated arms”
![Page 8: GIGA2 Structuring Phenotype Data](https://reader034.fdocuments.us/reader034/viewer/2022042522/55a8e1551a28abb94e8b46a5/html5/thumbnails/8.jpg)
ONTOLOGIES: STRUCTURING A DIVERSITY
OF PHENOTYPES
[Figure: ontology graph. Nodes: tentacle, tentacular bud, circumoral appendage, tentacular club, sucker, arm, arm IV, mouth. Edge types: develops into, is a subtype of, is part of, homologous, surrounds]
https://github.com/obophenotype/cephalopod-ontology
![Page 9: GIGA2 Structuring Phenotype Data](https://reader034.fdocuments.us/reader034/viewer/2022042522/55a8e1551a28abb94e8b46a5/html5/thumbnails/9.jpg)
ONTOLOGIES FOR MOLECULAR
PHENOTYPES
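[Figure: ontology graph. Nodes: tentacle, tentacular bud, circumoral appendage, tentacular club, sucker, arm, arm IV, mouth. Edge types: develops into, is a subtype of, is part of, homologous, surrounds. Genes Scr, Lox5 and Antp are attached to structures by "expressed in" edges]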
![Page 10: GIGA2 Structuring Phenotype Data](https://reader034.fdocuments.us/reader034/viewer/2022042522/55a8e1551a28abb94e8b46a5/html5/thumbnails/10.jpg)
GRAPH KNOWLEDGE QUERIES
[Figure: ontology graph. Nodes: tentacle, tentacular bud, circumoral appendage, tentacular club, sucker, arm, arm IV, mouth. Edge types: develops into, is a subtype of, is part of, homologous, surrounds. Genes Scr, Lox5 and Antp are attached to structures by "expressed in" edges]
“What genes are expressed in structures that develop from a tentacle bud, or homologs?”
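The query on this slide can be read as a graph traversal. The following is a minimal Python sketch under stated assumptions: the node and relation names come from the diagram, but which nodes each edge connects, and where each gene is expressed, are invented here for illustration and are not read off the slide.

```python
# Toy edge list: (subject, relation, object).
# ASSUMPTION: edge placement and gene expression sites are illustrative only.
EDGES = [
    ("tentacular bud", "develops_into", "tentacle"),
    ("tentacular club", "part_of", "tentacle"),
    ("sucker", "part_of", "tentacular club"),
    ("tentacle", "homologous_to", "arm"),
    ("arm IV", "is_a", "arm"),
    ("Scr", "expressed_in", "tentacle"),
    ("Lox5", "expressed_in", "arm IV"),
    ("Antp", "expressed_in", "mouth"),
]


def genes_expressed_from(bud, edges=EDGES):
    """Genes expressed in structures that develop from `bud`, or in homologs."""
    # 1. Structures that the bud develops into.
    targets = {o for s, r, o in edges if s == bud and r == "develops_into"}

    # 2. Add homologs of those structures (homology treated as symmetric).
    homologs = set()
    for s, r, o in edges:
        if r == "homologous_to":
            if s in targets:
                homologs.add(o)
            if o in targets:
                homologs.add(s)
    targets |= homologs

    # 3. Add subtypes and parts, transitively, so that expression in
    #    "arm IV" counts as expression in "arm".
    changed = True
    while changed:
        changed = False
        for s, r, o in edges:
            if r in ("is_a", "part_of") and o in targets and s not in targets:
                targets.add(s)
                changed = True

    # 4. Collect genes with an expressed_in edge into any target structure.
    return sorted({s for s, r, o in edges if r == "expressed_in" and o in targets})


print(genes_expressed_from("tentacular bud"))
```

A free-text search for "tentacle bud" would not find the gene annotated to arm IV; here it is reached through the homology and subtype edges.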
![Page 11: GIGA2 Structuring Phenotype Data](https://reader034.fdocuments.us/reader034/viewer/2022042522/55a8e1551a28abb94e8b46a5/html5/thumbnails/11.jpg)
ONTOLOGIES FOR TRAITS
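[Figure: ontology graph. Nodes: tentacle, tentacular bud, circumoral appendage, tentacular club, sucker, arm, arm IV, mouth. Edge types: develops into, is a subtype of, is part of, homologous, surrounds. Trait attributes are combined with structures:]
shape + tentacular club = shape of tentacular club
length + arm IV = length of arm IV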
![Page 12: GIGA2 Structuring Phenotype Data](https://reader034.fdocuments.us/reader034/viewer/2022042522/55a8e1551a28abb94e8b46a5/html5/thumbnails/12.jpg)
Wild-type phenotypic function:
o The Gene Ontology
Anatomy:
o Uberon anatomy ontology
APPLICATIONS OF ONTOLOGIES
![Page 13: GIGA2 Structuring Phenotype Data](https://reader034.fdocuments.us/reader034/viewer/2022042522/55a8e1551a28abb94e8b46a5/html5/thumbnails/13.jpg)
For curating the ‘wild type functional phenotypes’
Genes for over 0.5 million species have associations to GO terms
>40,000 terms
o Molecular function
o Cellular component
o Biological process
Core and taxon-specific
Uses include
o Gene set selection
o Term enrichment
THE GENE ONTOLOGY
Ashburner et al. (2000). Gene Ontology: tool for the unification of biology. Nature Genetics, 25, 25–29.
http://geneontology.org
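Term enrichment, listed above as a use of GO, asks whether a term is annotated to more genes in a study set than chance would predict. The sketch below uses a one-sided hypergeometric test built on the standard library only. The counts are invented for illustration, and real tools additionally propagate annotations up the ontology and correct for multiple testing.

```python
from math import comb


def enrichment_pvalue(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated genes in a study set.

    k: study-set genes annotated to the term
    n: size of the study set
    K: background genes annotated to the term
    N: size of the background
    """
    total = comb(N, n)
    upper = min(n, K)
    # Sum the hypergeometric probabilities for k, k+1, ..., upper.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# ASSUMPTION: made-up counts. 8 of 50 study genes carry the term,
# against 40 of 1000 genes in the background (2 expected by chance).
p = enrichment_pvalue(k=8, n=50, K=40, N=1000)
print(f"p = {p:.2e}")
```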
![Page 14: GIGA2 Structuring Phenotype Data](https://reader034.fdocuments.us/reader034/viewer/2022042522/55a8e1551a28abb94e8b46a5/html5/thumbnails/14.jpg)
Experimental
o Curated from literature
Automated methods:
o Based on sequence similarity
E.g. blast2go
o Based on protein features
Interpro2GO
o Based on phylogenetic evidence
Ensembl COMPARA
Panther Families and PAINT
Typically only applied for
conserved cellular biology
ASSIGNING GENE FUNCTION
Gaudet, P., et al. (2011). Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium.
Briefings in Bioinformatics, 12(5), 449–62. doi:10.1093/bib/bbr042
PAINT
![Page 15: GIGA2 Structuring Phenotype Data](https://reader034.fdocuments.us/reader034/viewer/2022042522/55a8e1551a28abb94e8b46a5/html5/thumbnails/15.jpg)
EXTRACTING GENE LISTS AND
INTERPRETING TRANSCRIPTOMIC DATA
Wang, Z., Pascual-Anaya, J., Zadissa, A., Li, W., Niimura, Y., Huang,
Z., … Irie, N. (2013). The draft genomes of soft-shell turtle and
green sea turtle yield insights into the development and evolution
of the turtle-specific body plan. Nature Genetics, 45(6), 701–6.
doi:10.1038/ng.2615
![Page 16: GIGA2 Structuring Phenotype Data](https://reader034.fdocuments.us/reader034/viewer/2022042522/55a8e1551a28abb94e8b46a5/html5/thumbnails/16.jpg)
BEYOND THE GO
Functional Genomics: Gene function
o Gene Ontology
o Links genes to what they do
Transcriptomics: Gene expression
o Anatomy and Stage Ontology
o Links genes to where they are expressed
Phenomics: Effects of gene mutations
o Phenotype and Trait Ontology
o Links genes to what happens when they are disrupted
![Page 17: GIGA2 Structuring Phenotype Data](https://reader034.fdocuments.us/reader034/viewer/2022042522/55a8e1551a28abb94e8b46a5/html5/thumbnails/17.jpg)
Core: 14,000 terms
o Bias towards vertebrate systems
Composite-Metazoan edition: 42,000 terms
o Integrates cell types, developmental stages
o Species-specific ontologies
Uses
o Standard reference for animal anatomy
o Linking model organism databases
o Evolutionary systematics (Phenoscape)
o Comparative transcriptomics (Bgee)
o Standardized vocabulary for mammalian
sequencing consortia
o Cross-species phenotype matching (Monarch)
THE UBERON MULTI-SPECIES
COMPARATIVE ANATOMY ONTOLOGY
http://uberon.org
Mungall, C. J., Torniai, C., Gkoutos, G. V, Lewis, S. E., & Haendel, M. A. (2012). Uberon, an integrative multi-species
anatomy ontology. Genome Biology, 13(1), R5. doi:10.1186/gb-2012-13-1-r5
![Page 18: GIGA2 Structuring Phenotype Data](https://reader034.fdocuments.us/reader034/viewer/2022042522/55a8e1551a28abb94e8b46a5/html5/thumbnails/18.jpg)
PHENOSCAPE: LINKING EVOLUTION TO
GENOMICS USING PHENOTYPE ONTOLOGIES
Phenotypic knowledgebase
o Linking phenotypes to extant and extinct vertebrate taxa
o Integrate with model organism databases
Extending Uberon to cover diversity of vertebrates
Haendel, MA, Balhoff JP, ..., Sereno, PC., Mungall, C.J (2014).
Unification of multi-species vertebrate anatomy ontologies for
comparative biology in Uberon. Journal of Biomedical Semantics,
5(1), 21. doi:10.1186/2041-1480-5-21
![Page 19: GIGA2 Structuring Phenotype Data](https://reader034.fdocuments.us/reader034/viewer/2022042522/55a8e1551a28abb94e8b46a5/html5/thumbnails/19.jpg)
UBERON FOR COMPARATIVE GENE
EXPRESSION
![Page 20: GIGA2 Structuring Phenotype Data](https://reader034.fdocuments.us/reader034/viewer/2022042522/55a8e1551a28abb94e8b46a5/html5/thumbnails/20.jpg)
EXAMPLE OF EXPRESSION DATA
| Ensembl ID | Gene | Stage ID | Stage | Anatomy ID | Anatomy | Evidence |
|---|---|---|---|---|---|---|
| ENSMUSG00000071424 | Grid2 | UBERON:0000112 | sexually immature | UBERON:0002979 | Purkinje cell layer of cerebellar cortex | high quality |
| ENSMUSG00000071424 | Grid2 | UBERON:0018241 | prime adult | UBERON:0004720 | cerebellar vermis | high quality |
Mus_musculus (‘simple’ expression file)
http://bgee.org/?page=download
![Page 21: GIGA2 Structuring Phenotype Data](https://reader034.fdocuments.us/reader034/viewer/2022042522/55a8e1551a28abb94e8b46a5/html5/thumbnails/21.jpg)
EXAMPLE OF INFERRED EXPRESSION
DATA
| Ensembl ID | Gene | Stage ID | Stage | Anatomy ID | Anatomy | Evidence |
|---|---|---|---|---|---|---|
| ENSMUSG00000071424 | Grid2 | UBERON:0000112 | sexually immature | UBERON:0002979 | Purkinje cell layer of cerebellar cortex | high quality |
| ENSMUSG00000071424 | Grid2 | UBERON:0000112 | sexually immature | UBERON:0002129 | cerebellar cortex | high quality |
| ENSMUSG00000071424 | Grid2 | UBERON:0000112 | sexually immature | UBERON:0002037 | cerebellum | high quality |
| ENSMUSG00000071424 | Grid2 | UBERON:0000112 | sexually immature | UBERON:0002028 | hindbrain | high quality |
| … | … | | | | | |
| ENSMUSG00000071424 | Grid2 | UBERON:0018241 | prime adult | UBERON:0004720 | cerebellar vermis | high quality |
| ENSMUSG00000071424 | Grid2 | UBERON:0018241 | prime adult | UBERON:0002037 | cerebellum | high quality |
| … | … | | | | | |
Mus_musculus (‘complete’ expression file)
http://bgee.org/?page=download
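The ‘complete’ file adds rows for every structure that contains an observed expression site. The sketch below shows one way such rows could be derived by walking up a part-of hierarchy. The parent map is written by hand from the anatomy names on this slide, each structure is given a single parent for simplicity, and nothing here is claimed to be Bgee's actual pipeline.

```python
# ASSUMPTION: a hand-written part-of chain, not loaded from Uberon.
PART_OF = {
    "Purkinje cell layer of cerebellar cortex": "cerebellar cortex",
    "cerebellar cortex": "cerebellum",
    "cerebellar vermis": "cerebellum",
    "cerebellum": "hindbrain",
}


def propagate(observations, part_of=PART_OF):
    """Return the observed rows plus one row per enclosing structure."""
    rows = set()
    for gene, stage, anatomy in observations:
        node = anatomy
        # Walk upward until a structure has no recorded parent.
        while node is not None:
            rows.add((gene, stage, node))
            node = part_of.get(node)
    return rows


# The two observed rows from the 'simple' file.
simple = [
    ("Grid2", "sexually immature", "Purkinje cell layer of cerebellar cortex"),
    ("Grid2", "prime adult", "cerebellar vermis"),
]

for row in sorted(propagate(simple)):
    print(row)
```

A real anatomy ontology is a graph in which a structure can have several parents, so a production version would need a set-valued parent map and cycle-safe traversal.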
![Page 22: GIGA2 Structuring Phenotype Data](https://reader034.fdocuments.us/reader034/viewer/2022042522/55a8e1551a28abb94e8b46a5/html5/thumbnails/22.jpg)
CURATING A DATABASE OF HOMOLOGY
HYPOTHESES
https://github.com/BgeeDB/anatomical-similarity-annotations
[Figure: proposed homologies between Cnidaria and Porifera structures]
o gastrodermis (Cnidaria) homologous to choanoderm (Porifera)
o mouth (Cnidaria) homologous to osculum (Porifera)
Evidence: developmental gene expression (Leininger S, Adamski M, … Adamska M, doi:10.1038/ncomms4905)
![Page 23: GIGA2 Structuring Phenotype Data](https://reader034.fdocuments.us/reader034/viewer/2022042522/55a8e1551a28abb94e8b46a5/html5/thumbnails/23.jpg)
ONTOLOGIES FOR DATA
STANDARDIZATION IN SEQUENCING
CONSORTIA
Malladi, V. S., Erickson, D. T., Podduturi, N. R., Rowe, L. D., Chan, E. T., Davidson, J. M., … Hong, E. L. (2015). Ontology application and use at the ENCODE DCC. Database: The Journal of Biological Databases and Curation, 2015, bav010. doi:10.1093/database/bav010
Washington, N. L., Stinson, E. O., Perry, M. D., et al. (2011). The modENCODE Data Coordination Center: lessons in harvesting comprehensive experimental details. Database, 2011, bar023.
https://www.encodeproject.org/search/?type=biosample
![Page 24: GIGA2 Structuring Phenotype Data](https://reader034.fdocuments.us/reader034/viewer/2022042522/55a8e1551a28abb94e8b46a5/html5/thumbnails/24.jpg)
Monarch Initiative
o Large knowledgebase connecting genes, genotypes and diseases to
phenotypes
o Find novel linkages between human diseases and model systems
o http://monarchinitiative.org
Driving use case
o Given a patient with a rare or unique spectrum of abnormal
phenotypes, determine the causative genomic variant(s)
DISEASES AND ABNORMAL PHENOTYPES
![Page 25: GIGA2 Structuring Phenotype Data](https://reader034.fdocuments.us/reader034/viewer/2022042522/55a8e1551a28abb94e8b46a5/html5/thumbnails/25.jpg)
Standard Clinical Exome Testing Pipeline
Predicts the causative variant based on information in the patient's genome and
background genomic data
![Page 26: GIGA2 Structuring Phenotype Data](https://reader034.fdocuments.us/reader034/viewer/2022042522/55a8e1551a28abb94e8b46a5/html5/thumbnails/26.jpg)
https://www.sanger.ac.uk/resources/databases/exomiser/query/exomiser2
Robinson, P., et al. (2013). Improved exome prioritization of
disease genes through cross species phenotype comparison.
Genome Research. doi:10.1101/gr.160325.113
![Page 27: GIGA2 Structuring Phenotype Data](https://reader034.fdocuments.us/reader034/viewer/2022042522/55a8e1551a28abb94e8b46a5/html5/thumbnails/27.jpg)
http://monarchinitiative.org/analyze/phenotypes/
EXOMISER USES ONTOLOGY-BASED
PHENOTYPE MATCHING
cleft palate = cleft (attribute) + palate (structure)
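The decomposition on this slide, a phenotype as an attribute plus a structure, maps naturally onto a small record type. The sketch below is illustrative only: the class and field names are hypothetical, and the second example phenotype is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenotype:
    attribute: str  # the quality, e.g. "cleft"
    structure: str  # the anatomical entity, e.g. "palate"

    def label(self):
        return f"{self.attribute} {self.structure}"


def group_by_structure(phenotypes):
    """Index phenotype labels by the structure they describe."""
    groups = {}
    for p in phenotypes:
        groups.setdefault(p.structure, []).append(p.label())
    return groups


# ASSUMPTION: "short palate" is an invented example, not from the slides.
observed = [Phenotype("cleft", "palate"), Phenotype("short", "palate")]
print(group_by_structure(observed))
```

Because the structure is a separate field, it can be tied to an anatomy ontology term, which is what lets phenotypes recorded under different names be compared.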
![Page 28: GIGA2 Structuring Phenotype Data](https://reader034.fdocuments.us/reader034/viewer/2022042522/55a8e1551a28abb94e8b46a5/html5/thumbnails/28.jpg)
SOLVING UNDIAGNOSED
DISEASES
[Figure: phenotypes of the patient matched against those of the Sh3kbp1 tm1Ivdi -/- mouse through more general ontology terms]
Patient phenotypes: Behavioural/Psychiatric Abnormality; Thyroid stimulating hormone excess; Gait apraxia; Spasticity
Sh3kbp1 tm1Ivdi -/- phenotypes: increased exploration in new environment; increased dopamine level; hyperactivity; hyperactivity
Linking terms: Behavioral abnormality; Abnormality of the endocrine system; abnormal locomotor behavior; Abnormal voluntary movement
NIH Undiagnosed Disease Program, patient 2731
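Matching a patient phenotype to a mouse phenotype through a shared, more general term can be sketched as a set comparison over ancestor terms. The example below scores two terms by the Jaccard overlap of their ancestor sets. The labels are taken from this slide, but the parent links are invented for illustration, and this is a generic similarity measure rather than the scoring actually used by Exomiser or Monarch.

```python
# ASSUMPTION: parent links are invented; only the labels come from the slide.
PARENTS = {
    "Gait apraxia": ["abnormal locomotor behavior"],
    "hyperactivity": ["abnormal locomotor behavior"],
    "abnormal locomotor behavior": ["Behavioral abnormality"],
    "increased exploration in new environment": ["Behavioral abnormality"],
    "Thyroid stimulating hormone excess": ["Abnormality of the endocrine system"],
    "increased dopamine level": ["Abnormality of the endocrine system"],
}


def ancestors(term, parents=PARENTS):
    """The term itself plus every more general term reachable from it."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def jaccard(a, b):
    """Shared ancestors divided by all ancestors of either term."""
    set_a, set_b = ancestors(a), ancestors(b)
    return len(set_a & set_b) / len(set_a | set_b)


def best_match(term, candidates):
    """The candidate whose ancestor set overlaps most with that of `term`."""
    return max(candidates, key=lambda c: jaccard(term, c))


mouse = ["increased dopamine level", "hyperactivity"]
print(best_match("Gait apraxia", mouse))
```

The two terms share no words, so a text match would score them as unrelated; the overlap comes entirely from the ontology structure.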
![Page 29: GIGA2 Structuring Phenotype Data](https://reader034.fdocuments.us/reader034/viewer/2022042522/55a8e1551a28abb94e8b46a5/html5/thumbnails/29.jpg)
Think about
o How your data will be re-used by others
o How what you're doing will scale
Provide structured metadata for experimental data
o Free text is not enough
o Use ontologies and standardized vocabularies where possible
Failing to do so will cost you later!
o All major human and model organism omics consortia now enforce
this
ENCODE, FANTOM, LINCS
o Also major phenotyping projects
IMPC/KOMP2
LESSONS
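One way to act on "free text is not enough" is to check, at submission time, that annotation fields hold ontology identifiers rather than prose. The sketch below is a hedged illustration: the field names and the allowed prefixes are hypothetical and do not reproduce any consortium's schema.

```python
import re

# An identifier of the form PREFIX:digits, e.g. UBERON:0004720.
CURIE = re.compile(r"^([A-Za-z]+):(\d+)$")

# ASSUMPTION: hypothetical field names and allowed ontology prefixes.
ALLOWED = {"anatomy": {"UBERON"}, "stage": {"UBERON"}, "function": {"GO"}}


def check_record(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field, prefixes in ALLOWED.items():
        value = record.get(field)
        if value is None:
            continue  # missing fields are not checked in this sketch
        match = CURIE.match(value)
        if match is None:
            problems.append(f"{field}: free text {value!r}, expected an ontology ID")
        elif match.group(1) not in prefixes:
            problems.append(f"{field}: unexpected prefix {match.group(1)}")
    return problems


print(check_record({"anatomy": "UBERON:0004720", "stage": "UBERON:0018241"}))
print(check_record({"anatomy": "cerebellar vermis"}))
```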
![Page 30: GIGA2 Structuring Phenotype Data](https://reader034.fdocuments.us/reader034/viewer/2022042522/55a8e1551a28abb94e8b46a5/html5/thumbnails/30.jpg)
Providing metadata requires having the right ontologies or
vocabularies in place
Make phenotypic knowledge about your favorite system
structured and computable
o This seems daunting, where do I start…?
LESSONS
![Page 31: GIGA2 Structuring Phenotype Data](https://reader034.fdocuments.us/reader034/viewer/2022042522/55a8e1551a28abb94e8b46a5/html5/thumbnails/31.jpg)
Got transcriptome data?
o Bgee will curate it for you!
o Caveat: Your genome must be in Ensembl Genomes
o We are also interested in your homology hypotheses
Got classic systematics data?
o Talk to me about using Phenoscape infrastructure
BGEE WILL CURATE YOUR
TRANSCRIPTOME DATA
![Page 32: GIGA2 Structuring Phenotype Data](https://reader034.fdocuments.us/reader034/viewer/2022042522/55a8e1551a28abb94e8b46a5/html5/thumbnails/32.jpg)
GOT ANATOMY EXPERTISE? CLAIM AN
INVERTEBRATE MODULE!
[Figure: Uberon Core shown with modules: Vertebrate structures, Porifera Ontology, Ctenophore Ontology, Cephalopod Ontology, Arthropod Ontology]
Eric Edsinger, CephSeq
http://phenotypercn.org
https://github.com/obophenotype/cephalopod-ontology
https://github.com/obophenotype/ctenophore-ontology
https://github.com/obophenotype/porifera-ontology
https://github.com/obophenotype/uberon
Thacker, R. W., Díaz, M. C., Kerner, A., Vignes-Lebbe, R., Segerdell, E., Haendel, M. A., & Mungall, C. J. (2014). The Porifera Ontology (PORO): enhancing sponge systematics with an anatomy ontology. Journal of Biomedical Semantics, 5(1), 39
![Page 33: GIGA2 Structuring Phenotype Data](https://reader034.fdocuments.us/reader034/viewer/2022042522/55a8e1551a28abb94e8b46a5/html5/thumbnails/33.jpg)
Noctua
Curation using multiple ontologies with a graph model
o Web-based, collaborative
o Advanced GO curation
o Phenotype curation
Beta available in summer 2015
o http://noctua.berkeleybop.org
CURATE GENE REGULATORY NETWORKS
AND PHENOTYPES
![Page 34: GIGA2 Structuring Phenotype Data](https://reader034.fdocuments.us/reader034/viewer/2022042522/55a8e1551a28abb94e8b46a5/html5/thumbnails/34.jpg)
Structured metadata is valuable
o Helps build the knowledge graph of invertebrate genomics
o Capture metadata up-front, not after the fact
o Use ontologies where possible
o Don’t repeat mistakes of projects that ignored this advice
Invertebrate ontologies are at a nascent stage
o This is an opportunity! Get involved!
CONCLUSIONS
![Page 35: GIGA2 Structuring Phenotype Data](https://reader034.fdocuments.us/reader034/viewer/2022042522/55a8e1551a28abb94e8b46a5/html5/thumbnails/35.jpg)
ACKNOWLEDGMENTS
Monarch
o Melissa A Haendel
o Nicole Washington
o Sebastian Kohler
o Harry Hochheiser
o Maryann Martone
o Suzanna Lewis
o Damian Smedley
o Peter Robinson
o William Bone
o Jeremy Nguyen-Xuan
Uberon
o Frederic Bastian
o Ann Niknejad
o Marc Robinson-Rechavi
o Todd Vision
o Jim Balhoff
o Paul Sereno
o Nizar Ibrahim
o Alex Dececchi
o Yvonne Bradford
o Terry Hayamizu
o Robert Druzinsky
NSF Phenotype RCN
o Paula Mabee
o Suzanna Lewis
o Eva Huala
o Andy Deans
o Erik Segerdell
o Robert Thacker
o Eric Edsinger
o Matt Yoder
o Istvan Miko
o David Osumi-Sutherland
![Page 36: GIGA2 Structuring Phenotype Data](https://reader034.fdocuments.us/reader034/viewer/2022042522/55a8e1551a28abb94e8b46a5/html5/thumbnails/36.jpg)
![Page 37: GIGA2 Structuring Phenotype Data](https://reader034.fdocuments.us/reader034/viewer/2022042522/55a8e1551a28abb94e8b46a5/html5/thumbnails/37.jpg)
Toward synthesizing our knowledge of morphology: using ontologies and machine reasoning to extract presence/absence
evolutionary phenotypes across studies. Dececchi TA et al. https://peerj.com/preprints/807/
![Page 38: GIGA2 Structuring Phenotype Data](https://reader034.fdocuments.us/reader034/viewer/2022042522/55a8e1551a28abb94e8b46a5/html5/thumbnails/38.jpg)
FORWARD GENOMICS
http://bejerano.stanford.edu/phenotree/public/html/ Hiller et al. 2012 Cell Reports