Core 2: Bioinformatics
CBio-Berkeley
Outline
• Berkeley group background
• Core 2 first round
  – what: aims, milestones
  – how: software lifecycle, interaction w/ other cores
• Current progress
• Discussion
Berkeley group: genomics
• Formerly BDGP (Berkeley Drosophila Genome Project) Informatics
  – Genome sequencing, analysis and annotation
  – Genomic application development
  – Database development
    • FlyBase
    • Generic Model Organism Database
Apollo
GBrowse
In-situ expression database
Genomics applications
• GadFly
  – analysis and annotation database
  – pipeline software
• BOP
  – computational analysis integration
• CGL
  – Comparative Genomics Software Library
SO and SOFA
• Sequence Ontology for Feature Annotation
• Ontology for genomics
  – Sequence feature classes:
    • mRNA, intron, UTR, sequence_variant, …
  – Sequence feature relations
    • exon part_of transcript
    • polypeptide derives_from mRNA
Chado
• Model organism relational database schema
  – FlyBase, GMOD
• Modules
  – sequence annotations
  – expression
  – map
  – genotype
  – phenotype
  – ontology/cv
  – …
• Generic schema
  – Uses ontologies for strong typing
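To make "uses ontologies for strong typing" concrete, here is a minimal sketch using Python's built-in sqlite3. The two tables are heavily simplified stand-ins for Chado's cvterm and feature tables (most real columns are omitted), and the feature name in the usage note is hypothetical. The point is only that a feature's type is a foreign key into a controlled vocabulary rather than a free-text string.

```python
import sqlite3

# Simplified stand-ins for Chado's cvterm and feature tables.
# Assumption: real Chado tables carry many more columns than shown here.
SCHEMA = """
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    type_id    INTEGER NOT NULL REFERENCES cvterm (cvterm_id)
);
"""


def build_demo_db():
    """Create an in-memory database loaded with two Sequence Ontology class names."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO cvterm (name) VALUES (?)", [("mRNA",), ("exon",)])
    return conn


def add_feature(conn, uniquename, type_name):
    """Insert a feature; reject any type that is not a known ontology term."""
    row = conn.execute(
        "SELECT cvterm_id FROM cvterm WHERE name = ?", (type_name,)
    ).fetchone()
    if row is None:
        raise ValueError(f"unknown ontology term: {type_name}")
    conn.execute(
        "INSERT INTO feature (uniquename, type_id) VALUES (?, ?)",
        (uniquename, row[0]),
    )


def features_of_type(conn, type_name):
    """Return the names of all features typed by the given ontology term."""
    rows = conn.execute(
        "SELECT f.uniquename FROM feature f "
        "JOIN cvterm t ON f.type_id = t.cvterm_id "
        "WHERE t.name = ? ORDER BY f.uniquename",
        (type_name,),
    )
    return [r[0] for r in rows]
```

Calling `add_feature(conn, "demo-transcript-1", "mRNA")` succeeds, while a type name that is absent from the vocabulary raises `ValueError`, which is the sense of "strong typing" assumed in this sketch.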
Berkeley group: GO
• Gene Ontology - Informatics
  – Database, web portal
  – Ontology editing tools
  – Ontology QC and integration
  – OBO
OBO-Edit (formerly DAG-Edit)
AmiGO and GO Database
Obol
• Problem: large ontologies of composite terms are difficult to manage
• Solution: partial automation (reasoners)
• Requires logical definitions
  – how do we obtain them?
• Solution: Obol
  – Parses logical definitions from class names
  – Logical definitions can be reasoned over
    • error detection and automation
  – Integrates OBO ontologies
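The idea of parsing a logical definition out of a class name can be sketched with two regular expressions. This is a toy illustration only: the slides do not show Obol's grammars, so the patterns, the `LogicalDefinition` type, and the use of `has_participant` as the linking relation are all assumptions made for the example.

```python
import re
from typing import NamedTuple, Optional


class LogicalDefinition(NamedTuple):
    genus: str        # the general class the term specialises
    relation: str     # relation linking genus to differentia
    differentia: str  # the class that distinguishes this term


# Hypothetical name patterns; not taken from Obol itself.
PATTERNS = [
    # e.g. "acidification of midgut"
    re.compile(r"^(?P<genus>\w+) of (?P<differentia>.+)$"),
    # e.g. "wing development"
    re.compile(r"^(?P<differentia>.+) (?P<genus>development)$"),
]

# Assumption: a single relation is used for every pattern, purely for illustration.
RELATION = "has_participant"


def parse_class_name(name: str) -> Optional[LogicalDefinition]:
    """Return a genus-differentia definition if the name fits a known pattern."""
    text = name.strip().lower()
    for pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return LogicalDefinition(
                match.group("genus"), RELATION, match.group("differentia")
            )
    return None  # atomic term, or a pattern this sketch does not cover
```

Once names are decomposed this way, a reasoner can compare definitions instead of strings, for example to spot a composite term whose differentia does not exist in the referenced ontology.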
OBO Relations Ontology
• Common relations used across ontologies must mean the same thing
  – is_a
  – part_of
  – derives_from
  – has_participant
  – …
• OBO relations ontology provides precise definitions
  – defines class-level relations in terms of their instances
• http://obo.sourceforge.net/relationship
  – collaboration with core5, Manchester & others
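Defining a class-level relation in terms of instances can be illustrated with part_of read in the "all-some" style: class C part_of class D holds when every instance of C is part of some instance of D. The sketch below checks that condition over a toy set of instances. The instance names are hypothetical, and time indexing of instance-level relations is left out for brevity.

```python
# Hypothetical instance data: which class each instance belongs to.
INSTANCE_OF = {
    "exon1": "exon",
    "exon2": "exon",
    "tx1": "transcript",
    "tx2": "transcript",  # a transcript instance with no recorded exon parts
}

# Instance-level part_of assertions as (part, whole) pairs.
PART_OF = {("exon1", "tx1"), ("exon2", "tx1")}


def class_part_of(c, d, instance_of, part_of_pairs):
    """True if every instance of class c is part of some instance of class d."""
    c_instances = [i for i, cls in instance_of.items() if cls == c]
    d_instances = {i for i, cls in instance_of.items() if cls == d}
    return all(
        any((x, y) in part_of_pairs for y in d_instances) for x in c_instances
    )
```

With this data, `class_part_of("exon", "transcript", INSTANCE_OF, PART_OF)` is true even though `tx2` has no exon: the definition quantifies over exons, not over transcripts, which is why the direction of a class-level relation matters.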
Outline
• Berkeley group background
• Core 2 first round
  – what: aims, milestones
  – how: software lifecycle, interaction w/ other cores
• Current progress
• Open questions
Core 2 specific aims
• Aims
  1. Capture and describe data
  2. Reconcile annotation and ontology changes
  3. Store, view and compare annotations
  4. Link disease genes
• First round
  – phenotypes: Fly and Zebrafish
  – HIV clinical trial data
Aim 1: Capture and describe data
• Phenotype data capture
  – OBO-Edit plug-ins
  – Combine classes from multiple ontologies
    • PATO, anatomical ontologies
  – NLP tools?
• Clinical trial data capture
  – what are the appropriate tools?
Aim 1: Capture and describe data
• Zebrafish, fly
  – PATO: Phenotype and trait ontology
    • phenotype ‘primitives’
  – ‘Entity-Attribute-Value’ model
  – Phenotype ontologies
  – Genetic data
  – Orthologs
• Clinical trial data
  – generic instance model
  – what are the appropriate ontologies here?
PATO
• An ontology of attributes and attribute values
  – e.g. morphology, structure, placement
• Current status of PATO?
  – needs work to conform to sound ontology principles
    • definitions
    • formalisation of attributes
  – working with core3-cambridge (Gkoutos) and core5 (Neuhaus)
Phenotype annotation
• Entity-attribute structured annotations
  – Entity term; PATO term
    • brain FBbt:00005095; fused PATO:0000642
    • gut MA:0000917; dysplastic PATO:0000640
    • tail fin ZDB:020702-16; ventralized PATO:0000636
    • kidney ZDB:020702-16; hypertrophied PATO:0000636
    • midface ZDB:020702-16; hypoplastic PATO:0000636
• Pre-composed phenotype terms
  – Mammalian Phenotype Ontology
    • “increased activated B-cell number” MPO:0000319
    • “pink fur hue” MPO:0000374
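An entity-attribute annotation of the kind listed above pairs one term from an anatomical ontology with one PATO term. A minimal sketch of that structure follows; the class and function names are illustrative assumptions, while the two example annotations reuse identifiers exactly as they appear on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    id: str     # ontology identifier, e.g. "PATO:0000642"
    label: str  # human-readable name


@dataclass(frozen=True)
class PhenotypeAnnotation:
    entity: Term   # what is affected (anatomical ontology term)
    quality: Term  # how it is affected (PATO term)

    def as_text(self) -> str:
        """Render in the slide's 'entity ID; quality ID' layout."""
        return (
            f"{self.entity.label} {self.entity.id}; "
            f"{self.quality.label} {self.quality.id}"
        )


ANNOTATIONS = [
    PhenotypeAnnotation(Term("FBbt:00005095", "brain"), Term("PATO:0000642", "fused")),
    PhenotypeAnnotation(Term("MA:0000917", "gut"), Term("PATO:0000640", "dysplastic")),
]


def by_entity_prefix(annotations, prefix):
    """Select annotations whose entity comes from the ontology with this ID prefix."""
    return [a for a in annotations if a.entity.id.split(":", 1)[0] == prefix]
```

Unlike a pre-composed term such as an MPO class, the two halves stay separately queryable, so `by_entity_prefix(ANNOTATIONS, "FBbt")` returns only the fly annotation.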
Example (Fly)
| Entity | Attribute | Value | Background/Environment |
| --- | --- | --- | --- |
| embryo | viability | lethal | Scer\GAL4[hs.PB] |
| dorsal cuticle | shape | abnormal | |
| … | … | … | … |
| wing vein L2 | shape | branched | temperature sensitive |

Gene: Jra
Allele: Jra[bZIP.Scer\UAS]
Allele Description: defects in head and dorsal cuticle. Scer\GAL4[hs.PB] induces…..
A481G
bZIP
Genotype-Phenotype datamodel
• Need to model complex genotypes
• Environment
• Phenotype
  – E-A-V is not enough
    • Relational attributes
    • Complex phenotypes
    • Measurements and assays
  – CSHL 2005 Phenotype meeting
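One way to picture a model that goes beyond plain E-A-V is sketched below. It is not the data model adopted by the project; all class and field names are assumptions. The allele, background, and phenotype values are adapted from the Fly example, their grouping under a single genotype is assumed for illustration, and the relational character at the end is entirely hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Genotype:
    alleles: Tuple[str, ...]          # a complex genotype may list several alleles
    background: Tuple[str, ...] = ()  # e.g. driver constructs


@dataclass(frozen=True)
class Character:
    entity: str
    attribute: str
    value: str
    related_entity: Optional[str] = None  # second entity, for relational attributes
    measurement: Optional[float] = None   # numeric result of an assay, if any
    unit: Optional[str] = None


@dataclass(frozen=True)
class Observation:
    genotype: Genotype
    phenotype: Tuple[Character, ...]  # a complex phenotype is several characters
    environment: Optional[str] = None


def is_relational(character: Character) -> bool:
    """A character is relational when it links its entity to a second entity."""
    return character.related_entity is not None


# Raw strings keep the backslashes in FlyBase-style symbols intact.
JRA = Genotype(alleles=(r"Jra[bZIP.Scer\UAS]",), background=(r"Scer\GAL4[hs.PB]",))

OBSERVATIONS = [
    Observation(
        genotype=JRA,
        phenotype=(
            Character("embryo", "viability", "lethal"),
            Character("dorsal cuticle", "shape", "abnormal"),
        ),
    ),
    Observation(
        genotype=JRA,
        phenotype=(Character("wing vein L2", "shape", "branched"),),
        environment="temperature sensitive",
    ),
]

# Hypothetical relational character: the attribute needs a second entity.
FUSED_VEINS = Character("wing vein L2", "fusion", "fused", related_entity="wing vein L3")
```

Separating genotype, environment, and a tuple of characters lets one record carry a multi-part phenotype, which a single entity-attribute-value row cannot express.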
Aim 2: Reconcile annotation and ontology changes
• Ontology evolution can trigger annotation changes
• Identifiers
  – all classes and annotations will have stable identifiers
  – Cores 1 and 2 to decide on identifier model
    • LSID URNs
• OntoTrack
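For orientation, an LSID URN has the general shape `urn:lsid:authority:namespace:object`, with an optional trailing revision. The sketch below builds and parses that shape. The slide says the identifier model is still to be decided, so this is only an illustration of the candidate format; the authority `example.org` and the mapping of a PATO ID onto namespace and object are hypothetical.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None  # optional version of the identified object

    def __str__(self) -> str:
        parts = ["urn", "lsid", self.authority, self.namespace, self.object_id]
        if self.revision is not None:
            parts.append(self.revision)
        return ":".join(parts)


def parse_lsid(text: str) -> LSID:
    """Split an LSID URN into its components, rejecting malformed input."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    if any(p == "" for p in parts[2:5]):
        raise ValueError(f"empty LSID component: {text!r}")
    return LSID(*parts[2:])
```

For example, `str(LSID("example.org", "PATO", "0000642"))` gives `urn:lsid:example.org:PATO:0000642`. The revision slot is what would let an annotation point at a specific version of a class when the ontology evolves.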
Aim 3: Store, view and compare annotations
• OBO: ontologies
• OBD: data annotated using ontologies
  – genotype-phenotype
  – clinical trials
  – others
OBD: A Database for OBO
• Data warehouse
  – collected from MODs and other sources
• Annotation versioning
• Generic data model
  – Any data typed by OBO classes can be stored
• Specific annotation data views
  – Clinical trial data view
  – Phenotype data view
• Chado-compliant
• Entity-attribute-(value) model
Key technologies
• ‘Semantic Web’ database technology
  – ontology-aware
    • ontologies are part of meta-model
    • higher level query languages
      – SPARQL, SeRQL, …
    • tool interoperability
      – Protégé-OWL, Jena, ..
  – SQL compatibility
    • optionally layered on relational model
  – Standards? Maturity?
    • Many implementations
      – Sesame, Kowari,
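What "ontology-aware" querying buys can be shown without any particular store: a query for an entity should also return annotations made to its subclasses and parts. The pure-Python sketch below is a stand-in for what a SPARQL or SeRQL engine with inference would do, not an example of either language. The ontology edges are hypothetical; the annotation rows reuse values from the Fly example.

```python
# Hypothetical ontology edges as (child, relation, parent) triples.
EDGES = {
    ("wing vein L2", "is_a", "wing vein"),
    ("wing vein", "part_of", "wing"),
    ("dorsal cuticle", "part_of", "cuticle"),
}

# Annotations as (entity, attribute, value) rows.
ANNOTATIONS = [
    ("wing vein L2", "shape", "branched"),
    ("dorsal cuticle", "shape", "abnormal"),
]


def descendants(term, edges):
    """Return the term plus everything reachable below it via any relation."""
    found = {term}
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for child, _relation, parent in edges:
            if parent == current and child not in found:
                found.add(child)
                frontier.append(child)
    return found


def query(entity, annotations, edges):
    """Annotations whose entity is the query term or one of its descendants."""
    scope = descendants(entity, edges)
    return [row for row in annotations if row[0] in scope]
```

A query for `"wing"` returns the `wing vein L2` row although no annotation mentions `wing` directly; a plain string match on the annotation table would return nothing.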
Aim 3: Store, view and compare annotations
• Browsing
  – AmiGO-2
• Advanced visualization
  – work with core 1 (University of Victoria)
Comparing annotations
• process vs state
  – regulatory processes:
    • acidification of midgut has_quality reduced rate
    • midgut has_quality low acidity
• development vs behavior
  – wing development has_quality abnormal
  – flight has_quality intermittent
• granularity (scale)
  – chemical vs molecular vs cell vs tissue vs anatomical part
Integrating anatomical ontologies
• Annotations should be comparable between species
  – phenotype annotations are composed of anatomical terms
• Multiple species-centric anatomical ontologies
  – Problem: how do we compare across species?
  – XSPAN (Bard et al): creating mappings
  – Core 1: ontology mappings
Aim 4: Linking disease genes
• Homology data
  – Orthologous genes
• Genomic data
  – SNPs, sequence variants
• Ontologies
  – Disease ontologies
  – Semantic similarity
  – Ontology integration
    • Obol, XSPAN
Linking disease to phenotype
• Relationship of phenotype to diseases and disorders
  – essentialist
  – statistical
• Disease ontologies
  – OBO disease ontology (Northwestern)
  – EVOC disease ontology (EVOC)
  – Others
• Disease ontology workshop (core 5)
  – November 2006
Outline
• Berkeley group background
• Core 2 first round
  – what: aims, milestones
  – how: software lifecycle, interaction w/ other cores
• Current progress
• Open questions
Software lifecycle
• Software is developed in phases
• Different phases require interaction with different cores
• Iterative “Agile” methodology
  – fast cycles
  – involve ‘customer’ (core3) at all phases
Outline
• Berkeley group background
• Core 2 first round
  – what: aims, milestones
  – how: software lifecycle, interaction w/ other cores
• Current progress
Current progress
• Meetings
  – CSHL November 2005
    • Phenotype ontology meeting
    • Phenotype tools workshop
      – Berkeley, UVic, Core 3
• OBO-Edit complex class plug-in
• Phenotype browser prototype
• Genotype-Phenotype datamodel
OBO-Edit complex class plug-in
• Combinatorial composition of classes
• Current use-cases:
  – plant anatomical structures
  – integrating GO and OBO-Cell
• Ideal for phenotype classes
  – extend to make ‘phenotype’ plug-in
OBD Progress
• Genotype-Phenotype data model defined
• Prototype implemented
• evaluating technologies
Phenotype browser
• Experimental branch of AmiGO code
• Allows browsing and querying of combinatorial phenotype annotations
• Experimental dataset
• Demo
  – http://yuri.lbl.gov/amigo/obd