Amo amos amot amomus amotis amont. Happy birthday Swiss-Prot Fortaleza August 2006.
Three (Orthogonal) Ontologies
• Biological Process
– Goal or objective within cell, tissue ..
• Molecular Function
– Elemental activity or task
• Cellular Component
– Location or complex
• molecular function: 7,432 terms
• biological process: 10,740 terms
• cellular component: 1,772 terms
• all: 19,994 terms
• definitions: 19,042 (96%)
Content of GO
!version v.4.2
!date 4 November 1998
!author Michael Ashburner
$Gene Ontology ; GO:0000001 ; remark:
$function ; GO:0000002 ; remark:
%macromolecule ; GO:0000003 ; remark:
%protein ; GO:0000004 ; remark:
%enzyme ; GO:0000005 ; remark:
%alpha-alpha-trehalase ; GO:0000006 ; remark: ; EC:3.2.1.28
%alpha-alpha-trehalose-phosphate synthase (UDP-forming) ; GO:0000007 ; remark: ; EC:2.4.1.15
%alpha-L-fucosidase ; GO:0000008 ; remark: ; EC:3.2.1.51
%alpha-N-acetylglucosaminidase ; GO:0000009 ; remark: ; EC:3.2.1.50
%alpha-amylase ; GO:0000010 ; remark: ; EC:3.2.1.1
%alpha-glucosidase II ; GO:0000011 ; remark: ; EC:3.1.2.20
%alpha-ketoacid dehydrogenase complex ; GO:0000012 ; remark:
<oxoglutarate dehydrogenase (lipoamide) ; GO:0000013 ; remark: ; EC:1.2.4.2
....
%DNA-directed DNA polymerase ; GO:0000054 ; remark: ; EC:2.7.7.7
%nuclear DNA-directed DNA polymerase ; GO:0000055 ; remark:
%alpha DNA polymerase ; GO:0000056 ; remark:
<alpha DNA polymerase, 180Kd-subunit ; GO:0000057 ; remark:

ma11> wc gene_ontology.v4.1
3081 22643 192480 gene_ontology.v4.1
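A minimal sketch of how one line of the flat file shown above could be read, assuming the layout visible in the excerpt: a one-character prefix (`$`, `%` or `<`), then `;`-separated fields for the name, the GO identifier, a `remark:` field and an optional EC number. The helper name `parse_flat_line` and the dictionary keys are hypothetical, and the reading of `%` as is_a and `<` as part_of is an assumption based on the legacy GO flat-file convention, not something the slide states.

```python
import re

# Prefix characters as they appear in the excerpt; the meanings given here
# are an assumption based on the legacy GO flat-file convention.
PREFIXES = {"$": "root", "%": "is_a", "<": "part_of"}


def parse_flat_line(line):
    """Split one flat-file line into its fields (hypothetical helper)."""
    line = line.strip()
    if not line or line[0] not in PREFIXES:
        return None  # header lines such as "!version v.4.2" are skipped
    fields = [f.strip() for f in line[1:].split(";")]
    record = {"relation": PREFIXES[line[0]], "name": fields[0],
              "id": None, "ec": None}
    for field in fields[1:]:
        if re.fullmatch(r"GO:\d{7}", field):
            record["id"] = field
        elif field.startswith("EC:"):
            record["ec"] = field
    return record


example = parse_flat_line("%alpha-amylase ; GO:0000010 ; remark: ; EC:3.2.1.1")
print(example)
```

Header lines beginning with `!` return `None`, so a caller can filter them out while reading the file line by line.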
Problems with the GO:
• is_a and part_of relationships are poorly defined and not used consistently.
• carries a baggage of implicit ontologies.
• lack of relationships between the three GO ontologies.
Implicit ontologies within the GO:
• cysteine biosynthesis (ChEBI)
• myoblast fusion (Cell Type Ontology)
• hydrogen ion transporter activity (ChEBI)
• snoRNA catabolism (Sequence Ontology)
• wing disc pattern formation (Drosophila anatomy)
• epidermal cell differentiation (Cell Type Ontology)
• regulation of flower development (Plant anatomy)
• interleukin-18 receptor complex (not yet in OBO)
• B-cell differentiation (Cell Type Ontology)
[Diagram labels: GO terms (B-cell differentiation, lymphocyte differentiation, cell differentiation, B-cell activation) and CL terms (B-cell, lymphocyte, blood cell), connected by is_a]
Integrating ontologies
Augmented GO
[Term]
id: GO:0030183
name: B-cell differentiation
is_a: GO:0042113 ! B-cell activation
is_a: GO:0030098 ! lymphocyte differentiation
intersection_of: is_a GO:0030154 ! cell differentiation
intersection_of: has_participant CL:0000236 ! B-cell

CELL Ontology
[Term]
id: CL:0000236
name: B-cell
is_a: CL:0000542 ! lymphocyte
develops_from: CL:0000231 ! B-lymphoblast
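The two stanzas above can be read with very little code. The following is a minimal sketch, not a full OBO parser: it assumes one `tag: value` pair per line, treats everything after `!` as a comment, and only recognises `[Term]` stanzas. The function name `parse_obo_stanzas` is hypothetical.

```python
def parse_obo_stanzas(text):
    """Parse [Term] stanzas into a list of tag -> list-of-values dicts."""
    stanzas = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "[Term]":
            current = {}
            stanzas.append(current)
        elif line and current is not None and ":" in line:
            tag, value = line.split(":", 1)
            # Drop the trailing "! human-readable name" comment, if any.
            value = value.split("!", 1)[0].strip()
            current.setdefault(tag.strip(), []).append(value)
    return stanzas


SAMPLE = """\
[Term]
id: GO:0030183
name: B-cell differentiation
is_a: GO:0042113 ! B-cell activation
is_a: GO:0030098 ! lymphocyte differentiation
intersection_of: is_a GO:0030154 ! cell differentiation
intersection_of: has_participant CL:0000236 ! B-cell

[Term]
id: CL:0000236
name: B-cell
is_a: CL:0000542 ! lymphocyte
develops_from: CL:0000231 ! B-lymphoblast
"""

terms = parse_obo_stanzas(SAMPLE)
b_cell_diff = terms[0]
# Cross-ontology links: intersection_of values that point into CL.
cross_links = [v for v in b_cell_diff["intersection_of"] if "CL:" in v]
print(cross_links)
```

Keeping every tag as a list of values lets repeated tags such as `is_a` and `intersection_of` survive without special cases.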
• To create the conditions for a step-by-step evolution towards robust gold standard reference ontologies in the biomedical domain.
• To introduce some of the features of scientific peer review into biomedical ontology development.
The OBO Foundry
A subset of OBO ontologies whose developers agree in advance to accept a common set of principles designed to assure
– intelligibility to biologist curators, annotators, users
– formal robustness
– stability
– compatibility
– interoperability
– support for logic-based reasoning
• The ontology is open and available to be used by all.
• The developers of the ontology agree in advance to collaborate with developers of other OBO Foundry ontologies where domains overlap. The importance of community collaboration cannot be overstated.
• The ontology is in, or can be instantiated in, a common formal language.
• The ontology possesses a unique identifier space within OBO.
• The ontology provider has procedures for identifying distinct successive versions.
• The ontology has a clearly specified and clearly delineated content.
• The ontology includes textual definitions for all terms.
• The ontology is well-documented.
• The ontology has a plurality of independent users.
• The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the OBO Relation Ontology.
Foundational relations: is_a, part_of
Spatial relations: located_in, contained_in, adjacent_to
Temporal relations: transformation_of, derives_from, preceded_by
Participation relations: has_participant, has_agent
regulates
Good ontologies require:
• Consistent use of terms, supported by logically coherent (non-circular) definitions, in equivalent human-readable and computable formats
• Coherent shared treatment of relations to allow cascading inference both within and between ontologies
Ontology = A Representation of Types
Each node of an ontology consists of:
• preferred term
• term identifier
• synonyms
• definition, glosses, comments
Ontology = A Representation of Types
Nodes in an ontology are connected by relations:
primarily: is_a (= is subtype of) and part_of
designed to support search, reasoning and annotation
The aims of SO
1. Develop a shared set of terms and concepts to annotate biological sequences.
2. Apply these in our separate projects to provide consistent query capabilities between them.
3. Provide a software resource to assist in the application and distribution of SO.
The scope of the SO
1. Features that can be located on a sequence with coordinates, e.g. exon, promoter, binding_site
2. Properties of these features:
– Sequence attributes
• Maternally_imprinted_gene
– Consequences of mutation
• mutation_affecting_editing
– Chromosome variation
• aneuploid
What is a pseudogene?
• Human
– Sequence similar to a known protein but containing frameshift(s) and/or stop codons that disrupt the ORF.
• Neisseria
– A gene that is inactive but may be activated by translocation (e.g. by gene conversion) to a new chromosome site.
– Note: such a gene would be called a “cassette” in yeast.
Give me all the dicistronic genes
• Define a dicistronic gene in terms of the cardinality of the transcript to open-reading-frame relationship and the spatial arrangement of open-reading frames.
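One possible reading of that definition, as a sketch: a transcript counts as dicistronic when it carries exactly two open reading frames (the cardinality condition) and those two do not overlap (the spatial condition). The `Transcript` class, the half-open `(start, end)` coordinates and the gene names are all hypothetical; SO itself may word the definition differently.

```python
from dataclasses import dataclass, field


@dataclass
class Transcript:
    name: str
    # ORFs as (start, end) pairs on the transcript, end exclusive.
    orfs: list = field(default_factory=list)


def is_dicistronic(transcript):
    """True if the transcript carries exactly two non-overlapping ORFs."""
    if len(transcript.orfs) != 2:          # cardinality condition
        return False
    (s1, e1), (s2, e2) = sorted(transcript.orfs)
    return e1 <= s2                        # spatial condition: no overlap


def dicistronic_genes(genes):
    """Names of genes with at least one dicistronic transcript."""
    return [name for name, transcripts in genes.items()
            if any(is_dicistronic(t) for t in transcripts)]


genes = {
    "geneA": [Transcript("A-RA", [(10, 310), (400, 700)])],
    "geneB": [Transcript("B-RA", [(10, 310)])],
    "geneC": [Transcript("C-RA", [(10, 310), (200, 500)])],
}
print(dicistronic_genes(genes))
```

Once the definition is expressed over relationships rather than free-text labels, "give me all the dicistronic genes" becomes a filter over the annotation.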
Relationships allow reasoning.
• VALIDATION - We can check the internal consistency of an annotation against the ontology. We can also check that any topological assertions are true.
3’ UTR part_of mRNA
intron part_of mRNA
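The two checks named under VALIDATION can be sketched as follows. This is a toy model under stated assumptions: the allowed part_of pairs are just the two assertions listed above, not a copy of SO, and the feature records (`id`, `type`, `start`, `end`, `part_of`) are a hypothetical layout.

```python
# Toy ontology: the part_of pairs listed above, as (part type, whole type).
ALLOWED_PART_OF = {("three_prime_UTR", "mRNA"), ("intron", "mRNA")}


def validate(features):
    """Return a list of human-readable problems found in the annotation."""
    by_id = {f["id"]: f for f in features}
    problems = []
    for f in features:
        parent_id = f.get("part_of")
        if parent_id is None:
            continue
        parent = by_id.get(parent_id)
        if parent is None:
            problems.append(f"{f['id']}: unknown parent {parent_id}")
            continue
        # Consistency with the ontology: is this type pair permitted?
        if (f["type"], parent["type"]) not in ALLOWED_PART_OF:
            problems.append(
                f"{f['id']}: {f['type']} part_of {parent['type']} not in ontology")
        # Topological check: a part must lie within its whole.
        if not (parent["start"] <= f["start"] and f["end"] <= parent["end"]):
            problems.append(f"{f['id']}: lies outside {parent_id}")
    return problems


annotation = [
    {"id": "m1", "type": "mRNA", "start": 100, "end": 900},
    {"id": "u1", "type": "three_prime_UTR", "start": 800, "end": 900, "part_of": "m1"},
    {"id": "u2", "type": "three_prime_UTR", "start": 850, "end": 950, "part_of": "m1"},
    {"id": "p1", "type": "promoter", "start": 150, "end": 200, "part_of": "m1"},
]
for problem in validate(annotation):
    print(problem)
```

Here `u2` fails the topological check because it extends past its mRNA, and `p1` fails the ontology check because the toy ontology has no promoter part_of mRNA relationship.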
• The formal properties of parts:
1. If A is a proper part of B then B is not a part of A (nothing is a proper part of itself)
2. If A is a part of B and B is a part of C then A is a part of C
• Because of these rules, we can apply functions to parts…
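The two rules can be exercised directly on a set of part_of assertions. The sketch below computes everything implied by rule 2 (transitivity) and then looks for pairs that break rule 1. The function names and the exon/transcript/gene example are hypothetical, chosen only for illustration.

```python
def part_of_closure(direct):
    """All (part, whole) pairs implied by transitivity of part_of."""
    closure = set(direct)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def violates_asymmetry(closure):
    """Pairs that break rule 1: A is a part of B and B is a part of A."""
    return {(a, b) for a, b in closure if (b, a) in closure}


direct = {("exon", "transcript"), ("transcript", "gene")}
closure = part_of_closure(direct)
print(("exon", "gene") in closure)
print(violates_asymmetry(closure))
```

A cycle such as A part_of B and B part_of A closes to A part_of A, so the same check also catches anything asserted to be a proper part of itself.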
Classical Extensional Mereology
EM operation            Definition
Overlap (x○y)           x and y overlap if they have a part in common.
Disjoint (xιy)          x and y are disjoint if they share no parts in common.
Binary Product (x.y)    The parts that x and y share in common.
Difference (x–y)        The largest portion of x which has no part in common with y.
Binary Sum (x+y)        The set consisting of individuals x and y
Extensional Mereology (EM) : a formal theory of parts
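As an illustration of these operations, the sketch below models each individual as a set of atomic parts, here sequence positions. That modelling choice is an assumption made for the example: under it the five operations reduce to ordinary set operations, and the binary sum becomes the union of the atoms of x and y. The function names and coordinates are hypothetical.

```python
def overlap(x, y):
    """x and y have at least one part in common."""
    return bool(x & y)


def disjoint(x, y):
    """x and y share no parts."""
    return not overlap(x, y)


def product(x, y):
    """The parts that x and y share."""
    return x & y


def difference(x, y):
    """The largest portion of x with no part in common with y."""
    return x - y


def binary_sum(x, y):
    """Everything that is part of x or of y."""
    return x | y


# Two features as sets of positions (hypothetical coordinates).
exon = frozenset(range(100, 200))
cds = frozenset(range(150, 300))

print(overlap(exon, cds))
print(len(product(exon, cds)))
print(len(difference(exon, cds)))
print(len(binary_sum(exon, cds)))
```

For sequence features this gives a direct way to ask, for instance, which portion of an exon is coding (the product) and which is not (the difference).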