MGED Ontology Working Group MGED4 Boston, MA Feb. 15, 2002 Chris Stoeckert, Center for Bioinformatics, U. Penn Helen Parkinson, EBI


Page 1:

MGED Ontology Working Group

MGED4

Boston, MA

Feb. 15, 2002

Chris Stoeckert, Center for Bioinformatics, U. Penn

Helen Parkinson, EBI

Page 2:

Agenda

• Overview of ontologies
• Status of MGED Ontology
• Incorporating ontologies into microarray database annotation forms - Helen Parkinson
• Discussion
  – Annotation experience
  – Use Cases: needs besides retrieving experiments?
  – Issues:
    • Missing concepts? (quick tour of ontology)
    • Relationship between MAGE and MGED ontology

Page 3:

What Does an Ontology Do?

• Captures knowledge
• Creates a shared understanding – between humans and for computers
• Makes knowledge machine processable
• Makes meaning explicit – by definition and context

From Building and Using Ontologies, Robert Stevens, U. of Manchester

Page 4:

What is an Ontology?

[Figure. Labels: Catalog/ID; Terms/glossary; Thesauri “narrower term” relation; Informal is-a; Formal is-a; Formal instance; Frames (properties); Value Restrs.; Disjointness, Inverse, part-of…; General Logical constraints]

From Building and Using Ontologies, Robert Stevens, U. of Manchester

Page 5:

Uses of Ontology

• Community reference -- neutral authoring.
• Either defining database schema or defining a common vocabulary for database annotation -- ontology as specification.
• Providing common access to information. Ontology-based search by forming queries over databases.
• Understanding database annotation and technical literature.
• Guiding and interpreting analyses and hypothesis generation.

From Building and Using Ontologies, Robert Stevens, U. of Manchester

Page 6:

Components of an Ontology

• Concepts: Class of individuals
  – The concept Protein and the individual 'human cytochrome C'
• Relationships between concepts
• "Is a kind of" relationship forms a taxonomy
• Other relationships give further structure – "is a part of"
• Axioms – Disjointness, covering, equivalence, …

From Building and Using Ontologies, Robert Stevens, U. of Manchester
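
The components listed on this slide can be sketched in a few lines of Python. This is only an illustrative toy, not how any ontology tool stores its model. The concept Protein and the individual 'human cytochrome C' come from the slide; `Macromolecule`, `NucleicAcid`, `ProteinDomain`, and the `Ontology` class itself are hypothetical names introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy ontology: concepts, individuals, two relationships, one axiom type."""
    is_a: dict = field(default_factory=dict)         # concept -> parent concept
    part_of: dict = field(default_factory=dict)      # concept -> whole concept
    instance_of: dict = field(default_factory=dict)  # individual -> concept
    disjoint: set = field(default_factory=set)       # frozenset({a, b}) pairs

    def ancestors(self, concept):
        """Walk the is-a chain upward; this chain is the taxonomy."""
        found = []
        while concept in self.is_a:
            concept = self.is_a[concept]
            found.append(concept)
        return found

    def is_kind_of(self, concept, other):
        """True if `concept` equals `other` or lies below it in the taxonomy."""
        return concept == other or other in self.ancestors(concept)

    def violates_disjointness(self, individual, concept):
        """True if also asserting individual:concept clashes with a disjointness axiom."""
        current = self.instance_of.get(individual)
        if current is None:
            return False
        current_set = [current] + self.ancestors(current)
        new_set = [concept] + self.ancestors(concept)
        return any(frozenset({a, b}) in self.disjoint
                   for a in current_set for b in new_set)


onto = Ontology(
    is_a={"Protein": "Macromolecule", "NucleicAcid": "Macromolecule"},  # hypothetical taxonomy
    part_of={"ProteinDomain": "Protein"},                               # hypothetical part-of link
    instance_of={"human cytochrome C": "Protein"},                      # example from the slide
    disjoint={frozenset({"Protein", "NucleicAcid"})},                   # hypothetical axiom
)

print(onto.is_kind_of("Protein", "Macromolecule"))
print(onto.violates_disjointness("human cytochrome C", "NucleicAcid"))
```

The is-a links form the taxonomy, the part-of link adds further structure, and the disjointness axiom is what lets a program reject a contradictory assertion.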

Page 7:

Languages

• Vocabularies using natural language
  – Hand crafted, flexible but difficult to evolve, maintain and keep consistent, with weak semantics
  – Gene Ontology
• Object-based KR: frames
  – Extensively used, good structuring, intuitive. Semantics defined by OKBC standard
  – EcoCyc (uses Ocelot) and RiboWeb (uses Ontolingua)
• Logic-based: Description Logics
  – Very expressive, model is a set of theories, well defined semantics
  – Automatic derived classification taxonomies
  – Concepts are defined and primitive

From Building and Using Ontologies, Robert Stevens, U. of Manchester
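
To give a feel for what "automatic derived classification" means, here is a deliberately tiny Python sketch. It covers only concepts defined as conjunctions of primitive names, which is far less than a real description logic reasoner handles. All concept names except Treatment are hypothetical and chosen only for illustration.

```python
# Primitive concepts are plain names. Defined concepts are conjunctions of
# other concepts (primitive or defined). All definitions here are hypothetical.
definitions = {
    "InVivoTreatment": {"Treatment", "InVivo"},
    "CompoundTreatment": {"Treatment", "UsesCompound"},
    "InVivoCompoundTreatment": {"InVivoTreatment", "UsesCompound"},
}


def expand(concept):
    """Unfold a concept into the set of primitive names it entails."""
    if concept not in definitions:
        return {concept}  # primitive: nothing to unfold
    result = set()
    for part in definitions[concept]:
        result |= expand(part)
    return result


def subsumes(general, specific):
    """`general` subsumes `specific` if every condition of `general` also holds for `specific`."""
    return expand(general) <= expand(specific)


def classify():
    """Derive, for each defined concept, the defined concepts that subsume it."""
    names = list(definitions)
    return {c: sorted(p for p in names if p != c and subsumes(p, c)) for c in names}


taxonomy = classify()
for concept, parents in taxonomy.items():
    print(concept, "is a kind of", parents)
```

Nobody stated that InVivoCompoundTreatment is a kind of CompoundTreatment; the relationship is derived from the definitions, which is the point of the logic-based approach.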

Page 8:

Microarray Information to be Captured

Figure from: David J. Duggan et al. (1999) Expression Profiling using cDNA microarrays. Nature Genetics 21: 10-14

Page 9:

MGED Ontology Working Group Goals

1. Identify concepts

2. Collect available controlled vocabularies and ontologies for concepts

3. Define concepts

4. Formalize concept relationships

Page 10:

Relationship of MGED Efforts

[Diagram. Labels: MAGE, MIAME, DB, External Ontologies/CVs, MGED Ontology. Legend: Annotation, Format, Ontologies (External, Internal)]

Ontologies provide common terms and their definitions for describing microarray experiments.

Page 11:

http://www.cbil.upenn.edu/Ontology/

Page 12:

Species Resources

Page 13:

Concept Definitions

Page 14:

Usage of Concepts and Resources for Microarrays

• MIAME glossary
  – Provide definitions for types of information (concepts) listed in MIAME
• MIAME qualifier, value, source
  – Provide pointers to relevant sources that can be used to annotate experiments
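
A "qualifier, value, source" annotation can be pictured as a three-field record. The sketch below is one possible representation, not a MIAME or MAGE data structure. The `Annotation` type and the `unsourced` helper are hypothetical; the example values are taken from the sample annotation slide later in this deck (Page 16).

```python
from typing import NamedTuple, Optional


class Annotation(NamedTuple):
    qualifier: str         # the concept being described, e.g. "Organism"
    value: str             # the term chosen by the annotator
    source: Optional[str]  # the external resource the value comes from, if any


annotations = [
    Annotation("Organism", "Mus musculus musculus", "NCBI Taxonomy"),
    Annotation("StrainOrLine", "C57BL/6N",
               "International Committee on Standardized Genetic Nomenclature for Mice"),
    Annotation("OrganismPart", "Liver", "Mouse Anatomical Dictionary"),
    Annotation("Sex", "Female", None),  # no external source listed on the slide
]


def unsourced(items):
    """Qualifiers whose value does not point at an external resource."""
    return [a.qualifier for a in items if a.source is None]


print(unsourced(annotations))
```

Keeping the source next to each value is what allows a reader, or a program, to look the term up in the resource it came from.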

Page 15:

MIAME Section on Sample Source and Treatment

sample source and treatment ID as used in section 1
organism (NCBI taxonomy)
additional "qualifier, value, source" list; the list includes:
  cell source and type (if derived from primary source(s))
  sex
  age
  growth conditions
  development stage
  organism part (tissue)
  animal/plant strain or line
  genetic variation (e.g., gene knockout, transgenic variation)
  individual
  individual genetic characteristics (e.g., disease alleles, polymorphisms)
  disease state or normal
  target cell type
  cell line and source (if applicable)
  in vivo treatments (organism or individual treatments)
  in vitro treatments (cell culture conditions)
  treatment type (e.g., small molecule, heat shock, cold shock, food deprivation)
  compound
  is additional clinical information available (link)
  separation technique (e.g., none, trimming, microdissection, FACS)
laboratory protocol for sample treatment

Page 16:

MGED Ontology | Instances | External References
BioMaterialDescription | |
Biosource Property | |
Organism | Mus musculus musculus id: 39442 | NCBI Taxonomy
Age | 7 weeks after birth |
DevelopmentStage | Stage 28 | Mouse Anatomical Dictionary
Sex | Female |
StrainOrLine | C57BL/6N | International Committee on Standardized Genetic Nomenclature for Mice
BiosourceProvider | Charles River, Japan |
OrganismPart | Liver | Mouse Anatomical Dictionary
BioMaterialManipulation | |
EnvironmentalHistory | |
CultureCondition | |
Temperature | 22 ± 2 °C |
Humidity | 55 ± 5% |
Light | 12 hours light/dark cycle |
PathogenTests | Specified pathogen free conditions |
Water | ad libitum |
Nutrients | MF, Oriental Yeast, Tokyo, Japan |
Treatment | |
CompoundBasedTreatment | |
(Compound) | Fenofibrate, CAS 49562-28-9 | ChemIDplus
(Treatment_application) | in vivo, oral gavage |
(Measurement) | 100 mg/kg body weight |

An example of microarray sample annotation using the MGED ontology
Susanna A. Sansone, Helen Parkinson, Philippe Rocca-Serra, Chris Stoeckert and Alvis Brazma

Page 17:

MAGE BioMaterial Model

Page 18:
Page 19:

MGED Biomaterial Ontology

• Under construction
  – Using OilEd (not wedded to any one tool)
  – Generate multiple formats: RDFS, DAML+OIL
• Define classes, provide relations and constraints, identify instances
• Motivated by MIAME and coordinated with MAGE

Page 20:

http://www.ontoknowledge.org/oil/

Page 21:

Building a Microarray Ontology

http://www.cbil.upenn.edu/Ontology/Build_Ontology2.html

Page 22:

http://mged.sourceforge.net/Ontologies.shtml

Page 23:

Ontology in Browseable Form

Page 24:

Example of Internal Terms

Page 25:

Example of External Terms

Page 26:

Example of Combined Internal and External: Treatment

Page 27:

OWG Use Cases

• Make it easier and more accurate to annotate a microarray experiment.
  – Build forms that provide menus of terms and links to external resources.
  – Only ask for relevant terms and fill in terms that can be inferred.
• Return a summary of all experiments that use a specified type of biosource.
  – Use “age” to select and order experiments
  – Use Mouse Anatomical Dictionary Stage 28 to pick experiments according to “organism part”
• Return a summary of all experiments done examining effects of a specified treatment
  – E.g., Look for “CompoundBasedTreatment”, “in vivo”
  – Select “Compound” based on CAS registry number
  – Order based on “CompoundMeasurement”
• ? Use to check if “MIAME-compliant.”
  – Assess only fields that are relevant
  – Check for proper use of terms
• ? Build gene networks based on biomaterial description
  – Generate a distance metric based on biosource and use in calculation of correlation with gene expression level
  – Generate an error estimation based on biosample (i.e., even when biosources are identical, there will be variation resulting from different treatments)
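
The treatment use case above (look for CompoundBasedTreatment, in vivo; select the compound by CAS registry number; order by measurement) might look roughly like this in Python. The record layout, field names, experiment ids, and all records are hypothetical. Only the class name, "in vivo", and the fenofibrate CAS number with its 100 mg/kg dose come from the slides.

```python
# Hypothetical experiment records; field names are assumptions for illustration.
experiments = [
    {"id": "E1", "treatment_class": "CompoundBasedTreatment",
     "application": "in vivo", "cas": "49562-28-9", "dose_mg_per_kg": 100.0},
    {"id": "E2", "treatment_class": "CompoundBasedTreatment",
     "application": "in vivo", "cas": "49562-28-9", "dose_mg_per_kg": 30.0},
    {"id": "E3", "treatment_class": "CompoundBasedTreatment",
     "application": "in vitro", "cas": "49562-28-9", "dose_mg_per_kg": 10.0},
    {"id": "E4", "treatment_class": "CompoundBasedTreatment",
     "application": "in vivo", "cas": "0000-00-0", "dose_mg_per_kg": 5.0},  # placeholder CAS
]


def experiments_for_compound(records, cas, application="in vivo"):
    """Filter by treatment class, application and CAS number; order by dose."""
    hits = [
        r for r in records
        if r["treatment_class"] == "CompoundBasedTreatment"
        and r["application"] == application
        and r["cas"] == cas
    ]
    return sorted(hits, key=lambda r: r["dose_mg_per_kg"])


summary = experiments_for_compound(experiments, "49562-28-9")
print([r["id"] for r in summary])
```

The query only works because every record uses the same term for the treatment class and the same identifier scheme for the compound, which is what a shared ontology is meant to guarantee.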

Page 28:

MGED Ontology Plans

• More Concepts? Improve definitions?
  – Extend to other parts of MIAME
• More instances!
• Add identifiers to all classes (facilitate neutral authoring). Instances?
• Add constraints. Prevent nonsense associations (e.g., only time units for age)
• Write a paper describing and explaining MGED ontology by next meeting with example applications and datasets.
  – Mechanism to establish a consensus “standard.”
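
The planned constraint "only time units for age" could be enforced along these lines. This is a minimal sketch under stated assumptions: the unit list, the `ALLOWED_UNITS` table, and the `check_measurement` function are hypothetical and are not part of the MGED ontology.

```python
# Assumed set of time units; the real ontology may define these differently.
TIME_UNITS = {"seconds", "minutes", "hours", "days", "weeks", "months", "years"}

# Concept -> units it may be measured in. Concepts not listed are unconstrained.
ALLOWED_UNITS = {"Age": TIME_UNITS}


def check_measurement(concept, amount, unit):
    """Return the measurement unchanged, or raise if the unit makes no sense for the concept."""
    allowed = ALLOWED_UNITS.get(concept)
    if allowed is not None and unit not in allowed:
        raise ValueError(f"{unit!r} is not a valid unit for {concept}")
    return (concept, amount, unit)


print(check_measurement("Age", 7, "weeks"))

try:
    check_measurement("Age", 100, "mg/kg")
except ValueError as err:
    print("rejected:", err)
```

An annotation form built on such a table could also use it the other way round, offering only the allowed units in a menu so that a nonsense association cannot be entered in the first place.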