From MIAME to MAML: Microarray Gene Expression Database (MGED)

From MIAME to MAML: Microarray Gene Expression Database (MGED). Chris Stoeckert, Center for Bioinformatics, University of Pennsylvania. Sept. 19, 2001.

Page 1: From MIAME to MAML: Microarray Gene Expression Database (MGED)

From MIAME to MAML: Microarray Gene Expression Database (MGED)

Chris Stoeckert

Center for Bioinformatics

University of Pennsylvania

Sept. 19, 2001

Page 2: From MIAME to MAML: Microarray Gene Expression Database (MGED)

Standardisation of Microarray Data and Annotations - MGED Group

The MGED group is a grass-roots movement initially established at the Microarray Gene Expression Database meeting MGED 1 (14-15 November, 1999, Cambridge, UK). The goal of the group is to facilitate the adoption of standards for DNA-array experiment annotation and data representation, as well as the introduction of standard experimental controls and data normalisation methods. Members come from academia, government, and industry around the world.

www.mged.org

Page 3: From MIAME to MAML: Microarray Gene Expression Database (MGED)

Why Microarray Data Standards?

• Standards are needed for:
  – Evaluating microarray data (standards in quality measures, protocols).
  – Exchanging microarray data (standards in data exchange).
  – Analysing microarray data (standards in annotations, data provided).

Page 4: From MIAME to MAML: Microarray Gene Expression Database (MGED)

How to Create Microarray Data Standards

• Understand thoroughly what minimum information about a microarray experiment is needed to interpret it unambiguously, and how that information is structured (objects and relationships)

• Create the technical data format able to capture this information

• Find or generate appropriate controlled vocabularies and ontologies

• Create standards in experiments themselves (standard controls and protocols)

Page 5: From MIAME to MAML: Microarray Gene Expression Database (MGED)

MGED Working Groups

• Experiment description and data representation standards (Alvis Brazma, EMBL-EBI)

• Microarray data XML exchange format (Paul Spellman, UC Berkeley)

• Ontologies for sample description (Chris Stoeckert, U Penn)

• Normalisation, quality control and cross-platform comparison (Frank Holstege, UMC Utrecht, Roger Bumgarner, U Wash)

Page 6: From MIAME to MAML: Microarray Gene Expression Database (MGED)

MGED Milestones

• MGED 2 meeting in Heidelberg in 2000, MGED 3 in Stanford in 2001, both ~300 participants

• Minimum Information About a Microarray Experiment – MIAME version 1.0 posted

• Collaboration with OMG on data formats: MAML + GEML = MAGE-ML and MAGE-OM

• MGED 4 meeting in Boston in February 2002

• MGED will become an ISCB Special Interest Group

Page 7: From MIAME to MAML: Microarray Gene Expression Database (MGED)

MIAME v1.0

Minimum Information About a Microarray Experiment

Approved at MGED 3 meeting, Stanford University, March 28, 2001

The goal of MIAME is to specify the minimum information that must be reported about an array-based gene expression monitoring experiment in order to ensure the interpretability of the results, as well as potential verification by third parties. This is to facilitate establishing repositories and a data exchange format for array-based gene expression data. The MGED group will encourage scientific journals and funding agencies to adopt policies requiring data submissions to repositories, once MIAME-compliant repositories and annotation tools are established.

Page 8: From MIAME to MAML: Microarray Gene Expression Database (MGED)

MIAME Descriptions

Definition:

The minimum information about a published microarray-based gene expression experiment should include a description of the:

1. Experimental design: the set of hybridisation experiments as a whole

2. Array design: each array used and each element (spot) on the array

3. Samples: samples used, extract preparation and labeling

4. Hybridisations: procedures and parameters

5. Measurements: images, quantitation, specifications

6. Normalisation controls: types, values, specifications

An additional section dealing with the data quality assurance will be added in the next MIAME release.
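
The six-part definition above lends itself to a simple completeness check. The following is a minimal sketch, not part of MIAME itself: the section keys, the `missing_sections` helper, and the example submission are all hypothetical names chosen for illustration.

```python
# The six MIAME v1.0 sections from the slide; the short keys are hypothetical.
MIAME_SECTIONS = (
    "experimental_design",
    "array_design",
    "samples",
    "hybridisations",
    "measurements",
    "normalisation_controls",
)


def missing_sections(submission):
    """Return the MIAME sections that are absent or empty in a submission."""
    return [name for name in MIAME_SECTIONS if not submission.get(name)]


# Illustrative submission with invented values, not real data.
submission = {
    "experimental_design": "set of four hybridisations",
    "array_design": "hypothetical-array-001",
    "samples": "mouse thymus extract, labelled",
    "hybridisations": "",  # present but empty, so it counts as missing
}

print(missing_sections(submission))
```

A repository or annotation tool could run a check of this kind before accepting a submission; what counts as "enough" detail inside each section is a separate question that this sketch does not address.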

Page 9: From MIAME to MAML: Microarray Gene Expression Database (MGED)

MIAME Section on Sample Source and Treatment

• sample source and treatment ID as used in section 1

• organism (NCBI taxonomy)

• additional "qualifier, value, source" list; the list includes:
  – cell source and type (if derived from primary source(s))
  – sex
  – age
  – growth conditions
  – development stage
  – organism part (tissue)
  – animal/plant strain or line
  – genetic variation (e.g., gene knockout, transgenic variation)
  – individual
  – individual genetic characteristics (e.g., disease alleles, polymorphisms)
  – disease state or normal
  – target cell type
  – cell line and source (if applicable)
  – in vivo treatments (organism or individual treatments)
  – in vitro treatments (cell culture conditions)
  – treatment type (e.g., small molecule, heat shock, cold shock, food deprivation)
  – compound
  – is additional clinical information available (link)
  – separation technique (e.g., none, trimming, microdissection, FACS)

• laboratory protocol for sample treatment
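
The "qualifier, value, source" list on this slide can be pictured as a small record type. This is a sketch under assumptions: the class names, field names, and example values are hypothetical and are not taken from MIAME or MAGE.

```python
from typing import NamedTuple, Optional, Tuple


class Annotation(NamedTuple):
    """One "qualifier, value, source" entry."""
    qualifier: str          # e.g. "sex" or "organism part (tissue)"
    value: str
    source: Optional[str]   # vocabulary or ontology the value was drawn from


class SampleDescription(NamedTuple):
    sample_id: str                      # ID as used in the experiment section
    organism: str                       # NCBI taxonomy name
    annotations: Tuple[Annotation, ...]
    protocol: str                       # laboratory protocol for sample treatment


def values_for(sample, qualifier):
    """Return every value recorded for one qualifier."""
    return [a.value for a in sample.annotations if a.qualifier == qualifier]


# Hypothetical sample, for illustration only.
sample = SampleDescription(
    sample_id="S1",
    organism="Mus musculus",
    annotations=(
        Annotation("sex", "female", "MGED"),
        Annotation("organism part (tissue)", "thymus", None),
    ),
    protocol="hypothetical protocol reference",
)

print(values_for(sample, "sex"))
```

Keeping the source next to each value is what lets two databases decide whether "thymus" in one record means the same thing as "thymus" in another.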

Page 10: From MIAME to MAML: Microarray Gene Expression Database (MGED)

MAGE SourceForge

Page 11: From MIAME to MAML: Microarray Gene Expression Database (MGED)

MAGE BioMaterial Model

Page 12: From MIAME to MAML: Microarray Gene Expression Database (MGED)

MAGE Programming Jamboree

• Toronto Sept. 2001

• Hosted by Jason Goncalves, Iobion

• APIs, Importers, Exporters

• Perl, Java, C++

Page 13: From MIAME to MAML: Microarray Gene Expression Database (MGED)

MGED OWG home page

Page 14: From MIAME to MAML: Microarray Gene Expression Database (MGED)

What is an ontology?

• An ontology is a specification of concepts that includes the relationships between those concepts.

• Provides semantics and constraints

• Allows for computational inferences and reliable comparisons
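
As a rough illustration of "computational inferences", the sketch below stores a few is-a relationships and derives facts that were never stated directly. The concepts and the `IS_A` table are illustrative assumptions, not terms from any MGED ontology.

```python
# Hypothetical is-a links: child concept -> parent concept.
IS_A = {
    "thymus": "lymphoid organ",
    "spleen": "lymphoid organ",
    "lymphoid organ": "organ",
    "liver": "organ",
}


def ancestors(concept):
    """Walk the is-a links upward and collect every broader concept."""
    result = []
    while concept in IS_A:
        concept = IS_A[concept]
        result.append(concept)
    return result


def is_a(concept, candidate):
    """True if `concept` equals `candidate` or is a narrower kind of it."""
    return concept == candidate or candidate in ancestors(concept)


print(ancestors("thymus"))
print(is_a("thymus", "organ"))
print(is_a("liver", "lymphoid organ"))
```

Nothing in the table says directly that a thymus is an organ; the relationship is inferred by following two links. A query for experiments on "lymphoid organ" could use the same walk to match samples annotated as "thymus" or "spleen".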

Page 15: From MIAME to MAML: Microarray Gene Expression Database (MGED)

OWG Use Cases

• Return a summary of all experiments that use a specified type of biosource.
  – Group the experiments according to treatment.

• Return a summary of all experiments done examining effects of a specified treatment.
  – Group the experiments according to biosource.

• Return a summary of all experiments measuring the expression of a specified gene.
  – Indicate when experiments confirm results, provide new information, or conflict.

• Generate a distance metric for experiment types

• Generate an error estimation for experimental descriptions
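
The first two use cases are mirror images of each other: filter on one annotation, then group on another. A minimal sketch, assuming each experiment is a flat record with hypothetical `biosource` and `treatment` fields (the records themselves are invented):

```python
from collections import defaultdict

# Invented experiment records, for illustration only.
EXPERIMENTS = [
    {"id": "E1", "biosource": "thymus", "treatment": "dexamethasone"},
    {"id": "E2", "biosource": "thymus", "treatment": "heat shock"},
    {"id": "E3", "biosource": "liver", "treatment": "dexamethasone"},
    {"id": "E4", "biosource": "thymus", "treatment": "dexamethasone"},
]


def summarise(experiments, filter_key, filter_value, group_key):
    """Select experiments where filter_key == filter_value, grouped by group_key."""
    groups = defaultdict(list)
    for exp in experiments:
        if exp[filter_key] == filter_value:
            groups[exp[group_key]].append(exp["id"])
    return dict(groups)


# Use case 1: a specified biosource, grouped by treatment.
by_treatment = summarise(EXPERIMENTS, "biosource", "thymus", "treatment")

# Use case 2: a specified treatment, grouped by biosource.
by_biosource = summarise(EXPERIMENTS, "treatment", "dexamethasone", "biosource")

print(by_treatment)
print(by_biosource)
```

The exact string comparison is the weak point: it only works if every submitter writes "thymus" and "dexamethasone" the same way, which is the motivation for controlled vocabularies and ontologies.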

Page 16: From MIAME to MAML: Microarray Gene Expression Database (MGED)

Species Resources

Page 17: From MIAME to MAML: Microarray Gene Expression Database (MGED)
Page 18: From MIAME to MAML: Microarray Gene Expression Database (MGED)

Concept Definitions

Page 19: From MIAME to MAML: Microarray Gene Expression Database (MGED)
Page 20: From MIAME to MAML: Microarray Gene Expression Database (MGED)

Excerpts from a Sample Description

courtesy of M. Hoffman, S. Schmidtke, Lion BioSciences

Organism: Mus musculus [ NCBI taxonomy browser ]

Cell source: in-house bred mice (contact: [email protected])

Sex: female [ MGED ]

Age: 3 - 4 weeks after birth [ MGED ]

Growth conditions: normal
  – controlled environment
  – 20 - 22 °C average temperature
  – housed in cages according to German and EU legislation
  – specified pathogen free conditions (SPF)
  – 14 hours light cycle
  – 10 hours dark cycle

Developmental stage: stage 28 (juvenile (young) mice) [ GXD "Mouse Anatomical Dictionary" ]

Organism part: thymus [ GXD "Mouse Anatomical Dictionary" ]

Strain or line: C57BL/6 [ International Committee on Standardized Genetic Nomenclature for Mice ]

Genetic Variation: Inbr (J) 150. Origin: substrains 6 and 10 were separated prior to 1937. This substrain is now probably the most widely used of all inbred strains. Substrain 6 and 10 differ at the H9, Igh2 and Lv loci. Maint. by J,N, Ola. [ International Committee on Standardized Genetic Nomenclature for Mice ]

Treatment: in vivo [ MGED ] intraperitoneal injection of Dexamethasone into mice, 10 microgram per 25 g bodyweight of the mouse

Compound: drug [ MGED ] synthetic glucocorticoid Dexamethasone, dissolved in PBS
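
Entries in the sample description follow a loose "Qualifier: value [ source ]" pattern. The sketch below shows one way such lines might be split into parts. The regular expression and the `parse_annotation` helper are assumptions made for illustration, not an MGED tool, and they only handle a source that appears at the very end of the line.

```python
import re

# Qualifier up to the first colon, then the value, then an optional
# trailing "[ source ]" block.
LINE = re.compile(
    r"^(?P<qualifier>[^:]+):\s*(?P<value>.*?)\s*(?:\[\s*(?P<source>[^\]]*?)\s*\])?$"
)


def parse_annotation(line):
    """Split one description line into (qualifier, value, source or None)."""
    match = LINE.match(line.strip())
    if match is None:
        return None
    return (
        match.group("qualifier").strip(),
        match.group("value"),
        match.group("source"),
    )


print(parse_annotation("Sex: female [ MGED ]"))
print(parse_annotation('Organism part: thymus [ GXD "Mouse Anatomical Dictionary" ]'))
print(parse_annotation("Growth conditions: normal"))
```

Lines such as the Treatment entry, where the bracketed source sits in the middle of the text, come back with the whole remainder as the value and no source. Free-text descriptions are hard to parse reliably, which is one argument for a structured exchange format.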

Page 21: From MIAME to MAML: Microarray Gene Expression Database (MGED)

MGED Biomaterial Ontology

• Under construction
  – Using OilEd (may use others)
  – Generating an RDF schema file

• Motivated by MIAME and coordinated with MAGE

• Extend classes, provide constraints, provide terms to use

Page 22: From MIAME to MAML: Microarray Gene Expression Database (MGED)
Page 23: From MIAME to MAML: Microarray Gene Expression Database (MGED)

MGED Plans

• MIAME 2.0
  – Add/extend sections on normalisation, quality assurance, data analysis

• MAGE Software
  – Importers, exporters
  – Reflect MIAME changes and ontologies

• Ontologies
  – Identified resources
  – Ontology of entire microarray experiment

• Normalization
  – Discussion of methods
  – Common controls

• User’s Queries
  – Community needs

Page 24: From MIAME to MAML: Microarray Gene Expression Database (MGED)

MGED Summary

• International grass-roots organization for microarray standards.
  – Public databases
  – Published experiments

• Generated MIAME and MAGE
  – MIAME: Guidelines for information capture
  – MAGE: Common object model

• Building ontologies and normalization standards
  – Ontologies: Common language
  – Normalization: New web page

• MGED 4 in Boston, MA, Feb. 13-16, 2002

Page 25: From MIAME to MAML: Microarray Gene Expression Database (MGED)

MGED-Related sites

• MGED: http://www.mged.org

• MIAME: http://www.mged.org/Annotations-wg/

• MAGE: http://www.geml.org/omg.html
  http://sourceforge.net/projects/mged/

• OWG: http://www.cbil.upenn.edu/Ontology/

• NWG: http://www.dnachip.org/mged/normalization.html