MAGE: Revised submission against LSR RFP-007 "Gene Expression"

Ugis Sarkans, EBI

Michael Miller, Rosetta Inpharmatics

Overview

• Acknowledgements

• Specification history and structure

• Fundamental Terms

• UML Packages

• Mapping from PIM to XML-PSM

• Schedule

• Resources

Acknowledgements

• Doug Bassett (Rosetta)

• Derek Bernhart (Affymetrix)

• Alvis Brazma (EBI)

• Steve Chervitz (Affymetrix)

• Francisco De La Vega (Applied Biosystems)

• Michael Dickson (NetGenics)

• David Frankel (IONA)

• Ken Griffiths (NetGenics)

• Scott Markel (NetGenics)

• Michael Miller (Rosetta)

• Dave Nellesen (Incyte)

• Alan Robinson (EBI)

• Ugis Sarkans (EBI)

• Barry Schwartz (Affymetrix)

• Martin Senger (EBI)

• Paul Spellman (Stanford)

• Jason Stewart (NCGR)

• Charles Troup (Agilent)

• participants of MAGE programming jamboree (hosted by Iobion) in Toronto, September 2001

Model-Driven Architecture

• Platform Independent Model (UML)

– most of the effort spent on this

• Platform Specific Model

– XML

• UML (refined from PIM):

– not used (Rational Rose profile for UML not that useful)

• DTD – generated from PIM

– manual modifications

History of the submittal

• lifesci/01-06-02 - an interim draft before the Danvers meeting

– not enough time to work out XML

• lifesci/01-08-01 - not the final submission

– programming jamboree after the Toronto meeting helped a lot, especially in the XML mapping area

• lifesci/01-10-01 - current submission

Specification Structure

• Text document with explanations, including all diagrams

– prepared partly by exporting from Rational Rose

• PIM, UML model as a single XMI file

• XMI => DTD translation software (as a formal representation of the mapping rules)

• XML DTD

Fundamental Terms

• BioSample - tissue, cell-line, etc. that may be treated

• BioMaterial - generic term for biological-based material

• BioSequence - an abstraction of a biological sequence

• BioAssay

– treatment of an array with a labeled extract, i.e. hybridization

– experimental step in a broader sense

Fundamental Terms (2)

• Reporter - the physical representation of biosequence(s) on an array

• Feature - location on an array

• Event - description of an action, e.g. treatment of a BioSample or the act of hybridization

• Transformation - a specific Event, transforming a set of data to another set of data.

UML Packages (1)

• BioSequence and BQS

• BioMaterial

• BioEvent

• ArrayDesign and DesignElement

• ArrayManufacture

• BioAssay

• BioAssayData

UML Packages (2)

• Experiment

• HigherLevelAnalysis

• Miscellaneous

– Describable

– Measurement

– QuantitationType

– Protocol

– Audit and Security

UML Packages (3)

Diagram of the packages: BSANE, BQS, Description, Protocol, Measurement, Audit, Treatment, Transformation, BioEvent, Experiment, ArrayDesign, BioMaterial, BioAssayData, BioAssay, DesignElement, HigherLevelAnalysis, BioSequence, ArrayManufacture, QuantitationType

Package dependencies

Important package dependencies

Experiment

• Represents the container for a hierarchical grouping of BioAssays

• ExperimentDesign describes and annotates the overall design and purpose of the experiment

• Description of experimental steps can be structured by ExperimentalFactors/FactorValues:

– ExperimentalFactor is a part of ExperimentDesign

– FactorValues can be attached to BioAssays

Experiment

HigherLevelAnalysis

• The results of performing analysis on the BioAssayData from an Experiment

• Clustering allows specifying the results of analysis as a hierarchical tree

• Cluster Nodes can have NodeValues and are associated with *Dimension objects

BioAssayData

• The data associated with either a measured BioAssay or a derived BioAssay

• Data is conceptually a 3-D matrix, with dimensions:

– BioAssayDimension

– DesignElementDimension

– QuantitationTypeDimension

• Transformations are used to capture data processing sequence and rules

– *Mapping objects formalize dimension translations

• Two representations for BioDataValues:

– a set of BioDataTuples

– BioDataCube
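The relationship between the two representations can be sketched in a few lines. This is an illustration only: the function names, the nested-list layout, the axis order and the sample values are assumptions made here and are not taken from the MAGE specification.

```python
# Hypothetical sketch: the same BioDataValues held as a cube and as tuples.
# The axis order (BioAssay, DesignElement, QuantitationType) is an assumption.

bioassays = ["assay1", "assay2"]
design_elements = ["feature1", "feature2", "feature3"]
quantitation_types = ["signal", "background"]


def cube_to_tuples(cube):
    """Flatten a 3-D nested list into (bioassay, element, qtype, value) tuples."""
    tuples = []
    for i, assay in enumerate(bioassays):
        for j, element in enumerate(design_elements):
            for k, qtype in enumerate(quantitation_types):
                tuples.append((assay, element, qtype, cube[i][j][k]))
    return tuples


def tuples_to_cube(tuples):
    """Rebuild the dense cube; cells that have no tuple stay None."""
    cube = [[[None for _ in quantitation_types] for _ in design_elements]
            for _ in bioassays]
    for assay, element, qtype, value in tuples:
        i = bioassays.index(assay)
        j = design_elements.index(element)
        k = quantitation_types.index(qtype)
        cube[i][j][k] = value
    return cube


# Made-up values: cube[i][j][k] = 100*i + 10*j + k
cube = [[[100 * i + 10 * j + k for k in range(2)] for j in range(3)]
        for i in range(2)]
tuples = cube_to_tuples(cube)
```

In this sketch the tuple form names its coordinates explicitly and can omit cells, while the cube form is dense and relies on the order of the three dimensions.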

BioAssayData

BioAssayData

Diagram of BioAssayData and the related packages: BioAssay, QuantitationType, DesignElement, Transformation

QuantitationType

• StandardQuantitationTypes and SpecializedQuantitationTypes

• list of SQTs

• can refer to a Channel object

• QuantitationTypeMap - within BioAssayData package

BioAssay

• Three types of BioAssays (experimental steps):

– PhysicalBioAssay

• Contains information and annotation on the event of joining an Array with BioMaterial, typically with LabeledExtract(s); also, Treatments

– MeasuredBioAssay

• FeatureExtraction

– DerivedBioAssay

• corresponds to a dry-lab experimental step

BioAssay

Array

• Manufacturing information about the implementation of an array design

– Defects and deviations from the design can be recorded

• FeatureDefects

• ZoneDefects

– The LIMS biomaterial information for what was put on each feature can be recorded here

– ArrayGroups and Fiducials

Array

BioMaterial

• Describes how a BioSource is treated to obtain the BioMaterial for Hybridization (typically a LabeledExtract)

• Used by a BioAssayCreation in combination with an Array to produce a PhysicalBioAssay

• A set of treatments is typically linear in time but can form a directed acyclic graph (DAG)

• Formalization of Treatments with Compounds
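A minimal sketch of the directed-acyclic-graph point, assuming a made-up set of materials in which two samples are pooled before labeling. The dictionary layout and the function name are hypothetical; only `graphlib.TopologicalSorter` (Python 3.9+ standard library) is a real API.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical treatment graph: each material maps to the materials
# it was derived from (its treatment sources).
derived_from = {
    "BioSample A": {"BioSource 1"},
    "BioSample B": {"BioSource 2"},
    "Pooled extract": {"BioSample A", "BioSample B"},
    "LabeledExtract": {"Pooled extract"},
}


def processing_order(graph):
    """Order materials so every source comes before what is derived from it.

    Raises graphlib.CycleError if the treatments do not form a DAG.
    """
    return list(TopologicalSorter(graph).static_order())


order = processing_order(derived_from)
```

A purely linear treatment history is the special case in which every material has exactly one source.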

BioMaterial

DesignElement

• DesignElements

– Features are the locations on the array

– Reporters represent some biological sequence (clone, oligo, etc.) that can be placed on one or more features

• immobilized characteristics

– CompositeSequence is a grouping that represents a biological sequence composed of other biological sequences (gene, exon, etc.)

• biological characteristics

• *Maps - for relating Features to Reporters etc.

– MismatchInformation
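As an illustration of how the three DesignElement levels relate, the sketch below uses plain dictionaries in place of the *Map classes. The identifiers are invented, and the many-to-one simplification (each feature has one reporter, each reporter one composite sequence) is an assumption made for brevity.

```python
# Hypothetical identifiers; the dictionaries stand in for the *Map classes.
feature_to_reporter = {"F1": "R1", "F2": "R1", "F3": "R2", "F4": "R3"}
reporter_to_composite = {"R1": "GeneX", "R2": "GeneX", "R3": "GeneY"}


def features_for_composite(composite):
    """Return the sorted features whose reporter belongs to the given CompositeSequence."""
    reporters = {r for r, c in reporter_to_composite.items() if c == composite}
    return sorted(f for f, r in feature_to_reporter.items() if r in reporters)


gene_x_features = features_for_composite("GeneX")
```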

DesignElement

BioSequence

• BioSequence class - abstraction of various biosequences

• DatabaseEntries for characterizing BioSequences

• Simplification of the BSANE draft; will need to be compatible with the end result of BSANE

ArrayDesign

• ArrayDesign describes a microarray design that can be manufactured

– Zone information

– DesignElementGroups

ArrayDesign

BioEvent

• Abstraction of various MAGE events:

– physical (e.g., BioMaterial Treatment)

– data manipulation (Transformation)

• Have associated ProtocolApplications (an ordered list)

• Subclasses have some target (the result of the BioEvent)

• Often have sources

• Relevant for BioMaterial, BioAssay, BioAssayData packages

Protocol

• Protocol and ProtocolApplication

– Protocol describes a generic laboratory procedure or analysis algorithm

– ProtocolApplication describes the actual application of a protocol

– ProtocolApplication:

• values for the replaceable parameters

• any variation from the Protocol

• Similarly:

– Hardware and HardwareApplication

– Software and SoftwareApplication
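The split between a generic Protocol and its ProtocolApplication can be sketched as follows. The attribute names, the example protocol and its parameter values are hypothetical and chosen only to show the pattern; they are not the attributes defined in the MAGE model.

```python
from dataclasses import dataclass, field


@dataclass
class Protocol:
    """Generic procedure with default values for its replaceable parameters."""
    name: str
    text: str
    default_parameters: dict = field(default_factory=dict)


@dataclass
class ProtocolApplication:
    """One actual use of a Protocol: concrete parameter values plus deviations."""
    protocol: Protocol
    parameter_values: dict = field(default_factory=dict)
    deviations: list = field(default_factory=list)

    def effective_parameters(self):
        """Defaults from the Protocol, overridden by the values used this time."""
        merged = dict(self.protocol.default_parameters)
        merged.update(self.parameter_values)
        return merged


# Made-up example values.
hyb = Protocol("hybridization", "Hybridize labeled extract to array.",
               {"temperature_C": 60, "duration_h": 16})
run = ProtocolApplication(hyb, {"duration_h": 18},
                          ["wash step repeated twice"])
params = run.effective_parameters()
```

The same pairing would apply to Hardware/HardwareApplication and Software/SoftwareApplication.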

Protocol

Miscellaneous (1)

• Hierarchy of top-level abstract classes

– Extendable - can have properties

– Describable - can also have Descriptions and Security and Audit information

– Identifiable - also has an identifier (unambiguous within some scope) and a name

• AuditAndSecurity package

– Contact/Person/Organization classes

– tracking of changes (audit trail)

– user security (access rights to MAGE objects)
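A rough sketch of the three-level hierarchy, assuming each level inherits from the one before it, as the "also" wording suggests. The attribute names and the `ExampleObject` class are hypothetical.

```python
class Extendable:
    """Top of the hierarchy: can carry arbitrary properties."""

    def __init__(self):
        self.properties = {}


class Describable(Extendable):
    """Adds descriptions plus audit and security information."""

    def __init__(self):
        super().__init__()
        self.descriptions = []
        self.audit_trail = []
        self.security = None


class Identifiable(Describable):
    """Adds an identifier (unambiguous within some scope) and a name."""

    def __init__(self, identifier, name=""):
        super().__init__()
        self.identifier = identifier
        self.name = name


class ExampleObject(Identifiable):
    """Hypothetical concrete class used only for this illustration."""


obj = ExampleObject("example:1", "first example")
obj.properties["colour"] = "red"
obj.descriptions.append("free text description")
```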

Miscellaneous (2)

• Description package

– Description is a container for

• free text description

• OntologyEntries

• DatabaseEntries

• BibliographicReferences

• BQS package

– BibliographicReference class

• Measurement package

– Measurement is a quantity with a unit

– simple Measurement ontology provided

DTD & XML Format

<MAGE-ML>
  <{packageName}_package>
    <{className}_assnlist>  <!-- generated container element -->
      <{className}>  <!-- independent class elements -->
        <{container}>  <!-- one of *_assn, *_assnref, *_assnlist, *_assnreflist -->
          <{className or className_ref}>
            … <!-- alternating {container} and {className or className_ref} -->
          </{className or className_ref}>
        </{container}>
      </{className}>
    </{className}_assnlist>
    ... <!-- more independent classes -->
  </{packageName}_package>
  ... <!-- more packages -->
</MAGE-ML>

* slide borrowed from Angel Pizarro, UPenn
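The nesting pattern can be tried out with a short script that builds a small document of this shape using Python's standard `xml.etree.ElementTree`. The helper functions, the `identifier` attribute and the sample identifiers are assumptions for illustration; the `_package` suffix follows the template above.

```python
import xml.etree.ElementTree as ET


def add_package(root, package_name):
    """Create the <{packageName}_package> element."""
    return ET.SubElement(root, f"{package_name}_package")


def add_independent(package, class_name, identifier):
    """Add an independent class element inside its <{className}_assnlist>."""
    tag = f"{class_name}_assnlist"
    container = package.find(tag)
    if container is None:
        container = ET.SubElement(package, tag)
    return ET.SubElement(container, class_name, {"identifier": identifier})


def add_reference(owner, role, class_name, identifier):
    """Add a <{role}_assnref> container holding a <{className}_ref> element."""
    container = ET.SubElement(owner, f"{role}_assnref")
    return ET.SubElement(container, f"{class_name}_ref",
                         {"identifier": identifier})


root = ET.Element("MAGE-ML")
audit = add_package(root, "AuditAndSecurity")
add_independent(audit, "Contact", "contact:1")
experiment_package = add_package(root, "Experiment")
experiment = add_independent(experiment_package, "Experiment", "experiment:1")
add_reference(experiment, "Provider", "Contact", "contact:1")

xml_text = ET.tostring(root, encoding="unicode")
```

Here the Contact is written once as an independent element and the Experiment points to it through a `_ref` element rather than repeating it.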

XML tree example

MAGE-ML
  AuditAndSecurity_pkg
    Contact_assnlist
      Contact
  Experiment_pkg
    Experiment_assnlist
      Experiment
        Provider_assnref
          Contact_ref
        ExperimentDesign_assn
          ExperimentDesign

* slide borrowed from Angel Pizarro, UPenn

Programming APIs

• Mapping of the object model (OM) to language-specific OMs

• APIs are automatically generated from the OM specifications

– Get/set methods for associations

– Get/set methods for attributes

• XML <=> language-specific OM marshallers/unmarshallers - also automatically generated
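To make the idea of generated accessors and marshallers concrete, here is a minimal sketch that builds a class with get/set methods from a list of attribute names and round-trips it through XML. It is not the MAGE-STK generator: the function names, the `Person` example and the choice to store attributes as XML attributes are all assumptions, and associations are left out.

```python
import xml.etree.ElementTree as ET


def generate_class(class_name, attributes):
    """Build a class with getX/setX methods for each attribute name."""

    def make_getter(attr):
        return lambda self: self._values.get(attr)

    def make_setter(attr):
        def setter(self, value):
            self._values[attr] = value
        return setter

    def init(self):
        self._values = {}

    namespace = {"__init__": init}
    for attr in attributes:
        capitalized = attr[0].upper() + attr[1:]
        namespace["get" + capitalized] = make_getter(attr)
        namespace["set" + capitalized] = make_setter(attr)
    return type(class_name, (object,), namespace)


def marshal(obj):
    """Write the object's attribute values as XML attributes."""
    element = ET.Element(type(obj).__name__)
    for attr, value in obj._values.items():
        element.set(attr, str(value))
    return ET.tostring(element, encoding="unicode")


def unmarshal(cls, xml_text):
    """Rebuild an object of the given class from marshalled XML."""
    element = ET.fromstring(xml_text)
    obj = cls()
    obj._values.update(element.attrib)
    return obj


Person = generate_class("Person", ["identifier", "lastName"])
person = Person()
person.setIdentifier("person:1")
person.setLastName("Doe")
xml_text = marshal(person)
restored = unmarshal(Person, xml_text)
```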

Programming APIs (cont.)

• Use standard modules/packages

– Xerces, JDK, etc.

• Implementation in Java, C++, Perl

• Building annotation tools/database access modules on top of these APIs

Schedule

• LSR ‘vote to vote’ at Dublin OMG meeting in November

– LSR, AB, DTC votes at Dublin OMG meeting

• Setting up FTF

• open source implementation efforts

– Jamboree II at EBI, December 6-11

• MAGE v.2.0

– current MAGE <=> MAGE v.2.0 mapping rules

Web Sites

• MAGE specification - hosted by Rosetta

– links to documents

• presentations

• UML models

– XMI files

– Rose .mdl files

– HTML version

– PNG image files of diagrams

– http://www.geml.org/omg.htm

• MGED programming effort:

– http://sourceforge.net/projects/mged

Mailing Lists

• Specification-related

– [email protected]

– to subscribe, send the following to [email protected]

subscribe lsr-ge <yourEmailAddress>

• MAGE-STK development-related

– https://lists.sourceforge.net/lists/listinfo/mged-mage

Questions?