ArrayExpress: Helen Parkinson

Page 1: ArrayExpress: Helen Parkinson

EBI is an Outstation of the European Molecular Biology Laboratory.

MAGE-TAB - The ArrayExpress Production Experience

Helen Parkinson, PhD

Page 2: ArrayExpress: Helen Parkinson

www.ebi.ac.uk/arrayexpress

Content

• All change at ArrayExpress
• Data acquisition
• Validation
• Extension
• Downloads
• Long Term Future
• Tutorial – submitting in MAGETAB format

Page 3: ArrayExpress: Helen Parkinson

[Diagram: all change at ArrayExpress. Labels: MAGEML, AE, M.EXPRESS, MAGETABULATOR, Tracking, AE2, MAGETAB MIGRATION, MAGETAB]

Page 4: ArrayExpress: Helen Parkinson

Data acquisition

• MAGETAB data acquisition is integrated with existing tab2mage submissions
• MAGETAB export is being added to the MIAMExpress system
• All MAGE-ML submissions will be converted to MAGETAB
• We will unify data acquisition on MAGETAB
• We decided to do most curation/validation/ontology matching at the end for MAGETAB submissions
• MAGETAB makes curator edit and user update much easier
• Human readable tab delimited formats = efficient curation
• 1600 Experiments processed (1600/3700)
• All curated
• Subset of ArrayExpress MAGETAB data will be re-curated at migration
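
As a rough illustration of why a tab-delimited layout is easy to curate, the sketch below reads a small SDRF-style table using only Python's standard library. The sample rows and the helper name `read_tab_table` are made up for illustration; they are not taken from ArrayExpress or its tools.

```python
import csv
import io

# Hypothetical SDRF-style content; real submissions have many more columns.
EXAMPLE = (
    "Source Name\tCharacteristics[Organism]\tProtocol REF\t"
    "Hybridization Name\tProtocol REF\n"
    "liver_1\tHomo sapiens\tP-EXTRACT\thyb_1\tP-HYB\n"
    "liver_2\tHomo sapiens\tP-EXTRACT\thyb_2\tP-HYB\n"
)


def read_tab_table(text):
    """Return (headers, rows) parsed from tab-delimited text.

    csv.reader is used rather than csv.DictReader because the same header
    (here "Protocol REF") can appear more than once in one table.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    # Skip lines that contain only whitespace.
    rows = [row for row in reader if any(cell.strip() for cell in row)]
    if not rows:
        return [], []
    return rows[0], rows[1:]


headers, rows = read_tab_table(EXAMPLE)
```

Because the table is plain text, a curator can make the same edits in a spreadsheet or text editor that a script makes here.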

Page 5: ArrayExpress: Helen Parkinson

Automated processing and validation

• Sections
• MAGETAB Column Headers
• MAGETAB Column Orders
• MAGETAB Content – length, terms
• External data files – released monthly
• vs. ArrayExpress content
• MIAME score
• DW candidates
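
The header, order and content checks listed above could look roughly like the sketch below. It is not the ArrayExpress validator: the header set is only a small subset of SDRF column names, and `NODE_ORDER`, `MAX_CELL_LENGTH` and the function name `validate` are assumptions chosen for illustration.

```python
import re

# A small subset of SDRF column headers (not the complete list).
FIXED_HEADERS = {
    "Source Name", "Sample Name", "Extract Name", "Labeled Extract Name",
    "Hybridization Name", "Protocol REF", "Term Source REF", "Label",
    "Material Type", "Array Design REF", "Array Data File",
}
# Headers that carry a bracketed qualifier, e.g. "Factor Value[dose]".
BRACKETED = re.compile(
    r"^(Characteristics|Factor Value|Comment)\s*\[[^\[\]]+\]$"
)
# Assumed left-to-right order of the node columns.
NODE_ORDER = ["Source Name", "Sample Name", "Extract Name",
              "Labeled Extract Name", "Hybridization Name"]
MAX_CELL_LENGTH = 255  # assumed limit, for illustration only


def validate(headers, rows):
    """Return a list of readable problems; an empty list means none found."""
    problems = []
    # Column headers: each must be a known fixed or bracketed header.
    for header in headers:
        if header not in FIXED_HEADERS and not BRACKETED.match(header):
            problems.append(f"unknown column header: {header!r}")
    # Column order: node columns that are present must follow NODE_ORDER.
    positions = [headers.index(n) for n in NODE_ORDER if n in headers]
    if positions != sorted(positions):
        problems.append("node columns are out of order")
    # Content: every row matches the header width and cells are not too long.
    for line_no, row in enumerate(rows, start=2):
        if len(row) != len(headers):
            problems.append(
                f"line {line_no}: expected {len(headers)} cells, got {len(row)}"
            )
        for cell in row:
            if len(cell) > MAX_CELL_LENGTH:
                problems.append(
                    f"line {line_no}: cell longer than {MAX_CELL_LENGTH} characters"
                )
    return problems
```

Returning a list of messages, rather than stopping at the first failure, lets a submitter fix everything in one pass.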

Page 6: ArrayExpress: Helen Parkinson

Extensibility

• Solexa data
• Proteomics
• Metabolomics

• Array Genotype data (Gen2Phen)

• Association study data (Gen2Phen, Engage)

• Locus specific SNP data

• Clinical Data

• …..

Page 7: ArrayExpress: Helen Parkinson

Downloads

• All ArrayExpress data will now be available in MAGETAB format (exported directly from AE)

• ~90% is currently available and passes checks (issues with MAGE-OM->MAGETAB)

• More ontology term sources will be added incrementally – NCI thesaurus/OBI/ArrayExpress Factor Ontology

• Beta MAGETAB ArrayExpress Bioconductor Module (Huber, Kauffman)

• All MAGETAB generation code is available
• All validation code is available

Page 8: ArrayExpress: Helen Parkinson

Ontologies

• Working to develop OBI to replace MGED ontology
• Generating a sample/factor ontology for ArrayExpress based on data content
• Developed in Protégé/OWL format
• Will be served from OLS
• Also mapping to external ontologies for samples, e.g. NCI thesaurus
• Text mining to annotate external data using dictionaries based on NCI thesaurus and some custom ones (GEOimporter, tab2mage->MAGETAB)
• Data import, meta analysis
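
Dictionary-based annotation of free text can be sketched as below. The dictionary entries and the function name `annotate` are hypothetical; the slides do not describe how the actual text-mining step is implemented.

```python
import re

# Hypothetical dictionary: lower-cased synonym -> preferred term label.
DICTIONARY = {
    "liver": "Liver",
    "hepatic tissue": "Liver",
    "breast cancer": "Breast Carcinoma",
    "breast carcinoma": "Breast Carcinoma",
}


def annotate(text, dictionary=DICTIONARY):
    """Return the sorted preferred labels whose synonyms occur in text."""
    lowered = text.lower()
    found = set()
    for synonym, label in dictionary.items():
        # Word boundaries avoid hits inside longer words, e.g. "delivery".
        if re.search(r"\b" + re.escape(synonym) + r"\b", lowered):
            found.add(label)
    return sorted(found)
```

Mapping several synonyms to one preferred label is what makes annotations from different sources comparable.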

Page 9: ArrayExpress: Helen Parkinson

Future: ArrayExpress and Community

• ArrayExpress Submission in MAGETAB ADF format
• All ArrayExpress ADF in MAGETAB format
• Alpha ArrayExpress-MAGETAB BioConductor MAGETAB importer
• AE2
• AE2 data migration
• More people post their MAGETAB examples and we agree on a gold standard validated set for typical cases
• Community lists of MAGETAB supportive tools where people can register their interests and describe their applications (like GO tools)
• Addressing HLA
• MAGETAB model, firm up the spec
• Decide what factors really are, and whether the MAGE case is still valid - controlled vs uncontrolled variables instead?
• Issues with global variables - inter-experiment comparison of compounds needs to know dose even if dose doesn’t vary in an experiment
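
The global-variable issue can be made concrete with a small sketch that separates variables that vary within an experiment from those that stay constant. The sample annotations and the function name `split_variables` are hypothetical.

```python
def split_variables(samples):
    """Split annotation keys into varying ones and constant ones.

    samples: list of dicts mapping a variable name to its value per sample.
    Returns (varying_keys, constant_key_to_value).
    """
    keys = sorted({key for sample in samples for key in sample})
    varying, constant = [], {}
    for key in keys:
        values = {sample.get(key) for sample in samples}
        if len(values) > 1:
            varying.append(key)
        else:
            constant[key] = values.pop()
    return varying, constant


# Hypothetical experiment: time varies, compound and dose are fixed.
samples = [
    {"compound": "drug A", "dose": "10 mg/kg", "time": "2 h"},
    {"compound": "drug A", "dose": "10 mg/kg", "time": "24 h"},
]
varying, constant = split_variables(samples)
```

Here only `time` would qualify as a factor, yet `dose` still has to be recorded so that this experiment can be compared with one run at a different dose.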

Page 10: ArrayExpress: Helen Parkinson

Acknowledgments

• Anna Farne
• Ele Holloway
• James Malone
• Margus Lukk
ArrayExpress Production Team
• Helen Parkinson
• Tim Rayner
• Faisal Rezwan
• Eleanor Williams
• Mengyao Zhao
• Holly Zheng
• Mohammad Shojatalab
ArrayExpress Development Team

• Funding
EC - FELICS, EMERALD, Gen2Phen, MUGEN
NIH - MAGE grant

Page 11: ArrayExpress: Helen Parkinson

Tutorial

• Creation of MAGETAB templates
• Completion of a pre-made template
• Curation
• Scoring and validation templates
• Viewing Data in ArrayExpress
• Backend of the template generation/tracking system

• www.ebi.ac.uk/~parkinso/MAGETAB_tutorial/