Beyond protein expression, MOPED goes multi-omics€¦ · solely a protein expression database to a...

7
Published online 17 November 2014 Nucleic Acids Research, 2015, Vol. 43, Database issue D1145–D1151 doi: 10.1093/nar/gku1175 Beyond protein expression, MOPED goes multi-omics Elizabeth Montague 1,2,3,4 , Imre Janko 2,3,4 , Larissa Stanberry 1,2,3,4 , Elaine Lee 2,3,4 , John Choiniere 1,2,4 , Nathaniel Anderson 1,2,4 , Elizabeth Stewart 1,4 , William Broomall 2,3,4 , Roger Higdon 1,2,3,4 , Natali Kolker 2,3,4 and Eugene Kolker 1,2,3,4,5,6,* 1 Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children’s Research Institute, Seattle, WA, USA 98101, 2 High-Throughput Analysis Core, Seattle Children’s Research Institute, Seattle, WA, USA 98101, 3 CDO Analytics, Seattle Children’s, Seattle, WA, USA 98101, 4 Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, WA, USA 98101, 5 Departments of Biomedical Informatics and Medical Education and Pediatrics, University of Washington, Seattle, WA, USA 98109 and 6 Department of Chemistry and Chemical Biology, College of Science, Northeastern University, Boston, MA 02115 Received September 02, 2014; Revised October 23, 2014; Accepted November 02, 2014 ABSTRACT MOPED (Multi-Omics Profiling Expression Database; http://moped.proteinspire.org) has transitioned from solely a protein expression database to a multi- omics resource for human and model organisms. Through a web-based interface, MOPED presents consistently processed data for gene, protein and pathway expression. To improve data quality, con- sistency and use, MOPED includes metadata detail- ing experimental design and analysis methods. The multi-omics data are integrated through direct links between genes and proteins and further connected to pathways and experiments. MOPED now contains over 5 million records, information for approximately 75 000 genes and 50 000 proteins from four organ- isms (human, mouse, worm, yeast). These records correspond to 670 unique combinations of exper- iment, condition, localization and tissue. MOPED includes the following new features: pathway ex- pression, Pathway Details pages, experimental meta- data checklists, experiment summary statistics and more advanced searching tools. Advanced searching enables querying for genes, proteins, experiments, pathways and keywords of interest. The system is enhanced with visualizations for comparing across different data types. In the future MOPED will expand the number of organisms, increase integration with pathways and provide connections to disease. INTRODUCTION When it was created in 2011, MOPED (Multi-Omics Profil- ing Expression Database) initially focused on just one class of molecules––proteins (1,2). Current high-throughput ap- proaches produce a vast amount of diverse molecular ex- pression data (3) that can be utilized to enable a deeper understanding of biological processes. The need for multi- omics data systems is highlighted by the number of biolog- ical discoveries that were made possible by examining mul- tiple omics data types (4–7). With the addition of transcriptomics in 2014, MOPED became a multi-omics resource (8). Utilizing consistent processing, MOPED provides gene and protein expression data that can be compared across experiments, conditions and tissues. Integration of these multi-omics data occurs through direct links between genes and proteins, connec- tions to pathways and consistent experimental metadata. MOPED provides detailed experimental metadata via a standardized multi-omics checklist. The checklist provides details about research objectives, experimental design, pro- tocols, instrumentation, data processing and analysis (9– 11). This added layer of experimental metadata can be used by researchers to assess data quality, consistency, usability and reproducibility. New features and data have been added to MOPED to provide meaningful integration of multi-omics. These new features include Pathway Details pages, pathway expression summaries, experimental metadata checklists for all gene and most protein expression experiments, summary statis- tics for experiments and more advanced searching tools in- cluding Boolean searching. MOPED data are integrated in the context of biological pathways through pathway sum- maries and expression statistics, allowing the user to exam- ine connections and interactions (12). Advanced searching * To whom correspondence should be addressed. Tel: +206-884-7170; Fax: +206-987-7660; Email: [email protected] Disclaimer: The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Science Foundation, The Robert B. McMillen Foundation or Seattle Children’s Research Institute. C The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]

Transcript of Beyond protein expression, MOPED goes multi-omics€¦ · solely a protein expression database to a...

Page 1: Beyond protein expression, MOPED goes multi-omics€¦ · solely a protein expression database to a multi-omics resource for human and model organisms. Through a web-based interface,

Published online 17 November 2014 Nucleic Acids Research 2015 Vol 43 Database issue D1145ndashD1151doi 101093nargku1175

Beyond protein expression MOPED goes multi-omicsElizabeth Montague1234 Imre Janko234 Larissa Stanberry1234 Elaine Lee234John Choiniere124 Nathaniel Anderson124 Elizabeth Stewart14 William Broomall234Roger Higdon1234 Natali Kolker234 and Eugene Kolker123456

1Bioinformatics and High-Throughput Analysis Laboratory Center for Developmental Therapeutics Seattle ChildrenrsquosResearch Institute Seattle WA USA 98101 2High-Throughput Analysis Core Seattle Childrenrsquos Research InstituteSeattle WA USA 98101 3CDO Analytics Seattle Childrenrsquos Seattle WA USA 98101 4Data-Enabled Life SciencesAlliance (DELSA Global) Seattle WA USA 98101 5Departments of Biomedical Informatics and Medical Educationand Pediatrics University of Washington Seattle WA USA 98109 and 6Department of Chemistry and ChemicalBiology College of Science Northeastern University Boston MA 02115

Received September 02 2014 Revised October 23 2014 Accepted November 02 2014

ABSTRACT

MOPED (Multi-Omics Profiling Expression Databasehttpmopedproteinspireorg) has transitioned fromsolely a protein expression database to a multi-omics resource for human and model organismsThrough a web-based interface MOPED presentsconsistently processed data for gene protein andpathway expression To improve data quality con-sistency and use MOPED includes metadata detail-ing experimental design and analysis methods Themulti-omics data are integrated through direct linksbetween genes and proteins and further connectedto pathways and experiments MOPED now containsover 5 million records information for approximately75 000 genes and 50 000 proteins from four organ-isms (human mouse worm yeast) These recordscorrespond to 670 unique combinations of exper-iment condition localization and tissue MOPEDincludes the following new features pathway ex-pression Pathway Details pages experimental meta-data checklists experiment summary statistics andmore advanced searching tools Advanced searchingenables querying for genes proteins experimentspathways and keywords of interest The system isenhanced with visualizations for comparing acrossdifferent data types In the future MOPED will expandthe number of organisms increase integration withpathways and provide connections to disease

INTRODUCTION

When it was created in 2011 MOPED (Multi-Omics Profil-ing Expression Database) initially focused on just one classof moleculesndashndashproteins (12) Current high-throughput ap-proaches produce a vast amount of diverse molecular ex-pression data (3) that can be utilized to enable a deeperunderstanding of biological processes The need for multi-omics data systems is highlighted by the number of biolog-ical discoveries that were made possible by examining mul-tiple omics data types (4ndash7)

With the addition of transcriptomics in 2014 MOPEDbecame a multi-omics resource (8) Utilizing consistentprocessing MOPED provides gene and protein expressiondata that can be compared across experiments conditionsand tissues Integration of these multi-omics data occursthrough direct links between genes and proteins connec-tions to pathways and consistent experimental metadata

MOPED provides detailed experimental metadata via astandardized multi-omics checklist The checklist providesdetails about research objectives experimental design pro-tocols instrumentation data processing and analysis (9ndash11) This added layer of experimental metadata can be usedby researchers to assess data quality consistency usabilityand reproducibility

New features and data have been added to MOPED toprovide meaningful integration of multi-omics These newfeatures include Pathway Details pages pathway expressionsummaries experimental metadata checklists for all geneand most protein expression experiments summary statis-tics for experiments and more advanced searching tools in-cluding Boolean searching MOPED data are integrated inthe context of biological pathways through pathway sum-maries and expression statistics allowing the user to exam-ine connections and interactions (12) Advanced searching

To whom correspondence should be addressed Tel +206-884-7170 Fax +206-987-7660 Email eugenekolkerseattlechildrensorgDisclaimer The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Science Foundation The

Robert B McMillen Foundation or Seattle Childrenrsquos Research Institute

Ccopy The Author(s) 2014 Published by Oxford University Press on behalf of Nucleic Acids ResearchThis is an Open Access article distributed under the terms of the Creative Commons Attribution License (httpcreativecommonsorglicensesby-nc40) whichpermits non-commercial re-use distribution and reproduction in any medium provided the original work is properly cited For commercial re-use please contactjournalspermissionsoupcom

D1146 Nucleic Acids Research 2015 Vol 43 Database issue

enables queries of diverse data types including genes pro-teins pathways metadata conditions localizations tissuesand organisms

NEW DATA

MOPED data come from individual researchers and publicrepositories (Table 1) (13ndash21) (httpproteomecentralproteomexchangeorg httpswwwproteomicsdborg)Consistent processing of hand-curated experiments pro-vides a foundation for integrated data analysis (1822ndash23)All proteomics data are processed and analyzed usingSPIRE protein expression data combined with normaltissue protein concentrations are used to uniquely estimateprotein concentrations (12224ndash25) Preprocessed tran-scriptomics data are downloaded from GEO and analyzedusing the R package LIMMA (826ndash27) The addition ofexperimental metadata provides details about experimentaldesign instrument processing and analysis (910)

The number of expression records contained in MOPEDhas increased to over 5 million sim1 million for protein ab-solute expression 100 000 for protein relative expressionand over 4 million for gene relative expression data (Table2) These expression records correspond to sim75 000 genesand 50 000 proteins from four organisms (human mouseworm yeast) MOPED includes data from sim25 000 hu-man genes and 18 000 human proteins These expressionrecords are from 670 unique combinations of experimentcondition localization and tissue Each MOPED experi-ment can include a number of these unique combinationsThe proteins and genes are linked to over 5000 pathwaysfrom three prominent pathway resources (BioCyc Reac-tome and PANTHER) (16ndash18)

MOPED is continually expanding by moni-toring and downloading new experimental datafrom public repositories PRIDE PeptideAtlasproteomeXchange ProteomicsDB and GEO (19ndash21) (httpproteomecentralproteomexchangeorghttpswwwproteomicsdborg) Recent additions toMOPED include two prominent Human Proteome studiesof protein absolute expression data on 30 tissues and celllines from the Pandey group (28) and on 21 tissues celllines and fluids from the Kuster group (Figure 1) (29)(raw data download from ProteomeXchange PXD000561and PXD000865) Other recent data additions includeabsolute expression from deep proteome profiling of 11 celllines (30) and proteomics analysis of the NCI-60 cell linepanel (31) To provide a valuable resource for investigatingcancer mechanisms transcriptomics data from the NCI-60cell line panel (32) are being incorporated along with theproteomics data

MOPED FEATURES

The enhanced MOPED interface enables querying of pro-teins genes and keywords to access expression data sum-mary information and advanced data visualization Theinterface not only allows exploration of expression datapoints but also provides pre-calculated pathway statisticssummary pages and visualizations for a comprehensive yetconcise view of the data The home search page features six

tabs Protein Absolute Expression Protein Relative Expres-sion Gene Relative Expression Pathways Experiments andVisualization

Each of the six tabs in MOPED is designed to en-able rapid querying MOPED uses Lucene for full-textindexing and AspectJ and Google Analytics for track-ing and optimization (Apache Foundation httpluceneapacheorg Eclipse Foundation httpeclipseorgaspectjGoogle httpwwwgooglecomanalytics) The queries arebuilt and executed via the Basic Search (Figure 2A and B)and Advanced Search (Figure 3) options The Highlightssection features preset examples

Search capabilities

Basic Search now supports wildcards and logic state-ments For example in the Gene Relative Expressiontab a basic search for lsquogene=BRCrsquo returns relativeexpression data for all genes from human and modelorganisms whose names start with BRC (Figure 2A)Similarly a basic search in the Gene Relative Expres-sion tab for lsquogene=BRC and organism=human and(condition=standard or condition=cancer)rsquo returns allhuman gene relative expression data for genesrsquo names thatstart with BRC and conditions that are either standard orhave lsquocancerrsquo in the name (Figure 2B)

As a multi-omics resource MOPED supports gene andprotein queries under each of the six tabs For examplea protein ID can be used to search under the Gene Rela-tive Expression tab to obtain all the gene relative expressionrecords for the gene that corresponds to the query proteinPathways that contain the query proteingene are conve-niently linked to the Results table This integration enablescomprehensive searching with a single ID whether gene orprotein

The Advanced Search panels available for all tabs havebeen expanded (Supplementary Figure S1) The drop-downmenus show all of the available search options for exampleorganism experiment condition localization tissue chro-mosome location expression ratio and data type In addi-tion MOPED allows switching between tabs while preserv-ing the query results

Results tables

Query results are displayed in tables that are divided intosections for ease of use For instance searching under GeneRelative Expression returns a table of gene relative expres-sion records Each record has an expression ratio and thecorresponding P-value for each gene in an experiment Thisexpression ratio reflects a comparison between either tis-sues or conditions as determined by the experimental de-sign The expression records are also linked to chromo-somes and pathways (14ndash18) All values can be sorted anddownloaded

Details pages

Detailed data on each biological entity are provided on theDetails pages For example a Gene Details page contains alink to the corresponding protein chromosome mappings

Nucleic Acids Research 2015 Vol 43 Database issue D1147

Table 1 External sources for MOPED data

Data type Expression data Names descriptions and mappings

Protein PRIDE PeptideAtlas proteomeXchangeProteomicsDB Individual Labs

UniProt

Gene GEO Ensembl EntrezPathway BioCyc PANTHER Reactome

Expression data come from public repositories and individual labs (13ndash21) (httpproteomecentralproteomexchangeorg httpswwwproteomicsdborg)

Table 2 Current data statistics for MOPED as of August 31 2014

Protein absoluteexpression

Protein relativeexpression

Gene relativeexpression

Number of expression records 960 574 92 824 4 224 869Unique combinations of experiment condition localizationand tissue

362 59 249

Number of experiments 73 18 183Number of conditions 94 48 272Number of tissues 137 17 112Number of human proteins or genes 18 475 7987 24 517Number of mouse proteins or genes 14 116 3557 26 904Number of worm proteins or genes 10 178 466 18 143Number of yeast proteins or genes 5402 0 5811Number of pathways 5417 3295 5517

Each experiment consists of unique combinations of conditions localizations and tissues Expression records are expression values for a protein or genecorresponding to a combination of experiment condition localization and tissue

Figure 1 Absolute expression chart for two prominent lsquoNaturersquo proteome studies which highlights the absolute expression of protein P07148 in standardconditions across different tissues (2829)

D1148 Nucleic Acids Research 2015 Vol 43 Database issue

Figure 2 (A) Basic Search example for lsquogene = BRCrsquo default view (B) Basic Search example for lsquogene=BRC and organism=human and(condition=standard or condition=cancer)rsquo extended view Search results are expanded to include P-value False Discovery Rate (FDR) localizationand chromosome

Figure 3 Pathway Absolute Expression bar chart of Scavenging of Heme from Plasma Chart can be compared and filtered by tissue or condition Mouse-overs provide more information about the expression data point

Nucleic Acids Research 2015 Vol 43 Database issue D1149

gene description and a link to GeneCards (33) Genes arealso linked to external sources including NCBI Gene GOetc and pathways (Reactome PANTHER and BioCyc)(Table 3) (13ndash1833ndash44) Gene and Protein Details pagesdisplay gene and protein expression data respectively Thepages are enhanced by the embedded expression bar charts

Details pages for Pathways display pathway absolute ex-pression data and their bar graph together with links toexternal resources and a list of associated genes and pro-teins The associated genes and proteins link directly totheir expression data The Experiment Details page includesthe standardized experimental metadata checklist and sum-mary statistics such as the number of over- or under-expressed proteins or genes (910)

Pathways

MOPED adopted a pathway-centric view to integrate geneand protein expression data The new Pathways tab enablesquerying for pathway names keywords and biomolecules inaddition to the standard organism experiment conditionlocalization and tissue options Protein absolute expressiondata from a given experiment are mapped onto establishedpathways from Reactome PANTHER and BioCyc (16ndash18)Using this mapping MOPED provides summary statisticsfor pathway expression to enable a view of proteins workingtogether in the same pathway These statistics include path-way coverage odds ratio and the corresponding P-value cal-culated by Fisherrsquos exact test (12)

Versatile query and search tools enable complex queriesto explore these experiment-level pathway datasets Underthe Pathways tab researchers can query for pathways byfull or partial names as well as by bimolecular identifiersThe pathway expression results can then be sorted or down-loaded For example the user can visualize the results forpathways expressed in cystic fibrosis-related experiments(Figure 3)

Experimental metadata

Under the Experiments tab researchers can search filterand review experimental metadata For example queryingfor lsquocystic fibrosisrsquo will return a list of experiments wherethe search term appears in the experiment description Theresults include summary statistics on genes proteins andpathways expressed in each experiment

Experimental metadata are captured using a multi-omicschecklist (9ndash1145) This checklist provides informationabout experimental design data collection and analysisThis added layer of experimental metadata can be used byresearchers to assess the quality consistency usability andreproducibility of data in MOPED

MOPED currently contains experimental meta-data checklists on 12 proteomics and all 183 tran-scriptomics experiments (910) For example the sny-der personal omics profiling experiment has experimentalmetadata describing experiment design data collectionand data analysis (Supplementary Figure S2) (45) Built-inlinks to GEO provide additional information about thetranscriptomics experiments (21) The MOPED team iscurrently working on completing metadata checklists forall experiments in MOPED

Visualizations

MOPED provides interactive visualizations including bargraphs of protein gene and pathway expression levelsacross experiments that can be viewed by either tissue orcondition (Figures 1 and 3) The expression matrix com-pares expression of neighboring genes and correspondingproteins along the chromosome that can be grouped by tis-sue or condition The chord diagram gives an overview ofMOPED data content for different organisms Mouse-overdescriptions enhance the plots by providing more detailedgraphical and experimental information

MOPEDrsquoS IMPACT

MOPED provides support for the scientific community asa resource of processed expression data Researchers haveused MOPED to help identify potential proteins to studybased on expression levels (46) to compare expression ofdifferent proteins across experiments conditions localiza-tions and tissues (4748) to investigate expression levels invarious tissue types cell lines conditions and diseases (49ndash51) and to help link uncharacterized proteins to diabetesand cancer (5253)

MOPED was originally created in response to a sur-vey indicating a need for a processed expression database(13) Through direct and indirect interactions the MOPEDteam is able to learn the needs of biological and biomedicalresearchers and respond with updates and improvementsWith this feedback in mind MOPED acknowledges thechanging needs of the life sciences community

Data submissions

To be a complete and accurate data resource MOPEDneeds to reflect the current understanding of gene proteinand pathway expression patterns through access to the mostrecent data and experiments MOPED provides a conve-nient venue for researchers to both make their data pub-lic and share these privately with collaborators and review-ers (13) Private MOPED provides the same features toolsand conveniences as the public MOPED but enables theowner of the data to both limit access and make the datapublic when appropriate If interested in submitting data toMOPED please contact the MOPED team through the Fo-rum httpmoped-forumproteinspireorg

Future directions

As the field of biology rapidly advances MOPED is ad-vancing with it MOPED took the following steps to com-plement developments in research transitioned from pro-teomics to multi-omics (8) added metadata to enable morereliable use and integration of multiple data sources (911)and used pathways to create a coherent view of multi-omics data and an experimental level assessment of ex-pression (12) The next steps for MOPED are to enhanceintegration of different MOPED data types This will bedone by providing integrated data views graphical dis-plays and pathway analyses that show and utilize multipleomics data types In addition future versions of MOPED

D1150 Nucleic Acids Research 2015 Vol 43 Database issue

Table 3 MOPED references the following external resources on the Protein and Gene Details pages (13ndash1633ndash43)

External source Protein Gene Reference

GeneCards X X (33)PDB X (4142)BioCyc X (16)UniProt X (13)NCBI Gene X X (15)RefSeq X (34)ensembl X X (14)EMBL X (35)

WormBase X X (36)MGI X (37)KEGG X (38)HGNC X X (39)SGD X (40)GenBank X (43)GO X X (44)

indicates resources that are cross-referenced with MOPED

will offer disease-centric data integration and include addi-tional model organisms The disease-centric view will allowthe comparison of biomolecular processes and their func-tions between normal and disease states Also MOPEDplans to add pathway relative expression statistics to iden-tify overunder-expression of pathways in relative expres-sion experiments for both proteomics and transcriptomicsdata (5455) These steps will continue the advancement ofMOPED as a resource for biological and biomedical sci-ence

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGMENT

We thank Maggie Lackey for her critical reading themanuscript

FUNDING

National Science Foundation under the Division of Bio-logical Infrastructure [0969929] The Robert B McMillenFoundation Seattle Childrenrsquos Research Institute [to EK]Funding for open access charge Seattle Childrenrsquos ResearchInstituteConflict of interest statement None declared

REFERENCES1 HigdonR StewartE StanberryL HaynesW ChoiniereJ

MontagueE AndersonN YandlG JankoI BroomallW et al(2014) MOPED enables discoveries through consistently processedproteomics data J Proteome Res 13 107ndash113

2 KolkerE HigdonR HaynesW WelchD BroomallWLancetD StanberryL and KolkerN (2012) MOPED ModelOrganism Protein Expression Database Nucleic Acids Res 40D1093ndashD1099

3 HigdonR HaynesW StanberryL StewartE YandlGHowardC BroomallW KolkerN and KolkerE (2013) Unravelingthe complexities of life sciences data Big Data 1 42ndash50

4 ZhangW LiF and NieL (2010) Integrating multiple lsquoomicsrsquoanalysis for microbial biology application and methodologiesMicrobiol Read Engl 156 287ndash301

5 SikaroodiM GalachiantzY and BaranovaA (2010) Tumormarkers the potential of lsquoomicsrsquo approach Curr Mol Med 10249ndash257

6 StanberryL MiasGI HaynesW HigdonR SnyderM andKolkerE (2013) Integrative analysis of longitudinal metabolomicsdata from a personal multi-omics profile Metabolites 3 741ndash760

7 ChenR MiasGI Li-Pook-ThanJ JiangL LamHYKChenR MiriamiE KarczewskiKJ HariharanM DeweyFEet al (2012) Personal omics profiling reveals dynamic molecular andmedical phenotypes Cell 148 1293ndash1307

8 MontagueE StanberryL HigdonR JankoI LeeEAndersonN ChoiniereJ StewartE YandlG BroomallW et al(2014) MOPED 25ndashndashan integrated multi-omics resourceMulti-Omics Profiling Expression Database now includestranscriptomics data Omics J Integr Biol 18 335ndash343

9 KolkerE OzdemirV MartensL HancockW AndersonGAndersonN AynaciogluS BaranovaA CampagnaSR ChenRet al (2014) Toward more transparent and reproducible omics studiesthrough a common metadata checklist and data publications OmicsJ Integr Biol 18 10ndash14

10 DumbillE and KolkerE (2013) Introducing a metadata checklistfor omics data Big Data 1 195ndash195

11 Announcement Reducing our irreproducibility (2013) Nature 496398ndash398

12 KhatriP SirotaM and ButteAJ (2012) Ten years of pathwayanalysis current approaches and outstanding challenges PLoSComput Biol 8 e1002375

13 The UniProt Consortium (2014) Activities at the Universal ProteinResource (UniProt) Nucleic Acids Res 42 D191ndashD198

14 FlicekP AmodeMR BarrellD BealK BillisK BrentSCarvalho-SilvaD ClaphamP CoatesG FitzgeraldS et al (2014)Ensembl 2014 Nucleic Acids Res 42 D749ndashD755

15 MaglottD OstellJ PruittKD and TatusovaT (2011) EntrezGene gene-centered information at NCBI Nucleic Acids Res 39D52ndashD57

16 CaspiR AltmanT BillingtonR DreherK FoersterHFulcherCA HollandTA KeselerIM KothariA KuboA et al(2014) The MetaCyc database of metabolic pathways and enzymesand the BioCyc collection of PathwayGenome Databases NucleicAcids Res 42 D459ndashD471

17 MiH MuruganujanA and ThomasPD (2013) PANTHER in2013 modeling the evolution of gene function and other geneattributes in the context of phylogenetic trees Nucleic Acids Res 41D377ndashD386

18 CroftD MundoAF HawR MilacicM WeiserJ WuGCaudyM GarapatiP GillespieM KamdarMR et al (2014) The

Nucleic Acids Research 2015 Vol 43 Database issue D1151

Reactome pathway knowledgebase Nucleic Acids Res 42D472ndashD477

19 VizcaınoJA CoteRG CsordasA DianesJA FabregatAFosterJM GrissJ AlpiE BirimM ContellJ et al (2013) ThePRoteomics IDEntifications (PRIDE) database and associated toolsstatus in 2013 Nucleic Acids Res 41 D1063ndashD1069

20 DesiereF (2006) The PeptideAtlas project Nucleic Acids Res 34D655ndashD658

21 BarrettT WilhiteSE LedouxP EvangelistaC KimIFTomashevskyM MarshallKA PhillippyKH ShermanPMHolkoM et al (2013) NCBI GEO archive for functional genomicsdata setsndashndashupdate Nucleic Acids Res 41 D991ndashD995

22 KolkerE HigdonR WelchD BaumanA StewartE HaynesWBroomallW and KolkerN (2011) SPIRE systematic proteininvestigative research environment J Proteomics 75 122ndash126

23 HigdonRoger van BelleGerald and KolkerEugene (2008) A noteon the false discovery rate and inconsistent comparisons betweenexperiments Bioinformatics (Oxford England) 24 1225ndash1228[PubMed]

24 HatherG HigdonR BaumanA von HallerPD and KolkerE(2010) Estimating false discovery rates for peptide and proteinidentification using randomized databases Proteomics 102369ndash2376

25 HigdonR ReiterL HatherG HaynesW KolkerN StewartEBaumanAT PicottiP SchmidtA van BelleG et al (2011) IPMan integrated protein model for false discovery rate estimation andidentification in high-throughput proteomics J Proteomics 75116ndash121

26 HolzmanT and KolkerE (2004) Statistical analysis of global geneexpression data some practical considerations Curr OpinBiotechnol 15 52ndash57

27 SmythGK (2005) Limma linear models for microarray data InGentlemanR CareyV DudoitS IrizarryR and HuberW (eds)Bioinformatics and Computational Biology Solutions using R andBioconductor Springer NY pp 397ndash420

28 KimM-S PintoSM GetnetD NirujogiRS MandaSSChaerkadyR MadugunduAK KelkarDS IsserlinR JainSet al (2014) A draft map of the human proteome Nature 509575ndash581

29 WilhelmM SchleglJ HahneH Moghaddas GholamiALieberenzM SavitskiMM ZieglerE ButzmannL GessulatSMarxH et al (2014) Mass-spectrometry-based draft of the humanproteome Nature 509 582ndash587

30 GeigerT MaddenSF GallagherWM CoxJ and MannM(2012) Proteomic portrait of human breast cancer progressionidentifies novel prognostic markers Cancer Res 72 2428ndash2439

31 Moghaddas GholamiA HahneH WuZ AuerFJ MengCWilhelmM and KusterB (2013) Global proteome analysis of theNCI-60 cell line panel Cell Rep 4 609ndash620

32 PfisterTD ReinholdWC AgamaK GuptaS KhinSAKindersRJ ParchmentRE TomaszewskiJE DoroshowJH andPommierY (2009) Topoisomerase I levels in the NCI-60 cancer cellline panel determined by validated ELISA and microarray analysisand correlation with indenoisoquinoline sensitivity Mol CancerTher 8 1878ndash1884

33 StelzerG DalahI SteinTI SatanowerY RosenN NativNOz-LeviD OlenderT BelinkyF BahirI et al (2011) In-silicohuman genomics with GeneCards Hum Genomics 5 709ndash717

34 PruittKD BrownGR HiattSM Thibaud-NissenFAstashynA ErmolaevaO FarrellCM HartJ LandrumMJMcGarveyKM et al (2014) RefSeq an update on mammalianreference sequences Nucleic Acids Res 42 D756ndashD763

35 PaksereshtN AlakoB AmidC Cerdeno-TarragaA ClelandIGibsonR GoodgameN GurT JangM KayS et al (2014)Assembly information services in the European Nucleotide ArchiveNucleic Acids Res 42 D38ndashD43

36 HarrisTW BaranJ BieriT CabunocA ChanJ ChenWJDavisP DoneJ GroveC HoweK et al (2014) WormBase 2014new views of curated biology Nucleic Acids Res 42 D789ndashD793

37 BlakeJA BultCJ EppigJT KadinJA RichardsonJE andMouse Genome Database Group (2014) The Mouse GenomeDatabase integration of and access to knowledge about thelaboratory mouse Nucleic Acids Res 42 D810ndashD817

38 KanehisaM GotoS SatoY KawashimaM FurumichiM andTanabeM (2014) Data information knowledge and principle backto metabolism in KEGG Nucleic Acids Res 42 D199ndashD205

39 GrayKA DaughertyLC GordonSM SealRL WrightMWand BrufordEA (2013) Genenamesorg the HGNC resources in2013 Nucleic Acids Res 41 D545ndashD552

40 CostanzoMC EngelSR WongED LloydP KarraKChanET WengS PaskovKM RoeGR BinkleyG et al (2014)Saccharomyces genome database provides new regulation dataNucleic Acids Res 42 D717ndashD725

41 RosePW BiC BluhmWF ChristieCH DimitropoulosDDuttaS GreenRK GoodsellDS PrlicA QuesadaM et al(2013) The RCSB Protein Data Bank new resources for research andeducation Nucleic Acids Res 41 D475ndashD482

42 GutmanasA AlhroubY BattleGM BerrisfordJM BochetEConroyMJ DanaJM Fernandez MonteceloMA vanGinkelG GoreSP et al (2014) PDBe Protein Data Bank inEurope Nucleic Acids Res 42 D285ndashD291

43 BensonDA CavanaughM ClarkK Karsch-MizrachiILipmanDJ OstellJ and SayersEW (2013) GenBank NucleicAcids Res 41 D36ndashD42

44 AshburnerM BallCA BlakeJA BotsteinD ButlerHCherryJM DavisAP DolinskiK DwightSS EppigJT et al(2000) Gene ontology tool for the unification of biology The GeneOntology Consortium Nat Genet 25 25ndash29

45 SnyderM MiasG StanberryL and KolkerE (2014) Metadatachecklist for the integrated personal OMICS study proteomics andmetabolomics experiments Omics J Integr Biol 18 81ndash85

46 XueJ WijeratneSSK and ZempleniJ (2013) Holocarboxylasesynthetase synergizes with methyl CpG binding protein 2 and DNAmethyltransferase 1 in the transcriptional repression of long-terminalrepeats Epigenetics 8 504ndash511

47 StanevaR RukovaB HadjidekovaS NeshevaD AntonovaODimitrovP SimeonovV StamenovG CukuranovicRCukuranovicJ et al (2013) Whole genome methylation arrayanalysis reveals new aspects in Balkan endemic nephropathy etiologyBMC Nephrol 14 225

48 WilhelmT (2014) Phenotype prediction based on genome-wide DNAmethylation data BMC Bioinformatics 15 193

49 BlackerTS MannZF GaleJE ZieglerM BainAJSzabadkaiG and DuchenMR (2014) Separating NADH andNADPH fluorescence in live cells and tissues using FLIM NatCommun 5 3936

50 WilliamsBC FilterJJ Blake-HodekKA WadzinskiBEFudaNJ ShallowayD and GoldbergML (2014)Greatwall-phosphorylated Endosulfine is both an inhibitor and asubstrate of PP2A-B55 heterotrimers eLife 3 e01695

51 ChenM and PenningTM (2014) 5-reduced steroids and human(4)-3-ketosteroid 5-reductase (AKR1D1) Steroids 83 17ndash26

52 DelgadoAP BrandaoP and NarayananR (2014) Diabetesassociated genes from the dark matter of the human proteome MOJProteomics Bioinform 1 00020

53 DelgadoAP BrandaoP ChapadoMJ HamidS andNarayananR (2014) Open reading frames associated with cancer inthe dark matter of the human genome Cancer Genomics Proteomics11 201ndash213

54 HaynesWA HigdonR StanberryL CollinsD and KolkerE(2013) Differential expression analysis for pathways PLoS ComputBiol 9 e1002967

55 SubramanianA TamayoP MoothaVK MukherjeeSEbertBL GilletteMA PaulovichA PomeroySL GolubTRLanderES et al (2005) Gene set enrichment analysis aknowledge-based approach for interpreting genome-wide expressionprofiles Proc Natl Acad Sci USA 102 15545ndash15550

Page 2: Beyond protein expression, MOPED goes multi-omics€¦ · solely a protein expression database to a multi-omics resource for human and model organisms. Through a web-based interface,

D1146 Nucleic Acids Research 2015 Vol 43 Database issue

enables queries of diverse data types including genes pro-teins pathways metadata conditions localizations tissuesand organisms

NEW DATA

MOPED data come from individual researchers and publicrepositories (Table 1) (13ndash21) (httpproteomecentralproteomexchangeorg httpswwwproteomicsdborg)Consistent processing of hand-curated experiments pro-vides a foundation for integrated data analysis (1822ndash23)All proteomics data are processed and analyzed usingSPIRE protein expression data combined with normaltissue protein concentrations are used to uniquely estimateprotein concentrations (12224ndash25) Preprocessed tran-scriptomics data are downloaded from GEO and analyzedusing the R package LIMMA (826ndash27) The addition ofexperimental metadata provides details about experimentaldesign instrument processing and analysis (910)

The number of expression records contained in MOPEDhas increased to over 5 million sim1 million for protein ab-solute expression 100 000 for protein relative expressionand over 4 million for gene relative expression data (Table2) These expression records correspond to sim75 000 genesand 50 000 proteins from four organisms (human mouseworm yeast) MOPED includes data from sim25 000 hu-man genes and 18 000 human proteins These expressionrecords are from 670 unique combinations of experimentcondition localization and tissue Each MOPED experi-ment can include a number of these unique combinationsThe proteins and genes are linked to over 5000 pathwaysfrom three prominent pathway resources (BioCyc Reac-tome and PANTHER) (16ndash18)

MOPED is continually expanding by moni-toring and downloading new experimental datafrom public repositories PRIDE PeptideAtlasproteomeXchange ProteomicsDB and GEO (19ndash21) (httpproteomecentralproteomexchangeorghttpswwwproteomicsdborg) Recent additions toMOPED include two prominent Human Proteome studiesof protein absolute expression data on 30 tissues and celllines from the Pandey group (28) and on 21 tissues celllines and fluids from the Kuster group (Figure 1) (29)(raw data download from ProteomeXchange PXD000561and PXD000865) Other recent data additions includeabsolute expression from deep proteome profiling of 11 celllines (30) and proteomics analysis of the NCI-60 cell linepanel (31) To provide a valuable resource for investigatingcancer mechanisms transcriptomics data from the NCI-60cell line panel (32) are being incorporated along with theproteomics data

MOPED FEATURES

The enhanced MOPED interface enables querying of pro-teins genes and keywords to access expression data sum-mary information and advanced data visualization Theinterface not only allows exploration of expression datapoints but also provides pre-calculated pathway statisticssummary pages and visualizations for a comprehensive yetconcise view of the data The home search page features six

tabs Protein Absolute Expression Protein Relative Expres-sion Gene Relative Expression Pathways Experiments andVisualization

Each of the six tabs in MOPED is designed to en-able rapid querying MOPED uses Lucene for full-textindexing and AspectJ and Google Analytics for track-ing and optimization (Apache Foundation httpluceneapacheorg Eclipse Foundation httpeclipseorgaspectjGoogle httpwwwgooglecomanalytics) The queries arebuilt and executed via the Basic Search (Figure 2A and B)and Advanced Search (Figure 3) options The Highlightssection features preset examples

Search capabilities

Basic Search now supports wildcards and logic state-ments For example in the Gene Relative Expressiontab a basic search for lsquogene=BRCrsquo returns relativeexpression data for all genes from human and modelorganisms whose names start with BRC (Figure 2A)Similarly a basic search in the Gene Relative Expres-sion tab for lsquogene=BRC and organism=human and(condition=standard or condition=cancer)rsquo returns allhuman gene relative expression data for genesrsquo names thatstart with BRC and conditions that are either standard orhave lsquocancerrsquo in the name (Figure 2B)

As a multi-omics resource MOPED supports gene andprotein queries under each of the six tabs For examplea protein ID can be used to search under the Gene Rela-tive Expression tab to obtain all the gene relative expressionrecords for the gene that corresponds to the query proteinPathways that contain the query proteingene are conve-niently linked to the Results table This integration enablescomprehensive searching with a single ID whether gene orprotein

The Advanced Search panels available for all tabs havebeen expanded (Supplementary Figure S1) The drop-downmenus show all of the available search options for exampleorganism experiment condition localization tissue chro-mosome location expression ratio and data type In addi-tion MOPED allows switching between tabs while preserv-ing the query results

Results tables

Query results are displayed in tables that are divided intosections for ease of use For instance searching under GeneRelative Expression returns a table of gene relative expres-sion records Each record has an expression ratio and thecorresponding P-value for each gene in an experiment Thisexpression ratio reflects a comparison between either tis-sues or conditions as determined by the experimental de-sign The expression records are also linked to chromo-somes and pathways (14ndash18) All values can be sorted anddownloaded

Details pages

Detailed data on each biological entity are provided on theDetails pages For example a Gene Details page contains alink to the corresponding protein chromosome mappings

Nucleic Acids Research 2015 Vol 43 Database issue D1147

Table 1 External sources for MOPED data

Data type Expression data Names descriptions and mappings

Protein PRIDE PeptideAtlas proteomeXchangeProteomicsDB Individual Labs

UniProt

Gene GEO Ensembl EntrezPathway BioCyc PANTHER Reactome

Expression data come from public repositories and individual labs (13ndash21) (httpproteomecentralproteomexchangeorg httpswwwproteomicsdborg)

Table 2 Current data statistics for MOPED as of August 31 2014

Protein absoluteexpression

Protein relativeexpression

Gene relativeexpression

Number of expression records 960 574 92 824 4 224 869Unique combinations of experiment condition localizationand tissue

362 59 249

Number of experiments 73 18 183Number of conditions 94 48 272Number of tissues 137 17 112Number of human proteins or genes 18 475 7987 24 517Number of mouse proteins or genes 14 116 3557 26 904Number of worm proteins or genes 10 178 466 18 143Number of yeast proteins or genes 5402 0 5811Number of pathways 5417 3295 5517

Each experiment consists of unique combinations of conditions localizations and tissues Expression records are expression values for a protein or genecorresponding to a combination of experiment condition localization and tissue

Figure 1 Absolute expression chart for two prominent lsquoNaturersquo proteome studies which highlights the absolute expression of protein P07148 in standardconditions across different tissues (2829)

D1148 Nucleic Acids Research 2015 Vol 43 Database issue

Figure 2 (A) Basic Search example for lsquogene = BRCrsquo default view (B) Basic Search example for lsquogene=BRC and organism=human and(condition=standard or condition=cancer)rsquo extended view Search results are expanded to include P-value False Discovery Rate (FDR) localizationand chromosome

Figure 3 Pathway Absolute Expression bar chart of Scavenging of Heme from Plasma Chart can be compared and filtered by tissue or condition Mouse-overs provide more information about the expression data point

Nucleic Acids Research 2015 Vol 43 Database issue D1149

gene description and a link to GeneCards (33) Genes arealso linked to external sources including NCBI Gene GOetc and pathways (Reactome PANTHER and BioCyc)(Table 3) (13ndash1833ndash44) Gene and Protein Details pagesdisplay gene and protein expression data respectively Thepages are enhanced by the embedded expression bar charts

Details pages for Pathways display pathway absolute ex-pression data and their bar graph together with links toexternal resources and a list of associated genes and pro-teins The associated genes and proteins link directly totheir expression data The Experiment Details page includesthe standardized experimental metadata checklist and sum-mary statistics such as the number of over- or under-expressed proteins or genes (910)

Pathways

MOPED adopted a pathway-centric view to integrate geneand protein expression data The new Pathways tab enablesquerying for pathway names keywords and biomolecules inaddition to the standard organism experiment conditionlocalization and tissue options Protein absolute expressiondata from a given experiment are mapped onto establishedpathways from Reactome PANTHER and BioCyc (16ndash18)Using this mapping MOPED provides summary statisticsfor pathway expression to enable a view of proteins workingtogether in the same pathway These statistics include path-way coverage odds ratio and the corresponding P-value cal-culated by Fisherrsquos exact test (12)

Versatile query and search tools enable complex queriesto explore these experiment-level pathway datasets Underthe Pathways tab researchers can query for pathways byfull or partial names as well as by bimolecular identifiersThe pathway expression results can then be sorted or down-loaded For example the user can visualize the results forpathways expressed in cystic fibrosis-related experiments(Figure 3)

Experimental metadata

Under the Experiments tab researchers can search filterand review experimental metadata For example queryingfor lsquocystic fibrosisrsquo will return a list of experiments wherethe search term appears in the experiment description Theresults include summary statistics on genes proteins andpathways expressed in each experiment

Experimental metadata are captured using a multi-omicschecklist (9ndash1145) This checklist provides informationabout experimental design data collection and analysisThis added layer of experimental metadata can be used byresearchers to assess the quality consistency usability andreproducibility of data in MOPED

MOPED currently contains experimental meta-data checklists on 12 proteomics and all 183 tran-scriptomics experiments (910) For example the sny-der personal omics profiling experiment has experimentalmetadata describing experiment design data collectionand data analysis (Supplementary Figure S2) (45) Built-inlinks to GEO provide additional information about thetranscriptomics experiments (21) The MOPED team iscurrently working on completing metadata checklists forall experiments in MOPED

Visualizations

MOPED provides interactive visualizations including bargraphs of protein gene and pathway expression levelsacross experiments that can be viewed by either tissue orcondition (Figures 1 and 3) The expression matrix com-pares expression of neighboring genes and correspondingproteins along the chromosome that can be grouped by tis-sue or condition The chord diagram gives an overview ofMOPED data content for different organisms Mouse-overdescriptions enhance the plots by providing more detailedgraphical and experimental information

MOPEDrsquoS IMPACT

MOPED provides support for the scientific community asa resource of processed expression data Researchers haveused MOPED to help identify potential proteins to studybased on expression levels (46) to compare expression ofdifferent proteins across experiments conditions localiza-tions and tissues (4748) to investigate expression levels invarious tissue types cell lines conditions and diseases (49ndash51) and to help link uncharacterized proteins to diabetesand cancer (5253)

MOPED was originally created in response to a sur-vey indicating a need for a processed expression database(13) Through direct and indirect interactions the MOPEDteam is able to learn the needs of biological and biomedicalresearchers and respond with updates and improvementsWith this feedback in mind MOPED acknowledges thechanging needs of the life sciences community

Data submissions

To be a complete and accurate data resource MOPEDneeds to reflect the current understanding of gene proteinand pathway expression patterns through access to the mostrecent data and experiments MOPED provides a conve-nient venue for researchers to both make their data pub-lic and share these privately with collaborators and review-ers (13) Private MOPED provides the same features toolsand conveniences as the public MOPED but enables theowner of the data to both limit access and make the datapublic when appropriate If interested in submitting data toMOPED please contact the MOPED team through the Fo-rum httpmoped-forumproteinspireorg

Future directions

As the field of biology rapidly advances MOPED is ad-vancing with it MOPED took the following steps to com-plement developments in research transitioned from pro-teomics to multi-omics (8) added metadata to enable morereliable use and integration of multiple data sources (911)and used pathways to create a coherent view of multi-omics data and an experimental level assessment of ex-pression (12) The next steps for MOPED are to enhanceintegration of different MOPED data types This will bedone by providing integrated data views graphical dis-plays and pathway analyses that show and utilize multipleomics data types In addition future versions of MOPED

D1150 Nucleic Acids Research 2015 Vol 43 Database issue

Table 3 MOPED references the following external resources on the Protein and Gene Details pages (13ndash1633ndash43)

External source Protein Gene Reference

GeneCards X X (33)PDB X (4142)BioCyc X (16)UniProt X (13)NCBI Gene X X (15)RefSeq X (34)ensembl X X (14)EMBL X (35)

WormBase X X (36)MGI X (37)KEGG X (38)HGNC X X (39)SGD X (40)GenBank X (43)GO X X (44)

indicates resources that are cross-referenced with MOPED

will offer disease-centric data integration and include addi-tional model organisms The disease-centric view will allowthe comparison of biomolecular processes and their func-tions between normal and disease states Also MOPEDplans to add pathway relative expression statistics to iden-tify overunder-expression of pathways in relative expres-sion experiments for both proteomics and transcriptomicsdata (5455) These steps will continue the advancement ofMOPED as a resource for biological and biomedical sci-ence

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGMENT

We thank Maggie Lackey for her critical reading themanuscript

FUNDING

National Science Foundation under the Division of Bio-logical Infrastructure [0969929] The Robert B McMillenFoundation Seattle Childrenrsquos Research Institute [to EK]Funding for open access charge Seattle Childrenrsquos ResearchInstituteConflict of interest statement None declared

REFERENCES1 HigdonR StewartE StanberryL HaynesW ChoiniereJ

MontagueE AndersonN YandlG JankoI BroomallW et al(2014) MOPED enables discoveries through consistently processedproteomics data J Proteome Res 13 107ndash113

2 KolkerE HigdonR HaynesW WelchD BroomallWLancetD StanberryL and KolkerN (2012) MOPED ModelOrganism Protein Expression Database Nucleic Acids Res 40D1093ndashD1099

3 HigdonR HaynesW StanberryL StewartE YandlGHowardC BroomallW KolkerN and KolkerE (2013) Unravelingthe complexities of life sciences data Big Data 1 42ndash50

4 ZhangW LiF and NieL (2010) Integrating multiple lsquoomicsrsquoanalysis for microbial biology application and methodologiesMicrobiol Read Engl 156 287ndash301

5 SikaroodiM GalachiantzY and BaranovaA (2010) Tumormarkers the potential of lsquoomicsrsquo approach Curr Mol Med 10249ndash257

6 StanberryL MiasGI HaynesW HigdonR SnyderM andKolkerE (2013) Integrative analysis of longitudinal metabolomicsdata from a personal multi-omics profile Metabolites 3 741ndash760

7 ChenR MiasGI Li-Pook-ThanJ JiangL LamHYKChenR MiriamiE KarczewskiKJ HariharanM DeweyFEet al (2012) Personal omics profiling reveals dynamic molecular andmedical phenotypes Cell 148 1293ndash1307

8 MontagueE StanberryL HigdonR JankoI LeeEAndersonN ChoiniereJ StewartE YandlG BroomallW et al(2014) MOPED 25ndashndashan integrated multi-omics resourceMulti-Omics Profiling Expression Database now includestranscriptomics data Omics J Integr Biol 18 335ndash343

9 KolkerE OzdemirV MartensL HancockW AndersonGAndersonN AynaciogluS BaranovaA CampagnaSR ChenRet al (2014) Toward more transparent and reproducible omics studiesthrough a common metadata checklist and data publications OmicsJ Integr Biol 18 10ndash14

10 DumbillE and KolkerE (2013) Introducing a metadata checklistfor omics data Big Data 1 195ndash195

11 Announcement Reducing our irreproducibility (2013) Nature 496398ndash398

12 KhatriP SirotaM and ButteAJ (2012) Ten years of pathwayanalysis current approaches and outstanding challenges PLoSComput Biol 8 e1002375

13 The UniProt Consortium (2014) Activities at the Universal ProteinResource (UniProt) Nucleic Acids Res 42 D191ndashD198

14 FlicekP AmodeMR BarrellD BealK BillisK BrentSCarvalho-SilvaD ClaphamP CoatesG FitzgeraldS et al (2014)Ensembl 2014 Nucleic Acids Res 42 D749ndashD755

15 MaglottD OstellJ PruittKD and TatusovaT (2011) EntrezGene gene-centered information at NCBI Nucleic Acids Res 39D52ndashD57

16 CaspiR AltmanT BillingtonR DreherK FoersterHFulcherCA HollandTA KeselerIM KothariA KuboA et al(2014) The MetaCyc database of metabolic pathways and enzymesand the BioCyc collection of PathwayGenome Databases NucleicAcids Res 42 D459ndashD471

17 MiH MuruganujanA and ThomasPD (2013) PANTHER in2013 modeling the evolution of gene function and other geneattributes in the context of phylogenetic trees Nucleic Acids Res 41D377ndashD386

18 CroftD MundoAF HawR MilacicM WeiserJ WuGCaudyM GarapatiP GillespieM KamdarMR et al (2014) The

Nucleic Acids Research 2015 Vol 43 Database issue D1151

Reactome pathway knowledgebase Nucleic Acids Res 42D472ndashD477

19 VizcaınoJA CoteRG CsordasA DianesJA FabregatAFosterJM GrissJ AlpiE BirimM ContellJ et al (2013) ThePRoteomics IDEntifications (PRIDE) database and associated toolsstatus in 2013 Nucleic Acids Res 41 D1063ndashD1069

20 DesiereF (2006) The PeptideAtlas project Nucleic Acids Res 34D655ndashD658

21 BarrettT WilhiteSE LedouxP EvangelistaC KimIFTomashevskyM MarshallKA PhillippyKH ShermanPMHolkoM et al (2013) NCBI GEO archive for functional genomicsdata setsndashndashupdate Nucleic Acids Res 41 D991ndashD995

22 KolkerE HigdonR WelchD BaumanA StewartE HaynesWBroomallW and KolkerN (2011) SPIRE systematic proteininvestigative research environment J Proteomics 75 122ndash126

23 HigdonRoger van BelleGerald and KolkerEugene (2008) A noteon the false discovery rate and inconsistent comparisons betweenexperiments Bioinformatics (Oxford England) 24 1225ndash1228[PubMed]

24 HatherG HigdonR BaumanA von HallerPD and KolkerE(2010) Estimating false discovery rates for peptide and proteinidentification using randomized databases Proteomics 102369ndash2376

25 HigdonR ReiterL HatherG HaynesW KolkerN StewartEBaumanAT PicottiP SchmidtA van BelleG et al (2011) IPMan integrated protein model for false discovery rate estimation andidentification in high-throughput proteomics J Proteomics 75116ndash121

26 HolzmanT and KolkerE (2004) Statistical analysis of global geneexpression data some practical considerations Curr OpinBiotechnol 15 52ndash57

27 SmythGK (2005) Limma linear models for microarray data InGentlemanR CareyV DudoitS IrizarryR and HuberW (eds)Bioinformatics and Computational Biology Solutions using R andBioconductor Springer NY pp 397ndash420

28 KimM-S PintoSM GetnetD NirujogiRS MandaSSChaerkadyR MadugunduAK KelkarDS IsserlinR JainSet al (2014) A draft map of the human proteome Nature 509575ndash581

29 WilhelmM SchleglJ HahneH Moghaddas GholamiALieberenzM SavitskiMM ZieglerE ButzmannL GessulatSMarxH et al (2014) Mass-spectrometry-based draft of the humanproteome Nature 509 582ndash587

30 GeigerT MaddenSF GallagherWM CoxJ and MannM(2012) Proteomic portrait of human breast cancer progressionidentifies novel prognostic markers Cancer Res 72 2428ndash2439

31 Moghaddas GholamiA HahneH WuZ AuerFJ MengCWilhelmM and KusterB (2013) Global proteome analysis of theNCI-60 cell line panel Cell Rep 4 609ndash620

32 PfisterTD ReinholdWC AgamaK GuptaS KhinSAKindersRJ ParchmentRE TomaszewskiJE DoroshowJH andPommierY (2009) Topoisomerase I levels in the NCI-60 cancer cellline panel determined by validated ELISA and microarray analysisand correlation with indenoisoquinoline sensitivity Mol CancerTher 8 1878ndash1884

33 StelzerG DalahI SteinTI SatanowerY RosenN NativNOz-LeviD OlenderT BelinkyF BahirI et al (2011) In-silicohuman genomics with GeneCards Hum Genomics 5 709ndash717

34 PruittKD BrownGR HiattSM Thibaud-NissenFAstashynA ErmolaevaO FarrellCM HartJ LandrumMJMcGarveyKM et al (2014) RefSeq an update on mammalianreference sequences Nucleic Acids Res 42 D756ndashD763

35 PaksereshtN AlakoB AmidC Cerdeno-TarragaA ClelandIGibsonR GoodgameN GurT JangM KayS et al (2014)Assembly information services in the European Nucleotide ArchiveNucleic Acids Res 42 D38ndashD43

36 HarrisTW BaranJ BieriT CabunocA ChanJ ChenWJDavisP DoneJ GroveC HoweK et al (2014) WormBase 2014new views of curated biology Nucleic Acids Res 42 D789ndashD793

37 BlakeJA BultCJ EppigJT KadinJA RichardsonJE andMouse Genome Database Group (2014) The Mouse GenomeDatabase integration of and access to knowledge about thelaboratory mouse Nucleic Acids Res 42 D810ndashD817

38 KanehisaM GotoS SatoY KawashimaM FurumichiM andTanabeM (2014) Data information knowledge and principle backto metabolism in KEGG Nucleic Acids Res 42 D199ndashD205

39 GrayKA DaughertyLC GordonSM SealRL WrightMWand BrufordEA (2013) Genenamesorg the HGNC resources in2013 Nucleic Acids Res 41 D545ndashD552

40 CostanzoMC EngelSR WongED LloydP KarraKChanET WengS PaskovKM RoeGR BinkleyG et al (2014)Saccharomyces genome database provides new regulation dataNucleic Acids Res 42 D717ndashD725

41 RosePW BiC BluhmWF ChristieCH DimitropoulosDDuttaS GreenRK GoodsellDS PrlicA QuesadaM et al(2013) The RCSB Protein Data Bank new resources for research andeducation Nucleic Acids Res 41 D475ndashD482

42 GutmanasA AlhroubY BattleGM BerrisfordJM BochetEConroyMJ DanaJM Fernandez MonteceloMA vanGinkelG GoreSP et al (2014) PDBe Protein Data Bank inEurope Nucleic Acids Res 42 D285ndashD291

43 BensonDA CavanaughM ClarkK Karsch-MizrachiILipmanDJ OstellJ and SayersEW (2013) GenBank NucleicAcids Res 41 D36ndashD42

44 AshburnerM BallCA BlakeJA BotsteinD ButlerHCherryJM DavisAP DolinskiK DwightSS EppigJT et al(2000) Gene ontology tool for the unification of biology The GeneOntology Consortium Nat Genet 25 25ndash29

45 SnyderM MiasG StanberryL and KolkerE (2014) Metadatachecklist for the integrated personal OMICS study proteomics andmetabolomics experiments Omics J Integr Biol 18 81ndash85

46 XueJ WijeratneSSK and ZempleniJ (2013) Holocarboxylasesynthetase synergizes with methyl CpG binding protein 2 and DNAmethyltransferase 1 in the transcriptional repression of long-terminalrepeats Epigenetics 8 504ndash511

47 StanevaR RukovaB HadjidekovaS NeshevaD AntonovaODimitrovP SimeonovV StamenovG CukuranovicRCukuranovicJ et al (2013) Whole genome methylation arrayanalysis reveals new aspects in Balkan endemic nephropathy etiologyBMC Nephrol 14 225

48 WilhelmT (2014) Phenotype prediction based on genome-wide DNAmethylation data BMC Bioinformatics 15 193

49 BlackerTS MannZF GaleJE ZieglerM BainAJSzabadkaiG and DuchenMR (2014) Separating NADH andNADPH fluorescence in live cells and tissues using FLIM NatCommun 5 3936

50 WilliamsBC FilterJJ Blake-HodekKA WadzinskiBEFudaNJ ShallowayD and GoldbergML (2014)Greatwall-phosphorylated Endosulfine is both an inhibitor and asubstrate of PP2A-B55 heterotrimers eLife 3 e01695

51 ChenM and PenningTM (2014) 5-reduced steroids and human(4)-3-ketosteroid 5-reductase (AKR1D1) Steroids 83 17ndash26

52 DelgadoAP BrandaoP and NarayananR (2014) Diabetesassociated genes from the dark matter of the human proteome MOJProteomics Bioinform 1 00020

53 DelgadoAP BrandaoP ChapadoMJ HamidS andNarayananR (2014) Open reading frames associated with cancer inthe dark matter of the human genome Cancer Genomics Proteomics11 201ndash213

54 HaynesWA HigdonR StanberryL CollinsD and KolkerE(2013) Differential expression analysis for pathways PLoS ComputBiol 9 e1002967

55 SubramanianA TamayoP MoothaVK MukherjeeSEbertBL GilletteMA PaulovichA PomeroySL GolubTRLanderES et al (2005) Gene set enrichment analysis aknowledge-based approach for interpreting genome-wide expressionprofiles Proc Natl Acad Sci USA 102 15545ndash15550

Page 3: Beyond protein expression, MOPED goes multi-omics€¦ · solely a protein expression database to a multi-omics resource for human and model organisms. Through a web-based interface,

Nucleic Acids Research 2015 Vol 43 Database issue D1147

Table 1 External sources for MOPED data

Data type Expression data Names descriptions and mappings

Protein PRIDE PeptideAtlas proteomeXchangeProteomicsDB Individual Labs

UniProt

Gene GEO Ensembl EntrezPathway BioCyc PANTHER Reactome

Expression data come from public repositories and individual labs (13ndash21) (httpproteomecentralproteomexchangeorg httpswwwproteomicsdborg)

Table 2 Current data statistics for MOPED as of August 31 2014

Protein absoluteexpression

Protein relativeexpression

Gene relativeexpression

Number of expression records 960 574 92 824 4 224 869Unique combinations of experiment condition localizationand tissue

362 59 249

Number of experiments 73 18 183Number of conditions 94 48 272Number of tissues 137 17 112Number of human proteins or genes 18 475 7987 24 517Number of mouse proteins or genes 14 116 3557 26 904Number of worm proteins or genes 10 178 466 18 143Number of yeast proteins or genes 5402 0 5811Number of pathways 5417 3295 5517

Each experiment consists of unique combinations of conditions localizations and tissues Expression records are expression values for a protein or genecorresponding to a combination of experiment condition localization and tissue

Figure 1 Absolute expression chart for two prominent lsquoNaturersquo proteome studies which highlights the absolute expression of protein P07148 in standardconditions across different tissues (2829)

D1148 Nucleic Acids Research 2015 Vol 43 Database issue

Figure 2 (A) Basic Search example for lsquogene = BRCrsquo default view (B) Basic Search example for lsquogene=BRC and organism=human and(condition=standard or condition=cancer)rsquo extended view Search results are expanded to include P-value False Discovery Rate (FDR) localizationand chromosome

Figure 3 Pathway Absolute Expression bar chart of Scavenging of Heme from Plasma Chart can be compared and filtered by tissue or condition Mouse-overs provide more information about the expression data point

Nucleic Acids Research 2015 Vol 43 Database issue D1149

gene description and a link to GeneCards (33) Genes arealso linked to external sources including NCBI Gene GOetc and pathways (Reactome PANTHER and BioCyc)(Table 3) (13ndash1833ndash44) Gene and Protein Details pagesdisplay gene and protein expression data respectively Thepages are enhanced by the embedded expression bar charts

Details pages for Pathways display pathway absolute ex-pression data and their bar graph together with links toexternal resources and a list of associated genes and pro-teins The associated genes and proteins link directly totheir expression data The Experiment Details page includesthe standardized experimental metadata checklist and sum-mary statistics such as the number of over- or under-expressed proteins or genes (910)

Pathways

MOPED adopted a pathway-centric view to integrate geneand protein expression data The new Pathways tab enablesquerying for pathway names keywords and biomolecules inaddition to the standard organism experiment conditionlocalization and tissue options Protein absolute expressiondata from a given experiment are mapped onto establishedpathways from Reactome PANTHER and BioCyc (16ndash18)Using this mapping MOPED provides summary statisticsfor pathway expression to enable a view of proteins workingtogether in the same pathway These statistics include path-way coverage odds ratio and the corresponding P-value cal-culated by Fisherrsquos exact test (12)

Versatile query and search tools enable complex queriesto explore these experiment-level pathway datasets Underthe Pathways tab researchers can query for pathways byfull or partial names as well as by bimolecular identifiersThe pathway expression results can then be sorted or down-loaded For example the user can visualize the results forpathways expressed in cystic fibrosis-related experiments(Figure 3)

Experimental metadata

Under the Experiments tab researchers can search filterand review experimental metadata For example queryingfor lsquocystic fibrosisrsquo will return a list of experiments wherethe search term appears in the experiment description Theresults include summary statistics on genes proteins andpathways expressed in each experiment

Experimental metadata are captured using a multi-omicschecklist (9ndash1145) This checklist provides informationabout experimental design data collection and analysisThis added layer of experimental metadata can be used byresearchers to assess the quality consistency usability andreproducibility of data in MOPED

MOPED currently contains experimental meta-data checklists on 12 proteomics and all 183 tran-scriptomics experiments (910) For example the sny-der personal omics profiling experiment has experimentalmetadata describing experiment design data collectionand data analysis (Supplementary Figure S2) (45) Built-inlinks to GEO provide additional information about thetranscriptomics experiments (21) The MOPED team iscurrently working on completing metadata checklists forall experiments in MOPED

Visualizations

MOPED provides interactive visualizations including bargraphs of protein gene and pathway expression levelsacross experiments that can be viewed by either tissue orcondition (Figures 1 and 3) The expression matrix com-pares expression of neighboring genes and correspondingproteins along the chromosome that can be grouped by tis-sue or condition The chord diagram gives an overview ofMOPED data content for different organisms Mouse-overdescriptions enhance the plots by providing more detailedgraphical and experimental information

MOPEDrsquoS IMPACT

MOPED provides support for the scientific community asa resource of processed expression data Researchers haveused MOPED to help identify potential proteins to studybased on expression levels (46) to compare expression ofdifferent proteins across experiments conditions localiza-tions and tissues (4748) to investigate expression levels invarious tissue types cell lines conditions and diseases (49ndash51) and to help link uncharacterized proteins to diabetesand cancer (5253)

MOPED was originally created in response to a sur-vey indicating a need for a processed expression database(13) Through direct and indirect interactions the MOPEDteam is able to learn the needs of biological and biomedicalresearchers and respond with updates and improvementsWith this feedback in mind MOPED acknowledges thechanging needs of the life sciences community

Data submissions

To be a complete and accurate data resource MOPEDneeds to reflect the current understanding of gene proteinand pathway expression patterns through access to the mostrecent data and experiments MOPED provides a conve-nient venue for researchers to both make their data pub-lic and share these privately with collaborators and review-ers (13) Private MOPED provides the same features toolsand conveniences as the public MOPED but enables theowner of the data to both limit access and make the datapublic when appropriate If interested in submitting data toMOPED please contact the MOPED team through the Fo-rum httpmoped-forumproteinspireorg

Future directions

As the field of biology rapidly advances MOPED is ad-vancing with it MOPED took the following steps to com-plement developments in research transitioned from pro-teomics to multi-omics (8) added metadata to enable morereliable use and integration of multiple data sources (911)and used pathways to create a coherent view of multi-omics data and an experimental level assessment of ex-pression (12) The next steps for MOPED are to enhanceintegration of different MOPED data types This will bedone by providing integrated data views graphical dis-plays and pathway analyses that show and utilize multipleomics data types In addition future versions of MOPED

D1150 Nucleic Acids Research 2015 Vol 43 Database issue

Table 3 MOPED references the following external resources on the Protein and Gene Details pages (13ndash1633ndash43)

External source Protein Gene Reference

GeneCards X X (33)PDB X (4142)BioCyc X (16)UniProt X (13)NCBI Gene X X (15)RefSeq X (34)ensembl X X (14)EMBL X (35)

WormBase X X (36)MGI X (37)KEGG X (38)HGNC X X (39)SGD X (40)GenBank X (43)GO X X (44)

indicates resources that are cross-referenced with MOPED

will offer disease-centric data integration and include addi-tional model organisms The disease-centric view will allowthe comparison of biomolecular processes and their func-tions between normal and disease states Also MOPEDplans to add pathway relative expression statistics to iden-tify overunder-expression of pathways in relative expres-sion experiments for both proteomics and transcriptomicsdata (5455) These steps will continue the advancement ofMOPED as a resource for biological and biomedical sci-ence

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGMENT

We thank Maggie Lackey for her critical reading themanuscript

FUNDING

National Science Foundation under the Division of Bio-logical Infrastructure [0969929] The Robert B McMillenFoundation Seattle Childrenrsquos Research Institute [to EK]Funding for open access charge Seattle Childrenrsquos ResearchInstituteConflict of interest statement None declared

REFERENCES1 HigdonR StewartE StanberryL HaynesW ChoiniereJ

MontagueE AndersonN YandlG JankoI BroomallW et al(2014) MOPED enables discoveries through consistently processedproteomics data J Proteome Res 13 107ndash113

2 KolkerE HigdonR HaynesW WelchD BroomallWLancetD StanberryL and KolkerN (2012) MOPED ModelOrganism Protein Expression Database Nucleic Acids Res 40D1093ndashD1099

3 HigdonR HaynesW StanberryL StewartE YandlGHowardC BroomallW KolkerN and KolkerE (2013) Unravelingthe complexities of life sciences data Big Data 1 42ndash50

4 ZhangW LiF and NieL (2010) Integrating multiple lsquoomicsrsquoanalysis for microbial biology application and methodologiesMicrobiol Read Engl 156 287ndash301

5 SikaroodiM GalachiantzY and BaranovaA (2010) Tumormarkers the potential of lsquoomicsrsquo approach Curr Mol Med 10249ndash257

6 StanberryL MiasGI HaynesW HigdonR SnyderM andKolkerE (2013) Integrative analysis of longitudinal metabolomicsdata from a personal multi-omics profile Metabolites 3 741ndash760

7 ChenR MiasGI Li-Pook-ThanJ JiangL LamHYKChenR MiriamiE KarczewskiKJ HariharanM DeweyFEet al (2012) Personal omics profiling reveals dynamic molecular andmedical phenotypes Cell 148 1293ndash1307

8 MontagueE StanberryL HigdonR JankoI LeeEAndersonN ChoiniereJ StewartE YandlG BroomallW et al(2014) MOPED 25ndashndashan integrated multi-omics resourceMulti-Omics Profiling Expression Database now includestranscriptomics data Omics J Integr Biol 18 335ndash343

9 KolkerE OzdemirV MartensL HancockW AndersonGAndersonN AynaciogluS BaranovaA CampagnaSR ChenRet al (2014) Toward more transparent and reproducible omics studiesthrough a common metadata checklist and data publications OmicsJ Integr Biol 18 10ndash14

10 DumbillE and KolkerE (2013) Introducing a metadata checklistfor omics data Big Data 1 195ndash195

11 Announcement Reducing our irreproducibility (2013) Nature 496398ndash398

12 KhatriP SirotaM and ButteAJ (2012) Ten years of pathwayanalysis current approaches and outstanding challenges PLoSComput Biol 8 e1002375

13 The UniProt Consortium (2014) Activities at the Universal ProteinResource (UniProt) Nucleic Acids Res 42 D191ndashD198

14 FlicekP AmodeMR BarrellD BealK BillisK BrentSCarvalho-SilvaD ClaphamP CoatesG FitzgeraldS et al (2014)Ensembl 2014 Nucleic Acids Res 42 D749ndashD755

15 MaglottD OstellJ PruittKD and TatusovaT (2011) EntrezGene gene-centered information at NCBI Nucleic Acids Res 39D52ndashD57

16 CaspiR AltmanT BillingtonR DreherK FoersterHFulcherCA HollandTA KeselerIM KothariA KuboA et al(2014) The MetaCyc database of metabolic pathways and enzymesand the BioCyc collection of PathwayGenome Databases NucleicAcids Res 42 D459ndashD471

17 MiH MuruganujanA and ThomasPD (2013) PANTHER in2013 modeling the evolution of gene function and other geneattributes in the context of phylogenetic trees Nucleic Acids Res 41D377ndashD386

18 CroftD MundoAF HawR MilacicM WeiserJ WuGCaudyM GarapatiP GillespieM KamdarMR et al (2014) The

Nucleic Acids Research 2015 Vol 43 Database issue D1151

Reactome pathway knowledgebase Nucleic Acids Res 42D472ndashD477

19 VizcaınoJA CoteRG CsordasA DianesJA FabregatAFosterJM GrissJ AlpiE BirimM ContellJ et al (2013) ThePRoteomics IDEntifications (PRIDE) database and associated toolsstatus in 2013 Nucleic Acids Res 41 D1063ndashD1069

20 DesiereF (2006) The PeptideAtlas project Nucleic Acids Res 34D655ndashD658

21 BarrettT WilhiteSE LedouxP EvangelistaC KimIFTomashevskyM MarshallKA PhillippyKH ShermanPMHolkoM et al (2013) NCBI GEO archive for functional genomicsdata setsndashndashupdate Nucleic Acids Res 41 D991ndashD995

22 KolkerE HigdonR WelchD BaumanA StewartE HaynesWBroomallW and KolkerN (2011) SPIRE systematic proteininvestigative research environment J Proteomics 75 122ndash126

23 HigdonRoger van BelleGerald and KolkerEugene (2008) A noteon the false discovery rate and inconsistent comparisons betweenexperiments Bioinformatics (Oxford England) 24 1225ndash1228[PubMed]

24 HatherG HigdonR BaumanA von HallerPD and KolkerE(2010) Estimating false discovery rates for peptide and proteinidentification using randomized databases Proteomics 102369ndash2376

25 HigdonR ReiterL HatherG HaynesW KolkerN StewartEBaumanAT PicottiP SchmidtA van BelleG et al (2011) IPMan integrated protein model for false discovery rate estimation andidentification in high-throughput proteomics J Proteomics 75116ndash121

26 HolzmanT and KolkerE (2004) Statistical analysis of global geneexpression data some practical considerations Curr OpinBiotechnol 15 52ndash57

27 SmythGK (2005) Limma linear models for microarray data InGentlemanR CareyV DudoitS IrizarryR and HuberW (eds)Bioinformatics and Computational Biology Solutions using R andBioconductor Springer NY pp 397ndash420

28 KimM-S PintoSM GetnetD NirujogiRS MandaSSChaerkadyR MadugunduAK KelkarDS IsserlinR JainSet al (2014) A draft map of the human proteome Nature 509575ndash581

29 WilhelmM SchleglJ HahneH Moghaddas GholamiALieberenzM SavitskiMM ZieglerE ButzmannL GessulatSMarxH et al (2014) Mass-spectrometry-based draft of the humanproteome Nature 509 582ndash587

30 GeigerT MaddenSF GallagherWM CoxJ and MannM(2012) Proteomic portrait of human breast cancer progressionidentifies novel prognostic markers Cancer Res 72 2428ndash2439

31 Moghaddas GholamiA HahneH WuZ AuerFJ MengCWilhelmM and KusterB (2013) Global proteome analysis of theNCI-60 cell line panel Cell Rep 4 609ndash620

32 PfisterTD ReinholdWC AgamaK GuptaS KhinSAKindersRJ ParchmentRE TomaszewskiJE DoroshowJH andPommierY (2009) Topoisomerase I levels in the NCI-60 cancer cellline panel determined by validated ELISA and microarray analysisand correlation with indenoisoquinoline sensitivity Mol CancerTher 8 1878ndash1884

33 StelzerG DalahI SteinTI SatanowerY RosenN NativNOz-LeviD OlenderT BelinkyF BahirI et al (2011) In-silicohuman genomics with GeneCards Hum Genomics 5 709ndash717

34 PruittKD BrownGR HiattSM Thibaud-NissenFAstashynA ErmolaevaO FarrellCM HartJ LandrumMJMcGarveyKM et al (2014) RefSeq an update on mammalianreference sequences Nucleic Acids Res 42 D756ndashD763

35 PaksereshtN AlakoB AmidC Cerdeno-TarragaA ClelandIGibsonR GoodgameN GurT JangM KayS et al (2014)Assembly information services in the European Nucleotide ArchiveNucleic Acids Res 42 D38ndashD43

36 HarrisTW BaranJ BieriT CabunocA ChanJ ChenWJDavisP DoneJ GroveC HoweK et al (2014) WormBase 2014new views of curated biology Nucleic Acids Res 42 D789ndashD793

37 BlakeJA BultCJ EppigJT KadinJA RichardsonJE andMouse Genome Database Group (2014) The Mouse GenomeDatabase integration of and access to knowledge about thelaboratory mouse Nucleic Acids Res 42 D810ndashD817

38 KanehisaM GotoS SatoY KawashimaM FurumichiM andTanabeM (2014) Data information knowledge and principle backto metabolism in KEGG Nucleic Acids Res 42 D199ndashD205

39 GrayKA DaughertyLC GordonSM SealRL WrightMWand BrufordEA (2013) Genenamesorg the HGNC resources in2013 Nucleic Acids Res 41 D545ndashD552

40 CostanzoMC EngelSR WongED LloydP KarraKChanET WengS PaskovKM RoeGR BinkleyG et al (2014)Saccharomyces genome database provides new regulation dataNucleic Acids Res 42 D717ndashD725

41 RosePW BiC BluhmWF ChristieCH DimitropoulosDDuttaS GreenRK GoodsellDS PrlicA QuesadaM et al(2013) The RCSB Protein Data Bank new resources for research andeducation Nucleic Acids Res 41 D475ndashD482

42 GutmanasA AlhroubY BattleGM BerrisfordJM BochetEConroyMJ DanaJM Fernandez MonteceloMA vanGinkelG GoreSP et al (2014) PDBe Protein Data Bank inEurope Nucleic Acids Res 42 D285ndashD291

43 BensonDA CavanaughM ClarkK Karsch-MizrachiILipmanDJ OstellJ and SayersEW (2013) GenBank NucleicAcids Res 41 D36ndashD42

44 AshburnerM BallCA BlakeJA BotsteinD ButlerHCherryJM DavisAP DolinskiK DwightSS EppigJT et al(2000) Gene ontology tool for the unification of biology The GeneOntology Consortium Nat Genet 25 25ndash29

45 SnyderM MiasG StanberryL and KolkerE (2014) Metadatachecklist for the integrated personal OMICS study proteomics andmetabolomics experiments Omics J Integr Biol 18 81ndash85

46 XueJ WijeratneSSK and ZempleniJ (2013) Holocarboxylasesynthetase synergizes with methyl CpG binding protein 2 and DNAmethyltransferase 1 in the transcriptional repression of long-terminalrepeats Epigenetics 8 504ndash511

47 StanevaR RukovaB HadjidekovaS NeshevaD AntonovaODimitrovP SimeonovV StamenovG CukuranovicRCukuranovicJ et al (2013) Whole genome methylation arrayanalysis reveals new aspects in Balkan endemic nephropathy etiologyBMC Nephrol 14 225

48 WilhelmT (2014) Phenotype prediction based on genome-wide DNAmethylation data BMC Bioinformatics 15 193

49 BlackerTS MannZF GaleJE ZieglerM BainAJSzabadkaiG and DuchenMR (2014) Separating NADH andNADPH fluorescence in live cells and tissues using FLIM NatCommun 5 3936

50 WilliamsBC FilterJJ Blake-HodekKA WadzinskiBEFudaNJ ShallowayD and GoldbergML (2014)Greatwall-phosphorylated Endosulfine is both an inhibitor and asubstrate of PP2A-B55 heterotrimers eLife 3 e01695

51 ChenM and PenningTM (2014) 5-reduced steroids and human(4)-3-ketosteroid 5-reductase (AKR1D1) Steroids 83 17ndash26

52 DelgadoAP BrandaoP and NarayananR (2014) Diabetesassociated genes from the dark matter of the human proteome MOJProteomics Bioinform 1 00020

53 DelgadoAP BrandaoP ChapadoMJ HamidS andNarayananR (2014) Open reading frames associated with cancer inthe dark matter of the human genome Cancer Genomics Proteomics11 201ndash213

54 HaynesWA HigdonR StanberryL CollinsD and KolkerE(2013) Differential expression analysis for pathways PLoS ComputBiol 9 e1002967

55 SubramanianA TamayoP MoothaVK MukherjeeSEbertBL GilletteMA PaulovichA PomeroySL GolubTRLanderES et al (2005) Gene set enrichment analysis aknowledge-based approach for interpreting genome-wide expressionprofiles Proc Natl Acad Sci USA 102 15545ndash15550

Page 4: Beyond protein expression, MOPED goes multi-omics€¦ · solely a protein expression database to a multi-omics resource for human and model organisms. Through a web-based interface,

D1148 Nucleic Acids Research 2015 Vol 43 Database issue

Figure 2 (A) Basic Search example for lsquogene = BRCrsquo default view (B) Basic Search example for lsquogene=BRC and organism=human and(condition=standard or condition=cancer)rsquo extended view Search results are expanded to include P-value False Discovery Rate (FDR) localizationand chromosome

Figure 3 Pathway Absolute Expression bar chart of Scavenging of Heme from Plasma Chart can be compared and filtered by tissue or condition Mouse-overs provide more information about the expression data point

Nucleic Acids Research 2015 Vol 43 Database issue D1149

gene description and a link to GeneCards (33) Genes arealso linked to external sources including NCBI Gene GOetc and pathways (Reactome PANTHER and BioCyc)(Table 3) (13ndash1833ndash44) Gene and Protein Details pagesdisplay gene and protein expression data respectively Thepages are enhanced by the embedded expression bar charts

Details pages for Pathways display pathway absolute ex-pression data and their bar graph together with links toexternal resources and a list of associated genes and pro-teins The associated genes and proteins link directly totheir expression data The Experiment Details page includesthe standardized experimental metadata checklist and sum-mary statistics such as the number of over- or under-expressed proteins or genes (910)

Pathways

MOPED adopted a pathway-centric view to integrate geneand protein expression data The new Pathways tab enablesquerying for pathway names keywords and biomolecules inaddition to the standard organism experiment conditionlocalization and tissue options Protein absolute expressiondata from a given experiment are mapped onto establishedpathways from Reactome PANTHER and BioCyc (16ndash18)Using this mapping MOPED provides summary statisticsfor pathway expression to enable a view of proteins workingtogether in the same pathway These statistics include path-way coverage odds ratio and the corresponding P-value cal-culated by Fisherrsquos exact test (12)

Versatile query and search tools enable complex queriesto explore these experiment-level pathway datasets Underthe Pathways tab researchers can query for pathways byfull or partial names as well as by bimolecular identifiersThe pathway expression results can then be sorted or down-loaded For example the user can visualize the results forpathways expressed in cystic fibrosis-related experiments(Figure 3)

Experimental metadata

Under the Experiments tab researchers can search filterand review experimental metadata For example queryingfor lsquocystic fibrosisrsquo will return a list of experiments wherethe search term appears in the experiment description Theresults include summary statistics on genes proteins andpathways expressed in each experiment

Experimental metadata are captured using a multi-omicschecklist (9ndash1145) This checklist provides informationabout experimental design data collection and analysisThis added layer of experimental metadata can be used byresearchers to assess the quality consistency usability andreproducibility of data in MOPED

MOPED currently contains experimental meta-data checklists on 12 proteomics and all 183 tran-scriptomics experiments (910) For example the sny-der personal omics profiling experiment has experimentalmetadata describing experiment design data collectionand data analysis (Supplementary Figure S2) (45) Built-inlinks to GEO provide additional information about thetranscriptomics experiments (21) The MOPED team iscurrently working on completing metadata checklists forall experiments in MOPED

Visualizations

MOPED provides interactive visualizations including bargraphs of protein gene and pathway expression levelsacross experiments that can be viewed by either tissue orcondition (Figures 1 and 3) The expression matrix com-pares expression of neighboring genes and correspondingproteins along the chromosome that can be grouped by tis-sue or condition The chord diagram gives an overview ofMOPED data content for different organisms Mouse-overdescriptions enhance the plots by providing more detailedgraphical and experimental information

MOPEDrsquoS IMPACT

MOPED provides support for the scientific community asa resource of processed expression data Researchers haveused MOPED to help identify potential proteins to studybased on expression levels (46) to compare expression ofdifferent proteins across experiments conditions localiza-tions and tissues (4748) to investigate expression levels invarious tissue types cell lines conditions and diseases (49ndash51) and to help link uncharacterized proteins to diabetesand cancer (5253)

MOPED was originally created in response to a sur-vey indicating a need for a processed expression database(13) Through direct and indirect interactions the MOPEDteam is able to learn the needs of biological and biomedicalresearchers and respond with updates and improvementsWith this feedback in mind MOPED acknowledges thechanging needs of the life sciences community

Data submissions

To be a complete and accurate data resource MOPEDneeds to reflect the current understanding of gene proteinand pathway expression patterns through access to the mostrecent data and experiments MOPED provides a conve-nient venue for researchers to both make their data pub-lic and share these privately with collaborators and review-ers (13) Private MOPED provides the same features toolsand conveniences as the public MOPED but enables theowner of the data to both limit access and make the datapublic when appropriate If interested in submitting data toMOPED please contact the MOPED team through the Fo-rum httpmoped-forumproteinspireorg

Future directions

As the field of biology rapidly advances MOPED is ad-vancing with it MOPED took the following steps to com-plement developments in research transitioned from pro-teomics to multi-omics (8) added metadata to enable morereliable use and integration of multiple data sources (911)and used pathways to create a coherent view of multi-omics data and an experimental level assessment of ex-pression (12) The next steps for MOPED are to enhanceintegration of different MOPED data types This will bedone by providing integrated data views graphical dis-plays and pathway analyses that show and utilize multipleomics data types In addition future versions of MOPED

D1150 Nucleic Acids Research 2015 Vol 43 Database issue

Table 3 MOPED references the following external resources on the Protein and Gene Details pages (13ndash1633ndash43)

External source Protein Gene Reference

GeneCards X X (33)PDB X (4142)BioCyc X (16)UniProt X (13)NCBI Gene X X (15)RefSeq X (34)ensembl X X (14)EMBL X (35)

WormBase X X (36)MGI X (37)KEGG X (38)HGNC X X (39)SGD X (40)GenBank X (43)GO X X (44)

indicates resources that are cross-referenced with MOPED

will offer disease-centric data integration and include addi-tional model organisms The disease-centric view will allowthe comparison of biomolecular processes and their func-tions between normal and disease states Also MOPEDplans to add pathway relative expression statistics to iden-tify overunder-expression of pathways in relative expres-sion experiments for both proteomics and transcriptomicsdata (5455) These steps will continue the advancement ofMOPED as a resource for biological and biomedical sci-ence

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGMENT

We thank Maggie Lackey for her critical reading themanuscript

FUNDING

National Science Foundation under the Division of Bio-logical Infrastructure [0969929] The Robert B McMillenFoundation Seattle Childrenrsquos Research Institute [to EK]Funding for open access charge Seattle Childrenrsquos ResearchInstituteConflict of interest statement None declared

REFERENCES1 HigdonR StewartE StanberryL HaynesW ChoiniereJ

MontagueE AndersonN YandlG JankoI BroomallW et al(2014) MOPED enables discoveries through consistently processedproteomics data J Proteome Res 13 107ndash113

2 KolkerE HigdonR HaynesW WelchD BroomallWLancetD StanberryL and KolkerN (2012) MOPED ModelOrganism Protein Expression Database Nucleic Acids Res 40D1093ndashD1099

3 HigdonR HaynesW StanberryL StewartE YandlGHowardC BroomallW KolkerN and KolkerE (2013) Unravelingthe complexities of life sciences data Big Data 1 42ndash50

4 ZhangW LiF and NieL (2010) Integrating multiple lsquoomicsrsquoanalysis for microbial biology application and methodologiesMicrobiol Read Engl 156 287ndash301

5 SikaroodiM GalachiantzY and BaranovaA (2010) Tumormarkers the potential of lsquoomicsrsquo approach Curr Mol Med 10249ndash257

6 StanberryL MiasGI HaynesW HigdonR SnyderM andKolkerE (2013) Integrative analysis of longitudinal metabolomicsdata from a personal multi-omics profile Metabolites 3 741ndash760

7 ChenR MiasGI Li-Pook-ThanJ JiangL LamHYKChenR MiriamiE KarczewskiKJ HariharanM DeweyFEet al (2012) Personal omics profiling reveals dynamic molecular andmedical phenotypes Cell 148 1293ndash1307

8 MontagueE StanberryL HigdonR JankoI LeeEAndersonN ChoiniereJ StewartE YandlG BroomallW et al(2014) MOPED 25ndashndashan integrated multi-omics resourceMulti-Omics Profiling Expression Database now includestranscriptomics data Omics J Integr Biol 18 335ndash343

9 KolkerE OzdemirV MartensL HancockW AndersonGAndersonN AynaciogluS BaranovaA CampagnaSR ChenRet al (2014) Toward more transparent and reproducible omics studiesthrough a common metadata checklist and data publications OmicsJ Integr Biol 18 10ndash14

10 DumbillE and KolkerE (2013) Introducing a metadata checklistfor omics data Big Data 1 195ndash195

11 Announcement Reducing our irreproducibility (2013) Nature 496398ndash398

12 KhatriP SirotaM and ButteAJ (2012) Ten years of pathwayanalysis current approaches and outstanding challenges PLoSComput Biol 8 e1002375

13 The UniProt Consortium (2014) Activities at the Universal ProteinResource (UniProt) Nucleic Acids Res 42 D191ndashD198

14 FlicekP AmodeMR BarrellD BealK BillisK BrentSCarvalho-SilvaD ClaphamP CoatesG FitzgeraldS et al (2014)Ensembl 2014 Nucleic Acids Res 42 D749ndashD755

15 MaglottD OstellJ PruittKD and TatusovaT (2011) EntrezGene gene-centered information at NCBI Nucleic Acids Res 39D52ndashD57

16 CaspiR AltmanT BillingtonR DreherK FoersterHFulcherCA HollandTA KeselerIM KothariA KuboA et al(2014) The MetaCyc database of metabolic pathways and enzymesand the BioCyc collection of PathwayGenome Databases NucleicAcids Res 42 D459ndashD471

17 MiH MuruganujanA and ThomasPD (2013) PANTHER in2013 modeling the evolution of gene function and other geneattributes in the context of phylogenetic trees Nucleic Acids Res 41D377ndashD386

18 CroftD MundoAF HawR MilacicM WeiserJ WuGCaudyM GarapatiP GillespieM KamdarMR et al (2014) The

Nucleic Acids Research 2015 Vol 43 Database issue D1151

Reactome pathway knowledgebase Nucleic Acids Res 42D472ndashD477

19 VizcaınoJA CoteRG CsordasA DianesJA FabregatAFosterJM GrissJ AlpiE BirimM ContellJ et al (2013) ThePRoteomics IDEntifications (PRIDE) database and associated toolsstatus in 2013 Nucleic Acids Res 41 D1063ndashD1069

20 DesiereF (2006) The PeptideAtlas project Nucleic Acids Res 34D655ndashD658

21 BarrettT WilhiteSE LedouxP EvangelistaC KimIFTomashevskyM MarshallKA PhillippyKH ShermanPMHolkoM et al (2013) NCBI GEO archive for functional genomicsdata setsndashndashupdate Nucleic Acids Res 41 D991ndashD995

22 KolkerE HigdonR WelchD BaumanA StewartE HaynesWBroomallW and KolkerN (2011) SPIRE systematic proteininvestigative research environment J Proteomics 75 122ndash126

23 HigdonRoger van BelleGerald and KolkerEugene (2008) A noteon the false discovery rate and inconsistent comparisons betweenexperiments Bioinformatics (Oxford England) 24 1225ndash1228[PubMed]

24 HatherG HigdonR BaumanA von HallerPD and KolkerE(2010) Estimating false discovery rates for peptide and proteinidentification using randomized databases Proteomics 102369ndash2376

25 HigdonR ReiterL HatherG HaynesW KolkerN StewartEBaumanAT PicottiP SchmidtA van BelleG et al (2011) IPMan integrated protein model for false discovery rate estimation andidentification in high-throughput proteomics J Proteomics 75116ndash121

26 HolzmanT and KolkerE (2004) Statistical analysis of global geneexpression data some practical considerations Curr OpinBiotechnol 15 52ndash57

27 SmythGK (2005) Limma linear models for microarray data InGentlemanR CareyV DudoitS IrizarryR and HuberW (eds)Bioinformatics and Computational Biology Solutions using R andBioconductor Springer NY pp 397ndash420

28 KimM-S PintoSM GetnetD NirujogiRS MandaSSChaerkadyR MadugunduAK KelkarDS IsserlinR JainSet al (2014) A draft map of the human proteome Nature 509575ndash581

29 WilhelmM SchleglJ HahneH Moghaddas GholamiALieberenzM SavitskiMM ZieglerE ButzmannL GessulatSMarxH et al (2014) Mass-spectrometry-based draft of the humanproteome Nature 509 582ndash587

30 GeigerT MaddenSF GallagherWM CoxJ and MannM(2012) Proteomic portrait of human breast cancer progressionidentifies novel prognostic markers Cancer Res 72 2428ndash2439

31 Moghaddas GholamiA HahneH WuZ AuerFJ MengCWilhelmM and KusterB (2013) Global proteome analysis of theNCI-60 cell line panel Cell Rep 4 609ndash620

32 PfisterTD ReinholdWC AgamaK GuptaS KhinSAKindersRJ ParchmentRE TomaszewskiJE DoroshowJH andPommierY (2009) Topoisomerase I levels in the NCI-60 cancer cellline panel determined by validated ELISA and microarray analysisand correlation with indenoisoquinoline sensitivity Mol CancerTher 8 1878ndash1884

33 StelzerG DalahI SteinTI SatanowerY RosenN NativNOz-LeviD OlenderT BelinkyF BahirI et al (2011) In-silicohuman genomics with GeneCards Hum Genomics 5 709ndash717

34 PruittKD BrownGR HiattSM Thibaud-NissenFAstashynA ErmolaevaO FarrellCM HartJ LandrumMJMcGarveyKM et al (2014) RefSeq an update on mammalianreference sequences Nucleic Acids Res 42 D756ndashD763

35 PaksereshtN AlakoB AmidC Cerdeno-TarragaA ClelandIGibsonR GoodgameN GurT JangM KayS et al (2014)Assembly information services in the European Nucleotide ArchiveNucleic Acids Res 42 D38ndashD43

36 HarrisTW BaranJ BieriT CabunocA ChanJ ChenWJDavisP DoneJ GroveC HoweK et al (2014) WormBase 2014new views of curated biology Nucleic Acids Res 42 D789ndashD793

37 BlakeJA BultCJ EppigJT KadinJA RichardsonJE andMouse Genome Database Group (2014) The Mouse GenomeDatabase integration of and access to knowledge about thelaboratory mouse Nucleic Acids Res 42 D810ndashD817

38 KanehisaM GotoS SatoY KawashimaM FurumichiM andTanabeM (2014) Data information knowledge and principle backto metabolism in KEGG Nucleic Acids Res 42 D199ndashD205

39 GrayKA DaughertyLC GordonSM SealRL WrightMWand BrufordEA (2013) Genenamesorg the HGNC resources in2013 Nucleic Acids Res 41 D545ndashD552

40 CostanzoMC EngelSR WongED LloydP KarraKChanET WengS PaskovKM RoeGR BinkleyG et al (2014)Saccharomyces genome database provides new regulation dataNucleic Acids Res 42 D717ndashD725

41 RosePW BiC BluhmWF ChristieCH DimitropoulosDDuttaS GreenRK GoodsellDS PrlicA QuesadaM et al(2013) The RCSB Protein Data Bank new resources for research andeducation Nucleic Acids Res 41 D475ndashD482

42 GutmanasA AlhroubY BattleGM BerrisfordJM BochetEConroyMJ DanaJM Fernandez MonteceloMA vanGinkelG GoreSP et al (2014) PDBe Protein Data Bank inEurope Nucleic Acids Res 42 D285ndashD291

43 BensonDA CavanaughM ClarkK Karsch-MizrachiILipmanDJ OstellJ and SayersEW (2013) GenBank NucleicAcids Res 41 D36ndashD42

44 AshburnerM BallCA BlakeJA BotsteinD ButlerHCherryJM DavisAP DolinskiK DwightSS EppigJT et al(2000) Gene ontology tool for the unification of biology The GeneOntology Consortium Nat Genet 25 25ndash29

45 SnyderM MiasG StanberryL and KolkerE (2014) Metadatachecklist for the integrated personal OMICS study proteomics andmetabolomics experiments Omics J Integr Biol 18 81ndash85

46 XueJ WijeratneSSK and ZempleniJ (2013) Holocarboxylasesynthetase synergizes with methyl CpG binding protein 2 and DNAmethyltransferase 1 in the transcriptional repression of long-terminalrepeats Epigenetics 8 504ndash511

47 StanevaR RukovaB HadjidekovaS NeshevaD AntonovaODimitrovP SimeonovV StamenovG CukuranovicRCukuranovicJ et al (2013) Whole genome methylation arrayanalysis reveals new aspects in Balkan endemic nephropathy etiologyBMC Nephrol 14 225

48 WilhelmT (2014) Phenotype prediction based on genome-wide DNAmethylation data BMC Bioinformatics 15 193

49 BlackerTS MannZF GaleJE ZieglerM BainAJSzabadkaiG and DuchenMR (2014) Separating NADH andNADPH fluorescence in live cells and tissues using FLIM NatCommun 5 3936

50 WilliamsBC FilterJJ Blake-HodekKA WadzinskiBEFudaNJ ShallowayD and GoldbergML (2014)Greatwall-phosphorylated Endosulfine is both an inhibitor and asubstrate of PP2A-B55 heterotrimers eLife 3 e01695

51 ChenM and PenningTM (2014) 5-reduced steroids and human(4)-3-ketosteroid 5-reductase (AKR1D1) Steroids 83 17ndash26

52 DelgadoAP BrandaoP and NarayananR (2014) Diabetesassociated genes from the dark matter of the human proteome MOJProteomics Bioinform 1 00020

53 DelgadoAP BrandaoP ChapadoMJ HamidS andNarayananR (2014) Open reading frames associated with cancer inthe dark matter of the human genome Cancer Genomics Proteomics11 201ndash213

54 HaynesWA HigdonR StanberryL CollinsD and KolkerE(2013) Differential expression analysis for pathways PLoS ComputBiol 9 e1002967

55 SubramanianA TamayoP MoothaVK MukherjeeSEbertBL GilletteMA PaulovichA PomeroySL GolubTRLanderES et al (2005) Gene set enrichment analysis aknowledge-based approach for interpreting genome-wide expressionprofiles Proc Natl Acad Sci USA 102 15545ndash15550

Page 5: Beyond protein expression, MOPED goes multi-omics€¦ · solely a protein expression database to a multi-omics resource for human and model organisms. Through a web-based interface,

Nucleic Acids Research 2015 Vol 43 Database issue D1149

gene description and a link to GeneCards (33) Genes arealso linked to external sources including NCBI Gene GOetc and pathways (Reactome PANTHER and BioCyc)(Table 3) (13ndash1833ndash44) Gene and Protein Details pagesdisplay gene and protein expression data respectively Thepages are enhanced by the embedded expression bar charts

Details pages for Pathways display pathway absolute ex-pression data and their bar graph together with links toexternal resources and a list of associated genes and pro-teins The associated genes and proteins link directly totheir expression data The Experiment Details page includesthe standardized experimental metadata checklist and sum-mary statistics such as the number of over- or under-expressed proteins or genes (910)

Pathways

MOPED adopted a pathway-centric view to integrate geneand protein expression data The new Pathways tab enablesquerying for pathway names keywords and biomolecules inaddition to the standard organism experiment conditionlocalization and tissue options Protein absolute expressiondata from a given experiment are mapped onto establishedpathways from Reactome PANTHER and BioCyc (16ndash18)Using this mapping MOPED provides summary statisticsfor pathway expression to enable a view of proteins workingtogether in the same pathway These statistics include path-way coverage odds ratio and the corresponding P-value cal-culated by Fisherrsquos exact test (12)

Versatile query and search tools enable complex queriesto explore these experiment-level pathway datasets Underthe Pathways tab researchers can query for pathways byfull or partial names as well as by bimolecular identifiersThe pathway expression results can then be sorted or down-loaded For example the user can visualize the results forpathways expressed in cystic fibrosis-related experiments(Figure 3)

Experimental metadata

Under the Experiments tab researchers can search filterand review experimental metadata For example queryingfor lsquocystic fibrosisrsquo will return a list of experiments wherethe search term appears in the experiment description Theresults include summary statistics on genes proteins andpathways expressed in each experiment

Experimental metadata are captured using a multi-omicschecklist (9ndash1145) This checklist provides informationabout experimental design data collection and analysisThis added layer of experimental metadata can be used byresearchers to assess the quality consistency usability andreproducibility of data in MOPED

MOPED currently contains experimental meta-data checklists on 12 proteomics and all 183 tran-scriptomics experiments (910) For example the sny-der personal omics profiling experiment has experimentalmetadata describing experiment design data collectionand data analysis (Supplementary Figure S2) (45) Built-inlinks to GEO provide additional information about thetranscriptomics experiments (21) The MOPED team iscurrently working on completing metadata checklists forall experiments in MOPED

Visualizations

MOPED provides interactive visualizations including bargraphs of protein gene and pathway expression levelsacross experiments that can be viewed by either tissue orcondition (Figures 1 and 3) The expression matrix com-pares expression of neighboring genes and correspondingproteins along the chromosome that can be grouped by tis-sue or condition The chord diagram gives an overview ofMOPED data content for different organisms Mouse-overdescriptions enhance the plots by providing more detailedgraphical and experimental information

MOPEDrsquoS IMPACT

MOPED provides support for the scientific community asa resource of processed expression data Researchers haveused MOPED to help identify potential proteins to studybased on expression levels (46) to compare expression ofdifferent proteins across experiments conditions localiza-tions and tissues (4748) to investigate expression levels invarious tissue types cell lines conditions and diseases (49ndash51) and to help link uncharacterized proteins to diabetesand cancer (5253)

MOPED was originally created in response to a sur-vey indicating a need for a processed expression database(13) Through direct and indirect interactions the MOPEDteam is able to learn the needs of biological and biomedicalresearchers and respond with updates and improvementsWith this feedback in mind MOPED acknowledges thechanging needs of the life sciences community

Data submissions

To be a complete and accurate data resource MOPEDneeds to reflect the current understanding of gene proteinand pathway expression patterns through access to the mostrecent data and experiments MOPED provides a conve-nient venue for researchers to both make their data pub-lic and share these privately with collaborators and review-ers (13) Private MOPED provides the same features toolsand conveniences as the public MOPED but enables theowner of the data to both limit access and make the datapublic when appropriate If interested in submitting data toMOPED please contact the MOPED team through the Fo-rum httpmoped-forumproteinspireorg

Future directions

As the field of biology rapidly advances MOPED is ad-vancing with it MOPED took the following steps to com-plement developments in research transitioned from pro-teomics to multi-omics (8) added metadata to enable morereliable use and integration of multiple data sources (911)and used pathways to create a coherent view of multi-omics data and an experimental level assessment of ex-pression (12) The next steps for MOPED are to enhanceintegration of different MOPED data types This will bedone by providing integrated data views graphical dis-plays and pathway analyses that show and utilize multipleomics data types In addition future versions of MOPED

D1150 Nucleic Acids Research 2015 Vol 43 Database issue

Table 3 MOPED references the following external resources on the Protein and Gene Details pages (13ndash1633ndash43)

External source Protein Gene Reference

GeneCards X X (33)PDB X (4142)BioCyc X (16)UniProt X (13)NCBI Gene X X (15)RefSeq X (34)ensembl X X (14)EMBL X (35)

WormBase X X (36)MGI X (37)KEGG X (38)HGNC X X (39)SGD X (40)GenBank X (43)GO X X (44)

indicates resources that are cross-referenced with MOPED

will offer disease-centric data integration and include addi-tional model organisms The disease-centric view will allowthe comparison of biomolecular processes and their func-tions between normal and disease states Also MOPEDplans to add pathway relative expression statistics to iden-tify overunder-expression of pathways in relative expres-sion experiments for both proteomics and transcriptomicsdata (5455) These steps will continue the advancement ofMOPED as a resource for biological and biomedical sci-ence

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGMENT

We thank Maggie Lackey for her critical reading themanuscript

FUNDING

National Science Foundation under the Division of Bio-logical Infrastructure [0969929] The Robert B McMillenFoundation Seattle Childrenrsquos Research Institute [to EK]Funding for open access charge Seattle Childrenrsquos ResearchInstituteConflict of interest statement None declared

REFERENCES1 HigdonR StewartE StanberryL HaynesW ChoiniereJ

MontagueE AndersonN YandlG JankoI BroomallW et al(2014) MOPED enables discoveries through consistently processedproteomics data J Proteome Res 13 107ndash113

2 KolkerE HigdonR HaynesW WelchD BroomallWLancetD StanberryL and KolkerN (2012) MOPED ModelOrganism Protein Expression Database Nucleic Acids Res 40D1093ndashD1099

3 HigdonR HaynesW StanberryL StewartE YandlGHowardC BroomallW KolkerN and KolkerE (2013) Unravelingthe complexities of life sciences data Big Data 1 42ndash50

4 ZhangW LiF and NieL (2010) Integrating multiple lsquoomicsrsquoanalysis for microbial biology application and methodologiesMicrobiol Read Engl 156 287ndash301

5 SikaroodiM GalachiantzY and BaranovaA (2010) Tumormarkers the potential of lsquoomicsrsquo approach Curr Mol Med 10249ndash257

6 StanberryL MiasGI HaynesW HigdonR SnyderM andKolkerE (2013) Integrative analysis of longitudinal metabolomicsdata from a personal multi-omics profile Metabolites 3 741ndash760

7 ChenR MiasGI Li-Pook-ThanJ JiangL LamHYKChenR MiriamiE KarczewskiKJ HariharanM DeweyFEet al (2012) Personal omics profiling reveals dynamic molecular andmedical phenotypes Cell 148 1293ndash1307

8 MontagueE StanberryL HigdonR JankoI LeeEAndersonN ChoiniereJ StewartE YandlG BroomallW et al(2014) MOPED 25ndashndashan integrated multi-omics resourceMulti-Omics Profiling Expression Database now includestranscriptomics data Omics J Integr Biol 18 335ndash343

9 KolkerE OzdemirV MartensL HancockW AndersonGAndersonN AynaciogluS BaranovaA CampagnaSR ChenRet al (2014) Toward more transparent and reproducible omics studiesthrough a common metadata checklist and data publications OmicsJ Integr Biol 18 10ndash14

10 DumbillE and KolkerE (2013) Introducing a metadata checklistfor omics data Big Data 1 195ndash195

11 Announcement Reducing our irreproducibility (2013) Nature 496398ndash398

12 KhatriP SirotaM and ButteAJ (2012) Ten years of pathwayanalysis current approaches and outstanding challenges PLoSComput Biol 8 e1002375

13 The UniProt Consortium (2014) Activities at the Universal ProteinResource (UniProt) Nucleic Acids Res 42 D191ndashD198

14 FlicekP AmodeMR BarrellD BealK BillisK BrentSCarvalho-SilvaD ClaphamP CoatesG FitzgeraldS et al (2014)Ensembl 2014 Nucleic Acids Res 42 D749ndashD755

15 MaglottD OstellJ PruittKD and TatusovaT (2011) EntrezGene gene-centered information at NCBI Nucleic Acids Res 39D52ndashD57

16 CaspiR AltmanT BillingtonR DreherK FoersterHFulcherCA HollandTA KeselerIM KothariA KuboA et al(2014) The MetaCyc database of metabolic pathways and enzymesand the BioCyc collection of PathwayGenome Databases NucleicAcids Res 42 D459ndashD471

17 MiH MuruganujanA and ThomasPD (2013) PANTHER in2013 modeling the evolution of gene function and other geneattributes in the context of phylogenetic trees Nucleic Acids Res 41D377ndashD386

18 CroftD MundoAF HawR MilacicM WeiserJ WuGCaudyM GarapatiP GillespieM KamdarMR et al (2014) The

Nucleic Acids Research 2015 Vol 43 Database issue D1151

Reactome pathway knowledgebase Nucleic Acids Res 42D472ndashD477

19 VizcaınoJA CoteRG CsordasA DianesJA FabregatAFosterJM GrissJ AlpiE BirimM ContellJ et al (2013) ThePRoteomics IDEntifications (PRIDE) database and associated toolsstatus in 2013 Nucleic Acids Res 41 D1063ndashD1069

20 DesiereF (2006) The PeptideAtlas project Nucleic Acids Res 34D655ndashD658

21 BarrettT WilhiteSE LedouxP EvangelistaC KimIFTomashevskyM MarshallKA PhillippyKH ShermanPMHolkoM et al (2013) NCBI GEO archive for functional genomicsdata setsndashndashupdate Nucleic Acids Res 41 D991ndashD995

22 KolkerE HigdonR WelchD BaumanA StewartE HaynesWBroomallW and KolkerN (2011) SPIRE systematic proteininvestigative research environment J Proteomics 75 122ndash126

23 HigdonRoger van BelleGerald and KolkerEugene (2008) A noteon the false discovery rate and inconsistent comparisons betweenexperiments Bioinformatics (Oxford England) 24 1225ndash1228[PubMed]

24 HatherG HigdonR BaumanA von HallerPD and KolkerE(2010) Estimating false discovery rates for peptide and proteinidentification using randomized databases Proteomics 102369ndash2376

25 HigdonR ReiterL HatherG HaynesW KolkerN StewartEBaumanAT PicottiP SchmidtA van BelleG et al (2011) IPMan integrated protein model for false discovery rate estimation andidentification in high-throughput proteomics J Proteomics 75116ndash121

26 HolzmanT and KolkerE (2004) Statistical analysis of global geneexpression data some practical considerations Curr OpinBiotechnol 15 52ndash57

27 SmythGK (2005) Limma linear models for microarray data InGentlemanR CareyV DudoitS IrizarryR and HuberW (eds)Bioinformatics and Computational Biology Solutions using R andBioconductor Springer NY pp 397ndash420

28 KimM-S PintoSM GetnetD NirujogiRS MandaSSChaerkadyR MadugunduAK KelkarDS IsserlinR JainSet al (2014) A draft map of the human proteome Nature 509575ndash581

29 WilhelmM SchleglJ HahneH Moghaddas GholamiALieberenzM SavitskiMM ZieglerE ButzmannL GessulatSMarxH et al (2014) Mass-spectrometry-based draft of the humanproteome Nature 509 582ndash587

30 GeigerT MaddenSF GallagherWM CoxJ and MannM(2012) Proteomic portrait of human breast cancer progressionidentifies novel prognostic markers Cancer Res 72 2428ndash2439

31 Moghaddas GholamiA HahneH WuZ AuerFJ MengCWilhelmM and KusterB (2013) Global proteome analysis of theNCI-60 cell line panel Cell Rep 4 609ndash620

32 PfisterTD ReinholdWC AgamaK GuptaS KhinSAKindersRJ ParchmentRE TomaszewskiJE DoroshowJH andPommierY (2009) Topoisomerase I levels in the NCI-60 cancer cellline panel determined by validated ELISA and microarray analysisand correlation with indenoisoquinoline sensitivity Mol CancerTher 8 1878ndash1884

33 StelzerG DalahI SteinTI SatanowerY RosenN NativNOz-LeviD OlenderT BelinkyF BahirI et al (2011) In-silicohuman genomics with GeneCards Hum Genomics 5 709ndash717

34 PruittKD BrownGR HiattSM Thibaud-NissenFAstashynA ErmolaevaO FarrellCM HartJ LandrumMJMcGarveyKM et al (2014) RefSeq an update on mammalianreference sequences Nucleic Acids Res 42 D756ndashD763

35 PaksereshtN AlakoB AmidC Cerdeno-TarragaA ClelandIGibsonR GoodgameN GurT JangM KayS et al (2014)Assembly information services in the European Nucleotide ArchiveNucleic Acids Res 42 D38ndashD43

36 HarrisTW BaranJ BieriT CabunocA ChanJ ChenWJDavisP DoneJ GroveC HoweK et al (2014) WormBase 2014new views of curated biology Nucleic Acids Res 42 D789ndashD793

37 BlakeJA BultCJ EppigJT KadinJA RichardsonJE andMouse Genome Database Group (2014) The Mouse GenomeDatabase integration of and access to knowledge about thelaboratory mouse Nucleic Acids Res 42 D810ndashD817

38 KanehisaM GotoS SatoY KawashimaM FurumichiM andTanabeM (2014) Data information knowledge and principle backto metabolism in KEGG Nucleic Acids Res 42 D199ndashD205

39 GrayKA DaughertyLC GordonSM SealRL WrightMWand BrufordEA (2013) Genenamesorg the HGNC resources in2013 Nucleic Acids Res 41 D545ndashD552

40 CostanzoMC EngelSR WongED LloydP KarraKChanET WengS PaskovKM RoeGR BinkleyG et al (2014)Saccharomyces genome database provides new regulation dataNucleic Acids Res 42 D717ndashD725

41 RosePW BiC BluhmWF ChristieCH DimitropoulosDDuttaS GreenRK GoodsellDS PrlicA QuesadaM et al(2013) The RCSB Protein Data Bank new resources for research andeducation Nucleic Acids Res 41 D475ndashD482

42 GutmanasA AlhroubY BattleGM BerrisfordJM BochetEConroyMJ DanaJM Fernandez MonteceloMA vanGinkelG GoreSP et al (2014) PDBe Protein Data Bank inEurope Nucleic Acids Res 42 D285ndashD291

43 BensonDA CavanaughM ClarkK Karsch-MizrachiILipmanDJ OstellJ and SayersEW (2013) GenBank NucleicAcids Res 41 D36ndashD42

44 AshburnerM BallCA BlakeJA BotsteinD ButlerHCherryJM DavisAP DolinskiK DwightSS EppigJT et al(2000) Gene ontology tool for the unification of biology The GeneOntology Consortium Nat Genet 25 25ndash29

45 SnyderM MiasG StanberryL and KolkerE (2014) Metadatachecklist for the integrated personal OMICS study proteomics andmetabolomics experiments Omics J Integr Biol 18 81ndash85

46 XueJ WijeratneSSK and ZempleniJ (2013) Holocarboxylasesynthetase synergizes with methyl CpG binding protein 2 and DNAmethyltransferase 1 in the transcriptional repression of long-terminalrepeats Epigenetics 8 504ndash511

47 StanevaR RukovaB HadjidekovaS NeshevaD AntonovaODimitrovP SimeonovV StamenovG CukuranovicRCukuranovicJ et al (2013) Whole genome methylation arrayanalysis reveals new aspects in Balkan endemic nephropathy etiologyBMC Nephrol 14 225

48 WilhelmT (2014) Phenotype prediction based on genome-wide DNAmethylation data BMC Bioinformatics 15 193

49 BlackerTS MannZF GaleJE ZieglerM BainAJSzabadkaiG and DuchenMR (2014) Separating NADH andNADPH fluorescence in live cells and tissues using FLIM NatCommun 5 3936

50 WilliamsBC FilterJJ Blake-HodekKA WadzinskiBEFudaNJ ShallowayD and GoldbergML (2014)Greatwall-phosphorylated Endosulfine is both an inhibitor and asubstrate of PP2A-B55 heterotrimers eLife 3 e01695

51 ChenM and PenningTM (2014) 5-reduced steroids and human(4)-3-ketosteroid 5-reductase (AKR1D1) Steroids 83 17ndash26

52 DelgadoAP BrandaoP and NarayananR (2014) Diabetesassociated genes from the dark matter of the human proteome MOJProteomics Bioinform 1 00020

53 DelgadoAP BrandaoP ChapadoMJ HamidS andNarayananR (2014) Open reading frames associated with cancer inthe dark matter of the human genome Cancer Genomics Proteomics11 201ndash213

54 HaynesWA HigdonR StanberryL CollinsD and KolkerE(2013) Differential expression analysis for pathways PLoS ComputBiol 9 e1002967

55 SubramanianA TamayoP MoothaVK MukherjeeSEbertBL GilletteMA PaulovichA PomeroySL GolubTRLanderES et al (2005) Gene set enrichment analysis aknowledge-based approach for interpreting genome-wide expressionprofiles Proc Natl Acad Sci USA 102 15545ndash15550

Page 6: Beyond protein expression, MOPED goes multi-omics€¦ · solely a protein expression database to a multi-omics resource for human and model organisms. Through a web-based interface,

D1150 Nucleic Acids Research 2015 Vol 43 Database issue

Table 3 MOPED references the following external resources on the Protein and Gene Details pages (13ndash1633ndash43)

External source Protein Gene Reference

GeneCards X X (33)PDB X (4142)BioCyc X (16)UniProt X (13)NCBI Gene X X (15)RefSeq X (34)ensembl X X (14)EMBL X (35)

WormBase X X (36)MGI X (37)KEGG X (38)HGNC X X (39)SGD X (40)GenBank X (43)GO X X (44)

indicates resources that are cross-referenced with MOPED

will offer disease-centric data integration and include addi-tional model organisms The disease-centric view will allowthe comparison of biomolecular processes and their func-tions between normal and disease states Also MOPEDplans to add pathway relative expression statistics to iden-tify overunder-expression of pathways in relative expres-sion experiments for both proteomics and transcriptomicsdata (5455) These steps will continue the advancement ofMOPED as a resource for biological and biomedical sci-ence

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGMENT

We thank Maggie Lackey for her critical reading themanuscript

FUNDING

National Science Foundation under the Division of Bio-logical Infrastructure [0969929] The Robert B McMillenFoundation Seattle Childrenrsquos Research Institute [to EK]Funding for open access charge Seattle Childrenrsquos ResearchInstituteConflict of interest statement None declared

REFERENCES1 HigdonR StewartE StanberryL HaynesW ChoiniereJ

MontagueE AndersonN YandlG JankoI BroomallW et al(2014) MOPED enables discoveries through consistently processedproteomics data J Proteome Res 13 107ndash113

2 KolkerE HigdonR HaynesW WelchD BroomallWLancetD StanberryL and KolkerN (2012) MOPED ModelOrganism Protein Expression Database Nucleic Acids Res 40D1093ndashD1099

3 HigdonR HaynesW StanberryL StewartE YandlGHowardC BroomallW KolkerN and KolkerE (2013) Unravelingthe complexities of life sciences data Big Data 1 42ndash50

4 ZhangW LiF and NieL (2010) Integrating multiple lsquoomicsrsquoanalysis for microbial biology application and methodologiesMicrobiol Read Engl 156 287ndash301

5 SikaroodiM GalachiantzY and BaranovaA (2010) Tumormarkers the potential of lsquoomicsrsquo approach Curr Mol Med 10249ndash257

6 StanberryL MiasGI HaynesW HigdonR SnyderM andKolkerE (2013) Integrative analysis of longitudinal metabolomicsdata from a personal multi-omics profile Metabolites 3 741ndash760

7 ChenR MiasGI Li-Pook-ThanJ JiangL LamHYKChenR MiriamiE KarczewskiKJ HariharanM DeweyFEet al (2012) Personal omics profiling reveals dynamic molecular andmedical phenotypes Cell 148 1293ndash1307

8 MontagueE StanberryL HigdonR JankoI LeeEAndersonN ChoiniereJ StewartE YandlG BroomallW et al(2014) MOPED 25ndashndashan integrated multi-omics resourceMulti-Omics Profiling Expression Database now includestranscriptomics data Omics J Integr Biol 18 335ndash343

9 KolkerE OzdemirV MartensL HancockW AndersonGAndersonN AynaciogluS BaranovaA CampagnaSR ChenRet al (2014) Toward more transparent and reproducible omics studiesthrough a common metadata checklist and data publications OmicsJ Integr Biol 18 10ndash14

10 DumbillE and KolkerE (2013) Introducing a metadata checklistfor omics data Big Data 1 195ndash195

11 Announcement Reducing our irreproducibility (2013) Nature 496398ndash398

12 KhatriP SirotaM and ButteAJ (2012) Ten years of pathwayanalysis current approaches and outstanding challenges PLoSComput Biol 8 e1002375

13 The UniProt Consortium (2014) Activities at the Universal ProteinResource (UniProt) Nucleic Acids Res 42 D191ndashD198

14 FlicekP AmodeMR BarrellD BealK BillisK BrentSCarvalho-SilvaD ClaphamP CoatesG FitzgeraldS et al (2014)Ensembl 2014 Nucleic Acids Res 42 D749ndashD755

15 MaglottD OstellJ PruittKD and TatusovaT (2011) EntrezGene gene-centered information at NCBI Nucleic Acids Res 39D52ndashD57

16 CaspiR AltmanT BillingtonR DreherK FoersterHFulcherCA HollandTA KeselerIM KothariA KuboA et al(2014) The MetaCyc database of metabolic pathways and enzymesand the BioCyc collection of PathwayGenome Databases NucleicAcids Res 42 D459ndashD471

17 MiH MuruganujanA and ThomasPD (2013) PANTHER in2013 modeling the evolution of gene function and other geneattributes in the context of phylogenetic trees Nucleic Acids Res 41D377ndashD386

18 CroftD MundoAF HawR MilacicM WeiserJ WuGCaudyM GarapatiP GillespieM KamdarMR et al (2014) The

Nucleic Acids Research 2015 Vol 43 Database issue D1151

Reactome pathway knowledgebase Nucleic Acids Res 42D472ndashD477

19 VizcaınoJA CoteRG CsordasA DianesJA FabregatAFosterJM GrissJ AlpiE BirimM ContellJ et al (2013) ThePRoteomics IDEntifications (PRIDE) database and associated toolsstatus in 2013 Nucleic Acids Res 41 D1063ndashD1069

20 DesiereF (2006) The PeptideAtlas project Nucleic Acids Res 34D655ndashD658

21 BarrettT WilhiteSE LedouxP EvangelistaC KimIFTomashevskyM MarshallKA PhillippyKH ShermanPMHolkoM et al (2013) NCBI GEO archive for functional genomicsdata setsndashndashupdate Nucleic Acids Res 41 D991ndashD995

22 KolkerE HigdonR WelchD BaumanA StewartE HaynesWBroomallW and KolkerN (2011) SPIRE systematic proteininvestigative research environment J Proteomics 75 122ndash126

23 HigdonRoger van BelleGerald and KolkerEugene (2008) A noteon the false discovery rate and inconsistent comparisons betweenexperiments Bioinformatics (Oxford England) 24 1225ndash1228[PubMed]

24 HatherG HigdonR BaumanA von HallerPD and KolkerE(2010) Estimating false discovery rates for peptide and proteinidentification using randomized databases Proteomics 102369ndash2376

25 HigdonR ReiterL HatherG HaynesW KolkerN StewartEBaumanAT PicottiP SchmidtA van BelleG et al (2011) IPMan integrated protein model for false discovery rate estimation andidentification in high-throughput proteomics J Proteomics 75116ndash121

26 HolzmanT and KolkerE (2004) Statistical analysis of global geneexpression data some practical considerations Curr OpinBiotechnol 15 52ndash57

27 SmythGK (2005) Limma linear models for microarray data InGentlemanR CareyV DudoitS IrizarryR and HuberW (eds)Bioinformatics and Computational Biology Solutions using R andBioconductor Springer NY pp 397ndash420

28 KimM-S PintoSM GetnetD NirujogiRS MandaSSChaerkadyR MadugunduAK KelkarDS IsserlinR JainSet al (2014) A draft map of the human proteome Nature 509575ndash581

29 WilhelmM SchleglJ HahneH Moghaddas GholamiALieberenzM SavitskiMM ZieglerE ButzmannL GessulatSMarxH et al (2014) Mass-spectrometry-based draft of the humanproteome Nature 509 582ndash587

30 GeigerT MaddenSF GallagherWM CoxJ and MannM(2012) Proteomic portrait of human breast cancer progressionidentifies novel prognostic markers Cancer Res 72 2428ndash2439

31 Moghaddas GholamiA HahneH WuZ AuerFJ MengCWilhelmM and KusterB (2013) Global proteome analysis of theNCI-60 cell line panel Cell Rep 4 609ndash620

32 PfisterTD ReinholdWC AgamaK GuptaS KhinSAKindersRJ ParchmentRE TomaszewskiJE DoroshowJH andPommierY (2009) Topoisomerase I levels in the NCI-60 cancer cellline panel determined by validated ELISA and microarray analysisand correlation with indenoisoquinoline sensitivity Mol CancerTher 8 1878ndash1884

33 StelzerG DalahI SteinTI SatanowerY RosenN NativNOz-LeviD OlenderT BelinkyF BahirI et al (2011) In-silicohuman genomics with GeneCards Hum Genomics 5 709ndash717

34 PruittKD BrownGR HiattSM Thibaud-NissenFAstashynA ErmolaevaO FarrellCM HartJ LandrumMJMcGarveyKM et al (2014) RefSeq an update on mammalianreference sequences Nucleic Acids Res 42 D756ndashD763

35 PaksereshtN AlakoB AmidC Cerdeno-TarragaA ClelandIGibsonR GoodgameN GurT JangM KayS et al (2014)Assembly information services in the European Nucleotide ArchiveNucleic Acids Res 42 D38ndashD43

36 HarrisTW BaranJ BieriT CabunocA ChanJ ChenWJDavisP DoneJ GroveC HoweK et al (2014) WormBase 2014new views of curated biology Nucleic Acids Res 42 D789ndashD793

37 BlakeJA BultCJ EppigJT KadinJA RichardsonJE andMouse Genome Database Group (2014) The Mouse GenomeDatabase integration of and access to knowledge about thelaboratory mouse Nucleic Acids Res 42 D810ndashD817

38 KanehisaM GotoS SatoY KawashimaM FurumichiM andTanabeM (2014) Data information knowledge and principle backto metabolism in KEGG Nucleic Acids Res 42 D199ndashD205

39 GrayKA DaughertyLC GordonSM SealRL WrightMWand BrufordEA (2013) Genenamesorg the HGNC resources in2013 Nucleic Acids Res 41 D545ndashD552

40 CostanzoMC EngelSR WongED LloydP KarraKChanET WengS PaskovKM RoeGR BinkleyG et al (2014)Saccharomyces genome database provides new regulation dataNucleic Acids Res 42 D717ndashD725

41 RosePW BiC BluhmWF ChristieCH DimitropoulosDDuttaS GreenRK GoodsellDS PrlicA QuesadaM et al(2013) The RCSB Protein Data Bank new resources for research andeducation Nucleic Acids Res 41 D475ndashD482

42 GutmanasA AlhroubY BattleGM BerrisfordJM BochetEConroyMJ DanaJM Fernandez MonteceloMA vanGinkelG GoreSP et al (2014) PDBe Protein Data Bank inEurope Nucleic Acids Res 42 D285ndashD291

43 BensonDA CavanaughM ClarkK Karsch-MizrachiILipmanDJ OstellJ and SayersEW (2013) GenBank NucleicAcids Res 41 D36ndashD42

44 AshburnerM BallCA BlakeJA BotsteinD ButlerHCherryJM DavisAP DolinskiK DwightSS EppigJT et al(2000) Gene ontology tool for the unification of biology The GeneOntology Consortium Nat Genet 25 25ndash29

45 SnyderM MiasG StanberryL and KolkerE (2014) Metadatachecklist for the integrated personal OMICS study proteomics andmetabolomics experiments Omics J Integr Biol 18 81ndash85

46 XueJ WijeratneSSK and ZempleniJ (2013) Holocarboxylasesynthetase synergizes with methyl CpG binding protein 2 and DNAmethyltransferase 1 in the transcriptional repression of long-terminalrepeats Epigenetics 8 504ndash511

47 StanevaR RukovaB HadjidekovaS NeshevaD AntonovaODimitrovP SimeonovV StamenovG CukuranovicRCukuranovicJ et al (2013) Whole genome methylation arrayanalysis reveals new aspects in Balkan endemic nephropathy etiologyBMC Nephrol 14 225

48 WilhelmT (2014) Phenotype prediction based on genome-wide DNAmethylation data BMC Bioinformatics 15 193

49 BlackerTS MannZF GaleJE ZieglerM BainAJSzabadkaiG and DuchenMR (2014) Separating NADH andNADPH fluorescence in live cells and tissues using FLIM NatCommun 5 3936

50 WilliamsBC FilterJJ Blake-HodekKA WadzinskiBEFudaNJ ShallowayD and GoldbergML (2014)Greatwall-phosphorylated Endosulfine is both an inhibitor and asubstrate of PP2A-B55 heterotrimers eLife 3 e01695

51 ChenM and PenningTM (2014) 5-reduced steroids and human(4)-3-ketosteroid 5-reductase (AKR1D1) Steroids 83 17ndash26

52 DelgadoAP BrandaoP and NarayananR (2014) Diabetesassociated genes from the dark matter of the human proteome MOJProteomics Bioinform 1 00020

53 DelgadoAP BrandaoP ChapadoMJ HamidS andNarayananR (2014) Open reading frames associated with cancer inthe dark matter of the human genome Cancer Genomics Proteomics11 201ndash213

54 HaynesWA HigdonR StanberryL CollinsD and KolkerE(2013) Differential expression analysis for pathways PLoS ComputBiol 9 e1002967

55 SubramanianA TamayoP MoothaVK MukherjeeSEbertBL GilletteMA PaulovichA PomeroySL GolubTRLanderES et al (2005) Gene set enrichment analysis aknowledge-based approach for interpreting genome-wide expressionprofiles Proc Natl Acad Sci USA 102 15545ndash15550

Page 7: Beyond protein expression, MOPED goes multi-omics€¦ · solely a protein expression database to a multi-omics resource for human and model organisms. Through a web-based interface,

Nucleic Acids Research 2015 Vol 43 Database issue D1151

Reactome pathway knowledgebase Nucleic Acids Res 42D472ndashD477

19 VizcaınoJA CoteRG CsordasA DianesJA FabregatAFosterJM GrissJ AlpiE BirimM ContellJ et al (2013) ThePRoteomics IDEntifications (PRIDE) database and associated toolsstatus in 2013 Nucleic Acids Res 41 D1063ndashD1069

20 DesiereF (2006) The PeptideAtlas project Nucleic Acids Res 34D655ndashD658

21 BarrettT WilhiteSE LedouxP EvangelistaC KimIFTomashevskyM MarshallKA PhillippyKH ShermanPMHolkoM et al (2013) NCBI GEO archive for functional genomicsdata setsndashndashupdate Nucleic Acids Res 41 D991ndashD995

22 KolkerE HigdonR WelchD BaumanA StewartE HaynesWBroomallW and KolkerN (2011) SPIRE systematic proteininvestigative research environment J Proteomics 75 122ndash126

23 HigdonRoger van BelleGerald and KolkerEugene (2008) A noteon the false discovery rate and inconsistent comparisons betweenexperiments Bioinformatics (Oxford England) 24 1225ndash1228[PubMed]

24 HatherG HigdonR BaumanA von HallerPD and KolkerE(2010) Estimating false discovery rates for peptide and proteinidentification using randomized databases Proteomics 102369ndash2376

25 HigdonR ReiterL HatherG HaynesW KolkerN StewartEBaumanAT PicottiP SchmidtA van BelleG et al (2011) IPMan integrated protein model for false discovery rate estimation andidentification in high-throughput proteomics J Proteomics 75116ndash121

26 HolzmanT and KolkerE (2004) Statistical analysis of global geneexpression data some practical considerations Curr OpinBiotechnol 15 52ndash57

27 SmythGK (2005) Limma linear models for microarray data InGentlemanR CareyV DudoitS IrizarryR and HuberW (eds)Bioinformatics and Computational Biology Solutions using R andBioconductor Springer NY pp 397ndash420

28 KimM-S PintoSM GetnetD NirujogiRS MandaSSChaerkadyR MadugunduAK KelkarDS IsserlinR JainSet al (2014) A draft map of the human proteome Nature 509575ndash581

29 WilhelmM SchleglJ HahneH Moghaddas GholamiALieberenzM SavitskiMM ZieglerE ButzmannL GessulatSMarxH et al (2014) Mass-spectrometry-based draft of the humanproteome Nature 509 582ndash587

30 GeigerT MaddenSF GallagherWM CoxJ and MannM(2012) Proteomic portrait of human breast cancer progressionidentifies novel prognostic markers Cancer Res 72 2428ndash2439

31 Moghaddas GholamiA HahneH WuZ AuerFJ MengCWilhelmM and KusterB (2013) Global proteome analysis of theNCI-60 cell line panel Cell Rep 4 609ndash620

32 PfisterTD ReinholdWC AgamaK GuptaS KhinSAKindersRJ ParchmentRE TomaszewskiJE DoroshowJH andPommierY (2009) Topoisomerase I levels in the NCI-60 cancer cellline panel determined by validated ELISA and microarray analysisand correlation with indenoisoquinoline sensitivity Mol CancerTher 8 1878ndash1884

33 StelzerG DalahI SteinTI SatanowerY RosenN NativNOz-LeviD OlenderT BelinkyF BahirI et al (2011) In-silicohuman genomics with GeneCards Hum Genomics 5 709ndash717

34 PruittKD BrownGR HiattSM Thibaud-NissenFAstashynA ErmolaevaO FarrellCM HartJ LandrumMJMcGarveyKM et al (2014) RefSeq an update on mammalianreference sequences Nucleic Acids Res 42 D756ndashD763

35 PaksereshtN AlakoB AmidC Cerdeno-TarragaA ClelandIGibsonR GoodgameN GurT JangM KayS et al (2014)Assembly information services in the European Nucleotide ArchiveNucleic Acids Res 42 D38ndashD43

36 HarrisTW BaranJ BieriT CabunocA ChanJ ChenWJDavisP DoneJ GroveC HoweK et al (2014) WormBase 2014new views of curated biology Nucleic Acids Res 42 D789ndashD793

37 BlakeJA BultCJ EppigJT KadinJA RichardsonJE andMouse Genome Database Group (2014) The Mouse GenomeDatabase integration of and access to knowledge about thelaboratory mouse Nucleic Acids Res 42 D810ndashD817

38 KanehisaM GotoS SatoY KawashimaM FurumichiM andTanabeM (2014) Data information knowledge and principle backto metabolism in KEGG Nucleic Acids Res 42 D199ndashD205

39 GrayKA DaughertyLC GordonSM SealRL WrightMWand BrufordEA (2013) Genenamesorg the HGNC resources in2013 Nucleic Acids Res 41 D545ndashD552

40 CostanzoMC EngelSR WongED LloydP KarraKChanET WengS PaskovKM RoeGR BinkleyG et al (2014)Saccharomyces genome database provides new regulation dataNucleic Acids Res 42 D717ndashD725

41 RosePW BiC BluhmWF ChristieCH DimitropoulosDDuttaS GreenRK GoodsellDS PrlicA QuesadaM et al(2013) The RCSB Protein Data Bank new resources for research andeducation Nucleic Acids Res 41 D475ndashD482

42 GutmanasA AlhroubY BattleGM BerrisfordJM BochetEConroyMJ DanaJM Fernandez MonteceloMA vanGinkelG GoreSP et al (2014) PDBe Protein Data Bank inEurope Nucleic Acids Res 42 D285ndashD291

43 BensonDA CavanaughM ClarkK Karsch-MizrachiILipmanDJ OstellJ and SayersEW (2013) GenBank NucleicAcids Res 41 D36ndashD42

44 AshburnerM BallCA BlakeJA BotsteinD ButlerHCherryJM DavisAP DolinskiK DwightSS EppigJT et al(2000) Gene ontology tool for the unification of biology The GeneOntology Consortium Nat Genet 25 25ndash29

45 SnyderM MiasG StanberryL and KolkerE (2014) Metadatachecklist for the integrated personal OMICS study proteomics andmetabolomics experiments Omics J Integr Biol 18 81ndash85

46 XueJ WijeratneSSK and ZempleniJ (2013) Holocarboxylasesynthetase synergizes with methyl CpG binding protein 2 and DNAmethyltransferase 1 in the transcriptional repression of long-terminalrepeats Epigenetics 8 504ndash511

47 StanevaR RukovaB HadjidekovaS NeshevaD AntonovaODimitrovP SimeonovV StamenovG CukuranovicRCukuranovicJ et al (2013) Whole genome methylation arrayanalysis reveals new aspects in Balkan endemic nephropathy etiologyBMC Nephrol 14 225

48 WilhelmT (2014) Phenotype prediction based on genome-wide DNAmethylation data BMC Bioinformatics 15 193

49 BlackerTS MannZF GaleJE ZieglerM BainAJSzabadkaiG and DuchenMR (2014) Separating NADH andNADPH fluorescence in live cells and tissues using FLIM NatCommun 5 3936

50 WilliamsBC FilterJJ Blake-HodekKA WadzinskiBEFudaNJ ShallowayD and GoldbergML (2014)Greatwall-phosphorylated Endosulfine is both an inhibitor and asubstrate of PP2A-B55 heterotrimers eLife 3 e01695

51 ChenM and PenningTM (2014) 5-reduced steroids and human(4)-3-ketosteroid 5-reductase (AKR1D1) Steroids 83 17ndash26

52 DelgadoAP BrandaoP and NarayananR (2014) Diabetesassociated genes from the dark matter of the human proteome MOJProteomics Bioinform 1 00020

53 DelgadoAP BrandaoP ChapadoMJ HamidS andNarayananR (2014) Open reading frames associated with cancer inthe dark matter of the human genome Cancer Genomics Proteomics11 201ndash213

54 HaynesWA HigdonR StanberryL CollinsD and KolkerE(2013) Differential expression analysis for pathways PLoS ComputBiol 9 e1002967

55 SubramanianA TamayoP MoothaVK MukherjeeSEbertBL GilletteMA PaulovichA PomeroySL GolubTRLanderES et al (2005) Gene set enrichment analysis aknowledge-based approach for interpreting genome-wide expressionprofiles Proc Natl Acad Sci USA 102 15545ndash15550