Management of Proteomics Data · 2005. 1. 11. · Challenges in Proteome Informatics Proteome...
Transcript of Management of Proteomics Data · 2005. 1. 11. · Challenges in Proteome Informatics Proteome...
![Page 1: Management of Proteomics Data · 2005. 1. 11. · Challenges in Proteome Informatics Proteome technologies evolve rapidly. Software always lags behind hardware. Software always lags](https://reader033.fdocuments.us/reader033/viewer/2022060600/60544e0fbcedf609624b03ac/html5/thumbnails/1.jpg)
Management of Proteomics Data:
2D Gel Electrophoresis and Other Methods
Philip AndrewsNational Resource for Proteomics &
Pathway MappingMichigan Proteome Consortium
University of Michigan
![Page 2: Management of Proteomics Data · 2005. 1. 11. · Challenges in Proteome Informatics Proteome technologies evolve rapidly. Software always lags behind hardware. Software always lags](https://reader033.fdocuments.us/reader033/viewer/2022060600/60544e0fbcedf609624b03ac/html5/thumbnails/2.jpg)
Outline of Presentation
Introduction and Point of View.Role of Standards in ProteomicsManagement, Annotation, Distribution of Proteomics Data.Role of Open Source in Proteomics
![Page 3: Management of Proteomics Data · 2005. 1. 11. · Challenges in Proteome Informatics Proteome technologies evolve rapidly. Software always lags behind hardware. Software always lags](https://reader033.fdocuments.us/reader033/viewer/2022060600/60544e0fbcedf609624b03ac/html5/thumbnails/3.jpg)
Challenges in Proteomics.
The gaps between Basic Research, Technical Development, and Commercialization narrow dramatically in rapidly-evolving fields.Commitment to a single technology may be fatal in a rapidly evolving field.Computational and IT infrastructure is critical and limiting for Proteomics.
![Page 4: Management of Proteomics Data · 2005. 1. 11. · Challenges in Proteome Informatics Proteome technologies evolve rapidly. Software always lags behind hardware. Software always lags](https://reader033.fdocuments.us/reader033/viewer/2022060600/60544e0fbcedf609624b03ac/html5/thumbnails/4.jpg)
Proteome Informatics and the Rate of Progress in Proteomics
Availability of appropriate software.
Effective information management.
Lack of basic standards.
![Page 5: Management of Proteomics Data · 2005. 1. 11. · Challenges in Proteome Informatics Proteome technologies evolve rapidly. Software always lags behind hardware. Software always lags](https://reader033.fdocuments.us/reader033/viewer/2022060600/60544e0fbcedf609624b03ac/html5/thumbnails/5.jpg)
Challenges in Proteome Informatics
Proteome technologies evolve rapidly.Software always lags behind hardware.Software always lags behind applications.
Current database structures are inadequate –missing data, data quality, complex interactions, changing interactions, new data types, pedigree, etc.
![Page 6: Management of Proteomics Data · 2005. 1. 11. · Challenges in Proteome Informatics Proteome technologies evolve rapidly. Software always lags behind hardware. Software always lags](https://reader033.fdocuments.us/reader033/viewer/2022060600/60544e0fbcedf609624b03ac/html5/thumbnails/6.jpg)
Challenges in Proteome Informatics II
Ability to extract knowledge from large, complex biological datasets is still evolving.Mechanisms for annotation of genome databases need to be improved.Planning large-scale experiments must be automated.
![Page 7: Management of Proteomics Data · 2005. 1. 11. · Challenges in Proteome Informatics Proteome technologies evolve rapidly. Software always lags behind hardware. Software always lags](https://reader033.fdocuments.us/reader033/viewer/2022060600/60544e0fbcedf609624b03ac/html5/thumbnails/7.jpg)
Data Challenges in Proteomics
High-throughput technologies generate large amounts of data.Data are heterogeneous.Data relationships are complex.There are minimal standards.
![Page 8: Management of Proteomics Data · 2005. 1. 11. · Challenges in Proteome Informatics Proteome technologies evolve rapidly. Software always lags behind hardware. Software always lags](https://reader033.fdocuments.us/reader033/viewer/2022060600/60544e0fbcedf609624b03ac/html5/thumbnails/8.jpg)
Data Complexity: Proteome Mapping Data
• 2D Gel Images• MS spectra• MS/MS spectra• LC MS• 2D LC/ MS/MS• Tune files• Sample data• Data analysis parameters
![Page 9: Management of Proteomics Data · 2005. 1. 11. · Challenges in Proteome Informatics Proteome technologies evolve rapidly. Software always lags behind hardware. Software always lags](https://reader033.fdocuments.us/reader033/viewer/2022060600/60544e0fbcedf609624b03ac/html5/thumbnails/9.jpg)
Standards are more easily applied to production.Standards should conform to technology, not the other way around.
Proteomics Is Both Research and Production.
![Page 10: Management of Proteomics Data · 2005. 1. 11. · Challenges in Proteome Informatics Proteome technologies evolve rapidly. Software always lags behind hardware. Software always lags](https://reader033.fdocuments.us/reader033/viewer/2022060600/60544e0fbcedf609624b03ac/html5/thumbnails/10.jpg)
Why Do We Need Proteome File Standards?
Standardize reporting.
Data pipelines require batch export of files.
Allow more facile development of open source software for proteomics.
Data longevity.
![Page 11: Management of Proteomics Data · 2005. 1. 11. · Challenges in Proteome Informatics Proteome technologies evolve rapidly. Software always lags behind hardware. Software always lags](https://reader033.fdocuments.us/reader033/viewer/2022060600/60544e0fbcedf609624b03ac/html5/thumbnails/11.jpg)
XML Files for Proteomics
ProsStructuredEasily readableTranslatable into other formatsAmenable to open source development
ConsInefficientComplexity can be problem
![Page 12: Management of Proteomics Data · 2005. 1. 11. · Challenges in Proteome Informatics Proteome technologies evolve rapidly. Software always lags behind hardware. Software always lags](https://reader033.fdocuments.us/reader033/viewer/2022060600/60544e0fbcedf609624b03ac/html5/thumbnails/12.jpg)
Development of Proteomics XML Standards
ISB (www.isb.org)mzXML
EBI/HUPO (www.hupo.org, www.pedro.org)PedromzDATA, MIAPE
www.gaml.org
![Page 13: Management of Proteomics Data · 2005. 1. 11. · Challenges in Proteome Informatics Proteome technologies evolve rapidly. Software always lags behind hardware. Software always lags](https://reader033.fdocuments.us/reader033/viewer/2022060600/60544e0fbcedf609624b03ac/html5/thumbnails/13.jpg)
XML Summary
A near-term solution.Useful for data exchange.Multiple formats will be necessary.Formats will change.Mechanism for timely updates will be necessary.We will support all major XML formats.
![Page 14: Management of Proteomics Data · 2005. 1. 11. · Challenges in Proteome Informatics Proteome technologies evolve rapidly. Software always lags behind hardware. Software always lags](https://reader033.fdocuments.us/reader033/viewer/2022060600/60544e0fbcedf609624b03ac/html5/thumbnails/14.jpg)
But Other Formats Could Be Better
Computationally, restructuring files to better fit your data structure can lead to increases in efficiency.
It is important that the file format be publicly available.Parsers and translators should also be available.Batch conversions should be supported.
The standard should be openess in data formats.
![Page 15: Management of Proteomics Data · 2005. 1. 11. · Challenges in Proteome Informatics Proteome technologies evolve rapidly. Software always lags behind hardware. Software always lags](https://reader033.fdocuments.us/reader033/viewer/2022060600/60544e0fbcedf609624b03ac/html5/thumbnails/15.jpg)
Information Management in Proteomics
Low level data management (e.g., LIMS).Curation tools (goal is to automate).Higher level information management (e.g., metadata).Aggregation and Integration systems.
![Page 16: Management of Proteomics Data · 2005. 1. 11. · Challenges in Proteome Informatics Proteome technologies evolve rapidly. Software always lags behind hardware. Software always lags](https://reader033.fdocuments.us/reader033/viewer/2022060600/60544e0fbcedf609624b03ac/html5/thumbnails/16.jpg)
General Goals:
• Integrated, simple, flexible system to acquire, manage, and mine Proteomics data (code generation).
• Useable by distributed groups.• Support all data types and standards.• Secure.
![Page 17: Management of Proteomics Data · 2005. 1. 11. · Challenges in Proteome Informatics Proteome technologies evolve rapidly. Software always lags behind hardware. Software always lags](https://reader033.fdocuments.us/reader033/viewer/2022060600/60544e0fbcedf609624b03ac/html5/thumbnails/17.jpg)
Specific Design Goals:• Open architecture development.• Multi-tier with HTML user interface.• Distributed system.• Scaleable system.• Compatible with other databases.• Flexibly accommodate data types.• Low maintenance.• Easily extensible.• Developed using open standards.
![Page 18: Management of Proteomics Data · 2005. 1. 11. · Challenges in Proteome Informatics Proteome technologies evolve rapidly. Software always lags behind hardware. Software always lags](https://reader033.fdocuments.us/reader033/viewer/2022060600/60544e0fbcedf609624b03ac/html5/thumbnails/18.jpg)
System Components
Laboratory data management system,Data viewers (2D Gel images, chromatograms, and mass spectra),Automated data collection from instruments,Automated protein database search engines (Mascot, Prot. Prospector, X!Tandem), Data discovery toolkits.
![Page 19: Management of Proteomics Data · 2005. 1. 11. · Challenges in Proteome Informatics Proteome technologies evolve rapidly. Software always lags behind hardware. Software always lags](https://reader033.fdocuments.us/reader033/viewer/2022060600/60544e0fbcedf609624b03ac/html5/thumbnails/19.jpg)
Prime Architecture
Session Objects
Prime DatabaseProprietary Databases
JDBC Layer
Data Service
C++JavaPerl
WEB Server
Servlets Servlets
File System
Daemons
![Page 20: Management of Proteomics Data · 2005. 1. 11. · Challenges in Proteome Informatics Proteome technologies evolve rapidly. Software always lags behind hardware. Software always lags](https://reader033.fdocuments.us/reader033/viewer/2022060600/60544e0fbcedf609624b03ac/html5/thumbnails/20.jpg)
Work Flow Levels
Lab Level (Samples)Data Level (Processing)Administrative Level (Paper)
![Page 21: Management of Proteomics Data · 2005. 1. 11. · Challenges in Proteome Informatics Proteome technologies evolve rapidly. Software always lags behind hardware. Software always lags](https://reader033.fdocuments.us/reader033/viewer/2022060600/60544e0fbcedf609624b03ac/html5/thumbnails/21.jpg)
Lab Level Work FlowImage Analysis Top Segment
Robotics Top Segment
![Page 22: Management of Proteomics Data · 2005. 1. 11. · Challenges in Proteome Informatics Proteome technologies evolve rapidly. Software always lags behind hardware. Software always lags](https://reader033.fdocuments.us/reader033/viewer/2022060600/60544e0fbcedf609624b03ac/html5/thumbnails/22.jpg)
PRIME Table Structure Segment
![Page 23: Management of Proteomics Data · 2005. 1. 11. · Challenges in Proteome Informatics Proteome technologies evolve rapidly. Software always lags behind hardware. Software always lags](https://reader033.fdocuments.us/reader033/viewer/2022060600/60544e0fbcedf609624b03ac/html5/thumbnails/23.jpg)
PRIME Summary Stats
220 tables.~2,200 Java source files~ 106 lines of code.
prime.proteome.med.umich.edu
![Page 24: Management of Proteomics Data · 2005. 1. 11. · Challenges in Proteome Informatics Proteome technologies evolve rapidly. Software always lags behind hardware. Software always lags](https://reader033.fdocuments.us/reader033/viewer/2022060600/60544e0fbcedf609624b03ac/html5/thumbnails/24.jpg)
Uses of PRIME
Documentation and management.Curation.Collaboration tool.Provide data access for reviewers.Host public access to data.
![Page 25: Management of Proteomics Data · 2005. 1. 11. · Challenges in Proteome Informatics Proteome technologies evolve rapidly. Software always lags behind hardware. Software always lags](https://reader033.fdocuments.us/reader033/viewer/2022060600/60544e0fbcedf609624b03ac/html5/thumbnails/25.jpg)
Distribution of Proteomics Data: The Cathedral vs the Bazaar Revisited.
Centralized system.Must deal with many of the same issues as standards development.May be better suited for metadata.
Distributed system.Distributes costs of maintenance.Puts ‘ownership’ in hands of interested parties.
![Page 26: Management of Proteomics Data · 2005. 1. 11. · Challenges in Proteome Informatics Proteome technologies evolve rapidly. Software always lags behind hardware. Software always lags](https://reader033.fdocuments.us/reader033/viewer/2022060600/60544e0fbcedf609624b03ac/html5/thumbnails/26.jpg)
What Are Issues for Distributed Systems?
Ownership of data.Persistence.Maintenance of context.Quality Control.Security.Cost (long-term maintenance).
![Page 27: Management of Proteomics Data · 2005. 1. 11. · Challenges in Proteome Informatics Proteome technologies evolve rapidly. Software always lags behind hardware. Software always lags](https://reader033.fdocuments.us/reader033/viewer/2022060600/60544e0fbcedf609624b03ac/html5/thumbnails/27.jpg)
Challenges in Proteome Informatics
Proteome technologies evolve rapidly.Software always lags behind hardware.Software always lags behind applications.
Instrument development is negatively impacted by software development (cost and time).
![Page 28: Management of Proteomics Data · 2005. 1. 11. · Challenges in Proteome Informatics Proteome technologies evolve rapidly. Software always lags behind hardware. Software always lags](https://reader033.fdocuments.us/reader033/viewer/2022060600/60544e0fbcedf609624b03ac/html5/thumbnails/28.jpg)
The (partial) Solution: Open Source Efforts in Proteomics
Progress in Proteomics will be faster if a robust open source community is developed.Open source efforts allow the community to respond to new technologies rapidly.Open source allows each individual in the community to respond to their own needs.Cost of development is shared.Open Source is compatible with commercial proprietary software.
![Page 29: Management of Proteomics Data · 2005. 1. 11. · Challenges in Proteome Informatics Proteome technologies evolve rapidly. Software always lags behind hardware. Software always lags](https://reader033.fdocuments.us/reader033/viewer/2022060600/60544e0fbcedf609624b03ac/html5/thumbnails/29.jpg)
Open Source Websites
www.proteomecommons.orgwww.thegpm.orghttp://www.systemsbiology.org/www.jasondunsmore.org/projectswww.bioexchange.com/tools/http://bioinformatics.icmb.utexas.edu/OPD/http://www.peptideatlas.org/
![Page 30: Management of Proteomics Data · 2005. 1. 11. · Challenges in Proteome Informatics Proteome technologies evolve rapidly. Software always lags behind hardware. Software always lags](https://reader033.fdocuments.us/reader033/viewer/2022060600/60544e0fbcedf609624b03ac/html5/thumbnails/30.jpg)
www.proteomecommons.org
Versioning system for organization and archives.Full source code and documentation downloadable.Spectra used for development and testing downloadable. Digital signatures used for security.Allows mirroring and bittorrent so users may host their own projects.Supports metainformation attached to projects.Code-in-progress accessible.
![Page 31: Management of Proteomics Data · 2005. 1. 11. · Challenges in Proteome Informatics Proteome technologies evolve rapidly. Software always lags behind hardware. Software always lags](https://reader033.fdocuments.us/reader033/viewer/2022060600/60544e0fbcedf609624b03ac/html5/thumbnails/31.jpg)
Spectrum Viewer Module in PRIME
![Page 32: Management of Proteomics Data · 2005. 1. 11. · Challenges in Proteome Informatics Proteome technologies evolve rapidly. Software always lags behind hardware. Software always lags](https://reader033.fdocuments.us/reader033/viewer/2022060600/60544e0fbcedf609624b03ac/html5/thumbnails/32.jpg)
Open Source Spectrum Viewer
Uses WebStartDisplays peak lists or spectra.Allows usual data manipulations.Generates peak lists.Allows spectrum annotation.Exports publishable-quality images.
![Page 33: Management of Proteomics Data · 2005. 1. 11. · Challenges in Proteome Informatics Proteome technologies evolve rapidly. Software always lags behind hardware. Software always lags](https://reader033.fdocuments.us/reader033/viewer/2022060600/60544e0fbcedf609624b03ac/html5/thumbnails/33.jpg)
Selected Datasets Distributed on Proteome Commons
Development datasets‘Gold Standard’ datasets
~50 eukaryotic proteins400 human proteins
Hosting/mirroring other datasets
![Page 34: Management of Proteomics Data · 2005. 1. 11. · Challenges in Proteome Informatics Proteome technologies evolve rapidly. Software always lags behind hardware. Software always lags](https://reader033.fdocuments.us/reader033/viewer/2022060600/60544e0fbcedf609624b03ac/html5/thumbnails/34.jpg)
Acknowledgements
ProteomicsMary Hurley (MPC)Eric Olsen (NRPP)Gary Rymar (NRPP)John Strahler (NRPP, MPC)Donna Veine (NRPP)Angela Walker (NRPP, MPC)Xuequn Xhu (NRPP)
NRPPRuss Finley (WSU)Brett Phinney (MSU)Trey Ideker (UCSD)Curt Wilkerson (MSU)
![Page 35: Management of Proteomics Data · 2005. 1. 11. · Challenges in Proteome Informatics Proteome technologies evolve rapidly. Software always lags behind hardware. Software always lags](https://reader033.fdocuments.us/reader033/viewer/2022060600/60544e0fbcedf609624b03ac/html5/thumbnails/35.jpg)
Acknowledgements
Software DevelopmentDavid LentzNarayani Anand Hsueling Chang Panagiotis Papoulias
Proteome InformaticsThomas Blackwell (UM)Jayson Falkner (UM)Russ Finley (WSU)Catherine Grasso(UM)Trey Ideker (UCSD)George Michailidis (UM)Panagiotis PapouliasDavid States (UM)Peter Ulintz (UM)Curt Wilkerson (MSU)
![Page 36: Management of Proteomics Data · 2005. 1. 11. · Challenges in Proteome Informatics Proteome technologies evolve rapidly. Software always lags behind hardware. Software always lags](https://reader033.fdocuments.us/reader033/viewer/2022060600/60544e0fbcedf609624b03ac/html5/thumbnails/36.jpg)
Websites
www.proteomecommons.orgwww.proteomeconsortium.orgwww.proteome.med.umich.eduwww.proteomecenter.med.umich.edu
prime.proteome.med.umich.edu