Date posted: 20-Dec-2015

Identification of variables and parameters for protein data analysis in clinical diagnostics

David Yang

Leighton Ing

Mentor: Dr. Tina Xiao

JPL/NASA

Proteomics

• National Cancer Institute and Early Detection Research Network: clinical diagnostics
• Analyzing protein signatures for general characterization of normal vs. pathogenic states

Project Goals

• Characterize the experimental variables which affect mass spectrometry (MS) output and the necessary steps of MS data processing
  – What influences output, and how do we correct for those influences?
  – What information do other users need?
• Identify parameters for software evaluation in the processing of MS data

Methodology

• Research a method of protein analysis
• Research the mechanics
• Analyze how the mechanics influence the output
• Recognize data important to other users
• Identify the data processing steps for extracting a useful spectrum

Method of Protein Analysis

• Mass spectrometry
  – Measures the quantity of molecules with specific mass-to-charge ratios
  – Produces output which could be used as a protein signature
• Matrix-Assisted Laser Desorption/Ionization Time of Flight (MALDI-TOF) for protein analysis

Matrix-Assisted Laser Desorption/Ionization (MALDI)

[Figure: MALDI schematic. Labels: light, mass analyzer, protein sample]

Time of Flight (TOF)

• Ionized particles are accelerated by an electric field
• Flight time to the detector depends on the mass-to-charge ratio
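As a rough illustration of the time-of-flight principle, the sketch below assumes an idealized linear TOF tube: an ion of mass m and charge z·e accelerated through a potential U gains kinetic energy z·e·U and then drifts a length L at constant velocity. The function name, the 20 kV voltage, and the 1 m drift length are illustrative assumptions, not instrument settings from this project.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19   # coulombs
DALTON = 1.66053906660e-27            # kilograms per dalton

def flight_time(mass_da, charge=1, voltage=20_000.0, drift_length=1.0):
    """Idealized drift time in seconds.

    From z*e*U = m*v^2 / 2 it follows that t = L * sqrt(m / (2*z*e*U)).
    Voltage (V) and drift length (m) defaults are assumed values.
    """
    mass_kg = mass_da * DALTON
    energy = charge * ELEMENTARY_CHARGE * voltage
    return drift_length * math.sqrt(mass_kg / (2.0 * energy))

# Heavier ions arrive later: flight time scales with sqrt(m/z).
t_light = flight_time(1_000.0)
t_heavy = flight_time(4_000.0)
```

Under these assumptions, quadrupling the mass at fixed charge doubles the flight time, which is why the raw time axis can later be converted to m/z.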

MALDI-TOF-MS

• MALDI-TOF mass spectrometry of a protein sample has three elements with parameters that influence output
• Inconsistencies between them reduce the ability to compare samples
  – They produce variation which is not necessarily caused by the protein composition of the sample

Sample

• Freeze/thaw cycles
• Source of sample
  – Serum vs. tissue
• Fractionated?
• Digested with protease?

Laser Ionization/Desorption

• Plate and matrix used in LDI
• Crystallization pattern
• Laser intensity

Plate and Matrix

Crystallization

• Randomized process
• Introduces variation between shots

Mass analyzer

• Mass calibration
  – Internal vs. external
• Reflectron usage
• Detector voltage
• Detector saturation

Mass Calibration

• Internal: sample + standard (measured together)
• External: sample and standard (measured separately)

Reflectron

Output Processing

• Understanding the mechanics tells us what we need to do to process the output
• Usability of raw output for protein signature comparison is limited

Baseline Correction

High kinetic energy (KE) ions saturate the detector, resulting in an elevated intensity baseline

Malyarenko et al. Enhancement of Sensitivity and Resolution of Surface-Enhanced Laser Desorption/Ionization Time-of-Flight Mass Spectrometric Records for Serum Peptides Using Time-Series Analysis Techniques
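To make the idea of baseline correction concrete, here is a minimal sketch that estimates the baseline as a rolling minimum and subtracts it. This is a deliberately simple stand-in, not the time-series technique of Malyarenko et al.; the function name and the window size are assumptions chosen for illustration.

```python
import numpy as np

def subtract_baseline(intensity, window=51):
    """Subtract a rolling-minimum baseline estimate from a spectrum.

    `window` is the number of points (odd) used for each local minimum;
    the default is an arbitrary illustrative value.
    """
    y = np.asarray(intensity, dtype=float)
    half = window // 2
    # Pad with edge values so every point has a full window around it.
    padded = np.pad(y, half, mode="edge")
    baseline = np.array([padded[i:i + window].min() for i in range(y.size)])
    return y - baseline

# Synthetic spectrum: decaying baseline plus one narrow peak at index 100.
x = np.arange(200)
raw = 100.0 * np.exp(-x / 50.0) + 50.0 * np.exp(-((x - 100) ** 2) / 8.0)
corrected = subtract_baseline(raw)
```

The window should be wider than the widest real peak, otherwise the peak itself is treated as baseline and flattened.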

Mass Calibration

Required to convert time series output into m/z ratio

Time    Intensity
1       2183
2       2152
3       2118
4       2115
5       2086

M/Z          Intensity
9.9294487    2183
9.9375644    2152
9.9455109    2118
9.9532881    2115
9.9608962    2086
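One common way to perform this conversion, sketched here under the assumption of an ideal TOF relation in which sqrt(m/z) is linear in flight time, is to fit the two constants from peaks of known mass. The function names and the synthetic calibrant values are hypothetical; they do not reproduce the constants behind the table above, which the slides do not give.

```python
import numpy as np

def fit_calibration(times, known_mz):
    """Least-squares fit of sqrt(m/z) = a*t + b from calibrant peaks."""
    t = np.asarray(times, dtype=float)
    root_mz = np.sqrt(np.asarray(known_mz, dtype=float))
    a, b = np.polyfit(t, root_mz, 1)
    return a, b

def time_to_mz(times, a, b):
    """Convert flight times to m/z with the fitted constants."""
    return (a * np.asarray(times, dtype=float) + b) ** 2

# Synthetic calibrants generated from made-up constants (assumption).
true_a, true_b = 0.05, 0.2
cal_times = np.array([100.0, 200.0, 300.0])
cal_mz = (true_a * cal_times + true_b) ** 2

a, b = fit_calibration(cal_times, cal_mz)
mz_at_150 = float(time_to_mz(150.0, a, b))
```

With internal calibration the calibrant peaks come from the same spectrum as the sample; with external calibration the constants are fitted on a separate standard and then applied to the sample.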

Normalization

Scale the intensities based on the largest intensity

Improves ability to compare samples by reducing the variability of intensity between spectra

www.psrc.usm.edu/mauritz/maldi.html
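Scaling by the largest intensity can be sketched in a few lines. The function name is an assumption, and the input values are simply the five intensities from the mass calibration slide, used as sample data.

```python
import numpy as np

def normalize_to_max(intensity):
    """Scale a spectrum so that its largest intensity becomes 1.0."""
    y = np.asarray(intensity, dtype=float)
    peak = y.max()
    if peak <= 0:
        raise ValueError("spectrum has no positive intensity to scale by")
    return y / peak

scaled = normalize_to_max([2183, 2152, 2118, 2115, 2086])
```

After this step, two spectra recorded at different overall signal levels share the same 0 to 1 intensity range.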

Smoothing

Decreases the effects of electrical system noise
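A moving average is one of the simplest smoothing filters and serves here only as an illustration; the slides do not say which filter was used, and the function name and window size are assumptions.

```python
import numpy as np

def smooth(intensity, window=5):
    """Moving-average smoothing that keeps the spectrum length unchanged."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    y = np.asarray(intensity, dtype=float)
    half = window // 2
    # Edge padding avoids the artificial dip that zero padding would cause.
    padded = np.pad(y, half, mode="edge")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")

noisy = np.array([0.0, 2.0, 0.0, 2.0, 0.0, 2.0, 0.0, 2.0])
smoothed = smooth(noisy, window=3)
```

Wider windows suppress more noise but also broaden and lower real peaks, so the window is usually kept narrower than a typical peak.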

Peak detection

• Identifies potential masses
• Reduces the number of features which need to be compared

Where am I?
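A minimal sketch of peak detection, assuming a peak is any local maximum above a height threshold. This is far simpler than the wavelet-based approach cited later in the deck (Do, 2006); the function name and threshold are hypothetical.

```python
import numpy as np

def find_peaks_simple(intensity, min_height):
    """Return indices of local maxima whose intensity is at least min_height."""
    y = np.asarray(intensity, dtype=float)
    middle = y[1:-1]
    is_peak = (middle > y[:-2]) & (middle >= y[2:]) & (middle >= min_height)
    # +1 converts positions in the trimmed array back to original indices.
    return np.flatnonzero(is_peak) + 1

spectrum = [0, 1, 5, 1, 0, 2, 8, 3, 0]
peak_indices = find_peaks_simple(spectrum, min_height=4)
```

Here the nine-point spectrum is reduced to two features, at indices 2 and 6, which is the kind of reduction the slide describes.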

Peak alignment

Aligns corresponding peaks across samples

Reduces phase variation across samples by ensuring that the same peptides share a common set of peak locations
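One simple way to picture alignment, assuming a reference list of peak locations is available, is to snap each detected peak to the nearest reference location within a tolerance. The function name, the reference values, and the tolerance are illustrative assumptions rather than the method used in the cited work.

```python
import numpy as np

def align_peaks(peaks_mz, reference_mz, tolerance):
    """Snap each peak to the nearest reference m/z if within tolerance."""
    ref = np.asarray(reference_mz, dtype=float)
    aligned = []
    for mz in peaks_mz:
        nearest = float(ref[np.argmin(np.abs(ref - mz))])
        # Peaks with no reference nearby are left where they are.
        aligned.append(nearest if abs(nearest - mz) <= tolerance else float(mz))
    return aligned

aligned = align_peaks([1000.4, 2001.1, 3500.0], [1000.0, 2000.0], tolerance=0.5)
```

Only the first peak lies within 0.5 of a reference location, so it is the only one moved; the other two keep their measured positions.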

Averaging of spectra

• Addresses variability between runs by averaging replicates
  – Recall crystallization and shot variability
• Averaging of multiple laser shots is often performed by the machine
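Averaging replicates can be sketched as a point-wise mean, assuming all replicate spectra have already been placed on the same m/z axis. The function name and the toy intensities are made up for illustration.

```python
import numpy as np

def average_spectra(replicates):
    """Point-wise mean of replicate spectra sharing one m/z axis."""
    stack = np.asarray(replicates, dtype=float)
    if stack.ndim != 2:
        raise ValueError("expected a 2-D array: replicates x points")
    return stack.mean(axis=0)

mean_spectrum = average_spectra([[10, 50, 12], [14, 46, 8], [12, 54, 10]])
```

If the replicates are not on a common axis, calibration and alignment have to come first, otherwise averaging blurs the peaks.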

Results

• Identified vital information that affects the output of the machine
  – Information useful for a researcher using the spectra
• Researched the processes which make the output more useful as a protein signature
• Next step: identify parameters for software evaluation in MS data processing

Goal

• Identify parameters for evaluating software capabilities in the processing and analysis of mass spectrometry data
• Three candidates
  – VIBE (Incogen Inc.)
  – geWorkbench (Forge)
  – S-PLUS (Insightful Corp.)

Software Evaluation

• General parameters
• Input formats
• Algorithms for processing and analysis of proteomics data
• Results
• Benefits
• Limitations

General Parameters

• Platform/operating system compatibility?
• Is the software open source?
• Is the software capable of performing the necessary tasks independently?
  – Additional modifications?
  – Internet access?
  – Server?

Data Input

• What types of file formats can the software open? Import?
• What type of format must the data be?
  – DNA (nucleotides: A, T, G, C)
  – Proteins (amino acids: M, L, A, I, etc.)

Software algorithms necessary for Proteomics data analysis

Can the software perform:
• Baseline subtractions?
• Mass calibrations?
• Noise reductions?
• Peak identifications?
• Normalization?
• Peak alignments?
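The checklist above can be written down as a small data structure so that each candidate package is scored the same way. This is only a sketch: `REQUIRED_STEPS`, `missing_steps`, and the capability set for the made-up "ExampleTool" are hypothetical and say nothing about the packages evaluated in this project.

```python
REQUIRED_STEPS = (
    "baseline subtraction",
    "mass calibration",
    "noise reduction",
    "peak identification",
    "normalization",
    "peak alignment",
)

def missing_steps(supported):
    """List the required processing steps a package does not support."""
    supported_lower = {step.lower() for step in supported}
    return [step for step in REQUIRED_STEPS if step not in supported_lower]

# Capabilities of a fictional package, for illustration only.
example_tool = {"Normalization", "Noise reduction", "Peak identification"}
gaps = missing_steps(example_tool)
```

An empty result would mean the package covers all six processing steps named on the slide.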

Baseline Subtraction

(Malyarenko, et al. 2005)

Mass Calibration

(Kearsley, et al. 2005)

Smoothing/Noise Reduction

(Malyarenko, et al. 2005)

Peak Identifications

(Do, 2006)

Normalization

(Kearsley, et al. 2005)

Peak Alignments

(Malyarenko, et al. 2005)

Results

• Visualization of results
  – How can you visualize the data?
• Save/export work
  – Can you save/export your results?
  – If yes, what formats can it save/export?
  – Once saved, can the files be opened by other software packages?
• Print out
  – Can you print out a hard copy for record?

Visualization

MUSCLE (Edgar)

VIBE (Incogen Inc.)

Software Benefits

• What benefits does the software offer?
  – Convenience of integrated modules
  – Efficient: saves the "man-power" of having to sit there and do everything
  – User-friendly interface

Convenience of Integrated Modules

Efficiency

User-friendly Interface?

Software Limitations

• Limitations on customization
  – Small modifications to existing modules?
  – Adding a new module?
• Internet/server dependent?

Conclusion

• We have identified these parameters to be crucial for the processing of MS data:
  – Baseline subtractions
  – Mass calibrations
  – Noise reductions
  – Peak identifications
  – Normalization
  – Peak alignments

Conclusion

• VIBE
  – Capable of manipulating protein sequences, but unable to process raw data
• geWorkbench
  – Did not pass general parameters for installation
• S-PLUS
  – Evaluation still in progress…

VIBE (by Incogen Inc.)

• Convenient integration of nucleotide and amino acid analysis tools
  – BLAST (–X, –N, –P, TBLASTN, TBLASTP)
  – Nucleotide and AA search: FASTA, –X, –Y, Smith-Waterman, etc.
  – Sequence manipulations: Primer3, Conditional Filters, Translations, etc.
  – Sequence alignments: Crossmatch, ClustalW, Hidden Markov Model, etc.

Literature Citations

1) Do, P. Improved Peak Detection in Mass Spectrometry Spectrum by Incorporating Continuous Wavelet Transform-based Pattern Matching. Robert H. Lurie Comprehensive Cancer Center, Northwestern University. PPT slides. 2006.

2) Kearsley, A., Wallace, W.E., Bernal, J., and C.M. Guttman. A numerical method for mass spectral data analysis. Applied Mathematics Letters. 18:1412–1417. 2005.

3) Malyarenko, D.I., Cooke, W.E., Adam, B.-L., Malik, G., Chen, H., Tracy, E.R., Trosset, M.W., Sasinowski, M., Semmes, O.J., and D.M. Manos. Enhancement of Sensitivity and Resolution of Surface-Enhanced Laser Desorption/Ionization Time-of-Flight Mass Spectrometric Records for Serum Peptides Using Time-Series Analysis Techniques. Clinical Chemistry. 51(1):65–74. 2005.

Acknowledgements

• Jet Propulsion Laboratory
  – Dr. Tina Xiao
• Southern California Bioinformatics Summer Institute (SoCalBSI)
  – Dr. Sandra Sharp
  – Dr. Jamil Momand
  – Dr. Wendie Johnston
  – Dr. Nancy Warter-Perez
  – Ronnie Cheng
  – Friends
• Duke University Medical Center
  – Dr. Simon Lin
• Centers for Disease Control and Prevention (CDC)
  – Dr. R. Cameron Craddock
• Huntington Medical Research Institute (HMRI)
  – Dr. James Riggins
  – Dr. Alfred Fonteh