Common parameters


Page 1: Common parameters

Common parameters

• At the beginning, one needs to set up the parameters.

• http://human.thegpm.org

Page 2: Common parameters

Common parameters

• Most important: the input experimental spectra
– Self-explanatory.

Supported input formats (slide table columns: Sequest, Mascot, X!Tandem):

.DTA     X X X
.RAW     X
.MGF     X X X
.PKL     X X
.PKS     X
.mzData  X X X
.mzXML   X X X
.mzML    X

Page 3: Common parameters

Common parameters

• Taxon and database
– Self-explanatory.
– E.g. samples from human cells should be searched against a human protein database.
– Sometimes protein sequence libraries are available.

Page 4: Common parameters

Common parameters

• Parent mass tolerance
• If it is much smaller than the optimum:
– the correct peptide can be eliminated from the search space
– execution time decreases

[Figure: two spectra, intensity (%) vs. m/z (100 to 1100), with b1 to b10 and y1 to y10 fragment ion labels]

Spectra comparison: $|PM(q) - PM(t)|$
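The effect of the parent mass tolerance on the search space can be sketched in a few lines of Python. This is only an illustration: the candidate names and masses are hypothetical, and the filter simply keeps every candidate t whose parent mass satisfies |PM(q) - PM(t)| <= tolerance.

```python
def candidates_within_tolerance(query_mass, candidate_masses, tolerance):
    """Return the candidates t with |PM(q) - PM(t)| <= tolerance (all masses in Da)."""
    return [
        name
        for name, mass in candidate_masses.items()
        if abs(query_mass - mass) <= tolerance
    ]


# Hypothetical candidates and parent masses, chosen only for illustration.
candidate_masses = {"pep_A": 876.48, "pep_B": 877.20, "pep_C": 880.00}
query_mass = 876.50  # hypothetical measured parent mass; pep_A is the "correct" peptide

for tol in (0.01, 1.0, 5.0):
    print(tol, candidates_within_tolerance(query_mass, candidate_masses, tol))
```

With the smallest tolerance the correct candidate is excluded, and with the largest one every candidate has to be scored, which matches the trade-off described on the slides.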

Page 5: Common parameters

Common parameters

• Parent mass tolerance
• If it is much larger than the optimum:
– it decreases the significance of the scores
– it makes execution time longer

Spectra comparison

[Figure: frequency vs. score, showing the probability distribution of random scores and the probability distribution of correct scores, with labels A, B, T and h, and the p-value of hit h]

Page 6: Common parameters

Common parameters

• Parent mass tolerance
• Usually around 1 Da.

[Figure: two spectra, intensity (%) vs. m/z (100 to 1100), with b1 to b10 and y1 to y10 fragment ion labels]

Spectra comparison: $|PM(q) - PM(t)|$

Page 7: Common parameters

Common parameters

• Fragment ion match tolerance
– Depends on the instrument accuracy.
– If it is much smaller than the optimum, matches will be lost.


Page 8: Common parameters

Common parameters

• Fragment ion match tolerance
– If it is much smaller than the optimum:
  correctly matching peaks will be lost. This increases the FDR, increases the false negatives, and decreases the sensitivity.

[Figure: frequency vs. score, showing the probability distribution of random scores and the probability distribution of correct scores, with labels A, B, T and h, and the p-value of hit h]

Page 9: Common parameters

Common parameters

• If the fragment ion match tolerance is much larger than the optimum:
– Many theoretical peaks will match an experimental peak.
– This increases the random scores and decreases the statistical significance.

[Figure: frequency vs. score, showing the probability distribution of random scores and the probability distribution of correct scores, with labels A, B, T and h, and the p-value of hit h]
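A minimal sketch of how the fragment ion match tolerance changes the number of matched peaks. The peak lists are hypothetical, and the matching rule (a theoretical peak counts as matched if any experimental peak lies within the tolerance) is an assumption for illustration, not the scoring function of any particular search engine.

```python
def count_matched_peaks(theoretical_mz, experimental_mz, tolerance):
    """Count theoretical peaks with at least one experimental peak within tolerance (Da)."""
    return sum(
        1
        for t in theoretical_mz
        if any(abs(t - e) <= tolerance for e in experimental_mz)
    )


# Hypothetical m/z values, chosen only for illustration.
theoretical_mz = [200.0, 300.0, 400.0]
experimental_mz = [200.2, 250.0, 301.5, 399.99]

for tol in (0.05, 0.4, 2.0):
    print(tol, count_matched_peaks(theoretical_mz, experimental_mz, tol))
```

As the tolerance grows, more peaks match, including peaks that are only close by chance; this is the mechanism that raises random scores.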

Page 10: Common parameters

Common parameters

Page 11: Common parameters

Fragment ion tolerance (T)

T = 0.4 Da (correct)    T = 0.05 Da (too small)    T = 2.0 Da (too large)

Page 12: Common parameters

Fragment ion tolerance (T)

T                      Proteins   Homologs   Total proteins
0.4 Da (correct)       217        713        930
0.05 Da (too small)    132        406        538
2.0 Da (too large)     197        589        786

Page 13: Common parameters

Common parameters

• Instrument
– Some database search programs allow you to select the instrument type, such as ESI QUAD or Quad-TOF.
– This fine-tunes the search engine by determining which fragment ion series are used for scoring.
– E.g. immonium ions, a-series ions, b-, c-, x-, a-NH3, z+H series, y-H2O, etc.

Page 14: Common parameters

Common parameters

• Enzyme
– The enzyme used for enzymatic digestion in the biological sample preparation.
– It is used for the in silico digestion of protein sequences to generate peptides.
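In silico digestion can be sketched as follows. The slides do not name a specific enzyme here, so the rule below is an assumption: a trypsin-like rule that cleaves after K or R unless the next residue is P. The input is the start of the example protein sequence used later in this deck.

```python
def digest(sequence, cleave_after="KR", no_cleave_before="P"):
    """Split a protein sequence into peptides using a trypsin-like rule (assumed)."""
    peptides = []
    start = 0
    # The last residue is skipped: there is nothing after it to cleave from.
    for i in range(len(sequence) - 1):
        if sequence[i] in cleave_after and sequence[i + 1] not in no_cleave_before:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


protein_start = "MNRTFGQVVARLVSAEGDPIPEELYEMLSDHSIR"
print(digest(protein_start))
```

Each resulting peptide becomes a candidate whose theoretical spectrum is compared with the experimental spectra.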

Page 15: Common parameters

Common parameters

• E-value cutoff

[Figure: frequency vs. score, showing the probability distribution of random scores and the probability distribution of correct scores, with labels A, B, T and h, and the p-value of hit h]
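The relation between the p-value of a hit and an E-value cutoff can be illustrated with a small sketch. Two assumptions are made here: the p-value is estimated empirically as the fraction of random scores at least as high as the hit score, and the E-value is taken as the p-value multiplied by the number of candidates compared. Real search engines model the random-score distribution in their own ways, and the score values below are hypothetical.

```python
def empirical_p_value(random_scores, hit_score):
    """Fraction of random scores that are at least as high as the hit score."""
    at_least = sum(1 for s in random_scores if s >= hit_score)
    return at_least / len(random_scores)


def e_value(p_value, n_candidates):
    """Expected number of random candidates scoring at least as high as the hit."""
    return p_value * n_candidates


# Hypothetical random (incorrect) match scores, for illustration only.
random_scores = [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4]
cutoff = 0.1  # hypothetical E-value cutoff

for hit in (3, 15):
    p = empirical_p_value(random_scores, hit)
    e = e_value(p, n_candidates=len(random_scores))
    print(hit, p, e, "accepted" if e <= cutoff else "rejected")
```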

Page 16: Common parameters

Common parameters

• Ion mass search type
– Monoisotopic (default)
  • More accurate
– Average
  • Might need a larger fragment ion tolerance
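To see why the mass type matters, the sketch below computes the monoisotopic and the average mass of the peptide TFGQVVAR, which appears later in this deck. The residue masses are standard values rounded to a few decimals; only the residues needed for this peptide are included, so the table is deliberately incomplete.

```python
# (monoisotopic, average) residue masses in Da, only for the residues used below.
RESIDUE_MASSES = {
    "G": (57.02146, 57.0519),
    "A": (71.03711, 71.0788),
    "V": (99.06841, 99.1326),
    "T": (101.04768, 101.1051),
    "Q": (128.05858, 128.1307),
    "F": (147.06841, 147.1766),
    "R": (156.10111, 156.1875),
}
WATER = (18.01056, 18.0153)  # (monoisotopic, average) mass of H2O


def peptide_mass(peptide, average=False):
    """Neutral peptide mass: sum of residue masses plus one water."""
    idx = 1 if average else 0
    return sum(RESIDUE_MASSES[aa][idx] for aa in peptide) + WATER[idx]


mono = peptide_mass("TFGQVVAR")
avg = peptide_mass("TFGQVVAR", average=True)
print(round(mono, 4), round(avg, 4), round(avg - mono, 4))
```

Even for this short peptide the two mass types differ by roughly half a dalton, which is larger than many fragment ion tolerances.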

Page 17: Common parameters

Common parameters

• Charge state
– Allowing too high a charge state increases the FDR.

[Figure: frequency vs. score, showing the probability distribution of random scores and the probability distribution of correct scores, with labels A, B, T and h, and the p-value of hit h]

Page 18: Common parameters

Common parameters

• Decoy search
– Includes a reversed copy of the sequence database in the peptide identification.
– Provides more accurate p-value and FDR estimation.
– Can double the search time.
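A rough sketch of a decoy search, under two assumptions that the slides do not spell out: decoys are made by reversing each sequence, and the FDR at a score threshold is estimated as the number of decoy hits divided by the number of target hits above that threshold. The scored matches below are hypothetical.

```python
def make_decoy(sequence):
    """Reverse a sequence to create its decoy counterpart."""
    return sequence[::-1]


def estimate_fdr(matches, threshold):
    """Estimate FDR as decoy hits / target hits among matches scoring >= threshold.

    `matches` is a list of (score, is_decoy) pairs.
    """
    targets = sum(1 for score, is_decoy in matches if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


print(make_decoy("AELDLNMTR"))

# Hypothetical (score, is_decoy) pairs, for illustration only.
matches = [(32, False), (15, False), (5, True), (4, False), (4, True), (3, False)]
for threshold in (4, 10):
    print(threshold, estimate_fdr(matches, threshold))
```

Searching the decoys alongside the targets is what makes the search roughly twice as long.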

Page 19: Common parameters

Common parameters

• Error tolerant search. A large number of spectra remain without a significant score. A reasonable number of fragment ion peaks might not have matched. Possible causes:
– Underestimated mass measurement error (should be visible in the peptide view graphs),
– Incorrect determination of the precursor charge state,
– The peptide sequence is not in the database,
– Missed cleavage and unexpected cleavage,
– Unexpected chemical and post-translational modification.

Page 20: Common parameters

Input data: experimental spectra and protein sequence DB

Workflow: Input data → Peptide assignment → Validation → Protein inference → Quantitation → Interpretation

Scores: 13. 15; 6. 4; 1. 4; 9. 3; 4. 3; 3. 2; 7. 2; 11. 2; 8. 1; 10. 1; 2. 1; 5. 1; 12. 1

Score: 4    Peptide: AELDLNMTR
Score: 32   Peptide: SHLITLLLFLFHSETICR
Score: 3    Peptide: MEICRGLR
Score: 15   Peptide: LLHGDPGEEDK
Score: 4    Peptide: MDHPEDESHSEK
Score: 5    Peptide: SAEDLEADK
Score: 3    Peptide: SIEAKLTLR

Cn = (32 - 4)/32 = 0.875
Cn = (4 - 4)/4 = 0
Cn = (3 - 3)/3 = 0
Cn = (15 - 4)/15 = 0.733

Keep the peptide assignments whose Cn exceeds a certain limit.
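The Cn values on this slide all follow the same pattern: (best score - second-best score) / best score. A short sketch that reproduces them; the function name and the cutoff of 0.1 are hypothetical, chosen only for illustration.

```python
def delta_cn(scores):
    """Relative gap between the best and second-best candidate scores."""
    ranked = sorted(scores, reverse=True)
    best, second = ranked[0], ranked[1]
    return (best - second) / best


# Top-two candidate scores taken from the slide.
examples = [[32, 4], [4, 4], [3, 3], [15, 4]]
cutoff = 0.1  # hypothetical limit

for scores in examples:
    cn = delta_cn(scores)
    print(scores, round(cn, 3), "keep" if cn > cutoff else "discard")
```

A Cn of 0 means the two best candidates are indistinguishable, so the assignment is not trustworthy even if the top score looks acceptable.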

Page 21: Common parameters

Protein sequence DB:

>IPI:IPI00000044.1|SWISS-PROT:P01127
MNRTFGQVVARLVSAEGDPIPEELYEMLSDHSIRSFDDLQRLLHGDPGEEDKAELDLNMTRSHSGGELESLARGRRSLGSLTIAEPAMIAECKTRTEVFEISRRLIDRTNANFLVWPPCVEVQRCSGCCNNRNVQCRPTQVQLRPVQVRKIEIVRKKPIFKKATVTLEDHLACKCETVAAARPVTRSPGGSQEQRAKTPQTRVTIRTVRVRRPPKGKHRKFKHTHDKTALKETLGA

Input data: experimental spectra

Scores: 1. 2

Workflow: Input data → Peptide assignment → Validation → Protein inference → Quantitation → Interpretation

Spectra comparison:

[Figure: two spectra, intensity (%) vs. m/z (100 to 1100), with b1 to b10 and y1 to y10 fragment ion labels]

Unexpected cleavages:

TFGQVVAR FGQVVAR GQVVAR QVVAR VVAR VAR AR TFGQVVA TFGQVV TFGQV TFGQ TFG TF
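The list of unexpected-cleavage candidates on this slide can be generated mechanically: start from the expected peptide, then truncate it one residue at a time from the N-terminus and from the C-terminus. In this sketch the function name and the minimum length of 2 are assumptions chosen to reproduce the slide's list.

```python
def truncation_ladder(peptide, min_length=2):
    """Return the peptide plus all N-terminal and C-terminal truncations."""
    n_terminal = [peptide[i:] for i in range(1, len(peptide) - min_length + 1)]
    c_terminal = [peptide[:j] for j in range(len(peptide) - 1, min_length - 1, -1)]
    return [peptide] + n_terminal + c_terminal


print(" ".join(truncation_ladder("TFGQVVAR")))
```

Every expected peptide gains many extra candidates this way, which is why allowing unexpected cleavages enlarges the search space so quickly.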

Page 22: Common parameters

Protein sequence DB:

>IPI:IPI00000044.1|SWISS-PROT:P01127
MNRCWALFLSLCCYLRLVSAEGDPIPEELYEMLSDHSIRSFDDLQRLLHGDPGEEDKAELDLNMTRSHSGGELESLARGRRSLGSLTIAEPAMIAECKTRTEVFEISRRLIDRTNANFLVWPPCVEVQRCSGCCNNRNVQCRPTQVQLRPVQVRKIEIVRKKPIFKKATVTLEDHLACKCETVAAARPVTRSPGGSQEQRAKTPQTRVTIRTVRVRRPPKGKHRKFKHTHDKTALKETLGA

Input data: experimental spectra

Scores: 1. 2

Workflow: Input data → Peptide assignment → Validation → Protein inference → Quantitation → Interpretation

Spectra comparison:

[Figure: two spectra, intensity (%) vs. m/z (100 to 1100), with b1 to b10 and y1 to y10 fragment ion labels]

Missed cleavages

Page 23: Common parameters

Protein sequence DB:

>IPI:IPI00000044.1|SWISS-PROT:P01127
MNRCWALFLSLCCYLRLVSAEGDPIPEELYEMLSDHSIRSFDDLQRLLHGDPGEEDKAELDLNMTRSHSGGELESLARGRRSLGSLTIAEPAMIAECKTRTEVFEISRRLIDRTNANFLVWPPCVEVQRCSGCCNNRNVQCRPTQVQLRPVQVRKIEIVRKKPIFKKATVTLEDHLACKCETVAAARPVTRSPGGSQEQRAKTPQTRVTIRTVRVRRPPKGKHRKFKHTHDKTALKETLGA

Input data: experimental spectra

Scores: 1. 2; 2. 2

Workflow: Input data → Peptide assignment → Validation → Protein inference → Quantitation → Interpretation

Spectra comparison:

[Figure: two spectra, intensity (%) vs. m/z (100 to 1100), with b1 to b10 and y1 to y10 fragment ion labels]

Missed cleavages

Page 24: Common parameters

Protein sequence DB:

>IPI:IPI00000044.1|SWISS-PROT:P01127
MNRCWALFLSLCCYLRLVSAEGDPIPEELYEMLSDHSIRSFDDLQRLLHGDPGEEDKAELDLNMTRSHSGGELESLARGRRSLGSLTIAEPAMIAECKTRTEVFEISRRLIDRTNANFLVWPPCVEVQRCSGCCNNRNVQCRPTQVQLRPVQVRKIEIVRKKPIFKKATVTLEDHLACKCETVAAARPVTRSPGGSQEQRAKTPQTRVTIRTVRVRRPPKGKHRKFKHTHDKTALKETLGA

Input data: experimental spectra

Scores: 1. 2; 2. 2; 3. 1

Workflow: Input data → Peptide assignment → Validation → Protein inference → Quantitation → Interpretation

Spectra comparison:

[Figure: two spectra, intensity (%) vs. m/z (100 to 1100), with b1 to b10 and y1 to y10 fragment ion labels]

Missed cleavages
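Missed cleavages can be modelled by joining neighbouring peptides from a perfect digest. The sketch below assumes a trypsin-like rule (cleave after K or R, but not before P), which the slides do not state explicitly, and uses the beginning of the protein sequence shown above.

```python
def digest(sequence, cleave_after="KR", no_cleave_before="P"):
    """Perfect digest with a trypsin-like rule (assumed)."""
    peptides, start = [], 0
    for i in range(len(sequence) - 1):
        if sequence[i] in cleave_after and sequence[i + 1] not in no_cleave_before:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def with_missed_cleavages(peptides, max_missed):
    """Join up to max_missed + 1 consecutive peptides from a perfect digest."""
    result = []
    for i in range(len(peptides)):
        for k in range(max_missed + 1):
            if i + k < len(peptides):
                result.append("".join(peptides[i:i + k + 1]))
    return result


perfect = digest("MNRCWALFLSLCCYLRLVSAEGDPIPEELYEMLSDHSIR")
for max_missed in (0, 1, 2):
    print(max_missed, with_missed_cleavages(perfect, max_missed))
```

Each additional allowed missed cleavage adds more candidate peptides per protein, so the setting trades sensitivity against search time and significance.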

Page 25: Common parameters

Common parameters

• Automatic error tolerant search.
• Chemical and post-translational modifications (PTMs)
• Fixed modification (simply modifies the mass of the amino acid)
• Variable modifications (can modify the mass)
• Search engines iteratively insert all combinations of the possible PTMs.
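The combinatorial growth caused by variable modifications can be illustrated with a small sketch. The choice of modifications (oxidation of M, phosphorylation of S) and the rounded mass shifts are assumptions for this example; the peptide is one of those listed earlier in the deck.

```python
from itertools import combinations


def modification_combinations(peptide, variable_mods):
    """Enumerate every subset of modifiable sites with its total mass shift (Da).

    `variable_mods` maps a residue letter to the mass shift of its modification.
    Returns a list of (modified_positions, total_shift) pairs.
    """
    sites = [(i, aa) for i, aa in enumerate(peptide) if aa in variable_mods]
    combos = []
    for r in range(len(sites) + 1):
        for chosen in combinations(sites, r):
            shift = sum(variable_mods[aa] for _, aa in chosen)
            combos.append((tuple(i for i, _ in chosen), shift))
    return combos


# Assumed variable modifications: oxidation (M) and phosphorylation (S).
variable_mods = {"M": 15.9949, "S": 79.9663}
combos = modification_combinations("MDHPEDESHSEK", variable_mods)
print(len(combos))
for positions, shift in combos:
    print(positions, round(shift, 4))
```

With n modifiable sites there are 2^n variants of a single peptide, which explains why a long list of variable modifications slows the search and weakens its statistics.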

Page 26: Common parameters

Common parameters

• Automatic error tolerant search.

– More peptides can be identified.
– Enlarges the search space considerably:
  • Increases the execution time
  • Decreases the statistical significance, increases the FDR.

Page 27: Common parameters

Common parameters

• Automatic error tolerant search.
• To reduce the search space, a two-pass approach is applied.
– 1st pass:
  • Identification of perfect peptides (no PTMs, perfect digestion)
– 2nd pass:
  • Pass on the proteins for which at least one peptide was identified in the 1st pass.
  • Extensive search in the reduced set of protein sequences, including missed and unexpected cleavages, PTMs, point mutations, etc.
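The database reduction between the two passes can be sketched as a simple filter. The protein names, the second sequence, and the function name are hypothetical; the point is only that the second, more expensive pass sees just those proteins with at least one first-pass peptide.

```python
def second_pass_database(proteins, first_pass_peptides):
    """Keep only proteins containing at least one peptide identified in the 1st pass."""
    return {
        name: sequence
        for name, sequence in proteins.items()
        if any(peptide in sequence for peptide in first_pass_peptides)
    }


# Hypothetical mini database, for illustration only.
proteins = {
    "prot_1": "MNRTFGQVVARLVSAEGDPIPEELYEMLSDHSIR",
    "prot_2": "MKAAAGGGRLLLK",
}
first_pass_peptides = {"TFGQVVAR"}  # peptide assumed to be identified in the 1st pass

reduced = second_pass_database(proteins, first_pass_peptides)
print(sorted(reduced))
```

The extensive search (missed and unexpected cleavages, PTMs, point mutations) then runs only against `reduced`, which keeps the enlarged search space manageable.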

Page 28: Common parameters

Common parameters

• Output parameters
– Mainly about formatting the result files: which details, and how many of them, you want to see.

Page 29: Common parameters

Common parameters

• Other program-specific parameters.
• Different for X!Tandem, Mascot, Sequest, etc.

Page 30: Common parameters

X!Tandem

Page 31: Common parameters

Outputs – Browsing the results

Page 32: Common parameters

Outputs – Browsing the results

Page 33: Common parameters

Outputs – Browsing the results

Page 34: Common parameters

Outputs – Browsing the results

Page 35: Common parameters

Outputs – Browsing the results

Page 36: Common parameters

OMSSA’s search engine

Page 37: Common parameters

OMSSA’s output

Page 38: Common parameters

OMSSA’s result

Page 39: Common parameters

• Good spectrum, good score, bad annotation
– Rare if the p-value is significant
• Good spectrum, bad score, bad annotation
– The peptide might be modified, imperfectly digested, or not in the database.

Page 40: Common parameters

• Bad spectrum, bad score, bad annotation

Page 41: Common parameters

• Good spectrum, good score, good annotation

Page 42: Common parameters

Trans-Proteomic Pipeline (TPP)

• The Trans-Proteomic Pipeline (TPP) is a data analysis pipeline for LC/MS/MS proteomics data.
• TPP includes modules for validation of database search results, quantitation of isotopically labeled samples, and validation of protein identifications, as well as tools for viewing raw LC/MS data, peptide identification results, and protein identification results.
• The XML backbone of this pipeline enables uniform analysis of LC/MS/MS data generated by a wide variety of mass spectrometer types, with peptides assigned by a wide variety of database search engines.

Page 43: Common parameters

Trans-Proteomic Pipeline (TPP)

Page 44: Common parameters

Summary

• Protein identification from MS/MS data is not a black box.

• Always look at the results and understand how it