Information-Theoretic Mass Spectral Library Search
1
Information-Theoretic Mass Spectral Library Search
Arvind Visvanathan
CSCE 990 – Seminar in Multi-Dimensional Chromatography Systems, Informatics, and Applications
Information-Theoretic Mass Spectral Library Search CSCE 990 – GCxGC Seminar
2
Outline
• Introduction
  – Mass spectrum search types
• Related Work
  – Other techniques: NIST, PBM, DotMap
• Method
  – Probability and information
  – Normalized distribution function
• Results
• Conclusion
3
Introduction – Mass Spectrum
[Figure: mass spectrum of decane, intensity vs. m/z]
4
Introduction – Mass Spectrum Search
[Diagram: unknown spectrum and MS library → search algorithm → potential matches]
5
Introduction – Search Types
• Identity search
  – Unknown mass spectrum is present in the library
  – Looking for the exact spectrum
• Similarity search
  – Unknown mass spectrum is not present in the library
  – Looking for similar spectra
6
Introduction – MS Search Applications
• Steroid detection in athletes
• Monitoring patient breath during surgery
• Determining the composition of molecular species found in space
• Detecting honey adulterated with corn syrup
• Locating oil deposits
• Monitoring fermentation processes in the biotechnology industry
• Detecting dioxins in contaminated fish
7
Related Work – NIST MS-Search [Stein ‘94]
• Pre-search the unknown spectrum against the library
  – Reduces the search domain (160K → 4K compounds)
• Compute a match factor for each compound in the pre-search result
• Match factor (MF)
  – Range 0-999
  – Higher is better
• Sort the pre-search result by MF value
• Pick the top-ranked compounds as possible matches
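The pre-search, score, and sort steps on this slide can be sketched in Python. This is a minimal sketch under stated assumptions, not NIST's implementation: the pre-search rule (keep compounds that share at least one of the unknown's three largest peaks) and the cosine-based placeholder score are illustrative choices, and the names `library_search`, `top_peaks`, and `cosine_mf` are hypothetical.

```python
import math
from typing import Callable, Dict, List, Set, Tuple

Spectrum = Dict[int, float]  # m/z -> intensity


def top_peaks(spec: Spectrum, k: int) -> Set[int]:
    """m/z values of the k most intense peaks in a spectrum."""
    ranked = sorted(spec.items(), key=lambda peak: peak[1], reverse=True)
    return {mz for mz, _ in ranked[:k]}


def cosine_mf(unknown: Spectrum, lib: Spectrum) -> float:
    """Placeholder match factor: cosine similarity scaled to 0-999 (not the NIST MF)."""
    dot = sum(a * lib.get(mz, 0.0) for mz, a in unknown.items())
    norm_u = math.sqrt(sum(a * a for a in unknown.values()))
    norm_l = math.sqrt(sum(a * a for a in lib.values()))
    if norm_u == 0 or norm_l == 0:
        return 0.0
    return 999.0 * dot / (norm_u * norm_l)


def library_search(
    unknown: Spectrum,
    library: Dict[str, Spectrum],
    score: Callable[[Spectrum, Spectrum], float] = cosine_mf,
    k_peaks: int = 3,
    min_shared: int = 1,
    n_hits: int = 5,
) -> List[Tuple[str, float]]:
    # 1. Pre-search (hypothetical rule): keep compounds whose major peaks
    #    overlap the unknown's major peaks, shrinking the search domain.
    major = top_peaks(unknown, k_peaks)
    candidates = [
        name for name, spec in library.items()
        if len(major & top_peaks(spec, k_peaks)) >= min_shared
    ]
    # 2. Compute a match factor for every pre-search candidate.
    hits = [(name, score(unknown, library[name])) for name in candidates]
    # 3. Sort by match factor, highest first, and keep the top hits.
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:n_hits]


library = {
    "A": {35: 100, 45: 999, 55: 200},
    "B": {41: 999, 43: 500, 57: 300},
    "C": {35: 50, 45: 400, 77: 999},
}
print(library_search(library["A"], library))
```

Here compound B shares none of A's major peaks, so the pre-search drops it before any match factor is computed. The actual NIST pre-search criteria are not described on the slide.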
8
Related Work – NIST MS-Search [Stein ‘94]
• Match factor computation [Stein ‘94]
  – Term 1 (F1): mass-weighted normalized dot product
  – Term 2 (F2): relative intensities of adjacent peaks in both spectra
  – MF: combination of F1 and F2
9
Related Work – NIST MS-Search [Stein ‘94]
Example: compounds C-1 and C-2

m/z   C-1   C-2
35    100   100
36    1     1
37    1     2
45    999   999
55    200   200

      Compare C-1 & C-1   Compare C-1 & C-2
F1    999                 999
F2    999                 824
MF    999                 925
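The two terms and their combination can be sketched as follows, using the commonly cited form of Stein's composite score: F1 is a squared, normalized dot product of mass-weighted intensities, and F2 is the mean agreement of adjacent-peak intensity ratios over peaks shared by both spectra. The weighting exponents (intensity^0.5, (m/z)^3), averaging F2 over adjacent pairs, and weighting the combination by peak counts are assumptions the slides do not specify. This sketch should therefore not be expected to reproduce the F2 and MF values in the example table exactly.

```python
from typing import Dict

Spectrum = Dict[int, float]  # m/z -> intensity

# Assumed weighting exponents; the slides do not give them.
INTENSITY_EXP = 0.5
MASS_EXP = 3.0


def weights(spec: Spectrum) -> Dict[int, float]:
    """Mass-weighted peak values W = intensity**INTENSITY_EXP * (m/z)**MASS_EXP."""
    return {mz: (a ** INTENSITY_EXP) * (mz ** MASS_EXP)
            for mz, a in spec.items() if a > 0}


def f1_dot(lib: Spectrum, unk: Spectrum) -> float:
    """F1: squared normalized dot product of the mass-weighted spectra (0..1)."""
    wl, wu = weights(lib), weights(unk)
    num = sum(w * wu.get(mz, 0.0) for mz, w in wl.items()) ** 2
    den = sum(w * w for w in wl.values()) * sum(w * w for w in wu.values())
    return num / den if den > 0 else 0.0


def f2_ratio(lib: Spectrum, unk: Spectrum) -> float:
    """F2: agreement of intensity ratios of adjacent shared peaks (0..1)."""
    wl, wu = weights(lib), weights(unk)
    common = sorted(set(wl) & set(wu))
    if len(common) < 2:
        return 0.0  # no adjacent pair to compare
    terms = []
    for prev, cur in zip(common, common[1:]):
        r = (wl[cur] * wu[prev]) / (wl[prev] * wu[cur])
        terms.append(min(r, 1.0 / r))  # penalize disagreement in either direction
    return sum(terms) / len(terms)


def composite_mf(lib: Spectrum, unk: Spectrum) -> float:
    """MF on a 0-999 scale: F1 and F2 averaged, weighted by peak counts."""
    f1 = f1_dot(lib, unk)
    n_common = len(set(weights(lib)) & set(weights(unk)))
    if n_common < 2:
        return 999.0 * f1  # F2 undefined; fall back to F1 alone
    n_unk = len(weights(unk))
    return 999.0 * (n_unk * f1 + n_common * f2_ratio(lib, unk)) / (n_unk + n_common)


C1 = {35: 100, 36: 1, 37: 1, 45: 999, 55: 200}
C2 = {35: 100, 36: 1, 37: 2, 45: 999, 55: 200}
for name, unk in (("C-1 & C-1", C1), ("C-1 & C-2", C2)):
    print(name, round(999 * f1_dot(C1, unk)), round(999 * f2_ratio(C1, unk)),
          round(composite_mf(C1, unk)))
```

For identical spectra both terms reach their maximum. Doubling the small peak at m/z 37 barely moves F1 but lowers F2, which is the qualitative behavior the example table shows.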
10
Related Work – Probability Based Matching [McLafferty et al. ‘75]
• Confidence value (K) instead of MF
• Four components for each m/z
  – Term 1: U, based on the uniqueness of an m/z value
  – Term 2: A, intensity contribution to the confidence
  – Term 3: W, window factor (measure of agreement)
  – Term 4: D, dilution factor (measure of purity)
  – $K = \sum_{m/z} (U + A + W - D)$
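The slide gives only the structure of the PBM confidence value, K = ∑ (U + A + W − D) over m/z values, not the formulas for the individual terms. The sketch below therefore takes precomputed term values as input and only shows the aggregation. `PbmTerms` and `confidence_k` are hypothetical names, and the numbers in the example are arbitrary.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PbmTerms:
    """Per-m/z PBM components; their formulas are not given on the slide."""
    mz: int
    u: float  # uniqueness of the m/z value
    a: float  # intensity contribution to the confidence
    w: float  # window factor (measure of agreement)
    d: float  # dilution factor (measure of purity), subtracted


def confidence_k(terms: Iterable[PbmTerms]) -> float:
    """K = sum over m/z of (U + A + W - D)."""
    return sum(t.u + t.a + t.w - t.d for t in terms)


# Arbitrary illustrative values for two m/z channels
terms = [PbmTerms(45, 4.0, 2.0, 1.0, 0.5), PbmTerms(55, 3.0, 1.5, 1.0, 0.0)]
print(confidence_k(terms))
```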
11
Related Work – DotMap [Sinovec et al. ‘04]
[Figure: DotMaps for fumaric acid, adipic acid, and lactic acid]
12
Related Work – DotMap [Sinovec et al. ‘04]
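• Inverse problem: search the image for a known compound instead of searching the library for an unknown spectrum
• DotMap computed across the image
• Higher-valued areas indicate the presence of the compound of interest
• Multiple compounds of interest
  – Compute a DotMap overlay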
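Here is a minimal DotMap sketch, assuming the GC×GC image is stored as a NumPy array with one mass spectrum per pixel. Each pixel's spectrum is compared with the target spectrum by a normalized dot product (cosine), giving a map whose high values mark where the compound of interest appears. The slide does not say how the overlay for several compounds is formed, so a pixel-wise maximum is used here as an assumption. The function names are hypothetical.

```python
import numpy as np


def dotmap(image: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Normalized dot product of `target` with the spectrum at every pixel.

    image:  (n_rows, n_cols, n_mz) intensities, one spectrum per pixel
    target: (n_mz,) spectrum of the compound of interest
    returns (n_rows, n_cols) similarity map, in [0, 1] for non-negative data
    """
    t = target / np.linalg.norm(target)
    pixel_norms = np.linalg.norm(image, axis=2)
    dots = image @ t  # contracts the m/z axis -> (n_rows, n_cols)
    out = np.zeros(dots.shape, dtype=float)
    nonzero = pixel_norms > 0  # empty pixels stay at 0
    out[nonzero] = dots[nonzero] / pixel_norms[nonzero]
    return out


def dotmap_overlay(image: np.ndarray, targets: list) -> np.ndarray:
    """Overlay for several compounds; the pixel-wise maximum is an assumed choice."""
    return np.max(np.stack([dotmap(image, t) for t in targets]), axis=0)


target_a = np.array([0.0, 5.0, 1.0, 0.0])
target_b = np.array([3.0, 0.0, 0.0, 1.0])
image = np.zeros((2, 3, 4))
image[1, 2] = 2 * target_a          # compound A, scaled
image[0, 0] = target_b              # compound B
image[0, 1] = [1.0, 1.0, 1.0, 1.0]  # unrelated pixel
dm_a = dotmap(image, target_a)
overlay = dotmap_overlay(image, [target_a, target_b])
print(np.round(dm_a, 3))
```

Because the score is normalized, a pixel containing a scaled copy of the target still scores close to 1. Only the shape of the spectrum matters, not its absolute intensity.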
13
Related Work – DotMap [Sinovec et al. ‘04]
14
Related Work – DotMap [Sinovec et al. ‘04]
15
Method – Motivation
• NIST MS-Search [Stein ‘94]
  – No domain information utilized
• PBM matching [McLafferty et al. ‘75]
  – Old technique (‘75)
  – Domain information utilized in an ad hoc way
• DotMap
  – No domain information utilized
16
Method – Entropy
• Entropy-based approach
  – Entropy: a measure of the amount of uncertainty
  – Based on probabilities
• Include domain-based knowledge (information) in computing the match factor
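Entropy and information are both defined from probabilities. The standard Shannon definitions, in bits, are sketched below. How the presented method folds them into its match factor cannot be recovered from this transcript, so this only illustrates the quantities themselves. The function names are hypothetical.

```python
import math
from typing import Iterable


def entropy_bits(probs: Iterable[float]) -> float:
    """Shannon entropy H = -sum(p * log2 p); zero-probability outcomes contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def self_information_bits(p: float) -> float:
    """Information gained by observing an outcome of probability p: -log2 p."""
    if p <= 0:
        raise ValueError("probability must be positive")
    return -math.log2(p)


print(entropy_bits([0.25, 0.25, 0.25, 0.25]))  # uniform over 4 outcomes
print(entropy_bits([0.97, 0.02, 0.01]))         # one outcome dominates
print(self_information_bits(0.01))              # a rare outcome is informative
```

A distribution dominated by one outcome has low entropy, and observing one of its rare outcomes yields many bits of information. This is one way uncommon (m/z, intensity) combinations could be weighted more heavily than common ones.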
17
Method – Distribution Function
• Library
  – NIST EPA Library
  – 163K compounds
• Compute distribution function (DF)
  – 2-dimensional array: m/z vs. intensity
  – DF[i][j] = number of compounds in the library with intensity j at m/z = i
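The DF can be built from a library as sketched below, assuming each spectrum is a dict of integer m/z to integer intensity on a 0-999 scale. The intensity scale, the `max_mz` bound, and the function name are assumptions. Compounds without a peak at a given m/z are counted at intensity 0, so each row sums to the library size, which matches the row sum of 163K given for the library on the normalization slide.

```python
from typing import Dict, List

import numpy as np

Spectrum = Dict[int, int]  # m/z -> integer intensity (0..max_int assumed)


def distribution_function(library: List[Spectrum], max_mz: int, max_int: int = 999) -> np.ndarray:
    """DF[i][j] = number of library compounds whose intensity at m/z i equals j.

    A compound with no peak at m/z i is counted at intensity 0, so every row
    sums to the number of compounds. Peaks beyond max_mz/max_int are not handled.
    """
    # Dense (n_compounds, max_mz + 1) intensity matrix; missing peaks stay 0.
    dense = np.zeros((len(library), max_mz + 1), dtype=np.int64)
    for row, spec in enumerate(library):
        for mz, inten in spec.items():
            dense[row, mz] = inten
    df = np.zeros((max_mz + 1, max_int + 1), dtype=np.int64)
    # m/z index of every cell of `dense`, in the same row-major order as ravel()
    mz_index = np.tile(np.arange(max_mz + 1), len(library))
    np.add.at(df, (mz_index, dense.ravel()), 1)  # one count per (m/z, intensity)
    return df


library = [{35: 100, 45: 999}, {35: 100, 55: 200}, {45: 500}]
df = distribution_function(library, max_mz=60)
print(df[35, 100], df[35, 0], df.sum(axis=1)[:3])
```

`np.add.at` is used instead of `df[idx] += 1` because it accumulates repeated index pairs. Many compounds share the same (m/z, intensity) cell, especially intensity 0.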
18
Method – Distribution Function
[Figure: distribution function, m/z vs. intensity]
19
Method – Normalized Distribution Function (NDF)
• Normalized Distribution Function
  – $\mathrm{NDF}[mz][int] = \mathrm{DF}[mz][int] \,/\, \sum_i \mathrm{DF}[mz][i]$
  – where $\sum_i \mathrm{DF}[mz][i] = 163\mathrm{K}$, the number of compounds in the library
  – NDF values are probabilities in [0, 1]
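Normalizing each m/z row of the DF turns it into a probability distribution over intensities. Below is a minimal NumPy sketch. The function name is hypothetical, and the guard for empty rows is an added safeguard that cannot trigger when every compound is counted in every row.

```python
import numpy as np


def normalized_distribution_function(df: np.ndarray) -> np.ndarray:
    """NDF[mz][int] = DF[mz][int] / sum_i DF[mz][i]; each row becomes a distribution."""
    row_sums = df.sum(axis=1, keepdims=True).astype(float)
    # Rows with no counts are left at 0 instead of dividing by zero.
    return np.divide(df, row_sums, out=np.zeros(df.shape, dtype=float), where=row_sums > 0)


# Toy DF: 2 m/z rows x 4 intensity bins, 4 compounds counted per row
df = np.array([[1, 2, 0, 1],
               [4, 0, 0, 0]])
ndf = normalized_distribution_function(df)
print(ndf)
```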
20
Method – Assumptions
• Assumption: each m/z is treated independently when computing the match factor from the normalized distribution function
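Under the independence assumption, the probability of observing a whole spectrum factors into a product of per-m/z NDF probabilities. Its information content (−log2 of that probability) therefore becomes a sum of per-m/z terms. The sketch below computes only that quantity. It is not the match factor of the presented method, whose formula is not included in this transcript. The dense integer-intensity representation, the `floor` value, and the function name are assumptions.

```python
import numpy as np


def spectrum_information_bits(intensities: np.ndarray, ndf: np.ndarray,
                              floor: float = 1e-12) -> float:
    """Self-information of a whole spectrum under the independence assumption.

    intensities[i] is the integer intensity at m/z i (0 = no peak).
    Independence lets P(spectrum) = prod_i NDF[i][I_i], so
    -log2 P(spectrum) = sum_i -log2 NDF[i][I_i].
    `floor` is an assumed safeguard against log2(0) for unseen intensities.
    """
    mz = np.arange(len(intensities))
    p = ndf[mz, intensities]  # one probability per m/z channel
    return float(-np.sum(np.log2(np.maximum(p, floor))))


# Toy NDF: 3 m/z channels x 2 intensity levels (0 = absent, 1 = present)
ndf = np.array([[0.50, 0.50],
                [0.75, 0.25],
                [0.75, 0.25]])
common = spectrum_information_bits(np.array([0, 0, 0]), ndf)
rare = spectrum_information_bits(np.array([1, 1, 0]), ndf)
print(common, rare)  # the rarer peak pattern carries more information
```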
21
Method – Match Factor
22
Results – Overview
• Technique
  – Add noise to a compound from the library
  – Search the library for the noisy compound
• Evaluation metric: average rank
  – Rank = position of the correct compound in the hit list
  – Repeat the above 3000 times and take the average rank
• Compared with
  – NIST
  – NISTDOT (first term in the NIST algorithm)
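The evaluation loop can be sketched as follows: take a library compound, add noise, search, record the rank of the correct compound, and average over many trials. The scoring function is a cosine placeholder standing in for NIST, NISTDOT, or the entropy-based match factor. The tie-breaking rule, the seeded generator, and all function names are assumptions.

```python
from typing import Callable

import numpy as np


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Placeholder scoring function; any match factor could be passed instead."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    return 0.0 if nu == 0 or nv == 0 else float(u @ v / (nu * nv))


def rank_of_correct(query: np.ndarray, library: np.ndarray, correct_idx: int,
                    score: Callable = cosine) -> int:
    """1-based position of the correct compound in the hit list sorted by score."""
    scores = np.array([score(query, spec) for spec in library])
    # Ties are resolved in favor of the correct compound (an assumption).
    return 1 + int(np.sum(scores > scores[correct_idx]))


def average_rank(library: np.ndarray, add_noise: Callable, trials: int = 3000,
                 score: Callable = cosine, seed: int = 0) -> float:
    """Pick a compound, add noise, search, record its rank; average over trials."""
    rng = np.random.default_rng(seed)
    ranks = []
    for _ in range(trials):
        idx = int(rng.integers(len(library)))
        noisy = add_noise(library[idx], rng)
        ranks.append(rank_of_correct(noisy, library, idx, score))
    return float(np.mean(ranks))


# Toy library of 5 spectra over 8 m/z channels
library = np.random.default_rng(42).uniform(0, 999, size=(5, 8))
print(average_rank(library, lambda s, rng: s, trials=100))  # noiseless search
print(average_rank(library, lambda s, rng: s + rng.normal(0, 300, s.shape), trials=100))
```

An average rank of 1 means the correct compound always tops the hit list. Larger values indicate that noise pushes it down the list.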
23
Results – Noise models
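• Additive: $A_U = A_L + G(0,\sigma)$
• Multiplicative: $A_U = A_L + A_L \cdot G(0,\sigma)$
• Johnson colored: $A_U = A_L + G(0,\sigma\sqrt{m})$
• Random spectrum: $A_U = A_L + x \cdot A_R$
where $A_U$ is the intensity in the noisy (unknown) spectrum, $A_L$ the library intensity, $A_R$ the intensity in a randomly chosen spectrum, and $G(0,\sigma)$ zero-mean Gaussian noise with standard deviation $\sigma$.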
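The three Gaussian noise models can be sketched with NumPy. Noise is applied only at the m/z values present in the library spectrum, as the per-model slides specify. Reading m in the Johnson colored model as the m/z value is an assumption, and the σ values in the example are arbitrary. The sketch does not clip negative intensities, because the slides do not say whether that was done.

```python
import numpy as np


def additive_noise(mz: np.ndarray, a_l: np.ndarray, sigma: float,
                   rng: np.random.Generator) -> np.ndarray:
    """A_U = A_L + G(0, sigma): the same noise level at every library peak."""
    return a_l + rng.normal(0.0, sigma, size=a_l.shape)


def multiplicative_noise(mz: np.ndarray, a_l: np.ndarray, sigma: float,
                         rng: np.random.Generator) -> np.ndarray:
    """A_U = A_L + A_L * G(0, sigma): noise proportional to each peak's intensity."""
    return a_l + a_l * rng.normal(0.0, sigma, size=a_l.shape)


def johnson_colored_noise(mz: np.ndarray, a_l: np.ndarray, sigma: float,
                          rng: np.random.Generator) -> np.ndarray:
    """A_U = A_L + G(0, sigma * sqrt(m)), reading m as the m/z value (assumption)."""
    return a_l + rng.normal(0.0, sigma * np.sqrt(mz), size=a_l.shape)


# Peaks of a toy library spectrum: m/z values and intensities
mz = np.array([27, 41, 43, 57])
a_l = np.array([200.0, 999.0, 650.0, 120.0])
rng = np.random.default_rng(7)
for model, sigma in ((additive_noise, 20.0), (multiplicative_noise, 0.1),
                     (johnson_colored_noise, 3.0)):
    print(model.__name__, np.round(model(mz, a_l, sigma, rng), 1))
```

In the multiplicative model σ is relative to the peak, so absent peaks stay absent. In the Johnson model, heavier ions receive more noise.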
24
Results – Additive Noise
• Noisy compound = library compound + additive noise
• Additive Gaussian noise
  – Zero mean
  – Variable standard deviation
• For each m/z in the library spectrum: $A_U = A_L + G(0,\sigma)$
25
Results – Additive Noise (Example)
[Figure: (a) pure and noisy spectra, intensity vs. m/z; (b) additive noise intensity vs. m/z]
26
Results – Additive Noise (Performance)
27
Results – Multiplicative Noise
• Noisy compound = library compound + multiplicative noise
• Multiplicative Gaussian noise
  – Zero mean
  – Variable standard deviation
• For each m/z in the library spectrum: $A_U = A_L + A_L \cdot G(0,\sigma)$
28
Results – Multiplicative Noise (Example)
[Figure: (a) multiplicative noise intensity vs. m/z; (b) pure and noisy spectra, intensity vs. m/z]
29
Results – Multiplicative Noise (Performance)
30
Results – Johnson Colored Noise
• Noisy compound = library compound + colored noise
• Gaussian noise
  – Zero mean
  – Variable standard deviation
• For each m/z in the library spectrum: $A_U = A_L + G(0,\sigma\sqrt{m})$
31
Results – Johnson Colored Noise (Example)
[Figure: (a) Johnson colored noise intensity vs. m/z; (b) pure and noisy spectra, intensity vs. m/z]
32
Results – Johnson Colored Noise (Performance)
33
Results – Random Spectrum Noise
• Noisy compound = library compound + random spectrum
• Additive spectrum
  – Add x% of another, randomly chosen spectrum
• For each m/z in the library or random spectrum: $A_U = A_L + x \cdot A_R$
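The random-spectrum model can be sketched with spectra stored as m/z-to-intensity dicts. Following the slide, the sum runs over every m/z present in either spectrum, and a missing peak is treated as intensity 0. Drawing the contaminating spectrum from the same library, expressing x as a fraction rather than a percentage, and the function names are assumptions.

```python
import random
from typing import Dict, List

Spectrum = Dict[int, float]  # m/z -> intensity


def add_random_spectrum(a_l: Spectrum, a_r: Spectrum, x: float) -> Spectrum:
    """A_U = A_L + x * A_R at every m/z present in either spectrum.

    x is a fraction (0.10 for "add 10%"); a missing peak counts as intensity 0.
    """
    return {mz: a_l.get(mz, 0.0) + x * a_r.get(mz, 0.0)
            for mz in sorted(set(a_l) | set(a_r))}


def random_spectrum_noise(library: List[Spectrum], idx: int, x: float,
                          rng: random.Random) -> Spectrum:
    """Mix a different, randomly chosen library compound into library[idx]."""
    other = rng.choice([j for j in range(len(library)) if j != idx])
    return add_random_spectrum(library[idx], library[other], x)


print(add_random_spectrum({35: 100, 45: 999}, {45: 500, 77: 200}, 0.25))
# {35: 100.0, 45: 1124.0, 77: 50.0}
```

Unlike the Gaussian models, this noise introduces new peaks (m/z 77 above) that the library spectrum lacks. This makes it structurally different from the other three cases.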
34
Results – Random Spectrum Noise (Example)
[Figure: (a) random spectrum noise intensity vs. m/z; (b) pure and noisy spectra, intensity vs. m/z]
35
Results – Random Spectrum Noise (Performance)
36
Results – Summary of Noise Models
• Additive: $A_U = A_L + G(0,\sigma)$
• Multiplicative: $A_U = A_L + A_L \cdot G(0,\sigma)$
• Johnson colored: $A_U = A_L + G(0,\sigma\sqrt{m})$
• Random spectrum: $A_U = A_L + x \cdot A_R$
37
Results – Summary of Noise Models
[Figure: noise intensity vs. m/z for the additive, multiplicative, Johnson, and random noise models]
38
Results – Summary of Noise Models
[Figure: intensity vs. m/z under the additive, multiplicative, Johnson, and random noise models]
39
Conclusion
• MS library search algorithm
• Information-theoretic
  – Domain knowledge incorporated
• Algorithm works well for various noise models
• Future work
  – Must improve performance for the random spectrum noise case
40
Questions & Suggestions
?