How We Handle Mass Spectra - NIST...2004/10/26 · Algorithm Performance 12,592 Replicate Spectra...

How We Handle Mass Spectra

NIST Mass Spectrometry Data Center

Numbers of Spectra

020,00040,00060,00080,000

100,000120,000140,000160,000180,000200,000

'78 '80 '83 '86 '88 '90 '93 '98 '02

ReplicatesCompounds

Red Books EPA NIST

NIST/EPA/NIH Mass Spectral L ibrary

Libraries Dis tributed/Year

10001500200025003000350040004500

'88 '89 '90 '91 '92 '93 '94 '95 '96 '97 '98 '99 '00 '01 '02 '03 '04

The Data

Connection Table

1 2 3 4

From Structure to Spectrum:A Mass “Fragmentogram”

CH3OCH

e- 2e-

mass = 140 u

CH3OH P

CH3OCH

mass = 99 u

mass = 125 u

Molecular Fingerprints

I will discuss

• Library Searching– Full and Partial Spectra

• Spectrum Purification

• Chemical Structure Representation

• Peptide Spectra Libraries

0 50 100 150 200 250 3000

Instrument ‘Noise Signature’250 Hexachlorobenzene Spectrasame instrument, calibration mix

Bars show quartiles

Instrument Effects

Library Search

unknown

Spectral Similarity

• M = f(Abundance) Peak in Measured Spectrum

• R = f(Abundance) Peak in Reference Spectrum

• Sum over all peaks

• f(Abundance) – Abundance

– Abundance * m/z

– Certainty

� �

Top Hit Top 2 Hits Top 3 Hits

Correlation – Weighted 74.9 86.9 91.7

Correlation 72.9 85.9 90.8

Euclidean Distance 71.9 83.9 88.9

Absolute Distance 67.9 80.3 85.5

PBM - Published 64.7 78.4 84.8

Hites/Hertz/Biemann 64.4 77.2 83.2

Algorithm Performance12,592 Replicate Spectra against NIST Library

Percent CorrectModel

0 20 40 60 80 1000.0

False Positives(108,000 compounds)

False Negatives(21,000 replicate spectra)

Match Factor Threshold

FractionRecovered

FP/FP Above Given Match Factor for NIST Library Spectra

80 85 90 95 100

m/z weighting

no weighting

Match Factor

FractionRecovered

FP/FNExpanded View

FP x 10,000

0 20 40 60 80 100

200 decanedecalinTMBHCB

malathion

HCB = hexachlorobenzeneDMPB = dimethylpenobarbitalTMB = 1,2,3-trimethylbenzene

FP Depends on Spectrum Uniqueness

Match Factor

Multiple Ion Monitoring

• What is is?– Use 2-5 Major Peaks in Spectrum of Target

• 10 – 100 more sensitive

• What’s the problem? – Can match major Target peaks with Minor Sample

• What we have done:– Examine risk using library as source of potential false

positive IDs

False Positive Risk vs Number of Peaks Used

Figure 1. Median FPP vs. NP

0.00001

0.0001

1 2 3 4 5

Number of Peaks

Figure 1. Median FPP vs. NP

0.00001

0.0001

1 2 3 4 5

Number of Peaks

1/21/2

1/41/4

1/81/8

1/161/16

1/321/32

1/641/64

1/1281/128

Number of Peaks

FP/spectrum

(median)

Abundance Ratio:Biggest Search Peak/Matching Peak in FP

0 10 20 30 40 50 60 70 80 90 100

m/z Difference

0 10 20 30 40 50 60 70 80 90 100

m/z Difference

Mass Spectral Peak Occurrences are Correlated

Difference in Peak Position (m/z)

Joint Occurrence

Big Peaks

SmallPeaks

MediumPeaks

FP Observed and Computed(from individual peak probabilities)

1 2 3 4 5 6 7 8 9 10

Observed FPP Percentile

1 2 3 4 5 6 7 8 9 10

Observed FPP Percentile

FP Percentile/10

Actual

No Peak Correlation

Search Results Depend on Search Spectrum Quality

AMDIS: http://chemdata.nist.gov

Real DataTotal ion

chromatogram

A mass spectrum

(scan)

Chromatogram with single ion

AMDIS Analysis of Data

AMDIS Match = 81O P

Order of Analysis

• Noise Analysis – find ‘Noise Factor’

• Find and quantify maximizing ions

• Combine to create ‘Model Peak’

• Use Model Peak shape (intensity vs time) to purify spectra

• Find best matching library spectrum

IntensityKNoise noise=

Intensity

Derive Noise Factor

Finding Possible Peaks for Each m/z

Maxim um rate

Scan number n

Find Possible Compounds:Do Ions Maximize at Same Time?

.0 .1 .7.6.5.4.2 .3 .9.8

Separate the Components

168511

.3 yes

30514 19

8237147

.0 .1 .7.6.5.4.2 .3 .9.8

5781 23

A ‘Model Peak’ Provides Shape

168511

.3 yes

30514 19

8237147

.0 .1 .7.6.5.4.2 .3 .9.8

5781 23

The model shape is defined as the sum of all of the ion chromatograms that maximize within the range and have a sharpness value within 75% of the maximum.

AMDIS Testing – Closely Eluting Components

Representing Chemical Identity

• Visual: 2D Structure

• Text: IUPAC Name

• Digital: No Accepted, Open Method

• Solution:

The IUPAC/NIST Chemical Identifier

Connection Table

1 2 3 4

Chemical Identity Problems

CH3CH3

Registry Number possible for each exact form, mixture, unknown, unspecified

Experts required

Expensive, ambiguous and error prone

Requirements

• Different compounds have different identifiers– Keep all distinguishing structural information

IChI - 1 IChI - 2=

Requirements

• One compound has only one identifier– Omit unnecessary information

Same INChI

3 Steps to INChI

• Chemistry– ‘Normalize’ Input Structure

• Implement chemical rules

• Math– ‘Canonicalize’ (label the atoms)

• Equivalent atoms get the same label

• Format– ‘Serialize’ Labeled Structure

• Output as character string (‘name’)

formula

connectivity

stereo

isotope

Chemical Substances“Layers”

NitrobenzeneCH5

Canonical numbering

Descr iption Layers

formula C6H5NO2

connectivity 8-7(9)6-4-2-1-3-5-6

H-atoms 1-5H

charges

CH3O10OH

Canonical numbering

Descr iption Layers

formula C5H8NO4.Na

connectivity 6-3(5(9)10)1-2-4(7)8;

H-atoms 1-2H2,3H,6H2(H-,7,8,9,10);

stereo sp3 3-;

charges -1;+1

C5H9NO4.Na/c6-3(5(9)10)1-2-4(7)8;/h1-2H2,3H,6H2,(H,7,8)(H,9,10);/q;+1/p-1/t3-;/m1./s1

INChITest

Version

Input/Result

Mobile H On/Off

Include Org-Metal Bonds

Peptide Mass Spectra:Libraries for Organisms

• Proteins are linear sequences of amino acids– characteristic of Genome (organism)

• Peptides are ‘digested’ fragments of proteins

• MS ‘sequences’ peptides to reveal source Protein

• Peptides fragmentation spectra are not quite predictable

• Peptide fragmentation spectra for a ‘genome’ can be contained in one Library.

Spectrum Prediction Programs

Peptide Spectra Reference Library (multiple measurements each of 10,000 peptides)

HLQLAIR/2+

MS Mapped to the Genome

From Eric Deutsch, ISB, 6/2004

How We Handle Mass Spectra - NIST...2004/10/26 · Algorithm Performance 12,592 Replicate Spectra...

Documents

Transcript of How We Handle Mass Spectra - NIST...2004/10/26 · Algorithm Performance 12,592 Replicate Spectra...

NIST/EPA/NIH Mass Spectral Library (NIST 17) and NIST Mass ... Basis for Interpretation of the Library Search Results ... Features Relevant to Electron Ionization Mass Spectra (Low

NIST Big Data | NIST

NIST Database for the Simulation of Electron … NIST Database for the Simulation of Electron Spectra For Surface Analysis (SESSA)* Cedric Powell National Institute of Standards and

01-1016-01 Cintra Brochure October 2016 - GBC Scientific … · 2017-01-18 · 340.0 rtm 3. 3130 Abs CAP NUM SCRL Replicate 3 CIE mple Replicate 1 Replicate 2 Replicate 3 0.338 75.309

Development and Evaluation of Advanced Hydride Systems …to hydrogen vibration spectra from NIST due to site occupancy and local structures ¾Supporting analyses of NMR data from

The Many Forms of NIST 2020 MS Libraries · NIST Tandem Mass Spectral Library 2020 Release 31K Compounds, 2X More than 2017 186K Precursor Ions - 1.3M Spectra Compounds: Fragmentation

Reference Materials McLafferty & Turacek, “Interpretation of Mass Spectra”, 4 th Ed. (University Science Books) NIST: .

Re-certification of NIST Standard Reference Material® 2372Green Traces 2006 low/high absorbance spectra Black Traces 2012 low/high absorbance spectra Evidence of dsDNA/ssDNA mixture

ANSI NIST CSL1-1993PDF | NIST

NIST DATABASE WORK ON SPECTRA OF LIGHT ......NIST DATABASE WORK ON SPECTRA OF LIGHT ELEMENTS AND BRIEF UPDATE FOR TUNGSTEN JOSEPH READER Atomic Spectroscopy Group National Institute

Presentación de PowerPoint - Replicate Project

Replicate coring final - waisdivide.sr.unh.eduwaisdivide.sr.unh.edu/docs/WAIS-Divide_Replicate_coring_final.pdf · Replicate Coring and Borehole Logging Science and Implementation

replicate 1 replicate 2 replicate 3 replicate 2 replicate 3 ΣCoverag … · 2015. 2. 9. · Abundances soluble Accession. Description: ΣCoverag Σ# Protein Σ# Unique Peptide Σ#

NIST Plan Public Access.pdf | NIST

The Analyses of Optical Atomic Spectra - NIST...MnITXIIVTTTV111 9 FeXXVI 22 MnITX 11TY1A. 9 MnY 10 \j\jyj

MS Solutions Solutions/MSSol14NA.pdf · Part 4 This installment in the series on NIST 11 is about the Incremental Name Search, replicate spectra, and the NIST GC Methods, ... compound

NIST Standard Reference Database 1A · NIST Standard Reference Database 1A ... type D:\SETUP.EXE and select OK. ... source/EI with accurate m/z spectra added with NIST MS Search to

Recent Developments of the NIST Atomic Data Program · DCN Meeting, IAEA, Sep 4 2017 . Plan •Staff •Atomic Spectra Database –New version and contents –LIBS database •Bibliographic

Mass spectra of fluorocarbons - NIST · 2012. 2. 24. · f Journal of Research of the National Bureau of Standards Vo!' 49, No. 5, November 1952 Research Paper 2370 Mass Spectra of

Materials and Tools (per replicate