
Page: 1 / 25 Knorr / 2nd European GCxGC Symposium, 2011

High-Throughput Computer-Assisted Identification Platform of Small Molecules

ACD/Labs European Users Meeting 2012

June 12-13, 2012

A. Knorr, A. Monge, D. Arndt, E. Martin, and P. Pospisil

Pospisil et al, ACD/Labs European Users’ Meeting 2012


Aim of CASI

Goal

• To accelerate and standardize the identification of small molecules with the highest possible confidence

• To increase the throughput of identified compound structures through a fully automatic process

• To standardize the identification process

CASI (Computer-Assisted Structure Identification) is a platform that automatically identifies compound structures in highly complex matrices.


Example

[Figure: an unknown compound ("Compound ?") in the smoke of a conventional cigarette, measured by GCxGC-TOF]


CASI Automated Workflow

1. GCxGC-TOF: measurement, producing a multi-JDX MS file + experimental KI values + relative second retention time

2. Search in mass spectra databases (NIST MS Search): produces hits

3. CASI Score: KI matching, 2nd column relative retention time matching and boiling point matching

4. Ranking: produces sorted hits

5. Submission to UCSD database: confirmed hits


CASI Software Architecture

[Diagram: CASI software architecture; components and interfaces as labeled in the diagram]

• CASI Server: CASI Core Engine (Java + Tomcat)

• ACD/Labs + Pipeline Pilot Server: ACD PhysChem, ACD Batch, ACD name-to-structure, Pipeline Pilot (Chemistry)

• ORACLE Server: Oracle 11gR2, Oracle Application Express (Web Application)

• Query of mass spectra: NIST MS Search against MS databases (NIST format)

• Modeling: RapidMiner, Dragon

• HPC environment

• Interfaces: SOAP, Hibernate, Java API, CL

Input data:

· JDX file (mass spectra file)

· Experimental KI

· Experimental 2D retention index

CL: command line
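The input to CASI is a multi-block JCAMP-DX (JDX) mass spectra file. A minimal Python sketch of reading such a file, assuming each spectrum is a block closed by `##END=` and stores its peaks under `##PEAK TABLE=(XY..XY)` as `m/z,intensity` pairs. Real files may instead use `##XYDATA=`, compressed encodings or nested link blocks, which this sketch does not handle; the function name `parse_multi_jdx` is hypothetical.

```python
import re
from typing import Dict, List, Tuple


def _normalize_label(label: str) -> str:
    # JCAMP-DX labels compare equal regardless of spaces, dashes, slashes, underscores and case
    return re.sub(r"[\s\-/_]", "", label).upper()


def parse_multi_jdx(text: str) -> List[Dict[str, object]]:
    """Return one dict per ##END=-terminated block, with its labels and a 'peaks' list."""
    spectra: List[Dict[str, object]] = []
    current: Dict[str, object] = {}
    peaks: List[Tuple[float, float]] = []
    in_peaks = False
    for raw in text.splitlines():
        line = raw.split("$$", 1)[0].strip()  # drop $$ comments
        if not line:
            continue
        if line.startswith("##"):
            label, _, value = line[2:].partition("=")
            label = _normalize_label(label)
            in_peaks = label == "PEAKTABLE"
            if label == "END":
                if current or peaks:  # skip empty blocks, e.g. the end of a link block
                    current["peaks"] = peaks
                    spectra.append(current)
                current, peaks = {}, []
            elif not in_peaks:
                current[label] = value.strip()
            continue
        if in_peaks:
            # Peak pairs "m/z,intensity" separated by whitespace or semicolons
            for pair in re.split(r"[\s;]+", line):
                if pair:
                    mz, intensity = pair.split(",")
                    peaks.append((float(mz), float(intensity)))
    return spectra
```

Where the experimental KI and 2D relative retention time travel with each spectrum (for example as extra labeled data records) is not described on the slide; any such labels would simply appear as additional keys in each block's dict.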


CASI Automated Workflow

[Workflow diagram (steps 1 to 5) repeated; focus on step 2, search in mass spectra databases]

• Latest spectra databases acquired (2010 versions)

• > 650,000 mass spectra available in CASI
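Step 2 compares each measured spectrum with the database spectra and reports a match factor on a 0 to 999 scale. NIST's exact match-factor formula is not given in the slides, so the sketch below is a generic weighted dot-product similarity in that spirit: the mass exponent 1 and intensity exponent 0.5 are assumptions, not NIST's parameters, and `weighted_dot_match` is a hypothetical name.

```python
import math
from typing import Dict


def weighted_dot_match(query: Dict[int, float], library: Dict[int, float],
                       mz_power: float = 1.0, intensity_power: float = 0.5) -> int:
    """Squared weighted cosine between two centroided spectra, scaled to 0..999 (sketch)."""
    def weights(spec: Dict[int, float]) -> Dict[int, float]:
        # Weight each peak by (m/z)^mz_power * intensity^intensity_power
        return {mz: (mz ** mz_power) * (inten ** intensity_power)
                for mz, inten in spec.items() if inten > 0}

    q, lib = weights(query), weights(library)
    dot = sum(q[mz] * lib[mz] for mz in q.keys() & lib.keys())
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm_lib = math.sqrt(sum(v * v for v in lib.values()))
    if norm_q == 0 or norm_lib == 0:
        return 0
    cosine = dot / (norm_q * norm_lib)
    # Squared cosine: the classic "dot-product" form of spectral similarity
    return round(999 * cosine ** 2)
```

Identical spectra score 999 and spectra without shared peaks score 0; everything else falls in between, which is the scale on which the 825 identification threshold used later is expressed.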


CASI Automated Workflow

[Workflow diagram (steps 1 to 5) repeated; focus on step 3, KI, 2nd column relative retention time and boiling point matching]

Computational modeling:

• Kovats Indices: several good models have been published. We model Kovats Indices for selected instruments.

• GCxGC-TOF 2D retention: never published.

• BP for unknown compounds: the boiling point is known to correlate with Kovats Indices. BP can also be predicted or retrieved by structure (ACD/Labs).
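The Kovats (retention) index places a compound on the scale defined by the n-alkanes bracketing it. For temperature-programmed runs the linear form I = 100 [n + (N − n)(t_x − t_n)/(t_N − t_n)] is commonly used. The sketch below assumes that form and a reference list of n-alkane carbon numbers with their first-dimension retention times; it illustrates the index itself, not the instrument-specific KI models used in CASI, and the function name is hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def linear_retention_index(rt: float, alkane_carbons: Sequence[int],
                           alkane_rts: Sequence[float]) -> float:
    """I = 100 * (n + (N - n) * (t_x - t_n) / (t_N - t_n)) between bracketing n-alkanes.

    alkane_carbons / alkane_rts: carbon numbers and retention times of the
    reference n-alkanes, sorted by retention time (assumed input format).
    """
    if len(alkane_rts) < 2 or len(alkane_carbons) != len(alkane_rts):
        raise ValueError("need at least two n-alkanes with matching lists")
    if not alkane_rts[0] <= rt <= alkane_rts[-1]:
        raise ValueError("retention time outside the alkane ladder")
    # Alkane eluting just before (or at) rt, clamped so that i + 1 is valid
    i = min(bisect_right(alkane_rts, rt) - 1, len(alkane_rts) - 2)
    n, big_n = alkane_carbons[i], alkane_carbons[i + 1]
    t_n, t_big_n = alkane_rts[i], alkane_rts[i + 1]
    return 100.0 * (n + (big_n - n) * (rt - t_n) / (t_big_n - t_n))
```

A compound eluting halfway between decane and undecane gets I = 1050, and an n-alkane itself gets 100 times its carbon number.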


Predictive QSPR Model for KI, 2DrelRT and BP

Selected models:

• GA (genetic algorithm) – Support Vector Regression, 8 molecular descriptors

• GA – Linear Regression, 15 molecular descriptors

• Boiling Point: linear regression of BP calculated by ACD/PhysChem vs. BP calculated from KI

Validation: r² = 0.981 (Kovats Index), r² = 0.855 (2DrelRT), r² = 0.942 (Boiling Point)
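The BP model relates boiling points calculated by ACD/PhysChem to BP estimated from KI by linear regression, and every model is reported with a validation r². A minimal plain-Python sketch of ordinary least-squares fitting and of r² as the coefficient of determination (1 − SS_res/SS_tot). The data pairs are illustrative only, and the GA-based descriptor selection is not reproduced; the slides do not state which r² definition was used for validation.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two paired values")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Illustrative pairs: Kovats Index and boiling point (degrees C) of a few n-alkanes
ki = [1000.0, 1200.0, 1400.0, 1600.0]
bp = [174.0, 216.0, 254.0, 287.0]
slope, intercept = fit_line(ki, bp)
bp_from_ki = [slope * k + intercept for k in ki]
print(round(r_squared(bp, bp_from_ki), 3))
```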


CASI Automated Workflow

[Workflow diagram (steps 1 to 5) repeated; focus on step 3, 2nd column relative retention time matching]

Experimental method to determine the 2D relative retention time (2DrelRT): never published.


Experimental 2D Relative RT Approach for GCxGC-TOF

• Currently, no 2D relative RT (2DrelRT) approach has been published.

• Advantages of a 2DrelRT approach over a 2D absolute RT (2DabsRT) approach: correction of systematic GC fluctuations and enhanced reproducibility.

• The challenge is to define a reference system accessible to all 2D peaks.

• Our approach is to refer 2D peaks to hypothetical reference points based on linear regression on deuterated n-alkanes (patented).

[Figure: 2nd dimension separation time [seconds] vs. 1st dimension separation time [seconds], showing deuterated n-alkanes 1 to 3, an example compound with its absolute 2D RT (abs 2D RTComp), and the hypothetical n-alkane retention line (for 1D RT range < 1D RTdA2) giving the reference value 2D RThypothetical n-alkane]
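The slide only outlines the patented approach, so the following is one plausible reading, sketched purely under assumptions: the second-dimension retention times of the deuterated n-alkanes are joined by straight lines along the first dimension (segment by segment, as the "1D RT range < 1D RTdA2" label suggests), giving a hypothetical n-alkane 2D RT at the compound's 1D RT; the relative value is then taken as the ratio of the compound's absolute 2D RT to that reference. The ratio definition, the extrapolation at the ends and both function names are assumptions, not the published method.

```python
from bisect import bisect_right
from typing import Sequence


def hypothetical_alkane_2d_rt(rt1: float, ref_rt1: Sequence[float],
                              ref_rt2: Sequence[float]) -> float:
    """2D RT of a hypothetical n-alkane at first-dimension time rt1.

    ref_rt1 / ref_rt2: 1D and 2D retention times of the deuterated n-alkanes,
    sorted by 1D time. Uses the line through the two bracketing reference
    alkanes; outside the ladder the nearest segment is extrapolated (assumption).
    """
    if len(ref_rt1) < 2 or len(ref_rt1) != len(ref_rt2):
        raise ValueError("need at least two reference alkanes")
    i = bisect_right(ref_rt1, rt1) - 1
    i = max(0, min(i, len(ref_rt1) - 2))  # clamp to a valid segment
    x0, x1 = ref_rt1[i], ref_rt1[i + 1]
    y0, y1 = ref_rt2[i], ref_rt2[i + 1]
    return y0 + (y1 - y0) * (rt1 - x0) / (x1 - x0)


def relative_2d_rt(rt1: float, abs_rt2: float, ref_rt1: Sequence[float],
                   ref_rt2: Sequence[float]) -> float:
    """Assumed definition: absolute 2D RT divided by the hypothetical alkane 2D RT."""
    return abs_rt2 / hypothetical_alkane_2d_rt(rt1, ref_rt1, ref_rt2)
```

Under the ratio assumption, a drift that scales all second-dimension times of a run by the same factor cancels out, which is one way the "correction of systematic GC fluctuations" could arise.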


2DrelRT for GCxGC-TOF Data - Reproducibility

• A comparison of 3 different studies showed significantly better reproducibility using relative rather than absolute 2D RT.

• 1219 compounds found (signal/noise > 250), no outlier correction.

[Figure: histogram of the number of compounds vs. relative standard deviation (RSD, %) of the absolute/relative retention time in the second dimension of GCxGC-TOF, RSD bins from 0.3 to > 10.0, series 2D-relRT and 2D-absRT]

90th percentile of RSD (1097 compounds): 2.5% for 2D-relRT vs. 4.3% for 2D-absRT
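Reproducibility is summarized as the relative standard deviation (RSD = sample standard deviation / mean × 100%) of each compound's 2D retention time across studies, and then as the 90th percentile of those RSDs. A sketch using Python's `statistics` module; the nearest-rank percentile definition and the dictionary input layout are assumptions, since the slide does not say which ones were used.

```python
import math
import statistics
from typing import Dict, List, Sequence


def rsd_percent(values: Sequence[float]) -> float:
    """Relative standard deviation in percent (sample standard deviation)."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def nearest_rank_percentile(data: Sequence[float], pct: float) -> float:
    """Smallest value with at least pct % of the data at or below it."""
    ordered = sorted(data)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def rsd_percentile(rt_by_compound: Dict[str, List[float]], pct: float = 90.0) -> float:
    """Percentile of per-compound RSDs; compounds seen in fewer than 2 studies are skipped."""
    rsds = [rsd_percent(v) for v in rt_by_compound.values() if len(v) >= 2]
    return nearest_rank_percentile(rsds, pct)
```

Running `rsd_percentile` once on absolute and once on relative 2D retention times gives the two percentile values compared on the slide.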


CASI Automated Workflow

[Workflow diagram (steps 1 to 5) repeated, with an additional in-silico fragmentation box]

In-silico fragmentation tools: not included in the CASI Score.


Evaluation of in-silico fragmentation tools for CASI

• The in-silico tool "AutoAssignment" by ACD/Labs was used to evaluate the rate at which molecular substructures are assigned to fragments.

• Results of "AutoAssignment" for 90 MS of confirmed structures and the corresponding 50 hits for each proposal from NIST MS Search (4500 structures in total; AutoAssignment parameters optimized by ACD/Labs):

27%

• Reject the approach for low-resolution mass spectrometry by GCxGC-TOF.

• Evaluate the approach for accurate-mass spectrometry by GC-APCI-TOF (the selectivity of substructure assignment will increase through the accurate mass of fragments and the isotopic pattern).

Statement from ACD/Labs (Graham McGibbon, ACD/Labs MS Product Manager):

"The AutoAssignment tool was designed primarily to facilitate fragmentation interpretation rather than for definitive structure differentiation, which currently needs comparisons of specific characteristic ions and authentic spectra."


CASI Automated Workflow

[Workflow diagram (steps 1 to 5) repeated; focus on step 3, CASI Score]

The CASI Score is a combination of the NIST Match Factor (from step 2) and the three components of step 3:

$$\text{CASI Score} = \text{NIST MF}\cdot\prod_{X\in\{\mathrm{KI},\,\mathrm{DRT},\,\mathrm{BP}\}}\max\!\left(0,\;1-\left(\frac{X_{\mathrm{query}}-X_{\mathrm{pred}}}{\mathrm{hyp}_{X}\cdot\mathrm{SEP}_{X,\mathrm{train}}}\right)^{2}\right)$$

where X_query is the experimental value of the query compound, X_pred the value predicted for the candidate structure, SEP_X,train the standard error of prediction of the trained model, hyp_X the curve-fitting variable of module X, and DRT the 2D relative retention time (2DrelRT).
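CASI combines the NIST Match Factor with one term per predicted property (KI, 2DrelRT, BP). A minimal sketch, assuming each term has the form max(0, 1 − ((x_query − x_pred)/(hyp · SEP))²) and that the terms multiply the Match Factor. This form is inferred from the KI curve-fitting example (predicted KI 2000.0, experimental KI 2200.0, SEP 65.8), whose listed scores it reproduces: 0.908 for a curve-fitting value of 10.0, 0.630 for 5.0 and 0.000 for 3.0. The function names and the dictionary layout are hypothetical.

```python
from typing import Dict, Tuple


def module_score(x_query: float, x_pred: float, sep: float, hyp: float) -> float:
    """Assumed per-module term: max(0, 1 - ((x_query - x_pred) / (hyp * sep))**2)."""
    z = (x_query - x_pred) / (hyp * sep)
    return max(0.0, 1.0 - z * z)


def casi_score(nist_mf: float,
               modules: Dict[str, Tuple[float, float, float, float]]) -> float:
    """NIST MF multiplied by every module term.

    modules maps a module name (e.g. "KI", "DRT", "BP") to
    (query value, predicted value, SEP, hyp); the layout is hypothetical.
    """
    score = nist_mf
    for x_query, x_pred, sep, hyp in modules.values():
        score *= module_score(x_query, x_pred, sep, hyp)
    return score


# KI example from the slides: predicted 2000.0, experimental 2200.0, SEP 65.8
for hyp in (1000.0, 100.0, 10.0, 5.0, 3.0):
    print(hyp, round(module_score(2200.0, 2000.0, 65.8, hyp), 3))
```

A large curve-fitting value makes the term tolerant (close to 1 even for a 200-unit KI error), while a small value turns it into a hard cut-off; the clipping at zero keeps a term from turning negative once the error exceeds hyp · SEP.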


CASI Score function fitting – KI example

Example values: predicted KI 2000.0, experimental KI 2200.0, standard error of prediction 65.8. The curve-fitting variable is varied; for each value the slide shows the resulting KI module score for this example, the CASI Score (score by MS similarity and predicted KI) of the 8 × 50 structures with the highest NIST fit (red squares: score of the correct structure; blue dots: scores of wrong proposals from NIST), the hit ranking of the 8 correct structures, and a visualization of the fitted curve (score of the KI module vs. Kovats Index).

Curve fitting (variable) | Score | Correct structures ranked #1 | #2 | #3 | #4 | #5-50
1000.0 | 1.000 | 5 | 3 | 0 | 0 | 0
300.0 | 1.000 | 6 | 2 | 0 | 0 | 0
100.0 | 0.999 | 6 | 2 | 0 | 0 | 0
50.0 | 0.996 | 6 | 2 | 0 | 0 | 0
30.0 | 0.990 | 7 | 1 | 0 | 0 | 0
20.0 | 0.977 | 7 | 1 | 0 | 0 | 0
10.0 | 0.908 | 8 | 0 | 0 | 0 | 0
5.0 | 0.630 | 8 | 0 | 0 | 0 | 0
3.0 | 0.000 | 7 | 0 | 0 | 0 | 1
2.0 | 0.000 | 5 | 1 | 0 | 1 | 1
1.0 | 0.000 | 3 | 1 | 1 | 0 | 3

• Training (fitting) of the CASI Score is done simultaneously over all predicted modules (KI, BP, 2DrelRT) and is based on #1 hits.

• All fittings are done automatically (no manual intervention).
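The fitting criterion is the number of correct structures ranked #1. A sketch of such a fit as an exhaustive grid search over the curve-fitting variables of all modules at once. The candidate layout, the per-module term max(0, 1 − ((x_query − x_pred)/(hyp · SEP))²) and all names are assumptions; the actual CASI optimizer is not described in the slides.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

# Hypothetical candidate layout: (is_correct, nist_mf, {module: (query, pred, sep)})
Candidate = Tuple[bool, float, Dict[str, Tuple[float, float, float]]]


def module_score(x_query: float, x_pred: float, sep: float, hyp: float) -> float:
    # Assumed term: max(0, 1 - ((x_query - x_pred) / (hyp * sep))**2)
    z = (x_query - x_pred) / (hyp * sep)
    return max(0.0, 1.0 - z * z)


def candidate_score(c: Candidate, hyps: Dict[str, float]) -> float:
    _, nist_mf, modules = c
    score = nist_mf
    for name, (x_query, x_pred, sep) in modules.items():
        score *= module_score(x_query, x_pred, sep, hyps[name])
    return score


def count_top_hits(queries: List[List[Candidate]], hyps: Dict[str, float]) -> int:
    """Number of queries whose top-scoring candidate is the correct structure."""
    hits = 0
    for candidates in queries:
        # max() keeps the first candidate on ties, i.e. the original NIST order
        best = max(candidates, key=lambda c: candidate_score(c, hyps))
        hits += int(best[0])
    return hits


def fit_hyps(queries: List[List[Candidate]], names: Sequence[str],
             grid: Sequence[float]) -> Tuple[Dict[str, float], int]:
    """Try every combination of grid values for all modules; the first best wins."""
    best_hyps: Dict[str, float] = {}
    best_count = -1
    for combo in product(grid, repeat=len(names)):
        hyps = dict(zip(names, combo))
        n_top = count_top_hits(queries, hyps)
        if n_top > best_count:
            best_hyps, best_count = hyps, n_top
    return best_hyps, best_count
```

An exhaustive grid grows as len(grid) ** len(names); with three modules and a grid like the 11 values on the previous slide that is 1331 evaluations, which is cheap for a few hundred queries.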


CASI Automated Workflow

[Workflow diagram (steps 1 to 5) repeated; focus on step 4, ranking]

Ranking of suggested hits by CASI Score
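Ranking amounts to sorting the candidate structures of one query by descending CASI Score. A sketch with a hypothetical `Hit` record that also keeps the original NIST rank, so that ties are broken by the NIST order and the two orderings can be compared side by side.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    name: str          # candidate structure name (hypothetical field)
    nist_rank: int     # position in the NIST MS Search hit list
    nist_mf: float     # NIST Match Factor
    casi_score: float  # CASI Score of this candidate


def rank_by_casi(hits: List[Hit]) -> List[Hit]:
    """Sort by CASI Score, highest first; equal scores keep the NIST order."""
    return sorted(hits, key=lambda h: (-h.casi_score, h.nist_rank))
```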


Analysis page example

[Screenshot: top candidates Nr. 1 to Nr. 4 as ranked by CASI and by NIST]


Performance Indicators of CASI Platform

Discrimination between identified compounds and unknowns:

[Figure: frequency histograms over the CASI Score and over the NIST MS Search Match Factor (range 300 to 1000), comparing the correct hits of the validation set with the hits selected for unknown compounds]

Distribution of the CASI Scores or NIST Match Factors for the correct hits (blue) of the validation set and for the hits of unknown compounds selected by default (highest score, pink), for a set of 176 unknown compounds.

Thresholds: CASI Score < 795; NIST Match Factor < 825. Below these thresholds, hits have a high probability of being true negatives (false proposals given by the mass-spectral databases).
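A selected hit counts as identified only if its score reaches the threshold (795 for the CASI Score, 825 for the NIST Match Factor). A sketch of that decision and of tallying the four outcomes against known ground truth. The input layout is hypothetical, unknowns are passed with `correct=False`, and counting a score exactly at the threshold as identified is an assumption.

```python
from collections import Counter
from typing import Iterable, Tuple

CASI_THRESHOLD = 795.0
NIST_THRESHOLD = 825.0


def is_identified(score: float, threshold: float) -> bool:
    """Accept the selected hit only if it reaches the threshold (>= is assumed)."""
    return score >= threshold


def contingency(results: Iterable[Tuple[float, bool]], threshold: float) -> Counter:
    """results: (score of the selected hit, True if that hit is the correct structure)."""
    counts: Counter = Counter()
    for score, correct in results:
        accepted = is_identified(score, threshold)
        if accepted and correct:
            counts["TP"] += 1
        elif accepted:
            counts["FP"] += 1
        elif correct:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts
```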


Dataset for CASI Platform

• Comparing the chemical composition of different smoke samples

• Application of non-targeted differential screening using GCxGC-EI-TOF-MS

• GCxGC-EI-TOF-MS data provide:

– structural proposals for the most relevant differences

– unknowns (no structural proposal available)

• 218 structural proposals were confirmed by reference standards

• 176 unknowns were additionally included in the dataset


Range and Structural Diversity of CASI dataset

Comparison of the CASI dataset against smoke-related organic compounds (> 6000)

CASI compounds are distributed across a broad range of the structural features of the in-house database of smoke-related organic compounds.

- PCA based on ECFP6 fingerprints (circular topological fingerprints, used e.g. for similarity searching)

The dataset we used covers the common ranges of GC-MS.

[Figure: molecular weight (50 to 550) vs. Kovats Index (500 to 3500)]
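The diversity comparison projects ECFP6 fingerprints onto principal components. A sketch of that projection with NumPy, assuming the fingerprints are already available as a 0/1 bit matrix with one row per compound (generating ECFP6 itself needs a cheminformatics toolkit and is out of scope here). PCA is done by singular value decomposition of the mean-centred matrix; the function name is hypothetical.

```python
from typing import Tuple

import numpy as np


def pca_project(fingerprints: np.ndarray, n_components: int = 2) -> Tuple[np.ndarray, np.ndarray]:
    """Project fingerprint rows onto the leading principal components.

    Returns (scores, explained_variance_ratio). Assumes the matrix is not
    constant, otherwise the total variance is zero.
    """
    x = np.asarray(fingerprints, dtype=float)
    centred = x - x.mean(axis=0)
    # Rows of vt are the principal axes; s holds the singular values
    _u, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = centred @ vt[:n_components].T
    variance = s ** 2
    ratio = variance[:n_components] / variance.sum()
    return scores, ratio
```

Fitting the axes on the in-house database and projecting the CASI compounds onto the same axes would give the overlay shown on the slide; whether the original plot was built that way is not stated.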


Overall results for current dataset (Identified + Unknowns)

• Evaluation of results on the combined dataset (247 mass spectra) by contingency table (thresholds used for "identified": 825 for NIST MS Search, 795 for CASI Score)

• The CASI platform shows significantly fewer false positives (11 vs. 29) than NIST MS Search.

Outcome | True (CASI Score) | False (CASI Score) | True (NIST MF) | False (NIST MF)
Positive | 46 | 11 | 40 | 29
Negative | 165 | 14 | 147 | 20
Total (%) | 89% | 11% | 79% | 21%
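The percentages in the last row follow from the four counts: the share of correct decisions (true positives + true negatives) and of wrong ones (false positives + false negatives) over all evaluated spectra. A short check in Python; precision, TP / (TP + FP), is added as one further common measure that is not reported on the slide, and `summary` is a hypothetical helper name.

```python
from typing import Dict


def summary(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    total = tp + fp + tn + fn
    return {
        "total": total,
        "correct": (tp + tn) / total,   # "true" share in the table
        "wrong": (fp + fn) / total,     # "false" share in the table
        "precision": tp / (tp + fp),
    }


casi = summary(tp=46, fp=11, tn=165, fn=14)
nist = summary(tp=40, fp=29, tn=147, fn=20)
print(f"CASI: {casi['correct']:.0%} correct, precision {casi['precision']:.2f}")
print(f"NIST: {nist['correct']:.0%} correct, precision {nist['precision']:.2f}")
```

This prints 89% correct with precision 0.81 for CASI and 79% correct with precision 0.58 for NIST. The four counts sum to 236 for each method.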


CASI Automated Workflow

[Workflow diagram (steps 1 to 5) repeated; focus on step 5, submission to the UCSD database]

Automatic submission into the PMI unique chemical database


User Web Interface


User and admin web interface

[Screenshots: job queue, results summary, candidates for each query; results can be exported to .sdf or published in the in-house database]


Conclusion

• The CASI platform (for GC) is functional.

• CASI can be extended to other GC systems, with the possibility to dynamically adapt it to other instruments and analytical columns.

• The innovative 2D relative retention time concept has been proven to work.

• Robust models for KI, BP and 2DrelRT were established.

• The CASI Score performs better than NIST MS Search (89% vs. 79% correct decisions and 11 vs. 29 false positives on the combined dataset).

• Patent application submitted (> 20 claims).

• Very good feedback from the scientific community.

• The platform will be extended to accurate mass, to LC-MS and to other GC systems.


Acknowledgment

Bioanalytical chemistry

Chemoinformatics

Arno Knorr Markus Stueber

Andre Stratmann Daniel Arndt Manuel Peitsch Pavel Pospisil

Elyette Martin Aurelien Monge


End

Thank you for your attention.
