Developing CAS Products for Substructure Searching by Chemists
Linda Toler
Kurt Loening Symposium, August 2001
Developing CAS Products for Substructure Searching
• Evolution of the CAS Registry
• Development of substructure searching for CAS products
• Future challenges
Evolution of the CAS Registry
Names and MFs
Fragment Codes and Linear Notations
Building the CAS Registry
Late 1950s
Dyson linear notation
Register Number
Building the CAS Registry
1959: Dyson linear notation
1965: Registry I (Morgan connection table)
Sample of a CAS Connection Table
CAS Registry Number: 125417-03-0
Rank #  : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Atom    : C C C C C C C C C O C C C C C
Bond to : 8 8 1 4 4 4 5 5 6 6 7 7 11 13 10 12 8 14 9 15
Bond is : -- -- -- -* -* -* -* =* -* -* =* -* -* -* RC -* RC -* RC -*
Mol Form: C14 H20 O
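The connection table above lists atoms in a canonical rank order. The Morgan algorithm named on the Registry I timeline derives such an order from "extended connectivity" values. The following is a minimal sketch of that first stage only, assuming a hydrogen-suppressed structure given as an adjacency list. The function name and the toy graph are illustrative and are not the CAS production code.

```python
def extended_connectivity(adjacency):
    """Return Morgan-style extended-connectivity values per atom.

    adjacency: dict mapping atom id -> list of neighbouring atom ids
    (hydrogens suppressed).
    """
    # Start from the number of non-hydrogen neighbours of each atom.
    values = {atom: len(nbrs) for atom, nbrs in adjacency.items()}
    distinct = len(set(values.values()))
    while True:
        # Replace each value by the sum of the neighbours' current values.
        summed = {atom: sum(values[n] for n in nbrs)
                  for atom, nbrs in adjacency.items()}
        summed_distinct = len(set(summed.values()))
        # Stop once an iteration no longer separates more atoms.
        if summed_distinct <= distinct:
            return values
        values, distinct = summed, summed_distinct


# Toy example (not the compound shown above): the carbon chain of n-pentane.
pentane = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
ec = extended_connectivity(pentane)
start_atom = max(ec, key=ec.get)  # atom that would receive rank 1
print(ec, start_atom)
```

In the full algorithm the atom with the highest value is ranked first and its neighbours are then numbered in decreasing order of value; tie-breaking by element and bond type is omitted here.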
Building the CAS Registry
1959: Dyson linear notation
1965: Registry I (Morgan connection table)
- Covered organic molecules
- CAS Registry Number
- “Normalized” ring bonds
Building the CAS Registry
1959: Dyson linear notation
1965: Registry I (Morgan connection table)
1968: Registry II
- Covered all compound classes
- Standardized stereo descriptors
- “Normalized” tautomer bonds
Building the CAS Registry
1959: Dyson linear notation
1965: Registry I (Morgan connection table)
1968: Registry II
1973: Registry III
- Streamlined internal handling and display of connection table information
CAS Registry Handles New Chemistry
• Superconducting substances of the 1980s
– registered ranges of element compositions
– registered non-stoichiometric compositions

RN 301237-56-9 REGISTRY
CN Barium calcium mercury rhenium oxide (Ba4Ca1.5Hg2Re0.5O8.5) (9CI) (CA INDEX NAME)
MF Ba . Ca . Hg . O . Re
AF Ba4 Ca1.5 Hg2 O8.5 Re0.5

Component | Ratio | Component Registry Number
==========+=======+==========================
O         | 8.5   | 17778-80-2
Ca        | 1.5   | 7440-70-2
Ba        | 4     | 7440-39-3
Re        | 0.5   | 7440-15-5
Hg        | 2     | 7439-97-6
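One way to picture searching such records by composition range is to turn the AF field into element/ratio pairs and test each ratio against a requested interval. This is a minimal sketch that assumes the AF string format shown in the record above. `parse_af` and `in_range` are hypothetical helper names, not part of any CAS product.

```python
import re


def parse_af(af):
    """Parse an AF-style string such as 'Ba4 Ca1.5 Hg2 O8.5 Re0.5'
    into a dict of {element symbol: ratio}."""
    composition = {}
    for symbol, ratio in re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", af):
        # A symbol without a number is taken to mean a ratio of 1.
        composition[symbol] = float(ratio) if ratio else 1.0
    return composition


def in_range(composition, ranges):
    """True if every element in `ranges` is present in the record
    with a ratio inside the closed interval [low, high]."""
    return all(
        element in composition and low <= composition[element] <= high
        for element, (low, high) in ranges.items()
    )


record = parse_af("Ba4 Ca1.5 Hg2 O8.5 Re0.5")
print(record)
print(in_range(record, {"Hg": (1, 3), "O": (8, 9)}))
```

The non-integer ratios (Ca 1.5, Re 0.5, O 8.5) are kept as floats, which is what lets a non-stoichiometric record fall inside or outside a queried range.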
CAS Registry Handles New Chemistry
• “Textual” descriptions of stereochemistry became outdated and difficult to use
• Upgraded connection tables to include stereo parity

RN 350021-96-4 REGISTRY
CN Benzeneethanol, .beta.-[(2-furanylmethyl)[(1R)-1-methyl-2-propynyl]amino]-, (.beta.R)- (9CI) (CA INDEX NAME)
Absolute stereochemistry.
CAS Registry Handles New Chemistry
• Emphasis on biomolecules mushroomed in the 1990s
• Macromolecules represented via one-letter codes for their basic building blocks

RN 349518-49-6 REGISTRY
CN G protein-coupled receptor 35 (mouse fragment) (9CI) (CA INDEX NAME)
FS PROTEIN SEQUENCE
SQL 93
SEQ  1 AHMVWANLAV FVICFLPLHV VLTVQVSLNL NTCAARDTFS RALSITGKLS
    51 DTNCCLDAIC YYYMAREFQE ASKPATSSNT PHKSQDSQIL SLT
CAS Registry Today
World’s Largest, Most Diverse Substance Collection
"Organic": 45%
Sequences: 43%
Coordination compounds: 5%
Polymers: 3%
Alloys: 2%
"Inorganics": 2%
>32,400,000 records
CAS Registry Today
Known for the Quality and Integrity of Its Structural Information
<Pictures of CAS Registry staff>
CAS Registry Today
An International Resource for Substance Identification
• CAS Registry Numbers are used throughout the world to identify substances
– Databases
– Handbooks
– Government regulatory agencies
– Consumer products
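Part of what makes a Registry Number practical as an identifier in databases and handbooks is that its last digit is a check digit, so transcription errors can be caught mechanically. This is a minimal sketch of that check; the function name is illustrative.

```python
def cas_check_digit_ok(rn):
    """Validate the final check digit of a CAS Registry Number,
    e.g. '7440-70-2'."""
    digits = rn.replace("-", "")
    if len(digits) < 2 or not digits.isdigit():
        return False
    body, check = digits[:-1], int(digits[-1])
    # Weight the digits 1, 2, 3, ... from right to left,
    # excluding the check digit itself.
    total = sum(weight * int(d)
                for weight, d in enumerate(reversed(body), start=1))
    return total % 10 == check


for rn in ("125417-03-0", "301237-56-9", "7440-70-2", "7440-70-3"):
    print(rn, cas_check_digit_ok(rn))
```

The first three numbers are taken from records shown in this talk; the last one is a deliberately corrupted copy of the calcium number.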
CAS Registry Today
Largest publicly available structure searchable compound collection
Connection Tables: >18.3M
Biosequences: >13.9M
Substructure Searching in the CAS Registry
“Substructure” Searching Can Mean Different Things
• Compounds with structures
– answers contain specified structural characteristics
• Alloys and non-stoichiometric inorganics
– answers are compounds of various composition ranges
• Protein and nucleic acid sequences
– answers are sequences containing the same string of building-block residues
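For the sequence case, "substructure" reduces to finding a string of residue codes inside longer strings. This is a minimal sketch using the 93-residue protein record shown earlier in this talk. The second entry and the function name are hypothetical and serve only to show a non-matching record.

```python
def find_subsequence(sequences, query):
    """Return {record id: [1-based start positions]} for every
    sequence that contains the query string of residue codes."""
    hits = {}
    for record_id, seq in sequences.items():
        positions = []
        start = seq.find(query)
        while start != -1:
            positions.append(start + 1)          # report 1-based positions
            start = seq.find(query, start + 1)   # allow overlapping hits
        if positions:
            hits[record_id] = positions
    return hits


sequences = {
    "349518-49-6": (
        "AHMVWANLAVFVICFLPLHVVLTVQVSLNLNTCAARDTFSRALSITGKLS"
        "DTNCCLDAICYYYMAREFQEASKPATSSNTPHKSQDSQILSLT"
    ),
    "hypothetical-record": "MKTAYIAKQR",  # made-up sequence, not a Registry entry
}
print(find_subsequence(sequences, "YYY"))
```

A production system would index the sequences rather than scan them one by one; the linear scan is used here only to keep the matching rule visible.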
“Substructure” Searching Can Mean Different Things
• Compounds with structures
– answers contain specified structural characteristics
What Should a Good Substructure Search System Offer?
• Query Input
– Easy-to-use query input mechanism
– Flexible query definition options
   -C; -CH3; R=Me, Et, or n-Pr; Ak
• Retrieval
– Quick, comprehensive retrieval of all matching compounds
– Tools for dealing with registration “idiosyncrasies”
Query Input Methods Evolved Over Time
• Screens
• “Commands”
• Drawing
Query Input via Screens
=> SCREEN 1867 AND 42 1199 AND 745 AND 1033 AND 1139 AND 1142 AND 1707 AND 1831

1867 TR DDDDDD
42 AS C-C*C*C-C-O
1199 AA C -1C -1O -2O
745 BS A *2A *1A *1A *1A *1A *1A
Query Input via Commands
=> STR
:GRAPH R6, 1 C1, 3 C1, 4 C3, 5 C1, 6 C1, 9 C1
:NODE 8 OH, 7 14 10 O, 11 12 13 AK
:BOND ALL S, 1-7 2-3 9-14 D
:RSP I, CON 11 12 13 E1, DIS
<Structure diagram of the resulting query, with nodes numbered 1 to 14>
Query Input Offline via Drawing
What Should a Good Substructure Search System Offer?
• Query Input
– ✔ Easy-to-use query input mechanism
– ✔ Flexible query definition options
   -C; -CH3; R=-CH3, Et, or n-Pr; Ak
• Retrieval
– Quick, comprehensive retrieval of all matching compounds
– Tools for dealing with registration “idiosyncrasies”
The CAS Substructure Search System Evolved Over Time
1960s: Development of prototype SSS system
1980: Substructure searching via screens
1981: Substructure searching via structure diagrams
Substructure Searching Is a Two-step Process
Structure Compilation → Screen Generation → Screen Search → Candidate Answers → “Iterative” Search
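The two steps can be sketched as a cheap screen test that discards most of the file, followed by an atom-by-atom ("iterative") match run only on the surviving candidates. This is a minimal illustration under loud assumptions: the screens here are just element symbols and bonded element pairs, not the CAS screen dictionary. Bond orders are ignored, and the tiny database, its keys, and all function names are hypothetical.

```python
def screens(mol):
    """Cheap features: element symbols plus bonded element pairs."""
    atoms, bonds = mol
    features = set(atoms.values())
    for a, b in bonds:
        features.add("-".join(sorted((atoms[a], atoms[b]))))
    return features


def neighbours(mol):
    atoms, bonds = mol
    nbrs = {a: set() for a in atoms}
    for a, b in bonds:
        nbrs[a].add(b)
        nbrs[b].add(a)
    return nbrs


def atom_by_atom(query, target):
    """Backtracking check that the query graph embeds in the target."""
    q_atoms, t_atoms = query[0], target[0]
    q_nbrs, t_nbrs = neighbours(query), neighbours(target)
    order = list(q_atoms)

    def extend(mapping):
        if len(mapping) == len(order):
            return True
        q = order[len(mapping)]
        for t in t_atoms:
            if t in mapping.values() or t_atoms[t] != q_atoms[q]:
                continue
            # Each already-mapped query neighbour must be bonded in the target.
            if all(mapping[n] in t_nbrs[t] for n in q_nbrs[q] if n in mapping):
                mapping[q] = t
                if extend(mapping):
                    return True
                del mapping[q]
        return False

    return extend({})


def substructure_search(query, database):
    q_screens = screens(query)
    # Step 1: screen search keeps records that have every query screen.
    candidates = [k for k, mol in database.items() if q_screens <= screens(mol)]
    # Step 2: iterative search confirms each candidate atom by atom.
    return [k for k in candidates if atom_by_atom(query, database[k])]


query = ({1: "C", 2: "C", 3: "O"}, [(1, 2), (2, 3)])  # C-C-O
database = {
    "ethanol": ({1: "C", 2: "C", 3: "O"}, [(1, 2), (2, 3)]),
    "dimethyl ether": ({1: "C", 2: "O", 3: "C"}, [(1, 2), (2, 3)]),
    "1-propanol": ({1: "C", 2: "C", 3: "C", 4: "O"}, [(1, 2), (2, 3), (3, 4)]),
    # C-C-N-C-O: passes the screens but contains no C-C-O path.
    "false-drop": ({1: "C", 2: "C", 3: "N", 4: "C", 5: "O"},
                   [(1, 2), (2, 3), (3, 4), (4, 5)]),
}
answers = substructure_search(query, database)
print(answers)
```

The "false-drop" record shows why the second step is needed: it has every screen the query has, yet the atom-by-atom match rejects it.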
The CAS Substructure Search System Evolved Over Time
1960s: Development of prototype SSS system
1980: Substructure searching via screens
1981: Substructure searching via structure diagrams
1990: Substructure searching extended to Markush
Markush specification:
R= unsaturated alkyl of 1-4 carbon atoms
R = methyl, ethyl, or n-propyl
Query structure:
C??
Ak ??
What Should a Good Substructure Search System Offer?
• Query Input
– ✔ Easy-to-use query input mechanism
– ✔ Flexible query definition options
   -C; -CH3; R=-CH3, Et, or n-Pr; Ak
• Retrieval
– ✔ Quick, comprehensive retrieval of all matching compounds
– Tools for dealing with registration “idiosyncrasies”
Structural Representations Present Search Challenges
• Salts
• Enol-keto tautomers
• Pyrazoles
CAS Designed Search Tools/Systems to Address Structuring Conventions
• Query input tools to allow for all relevant structural characteristics
• Algorithms that handle structural representation conventions
STN Searchers Have a Variety of Tools To Allow for Different Structural Representations
Answers:
Unspecified bonds
Connectivity
SciFinder Search Algorithms Handle Many Structure Conventions
What Should a Good Substructure Search System Offer?
• Query Input
– ✔ Easy-to-use query input mechanism
– ✔ Flexible query definition options
   -C; -CH3; R=-CH3, Et, or n-Pr; Ak
• Retrieval
– ✔ Quick, comprehensive retrieval of all matching compounds
– ✔ Tools for dealing with registration “idiosyncrasies”
What Does the Future Hold?
Customers Want More!
• Chemical substance retrieval
– Similar structures
– “Shape” matching
• Converting information into knowledge
– Tools for discovering “structural relationships”
– Tools for mining the diversity in the CAS Registry for relevant substances
What Does the Future Hold?
"To remain relevant to the work of scientists in the twenty-first century…, CAS information technology must keep pace with the evolution of the chemical sciences, including related biological sciences, and remain adaptive enough to accommodate the unexpected and exciting developments that undoubtedly lie ahead."
From: “Chemical Abstracts Service Information System”, Encyclopedia of Computational Chemistry, John Wiley & Sons, November 1998.
Acknowledgements
Weisgerber, D. W. (1997) Chemical Abstracts Chemical Registry System: History, Scope, and Impacts. Journal of the American Society for Information Science 48(4), p. 349-360.
Fisanick, W., Amaral, N. J., Metanomski, W. V., Shively, E. R., Soukup, K. M., Stobaugh, R. E. (1998) Chemical Abstracts Service Information System. Encyclopedia of Computational Chemistry, p. 277-315.