Analysing and Classifying
Names of Chemical Compounds
with CHEMorph
Stefanie Anstein, Gerhard Kremer
IMS, University of Stuttgart
April 11, 2006
Introduction System Details Conclusion
Example Analysis
[Figure: structural formula of 7-hydroxyheptan-2-one]
7-hydroxyheptan-2-one
compd(ane(7*C),pref([1*[7]-hydroxy]),suff([1*[2]-one]))
SMILES string: CC(=O)CCCCCO
Classes: ALCOHOL, KETONE, ...
Stefanie Anstein, Gerhard Kremer CHEMorph 2 / 13
Motivation & Background
life sciences ... and the amount of biomedical data
terminology ... and biochemical nomenclature
Challenges
term reference
coreferences
R-0.1.7.3 (IUPAC nomenclature of organic compounds):
Addition of the vowel “o”.
For euphonic reasons, the vowel “o” is sometimes inserted between consonants.
Modules Overview
name → parser → semantic representation
SMILES string generator → SMILES string
classifier → classes
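The module chain above can be sketched as three interchangeable functions. This is only an illustration, not CHEMorph's actual interface: the names `Analysis` and `analyse` are hypothetical, and the sketch assumes that both the generator and the classifier read the parser's semantic representation, as the classifier examples in this talk suggest. The three lambdas are placeholders that return the values from the Example Analysis slide.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Analysis:
    """Intermediate and final results for one compound name."""
    name: str
    semantics: str       # semantic representation produced by the parser
    smiles: str          # output of the SMILES string generator
    classes: List[str]   # output of the classifier


def analyse(name: str,
            parser: Callable[[str], str],
            generator: Callable[[str], str],
            classifier: Callable[[str], List[str]]) -> Analysis:
    """Run the three modules in sequence (assumed wiring, see lead-in)."""
    semantics = parser(name)
    return Analysis(name, semantics, generator(semantics), classifier(semantics))


# Placeholder modules that simply return the values shown on the example slide.
result = analyse(
    "7-hydroxyheptan-2-one",
    parser=lambda n: "compd(ane(7*C),pref([1*[7]-hydroxy]),suff([1*[2]-one]))",
    generator=lambda s: "CC(=O)CCCCCO",
    classifier=lambda s: ["ALCOHOL", "KETONE"],
)
```

Keeping the modules as plain callables means each one can be replaced or tested on its own.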
Name Types
                   fully specified                 underspecified
systematic         7-hydroxyheptan-2-one           heptanone
trivial            benzene                         ∅
semi-systematic    benzene-1,3,5-triacetic acid    dihydrobenzene
class              ∅                               alcohol
semi-systematic    ∅                               2-deoxysugar
Parser
Input: 7 - hydroxy hept an - 2 - one

organic compound
  prefix: [??*[7]-hydroxy]
    locant: ??*[7]
      loc: [7]
      hyphen: ∅
    pref: hydroxy
  parent nonsugar: ane(7*'C')
    mult: 7
    parent suffix: λ(X,ane(X*'C'))
  suffix: [??*[2]-one]
    hyphen: ∅
    locant: ??*[2]
      loc: [2]
      hyphen: ∅
    suff: one

Result: compd( ane(7*C) , pref( [??*[7]-hydroxy] ) , suff( [??*[2]-one] ) )
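To make the analysis above concrete, here is a minimal sketch that handles only names shaped like the example (optional locant and prefix, chain-length stem, "an", optional locant and suffix). It is not the CHEMorph grammar: the lexicon entries, the regular expression and the function names `parse` and `show` are all assumptions made for illustration. The multiplier stays "??" as in the parse tree, and a missing locant is printed as "[??]" by analogy with the notation used in this talk.

```python
import re

# Hypothetical mini-lexicon; the real lexicon is not shown on the slides.
STEMS = {"meth": 1, "eth": 2, "prop": 3, "but": 4,
         "pent": 5, "hex": 6, "hept": 7, "oct": 8}
PREFIXES = ("hydroxy",)
SUFFIXES = ("one", "ol")

NAME = re.compile(
    r"^(?:(?:(?P<ploc>\d+)-)?(?P<pref>%s))?"   # optional locant + prefix
    r"(?P<stem>%s)an"                           # chain-length stem + 'an'
    r"(?:-(?P<sloc>\d+)-)?(?P<suff>%s|e)$"     # optional locant + suffix, or plain 'e'
    % ("|".join(PREFIXES), "|".join(STEMS), "|".join(SUFFIXES))
)


def parse(name):
    """Return parent chain, prefixes and suffixes of a simple alkane-based name."""
    m = NAME.match(name.lower())
    if m is None:
        raise ValueError(f"cannot parse {name!r}")

    def part(locant, morph):
        # A bare final 'e' (as in 'heptane') carries no suffix meaning here.
        if morph is None or morph == "e":
            return []
        return [(int(locant) if locant else None, morph)]

    return {
        "length": STEMS[m["stem"]],
        "pref": part(m["ploc"], m["pref"]),
        "suff": part(m["sloc"], m["suff"]),
    }


def show(sem):
    """Print the analysis in the compd(...) notation used on the slides."""
    def items(parts):
        return "[" + ",".join(
            f"??*[{loc if loc is not None else '??'}]-{morph}"
            for loc, morph in parts) + "]"
    return (f"compd(ane({sem['length']}*C),"
            f"pref({items(sem['pref'])}),suff({items(sem['suff'])}))")
```

For example, `show(parse("7-hydroxyheptan-2-one"))` gives `compd(ane(7*C),pref([??*[7]-hydroxy]),suff([??*[2]-one]))`, and the underspecified name `heptanone` yields a suffix with an open locant.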
SMILES String Generator
representation of single chain elements
consistency check
underspecification:
underspecified( CC(=O)CCCCC , [{1,3,4,5,6,7}-hydroxy] )
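The three points above can be illustrated with a small sketch for unbranched carbon chains. It is an assumption about how such a generator might work, not the CHEMorph implementation: the `GROUPS` table, the free-hydrogen bookkeeping used as the consistency check, and the function names are hypothetical. A substituent whose locant is `None` is left open, and the result lists the positions that still have room for it.

```python
# Hypothetical table (not from the slides): SMILES fragment for each morpheme
# and the number of hydrogens it replaces on the chain carbon.
GROUPS = {"one": ("=O", 2), "hydroxy": ("O", 1), "ol": ("O", 1)}


def render(branches):
    """Write one 'C' per chain element, followed by its branches."""
    parts = []
    last = len(branches) - 1
    for i, bs in enumerate(branches):
        if i == last and bs:
            # On the last atom the final branch can be written inline.
            parts.append("C" + "".join(f"({b})" for b in bs[:-1]) + bs[-1])
        else:
            parts.append("C" + "".join(f"({b})" for b in bs))
    return "".join(parts)


def generate(length, substituents):
    """Return a SMILES string, or ('underspecified', smiles, options)."""
    branches = [[] for _ in range(length)]
    # Hydrogens on each carbon: four bonds minus its neighbours in the chain.
    free_h = [4 - (i > 0) - (i < length - 1) for i in range(length)]
    pending = []
    for locant, morph in substituents:
        fragment, needed = GROUPS[morph]
        if locant is None:              # locant missing in the name
            pending.append((morph, needed))
            continue
        # Consistency check: the position must exist and have free valences.
        if not 1 <= locant <= length:
            raise ValueError(f"locant {locant} outside chain of length {length}")
        if free_h[locant - 1] < needed:
            raise ValueError(f"no free valence for {morph} at position {locant}")
        branches[locant - 1].append(fragment)
        free_h[locant - 1] -= needed
    smiles = render(branches)
    if not pending:
        return smiles
    options = [({i + 1 for i, h in enumerate(free_h) if h >= needed}, morph)
               for morph, needed in pending]
    return ("underspecified", smiles, options)
```

With these assumptions, `generate(7, [(7, "hydroxy"), (2, "one")])` returns `CC(=O)CCCCCO`, and leaving the hydroxy locant open returns `CC(=O)CCCCC` together with the candidate positions {1,3,4,5,6,7}, as in the underspecified example on this slide.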
Classifier
morpheme           class
hydroxy- | -ol     ALCOHOL
cyclo- & -ane      CYCLOALKANE

compd( ane(7*C) , pref([1*[7]-hydroxy]) , suff([1*[2]-one]) )
→ ALKANE, ALCOHOL, KETONE
compd( ene(??*[??],ane(4*'C')) , pref([]) , suff([]) )
→ ALKENE
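The morpheme table above combines conditions with "|" (either morpheme suffices) and "&" (both are required). A minimal sketch of such a rule table follows. Only the ALCOHOL and CYCLOALKANE rows come from the slide; the ALKANE, KETONE and ALKENE rows are assumptions read off the two examples, and the sketch takes a flat set of morphemes as input rather than the nested compd(...) term.

```python
# Each rule: class label, combinator (any = "|", all = "&"), morphemes.
RULES = [
    ("ALKANE", all, ("-ane",)),               # assumed from the first example
    ("ALCOHOL", any, ("hydroxy-", "-ol")),    # from the slide
    ("CYCLOALKANE", all, ("cyclo-", "-ane")), # from the slide
    ("KETONE", all, ("-one",)),               # assumed from the first example
    ("ALKENE", all, ("-ene",)),               # assumed from the second example
]


def classify(morphemes):
    """Return every class whose morpheme condition is met."""
    present = set(morphemes)
    return [label
            for label, combine, needed in RULES
            if combine(m in present for m in needed)]
```

For instance, `classify({"hydroxy-", "-ane", "-one"})` returns `["ALKANE", "ALCOHOL", "KETONE"]`. A classifier working on the nested term would also need to decide which morphemes count at each level, which this flat sketch leaves to the caller.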
Results & Applications
SMILES string and classification
underspecification
term reference
coreference resolution
database curation and ontology acquisition
Conclusion & Outlook
feasible, extendable and transferable approach
extend grammar and lexicon
elaborate SMILES and classification
sophisticated linguistic analysis → database curation
term identification → text processing applications
Acknowledgements
Stefanie Anstein
Uwe Reyle
Jasmin Saric
EML Research gGmbH
Many thanks.
IUPAC Nomenclatures
Amino Acids and Peptides      EC 5 Isomerases             Phosphorus containing compounds
Biochemical thermodynamics    EC 6 Ligases                Polymerized amino acids
Branched nucleic acids        Folic acid                  Polypeptide conformation
Carbohydrates                 Glycolipids                 Polynucleotide conformation
Carotenoids                   Glycoproteins               Polysaccharide conformation
Corrinoids (vitamin B12)      myo-Inositol numbering      Prenol nomenclature
Cyclitols                     Lignan Nomenclature         Pyridoxal (vitamin B6)
Electron transport proteins   Lipid Nomenclature          Quinones with an Isoprenoid Chain
Enzyme kinetics               Multienzymes                Retinoids
Enzyme nomenclature           Multiple forms of enzymes   Steroids
EC 1 Oxidoreductases          Nucleic acid constituents   Tetrapyrroles
EC 2 Transferases             Nucleic acid sequence       Tocopherols (vitamin E)
EC 3 Hydrolases               Organic Chemistry           Translation Factors
EC 4 Lyases                   Peptide hormones            Vitamin D
KEGG: Kyoto Encyclopedia of Genes and Genomes
7-HYDROXYHEPTAN-2-ONE
PRIMARY ALCOHOL
ALCOHOL
7-HYDROXYHEPTANE
HYDROXYHEPTANE
HEPTANE
7-HYDROXYALKANE
HYDROXYALKANE
7-HYDROXYKETONE
HYDROXYKETONE
HYDROXYHEPTAN-2-ONE
HEPTAN-2-ONE
KETONE
ALKANE