Interactive Datamining of Large-Scale Screening Datasets
Klaus Engel, Thomas Ertl
Visualization and Interactive Systems Group, University of Stuttgart
Frank Oellien, Wolf D. Ihlenfeldt
Computer-Chemie-Centrum, University of Erlangen-Nuremberg
© Oellien, Ihlenfeldt, Engel, Ertl, C3 MMWS 2002
Overview
Multi-variate and multi-dimensional datasets
• Motivation
• Information Visualization Techniques
• Examples (ChemCodes Inc., NCI)
• Demo
Chemical data
[Figure: bar chart labeled "Current datasets", axis ticks from 0 to 18,000,000, with one bar each for Merck Katalog, Synopsys PG, ACX, NCI DTP, ChemInform, Spresi, Beilstein, and CAS]
Multi-Variate and Multi-Dimensional Numeric Datasets Today
Change in chemical synthesis technology
• new technologies (HTS, combinatorial synthesis)
→ experiments generate terabytes of data per year
• development of data mining and visualization tools could not keep pace
• most critical bottleneck in R&D today!
→ tools for interactive mining and information visualization are needed
Tools for Interactive Visualization of Multi-Variate and Multi-Dimensional Data
Standard applications
• bar charts, 2D and pseudo-3D scatter plots, molecular spreadsheets
• limited to small subsets
• platform-dependent
Our goal: applications that
• are simple to use
• allow straightforward interpretation of results
• provide generalized access to tabular numeric data
• are platform-independent
3D Tools for Interactive Information Visualization
Information visualization applications that use the 3D capabilities of modern clients
• Glyph-based InfVis approaches
• Volume-based InfVis approaches
Glyph-based InfVis Tools
• 3 orthogonal axes
• color
• shape
• size
• transparency
• surface effects
• animation
• up to ~100 Glyphs
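The attribute list above can be read as a mapping from table columns to visual channels. The following is a minimal Python sketch of that idea, not the applet's actual Java3D code: the `Glyph` class, the `build_glyphs` function, the column names, and the linear normalization are all hypothetical, and the sample rows are made-up placeholders.

```python
from dataclasses import dataclass

@dataclass
class Glyph:
    position: tuple  # (x, y, z), each in [0, 1] along the 3 orthogonal axes
    color: float     # 0.0..1.0, to be looked up in a color ramp
    size: float      # relative glyph size in [0.2, 1.0]
    shape: str       # categorical channel

def normalize(values):
    """Linearly rescale a list of numbers to [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.5 if span == 0 else (v - lo) / span for v in values]

def build_glyphs(rows, x, y, z, color, size, shape,
                 shapes=("sphere", "cube", "cone")):
    """Map five numeric columns and one categorical column to glyph channels."""
    cols = {k: normalize([r[k] for r in rows]) for k in (x, y, z, color, size)}
    categories = sorted({r[shape] for r in rows})
    glyphs = []
    for i, r in enumerate(rows):
        glyphs.append(Glyph(
            position=(cols[x][i], cols[y][i], cols[z][i]),
            color=cols[color][i],
            size=0.2 + 0.8 * cols[size][i],
            # Shapes are reused cyclically if there are more categories than shapes.
            shape=shapes[categories.index(r[shape]) % len(shapes)],
        ))
    return glyphs

# Made-up placeholder rows, for illustration only.
rows = [
    {"temperature": 20, "time": 1, "equiv": 1.0, "yield": 40, "scale": 1, "solvent": "THF"},
    {"temperature": 60, "time": 4, "equiv": 1.5, "yield": 95, "scale": 3, "solvent": "DMF"},
    {"temperature": 100, "time": 8, "equiv": 2.0, "yield": 70, "scale": 2, "solvent": "THF"},
]
glyphs = build_glyphs(rows, "temperature", "time", "equiv", "yield", "scale", "solvent")
```

Each row becomes one geometric object, so the number of objects to render grows with the number of rows. Transparency, surface effects, and animation would be further channels of the same kind and are left out here.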
Java/Java3D InfVis Applet
Tool Panel (filters, selection tools, details)
Java3D Canvas
Control Panel
Java/Java3D InfVis Applet: 3D Render Panel
3D Barchart
3D Glyphs
Java/Java3D InfVis Applet: 3D Tool Panel
Dynamic Filter Tools
Selection Tools
Detail Tools
Advantages of Volume-based InfVis Tools
Databases with millions of data points
– Glyph-based InfVis approaches
• produce millions of geometric primitives
• interactive visualization not possible
– Volume-based InfVis approaches
• can handle large number of data points
• interactive visualization using low-cost graphics hardware is possible
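One way to see why a volume scales where glyphs do not: the number of voxels is fixed by the grid resolution, no matter how many data points are accumulated into it. The slides do not say how the volume is built, so the following Python sketch is only an assumption. It takes points already normalized to [0, 1] and counts them per voxel; `bin_to_volume` and `resolution` are hypothetical names, and the three sample points are placeholders.

```python
def bin_to_volume(points, resolution=32):
    """Accumulate 3D points with coordinates in [0, 1] into a voxel count grid.

    Returns a flat list of length resolution**3; voxel (i, j, k) is stored
    at index (i * resolution + j) * resolution + k.
    """
    volume = [0] * resolution ** 3
    for x, y, z in points:
        # Clamp so that a coordinate of exactly 1.0 lands in the last voxel.
        i = min(int(x * resolution), resolution - 1)
        j = min(int(y * resolution), resolution - 1)
        k = min(int(z * resolution), resolution - 1)
        volume[(i * resolution + j) * resolution + k] += 1
    return volume

# Placeholder points, for illustration only.
points = [(0.1, 0.1, 0.1), (0.12, 0.1, 0.1), (0.9, 0.9, 1.0)]
volume = bin_to_volume(points, resolution=4)
```

The resulting grid has 4 × 4 × 4 = 64 cells whether it holds three points or three million; the renderer then works on the grid rather than on one primitive per point.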
ChemCodes Reaction Database
• 100 most important FGs ~75% chemistry
• 100 standard reactions
• Limits of standard reactions
• Functional Group Compatibility
• Generating Rules
Goal: Analysis of the reaction space
ChemCodes - Reaction Optimization I
• Goal: Reaction Optimization: > 95% Yield
• 7 Dimensions: reagent, solvent, time, temperature, stoichiometry, reagent order, FG-compatibility
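The optimization goal can be pictured as a query over a table with one column per dimension plus a yield column. The Python sketch below is an illustration under stated assumptions, not the ChemCodes workflow: the field names, the `high_yield_conditions` and `varying_dimensions` helpers, and all record values are hypothetical placeholders.

```python
# Hypothetical column names for the seven dimensions listed above.
DIMENSIONS = ("reagent", "solvent", "time_h", "temperature_c",
              "stoichiometry", "reagent_order", "fg_compatible")

def high_yield_conditions(records, min_yield=95.0):
    """Return the records whose yield exceeds min_yield, best first."""
    hits = [r for r in records if r["yield_pct"] > min_yield]
    return sorted(hits, key=lambda r: r["yield_pct"], reverse=True)

def varying_dimensions(records):
    """List the dimensions that take more than one value across records."""
    return [d for d in DIMENSIONS if len({r[d] for r in records}) > 1]

# Made-up placeholder records, for illustration only.
records = [
    {"reagent": "A", "solvent": "THF", "time_h": 2, "temperature_c": 25,
     "stoichiometry": 1.0, "reagent_order": "A-first", "fg_compatible": True,
     "yield_pct": 62.0},
    {"reagent": "A", "solvent": "DMF", "time_h": 4, "temperature_c": 60,
     "stoichiometry": 1.5, "reagent_order": "A-first", "fg_compatible": True,
     "yield_pct": 96.5},
    {"reagent": "A", "solvent": "DMF", "time_h": 4, "temperature_c": 80,
     "stoichiometry": 1.5, "reagent_order": "B-first", "fg_compatible": True,
     "yield_pct": 98.0},
    {"reagent": "A", "solvent": "THF", "time_h": 8, "temperature_c": 80,
     "stoichiometry": 2.0, "reagent_order": "B-first", "fg_compatible": False,
     "yield_pct": 95.0},
]
hits = high_yield_conditions(records)
still_open = varying_dimensions(hits)
```

Here `hits` keeps only the conditions above the 95% threshold, and `still_open` names the dimensions in which those conditions still differ, which is the kind of question an interactive filter in the visualization would answer.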
ChemCodes - Reaction Planning
Functional Group Compatibility Check
[Figure: chemical structure drawing with atom labels N, H, and O]
Example 2: NCI Anti-tumor / Anti-viral Database
• Initiated in April 1990 (modified 1994)
• ~ 250,000 compounds
• ~ 30,000 with anti-tumor screening data
Enhanced NCI Database Browser
• > 30 different molecular properties
• up to 23 3D conformers per compound
Acknowledgment
• Prof. Johann Gasteiger, Computer-Chemie-Centrum, University of Erlangen-Nuremberg
• Prof. Thomas Ertl, Dipl. Inf. Klaus Engel, Visualization and Interactive Systems, University of Stuttgart
• Dr. Patrick Kiser, Dr. Gary Eichenbaum, ChemCodes Inc.
• Marc Nicklaus, Laboratory of Medicinal Chemistry, NCI, NIH
• Deutsche Forschungsgemeinschaft