Thomas Steinke Zuse Institute Berlin (ZIB) steinke@zib.de Activities of the COST D37 GridChem...
![Page 1: Thomas Steinke Zuse Institute Berlin (ZIB) steinke@zib.de Activities of the COST D37 GridChem Computational Chemistry Workflow Group EGEE'07 Conference.](https://reader030.fdocuments.us/reader030/viewer/2022032607/56649eba5503460f94bc2264/html5/thumbnails/1.jpg)
Thomas Steinke
Zuse Institute Berlin (ZIB), www.zib.de, steinke@zib.de
Activities of the COST D37 GridChem Computational Chemistry Workflow Group
EGEE'07 Conference
Budapest
01.10.2007
Partners in the CCWF Working Group
Locations on the map: Berlin, Erlangen, Manno, Zürich, Cambridge, London, Sevilla, København
- Thomas Steinke, Tim Clark (DE)
- Hans-Peter Lüthi, Martin Brändle (CH)
- Peter Murray-Rust, Henry Rzepa (UK)
- Antonio Márquez (ES)
- Kurt Mikkelsen (DK)
- CSCS (Manno, CH)
- ZIB (Berlin, DE)
“Traditional” Workflow in Computational Chemistry
Workflows have a long tradition in the CC domain.
start → knowledge base (DB search) → automated / manually edited molecular structures → molecular simulations (method / program A, method / program B, …) → properties → primary visualization / quality control → analysis / archival / DB storage → new insights?
(typical of the 1980s and 1990s)
Databases: Computational protocol (T. Clark, 1998)
The complete protocol runs automatically with a failure rate below 0.5%: cleanup → 2D→3D conversion → VAMP optimization → calculate properties.
~3,000 compounds per processor day (3 GHz Xeon)
B. Beck, A. Horn, J. E. Carpenter, and T. Clark, "Enhanced 3D-Databases: A Fully Electrostatic Database of AM1-Optimized Structures", J. Chem. Inf. Comput. Sci. 1998, 38, 1214–1217.
source: Tim Clark, Uni Erlangen
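The four-step protocol can be sketched as a batch loop that pushes each compound through the steps in order and records which step failed, so the failure rate can be checked against the 0.5% figure. The step functions are hypothetical placeholders (not the VAMP interface); only the step order comes from the slide. As a sanity check on the throughput, ~3,000 compounds per processor day corresponds to 86,400 s / 3,000 = 28.8 s per compound.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-ins for the real tools (not the VAMP interface):
# each takes a compound record and returns an updated copy, or raises on failure.
def cleanup(mol: dict) -> dict:
    if not mol.get("smiles"):
        raise ValueError("empty structure")
    return {**mol, "clean": True}

def convert_2d_to_3d(mol: dict) -> dict:
    return {**mol, "has_3d": True}

def optimize(mol: dict) -> dict:
    return {**mol, "optimized": True}

def calculate_properties(mol: dict) -> dict:
    return {**mol, "properties": {"n_chars": len(mol["smiles"])}}

# Step order as on the slide.
STEPS: list[tuple[str, Callable[[dict], dict]]] = [
    ("cleanup", cleanup),
    ("2D->3D conversion", convert_2d_to_3d),
    ("optimization", optimize),
    ("properties", calculate_properties),
]

@dataclass
class ProtocolReport:
    results: list[dict] = field(default_factory=list)
    failures: list[tuple[str, str, str]] = field(default_factory=list)  # (id, step, error)

    @property
    def failure_rate(self) -> float:
        total = len(self.results) + len(self.failures)
        return len(self.failures) / total if total else 0.0

def run_protocol(molecules: list[dict]) -> ProtocolReport:
    report = ProtocolReport()
    for mol in molecules:
        record = mol
        for step_name, step in STEPS:
            try:
                record = step(record)
            except Exception as exc:
                # Record which step failed and continue with the next compound.
                report.failures.append((mol["id"], step_name, str(exc)))
                break
        else:  # runs only if no step raised
            report.results.append(record)
    return report

def seconds_per_compound(compounds_per_day: float) -> float:
    return 86_400 / compounds_per_day

report = run_protocol([{"id": "c1", "smiles": "CCO"}, {"id": "c2", "smiles": ""}])
print(report.failure_rate, report.failures)
print(seconds_per_compound(3_000))  # 28.8 s per compound
```

Keeping the failing step name next to each failure makes it easy to see whether the rare failures cluster in one stage of the protocol.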
Distributed Computing Environment in the 1990s
QM packages
Distributed Computing Environment in the 1990s
Example: UniChem, a distributed environment for quantum-chemical simulations (Cray Research Inc., 1991–(2004))
CCWF Chemical Illustrator Applications
- Molecular design of functionalised enzynes: Hans-Peter Lüthi, Martin Brändle, Zürich; Peter Murray-Rust, Cambridge; Henry Rzepa, London
- Quantum chemical based QSAR/QSPR: Tim Clark, Erlangen; Jon Essex, Southampton
- High-order dynamic and static electrostatic molecular properties: Kurt Mikkelsen, Copenhagen
- Computational heterogeneous catalysis: Antonio M. Márquez Cruz, Javier Fdez. Sanz, Sevilla
Molecular Design Workflow (Enzyne Design)
Steps: generation and archiving of data; extraction via XPath queries; statistical analysis
Diagram components: QC input, QC application, QC output, parser, XML, database (DB), XPath query, XSLT, statistical analysis
source: Hans-Peter Lüthi, ETH Zürich
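The parser → XML → XPath query → statistical analysis chain can be sketched with the Python standard library. The XML layout (a `<molecule>` element with `<property dictRef=...>` children, loosely CML-like) and the property names are assumptions for illustration, not the format used in the Zürich workflow. `xml.etree.ElementTree` supports only a subset of XPath, but that subset includes the attribute-filtered lookups needed here.

```python
import statistics
import xml.etree.ElementTree as ET

# Hypothetical parser output: one <molecule> per QC calculation (simplified, CML-like).
QC_RESULTS = """
<results>
  <molecule id="m1">
    <property dictRef="homoLumoGap" units="eV">3.1</property>
    <property dictRef="totalEnergy" units="hartree">-230.71</property>
  </molecule>
  <molecule id="m2">
    <property dictRef="homoLumoGap" units="eV">2.7</property>
  </molecule>
  <molecule id="m3">
    <property dictRef="homoLumoGap" units="eV">3.5</property>
  </molecule>
</results>
"""

def extract(root: ET.Element, dict_ref: str) -> dict[str, float]:
    """Return {molecule id: value} for every property with the given dictRef."""
    values = {}
    for mol in root.findall("molecule"):
        # ElementTree's XPath subset supports [@attr='value'] predicates.
        prop = mol.find(f"property[@dictRef='{dict_ref}']")
        if prop is not None and prop.text:
            values[mol.get("id")] = float(prop.text)
    return values

root = ET.fromstring(QC_RESULTS.strip())
gaps = extract(root, "homoLumoGap")
values = list(gaps.values())
print(gaps, statistics.mean(values), statistics.stdev(values))
```

Molecules that lack a property are simply skipped, so the statistics step only sees values that were actually computed.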
Quantum Chemical Based QSAR and QSPR
- 2D database → 2D→3D conversion (conformations, tautomers): generate structures, conformations and protonation states
- VAMP: semiempirical MO geometry optimization and electron density
- ParaSurf: generate isodensity surfaces, spherical-harmonic fits and local properties
- QSPR: apply models
Applications: virtual screening, ADME/Tox., pharmacokinetics, molecular info, materials design, multiscale modeling, property optimization
source: Tim Clark, Uni Erlangen
Properties: Free Energies of Hydration
[Figure: calculated vs. experimental ΔG_solv(H2O) in kcal mol⁻¹; both axes span −14 to 4]
N = 362, MUE = 0.85 kcal mol⁻¹, RMSD = 1.09 kcal mol⁻¹, r² = 0.88, q² = 0.83
source: Tim Clark, Uni Erlangen
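The fit statistics above follow standard definitions: for N pairs of calculated and experimental values, MUE = (1/N) Σ|calc − exp| and RMSD = √((1/N) Σ(calc − exp)²); q² = 1 − PRESS/SS_tot is computed from leave-one-out predictions. The slide does not say which r² variant is used; this sketch assumes the squared Pearson correlation and, for q², a hypothetical one-descriptor linear model. The numbers are toy values, not the 362-compound data set.

```python
import math

def mue(calc: list[float], exp: list[float]) -> float:
    """Mean unsigned error."""
    return sum(abs(c - e) for c, e in zip(calc, exp)) / len(exp)

def rmsd(calc: list[float], exp: list[float]) -> float:
    """Root-mean-square deviation."""
    return math.sqrt(sum((c - e) ** 2 for c, e in zip(calc, exp)) / len(exp))

def r_squared(x: list[float], y: list[float]) -> float:
    """Squared Pearson correlation coefficient (assumed r^2 variant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy * sxy / (sxx * syy)

def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    return my - slope * mx, slope

def q_squared_loo(x: list[float], y: list[float]) -> float:
    """Leave-one-out q^2 = 1 - PRESS / SS_tot for a one-descriptor linear model."""
    press = 0.0
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a + b * x[i])) ** 2  # error on the held-out point
    my = sum(y) / len(y)
    return 1.0 - press / sum((yi - my) ** 2 for yi in y)

# Toy numbers for illustration only.
calc = [-5.0, -3.0, -1.0, 1.0]
exp = [-4.0, -3.0, -2.0, 1.0]
print(mue(calc, exp), rmsd(calc, exp), r_squared(calc, exp))
```

Because each held-out point is predicted by a model that never saw it, q² is normally somewhat lower than r², as on the slide (0.83 vs. 0.88).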
Computing the NCI database (P. Murray-Rust, ’05)
MOPAC PM5
source: Peter Murray-Rust et al., Uni Cambridge / Unilever Institute
Workflow built with Taverna
Times to run jobs
[Figure: job run time (s, 0 to 120,000) plotted against (number of basis functions)⁴ (0 to 1×10⁹)]
source: Peter Murray-Rust et al., Uni Cambridge / Unilever Institute
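Plotting run time against the fourth power of the number of basis functions N implies a model t ≈ k·N⁴ (the axis maximum of 10⁹ corresponds to N ≈ 178). Under that assumption, k can be fitted by least squares through the origin, k = Σ tᵢxᵢ / Σ xᵢ² with xᵢ = Nᵢ⁴, and used to estimate run times before a campaign is submitted. The timings in this sketch are invented for illustration.

```python
def fit_quartic_prefactor(n_basis: list[int], times: list[float]) -> float:
    """Least-squares k for t = k * N**4, i.e. a line through the origin in x = N**4."""
    xs = [float(n) ** 4 for n in n_basis]
    return sum(t * x for t, x in zip(times, xs)) / sum(x * x for x in xs)

def scale_time(t_ref: float, n_ref: int, n_new: int) -> float:
    """Extrapolate a measured run time assuming pure N**4 scaling."""
    return t_ref * (n_new / n_ref) ** 4

# Invented timings (seconds) for illustration only.
n_basis = [100, 150, 200]
times = [1_000.0, 5_200.0, 16_100.0]
k = fit_quartic_prefactor(n_basis, times)
print(f"k = {k:.3e} s per N^4; predicted t(300) = {k * 300**4:.0f} s")
print(scale_time(1_000.0, 100, 200))  # doubling N multiplies the time by 16
```

The quartic model makes the planning consequence explicit: doubling the basis set multiplies the cost by 16, which is why a few large molecules can dominate a campaign's wall time.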
Diagram: protocol → log files → parse
Categories detected: system crashes, program crashes, unsuitable data, science errors, pathological behaviour
Follow-up actions: inform developer, analysis, statistics, other science, disseminate results
source: Peter Murray-Rust et al., Uni Cambridge / Unilever Institute
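The triage in this diagram (parse each log file, then assign the job to a category such as a system crash, a program crash or pathological behaviour) can be sketched as an ordered list of rules. The regular expressions, the "normal termination" marker and the cycle threshold are all hypothetical: real MOPAC or cluster log messages differ, and the rules actually used in the NCI run are not given on the slide.

```python
import re
from collections import Counter
from enum import Enum

class Outcome(Enum):
    OK = "ok"
    SYSTEM_CRASH = "system crash"
    PROGRAM_CRASH = "program crash"
    UNSUITABLE_DATA = "unsuitable data"
    SCIENCE_ERROR = "science error"
    PATHOLOGICAL = "pathological behaviour"

# Hypothetical rules, checked in order; the first match wins.
RULES = [
    (re.compile(r"killed|node failure|disk quota", re.I), Outcome.SYSTEM_CRASH),
    (re.compile(r"segmentation fault|floating point exception", re.I), Outcome.PROGRAM_CRASH),
    (re.compile(r"unrecognised element|no parameters for", re.I), Outcome.UNSUITABLE_DATA),
    (re.compile(r"scf (?:did not|failed to) converge", re.I), Outcome.SCIENCE_ERROR),
]

def classify(log_text: str, max_cycles: int = 500) -> Outcome:
    for pattern, outcome in RULES:
        if pattern.search(log_text):
            return outcome
    if "normal termination" not in log_text.lower():
        return Outcome.PROGRAM_CRASH  # no clean end marker: treat as a crash
    cycles = re.search(r"geometry cycles:\s*(\d+)", log_text, re.I)
    if cycles and int(cycles.group(1)) > max_cycles:
        return Outcome.PATHOLOGICAL  # finished, but suspiciously slowly
    return Outcome.OK

def triage(logs: dict[str, str]) -> Counter:
    """Count outcomes over a {job id: log text} mapping."""
    return Counter(classify(text) for text in logs.values())

print(triage({"a": "Normal termination", "b": "Segmentation fault"}))
```

Rule order matters: a log that reports an SCF failure and then terminates normally is still counted as a science error, not as a success.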
source: Peter Murray-Rust et al., Uni Cambridge / Unilever Institute
Conclusions from NCI “Experiment” (2005)
Protocols can be automated
Machines can highlight unusual behaviour, geometries and distribution of results for humans to consider
Computational programs can provide high-quality "experimental" molecular properties
Motivation
The orchestration of complex workflow scenarios is on today’s agenda.
complex scientific solution paths linking in-house and (commercial) legacy codes
Transformation of scientific ventures into a scientifically validated protocol that allows highly (semi-)automated data generation (pre-processing) and data processing steps.
Goals of the CCWF Working Group
implementation of workflow environments for QC by adapting standard (Grid) technologies
fostering standard techniques (interfaces) for handling quantum chemical data in a flexible and extensible format, to ensure interoperability between application programs and to support efficient access to chemical information based on a CC ontology.
implementation of computational chemistry illustrator scenarios to demonstrate the applicability of our approach
Generic Workflow
1. Automatic generation and validation of input data
2. Submission, monitoring, and gathering of output data of simulation jobs
3. Integration of results (primary data) into a project database
4. Data mining and visualization techniques to reduce complexity
5. Knowledge generation by applying methods of statistical analysis and pattern recognition
6. Online publication and archiving of valuable scientific data
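Steps 1 to 3 of this generic workflow can be sketched in a few lines: inputs are generated from a template and validated before submission, and results land in a project database that later steps can query. The input template (loosely MOPAC-style: keyword line, two comment lines, then geometry), the `run_job` stand-in and the table layout are hypothetical; a real `run_job` would submit to a batch system or Grid middleware. SQLite from the Python standard library stands in for the project database.

```python
import sqlite3
from string import Template

# Hypothetical input template: keywords, title, blank comment line, geometry.
INPUT_TEMPLATE = Template("$keywords\n$title\n\n$geometry\n")

def generate_input(mol_id: str, geometry: str, keywords: str = "PM5") -> str:
    return INPUT_TEMPLATE.substitute(keywords=keywords, title=mol_id, geometry=geometry)

def geometry_lines(text: str) -> list[str]:
    return [ln for ln in text.splitlines()[3:] if ln.strip()]

def validate_input(text: str) -> bool:
    """Step 1: require at least one 'element x y z' line with numeric coordinates."""
    lines = geometry_lines(text)
    if not lines:
        return False
    for ln in lines:
        fields = ln.split()
        if len(fields) != 4:
            return False
        try:
            [float(v) for v in fields[1:]]
        except ValueError:
            return False
    return True

def run_job(mol_id: str, text: str) -> dict:
    """Step 2 placeholder (hypothetical): submit, monitor and gather output."""
    return {"id": mol_id, "n_atoms": len(geometry_lines(text)), "status": "done"}

def store(db: sqlite3.Connection, result: dict) -> None:
    """Step 3: integrate primary results into the project database."""
    db.execute(
        "INSERT OR REPLACE INTO results (id, n_atoms, status) VALUES (?, ?, ?)",
        (result["id"], result["n_atoms"], result["status"]),
    )

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE results (id TEXT PRIMARY KEY, n_atoms INTEGER, status TEXT)")
molecules = {
    "water": "O 0.0 0.0 0.0\nH 0.757 0.586 0.0\nH -0.757 0.586 0.0",
    "broken": "O 0.0 zero 0.0",
}
for mol_id, geom in molecules.items():
    text = generate_input(mol_id, geom)
    if not validate_input(text):
        store(db, {"id": mol_id, "n_atoms": 0, "status": "invalid input"})
        continue
    store(db, run_job(mol_id, text))
db.commit()
print(db.execute("SELECT id, n_atoms, status FROM results ORDER BY id").fetchall())
```

Recording invalid inputs in the same table, rather than silently dropping them, keeps the database a complete account of the campaign for the later mining and publication steps.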
Challenges
Diversity: molecular properties derived from state functions obtained with electronic-structure methods (ab initio, semi-empirical, DFT, approximate potentials), computed with programs such as Gaussian, COLUMBUS, Dalton, Turbomole, MOPAC, Vamp, CPMD, …
Data formats: how can seamless data export/import be implemented? About 80 relevant formats are known in CC: XYZ, MDL, SDF, PDB, …
OpenBABEL
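Even the simplest of these formats needs checking at workflow boundaries. The XYZ format consists of an atom-count line, a free-text comment line and one `element x y z` line per atom; a minimal reader/writer that validates the atom count might look like the sketch below. It is an illustration only, not a replacement for a converter such as OpenBABEL, and the function names are our own.

```python
from dataclasses import dataclass

@dataclass
class Atom:
    element: str
    x: float
    y: float
    z: float

def read_xyz(text: str) -> tuple[str, list[Atom]]:
    """Parse a single-structure XYZ block; raise ValueError if it is malformed."""
    lines = text.splitlines()
    if len(lines) < 2:
        raise ValueError("XYZ needs an atom-count line and a comment line")
    try:
        n_atoms = int(lines[0].strip())
    except ValueError:
        raise ValueError(f"bad atom count: {lines[0]!r}") from None
    comment = lines[1]
    atom_lines = [ln for ln in lines[2:] if ln.strip()]
    if len(atom_lines) != n_atoms:
        raise ValueError(f"header says {n_atoms} atoms, found {len(atom_lines)}")
    atoms = []
    for ln in atom_lines:
        fields = ln.split()
        if len(fields) < 4:
            raise ValueError(f"bad atom line: {ln!r}")
        element, x, y, z = fields[0], *map(float, fields[1:4])
        atoms.append(Atom(element, x, y, z))
    return comment, atoms

def write_xyz(comment: str, atoms: list[Atom]) -> str:
    body = "\n".join(
        f"{a.element:<2} {a.x:12.6f} {a.y:12.6f} {a.z:12.6f}" for a in atoms
    )
    return f"{len(atoms)}\n{comment}\n{body}\n"

water = "3\nwater\nO 0.0 0.0 0.0\nH 0.757 0.586 0.0\nH -0.757 0.586 0.0\n"
comment, atoms = read_xyz(water)
print(write_xyz(comment, atoms))
```

Rejecting a mismatch between the header count and the atom lines early is cheap, and it catches truncated files before they reach a long-running QC job.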
Challenges (cont.)
Scaling, Robustness, Load Balancing: I can handle O(10) jobs by hand, but what about campaigns of O(1000) jobs? This calls for a workflow system, computational resources, distributed computing, persistence, automated failure recovery, …; simulation times are long and sometimes unpredictable.
Acceptance: ease of use, GUI + CLI
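The persistence and automated-failure-recovery requirement for O(1000)-job campaigns can be illustrated with a small driver that saves each job's state to a JSON file after every attempt, so a restarted driver skips finished jobs and retries failed ones up to a limit. The `submit` callable, the state-file name and the retry limit are assumptions; in practice this bookkeeping would live in the workflow engine or batch system.

```python
import json
import os
import tempfile
from typing import Callable

def _save(state: dict, path: str) -> None:
    # Write to a temporary file, then rename, so a crash never leaves a half-written state.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "w") as fh:
        json.dump(state, fh)
    os.replace(tmp, path)

def run_campaign(job_ids: list[str], submit: Callable[[str], bool],
                 state_file: str, max_attempts: int = 3) -> dict:
    """Run jobs until each is done or has used max_attempts; state survives restarts."""
    if os.path.exists(state_file):
        with open(state_file) as fh:
            state = json.load(fh)
    else:
        state = {}
    for job in job_ids:
        entry = state.setdefault(job, {"status": "pending", "attempts": 0})
        while entry["status"] != "done" and entry["attempts"] < max_attempts:
            entry["attempts"] += 1
            try:
                ok = submit(job)
            except Exception:
                ok = False  # an exception counts as a failed attempt
            entry["status"] = "done" if ok else "failed"
            _save(state, state_file)
    return state

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "campaign_state.json")
    print(run_campaign(["job-001", "job-002"], lambda job: True, path))
```

Calling `run_campaign` again with the same state file is cheap: finished jobs and jobs that have exhausted their retries are not resubmitted.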
What I Want…
ease of use: workflow orchestration, usage, installation / maintenance
sharing of workflow descriptions with my colleagues: standard languages
support in a heterogeneous environment: laptop – server – cluster – supercomputer – grid
Which Workflow System?
… to be spoilt for choice?
Some Assessment Criteria
- workflows in distributed systems
- supported batch systems: PBS (LSF)
- support for managing large files
- recovery / backup
- quality of the documentation
- customizability
- PKI / security
- required installation effort
- Web interface
- WF language
- robustness, stability
- Grid environment
- open source
- restart/stop/debugging
- user/installation base
- status & exception handling
- legacy codes and Web services
- project development activity
- GUI
TRIANA Experiences (2005/06)
+ workflow orchestration
+ integration of web services
+ semantic check of WSDL files
+ support for self-written Triana modules
+ negligible control logic overhead
+ prerequisite for migration to Grid environments
- proprietary workflow description language in TRIANA (BPEL is announced)
- GUI robustness for very complex workflow definitions
GWES Experiences (MediGRID, since 2006)
+ integration of web services and legacy codes
+ monitoring + debugging support
+ Grid environments
+ under active development (A. Hoheisel et al./FhG FIRST)
- workflow orchestration (WF GUI builder in preparation)
- proprietary workflow description language
OMII Server: Attracting Features
- Workflow language: BPEL (Active BPEL), WF editor (Eclipse), Web Services customization
- Job submission & monitoring via WS job manager API: persistent (job recovery), in-memory (via Hibernate)
- Distributed Resource Management (DRM): Condor-G, Globus GRAM, SSH-exec, your own plug-ins (e.g. PBS)
- Data: GridSAM file staging support within the job (JSDL): file stage in/out; Apache Virtual File System library (VFS): FTP, local files, http, https, sftp; zip, jar, tar, bzip2, gzip; ram (data in memory); GridFTP
OMII/Active BPEL Experiences (3 months)
+ workflow orchestration (Eclipse plugin)
+ standardized WF language
+ monitoring support
+ Grid environments
+ security features: https + signed messages (X.509 cert.)
+ active development (UK eScience)
- deployment requires manual workarounds
- learning barrier (BPEL)
- BPEL editor not fully mature (validation of BPEL workflows)
Summary
- there are a couple of workflow systems available
- the design and development of workflow systems is still ongoing research
- our working group has not yet decided on a system
- barriers: ease of use vs. robustness; middleware stack: more complicated Grid environments vs. script-based approaches on clusters; standards vs. proprietary but powerful/sufficient WF languages
- BPEL has a good chance of surviving
Acknowledgement
Core members of the D37 CCWF working group: Hans-Peter Lüthi, ETH Zurich; Tim Clark, CCC Uni Erlangen; J. A. Townsend, P. Murray-Rust, S. M. Tyrrell, Y. Zhang, Uni Cambridge/Unilever Inst.
Developers of the workflow systems mentioned in this talk
QUESTIONS?