Efrat Jaeger – SDSC
Bertram Ludäscher – UC DAVIS
Krishna Sinha – Virginia Tech
Ashraf Memon – SDSC
Ghulam Memon – SDSC
Ilkay Altintas – SDSC
Kai Lin – SDSC
& many others esp. KEPLER community
San Diego Supercomputer Center
UC DAVIS Department of Computer Science
CYBERINFRASTRUCTURE FOR THE GEOSCIENCES
Scientific Workflows & GEON
GEON AHM May 5-6, 2005, San Diego
Scientific Workflows Pre-Cyberinfrastructure
• Data Federation & Grid “Plumbing”:
– access, move, replicate, query … data (Data-Grid)
  • authenticate … SRB Sget/Sput … OPeNDAP, … Antelope/ORBs
– schedule, launch, monitor jobs (Compute-Grid)
  • Globus, Condor, Nimrod, APST, …
• Data Integration:
– Conceptual querying & integration, structure & semantics, e.g. mediation w/ SQL, XQuery + OWL (Semantics-enabled Mediator)
• Data Analysis, Mining, Knowledge Discovery:
– manual/textbook (e.g. ternary diagrams), Excel, R, simulations, …
• Visualization:
– 3-D (volume), 4-D (spatio-temporal), n-D (conceptual views) …
one-of-a-kind custom apps., detached (island) solutions
workflows are hard to reproduce, maintain
no/little workflow design, automation, reuse, documentation
need for an integrated scientific workflow environment
Analysis Workflow in KEPLER
• Scientific Workflow (SWF) design
• SWF automation
• Exploration & discovery mode (change parameters, data sets, etc. and rerun)
• SWF reuse, documentation, reproducibility
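The "change parameters and rerun" mode can be sketched in a few lines. This is a minimal illustration in Python, not KEPLER's actual API: the `Actor` and `Workflow` classes, the `threshold` and `scale` steps, and the sample values are all hypothetical names introduced here. It assumes a linear pipeline in which each step consumes the previous step's output.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Actor:
    """One workflow step: a named function plus its default parameters."""
    name: str
    func: Callable[..., Any]
    params: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Workflow:
    """A linear pipeline of actors; each consumes the previous output."""
    actors: List[Actor]

    def run(self, data: Any, **overrides: Dict[str, Any]) -> Any:
        for actor in self.actors:
            # Per-run overrides are merged over the defaults, so the
            # stored workflow definition is never mutated by a rerun.
            params = {**actor.params, **overrides.get(actor.name, {})}
            data = actor.func(data, **params)
        return data


def threshold(values: List[float], cutoff: float) -> List[float]:
    """Keep only values at or above the cutoff."""
    return [v for v in values if v >= cutoff]


def scale(values: List[float], factor: float) -> List[float]:
    """Multiply every value by a constant factor."""
    return [v * factor for v in values]


workflow = Workflow([
    Actor("filter", threshold, {"cutoff": 2.0}),
    Actor("scale", scale, {"factor": 10}),
])

samples = [1.0, 2.5, 4.0]
baseline = workflow.run(samples)                          # default parameters
explored = workflow.run(samples, filter={"cutoff": 3.0})  # rerun with a change
```

Keeping the overrides separate from the stored defaults is one way to make each exploratory run reproducible: the workflow definition plus the override dictionary fully describes what was executed.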
Some KEPLER Components (Actors)
KEPLER Team Work: GEON Dataset Generation & Registration
Xiaowen (SDM)
Edward et al. (Ptolemy)
Yang (Ptolemy)
Efrat (GEON)
Ilkay (SDM)
SQL database access (JDBC)
Matt, Chad, Dan et al. (SEEK)
% Makefile
$> ant run
KEPLER: an open source, cross-project collaboration
Ilkay Altintas – SDM, Resurgence, NLADR, …
Kim Baldridge – Resurgence, NMI
Chad Berkley – SEEK
Shawn Bowers – SEEK
Terence Critchlow – SDM
Tobin Fricke – ROADNet
Jeffrey Grethe – BIRN
Christopher H. Brooks – Ptolemy II
Zhengang Cheng – SDM
Dan Higgins – SEEK
Efrat Jaeger – GEON
Matt Jones – SEEK
Werner Krebs – EOL
Edward A. Lee – Ptolemy II
Kai Lin – GEON
Bertram Ludaescher – GEON, SDM, SEEK, BIRN, ROADNet
Mark Miller – EOL
Steve Mock – NMI
Steve Neuendorffer – Ptolemy II
Jing Tao – SEEK
Mladen Vouk – SDM
Xiaowen Xin – SDM
Yang Zhao – Ptolemy II
Bing Zhu – SEEK
•••
Ptolemy II
www.kepler-project.org
Your Logos & Names HERE!!!
Demonstration by Efrat Jaeger
Q & A
KEPLER: An Open Collaboration
• Initiated by members from NSF/ITR SEEK and DOE SDM/SPA; now several other projects (GEON, Ptolemy II, EOL, Resurgence/NMI, …)
• Open Source (BSD-style license)
• Intensive Communications:
– Web-archived mailing lists
– IRC (!)
– Meetings, Hackathons
• Co-development:
– via shared CVS repository
– joining as a new co-developer (currently):
  • get a CVS account (read-only)
  • local development + contribution via existing KEPLER member
  • be voted “in” as a member/co-developer
Scientific Workflow (SWF) Design
• Support SWF design & reuse, via:
– Structural data types
– Semantic types
– Associations (=constraints) between them
– Type checking, inference, propagation
• Separation of concerns:
– structure, semantics, WF orchestration, etc.
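Checking structural and semantic types separately on a workflow connection might look like the sketch below. It is illustrative Python only and not how KEPLER implements its type system: the `Port` class, the `check_link` function, and the small concept hierarchy are assumptions made for this example. The rule assumed here is that a link is valid when the structural types match exactly and the source's semantic type is the same as, or a subconcept of, the type the destination expects.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical is-a hierarchy of semantic types (child -> parent).
SEMANTIC_PARENT: Dict[str, Optional[str]] = {
    "Rock": None,
    "IgneousRock": "Rock",
    "Granite": "IgneousRock",
}


def is_subconcept(concept: str, ancestor: str) -> bool:
    """True if `concept` equals `ancestor` or lies below it in the hierarchy."""
    current: Optional[str] = concept
    while current is not None:
        if current == ancestor:
            return True
        current = SEMANTIC_PARENT.get(current)
    return False


@dataclass(frozen=True)
class Port:
    structure: str  # structural type, e.g. "table" or "list"
    semantics: str  # semantic type, a concept name from the hierarchy


def check_link(src: Port, dst: Port) -> List[str]:
    """Return the list of type errors for a src -> dst connection."""
    errors: List[str] = []
    # The structural check and the semantic check are independent concerns.
    if src.structure != dst.structure:
        errors.append(f"structure mismatch: {src.structure} -> {dst.structure}")
    if not is_subconcept(src.semantics, dst.semantics):
        errors.append(f"semantic mismatch: {src.semantics} is not a {dst.semantics}")
    return errors


ok = check_link(Port("table", "Granite"), Port("table", "IgneousRock"))
bad = check_link(Port("table", "Rock"), Port("list", "Granite"))
```

Here `ok` is empty because granite data may flow into a port that accepts any igneous rock, while `bad` reports both a structural and a semantic error.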
Related Publications
Scientific Workflows
• Scientific Workflow Management and the Kepler System, B. Ludäscher, I. Altintas, C. Berkley, D. Higgins, E. Jaeger-Frank, M. Jones, E. Lee, J. Tao, Y. Zhao, Concurrency and Computation: Practice & Experience, Special Issue on Scientific Workflows, to appear, 2005.
• A Framework for the Design and Reuse of Grid Workflows, Ilkay Altintas, Adam Birnbaum, Kim Baldridge, Wibke Sudholt, Mark Miller, Celine Amoreira, Yohann Potier, and Bertram Ludaescher, Intl. Workshop on Scientific Applications on Grid Computing (SAG'04), LNCS 3458, Springer, 2005
• Kepler: An Extensible System for Design and Execution of Scientific Workflows, I. Altintas, C. Berkley, E. Jaeger, M. Jones, B. Ludäscher, S. Mock, 16th International Conference on Scientific and Statistical Database Management (SSDBM'04), 21-23 June 2004, Santorini Island, Greece.
• Kepler: Towards a Grid-Enabled System for Scientific Workflows, Ilkay Altintas, Chad Berkley, Efrat Jaeger, Matthew Jones, Bertram Ludäscher, Steve Mock, Workflow in Grid Systems (GGF10), Berlin, March 9th, 2004.
• An Ontology-Driven Framework for Data Transformation in Scientific Workflows, S. Bowers and B. Ludäscher, Intl. Workshop on Data Integration in the Life Sciences (DILS'04), March 25-26, 2004 Leipzig, Germany, LNCS 2994.
• A Web Service Composition and Deployment Framework for Scientific Workflows, I. Altintas, E. Jaeger, K. Lin, B. Ludaescher, A. Memon, In the 2nd Intl. Conference on Web Services (ICWS), San Diego, California, July 2004.
Data Integration
Knowledge Representation
Process Integration (Scientific Workflows)
Source: B. Ludaescher, UC DAVIS, ECS-289 Scientific Data Management WQ’05
Data Federation
EcoGrid