OGCE MSI Presentation
Open Gateway Computing Environments: Software for Science Gateways
Marlon Pierce, Suresh Marru, Raminder Singh, Gerald Guo, Archit Kulshrestha,
Ye Fan, Patanachai Tangchaisin, and collaborators.
What Is a Science Gateway?
• User interface and supporting Web services to scientific applications, data sets, and resources running on cyberinfrastructure.
– Science portals, Grid Computing Environments, …
– Broaden and simplify usage
• Cyberinfrastructure: Distributed computing resources and overlaying middleware for scientific computing.
– Prominent examples include TeraGrid, Open Science Grid
– Middleware includes Globus, Condor, iRods/SRB, …
– Some of these approaches are being pushed by scientific cloud computing
– That is another topic
TeraGrid is one of the largest investments in shared CI from NSF’s Office of Cyberinfrastructure
• 2 PetaFLOPS Computation
• Visualization
• 20 Petabytes Storage
• Dedicated high-speed, cross-country network
• Staff & Advanced Support
Cyberinfrastructure Layers
• User Interfaces: Web/Gadget Container, Web Enabled Desktop Applications, Web/Gadget Interfaces
• Gateway Abstraction Interfaces
• Gateway Software: User Management, Auditing & Reporting, Fault Tolerance, Application Abstractions, Workflow System, Information Services, Monitoring, Registry, Security, Provenance & Metadata Management
• Resource Middleware: Cloud Interfaces, Grid Middleware, SSH & Resource Managers
• Compute Resources: Computational Clouds, Computational Grids, Local Resources
Color Coding
• Dependent resource provider components
• Complementary Gateway Components
• OGCE Gateway Components
Open Gateway Computing Environments
• The OGCE team develops software for building secure, Web-based Science Gateways
– Chemistry, Bioinformatics, Biophysics, Environmental Sciences
• OGCE is funded by the National Science Foundation’s Software Development for Cyberinfrastructure (SDCI) program.
OGCE Funds Software Lifecycle
OGCE Software
Name | Description
OGCE Gadget Container | An OpenSocial and Google gadget-compatible Web container for running Web gadgets.
GFAC | A Web service for generating, securely invoking, and managing the lifecycle of scientific applications on Grids and Clouds.
Workflow Tools | Composer (XBaya), enactment (“interpreter”) engines, event system, and service registry to support scientific workflows on Grids and Clouds.
Gadgets and Gadget Building Tools | Tools for building secure Google-gadget based Science Gateways.
Putting It All Together
OGCE Components in Action
Featured Gateway | OGCE Components Used
UltraScan | GFAC scientific application management service
GridChem, ParamChem | XBaya workflow composer, OGCE Messenger Service, XRegistry
SimpleGrid | OGCE Gadget Container (in development)
Purdue CCSM Portal | Gadget Container and gadget building libraries (in development)
BioVLAB | GFAC, XBaya, XRegistry, Workflow Interpreter Service
Software Strategy
• We develop downloadable, packaged, open source software
– SourceForge
• Focus: a) gadget container and b) tools for running science applications and workflows on grids and clouds.
• Provide a tool set that can be used in whole or in part.
– If you just want GFac, then you can use it without buying an entire framework.
• Out of our scope: visualization, security, information services, data and metadata provenance and management.
– MyProxy, TG IIS, Globus, Condor, XMC Cat, iRods, etc.
Apache Incubators
• Joining Apache is key to our software sustainability strategy
– Open source licensing, meritocracy, visibility
• Vigyan: tools for science gateway services and workflows
– XBaya, GFAC, Messenger, XRegistry
– Collaboration with WSO2/LSF, IBM
– Builds on Apache Axis2, Apache ODE
• Rave: OpenSocial gadget manager, general purpose gadgets
– Collaboration with Hippo, Mitre, SURFnet
– Builds on Apache Shindig
The OGCE Gadget Container
Managing layouts, look and feel, and behind-the-scenes services for aggregated Web gadgets
The OGCE Gadget Container allows you to build portals out of public and private Google Open Social gadgets. Supports HTTPS. Downloadable, packaged software.
The OGCE Application Registry gadget allows users to interactively register hosts and applications that are dynamically wrapped as Web services.
Google Gadget-Based Science Gateways
PolarGrid
LEAD
Mobile Support
Gadget Container is built with HTML, JavaScript and CSS. Works in both iPhone and Android native browsers without modification.
Developing layout managers better suited to limited screen real estate.
Feature Groups | Features
Look and Feel | Tabbed and Tree layout managers, 2 and 3 column layouts, default maximized views of gadgets, customizable color styling.
Security | Supports end-to-end SSL between browser, container, and gadgets; OpenID authentication; OAuth-secured gadgets; MyProxy logins; limited Grid credential sharing between gadgets; CILogon for InCommon login.
Inter-Gadget Communication | Supports OpenAjax publish-subscribe style messaging between gadgets. PMRPC JavaScript messaging support in development.
REST Service API | Layouts, logins, sign-ups, user administration, user identification, and Grid credentials all accessible via REST service calls as well as the user interface.
Open Source Social Networking | All code is open source and builds on Apache Shindig 2.0.
Gadget Development | Support for GWT-based gadgets and YUI JavaScript libraries in development.
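The publish-subscribe style of inter-gadget messaging listed above can be illustrated with a small sketch. This is not the OpenAjax Hub API (which is JavaScript and runs in the browser); the `Hub` class, the topic name `job.selected`, and the payload are hypothetical and only show the pattern: gadgets register interest in a topic, and a publisher does not need to know who is listening.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

# A subscriber callback receives the topic name and the published payload.
Callback = Callable[[str, Any], None]


class Hub:
    """Minimal in-process publish-subscribe hub (illustrative only)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        """Register a callback for one topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, payload: Any) -> int:
        """Deliver payload to every subscriber of topic; return the delivery count."""
        callbacks = list(self._subscribers.get(topic, []))
        for callback in callbacks:
            callback(topic, payload)
        return len(callbacks)


# Hypothetical usage: one "gadget" listens, another publishes.
hub = Hub()
received = []
hub.subscribe("job.selected", lambda topic, payload: received.append((topic, payload)))
delivered = hub.publish("job.selected", {"job_id": "demo-1"})
```

The publisher only sees a delivery count, which keeps gadgets decoupled from each other; that decoupling is the property the pattern is chosen for.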
SimpleGrid Gadgets
Requires YUI integration, OpenAJAX messaging, REST APIs
Bioinformatics Workflows in the Cloud
BioVLAB Architecture
BioVLAB Application Deployment Procedure
• Develop a command line app.
• Install the app. in Amazon EC2
• Let the app. store any output to Amazon S3
• Make a virtual machine image
• Register the app. by using Gfac
• Instantiate EC2 and run the app. by using XBaya
[Figure: Gfac Registration form; labels: User, Admin]
• Analysis of high throughput microarray experiment
• Multiple tasks in a single batch
• Output of a task can be plugged into another task
• Repeat the same set of tasks with small changes of parameters
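The two workflow properties in the list above, chaining the output of one task into the next and repeating the same tasks with small parameter changes, can be sketched as follows. The tasks `normalize` and `select_above` are hypothetical stand-ins, not the actual BioVLAB microarray tools, and the sketch runs locally rather than through XBaya.

```python
from typing import Callable, Dict, List

# A task takes the data produced so far and returns an updated copy.
Task = Callable[[dict], dict]


def run_pipeline(tasks: List[Task], inputs: dict) -> dict:
    """Run tasks in order, feeding each task's output to the next one."""
    data = inputs
    for task in tasks:
        data = task(data)
    return data


def normalize(data: dict) -> dict:
    """Hypothetical task: scale values so the largest becomes 1.0."""
    top = max(data["values"])
    return {**data, "values": [v / top for v in data["values"]]}


def select_above(data: dict) -> dict:
    """Hypothetical task: keep values at or above the 'threshold' parameter."""
    kept = [v for v in data["values"] if v >= data["threshold"]]
    return {**data, "selected": kept}


def sweep(tasks: List[Task], base: dict, name: str, settings: List[float]) -> Dict[float, dict]:
    """Repeat the same pipeline once per setting of a single parameter."""
    return {s: run_pipeline(tasks, {**base, name: s}) for s in settings}


results = sweep(
    [normalize, select_above],
    {"values": [2.0, 4.0, 8.0]},
    "threshold",
    [0.25, 0.5, 1.0],
)
```

Each entry of `results` holds one complete run, so runs with different thresholds can be compared side by side.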
BioVLAB-Microarray
BioVLAB-mCpG
OGCE Layered Workflow Architecture: Derived from LEAD Workflow System
• Workflow Interfaces (Design & Definition): XBaya GUI (Composition, Deploying, Steering & Monitoring), Gadget Interface for Input Binding, Flex/Web Composition
• Workflow Specification: Python, BPEL 2.0, BPEL 1.0, Java Code, Pegasus DAG, Scufl
• Workflow Execution & Control Engines: Apache ODE, Condor DAGMan, Taverna, Dynamic Enactor, Jython Interpreter, GBPEL
UltraScan Science Gateway
Biophysics gateway for ultracentrifugation experiment data analysis
UltraScan2 High Level Overview
[Figure labels: User; Web Server; US LIMS; MySQL DB; GridControl; UTHSCSA Jacinto; Terascale storage; High Performance Computing Clusters; TeraGrid; TIGRE/Globus Network]
UltraScan TG Usage July 2007-June 2010
UltraScan Collaboration
• Immediate goals: Use GFAC as a replacement job submission service.
– GRAM 2, 4, 5 independence
– Significant effort into GRAM5 testing on Ranger.
• Longer term goals
– Integrate with TG information services to provide better job scheduling.
• OGCE Resource Prediction Service
– Support UNICORE job management.
Current Architecture
UltraScan problems | Solution provided by OGCE
Gateway code can only submit to resources with GRAM4 installed and running. | GFAC supports different providers such as GRAM2/4/5, Condor, Local, and Remote using SSH keys. There is a generic GUI to configure them all.
Adding a new resource is time consuming. | Users need to fill in two Web forms to configure a new resource.
Local cluster needed to install GRAM4. | We can directly invoke mpirun on a local or remote cluster using the local/remote providers.
TACC resources like Lonestar and Ranger decided not to install GRAM4 and to move to GRAM5. | It was easy to start using GRAM5 in GFAC but time consuming to get GRAM5 running operationally on these resources.
Problems related to job failure and missing status. | Retry mechanism for certain GRAM error codes; still trying to find how to deal with missing status or reconnect to those jobs, as the Globus API does not support that.
Restart of jobs was not provided in the gateway even when the application supports checkpointing. | Added restart job support from checkpoint files.
UltraScan3 would need to rewrite all these components again, as it uses a different technology. | Provided a REST interface to OGCE services, so clients in different languages can call the same interfaces for the required operations.
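Two of the solutions in the table, selecting among several job submission providers and retrying on certain error codes, can be sketched together. This is a minimal sketch and not GFAC's implementation: the `Provider` type, the `PROVIDERS` registry, `make_flaky_provider`, and the numeric codes in `TRANSIENT_CODES` are all hypothetical placeholders (they are not real GRAM error codes), and the providers here simulate submission instead of contacting any resource.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set, Tuple


@dataclass
class JobResult:
    ok: bool
    error_code: int = 0


# A provider takes a command line and reports how the submission went.
Provider = Callable[[str], JobResult]

# Placeholder codes treated as retryable; NOT real GRAM error codes.
TRANSIENT_CODES: Set[int] = {901, 902}


def submit_with_retry(provider: Provider, command: str, max_attempts: int = 3) -> Tuple[JobResult, int]:
    """Submit once, then retry only while the failure code is marked transient."""
    attempts = 0
    result = JobResult(ok=False)
    while attempts < max_attempts:
        attempts += 1
        result = provider(command)
        if result.ok or result.error_code not in TRANSIENT_CODES:
            break
    return result, attempts


def make_flaky_provider(failures_before_success: int, code: int) -> Provider:
    """Simulated provider that fails a fixed number of times before succeeding."""
    remaining = {"n": failures_before_success}

    def provider(command: str) -> JobResult:
        if remaining["n"] > 0:
            remaining["n"] -= 1
            return JobResult(ok=False, error_code=code)
        return JobResult(ok=True)

    return provider


# Hypothetical registry: the gateway names a provider instead of hard-coding one.
PROVIDERS: Dict[str, Provider] = {
    "gram5": make_flaky_provider(2, 901),   # two transient failures, then success
    "local": make_flaky_provider(0, 0),     # succeeds immediately
    "condor": make_flaky_provider(1, 500),  # one non-transient failure
}


def submit(provider_name: str, command: str) -> Tuple[JobResult, int]:
    return submit_with_retry(PROVIDERS[provider_name], command)


gram5_result, gram5_attempts = submit("gram5", "mpirun ./app")
```

Because gateway code calls `submit` with a provider name, switching a resource from one mechanism to another becomes a configuration change rather than a code change, which is the point of the abstraction described in the table.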
GFac Current & Future Features
[Figure labels]
• Features: Input Handlers, Output Handlers, Scheduling Interface, Monitoring Interface, Registry Interface, Auditing, Fault Tolerance, Checkpoint Support, Data Management Abstraction, Job Management Abstraction
• Service layer: Apache Axis2
• Resources: Globus, Campus Resources, Unicore, Condor, Amazon, Eucalyptus
Color Coding
• Planned/Requested Features
• Existing Features
Gram5 Testing
• Developed a testing harness to run different cases.
• Started with a small number of jobs and increased the concurrency later.
• Watched job behavior on the resource and monitored the GRAM log.
– We found a lot of issues from the logs and are working with the Globus team to fix them.
• Recorded all the job run data to create a Google gadget that graphs the different runs on different resources.
Patterns:
TG Resources and patterns
Version | Resource | Endpoint
GT 5.0.2 | QueenBee | queenbee.loni-lsu.teragrid.org:2120/jobmanager-pbs
GT 5.0.2 | Ranger | login5.ranger.tacc.teragrid.org:2120/jobmanager-sge
GT 5.0.2 | Lonestar | gatekeeper.lonestar.tacc.teragrid.org:2120/jobmanager-lsf
Concurrent jobs | Batch Size | Total jobs | Job Status Pass : Fail
1 | 10 | 10 | 10:0
3 | 10 | 30 | 30:0
5 | 10 | 50 | 50:0
10 | 10 | 100 | 20:0
20 | 10 | 200 | 40:0
50 | 10 | 500 | 100:0
100 | 10 | 1000 | 200:0
200 | 5 | 1000 | Not tested (Need allocation)
500 | 2 | 1000 | Not tested (Need allocation)
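A test case of the kind tabulated above can be sketched as a small harness. The sketch assumes that each concurrent worker submits one batch of jobs in sequence, which matches the table's arithmetic (concurrent jobs × batch size = total jobs) but is otherwise an assumption about how the real harness was organized. `fake_submit` is a hypothetical stand-in that always succeeds; no GRAM5 endpoint is contacted.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Tuple

# A submit function takes a job id and returns True when the job passes.
Submit = Callable[[int], bool]


def fake_submit(job_id: int) -> bool:
    """Hypothetical stand-in for a real job submission; always passes."""
    return True


def run_batch(submit: Submit, batch_size: int, first_id: int) -> Tuple[int, int]:
    """Submit batch_size jobs one after another; return (passed, failed)."""
    passed = failed = 0
    for job_id in range(first_id, first_id + batch_size):
        try:
            ok = submit(job_id)
        except Exception:
            ok = False  # count an exception during submission as a failed job
        if ok:
            passed += 1
        else:
            failed += 1
    return passed, failed


def run_case(submit: Submit, concurrent_jobs: int, batch_size: int) -> Tuple[int, int]:
    """Run concurrent_jobs batches in parallel threads and total the results."""
    with ThreadPoolExecutor(max_workers=concurrent_jobs) as pool:
        futures = [
            pool.submit(run_batch, submit, batch_size, worker * batch_size)
            for worker in range(concurrent_jobs)
        ]
        results = [future.result() for future in futures]
    return sum(p for p, _ in results), sum(f for _, f in results)


# Start small and raise the concurrency, as described in the testing notes.
summary = {n: run_case(fake_submit, n, 10) for n in (1, 3, 5)}
```

Recording `summary` per resource gives the pass/fail data that a graphing gadget could later consume.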
GFAC Integration
• UltraScan job submission previously relied on GRAM4
– GFAC integrated as middleware to abstract submission process
– GRAM5, UNICORE and any future mechanism
• Science Gateway is in active use
– Initial testing done on IU quarry node
– Extensively tested job submission process using GFAC to LONI's QueenBee and TACC's Ranger
– Deployed 26 October 2010
– Implementation details available: http://wiki.bcf.uthscsa.edu/cauma/wiki/US2GFACTesting
GridChem/ParamChem
Gateways for Computational Chemistry
GridChem Science Gateway
• A chemistry/material Science Gateway for running computational chemistry codes, workflows, and parameter sweeps.
• Integrates molecular science applications and tools for community use.
• 400+ users heavily using TeraGrid. Consistently one of the top 5 TeraGrid Gateway users.
• Supports all popular chemistry applications, including Gaussian, GAMESS, NWChem, QMCPack, Amber, MolPro, and CHARMM.
• ParamChem is a follow-on project to develop workflows for chemical parameter studies and provide the infrastructure to execute them.
Empirical ForceFields Parameterization Need Process
Vanommeslaeghe et al. J. Comp.Chem 2010, 31, 671-690
Published by AAAS
A. J. Stone Science 321, 787 -789 (2008)
Fig. 1. Errors (V) in electrostatic potential on a surface at 1.8 times van der Waals radii around N-methyl propanamide for two models. (Left) Point charges; (right) charge, dipole, and quadrupole on C, N, and O; charge and dipole on H. The errors are much reduced in the multipole approach
Lack of Accurate Force Fields Produces Erroneous Property Estimation
Cyberenvironments for Parameterization
Computational Reference Data Generation
Conclusions
• Our project focus is providing long-term sustainable software for science gateways.
• What we learned:
– Try to serve a few high profile collaborators very well.
• Derive good software engineering practices from this: versioning, code reviews, testing, packaging, portability, …
– Define and keep to your project’s scope.
– Let the collaborations determine the direction of innovation
• This is more than just getting “customer requirements”. Collaborators expect you to know your field and guide them.
• There is a tension between this and research
– “Collaborators, not customers” is the resolution.
More Information
• OGCE Web Site: http://www.collab-ogce.org
• News Feed/Blog: http://collab-ogce.blogspot.com
• Contact us:
– [email protected]
– http://groups.google.com/group/ogce-discuss/
• Software Downloads: Software is available as tagged SVN releases from our SourceForge project.
– http://sourceforge.net/projects/ogce/
– See http://www.collab-ogce.org/ogce/index.php/Portal_download
Backup Slides
OGCE Partners and People
Institution | People
Indiana University | Marlon Pierce, Suresh Marru, Raminder Singh, Archit Kulshrestha, Gerald Guo
NCSA/UIUC | Sudhakar Pamidighantam, Shaowen Wang, Yan Liu
Purdue University | Carol Song, Lan Zhao, David Braun, Shawn Wu
UTHSCSA | Emre Brookes, Borries Demeler, Bruce Dubbs
Award Highlights
• Full Circle Development
– Directly fund both software developers and gateway consumers.
• Directly supported (non-IU) gateways:
– UltraScan (UTHSCSA), GridChem (NCSA), SimpleGrid (UIUC), Purdue CCSM and Environmental Gateways
– Among the most used TG gateways.
• Sustainability strategy: Apache Incubator for workflow suite of tools
– XBaya, GFac, and supporting services.
SimpleGrid, GISolve
• Short term goal: develop SimpleGrid Gadgets deployable into gadget container.
– Must meet security requirements
– Support PHP development
– Support interactivity requirements
• Integrate YUI JavaScript libraries with Gadget JavaScript.
• Longer term goals: investigate workflow, job management tools. Apply to GISolve.
Purdue CCSM and Data Portals
• Short term goals: Develop CCSM and data management gadgets and necessary backing middleware.
– Interactivity and security requirements.
– Significant requirements overlap with SimpleGrid
• Longer term goals: Build gateways out of gadgets hosted by multiple containers; examine workflow and other tools.
• MicroRNAs (miRNAs)
– small (19-22 nucleotide) non-protein-coding RNA molecules
– regulate the expression of specific gene products
– effect translational blockade or message degradation
• MMIA: microRNA and mRNA integrated analysis
BioVLAB-MMIA
• Computation in the Cloud
• MMIA expertise in workflow
EXPERIMENTS
BioVLAB Summary
• Usability (Reconfigurable environments)
– Following the SaaS model of Cloud Computing, BioVLAB end-users only need to launch the pre-composed BioVLAB workflows. With XBaya, users can easily customize a workflow by modifying just a few components and input parameters.
• Flexibility (Full privileges)
– Following the IaaS model, BioVLAB workflow developers have flexibility in handling computing resources and implementing applications with the Amazon Cloud. They can choose specific system resources to satisfy their needs, with fully controlled access.
• Reduced processing time & cost effectiveness
– Users can have any number of servers and control their usage time as they want. That reduces research cost and the initial time needed to construct physical infrastructure for research.
Background: What is AUC?
• AUC is an important technique for the solution study of macromolecules
• Molecules are not fixed to a microscope grid
• Molecules are not distorted by crystal packing forces (vs X-Ray crystallography)
• Very large size range (complements cryo-EM and NMR)
• Dynamic processes can be studied
• Conformational changes
Background: What is AUC?
• Sample placed in cell
• Run ultracentrifuge
– Usually 20-60k RPM
• Collect data
– 4 to 24 hours or more
• Analyze the data
TG SG Usage 2007-10
• Job statistics for UltraScan project for approximately the last 4 years.
• Only partial data is available for 2007 (2nd half) and 2010 (thru June), and only successful runs are included.
• Totals of CPU hours consumed from TeraGrid, UTHSCSA and international resources
• Number of investigators whose data were analyzed (left Y-axis), and number of submitted jobs (right Y-axis).
• Both panels indicate increasing usage and need for TeraGrid resources and an increasing number of investigators requiring access to these resources.
User Community: Publications
Since the development of our advanced methods, virtually every publication from our lab has used these methods
We currently count 35 peer reviewed journal publications and poster abstracts
Many additional presented talks where these methods have provided important new detail to the investigations of biological as well as synthetic polymer systems
We are aware of at least another 25 publications that were facilitated by our methods from other laboratories using our TeraGrid applications
Conclusion
• We focus initially on one component per gateway.
– SimpleGrid, CCSM, Data Portal: gadgets
• Other gadget based gateways at UC
– GridChem: Xbaya
– UltraScan: GFac
• Goal is to establish an Apache-style meritocracy for contributed code.
• Making distributed teams work: hacking retreats.
OGCE Gateway Tool Adaption & Reuse
[Figure labels]
• OGCE: Re-engineer, Generalize, Build, Test and Release
• Projects and gateways: OVP/RST/MIG, LEAD, GridChem, TeraGrid User Portal, OGCE Team, Ultrascan, BioVLab, ODI, Bio Drug Screen, EST Pipeline, Future Grid
• Tool sets: GFac, XBaya, XRegistry, FTR; Eventing System; Resource Discovery Service; GPIR, File Browser; Gadget Container, GTLab, Javascript Cog, XRegistry Interface, Experiment Builder, Axis2 Gfac, Axis2 Eventing System, Resource Prediction Service, Swarm; Experiment Builder, XRegistry Interface; Xbaya, GC Middleware; GFac, Eventing System; XBaya, GFac; Workflow Suite, Gadget Container; Swarm->GFac; GFac, Xbaya, …
Software Strategy
• Focus on gadget container and tools for running science applications on grids and clouds.
• Provide a tool set that can be used in whole or in part.
– If you just want GFac, then you can use it without buying an entire framework.
• Outsource security, information services, data and metadata, etc. to other providers.
– MyProxy, TG IIS, Globus, Condor, XMC Cat, iRods, etc.
Advanced Support Scenarios
• GridChem/ParamChem workflow support
• UltraScan Job Submission (GFAC)
• EST Pipeline
– Bioinformatics pipeline for managing mass job submission.
More Information
• This is downloadable, packaged software.
– Apache Maven build system provides everything you need to build the gadget container, gadgets, workflow composer, and backing services.
– Get code by anonymous SVN checkout.
• Email: [email protected], [email protected], [email protected]
• OGCE Web Site: www.collab-ogce.org
• Blog/News Feed: http://collab-ogce.blogspot.com/
Acknowledgements and People
• Funding by TeraGrid GIG, RP and by OCI SDCI
• IU: Marlon Pierce, Suresh Marru, Raminder Singh, Archit Kulshrestha, Zhenhua Guo
• TACC: Maytal Dahan, Rion Dooley
• SDSC: Nancy Wilkins-Diehr, Jeff Sale
• SDSU: Mary Thomas
Gateway Computing Environments (GCE10)
Molecular Force Field Cyberenvironments
Parameter Initialization and Optimization Workflow
[Figure: workflow diagram; node labels]
• Parameter definitions
• Model/Reference Data Definition
• Merit Function Specification
• Consistency Checker
• Optimization Methods Choice
• Optimization Job Launcher
• Update Parameter Database with new set
• Workflow Manager
• Optimization Incomplete?
• Parameter testing Model
• Successful Testing
• Optimization Monitor
• Optimization Job Completed?
• Parameter Sensitivity Analysis
• Notification of End of Workflow
• Expert Interface
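The loop suggested by these workflow labels (define a merit function against reference data, optimize until complete, then test the parameters) can be sketched in a few lines. Everything specific here is an assumption for illustration: the one-parameter linear model, the sum-of-squares merit function, and the step-halving search are simple stand-ins, since the slides do not state which merit function or optimization method ParamChem uses.

```python
from typing import List, Tuple

# Reference data as (input, expected output) pairs.
Reference = List[Tuple[float, float]]


def merit(k: float, reference: Reference) -> float:
    """Assumed merit function: sum of squared errors of the model y = k * x."""
    return sum((k * x - y) ** 2 for x, y in reference)


def optimize(reference: Reference, k0: float = 0.0, step: float = 1.0,
             tol: float = 1e-6, max_iter: int = 10_000) -> Tuple[float, float]:
    """Step-halving search: move while the merit improves, else shrink the step."""
    k = k0
    best = merit(k, reference)
    iterations = 0
    while step > tol and iterations < max_iter:  # "Optimization Incomplete?"
        iterations += 1
        improved = False
        for candidate in (k + step, k - step):
            score = merit(candidate, reference)
            if score < best:
                k, best, improved = candidate, score, True
                break
        if not improved:
            step /= 2.0
    return k, best


def passes_testing(k: float, held_out: Reference, max_error: float = 1e-3) -> bool:
    """Parameter testing: check the fitted parameter on data not used for fitting."""
    return all(abs(k * x - y) <= max_error for x, y in held_out)


reference = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
k_opt, final_merit = optimize(reference)
accepted = passes_testing(k_opt, [(4.0, 8.0)])
```

In a fuller workflow, only an accepted parameter set would be written back to the parameter database; a rejected set would go back to the expert for a different merit function or method choice.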
OGCE Alumni
• We also gratefully acknowledge the contributions of participants in previous incarnations of the OGCE:
– TACC: Maytal Dahan, Rion Dooley
– SDSU: Mary Thomas
– SDSC: Nancy Wilkins-Diehr, Jeff Sale
– LSF: Srinath Perera, Sanjiva Weerawarana