OGCE MSI Presentation

Open Gateway Computing Environments: Software for Science Gateways Marlon Pierce, Suresh Marru, Raminder Singh, Gerald Guo, Archit Kulshrestha, Ye Fan, Patanachai Tangchaisin, and collaborators.


OGCE Presentation by Marlon Pierce at University of Minnesota Supercomputing Institute, February 11, 2011

Transcript of OGCE MSI Presentation

Page 1: OGCE MSI Presentation

Open Gateway Computing Environments: Software for Science Gateways

Marlon Pierce, Suresh Marru, Raminder Singh, Gerald Guo, Archit Kulshrestha, Ye Fan, Patanachai Tangchaisin, and collaborators.

Page 2: OGCE MSI Presentation

What Is a Science Gateway?

• User Interface and supporting Web services to scientific applications, data sets, and resources running on cyberinfrastructure.
  – Science portals, Grid Computing Environments, …
  – Broaden and simplify usage
• Cyberinfrastructure: Distributed computing resources and overlaying middleware for scientific computing.
  – Prominent examples include TeraGrid, Open Science Grid
  – Middleware includes Globus, Condor, iRods/SRB, …
  – Some of these approaches being pushed by scientific cloud computing
  – That is another topic

Page 3: OGCE MSI Presentation

TeraGrid is one of the largest investments in shared CI from NSF’s Office of Cyberinfrastructure

• 2 PetaFLOPS computation
• Visualization
• 20 Petabytes storage
• Dedicated high-speed, cross-country network
• Staff & Advanced Support

Page 4: OGCE MSI Presentation

Cyberinfrastructure Layers

[Diagram labels, grouped by layer]
• User Interfaces: Web/Gadget Container, Web Enabled Desktop Applications
• Web/Gadget Interfaces
• Gateway Abstraction Interfaces
• Gateway Software: User Management, Auditing & Reporting, Fault Tolerance, Application Abstractions, Workflow System, Information Services, Monitoring, Registry, Security, Provenance & Metadata Management
• Resource Middleware: Cloud Interfaces, Grid Middleware, SSH & Resource Managers
• Compute Resources: Computational Clouds, Computational Grids, Local Resources
• Color Coding: Dependent resource provider components; Complementary Gateway Components; OGCE Gateway Components

Page 5: OGCE MSI Presentation

Open Gateway Computing Environments

• The OGCE team develops software for building secure, Web-based Science Gateways
  – Chemistry, Bioinformatics, Biophysics, Environmental Sciences
• OGCE is funded by the National Science Foundation’s Software Development for Cyberinfrastructure (SDCI) program.

Page 6: OGCE MSI Presentation

OGCE Funds Software Lifecycle

Page 7: OGCE MSI Presentation

OGCE Software

Name: Description
• OGCE Gadget Container: An OpenSocial and Google gadget-compatible Web container for running Web gadgets.
• GFAC: A Web service for generating, securely invoking, and managing the lifecycle of scientific applications on Grids and Clouds.
• Workflow Tools: Composer (XBaya), enactment (“interpreter”) engines, event system, and service registry to support scientific workflows on Grids and Clouds.
• Gadgets and Gadget Building Tools: Tools for building secure Google-gadget based Science Gateways.

Page 8: OGCE MSI Presentation

Putting It All Together

Page 9: OGCE MSI Presentation

OGCE Components in Action

Featured Gateway: OGCE Components Used
• UltraScan: GFAC scientific application management service
• GridChem, ParamChem: XBaya workflow composer, OGCE Messenger Service, XRegistry
• SimpleGrid: OGCE Gadget Container (in development)
• Purdue CCSM Portal: Gadget Container and gadget building libraries (in development)
• BioVLAB: GFAC, XBaya, XRegistry, Workflow Interpreter Service

Page 10: OGCE MSI Presentation

Software Strategy

• We develop downloadable, packaged, open source software
  • SourceForge
• Focus: a) gadget container and b) tools for running science applications and workflows on grids and clouds.
• Provide a tool set that can be used in whole or in part.
  – If you just want GFac, then you can use it without buying an entire framework.
• Out of our scope: visualization, security, information services, data and metadata provenance and management.
  – MyProxy, TG IIS, Globus, Condor, XMC Cat, iRods, etc.

Page 11: OGCE MSI Presentation

Apache Incubators

• Joining Apache is key to our software sustainability strategy
  – Open source licensing, meritocracy, visibility
• Vigyan: tools for science gateway services and workflows
  – XBaya, GFAC, Messenger, XRegistry
  – Collaboration with WSO2/LSF, IBM
  – Builds on Apache Axis2, Apache ODE
• Rave: OpenSocial gadget manager, general purpose gadgets
  – Collaboration with Hippo, Mitre, SURFnet
  – Builds on Apache Shindig

Page 12: OGCE MSI Presentation

The OGCE Gadget Container

Managing layouts, look and feel, and behind-the-scenes services for aggregated Web gadgets

Page 13: OGCE MSI Presentation

The OGCE Gadget Container allows you to build portals out of public and private Google Open Social gadgets. Supports HTTPS. Downloadable, packaged software.

Page 14: OGCE MSI Presentation

The OGCE Application Registry gadget allows users to interactively register hosts and applications that are dynamically wrapped as Web services.

Page 15: OGCE MSI Presentation

Google Gadget-Based Science Gateways

PolarGrid

LEAD

Page 16: OGCE MSI Presentation

Mobile Support

Gadget Container is built with HTML, JavaScript and CSS. Works in both iPhone and Android native browsers without modification.

Developing layout managers better suited to limited screen real estate.

Page 17: OGCE MSI Presentation

Feature Groups: Features
• Look and Feel: Tabbed and Tree layout managers, 2 and 3 column layouts, default maximized views of gadgets, customizable color styling.
• Security: Supports end-to-end SSL between browser, container, and gadgets; OpenID authentication; OAuth-secured gadgets; MyProxy logins; limited Grid credential sharing between gadgets; CILogon for InCommon login.
• Inter-Gadget Communication: Supports OpenAjax publish-subscribe style messaging between gadgets. PMRPC JavaScript messaging support in development.
• REST Service API: Layouts, logins, sign-ups, user administration, user identification, and Grid credentials all accessible via REST service calls as well as the user interface.
• Open Source Social Networking: All code is open source and builds on Apache Shindig 2.0.
• Gadget Development: Support for GWT-based gadgets and YUI JavaScript libraries in development.
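The publish-subscribe style of inter-gadget messaging mentioned above can be illustrated with a minimal sketch. This is plain Python for illustration only, not the OpenAjax Hub JavaScript API used by the container; the `Hub` class, the topic names, and the payloads are all hypothetical.

```python
from collections import defaultdict


class Hub:
    """Toy in-memory message hub (hypothetical stand-in for a container's hub)."""

    def __init__(self):
        # Maps a topic name to the list of callbacks subscribed to it.
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register callback(topic, payload) for messages on a topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        """Deliver payload to every subscriber of topic; return the delivery count."""
        callbacks = list(self._subscribers.get(topic, []))
        for callback in callbacks:
            callback(topic, payload)
        return len(callbacks)


if __name__ == "__main__":
    hub = Hub()
    # A "monitor" gadget listens for job status messages from a "submit" gadget.
    hub.subscribe("job.status", lambda topic, msg: print(topic, msg))
    hub.publish("job.status", {"id": 1, "state": "DONE"})
```

The publishing gadget never holds a reference to the subscribing gadget; both only know the topic name, which is what lets independently written gadgets be combined in one layout.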

Page 18: OGCE MSI Presentation

SimpleGrid Gadgets

Requires YUI integration, OpenAJAX messaging, REST APIs

Page 19: OGCE MSI Presentation

Bioinformatics Workflows in the Cloud

Page 20: OGCE MSI Presentation

BioVLAB Architecture

Page 21: OGCE MSI Presentation

BioVLAB Application Deployment Procedure

• Develop a command line app.
• Install the app. in Amazon EC2
• Let the app. store any output to Amazon S3
• Make a virtual machine image
• Register the app. by using Gfac
• Instantiate EC2 and run the app. by using XBaya

[Figure: Gfac Registration form; diagram roles: User, Admin, User]

Page 22: OGCE MSI Presentation

• Analysis of high throughput microarray experiment

• Multiple tasks in a single batch

• Output of a task can be plugged into another task

• Repeat the same set of tasks with small changes of parameters

BioVLAB-Microarray
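The two workflow patterns on this slide (feeding one task's output into the next, and repeating the same task set with small parameter changes) can be sketched as follows. This is an illustrative sketch only, not XBaya or BioVLAB code; the task functions `normalize` and `select`, the `threshold` parameter, and the sample data are hypothetical.

```python
from itertools import product


def run_pipeline(tasks, inputs):
    """Run tasks in order; each task receives the previous task's output dict."""
    data = dict(inputs)
    for task in tasks:
        data = task(data)
    return data


def sweep(tasks, base_inputs, grid):
    """Repeat the same pipeline once per combination of parameter values."""
    names = sorted(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((params, run_pipeline(tasks, {**base_inputs, **params})))
    return results


# Hypothetical tasks standing in for microarray analysis steps.
def normalize(data):
    total = sum(data["intensities"])
    return {**data, "normalized": [x / total for x in data["intensities"]]}


def select(data):
    keep = [i for i, x in enumerate(data["normalized"]) if x >= data["threshold"]]
    return {**data, "selected": keep}


if __name__ == "__main__":
    runs = sweep([normalize, select],
                 {"intensities": [1.0, 1.0, 2.0]},
                 {"threshold": [0.25, 0.5]})
    for params, result in runs:
        print(params, result["selected"])
```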

Page 23: OGCE MSI Presentation

BioVLAB-mCpG

Page 24: OGCE MSI Presentation

OGCE Layered Workflow Architecture: Derived from LEAD Workflow System

[Diagram labels, grouped by layer]
• Workflow Interfaces (Design & Definition): XBaya GUI (Composition, Deploying, Steering & Monitoring), Gadget Interface for Input Binding, Flex/Web Composition
• Workflow Specification: Python, BPEL 2.0, BPEL 1.0, Java Code, Pegasus DAG, Scufl
• Workflow Execution & Control Engines: Apache ODE, Condor DAGMan, Taverna, Dynamic Enactor, Jython Interpreter, GBPEL

Page 25: OGCE MSI Presentation

UltraScan Science Gateway

Biophysics gateway for ultracentrifugation experiment data analysis

Page 26: OGCE MSI Presentation

UltraScan2 High Level Overview

[Diagram labels: User; Web Server; US LIMS; MySQL DB; GridControl; TIGRE/Globus Network; High Performance Computing Clusters; TeraGrid; UTHSCSA Jacinto; Terascale storage]

Page 27: OGCE MSI Presentation

UltraScan TG Usage July 2007-June 2010

Page 28: OGCE MSI Presentation

UltraScan Collaboration

• Immediate Goals: Use GFAC as a replacement job submission service.
  – GRAM 2, 4, 5 independence
  – Significant effort into GRAM5 testing on Ranger.
• Longer term goals
  – Integrate with TG information services to provide better job scheduling.
    • OGCE Resource Prediction Service
  – Support UNICORE job management.

[Figure: Current Architecture]

Page 29: OGCE MSI Presentation

UltraScan problems and solutions provided by OGCE

• Problem: Gateway code can only submit to resources with GRAM4 installed and running.
  Solution: GFAC supports different providers such as GRAM2/4/5, Condor, Local, and Remote using SSH keys. There is a generic GUI to configure them all.
• Problem: Adding a new resource is time consuming.
  Solution: Users only need to fill in two web forms to configure a new resource.
• Problem: Local clusters needed to install GRAM4.
  Solution: We can directly invoke mpirun on a local or remote cluster using the local/remote providers.
• Problem: TACC resources like Lonestar and Ranger decided not to install GRAM4 and to move to GRAM5.
  Solution: It was easy to start using GRAM5 in GFAC, but time consuming to get GRAM5 running operationally on these resources.
• Problem: Problems related to job failure and missing status.
  Solution: Retry mechanism for certain GRAM error codes. We are still trying to find out how to deal with missing status or how to reconnect to those jobs, as the Globus API does not support that.
• Problem: Restart of jobs was not provided in the gateway, even though the application supports checkpointing.
  Solution: Added support for restarting jobs from checkpoint files.
• Problem: UltraScan3 needs to rewrite all these components again, as it uses a different technology.
  Solution: Provided a REST interface to OGCE services, so clients in different languages can call the same interfaces for the required operations.
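Two of the ideas above, pluggable submission providers and retrying on certain error codes, can be sketched in a few lines. This is a hedged illustration under stated assumptions and not GFAC's actual code: the registry, `JobSpec`, `SubmissionError`, and the numeric codes in `RETRIABLE_CODES` are hypothetical and are not real GRAM error codes.

```python
from dataclasses import dataclass


class SubmissionError(Exception):
    """Raised by a provider when a submission fails (hypothetical)."""

    def __init__(self, code, message=""):
        super().__init__(f"submission failed with code {code}: {message}")
        self.code = code


# Hypothetical codes treated as transient; NOT real GRAM error codes.
RETRIABLE_CODES = frozenset({1001, 1002})


@dataclass
class JobSpec:
    executable: str
    host: str
    provider: str  # e.g. "gram5", "condor", "local", "ssh"


PROVIDERS = {}  # provider name -> callable(JobSpec) returning a job handle


def register(name):
    """Decorator that adds a submission function to the provider registry."""
    def decorator(func):
        PROVIDERS[name] = func
        return func
    return decorator


@register("local")
def submit_local(job):
    # Placeholder: a real local provider might launch mpirun here.
    return f"local:{job.executable}"


def submit(job, max_attempts=3):
    """Dispatch to the configured provider, retrying only on transient codes."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    if job.provider not in PROVIDERS:
        raise ValueError(f"no provider registered for {job.provider!r}")
    provider = PROVIDERS[job.provider]
    for attempt in range(1, max_attempts + 1):
        try:
            return provider(job)
        except SubmissionError as err:
            # Give up on non-transient errors or when attempts are exhausted.
            if err.code not in RETRIABLE_CODES or attempt == max_attempts:
                raise


if __name__ == "__main__":
    print(submit(JobSpec("us_mpi_analysis", "localhost", "local")))
```

Because the gateway only calls `submit`, supporting a new mechanism means registering one more function rather than changing gateway code, which is the independence from any single GRAM version described above.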

Page 30: OGCE MSI Presentation

GFac Current & Future Features

[Diagram labels]
• Apache Axis2
• Features: Input Handlers, Scheduling Interface, Auditing, Monitoring Interface, Data Management Abstraction, Job Management Abstraction, Fault Tolerance, Output Handlers, Registry Interface, Checkpoint Support
• Resources: Globus, Campus Resources, Unicore, Condor, Amazon, Eucalyptus
• Color Coding: Planned/Requested Features; Existing Features

Page 31: OGCE MSI Presentation

Gram5 Testing

• Developed a testing harness to run different cases.
• Started with a small number of jobs and increased the concurrency later.
• Watched the behavior of each job on the resource and monitored the GRAM log.
  – We found a lot of issues in the logs and are working with the Globus team to fix them.
• Recorded all the job run data and created a Google gadget to graph the different runs on different resources.

Page 32: OGCE MSI Presentation

TG Resources and patterns

Version    Resource   Endpoint
GT 5.0.2   QueenBee   queenbee.loni-lsu.teragrid.org:2120/jobmanager-pbs
GT 5.0.2   Ranger     login5.ranger.tacc.teragrid.org:2120/jobmanager-sge
GT 5.0.2   Lonestar   gatekeeper.lonestar.tacc.teragrid.org:2120/jobmanager-lsf

Patterns:

Concurrent jobs   Batch Size   Total jobs   Job Status (Pass : Fail)
1                 10           10           10:0
3                 10           30           30:0
5                 10           50           50:0
10                10           100          20:0
20                10           200          40:0
50                10           500          100:0
100               10           1000         200:0
200               5            1000         Not tested (Need allocation)
500               2            1000         Not tested (Need allocation)
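The test patterns above pair a concurrency level with a batch size, so the total number of jobs is their product. A minimal sketch of such a harness follows. It assumes each concurrent worker submits its batch sequentially, which the slides do not state; `fake_submit` is a hypothetical stand-in for a real GRAM5 submission.

```python
from concurrent.futures import ThreadPoolExecutor


def run_case(submit_job, concurrent, batch_size):
    """Run `concurrent` workers, each submitting `batch_size` jobs in sequence.

    submit_job(job_id) must return True on success and False on failure.
    Returns a (passed, failed) tuple for the whole case.
    """
    def worker(worker_id):
        passed = 0
        for i in range(batch_size):
            job_id = worker_id * batch_size + i  # unique id per job
            if submit_job(job_id):
                passed += 1
        return passed

    with ThreadPoolExecutor(max_workers=concurrent) as pool:
        passed = sum(pool.map(worker, range(concurrent)))
    total = concurrent * batch_size
    return passed, total - passed


def fake_submit(job_id):
    # Hypothetical stand-in: a real harness would submit to a GRAM5 endpoint
    # and wait for the job to reach a final state.
    return True


if __name__ == "__main__":
    for concurrent, batch_size in [(1, 10), (3, 10), (5, 10)]:
        passed, failed = run_case(fake_submit, concurrent, batch_size)
        print(f"{concurrent} x {batch_size}: {passed}:{failed}")
```

Starting with small cases and raising only the concurrency, as the list of patterns does, makes it easier to attribute a new failure to load rather than to configuration.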

Page 33: OGCE MSI Presentation

GFAC Integration

• UltraScan job submission previously relied on GRAM4
  – GFAC integrated as middleware to abstract submission process
  – GRAM5, UNICORE and any future mechanism
• Science Gateway is in active use
  – Initial testing done on IU quarry node
  – Extensively tested job submission process using GFAC to LONI's QueenBee and TACC's Ranger
  – Deployed 26 October 2010
  – Implementation details available: http://wiki.bcf.uthscsa.edu/cauma/wiki/US2GFACTesting

Page 34: OGCE MSI Presentation

GridChem/ParamChem

Gateways for Computational Chemistry

Page 35: OGCE MSI Presentation

GridChem Science Gateway

• A chemistry/material Science Gateway for running computational chemistry codes, workflows, and parameter sweeps.
• Integrates molecular science applications and tools for community use.
• 400+ users heavily using TeraGrid. Consistently one of the top 5 TeraGrid Gateway users.
• Supports all popular Chemistry applications including Gaussian, GAMESS, NWChem, QMCPack, Amber, MolPro, and CHARMM.
• ParamChem is a follow-on project to develop workflows for chemical parameter studies and provide the infrastructure to execute them.

Page 36: OGCE MSI Presentation

Empirical ForceFields Parameterization Need Process

Vanommeslaeghe et al. J. Comp.Chem 2010, 31, 671-690

Published by AAAS

A. J. Stone Science 321, 787 -789 (2008)

Fig. 1. Errors (V) in electrostatic potential on a surface at 1.8 times van der Waals radii around N-methyl propanamide for two models. (Left) Point charges; (right) charge, dipole, and quadrupole on C, N, and O; charge and dipole on H. The errors are much reduced in the multipole approach

Lack of Accurate Force Fields Produce Erroneous Property Estimation

Page 37: OGCE MSI Presentation

Cyberenvironments for Parameterization

Computational Reference Data Generation

Page 38: OGCE MSI Presentation

Conclusions

• Our project focus is providing long-term sustainable software for science gateways.
• What we learned:
  – Try to serve a few high profile collaborators very well.
    • Derive good software engineering practices from this: versioning, code reviews, testing, packaging, portability, …
  – Define and keep to your project’s scope.
  – Let the collaborations determine the direction of innovation
    • This is more than just getting “customer requirements”. Collaborators expect you to know your field and guide them.
    • There is a tension between this and research
  – “Collaborators, not customers” is the resolution.

Page 39: OGCE MSI Presentation

More Information

• OGCE Web Site: http://www.collab-ogce.org
• News Feed/Blog: http://collab-ogce.blogspot.com
• Contact us:
  – [email protected]
  – http://groups.google.com/group/ogce-discuss/
• Software Downloads: Software is available as tagged SVN releases from our SourceForge project.
  – http://sourceforge.net/projects/ogce/
  – See http://www.collab-ogce.org/ogce/index.php/Portal_download

Page 40: OGCE MSI Presentation

Backup Slides

Page 41: OGCE MSI Presentation

OGCE Partners and People

Institution: People
• Indiana University: Marlon Pierce, Suresh Marru, Raminder Singh, Archit Kulshrestha, Gerald Guo
• NCSA/UIUC: Sudhakar Pamidighantam, Shaowen Wang, Yan Liu
• Purdue University: Carol Song, Lan Zhao, David Braun, Shawn Wu
• UTHSCSA: Emre Brookes, Borries Demeler, Bruce Dubbs

Page 42: OGCE MSI Presentation

Award Highlights

• Full Circle Development
  – Directly fund both software developers and gateway consumers.
• Directly supported (non-IU) gateways:
  – UltraScan (UTHSCSA), GridChem (NCSA), SimpleGrid (UIUC), Purdue CCSM and Environmental Gateways
  – Among the most used TG gateways.
• Sustainability strategy: Apache Incubator for workflow suite of tools
  – XBaya, GFac, and supporting services.

Page 43: OGCE MSI Presentation

SimpleGrid, GISolve

• Short term goal: develop SimpleGrid Gadgets deployable into gadget container.
  – Must meet security requirements
  – Support PHP development
  – Support interactivity requirements

• Integrate YUI JavaScript libraries with Gadget JavaScript.

• Longer term goals: investigate workflow, job management tools. Apply to GISolve

Page 44: OGCE MSI Presentation

Purdue CCSM and Data Portals

• Short term goals: Develop CCSM and data management gadgets and necessary backing middleware.
  – Interactivity and security requirements.
  – Significant requirements overlap with SimpleGrid

• Longer term goals: Build gateways out of gadgets hosted by multiple containers; examine workflow and other tools.


Page 48: OGCE MSI Presentation

BioVLAB-MMIA

• MicroRNAs (miRNAs)
  • small (19-22 nucleotide) non-protein-coding RNA molecules
  • regulate the expression of specific gene products
  • effect translational blockade or message degradation
• MMIA: microRNA and mRNA integrated analysis
• Computation in the Cloud
• MMIA expertise in workflow


Page 50: OGCE MSI Presentation

EXPERIMENTS



Page 54: OGCE MSI Presentation

BioVLAB Summary

• Usability (Reconfigurable environments)
  – Following the SaaS model of Cloud Computing, end-users only need to launch the pre-composed BioVLAB workflows. With XBaya, users can easily customize a workflow by modifying just a few components and input parameters.
• Flexibility (Full privileges)
  – Following the IaaS model, BioVLAB workflow developers have flexibility in handling computing resources and implementing applications with Amazon Cloud. They can choose specific system resources to satisfy their needs, with fully controlled access.
• Reduced processing time and cost
  – Users can have any number of servers and control their usage time as they want. That reduces research cost and the initial time needed to construct physical infrastructure for research.

Page 55: OGCE MSI Presentation

Background: What is AUC?

• AUC is an important technique for the solution study of macromolecules
• Molecules are not fixed to a microscope grid
• Molecules are not distorted by crystal packing forces (vs X-Ray crystallography)
• Very large size range (complements cryo-EM and NMR)
• Dynamic processes can be studied
• Conformational changes

Page 56: OGCE MSI Presentation

Background: What is AUC?

• Sample placed in cell
• Run Ultracentrifuge
  – Usually 20-60k RPM
• Collect data
  – 4 to 24 hours or more
• Analyze the data

Page 57: OGCE MSI Presentation

TG SG Usage 2007-10

• Job statistics for UltraScan project for approximately the last 4 years.
• Only partial data is available for 2007 (2nd half) and 2010 (thru June), and only successful runs are included.
• Totals of CPU hours consumed from TeraGrid, UTHSCSA and international resources
• Number of investigators whose data were analyzed (left Y-axis), and number of submitted jobs (right Y-axis).
• Both panels indicate increasing usage and need for TeraGrid resources and an increasing number of investigators requiring access to these resources.


Page 59: OGCE MSI Presentation

User Community: Publications

Since the development of our advanced methods, virtually every publication from our lab has used these methods

We currently count 35 peer reviewed journal publications and poster abstracts

Many additional presented talks where these methods have provided important new detail to the investigations of biological as well as synthetic polymer systems

We are aware of at least another 25 publications that were facilitated by our methods from other laboratories using our TeraGrid applications


Page 60: OGCE MSI Presentation

Conclusion

• We focus initially on one component per gateway.
  – SimpleGrid, CCSM, Data Portal: gadgets
    • Other gadget based gateways at UC
  – GridChem: Xbaya
  – UltraScan: GFac

• Goal is to establish an Apache-style meritocracy for contributed code.

• Making distributed teams work: hacking retreats.

Page 61: OGCE MSI Presentation

OGCE Gateway Tool Adaption & Reuse

[Diagram labels]
• OGCE Team: Re-engineer, Generalize, Build, Test and Release
• Projects: LEAD, GridChem, TeraGrid User Portal, OVP/RST/MIG, Ultrascan, BioVLab, ODI, Bio Drug Screen, EST Pipeline, Future Grid
• Tools exchanged: GFac, XBaya, XRegistry, FTR, Eventing System, Resource Discovery Service, GPIR, File Browser, Gadget Container, GTLab, Javascript Cog, XRegistry Interface, Experiment Builder, Axis2 Gfac, Axis2 Eventing System, Resource Prediction Service, Swarm, GC Middleware, Workflow Suite, Swarm->GFac

Page 62: OGCE MSI Presentation

Software Strategy

• Focus on gadget container and tools for running science applications on grids and clouds.

• Provide a tool set that can be used in whole or in part.
  – If you just want GFac, then you can use it without buying an entire framework.
• Outsource security, information services, data and metadata, etc. to other providers.
  – MyProxy, TG IIS, Globus, Condor, XMC Cat, iRods, etc.

Page 63: OGCE MSI Presentation

Advanced Support Scenarios

• GridChem/ParamChem workflow support
• UltraScan Job Submission (GFAC)
• EST Pipeline
  – Bioinformatics pipeline for managing mass job submission.

Page 64: OGCE MSI Presentation

More Information

• This is downloadable, packaged software.
  – Apache Maven build system provides everything you need to build the gadget container, gadgets, workflow composer, and backing services.
  – Get code by anonymous SVN checkout.
• Email: [email protected], [email protected], [email protected]
• OGCE Web Site: www.collab-ogce.org
• Blog/News Feed: http://collab-ogce.blogspot.com/

Page 65: OGCE MSI Presentation

Acknowledgements and People

• Funding by TeraGrid GIG, RP and by OCI SDCI
• IU: Marlon Pierce, Suresh Marru, Raminder Singh, Archit Kulshrestha, Zhenhua Guo
• TACC: Maytal Dahan, Rion Dooley
• SDSC: Nancy Wilkins-Diehr, Jeff Sale
• SDSU: Mary Thomas

Page 66: OGCE MSI Presentation

Gateway Computing Environments (GCE10)

Page 67: OGCE MSI Presentation

Molecular Force Field Cyberenvironments: Parameter Initialization and Optimization Workflow

[Flowchart labels: Parameter definitions; Model/Reference Data Definition; Merit Function Specification; Consistency Checker; Optimization Methods Choice; Optimization Job Launcher; Update Parameter Database with new set; Workflow Manager; Optimization Incomplete?; Parameter testing Model; Successful Testing; Optimization Monitor; Optimization Job Completed?; Parameter Sensitivity Analysis; Notification of End of Workflow; Expert Interface]
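The flowchart labels above suggest a loop in which a merit function is evaluated against reference data and the parameters are updated until the optimization completes. The sketch below illustrates that general pattern only. It is not ParamChem code: the linear `model`, the sum-of-squares merit function, the finite-difference gradient descent, and all numeric settings are assumptions chosen to keep the example small.

```python
def merit(params, reference, model):
    """Sum of squared deviations between the model and the reference data."""
    return sum((model(x, params) - y) ** 2 for x, y in reference)


def check_consistency(params, reference):
    """Minimal stand-in for a consistency check before launching a job."""
    if not params or not reference:
        raise ValueError("need at least one parameter and one reference point")


def optimize(params, reference, model, step=0.01, tol=1e-9, max_iter=10000, h=1e-6):
    """Gradient descent with forward-difference gradients (assumed method)."""
    check_consistency(params, reference)
    params = dict(params)
    for _ in range(max_iter):
        current = merit(params, reference, model)
        if current < tol:  # "Optimization Job Completed?"
            break
        grad = {}
        for name in params:
            shifted = {**params, name: params[name] + h}
            grad[name] = (merit(shifted, reference, model) - current) / h
        for name in params:
            params[name] -= step * grad[name]
    return params, merit(params, reference, model)


def model(x, params):
    # Hypothetical one-parameter model: y = k * x.
    return params["k"] * x


if __name__ == "__main__":
    reference = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # made-up reference data
    fitted, final_merit = optimize({"k": 0.0}, reference, model)
    print(fitted, final_merit)
```

In a real parameterization workflow each merit evaluation would itself be a compute job, which is why the flowchart separates the job launcher and the monitor from the update step.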

Page 68: OGCE MSI Presentation

OGCE Alumni

• We also gratefully acknowledge the contributions of participants in previous incarnations of the OGCE:
  – TACC: Maytal Dahan, Rion Dooley
  – SDSU: Mary Thomas
  – SDSC: Nancy Wilkins-Diehr, Jeff Sale
  – LSF: Srinath Perera, Sanjiva Weeravarna