www.grids-center.org

The GRIDS Center, part of the NSF Middleware Initiative

GRIDS Targeted Community Workshop

Grid Software Management

Doru Marcusiu

Assistant Director

Grid and Security Technologies

National Center for Supercomputing Applications

[email protected]

The Approach

• Discuss building grid infrastructure focusing on grid software management issues

• Correlate issues to solutions using the NEESgrid project as an example

• Propose possible ways in which GRIDS can help other communities

The Goals

• To raise community awareness about the challenges of grid software management

• To learn and understand community concerns and requirements

• To educate the communities about GRIDS services and products

• To identify specific action items for GRIDS

The Ground Rules

• We want to understand your requirements and concerns

• This is a 2-way, open, interactive session

• We don’t claim to have all the answers

• We welcome suggestions and ideas

Questions Addressed

• What are cyberinfrastructure, middleware, and grid infrastructure?

• What are the services of such an infrastructure?

• How is such an infrastructure established?

• Are these technologies available and, if so, from where?

• Will future technologies be backward compatible with existing technologies?

• What is required to integrate project specific software with middleware?

• How is a common software stack integrated, built, and deployed?

• How will technical and end users be educated in using these technologies?

• Are there current projects using grid technologies?

• What are the social, political, and technology challenges of building cyberinfrastructure?

• How can GRIDS help?

Definitions

• Cyber Infrastructure
  – The complete end-to-end solution that allows new research and collaboration that has not previously been possible
  – Like the physical infrastructure of roads, bridges, power grids, telephone lines, and water systems that support modern society, "cyberinfrastructure" refers to the distributed computer, information and communication technologies combined with the personnel and integrating components that provide a long-term platform to empower the modern scientific research endeavor. (GridToday: Daily News and Information for the Global Grid Community, February 10, 2003, Vol. 2 No. 6)

• Grid Infrastructure
  – A generic software stack providing fundamental services such as authentication, data management, and information services.
  – Middleware, or "glue", is a layer of software between the network and the applications. This software provides services such as identification, authentication, authorization, directories, and security.

• Middleware
  – A software stack integrated with or layered on top of Grid infrastructure providing project specific services for the end user.

What is a Grid?

• 1969, Len Kleinrock: “We will probably see the spread of ‘computer utilities’, which, like present electric and telephone utilities, will service individual homes and offices across the country.”

• 1998, Kesselman & Foster: “A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities.”

• 2000, Kesselman, Foster, Tuecke: “…coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations.”

Building Infrastructure

• Engage Users

• Define requirements

• Evaluate existing solutions

• Determine how missing functionality will be addressed

• Develop software integration and deployment plan

Engaging Users

• Are people interested and excited about the possibilities that new technology can offer?

• Are users willing to “suffer” through the necessary changes associated with using new technologies?

• Are users committed to the evolution necessary to change the way they do their science?

• Do users understand the benefits to the community and the science?

Defining Requirements

• What are the project goals?

• What is needed to achieve these goals?

• What is the infrastructure architecture that needs to be established?

• What technologies exist and don’t exist to meet these requirements?

Evaluate Existing Solutions

• Research existing commercial and non-commercial solutions

• Evaluate trade-offs between commercial and non-commercial solutions

• Establish relationships to gain knowledge and share experiences with other projects

Missing Functionality

• How will requirements for which no solution exists be addressed?

• Can the project afford to wait for a future implementation to be developed by a community?

• Can the project afford to develop solutions “in-house”?

Integration and Deployment Plans

• How will all the necessary components be integrated to provide a complete system solution?

• How will interoperability be addressed?

• How will the middleware software stack be deployed?

• How will updates to and evolution of the software stack take place?

• How will the software be maintained and supported?

Infrastructure Services

• Authentication and authorization

• Data management

• Workflow management

• Collaboration tools

Infrastructure Services

• Authentication and authorization
  – Authentication
    • The process by which one gains access to resources based on the verification of one’s identity
  – Authorization
    • The process by which one user, or a group of users, is able to use resources or services based on the permissions granted to that user
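The two definitions can be contrasted in a minimal Python sketch. Everything in it is an assumption made for illustration: the in-memory stores, the user and group names, and the shared-secret check stand in for the certificates and policy services a real grid would use.

```python
import hashlib
import hmac

# Hypothetical in-memory stores, used only to illustrate the two concepts.
CREDENTIALS = {"alice": hashlib.sha256(b"alice-secret").hexdigest()}
USER_PERMISSIONS = {"alice": {"read:dataset"}}
GROUP_PERMISSIONS = {"staff": {"read:dataset", "write:dataset"}}
GROUPS = {"alice": {"staff"}}


def authenticate(user: str, secret: str) -> bool:
    """Authentication: verify that the caller is who they claim to be."""
    expected = CREDENTIALS.get(user)
    if expected is None:
        return False
    presented = hashlib.sha256(secret.encode()).hexdigest()
    return hmac.compare_digest(expected, presented)


def authorize(user: str, action: str) -> bool:
    """Authorization: check permissions granted to the user or the user's groups."""
    granted = set(USER_PERMISSIONS.get(user, set()))
    for group in GROUPS.get(user, set()):
        granted |= GROUP_PERMISSIONS.get(group, set())
    return action in granted
```

A caller would normally run `authenticate` first and consult `authorize` only for an identity that has already been verified.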

Infrastructure Services

• NEESgrid Authentication and authorization
  – Authentication
    • Requirement to provide single sign-on and use delegated credentials
    • Implementation uses
      – GSI for authentication
      – NCSA CA to issue X.509 certificates
      – MyProxy to manage credentials retrieved by a portal
  – Authorization
    • Anticipated requirement to provide group access to data
      – Currently implemented using UNIX file permissions and customized code
      – Plan to use CAS for future authorization needs
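Group access through UNIX file permissions could look roughly like the sketch below. It assumes a POSIX system and uses hypothetical helper names; it is not NEESgrid's customized code, which the slides do not show.

```python
import os
import stat


def group_can_read(path: str) -> bool:
    """Return True if the file's group-read permission bit is set."""
    return bool(os.stat(path).st_mode & stat.S_IRGRP)


def grant_group_read(path: str) -> None:
    """Add group-read permission while leaving the other mode bits unchanged."""
    current = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, current | stat.S_IRGRP)
```

Which users belong to the owning group is still managed outside this code, for example by the site's account administration.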

Infrastructure Services

• Data management
  – Movement
    • Input to computation and output results to/from remote locations
  – Discovery
    • Searching or mining databases
  – Replication
    • Duplication of and improved access to data
  – Archives
    • Long term storage
  – Metadata
    • Data describing data
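As a small illustration of metadata and replication working together, the sketch below records a size and checksum for a dataset and uses them to check a replica. The record fields and function names are assumptions chosen for the example, not a schema from the slides.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical metadata record: data describing a stored dataset."""
    name: str
    size_bytes: int
    sha256: str


def describe(name: str, payload: bytes) -> DatasetRecord:
    """Build the metadata record for a dataset held in memory."""
    return DatasetRecord(name, len(payload), hashlib.sha256(payload).hexdigest())


def replica_matches(record: DatasetRecord, replica: bytes) -> bool:
    """Check a replica against the recorded size and checksum."""
    return (len(replica) == record.size_bytes
            and hashlib.sha256(replica).hexdigest() == record.sha256)
```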

Infrastructure Services

• NEESgrid Data management
  – Movement
    • To/from experiments to/from repositories
    • GridFTP and customized data repositories
  – Discovery
    • Searching or mining databases
    • Development of metadata services
  – Replication
    • Duplication of and improved access to data
    • Implementation of local and central repositories
  – Metadata
    • Community developed metadata schemas

Infrastructure Services

• Work flow management
  – Data pre-fetching
  – Distributed computations
  – Sequencing
  – Visualization
  – Archiving
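Sequencing can be sketched as ordering steps so that each one runs after its prerequisites. The example uses the standard-library `graphlib` module (Python 3.9+); the step names and the dependency edges between them are assumed for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each step maps to the set of steps that must finish before it starts.
# The dependency edges are an assumed example, not taken from the slides.
WORKFLOW = {
    "prefetch_data": set(),
    "compute": {"prefetch_data"},
    "visualize": {"compute"},
    "archive": {"compute"},
}


def execution_order(workflow: dict[str, set[str]]) -> list[str]:
    """Return one valid ordering in which every step follows its prerequisites."""
    return list(TopologicalSorter(workflow).static_order())
```

Steps with no ordering constraint between them, such as `visualize` and `archive` here, may appear in either order and could run in parallel.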

Infrastructure Services

• NEESgrid Work flow management
  – Distributed computations
    • Support for future access to HPC resources using Condor-G
  – Sequencing
    • NEESgrid Teleoperation Control Protocol (NTCP) to sequence experiment events
  – Visualization
    • Portal data viewer for graphical representation of experimental results
    • Streaming experimental data results in real time
    • Telepresence cameras
  – Archiving
    • Local and Central Data Repository

Infrastructure Services

• Collaboration Tools
  – Notes
  – Email
  – E-Notebooks
  – Data repositories
  – Document sharing

Infrastructure Services

• NEESgrid Collaboration Tools
  – Notes
    • UM.Worktools
  – Email
    • UM.Worktools and NEESgrid specific mailing lists
  – E-Notebooks
    • ANL (Argonne National Lab) Enotebook
    • PNNL (Pacific Northwest National Lab)
  – Data repositories
    • Local at each site and central at NCSA
  – Document sharing
    • UM.Worktools

Establishing an Infrastructure

• Defining project requirements

• Defining the software stack

• Identifying software integration tasks

• Establishing a software deployment process

• Defining a software maintenance policy

• Defining software support procedures

Establishing NEESgrid Infrastructure

• Defining project requirements
  – Solicited input from community
  – Developed architecture based on requirements and existing technologies as well as technologies that needed to be developed

• Defining the software stack
  – Evaluated existing software distributions (GT®, GRIDS, 3rd party) and packaging tools (GPT, RPM, tar files)
  – Developed software integration plan
  – Developed installation scripts

• Identifying software integration tasks
  – Integration of 3rd party software, existing grid technologies, and newly developed code

Establishing NEESgrid Infrastructure cont.

• Establishing a software deployment process
  – Required to be simple, upgradeable, and well documented
  – Required custom solution

• Defining a software maintenance policy
  – Predefined, feature-based software releases
  – Patches generated as needed

• Defining software support procedures
  – Email discussion lists open to community and monitored by software deployment team
  – Establishment of Bugzilla
  – Development of installation documentation

Available Technologies

• Globus Toolkit®
  – The Globus Toolkit® includes software services and libraries for resource monitoring, discovery, and management, plus security and file management.

• Condor-G
  – A Computation Management Agent for Multi-Institutional Grids

• MyProxy
  – MyProxy is a credential repository for the Grid

• GSI OpenSSH
  – OpenSSH clients and servers with support for the GSI authentication mechanism

• GridPort
  – Enables the development of portals and applications on top of underlying distributed and grid computing infrastructure to facilitate computational science.

• UberFTP
  – UberFTP is the first interactive, GridFTP-enabled ftp client that supports GSI authentication, parallel data channels and third party transfers.

• Shibboleth
  – A joint project of Internet2/MACE and IBM that is investigating architectures, frameworks, and practical technologies to support inter-institutional sharing and controlled access to web available services

• SUN Grid Engine
  – Provides enabling distributed resource management software for wide ranging requirements from compute farms to grid computing.

• PLATFORM GLOBUS TOOLKIT
  – Open-source, commercially supported toolkit for building grids

Technologies used by NEESgrid

• Globus Toolkit®
  – The Globus Toolkit® includes software services and libraries for resource monitoring, discovery, and management, plus security and file management.

• Condor-G
  – Provides capability to support future work flow management of distributed computation on HPC resources

• MyProxy
  – Provides a credential repository for portal access to user proxies

• GSI OpenSSH
  – Provides community access to NEESpop (NEES point of presence) gateway systems using GSI authentication

• CHEF (CompreHensive collaborativE Framework)
  – CHEF provides the user interface to the NEESgrid tools and services such as chat, schedules, discussion, data viewer, video cameras, GridFTP, Data repository, etc.

Technologies used by NEESgrid cont.

• NTCP
  – (Teleoperation Control Protocol) Newly developed technology to remotely control equipment site resources such as shake tables, centrifuges, wave tanks, etc. during collaborative, distributed experiments.

• NFMS
  – (NEESgrid File Management Service) Newly developed technology to access files independently of how and where they are stored, as well as the ability to negotiate transactions with storage systems.

Social & Political Challenges

• Establishing a Plan

• Coordinating Efforts

• Setting Policy

• Establishing Trust

• Making Compromises

• Engaging Application Scientists

Social & Political Challenges cont.

• Establishing a Plan
  – Milestones and timelines
  – Clear task definitions
  – Task assignment
  – Problem resolution
  – Scope of responsibility
  – Accountability
  – Consequences
  – Risk assessment

Social & Political Challenges cont.

• Coordinating Efforts
  – Are you committed?
  – Do you have the resources to support that commitment?
  – Accountability
  – What is the plan?

Social & Political Challenges cont.

• Setting Policy
  – Users
  – Resources
  – Sites
  – Expectations
  – Support
  – Problem Resolution

Social & Political Challenges cont.

• Establishing Trust
  – CAs
  – Support

• Identity and rights
  – Who are the users?
  – What rights do they have?

• Authorized Use
  – What can they do?

Social & Political Challenges cont.

• Making Compromises
  – Priorities
    • Functionality, stability
  – Services
    • Security, computation, data
  – Pride and Ownership
    • Whose software do we use?

Social & Political Challenges cont.

• Engaging Application Scientists
  – Defining requirements
    • Services, support
  – Setting expectations
    • Services, reliability, support

Technology Challenges

• Providing production quality services

• Software stack

• Standards

• Implementing

• Deploying Middleware (lessons learned)

• Grid Management (accounting/allocations)

• Security

• Data Management

• Extensibility

• Authorization

• Account Management

• Certificate Management

• QoS or Prediction

• Rapidly Changing Dynamics

Technology Challenges cont.

• Providing production quality services
  – Robust
  – Reliable
  – Supportable
  – Extensible
  – Useful

Technology Challenges cont.

• Software stack
  – What should it be?
  – How is it maintained?
  – How is it kept consistent across resources?
  – Compatibility
  – Consistency
  – Functionality
  – Software interoperability
    • Among differing versions of software
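One way to check whether a stack is consistent across resources is to compare per-site version manifests. The sketch below assumes each site can report a simple mapping of component name to version string; the function name and the "missing" marker are illustrative choices.

```python
def find_version_drift(sites: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Return components whose installed version differs (or is missing) across sites."""
    components: set[str] = set()
    for manifest in sites.values():
        components.update(manifest)

    drift: dict[str, dict[str, str]] = {}
    for component in sorted(components):
        # Record what every site reports, using "missing" when it has no entry.
        versions = {site: manifest.get(component, "missing")
                    for site, manifest in sites.items()}
        if len(set(versions.values())) > 1:
            drift[component] = versions
    return drift
```

An empty result means every site reports the same version of every component.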

Technology Challenges cont.

• Standards
  – Do any exist?
  – Can we use them?
  – Why should we use them?
  – Can we agree on what to use?

Technology Challenges cont.

• Implementing
  – Missing functionality
  – Compatibility of newly developed site specific functionality with quickly changing software releases

Technology Challenges cont.

• Deploying middleware
  – Rapidly Changing Dynamics
  – Single database of baseline code
  – Packaging
  – Meeting site specific needs

Technology Challenges cont.

• Grid Management (accounting/allocations)
  – User support
    • Problem resolution
    • Site coordination of debugging/help
    • Local vs. global problem
  – User training
    • How to do grid computing
  – Sys admin tasks
    • Grid map file management
    • Account management
    • Software patches, upgrades, verifications
  – System and services monitoring

Technology Challenges cont.

• Security
  – Data Integrity (encryption)
  – Authentication
  – Interoperability w/ existing mechanisms (Kerberos)

Technology Challenges cont.

• Data Management
  – Access to multiple distributed archival storage systems
  – High performance data transfers

Technology Challenges cont.

• Extensibility
  – How do new sites join?
    • Dynamically or by criteria
  – What must they do?
    • Provide resources, services, etc.

Technology Challenges cont.

• Authorization
  – Control access to subset of resources or services
  – Provide group versus individual access
  – CAS

Technology Challenges cont.

• Account Management
  – Managing user access to resources
    • Centralized vs. distributed
    • Coordination
    • Authoritative source
  – Reporting usage
    • Interest by funding agencies
    • Help with capacity planning
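Usage reporting for funding agencies or capacity planning often reduces to aggregating accounting records. The sketch below assumes records arrive as `(user, site, cpu_hours)` tuples, which is a hypothetical format chosen for the example.

```python
from collections import defaultdict


def summarize_usage(records: list[tuple[str, str, float]]) -> dict[str, dict[str, float]]:
    """Sum CPU hours per site and per user from (user, site, cpu_hours) records."""
    by_site: dict[str, float] = defaultdict(float)
    by_user: dict[str, float] = defaultdict(float)
    for user, site, cpu_hours in records:
        by_site[site] += cpu_hours
        by_user[user] += cpu_hours
    return {"by_site": dict(by_site), "by_user": dict(by_user)}
```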

Technology Challenges cont.

• Certificate Management
  – Certificate requests/issuance/revocations
  – Grid map file maintenance
  – Grid map file coordination
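Grid map file maintenance is easier to automate once the file can be parsed. The sketch below assumes the commonly used line format of a quoted certificate subject (distinguished name) followed by one or more comma-separated local account names, with `#` starting a comment; the function name and sample entries are hypothetical.

```python
import shlex


def parse_grid_mapfile(text: str) -> dict[str, list[str]]:
    """Map each certificate subject (DN) to its list of local account names."""
    mapping: dict[str, list[str]] = {}
    for line in text.splitlines():
        # shlex honors the quotes around the DN and drops '#' comments.
        fields = shlex.split(line, comments=True)
        if not fields:
            continue  # blank or comment-only line
        if len(fields) != 2:
            raise ValueError(f"malformed entry: {line!r}")
        subject, accounts = fields
        mapping[subject] = accounts.split(",")
    return mapping
```

Parsing each site's file into the same structure makes it possible to compare mappings between sites, which is the coordination problem the slide raises.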

Technology Challenges cont.

• QoS or Prediction
  – Best effort not enough

Technology Challenges cont.

• Rapidly Changing Dynamics
  – Projects
  – Organizations
  – Trends and/or technology

Lessons Learned

• This is hard

• Can’t do everything at once

• Clearly define goals

• Set priorities

• Must make compromises

• Will probably need more resources than you expect

• Communication and cooperation are critical

• Build a good team

• Leverage existing support services like GRIDS

GRIDS Engagement Examples

• VDT – UWisc – NMI VDT specific release

• ISI – NPACKAGE – using latest NMI NPACKAGE tailored release

• SDSC – BIRN – potential addition of end-to-end security solution

• NCSA – LEAD – potential release management plan

• ANL – NEESgrid – potential customized releases

How Can GRIDS Help?

• Provide software distributions

• Provide a build and test system

• Help define a software management process

• Provide training, education, documentation

• Provide support and expertise

Provide Software Distributions

• Releases can include project specific components

• Component interoperability can be assured

• Release schedule can be flexible

• Component updates can be provided

Providing a Build and Test System

• Existing system can be used to provide customized distributions

• Existing system can be deployed for internal use

• Existing system can be tailored to project specific needs

Defining a Software Management Process

• Managing a changing software stack

• Choosing the software stack components

• Developing a deployment strategy

• Requirements for preparing a release

Managing a Software Stack

• Why Does Software Change?
  – Because software ‘evolves’
    • Platforms change
    • Components become unsupported
  – Because Science demands it
    • Need more speed
    • Need more features
    • Need to take advantage of latest technology
  – Security and other Bug Fixes

Choosing Your Components

• Supported Platforms
  – Platforms include operating system versions
  – More platform types will increase release costs

• Versions
  – Newer versions may have needed features but are less mature

• Prerequisites and Dependencies
  – Components may use other software that has to be included in a release
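The point about prerequisites can be made concrete: a release must contain the chosen components plus everything they depend on, directly or indirectly. The sketch below computes that set; the function name and the dependency map passed to it are assumptions for illustration.

```python
def release_contents(selected: set[str], requires: dict[str, set[str]]) -> set[str]:
    """Return the selected components plus everything they depend on, transitively."""
    included: set[str] = set()
    pending = list(selected)
    while pending:
        component = pending.pop()
        if component in included:
            continue  # already handled; this also guards against dependency cycles
        included.add(component)
        pending.extend(requires.get(component, set()))
    return included
```

Comparing the size of the returned set with the size of `selected` gives a quick sense of how much extra software a component choice pulls into the release.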

Choosing Your Components cont.

• Maturity Levels
  – Prototype
    • Released from developer repository
    • No testing procedures
    • No multiple platform support
    • Major changes in feature set expected
    • No documentation
  – Immature
    • Source code releases
    • Minimal testing and deployment procedures
    • Community platform support
    • No effort towards maintaining compatibility with previous releases
    • Minimal documentation
  – Mature
    • Has a release plan which includes maintenance releases
    • Does multi-platform release testing
    • Has documentation for deployment and features
    • Provides a migration plan for transitioning to new releases
  – Stable
    • Minimal releases to fix bugs
    • Feature set is set in concrete
    • Training manuals and tutorials exist
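The four maturity levels form an ordered scale, so a project can encode a minimum acceptable level and filter candidate components against it. The enum values, the function name, and the component names in any call are assumptions for this sketch.

```python
from enum import IntEnum


class Maturity(IntEnum):
    """Maturity levels from the slide, ordered from least to most mature."""
    PROTOTYPE = 0
    IMMATURE = 1
    MATURE = 2
    STABLE = 3


def meets_policy(components: dict[str, Maturity], minimum: Maturity) -> list[str]:
    """Return the names of components at or above the required maturity level."""
    return sorted(name for name, level in components.items() if level >= minimum)
```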

Choosing Your Components cont.

• Maturity Tradeoffs
  – Prototype
    • Developers eager to accommodate users
    • Feature set could be ‘tailored’ to your specifications
  – Immature
    • Community of developers and users willing to accommodate requests and help with release issues
  – Mature
    • Active bug database to handle requests and bugs
    • Feature requests need to be balanced with requests from others
  – Stable
    • No feature requests are considered
    • Only popular platforms are supported

Deployment Strategy

• Release Preparation Efforts
  – Deployment automation
    • CVS repository with components that are installed individually
    • Source packages from an FTP site which include installation instructions
    • Automated scripts which install the entire release
    • Source packages which can be upgraded individually for bug fixes
    • Binary packages that spare the end user the hassle of building the software
    • Auto update services which ensure that the latest versions are installed automatically
  – Several levels of component documentation integration that ensures the release has a consistent ‘look’

• The End User Deployment Experience
  – Good
    • Minimal deployment hassle is good advertising
    • Less effort spent on deployment means more effort spent on doing science
  – Bad
    • More work for release team (possibly much more work if components are immature)

Deployment Strategy cont.

• Testing
  – Functional testing that ensures that the program requirements are being met
  – Release testing which makes sure that components are properly integrated
  – Deployment testing which makes sure that an installation works

• Bug Fix to Release Turnaround Time
  – Quick turnaround needed for security fixes, ‘hot’ bugs, and to keep up with the latest technology
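A deployment test in its simplest form checks that an installation produced the files a release promises. The sketch below assumes the release team maintains a list of expected relative paths; the function name and any paths passed to it are hypothetical.

```python
from pathlib import Path


def missing_files(install_root: str, expected: list[str]) -> list[str]:
    """Return the expected relative paths that are absent under install_root."""
    root = Path(install_root)
    return [rel for rel in expected if not (root / rel).is_file()]
```

An empty list means the basic check passed; functional and release testing would still be needed to show that the installed components work together.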

Release Preparation Requirements

• Need a release manager
  – Responsible for schedule/quality/feature decisions

• Identify release team
  – Integration and build experts
  – Outside power testers
  – Documentation integrator

• Need to identify available component support

Training, Education, and Documentation

• Ideas for a new approach
  – New Website: “Knowledge Central”
  – Instructor-Led Training & Seminars
  – Targeted Audiences
  – Transition Mentoring

New Website: “Knowledge Central”

• Centralized List of Links to:
  – Tutorials
  – References
  – Cookbooks (a.k.a. HOW-TOs)
  – Best Practices
  – Recommendations

• Links Prescreened (Rated?) for Quality

• Categorized by Functional Area

• Searchable

Instructor-Led Training & Seminars

• Overview and “Intro to” Presentations
  – Answers the “What is?” and “Why?” kinds of questions
  – No hands-on labs

• Set of Standard Courses
  – Focus on typical skills needed in order to accomplish routine activities
  – Includes hands-on lab exercises
  – Separate courses for Application Developers, System Administrators, End Users

• Specialized offerings that go more in-depth on topics such as:
  – Security
  – Middleware APIs & Coding
  – Job Scheduling
  – etc.

Education and Training

• System administrators
  – Software stack maintenance (deployment, upgrades, etc.)

• Middleware developers
  – Implementing end user requirements using middleware APIs and services

• End users
  – Computational scientists using grid technologies to advance science

System Administrators

• Identifying software stack

• Planning deployment process

• Ensuring compatibility

• Verification of services

• Planning upgrade process

• Evaluating technology evolution

Middleware Developers

• Requirement definitions

• Mapping requirements to technologies

• Integration of existing technologies

• Implementing end user requirements using middleware APIs and services

End Users

• Computational scientists using grid technologies to advance science

• What’s new?
  – Credential management
  – Data services
  – Information services
  – Workflow tools
  – Collaborative tools

Transition Mentoring

• Consulting services that focus on the transition period after training while applying the skills and technologies in the context of a specific project or application

Discussion Items

• How can GRIDS help with Software Management?

• How can GRIDS help with training, education, and documentation of grid technologies?

• Session Break

Software Management

• Define requirements

• Produce a distribution

• Provide tools to produce a distribution

• Provide a release manager

• Provide system administration training focused on grid technologies

Training, Education, and Documentation

– Hands-on workshops targeted to specific audiences (system administrators, developers, end users) or specific topics (security, work flow management, etc.)

– On-site training addressing project specific requirements

– Consolidated and integrated web site consisting of documentation, tutorials, references, best practices, recommendations, etc.

– Help desk