Post on 21-Dec-2015
CSE 160/Berman
Grid Computing 2
http://www.globus.org
http://www.cs.virginia.edu/~legion/
http://www.cs.wisc.edu/condor/
(thanks to shava and holly [see notes for CSE 225])
Outline
• Today:
– Condor
– Globus
– Legion
• Next class:
– Talk by Marc Snir, Architect of IBM’s Blue Gene
– Tuesday June 6, AP&M 4301 1:00-2:00
Condor
• Condor is a high-throughput scheduler
– Main idea is to leverage free cycles on very large collections of privately owned, non-dedicated desktop workstations
– Performance measure is throughput of jobs
• The question is not how fast a particular job can run, but how many jobs can complete over a long period of time.
• Developed by Miron Livny et al. at U. of Wisconsin
Condor Basics
• Condor = “hunter of idle workstations”
• A Condor pool consists of a large number of privately controlled UNIX workstations (WSs)
– (Condor now being ported to NT)
– WS owners define the conditions under which the WS can be allocated by Condor to an external user
• External Condor jobs run while machines are idle
– User does not need a login on participating machines
• Uses remote system calls to the submitting WS
Condor Architecture (all machines in same Condor Pool)
Architecture:
• Each WS runs Schedd and Startd daemons
– Startd monitors and terminates jobs assigned by CM
– Schedd queues jobs submitted to Condor at that WS and seeks resources for them
• Central Manager (CM) WS controls allocation and execution for all jobs
[Figure: Central Manager; Submission Machine (Schedd, Shadow); Execution Machine (Startd, Starter, User Process)]
Standard Condor Protocol (all machines in same Condor Pool)
Protocol:
• Schedd (submitting machine) sends job context to CM; execution machine sends machine context to CM
• CM identifies a match between job requirements and execution machine resources
• CM sends to Schedd the execution machine ID
• Schedd forks a Shadow process on submission machine
• Shadow passes job requirements to Startd on execution machine and gets acknowledgement that execution machine is still idle
• Shadow sends executable to execution machine where it executes until completion or migration
[Figure: Central Manager; Submission Machine (Schedd, Shadow); Execution Machine (Startd, Starter, User Process)]
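The protocol steps above can be traced with a small single-process simulation. This is a minimal sketch under stated assumptions, not Condor's implementation: the names `Machine`, `Job`, `CentralManager` and `submit` are hypothetical, messages are log strings instead of network traffic, and the only matching rule is a memory requirement.

```python
from dataclasses import dataclass


# Hypothetical names; an illustrative simulation, not Condor's code.
@dataclass
class Machine:
    name: str
    memory_mb: int
    idle: bool = True


@dataclass
class Job:
    executable: str
    min_memory_mb: int


class CentralManager:
    """Holds the machine contexts advertised by execution machines."""

    def __init__(self):
        self.machines = []

    def advertise(self, machine):
        self.machines.append(machine)

    def match(self, job):
        # Pair the job with the first idle machine meeting its requirement
        for m in self.machines:
            if m.idle and m.memory_mb >= job.min_memory_mb:
                return m
        return None


def submit(cm, job, log):
    """Walk one job through the protocol; return the execution machine name or None."""
    log.append(f"schedd -> cm: job context ({job.executable})")
    machine = cm.match(job)
    if machine is None:
        log.append("cm: no match; job stays in the schedd queue")
        return None
    log.append(f"cm -> schedd: execution machine {machine.name}")
    log.append("schedd: fork shadow")
    # In a real pool the owner may return between match and claim, so the
    # shadow re-checks with the startd before sending anything
    if not machine.idle:
        log.append(f"startd@{machine.name} -> shadow: no longer idle")
        return None
    log.append(f"startd@{machine.name} -> shadow: ack, still idle")
    machine.idle = False
    log.append(f"shadow -> {machine.name}: send {job.executable}")
    return machine.name


cm = CentralManager()
cm.advertise(Machine("ws1", memory_mb=64))
cm.advertise(Machine("ws2", memory_mb=256))
log = []
print(submit(cm, Job("sim", min_memory_mb=128), log))  # ws2
```

The re-check in step 5 matters because matching and claiming are separate steps; the sketch keeps it even though, single-threaded, it always succeeds.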
More Condor Basics
• Participating Condor machines are not required to share file systems
• No source code changes to the user’s code are required to use Condor, but users must re-link their program in order to use checkpointing and migration
– vanilla jobs vs. Condor jobs
• Condor jobs are allocated to a good target resource using a matchmaker
• Single Condor jobs are automatically checkpointed, migrated between WSs, and restarted as needed
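Condor Remote System Call Strategy
• Job must be able to read and write files on its submit workstation
[Figure: before allocation, the submitted process and its file reside on the Submission WS; after allocation, the allocated process runs on the Execution WS while a Shadow process on the Submission WS, next to the file, serves the job's file accesses]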
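The remote system call idea can be sketched in-process: the job calls ordinary read/write functions, but the library it was relinked with forwards each call to the Shadow on the submitting WS. This is a minimal sketch, assuming a dict stands in for the submission WS file system and a direct method call stands in for the network hop; `Shadow`, `RemoteIO` and `user_program` are hypothetical names.

```python
# Hypothetical names; an in-process illustration of remote system calls.
class Shadow:
    """Runs on the submission WS; performs file operations on the job's behalf."""

    def __init__(self, files):
        self.files = files   # stand-in for the submission WS file system

    def handle(self, call, name, data=None):
        if call == "read":
            return self.files[name]
        if call == "write":
            self.files[name] = self.files.get(name, "") + data
            return len(data)
        raise ValueError(f"unsupported call: {call}")


class RemoteIO:
    """Plays the role of the relinked I/O library on the execution WS."""

    def __init__(self, shadow):
        self.shadow = shadow
        self.forwarded = 0   # number of calls shipped to the shadow

    def read(self, name):
        self.forwarded += 1
        return self.shadow.handle("read", name)

    def write(self, name, data):
        self.forwarded += 1
        return self.shadow.handle("write", name, data)


def user_program(io_layer):
    # Unmodified user logic: it just reads and writes files
    text = io_layer.read("input.txt")
    io_layer.write("output.txt", text.upper())


submission_files = {"input.txt": "hello grid"}
remote = RemoteIO(Shadow(submission_files))
user_program(remote)
print(submission_files["output.txt"])  # HELLO GRID
```

Because every file access lands on the submitting WS, the execution WS needs neither a shared file system nor a login for the user.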
Condor Matchmaking
• Matchmaking mechanism matches job specs to machine characteristics
• Matchmaking is done using classads
– Resources produce resource offer ads
• Include information such as available RAM, CPU type and speed, virtual memory size, physical location, current load average, etc.
– Jobs provide a resource request ad, which defines the required and desired set of resources to run on
• Condor acts as a broker which matches and ranks resource offer ads against resource request ads
– Condor makes sure that all requirements in both ads are satisfied
– Priorities of users and certain types of ads are also taken into consideration
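Symmetric matching and ranking can be illustrated with Python callables standing in for classad expressions. This is a minimal sketch, assuming Condor-style attribute names (`Arch`, `Memory`, `LoadAvg`, `Requirements`, `Rank`) but not the real ClassAd language or library; the machine names, owners and values are made up, and user priorities are omitted.

```python
# Illustrative only: lambdas stand in for ClassAd expressions.
# Requirements(my, other) -> bool ; Rank(my, other) -> number
machine_ads = [
    {"Name": "ws1", "Arch": "INTEL", "Memory": 64, "LoadAvg": 0.10,
     "Requirements": lambda my, other: my["LoadAvg"] < 0.3},
    {"Name": "ws2", "Arch": "SUN4u", "Memory": 256, "LoadAvg": 0.00,
     "Requirements": lambda my, other: other["Owner"] != "mallory"},
    {"Name": "ws3", "Arch": "INTEL", "Memory": 512, "LoadAvg": 0.05,
     "Requirements": lambda my, other: my["LoadAvg"] < 0.3},
]

job_ad = {
    "Owner": "alice",
    # Required: an INTEL machine with at least 128 MB
    "Requirements": lambda my, other: other["Arch"] == "INTEL" and other["Memory"] >= 128,
    # Desired: prefer more memory
    "Rank": lambda my, other: other["Memory"],
}


def matches(a, b):
    """Symmetric match: each ad's Requirements must hold against the other ad."""
    return a["Requirements"](a, b) and b["Requirements"](b, a)


def best_match(job, machines):
    candidates = [m for m in machines if matches(job, m)]
    if not candidates:
        return None
    return max(candidates, key=lambda m: job["Rank"](job, m))


print(best_match(job_ad, machine_ads)["Name"])  # ws3
```

Note that both sides constrain the match: ws2 would refuse a job owned by `mallory` even if the job's own requirements were satisfied.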
Condor Checkpointing
• When WS owner returns, job can be checkpointed and restarted on another WS
– Periodic checkpoint feature can periodically checkpoint the job so that work is not lost should the job be migrated
• Condor jobs vs. “vanilla” jobs
– Condor job executables must be relinked and can be checkpointed, migrated and restarted
– Vanilla jobs are not relinked and cannot be checkpointed and migrated
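Why periodic checkpoints limit lost work can be shown with a toy job. Note the swapped-in technique: Condor checkpoints the whole process image transparently through the relinked library, whereas this minimal sketch uses application-level checkpointing (the job pickles its own state), and `run_job` is a hypothetical name.

```python
import os
import pickle
import tempfile


def run_job(total_steps, ckpt_path, checkpoint_every=10, evict_at=None):
    """Sum 0..total_steps-1 with periodic checkpoints (hypothetical job).

    Returns (result, steps_executed_in_this_run); result is None if evicted."""
    step, acc = 0, 0
    if os.path.exists(ckpt_path):            # restart from the last checkpoint
        with open(ckpt_path, "rb") as f:
            step, acc = pickle.load(f)
    executed = 0
    while step < total_steps:
        if step == evict_at:                 # owner returned: job is vacated
            return None, executed
        acc += step
        step += 1
        executed += 1
        if step % checkpoint_every == 0:     # periodic checkpoint
            with open(ckpt_path, "wb") as f:
                pickle.dump((step, acc), f)
    return acc, executed


ckpt = os.path.join(tempfile.mkdtemp(), "job.ckpt")
first = run_job(100, ckpt, evict_at=57)   # vacated; last checkpoint was at step 50
second = run_job(100, ckpt)               # "migrated": resumes from step 50
print(first, second)                      # (None, 57) (4950, 50)
```

Only the 7 steps after the last checkpoint are redone; a shorter checkpoint interval loses less work but writes the checkpoint file more often.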
Condor Checkpointing Limitations
• Only single process jobs supported
– Inter-process communication not supported (socket, send, recv, etc. not implemented)
• All file operations must be idempotent (read-only and write-only access work correctly; reading and writing the same file may not)
• Disk space must be available to store the checkpoint file on the submitting machines.
– Each checkpointed job has an associated checkpoint file which is approximately the size of the address space of the process.
Condor-PVM and Parallel Jobs
• PVM master/slave jobs can be submitted to a Condor pool (special condor-pvm universe)
– Master is run on the machine where the job was submitted
– Slaves are pulled from the Condor pool as they become available
• Condor acts as resource manager for the PVM daemon
– Whenever the PVM program asks for nodes, the request is remapped to Condor
– Condor finds a machine in the Condor pool and adds it to the PVM virtual machine
Condor and the Grid
• Condor and the Alliance
– Condor is one of the Grid technologies deployed by the Alliance
– Used by partners for production high-throughput computing
• Condor and Globus
– Globus can use Condor as a local resource manager
– Globus RSL specs are translated into matchmaker classads
Condor and the Grid
• Flock of Condors
– Aggregation of Condor pools into a “flock” enables Condor pools to cross load-sharing and protection boundaries
– A Condor flock may include Condor pools connected by wide-area networks
• Infrastructure
– Idea is to add a Gateway (GW) machine to every pool
– Gateway machines act as resource brokers for machines external to a pool
• In the published description, the GW machine presents randomly chosen external pools/machines
• CM does not need to know about flocking
– Each GW machine runs a GW-startd and a GW-schedd, as with a single Condor pool
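The reason the CM need not know about flocking is that the gateway's offer looks like one more machine ad. This is a minimal sketch, assuming machines are plain name strings and the gateway picks uniformly at random among remote pools with idle machines; `Pool`, `Gateway` and `cm_view` are hypothetical names, not the published design's interfaces.

```python
import random


# Hypothetical names; illustrates the gateway idea, not Condor's flocking code.
class Pool:
    def __init__(self, name, idle_machines):
        self.name = name
        self.idle = list(idle_machines)


class Gateway:
    """GW machine: advertises idle machines from other pools to its own CM."""

    def __init__(self, remote_pools, rng):
        self.remote_pools = remote_pools
        self.rng = rng

    def advertise(self):
        # Present a randomly chosen external machine, if any pool has one idle
        pools = [p for p in self.remote_pools if p.idle]
        if not pools:
            return None
        pool = self.rng.choice(pools)
        return (pool.name, self.rng.choice(pool.idle))


def cm_view(home, gateway):
    """The machine ads the home CM sees; the gateway's ad looks like any other."""
    ads = [(home.name, m) for m in home.idle]
    external = gateway.advertise()
    if external is not None:
        ads.append(external)
    return ads


home = Pool("home", [])          # every local machine is busy
gw = Gateway([Pool("poolA", ["a1"]), Pool("poolB", [])], random.Random(0))
print(cm_view(home, gw))  # [('poolA', 'a1')]
```

The CM's matching code is unchanged; only the set of ads it receives grows.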
Flocking Protocol (machines in different pools)
[Figure: Submission Pool: Central Manager; Submission Machine (Schedd, Shadow); Gateway Machine (GW-Startd, GW-Startd child). Execution Pool: Central Manager; Gateway Machine (GW-Schedd, GW-Simulate Shadow); Execution Machine (Startd, Starter, User Process)]
Globus
• Globus: integrated toolkit of Grid services
– Developed by Ian Foster (ANL/UC) and Carl Kesselman (USC/ISI)
– Bag of services model: applications can use Grid services without having to adopt a particular programming model
Core Globus Services
• Resource allocation and process management (GRAM, DUROC, RSL)
• Information Infrastructure (MDS)
• Security (GSI)
• Communication (Nexus)
• Remote Access (GASS, GEM)
• Fault Detection (HBM)
• QoS (GARA, Gloperf)
Globus Layered Architecture
[Figure: layers from top to bottom: Applications; High-level Services and Tools (DUROC, globusrun, MPI, Nimrod/G, MPI-IO, CC++, GlobusView, Testbed Status); Core Services (Metacomputing Directory Service, GRAM, Globus Security Interface, Heartbeat Monitor, Nexus, Gloperf, GASS); Local Services (LSF, Condor, MPI, NQE, Easy, TCP, UDP, Solaris, Irix, AIX)]
Globus Resource Management Services
• Resource Management services provide mechanism for remote job submission and management
• 3 low-level services:
– GRAM (Globus Resource Allocation Manager)
• Provides remote job submission and management
– DUROC (Dynamically Updated Request Online Co-allocator)
• Provides simultaneous job submission
• Layers on top of GRAM
– RSL (Resource Specification Language)
• Language used to communicate resource requests
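To give a feel for RSL, here is a small builder for request strings. It is a sketch based on the classic RSL form, where `&` introduces a conjunction of `(attribute=value)` relations and `+` introduces a multirequest of subrequests such as a co-allocator handles; the helper names are hypothetical, the host names are made up, and values containing double quotes are not handled.

```python
# Hypothetical helpers producing classic-RSL-style strings (illustrative).
def _literal(value):
    # Integers are written bare; everything else is double-quoted
    return str(value) if isinstance(value, int) else f'"{value}"'


def relation(attribute, value):
    if isinstance(value, (list, tuple)):
        rhs = " ".join(_literal(v) for v in value)   # e.g. an argument list
    else:
        rhs = _literal(value)
    return f"({attribute}={rhs})"


def request(**attributes):
    """A single request: '&' followed by a conjunction of relations."""
    return "&" + "".join(relation(k, v) for k, v in attributes.items())


def multirequest(*requests):
    """A co-allocation request: '+' followed by parenthesized subrequests."""
    return "+" + "".join(f"({r})" for r in requests)


print(request(executable="/bin/date", count=2))
# &(executable="/bin/date")(count=2)
```

A multirequest such as `multirequest(request(resourceManagerContact="hostA", count=4), request(resourceManagerContact="hostB", count=8))` describes one job spread over two resource managers.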
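Globus Resource Management Architecture
[Figure: the Application sends RSL to a Broker, which performs RSL specialization using queries and info from the Information Service and produces ground RSL; a Co-allocator splits the ground RSL into simple ground RSL requests sent to several GRAMs, each in front of a local resource manager (LSF, EASY-LL, NQE)]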
Globus Information Infrastructure
• MDS (Metacomputing Directory Service)
– MDS stores information about entries, where an entry is some type of object (organization, person, network, computer, etc.)
– Object class associated with each entry describes a set of entry attributes
– LDAP (Lightweight Directory Access Protocol) used to store information about resources
• LDAP = hierarchical, tree-structured information model defining form and character of information
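The LDAP-style model (entries named by hierarchical distinguished names, each carrying an object class) supports subtree searches such as "all computers under this organization with enough memory". This is a minimal sketch, assuming comma-separated DNs with the most specific component first; the DNs, object class names and `memoryMB` attribute are hypothetical, not the real MDS schema.

```python
from dataclasses import dataclass, field


# Hypothetical schema; illustrates an LDAP-style tree, not MDS itself.
@dataclass
class Entry:
    dn: str            # distinguished name, most specific component first
    objectclass: str   # determines which attributes the entry carries
    attrs: dict = field(default_factory=dict)


def _rdns(dn):
    return [part.strip() for part in dn.split(",")]


def in_subtree(dn, base):
    """True if dn is base itself or lies below it in the tree."""
    d, b = _rdns(dn), _rdns(base)
    return len(d) >= len(b) and d[-len(b):] == b


def search(directory, base, objectclass, predicate=lambda attrs: True):
    return sorted(e.dn for e in directory
                  if in_subtree(e.dn, base)
                  and e.objectclass == objectclass
                  and predicate(e.attrs))


directory = [
    Entry("o=Example Univ, c=US", "organization"),
    Entry("hn=node1, o=Example Univ, c=US", "computer", {"memoryMB": 256}),
    Entry("hn=node2, o=Example Univ, c=US", "computer", {"memoryMB": 64}),
    Entry("cn=Alice, o=Example Univ, c=US", "person"),
    Entry("hn=x1, o=Other Lab, c=US", "computer", {"memoryMB": 512}),
]

print(search(directory, "o=Example Univ, c=US", "computer",
             lambda a: a["memoryMB"] >= 128))
# ['hn=node1, o=Example Univ, c=US']
```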
Globus Security Service
• GSI (Grid Security Infrastructure)
– Provides a public key-based security system that layers on top of local site security
• User is identified to the system by an X.509 certificate containing info about the duration of permissions, the public key, and the signature of the certificate authority
• User also has a private key
– Provides users with single sign-on access to the various sites to which they are authorized
More GSI
• Resource management system uses GSI to establish which machines a user may access
• GSI allows for proxies so that the user need only log on once, as opposed to logging on to every machine involved in a distributed computation
– Proxies are used for short-term authentication rather than long-term use
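The proxy idea can be modeled as a short delegation chain: the CA issues the user's long-lived certificate, the user's credential issues a short-lived proxy, and a site accepts the chain only while every link is unexpired. This is a minimal sketch that checks only issuer/subject links and lifetimes; real GSI verifies signatures with the keys, which is omitted here. `Cert`, `make_proxy`, `chain_valid` and the subject names are hypothetical, and the `/CN=proxy` suffix is used only for illustration.

```python
from dataclasses import dataclass


# Hypothetical model; no cryptography, only issuer links and lifetimes.
@dataclass
class Cert:
    subject: str
    issuer: str
    not_after: float   # expiry time in hours (illustrative clock)


def make_proxy(user_cert, now, lifetime_hours=12):
    """Issue a short-lived proxy on behalf of the user's own credential."""
    return Cert(subject=user_cert.subject + "/CN=proxy",
                issuer=user_cert.subject,
                # A proxy never outlives the credential that issued it
                not_after=min(now + lifetime_hours, user_cert.not_after))


def chain_valid(chain, trusted_ca, now):
    """chain[0] is issued by the CA; each later cert by the one before it."""
    if not chain or chain[0].issuer != trusted_ca:
        return False
    for parent, child in zip(chain, chain[1:]):
        if child.issuer != parent.subject:
            return False
    return all(now < c.not_after for c in chain)


CA = "/O=Grid/CN=Example CA"
user = Cert("/O=Grid/CN=Alice", CA, not_after=24 * 365)
proxy = make_proxy(user, now=0)
print(chain_valid([user, proxy], CA, now=1))   # True
print(chain_valid([user, proxy], CA, now=13))  # False: proxy expired
```

The short proxy lifetime is what limits the damage if a proxy left on a remote machine is stolen, while the user's private key stays at home.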
Globus Communication Services
• Nexus
– Communication library which provides asynchronous RPC, multi-method communication, data conversion and multi-threading facilities
• I/O
– Low-level communication library which provides a thin wrapper around TCP, UDP, IP multicast and file I/O
– Integrates GSI into TCP communication
Globus Remote Access Services
• GASS (Globus Access to Secondary Storage)
– Provides secure remote access to files
• GEM (Globus Executable Management)
– Intended to support identification, location, and creation of executables in a heterogeneous environment
Globus Fault Detection Services
• HBM (Heartbeat Monitor)
– Provides mechanisms for monitoring multiple remote processes in a job and enabling the application to respond to failures
• Nexus Fault Detection:
– Notifies applications using Nexus when a communicating process fails (but not which one)
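A heartbeat monitor can be sketched as a timeout table: each process reports periodically, and a process that stays silent longer than the timeout is declared failed and reported to the application once. This is a minimal sketch, assuming an explicit `now` argument instead of a real clock; `HeartbeatMonitor` and its methods are hypothetical names, not the HBM API. Unlike the Nexus mechanism above, it identifies which process failed.

```python
# Hypothetical names; a timeout-based failure detector, not the HBM API.
class HeartbeatMonitor:
    """Declares a process failed when no heartbeat arrives within `timeout` seconds."""

    def __init__(self, timeout, on_failure):
        self.timeout = timeout
        self.on_failure = on_failure   # application callback, once per failed process
        self.last_seen = {}
        self.reported = set()

    def heartbeat(self, process_id, now):
        self.last_seen[process_id] = now

    def check(self, now):
        for pid, t in sorted(self.last_seen.items()):
            if now - t > self.timeout and pid not in self.reported:
                self.reported.add(pid)
                self.on_failure(pid)


failures = []
hbm = HeartbeatMonitor(timeout=5, on_failure=failures.append)
for pid in ("p1", "p2", "p3"):
    hbm.heartbeat(pid, now=0)
hbm.heartbeat("p1", now=4)
hbm.heartbeat("p2", now=6)
hbm.check(now=8)
hbm.check(now=9)       # p3 is not reported twice
print(failures)        # ['p3']
```

The timeout trades detection speed against false alarms: a short timeout notices crashes quickly but may flag a process that is merely slow.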
Globus QoS Services
• GARA (Globus Architecture for Reservation and Allocation)
– Provides dedicated access to collections of resources via reservations
• Gloperf
– Provides bandwidth and latency information
• Wolski’s NWS being integrated with Globus
– NWS provides monitoring and predictive information
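Dedicated access via reservations requires admission control: a new reservation is accepted only if the resource is never overcommitted during its interval. This is a minimal sketch, assuming one resource with a single numeric capacity (for example link bandwidth) and half-open time intervals; `ReservationTable` is a hypothetical name, not GARA's interface.

```python
# Hypothetical name; advance-reservation admission control for one resource.
class ReservationTable:
    """Accepts a reservation only if capacity is never exceeded (illustrative)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.reservations = []   # (start, end, amount) over half-open [start, end)

    def load_at(self, t):
        return sum(a for s, e, a in self.reservations if s <= t < e)

    def reserve(self, start, end, amount):
        # Load is piecewise constant and only rises at reservation start times,
        # so the peak over [start, end) occurs at `start` or at an existing start inside it.
        points = [start] + [s for s, _, _ in self.reservations if start <= s < end]
        if any(self.load_at(t) + amount > self.capacity for t in points):
            return False   # reject: would overcommit the resource
        self.reservations.append((start, end, amount))
        return True


table = ReservationTable(capacity=100)   # e.g. 100 Mb/s of link bandwidth
print(table.reserve(0, 10, 60))   # True
print(table.reserve(5, 15, 50))   # False: 60 + 50 > 100 at t=5
print(table.reserve(10, 20, 50))  # True
```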
Globus and the Grid
• Major player in Grid Infrastructure development
• Currently deployed widely
• User community strong
• Infrastructure supported by IPG, Alliance and NPACI
– Exclusive infrastructure of Alliance and IPG
Legion
• Developed by Andrew Grimshaw (UVA)
• Provides single, coherent virtual machine model that addresses grid issues within a reflective, object-based metasystem
• Everything is an object in Legion model – HW resources, SW resources, etc.
Legion Goals
• Site autonomy
– Each organization maintains control over its own resources
• Extensibility
– Users can construct their own mechanisms and policies within Legion
• Scalability
– No centralized structures or servers; full distribution
Legion Goals
• Easy to use / seamless
– System must hide complexity of environment
– “Ninja users” must be able to tune applications
• High performance via parallelism
– Coarse-grained applications should perform well
• Single, persistent object space
– Single name space, transparent to location and replication
• Security
– “do no harm”: Legion should not weaken local security policies
Legion Object Model
• Every Legion object is defined and managed by its class object; class objects act as managers and make policy, as well as define instances
• Legion defines the interface and basic functionality of a set of core object types which support basic services
• Users may also define and build their own class objects
Legion Object Model
• Core Objects:
– Host objects
• Encapsulate machine capabilities in Legion (processors and memory)
• Currently represent single host systems (uniprocessor and multiprocessor shared memory)
– Vault objects
• Represent persistent storage
– Implementation objects
• Generally an executable file that a host object can execute when it receives a request to activate or create an object
Legion Object Model
• Basic system services provided by core objects
– Naming and binding, object creation, activation, deactivation and deletion
• Responsibility for system-level functionality endowed on classes
– Classes (which are also objects) define and manage objects associated with them
– Classes create new instances, schedule them for execution, activate and deactivate them, and provide current location info for contacting them
• Users can define and build own class objects
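The division of labor in the object model (classes create, place, activate, deactivate and locate their instances; host objects run them; vault objects keep their state) can be sketched in a few classes. This is a minimal sketch, assuming a least-loaded placement policy chosen by the class; `HostObject`, `VaultObject`, `LegionClass` and the `Worker-N` naming are hypothetical, not Legion's API.

```python
import itertools


# Hypothetical names; mirrors the roles described above, not Legion's API.
class HostObject:
    """Core object encapsulating one machine's ability to run objects."""

    def __init__(self, name):
        self.name = name
        self.running = set()


class VaultObject:
    """Core object holding the persistent state of objects."""

    def __init__(self):
        self.stored = {}


class LegionClass:
    """A class object: creates, places, activates, deactivates and locates instances."""

    def __init__(self, name, hosts, vault):
        self.name, self.hosts, self.vault = name, hosts, vault
        self.location = {}              # instance id -> host name, None when inactive
        self._ids = itertools.count(1)

    def create(self, state):
        oid = f"{self.name}-{next(self._ids)}"
        self.vault.stored[oid] = state  # persistent state lives in the vault
        self.activate(oid)
        return oid

    def activate(self, oid):
        # Placement policy chosen by this class: least-loaded host, first on ties
        host = min(self.hosts, key=lambda h: len(h.running))
        host.running.add(oid)
        self.location[oid] = host.name
        return host.name

    def deactivate(self, oid):
        for host in self.hosts:
            host.running.discard(oid)
        self.location[oid] = None       # state remains in the vault

    def locate(self, oid):
        return self.location.get(oid)


h1, h2 = HostObject("h1"), HostObject("h2")
workers = LegionClass("Worker", [h1, h2], VaultObject())
a = workers.create({"x": 1})
b = workers.create({"x": 2})
print(workers.locate(a), workers.locate(b))  # h1 h2
```

A user-defined class could override `activate` to apply a different placement policy, which is the kind of extensibility the Legion goals describe.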
Legion Programming
• Legion supports MPI and PVM libraries via “emulation libraries” (which use the runtime Legion library)
– Applications need to be recompiled and relinked
• Legion supports BFS (Basic Fortran Support) and Java
• Legion OO programming language = Mentat (MPL)
Legion and the Grid
• Major Grid player with Globus
• Legion infrastructure deployed at NPACI and Department of Defense Modernization sites; being considered as infrastructure for Boeing’s distributed product data management and manufacturing resource control systems
• Large-scale application implementations of molecular dynamics applications [Charmm and Amber] at NPACI
Still other Infrastructure Approaches
• CORBA
• Globe (Europe)
• Suma (Venezuela)
• Web-based approaches (Geoffrey Fox)
• Jini (Sun)
• DCOM (MS)
• etc.
What’s Missing?
• How do we ensure application performance?
• Performance-efficient application development and execution:
– Ninja programming
– AppLeS, Nimrod, Mars, Prophet/Gallop, MSHN, etc.
– GrADS
GrADS: Grid Application Development and Execution Environment
• Prototype system which facilitates end-to-end “grid-aware” program development
• Based on the idea of a performance economy in which negotiated contracts bind application to resources
• Joint project with large team of researchers
– Ken Kennedy, Jack Dongarra, Dennis Gannon, Dan Reed, Lennart Johnsson, Andrew Chien, Rich Wolski, Ian Foster, Carl Kesselman, Fran Berman
[Figure: Grid Application Development System: a PSE and the source application, together with libraries and software components, feed a whole-program compiler that produces a configurable object program; a scheduler/service negotiator negotiates with the Grid runtime system (Globus); a real-time performance monitor detects performance problems and sends performance feedback to a dynamic optimizer]
Cool GrADS Ideas
• Performance Contracts
– Vehicle for sharing complex, multi-dimensional performance information between components
• Performance Economy
– Framework in which to negotiate services and promote performance
– Performance contracts play a fundamental role in exchange of information and binding of resources
• Resource allocation and performance steering using fuzzy logic (“AppLePilot”)
– Mechanism for describing quality of information
– Allows for performance steering based on evaluation of application progress
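A performance contract combined with fuzzy-style steering can be illustrated by comparing measured progress against the contracted rate and turning the slowdown into a degree of violation between 0 and 1. This is a minimal sketch under invented assumptions: the linear membership function, the 20% tolerance, the 0.5 threshold and the three actions are made up for illustration and are not the rules AppLePilot uses; `PerformanceContract`, `violation_degree` and `steer` are hypothetical names.

```python
from dataclasses import dataclass


# Hypothetical contract model and thresholds; not AppLePilot's actual rules.
@dataclass
class PerformanceContract:
    expected_sec_per_iter: float   # what the bound resources are expected to deliver
    tolerance: float = 0.2         # fractional slowdown accepted without concern


def violation_degree(contract, measured_sec_per_iter):
    """Fuzzy degree in [0, 1] to which the contract is being violated.

    0 while the slowdown is within tolerance, rising linearly to 1
    when the slowdown reaches twice the tolerance."""
    ratio = measured_sec_per_iter / contract.expected_sec_per_iter
    excess = ratio - (1.0 + contract.tolerance)
    return max(0.0, min(1.0, excess / contract.tolerance))


def steer(contract, measured_sec_per_iter, threshold=0.5):
    degree = violation_degree(contract, measured_sec_per_iter)
    if degree == 0.0:
        return "continue"
    return "renegotiate" if degree >= threshold else "watch"


contract = PerformanceContract(expected_sec_per_iter=10.0)
for measured in (11.0, 12.5, 15.0):
    print(measured, steer(contract, measured))
```

Grading the violation, rather than flagging a hard pass/fail, lets the steering logic tolerate noisy measurements and react only when evidence of a broken contract accumulates.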
Next Time
• Talk by Marc Snir, Architect of IBM’s Blue Gene
– Tuesday June 6, AP&M 4301 1:00-2:00
• Abstract: IBM Research announced in December a 5-year, $100M research project aimed at developing a petaop computer and using it for research in computational biology. The talk will discuss the architectural choices involved in the design of a petaop computer, and will present the design point pursued by the Blue Gene project. We shall discuss the mapping of molecular dynamics computations onto the Blue Gene architecture and outline research problems in Computer Science and Computational Biology that such a project motivates.