Virtualization in MetaSystems Vaidy Sunderam Emory University, Atlanta, USA [email protected].
Credits and Acknowledgements
Distributed Computing Laboratory, Emory University
Dawid Kurzyniec, Piotr Wendykier, David DeWolfs, Dirk Gorissen, Maciej Malawski, Vaidy Sunderam
Collaborators: Oak Ridge Labs (A. Geist, C. Engelmann, J. Kohl); Univ. Tennessee (J. Dongarra, G. Fagg, E. Gabriel)
Sponsors: U.S. Department of Energy; National Science Foundation; Emory University
Virtualization
• Fundamental and universal concept in CS, but receiving renewed, explicit recognition
• Machine level
  • Single OS image: Virtuozzo, Vservers, Zones
  • Full virtualization: VMware, VirtualPC, QEMU
  • Para-virtualization: UML, Xen (Ian Pratt et al., cl.cam.uk)
  • “Consolidate under-utilized resources, avoid downtime, load-balancing, enforce security policy”
• Parallel distributed computing
  • Software systems: PVM, MPICH, grid toolkits and systems
  • Consolidate under-utilized resources, avoid downtime, load-balancing, enforce security policy + aggregate resources
Virtualization in PVM
Historical perspective – PVM 1.0, 1989
Key PVM Abstractions
• Programming model
  • Timeshared, multiprogrammed virtual machine
  • Two-level process space: functional name + ordinal number
  • Flat, open, reliable messaging substrate
  • Heterogeneous messages and data representation
• Multiprocessor emulation
  • Processor/process decoupling
  • Dynamic addition/deletion of processors
  • Raw nodes projected transparently, or with exposure of heterogeneous attributes
Parallel Distributed Computing
• Multiprocessor systems
  • Parallel distributed memory computing
  • Stable and mainstream: SPMD, MPI
  • Issues relatively clear: performance
• Platforms
• Applications: correspondingly tightly coupled
Parallel Distributed Computing
• Metacomputing and grids
  • Platforms
  • Parallelism: possibly within components, but mostly loose concurrency or pipelining between components (PVM: 2-level model)
• Grids: resource virtualization across multiple admin domains
  • Moved to explicit focus on service orientation
  • “Wrap applications as services, compose applications into workflows”; deploy on service oriented infrastructure
  • Motivation: service/resource coupling
  • Provider provides resource and service; virtualized access
Virtualization in PDC
• What can/should be virtualized?
  • Raw resource
    • CPU: process/task instantiation => staging, security, etc.
    • Storage: e.g. network file system over GMail
    • Data: value added or processed
  • Service
    • Define interface and input-output behavior
    • Service provider must operate the service
  • Communication
    • Interaction paradigm with strong/adequate semantics
• Key capability: configurable/reconfigurable resource, service, and communication
The Harness II Project
• Theme: virtualized abstractions for critical aspects of parallel distributed computing, implemented as pluggable modules (including programming systems)
• Major project components
  • Fault-tolerant MPI: specification, libraries
  • Container/component infrastructure: C-kernel, H2O
  • Communication framework: RMIX
  • Programming systems: FT-MPI + H2O, MOCCA (CCA + H2O), PVM
[Figure: Harness II architecture. Labels: applications (App 1, App 2); programming model (FT-MPI, PVM, Comp. active objects, ...); virtual layer with DVM-enabling components; Provider A, Provider B, Provider C; cooperating users.]
Aggregation for Concurrent High Performance Computing
• Hosting layer
  • Collection of H2O kernels
  • Flexible/lightweight middleware
• Equivalent to Distributed Virtual Machine
  • But only on client side
• DVM pluglets responsible for
  • (Co) allocation/brokering
  • Naming/discovery
  • Failures/migration/persistence
• Programming environments: FT-MPI, CCA, paradigm frameworks, distributed numerical libraries
H2O Middleware Abstraction
• Providers own resources
• Independently make them available over the network
• Clients discover, locate, and utilize resources
• Resource sharing occurs between single provider and single client
  • Relationships may be tailored as appropriate, including identity formats, resource allocation, compensation agreements
• Clients can themselves be providers
  • Cascading pairwise relationships may be formed
[Figure: providers and clients connected through the network.]
H2O Framework
• Resources provided as services
  • Service = active software component exposing functionality of the resource
  • May represent “added value”
  • Run within a provider’s container (execution context)
  • May be deployed by any authorized party: provider, client, or third-party reseller
• Provider specifies policies
  • Authentication/authorization
  • Actors kernel/pluglet
• Decoupling
  • Providers/providers/clients
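[Figure: traditional model vs. H2O model. Traditional model: the provider creates services A and B in the container on the provider host; the client looks them up and uses them. H2O model: services A and B are deployed into the container on the provider host by the provider, a client, or a reseller; the client looks them up and uses them.]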
Example usage scenarios
[Figures: three usage scenarios. Labels: provider, client, reseller, developer, repository, native code, legacy application, components A, B, C, deploy. Registration and discovery (publish/find) through e-mail, phone, ..., JNDI, UDDI, LDAP, DNS, GIS, ...]
• Resource = computational service
  • Reseller deploys software component into provider’s container
  • Reseller notifies the client about the offered computational service
  • Client utilizes the service
• Resource = raw CPU power
  • Client gathers application components
  • Client deploys components into providers’ containers
  • Client executes distributed application utilizing providers’ CPU power
• Resource = legacy application
  • Provider deploys the service
  • Provider stores the information about the service in a registry
  • Client discovers the service
  • Client accesses legacy application through the service
Model and Implementation
• H2O nomenclature: container = kernel, component = pluglet
• Object-oriented model, Java and C-based implementations
• Pluglet = remotely accessible object
  • Must implement Pluglet interface, may implement Suspendible interface
  • Used by kernel to signal/trigger pluglet state changes
• Model: implement (or wrap) service as a pluglet to be deployed on kernel(s)
[Figure: clients reach pluglets hosted in the kernel through their functional interfaces (e.g. StockQuote); a pluglet may optionally implement Suspendible.]
interface Pluglet {
    void init(ExecutionContext cxt);
    void start();
    void stop();
    void destroy();
}
interface Suspendible {
    void suspend();
    void resume();
}
interface StockQuote {
    double getStockQuote();
}
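A minimal sketch of wrapping a service as a pluglet, assuming only the three interfaces shown on the slide. ExecutionContext is left as an empty placeholder because the slide does not show its contents; FixedQuotePluglet and ToyKernel are hypothetical names used for illustration and are not part of the H2O API.

```java
import java.util.ArrayList;
import java.util.List;

// Placeholder: the slide names ExecutionContext but does not show its members.
interface ExecutionContext {}

// Lifecycle interfaces as given on the slide.
interface Pluglet {
    void init(ExecutionContext cxt);
    void start();
    void stop();
    void destroy();
}

interface Suspendible {
    void suspend();
    void resume();
}

// Functional interface exposed to clients.
interface StockQuote {
    double getStockQuote();
}

// Hypothetical pluglet: serves a fixed quote and records lifecycle calls.
class FixedQuotePluglet implements Pluglet, Suspendible, StockQuote {
    final List<String> events = new ArrayList<>();
    private boolean running;

    public void init(ExecutionContext cxt) { events.add("init"); }
    public void start() { running = true; events.add("start"); }
    public void stop() { running = false; events.add("stop"); }
    public void destroy() { events.add("destroy"); }
    public void suspend() { running = false; events.add("suspend"); }
    public void resume() { running = true; events.add("resume"); }

    public double getStockQuote() {
        if (!running) throw new IllegalStateException("pluglet is not running");
        return 42.0; // fixed illustrative value
    }
}

// Hypothetical stand-in for a kernel: it drives the pluglet state changes.
class ToyKernel {
    static <P extends Pluglet> P deploy(P pluglet) {
        pluglet.init(new ExecutionContext() {});
        pluglet.start();
        return pluglet;
    }

    static void undeploy(Pluglet pluglet) {
        pluglet.stop();
        pluglet.destroy();
    }
}

class PlugletDemo {
    public static void main(String[] args) {
        FixedQuotePluglet p = ToyKernel.deploy(new FixedQuotePluglet());
        System.out.println(p.getStockQuote());
        ToyKernel.undeploy(p);
        System.out.println(p.events);
    }
}
```

The point of the sketch is the split of roles: the kernel calls only the lifecycle methods, while clients see only the functional interface.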
Accessing Virtualized Services
• Request-response ideally suited, but
  • Stateful service access must be supported
  • Efficiency issues, concurrent access
  • Asynchronous access for compute intensive service
  • Semantics of cancellation and error handling
  • Many approaches focus on performance alone and ignore semantic issues
• Solution
  • Enhanced procedure call/method invocation
  • Well understood paradigm, extend to be more appropriate to access metacomputing services
The RMIX layer
• H2O built on top of RMIX communication substrate
  • Provides flexible p2p communication layer for H2O applications
• Enable various message layer protocols within a single, provider-based framework library
  • Adopting common RMI semantics
• Enable high performance and interoperability
  • Easy porting between protocols, dynamic protocol negotiation
• Offer flexible communication model, but retain RMI simplicity
  • Extended with: asynchronous and one-way calls
• Issues: consistency, ordering, exceptions, cancellation
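The difference between synchronous, asynchronous, and one-way calls can be illustrated with plain java.util.concurrent. This is a sketch only and does not use the RMIX API: Counter and AsyncCounterStub are hypothetical names, and a local single-threaded executor stands in for the remote endpoint.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical service interface with ordinary synchronous semantics.
interface Counter {
    int addAndGet(int delta);
}

// Hypothetical wrapper adding asynchronous and one-way variants of the same call.
class AsyncCounterStub {
    private final Counter target;
    private final ExecutorService pool;

    AsyncCounterStub(Counter target, ExecutorService pool) {
        this.target = target;
        this.pool = pool;
    }

    // Asynchronous call: returns immediately; the result or exception arrives via the future.
    CompletableFuture<Integer> addAndGetAsync(int delta) {
        return CompletableFuture.supplyAsync(() -> target.addAndGet(delta), pool);
    }

    // One-way call: no result and no exception is reported back to the caller.
    void addOneWay(int delta) {
        pool.execute(() -> target.addAndGet(delta));
    }
}

class AsyncDemo {
    public static void main(String[] args) {
        // A single worker thread makes calls execute in submission order.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        AtomicInteger state = new AtomicInteger();
        AsyncCounterStub stub = new AsyncCounterStub(state::addAndGet, pool);
        stub.addOneWay(5);
        System.out.println(stub.addAndGetAsync(2).join());
        pool.shutdown();
    }
}
```

Ordering is guaranteed here only because the executor has one thread; with a larger pool the two calls could run in either order, which is one of the issues the slide lists.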
[Figure: RPC clients, Web Services, and SOAP clients access pluglets A, B, C in a Java H2O kernel and pluglets D, E, F in a second H2O kernel; each kernel sits on RMIX over networking (RPC, IIOP, JRMP, SOAP, ...).]
RMIX Overview
• Extensible RMI framework
• Client and provider APIs
  • Uniform access to communication capabilities
  • Supplied by pluggable provider implementations
• Multiple protocols supported
  • JRMPX, ONC-RPC, SOAP
• Configurable and flexible
  • Protocol switching
  • Asynchronous invocation
[Figure: service access through RMIX. Labels: RMIX provider modules RMIX XSOAP, RMIX RPCX, RMIX Myri, RMIX JRMPX; peers: Web Services and SOAP clients, ONC-RPC, GM, Java.]
RMIX Abstractions
• Uniform interface and API
• Protocol switching
• Protocol negotiation
• Various protocol stacks for different situations
  • SOAP: interoperability
  • SSL: security
  • ARPC, custom (Myrinet, Quadrics): efficiency
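One simple way to picture protocol negotiation is for the client to state its protocols in preference order and take the first one the server also offers. The sketch below assumes that rule; the slides do not describe how RMIX actually negotiates, and ProtocolNegotiation is a hypothetical class. Protocol names are used as plain strings.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

class ProtocolNegotiation {
    // Returns the first protocol in the client's preference order that the server also offers.
    static Optional<String> negotiate(List<String> clientPreference, Set<String> serverOffers) {
        return clientPreference.stream().filter(serverOffers::contains).findFirst();
    }

    public static void main(String[] args) {
        // Efficiency first, interoperability as the fallback.
        List<String> prefs = List.of("ARPC", "JRMPX", "SOAP");
        System.out.println(negotiate(prefs, Set.of("SOAP", "JRMPX")));
        System.out.println(negotiate(prefs, Set.of("SSL")));
    }
}
```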
[Figure: H2O pluglets, each acting as client or server, hosted in Harness kernels and connected by different protocol stacks. Link labels: Internet, security, firewall, efficiency.]
Asynchronous access to virtualized remote resources
• Parameter marshalling
  • Data consistency
  • Also in PVM, MPI etc.
• Exceptions/cancellation
  • Critical for stateful servers
  • Conservative vs. best effort
• Other issues
  • Execution order
  • Security
• Virtualizing communications
  • Performance/familiarity vs. semantic issues
[Figure: sequence diagram with :stub and :param. Labels: create(), asyncCall(), modify(), read().]
Asynchronous RMIX
[Figure: cancellation at various stages of the call, between client :stub and server :target, with “started” and “completed” markers. Call stages: call initiation, parameter marshalling, parameter unmarshalling, method call, result marshalling, result unmarshalling, result delivery. Cancellation actions: disregard at client side, interrupt client I/O, disregard at server side, interrupt server thread, interrupt server I/O, ignore result, reset server state.]
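As an illustration of one of these stages, the sketch below cancels a call while the method is executing and interrupts the worker thread, which corresponds to a best-effort "interrupt server thread" action. It uses only java.util.concurrent and is not RMIX code; CancellationDemo is a hypothetical name and a local executor stands in for the server.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class CancellationDemo {
    // Starts a long-running "server-side" method, cancels it mid-call, and
    // reports whether the worker thread observed the interrupt.
    static boolean cancelDuringMethodCall() {
        ExecutorService server = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch finished = new CountDownLatch(1);
        AtomicBoolean interrupted = new AtomicBoolean(false);

        Future<?> call = server.submit(() -> {
            started.countDown();
            try {
                Thread.sleep(60_000); // stands in for a compute-intensive method
            } catch (InterruptedException e) {
                interrupted.set(true); // a stateful server would reset its state here
            } finally {
                finished.countDown();
            }
        });

        try {
            started.await();   // the call has reached the "method call" stage
            call.cancel(true); // best effort: interrupt the server thread
            boolean done = finished.await(5, TimeUnit.SECONDS);
            return done && interrupted.get() && call.isCancelled();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            server.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(cancelDuringMethodCall());
    }
}
```

Interruption only works if the server method cooperates by checking for it, which is why the slide distinguishes conservative from best-effort cancellation.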
Programming Models: CCA and H2O
• Common Component Architecture
  • Component standard for HPC
  • Uses and provides ports described in SIDL
  • Support for scientific data types
  • Existing tightly coupled (CCAFFEINE) and loosely coupled, distributed (XCAT) frameworks
• H2O
  • Well matched to CCA model
[Figure: traditional model vs. proposed model of service deployment (provider host, container, services A and B, deploy by provider, client, or reseller, lookup and use by client), repeated from the H2O Framework slide.]
MOCCA implementation in H2O
[Figure: MOCCA in H2O. Labels: H2O kernel, CCA component inside a component pluglet, builder pluglet providing the builder service, MoccaMainBuilder, invoke, manage.]
Each component running in separate pluglet
Thanks to H2O kernel security mechanisms, multiple components may run without interfering
Two-level builder hierarchy
ComponentID: pluglet URI
MOCCA_Light: pure Java implementation (no SIDL)
Performance: Small Data Packets
Factors:
• SOAP header overhead in XCAT
• Connection pools in RMIX
Large Data Packets
• Encoding (binary vs. base64)
• CPU saturation on Gigabit LAN (serialization)
• Variance caused by Java garbage collection
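The encoding factor can be made concrete: Base64 represents every 3 payload bytes as 4 characters, so a text-encoded message carries roughly one third more bytes than the binary form. The sketch below shows this general property with java.util.Base64; it is not a measurement from the slides, and EncodingOverhead is a hypothetical name.

```java
import java.util.Base64;

class EncodingOverhead {
    // Length of the Base64 text (with padding) for a payload of the given size in bytes.
    static int base64Length(int payloadBytes) {
        return Base64.getEncoder().encode(new byte[payloadBytes]).length;
    }

    public static void main(String[] args) {
        int payload = 3_000_000;
        int encoded = base64Length(payload);
        System.out.printf("binary: %d bytes, base64: %d bytes (x%.2f)%n",
                payload, encoded, (double) encoded / payload);
    }
}
```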
Use Case 2: H2O + FT-MPI
• Overall scheme
  • H2O framework installed on computational nodes, or cluster front-ends
  • Pluglet for startup, event notification, node discovery
  • FT-MPI native communication (also MPICH)
• Major value added
  • FT-MPI need not be installed anywhere on computing nodes
    • To be staged just-in-time before program execution
  • Likewise, application binaries and data need not be present on computing nodes
  • The system must be able to stage them in a secure manner
Staging FT-MPI runtime with H2O
• FT-MPI runtime library and daemons
  • Staged from a repository (e.g. Web server) to the computational node upon user’s request
  • Automatic platform type detection; appropriate binary files are downloaded from the repository as needed
• Allows users to run fault tolerant MPI programs on machines where FT-MPI is not pre-installed
• No login account is needed to do so: H2O credentials are used instead
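[Figure: the user deploys the startup pluglet into the provider’s kernel on a host; startup_d and libftmpi.so are staged from the FT-MPI binary repository, which holds StartupPluglet.class and per-platform directories (LINUX/ startup_d libftmpi.so, SUN4SOL2/ startup_d libftmpi.so, ...).]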
Launching FT-MPI applications with H2O
• Staging applications from a network repository
  • Uses URL code base to refer to a remotely stored application
  • Platform-specific binary transparently uploaded to a computational node upon client request
• Separation of roles
  • Application developer bundles the application and puts it into a repository
  • The end-user launches the application, unaware of heterogeneity
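A sketch of how a platform-specific binary might be located under a code base, assuming the repository layout of one directory per platform (LINUX/, SUN4SOL2/, ...). StagingResolver is a hypothetical class, and the rules mapping Java system properties to the two directory names are assumptions; the slides do not say how the platform type is detected.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Optional;

class StagingResolver {
    // Maps JVM system properties to a repository directory name.
    // Only the two platform names shown on the slides are covered; the rules are assumptions.
    static Optional<String> platformTag(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);
        if (os.contains("linux")) return Optional.of("LINUX");
        if (os.contains("sunos") && arch.contains("sparc")) return Optional.of("SUN4SOL2");
        return Optional.empty();
    }

    // Builds <codebase>/<platform>/<binary>.
    static URI binaryLocation(String codebase, String platform, String binary) {
        String base = codebase.endsWith("/") ? codebase : codebase + "/";
        return URI.create(base).resolve(platform + "/" + binary);
    }

    public static void main(String[] args) {
        Optional<String> tag = platformTag(System.getProperty("os.name"), System.getProperty("os.arch"));
        System.out.println(tag
                .map(t -> binaryLocation("http://myorg.edu/mpiapps/", t, "myapp1").toString())
                .orElse("unsupported platform"));
    }
}
```

Because the end user supplies only the code base and the application name, the platform directory is chosen on each node, which is what keeps the user unaware of heterogeneity.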
[Figure: application repository at http://myorg.edu/mpiapps/ with per-platform directories (LINUX/ myapp1 myapp2, SUN4SOL2/ myapp1 myapp2, ...). The user issues the command below; startup pluglets in the providers’ kernels (cluster 1, cluster 2, ..., cluster n) stage and run the application, forming a Distributed Virtual Machine.]
ftmpirun -np 512 -codebase "http://myorg.edu/mpiapps/" myapp1
Interconnecting heterogeneous clusters
• Private, non-routable networks
  • Communication proxies on cluster front-ends route data streams
  • Local (intra-cluster) channels not affected
  • Nodes use virtual addresses at the IP level; resolved by the proxy
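The routing rule can be sketched as a lookup table kept by each proxy: a virtual address that belongs to the local cluster resolves straight to the node's private address, while any other virtual address is forwarded through the peer cluster's proxy. ClusterProxy and all addresses below are hypothetical; the slides do not describe the actual data structures.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ClusterProxy {
    private final String clusterName;
    private final Map<String, String> localNodes = new HashMap<>();  // virtual -> private address
    private final Map<String, ClusterProxy> peers = new HashMap<>(); // virtual -> owning cluster's proxy

    ClusterProxy(String clusterName) { this.clusterName = clusterName; }

    void registerLocal(String virtualAddr, String privateAddr) { localNodes.put(virtualAddr, privateAddr); }

    void registerRemote(String virtualAddr, ClusterProxy owner) { peers.put(virtualAddr, owner); }

    // Returns the hops a data stream takes from a node in this cluster to the virtual address.
    List<String> route(String virtualAddr) {
        String local = localNodes.get(virtualAddr);
        if (local != null) return List.of(local); // intra-cluster: direct channel, no proxy involved
        ClusterProxy peer = peers.get(virtualAddr);
        if (peer == null || !peer.localNodes.containsKey(virtualAddr)) {
            throw new IllegalArgumentException("unknown virtual address " + virtualAddr);
        }
        return List.of("proxy@" + clusterName, "proxy@" + peer.clusterName, peer.localNodes.get(virtualAddr));
    }

    public static void main(String[] args) {
        ClusterProxy c1 = new ClusterProxy("cluster1");
        ClusterProxy c2 = new ClusterProxy("cluster2");
        c1.registerLocal("10.254.1.1", "192.168.0.1");
        c2.registerLocal("10.254.2.1", "192.168.0.1"); // same private address, different cluster
        c1.registerRemote("10.254.2.1", c2);
        System.out.println(c1.route("10.254.1.1"));
        System.out.println(c1.route("10.254.2.1"));
    }
}
```

The example deliberately reuses the same private address in both clusters to show why a separate virtual address space is needed on non-routable networks.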
[Figure: Cluster 1 and Cluster 2, each with application processes and their startup_d daemons. Communication within a cluster is direct; communication across clusters passes through H2O proxies hosting startup pluglets.]
Initial experimental results
• Proxied connection versus direct connection
  • Standard FT-MPI throughput benchmark was used
  • Within a Gig-Ethernet cluster: proxies retain 65% of throughput
Summary
• Virtualization in PDC
  • Devising appropriate abstractions
  • Balance pragmatics and performance vs. model cleanness
• The Harness II Project
  • H2O kernel
    • Reconfigurability by clients/third-party resellers very valuable
  • RMIX communications framework
    • High level abstractions for control comms (native data comms)
  • Multiple programming model overlays
    • CCA, FT-MPI, PVM
    • Concurrent computing environments on demand