Advanced Computing and Information Systems laboratory
Virtual Appliances for Training and Education in FutureGrid
Renato Figueiredo, Arjun Prakash, David Wolinsky
ACIS Lab - University of Florida
Education and Training
Importance of experimental work in systems research
• Needs also to be addressed in education
• Complement to fundamental theory
FutureGrid: a testbed for experimentation and collaboration
• Education and training contributions:
  • Lower barrier to entry: pre-configured environments, zero-configuration technologies
  • Community/repository of hands-on executable environments: develop once, share and reuse
Goals and Approach
A flexible, extensible platform for hands-on, lab-oriented education on FutureGrid
Focus on usability: lower entry barrier
• Plug and play, open-source
• Seamlessly work on local, cloud resources
Virtualization + social networking to create educational sandboxes
• Virtual “Grid” appliances: self-contained, pre-packaged execution environments
• Group VPNs: simple management of virtual clusters by students and educators
Outline
Virtual appliances
GroupVPN
Virtual cluster configuration
Example: MPI appliance
What is an appliance?
Physical appliances
• Webster: “an instrument or device designed for a particular use or function”
What is an appliance?
Hardware/software appliances
• TV receiver + computer + hard disk + Linux + user interface
• Computer + network interfaces + FreeBSD + user interface
What is a virtual appliance?
An appliance that packages software and configuration needed for a particular purpose into a virtual machine “image”
The virtual appliance has no hardware – just software and configuration
The image is a (big) file
It can be instantiated on hardware
Virtual appliance example
[Figure: a LAMP image (Linux + Apache + MySQL + PHP) is copied and instantiated on a virtualization layer to create a Web server; repeating the copy/instantiate step creates another Web server.]
Training/education in clusters
Replace LAMP with the middleware of your choice, e.g. MPI, Hadoop, Condor
[Figure: an MPI image is copied and instantiated on a virtualization layer to create an MPI worker; repeating the copy/instantiate step creates another MPI worker.]
What about the network?
Multiple Web servers might be completely independent from each other
MPI nodes are not
• Need to communicate and coordinate with each other
• Each worker needs an IP address, uses TCP/IP sockets
Cluster middleware stacks assume a collection of machines, typically on a LAN (Local Area Network)
Enter virtual networks
[Figure: two cluster models side by side.]
NOWs, COWs
• Local-area
• Physical machines, deployed from an installation image on a switched network
• Self-organizing switching (e.g. Ethernet spanning tree)
“WOWs”
• Wide-area
• Virtual machines (VMs), deployed from a VM image
• Self-organizing overlay: IP tunnels, P2P routing
Virtual cluster appliances
Virtual appliance + virtual network
[Figure: an MPI + virtual network image is copied and instantiated as a virtual machine to create an MPI worker; repeating the step creates another MPI worker, and the virtual network connects the workers.]
Virtual network architecture
[Figure: two applications, each attached to a virtual NIC (VNIC) and a virtual router, communicate across a (wide-area) overlay network.]
• Isolated, private virtual address space (e.g. 10.10.1.1, 10.10.1.2)
• Unmodified applications: Connect(10.10.1.2,80)
• Virtual routers: capture/tunnel; scalable, resilient, self-configuring routing and object store
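The "unmodified applications" point can be illustrated with an ordinary TCP client. The following is a minimal Python sketch, not code from the Grid appliance: the client uses only standard sockets, so it would work the same way with a virtual address such as 10.10.1.2. The loopback address, the function names, and the upper-casing echo server are assumptions made so the example runs on a machine that is not attached to the overlay.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Accept one connection and send back what arrives, upper-cased."""
    conn, _ = listener.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


def request(host: str, port: int, payload: bytes) -> bytes:
    """Plain TCP client: nothing here is specific to a virtual network."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
        return sock.recv(1024)


def demo() -> bytes:
    # Assumption: loopback stands in for a virtual address like 10.10.1.2.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_once, args=(listener,))
    server.start()
    try:
        return request("127.0.0.1", port, b"hello worker")
    finally:
        server.join()
        listener.close()


if __name__ == "__main__":
    print(demo())
```

On a node attached to the virtual network, only the host argument passed to request would change; the application code itself stays the same.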
Virtual appliance clusters
Virtual appliances
• Encapsulate software environment in image
• Virtual disk file(s) and virtual hardware configuration
The Grid appliance
• Encapsulates cluster software environments
• Current examples: Condor, MPI, Hadoop
• Homogeneous images at each node
• Virtual LAN connecting nodes forms a cluster
• Deploy within or across domains
Grid appliance internals
Host O/S
• Linux
Grid/cloud stack
• MPI, Hadoop, Condor, …
Glue logic for zero-configuration
• Automatic DHCP address assignment
• Multicast DNS (Bonjour, Avahi) resource discovery
• Shared data store - Distributed Hash Table
• Interaction with VM/cloud
Example - FutureGrid
[Figure: an appliance image for education and training is deployed on FutureGrid through Nimbus and Eucalyptus.]
Example - Archer
1. Download appliance
• Free pre-packaged Archer virtual appliances; run on free VMMs (VMware, VirtualBox, KVM)
2. Create/join VPN group, download config
3. Boot appliances
• Automatic connection to GroupVPN; self-configuring DHCP
Archer Global Virtual Network
• Archer seed resources: 450 cores, 5 sites
• Middleware: Condor scheduler, NFS file systems
• CMS, Wiki, YouTube: community-contributed content (applications, datasets, tutorials)
One appliance, multiple hosts
Allow same logical cluster environment to instantiate on a variety of platforms
• Local desktop, clusters; FutureGrid; Amazon EC2; Science Clouds…
Avoid dependence on host environment
• Make minimum assumptions about VM and provisioning software
  • Desktop: 1 image, VMware, VirtualBox, KVM
  • Para-virtualized VMs (e.g. Xen) and cloud stacks: need to deal with idiosyncrasies
• Minimum assumptions about networking
  • Private, NATed Ethernet virtual network interface
Virtual network: GroupVPN
Key techniques:
• IP-over-P2P (IPOP) tunneling
• GroupVPN Web 2.0/social network interface
Self-configuring
• Avoid administrative overhead of typical VPNs
• NAT and firewall traversal; DHCP virtual addresses
Scalable and robust
• P2P routing deals with node joins and leaves
Networks are isolated
• One or more private IP address spaces
• Decentralized DHCP serves addresses for each space
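One way to picture address assignment without a central DHCP server is a claim-if-free loop over a shared store. The Python sketch below is an illustration under stated assumptions, not GroupVPN's actual implementation: SharedStore is a hypothetical in-memory stand-in for a shared data store, and the random-pick-and-retry policy, the namespace labels, and the function names are all invented for the example.

```python
import ipaddress
import random


class SharedStore:
    """Hypothetical in-memory stand-in for a shared key/value store."""

    def __init__(self):
        self._data = {}

    def create(self, key, value):
        """Store value only if key is unused; return True on success."""
        if key in self._data:
            return False
        self._data[key] = value
        return True


def claim_address(store, namespace, network, node_id, rng, max_tries=100):
    """Pick random host addresses until one is claimed without conflict."""
    net = ipaddress.ip_network(network)
    first = int(net.network_address) + 1    # skip the network address
    last = int(net.broadcast_address) - 1   # skip the broadcast address
    for _ in range(max_tries):
        candidate = ipaddress.ip_address(rng.randint(first, last))
        # Keys include the namespace, so each private space is independent.
        if store.create((namespace, str(candidate)), node_id):
            return candidate
    raise RuntimeError("no free address found")


if __name__ == "__main__":
    store = SharedStore()
    rng = random.Random(0)
    for i in range(3):
        print(claim_address(store, "classA", "10.10.1.0/24", f"node{i}", rng))
```

Because the key includes the namespace, two groups can hand out the same 10.10.1.x address without conflict, which mirrors the "one or more private IP address spaces" point.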
GroupVPN Overview
[Figure: Alice, Bob, and Carol use a social network Web interface (e.g. XMPP, group site). Through the social network API, a messaging layer/information system holds Alice’s, Bob’s, and Carol’s public keys. Their nodes (node0.ipop at 10.10.0.2, node1.ipop at 10.10.0.3, node2.ipop) join the IPOP overlay network.]
Bootstrapping private links through Web 2.0 interfaces and IP-over-P2P overlay tunneling
GroupVPN Web interface
Users can request to join a group, or create their own VPN group
• E.g. instructor creates a GroupVPN for class
• Determines who is allowed to connect to virtual network
Owner can authorize users to join, remove users, authorize others to administer the group
• Actions typical of a certificate authority happen in the back-end without user having to deal with security operations
• E.g. sign/revoke a VPN X.509 certificate
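The group workflow on this slide (request, authorize, remove, delegate administration) can be modeled in a few lines. This Python sketch is a toy model under stated assumptions: the class and method names are hypothetical, and plain set membership stands in for the X.509 certificates that the real back-end signs and revokes.

```python
class VpnGroup:
    """Toy model of a VPN group; set membership stands in for certificates."""

    def __init__(self, owner):
        self.owner = owner
        self.admins = {owner}
        self.members = {owner}
        self.pending = set()

    def request_join(self, user):
        """A user asks to join; nothing is granted until an admin approves."""
        if user not in self.members:
            self.pending.add(user)

    def _require_admin(self, actor):
        if actor not in self.admins:
            raise PermissionError(f"{actor} may not administer this group")

    def authorize(self, actor, user):
        """Approve a pending request (stands in for signing a certificate)."""
        self._require_admin(actor)
        if user not in self.pending:
            raise ValueError(f"{user} has no pending request")
        self.pending.remove(user)
        self.members.add(user)

    def remove(self, actor, user):
        """Remove a member (stands in for revoking a certificate)."""
        self._require_admin(actor)
        if user == self.owner:
            raise ValueError("the owner cannot be removed")
        self.members.discard(user)
        self.admins.discard(user)

    def grant_admin(self, actor, user):
        """Let an existing member administer the group."""
        self._require_admin(actor)
        if user not in self.members:
            raise ValueError(f"{user} is not a member")
        self.admins.add(user)

    def may_connect(self, user):
        """Only current members may connect to the virtual network."""
        return user in self.members


if __name__ == "__main__":
    group = VpnGroup("instructor")
    group.request_join("alice")
    group.authorize("instructor", "alice")
    print(group.may_connect("alice"))
```

In the class scenario from the slide, the instructor would be the owner and each student would go through request_join followed by authorize.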
GroupVPN architecture
[Figure: two hosts, each running grid/cloud middleware and applications over a VNIC (“tap” device, e.g. 10.10.1.1 and 10.10.1.2) and a GroupVPN virtual router; the routers are connected through the GroupVPN overlay.]
Under the hood: overlay architecture
Bi-directional structured overlay (Brunet library)
Self-configured NAT traversal
Self-optimized links
• Direct, relay
Self-healing structure
[Figure: overlay routers connected by a multi-hop path through the overlay and by a direct path.]
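The difference between a multi-hop path and a direct path can be sketched with greedy routing on a ring, a common model for structured overlays. The Python code below is a simplified illustration, not the Brunet library: the 256-address ring, the node addresses, and all function names are assumptions. Each node forwards to whichever of its links is closest to the destination, and adding a direct link shortens the path.

```python
RING = 256  # size of the toy address space (assumption)


def ring_distance(a, b):
    """Shortest distance between two addresses, going either way round."""
    d = abs(a - b) % RING
    return min(d, RING - d)


def build_ring(addresses):
    """Connect every node to its two nearest neighbours on the ring."""
    ordered = sorted(addresses)
    links = {a: set() for a in ordered}
    for i, a in enumerate(ordered):
        nxt = ordered[(i + 1) % len(ordered)]
        links[a].add(nxt)
        links[nxt].add(a)  # links are bi-directional
    return links


def add_direct_link(links, a, b):
    """Add a shortcut between two nodes, as a self-optimizing overlay might."""
    links[a].add(b)
    links[b].add(a)


def route(links, src, dst):
    """Greedy routing: always forward to the link closest to the destination."""
    path = [src]
    current = src
    while current != dst:
        nxt = min(links[current], key=lambda n: ring_distance(n, dst))
        if ring_distance(nxt, dst) >= ring_distance(current, dst):
            raise RuntimeError("routing made no progress")
        path.append(nxt)
        current = nxt
    return path


if __name__ == "__main__":
    links = build_ring([10, 60, 110, 160, 210])
    print(route(links, 10, 110))   # multi-hop path through node 60
    add_direct_link(links, 10, 110)
    print(route(links, 10, 110))   # direct path
```

With the five example nodes, the first route goes 10, 60, 110; after the shortcut is added the same call returns 10, 110.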
Deploying virtual clusters
Same image, per-group VPNs
[Figure: a Hadoop + virtual network image is copied and instantiated as virtual machines, each given GroupVPN credentials (from the Web site). One Hadoop worker obtains virtual IP 10.10.1.1 through DHCP, another obtains 10.10.1.2, and the GroupVPN connects them.]
Configuration framework
At the end of GroupVPN initialization:
• Each node of a private virtual cluster gets a DHCP address on virtual tap interface
• A barebones cluster
• Additional configuration required depending on middleware
  • Which node is the Condor negotiator? Hadoop front-end? Which nodes are in the MPI ring?
Key frameworks used:
• IP multicast discovery over GroupVPN
  • Front-end queries for all IPs listening in GroupVPN
• Distributed hash table
  • Advertise (put key,value), discover (get key)
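The advertise/discover pattern can be sketched with a put/get interface. This Python example is a minimal sketch, assuming a hypothetical ToyTable class that keeps values in local memory; the real system uses a distributed hash table across the overlay, and the key names such as "mpi/worker" are invented for illustration.

```python
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory stand-in for a distributed hash table."""

    def __init__(self):
        self._values = defaultdict(list)

    def put(self, key, value):
        """Add value under key; a key may hold several values."""
        if value not in self._values[key]:
            self._values[key].append(value)

    def get(self, key):
        """Return a copy of all values stored under key (possibly empty)."""
        return list(self._values.get(key, []))


def advertise(table, role, ip):
    """A node announces that it plays a role at a given virtual IP."""
    table.put(role, ip)


def discover(table, role):
    """Any node can look up which virtual IPs play a role."""
    return table.get(role)


if __name__ == "__main__":
    table = ToyTable()
    advertise(table, "condor/negotiator", "10.10.1.1")
    advertise(table, "mpi/worker", "10.10.1.2")
    advertise(table, "mpi/worker", "10.10.1.3")
    print(discover(table, "condor/negotiator"))
    print(discover(table, "mpi/worker"))
```

A front-end could turn the result of discover(table, "mpi/worker") into a host list for the middleware, which answers questions such as "which nodes are in the MPI ring?" without manual configuration.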
Configuring and deploying groups
Generate virtual floppies
• Through GroupVPN Web interface
Deploy appliance image(s)
• FutureGrid (Nimbus/Eucalyptus), EC2
• GUI or command line tools
• Use APIs to copy virtual floppy to image
Submit jobs; terminate VMs when done
FutureGrid example - Nimbus
Example using Nimbus:
workspace.sh --deploy \
  --mdUserdata /tmp/floppy-worker.zip.b64 \
  --service https://f1r.idp.ufl.futuregrid.org:8443/wsrf/services/WorkspaceFactoryService \
  --file /tmp/output.xml \
  --metadata /tmp/grid-appliance.xml \
  --deploy-mem 1000 \
  --deploy-duration 100 \
  --trash-at-shutdown Trash \
  --exit-state Running \
  --displayname grid-appliance \
  --sshfile /home/renato/.ssh/id_dsa.pub
• --mdUserdata: GroupVPN floppy image
• --service: Nimbus service endpoint
• --metadata: points to image on Nimbus server
• --sshfile: SSH public key to log in to instance
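The file name floppy-worker.zip.b64 suggests that the floppy is passed as a base64-encoded zip archive. The Python sketch below shows how such a payload could be produced and read back; it is an illustration under that assumption, not the Grid appliance's own tooling, and the file name groupvpn.config and its contents are hypothetical.

```python
import base64
import io
import zipfile


def build_userdata(files):
    """Zip a {name: bytes} mapping in memory and base64-encode the archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, content in sorted(files.items()):
            archive.writestr(name, content)
    return base64.b64encode(buffer.getvalue())


def read_userdata(encoded):
    """Decode the payload and return the {name: bytes} mapping it contains."""
    with zipfile.ZipFile(io.BytesIO(base64.b64decode(encoded))) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


if __name__ == "__main__":
    # Hypothetical floppy contents, for illustration only.
    floppy = {"groupvpn.config": b"group=class-demo\n"}
    payload = build_userdata(floppy)
    print(read_userdata(payload) == floppy)
```

Writing the result of build_userdata to a file would give something that could be passed where the command above uses /tmp/floppy-worker.zip.b64, provided the appliance expects this encoding.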
Summary
Hands-on experience with clusters is essential for education and training
Virtualization, clouds simplify software packaging/configuration
Grid appliance allows users to easily deploy hands-on virtual clusters
FutureGrid provides resources and cloud stacks for educators to easily deploy their own virtual clusters
Goal - towards a community-based marketplace of educational appliances for TeraGrid
Thank you!
More information:
• http://www.futuregrid.org
• http://grid-appliance.org
This document was developed with support from the National Science Foundation (NSF) under Grant No. 0910812 to Indiana University for "FutureGrid: An Experimental, High-Performance Grid Test-bed." Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.
Local appliance deployments
Two possibilities:
• Share our “bootstrap” infrastructure, but run a separate GroupVPN
  • Simplest to set up
• Deploy your own “bootstrap” infrastructure
  • More work to set up
  • Especially if across multiple LANs
  • Potential for faster connectivity
PlanetLab bootstrap
Shared virtual network bootstrap
• Runs 24/7 on 100s of machines on the public Internet
• Connect machines across multiple domains, behind NATs
PlanetLab bootstrap: approach
Create GroupVPN and GroupAppliance on the Grid appliance Web site
Download configuration floppy
Point users to the interface; allow users you trust into the group
Trusted users can download configuration floppies and boot up appliances
Private bootstrap: General approach
Good choice for single-domain pools
Create GroupVPN and GroupAppliance on the Grid appliance Web site
Deploy a small IPOP/GroupVPN bootstrap P2P pool
• Can be on a physical machine, or appliance
• Detailed instructions at grid-appliance.org
The remaining steps are the same as for the shared bootstrap
Connecting external resources
GroupVPN can run directly on a physical machine, if desired
• Provides a VPN network interface
• Useful for example if you already have a local Condor pool
  • Can “flock” to Archer
• Also allows you to install Archer stack directly on a physical machine if you wish
FutureGrid example - Eucalyptus
Example using Eucalyptus (or ec2-run-instances on Amazon EC2):
euca-run-instances ami-fd4aa494 -f floppy.zip --instance-type m1.large -k keypair
• ami-fd4aa494: image ID on Eucalyptus server
• -f floppy.zip: GroupVPN floppy image
• -k keypair: SSH public key to log in to instance
Where to go from here?
Tutorials on FutureGrid and Grid appliance Web sites for various middleware stacks
• Condor, MPI, Hadoop
A community resource for educational virtual appliances
• Success hinges on users effectively getting involved
• If you are happy with the system, let others know!
• Contribute with your own content: virtual appliance images, tutorials, etc.