Slide 1
Cluster-on-Demand (COD)
Justin Moore
Duke University
Slide 2
How Big Is It?
500? 5000? 25,000?
Clusters are growing
Clusters are expensive
– Power, A/C, management …
How to manage {heat, power, failures}?
How to keep everything organized?
How to divide resources?
Slide 3
How Do You Use It?
We’ve got good middleware
– Batch queues, Internet services, research apps …
But customers are very picky
– “Linux!” “FreeBSD!” “Windows!” “Minix!” “Minix??”
– “I only need it for 30 minutes!!”
Customers != administrators
– Contributing to the problem, not the solution
How to share and manage our clusters?
“Can’t we all just get along??”
Slide 4
COD: The More the Merrier
Automated framework for resource management
Owners define policies, customers define configs
COD creates, configures dynamic virtual clusters
– Isolated, secure collection of nodes
– Backed by network storage
– Automatic configuration: fast and OS-agnostic
Middleware negotiates allocations with COD
– Virtual Cluster Manager: COD-aware layer
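The split between owner policies and customer configs can be illustrated with a minimal Python sketch. Everything here is hypothetical: the class names, the fields, and the single "node cap" policy are assumptions made for illustration, not the actual COD interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterConfig:
    """Customer-supplied configuration (hypothetical fields)."""
    name: str
    os_image: str
    nodes_wanted: int


@dataclass
class OwnerPolicy:
    """Owner-supplied policy (hypothetical): a per-cluster node cap."""
    max_nodes_per_cluster: int


@dataclass
class CodManager:
    """Toy stand-in for the COD manager, not the real implementation."""
    policy: OwnerPolicy
    free_nodes: list
    clusters: dict = field(default_factory=dict)

    def negotiate(self, config: ClusterConfig) -> list:
        """Grant as many nodes as the policy and the free pool allow."""
        grant = min(config.nodes_wanted,
                    self.policy.max_nodes_per_cluster,
                    len(self.free_nodes))
        granted = [self.free_nodes.pop() for _ in range(grant)]
        # Record the nodes under the customer's virtual cluster.
        self.clusters.setdefault(config.name, []).extend(granted)
        return granted
```

In this sketch a request for four nodes under a cap of three is granted three, which is the sense in which middleware "negotiates" rather than demands an allocation.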
Slide 5
Dynamic Virtual Clusters
[Diagram labels: COD Manager, DB, reserve pool (off-power), SGE Virtual Cluster, Ninja Virtual Cluster, node reallocation]
Example: CNN on 9/11
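Node reallocation between virtual clusters and the reserve pool might look like the following Python sketch. The order used here (reserve pool first, then the largest other clusters, never draining a donor below one node) is an assumed policy chosen for illustration; the slides do not specify how COD picks donors, and the function name `reallocate` is hypothetical.

```python
def reallocate(clusters, reserve, target, needed):
    """Move up to `needed` nodes into the `target` virtual cluster.

    clusters: dict mapping cluster name -> list of node names
    reserve:  list of powered-off node names
    """
    moved = []
    # Assumed policy: power on reserve nodes first.
    while len(moved) < needed and reserve:
        moved.append(reserve.pop())
    # Then borrow from other clusters, largest first,
    # leaving each donor at least one node.
    donors = sorted((name for name in clusters if name != target),
                    key=lambda name: len(clusters[name]), reverse=True)
    for donor in donors:
        while len(moved) < needed and len(clusters[donor]) > 1:
            moved.append(clusters[donor].pop())
    clusters[target].extend(moved)
    return moved
```

For example, with `clusters = {"sge": [...4 nodes...], "ninja": [...1 node...]}` and a two-node reserve, asking for four more nodes for `"ninja"` empties the reserve and takes two nodes from `"sge"`.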
Slide 6
Those Wonderful Toys
Leverage open standards and open source
– DHCP, NFS, NIS, XML
– Only constraint is that Linux must support the hardware
– PXELinux-based installer, Red Hat/Debian tools
Currently testing a working COD prototype
– Core of policy-based scheduling engine: CSP-solver
– Framework of node requests + allocation negotiation
– OS- and filesystem-agnostic installer
– Testbed to examine policies and microbenchmarks
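To make the CSP-solver bullet concrete, here is a toy constraint-satisfaction sketch in Python. The formulation is an assumption, since the slides do not describe the real one: each node request is a variable whose domain is a node count between a minimum and a maximum, and the only global constraint is that the total stays within cluster capacity. The function name `solve` and the request tuple layout are hypothetical.

```python
def solve(requests, capacity):
    """Backtracking search over node counts.

    requests: list of (name, min_nodes, max_nodes) tuples
    Returns {name: nodes} meeting every bound with total <= capacity,
    or None if no such assignment exists.
    """
    def backtrack(i, remaining, assignment):
        if i == len(requests):
            return dict(assignment)
        name, lo, hi = requests[i]
        # Nodes that must be held back for the still-unassigned requests.
        reserved = sum(r[1] for r in requests[i + 1:])
        # Try the largest feasible grant first, then smaller ones.
        for n in range(min(hi, remaining - reserved), lo - 1, -1):
            assignment[name] = n
            result = backtrack(i + 1, remaining - n, assignment)
            if result is not None:
                return result
            del assignment[name]
        return None

    return backtrack(0, capacity, {})
```

With `solve([("sge", 2, 6), ("ninja", 3, 8)], capacity=8)` the search grants five nodes to `sge` and three to `ninja`; with `capacity=4` the minimums cannot both be met and it returns `None`. A real policy engine would presumably carry richer constraints (priorities, hardware types, time windows), which this sketch omits.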
Slide 7
COD: Size Doesn’t Matter
Enable management scalability for hosting centers
– Hierarchical policy-driven mechanisms
– Empower owners and customers
Details and paper at
http://www.cs.duke.edu/~justin/cod/
Slide 8
Questions?