Grid & performability Aad van Moorsel aadvanmoorsel.com.
page 2, April 2003, Copyright Aad van Moorsel, HP Labs
outline
to set the stage:
• what is grid?
• what is performability?
three perspectives on grid performability:
• ‘customer’ requirements
• system implementation
  – utility computing
• associated research challenges
  – focus on stochastic modeling
what is grid?
what is performability?
grid
for me, and in this talk:
• middleware layer, Globus-like
• shares resources
• crosses boundaries
  – administrative domains, user domains, enterprise domains, …
• software-implemented boundaries
  – flexibility in who uses what, when
  – flexibility in what is secured against whom, when
  – flexibility in who charges for what, when
  – …
• makes resources manageable
  – grades of QoS
  – dynamic management of QoS
  – service level agreements, business metrics and penalties
performability
for me, and in this talk:
• quality of service (QoS)
context:
• Meyer: metric P(T < t), where T is some random variable
• my thesis: meaningful quantitative evaluation of a system (definition 2 out of 3)
• others: performance and reliability
• SPN models for the system state, with rewards or queuing networks for the performance metric
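The "state model plus rewards" view of performability can be illustrated with a small Markov reward model. The Python sketch below assumes a hypothetical system of two servers that each fail at rate `lam` and share one repair crew working at rate `mu`; the reward of a state is its number of working servers. It solves πQ = 0 with Σπ_i = 1 and returns the steady-state expected reward Σ π_i r_i. The model, the rates and all function names are illustrative assumptions, not taken from the talk.

```python
def steady_state(Q):
    """Solve pi Q = 0, sum(pi) = 1 for a small irreducible CTMC generator Q (list of rows)."""
    n = len(Q)
    # Transpose Q so each balance equation is a row, then replace one
    # (redundant) balance equation by the normalization sum(pi) = 1.
    A = [[float(Q[j][i]) for j in range(n)] for i in range(n)]
    b = [0.0] * n
    A[-1] = [1.0] * n
    b[-1] = 1.0
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    pi = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(A[r][c] * pi[c] for c in range(r + 1, n))
        pi[r] = (b[r] - tail) / A[r][r]
    return pi


def two_server_model(lam, mu):
    """Hypothetical generator for states (2 up, 1 up, 0 up) with a single repair crew."""
    return [
        [-2 * lam, 2 * lam, 0.0],
        [mu, -(mu + lam), lam],
        [0.0, mu, -mu],
    ]


def expected_reward(pi, rewards):
    """Steady-state expected reward rate: sum_i pi_i * r_i."""
    return sum(p * r for p, r in zip(pi, rewards))


if __name__ == "__main__":
    pi = steady_state(two_server_model(lam=1.0, mu=1.0))
    print(pi)                              # approximately [0.2, 0.4, 0.4]
    print(expected_reward(pi, [2, 1, 0]))  # approximately 0.8 working servers
```

Swapping the reward vector (for example, throughput per state instead of working servers) changes the performability measure without touching the state model, which is the separation the slide describes.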
grid & performability
we accept the claim that grid is software that will facilitate flexible performability management
• the software design still leaves much to be desired
  – automation? autonomous? autonomic?
  – scaling? inter-business? security?
• but the applications will drive it in the right direction
  – utility computing
  – service-centric outsourcing
grid & performability
‘customer’ perspective
business costs of owning and operating IT have gone through the roof
business cost of IT failures
downtime costs per hour
brokerage operations              $6,450,000
credit card authorization         $2,600,000
e-bay (1 outage, 22 hours)          $225,000
amazon.com                          $180,000
package shipping services           $150,000
home shopping channel               $113,000
catalog sales center                 $90,000
airline reservation center           $89,000
cellular service activation          $41,000
on-line network fees                 $25,000
ATM service fees                     $14,000
source: Dave Patterson keynote at FAST ’02
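The hourly figures become a yearly exposure once an availability level is fixed. A minimal Python sketch, assuming a steady-state availability and a cost that grows linearly with outage duration (a simplification; real outage costs are rarely linear), with hypothetical function names:

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def annual_downtime_hours(availability):
    """Expected hours of downtime per year for a steady-state availability in [0, 1]."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must lie in [0, 1]")
    return (1.0 - availability) * HOURS_PER_YEAR


def annual_downtime_cost(availability, cost_per_hour):
    """Expected yearly cost, ASSUMING cost grows linearly with outage duration."""
    return annual_downtime_hours(availability) * cost_per_hour


if __name__ == "__main__":
    for a in (0.99, 0.999, 0.9999):
        cost = annual_downtime_cost(a, 6_450_000)  # brokerage operations rate from the table
        print(f"availability {a}: {annual_downtime_hours(a):.2f} h/year, ${cost:,.0f}/year")
```

For example, 99.9% availability leaves about 8.76 hours of downtime per year, which at the brokerage rate is roughly $56.5 million per year.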
survey of computer damages in France, 2000
courtesy of Lisa Spainhower, IBM
operational complexity: scale
operator faces heterogeneity
Business layer (content, logic, processes):
• place content closer to where it is needed (CDN)
• reengineer business process (BPR)
• select services for each activity in the process dynamically (dynamic composition)
Software layer (databases, app servers, web servers):
• databases: share a database vs. create a new database; re-index tables to optimize queries (database utility; ZLE, DBMS)
• app servers: number of app servers needed; start and stop new app servers (app server utility)
• web servers: number of web servers needed; load balance transactions across servers (web server utility, load balancing)
Hardware layer (servers, network, storage):
• servers: allocate machines to applications; replace a failed machine transparently by migrating its applications (UDC/QM/SF, VMs)
• network: reserve network bandwidth prior to use; QoS-based routing decisions (RSVP)
• storage: assign storage devices to workloads; configure buffer sizes in device drivers to maximize performance (storage management)
operator faces federation needs
(same business, software and hardware layers and management actions as on the previous slide)
customer needs
business-driven, automated operator tools for systems with increasing scale, heterogeneity and federation challenges
grid & performability
system perspective (utility computing)
twin UDCs in HP Labs
• built the first large utility data centers, in Palo Alto (US) and Bristol (UK)
  – learn what it takes to build a solution
  – move HPL IT services to the UDC
• the first virtualized data center
  – from servers, storage and networks to energy management
  – dynamically assigns applications to resources
  – customer sees resources as ‘utility’
  – operator sees resources as ‘utility’
utility computing from usage perspective
[figure: UDC1, UDC2 and a server cluster]
• reserving resources
• getting resources
• flexing resources
utility computing from operator perspective
• Utility Data Center = programmable pool of data center resources
• UDC GRAM = Globus Gatekeeper + UDC Adapter
[figure: grid interface, UDC GRAM instances, UDC/XML interface]
(prototype developed at HP Labs, initially on Globus Toolkit 2 (gtk2), currently migrated to Globus Toolkit 3 (gtk3))
configure properties
generate RSL
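For readers unfamiliar with RSL (the Globus Resource Specification Language accepted by GT2-era GRAM gatekeepers), the Python sketch below shows what a "generate RSL" step might produce from a set of configured properties. It assumes GT2-style syntax, a conjunction `&` of `(attribute=value)` relations, and uses common GRAM attribute names such as `executable`, `count` and `arguments`. The quoting rule (quote values containing special characters, double any embedded quotes) and the function names are assumptions for illustration; the properties the HP Labs UDC adapter actually exposed are not shown in the slides.

```python
# Characters that, by assumption, force a value to be quoted in GT2-style RSL.
_SPECIAL = set(' \t\n()=<>!&|"\'+$#^')


def quote_rsl(text):
    """Quote a literal if it is empty or has special characters; double embedded quotes."""
    if text and not any(ch in _SPECIAL for ch in text):
        return text
    return '"' + text.replace('"', '""') + '"'


def to_rsl(properties):
    """Render a mapping of job properties as a GT2-style RSL conjunction."""
    relations = []
    for attribute, value in properties.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        rendered = " ".join(quote_rsl(str(v)) for v in values)
        relations.append(f"({attribute}={rendered})")
    return "&" + "".join(relations)


if __name__ == "__main__":
    job = {
        "executable": "/bin/hostname",  # hypothetical job
        "count": 4,
        "arguments": ["-v", "my file"],
    }
    print(to_rsl(job))
    # prints: &(executable=/bin/hostname)(count=4)(arguments=-v "my file")
```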
utility computing for operators
utility computing has great potential to improve operations:
• better utilization of resources
• better tools for setting up applications
• new business models, better accountability
but the UDC is just one high-end solution
we need something that is open, extensible, uniform, …
a grid-based management backplane
utility computing grid middleware
• everything is a Grid service
• leverage Grid; HP value-add in management: OpenView orchestrates IT, OpenView command and control, SLA
• base Grid: uniform interface, single sign-on, federation, stateful services
• management backplane: monitoring, rich discovery, life-cycle, coordinated ‘act’, policy, biz-impact driven adaptation, flexible secure mgmt domains
more automation: flexing resources
objective: increase asset utilization via resource sharing while providing a desired quality of service for applications
approach: a statistical multiplexing technique for resource utilities that host business applications
characteristics of business applications:
• require resources continuously
• changes in the number of users and in the workload mix may result in:
  – time-varying demands
  – large peak-to-mean ratios for demand
  – future demands that are difficult to predict precisely
• customers want assurances that they will get resources when needed
  – for example, a resource request will be satisfied with probability p=0.999
  – i.e. 999 times out of 1000
  – customers don’t always need an assurance of p=1.0
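The gap between an assurance of p=1.0 and p=0.999 can be made concrete with a small Python sketch: from observed demand samples for one application it computes the peak, the mean, the peak-to-mean ratio, and the smallest number of CPUs c with P(demand ≤ c) ≥ p. The demand samples and function names are invented for illustration.

```python
from collections import Counter


def empirical_pmf(samples):
    """Observed probability mass function {cpus: probability} of demand samples."""
    counts = Counter(samples)
    total = len(samples)
    return {cpus: n / total for cpus, n in sorted(counts.items())}


def capacity_for_assurance(pmf, p):
    """Smallest capacity c with P(demand <= c) >= p, i.e. the p-quantile of the pmf."""
    cumulative = 0.0
    for cpus in sorted(pmf):
        cumulative += pmf[cpus]
        if cumulative >= p - 1e-12:  # tolerate floating-point rounding
            return cpus
    return max(pmf)


if __name__ == "__main__":
    # Hypothetical hourly CPU demands of one application: mostly light, with rare bursts.
    samples = [2] * 990 + [4] * 9 + [8] * 1
    pmf = empirical_pmf(samples)
    peak = max(samples)
    mean = sum(samples) / len(samples)
    print(f"peak={peak}, mean={mean:.2f}, peak-to-mean={peak / mean:.2f}")
    for p in (1.0, 0.999, 0.99):
        print(f"p={p}: {capacity_for_assurance(pmf, p)} CPUs")
```

With these samples, p=1.0 requires the full peak of 8 CPUs, while p=0.999 needs 4 and p=0.99 needs only 2.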
statistical demand profiles
to guide the development of our techniques we rely on gathered data:
– 48 servers in an HP data center
– hosting business applications
– each with 2 to 8 CPUs
create a statistical demand profile for each application:
– a compact representation of the demand pattern
– characterize weekdays and weekend days separately
  • weekends are ignored for the purpose of this study
– characterize a weekday by 24 60-minute time slots
  • a probability mass function (pmf) gives the observed distribution of the number of CPUs needed in each slot
the profiles populate a calendar of “expected demand” for the utility
– enables admission control
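A weekday profile of this kind can be sketched in Python as follows. The raw-observation format, (weekday, hour, cpus) tuples using Python's Monday=0 weekday convention, is a hypothetical assumption; the slides do not say how the traces were stored.

```python
from collections import Counter, defaultdict

SLOTS_PER_DAY = 24  # 60-minute slots


def build_profile(observations):
    """Build a weekday demand profile {slot: {cpus: probability}}.

    observations: iterable of (weekday, hour, cpus) tuples, weekday 0..6
    (Monday=0) and hour 0..23. Weekend observations are skipped, as in the study.
    """
    counts = defaultdict(Counter)
    for weekday, hour, cpus in observations:
        if weekday >= 5:  # Saturday or Sunday
            continue
        counts[hour][cpus] += 1
    profile = {}
    for slot in range(SLOTS_PER_DAY):
        total = sum(counts[slot].values())
        if total:  # slots without observations are left out
            profile[slot] = {c: n / total for c, n in sorted(counts[slot].items())}
    return profile


if __name__ == "__main__":
    obs = [(0, 9, 4), (1, 9, 4), (2, 9, 6), (3, 9, 4), (5, 9, 8), (0, 3, 2)]
    profile = build_profile(obs)
    print(profile[9])  # {4: 0.75, 6: 0.25}; the Saturday sample is ignored
```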
admission control approach
• a new application requests admission to the utility
• assume we admit the new application
• unfold its profile onto the utility’s calendar for a capacity planning horizon
  – for example, several months into the future
• characterize the calendar’s new per-slot distributions of aggregate demand
• use distributions to estimate required size of resource pool
• admit application if there are sufficient resources
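The admission-control steps can be sketched in Python under two loudly stated simplifications: the calendar is collapsed to a single representative weekday of hourly slots, and application demands are treated as independent, so each slot's aggregate pmf is the convolution of the individual pmfs. The talk notes that correlations between applications must also be considered, so this is a starting point rather than the method used in the study; all names are hypothetical.

```python
def convolve(pmf_a, pmf_b):
    """pmf of the sum of two integer demands, ASSUMING they are independent."""
    result = {}
    for a, pa in pmf_a.items():
        for b, pb in pmf_b.items():
            result[a + b] = result.get(a + b, 0.0) + pa * pb
    return result


def quantile(pmf, p):
    """Smallest value c with P(X <= c) >= p."""
    cumulative = 0.0
    for value in sorted(pmf):
        cumulative += pmf[value]
        if cumulative >= p - 1e-12:  # tolerate floating-point rounding
            return value
    return max(pmf)


def required_capacity(profiles, p):
    """CPUs needed so that, in every slot, aggregate demand is met with probability >= p.

    profiles: list of {slot: {cpus: probability}}; a slot missing from a
    profile means that application needs 0 CPUs in that slot.
    """
    slots = set().union(*(profile.keys() for profile in profiles))
    needed = 0
    for slot in slots:
        aggregate = {0: 1.0}
        for profile in profiles:
            aggregate = convolve(aggregate, profile.get(slot, {0: 1.0}))
        needed = max(needed, quantile(aggregate, p))
    return needed


def admit(calendar, new_profile, pool_size, p):
    """Tentatively add new_profile to the calendar; admit only if the pool still suffices."""
    return required_capacity(calendar + [new_profile], p) <= pool_size


if __name__ == "__main__":
    # Hypothetical hourly profiles for two admitted applications and one newcomer.
    app_a = {9: {2: 0.9, 6: 0.1}, 14: {2: 0.5, 4: 0.5}}
    app_b = {9: {2: 0.5, 4: 0.5}, 14: {4: 1.0}}
    newcomer = {9: {1: 0.99, 3: 0.01}}
    print(required_capacity([app_a, app_b], 0.99))                # 10
    print(admit([app_a, app_b], newcomer, pool_size=10, p=0.99))  # False
```

Here the newcomer raises the 0.99-quantile of slot 9 from 10 to 11 CPUs, so a pool of 10 CPUs would reject it while a pool of 12 would accept it.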
demands for a time slot t
[figure: application demands in slot t feeding the utility]
utility:
– the distribution of aggregate demand is approximated by the joint pmf
– however, we must also consider correlations between application demands
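Why correlations matter can be seen with a small Python sketch: for two hypothetical applications whose demands in a slot burst on the same days, aggregating the paired observations (which preserves the correlation) gives a higher p-quantile than convolving their marginal pmfs as if they were independent. The data are invented for illustration, and the slides do not detail how the study itself handled correlations.

```python
from collections import Counter


def pmf_of(samples):
    """Empirical pmf {value: probability} of a list of samples."""
    counts = Counter(samples)
    return {v: n / len(samples) for v, n in counts.items()}


def convolve(pmf_a, pmf_b):
    """pmf of the sum under an independence assumption."""
    out = {}
    for a, pa in pmf_a.items():
        for b, pb in pmf_b.items():
            out[a + b] = out.get(a + b, 0.0) + pa * pb
    return out


def quantile(pmf, p):
    """Smallest value c with P(X <= c) >= p."""
    cumulative = 0.0
    for value in sorted(pmf):
        cumulative += pmf[value]
        if cumulative >= p - 1e-12:
            return value
    return max(pmf)


if __name__ == "__main__":
    # Paired observations of one slot on 100 weekdays: both apps burst on the same 5 days.
    app_a = [2] * 95 + [8] * 5
    app_b = [1] * 95 + [6] * 5
    joint = pmf_of([a + b for a, b in zip(app_a, app_b)])  # keeps the correlation
    independent = convolve(pmf_of(app_a), pmf_of(app_b))   # ignores it
    print("with correlation:     ", quantile(joint, 0.99))        # 14
    print("assuming independence:", quantile(independent, 0.99))  # 9
```

Assuming independence here would size the pool at 9 CPUs instead of 14, so the 0.99 assurance would silently fail on the days both applications burst.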
experimental design and results
• how many CPUs are needed if applications:
  – are statically assigned their peak numbers of CPUs?
  – are assigned the peak number of CPUs needed on a per-slot basis?
  – are offered an assurance p that resource requests will be satisfied?
• about the experiments:
  – application demand correlations are included as measured
  – a 60-minute warm-up/warm-down overhead for application migration is included
  – reported estimates were verified using trace-driven simulation
resource access mechanism               number of CPUs required
static                                  309
peak per slot (p=1.0)                   275
statistical multiplexing, p=0.999       179 (estimate)
statistical multiplexing, p=0.99        163 (estimate)
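The first two rows differ only in when peaks are taken. The Python sketch below, using made-up per-slot peaks for three applications, shows the computation: "static" sums each application's overall peak, while "peak per slot" sums the applications' peaks within each slot and then takes the maximum over slots. It also computes the savings relative to static assignment for the CPU counts reported in the table. Function names and data are illustrative assumptions.

```python
def static_allocation(peaks):
    """Each application permanently holds its overall peak: sum_a max_t peak[a][t]."""
    return sum(max(per_slot) for per_slot in peaks)


def per_slot_allocation(peaks):
    """Pool sized for the worst slot of summed per-slot peaks: max_t sum_a peak[a][t]."""
    return max(sum(slot) for slot in zip(*peaks))


def savings(reference, value):
    """Fractional reduction of value relative to reference."""
    return 1.0 - value / reference


if __name__ == "__main__":
    # Hypothetical per-slot peak CPU demands (4 slots) for 3 apps with offset busy hours.
    peaks = [
        [8, 2, 2, 2],
        [2, 8, 2, 2],
        [2, 2, 8, 4],
    ]
    print(static_allocation(peaks), per_slot_allocation(peaks))  # 24 12
    # CPU counts reported in the table.
    for label, cpus in (("peak per slot", 275), ("p=0.999", 179), ("p=0.99", 163)):
        print(f"{label}: {savings(309, cpus):.1%} fewer CPUs than static")
```

Applied to the reported numbers, per-slot peaks save about 11% over static assignment, while statistical multiplexing saves about 42% (p=0.999) and 47% (p=0.99).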
grid & performability
modeling research perspective
modeling issue I: the many perspectives of virtualization
virtualization enables flexibility in the UDC:
1. storage area networks let applications use any storage device
2. computing virtualization allows CPUs to be assigned dynamically to customers
3. virtual LANs create secure private networks
virtualization gives the illusion of some traditional functionality (‘boundaries’), but implements it ‘soft’
modeling challenges: different views for different users, dynamic changing of boundaries (performability!), how to utilize the models contained in the software
modeling issue II: on-line algorithms
on-line algorithms are key to conquering complexity:
• automated adaptation needs on-line algorithms
on-line algorithms come in many shapes and forms:
• days: resource scheduling
• seconds: load balancing, admission control, retries
• milliseconds: memory optimization, real-time scheduling
typical issues:
• speed of the model solution
• choosing between statistical and structural models
• obtaining the right on-line data
• a plug-in algorithm module needs a data model that fits with the operational model
modeling issue III: how to validate large-scale systems
many facets to scale:
• more and more devices
• more and more interconnected (even globally)
• increasing number of users
• multi-party and multi-ownership
• greater differences in scale: smaller devices, bigger data centers
• the amount of data collected and analysis done increases with the scale of the systems
we have no good ways of analyzing large-scale systems: no test beds, no reliable data, no widely accepted modeling approaches
modeling issue IV: how to evaluate for business metrics
the real metric of interest is euros:
• what is the total cost of ownership?
• how much am I, as a customer, willing to pay for a service?
• what penalties do I, as a provider, accept in an SLA?
• if I invest x, what is the return on IT investment?
how do we model the money/QoS correlation?
conclusion
• adaptive/utility/autonomic computing has an intrinsic need for QoS (performability) modeling and analysis
• the grid is believed to be the platform of choice
  – applications are more interesting than the middleware
• challenges for stochastic modeling are larger than ever in this setting:
  – virtualization
  – on-line algorithms
  – large-scale systems
  – business metrics