Energy Conservation in Datacenters through Cluster Memory Management and Barely-Alive Memory Servers...
Energy Conservation in Datacenters through Cluster Memory Management and Barely-Alive Memory Servers
Vlasia Anagnostopoulou, Susmit Biswas, Alan Savage, Ricardo Bianchini*, Tao Yang, Frederic T. Chong
Department of Computer Science, UC Santa Barbara
*Department of Computer Science, Rutgers University
Dependence on Internet-services…
• More online services
– new internet-services, email, informational sites, social networks…
• Example:
– Growth of web-search
[Figure: Rise in Daily Searches (Google Searches Total). x-axis: Year (1998, 2001, 2004, 2007); y-axis: 0 to 500,000,000; data labels: 10,000; 500,000; 200,000,000; 450,000,000]
Environmental impact
• Internet-services live in datacenters
• Thousands of machines per datacenter
• Many datacenters across the globe
– Energy consumption: ~1.2% of total US [Ref: EPA]
– Energy consumption growth:
Are Datacenters efficient?
• Strict performance standards for internet-services through SLAs
• Over-provisioning
– Machines are under-utilized most of the time
– Servers are inefficient at low or average utilization [Ref: Barroso and Hölzle]
Current techniques for efficiency
• Under low load, reconfigure cluster
– Consolidate load into fewer machines
– Turn rest off
– Transition to low power idle state
– Memory is not accessible in these states
• Operate at lower frequency (VS)
• Performance problems
– For internet-services the working set typically doesn’t shrink with load!
– Because of reboot, very slow to restart (~sec)
– Have to warm-up memory
For web-search, memory is particularly critical
• Search dataset doesn’t change much with load!
• Searches have temporal locality -> Zipf’s distribution [Ref: Adamic]
• Intense database search
– May search up to hundreds of servers at a time
• But fairly light CPU task to process a search query
Memory can and should be managed wisely, in order not to lose performance!
Our technique for efficiency + performance:
• Barely-Alive state:
– CPU is turned off, memory is kept on
– Much lower power consumption
• Distributed middleware:
– Request distribution
– Transition servers to BA state
– Manage server memory content locally
– Allocate optimal memory to services globally
– Do not degrade performance (respect SLA)
Hardware requirements?
How would it be implemented?
• Memory is accessed through Memory Controller (MC)
• MC is on CPU (bummer!)
• Install small CPU on NIC
• Memory accessible like DMA via new CPU + MC
• Turn off main CPU
Software requirements?
Basic request distribution algorithm in a self-managed cluster
Challenges of integrating BA servers into the request distribution scheme:
• Transition from Active to BA and vice versa
• Stale content of BA servers
• LARD: locality-aware request distribution [Ref: V. Pai + others]
• PRESS: its distributed version [Ref: Carrera + Bianchini]
• Main idea:
– Exploit locality in references by forwarding same requests to same machine
– But balance load evenly among machines
[Figure: a server reports "I am less loaded than Server 3"]
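The main idea of locality-aware distribution can be sketched in a few lines. This is a simplified, centralized illustration in the spirit of the basic LARD policy, not the PRESS implementation and not the middleware from this talk; the class name `LardDispatcher`, the load metric (in-flight requests) and the threshold values `T_LOW` and `T_HIGH` are assumptions chosen for the example.

```python
# Hypothetical thresholds on in-flight requests per server (assumed values).
T_LOW = 25
T_HIGH = 65


class LardDispatcher:
    """Simplified locality-aware dispatcher: same target -> same server,
    unless that server is overloaded while another one is lightly loaded."""

    def __init__(self, servers):
        self.load = {s: 0 for s in servers}   # in-flight requests per server
        self.assigned = {}                    # request target -> server

    def _least_loaded(self):
        return min(self.load, key=self.load.get)

    def dispatch(self, target):
        server = self.assigned.get(target)
        if server is None:
            # First time we see this target: start it on the least loaded server.
            server = self._least_loaded()
        else:
            load = self.load[server]
            imbalance = load > T_HIGH and min(self.load.values()) < T_LOW
            if imbalance or load >= 2 * T_HIGH:
                # Give up locality for this target to restore load balance.
                server = self._least_loaded()
        self.assigned[target] = server
        self.load[server] += 1
        return server

    def complete(self, server):
        """Call when a request finishes on `server`."""
        self.load[server] -= 1
```

Repeated requests for the same target keep hitting the same server's memory, and a target only moves when its server is clearly overloaded relative to the rest of the cluster.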
Self-managed cluster with BA servers
• Transition Active to BA
– Application decides on global level
– Locally, if there are no processes or requests
– Make sure not to over-utilize active servers!
• Stale content of BA servers
– Store installs new object (immutability)
– Application may invalidate old objects at will
– On activation, a BA server updates its directory from an active server
– Periodic activation or state swapping
– Space of obsolete objects can be reclaimed
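The Active-to-BA transition rule above can be illustrated with a small planning function. This is a sketch under stated assumptions, not the middleware's actual policy: the function names, the per-server capacity model and the `max_utilization` cap of 0.8 are hypothetical. It combines the global condition (keep enough active servers so none is over-utilized) with the local condition (only servers with no in-flight requests may go to the BA state).

```python
import math


def servers_to_keep_active(total_load, per_server_capacity,
                           max_utilization=0.8, min_active=1):
    """Global decision: how many servers must stay active so that none
    runs above `max_utilization` (an assumed cap)."""
    needed = math.ceil(total_load / (per_server_capacity * max_utilization))
    return max(min_active, needed)


def plan_ba_transitions(in_flight, total_load, per_server_capacity,
                        max_utilization=0.8):
    """Return the active servers that may move to the Barely-Alive state.

    `in_flight` maps server name -> number of in-flight requests.
    Local condition: only idle servers (no requests) are candidates.
    """
    keep = servers_to_keep_active(total_load, per_server_capacity,
                                  max_utilization)
    spare = max(0, len(in_flight) - keep)
    idle = [name for name, reqs in in_flight.items() if reqs == 0]
    return idle[:spare]
```

For example, with four servers of capacity 100 requests/s each and a cluster load of 150 requests/s, two servers must stay active, so at most two idle servers are candidates for the BA state.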
Optimal memory allocation? Multiple services? Energy efficiency?
Middleware for efficient memory management
• Optimal memory allocation
– Dynamically size memory to respect exactly the SLA requirement
– Translate SLA requirement -> target hit-ratio
– Use stack algorithm to predict optimal size from target hit-ratio
• Stack algorithm overview
– Measures contribution of cache size to the hit-ratio
– On a single pass, it calculates the cumulative hit-ratio with size
How to adapt the stack algorithm for resizing the global cluster memory optimally?
Size   Hits   Cumulative hit-ratio
1      6/9    66.7%
2      1/9    77.8%
3      0/9    77.8%
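The single-pass computation behind a table like the one above can be sketched with an LRU stack. This is a minimal illustration of a stack-distance histogram, not the authors' code; the function names and the nine-access trace are assumptions, with the trace chosen so that it reproduces the hit counts in the table (6/9, 1/9, 0/9).

```python
from collections import Counter


def stack_distance_histogram(trace):
    """One pass over the trace with an LRU stack.
    Returns hits per stack depth (depth 1 = most recently used)."""
    stack = []                 # index 0 is the most recently used object
    hits_at_depth = Counter()
    for obj in trace:
        if obj in stack:
            depth = stack.index(obj) + 1
            hits_at_depth[depth] += 1
            stack.remove(obj)
        stack.insert(0, obj)   # (re)insert at the top of the stack
    return hits_at_depth


def cumulative_hit_ratio(hits_at_depth, total_accesses, max_size):
    """Hit-ratio an LRU cache of each size 1..max_size would have achieved."""
    ratios, running = [], 0
    for size in range(1, max_size + 1):
        running += hits_at_depth.get(size, 0)
        ratios.append(running / total_accesses)
    return ratios


def smallest_size_for(target_hit_ratio, ratios):
    """Smallest cache size meeting the target, or None if unreachable."""
    for size, ratio in enumerate(ratios, start=1):
        if ratio >= target_hit_ratio:
            return size
    return None


# Hypothetical trace of 9 accesses to two objects.
trace = list("aaaabbbba")
hist = stack_distance_histogram(trace)
ratios = cumulative_hit_ratio(hist, len(trace), max_size=3)
```

With this trace, a target hit-ratio of 75% is first met at size 2, while a target of 90% is not reachable at any size because the two first-time accesses always miss.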
Optimal memory allocation
• Distributed stack algorithm, for each server:
– Keep track of memory size + hit ratio information
– On time window, broadcast size for desired hit-ratio
– Resize local stack with global average size
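One round of the distributed resizing can be sketched as follows. This is an illustration under assumptions, not the evaluated algorithm: the broadcast is modeled as a plain dictionary, each server's cumulative hit-ratio curve is given as a list indexed by size, the average is rounded up, and a server that cannot reach the target reports its full current size.

```python
import math


def size_for_target(curve, target):
    """Smallest local size whose cumulative hit-ratio meets the target.
    `curve[i]` is the hit-ratio at size i + 1. If the target is not
    reachable, report the full size (an assumed fallback)."""
    for size, ratio in enumerate(curve, start=1):
        if ratio >= target:
            return size
    return len(curve)


def global_average_size(broadcast_sizes):
    """Average of the sizes broadcast by all servers, rounded up."""
    return math.ceil(sum(broadcast_sizes) / len(broadcast_sizes))


def resize_round(curves, target):
    """One time window: every server computes and 'broadcasts' its size,
    then all servers resize their local stack to the global average."""
    sizes = {name: size_for_target(c, target) for name, c in curves.items()}
    new_size = global_average_size(list(sizes.values()))
    return sizes, new_size


# Hypothetical per-server hit-ratio curves.
curves = {
    "s1": [0.5, 0.8, 0.9],
    "s2": [0.3, 0.6, 0.7, 0.85],
}
sizes, new_size = resize_round(curves, target=0.8)
```

Here server s1 needs size 2 and server s2 needs size 4, so both resize to the average of 3.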
Extension for BA servers, variable sized objects, multiple services…
Extension of distributed Stack Algo
• Include BA servers
– Contribute fixed amount of memory (passive)
• Multiple-size objects
– Separate stack for each object size
– This leverages directory look-up
• Multiple services
– Each service keeps its own stack in the memory
– Memory partitioned across services
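Partitioning memory across services can be illustrated with a small allocation function. The slides do not say how conflicts are resolved when the services together want more memory than a server has, so the proportional scale-down below is purely an assumption, as are the function name and the example service names.

```python
def partition_memory(desired, capacity):
    """Split `capacity` memory units across services.

    `desired` maps service name -> size its own stack predicts for its
    target hit-ratio. If everything fits, each service gets what it asked
    for; otherwise requests are scaled down proportionally (assumed policy).
    """
    total = sum(desired.values())
    if total <= capacity:
        return dict(desired)
    return {svc: (want * capacity) // total for svc, want in desired.items()}


# Hypothetical services and sizes (in memory units).
granted = partition_memory({"search": 600, "mail": 300}, capacity=600)
```

In this example the two services ask for 900 units in total but only 600 are available, so search is granted 400 and mail 200.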
Energy efficiency of BA state (without the efficiency yielded from the memory management)
Future work
• Currently looking into more on-line and off-line apps (e.g. web-translation, sorting algorithm)
• Extend power consumption breakdown
• Sensitivity analysis of power savings to simulation’s parameters
– (e.g. memory capacity, network assumptions, component access times, etc)
• Evaluation of distributed algorithm
Conclusions
• Datacenters have a growing impact on the environment
• Machines in datacenters are inefficient
• Memory is a critical component for performance for applications run on a cluster
• Exploit memory without degrading performance with Barely-Alive state + middleware
• Potential power savings up to 49%, without loss of performance
Questions?
• Thank you for your attention!
• www.cs.ucsb.edu/~arch