Energy Conservation in Datacenters through Cluster Memory Management and Barely-Alive Memory Servers...


Vlasia Anagnostopoulou ([email protected]), Susmit Biswas, Alan Savage, Ricardo Bianchini*, Tao Yang, Frederic T. Chong

Department of Computer Science, UC Santa Barbara

*Department of Computer Science, Rutgers University

Dependence on Internet-services…

• More online services
  – New internet services, email, informational sites, social networks…

• Example:
  – Growth of web search

[Figure: Rise in Daily Searches. x-axis: Year (1998, 2001, 2004, 2007); y-axis: Google Searches Total (0 to 500,000,000); data labels: 10,000; 500,000; 200,000,000; 450,000,000]

Environmental impact

• Internet services live in datacenters

• Thousands of machines per datacenter

• Many datacenters across the globe
  – Energy consumption: ~1.2% of the US total [Ref: EPA]
  – Energy consumption growth:

Are Datacenters efficient?

• Strict performance standards for internet-services through SLAs

• Over-provisioning
  – Machines are under-utilized most of the time
  – Servers are inefficient at low or average utilization [Ref: Barroso and Hölzle]

Current techniques for efficiency

• Under low load, reconfigure the cluster
  – Consolidate load onto fewer machines
  – Turn the rest off
  – Or transition them to a low-power idle state
  – Memory is not accessible in these states

• Operate at lower frequency (VS)

• Performance problems
  – For internet services, the working set typically doesn't shrink with load!
  – Because of the reboot, servers are very slow to restart (~sec)
  – Memory has to be warmed up again

For web search, memory is particularly critical

• The search dataset doesn't change much with load!

• Searches have temporal locality -> Zipf's distribution [Ref: Adamic]

• Intense database search
  – May search up to hundreds of servers at a time

• But processing a search query is a fairly light CPU task

Memory can and should be managed wisely, in order not to lose performance!

Our technique for efficiency + performance:

• Barely-Alive (BA) state:
  – CPU is turned off, memory is kept on
  – Much lower power consumption

• Distributed middleware:
  – Request distribution
  – Transition servers to BA state
  – Manage server memory content locally
  – Allocate optimal memory to services globally
  – Do not degrade performance (respect SLA)

Hardware requirements?

How would it be implemented?

• Memory is accessed through the Memory Controller (MC)

• MC is on CPU (bummer!)

• Install small CPU on NIC

• Memory accessible like DMA via new CPU + MC

• Turn off main CPU

Software requirements?


Basic request distribution algorithm in a self-managed cluster

Challenges of integrating BA servers into the request distribution scheme:

• Transition from Active to BA and vice versa

• Stale content of BA servers

• LARD: locality-aware request distribution [Ref: V. Pai et al.]

• PRESS: its distributed version [Ref: Carrera and Bianchini]

• Main idea:

• Exploit locality in references by forwarding same requests to same machine

• But balance load evenly among machines

[Figure: cluster diagram with the labels "I am less loaded than" and "Server 3"]
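The main idea above (send the same request to the same machine, but keep load balanced) can be sketched as follows. This is a minimal sketch in the spirit of LARD, assuming a single front-end that measures load as the number of active requests per server. The class name `Dispatcher`, the thresholds `t_low` and `t_high`, and the server names are hypothetical, and the distributed decision-making of PRESS is not shown.

```python
class Dispatcher:
    """Locality-aware request routing with simple load thresholds (illustrative)."""

    def __init__(self, servers, t_low, t_high):
        self.load = {s: 0 for s in servers}   # active requests per server
        self.owner = {}                       # request target -> assigned server
        self.t_low = t_low                    # hypothetical "lightly loaded" threshold
        self.t_high = t_high                  # hypothetical "heavily loaded" threshold

    def least_loaded(self):
        # Ties are broken by insertion order of the servers.
        return min(self.load, key=self.load.get)

    def route(self, target):
        node = self.owner.get(target)
        if node is None:
            # First time this target is seen: place it on the least loaded server.
            node = self.least_loaded()
        else:
            # Keep locality unless the owner is overloaded while another server
            # is nearly idle, or the owner is severely overloaded.
            imbalance = (self.load[node] > self.t_high
                         and min(self.load.values()) < self.t_low)
            if imbalance or self.load[node] >= 2 * self.t_high:
                node = self.least_loaded()
        self.owner[target] = node
        self.load[node] += 1
        return node

    def finish(self, node):
        # Call when a request completes on the given server.
        self.load[node] -= 1


d = Dispatcher(["s1", "s2", "s3"], t_low=1, t_high=2)
first_three = [d.route("a") for _ in range(3)]  # locality: same server each time
moved = d.route("a")                            # owner overloaded, target moves
other = d.route("b")                            # new target goes to an idle server
```

Repeated requests for the same target stay on one server until that server becomes overloaded relative to the others, at which point the target is reassigned.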

Self-managed cluster with BA servers

• Transition from Active to BA
  – The application decides at the global level
  – Locally, a server transitions if it has no processes or requests
  – Make sure not to over-utilize the active servers!

• Stale content of BA servers
  – A store installs a new object (objects are immutable)
  – The application may invalidate old objects at will
  – On activation, a BA server updates its directory from an active server
  – Periodic activation or state swapping
  – The space of obsolete objects can be reclaimed
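The local transition check can be sketched as below. This is only an illustration under stated assumptions: load is expressed as a utilization between 0 and 1, the load of the departing server is spread evenly over the remaining active servers, and the function name `may_enter_barely_alive` and the `max_utilization` limit of 0.8 are hypothetical rather than taken from the slides.

```python
def may_enter_barely_alive(local_requests, local_processes,
                           active_loads, server_index, max_utilization=0.8):
    """Decide whether one server may switch from Active to Barely-Alive.

    active_loads: utilization (0..1) of every active server, this one included.
    server_index: position of this server in active_loads.
    """
    # Local condition: nothing is running or pending on this server.
    if local_requests > 0 or local_processes > 0:
        return False

    # At least one other active server must remain to serve requests.
    remaining = len(active_loads) - 1
    if remaining < 1:
        return False

    # Global condition (assumed model): after the transition, the total load
    # is shared evenly by the remaining servers and must stay under the limit.
    projected = sum(active_loads) / remaining
    return projected <= max_utilization
```

For example, an idle server in a lightly loaded three-server cluster passes the check, while the same server is kept active if the other two are already at 90% utilization.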

Optimal memory allocation? Multiple services? Energy efficiency?

Middleware for efficient memory management

• Optimal memory allocation
  – Dynamically size memory to respect exactly the SLA requirement
  – Translate SLA requirement -> target hit-ratio
  – Use the stack algorithm to predict the optimal size from the target hit-ratio

• Stack algorithm overview
  – Measures the contribution of cache size to the hit-ratio
  – In a single pass, it calculates the cumulative hit-ratio as a function of size
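One way to picture the "SLA requirement -> target hit-ratio" step is with a simple mean-latency model. The slides do not give the translation, so everything here is an assumption: the SLA is taken to be a bound on mean response time, a hit costs `hit_latency`, a miss costs `miss_latency`, and the function name `target_hit_ratio` is hypothetical. Under that model, mean latency is h * hit_latency + (1 - h) * miss_latency, and solving for h gives the smallest hit-ratio that meets the bound.

```python
def target_hit_ratio(sla_latency, hit_latency, miss_latency):
    """Smallest hit-ratio h with h*hit + (1-h)*miss <= sla (assumed model)."""
    if not hit_latency < miss_latency:
        raise ValueError("a hit must be faster than a miss")
    if sla_latency < hit_latency:
        raise ValueError("SLA is unreachable even with a 100% hit-ratio")
    if sla_latency >= miss_latency:
        return 0.0  # the SLA is met even if every request misses
    return (miss_latency - sla_latency) / (miss_latency - hit_latency)


# Hypothetical numbers: 1 ms per hit, 10 ms per miss, 3 ms mean-latency SLA.
h = target_hit_ratio(sla_latency=3.0, hit_latency=1.0, miss_latency=10.0)
```

With these made-up latencies the required hit-ratio is 7/9, roughly 77.8%.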

How to adapt the stack algorithm for resizing the global cluster memory optimally?

Size   Hits   Cumulative hit-ratio
1      6/9    66.7%
2      1/9    77.8%
3      0/9    77.8%
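The table can be reproduced with a short sketch of an LRU stack algorithm. The slides do not show the reference trace behind the table, so the nine-reference trace below is a hypothetical one chosen only because its counts match; the function names are likewise illustrative. Each reference is charged to the stack depth at which it is found, and summing the counts up to a given size gives the hit-ratio a cache of that size would have achieved.

```python
from collections import Counter


def stack_distance_histogram(trace):
    """One pass over the trace; returns hits counted per LRU stack depth."""
    stack = []        # index 0 is the most recently used object
    hist = Counter()
    for obj in trace:
        if obj in stack:
            depth = stack.index(obj) + 1  # 1-based stack distance
            hist[depth] += 1
            stack.remove(obj)
        # Cold misses are not counted; the object just enters the stack.
        stack.insert(0, obj)
    return hist


def cumulative_hit_ratio(trace, max_size):
    """Hit-ratio for every cache size from 1 to max_size."""
    hist = stack_distance_histogram(trace)
    ratios, hits = [], 0
    for size in range(1, max_size + 1):
        hits += hist[size]
        ratios.append(hits / len(trace))
    return ratios


def smallest_size_for(trace, target, max_size):
    """Smallest size whose predicted hit-ratio reaches target, else None."""
    for size, ratio in enumerate(cumulative_hit_ratio(trace, max_size), start=1):
        if ratio >= target:
            return size
    return None


trace = list("aaaaaaaba")          # hypothetical trace with 9 references
hist = stack_distance_histogram(trace)
ratios = cumulative_hit_ratio(trace, 3)
```

Here `hist` holds 6 hits at depth 1 and 1 hit at depth 2, and `ratios` rounds to 66.7%, 77.8%, 77.8%, so a target hit-ratio of 75% would be met with a size of 2.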

Optimal memory allocation

• Distributed stack algorithm
  – For each server:
  – Keep track of memory size + hit-ratio information
  – In each time window, broadcast the size needed for the desired hit-ratio
  – Resize the local stack to the global average size
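One round of the distributed resizing can be sketched as below, simulated in a single process. The assumptions are loud ones: each server is represented only by its cumulative hit-ratio curve (index 0 is size 1), a server that cannot reach the target asks for its largest tracked size, and the average is rounded up. The function names and the sample curves are hypothetical.

```python
import math


def required_size(curve, target):
    """Smallest size on this server's curve that reaches the target hit-ratio."""
    for size, ratio in enumerate(curve, start=1):
        if ratio >= target:
            return size
    # Assumption: if the target is unreachable, request the largest tracked size.
    return len(curve)


def resize_round(per_server_curves, target):
    """Simulate one time window: broadcast sizes, then adopt the global average."""
    broadcasts = [required_size(curve, target) for curve in per_server_curves]
    # Every server resizes its local stack to the (rounded-up) average.
    new_size = math.ceil(sum(broadcasts) / len(broadcasts))
    return broadcasts, new_size


# Hypothetical cumulative hit-ratio curves for three servers.
curves = [
    [0.5, 0.7, 0.8],
    [0.6, 0.8, 0.85],
    [0.3, 0.5, 0.6, 0.8],
]
broadcasts, new_size = resize_round(curves, target=0.8)
```

In this example the servers broadcast sizes 3, 2 and 4, and all of them resize to 3.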

Extension for BA servers, variable sized objects, multiple services…

Extension of the distributed stack algorithm

• Include BA servers
  – Contribute a fixed amount of memory (passive)

• Multiple-size objects
  – Separate stack for each object size
  – This leverages directory look-up

• Multiple services
  – Each service keeps its own stack in memory
  – Memory is partitioned across services

Energy efficiency of BA state (without the efficiency yielded from the memory management)

Power savings potential

Cumulative power savings

• Synthetic search trace over 1 day (24 h)

Future work

• Currently looking into more on-line and off-line apps (e.g. web-translation, sorting algorithm)

• Extend power consumption breakdown

• Sensitivity analysis of power savings to the simulation's parameters
  – (e.g. memory capacity, network assumptions, component access times, etc.)

• Evaluation of distributed algorithm

Conclusions

• Datacenters have a growing impact on the environment

• Machines in datacenters are inefficient

• Memory is a critical component for the performance of applications run on a cluster

• Exploit memory without degrading performance with Barely-Alive state + middleware

• Potential power savings up to 49%, without loss of performance

Questions?

• Thank you for your attention!

• [email protected]

• www.cs.ucsb.edu/~arch

