Vasileios Trigonakis | 06.2016
Unlocking Energy
Babak Falsafi, Rachid Guerraoui, Javier Picorel, Vasileios Trigonakis
Conference Dilemma: To Sleep or Not to Sleep?
For the next 25 minutes
Do not sleep for such a short duration
Motivation
sleep
busy wait
locking
Energy Efficiency Through Synchronization (Locking)
Why lock-based synchronization?
1. Concurrent systems synchronize with locks
2. Locks are well-defined abstractions
lock() / unlock()
3. Locking strategies affect power consumption
Motivation
busy lock
lock(): wait! (waste time and energy)
[Figure: power and energy efficiency, normalized to sleeping; series: sleeping, spinning]
Lock Waiting Techniques
Locking is a good candidate for reducing energy consumption
Motivation
busy lock
sleeping
(blocking)
busy waiting
(spinning)
Java CopyOnWriteArrayList
Energy efficiency = throughput / power
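In units, the throughput-over-power ratio works out to operations per Joule. A short worked form follows; the numbers in the second line are purely illustrative and are not taken from the talk.

```latex
\[
\text{energy efficiency}
  = \frac{\text{throughput}}{\text{power}}
  = \frac{\text{operations}/\text{s}}{\text{Joules}/\text{s}}
  = \frac{\text{operations}}{\text{Joule}}
\]
\[
\frac{10 \times 10^{6}\ \text{ops/s}}{100\ \text{W}} = 10^{5}\ \text{ops/J}
\]
```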
Q. Is sleeping energy-friendly?
Energy Efficiency By Improving Locking
1. Concurrent systems synchronize with locks
2. Locks are well-defined abstractions
3. Locking strategies affect power consumption
4. POLY: energy efficiency and throughput go hand in hand in the context of lock algorithms
Outline
• Motivation
• Improving the energy efficiency of systems
Target platform: 2-socket Intel Ivy Bridge, 20 cores, 40 hyper-threads
busy lock
busy waiting hurts power consumption
sleeping saves power, but hurts throughput
spin-then-sleep “cleverly”
Observations
Power Consumption of Waiting
Busy waiting is very power hungry
1 lock, never released
Waiting
1. Sleeping power-friendly
2. Busy waiting
while (*lock != FREE) {}
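The busy-waiting loop can be fleshed out into a minimal test-and-set lock. This is a sketch using C11 atomics, not code from the talk; the names `tas_lock_t`, `tas_init`, `tas_trylock`, `tas_lock`, and `tas_unlock` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

enum { FREE = 0, LOCKED = 1 };

/* Hypothetical lock type for illustration only. */
typedef struct { atomic_int state; } tas_lock_t;

static void tas_init(tas_lock_t *l) {
    atomic_init(&l->state, FREE);
}

/* One attempt to flip FREE -> LOCKED; true on success. */
static bool tas_trylock(tas_lock_t *l) {
    int expected = FREE;
    return atomic_compare_exchange_strong_explicit(
        &l->state, &expected, LOCKED,
        memory_order_acquire, memory_order_relaxed);
}

static void tas_lock(tas_lock_t *l) {
    assert(l != NULL);
    for (;;) {
        /* Busy wait: the core stays fully active while it polls. */
        while (atomic_load_explicit(&l->state, memory_order_relaxed) != FREE) {
        }
        if (tas_trylock(l)) {
            return;
        }
    }
}

static void tas_unlock(tas_lock_t *l) {
    assert(l != NULL);
    atomic_store_explicit(&l->state, FREE, memory_order_release);
}
```

A waiting thread in `tas_lock` never gives up its hardware context, which is the behavior the power measurement on this slide is about.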
[Figure: power (Watts) vs. # threads (10 to 40); series: sleeping, busy waiting]
Observations
Reducing Power of Busy Waiting
Power consumption of busy waiting cannot be practically reduced
1 lock, never released
Busy Waiting
1. empty: 1 iteration / cycle
2. Intel docs: use pause
3. pause not ideal
4. mfence > pause
5. Still, mfence ~5% better
6. DVFS and mwait are not practical (details in the paper)
while (*lock != FREE) { ?? }
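One way to try different candidates for the `??` slot is to pass the loop body in as a function. The sketch below assumes x86-64 for the `_mm_pause()` and `_mm_mfence()` intrinsics and falls back to a no-op and a C11 fence elsewhere (the fallback is an assumption made only for portability). The names `relax_empty`, `relax_pause`, `relax_mfence`, and `wait_until_free` are hypothetical. DVFS and mwait are left out because they need OS or privileged support.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#if defined(__x86_64__)
#include <immintrin.h>
#endif

enum { FREE = 0, LOCKED = 1 };

/* Option 1: empty body. */
static void relax_empty(void) {
}

/* Option 2: the pause hint. */
static void relax_pause(void) {
#if defined(__x86_64__)
    _mm_pause();
#endif
}

/* Option 4: a full memory fence between polls. */
static void relax_mfence(void) {
#if defined(__x86_64__)
    _mm_mfence();
#else
    atomic_thread_fence(memory_order_seq_cst);
#endif
}

/* Polls *lock until it reads FREE, running relax() between polls.
   Returns how many times the loop body ran. */
static unsigned long wait_until_free(atomic_int *lock, void (*relax)(void)) {
    assert(lock != NULL && relax != NULL);
    unsigned long iterations = 0;
    while (atomic_load_explicit(lock, memory_order_acquire) != FREE) {
        relax();
        iterations++;
    }
    return iterations;
}
```

The function pointer keeps the sketch short; a real power measurement would likely inline each variant so that the call overhead does not distort the loop.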
[Figure: power (Watts) vs. # threads (10 to 40); series: empty, pause, mfence, DVFS, mwait]
[Figure: normalized throughput; series: sleeping, unfair spin, fair spin]
MySQL In-Memory
Sleeping Might Be Necessary (For Two Reasons)
Sleeping can reduce power consumption (and more)
Sleeping
1. Power consumption of waiting
2. Locks with multiprogramming (# threads > hw contexts)
[Figure: power (Watts) vs. # threads (10 to 40); series: sleeping, busy waiting]
Observations
Latency: The Price of Sleeping
Frequent sleep/wake-up calls reduce throughput without saving energy
2 threads invoke futex: 1 sleeps, 1 wakes up
Sleeping
1. Sleep call: release context
2. Wake-up call: to hand over the lock
3. Turnaround latency ≈ lock handover latency
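The sleep call and the wake-up call map onto the Linux `futex` system call. A minimal Linux-only sketch follows; `futex_call`, `sleep_until_set`, and `set_and_wake` are hypothetical helper names, and the cast from `_Atomic uint32_t *` to `uint32_t *` assumes both types share a representation, as they do with GCC and Clang on Linux.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper around the raw system call. */
static long futex_call(_Atomic uint32_t *addr, int op, uint32_t val) {
    return syscall(SYS_futex, (uint32_t *)addr, op, val, NULL, NULL, 0);
}

/* Sleep call: give up the hardware context until *addr leaves 0. */
static void sleep_until_set(_Atomic uint32_t *addr) {
    assert(addr != NULL);
    while (atomic_load_explicit(addr, memory_order_acquire) == 0) {
        /* Returns at once if *addr is no longer 0, so a wake-up that
           lands between the load and this call is not lost. */
        futex_call(addr, FUTEX_WAIT_PRIVATE, 0);
    }
}

/* Wake-up call: publish the new value, then wake at most one sleeper.
   Returns the number of threads woken (0 if nobody was sleeping). */
static long set_and_wake(_Atomic uint32_t *addr) {
    assert(addr != NULL);
    atomic_store_explicit(addr, 1, memory_order_release);
    return futex_call(addr, FUTEX_WAKE_PRIVATE, 1);
}
```

In a two-thread setup like the one measured here, one thread would call `sleep_until_set` and the other `set_and_wake`. The turnaround latency is the time from the wake-up call until the sleeper is running again.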
[Figure: latency (cycles, 0 to 8000) of the sleep call, wake-up call, and turnaround]
large critical section
small critical section
Reducing Fairness: Sleeping for Long Durations
Trade fairness for energy efficiency
Passing a “token” from thread to thread
Spin-then-sleep
[Figure, panel "Power Consumption": power (Watts) vs. # threads (1 to 35); series: sleep, spin, unfair]
[Figure, panel "Communication Throughput": throughput (millions of ops/s) vs. # threads (1 to 35); series: sleep, spin, unfair]
unfair: 1000:1 spin-to-sleep ratio (while 2 threads spin, the rest sleep)
How can we use these results in designing locks?
Design locks
Problems of Pthread MUTEX lock
Pthread MUTEX does not take into account the sleep overheads
MUTEX
lock(): for up to 100 attempts, spin with pause; if still busy, sleep
unlock(): release in user space, then wake up a thread
1. Spins for less time than the sleep latency
2. Spins with pause
3. Always wakes up a thread
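The lock()/unlock() behavior described on this slide can be sketched as a two-state futex lock. This follows the slide's description only and is not glibc's source; `mutex_sketch_t`, `futex_call`, `cpu_pause`, and the `mutex_sketch_*` functions are hypothetical names, and the code assumes Linux (with `_mm_pause()` on x86-64 only).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>
#if defined(__x86_64__)
#include <immintrin.h>
#endif

#define SPIN_ATTEMPTS 100 /* "up to 100 attempts" from the slide */

typedef struct { _Atomic uint32_t locked; } mutex_sketch_t; /* hypothetical */

static long futex_call(_Atomic uint32_t *addr, int op, uint32_t val) {
    return syscall(SYS_futex, (uint32_t *)addr, op, val, NULL, NULL, 0);
}

static inline void cpu_pause(void) {
#if defined(__x86_64__)
    _mm_pause();
#endif
}

/* One attempt to flip 0 -> 1; nonzero on success. */
static int mutex_sketch_trylock(mutex_sketch_t *m) {
    uint32_t expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &m->locked, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

static void mutex_sketch_lock(mutex_sketch_t *m) {
    assert(m != NULL);
    /* Phase 1: a short, bounded spin with pause. */
    for (int attempt = 0; attempt < SPIN_ATTEMPTS; attempt++) {
        if (mutex_sketch_trylock(m)) {
            return;
        }
        cpu_pause();
    }
    /* Phase 2: sleep while the word still reads 1, then retry. */
    while (!mutex_sketch_trylock(m)) {
        futex_call(&m->locked, FUTEX_WAIT_PRIVATE, 1);
    }
}

/* Returns the number of threads woken. */
static long mutex_sketch_unlock(mutex_sketch_t *m) {
    assert(m != NULL);
    atomic_store_explicit(&m->locked, 0, memory_order_release);
    /* Always issues the wake-up call, even when nobody is sleeping. */
    return futex_call(&m->locked, FUTEX_WAKE_PRIVATE, 1);
}
```

Because the spin phase is bounded by attempts rather than by time, it can end well before a sleep/wake-up round trip would have completed, which is the first problem listed on this slide.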
MUTEXEE: An Optimized MUTEX Lock
MUTEXEE
           MUTEX                          MUTEXEE
lock()     For up to 100 attempts         For up to ~8000 cycles
           spin with pause                spin with mfence
           if still busy, sleep           if still busy, sleep
unlock()   release in user space          release in user space (lock->locked = 0)
           -                              wait in user space (~300 cycles)
           wake up a thread               wake up a thread
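A sketch of the MUTEXEE column, under stated assumptions. It is not the authors' implementation (see the LOCKIN repository for that). In particular, it assumes that the short user-space wait in unlock() exists so that the wake-up call can be skipped when another thread grabs the lock in the meantime. The names `ee_sketch_t`, `ticks_now`, `relax_fence`, and `ee_*` are hypothetical. The code assumes Linux, and uses `__rdtsc()` on x86-64 with a nanosecond clock as a stand-in elsewhere.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>
#if defined(__x86_64__)
#include <x86intrin.h>
#endif

#define SPIN_BUDGET   8000u /* "~8000 cycles" from the slide */
#define UNLOCK_WINDOW  300u /* "~300 cycles" from the slide */

typedef struct { _Atomic uint32_t locked; } ee_sketch_t; /* hypothetical */

static long futex_call(_Atomic uint32_t *addr, int op, uint32_t val) {
    return syscall(SYS_futex, (uint32_t *)addr, op, val, NULL, NULL, 0);
}

static inline uint64_t ticks_now(void) {
#if defined(__x86_64__)
    return __rdtsc();
#else
    struct timespec ts; /* nanoseconds, only a stand-in for cycles */
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}

static inline void relax_fence(void) {
#if defined(__x86_64__)
    _mm_mfence();
#else
    atomic_thread_fence(memory_order_seq_cst);
#endif
}

static int ee_trylock(ee_sketch_t *m) {
    uint32_t expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &m->locked, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

static void ee_lock(ee_sketch_t *m) {
    assert(m != NULL);
    for (;;) {
        /* Spin for roughly the budget, fencing between attempts. */
        uint64_t start = ticks_now();
        do {
            if (ee_trylock(m)) {
                return;
            }
            relax_fence();
        } while (ticks_now() - start < SPIN_BUDGET);
        /* Still busy: sleep while the word reads 1, then spin again. */
        futex_call(&m->locked, FUTEX_WAIT_PRIVATE, 1);
    }
}

/* Returns 1 if the wake-up call was issued, 0 if it was skipped. */
static int ee_unlock(ee_sketch_t *m) {
    assert(m != NULL);
    atomic_store_explicit(&m->locked, 0, memory_order_release);
    uint64_t start = ticks_now();
    while (ticks_now() - start < UNLOCK_WINDOW) {
        if (atomic_load_explicit(&m->locked, memory_order_relaxed) != 0) {
            return 0; /* taken in user space; the new owner will unlock */
        }
        relax_fence();
    }
    futex_call(&m->locked, FUTEX_WAKE_PRIVATE, 1);
    return 1;
}
```

Under this reading, a sleeping thread can be overtaken repeatedly by spinning threads, which trades fairness for fewer sleep and wake-up calls.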
Performance of MUTEXEE over MUTEX
MUTEXEE fixes the problematic cases of MUTEX
One lock
MUTEXEE
[Figure: MUTEXEE over MUTEX; axes: # threads vs. critical section (cycles, 0 to 8000); panels: Throughput, Energy Efficiency]
Fairness: results and analysis in the paper
Evaluation: Improving Energy Efficiency of Systems Through Locks
Six modern software systems
– Overload pthread mutex
Average Throughput and Energy Efficiency
Locking can indeed be used to improve the energy efficiency of systems
Systems
KyotoCabinet
[Figure: throughput and energy efficiency of TICKET and MUTEXEE, normalized to MUTEX; data labels: 1.26, 1.28]
Results
1. Benefits: Avoid sleeping
2. Sleeping is sometimes necessary
3. Throughput-driven benefits
4. MUTEXEE >> MUTEX
Concluding Remarks
• An analysis of the energy efficiency of lock-based synchronization
– Energy efficiency of locks goes hand in hand with throughput
– MUTEXEE: an optimized MUTEX lock
• LOCKIN: https://github.com/LPD-EPFL/lockin
THANK YOU! QUESTIONS?
Conclusions
Locking can be used to improve the energy efficiency of systems