Energy and Thermal Aware Buffer Cache Replacement Algorithm · 2010. 5. 8. · 2 Research Summary...
Transcript of Energy and Thermal Aware Buffer Cache Replacement Algorithm · 2010. 5. 8. · 2 Research Summary...
MSST 2010
Jianhui Yue, Yifeng Zhu,
Zhao Cai, and Lin Lin
Electrical and Computer Engineering
University of Maine
Energy and Thermal Aware Buffer Cache
Replacement Algorithm
2
Research Summary
Memory power consumption
Consumes 50% more power than CPU on server
Consumes 2 times more power than disks in a specific
supercomputing center
Power dissipation increases cooling cost
Power and thermal aware buffer cache replacement algorithm
Limit replacements on a smaller set of memory ranks without hurting
hit rate
Save up to 27% energy
Reduce temperature 5.45 ℃ than LRU
3
Outline
Background of memory systems
New buffer cache replacement algorithm
Evaluation
Conclusion and future work
4
Memory Chip Organization
DIMM includes one/two ranks which is an independent pwr. mgt. unit
5
DRAM Power States
Read
Precharge
Powerdown
Write
144 mW
396 mW405 mW
12.6 mW
10 cycles2 cycles
DDR2 533MHz 512Mb device
Inactive/Active
Background Power
I/O Power
• Each rank includes multiple DDRx devices and provides data to data bus in union
• The rank is basic memory power management unit
6
DMA Overlapping
Multiple DMA transfers on different
PIC buses to access the same
memory rank simultaneously in a time
multiplex way.
Multiplexing multiple slow DMA
transmissions to one memory rank can
reduce active background power
Disk DMA
Channels
Memory Chips
Disk Array
Network DMA
Channels
Network
I/O Requests
Network
Adapters
Network I/O
DMA
Disk I/O
DMAA1
B1 A2B2
B3
A3
B0
A0
7
Fully-Buffered DIMM(FBDIMM)
Memory Controller
AMB
AMB
AMB
southbound linknorthbound link
write data +cmdread data
DIMM 0
DIMM 1
DIMM n
n is up to 7
1. Increases memory
density, reliability and
speed Includes read data
link and write data link
2. AMB acts as pass-
through switch
3. AMB consumes power
8
Outline
Background of memory systems
New buffer cache replacement algorithm
Evaluation
Conclusion and future work
9
Bottom Most Replacement Algorithm (BMA)
Main Idea
Limit replacements to a smaller number of memory
ranks without hurting hit rate much.
Method
Choose the victim rank with the most number of cold
blocks in the bottom of LRU stack
Find victim block from the specified victim rank rather
than LRU block for energy saving
Update victim rank for adaption to blocks’ recency
changes
10
Choose Victim Rank
D
E
G
H
I
J
A
C
B
rank0 rank1 rank2
E
D
A
I
G
B
J
H
C
LRU stack
bottom
top Memory Physical Layout
LRU
Request
block
Victim
block
Victim
rankD
E
G
H
I
J
A
C
B
rank0 rank1 rank2
E
D
A
I
G
B
J
H
C
LRU stack
bottom
top Memory Physical Layout
Bottom Most Algorithm (BMA)
Request
block
Victim
block
Victim
rank
Window = 5
victim_rank = 0last_victim_cnt = 3
11
Access Sequence: W, X, Y, Z
D
E
G
H
I
J
A
C
B
rank0 rank1 rank2
E
D
A
I
G
B
J
H
C
LRU stack
bottom
top Memory Physical Layout
LRU
Request
block
Victim
block
Victim
rankD
E
G
H
I
J
A
C
B
rank0 rank1 rank2
E
D
A
I
G
B
J
H
C
LRU stack
bottom
top Memory Physical Layout
Request
block
Victim
block
Victim
rank
Window = 5
victim_rank = 0last_victim_cnt = 3
W A 0 W 0A
Bottom Most Algorithm (BMA)
12
Access Sequence: W, X, Y, Z
E
G
H
I
J
W
B
D
C
rank0 rank1 rank2
E
D
W
I
G
B
J
H
C
LRU stack
bottom
top Memory Physical Layout
LRU
Request
block
Victim
block
Victim
rank
W A 0
rank0 rank1 rank2
E
D
W
I
G
B
J
H
C
LRU stack
bottom
top Memory Physical Layout
Request
block
Victim
block
Victim
rank
W A 0
Window = 5
victim_rank = 0last_victim_cnt = 2
Bottom Most Algorithm (BMA)
E
G
H
I
J
W
B
D
C
X B 1 X D 0
13
Access Sequence: W, X, Y, Z
rank0 rank1 rank2
E
D
W
I
G
X
J
H
C
LRU stack
bottom
top Memory Physical Layout
LRU
Request
block
Victim
block
Victim
rank
W A 0
X B 1
rank0 rank1 rank2
E
X
W
I
G
B
J
H
C
LRU stack
bottom
top Memory Physical Layout
Request
block
Victim
block
Victim
rank
W A 0
X D 0
Window = 5
victim_rank = 0last_victim_cnt = 1
Bottom Most Algorithm (BMA)
G
H
I
J
W
X
C
E
D
G
H
I
J
W
X
B
E
C
Y C 2 Y E 0
14
After Y swapped in
rank0 rank1 rank2
E
D
W
I
G
X
J
H
Y
LRU stack
bottom
top Memory Physical Layout
LRU
Request
block
Victim
block
Victim
rank
W A 0
X B 1
Y C 2
rank0 rank1 rank2
Y
X
W
I
G
B
J
H
C
LRU stack
bottom
top Memory Physical Layout
Request
block
Victim
block
Victim
rank
W A 0
X D 0
Y E 0
Window = 5
victim_rank = 0last_victim_cnt = 0
Bottom Most Algorithm (BMA)
H
I
J
W
X
Y
D
G
E
H
I
J
W
X
Y
B
G
C
15
Choose New Victim Rank
rank0 rank1 rank2
E
D
W
I
G
X
J
H
Y
LRU stack
bottom
top Memory Physical Layout
LRU
Request
block
Victim
block
Victim
rank
W A 0
X B 1
Y C 2
rank0 rank1 rank2
Y
X
W
I
G
B
J
H
C
LRU stack
bottom
top Memory Physical Layout
Request
block
Victim
block
Victim
rank
W A 0
X D 0
Y E 0
Window = 5
victim_rank = 1last_victim_cnt = 3
Bottom Most Algorithm (BMA)
H
I
J
W
X
Y
D
G
E
H
I
J
W
X
Y
B
G
C
16
Access Sequence: W, X, Y, Z
rank0 rank1 rank2
E
D
W
I
G
X
J
H
Y
LRU stack
bottom
top Memory Physical Layout
LRU
Request
block
Victim
block
Victim
rank
W A 0
X B 1
Y C 2
rank0 rank1 rank2
Y
X
W
I
G
B
J
H
C
LRU stack
bottom
top Memory Physical Layout
Request
block
Victim
block
Victim
rank
W A 0
X D 0
Y E 0
Window = 5
victim_rank = 1last_victim_cnt = 3
Bottom Most Algorithm (BMA)
H
I
J
W
X
Y
D
G
E
H
I
J
W
X
Y
B
G
C
Z D 0 Z 1B
17
Outline
Background
New buffer cache replacement algorithm
Evaluation
Conclusion and future work
18
Simulator and I/O Trace Characteristics
19
TPC-C Results
active
↓idle
↑
AMB
DIMM
20
LM-TBE Results
idle
↑
active
↓
AMB
DIMM
21
MSN-BEFS Results
idle
↑
active
↓
AMB
DIMM
22
Outline
Background
New buffer cache replacement algorithm
Evaluation
Conclusion and future work
23
Conclusion and Future Work
The first method to save buffer cache memory energy without
data migration
Limit replacement to smaller set of memory rank without
sacrificing hit rate for memory
Saves up to 27% energy and reduces temperature 5.45 ℃than LRU
Implement this algorithm at Linux Kernel
Evaluate algorithm at real physical machine with I/O intensive
workload
24
Thanks !
25
DRAM Thermal
TrafficforwardingiclocalTraffstaticAMB PPPP
26
Simulation Parameters
Search window 1024 Thresh window 1024 Block size 64KB