Energy and Thermal Aware Buffer Cache Replacement Algorithm · 2010. 5. 8. · 2 Research Summary...

MSST 2010. Jianhui Yue, Yifeng Zhu, Zhao Cai, and Lin Lin. Electrical and Computer Engineering, University of Maine. Energy and Thermal Aware Buffer Cache Replacement Algorithm


MSST 2010

Jianhui Yue, Yifeng Zhu, Zhao Cai, and Lin Lin

Electrical and Computer Engineering

University of Maine

Energy and Thermal Aware Buffer Cache Replacement Algorithm


Research Summary

Memory power consumption

Consumes 50% more power than the CPU on a server

Consumes 2 times more power than disks in a specific supercomputing center

Power dissipation increases cooling cost

Power and thermal aware buffer cache replacement algorithm

Limits replacements to a smaller set of memory ranks without hurting hit rate

Saves up to 27% energy

Reduces temperature by 5.45 ℃ compared with LRU


Outline

Background of memory systems

New buffer cache replacement algorithm

Evaluation

Conclusion and future work


Memory Chip Organization

A DIMM includes one or two ranks; each rank is an independent power management unit


DRAM Power States

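[Figure: DRAM power state diagram for a DDR2 533 MHz 512 Mb device. States shown: Read, Write, Precharge, Powerdown. Power values shown: 144 mW, 396 mW, 405 mW, 12.6 mW. Transition latencies shown: 10 cycles, 2 cycles. Power components shown: inactive/active background power and I/O power.]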

• Each rank includes multiple DDRx devices, which provide data to the data bus in unison

• The rank is the basic memory power management unit


DMA Overlapping

Multiple DMA transfers on different PCI buses access the same memory rank simultaneously in a time-multiplexed way.

Multiplexing multiple slow DMA transmissions onto one memory rank can reduce active background power
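The effect of multiplexing can be illustrated with a small calculation. The sketch below is not from the slides: the transfer intervals are hypothetical, and it assumes that a rank draws active background power for the union of the transfer intervals that target it, and that one rank can absorb both slow streams without lengthening either transfer.

```python
def rank_active_time(transfers):
    """Total time ranks spend active, given (rank, start, end) DMA transfers.

    Assumption: a rank is active for the union of the intervals of the
    transfers that target it, and in a low-power state otherwise.
    """
    by_rank = {}
    for rank, start, end in transfers:
        by_rank.setdefault(rank, []).append((start, end))

    total = 0
    for intervals in by_rank.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:          # overlaps the current busy period
                cur_end = max(cur_end, end)
            else:                         # gap: close the current busy period
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


# Two hypothetical slow DMA transfers that overlap in time (arbitrary units).
separate_ranks = [(0, 0, 100), (1, 20, 120)]  # each keeps its own rank active
same_rank = [(0, 0, 100), (0, 20, 120)]       # both multiplexed onto rank 0

separate_total = rank_active_time(separate_ranks)
same_total = rank_active_time(same_rank)
print(separate_total, same_total)
```

Under these assumptions, directing both transfers to one rank shortens the total rank-active time, which is the quantity that active background power scales with.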

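[Figure: network I/O requests arrive through network adapters and network DMA channels, and disk I/O arrives from the disk array through disk DMA channels; both reach the memory chips. The network I/O DMA and disk I/O DMA transfers, labeled A0 to A3 and B0 to B3, are shown interleaved in time.]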


Fully-Buffered DIMM(FBDIMM)

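[Figure: the memory controller connects to a chain of DIMM 0, DIMM 1, ..., DIMM n (n is up to 7), each with its own AMB (Advanced Memory Buffer). The southbound link carries write data and commands; the northbound link carries read data.]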

1. Increases memory density, reliability, and speed; includes a read data link and a write data link

2. AMB acts as a pass-through switch

3. AMB consumes power



Bottom Most Replacement Algorithm (BMA)

Main Idea

Limit replacements to a smaller number of memory ranks without hurting hit rate much.

Method

Choose as the victim rank the rank with the largest number of cold blocks at the bottom of the LRU stack

Select the victim block from the chosen victim rank, rather than the global LRU block, to save energy

Update the victim rank to adapt to changes in block recency
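The first two steps of the method can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the tie-breaking rule (lowest rank number wins) is an assumption, and the victim search scans the whole stack, whereas the actual algorithm may bound it with a search window. The sample state is the nine-block, three-rank example used in the deck's walkthrough, with the stack order as read from those slides.

```python
from collections import Counter


def choose_victim_rank(lru_stack, rank_of, window):
    """Return (rank, count) for the rank holding the most cold blocks.

    lru_stack is ordered from bottom (least recent) to top (most recent);
    the cold blocks are the `window` blocks at the bottom.
    """
    counts = Counter(rank_of[block] for block in lru_stack[:window])
    # Assumption: ties go to the lowest rank number.
    return min(counts.items(), key=lambda item: (-item[1], item[0]))


def pick_victim(lru_stack, rank_of, victim_rank):
    """Return the least recently used block residing in victim_rank."""
    for block in lru_stack:               # scan upward from the bottom
        if rank_of[block] == victim_rank:
            return block
    return lru_stack[0]                   # fall back to the plain LRU victim


# Example state: block -> rank, and the LRU stack from bottom to top.
rank_of = {"E": 0, "D": 0, "A": 0, "I": 1, "G": 1, "B": 1,
           "J": 2, "H": 2, "C": 2}
lru_stack = ["A", "B", "C", "D", "E", "G", "H", "I", "J"]

victim_rank, last_victim_cnt = choose_victim_rank(lru_stack, rank_of, window=5)
victim = pick_victim(lru_stack, rank_of, victim_rank)
print(victim_rank, last_victim_cnt, victim)
```

With a window of 5, the cold blocks are A, B, C, D, and E, three of which sit in rank 0, so rank 0 is selected and A is the first victim.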


Choose Victim Rank

Initial state, identical for LRU and the Bottom Most Algorithm (BMA):

Memory physical layout: rank0 = {E, D, A}, rank1 = {I, G, B}, rank2 = {J, H, C}

LRU stack (bottom to top): A, B, C, D, E, G, H, I, J

BMA: Window = 5, victim_rank = 0, last_victim_cnt = 3

The bottom five blocks of the stack are A, B, C, D, and E; three of them (A, D, E) reside in rank0, so rank0 becomes the victim rank.


Access Sequence: W, X, Y, Z

State before the first request (LRU and BMA): memory physical layout rank0 = {E, D, A}, rank1 = {I, G, B}, rank2 = {J, H, C}; LRU stack (bottom to top): A, B, C, D, E, G, H, I, J

BMA: Window = 5, victim_rank = 0, last_victim_cnt = 3

LRU: request block W, victim block A, victim rank 0

Bottom Most Algorithm (BMA): request block W, victim block A, victim rank 0


Access Sequence: W, X, Y, Z

State after W is swapped in (LRU and BMA): memory physical layout rank0 = {E, D, W}, rank1 = {I, G, B}, rank2 = {J, H, C}; LRU stack (bottom to top): B, C, D, E, G, H, I, J, W

BMA: Window = 5, victim_rank = 0, last_victim_cnt = 2

Replacements so far (request block, victim block, victim rank): LRU: (W, A, 0); BMA: (W, A, 0)

LRU: request block X, victim block B, victim rank 1

Bottom Most Algorithm (BMA): request block X, victim block D, victim rank 0


Access Sequence: W, X, Y, Z

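State after X is swapped in:

LRU: memory physical layout rank0 = {E, D, W}, rank1 = {I, G, X}, rank2 = {J, H, C}; LRU stack (bottom to top): C, D, E, G, H, I, J, W, X

Bottom Most Algorithm (BMA): memory physical layout rank0 = {E, X, W}, rank1 = {I, G, B}, rank2 = {J, H, C}; LRU stack (bottom to top): B, C, E, G, H, I, J, W, X

BMA: Window = 5, victim_rank = 0, last_victim_cnt = 1

Replacements so far (request block, victim block, victim rank): LRU: (W, A, 0), (X, B, 1); BMA: (W, A, 0), (X, D, 0)

LRU: request block Y, victim block C, victim rank 2

BMA: request block Y, victim block E, victim rank 0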


After Y swapped in

LRU: memory physical layout rank0 = {E, D, W}, rank1 = {I, G, X}, rank2 = {J, H, Y}; LRU stack (bottom to top): D, E, G, H, I, J, W, X, Y

Bottom Most Algorithm (BMA): memory physical layout rank0 = {Y, X, W}, rank1 = {I, G, B}, rank2 = {J, H, C}; LRU stack (bottom to top): B, C, G, H, I, J, W, X, Y

BMA: Window = 5, victim_rank = 0, last_victim_cnt = 0

Replacements so far (request block, victim block, victim rank): LRU: (W, A, 0), (X, B, 1), (Y, C, 2); BMA: (W, A, 0), (X, D, 0), (Y, E, 0)


Choose New Victim Rank

The state is the one reached after Y is swapped in. For BMA, the memory physical layout is rank0 = {Y, X, W}, rank1 = {I, G, B}, rank2 = {J, H, C}, and the LRU stack (bottom to top) is B, C, G, H, I, J, W, X, Y.

BMA: Window = 5, victim_rank = 1, last_victim_cnt = 3

Because last_victim_cnt reached 0, BMA chooses a new victim rank. The bottom five blocks of its stack are B, C, G, H, and I; three of them (B, G, I) reside in rank1, so rank1 becomes the new victim rank.


Access Sequence: W, X, Y, Z

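State before the request for Z:

LRU: memory physical layout rank0 = {E, D, W}, rank1 = {I, G, X}, rank2 = {J, H, Y}; LRU stack (bottom to top): D, E, G, H, I, J, W, X, Y

Bottom Most Algorithm (BMA): memory physical layout rank0 = {Y, X, W}, rank1 = {I, G, B}, rank2 = {J, H, C}; LRU stack (bottom to top): B, C, G, H, I, J, W, X, Y

BMA: Window = 5, victim_rank = 1, last_victim_cnt = 3

Replacements so far (request block, victim block, victim rank): LRU: (W, A, 0), (X, B, 1), (Y, C, 2); BMA: (W, A, 0), (X, D, 0), (Y, E, 0)

LRU: request block Z, victim block D, victim rank 0

BMA: request block Z, victim block B, victim rank 1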
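The whole walkthrough can be replayed in a few lines. The sketch below is an illustration under stated assumptions, not the authors' simulator: the function names are hypothetical, every request is a miss (as in the slide example), ties between ranks go to the lowest rank number, and a new victim rank is chosen only when last_victim_cnt drops to 0. The initial layout and stack order are as read from the slides.

```python
from collections import Counter


def lru_replay(stack, rank_of, requests):
    """Plain LRU: always evict the block at the bottom of the stack."""
    stack, rank_of, log = list(stack), dict(rank_of), []
    for req in requests:
        victim = stack.pop(0)
        rank = rank_of.pop(victim)
        rank_of[req] = rank               # new block reuses the victim's frame
        stack.append(req)                 # most recently used goes on top
        log.append((req, victim, rank))
    return log


def bma_replay(stack, rank_of, requests, window):
    """BMA sketch: evict only from the current victim rank."""
    stack, rank_of, log = list(stack), dict(rank_of), []
    victim_rank, last_victim_cnt = None, 0
    for req in requests:
        if last_victim_cnt == 0:          # pick the rank with most cold blocks
            counts = Counter(rank_of[b] for b in stack[:window])
            victim_rank, last_victim_cnt = min(
                counts.items(), key=lambda kv: (-kv[1], kv[0]))
        # Lowest block in the stack that lives in the victim rank.
        victim = next(b for b in stack if rank_of[b] == victim_rank)
        stack.remove(victim)
        del rank_of[victim]
        rank_of[req] = victim_rank
        stack.append(req)
        last_victim_cnt -= 1
        log.append((req, victim, victim_rank))
    return log


rank_of = {"E": 0, "D": 0, "A": 0, "I": 1, "G": 1, "B": 1,
           "J": 2, "H": 2, "C": 2}
stack = ["A", "B", "C", "D", "E", "G", "H", "I", "J"]   # bottom to top
requests = ["W", "X", "Y", "Z"]

lru_log = lru_replay(stack, rank_of, requests)
bma_log = bma_replay(stack, rank_of, requests, window=5)
print("LRU:", lru_log)
print("BMA:", bma_log)
print("ranks touched:", len({r for _, _, r in lru_log}),
      "vs", len({r for _, _, r in bma_log}))
```

For this access sequence, LRU spreads its four replacements over all three ranks, while BMA confines them to two, which is what lets the remaining rank stay in a low-power state longer.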


Simulator and I/O Trace Characteristics


TPC-C Results

[Figure: TPC-C results. Labels recovered from the plots: active, idle (with a downward arrow), AMB, DIMM.]


LM-TBE Results

[Figure: LM-TBE results. Labels recovered from the plots: idle, active, AMB, DIMM.]


MSN-BEFS Results

[Figure: MSN-BEFS results. Labels recovered from the plots: idle, active, AMB, DIMM.]


Conclusion and Future Work

The first method to save buffer cache memory energy without data migration

Limits replacements to a smaller set of memory ranks without sacrificing hit rate

Saves up to 27% energy and reduces temperature by 5.45 ℃ compared with LRU

Implement the algorithm in the Linux kernel

Evaluate the algorithm on a real physical machine with I/O-intensive workloads


Thanks!


DRAM Thermal

$P_{\mathrm{AMB}} = P_{\mathrm{static}} + P_{\mathrm{localTraffic}} + P_{\mathrm{forwardingTraffic}}$


Simulation Parameters

Search window: 1024

Thresh window: 1024

Block size: 64 KB