Page-based Commands for DRAM Systems

Aamer Jaleel

Brinda Ganesh

Lei Zong

Outline

• Memory System Overview

• Related work

• Experiment setup

• Page level access measurements

• Solution

• Expected Speedup

Processor-Memory Gap

• µProc: 60% / year. Doubles every 1.5 years

• DRAM: 9% / year. Doubles every 10 years

• Processor-Memory Performance Gap: grows 50% / year

Source: http://www.e-insite.net/ednmag
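The growth figures on this slide can be cross-checked with a little arithmetic. The sketch below assumes simple compound annual growth, which the slide does not state; the function names are illustrative only. Under that assumption 60% per year gives a doubling time of about 1.5 years and the CPU/DRAM ratio grows by a little under 50% per year, while 9% per year gives a doubling time closer to 8 years, so the slide's 10-year figure is presumably rounded.

```python
import math


def doubling_time(annual_growth):
    """Years needed to double at a compound annual growth rate (0.60 means 60%)."""
    return math.log(2) / math.log(1 + annual_growth)


def gap_growth(cpu_growth, dram_growth):
    """Annual growth of the processor/DRAM performance ratio."""
    return (1 + cpu_growth) / (1 + dram_growth) - 1


cpu_years = doubling_time(0.60)   # processor improvement rate from the slide
dram_years = doubling_time(0.09)  # DRAM improvement rate from the slide
gap = gap_growth(0.60, 0.09)

print(f"Processor doubles every {cpu_years:.2f} years")
print(f"DRAM doubles every {dram_years:.2f} years")
print(f"Gap grows {gap * 100:.1f}% per year")
```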

Memory Access Time

[Diagram: access path Core, L1, L2, MC, DRAM; label: CPU]

Level   Access Time (cycles)
L1      3
L2      8
DRAM    181

Data for a 1.8 GHz Opteron. Source: www.aceshardware.com/
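To relate these cycle counts to wall-clock time, the sketch below converts them to nanoseconds at the 1.8 GHz clock quoted on the slide. The dictionary and function names are illustrative. The DRAM figure comes out at roughly 100 ns, about 60 times the L1 latency.

```python
CLOCK_HZ = 1.8e9  # 1.8 GHz Opteron, as quoted on the slide

# Access times in CPU cycles, taken from the slide.
LATENCY_CYCLES = {"L1": 3, "L2": 8, "DRAM": 181}


def cycles_to_ns(cycles, clock_hz=CLOCK_HZ):
    """Convert a latency in clock cycles to nanoseconds."""
    return cycles / clock_hz * 1e9


latency_ns = {level: cycles_to_ns(c) for level, c in LATENCY_CYCLES.items()}

for level, ns in latency_ns.items():
    print(f"{level}: {ns:.2f} ns")
```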

Large Size Memory Accesses

• Applications
  – Initialization
  – Data Movement
  – Stream operations

• Operating System
  – Task Creation
  – System Calls
  – Page Allocation, Management

• Functions that would use them
  – Memset, Clear User
  – Memcpy, Copy from User, Copy To User

Experiment Setup

• Workstation based
  – 2.4 GHz P4 (Wonko)
  – 750 MHz PIII (Majikthise)
  – 900 MHz PIII (Jaleel)

• Bochs x86 emulator

• Operating System
  – Linux Kernel v2.4.19

• Applications
  – SPEC2000 Integer benchmarks using glibc-2.2.5

Memset : Count

[Figure: Memset Count, logarithmic axis from 1.00E+00 to 1.00E+08]

Memset : Access Size

[Figure: Average Length and Maximum Length of memset accesses, logarithmic axis from 1.0E+00 to 1.0E+09]

Memset : % Overhead

[Figure: % Memset Time (% Overhead, axis 0 to 25) for vortex, gcc, gzip, perlbmk, twolf, crafty, vpr, bzip2, mcf, parser]

Memcpy : Count

[Figure: Memcpy Count, logarithmic axis from 1.00E+00 to 1.00E+08]

Memcpy : Access Size

[Figure: Average Length and Maximum Length of memcpy accesses for vortex, gcc, gzip, perlbmk, twolf, crafty, vpr, bzip2, mcf, parser; logarithmic axis from 1.0E+00 to 1.0E+09]

Memcpy : % Overhead

[Figure: % Overhead of memcpy, axis 0 to 35]

OS : Memset / Clear User Real-Time Plot

• Behavior over Time

• Frequency of operation

• Access Size

• Operation Duration

• Averages

OS : Memcpy / Copy User Real-Time Plot

• Behavior over Time

• Frequency of operation

• Access Size

• Operation Duration

• Averages

Page based Commands

• Set Page
  – A ← constant

• Copy Page
  – A ← B

• Page level Arithmetic operations
  – A ← B + C
  – A ← B - C
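A minimal software model can make these commands concrete. The sketch below simulates memory as a `bytearray`; the class `PagedMemory` and its method names are hypothetical, and treating the arithmetic as byte-wise with wrap-around at 256 is an assumption, since the slides do not give an element width.

```python
PAGE_SIZE = 4096  # 4 kB page, as in the slides


class PagedMemory:
    """Toy model of a memory that accepts page-granularity commands."""

    def __init__(self, num_pages):
        self.data = bytearray(num_pages * PAGE_SIZE)

    def _page(self, addr):
        # Page commands operate on whole, aligned pages only.
        if addr % PAGE_SIZE != 0:
            raise ValueError("page commands need a page-aligned address")
        if addr < 0 or addr + PAGE_SIZE > len(self.data):
            raise ValueError("address outside simulated memory")
        return slice(addr, addr + PAGE_SIZE)

    def set_page(self, a, value):
        """Set Page: every byte of page A becomes the constant."""
        self.data[self._page(a)] = bytes([value]) * PAGE_SIZE

    def copy_page(self, a, b):
        """Copy Page: page A receives the contents of page B."""
        self.data[self._page(a)] = self.data[self._page(b)]

    def add_pages(self, a, b, c):
        """Page A receives B + C, byte by byte (assumed modulo 256)."""
        pb, pc = self.data[self._page(b)], self.data[self._page(c)]
        self.data[self._page(a)] = bytes((x + y) & 0xFF for x, y in zip(pb, pc))

    def sub_pages(self, a, b, c):
        """Page A receives B - C, byte by byte (assumed modulo 256)."""
        pb, pc = self.data[self._page(b)], self.data[self._page(c)]
        self.data[self._page(a)] = bytes((x - y) & 0xFF for x, y in zip(pb, pc))


mem = PagedMemory(8)
mem.set_page(0x04000, 0)          # the SETPAGE ZERO, 0x04000 example
mem.set_page(0x01000, 7)
mem.set_page(0x02000, 5)
mem.add_pages(0x00000, 0x01000, 0x02000)
mem.sub_pages(0x03000, 0x02000, 0x01000)
mem.copy_page(0x05000, 0x01000)
```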

Page based Commands

[Diagram: SETPAGE ZERO, 0x04000 applied to a 4 kB page in DRAM; labels: Cache, 128 bytes]

Page based Commands Issue

• How do we ensure Memory and Cache Consistency?

How much data is actually in the cache?

Function               % Hit Rate (Boot + Halt)   % Hit Rate (SPEC workload)
Memset                 7.23%                      0.23%
Memcpy (Source)        7.88%                      10.53%
Memcpy (Destination)   < 0.01%                    < 0.01%

Page based Commands Issue

[Diagram: SETPAGE ZERO, 0x04000 applied to a 4 kB page in DRAM]

• DRAM level Page Fragmentation

• Maximum number of rows a page can occupy is 2
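The two-row bound can be checked with a short calculation. The sketch below assumes that a page maps to contiguous DRAM addresses and that a DRAM row is at least as large as a page; the 8 kB row size is a hypothetical value chosen for illustration, and real address interleaving may differ.

```python
PAGE_SIZE = 4096  # 4 kB page, as in the slides
ROW_SIZE = 8192   # hypothetical DRAM row size, not given in the slides


def rows_spanned(start_addr, length, row_size):
    """Number of DRAM rows touched by a contiguous range of bytes."""
    first_row = start_addr // row_size
    last_row = (start_addr + length - 1) // row_size
    return last_row - first_row + 1


# A page aligned inside one row touches a single row.
aligned = rows_spanned(0x04000, PAGE_SIZE, ROW_SIZE)

# A page that straddles a row boundary touches two rows.
straddling = rows_spanned(6144, PAGE_SIZE, ROW_SIZE)

# Sweep many start addresses to find the worst case.
worst = max(rows_spanned(start, PAGE_SIZE, ROW_SIZE)
            for start in range(0, 4 * ROW_SIZE, 64))

print(aligned, straddling, worst)
```

As long as the row is no smaller than the page, a page can cross at most one row boundary, which is where the limit of 2 comes from.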

Solution

• Hardware at Cache Level

• Ability to map s/w pages to h/w pages

Expected Speedup I

Current Implementation

Memset(Address, Length, SetValue)
    EndAddr ← Address + Length
    While (Address < EndAddr)
        Mem[Address] ← SetValue
        Address ← Address + 1

Proposed Implementation

While (Length >= PageSize)
    SetPage(SetValue, Address)
    Length ← Length - PageSize
    Address ← Address + PageSize

Call Memset(Address, Length, SetValue) for the remaining bytes
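The proposed scheme can be sketched in software as follows. Here `set_page` is a stand-in for the proposed SetPage command, which does not exist in real hardware, and memory is simulated with a `bytearray`. The handling of an unaligned start (filling byte by byte up to the next page boundary) is an addition that the slides do not cover.

```python
PAGE_SIZE = 4096  # 4 kB page, as in the slides


def memset_bytewise(mem, address, length, value):
    """Current implementation: one store per byte."""
    end = address + length
    while address < end:
        mem[address] = value
        address += 1


def set_page(mem, value, address):
    """Stand-in for the proposed SetPage command (hypothetical)."""
    mem[address:address + PAGE_SIZE] = bytes([value]) * PAGE_SIZE


def memset_paged(mem, address, length, value):
    """Proposed implementation; returns the number of page commands issued."""
    # Assumption beyond the slides: fill up to the next page boundary first.
    head = min(length, -address % PAGE_SIZE)
    memset_bytewise(mem, address, head, value)
    address += head
    length -= head

    page_commands = 0
    while length >= PAGE_SIZE:
        set_page(mem, value, address)
        length -= PAGE_SIZE
        address += PAGE_SIZE
        page_commands += 1

    # Remainder shorter than a page falls back to the byte-wise loop.
    memset_bytewise(mem, address, length, value)
    return page_commands


mem = bytearray(5 * PAGE_SIZE)
ref = bytearray(5 * PAGE_SIZE)
issued = memset_paged(mem, 100, 3 * PAGE_SIZE, 0xAB)
memset_bytewise(ref, 100, 3 * PAGE_SIZE, 0xAB)
```

In this run the unaligned start means only two of the three pages' worth of data can be written with page commands; the rest goes through the byte-wise path.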

Expected Speedup II

• Current Memset time for a page: 4 µs

• Expected Memset time for a page

    = (# rows in a page × time to read a row) + cache coherence logic + misc

    = 2 × 100 ns + X

    = 200 ns + X

  where X is the cache coherence and miscellaneous overhead
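The estimate above leaves the overhead X open, so the achievable speedup depends on it. The sketch below takes the current per-page memset time as 4 µs (an assumption about the unit) and evaluates the speedup for a few hypothetical values of X; these values are illustrative, not measurements.

```python
CURRENT_NS = 4000.0  # current memset time per page, read as 4 µs (assumption)


def expected_page_memset_ns(overhead_ns, rows_per_page=2, row_time_ns=100.0):
    """Expected time: rows in a page times row access time, plus overhead X."""
    return rows_per_page * row_time_ns + overhead_ns


def speedup(overhead_ns, current_ns=CURRENT_NS):
    """Ratio of current to expected per-page memset time."""
    return current_ns / expected_page_memset_ns(overhead_ns)


# Hypothetical overheads X in nanoseconds.
for x in (0.0, 200.0, 800.0):
    print(f"X = {x:.0f} ns: speedup = {speedup(x):.1f}x")
```

With X = 0 the bound is 20x; the benefit shrinks as the coherence and miscellaneous overhead grows.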

Related Work

• IRAM – On-chip DRAM
  – Advantage: bigger storage, eliminates much of the off-chip memory access, energy efficient
  – Disadvantage: not much performance increase, doesn't work with conventional microprocessors

• Active page – bring computation to DRAM
  – break the memory into fixed-size pages and add reconfigurable logic to DRAM

• Heap paper shows some memory accesses that can be eliminated entirely

Conclusion

• Page-based commands are necessary.