Memory Arithmetic Unit Interface

Memory Arithmetic Unit Interface Jason M. Meier Justin S. Teller Tom J. Keeley


Page 1: Memory Arithmetic Unit Interface

Memory Arithmetic Unit Interface

Jason M. Meier, Justin S. Teller, Tom J. Keeley

Page 2: Memory Arithmetic Unit Interface

Current Paradigm

[Diagram: CPU, Memory Controller, DRAM System. Timeline labels: CPU: Task 1, Task 2; MEMORY CTRL; MEMORY; Done: Task 1.]

Page 3: Memory Arithmetic Unit Interface

Active Pages Implementation

• Used Configurable DRAM - RADRAM

• Reconfigurable logic implements various memory functions

• “Active Page” consists of a page of data and a set of associated functions

• Works on individual DRAM chips

• Processor-centric and Memory-centric partitioning

* Active Pages - Oskin, Chong, Sherwood – ISCA ‘98

Page 4: Memory Arithmetic Unit Interface

MAUI Implementation

[Diagram: CPU, Memory Controller, MAUI, MAU, DRAM System. Timeline labels: CPU: Task 1, Task 2; MEMORY CTRL/MAUI: Task 1; MEMORY; Done: Task 1.]

Page 5: Memory Arithmetic Unit Interface

MAUI Instruction Set

LOAD REG

MAUI_LD <m_rd>,offset(<cpu_rs>)

1) CPU sends an MAU_LOAD register command to the MC (along with the reg # and address to read) across the front-side bus.
2) MC interprets command and places a Read command in the transaction queue.
3) DRAM performs read.
4) Result is stored in appropriate register in the MAUI register file.

[Diagram: CPU, Memory Controller, MAUI, MAU, DRAM System, with steps 1 to 4 marked. Timeline rows: CPU, MC/MAUI, DRAM (R).]
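The four LOAD REG steps can be sketched as a small Python model. This is only an illustration under assumptions: the names `MemoryController`, `maui_ld`, and `service_queue`, the dictionary standing in for DRAM, and the four-entry register file are hypothetical and do not come from the slides.

```python
from collections import deque


class MemoryController:
    """Toy model of a memory controller with a MAUI register file (hypothetical)."""

    def __init__(self, dram, num_regs=4):
        self.dram = dram                     # address -> value, stands in for the DRAM system
        self.regs = [None] * (num_regs + 1)  # MAUI register file; index 0 is unused
        self.queue = deque()                 # transaction queue of (op, address, dest_reg)

    def maui_ld(self, reg, address):
        # Steps 1-2: the command arrives over the front-side bus and the
        # controller places a Read in the transaction queue.
        self.queue.append(("R", address, reg))

    def service_queue(self):
        # Steps 3-4: DRAM performs each read and the result is stored
        # in the requested MAUI register.
        while self.queue:
            op, address, reg = self.queue.popleft()
            if op == "R":
                self.regs[reg] = self.dram[address]


mc = MemoryController(dram={0: 7, 5: 35})
mc.maui_ld(1, 0)
mc.maui_ld(2, 5)
mc.service_queue()
print(mc.regs[1], mc.regs[2])  # prints: 7 35
```

The queue is drained in order here; a real controller would interleave these reads with ordinary CPU traffic.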

Page 6: Memory Arithmetic Unit Interface

MAUI Instruction Set II

LOADI REG

MAUI_LDI <rd>,<cpu_rs>

1) CPU sends an MAU_LOADI register command to the MC (along with the reg # and integer to save) across the front-side bus.
2) MC interprets command and places integer in the appropriate register in the MAUI register file.

[Diagram: CPU, Memory Controller, MAUI, MAU, DRAM System, with steps 1 and 2 marked. Timeline rows: CPU, MC/MAUI, DRAM.]
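Unlike LOAD REG, LOADI REG never touches DRAM: the integer travels with the command and goes straight into the register file. A minimal sketch, assuming a hypothetical `MauiRegisterFile` class with four registers (neither the class nor the register count is specified on the slides):

```python
class MauiRegisterFile:
    """Toy MAUI register file; the register count is an assumption."""

    def __init__(self, num_regs=4):
        self.values = {reg: None for reg in range(1, num_regs + 1)}

    def loadi(self, reg, value):
        # Step 2: the controller writes the immediate into the register.
        # No transaction is queued because no DRAM access is needed.
        if reg not in self.values:
            raise ValueError(f"no such MAUI register: r{reg}")
        self.values[reg] = value


regfile = MauiRegisterFile()
regfile.loadi(4, 2)        # e.g. store an array length of 2 in r4
print(regfile.values[4])   # prints: 2
```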

Page 7: Memory Arithmetic Unit Interface

MAUI Instruction Set III

MAU_ADD

MAUI_ADD <rd>,<rs1>,<rs2>,<rsz>

1) CPU invalidates addresses in the cache that fall within the range of the destination array. Addresses within the range of the source arrays are written back if dirty.
2) CPU sends an MAUI_ADD command to the MC (along with the reg #’s) across the front-side bus.
3) MC interprets command, MAUI adds the appropriate registers and places a Write command and next two Read commands in the transaction queue.
4) Step 3 repeats for the length of the array.

[Diagram: CPU, Memory Controller, MAUI, MAU, DRAM System, with steps 1 to 4 marked. Timeline rows: CPU, MC/MAUI, DRAM (R R W).]
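The array add amounts to a read, read, write pattern repeated once per element. The sketch below is a simplified model under assumptions: the function name `maui_add`, the dictionary used as DRAM, and the transaction log are hypothetical, and the transactions are issued strictly in sequence rather than overlapping the write with the next two reads as step 3 describes. Cache invalidation and write-back (step 1) are not modeled.

```python
def maui_add(dram, dst, src1, src2, size):
    """Element-wise add of two arrays in a dict-based DRAM model.

    Returns the DRAM transactions in the order they were issued.
    """
    transactions = []
    for offset in range(size):
        a = dram[src1 + offset]
        transactions.append(("R", src1 + offset))
        b = dram[src2 + offset]
        transactions.append(("R", src2 + offset))
        dram[dst + offset] = a + b
        transactions.append(("W", dst + offset))
    return transactions


dram = {0: 1, 1: 2, 5: 10, 6: 20}
log = maui_add(dram, dst=10, src1=0, src2=5, size=2)
print(dram[10], dram[11])  # prints: 11 22
print(log)
```

The log shows the R R W grouping from the DRAM timeline, once for each array element.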

Page 8: Memory Arithmetic Unit Interface

Issues: Read & Write Locks

Page 9: Memory Arithmetic Unit Interface

Issues: Address Mapping

[Diagram: Virtual Space, TLB, Physical Space.]

Memory that is Contiguous in Virtual Space may not be Contiguous in Physical Space

• MAUI assumes consecutive addressing (size register)

• MAUI operations which cross page boundaries must be split into separate operations for each page

• Programmer will not know mapping scheme

• Result: All MAUI operations will need to be privileged instructions, accessed by programs through a system call.
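Splitting an operation at page boundaries can be sketched as follows. This is an illustration only: the function name `split_by_page` and the 4096-byte default page size are assumptions, and each resulting chunk would still need its own virtual-to-physical translation (for example inside the system call) before being handed to the MAUI.

```python
def split_by_page(start, length, page_size=4096):
    """Split the range [start, start + length) into (start, length)
    chunks, none of which crosses a page boundary."""
    chunks = []
    addr = start
    remaining = length
    while remaining > 0:
        room = page_size - (addr % page_size)  # bytes left in the current page
        n = min(room, remaining)
        chunks.append((addr, n))
        addr += n
        remaining -= n
    return chunks


print(split_by_page(4000, 300))  # prints: [(4000, 96), (4096, 204)]
```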

Page 10: Memory Arithmetic Unit Interface

Issues: Compiler Issues

• The compiler will be responsible for deciding when MAUI instructions should be used.

• This decision will be based on the size of the array, whether it is likely to be in the cache, and whether it is likely to be used by an instruction that isn’t implemented in the MAUI.
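The three criteria for the compiler's decision could be combined in a heuristic like the one below. This is a hypothetical sketch: the function name, its parameters, and especially the `min_len` threshold are placeholders chosen for illustration, not values given on the slides.

```python
def should_use_maui(array_len, likely_cached, ops_supported, min_len=1024):
    """Return True when offloading an array operation to the MAUI looks
    worthwhile. min_len is an arbitrary placeholder, not a measured value."""
    if array_len < min_len:
        return False  # array too small (assumed: setup cost outweighs the gain)
    if likely_cached:
        return False  # data is probably already close to the CPU
    if not ops_supported:
        return False  # array will be used by an instruction the MAUI lacks
    return True


print(should_use_maui(60000, likely_cached=False, ops_supported=True))  # prints: True
print(should_use_maui(60, likely_cached=False, ops_supported=True))     # prints: False
```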

Page 11: Memory Arithmetic Unit Interface

Issues: Task Interrupts

[Diagram: CPU, Memory Controller, MAUI, MAU, DRAM System. Timeline labels: CPU: Task 1, Task 2; MEMORY CTRL/MAUI: Task 1, Task 1; MEMORY; further labels: Task 2, Task 2.]

Page 12: Memory Arithmetic Unit Interface

Example: maui_add I

Instruction: maui_ld r1, 0

[Diagram: BIU, Memory Controller, Transaction Queue, Memory.]

MAUI state:
Size(r4) | Offset
RL1_beg | RL1_end
RL2_beg | RL2_end
WL_beg | WL_end
R1_Data | R1_Addr = 0 | R1_status
R2_Data | R2_Addr | R2_status
R3_Data | R3_Addr | R3_status
MAU_Status = open

Page 13: Memory Arithmetic Unit Interface

Example: maui_add II

Instruction: maui_ld r2, 5

[Diagram: BIU, Memory Controller, Transaction Queue, Memory.]

MAUI state:
Size(r4) | Offset
RL1_beg | RL1_end
RL2_beg | RL2_end
WL_beg | WL_end
R1_Data | R1_Addr = 0 | R1_status
R2_Data | R2_Addr = 5 | R2_status
R3_Data | R3_Addr | R3_status
MAU_Status = open

Page 14: Memory Arithmetic Unit Interface

Example: maui_add III

Instruction: maui_ld r3, 10

[Diagram: BIU, Memory Controller, Transaction Queue, Memory.]

MAUI state:
Size(r4) | Offset
RL1_beg | RL1_end
RL2_beg | RL2_end
WL_beg | WL_end
R1_Data | R1_Addr = 0 | R1_status
R2_Data | R2_Addr = 5 | R2_status
R3_Data | R3_Addr = 10 | R3_status
MAU_Status = open

Page 15: Memory Arithmetic Unit Interface

Example: maui_add IV

Instruction: maui_ld r4, 2

[Diagram: BIU, Memory Controller, Transaction Queue, Memory.]

MAUI state:
Size(r4) = 2 | Offset
RL1_beg | RL1_end
RL2_beg | RL2_end
WL_beg | WL_end
R1_Data | R1_Addr = 0 | R1_status
R2_Data | R2_Addr = 5 | R2_status
R3_Data | R3_Addr = 10 | R3_status
MAU_Status = open

Page 16: Memory Arithmetic Unit Interface

Example: maui_add V

Instruction: maui_add r3, r1, r2

[Diagram: BIU, Memory Controller, Transaction Queue, Memory. Items shown: R, 0; R, 5.]

MAUI state:
Size(r4) = 2 | Offset = 0
RL1_beg = 0 | RL1_end = 1
RL2_beg = 5 | RL2_end = 6
WL_beg = 10 | WL_end = 11
R1_Data | R1_Addr = 0 | R1_status = w
R2_Data | R2_Addr = 5 | R2_status = w
R3_Data | R3_Addr = 10 | R3_status = u
MAU_Status = occupied

Page 17: Memory Arithmetic Unit Interface

Example: maui_add VI

Instruction: maui_add r3, r1, r2*

[Diagram: BIU, Memory Controller, Transaction Queue, Memory. Items shown: Read 10; D1[0].]

MAUI state:
Size(r4) = 2 | Offset = 0
RL1_beg = 1 | RL1_end = 1
RL2_beg = 5 | RL2_end = 6
WL_beg = 10 | WL_end = 11
R1_Data = D1[0] | R1_Addr = 0 | R1_status = f
R2_Data | R2_Addr = 5 | R2_status = w
R3_Data | R3_Addr = 10 | R3_status = u
MAU_Status = occupied

Page 18: Memory Arithmetic Unit Interface

Example: maui_add VII

Instruction: maui_add r3, r1, r2*

[Diagram: BIU, Memory Controller, Transaction Queue, Memory. Items shown: D2[0]; Read 10.]

MAUI state:
Size(r4) = 2 | Offset = 0
RL1_beg = 1 | RL1_end = 1
RL2_beg = 6 | RL2_end = 6
WL_beg = 10 | WL_end = 11
R1_Data = D1[0] | R1_Addr = 0 | R1_status = f
R2_Data = D2[0] | R2_Addr = 5 | R2_status = f
R3_Data | R3_Addr = 10 | R3_status = u
MAU_Status = occupied

Page 19: Memory Arithmetic Unit Interface

Example: maui_add VIII

Instruction: maui_add r3, r1, r2*

[Diagram: BIU, Memory Controller, Transaction Queue, Memory. Items shown: R, 1; R, 6; W,10, D1[0]+D2[0]; Read 10.]

MAUI state:
Size(r4) = 2 | Offset = 1
RL1_beg = 1 | RL1_end = 1
RL2_beg = 6 | RL2_end = 6
WL_beg = 11 | WL_end = 11
R1_Data = D1[0] | R1_Addr = 0 | R1_status = w
R2_Data = D2[0] | R2_Addr = 5 | R2_status = w
R3_Data = D1[0] + D2[0] | R3_Addr = 10 | R3_status = f
MAU_Status = occupied

Page 20: Memory Arithmetic Unit Interface

Example: maui_add IX

Instruction: maui_add r3, r1, r2*

[Diagram: BIU, Memory Controller, Transaction Queue, Memory. Items shown: Write 6, D; D1[1].]

MAUI state:
Size(r4) = 2 | Offset = 1
RL1_beg = NULL | RL1_end = NULL
RL2_beg = 6 | RL2_end = 6
WL_beg = 11 | WL_end = 11
R1_Data = D1[1] | R1_Addr = 0 | R1_status = f
R2_Data | R2_Addr = 5 | R2_status = w
R3_Data | R3_Addr = 10 | R3_status = u
MAU_Status = occupied

Page 21: Memory Arithmetic Unit Interface

Example: maui_add X

Instruction: maui_add r3, r1, r2*

[Diagram: BIU, Memory Controller, Transaction Queue, Memory. Items shown: D2[1]; Write 6, D.]

MAUI state:
Size(r4) = 2 | Offset = 1
RL1_beg = NULL | RL1_end = NULL
RL2_beg = NULL | RL2_end = NULL
WL_beg = 11 | WL_end = 11
R1_Data = D1[1] | R1_Addr = 0 | R1_status = f
R2_Data = D2[1] | R2_Addr = 5 | R2_status = f
R3_Data | R3_Addr = 10 | R3_status = u
MAU_Status = occupied

Page 22: Memory Arithmetic Unit Interface

Example: maui_add XI

Instruction: Next Instruction

[Diagram: BIU, Memory Controller, Transaction Queue, Memory. Items shown: W,10, D1[1]+D2[1].]

MAUI state:
Size(r4) = 2 | Offset = 2
RL1_beg = NULL | RL1_end = NULL
RL2_beg = NULL | RL2_end = NULL
WL_beg = NULL | WL_end = NULL
R1_Data = D1[1] | R1_Addr = 0 | R1_status = u
R2_Data = D2[1] | R2_Addr = 5 | R2_status = u
R3_Data = D1[1] + D2[1] | R3_Addr = 10 | R3_status = f
MAU_Status = free?
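The read-lock (RL) and write-lock (WL) ranges in the walkthrough shrink by one element each time a source value arrives or a result is written, and become NULL once exhausted. The sketch below models that bookkeeping. It rests on an interpretation that the slides do not state outright: that "Read 10" and "Write 6, D" are CPU accesses held back until the matching lock is released. The function names and the blocking rules are assumptions.

```python
def in_range(addr, lock):
    """True if addr falls inside an inclusive (beg, end) lock; None means no lock."""
    return lock is not None and lock[0] <= addr <= lock[1]


def advance(lock):
    """Release the first element of a lock; return None (NULL) once it is empty."""
    beg, end = lock
    return (beg + 1, end) if beg < end else None


def cpu_access_blocked(op, addr, read_locks, write_lock):
    # Assumed rule: nothing may touch an address whose result is still pending.
    if in_range(addr, write_lock):
        return True
    # Assumed rule: a write may not overwrite a source element not yet read.
    if op == "W":
        return any(in_range(addr, lock) for lock in read_locks)
    return False


rl1, rl2, wl = (0, 1), (5, 6), (10, 11)                 # state on slide V
print(cpu_access_blocked("R", 10, [rl1, rl2], wl))      # prints: True
wl = advance(wl)                                        # first result written
print(wl, cpu_access_blocked("R", 10, [rl1, rl2], wl))  # prints: (11, 11) False
rl2 = advance(advance(rl2))                             # both D2 elements read
print(rl2, cpu_access_blocked("W", 6, [rl1, rl2], wl))  # prints: None False
```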

Page 23: Memory Arithmetic Unit Interface

Advantages & Disadvantages

Advantages

• Better performance for DRAM latency bound computations

• Lower latency to DRAM compared to CPU

• Reduced traffic on front-side bus

• Concurrent execution

Disadvantages

• MAUI operates at a lower clock frequency

• Increased compiler complexity

• Increased fabrication costs (More Logic = More $$)

• Recently used data may not be cached

Page 24: Memory Arithmetic Unit Interface

Alternative Implementation

MAUI Occupies its Own Read & Write Bus

[Diagram: CPU, Memory Controller, MAUI, MAU, DRAM System, MAUI Read & Write Bus.]

• GOOD: Eliminate Contention with CPU for DRAM system resources.

• GOOD: Create Circular Data flow resulting in increased performance

• BAD: Need Specialized Triple-Ported DRAM system leading to increased production costs

Page 25: Memory Arithmetic Unit Interface

Test Setup

• Simulated on SimpleScalar version 4.0

• One set of test benches with dual array operations running in both the MAUI and CPU with four different array sizes. This trial was repeated for both shared and independent memory access busses.

• Found up to a 43% speedup!

Page 26: Memory Arithmetic Unit Interface

Results

[Chart: Total CPU Cycles (axis ticks at 10000, 100000, 1000000, 10000000) for 60, 600, 6000, and 60000 Int Arrays. Series: No MAUI, MAUI (Shared Bus), MAUI (Separate Bus).]

Page 27: Memory Arithmetic Unit Interface

Future Enhancements I

• MAU Multi-tasking

• Larger Register File

• More MAUs for Parallelism

• Small Cache

[Diagram: DRAM System, Memory Controller, MAUI, MAUs. Timeline labels: CPU: Task 1, Task 2; MEMORY CTRL/MAUI: Task 1, Task 2, Task 3, Task 3; MEMORY.]

Page 28: Memory Arithmetic Unit Interface

Future Enhancements II

• Better Pipelining

• Larger Register File to Hold Intermediate Results

[Diagram: DRAM System, Memory Controller, MAUI, MAU. MAU_ADD timeline with rows CPU, MC/MAUI, DRAM; DRAM accesses shown: R R W R R R R R R W W.]