EECS 470 Cache and Memory Systems Lecture 14 Coverage: Chapter 5.

Posted: 21-Dec-2015

Page 1:

EECS 470 Cache and Memory Systems

Lecture 14 Coverage: Chapter 5

Page 2:

Cache Modeling

• Components:
  – Storage modules

    module mem32x64 (clk, addr, data_in, rd_wrbar, data_out);
      input         clk;       // Clock input
      input  [4:0]  addr;      // data address
      input  [63:0] data_in;   // write data
      input         rd_wrbar;  // 1'b1 does a read, 1'b0 does a write
      output [63:0] data_out;  // read data
      // All writes occur at the negative clock edge
      // All read data is available within 1 clock cycle
    endmodule
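The storage module's behavior can be prototyped before writing RTL. Below is a minimal Python sketch of the mem32x64 interface from the slide (32 entries of 64 bits, 5-bit address, write-on-negedge, read data back within one cycle); the clocking is simplified to a single method call per negative edge, which is an assumption of this model, not part of the Verilog spec.

```python
class Mem32x64:
    """Behavioral sketch of the mem32x64 module: 32 entries x 64 bits."""
    def __init__(self):
        self.storage = [0] * 32                  # 32 words of 64 bits each

    def negedge(self, addr, data_in, rd_wrbar):
        """Model one negative clock edge; returns read data, or None on a write."""
        addr &= 0x1F                             # 5-bit address, as in [4:0]
        if rd_wrbar:                             # 1'b1 does a read
            return self.storage[addr]
        self.storage[addr] = data_in & 0xFFFFFFFFFFFFFFFF   # 1'b0 does a write
        return None

mem = Mem32x64()
mem.negedge(3, 0xDEADBEEF, rd_wrbar=0)           # write entry 3
print(hex(mem.negedge(3, 0, rd_wrbar=1)))        # read it back -> 0xdeadbeef
```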

Page 3:

Cache Modeling

• Components:
  – Cache line overhead
    • Tag storage (1..n bits)
    • Valid bit (1 bit)
    • Dirty bit (0 or 1 bit)
  – Set overhead
    • Nothing for direct mapped
    • LRU state machine for set associative (victim)
  – Internal organization
    • Muxes, decoders, tag-compare logic
  – External interface
    • Command bus, data buses, clock, protocol, interaction with pipeline
    – 64-bit bus to processor, 64-bit bus to memory, blocking?
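The per-line overhead items above can be tallied for a concrete configuration. The sketch below uses the homework-style 4 KB direct-mapped cache with 32-byte lines; the 32-bit address width is an assumption for illustration, not something the slide states.

```python
# Assumptions: 32-bit addresses (illustrative), 4 KB direct-mapped, 32-byte lines.
ADDR_BITS  = 32
CACHE_SIZE = 4 * 1024        # bytes
LINE_SIZE  = 32              # bytes

lines       = CACHE_SIZE // LINE_SIZE                 # 128 lines
offset_bits = LINE_SIZE.bit_length() - 1              # 5 bits select a byte in the line
index_bits  = lines.bit_length() - 1                  # 7 bits select the line
tag_bits    = ADDR_BITS - index_bits - offset_bits    # 20 bits of tag

overhead_per_line = tag_bits + 1 + 1                  # tag + valid + dirty = 22 bits
data_per_line     = LINE_SIZE * 8                     # 256 bits of data
print(lines, tag_bits, overhead_per_line, data_per_line)   # 128 20 22 256
```

So under these assumptions the overhead is 22 bits on top of every 256 data bits, roughly 8.6% of the storage.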

Page 4:

Cache Modeling

• LRU state logic
  – 2 lines is easy
    • Only two states (line 0 is LRU; line 1 is LRU)
    • Single bit (set on access to line 0; reset on access to line 1)
  – 4 lines is tougher
    • How many states?
      – Enumerate them
        » L0, L1, L2, L3
        » L0, L1, L3, L2
        » …
        » L3, L2, L1, L0
      – Build a state machine to recognize the state.
        » How do you update the state if L1 is referenced?
        » How is a victim cache different?
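The enumeration above is the set of orderings of the four lines, so there are 4! = 24 states. A Python sketch of the bookkeeping (keeping the ordering as a list from MRU to LRU rather than as an explicit state machine, which is a modeling choice, not the hardware encoding):

```python
from itertools import permutations

# 4 lines -> 4! = 24 possible recency orderings (the states on the slide)
assert len(list(permutations(range(4)))) == 24

state = [0, 1, 2, 3]          # state[0] = most recently used, state[-1] = LRU

def access(state, line):
    """On a reference to `line`, move it to the MRU position."""
    state.remove(line)
    state.insert(0, line)

def victim(state):
    """The replacement victim is the LRU line."""
    return state[-1]

access(state, 1)              # "How do you update the state if L1 is referenced?"
print(state, victim(state))   # [1, 0, 2, 3] 3
```

In hardware this list would be encoded in ceil(log2(24)) = 5 bits of state per set, with update logic for each possible reference.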

Page 5:

Homework #4

2. (40 pts) Extend the Verisimple pipeline from HW3 5b (or 5a, or Verisimple) to include split instruction and data caches. Both caches should be able to handle indeterminate memory latencies and interface to the unchanged memory provided with the pipeline. They only need to support 8-byte reads and writes. Both the I-cache and the D-cache should be 4 KB direct-mapped with 32-byte lines. The D-cache also includes a 4-line victim cache (fully associative, LRU, with 32-byte lines).
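The 4-line victim cache in the assignment is small enough to model directly. A hedged sketch, using an ordered map as the LRU structure; the tag values and the insert-on-eviction policy shown here are simplified illustrations, not the required HW4 interface:

```python
from collections import OrderedDict

class VictimCache:
    """Fully associative, LRU, 4 entries: holds lines evicted from the D-cache."""
    def __init__(self, entries=4):
        self.entries = entries
        self.lines = OrderedDict()            # tag -> line data; first item = LRU

    def lookup(self, tag):
        """On a hit the line becomes MRU and its data is returned; else None."""
        if tag in self.lines:
            self.lines.move_to_end(tag)       # refresh recency order
            return self.lines[tag]
        return None

    def insert(self, tag, data):
        """Install a line evicted from the D-cache, displacing the LRU entry if full."""
        if tag in self.lines:
            self.lines.move_to_end(tag)
        elif len(self.lines) >= self.entries:
            self.lines.popitem(last=False)    # evict the LRU entry
        self.lines[tag] = data

vc = VictimCache()
for t in range(5):                            # 5 inserts into 4 entries
    vc.insert(t, f"line{t}")
print(vc.lookup(0), vc.lookup(4))             # None line4  (tag 0 was evicted)
```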

Page 11:

Issues with DRAM

• Density is good, since each bit needs only a single transistor.

• The cells lose their bit over time -> need for refresh
  – Refresh is done by reading each row
  – If an access occurs during a refresh, the access must wait (~5% of the time)
  – The memory must have a signal saying when data is available
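The ~5% figure above can be checked with back-of-the-envelope arithmetic. The row count, refresh period, and per-row refresh time below are illustrative assumptions (in the range of older DRAM generations), not values from the slide:

```python
# All three parameters are assumptions chosen for illustration.
rows             = 512       # rows that must each be refreshed
refresh_period   = 8e-3      # every row refreshed within 8 ms
row_refresh_time = 800e-9    # one row refresh takes ~800 ns

# Fraction of time the array is busy refreshing (and accesses must wait)
busy_fraction = rows * row_refresh_time / refresh_period
print(f"{busy_fraction:.1%}")   # 5.1%
```

With denser, faster refresh circuits the fraction drops; the point of the calculation is that refresh steals a fixed, nontrivial slice of array bandwidth.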

Page 16:

An Alternate Approach

• Split-transaction bus protocol
  – Add more state to the controller in memory to enable multiple independent accesses to the memory chip.
  – Use a synchronous protocol so the address does not need to be acked
  – Accesses can be reads or writes of various sizes
  – Memory will send a signal when data is available (in order)

• Example: RAMBUS
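The protocol above can be sketched as a simple cycle-driven model: requests are accepted immediately with no ack, and the memory completes them in order after some latency, signaling when data is available. This is a generic sketch of the split-transaction idea; it does not model RAMBUS specifics, and the latencies are arbitrary.

```python
from collections import deque

class SplitTransactionMemory:
    """Accepts multiple outstanding requests; completes them in order."""
    def __init__(self):
        self.mem = {}
        self.pending = deque()        # in-flight: (ready_cycle, op, addr, data)
        self.cycle = 0

    def request(self, op, addr, data=None, latency=4):
        """Accept a read/write immediately; synchronous, so no ack is needed."""
        self.pending.append((self.cycle + latency, op, addr, data))

    def tick(self):
        """Advance one cycle; return (addr, data) when a response is ready."""
        self.cycle += 1
        if self.pending and self.pending[0][0] <= self.cycle:
            _, op, addr, data = self.pending.popleft()   # in-order completion
            if op == "write":
                self.mem[addr] = data
                return (addr, None)
            return (addr, self.mem.get(addr, 0))         # "data available" signal
        return None

m = SplitTransactionMemory()
m.request("write", 0x40, 7, latency=2)
m.request("read", 0x40, latency=3)       # issued while the write is still in flight
results = [m.tick() for _ in range(6)]
print([r for r in results if r])         # [(64, None), (64, 7)]
```

The controller state here is just the pending queue; a real chip also tracks banks and timing constraints, which is exactly the "more state in the memory controller" the slide refers to.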
