August 8th, 2011
Kevan Thompson
Creating a Scalable Coherent L2 Cache
Outline
Motivation
Cache Background
System Overview
Methodology
Progress
Future Work
Goal
Create a configurable shared Last Level Cache for use in the PolyBlaze system
Motivation
Introduction
[Figure labels: Zia, Eric, Kevan]
In modern systems, processors outperform main memory, creating a bottleneck
This problem is only exacerbated as more cores contend for the memory
This problem is reduced if each processor maintains a local copy of the data
Cache Background
A cache is a small amount of memory on the same die as the processor
The cache is capable of providing a lower latency and a higher throughput than the main memory
Systems may include multiple cache levels
The smallest and most local cache is the L1 cache. The next level is the L2 cache, and so on
Caches
Shared Last Level Cache
Acts as a common location for data
Can be used to maintain cache coherency between processors
Does not exist in current MicroBlaze system
We will design our own shared L2 Cache to maintain cache coherency
Cache Speeds
In typical systems:
An L1 cache is very fast (1 or 2 cycles)
An L2 cache is slower (tens of cycles)
Main memory is very slow (hundreds of cycles)
Cache Speeds
In our system we expect:
The L1 cache to be very fast (1 or 2 cycles)
The L2 cache to take about 10 cycles
Main memory to be faster than in typical systems (tens of cycles)
In order to model the memory bottleneck of a much faster system we’ll need to stall the Main Memory
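A rough way to see why the stall is needed is to compare average access times under the two latency profiles. This is only a sketch: the hit rates and the exact cycle counts below are illustrative assumptions, not measurements from PolyBlaze.

```python
def average_access_cycles(l1_hit, l2_hit, l1_cycles, l2_cycles, mem_cycles):
    """Average memory access time (in cycles) for a two-level cache hierarchy."""
    # Cost of an L1 miss: always pay the L2 latency, plus main memory on an L2 miss.
    l1_miss_penalty = l2_cycles + (1.0 - l2_hit) * mem_cycles
    return l1_cycles + (1.0 - l1_hit) * l1_miss_penalty

# Assumed hit rates (hypothetical): 95% in L1, 80% in L2.
L1_HIT, L2_HIT = 0.95, 0.80

# Assumed latencies: a "typical" system vs. the slower-clocked FPGA system.
TYPICAL_MEM_CYCLES = 200
FPGA_MEM_CYCLES = 30

typical = average_access_cycles(L1_HIT, L2_HIT, 2, 20, TYPICAL_MEM_CYCLES)
fpga = average_access_cycles(L1_HIT, L2_HIT, 2, 10, FPGA_MEM_CYCLES)

# Extra stall cycles that would make the FPGA memory look like the typical one.
stall_cycles = TYPICAL_MEM_CYCLES - FPGA_MEM_CYCLES

print(f"typical: {typical:.2f} cycles, fpga: {fpga:.2f} cycles, stall: {stall_cycles}")
```

With these assumed numbers the unstalled FPGA memory hides most of the bottleneck, which is the effect the added stall is meant to restore.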
Direct Mapped Cache
Caches store Data, a Valid Bit and a unique identifier called a tag
Tags
As an example, imagine a system with the following:
32-bit Address Bus, and 32-bit Word Size
64-KByte Cache with 32-Byte Line Size
Therefore we have 2048 (2^11) Lines
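The arithmetic for this example can be checked with a short sketch. It assumes a direct mapped cache and byte addressing; the function name `split_address` is made up for illustration.

```python
ADDRESS_BITS = 32
CACHE_BYTES = 64 * 1024   # 64-KByte cache
LINE_BYTES = 32           # 32-Byte line size

num_lines = CACHE_BYTES // LINE_BYTES                 # number of cache lines
offset_bits = LINE_BYTES.bit_length() - 1             # log2(32)
index_bits = num_lines.bit_length() - 1               # log2(number of lines)
tag_bits = ADDRESS_BITS - index_bits - offset_bits    # remaining upper bits

def split_address(addr):
    """Split a byte address into (tag, index, offset) for a direct mapped cache."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> offset_bits) & (num_lines - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

print(num_lines, offset_bits, index_bits, tag_bits)
print([hex(field) for field in split_address(0x12345678)])
```

So each 32-bit address breaks down into a 16-bit tag, an 11-bit line index, and a 5-bit byte offset.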
Set-Associative Cache
A cache with n possible entries for each address is called an n-way set-associative cache
4-Way Set-Associative Cache
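A lookup in such a cache compares the tag against every Way of the selected set. The sketch below models only the tags (no data) and reuses the 64-KByte, 32-Byte-line sizes from the earlier example as assumed defaults; the class and method names are hypothetical.

```python
class SetAssociativeCache:
    """Tag-only model of an n-way set-associative cache."""

    def __init__(self, cache_bytes=64 * 1024, line_bytes=32, ways=4):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = cache_bytes // (line_bytes * ways)
        # tags[set][way] holds the stored tag, or None when the entry is invalid.
        self.tags = [[None] * ways for _ in range(self.num_sets)]

    def locate(self, addr):
        """Return (tag, set index) for a byte address."""
        line = addr // self.line_bytes
        return line // self.num_sets, line % self.num_sets

    def lookup(self, addr):
        """Return the Way that holds addr, or None on a miss."""
        tag, index = self.locate(addr)
        for way, stored in enumerate(self.tags[index]):
            if stored is not None and stored == tag:
                return way
        return None

    def fill(self, addr, way):
        """Place the line containing addr into the given Way of its set."""
        tag, index = self.locate(addr)
        self.tags[index][way] = tag

cache = SetAssociativeCache()
cache.fill(0x0000, way=0)
cache.fill(0x4000, way=1)   # maps to the same set as 0x0000, different tag
print(cache.num_sets, cache.lookup(0x0000), cache.lookup(0x4000), cache.lookup(0x8000))
```

Two addresses that would conflict in a direct mapped cache can sit side by side in different Ways of the same set.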
Replacement Policies
When an entry needs to be evicted from the cache we need to decide which Way it is evicted from.
To do this we use a replacement policy
LRU
Clock
FIFO
LRU
Keep track of when each entry is accessed
Always evict the Least Recently Used
Implemented using a stack
[Figure: LRU stack ordered from MRU to LRU, updated after Access 4 and Access 2]
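A minimal sketch of the stack for one set, assuming Ways are numbered from 0 and the stack starts in Way order; the class name `LRUStack` is hypothetical and this is a software model, not the hardware implementation.

```python
class LRUStack:
    """LRU bookkeeping for one set: index 0 is MRU, the last entry is LRU."""

    def __init__(self, ways):
        self.order = list(range(ways))

    def access(self, way):
        # Move the accessed Way to the top (MRU position) of the stack.
        self.order.remove(way)
        self.order.insert(0, way)

    def victim(self):
        # The bottom of the stack is the Least Recently Used Way.
        return self.order[-1]

stack = LRUStack(4)
stack.access(3)
stack.access(1)
print(stack.order, "evict Way", stack.victim())
```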
Clock
For each Way we store a Reference Bit
Also store a pointer to the oldest entry (Hand)
Starting with the Hand we test and clear each R Bit until we reach one that is 0
[Figure: Reference Bits for Ways 0 to 3]
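The Clock search can be sketched as follows for a single set. Advancing the Hand past the chosen victim is an assumption of this sketch, and the class name `ClockPolicy` is hypothetical.

```python
class ClockPolicy:
    """Clock replacement for one set: one Reference Bit per Way plus a Hand."""

    def __init__(self, ways):
        self.ref = [0] * ways   # Reference (R) Bit for each Way
        self.hand = 0           # points at the oldest entry

    def access(self, way):
        # A hit (or a fill) marks the Way as recently referenced.
        self.ref[way] = 1

    def victim(self):
        # Test and clear R Bits from the Hand until one is found that is 0.
        while self.ref[self.hand] == 1:
            self.ref[self.hand] = 0
            self.hand = (self.hand + 1) % len(self.ref)
        chosen = self.hand
        # Assumption: the Hand moves on so the new entry is not tested first.
        self.hand = (self.hand + 1) % len(self.ref)
        return chosen

clock = ClockPolicy(4)
clock.access(0)
clock.access(1)
first_victim = clock.victim()
print(first_victim, clock.ref, clock.hand)
```

The loop always terminates, since after at most one full sweep every R Bit has been cleared.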
System Overview
PolyBlaze L2 Cache
1- to 16-Way Set-Associative Cache
LRU or Clock Replacement Policy
32 or 64 Byte Line Width
64 Bit Memory Interface
Write Back Cache
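These options could be captured in a small configuration record, sketched here as a software model. The class name `L2Config`, its field names, and the default values are hypothetical; only the allowed ranges come from the list above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class L2Config:
    ways: int = 4                  # 1 to 16 Ways
    policy: str = "LRU"            # "LRU" or "Clock"
    line_bytes: int = 32           # 32 or 64 Byte line width
    mem_interface_bits: int = 64   # 64 Bit memory interface

    def __post_init__(self):
        if not 1 <= self.ways <= 16:
            raise ValueError("ways must be between 1 and 16")
        if self.policy not in ("LRU", "Clock"):
            raise ValueError("policy must be 'LRU' or 'Clock'")
        if self.line_bytes not in (32, 64):
            raise ValueError("line_bytes must be 32 or 64")

    @property
    def beats_per_line(self):
        """Memory-interface transfers needed to move one full cache line."""
        return self.line_bytes * 8 // self.mem_interface_bits

print(L2Config().beats_per_line, L2Config(line_bytes=64).beats_per_line)
```

With a 64 Bit interface, a 32 Byte line takes 4 transfers and a 64 Byte line takes 8.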
L2 Cache
Reuse Policy
Determines which Way is evicted on Cache Miss
Currently uses LRU Policy
Tag Bank
Contains Tags and Valid Bits
Stored on FPGA using BRAMs
Instantiate one bank for each Way
Control Unit
Finite State Machine for L2 Cache Pipelining
While a request to the NPI is outstanding, we can service other requests from the SRAM
Data Bank
Control interface for off-chip SRAM
SRAM
32-bit ZBT synchronous SRAM
1 MB
Methodology
Break the L2 cache into three parts and test each separately, then combine them and test the complete system
SRAM Controller
NPI Interface
L2 Core
Complete L2 Cache
SRAM Controller
Create a wrapper that connects the SRAM controller to the MicroBlaze by an FSL
Write a program that writes data to, and reads it back from, every address in the SRAM
√ Write all 1’s
√ Write all 0’s
√ Alternate writing all 1’s and all 0’s
√ Write Random data
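The four patterns can be sketched against a software model of the memory. The real test runs on the MicroBlaze over an FSL, so the `write_word` and `read_word` callbacks here are hypothetical stand-ins, and "alternate" is assumed to mean alternating between consecutive addresses.

```python
import random

def pattern_tests(num_words, word_bits=32, seed=0):
    """Build the four test patterns as lists of words, one per address."""
    ones = (1 << word_bits) - 1
    rng = random.Random(seed)   # fixed seed so a failure can be reproduced
    return {
        "all_ones": [ones] * num_words,
        "all_zeros": [0] * num_words,
        "alternating": [ones if i % 2 == 0 else 0 for i in range(num_words)],
        "random": [rng.getrandbits(word_bits) for _ in range(num_words)],
    }

def run_memory_test(write_word, read_word, num_words):
    """Write each pattern to every address, read it back, and list mismatches."""
    failures = []
    for name, data in pattern_tests(num_words).items():
        for addr, value in enumerate(data):
            write_word(addr, value)
        for addr, value in enumerate(data):
            if read_word(addr) != value:
                failures.append((name, addr))
    return failures

# Stand-in memory: a dictionary instead of the SRAM controller.
memory = {}
print(run_memory_test(memory.__setitem__, memory.__getitem__, num_words=64))
```

Writing the whole pattern before reading any of it back is what catches addressing faults, where a write to one location disturbs another.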
NPI Interface
Uses a custom FSL width, so we cannot test using MicroBlaze
Create a hardware test bench that writes data to, and reads it back from, every address
X Write all 1’s
X Write all 0’s
X Alternate writing all 1’s and all 0’s
X Write Random data
L2 Core
X Simulate the core of the L2 cache in iSim
X Write a test bench that will approximate the responses from the L1/L2 Arbiter, SRAM Controller, and NPI Interface
X The test bench will write to each line multiple times to create a large number of cache misses
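One way to generate such a miss-heavy access stream is to cycle through one more conflicting line per set than there are Ways. This is a sketch of the idea in software only, not the iSim test bench; the function name and parameters are hypothetical.

```python
def conflict_addresses(num_sets, line_bytes, ways, rounds=2):
    """Yield byte addresses that keep evicting lines from every set.

    Each set is touched by ways + 1 different lines, so at least one line
    no longer fits and must be evicted on every pass.
    """
    stride = num_sets * line_bytes   # distance between lines sharing a set
    for _ in range(rounds):
        for index in range(num_sets):
            for alias in range(ways + 1):
                yield alias * stride + index * line_bytes

# Tiny example: 2 sets, 32-Byte lines, direct mapped (1 Way), one round.
print(list(conflict_addresses(num_sets=2, line_bytes=32, ways=1, rounds=1)))
```

Under an LRU policy this cyclic pattern should miss on every access, because the line about to be reused is always the one that was just evicted.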
Complete L2 Cache
X Combine the L2 Cache with the rest of PolyBlaze
X Write test programs to read and write to various regions of memory
Current Progress
SRAM Controller and Data Bank:
Designed and Tested
NPI Interface:
Testing and Debugging in Progress
L2 Core:
Testing and Debugging in Progress
Future Work
Add Clock Replacement Policy to L2 Cache
Add a Write Back Buffer to L2 Cache
Migrate System from XUPV5 to a BEE3 so we can create a system with more cores
Modify the L2 Cache into a NUMA system
Add Custom Hardware Accelerators to PolyBlaze
Questions?