Post on 17-Jan-2018
Understanding and Implementing Cache Coherency Policies
CSE 8380: Parallel and Distributed Processing
Dr. Hesham El-Rewini
Presented by Fazela Vohra (fvohra@mail.smu.edu)
Graduate Student, CSE
Southern Methodist University
Goals
• Create a pure software cache system as a test bed.
• Implement five cache write policies for maintaining coherency on the test bed.
• Perform experiments and test different scenarios.
• Gather statistics, take measurements, and draw conclusions.
Cache Basics
• A cache is a small store placed between a processor and its main memory in a shared memory system.
• It is a faster, volatile store.
• It exploits locality of reference.
– Spatial locality: locations neighboring an accessed location have a higher chance of being accessed.
– Temporal locality: once accessed, a location is likely to be accessed repeatedly over time.
• Hit: the event in which the data to be read is already in the cache.
• A larger number of hits gives better throughput.
Issues
• Multiple copies of a datum exist.
• Copies of cached items must be kept in sync.
• Syncing should not degrade the performance or throughput of the system.
Project Details
• Implement various cache policies.
• Tinker with tunables to understand their effects on the system.
• Measure the performance/effectiveness of the policies, NOT of the algorithms or the implementation.
• Software written in C on the Windows operating system.
Model of the System
[Block diagram. Input: I/O load, policy parameter. System: processing units, caches, policies, main memory. Diagnostic output: cache and main memory dumps.]
Inputs and Outputs
• The input is given through a file which contains:
– I/O type (0=Read, 1=Write).
– I/O address.
– Processor to perform the I/O on.
– The data to be written, for the basic system where no computations are performed.
• A parser converts the input to actual I/O.
• Policies can be specified by the user.
• Observe dumps of cache/main memory to verify functionality.
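The slides do not give the exact syntax of the input file, so the following is only a sketch of what the parser step might look like. It assumes one record per line with whitespace-separated fields in the order listed above (type, address, processor, data); the names `io_request` and `parse_io_line` are hypothetical.

```c
#include <stdio.h>

/* Hypothetical record for one line of the input file. */
typedef struct {
    int type;            /* 0 = read, 1 = write */
    unsigned addr;       /* main-memory address of the I/O */
    int cpu;             /* processor that performs the I/O */
    unsigned char data;  /* byte to write; ignored for reads */
} io_request;

/* Parse "type addr cpu [data]" (assumed layout).
   Returns 1 on success, 0 on a malformed line. */
int parse_io_line(const char *line, io_request *req)
{
    unsigned data = 0;
    int n = sscanf(line, "%d %u %d %u",
                   &req->type, &req->addr, &req->cpu, &data);
    if (n < 3 || (req->type != 0 && req->type != 1))
        return 0;
    if (req->type == 1 && n < 4)
        return 0;        /* a write needs its data byte */
    if (data > 255)
        return 0;        /* locations are one byte wide */
    req->data = (unsigned char)data;
    return 1;
}
```

Under these assumptions, a line such as `1 12 2 200` would mean "processor 2 writes the byte 200 to address 12", and `0 5 1` would mean "processor 1 reads address 5".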
Assumptions and Simplifications
• Inputs are small sequences of reads and writes.
• Use small caches to create maximum activity.
• Memory and cache locations are byte wide.
• All caches have the same write policy configured at any point in time.
• Each cache entry has the following structure: DATA | ADDR | STATUS
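As an illustration of the DATA/ADDR/STATUS entry, a cache entry and a lookup could be declared roughly as follows. This is a sketch, not the original source: the type and function names are hypothetical, only two states are shown (the set of states depends on the policy), and a linear search over a fully associative cache is an assumption.

```c
#include <stddef.h>

enum status { INVALID = 0, VALID = 1 };  /* states vary by policy; two shown */

typedef struct {
    unsigned char data;   /* DATA: one byte, since locations are byte wide */
    unsigned addr;        /* ADDR: main-memory address this entry mirrors */
    enum status status;   /* STATUS: coherency state of the copy */
} cache_entry;

/* Return the entry holding a usable copy of addr, or NULL on a miss. */
cache_entry *cache_lookup(cache_entry *cache, size_t n, unsigned addr)
{
    for (size_t i = 0; i < n; i++)
        if (cache[i].status != INVALID && cache[i].addr == addr)
            return &cache[i];
    return NULL;
}
```

A non-NULL result corresponds to a hit; an entry whose address matches but whose status is INVALID is treated as a miss.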
Policies Implemented
• Write Through – Write Invalidate
• Write Back – Write Invalidate
• Write Once
• Write Update – Partial Write Through
• Write Back – Write Update
Policy 1: WRITE THROUGH WRITE INVALIDATE
STATES
• VALID: copy consistent with main memory.
• INVALID: copy inconsistent with main memory.
READ
• HIT: Read the copy found in cache. Done!
• MISS: If any other cache has a valid copy, get it from that cache; if no other cache has one, go to global memory. Replacement is required if there is no space to accommodate the incoming copy. Since the cache is always consistent with main memory, no write back is required. STATUS=VALID.
WRITE
• HIT: Write over the copy found in cache. Update global memory and invalidate other caches. STATUS=VALID.
• MISS: If any other cache has a valid copy, get it from that cache; if no other cache has one, go to global memory. Write the new data over this copy. Update global memory. Invalidate others. Replacement may be needed if there is no space. No write back. STATUS=VALID.
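The read and write paths of this policy can be sketched in C as below. This is a minimal illustration, not the original test bed: the sizes, the names (`wt_read`, `wt_write`, `mem_accesses`) and the placeholder replacement rule (first free slot, else slot 0) are assumptions. On a write miss the sketch skips the fetch, because a byte-wide location is fully overwritten, so its memory-access count may differ from the one measured in the slides.

```c
#define NCPU  2
#define CSIZE 2
#define MSIZE 8

enum { INVALID, VALID };
typedef struct { unsigned char data; unsigned addr; int status; } entry;

static entry cache[NCPU][CSIZE];   /* zero-initialised: all INVALID */
static unsigned char mem[MSIZE];   /* global (main) memory */
static int mem_accesses;           /* number of global memory accesses */

static entry *find(int cpu, unsigned addr)
{
    for (int i = 0; i < CSIZE; i++)
        if (cache[cpu][i].status == VALID && cache[cpu][i].addr == addr)
            return &cache[cpu][i];
    return 0;
}

/* Pick a slot for an incoming copy: a free one, else slot 0 (placeholder
   rule). No write back is needed: valid copies always match memory. */
static entry *slot(int cpu)
{
    for (int i = 0; i < CSIZE; i++)
        if (cache[cpu][i].status == INVALID)
            return &cache[cpu][i];
    return &cache[cpu][0];
}

unsigned char wt_read(int cpu, unsigned addr)
{
    entry *e = find(cpu, addr);
    if (e) return e->data;                      /* read hit */
    unsigned char v;
    entry *other = 0;
    for (int c = 0; c < NCPU && !other; c++)    /* read miss: try other caches */
        if (c != cpu) other = find(c, addr);
    if (other) v = other->data;
    else { v = mem[addr]; mem_accesses++; }     /* otherwise go to global memory */
    e = slot(cpu);
    e->data = v; e->addr = addr; e->status = VALID;
    return v;
}

void wt_write(int cpu, unsigned addr, unsigned char v)
{
    entry *e = find(cpu, addr);
    if (!e) e = slot(cpu);                      /* write miss: take a slot */
    e->data = v; e->addr = addr; e->status = VALID;
    mem[addr] = v; mem_accesses++;              /* write through */
    for (int c = 0; c < NCPU; c++) {            /* invalidate other copies */
        entry *o = (c != cpu) ? find(c, addr) : 0;
        if (o) o->status = INVALID;
    }
}
```

With this sketch, a read by processor 0 followed by a read of the same address by processor 1 costs one memory access, because the second read is served from cache 0. A subsequent write by processor 1 costs one more access and invalidates the copy in cache 0.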
Results
Keep I/O load constant. Vary cache size. Measure cache hits and main memory accesses.
[Chart: Write Through - Write Invalidate, cache size vs. number of memory accesses. X axis: Cache Size (% of Main Memory), 0 to 250. Y axis: No. of Memory Accesses. Data labels as plotted: 67, 60, 55, 44, 27, 27.]
[Chart: Write Through - Write Invalidate, hit rate vs. size of cache. X axis: Cache Size (% of Main Memory), 0 to 250. Y axis: Hits. Data labels as plotted: 8, 12, 14, 17, 20, 20.]
Policy 2: WRITE BACK WRITE INVALIDATE
STATES
• RO-SHARED (RO): multiple copies, consistent with main memory.
• INVALID (I): copy inconsistent with main memory.
• RW-EXCLUSIVE (RW): only one copy, inconsistent with main memory (ownership).
READ
• HIT: Read the copy found in cache. Done!
• MISS, RW copy in no other cache: Get a copy from global memory. STATUS=RO.
• MISS, RW copy in another cache: Get it. Update global memory. STATUS=RO in both caches.
• SPACE? If there is space (y), write over it. If there is no space (n), replace an entry: if the entry to be replaced is RW, write it back to global memory; if it is I/RO, no write back.
WRITE
• HIT, Status=RW: Write over it. STATUS=RW.
• HIT, Status=RO: Write over it. Invalidate others. STATUS=RW.
• MISS, another cache has RW: Copy into own cache. Invalidate others. Write new data. STATUS=RW.
• MISS, no other cache has RW: Go to global memory. Write new data. Invalidate others. STATUS=RW.
• SPACE? If there is no space, replace an entry. If the copy to be replaced is RW, write it back to global memory. Otherwise simply write over it; no write back.
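The transitions in this flowchart can be condensed into a few small functions. The sketch below is one reading of the slide, not the original code; `wb_state`, `wb_action` and the function names are hypothetical.

```c
typedef enum { INV, RO, RW } wb_state;  /* INVALID, RO-SHARED, RW-EXCLUSIVE */

typedef struct {
    wb_state next;          /* state of the requesting cache's copy afterwards */
    int invalidate_others;  /* 1 if copies in other caches must be invalidated */
    int memory_updated;     /* 1 if global memory is written as a side effect */
} wb_action;

/* Write hit: an RW owner writes silently; an RO sharer must claim ownership. */
wb_action wb_write_hit(wb_state cur)
{
    wb_action a = { RW, cur == RO, 0 };
    return a;
}

/* Read miss: if another cache holds the RW copy, it supplies the data and
   global memory is updated; either way the requester ends up RO. */
wb_action wb_read_miss(int other_has_rw)
{
    wb_action a = { RO, 0, other_has_rw };
    return a;
}

/* Write miss: the writer always ends up as the sole RW owner. */
wb_action wb_write_miss(void)
{
    wb_action a = { RW, 1, 0 };
    return a;
}

/* Replacement: only an RW victim has to be written back to global memory. */
int wb_needs_writeback(wb_state victim)
{
    return victim == RW;
}
```

The point of the policy shows up in `wb_write_hit`: repeated writes by the owner of an RW copy touch neither the bus nor global memory until the entry is replaced or another cache asks for it.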
Results
Keep I/O load constant. Vary cache size. Measure cache hits and main memory accesses.
[Chart: Write Back - Write Invalidate, cache size vs. number of memory accesses. X axis: Cache Size (% of Main Memory), 0 to 250. Y axis: No. of Memory Accesses. Data labels as plotted: 62, 55, 46, 35, 18, 18.]
[Chart: Write Back - Write Invalidate, hit rate vs. size of cache. X axis: Cache Size (% of Main Memory), 0 to 250. Y axis: Hits. Data labels as plotted: 8, 12, 14, 17, 20, 20.]
Policy 3: WRITE ONCE
STATES
• RESERVED (RES): written once, consistent with main memory.
• VALID (V): copy consistent with main memory.
• DIRTY (D): written more than once, inconsistent with main memory.
• INVALID: copy inconsistent with main memory.
READ
• HIT: Read the copy found in cache. Done!
• MISS, DIRTY copy in no other cache: Get a copy from global memory. STATUS=VALID.
• MISS, DIRTY copy in another cache: Get it. Update global memory. STATUS=VALID in both caches.
• SPACE? If there is space (y), write over it. If there is no space (n), replace an entry: if the entry to be replaced is DIRTY, write it back to global memory; if it is V/RES, no write back.
WRITE
• HIT, Status=D/RES: Write over it. STATUS=D.
• HIT, Status=VALID: Write over it. Invalidate others. Update global memory. STATUS=RES.
• MISS, another cache has DIRTY: Copy into own cache. Invalidate others. Write new data. STATUS=DIRTY.
• MISS, no other cache has DIRTY: Go to global memory. Write new data. Invalidate others. STATUS=DIRTY.
• SPACE? If there is no space, replace an entry. If the copy to be replaced is DIRTY, write it back to global memory. Otherwise simply write over it; no write back.
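The write-hit branch is what gives Write Once its name: only the first write to a VALID copy goes through to global memory. A small sketch of that transition follows; `wo_state` and `wo_write_hit` are hypothetical names, and the mapping is read off the flowchart above.

```c
typedef enum { WO_INVALID, WO_VALID, WO_RESERVED, WO_DIRTY } wo_state;

/* Write hit under Write Once. Sets *write_through to 1 when the write must
   also update global memory and invalidate other copies (the first write). */
wo_state wo_write_hit(wo_state cur, int *write_through)
{
    *write_through = (cur == WO_VALID);
    switch (cur) {
    case WO_VALID:    return WO_RESERVED;  /* first write: written through */
    case WO_RESERVED:
    case WO_DIRTY:    return WO_DIRTY;     /* later writes stay local */
    default:          return cur;          /* INVALID is a miss, not handled here */
    }
}
```

Three consecutive writes to a VALID copy would therefore pass through RESERVED (written through), then DIRTY, then DIRTY again, with only the first one reaching global memory.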
Results
Keep I/O load constant. Vary cache size. Measure cache hits and main memory accesses.
[Chart: Write Once, hit rate vs. size of cache. X axis: Cache Size (% of Main Memory), 0 to 250. Y axis: Hits. Data labels as plotted: 8, 12, 14, 17, 20, 20.]
[Chart: Write Once, cache size vs. number of memory accesses. X axis: Cache Size (% of Main Memory), 0 to 250. Y axis: No. of Memory Accesses. Data labels as plotted: 69, 65, 60, 58, 55, 55.]
Policy 4: WRITE UPDATE ‘PARTIAL’ WRITE THROUGH
STATES
• SHARED (SHARE): multiple copies, consistent with main memory.
• DIRTY (D): only one copy, inconsistent with main memory (ownership).
• VALID-EXCLUSIVE (VALX): only one copy, consistent with main memory.
READ
• HIT: Read the copy found in cache. Done!
• MISS, no other cache has a copy: Get a copy from global memory. STATUS=VALX.
• MISS, DIRTY copy in another cache: Get it. Update global memory. STATUS=SHARE in both caches.
• MISS, VALX/SHARE copy in another cache: Get it. STATUS=SHARE in both caches.
WRITE
• HIT, copy=D/VALX: Write locally.
• HIT, copy=SHARE: Write over it. Update all sharing caches. Update global memory. STATUS=SHARE.
• MISS, another cache has a copy: Get it. Write over it. Update all caches. Update global memory. STATUS=SHARE.
• MISS, no other cache has a copy: Get it from global memory. Write over it. STATUS=DIRTY.
SPACE? (on every miss)
• If there is space (y): write over it.
• If there is no space (n): replace an entry. If the entry to be replaced is DIRTY, write it back to global memory; if it is VALX/SHARE, no write back.
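As one possible reading of the read-miss branches of this flowchart, the resulting states can be expressed as a function of what another cache holds. The names below are hypothetical, and the assumption that a DIRTY supplier also ends in SHARE (after global memory is updated) is an interpretation of the diagram rather than something the slide states outright.

```c
/* P4_NONE means "no other cache has a copy". */
typedef enum { P4_NONE, P4_SHARED, P4_VALX, P4_DIRTY } p4_state;

typedef struct {
    p4_state requester;  /* state of the copy in the cache that missed */
    p4_state supplier;   /* state of the copy in the supplying cache, if any */
    int memory_updated;  /* 1 if global memory is updated on the way */
} p4_result;

/* Read miss under Write Update / Partial Write Through. */
p4_result p4_read_miss(p4_state other)
{
    p4_result r;
    if (other == P4_NONE) {           /* fetched from global memory */
        r.requester = P4_VALX;
        r.supplier = P4_NONE;
        r.memory_updated = 0;
    } else {                          /* supplied by another cache */
        r.requester = P4_SHARED;
        r.supplier = P4_SHARED;
        r.memory_updated = (other == P4_DIRTY);
    }
    return r;
}
```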
Results
Keep I/O load constant. Vary cache size. Measure cache hits and main memory accesses.
[Chart: Write Update Partial Write Through, hit rate vs. size of cache. X axis: Cache Size (% of Main Memory), 0 to 250. Y axis: Hits. Data labels as plotted: 9, 13, 16, 20, 23, 23.]
[Chart: Write Update Partial Write Through, cache size vs. number of memory accesses. X axis: Cache Size (% of Main Memory), 0 to 250. Y axis: No. of Memory Accesses. Data labels as plotted: 63, 56, 52, 43, 26, 26.]
Policy 5: WRITE UPDATE WRITE BACK
STATES
• SHARED-CLEAN (SC): multiple shared copies, could be consistent with main memory (no ownership).
• VALID-EX (VALX): only one copy, consistent with main memory.
• SHARED-DIRTY (SD): multiple shared copies, this one being the last to be modified (ownership).
• DIRTY (D): unshared and updated, inconsistent with main memory.
READ
• HIT: Read the copy found in cache. Done!
• MISS, no other cache has a copy: Get a copy from global memory. STATUS=VALX.
• MISS, DIRTY/SD copy in another cache: Get it. Supplying cache STATUS=SD, taking cache STATUS=SC.
• MISS, VALX/SC copy in another cache: Get it. STATUS=SC in both caches.
WRITE
• HIT, copy=D/VALX: Write locally. STATUS=DIRTY.
• HIT, copy=SC/SD: Write over it. Update all sharing caches. STATUS (own)=SD, STATUS (others)=SC.
• MISS, another cache has a copy: Get it. Write over it. Update all caches. Supplying cache STATUS=SC, taking cache STATUS=SD.
• MISS, no other cache has a copy: Get it from global memory. Write over it. STATUS=DIRTY.
SPACE? (on every miss)
• If there is space (y): write over it.
• If there is no space (n): replace an entry. If the entry to be replaced is D/SD, write it back to global memory; if it is VALX/SC, no write back.
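The write-hit and replacement rules of this policy are compact enough to sketch directly. The names `p5_state`, `p5_write_hit` and `p5_needs_writeback` are hypothetical; the transitions follow the flowchart above.

```c
typedef enum { P5_SC, P5_VALX, P5_SD, P5_DIRTY } p5_state;

/* Write hit under Write Update / Write Back. Returns the writer's new state
   and sets *broadcast to 1 when all sharing caches must be updated (their
   copies become SC). Global memory is not written on a hit. */
p5_state p5_write_hit(p5_state cur, int *broadcast)
{
    int shared = (cur == P5_SC || cur == P5_SD);
    *broadcast = shared;
    return shared ? P5_SD : P5_DIRTY;
}

/* Replacement: only owned copies (DIRTY or SD) are written back. */
int p5_needs_writeback(p5_state victim)
{
    return victim == P5_DIRTY || victim == P5_SD;
}
```

Unlike Policy 4, a write to a shared copy here updates the other caches but leaves global memory alone; the write back is deferred until the owning entry (D or SD) is replaced.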
Results
Keep I/O load constant. Vary cache size. Measure cache hits and main memory accesses.
[Chart: Write Update Write Back, number of memory accesses. X axis: Cache Size (% Main Memory), 0 to 150. Y axis: # of memory accesses. Data labels as plotted: 55, 49, 43, 33, 16.]
[Chart: Write Update Write Back, hits. X axis: Cache Size (% Main Memory), 0 to 150. Y axis: Hits. Data labels as plotted: 9, 13, 16, 20, 23.]
A Practical Experiment: Matrix Multiplication
• 3 x 3 matrix data is loaded from the input file into main memory.
• Start with empty caches.
• Matrices are multiplied by reading values from main memory.
• Results are written to main memory.
• The policy used is Write Through - Write Invalidate.
• Three processor/cache sets.
• Each processor computes the three elements of one row of the result.
• Each cache has only 7 locations: 6 inputs and 1 result.
• A lot of inter-cache exchange.
• Replacements abound due to the small cache.
Logic
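\[
\begin{bmatrix} x_{00} & x_{01} & x_{02} \\ x_{10} & x_{11} & x_{12} \\ x_{20} & x_{21} & x_{22} \end{bmatrix}
=
\begin{bmatrix} a_{00} & a_{01} & a_{02} \\ a_{10} & a_{11} & a_{12} \\ a_{20} & a_{21} & a_{22} \end{bmatrix}
\begin{bmatrix} b_{00} & b_{01} & b_{02} \\ b_{10} & b_{11} & b_{12} \\ b_{20} & b_{21} & b_{22} \end{bmatrix}
\]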
[Diagram: cache contents while each processor computes its row.]
• Processor 0 keeps a00, a01, a02 and loads b00, b10, b20 to produce x00. It then replaces the b entries and the result with b01, b11, b21 and x01, and again with b02, b12, b22 and x02.
• Processor 1 keeps a10, a11, a12 and produces x10, x11, x12 in the same way.
• Processor 2 keeps a20, a21, a22 and produces x20, x21, x22 in the same way.
As can be seen, most of the time each processor can find what it wants in another cache!
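The work done by one processor in this experiment amounts to computing one row of the product. The sketch below shows only that arithmetic; it does not model the caches, uses `int` elements for simplicity (the test bed uses byte-wide locations), and `compute_row` is a hypothetical name.

```c
#define N 3

/* Work assigned to processor p: row p of X = A * B.
   Each x[p][j] needs row p of A (kept in the cache throughout) and
   column j of B (replaced for every new j). */
void compute_row(int p, int a[N][N], int b[N][N], int x[N][N])
{
    for (int j = 0; j < N; j++) {
        int sum = 0;
        for (int k = 0; k < N; k++)
            sum += a[p][k] * b[k][j];
        x[p][j] = sum;
    }
}
```

Calling `compute_row` for p = 0, 1, 2 fills the whole result. Every processor needs every column of B, which is why a processor can so often find a needed value in another processor's cache.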
Replacement Logic
• Each entry also carries a Use tag and a Replaced bit.
• When the entry is accessed, the Use tag is incremented.
• When the entry is replaced, the Replaced bit is set.
• Entries with smaller Use tags are always replaced first.
• The Replaced bit ensures that an entry that has just been replaced is not immediately replaced again in the next cycle, which would otherwise happen because it always has a smaller Use tag!
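A victim-selection routine along these lines might look as follows. This is a sketch under stated assumptions: the slide does not say when the Replaced bit is cleared or what the Use tag of a fresh entry is, so here the bit protects an entry for exactly one replacement cycle and the Use tag restarts at zero. The names `repl_info` and `pick_victim` are hypothetical.

```c
#include <stddef.h>

typedef struct {
    unsigned use;   /* Use tag: incremented on every access */
    int replaced;   /* Replaced bit: set when the entry was just loaded */
} repl_info;

/* Choose the victim: smallest Use tag among entries not just replaced.
   Assumes n >= 2 so that at least one entry is normally eligible. */
size_t pick_victim(repl_info *e, size_t n)
{
    size_t victim = n;
    for (size_t i = 0; i < n; i++) {
        if (e[i].replaced) continue;          /* skip the fresh entry */
        if (victim == n || e[i].use < e[victim].use) victim = i;
    }
    if (victim == n) victim = 0;              /* fallback: every bit was set */
    for (size_t i = 0; i < n; i++)
        e[i].replaced = 0;                    /* protection lasts one cycle */
    e[victim].use = 0;                        /* fresh entry has no uses yet */
    e[victim].replaced = 1;
    return victim;
}
```

With Use tags 5, 2 and 7, the first call evicts entry 1. Without the Replaced bit the next call would evict entry 1 again, since its tag is now the smallest; with the bit set, entry 0 is chosen instead.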
The Broadcast Issue!
• Shared memory systems are interconnected using a bus. I implemented the bus as a loop in which I invalidate the other caches.
• This could also be done with an event-based system.
• A processor posts an ‘event’ to all caches when it updates an entry.
• Other caches invalidate their entries on demand, based on the events posted.
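The event-based alternative was not implemented in the project, so the following is purely an illustrative sketch of how it could be structured: each cache gets a small queue of pending invalidations, writers post to the queues of all other caches, and a cache drains its own queue before it next touches its entries. All names and sizes are assumptions.

```c
#define NCACHE 3
#define QLEN   8

typedef struct { unsigned addr[QLEN]; int count; } event_queue;
static event_queue pending[NCACHE];   /* one queue per cache, initially empty */

/* Writer posts an invalidate event for addr to every other cache.
   Returns 0 (posting nothing) if any target queue is full. */
int post_invalidate(int writer, unsigned addr)
{
    for (int c = 0; c < NCACHE; c++)
        if (c != writer && pending[c].count == QLEN)
            return 0;
    for (int c = 0; c < NCACHE; c++)
        if (c != writer)
            pending[c].addr[pending[c].count++] = addr;
    return 1;
}

/* Called by a cache on demand: copies the pending addresses into out
   (which must hold QLEN entries), empties the queue, and returns how many
   addresses the caller now has to invalidate. */
int drain_events(int cache, unsigned *out)
{
    int n = pending[cache].count;
    for (int i = 0; i < n; i++)
        out[i] = pending[cache].addr[i];
    pending[cache].count = 0;
    return n;
}
```

Compared with the loop that invalidates other caches immediately, this defers the work to the receiving cache, at the cost of having to drain the queue before every lookup to stay coherent.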
Future Work
• Implement matrix multiplication for all policies
References
• Hesham El-Rewini, Mostafa Abd-El-Barr, Advanced Computer Architecture and Parallel Processing.
• https://www.cs.tcd.ie/Jeremy.Jones/vivio/vivio.htm
Questions / Answers
Thank You !