Caches and Memory - Cornell University · Caches and Memory Anne Bracy CS 3410 Computer Science...

Caches and Memory Anne Bracy CS 3410 Computer Science Cornell University See P&H Chapter: 5.1-5.4, 5.8, 5.10, 5.13, 5.15, 5.17 1 Slides by Anne Bracy with 3410 slides by Professors Weatherspoon, Bala, McKee, and Sirer.


  • Caches and Memory

    Anne Bracy, CS 3410, Computer Science, Cornell University

    See P&H Chapter: 5.1-5.4, 5.8, 5.10, 5.13, 5.15, 5.17

    Slides by Anne Bracy with 3410 slides by Professors Weatherspoon, Bala, McKee, and Sirer.

  • Programs 101

    Load/Store Architectures:
    • Read data from memory (put in registers)
    • Manipulate it
    • Store it back to memory

    int main (int argc, char* argv[ ]) {int i;int m = n;int sum = 0;for (i = 1; i

  • 1 Cycle Per Stage: the Biggest Lie (So Far)

    [Figure: five-stage pipeline datapath (Instruction Fetch, Instruction Decode, Execute, Memory, Write-Back) with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB, a forward unit, and hazard detection. Both memories are highlighted: code is stored in memory (also data and stack).]

  • What's the problem?

    + big
    – slow
    – far away

    [Figure: Sandy Bridge motherboard, 2011 (http://news.softpedia.com), with the CPU and Main Memory labeled.]

  • The Need for Speed

    [Figure: CPU pipeline]

    Instruction speeds:
    • add, sub, shift: 1 cycle
    • mult: 3 cycles
    • load/store: 100 cycles

    off-chip 50(-70) ns; 2(-3) GHz processor → 0.5 ns clock

  • What's the solution?

    Caches!

    What lucky data gets to go here?

    [Figure: Intel Pentium 3, 1999, with the Level 2 $, Level 1 Data $, and Level 1 Insn $ labeled.]

  • Locality Locality Locality

    If you ask for something, you're likely to ask for:
    • the same thing again soon → Temporal Locality
    • something near that thing, soon → Spatial Locality

    total = 0;
    for (i = 0; i < n; i++)
        total += a[i];
    return total;

  • Your life is full of Locality

    Last Called, Speed Dial, Favorites, Contacts, Google/Facebook/email

  • The Memory Hierarchy

    Intel Haswell Processor, 2013

    Small, Fast
    Registers      1 cycle, 128 bytes
    L1 Caches      4 cycles, 64 KB
    L2 Cache       12 cycles, 256 KB
    L3 Cache       36 cycles, 2-20 MB
    Main Memory    50-70 ns, 512 MB – 4 GB
    Disk           5-20 ms, 16 GB – 4 TB
    Big, Slow

  • Some Terminology

    Cache hit
    • data is in the Cache
    • t_hit: time it takes to access the cache
    • Hit rate (%hit): # cache hits / # cache accesses

    Cache miss
    • data is not in the Cache
    • t_miss: time it takes to get the data from below the $
    • Miss rate (%miss): # cache misses / # cache accesses

  • The Memory Hierarchy

    Intel Haswell Processor, 2013

    Registers      1 cycle, 128 bytes
    L1 Caches      4 cycles, 64 KB
    L2 Cache       12 cycles, 256 KB
    L3 Cache       36 cycles, 2-20 MB
    Main Memory    50-70 ns, 512 MB – 4 GB
    Disk           5-20 ms, 16 GB – 4 TB

    Average access time:
    t_avg = t_hit + %miss × t_miss
          = 4 + 5% × 100 = 9 cycles
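The average-access-time formula on this slide is simple enough to check by machine. The following is a minimal sketch, not part of the original deck; the function name `amat` and its parameter names are illustrative assumptions.

```c
/* Average memory access time: t_avg = t_hit + miss_rate * t_miss.
 * miss_rate is a fraction in [0, 1]; both times use the same unit.
 * The name amat() is an assumption made for this sketch. */
double amat(double t_hit, double miss_rate, double t_miss)
{
    return t_hit + miss_rate * t_miss;
}
```

With the slide's numbers, `amat(4.0, 0.05, 100.0)` reproduces the 9-cycle result: a 4-cycle hit time plus 5% of a 100-cycle miss penalty.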

  • Single Core Memory Hierarchy

    [Figure: the hierarchy (Registers, L1 Caches, L2 Cache, L3 Cache, Main Memory, Disk) next to a block diagram with an ON CHIP region: a Processor containing Regs, I$, D$, and L2, connected to Main Memory and Disk.]

  • Multi-Core Memory Hierarchy

    [Figure: the same hierarchy next to a block diagram with an ON CHIP region: four Processors, each with its own Regs, I$, D$, and L2, plus one L3, connected to Main Memory and Disk.]

  • Memory Hierarchy by the Numbers

    CPU clock rates ~0.33 ns – 2 ns (3 GHz - 500 MHz)

    Memory technology | Transistor count*             | Access time | Access time in cycles | $ per GiB in 2012 | Capacity
    SRAM (on chip)    | 6-8 transistors               | 0.5-2.5 ns  | 1-3 cycles            | $4k               | 256 KB
    SRAM (off chip)   |                               | 1.5-30 ns   | 5-15 cycles           | $4k               | 32 MB
    DRAM              | 1 transistor (needs refresh)  | 50-70 ns    | 150-200 cycles        | $10-$20           | 8 GB
    SSD (Flash)       |                               | 5k-50k ns   | Tens of thousands     | $0.75-$1          | 512 GB
    Disk              |                               | 5M-20M ns   | Millions              | $0.05-$0.1        | 4 TB

    *Registers, D-Flip Flops: 10-100's of registers

  • Basic Cache Design

    Direct Mapped Caches

  • 16 Byte Memory

    • Byte-addressable memory
    • 4 address bits → 16 bytes total
    • b addr bits → 2^b bytes in memory

    MEMORY
    addr: 0000 0001 0010 0011 0100 0101 0110 0111
    data: A    B    C    D    E    F    G    H
    addr: 1000 1001 1010 1011 1100 1101 1110 1111
    data: J    K    L    M    N    O    P    Q

    load 0x1100 → r1

  • 4-Byte, Direct Mapped Cache

    Direct mapped:
    • Each address maps to 1 cache block
    • 4 entries → 2 index bits (2^n entries → n bits)

    Index with LSB:
    • Supports spatial locality

    Block Size: 1 byte
    Cache entry = row = (cache) line = (cache) block

    Address bits: XXXX, with the low-order bits used as the index

    CACHE
    index  data
    00     A
    01     B
    10     C
    11     D

    MEMORY: the 16-byte memory, addresses 0000-1111 holding A-Q

  • Analogy to a Spice Rack

    • Compared to your spice wall
      – Smaller
      – Faster
      – More costly (per oz.)

    [Figure: Spice Wall (Memory) holding spices A, B, C, D, E, F, … Z; Spice Rack (Cache) with columns index and spice. Image: http://www.bedbathandbeyond.com]

  • Analogy to a Spice Rack

    • How do you know what's in the jar?
    • Need labels

    Tag = Ultra-minimalist label

    [Figure: the same Spice Wall (Memory) and Spice Rack (Cache), now with columns index, tag, and spice; labels shown: "Cinnamon", "innamon".]

  • 4-Byte, Direct Mapped Cache

    Tag: minimalist label/address
    address = tag + index

    Address bits: tag | index (XX | XX)

    CACHE
    index  tag  data
    00     00   A
    01     00   B
    10     00   C
    11     00   D

    MEMORY: the 16-byte memory, addresses 0000-1111 holding A-Q
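The split "address = tag + index" can be sketched in a few lines of C. This is an illustration rather than code from the deck: the struct `addr_fields` and the function `split_address` are hypothetical names, and the sketch assumes 1-byte blocks (no offset bits) and fewer than 32 index bits.

```c
#include <stdint.h>

/* Fields of an address for a direct-mapped cache with 1-byte blocks.
 * Both names below are assumptions made for this sketch. */
struct addr_fields {
    uint32_t tag;
    uint32_t index;
};

/* Use the low index_bits bits as the index; the remaining high bits
 * are the tag. Assumes index_bits < 32. */
struct addr_fields split_address(uint32_t addr, unsigned index_bits)
{
    struct addr_fields f;
    f.index = addr & ((1u << index_bits) - 1u);
    f.tag = addr >> index_bits;
    return f;
}
```

For the 4-entry cache above, the 4-bit address 1100 (which the slides write as 0x1100) splits into tag 11 and index 00, and 0100 splits into tag 01 and the same index 00, which is why the two addresses compete for one line.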

  • 4-Byte, Direct Mapped Cache

    One last tweak: valid bit

    CACHE
    index  V  tag  data
    00     0  00   X
    01     0  00   X
    10     0  00   X
    11     0  00   X

    MEMORY: the 16-byte memory, addresses 0000-1111 holding A-Q

  • Simulation #1 of a 4-byte, DM Cache

    Lookup:
    • Index into $
    • Check tag
    • Check valid bit

    Address bits: tag | index (XX | XX)

    Initially every line has V = 0.

    load 0x1100   Miss: index 00 is filled with V = 1, tag 11, data N
    ...
    load 0x1100   Hit!

    Awesome!

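  • Block Diagram: 4-entry, direct mapped Cache

    CACHE
    index  V  tag  data
    00     1  00   1111 0000
    01     1  11   1010 0101
    10     0  01   1010 1010
    11     1  11   0000 0000

    Address 1101 = tag 11 | index 01. The 2-bit index selects a line, the stored 2-bit tag is compared (=) with the address tag, and the valid bit is checked: Hit! The 8-bit data out is 1010 0101.

    Great! Are we done?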

  • Simulation #2: 4-byte, DM Cache

    Lookup:
    • Index into $
    • Check tag
    • Check valid bit

    Address bits: tag | index (XX | XX)

    Initially every line has V = 0.

    load 0x1100   Miss (cold): index 00 ← tag 11, data N
    load 0x1101   Miss (cold): index 01 ← tag 11, data O
    load 0x0100   Miss (cold): index 00 ← tag 01, data E (replaces N)
    load 0x1100   Miss: index 00 ← tag 11, data N (replaces E)

    Disappointed! ☹

  • Reducing Cold Misses by Increasing Block Size

    Leveraging Spatial Locality

  • Increasing Block Size

    • Block Size: 2 bytes
    • Block Offset: least significant bits indicate where you live in the block
    • Which bits are the index? tag?

    Address bits: XXXX, with the lowest bit used as the offset

    CACHE
    index  V  tag  data
    00     0  x    A|B
    01     0  x    C|D
    10     0  x    E|F
    11     0  x    G|H

    MEMORY: the 16-byte memory, addresses 0000-1111 holding A-Q

  • Simulation #3: 8-byte, DM Cache

    Lookup:
    • Index into $
    • Check tag
    • Check valid bit

    Address bits: tag | index | offset (X | XX | X)

    Initially every line has V = 0.

    load 0x1100   Miss (cold): index 10 ← tag 1, data N|O
    load 0x1101   Hit!
    load 0x0100   Miss (cold): index 10 ← tag 0, data E|F (replaces N|O)
    load 0x1100   Miss (conflict)

    1 hit, 3 misses
    3 bytes don't fit in an 8 byte cache?
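The direct-mapped simulations can be replayed with a short hit counter. This is a sketch under stated assumptions, not the course's simulator: the name `dm_count_hits`, the `MAX_LINES` bound, and the parameter names are all illustrative, and addresses are passed as plain integers (the slides' "0x1100" is the 4-bit binary address 1100, i.e. 0xC).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LINES 64  /* illustrative upper bound, not from the slides */

/* Count hits for a direct-mapped cache with 2^index_bits lines and
 * 2^offset_bits bytes per block. The cache starts empty.
 * Assumes index_bits <= 6 so the index fits in MAX_LINES. */
unsigned dm_count_hits(const uint32_t *trace, size_t n,
                       unsigned index_bits, unsigned offset_bits)
{
    bool valid[MAX_LINES] = { false };
    uint32_t tags[MAX_LINES] = { 0 };
    unsigned hits = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t block = trace[i] >> offset_bits;   /* drop the block offset */
        uint32_t index = block & ((1u << index_bits) - 1u);
        uint32_t tag = block >> index_bits;

        if (valid[index] && tags[index] == tag) {
            hits++;
        } else {
            /* Miss: fill the line, replacing whatever was there. */
            valid[index] = true;
            tags[index] = tag;
        }
    }
    return hits;
}
```

For the trace 1100, 1101, 0100, 1100, calling it with `index_bits = 2, offset_bits = 0` models Simulation #2 (no hits), and `index_bits = 2, offset_bits = 1` models Simulation #3 (one hit, three misses).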

  • Removing Conflict Misses with Fully-Associative Caches

  • 8 byte, fully-associative Cache

    What should the offset be? What should the index be? What should the tag be?

    Address bits: tag | offset (XXX | X)

    CACHE: four lines, each with V, a 3-bit tag, and a 2-byte data block (V = 0, tag xxx, data X|X)

    MEMORY: the 16-byte memory, addresses 0000-1111 holding A-Q

  • Simulation #4: 8-byte, FA Cache

    Lookup:
    • Check tags (every line; a fully-associative cache has no index)
    • Check valid bits

    Address bits: tag | offset (XXX | X)

    Initially every line has V = 0. An LRU pointer marks the line to replace next.

    load 0x1100   Miss: a line is filled with tag 110, data N|O
    load 0x1101   Hit!
    load 0x0100   Miss: a second line is filled with tag 010, data E|F
    load 0x1100   Hit!
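A fully-associative lookup with LRU replacement can be sketched as follows. The sketch is an assumption-laden illustration: `fa_count_hits` and `FA_LINES` are hypothetical names, and a per-line "last used" counter stands in for the slides' LRU pointer.

```c
#include <stddef.h>
#include <stdint.h>

#define FA_LINES 4  /* 8-byte cache with 2-byte blocks, as in Simulation #4 */

/* Count hits for a fully-associative cache with LRU replacement.
 * last_used[w] == 0 means line w has never been filled (V = 0). */
unsigned fa_count_hits(const uint32_t *trace, size_t n, unsigned offset_bits)
{
    uint32_t tags[FA_LINES] = { 0 };
    size_t last_used[FA_LINES] = { 0 };
    unsigned hits = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t tag = trace[i] >> offset_bits;   /* no index bits */
        int line = -1;

        /* Compare the tag against every valid line. */
        for (int w = 0; w < FA_LINES; w++)
            if (last_used[w] != 0 && tags[w] == tag)
                line = w;

        if (line >= 0) {
            hits++;
        } else {
            /* Miss: pick the least recently used line; empty lines
             * (last_used == 0) are chosen before any filled line. */
            line = 0;
            for (int w = 1; w < FA_LINES; w++)
                if (last_used[w] < last_used[line])
                    line = w;
            tags[line] = tag;
        }
        last_used[line] = i + 1;   /* mark as most recently used */
    }
    return hits;
}
```

On the trace 1100, 1101, 0100, 1100 with one offset bit, this gives Miss, Hit, Miss, Hit, matching the simulation: the block holding E|F lands in a free line instead of replacing N|O.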

  • Pros and Cons of Full Associativity

    + No more conflicts!
    + Excellent utilization!

    But either:
    • Parallel Reads – lots of reading!
    • Serial Reads – lots of waiting

    t_avg = t_hit + %miss × t_miss
          = 4 + 5% × 100 = 9 cycles
          = 6 + 3% × 100 = 9 cycles

  • Pros & Cons

                            Direct Mapped   Fully Associative
    Tag Size                Smaller         Larger
    SRAM Overhead           Less            More
    Controller Logic        Less            More
    Speed                   Faster          Slower
    Price                   Less            More
    Scalability             Very            Not Very
    # of conflict misses    Lots            Zero
    Hit Rate                Low             High
    Pathological Cases      Common          ?

  • Reducing Conflict Misses with Set-Associative Caches

    Not too conflict-y. Not too slow. … Just Right!

  • 8 byte, 2-way set associative Cache

    What should the offset be? What should the index be? What should the tag be?

    Address bits: tag | index | offset (XX | X | X)

    CACHE
    index  way 0 (V tag data)  way 1 (V tag data)
    0      0 xx E|F            0 xx N|O
    1      0 xx C|D            0 xx P|Q

    MEMORY: the 16-byte memory, addresses 0000-1111 holding A-Q

  • 8 byte, 2-way set associative Cache

    Lookup:
    • Index into $
    • Check tag
    • Check valid bit

    Address bits: tag | index | offset (XX | X | X)

    Initially every line has V = 0. An LRU pointer marks the way to replace next.

    load 0x1100   Miss: index 0, first way ← tag 11, data N|O
    load 0x1101   Hit!
    load 0x0100   Miss: index 0, second way ← tag 01, data E|F
    load 0x1100   Hit!
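The 2-way walk-through above can be replayed with a small set-associative hit counter. As before this is only a sketch: `sa_count_hits`, `SA_SETS`, and `SA_WAYS` are names invented here, the geometry is fixed to the slide's 8-byte cache (2 sets, 2 ways, 2-byte blocks), and LRU within a set is modeled with a per-way "last used" counter.

```c
#include <stddef.h>
#include <stdint.h>

#define SA_SETS 2  /* 8-byte cache, 2-byte blocks, 2 ways -> 2 sets */
#define SA_WAYS 2

/* Count hits for the 2-way set-associative cache on the slides.
 * last_used[s][w] == 0 means that way has never been filled (V = 0). */
unsigned sa_count_hits(const uint32_t *trace, size_t n)
{
    uint32_t tags[SA_SETS][SA_WAYS] = { { 0 } };
    size_t last_used[SA_SETS][SA_WAYS] = { { 0 } };
    unsigned hits = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t block = trace[i] >> 1;    /* 1 offset bit */
        uint32_t set = block % SA_SETS;    /* 1 index bit selects the set */
        uint32_t tag = block / SA_SETS;    /* remaining bits are the tag */
        int way = -1;

        /* Only the ways of the selected set are searched. */
        for (int w = 0; w < SA_WAYS; w++)
            if (last_used[set][w] != 0 && tags[set][w] == tag)
                way = w;

        if (way >= 0) {
            hits++;
        } else {
            /* Miss: replace the least recently used way of this set. */
            way = 0;
            for (int w = 1; w < SA_WAYS; w++)
                if (last_used[set][w] < last_used[set][way])
                    way = w;
            tags[set][way] = tag;
        }
        last_used[set][way] = i + 1;
    }
    return hits;
}
```

Compared with the direct-mapped case, the index now picks a set of two candidate lines, so 1100 and 0100 (both index 0) can be resident at the same time and the final load hits.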

  • Eviction Policies

    Which cache line should be evicted from the cache to make room for a new line?
    • Direct-mapped: no choice, must evict line selected by index
    • Associative caches
      – Random: select one of the lines at random
      – Round-Robin: similar to random
      – FIFO: replace oldest line
      – LRU: replace line that has not been used in the longest time

  • Misses: the Three C's

    • Cold (compulsory) Miss: never seen this address before
    • Conflict Miss: cache associativity is too low
    • Capacity Miss: cache is too small

  • Miss Rate vs. Block Size

  • Block Size Tradeoffs

    • For a given total cache size, larger block sizes mean…
      – fewer lines
      – so fewer tags, less overhead
      – and fewer cold misses (within-block "prefetching")

    • But also…
      – fewer blocks available (for scattered accesses!)
      – so more conflicts
      – can decrease performance if working set can't fit in $
      – and larger miss penalty (time to fetch block)

  • Miss Rate vs. Associativity

  • ABCs of Caches

    t_avg = t_hit + %miss × t_miss

    + Associativity: ⬇ conflict misses ☺, ⬆ hit time ☹
    + Block Size: ⬇ cold misses ☺, ⬆ conflict misses ☹
    + Capacity: ⬇ capacity misses ☺, ⬆ hit time ☹

  • Which caches get what properties?

    t_avg = t_hit + %miss × t_miss

    L1 Caches: Fast. Design with speed in mind.
    L2 Cache
    L3 Cache: Big. Design with miss rate in mind.

    Going from L1 toward L3: More Associative, Bigger Block Sizes, Larger Capacity

  • Roadmap

    • Things we have covered:
      – The Need for Speed
      – Locality to the Rescue!
      – Calculating average memory access time
      – $ Misses: Cold, Conflict, Capacity
      – $ Characteristics: Associativity, Block Size, Capacity

    • Things we will now cover:
      – Cache Figures
      – Cache Performance Examples
      – Writes

  • More Slides Coming…

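  • 2-Way Set Associative Cache (Reading)

    [Figure: the address is split into Tag, Index, and Offset. The index drives line select, two comparators (= =) check the stored tags to produce hit?, and word select picks 32 bits of data out of the 64-byte line.]

  • 3-Way Set Associative Cache (Reading)

    [Figure: the same organization with three comparators (= = =): line select, hit?, and word select of 32 bits out of a 64-byte line.]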

  • How Big is the Cache?

    n bit index, m bit offset, N-way Set Associative

    Address bits: Tag | Index | Offset

    Question: How big is cache?
    • Data only? (what we usually mean when we ask "how big" is the cache)
    • Data + overhead?
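One way to work the sizing question is to count lines first: an n-bit index gives 2^n sets, N ways give N × 2^n lines, and an m-bit offset gives 2^m data bytes per line. The sketch below encodes that; the function names are hypothetical, and the overhead figure assumes only a tag and one valid bit per line (no dirty or LRU bits) with the address width passed in as a parameter.

```c
#include <stdint.h>

/* Data capacity in bytes: ways * 2^index_bits lines * 2^offset_bits bytes. */
uint64_t cache_data_bytes(unsigned index_bits, unsigned offset_bits,
                          unsigned ways)
{
    return (uint64_t)ways << (index_bits + offset_bits);
}

/* Overhead in bits, ASSUMING one valid bit plus the tag per line and
 * addr_bits-wide addresses; dirty and LRU bits are not counted. */
uint64_t cache_overhead_bits(unsigned addr_bits, unsigned index_bits,
                             unsigned offset_bits, unsigned ways)
{
    uint64_t lines = (uint64_t)ways << index_bits;
    unsigned tag_bits = addr_bits - index_bits - offset_bits;
    return lines * (tag_bits + 1u);
}
```

For example, the 8-byte, 2-way cache from the earlier slides has a 1-bit index and a 1-bit offset, so `cache_data_bytes(1, 1, 2)` is 8; with 4-bit addresses each of its 4 lines carries a 2-bit tag and a valid bit, 12 overhead bits in total.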

  • Cache Performance Example

    t_avg = t_hit + %miss × t_miss

    t_avg = for accessing 16 words?

    Memory Parameters (very simplified):
    • Main Memory: 4 GB
      – Data cost: 50 cycles for first word, plus 3 cycles per subsequent word
    • L1: 512 x 64 byte cache lines, direct mapped
      – Data cost: 3 cycles per word access
      – Lookup cost: 2 cycles

    Performance if %hit = 90%?
    Performance if %hit = 95%?

    Note: here t_hit splits up lookup vs. data cost. Why are there two ways?

  • Performance Calculation with $ Hierarchy

    t_avg = t_hit + %miss × t_miss

    • Parameters
      – Reference stream: all loads
      – D$: t_hit = 1 ns, %miss = 5%
      – L2: t_hit = 10 ns, %miss = 20% (local miss rate)
      – Main memory: t_hit = 50 ns

    • What is t_avg(D$) without an L2?
      – t_miss(D$) =
      – t_avg(D$) =

    • What is t_avg(D$) with an L2?
      – t_miss(D$) =
      – t_avg(L2) =
      – t_avg(D$) =
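A possible worked solution, applying the formula level by level. It assumes that main memory always hits and that the miss penalty of a level equals the average access time of the level below it; the subscript notation is chosen here for readability.

```latex
\begin{align*}
\text{Without L2:}\quad
  t_{\text{miss},D\$} &= t_{\text{hit},\text{Mem}} = 50\ \text{ns} \\
  t_{\text{avg},D\$}  &= 1 + 0.05 \times 50 = 3.5\ \text{ns} \\[4pt]
\text{With L2:}\quad
  t_{\text{avg},L2}   &= 10 + 0.20 \times 50 = 20\ \text{ns} \\
  t_{\text{miss},D\$} &= t_{\text{avg},L2} = 20\ \text{ns} \\
  t_{\text{avg},D\$}  &= 1 + 0.05 \times 20 = 2\ \text{ns}
\end{align*}
```

Under these assumptions the L2 cuts the average load time from 3.5 ns to 2 ns even though it is ten times slower than the D$, because only 5% of references ever reach it.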

  • Performance Summary

    Average memory access time (AMAT) depends on:
    • cache architecture and size
    • Hit and miss rates
    • Access times and miss penalty

    Cache design is a very complex problem:
    • Cache size, block size (aka line size)
    • Number of ways of set-associativity (1, N, ∞)
    • Eviction policy
    • Number of levels of caching, parameters for each
    • Separate I-cache from D-cache, or Unified cache
    • Prefetching policies / instructions
    • Write policy

  • Takeaway

    Direct Mapped → fast, but low hit rate
    Fully Associative → higher hit cost, higher hit rate
    Set Associative → middle ground

    Line size matters. Larger cache lines can increase performance due to prefetching. BUT, they can also decrease performance if the working set cannot fit in the cache.

    Cache performance is measured by the average memory access time (AMAT), which depends on cache architecture and size, but also on the access time for a hit, the miss penalty, and the hit rate.

  • What about Stores?

    Where should you write the result of a store?
    • If that memory location is in the cache?
      – Send it to the cache
      – Should we also send it to memory right away? (write-through policy)
      – Wait until we evict the block (write-back policy)
    • If it is not in the cache?
      – Allocate the line (put it in the cache)? (write allocate policy)
      – Write it directly to memory without allocation? (no write allocate policy)

  • Cache Write Policies

    Q: How to write data?

    [Figure: CPU, Cache (SRAM), and Memory (DRAM) connected by addr and data lines.]

    If data is already in the cache…
    • No-Write: writes invalidate the cache and go directly to memory
    • Write-Through: writes go to main memory and cache
    • Write-Back: CPU writes only to cache; cache writes to main memory later (when block is evicted)

  • Write Allocation Policies

    Q: How to write data?

    [Figure: CPU, Cache (SRAM), and Memory (DRAM) connected by addr and data lines.]

    If data is not in the cache…
    • Write-Allocate: allocate a cache line for new data (and maybe write-through)
    • No-Write-Allocate: ignore cache, just go to main memory

  • Write-Through Stores

    16 byte, byte-addressed memory
    4 byte, fully-associative cache: 2-byte blocks, write-allocate
    4 bit addresses: 3 bit tag, 1 bit offset

    Instructions:
    LB $1 ← M[1]
    LB $2 ← M[7]
    SB $2 → M[0]
    SB $1 → M[5]
    LB $2 ← M[10]
    SB $1 → M[5]
    SB $1 → M[10]

    Initial memory contents used by the trace: M[0] = 78, M[1] = 29, M[4] = 71, M[5] = 150, M[6] = 162, M[7] = 173, M[10] = 33, M[11] = 28.

    The register file holds $0-$3. Each cache line holds lru, V, tag, and data. The cache starts empty (Misses: 0, Hits: 0).

  • Write-Through (REF 1-7)

    REF 1: LB $1 ← M[1], Addr 0001. Miss. A line is filled with tag 000, data 78|29; $1 = 29. Misses: 1, Hits: 0.
    REF 2: LB $2 ← M[7], Addr 0111. Miss. The other line is filled with tag 011, data 162|173; $2 = 173. Misses: 2, Hits: 0.
    REF 3: SB $2 → M[0], Addr 0000. Hit. The tag 000 line becomes 173|29 and M[0] = 173. Misses: 2, Hits: 1.
    REF 4: SB $1 → M[5], Addr 0101. Miss. Block 71|150 replaces the LRU line (tag 011) under tag 010; the store then makes the line 71|29 and M[5] = 29. Misses: 3, Hits: 1.
    REF 5: LB $2 ← M[10], Addr 1010. Miss. Block 33|28 replaces the LRU line (tag 000) under tag 101; $2 = 33. Misses: 4, Hits: 1.
    REF 6: SB $1 → M[5], Addr 0101. Hit. Misses: 4, Hits: 2.
    REF 7: SB $1 → M[10], Addr 1010. Hit. The tag 101 line becomes 29|28 and M[10] = 29. Misses: 4, Hits: 3.

  • How Many Memory References?

    Write-through performance
    • How many memory reads?
    • How many memory writes?
    • Overhead? Do we need a dirty bit?

  • Write-Through (REF 8, 9)

    Instructions: ... SB $1 → M[5], LB $2 ← M[10], SB $1 → M[5], SB $1 → M[10], SB $1 → M[5], SB $1 → M[10]

    REF 8: SB $1 → M[5]. Hit.
    REF 9: SB $1 → M[10]. Hit.

    Misses: 4, Hits: 5. The cache still holds tag 010 (71|29) and tag 101 (29|28).

  • Summary: Write Through

    Write-through policy with write allocate
    • Cache miss: read entire block from memory
    • Write: write only updated item to memory
    • Eviction: no need to write to memory
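The three rules in this summary are enough to count memory traffic for the write-through walk-through. The sketch below is an illustration, not the course's simulator: `mem_ref`, `wt_stats`, and `wt_simulate` are invented names, the cache geometry (2 lines, 2-byte blocks, fully associative) is taken from the walk-through's setup, and LRU is modeled with a per-line "last used" counter.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One memory reference: a load (is_store = false) or a store. */
struct mem_ref { bool is_store; uint32_t addr; };

struct wt_stats { unsigned hits, misses, block_reads, mem_writes; };

/* Write-through, write-allocate, fully-associative cache with 2 lines of
 * 2 bytes each and LRU replacement. Only traffic is counted; data values
 * are not modeled in this sketch. */
struct wt_stats wt_simulate(const struct mem_ref *refs, size_t n)
{
    enum { LINES = 2 };
    uint32_t tags[LINES] = { 0 };
    size_t last_used[LINES] = { 0 };   /* 0 means the line is empty */
    struct wt_stats s = { 0, 0, 0, 0 };

    for (size_t i = 0; i < n; i++) {
        uint32_t tag = refs[i].addr >> 1;   /* 1 offset bit, 3-bit tag */
        int line = -1;

        for (int w = 0; w < LINES; w++)
            if (last_used[w] != 0 && tags[w] == tag)
                line = w;

        if (line >= 0) {
            s.hits++;
        } else {
            s.misses++;
            s.block_reads++;    /* cache miss: read entire block */
            line = 0;
            for (int w = 1; w < LINES; w++)
                if (last_used[w] < last_used[line])
                    line = w;
            tags[line] = tag;   /* eviction: no memory write needed */
        }
        if (refs[i].is_store)
            s.mem_writes++;     /* write: the updated item goes to memory */
        last_used[line] = i + 1;
    }
    return s;
}
```

Fed the seven references LB M[1], LB M[7], SB M[0], SB M[5], LB M[10], SB M[5], SB M[10], the sketch reports 4 misses and 3 hits, with 4 block reads and 4 memory writes (one per store).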

  • Next Goal: Write-Through vs. Write-Back

    Can we also design the cache NOT to write all stores immediately to memory?
    – Keep the current copy in cache, and update memory when data is evicted (write-back policy)
    – Write-back all evicted lines?
      • No, only written-to blocks

  • Write-Back Meta-Data (Valid, Dirty Bits)

    Line layout: V | D | Tag | Byte 1 | Byte 2 | … | Byte N

    • V = 1 means the line has valid data
    • D = 1 means the bytes are newer than main memory
    • When allocating line:
      – Set V = 1, D = 0, fill in Tag and Data
    • When writing line:
      – Set D = 1
    • When evicting line:
      – If D = 0: just set V = 0
      – If D = 1: write-back Data, then set D = 0, V = 0
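The valid/dirty rules above map directly onto three small operations. The following sketch is illustrative only: the struct `wb_line` and the `wb_*` function names are assumptions, and the data bytes are left out so that only the metadata transitions are shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Metadata of one write-back cache line (data bytes omitted in this sketch). */
struct wb_line {
    bool valid;
    bool dirty;
    uint32_t tag;
};

/* Allocating a line: set V = 1, D = 0, fill in the tag. */
void wb_allocate(struct wb_line *l, uint32_t tag)
{
    l->valid = true;
    l->dirty = false;
    l->tag = tag;
}

/* Writing a line: the cached bytes are now newer than main memory. */
void wb_write(struct wb_line *l)
{
    l->dirty = true;
}

/* Evicting a line: returns true when the data must be written back to
 * memory first (D = 1); either way the line ends with D = 0, V = 0. */
bool wb_evict(struct wb_line *l)
{
    bool needs_write_back = l->valid && l->dirty;
    l->dirty = false;
    l->valid = false;
    return needs_write_back;
}
```

A line that was only read is evicted without any memory traffic, while a line that was written at least once costs exactly one write-back, however many stores hit it.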

  • Write-back Example

    • Example: How does a write-back cache work?
    • Assume write-allocate

  • Handling Stores (Write-Back)

    16-byte, byte-addressed memory
    4-byte, fully associative cache: 2-byte blocks, write-allocate
    4-bit addresses: 3-bit tag, 1-bit offset

    Memory Instructions:
    LB $1 ← M[1]
    LB $2 ← M[7]
    SB $2 → M[0]
    SB $1 → M[5]
    LB $2 ← M[10]
    SB $1 → M[5]
    SB $1 → M[10]

    [Figure: cache lines (lru, V, d, tag, data), register file ($0–$3), and memory (addresses 0–15) in the initial state]

    Misses: 0, Hits: 0
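    The address split used in this example (4-bit addresses, 3-bit tag, 1-bit offset) can be checked with a few lines of C. The helper names `tag_of` and `offset_of` are hypothetical and only serve this sketch.

```c
#define ADDR_BITS   4u   /* 16-byte memory: 4-bit addresses */
#define OFFSET_BITS 1u   /* 2-byte blocks: 1-bit offset */

/* Upper 3 bits of the address: which 2-byte block it belongs to. */
unsigned tag_of(unsigned addr) {
    return (addr & ((1u << ADDR_BITS) - 1u)) >> OFFSET_BITS;
}

/* Lowest bit of the address: which byte within the block. */
unsigned offset_of(unsigned addr) {
    return addr & ((1u << OFFSET_BITS) - 1u);
}
```

    For instance, M[7] is address 0111, so its tag is 011 and its offset is 1, while M[10] is address 1010 with tag 101 and offset 0.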

  • Write-Back (REF 1)

    [Figure: cache, register file, and memory state for LB $1 ← M[1], Addr: 0001]

    Access history: M

    Misses: 1, Hits: 0

  • Write-Back (REF 2)

    [Figure: cache, register file, and memory state for LB $2 ← M[7], Addr: 0111]

    Access history: M, M

    Misses: 2, Hits: 0

  • Write-Back (REF 3)

    [Figure: cache, register file, and memory state for SB $2 → M[0], Addr: 0000]

    Access history: M, M, Hit

    Misses: 2, Hits: 1

  • Write-Back (REF 4)

    [Figure: cache, register file, and memory state for SB $1 → M[5], Addr: 0101]

    Access history: M, M, Hit, M

    Misses: 3, Hits: 1

  • Write-Back (REF 5)

    [Figure: cache, register file, and memory state for LB $2 ← M[10], Addr: 1010]

    Access history: M, M, Hit, M, M

    Misses: 4, Hits: 1

  • Write-Back (REF 6)

    [Figure: cache, register file, and memory state for SB $1 → M[5], Addr: 0101]

    Access history: M, M, Hit, M, M, Hit

    Misses: 4, Hits: 2

  • Write-Back (REF 7)

    [Figure: cache, register file, and memory state for SB $1 → M[10]]

    Access history: M, M, Hit, M, M, Hit, Hit

    Misses: 4, Hits: 3

  • Write-Back (REF 8, 9)

    [Figure: cache, register file, and memory state after references 8 and 9]

    Memory Instructions: ... SB $1 → M[5]; LB $2 ← M[10]; SB $1 → M[5]; SB $1 → M[10]; SB $1 → M[5]; SB $1 → M[10]

    Access history: M, M, Hit, M, M, Hit, Hit, Hit, Hit

    Misses: 4, Hits: 5

  • How Many Memory References?

    Write-back performance
    • How many reads?
    • How many writes?
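    One way to answer these questions is to replay the nine references from the example through a small simulator. The sketch below assumes the example's configuration (two 2-byte lines, fully associative, LRU, write-allocate) and counts memory traffic in bytes. All names (`Ref`, `Stats`, `simulate`, `SLIDE_TRACE`) are hypothetical, and treating each SB under write-through as a 1-byte memory write is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_LINES   2   /* 4-byte cache / 2-byte blocks */
#define BLOCK_BYTES 2

typedef struct { bool valid, dirty; unsigned tag, last_used; } Line;
typedef struct { bool is_store; unsigned addr; } Ref;
typedef struct { unsigned hits, misses, bytes_read, bytes_written, dirty_lines; } Stats;

/* The nine references of the example: LB = load, SB = store. */
static const Ref SLIDE_TRACE[9] = {
    {false, 1}, {false, 7}, {true, 0}, {true, 5}, {false, 10},
    {true, 5},  {true, 10}, {true, 5}, {true, 10}
};

/* Replays refs through a fully associative, LRU, write-allocate cache.
 * write_back = true: stores only mark the line dirty.
 * write_back = false: write-through, each store also writes 1 byte to memory. */
Stats simulate(const Ref *refs, unsigned n, bool write_back) {
    Line cache[NUM_LINES] = {0};
    Stats s = {0};
    for (unsigned t = 0; t < n; t++) {
        unsigned tag = refs[t].addr / BLOCK_BYTES;
        Line *line = NULL;
        for (unsigned i = 0; i < NUM_LINES; i++)
            if (cache[i].valid && cache[i].tag == tag) line = &cache[i];
        if (line) {
            s.hits++;
        } else {
            s.misses++;
            /* Victim: first invalid line, otherwise the least recently used. */
            line = &cache[0];
            for (unsigned i = 0; i < NUM_LINES; i++) {
                if (!cache[i].valid) { line = &cache[i]; break; }
                if (cache[i].last_used < line->last_used) line = &cache[i];
            }
            if (line->valid && line->dirty)
                s.bytes_written += BLOCK_BYTES;   /* write back the whole block */
            s.bytes_read += BLOCK_BYTES;          /* write-allocate: fill on any miss */
            line->valid = true;
            line->dirty = false;
            line->tag = tag;
        }
        line->last_used = t + 1;
        if (refs[t].is_store) {
            if (write_back) line->dirty = true;
            else s.bytes_written += 1;            /* only the updated byte */
        }
    }
    for (unsigned i = 0; i < NUM_LINES; i++)
        if (cache[i].valid && cache[i].dirty) s.dirty_lines++;
    return s;
}
```

    Under these assumptions, `simulate(SLIDE_TRACE, 9, true)` reports 4 misses and 5 hits, 8 bytes read (one 2-byte block per miss), 2 bytes written (one dirty eviction), and 2 lines still dirty that will cost another write-back each when they are evicted. With `write_back` set to false the reads are unchanged but every one of the 6 stores goes to memory.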

  • Write-Back vs. Write-Through Example

    Assume: large associative cache, 16-byte lines

    for (i=1; i

  • So is write-back just better?

    Short Answer: Yes (fewer writes is a good thing)
    Long Answer: It's complicated.
    • Evictions require the entire line to be written back to memory (vs. just the data that was written)
    • Write-back can lead to incoherent caches on multi-core processors (later lecture)

  • Cache-Conscious Programming

    // H = 12, W = 10
    int A[H][W];

    for(x=0; x < W; x++)
      for(y=0; y < H; y++)
        sum += A[y][x];

    [Figure: array with accesses numbered 1–12 running down the first column]

    • Every access is a cache miss!
    • (unless the entire matrix fits in the cache)

  • Cache-Conscious Programming

    • Block size = 4 → 75% hit rate
    • Block size = 8 → 87.5% hit rate
    • Block size = 16 → 93.75% hit rate
    • And you can easily prefetch to warm the cache

    [Figure: array with accesses numbered 1, 2, 3, … running along each row]

    // H = 12, W = 10
    int A[H][W];

    for(y=0; y < H; y++)
      for(x=0; x < W; x++)
        sum += A[y][x];
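    The effect of loop order can be reproduced with a small model. The sketch below counts misses for both traversals of A[H][W] using a tiny fully associative LRU cache. It assumes that block size is measured in array elements and that the cache is much smaller than the matrix. The function name `count_misses` and the `MAX_LINES` limit are hypothetical.

```c
#include <stdbool.h>

#define H 12
#define W 10
#define MAX_LINES 8   /* upper bound on the modelled cache size (assumption) */

/* Counts cache misses for one full traversal of A[H][W].
 * row_major = true:  for y, for x (inner loop walks along a row)
 * row_major = false: for x, for y (inner loop walks down a column)
 * block_elems: array elements per cache block; num_lines: blocks in the cache. */
unsigned count_misses(bool row_major, unsigned block_elems, unsigned num_lines) {
    long recent[MAX_LINES];   /* cached block numbers, most recently used first */
    unsigned misses = 0;
    if (num_lines == 0) return H * W;
    if (num_lines > MAX_LINES) num_lines = MAX_LINES;
    for (unsigned i = 0; i < num_lines; i++) recent[i] = -1;   /* -1 = empty */

    for (unsigned step = 0; step < H * W; step++) {
        unsigned y = row_major ? step / W : step % H;
        unsigned x = row_major ? step % W : step / H;
        long block = (long)((y * W + x) / block_elems);  /* A[y][x] is element y*W + x */

        unsigned pos = 0;
        while (pos < num_lines && recent[pos] != block) pos++;
        if (pos == num_lines) {          /* miss: replace the least recently used */
            misses++;
            pos = num_lines - 1;
        }
        for (; pos > 0; pos--) recent[pos] = recent[pos - 1];  /* move to front */
        recent[0] = block;
    }
    return misses;
}
```

    With a 2-line cache and 4 elements per block, the row-order loop misses 30 times in 120 accesses (75% hit rate) and with 8 elements per block it misses 15 times (87.5%), while the column-order loop with 4 elements per block misses on all 120 accesses.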

  • By the end of the cache lectures…

  • Summary

    • Memory performance matters!
      – often more than CPU performance
      – … because it is the bottleneck, and not improving much
      – … because most programs move a LOT of data
    • Design space is huge
      – Gambling against program behavior
      – Cuts across all layers: users → programs → OS → hardware
    • NEXT: Multi-core processors are complicated
      – Inconsistent views of memory
      – Extremely complex protocols, very hard to get right