
Slides: acg.cis.upenn.edu/talks/micro08_patch_talk.pdf

Token Tenure: PATCHing Token Counting Using Directory-Based Cache Coherence

Arun Raghavan, Colin Blundell, Milo Martin University of Pennsylvania

{arraghav, blundell, milom}@cis.upenn.edu


This work is licensed under the Creative Commons Attribution-Share Alike 3.0 United States License.

• You are free:
   •  to Share: to copy, distribute, display, and perform the work
   •  to Remix: to make derivative works
• Under the following conditions:
   •  Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
   •  Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.
• For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to: http://creativecommons.org/licenses/by-sa/3.0/us/
• Any of the above conditions can be waived if you get permission from the copyright holder.
• Apart from the remix rights granted under this license, nothing in this license impairs or restricts the author's moral rights.

[ 2 ] PATCH - Arun Raghavan - MICRO 2008

Why Yet Another Coherence Protocol?

                                    Fast sharing   Avoids broadcast   Scalable interconnect
Snoopy                              ✔              ✗                  ✗
Directory (track sharers)           ✗              ✔                  ✔
Token Coherence (token counting)    ✔              ✗                  ✔
Our goal                            ✔              ✔                  ✔

This work: combining directory and token counting

Overview
• Begin with a standard directory protocol
• Fast sharing misses? Direct requests
• Ensure safety? Token counting
• Broadcast-free forward progress? Token Tenure
   •  Directory selects one requestor to retain tokens
   •  Requestors give up tokens after a timeout interval
• PATCH: Predictive, Adaptive Token Counting Hybrid
   •  Send request “hints” directly to predicted sharers
   •  Retain scalability? Lowest-priority, best-effort delivery

Fast sharing misses, scales as directory

Directory Operation

[Figure, built up over several slides: processors P0, P1, P2 in states M, I, I, with the directory recording Owner: P0. A store miss is handled in three numbered steps using the messages GetM, Data (acks=1), and Unblock. Afterwards the old owner is in I, the requestor is in M, and the directory records Owner: P1.]

Directory with Direct Requests?

[Figure, built up over several slides: processors P0, P1, P2 in states I, M, I, with the directory recording Owner: P1. A load miss sends a GetS directly to the owner, which replies with Data; the two caches end up in O and S. A racing store miss goes through the directory, which sends Fwd(P0), acks=1; the reply Data, acks=1 lets the store requestor enter M.]

Incoherence!!

Why? Direct requests break key directory assumption

Restoring Coherence

• Coherence invariant: one writer or many readers
• Directory: enforces implicitly by distributed algorithm
   •  Assumes complete state information at the directory
• Alternative: encode permission with token count
   •  Fixed number of tokens per cache block
   •  Need all tokens to write
   •  One or more tokens to read
• Explicitly enforces coherence invariant
   •  Without regard to races, protocol details

Token Coherence [ISCA ’03]
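The token-counting rules on this slide can be sketched in a few lines of Python. This is only an illustration, not the protocol from the talk: the `Block` class, the `transfer` helper, the holder names, and the choice of four tokens per block are all assumptions made for the example.

```python
from dataclasses import dataclass, field

TOTAL_TOKENS = 4  # assumed fixed token count per block (illustrative value)


@dataclass
class Block:
    """Token holdings for one cache block, keyed by holder name (hypothetical model)."""
    tokens: dict = field(default_factory=lambda: {"memory": TOTAL_TOKENS})

    def can_read(self, holder: str) -> bool:
        # One or more tokens grant read permission.
        return self.tokens.get(holder, 0) >= 1

    def can_write(self, holder: str) -> bool:
        # All tokens are needed to write.
        return self.tokens.get(holder, 0) == TOTAL_TOKENS

    def transfer(self, src: str, dst: str, count: int) -> None:
        # Tokens are only moved, never created or destroyed.
        if count < 1 or self.tokens.get(src, 0) < count:
            raise ValueError("source does not hold that many tokens")
        self.tokens[src] -= count
        self.tokens[dst] = self.tokens.get(dst, 0) + count
        assert sum(self.tokens.values()) == TOTAL_TOKENS


block = Block()
block.transfer("memory", "P1", TOTAL_TOKENS)  # P1 becomes the sole writer
block.transfer("P1", "P0", 1)                 # P0 gains a read copy
writers = [p for p in ("P0", "P1", "P2") if block.can_write(p)]
readers = [p for p in ("P0", "P1", "P2") if block.can_read(p)]
```

Because the total never changes, a holder with every token knows no other copy can be read, whatever order the messages arrived in.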

Directory with Direct Requests: Tokens

[Figure, built up over several slides: the same race replayed with token counting. P0, P1, P2 with the directory recording Owner: P1. A load miss issues a direct GetS and receives Data; a racing store miss goes through the directory, which sends Fwd(P0), and Data is sent in response.]

P0 Starves

Token Coherence Solution: Persistent Requests

[Figure: P0, P1, P2 with the directory recording Owner: P1; a load miss and a store miss race.]

Persistent Request
• Broadcast
• Table at each processor
• N² state

Our Solution

[Figure: P0, P1, P2 with the directory recording Owner: P1; a load miss and a store miss race. Figure labels: Timeout; Directory forwards to P0.]

• P0’s request reached directory first
   •  Directory declares P0 winner
• P2 non-winner
   •  Inferred after timeout

Token Tenure
• Tokens can be tenured or untenured
   •  Tokens by default untenured
   •  Untenured tokens must be sent to the directory…
   •  … unless tenured within timeout window
• Active (winner) requestors tenure tokens
   •  Directory activates one request at a time
   •  Directory explicitly informs active requestor
• Multiple processors can hold tenured tokens

Why does this ensure forward progress?
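One way to picture these rules is the toy model below. It is a sketch under assumptions, not the implementation from the talk: the `Requestor` class, the `TIMEOUT` value, and the cycle-by-cycle loop are hypothetical, and the directory is reduced to "forward whatever is bounced to the active requestor".

```python
from dataclasses import dataclass

TIMEOUT = 10  # assumed tenure timeout, in arbitrary cycles


@dataclass
class Requestor:
    name: str
    untenured: int = 0    # tokens received but not yet tenured
    tenured: int = 0      # tokens this requestor may keep
    active: bool = False  # set when the directory activates the request
    deadline: int = 0     # cycle at which untenured tokens must be returned

    def receive(self, count: int, now: int) -> None:
        # Tokens arrive untenured by default; an active requestor tenures them.
        if self.active:
            self.tenured += count
        else:
            if self.untenured == 0:
                self.deadline = now + TIMEOUT
            self.untenured += count

    def activate(self) -> None:
        # The directory explicitly informs the one active requestor,
        # which then tenures the tokens it already holds.
        self.active = True
        self.tenured += self.untenured
        self.untenured = 0

    def tick(self, now: int) -> int:
        # Returns how many tokens are bounced to the directory this cycle.
        if not self.active and self.untenured and now >= self.deadline:
            bounced, self.untenured = self.untenured, 0
            return bounced
        return 0


p0, p2 = Requestor("P0"), Requestor("P2")
p0.receive(1, now=0)  # the race splits the tokens between the requestors
p2.receive(3, now=0)
p0.activate()         # P0's request reached the directory first
for cycle in range(TIMEOUT + 1):
    bounced = p2.tick(cycle)
    if bounced:
        p0.receive(bounced, now=cycle)  # directory forwards to the active requestor
```

In this run the non-winner P2 gives up its three tokens when its timeout expires, so the active requestor P0 ends up with all four without any broadcast.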

Flow of Tokens to Active Requestor

[Figure: token flow among the directory, the active requestor, and a racing requestor. Tokens are marked tenured or untenured; edges are labeled direct request, forwarded request, timeout, and bounce.]

Restore directory’s ability to resolve races

Implementation: add timeout

Token Tenure: Implementation
• No common-case performance impact
   •  Activation off critical path of miss
   •  Token count still determines permissions
• No additional traffic
   •  Activation piggybacked on forwarded messages
• Set timeout to twice average roundtrip latency
   •  Avoid early timeout…
   •  …but minimize slowing down winner in races
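The timeout rule amounts to a one-line calculation. The sketch below assumes the average is tracked as a running mean of observed round trips; the slide does not say how the average is obtained, so the `TenureTimeout` class and the sample latencies are purely illustrative.

```python
class TenureTimeout:
    """Tracks average round-trip latency and derives the tenure timeout (hypothetical helper)."""

    def __init__(self, initial_latency: float) -> None:
        self.total = initial_latency
        self.samples = 1

    def observe(self, roundtrip_cycles: float) -> None:
        # Fold one more measured round trip into the running mean.
        self.total += roundtrip_cycles
        self.samples += 1

    @property
    def average(self) -> float:
        return self.total / self.samples

    @property
    def timeout(self) -> float:
        # Twice the average round trip, as the slide suggests.
        return 2 * self.average


t = TenureTimeout(initial_latency=100)
for sample in (80, 120, 100):
    t.observe(sample)
```

With these made-up samples the mean is 100 cycles, so the timeout is 200 cycles: long enough that a typical activation arrives first, short enough that a non-winner does not hold tokens much longer than needed.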

Using Direct Requests
• Direct requests to no, some or all processors

[Figure: normalized runtime (0.6 to 1.1) on jbb, oltp, apache, barnes, and ocean for Directory, PATCH-NoDirect, PATCH-Owner, Broad.-If-Shared, and PATCH-Broadcast; 64 processors, 16B/cycle. Annotated improvements: 19%, 7%, 28%, 8%, 18%; average 14%.]

Direct requests improve performance
But at what cost?

Dest. Set Prediction [ISCA ’03]

Direct Requests: Runtime and Traffic

[Figure, two panels for jbb, oltp, apache, barnes, and ocean: Runtime (normalized runtime, 0.6 to 1.1) and Traffic (normalized traffic, 0 to 3), each comparing Directory, PATCH-NoDirect, PATCH-Owner, Broad.-If-Shared, and PATCH-Broadcast.]

PATCH-NoDirect and Directory have identical traffic
PATCH-Broadcast has >100% overhead

Best-Effort Direct Requests
• Direct requests in PATCH
   1.  Strictly in addition to directory requests
   2.  Don’t need explicit acks
   Therefore, direct requests can be dropped arbitrarily
• Best-effort delivery
   •  Lowest priority, deliver strictly on a “do-no-harm” basis
   •  If queued up too long in switches or controllers: drop
   Lower bound: PATCH-NoDirect performance
• Adequate bandwidth? Drop no requests
• Scarce bandwidth? Drop all requests

Never worse than directory
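The delivery policy can be sketched as a two-queue output port. Everything here is an assumption made for illustration: the `SwitchPort` class, the `MAX_HINT_AGE` staleness bound, and the one-message-per-cycle link are not taken from the talk.

```python
from collections import deque

MAX_HINT_AGE = 4  # assumed staleness bound, in cycles


class SwitchPort:
    """One output link that sends at most one message per cycle (hypothetical model)."""

    def __init__(self) -> None:
        self.normal = deque()  # directory requests, data, tokens
        self.hints = deque()   # (enqueue_cycle, message) direct requests
        self.dropped = 0

    def enqueue(self, message: str, now: int, is_hint: bool = False) -> None:
        if is_hint:
            self.hints.append((now, message))
        else:
            self.normal.append(message)

    def send_one(self, now: int):
        # Drop hints that have been queued for too long.
        while self.hints and now - self.hints[0][0] > MAX_HINT_AGE:
            self.hints.popleft()
            self.dropped += 1
        # Hints are lowest priority: they go only when nothing else is waiting.
        if self.normal:
            return self.normal.popleft()
        if self.hints:
            return self.hints.popleft()[1]
        return None


busy = SwitchPort()
busy.enqueue("hint->P1", now=0, is_hint=True)
for cycle in range(6):
    busy.enqueue(f"data{cycle}", now=cycle)  # the link is saturated
sent = [busy.send_one(cycle) for cycle in range(6)]

idle = SwitchPort()
idle.enqueue("hint->P1", now=0, is_hint=True)
delivered = idle.send_one(1)                 # spare bandwidth: the hint goes out
```

On the saturated port the hint never delays a normal message and is eventually dropped, which matches the PATCH-NoDirect lower bound; on the idle port it is delivered.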

Best-Effort Direct Requests

[Figure: normalized runtime (0.6 to 1.4) versus number of processors (4 to 512) for Directory, PATCH-Broadcast, and PATCH-BestEffort; microbenchmark, 2B/cycle. Annotations: 29%, 20%, “Better than both”.]

Broadcast performance with plentiful bandwidth
Converges with directory performance at 512

Adapt dynamically; one-size-fits-all

Enhancing Directory Scalability

[Figure: a Req reaches the Directory (sharer bits shown as 0 1 0 1), which sends Forward messages; the caches are in states I, I, S, S.]

Enhancing Directory Scalability

• Coarse directories: 1 bit for k sharers
   •  Fan-out delivery of forwards: worst case O(N) traffic
   •  Requires acks from non-sharers too
      •  Multiple unicast messages (no ack combining)
      •  Worst case O(N√N) on 2D torus interconnect

[Figure: Directory-coarse. A Req reaches the Directory, which sends Forward messages to caches in states I, I, S, S.]
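The cost of coarseness can be made concrete with a small sketch. The function names, the 16-processor system, and the sharer set are assumptions chosen for the example; the point is only that one bit per k processors forces forwards to, and acks from, processors that never held the block.

```python
import math


def coarse_vector(sharers: set, n: int, k: int) -> list:
    """One bit per group of k processors; a bit is set if any member shares."""
    bits = [0] * math.ceil(n / k)
    for p in sharers:
        bits[p // k] = 1
    return bits


def forward_targets(bits: list, n: int, k: int) -> list:
    """Every processor in a marked group gets a forward, sharer or not."""
    return [p for p in range(n) if bits[p // k]]


n = 16
sharers = {2, 9}
exact = forward_targets(coarse_vector(sharers, n, k=1), n, k=1)
coarse = forward_targets(coarse_vector(sharers, n, k=4), n, k=4)
unnecessary_acks = len(coarse) - len(sharers)  # acks from non-sharers
```

Here the exact encoding forwards to 2 processors, while the 4-per-bit encoding forwards to 8 and collects 6 acks from non-sharers. In the worst case all N processors send a unicast ack, and on a 2D torus each ack crosses on the order of √N links, which is where the O(N√N) bound comes from.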

Enhancing Directory Scalability

• With PATCH only token holders need respond
   •  Avoid “unnecessary acknowledgements”
   •  When # of sharers small, prevents acks from dominating

Even more scalable than directory

[Figure: Directory-coarse versus PATCH-coarse. In both, a Req reaches the Directory and Forward messages fan out to caches in states I, I, S, S.]

Coarse Directory: Runtime and Traffic

[Figure, two panels comparing Directory and PATCH versus coarseness (1 to 256 sharers/bit), microbenchmark @ 256 processors: Traffic comparison (normalized traffic, 0.8 to 4.8) and Runtime comparison, 2B/cycle (normalized runtime, 0.8 to 2.8). Annotations: 32%, 3.6%.]

PATCH has high tolerance to inexactness

Related Work
• Token counting
   •  Token Coherence [Martin+, ISCA ’03]
   •  Priority Requests [Cuesta+, PDP ’07]
   •  Virtual Hierarchies [Marty+, ISCA ’07]
   •  Ring Order [Marty+, MICRO ’06]
• Predictive direct requests
   •  Multicast snooping [Bilir+, ISCA ’99]
   •  Owner Prediction [Acacio+, SC ’02]
   •  Producer-Consumer sharing [Cheng+, HPCA ’07]
   •  Virtual Circuit Tree Multicast [Jerger+, ISCA ’08]
• Bandwidth Adaptive Snooping [Martin+, HPCA ’02]
• Embedded ring snooping
   •  Uncorq [Strauss+, MICRO ’07]

Conclusion
• PATCH
   •  Directory protocol foundation
   •  Fast sharing? Direct requests
   •  Safety? Token counting
   •  Forward progress? Token tenure
      •  Broadcast-free
   •  Retain scaling of directory? Best-effort delivery
• Resulting properties
   •  One-size-fits-all
   •  Opportunistically uses bandwidth for performance
   •  Yet scales no worse than directory
