HARD: Hardware-Assisted lockset- based Race Detection P.Zhou, R.Teodorescu, Y.Zhou. HPCA’07 Shimin...


HARD: Hardware-Assisted lockset-based Race Detection

P.Zhou, R.Teodorescu, Y.Zhou. HPCA’07

Shimin Chen

LBA Reading Group Presentation

Motivation

Data race detection is important
S/W solutions are slow (not good for production runs)
Previous H/W solutions focus on the happens-before relation
– Cannot detect potential races

Motivating Example

Solution: HARD (h/w lockset)

Challenges:
– How to efficiently store and maintain the lockset for each variable in hardware?
– How to efficiently perform the set operation in the lockset algorithm?

Main ideas (detailed later):
– H/W bloom filter
– Piggybacking on cache coherence protocols
– Reset all bloom filters after exiting a barrier

Outline

LockSet (refresh our memory)
HARD
Evaluation
Conclusion

Main Lockset Algorithm

Idea: accesses to every shared variable should be protected by some common lock.

Data structures:
– Thread t’s current lock set: L(t)
– Candidate set for a variable v: C(v)

Algorithm:
– Modify L(t) upon lock acquire and release
– Initialize C(v) to the set of all locks
– When t accesses v, C(v) = C(v) ∩ L(t)
– If C(v) == ∅, report a violation on variable v
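The steps on this slide can be sketched in software as follows. This is a minimal illustration of the basic lockset idea, not HARD's hardware design; the class name `LocksetChecker` and its methods are hypothetical names chosen for this sketch.

```python
from collections import defaultdict


class LocksetChecker:
    """Basic lockset bookkeeping: L(t) per thread, C(v) per variable."""

    def __init__(self, all_locks):
        self.all_locks = frozenset(all_locks)
        self.held = defaultdict(set)   # L(t): locks currently held by thread t
        self.candidates = {}           # C(v): candidate set for variable v
        self.violations = []           # (thread, variable) pairs reported

    def acquire(self, thread, lock):
        self.held[thread].add(lock)

    def release(self, thread, lock):
        self.held[thread].discard(lock)

    def access(self, thread, var):
        # C(v) starts as the set of all locks and shrinks on every access.
        current = self.candidates.get(var, self.all_locks)
        current = current & self.held[thread]
        self.candidates[var] = current
        if not current:
            self.violations.append((thread, var))
        return current


checker = LocksetChecker({"A", "B"})
checker.acquire("t1", "A")
checker.access("t1", "x")      # C(x) becomes {A}
checker.release("t1", "A")
checker.acquire("t2", "B")
checker.access("t2", "x")      # C(x) becomes empty: violation reported
```

Note that the violation is reported even if the two accesses never overlap in time, which is how lockset can flag potential races.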

Reducing False Positives


HARD Overview

LState: exclusive, shared, etc.

BFVector: candidate lock set for the cache line

Lock Register: Thread’s lockset

Counter Register: used for resolving hash collisions (more detail later)

[Figure: field widths. LState: 2 bits; BFVector: 16 bits; Lock Register: 16 bits; Counter Register: 32 bits]

HARD Overview: Operations

A lock is represented by ‘1’ bits in the bloom filter
Fetching a line from memory: set the BFVector to all 1s and LState to exclusive
Update BFVector and LState on accesses
Communicate them through the coherence protocol
Lock register: thread’s lock set

Bloom Filter

Bloom filter: a bit vector that represents a set of keys
– A key is hashed d (e.g. d=3) times and represented by d bits

Construct: for every key in the set, set its 3 bits in the vector
Membership test: given a key, check if all its 3 bits are 1
– Definitely not in the set if some bits are 0
– May have false positives

Filter: 0 0 0 1 1 1 0 0 0 1 1 0 0 1 0 0 0 0 0 1
Bit0=H0(key), Bit1=H1(key), Bit2=H2(key)
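A generic Bloom filter with d=3 hash functions can be sketched as below. The hash family (SHA-256 salted with the hash index) and the 20-bit vector size are assumptions made for illustration only; the slide does not specify H0, H1, H2.

```python
import hashlib


class BloomFilter:
    """Bit vector representing a set of keys, with d hash functions."""

    def __init__(self, num_bits=20, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # the bit vector, stored as an integer

    def _positions(self, key):
        # Hypothetical hash family: salt SHA-256 with the hash index i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely not in the set"; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


bf = BloomFilter()
bf.add("lock_a")
```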

Representing LockSet as Bloom Filter

4 hash functions
Lockset intersection: bloom filter intersection (bitwise AND)
Lockset empty: any of the four 4-bit parts is all 0s
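The representation above can be sketched as a 16-bit vector split into 4 parts of 4 bits. The hash in `lock_signature` (successive 2-bit slices of the lock address) is a hypothetical choice for this sketch, as are the function names; the slides only state that there are 4 hash functions.

```python
PARTS = 4                                   # k: one hash function per part
PART_BITS = 4                               # n: bits per part
ALL_ONES = (1 << (PARTS * PART_BITS)) - 1   # BFVector value on line fetch


def lock_signature(lock_addr):
    """Map a lock address to a 16-bit vector with one bit set per part.

    Hypothetical hash: part i is indexed by a different 2-bit slice of
    the address (low 2 bits dropped, assuming word-aligned locks).
    """
    sig = 0
    for part in range(PARTS):
        index = (lock_addr >> (2 + 2 * part)) & (PART_BITS - 1)
        sig |= 1 << (part * PART_BITS + index)
    return sig


def is_empty(vector):
    """The represented set is empty if any part has no bit set."""
    part_mask = (1 << PART_BITS) - 1
    return any(
        ((vector >> (part * PART_BITS)) & part_mask) == 0
        for part in range(PARTS)
    )


lock_a = lock_signature(0x1000)
lock_b = lock_signature(0x1004)

bf_vector = ALL_ONES        # line fetched from memory
bf_vector &= lock_a         # access while holding lock A
still_ok = not is_empty(bf_vector)
bf_vector &= lock_b         # access while holding only lock B
race_flagged = is_empty(bf_vector)
```

Intersection is a single AND and the emptiness test is a per-part zero check, which is what makes the set operation cheap enough for hardware.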

False Negative Caused by Bloom Filter

Prob of False Negatives

Suppose the candidate set contains m locks
Given a lock that is not in the set, the probability of wrongly recognizing it as a member:
$\text{prob\_whole} = \text{prob\_part}^{k}$
$\text{prob\_part} = 1 - (1 - 1/n)^{m}$

When k=4, n=4:
– 0.0039 (m=1), 0.037 (m=2), 0.112 (m=3)
– Paper says: “experiments show that no races were missed”

But what if the thread currently holds multiple locks?

[Figure: the bloom filter vector split into k parts of n bits each; k=4, n=4]
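As a worked instance of the formula on this slide, here is the m=2 case with k=4 and n=4, written out step by step:

```latex
\begin{align*}
\text{prob\_part}  &= 1 - \left(1 - \tfrac{1}{4}\right)^{2} = \tfrac{7}{16} = 0.4375 \\
\text{prob\_whole} &= \left(\tfrac{7}{16}\right)^{4} = \tfrac{2401}{65536} \approx 0.0366
\end{align*}
```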

If threads hold 1 to 8 locks (not in the paper)

n bits = 4, k parts = 4
-----------------------------------------------
        m=1     m=2     m=3     m=4
t=1 :  0.0039  0.0366  0.1117  0.2184
t=2 :  0.0078  0.0719  0.2109  0.3891
t=3 :  0.0117  0.1059  0.2991  0.5225
t=4 :  0.0155  0.1387  0.3774  0.6267
t=5 :  0.0194  0.1702  0.4469  0.7083
t=6 :  0.0232  0.2006  0.5087  0.7720
t=7 :  0.0270  0.2299  0.5636  0.8218
t=8 :  0.0308  0.2581  0.6123  0.8607
-----------------------------------------------

Try another design

n bits = 8, k parts = 8
-----------------------------------------------
        m=1     m=2     m=3     m=4
t=1 :  0.0000  0.0000  0.0001  0.0009
t=2 :  0.0000  0.0000  0.0003  0.0017
t=3 :  0.0000  0.0000  0.0004  0.0026
t=4 :  0.0000  0.0000  0.0006  0.0034
t=5 :  0.0000  0.0000  0.0007  0.0043
t=6 :  0.0000  0.0001  0.0008  0.0051
t=7 :  0.0000  0.0001  0.0010  0.0060
t=8 :  0.0000  0.0001  0.0011  0.0069
-----------------------------------------------
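The two tables can be regenerated with a few lines of Python. The single-lock formula is the one given earlier in the deck; the extension to t held locks, 1 - (1 - prob_whole)^t, is an assumption inferred from the tabulated values (it treats each held lock as an independent trial) and is not stated on the slides. The function names are hypothetical.

```python
def prob_whole(m, n, k):
    """Probability that one absent lock is wrongly seen as a member."""
    prob_part = 1 - (1 - 1 / n) ** m
    return prob_part ** k


def prob_any(t, m, n, k):
    """Assumed model: at least one of t held locks is wrongly matched."""
    return 1 - (1 - prob_whole(m, n, k)) ** t


def table(n, k, max_t=8, max_m=4):
    """Rows are t = 1..max_t, columns are m = 1..max_m."""
    return [
        [round(prob_any(t, m, n, k), 4) for m in range(1, max_m + 1)]
        for t in range(1, max_t + 1)
    ]


for label, rows in (("n=4, k=4", table(4, 4)), ("n=8, k=8", table(8, 8))):
    print(label)
    for t, row in enumerate(rows, start=1):
        print(f"t={t} : " + "  ".join(f"{p:.4f}" for p in row))
```

Going from a 16-bit vector (4 parts of 4 bits) to a 64-bit vector (8 parts of 8 bits) cuts the estimated miss probability by roughly two orders of magnitude at m=4, at the cost of four times the storage per cache line.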

Unlock operation: remove bit from bloom filter?

32-bit counter register: each bloom filter bit has a 2-bit counter
Lock: increment the 2-bit counter when the bloom filter bit is set
Unlock: decrement the 2-bit counter; if 0, clear the bloom filter bit
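The counter scheme can be sketched as follows. The class and method names are hypothetical, and saturating the counter at 3 on overflow is an assumption of this sketch; the slides do not say how overflow is handled.

```python
class LockRegister:
    """16-bit lock register plus 16 two-bit counters (32 bits in total)."""

    NUM_BITS = 16
    COUNTER_MAX = 3  # largest value a 2-bit counter can hold

    def __init__(self):
        self.bits = 0
        self.counters = [0] * self.NUM_BITS

    def acquire(self, signature):
        for pos in range(self.NUM_BITS):
            if (signature >> pos) & 1:
                self.bits |= 1 << pos
                # Assumption: saturate instead of wrapping around.
                self.counters[pos] = min(self.counters[pos] + 1, self.COUNTER_MAX)

    def release(self, signature):
        for pos in range(self.NUM_BITS):
            if (signature >> pos) & 1 and self.counters[pos] > 0:
                self.counters[pos] -= 1
                if self.counters[pos] == 0:
                    self.bits &= ~(1 << pos)  # last holder of this bit is gone


reg = LockRegister()
reg.acquire(0x1111)   # signature of a first lock (illustrative value)
reg.acquire(0x1112)   # second lock shares three bit positions with the first
reg.release(0x1111)   # shared bits survive because their counters were 2
```

Without the counters, releasing the first lock would clear the three shared bits and the register would no longer represent the second lock.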

Candidate Set and LState Communications

Must broadcast changes to C(v) if the cache line is in shared state

Handling Barriers

Set BFVectors to all 1s after exiting a barrier

(what if t2 does not hold any lock?)

Three Approximations

Bloom filter to represent lockset
Lockset info only in cache
– Can only detect races in a short window of execution
Cache line granularity
– False sharing
– Compiler to put shared variables on different lines?
– Removing false sharing is generally good
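The false-sharing effect of the cache line approximation can be illustrated with a small sketch. The 64-byte line size, the addresses, and the helper name `candidate_sets` are assumptions for illustration, and plain Python sets stand in for the bloom filter.

```python
LINE_SIZE = 64  # bytes; hypothetical cache line size for this sketch
WORD_SIZE = 4   # bytes


def candidate_sets(accesses, granularity):
    """Track one candidate set per `granularity`-byte unit of memory.

    accesses: iterable of (address, set of locks held during the access).
    """
    candidates = {}
    for addr, held in accesses:
        unit = addr // granularity
        # First access: all locks intersected with `held` is just `held`.
        candidates[unit] = candidates.get(unit, held) & held
    return candidates


accesses = [
    (0x100, {"A"}),  # variable x, always protected by lock A
    (0x104, {"B"}),  # variable y, always protected by lock B
    (0x100, {"A"}),
]
by_word = candidate_sets(accesses, WORD_SIZE)
by_line = candidate_sets(accesses, LINE_SIZE)
false_alarm = any(not c for c in by_line.values())
```

Both variables are consistently locked, yet tracking one candidate set per line makes it empty, because x and y share a line but not a lock.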


Methodology

SESC: cycle-accurate execution-driven simulator (MIPS instruction set)
Six SPLASH-2 benchmarks
Randomly inject a data race: randomly remove a dynamic instance of a lock and the corresponding unlock
Compare with happens-before and ideal lockset

Bug detected, false alarms

Ideal: word granularity, keep state in memory, perfect lockset
# of false alarms is # of source code locations; dynamic errors are many more

Mainly bus traffic increase
Note that HARD requires a bloom filter operation per memory access in the processor pipeline

Conclusion

Main idea: bloom filter to represent lockset

Three approximations:
– Bloom filter to represent lockset
– Lockset info only in cache
– Cache line granularity

Problems:
– Lockset: false positives
– Seems hard to add operations into the processor pipeline
– Are these the right approximations for monitoring production runs?