ECE729 : Advance Computer Architecture

Anshul Kumar, CSE IITD

Lecture 26: Synchronization, Memory Consistency

25th March, 2010



Page 2: Synchronization Problem

• Processes run on different processors independently

• At some point they need to know the status of each other for
  – communication
  – mutual exclusion, etc.

• Hardware primitives required for these operations

Page 3: Consider an example

Bank transaction from account number A :

• b = read_bal (A)

• b = b – debit_amt

• if b >= bmin then update_bal (A, b)

Page 4: Two concurrent transactions

Transaction 1 :
• b1 = read_bal (A)
• b1 = b1 – debit_amt1
• if b1 >= bmin then update_bal (A, b1)

Transaction 2 :
• b2 = read_bal (A)
• b2 = b2 – debit_amt2
• if b2 >= bmin then update_bal (A, b2)

Page 5: Two concurrent transactions

Serialize reads and writes: one transaction must finish its read and its update before the other reads the balance.

Transaction 1 :
• b1 = read_bal (A)
• b1 = b1 – debit_amt1
• if b1 >= bmin then update_bal (A, b1)

Transaction 2 :
• b2 = read_bal (A)
• b2 = b2 – debit_amt2
• if b2 >= bmin then update_bal (A, b2)
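A minimal Python sketch of why the two transactions need to be serialized. The opening balance of 100, the debits of 30 and 50, bmin = 0, the helper `run` and the schedule encoding are all illustrative assumptions, not part of the slides. Each transaction is split into a read step and a write step so that the interleaving can be chosen explicitly.

```python
def run(schedule, balance, debits, bmin=0):
    """Execute the steps of the two transactions in the order given by `schedule`.

    `schedule` is a list of (op, t) pairs: op "R" is the read-and-subtract step
    of transaction t, op "W" is its conditional update step.
    """
    account = {"A": balance}
    local = {}                                   # private copies b1, b2

    def read(t):                                 # b = read_bal(A); b = b - debit_amt
        local[t] = account["A"] - debits[t]

    def write(t):                                # if b >= bmin: update_bal(A, b)
        if local[t] >= bmin:
            account["A"] = local[t]

    steps = {"R": read, "W": write}
    for op, t in schedule:
        steps[op](t)
    return account["A"]


debits = {1: 30, 2: 50}                          # hypothetical debit amounts
serial = [("R", 1), ("W", 1), ("R", 2), ("W", 2)]
interleaved = [("R", 1), ("R", 2), ("W", 1), ("W", 2)]

print(run(serial, 100, debits))                  # both debits applied
print(run(interleaved, 100, debits))             # the first debit is lost
```

In the interleaved schedule both transactions read the same opening balance, so the later write overwrites the earlier one.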

Page 6: Lock for mutual exclusion

Transaction 1 :
acquire: x1 = read (lock)
         if x1 = 1 then goto acquire
         set (lock)
         do transaction1
         ……
release: clear (lock)

Transaction 2 :
acquire: x2 = read (lock)
         if x2 = 1 then goto acquire
         set (lock)
         do transaction2
         ……
release: clear (lock)

Page 7: Lock for mutual exclusion

Both transactions run the same acquire/release code as on the previous page. Because read (lock) and set (lock) are separate steps, both transactions can read lock = 0 before either one sets it, and both then proceed into their transactions.
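The weakness of a lock built from a separate read and set can be sketched in Python with generators, where every `yield` is a point at which the other processor may run. The names `transaction`, `max_inside` and the bookkeeping fields `inside` and `worst` are hypothetical helpers for this illustration only.

```python
def transaction(mem):
    """Lock code built from a separate read and set of the lock word."""
    while True:
        x = mem["lock"]                    # acquire: x = read(lock)
        yield
        if x == 1:
            continue                       # goto acquire
        mem["lock"] = 1                    # set(lock)
        yield
        break
    mem["inside"] += 1                     # do transaction
    mem["worst"] = max(mem["worst"], mem["inside"])
    yield
    mem["inside"] -= 1
    mem["lock"] = 0                        # release: clear(lock)


def max_inside(schedule):
    """Largest number of transactions that were inside the critical section at once."""
    mem = {"lock": 0, "inside": 0, "worst": 0}
    procs = {1: transaction(mem), 2: transaction(mem)}
    for p in schedule:
        next(procs[p], None)               # run processor p up to its next yield
    return mem["worst"]


print(max_inside([1, 1, 1, 1, 2, 2, 2, 2]))   # one after the other
print(max_inside([1, 2, 1, 2, 1, 2, 1, 2]))   # both read lock = 0 first
```

With the alternating schedule both processors see the lock free and enter together, which is what an atomic read+write primitive is meant to rule out.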

Page 8: Synchronization Primitives

Hardware primitive required

• Should have an atomic read+write operation

• Examples:
  – test&set
  – exchange
  – fetch&increment
  – load linked, store conditional
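A rough Python model of the first three primitives. Python has no access to the hardware instructions, so the class `AtomicWord` (a hypothetical name) uses a `threading.Lock` purely as a stand-in for the hardware guarantee that each read+write pair is indivisible.

```python
import threading


class AtomicWord:
    """One memory word with read+write primitives that cannot be interrupted."""

    def __init__(self, value=0):
        self._value = value
        self._hw = threading.Lock()        # assumption: models hardware atomicity

    def load(self):
        with self._hw:
            return self._value

    def test_and_set(self):
        """Return the old value and set the word to 1."""
        with self._hw:
            old, self._value = self._value, 1
            return old

    def exchange(self, new):
        """Swap `new` with the word and return the old value."""
        with self._hw:
            old, self._value = self._value, new
            return old

    def fetch_and_increment(self):
        """Return the old value and add 1 to the word."""
        with self._hw:
            old = self._value
            self._value = old + 1
            return old


def count_with_threads(n_threads=4, n_iters=1000):
    """Several threads increment one word; no increment may be lost."""
    word = AtomicWord()
    threads = [
        threading.Thread(
            target=lambda: [word.fetch_and_increment() for _ in range(n_iters)]
        )
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return word.load()
```

In each case the value returned is the one read in the same indivisible step as the write, which is the property the lock code needs.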

Page 9: Spin Lock with Exchange Instr.

Lock: 0 indicates free and 1 indicates locked

Code to lock X :
        r2 ← 1
lockit: r2 ↔ X            ;atomic exchange
        if(r2≠0)lockit    ;already locked

Locks are cached for efficiency; coherence is used

Better code to lock X :
lockit: r2 ← X            ;read lock
        if(r2≠0)lockit    ;not available
        r2 ← 1
        r2 ↔ X            ;atomic exchange
        if(r2≠0)lockit    ;already locked
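The "better code" (spin on an ordinary read, then attempt the exchange) might look as follows in Python. This is only a sketch: the class `Word` is hypothetical, its internal `threading.Lock` stands in for the atomicity that the hardware exchange provides, and `time.sleep(0)` is added so that spinning Python threads give each other a chance to run.

```python
import threading
import time


class Word:
    """Shared lock word X: 0 = free, 1 = locked."""

    def __init__(self):
        self.value = 0
        self._hw = threading.Lock()        # assumption: models hardware atomicity

    def exchange(self, new):               # r2 <-> X
        with self._hw:
            old, self.value = self.value, new
            return old

    def store(self, new):
        with self._hw:
            self.value = new


def lock(x):
    while True:
        while x.value != 0:                # read lock; spin while not available
            time.sleep(0)
        if x.exchange(1) == 0:             # atomic exchange; old value 0 = acquired
            return


def unlock(x):
    x.store(0)


def demo(n_threads=4, n_iters=200):
    x, total = Word(), [0]

    def worker():
        for _ in range(n_iters):
            lock(x)
            total[0] += 1                  # critical section
            unlock(x)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

Spinning on the plain read keeps waiting processors in their own caches; the exchange, which needs exclusive access to the line, is attempted only when the lock looks free.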

Page 10: LD Linked & ST conditional

Simpler to implement

• atomic exchange r2 ↔ X using LL and SC
try:    r3 ← r2           ;move exchange value
        LL r1, X          ;load linked
        SC r3, X          ;store conditional
        if(r3=0)try       ;branch, store fails
        r2 ← r1           ;put loaded value in r2

• fetch&increment using LL and SC
try:    LL r1, X          ;load linked
        r3 ← r1 + 1       ;increment
        SC r3, X          ;store conditional
        if(r3=0)try       ;branch, store fails
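A toy, single-threaded Python model of LL and SC, written to make the retry loop visible. The class `Memory`, its per-processor link table and the `interfere` hook are assumptions of this sketch; real hardware tracks the link in its own way.

```python
class Memory:
    """Toy memory with load linked (ll) and store conditional (sc)."""

    def __init__(self):
        self.data = {}
        self.links = {}                     # processor id -> linked address

    def ll(self, cpu, addr):
        self.links[cpu] = addr              # remember the link
        return self.data.get(addr, 0)

    def store(self, addr, value):
        self.data[addr] = value             # any store to addr breaks its links
        for cpu in [c for c, a in self.links.items() if a == addr]:
            del self.links[cpu]

    def sc(self, cpu, addr, value):
        if self.links.get(cpu) != addr:
            return 0                        # link broken: store fails
        self.store(addr, value)
        return 1                            # store succeeded


def exchange(mem, cpu, addr, r2):
    """Atomic exchange built from LL and SC; returns the old contents of addr."""
    while True:
        r1 = mem.ll(cpu, addr)              # LL r1, X
        if mem.sc(cpu, addr, r2):           # SC r3, X
            return r1                       # r2 <- r1


def fetch_and_increment(mem, cpu, addr, interfere=None):
    """Returns (old value, number of attempts)."""
    attempts = 0
    while True:
        attempts += 1
        r1 = mem.ll(cpu, addr)              # LL r1, X
        if interfere and attempts == 1:
            interfere()                     # another processor writes X here
        if mem.sc(cpu, addr, r1 + 1):       # SC r3, X
            return r1, attempts
```

For example, if another processor stores to X between the LL and the SC, the first SC fails and the loop repeats with the new value.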

Page 11: Spin Lock with LL & SC

lockit: LL r2, X          ;load linked
        if(r2≠0)lockit    ;not available
        r2 ← 1
        SC r2, X          ;store conditional
        if(r2=0)lockit    ;branch, store fails

• performance in presence of contention?
• spin lock with exponential back-off reduces contention
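Exponential back-off can be sketched in Python as below. Here `threading.Lock.acquire(blocking=False)` plays the role of one LL ... SC attempt on the lock word; the base and cap of the waiting window and the helper names are illustrative assumptions.

```python
import random
import threading
import time


def backoff_windows(attempts, base, cap):
    """Upper bound of the random wait before each retry: base, 2*base, ... up to cap."""
    return [min(base * 2 ** i, cap) for i in range(attempts)]


def lock_with_backoff(x, base=1e-6, cap=1e-3):
    window = base
    while not x.acquire(blocking=False):        # one attempt to take the lock
        time.sleep(random.uniform(0, window))   # back off before retrying
        window = min(2 * window, cap)           # double the window after a failure


def demo(n_threads=4, n_iters=200):
    x, total = threading.Lock(), [0]

    def worker():
        for _ in range(n_iters):
            lock_with_backoff(x)
            total[0] += 1                       # critical section
            x.release()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

Each failed attempt doubles the waiting window, so under heavy contention the processors spread their retries out instead of all hitting the lock word at once.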

Page 12: Barrier Synchronization

lock (X)
if(count=0) release ← 0
count++
unlock(X)
if(count=total) {count ← 0; release ← 1}
else spin(release=1)

Page 13: Improved Barrier Synch.

local_sense ← !local_sense
lock (X)
count++
unlock(X)
if(count = total) {count ← 0; release ← local_sense}
else {spin(release = local_sense)}

A tree-based barrier reduces contention
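A Python sketch of the sense-reversing barrier, assuming threads in place of processors. The class `SenseBarrier` and the `demo` driver are hypothetical. One small departure from the pseudocode is made on purpose: each thread remembers its arrival number while it still holds the lock, so that exactly one thread acts as the last arrival.

```python
import threading
import time


class SenseBarrier:
    def __init__(self, total):
        self.total = total
        self.count = 0
        self.release = False
        self.x = threading.Lock()              # lock (X) protecting count
        self.local = threading.local()         # holds each thread's local_sense

    def wait(self):
        sense = not getattr(self.local, "sense", False)
        self.local.sense = sense               # local_sense <- !local_sense
        with self.x:
            self.count += 1                    # count++
            arrived = self.count               # remembered under the lock
        if arrived == self.total:              # last arrival releases the others
            self.count = 0
            self.release = sense
        else:
            while self.release != sense:       # spin(release = local_sense)
                time.sleep(0)


def demo(n=4, phases=3):
    """Each thread logs its phase number, then waits at the barrier."""
    barrier, log = SenseBarrier(n), []

    def worker():
        for p in range(phases):
            log.append(p)                      # "work" of phase p
            barrier.wait()

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Because the release value alternates between phases, a fast thread that re-enters the barrier cannot be confused with a slow thread still leaving the previous one, and the log never shows phase p+1 before every thread has logged phase p.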

Page 14: Memory Consistency Problem

• When must a processor see the value that has been written by another processor? Atomicity of operations – system wide?

• Can memory operations be re-ordered?

Various models :

http://rsim.cs.uiuc.edu/~sadve/Publications/models_tutorial.ps

Page 15: Example

P1:                      P2:
    A = 0                    B = 0
    ...                      ...
    A = 1                    B = 1
L1: if(B=0)S1            L2: if(A=0)S2

Which statements among S1 and S2 are done?

Both S1, S2 may be done if writes are delayed
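The claim can be checked by brute force. The Python sketch below enumerates every interleaving of the two processors, once with writes reaching memory immediately and once with each write held in a write buffer until a separate drain step. The function `outcomes` and its one-entry-per-processor buffer are modelling assumptions, not a description of any particular machine.

```python
def outcomes(buffered):
    """All reachable (S1 done, S2 done) pairs for
         P1: A = 1; if B == 0: S1        P2: B = 1; if A == 0: S2
    buffered=False: a write updates memory at once.
    buffered=True:  a write waits in a buffer, so the next read may overtake it.
    """
    progs = {1: ("A", "B"), 2: ("B", "A")}     # (variable written, variable read)
    results = set()

    def step(mem, pc, pending, done):
        if all(pc[p] == 2 for p in progs):
            results.add((done[1], done[2]))
            return
        for p, (w, r) in progs.items():
            if pc[p] == 0:                     # the write
                if buffered:
                    step(mem, {**pc, p: 1}, {**pending, p: w}, done)
                else:
                    step({**mem, w: 1}, {**pc, p: 1}, pending, done)
            elif pc[p] == 1:                   # the read and the branch
                step(mem, {**pc, p: 2}, pending, {**done, p: mem[r] == 0})
            if pending.get(p):                 # drain the buffered write to memory
                step({**mem, pending[p]: 1}, pc, {**pending, p: None}, done)

    step({"A": 0, "B": 0}, {1: 0, 2: 0}, {}, {1: False, 2: False})
    return results


print((True, True) in outcomes(buffered=False))   # can both be done without delay?
print((True, True) in outcomes(buffered=True))    # and with delayed writes?
```

Without buffering at most one of S1 and S2 is executed; once writes can be delayed, the case where both are executed becomes reachable.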

Page 16: Sequential Consistency

• result of any execution is same as if the operations of all processors were executed in some sequential order

• operations of each processor occur in the order specified by its program

- it requires all memory operations to be atomic

- too restrictive, high overheads

Page 17: Relaxing W→R order

Loads are allowed to overtake stores

Write buffering is permitted

1. Total Store Ordering : Writes are atomic

2. Processor Consistency : Writes need not be atomic - Invalidations may gradually propagate

Page 18: Relaxing W→R & W→W order

Partial Store Ordering

• Loads are allowed to overtake stores

• Writes can be re-ordered

• Memory barriers or fences are used to explicitly order operations

Further improves the performance

Page 19: Examples

P1                  P2
A = 1;              while(flag=0);
flag = 1;           print A;

SC ensures that “1” is printed
TSO, PC also do so
PSO does not

P1                  P2
A = 1;              print B;
B = 1;              print A;

SC ensures that if B is printed as “1” then A is also printed as “1”
TSO, PC also do so
PSO does not
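The flag example can be illustrated with a very small Python model. It assumes that P1's two writes sit in a write buffer that drains either strictly in program order (as under TSO) or in any order (as PSO permits), and that P2 prints A at some moment after it has seen flag = 1. The function `printed_values` is a hypothetical helper for this sketch.

```python
from itertools import permutations


def printed_values(fifo):
    """Values P2 can print for
         P1: A = 1; flag = 1        P2: while flag == 0: pass; print(A)
    fifo=True:  P1's writes reach memory in program order.
    fifo=False: P1's writes may reach memory in either order.
    """
    writes = [("A", 1), ("flag", 1)]               # P1's program order
    orders = [tuple(writes)] if fifo else list(permutations(writes))
    seen = set()
    for order in orders:                           # order of arrival in memory
        for k in range(len(order) + 1):            # P2 looks after k writes arrived
            mem = {"A": 0, "flag": 0}
            for var, val in order[:k]:
                mem[var] = val
            if mem["flag"] == 1:                   # the while loop has exited
                seen.add(mem["A"])
    return seen


print(printed_values(fifo=True))
print(printed_values(fifo=False))
```

When the writes may arrive out of order, flag = 1 can become visible while A is still 0, which is why a memory barrier between the two writes is needed under PSO.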

Page 20: Examples - continued

P1              P2                  P3
A = 1;          while(A=0);         while(B=0);
                B = 1;              print A;

SC ensures that “1” is printed. TSO and PSO also do that but PC does not

P1              P2
A = 1;          B = 1;
print B;        print A;

SC ensures that both can’t be printed as “0”. TSO, PC and PSO do not

Page 21: Relaxing all R/W order

Weak Ordering or Weak Consistency

• Loads and Stores are not restricted to follow an order

• Explicit synchronization primitives are used

• Synchronization primitives follow a strict order

• Easy to achieve

• Low overhead

Page 22: Release Consistency

• Further relaxation of weak ordering

• Synch primitives are divided into acquire and release operations

• R/W operations after an acquire cannot move before it but those before it can be moved after

• R/W operations before a release cannot move after it but those after it can be moved before
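These rules can be written down as a small Python predicate. The function `may_swap` and the operation labels "rw", "acquire" and "release" are hypothetical; the sketch looks only at the ordering rules of the two models and ignores data dependences between the R/W operations.

```python
def may_swap(first, second, model):
    """May two adjacent operations (program order: first, then second) be reordered?

    Operations are "rw" (ordinary read or write), "acquire" or "release".
    Under "WC" every acquire or release is treated as a plain synch operation.
    """
    if model not in ("WC", "RC"):
        raise ValueError("unknown model: " + model)
    if first == "rw" and second == "rw":
        return True                        # ordinary accesses are unordered
    if model == "WC":
        return False                       # nothing crosses a synch operation
    if (first, second) == ("rw", "acquire"):
        return True                        # R/W before an acquire may move after it
    if (first, second) == ("release", "rw"):
        return True                        # R/W after a release may move before it
    return False                           # all other pairs keep program order


for pair in [("rw", "acquire"), ("acquire", "rw"), ("rw", "release"), ("release", "rw")]:
    print(pair, may_swap(*pair, "WC"), may_swap(*pair, "RC"))
```

The two extra reorderings that RC allows are exactly the ones that move operations into the region between an acquire and its release, never out of it.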

Page 23: WC and RC Comparison

[Figure: WC and RC comparison. WC: R/W block 1, synch, R/W block 2, synch, R/W block 3. RC: R/W block 1, acquire, R/W block 2, release, R/W block 3.]