Formalizing Memory Consistency Models
for Program Analysis
Jason Yue Yang
This work was supported in part by NSF Research Grant No. CCR-0081406 and SRC Task 1031.001.
Doctoral Dissertation Defense
Motivation
Multithreaded software – popular, BUT hard to analyze
- Thread libraries: e.g., P-thread, Win32, Solaris
- Language level support of threads: e.g., Java
Memory architectures - more aggressive
[Diagram labels: Load/store; Data dependence; Semaphore; Memory fence; Load-acquire/store-release; Write atomicity]
Central Problem – shared memory consistency models
- Need a clear specification of memory ordering rules
- Need an executable version of memory ordering rules
- Need a method to analyze thread executions against the rules
What Is a Memory Model?
It defines the legal orderings of memory operations that can be perceived at the user level
Example (Itanium assembly code, initially: a = b = 0)

store/load (less restriction):
  CPU 1: st a,1;  st b,1;
  CPU 2: ld r1,b; <1>   ld r2,a; <0>
  0 is OK

store-release/load-acquire (more restriction):
  CPU 1: st a,1;  st.rel b,1;
  CPU 2: ld.acq r1,b; <1>   ld r2,a; <1>
  Can’t observe 0

[Figure: CPUs connected to a shared memory]
Classical Memory Models
Sequential Consistency (SC)
Non-operational view:
1. Common total order
2. Program order
3. Read sees the “latest” write
Operational view:
They execute as if connected to a single memory through a non-deterministic switch
Other weaker models: Parallel Random Access Memory (PRAM), Coherence, Causal Consistency, Processor Consistency, Release Consistency, Lazy Release Consistency, Location Consistency, and more …
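The operational view of SC can be sketched as a brute-force enumeration in Python. This is only an illustration, not the thesis tooling: the tuple encoding of instructions and the names `interleavings` and `run_sc` are assumptions made here. The "non-deterministic switch" is modeled as every merge of the two threads that preserves program order, applied to the two-thread store/load example with a = b = 0 initially.

```python
# Each instruction: (kind, variable, value-to-store or destination register).
P1 = [("st", "a", 1), ("st", "b", 1)]
P2 = [("ld", "b", "r1"), ("ld", "a", "r2")]

def interleavings(t1, t2):
    """Yield every merge of t1 and t2 that keeps each thread's program order."""
    if not t1:
        yield list(t2)
        return
    if not t2:
        yield list(t1)
        return
    for rest in interleavings(t1[1:], t2):
        yield [t1[0]] + rest
    for rest in interleavings(t1, t2[1:]):
        yield [t2[0]] + rest

def run_sc(schedule):
    """Execute one interleaving against a single shared memory."""
    memory = {"a": 0, "b": 0}
    regs = {}
    for kind, var, arg in schedule:
        if kind == "st":
            memory[var] = arg
        else:
            regs[arg] = memory[var]   # a load sees the latest store in the total order
    return (regs["r1"], regs["r2"])

sc_outcomes = {run_sc(s) for s in interleavings(P1, P2)}
print(sorted(sc_outcomes))   # (1, 0) never appears under SC
```

Under these assumptions the outcome r1 = 1, r2 = 0 is unreachable, which is the behavior that weaker models may additionally allow.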
Industrial Memory Models
Example: The Intel Itanium® Memory Model
• Intel application note contains more than 30 pages of semi-formal rules
• English + large amount of special notations
• Many non-obvious consequences
• Use litmus tests to illustrate properties
• Cannot automatically execute litmus tests
• Use pencil-and-paper reasoning
Language Level Memory Models
Example: The Java Memory Model (JMM)
• Original JMM: Chapter 17 of the Java Language Specification
• Poorly understood
• Flawed
  - too weak (may introduce security holes)
  - too strong (prevents common optimizations)
• Currently under revision (JSR-133)
  - Extensive discussions for more than 3 years
  - Several replacement proposals
  - Issues still remain
Why Does a Memory Model Matter?
Example: Peterson’s Algorithm for Mutual Exclusion
Initially, flag1 = flag2 = false, turn = 0.

Thread 1                          Thread 2
flag1 = true;                     flag2 = true;
turn = 2;                         turn = 1;
while (turn == 2 && flag2) ;      while (turn == 1 && flag1) ;
<critical section>                <critical section>
flag1 = false;                    flag2 = false;

Can both threads enter the critical section simultaneously?
• For sequential consistency: No (the “intended behavior” is guaranteed)
• For many weaker models: Yes (the algorithm would be broken)
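The two answers above can be checked with a small exhaustive state exploration in Python. This is a sketch under stated assumptions, not the analysis method of the thesis: the weaker model used here is a hypothetical per-thread FIFO store buffer (loads forward from the thread's own buffer), the loop test is split into two separate loads, and exploration stops once a thread reaches its critical section. All names are illustrative.

```python
from collections import deque

TURN = 2   # shared memory is the tuple (flag1, flag2, turn)

def replace(tup, k, v):
    return tup[:k] + (v,) + tup[k + 1:]

def successors(state, t, buffered):
    """All states reachable by one step of thread t (0 or 1)."""
    pcs, mem, bufs = state
    other = 1 - t
    result = []

    def load(idx):
        for i, v in reversed(bufs[t]):       # newest buffered store of this thread wins
            if i == idx:
                return v
        return mem[idx]

    def store(idx, val, next_pc):
        if buffered:                         # the store is delayed in the local buffer
            result.append((replace(pcs, t, next_pc), mem,
                           replace(bufs, t, bufs[t] + ((idx, val),))))
        else:                                # SC: the store hits memory immediately
            result.append((replace(pcs, t, next_pc), replace(mem, idx, val), bufs))

    if bufs[t]:                              # drain the oldest buffered store
        idx, val = bufs[t][0]
        result.append((pcs, replace(mem, idx, val), replace(bufs, t, bufs[t][1:])))

    pc = pcs[t]
    if pc == 0:                              # flag_t = true
        store(t, True, 1)
    elif pc == 1:                            # turn = id of the other thread
        store(TURN, other + 1, 2)
    elif pc == 2:                            # loop test, part 1: turn == other's id?
        result.append((replace(pcs, t, 3 if load(TURN) == other + 1 else 4), mem, bufs))
    elif pc == 3:                            # loop test, part 2: is the other flag set?
        result.append((replace(pcs, t, 2 if load(other) else 4), mem, bufs))
    return result                            # pc == 4 means "in the critical section"

def both_in_critical_section(buffered):
    init = ((0, 0), (False, False, 0), ((), ()))
    seen, work = {init}, deque([init])
    while work:
        state = work.popleft()
        if state[0] == (4, 4):
            return True
        for t in (0, 1):
            for nxt in successors(state, t, buffered):
                if nxt not in seen:
                    seen.add(nxt)
                    work.append(nxt)
    return False

print("SC:", both_in_critical_section(buffered=False))
print("store buffers:", both_in_critical_section(buffered=True))
```

With store buffers, each thread can read the other's flag from memory before the other's buffered store has drained, so both pass the loop test.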
Do Programmers Really Care?
Another example: Double-Checked Locking for Singleton creation

class foo {
    private static Helper helper = null;

    public static Helper get() {
        if (helper == null) {                 // only use locking as needed
            synchronized (foo.class) {        // a static method has no "this" to lock on
                if (helper == null)           // "double-check" the reference
                    helper = new Helper();
            }
        }
        return helper;
    }
}
Broken Under the Current JMM
Problem: Broken under the JMM!
- on weak architectures
- with race conditions
- reference can be “visible” before constructor completes
Can’t guarantee Helper is fully constructed!
Problems with Previous Approaches
For virtually all industrial weak memory models
• They don’t have formal specifications
For those that do have a formal spec on paper
• They can’t be executed
For those that have a machine-readable formal spec
• They use a “state machine” approach that
  - employs architecture-specific data structures
  - cannot be decomposed into orthogonal components
  - has not been verified against higher level rules
No support for verifying “programmer expectations” in multithreaded software
Analysis of Multithreaded Software
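[Figure: design space of analyses. Axis labels: Intra-procedural vs. Inter-procedural; Intra-thread vs. Inter-thread; Memory-model insensitive vs. Memory-model sensitive. Trade-off labels: More scalable vs. More precise. “My thesis work” is marked in the figure.]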
Contributions
Operational Specification Method
- Operational style framework - UMM
- Applications: Java Memory Model, Classical memory models
Axiomatic Specification Method
- Non-Operational style framework - Nemos
- Applications: Intel Itanium Memory Model, Classical memory models
Constraint Solving Method
- Prototype tools based on various solvers: CLP, SAT, QBF
- Incremental SAT solving; Different encoding
Concurrency Analysis
- Language level memory model issues
- Applications: Execution validation, Race detection, Atomicity verification
Operational Approach: UMM
(Uniform Memory Model)
1. Supports formal verification
   - Integrates a model checker (Murphi)
   - Inspired by Park & Dill’s work on Sparc
2. Employs a generic memory abstraction
   - To eliminate architecture-specific complexities
   - Uniform notation
   - A parameterized method
UMM Abstract Machine
LIB – Local Instruction Buffer
GIB – Global Instruction Buffer
[Figure: Thread i and Thread j each have a local buffer (LIBi, LIBj); the LIBs connect to a single GIB]
- Only two layers
- GIB can grow as needed
Key insight: make it easy to configure program order and visibility order
General Strategy in UMM
Enabling mechanism
- Program order may be relaxed to enable certain interleavings
- Controlled via a bypassing table
Filtering mechanism
- Visibility order constructed from GIB following proper ordering requirements
- Enforced in read selection rules
UMM Example: Sequential Consistency
Transition Table

Event: read
  Condition: ∃ i ∈ LIBt(i) : ready(i) ∧ op(i) = Read ∧ (∃ w ∈ GIB : legalWrite(i, w))
  Action:    i.data := data(w); LIBt(i) := delete(LIBt(i), i);

Event: write
  Condition: ∃ i ∈ LIBt(i) : ready(i) ∧ op(i) = Write
  Action:    GIB := append(GIB, i); LIBt(i) := delete(LIBt(i), i);

Program order:
  ready(i) ≡ ¬∃ j ∈ LIBt(i) : pc(j) < pc(i) ∧ BYPASS[op(j)][op(i)] = No

Visibility order:
  legalWrite(r, w) ≡ op(w) = Write ∧ var(w) = var(r) ∧
      (¬∃ w’ ∈ GIB : op(w’) = Write ∧ var(w’) = var(r) ∧ time(r) > time(w’) ∧ time(w’) > time(w))
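The transition table can be approximated by a small Python explorer. This is a simplified sketch, not the Murphi model: instructions are assumed to be tuples `(pc, op, var, arg)`, the GIB is preloaded with initial writes, a read simply takes the most recent GIB write to its variable, and `RELAXED_BYPASS` is a hypothetical table invented here to show the effect of changing one entry.

```python
# BYPASS[op(j)][op(i)] for an earlier instruction j and a later instruction i.
SC_BYPASS = {"Read": {"Read": "No", "Write": "No"},
             "Write": {"Read": "No", "Write": "No"}}
# Hypothetical relaxation: a later write may overtake an earlier write.
RELAXED_BYPASS = {"Read": {"Read": "No", "Write": "No"},
                  "Write": {"Read": "No", "Write": "Yes"}}

def ready(i, lib, bypass):
    """ready(i): no earlier instruction j in the same LIB blocks i."""
    return not any(j[0] < i[0] and bypass[j[1]][i[1]] == "No" for j in lib)

def outcomes(libs, bypass, init=(("a", 0), ("b", 0))):
    """Explore every firing sequence; return the set of final (r1, r2) values."""
    results = set()

    def explore(libs, gib, regs):
        if not any(libs):                        # all LIBs are empty
            results.add((regs["r1"], regs["r2"]))
            return
        for t, lib in enumerate(libs):
            for i in lib:
                if not ready(i, lib, bypass):
                    continue
                rest = tuple(x for x in lib if x != i)      # delete(LIB, i)
                new_libs = libs[:t] + (rest,) + libs[t + 1:]
                _, op, var, arg = i
                if op == "Write":                           # append(GIB, i)
                    explore(new_libs, gib + ((var, arg),), regs)
                else:                                       # read the latest write
                    latest = next(v for x, v in reversed(gib) if x == var)
                    explore(new_libs, gib, {**regs, arg: latest})

    explore(tuple(libs), tuple(init), {})
    return results

P1 = ((0, "Write", "a", 1), (1, "Write", "b", 1))
P2 = ((0, "Read", "b", "r1"), (1, "Read", "a", "r2"))
sc = outcomes((P1, P2), SC_BYPASS)
relaxed = outcomes((P1, P2), RELAXED_BYPASS)
print(sorted(sc))
print(sorted(relaxed))
```

Only the bypass table differs between the two runs, which mirrors the idea of configuring program order separately from visibility order.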
Non-Operational Approach: Nemos
(Non-operational yet Executable Memory Ordering Specifications)

Desired Features                 Solutions
Easy to understand, flexible     Declarative (axiomatic)
Precise                          Predicate logic
Compositional, modular           “Higher order” logic
Executable                       Make “hidden” rules explicit

Key insights
(1) Make the rules higher order: pass down the order relation through all the rules
    - Compositional, reusable, scalable, easy to compare
(2) Make all rules explicit
    - Executable using a constraint-programming system
Nemos Example: Sequential Consistency

Formal Definition of SC
(ops is the execution; order is the ordering relation)

legal ops order ≡
    requireProgramOrder ops order ∧
    requireReadValue ops order ∧
    requireWeakTotalOrder ops order ∧
    requireTransitiveOrder ops order ∧
    requireAsymmetricOrder ops order

- Program order
- Common total order
- Read sees “latest” write

requireProgramOrder ops order ≡ ∀ i, j ∈ ops.
    ((t i = t j ∧ pc i < pc j) ∨ (t i = t_init ∧ t j ≠ t_init)) ⇒ order i j

requireTransitiveOrder ops order ≡ ∀ i, j, k ∈ ops.
    (order i j ∧ order j k) ⇒ order i k

order is repeatedly refined
Hidden rules are explicit
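The axiomatic style can be mimicked in Python by writing each rule as a predicate that takes `ops` and `order`, then searching for an order that satisfies all of them. This is a sketch only: the slides give formulas for `requireProgramOrder` and `requireTransitiveOrder`, while the bodies of `require_read_value`, `require_weak_total_order`, and `require_asymmetric_order` below are assumptions, as are the `Op` record and the brute-force search over permutations (standing in for a constraint solver).

```python
from collections import namedtuple
from itertools import permutations

Op = namedtuple("Op", "id t pc kind var data")
T_INIT = 0   # thread id of the initializing writes

def make_ops(r1, r2):
    """Two-thread store/load test with the observed load values r1 and r2."""
    return [Op(0, T_INIT, 0, "W", "a", 0), Op(1, T_INIT, 1, "W", "b", 0),
            Op(2, 1, 0, "W", "a", 1), Op(3, 1, 1, "W", "b", 1),
            Op(4, 2, 0, "R", "b", r1), Op(5, 2, 1, "R", "a", r2)]

def require_program_order(ops, order):
    return all(order(i, j) for i in ops for j in ops
               if (i.t == j.t and i.pc < j.pc) or (i.t == T_INIT and j.t != T_INIT))

def require_transitive_order(ops, order):
    return all(order(i, k) for i in ops for j in ops for k in ops
               if order(i, j) and order(j, k))

def require_asymmetric_order(ops, order):
    return not any(order(i, j) and order(j, i) for i in ops for j in ops)

def require_weak_total_order(ops, order):
    return all(order(i, j) or order(j, i) for i in ops for j in ops if i.id != j.id)

def require_read_value(ops, order):
    def sees_latest(r):   # some write w precedes r with no write to the same variable in between
        return any(w.kind == "W" and w.var == r.var and w.data == r.data and order(w, r)
                   and not any(x.kind == "W" and x.var == r.var
                               and order(w, x) and order(x, r) for x in ops)
                   for w in ops)
    return all(sees_latest(r) for r in ops if r.kind == "R")

RULES = [require_program_order, require_read_value, require_weak_total_order,
         require_transitive_order, require_asymmetric_order]

def legal(ops, order):
    return all(rule(ops, order) for rule in RULES)

def validate_execution(ops):
    """Return a witness order (list of op ids) if one exists, else None."""
    for perm in permutations(ops):
        pos = {op.id: k for k, op in enumerate(perm)}
        if legal(ops, lambda i, j: pos[i.id] < pos[j.id]):
            return [op.id for op in perm]
    return None

print(validate_execution(make_ops(1, 1)))   # some legal order exists
print(validate_execution(make_ops(1, 0)))   # None: not allowed by these SC rules
```

Because every rule receives the same `order` argument, a different model can be assembled by swapping entries in `RULES`, which is the compositional point made on the slide.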
The Itanium Memory Ordering Rules
legal ops order ≡
    requireLinearOrder ops order ∧
    requireWriteOperationOrder ops order ∧
    requirePO ops order ∧
    requireMemoryDataDependence ops order ∧
    requireDataFlowDependence ops order ∧
    requireCoherence ops order ∧
    requireReadValue ops order ∧
    requireAtomicWBRelease ops order ∧
    requireNoUCBypass ops order
Specification Hierarchy for Itanium
– requireLinearOrder
  • Irreflexive
  • Transitive
  • Total
  • Asymmetric
– requireWriteOperationOrder
  • Local/Remote case
  • Remote/Remote case
– requireProgramOrder
  • Acquire Rule
  • Release Rule
  • Fence Rule
– requireMemoryDataDependence
  • MD:RAW
  • MD:WAR
  • MD:WAW
– requireDataFlowDependence
  • DF:RAW
  • DF:WAR
  • DF:WAW
– requireCoherence
  • Local/Local case
  • Remote/Remote case
– requireReadValue
  • ValidWr
    • ValidLocalWr
    • ValidRemoteWr
    • ValidDefaultWr
  • ValidRd
– requireAtomicWBRelease
– requireSequentialUC
  – RAR Rule
  – RAW Rule
  – WAR Rule
  – WAW Rule
– requireNoUCBypass
How to Make an Axiomatic Specification Executable?

Execution Validation:
validateExecution ops ≡ ∃ order. legal ops order

[Figure: Test Program + Memory Model Specification → Constraints → Solver (CLP, SAT, QBF) → SAT / UNSAT]

- Effective for revealing critical properties
- Effective for verifying common programming patterns
Using Constraint Logic Programming (CLP)
• Implementation in FD-Prolog is straightforward
• Universal quantification handled via enumeration
• Existential quantification handled via backtracking
• Built-in constraint solver from FD-Prolog:
  - Logical variables
  - Finite-domain (FD) variables
How to Encode the Ordering Relation?
n×n Encoding:
Given a test program with N operations, use a 2D precedence matrix M with N² constraint variables

Values of entry Mij:
  1: i is ordered before j
  0: i is not ordered before j
  x: value not bound yet

Precedence matrix M (rows i, columns j), before any constraint is imposed:
  x x x x x x
  x x x x x x
  x x x x x x
  x x x x x x
  x x x x x x
  x x x x x x

The Method:
- Interpret the symbolic execution, imposing constraints on the 2D matrix
- When an x changes to a 1, an attempt to set it to 0 later triggers backtracking
- When interpretation finishes, the remaining x values reveal latitude in the weak order
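A minimal Python sketch of the precedence matrix follows. It is not the FD-Prolog implementation: Python has no built-in backtracking, so a hypothetical `Conflict` exception marks the point where a CLP system would backtrack, and the class and method names are invented for illustration.

```python
class Conflict(Exception):
    """Raised where a CLP solver would backtrack."""

class PrecedenceMatrix:
    def __init__(self, n):
        self.n = n
        self.m = [[None] * n for _ in range(n)]   # None plays the role of 'x'

    def bind(self, i, j, value):
        """Bind entry (i, j); rebinding to a different value is a conflict."""
        if self.m[i][j] is None:
            self.m[i][j] = value
        elif self.m[i][j] != value:
            raise Conflict(f"M[{i}][{j}] is already {self.m[i][j]}")

    def order(self, i, j):
        """Impose 'i is ordered before j' (and therefore not j before i)."""
        self.bind(i, j, 1)
        self.bind(j, i, 0)

    def close_transitively(self):
        """Propagate i<j and j<k to i<k until nothing changes."""
        changed = True
        while changed:
            changed = False
            for i in range(self.n):
                for j in range(self.n):
                    for k in range(self.n):
                        if (self.m[i][j] == 1 and self.m[j][k] == 1
                                and self.m[i][k] != 1):
                            self.order(i, k)
                            changed = True

    def show(self):
        return "\n".join(" ".join("x" if v is None else str(v) for v in row)
                         for row in self.m)

m = PrecedenceMatrix(3)
m.order(0, 1)
m.order(1, 2)
m.close_transitively()
print(m.show())
try:
    m.order(2, 0)            # contradicts 0 before 2
except Conflict as exc:
    print("conflict:", exc)
```

Entries that are still `x` after all rules have been imposed correspond to pairs the model leaves unordered.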
Example of Prolog Implementation
Formal Specification (e.g., requireProgramOrder)

requireProgramOrder ops order ≡ ∀ i, j ∈ ops.
    ((t i = t j ∧ pc i < pc j) ∨ (t i = t_init ∧ t j ≠ t_init)) ⇒ order i j

SICStus Prolog Code

% Apply the program-order constraint to every pair of operations.
requireProgramOrder(Ops,Order) :-
    for_each_elem(Ops,Order,doProgramOrder).

% Constrain the matrix entry Oij for the pair (I,J).
elem_prog(doProgramOrder,Ops,Order,I,J) :-
    nth(I,Ops,Oi), nth(J,Ops,Oj),
    p(Oi,T_i), p(Oj,T_j),            % thread ids (0 stands for t_init)
    pc(Oi,PC_i), pc(Oj,PC_j),        % program counters
    length(Ops,N), matrix_elem(Order,N,I,J,Oij),
    ((T_i #= T_j #/\ PC_i #< PC_j) #\/ (T_i #= 0 #/\ T_j #\= 0)) #=> Oij.
Interactive and Incremental Analysis
Initially, a = b = 0.

P1              P2
st a,1;         ld r1,b; <1>
st b,1;         ld r2,a; <0>

Can r1 = 1 and r2 = 0?

Itanium Test Program Execution (ops)
P1                          P2
(1) st_local(a,1);          (7) ld(1,b);
(2) st_remote1(a,1);        (8) ld(0,a);
(3) st_remote2(a,1);
(4) st_local(b,1);
(5) st_remote1(b,1);
(6) st_remote2(b,1);

Result: legal

Order satisfying all constraints
     1 2 3 4 5 6 7 8
  1  0 1 1 x x x x x
  2  0 0 1 x x x x x
  3  0 0 0 x x x x 0
  4  x x x 0 1 1 1 x
  5  x x x 0 0 1 1 x
  6  x x x 0 0 0 1 x
  7  x x x 0 0 0 0 x
  8  x x 1 x x x x 0

An instantiated Order
     1 2 3 4 5 6 7 8
  1  0 1 1 0 0 0 0 0
  2  0 0 1 0 0 0 0 0
  3  0 0 0 0 0 0 0 0
  4  1 1 1 0 1 1 1 0
  5  1 1 1 0 0 1 1 0
  6  1 1 1 0 0 0 1 0
  7  1 1 1 0 0 0 0 0
  8  1 1 1 1 1 1 1 0

Interleaving: 8 4 5 6 7 1 2 3
The SAT/QBF Approach
Initially, we “retrofitted” our Prolog version with SAT-generating code
- Showed speed improvement in constraint solving, BUT …
- Still slow in CNF generation
- Very difficult to debug
So we re-engineered our tool: (Done by Prof. Ganesh Gopalakrishnan)
- “Stamping out” a finite execution as a QBF formula
- “Stamping out” a finite execution as a CNF formula
- Experimenting with different encoding methods: n×n vs. n log n
- Checkpointing SAT generation
Gist of Results
1. SAT seems to be better than QBF
2. The n×n encoding method is better than n log n
   - despite using more bits
   - many unit clauses, good for SAT solving
3. Checkpointing method does pay off up to 64 tuples
4. We can easily handle 128 operations
5. Latest result: completed Intel-provided test run (experiment done by Hemanthkumar Sivaraj)
   - test contains 500 Itanium memory operations
   - had to suppress the total-order constraint, UNSAT
   - takes 10 sec to generate SAT instance; 0.1 sec to solve
   - still lots of room for improvement
How to Verify Programmer Expectations?
(1) Define both intra-thread and inter-thread semantics as constraints
(2) Model correctness properties as additional constraints
(3) Reduce a verification problem to a constraint satisfaction problem and solve it automatically

[Figure: Test Program; Program semantics + Memory model semantics + Program properties (e.g., race / atomicity) → Constraints → Solver → SAT / UNSAT]
Race Detection
What’s a data race? Informally: conflicting and concurrent accesses

Initially, a = b = 0.

Thread 1            Thread 2
r1 = a;             r2 = b;
if (r1 > 0)         if (r2 > 0)
    b = 1;              a = 1;

Is this program race-free? (Are these two instructions conflicting and concurrent?)
• Control flow is interwoven with memory consistency requirements
• Hence, the answer depends on the memory model
  - Under SC, this program is race-free
  - Under a weaker model, this program might contain races
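The SC half of this claim can be checked by enumerating every SC interleaving of the two threads, including the branch decisions, and recording which memory accesses actually execute. The Python below is a sketch with invented names (`SRC`, `DST`, `sc_traces`); it covers SC only and says nothing about weaker models.

```python
SRC = ("a", "b")   # variable each thread reads (thread 0 reads a, thread 1 reads b)
DST = ("b", "a")   # variable each thread writes if its branch is taken

def sc_traces():
    """Return the memory-access trace of every SC interleaving."""
    traces = []

    def step(pcs, regs, mem, trace):
        if pcs == (2, 2):
            traces.append(trace)
            return
        for t in (0, 1):
            nxt = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
            if pcs[t] == 0:                       # r = src
                value = mem[SRC[t]]
                step(nxt, regs[:t] + (value,) + regs[t + 1:], mem,
                     trace + ((t, "R", SRC[t]),))
            elif pcs[t] == 1:                     # if (r > 0) dst = 1
                if regs[t] > 0:
                    step(nxt, regs, {**mem, DST[t]: 1},
                         trace + ((t, "W", DST[t]),))
                else:
                    step(nxt, regs, mem, trace)   # branch not taken: no access

    step((0, 0), (0, 0), {"a": 0, "b": 0}, ())
    return traces

traces = sc_traces()
write_executes = any(kind == "W" for trace in traces for _, kind, _ in trace)
print(len(traces), "interleavings; a write executes in some of them:", write_executes)
```

Since neither guarded write is ever feasible under SC, no pair of conflicting accesses occurs in any SC execution.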
Constraints for Control Flow
• Treat control operations similar to memory operations
  – Imagine “assigns” and “uses” of “control variables”
• Add an auxiliary control variable ck for each branch statement k, and convert the if-statement to an auxiliary assign of ck
  – E.g., if (r1 > 0) becomes c1 = r1 > 0
• Every op k has a path predicate ctrExpr
  – k is a use of those control variables in ctrExpr
• k is feasible if ctrExpr evaluates to true
• Feasibility of ops is checked when setting the rules
Data and Control Dependence
Data/control flow can be treated similarly to the global read value rule, i.e., a read should see the “latest” write

Global Reads: for all r = x, there exists an x = …
Local Reads: for all x = r, there exists an r = …
Control Reads: for all op that depends on c, there exists a c = …

requireReadValue ops order ≡
    globalReadValue ops order ∧
    localReadValue ops order ∧
    controlReadValue ops order
How to Formalize Data-Race?

detectDataRace ops ≡ ∃ scOrder, hbOrder.
    legalSC ops scOrder ∧
    requireHbOrder ops hbOrder ∧
    mapConstraints ops hbOrder scOrder ∧
    existDataRace ops hbOrder

requireHbOrder ops hbOrder ≡
    requireProgramOrder ops hbOrder ∧
    requireSyncOrder ops hbOrder ∧
    requireTransitiveOrder ops hbOrder

existDataRace ops hbOrder ≡ ∃ i, j ∈ ops.
    conflictingAccess i j ∧ ¬ (hbOrder i j) ∧ ¬ (hbOrder j i)
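The existDataRace predicate can be illustrated in Python for a single, fixed execution. This sketch simplifies the definition above: it does not search over SC orders, synchronization order is supplied by hand as explicit edges, and the `Access` record, the "Rel"/"Acq" kinds, and the function names are assumptions made for the example.

```python
from collections import namedtuple
from itertools import combinations

Access = namedtuple("Access", "id t pc kind var")   # kind: "R", "W", "Rel", "Acq"

def happens_before(ops, sync_edges):
    """Program order plus the given sync edges, closed under transitivity."""
    hb = {(i.id, j.id) for i in ops for j in ops if i.t == j.t and i.pc < j.pc}
    hb |= set(sync_edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(hb):
            for c, d in list(hb):
                if b == c and (a, d) not in hb:
                    hb.add((a, d))
                    changed = True
    return hb

def conflicting_access(i, j):
    """Different threads, same variable, both data accesses, at least one write."""
    return (i.t != j.t and i.var == j.var
            and {i.kind, j.kind} <= {"R", "W"} and "W" in (i.kind, j.kind))

def exist_data_race(ops, hb):
    return any(conflicting_access(i, j)
               and (i.id, j.id) not in hb and (j.id, i.id) not in hb
               for i, j in combinations(ops, 2))

ops = [Access(0, 1, 0, "W", "x"), Access(1, 1, 1, "Rel", "m"),
       Access(2, 2, 0, "Acq", "m"), Access(3, 2, 1, "R", "x")]

print(exist_data_race(ops, happens_before(ops, sync_edges=[(1, 2)])))   # ordered via release/acquire
print(exist_data_race(ops, happens_before(ops, sync_edges=[])))         # no synchronization
```

With the release-to-acquire edge the write and read of x are ordered by happens-before; without it they are conflicting and unordered.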
Atomicity Verification
What’s Atomicity?
- Informally: a block of code executed atomically
- Neither a necessary nor a sufficient condition for race-freedom
Our approach:
- Annotate the atomic block with AtomicEnter and AtomicExit
- Verify it automatically
- Our definition is generic, can be fine-tuned
Constraints for Atomicity

verifyAtomicity ops ≡ ∃ order.
    legalSC ops order ∧ existsAtomicityViolation ops order

existsAtomicityViolation ops order ≡ ∃ i, j, k ∈ ops.
    matchedAtomicPair i j ∧ (t k ≠ t i) ∧ ¬ (order k i) ∧ ¬ (order j k)
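For one given total order, existsAtomicityViolation reduces to asking whether an operation of another thread falls between a matched AtomicEnter and AtomicExit. The Python sketch below checks exactly that; the `Op` record, the single-block `matched_atomic_pair`, and the list-based schedules are assumptions for illustration, and the search over legal SC orders is left out.

```python
from collections import namedtuple

Op = namedtuple("Op", "id t kind")

def matched_atomic_pair(i, j):
    """Simplification: at most one atomic block per thread."""
    return i.t == j.t and i.kind == "AtomicEnter" and j.kind == "AtomicExit"

def exists_atomicity_violation(ops, order):
    return any(matched_atomic_pair(i, j) and k.t != i.t
               and not order(k, i) and not order(j, k)
               for i in ops for j in ops for k in ops)

def order_from(schedule):
    """Turn a list of op ids into a total-order predicate."""
    pos = {op_id: n for n, op_id in enumerate(schedule)}
    return lambda a, b: pos[a.id] < pos[b.id]

ops = [Op(0, 1, "AtomicEnter"), Op(1, 1, "Write"), Op(2, 1, "AtomicExit"),
       Op(3, 2, "Write")]

print(exists_atomicity_violation(ops, order_from([0, 1, 2, 3])))   # thread 2 runs after the block
print(exists_atomicity_violation(ops, order_from([0, 1, 3, 2])))   # thread 2 runs inside the block
```

This generic rule flags any foreign operation inside the block; a tuned variant could restrict k to operations that conflict with the block's accesses.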
Conclusion
My thesis addressed the following issues
- How to make memory ordering rules clear and executable?
- How to analyze thread executions against these rules?
Our methods have been shown to be practical
- A wide range of academic memory models as well as real-world models (Itanium, JMM)
- Validation of test cases far exceeded others’ both in speed and scale
- Being applied for post-silicon verification in industry
Many “customers” can benefit from our methods
- Software developers, compiler writers, system designers
Publications
• Analyzing the CRF Java Memory Model (APSEC’01)
• Specifying Java Thread Semantics Using a Uniform Memory Model (JGI’02)
• UMM: An Operational Memory Model Specification Framework with Integrated Model Checking Capability (CCPE)
• Analyzing the Intel Itanium Memory Ordering Rules Using Logic Programming and SAT (CHARME’03)
• Nemos: A Framework for Axiomatic and Executable Specifications of Memory Consistency Models (IPDPS’04)
• A Constraint-Based Approach for Specifying Memory Consistency Models (sent to TPLP)
• QB or not QB: An Efficient Execution Verification Tool for Memory Orderings (sent to CAV)
• Rigorous Concurrency Analysis of Multithreaded Programs (sent to ISSTA)
[Diagram labels: Operational Specification Method; Axiomatic Specification Method; Constraint Solving Method; Concurrency Analysis]
Continuing Research Opportunities
Scale up our approach even further
- Give up certain precision
- Compositional methods
- Create assertion language to help abstraction
Improve solving algorithms
- Exploit the structural information
“Memory-model-sensitive” compilers
- Code synthesis, optimization
Other application domains
- Security, embedded systems
Thank You!
The dissertation is available at
http://www.cs.utah.edu/~yyang/papers/thesis.pdf
The prototype tools are available at
http://www.cs.utah.edu/~yyang/research.html