

The State of Parallel Programming

Burton Smith, Technical Fellow, Microsoft Corporation

FT07

Parallel Computing is Now Mainstream

> Cores are reaching performance limits
> More transistors per core just makes it hot
> New processors are multi-core
> and maybe multithreaded as well
> Uniform shared memory within a socket
> Multi-socket may be pretty non-uniform
> Logic cost ($ per gate-Hz) keeps falling
> New “killer apps” will doubtless need more performance
> How should we write parallel programs?

Parallel Programming Practice Today

> Threads and locks
> SPMD languages
> OpenMP
> Co-array Fortran, UPC, and Titanium
> Message passing languages
> MPI, Erlang
> Data-parallel languages
> CUDA, OpenCL
> Most of these are pretty low level

Higher Level Parallel Languages

> Allow higher level data-parallel operations
> E.g. programmer-defined reductions and scans (see the sketch below)
> Exploit architectural support for parallelism
> SIMD instructions, inexpensive synchronization
> Provide for abstract specification of locality
> Present a transparent performance model
> Make data races impossible

For the last item, something must be done about unrestricted use of variables
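As a hedged illustration of programmer-defined reductions and scans, here is a minimal Haskell sketch (not from the talk); the names MinMax, merge, reduceMinMax, and scanMinMax are illustrative only. The combining function 'merge' is associative, which is exactly what lets an implementation evaluate the reduction or scan in parallel, e.g. as a balanced tree, rather than strictly left to right.

    import Data.List (foldl')

    -- A running minimum and maximum as the programmer-defined "sum".
    data MinMax = MinMax Int Int deriving Show

    -- Associative combining function supplied by the programmer.
    merge :: MinMax -> MinMax -> MinMax
    merge (MinMax a b) (MinMax c d) = MinMax (min a c) (max b d)

    single :: Int -> MinMax
    single x = MinMax x x

    -- Reduction: combine every element with 'merge'.
    reduceMinMax :: [Int] -> MinMax
    reduceMinMax = foldl' merge (MinMax maxBound minBound) . map single

    -- Scan: all the partial reductions over prefixes of the input.
    scanMinMax :: [Int] -> [MinMax]
    scanMinMax = scanl1 merge . map single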

Shared Memory is not the Problem

> Shared memory has some benefits:
> Forms a delivery vehicle for high bandwidth
> Permits unpredictable data-dependent sharing
> Provides a large synchronization namespace
> Facilitates high level language implementations
> Language implementers like it as a target
> Non-uniform memory can even scale
> But shared variables are an issue: stores do not commute with other loads or stores
> Shared memory isn’t a programming model

Pure Functional Languages

> Imperative languages do computations by scheduling values into variables
> Their parallel dialects are prone to data races
> There are far too many parallel schedules
> Pure functional languages avoid data races simply by avoiding variables entirely
> They compute new constants from old (see the sketch below)
> Loads commute so data races can’t happen
> Dead constants can be reclaimed efficiently
> But no variables implies no mutable state
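A minimal Haskell sketch of "computing new constants from old" (illustrative, not from the talk): inserting into a purely functional map builds a new map and leaves the old one untouched, so concurrent readers can never observe a half-updated structure. The names addUser and versions are hypothetical.

    import qualified Data.Map as Map

    -- Inserting returns a new map; the old map is a constant and stays valid.
    addUser :: Int -> String -> Map.Map Int String -> Map.Map Int String
    addUser = Map.insert

    versions :: (Map.Map Int String, Map.Map Int String)
    versions =
      let v0 = Map.fromList [(1, "ada")]
          v1 = addUser 2 "grace" v0   -- v0 remains unchanged and shareable
      in (v0, v1)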

Mutable State is Crucial for Efficiency

> To let data structures inexpensively evolve
> To avoid always copying nearly all of them
> Monads were added to pure functional languages to allow mutable state (and I/O)
> Plain monadic updates may still have data races (see the sketch below)
> The problem is maintaining state invariants
> These are just a program’s “conservation laws”
> They describe the legal attributes of the state
> As with physics, they are associated with a certain generalized type of commutativity
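A minimal Haskell sketch (illustrative only) of a plain monadic update that still admits a data race: each step is a legal IO action on an IORef, but the read-modify-write pair is not atomic, so two threads can interleave and lose increments, breaking the invariant "counter equals the number of bumps". The names bump and worker are hypothetical.

    import Control.Concurrent (forkIO)
    import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
    import Control.Monad (replicateM_)
    import Data.IORef

    -- Read, then write: the pair is not atomic.
    bump :: IORef Int -> IO ()
    bump ref = do
      n <- readIORef ref
      writeIORef ref (n + 1)

    main :: IO ()
    main = do
      ref  <- newIORef 0
      done <- newEmptyMVar
      let worker = replicateM_ 100000 (bump ref) >> putMVar done ()
      _ <- forkIO worker
      _ <- forkIO worker
      takeMVar done
      takeMVar done
      readIORef ref >>= print   -- often less than 200000 when run with -threaded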

Maintaining Invariants

> Updates perturb, then restore an invariant
> Program composability depends on this
> It’s automatic for us once we learn to program
> How can we maintain invariants in parallel?
> Two requirements must be met:
> Updates must not interfere with each other
> That is, they must be isolated in some fashion
> Updates must finish once they start
> …lest the next update see the invariant false
> We say the state updates must be atomic
> Updates that are both isolated and atomic are called transactions (see the sketch below)
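A minimal Haskell sketch of an isolated and atomic update, assuming GHC's stm library (the account names are hypothetical): the transfer perturbs and then restores the invariant "the two balances always sum to the same total", and because the whole block commits as one transaction, no other thread can observe the intermediate state.

    import Control.Concurrent.STM

    transfer :: TVar Int -> TVar Int -> Int -> STM ()
    transfer from to amount = do
      a <- readTVar from
      writeTVar from (a - amount)   -- invariant temporarily false here
      b <- readTVar to
      writeTVar to (b + amount)     -- restored before the transaction commits

    main :: IO ()
    main = do
      alice <- newTVarIO 100
      bob   <- newTVarIO 0
      atomically (transfer alice bob 40)
      total <- (+) <$> readTVarIO alice <*> readTVarIO bob
      print total   -- always 100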

Commutativity and Non-Determinism

> If p and q preserve invariant I and do not interfere, their parallel execution { p || q } also preserves I †
> If p and q are performed in isolation and atomically, i.e. as transactions, then they will not interfere ‡
> Operations may not commute with respect to state
> But we always get commutativity with respect to the invariant
> This leads to a weaker form of determinism
> Long ago some of us called it “good non-determinism”
> It’s the non-determinism operating systems rely on

† Susan Owicki and David Gries. Verifying properties of parallel programs: An axiomatic approach. CACM 19(5), pp. 279−285, May 1976.
‡ Leslie Lamport and Fred Schneider. The “Hoare Logic” of CSP, And All That. ACM TOPLAS 6(2), pp. 281−296, April 1984.

Example: Hash Tables

> Hash tables implement sets of items
> The key invariant is that an item is in the set iff its insertion followed all removals
> There are also storage structure invariants, e.g. hash buckets must be well-formed linked lists
> Parallel insertions and removals need only maintain the logical AND of these invariants (see the sketch below)
> This may not result in deterministic state
> The order of items in a bucket is unspecified
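A minimal Haskell sketch of such a table, assuming GHC's stm library (the names Table, insertItem, and removeItem are hypothetical): each bucket is a transactional variable holding a list, insertions and removals are transactions, the set invariant is preserved, and yet the order of items within a bucket depends on scheduling, so the stored representation need not be deterministic.

    import Control.Concurrent.STM
    import Data.Array

    -- A fixed number of buckets, each a TVar holding a list of items.
    type Table = Array Int (TVar [Int])

    newTable :: Int -> IO Table
    newTable n = listArray (0, n - 1) <$> mapM (const (newTVarIO [])) [1 .. n]

    bucketFor :: Table -> Int -> TVar [Int]
    bucketFor tbl x = tbl ! (x `mod` rangeSize (bounds tbl))

    -- Insert if absent; the position within the bucket is unspecified.
    insertItem :: Table -> Int -> STM ()
    insertItem tbl x =
      modifyTVar' (bucketFor tbl x) (\xs -> if x `elem` xs then xs else x : xs)

    removeItem :: Table -> Int -> STM ()
    removeItem tbl x = modifyTVar' (bucketFor tbl x) (filter (/= x))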

High Level Data Races

> Some loads and stores can be isolated and atomic but cover only a part of the invariant
> E.g. copying data from one structure to another (see the sketch below)
> If atomicity is violated, the data can be lost
> Another example is isolating a graph node while deleting it but then decrementing neighbors’ reference counts with LOCK DEC
> Some of the neighbors may no longer exist
> It is challenging to see how to automate data race detection for examples like these
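A minimal Haskell sketch of a high-level data race of the first kind, assuming GHC's stm library (moveRacy and moveAtomic are hypothetical names): each step of the racy version is isolated and atomic on its own, but the invariant "every item is in exactly one of the two structures" spans both steps, so between the two transactions another thread can observe the item in neither place, and a failure there loses the data.

    import Control.Concurrent.STM

    -- Racy: two separate transactions each cover only part of the invariant.
    moveRacy :: TVar [Int] -> TVar [Int] -> Int -> IO ()
    moveRacy src dst x = do
      atomically $ modifyTVar' src (filter (/= x))   -- item removed...
      atomically $ modifyTVar' dst (x :)             -- ...visible gap before this commits

    -- Fixed: one transaction covers the invariant's whole domain.
    moveAtomic :: TVar [Int] -> TVar [Int] -> Int -> IO ()
    moveAtomic src dst x = atomically $ do
      modifyTVar' src (filter (/= x))
      modifyTVar' dst (x :)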

Other Examples

> Databases and operating systems commonly mutate state in parallel
> Databases use transactions to achieve consistency via atomicity and isolation
> SQL programming is pretty simple
> SQL is arguably not general-purpose
> Operating systems use locks for isolation
> Atomicity is left to the OS developer
> Lock ordering is used to prevent deadlock
> A general purpose parallel language should easily handle applications like these

Implementing Isolation

> Analysis
> Proving concurrent state updates are isolated
> Locking
> Deadlock must be handled somehow
> Buffering
> Often used for wait-free updates
> Partitioning
> Partitions can be dynamic, e.g. as in quicksort (see the sketch below)
> Serializing
> These schemes can be nested
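A minimal Haskell sketch of isolation by dynamic partitioning in the spirit of quicksort, assuming the parallel package's Control.Parallel (the function qsort is illustrative): the pivot splits the input into disjoint partitions, so the two recursive calls touch different data and cannot interfere, and 'par' merely hints that the right half may be evaluated in parallel with the left.

    import Control.Parallel (par, pseq)

    qsort :: [Int] -> [Int]
    qsort []       = []
    qsort (p : xs) =
      let left  = qsort [x | x <- xs, x <  p]    -- disjoint partition 1
          right = qsort [x | x <- xs, x >= p]    -- disjoint partition 2
      in right `par` (left `pseq` (left ++ p : right))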

Isolation in Existing Languages

> Static in space: MPI, Erlang
> Dynamic in space: Refined C, Jade
> Static in time: Serial execution
> Dynamic in time: Single global lock
> Static in both: Dependence analysis
> Semi-static in both: Inspector-executor
> Dynamic in both: Multiple locks

Atomicity

> Atomicity means “all or nothing” execution
> State changes must be all done or undone
> Isolation without atomicity has little value
> But atomicity is vital even in the serial case
> Implementation techniques:
> Compensating, i.e. reversing a computation “in place”
> Logging, i.e. remembering and restoring the original state values (see the sketch below)
> Atomicity is challenging for distributed computing and I/O
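A minimal Haskell sketch of atomicity by logging, here in the serial case (transferLogged is a hypothetical name): the original value is remembered before the update begins and written back if the second half throws, so the state is either fully changed or not changed at all.

    import Control.Exception (onException)
    import Data.IORef

    transferLogged :: IORef Int -> IORef Int -> Int -> IO ()
    transferLogged from to amount = do
      oldFrom <- readIORef from                   -- log the original value
      modifyIORef' from (subtract amount)         -- first half of the update
      modifyIORef' to (+ amount)                  -- second half; if it throws...
        `onException` writeIORef from oldFrom     -- ...restore from the log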

Exceptions

> Exceptions can threaten atomicity
> An aborted state update must be undone
> What if a state update depends on querying a remote service and the query fails?
> The message from the remote service should send exception information in lieu of the data
> Message arrival can then throw as usual and the partial update can be undone
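A minimal Haskell sketch of that idea, assuming GHC's stm library, in which throwing inside 'atomically' aborts the transaction and discards its writes (applyReply is a hypothetical name): if the reply carries exception information instead of data, re-throwing it undoes the partial update.

    import Control.Concurrent.STM
    import Control.Exception (SomeException)

    applyReply :: TVar Int -> Either SomeException Int -> IO ()
    applyReply ref reply = atomically $ do
      modifyTVar' ref (+ 1)                    -- partial update...
      case reply of
        Right delta -> modifyTVar' ref (+ delta)
        Left err    -> throwSTM err            -- ...aborted: neither write survives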

Transactional Memory

> “Transactional memory” means transaction semantics within lexically scoped blocks (see the sketch below)
> TM has been a hot topic of late
> As usual, lexical scope seems a virtue here
> Adding TM to existing languages has problems
> There is a lot of optimization work to do
> to make atomicity and isolation highly efficient
> Meanwhile, we shouldn’t ignore traditional ways to get transactional semantics
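A minimal Haskell sketch of lexically scoped transaction blocks, again assuming GHC's stm library (withdraw, deposit, and pay are hypothetical names): smaller STM actions compose into one larger block, and the single 'atomically' delimits exactly one transaction, so the extent of isolation and atomicity is visible in the program text.

    import Control.Concurrent.STM

    withdraw, deposit :: TVar Int -> Int -> STM ()
    withdraw acct n = modifyTVar' acct (subtract n)
    deposit  acct n = modifyTVar' acct (+ n)

    -- One lexically scoped transaction built from two smaller STM actions.
    pay :: TVar Int -> TVar Int -> Int -> IO ()
    pay from to n = atomically (withdraw from n >> deposit to n)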

Whence Invariants?

> Can we generate invariants from code?
> Only sometimes, and it is difficult even then
> Can we generate code from invariants?
> Is this the same as intentional programming?
> Can we write invariants plus code and let the compiler check invariant preservation? (see the sketch below)
> This is much easier, but may be less attractive
> Can languages make it more likely that a transaction covers the invariant’s domain?
> E.g. leveraging objects with encapsulated state
> Can we at least debug our mistakes?
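A minimal Haskell sketch of "invariants plus code", assuming GHC's stm library (checkedUpdate is a hypothetical name): the invariant is written as an ordinary predicate and checked after every update, and a violation aborts the transaction so the bad state never becomes visible; the slide's hope is that a compiler could discharge such checks statically rather than at run time.

    import Control.Concurrent.STM
    import Control.Monad (unless)

    checkedUpdate :: STM Bool -> STM () -> IO ()
    checkedUpdate invariant update = atomically $ do
      update
      ok <- invariant
      unless ok (throwSTM (userError "invariant violated"))

    -- Example use with a hypothetical 'balance' TVar whose invariant is "never negative":
    --   checkedUpdate ((>= 0) <$> readTVar balance) (modifyTVar' balance (subtract 5))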

Conclusions

> Functional languages with transactions enable higher level parallel programming
> Microsoft is heading in this general direction
> Efficient implementations of isolation and atomicity are important
> We trust architecture will ultimately help support these things
> The von Neumann model needs replacing, and soon

