A Behavioral Memory Model for the UPC Language

Kathy Yelick
University of California, Berkeley and Lawrence Berkeley National Laboratory

Dagstuhl: Consistency Models, Oct 17, 2003

UPC Collaborators

• This talk presents joint work with:
  • Chuck Wallace, MTU
  • Dan Bonachea, UCB
  • Jason Duell, LBNL
• With input from the UPC community, in particular:
  • Bill Carlson, IDA
  • Brian Wibecan, HP
• The Berkeley UPC Group:
  • Christian Bell
  • Dan Bonachea
  • Wei Yu Chen
  • Jason Duell
  • Paul Hargrove
  • Parry Husbands
  • Costin Iancu
  • Mike Welcome


Global Address Space Languages

• Explicitly parallel programming model with SPMD parallelism
  • Fixed at program start-up, typically 1 thread per processor
• Global address space model of memory
  • Allows programmer to directly represent distributed data structures
• Address space is logically partitioned
  • Local vs. remote memory (two-level hierarchy)
• Programmer control over performance-critical decisions
  • Data layout and communication
• Performance transparency and tunability are goals
  • Initial implementation can use fine-grained shared memory
• Suitable for current and future architectures
  • Either shared memory or lightweight messaging is key
• Base languages differ: UPC (C), CAF (Fortran), Titanium (Java)


Why Another Language?

• MPI is the current standard for programming large-scale machines
  • But difficulty of use has left users behind
  • Clusters of SMPs lead to two parallel programs in one
• Single model for shared and distributed memory machines
  • Shared memory multiprocessors (SMPs, SGI Origin, etc.)
  • Global address space machines (Cray T3D/E, X1)
    • Remote put/get instructions, but no HW caching of remote data
  • Distributed memory machines/clusters with fast communication
    • Shmem, GASNet (LAPI, GM, Elan, SCI), Active Messages
    • Software caching in some implementations
• UPC is popular within some government labs
  • Commercial and open-source compilers


Global Address Space

• Several kinds of array distributions
  • double a[n]: a private n-element array on each processor
  • shared double b[n]: an n-element shared array, with cyclic mapping
  • shared [4] double c[n]: a block-cyclic array with 4-element blocks
• Pointers for irregular data structures
  • shared double *sp: a pointer to shared data
  • double *lp: a pointer to local data (assumed private)
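
As a rough illustration of the layouts above, the following sketch computes which thread owns each element under the block-cyclic rule, where element i of a shared array with block size B has affinity to thread (i / B) mod THREADS. It is a Python model rather than UPC code; the function name `owner` and the choice of 4 threads are assumptions made for the example.

```python
def owner(i, block_size, threads):
    """Thread with affinity to element i of a shared array (block-cyclic rule)."""
    return (i // block_size) % threads


THREADS = 4  # assumed thread count, for illustration only

# shared double b[n]: default block size 1, i.e. a cyclic mapping
cyclic = [owner(i, 1, THREADS) for i in range(8)]

# shared [4] double c[n]: 4-element blocks dealt out round-robin
block_cyclic = [owner(i, 4, THREADS) for i in range(20)]

print("cyclic:      ", cyclic)
print("block-cyclic:", block_cyclic)
```

With 4 threads, the cyclic array places consecutive elements on consecutive threads, while the block-cyclic array keeps each run of 4 elements on one thread before moving to the next.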

[Figure: the global address space, divided into a shared region and per-thread private regions, with labels a[0], a[1], a[P] and per-thread pointers sp and lp.]


UPC Memory Model

• UPC has two types of memory accesses
  • Relaxed:
    • Operation must respect local (on-thread) dependencies
    • Other threads may observe these operations happening in different orders
  • Strict:
    • Operation must appear atomic
    • All relaxed operations issued earlier must complete before
    • All relaxed operations issued later must happen later
• Several ways to specify the access:
  • strict shared int x; (type qualifier)
  • #pragma upc relaxed (pragma)
  • #include <upc_relaxed.h> (include file)


Behavioral Approach

• Problems with operational specifications
  • Implicit assumptions about implementation strategy (e.g., caches)
  • May unnecessarily restrict implementations
  • Intuitive in principle, but complicated in practice
• A behavioral approach
  • Based on partial and total orders
  • Using the sequential consistency definition as a model
    • Processor order defines a total order on each thread
    • Their union defines a partial order
    • ∃ a consistent total order that is correct as a serial execution
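
The sequential consistency definition above can be sketched as a brute-force search: look for one total order that extends every thread's program order and in which every read returns the most recent write. This is only a toy model for tiny straight-line programs; the operation encoding `(kind, location, value)` and the function name are assumptions made for the example, not part of the UPC specification.

```python
def is_sequentially_consistent(threads, initial=0):
    """Return True if some interleaving of the per-thread operation lists
    is a correct serial execution (each read sees the latest write)."""

    def search(positions, memory):
        if all(p == len(ops) for p, ops in zip(positions, threads)):
            return True  # every operation has been placed in the total order
        for t, ops in enumerate(threads):
            p = positions[t]
            if p == len(ops):
                continue
            kind, loc, val = ops[p]
            if kind == "R" and memory.get(loc, initial) != val:
                continue  # this read cannot come next in a correct serial order
            new_memory = dict(memory)
            if kind == "W":
                new_memory[loc] = val
            new_positions = positions[:t] + (p + 1,) + positions[t + 1:]
            if search(new_positions, new_memory):
                return True
        return False

    return search(tuple(0 for _ in threads), {})


# P0 writes data, then sets a flag; P1 reads the flag, then the data.
p0 = [("W", "data", 1), ("W", "flag", 1)]
p1_stale = [("R", "flag", 1), ("R", "data", 0)]  # saw the flag but old data
p1_fresh = [("R", "flag", 1), ("R", "data", 1)]

print(is_sequentially_consistent([p0, p1_stale]))
print(is_sequentially_consistent([p0, p1_fresh]))
```

The stale outcome has no explaining total order, so it is not sequentially consistent; the fresh outcome does.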

[Figure: operations on threads P0 and P1.]


Some Basic Notation

• The set of operations is O
  • Ot = the set of operations issued by thread t
• The set of memory operations is:
  • M = {m0, m1, …}
  • Mt = the set of memory operations from thread t
• Each memory operation has properties
  • Thread(mi) is the thread that executed the operation
  • Location(mi) is the memory location involved
• Memory operations are partitioned into 6 sets, given by
  • S = Strict, R = Relaxed, P = Private
  • W = Write, R = Read (in the 2nd position)
  • Some useful groups:
    • Strict(M) = SW(M) ∪ SR(M)
    • W(M) = SW(M) ∪ RW(M) ∪ PW(M)
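
To make the notation concrete, here is a small sketch that models memory operations as records and builds the derived groups by set union. The class name `MemOp`, the helper `select`, and the sample operations are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemOp:
    name: str
    thread: int     # Thread(m)
    location: str   # Location(m)
    mode: str       # "S" (strict), "R" (relaxed), or "P" (private)
    kind: str       # "W" (write) or "R" (read)


def select(M, mode, kind):
    """One of the six partitions, e.g. select(M, "S", "W") is SW(M)."""
    return {m for m in M if m.mode == mode and m.kind == kind}


def strict(M):
    """Strict(M) = SW(M) ∪ SR(M)"""
    return select(M, "S", "W") | select(M, "S", "R")


def writes(M):
    """W(M) = SW(M) ∪ RW(M) ∪ PW(M)"""
    return select(M, "S", "W") | select(M, "R", "W") | select(M, "P", "W")


def ops_of_thread(M, t):
    """Mt: the memory operations issued by thread t."""
    return {m for m in M if m.thread == t}


M = {
    MemOp("m0", 0, "x", "R", "W"),     # relaxed write
    MemOp("m1", 0, "flag", "S", "W"),  # strict write
    MemOp("m2", 1, "flag", "S", "R"),  # strict read
    MemOp("m3", 1, "x", "R", "R"),     # relaxed read
    MemOp("m4", 1, "tmp", "P", "W"),   # private write
}

print(sorted(m.name for m in strict(M)))
print(sorted(m.name for m in writes(M)))
print(sorted(m.name for m in ops_of_thread(M, 1)))
```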


Compiler Assumption

• For specification purposes, assume the code is compiled by a naïve compiler into an ISO C machine
• Real compilers may do optimizations
  • E.g., reorder, remove, insert memory operations
  • Even strict operations may be reordered with sufficient analysis (cycle detection)
  • These must produce an execution whose input/output and volatile behavior is identical to that of an unoptimized program (ISO C)


Orderings on Strict Operations

Threads must agree on an ordering of:

• For pairs of strict accesses, it will be total:

• For a strict/relaxed pair on the same thread, they will all see the program order


Orderings on Local Operations

• Conflicting accesses have the usual definition
• Given a serial execution S = [o1, …, on] defining <S, let St be the subsequence of operations issued by t
• S conforms to program order for thread t iff:
  • St is consistent with the program text for t (follows control flow)
• S conforms to program dependence order for t iff ∃ a permutation P(S) such that:
  • P(S) conforms to program order for t
  • ∀ (m1, m2) ∈ Conflicting(M): m1 <S m2 ⇔ m1 <P(S) m2
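
The program dependence order condition can be sketched as a cycle check: a permutation P(S) exists exactly when the constraints "thread t's operations stay in program order" and "every conflicting pair is ordered as in S" can be satisfied together. The sketch assumes the usual definition of a conflict (same location, at least one write), assumes a straight-line program for thread t, and uses a hypothetical encoding of operations as `(thread, kind, location)`; it needs Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter, CycleError


def conflicting(a, b):
    """Assumed 'usual' definition: same location and at least one write."""
    return a[2] == b[2] and "W" in (a[1], b[1])


def conforms_to_dependence_order(S, ops, program_t):
    """S: serial execution as a list of operation names (the order <S).
    ops: name -> (thread, kind, location).
    program_t: thread t's operations in program-text order."""
    preds = {name: set() for name in S}
    # P(S) must keep thread t's operations in program-text order.
    for earlier, later in zip(program_t, program_t[1:]):
        preds[later].add(earlier)
    # P(S) must order every conflicting pair the same way S does.
    for i, a in enumerate(S):
        for b in S[i + 1:]:
            if conflicting(ops[a], ops[b]):
                preds[b].add(a)
    try:
        list(TopologicalSorter(preds).static_order())
        return True   # some permutation satisfies both constraints
    except CycleError:
        return False  # the two constraints contradict each other


ops = {
    "wx": (0, "W", "x"),
    "wy": (0, "W", "y"),
    "rx": (0, "R", "x"),
}

# Independent writes may appear reordered in S.
print(conforms_to_dependence_order(["wy", "wx"], ops, ["wx", "wy"]))
# A read may not appear before the earlier write to the same location.
print(conforms_to_dependence_order(["rx", "wx"], ops, ["wx", "rx"]))
```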


UPC Consistency

An execution on T threads with memory ops M is UPC consistent iff:

• ∃ a partial order <strict that orients all pairs in allStrict(M)
• And for each thread t ∃ a total order <t on Ot ∪ W(M) ∪ SR(M)
  • <t is consistent with <strict
    • All threads agree on ordering of strict operations
  • <t conforms to program dependence order
    • Local dependencies are observed
  • <t is a correct execution
    • Reads return most recent write values
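
A partial sketch of this definition: given a proposed total order <t for each thread, check that all threads order the strict operations identically and that each order is a correct execution. For brevity the sketch omits two parts of the definition, namely that each <t covers exactly Ot ∪ W(M) ∪ SR(M) and that it conforms to program dependence order. The encoding `(thread, mode, kind, location, value)` and the function name are assumptions made for the example.

```python
def is_witness(orders, ops, initial=0):
    """orders: thread -> proposed total order <t (list of operation names).
    ops: name -> (thread, mode, kind, location, value),
         with mode "S"/"R"/"P" and kind "W"/"R"."""
    strict = {name for name, op in ops.items() if op[1] == "S"}

    # All threads must agree on the ordering of strict operations.
    views = [[n for n in order if n in strict] for order in orders.values()]
    if any(view != views[0] for view in views):
        return False

    # Each <t must be a correct execution: reads return the most recent write.
    for order in orders.values():
        memory = {}
        for name in order:
            _, _, kind, loc, value = ops[name]
            if kind == "W":
                memory[loc] = value
            elif memory.get(loc, initial) != value:
                return False
    return True


ops = {
    "w1": (0, "R", "W", "x", 1),  # relaxed write by thread 0
    "w2": (1, "R", "W", "x", 2),  # relaxed write by thread 1
    "a1": (2, "R", "R", "x", 1),  # thread 2 reads 1, then 2
    "a2": (2, "R", "R", "x", 2),
    "b1": (3, "R", "R", "x", 2),  # thread 3 reads 2, then 1
    "b2": (3, "R", "R", "x", 1),
}

# Threads 2 and 3 explain their reads with different orders of the two writes.
orders = {
    0: ["w1", "w2"],
    1: ["w1", "w2"],
    2: ["w1", "a1", "w2", "a2"],
    3: ["w2", "b1", "w1", "b2"],
}
print(is_witness(orders, ops))

# An order in which a read does not see the latest write is rejected.
print(is_witness({2: ["w1", "w2", "a1", "a2"]}, ops))
```

The first case passes even though no single total order explains both readers, which is the sense in which relaxed writes may be observed in different orders by different threads.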


Intuition on Strict Orderings

• Each thread may “build” its own total order to explain behavior
• They all agree on the strict ordering shown above in black, but
  • Different threads may see relaxed writes in different orders
  • Allows non-blocking writes to be used in implementations
• Each thread sees its own dependencies, but not those of other threads
  • Weak, but otherwise there would be consistency requirements on some relaxed operations
  • Preserving dependencies requires the usual compiler/hw analysis

[Figure: operations on threads P0 and P1.]


Synchronization Operations

• UPC has both global and pairwise synchronization

• In addition to the synchronization properties, they also have memory model implications:
  • Locks
    • upc_lock is a strict read
    • upc_unlock is a strict write
  • Barriers (which may be split-phase)
    • upc_notify (begin barrier) is a strict write
    • upc_wait (end of barrier) is a strict read
    • upc_barrier = upc_notify; upc_wait

• (More technical details in definitions as to the variable being read/written)


Properties of UPC Consistency

• A program containing only strict operations is sequentially consistent

• A program that produces only race-free executions is sequentially consistent
  • A UPC consistent execution of a program is race-free if for all threads t and all enabling orderings <t
    • For all potential races:
      • If m1 <t m2 then ∃ synchronization operations o1, o2 such that m1 <t o1 <t o2 <t m2 and Thread(o1) = Thread(m1) and Thread(o2) = Thread(m2) and either
        • o1 is upc_notify and o2 is upc_wait, or
        • o1 is upc_unlock and o2 is upc_lock on the same lock variable


Alternative Models

• As specified, two relaxed writes to the same location may be viewed differently by different processors
  • Nothing to force eventual consistency (likely in implementations)
  • May add this to barrier points, at least
  • So far it looks ad hoc
• Adding directionality to reads/writes seems reasonable
  • Strict reads “fence” things that follow
  • Strict writes “fence” things that precede
  • Simple replacement for StrictOnThreads definition
• Support user-defined synchronization primitives built from strict operations


Future Plans

• Show that various implementations satisfy this spec
  • Use of non-blocking writes for relaxed writes with write fence/sync at strict points
  • Compiler-inserted prefetching of relaxed reads
  • Compiler-inserted “message vectorization” to aggregate a set of small operations into one larger one
  • A software caching implementation with cache flushes at strict points
• Develop an operational model and show equivalence (or at least that it implements the spec)
• Define the data unit of atomicity
  • Fundamental unit of interleaving, data tearing, conflicts


Conclusions

• Behavioral specifications
  • Are relatively concise
  • Not intended for most end-users: they would see the “properties” part
  • Avoid reference to implementation-specific notions, and are likely to constrain implementations less than operational specs
• UPC
  • Has a user-controlled specification model at the language level
  • Language model need not match that of the underlying machine
    • It may be stronger (by inserting fences)
    • It may be weaker (by reordering operations at compile-time)
  • Seems to be acceptable within the high-end programming community (also evidence in the MPI-2 spec)

Backup Slides


Communication Support Today

[Figure: bar chart of communication cost in µs (axis 0 to 25), broken into Added Latency, Send Overhead (Alone), Send & Rec Overhead, and Rec Overhead (Alone), for T3E/Shm, T3E/E-Reg, T3E/MPI, IBM/LAPI, IBM/MPI, Quadrics/Shm, Quadrics/MPI, Myrinet/GM, Myrinet/MPI, GigE/VIPL, and GigE/MPI.]

• Potential performance advantage for fine-grained, one-sided programs
• Potential productivity advantage for irregular applications


Hardware Limitations to Software Innovation

• Software send overhead for 8-byte messages over time
  • Not improving much over time (even in absolute terms)


Example: Berkeley UPC Compiler

[Figure: compiler pipeline with stages labeled UPC, Higher WHIRL, Optimizing transformations, and Lower WHIRL, and outputs labeled C + Runtime and Assembly: IA64, MIPS,… + Runtime.]

• Compiler based on Open64
  • Multiple front-ends, including gcc
  • Intermediate form called WHIRL
• Current focus on C backend
  • IA64 possible in future
• UPC Runtime
  • Pointer representation
  • Shared/distributed memory
• Communication in GASNet
  • Portable
  • Language-independent


Research Opportunities

• Compiler analysis and optimizations
  • Recognize local accesses and avoid runtime checks/storage
  • Communication and memory optimizations
    • Separate get/put initiation from synchronization (prefetch)
    • Message aggregation (fine to bulk), tiling, and caching
• Language design
  • Dynamic parallelism for load balance
  • Multiscale parallelism: express parallelism at all levels
  • Linguistic support for unstructured and sparse data structures
  • Annotations, types, pragmas for correctness and performance
• Higher-level languages
  • Parallel Matlab or parallelizing Matlab compilers
  • Domain-specific parallelism