Java memory model


Java Memory Model

Michał Warecki

Outline

Introduction to JMM

Happens-before

Memory barriers

Performance issues

Atomicity

JEP 171

Non blocking algorithms

Java, C++, ASM

Java Memory Model

Instructions reordering

Visibility

Final fields

Interaction with atomic instructions

Java Memory Model

The Java memory model (JMM) describes how threads in the Java programming language interact through memory.

Provides sequential consistency for data race free programs.

Instructions reordering

Program order:

int a = 1;
int b = 2;
int c = 3;
int d = 4;
int e = a + b;
int f = c - d;

Execution order:

int d = 4;
int c = 3;
int f = c - d;
int b = 2;
int a = 1;
int e = a + b;

Quiz

Initially: x = y = 0

Thread 1:
x = 1
j = y

Thread 2:
y = 1
i = x

What could be the result?

Answer(s)

i = 1; j = 1

i = 0; j = 1

i = 1; j = 0

i = 0; j = 0
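The quiz can be tried out with a small harness. This is only a sketch: the class name ReorderingQuiz and its methods are made up for illustration, and which of the four outcomes actually show up depends on the JIT, the hardware and the timing of the run. The JMM permits all four because x and y are plain fields.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical harness for the quiz; names are illustrative, not from the slides.
class ReorderingQuiz {
    // Deliberately plain (non-volatile) fields: the two threads race on x and y.
    static int x, y, i, j;

    // Runs the two quiz threads once and returns the observed pair as "i,j".
    static String runOnce() {
        x = 0; y = 0; i = 0; j = 0;
        Thread t1 = new Thread(() -> { x = 1; j = y; });
        Thread t2 = new Thread(() -> { y = 1; i = x; });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // join() makes the values written to i and j visible here.
        return i + "," + j;
    }

    // Collects the distinct outcomes seen over the given number of runs.
    static Set<String> observe(int runs) {
        Set<String> seen = new TreeSet<>();
        for (int n = 0; n < runs; n++) {
            seen.add(runOnce());
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(observe(10_000));
    }
}
```

A short run will often show only some of the outcomes. Not seeing "0,0" on a given machine does not mean it is forbidden.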

Happens-before order

Two actions can be ordered by a happens-before relationship. If one action happens-before another, then the first is visible to and ordered before the second.

Java Language Specification, Java SE 7 Edition

Happens-before rules

A monitor release and a matching later monitor acquire establish a happens-before ordering.

A write to a volatile field happens-before every subsequent read of that field.

Program order within a thread also establishes a happens-before order.

Happens-before order is transitive.
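The rules above combine in the usual "publish through a volatile flag" idiom. The sketch below is illustrative only (SafePublication, data and ready are invented names): the plain write to data precedes the volatile write in program order, the volatile write happens-before the read that sees it, and transitivity makes the plain write visible to the reader.

```java
// Hypothetical example class; the names are illustrative, not from the slides.
class SafePublication {
    static int data;                // plain field
    static volatile boolean ready;  // volatile flag

    // Returns the value of data that the reader thread observed.
    static int publishAndRead() {
        data = 0;
        ready = false;
        int[] observed = new int[1];
        Thread reader = new Thread(() -> {
            while (!ready) {
                // spin until the volatile write becomes visible
            }
            observed[0] = data; // ordered after the volatile read that saw true
        });
        reader.start();
        data = 42;     // (1) plain write, before (2) in program order
        ready = true;  // (2) volatile write, happens-before the read that sees it
        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return observed[0];
    }

    public static void main(String[] args) {
        System.out.println(publishAndRead()); // prints 42
    }
}
```

If ready were a plain field, the reader could spin forever or read a stale data value.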

Java tools

Volatile variables

volatile boolean running = true;

Monitors

synchronized (this) {
    i = a;
    a = i;
}

ReentrantLock lock = new ReentrantLock();
lock.lock();
try {
    // critical section
} finally {
    lock.unlock();
}
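As a rough sketch of both monitor styles in use, the class below guards one counter with a ReentrantLock and another with the object's own monitor. GuardedCounters and its members are hypothetical names chosen for this example.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical counters used only for illustration; not from the slides.
class GuardedCounters {
    private final ReentrantLock lock = new ReentrantLock();
    private int lockCount;     // guarded by lock
    private int monitorCount;  // guarded by this object's monitor

    void incrementWithLock() {
        lock.lock();
        try {
            lockCount++;
        } finally {
            lock.unlock(); // always release, even if the body throws
        }
    }

    synchronized void incrementWithMonitor() {
        monitorCount++;
    }

    // Runs two threads that each increment both counters perThread times.
    static int[] run(int perThread) {
        GuardedCounters c = new GuardedCounters();
        Runnable work = () -> {
            for (int n = 0; n < perThread; n++) {
                c.incrementWithLock();
                c.incrementWithMonitor();
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // join() makes the workers' writes visible to this thread.
        return new int[] { c.lockCount, c.monitorCount };
    }

    public static void main(String[] args) {
        int[] totals = run(100_000);
        System.out.println(totals[0] + " " + totals[1]); // prints 200000 200000
    }
}
```

Each unlock or monitor exit happens-before the next lock or monitor enter on the same lock, so no increment is lost.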

What does volatile do?

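Volatile reads and writes cannot be reordered with each other; ordinary accesses before a volatile write stay before it, and ordinary accesses after a volatile read stay after it.

Compilers and the runtime are not allowed to keep volatile variables in registers.

Reads and writes of volatile longs and doubles are atomic (plain long and double accesses may be split into two 32-bit halves).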

Happens-before, volatile

Happens-before, Monitors

Volatiles and monitors ordering

Can Reorder                  | 2nd operation
1st operation                | Normal Load / Normal Store | Volatile Load / MonitorEnter | Volatile Store / MonitorExit
Normal Load / Normal Store   |                            |                              | No
Volatile Load / MonitorEnter | No                         | No                           | No
Volatile Store / MonitorExit |                            | No                           | No

The JSR-133 Cookbook for Compiler Writers

Visibility

Thread 1:

public void run() {
    int counter = 0;
    while (running) {
        counter++;
    }
    System.out.println("Counted up to " + counter);
}

Thread 2:

public void run() {
    try {
        Thread.sleep(100);
    } catch (InterruptedException ignored) {
    }
    running = false;
}

LoopFlag

Visibility

How is it possible?

Compiler can reorder instructions.

Compiler can keep values in registers.

Processor can reorder instructions.

Values may not be synchronized to main memory.

JMM is designed to allow aggressive optimizations.

LoopFlag - volatile
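A self-contained version of the loop-flag demo with a volatile flag might look like the sketch below. The class name LoopFlagDemo and the method countUntilStopped are assumptions made for this example; the slides only show the two run() bodies.

```java
// Hypothetical, self-contained variant of the loop-flag demo.
class LoopFlagDemo {
    // volatile: the worker must re-read the flag on every iteration,
    // so the write from the other thread is guaranteed to become visible.
    static volatile boolean running;

    // Starts the counting thread, stops it after sleepMillis, returns the count.
    static long countUntilStopped(long sleepMillis) {
        running = true;
        long[] result = new long[1];
        Thread worker = new Thread(() -> {
            long counter = 0;
            while (running) {
                counter++;
            }
            result[0] = counter;
        });
        worker.start();
        try {
            Thread.sleep(sleepMillis);
            running = false; // volatile write, seen by the worker's next read
            worker.join();   // returns only because the loop terminates
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println("Counted up to " + countUntilStopped(100));
    }
}
```

Without the volatile modifier the JIT is allowed to hoist the read of running out of the loop, in which case the worker may never stop.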

Visibility

LoopFlag asm - loop

Intel processor

Processor

Memory access time

Registers / Buffers: < 1ns

L1: ~1ns (3-4 cycles)

L2: ~3ns (10-12 cycles)

L3: ~15ns (40-45 cycles)

DRAM: ~65ns

QPI: ~40ns

Memory barriers

LoadLoad

StoreStore

LoadStore

StoreLoad

Memory barrier - LoadLoad

The sequence: Load1; LoadLoad; Load2

Ensures that Load1's data are loaded before data accessed by Load2 and all subsequent load instructions are loaded. In general, explicit LoadLoad barriers are needed on processors that perform speculative loads and/or out-of-order processing in which waiting load instructions can bypass waiting stores. On processors that guarantee to always preserve load ordering, the barriers amount to no-ops.


Memory barrier - StoreStore

The sequence: Store1; StoreStore; Store2

Ensures that Store1's data are visible to other processors (i.e., flushed to memory) before the data associated with Store2 and all subsequent store instructions. In general, StoreStore barriers are needed on processors that do not otherwise guarantee strict ordering of flushes from write buffers and/or caches to other processors or main memory.


Memory barrier - LoadStore

The sequence: Load1; LoadStore; Store2

Ensures that Load1's data are loaded before all data associated with Store2 and subsequent store instructions are flushed. LoadStore barriers are needed only on those out-of-order processors in which waiting store instructions can bypass loads.


Memory barrier - StoreLoad

The sequence: Store1; StoreLoad; Load2

Ensures that Store1's data are made visible to other processors (i.e., flushed to main memory) before data accessed by Load2 and all subsequent load instructions are loaded. StoreLoad barriers protect against a subsequent load incorrectly using Store1's data value rather than that from a more recent store to the same location performed by a different processor. Because of this, on the processors discussed below, a StoreLoad is strictly necessary only for separating stores from subsequent loads of the same location(s) as were stored before the barrier.

StoreLoad barriers are needed on nearly all recent multiprocessors, and are usually the most expensive kind. Part of the reason they are expensive is that they must disable mechanisms that ordinarily bypass cache to satisfy loads from write-buffers. This might be implemented by letting the buffer fully flush, among other possible stalls.
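At the Java level the StoreLoad case corresponds to a volatile store followed by a volatile load. The sketch below reuses the x/y/i/j shape from the quiz but declares x and y volatile. Under that assumption the JMM forbids the outcome i = 0, j = 0, because neither thread's load may move ahead of its own preceding store. The class and method names are hypothetical.

```java
// Hypothetical variant of the quiz with volatile x and y.
class VolatileStoreLoad {
    static volatile int x, y; // volatile store then volatile load needs StoreLoad
    static int i, j;          // results, read by the caller after join()

    // Returns true if any run observed i == 0 && j == 0.
    static boolean sawBothZero(int runs) {
        for (int n = 0; n < runs; n++) {
            x = 0;
            y = 0;
            Thread t1 = new Thread(() -> { x = 1; j = y; });
            Thread t2 = new Thread(() -> { y = 1; i = x; });
            t1.start();
            t2.start();
            try {
                t1.join();
                t2.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            if (i == 0 && j == 0) {
                return true; // would violate the ordering of volatile accesses
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(sawBothZero(10_000)); // prints false
    }
}
```

The other three outcomes remain possible; only the one that requires a load to pass an earlier store is ruled out.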


Memory barriers

Required barriers            | 2nd operation
1st operation                | Normal Load | Normal Store | Volatile Load / MonitorEnter | Volatile Store / MonitorExit
Normal Load                  |             |              |                              | LoadStore
Normal Store                 |             |              |                              | StoreStore
Volatile Load / MonitorEnter | LoadLoad    | LoadStore    | LoadLoad                     | LoadStore
Volatile Store / MonitorExit |             |              | StoreLoad                    | StoreStore


Intel X86/64 Memory Model

Loads are not reordered with other loads.

Stores are not reordered with other stores.

Stores are not reordered with older loads.

Loads may be reordered with older stores to different locations but not with older stores to the same location.

In a multiprocessor system, memory ordering obeys causality (memory ordering respects transitive visibility).

In a multiprocessor system, stores to the same location have a total order.

In a multiprocessor system, locked instructions have a total order.

Loads and stores are not reordered with locked instructions.

LoopFlag asm - store, MemoryBarriers asm

StoreLoad on Intel Ivy Bridge

lock addl $0x0,(%rsp)

Intel's IA-32 developer manual: Locked operations are atomic with respect to all other memory operations and all externally visible events. [...] Locked instructions can be used to synchronize data written by one processor and read by another processor.

Volatile performance

JiT - asm

Memory barriers - architecture

Processor | LoadStore                     | LoadLoad           | StoreStore | StoreLoad                      | Data dependency orders loads? | Atomic Conditional | Other Atomics    | Atomics provide barrier?
sparc-TSO | no-op                         | no-op              | no-op      | membar (StoreLoad)             | yes                           | CAS: casa          | swap, ldstub     | full
x86       | no-op                         | no-op              | no-op      | mfence or cpuid or locked insn | yes                           | CAS: cmpxchg       | xchg, locked insn | full
ia64      | combine with st.rel or ld.acq | ld.acq             | st.rel     | mf                             | yes                           | CAS: cmpxchg       | xchg, fetchadd   | target + acq/rel
arm       | dmb (see below)               | dmb (see below)    | dmb-st     | dmb                            | indirection only              | LL/SC: ldrex/strex |                  | target only
ppc       | lwsync (see below)            | lwsync (see below) | lwsync     | hwsync                         | indirection only              | LL/SC: ldarx/stwcx |                  | target only
alpha     | mb                            | mb                 | wmb        | mb                             | no                            | LL/SC: ldx_l/stx_c |                  | target only
pa-risc   | no-op                         | no-op              | no-op      | no-op                          | yes                           | build from ldcw    | ldcw             | (NA)


* The x86 processors supporting "streaming SIMD" SSE2 extensions require LoadLoad "lfence" only in connection with these streaming instructions.

Final fields

Act as a normal field, but:

A store of a final field (inside a constructor) and, if the field is a reference, any store that this final can reference, cannot be reordered with a subsequent store (outside that constructor) of the reference to the object holding that field into a variable accessible to other threads. (x.finalField = v; ... ; sharedRef = x; and likewise v.afield = 1; x.finalField = v; ... ; sharedRef = x;)

The initial load (i.e., the very first encounter by a thread) of a final field cannot be reordered with the initial load of the reference to the object containing the final field. (x = sharedRef; ... ; i = x.finalField;)

Final field example

class FinalFieldExample {
    final int x;
    int y;
    static FinalFieldExample f;

    public FinalFieldExample() {
        x = 3;
        y = 4;
    }

    static void writer() {
        f = new FinalFieldExample();
    }

    static void reader() {
        if (f != null) {
            int i = f.x; // guaranteed to see 3
            int j = f.y; // may see 4 or 0 !!
        }
    }
}

Atomicity

java.util.concurrent.atomic

AtomicBoolean

AtomicInteger

AtomicIntegerArray

AtomicIntegerFieldUpdater

AtomicLong

AtomicLongArray

AtomicLongFieldUpdater

AtomicMarkableReference

AtomicReference

AtomicReferenceArray

AtomicReferenceFieldUpdater

AtomicStampedReference

AtomicInteger

public class AtomicInteger extends Number implements java.io.Serializable {
    //...
    private volatile int value;

    public final void set(int newValue) {
        value = newValue;
    }
    //...
    public final void lazySet(int newValue) {
        unsafe.putOrderedInt(this, valueOffset, newValue);
    }
    //...
    public final boolean compareAndSet(int expect, int update) {
        return unsafe.compareAndSwapInt(this, valueOffset, expect, update);
    }
    //...
}
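From the caller's side, compareAndSet is typically used in a read, compute, retry loop. The sketch below is a minimal illustration under that assumption; CasCounter, incrementWithCas and run are hypothetical names, and in real code incrementAndGet() already does this job.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper showing a compareAndSet retry loop.
class CasCounter {
    // Atomically adds one and returns the new value.
    static int incrementWithCas(AtomicInteger counter) {
        int current;
        int next;
        do {
            current = counter.get(); // volatile read of the current value
            next = current + 1;
        } while (!counter.compareAndSet(current, next)); // retry if another thread won
        return next;
    }

    // Runs the given number of threads, each incrementing perThread times.
    static int run(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) {
                    incrementWithCas(counter);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 10_000)); // prints 40000
    }
}
```

A failed compareAndSet means another thread changed the value between the read and the update, so the loop recomputes from the fresh value instead of blocking.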

Atomic - asm

Unsafe.putOrdered*

StoreStore barrier

JEP 171: Fence Intrinsics

loadFence: { OrderAccess::acquire(); }

storeFence: { OrderAccess::release(); }

fullFence: { OrderAccess::fence(); }
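JEP 171 exposes these fences through sun.misc.Unsafe, which is not a supported public API. On Java 9 and later, java.lang.invoke.VarHandle offers static acquireFence(), releaseFence() and fullFence() methods with the same documented ordering. The sketch below assumes that correspondence and hands a plain field from one thread to another. The flag is accessed in opaque mode so that the spin loop is guaranteed to observe the store; FenceHandOff and its members are invented names.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical demo (requires Java 9+); names are illustrative only.
class FenceHandOff {
    static int payload;    // plain field, ordered by the fences below
    static boolean ready;  // accessed only through READY in opaque mode

    static final VarHandle READY;
    static {
        try {
            READY = MethodHandles.lookup()
                    .findStaticVarHandle(FenceHandOff.class, "ready", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Returns the payload value that the reader thread observed.
    static int handOff() {
        payload = 0;
        READY.setOpaque(false);
        int[] observed = new int[1];
        Thread reader = new Thread(() -> {
            while (!(boolean) READY.getOpaque()) {
                // spin: opaque reads are not hoisted out of the loop
            }
            VarHandle.acquireFence(); // assumed counterpart of Unsafe.loadFence()
            observed[0] = payload;
        });
        reader.start();
        payload = 42;
        VarHandle.releaseFence();     // assumed counterpart of Unsafe.storeFence()
        READY.setOpaque(true);
        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return observed[0];
    }

    public static void main(String[] args) {
        System.out.println(handOff()); // prints 42
    }
}
```

In everyday code a volatile field or setRelease/getAcquire on the VarHandle expresses the same hand-off more directly; explicit fences are mainly useful when the ordering cannot be attached to a single variable.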

NonBlocking
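One classic non-blocking structure is a stack whose head is swapped with compareAndSet, often called a Treiber stack. The slides do not say which algorithm was shown, so the sketch below is only an assumed example built on AtomicReference.

```java
import java.util.concurrent.atomic.AtomicReference;

// Minimal lock-free stack sketch; not taken from the slides.
class TreiberStack<T> {
    private static final class Node<T> {
        final T value;
        final Node<T> next;

        Node(T value, Node<T> next) {
            this.value = value;
            this.next = next;
        }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    void push(T value) {
        Node<T> oldHead;
        Node<T> newHead;
        do {
            oldHead = head.get();
            newHead = new Node<>(value, oldHead);
        } while (!head.compareAndSet(oldHead, newHead)); // retry if head moved
    }

    // Returns the most recently pushed value, or null if the stack is empty.
    T pop() {
        Node<T> oldHead;
        do {
            oldHead = head.get();
            if (oldHead == null) {
                return null;
            }
        } while (!head.compareAndSet(oldHead, oldHead.next));
        return oldHead.value;
    }

    public static void main(String[] args) {
        TreiberStack<Integer> stack = new TreiberStack<>();
        stack.push(1);
        stack.push(2);
        System.out.println(stack.pop() + " " + stack.pop() + " " + stack.pop()); // prints 2 1 null
    }
}
```

No thread ever holds a lock: a thread that loses the race on head simply rereads it and retries. The final fields in Node also give the safe initialization discussed on the final fields slide.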

Thanks!

Questions?


Chart data (Volatile performance): 1000000000 operations; series: Normal write, Volatile write, Normal read, Volatile read
