Java memory model
Inspiration
Java Memory Model
Michał Warecki
Outline
Introduction to JMM
Happens-before
Memory barriers
Performance issues
Atomicity
JEP 171
Non blocking algorithms
Java, C++, ASM
Java Memory Model
Instructions reordering
Visibility
Final fields
Interaction with atomic instructions
Java Memory Model
The Java memory model (JMM) describes how threads in the Java programming language interact through memory.
Provides sequential consistency for data-race-free programs.
Instructions reordering
Program order:
int a = 1;
int b = 2;
int c = 3;
int d = 4;
int e = a + b;
int f = c - d;
Possible execution order:
int d = 4;
int c = 3;
int f = c - d;
int b = 2;
int a = 1;
int e = a + b;
Quiz
Initially: x = y = 0
Thread 1: x = 1; j = y;
Thread 2: y = 1; i = x;
What could be the result?
Answer(s)
i = 1; j = 1
i = 0; j = 1
i = 1; j = 0
i = 0; j = 0
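The quiz can be turned into a small litmus test. This is a hypothetical harness (the class and method names are illustrative, not from the slides): x and y are plain fields, so the JMM allows all four outcomes. Whether i = 0; j = 0 actually shows up depends on the CPU, the JIT and timing, so the sketch only counts what it sees.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical harness for the quiz. x and y are plain (non-volatile) fields,
// so the JMM allows all four (i, j) outcomes, including i = 0; j = 0.
final class StoreBufferingLitmus {
    static int x, y; // shared, deliberately not volatile
    static int i, j; // results: j is written by Thread 1, i by Thread 2

    /** Runs the quiz {@code iterations} times and counts each (i, j) outcome. */
    static Map<String, Integer> run(int iterations) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int n = 0; n < iterations; n++) {
            x = 0;
            y = 0;
            Thread t1 = new Thread(() -> { x = 1; j = y; }); // Thread 1
            Thread t2 = new Thread(() -> { y = 1; i = x; }); // Thread 2
            t1.start();
            t2.start();
            join(t1);
            join(t2); // join() makes i and j visible to this thread
            counts.merge("i = " + i + "; j = " + j, 1, Integer::sum);
        }
        return counts;
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // i = 0; j = 0 may appear rarely or not at all, depending on CPU and JIT.
        System.out.println(run(10_000));
    }
}
```

Starting fresh threads per run makes overlap rare; dedicated tools such as jcstress run such tests far more aggressively.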
Happens-before order
Two actions can be ordered by a happens-before relationship. If one action happens-before another, then the first is visible to and ordered before the second.
Java Language Specification, Java SE 7 Edition
Happens-before rules
A monitor release and a matching later monitor acquire establish a happens-before ordering.
A write to a volatile field happens-before every subsequent read of that field.
Execution order within a thread (program order) also establishes a happens-before order.
Happens-before order is transitive.
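The rules combine in the common "publish through a volatile flag" pattern. A minimal sketch (class and method names are hypothetical): program order orders the plain write before the volatile write, the volatile rule orders that write before the read that sees it, and transitivity carries the plain write over to the reader.

```java
// Sketch: publishing a plain field through a volatile flag.
// data = 42  hb  ready = true  hb  (read that sees ready == true)  hb  read of data
final class VolatilePublication {
    private int data;               // plain field
    private volatile boolean ready; // volatile flag

    void writer() {
        data = 42;    // (1) ordinary write
        ready = true; // (2) volatile write; (1) hb (2) by program order
    }

    int reader() {
        while (!ready) {
            // (3) spin on the volatile read; once it sees (2), (2) hb (3)
        }
        return data;  // (4) by transitivity (1) hb (4), so this must read 42
    }

    static int demo() {
        VolatilePublication p = new VolatilePublication();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = p.reader());
        reader.start();
        p.writer();
        try {
            reader.join(); // join() makes seen[0] visible here
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 42
    }
}
```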
Java tools
Volatile variables
volatile boolean running = true;
Monitors
synchronized (this) {
    i = a;
    a = i;
}
ReentrantLock lock = new ReentrantLock();
lock.lock();
try {
    // critical section
} finally {
    lock.unlock();
}
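Both monitor flavors give the same guarantee: an unlock happens-before the next lock of the same lock, so no update is lost. A minimal sketch with a hypothetical counter class guarded once by `synchronized` and once by a `ReentrantLock`:

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch: two counters, one guarded by a monitor, one by a ReentrantLock.
final class LockedCounters {
    private int syncCount;
    private int lockCount;
    private final ReentrantLock lock = new ReentrantLock();

    void incrementSync() {
        synchronized (this) {
            syncCount++;
        }
    }

    void incrementLock() {
        lock.lock();
        try {
            lockCount++;
        } finally {
            lock.unlock(); // always release, even if the body throws
        }
    }

    synchronized int syncCount() {
        return syncCount;
    }

    int lockCount() {
        lock.lock();
        try {
            return lockCount;
        } finally {
            lock.unlock();
        }
    }

    static LockedCounters run(int threads, int perThread) {
        LockedCounters c = new LockedCounters();
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            ts[t] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    c.incrementSync();
                    c.incrementLock();
                }
            });
            ts[t].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return c;
    }

    public static void main(String[] args) {
        LockedCounters c = run(4, 100_000);
        System.out.println(c.syncCount() + " " + c.lockCount()); // prints 400000 400000
    }
}
```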
What does volatile do?
Volatile reads/writes cannot be reordered with each other (and only in restricted ways with normal reads/writes)
Compilers and the runtime are not allowed to cache volatile variables in registers
Volatile longs and doubles are atomic
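The last point is JLS 17.7: a plain long or double may be written as two 32-bit halves, a volatile one may not. A sketch under that rule (names are hypothetical): the writer alternates between all-zero and all-one bits, so a torn read would show a mixed value such as 0x00000000FFFFFFFFL. On 64-bit JVMs plain longs usually do not tear either; the point is that volatile makes it a guarantee.

```java
// Sketch: a volatile long is never observed half-written (JLS 17.7).
final class VolatileLongAtomicity {
    static volatile long value;    // volatile: 64-bit reads and writes are atomic
    static volatile boolean done;

    /** Returns the number of torn values the reader observed (expected: 0). */
    static long countTornReads(int writes) {
        value = 0L;
        done = false;
        Thread writer = new Thread(() -> {
            for (int n = 0; n < writes; n++) {
                value = (n % 2 == 0) ? -1L : 0L; // all ones or all zeros
            }
            done = true;
        });
        writer.start();
        long torn = 0;
        while (!done) {
            long v = value;
            if (v != 0L && v != -1L) {
                torn++; // mixed halves would land here
            }
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return torn;
    }

    public static void main(String[] args) {
        System.out.println(countTornReads(10_000_000)); // prints 0
    }
}
```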
Happens-before, volatile
Happens-before, Monitors
Volatiles and monitors ordering
Can reorder?                   2nd operation
1st operation                  Normal Load /   Volatile Load /   Volatile Store /
                               Normal Store    MonitorEnter      MonitorExit
Normal Load / Normal Store     -               -                 No
Volatile Load / MonitorEnter   No              No                No
Volatile Store / MonitorExit   -               No                No
(- = reordering allowed)
The JSR-133 Cookbook for Compiler Writers
Visibility
Thread 1:
public void run() {
    int counter = 0;
    while (running) {
        counter++;
    }
    System.out.println("Counted up to " + counter);
}
Thread 2:
public void run() {
    try {
        Thread.sleep(100);
    } catch (InterruptedException ignored) {
    }
    running = false;
}
LoopFlag
Visibility
How is it possible?
Compiler can reorder instructions.
Compiler can keep values in registers.
Processor can reorder instructions.
Written values may sit in a core's store buffer and not yet be visible to other cores.
JMM is designed to allow aggressive optimizations.
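Declaring the flag volatile removes all four causes for this particular loop. The sketch below is a reconstruction of the LoopFlag demo from the two run() bodies above; the slides name the demo but do not show the enclosing class, so the class and method names are assumptions.

```java
// Reconstruction of the LoopFlag demo with `running` declared volatile.
// Class and method names are illustrative.
final class LoopFlag {
    // volatile: Thread 1 must eventually see Thread 2's write.
    // Without it, the JIT may hoist the read out of the loop and spin forever.
    static volatile boolean running = true;

    static int runDemo() {
        running = true;
        int[] result = new int[1];
        Thread t1 = new Thread(() -> {
            int counter = 0;
            while (running) { // volatile read on every iteration
                counter++;
            }
            result[0] = counter;
            System.out.println("Counted up to " + counter);
        });
        Thread t2 = new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException ignored) {
            }
            running = false; // volatile write, seen by t1's next read
        });
        t1.start();
        t2.start();
        join(t1);
        join(t2);
        return result[0];
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        runDemo();
    }
}
```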
LoopFlag - volatile
Visibility
LoopFlag asm - loop
Intel processor
Processor
Memory access time
Registers / Buffers: < 1ns
L1: ~1ns (3-4 cycles)
L2: ~3ns (10-12 cycles)
L3: ~15ns (40-45 cycles)
DRAM: ~65ns
QPI: ~40ns
Memory barriers
LoadLoad
StoreStore
LoadStore
StoreLoad
Memory barrier - LoadLoad
The sequence: Load1; LoadLoad; Load2
Ensures that Load1's data are loaded before data accessed by Load2 and all subsequent load instructions are loaded. In general, explicit LoadLoad barriers are needed on processors that perform speculative loads and/or out-of-order processing in which waiting load instructions can bypass waiting stores. On processors that guarantee to always preserve load ordering, the barriers amount to no-ops.
The JSR-133 Cookbook for Compiler Writers
Memory barrier - StoreStore
The sequence: Store1; StoreStore; Store2
Ensures that Store1's data are visible to other processors (i.e., flushed to memory) before the data associated with Store2 and all subsequent store instructions. In general, StoreStore barriers are needed on processors that do not otherwise guarantee strict ordering of flushes from write buffers and/or caches to other processors or main memory.
The JSR-133 Cookbook for Compiler Writers
Memory barrier - LoadStore
The sequence: Load1; LoadStore; Store2
Ensures that Load1's data are loaded before all data associated with Store2 and subsequent store instructions are flushed. LoadStore barriers are needed only on those out-of-order processors in which waiting store instructions can bypass loads.
The JSR-133 Cookbook for Compiler Writers
Memory barrier - StoreLoad
The sequence: Store1; StoreLoad; Load2
Ensures that Store1's data are made visible to other processors (i.e., flushed to main memory) before data accessed by Load2 and all subsequent load instructions are loaded. StoreLoad barriers protect against a subsequent load incorrectly using Store1's data value rather than that from a more recent store to the same location performed by a different processor. Because of this, on the processors discussed below, a StoreLoad is strictly necessary only for separating stores from subsequent loads of the same location(s) as were stored before the barrier. StoreLoad barriers are needed on nearly all recent multiprocessors, and are usually the most expensive kind. Part of the reason they are expensive is that they must disable mechanisms that ordinarily bypass cache to satisfy loads from write-buffers. This might be implemented by letting the buffer fully flush, among other possible stalls.
The JSR-133 Cookbook for Compiler Writers
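The quiz from the beginning is exactly the store-then-load pattern a StoreLoad barrier exists for. A sketch (hypothetical names) with x and y made volatile: volatile accesses are totally ordered and consistent with program order, so whichever of the two stores comes first is seen by the other thread's load, and i = 0; j = 0 becomes impossible.

```java
// Sketch: the quiz with volatile x and y; the outcome i = 0; j = 0 is forbidden.
final class VolatileStoreBuffering {
    static volatile int x, y;
    static int i, j; // plain; made visible to the main thread by join()

    /** Returns how many runs ended with i == 0 and j == 0 (expected: 0). */
    static int countBothZero(int iterations) {
        int bothZero = 0;
        for (int n = 0; n < iterations; n++) {
            x = 0;
            y = 0;
            Thread t1 = new Thread(() -> { x = 1; j = y; }); // volatile store, then volatile load
            Thread t2 = new Thread(() -> { y = 1; i = x; });
            t1.start();
            t2.start();
            join(t1);
            join(t2);
            if (i == 0 && j == 0) {
                bothZero++;
            }
        }
        return bothZero;
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countBothZero(10_000)); // prints 0
    }
}
```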
Memory barriers
Required barriers              2nd operation
1st operation                  Normal Load   Normal Store   Volatile Load /   Volatile Store /
                                                            MonitorEnter      MonitorExit
Normal Load                    -             -              -                 LoadStore
Normal Store                   -             -              -                 StoreStore
Volatile Load / MonitorEnter   LoadLoad      LoadStore      LoadLoad          LoadStore
Volatile Store / MonitorExit   -             -              StoreLoad         StoreStore
(- = no barrier required)
The JSR-133 Cookbook for Compiler Writers
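Reading the table against ordinary Java code: the sketch below (a hypothetical class) marks in comments which barrier the cookbook table places between adjacent accesses. The comments are annotations only; the JIT emits whatever the target CPU needs, which on x86 is an instruction only for StoreLoad.

```java
// Sketch: where the cookbook's required barriers fall for plain `a`
// and volatile `v`, `w`. Comments are annotations, not executed code.
final class BarrierPlacement {
    int a;          // normal field
    volatile int v; // volatile field
    volatile int w; // second volatile field

    void writer() {
        a = 1;         // Normal Store
        // [StoreStore]  Normal Store -> Volatile Store
        v = 2;         // Volatile Store
    }

    int reader() {
        int r = v;     // Volatile Load
        // [LoadLoad][LoadStore]  Volatile Load -> later Normal Load / Normal Store
        int s = a;     // Normal Load
        return r + s;
    }

    int handshake() {
        v = 3;         // Volatile Store
        // [StoreLoad]   Volatile Store -> Volatile Load (the expensive one)
        return w;      // Volatile Load
    }

    public static void main(String[] args) {
        BarrierPlacement b = new BarrierPlacement();
        b.writer();
        System.out.println(b.reader());    // prints 3
        System.out.println(b.handshake()); // prints 0
    }
}
```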
Intel X86/64 Memory Model
Loads are not reordered with other loads.
Stores are not reordered with other stores.
Stores are not reordered with older loads.
Loads may be reordered with older stores to different locations but not with older stores to the same location.
In a multiprocessor system, memory ordering obeys causality (memory ordering respects transitive visibility).
In a multiprocessor system, stores to the same location have a total order.
In a multiprocessor system, locked instructions have a total order.
Loads and stores are not reordered with locked instructions.
LoopFlag asm - store, MemoryBarriers asm
StoreLoad on Intel Ivy Bridge
lock addl $0x0,(%rsp)
Intel's IA-32 developer manual: Locked operations are atomic with respect to all other memory operations and all externally visible events. [...] Locked instructions can be used to synchronize data written by one processor and read by another processor.
Volatile performance
JIT - asm
Memory barriers - architecture
Processor: LoadStore; LoadLoad; StoreStore; StoreLoad; data dependency orders loads?; atomic conditional; other atomics; atomics provide barrier?
sparc-TSO: no-op; no-op; no-op; membar (StoreLoad); yes; CAS: casa; swap, ldstub; full
x86: no-op; no-op; no-op; mfence or cpuid or locked insn; yes; CAS: cmpxchg; xchg, locked insn; full
ia64: combine with st.rel or ld.acq; ld.acq; st.rel; mf; yes; CAS: cmpxchg; xchg, fetchadd; target + acq/rel
arm: dmb (see below); dmb (see below); dmb-st; dmb; indirection only; LL/SC: ldrex/strex; -; target only
ppc: lwsync (see below); lwsync (see below); lwsync; hwsync; indirection only; LL/SC: ldarx/stwcx; -; target only
alpha: mb; mb; wmb; mb; no; LL/SC: ldx_l/stx_c; -; target only
pa-risc: no-op; no-op; no-op; no-op; yes; build from ldcw; ldcw; (NA)
The JSR-133 Cookbook for Compiler Writers
* The x86 processors supporting "streaming SIMD" SSE2 extensions require LoadLoad "lfence" only in connection with these streaming instructions.
Final fields
Final fields act as normal fields, but:
A store of a final field (inside a constructor) and, if the field is a reference, any store that this final can reference, cannot be reordered with a subsequent store (outside that constructor) of the reference to the object holding that field into a variable accessible to other threads. (x.finalField = v; ... ; sharedRef = x; and likewise v.afield = 1; x.finalField = v; ... ; sharedRef = x;)
The initial load (i.e., the very first encounter by a thread) of a final field cannot be reordered with the initial load of the reference to the object containing the final field. (x = sharedRef; ... ; i = x.finalField;)
Final field example
class FinalFieldExample {
    final int x;
    int y;
    static FinalFieldExample f;

    public FinalFieldExample() {
        x = 3;
        y = 4;
    }

    static void writer() {
        f = new FinalFieldExample();
    }

    static void reader() {
        if (f != null) {
            int i = f.x;
            int j = f.y;
        }
    }
}
int i = f.x: guaranteed value 3
int j = f.y: 4 or 0!
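The guarantee can be exercised by racing writer() against a reader. In this sketch (the wrapper class and `race` method are hypothetical), f is still published through a plain static field, i.e. with a data race, yet x must read 3; y can only ever hold 0 or 4. Note the reader copies f into a local first: two racy reads of f, as in the slide's reader(), are not guaranteed to agree.

```java
// Sketch: racing the slide's writer() against a reader.
final class FinalFieldDemo {
    static final class FinalFieldExample {
        final int x;
        int y;
        static FinalFieldExample f; // plain static field: racy publication

        FinalFieldExample() {
            x = 3;
            y = 4;
        }

        static void writer() {
            f = new FinalFieldExample();
        }
    }

    /** Returns {badX, badY}: x != 3 (forbidden) and y outside {0, 4} (impossible). */
    static int[] race(int iterations) {
        int[] bad = new int[2];
        for (int n = 0; n < iterations; n++) {
            FinalFieldExample.f = null;
            Thread w = new Thread(FinalFieldExample::writer);
            w.start();
            FinalFieldExample seen = FinalFieldExample.f; // single racy read, may be null
            if (seen != null) {
                int x = seen.x; // final: must be 3
                int y = seen.y; // non-final: 4 or the default 0
                if (x != 3) bad[0]++;
                if (y != 4 && y != 0) bad[1]++;
            }
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        int[] bad = race(10_000);
        System.out.println("bad x: " + bad[0] + ", bad y: " + bad[1]); // prints bad x: 0, bad y: 0
    }
}
```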
Atomicity
java.util.concurrent.atomic:
AtomicBoolean
AtomicInteger
AtomicIntegerArray
AtomicIntegerFieldUpdater
AtomicLong
AtomicLongArray
AtomicLongFieldUpdater
AtomicMarkableReference
AtomicReference
AtomicReferenceArray
AtomicReferenceFieldUpdater
AtomicStampedReference
AtomicInteger
public class AtomicInteger extends Number implements java.io.Serializable {
    // ...
    private volatile int value;

    public final void set(int newValue) {
        value = newValue;
    }
    // ...
    public final void lazySet(int newValue) {
        unsafe.putOrderedInt(this, valueOffset, newValue);
    }
    // ...
    public final boolean compareAndSet(int expect, int update) {
        return unsafe.compareAndSwapInt(this, valueOffset, expect, update);
    }
    // ...
}
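compareAndSet is usually used in a read, compute, CAS, retry loop. A minimal sketch keeping a running maximum; `updateMax` and `parallelMax` are hypothetical helpers, not AtomicInteger methods:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: a compareAndSet retry loop that keeps the maximum seen so far.
final class AtomicMax {
    static int updateMax(AtomicInteger max, int candidate) {
        while (true) {
            int current = max.get(); // volatile read
            if (candidate <= current) {
                return current;      // nothing to do
            }
            if (max.compareAndSet(current, candidate)) {
                return candidate;    // CAS succeeded
            }
            // another thread changed the value in between: re-read and retry
        }
    }

    static int parallelMax(int threads, int perThread) {
        AtomicInteger max = new AtomicInteger(Integer.MIN_VALUE);
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            ts[t] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    updateMax(max, base + k);
                }
            });
            ts[t].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return max.get();
    }

    public static void main(String[] args) {
        System.out.println(parallelMax(4, 1000)); // prints 3999
    }
}
```

Since Java 8 the same loop is available as `max.accumulateAndGet(candidate, Math::max)`.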
Atomic - asm
Unsafe.putOrdered* (used by lazySet)
StoreStore barrier only (no StoreLoad after the store)
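lazySet fits a single writer publishing to readers: the writer fills a slot with a plain store, then advances an index with lazySet, whose ordering keeps the slot write before the index write while skipping the StoreLoad. Readers use get(), a volatile read. A sketch under those assumptions (one writer thread; the class is hypothetical); since Java 9 the Javadoc specifies lazySet with the memory effects of VarHandle.setRelease.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: single-writer publication with lazySet. Names are illustrative.
final class LazySetQueue {
    private final int[] slots;
    private final AtomicInteger published = new AtomicInteger(0);

    LazySetQueue(int capacity) {
        slots = new int[capacity];
    }

    void publish(int value) {     // must be called by one writer thread only
        int n = published.get();
        slots[n] = value;         // plain store
        published.lazySet(n + 1); // ordered store: slot write stays before it
    }

    /** Reader check: each published slot k must already hold k + 1. */
    int countUnfilled() {
        int n = published.get();  // volatile read
        int bad = 0;
        for (int k = 0; k < n; k++) {
            if (slots[k] != k + 1) bad++;
        }
        return bad;
    }

    static int race(int count) {
        LazySetQueue q = new LazySetQueue(count);
        Thread writer = new Thread(() -> {
            for (int k = 1; k <= count; k++) q.publish(k);
        });
        writer.start();
        int bad = 0;
        while (writer.isAlive()) {
            bad += q.countUnfilled();
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return bad + q.countUnfilled();
    }

    public static void main(String[] args) {
        System.out.println(race(100_000)); // prints 0
    }
}
```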
JEP 171: Fence Intrinsics
loadFence: { OrderAccess::acquire(); }
storeFence: { OrderAccess::release(); }
fullFence: { OrderAccess::fence(); }
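JEP 171 added loadFence, storeFence and fullFence to the internal sun.misc.Unsafe in JDK 8. Since Java 9, comparable static fences are public on VarHandle. A sketch assuming Java 9+ (class and field names are hypothetical): a release fence keeps the data write before the flag write, an acquire fence keeps the flag read before the data read, and opaque access keeps the flag read from being hoisted out of the spin loop. In everyday code a volatile flag or setRelease/getAcquire is simpler; fences are shown only to mirror the slide.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Sketch (Java 9+): hand-off of a plain field using explicit fences.
final class FencedHandoff {
    private int data;      // plain field
    private boolean ready; // plain field, accessed through READY in opaque mode

    private static final VarHandle READY;
    static {
        try {
            READY = MethodHandles.lookup()
                    .findVarHandle(FencedHandoff.class, "ready", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void writer(int value) {
        data = value;
        VarHandle.releaseFence();    // earlier loads/stores stay before later stores
        READY.setOpaque(this, true); // eventually visible to the reader
    }

    int reader() {
        while (!(boolean) READY.getOpaque(this)) {
            Thread.onSpinWait();
        }
        VarHandle.acquireFence();    // earlier loads stay before later loads/stores
        return data;
    }

    static int demo(int value) {
        FencedHandoff h = new FencedHandoff();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = h.reader());
        reader.start();
        h.writer(value);
        try {
            reader.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(demo(42)); // prints 42
    }
}
```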
NonBlocking
Thanks!
Questions?
Chart (volatile performance): 1,000,000,000 operations; normal write, volatile write, normal read, volatile read