1 Copyright © 2016, Oracle and/or its affiliates. All rights reserved
Machine Code Snippets in Java
Vladimir Ivanov
HotSpot JVM Compiler, Oracle Corp.
JVM Language Summit 2016
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
x86 Assembly
0x11529c8c0: mov %eax,-0x16000(%rsp)
0x11529c8c7: push %rbp
0x11529c8c8: sub $0x20,%rsp
0x11529c8cc: mov %rdx,(%rsp)
0x11529c8d0: mov %rsi,%rbp
0x11529c8d3: movabs $0x7c0013d10,%rsi
0x11529c8dd: nop
0x11529c8de: nop
0x11529c8df: nop
0x11529c8e0: vzeroupper
0x11529c8e3: callq 0x00000001152418a0
0x11529c8e8: mov %rax,%rbx
0x11529c8eb: mov (%rsp),%r10
0x11529c8ef: vmovdqu 0x10(%r10),%ymm1
0x11529c8f5: vmovdqu 0x10(%rbp),%ymm0
0x11529c8fa: vpaddd %ymm0,%ymm1,%ymm0
0x11529c8fe: vmovdqu %ymm0,0x10(%rbx)
0x11529c903: mov %rbx,%rax
0x11529c906: vzeroupper
0x11529c909: add $0x20,%rsp
0x11529c90d: pop %rbp
0x11529c90e: test %eax,-0xb786914(%rip)
0x11529c914: retq

0x11529d240: mov %eax,-0x16000(%rsp)
0x11529d247: push %rbp
0x11529d248: sub $0x30,%rsp
0x11529d24c: mov %rcx,%rbp
0x11529d24f: vmovdqu 0x10(%rsi),%ymm0
0x11529d254: vmovdqu 0x10(%rdx),%ymm1
0x11529d259: vpaddd %ymm0,%ymm1,%ymm0
0x11529d25d: vmovdqu %ymm0,(%rsp)
0x11529d262: movabs $0x7c0013d10,%rsi
0x11529d26c: vzeroupper
0x11529d26f: callq 0x00000001152418a0
0x11529d274: mov %rax,%rbx
0x11529d277: vmovdqu 0x10(%rbp),%ymm1
0x11529d27c: vmovdqu (%rsp),%ymm0
0x11529d281: vpaddd %ymm0,%ymm1,%ymm0
0x11529d285: vmovdqu %ymm0,0x10(%rbx)
0x11529d28a: mov %rbx,%rax
0x11529d28d: vzeroupper
0x11529d290: add $0x30,%rsp
0x11529d294: pop %rbp
0x11529d295: test %eax,-0xb78729b(%rip)
0x11529d29b: retq
The Plan
§ Background
§ Machine Code Snippets – the concept & its evolution
§ Vectors – box elimination, C2 optimizations, GC
Vector ISA Extensions
§ 100s of vector instructions on x86
§ Intel intrinsic instructions
  – MMX: ~120
  – SSE: ~130
  – SSE2/3/SSSE3/4.1/4.2: ~260
  – AVX/AVX2: ~380
Vector ISA Extensions
§ 1000s of vector instructions on x86
§ Intel intrinsic instructions
  – MMX: ~120
  – SSE: ~130
  – SSE2/3/SSSE3/4.1/4.2: ~260
  – AVX/AVX2: ~380
  – AVX-512: ~3800
Motivation
§ Vector API – expose data-parallel operations through a cross-platform API
§ How to bind to particular machine instructions in the implementation?
§ Existing solutions
  – JVM intrinsics
  – JNI / NativeMethodHandles (in Project Panama)
JVM Intrinsics
“A method is intrinsified if the HotSpot VM replaces the annotated method with hand-written assembly and/or hand-written compiler IR -- a compiler intrinsic -- to improve performance.”
@HotSpotIntrinsicCandidate JavaDoc
public final class java.lang.Class<T> implements … {
  @HotSpotIntrinsicCandidate
  public native boolean isInstance(Object obj);
JNI @since 1.1
JNI
Usage scenario
class Lib { static native void m(); }
void JNICALL Java_Lib_m(JNIEnv* env, jclass c) { m(); }
Native Method Handles
MethodType mt = MethodType.methodType(void.class);
MethodHandle mh = MethodHandles.lookup().findNative("m", mt);
mh.invokeExact();
Project Panama
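For comparison, the same three steps (describe the type, look up, invoke) can be sketched with the standard java.lang.invoke API. This is only an analogy: findNative belongs to the Panama prototype shown on the slide, so the sketch below substitutes findStatic on an ordinary Java method, and the class name JavaHandleSketch and the counter are hypothetical.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class JavaHandleSketch {
    static int calls = 0;

    // Stand-in for the native function "m"; here it is an ordinary Java method.
    static void m() {
        calls++;
    }

    // Looks up m() and invokes it once; returns the updated call counter.
    static int invokeM() {
        try {
            MethodType mt = MethodType.methodType(void.class);
            // findStatic is the standard API; findNative exists only in the prototype.
            MethodHandle mh = MethodHandles.lookup().findStatic(JavaHandleSketch.class, "m", mt);
            mh.invokeExact(); // call-site type ()void must match mt exactly
            return calls;
        } catch (Throwable t) {
            throw new Error(t);
        }
    }

    public static void main(String[] args) {
        System.out.println("calls = " + invokeM());
    }
}
```

The call site looks the same in both cases; only the lookup step changes, which is the point the slide makes about native method handles.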
Native Method Handles (Project Panama)
                     Java                         Native
Construction         Lookup.findVirtual() et al   Lookup.findNative()
Reference (typed)    DirectMethodHandle           NativeMethodHandle
Reference (direct)   MemberName                   NativeEntryPoint
Linker               MH.linkToVirtual() et al     MH.linkToNative()
Invocation           indy, MH.invoke(), MH.invokeExact()
“Making native calls from the JVM” by John Rose http://cr.openjdk.java.net/~jrose/panama/native-call-primitive.html
Native Method Handles (Project Panama)
callq 0x1057b2eb0   ; native method entry
getpid
  JNI           13.7 ± 0.5 ns
  Direct call    3.4 ± 0.2 ns
Native code vs JVM Intrinsics
§ Native method
  + arbitrary native code
  - too much ceremony
  - opaque to the JVM
§ JVM Intrinsics
  + powerful, lightweight, and flexible
  - high development costs
Machine Code Snippets
Machine Code Snippets
New breed:
NativeMethodHandle + JVM intrinsic
Idea (1st iteration)
Wrap raw machine code in a method handle
The Idea
Machine Code Snippets
Motivation / Goals
§ Use case: prototyping
  1. minimize implementation costs
  2. decent performance
  3. up to a dozen instructions in size
§ Existing solutions
  – JVM intrinsics: 1. no / 2. yes / 3. yes
  – JNI / NMH: 1. yes / 2. no / 3. yes
Vectorized Memory Copy
vmovdqu mem, reg   // 256-bit load
vmovdqu reg, mem   // 256-bit store
Vectorized Memory Copy Machine Code as a Method Handle
mov256MH.invokeExact(src, off1, dst, off2);
C4 E1 7E 6F 04 37   C4 E1 7E 7F 04 0A
MH (LJLJ)V
Vectorized Memory Copy Machine Code as a Method Handle
vmovdqu (?,?,1),%ymm0
vmovdqu %ymm0,(?,?,1)
MH (LJLJ)V
mov256MH.invokeExact(src, off1, dst, off2);
Vectorized Memory Copy Machine Code as a Method Handle
vmovdqu (%rdi,%rsi,1),%ymm0
vmovdqu %ymm0,(%rdx,%rcx,1)
/* (rdi, rsi, rdx, rcx) */
mov256MH.invokeExact(src, off1, dst, off2);
MH (LJLJ)V
Machine Code Snippet
§ 2 execution modes
  – optimized: embedded in generated code
  – non-optimized, interpreted: invokes stand-alone version
[Diagram: Java side (Safe wrapper, Unsafe wrapper, MethodHandle) and Native side (Stand-alone, Embedded); legend: User-defined, produced by j.l.i]
Machine Code Snippet
Calling Convention
§ matches native ABI
  – same machine code in all execution modes
§ System V AMD64 ABI: “Function Calling Sequence”
  – first 6 integer arguments: RDI, RSI, RDX, RCX, R8, R9
  – first 8 FP arguments: XMM0, …, XMM7
  – return registers: RAX/RDX (integer), XMM0/XMM1 (FP)
  – …
Native Method Handle
package java.lang.invoke;
/*non-public*/ class NativeMethodHandle extends MethodHandle {
  final NativeEntryPoint nativeFunc;
/*non-public*/ class NativeEntryPoint {
  final long addr;
  final MethodType type;
/*non-public*/ class MachineCodeSnippet extends NativeEntryPoint {
  final byte[] code;
Machine Code Snippet
How To Use
public static MethodHandle make(String name,
                                MethodType mt,
                                boolean isSupported,
                                byte... machineCode) {...}
Vectorized Memory Copy
static final MethodHandle mov256MH = MachineCodeSnippet.make(
    "copy256",
    MethodType.methodType(void.class,                                 // return type
                          Object.class /*rdi*/, long.class /*rsi*/,   // src
                          Object.class /*rdx*/, long.class /*rcx*/),  // dst
    requires(AVX),
    0xC4, 0xE1, 0x7E, 0x6F, 0x04, 0x37,   // vmovdqu (%rdi,%rsi,1),%ymm0
    0xC4, 0xE1, 0x7E, 0x7F, 0x04, 0x0A);  // vmovdqu %ymm0,(%rdx,%rcx,1)
MethodHandle
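To see how the twelve bytes map onto the two vmovdqu instructions in the comments, the fields of the 3-byte VEX prefix, ModRM and SIB bytes can be decoded by hand. The following is a minimal sketch that handles only this one instruction shape (C4 E1 prefix, no displacement, SIB addressing); VexDecodeSketch and its method are hypothetical helpers, not part of the snippet API.

```java
class VexDecodeSketch {
    // Register numbers as used in ModRM/SIB fields (low 3 bits only).
    static final String[] GPR = {"rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"};

    /** Decodes one 6-byte instruction of the form C4 E1 xx op modrm sib into AT&T syntax. */
    static String decode(int... b) {
        if (b.length != 6 || b[0] != 0xC4 || b[1] != 0xE1) {
            throw new IllegalArgumentException("expected C4 E1 (3-byte VEX, 0F map, no extended registers)");
        }
        int l = (b[2] >> 2) & 1;        // VEX.L: 1 selects 256-bit (ymm) operands
        int pp = b[2] & 3;              // implied legacy prefix: 2 means F3
        int opcode = b[3];
        int mod = (b[4] >> 6) & 3;      // ModRM.mod: 0 means memory operand, no displacement
        int reg = (b[4] >> 3) & 7;      // ModRM.reg: vector register number
        int rm = b[4] & 7;              // ModRM.rm: 4 means a SIB byte follows
        int scale = 1 << ((b[5] >> 6) & 3);
        int index = (b[5] >> 3) & 7;
        int base = b[5] & 7;
        // index 4 (none) and base 5 (disp32) have special meanings that are not handled here.
        if (pp != 2 || mod != 0 || rm != 4 || index == 4 || base == 5) {
            throw new IllegalArgumentException("unsupported encoding form");
        }
        String vec = (l == 1 ? "%ymm" : "%xmm") + reg;
        String mem = "(%" + GPR[base] + ",%" + GPR[index] + "," + scale + ")";
        switch (opcode) {
            case 0x6F: return "vmovdqu " + mem + "," + vec;  // load
            case 0x7F: return "vmovdqu " + vec + "," + mem;  // store
            default: throw new IllegalArgumentException("unsupported opcode");
        }
    }

    public static void main(String[] args) {
        System.out.println(decode(0xC4, 0xE1, 0x7E, 0x6F, 0x04, 0x37));
        System.out.println(decode(0xC4, 0xE1, 0x7E, 0x7F, 0x04, 0x0A));
    }
}
```

The base and index registers come straight from the SIB byte, which is why the snippet only works when the arguments arrive in exactly the registers the native calling convention assigns.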
Stand-alone version
$ java … -XX:+PrintCodeSnippets ...
Decoding code snippet "move256" @ 0x10f05d6a0
# parm0: rdi:rdi = 'java/lang/Object'
# parm1: rsi:rsi = long
# parm2: rdx:rdx = 'java/lang/Object'
# parm3: rcx:rcx = long
<+0>: push %rbp
<+1>: mov %rsp,%rbp
<+4>: vmovdqu (%rdi,%rsi,1),%ymm0  ; c4e17e6f0437
<+10>: vmovdqu %ymm0,(%rdx,%rcx,1)  ; c4e17e7f040a
<+16>: leaveq
<+17>: retq
Vectorized Memory Copy
static final MethodHandle mov256MH = ...;
// Unsafe wrapper
static void move256(Object src, long off1, Object dst, long off2) {
  try {
    mov256MH.invokeExact(src, off1, dst, off2);
  } catch (Throwable e) {
    throw new Error(e);
  }
}
Vectorized Memory Copy
4159181 MachineCodeSnippetSamples::move256 (27 bytes)
@ 8 LambdaForm$MH::invokeExact_MT (29 bytes) force inline by annotation
@ 11 Invokers::checkExactType (17 bytes) force inline by annotation
@ 1 MethodHandle::type (5 bytes) accessor
@ 15 Invokers::checkCustomized (23 bytes) force inline by annotation
@ 1 MethodHandleImpl::isCompileConstant (2 bytes) (intrinsic)
@ 25 LambdaForm$NMH::invokeNative_LJLJ_V (27 bytes) force inline by ann…
@ 7 NativeMethodHandle::internalNativeEntryPoint (8 bytes) force inline…
@ 23 MethodHandle::linkToNative(LJLJL)V (0 bytes) direct native call

# {method} 'move256' ...
# parm0: rsi:rsi = 'java/lang/Object'
# parm1: rdx:rdx = long
# parm2: rcx:rcx = 'java/lang/Object'
# parm3: r8:r8 = long
<+0>: mov %eax,-0x16000(%rsp)
<+7>: push %rbp
<+8>: sub $0x10,%rsp
<+12>: mov %rsi,%rdi
<+15>: mov %rdx,%rsi
<+17>: mov %rcx,%rdx
<+21>: mov %r8,%rcx
<+24>: vmovdqu (%rdi,%rsi,1),%ymm0  ; c4e17e6f0437
<+30>: vmovdqu %ymm0,(%rdx,%rcx,1)  ; c4e17e7f040a
<+36>: add $0x10,%rsp
<+40>: pop %rbp
<+41>: test %eax,-0x4d3d58f(%rip)
<+47>: retq
Calling Convention
Java vs C
hotspot/src/cpu/x86/vm/sharedRuntime_x86_64.cpp:
“The Java calling convention is a "shifted" version of the C ABI. By skipping the first C ABI register we can call non-static jni methods with small numbers of arguments without having to shuffle the arguments at all. Since we control the java ABI we ought to at least get some advantage out of it.”
System V AMD64 ABI
arg    1st  2nd  3rd  4th  5th  6th  …
C      RDI  RSI  RDX  RCX  R8   R9   stack
Java   RSI  RDX  RCX  R8   R9   RDI  stack
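The register moves that precede an embedded snippet follow directly from the two rows of this table. A minimal sketch, assuming integer or pointer arguments only and at most five of them (a sixth Java argument lives in RDI, which the first move overwrites, so it would need a temporary); ArgShuffleSketch is a hypothetical helper, not HotSpot code.

```java
import java.util.ArrayList;
import java.util.List;

class ArgShuffleSketch {
    // Integer argument registers in order of argument position.
    static final String[] C_ARGS = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};
    static final String[] JAVA_ARGS = {"rsi", "rdx", "rcx", "r8", "r9", "rdi"};

    /** Returns the moves that take Java-convention arguments to C-convention registers. */
    static List<String> shuffle(int argCount) {
        if (argCount < 0 || argCount > 5) {
            throw new IllegalArgumentException("sketch handles 0..5 register arguments");
        }
        List<String> moves = new ArrayList<>();
        // Emitting in argument order is safe: each destination register was
        // already read as the source of the previous move (or is RDI, which
        // holds no Java argument when there are fewer than six).
        for (int i = 0; i < argCount; i++) {
            moves.add("mov %" + JAVA_ARGS[i] + ",%" + C_ARGS[i]);
        }
        return moves;
    }

    public static void main(String[] args) {
        shuffle(4).forEach(System.out::println);
    }
}
```

For four arguments this yields the same four moves that appear in the compiled move256 listing, and for five arguments the moves in the linkToNative stub.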
Native Method Linker MH.linkToNative
# {method} 'linkToNative' '(LJLJL)V'
# parm0: rsi:rsi = 'java/lang/Object'
# parm1: rdx:rdx = long
# parm2: rcx:rcx = 'java/lang/Object'
# parm3: r8:r8 = long
# parm4: r9:r9 = '.../NativeEntryPoint'
<+0>: push %rbp
<+1>: mov %rsp,%rbp
<+4>: mov %rsi,%rdi
<+7>: mov %rdx,%rsi
<+10>: mov %rcx,%rdx
<+13>: mov %r8,%rcx
<+16>: mov %r9,%r8
<+19>: callq *0x10(%r8)
<+23>: leaveq
<+24>: retq
Vectorized Memory Copy
// Safe wrapper
static void copy256(byte[] src, int idx1, byte[] dst, int idx2) {
  // Array bounds checks: the 32 bytes starting at idx must lie inside the array
  // (this also rejects negative indices)
  Objects.checkFromIndexSize(idx1, 32, src.length);
  Objects.checkFromIndexSize(idx2, 32, dst.length);
  // Offset computations
  long off1 = Unsafe.ARRAY_BYTE_BASE_OFFSET + idx1;
  long off2 = Unsafe.ARRAY_BYTE_BASE_OFFSET + idx2;
  move256(src, off1, dst, off2); // Unsafe wrapper
}
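The split between a checked entry point and an unchecked worker can be tried out without the snippet API. In the sketch below, which is an illustration under stated assumptions rather than the talk's code, System.arraycopy stands in for the machine code snippet (it still performs its own checks, which the real snippet would not), and SafeCopySketch is a hypothetical class name.

```java
import java.util.Objects;

class SafeCopySketch {
    static final int VECTOR_BYTES = 32; // one 256-bit vector

    // Stand-in for the unsafe wrapper: assumes the caller validated the ranges.
    static void move256(byte[] src, int idx1, byte[] dst, int idx2) {
        System.arraycopy(src, idx1, dst, idx2, VECTOR_BYTES);
    }

    // Safe wrapper: validates both 32-byte ranges before delegating.
    static void copy256(byte[] src, int idx1, byte[] dst, int idx2) {
        Objects.checkFromIndexSize(idx1, VECTOR_BYTES, src.length);
        Objects.checkFromIndexSize(idx2, VECTOR_BYTES, dst.length);
        move256(src, idx1, dst, idx2);
    }

    public static void main(String[] args) {
        byte[] src = new byte[64];
        byte[] dst = new byte[64];
        for (int i = 0; i < src.length; i++) src[i] = (byte) i;
        copy256(src, 16, dst, 32);
        System.out.println(dst[32] + " .. " + dst[63]);
    }
}
```

Objects.checkFromIndexSize covers the negative index and the range-end cases in a single call, which keeps the safe wrapper short.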
Vectors
VPADDD Add Packed Integers
vpaddd %ymm1,%ymm0,%ymm0
vector /*ymm0*/ vpaddd(vector v1 /*ymm0*/, vector v2 /*ymm1*/)
MH (??)? /*(ymm0,ymm1)ymm0*/
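What the 256-bit form of the instruction computes can be stated in plain Java: eight independent 32-bit additions, each wrapping on overflow. The sketch below is a scalar emulation for reference only; modeling a vector as int[8] is an assumption of this example, not the representation used in the talk.

```java
class VpadddSketch {
    /** Lane-wise 32-bit addition of two 8-lane vectors; each lane wraps on overflow. */
    static int[] vpaddd(int[] v1, int[] v2) {
        if (v1.length != 8 || v2.length != 8) {
            throw new IllegalArgumentException("expected 8 lanes of 32 bits");
        }
        int[] result = new int[8];
        for (int lane = 0; lane < 8; lane++) {
            result[lane] = v1[lane] + v2[lane]; // Java int addition wraps like the hardware lanes
        }
        return result;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4, 5, 6, 7, Integer.MAX_VALUE};
        int[] b = {10, 20, 30, 40, 50, 60, 70, 1};
        System.out.println(java.util.Arrays.toString(vpaddd(a, b)));
    }
}
```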
JVM vs Hardware Impedance Mismatch
size (bits)  8   16  32   64   128   256   512   …
x86 regs     AL  AX  EAX  RAX  XMM0  YMM0  ZMM0  -
JVM          B   S   I    J    -     -     -     …
Vectors
§ java.lang.Long2 / Long4 / Long8 / …
  – represent 128/256/512-bit values
  – value-based class (but should be a value class!)
§ “well-known” to the JVM
  – special treatment in the JVM
  – C2 knows how to map the values to appropriate vector registers
JVM vs Hardware Impedance Mismatch
size (bits)  8   16  32   64   128        256        512        …
x86 regs     AL  AX  EAX  RAX  XMM0       YMM0       ZMM0       -
JVM          B   S   I    J    j.l.Long2  j.l.Long4  j.l.Long8  …
Valhalla JVM vs Hardware
size (bits)  8   16  32   64   128        256        512        …
x86 regs     AL  AX  EAX  RAX  XMM0       YMM0       ZMM0       -
JVM          B   S   I    J    j.l.Long2  j.l.Long4  j.l.Long8  …
VPADDD Add Packed Integers
vpaddd %ymm1,%ymm0,%ymm0
Long4 /*rax*/ vpaddd(Long4 v1 /*rsi*/, Long4 v2 /*rdx*/)
MH (L4L4)L4 /*(rdi,rsi)rax*/
VPADDD Add Packed Integers
vmovdqu 0x10(%rdi),%ymm0  ; unbox
vmovdqu 0x10(%rsi),%ymm1  ; unbox
vpaddd %ymm1,%ymm0,%ymm0
vmovdqu %ymm0,0x10(%rax)  ; box
Long4 /*rax*/ vpaddd(Long4 v1 /*rsi*/, Long4 v2 /*rdx*/)
MH (L4L4)L4 /*(rdi,rsi)rax*/
VPADDD Add Packed Integers
vmovdqu 0x10(%rsi),%ymm0
vmovdqu 0x10(%rdx),%ymm1
vpaddd %ymm1,%ymm0,%ymm0
mov %rdi,%rax
vmovdqu %ymm0,0x10(%rax)
Long4 /*rax*/ vpaddd(Long4 box /*rsi*/, Long4 v1 /*rdx*/, Long4 v2 /*rcx*/)
MH (L4L4L4)L4 /*(rdi,rsi,rdx)rax*/
VPADDD Add Packed Integers
Long4 /*rax*/ vpaddd(Long4 v1 /*rsi*/, Long4 v2 /*rdx*/) {
  Object box = Long4.make();
  return (Long4) MHm256_vpaddd.invokeExact(box, v1, v2);
}
vmovdqu 0x10(%rsi),%ymm0
vmovdqu 0x10(%rdx),%ymm1
vpaddd %ymm1,%ymm0,%ymm0
mov %rdi,%rax
vmovdqu %ymm0,0x10(%rax)
MH (L4L4L4)L4 /*(rdi,rsi,rdx)rax*/
VPADDD Add Packed Integers
// findStatic(Long?.class, "make") + collectArguments():
// T adapter(A... a) {
//   Long? b = Long?::make();
//   return target(b, a...);
// }
vmovdqu 0x10(%rsi),%ymm0
vmovdqu 0x10(%rdx),%ymm1
vpaddd %ymm1,%ymm0,%ymm0
mov %rdi,%rax
vmovdqu %ymm0,0x10(%rax)
MH (L4L4L4)L4 /*(rdi,rsi,rdx)rax*/
MH (L4L4)L4
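The adapter described in the comment can be reproduced with the standard MethodHandles.collectArguments combinator: a zero-argument "make" handle is folded into position 0 of the three-argument target, leaving a two-argument handle. The sketch below assumes int[8] arrays in place of Long4 boxes and a plain Java method in place of the snippet; BoxAdapterSketch and its members are hypothetical names.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class BoxAdapterSketch {
    // Plays the role of Long4.make(): allocates the result box.
    static int[] make() {
        return new int[8];
    }

    // Plays the role of the snippet: writes a + b into the supplied box and returns it.
    static int[] addInto(int[] box, int[] a, int[] b) {
        for (int i = 0; i < 8; i++) box[i] = a[i] + b[i];
        return box;
    }

    // Builds (int[],int[])int[] from (int[],int[],int[])int[] by pre-filling argument 0.
    static MethodHandle adapter() throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle target = lookup.findStatic(BoxAdapterSketch.class, "addInto",
                MethodType.methodType(int[].class, int[].class, int[].class, int[].class));
        MethodHandle maker = lookup.findStatic(BoxAdapterSketch.class, "make",
                MethodType.methodType(int[].class));
        // The filter takes no arguments, so it consumes none and only supplies the box.
        return MethodHandles.collectArguments(target, 0, maker);
    }

    static int[] add(int[] a, int[] b) {
        try {
            return (int[]) adapter().invokeExact(a, b);
        } catch (Throwable t) {
            throw new Error(t);
        }
    }

    public static void main(String[] args) {
        int[] ones = {1, 1, 1, 1, 1, 1, 1, 1};
        System.out.println(java.util.Arrays.toString(add(ones, ones)));
    }
}
```

Callers see a two-argument operation, while the allocation stays in Java code where the JIT compiler can reason about it.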
Stand-alone version
$ java … -XX:+PrintCodeSnippets ...
Decoding code snippet "m256_vpaddd" @ 0x10f06dd20
# parm0: rdi:rdi = 'java/lang/Object'
# parm1: rsi:rsi = 'java/lang/Long4'
# parm2: rdx:rdx = 'java/lang/Long4'
<+0>: push %rbp
<+1>: mov %rsp,%rbp
<+4>: vmovdqu 0x10(%rsi),%ymm0
<+9>: vmovdqu 0x10(%rdx),%ymm1
<+14>: vpaddd %ymm1,%ymm0,%ymm0
<+18>: mov %rdi,%rax
<+21>: vmovdqu %ymm0,0x10(%rax)
<+26>: leaveq
<+27>: retq
# {method} 'vpaddd' '(L4L4)L4'
# parm0: rsi:rsi = 'java/lang/Long4'
# parm1: rdx:rdx = 'java/lang/Long4'
<+0>: ... preamble ...
<+12>: mov %rdx,(%rsp)
<+16>: mov %rsi,%rbp
...
<+35>: callq _new_instance_Java
<+40>: mov %rax,%rbx
<+43>: mov (%rsp),%r10
<+47>: vmovdqu 0x10(%r10),%ymm1
<+53>: vmovdqu 0x10(%rbp),%ymm0
<+58>: vpaddd %ymm0,%ymm1,%ymm0
<+62>: vmovdqu %ymm0,0x10(%rbx)
<+67>: mov %rbx,%rax
<+70>: ... epilogue ...
VPADDD
Nested Sum
public static Long4 testVAdd4(Long4 v1, Long4 v2, Long4 v3, Long4 v4) {
  Long4 t1 = vpaddd(v1, v2);
  Long4 t2 = vpaddd(v3, v4);
  Long4 t3 = vpaddd(t1, t2);
  return vpaddd(t3, v1);
}
[Diagram: expression tree ((v1 + v2) + (v3 + v4)) + v1]
# {method} 'testVAdd4' ...
# parm0: rsi:rsi = 'java/lang/Long4'
# parm1: rdx:rdx = 'java/lang/Long4'
# parm2: rcx:rcx = 'java/lang/Long4'
# parm3: r8:r8 = 'java/lang/Long4'
0x112ea1640:
... preamble ...
<+12>: mov %r8,%r13
<+15>: mov %rcx,%rbx
<+18>: mov %rsi,%rbp
<+21>: vmovdqu 0x10(%rsi),%ymm0  ; unbox
<+26>: vmovdqu 0x10(%rdx),%ymm1  ; unbox
<+31>: vpaddd %ymm0,%ymm1,%ymm0  ; snippet
<+35>: vmovdqu %ymm0,(%rsp)
<+40>: vmovdqu 0x10(%rbx),%ymm0  ; unbox
<+45>: vmovdqu 0x10(%r13),%ymm1  ; unbox
<+51>: vpaddd %ymm0,%ymm1,%ymm0  ; snippet
<+55>: vmovdqu %ymm0,%ymm1
<+59>: vmovdqu (%rsp),%ymm0
<+64>: vpaddd %ymm0,%ymm1,%ymm0  ; snippet
<+68>: vmovdqu %ymm0,(%rsp)
... object allocation ...
<+87>: callq _new_instance_Java
<+92>: mov %rax,%rbx
<+95>: vmovdqu 0x10(%rbp),%ymm1  ; unbox
<+100>: vmovdqu (%rsp),%ymm0
<+105>: vpaddd %ymm0,%ymm1,%ymm0  ; snippet
<+109>: vmovdqu %ymm0,0x10(%rbx)  ; box
<+114>: mov %rbx,%rax
... epilogue ...
<+131>: retq
Register Allocator-aware Snippets
RA-aware Snippets
Let user know about RA decisions!
Idea (2nd iteration):
Use snippet “recipe” instead of raw machine code.
Register Masks
vpaddd _,_,_
Long4 vpaddd(Long4 v1, Long4 v2)
MH (L4L4)L4
([%ymm0-15],[%ymm0-15])[%ymm0-15]
JVMCI
JVMCI
Platform Definitions
package jdk.vm.ci.amd64;
/** Represents the AMD64 architecture. */
public class AMD64 extends Architecture {
  // General purpose CPU registers
  public static final Register rax = new Register(0, 0, "rax", CPU);
  public static final Register rcx = new Register(1, 1, "rcx", CPU);
  public static final Register rdx = new Register(2, 2, "rdx", CPU);
  public static final Register rbx = new Register(3, 3, "rbx", CPU);
  public static final Register rsp = new Register(4, 4, "rsp", CPU);
  ...
/** Represents a target machine register. */
public final class Register implements Comparable<Register> {
RA-aware Snippets
package jdk.vm.ci.panama;
public class MachineCodeSnippet {
  public static MethodHandle make(String name,
                                  MethodType mt,
                                  boolean isSupported,
                                  Register[][] regMasks,
                                  SnippetGenerator codeProducer) {...}
  @FunctionalInterface
  public interface SnippetGenerator {
    int[] getCode(Register... regs);
  }
VPADDD
static final MethodHandle MHm256_vpaddd = MachineCodeSnippet.make(
    "mm256_vpaddd",
    MethodType.methodType(Long4.class, Long4.class, Long4.class),
    requires(AVX2),
    new Register[][] {xmmRegs, xmmRegs, xmmRegs},
    (Register[] regs) -> {
      Register out = regs[0], in1 = regs[1], in2 = regs[2];
      int vex2 = vex2(/*R*/ 1, in2.encoding(), /*L*/ 1, /*pp*/ 1);
      return new int[] {0xC5, vex2, 0xFE, modRM(out, in1)};
    });
}
RA-Aware Version
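The recipe relies on two small encoding helpers, vex2 and modRM, whose bodies are not shown on the slide. A minimal sketch of what such helpers might look like follows; the signatures and argument conventions here are assumptions of this example (in particular, R and vvvv are passed as logical values and inverted inside), and only destination and second-source registers 0..7 are handled because higher numbers would need the 3-byte VEX form.

```java
class Vex2EncodeSketch {
    /** Second byte of a 2-byte VEX prefix (the first byte is 0xC5). R and vvvv are stored inverted. */
    static int vex2(int r, int vvvv, int l, int pp) {
        return ((~r & 1) << 7) | ((~vvvv & 0xF) << 3) | ((l & 1) << 2) | (pp & 3);
    }

    /** Register-direct ModRM byte: mod = 11, then reg and rm fields. */
    static int modRM(int reg, int rm) {
        return 0xC0 | ((reg & 7) << 3) | (rm & 7);
    }

    /** Encodes vpaddd %ymm<src2>,%ymm<src1>,%ymm<dst> (AT&T operand order). */
    static int[] vpaddd256(int dst, int src1, int src2) {
        if (dst < 0 || dst > 7 || src2 < 0 || src2 > 7 || src1 < 0 || src1 > 15) {
            throw new IllegalArgumentException("register outside the range this sketch supports");
        }
        // L = 1 selects 256-bit operands; pp = 1 selects the 0x66 prefix; 0xFE is PADDD.
        return new int[] {0xC5, vex2(0, src1, 1, 1), 0xFE, modRM(dst, src2)};
    }

    public static void main(String[] args) {
        for (int b : vpaddd256(0, 0, 1)) System.out.printf("%02x ", b);
        System.out.println();
    }
}
```

Because the bytes are produced from whatever registers the allocator picked, the JIT compiler no longer has to move values into fixed registers around the snippet.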
# {method} 'testVAdd4' ...
# parm0: rsi:rsi = 'java/lang/Long4'
# parm1: rdx:rdx = 'java/lang/Long4'
# parm2: rcx:rcx = 'java/lang/Long4'
# parm3: r8:r8 = 'java/lang/Long4'
0x112ea1640:
... preamble ...
<+12>: mov %r8,%r13  ; ???
<+15>: mov %rcx,%rbx  ; ???
<+18>: mov %rsi,%rbp  ; ???
<+21>: vmovdqu 0x10(%rsi),%ymm0  ; unbox
<+26>: vmovdqu 0x10(%rdx),%ymm1  ; unbox
<+31>: vpaddd %ymm0,%ymm1,%ymm0  ; snippet
<+35>: vmovdqu 0x10(%rbx),%ymm1  ; unbox
<+40>: vmovdqu 0x10(%r13),%ymm2  ; unbox
<+46>: vpaddd %ymm1,%ymm2,%ymm1  ; snippet
<+50>: vpaddd %ymm0,%ymm1,%ymm0  ; snippet
<+54>: vmovdqu %ymm0,(%rsp)  ; spill
... object allocation ...
<+75>: callq _new_instance_Java
<+80>: mov %rax,%rbx  ; ???
<+83>: vmovdqu 0x10(%rbp),%ymm0  ; unbox
<+88>: vmovdqu (%rsp),%ymm1  ; fill
<+93>: vpaddd %ymm1,%ymm0,%ymm0  ; snippet
<+97>: vmovdqu %ymm0,0x10(%rbx)  ; box
<+102>: mov %rbx,%rax  ; ???
... epilogue ...
<+119>: retq
Effects
• Memory
• Killed registers
Register Preservation
Calling Convention
vpaddd _,_,_
Long4 vpaddd(Long4 v1, Long4 v2)
MT (L4L4)L4 ([ymm0-15],[ymm0-15])[ymm0-15]
/* no memory effects */
/* KILL: [%rax, %rcx, %rdx, …, %xmm0, ..., %xmm15] */
Register Preservation Explicit KILLs
vpaddd _,_,_
Long4 vpaddd(Long4 v1, Long4 v2)
MT (L4L4)L4 ([ymm0-15],[ymm0-15])[ymm0-15]
/* no memory effects */
KILL: [/*empty*/]
# {method} 'vpaddd' ...
# parm0: rsi:rsi = 'java/lang/Long4'
# parm1: rdx:rdx = 'java/lang/Long4'
... nmethod preamble ...
<+12>: mov %rsi,%rbp  ; reg-to-reg move
<+15>: vmovdqu 0x10(%rsi),%ymm0  ; unbox
<+20>: vmovdqu 0x10(%rdx),%ymm1  ; unbox
<+25>: vpaddd %ymm0,%ymm1,%ymm0  ; snippet
<+29>: vmovdqu 0x10(%rcx),%ymm1  ; unbox
<+34>: vmovdqu 0x10(%r8),%ymm2  ; unbox
<+40>: vpaddd %ymm1,%ymm2,%ymm1  ; snippet
<+44>: vpaddd %ymm0,%ymm1,%ymm0  ; snippet
<+48>: vmovdqu %ymm0,(%rsp)  ; spill
... allocation ...
<+67>: callq _new_instance_Java
<+72>: vmovdqu 0x10(%rbp),%ymm0  ; unbox
<+77>: vmovdqu (%rsp),%ymm1  ; fill
<+82>: vpaddd %ymm1,%ymm0,%ymm0  ; snippet
<+86>: vmovdqu %ymm0,0x10(%rax)  ; box
... nmethod epilogue ...
Vectors and Box Elimination
package java.lang;
// 128-bit vector.
public /*value*/ final class Long2 {
  private final long l1, l2;
  // FIXME
  private Long2() { throw new UnsupportedOperationException(); }
  @HotSpotIntrinsicCandidate
  public static native Long2 make(long lo, long hi);
  @HotSpotIntrinsicCandidate
  public native long extract(int i);
  @HotSpotIntrinsicCandidate
  public boolean equals(Long2 v) {...}
  ...
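The shape of this API (factory instead of constructor, immutable fields, component extraction, value equality) can be mirrored in ordinary Java. The class below is a plain-Java stand-in written for illustration; it is not the JVM-recognized java.lang.Long2 from the prototype, and the name Long2Sketch and the choice that index 0 is the low half are assumptions.

```java
final class Long2Sketch {
    private final long lo;
    private final long hi;

    private Long2Sketch(long lo, long hi) {
        this.lo = lo;
        this.hi = hi;
    }

    // Factory method, mirroring make(long lo, long hi) on the slide.
    static Long2Sketch make(long lo, long hi) {
        return new Long2Sketch(lo, hi);
    }

    // Returns the i-th 64-bit component (0 = low half, 1 = high half).
    long extract(int i) {
        switch (i) {
            case 0: return lo;
            case 1: return hi;
            default: throw new IndexOutOfBoundsException("component index: " + i);
        }
    }

    // Equality is defined by contents only, as expected of a value-based class.
    @Override
    public boolean equals(Object o) {
        return o instanceof Long2Sketch
                && ((Long2Sketch) o).lo == lo
                && ((Long2Sketch) o).hi == hi;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(lo) * 31 + Long.hashCode(hi);
    }

    public static void main(String[] args) {
        Long2Sketch v = make(1L, 2L);
        System.out.println(v.extract(0) + ", " + v.extract(1));
    }
}
```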
Vector Box Layout
[Figure: j.l.Long2 object layout with mark (offset 0), klass (+8) and fields l1, l2; offset markers +12, +16, +20, +32; the payload is accessed with vmovdqu]
Vector Box Layout: Endianness
[Figure: same layout accessed with vmovdqu; LSB/MSB order marked for the object fields and for the vector register]
Vector Box Layout: Alignment
[Figure: same layout; vmovdqa? Offset markers +8, +12, +16, +20, +32 compare vmovdqa with vmovdqu]
Snippet Result Boxing C2 IR
Nested Vector Operations C2 IR
No Boxing Across Safepoints
# parm0: rsi:rsi = 'java/lang/Long4'
# parm1: rdx:rdx = 'java/lang/Long4'
...
<+48>: vmovdqu %ymm0,(%rsp)  ; spill
... allocation ...
<+67>: callq _new_instance_Java  ; SAFEPOINT / DEOPT
<+72>: vmovdqu 0x10(%rbp),%ymm0  ; unbox
<+77>: vmovdqu (%rsp),%ymm1  ; fill
...
PcDesc(pc=<+67> offset=48 bits=4):
...
j.l.i.LambdaForm$BMH::reinvoke@20
Locals
 - l0: a '.../BMH$Species_LL'{0x...}
 - l1: obj[20]
...
Objects
 - 20: java.lang.Long4 stack[0]
Allocation Placement
... nmethod preamble ...
<+12>: mov %rsi,%rbp  ; reg-to-reg move
<+15>: vmovdqu 0x10(%rsi),%ymm0  ; unbox
<+20>: vmovdqu 0x10(%rdx),%ymm1  ; unbox
<+25>: vpaddd %ymm0,%ymm1,%ymm0  ; snippet
<+29>: vmovdqu 0x10(%rcx),%ymm1  ; unbox
<+34>: vmovdqu 0x10(%r8),%ymm2  ; unbox
<+40>: vpaddd %ymm1,%ymm2,%ymm1  ; snippet
<+44>: vpaddd %ymm0,%ymm1,%ymm0  ; snippet
<+48>: vmovdqu %ymm0,(%rsp)  ; spill
... allocation ...
<+67>: callq _new_instance_Java
<+72>: vmovdqu 0x10(%rbp),%ymm0
<+77>: vmovdqu (%rsp),%ymm1  ; fill
<+82>: vpaddd %ymm1,%ymm0,%ymm0  ; snippet
<+86>: vmovdqu %ymm0,0x10(%rax)  ; box
... nmethod epilogue ...
Repeated Unboxing
... nmethod preamble ...
<+12>: mov %rsi,%rbp  ; reg-to-reg move
<+15>: vmovdqu 0x10(%rsi),%ymm0  ; unbox
<+20>: vmovdqu 0x10(%rdx),%ymm1  ; unbox
<+25>: vpaddd %ymm0,%ymm1,%ymm0  ; snippet
<+29>: vmovdqu 0x10(%rcx),%ymm1  ; unbox
<+34>: vmovdqu 0x10(%r8),%ymm2  ; unbox
<+40>: vpaddd %ymm1,%ymm2,%ymm1  ; snippet
<+44>: vpaddd %ymm0,%ymm1,%ymm0  ; snippet
<+48>: vmovdqu %ymm0,(%rsp)  ; spill
... allocation ...
<+67>: callq _new_instance_Java
<+72>: vmovdqu 0x10(%rbp),%ymm0  ; repeated unbox
<+77>: vmovdqu (%rsp),%ymm1  ; fill
<+82>: vpaddd %ymm1,%ymm0,%ymm0  ; snippet
<+86>: vmovdqu %ymm0,0x10(%rax)  ; box
... nmethod epilogue ...
Hash
h_{i+1} = 31 * h_i + v_i
Vectorized Hash
[Figure: eight lanes per step; accumulator lanes a7..a0 are each multiplied by 31^8, input lanes b7..b0 are multiplied by decreasing powers of 31 down to 31^0, and the two products are added lane-wise]
Vectorized Hash
Long4 vhash8(Long4 acc, long ch8) {
  acc = mullo_epi32(acc, pow88);
  Long4 cv8 = load_v8qi_to_v8si(ch8);
  cv8 = mullo_epi32(cv8, pow8);
  acc = add_epi32(acc, cv8);
  return acc;
}
int vectorized_hash(byte[] buf, int off, int len) {
  Long4 acc = Long4.ZERO;
  for (; len >= 8; off += 8, len -= 8) {
    long v = U.getLong(buf, Unsafe.ARRAY_BYTE_BASE_OFFSET + off);
    acc = vhash8(acc, v);
  }
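The arithmetic behind vhash8 can be checked with a scalar emulation: keep eight partial sums, scale each by 31^8 per block, add each new byte weighted by 31^7 down to 31^0, and sum the lanes at the end. The sketch below assumes an initial hash of 0, signed bytes, and a scalar loop for the tail; the final lane reduction and tail handling are not shown on the slide, so they are assumptions of this example, as is the class name VectorHashSketch.

```java
class VectorHashSketch {
    static final int[] POW8 = new int[8]; // 31^7, 31^6, ..., 31^0 for lanes 0..7
    static final int POW88;               // 31^8, applied to every accumulator lane

    static {
        int p = 1;
        for (int lane = 7; lane >= 0; lane--) {
            POW8[lane] = p;
            p *= 31;
        }
        POW88 = p;
    }

    // Reference: h = 31 * h + v, one byte at a time, starting from 0.
    static int scalarHash(byte[] buf) {
        int h = 0;
        for (byte b : buf) h = 31 * h + b;
        return h;
    }

    // Eight-lane formulation mirroring vhash8, emulated with an int[8] accumulator.
    static int lanewiseHash(byte[] buf) {
        int[] acc = new int[8];
        int off = 0;
        for (; buf.length - off >= 8; off += 8) {
            for (int lane = 0; lane < 8; lane++) {
                acc[lane] = acc[lane] * POW88 + buf[off + lane] * POW8[lane];
            }
        }
        int h = 0;
        for (int a : acc) h += a;          // horizontal sum of the lanes
        for (; off < buf.length; off++) {  // remaining bytes, scalar
            h = 31 * h + buf[off];
        }
        return h;
    }

    public static void main(String[] args) {
        byte[] data = "machine code snippets".getBytes();
        System.out.println(scalarHash(data) == lanewiseHash(data));
    }
}
```

All arithmetic is modulo 2^32 in both versions, so the two functions agree for every input length.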
Long4 acc = Long4.ZERO;
for (; len >= 8; off += 8, len -= 8) {
  long v = U.getLong(...);
  acc = vhash8(acc, v);
}
Escape Analysis in C2
Limitations
“The test case shows limitation of the current EA implementation. Objects will not be eliminated if there is merge point in which it is undefined which object is referenced.”
JDK-6853701
“[The] address may point to more then one object. This may produce the false positive result (set not scalar replaceable) since the flow-insensitive escape analysis can't separate the case when stores overwrite the field's value from the case when stores happened on different control branches.”
hotspot/src/share/vm/opto/escape.cpp#l1733
Escape Analysis in C2
§ Phi node blocks allocation elimination
§ No vector load hoisting out of a loop
§ No constant folding of vector values
§ Repeated unboxing
§ Box allocation placement
Observations
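The first observation corresponds to a common source-level pattern: a boxed accumulator that is reassigned inside a loop, so that at the loop header the reference may point either to the initial box or to the one created in the previous iteration. The sketch below contrasts that with a straight-line case; Box and the method names are hypothetical, and whether a given JVM actually removes either allocation depends on its version, inlining decisions and flags, so the comments describe the pattern rather than a guaranteed outcome.

```java
class EscapeAnalysisSketch {
    // Small immutable box, standing in for a vector box such as Long4.
    static final class Box {
        final int v;
        Box(int v) { this.v = v; }
    }

    static Box add(Box a, Box b) {
        return new Box(a.v + b.v);
    }

    // Straight-line use: every box has a single definition and never leaves the method.
    static int straightLine(int x, int y) {
        Box t = add(new Box(x), new Box(y));
        return t.v;
    }

    // Loop-carried use: "acc" is a merge point between the initial box
    // and the box produced by the previous iteration.
    static int loopCarried(int[] data) {
        Box acc = new Box(0);
        for (int d : data) {
            acc = add(acc, new Box(d));
        }
        return acc.v;
    }

    public static void main(String[] args) {
        System.out.println(straightLine(2, 3));
        System.out.println(loopCarried(new int[] {1, 2, 3, 4}));
    }
}
```

The vectorized hash loop has the same shape as loopCarried, with Long4 in place of Box.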
Summary
Current Status
§ Almost feature complete
§ C2 support on Linux/Solaris/Mac x86-64
§ Extensively used for Vector API experiments
§ Early adopters (thanks!)
  – Paul Sandoz (Oracle)
  – Ian Graves (Intel)
Remaining Work
§ Feature work
  – temporary registers
  – instruction alignment
  – multiple return values
  – improve error diagnostics
§ Support non-x86 ISAs
  – SPARC (VIS), ARM64 (NEON)
§ Support in other JIT-compilers (Graal, C1)
  – implement snippet embedding
§ User-friendly API
§ EA enhancements
Multiple Return Values cpuid
“CPUID can be executed at any privilege level to serialize instruction execution. Serializing instruction execution guarantees that any modifications to flags, registers, and memory for previous instructions are completed before the next instruction is fetched and executed.”
Multiple Return Values
cpuid
static final MethodHandle MHcpuid = MachineCodeSnippet.builder("cpuid")
    .effects(CONTROL, READ_MEMORY, WRITE_MEMORY)
    .argument(int.class, rsi)
    .argument(int.class, rdx)
    .returns(Long2.class, xmm0)  // MT (II)L2
    .kills(rax, rbx, rcx, rdx)
    .code(0x8B, 0xC6,                          // mov %esi,%eax
          0x8B, 0xCA,                          // mov %edx,%ecx
          0x0F, 0xA2,                          // cpuid
          0x66, 0x0F, 0x3A, 0x22, 0xC0, 0x00,  // pinsrd $0x0,%eax,%xmm0
          0x66, 0x0F, 0x3A, 0x22, 0xC3, 0x01,  // pinsrd $0x1,%ebx,%xmm0
          0x66, 0x0F, 0x3A, 0x22, 0xC1, 0x02,  // pinsrd $0x2,%ecx,%xmm0
          0x66, 0x0F, 0x3A, 0x22, 0xC2, 0x03)  // pinsrd $0x3,%edx,%xmm0
    .make();
Multiple Return Values
cpuid(0x0,0x0).ecx
# parm0: rsi = 'I'
# parm1: rdx = 'I'
mov %esi,%eax
mov %edx,%ecx
cpuid
pinsrd $0x0,%eax,%xmm0
pinsrd $0x1,%ebx,%xmm0
pinsrd $0x2,%ecx,%xmm0
pinsrd $0x3,%edx,%xmm0
pextrd %eax,%xmm0,$0x2
retq

# parm0: rsi = 'I'
# parm1: rdx = 'I'
mov %esi,%eax
mov %edx,%ecx
cpuid
mov %ecx,%eax
retq
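The packing of four 32-bit results into one 128-bit value, and the later extraction of a single lane, can be emulated in plain Java to make the lane numbering explicit. The sketch below models the 128-bit register as two longs (low half first); LanePackSketch and its methods are hypothetical helpers, and the lane order follows the pinsrd immediates in the listing (0 = eax, 1 = ebx, 2 = ecx, 3 = edx).

```java
class LanePackSketch {
    /** Emulates the four pinsrd instructions: returns {low 64 bits, high 64 bits}. */
    static long[] pack(int eax, int ebx, int ecx, int edx) {
        long lo = (eax & 0xFFFFFFFFL) | ((long) ebx << 32); // lanes 0 and 1
        long hi = (ecx & 0xFFFFFFFFL) | ((long) edx << 32); // lanes 2 and 3
        return new long[] {lo, hi};
    }

    /** Emulates pextrd: returns 32-bit lane 0..3 of the packed value. */
    static int extract(long[] v, int lane) {
        if (lane < 0 || lane > 3) {
            throw new IndexOutOfBoundsException("lane: " + lane);
        }
        long half = v[lane >>> 1];                    // lanes 0,1 in v[0]; lanes 2,3 in v[1]
        return (int) (half >>> ((lane & 1) * 32));    // odd lanes sit in the upper 32 bits
    }

    public static void main(String[] args) {
        long[] packed = pack(0x11, 0x22, 0x33, 0x44);
        System.out.println(Integer.toHexString(extract(packed, 2)));
    }
}
```

Extracting lane 2 right after packing returns the ecx value unchanged, which is the redundancy the second listing removes.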
Summary
§ Evolution of machine code snippet prototype
  – raw code, vectors, code “recipes”, effects
§ Vector values
  – should be value classes (w/ “heisenboxes”)
    § identity-less
    § aggressive boxing/unboxing
  – EA is interim solution
    § limitations of EA implementation in C2
Links
§ Project Panama repo: http://hg.openjdk.java.net/panama/panama/
§ Machine Code Snippets API:
  – jdk.vm.ci/vm/ci/panama/MachineCodeSnippet.java
§ Samples
  – http://hg.openjdk.java.net/panama/panama/jdk/file/tip/test/panama/snippets
Thank you!
@iwan0www