1 Copyright © 2016, Oracle and/or its affiliates. All rights reserved

Machine Code Snippets in Java

Vladimir Ivanov, HotSpot JVM Compiler, Oracle Corp. JVM Language Summit 2016

Safe Harbor Statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

0x11529c8c0: mov %eax,-0x16000(%rsp)
0x11529c8c7: push %rbp
0x11529c8c8: sub $0x20,%rsp
0x11529c8cc: mov %rdx,(%rsp)
0x11529c8d0: mov %rsi,%rbp
0x11529c8d3: movabs $0x7c0013d10,%rsi
0x11529c8dd: nop
0x11529c8de: nop
0x11529c8df: nop
0x11529c8e0: vzeroupper
0x11529c8e3: callq 0x00000001152418a0
0x11529c8e8: mov %rax,%rbx
0x11529c8eb: mov (%rsp),%r10
0x11529c8ef: vmovdqu 0x10(%r10),%ymm1
0x11529c8f5: vmovdqu 0x10(%rbp),%ymm0
0x11529c8fa: vpaddd %ymm0,%ymm1,%ymm0
0x11529c8fe: vmovdqu %ymm0,0x10(%rbx)
0x11529c903: mov %rbx,%rax
0x11529c906: vzeroupper
0x11529c909: add $0x20,%rsp
0x11529c90d: pop %rbp
0x11529c90e: test %eax,-0xb786914(%rip)
0x11529c914: retq

0x11529d240: mov %eax,-0x16000(%rsp)
0x11529d247: push %rbp
0x11529d248: sub $0x30,%rsp
0x11529d24c: mov %rcx,%rbp
0x11529d24f: vmovdqu 0x10(%rsi),%ymm0
0x11529d254: vmovdqu 0x10(%rdx),%ymm1
0x11529d259: vpaddd %ymm0,%ymm1,%ymm0
0x11529d25d: vmovdqu %ymm0,(%rsp)
0x11529d262: movabs $0x7c0013d10,%rsi
0x11529d26c: vzeroupper
0x11529d26f: callq 0x00000001152418a0
0x11529d274: mov %rax,%rbx
0x11529d277: vmovdqu 0x10(%rbp),%ymm1
0x11529d27c: vmovdqu (%rsp),%ymm0
0x11529d281: vpaddd %ymm0,%ymm1,%ymm0
0x11529d285: vmovdqu %ymm0,0x10(%rbx)
0x11529d28a: mov %rbx,%rax
0x11529d28d: vzeroupper
0x11529d290: add $0x30,%rsp
0x11529d294: pop %rbp
0x11529d295: test %eax,-0xb78729b(%rip)
0x11529d29b: retq

x86 Assembly

The Plan

§ Background

§ Machine Code Snippets –  the concept & its evolution

§ Vectors –  box elimination, C2 optimizations, GC

Vector ISA Extensions

§ 100s of vector instructions on x86
§ Intel intrinsic instructions
  – MMX: ~120
  – SSE: ~130
  – SSE2/3/SSSE3/4.1/4.2: ~260
  – AVX/AVX2: ~380

Vector ISA Extensions

§ 1000s of vector instructions on x86
§ Intel intrinsic instructions
  – MMX: ~120
  – SSE: ~130
  – SSE2/3/SSSE3/4.1/4.2: ~260
  – AVX/AVX2: ~380
  – AVX-512: ~3800

Motivation

§ Vector API
  – expose data-parallel operations through a cross-platform API
§ How to bind to particular machine instructions in the implementation?
§ Existing solutions
  – JVM intrinsics
  – JNI / NativeMethodHandles (in Project Panama)

JVM Intrinsics

“A method is intrinsified if the HotSpot VM replaces the annotated method with hand-written assembly and/or hand-written compiler IR -- a compiler intrinsic -- to improve performance.”

@HotSpotIntrinsicCandidate JavaDoc

public final class java.lang.Class<T> implements … {
    @HotSpotIntrinsicCandidate
    public native boolean isInstance(Object obj);

JNI @since 1.1

JNI

class Lib { static native void m(); }

void JNICALL Java_Lib_m(JNIEnv* env, jclass c) { m(); }

Usage scenario

Native Method Handles

MethodType mt = MethodType.methodType(void.class);
MethodHandle mh = MethodHandles.lookup().findNative("m", mt);
mh.invokeExact();

Project Panama
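The lookup-then-invokeExact shape above can be tried on a stock JDK. The following is only a sketch under stated assumptions: `findNative` exists in the Panama prototype, not in the standard API, so the standard `findStatic` is swapped in, and the class `MethodHandleSketch` with its counter and its Java method `m` are hypothetical stand-ins for the native function.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class MethodHandleSketch {
    // Hypothetical stand-in for the native function m(): it only counts calls.
    static int calls = 0;

    static void m() {
        calls++;
    }

    // Looks up m() with the same ()void method type as on the slide.
    static MethodHandle lookupM() throws ReflectiveOperationException {
        MethodType mt = MethodType.methodType(void.class);
        // Standard JDK call; the Panama prototype would use findNative("m", mt) here.
        return MethodHandles.lookup().findStatic(MethodHandleSketch.class, "m", mt);
    }

    // Invokes m() through the handle once and returns how many calls were observed.
    static int callViaHandle() {
        int before = calls;
        try {
            MethodHandle mh = lookupM();
            mh.invokeExact(); // call site type must match exactly: ()void
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
        return calls - before;
    }

    public static void main(String[] args) {
        System.out.println("calls through the handle: " + callViaHandle());
    }
}
```

The exact-type requirement of `invokeExact` is what lets the JIT treat the call site as statically typed, which is the property the native variant relies on as well.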

Native Method Handles Project Panama

                     Java                         Native
Construction         Lookup.findVirtual() et al   Lookup.findNative()
Reference (typed)    DirectMethodHandle           NativeMethodHandle
Reference (direct)   MemberName                   NativeEntryPoint
Linker               MH.linkToVirtual() et al     MH.linkToNative()
Invocation           indy, MH.invoke(), MH.invokeExact()

“Making native calls from the JVM” by John Rose http://cr.openjdk.java.net/~jrose/panama/native-call-primitive.html

Native Method Handles Project Panama

callq 0x1057b2eb0 ; native method entry

getpid
  JNI           13.7 ± 0.5 ns
  Direct call    3.4 ± 0.2 ns

Native code vs JVM Intrinsics

§ Native method
  + arbitrary native code
  - too much ceremony
  - opaque to the JVM
§ JVM Intrinsics
  + powerful, lightweight, and flexible
  - high development costs

Machine Code Snippets

Machine Code Snippets

New breed:

NativeMethodHandle + JVM intrinsic

Idea (1st iteration)

Wrap raw machine code in a method handle

The Idea

Machine Code Snippets

§ Use case: prototyping
  1. minimize implementation costs
  2. decent performance
  3. up to a dozen instructions in size
§ Existing solutions
  – JVM intrinsics: 1. no / 2. yes / 3. yes
  – JNI / NMH: 1. yes / 2. no / 3. yes

Motivation / Goals

Vectorized Memory Copy

vmovdqu mem, reg   // 256-bit load
vmovdqu reg, mem   // 256-bit store

Vectorized Memory Copy Machine Code as a Method Handle

mov256MH.invokeExact(src, off1, dst, off2);

C4 E1 7E 6F 04 37   C4 E1 7E 7F 04 0A

MH(LJLJ)V

Vectorized Memory Copy Machine Code as a Method Handle

vmovdqu (?,?,1),%ymm0
vmovdqu %ymm0,(?,?,1)

MH(LJLJ)V
mov256MH.invokeExact(src, off1, dst, off2);

Vectorized Memory Copy Machine Code as a Method Handle

vmovdqu (%rdi,%rsi,1),%ymm0
vmovdqu %ymm0,(%rdx,%rcx,1)

/* (rdi, rsi, rdx, rcx) */

mov256MH.invokeExact(src, off1, dst, off2);

MH(LJLJ)V

Machine Code Snippet

§ 2 execution modes
  – optimized:
    § embedded in generated code
  – non-optimized, interpreted
    § invokes stand-alone version

[Figure labels: Java, Native; Safe wrapper, Unsafe wrapper, MethodHandle; Stand-alone, Embedded; User-defined, produced by j.l.i]

Machine Code Snippet

§ matches native ABI
  – same machine code in all execution modes
§ System V AMD64 ABI: “Function Calling Sequence”
  – first 6 integer arguments: RDI, RSI, RDX, RCX, R8, R9
  – first 8 FP arguments: XMM0, …, XMM7
  – return registers: RAX/RDX (integer), XMM0/XMM1 (FP)
  – …

Calling Convention
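The register assignment listed above can be sketched as a small classifier. This is a deliberately simplified model, not the full ABI algorithm: it assumes every argument is either a scalar integer-class value (ints, longs, object references) or a float/double, and it ignores aggregates and varargs. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SysVArgRegisters {
    // Integer-class argument registers, in System V AMD64 order.
    static final String[] INT_REGS = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};
    // Floating-point argument registers.
    static final String[] FP_REGS =
            {"xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7"};

    // Returns the location of each argument: a register name or "stack".
    static List<String> assign(Class<?>... argTypes) {
        List<String> locations = new ArrayList<>();
        int nextInt = 0;
        int nextFp = 0;
        for (Class<?> type : argTypes) {
            boolean isFp = (type == float.class || type == double.class);
            if (isFp) {
                locations.add(nextFp < FP_REGS.length ? FP_REGS[nextFp++] : "stack");
            } else {
                // Simplification: everything else is treated as integer class.
                locations.add(nextInt < INT_REGS.length ? INT_REGS[nextInt++] : "stack");
            }
        }
        return locations;
    }

    public static void main(String[] args) {
        // The (Object, long, Object, long) shape used by the memory-copy snippet.
        System.out.println(assign(Object.class, long.class, Object.class, long.class));
    }
}
```

Integer and FP arguments draw from independent register sequences, so a mixed signature such as `(long, double, long)` would use rdi, xmm0, rsi in this model.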

Native Method Handle

package java.lang.invoke;

/*non-public*/ class NativeMethodHandle extends MethodHandle {
    final NativeEntryPoint nativeFunc;

/*non-public*/ class NativeEntryPoint {
    final long addr;
    final MethodType type;

/*non-public*/ class MachineCodeSnippet extends NativeEntryPoint {
    final byte[] code;

Machine Code Snippet

public static MethodHandle make(String name,
                                MethodType mt,
                                boolean isSupported,
                                byte... machineCode) {...}

How To Use

Vectorized Memory Copy

static final MethodHandle mov256MH = MachineCodeSnippet.make(
    "copy256",
    MethodType.methodType(void.class,                                // return type
                          Object.class /*rdi*/, long.class /*rsi*/,  // src
                          Object.class /*rdx*/, long.class /*rcx*/), // dst
    requires(AVX),
    0xC4, 0xE1, 0x7E, 0x6F, 0x04, 0x37,   // vmovdqu (%rdi,%rsi,1),%ymm0
    0xC4, 0xE1, 0x7E, 0x7F, 0x04, 0x0A);  // vmovdqu %ymm0,(%rdx,%rcx,1)

MethodHandle
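To see how the twelve bytes above map to the two `vmovdqu` instructions in the comments, here is a minimal decoder sketch. It is an illustration only and handles just this one shape: a 3-byte VEX prefix, opcode 6F or 7F in the 0F map with the F3 prefix, and a ModRM byte followed by a SIB byte with no displacement. The class `VexSnippetDecoder` is hypothetical and not part of any JDK.

```java
public class VexSnippetDecoder {
    // 64-bit general-purpose registers indexed by their 3-bit encoding.
    static final String[] GPR = {"rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"};

    // Decodes C4 <vex1> <vex2> <opcode> <modrm> <sib> into AT&T syntax.
    static String decode(int... b) {
        if (b.length != 6 || b[0] != 0xC4) {
            throw new IllegalArgumentException("expected C4, two VEX bytes, opcode, ModRM, SIB");
        }
        int invertedRXB = (b[1] >> 5) & 0x7; // 0b111: no register-number extension
        int opcodeMap = b[1] & 0x1F;         // 1 selects the 0F opcode map
        int vvvv = (b[2] >> 3) & 0xF;        // 0b1111: no extra source register
        int vectorLen = (b[2] >> 2) & 0x1;   // L: 0 = xmm (128-bit), 1 = ymm (256-bit)
        int prefix = b[2] & 0x3;             // pp: 2 stands for the F3 prefix
        int opcode = b[3];
        if (invertedRXB != 7 || opcodeMap != 1 || vvvv != 0xF || prefix != 2
                || (opcode != 0x6F && opcode != 0x7F)) {
            throw new IllegalArgumentException("only vmovdqu with low registers is handled");
        }
        int mod = (b[4] >> 6) & 0x3;
        int reg = (b[4] >> 3) & 0x7;         // vector register number
        int rm = b[4] & 0x7;                 // 4 means "a SIB byte follows"
        if (mod != 0 || rm != 4) {
            throw new IllegalArgumentException("only (base,index,scale) addressing is handled");
        }
        int scale = 1 << ((b[5] >> 6) & 0x3);
        int index = (b[5] >> 3) & 0x7;
        int base = b[5] & 0x7;
        if (index == 4 || base == 5) {
            throw new IllegalArgumentException("special SIB forms are not handled");
        }
        String vec = "%" + (vectorLen == 1 ? "ymm" : "xmm") + reg;
        String mem = "(%" + GPR[base] + ",%" + GPR[index] + "," + scale + ")";
        // 6F loads from memory into the register, 7F stores the register to memory.
        return opcode == 0x6F ? "vmovdqu " + mem + "," + vec : "vmovdqu " + vec + "," + mem;
    }

    public static void main(String[] args) {
        System.out.println(decode(0xC4, 0xE1, 0x7E, 0x6F, 0x04, 0x37));
        System.out.println(decode(0xC4, 0xE1, 0x7E, 0x7F, 0x04, 0x0A));
    }
}
```

The SIB byte is where the calling convention shows up: 0x37 encodes base rdi with index rsi, and 0x0A encodes base rdx with index rcx, which are exactly the registers holding (src, off1) and (dst, off2).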

Stand-alone version

Decoding code snippet "move256" @ 0x10f05d6a0
<+0>:  push %rbp
<+1>:  mov %rsp,%rbp
<+4>:  vmovdqu (%rdi,%rsi,1),%ymm0 ; c4e17e6f0437
<+10>: vmovdqu %ymm0,(%rdx,%rcx,1) ; c4e17e7f040a
<+16>: leaveq
<+17>: retq

$ java … -XX:+PrintCodeSnippets ...

# parm0: rdi:rdi = 'java/lang/Object'
# parm1: rsi:rsi = long
# parm2: rdx:rdx = 'java/lang/Object'
# parm3: rcx:rcx = long

Vectorized Memory Copy

static final MethodHandle mov256MH = ...;

// Unsafe wrapper
static void move256(Object src, long off1, Object dst, long off2) {
    try {
        mov256MH.invokeExact(src, off1, dst, off2);
    } catch (Throwable e) {
        throw new Error(e);
    }
}

Vectorized Memory Copy

4159  181   MachineCodeSnippetSamples::move256 (27 bytes)
  @ 8   LambdaForm$MH::invokeExact_MT (29 bytes)   force inline by annotation
    @ 11   Invokers::checkExactType (17 bytes)   force inline by annotation
      @ 1   MethodHandle::type (5 bytes)   accessor
    @ 15   Invokers::checkCustomized (23 bytes)   force inline by annotation
      @ 1   MethodHandleImpl::isCompileConstant (2 bytes)   (intrinsic)
    @ 25   LambdaForm$NMH::invokeNative_LJLJ_V (27 bytes)   force inline by ann…
      @ 7   NativeMethodHandle::internalNativeEntryPoint (8 bytes)   force inline…
      @ 23   MethodHandle::linkToNative(LJLJL)V (0 bytes)   direct native call

# {method} 'move256' ...
<+0>:  mov %eax,-0x16000(%rsp)
<+7>:  push %rbp
<+8>:  sub $0x10,%rsp
<+12>: mov %rsi,%rdi
<+15>: mov %rdx,%rsi
<+17>: mov %rcx,%rdx
<+21>: mov %r8,%rcx
<+24>: vmovdqu (%rdi,%rsi,1),%ymm0 ; c4e17e6f0437
<+30>: vmovdqu %ymm0,(%rdx,%rcx,1) ; c4e17e7f040a
<+36>: add $0x10,%rsp
<+40>: pop %rbp
<+41>: test %eax,-0x4d3d58f(%rip)
<+47>: retq

# parm0: rsi:rsi = 'java/lang/Object'
# parm1: rdx:rdx = long
# parm2: rcx:rcx = 'java/lang/Object'
# parm3: r8:r8 = long

Calling Convention

hotspot/src/cpu/x86/vm/sharedRuntime_x86_64.cpp:
“The Java calling convention is a "shifted" version of the C ABI. By skipping the first C ABI register we can call non-static jni methods with small numbers of arguments without having to shuffle the arguments at all. Since we control the java ABI we ought to at least get some advantage out of it.”

Java vs C

arg    1st  2nd  3rd  4th  5th  6th  …
C      RDI  RSI  RDX  RCX  R8   R9   stack
Java   RSI  RDX  RCX  R8   R9   RDI  stack

System V AMD64 ABI
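The table implies a fixed chain of register moves whenever Java-convention arguments are handed to C-convention code. The sketch below derives that chain. It assumes integer-class arguments only and at most five of them, because a sixth would sit in RDI, which the first move overwrites, and handling that cycle needs a temporary. The class name `JavaToCShuffle` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class JavaToCShuffle {
    static final String[] C_REGS = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};
    static final String[] JAVA_REGS = {"rsi", "rdx", "rcx", "r8", "r9", "rdi"};

    // Moves needed to pass the first n Java integer arguments as the first n C arguments.
    static List<String> shuffle(int n) {
        if (n < 0 || n > 5) {
            throw new IllegalArgumentException("this sketch handles 0..5 register arguments");
        }
        List<String> moves = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            // Emitting in argument order is safe: the destination of move i
            // is the register that move i-1 has already read.
            moves.add("mov %" + JAVA_REGS[i] + ",%" + C_REGS[i]);
        }
        return moves;
    }

    public static void main(String[] args) {
        shuffle(4).forEach(System.out::println);
    }
}
```

For four arguments this yields the same four `mov` instructions that precede the embedded snippet in the compiled `move256` listing.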

Native Method Linker MH.linkToNative

# {method} 'linkToNative' '(LJLJL)V'
<+0>:  push %rbp
<+1>:  mov %rsp,%rbp
<+4>:  mov %rsi,%rdi
<+7>:  mov %rdx,%rsi
<+10>: mov %rcx,%rdx
<+13>: mov %r8,%rcx
<+16>: mov %r9,%r8
<+19>: callq *0x10(%r8)
<+23>: leaveq
<+24>: retq

# parm0: rsi:rsi = 'java/lang/Object'
# parm1: rdx:rdx = long
# parm2: rcx:rcx = 'java/lang/Object'
# parm3: r8:r8 = long
# parm4: r9:r9 = '.../NativeEntryPoint'

Vectorized Memory Copy

// Safe wrapper
static void copy256(byte[] src, int idx1,
                    byte[] dst, int idx2) {
    // Array bounds checks: [idx, idx + 32) must lie within each array
    Objects.checkFromIndexSize(idx1, 32, src.length);
    Objects.checkFromIndexSize(idx2, 32, dst.length);
    // Offset computations
    long off1 = Unsafe.ARRAY_BYTE_BASE_OFFSET + idx1;
    long off2 = Unsafe.ARRAY_BYTE_BASE_OFFSET + idx2;
    move256(src, off1, dst, off2); // Unsafe wrapper
}
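The layering of a checked wrapper over an unchecked primitive can be exercised on a stock JDK. In this sketch the snippet-backed `move256` is replaced by a hypothetical stand-in built on `System.arraycopy`, so only the bounds-checking logic is representative. `Objects.checkFromIndexSize` (standard since Java 9) is used because it validates the whole 32-byte range, including negative start indices.

```java
import java.util.Objects;

public class SafeCopySketch {
    static final int VECTOR_BYTES = 32; // one 256-bit vector

    // Hypothetical stand-in for the unsafe wrapper: performs no checks of its own
    // beyond what System.arraycopy does internally.
    static void move256(byte[] src, int idx1, byte[] dst, int idx2) {
        System.arraycopy(src, idx1, dst, idx2, VECTOR_BYTES);
    }

    // Safe wrapper: validates both 32-byte ranges before touching memory.
    static void copy256(byte[] src, int idx1, byte[] dst, int idx2) {
        Objects.checkFromIndexSize(idx1, VECTOR_BYTES, src.length);
        Objects.checkFromIndexSize(idx2, VECTOR_BYTES, dst.length);
        move256(src, idx1, dst, idx2);
    }

    public static void main(String[] args) {
        byte[] src = new byte[64];
        byte[] dst = new byte[64];
        for (int i = 0; i < src.length; i++) {
            src[i] = (byte) i;
        }
        copy256(src, 32, dst, 0); // copies src[32..63] to dst[0..31]
        System.out.println(dst[0] + " .. " + dst[31]);
    }
}
```

Keeping the checks in a separate method from the raw copy mirrors the safe/unsafe split on the slide: the JIT can hoist or eliminate the checks while the primitive stays a fixed instruction sequence.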

Vectors

VPADDD Add Packed Integers

VPADDD Add Packed Integers

vpaddd %ymm1,%ymm0,%ymm0

vector /*ymm0*/ vpaddd(vector v1 /*ymm0*/, vector v2 /*ymm1*/)

MH(??)? /* (ymm0, ymm1) ymm0 */

JVM vs Hardware Impedance Mismatch

size (bits)   8    16   32    64    128    256    512    …
x86 regs      AL   AX   EAX   RAX   XMM0   YMM0   ZMM0   -
JVM           B    S    I     J     -      -      -      …

Vectors

§ java.lang.Long2 / Long4 / Long8 / …
  – represent 128/256/512-bit values
  – value-based class (but should be a value class!)
§ “well-known” to the JVM
  – special treatment in the JVM
  – C2 knows how to map the values to appropriate vector registers

JVM vs Hardware Impedance Mismatch

size (bits)   8    16   32    64    128         256         512         …
x86 regs      AL   AX   EAX   RAX   XMM0        YMM0        ZMM0        -
JVM           B    S    I     J     j.l.Long2   j.l.Long4   j.l.Long8   …

Valhalla JVM vs Hardware

size (bits)   8    16   32    64    128         256         512         …
x86 regs      AL   AX   EAX   RAX   XMM0        YMM0        ZMM0        -
JVM           B    S    I     J     j.l.Long2   j.l.Long4   j.l.Long8   …

VPADDD Add Packed Integers

vpaddd %ymm1,%ymm0,%ymm0

Long4 /*rax*/ vpaddd(Long4 v1 /*rsi*/, Long4 v2 /*rdx*/)

MH(L4L4)L4 /* (rdi, rsi) rax */

VPADDD Add Packed Integers

vmovdqu 0x10(%rdi),%ymm0 ; unbox
vmovdqu 0x10(%rsi),%ymm1 ; unbox
vpaddd %ymm1,%ymm0,%ymm0
vmovdqu %ymm0,0x10(%rax) ; box

Long4 /*rax*/ vpaddd(Long4 v1 /*rsi*/, Long4 v2 /*rdx*/)

MH(L4L4)L4 /* (rdi, rsi) rax */

VPADDD Add Packed Integers

vmovdqu 0x10(%rdi),%ymm0
vmovdqu 0x10(%rsi),%ymm1
vpaddd %ymm1,%ymm0,%ymm0
vmovdqu %ymm0,0x10(%rax)

Long4 /*rax*/ vpaddd(Long4 v1 /*rsi*/, Long4 v2 /*rdx*/)

MH(L4L4)L4 /* (rdi, rsi) rax */

VPADDD Add Packed Integers

vmovdqu 0x10(%rsi),%ymm0
vmovdqu 0x10(%rdx),%ymm1
vpaddd %ymm1,%ymm0,%ymm0
mov %rdi,%rax
vmovdqu %ymm0,0x10(%rax)

Long4 /*rax*/ vpaddd(Long4 box /*rsi*/,
                     Long4 v1 /*rdx*/, Long4 v2 /*rcx*/)

MH(L4L4L4)L4 /* (rdi, rsi, rdx) rax */

VPADDD Add Packed Integers

Long4 /*rax*/ vpaddd(Long4 v1 /*rsi*/, Long4 v2 /*rdx*/) {
    Object box = Long4.make();
    return (Long4) MHm256_vpaddd.invokeExact(box, v1, v2);
}

vmovdqu 0x10(%rsi),%ymm0
vmovdqu 0x10(%rdx),%ymm1
vpaddd %ymm1,%ymm0,%ymm0
mov %rdi,%rax
vmovdqu %ymm0,0x10(%rax)

MH(L4L4L4)L4 /* (rdi, rsi, rdx) rax */

VPADDD Add Packed Integers

// findStatic(Long?.class, "make") + collectArguments():
// T adapter(A... a) {
//     Long? b = Long?::make();
//     return target(b, a...);
// }

vmovdqu 0x10(%rsi),%ymm0
vmovdqu 0x10(%rdx),%ymm1
vpaddd %ymm1,%ymm0,%ymm0
mov %rdi,%rax
vmovdqu %ymm0,0x10(%rax)

MH(L4L4L4)L4 /* (rdi, rsi, rdx) rax */

MH(L4L4)L4
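The adapter in the comment above (allocate a box, then pass it as a leading argument) can be built with standard method-handle combinators. The sketch below is an assumption-based stand-in: `int[8]` plays the role of `Long4`, a plain Java loop plays the role of the snippet, and the class `BoxAdapterSketch` is hypothetical. Only `findStatic` and `MethodHandles.collectArguments` are real JDK calls.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class BoxAdapterSketch {
    // Stand-in for Long4.make(): allocates an empty result box with 8 x 32-bit lanes.
    static int[] make() {
        return new int[8];
    }

    // Stand-in for the snippet: writes v1 + v2 lane-wise into box and returns box.
    static int[] addInto(int[] box, int[] v1, int[] v2) {
        for (int i = 0; i < box.length; i++) {
            box[i] = v1[i] + v2[i];
        }
        return box;
    }

    // Builds (v1, v2) -> addInto(make(), v1, v2).
    static MethodHandle adapter() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle target = lookup.findStatic(BoxAdapterSketch.class, "addInto",
                    MethodType.methodType(int[].class, int[].class, int[].class, int[].class));
            MethodHandle maker = lookup.findStatic(BoxAdapterSketch.class, "make",
                    MethodType.methodType(int[].class));
            // The zero-argument maker supplies argument 0, so the adapter takes one fewer.
            return MethodHandles.collectArguments(target, 0, maker);
        } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
        }
    }

    static int[] add(int[] v1, int[] v2) {
        try {
            return (int[]) adapter().invokeExact(v1, v2);
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }

    public static void main(String[] args) {
        int[] sum = add(new int[]{1, 2, 3, 4, 5, 6, 7, 8},
                        new int[]{10, 20, 30, 40, 50, 60, 70, 80});
        System.out.println(java.util.Arrays.toString(sum));
    }
}
```

Because the allocation happens in Java code rather than inside the machine code, the JIT sees an ordinary allocation that escape analysis can reason about, which is the point of moving the box out of the snippet.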

Stand-alone version

Decoding code snippet "m256_vpaddd" @ 0x10f06dd20
<+0>:  push %rbp
<+1>:  mov %rsp,%rbp
<+4>:  vmovdqu 0x10(%rsi),%ymm0
<+9>:  vmovdqu 0x10(%rdx),%ymm1
<+14>: vpaddd %ymm1,%ymm0,%ymm0
<+18>: mov %rdi,%rax
<+21>: vmovdqu %ymm0,0x10(%rax)
<+26>: leaveq
<+27>: retq

$ java … -XX:+PrintCodeSnippets ...

# parm0: rdi:rdi = 'java/lang/Object'
# parm1: rsi:rsi = 'java/lang/Long4'
# parm2: rdx:rdx = 'java/lang/Long4'

# {method} 'vpaddd' '(L4L4)L4'
<+0>:  ...preamble...
<+12>: mov %rdx,(%rsp)
<+16>: mov %rsi,%rbp
...
<+35>: callq _new_instance_Java
<+40>: mov %rax,%rbx
<+43>: mov (%rsp),%r10
<+47>: vmovdqu 0x10(%r10),%ymm1
<+53>: vmovdqu 0x10(%rbp),%ymm0
<+58>: vpaddd %ymm0,%ymm1,%ymm0
<+62>: vmovdqu %ymm0,0x10(%rbx)
<+67>: mov %rbx,%rax
<+70>: ...prologue...

# parm0: rsi:rsi = 'java/lang/Long4'
# parm1: rdx:rdx = 'java/lang/Long4'

VPADDD

public static Long4 testVAdd4(Long4 v1, Long4 v2,
                              Long4 v3, Long4 v4) {
    Long4 t1 = vpaddd(v1, v2);
    Long4 t2 = vpaddd(v3, v4);
    Long4 t3 = vpaddd(t1, t2);
    return vpaddd(t3, v1);
}

Nested Sum

[Figure: expression tree for ((v1 + v2) + (v3 + v4)) + v1]

...preamble...
<+12>:  mov %r8,%r13
<+15>:  mov %rcx,%rbx
<+18>:  mov %rsi,%rbp
<+21>:  vmovdqu 0x10(%rsi),%ymm0 ; unbox
<+26>:  vmovdqu 0x10(%rdx),%ymm1 ; unbox
<+31>:  vpaddd %ymm0,%ymm1,%ymm0 ; snippet
<+35>:  vmovdqu %ymm0,(%rsp)
<+40>:  vmovdqu 0x10(%rbx),%ymm0 ; unbox
<+45>:  vmovdqu 0x10(%r13),%ymm1 ; unbox
<+51>:  vpaddd %ymm0,%ymm1,%ymm0 ; snippet
<+55>:  vmovdqu %ymm0,%ymm1
<+59>:  vmovdqu (%rsp),%ymm0
<+64>:  vpaddd %ymm0,%ymm1,%ymm0 ; snippet
<+68>:  vmovdqu %ymm0,(%rsp)
...object allocation...
<+87>:  callq _new_instance_Java
<+92>:  mov %rax,%rbx
<+95>:  vmovdqu 0x10(%rbp),%ymm1 ; unbox
<+100>: vmovdqu (%rsp),%ymm0
<+105>: vpaddd %ymm0,%ymm1,%ymm0 ; snippet
<+109>: vmovdqu %ymm0,0x10(%rbx) ; box
<+114>: mov %rbx,%rax
...prologue...
<+131>: retq

0x112ea1640:
# {method} 'testVAdd4' ...
# parm0: rsi:rsi = 'java/lang/Long4'
# parm1: rdx:rdx = 'java/lang/Long4'
# parm2: rcx:rcx = 'java/lang/Long4'
# parm3: r8:r8 = 'java/lang/Long4'

Register Allocator-aware Snippets

RA-aware Snippets

Let user know about RA decisions!

Idea (2nd iteration):

Use snippet “recipe” instead of raw machine code.

Register Masks

vpaddd _, _, _

Long4 vpaddd(Long4 v1, Long4 v2)

MH(L4L4)L4

([%ymm0-15], [%ymm0-15]) [%ymm0-15]

JVMCI

JVMCI

package jdk.vm.ci.amd64;

/** Represents the AMD64 architecture. */
public class AMD64 extends Architecture {
    // General purpose CPU registers
    public static final Register rax = new Register(0, 0, "rax", CPU);
    public static final Register rcx = new Register(1, 1, "rcx", CPU);
    public static final Register rdx = new Register(2, 2, "rdx", CPU);
    public static final Register rbx = new Register(3, 3, "rbx", CPU);
    public static final Register rsp = new Register(4, 4, "rsp", CPU);

...

/** Represents a target machine register. */
public final class Register implements Comparable<Register> {

Platform Definitions

RA-aware Snippets

package jdk.vm.ci.panama;

public class MachineCodeSnippet {
    public static MethodHandle make(String name,
                                    MethodType mt,
                                    boolean isSupported,
                                    Register[][] regMasks,
                                    SnippetGenerator codeProducer) {...}

    @FunctionalInterface
    public interface SnippetGenerator {
        int[] getCode(Register... regs);
    }

Register Masks

vpaddd _, _, _

Long4 vpaddd(Long4 v1, Long4 v2)

MH(L4L4)L4
([%ymm0-15], [%ymm0-15]) [%ymm0-15]

VPADDD

static final MethodHandle MHm256_vpaddd = MachineCodeSnippet.make(
    "mm256_vpaddd",
    MethodType.methodType(Long4.class, Long4.class, Long4.class),
    requires(AVX2),
    new Register[][]{xmmRegs, xmmRegs, xmmRegs},
    (Register[] regs) -> {
        Register out = regs[0], in1 = regs[1], in2 = regs[2];
        int vex2 = vex2(/*R*/ 1, in2.encoding(), /*L*/ 1, /*pp*/ 1);
        return new int[]{0xC5, vex2, 0xFE, modRM(out, in1)};
    });
}

RA-Aware Version
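The recipe above produces four bytes once the register allocator has picked registers. The sketch below shows one way such an encoder could look for `vpaddd` on 256-bit registers with a 2-byte VEX prefix. The helper names are hypothetical and are not the `vex2`/`modRM` helpers from the slide, whose exact argument conventions are not shown. It is restricted to a second source operand in ymm0 to ymm7, since higher numbers there would need the 3-byte VEX form.

```java
public class Vex2Sketch {
    // Second byte of a 2-byte VEX prefix: inverted R, inverted vvvv, L, pp.
    static int vex2Payload(int dst, int src1, int vectorLen, int prefix) {
        int invertedR = (dst < 8) ? 1 : 0;   // 0 selects registers 8..15 for ModRM.reg
        int invertedVvvv = ~src1 & 0xF;      // first source register, stored inverted
        return (invertedR << 7) | (invertedVvvv << 3) | (vectorLen << 2) | prefix;
    }

    // Register-direct ModRM byte: mod = 11, reg = dst, rm = src2.
    static int modRM(int dst, int src2) {
        return 0xC0 | ((dst & 7) << 3) | (src2 & 7);
    }

    // Encodes "vpaddd dst, src1, src2" (Intel operand order) on ymm registers.
    static int[] encodeVpaddd(int dst, int src1, int src2) {
        if (dst < 0 || dst > 15 || src1 < 0 || src1 > 15 || src2 < 0 || src2 > 7) {
            throw new IllegalArgumentException("register out of range for the 2-byte VEX form");
        }
        int payload = vex2Payload(dst, src1, /*L: 256-bit*/ 1, /*pp: 66 prefix*/ 1);
        return new int[]{0xC5, payload, 0xFE, modRM(dst, src2)};
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int b : encodeVpaddd(0, 0, 1)) {
            sb.append(String.format("%02X ", b));
        }
        // In AT&T syntax this is: vpaddd %ymm1,%ymm0,%ymm0
        System.out.println(sb.toString().trim());
    }
}
```

Only the VEX payload and the ModRM byte depend on the allocated registers; the prefix byte 0xC5 and the opcode 0xFE stay fixed, which is why a small generator function is enough to make the snippet register-allocator aware.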

...preamble...
<+12>:  mov %r8,%r13 ; ???
<+15>:  mov %rcx,%rbx ; ???
<+18>:  mov %rsi,%rbp ; ???
<+21>:  vmovdqu 0x10(%rsi),%ymm0 ; unbox
<+26>:  vmovdqu 0x10(%rdx),%ymm1 ; unbox
<+31>:  vpaddd %ymm0,%ymm1,%ymm0 ; snippet
<+35>:  vmovdqu 0x10(%rbx),%ymm1 ; unbox
<+40>:  vmovdqu 0x10(%r13),%ymm2 ; unbox
<+46>:  vpaddd %ymm1,%ymm2,%ymm1 ; snippet
<+50>:  vpaddd %ymm0,%ymm1,%ymm0 ; snippet
<+54>:  vmovdqu %ymm0,(%rsp) ; spill
...object allocation...
<+75>:  callq _new_instance_Java
<+80>:  mov %rax,%rbx ; ???
<+83>:  vmovdqu 0x10(%rbp),%ymm0 ; unbox
<+88>:  vmovdqu (%rsp),%ymm1 ; fill
<+93>:  vpaddd %ymm1,%ymm0,%ymm0 ; snippet
<+97>:  vmovdqu %ymm0,0x10(%rbx) ; box
<+102>: mov %rbx,%rax ; ???
...prologue...
<+119>: retq

0x112ea1640:
# {method} 'testVAdd4' ...
# parm0: rsi:rsi = 'java/lang/Long4'
# parm1: rdx:rdx = 'java/lang/Long4'
# parm2: rcx:rcx = 'java/lang/Long4'
# parm3: r8:r8 = 'java/lang/Long4'

Effects

• Memory
• Killed registers

Register Preservation

vpaddd _, _, _

Long4 vpaddd(Long4 v1, Long4 v2)

MT(L4L4)L4
([ymm0-15], [ymm0-15]) [ymm0-15]

/* no memory effects */
/* KILL: [%rax, %rcx, %rdx, …, %xmm0, ..., %xmm15] */

Calling Convention

Register Preservation Explicit KILLs

vpaddd _, _, _

Long4 vpaddd(Long4 v1, Long4 v2)

MT(L4L4)L4
([ymm0-15], [ymm0-15]) [ymm0-15]

/* no memory effects */
KILL: [/* empty */]

...nmethod preamble...
<+12>: mov %rsi,%rbp ; reg-to-reg move
<+15>: vmovdqu 0x10(%rsi),%ymm0 ; unbox
<+20>: vmovdqu 0x10(%rdx),%ymm1 ; unbox
<+25>: vpaddd %ymm0,%ymm1,%ymm0 ; snippet
<+29>: vmovdqu 0x10(%rcx),%ymm1 ; unbox
<+34>: vmovdqu 0x10(%r8),%ymm2 ; unbox
<+40>: vpaddd %ymm1,%ymm2,%ymm1 ; snippet
<+44>: vpaddd %ymm0,%ymm1,%ymm0 ; snippet
<+48>: vmovdqu %ymm0,(%rsp) ; spill
...allocation...
<+67>: callq _new_instance_Java
<+72>: vmovdqu 0x10(%rbp),%ymm0 ; unbox
<+77>: vmovdqu (%rsp),%ymm1 ; fill
<+82>: vpaddd %ymm1,%ymm0,%ymm0 ; snippet
<+86>: vmovdqu %ymm0,0x10(%rax) ; box
...nmethod prologue...

# parm0: rsi:rsi = 'java/lang/Long4'
# parm1: rdx:rdx = 'java/lang/Long4'

# {method} 'vpaddd' ...

Vectors and Box Elimination

package java.lang;

// 128-bit vector.
public /*value*/ final class Long2 {
    private final long l1, l2; // FIXME

    private Long2() { throw new UnsupportedOperationException(); }

    @HotSpotIntrinsicCandidate
    public static native Long2 make(long lo, long hi);

    @HotSpotIntrinsicCandidate
    public native long extract(int i);

    @HotSpotIntrinsicCandidate
    public boolean equals(Long2 v) { ... }
    ...

Vector Box Layout

[Figure: j.l.Long2 object layout. Labels: mark at offset 0, klass at +8, l1 at +12, l2 at +20; boundary markers at +16 and +32.]

Vector Box Layout

[Figure: the same j.l.Long2 layout, with vmovdqu accessing the vector payload.]

Vector Box Layout: Endianness

[Figure: the same layout accessed with vmovdqu, with LSB/MSB annotations shown in both orders.]

Vector Box Layout: Endianness

[Figure: j.l.Long2 layout without the l1/l2 field split, accessed with vmovdqu, with LSB/MSB annotations shown in both orders.]

Vector Box Layout: Alignment

[Figure: j.l.Long2 layout with mark, klass, and markers at +16 and +32; labeled "vmovdqa?".]

Vector Box Layout: Alignment

[Figure: two j.l.Long2 layouts, one accessed with vmovdqa and one with vmovdqu; offsets +8, +12, +16, +20, +32 and two regions labeled 8.]

Snippet Result Boxing C2 IR

Nested Vector Operations C2 IR

...
<+48>: vmovdqu %ymm0,(%rsp) ; spill
...allocation...
<+67>: callq _new_instance_Java ; SAFEPOINT/DEOPT
<+72>: vmovdqu 0x10(%rbp),%ymm0 ; unbox
<+77>: vmovdqu (%rsp),%ymm1 ; fill
...

PcDesc(pc=<+67> offset=48 bits=4):
...
j.l.i.LambdaForm$BMH::reinvoke@20
Locals
 - l0: a '.../BMH$Species_LL'{0x...}
 - l1: obj[20]
...
Objects
 - 20: java.lang.Long4 stack[0]
.

# parm0: rsi:rsi = 'java/lang/Long4'
# parm1: rdx:rdx = 'java/lang/Long4'

No Boxing Across Safepoints

...nmethod preamble...
<+12>: mov %rsi,%rbp ; reg-to-reg move
<+15>: vmovdqu 0x10(%rsi),%ymm0 ; unbox
<+20>: vmovdqu 0x10(%rdx),%ymm1 ; unbox
<+25>: vpaddd %ymm0,%ymm1,%ymm0 ; snippet
<+29>: vmovdqu 0x10(%rcx),%ymm1 ; unbox
<+34>: vmovdqu 0x10(%r8),%ymm2 ; unbox
<+40>: vpaddd %ymm1,%ymm2,%ymm1 ; snippet
<+44>: vpaddd %ymm0,%ymm1,%ymm0 ; snippet
<+48>: vmovdqu %ymm0,(%rsp) ; spill
...allocation...
<+67>: callq _new_instance_Java
<+72>: vmovdqu 0x10(%rbp),%ymm0
<+77>: vmovdqu (%rsp),%ymm1 ; fill
<+82>: vpaddd %ymm1,%ymm0,%ymm0 ; snippet
<+86>: vmovdqu %ymm0,0x10(%rax) ; box
...nmethod prologue...

Allocation Placement

...nmethod preamble...
<+12>: mov %rsi,%rbp ; reg-to-reg move
<+15>: vmovdqu 0x10(%rsi),%ymm0 ; unbox
<+20>: vmovdqu 0x10(%rdx),%ymm1 ; unbox
<+25>: vpaddd %ymm0,%ymm1,%ymm0 ; snippet
<+29>: vmovdqu 0x10(%rcx),%ymm1 ; unbox
<+34>: vmovdqu 0x10(%r8),%ymm2 ; unbox
<+40>: vpaddd %ymm1,%ymm2,%ymm1 ; snippet
<+44>: vpaddd %ymm0,%ymm1,%ymm0 ; snippet
<+48>: vmovdqu %ymm0,(%rsp) ; spill
...allocation...
<+67>: callq _new_instance_Java
<+72>: vmovdqu 0x10(%rbp),%ymm0 ; repeated unbox
<+77>: vmovdqu (%rsp),%ymm1 ; fill
<+82>: vpaddd %ymm1,%ymm0,%ymm0 ; snippet
<+86>: vmovdqu %ymm0,0x10(%rax) ; box
...nmethod prologue...

Repeated Unboxing

Hash

$h_{i+1} = 31 \cdot h_i + v_i$

Vectorized Hash

[Figure: 8-lane hash step. The accumulator lanes a7 … a0 are each multiplied by 31^8; the input lanes b7 … b0 are each multiplied by their own power of 31 (31^0 … 31^7); the two products are added lane-wise.]

Vectorized Hash

Long4 vhash8(Long4 acc, long ch8) {
    acc = mullo_epi32(acc, pow88);
    Long4 cv8 = load_v8qi_to_v8si(ch8);
    cv8 = mullo_epi32(cv8, pow8);
    acc = add_epi32(acc, cv8);
    return acc;
}

int vectorized_hash(byte[] buf, int off, int len) {
    Long4 acc = Long4.ZERO;
    for (; len >= 8; off += 8, len -= 8) {
        long v = U.getLong(buf, Unsafe.ARRAY_BYTE_BASE_OFFSET + off);
        acc = vhash8(acc, v);
    }
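The lane arithmetic can be checked against the scalar recurrence without any vector hardware. The sketch below simulates the eight 32-bit lanes with an `int[8]`. It rests on several assumptions not shown on the slides: lane j of each 8-byte block is weighted by 31^(7-j), the final horizontal sum of the lanes yields the hash, the initial hash value is 0, and the input length is a multiple of 8. All names are hypothetical.

```java
public class VectorHashSketch {
    static final int LANES = 8;
    // POW[k] = 31^k, computed with wrapping 32-bit arithmetic.
    static final int[] POW = new int[LANES + 1];

    static {
        POW[0] = 1;
        for (int k = 1; k <= LANES; k++) {
            POW[k] = POW[k - 1] * 31;
        }
    }

    // Reference: h = 31 * h + v, applied byte by byte, starting from 0.
    static int scalarHash(byte[] buf) {
        int h = 0;
        for (byte b : buf) {
            h = 31 * h + b;
        }
        return h;
    }

    // Lane-wise version: each step mirrors one vhash8 call on 8 bytes.
    static int laneHash(byte[] buf) {
        if (buf.length % LANES != 0) {
            throw new IllegalArgumentException("length must be a multiple of " + LANES);
        }
        int[] acc = new int[LANES];
        for (int off = 0; off < buf.length; off += LANES) {
            for (int j = 0; j < LANES; j++) {
                // acc * 31^8 (like mullo_epi32), plus the byte times its lane weight.
                acc[j] = acc[j] * POW[LANES] + buf[off + j] * POW[LANES - 1 - j];
            }
        }
        int h = 0; // horizontal sum of the lanes
        for (int a : acc) {
            h += a;
        }
        return h;
    }

    public static void main(String[] args) {
        byte[] data = "machine code snippets!!!".getBytes(java.nio.charset.StandardCharsets.US_ASCII);
        System.out.println(scalarHash(data) == laneHash(data));
    }
}
```

The two functions agree because 32-bit multiplication and addition wrap consistently, so regrouping the polynomial by lane does not change the result modulo 2^32.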

Long4 acc = Long4.ZERO;
for (; len >= 8; off += 8, len -= 8) {
    long v = U.getLong(...);
    acc = vhash8(acc, v);
}

Escape Analysis in C2

“The test case shows limitation of the current EA implementation. Objects will not be eliminated if there is merge point in which it is undefined which object is referenced.”
JDK-6853701

“[The] address may point to more then one object. This may produce the false positive result (set not scalar replaceable) since the flow-insensitive escape analysis can't separate the case when stores overwrite the field's value from the case when stores happened on different control branches.”
hotspot/src/share/vm/opto/escape.cpp#l1733

Limitations
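The merge-point limitation quoted above concerns code of the following shape. This is only an illustration of the pattern under a hypothetical `Box` class. It makes no claim about what a particular C2 build does with either method, since that depends on the JDK version and on inlining decisions.

```java
public class MergePointSketch {
    // Hypothetical small box type, standing in for a vector box such as Long4.
    static final class Box {
        final int value;

        Box(int value) {
            this.value = value;
        }
    }

    // Two allocations flow into one reference at the join after the if/else.
    // In the compiler's IR that join is a Phi node over two distinct objects.
    static int merged(boolean flag, int a, int b) {
        Box box;
        if (flag) {
            box = new Box(a);
        } else {
            box = new Box(b);
        }
        return box.value * 2;
    }

    // Same result, but each allocation is consumed inside its own branch,
    // so no reference to "one of two objects" exists at the join.
    static int split(boolean flag, int a, int b) {
        if (flag) {
            return new Box(a).value * 2;
        } else {
            return new Box(b).value * 2;
        }
    }

    public static void main(String[] args) {
        System.out.println(merged(true, 3, 4) + " " + split(true, 3, 4));
    }
}
```

The loop in the vectorized hash has the same shape as `merged`: the accumulator at the loop header is either the initial value or the box produced by the previous iteration.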

Escape Analysis in C2

§ Phi node blocks allocation elimination
§ No vector load hoisting out of a loop
§ No constant folding of vector values
§ Repeated unboxing
§ Box allocation placement

Observations

Summary

Current Status

§ Almost feature complete
§ C2 support on Linux/Solaris/Mac x86-64
§ Extensively used for Vector API experiments
§ Early adopters (thanks!)
  – Paul Sandoz (Oracle)
  – Ian Graves (Intel)

Remaining Work

§ Feature work
  – temporary registers
  – instruction alignment
  – multiple return values
  – improve error diagnostics
§ Support non-x86 ISAs
  – SPARC (VIS), ARM64 (NEON)
§ Support in other JIT-compilers (Graal, C1)
  – implement snippet embedding
§ User-friendly API
§ EA enhancements

Multiple Return Values cpuid

“CPUID can be executed at any privilege level to serialize instruction execution. Serializing instruction execution guarantees that any modifications to flags, registers, and memory for previous instructions are completed before the next instruction is fetched and executed.”

Multiple Return Values

static final MethodHandle MHcpuid = MachineCodeSnippet.builder("cpuid")
    .effects(CONTROL, READ_MEMORY, WRITE_MEMORY)
    .argument(int.class, rsi)
    .argument(int.class, rdx)
    .returns(Long2.class, xmm0)                // MT(II)L2
    .kills(rax, rbx, rcx, rdx)
    .code(0x8B, 0xC6,                          // mov %esi,%eax
          0x8B, 0xCA,                          // mov %edx,%ecx
          0x0F, 0xA2,                          // cpuid
          0x66, 0x0F, 0x3A, 0x22, 0xC0, 0x00,  // pinsrd $0x0,%eax,%xmm0
          0x66, 0x0F, 0x3A, 0x22, 0xC3, 0x01,  // pinsrd $0x1,%ebx,%xmm0
          0x66, 0x0F, 0x3A, 0x22, 0xC1, 0x02,  // pinsrd $0x2,%ecx,%xmm0
          0x66, 0x0F, 0x3A, 0x22, 0xC2, 0x03)  // pinsrd $0x3,%edx,%xmm0
    .make();

cpuid
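The four `pinsrd` instructions pack eax, ebx, ecx, and edx into 32-bit lanes 0 to 3 of xmm0, and a later `pextrd` pulls a single lane back out. The lane layout can be modeled in plain Java. In this sketch a `long[2]` is a hypothetical stand-in for `Long2`, with index 0 holding the low 64 bits.

```java
public class CpuidPackingSketch {
    // Models the four pinsrd instructions: lane 0 = eax, 1 = ebx, 2 = ecx, 3 = edx.
    static long[] pack(int eax, int ebx, int ecx, int edx) {
        long lo = (eax & 0xFFFFFFFFL) | ((long) ebx << 32);
        long hi = (ecx & 0xFFFFFFFFL) | ((long) edx << 32);
        return new long[]{lo, hi};
    }

    // Models pextrd: returns 32-bit lane i (0..3) of the 128-bit value.
    static int lane(long[] v, int i) {
        if (i < 0 || i > 3) {
            throw new IllegalArgumentException("lane index must be 0..3");
        }
        long half = v[i >>> 1];                   // lanes 0,1 live in lo; 2,3 in hi
        return (int) (half >>> ((i & 1) * 32));   // select the lower or upper 32 bits
    }

    public static void main(String[] args) {
        long[] regs = pack(0x16, 0x756E6547, 0x6C65746E, 0x49656E69);
        // Lane 2 corresponds to .ecx in the cpuid(0x0, 0x0).ecx example.
        System.out.println(Integer.toHexString(lane(regs, 2)));
    }
}
```

Returning all four registers in one vector value is what lets the JIT later fold pack-then-extract down to a single register move when only one component is used.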

Multiple Return Values

# parm0: rsi = 'I'
# parm1: rdx = 'I'
mov %esi,%eax
mov %edx,%ecx
cpuid
pinsrd $0x0,%eax,%xmm0
pinsrd $0x1,%ebx,%xmm0
pinsrd $0x2,%ecx,%xmm0
pinsrd $0x3,%edx,%xmm0
pextrd %eax,%xmm0,$0x2
retq

cpuid(0x0,0x0).ecx

# parm0: rsi = 'I'
# parm1: rdx = 'I'
mov %esi,%eax
mov %edx,%ecx
cpuid
mov %ecx,%eax
retq

Summary

§ Evolution of machine code snippet prototype
  – raw code, vectors, code “recipes”, effects
§ Vector values
  – should be value classes (w/ “heisenboxes”)
    § identity-less
    § aggressive boxing/unboxing
  – EA is interim solution
    § limitations of EA implementation in C2

Links

§ Project Panama repo: http://hg.openjdk.java.net/panama/panama/
§ Machine Code Snippets API:
  – jdk.vm.ci/vm/ci/panama/MachineCodeSnippet.java
§ Samples
  – http://hg.openjdk.java.net/panama/panama/jdk/file/tip/test/panama/snippets

Thank you!

vladimir.x.ivanov@oracle.com @iwan0www
