Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna...

45
Highly Parallel, Object- Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers, Research Fellow, The University of Manchester [email protected]

Transcript of Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna...

Page 1: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

Highly Parallel, Object-Oriented Computer Architecture

(also the Jikes RVM and PearColator)Vienna University of Technology

August 2nd

Dr. Ian Rogers,Research Fellow,

The University of [email protected]

Page 2: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

Presentation Outline The JAMAICA project

What are the problems?

Hardware design solutions

Software solutions

Where we are

The Jikes RVM – behind the scenes

PearColator – a quick overview

Page 3: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

Problems “The job of an engineer is to identify problems

and find better ways to solve them”, James Dyson (I'm sure many others)

There are many problems currently in Computer Science and more on the horizon

Problem solving is adhoc, and many good solutions aren't successful

Let's look at the problems from the bottom up

Page 4: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

Problems in Process Heat and power are problems

Smaller sizes (<45nm) lead to problems with process variations, degradation over time, more transient errors.

3D or stacked designs have significant problems

Simulation must be repeated for all possible design and environment variations so statistically a design should work

Page 5: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

Problems in Architecture Speed of interconnect

Die area is very large

Knowing your market “tricks” are key to realising performance – especially in the embedded space

Move away from general purpose design – GPUs, physics processors

Page 6: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

Problems in Systems Software Lag of systems software behind new hardware

How to virtualise systems with minimal cost

Lots of momentum in existing solutions

Problems with natively executable code: needs to be run in a hardware/software sandbox

no dynamic optimisation of code with libraries and operating system

cost of hardware to support virtualization

Page 7: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

Problems in Compilers The hardware target keeps moving

The notion of stacked languages, and virtual machines isn't popular

Why aren't we better at instruction selection? embedded designs have vasts amount of

assembler code targeting exotic registers and ISAs

How to parallelize for thousands of contexts

Machine learning?

Page 8: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

Problems for Applications Application writers have an abundance of tools

and wisdom to listen to, the wisdom often conflicts

Application concerns: performance

maintainability (evolution?)

time to implement

elegance of solution

Page 9: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

Problems for Consumers Cost

Migration of systems

Legacy support

Page 10: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

Recap Process: lots of transistors, lots of problems

Architecture: speed of interconnect, complexity

Systems software: momentum

Compilers: stacking, using new architectures

Applications: lots of tools and wisdom, concerns

Consumers: how much does it cost? What about my current business systems and processes?

Page 11: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

Oh dear, it's a horrible problem, what we need is a new programming language

Why? parallel programming is done badly at the moment

we should teach CS undergraduates this language

we will then inherently get parallel software

Why not? CSP has already been here [Hoare 1978]

clusters are already solving similar problems using existing languages

it's easy to blame the world's problems on C

Page 12: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

Do we need another programming language?

What I think is needed are domain specific languages, or domain specific uses of a language: extracting parallelism from Java implies work not

necessary at a mathematical abstraction – MatlabP, Mathematica, Fortran 90

codecs, graphics pipelines, network processors – languages devised here should express just what's necessary to do the job

message passing to avoid use of shared memory

Page 13: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

Virtual Machine Abstraction We don't need another programming language,

we need common abstractions

This abstraction will be inherently parallel but not inherently shared memory

Java is a reasonable choice and brings momentum

Page 14: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

Architecture Point-to-point asynchronous interconnect

Simultaneous Multi-Threading to hide latencies (e.g. Sun Niagara, Intel HyperThreading)

Object-Oriented – improve GC, simplify directories

Transactional – remove locks, enable speculation

Page 15: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

Object-Oriented Hardware Proposed in the Mushroom project from

Manchester

Recent papers by Wright and Wolzcko

Address the cache using object ID and offset

Object ID Offset

L1 Data Cache

Page 16: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

Object-Oriented Hardware

On a cache miss the object ID is translated to the object’s address

Object ID Offset

L1 Data Cache

MISS!

Page 17: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

Object-Oriented Hardware

We can re-use the TLB

Having a map allows objects to be moved without altering references

Only object headers will contain locks

Object ID Object to Address Map

Virtual Memory

TLB

Page 18: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

Transactional Hardware

Threads reading the same object can do so regardless of their epoch

(based on [Ananian et al., 2005])

Transaction Object ID

Page 19: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

Transactional Hardware

When a later epoch thread writes to an object a clone is made

Transaction Object ID

Page 20: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

Transactional Hardware

If an earlier thread writes to an object the later threads using that object rollback

Transaction Object ID

Page 21: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

Object-Oriented and Transactional Hardware

Again the TLB can remove some of the cost

Map Virtual Memory

TLB

Transaction Object ID

Page 22: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

Speculation Speculative threads created with predicted

input values and expected not to interact with other non-speculative threads

Transaction can complete if we didn’t rollback and inputs to thread were as predicted

Can speculate at: Method calls

Loops

Page 23: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

Operating Systems Supporting an object based and virtual

memory view of a system implies extra controls in our system

Therefore, we want the whole system software stack inside the VM

Page 24: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

Operating Systems

Page 25: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

Where we are Jikes RVM and JNode based Java operating

systems

Open source dynamic binary translator (arguably state-of-the-art performance)

Simulated architecture

Parallelizing JVM, working for do-all loops, new work on speculation and loop pipelining

Lots of work on the other things I've talked about

Page 26: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

The Jikes RVM Overview of the adaptive compilation system:

Methods recompiled based on their predicted future execution time and the time taken to compile

Some optimisation levels are skipped

Page 27: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

The baseline compiler Used to compile code the first time it’s invoked

Very simple code generation:

iload_0

iload_1

iadd

istore_0

Load t0, [locals + 0]

Store [stack+0], t0

Load t0, [locals + 4]

Store [stack+4], t0

Load t0, [stack+0]

Load t1, [stack+4]

Add t0, t0, t1

Store [stack+0], t0

Load t0, [stack+0]

Store [locals + 0], t0

Page 28: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

The baseline compiler Pros:

Easy to port – just write emit code for each bytecode

Minimal work needed to port runtime and garbage collector

Cons: Very slow

Page 29: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

The boot image Hijack the view of memory (mapping of objects to

addresses)

Compile list of primordial classes

Write view of memory to disk (the boot image)

The boot image runner loads the disk image and branches into the code block for VM.boot

Page 30: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

The boot image Problems:

Difference of views between: Jikes RVM Classpath Bootstrap JVM

Fix by writing null to some fields

Jikes RVM runtime needs to keep pace with Classpath

Page 31: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

The runtime M-of-N threading

Thread yields are GC points

Native code can deadlock the VM

JNI written in Java with knowledge of C layout

Classpath interface written in Java

Page 32: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

The Jikes RVM Overview of the adaptive compilation system:

Methods recompiled based on their predicted future execution time and the time taken to compile

Some optimisation levels are skipped

Page 33: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

The optimizing compiler Structured from compiler phases based on HIR, LIR

and MIR phases from Muchnick

IR object holds instructions in linked lists in a control flow graph

Instructions are an object with:

One operator

Variable number of use operands

Variable number of def operands

Support for def/use operands

Some operands and operators are virtual

Page 34: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

The optimizing compiler HIR:

Infinite registers

Operators correspond to bytecodes

SSA phase performed

LIR:

Load/store operators

Java specific operators expanded

GC barrier operators

SSA phase performed

MIR:

Fixed number of registers

Machine operators

Page 35: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

The optimizing compiler Factored control graph:

Don’t terminate blocks on Potentially Exceptioning Instructions (PEIs)

Bound check Null check

Checks define guards which are used by: Putfield, getfield, array load/store, invokevirtual

Eliminating guards requires propagation of use

Page 36: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

The optimizing compiler Java – can we capture and benefit from strong type

information?

Extended Array SSA:

Single assignment

Array – Fortran style - a float and an int array can’t alias

Extended – different fields and different objects can’t alias

Phi operator – for registers, heaps and exceptions

Pi operator – define points where knowledge of a variable is exposed. E.g. A = new int[100], later uses of A can know the array length is 100 (ABCD)

Page 37: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

The optimizing compiler HIR: Simplification, tail recursion elimination, estimate

execution frequencies, loop unrolling, branch optimizations, (simple) escape analysis, local copy and constant propagation, local common sub-expression elimination

SSA in HIR: load/store elimination, redundant branch elimination, global constant propagation, loop versioning

AOS framework

Page 38: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

The optimizing compiler LIR: Simplification, estimate execution frequencies,

basic block reordering, branch optimizations, (simple) escape analysis, local copy and constant propagation, local common sub-expression elimination

SSA in LIR: global code placement, live range splitting

AOS framework

Page 39: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

The optimizing compiler MIR: instruction selection, register allocation,

scheduling, simplification, branch optimizations

Fix-ups for runtime

Page 40: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

Speculative Optimisations Often in a JVM there’s potentially not a

complete picture, in particular for dynamic class loading

On-stack replacement allows optimisation to proceed with a get out clause

On-stack replacement is a virtual Jikes RVM instruction

Page 41: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

Applications of on-stack replacement

Safe invalidation for speculative optimisation

Class hierarchy-based inlining

Deferred compilation Don’t compile uncommon cases Improve dataflow optimization and improve compile time

Debug optimised code via dynamic deoptimisaton

At break-point, deoptimize activation to recover program state

Runtime optimization of long-running activities

Promote long-running loops to higher optimisation levels

Page 42: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

PearColator

Page 43: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

PearColator

Decoder:

Disassembler

Interpreter (Java threaded)

Translator

Generic components:

Loaders

System calls

Memory

Page 44: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

PearColator

Page 45: Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator) Vienna University of Technology August 2 nd Dr. Ian Rogers,

Thanks and…

any questions?