Hydra VM: Extracting Parallelization From Legacy Code Using STM

Mohamed M. Saad, Mohamed A. Mohamedin & Prof. Binoy Ravindran
VT-MENA Program, Electrical & Computer Engineering Department, Virginia Polytechnic Institute and State University



Outline

▪ Motivation & Objectives
▪ Background
  ▪ Transactional Memory
  ▪ Jikes RVM
▪ Program Reconstruction
▪ Architecture
  ▪ Profiler, Builder & Runtime
▪ Future Work

Motivation

Why multicores? It is difficult to push single-core clock frequencies any higher. Deeply pipelined circuits lead to:
▪ heat problems
▪ speed-of-light problems
▪ difficult design and verification
▪ the need for large design teams
▪ server farms that need expensive air-conditioning

Motivation

No more fast CPUs, just more cores!

The trend is toward multi-core processors and hyper-threading.

Motivation

▪ In 2005, Sun Niagara: 8 cores running 32 hardware threads (HWT).
▪ In 2010, Supermicro: 48-core AMD Opteron systems.
▪ Now, Sun builds boxes with 128 to 512 hardware threads (8 HWT/core, 16 cores/CPU)!

What about software? Are we ready for this hardware?

Objective

▪ Many applications are designed to use only a few threads.
▪ Legacy systems were designed to run on a single processor.
▪ Multi-threaded programming is a headache for developers (race conditions, concurrent access, …).

HydraVM: a Java Virtual Machine prototype, based on Jikes RVM, that targets large numbers of cores by automatically detecting portions of code that can run in parallel.

Background

Transactional Memory

Jikes RVM (Adaptive Online Architecture)

Atomicity

Atomicity: an operation (or set of operations) appears to the rest of the system to occur instantaneously.

Example (money transfer) with a synchronized block:

    …
    synchronized {
        from = from - amount;
        to = to + amount;
    }
    …

Example (money transfer) with explicit locks:

    …
    account1.lock();
    account2.lock();
    from = from - amount;
    to = to + amount;
    account1.unlock();
    account2.unlock();
    …

[Figure: account1 and account2, accessed by X and Y]
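The lock-based transfer above can be written in Java with `java.util.concurrent.locks.ReentrantLock`. This is a minimal, illustrative sketch: the `Account` and `LockedTransfer` classes are hypothetical, and acquiring the two locks in a fixed global order (by an assumed numeric `id`) is an addition not shown on the slide. It avoids the deadlock that occurs when X transfers account1→account2 while Y transfers account2→account1 and each grabs "its" first lock.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical account with its own lock; the id defines a global lock order. */
class Account {
    private static final AtomicInteger NEXT_ID = new AtomicInteger();
    final int id = NEXT_ID.getAndIncrement();
    final ReentrantLock lock = new ReentrantLock();
    long balance;

    Account(long balance) { this.balance = balance; }
}

class LockedTransfer {
    /** Moves amount from 'from' to 'to' while holding both account locks. */
    static void transfer(Account from, Account to, long amount) {
        // Always lock the lower id first, so opposite transfers request locks in the same order.
        Account first = from.id < to.id ? from : to;
        Account second = (first == from) ? to : from;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                from.balance -= amount;
                to.balance += amount;
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Account account1 = new Account(1_000);
        Account account2 = new Account(1_000);
        // Threads X and Y transfer in opposite directions concurrently.
        Thread x = new Thread(() -> { for (int i = 0; i < 10_000; i++) transfer(account1, account2, 1); });
        Thread y = new Thread(() -> { for (int i = 0; i < 10_000; i++) transfer(account2, account1, 1); });
        x.start();
        y.start();
        x.join();
        y.join();
        System.out.println(account1.balance + " " + account2.balance); // 1000 1000
    }
}
```

The programmer still has to pick the locks and their order by hand, which is the kind of lock-management burden the following slides list as drawbacks.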

Mutual-Exclusion Locks: the “Classical Approach”

Drawbacks:
▪ deadlock
▪ livelock
▪ starvation
▪ priority inversion
▪ non-composable
▪ cost of managing the locks
▪ non-scalable on multiprocessors


Transactional Memory

Transactional memory simplifies parallel programming by allowing a group of load and store instructions to execute atomically, using additional primitives.

Example (money transfer):

    …
    START-TRANSACTION
    from = from - amount
    to = to + amount
    END-TRANSACTION
    …

The transaction either commits, or rolls back and retries.

[Figure: account1 and account2, with per-transaction copies account1x, account2x (X) and account1y, account2y (Y)]
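The commit-or-rollback-and-retry cycle can be illustrated in plain Java without a TM library. This sketch is not transactional memory: it assumes both balances live in one immutable `Balances` snapshot (a hypothetical class) so that a single `AtomicReference.compareAndSet` can act as the commit point. Each attempt works on private copies, in the spirit of account1x/account1y in the figure, and discards them if another transfer committed first.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Immutable snapshot of both balances; every committed update installs a new object. */
final class Balances {
    final long account1;
    final long account2;

    Balances(long account1, long account2) {
        this.account1 = account1;
        this.account2 = account2;
    }
}

class OptimisticTransfer {
    final AtomicReference<Balances> state;

    OptimisticTransfer(long account1, long account2) {
        state = new AtomicReference<>(new Balances(account1, account2));
    }

    /** Moves amount from account1 to account2 (negative moves the other way); returns attempts used. */
    int transfer(long amount) {
        int attempts = 0;
        while (true) {
            attempts++;
            Balances before = state.get();                         // "START-TRANSACTION": snapshot
            Balances after = new Balances(before.account1 - amount,
                                          before.account2 + amount); // work on private copies
            if (state.compareAndSet(before, after)) {               // "END-TRANSACTION": commit
                return attempts;
            }
            // Another transfer committed first: drop 'after' (rollback) and retry.
        }
    }

    public static void main(String[] args) throws InterruptedException {
        OptimisticTransfer bank = new OptimisticTransfer(1_000, 1_000);
        Thread x = new Thread(() -> { for (int i = 0; i < 10_000; i++) bank.transfer(1); });
        Thread y = new Thread(() -> { for (int i = 0; i < 10_000; i++) bank.transfer(-1); });
        x.start();
        y.start();
        x.join();
        y.join();
        Balances b = bank.state.get();
        System.out.println(b.account1 + " " + b.account2); // 1000 1000
    }
}
```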

Transactional Memory

▪ Each transaction has a ReadSet and a WriteSet.
▪ Two transactions conflict if they access the same variable(s) and at least one of them writes it, i.e., one transaction's WriteSet overlaps the other's ReadSet or WriteSet.
▪ Conflicts are resolved by a contention manager, which can employ different policies (Aggressive, Polite, Back-Off, Random, …).
▪ An aborted transaction undoes its changes (if required) and retries.
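The conflict rule can be stated directly in code. The `TxSets` class below is a hypothetical minimal sketch that only tracks variable names; it shows that read-read sharing is allowed while any overlap involving a write is a conflict. Choosing which transaction to abort would be the contention manager's job and is not modeled here.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical per-transaction ReadSet/WriteSet, tracked by variable name. */
class TxSets {
    final Set<String> readSet = new HashSet<>();
    final Set<String> writeSet = new HashSet<>();

    void read(String var) { readSet.add(var); }
    void write(String var) { writeSet.add(var); }

    /** Conflict: a's writes meet b's reads or writes, or b's writes meet a's reads. */
    static boolean conflict(TxSets a, TxSets b) {
        return intersects(a.writeSet, b.readSet)
            || intersects(a.writeSet, b.writeSet)
            || intersects(b.writeSet, a.readSet);
    }

    private static boolean intersects(Set<String> x, Set<String> y) {
        for (String v : x) {
            if (y.contains(v)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        TxSets t1 = new TxSets();
        t1.read("from"); t1.write("from"); t1.write("to");
        TxSets t2 = new TxSets();
        t2.read("to");
        TxSets t3 = new TxSets();
        t3.read("to");
        System.out.println(conflict(t1, t2)); // true: t1 writes "to", which t2 reads
        System.out.println(conflict(t2, t3)); // false: both only read "to"
    }
}
```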

Transactional Memory

▪ Transactions may be nested (multiple levels).
▪ Inner transactions share the ReadSet/WriteSet of their parent.
▪ Inner transactions conflict with each other and with other higher-level transactions.
▪ Aborting a parent transaction forces its children to abort.
▪ Changes made by inner transactions become visible to their parents once they commit successfully, but stay hidden from the outside world until the highest-level transaction commits.
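The visibility rules for nesting can be sketched with per-level write buffers. `NestedTx` below is a hypothetical illustration that models only buffering and visibility (no conflict detection): a child sees its ancestors' uncommitted writes, an inner commit merges into the parent, and only the top-level commit reaches shared memory.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical nested transaction: buffered writes per level, published on commit. */
class NestedTx {
    final Map<String, Integer> memory;                      // shared, committed state
    final NestedTx parent;                                  // null for the top-level transaction
    final Map<String, Integer> writes = new HashMap<>();    // this level's uncommitted writes

    NestedTx(Map<String, Integer> memory, NestedTx parent) {
        this.memory = memory;
        this.parent = parent;
    }

    NestedTx begin() { return new NestedTx(memory, this); }

    /** Reads see this level's writes first, then the ancestors', then committed memory. */
    Integer read(String var) {
        for (NestedTx t = this; t != null; t = t.parent) {
            if (t.writes.containsKey(var)) return t.writes.get(var);
        }
        return memory.get(var);
    }

    void write(String var, int value) { writes.put(var, value); }

    /** Inner commit publishes to the parent only; the top-level commit publishes to memory. */
    void commit() {
        if (parent != null) parent.writes.putAll(writes);
        else memory.putAll(writes);
        writes.clear();
    }

    /** Discards this level's writes, including anything its committed children merged in. */
    void abort() { writes.clear(); }

    public static void main(String[] args) {
        Map<String, Integer> memory = new HashMap<>(Map.of("x", 0));
        NestedTx outer = new NestedTx(memory, null);
        NestedTx inner = outer.begin();
        inner.write("x", 1);
        inner.commit();
        System.out.println(outer.read("x") + " " + memory.get("x")); // 1 0
        outer.commit();
        System.out.println(memory.get("x")); // 1
    }
}
```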

Transactional Memory

▪ Hardware Transactional Memory (HTM): modifications to the processor, cache, and bus protocol; e.g., unbounded HTM, TCC, LogTM, …
▪ Software Transactional Memory (STM): a software runtime library or programming-language support, with minimal hardware support (CAS, LL/SC); e.g., RSTM, DSTM, ESTM, …
▪ Hybrid Transactional Memory: exploits HTM support to achieve hardware performance for transactions that do not exceed the HTM's limitations, and falls back to STM otherwise; e.g., HyTM, …
▪ Distributed Transactional Memory: extends transaction primitives to distributed environments (networks of multiple machines); e.g., HyFlow, DecentSTM, GenSTM, …

Jikes RVM

A mature, modular, open-source Java virtual machine designed for research purposes. Unlike most other JVMs, it is written in Java!

Adaptive Online System

Program Reconstruction: “The Main Idea”

▪ We view a program as a set of basic building blocks.
▪ Each block is a set of instructions.
▪ A block has a single entry and multiple exits.
▪ Blocks may access the same memory (variables).
▪ It is possible to reconstruct the program from these blocks by rearranging them differently, with some changes to the control instructions.
▪ It is even possible to assign each set of blocks to a different thread.

Example

    int counter = 0;
    for (int i = 0; i < 12; i++)
        if (Math.random() > 0.3)
            counter++;
        else
            counter--;

     0: iconst_0
     1: istore_1
     2: iconst_0
     3: istore_2
     4: goto 29
     7: invokestatic #13
    10: ldc2_w #19
    13: dcmpl
    14: ifle 23
    17: iinc 1, 1
    20: goto 26
    23: iinc 1, -1
    26: iinc 2, 1
    29: iload_2
    30: bipush 12
    32: if_icmplt 7
    35: return

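Splitting this bytecode into basic blocks can be done with the textbook "leader" rule: the first instruction, every branch target, and every instruction that follows a branch start a new block. The sketch below assumes that rule (HydraVM's exact splitter may differ) and uses a hypothetical, simplified `Insn` model instead of a real bytecode library. On the listing above it yields leaders at offsets 0, 7, 17, 23, 26 and 29, which are where the profiler slide's blocks B1 to B6 begin, plus the final `return` at 35.

```java
import java.util.List;
import java.util.TreeSet;

class BasicBlocks {
    /** Simplified bytecode instruction: offset, mnemonic, branch target (-1 if it does not branch). */
    static final class Insn {
        final int offset;
        final String op;
        final int target;

        Insn(int offset, String op, int target) {
            this.offset = offset;
            this.op = op;
            this.target = target;
        }
    }

    /** Returns block leaders; each basic block runs from one leader up to the next. */
    static TreeSet<Integer> leaders(List<Insn> code) {
        TreeSet<Integer> leaders = new TreeSet<>();
        leaders.add(code.get(0).offset);                        // method entry
        for (int i = 0; i < code.size(); i++) {
            Insn in = code.get(i);
            boolean branches = in.target >= 0;
            if (branches) leaders.add(in.target);               // every branch target starts a block
            boolean endsBlock = branches || in.op.equals("return");
            if (endsBlock && i + 1 < code.size()) {
                leaders.add(code.get(i + 1).offset);            // so does the instruction after a branch
            }
        }
        return leaders;
    }

    /** The loop from the example, transcribed by hand. */
    static List<Insn> example() {
        return List.of(
            new Insn(0, "iconst_0", -1), new Insn(1, "istore_1", -1),
            new Insn(2, "iconst_0", -1), new Insn(3, "istore_2", -1),
            new Insn(4, "goto", 29),     new Insn(7, "invokestatic", -1),
            new Insn(10, "ldc2_w", -1),  new Insn(13, "dcmpl", -1),
            new Insn(14, "ifle", 23),    new Insn(17, "iinc", -1),
            new Insn(20, "goto", 26),    new Insn(23, "iinc", -1),
            new Insn(26, "iinc", -1),    new Insn(29, "iload_2", -1),
            new Insn(30, "bipush", -1),  new Insn(32, "if_icmplt", 7),
            new Insn(35, "return", -1));
    }

    public static void main(String[] args) {
        System.out.println(leaders(example())); // [0, 7, 17, 23, 26, 29, 35]
    }
}
```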

Example

    public class Test {

        public static void foo() {
            int counter = 0;
            for (int i = 0; i < 12; i++)
                if (Math.random() > 0.3)
                    counter++;
                else
                    counter--;
        }

        public static void zoo() {
            System.out.println("hi");
        }

        public static void main(String[] args) {
            int i = 6;
            if (i < 10)
                foo();
            else
                zoo();
        }
    }

Architecture

Architecture Profiler

▪ Split the code into basic blocks.
▪ Inject additional instructions into loaded classes to monitor:
  ▪ program flow (which basic blocks are accessed, and in what order?)
  ▪ memory accessed by each basic block
  ▪ which basic blocks perform I/O

Architecture Profiler

     0: iconst_0
     1: istore_1
     2: iconst_0
     3: istore_2
        write J
        write C
        visit B1
     4: goto 29
     7: invokestatic #13
    10: ldc2_w #19
    13: dcmpl
        read K
        write K
        visit B2
    14: ifle 23
    17: iinc 1, 1
        read C
        write C
        visit B3
    20: goto 26
    23: iinc 1, -1
        read C
        write C
        visit B4
    26: iinc 2, 1
        read J
        write J
        visit B5
    29: iload_2
    30: bipush 12
        visit B6
        read J
    32: if_icmplt 7
    35: return


Example:

    int C = 0;
    for (int J = 0; J < 12; J++)
        if (Math.random() > 0.3)
            C++;
        else
            C--;
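What the injected instructions record can be sketched at the Java level. HydraVM injects them as bytecode; here, as an assumption for illustration, equivalent calls are placed by hand into the example loop. `ProfileRepository` and its `visit`/`read`/`write` methods are hypothetical stand-ins for the knowledge repository, a seeded `java.util.Random` replaces `Math.random()` so runs are repeatable, and `K` (not defined on the slide) is assumed to stand for state touched by the random-number call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical knowledge repository: block visit order plus per-block read/write sets. */
class ProfileRepository {
    final List<String> trace = new ArrayList<>();                 // program flow
    final Map<String, Set<String>> reads = new TreeMap<>();       // block -> variables read
    final Map<String, Set<String>> writes = new TreeMap<>();      // block -> variables written

    void visit(String block) { trace.add(block); }
    void read(String block, String var) { reads.computeIfAbsent(block, b -> new TreeSet<>()).add(var); }
    void write(String block, String var) { writes.computeIfAbsent(block, b -> new TreeSet<>()).add(var); }

    /** Hand-instrumented version of: int C = 0; for (int J = 0; J < n; J++) ... */
    static ProfileRepository profileLoop(int n, Random rnd) {
        ProfileRepository p = new ProfileRepository();
        int c = 0;
        int j = 0;
        p.write("B1", "J"); p.write("B1", "C"); p.visit("B1");
        while (true) {
            p.visit("B6"); p.read("B6", "J");                      // loop test J < n
            if (j >= n) break;
            p.read("B2", "K"); p.write("B2", "K"); p.visit("B2");  // K: assumed random-generator state
            if (rnd.nextDouble() > 0.3) {
                c++;
                p.read("B3", "C"); p.write("B3", "C"); p.visit("B3");
            } else {
                c--;
                p.read("B4", "C"); p.write("B4", "C"); p.visit("B4");
            }
            j++;
            p.read("B5", "J"); p.write("B5", "J"); p.visit("B5");
        }
        return p;
    }

    public static void main(String[] args) {
        ProfileRepository p = profileLoop(12, new Random(42));
        System.out.println(p.trace.size());                             // 50
        System.out.println(p.reads.get("B6") + " " + p.writes.get("B1")); // [J] [C, J]
    }
}
```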

Architecture Recompilation

▪ Recompile the Java class bytecode into machine code.
▪ Replace and reload the class definition in memory.

Architecture Code Execution

▪ Run the profiled code.
▪ Collect flow and memory-access information and store it in the knowledge repository.

Architecture Builder

Analyze the knowledge-repository information to determine:
▪ which blocks can be grouped together
▪ which groups of blocks can be parallelized
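The slides do not state the Builder's grouping criteria. One plausible heuristic, sketched here purely as an assumption, is to put two blocks in the same group whenever one writes a variable the other reads or writes, and to treat groups with no such link as candidates for parallel execution. `BlockGrouping` is hypothetical and uses union-find over the read/write sets reported on the profiler slide. With this heuristic, the loop-carried updates of C and J tie B1, B3, B4, B5 and B6 together, and only B2 stands alone.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class BlockGrouping {
    /** Two blocks are dependent if one writes a variable the other reads or writes. */
    static boolean dependent(String a, String b, Map<String, Set<String>> reads, Map<String, Set<String>> writes) {
        Set<String> ra = reads.getOrDefault(a, Set.of()), wa = writes.getOrDefault(a, Set.of());
        Set<String> rb = reads.getOrDefault(b, Set.of()), wb = writes.getOrDefault(b, Set.of());
        return overlaps(wa, rb) || overlaps(wa, wb) || overlaps(wb, ra);
    }

    static boolean overlaps(Set<String> x, Set<String> y) {
        for (String v : x) {
            if (y.contains(v)) return true;
        }
        return false;
    }

    static String find(Map<String, String> parent, String x) {
        while (!parent.get(x).equals(x)) x = parent.get(x);
        return x;
    }

    /** Union-find: dependent blocks end up in the same group. */
    static List<Set<String>> group(Map<String, Set<String>> reads, Map<String, Set<String>> writes) {
        Set<String> blocks = new TreeSet<>(reads.keySet());
        blocks.addAll(writes.keySet());
        Map<String, String> parent = new HashMap<>();
        for (String b : blocks) parent.put(b, b);
        List<String> list = new ArrayList<>(blocks);
        for (int i = 0; i < list.size(); i++) {
            for (int k = i + 1; k < list.size(); k++) {
                if (dependent(list.get(i), list.get(k), reads, writes)) {
                    parent.put(find(parent, list.get(i)), find(parent, list.get(k)));
                }
            }
        }
        Map<String, Set<String>> groups = new TreeMap<>();
        for (String b : blocks) groups.computeIfAbsent(find(parent, b), r -> new TreeSet<>()).add(b);
        return new ArrayList<>(groups.values());
    }

    public static void main(String[] args) {
        // Read/write sets as annotated on the profiler slide.
        Map<String, Set<String>> reads = Map.of("B2", Set.of("K"), "B3", Set.of("C"), "B4", Set.of("C"),
                                                "B5", Set.of("J"), "B6", Set.of("J"));
        Map<String, Set<String>> writes = Map.of("B1", Set.of("J", "C"), "B2", Set.of("K"), "B3", Set.of("C"),
                                                 "B4", Set.of("C"), "B5", Set.of("J"));
        System.out.println(group(reads, writes)); // [[B2], [B1, B3, B4, B5, B6]]
    }
}
```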

Architecture Builder

A program can be represented as a string in which each character is a basic block.

Example:

    for (Integer i = 0; i < DIMx; i++) {
        for (Integer j = 0; j < DIMx; j++) {
            for (Integer k = 0; k < DIMy; k++) {
                C[i][j] += A[i][k] * B[k][j];
            }
        }
    }

abjbhcfefghcfefghijbhcfefghcfefghijk

ab(jb(hcfefg)2hi)2jk

Here (x)2 denotes two consecutive repetitions of x.
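One way to derive the compressed form from the trace is to repeatedly collapse the shortest run of adjacent, identical token sequences into a single repeat token, then start over so that outer repetitions can form around inner ones. `TraceCompressor` below is a hypothetical greedy sketch of that idea, not necessarily the Builder's algorithm, and greedy collapsing is not guaranteed to find the shortest encoding in general. On the matrix-multiplication trace it reproduces the slide's string.

```java
import java.util.ArrayList;
import java.util.List;

class TraceCompressor {
    /** Collapses adjacent repeats, shortest period first, e.g. "abab" becomes "(ab)2". */
    static String compress(String trace) {
        List<String> tokens = new ArrayList<>();
        for (char c : trace.toCharArray()) tokens.add(String.valueOf(c));   // one token per block
        boolean changed = true;
        while (changed) {
            changed = false;
            search:
            for (int len = 1; 2 * len <= tokens.size(); len++) {
                for (int i = 0; i + 2 * len <= tokens.size(); i++) {
                    // Count how many times tokens[i, i+len) repeats back to back.
                    int reps = 1;
                    while (i + (reps + 1) * len <= tokens.size()
                            && tokens.subList(i, i + len).equals(
                                   tokens.subList(i + reps * len, i + (reps + 1) * len))) {
                        reps++;
                    }
                    if (reps > 1) {
                        String body = String.join("", tokens.subList(i, i + len));
                        tokens.subList(i, i + reps * len).clear();
                        tokens.add(i, "(" + body + ")" + reps);   // the run becomes one token
                        changed = true;
                        break search;                              // restart with the shortest period
                    }
                }
            }
        }
        return String.join("", tokens);
    }

    public static void main(String[] args) {
        System.out.println(compress("abjbhcfefghcfefghijbhcfefghcfefghijk")); // ab(jb(hcfefg)2hi)2jk
    }
}
```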

Architecture Builder

ab(jb(hcfefg)2hi)2jk

▪ Externalize common block patterns as methods.
▪ Generated methods may be nested.
▪ Reconstruct the program as a producer-consumer pattern:
  ▪ Collector: provides the Executor with suitable blocks, as tasks to execute, according to the program flow up to that point.
  ▪ Executor: allocates core threads, assigns tasks to threads, and, after all threads complete, requests more blocks from the Collector based on the program flow.
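The Collector/Executor loop can be sketched with `java.util.concurrent`. In this hypothetical version the Collector simply replays pre-built batches of tasks (deciding which blocks are "suitable" is out of scope), and the Executor runs each batch on a fixed pool with `ExecutorService.invokeAll`, which returns only after every task in the batch has completed, before asking for the next batch. The class names and the batch-at-a-time reading of the slide are assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical Collector: hands out the next batch of block-tasks the program flow allows. */
class Collector {
    private final Deque<List<Callable<Integer>>> pending;

    Collector(List<List<Callable<Integer>>> batches) { pending = new ArrayDeque<>(batches); }

    List<Callable<Integer>> nextBatch() { return pending.poll(); }   // null once the program is done
}

/** Hypothetical Executor: runs a batch on a fixed pool of core threads, then asks for more. */
class BlockExecutor {
    static List<Integer> run(Collector collector, int cores) {
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        List<Integer> results = new ArrayList<>();
        try {
            List<Callable<Integer>> batch;
            while ((batch = collector.nextBatch()) != null) {
                // invokeAll blocks until every task in the batch has completed
                for (Future<Integer> f : pool.invokeAll(batch)) results.add(f.get());
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return results;
    }

    public static void main(String[] args) {
        List<Callable<Integer>> first = List.of(() -> 1, () -> 2);
        List<Callable<Integer>> second = List.of(() -> 3);
        Collector collector = new Collector(List.of(first, second));
        System.out.println(run(collector, 4)); // [1, 2, 3]
    }
}
```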

Architecture Builder

Problems:
▪ Threads may conflict when they access the same variables.
▪ Threads may finish out of normal (sequential) order.
▪ The Collector may generate invalid tasks.

Let's represent each thread as a transaction:
▪ When two transactions conflict, abort the one whose blocks come later in normal execution order.
▪ A transaction does not commit until the transaction preceding it in the timeline has finished.
▪ A transaction times out if it is not reachable.
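The ordering rules can be sketched with a commit turn. In this hypothetical `OrderedCommit` class, each transaction is assumed to carry a sequence number equal to its position in normal (sequential) execution order. A transaction may commit only when all lower-numbered transactions have committed, gives up after a timeout if its turn never arrives, and on a conflict the higher-numbered (later) transaction is the victim. The code uses `ReentrantLock` and `Condition.awaitNanos`; actual write publication is reduced to appending to a log.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class OrderedCommit {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition turnChanged = lock.newCondition();
    private long nextToCommit = 0;                 // position of the next transaction allowed to commit
    final List<Long> commitLog = Collections.synchronizedList(new ArrayList<>());

    /** Waits for all predecessors, then commits; returns false on timeout (caller aborts). */
    boolean commit(long seq, long timeoutMillis) {
        lock.lock();
        try {
            long remaining = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            while (nextToCommit != seq) {
                if (remaining <= 0) return false;          // predecessor never finished: time out
                remaining = turnChanged.awaitNanos(remaining);
            }
            commitLog.add(seq);                            // stand-in for publishing buffered writes
            nextToCommit++;
            turnChanged.signalAll();
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            lock.unlock();
        }
    }

    /** On a conflict, the transaction that comes later in sequential order is aborted. */
    static long victim(long seqA, long seqB) { return Math.max(seqA, seqB); }

    public static void main(String[] args) throws InterruptedException {
        OrderedCommit oc = new OrderedCommit();
        List<Thread> threads = new ArrayList<>();
        for (long seq = 3; seq >= 0; seq--) {              // start in reverse to mimic out-of-order finishing
            final long s = seq;
            Thread t = new Thread(() -> oc.commit(s, 5_000));
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) t.join();
        System.out.println(oc.commitLog);                  // [0, 1, 2, 3]
    }
}
```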

Architecture Code Execution (revisited)

▪ Collect which transactions conflict, and the commit rate.
▪ This lets us refine the constructed program: the Builder reorganizes the generated blocks and recompiles the code.
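The feedback step can be sketched as simple per-group bookkeeping. `CommitStats` is hypothetical: it counts commits and aborts for each block group and flags groups whose abort rate exceeds a threshold as candidates for the Builder to reorganize. The 0.5 threshold in the example is an arbitrary assumption, not a value from the slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CommitStats {
    private final Map<String, int[]> counts = new TreeMap<>();   // group -> {commits, aborts}

    void committed(String group) { counts.computeIfAbsent(group, g -> new int[2])[0]++; }
    void aborted(String group) { counts.computeIfAbsent(group, g -> new int[2])[1]++; }

    double abortRate(String group) {
        int[] c = counts.getOrDefault(group, new int[2]);
        int total = c[0] + c[1];
        return total == 0 ? 0.0 : (double) c[1] / total;
    }

    /** Groups aborting more often than 'threshold' are handed back to the Builder. */
    List<String> toReorganize(double threshold) {
        List<String> result = new ArrayList<>();
        for (String group : counts.keySet()) {
            if (abortRate(group) > threshold) result.add(group);
        }
        return result;
    }

    public static void main(String[] args) {
        CommitStats stats = new CommitStats();
        for (int i = 0; i < 9; i++) stats.committed("G1");
        stats.aborted("G1");                                   // G1: 1 abort in 10 runs
        stats.committed("G2");
        stats.aborted("G2");
        stats.aborted("G2");                                   // G2: 2 aborts in 3 runs
        System.out.println(stats.toReorganize(0.5));           // [G2]
    }
}
```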

Ongoing & Future Work

▪ Complete the implementation of HydraVM.
▪ Profile by monitoring memory instead of generating new instructions.
▪ Automatically use Java NIO to handle I/O operations and generate callbacks to process them.
▪ Use thread-scheduling techniques instead of TM.
▪ Formally verify that reconstructed programs match the desired semantics.