Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

46
Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware-Assisted Virtualization Mian-Muhammad Hamayun, Fr´ ed´ eric P´ etrot and Nicolas Fournel System Level Synthesis Group, TIMA Laboratory, CNRS/INP Grenoble/UJF, 46, Avenue F´ elix Viallet, F-38031 Grenoble, FRANCE January 25 th , 2013 Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25 th , 2013 1 / 26

Transcript of Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Page 1: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Native Simulation of Complex VLIW Instruction Sets UsingStatic Binary Translation and Hardware-Assisted Virtualization

Mian-Muhammad Hamayun, Frederic Petrot and Nicolas Fournel

System Level Synthesis Group,TIMA Laboratory, CNRS/INP Grenoble/UJF,

46, Avenue Felix Viallet, F-38031 Grenoble, FRANCE

January 25th, 2013

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 1 / 26

Page 2: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Introduction and Motivation

Introduction – MPSoC Trends

Motivation

Homogeneous vs. Heterogeneous Multi-Processor System-on-ChipMany General Purpose Processors (GPPs) à System Level ParallelismSpecialized Processing Elements e.g. Digital Signal Processors (DSPs)Very Long Instruction Word (VLIW) à Instruction Level Parallelism (ILP)

MPSoC Complexity limits use of Analytical Methods for Design SpaceExploration (DSE) and System Validation à Simulation Systems

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 2 / 26

Page 3: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Introduction and Motivation

Introduction – MPSoC Trends

Motivation

Homogeneous vs. Heterogeneous Multi-Processor System-on-ChipMany General Purpose Processors (GPPs) à System Level ParallelismSpecialized Processing Elements e.g. Digital Signal Processors (DSPs)Very Long Instruction Word (VLIW) à Instruction Level Parallelism (ILP)

MPSoC Complexity limits use of Analytical Methods for Design SpaceExploration (DSE) and System Validation à Simulation Systems

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 2 / 26

Page 4: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Introduction and Motivation

Software Simulation Levels and Native Simulation

Software Simulation Levels

InterpretationInstruction Set Simulation (ISS).

Native SimulationSource Level Simulation (SLS).Intermediate Representation Level Simulation (IRLS).

Binary Level Simulation (BLS)Dynamic Binary Translation (DBT)Static Binary Translation (SBT)

What is Native Simulation ?

When software is compiled/translated for host machine and does notrequire run-time translation or interpretation support.

Native software accesses host machine resources (CPU, Memory, ...)directly or at-least has an illusion of direct access.

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 3 / 26

Page 5: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Introduction and Motivation

Software Simulation Levels and Native Simulation

Software Simulation Levels

InterpretationInstruction Set Simulation (ISS).

Native SimulationSource Level Simulation (SLS).Intermediate Representation Level Simulation (IRLS).

Binary Level Simulation (BLS)Dynamic Binary Translation (DBT)Static Binary Translation (SBT)

What is Native Simulation ?

When software is compiled/translated for host machine and does notrequire run-time translation or interpretation support.

Native software accesses host machine resources (CPU, Memory, ...)directly or at-least has an illusion of direct access.

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 3 / 26

Page 6: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Problem Definition – Simulation of VLIW on Native Machines

Table of Contents

1 Introduction and Motivation

2 Problem Definition – Simulation of VLIW on Native Machines

3 Proposed Solution – Native Simulation of VLIW ISA using SBT and HAV

4 Experiments and Results

5 Summary and Conclusions

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 4 / 26

Page 7: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Problem Definition – Simulation of VLIW on Native Machines

Native Simulation Platform and Compilation Flow – I

Software Execution in Virtual Machine

Software executes in target address-spaceà Transparent memory accesses.

Requires Host-Dependent HAL layerimplementation e.g. x86.

Native Processor Wrapper

Initializes and Runs VM(s) using KVMuserspace library and forwards MMIOaccesses to SystemC platform.

Provides semi-hosting facilities e.g.annotations, profiling etc.

Like a Baremetal Machine

Software executing in Guest Mode cannotsee the Host operating system and librariesà No Dynamic Translations.

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 5 / 26

Page 8: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Problem Definition – Simulation of VLIW on Native Machines

Native Simulation Platform and Compilation Flow – II

Traditional Compilation Flow

Software is Compiled to IR using CompilerFront-end.

Target-specific Backend optimizes the IR.

An annotation pass annotates the Cross-IRà Equivalent CFG (Control Flow Graph)

What can we do for VLIW Machines ?

Source Level Simulation?sequential vs. parallel instructions.

IR Level Simulation?Requires a retargetable compiler e.g. LLVM

Source code may not be availableà Binary Translation for Native Simulation?

Static translation is a better match i.e.Explicit ILP in VLIW.Generated code could be optimized.

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 6 / 26

Page 9: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Problem Definition – Simulation of VLIW on Native Machines

Native Simulation Platform and Compilation Flow – II

Traditional Compilation Flow

Software is Compiled to IR using CompilerFront-end.

Target-specific Backend optimizes the IR.

An annotation pass annotates the Cross-IRà Equivalent CFG (Control Flow Graph)

What can we do for VLIW Machines ?

Source Level Simulation?sequential vs. parallel instructions.

IR Level Simulation?Requires a retargetable compiler e.g. LLVM

Source code may not be availableà Binary Translation for Native Simulation?

Static translation is a better match i.e.Explicit ILP in VLIW.Generated code could be optimized.

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 6 / 26

Page 10: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Problem Definition – Simulation of VLIW on Native Machines

Superscalar vs. VLIW Processors

VLIW: A Simplified Superscalar

No Reservation Stations or ROBsà No Dynamic Scheduling.

Static Scheduling à Compile-TimeILP Specification.

VLIW: Still A Complex Architecture !

Parallel Instruction Execution i.e.Execute Packets à Data Hazards.

Complex Pipelinesà Data + Control Hazards.

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 7 / 26

Page 11: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Problem Definition – Simulation of VLIW on Native Machines

Superscalar vs. VLIW Processors

VLIW: A Simplified Superscalar

No Reservation Stations or ROBsà No Dynamic Scheduling.

Static Scheduling à Compile-TimeILP Specification.

VLIW: Still A Complex Architecture !

Parallel Instruction Execution i.e.Execute Packets à Data Hazards.

Complex Pipelinesà Data + Control Hazards.

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 7 / 26

Page 12: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Problem Definition – Simulation of VLIW on Native Machines

VLIW Processors Features and Translation Issues (TI C6x Series)

Key Features – VLIW Processors

NOP Instructions

Delay Slots à Out-of-Order Completion.

No Pipeline FlushingInstruction Fetch à Instruction Execution.

Key Issues – Binary Translation

Data Hazards (RAW, WAR, WAW).

Control Hazards (Nested Branches).

Early Termination e.g. Multi-Cycle NOPs.

Side Effects i.e. Modification of SourceOperands.

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 8 / 26

Page 13: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Problem Definition – Simulation of VLIW on Native Machines

Address Translation + Indirect Branches

Data, Instruction and I/O Memory Accesses

Exploit memory virtualization capabilities provided by VMM.

Transfer I/O accesses to SystemC platform.

Indirect Branch Instructions

No dynamic translation support à Resort to static translation.

Provide multi-level translations i.e. Basic Blocks and Execute Packets.

Hand-Written and Self-Modifying Code

Branch Instructions targeting non-startup instructions in Execute Packets.Usually VLIW compilers do not produce such code à Not Addressed.

Presence of Pointers and Dynamic Linking. No very common in VLIWbinaries à Not Addressed.

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 9 / 26

Page 14: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Proposed Solution – Native Simulation of VLIW ISA using SBT and HAV

Table of Contents

1 Introduction and Motivation

2 Problem Definition – Simulation of VLIW on Native Machines

3 Proposed Solution – Native Simulation of VLIW ISA using SBT and HAV

4 Experiments and Results

5 Summary and Conclusions

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 10 / 26

Page 15: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Proposed Solution – Native Simulation of VLIW ISA using SBT and HAV

A Generic Approach Illustrated using LLVM

Translation Flow – RISC Machines

1 Target-specific instruction decoders.

2 RISC-specific Basic Block construction.

3 Target ISA functional specification in C.

4 Target-independent intermediate code generation.

5 Native compilation using native backend.

Translation Flow – VLIW Machines

Instruction decoders à VLIW packet decoders.

VLIW-specific Basic Block construction.

à Better suited to VLIW i.e. Minimal translationunit is an Execute Packet (Upto 8 Instructions).

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 11 / 26

Page 16: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Proposed Solution – Native Simulation of VLIW ISA using SBT and HAV

A Generic Approach Illustrated using LLVM

Translation Flow – RISC Machines

1 Target-specific instruction decoders.

2 RISC-specific Basic Block construction.

3 Target ISA functional specification in C.

4 Target-independent intermediate code generation.

5 Native compilation using native backend.

Translation Flow – VLIW Machines

Instruction decoders à VLIW packet decoders.

VLIW-specific Basic Block construction.

à Better suited to VLIW i.e. Minimal translationunit is an Execute Packet (Upto 8 Instructions).

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 11 / 26

Page 17: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Proposed Solution – Native Simulation of VLIW ISA using SBT and HAV

VLIW Packet Decoder + Target Basic Blocks

VLIW Packet Decoding

Decodes target instructions à Generates in-memory Instruction Objects.Each object contains target-specific details (Predicate, Operands Types,Values, Delay Slots etc.)Each object can generate a Function Call in LLVM-IR (Architecture +Instruction + Operand Types)

Extract Parallelism from instruction stream à Execute Packets.

Branch Analysis à Mark statically known Branch Targets.

Basic Block Construction

Start a New Basic Block for each statically known Branch Target.

End at Branch Instruction + Execute Packets within its Delay Slot Range.

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 12 / 26

Page 18: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Proposed Solution – Native Simulation of VLIW ISA using SBT and HAV

VLIW ISA Functional Specifications – I

Target-specific Instruction Behavior in LLVM-IR

When and How to modify the Register, Memory or Control state of CPU.

We require ISA behavior in LLVM-IR for Composing Intermediate Code.

Target-specific Instruction Behavior in C

Defined in ’Simple’ C and converted to LLVM-IR using LLVM CompilerFront-End.

Multiple ISA behavior definitions i.e. Exhaustively representing AllOperand Type combinations à Simple and Easy to Generate.

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 13 / 26

Page 19: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Proposed Solution – Native Simulation of VLIW ISA using SBT and HAV

VLIW ISA Functional Specifications – II

An ISA Example: MPYSU Instruction in ’C’

1/// MPYSU - Multiply Signed 16 LSB and Unsigned 16 LSB.

2ReturnStatus_t

3C62xMPYSU_SC5_UR16_SR32(C62x_DSPState_t * p_state , uint8_t is_cond ,

4uint8_t be_zero , uint16_t idx_rc , uint32_t constant , uint16_t idx_rb ,

5uint16_t idx_rd , uint8_t delay , C62x_Result_t * result){

6if(Check_Predicate(p_state , is_cond , be_zero , idx_rc))

7{

8int16_t ra = C6XSC5_TO_S16(constant);

9uint16_t rb = GET_LSB16(p_state ->m_reg[idx_rb ]);

10int32_t rd = ra * rb;

1112SAVE_REG_RESULT(result , idx_rd , rd);

13}

14return OK;

15}

Key Elements

Naming Convention: C62xMPYSU_SC5_UR16_SR32(...)

Behavior Specification: int32_t rd = ra * rb;

Result on Parent’s Stack: C62x_Result_t * result à Life time + Scope.

Return Value: OK à Instruction does not require special processing.à Early Termination, Wait-for-Interrupt, Error Condition etc..

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 14 / 26

Page 20: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Proposed Solution – Native Simulation of VLIW ISA using SBT and HAV

Intermediate Code Generation

IR Function Composition

One Entry / Return Basic Block foreach Target Basic Block.

Core and Update Basic Block Pairfor each Target Execute Packet.

Control Flows between Generated IRBasic Blocks.

Core Basic Blocks

C1 Stack Memory AllocationInstructions for ISA Results.

C2 Calls to ISA Behavior in LLVM-IR.

C3 No Delay Slots à Immediate UpdateDelay Slots à Buffered Update;Handles Data Hazards + Side-Effects

C4 Instructions for testing ISA ReturnValues e.g. Early Termination.

Update Basic Blocks

U1 Update Processor StateRegisters including PC, Cyclesand Buffered Results.

U2 If RPCT 6= 0 à Branch Taken;Pass control to Software KernelHandles Nested Branches

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 15 / 26

Page 21: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Proposed Solution – Native Simulation of VLIW ISA using SBT and HAV

Intermediate Code Generation

IR Function Composition

One Entry / Return Basic Block foreach Target Basic Block.

Core and Update Basic Block Pairfor each Target Execute Packet.

Control Flows between Generated IRBasic Blocks.

Core Basic Blocks

C1 Stack Memory AllocationInstructions for ISA Results.

C2 Calls to ISA Behavior in LLVM-IR.

C3 No Delay Slots à Immediate UpdateDelay Slots à Buffered Update;Handles Data Hazards + Side-Effects

C4 Instructions for testing ISA ReturnValues e.g. Early Termination.

Update Basic Blocks

U1 Update Processor StateRegisters including PC, Cyclesand Buffered Results.

U2 If RPCT 6= 0 à Branch Taken;Pass control to Software KernelHandles Nested Branches

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 15 / 26

Page 22: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Proposed Solution – Native Simulation of VLIW ISA using SBT and HAV

Intermediate Code Generation

IR Function Composition

One Entry / Return Basic Block foreach Target Basic Block.

Core and Update Basic Block Pairfor each Target Execute Packet.

Control Flows between Generated IRBasic Blocks.

Core Basic Blocks

C1 Stack Memory AllocationInstructions for ISA Results.

C2 Calls to ISA Behavior in LLVM-IR.

C3 No Delay Slots à Immediate UpdateDelay Slots à Buffered Update;Handles Data Hazards + Side-Effects

C4 Instructions for testing ISA ReturnValues e.g. Early Termination.

Update Basic Blocks

U1 Update Processor StateRegisters including PC, Cyclesand Buffered Results.

U2 If RPCT 6= 0 à Branch Taken;Pass control to Software KernelHandles Nested Branches

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 15 / 26

Page 23: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Proposed Solution – Native Simulation of VLIW ISA using SBT and HAV

Intermediate Code Generation

IR Function Composition

One Entry / Return Basic Block foreach Target Basic Block.

Core and Update Basic Block Pairfor each Target Execute Packet.

Control Flows between Generated IRBasic Blocks.

Core Basic Blocks

C1 Stack Memory AllocationInstructions for ISA Results.

C2 Calls to ISA Behavior in LLVM-IR.

C3 No Delay Slots à Immediate UpdateDelay Slots à Buffered Update;Handles Data Hazards + Side-Effects

C4 Instructions for testing ISA ReturnValues e.g. Early Termination.

Update Basic Blocks

U1 Update Processor StateRegisters including PC, Cyclesand Buffered Results.

U2 If RPCT 6= 0 à Branch Taken;Pass control to Software KernelHandles Nested Branches

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 15 / 26

Page 24: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Proposed Solution – Native Simulation of VLIW ISA using SBT and HAV

Intermediate Code Generation

IR Function Composition

One Entry / Return Basic Block foreach Target Basic Block.

Core and Update Basic Block Pairfor each Target Execute Packet.

Control Flows between Generated IRBasic Blocks.

Core Basic Blocks

C1 Stack Memory AllocationInstructions for ISA Results.

C2 Calls to ISA Behavior in LLVM-IR.

C3 No Delay Slots à Immediate UpdateDelay Slots à Buffered Update;Handles Data Hazards + Side-Effects

C4 Instructions for testing ISA ReturnValues e.g. Early Termination.

Update Basic Blocks

U1 Update Processor StateRegisters including PC, Cyclesand Buffered Results.

U2 If RPCT 6= 0 à Branch Taken;Pass control to Software KernelHandles Nested Branches

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 15 / 26

Page 25: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Proposed Solution – Native Simulation of VLIW ISA using SBT and HAV

Intermediate Code Generation

IR Function Composition

One Entry / Return Basic Block foreach Target Basic Block.

Core and Update Basic Block Pairfor each Target Execute Packet.

Control Flows between Generated IRBasic Blocks.

Core Basic Blocks

C1 Stack Memory AllocationInstructions for ISA Results.

C2 Calls to ISA Behavior in LLVM-IR.

C3 No Delay Slots à Immediate UpdateDelay Slots à Buffered Update;Handles Data Hazards + Side-Effects

C4 Instructions for testing ISA ReturnValues e.g. Early Termination.

Update Basic Blocks

U1 Update Processor StateRegisters including PC, Cyclesand Buffered Results.

U2 If RPCT 6= 0 à Branch Taken;Pass control to Software KernelHandles Nested Branches

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 15 / 26

Page 26: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Proposed Solution – Native Simulation of VLIW ISA using SBT and HAV

Intermediate Code Generation

IR Function Composition

One Entry / Return Basic Block foreach Target Basic Block.

Core and Update Basic Block Pairfor each Target Execute Packet.

Control Flows between Generated IRBasic Blocks.

Core Basic Blocks

C1 Stack Memory AllocationInstructions for ISA Results.

C2 Calls to ISA Behavior in LLVM-IR.

C3 No Delay Slots à Immediate UpdateDelay Slots à Buffered Update;Handles Data Hazards + Side-Effects

C4 Instructions for testing ISA ReturnValues e.g. Early Termination.

Update Basic Blocks

U1 Update Processor StateRegisters including PC, Cyclesand Buffered Results.

U2 If RPCT 6= 0 à Branch Taken;Pass control to Software KernelHandles Nested Branches

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 15 / 26

Page 27: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Proposed Solution – Native Simulation of VLIW ISA using SBT and HAV

Native Memory Accesses using Memory Virtualization

Extended Target Address Space

All Memory Accesses are in Target Address Space; Thanks toHardware-Assisted Memory Virtualization.

Native Binary Size > VLIW Binary Size à Extended Target Address Space.

Simulation Flow

1 Initialize Platform.

2 Load Bootstrap Code +Native Binary.

3 Boot KVM CPUs.

4 Load Target VLIW Binary.

5 Continue Simulation.

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 16 / 26

Page 28: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Proposed Solution – Native Simulation of VLIW ISA using SBT and HAV

Native Memory Accesses using Memory Virtualization

Extended Target Address Space

All Memory Accesses are in Target Address Space; Thanks toHardware-Assisted Memory Virtualization.

Native Binary Size > VLIW Binary Size à Extended Target Address Space.

Simulation Flow

1 Initialize Platform.

2 Load Bootstrap Code +Native Binary.

3 Boot KVM CPUs.

4 Load Target VLIW Binary.

5 Continue Simulation.

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 16 / 26

Page 29: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Proposed Solution – Native Simulation of VLIW ISA using SBT and HAV

Native Memory Accesses using Memory Virtualization

Extended Target Address Space

All Memory Accesses are in Target Address Space; Thanks toHardware-Assisted Memory Virtualization.

Native Binary Size > VLIW Binary Size à Extended Target Address Space.

Simulation Flow

1 Initialize Platform.

2 Load Bootstrap Code +Native Binary.

3 Boot KVM CPUs.

4 Load Target VLIW Binary.

5 Continue Simulation.

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 16 / 26

Page 30: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Proposed Solution – Native Simulation of VLIW ISA using SBT and HAV

Native Memory Accesses using Memory Virtualization

Extended Target Address Space

All Memory Accesses are in Target Address Space; Thanks toHardware-Assisted Memory Virtualization.

Native Binary Size > VLIW Binary Size à Extended Target Address Space.

Simulation Flow

1 Initialize Platform.

2 Load Bootstrap Code +Native Binary.

3 Boot KVM CPUs.

4 Load Target VLIW Binary.

5 Continue Simulation.

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 16 / 26

Page 31: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Proposed Solution – Native Simulation of VLIW ISA using SBT and HAV

Native Memory Accesses using Memory Virtualization

Extended Target Address Space

All Memory Accesses are in Target Address Space; Thanks toHardware-Assisted Memory Virtualization.

Native Binary Size > VLIW Binary Size à Extended Target Address Space.

Simulation Flow

1 Initialize Platform.

2 Load Bootstrap Code +Native Binary.

3 Boot KVM CPUs.

4 Load Target VLIW Binary.

5 Continue Simulation.

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 16 / 26

Page 32: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Proposed Solution – Native Simulation of VLIW ISA using SBT and HAV

Native Memory Accesses using Memory Virtualization

Extended Target Address Space

All Memory Accesses are in Target Address Space; Thanks toHardware-Assisted Memory Virtualization.

Native Binary Size > VLIW Binary Size à Extended Target Address Space.

Simulation Flow

1 Initialize Platform.

2 Load Bootstrap Code +Native Binary.

3 Boot KVM CPUs.

4 Load Target VLIW Binary.

5 Continue Simulation.

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 16 / 26

Page 33: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Proposed Solution – Native Simulation of VLIW ISA using SBT and HAV

Multiple Code Generation/Translation Levels

Which Translation Level and Why / Why Not ?

Each Execute Packet à Slower Simulation (Switching in Software Kernel)

Basic Blocks Only à Dynamic Translation Support ? (Indirect Branches)

Basic Blocks + Execute Packets à Fast Simulation but Redundant Code.

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 17 / 26

Page 34: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Proposed Solution – Native Simulation of VLIW ISA using SBT and HAV

Multiple Code Generation/Translation Levels

Which Translation Level and Why / Why Not ?

Each Execute Packet à Slower Simulation (Switching in Software Kernel)

Basic Blocks Only à Dynamic Translation Support ? (Indirect Branches)

Basic Blocks + Execute Packets à Fast Simulation but Redundant Code.

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 17 / 26

Page 35: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Proposed Solution – Native Simulation of VLIW ISA using SBT and HAV

Multiple Code Generation/Translation Levels

Which Translation Level and Why / Why Not ?

Each Execute Packet à Slower Simulation (Switching in Software Kernel)

Basic Blocks Only à Dynamic Translation Support ? (Indirect Branches)

Basic Blocks + Execute Packets à Fast Simulation but Redundant Code.

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 17 / 26

Page 36: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Proposed Solution – Native Simulation of VLIW ISA using SBT and HAV

Multiple Code Generation/Translation Levels

Code Generation Modes – Summary

Generation Mode Execute Packets Basic Blocks Hybrid (BB+EP)

Simulation Speed Slow Medium Fast

Simulator Size Medium Small Large

Dynamic Translations Not Required Required Not Required

H/W Synchronization Per EP Per EP/BB Per EP/BB

Self Modifying Code Support No Yes No

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 18 / 26

Page 37: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Proposed Solution – Native Simulation of VLIW ISA using SBT and HAV

Multiple Code Generation/Translation Levels

Code Generation Modes – Summary

Generation Mode Execute Packets Basic Blocks Hybrid (BB+EP)

Simulation Speed Slow Medium Fast

Simulator Size Medium Small Large

Dynamic Translations Not Required Required Not Required

H/W Synchronization Per EP Per EP/BB Per EP/BB

Self Modifying Code Support No Yes No

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 18 / 26

Page 38: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Proposed Solution – Native Simulation of VLIW ISA using SBT and HAV

Multiple Code Generation/Translation Levels

Code Generation Modes – Summary

Generation Mode Execute Packets Basic Blocks Hybrid (BB+EP)

Simulation Speed Slow Medium Fast

Simulator Size Medium Small Large

Dynamic Translations Not Required Required Not Required

H/W Synchronization Per EP Per EP/BB Per EP/BB

Self Modifying Code Support No Yes No

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 18 / 26

Page 39: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Experiments and Results

Table of Contents

1 Introduction and Motivation

2 Problem Definition – Simulation of VLIW on Native Machines

3 Proposed Solution – Native Simulation of VLIW ISA using SBT and HAV

4 Experiments and Results

5 Summary and Conclusions

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 19 / 26

Page 40: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Experiments and Results

Experimental Setup and Benchmarks

Test Kernels – Control and Compute Intensive

Fibonacci Index – Recursive

Factorial Index – Recursive

IDCT Block Decoding

Benchmark / Reference Simulators

TI C6x Full Cycle Accurate simulator (TI-C6x-FCA)

TI C6x Device Functional Cycle Accurate simulator (TI-C6x-DFCA)

Modest Host Machine for Experimentation

Pentium(R) Dual-Core CPU E5300 (2.60 GHz, 2M Cache) + 2 GB RAM.

Linux version 2.6.32-37 32-bit (SMP)

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 20 / 26

Page 41: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Experiments and Results

Experimental Results – Fibonacci Index

0.0001

0.001

0.01

0.1

1

10

100

1000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

Sim

ula

tio

n T

ime

(S

eco

nd

s)

Fibonacci Index

ExecutePacket (EP)Hybrid (BB+EP)

Hybrid+LMaps+OptHybrid+Hash+Opt

TI-C6x-DFCATI-C6x-FCA

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 21 / 26

Page 42: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Experiments and Results

Experimental Results

0.1

1

10

100

1000

1 2 3 4 5 6 7 8 9 10 11 12 13 14

Sim

ula

tion T

ime (

Seconds)

Factorial Index

ExecutePacket (EP)Hybrid (BB+EP)

Hybrid+LMaps+OptHybrid+Hash+Opt

TI-C6x-DFCATI-C6x-FCA

0.001

0.01

0.1

1

10

100

1 5 10 15 20 25 30 35 40

Sim

ula

tion T

ime (

Seconds)

IDCT -- Number of Blocks

ExecutePacket (EP)Hybrid (BB+EP)

Hybrid+LMaps+OptHybrid+Hash+Opt

TI-C6x-DFCATI-C6x-FCA

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 22 / 26

Page 43: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Experiments and Results

Experimental Results – Summary

0.0001

0.001

0.01

0.1

1

10

100

1000

Fibonacci(Index=30) Factorial(Index=14,100K) IDCT(Blocks=40)

Sim

ula

tio

n T

ime

(S

eco

nd

s)

Benchmark Application

DirectHostNative

TI-C6x-FCATI-C6x-DFCA

Static-BT

Average Speedups/Slowdowns of SBT-Based Simulation

ApplicationC6x-FCA C6x-DFCA Native DirectHost

Speedup Speedup Slowdown Slowdown

Fibonacci 159x 39x 90x 101x

Factorial 132x 33x 205x 220x

IDCT 129x 31x 133x 141x

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 23 / 26

Page 44: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Summary and Conclusions

Table of Contents

1 Introduction and Motivation

2 Problem Definition – Simulation of VLIW on Native Machines

3 Proposed Solution – Native Simulation of VLIW ISA using SBT and HAV

4 Experiments and Results

5 Summary and Conclusions

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 24 / 26

Page 45: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Summary and Conclusions

Final Remarks

Summary

A flow for Static Translation of VLIW Binaries to Native Code.

Functionally Identical to TI Simulators; Verified by Trace Comparison.

Profits from LLVM Infrastructure Components à Optimized Native Code.

Limitation and Overhead

Completely Static à Does not support Basic Block only simulation.

Hybrid Translation mode à Redundancy in Translated Code.

Future Directions

Automatic Generation of VLIW Instruction Decoders and ISA Behavior.

Performance Estimation of Complex Benchmark Applications.

Reducing the VLIW Architecture Modeling Overheads in Translated Code.

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 25 / 26

Page 46: Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware

Questions & Answers

Thanks for Your Attention !

Questions ?

Mian-Muhammad Hamayun (TIMA Laboratory) ASPDAC 2013 January 25th, 2013 26 / 26