A Universal Parallel Front End for Execution Driven ...€¦ · A Universal Parallel Front End for...

A Universal Parallel Front End for ExecutionDriven Microarchitecture Simulation

Chad D. KerseySudhakar Yalamanchili

Georgia Institute of Technology

Arun RodriguesSandia National Laboratories

Outline

I IntroductionI SimulatorsI Front Ends

I The QSim Simulator Front EndI API OverviewI How is QSim “Universal”?I How is it Parallel?I How does it Perform?I How are QEMU and QSim Related?

I Back Ends

I Summary

I Acknowledgements

Outline

I Introduction

I SimulatorsI Front Ends

I Back Ends

I Summary

I Acknowledgements

Outline

I IntroductionI Simulators

I Front Ends

I Back Ends

I Summary

I Acknowledgements

Outline

I Back Ends

I Summary

I Acknowledgements

Outline

I The QSim Simulator Front End

I API OverviewI How is QSim “Universal”?I How is it Parallel?I How does it Perform?I How are QEMU and QSim Related?

I Back Ends

I Summary

I Acknowledgements

Outline

I The QSim Simulator Front EndI API Overview

I How is QSim “Universal”?I How is it Parallel?I How does it Perform?I How are QEMU and QSim Related?

I Back Ends

I Summary

I Acknowledgements

Outline

I The QSim Simulator Front EndI API OverviewI How is QSim “Universal”?

I How is it Parallel?I How does it Perform?I How are QEMU and QSim Related?

I Back Ends

I Summary

I Acknowledgements

Outline

I The QSim Simulator Front EndI API OverviewI How is QSim “Universal”?I How is it Parallel?

I How does it Perform?I How are QEMU and QSim Related?

I Back Ends

I Summary

I Acknowledgements

Outline

I The QSim Simulator Front EndI API OverviewI How is QSim “Universal”?I How is it Parallel?I How does it Perform?

I How are QEMU and QSim Related?

I Back Ends

I Summary

I Acknowledgements

Outline

I Back Ends

I Summary

I Acknowledgements

Outline

I Back Ends

I Summary

I Acknowledgements

Outline

I Back Ends

I Summary

I Acknowledgements

Outline

I Back Ends

I Summary

I Acknowledgements

Introduction

Introduction– The Ubiquity of Simulation

I Simulation is a requirement of architecture research.

I Few architecture researchers have access to the resourcesneeded to create full-scale prototypes.

I Those with the resources would prefer not to spend thembuilding incremental prototypes.

I Even if they would, the turn-around time for building a newCPU, even using pre-designed components would be very long.

Introduction– The Simulation Gap

Capacity

Demand

tion C

Reasons for the simulation gap:I Parallel simulation is hard, so we use serial simulators for

parallel machines.I Developments in computer architecture tend to be additive,

but we keep building simulators from scratch.

Capacity

Demand

tion C

Reasons for the simulation gap:

I Parallel simulation is hard, so we use serial simulators forparallel machines.

I Developments in computer architecture tend to be additive,but we keep building simulators from scratch.

Capacity

Demand

tion C

parallel machines.

I Developments in computer architecture tend to be additive,but we keep building simulators from scratch.

Capacity

Demand

tion C

parallel machines.I Developments in computer architecture tend to be additive,

but we keep building simulators from scratch.

Introduction– Narrowing the Gap

Ways to narrow the simulation gap:

I Spend less time researching architecture and more timedeveloping simulators?

I Probably would not be well-received by the architecturecommunity.

I Increase simulator throughput so more simulations can be runin a reasonable amount of time.

I Parallelize them.

I Find ways to make simulator development more efficient.

If we make simulator development more efficient, we increase therate at which simulation capacity can grow.

Ways to narrow the simulation gap:I Spend less time researching architecture and more time

developing simulators?

I Probably would not be well-received by the architecturecommunity.

I Parallelize them.

developing simulators?I Probably would not be well-received by the architecture

community.

I Parallelize them.

community.

I Parallelize them.

community.

I Parallelize them.

community.

I Parallelize them.

community.

I Parallelize them.

Introduction– Front Ends

What is a front end?

I Most simulators are broken into a front end and a back endby their designers.

I The front end handles the execution of instructions (makingsure the register state is correct).

I Because instruction sets are very complex, front ends areusually created by using and modifying an existing emulationsolution or avoiding emulation entirely and tracing nativeexecution.

I The back end handles timing, power, and other metrics (howlong did that instruction take to clear the pipeline).

I Back ends are the part that implements the logic that makesa simulator unique.

Introduction– The Ideal Front End

Writer

Reader

TraceBack−End

Results

Back−End

Results

Emulator

Ideally:

I Each front end and back end must be written only once, afterwhich they can be used in any combination, like compier frontends and back ends.

I No additional code would need to be written to adapt generalpurpose emulators for simulation duty.

Writer

Reader

TraceBack−End

Results

Back−End

Results

Emulator

Ideally:

Writer

Reader

TraceBack−End

Results

Back−End

Results

Emulator

Ideally:

Introduction– Real Front Ends

WriterTrace

Reader

Trace Internal

APIBack−End

Results

Internal

Custom

ShimEmulator Back−End

Results

Compatibility

Layers

Realistically:

I Much custom code needs to be written to adapt mostoff-the-shelf emulators as simulator front ends.

I Each time this is done, yet another simulator-specific frontend is created.

WriterTrace

Reader

Trace Internal

APIBack−End

Results

Internal

Custom

Results

Compatibility

Layers

Realistically:

WriterTrace

Reader

Trace Internal

APIBack−End

Results

Internal

Custom

Results

Compatibility

Layers

Realistically:

How do we make new simulator front ends closer to the ideal?

I Provide an API that does not change unnecessarily beacuse ofthe type of front end or the instruction set.

I Enable the construction of multithreaded simulators.

I Provide sufficient control and detail in the API to make ituseable with most back ends.

The QSim Simulator Front End

I We have built a simlator front end based on QEMU1 thataims to address these issues. It currently:

I Runs unmodified binaries on a lightly-modified Linux guest.I Enables the construction of multithreaded simulators.I Has full support for 32-bit x86 guests. (Port to 64-bit x86

weeks from completion; port to ARM in early coding stages.)I Enables parallel simulation.

1http://www.qemu.org

I Runs unmodified binaries on a lightly-modified Linux guest.

I Enables the construction of multithreaded simulators.I Has full support for 32-bit x86 guests. (Port to 64-bit x86

I Runs unmodified binaries on a lightly-modified Linux guest.I Enables the construction of multithreaded simulators.

I Has full support for 32-bit x86 guests. (Port to 64-bit x86weeks from completion; port to ARM in early coding stages.)

I Enables parallel simulation.

weeks from completion; port to ARM in early coding stages.)

I Enables parallel simulation.

QSim– API Overview

Set callback

Unset callback

RunBack End

Callbacks

A simplified diagram of the QSim API.

run(i , j) Advance guest CPU i by j instructions.

set * callback(x) Set callbacks.

unset * callback(h) Unset callbacks by handle.

Callback types include: instruction, register access, memory access,interrupt

QSim– How is it “Universal”?

Typical emulator used as a simulator front end:

Internal

Custom

Results

I Off-the-shelf open-source emulators like QEMU provide mostof the code needed to build a front end but are incomplete.

I Simulation projects like PTLSim and FAST have modifiedQEMU heavily to create their front ends.

Internal

Custom

Results

Internal

Custom

Results

Back−End

Results

Custom

ShimEmulator

I QSim has done this yet again, but with compatibility with adiverse set of back ends in mind.

I Similarly, the API is the same no matter what the instructionset.

Back−End

Results

Custom

ShimEmulator

Back−End

Results

Custom

ShimEmulator

QSim– Parallelism

I The run() function can be called simultaneously for twodifferent guest CPUs from different host threads.

I This enables parallel emulation of multithreaded guests, aswell as multithreaded back ends.

I Since back ends tend to use more CPU time than front ends,thread safety is more important than parallel emulation (bothare provided by QSim).

I Up to 512 guest cores have been demonstrated running on upto 512 host threads.

QSim– Parallelism

QSim– Software Architecture

QSim– Parallel Scaling

QSim– Performance

The following represents the performance of QSim with emptycallbacks. With typical simulation speeds measured in thousands ofinstructions per second, QSim will not likely be the bottleneck.

Benchmark Slowdown MIPSswaptions 259x 18.5

mtgl-bfs 387x 36.6

ocean-non-contig 267x 40.7

QSim– Relation to QEMU

How are QEMU and QSim related?

QEMU QSim

Emulator Simulator Front-EndStandalone Built on QEMUFull-system CPU and RAM only

CPUs serialized CPUs in parallelProgram Library

QEMU QSimEmulator Simulator Front-End

Standalone Built on QEMUFull-system CPU and RAM only

Standalone Built on QEMU

Full-system CPU and RAM onlyCPUs serialized CPUs in parallel

Program Library

CPUs serialized CPUs in parallel

Program Library

Back Ends

Back Ends– The Canonical Example

Host Thread 3Host Thread 2Host Thread 1

Parallel Discrete Event Simulation Engine

Component

. . . Other

Components

This is the kind of simulation QSim was designed for.

I QSim feeds instructions into CPU timing models that are partof a larger simulation infrastructure.

I A parallel discrete event simulation engine keeps track ofevents and ensures correctness.

Component

. . . Other

Components

Component

. . . Other

Components

Back Ends– Others

Other back ends that have been built for QSim include:

I A binary trace writer, which was built along with a tracereader library that exports the QSim API.

I A serial universal processor emulator, simplesim.I A demonstration vehicle; breaks instructions into plausible

micro-ops regardless of instruction set.

I An interactive OS/application debugger.

I Visualization utilities.

Back Ends– Others

I A serial universal processor emulator, simplesim.

I A demonstration vehicle; breaks instructions into plausiblemicro-ops regardless of instruction set.

Back Ends– Others

Summary

This has been:

I An appeal for consistent front end/back end APIs.

I A plea for parallel front ends.

I A look at how we might narrow the simulation gap.

I An invitation to try QSim2.

2http://www.cdkersey.com/qsim-web/

Summary

This has been:

Summary

This has been:

Summary

This has been:

Summary

This has been:

Acknowledgements

The authors are grateful to Paolo Faraboschi and Daniel Ortega fortheir suggestions and guidance in getting QSim started. This workwas supported by the National Science Foundation under grantCNS855110, Sandia National Laboratories, and HP Laboratories.

A Universal Parallel Front End for Execution Driven ...€¦ · A Universal Parallel Front End for...

Documents

Transcript of A Universal Parallel Front End for Execution Driven ...€¦ · A Universal Parallel Front End for...

Twp Parallel Execution Fundamentals 133639

Parallel Thread Execution ISA · Parallel Thread Execution ISA v6.0 | ii TABLE OF CONTENTS Chapter 1. Introduction.....1

UNIVERSAL PARALLEL GRIPPER

PARALLEL SPARQL QUERY EXECUTION USING APACHE SPARK

Parallel Program Execution and Architecture Models with ...

Enabling Parallel Execution via Principled Speculation.

Parallel Execution Plans

Ciel universal distributed execution engine

Globally Precise-restartable Execution of Parallel Programs

Universal Parallel Computing Research Center

Introduction to Parallel Execution

Dapper The Distributed and Parallel Program Execution Runtimecarsomyr.github.io/dapper/manual.pdf · Dapper, standing for Distributed and Parallel Program Execution Runtime, is one

Parallel Execution Fundamentals in Oracle Database

Introduction of Parallel Program Execution and ...

ViennaX: a parallel plugin execution framework for ... · ViennaX: a parallel plugin execution framework for scientiﬁc computing ... focus on the data parallel approach based on

PTX: Parallel Thread Execution ISA Version 1

A Universal Parallel Front End for Execution Driven ...casl.gatech.edu/wp-content/uploads/2012/07/qsim-slides.pdf · A Universal Parallel Front End for Execution Driven Microarchitecture

Parallel Execution with Oracle Database 10g Release 2 · Parallel Execution with Oracle Database 10g Release 2 EXECUTIVE OVERVIEW Parallel execution is one of the fundamental database

Amalgamation of BDD, parallel execution and mobile automation

Introduction of Parallel Program Execution and ... · PDF fileIntroduction of Parallel Program Execution and Architecture Models ... Burroughs B5000 Project ... Barton’s DDM1