A Universal Parallel Front End for Execution Driven ...€¦ · A Universal Parallel Front End for...

Post on 17-Jul-2020

17 views 0 download

Transcript of A Universal Parallel Front End for Execution Driven ...€¦ · A Universal Parallel Front End for...

A Universal Parallel Front End for ExecutionDriven Microarchitecture Simulation

Chad D. KerseySudhakar Yalamanchili

Georgia Institute of Technology

Arun RodriguesSandia National Laboratories

Outline

I IntroductionI SimulatorsI Front Ends

I The QSim Simulator Front EndI API OverviewI How is QSim “Universal”?I How is it Parallel?I How does it Perform?I How are QEMU and QSim Related?

I Back Ends

I Summary

I Acknowledgements

Outline

I Introduction

I SimulatorsI Front Ends

I The QSim Simulator Front EndI API OverviewI How is QSim “Universal”?I How is it Parallel?I How does it Perform?I How are QEMU and QSim Related?

I Back Ends

I Summary

I Acknowledgements

Outline

I IntroductionI Simulators

I Front Ends

I The QSim Simulator Front EndI API OverviewI How is QSim “Universal”?I How is it Parallel?I How does it Perform?I How are QEMU and QSim Related?

I Back Ends

I Summary

I Acknowledgements

Outline

I IntroductionI SimulatorsI Front Ends

I The QSim Simulator Front EndI API OverviewI How is QSim “Universal”?I How is it Parallel?I How does it Perform?I How are QEMU and QSim Related?

I Back Ends

I Summary

I Acknowledgements

Outline

I IntroductionI SimulatorsI Front Ends

I The QSim Simulator Front End

I API OverviewI How is QSim “Universal”?I How is it Parallel?I How does it Perform?I How are QEMU and QSim Related?

I Back Ends

I Summary

I Acknowledgements

Outline

I IntroductionI SimulatorsI Front Ends

I The QSim Simulator Front EndI API Overview

I How is QSim “Universal”?I How is it Parallel?I How does it Perform?I How are QEMU and QSim Related?

I Back Ends

I Summary

I Acknowledgements

Outline

I IntroductionI SimulatorsI Front Ends

I The QSim Simulator Front EndI API OverviewI How is QSim “Universal”?

I How is it Parallel?I How does it Perform?I How are QEMU and QSim Related?

I Back Ends

I Summary

I Acknowledgements

Outline

I IntroductionI SimulatorsI Front Ends

I The QSim Simulator Front EndI API OverviewI How is QSim “Universal”?I How is it Parallel?

I How does it Perform?I How are QEMU and QSim Related?

I Back Ends

I Summary

I Acknowledgements

Outline

I IntroductionI SimulatorsI Front Ends

I The QSim Simulator Front EndI API OverviewI How is QSim “Universal”?I How is it Parallel?I How does it Perform?

I How are QEMU and QSim Related?

I Back Ends

I Summary

I Acknowledgements

Outline

I IntroductionI SimulatorsI Front Ends

I The QSim Simulator Front EndI API OverviewI How is QSim “Universal”?I How is it Parallel?I How does it Perform?I How are QEMU and QSim Related?

I Back Ends

I Summary

I Acknowledgements

Outline

I IntroductionI SimulatorsI Front Ends

I The QSim Simulator Front EndI API OverviewI How is QSim “Universal”?I How is it Parallel?I How does it Perform?I How are QEMU and QSim Related?

I Back Ends

I Summary

I Acknowledgements

Outline

I IntroductionI SimulatorsI Front Ends

I The QSim Simulator Front EndI API OverviewI How is QSim “Universal”?I How is it Parallel?I How does it Perform?I How are QEMU and QSim Related?

I Back Ends

I Summary

I Acknowledgements

Outline

I IntroductionI SimulatorsI Front Ends

I The QSim Simulator Front EndI API OverviewI How is QSim “Universal”?I How is it Parallel?I How does it Perform?I How are QEMU and QSim Related?

I Back Ends

I Summary

I Acknowledgements

Introduction

Introduction– The Ubiquity of Simulation

I Simulation is a requirement of architecture research.

I Few architecture researchers have access to the resourcesneeded to create full-scale prototypes.

I Those with the resources would prefer not to spend thembuilding incremental prototypes.

I Even if they would, the turn-around time for building a newCPU, even using pre-designed components would be very long.

Introduction– The Ubiquity of Simulation

I Simulation is a requirement of architecture research.

I Few architecture researchers have access to the resourcesneeded to create full-scale prototypes.

I Those with the resources would prefer not to spend thembuilding incremental prototypes.

I Even if they would, the turn-around time for building a newCPU, even using pre-designed components would be very long.

Introduction– The Ubiquity of Simulation

I Simulation is a requirement of architecture research.

I Few architecture researchers have access to the resourcesneeded to create full-scale prototypes.

I Those with the resources would prefer not to spend thembuilding incremental prototypes.

I Even if they would, the turn-around time for building a newCPU, even using pre-designed components would be very long.

Introduction– The Ubiquity of Simulation

I Simulation is a requirement of architecture research.

I Few architecture researchers have access to the resourcesneeded to create full-scale prototypes.

I Those with the resources would prefer not to spend thembuilding incremental prototypes.

I Even if they would, the turn-around time for building a newCPU, even using pre-designed components would be very long.

Introduction– The Simulation Gap

Time

Capacity

Demand

Sim

ula

tion C

om

ple

xity

Reasons for the simulation gap:I Parallel simulation is hard, so we use serial simulators for

parallel machines.I Developments in computer architecture tend to be additive,

but we keep building simulators from scratch.

Introduction– The Simulation Gap

Time

Capacity

Demand

Sim

ula

tion C

om

ple

xity

Reasons for the simulation gap:

I Parallel simulation is hard, so we use serial simulators forparallel machines.

I Developments in computer architecture tend to be additive,but we keep building simulators from scratch.

Introduction– The Simulation Gap

Time

Capacity

Demand

Sim

ula

tion C

om

ple

xity

Reasons for the simulation gap:I Parallel simulation is hard, so we use serial simulators for

parallel machines.

I Developments in computer architecture tend to be additive,but we keep building simulators from scratch.

Introduction– The Simulation Gap

Time

Capacity

Demand

Sim

ula

tion C

om

ple

xity

Reasons for the simulation gap:I Parallel simulation is hard, so we use serial simulators for

parallel machines.I Developments in computer architecture tend to be additive,

but we keep building simulators from scratch.

Introduction– Narrowing the Gap

Ways to narrow the simulation gap:

I Spend less time researching architecture and more timedeveloping simulators?

I Probably would not be well-received by the architecturecommunity.

I Increase simulator throughput so more simulations can be runin a reasonable amount of time.

I Parallelize them.

I Find ways to make simulator development more efficient.

If we make simulator development more efficient, we increase therate at which simulation capacity can grow.

Introduction– Narrowing the Gap

Ways to narrow the simulation gap:I Spend less time researching architecture and more time

developing simulators?

I Probably would not be well-received by the architecturecommunity.

I Increase simulator throughput so more simulations can be runin a reasonable amount of time.

I Parallelize them.

I Find ways to make simulator development more efficient.

If we make simulator development more efficient, we increase therate at which simulation capacity can grow.

Introduction– Narrowing the Gap

Ways to narrow the simulation gap:I Spend less time researching architecture and more time

developing simulators?I Probably would not be well-received by the architecture

community.

I Increase simulator throughput so more simulations can be runin a reasonable amount of time.

I Parallelize them.

I Find ways to make simulator development more efficient.

If we make simulator development more efficient, we increase therate at which simulation capacity can grow.

Introduction– Narrowing the Gap

Ways to narrow the simulation gap:I Spend less time researching architecture and more time

developing simulators?I Probably would not be well-received by the architecture

community.

I Increase simulator throughput so more simulations can be runin a reasonable amount of time.

I Parallelize them.

I Find ways to make simulator development more efficient.

If we make simulator development more efficient, we increase therate at which simulation capacity can grow.

Introduction– Narrowing the Gap

Ways to narrow the simulation gap:I Spend less time researching architecture and more time

developing simulators?I Probably would not be well-received by the architecture

community.

I Increase simulator throughput so more simulations can be runin a reasonable amount of time.

I Parallelize them.

I Find ways to make simulator development more efficient.

If we make simulator development more efficient, we increase therate at which simulation capacity can grow.

Introduction– Narrowing the Gap

Ways to narrow the simulation gap:I Spend less time researching architecture and more time

developing simulators?I Probably would not be well-received by the architecture

community.

I Increase simulator throughput so more simulations can be runin a reasonable amount of time.

I Parallelize them.

I Find ways to make simulator development more efficient.

If we make simulator development more efficient, we increase therate at which simulation capacity can grow.

Introduction– Narrowing the Gap

Ways to narrow the simulation gap:I Spend less time researching architecture and more time

developing simulators?I Probably would not be well-received by the architecture

community.

I Increase simulator throughput so more simulations can be runin a reasonable amount of time.

I Parallelize them.

I Find ways to make simulator development more efficient.

If we make simulator development more efficient, we increase therate at which simulation capacity can grow.

Introduction– Front Ends

What is a front end?

I Most simulators are broken into a front end and a back endby their designers.

I The front end handles the execution of instructions (makingsure the register state is correct).

I Because instruction sets are very complex, front ends areusually created by using and modifying an existing emulationsolution or avoiding emulation entirely and tracing nativeexecution.

I The back end handles timing, power, and other metrics (howlong did that instruction take to clear the pipeline).

I Back ends are the part that implements the logic that makesa simulator unique.

Introduction– Front Ends

What is a front end?

I Most simulators are broken into a front end and a back endby their designers.

I The front end handles the execution of instructions (makingsure the register state is correct).

I Because instruction sets are very complex, front ends areusually created by using and modifying an existing emulationsolution or avoiding emulation entirely and tracing nativeexecution.

I The back end handles timing, power, and other metrics (howlong did that instruction take to clear the pipeline).

I Back ends are the part that implements the logic that makesa simulator unique.

Introduction– Front Ends

What is a front end?

I Most simulators are broken into a front end and a back endby their designers.

I The front end handles the execution of instructions (makingsure the register state is correct).

I Because instruction sets are very complex, front ends areusually created by using and modifying an existing emulationsolution or avoiding emulation entirely and tracing nativeexecution.

I The back end handles timing, power, and other metrics (howlong did that instruction take to clear the pipeline).

I Back ends are the part that implements the logic that makesa simulator unique.

Introduction– Front Ends

What is a front end?

I Most simulators are broken into a front end and a back endby their designers.

I The front end handles the execution of instructions (makingsure the register state is correct).

I Because instruction sets are very complex, front ends areusually created by using and modifying an existing emulationsolution or avoiding emulation entirely and tracing nativeexecution.

I The back end handles timing, power, and other metrics (howlong did that instruction take to clear the pipeline).

I Back ends are the part that implements the logic that makesa simulator unique.

Introduction– Front Ends

What is a front end?

I Most simulators are broken into a front end and a back endby their designers.

I The front end handles the execution of instructions (makingsure the register state is correct).

I Because instruction sets are very complex, front ends areusually created by using and modifying an existing emulationsolution or avoiding emulation entirely and tracing nativeexecution.

I The back end handles timing, power, and other metrics (howlong did that instruction take to clear the pipeline).

I Back ends are the part that implements the logic that makesa simulator unique.

Introduction– Front Ends

What is a front end?

I Most simulators are broken into a front end and a back endby their designers.

I The front end handles the execution of instructions (makingsure the register state is correct).

I Because instruction sets are very complex, front ends areusually created by using and modifying an existing emulationsolution or avoiding emulation entirely and tracing nativeexecution.

I The back end handles timing, power, and other metrics (howlong did that instruction take to clear the pipeline).

I Back ends are the part that implements the logic that makesa simulator unique.

Introduction– The Ideal Front End

Trace

Writer

Trace

Reader

TraceBack−End

Results

Back−End

Results

Emulator

Ideally:

I Each front end and back end must be written only once, afterwhich they can be used in any combination, like compier frontends and back ends.

I No additional code would need to be written to adapt generalpurpose emulators for simulation duty.

Introduction– The Ideal Front End

Trace

Writer

Trace

Reader

TraceBack−End

Results

Back−End

Results

Emulator

Ideally:

I Each front end and back end must be written only once, afterwhich they can be used in any combination, like compier frontends and back ends.

I No additional code would need to be written to adapt generalpurpose emulators for simulation duty.

Introduction– The Ideal Front End

Trace

Writer

Trace

Reader

TraceBack−End

Results

Back−End

Results

Emulator

Ideally:

I Each front end and back end must be written only once, afterwhich they can be used in any combination, like compier frontends and back ends.

I No additional code would need to be written to adapt generalpurpose emulators for simulation duty.

Introduction– Real Front Ends

Trace

WriterTrace

Reader

Trace Internal

APIBack−End

Results

Internal

API

Custom

ShimEmulator Back−End

Results

Compatibility

Layers

Realistically:

I Much custom code needs to be written to adapt mostoff-the-shelf emulators as simulator front ends.

I Each time this is done, yet another simulator-specific frontend is created.

Introduction– Real Front Ends

Trace

WriterTrace

Reader

Trace Internal

APIBack−End

Results

Internal

API

Custom

ShimEmulator Back−End

Results

Compatibility

Layers

Realistically:

I Much custom code needs to be written to adapt mostoff-the-shelf emulators as simulator front ends.

I Each time this is done, yet another simulator-specific frontend is created.

Introduction– Real Front Ends

Trace

WriterTrace

Reader

Trace Internal

APIBack−End

Results

Internal

API

Custom

ShimEmulator Back−End

Results

Compatibility

Layers

Realistically:

I Much custom code needs to be written to adapt mostoff-the-shelf emulators as simulator front ends.

I Each time this is done, yet another simulator-specific frontend is created.

Introduction– Front Ends

How do we make new simulator front ends closer to the ideal?

I Provide an API that does not change unnecessarily beacuse ofthe type of front end or the instruction set.

I Enable the construction of multithreaded simulators.

I Provide sufficient control and detail in the API to make ituseable with most back ends.

Introduction– Front Ends

How do we make new simulator front ends closer to the ideal?

I Provide an API that does not change unnecessarily beacuse ofthe type of front end or the instruction set.

I Enable the construction of multithreaded simulators.

I Provide sufficient control and detail in the API to make ituseable with most back ends.

Introduction– Front Ends

How do we make new simulator front ends closer to the ideal?

I Provide an API that does not change unnecessarily beacuse ofthe type of front end or the instruction set.

I Enable the construction of multithreaded simulators.

I Provide sufficient control and detail in the API to make ituseable with most back ends.

Introduction– Front Ends

How do we make new simulator front ends closer to the ideal?

I Provide an API that does not change unnecessarily beacuse ofthe type of front end or the instruction set.

I Enable the construction of multithreaded simulators.

I Provide sufficient control and detail in the API to make ituseable with most back ends.

The QSim Simulator Front End

QSim

I We have built a simlator front end based on QEMU1 thataims to address these issues. It currently:

I Runs unmodified binaries on a lightly-modified Linux guest.I Enables the construction of multithreaded simulators.I Has full support for 32-bit x86 guests. (Port to 64-bit x86

weeks from completion; port to ARM in early coding stages.)I Enables parallel simulation.

1http://www.qemu.org

QSim

I We have built a simlator front end based on QEMU1 thataims to address these issues. It currently:

I Runs unmodified binaries on a lightly-modified Linux guest.

I Enables the construction of multithreaded simulators.I Has full support for 32-bit x86 guests. (Port to 64-bit x86

weeks from completion; port to ARM in early coding stages.)I Enables parallel simulation.

1http://www.qemu.org

QSim

I We have built a simlator front end based on QEMU1 thataims to address these issues. It currently:

I Runs unmodified binaries on a lightly-modified Linux guest.I Enables the construction of multithreaded simulators.

I Has full support for 32-bit x86 guests. (Port to 64-bit x86weeks from completion; port to ARM in early coding stages.)

I Enables parallel simulation.

1http://www.qemu.org

QSim

I We have built a simlator front end based on QEMU1 thataims to address these issues. It currently:

I Runs unmodified binaries on a lightly-modified Linux guest.I Enables the construction of multithreaded simulators.I Has full support for 32-bit x86 guests. (Port to 64-bit x86

weeks from completion; port to ARM in early coding stages.)

I Enables parallel simulation.

1http://www.qemu.org

QSim

I We have built a simlator front end based on QEMU1 thataims to address these issues. It currently:

I Runs unmodified binaries on a lightly-modified Linux guest.I Enables the construction of multithreaded simulators.I Has full support for 32-bit x86 guests. (Port to 64-bit x86

weeks from completion; port to ARM in early coding stages.)I Enables parallel simulation.

1http://www.qemu.org

QSim– API Overview

Set callback

Unset callback

RunBack End

Callbacks

QSim

A simplified diagram of the QSim API.

run(i , j) Advance guest CPU i by j instructions.

set * callback(x) Set callbacks.

unset * callback(h) Unset callbacks by handle.

Callback types include: instruction, register access, memory access,interrupt

QSim– How is it “Universal”?

Typical emulator used as a simulator front end:

Internal

API

Custom

ShimEmulator Back−End

Results

I Off-the-shelf open-source emulators like QEMU provide mostof the code needed to build a front end but are incomplete.

I Simulation projects like PTLSim and FAST have modifiedQEMU heavily to create their front ends.

QSim– How is it “Universal”?

Typical emulator used as a simulator front end:

Internal

API

Custom

ShimEmulator Back−End

Results

I Off-the-shelf open-source emulators like QEMU provide mostof the code needed to build a front end but are incomplete.

I Simulation projects like PTLSim and FAST have modifiedQEMU heavily to create their front ends.

QSim– How is it “Universal”?

Typical emulator used as a simulator front end:

Internal

API

Custom

ShimEmulator Back−End

Results

I Off-the-shelf open-source emulators like QEMU provide mostof the code needed to build a front end but are incomplete.

I Simulation projects like PTLSim and FAST have modifiedQEMU heavily to create their front ends.

QSim– How is it “Universal”?

QSim:

Back−End

Results

Custom

ShimEmulator

API

QSim

I QSim has done this yet again, but with compatibility with adiverse set of back ends in mind.

I Similarly, the API is the same no matter what the instructionset.

QSim– How is it “Universal”?

QSim:

Back−End

Results

Custom

ShimEmulator

API

QSim

I QSim has done this yet again, but with compatibility with adiverse set of back ends in mind.

I Similarly, the API is the same no matter what the instructionset.

QSim– How is it “Universal”?

QSim:

Back−End

Results

Custom

ShimEmulator

API

QSim

I QSim has done this yet again, but with compatibility with adiverse set of back ends in mind.

I Similarly, the API is the same no matter what the instructionset.

QSim– Parallelism

I The run() function can be called simultaneously for twodifferent guest CPUs from different host threads.

I This enables parallel emulation of multithreaded guests, aswell as multithreaded back ends.

I Since back ends tend to use more CPU time than front ends,thread safety is more important than parallel emulation (bothare provided by QSim).

I Up to 512 guest cores have been demonstrated running on upto 512 host threads.

QSim– Parallelism

I The run() function can be called simultaneously for twodifferent guest CPUs from different host threads.

I This enables parallel emulation of multithreaded guests, aswell as multithreaded back ends.

I Since back ends tend to use more CPU time than front ends,thread safety is more important than parallel emulation (bothare provided by QSim).

I Up to 512 guest cores have been demonstrated running on upto 512 host threads.

QSim– Parallelism

I The run() function can be called simultaneously for twodifferent guest CPUs from different host threads.

I This enables parallel emulation of multithreaded guests, aswell as multithreaded back ends.

I Since back ends tend to use more CPU time than front ends,thread safety is more important than parallel emulation (bothare provided by QSim).

I Up to 512 guest cores have been demonstrated running on upto 512 host threads.

QSim– Parallelism

I The run() function can be called simultaneously for twodifferent guest CPUs from different host threads.

I This enables parallel emulation of multithreaded guests, aswell as multithreaded back ends.

I Since back ends tend to use more CPU time than front ends,thread safety is more important than parallel emulation (bothare provided by QSim).

I Up to 512 guest cores have been demonstrated running on upto 512 host threads.

QSim– Software Architecture

QSim– Parallel Scaling

QSim– Performance

The following represents the performance of QSim with emptycallbacks. With typical simulation speeds measured in thousands ofinstructions per second, QSim will not likely be the bottleneck.

Benchmark Slowdown MIPSswaptions 259x 18.5

mtgl-bfs 387x 36.6

ocean-non-contig 267x 40.7

QSim– Relation to QEMU

How are QEMU and QSim related?

QEMU QSim

Emulator Simulator Front-EndStandalone Built on QEMUFull-system CPU and RAM only

CPUs serialized CPUs in parallelProgram Library

QSim– Relation to QEMU

How are QEMU and QSim related?

QEMU QSimEmulator Simulator Front-End

Standalone Built on QEMUFull-system CPU and RAM only

CPUs serialized CPUs in parallelProgram Library

QSim– Relation to QEMU

How are QEMU and QSim related?

QEMU QSimEmulator Simulator Front-End

Standalone Built on QEMU

Full-system CPU and RAM onlyCPUs serialized CPUs in parallel

Program Library

QSim– Relation to QEMU

How are QEMU and QSim related?

QEMU QSimEmulator Simulator Front-End

Standalone Built on QEMUFull-system CPU and RAM only

CPUs serialized CPUs in parallelProgram Library

QSim– Relation to QEMU

How are QEMU and QSim related?

QEMU QSimEmulator Simulator Front-End

Standalone Built on QEMUFull-system CPU and RAM only

CPUs serialized CPUs in parallel

Program Library

QSim– Relation to QEMU

How are QEMU and QSim related?

QEMU QSimEmulator Simulator Front-End

Standalone Built on QEMUFull-system CPU and RAM only

CPUs serialized CPUs in parallelProgram Library

Back Ends

Back Ends– The Canonical Example

Host Thread 3Host Thread 2Host Thread 1

Parallel Discrete Event Simulation Engine

. . .

CPU

Component

CPU

Component

QSim

. . . Other

Components

This is the kind of simulation QSim was designed for.

I QSim feeds instructions into CPU timing models that are partof a larger simulation infrastructure.

I A parallel discrete event simulation engine keeps track ofevents and ensures correctness.

Back Ends– The Canonical Example

Host Thread 3Host Thread 2Host Thread 1

Parallel Discrete Event Simulation Engine

. . .

CPU

Component

CPU

Component

QSim

. . . Other

Components

This is the kind of simulation QSim was designed for.

I QSim feeds instructions into CPU timing models that are partof a larger simulation infrastructure.

I A parallel discrete event simulation engine keeps track ofevents and ensures correctness.

Back Ends– The Canonical Example

Host Thread 3Host Thread 2Host Thread 1

Parallel Discrete Event Simulation Engine

. . .

CPU

Component

CPU

Component

QSim

. . . Other

Components

This is the kind of simulation QSim was designed for.

I QSim feeds instructions into CPU timing models that are partof a larger simulation infrastructure.

I A parallel discrete event simulation engine keeps track ofevents and ensures correctness.

Back Ends– Others

Other back ends that have been built for QSim include:

I A binary trace writer, which was built along with a tracereader library that exports the QSim API.

I A serial universal processor emulator, simplesim.I A demonstration vehicle; breaks instructions into plausible

micro-ops regardless of instruction set.

I An interactive OS/application debugger.

I Visualization utilities.

Back Ends– Others

Other back ends that have been built for QSim include:

I A binary trace writer, which was built along with a tracereader library that exports the QSim API.

I A serial universal processor emulator, simplesim.I A demonstration vehicle; breaks instructions into plausible

micro-ops regardless of instruction set.

I An interactive OS/application debugger.

I Visualization utilities.

Back Ends– Others

Other back ends that have been built for QSim include:

I A binary trace writer, which was built along with a tracereader library that exports the QSim API.

I A serial universal processor emulator, simplesim.

I A demonstration vehicle; breaks instructions into plausiblemicro-ops regardless of instruction set.

I An interactive OS/application debugger.

I Visualization utilities.

Back Ends– Others

Other back ends that have been built for QSim include:

I A binary trace writer, which was built along with a tracereader library that exports the QSim API.

I A serial universal processor emulator, simplesim.I A demonstration vehicle; breaks instructions into plausible

micro-ops regardless of instruction set.

I An interactive OS/application debugger.

I Visualization utilities.

Back Ends– Others

Other back ends that have been built for QSim include:

I A binary trace writer, which was built along with a tracereader library that exports the QSim API.

I A serial universal processor emulator, simplesim.I A demonstration vehicle; breaks instructions into plausible

micro-ops regardless of instruction set.

I An interactive OS/application debugger.

I Visualization utilities.

Back Ends– Others

Other back ends that have been built for QSim include:

I A binary trace writer, which was built along with a tracereader library that exports the QSim API.

I A serial universal processor emulator, simplesim.I A demonstration vehicle; breaks instructions into plausible

micro-ops regardless of instruction set.

I An interactive OS/application debugger.

I Visualization utilities.

Summary

This has been:

I An appeal for consistent front end/back end APIs.

I A plea for parallel front ends.

I A look at how we might narrow the simulation gap.

I An invitation to try QSim2.

2http://www.cdkersey.com/qsim-web/

Summary

This has been:

I An appeal for consistent front end/back end APIs.

I A plea for parallel front ends.

I A look at how we might narrow the simulation gap.

I An invitation to try QSim2.

2http://www.cdkersey.com/qsim-web/

Summary

This has been:

I An appeal for consistent front end/back end APIs.

I A plea for parallel front ends.

I A look at how we might narrow the simulation gap.

I An invitation to try QSim2.

2http://www.cdkersey.com/qsim-web/

Summary

This has been:

I An appeal for consistent front end/back end APIs.

I A plea for parallel front ends.

I A look at how we might narrow the simulation gap.

I An invitation to try QSim2.

2http://www.cdkersey.com/qsim-web/

Summary

This has been:

I An appeal for consistent front end/back end APIs.

I A plea for parallel front ends.

I A look at how we might narrow the simulation gap.

I An invitation to try QSim2.

2http://www.cdkersey.com/qsim-web/

Acknowledgements

The authors are grateful to Paolo Faraboschi and Daniel Ortega fortheir suggestions and guidance in getting QSim started. This workwas supported by the National Science Foundation under grantCNS855110, Sandia National Laboratories, and HP Laboratories.