A Universal Parallel Front End for Execution Driven Microarchitecture Simulation

Chad D. Kersey, Sudhakar Yalamanchili
Georgia Institute of Technology

Arun Rodrigues
Sandia National Laboratories

Outline

- Introduction
  - Simulators
  - Front Ends
- The QSim Simulator Front End
  - API Overview
  - How is QSim “Universal”?
  - How is it Parallel?
  - How does it Perform?
  - How are QEMU and QSim Related?
- Back Ends
- Summary
- Acknowledgements

Introduction

Introduction – The Ubiquity of Simulation

- Simulation is a requirement of architecture research.
- Few architecture researchers have access to the resources needed to create full-scale prototypes.
- Those with the resources would prefer not to spend them building incremental prototypes.
- Even if they would, the turn-around time for building a new CPU, even using pre-designed components, would be very long.

Introduction – The Simulation Gap

[Figure: simulation complexity plotted against time, with curves labeled "Demand" and "Capacity".]

Reasons for the simulation gap:

- Parallel simulation is hard, so we use serial simulators for parallel machines.
- Developments in computer architecture tend to be additive, but we keep building simulators from scratch.

Introduction – Narrowing the Gap

Ways to narrow the simulation gap:

- Spend less time researching architecture and more time developing simulators?
  - Probably would not be well-received by the architecture community.
- Increase simulator throughput so more simulations can be run in a reasonable amount of time.
  - Parallelize them.
- Find ways to make simulator development more efficient.

If we make simulator development more efficient, we increase the rate at which simulation capacity can grow.

Introduction – Front Ends

What is a front end?

- Most simulators are broken into a front end and a back end by their designers.
- The front end handles the execution of instructions (making sure the register state is correct).
- Because instruction sets are very complex, front ends are usually created either by modifying an existing emulator or by avoiding emulation entirely and tracing native execution.
- The back end handles timing, power, and other metrics (how long did that instruction take to clear the pipeline?).
- Back ends are the part that implements the logic that makes a simulator unique.
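
The split described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the four-instruction program, the opcode names, and the per-opcode latencies are all invented for illustration and do not come from QSim or any real simulator. The front end keeps register state correct; the back end only decides how long each instruction took.

```python
from dataclasses import dataclass, field

# Toy instruction set, invented for this sketch: (opcode, dest, src_a, src_b).
PROGRAM = [
    ("li",  "r1", 5, None),     # r1 = 5
    ("li",  "r2", 7, None),     # r2 = 7
    ("add", "r3", "r1", "r2"),  # r3 = r1 + r2
    ("mul", "r4", "r3", "r3"),  # r4 = r3 * r3
]


@dataclass
class FrontEnd:
    """Functional model: keeps register state correct, knows nothing about time."""
    regs: dict = field(default_factory=dict)

    def execute(self, inst):
        op, dest, a, b = inst
        if op == "li":
            self.regs[dest] = a
        elif op == "add":
            self.regs[dest] = self.regs[a] + self.regs[b]
        elif op == "mul":
            self.regs[dest] = self.regs[a] * self.regs[b]
        else:
            raise ValueError(f"unknown opcode: {op}")
        return op  # the only detail this toy back end needs


@dataclass
class BackEnd:
    """Timing model: charges a made-up latency per opcode."""
    latency: dict = field(default_factory=lambda: {"li": 1, "add": 1, "mul": 3})
    cycles: int = 0

    def account(self, op):
        self.cycles += self.latency[op]


def simulate(program):
    """Feed every instruction through the front end, then bill it in the back end."""
    front, back = FrontEnd(), BackEnd()
    for inst in program:
        back.account(front.execute(inst))
    return front.regs, back.cycles
```

Swapping in a different `BackEnd` (for example, one with a different latency table) changes the cycle count but never the register values, which is the property that makes the two halves separable.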

Introduction – The Ideal Front End

[Figure: block diagram with the components Emulator, Trace Writer, Trace, Trace Reader, Back-End, and Results.]

Ideally:

- Each front end and back end would need to be written only once, after which they could be used in any combination, like compiler front ends and back ends.
- No additional code would need to be written to adapt general-purpose emulators for simulation duty.
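
A minimal sketch of the "any combination" idea, assuming a deliberately tiny common interface: every front end is an iterator of events, and every back end consumes such an iterator. All function names, the opcode strings, and the latency values here are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Iterator

# An "event" here is just an opcode string; a real interface would carry more.
Event = str


def emulator_front_end(program: Iterable[str]) -> Iterator[Event]:
    """Stand-in for an emulator: yields each instruction as it 'executes'."""
    for op in program:
        yield op


def trace_writer(events: Iterable[Event]) -> list:
    """Records an event stream so it can be replayed later."""
    return list(events)


def trace_reader_front_end(trace: list) -> Iterator[Event]:
    """Replays a recorded trace through the same interface as the emulator."""
    yield from trace


def count_back_end(events: Iterable[Event]) -> int:
    """Back end 1: counts instructions."""
    return sum(1 for _ in events)


def latency_back_end(events: Iterable[Event]) -> int:
    """Back end 2: sums made-up per-opcode latencies."""
    latency = {"load": 4, "add": 1, "store": 2}
    return sum(latency[op] for op in events)


PROGRAM = ["load", "add", "add", "store"]
TRACE = trace_writer(emulator_front_end(PROGRAM))


def run_all():
    """Pairs every front end with every back end; no glue code per pairing."""
    front_ends = {
        "emulator": lambda: emulator_front_end(PROGRAM),
        "trace": lambda: trace_reader_front_end(TRACE),
    }
    back_ends = {"count": count_back_end, "latency": latency_back_end}
    return {(f, b): back_ends[b](front_ends[f]())
            for f in front_ends for b in back_ends}
```

With two front ends and two back ends, `run_all()` exercises all four pairings from four components, and adding a third back end would require writing only that one function.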

Introduction – Real Front Ends

[Figure: block diagram with the components Emulator, Custom Shim, Internal API, Trace Writer, Trace, Trace Reader, Back-End, and Results, plus the label "Compatibility Layers".]

Realistically:

- Much custom code needs to be written to adapt most off-the-shelf emulators as simulator front ends.
- Each time this is done, yet another simulator-specific front end is created.

Introduction – Front Ends

How do we make new simulator front ends closer to the ideal?

- Provide an API that does not change unnecessarily because of the type of front end or the instruction set.
- Enable the construction of multithreaded simulators.
- Provide sufficient control and detail in the API to make it usable with most back ends.

The QSim Simulator Front End

QSim

- We have built a simulator front end based on QEMU [1] that aims to address these issues. It currently:
  - Runs unmodified binaries on a lightly-modified Linux guest.
  - Enables the construction of multithreaded simulators.
  - Has full support for 32-bit x86 guests. (Port to 64-bit x86 weeks from completion; port to ARM in early coding stages.)
  - Enables parallel simulation.

[1] http://www.qemu.org

QSim – API Overview

[Figure: a simplified diagram of the QSim API, showing the Back End and QSim linked by "Set callback", "Unset callback", "Run", and "Callbacks".]

run(i, j)              Advance guest CPU i by j instructions.
set_*_callback(x)      Set callbacks.
unset_*_callback(h)    Unset callbacks by handle.

Callback types include: instruction, register access, memory access, interrupt.
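
The run / set-callback / unset-callback pattern can be sketched with a mock. The class below is not QSim's real API: only the shapes "advance CPU i by j instructions", "set a callback", and "unset a callback by handle" are taken from the slide. The class name, the method names `set_inst_callback` and `unset_inst_callback`, and the callback signature `fn(cpu, index)` are assumptions made for this example, and no real instructions are executed.

```python
import itertools


class MockFrontEnd:
    """Hypothetical stand-in for a front end with a run/set/unset interface."""

    def __init__(self, n_cpus):
        self.executed = [0] * n_cpus          # fake instructions retired per guest CPU
        self._inst_callbacks = {}             # handle -> callable
        self._handles = itertools.count(1)    # source of fresh handles

    def set_inst_callback(self, fn):
        """Register fn(cpu, index) for every instruction; returns a handle."""
        handle = next(self._handles)
        self._inst_callbacks[handle] = fn
        return handle

    def unset_inst_callback(self, handle):
        """Remove a previously registered callback by its handle."""
        del self._inst_callbacks[handle]

    def run(self, i, j):
        """Advance guest CPU i by j (fake) instructions, firing callbacks."""
        for _ in range(j):
            # Copy the values so a callback may unset itself while running.
            for fn in list(self._inst_callbacks.values()):
                fn(i, self.executed[i])
            self.executed[i] += 1


def demo():
    fe = MockFrontEnd(n_cpus=2)
    seen = []
    h = fe.set_inst_callback(lambda cpu, idx: seen.append((cpu, idx)))
    fe.run(0, 3)               # callback fires three times for CPU 0
    fe.unset_inst_callback(h)
    fe.run(1, 2)               # no callback is registered any more
    return seen, fe.executed
```

Returning a handle from the set call, rather than unsetting by function identity, lets the same function be registered more than once and removed independently.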

QSim – How is it “Universal”?

Typical emulator used as a simulator front end:

[Figure: block diagram with the components Emulator, Custom Shim, Internal API, Back-End, and Results.]

- Off-the-shelf open-source emulators like QEMU provide most of the code needed to build a front end but are incomplete.
- Simulation projects like PTLSim and FAST have modified QEMU heavily to create their front ends.

QSim – How is it “Universal”?

QSim:

[Figure: block diagram with the components Emulator, Custom Shim, API, Back-End, and Results, plus the label "QSim".]

- QSim has done this yet again, but with compatibility with a diverse set of back ends in mind.
- Similarly, the API is the same no matter what the instruction set.

QSim– Parallelism

- The run() function can be called simultaneously for two different guest CPUs from different host threads.

- This enables parallel emulation of multithreaded guests, as well as multithreaded back ends.

- Since back ends tend to use more CPU time than front ends, thread safety is more important than parallel emulation (both are provided by QSim).

- Up to 512 guest cores have been demonstrated running on up to 512 host threads.
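
The calling pattern described above can be sketched as follows. This is a minimal illustration, not QSim code: MockFrontEnd and run_two_cpus are hypothetical stand-ins, and the only property borrowed from the slide is that run() may be invoked for different guest CPUs from different host threads at the same time.

```cpp
#include <array>
#include <cstdint>
#include <thread>

// Hypothetical stand-in for a front end; this is NOT the real QSim class.
class MockFrontEnd {
public:
    // Pretend to emulate n instructions on guest CPU `cpu` (must be 0 or 1).
    // Each guest CPU owns its own counter, so concurrent calls for
    // different CPUs never write to the same memory location.
    std::uint64_t run(unsigned cpu, std::uint64_t n) {
        for (std::uint64_t i = 0; i < n; ++i) ++retired_[cpu];
        return n;
    }

    std::uint64_t retired(unsigned cpu) const { return retired_[cpu]; }

private:
    std::array<std::uint64_t, 2> retired_{};
};

// Drive two guest CPUs from two host threads, one thread per guest CPU.
inline void run_two_cpus(MockFrontEnd& fe, std::uint64_t insts_per_cpu) {
    std::thread t0([&fe, insts_per_cpu] { fe.run(0, insts_per_cpu); });
    std::thread t1([&fe, insts_per_cpu] { fe.run(1, insts_per_cpu); });
    t0.join();
    t1.join();
}
```

Calling run_two_cpus(fe, 100000) from main() leaves both per-CPU counters at 100000. In a real back end, each thread would typically also run its own timing model between calls.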

QSim– Software Architecture

QSim– Parallel Scaling

QSim– Performance

The following represents the performance of QSim with empty callbacks. With typical simulation speeds measured in thousands of instructions per second, QSim will not likely be the bottleneck.

Benchmark          Slowdown   MIPS
swaptions          259x       18.5
mtgl-bfs           387x       36.6
ocean-non-contig   267x       40.7
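
To make the bottleneck argument concrete, here is a small worked calculation. It assumes the front end and back end handle each instruction one after the other, and it uses a back-end speed of 100 thousand instructions per second (0.1 MIPS) purely as an illustrative figure; that number and the function name are assumptions, not measurements from the slides.

```cpp
// Fraction of wall-clock time spent in the front end when the front end
// and the back end process every instruction one after the other.
// Both rates are in millions of instructions per second (MIPS).
inline double front_end_time_share(double front_mips, double back_mips) {
    const double t_front = 1.0 / front_mips;  // seconds per million instructions
    const double t_back = 1.0 / back_mips;
    return t_front / (t_front + t_back);
}
```

With the slowest front-end rate in the table (18.5 MIPS) and the assumed 0.1 MIPS back end, front_end_time_share(18.5, 0.1) is 0.1 / 18.6, or roughly 0.5% of total run time.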

QSim– Relation to QEMU

How are QEMU and QSim related?

QEMU              QSim
Emulator          Simulator Front-End
Standalone        Built on QEMU
Full-system       CPU and RAM only
CPUs serialized   CPUs in parallel
Program           Library

Back Ends

Back Ends– The Canonical Example

[Figure: block diagram. Labels: Host Thread 1, Host Thread 2, Host Thread 3, ...; Parallel Discrete Event Simulation Engine; CPU Component (two shown); QSim; Other Components.]

This is the kind of simulation QSim was designed for.

- QSim feeds instructions into CPU timing models that are part of a larger simulation infrastructure.

- A parallel discrete event simulation engine keeps track of events and ensures correctness.
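
The arrangement above can be sketched in a few lines. This is a simplified, serial illustration under stated assumptions: Engine, MockInstStream, and CpuComponent are hypothetical names, the engine here is single-threaded (the slides describe a parallel one), and the timing model charges a fixed latency per instruction.

```cpp
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// A timestamped action. Events run in timestamp order.
struct Event {
    std::uint64_t time;
    std::function<void()> action;
};

// Comparator that makes std::priority_queue pop the earliest event first.
struct Later {
    bool operator()(const Event& a, const Event& b) const { return a.time > b.time; }
};

// Minimal serial discrete event engine (assumption: no parallelism here).
class Engine {
public:
    void schedule(std::uint64_t time, std::function<void()> action) {
        queue_.push(Event{time, std::move(action)});
    }

    // Run until no events remain; returns the time of the last event.
    std::uint64_t run() {
        while (!queue_.empty()) {
            Event e = queue_.top();
            queue_.pop();
            now_ = e.time;
            e.action();
        }
        return now_;
    }

    std::uint64_t now() const { return now_; }

private:
    std::priority_queue<Event, std::vector<Event>, Later> queue_;
    std::uint64_t now_ = 0;
};

// Hypothetical stand-in for the front end: hands out one instruction per call.
struct MockInstStream {
    std::uint64_t remaining;
    bool next_inst() {
        if (remaining == 0) return false;
        --remaining;
        return true;
    }
};

// CPU timing model: pulls an instruction, then schedules its next step
// a fixed number of time units later.
class CpuComponent {
public:
    CpuComponent(Engine& eng, MockInstStream& insts, std::uint64_t latency)
        : eng_(eng), insts_(insts), latency_(latency) {}

    void start() { eng_.schedule(eng_.now(), [this] { step(); }); }
    std::uint64_t retired() const { return retired_; }

private:
    void step() {
        if (!insts_.next_inst()) return;  // instruction stream exhausted
        ++retired_;
        eng_.schedule(eng_.now() + latency_, [this] { step(); });
    }

    Engine& eng_;
    MockInstStream& insts_;
    std::uint64_t latency_;
    std::uint64_t retired_ = 0;
};
```

Two CpuComponent instances sharing one Engine interleave their steps by timestamp, which is the role the event engine plays for the CPU components in the figure.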

Back Ends– Others

Other back ends that have been built for QSim include:

- A binary trace writer, which was built along with a trace reader library that exports the QSim API.

- A serial universal processor emulator, simplesim.
  - A demonstration vehicle; breaks instructions into plausible micro-ops regardless of instruction set.

- An interactive OS/application debugger.

- Visualization utilities.
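
The trace writer and trace reader pairing can be illustrated with a sketch in which a live source and a trace reader deliver instructions through the same callback interface, so a back end does not need to know which one it is attached to. All names here (InstRecord, InstSource, MockLiveSource, TraceWriter, TraceReader) and the record layout are hypothetical; the slides do not describe the actual QSim trace format.

```cpp
#include <cstdint>
#include <functional>
#include <istream>
#include <ostream>
#include <sstream>
#include <utility>
#include <vector>

// Hypothetical per-instruction record (not the real QSim trace format).
struct InstRecord {
    std::uint32_t cpu;
    std::uint64_t vaddr;
};

using InstCallback = std::function<void(const InstRecord&)>;

// Common interface: anything that can deliver instructions to a back end.
class InstSource {
public:
    virtual ~InstSource() = default;
    virtual void run(const InstCallback& cb) = 0;
};

// Stand-in for a live front end: replays a fixed list of instructions.
class MockLiveSource : public InstSource {
public:
    explicit MockLiveSource(std::vector<InstRecord> insts) : insts_(std::move(insts)) {}
    void run(const InstCallback& cb) override {
        for (const auto& r : insts_) cb(r);
    }

private:
    std::vector<InstRecord> insts_;
};

// Back end that writes each instruction as fixed-size binary fields.
class TraceWriter {
public:
    explicit TraceWriter(std::ostream& out) : out_(out) {}
    void on_inst(const InstRecord& r) {
        out_.write(reinterpret_cast<const char*>(&r.cpu), sizeof r.cpu);
        out_.write(reinterpret_cast<const char*>(&r.vaddr), sizeof r.vaddr);
    }

private:
    std::ostream& out_;
};

// Reads the same fields back and exposes them through InstSource,
// so a back end written against InstSource can consume a trace unchanged.
class TraceReader : public InstSource {
public:
    explicit TraceReader(std::istream& in) : in_(in) {}
    void run(const InstCallback& cb) override {
        InstRecord r{};
        while (in_.read(reinterpret_cast<char*>(&r.cpu), sizeof r.cpu) &&
               in_.read(reinterpret_cast<char*>(&r.vaddr), sizeof r.vaddr)) {
            cb(r);
        }
    }

private:
    std::istream& in_;
};
```

Writing a trace from MockLiveSource into a std::stringstream and replaying it through TraceReader yields the same sequence of records, which is the property that lets one back end run from either source.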

Summary

This has been:

- An appeal for consistent front end/back end APIs.

- A plea for parallel front ends.

- A look at how we might narrow the simulation gap.

- An invitation to try QSim (http://www.cdkersey.com/qsim-web/).

Acknowledgements

The authors are grateful to Paolo Faraboschi and Daniel Ortega for their suggestions and guidance in getting QSim started. This work was supported by the National Science Foundation under grant CNS855110, Sandia National Laboratories, and HP Laboratories.