On Simulation

Jakob Engblom, PhD, Virtutech & Uppsala University

[email protected]

ESSES, 4 Sept 2003

Simulation: Modeling + Execution

• Build a model of the system
• Try various scenarios on this model
  – Experimental, not analytical approach
• Understand the real system by working with the model
  – More available
  – More inspectable
  – Less dangerous

Simulation or Analysis

• Simulation gets closer to the real world
  – More details
  – Fewer assumptions
  – High computational workload
• Analytical models
  – Efficient predictors
  – Low computational workload
  – ... but more removed from the world = less accurate

Sufficient Level of Detail

• Maintain sufficient detail
  – To observe relevant aspects of reality
  – To avoid artifacts of the experiment
• Abstract away unimportant aspects
  – Newtonian vs. quantum physics
  – Timing vs. function
• Danger: bad abstractions = bad simulation

Scope versus Abstraction

Figure: scope of model vs. level of abstraction, ranging from an atom (string theory) to galaxies and the universe.

Reasonable to simulate: scope proportional to abstraction

To simulate the universe, the units of simulation have to be galaxies.

To simulate a single atom, we can use the incredible detail of quantum mechanics and string theory.

Example: Scope/Detail tradeoff

• "GPL" (Grand Prix Legends)
• "Life-like" action:
  – Momentum
  – Friction
  – Steering
  – Engine torque
• Not nuts & bolts of cars

Grand Prix Legends

Simulation is never perfect

• It is never quite the real thing ...

• ...but it can be very close indeed

Simulating Computers

Simulating Computer Systems

• We need to decide the level of abstraction
• More detail = smaller scope
• Less detail = larger scope
  – Size of systems that can be investigated
  – Number of different systems
• Measure of scope: speed
  – As number of software instructions per second

What do we need to simulate?

Figure: simulating computer systems: the program, the processor, the peripherals, and the stimuli.

Detailed Hardware Models

• Transistor-level model
  – Very close to actual implementation
• Small scope
  – Small piece of HW
  – Small programs
  – Stimuli at bit level
  – Speed: 100s of instructions per second
  – With 25 MUSD hardware: 10-100 KIPS
• Necessary for hardware development

Instruction-Set Simulation

• Model computer at instruction-set level
  – Stable & defined interface
  – The level where hardware & software meet
  – Stimuli at transaction level
• Abstractions to increase scope:
  – Keep functionality correct
  – Vary fidelity in timing
  – Simplify some behavior
• Speed: 10 KIPS to 100 MIPS (up to 700 MIPS)

Key issue: there can be no software-visible difference (including to the OS)
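To make the instruction-set level concrete, here is a minimal Python sketch of an instruction-set simulator's fetch-decode-execute loop. The four-register ISA, its mnemonics, and the `run` helper are hypothetical, invented for illustration. Register results are exact, while timing is reduced to a single tunable `cycles_per_insn` parameter, one of the abstractions listed above.

```python
from dataclasses import dataclass, field

@dataclass
class ToyCPU:
    """State of a hypothetical 4-register CPU (not a real instruction set)."""
    regs: list = field(default_factory=lambda: [0] * 4)
    pc: int = 0
    instructions: int = 0   # retired instructions: the basis of IPS figures
    cycles: int = 0         # simulated time, deliberately approximate

def run(cpu, program, cycles_per_insn=1, max_steps=1_000_000):
    """Fetch-decode-execute until HALT: exact function, abstracted timing."""
    for _ in range(max_steps):
        op, *args = program[cpu.pc]            # fetch and decode
        next_pc = cpu.pc + 1
        if op == "li":                          # rd <- imm
            rd, imm = args
            cpu.regs[rd] = imm & 0xFFFFFFFF
        elif op == "addi":                      # rd <- rs + imm (32-bit wrap)
            rd, rs, imm = args
            cpu.regs[rd] = (cpu.regs[rs] + imm) & 0xFFFFFFFF
        elif op == "add":                       # rd <- rs1 + rs2 (32-bit wrap)
            rd, rs1, rs2 = args
            cpu.regs[rd] = (cpu.regs[rs1] + cpu.regs[rs2]) & 0xFFFFFFFF
        elif op == "bne":                       # branch if rs1 != rs2
            rs1, rs2, target = args
            if cpu.regs[rs1] != cpu.regs[rs2]:
                next_pc = target
        elif op == "halt":
            return cpu
        else:
            raise ValueError(f"illegal instruction {op!r} at pc={cpu.pc}")
        cpu.pc = next_pc
        cpu.instructions += 1
        cpu.cycles += cycles_per_insn           # one fixed cost per instruction
    raise RuntimeError("step limit reached")

# Sum 10 + 9 + ... + 1 into r0.
PROGRAM = [
    ("li", 0, 0),        # r0 = sum
    ("li", 1, 10),       # r1 = loop counter
    ("li", 2, 0),        # r2 = constant 0
    ("add", 0, 0, 1),    # pc 3: sum += counter
    ("addi", 1, 1, -1),  # counter -= 1
    ("bne", 1, 2, 3),    # loop while counter != 0
    ("halt",),
]
```

`run(ToyCPU(), PROGRAM)` leaves 55 in r0 after 33 instructions. Passing `cycles_per_insn=3` triples `cycles` but leaves every register value unchanged, so the software sees no difference.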

Sufficient Detail of Model

• Complete from a software perspective
  – All readable values represented
  – All registers of CPU implemented
  – Software = OS, drivers, applications, middleware, ...
• Hardware considered as a set of devices
  – I/O-space or memory mapped
  – Behavior at level seen by device drivers
• No “abstract” networks, all concrete
• Next slide: example of detail required
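The "set of devices, memory mapped" view can be sketched as an address decoder that routes each CPU access to the device owning that address range. Everything below (the `MemoryMap`, `Ram`, and `ToyUart` classes, the base addresses, the register offsets) is hypothetical. It only assumes that devices expose registers at offsets that a driver reads and writes.

```python
class Ram:
    def __init__(self, size):
        self.data = bytearray(size)

    def read(self, offset, size):
        return int.from_bytes(self.data[offset:offset + size], "little")

    def write(self, offset, size, value):
        self.data[offset:offset + size] = value.to_bytes(size, "little")

class ToyUart:
    """Hypothetical UART: offset 0 = transmit data, offset 4 = status."""
    TX_READY = 0x1

    def __init__(self):
        self.transmitted = []

    def read(self, offset, size):
        return self.TX_READY if offset == 4 else 0   # functional model: always ready

    def write(self, offset, size, value):
        if offset == 0:
            self.transmitted.append(value & 0xFF)

class MemoryMap:
    """Routes physical addresses to (device, offset), as a bus decoder would."""
    def __init__(self):
        self.mappings = []                     # (base, length, device)

    def add(self, base, length, device):
        self.mappings.append((base, length, device))

    def _lookup(self, addr, size):
        for base, length, device in self.mappings:
            if base <= addr and addr + size <= base + length:
                return device, addr - base
        raise LookupError(f"unmapped access at {addr:#x}")

    def read(self, addr, size=4):
        device, offset = self._lookup(addr, size)
        return device.read(offset, size)

    def write(self, addr, value, size=4):
        device, offset = self._lookup(addr, size)
        device.write(offset, size, value)

UART_BASE = 0x1000_0000
phys = MemoryMap()
ram, uart = Ram(0x1000), ToyUart()
phys.add(0x0, 0x1000, ram)
phys.add(UART_BASE, 0x8, uart)

# What a simple polling driver would do to print "hi".
for ch in b"hi":
    while not phys.read(UART_BASE + 4) & ToyUart.TX_READY:
        pass
    phys.write(UART_BASE, ch)
```

The driver only ever sees register reads and writes at addresses. That is the level of behavior the model has to get right, regardless of how the device is built internally.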

Instruction-Set Simulation

• To run real workloads, you need
  – Hardware: CPU & devices
  – OS and other services
  – Stimuli to feed them
• Common methods to achieve this
  – Full-system simulation
  – Virtualization
  – User-level simulation

Full-System Simulation

Figure: one physical computer running Virtutech Simics, which hosts virtual computer systems of many different types.

Full-System Simulation

Figure: virtualization, marked "Not" full-system simulation: one physical computer runs a virtualization system that hosts several virtual computers of the same type.

Full-System Simulation

Figure: real OS & software (operating system, middleware, DB servers, user programs) running on simulated hardware (CPU, RAM, GPU, disk, network, device controller).

Full-System Simulation

Figure: user-level simulation, marked "Not" full-system simulation: the real user program runs on a simulated OS, services, and some hardware.

Speed

• Depends on level of timing detail in model
• Slowest: cycle-accurate simulation
  – Hardware timing modeled in great detail
• Fastest: emulation (user-level only)
• Sweet spot: somewhere in between
  – Simics tries to hit this spot
  – Configurable level of detail

Speed

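Figure: speed vs. accuracy, with speed ranging from 10 KIPS to 1000 MIPS. Techniques and their approximate slowdown, from fastest to most accurate:
• Virtualization
• Emulator (5x)
• Fast full-system simulation (20-400x)
• Cycle-accurate simulator (>10,000x)
• Detailed hardware simulation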

Going up in Scope

• Interesting systems are larger than a single CPU
• Multiprocessors
  – Homogeneous, like servers
  – Heterogeneous, like mobile phones
• Distributed systems
  – Local-area networks
  – Embedded CAN buses
  – Networks-on-chip
• = Simulated shared memory, networks

Distributed Network Simulation

• Level of simulation
  – Entire packets, not physical layer
  – Simulate the network cards in nodes
• Spread simulation across multiple machines
  – Necessary to increase speed
• Still, maintain determinism
  – Synchronize simulated machines
  – One machine stops, all machines stop
  – Global checkpointing & restore
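One way to keep a distributed simulation deterministic is quantum-based lockstep: every simulated machine runs for the same slice of simulated time, then all wait at a barrier where packets are exchanged. The sketch below assumes a fixed quantum, a link latency of one quantum, and a made-up ping-pong protocol. It illustrates the synchronization idea only and is not how Simics implements it.

```python
from collections import deque

class Node:
    """Hypothetical simulated machine; network traffic is modeled as whole packets."""
    def __init__(self, name):
        self.name = name
        self.time = 0            # local simulated time (cycles)
        self.inbox = deque()     # packets delivered at quantum boundaries
        self.log = []            # (delivery time, source, payload)

    def run_until(self, end, send):
        # Handle delivered packets, then advance the local clock to `end`.
        while self.inbox:
            src, payload, t = self.inbox.popleft()
            self.log.append((t, src, payload))
            if payload < 3:      # toy ping-pong protocol: reply with payload + 1
                send(self.name, payload + 1)
        self.time = end

def simulate(quantum, n_quanta):
    nodes = {"a": Node("a"), "b": Node("b")}
    peer = {"a": "b", "b": "a"}
    nodes["a"].inbox.append(("host", 0, 0))   # initial stimulus
    now = 0
    for _ in range(n_quanta):
        end = now + quantum
        in_flight = []
        def send(src, payload):
            in_flight.append((peer[src], src, payload))
        # Nodes only see packets delivered at the barrier, so within a quantum
        # they are independent; the fixed order keeps in_flight reproducible.
        for name in sorted(nodes):
            nodes[name].run_until(end, send)
        # Barrier: every node has reached `end`. This is the one point where the
        # whole system can stop or be checkpointed together.
        for dst, src, payload in in_flight:
            nodes[dst].inbox.append((src, payload, end))
        now = end
    return nodes["a"].log, nodes["b"].log
```

Calling `simulate(1000, 5)` twice gives identical logs. Because packets only cross between nodes at the barrier, the nodes within a quantum could be spread over separate host machines without changing the outcome.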

Network Simulation

Figure: a simulated network of simulated machines, running on a real network of physical machines, with an interface to a real network if needed.

Simulation Advantages

Simulation Advantages

• Configurability
  – Simulate anything, independent of available hardware
    • Target architecture
    • System configuration
• Availability
  – Easy to copy setup, no manufacturing involved
• Determinism
  – Removes real-world indeterminism
  – Synchronization across machines and networks

Simulation Advantages

• Checkpoint & restart
  – Save state of machine to reload later
  – Parallelize & repeat runs
  – Distribute fixed starting points
• Non-intrusive inspection & tracing
  – Any events or state in the machine
  – Does not affect running system
  – I/O events, hardware events
• Deep inspection of system state
  – Caches, TLBs, registers, device registers, buffers ...

Simulation Advantages

• Sandboxing
  – Completely walled-in
  – No hidden communications
  – Undo state changes
  – Dangerous experiments possible
  – Viruses, worms, buffer overflows, …
• Magic instructions
  – Allow programs to communicate with the outside

HW/SW Cosimulation

Figure: the program and the peripherals

Integrated Systems

• Highly integrated devices on the rise
• Develop HW & SW in parallel
• Simulate hardware and software together when developing the entire system

Figure: an integrated device with CPU, DSP, LCD driver, Bluetooth, GSM radio, code memory, and data memory.

Big Systems & Small Details

• To achieve speed: reduce level of detail
• To capture important effects: increase level of detail
• Solution: model only parts in great detail
  – Finished hardware can be modeled simply
  – Model only what needs to be observed
  – Mostly, no need for RTL-level understanding

Transactions vs Pins

• Transaction-level modeling:
  – Model transactions as a unit
  – Level of model:
    • "Memory read" / "Network packet send" / ...
  – Only when something is activated
• Pin-level modeling:
  – Model detailed electronics of a transaction
  – Level of model:
    • Individual pins
    • Clocked pulsing of transmission pins
  – Every clock cycle

Clocking vs Blocking

• Traditional hardware modeling:
  – One (or two) steps per clock cycle
  – Clock drives the evolution of internal state
  – = All devices called each cycle
    • Large overhead for context switching
• Optimized hardware modeling (blocking):
  – Only call when events (reads, writes) occur
  – Evolve internal state several cycles at a time
    • Count the time since last activation
  – Lower context-switch overhead
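The difference between the two styles is easiest to see on a free-running counter. In this hypothetical Python sketch, `ClockedTimer` is entered on every cycle, while `EventTimer` is entered only on reads and writes and catches up by counting the cycles since its last activation. Both return the same values; only the number of calls into the model differs.

```python
class ClockedTimer:
    """Traditional style: the simulator calls tick() once per clock cycle."""
    def __init__(self):
        self.count = 0
        self.calls = 0            # how often the simulator entered this model

    def tick(self):
        self.calls += 1
        self.count += 1

    def read(self):
        self.calls += 1
        return self.count

class EventTimer:
    """Blocking/event style: only entered on reads and writes."""
    def __init__(self):
        self.count = 0
        self.last = 0             # cycle of the last activation
        self.calls = 0

    def _catch_up(self, now):
        self.count += now - self.last   # evolve state many cycles in one step
        self.last = now

    def read(self, now):
        self.calls += 1
        self._catch_up(now)
        return self.count

    def write(self, now, value):
        self.calls += 1
        self._catch_up(now)
        self.count = value

def run_clocked(read_cycles, total_cycles):
    timer, seen = ClockedTimer(), []
    for cycle in range(1, total_cycles + 1):
        timer.tick()
        if cycle in read_cycles:
            seen.append(timer.read())
    return seen, timer.calls

def run_event(read_cycles):
    timer = EventTimer()
    seen = [timer.read(cycle) for cycle in sorted(read_cycles)]
    return seen, timer.calls
```

With three reads spread over 10,000 cycles, both runs observe 100, 5000, and 10000, but the clocked model is entered 10,003 times and the event-driven one 3 times.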

Transactions/Events vs Pins/Clock

• Example: device read operations
• Transaction:
  – Call device model: (op=read, address=0x17)
  – Immediate reply: data=0x42
• Pins:
  – Set address pins to 00010111
  – Drive clock pin to 1 and then 0
  – ... until data ready pin is 1
  – Then read 01000010 from data pins

Figure: the CPU and the device, shown for each of the two styles.
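The read of register 0x17 above can be sketched both ways in Python. The device, its register contents, and the three-cycle `latency` of the pin-level model are assumptions for illustration. Address and data pins are represented as lists of bits, most significant bit first.

```python
def to_pins(value, width=8):
    """Drive `value` onto `width` pins, most significant bit first."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]

def from_pins(pins):
    value = 0
    for bit in pins:
        value = (value << 1) | bit
    return value

class TransactionDevice:
    """Transaction level: one call per access, the reply is immediate."""
    def __init__(self, registers):
        self.registers = dict(registers)

    def access(self, op, address, data=None):
        if op == "read":
            return self.registers.get(address, 0)
        if op == "write":
            self.registers[address] = data
            return None
        raise ValueError(f"unknown op {op!r}")

class PinDevice:
    """Pin level: state only changes on rising clock edges."""
    def __init__(self, registers, latency=3):
        self.registers = dict(registers)
        self.latency = latency        # assumed cycles until data is ready
        self.addr_pins = [0] * 8
        self.data_pins = [0] * 8
        self.clk = 0
        self.data_ready = 0
        self.cycles = 0

    def set_address(self, pins):      # new request: latch address, restart
        self.addr_pins, self.data_ready, self.cycles = list(pins), 0, 0

    def set_clock(self, level):
        rising = self.clk == 0 and level == 1
        self.clk = level
        if rising:
            self.cycles += 1
            if self.cycles >= self.latency:
                value = self.registers.get(from_pins(self.addr_pins), 0)
                self.data_pins = to_pins(value)
                self.data_ready = 1

def pin_read(device, address):
    device.set_address(to_pins(address))
    cycles = 0
    while not device.data_ready:
        device.set_clock(1)           # drive clock to 1...
        device.set_clock(0)           # ...and then 0: one full cycle
        cycles += 1
    return from_pins(device.data_pins), cycles
```

Both views return 0x42, but the transaction model does it in one call while the pin model needs a call per clock edge and per pin update.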

HW/SW Cosimulation: fast

Interface: transactions, events, maybe clock cycles

Figure: Simics runs the application, (RT)OS, and drivers on its CPU core, memory, and devices, connected through this interface to a behavioral model of the hardware.

HW/SW Cosimulation: detailed

Interface: pins, clock cycles

Figure: Simics runs the application, (RT)OS, and drivers on its CPU core, memory, and devices, connected through this interface to a VHDL/Verilog simulator performing RTL-level simulation.

Device Modeling

• Large part of the work for a platform
  – Processors: few and standardized
  – Devices: (very) many and varied, but simpler
  – Still pretty fast, at transaction level
• Modeling devices:
  – C/C++/Python with simulator APIs
  – SystemC
  – VHDL/Verilog
  – Graphical languages (Magic-C)

Stimulating a Simulation

Stimuli

• Without proper stimuli, the model is useless
• Feed mechanism
  – How to get information into the simulation
• Data generation
  – What to supply to the simulation
  – Can get tricky

Regular Computers

• Fixed inputs
  – SPEC benchmarks: loaded from disk
• Network
  – Load generation on simulated machines
  – Interface to a real network
    • Interactive use
    • Load generators on real machines
• Keyboard & mouse
  – Map directly to real device
  – Easy for PC-on-PC-style simulation
  – Interactive user

Non-traditional Computers

• Phones, navigation computers, PDAs, etc.
• Application development
  – Use GUI to provide interactive sessions with user
  – Keyboard, joystick, touch screen
  – Not radio data etc.

Physical World Interaction

• Special simulated devices
  – Sensors & actuators
• Data sources
  – Statistical models of real system behavior
  – Simulation models of physical reality
  – Hardware-in-the-loop simulation

Configuration as Stimuli

• Stimuli = hardware configuration
• Booting an operating system
  – Test of OS software vs. hardware
  – Reconfigure hardware, alter devices
• Self-configuring systems
  – Networks & other distributed systems
  – Master election, device discovery, etc.
  – Adding/removing simulated nodes

Workload Scaling

• Problem: simulation is slow
  – Especially for detailed architectural simulation
  – Slowdown 10,000:
    • 1 minute of real (target) time takes about 7 days to simulate
• Scale (down) workloads to fit
  – Smaller data sets
  – How to make them representative of full runs?
  – Tricky problem in its own right
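The slowdown arithmetic above is easy to check: at a 10,000x slowdown, 1 minute of target execution takes 10,000 minutes, or about 6.9 days. This small Python sketch, with hypothetical helper names, also inverts the calculation to show how much target execution fits in a given host-time budget, which is the amount a scaled-down workload has to fit into.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def host_seconds(target_seconds, slowdown):
    """Wall-clock host time needed to simulate `target_seconds` of target execution."""
    return target_seconds * slowdown

def target_seconds_within(host_budget_seconds, slowdown):
    """Target execution time that fits in a host-time budget."""
    return host_budget_seconds / slowdown

days = host_seconds(60, 10_000) / SECONDS_PER_DAY
print(f"1 minute at 10,000x slowdown: {days:.1f} days")   # 6.9 days
budget = target_seconds_within(SECONDS_PER_DAY, 10_000)
print(f"1 day of host time covers {budget} s of target execution")
```

At this slowdown a full day of simulation covers under 9 seconds of target execution, which is why workloads must be scaled down.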

Using Simulation

Software Development

• Low-level software development
  – Supervisor-level (OS) & interrupt code debug
  – Inspection of system state
  – Device access tracing & breakpoints
  – Debugging unfinished operating systems
  – Developing drivers
• High-level software development
  – Powerful debugger, with checkpointing

Hardware Replacement

• Embedded HW
  – Cheaper, more convenient, available, stable
  – Often 10,000+ USD development platforms
• Virtual platform for early software dev
  – Boards under development
  – AMD64 (Hammer, Opteron, Athlon 64)
    • Saved months for the Linux/AMD64 ports
  – Next-gen UltraSPARCs, ...

Hardware Development

• Model hardware in development
• Test components before physical prototype
• Stimulate HW with real workloads
  – Requires ability to run operating systems
• HW/SW cosimulation
  – At various levels of detail
• Shortens time to market dramatically

Parallelization of Development

Figure: two development flows.
• Without simulation: board design, then board prototype, then software development. Handoff to the software team when "working" hardware exists.
• With simulation: board design & build simulator, then board prototype, while software development proceeds in parallel. Handoff to the software team using a simulation of the hardware platform.
• Simulator = reference

Network Software

• Develop network stacks & protocols
  – Easy to instrument the network, trace traffic
  – Easy to inject packets
  – No interference from other traffic
  – Synchronous breaks at important events
• Try network configurations
  – Large networks
  – Pathological topologies

Performance Tuning

• Performance tuning of software
  – Trace & statistics on performance events
  – Cache misses, TLB misses, disk accesses
  – Memory access patterns
  – Get first-order estimates from event counts
• Absolute performance measurements
  – Require very detailed models
  – Not a design goal of large-scale simulators
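A first-order estimate from event counts can be as simple as a linear cost model: a base cost per instruction plus a fixed penalty per event. The penalties below are assumptions (the L2 value is picked from the 80-89 cycle range used in the Wisconsin experiments discussed later), and the model ignores overlap between events. That is why it only yields first-order numbers, not absolute performance.

```python
from dataclasses import dataclass

@dataclass
class EventCounts:
    instructions: int
    l1_misses: int = 0
    l2_misses: int = 0
    tlb_misses: int = 0

# Hypothetical per-event penalties in cycles; real values depend on the target.
PENALTIES = {"l1_misses": 10, "l2_misses": 85, "tlb_misses": 30}

def estimate_cycles(counts, base_cpi=1.0, penalties=PENALTIES):
    """First-order estimate: base cost plus a fixed, non-overlapping cost per event."""
    stalls = sum(getattr(counts, event) * cost for event, cost in penalties.items())
    return counts.instructions * base_cpi + stalls

# Made-up counts for a program before and after halving its L2 misses.
before = EventCounts(instructions=1_000_000, l1_misses=20_000, l2_misses=2_000, tlb_misses=500)
after = EventCounts(instructions=1_000_000, l1_misses=20_000, l2_misses=1_000, tlb_misses=500)
```

Here the estimate drops from 1,385,000 to 1,300,000 cycles. That is enough to rank the change as worthwhile, but not a reliable prediction of measured run time.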

Faults and Boundary Cases

• Fault injection
  – Repeatable, no physical damage necessary
  – Fault-tolerant systems, safety-critical systems
  – Examples: next slide
• Boundary case testing
  – Extremely small or large configurations
  – Intense bursts of interrupts
  – Communication latencies and intensity

Fault Injection Examples

Figure: a system with CPU, RAM, bridge, network device, and sensor device, annotated with example faults:
• Transient errors in transmission
• Corrupt register values
• Permanent bit errors
• Corrupt measurements
• Kill an entire subsystem
• Corrupt network packets
• Unplug a device

Fault Injection & Checkpointing

Figure: flow of a fault-injection experiment. Boot system SW, position workload, take checkpoint CKP, and check results. Then, for each run (Run 1, Run 2, ...): restore checkpoint CKP, inject a fault, and check results. Did the fault affect the result?
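The checkpoint-based flow above can be sketched in Python with a toy deterministic machine. `ToyMachine`, its checksum workload, and the single-bit-flip fault model are hypothetical. A real full-system simulator checkpoints far more state, but the structure (boot, position, checkpoint, then restore and inject once per run) is the same.

```python
import copy
import random

class ToyMachine:
    """Hypothetical deterministic machine: memory plus a checksum workload."""
    def __init__(self, size=64, seed=1):
        rng = random.Random(seed)            # fixed seed: every boot is identical
        self.memory = [rng.randrange(256) for _ in range(size)]
        self.pc = 0
        self.acc = 0

    def step(self):
        self.acc = (self.acc + self.memory[self.pc]) & 0xFFFF
        self.pc += 1

    def run_to_end(self):
        while self.pc < len(self.memory):
            self.step()
        return self.acc

def fault_campaign(faults, position=16):
    machine = ToyMachine()                   # boot system SW
    for _ in range(position):                # position workload
        machine.step()
    ckp = copy.deepcopy(machine)             # take checkpoint CKP
    golden = copy.deepcopy(ckp).run_to_end() # fault-free reference result
    affected = {}
    for addr, bit in faults:
        run = copy.deepcopy(ckp)             # restore checkpoint; CKP stays pristine
        run.memory[addr] ^= 1 << bit         # injected fault: flip one memory bit
        affected[(addr, bit)] = run.run_to_end() != golden  # did the fault matter?
    return golden, affected
```

A bit flip at address 3 lands in data the workload has already consumed by the checkpoint, so it has no effect, while a flip at address 40 changes the checksum. Every run starts from the same state, so repeating the campaign gives the same answers.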

Teaching

• Enable hands-on experience
• Computer architecture
• Embedded systems programming
• Operating systems
  – Debug half-finished systems
  – Same setup for all students, easy hand-ins
• System management
  – Easy to restore system state
  – No risk to real machines and networks

Simulate with Care

Obtaining Significant Results

• Computer architecture research
  – 90% or more done in simulation
  – Measure of success: effect of a modification to a reference machine
  – What is a significant result? -5%? +10%?
• How real is the modified machine?
  – SimpleScalar is not a real processor
• = need for quite extensive modeling

Wisconsin Experiments

• Mark Hill et al., IEEE Computer, Feb 2003
• Investigating potential pitfalls of simulation
• Detailed microarchitectural modeling
  – Pipeline, caches, reordering, the works
  – Randomized L2 miss time (80-89 cycles)
• Several runs with same workload
• Variable results!

Wisconsin Experiments

• WCR(16, 32) = 18% ("Wrong Conclusion Ratio")
• WCR(16, 64) = 7.5%
• WCR(32, 64) = 26%

Figure: cycles per transaction (millions), min/avg/max over runs, vs. ROB size (16, 32, 64); y axis from 2.5 to 4.0.
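Assuming WCR is the fraction of pairwise comparisons between single runs of two configurations that reach the opposite conclusion from the configurations' means, it can be sketched as follows. The run values are made-up numbers, not the Wisconsin data, and the tie-handling rule is the sketch's own choice.

```python
from itertools import product
from statistics import mean

def wrong_conclusion_ratio(runs_a, runs_b):
    """Fraction of single-run pairs (a, b) whose verdict contradicts the means.

    Lower is better (e.g. cycles per transaction); assumes the means differ.
    A tie between two single runs counts as a wrong conclusion.
    """
    a_better = mean(runs_a) < mean(runs_b)
    pairs = list(product(runs_a, runs_b))
    wrong = sum(1 for a, b in pairs if not (a < b if a_better else b < a))
    return wrong / len(pairs)

# Made-up results (cycles per transaction, x0.1 million) for two ROB sizes.
rob_16 = [30, 32, 34]
rob_32 = [29, 31, 35]
```

Here ROB 32 wins on average, yet 4 of the 9 single-run comparisons say the opposite, a WCR of about 44%. Overlapping run distributions like these are what produce the WCR values on the slide.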

Wisconsin Experiments

Figure: cycles per transaction (millions) vs. sample size (number of runs, 5 to 20) for ROB sizes 32 and 64; y axis from 2.6 to 3.4.

Wisconsin Experiments

• Conclusions:
  – Simulation no different from runs on real HW
  – Use standard statistics
  – Non-overlapping confidence intervals
• Danger of determinism in simulation
  – Testing a single path of a program
  – Induce variability by randomization
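The "standard statistics" recommendation can be sketched as follows: run each configuration several times with randomized perturbations, compute a 95% confidence interval per configuration, and only call a difference significant when the intervals do not overlap. The Student-t table and helper names are the sketch's own. Non-overlapping intervals are a conservative criterion, not a full hypothesis test.

```python
from statistics import mean, stdev

# Two-sided 95% Student-t critical values, indexed by degrees of freedom.
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447,
         7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228, 11: 2.201, 12: 2.179,
         13: 2.160, 14: 2.145, 15: 2.131, 16: 2.120, 17: 2.110, 18: 2.101,
         19: 2.093, 20: 2.086}

def confidence_interval(samples):
    """95% confidence interval for the mean of 2 to 21 runs."""
    n = len(samples)
    if n - 1 not in T_975:
        raise ValueError("table covers 2 to 21 samples")
    half_width = T_975[n - 1] * stdev(samples) / n ** 0.5
    m = mean(samples)
    return m - half_width, m + half_width

def significantly_different(runs_a, runs_b):
    """Conservative check: the two 95% intervals must not overlap."""
    lo_a, hi_a = confidence_interval(runs_a)
    lo_b, hi_b = confidence_interval(runs_b)
    return hi_a < lo_b or hi_b < lo_a
```

With only a few runs the t value is large, so intervals are wide; the sample-size plot above shows the same effect of adding runs until the configurations separate.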

Implementations

Full-System Simulators

• Integrated full-system simulators
  – Virtutech Simics (better at abstraction)
  – Virtio (more on the pin level)
• Frameworks for combining discrete simulators
  – Combine ISS with VHDL and Verilog simulators
  – Mentor Graphics Seamless
  – Cadence Incisive
• Virtualization environments
  – VMware (fast PC-on-PC)
  – Connectix/Microsoft Virtual PC (fast PC-on-PC)

Almost done...

Cost of Simulation

• Sim in 1977: on DEC VAX-11/780
  – 200,000 USD (1977 dollars)
  – 1 VAX MIPS
  – Simulation technology ~200
  – Cost for simulated server hour: 4,000 USD
• Sim in 2002: on Dell PC (P4 2.2 GHz)
  – 1,500 USD
  – Approx. 3,100 VAX MIPS
  – Simulation technology ~40
  – Cost for simulated server hour: 2 USD

… a factor of 2000 in 25 years!

Photo courtesy of The Computer Museum History Center

Photo courtesy of Intel

Figure: Windows XP/64 on AMD Hammer, Windows NT on x86, VxWorks on PowerPC, Linux on x86, Linux on Itanium, and Solaris on Sun SunFire, all running on a Linux host.

Demo Time!

• Booting Linux
• Loading checkpoints of Windows and Solaris
• Kernel debugging
• I/O access history
• Configuration file syntax
• ... and more ...

Thank You

http://www.virtutech.com
http://[email protected]