On Simulation
Jakob Engblom, PhD, Virtutech & Uppsala University
ESSES, 4 Sept 2003
Simulation: Modeling + Execution
• Build a model of the system
• Try various scenarios on this model
  – Experimental, not analytical approach
• Understand the real system by working with the model
  – More available
  – More inspectable
  – Less dangerous
Simulation or Analysis
• Simulation gets closer to real world
  – More details
  – Fewer assumptions
  – High computational workload
• Analytical models
  – Efficient predictors
  – Low computational workload
  – ... but further removed from the real world = less accurate
Sufficient Level of Detail
• Maintain sufficient details
  – To observe relevant aspects of reality
  – To avoid artifacts of experiment
• Abstract away unimportant aspects
  – Newtonian vs. quantum physics
  – Timing vs. function
• Danger: bad abstractions = bad simulation
Scope versus Abstraction
[Figure: level of abstraction plotted against scope of model. Labels: string theory, atom, galaxies, the universe.]
Reasonable to simulate: scope proportional to abstraction
To simulate the universe, the units of simulation have to be galaxies
Simulating a single atom, we can use the incredible detail of quantum mechanics and string theory
Example: Scope/Detail tradeoff
• “GPL” (Grand Prix Legends)
• “Life-like” action:
  – Momentum
  – Friction
  – Steering
  – Engine torque
• Not nuts & bolts of cars
Simulation is never perfect
• It is never quite the real thing ...
• ...but it can be very close indeed
Simulating Computers
Simulating Computer Systems
• We need to decide the level of abstraction
• More detail = smaller scope
• Less detail = larger scope
  – Size of systems that can be investigated
  – Number of different systems
• Measure of scope: speed
  – As number of software instructions per second
Simulating Computer Systems
What do we need to simulate?
[Figure: program, processor, peripherals and stimuli]
Detailed Hardware Models
• Transistor-level model
  – Very close to actual implementation
• Small scope
  – Small piece of HW
  – Small programs
  – Stimuli at bit level
  – Speed: 100s of instructions per second
  – With 25 MUSD hardware: 10-100 KIPS
• Necessary for hardware development
Instruction-Set Simulation
• Model computer at instruction set level
  – Stable & defined interface
  – The level where hardware & software meet
  – Stimuli at transaction level
• Abstractions to increase scope:
  – Keep functionality correct
  – Vary fidelity in timing
  – Simplify some behavior
• Speed: 10 KIPS to 100 MIPS (700 MIPS)
Key issue: there can be no software-visible difference (including to the OS)
Sufficient Detail of Model
• Complete from a software perspective
  – All readable values represented
  – All registers of CPU implemented
  – Software = OS, drivers, applications, middleware, ...
• Hardware considered as a set of devices
  – I/O-space or memory mapped
  – Behavior at level seen by device drivers
• No “abstract” networks, all concrete
• Next slide: example of detail required
Instruction-Set Simulation
• To run real workloads, you need
  – Hardware: CPU & devices
  – OS and other services
  – Stimuli to feed them
• Common methods to achieve this
  – Full-system simulation
  – Virtualization
  – User-level simulation
Full-System Simulation
One physical computer
Virtutech Simics
Virtual computer systems of many different types
Full-System Simulation: Not Virtualization
One physical computer
Virtualization system
Several virtual computers of the same type
Full-System Simulation
[Figure: full-system simulation. Real OS & software (user program, middleware, DB, servers, operating system) on top of simulated hardware (CPU, RAM, disk, network, GPU, device controller).]
Full-System Simulation: Not User-level Simulation
[Figure: user-level simulation. Labels: real user program; simulated OS, services, some HW; stack of hardware (CPU, RAM), operating system, middleware, DB, servers and user program.]
Speed
• Depends on level of timing detail in model
• Slowest: cycle-accurate simulation
  – Hardware timing modeled in great detail
• Fastest: emulation (user-level only)
• Sweet spot: somewhere in between
  – Simics tries to hit this spot
  – Configurable level of detail
Speed
[Figure: speed versus accuracy, speed axis from 10 KIPS to 1000 MIPS. Points plotted: detailed hardware sim; cycle-accurate simulator (>10,000x); fast full-system simulation (20-400x); emulator (5x); virtualization.]
Going up in Scope
• Interesting systems are larger than single CPU
• Multiprocessors
  – Homogeneous like servers
  – Heterogeneous like mobile phones
• Distributed systems
  – Local-Area Networks
  – Embedded CAN buses
  – Networks-on-chips
• = Simulated shared memory, networks
Distributed Network Simulation
• Level of simulation
  – Entire packets, not physical layer
  – Simulate the network cards in nodes
• Spread simulation across multiple machines
  – Necessary to increase speed
• Still, maintain determinism
  – Synchronize simulated machines
  – One machine stops, all machines stop
  – Global checkpointing & restore
Network Simulation
Real network of physical machines
Simulated network of simulated machines
Interface to real network if needed
Simulation Advantages
Simulation Advantages
• Configurability
  – Simulate anything
    • Independent of available hardware
    • Target architecture
    • System configuration
• Availability
  – Easy to copy setup, no manufacturing involved
• Determinism
  – Removes real-world indeterminism
  – Synchronization across machines and networks
Simulation Advantages
• Checkpoint & restart
  – Save state of machine to reload later
  – Parallelize & repeat runs
  – Distribute fixed starting points
• Non-intrusive inspection & tracing
  – Any events or state in the machine
  – Does not affect running system
  – IO events, hardware events
• Deep inspection of system state
  – Caches, TLBs, registers, device registers, buffers ...
Simulation Advantages
• Sandboxing
  – Completely walled-in
  – No hidden communications
  – Undo state changes
  – Dangerous experiments possible
  – Viruses, worms, buffer overflows, …
• Magic instructions:
  – Allow programs to communicate with the outside
HW/SW Cosimulation
Program
Peripherals
Integrated Systems
• Highly-integrated devices on the rise
• Develop HW & SW in parallel
• Simulate hardware and software together in development of entire system
[Figure: integrated system with DSP, LCD driver, CPU, Bluetooth, GSM radio, code memory and data memory]
Big Systems & Small Details
• To achieve speed: reduce level of detail
• To capture important effects: increase level of detail
• Solution: model only parts at great detail
  – Finished hardware can be modeled simply
  – Model only what needs to be observed
  – Mostly, no need for RTL-level understanding
Transactions vs Pins
• Transaction-level modeling:
  – Model transactions as a unit
  – Level of model:
    • “Memory read” / “Network packet send” / ...
  – Only when something is activated
• Pin-level modeling:
  – Model detailed electronics of a transaction
  – Level of model:
    • Individual pins
    • Clocked pulsing of transmission pins
  – Every clock cycle
Clocking vs Blocking
• Traditional hardware modeling:
  – One (or two) steps per clock cycle
  – Clock to generate evolution of internal state
  – = All devices called each cycle
    • Large overhead for context switching
• Optimized hardware modeling (blocking):
  – Only call when events (reads, writes) occur
  – Evolve internal state several cycles at a time
    • Count the time since last activation
  – Lower context switch overhead
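As a minimal sketch of the difference, consider a free-running timer device. The classes `ClockedTimer` and `EventDrivenTimer` and the helper `run` are hypothetical illustrations, not part of any simulator API. The clocked model is called every cycle; the event-driven model is called only when software reads it, and catches up by counting the time since its last activation.

```python
class ClockedTimer:
    """Traditional model: the simulator calls tick() once per clock cycle."""

    def __init__(self, period):
        self.period = period
        self.counter = 0
        self.expirations = 0
        self.calls = 0

    def tick(self):
        self.calls += 1
        self.counter += 1
        if self.counter == self.period:
            self.counter = 0
            self.expirations += 1


class EventDrivenTimer:
    """Optimized model: called only on a read, then evolved many cycles at once."""

    def __init__(self, period):
        self.period = period
        self.counter = 0
        self.expirations = 0
        self.last_activation = 0
        self.calls = 0

    def read(self, now):
        self.calls += 1
        elapsed = now - self.last_activation  # time since last activation
        self.last_activation = now
        total = self.counter + elapsed
        self.expirations += total // self.period
        self.counter = total % self.period
        return self.counter, self.expirations


def run(cycles, read_times, period=100):
    """Drive both models and record what software would see at each read."""
    clocked = ClockedTimer(period)
    lazy = EventDrivenTimer(period)
    observations = []
    for now in range(1, cycles + 1):
        clocked.tick()
        if now in read_times:
            seen_clocked = (clocked.counter, clocked.expirations)
            observations.append((seen_clocked, lazy.read(now)))
    return clocked, lazy, observations


clocked, lazy, observations = run(10_000, {250, 9_999})
print(clocked.calls, lazy.calls)
```

Both models return the same values at every read, but the event-driven one is invoked once per access instead of once per cycle, which is where the lower overhead comes from.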
Transactions/Events vs Pins/Clock
• Example: device read operations
• Transaction:
  – Call device model: (op=read, address=0x17)
  – Immediate reply: data=0x42
• Pins:
  – Set address pins to 00010111
  – Drive clock pin to 1 and then 0
  – ... until data ready pin is 1
  – Then read 01000010 from data pins
[Figure: CPU and device, one diagram per modeling style]
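The same device read can be sketched at both levels. This is an illustration under stated assumptions: the classes `RegisterDevice` and `PinLevelBus`, the 8-bit bus width and the three-cycle latency are all hypothetical and chosen only to make the example concrete.

```python
class RegisterDevice:
    """Toy device with a small register file, modeled at transaction level."""

    def __init__(self, registers):
        self.registers = dict(registers)

    def transaction(self, op, address):
        # One call per access, with an immediate reply.
        if op != "read":
            raise ValueError("only reads are modeled in this sketch")
        return self.registers[address]


class PinLevelBus:
    """Pin-level view of the same read, advanced one clock pulse at a time."""

    def __init__(self, device, latency_cycles=3):
        self.device = device
        self.latency_cycles = latency_cycles  # assumed device latency
        self.address_pins = [0] * 8
        self.data_pins = [0] * 8
        self.data_ready = 0
        self.cycles = 0
        self._cycles_left = None

    def set_address_pins(self, bits):
        self.address_pins = [int(b) for b in bits]
        self.data_ready = 0
        self._cycles_left = self.latency_cycles

    def clock(self):
        """One clock pulse: pin driven to 1 and then back to 0."""
        self.cycles += 1
        if self._cycles_left is None:
            return
        self._cycles_left -= 1
        if self._cycles_left == 0:
            address = int("".join(map(str, self.address_pins)), 2)
            value = self.device.transaction("read", address)
            self.data_pins = [int(b) for b in format(value, "08b")]
            self.data_ready = 1
            self._cycles_left = None


def pin_level_read(bus, address):
    """Drive the pins until the data-ready pin goes high, then sample the data pins."""
    bus.set_address_pins(format(address, "08b"))
    while not bus.data_ready:
        bus.clock()
    return int("".join(map(str, bus.data_pins)), 2)


device = RegisterDevice({0x17: 0x42})
bus = PinLevelBus(device)
transaction_result = device.transaction("read", 0x17)
pin_result = pin_level_read(bus, 0x17)
print(hex(transaction_result), hex(pin_result), bus.cycles)
```

The result is identical either way; the pin-level path simply costs one simulator call per clock cycle instead of one call per access.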
HW/SW Cosimulation: fast
[Figure: Simics runs the (RT)OS, application and drivers on simulated memory, CPU core and devices, coupled to a behavioral model. Interface: transactions, events, maybe clock cycles.]
HW/SW Cosimulation: detailed
[Figure: Simics runs the (RT)OS, application and drivers on simulated memory, CPU core and devices, coupled to a VHDL/Verilog simulator doing RTL-level simulation. Interface: pins, clock cycles.]
Device Modeling
• Large part of work for a platform
  – Processors: few and standardized
  – Devices: (very) many and varied. But simpler.
  – Still pretty fast, at transaction level
• Modeling devices:
  – C/C++/Python with simulator APIs
  – SystemC
  – VHDL/Verilog
  – Graphical languages (Magic-C)
Stimulating a Simulation
Stimuli
Stimuli
• Without proper stimuli, model is useless
• Feed mechanism
  – How to get information into the simulation
• Data generation
  – What to supply to the simulation
  – Can get tricky
Regular Computers
• Fixed inputs
  – SPEC benchmarks: loaded from disk
• Network
  – Load generation on simulated machines
  – Interface to a real network
    • Interactive use
    • Load generators on real machines
• Keyboard & mouse
  – Map directly to real device
  – Easy for PC-on-PC-style
  – Interactive user
Non-traditional Computers
• Phones, navigation computers, PDAs, etc.
• Application development
  – Use GUI to provide interactive sessions with user
  – Keyboard, joystick, touch screen
  – Not radio data etc.
Physical World Interaction
• Special simulated devices
  – Sensors & actuators
• Data sources
  – Statistical models of real system behavior
  – Simulation models of physical reality
  – Hardware-in-the-loop simulation
Configuration as Stimuli
• Stimuli = hardware configuration
• Booting an operating system
  – Test of OS software vs hardware
  – Reconfigure hardware, alter devices
• Self-configuring systems
  – Networks & other distributed systems
  – Master election, device discovery, etc.
  – Adding/removing simulated nodes
Workload Scaling
• Problem: simulation is slow
  – Especially for detailed architectural simulation
  – Slowdown 10000:
    • 1 minute of real time = about 7 days of simulation time
• Scale (down) workloads to fit
  – Smaller data sets
  – How to make representative of full runs?
  – Tricky problem in its own right
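The slowdown arithmetic on this slide can be checked with a short calculation. The helper name `simulation_wall_time` is hypothetical; the only inputs are the slowdown factor and the amount of target time to be simulated.

```python
from datetime import timedelta


def simulation_wall_time(target_time, slowdown):
    """Host wall-clock time needed to simulate `target_time` of target execution."""
    return target_time * slowdown


wall = simulation_wall_time(timedelta(minutes=1), 10_000)
days = wall.total_seconds() / 86_400
print(f"{days:.2f} days")
```

One minute at a slowdown of 10000 is 600,000 seconds of host time, a little under seven days, which is why full-size workloads are rarely practical at this level of detail.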
Using Simulation
Software Development
• Low-level software development
  – Supervisor-level (OS) & interrupt code debug
  – Inspection of system state
  – Device access tracing & breakpoints
  – Debugging unfinished operating systems
  – Developing drivers
• High-level software development
  – Powerful debugger, with checkpointing
Hardware Replacement
• Embedded HW
  – Cheaper, more convenient, available, stable
  – Often 10000+ USD development platforms
• Virtual platform for early software dev
  – Boards under development
  – AMD64 (Hammer, Opteron, Athlon 64)
    • Saved months for the Linux/AMD64 ports
  – Next-gen UltraSparcs, ...
Hardware Development
• Model hardware in development
• Test components before physical prototype
• Stimulate HW with real workloads
  – Requires ability to run operating systems
• HW/SW cosimulation
  – At various levels of detail
• Shortens time to market dramatically
Parallelization of Development
[Figure: two development timelines]
Without simulation: board design → board prototype → software development. Handoff to the software team when “working” hardware exists.
With simulation: board design & build simulator → board prototype, with software development in parallel. Handoff to the software team using a simulation of the hardware platform.
Simulator = reference
Network Software
• Develop network stacks & protocols
  – Easy to instrument the network, trace traffic
  – Easy to inject packets
  – No interference from other traffic
  – Synchronous breaks at important events
• Try network configurations
  – Large networks
  – Pathological topologies
Performance Tuning
• Performance tuning of software
  – Trace & statistics on performance events
  – Cache misses, TLB misses, disk accesses
  – Memory access patterns
  – Get first-order estimates from event counts
• Absolute performance measurements
  – Requires very detailed models
  – Not a design goal of large-scale simulators
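A first-order estimate from event counts might look like the sketch below. The additive stall model, the function name `estimate_cycles` and every number in the example are assumptions made for illustration; they are not measurements and not a model of any particular processor.

```python
def estimate_cycles(instructions, base_cpi, event_counts, event_penalties):
    """First-order cycle estimate: base execution time plus a fixed penalty per event."""
    stall_cycles = sum(
        count * event_penalties[event] for event, count in event_counts.items()
    )
    return instructions * base_cpi + stall_cycles


# Hypothetical event counts, as they might be collected from a simulator trace.
counts = {"l2_miss": 1_000, "tlb_miss": 200}
penalties = {"l2_miss": 80, "tlb_miss": 30}  # assumed cycles lost per event

before = estimate_cycles(1_000_000, 1.0, counts, penalties)
after = estimate_cycles(1_000_000, 1.0, {"l2_miss": 500, "tlb_miss": 200}, penalties)
print(before, after)
```

Such an estimate is useful for comparing two versions of the same software, for example before and after a change that halves the cache misses. It ignores overlap between events, so it says little about absolute performance.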
Faults and Boundary Cases
• Fault injection
  – Repeatable, no physical damage necessary
  – Fault tolerant systems, safety critical systems
  – Examples: next slide
• Boundary case testing
  – Extremely small or large configurations
  – Intense bursts of interrupts
  – Communications latencies and intensity
Fault Injection Examples
[Figure: system of CPU, RAM, bridge, devices, sensor and network, annotated with fault injection points:]
• Transient errors in xmit
• Corrupt register values
• Permanent bit errors
• Corrupt measurements
• Kill entire subsystem
• Corrupt network packets; unplug
• Unplug a device
Fault Injection & Checkpointing
[Figure: boot system SW, position workload, take checkpoint (CKP) and check results of the fault-free run; then for each of Run 1 and Run 2: restore checkpoint, run with an injected fault, check results.]
Did the fault affect the result?
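The checkpoint-and-compare flow can be sketched with a toy machine. `ToyMachine` is a hypothetical stand-in for a simulated system, with a checkpoint implemented as a deep copy of its state; a real simulator would save far more state, but the experiment has the same shape.

```python
import copy


class ToyMachine:
    """Stand-in for a simulated system; all state lives in plain attributes."""

    def __init__(self):
        self.registers = [0] * 4
        self.memory = list(range(16))
        self.steps = 0

    def step(self):
        # Deterministic toy workload: accumulate memory words into register 0.
        self.registers[0] += self.memory[self.steps % len(self.memory)]
        self.steps += 1

    def run(self, n):
        for _ in range(n):
            self.step()

    def checkpoint(self):
        return copy.deepcopy(self.__dict__)

    def restore(self, snapshot):
        self.__dict__ = copy.deepcopy(snapshot)


machine = ToyMachine()
machine.run(8)                 # "boot" and position the workload
ckp = machine.checkpoint()     # take checkpoint

machine.run(8)                 # fault-free reference run
reference = machine.registers[0]

machine.restore(ckp)           # restore checkpoint
machine.memory[12] ^= 0x10     # injected fault: flip one bit in memory
machine.run(8)                 # same workload, now with the fault
faulty = machine.registers[0]

fault_affected_result = faulty != reference
print(reference, faulty, fault_affected_result)
```

Because every run starts from the same checkpoint and the simulation is deterministic, any difference between the two results can be attributed to the injected fault.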
Teaching
• Enable hands-on experience
• Computer architecture
• Embedded systems programming
• Operating systems
  – Debug half-finished systems
  – Same setup for all students, easy hand-ins
• System management
  – Easy to restore system state
  – No risk to real machines and networks
Simulate with Care
Obtaining Significant Results
• Computer architecture research
  – 90% or more done in simulation
  – Measure of success: effect of modification to a reference machine
  – What is a significant result? -5%? +10%?
• How real is the machine modified?
  – SimpleScalar is not a real processor
• = need for quite extensive modeling
Wisconsin Experiments
• Mark Hill et al, IEEE Computer Feb 2003
• Investigating potential pitfalls of simulation
• Detailed microarchitectural modeling
  – Pipeline, caches, reordering, the works
  – Randomized L2 miss time (80-89 cycles)
• Several runs with same workload
• Variable results!
Wisconsin Experiments
• WCR (16,32) = 18% (“Wrong Conclusion Ratio”)
• WCR (16,64) = 7.5%
• WCR (32,64) = 26%
[Figure: cycles per transaction (millions, 2.5 to 4.0) versus ROB size (16, 32, 64), showing max, avg and min over the runs]
Wisconsin Experiments
[Figure: cycles per transaction (millions, 2.6 to 3.4) versus sample size (number of runs, 5 to 20), for ROB sizes 32 and 64]
Wisconsin Experiments
• Conclusions:
  – Simulation no different from runs on real HW
  – Use standard statistics
  – Non-overlapping confidence intervals
• Danger of determinism in simulation
  – Testing a single path of a program
  – Induce variability by randomization
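The confidence-interval check can be sketched with Python's standard library. Several things here are assumptions: the interval uses a normal approximation (with only a handful of runs, a Student-t interval would be somewhat wider), `simulate_run` is a hypothetical stand-in for one randomized simulation run, and the cycle counts are made-up illustrative values, not data from the Wisconsin study.

```python
import random
from statistics import NormalDist, mean, stdev


def confidence_interval(samples, confidence=0.95):
    """Normal-approximation confidence interval for the mean of `samples`."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(samples) / len(samples) ** 0.5
    centre = mean(samples)
    return centre - half_width, centre + half_width


def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def simulate_run(base_cycles, rng):
    # Stand-in for one simulation run with randomized timing perturbation.
    return base_cycles + rng.gauss(0, 0.2)


rng = random.Random(2003)
config_a = [simulate_run(3.2, rng) for _ in range(20)]  # e.g. smaller ROB
config_b = [simulate_run(3.0, rng) for _ in range(20)]  # e.g. larger ROB

ci_a = confidence_interval(config_a)
ci_b = confidence_interval(config_b)
significant = not intervals_overlap(ci_a, ci_b)
print(ci_a, ci_b, significant)
```

A difference between two configurations is only reported as significant when the two intervals do not overlap; with a single run per configuration there is no interval at all, which is the trap of a fully deterministic simulation.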
Implementations
Full-System Simulators
• Integrated full-system simulators
  – Virtutech Simics (better at abstraction)
  – Virtio Virtio (more on the pin-level)
• Frameworks for combining discrete simulators
  – Combine ISS with VHDL and Verilog simulators
  – Mentor Graphics Seamless
  – Cadence Incisive
• Virtualization environments
  – VmWare VmWare (fast PC-on-PC)
  – Connectix/Microsoft VirtualPC (fast PC-on-PC)
Almost done...
Cost of Simulation
• Sim in 1977: on DEC VAX-11/780
  – 200,000 USD (1977 dollars)
  – 1 VAX MIPS
  – simulation technology ~200
  – cost for simulated server hour: 4,000 USD
• Sim in 2002: on Dell PC (P4 2.2 GHz)
  – 1,500 USD
  – approx 3100 VAX MIPS
  – simulation technology ~40
  – cost for simulated server hour: 2 USD
… a factor of 2000 in 25 years!
Photo courtesy of The Computer Museum History Center
Photo courtesy of Intel
[Figure: screenshots of simulated systems]
• Windows XP/64 on AMD Hammer
• Windows NT on x86
• VxWorks on PowerPC
• Linux on x86
• Linux on Itanium
• Solaris on Sun SunFire
All running on a Linux host
Demo Time!
• Booting Linux
• Lifting checkpoints of Windows and Solaris
• Kernel debugging
• IO access history
• Configuration file syntax
• ... and more ...
Thank You
http://www.virtutech.com
http://[email protected]