Doğan Fennibay MS Thesis Presentation - June 7th, 2010
Supervisor: Assoc. Prof. Arda Yurdakul
Motivation
System-level modeling
• More integrated systems
• HW & SW modeled together
• Models larger but more abstract
• Component-based strategies
2010-06-07 Fennibay Slide 2
Hardware-in-the-Loop (HiL)
• Do NOT model real subsystems
• Integrate real and virtual worlds
• Avoid modeling complex systems
• Avoid modeling effort for implemented/off-the-shelf components
• Increase modeling accuracy
• More realistic test beds
Trends in embedded systems
• More connected with others
• Increasing use of off-the-shelf components
⇒ Real/implemented components become important for developing models
Achievements
Published work:
Fennibay, D., Yurdakul, A. and Sen, A., “Introducing Hardware-in-Loop Concept to the Hardware/Software Co-design of Real-time Embedded Systems”, Proceedings of the seventh IEEE International Conference on Embedded Software and Systems, Bradford 29 June-1 July 2010, pp. tbd.
Fennibay, D., Yurdakul, A. and Sen, A., “Hardware-in-the-loop for hardware/software co-design of real-time embedded systems” (Poster), DATE’10 Workshop: Designing for Embedded Parallel Computing Platforms: Architectures, Design Tools, and Applications, 8-12 March 2010.
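Fennibay, D., Yurdakul, A. and Sen, A., “Endüstriyel Uygulamalar için SystemC ile Döngü İçinde Donanım” [Hardware-in-the-Loop with SystemC for Industrial Applications], Proceedings of the Fourth Ulusal Yazılım Mühendisliği Sempozyumu [National Software Engineering Symposium], İstanbul, 8-10 October 2009, pp. 219-226.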
Commercial use:
Not yet
A workshop is planned with Siemens Germany to evaluate the usability
Outline
• Introduction
• Problem definition
• Preliminaries
• Related work
• Solution
• Experimental evaluation
• Conclusion
• Discussion
Introduction
SystemC
• Standardized: IEEE 1666-2005
• Wide use
• Fit for system-level
• Modular
• Transaction-level modeling
Domain: Industrial communication
• Strict hard-real-time constraints
• Data exchange rate (10 kHz) is achievable
• System-level design is a new trend in the domain
Problem definition
• Communication between real and virtual subsystems
  • Virtual-to-real communication & real-to-virtual communication
• Real-time behavior of virtual subsystems
  • Determinism & speed
Preliminaries: Simulation
Discrete event simulation
• Events and passing time abstracted from each other
• Simulation clock advanced in discrete intervals
  • Part of the “state”
  • According to the event queue
Real-time simulation
• Simulation clock TS
• Wall clock TW
• Synchronize: TS – TSstart = TW – TWstart
• External events
  • Not in the event queue
  • Simulation clock cannot be advanced according to those
Preliminaries: SystemC kernel
Discrete event simulator
• Delta-cycle used for concurrency modeling in a single thread
• 0 simulation time advance during delta-cycles
Problem! No 0 time advance in real-time
Related work: Existing HiL methods
Commercial devices and tools for test
• xPC Target, Real-Time Windows Target by MathWorks
...but they are orthogonal to our work: introduction of HiL to the system-level HW/SW co-design of embedded systems
...a new field!!!
Related work: Integrating different environments
CHILS
• Chip HiL Simulation
• Real-time constraint relaxed
• Exchange period adjustable
• Only for processors
• Via Remote Debugging Interface
Related work: Timing concerns & determinism
Virtual Chip
• Operational Buffer Unit to accommodate the speed difference
Realtimify
• Real-time execution of SystemC
• No concern w.r.t. determinism
• Uncertainty about synchronization point
⇒ Suitable for human observation, not for HiL
RTAI (used by Lu et al.)
• Time sharing with Linux kernel
RT_PREEMPT
• Increase determinism of Linux kernel
Solution
SystemC is insensitive to external events!!!!
Solution: Achieving Real-Time Behavior: Bind simulation clock to wall clock
• Simulation clock
  • Advance tS → tSnew
• Wall clock
  • Current time: tWactual
  • tWpassed in delta-cycles & outputs
  • Delay by tWdelay
⇒ tWnew – tW = tSnew – tS
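The binding rule can be illustrated with a minimal Python sketch. It is not the SystemC kernel patch itself: the function name `wall_delay` and the integer microsecond units are assumptions made for illustration, while the symbols follow the slide.

```python
def wall_delay(t_s, t_s_new, t_w, t_w_actual):
    """Return tWdelay so that tWnew - tW equals tSnew - tS.

    t_s, t_s_new : simulation clock before and after the time advance
    t_w          : wall clock at the start of the simulation step
    t_w_actual   : wall clock after delta cycles and outputs have run
    All values share one unit (microseconds in this sketch).
    """
    t_w_passed = t_w_actual - t_w             # wall time used by delta cycles and outputs
    return (t_s_new - t_s) - t_w_passed       # negative means the step overran


on_time = wall_delay(t_s=0, t_s_new=100, t_w=5000, t_w_actual=5030)
overrun = wall_delay(t_s=0, t_s_new=100, t_w=5000, t_w_actual=5120)
```

A non-negative result is the time the simulation thread would sleep before the next step; a negative result means the step could not keep up with the wall clock.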
TS – TSstart = TW – TWstart
[Figure: timing of simulation steps n – 1, n and n + 1 (delta cycles, outputs, time advance) on the simulation clock (tS, tSnew) and the wall clock (tW, tWnew), with the intervals tWdeltacycles, tWoutput, tWdeltaend, tWactual, tWdelay and tWpassed]
• GPOS with real-time improvements
  • Real-time scheduler
  • Increased preemptibility
  • Priority inheritance protocol
  • High resolution timers
• Simulation thread set to real-time priority
• Latency sources eliminated
  • Swap memory, power mgmt. etc.
Solution: Achieving Real-Time Behavior: Outputs in real-time
Solution: Achieving Real-Virtual Communication: Hybrid channel
Connect real and virtual worlds
• Virtual: SystemC interface
• Real: I/O driver
Solution: Achieving Real-Virtual Communication: Concurrent outputs
Concurrent outputs in simulation are sequential in real-time
• We define two constraints to model the limitation
  • All concurrent outputs must occur in an output window tWow
  • Some concurrent outputs must occur in a smaller critical output window tWcowi
Solution
• Use HW support when available
• Enforce a strict ordering of output operations
tWow & tWcowi are parameterized by the model developer
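The strict ordering of output operations can be sketched as follows. This is a hypothetical Python illustration, not the thesis implementation: the function `order_outputs` and the channel names are assumptions. It places the members of each critical subset next to each other so they are triggered back to back.

```python
def order_outputs(channels, critical_subsets):
    """Return the channels in an order where each critical subset is contiguous.

    channels         : all hybrid channels that output at this simulation time
    critical_subsets : disjoint groups that must fall inside a critical output window
    """
    ordered, placed = [], set()
    for subset in critical_subsets:
        for ch in subset:
            ordered.append(ch)      # members of one subset are emitted consecutively
            placed.add(ch)
    # Remaining channels keep their original relative order.
    ordered.extend(ch for ch in channels if ch not in placed)
    return ordered


channels = ["ch0", "ch1", "ch2", "ch3", "ch4", "ch5", "ch6"]
order = order_outputs(channels, [("ch0", "ch6")])
distance = order.index("ch6") - order.index("ch0")
```

Without reordering, `ch0` and `ch6` are six trigger slots apart; after reordering they are adjacent.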
Solution: Achieving Real-Virtual Communication: Sensitivity to external events
Soln1: Polling
• Simplest
• Tradeoff: simulation performance vs. I/O latency
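The polling tradeoff can be made concrete with a small Python sketch. The function name and the assumption that external events arrive uniformly at random within a polling period are illustrative and do not come from the slides.

```python
def polling_tradeoff(period_us):
    """Return (polls per second, average added latency in µs, worst-case added latency in µs).

    Assumes an event is equally likely to arrive at any point of the polling period,
    so on average it waits half a period before being noticed.
    """
    polls_per_second = 1_000_000 / period_us
    return polls_per_second, period_us / 2, period_us


fast = polling_tradeoff(100)     # short period: low latency, many polls
slow = polling_tradeoff(1000)    # long period: few polls, high latency
```

Shortening the period lowers I/O latency but spends more simulation time on polling, which is the tradeoff named on the slide.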
Soln2: Adaptive polling
• Change polling period dynamically
• PID control
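A minimal sketch of adaptive polling, using only the proportional and integral terms. The slides do not state which quantity the controller tracks, so the error signal (events found per poll against a target), the gains and the nominal period are all assumptions; the clamp range [10 µs; 10 ms] is the one reported in the RTT experiment.

```python
class AdaptivePoller:
    """PI controller that adapts a polling period (gains and target are hypothetical)."""

    def __init__(self, kp=200e-6, ki=50e-6, target=1.0,
                 nominal=1e-3, min_period=10e-6, max_period=10e-3):
        self.kp, self.ki, self.target = kp, ki, target
        self.nominal = nominal
        self.min_period, self.max_period = min_period, max_period
        self.integral = 0.0
        self.period = nominal

    def update(self, events_found):
        """Feed the number of events seen by the last poll; return the new period in seconds."""
        error = events_found - self.target      # positive: busier than targeted, poll faster
        self.integral += error
        raw = self.nominal - (self.kp * error + self.ki * self.integral)
        # Clamp to the allowed range.
        self.period = min(self.max_period, max(self.min_period, raw))
        return self.period


poller = AdaptivePoller()
for _ in range(100):
    poller.update(3)            # sustained traffic
busy_period = poller.period
for _ in range(1000):
    poller.update(0)            # idle link
idle_period = poller.period
```

Under sustained traffic the period shrinks to the lower bound, and on an idle link it grows back to the upper bound.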
Solution: Achieving Real-Virtual Communication: Sensitivity to external events
Soln3: Event-driven
• Patch in SystemC kernel
• Events detected by input threads are put in a special queue
• SystemC patch interrupts the wait and notifies the events
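The event-driven idea can be sketched in Python with a thread-safe queue: an input thread enqueues an external event, and the simulation's real-time wait returns early when one arrives. This is an analogy to the kernel patch rather than its actual code; all names here are hypothetical.

```python
import queue
import threading
import time

external_events = queue.Queue()    # stands in for the special queue filled by input threads


def input_thread(payload, after_s):
    """Stand-in for an I/O driver thread: detects an event and enqueues it."""
    time.sleep(after_s)
    external_events.put(payload)


def wait_until_next_step(delay_s):
    """Wait for up to delay_s, but return early if an external event arrives.

    Returns the event, or None if the full delay elapsed without one.
    """
    try:
        return external_events.get(timeout=delay_s)
    except queue.Empty:
        return None


threading.Thread(target=input_thread, args=("frame", 0.01), daemon=True).start()
first = wait_until_next_step(1.0)     # interrupted by the event after about 10 ms
second = wait_until_next_step(0.05)   # nothing pending, so the full delay elapses
```

No polling period has to be chosen or tuned; the cost is the blocking queue and thread machinery, which matches the "complex OS constructs" listed as a disadvantage.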
Solution: Achieving Real-Virtual Communication: Sensitivity to external events
Method | Advantages | Disadvantages
Polling | Simple implementation; fastest | Tradeoff necessary
Adaptive polling | No complex OS constructs | Tuning necessary
Fully event-driven | No tradeoff or tuning necessary | Complex implementation; uses complex OS constructs
A mathematical model to estimate execution performance
Virtual subsystems
• Must show sufficiently realistic behavior
  • e.g. run in real-time
• Remain loyal to the simulation model
  • e.g. concurrent outputs must be concurrent enough in real-time
⇒ A mathematical model to estimate the simulation’s execution
Mathematical model: Real-time simulation
For tWdelay ≥ 0:
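The formula that followed this condition is not part of the transcript. One reading that is consistent with the binding rule tWnew – tW = tSnew – tS and with the intervals tWpassed and tWdelay named on the timing diagram is given below; it is a reconstruction under those assumptions, not the slide's original equation.

```latex
\[
t_{Wpassed} + t_{Wdelay} = t_{Wnew} - t_{W} = t_{Snew} - t_{S}
\quad\Longrightarrow\quad
t_{Wdelay} = (t_{Snew} - t_{S}) - t_{Wpassed} \ge 0
\]
```

Under this reading, the condition tWdelay ≥ 0 is equivalent to tWpassed / (tSnew – tS) ≤ 1.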
• SystemC is based on C++
• Very hard to statically determine tWevaluate and tWupdate
• Instrumentation and profiling can be used
• Measure tWpassed / (tSnew – tS); if it is > 1, the model is not real-time capable
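The profiling check can be sketched in Python. The sample format (one tuple per simulation step) and the function names are assumptions; the ratio itself is the one named on the slide.

```python
def real_time_ratio(t_w_passed, t_s, t_s_new):
    """Wall time consumed by a step divided by the simulated time it covers."""
    return t_w_passed / (t_s_new - t_s)


def is_real_time_capable(samples):
    """samples: iterable of (t_w_passed, t_s, t_s_new) tuples from instrumentation."""
    return all(real_time_ratio(*s) <= 1 for s in samples)


# Hypothetical profile in microseconds: each step covers 100 µs of simulated time.
good_profile = [(30, 0, 100), (45, 100, 200), (80, 200, 300)]
bad_profile = [(30, 0, 100), (130, 100, 200)]
ok = is_real_time_capable(good_profile)
too_slow = is_real_time_capable(bad_profile)
```

A single step with a ratio above 1 is enough to fall behind the wall clock for that step, so the check uses the worst case rather than the average.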
[Figure: timing of simulation steps n – 1, n and n + 1 (delta cycles, outputs, time advance) on the simulation clock and the wall clock, with the intervals tWdeltacycles, tWoutput, tWdeltaend, tWactual, tWdelay and tWpassed]
Mathematical model: Concurrent outputs
Simplifying assumptions
• All outputs done by async. threads
• Trigger to output threads similar for all hybrid channels
• Number of operations at a simulation time does not exceed the number of processing cores
• All hybrid channels use update_real for output
⇒ All operations take the same trigger time tWo in the simulation thread
⇒ Real output operation time tWdi differs
Mathematical model: Concurrent outputs
Output window
Critical output window
• For exclusive subsets of outputs
• Critical outputs can be ordered successively
⇒ For a subset of m hybrid channels
nH: # hybrid channels
tWo: measured as a platform characteristic
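The window formulas themselves are not in the transcript. The sketch below only encodes the proportionality reported with the results, (nH – 1) for all channels versus (m – 1) for a successively ordered critical subset, and ignores the differing real output times tWdi. The function name and the value of tWo are hypothetical.

```python
def trigger_spread(n_outputs, t_wo):
    """Estimated time between triggering the first and the last of n_outputs.

    Assumes every trigger costs the same time t_wo in the simulation thread.
    """
    return (n_outputs - 1) * t_wo


T_WO_US = 2                                    # hypothetical platform characteristic, in µs
window_all = trigger_spread(7, T_WO_US)        # nH = 7 channels
window_critical = trigger_spread(2, T_WO_US)   # m = 2 critical channels ordered successively
improvement = window_all / window_critical
```

With seven channels and a critical pair, the estimate predicts a sixfold reduction, independent of the actual value of tWo.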
Experimental evaluation: Simulation speed & determinism
Pulse width modulation (PWM)
Experimental evaluation: PWM Results
• Signal stable up to 10 kHz
• Only minor effect of CPU load ⇒ real-time scheduler works fine
• The ratio tWpassed / (tSnew – tS) is reflected in the output jitter ⇒ math. model works
• Computation power still available, but jitter prevents higher freq.
[Figure: (a) max jitter / desired period, with and without CPU load; (b) tWpassed / (tSnew – tS), max and avg, with and without CPU load; both versus desired frequency from 0.01 to 10 kHz]
[Figure: waveform at (a) 10 kHz desired freq., (b) 100 kHz desired freq. (persistence = infinite)]
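The metric "max jitter / desired period" can be computed from measured edge timestamps as sketched below. The definition of jitter used here (deviation of each measured period from the desired one) and the sample data are assumptions, since the slides do not spell them out.

```python
def max_jitter_ratio(edge_times, desired_period):
    """Largest deviation of a measured period from the desired period, as a fraction of it.

    edge_times     : timestamps of successive rising edges
    desired_period : nominal period, in the same unit as edge_times
    """
    periods = [b - a for a, b in zip(edge_times, edge_times[1:])]
    return max(abs(p - desired_period) for p in periods) / desired_period


edges_us = [0, 100, 205, 300, 398]          # hypothetical rising edges of a 10 kHz signal
ratio = max_jitter_ratio(edges_us, 100)     # measured periods: 100, 105, 95, 98 µs
```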
Experimental evaluation: I/O performance
Experiment: Ethernet round-trip time (RTT)
Experimental evaluation: RTT results
Simple polling
• Polling period has bigger effect than frame size
• Frame size has linear effect
• 10 ms cycle usable, 1 ms cycle usable with low polling period
Adaptive polling
• PI control in [10 µs; 10 ms]
• Complex behavior
  • Longer frames decrease polling period, but increase transmission time
• 1 ms cycle usable
[Figure: RTT for simple polling (min, max, avg) by frame size (64, 780, 1514 bytes) and polling period (100 µs, 1000 µs)]
[Figure: RTT for adaptive polling (min, max, avg) by frame size (64, 780, 1514 bytes)]
Experimental evaluation: RTT results
Event-driven
• No tradeoff necessary
• No tuning necessary
• Max RTT not affected by frame size
• Performs worse than adaptive polling for larger frames
  • There is an inherent latency in the more complex OS constructs employed
[Figure: RTT for event-driven (min, max, avg) by frame size (64, 780, 1514 bytes)]
[Figure: RTT for adaptive polling (min, max, avg) by frame size (64, 780, 1514 bytes)]
Experimental evaluation: Concurrent outputs
MultiPWM experiment
• Performance of concurrent outputs with output ordering
MultiPWM with HW support experiment
• Performance of concurrent outputs with HW support
Experimental evaluation: Concurrent output results
No HW support
• Successive ordering gives a 6.7x smaller difference
  • Mathematical model works: (nH – 1) → (m – 1) ≅ 6 → 1
• Maximum difference becomes 10.7x smaller
  • The chance of being caught by the worst system latency is smaller
• HW support achieves 100% concurrence
[Figure: waveform of 2 signals (a) with 5 channels in between, (b) successive (persistence = infinite)]
[Figure: time difference between signals in µs (avg, max, min) for output ordering with dist = 6 and dist = 1]
Experimental evaluation: Real-Life Experiment
BBMD experiment
• BACnet Broadcast Management Device model in SystemC
• Non-timed transaction-level model
• Traffic
  • Between management station and automation stations
  • Peer-to-peer traffic among automation stations
Experimental evaluation: BBMD results
Non-timed transaction-level BBMD outperformed the real BBMD
• Average response time: up to 80x better
• Incoming packet burst: 2000 packets/s
  • 67% drop at real BBMD
  • No drops at virtual BBMD
Conclusion
Hardware-in-the-loop concept for HW/SW co-design of embedded systems
• Developed
• Implemented
• Experimentally evaluated
Hybrid channel
• Good encapsulation
• Very generic, can implement any kind of communication
• Clear interface to SystemC model
Real-time patch
• Non-intrusive
• Adequate level of determinism reached with common tools
Conclusion
Use in new domains may multiply
• More powerful modeling platforms
• Improvements in simulation speed: parallelism, optimization
External events
• Event-driven implementation could perform better on an RTOS
• Adaptive polling can be tuned better via the provided PID mechanism
Determinism requires manual tuning
• RT_PREEMPT is constantly being improved, better tools are being developed
Conclusion
Mathematical model
• Usable for estimating if our method will work for a given SystemC model
  • Needed empirical data is easy to obtain
• Also usable for understanding bottlenecks in the SystemC model
Future work: scalability
• Test of our method with larger SystemC models
  • BBMD experiment is realistic, yet smaller than industry-scale models
  • Workshop in Germany will focus on this aspect, too
• Mathematical model can also be leveraged to estimate scalability
Discussion
Thanks for your attention
Contributions & questions are welcome
Doğan Fennibay
Appendix
Transaction-level modeling
• High-level, more abstract modeling
• Do NOT model every register transfer
• Model transactions at the bottom level
• TLM serves as a golden copy for less abstract RTL modelers
Hybrid channel examples
Experimental evaluation
SW platform
• Linux 2.6.31.6-rt19
  • In RT_PREEMPT mode
• SystemC 2.2.0
  • + Our real-time patch
• CPU load via multiple instances of an infinitely spinning shell script
HW platform
• Dual Intel Quad-Core XEON at 3.4 GHz
• Intel Pentium 4 HT at 3.2 GHz (only in BBMD experiment)
Preliminaries: SystemC
• System-level modeling
• Transaction-level to register-transfer level
Constructs
• Channel: communication
• Process: execution paths
  • Method, thread, clocked thread
• sc_event: synchronization
  • Different from the event in a discrete event simulator
• Interface: entry point to a channel
• Port: exit point from a module
Solution: Output timing
Moment | Advantages | Disadvantages
Evaluate, e.g. write (SystemC standard) | Data does not wait at all | Data must not change in later cycles (e.g. sc_fifo)
Update (SystemC standard) | The final data of concurrent processes is used (e.g. sc_signal) | Delta cycle processing time increases; concurrent outputs are distributed wider in real-time
Time advance, update_real (our contribution) | Fewer outputs in total; concurrent outputs calculated by delta cycles are gathered together in real-time; glitches occurring at the end of delta cycles are not relayed to outside | (none listed)
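The effect of outputting at time advance can be illustrated with a minimal Python sketch of a hybrid channel. The method name `update_real` appears on the slides, but its behavior here is an assumed reading; the class `HybridChannel` and the `driver` callable are hypothetical stand-ins for the SystemC interface and the real I/O driver.

```python
class HybridChannel:
    """Virtual side buffers writes; the real side is driven once per time advance."""

    def __init__(self, driver):
        self._driver = driver       # callable performing the real I/O (hypothetical)
        self._pending = None
        self._dirty = False

    def write(self, value):
        """Virtual side: called by model processes during delta cycles."""
        self._pending = value
        self._dirty = True

    def update_real(self):
        """Called at time advance: relay only the settled value, if there is one."""
        if self._dirty:
            self._driver(self._pending)
            self._dirty = False


sent = []
channel = HybridChannel(sent.append)
channel.write(0)
channel.write(1)        # glitch within the same simulation step
channel.write(0)
channel.update_real()   # only the final value reaches the real world
channel.update_real()   # nothing new, so nothing is sent
```

Three writes in one step produce a single real output, which reflects the "fewer outputs" and "glitches not relayed" entries for the time-advance moment.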