![Page 1: On Workload in an SCA-based System, with Varying Component and Data Packet Sizes Tore Ulversøy 1 Jon Olavsson Neset 2 1 FFI 1 UNIK University Graduate.](https://reader035.fdocuments.us/reader035/viewer/2022062511/55163d63550346c6758b5275/html5/thumbnails/1.jpg)
On Workload in an SCA-based System, with Varying Component and Data Packet Sizes
Tore Ulversøy (1)
Jon Olavsson Neset (2)
(1) FFI; UNIK University Graduate Center; University of Oslo (UiO)
(2) Norwegian University of Science and Technology (NTNU)
![Page 2: On Workload in an SCA-based System, with Varying Component and Data Packet Sizes Tore Ulversøy 1 Jon Olavsson Neset 2 1 FFI 1 UNIK University Graduate.](https://reader035.fdocuments.us/reader035/viewer/2022062511/55163d63550346c6758b5275/html5/thumbnails/2.jpg)
Outline
• Background and Problem Definition
• Empirical Analysis
• Analysis using Low-Complexity Analytical Models
• Conclusions
![Page 3: On Workload in an SCA-based System, with Varying Component and Data Packet Sizes Tore Ulversøy 1 Jon Olavsson Neset 2 1 FFI 1 UNIK University Graduate.](https://reader035.fdocuments.us/reader035/viewer/2022062511/55163d63550346c6758b5275/html5/thumbnails/3.jpg)
Background
• The base code of one of the waveform applications used in the following originates from a member of RTO-IST-080 RTG-038 (Software Defined Radio), and the waveform application is also used for other activities in that group
• RTG-038 is the Regular Task Group on SDR, founded under RTO-IST-080
• The team currently consists of experts from government, university and industry from CA, DK, GE, HU, IT, NL, NO, SP, TU and US, together with the SDR Forum
• Headed by NL (Chairman: Hans Segers, TNO)
![Page 4: On Workload in an SCA-based System, with Varying Component and Data Packet Sizes Tore Ulversøy 1 Jon Olavsson Neset 2 1 FFI 1 UNIK University Graduate.](https://reader035.fdocuments.us/reader035/viewer/2022062511/55163d63550346c6758b5275/html5/thumbnails/4.jpg)
Background: Main Objectives of RTG-038
• Share knowledge & experience of (multi)national SDR/SCA developments
• Report on possibilities of sharing waveforms and waveform components
• Investigations of portability and interoperability:
– SCA-based implementation of STANAG 4285 waveform
– demonstrate portability onto national SDR platforms
– demonstrate interoperability between the different implementations
![Page 5: On Workload in an SCA-based System, with Varying Component and Data Packet Sizes Tore Ulversøy 1 Jon Olavsson Neset 2 1 FFI 1 UNIK University Graduate.](https://reader035.fdocuments.us/reader035/viewer/2022062511/55163d63550346c6758b5275/html5/thumbnails/5.jpg)
Problem Definition and Problem Background
• SCA defines an environment that allows applications to be built as compositions of SW components (and devices)
• SCA defines a distributed system, with communication through CORBA for CORBA-capable processors
• There is wide freedom as to how small the components are that the application is split into: with many small components, reuse of components becomes easier, but CPU overhead increases
[Figure: Application/Component View (components C1 to C10, or a single component C_tot) mapped onto the Physical View (Processor 1, Processor 2, Processor 3)]
• What are the CPU overhead effects of a fine structure (many components) relative to a coarse one (few components), and how can we predict this overhead?
![Page 6: On Workload in an SCA-based System, with Varying Component and Data Packet Sizes Tore Ulversøy 1 Jon Olavsson Neset 2 1 FFI 1 UNIK University Graduate.](https://reader035.fdocuments.us/reader035/viewer/2022062511/55163d63550346c6758b5275/html5/thumbnails/6.jpg)
Analysis Approach:
CPU workload implied by a task or a group of tasks = the fraction of available processor cycles occupied over a time period
Analysis methods, ordered from increasing clarity / simplicity (first) towards increasing accuracy (last):
• Analytical model
• Concurrency model, e.g. Petri-net
• Simulation model
• Measurements on testbed
• Measurements on actual system
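The workload definition above can be written out as a small calculation. The following is a minimal sketch, not taken from the slides: the function name and the example numbers are hypothetical, and it assumes a task whose cost is known in cycles per packet and that runs at a fixed packet rate.

```python
def cpu_workload_percent(cycles_per_packet: float,
                         packet_rate_hz: float,
                         cycle_rate_hz: float) -> float:
    """Fraction of available processor cycles occupied, as a percentage."""
    occupied_cycles_per_second = cycles_per_packet * packet_rate_hz
    return 100.0 * occupied_cycles_per_second / cycle_rate_hz


# Hypothetical task: 1.86e7 cycles per packet at 10 packets/s
# on a processor with a cycle rate of 1.86e9 cycles/s.
wl = cpu_workload_percent(1.86e7, 10.0, 1.86e9)
print(f"{wl:.1f} %")
```

With these illustrative numbers the task occupies about one tenth of the available cycles.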
![Page 7: On Workload in an SCA-based System, with Varying Component and Data Packet Sizes Tore Ulversøy 1 Jon Olavsson Neset 2 1 FFI 1 UNIK University Graduate.](https://reader035.fdocuments.us/reader035/viewer/2022062511/55163d63550346c6758b5275/html5/thumbnails/7.jpg)
Empirical Analysis
• Using OSSIE (Open Source SCA Implementation Embedded) [2] from Virginia Tech, which uses omniORB [3]
– Advantages: Low user threshold, full source code available, Linux-based
• Profiling and monitoring tools:
– OProfile [4]
– SYSSTAT sar [5]

| | Experiment: Stanag 4285 | Experiment: Synthetic waveform |
|---|---|---|
| OS | Linux 2.6.9-34.EL | Linux 2.6.9-34.EL |
| OSSIE revision | 0.6.0 | 0.6.2 |
| Processor | Pentium M 1.86 GHz | Pentium M 1.86 GHz |
| RAM | 1.5 GByte | 1.5 GByte |
| Cache specifics | L1i=32kB, L1d=32kB, L2=2MB | L1i=32kB, L1d=32kB, L2=2MB |

L1i = first level instruction cache, L1d = first level data cache, L2 = second level cache
![Page 8: On Workload in an SCA-based System, with Varying Component and Data Packet Sizes Tore Ulversøy 1 Jon Olavsson Neset 2 1 FFI 1 UNIK University Graduate.](https://reader035.fdocuments.us/reader035/viewer/2022062511/55163d63550346c6758b5275/html5/thumbnails/8.jpg)
Empirical Analysis, Simple Waveform Application
• Stanag 4285, TX part. Base code provided by Telefunken Racoms for RTO-IST-080 RTG-038
• Implemented as three different configurations, all performing the same processing functional work
• Non-SCA ‘c’ version as a reference
[Figure: the three configurations]
• 2 comp.: Stanag4285 TX, Data Sink
• 7 comp.: Data Source, FEC Encoder, Interleaver, Symbol Mapper & Scrambler, Symbol to I/Q & TX Filter, Float to fixed converter, Data Sink
• 11 comp.: the same components as the 7-component configuration, plus four Forwarder components
• The figure also shows a packet rate regulator
![Page 9: On Workload in an SCA-based System, with Varying Component and Data Packet Sizes Tore Ulversøy 1 Jon Olavsson Neset 2 1 FFI 1 UNIK University Graduate.](https://reader035.fdocuments.us/reader035/viewer/2022062511/55163d63550346c6758b5275/html5/thumbnails/9.jpg)
Empirical Analysis, Stanag 4285 TX: Results, User
[Figure: User CPU % (axis 0 to 30) versus symbol rate (2395, 25600, 64000) for Non-SCA, Single component+sink, 6-components+sink and 10-components+sink]
WL measured by SYSSTAT sar (sar -u 40 5)
![Page 10: On Workload in an SCA-based System, with Varying Component and Data Packet Sizes Tore Ulversøy 1 Jon Olavsson Neset 2 1 FFI 1 UNIK University Graduate.](https://reader035.fdocuments.us/reader035/viewer/2022062511/55163d63550346c6758b5275/html5/thumbnails/10.jpg)
Empirical Analysis, Stanag 4285 TX: Results, User+System
[Figure: User + System CPU % (axis 0 to 45) versus symbol rate (2395, 25600, 64000) for Non-SCA, Single component+sink, 6-components+sink and 10-components+sink]
WL measured by SYSSTAT sar (sar -u 40 5)
![Page 11: On Workload in an SCA-based System, with Varying Component and Data Packet Sizes Tore Ulversøy 1 Jon Olavsson Neset 2 1 FFI 1 UNIK University Graduate.](https://reader035.fdocuments.us/reader035/viewer/2022062511/55163d63550346c6758b5275/html5/thumbnails/11.jpg)
Empirical Analysis, Synthetic Application
• A total of 9 FIR filters, each with N taps, and packet size B (N×B mult/adds per FIR)
• Both N and B can be varied
• 4 different configurations
• C version as a reference
[Figure: the four configurations]
• W2: FTOT, SNK
• W3: SRC, F1TO9, SNK
• W5: SRC, F123, F456, F789, SNK
• W11: SRC, F1, F2, F3, F4, F5, F6, F7, F8, F9, SNK
• The figure also shows a packet rate regulator
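To make the "same functional work, different granularity" idea concrete, the following minimal sketch chains nine FIR filters and groups them into components in the W3, W5 and W11 style. It is an illustration only: the helper names, the moving-average taps and the per-packet zero initial state are assumptions, not details from the slides, and no SCA or CORBA machinery is involved.

```python
from typing import Callable, List, Sequence


def fir(packet: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Direct-form FIR over one packet: N*B multiply-adds for N taps, B samples."""
    out = []
    for i in range(len(packet)):
        acc = 0.0
        for k in range(len(taps)):
            # Assumption: samples before the start of the packet are zero.
            x = packet[i - k] if i - k >= 0 else 0.0
            acc += taps[k] * x
        out.append(acc)
    return out


def make_component(filters: Sequence[Sequence[float]]) -> Callable:
    """A hypothetical 'component' that runs its filters back to back."""
    def run(packet):
        for taps in filters:
            packet = fir(packet, taps)
        return packet
    return run


def run_chain(components, packet):
    """Pass one packet through the components in order."""
    for component in components:
        packet = component(packet)
    return packet


N, B = 10, 200
taps = [1.0 / N] * N                      # illustrative taps only
filters = [taps] * 9                      # 9 FIR filters in total
packet = [float(i % 7) for i in range(B)]

w3 = [make_component(filters)]                              # F1TO9
w5 = [make_component(filters[i:i + 3]) for i in (0, 3, 6)]  # F123, F456, F789
w11 = [make_component([f]) for f in filters]                # F1 ... F9

mult_adds_per_fir = N * B
same_output = run_chain(w3, packet) == run_chain(w5, packet) == run_chain(w11, packet)
print(mult_adds_per_fir, same_output)
```

All three groupings perform the same arithmetic in the same order; in the measured system the difference between them lies in the per-component overhead, which this sketch deliberately leaves out.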
![Page 12: On Workload in an SCA-based System, with Varying Component and Data Packet Sizes Tore Ulversøy 1 Jon Olavsson Neset 2 1 FFI 1 UNIK University Graduate.](https://reader035.fdocuments.us/reader035/viewer/2022062511/55163d63550346c6758b5275/html5/thumbnails/12.jpg)
Synthetic Application: WL versus Configuration and N
[Figure: CPU Workload versus Configuration, B=2000, PR=40. Stacked bars (User, System) of CPU WL [%] (axis 0 to 50) for FUNC, W2, W3, W5 and W11, at N=10 and at N=50]
WL results measured by SYSSTAT sar (sar -u 40 3)
Ratios annotated in the figure:
• Ratio, user: 1.67; ratio, user+syst: 1.94
• Ratio, user: 1.10; ratio, user+syst: 1.17
![Page 13: On Workload in an SCA-based System, with Varying Component and Data Packet Sizes Tore Ulversøy 1 Jon Olavsson Neset 2 1 FFI 1 UNIK University Graduate.](https://reader035.fdocuments.us/reader035/viewer/2022062511/55163d63550346c6758b5275/html5/thumbnails/13.jpg)
Synthetic Application: WL versus Packet Size
[Figure: CPU Workload versus Packet Size. CPU WL [%] (axis 0 to 35) versus packet size (0 to 70000), with user (U) and user+system (U+S) curves for FUNC, W2, W3, W5 and W11]
• Packet rate: 10/sec
• N and B selected such that the C implementation (FUNC) is at 10±0.3% user CPU WL
• WL overhead is seen to increase significantly with B
![Page 14: On Workload in an SCA-based System, with Varying Component and Data Packet Sizes Tore Ulversøy 1 Jon Olavsson Neset 2 1 FFI 1 UNIK University Graduate.](https://reader035.fdocuments.us/reader035/viewer/2022062511/55163d63550346c6758b5275/html5/thumbnails/14.jpg)
The Simple Lower Bound Model (SLBM)
• Ideal, unrealistically optimistic model
• Serves as a lower bound
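• t_i = number of cycles per packet

$$WL\%_i = \frac{PR \cdot \left( 9\,t_{CL} + t_{SRC} + t_{SNK} + (M-1)\,t_{packet} + (M-1)\,t_{TS} + (M-2)\,t_{TF} \right)}{PCR} \cdot 100$$

[Figure: execution timeline for adjacent components C_N and C_N+1, started on timer strobe, showing the segments t_SRC, t_SNK, t_packet, t_TS, t_CL and t_TF, followed by idle time]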
![Page 15: On Workload in an SCA-based System, with Varying Component and Data Packet Sizes Tore Ulversøy 1 Jon Olavsson Neset 2 1 FFI 1 UNIK University Graduate.](https://reader035.fdocuments.us/reader035/viewer/2022062511/55163d63550346c6758b5275/html5/thumbnails/15.jpg)
Parameters in the Simple Lower Bound Model
• For simplicity, we measure the parameters in the model with OProfile and/or SYSSTAT sar, using test applications:
– C program: for (i=0; i < BLSZ; i++) ......
– CORBA test application: CORBATSTSRC, CORBATSTSINK

| | Method | t_SRC | t_SNK | t_TS(B) | t_TF(B) | t_packet(B) |
|---|---|---|---|---|---|---|
| User | sar estimate | Assumed 0 | Assumed 0 | 14.5*B | 10.6*B | 79200+5.5*B |
| User | OProfile estimate | 650 | 200 | 14.6*B | 10.6*B | 80700+5.5*B |
| System | sar estimate | Assumed 0 | Assumed 0 | ≈ 0 | ≈ 0 | 160000+23.6*B |
| System | OProfile estimate | ≈ 0 | ≈ 0 | ≈ 0 | ≈ 0 | 150000+23.0*B |
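As a worked example, the SLBM can be evaluated with the OProfile user estimates from the table. This is a sketch under stated assumptions: it reads the slide formula as WL%_i = PR·(9·t_CL + t_SRC + t_SNK + (M−1)·t_packet + (M−1)·t_TS + (M−2)·t_TF)/PCR·100, it takes PCR as the 1.86 GHz clock of the test processor, and it sets 9·t_CL so that the functional work alone gives 10% user workload, as in the packet-size experiment. The function and variable names are hypothetical.

```python
PCR = 1.86e9   # processor cycle rate [cycles/s]; assumed equal to the 1.86 GHz clock
PR = 10.0      # packet rate [packets/s]
M = 11         # number of components in W11, including source and sink

# OProfile user-mode estimates [cycles per packet] as functions of packet size B.
T_SRC = 650.0
T_SNK = 200.0


def t_ts(b: float) -> float:
    return 14.6 * b


def t_tf(b: float) -> float:
    return 10.6 * b


def t_packet(b: float) -> float:
    return 80700.0 + 5.5 * b


def slbm_user_wl_percent(b: float, nine_t_cl: float,
                         m: int = M, pr: float = PR, pcr: float = PCR) -> float:
    """Simple Lower Bound Model, user workload in percent (one reading of the slide)."""
    cycles_per_packet = (nine_t_cl + T_SRC + T_SNK
                         + (m - 1) * t_packet(b)
                         + (m - 1) * t_ts(b)
                         + (m - 2) * t_tf(b))
    return 100.0 * pr * cycles_per_packet / pcr


# Assumption: functional work (9 * t_CL) fixed at 10% user workload.
nine_t_cl = 0.10 * PCR / PR

for b in (2000, 20000, 60000):
    print(b, round(slbm_user_wl_percent(b, nine_t_cl), 2))
```

Under these assumptions the model rises from a little above 10% at small packet sizes to about 20% at B = 60000, which is the kind of growth with B that the measurements show.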
![Page 16: On Workload in an SCA-based System, with Varying Component and Data Packet Sizes Tore Ulversøy 1 Jon Olavsson Neset 2 1 FFI 1 UNIK University Graduate.](https://reader035.fdocuments.us/reader035/viewer/2022062511/55163d63550346c6758b5275/html5/thumbnails/16.jpg)
Results, SLBM
• The simple model describes the dominating part of the user CPU overhead. Agreement best for small packet sizes
[Figure: Comparison of W11 Measured Data and Simple Lower Bound Model. WL User [%] (axis 10 to 24) versus packet size B (0 to 6 × 10^4), with curves Measured and SLBM]
M=11, Packet rate = 10/sec
N and B selected such that the C implementation (FUNC) is at 10±0.3% user CPU WL
![Page 17: On Workload in an SCA-based System, with Varying Component and Data Packet Sizes Tore Ulversøy 1 Jon Olavsson Neset 2 1 FFI 1 UNIK University Graduate.](https://reader035.fdocuments.us/reader035/viewer/2022062511/55163d63550346c6758b5275/html5/thumbnails/17.jpg)
The Context Switch Model (CSM)
$$WL\%_{cs} = WL\%_i + \frac{CSR \cdot \left( t_{CSD} + t_{CSI} \right)}{PCR} \cdot 100$$

• CSR: Context Switch Rate [switches/second]. Measured ≈ 1300 for the example on the next slide
• t_CSD: CS Direct Cost [cycles]. Here: ≈ 5 µsec = 9300 cycles
• t_CSI: CS Indirect Cost [cycles]. Here: using a coarse estimate based on addressed space and memory speed
• PCR: Cycle rate of the processor
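The context switch term can be evaluated in the same way. The sketch below assumes the model adds CSR·(t_CSD + t_CSI)/PCR·100 to the SLBM workload, takes PCR as 1.86e9 cycles/s, and reads the "≈ 1300" figure as the context switch rate; the indirect cost is left as a parameter because the slides give no number for it. The function name is hypothetical.

```python
def csm_wl_percent(wl_i_percent: float, csr: float,
                   t_csd: float, t_csi: float, pcr: float) -> float:
    """Context Switch Model: SLBM workload plus context switch overhead, in percent."""
    return wl_i_percent + 100.0 * csr * (t_csd + t_csi) / pcr


PCR = 1.86e9            # processor cycle rate [cycles/s] (assumed 1.86 GHz clock)
CSR = 1300.0            # context switches per second (assumed reading of the slide)
T_CSD = 5e-6 * PCR      # direct cost: about 5 microseconds, i.e. about 9300 cycles

# Overhead from the direct cost alone, with the indirect cost set to zero.
direct_only = csm_wl_percent(0.0, CSR, T_CSD, 0.0, PCR)
print(round(T_CSD), round(direct_only, 2))
```

With these inputs the direct cost alone accounts for well under one percentage point of workload, so any larger gap between the SLBM and the measurements would have to come from the indirect cost term.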
![Page 18: On Workload in an SCA-based System, with Varying Component and Data Packet Sizes Tore Ulversøy 1 Jon Olavsson Neset 2 1 FFI 1 UNIK University Graduate.](https://reader035.fdocuments.us/reader035/viewer/2022062511/55163d63550346c6758b5275/html5/thumbnails/18.jpg)
Results, CSM
• With the CS model, we better explain the measured WL
[Figure: Comparison of W11 Measured Data and CS Model with Approximate Parameters. WL User [%] (axis 10 to 24) versus packet size B (0 to 6 × 10^4), with curves Measured; CSM, only t_CSD, coarse estimate; and CSM example (coarse parameter estimates)]
M=11, Packet rate = 10/sec
N and B selected such that the C implementation (FUNC) is at 10±0.3% user CPU WL
![Page 19: On Workload in an SCA-based System, with Varying Component and Data Packet Sizes Tore Ulversøy 1 Jon Olavsson Neset 2 1 FFI 1 UNIK University Graduate.](https://reader035.fdocuments.us/reader035/viewer/2022062511/55163d63550346c6758b5275/html5/thumbnails/19.jpg)
Conclusions:
• We have used empirical analysis and simple analytical models to understand the effects of granularity in an SCA-based system with CORBA-capable processors
• When executing the same total functional processing work, we observe that the processor workload increases as the number of components increases
• This overhead increases with data packet size, and becomes more dominant the less functional work there is per packet
• Contributors: Data conversions, packet communication through CORBA, direct cost of context switches, indirect cost of context switches
• Hence the scalability and reusability benefits that result from implementing the SDR application with a high number of components must be balanced against the processing efficiency loss that occurs when several components have to run on the same processor
• Two simple models are described that help explain the major effects, and may be used to calculate the overhead
![Page 20: On Workload in an SCA-based System, with Varying Component and Data Packet Sizes Tore Ulversøy 1 Jon Olavsson Neset 2 1 FFI 1 UNIK University Graduate.](https://reader035.fdocuments.us/reader035/viewer/2022062511/55163d63550346c6758b5275/html5/thumbnails/20.jpg)
References:
[1] P. J. Fortier and H. E. Michel, Computer Systems Performance Evaluation and Prediction. Amsterdam: Digital Press, 2003.
[2] Virginia Tech, OSSIE development site for software-defined radio, http://ossie.wireless.vt.edu/trac as of Dec. 20 2007
[3] omniORB, http://omniorb.sourceforge.net/ as of Feb. 29 2008
[4] OProfile - A System Profiler for Linux, http://oprofile.sourceforge.net as of Feb. 29 2008
[5] SYSSTAT, http://pagesperso-orange.fr/sebastien.godard/ as of Feb. 29 2008
[6] Chuanpeng Li, Chen Ding, and Kai Shen, "Quantifying The Cost of Context Switch," in ExpCS '07: Proceedings of the 2007 workshop on Experimental computer science, San Diego, CA, 13-14 June 2007.
![Page 21: On Workload in an SCA-based System, with Varying Component and Data Packet Sizes Tore Ulversøy 1 Jon Olavsson Neset 2 1 FFI 1 UNIK University Graduate.](https://reader035.fdocuments.us/reader035/viewer/2022062511/55163d63550346c6758b5275/html5/thumbnails/21.jpg)
Questions?