Workshop - November 2011 - Toulouse
A. BERJAOUI (AKKA IS for Astrium), A. LEFEVRE & C. LE LANN (Astrium)
SystemC/TLM virtual platforms
Use of SystemC/TLM virtual platforms for the exploration, the specification and the validation of critical embedded SoCs
Overview
Context
Separation of time & functionality presentation
Timed TLM models vs. CABA models
Design Space Exploration with SystemC/TLM 2.0
HW in the loop – Use of CHIPit®
Future prospects
Open questions
Context
Define a proper method to use SystemC/TLM for SoC modelling
Use SystemC/TLM for DSE (performance estimation, bottleneck identification…)
Use SystemC/TLM models for HW specification
Evaluate the selected methodology
SystemC/TLM Usage Context
Programmer’s View (PV) or functional simulation
Time is not represented, only functionality is modelled.
Functional synchronization is necessary. It is done at System Synchronization Points (SSP): configuration register accesses, interrupts and all state-altering accesses.
The need for time
Performance measurements
Design Space Exploration
…how? Precision? Modelling granularity? Simulation performance?
The obvious solution: mixing time and functionality
It works!
…but…
Functional modifications cannot be verified without having to verify all timed aspects as well
Modelling granularity is hard to modify once it has been set
Modules cannot be easily reused for other platforms
Separation of time & functionality
[Figure: ISS, router and memory. Labels: ISS PV, PV router, Memory PV (functional simulation phase); ISS T, detailed bus model, Memory T (timed simulation phase); ISS PVT, Memory PVT; initiator port, target port.]
[Figure: the same platform (ISS PVT, PV router, detailed bus model, Memory PVT, initiator and target ports) shown against a time axis from T = 0 ns to T = 12 ns.]
Advantages and limitations
PV & T mixed
Modelling is “natural”. Platforms are simple.
Interrupts can be modelled easily
Granularity is fixed
Mixed debugging and no control over simulation performance
Reuse problem
PV & T separated
Parallel development and debug of reusable PV and T models
Granularity can be controlled easily (by changing T model)
Modelling is more abstract. Platforms are complex
Interrupts are harder to model
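The separation can be illustrated with a minimal sketch. It is written in plain Python rather than SystemC, and every class name and latency figure in it is a hypothetical placeholder, not taken from the platform presented here. A PV model carries functionality only, a T model carries timing only, and a PVT wrapper pairs the two so that granularity changes by swapping the T model.

```python
class MemoryPV:
    """Functional (PV) view: stores data, knows nothing about time."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)

    def write(self, addr: int, payload: bytes) -> None:
        self.data[addr:addr + len(payload)] = payload

    def read(self, addr: int, length: int) -> bytes:
        return bytes(self.data[addr:addr + length])


class MemoryT:
    """Timing (T) view: returns a latency, knows nothing about data.

    The cost figures are made-up placeholders, not measured values.
    """

    def __init__(self, setup_ns: int, per_byte_ns: int) -> None:
        self.setup_ns = setup_ns
        self.per_byte_ns = per_byte_ns

    def latency_ns(self, length: int) -> int:
        return self.setup_ns + self.per_byte_ns * length


class MemoryPVT:
    """Wrapper pairing one PV model with one interchangeable T model."""

    def __init__(self, pv: MemoryPV, t: MemoryT) -> None:
        self.pv = pv
        self.t = t
        self.now_ns = 0  # simulated time consumed by this memory

    def write(self, addr: int, payload: bytes) -> None:
        self.pv.write(addr, payload)
        self.now_ns += self.t.latency_ns(len(payload))

    def read(self, addr: int, length: int) -> bytes:
        self.now_ns += self.t.latency_ns(length)
        return self.pv.read(addr, length)


# Same PV model, two timing granularities (both hypothetical).
coarse = MemoryPVT(MemoryPV(64), MemoryT(setup_ns=0, per_byte_ns=1))
fine = MemoryPVT(MemoryPV(64), MemoryT(setup_ns=3, per_byte_ns=1))
results = []
for mem in (coarse, fine):
    mem.write(0, b"abcd")
    results.append(mem.read(0, 4))  # functional result is identical
print(results, coarse.now_ns, fine.now_ns)
```

Only the T model differs between the two instances, so the functional behaviour can be debugged once and reused.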
TTP in the industry
In its current form, TTP cannot be used on an industrial scale:
Modelling is too complex to be used by architects
Modules are not re-used enough to justify such a modelling effort
Traffic generators are enough for DSE. Detailed functionality does not need to be specified for performance estimation.
HW specification is easier using cycle “approximate”/bit accurate models
Timed TLM vs CABA models
Different time modelling granularities:
  CABA in HDL => available, but slow simulations
  CABA in SystemC => not interesting (not available and slow simulations)
  Timed TLM (SystemC AT) => preferred
A timed TLM model of an existing RTL IP has been built to evaluate the methodology and assess the necessary effort
RTL IP chosen = SDRAM memory controller, because:
  this is a central module in SoC architecture explorations
  its timing behaviour is harder to determine than other modules’ (AHB buses for example)
SDRAM Memory Controller
The Memory Controller is the interface between the SoC bus and the external (on-board) memories
The latency of one access depends on:
  the access parameters
  the controller internal state
Objective for the timed model:
  the model should be pessimistic (longer than the RTL)
  +0 to +20 % timing accuracy
[Figure: the memory controller (MCTL) between the AHB bus and the SDRAM, SRAM and EEPROM memories.]
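The objective stated for the timed model can be written as a simple check. This is a plain-Python sketch; the function names and the sample latencies are hypothetical, and only the "pessimistic, +0 to +20 %" band comes from the slide.

```python
def relative_error(model_ns: float, rtl_ns: float) -> float:
    """Signed error of the model latency relative to the RTL latency."""
    return (model_ns - rtl_ns) / rtl_ns


def meets_objective(model_ns: float, rtl_ns: float, max_over: float = 0.20) -> bool:
    """True if the model is never faster than the RTL and at most
    max_over (20 % by default) slower."""
    err = relative_error(model_ns, rtl_ns)
    return 0.0 <= err <= max_over


# Hypothetical latencies in ns: (model, RTL)
samples = [(110.0, 100.0), (95.0, 100.0), (125.0, 100.0)]
verdicts = [meets_objective(m, r) for m, r in samples]
print(verdicts)
```

The second sample fails because the model is optimistic (faster than the RTL), the third because it is more than 20 % too slow.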
Time analysis methodology
[Figure: state machine diagram. States: IDLE, ACTV, WRITE, WRITE_SCRUB, READ_RMW, READ, READ_SCRUB, ALL_PRE, EARLYPRE, SEARLYPRE, ALL_PRE (latepre), RMW_RSENCODE, PWDOWN.]
RTL analysis
  RTL is composed of intricate cycle-based State Machines
  Requires manual extraction of timing rules
  May need to duplicate the RTL FSM in the TLM model
  => Not interesting
Macroscopic analysis
  Using RTL simulations to produce timing information
  Either guided statistics choice
  Or semi-automated using scripts
  => Elected method
Macroscopic time analysis
Guided time analysis
  Timing data is extracted from RTL simulations (traces of all the timings + relevant parameters)
  Rules are guessed by manually analyzing the traces…
  …and then automatically tested against a calibration test set
  This process iterates until the timing accuracy is satisfactory
Results of the time analysis iterations
  The parameters of the previous access also have a major impact (in addition to the parameters of the current access)
  Some features interfere (refresh and automatic scrubbing)
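One iteration of the guided analysis (guess a rule, test it against a calibration set) can be sketched as follows. This is plain Python under stated assumptions: the trace format, both candidate rules, the cycle counts and the two scoring metrics are all hypothetical and are not the rules or data of the actual memory controller study.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple


class Access(NamedTuple):
    is_write: bool
    row: int         # hypothetical parameter: SDRAM row targeted
    rtl_cycles: int  # latency observed in the RTL trace (made-up values)


Rule = Callable[[Access, Optional[Access]], int]


def rule_v1(cur: Access, prev: Optional[Access]) -> int:
    """First guess: fixed cost per access type."""
    return 6 if cur.is_write else 8


def rule_v2(cur: Access, prev: Optional[Access]) -> int:
    """Refined guess: the previous access also matters."""
    cycles = 6 if cur.is_write else 8
    if prev is not None and prev.row != cur.row:
        cycles += 4  # assumed penalty when the row changes
    return cycles


def score(rule: Rule, trace: List[Access]) -> Tuple[float, float]:
    """Return (fraction of mistimed accesses, mean relative latency error)."""
    mistimed = 0
    total_err = 0.0
    prev: Optional[Access] = None
    for cur in trace:
        predicted = rule(cur, prev)
        if predicted != cur.rtl_cycles:
            mistimed += 1
        total_err += abs(predicted - cur.rtl_cycles) / cur.rtl_cycles
        prev = cur
    return mistimed / len(trace), total_err / len(trace)


# Hypothetical calibration trace.
TRACE = [
    Access(False, 0, 8),
    Access(True, 0, 6),
    Access(False, 1, 12),
    Access(True, 1, 6),
    Access(True, 2, 10),
]

for rule in (rule_v1, rule_v2):
    print(rule.__name__, score(rule, TRACE))
```

In this toy trace the first rule mistimes the two accesses that follow a row change, which is the kind of observation that leads to including the previous access in the rule.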
Timed Model Validation
This timing model has been checked against RTL on an extensive test set
  more than 86000 transactions
  comes from the RTL validation test suite
Frequency   Mistimed transactions   Latency error
32 MHz      18%                     12%
48 MHz      14%                     17%
64 MHz      14%                     18%
96 MHz      17%                     17%
Validation results
The model is pessimistic (longer than the RTL)
Latency error between 12%-18%
The model is too simple to be 100% exact
But the goal is to keep a high level of abstraction
Possibility to increase the accuracy if necessary
Design Space Exploration with SystemC/TLM 2.0
A simple image processing platform has been designed to assess the use of SystemC/TLM for design space exploration
Algorithm
  Image spectral-compression platform
  Performs “subsampling” on incoming data packets
  Subsampled packets are then transferred to an auxiliary processing unit which performs a 2D-FFT (using a co-processor) and data encoding
[Figure: algorithm data flow from Input through Subsampling, 2D-FFT and Encoding to Output, with data sizes 10N, 5N, 5N and N.]
Processing platform
[Figure: processing platform block diagram with IO, DMA_a, Leon_a, Mem_a, DMA_b, Leon_b, Mem_b and FFT.]
Processing platform (cont’d)
IO module generates an interrupt causing DMA_a to transfer the input packet of size 10N to Mem_a
At the end of the transfer, Leon_a subsamples the data and writes the result to Mem_a
Leon_a configures DMA_b to transfer the result to Mem_b
At the end of the transfer, Leon_b configures the FFT module to perform a 2D-FFT
Leon_b encodes the result and programs DMA_b to send the result to the IO module
SystemC implementation
TLM-2 compliant (time & functionality are mixed)
Data exchange is AMBA – bus accurate (single/burst transactions, split)
Data sizes are respected and packets are identified by a packet ID.
The Leon processor modules act as “smart” traffic generators: they generate transactions in the correct order towards the appropriate targets.
OS tasks are simulated using SC_THREADs
SystemC implementation (cont’d)
No actual processing is performed. Processing time is simulated
Bus occupation, processing loads for all processing units were measured accurately
A system synchronization bug was identified => a “lock” register has been added to lock DMA_b during its configuration
It was possible to observe the impact of the modification of HW parameters and the input data rate. DMA_a was identified as a bottleneck.
ABV could also be implemented using ISIS
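Bottleneck identification from measured loads can be sketched as a utilisation computation. This plain-Python sketch is an assumption-laden illustration: the unit names follow the platform above, but the busy times and packet periods are invented for the example and are not the measurements of the study.

```python
def utilisation(busy_us_per_packet: dict, packet_period_us: float) -> dict:
    """Fraction of each packet period during which a unit is busy."""
    return {unit: busy / packet_period_us
            for unit, busy in busy_us_per_packet.items()}


def bottleneck(util: dict) -> str:
    """Unit with the highest utilisation."""
    return max(util, key=util.get)


def overloaded(util: dict) -> list:
    """Units that cannot keep up with the input data rate."""
    return [unit for unit, u in util.items() if u > 1.0]


# Hypothetical busy time per packet, in microseconds.
BUSY_US = {"DMA_a": 80.0, "Leon_a": 40.0, "DMA_b": 50.0,
           "Leon_b": 30.0, "FFT": 20.0}

nominal = utilisation(BUSY_US, packet_period_us=100.0)
doubled_rate = utilisation(BUSY_US, packet_period_us=50.0)
print(bottleneck(nominal), overloaded(nominal), overloaded(doubled_rate))
```

Changing the packet period plays the role of changing the input data rate: a unit whose utilisation exceeds 1.0 can no longer sustain the flow.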
Example
HW in the loop – use of CHIPit
CHIPit
  Virtex-based development platform
  Custom extension boards (SDRAM, Flash, IO, …)
  UMRBus = practical & fast PC-CHIPit ready-made interface
HW in the loop – use of CHIPit
CHIPit can be used for:
Incremental validation flow
  SC/TLM testbench composed of multiple sub-blocks
  Some sub-blocks may run on hardware (FPGA)
  The others still run as software SC functional models
  Soft-hard inter-block transactions via UMRBus + extra SystemC/VHDL
Improved simulation speed
  1000+ times faster is possible
  fewer soft-hard transactions = better improvement
[Figure: testbench sub-blocks, some running as software (“soft”) and some on the CHIPit (“hard”).]
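Why fewer soft-hard transactions give a better improvement can be seen with a first-order cost model. This plain-Python sketch is an assumption, not a measured CHIPit characterisation: it supposes that co-simulation time is the hardware run time plus a fixed link overhead per transaction, and all figures are made up.

```python
def cosim_speedup(soft_time_s: float, hw_time_s: float,
                  n_transactions: int, link_overhead_s: float) -> float:
    """Speed-up of soft+hard co-simulation over the full-soft run.

    Assumed model: co-simulation time = hardware run time
    + one fixed link overhead per soft-hard transaction.
    """
    cosim_time_s = hw_time_s + n_transactions * link_overhead_s
    return soft_time_s / cosim_time_s


# Hypothetical figures: 1000 s full-soft run, 0.5 s on hardware,
# 100 microseconds of overhead per soft-hard transaction.
few = cosim_speedup(1000.0, 0.5, n_transactions=1_000, link_overhead_s=1e-4)
many = cosim_speedup(1000.0, 0.5, n_transactions=1_000_000, link_overhead_s=1e-4)
print(round(few), round(many))
```

With these invented numbers the gain collapses once the per-transaction overhead dominates the hardware run time, which is the effect the slide points at.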
HW in the loop – use of CHIPit
What happens on a transaction?
Uncontrolled clock mode
  HW clock keeps working during a transaction
  SW clock and HW clock are not synchronised
  Easy to implement
Controlled clock mode
  HW clock is stopped upon each transaction, waiting for the software side
  SW clock and HW clock are synchronised on transaction bounds
  Needed if inputs/outputs must observe precise relative timings
  Harder to implement, more timing issues
  Not possible for all designs: complex designs require extra care
    SDRAM controller needs constant auto-refresh
    Inputs from extension boards may need immediate treatment
HW in the loop – use of CHIPit
Uncontrolled clock example: whole system overview
Electronic board with inputs/outputs to other electronic systems
SDRAM for internal data storage
ASIC/FPGA for data processing
[Figure: electronic board (ASIC, SDRAM memory, Periph 1, Periph 2) with Input 1, Input 2, Output 1 and Output 2, connected to Instrument 1, Instrument 2, Storage, RF comm and the OBC.]
HW in the loop – use of CHIPit
Uncontrolled clock example: ASIC internal view
Data processing composed of several sub-blocks
Sub-blocks perform independent tasks
Sequenced together with very few signals (e.g. req/ack)
[Figure: ASIC internal view. Blocks: RX, Processing 1, Processing 2, Processing 4, Core, Memory controller, Sequencer and TX, linked by FIFOs and req/ack signals; external SDRAM memory, OBC, Input 1, Input 2, Output 1 and Output 2.]
HW in the loop – use of CHIPit
Uncontrolled clock example: ASIC re-modelling for HW
Sequencer control signals re-modelled as APB transactions
Inter-block FIFOs split (FIFO->SDRAM and SDRAM->FIFO)
FIFOs mapped on AHB buses at fixed addresses
Added DMAs to handle pipeline inputs and outputs from/to memory
DMA channels can perform any AHB transfer (e.g. SDRAM<->FIFO)
[Figure: ASIC re-modelled for HW. Blocks: RX, Processing 1, Processing 2, Processing 4, Core, Sequencer, Memory controller and TX, with FIFOs, DMAs, AHBs and APB; external SDRAM memory, Input 1, Input 2, Output 1 and Output 2.]
HW in the loop – use of CHIPit
Uncontrolled clock example: ASIC re-modelling for SC
Use of TLM2 transactions between blocks
SDRAM+controller merged into a memory abstraction model
SDRAM access ports re-modelled as AHB buses
[Figure: ASIC SystemC model. Blocks: RX, Processing 1, Processing 2, Processing 4, Core, Sequencer and TX, with FIFOs, DMAs, AHB bus(es) model and Memory model.]
HW in the loop – use of CHIPit
Benefits
Same C file used for both Gaut VHDL generation and SystemC full-soft emulation
  ► intrinsic algorithm consistency between model and hardware
Few steps necessary from Gaut regeneration to FPGA synthesis and SC model compilation, scriptable for process automation
  ► handy for fast algorithm exploration
Outcome: SystemC model executable, allowing choice at runtime between full-soft functional model and soft+hard co-simulation
$> scmodel SIMU input.bin output_simu.bin > log_simu.txt
$> scmodel CHIPit input.bin output_hard.bin > log_hard.txt
$> diff output_simu.bin output_hard.bin
$>
HW in the loop – use of CHIPit
Limitations
Still have to develop SystemC+VHDL for each new transactor
  Limits whole process automation
  Encourages the use of common transactor types (AMBA, etc)
Controlled clock mode much more complex to implement
  Encourages the design of independent blocks, inter-connected via a few FIFOs or via a common memory
  Blocks with strong timing requirements on IO hardly compatible with uncontrolled clock mode (better design with intelligent IO behaviour: req+ack, handshake, etc)
Implementation limited to actual CHIPit resources
  SDRAM bus width is static (cannot test larger bus than available)
  Custom extension boards required as early as algorithm exploration
HW in the loop – use of CHIPit
SceMi: the wanna-be standard for co-simulation
Formerly proposed by Cadence, now transferred to Accellera
Defines a C++ API for HW-SW co-simulation
  Controlled clock / uncontrolled clock modes
  Function-based interface
  Pipe-based interface (C++ stream = hardware FIFO)
  Multi-threaded operation on software side
CHIPit SceMi library available
  Needs a supplementary licence
  Just a wrapper over UMRBus libraries to provide clock control
  All transactors still need to be coded by hand (SystemC+VHDL)
  ► still a lot of work to do before getting co-simulation working
Space industry applicability
SystemC/TLM is suitable for DSE with the use of HLS
Specification flow needs to be sorted out
Future prospects
Important need in development infrastructure:
  Abstraction layer (architects are not TLM2 experts)
  Interrupts and streaming modelling (TLM is currently a memory mapped platform oriented protocol)
  Build and assembly tools are needed
  Well defined modelling guidelines should be established
Thank you
Any questions ?
Open questions
Who does the modelling? System, HW or SW architect?
SW validation uses paper specs => Towards validation using HW based models in SystemC/TLM?
Towards a TLM3 standard? With embedded systems industrial partners such as Airbus and Astrium? (Business model?)