Real-Time Address Trace Compression for Emulated and Real System-on-Chip Processor Core Debugging
Bojan Mihajlović, Željko Žilić
McGill University, Dept. of Electrical and Computer Engineering
Montreal, Quebec, Canada
GLSVLSI'11, May 2–4, 2011
Presenter: Shao-Jay Hou
In the multicore era, capturing execution traces of processors is indispensable to debugging complex software. The inability to transfer vast amounts of trace data off-chip without significant slow-down has impeded the debugging of such software, in both pre-silicon emulation and in real designs. We consider on-chip trace compression performed in hardware to reduce data volume, using techniques that exploit inherent higher-order redundancy in address trace data. While hardware trace compression is often restricted to poor or moderate performance due to area and memory constraints, we present a parameterizable scheme that leverages the resources already found on existing platforms. Harnessing resources such as existing trace buffers on CPUs, and unused embedded memory on FPGA emulation platforms, our trace compression scheme requires only a small additional hardware area to achieve superior compression ratios.
Abstract
MPSoCs run multi-threaded programs, so traditional debug methods can no longer be used.
A non-invasive method (on-chip emulation) is a good alternative.
Tracing produces an immense amount of data that must be either stored on-chip or transferred off-chip in real time.
Tracing a 32-bit processor at 1 clock per instruction and 100 MHz yields 400 MB/s of data.
The data needs to be compressed.
What’s the problem?
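The 400 MB/s figure follows directly from the numbers on the slide. A minimal sketch of the arithmetic, assuming one 32-bit address is traced per executed instruction (the function and constant names are hypothetical, chosen for illustration):

```python
# Figures taken from the slide: 32-bit addresses, 1 clock per instruction, 100 MHz.
ADDRESS_BITS = 32
CLOCK_HZ = 100_000_000
INSTRUCTIONS_PER_CLOCK = 1


def raw_trace_bandwidth_bytes_per_s(address_bits=ADDRESS_BITS,
                                    clock_hz=CLOCK_HZ,
                                    ipc=INSTRUCTIONS_PER_CLOCK):
    """Uncompressed trace volume: one full address per executed instruction."""
    return (address_bits // 8) * clock_hz * ipc


print(raw_trace_bandwidth_bytes_per_s() / 1e6, "MB/s")  # prints "400.0 MB/s"
```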
Related work and this paper (diagram labels)
Groups: trace compression schemes, compression methods, some example tools
- Compression algorithms [5]
- Combined MTF and LZ [1]
- DMTF [17]
- Multi-stage compression [11]
- Lempel-Ziv (LZ) [18]
- MCDS [12]
- ARM ETM [2]
- Proposed method
Compression flow
Why? Instructions execute consecutively until a branch is reached, so only the branch target address needs to be kept.
How? Each run is divided into two parts:
- address
- length
Example: (figure)
Consecutive Address Elimination
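The idea of replacing a run of consecutive addresses with an (address, length) pair can be sketched in software. This is only an illustration, not the hardware design from the paper: the 4-byte step assumes the fixed 32-bit instruction length mentioned in the experimental setup, and the names `cae_encode` and `cae_decode` are hypothetical.

```python
INSTRUCTION_BYTES = 4  # assumption: fixed-length 32-bit instructions


def cae_encode(addresses, step=INSTRUCTION_BYTES):
    """Collapse each run of consecutive addresses into (start_address, length)."""
    pairs = []
    for addr in addresses:
        # Extend the current run if this address directly follows it.
        if pairs and addr == pairs[-1][0] + pairs[-1][1] * step:
            pairs[-1][1] += 1
        else:
            pairs.append([addr, 1])  # a branch was taken: start a new run
    return [tuple(p) for p in pairs]


def cae_decode(pairs, step=INSTRUCTION_BYTES):
    """Expand (start_address, length) pairs back into the full address trace."""
    return [start + i * step for start, length in pairs for i in range(length)]


trace = [0x100, 0x104, 0x108, 0x200, 0x204, 0x100, 0x104, 0x108]
pairs = cae_encode(trace)
print(pairs)  # prints [(256, 3), (512, 2), (256, 3)]
```

Eight addresses shrink to three pairs here, and the repeated pair `(0x100, 3)` is the kind of redundancy the later stages can exploit.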
Why? A branch is either taken or not taken, and execution shows sequential locality, so the next address can be predicted.
How? It works similarly to a cache:
- miss the first time a set of instructions is encountered
- hit for every subsequent encounter that matches the prediction
Finite Context Method
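The cache-like hit/miss behaviour can be sketched as a small predictor keyed by the most recent symbols. This is a simplified software model under stated assumptions: a Python dict stands in for the finite, hash-indexed hardware table, the context order of 2 is arbitrary, and `fcm_encode` / `fcm_decode` are hypothetical names.

```python
def fcm_encode(symbols, order=2):
    """Emit ("hit", None) when the context predicts the symbol, else ("miss", symbol).

    Symbols must not be None, since None marks an empty context slot.
    """
    table = {}                      # context -> symbol last seen after that context
    context = (None,) * order
    tokens = []
    for sym in symbols:
        if table.get(context) == sym:
            tokens.append(("hit", None))
        else:
            tokens.append(("miss", sym))
            table[context] = sym    # learn (or correct) the prediction
        context = context[1:] + (sym,)
    return tokens


def fcm_decode(tokens, order=2):
    """Rebuild the symbols by mirroring the encoder's table updates."""
    table = {}
    context = (None,) * order
    symbols = []
    for kind, value in tokens:
        if kind == "hit":
            sym = table[context]
        else:
            sym = value
            table[context] = sym
        symbols.append(sym)
        context = context[1:] + (sym,)
    return symbols


loop = ["A", "B", "C"] * 3          # a loop body executed three times
tokens = fcm_encode(loop)
hits = sum(1 for kind, _ in tokens if kind == "hit")
print(hits, "hits out of", len(tokens))  # prints "4 hits out of 9"
```

The first pass through the loop misses everywhere; once each context has been seen, every later encounter is a hit and carries no address.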
Why? MTF
- Increases the relevance of the prefix
- Assists differential compression
How? The input address is compared with the predicted address and compressed differentially.
Move-to-Front & Address Encoding
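Both ideas on this slide can be illustrated in a few lines. The sketch below is an assumption-laden model, not the paper's microarchitecture: the move-to-front table size, the token layout, and the choice of a prefix that counts differing low-order bytes are all hypothetical, as are the function names.

```python
def mtf_encode(symbols, table_size=8):
    """Move-to-front: emit (index, None) on a hit, (None, symbol) on a miss."""
    recent = []                     # most recently used symbol sits at index 0
    tokens = []
    for sym in symbols:
        if sym in recent:
            idx = recent.index(sym)
            recent.pop(idx)
            tokens.append((idx, None))
        else:
            tokens.append((None, sym))
            if len(recent) == table_size:
                recent.pop()        # evict the least recently used entry
        recent.insert(0, sym)
    return tokens


def diff_encode(address, predicted):
    """Send only the low-order bytes needed to cover where address and prediction differ."""
    delta = address ^ predicted
    n = (delta.bit_length() + 7) // 8            # prefix: number of payload bytes
    payload = (address & ((1 << (8 * n)) - 1)).to_bytes(n, "little")
    return n, payload


def diff_decode(prefix, payload, predicted):
    """Rebuild the address from the prediction's high bytes and the payload."""
    mask = (1 << (8 * prefix)) - 1
    return (predicted & ~mask) | int.from_bytes(payload, "little")


print(mtf_encode(["A", "B", "A", "A", "B"]))
# prints [(None, 'A'), (None, 'B'), (1, None), (0, None), (1, None)]

prefix, payload = diff_encode(0x10000F04, 0x10000E00)
print(prefix, payload)  # prints 2 b'\x04\x0f'
```

Recently seen symbols map to small indices, and an address close to its prediction needs only a short payload, so the prefix values become heavily skewed toward small numbers.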
Why? The prefix byte can be compressed according to the probability of each prefix value.
How? Huffman encoding
Run-length and Prefix Encoding
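Huffman encoding assigns shorter codes to more probable prefix values. A minimal sketch of building such a code from observed frequencies follows; the sample prefix stream is made up for illustration, and hardware would more likely use a fixed, precomputed code table (an assumption, the slide does not say).

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Return {symbol: bitstring}, with shorter codes for more frequent symbols."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries are (count, tie_breaker, partial code table for that subtree).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, codes1 = heapq.heappop(heap)
        c2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


# Hypothetical prefix stream: small prefix values dominate.
prefixes = [0] * 8 + [1] * 4 + [2] * 2 + [4] * 2
code = huffman_code(prefixes)
total_bits = sum(len(code[p]) for p in prefixes)
print(total_bits)  # prints 28
```

With this distribution the 16 prefixes take 28 bits instead of 16 full bytes.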
Why? The input data from the MTF/AE stage is 5 bytes wide, but the output to the LZ stage is 1 byte wide.
How? Use a small buffer to hold the data.
Data Stream Serializer
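The width mismatch can be modelled as a small FIFO that accepts a word of up to 5 bytes and releases one byte at a time. The class below is a hypothetical software stand-in; buffer depth and back-pressure handling, which real hardware would need, are left out.

```python
from collections import deque


class Serializer:
    """Accepts words of up to 5 bytes and hands them out one byte per pop."""

    MAX_WORD_BYTES = 5  # width of the MTF/AE stage output, per the slide

    def __init__(self):
        self.fifo = deque()

    def push(self, word):
        if len(word) > self.MAX_WORD_BYTES:
            raise ValueError("word wider than the input port")
        self.fifo.extend(word)      # iterating over bytes yields ints 0..255

    def pop_byte(self):
        """Return the next byte, or None when the buffer is empty."""
        return self.fifo.popleft() if self.fifo else None


s = Serializer()
s.push(b"\x02\x04\x0f")             # e.g. one prefix byte followed by two payload bytes
out = [s.pop_byte() for _ in range(4)]
print(out)  # prints [2, 4, 15, None]
```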
Why? The input data is highly repetitive.
How? Use LZ compression:
- Create a dictionary to store the repeated parts
- Do not output the dictionary
- During decompression, rebuild the same dictionary
The encoder does not produce output every cycle.
Lempel-Ziv Encoding of Data Stream
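The point that the dictionary is never transmitted, because the decoder rebuilds an identical one, can be shown with a small dictionary coder. The slide does not say which LZ variant the paper uses, so the LZ78-style scheme below is chosen purely for illustration, and the function names are hypothetical.

```python
def lz78_encode(data):
    """Emit (dictionary_index, next_byte) tokens; the dictionary itself is never output."""
    dictionary = {b"": 0}
    tokens = []
    phrase = b""
    for value in data:
        candidate = phrase + bytes([value])
        if candidate in dictionary:
            phrase = candidate      # keep extending the match: no output this step
        else:
            tokens.append((dictionary[phrase], value))
            dictionary[candidate] = len(dictionary)
            phrase = b""
    if phrase:
        tokens.append((dictionary[phrase], None))  # flush a trailing match
    return tokens


def lz78_decode(tokens):
    """Rebuild the same dictionary from the tokens while decoding."""
    phrases = [b""]
    out = bytearray()
    for index, value in tokens:
        entry = phrases[index] + (bytes([value]) if value is not None else b"")
        out += entry
        if value is not None:
            phrases.append(entry)
    return bytes(out)


data = b"abababab"
tokens = lz78_encode(data)
print(len(tokens), "tokens for", len(data), "bytes")  # prints "5 tokens for 8 bytes"
```

Input bytes that only extend a match produce no token, which corresponds to the encoder not emitting output on every cycle.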
Benchmark: MiBench
CPU: Apple PowerMac G4 with a 1.25 GHz PowerPC 7455, a 32-bit fixed instruction-length processor, running Linux SMP kernel 2.6.32-24
Simulation software: ModelSim SE-64 v6.5c
Experimental Results
Logic utilization
Usage Scenario JTAG software fault 10-3
Experimental Results(cont.)
This paper presented a parameterizable microarchitecture for address trace compression, suited to implementation on ASICs and modern FPGAs.
Better compression ratio than the other schemes
Conclusion
The paper uses a dictionary-based, multi-stage compression method, which can be used to improve our tracer.
The paper gives inspiration for future work on our tracer.
My comment
Diagram labels: CPU, GPU, Bus, B.T., P.T., P.T., T.M.