RAMP Stats and Monitoring
Derek Chiou, Bill Reinhart, Nikhil Patil, with Krste Asanovic and Joel Emer
1/15/2009
Goals/Requirements
• Provide functionality equivalent to software-based simulators at RAMP speeds
  – Full observability
  – Monitoring for events
    • Triggers for breakpoints, dumping state, etc.
  – Trace (lossy and lossless)
  – Aggregate statistics
• Baseline functionality automatically included
• Resource efficient
• Flexible
• Dynamically and statically configurable
• Integrated with other infrastructure (component interfaces)
At Least Three Levels of Debug/Monitoring/Stats
• Platform/Unmodel level
  – Bringing up BEE3/ACP system independent of RAMP code
  – May be strange bugs that get exercised with RAMP usage model
• Simulator (Model) level
  – Simulator may model target incorrectly
  – Monitor simulator bandwidth requirements
    • Could be very different from target machine (e.g., cache of target cache)
• Target level
  – The target machine may have been implemented correctly, but the target design itself is incorrect
  – Stats/tracing of working target
• We focus on simulator (model)/target level, but hopefully some will be useful for platform level as well
Statistics/Monitoring Philosophy
• Instrument simulator communication (e.g., RAMP channels)
  – Communication mechanisms are logically connected to command network
  – Can export/examine/change anything being communicated
    • No need to add additional code if that is sufficient
  – Turn off to save resources when possible
• Introduce additional communication to export where communication does not already exist
  – Use standard simulator communication (channel) interfaces
    • Automatically provides target timing information
    • Connected to null end-point that logically dumps
      – Pipe to /dev/null
• Potentially have non-timed interface, but need time reference point
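As a rough illustration of instrumenting communication rather than adding dedicated probe code, the following Python sketch models a channel whose traffic can optionally be observed by a monitor. `Channel`, `attach_monitor`, `detach_monitor`, and the message layout are hypothetical names chosen for this sketch; they are not RAMP interfaces.

```python
from collections import deque


class Channel:
    """Toy timed channel: every message carries its target-time stamp."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()
        self.monitor = None  # None means monitoring is turned off

    def attach_monitor(self, callback):
        # Hypothetical hook standing in for the command-network connection.
        self.monitor = callback

    def detach_monitor(self):
        # Turning the tap off models saving resources when it is not needed.
        self.monitor = None

    def send(self, target_time, payload):
        # The monitor sees exactly what is communicated, with its target time.
        if self.monitor is not None:
            self.monitor(self.name, target_time, payload)
        self.queue.append((target_time, payload))

    def receive(self):
        return self.queue.popleft()


log = []
ch = Channel("fetch_to_decode")
ch.attach_monitor(lambda name, t, p: log.append((name, t, p)))
ch.send(10, "insn_a")   # observed by the monitor
ch.detach_monitor()
ch.send(11, "insn_b")   # not observed
```

Because the tap sits on the channel itself, the receiving side is unchanged whether or not monitoring is enabled.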
Bill Reinhart, Nikhil A Patil
Simple Example
[Figure: pipeline stages F, D, E, M, W, with two compressor blocks and a State element]
Required Support
• Endpoint support
• Channel support
• Transport (network)
• Naming
User vs Simulator Initiated
• Precise User-Initiated
  – Function call to read/write value at specific target time
  – Can be implemented through timed channels
    • Commands live in target time
  – Can be handled logically as a compressor
    • Discard data unless there is a command
  – How far ahead in target time should pull command be issued?
    • Too close impacts performance but enables precise control
    • Too far makes reacting to an event difficult
• Imprecise User-Initiated
  – Issue a read of state, perform whenever, report back target time
• Simulator-Initiated
  – Dump everything, filter later
  – Can be slow if there is limited bandwidth, storage, filtering
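The "compressor" view of a precise user-initiated read can be sketched as a filter that discards every sample unless a command exists for that target time. The function name `precise_reads` and the sample values are made up for this illustration.

```python
def precise_reads(samples, commands):
    """Keep only the samples whose target time has a pending read command.

    samples  : iterable of (target_time, value) pairs in target-time order
    commands : set of target times at which the user asked for a read
    """
    for target_time, value in samples:
        if target_time in commands:
            yield target_time, value
        # Otherwise the sample is discarded, as a compressor would do.


samples = [(t, t * t) for t in range(6)]   # made-up state values
result = list(precise_reads(samples, commands={2, 5}))
```

The simulator-initiated case corresponds to skipping the filter entirely and exporting every sample, which is where the bandwidth and storage concerns come from.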
Required Support: Endpoint
– Provide state connected to command network
  • Same interface as a register, drop-in replacement
  • Stats counters, monitor points, control points, etc.
– Provide default compressors/filters
  • Output every n cycles
  • Output on rollover
  • Output toggled on signal
  • Etc.
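A minimal sketch of an endpoint with two of the default output policies listed above (every n cycles, and on rollover). The class `StatsCounter`, its parameters, and the `exported` list standing in for the command network are assumptions made for illustration only.

```python
class StatsCounter:
    """Toy stats register with two of the default output policies."""

    def __init__(self, width_bits=4, every_n=None):
        self.limit = 1 << width_bits   # counter rolls over at this value
        self.every_n = every_n         # output every n cycles, if set
        self.value = 0
        self.exported = []             # stands in for the command network

    def tick(self, cycle, increment=0):
        self.value += increment
        if self.value >= self.limit:
            # Output on rollover: report one wrap and keep the remainder.
            self.value -= self.limit
            self.exported.append((cycle, "rollover", self.limit))
        if self.every_n and cycle % self.every_n == 0:
            # Output every n cycles: report the current count.
            self.exported.append((cycle, "periodic", self.value))


hits = StatsCounter(width_bits=2, every_n=4)
for cycle in range(1, 9):
    hits.tick(cycle, increment=1)
```

With a 2-bit counter incremented once per cycle, the rollover policy alone would be enough to reconstruct the total externally; the periodic policy trades bandwidth for fresher values.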
Required Support: Channel
• Optional connection to control network
• Use internal buffering to look back in time
  – Channels implemented as circular buffers in BRAM
    • Far more storage than needed (in general)
  – Can look back in time
  – Can save bandwidth by only exporting when needed
[Figure: circular buffer with head and tail labels]
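The look-back idea can be sketched with a small circular buffer: entries stay in the buffer after they are written, so recent history can be exported only when something asks for it. `CircularChannel` and `look_back` are hypothetical names, and a Python list stands in for BRAM.

```python
class CircularChannel:
    """Toy channel buffer that retains recent entries for look-back."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.head = 0      # next slot to write
        self.count = 0     # number of valid entries

    def push(self, target_time, payload):
        # Overwrites the oldest entry once the buffer is full.
        self.slots[self.head] = (target_time, payload)
        self.head = (self.head + 1) % len(self.slots)
        self.count = min(self.count + 1, len(self.slots))

    def look_back(self, n):
        """Return up to n most recent entries, oldest first."""
        n = min(n, self.count)
        start = (self.head - n) % len(self.slots)
        return [self.slots[(start + i) % len(self.slots)] for i in range(n)]


ch = CircularChannel(capacity=4)
for t in range(6):
    ch.push(t, f"msg{t}")
history = ch.look_back(3)
```

Nothing is exported during the `push` calls; bandwidth is spent only when `look_back` is invoked, for example after a trigger fires.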
Required Support: Transport
• Transport
  – To units: commands, configuration, state changes, etc.
  – From units: extract target/host state, statistics, etc.
• Could be virtual channel(s) on common physical network
• Lossy network?
  – Lossless for now, support lossy at endpoint
• QoS?
• A ring or a ring of rings for simplicity
• Ordered network simpler
  – Helps reconstruction of data outside
  – But could result in less efficiency
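To make the ring option concrete, here is a small sketch of hop-by-hop delivery on a unidirectional ring, with a command travelling to a unit and data travelling back. The node names, the function `ring_deliver`, and the returned value 42 are all invented for this example.

```python
def ring_deliver(node_ids, source, dest, payload):
    """Forward a packet hop by hop around a unidirectional ring.

    Returns (path, payload), where path lists the nodes visited in order,
    starting at the source and ending at the destination.
    """
    if dest not in node_ids:
        raise ValueError(f"unknown destination: {dest}")
    position = node_ids.index(source)
    path = [source]
    while path[-1] != dest:
        position = (position + 1) % len(node_ids)
        path.append(node_ids[position])
    return path, payload


ring = ["host", "core0", "icache", "dcache"]
path_out, _ = ring_deliver(ring, "host", "icache", "read num_hits")
path_back, _ = ring_deliver(ring, "icache", "host", 42)  # 42 is a made-up value
```

Since every packet follows the same direction, packets from one source reach a destination in the order they were sent, which is the property that simplifies reconstruction outside; the cost is that the return trip may take the long way round.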
Required Support: Naming/Tagging
• Naming of source of data
  – Command
    • read P1.iCache.num_hits stats register translated to actual register
  – Returned data/Trace entry
    • Needs to be tagged to indicate data
• Each stats entry also includes at least
  – Target time
  – Potentially platform/host time for platform/simulator-level debugging
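One way to picture naming and tagging is a lookup table from symbolic names to register locations, plus a record type for returned entries. Everything here is assumed for illustration: the table contents, the register indices, the second stat name, and the sample values are not taken from any RAMP implementation.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical name table: hierarchical stat name -> (unit id, register index).
NAME_TABLE = {
    "P1.iCache.num_hits": (1, 0x10),
    "P1.iCache.num_misses": (1, 0x11),
}


@dataclass
class StatsEntry:
    name: str                        # tag identifying the source of the data
    target_time: int                 # target time at which the value was sampled
    value: int
    host_time: Optional[int] = None  # optional, for platform/simulator-level debugging


def resolve(name):
    """Translate a symbolic stats name into its actual register location."""
    return NAME_TABLE[name]


unit, register = resolve("P1.iCache.num_hits")
entry = StatsEntry(name="P1.iCache.num_hits", target_time=1200, value=37)
```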
FPGA Debug
Hari Angepat, Chris Craik and Derek Chiou
Electrical and Computer Engineering
University of Texas at Austin
1/15/2009
Introduction
• FPGA simulators offer order-of-magnitude speedups
  – However, can suffer from traditional hardware issues of limited visibility and debugging challenges
• RAMP simulators face additional complexity due to scalability requirements that may prevent instrumenting every signal in the simulator
Challenge
• How to bring software-level debugging visibility to RAMP simulators without dramatically increasing resources or affecting timing closure
• Revisit the idea of FPGA state readback in combination with GDB-style debug interfaces
Our Technique
• 1) Leverage FPGA readback mechanism to exploit as much free visibility as possible
  – FPGA frame readback exists in V2Pro, V4, V5
  – Can sample flip-flop state dynamically
  – Can sample BRAM/LUT (notes on this later)
  – Can use JTAG hardware for latency-tolerant, low-resource physical link
Our Technique
• 2) Provide a GDB interface that can debug both a software process and an FPGA fabric simultaneously
  – Can display FPGA netlist symbols alongside software symbols
  – Can allow for hybrid CPU/FPGA platform debugging (i.e., x86-FSB-FPGA)
FPGADBG Toolflow
[Figure: FPGADBG toolflow. Labels: Software Sources (C/C++/…), Compiler, Debug Flags (-g -Ox), Binary Executable, Symbol Table, ASCII Disassembly; Hardware Sources (Verilog/VHDL/…), Synthesis, Hierarchy Name Preservation Constraints, FPGA Implementation, Map, PAR Netlist, Logic Allocation, FPGA Bitstream; Software Debugger (GDB)]
FPGADBG: interactive extension that enables non-intrusive debugging of software running on FPGA (GDB-Py)
Architecture
• Designed as set of C/Python libraries
  – GDB Interface (plugin)
  – Netlist Frontend (parsing, mapping)
  – FPGA Backend (board comm, readback)
  – Hardware library (step control, ICAP readback)
• GDB frontend allows connecting to software-based portions of a simulator
• Assumes design-level support for step
  – Allows design to ensure consistent state before sampling
Architecture
[Figure: FPGADBG architecture. Labels: FPGA Fabric (User Logic, Domain Step Control, Readback Engine (ICAP), IO Logic (Transport Layer)); Target Application, Target OS, Target Virtual Machine; GDB, GDB Plugin Bindings (Python), FPGADBG Core (Python), FPGA Chip Comm (C), FPGA Readback (C), Netlist Parser (Python); HW/SW Simulation Platform]
Netlist Parsing
Bit 6597758 0x005e0200 5758 Block=SLICE_X88Y18 Latch=XQ Net=dout(3)
Bit 6597838 0x005e0200 5838 Block=SLICE_X88Y16 Latch=XQ Net=dout(1)
Bit 6604350 0x005e0400 5758 Block=SLICE_X88Y18 Latch=YQ Net=dout(2)
Bit 6604430 0x005e0400 5838 Block=SLICE_X88Y16 Latch=YQ Net=dout(0)
inst "regOut(1)" "SLICE",placed R72C45 SLICE_X88Y16 , cfg " BXINV::BX BXOUTUSED::#OFF BYINV::BY BYINVOUTUSED::#OFF BYOUTUSED::#OFF ... DXMUX::0 DYMUX::0 F::#OFF F5USED::#OFF FFX:myREG/dout_1:#FF FFX_INIT_ATTR::INIT0 FFX_SR_ATTR::SRLOW FFY:myREG/dout_0:#FF FFY_INIT_ATTR::INIT0 FFY_SR_ATTR::SRLOW ... ";
inst "regOut(3)" "SLICE",placed R71C45 SLICE_X88Y18 , cfg " BXINV::BX BXOUTUSED::#OFF BYINV::BY BYINVOUTUSED::#OFF BYOUTUSED::#OFF ... DXMUX::0 DYMUX::0 F::#OFF F5USED::#OFF FFX:myREG/dout_3:#FF FFX_INIT_ATTR::INIT0 FFX_SR_ATTR::SRLOW FFY:myREG/dout_2:#FF FFY_INIT_ATTR::INIT0 FFY_SR_ATTR::SRLOW ... ";
[Figure: design hierarchy labels Top, myREG, dout, regOut]
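The bit-location listing on this slide lends itself to a simple parser. The sketch below assumes each line has the layout `Bit <offset> <frame address> <frame offset> Block=… Latch=… Net=…`, as the four sample lines suggest; the function name `parse_bit_lines` and the dictionary layout are choices made for this illustration, not FPGADBG code.

```python
import re

# Assumed line layout: Bit <offset> <frame address> <frame offset> <key=value ...>
BIT_LINE = re.compile(
    r"Bit\s+(\d+)\s+(0x[0-9a-fA-F]+)\s+(\d+)\s+"
    r"Block=(\S+)\s+Latch=(\S+)\s+Net=(\S+)"
)


def parse_bit_lines(text):
    """Map each net name to its bitstream offset, frame address, and location."""
    table = {}
    for match in BIT_LINE.finditer(text):
        offset, frame, frame_offset, block, latch, net = match.groups()
        table[net] = {
            "offset": int(offset),
            "frame": int(frame, 16),
            "frame_offset": int(frame_offset),
            "block": block,
            "latch": latch,
        }
    return table


listing = """
Bit 6597758 0x005e0200 5758 Block=SLICE_X88Y18 Latch=XQ Net=dout(3)
Bit 6597838 0x005e0200 5838 Block=SLICE_X88Y16 Latch=XQ Net=dout(1)
Bit 6604350 0x005e0400 5758 Block=SLICE_X88Y18 Latch=YQ Net=dout(2)
Bit 6604430 0x005e0400 5838 Block=SLICE_X88Y16 Latch=YQ Net=dout(0)
"""
table = parse_bit_lines(listing)
```

A table like this is what would let a debugger turn a net name into the frame to request and the bit to extract from the readback data.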
Netlist Parsing
[Figure: netlist parsing flow. Labels: Physical Netlist, Alias Detection, Vector Merger, Hierarchy Construction, Frame Address Mapping, Symbolic Netlist, FPGA Cmd Generator, Readback Cmd Parser, Bitstream Reorder, Readback Bitstream, FPGA Board Communication]
• FPGA toolflow introduces optimizations and naming issues
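As an illustration of what a vector-merging step might do, the sketch below combines per-bit nets such as `dout(3)` through `dout(0)` into one integer per base name. The function `merge_vectors`, the `valid` net, and the sampled values are invented for the example; the actual FPGADBG merger may work differently.

```python
import re
from collections import defaultdict


def merge_vectors(bit_values):
    """Combine per-bit nets such as dout(3)..dout(0) into integer vectors.

    bit_values maps a net name to the 0/1 value sampled for that net.
    Nets without an index suffix are passed through as one-bit values.
    """
    indexed = re.compile(r"^(.*)\((\d+)\)$")
    vectors = defaultdict(int)
    for net, bit in bit_values.items():
        match = indexed.match(net)
        if match:
            base, index = match.group(1), int(match.group(2))
            vectors[base] |= (bit & 1) << index
        else:
            vectors[net] = bit & 1
    return dict(vectors)


# Made-up sampled values for the four dout bits and one scalar net.
sampled = {"dout(3)": 1, "dout(2)": 0, "dout(1)": 1, "dout(0)": 1, "valid": 1}
merged = merge_vectors(sampled)
```

Presenting `dout` as a single 4-bit value rather than four separate flip-flops is closer to what a user would expect to see at a debugger prompt.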
Limitations
• Hardware readback has limitations:
  – RAMs require offline readback due to resource contention issues
  – FPGA frames span large vertical stripes, potentially restricting visibility if some logic cannot be disabled during sampling
  – Hierarchy must be preserved during synthesis to ensure understandable net names
  – Step control requires design-level support
Status & Future Work
• Current prototype implements board communication with the XUP Virtex2Pro30 with JTAG-based frame readback
• Frontend netlist parser supports hierarchical node generation and bit-vector merging, with some support for aliased signals
• Full GDB shell expected to be released in Q1-2009 with support for Virtex5{110/330}