Post on 03-Jan-2016
HW/SW Co-design
Lecture 4: Lab 2 – Passive HW Accelerator Design
Course material designed by Professor Yarsun Hsu, EE Dept, NTHU
RA: Yi-Chiun Fang, EE Dept, NTHU
Outline
  Introduction to AMBA Bus System
  Passive Hardware Design
  Interrupt Service Routine
  Environment Configuration
  Co-designed System with GHDL Simulation
  Co-designed System on FPGA
INTRODUCTION TO AMBA BUS SYSTEM
AMBA 2.0 Bus System (1/7)
  Established by ARM
  Advanced High-performance Bus (AHB)
    For high-performance, high clock frequency system modules such as embedded processors, DMA controllers, and memory controllers
  Advanced Peripheral Bus (APB)
    Optimized for minimal power consumption and reduced interface complexity to support peripheral functions
  For more details, please refer to the following documents
    AMBA 2.0 Specification
    Introduction to AMBA Bus System
    GRLIB AHBCTRL - AMBA AHB controller with plug&play support
AMBA 2.0 Bus System (2/7)
  [Figure callouts: "Slave on AHB", "The only master on APB"]
AMBA 2.0 Bus System (3/7)
  AMBA AHB is designed to be used with a central multiplexor interconnection scheme
    Avoids tri-state buses
AMBA 2.0 Bus System (4/7)
  An AHB transfer consists of two distinct sections
    The address phase, which lasts only a single cycle
    The data phase, which may require several cycles
      The data phase is extended using the HREADY signal
AMBA 2.0 Bus System (5/7)
  A slave may insert wait states into any transfer
    For write transfers, the bus master holds the data stable throughout the extended cycles
    For read transfers, the slave does not have to provide valid data until the transfer is about to complete
  [Figure callout: "wait states"]
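The cost of wait states can be sketched with a small cycle-counting model in C. This is an illustration only, not GRLIB code: `ahb_burst_cycles` and its arguments are hypothetical names, and the model assumes back-to-back transfers in which the address phase of one transfer overlaps the data phase of the previous one.

```c
#include <assert.h>
#include <stddef.h>

/* Cycles needed for n back-to-back AHB transfers (illustrative model).
 * wait_states[i] is the number of cycles the slave holds HREADY low
 * during the data phase of transfer i.
 * Assumption: the address phase of transfer i+1 overlaps the data phase
 * of transfer i, so only the first address phase costs an extra cycle. */
unsigned ahb_burst_cycles(const unsigned *wait_states, size_t n)
{
    assert(wait_states != NULL || n == 0);
    if (n == 0)
        return 0;

    unsigned cycles = 1;                /* first address phase */
    for (size_t i = 0; i < n; i++)
        cycles += 1 + wait_states[i];   /* data phase plus its wait states */
    return cycles;
}
```

Under this model, four transfers to a slave that never inserts wait states take five cycles, while a single transfer with two wait states takes four.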
AMBA 2.0 Bus System (6/7)
  GRLIB implements AMBA AHB with slight modifications
  Please refer to the GRLIB User's Manual and GRLIB IP Cores Manual for detailed information
AMBA 2.0 Bus System (7/7)
  The GRLIB implementation of AHB includes a mechanism to provide plug&play support
    Identification of attached units
    Address mapping of slaves
    Interrupt routing
  The implementation is located at grlib-gpl-1.0.19-b3188/lib/grlib/amba/
  The configuration record from each AHB unit is sent to the AHB bus controller via the HCONFIG signal
    type ahb_config_type is array (0 to NAHBCFG-1) of amba_config_word;
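To make the idea of a configuration word concrete, here is a minimal C sketch that encodes and decodes a plug&play identification word. The bit layout used (vendor ID in bits 31:24, device ID in bits 23:12, version in bits 9:5, IRQ in bits 4:0) is an assumption that should be checked against the GRLIB manual for your release; `pnp_id`, `pnp_encode_id`, and `pnp_decode_id` are hypothetical names, not part of GRLIB.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a plug&play identification word.
 * ASSUMED layout (verify in the GRLIB manual):
 *   [31:24] vendor ID, [23:12] device ID, [9:5] version, [4:0] IRQ. */
typedef struct {
    unsigned vendor;
    unsigned device;
    unsigned version;
    unsigned irq;
} pnp_id;

/* Build an identification word from its fields. */
uint32_t pnp_encode_id(pnp_id id)
{
    assert(id.vendor <= 0xFFu && id.device <= 0xFFFu);
    assert(id.version <= 0x1Fu && id.irq <= 0x1Fu);
    return ((uint32_t)id.vendor << 24) | ((uint32_t)id.device << 12) |
           ((uint32_t)id.version << 5) | (uint32_t)id.irq;
}

/* Split an identification word back into its fields. */
pnp_id pnp_decode_id(uint32_t word)
{
    pnp_id id;
    id.vendor  = (word >> 24) & 0xFFu;
    id.device  = (word >> 12) & 0xFFFu;
    id.version = (word >> 5)  & 0x1Fu;
    id.irq     =  word        & 0x1Fu;
    return id;
}
```

A bus controller or debug monitor that reads such a word can look the vendor and device IDs up in a table; an ID missing from the table would be reported as unknown.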
PASSIVE HARDWARE DESIGN
Passive HW Accelerators
  The accelerator (bus slave) does not actively send signals to the bus
    It only responds to the master
  The master gives commands to the slave via its control registers and probes its status registers
  [Figure callouts: "master", "slave"]
Passive 1-D IDCT HW Acc. (1/4)
  A simple 2-stage design
    Gate delay
      Stage 1: ~1 mult
      Stage 2: ~3 add
  Action register
    Write ‘1’ to start; the accelerator resets it to 0 automatically when done
  Mode register
    Row/column mode
  No wait states
    Immediate response
  [Figure callouts: "action", "mode"]
Passive 1-D IDCT HW Acc. (2/4)
  Data packing
    Since the 8x8 blocks are of type short (16-bit), each value occupies only half of the 32-bit data bus
    We pack two values together to increase data bus utilization and reduce the communication overhead
    The action bit and mode bit are also packed together
  [Figure: register layouts]
    Data register (32 bits): two 16-bit fields, Y2n / x2n and Y2n+1 / x2n+1 (MSB marked)
    Control register: bits 31..2 UNUSED, bit 1 mode, bit 0 action
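The packing scheme can be sketched in C as follows. The function names are hypothetical, and the placement of the even-indexed value in the upper half-word is an assumption (it matches how a big-endian CPU sees two adjacent shorts read as one 32-bit word); the control-word layout follows the slide, with action in bit 0 and mode in bit 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack two 16-bit samples into one 32-bit bus word.
 * ASSUMPTION: the even-indexed sample occupies the upper half-word. */
uint32_t pack_pair(int16_t even, int16_t odd)
{
    return ((uint32_t)(uint16_t)even << 16) | (uint16_t)odd;
}

/* Recover the two 16-bit samples from a packed word. */
void unpack_pair(uint32_t word, int16_t *even, int16_t *odd)
{
    assert(even != NULL && odd != NULL);
    *even = (int16_t)(uint16_t)(word >> 16);
    *odd  = (int16_t)(uint16_t)(word & 0xFFFFu);
}

/* Control word: bit 0 = action (start), bit 1 = mode, bits 31..2 unused. */
uint32_t make_control(unsigned mode)
{
    assert(mode <= 1u);
    return ((uint32_t)mode << 1) | 0x1u;
}
```

With this packing, one row or column of eight 16-bit values crosses the bus in four 32-bit transfers instead of eight.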
Passive 1-D IDCT HW Acc. (3/4)
  1-D IDCT calculation
    STEP1: Write Y registers (4 transfers)
    STEP2: Write mode bit & action bit
    STEP3: Poll the action bit
    STEP4: Read x registers after the action bit is reset
Passive 1-D IDCT HW Acc. (4/4)
  static void
  hw_idct_1d(short *dst, short *src, unsigned int mode)
  {
      long *long_ptr = (long *)src;

      Y_array_base[0] = long_ptr[0];
      Y_array_base[1] = long_ptr[1];
      ...

      *c_reg = (long)((mode << 1) | 0x1);

      while (*c_reg & 0x1) { /* busy waiting loop */ }

      dst[ 0] = ((short *)x_array_base)[0];
      dst[ 8] = ((short *)x_array_base)[1];
      ...
  }
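The write/start/poll/read handshake can be tried on a host machine with a mock register file. The sketch below is not the lab's driver: `mock_acc`, `read_ctrl`, and `run_1d` are hypothetical names, the mock "computation" is a plain copy rather than an IDCT, and the two-poll latency is arbitrary. On the real target the registers would be memory-mapped and accessed through `volatile` pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ACTION_BIT 0x1u
#define MODE_BIT   0x2u

/* Host-side stand-in for the accelerator's register file (hypothetical). */
typedef struct {
    uint32_t y[4];   /* packed inputs, two 16-bit values per word */
    uint32_t x[4];   /* packed outputs, two 16-bit values per word */
    uint32_t ctrl;   /* bit 0 = action, bit 1 = mode */
    unsigned busy;   /* mock only: polls left before the device is done */
} mock_acc;

/* Mock register read: each poll of ctrl advances the fake device.
 * The "result" is a plain copy of the inputs, NOT a real IDCT. */
static uint32_t read_ctrl(mock_acc *acc)
{
    if ((acc->ctrl & ACTION_BIT) && acc->busy > 0 && --acc->busy == 0) {
        for (int i = 0; i < 4; i++)
            acc->x[i] = acc->y[i];
        acc->ctrl &= ~ACTION_BIT;        /* device clears the action bit */
    }
    return acc->ctrl;
}

/* Steps 1 to 4 of the protocol; returns how many polls saw the bit set. */
unsigned run_1d(mock_acc *acc, const uint32_t in[4], uint32_t out[4],
                unsigned mode)
{
    assert(acc != NULL && in != NULL && out != NULL && mode <= 1u);
    unsigned polls = 0;

    for (int i = 0; i < 4; i++)           /* STEP1: write Y registers */
        acc->y[i] = in[i];
    acc->busy = 2;                        /* mock latency, arbitrary */
    acc->ctrl = (mode << 1) | ACTION_BIT; /* STEP2: mode bit and action bit */
    while (read_ctrl(acc) & ACTION_BIT)   /* STEP3: poll the action bit */
        polls++;
    for (int i = 0; i < 4; i++)           /* STEP4: read x registers */
        out[i] = acc->x[i];
    return polls;
}
```

Because the slave never drives the bus on its own, all progress is observed by the master through repeated reads of the control register, which is what makes the design passive.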
INTERRUPT SERVICE ROUTINE
GRLIB GPTIMER (1/2)
  General Purpose Timer Unit
    Timers are present in almost any electronic device which needs timing functions (e.g. timekeeping & time measurement)
  Acts as a slave on AMBA APB
  Provides a common decrementing prescaler (clocked by the system clock) and decrementing timers
  Capable of asserting an interrupt on timer underflow
  We initialize timer 2 for 1 ms resolution (i.e. an interrupt will be asserted every 1 ms)
GRLIB GPTIMER (2/2)
  Please refer to the GRLIB IP Cores Manual for detailed information
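The arithmetic behind a 1 ms interrupt period can be sketched as below. This is a worked example rather than the lab's initialization code: `timer_cfg` and both functions are hypothetical names, the sketch assumes the prescaler is set up to tick once per microsecond, and it assumes a counter loaded with N underflows after N + 1 decrements. The register-level details should be taken from the GRLIB IP Cores Manual.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t scaler_reload;  /* prescaler reload value */
    uint32_t timer_reload;   /* timer reload value */
} timer_cfg;

/* Choose reload values so the timer underflows every period_us microseconds.
 * ASSUMPTIONS: the prescaler produces one tick per microsecond, and a
 * counter loaded with N underflows after N + 1 decrements. */
timer_cfg timer_cfg_for_period(uint32_t clk_hz, uint32_t period_us)
{
    assert(clk_hz >= 1000000u && clk_hz % 1000000u == 0);
    assert(period_us >= 1u);

    timer_cfg cfg;
    cfg.scaler_reload = clk_hz / 1000000u - 1u;  /* one tick per microsecond */
    cfg.timer_reload  = period_us - 1u;
    return cfg;
}

/* Interrupt period, in system clock cycles, implied by a configuration. */
uint64_t timer_period_cycles(timer_cfg cfg)
{
    return ((uint64_t)cfg.scaler_reload + 1u) *
           ((uint64_t)cfg.timer_reload + 1u);
}
```

For a hypothetical 40 MHz system clock and a 1000 µs period, this gives a prescaler reload of 39 and a timer reload of 999, i.e. one interrupt every 40000 clock cycles.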
eCos ISR (1/3)
  When an interrupt occurs, the processor jumps to a specific address to execute the Interrupt Service Routine (ISR)
  One of the key concerns in embedded systems with respect to interrupts is latency, which is the interval of time from when an interrupt occurs until the ISR begins to execute
  [Figure callout: "interrupt latency"]
eCos ISR (2/3)
  Basic API for implementing ISR
  Please refer to the eCos Reference Manual for detailed information
    #include <cyg/kernel/kapi.h>

    void cyg_interrupt_create(cyg_vector_t vector, cyg_priority_t priority,
                              cyg_addrword_t data, cyg_ISR_t* isr, cyg_DSR_t* dsr,
                              cyg_handle_t* handle, cyg_interrupt* intr);
    void cyg_interrupt_delete(cyg_handle_t interrupt);
    void cyg_interrupt_attach(cyg_handle_t interrupt);
    void cyg_interrupt_detach(cyg_handle_t interrupt);
    void cyg_interrupt_acknowledge(cyg_vector_t vector);
    void cyg_interrupt_mask(cyg_vector_t vector);
    void cyg_interrupt_unmask(cyg_vector_t vector);
eCos ISR (3/3)
  An ISR is a C function which takes the following form
    cyg_uint32
    isr_function(cyg_vector_t vector, cyg_addrword_t data)
    {
        ... /* do the service routine */
        return CYG_ISR_HANDLED;
    }
  An ISR should complete as soon as possible
Program Profiling (1/2)
  We use GPTIMER for time measurement
  Every time the timer asserts an interrupt, the timer ISR increments a global variable time_tick
    cyg_uint32
    timer_isr(cyg_vector_t vector, cyg_addrword_t data)
    {
        unsigned long *time_tick = (unsigned long *) data;

        (*time_tick)++;

        cyg_interrupt_acknowledge(vector);
        return CYG_ISR_HANDLED;
    }
Program Profiling (2/2)
  We record the latency of every function block by monitoring the time_tick variable
    void
    func()
    {
        unsigned long local_timer = time_tick;

        ...

        time_elapsed += (time_tick - local_timer);
    }
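The tick-based measurement pattern can be exercised on a host machine without eCos by calling a stand-in for the ISR by hand. In this sketch `fake_timer_interrupt`, `profiled_call`, `three_ms_of_work`, and `total_elapsed` are hypothetical names; only `time_tick` and `time_elapsed` come from the slides. The counter is declared `volatile` here on the assumption that, on the target, it is modified by the ISR outside the normal flow of the program.

```c
#include <assert.h>
#include <stddef.h>

/* Tick counter; volatile because on the target it would be updated by
 * the timer ISR, outside the compiler's view of normal control flow. */
static volatile unsigned long time_tick = 0;
static unsigned long time_elapsed = 0;

/* Host stand-in for the timer ISR body: one call models one 1 ms interrupt. */
void fake_timer_interrupt(void)
{
    time_tick++;
}

/* Measure one block of work in ticks. `work` is a hypothetical callback. */
void profiled_call(void (*work)(void))
{
    assert(work != NULL);
    unsigned long start = time_tick;
    work();
    /* Unsigned subtraction stays correct across counter wrap-around. */
    time_elapsed += time_tick - start;
}

/* Example workload during which three timer interrupts "arrive". */
void three_ms_of_work(void)
{
    for (int i = 0; i < 3; i++)
        fake_timer_interrupt();
}

/* Accumulated time, in ticks, across all profiled calls. */
unsigned long total_elapsed(void)
{
    return time_elapsed;
}
```

The resolution of this method is one tick, so a block that finishes between two interrupts contributes zero; short blocks only become visible when accumulated over many iterations.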
ENVIRONMENT CONFIGURATION
Build SW Application
  Copy the files in lab_pkg/lab2/sw to your original Lab 1 directory
    Replace the Makefile and modify the path for ECOSDIR in the Makefile
  Type “make” to build
    The -D_HW_ACC_ flag links the co-designed version of hw_idct_2d() in idct_hw.c with the testbench
      Without this flag, hw_idct_2d() is identical to sw_idct_2d()
    The -D_PROFILING_ flag enables profiling using the timer interrupt and reports the results at the end
Install IDCT Accelerator
  Copy lab_pkg/lab2/hw/devices.vhd to grlib-gpl-1.0.19-b3188/lib/grlib/amba/ and replace the original file
  Copy lab_pkg/lab2/hw/libs.txt and the whole lab_pkg/lab2/hw/esw folder to grlib-gpl-1.0.19-b3188/lib/
    The 1-D IDCT passive accelerator is located at lab_pkg/lab2/hw/esw/idct_acc/idct_1x8.vhd
  Copy lab_pkg/lab2/hw/leon3mp.vhd to grlib-gpl-1.0.19-b3188/designs/leon3-gr-xc3s-1500/ and replace the original file
CO-DESIGNED SYSTEM WITH GHDL SIMULATION
GHDL Simulation (1/6)
  We compile our program as a virtual SDRAM for the LEON3 processor
  LEON3 will fetch the instructions and perform the corresponding operations
  All the hardware signals can be recorded and dumped by GHDL
GHDL Simulation (2/6)
  In order to perform GHDL simulation, we do not link our program with eCos
    Remove -D__ECOS & -I$(ECOSDIR)/include from CFLAGS
    Remove -Ttarget.ld, -nostdlib, & -L$(ECOSDIR)/lib from LFLAGS
    Remove the -D_PROFILING_ flag
  You can remove -D_VERBOSE_ for faster simulation
  You can modify the NUM_BLKS macro in idct_test.c to reduce the number of testbench iterations
  Type “make” to build
    You should see a file named sdram.srec
GHDL Simulation (3/6)
  Start Cygwin
    cd grlib-gpl-1.0.19-b3188/designs/leon3-gr-xc3s-1500/
    make distclean
    make soft
    Copy the sdram.srec we built into this directory and replace the original one
    make ghdl
  You can check for syntax errors through GHDL
GHDL Simulation (4/6)
  Type “./testbench.exe --vcd=waveform.vcd” after compilation to begin simulation
  You should see an AHB slave with “Unknown vendor” appear, which is our IDCT accelerator
GHDL Simulation (5/6)
  The dump file waveform.vcd can be viewed on-the-fly using GTKWave
    Drag waveform.vcd and drop it over the gtkwave.exe icon to open
      You can also use Windows cmd to open
    Use “File → Reload Waveform” in GTKWave to update the dump file
GHDL Simulation (6/6)
  [Figure callouts: "addr phase", "data phase", "stage 1", "stage 2", "probe control reg"]
CO-DESIGNED SYSTEM ON FPGA
Build FPGA Bitstream (1/2)
  Type “make ise | tee ise_log” under grlib-gpl-1.0.19-b3188/designs/leon3-gr-xc3s-1500/ after you install the accelerator
    It is strongly suggested that you verify the hardware with GHDL simulation first
    It is also suggested that you take a look at ise_log for more information
  Configure your FPGA with leon3mp.bit after generating the bitstream
Build FPGA Bitstream (2/2)
  After entering GRMON, check the system configuration using “info sys”
  You should see a device with “Unknown vendor” appear
Profiling Results
  Build the program with the -D_PROFILING_ flag on
  Compare the computation results of sw_idct_2d() and hw_idct_2d()
  Compare the computation results with and without the -D_VERBOSE_ flag