Eugene Khvedchenia - Image processing using FPGAs
Uploaded by eastern-european-computer-vision-conference
Transcript of Eugene Khvedchenia - Image processing using FPGAs
Image processing on FPGA
Eugene Khvedchenya
https://ua.linkedin.com/in/cvtalks
What is an FPGA and who needs it?
Optimization pyramid
● General implementation
● OpenCL
● Cache tuning
● Multithreading
● SIMD (SSE, NEON)
● FPGA
What’s inside?
LUT
Flip-Flop
ALU
BRAM
IO pads
FPGA
Development efforts
CPU vs FPGA
High Level Synthesis
● Converts C++ code to a hardware design
● The HLS compiler optimizes your code for the FPGA
● Automatically optimizes RTL and timing
● Provides #pragmas for fine tuning
● C++ API for arbitrary precision math
● C++ API for stream data processing
● Supports C++11
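To make the points above concrete, here is a minimal sketch of what HLS input code can look like. The function name `threshold_line` and the constant `LINE_WIDTH` are hypothetical, and the pragma follows the Vivado HLS spelling as an assumption about the toolchain. A standard C++ compiler ignores the unknown pragma, so the same source also runs as ordinary software.

```cpp
#include <cstdint>

// Hypothetical line width, fixed at compile time so the tool can size the hardware.
constexpr int LINE_WIDTH = 1920;

// Binarize one line of 8-bit pixels.
// An HLS tool would turn the loop into a hardware pipeline;
// a regular C++ compiler simply ignores the pragma.
void threshold_line(const uint8_t in[LINE_WIDTH],
                    uint8_t out[LINE_WIDTH],
                    uint8_t level) {
    for (int x = 0; x < LINE_WIDTH; ++x) {
#pragma HLS PIPELINE II=1
        out[x] = (in[x] > level) ? 255 : 0;
    }
}
```

The fixed-size array arguments and the absence of heap allocation are deliberate: they keep the function in the subset of C++ that a synthesis tool can map to hardware.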
Things to remember: No branching penalty
Things to remember: No dynamic memory allocation
Things to remember: Instantaneous BRAM access
● Register-level bandwidth: 0.5M-bits / second
● BRAM bandwidth: 23T-bits / second
● Numbers above are for the Xilinx Kintex®-7 410T device
Things to remember: Single producer - single consumer
Things to remember: Pipelining
Things to remember
● No branching penalty
● No cache penalty
● No dynamic memory allocation
● Instantaneous BRAM access
● Single producer - single consumer
● Pipelining
● Task-centric approach
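The "single producer - single consumer" and "task-centric" points can be sketched in plain C++ as a chain of stages connected by point-to-point channels. The `Fifo` class below is a hypothetical software stand-in for a hardware FIFO (real HLS toolchains ship their own stream class), and the stage names are made up for illustration. The `std::queue` inside it allocates dynamically, which is acceptable only because this is a software model and not synthesizable code.

```cpp
#include <cstdint>
#include <queue>

// Hypothetical software model of a hardware FIFO channel.
template <typename T>
class Fifo {
public:
    void write(const T& v) { q_.push(v); }
    T read() { T v = q_.front(); q_.pop(); return v; }
    bool empty() const { return q_.empty(); }
private:
    std::queue<T> q_;
};

// Stage 1: the only producer of `out`.
void source(const uint8_t* data, int n, Fifo<uint8_t>& out) {
    for (int i = 0; i < n; ++i) out.write(data[i]);
}

// Stage 2: the only consumer of `in` and the only producer of `out`.
void invert(Fifo<uint8_t>& in, Fifo<uint8_t>& out, int n) {
    for (int i = 0; i < n; ++i) {
        out.write(static_cast<uint8_t>(255 - in.read()));
    }
}

// Stage 3: the only consumer of `in`.
void sink(Fifo<uint8_t>& in, uint8_t* data, int n) {
    for (int i = 0; i < n; ++i) data[i] = in.read();
}

// Task-centric top level: each channel has exactly one writer and one reader.
void invert_pipeline(const uint8_t* src, uint8_t* dst, int n) {
    Fifo<uint8_t> a, b;
    source(src, n, a);
    invert(a, b, n);
    sink(b, dst, n);
}
```

In software the three stages run one after another; in hardware they could run concurrently, which is why no channel may have two writers or two readers.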
HLS Development cycle
1. Get baseline version
2. Write simulation test
3. Run HLS synthesis
4. Simulate
5. Validate
6. Measure
7. Optimize
8. Go to step 3
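Steps 1, 2 and 5 of the cycle above can be sketched as a small C++ testbench that compares the design under test against a golden reference. All names here (`abs_diff_ref`, `abs_diff_dut`, `run_testbench`, `N`) are hypothetical, and the toy absolute-difference kernel only stands in for a real image-processing function. Returning zero on success is used here as the pass convention, which is an assumption about the tool flow.

```cpp
#include <cstdint>
#include <cstdlib>

constexpr int N = 64;  // hypothetical test vector length

// Step 1: baseline (golden reference), written for clarity, not for hardware.
void abs_diff_ref(const int16_t a[N], const int16_t b[N], int16_t out[N]) {
    for (int i = 0; i < N; ++i) {
        out[i] = static_cast<int16_t>(std::abs(a[i] - b[i]));
    }
}

// Design under test: the version that would go through HLS synthesis.
void abs_diff_dut(const int16_t a[N], const int16_t b[N], int16_t out[N]) {
    for (int i = 0; i < N; ++i) {
#pragma HLS PIPELINE II=1
        int d = a[i] - b[i];
        out[i] = static_cast<int16_t>(d < 0 ? -d : d);
    }
}

// Steps 2 and 5: simulation test that validates the DUT against the reference.
// Returns the number of mismatching samples, so 0 means "pass".
int run_testbench() {
    int16_t a[N], b[N], expected[N], actual[N];
    for (int i = 0; i < N; ++i) {
        a[i] = static_cast<int16_t>(i * 7 - 100);
        b[i] = static_cast<int16_t>(300 - i * 11);
    }
    abs_diff_ref(a, b, expected);
    abs_diff_dut(a, b, actual);
    int mismatches = 0;
    for (int i = 0; i < N; ++i) {
        if (expected[i] != actual[i]) ++mismatches;
    }
    return mismatches;
}
```

Keeping the reference untouched while only the design under test changes makes every pass through the optimize loop checkable against the same expected output.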
Sobel Edge Detection
Goal: Process 1920x1080 images @ 60 Hz
Sobel Edge Detection: Baseline implementation
Iterate over image
● Convolve 3x3 window with Gx and Gy kernels
● Compute their absolute sum
● Write to corresponding output pixel
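The baseline steps above can be sketched in plain C++ as follows. This is not the code from the slides: the function name `sobel_baseline`, zeroed border pixels, and saturation of the result to 255 are all assumptions made to keep the example self-contained.

```cpp
#include <cstdint>
#include <cstdlib>

// Baseline Sobel: every output pixel re-reads its 3x3 neighbourhood from memory.
// Assumptions: 8-bit grayscale input, border pixels set to 0, result saturated to 255.
void sobel_baseline(const uint8_t* src, uint8_t* dst, int width, int height) {
    static const int GX[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    static const int GY[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // No full 3x3 neighbourhood on the border.
            if (y == 0 || y == height - 1 || x == 0 || x == width - 1) {
                dst[y * width + x] = 0;
                continue;
            }
            // Convolve the 3x3 window with both kernels.
            int gx = 0, gy = 0;
            for (int i = -1; i <= 1; ++i) {
                for (int j = -1; j <= 1; ++j) {
                    int p = src[(y + i) * width + (x + j)];
                    gx += GX[i + 1][j + 1] * p;
                    gy += GY[i + 1][j + 1] * p;
                }
            }
            // Absolute sum of both gradients, clamped to the 8-bit range.
            int mag = std::abs(gx) + std::abs(gy);
            dst[y * width + x] = static_cast<uint8_t>(mag > 255 ? 255 : mag);
        }
    }
}
```

Each output pixel costs nine memory reads here, which is the property the later tuning steps have to remove.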
The FPGA frequency in this example is 150 MHz.
To meet the 1920x1080 @ 60 Hz goal we must process data at a rate of 1 cycle/pixel or faster.
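The 1 cycle/pixel requirement follows from simple arithmetic, sketched below with hypothetical helper names. The calculation counts only visible pixels and ignores blanking intervals, which is a simplifying assumption.

```cpp
#include <cstdint>

// Pixels that must be processed each second for a given video mode.
constexpr int64_t pixels_per_second(int64_t width, int64_t height, int64_t fps) {
    return width * height * fps;
}

// Clock cycles available per pixel at a given clock frequency.
constexpr double cycles_per_pixel(double clock_hz, int64_t pixel_rate) {
    return clock_hz / static_cast<double>(pixel_rate);
}

constexpr int64_t kPixelRate = pixels_per_second(1920, 1080, 60);  // 124,416,000 pixels/s
constexpr double kBudget = cycles_per_pixel(150e6, kPixelRate);    // about 1.2 cycles/pixel

// The budget is above 1 but below 2, so a design needing 2 or more
// cycles per pixel cannot keep up at 150 MHz.
static_assert(kPixelRate == 124416000, "unexpected pixel rate");
static_assert(kBudget >= 1.0 && kBudget < 2.0, "budget should allow exactly 1 cycle/pixel");
```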
Sobel Edge Detection: Baseline implementation
40 cycles/pixel on FPGA
Timing violation
Sobel Edge Detection: Tuning FPGA implementation
Iterate over image
● Convolve 3x3 window with Gx and Gy kernels
○ Pipeline: compute one field in the 3x3 filter window per clock cycle
● Compute Gx and Gy absolute sum
● Write to corresponding output pixel
Sobel Edge Detection: Tuning FPGA implementation
10 cycles/pixel on FPGA
Timing violation
Sobel Edge Detection: Tuning FPGA implementation
Iterate over image
● Pipeline: apply pipeline to the inner loop (columns)
● Convolve 3x3 window with Gx and Gy kernels
○ Loop gets totally unrolled and computed in 1 cycle
● Compute Gx and Gy absolute sum
○ Also computed in parallel
● Write to corresponding output pixel
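The unrolled window computation described above can be sketched as a per-window helper. The name `sobel_window` is hypothetical, the `UNROLL` pragmas follow the Vivado HLS spelling as an assumption, and the clamp to 255 is also an assumption. In an HLS flow this helper would be called from the pipelined column loop; a standard compiler ignores the pragmas and runs the loops sequentially.

```cpp
#include <cstdint>

// Sobel response for one 3x3 window (hypothetical helper).
// With both loops fully unrolled, a synthesis tool can evaluate
// all nine taps of Gx and Gy in parallel instead of one per cycle.
uint8_t sobel_window(const uint8_t w[3][3]) {
    static const int GX[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    static const int GY[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};

    int gx = 0, gy = 0;
    for (int i = 0; i < 3; ++i) {
#pragma HLS UNROLL
        for (int j = 0; j < 3; ++j) {
#pragma HLS UNROLL
            gx += GX[i][j] * w[i][j];
            gy += GY[i][j] * w[i][j];
        }
    }
    // The two absolute values are independent, so they can also be computed in parallel.
    int ax = gx < 0 ? -gx : gx;
    int ay = gy < 0 ? -gy : gy;
    int mag = ax + ay;
    return static_cast<uint8_t>(mag > 255 ? 255 : mag);
}
```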
Sobel Edge Detection: Tuning FPGA implementation
1 cycle/pixel on FPGA
Memory-access violation
Sobel Edge Detection: Tuning FPGA implementation
Issues
● Nine concurrent memory accesses
● More hardware blocks required
● HLS module can only connect a single port capable of one transaction/clock
Sobel Edge Detection: Tuning FPGA implementation
● Use BRAM to store intermediate line buffer
● Read data from external memory to line buffer
● Fill memory window (flip-flop elements)
● Convolve 3x3 window with Gx and Gy kernels
○ Loop gets totally unrolled and computed in 1 cycle
● Compute their absolute sum
○ Also computed in parallel
● Write to corresponding output pixel
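The line-buffer scheme above can be sketched in plain C++ as follows. This is an illustration, not the code from the slides: `sobel_line_buffer` and `MAX_WIDTH` are hypothetical names, the image width is assumed not to exceed `MAX_WIDTH`, borders are zeroed in a separate pass for simplicity, and the result is clamped to 255. The key property is that the main loop reads each input pixel from external memory exactly once.

```cpp
#include <cstdint>

constexpr int MAX_WIDTH = 1920;  // hypothetical upper bound so buffers have a static size

void sobel_line_buffer(const uint8_t* src, uint8_t* dst, int width, int height) {
    static uint8_t line_buf[2][MAX_WIDTH];  // two previous rows (BRAM in hardware)
    uint8_t win[3][3] = {};                 // 3x3 window (flip-flops in hardware)

    // Simplification: clear the output first so border pixels end up as 0.
    for (int i = 0; i < width * height; ++i) dst[i] = 0;

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
#pragma HLS PIPELINE II=1
            const uint8_t pix = src[y * width + x];  // the only external read per pixel

            // Shift the window one column to the left.
            for (int r = 0; r < 3; ++r) {
                win[r][0] = win[r][1];
                win[r][1] = win[r][2];
            }
            // New right-hand column: rows y-2 and y-1 from the line buffers, row y from input.
            win[0][2] = line_buf[0][x];
            win[1][2] = line_buf[1][x];
            win[2][2] = pix;

            // Move column x of the line buffers down by one row.
            line_buf[0][x] = line_buf[1][x];
            line_buf[1][x] = pix;

            // The window is fully valid once two rows and two columns have been seen;
            // it is centred on pixel (y-1, x-1).
            if (y >= 2 && x >= 2) {
                int gx = (win[0][2] + 2 * win[1][2] + win[2][2])
                       - (win[0][0] + 2 * win[1][0] + win[2][0]);
                int gy = (win[2][0] + 2 * win[2][1] + win[2][2])
                       - (win[0][0] + 2 * win[0][1] + win[0][2]);
                int mag = (gx < 0 ? -gx : gx) + (gy < 0 ? -gy : gy);
                dst[(y - 1) * width + (x - 1)] = static_cast<uint8_t>(mag > 255 ? 255 : mag);
            }
        }
    }
}
```

Compared with re-reading nine pixels per output, the window and line buffers turn eight of those reads into on-chip accesses, which is what removes the nine-concurrent-accesses problem.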
Sobel Edge Detection: Tuning FPGA implementation
1 cycle/pixel on FPGA
Achievement unlocked
The dark side of FPGA development
● The tools aren’t great
● It works in simulator!
● Learning curve
● Debugging timing violations
Quick start
● FPGA development board: Altera, Xilinx
● IDE & samples: Vivado
● OpenCV support
● HLS for OpenCL
Image processing on FPGA
Eugene Khvedchenya
Questions?
https://ua.linkedin.com/in/cvtalks
@cvtalks