Eugene Khvedchenia - Image processing using FPGAs


Transcript of Eugene Khvedchenia - Image processing using FPGAs

Page 1: Eugene Khvedchenia - Image processing using FPGAs

Image processing on FPGA

Eugene Khvedchenya

https://ua.linkedin.com/in/cvtalks

Page 2: Eugene Khvedchenia - Image processing using FPGAs

What is an FPGA and who needs it?

Page 3: Eugene Khvedchenia - Image processing using FPGAs

Optimization pyramid

General implementation

OpenCL

Cache tuning

Multithreading

SIMD (SSE, NEON)

FPGA

Page 4: Eugene Khvedchenia - Image processing using FPGAs

What’s inside?

LUT

Flip-Flop

ALU

BRAM

IO pads

FPGA

Page 5: Eugene Khvedchenia - Image processing using FPGAs

Development efforts

Page 6: Eugene Khvedchenia - Image processing using FPGAs

CPU vs FPGA

Page 7: Eugene Khvedchenia - Image processing using FPGAs

CPU vs FPGA

Page 8: Eugene Khvedchenia - Image processing using FPGAs

CPU vs FPGA

Page 9: Eugene Khvedchenia - Image processing using FPGAs

Development efforts

Page 10: Eugene Khvedchenia - Image processing using FPGAs

High Level Synthesis

Converts C++ code to hardware design

HLS compiler optimizes your code for FPGA

Automatically optimizes RTL and timing

Provides #pragmas for fine-tuning

C++ API for arbitrary precision math

C++ API for stream data processing

Supports C++11
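The stream-processing API mentioned above can be pictured as a FIFO with one reader and one writer. The sketch below is not the vendor API: `Stream<T>` is a hypothetical stand-in built on `std::queue` so that it compiles with an ordinary C++11 compiler, and `threshold` is an illustrative kernel. In Vivado HLS the analogous type would be `hls::stream<T>` from `hls_stream.h`.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>

// Hypothetical stand-in for an HLS stream: a FIFO exposing read/write/empty.
// It only mimics the interface for desktop simulation.
template <typename T>
class Stream {
public:
    void write(const T& v) { fifo_.push(v); }
    T read() {
        assert(!fifo_.empty());  // reading an empty FIFO is a test bug here
        T v = fifo_.front();
        fifo_.pop();
        return v;
    }
    bool empty() const { return fifo_.empty(); }
private:
    std::queue<T> fifo_;
};

// Illustrative kernel: consume `count` pixels and emit 255 where the pixel
// is at or above `level`, otherwise 0. There is exactly one read and one
// write per iteration, the access pattern a stream interface expects.
void threshold(Stream<uint8_t>& in, Stream<uint8_t>& out,
               int count, uint8_t level) {
    for (int i = 0; i < count; ++i) {
        uint8_t p = in.read();
        out.write(p >= level ? 255 : 0);
    }
}
```

Each stream here has a single producer and a single consumer: the caller fills `in`, `threshold` drains it and fills `out`, and the caller drains `out`.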

Page 11: Eugene Khvedchenia - Image processing using FPGAs

Things to remember

No branching penalty

Page 12: Eugene Khvedchenia - Image processing using FPGAs

Things to remember

No dynamic memory allocation

Page 13: Eugene Khvedchenia - Image processing using FPGAs

Things to remember

Instantaneous BRAM access

Register-level bandwidth 0.5M-bits / second

BRAM bandwidth 23T-bits / second

Numbers above for Xilinx Kintex®-7 410T device

Page 14: Eugene Khvedchenia - Image processing using FPGAs

Things to remember

Single producer - single consumer

Page 15: Eugene Khvedchenia - Image processing using FPGAs

Things to remember

Pipelining
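Why pipelining matters can be shown with a cycle count. The sketch below assumes the usual textbook model, in which a pipelined loop with iteration latency `depth` and initiation interval `ii` finishes `n` iterations in `depth + ii * (n - 1)` cycles. The function names and the example latency of 4 are illustrative and are not figures from the talk.

```cpp
#include <cassert>
#include <cstdint>

// Cycles for a loop that runs each iteration to completion before
// starting the next one.
constexpr uint64_t sequential_cycles(uint64_t iterations, uint64_t depth) {
    return iterations * depth;
}

// Cycles for a pipelined loop: the first result appears after `depth`
// cycles, and every further iteration adds `ii` cycles
// (the initiation interval).
constexpr uint64_t pipelined_cycles(uint64_t iterations, uint64_t depth,
                                    uint64_t ii) {
    return iterations == 0 ? 0 : depth + ii * (iterations - 1);
}
```

For one 1920-pixel row with an assumed 4-cycle iteration, `sequential_cycles(1920, 4)` gives 7680 cycles, while `pipelined_cycles(1920, 4, 1)` gives 1923 cycles, which is close to one cycle per pixel.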

Page 16: Eugene Khvedchenia - Image processing using FPGAs

Things to remember

● No branching penalty

● No cache penalty

● No dynamic memory allocation

● Instantaneous BRAM access

● Single producer - single consumer

● Pipelining

● Task-centric approach

Page 17: Eugene Khvedchenia - Image processing using FPGAs

HLS Development cycle

1. Get baseline version

2. Write simulation test

3. Run HLS synthesis

4. Simulate

5. Validate

6. Measure

7. Optimize

8. Goto 3
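Steps 1, 2 and 5 of the cycle above amount to comparing the version handed to the HLS tool against a trusted software baseline. A minimal sketch of such a simulation test follows. All names (`invert_reference`, `invert_candidate`, `count_mismatches`) are hypothetical, and the image-inversion kernel is only a placeholder for a real algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Golden model: the plain software baseline (step 1).
std::vector<uint8_t> invert_reference(const std::vector<uint8_t>& src) {
    std::vector<uint8_t> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i)
        dst[i] = static_cast<uint8_t>(255 - src[i]);
    return dst;
}

// Candidate: the rewritten version that would go through synthesis.
// It computes the same result with an XOR instead of a subtraction.
std::vector<uint8_t> invert_candidate(const std::vector<uint8_t>& src) {
    std::vector<uint8_t> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i)
        dst[i] = static_cast<uint8_t>(src[i] ^ 0xFF);
    return dst;
}

// Validation (step 5): number of positions where the outputs differ.
// Elements missing from the shorter vector count as mismatches.
std::size_t count_mismatches(const std::vector<uint8_t>& a,
                             const std::vector<uint8_t>& b) {
    std::size_t common = a.size() < b.size() ? a.size() : b.size();
    std::size_t longest = a.size() < b.size() ? b.size() : a.size();
    std::size_t bad = longest - common;
    for (std::size_t i = 0; i < common; ++i)
        if (a[i] != b[i]) ++bad;
    return bad;
}
```

A test would feed the same input to both functions and require `count_mismatches` to return 0 before moving on to measuring and optimizing.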

Page 18: Eugene Khvedchenia - Image processing using FPGAs

Sobel Edge Detection

Goal: Process 1920x1080 images @ 60 Hz

Page 19: Eugene Khvedchenia - Image processing using FPGAs

Sobel Edge Detection

Baseline implementation

Iterate over image

● Convolve 3x3 window with Gx and Gy kernels

● Compute their absolute sum

● Write to corresponding output pixel

The FPGA frequency in this example is 150 MHz.

To meet the 1920x1080 @ 60 Hz goal, we must process data at a rate of 1 cycle/pixel or faster.
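The 1 cycle/pixel requirement follows from dividing the clock rate by the pixel rate: 1920 x 1080 x 60 = 124,416,000 pixels per second, so a 150 MHz clock leaves about 1.2 cycles per pixel. A minimal sketch of that arithmetic, assuming only active pixels are counted (blanking intervals are ignored) and using hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>

// Pixels that must be processed every second.
constexpr uint64_t pixels_per_second(uint64_t width, uint64_t height,
                                     uint64_t fps) {
    return width * height * fps;
}

// Whole clock cycles available per pixel, rounded down because an
// initiation interval is an integer number of cycles.
constexpr uint64_t cycle_budget_per_pixel(uint64_t clock_hz, uint64_t width,
                                          uint64_t height, uint64_t fps) {
    return clock_hz / pixels_per_second(width, height, fps);
}

// Frames per second reached at a given cost in cycles per pixel.
constexpr double achievable_fps(uint64_t clock_hz, uint64_t width,
                                uint64_t height, uint64_t cycles_per_pixel) {
    return static_cast<double>(clock_hz) /
           static_cast<double>(width * height * cycles_per_pixel);
}
```

With these numbers `cycle_budget_per_pixel(150000000, 1920, 1080, 60)` is 1, and a design that sustains 1 cycle/pixel reaches roughly 72 frames per second under the same assumptions.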

Page 20: Eugene Khvedchenia - Image processing using FPGAs

Sobel Edge Detection

Baseline implementation
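The baseline steps (iterate over the image, convolve with Gx and Gy, sum the absolute values, write the output pixel) might look like the sketch below. It is not the code from the talk. The function name, the zeroed border pixels and the saturation to 8 bits are assumptions, and `std::vector` is used only for desktop convenience.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Baseline Sobel: for every interior pixel, convolve the 3x3 neighbourhood
// with Gx and Gy and store |Gx| + |Gy|, saturated to 8 bits.
// Border pixels are left at 0 (an assumption; the slides do not say).
std::vector<uint8_t> sobel_baseline(const std::vector<uint8_t>& src,
                                    int width, int height) {
    static const int GX[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    static const int GY[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};
    std::vector<uint8_t> dst(src.size());  // value-initialized to 0
    for (int y = 1; y + 1 < height; ++y) {
        for (int x = 1; x + 1 < width; ++x) {
            int gx = 0, gy = 0;
            // Nine reads from the source image per output pixel.
            for (int i = 0; i < 3; ++i) {
                for (int j = 0; j < 3; ++j) {
                    int p = src[(y + i - 1) * width + (x + j - 1)];
                    gx += GX[i][j] * p;
                    gy += GY[i][j] * p;
                }
            }
            int mag = std::abs(gx) + std::abs(gy);
            dst[y * width + x] = static_cast<uint8_t>(mag > 255 ? 255 : mag);
        }
    }
    return dst;
}
```

Every output pixel re-reads nine input pixels from memory, which is the access pattern the later tuning steps have to deal with.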

Page 21: Eugene Khvedchenia - Image processing using FPGAs

Sobel Edge Detection

Baseline implementation

40 cycles/pixel on FPGA

Timing violation

Page 22: Eugene Khvedchenia - Image processing using FPGAs

Sobel Edge Detection

Tuning FPGA implementation

Iterate over image

● Convolve 3x3 window with Gx and Gy kernels

○ Pipeline: Compute one field in the 3x3 filter window per clock cycle

● Compute Gx and Gy absolute sum

● Write to corresponding output pixel

Page 23: Eugene Khvedchenia - Image processing using FPGAs

Sobel Edge Detection

Tuning FPGA implementation

Page 24: Eugene Khvedchenia - Image processing using FPGAs

Sobel Edge Detection

Tuning FPGA implementation

10 cycles/pixel on FPGA

Timing violation

Page 25: Eugene Khvedchenia - Image processing using FPGAs

Sobel Edge Detection

Tuning FPGA implementation

Iterate over image

● Pipeline: Apply pipeline to the inner loop (columns)

● Convolve 3x3 window with Gx and Gy kernels

○ Loop gets totally unrolled and computed in 1 cycle

● Compute Gx and Gy absolute sum

○ Also computed in parallel

● Write to corresponding output pixel

Page 26: Eugene Khvedchenia - Image processing using FPGAs

Sobel Edge Detection

Tuning FPGA implementation

Page 27: Eugene Khvedchenia - Image processing using FPGAs

Sobel Edge Detection

Tuning FPGA implementation

1 cycle/pixel on FPGA

Memory-access violation

Page 28: Eugene Khvedchenia - Image processing using FPGAs

Sobel Edge Detection

Tuning FPGA implementation

Issues

● Nine concurrent memory accesses

● More hardware blocks required

● HLS module can only connect a single port capable of one transaction/clock

Page 29: Eugene Khvedchenia - Image processing using FPGAs

Sobel Edge Detection

Tuning FPGA implementation

● Use BRAM to store intermediate line buffer

● Read data from external memory to line buffer

● Fill memory window (flip-flop elements)

● Convolve 3x3 window with Gx and Gy kernels

○ Loop gets totally unrolled and computed in 1 cycle

● Compute their absolute sum

○ Also computed in parallel

● Write to corresponding output pixel
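The line-buffer scheme above can be sketched in plain C++ as follows. This is an illustration under stated assumptions rather than the code from the talk. The function name, the zeroed borders and the 8-bit saturation are hypothetical, and `std::vector` stands in for the fixed-size arrays an HLS build would need, since dynamic allocation is not available there. In Vivado HLS the inner loop would typically carry a `#pragma HLS PIPELINE II=1` directive, shown here only as a comment.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <vector>

std::vector<uint8_t> sobel_line_buffer(const std::vector<uint8_t>& src,
                                       int width, int height) {
    std::vector<uint8_t> dst(src.size());  // borders stay 0
    // Two line buffers hold the two previous rows (these would map to BRAM).
    std::vector<uint8_t> line0(width);     // row y-2
    std::vector<uint8_t> line1(width);     // row y-1
    uint8_t win[3][3] = {};                // 3x3 window (flip-flops)

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // (HLS: #pragma HLS PIPELINE II=1 would go here.)
            // The only read from the source image in this iteration.
            uint8_t pixel = src[y * width + x];

            // Column x of the two previous rows, then update the buffers.
            uint8_t top = line0[x];
            uint8_t mid = line1[x];
            line0[x] = mid;
            line1[x] = pixel;

            // Shift the window left and insert the new column on the right.
            for (int i = 0; i < 3; ++i) {
                win[i][0] = win[i][1];
                win[i][1] = win[i][2];
            }
            win[0][2] = top;
            win[1][2] = mid;
            win[2][2] = pixel;

            // The window is valid once it spans three rows and three
            // columns; its centre is pixel (x-1, y-1).
            if (y >= 2 && x >= 2) {
                int gx = (win[0][2] + 2 * win[1][2] + win[2][2]) -
                         (win[0][0] + 2 * win[1][0] + win[2][0]);
                int gy = (win[2][0] + 2 * win[2][1] + win[2][2]) -
                         (win[0][0] + 2 * win[0][1] + win[0][2]);
                int mag = std::abs(gx) + std::abs(gy);
                dst[(y - 1) * width + (x - 1)] =
                    static_cast<uint8_t>(mag > 255 ? 255 : mag);
            }
        }
    }
    return dst;
}
```

Each iteration reads one new pixel from the source instead of nine; the other eight values come from the line buffers and the window registers, which is what lets a single memory port keep up with one pixel per cycle.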

Page 30: Eugene Khvedchenia - Image processing using FPGAs

Sobel Edge Detection

Tuning FPGA implementation

1 cycle/pixel on FPGA

Achievement unlocked

Page 31: Eugene Khvedchenia - Image processing using FPGAs

The dark side of FPGA development

● The tools aren’t great

● It works in the simulator!

● Learning curve

● Debugging timing violations

Page 32: Eugene Khvedchenia - Image processing using FPGAs

Quick start

● FPGA Development board: Altera, Xilinx

● IDE & Samples: Vivado

● OpenCV support

● HLS for OpenCL

Page 33: Eugene Khvedchenia - Image processing using FPGAs

Image processing on FPGA

Eugene Khvedchenya

Questions?

https://ua.linkedin.com/in/cvtalks

@cvtalks