Optimizing FPGA Accelerator Design for Deep Convolution neural Networks By: Mohamad Kanafanai.

Page 2:

Outline
• Introduction
• Background
• Methodology
• Results
• Evaluation of the system
• Criticism
• Q&A

Page 3:

Introduction
• A CNN is an extension of the artificial neural network
• Applications include image processing
• Requires high-performance computation hardware
• Design exploration is a must!

Page 4:

What is a deep convolutional neural network?
• A type of machine learning
• 8 steps
• Limitations
• Feed-forward computation
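The feed-forward computation of a single convolutional layer can be sketched as a plain loop nest. This is a minimal illustration, not the accelerator's code: the function name `conv_layer`, the argument layout, and the absence of bias and activation are assumptions made for the example.

```python
def conv_layer(inputs, weights, stride=1):
    """Feed-forward pass of one convolutional layer, written as plain loops.

    inputs  : list of N input feature maps, each a 2-D list (rows x cols)
    weights : weights[m][n] is the K x K kernel from input map n to output map m
    returns : list of M output feature maps
    (Layout and names are illustrative assumptions.)
    """
    n_in = len(inputs)
    n_out = len(weights)
    k = len(weights[0][0])
    rows = (len(inputs[0]) - k) // stride + 1
    cols = (len(inputs[0][0]) - k) // stride + 1
    out = [[[0.0] * cols for _ in range(rows)] for _ in range(n_out)]
    for r in range(rows):                      # output row
        for c in range(cols):                  # output column
            for m in range(n_out):             # output feature map
                for n in range(n_in):          # input feature map
                    for i in range(k):         # kernel row
                        for j in range(k):     # kernel column
                            out[m][r][c] += (weights[m][n][i][j]
                                             * inputs[n][stride * r + i][stride * c + j])
    return out


image = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]    # one 3x3 input map
kernel = [[[[1, 0], [0, 1]]]]                  # one 2x2 kernel, one output map
result = conv_layer(image, kernel)
```

These six loops are the ones that the pipelining, unrolling and tiling directives on the later pages are applied to.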

Page 5:

Roofline model
• Provides a graphical representation of performance and productivity
◦ Rates and efficiencies (GFLOPS, % of peak)
◦ Limitations
◦ Benefits
• Focus
◦ Computation
◦ Communication
◦ Locality
• Not for fine tuning
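The roofline bound itself is one line of arithmetic: attainable performance is the smaller of the computational roof and the product of the computation-to-communication ratio and the memory bandwidth. The sketch below states that relation; the function and parameter names, and the numbers in the example, are illustrative assumptions.

```python
def attainable_performance(computational_roof, ctc_ratio, bandwidth):
    """Roofline bound.

    computational_roof : peak compute rate of the design, in GFLOPS
    ctc_ratio          : operations per byte moved to or from off-chip memory
    bandwidth          : off-chip memory bandwidth, in GB/s
    """
    return min(computational_roof, ctc_ratio * bandwidth)


def is_bandwidth_bound(computational_roof, ctc_ratio, bandwidth):
    """True when memory traffic, not compute, limits the design."""
    return ctc_ratio * bandwidth < computational_roof


# Made-up numbers: a 100 GFLOPS design on a 4 GB/s memory link.
low_reuse = attainable_performance(100, 10, 4)    # limited by bandwidth
high_reuse = attainable_performance(100, 50, 4)   # limited by compute
```

A design with more data reuse has a higher computation-to-communication ratio and moves to the right on the roofline plot, until the computational roof becomes the limit.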

Page 6:

Types of data
• Irrelevant
• Independent
• Dependent

Page 7:

Double buffering
• Allows for two-way communication
• Increases throughput
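A minimal sketch of double buffering, assuming the usual ping-pong arrangement: while one buffer is being computed on, the other is being refilled, and the two swap roles each step. Python runs the load and the compute one after the other, so the overlap is only modelled by `total_time`; all names and timings here are hypothetical.

```python
def process_tiles_double_buffered(tiles, compute):
    """Process tiles using two buffers that swap roles every step."""
    if not tiles:
        return []
    buffers = [list(tiles[0]), None]      # prologue: fill the first buffer
    results = []
    for step in range(len(tiles)):
        active = step % 2                 # buffer being computed on
        idle = 1 - active                 # buffer being refilled
        if step + 1 < len(tiles):
            # In hardware this load would run concurrently with the compute below.
            buffers[idle] = list(tiles[step + 1])
        results.append(compute(buffers[active]))
    return results


def total_time(n_tiles, t_load, t_compute, double_buffered):
    """Idealized cycle count for loading and computing n_tiles tiles."""
    if n_tiles == 0:
        return 0
    if double_buffered:
        # First load, then each compute overlaps the next load, then the last compute.
        return t_load + (n_tiles - 1) * max(t_load, t_compute) + t_compute
    return n_tiles * (t_load + t_compute)


sums = process_tiles_double_buffered([[1, 2], [3, 4], [5]], sum)
saved = total_time(4, 3, 5, False) - total_time(4, 3, 5, True)
```

In this simple model the throughput gain comes from hiding the shorter of the two phases behind the longer one.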

Page 8:

Main concerns
• Communication overhead
• Buffer management
• Bandwidth optimization
• Better utilization of the FPGA

Page 9:

Design Exploration
• Computation
◦ Loop scheduling
◦ Loop tile sizes
• Communication ratio
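Exploring tile sizes can be sketched as a brute-force search. The cost model below is an assumption made for illustration and is not taken from the slides: it supposes that `Tm` output maps and `Tn` input maps are processed in parallel, that this needs `Tm * Tn` multiply-accumulate units, and that one such group is processed per cycle. It covers only the computation side and ignores the communication ratio.

```python
from math import ceil


def execution_cycles(M, N, R, C, K, Tm, Tn):
    """Cycles for one layer under the assumed cost model.

    M, N : number of output and input feature maps
    R, C : output rows and columns
    K    : kernel size
    Tm, Tn : hypothetical tile sizes over output and input maps
    """
    return ceil(M / Tm) * ceil(N / Tn) * R * C * K * K


def explore(M, N, R, C, K, max_units):
    """Return (cycles, Tm, Tn) with the fewest cycles using at most max_units units."""
    best = None
    for Tm in range(1, M + 1):
        for Tn in range(1, N + 1):
            if Tm * Tn > max_units:
                continue                  # does not fit the resource budget
            cycles = execution_cycles(M, N, R, C, K, Tm, Tn)
            if best is None or cycles < best[0]:
                best = (cycles, Tm, Tn)
    return best


best_cycles, best_tm, best_tn = explore(M=4, N=2, R=2, C=2, K=1, max_units=4)
```

A fuller exploration would also estimate the off-chip traffic of each candidate and apply the roofline bound before ranking the designs.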

Page 10:

Directives: Loop Pipelining
• Software pipelining
• Increases throughput
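The throughput gain from pipelining can be estimated with a simple cycle count. The sketch assumes each iteration takes `iteration_latency` cycles and that a pipelined loop starts a new iteration every `initiation_interval` cycles; the function name and the numbers are illustrative.

```python
def loop_cycles(trip_count, iteration_latency, initiation_interval=None):
    """Cycle count of a loop, pipelined or not.

    With initiation_interval=None the iterations run back to back.
    Otherwise a new iteration starts every initiation_interval cycles
    and the last one still needs iteration_latency cycles to finish.
    """
    if trip_count == 0:
        return 0
    if initiation_interval is None:
        return trip_count * iteration_latency
    return iteration_latency + (trip_count - 1) * initiation_interval


sequential = loop_cycles(100, 4)        # 100 iterations, 4 cycles each
pipelined = loop_cycles(100, 4, 1)      # same loop, one new iteration per cycle
```

For long loops the cycle count approaches `trip_count * initiation_interval`, so the initiation interval matters more than the latency of a single iteration.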

Page 11:

Directives: Loop Unrolling
• Maximizes computation
• Data flow design
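Unrolling replaces one loop iteration with several copies of the loop body, which a hardware tool can then map to parallel units. The hand-unrolled dot product below (factor 4, with a cleanup loop for leftovers) is an illustrative sketch; in Python the four statements still run one after another.

```python
def dot_unrolled4(a, b):
    """Dot product with the loop body unrolled by a factor of 4."""
    n = len(a)
    main = n - n % 4                      # largest multiple of 4 not above n
    acc0 = acc1 = acc2 = acc3 = 0
    for i in range(0, main, 4):
        # Four independent multiply-accumulates: candidates for parallel hardware.
        acc0 += a[i] * b[i]
        acc1 += a[i + 1] * b[i + 1]
        acc2 += a[i + 2] * b[i + 2]
        acc3 += a[i + 3] * b[i + 3]
    total = acc0 + acc1 + acc2 + acc3
    for i in range(main, n):              # cleanup for the remaining elements
        total += a[i] * b[i]
    return total


value = dot_unrolled4([1, 2, 3, 4, 5, 6], [1, 1, 1, 1, 1, 1])
```

Using separate accumulators keeps the four products independent of each other, which is what lets them be computed side by side.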

Page 12:

Directives: Loop Tiling
• Divides loops into smaller loops
◦ Ensures data stays in cache
◦ Great for data reuse
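Tiling can be shown on a small matrix multiply: the outer loops step over tiles and the inner loops work inside one tile, so each block of data is reused while it is still held locally. The example is a generic sketch and not the accelerator's loop nest; the function name and default tile size are arbitrary choices.

```python
def matmul_tiled(a, b, tile=2):
    """Matrix multiply with all three loops split into tiles of size `tile`."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for ii in range(0, n, tile):                  # loops over tiles
        for jj in range(0, m, tile):
            for kk in range(0, k, tile):
                # Loops inside one tile; min() handles partial tiles at the edges.
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, m)):
                        for p in range(kk, min(kk + tile, k)):
                            c[i][j] += a[i][p] * b[p][j]
    return c


product = matmul_tiled([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]])
```

The result does not depend on the tile size; only the order in which the data is touched changes.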

Page 13:

Memory Optimization
• Polyhedral-based optimization
• Local memory promotion for irrelevant-type communications
• Data reuse
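Local memory promotion can be sketched as follows, assuming "irrelevant" means that the array element being accessed does not depend on the loop variable. In that case the load and store can be hoisted out of the loop and a local copy used inside it. The function names and the access counters are illustrative assumptions.

```python
def accumulate_naive(partials, memory, index):
    """Read and write memory[index] on every iteration; return the access count."""
    accesses = 0
    for p in partials:
        value = memory[index]             # load on every iteration
        accesses += 1
        memory[index] = value + p         # store on every iteration
        accesses += 1
    return accesses


def accumulate_promoted(partials, memory, index):
    """Same result with the load and store promoted outside the loop."""
    local = memory[index]                 # single load before the loop
    for p in partials:
        local += p                        # work on the local copy only
    memory[index] = local                 # single store after the loop
    return 2


mem_a, mem_b = [0], [0]
naive_accesses = accumulate_naive([1, 2, 3], mem_a, 0)
promoted_accesses = accumulate_promoted([1, 2, 3], mem_b, 0)
```

Both versions leave the same value in memory; the promoted one touches memory twice regardless of how many iterations the loop runs.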

Page 14:

Designed Model

Page 15:

Detail of the final design

Page 16:

Results
• Virtex 7 at 100 MHz, implemented as an IP using VHLS (Vivado HLS)
• Intel Xeon E5 at 2.2 GHz with 15 MB cache
• Pre-synthesis report used for performance and exploration

Page 17:

Evaluation of the system
• 17.42x speedup over the 1-thread GP implementation
• 4.8x speedup over the 16-thread GP implementation
• 18.6 watts vs. 95 watts for the GP
• 3.62x speedup over the ICCD2013 design

Page 18:

My opinion
• The techniques used to optimize loops are well thought out
• It's a unique way of looking at an accelerator
• The memory enhancements offer great insight

Page 19:

Pitfalls of the claims
• Pre-cached data tests
• Evaluation metrics when comparing with other designs
• Only tested using one image
• Technology difference
• Claiming the design has the best utilization

Page 20:

Q&A

Page 21:

References
• http://crd.lbl.gov/assets/pubs_presos/parlab08-roofline-talk.pdf
• https://www.youtube.com/watch?v=n6hpQwq7Inw
• http://en.wikipedia.org/wiki/Loop_tiling
• http://en.wikipedia.org/wiki/Polytope_model
• Chen Zhang, Peng Li, Guangyu Sun, Yijin Guan, Bingjun Xiao, Jason Cong. Center for Energy-Efficient Computing and Applications, Peking University, China; Computer Science Department, University of California, Los Angeles, USA