Politecnico di Milano
Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB)
Hardware Acceleration of Computational Fluid Dynamics Simulations in an Oxygenator
Introduction and State of the Art
Chiara [email protected]
B3Lab, May 4th 2016
Guido [email protected]
HAMSproject
The context
Wojchiec Zasina and Philip Hogeboom (from the Noun Project)
Cardiac Surgery
When we need to operate on the patient’s heart, but we can’t stop the circulation for too long...
Extra-Corporeal Circulation (ECC)
www.medici-italia.com
1) Venous blood is drawn from the caval veins
2) The blood is treated (heating, oxygenation, ...)
3) The treated blood is returned to the arterial system
Oxygenator
www.lookfordiagnosis.com
● THE FUNCTION
Replicate the lungs’ task:
- remove CO2 from the blood
- add O2 to the blood
www.perfusione.net
● HOW IT IS MADE
Many different membranes rolled up concentrically
→ porous system
To optimize the design, we need to know the blood behaviour
→ CFD simulations
Fluid dynamics can be studied through:
- FEM (Finite Element Method): fast and accurate, but complex and expensive
- Lumped parameters model: easy and low-cost, but with a trade-off between time and accuracy
The linear system generated by the lumped parameters model is solved through matrix inversion
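As a minimal sketch of this step, the Python snippet below builds a hypothetical three-node resistive network (an electrical-analogue lumped model; the topology, the resistances and the inlet flow are illustrative assumptions, not the oxygenator model of this work). It inverts the conductance matrix G with Gauss-Jordan elimination and multiplies the inverse by the source vector q to obtain the nodal pressures. In MATLAB the same step would be written as inv(G)*q.

```python
def invert(matrix):
    """Invert a square matrix with Gauss-Jordan elimination and partial pivoting."""
    n = len(matrix)
    # Augment with the identity: [M | I]
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Pick the row with the largest entry in this column as pivot
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        # Clear the column in every other row
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    # The right half of [I | M^-1] is the inverse
    return [row[n:] for row in aug]


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


# Hypothetical chain: inlet -> node 1 -R12- node 2 -R23- node 3 -R3- outlet (pressure 0)
R12, R23, R3 = 1.0, 2.0, 3.0   # illustrative hydraulic resistances
Q = 2.0                        # illustrative flow injected at node 1
g12, g23, g3 = 1.0 / R12, 1.0 / R23, 1.0 / R3

# Flow balance at each node gives the linear system G * p = q
G = [[g12, -g12, 0.0],
     [-g12, g12 + g23, -g23],
     [0.0, -g23, g23 + g3]]
q = [Q, 0.0, 0.0]

pressures = matvec(invert(G), q)
print(pressures)
```

For this series chain the result can be checked by hand: the pressure at each node is the flow times the total resistance downstream of it. Real lumped models produce much larger matrices, which is where the cost of the inversion becomes relevant.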
The lumped parameters model is implemented in MATLAB (versatile, easy-to-use, widespread)
The problem
https://thenounproject.com/
If only MATLAB simulations could take less time...
The idea
HARDWARE ACCELERATION
The device
FPGAs
- Speedup with respect to common processors
- Low power requirements
- Programmability
- Affordable costs
Our goal
Study an oxygenator for ECC through a lumped parameters model in MATLAB and accelerate the simulation by means of an FPGA-based system
State of the Art
- MATLAB HDL Coder
- HW matrix inversion
MATLAB HDL Coder
- Matrices cannot be passed directly as I/O (but can be managed internally)
- Requires HW-adapted algorithms (e.g. CORDIC)
→ NOT TRIVIAL
HW matrix inversion
- Algorithms
- Applicative domains
Applicative domains
Algorithms
Moore-Penrose Pseudo Inverse*
- SVD method°
- Greville’s algorithm
- Full rank QR factorization
* Courrieu P, «Fast Computation of Moore-Penrose Inverse Matrices», Neural Information Processing, 2005
SVD method
Let A = U*Σ*V’, then pinv(A) = V*pinv(Σ)*U’
Very accurate method, but time consuming for large matrices (because of SVD)
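The formula pinv(A) = V*pinv(Σ)*U’ can be illustrated with a small Python sketch. To avoid depending on an SVD routine, the example builds A from known factors: U and V are 2x2 rotation matrices and Σ is a rank-deficient diagonal matrix (angles and values are arbitrary illustrative choices). pinv(Σ) is obtained by inverting only the non-zero singular values, and the result is checked against the Penrose condition A*pinv(A)*A = A.

```python
import math


def matmul(x, y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def transpose(x):
    return [list(col) for col in zip(*x)]


def rotation(theta):
    """2x2 rotation matrix, used here as an orthogonal factor."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def pinv_diag(sigma, tol=1e-12):
    """Pseudo-inverse of a diagonal matrix: invert only the non-zero entries."""
    rows, cols = len(sigma), len(sigma[0])
    out = [[0.0] * rows for _ in range(cols)]
    for i in range(min(rows, cols)):
        if abs(sigma[i][i]) > tol:
            out[i][i] = 1.0 / sigma[i][i]
    return out


# Illustrative factors (assumed values, not taken from the slides)
U = rotation(0.3)
V = rotation(1.1)
S = [[3.0, 0.0],
     [0.0, 0.0]]   # one zero singular value: A has no ordinary inverse

A = matmul(matmul(U, S), transpose(V))                   # A = U*S*V'
A_pinv = matmul(matmul(V, pinv_diag(S)), transpose(U))   # pinv(A) = V*pinv(S)*U'

# Penrose condition: A * pinv(A) * A reproduces A
check = matmul(matmul(A, A_pinv), A)
print(all(abs(c - a) < 1e-9
          for rc, ra in zip(check, A) for c, a in zip(rc, ra)))
```

Once U, Σ and V are available the pseudo-inverse costs only a diagonal reciprocal and two matrix products, so the expensive part is the SVD itself.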
SVD method°: possible algorithms
QR algorithm: computationally efficient
Hemkumar, "A systolic VLSI architecture for complex SVD", 1992
Jacobi method: more accurate, parallelizable
Luk, Park, "A proof of convergence for two parallel Jacobi SVD algorithms", 2002
° Rahmati et al, “FPGA Based Singular Value Decomposition for Image Processing Applications”, 2008
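To give an idea of why the Jacobi method lends itself to parallel hardware, here is a minimal Python sketch of the one-sided (Hestenes) Jacobi variant. It is an illustration under our own assumptions, not the architecture of the cited papers: pairs of columns are rotated until they are mutually orthogonal, and the singular values are then the column norms. Rotations acting on disjoint column pairs do not interact, which is the property parallel implementations exploit.

```python
import math


def jacobi_singular_values(a, tol=1e-12, max_sweeps=50):
    """Singular values of a matrix (list of rows) via one-sided Jacobi rotations."""
    m, n = len(a), len(a[0])
    a = [[float(x) for x in row] for row in a]   # work on a copy
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(a[i][p] ** 2 for i in range(m))      # squared norm of column p
                beta = sum(a[i][q] ** 2 for i in range(m))       # squared norm of column q
                gamma = sum(a[i][p] * a[i][q] for i in range(m)) # dot product p.q
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue   # columns p and q are already orthogonal
                rotated = True
                # Rotation angle that makes the two columns orthogonal
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for i in range(m):
                    ap, aq = a[i][p], a[i][q]
                    a[i][p] = c * ap - s * aq
                    a[i][q] = s * ap + c * aq
        if not rotated:
            break   # a full sweep without rotations: converged
    norms = [math.sqrt(sum(a[i][j] ** 2 for i in range(m))) for j in range(n)]
    return sorted(norms, reverse=True)


sigma = jacobi_singular_values([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(sigma)
```

The rotations preserve the Frobenius norm, so the squares of the returned values sum to the sum of the squared entries of the input matrix, which gives a quick sanity check.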
Some results
SVD computation of a 32x127 matrix
Singular Value    Matlab*     FPGA         % error
σ1                2.6603      2.7500       3.3718
σ2                2.3113      2.3125       0.0519
Elapsed Time      2.7141 s    24.3143 ms
This table shows the singular values with the minimum and maximum estimation errors for the case of a 32x127 matrix, together with the elapsed time for the software and hardware implementations.
* Matlab 7.3.0.267 utilizing 2.4GHz Intel Core Duo Processor
Source: “Reconfigurable FPGA-Based Unit for Singular Value Decomposition of Large m x n Matrices”, Ledesma-Carrillo et al., 2011
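The figures in the table can be cross-checked with a few lines of Python. The snippet assumes that the error column is the relative difference between the FPGA and the Matlab value, expressed as a percentage, and it derives the speedup implied by the two elapsed times (a quantity the table does not list explicitly).

```python
def percent_error(reference, measured):
    """Relative difference between measured and reference, in percent."""
    return abs(measured - reference) / reference * 100.0


# Values copied from the table (Matlab as reference, FPGA as measured)
err_sigma1 = percent_error(2.6603, 2.7500)
err_sigma2 = percent_error(2.3113, 2.3125)

# Elapsed times converted to seconds before taking the ratio
speedup = 2.7141 / 24.3143e-3

print(f"sigma1 error: {err_sigma1:.4f} %")
print(f"sigma2 error: {err_sigma2:.4f} %")
print(f"speedup: {speedup:.1f}x")
```

Under that assumption the computed errors agree with the reported 3.3718 and 0.0519, and the hardware run is faster than the software one by roughly two orders of magnitude.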
Our contribution
MATLAB HDL Coder vs Our contribution vs HW matrix inversion
Our contribution vs MATLAB HDL Coder
- Management of the whole interface
- No need to write HDL-friendly MATLAB code (only the function)
Our contribution vs HW matrix inversion
Management of larger matrices (up to 8000x8000) through
i) strong parallelism
ii) streaming in data transfer
iii) Xilinx Virtex 7 VC707
QUESTIONS?
Contact us!
HAMSproject
www.facebook.com/hams.project
https://twitter.com/HAMS_project
http://www.slideshare.net/HAMSproject
References
[1] Wang et al, “A CORDIC-Based Dynamically Reconfigurable FPGA Architecture for Signal Processing Algorithms”, 2008
[2] Burian et al, “A Fixed-Point Implementation of Matrix Inversion Using Cholesky Decomposition”, 2004
[3] Bigdeli et al, “A New Pipelined Systolic Array-Based Architecture for Matrix Inversion in FPGAs with Kalman Filter Case Study”, 2005
[4] Edmann et al, “A Scalable Pipelined Complex Valued Matrix Inversion Architecture”, 2005
[5] Garcia et al, “A Suitable FPGA Implementation of Floating-Point Matrix Inversion Based on Gauss-Jordan Elimination”, 2011
[6] Ahmedsaid et al, “Accelerating SVD on Reconfigurable Hardware for Image Denoising”, 2004
[7] Kumar et al, “An Approach to Design a Matrix Inversion HW Module using FPGA”, 2014
[8] Irturk et al, “An Efficient FPGA Implementation of Scalable Matrix Inversion Core using QR Decomposition”, 2009
[9] Norton et al, “An Evaluation of the Xilinx Virtex-4 FPGA for On-Board Processing in an Advanced Imaging System”, 2009
[10] Irturk et al, “An FPGA Design Space Exploration Tool for Matrix Inversion Architectures”, 2008
[11] Ma et al, “An FPGA-based Singular Value Decomposition Processor”, 2006
[12] Wu et al, “Approximate Matrix Inversion for High-Throughput Data Detection in the Large-Scale MIMO Uplink”, 2013
[13] Irturk et al, “Automatic Generation of Decomposition based Matrix Inversion Architectures”, 2008
[14] Szekowka et al, “CORDIC and SVD Implementation in Digital Hardware”, 2010
[15] Sergiyenko et al, “Error-Free Computation of Inverse Matrices in FPGA”, 2013
[16] Rahmati et al, “FPGA Based Singular Value Decomposition for Image Processing Applications”, 2008
[17] Grammenos et al, “FPGA Design of a Truncated SVD Based Receiver for the detection of SEFDM Signals”, 2011
[18] Karkooti et al, “FPGA Implementation of Matrix Inversion Using QRD-RLS Algorithm”, 2005
[19] Blace et al, “High level Prototyping and FPGA Implementation of the Orthogonal Matching Pursuit Algorithm”, 2012
[20] Ahmedsaid et al, “Improved SVD Systolic Array and Implementation on FPGA”, 2003
[21] S. Hu and Q. Yan, “Inversion of Vandermonde Matrices in FPGAs”, 2004
[22] Ohta et al, “Matrix Decomposition Suitable for FPGA Implementation of N-continuous OFDM”, 2014
[23] Chisty et al, “Matrix Inversion Using QR Decomposition by Parabolic Synthesis”, 2012
[24] Ma et al, “QR Decomposition-Based Matrix Inversion for High Embedded MIMO Receivers”, 2011
[25] Wernke et al, “Real-Time Data Processing for an Advanced Imaging System Using the Xilinx Virtex-5 FPGA”, 2009
[26] Ledesma-Carrillo et al, “Reconfigurable FPGA-Based Unit for Singular Value Decomposition of Large m x n Matrices”, 2011
[27] Wang et al, “Singular Value Decomposition Hardware for MIMO - State of the Art and Custom Design”, 2010