Transcript of "Mapping Multiple Multivariate Gaussian Random Number Generators on an FPGA"
Mapping Multiple Multivariate
Gaussian Random Number Generators
on an FPGA
Chalermpol Saiprasert, Christos-S. Bouganis and George A. Constantinides
Outline
• Monte Carlo Simulation
• Multivariate Gaussian Random Number Generator (MVGRNG)
• Objective
• Optimization algorithm
• Proposed framework – Hardware architecture
• Experimental Results
• Conclusions
Introduction
• Monte Carlo simulation
» Mathematical technique
» Repeated random sampling
» Evaluate non-deterministic processes
• Pre-requisite for MC simulation: random numbers
• Multivariate Gaussian distribution to capture many correlated
variables
• Acceleration of MC using FPGA
» Speed up simulations
» Optimization of MVGRNG
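The idea of repeated random sampling can be made concrete with a tiny sketch (the helper name `mc_estimate` is illustrative, not from the slides):

```python
import random

# Monte Carlo in miniature: estimate E[f(X)] by repeated random sampling.
def mc_estimate(f, sample, n=100_000, seed=0):
    rng = random.Random(seed)
    return sum(f(sample(rng)) for _ in range(n)) / n

# Example: E[Z^2] = 1 for Z ~ N(0, 1).
est = mc_estimate(lambda z: z * z, lambda rng: rng.gauss(0.0, 1.0))
```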
Objective
• Existing approaches focus only on single-distribution MVGRNGs
• Mapping of multiple multivariate Gaussian distributions
• Example: Optimization of many financial portfolios
» Represented by many multivariate Gaussian distributions
• MVGRNG usually part of a larger application
» Resource usage CRUCIAL
• Efficient resource sharing
Generating Multivariate Gaussian Random Numbers
• Mean (m) and Covariance matrix (Σ)
• OBJECTIVE : APPROXIMATE Σ
• Eigenvalue Decomposition using SVD
• Using any number of decomposition levels K
Σ = U Λ Uᵀ = U Λ^(1/2) Λ^(1/2) Uᵀ

x = U Λ^(1/2) z + m,   z ~ N(0, I)

x = ∑_{i=1}^{K} c_i z_i + m   (using K levels of decomposition)
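The eigenvalue-decomposition sampling described above can be sketched in NumPy; the matrix values are illustrative, and `eigh` stands in for the SVD of the symmetric Σ:

```python
import numpy as np

# Sample x = U * Lambda^(1/2) * z + m from N(m, Sigma), using the
# eigenvalue decomposition Sigma = U Lambda U^T.
rng = np.random.default_rng(0)
m = np.array([1.0, -2.0])
sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

lam, U = np.linalg.eigh(sigma)            # Sigma = U diag(lam) U^T
A = U @ np.diag(np.sqrt(lam))             # A = U Lambda^(1/2), so A A^T = Sigma
z = rng.standard_normal((2, 200_000))     # z ~ N(0, I)
x = A @ z + m[:, None]                    # x ~ N(m, Sigma)

sample_mean = x.mean(axis=1)
sample_cov = np.cov(x)
```

The sample mean and covariance of x then match m and Σ up to sampling noise.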
Proposed Algorithm
[Flowchart: the input matrices Σ1, Σ2, …, Σm each pass through an Approximation Optimization stage and an Approximation Error Calculation stage; the overall approximation error is calculated, the remainder of the target matrices is computed, and a termination constraint is checked: if not met (No) the loop repeats; if met (Yes) the vector coefficients c are output.]

• Approximate Σ for each distribution
• Target redundancies between ALL input distributions
• Exploit similarities in PRECISION REQUIREMENTS
» Select appropriate precision to minimize approximation error for all distributions
• Distinct coefficients for each distribution
• Algorithm takes any number M of input distributions
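A much-simplified sketch of the proposed iterative loop, assuming a greedy rank-1 (dominant-eigenvector) approximation per level; the precision-optimization and quantization steps of the actual framework are omitted, and all names are illustrative:

```python
import numpy as np

def decompose(matrices, tol=1e-10, max_levels=20):
    # One rank-1 term (dominant eigenvector) per matrix per level; the
    # remainder of each target matrix is updated, and the loop stops once
    # every approximation error is below the termination constraint.
    remainders = [s.astype(float).copy() for s in matrices]
    coeffs = [[] for _ in matrices]              # vector coefficients c
    for _ in range(max_levels):
        for i, R in enumerate(remainders):
            lam, U = np.linalg.eigh(R)
            k = np.argmax(np.abs(lam))           # dominant eigenpair
            c = np.sqrt(max(lam[k], 0.0)) * U[:, k]
            coeffs[i].append(c)
            remainders[i] = R - np.outer(c, c)   # remainder of target matrix
        if max(np.linalg.norm(R) for R in remainders) < tol:
            break
    return coeffs

sigmas = [np.array([[2.0, 0.5], [0.5, 1.0]]),
          np.array([[1.0, -0.3], [-0.3, 4.0]])]
coeffs = decompose(sigmas)
approx = [sum(np.outer(c, c) for c in cs) for cs in coeffs]
```

Summing the outer products c cᵀ per distribution reconstructs each Σ.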
Error Estimation Model
• Mean square error
• Approximation error for each distribution
Error = (1/N²) ‖Σ − Σ̂‖²

where Σ̂ is the approximated matrix, Σ the actual matrix, and N the matrix order.
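The error metric can be computed directly; a minimal sketch (the helper name is mine, not from the paper):

```python
import numpy as np

# Mean-square approximation error: squared entrywise difference between the
# actual and approximated matrices, divided by N^2 for an N x N matrix.
def approximation_error(actual, approximated):
    n = actual.shape[0]                           # matrix order N
    return np.sum((actual - approximated) ** 2) / n**2

sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
sigma_hat = np.array([[1.0, 0.4], [0.4, 1.0]])
err = approximation_error(sigma, sigma_hat)       # (0.1^2 + 0.1^2) / 4
```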
[Figure: a GRNG feeds a chain of computation blocks CB1, CB2, CB3, each a multiplier-adder pair applying its coefficients c to Gaussian samples z and accumulating the output vectors x1/x2.]
Hardware Architecture
Constructed from K computation blocks (CBs), where K = number of decomposition levels
Mixed precisions in datapath
LUT-based
Precision in adder path = max over all CBs
Hardware Architecture
• Two multivariate Gaussian distributions (3×3 correlation matrices)
• Using 3 levels of decomposition (K = 3)
• GRNG with a different seed for each input distribution (completely independent streams)
• x1 produced after K clock cycles, x2 produced after 2K clock cycles
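The timing claim can be mimicked in software; a behavioral sketch (not HDL), with illustrative coefficient values:

```python
import numpy as np

# K computation blocks are shared serially by two distributions, so the
# first output vector completes after K cycles and the second after 2K.
K = 3
rng = np.random.default_rng(1)
coeffs = {d: [rng.standard_normal(3) for _ in range(K)] for d in (1, 2)}

schedule = []                        # (cycle, distribution) when a vector is ready
cycle = 0
for dist in (1, 2):
    acc = np.zeros(3)
    for level in range(K):
        z = rng.standard_normal()            # one GRNG sample per level
        acc += coeffs[dist][level] * z       # one CB multiply-accumulate
        cycle += 1
    schedule.append((cycle, dist))           # x_dist produced
```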
Experiment I: Accuracy of the Error and Resource Estimation Models
Accuracy of the Error Estimation Model
[Scatter plot: estimated approximation error of correlation matrices (10^-15 to 10^0) vs. empirical approximation error of correlation matrices (10^-14 to 10^0), both axes logarithmic.]

Accuracy of the Resource Estimation Model
[Scatter plot: estimated resource utilization in LUTs (0 to 3500) vs. empirical resource utilization in LUTs (0 to 3000).]
Experiment II
Comparison with Existing
Approaches
Experimental Setup
• Approaches under consideration
» [Thomas and Luk 2008]
» Our previous work [Saiprasert et al 2009]
• Adjust throughput of existing approaches to the same level
» Fair comparison
• Force M consecutive levels to use same CB for [Saiprasert et al
2009]
» M = number of input distributions
[Figure: coefficients a1–a4 assigned to computation blocks CB1–CB3; forcing consecutive decomposition levels to share a CB groups (a1, a2) and (a3, a4) onto the same block.]
Comparison of All Approaches
• Architecture: DSP in [Thomas08]; LUTs in [Saiprasert09] and this work
• Precision: fixed in [Thomas08]; mixed in [Saiprasert09] and this work
• Optimization across all input distributions: no in [Thomas08] and [Saiprasert09]; yes in this work
• Resource sharing strategy: [Thomas08] reuses the same hardware for all input matrices; [Saiprasert09] forces M consecutive decomposition levels to share the same hardware; this work optimizes precisions and coefficients for all input distributions
Experimental Setup
• 4 sets of input correlation matrices
» Set I: Four 2x2 matrices
» Set II: Four 4x4 matrices
» Set III: Four 6x6 matrices
» Set IV: Two 2x2 and two 4x4 matrices
• One MVGRNG optimised for each set
• 100,000 vectors obtained for each set
Set I Matrices (2x2)
[Plot: approximation error of correlation matrix (10^-15 to 10^0) vs. resource utilization (200 to 1400 LUTs). Legend: Proposed Approach; Extension of our previous work; [Thomas and Luk 08]; 18-bit GRNG; Floating-Point GRNG; 18-bit upstream with double-precision hardware; double-precision upstream with double-precision hardware; 18-bit upstream with 18-bit hardware; 18-bit upstream with mixed-precision hardware.]
Set II Matrices (4x4)
[Plot: approximation error of correlation matrix (10^-15 to 10^0) vs. resource utilization (0 to 3000 LUTs). Legend: Proposed Approach; Extension of our previous work; [Thomas and Luk 08]; 18-bit GRNG; Floating-Point GRNG.]
Set III Matrices (6x6)
[Plot: approximation error of correlation matrix (10^-15 to 10^0) vs. resource utilization (0 to 4000 LUTs). Legend: Proposed Approach; Extension of our previous work; [Thomas and Luk 08]; 18-bit GRNG; Floating-Point GRNG. Annotations mark resource reductions of 50% and 38%.]
Set IV Matrices (Mixed Matrix Orders)
[Plot: approximation error of correlation matrix (10^-15 to 10^0) vs. resource utilization (0 to 2500 LUTs). Legend: Proposed Approach; Extension of our previous work; [Thomas and Luk 08]; 18-bit GRNG; Floating-Point GRNG.]
Conclusions
• Innovative approach for a multiple-distribution MVGRNG
• One generator optimized for all input distributions
• Effective resource sharing algorithm
• Exploits similarities in precision requirements
• Up to 50% reduction in resource usage
• Without any penalty on the quality of the generated data
THANK YOU FOR YOUR ATTENTION