PG-4037, Fast modal analysis with NX Nastran and GPUs, by Leonard Hoffnung


Presentation PG-4037, Fast modal analysis with NX Nastran and GPUs, by Leonard Hoffnung at the AMD Developer Summit (APU13), November 11-13, 2013.

Transcript of PG-4037, Fast modal analysis with NX Nastran and GPUs, by Leonard Hoffnung

Page 1: PG-4037, Fast modal analysis with NX Nastran and GPUs, by Leonard Hoffnung

FAST MODAL ANALYSIS WITH NX NASTRAN AND GPUS

LEONARD HOFFNUNG
SIEMENS PLM SOFTWARE

Page 2

INTRODUCTION

INTRO / FREQ RESP / MODES / CONCLUSIONS

Page 3

3 | FAST MODAL ANALYSIS WITH NX NASTRAN AND GPUS | NOVEMBER 12, 2013 | CONFIDENTIAL

ABOUT NX NASTRAN

• Industry-standard finite element package from Siemens PLM
• Analysis options include:
  ‒ Stress, vibration, structural failure
  ‒ Heat transfer, acoustics, rotor dynamics, and more
• Advanced numerical capabilities and proven scalability:
  ‒ Problem sizes approaching 1 billion DOFs
  ‒ SMP to 24 cores
  ‒ DMP to 2048 nodes

Page 4

MODAL FREQUENCY RESPONSE OVERVIEW: NASTRAN SOL 111

• Bread-and-butter industrial computation: modal frequency response
• Widely used in automotive and aerospace to determine the response under varying excitations:
  ‒ Optimize weight and rigidity
  ‒ Minimize noise and resonance
• Two-phase calculation, more efficient than the direct approach:
  ‒ Modal analysis
  ‒ Frequency response calculation

Page 5

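MODAL FREQUENCY RESPONSE: COMPUTATIONAL STEPS

• Eigensolution: $h$ normal modes of the $f \times f$ structural matrices:

$$K_{ff}\,\Phi_{fh} = M_{ff}\,\Phi_{fh}\,\Lambda_{hh}$$

• Frequency response: an $h \times h$ complex linear solution at each of $n_{resp}$ frequencies, where the modal matrices are projections onto the mode shapes (e.g. $K_{hh} = \Phi_{fh}^T K_{ff} \Phi_{fh}$, and likewise for $B_{hh}$ and $M_{hh}$):

$$\left(K_{hh} + i\,\omega_k B_{hh} - \omega_k^2 M_{hh}\right) x_k = b_k, \qquad k = 1, \ldots, n_{resp}$$

• All parameters are large in typical customer usage:
  ‒ $f$-size 10-30M for model fidelity
  ‒ $h$-size 10-60K for modal accuracy
  ‒ $n_{resp}$ 20K for a detailed response graph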
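The two phases can be sketched in Python on small dense matrices. This is a minimal illustration, not NX Nastran code: `modal_frequency_response` is a hypothetical helper, and it assumes a single load vector $p$ applied at every frequency (so $b_k = \Phi_{fh}^T p$) and a dense damping matrix. `scipy.linalg.eigh(K, M, subset_by_index=...)` returns mass-normalized modes, so $M_{hh}$ comes out as the identity; it is kept general here to mirror the equations above.

```python
import numpy as np
from scipy.linalg import eigh, solve

def modal_frequency_response(K, M, B, p, omegas, h):
    """Two-phase modal frequency response (hypothetical sketch).

    K, M, B: f x f real symmetric stiffness, mass and damping matrices
    p:       load vector of length f, assumed the same at every frequency
    omegas:  angular frequencies (rad/s)
    h:       number of modes kept
    Returns an (n_resp, f) complex array of physical responses.
    """
    f = K.shape[0]
    # Phase 1: lowest h modes of K phi = lambda M phi.
    _, Phi = eigh(K, M, subset_by_index=[0, h - 1])
    # Project the structural matrices and the load onto the modal basis.
    Khh = Phi.T @ K @ Phi
    Mhh = Phi.T @ M @ Phi
    Bhh = Phi.T @ B @ Phi
    bh = Phi.T @ p
    # Phase 2: one small h x h complex solve per frequency, then back to f DOFs.
    X = np.empty((len(omegas), f), dtype=complex)
    for k, w in enumerate(omegas):
        A = Khh + 1j * w * Bhh - w**2 * Mhh
        X[k] = Phi @ solve(A, bh)
    return X
```

Keeping all modes ($h = f$) reproduces the direct solution up to rounding, which makes a convenient correctness check; in practice $h \ll f$ and the modal result is an approximation.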

Page 6

PERFORMANCE CASE STUDY: PR MODEL – FREQUENCY RESPONSE COST

• Shell-dominated SOL 111 model:
  ‒ 245K degrees of freedom ($f$-size)
  ‒ 1200 eigenpairs ($h$-size)
  ‒ 20K frequency responses ($n_{resp}$)
• Eigensolution time: 30 minutes
• Frequency response time: 127 minutes
• Frequency response cost is $O(n_{resp}\, h^3)$:
  ‒ Estimated run time of decades as $h \to 60\text{K}$
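The "decades" estimate can be checked with a back-of-the-envelope calculation. The sketch assumes $n_{resp}$ stays at 20K and that the measured 127 minutes scales purely with $h^3$ (lower-order assembly and solve terms ignored); `extrapolate_minutes` is a hypothetical helper.

```python
MINUTES_PER_YEAR = 60 * 24 * 365.25

def extrapolate_minutes(t_ref_min, h_ref, h_new):
    """Scale a measured run time by (h_new / h_ref)**3.

    Assumes the dense per-frequency factorization dominates the cost.
    """
    return t_ref_min * (h_new / h_ref) ** 3

t = extrapolate_minutes(127, 1200, 60_000)
years = t / MINUTES_PER_YEAR
print(f"{t:,.0f} minutes = {years:.1f} years")
# prints: 15,875,000 minutes = 30.2 years
```

Going from $h = 1200$ to $h = 60\text{K}$ is a factor of 50 in $h$, hence $50^3 = 125{,}000$ in cost.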

Page 7

PERFORMANCE CASE STUDY: CUSTOMER BENCHMARK

• More typical industrial model:
  ‒ 11 million degrees of freedom ($f$-size)
  ‒ Shell-dominated model
  ‒ Approximately 3000 eigenpairs ($h$-size)
  ‒ 300 frequency responses ($n_{resp}$)
• Frequency response is expensive, but the modal calculation is even more expensive, despite RDMODES:
  ‒ Modal calculation: 375 minutes
  ‒ Frequency response time: 22 minutes
• Performance must improve in both phases

Page 8

FREQUENCY RESPONSE

Page 9

FREQUENCY RESPONSE IMPLEMENTATION: DETAILS OF ORIGINAL METHOD

• The NX Nastran implementation uses a symmetric $LDL^T$ factorization followed by forward-backward substitution:

    For $k = 1, \ldots, n_{resp}$
        Assemble $A = K_{hh} + i\,\omega_k B_{hh} - \omega_k^2 M_{hh}$
        Factor $A = LDL^T$
        Solve $x_k = A^{-1} b_k = L^{-T} D^{-1} L^{-1} b_k$
    End for

• The NX Nastran sparse factorization is difficult to adapt to the GPU:
  ‒ Disk oriented
  ‒ Tuned for sparse matrices
  ‒ Symmetric pivoting required for stability (the matrix is indefinite)
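A minimal dense CPU sketch of the loop above, assuming small in-memory NumPy matrices rather than the disk-based sparse solver. SciPy's `scipy.linalg.solve` with `assume_a='sym'` calls LAPACK `?sysv`, which performs a Bunch-Kaufman $LDL^T$ factorization with symmetric pivoting of the complex symmetric (not Hermitian) matrix and then the forward-backward substitution. `freq_response_ldlt` is a hypothetical helper name.

```python
import numpy as np
from scipy.linalg import solve

def freq_response_ldlt(Khh, Bhh, Mhh, b, omegas):
    """Per-frequency symmetric LDL^T solves (dense sketch of the original loop).

    Khh, Bhh, Mhh: real symmetric h x h modal matrices
    b:             (n_resp, h) array whose rows are the modal loads b_k
    omegas:        n_resp angular frequencies
    """
    X = np.empty(b.shape, dtype=complex)
    for k, w in enumerate(omegas):
        # Assemble the complex symmetric (A == A.T, not A == A.conj().T) matrix.
        A = Khh + 1j * w * Bhh - w**2 * Mhh
        # assume_a='sym' -> LAPACK zsysv: A = L D L^T with symmetric pivoting,
        # then forward substitution, block-diagonal solve, back substitution.
        X[k] = solve(A, b[k], assume_a='sym')
    return X
```

Only one triangle of $A$ is referenced, which is where the symmetric method saves memory traffic and flops compared with a general solver.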

Page 10

FREQUENCY RESPONSE IMPLEMENTATION: DETAILS OF REVISED METHOD

• For the GPU code, use LU factorization (with partial pivoting) instead:

    For $k = 1, \ldots, n_{resp}$
        Assemble $A = K_{hh} + i\,\omega_k B_{hh} - \omega_k^2 M_{hh}$
        Factor $A = PLU$
        Solve $x_k = A^{-1} b_k = U^{-1} L^{-1} P^T b_k$
    End for

• An OpenCL port of LAPACK zgesv is available with clMAGMA and clBLAS:
  ‒ In-core storage
  ‒ Dense oriented (acceptable for this application)
  ‒ Benefit mainly in the factorization step (cubic operation count)
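The revised loop can be sketched on the CPU with SciPy's `lu_factor`/`lu_solve`, which wrap LAPACK `?getrf`/`?getrs`, the two halves of what `zgesv` does. This is an illustration under the assumption of small dense matrices, not the clMAGMA code; `freq_response_lu` is a hypothetical helper. The factorization costs $O(h^3)$ while the two triangular solves cost $O(h^2)$, which is why the GPU benefit comes mainly from the factorization. LU ignores symmetry, so its leading-order operation count (about $\tfrac{2}{3}h^3$) is roughly twice that of a symmetric $LDL^T$ (about $\tfrac{1}{3}h^3$).

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def freq_response_lu(Khh, Bhh, Mhh, b, omegas):
    """Per-frequency LU solves, mirroring the zgesv-based loop (sketch)."""
    X = np.empty(b.shape, dtype=complex)
    for k, w in enumerate(omegas):
        A = Khh + 1j * w * Bhh - w**2 * Mhh
        lu, piv = lu_factor(A)             # zgetrf: A = P L U, O(h^3)
        X[k] = lu_solve((lu, piv), b[k])   # zgetrs: two triangular solves, O(h^2)
    return X
```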

Page 11

FREQUENCY RESPONSE IMPLEMENTATION: LINEAR SOLVER SELECTION STRATEGY

• Original NX Nastran sparse symmetric solver:
  ‒ Spills to disk, requires minimal memory
  ‒ Minimizes flops by utilizing symmetry
  ‒ Takes advantage of sparsity
• Improved SMP method (system462=1 in NXN9.0):
  ‒ In core, based on LAPACK zsytrf/zsytrs
  ‒ Efficient parallelization of the $n_{resp}$ loop
  ‒ Large memory requirements
• OpenCL method (to appear in NXN9 MP):
  ‒ In core, based on clMAGMA zgesv (LU factorization)
  ‒ Utilizes the GPU for best performance
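The memory cost of the in-core methods follows from simple arithmetic: one dense complex $h \times h$ matrix takes $16h^2$ bytes in double precision (two 8-byte floats per entry) and $8h^2$ bytes in single precision. The sketch below counts one stored copy of $A$ only; a real solver also holds $K_{hh}$, $B_{hh}$, $M_{hh}$ and workspace, and if each SMP thread works on its own frequency it presumably needs its own copy of $A$. `dense_complex_gib` is a hypothetical helper.

```python
def dense_complex_gib(h, single=False):
    """Storage for one dense complex h x h matrix, in GiB (2**30 bytes)."""
    bytes_per_entry = 8 if single else 16   # complex64 vs complex128
    return h * h * bytes_per_entry / 2**30

# h values: the ~3000-mode customer benchmark and the 10-60K typical range.
for h in (3000, 10_000, 60_000):
    print(f"h = {h:6d}: {dense_complex_gib(h):6.2f} GiB double, "
          f"{dense_complex_gib(h, single=True):6.2f} GiB single")
```

At $h = 60\text{K}$ a single double-precision copy already needs about 54 GiB, more than the 32 GB test machine used in the following benchmarks.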

Page 12

FREQUENCY RESPONSE: INITIAL PERFORMANCE COMPARISON

• Test machine:
  ‒ Magny-Cours 2.1 GHz, 24 cores
  ‒ 32 GB memory
  ‒ 4 GB Tahiti GPU
• GPU roughly 40% faster than 24-way SMP

[Chart: frequency response wall time (h:mm:ss, 0:00:00 to 2:24:00) for models e10k, e20k, e30k and e40k; series: serial, smp=8, smp=24, GPU]

Model   Modes
e10k    1785
e20k    3631
e30k    5576
e40k    7646

Page 13

FREQUENCY RESPONSE – FURTHER IMPROVEMENTS: SINGLE PRECISION ARITHMETIC

• Use single precision on the GPU for improved performance:
  ‒ Higher flop rate (typically 4-5 times)
  ‒ Lower memory utilization (larger problems possible)
  ‒ Better scaling with larger systems
• Single precision disadvantage: lower precision
  ‒ Accuracy acceptable for most engineering purposes (largest relative error of $10^{-5}$)

[Chart: relative error (log scale, 1E-08 to 1) of the frequency response in double precision and in single precision]
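A small experiment illustrates the accuracy and memory trade-off. It assumes a hypothetical, well-conditioned random complex symmetric matrix standing in for $A$; the single-precision error grows with the condition number of $A$, so real models may behave differently. SciPy selects the LAPACK routine from the array dtype, so `complex64` input is solved in single precision.

```python
import numpy as np
from scipy.linalg import solve

rng = np.random.default_rng(0)
h = 500
# Hypothetical well-conditioned complex symmetric stand-in for A.
R = rng.standard_normal((h, h)) + 1j * rng.standard_normal((h, h))
A = (R + R.T) / 2 + 6 * np.sqrt(h) * np.eye(h)
b = rng.standard_normal(h) + 1j * rng.standard_normal(h)

x_double = solve(A, b)                                            # complex128 (zgesv)
x_single = solve(A.astype(np.complex64), b.astype(np.complex64))  # complex64 (cgesv)

# Error of the single-precision answer relative to the double-precision one.
rel_err = np.linalg.norm(x_single - x_double) / np.linalg.norm(x_double)
print(f"relative error of single precision: {rel_err:.1e}")
print(f"bytes per matrix: {A.nbytes} (double) vs "
      f"{A.astype(np.complex64).nbytes} (single)")
```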

Page 14

FREQUENCY RESPONSE – FURTHER IMPROVEMENTS: SINGLE PRECISION ACCURACY AND PERFORMANCE

• 40-50% reduction in run time
• The largest example (e60k) is only possible in single precision

[Chart: frequency response wall time (h:mm:ss, 0:00:00 to 0:17:17) for models e10k through e60k; series: Double, Single]

Model   Modes
e10k    1785
e20k    3631
e30k    5576
e40k    7646
e60k    12088

Page 15

FREQUENCY RESPONSE – FURTHER IMPROVEMENTS: MATRIX SUMMATION ON GPU

• Perform the addition of matrices at each frequency on the GPU (assembly step):

$$A = K_{hh} + i\,\omega_k B_{hh} - \omega_k^2 M_{hh}$$

• That is, store $K_{hh}$, $B_{hh}$ and $M_{hh}$ in GPU buffers and sum them using zaxpy/caxpy kernels:

$$A := K_{hh}, \qquad A := A + i\,\omega_k B_{hh}, \qquad A := A - \omega_k^2 M_{hh}$$

• Minimizes data transfer to/from main memory
• Additional GPU memory consumption
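The three-step update maps onto BLAS axpy ($y := \alpha x + y$) applied to the matrices viewed as flat arrays of length $h^2$. Below is a CPU sketch using SciPy's wrapper `scipy.linalg.blas.zaxpy`; the GPU version would call the corresponding clBLAS kernel on device buffers, which this sketch does not show. `assembled_matrices` is a hypothetical helper.

```python
import numpy as np
from scipy.linalg.blas import zaxpy

def assembled_matrices(Khh, Bhh, Mhh, omegas):
    """Yield A_k = K + i*w_k*B - w_k^2*M for each frequency via axpy updates."""
    h = Khh.shape[0]
    # Convert the modal matrices once (analogous to uploading GPU buffers once).
    K, B, M = (np.asarray(X, dtype=np.complex128).ravel() for X in (Khh, Bhh, Mhh))
    for w in omegas:
        A = K.copy()                   # A := K_hh
        A = zaxpy(B, A, a=1j * w)      # A := A + i*w_k*B_hh
        A = zaxpy(M, A, a=-w**2)       # A := A - w_k^2*M_hh
        yield A.reshape(h, h)
```

Keeping $K_{hh}$, $B_{hh}$ and $M_{hh}$ resident means only the scalars $\omega_k$ change per frequency, at the price of three extra $h \times h$ buffers.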

Page 16

FREQUENCY RESPONSE – FURTHER IMPROVEMENTS: MATRIX SUMMATION ON GPU PERFORMANCE

• Double precision, best result (e30k):
  ‒ Time reduced 30%, from 6:52 to 4:50
  ‒ 2x faster than the best CPU time
• Single precision, best result (e40k):
  ‒ Time reduced 22%, from 6:23 to 4:58
  ‒ 4x faster than the best CPU time
• Best scaling with the largest problems:
  ‒ Limited by GPU memory

[Chart: wall time (h:mm:ss, 0:00:00 to 0:12:58) for models e10k through e40k; series: Double, Double + zaxpy, Single, Single + caxpy]
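The quoted reductions can be reproduced by reading the times as minutes:seconds, which is consistent with the chart's h:mm:ss axis: for e30k, $(412 - 290)/412 \approx 29.6\%$, and for e40k, $(383 - 298)/383 \approx 22.2\%$. A small helper sketch (`seconds` and `reduction_pct` are hypothetical names):

```python
def seconds(clock):
    """Convert an 'm:ss' or 'h:mm:ss' string to seconds."""
    total = 0
    for part in clock.split(":"):
        total = total * 60 + int(part)
    return total

def reduction_pct(before, after):
    """Percentage reduction in run time from `before` to `after`."""
    b, a = seconds(before), seconds(after)
    return 100.0 * (b - a) / b

print(f"{reduction_pct('6:52', '4:50'):.1f}%")  # e30k, double + zaxpy -> 29.6%
print(f"{reduction_pct('6:23', '4:58'):.1f}%")  # e40k, single + caxpy -> 22.2%
```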

Page 17

MODAL ANALYSIS

Page 18

MODAL ANALYSIS WITH RDMODES: OVERVIEW

• RDMODES: proprietary high-performance approximate eigensolver
• Tuned for typical customer use cases:
  ‒ Larger models (10 million+ DOFs)
  ‒ Many modes (300+)
  ‒ Accelerated computation when few output DOFs are required
  ‒ Sufficient accuracy for frequency response calculations
• Performance up to 20x faster than Lanczos
• Demonstrated DMP scalability to 2048 nodes

Page 19

MODAL ANALYSIS WITH RDMODES: COST BREAKDOWN

• The RDMODES method comprises multiple smaller operations; five areas are listed below
• Costs for the customer benchmark:
  ‒ 11 million DOFs
  ‒ Shell dominated
  ‒ 3000 modes below 400 Hz
  ‒ 300 frequency responses
• Dense operations are good candidates for the GPU:
  ‒ Factorization, eigensolution

Operation                        Wall time
Sparse factorization             18:40
Dense factorization              24:00
Sparse eigensolution              9:33
Dense eigensolution              65:00
Reduced (dense) eigensolution    21:16
Total                           250:06
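Summing the table shows why the dense operations are singled out: the three dense rows account for 110:16 of the 138:29 spent in the five listed areas, about 80%. The 250:06 total is larger than that sum, presumably because it includes work outside these five areas. The sketch reads the entries as minutes:seconds; the percentages are the same if they are hours:minutes.

```python
def to_seconds(mmss):
    """Convert 'mm:ss' to seconds (ratios are unchanged if the field is h:mm)."""
    minutes, secs = mmss.split(":")
    return int(minutes) * 60 + int(secs)

costs = {
    "Sparse factorization": "18:40",
    "Dense factorization": "24:00",
    "Sparse eigensolution": "9:33",
    "Dense eigensolution": "65:00",
    "Reduced (dense) eigensolution": "21:16",
}
secs = {op: to_seconds(t) for op, t in costs.items()}
listed = sum(secs.values())
dense = sum(v for op, v in secs.items() if "dense" in op.lower())
total = to_seconds("250:06")

print(f"dense share of the five listed areas: {100 * dense / listed:.1f}%")  # 79.6%
print(f"dense share of the reported total:    {100 * dense / total:.1f}%")   # 44.1%
```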

Page 20

RDMODES FACTORIZATION: CLASSIFICATION

• Fairly large quantity of each type
• Sparse factorizations:
  ‒ Typically too large to treat efficiently as dense
  ‒ NXN multifrontal solver is very efficient
  ‒ Efficient sparse solution on the GPU is difficult (active research)
• Dense factorizations:
  ‒ Model dependent, typically small
  ‒ Symmetric positive definite, so clMAGMA dposv may be used
  ‒ Candidate for the GPU
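On the CPU, the LAPACK counterpart of clMAGMA `dposv` is reachable from SciPy: `scipy.linalg.solve` with `assume_a='pos'` calls `?posv`, a Cholesky factorization $A = LL^T$ followed by two triangular solves. A minimal sketch on a hypothetical SPD matrix, cross-checked against explicit `cho_factor`/`cho_solve` calls (LAPACK `?potrf`/`?potrs`), which perform the same two steps separately:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve, solve

rng = np.random.default_rng(3)
n = 300
R = rng.standard_normal((n, n))
A = R @ R.T + n * np.eye(n)         # hypothetical SPD stand-in for a dense block
B = rng.standard_normal((n, 5))     # several right-hand sides at once

X_posv = solve(A, B, assume_a='pos')  # ?posv: Cholesky + triangular solves
c, lower = cho_factor(A)              # ?potrf: Cholesky factor of A
X_chol = cho_solve((c, lower), B)     # ?potrs: forward/back substitution

print(np.allclose(X_posv, X_chol), np.allclose(A @ X_posv, B))  # True True
```

Positive definiteness means no pivoting is needed, and Cholesky needs about half the operations of LU.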

Page 21

RDMODES FACTORIZATION: DENSE FACTORIZATION COST COMPARISON

• Dense factorization wall times:
  ‒ Costs include the factorization and miscellaneous assembly
• As with frequency response, the GPU is suitable above a size threshold:
  ‒ Threshold of 5000 for this example
• Dense in-core methods are helpful
• GPU ineffective for this model (all linear solutions are relatively small)

[Chart: dense factorization times (h:mm:ss, 0:00:00 to 0:25:55) for Serial and SMP=24; series: NXN, LAPACK, GPU]

Page 22

RDMODES EIGENSOLUTION: CLASSIFICATION

• Sparse eigensolutions:
  ‒ Large number
  ‒ Sparse, relatively large dimension
  ‒ Inexpensive with NXN sparse eigensolvers
• Dense eigensolutions:
  ‒ Large number
  ‒ Dense, small-to-medium dimension
  ‒ Candidate for the GPU
• Reduced eigensolution:
  ‒ Only one instance
  ‒ Dense, fairly large, many modes
  ‒ Strong candidate for the GPU

Page 23

RDMODES EIGENSOLUTION: DENSE SOLUTION METHODS

• Householder-type solution for the real symmetric problem (dsyev):
  ‒ Reduce to tridiagonal form: $Q^T A Q = T$
  ‒ Eigenvalues of the tridiagonal matrix: $Z^T T Z = \Lambda$
  ‒ Compute eigenvectors: $\Phi = QZ$
  ‒ Then $A\Phi = \Phi\Lambda$
• Efficient choice for dense problems and/or when many eigenvectors are needed:
  ‒ High memory consumption
• Transform the generalized eigenvalue problem as follows:
  ‒ Factor: $M = LL^T$
  ‒ Solve: $L^{-1} K L^{-T} X = X\Lambda$
  ‒ Generalized eigensolution: $K(L^{-T}X) = M(L^{-T}X)\Lambda$
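The steps above can be traced with SciPy on small random matrices; this is a sketch, not the RDMODES code. `scipy.linalg.hessenberg` performs a Householder reduction (LAPACK `?gehrd`), which for a symmetric matrix yields the tridiagonal $T$ up to rounding but does not exploit symmetry the way `dsytrd` inside `dsyev` does; `scipy.linalg.eigh_tridiagonal` then solves the tridiagonal problem. The Cholesky transformation is the same reduction LAPACK `dsygv` applies before calling the standard solver. `householder_eigh` and `generalized_eigh` are hypothetical names.

```python
import numpy as np
from scipy.linalg import cholesky, eigh_tridiagonal, hessenberg, solve_triangular

def householder_eigh(A):
    """Eigenpairs of a real symmetric A via Householder tridiagonalization."""
    T, Q = hessenberg(A, calc_q=True)           # Q^T A Q = T (tridiagonal here)
    d = np.diag(T).copy()                       # diagonal of T
    e = np.diag(T, -1).copy()                   # subdiagonal of T
    lam, Z = eigh_tridiagonal(d, e)             # Z^T T Z = Lambda (ascending)
    return lam, Q @ Z                           # Phi = Q Z, so A Phi = Phi Lambda

def generalized_eigh(K, M):
    """Solve K Phi = M Phi Lambda for symmetric K and SPD M."""
    L = cholesky(M, lower=True)                 # M = L L^T
    Y = solve_triangular(L, K, lower=True)      # Y = L^{-1} K
    C = solve_triangular(L, Y.T, lower=True)    # C = L^{-1} K L^{-T} (K symmetric)
    C = (C + C.T) / 2                           # remove rounding asymmetry
    lam, X = householder_eigh(C)                # C X = X Lambda
    Phi = solve_triangular(L.T, X, lower=False) # Phi = L^{-T} X
    return lam, Phi
```

The triangular solves avoid forming $L^{-1}$ explicitly, and the recovered $\Phi$ is $M$-orthonormal ($\Phi^T M \Phi = X^T X = I$).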

Page 24

RDMODES EIGENSOLUTION: DENSE EIGENSOLUTION SCALABILITY

• Dimensions range from 2800 to 8800:
  ‒ Dense problems, number of modes variable
• GPU beneficial for larger sizes
• Total times (serial), 50% reduction with the GPU relative to all-LAPACK:
  ‒ 56:29 (all Lanczos)
  ‒ 15:30 (all LAPACK)
  ‒ 7:29 (using GPU)
• Total times (SMP), 36% reduction with the GPU relative to all-LAPACK:
  ‒ 52:22 (all Lanczos)
  ‒ 4:41 (all LAPACK)
  ‒ 3:00 (using GPU)

[Chart: dense eigensolution time (log scale, 0:00:01 to 2:24:00) versus problem dimension (2000 to 8000), panels Serial and SMP=24; series: Lanczos, LAPACK, GPU]

Page 25

RDMODES EIGENSOLUTION: GPU SUPPORT

• Householder methods well suited (as expected)
• Larger-dimension dense problems benefit from the GPU:
  ‒ And are the most time consuming
• Send the most expensive problems to the GPU
• Threshold set to 3800 for this test:
  ‒ Note: the optimal threshold depends on the hardware and the SMP setting
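Because a dense Householder eigensolution costs $O(n^3)$, a dimension threshold can send a minority of the problems, but most of the work, to the GPU. The sketch uses a hypothetical batch of dimensions inside the 2800-8800 range (the actual distribution in the benchmark is not given), the 3800 threshold, and the assumption that problems exactly at the threshold go to the GPU.

```python
def split_by_threshold(dims, threshold=3800):
    """Route dense eigenproblems by dimension (n >= threshold -> GPU, assumed)."""
    gpu = [n for n in dims if n >= threshold]
    cpu = [n for n in dims if n < threshold]
    return gpu, cpu

def cubic_work(dims):
    """Relative eigensolution cost, assuming work proportional to n**3."""
    return sum(n**3 for n in dims)

# Hypothetical batch: many problems near the low end, a few large ones.
dims = [2800] * 6 + [3500] * 4 + [5000, 7000, 8800]
gpu, cpu = split_by_threshold(dims)
share = cubic_work(gpu) / cubic_work(dims)
print(f"{len(gpu)}/{len(dims)} problems on the GPU carry {100 * share:.0f}% of the work")
# prints: 3/13 problems on the GPU carry 79% of the work
```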

Page 26

RDMODES EIGENSOLUTION: MOST SIGNIFICANT COST COMPONENTS

• Reduced eigensolution:
  ‒ Not ideally suited to the NXN Lanczos eigensolver
  ‒ Unique, but large (14K DOFs)
  ‒ Many eigenvectors needed
  ‒ GPU gives a 30% speedup (both SMP and serial)
• Conclusions for the GPU in RDMODES:
  ‒ Dense and reduced eigensolutions benefit
  ‒ Threshold for dense eigensolution
  ‒ Dense factorization benefits from LAPACK: little additional benefit on the GPU
• Sparse methods not supported yet

[Chart: reduced eigensolution time (h:mm:ss, 0:00:00 to 0:57:36) for Serial and SMP=24; series: NXN, LAPACK, GPU]

Page 27

RDMODES AND FREQUENCY RESPONSE: BENCHMARK PERFORMANCE RESULTS

• SMP=24, customer benchmark
• Compared to the NXN system:
  ‒ Frequency response 3x faster
  ‒ Reduced eigensolution 2.8x faster
  ‒ Factorization 28% faster
  ‒ Dense eigensolution 9x faster
  ‒ 30% reduction in total run time
• Compared to LAPACK:
  ‒ Frequency response 3x faster
  ‒ Reduced eigensolution 2x faster
  ‒ 10% reduction in total run time

[Chart: total run time (h:mm:ss, 0:00:00 to 8:24:00) for the NXN, LAPACK and GPU configurations, broken down into frequency response, reduced eigensolution, dense eigensolution, factorization and other]

Page 28

RDMODES EIGENSOLUTION: SINGLE PRECISION

• Performance advantages with single-precision eigensolution:
  ‒ As with the linear solution in frequency response, single precision is faster on the GPU
  ‒ Lower GPU memory consumption (larger problems)
• Dense eigensolutions (customer benchmark), 35-40% speedup:

                Double precision   Single precision
    Serial      7:01               4:16
    SMP=24      3:41               2:23

• Reduced eigensolution also benefits, 20% speedup:
  ‒ 3:05 to 2:29
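The accuracy side of single-precision eigensolution can be checked on a hypothetical symmetric matrix standing in for a dense RDMODES subproblem. The sketch assumes SciPy's `eigh` picks the LAPACK routine from the input dtype, so `float32` input is handled by single-precision routines. Errors are scaled by the spectral radius because per-eigenvalue relative errors are meaningless for eigenvalues near zero.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(5)
n = 400
# Hypothetical symmetric test matrix.
R = rng.standard_normal((n, n))
A = (R + R.T) / 2

w64 = eigh(A, eigvals_only=True)                     # double-precision LAPACK
w32 = eigh(A.astype(np.float32), eigvals_only=True)  # single-precision LAPACK

# Largest eigenvalue error, relative to the largest eigenvalue magnitude.
err = np.max(np.abs(w32 - w64)) / np.max(np.abs(w64))
print(f"max eigenvalue error relative to spectral radius: {err:.1e}")
print(f"storage: {A.nbytes // 2**10} KiB double, "
      f"{A.astype(np.float32).nbytes // 2**10} KiB single")
```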

Page 29

CONCLUSIONS

Page 30

CONCLUSIONS

• Significant benefit with the GPU for certain computation types:
  ‒ Frequency response calculation 2x-3x faster, dense eigensolution 2x faster
  ‒ Additional 35-50% improvement possible with single precision
  ‒ 30% lower turnaround time for a typical customer benchmark
• Efficient dense matrix algebra on the GPU with clMath and clMAGMA
• Many thanks to: Ben-Shan Liao, Wei Zhang (Siemens PLM), Antoine Reymond (AMD)

Thank you!

Page 31

DISCLAIMER & ATTRIBUTION

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION

© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their respective owners.