GPU-Accelerated Design Optimization on the Cloud

Krishnan Suresh

Associate Professor

Mechanical Engineering

Design Optimization

Reduce weight

subject to constraints

(GE/GrabCAD)

A structure subject to loading

Design Optimization

Domains

(OptiStruct)

(Generico)

Big Players, Big $

• ANSYS

• Abaqus

• Altair/OptiStruct

• Nastran

• SolidWorks

$10 billion investment annually (source: technavio.com)

Design Optimization on the Cloud

Browser driven design optimization

Design Optimization on the Cloud

Client

• No software/hardware investment

• Pay as you go

• Anywhere, anytime

Service Provider

• Easier maintenance

• Larger market

3D-Printing

Democratization of

fabrication

Design Optimization to 3D Printing

Democratization of design

Catch?

Design Optimization

Design Space

Finite Element Analysis (FEA)

Optimal?

Change Design

10^5 ~ 10^7 DOF

Solve Kd = f

K: Sparse SPD

100s of iterations!

Design Optimization Cost

A naïve port to the cloud will not work!

• OptiStruct (commercial code)

• Xeon E5 2697, 92 GB

• 20 hours!

Cloud Based Design Optimization

Fast Limited Memory FEA

Pareto Optimization

Acceleration

Fast Limited Memory FEA

FEA Bottleneck: Kd = f

Design Space

Finite Element Analysis (FEA)

Optimal?

Change Topology

10^5 ~ 10^7 DOF

Solve Kd = f

K: Sparse SPD

100s of iterations!
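Why Kd = f reduces to repeated products Kd can be seen in a minimal conjugate gradient sketch. It assumes only that K is SPD and reachable through a `matvec` callback. The tridiagonal test operator and all function names below are illustrative assumptions, not the solver used in the talk.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(matvec: Callable[[Vector], Vector], f: Vector,
                       tol: float = 1e-10,
                       max_iter: int = 1000) -> Tuple[Vector, int]:
    """Solve K d = f for SPD K, touching K only through d -> K d."""
    d = [0.0] * len(f)
    r = list(f)                    # residual f - K d for the start d = 0
    p = list(r)                    # first search direction
    rr = dot(r, r)
    stop = tol * rr ** 0.5         # relative stopping threshold
    for iteration in range(max_iter):
        if rr ** 0.5 <= stop:
            return d, iteration
        Kp = matvec(p)             # the one product K p per iteration
        alpha = rr / dot(p, Kp)
        d = [di + alpha * pi for di, pi in zip(d, p)]
        r = [ri - alpha * ki for ri, ki in zip(r, Kp)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return d, max_iter


def tridiagonal_matvec(d: Vector) -> Vector:
    """Hypothetical SPD test operator: tridiagonal (-1, 2, -1)."""
    n = len(d)
    out = []
    for i in range(n):
        value = 2.0 * d[i]
        if i > 0:
            value -= d[i - 1]
        if i < n - 1:
            value -= d[i + 1]
        out.append(value)
    return out


solution, iterations = conjugate_gradient(tridiagonal_matvec, [1.0] * 5)
```

Each iteration costs one product with K plus a few vector operations, so speeding up that product speeds up the whole solve.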

Kd = f (GTC)

• Fine-grained Parallel Preconditioners

• CULA

• MAGMA

• Accelerating Iterative Linear Solvers

• Efficient AMG on Hybrid GPU Clusters

• Preconditioning for Large-Scale Linear Solvers

• Exploit mesh congruency

• Exploit physics behavior

• K constantly changing

Design Optimization

Kd = f

Exploit Mesh Congruency (GTC 2014)

[Figure: stages Model, Discretize, Assemble/, Post-process]

Mesh-aware SpMV Acceleration: Congruence

Element Congruency

62350 elements

2780 distinct

95.5% congruent

Observation: Large meshes contain many similar elements!

Elements are ‘rigid-body/scaling’ congruent

⇒ Identical element stiffness Ke

Only store Ke of distinct elements
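The idea of storing Ke only for distinct elements can be sketched as a grouping pass over the mesh. This is a simplified illustration, not the talk's implementation. It detects translation congruence only, a subset of the rigid-body/scaling congruence on the slide. The 2D quad mesh, the rounding tolerance, and all function names are assumptions.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def congruence_key(nodes: List[Point], digits: int = 9) -> tuple:
    """Translation-invariant signature: coordinates relative to node 0."""
    x0, y0 = nodes[0]
    return tuple((round(x - x0, digits), round(y - y0, digits))
                 for x, y in nodes)


def group_congruent(elements: List[List[Point]]) -> Tuple[List[int], int]:
    """Return, per element, the id of its distinct representative."""
    library: Dict[tuple, int] = {}   # signature -> distinct-element id
    element_to_distinct = []
    for nodes in elements:
        key = congruence_key(nodes)
        element_to_distinct.append(library.setdefault(key, len(library)))
    return element_to_distinct, len(library)


def congruent_fraction(n_elements: int, n_distinct: int) -> float:
    """Share of elements that reuse an already stored Ke."""
    return 1.0 - n_distinct / n_elements


# Hypothetical mesh: a 4 x 3 grid of unit squares plus one stretched quad.
mesh = [[(i, j), (i + 1, j), (i + 1, j + 1), (i, j + 1)]
        for i in range(4) for j in range(3)]
mesh.append([(4, 0), (6, 0), (6, 1), (4, 1)])

element_to_distinct, n_distinct = group_congruent(mesh)
slide_percent = round(100 * congruent_fraction(62350, 2780), 1)
```

If "congruent" is read as 1 - distinct/total, the slide's numbers (62350 elements, 2780 distinct) give the quoted 95.5%.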

Implication: SpMV

Classic: $Kd \equiv \left(\sum_{e=1}^{N} K_e\right) d$

$Kd$: Sparse Matrix-Vector Multiplication (SpMV)

Critical operation in ALL iterative solvers

Assembly-free: $Kd \equiv \sum_{e=1}^{N} K_e d_e$

Only store Ke of distinct elements + Assembly Free
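A small sketch of the assembly-free product follows: each element gathers its own displacements, multiplies by a Ke looked up in a library of distinct element matrices, and scatters the result. The 1D bar mesh, the two stiffness values, and the function names are assumptions chosen for illustration. The dense assembly is there only to check the result.

```python
from typing import List

Matrix = List[List[float]]


def assembly_free_matvec(conn: List[List[int]], elem_kind: List[int],
                         ke_library: List[Matrix],
                         d: List[float]) -> List[float]:
    """y = K d accumulated element by element; K is never formed."""
    y = [0.0] * len(d)
    for nodes, kind in zip(conn, elem_kind):
        ke = ke_library[kind]              # shared Ke of this element type
        de = [d[n] for n in nodes]         # gather element displacements
        for a, na in enumerate(nodes):     # scatter Ke * de into y
            y[na] += sum(ke[a][b] * de[b] for b in range(len(nodes)))
    return y


def assemble_dense(conn: List[List[int]], elem_kind: List[int],
                   ke_library: List[Matrix], ndof: int) -> Matrix:
    """Classic route: sum all Ke into one global matrix (check only)."""
    K = [[0.0] * ndof for _ in range(ndof)]
    for nodes, kind in zip(conn, elem_kind):
        ke = ke_library[kind]
        for a, na in enumerate(nodes):
            for b, nb in enumerate(nodes):
                K[na][nb] += ke[a][b]
    return K


# Hypothetical 1D bar: 4 two-node elements, only 2 distinct stiffnesses.
conn = [[0, 1], [1, 2], [2, 3], [3, 4]]
elem_kind = [0, 0, 1, 0]
ke_library = [[[1.0, -1.0], [-1.0, 1.0]],
              [[3.0, -3.0], [-3.0, 3.0]]]
d = [0.0, 1.0, 3.0, 6.0, 10.0]

y_free = assembly_free_matvec(conn, elem_kind, ke_library, d)
K = assemble_dense(conn, elem_kind, ke_library, len(d))
y_assembled = [sum(K[i][j] * d[j] for j in range(len(d)))
               for i in range(len(d))]
```

The memory footprint here is the connectivity plus two 2 x 2 matrices, independent of how many elements share each type.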

Experiment

10^6 Elements

1 Distinct element

[Figure: SpMV time for Kd (msec): Assembled vs. AF-CPU vs. AF-GPU]

- Same number of FLOPS!

- Reduced memory

[Figure: time for one Kd (SpMV): mesh-aware Kd vs. naïve Kd]

Physics Aware Deflation

[Figure: stages Model, Discretize, Assemble/, Post-process]

Physics Aware Deflation

$\tilde{K} = W^T K W$

$\tilde{K} \ll K$

Agglomeration/Grouping

Treat each group as rigid body

Deflated CG
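The grouping step can be illustrated with a toy deflation space. This sketch assumes a 1D chain, where a rigid-body motion of a group is a single translation, so W has one column per group. In 3D each group would carry six rigid-body modes. The matrix K, the groups, and the helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def build_w(groups: List[List[int]], n: int) -> Matrix:
    """W[i][j] = 1 when fine DOF i belongs to group j, else 0."""
    W = [[0.0] * len(groups) for _ in range(n)]
    for j, members in enumerate(groups):
        for i in members:
            W[i][j] = 1.0
    return W


def transpose(A: Matrix) -> Matrix:
    return [list(row) for row in zip(*A)]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def deflated_matrix(K: Matrix, W: Matrix) -> Matrix:
    """Coarse operator: W^T K W."""
    return matmul(transpose(W), matmul(K, W))


# Hypothetical fine problem: 6 DOF, tridiagonal (-1, 2, -1).
n = 6
K = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
      for j in range(n)] for i in range(n)]
groups = [[0, 1], [2, 3], [4, 5]]    # agglomerate neighbouring DOF

W = build_w(groups, n)
K_tilde = deflated_matrix(K, W)
```

Here a 6 x 6 system collapses to a 3 x 3 coarse system, small enough to solve directly inside each deflated CG iteration.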

Example

3.15 million DOF

Fast Limited Memory FEA

Pareto Optimization

1. Mesh Congruency

2. AF Deflation

Design Optimization

K Matrix: Constantly changing

• Update K?

• Update deflation?

Assembly-free: $Kd \equiv \sum_{e=1}^{N} K_e d_e$

Skip deleted finite elements

SpMV accelerates further

$\tilde{K} = W^T K W$

$\tilde{K} = \tilde{K}_0 - W^T (\Delta K_e) W$

$\tilde{K} \ll K$
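The update of the deflated matrix can be checked in one line. The slide does not define $\Delta K_e$, so this sketch assumes it is the stiffness contribution of the elements removed since the initial design, with $K_0$ the initial stiffness matrix and $\tilde{K}_0 = W^T K_0 W$ its deflated counterpart.

```latex
\begin{align*}
K &= K_0 - \Delta K_e, \\
\tilde{K} &= W^T K W
           = W^T \left(K_0 - \Delta K_e\right) W
           = \tilde{K}_0 - W^T \left(\Delta K_e\right) W .
\end{align*}
```

Under this reading, only the removed elements need to be projected through $W$ at each design change; $\tilde{K}_0$ is computed once.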

Example: Design Optimization

• OptiStruct (commercial)

  • Xeon E5 2697, 92 GB

  • 20 hours!

• Pareto

  • I7 4770, 8 GB

  • 42 mins

Framework

Fast Limited Memory FEA

Acceleration

Pareto Optimization

Mesh Aware SpMV on GPU

Deflation on GPU

Restriction: $W^T d$; Prolongation: $W\mu$
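The two deflation operators, restriction $W^T d$ and prolongation $W\mu$, need not store W explicitly. A minimal sketch follows, assuming one translational mode per group so that W holds a single 1 per row. The `group_of` map and the function names are illustrative assumptions.

```python
from typing import List


def restrict(group_of: List[int], n_groups: int,
             d: List[float]) -> List[float]:
    """mu = W^T d: add up the fine values inside each group."""
    mu = [0.0] * n_groups
    for i, g in enumerate(group_of):
        mu[g] += d[i]
    return mu


def prolong(group_of: List[int], mu: List[float]) -> List[float]:
    """d = W mu: every fine DOF copies the value of its group."""
    return [mu[g] for g in group_of]


# Hypothetical grouping: 6 fine DOF in 3 groups of 2.
group_of = [0, 0, 1, 1, 2, 2]
d = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
mu = [10.0, 20.0, 30.0]

coarse = restrict(group_of, 3, d)
fine = prolong(group_of, mu)
```

Prolongation is a gather with one independent read per fine DOF, and restriction is a per-group reduction, which is plausibly why both map well onto GPU threads.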

Example: Design Optimization

• OptiStruct (commercial)

  • Xeon E5 2697, 92 GB

  • 20 hours!

• Pareto

  • I7 4770, 8 GB

  • 42 mins

• Pareto

  • GTX 480, 1.5 GB

  • 6 mins
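The ratios between the three reported run times can be worked out directly. The labels below are only shorthand for the slide entries. The runs differ in software as well as hardware, so the ratios are indicative rather than a controlled benchmark.

```python
# Reported wall-clock times from the slide, converted to minutes.
run_minutes = {
    "OptiStruct on Xeon E5 2697": 20 * 60,
    "Pareto on I7 4770": 42,
    "Pareto on GTX 480": 6,
}


def speedup(slow: str, fast: str) -> float:
    """How many times shorter the `fast` run is than the `slow` run."""
    return run_minutes[slow] / run_minutes[fast]


cpu_vs_commercial = speedup("OptiStruct on Xeon E5 2697", "Pareto on I7 4770")
gpu_vs_cpu = speedup("Pareto on I7 4770", "Pareto on GTX 480")
gpu_vs_commercial = speedup("OptiStruct on Xeon E5 2697", "Pareto on GTX 480")
```

That is roughly 29x from the limited-memory formulation on a CPU, a further 7x from the GPU, and 200x end to end.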

Fast Limited Memory FEA

Pareto Optimization

Acceleration

WebGL & Three.js

WebGL

• JavaScript API for 3D graphics in browsers

• www.khronos.org

• Almost all browsers

Three.js

• Higher-level library

• www.threejs.org

• Almost all browsers

Finally …

A Pilot Service

www.cloudtopopt.com

• Entry-level server

  • E3-1270 V3

  • 8 GB

• Limited to 150,000 degrees of freedom

• 300+ users

www.cloudtopopt.com

• Port to HPC provider

• NSF funding

• Launch startup

Acknowledgements

Praveen Yadav, Shiguang Deng, Amir M. Mirzendehdel, Chaman Singh, Alireza Taheri, Bian Xiang

Anirudh Krishnakumar, Anirban Niyogi, Victor Cavalcanti, Cameron Gilanshah, Yibo Hu, Alex Buehler

Funding

• NSF

• Air Force

• Luvata

• Autodesk

• Sandia National Lab

ksuresh@wisc.edu www.cloudtopopt.com
