Large-Scale Matrix-Free Topology Optimization on the...
Large-Scale Matrix-Free Topology Optimization on the GPU
Krishnan Suresh
Associate Professor
Mechanical Engineering
What is Topology Optimization?
Topology Optimization (2D)
Reduce weight, but keep it stiff
Structure problem and optimal topologies:
10% reduction in weight, 2% loss in stiffness
50% reduction in weight, 12% loss in stiffness
Topology optimization is the systematic generation of such structures
TopOpt (SIMP)
$0 < \rho_e \le 1$: element 'density'
SIMP: Most popular TopOpt method
Topology Optimization (3D)
40% reduction in weight, 10% loss in stiffness
50% reduction in weight, 30% loss in stiffness
Topology Optimization
Computational Challenge?
Topology Optimization
Design Space → Finite Element Analysis (FEA) → Optimal? If no, change topology and repeat the FEA.
FEA: solve Ku = f with 10^5 to 10^7 dof (K: sparse SPD)
100's of iterations!
- Multi-load
- Nonlinear FEA
- Constraints
- …
Topology Optimization Cost
| Size | Mesh | DOF | Time [Wang] |
|---|---|---|---|
| Medium | (84,28,14) | 107,184 | 2.4 hours |
| Large | (180,60,30) | 1,010,160 | 45.7 hours |

Platform: AMD Opteron 252, 2.6 GHz, 64-bit processor, 8 GB RAM
Solve Ku=f
Method
• Direct
• Iterative
Precision
• Single
• Double
Iterative Solve Ku = f on GPU
Iterative Solve Ku = f
Two Goals:
1. Minimize #iterations
2. Minimize cost of K*u (SpMV)
$u_{i+1} = u_i + B^{-1}(f - K u_i)$, where $B$ is a preconditioner of $K$
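The update above is a preconditioned (Richardson) iteration: each step costs one K*u product plus one application of $B^{-1}$. The slide does not say which preconditioner is used, so the minimal pure-Python sketch below assumes the simplest choice, Jacobi ($B = \mathrm{diag}(K)$), on a small dense, diagonally dominant $K$ chosen for illustration.

```python
def residual(K, u, f):
    """r = f - K u for a dense matrix stored as a list of rows."""
    return [fi - sum(kij * uj for kij, uj in zip(row, u)) for row, fi in zip(K, f)]


def preconditioned_richardson(K, f, tol=1e-10, max_iter=1000):
    """Iterate u_{i+1} = u_i + B^{-1}(f - K u_i), assuming Jacobi B = diag(K)."""
    b_inv = [1.0 / K[i][i] for i in range(len(f))]  # B^{-1} for a diagonal B
    u = [0.0] * len(f)
    for it in range(max_iter):
        r = residual(K, u, f)  # one K*u product per iteration
        if max(abs(ri) for ri in r) < tol:
            return u, it  # converged after `it` updates
        u = [ui + bi * ri for ui, bi, ri in zip(u, b_inv, r)]
    return u, max_iter


K = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
f = [3.0, 2.0, 3.0]  # exact solution is u = [1, 1, 1]
u, iters = preconditioned_richardson(K, f)
print(u, iters)
```

The two goals on the slide map directly onto this loop: a better $B$ lowers `iters`, and a cheaper K*u lowers the cost of each call to `residual`.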
SpMV on GPU
Double-precision, real-world SpMV
100,000 to 1 million degrees of freedom
GPU (GTX 280): 2 to 6× increase in flop rate
Iterative Ku = f (GTC 2012)
Fine-grained Parallel Preconditioners …(Wed)
CULA (Wed)
MAGMA (Wed)
Accelerating Iterative Linear Solvers (Wed)
Efficient AMG on Hybrid GPU Clusters (Wed)
Preconditioning for Large-Scale Linear Solvers (Thu)
…
Iterative Solve Ku=f in TopOpt:
Unique Challenges
TopOpt (SIMP)
$0 < \rho_e \le 1$: element 'density'
- Ill-conditioned K
- Poor convergence
$K_e(\rho) = \rho_e^n K_e^0$
Large #iterations… bottleneck
$u_{i+1} = u_i + B^{-1}(f - K u_i)$
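Why SIMP makes K ill-conditioned can be seen on a toy model: two 1D springs in series, left end fixed, each scaled by the SIMP rule $k_e = \rho_e^n k_0$. The sketch below assumes $n = 3$ and a "void" density of 0.001 (common illustrative values, not taken from the slides) and computes the condition number of the 2×2 stiffness matrix in closed form.

```python
import math


def simp_stiffness(rho, k0=1.0, n=3):
    """SIMP interpolation for one element: k_e = rho_e**n * k0."""
    return rho ** n * k0


def two_spring_condition(rho1, rho2, n=3):
    """Condition number of K for two SIMP springs in series, left end fixed.

    K = [[k1 + k2, -k2],
         [-k2,      k2]]
    """
    k1, k2 = simp_stiffness(rho1, n=n), simp_stiffness(rho2, n=n)
    trace = k1 + 2.0 * k2
    det = k1 * k2
    lam_max = 0.5 * (trace + math.sqrt(trace * trace - 4.0 * det))
    lam_min = det / lam_max  # eigenvalue product equals det; avoids cancellation
    return lam_max / lam_min


print(two_spring_condition(1.0, 1.0))    # both solid: about 6.85
print(two_spring_condition(1.0, 0.001))  # second nearly void: about 1e9
```

A single low-density element already pushes the condition number up by roughly eight orders of magnitude, which is what drives the large iteration counts of CG-type solvers.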
PareTO: A GPU-Friendly TopOpt Method
PareTO
Don’t Assign Densities!!
Compute T-Field!!
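$T = \frac{3(1-\nu)}{2(7-5\nu)}\left[10\,\sigma:\varepsilon - \frac{1-5\nu}{1-2\nu}\,\operatorname{tr}(\sigma)\,\operatorname{tr}(\varepsilon)\right]$
(Figure: T-field, with T = 0.00004 at point A and T = 1.36 at point B)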
T(x,y) = change in stiffness when a hole is added at (x,y)
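To make the formula concrete, the sketch below evaluates T at a single point from a stress tensor $\sigma$ and strain tensor $\varepsilon$, as an FEA post-processing step would. The test state (uniaxial stress with $E = 1$, $\nu = 0.3$) is an illustrative assumption, chosen because T then reduces by hand to $\frac{3(1-\nu)(9+5\nu)}{2(7-5\nu)}$.

```python
def trace(a):
    """tr(a) for a 3x3 tensor stored as nested lists."""
    return a[0][0] + a[1][1] + a[2][2]


def double_dot(a, b):
    """a : b = sum over i, j of a_ij * b_ij."""
    return sum(a[i][j] * b[i][j] for i in range(3) for j in range(3))


def topological_sensitivity(sigma, eps, nu):
    """T = 3(1-nu)/(2(7-5nu)) [10 sigma:eps - (1-5nu)/(1-2nu) tr(sigma) tr(eps)]."""
    factor = 3.0 * (1.0 - nu) / (2.0 * (7.0 - 5.0 * nu))
    return factor * (10.0 * double_dot(sigma, eps)
                     - (1.0 - 5.0 * nu) / (1.0 - 2.0 * nu) * trace(sigma) * trace(eps))


nu = 0.3
sigma = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # uniaxial stress, E = 1
eps = [[1.0, 0.0, 0.0], [0.0, -nu, 0.0], [0.0, 0.0, -nu]]   # corresponding strain
print(topological_sensitivity(sigma, eps, nu))  # about 2.0045
```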
T-Field
T-Field: Level-Set
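One way to read the level-set idea: the design at a given volume fraction is the region where T is at or above a level τ, with τ picked so the retained volume matches the target. The sketch below assumes one T value per element and equal element volumes (both simplifications); the T values are hypothetical.

```python
def level_set_threshold(T, volume_fraction):
    """Level tau such that keeping elements with T >= tau retains about
    `volume_fraction` of the elements (ties at tau may keep a few more)."""
    n_keep = max(1, round(volume_fraction * len(T)))
    return sorted(T, reverse=True)[n_keep - 1]


def extract_topology(T, tau):
    """An element is present (True) where the T-field is at or above tau."""
    return [t >= tau for t in T]


T = [0.2, 1.4, 0.7, 0.0, 0.9, 0.5, 1.1, 0.3]  # hypothetical element-wise T values
tau = level_set_threshold(T, 0.5)
print(tau, extract_topology(T, tau))  # 0.7, four elements kept
```

Elements with the smallest T contribute least to stiffness, so they are the first to be removed.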
T-Field: Optimization
…
PareTO Algorithm
Design Space → FEA → TopSens → Pareto-optimal?
- No: recompute τ
- Yes: decrement volume
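The flowchart can be sketched as a nested loop: an outer loop that decrements the volume fraction, and an inner fixed-point loop that reruns FEA and TopSens and recomputes τ until the design stops changing. This is a simplified reading, not the author's implementation: `fea_and_topsens` and `toy_T` are hypothetical stand-ins, and refinements a real implementation may use (such as smoothing the sensitivities) are omitted.

```python
def level_set_threshold(T, volume_fraction):
    """Level tau that keeps about `volume_fraction` of the elements."""
    n_keep = max(1, round(volume_fraction * len(T)))
    return sorted(T, reverse=True)[n_keep - 1]


def pareto_sweep(fea_and_topsens, n_elems, vol_step=0.1, n_steps=5, max_inner=20):
    """Trace designs at decreasing volume fractions.

    fea_and_topsens(present) is a hypothetical callback that runs FEA on the
    current design and returns one T value per element.
    """
    present = [True] * n_elems
    designs = []
    for k in range(1, n_steps + 1):
        target = 1.0 - k * vol_step                # decrement volume
        for _ in range(max_inner):
            T = fea_and_topsens(present)           # FEA + topological sensitivity
            tau = level_set_threshold(T, target)   # recompute tau
            updated = [t >= tau for t in T]
            if updated == present:                 # fixed point: accept this design
                break
            present = updated
        designs.append((round(target, 10), present[:]))
    return designs


def toy_T(present):
    """Stand-in for FEA: element i always has T = i (ignores the design)."""
    return [float(i) for i in range(len(present))]


for vol, design in pareto_sweep(toy_T, n_elems=10):
    print(vol, sum(design))  # 0.9 9, 0.8 8, ..., 0.5 5
```

Each accepted design is a point on the stiffness-versus-volume curve, which is why the method produces a family of Pareto-optimal designs rather than a single one.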
Pareto Optimal Designs
(Figure: PareTO vs. SIMP)
PareTO: well-conditioned K ⇒ few CG iterations
Now focus on GPU-accelerated K*u
Fast K*u on GPU
Approach-1: Classic
One GPU thread per node (~100 non-zeros per row)
1. Assemble K (CPU/GPU)
2. Push to GPU
3. Execute K*u in parallel
Coalesced memory access is challenging!
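The classic approach stores the assembled K in a sparse format and assigns rows to GPU threads. The sketch below assumes CSR storage (the slide does not name a format) and writes out, in plain Python, the work one thread would do for one row; on a GPU all rows would run concurrently.

```python
def dense_to_csr(A):
    """Convert a dense matrix (list of rows) to CSR arrays: values, col_idx, row_ptr."""
    values, col_idx, row_ptr = [], [], [0]
    for row in A:
        for j, a in enumerate(row):
            if a != 0.0:
                values.append(a)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_row_dot(values, col_idx, row_ptr, u, i):
    """Work of one thread in a 'thread per row' kernel: y_i = sum_j K_ij u_j."""
    return sum(values[k] * u[col_idx[k]] for k in range(row_ptr[i], row_ptr[i + 1]))


def csr_spmv(values, col_idx, row_ptr, u):
    """y = K u; here rows are processed one after another."""
    return [csr_row_dot(values, col_idx, row_ptr, u, i) for i in range(len(row_ptr) - 1)]


K = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
csr = dense_to_csr(K)
print(csr_spmv(*csr, [1.0, 2.0, 3.0]))  # [2.0, 4.0, 10.0]
```

Neighboring threads start reading `values` at different offsets (`row_ptr[i]`, `row_ptr[i + 1]`, ...), and `u[col_idx[k]]` is an indirect gather, so adjacent threads rarely touch adjacent memory; that is the coalescing difficulty the slide points to.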
Approach-2: Matrix-Free
$K = \sum_e K_e \;\Rightarrow\; Ku = \sum_e K_e u = \sum_e K_e u_e$
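The identity above means K never has to be assembled: gather each element's displacements $u_e$, multiply by $K_e$, and scatter-add the result. A minimal sketch on a hypothetical 1D bar of three 2-node elements with different stiffnesses, checked against the assembled product:

```python
def element_matrix(k):
    """2x2 stiffness of a 1D spring/bar element with stiffness k."""
    return [[k, -k], [-k, k]]


def matrix_free_Ku(elem_stiffness, conn, u):
    """Ku = sum_e K_e u_e: gather u_e, multiply by K_e, scatter-add; K is never assembled."""
    y = [0.0] * len(u)
    for k, nodes in zip(elem_stiffness, conn):
        ke = element_matrix(k)
        ue = [u[n] for n in nodes]  # gather
        for a, na in enumerate(nodes):
            y[na] += sum(ke[a][b] * ue[b] for b in range(len(nodes)))  # scatter-add
    return y


def assembled_K(elem_stiffness, conn, n_nodes):
    """Classic assembly, used here only to check the matrix-free product."""
    K = [[0.0] * n_nodes for _ in range(n_nodes)]
    for k, nodes in zip(elem_stiffness, conn):
        ke = element_matrix(k)
        for a, na in enumerate(nodes):
            for b, nb in enumerate(nodes):
                K[na][nb] += ke[a][b]
    return K


stiff = [1.0, 2.0, 3.0]
conn = [(0, 1), (1, 2), (2, 3)]
u = [0.0, 1.0, 3.0, 6.0]
K = assembled_K(stiff, conn, 4)
print(matrix_free_Ku(stiff, conn, u))                            # [-1.0, -3.0, -5.0, 9.0]
print([sum(kij * uj for kij, uj in zip(row, u)) for row in K])   # same result
```

On a GPU the scatter-add step is where care is needed, since elements that share a node write to the same entry of y; atomic updates, element coloring, or a node-centric gather are common ways to handle it.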
Exploit Element Congruency
(Figures: meshes with 20, 120, and 1 distinct element(s))
1.48 MB storage (vs. 384 MB)
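The storage saving from congruency is simple arithmetic: an 8-node hexahedron with 3 dof per node has a 24×24 $K_e$, so storing one per element grows with the mesh, while storing one per distinct (template) element does not. The sketch below uses hypothetical element and template counts; it does not try to reproduce the slide's 1.48 MB and 384 MB figures, whose exact accounting is not given. It assumes double-precision entries and a 4-byte template index per element when there is more than one template.

```python
def ke_storage_bytes(n_elems, n_templates, ndof_e=24, bytes_per_value=8, bytes_per_index=4):
    """Bytes for per-element K_e storage versus congruent-template storage.

    ndof_e = 24 corresponds to an 8-node hexahedron with 3 dof per node.
    """
    ke_bytes = ndof_e * ndof_e * bytes_per_value
    per_element = n_elems * ke_bytes
    index_bytes = n_elems * bytes_per_index if n_templates > 1 else 0  # which template per element
    templates = n_templates * ke_bytes + index_bytes
    return per_element, templates


full, shared = ke_storage_bytes(n_elems=100_000, n_templates=1)
print(full, shared)  # 460800000 4608
```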
Algorithm
1. Hex-mesh
2. Detect congruency
3. Compute Ke of templates
4. Push data to GPU
5. Assembly-free PCG solve of Ku = f on GPU
TopOpt: each element is present or absent
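The pieces of the algorithm fit together in a short assembly-free PCG loop. The sketch below is a 1D stand-in for the hex mesh: every element is congruent, so only one template `ke` is stored; absent elements are simply skipped (no small "void" stiffness, which is what keeps K well-conditioned); and dofs touched by no present element are excluded from the solve. The Jacobi preconditioner, computed element by element, is an assumption, since the slides say PCG without naming the preconditioner.

```python
def pcg_assembly_free(n_elems, present, ke, f, fixed=(0,), tol=1e-10, max_iter=200):
    """Jacobi-preconditioned CG for a 1D chain of congruent 2-node elements.

    Element e joins nodes e and e+1 and is skipped when present[e] is False.
    Only the template ke is stored; K is never assembled.
    """
    n = n_elems + 1
    diag = [0.0] * n
    for e in range(n_elems):
        if present[e]:
            diag[e] += ke[0][0]
            diag[e + 1] += ke[1][1]
    # Solve only for dofs that are not fixed and are touched by a present element.
    free = [i not in fixed and diag[i] > 0.0 for i in range(n)]

    def apply_K(p):
        y = [0.0] * n
        for e in range(n_elems):
            if present[e]:
                a, b = p[e], p[e + 1]
                y[e] += ke[0][0] * a + ke[0][1] * b
                y[e + 1] += ke[1][0] * a + ke[1][1] * b
        return [yi if is_free else 0.0 for yi, is_free in zip(y, free)]

    def precondition(r):
        return [ri / d if is_free else 0.0 for ri, d, is_free in zip(r, diag, free)]

    def dot(x, y):
        return sum(xi * yi for xi, yi in zip(x, y))

    u = [0.0] * n
    r = [fi if is_free else 0.0 for fi, is_free in zip(f, free)]
    z = precondition(r)
    p = z[:]
    rz = dot(r, z)
    for it in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            return u, it
        Ap = apply_K(p)
        alpha = rz / dot(p, Ap)
        u = [ui + alpha * pi for ui, pi in zip(u, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = precondition(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return u, max_iter


ke = [[1.0, -1.0], [-1.0, 1.0]]  # single congruent template, unit stiffness
u, iters = pcg_assembly_free(4, [True, True, True, True], ke, [0.0, 0.0, 0.0, 0.0, 1.0])
print(u, iters)  # approximately [0, 1, 2, 3, 4]
u2, _ = pcg_assembly_free(4, [True, True, True, False], ke, [0.0, 0.0, 0.0, 1.0, 0.0])
print(u2)        # approximately [0, 1, 2, 3, 0]; node 4 is detached
```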
K*u on GPU
PareTO: Examples
Platform
CPU:
– i7, 3.2 GHz, 8 core
– 6 GB
– C Code (OpenMP)
GPU:
– GTX 480 (480 CUDA cores)
– 1.5 GB
Typical FEA Computing Time
| Problem | DOF | PareTO CPU | PareTO GPU |
|---|---|---|---|
| Cantilever Beam; Edge | 110K | 8.3 secs | 1.9 secs |
| Stool | 2.7M | 186 secs | 32 secs |
| Point Load Cantilever | 15M | 51 mins | 5.73 mins |
| | 92M | 12 hr | - |

92M DOF: 1.5 hr on 44 cores (ONR)
TopOpt Comparison
| Size | Mesh | DOF | Time [Wang]* |
|---|---|---|---|
| Medium | (84,28,14) | 107,184 | 2.4 hours |
| Large | (180,60,30) | 1,010,160 | 45.7 hours |

*AMD Opteron 252, 2.6 GHz, 64-bit processor, 8 GB RAM. PareTO: GTX 480.
Identical results
Bridge Problem
Knuckle Problem
20,000 dof (Abaqus)
Wheel Support
TopOpt Computing Time
| Name of part (volume fraction) | DOF | Pub. data | PareTO CPU | PareTO GPU |
|---|---|---|---|---|
| Cantilever Beam; Edge (0.50) | 110K | 2.4 hr | 200 secs | 45 secs |
| Knuckle (0.55) | 20K | -- | 111 secs | 44 secs |
| Bridge (0.35) | 113K | -- | 2 mins | 36.2 secs |
| Stool; N=96 (0.20) | 2.7M | 21.8 hr | 1 hr, 24 mins | 14 mins |
| Point Load Cantilever (0.50) | 783K | 3.9 hr | 16 mins | 125 secs |
| | 15M | - | 19 hr, 28 mins | 2 hr, 12 mins |
| | 92M | - | 12 days, 2 hr | - |
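The speedups implied by the table can be checked with a little arithmetic. The sketch below converts the table's times to seconds and computes GPU-versus-CPU and GPU-versus-published ratios, using only numbers from the table (the 92M row has no GPU time and is left out). The published times were obtained on other hardware and, presumably, with other methods, so that second ratio mixes hardware and algorithm effects.

```python
# Times from the table, in seconds:
# (part, DOF, published time or None, PareTO CPU time, PareTO GPU time)
ROWS = [
    ("Cantilever Beam; Edge (0.50)", "110K", 2.4 * 3600, 200, 45),
    ("Knuckle (0.55)", "20K", None, 111, 44),
    ("Bridge (0.35)", "113K", None, 2 * 60, 36.2),
    ("Stool; N=96 (0.20)", "2.7M", 21.8 * 3600, 1 * 3600 + 24 * 60, 14 * 60),
    ("Point Load Cantilever (0.50)", "783K", 3.9 * 3600, 16 * 60, 125),
    ("unlabeled row", "15M", None, 19 * 3600 + 28 * 60, 2 * 3600 + 12 * 60),
]


def speedups(rows):
    """Return (part, DOF, CPU/GPU ratio, published/GPU ratio or None) per row."""
    return [(part, dof, cpu / gpu, None if pub is None else pub / gpu)
            for part, dof, pub, cpu, gpu in rows]


for part, dof, cpu_ratio, pub_ratio in speedups(ROWS):
    line = f"{part}, {dof}: GPU {cpu_ratio:.1f}x faster than PareTO CPU"
    if pub_ratio is not None:
        line += f", {pub_ratio:.0f}x faster than published"
    print(line)
```

For example, the 2.7M-DOF stool runs 6.0× faster on the GPU than on the CPU (5040 s vs. 840 s), and the CPU-to-GPU ratio grows with problem size, reaching about 8.8× at 15M DOF.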