Why HPC for ANSYS CFD?

© 2014 ANSYS, Inc. November 25, 2014



HPC Defined

High Performance Computing (HPC) at ANSYS: an ongoing effort designed to remove computing limitations from engineers who use computer-aided engineering in all phases of design, analysis, and testing.

It is a hardware and software initiative!


Need for HPC

Impact product design Enable large models Allow parametric studies

Turbulence Combustion Particle Tracking

Assemblies CAD-to-mesh Capture fidelity

Multiple design ideas Optimize the design Ensure product integrity


HPC Revolution

• Recent advancements have revolutionized the computational speed available on the desktop:
  – Multi-core processors: every core is really an independent processor
  – Large amounts of RAM


Typical HPC Growth Path

Desktop User → Workstation and/or Server Users → Cluster Users


Summary

Using today's multicore computers is key for companies to remain competitive. The ANSYS HPC product suite scales to whatever computational level is required, from single-user or small-user-group options at entry level up to virtually unlimited parallel capacity or large-user-group options at enterprise level.

Design impact of HPC:

• Shorter time to solution

• Increased high-fidelity insight

• More design variants examined faster


Application Example

Benefit of HPC for CFD Applications - Shorter Time to Solution with GPUs

Objective: Meeting engineering-services schedules and budgets while delivering technical excellence is imperative for success.

ANSYS Solution:
• PSI evaluates and implements new technology in software (ANSYS 15.0) and hardware (NVIDIA GPU) as soon as possible.
• The GPU produces a 43% reduction in Fluent solution time on an Intel Xeon E5-2687 (8-core, 64 GB) workstation equipped with an NVIDIA K40 GPU.

Design Impact: Increased simulation throughput allows PSI to meet delivery-time requirements for engineering services.

Images courtesy of Parametric Solutions, Inc.
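A 43% reduction in solution time is not the same thing as a "43% speedup". If the new time is (1 - r) times the old time, the speedup is 1 / (1 - r). The short Python sketch below makes that conversion explicit; the function name is hypothetical and used only for illustration.

```python
def speedup_from_time_reduction(reduction: float) -> float:
    """Convert a fractional reduction in wall-clock time into a speedup factor.

    A reduction r means t_new = (1 - r) * t_old, so speedup = t_old / t_new = 1 / (1 - r).
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


if __name__ == "__main__":
    # PSI example from the slide: 43% shorter Fluent solution time with one K40 GPU.
    print(f"{speedup_from_time_reduction(0.43):.2f}x")
```

Read this way, the PSI result corresponds to roughly a 1.75x speedup of the Fluent run.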


Application Example

Benefit of HPC for CFD Applications - Increase High-Fidelity Insight

Objective: Full-stage simulations of turbochargers for diesel engines are needed to reliably understand and optimize their performance before physical prototyping.

ANSYS Solution:
• ANSYS CFX simulations deliver near-linear parallel scaling on a 160-core HPC system upgrade.
• ANSYS HPC performance makes it possible to consider 5 full-stage compressor or turbine designs in a few hours (compared with many days before the upgrade).

Design Impact: ANSYS HPC is enabling Cummins to use larger models with greater geometric detail and more realistic treatment of physical phenomena, and to generate results in less time.

Courtesy Cummins Turbo Technologies


Application Example

Benefit of HPC for CFD Applications - Increase High-Fidelity Insight

Objective: Overcome the technological challenges of flow assurance and subsea oil processing in the new pre-salt oil fields.

ANSYS Solution:
• Transient multiphase simulations with ANSYS Fluent are used to understand sand transport inside the kilometres-long production lines.
• ANSYS HPC performance, together with advanced multiphase models and dynamic meshing features, enables Petrobras to virtually reproduce critical scenarios and complex operations.

Design Impact: Very detailed CFD simulations are giving Petrobras important physical insights that are guiding the design of new upstream processing systems in the oil industry.


Application Example

Benefit of HPC for CFD Applications - Examine More Design Variants

Objective: Advance racing-boat design to sustain medal-winning performances at the Olympic Games.

ANSYS Solution:
• ANSYS CFX is used to optimize the fluid dynamics of different classes of racing kayaks.
• Using HPC, transient simulations of moving boats can be completed in just two or three days.

Design Impact: Using HPC, the FES engineers were able to efficiently consider up to 20 different virtual designs per boat class, and from those 20 designs they gained enough confidence to build a single prototype for testing.

Courtesy FES


ANSYS Fluent Scaling on Dual Processors - Faster with More Compute Cores

Intel Xeon E5-2690 v2 processors (3 GHz, 20 cores total) with 128 GB of RAM.

[Figure, two panels, higher is better. Left: solver rating (geometric mean; axis 0 to 1,400) vs. number of processes (1p to 20p). Right: speedup (axis 0 to 16) vs. number of processes (1p to 20p).]


ANSYS CFX Scaling at Multiple Nodes - Faster with More Compute Nodes

Each node has 2 x 10-core Intel Xeon E5-2690 v2 processors (3.0 GHz, 1866 MHz) with 128 GB of RAM; nodes are connected by InfiniBand FDR.

[Figure: speedup (axis 0 to 8) vs. number of nodes (1, 2, 4, and 8 nodes).]


ANSYS Fluent Scaling at Multiple Nodes - Faster with More Compute Cores, for Complex Physics

• Hexa mesh (830,000 cells)
• Standard k-epsilon turbulence model
• VOF multiphase model (3 phases): molten steel, foamy slag, oxygen

[Figure: measured vs. ideal speedup (axis 0 to 7) vs. number of cores (12 to 72).]

Cores   Overall time (h)   Measured speedup   Ideal speedup
12      0.56               1.00               1
24      0.29               1.94               2
36      0.21               2.60               3
48      0.17               3.33               4
72      0.12               4.76               6

Courtesy of MORE S.r.l.
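Parallel efficiency is the measured speedup divided by the ideal speedup. A minimal Python sketch applying that definition to the MORE S.r.l. scaling data (12-core run as the baseline, as in the table); the names are illustrative, not from any ANSYS tool.

```python
# MORE S.r.l. scaling data: cores -> (overall time in hours, measured speedup).
MEASURED = {
    12: (0.56, 1.00),
    24: (0.29, 1.94),
    36: (0.21, 2.60),
    48: (0.17, 3.33),
    72: (0.12, 4.76),
}
BASELINE_CORES = 12  # measured speedups are relative to the 12-core run


def parallel_efficiency(cores: int, speedup: float, baseline_cores: int = BASELINE_CORES) -> float:
    """Measured speedup divided by the ideal speedup (cores / baseline_cores)."""
    ideal = cores / baseline_cores
    return speedup / ideal


if __name__ == "__main__":
    for cores, (hours, speedup) in MEASURED.items():
        eff = parallel_efficiency(cores, speedup)
        print(f"{cores:3d} cores  {hours:.2f} h  speedup {speedup:.2f}  efficiency {eff:.0%}")
```

This gives roughly 97%, 87%, 83%, and 79% efficiency at 24, 36, 48, and 72 cores. At 72 cores this 830,000-cell mesh has only about 11,500 cells per core.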


ANSYS Fluent Scaling at Multiple Nodes - Parallel Efficiency Improving Release-by-Release!

Scaling Improvements at 10,000+ Cores Yield Benefits for Smaller Jobs!

Truck_111M Turbulent Flow
• Segregated implicit solver
• Scalable at ~10K cells per core!

[Figure: rating (axis 0 to 4,000) vs. number of cores (0 to 12,288) for releases 13.0.0, 14.0.0, and 15.0.0.]

DLR_96M LES Combustion
• Pressure-based coupled solver
• Scalable at ~10K cells per core!

[Figure: rating (axis 0 to 1,000) vs. number of cores (0 to 14,336) for R15.0 and ideal scaling.]

Rating is jobs per day. A higher rating means faster performance.
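The "~10K cells per core" figure translates directly into a core count for a given mesh. A minimal Python sketch, assuming an even partition of cells across cores and taking the nominal 111M and 96M cell counts from the case names (the helper names are hypothetical):

```python
TARGET_CELLS_PER_CORE = 10_000  # rule of thumb quoted on the slide (~10K cells per core)


def cells_per_core(total_cells: int, cores: int) -> float:
    """Average number of mesh cells per core under an even partition."""
    return total_cells / cores


def cores_at_target(total_cells: int, target: int = TARGET_CELLS_PER_CORE) -> int:
    """Largest core count that still keeps at least `target` cells on every core (even split)."""
    return total_cells // target


if __name__ == "__main__":
    for name, cells in [("Truck_111M", 111_000_000), ("DLR_96M", 96_000_000)]:
        print(f"{name}: {cores_at_target(cells):,} cores at ~10K cells/core; "
              f"{cells_per_core(cells, 12_288):,.0f} cells/core on 12,288 cores")
```

For Truck_111M, ~10K cells per core corresponds to about 11,100 cores, which is why the plotted core counts reach into the 10,000+ range.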


ANSYS CFX Scaling at Multiple Nodes - Parallel Efficiency Improving Release-by-Release!

R&D effort to improve HPC scaling in CFX:
• Basic and physics-specific scaling areas
• Significantly improved scalability
  – Up to 89% efficiency at 2048 cores
  – HPC improvements are “beta” level for R15.0

• Six-stage axial compressor: 13M nodes, 14 domains, 12 mixing planes; 4X faster (courtesy Siemens AG, Müllheim, Germany, Paper GT2013-94639)
• Duct case: 150M nodes; 5X faster


ANSYS Fluent 15.0 on GPU: Performance of Pressure-Based Solver

Sedan Model

Sedan geometry, 3.6M mixed cells; steady, turbulent external aerodynamics; coupled PBNS, double precision (DP). CPU: Intel Xeon E5-2680, 8 cores. GPU: 2 x Tesla K40.

Jobs/day (higher is better):
• Segregated solver, CPU only: 12 jobs/day
• Coupled solver, CPU only: 15 jobs/day
• Coupled solver, CPU + GPU: 27 jobs/day (1.9x over coupled solver, CPU only)

Convergence criteria: 10e-03 for all variables. Iterations until convergence: segregated CPU only, 2798 iterations (7070 s); coupled CPU only, 967 iterations (5900 s); coupled CPU + GPU, 985 iterations (3150 s).

NOTE: Times are for the total solution until convergence.
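The jobs/day figures for the sedan case agree, after rounding, with 86,400 seconds per day divided by the total time to convergence. That formula is our assumption, since the slide does not state it. A minimal Python sketch:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def jobs_per_day(wall_clock_seconds: float) -> float:
    """Rating as jobs per day: how many runs of this length fit into 24 hours."""
    return SECONDS_PER_DAY / wall_clock_seconds


if __name__ == "__main__":
    # Total time to convergence for the 3.6M-cell sedan case (values from the slide).
    runs = {
        "segregated, CPU only": 7070,
        "coupled, CPU only": 5900,
        "coupled, CPU + GPU": 3150,
    }
    for label, seconds in runs.items():
        print(f"{label:22s} {jobs_per_day(seconds):5.1f} jobs/day")
    ratio = runs["coupled, CPU only"] / runs["coupled, CPU + GPU"]
    print(f"GPU speedup (coupled solver): {ratio:.1f}x")
```

Rounded, this reproduces 12, 15, and 27 jobs/day, and the 5900 s / 3150 s ratio gives the quoted 1.9x.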


ANSYS Fluent 15.0 on GPU: Performance of Pressure-Based Solver

TRUCK BODY MODEL (14 million cells)

Productivity (higher is better): CPU only, 16 jobs/day; CPU + GPU, 25 jobs/day.

Cost and benefit relative to CPU only (100%): CPU + GPU cost is 125%; CPU + GPU benefit is 156%.

All results are based on turbulent flow over a truck case (14 million cells) run to convergence: steady-state, pressure-based coupled solver in double precision. Iterations to reach convergence: CPU only, 531; CPU + GPU, 566. The solution cost is approximate and includes both hardware and software license costs. Productivity is based on the number of completed Fluent jobs per day in a multi-user cluster environment. Hardware: Intel Xeon E5-2680 (64 CPU cores on 8 sockets) and 8 Tesla K40 GPUs. License: ANSYS Fluent and ANSYS HPC Workgroup 64.
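The 156% benefit is the jobs/day ratio (25 / 16), and dividing it by the 125% relative cost gives the productivity gained per unit of solution cost. A minimal Python sketch of that arithmetic, using the slide's approximate cost ratio as an input rather than deriving it:

```python
def relative(value: float, baseline: float) -> float:
    """Express a value as a fraction of the baseline (1.0 == 100%)."""
    return value / baseline


if __name__ == "__main__":
    cpu_jobs, gpu_jobs = 16, 25   # jobs/day from the truck-body slide
    cost_ratio = 1.25             # CPU + GPU cost relative to CPU only (slide value)
    benefit_ratio = relative(gpu_jobs, cpu_jobs)
    print(f"benefit: {benefit_ratio:.0%}")
    print(f"benefit per unit cost: {benefit_ratio / cost_ratio:.2f}")
```

Under the slide's cost model, the GPU configuration delivers about 1.25x as many completed jobs per unit of cost.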


ANSYS Fluent 15.0 on GPU: Better Speedup on Larger Models

Truck Model

External aerodynamics; steady, k-ε turbulence; double-precision solver. CPU: Intel Xeon E5-2667, 12 cores per node. GPU: Tesla K40, 4 per node.

ANSYS Fluent time per iteration (s; lower is better):
• 14 million cells: 36 CPU cores, 13 s; 36 CPU cores + 12 GPUs, 9.5 s (1.4X)
• 111 million cells: 144 CPU cores, 36 s; 144 CPU cores + 48 GPUs, 18 s (2X)

NOTE: Reported times are per iteration.
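The GPU counts follow from the node layout (12 cores and 4 GPUs per node), and the speedups are ratios of per-iteration times. A minimal Python sketch, assuming every node in use is fully populated; the helper names are hypothetical:

```python
CORES_PER_NODE = 12  # Intel Xeon E5-2667 nodes on the slide
GPUS_PER_NODE = 4    # Tesla K40 GPUs per node on the slide


def gpus_for(cpu_cores: int) -> int:
    """GPUs available when every node in use is fully populated (12 cores, 4 GPUs each)."""
    nodes, remainder = divmod(cpu_cores, CORES_PER_NODE)
    if remainder:
        raise ValueError("core count must be a whole number of nodes")
    return nodes * GPUS_PER_NODE


def speedup(cpu_seconds: float, gpu_seconds: float) -> float:
    """Per-iteration speedup of CPU + GPU over CPU only."""
    return cpu_seconds / gpu_seconds


if __name__ == "__main__":
    cases = [("14M cells", 36, 13.0, 9.5), ("111M cells", 144, 36.0, 18.0)]
    for name, cores, t_cpu, t_gpu in cases:
        print(f"{name}: {cores} cores + {gpus_for(cores)} GPUs -> {speedup(t_cpu, t_gpu):.1f}x")
```

The 14M-cell case uses 3 nodes and the 111M-cell case uses 12. The larger model keeps more work on each GPU, which is consistent with its higher speedup.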


NVIDIA-GPU Solution Fit for ANSYS Fluent

[Flowchart starting from "CFD analysis". Decision nodes, each with Yes/No branches: "Pressure-based coupled solver?", "Is it a steady-state analysis?" (segregated solver), and "Is it single-phase & flow dominant?". Outcomes: "Pressure-based coupled solver: best fit for GPUs"; "Not ideal for GPUs"; and, for the segregated solver, "Consider switching to the pressure-based coupled solver for better performance (faster convergence) and further speedups with GPUs. Please see the next slide."]


Scalable HPC Licensing

• ANSYS HPC (per-process)
• ANSYS HPC Pack
  – Each simulation consumes one or more Packs
  – Parallel enabled increases quickly with added Packs: 1 Pack enables 8 cores, 2 Packs 32, 3 Packs 128, 4 Packs 512, and 5 Packs 2048
• ANSYS HPC Workgroup
  – 16 to 2048 parallel tasks shared across any number of simulations on a single server (16, 32 and 64 are NEW!)
  – 128 to 2048 enterprise parallel tasks deployed and used anywhere in the world
• ANSYS HPC Parametric Pack and DSO
  – Enables simultaneous execution of multiple design points while consuming just one set of licenses

Single HPC solution for FEA/CFD/FSI and any level of fidelity
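The HPC Pack figures (8, 32, 128, 512, 2048 cores for 1 to 5 Packs) grow by a factor of 4 per added Pack, which fits 2 x 4^n. That closed form is our fit to the five points shown, not a formula quoted by ANSYS. A minimal Python sketch (hypothetical helper names):

```python
def cores_enabled(packs: int) -> int:
    """Parallel cores enabled by stacking HPC Packs on one simulation.

    Fitted to the slide: 1 -> 8, 2 -> 32, 3 -> 128, 4 -> 512, 5 -> 2048,
    i.e. each added Pack multiplies the count by 4, giving 2 * 4**packs.
    """
    if packs < 1:
        raise ValueError("at least one Pack is required")
    return 2 * 4 ** packs


def packs_needed(cores: int) -> int:
    """Smallest number of Packs whose enabled core count covers `cores`."""
    packs = 1
    while cores_enabled(packs) < cores:
        packs += 1
    return packs


if __name__ == "__main__":
    print([cores_enabled(p) for p in range(1, 6)])
    print(packs_needed(100))  # a 100-core run needs 3 Packs (128 cores)
```

Only 1 to 5 Packs are shown on the slide, so values outside that range are an extrapolation.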


15.0 HPC Licensing Enabling GPU Acceleration - One HPC Task Required to Unlock One GPU!

Licensing examples:
• 1 x ANSYS HPC Pack: total 8 HPC tasks (4 GPUs max). Valid configurations include 6 CPU cores + 2 GPUs and 4 CPU cores + 4 GPUs.
• 2 x ANSYS HPC Pack: total 32 HPC tasks (16 GPUs max). A valid configuration is 24 CPU cores + 8 GPUs (total use of 2 compute nodes).

(Applies to all license schemes: ANSYS HPC, ANSYS HPC Pack, ANSYS HPC Workgroup.)
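In every GPU example on this slide, the CPU cores plus the GPUs add up to the task budget, and the GPU count never exceeds half the tasks. A minimal Python sketch of that check. The two rules below are our reading of the examples, not an official ANSYS licensing algorithm, and the function name is hypothetical.

```python
def is_valid_config(cpu_cores: int, gpus: int, hpc_tasks: int) -> bool:
    """Check a CPU/GPU mix against an HPC task budget.

    Assumed rules, inferred from the slide's examples:
      * every CPU core and every GPU consumes one HPC task;
      * at most half of the tasks may unlock GPUs (8 tasks -> 4 GPUs, 32 tasks -> 16 GPUs).
    """
    if cpu_cores < 1 or gpus < 0:
        return False
    return cpu_cores + gpus <= hpc_tasks and gpus <= hpc_tasks // 2


if __name__ == "__main__":
    # Configurations listed on the slide (1 Pack = 8 tasks, 2 Packs = 32 tasks).
    print(is_valid_config(6, 2, 8), is_valid_config(4, 4, 8), is_valid_config(24, 8, 32))
    # Exceeds the stated maximum of 4 GPUs for 8 tasks.
    print(is_valid_config(2, 5, 8))
```

With only three listed examples, whether the real limit is "half the tasks" or "no more GPUs than CPU cores" cannot be told apart. Both give the same answer for these cases.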


HPC Parametric Pack License Scheme - Explore Parametric Designs Faster, More Cost Effectively


Example: Mixing Vessel - ANSYS HPC Parametric Pack

Problem Description
• Improve mixing while reducing energy
• Design objective:
  – Optimize the inlet velocities within their operating limits so that both the temperature spread at the outlet and the pressure drop in the vessel are minimized
• Input parameters: fluid velocity at the cold and hot inlets (8 design points)

[Figure: mixing vessel with cold inlet, hot inlet, and outlet labeled.]

Details
• k-epsilon model with standard wall functions
• 52,000 nodes and 280,000 elements
• Workstation: HP workstation with dual Intel Xeon E5-2687W (3.10 GHz, 16 cores), 128 GB memory

Licensing Solution
• 1 ANSYS Fluent license
• 2 ANSYS HPC Parametric Packs

Result/Benefit
• ~4.8x speedup over sequential execution
• Easier and fully automated workflow

Acknowledgment: Paul Schofield and Jiaping Zhang, ANSYS Houston


HPC for CFD Applications - Final Remarks

ANSYS Advantages

• Superior, proven parallel scalability above 80% efficiency with as few as 10,000 cells per CPU core, providing the ability to:
  – Run bigger models on smaller hardware
  – Run smaller models at higher core counts
• Solvers required for complex physics (chemistry, multiphase) are highly optimized to run fast and deliver outstanding parallel scaling on today's multicore processors
• ANSYS provides flexible, scalable, and cost-attractive HPC licensing!

Courtesy of MORE S.r.l.


“Take Home” Points / Discussion

With HPC, you can increase your engineering productivity by:
• Decreasing your simulation time (increasing throughput)
• Performing larger, more detailed simulations (solving the unsolvable)
• Evaluating more design variations (gaining better insight into product performance)


THANK YOU!