0 Copyright 2014 FUJITSU
Human Centric Innovation
Fujitsu Forum 2014
19th – 20th November
Leverage expertise for more economic value from HPC Clusters
Ian Godfrey, Director, Solutions Business at Fujitsu Systems Europe
Create Knowledge from Information
Data → Information (processed) → Knowledge (relevant & actionable)
Knowledge, as a form of capital, must be exchangeable among
persons, and it must be able to grow. [Turban et al, 2003]
HPC contribution to knowledge creation
Data → Information → Knowledge
Measured
Created
Filtered
Organised
Simulated
Simplicity Expertise
Scope for knowledge creation
Simplicity + Expertise
User numbers
Usage density
Usage intensity
Knowledge creation opportunities
Fujitsu Application Solutions
Simplify HPC to lower cost and risk, and increase access
Build in Expertise to realise more value from HPC
Innovation constrained
Do you think more computing capacity could increase the value of simulation to your company?
NO: Current computing capacity is sufficient for our needs.
YES: For faster turnaround times.
YES: To consider more design ideas or operating conditions.
YES: For better understanding from more detailed or complex sophisticated simulation models.
Courtesy of ANSYS Inc.
Barriers to HPC
Barrier to Expanded Usage of Simulation
Lack of IT hardware and support infrastructure
Lack of IT expertise and support
Barrier to Adopting or Expanding New HPC infrastructure
Need for evidence of technical benefits for their simulation workloads
Lack of time and expertise to specify the hardware configuration
Courtesy of ANSYS Inc.
Economic impact through HPC
COTS applications are central to all manufacturers
90% of these applications are used on PCs
Pressing need to shorten design time and increase realism
Discovery alone is not sufficient
Higher fidelity, but within the industrial design cycle
Source: Merle Giles, HPC User Forum, RIKEN, Kobe, Japan, July 2014
Integrated Solution Process
1. Classify business sector workload demands.
2. Design reference configurations and dimensions for the simulation workload.
3. Integrate application in system environment and simplify use and operation.
4. Onboard expertise to minimise time to peak productivity, and sustain higher output.
FUJITSU Integrated Solutions for HPC
Solution sectors
CFD for SMEs
Businesses currently under-utilising HPC and constraining their models due to barriers from lack of expertise and support.
Multiphysics for Geosciences
An integrated multiphysics system and in-built optimisation module for more complete Geophysics & Geomechanics simulation.
Visualisation for CGI Agencies
An easy scalable environment for Creative Agencies to shorten the time to build physically realistic animations with VRED ray-tracing accuracy.
An appliance to broaden usability of CFD
ANSYS Fluent & CFX
CFD in the supply chain
ANSYS® Fluent® is a leading commercial code offering deep and broad computational fluid dynamics capabilities
Sector workloads for CFD with ANSYS
Built Environment
Customers: Constructors, Architect bureaux
Application: ANSYS Fluent
Physics: Transient; Optimisation
Model: HVAC, 8M cells
Automotive Supply Chain
Customers: Sub-assembly suppliers
Application: ANSYS Fluent
Physics: Transient; Optimisation
Model: Exhaust system, 8M cells
Business context – Automotive Supply Chain
Design changes spread along the supply chain
At all stages the performance and reliability of the part/sub-assembly must be optimised in the integrated system context
An exhaust system is one of many sub-assemblies in a modern vehicle, and usually the most constrained in the overall design
Looking at the options and solutions for such manufacturers, we identify patterns and lessons for other suppliers
Business challenges
Increased regulations on vehicle emissions and customer demand for fuel economy
Increased importance of the exhaust system’s design.
Advanced catalytic converters, aftertreatment devices
Ever quieter mufflers
Last component in project schedule
Timescales compressed, suppliers adjust to any upstream planning changes
Little flexibility to maintain high quality, performance-optimised, designs.
Vehicle geometry finalised
Subsequent design changes within a fixed external space
These challenges are triggers for an HPC need
Baseline model
Model setup: Geometry based on a full exhaust sub-system. Optimisations made around manifold downpipe to catalytic converter to improve mixing and flow.
Mesh: Cells: 7,636,538
Physics: Transient simulation with explicit time stepping. Five minute engine startup cycle at 2000 rpm.
Workload: A production model representing the Exhaust simulation workloads for different types of sub-system
The baseline simulation studied the transient behaviour of the exhaust gases within a typical assembly of manifold, conduit and baffles. Optimisation was then done by applying variations around several dimensions concurrently: massflow, geometry, catalytic converter resistivity.
Evaluation system configurations
Application version: ANSYS Fluent V15.0
Processors per node: 2
Cores per processor and processor frequency:
8 cores: 2.60 GHz
10 cores: 3.00 GHz, 2.80 GHz, 2.50 GHz, 2.20 GHz
12 cores: 2.70 GHz, 2.40 GHz
Interconnect: InfiniBand, Gigabit Ethernet
MPI libraries: Intel
Nodes: Fujitsu PRIMERGY CX250 nodes with Dual Intel® Xeon® CPU E5-2600 V2 processors
Performance results
Design optimisation parameters
Potential variations
5 massflows, 10 geometric modifications, 3 resistivities
Transient heat up for 5min
5 x 10 + 5 x 3 steady-state runs, totalling 65 runs
4-stroke engine @ 2000rpm: 1 rev = 40 timesteps
• 2000 x 40 x 5 = 400,000 timesteps
Target Variables
Total pressure drop; homogeneity of Cat flow/utilization
Long-term
Acoustic behavior, Thermal effects, Mechanical stresses and fatigue
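The run count and timestep count quoted on this slide can be checked with a few lines of arithmetic. The sketch below only restates the slide's own numbers; the variable names are illustrative and not taken from any ANSYS tooling.

```python
# Values as quoted on the slide; names are illustrative only.
massflows = 5
geometric_modifications = 10
resistivities = 3

# Massflow is varied against geometry and, separately, against resistivity.
steady_state_runs = (massflows * geometric_modifications
                     + massflows * resistivities)

engine_speed_rpm = 2000      # revolutions per minute
timesteps_per_rev = 40       # 1 rev = 40 timesteps
heat_up_minutes = 5          # transient heat up for 5 min

transient_timesteps = engine_speed_rpm * timesteps_per_rev * heat_up_minutes

print(steady_state_runs)     # 65
print(transient_timesteps)   # 400000
```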
Workload-based configurations
Assembly type | Component (Muffler) | Sub-System (Exhaust Aftertreatment) | Full System (Entire Exhaust System)
Overall project duration (weeks) | 2 | 3 | 4
Model size (number of cells) | 5,000,000 | 10,000,000 | 30,000,000
Steady-state simulation phases (ideal job count)
Problem setup | 2 | 5 | 10
Design of experiment | 25 | 50 | 100
Optimisation | 25 | 50 | 100
Robust design optimization (RDO) | 25 | 50 | 100
Transient scenarios (ideal job count)
Problem setup (60 timesteps) | 5 | 10 | 20
DoE (60 timesteps) | 10 | 20 | 30
Full accurate transient run (60,000 timesteps) | 1 | 1 | 1
Elapsed time on 4 nodes (hours)
Steady-state | 6.7 | 27.1 | 162.5
Transient, 60 timesteps | 7.9 | 31.5 | 157.3
Full accuracy transient, 60,000 timesteps | 524 | 1049 | 3146
Tuned cluster size (number of compute nodes) | 12 | 20 | 40
Duration on tuned cluster | 1.1 weeks | 1.3 weeks | 2.1 weeks
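The slide does not say how the week figures in the last row were derived. As a rough cross-check, they are reproduced by summing the three 4-node elapsed times per assembly type and scaling to the tuned node count, under two assumptions that are not in the source: ideal linear scaling with node count, and a cluster running 24 hours a day, 7 days a week.

```python
HOURS_PER_WEEK = 7 * 24   # assumption: round-the-clock operation
BASELINE_NODES = 4        # the elapsed times are quoted for 4 nodes

# Hours on 4 nodes: (steady-state, 60-step transient, full transient)
workloads = {
    "Component":   {"hours_on_4_nodes": (6.7, 7.9, 524),      "tuned_nodes": 12},
    "Sub-System":  {"hours_on_4_nodes": (27.1, 31.5, 1049),   "tuned_nodes": 20},
    "Full System": {"hours_on_4_nodes": (162.5, 157.3, 3146), "tuned_nodes": 40},
}


def weeks_on_tuned_cluster(hours_on_4_nodes, tuned_nodes):
    """Estimate elapsed weeks on the tuned cluster."""
    total_hours = sum(hours_on_4_nodes)
    # Assumption: runtime shrinks in proportion to the node count.
    scaled_hours = total_hours * BASELINE_NODES / tuned_nodes
    return scaled_hours / HOURS_PER_WEEK


durations = {name: round(weeks_on_tuned_cluster(**w), 1)
             for name, w in workloads.items()}
print(durations)
```

Real scaling is rarely perfectly linear, so this should be read as a consistency check on the table rather than a sizing method.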
ANSYS Fujitsu Appliance
Pre-defined clusters optimised for ANSYS workloads
Best-practice application setup
Integrated system software and user environment preparation
Streamlined support
Inclusive pricing
Run immediately
Targets
Appliance Objectives
Stimulate need – easier HPC access, integrated software environment.
Mitigate risk – validated configuration, assured productivity, streamlined delivery & support, Try&Buy.
Confirm performance – demonstrated workloads, price/performance-optimised configurations.
Appliance Markets
Companies using only workstations to run ANSYS solvers.
Existing users of small to medium HPC Clusters.
Branch offices of larger organisations with local HPC compute resources.
Where the Appliance can help
Number of users
Simplified user environments and more automated methods enable less expert users to run simulations.
Tuned, pre-configured solutions combining application and hardware ease market penetration.
Density of usage
Simulation is being applied to new physical phenomena and a wider range of physical models, while designs are increasingly elaborate.
Multi-physics allows a more complete study of the model, capturing interacting forces and behaviours exhibited in the real product.
Intensity of usage
Ensemble approaches – more jobs for a given model – give more accurate and robust designs, and satisfy a more informed customer demand.
Workflows automatically control and distribute compute tasks, simplifying the optimisation methods; larger scale is handled more easily and optimally.
Sample PRIMEFLEX configurations for ANSYS CFD
Purpose Low-cost entry Standard usage Medium scale
Head node PRIMERGY RX300 S8 PRIMERGY RX300 S8 PRIMERGY RX300 S8
Compute node 4x PRIMERGY CX250
or
4x PRIMERGY RX200
8x PRIMERGY CX250 28x PRIMERGY CX250
or
24x PRIMERGY CX250 plus
2x PRIMERGY CX270 with
Nvidia or Xeon Phi accelerators
Compute processor:
Dual Intel Xeon Processor
E5-2600 V2
8-core 2.60GHz 10-core 2.80GHz
or
10-core 2.50GHz
10-core 3.00GHz
Fast interconnect 1 InfiniBand QDR switch 1 InfiniBand QDR switch 1 InfiniBand QDR switch
Total compute cores (excluding accelerators) 80 cores 160 cores 560 or 520 cores
A simpler way to work
Streamlined acquisition and support, delivered fully assembled for immediate production service
Components balanced to avoid bottlenecks, fully integrated behind an intuitive web user workplace for higher productivity at first login
Robust design optimisation made more practical
ANSYS Mechanical
Robust Design Optimisation
Robust Design Optimization (RDO) is optimizing the design under consideration of uncertainties
Quality and reliability are explicitly integrated in the optimization process
High computational effort generating multiple concurrent solves – efficient parallelism
Requires expertise to optimise and automate the process
RDO Methodology
1. Sensitivity Analysis
Identification of the relevant input parameters and response values
Stochastic sampling (LHS) for optimized scanning of multi-dimensional parameter spaces
2. Optimization
3. Robustness Evaluation
Efficient methods of stochastic analysis for the determination of failure probabilities
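To illustrate the stochastic sampling step, here is a minimal Latin Hypercube Sampling (LHS) sketch in plain Python. It is not how ANSYS DesignXplorer or optiSLang implement sampling; the function name, the parameter bounds and the seed are hypothetical and chosen only for the example.

```python
import random


def latin_hypercube(n_samples, bounds, seed=0):
    """Return n_samples design points, one per stratum in every dimension.

    bounds is a list of (low, high) pairs, one pair per input parameter.
    """
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniform draw inside each of n_samples equal-width strata.
        strata = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so strata are paired at random across dimensions.
        rng.shuffle(strata)
        columns.append([low + u * (high - low) for u in strata])
    # Transpose: each row becomes one design point.
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


# Hypothetical example: 10 design points over two parameters.
design_points = latin_hypercube(10, [(0.0, 1.0), (50.0, 100.0)])
for point in design_points:
    print(point)
```

Compared with plain random sampling, every parameter range is covered evenly with the same number of solver runs, which is why LHS is a common choice for scanning multi-dimensional parameter spaces.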
Demonstrator: Press tool variance
Model setup: Find out the minimum radius in the cylinder. Understand the influence of pressure load on stresses in the adapter. Mass reduction considering total deformation and stresses.
Parameters: 2 variable parameters in the CAD geometry; 5 load parameters (2 pressure and 3 force parameters)
Description: Manual geometry and load case variation. The press tool is a part of a hydraulic press. It is guided by four poles and loaded with pressure.
Applications: Solver: ANSYS Mechanical APDL
Model: Crossbar in press tool
RDO driven by ANSYS DesignXplorer or Dynardo optiSLang automates design point generation, and statistically optimises the combined parameter variations
Optimisation objectives
How to minimize the radius (increased contact area) without exceeding max. stresses?
Understand the influence of pressure load on stresses in the adapter
Application placement in Solution
Head node
Cluster
Geometry handler
ANSYS Workbench
ANSYS Mechanical
ANSYS Workbench
Load distribution has to be balanced to overlap and synchronise the various stages in the process, including the data movement, to avoid bottlenecks
Parallelism setup
Batch scheduling setup allows the resource manager to distribute all design point calculations, utilising all available cluster cores
Restricting the number of cores used on each compute node, and keeping each design point (DP) within a single compute node, gives the best scaling
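The placement rule can be sketched as a small capacity calculation: each design point stays inside one node, so concurrency is counted in whole slots per node. This is an illustrative model only, not the behaviour of PBS or ANSYS RSM; the function name and the figure of 50 design points are hypothetical, while 8 compute nodes and 8 cores per job are the values quoted for the sample setup in this deck.

```python
from math import ceil


def place_design_points(n_design_points, n_nodes,
                        cores_used_per_node, cores_per_dp):
    """Plan design point (DP) placement with no DP spanning nodes.

    Returns (concurrent_slots, waves, placement), where placement maps
    each DP index to the node index it runs on.
    """
    # Whole DPs only: leftover cores on a node are left idle.
    slots_per_node = cores_used_per_node // cores_per_dp
    if slots_per_node < 1:
        raise ValueError("a design point must fit inside one node")
    concurrent_slots = n_nodes * slots_per_node
    # Number of scheduling rounds needed to work through all DPs.
    waves = ceil(n_design_points / concurrent_slots)
    placement = {dp: (dp % concurrent_slots) // slots_per_node
                 for dp in range(n_design_points)}
    return concurrent_slots, waves, placement


# Hypothetical run: 50 DPs on 8 nodes, 8 cores used per node and per DP.
slots, waves, placement = place_design_points(50, 8, 8, 8)
print(slots, waves)      # 8 7
print(placement[9])      # 1
```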
Sample PRIMEFLEX setup for RDO
PBS accessing Compute Nodes 1..8
ANSYS RSM
KVM
Shared disk: /RSMtemp
RAID0, InfiniBand
Head node: RX350, 8 cores
Compute nodes: CX250, 8 cores used per job
FUJITSU Integrated Solution
Multiphysics drives new simulation in Geoscience
COMSOL Multiphysics
COMSOL Multiphysics capabilities
Model and simulate any physics-based system
Temperature in a geothermal heating system
COMSOL 5.0 physics modules for Geosciences
Module: Purpose
COMSOL Multiphysics: COMSOL without any additional module already provides interfaces for heat transfer, laminar fluid flow and linear structural mechanics cases.
Subsurface Flow Module: Most important module in this sector because it already provides simulation of all kinds of subsurface flows, heat transport, poroelastic effects and also chemical reactions.
Structural Mechanics Module: To study thermal stresses caused by temperature changes. Another application would be acoustic-structure interaction (keywords: fracking, artificially induced seismicity). Combines with the Geomechanics Module to calculate the complex structural behavior of the subsurface, e.g. due to drilling processes or pressure changes due to intensive pumping.
Pipe Flow Module: Important if the thermal influence of the flow through the boreholes is to be quantified or for closed-loop shallow geothermal applications. This module provides a very powerful simplification of the pipe flow regime. It can also be used for large scale pipe systems, e.g. pipelines.
CFD Module: Needed for computing more complex turbulent flows, e.g. in the very close range of a geothermal borehole.
Heat Transfer Module: Needed if radiation or thermodynamical processes play a role. It also provides some turbulent flow models.
Optimization Module: Multipurpose module that is e.g. used for backward-simulations. Provides state-of-the-art techniques for parameter estimations and optimization purposes (e.g. Monte-Carlo methods).
Deep Geothermal Energy study
Workload: Geothermal doublet around a single borehole
Parametric sweep across:
- with/without natural subsurface flow
- Depths of borehole inlet and outlet
- Distance between inlet and outlet
Evaluate the heat and re-injection of the cold water, and convective heat transport by groundwater flow around a single borehole, particularly the impact of pressure changes in the medium
Model setup:
Injection and production areas at different depths
Horizontally shifted injection and production
Includes fault zone with different conductivities
Size: 500m cube simulated with 30 days of production; 2.47M DOF
Physics: Primarily using Structural Mechanics and Subsurface Flow modules
Results
Vertical plane through the borehole termination points
Total displacement (m) and Darcy’s velocity field (red = 0.016m)
Temperature (C) and Darcy’s velocity field (red = 40C)
Evaluation system configurations
Application version COMSOL 5.0
Processors per node 2
Processor frequency 2.60 GHz
Cores per processor 8
Interconnect InfiniBand
Chart: simulations per week against number of nodes
COMSOL HPC Solution benefits
COMSOL Multiphysics supports hybrid parallelism for the optimal balance between shared and distributed memory execution, giving efficient scaling across the range of different physics simulations
Clear and structured user interface and the extensive model library allow users to rapidly start preparing HPC models
In-built optimiser allows for finding the ideal design with large parametric sweeps in short time
Realistic visualisation for Creative Agencies
Autodesk VRED
Visualisation with Autodesk VRED
HPC Accelerated Visualisation From Concept to Sales
Image & Movie Rendering
Fastest rendering setup with full quality ray-tracing
Realistic animation within dynamic environments
HPC throughput transforms the CGI end-to-end workflow
Realtime Visualization
Enables reliable decision making with full quality interactive rendering
Managing huge data sets (CPU, not GPU bound)
Dedicated HPC acceleration for acceptable interactivity
On-line Configuration
Rendering on demand & streamed real-time visualization
High visual fidelity even on thin clients, web & mobile
Scalable HPC delivers on performance & user load
Example model
Render settings:
Resolution 1280 x 720 (720p)
Render Settings Antialiasing Samples 1024
Duration Infinite Rendering in viewport
Adaptive Sampling High Quality
Use Clamping 16
Interactive Full Global Illumination
Still Full Global Illumination
Photon Tracing Indirect
Photon Count Still 2,000,000
Final Gather For Glossy Reflection
Scene specifications:
Triangles 7 million
Meshes 22,788
Active lights 35
Wire file size 208MB
VRED scene size 225MB
VRED 2015 performance on Full HD
Nodes used (hyperthreading enabled) | Total cores | Framerate (FPS) | Offline rendering
Master alone | 24 | 0.71 | 64 minutes
Master plus 1 Compute | 24+1x40=64 | 1.27 | 30 minutes
Master plus 2 Compute | 24+2x40=104 | 1.88 | 21 minutes
Master plus 3 Compute | 24+3x40=144 | 2.20 | 15 minutes
Master plus 4 Compute | 24+4x40=184 | 2.65 | 12 minutes
Compute nodes: PRIMERGY CX400; each with 2x CX270 or 4x CX250 compute nodes
Per node: Dual Intel Xeon E5-2680 v2 @ 2.8GHz 10C, 64GB
Linux OS
QDR InfiniBand fast interconnect
Master node: CELSIUS R930power
Dual Intel Xeon E5-2643v2 6C @ 3.5GHz, 128GB
NVIDIA Quadro K5000
Windows OS
5x reduction in render time
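The "5x" figure follows from the offline rendering times in the table above. A short sketch of the speedup calculation, with an added efficiency measure relative to the core count, is given below. The efficiency definition (speedup divided by the growth in total cores) is an assumption for illustration and is not a metric quoted on the slide; the core counts are the hyperthreaded totals from the table.

```python
# Keyed by number of compute nodes added to the master node.
offline_minutes = {0: 64, 1: 30, 2: 21, 3: 15, 4: 12}
total_cores = {0: 24, 1: 64, 2: 104, 3: 144, 4: 184}

baseline_minutes = offline_minutes[0]
baseline_cores = total_cores[0]

# Speedup of offline rendering relative to the master node alone.
speedup = {n: baseline_minutes / t for n, t in offline_minutes.items()}

# Assumed efficiency measure: speedup per unit of core-count growth.
efficiency = {n: speedup[n] / (total_cores[n] / baseline_cores)
              for n in offline_minutes}

for n in sorted(offline_minutes):
    print(f"+{n} compute: speedup {speedup[n]:.2f}, "
          f"efficiency {efficiency[n]:.2f}")
```

With four compute nodes the speedup is 64 / 12, or about 5.3, which matches the rounded claim on the slide.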
Zones of tolerance
Render settings:
Triangles >5M
Resolution 1920x1080
Adaptive Sampling High Quality
Pixel Filter Triangle (Size 1)
Interactive Full Global Illumination
Still Frame Full Global Illumination
Photons Indirect only
Photon Trace Depth 32
Photon Count Still 1000000
Cluster settings:
Number of nodes 48
CPUs per node 2
Cores per CPU 8
Total core count 768
CPU type Xeon E5-2670, 2.6 GHz
Network, internal 10 GbE and 45 Gb
Offline Rendering for animation
* Assuming 25 FPS with representative rendering time per frame on a single node. Based on 3 times the total number of frame renderings needed to produce the final 2 minute movie = 9000 frames. Apply 90% linearity in reduction of rendering time based on reference scene.
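The footnote's frame count and scaling rule can be turned into a rough estimator. Two things in this sketch are assumptions and not from the source: the per-frame rendering time on a single node (the slide gives no value, so 2.0 minutes is a placeholder), and the reading of "90% linearity" as each additional node contributing 90% of one node's throughput.

```python
def total_frames(movie_seconds=120, fps=25, rerender_factor=3):
    """Frames to render: 2 minute movie at 25 FPS, rendered 3 times over."""
    return movie_seconds * fps * rerender_factor


def render_days(nodes, minutes_per_frame_single_node, linearity=0.9):
    """Estimate elapsed days to render all frames on a given node count."""
    frames = total_frames()
    # Assumed reading of "90% linearity": every node beyond the first
    # adds 90% of a single node's throughput.
    effective_nodes = 1 + linearity * (nodes - 1)
    minutes = frames * minutes_per_frame_single_node / effective_nodes
    return minutes / (60 * 24)


# Placeholder per-frame time of 2.0 minutes; not a measured value.
print(total_frames())          # 9000
print(render_days(1, 2.0))     # 12.5
print(render_days(11, 2.0))    # 1.25
```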
1 day
Potential reference configuration: CGI Studios
InfiniBand for fast exchange
10 GbE for client access and storage
GbE for server management
PRIMERGY RX300: 1x head node
PRIMERGY CX400: 16x PRIMERGY CX250 compute nodes
Multiple users
Concurrent offline renders
Ray-tracing realism
Animation capability
HPC benefits for VRED workloads
Highly detailed ray-tracing accessible through scalable parallelism, with detailed images being rendered in a few seconds
Higher concurrent frame throughput transforms offline rendering for creative assets
Greater speed permits re-rendering of animations within project timeframe – a valuable capability
Foundation of PRIMEFLEX for HPC
Fujitsu Application Solutions offer
Reference configuration Simplicity for users Expertise onboard
Increased ROI by minimised time to peak productivity, and higher sustained lifecycle output
Intuitive user interface
Accessible to practised and new users
Immediate productivity
Secure shared environment
Dedicated and complete system stack
Simplified end-to-end process
Factory assembled, ready to use
Application catalogue of pre-built packages
Applied expertise captured in automated intelligent workflows
Applications pre-installed
PRIMEFLEX Reference Configurations
Components selected for optimal price-performance on ANSYS CFD applications.
Coherent architecture to avoid performance bottlenecks as system grows
Architecture validated with application partners, system patterns defined for different production workloads
Intel Cluster Ready certification of Fujitsu PRIMERGY HPC systems.
Risk reduced, Confidence increased, ROI expanded
Simplicity – Fujitsu HPC Gateway workplace
Intuitive desktop workplace in your web browser
Full set of user tools to run and track HPC workloads
Adaptable by user for their projects and applications
Expertise – Gateway Application Catalogue
Fujitsu Application Catalogue
Gateway users can download application workflow packages from Fujitsu
Current set of key applications from two main HPC sectors:
Life Sciences
Application Supplier Version
BLAST NCBI 2.2.7
DL_POLY_Classic STFC 1.9
GAMESS_US Ames Lab, Iowa State 2012
Gromacs Stockholm Center for Biomembrane Research 4.6
LAMMPS Sandia National Laboratories 3Feb2013
NAMD U Illinois 2.9
NWChem Pacific Northwest National Lab 6.1
QuantumESPRESSO SISSA, Trieste 5.0
T-COFFEE Center for Genomic Regulation 9.03
CAE
Application Supplier Version
ABAQUS Simulia 6.12
CFX ANSYS 14.0, 14.5
FLUENT ANSYS 14.0, 14.5
LS-DYNA LSTC V971
MSC NASTRAN MSC Software 2012.2
OpenFOAM OpenCFD 2.2.0
PAM-CRASH ESI Group 2012.0
RADIOSS-CRASH Altair 11.0, 12.0
STAR-CCM+ CD-adapco 7.02, 8.0
STAR-CD CD-adapco 4.18
Importing from the Application Catalogue
Short path to importing expertise encoded in pre-built packages from the Application Catalogue
Leveraging expertise through integrated solutions
Bridging the gap to HPC accessibility
Number of customers
Adoption time
FUJITSU Integrated Solutions are a whole product approach
Outcomes from PRIMEFLEX for HPC
End-to-end risk reduction from a simplified process of acquisition to production.
Shortest time to peak productivity, sustained across solution lifecycle
ROI multipliers from broader accessibility and business process transformation.
Value dimensions
Reference configurations ensure more predictable and effective performance, and remove DIY uncertainties.
Greater confidence to deploy HPC on new projects and for users with more diverse skill levels.
Risk reduction
Thank you