Scalable System for Large Unstructured Mesh Simulation
Miguel A. Pasenau, Pooyan Dadvand, Jordi Cotela, Abel Coll and Eugenio Oñate
Slide 2
Overview
- Introduction
- Preparation and simulation: more efficient partitioning, parallel element splitting
- Post-processing: results cache, merging many partitions, memory usage, off-screen mode
- Conclusions, future lines and acknowledgements
Slide 4
Introduction
- Education: Masters in Numerical Methods, trainings, seminars, etc.
- Publishing: magazines, books, etc.
- Research: PhDs, conferences, projects, etc.
- One of the international centers of excellence on Simulation-Based Engineering and Sciences [Glotzer et al., WTEC Panel Report on International Assessment of Research and Development in Simulation-Based Engineering and Science, World Technology Evaluation Center (wtec.org), 2009].
Slide 5
Introduction: simulation of structures
Slide 6
Introduction: CFD, Computational Fluid Dynamics
Slide 8
Introduction: the simulation cycle
- Geometry description: provided by CAD or created in GiD
- Preparation of analysis data (GiD)
- Computer analysis
- Visualization of results (GiD)
Slide 9
Introduction: analysis data generation
- Read in and correct CAD data
- Assignment of boundary conditions
- Definition of analysis parameters
- Assignment of material properties, etc.
- Generation of analysis data
Slide 10
Introduction: visualization of numerical results
- Deformed shapes, temperature distributions, pressures, etc.
- Vector and contour plots, graphs and line diagrams, result surfaces
- Animated sequences
- Particle line flow diagrams
Slide 12
Introduction: the goal
Run a CFD simulation with 100 million elements using in-house tools.
Hardware: a cluster with
- Master node: 2 x Intel Quad Core E5410, 32 GB RAM; 3 TB disk with a dedicated Gigabit link to the master node
- 10 nodes: 2 x Intel Quad Core E5410, 16 GB RAM each
- 2 nodes: 2 x AMD Opteron Quad Core 2356, 32 GB RAM each
- Total: 96 compute cores and 224 GB RAM available (excluding the master node)
- Infiniband 4x DDR, 20 Gbps
Slide 13
Introduction: airflow around an F1 car model
Slide 14
Introduction: the tools
- Kratos: multi-physics, open-source framework, parallelized for shared- and distributed-memory machines
- GiD: geometry handling and data management, first coarse mesh, merging and post-processing of results
Slide 15
Introduction
[Workflow diagram] Geometry (conditions, materials) → coarse mesh generation → partitioning → distribution (communication plan) → parts 1…n → refinement → calculation → results 1…n → merge → visualize
Slide 16
Overview
- Introduction
- Preparation and simulation: more efficient partitioning, parallel element splitting
- Post-processing: results cache, merging many partitions, memory usage, off-screen mode
- Conclusions, future lines and acknowledgements
Slide 17
Preparation and simulation
[Workflow diagram] Geometry (conditions, materials) → coarse mesh generation → partitioning → distribution (communication plan) → parts 1…n → refinement → calculation → results 1…n → merge → visualize
Slide 18
Meshing
A single workstation has limited memory and time, so meshing is done in three steps (a quick element-count check follows):
- Single node: GiD generates a coarse mesh of 13 million tetrahedra
- Single node: Kratos + METIS divide and distribute the mesh
- In parallel: Kratos refines the mesh locally
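As a rough sanity check (our assumption: roughly one uniform 1:8 refinement level over the whole domain), the coarse-mesh size is consistent with the final element count reported later:

```python
# Back-of-envelope check (assumption: one uniform 1:8 refinement level).
coarse_tets = 13_000_000        # coarse mesh generated by GiD
children_per_tet = 8            # one split level: 1 tetrahedron -> 8 children
print(coarse_tets * children_per_tet)  # 104,000,000, close to the 103 M used later
```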
Slide 19
Preparation and simulation
[Workflow diagram] Geometry (conditions, materials) → coarse mesh generation → partitioning → distribution (communication plan) → parts 1…n → refinement → calculation → results 1…n → merge → visualize
Slide 20
Efficient partitioning: before
Rank 0 reads the model, partitions it, and sends the partitions to the other ranks.
[Diagram: ranks 0 to 3]
Slide 22
Efficient partitioning: before (sketch below)
- Requires large memory on node 0
- Uses cluster time for partitioning, which could be done outside
- Each rerun needs repartitioning
- Same working procedure for OpenMP and MPI runs
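A minimal sketch of this old flow with mpi4py (our reconstruction, not the actual Kratos code); the model reader and the METIS call are replaced by trivial stand-ins so the script runs as-is under mpirun:

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank == 0:
    elements = list(range(1000))                      # stand-in for the whole model
    parts = [elements[i::size] for i in range(size)]  # stand-in for METIS
else:
    parts = None

# Rank 0 must hold the full model *and* every partition at once:
# the memory bottleneck described above.
my_part = comm.scatter(parts, root=0)
print(f"rank {rank}: {len(my_part)} elements")
```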
Slide 23
Efficient partitioning: now (sketch below)
- The partitions are divided and written on another machine
- Each rank reads its own data separately
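A minimal sketch of the new flow, again with mpi4py; the partition file naming is a hypothetical convention, not necessarily Kratos's:

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each rank opens only its own pre-written partition file; no rank
# ever needs to hold the whole model.
with open(f"model_part_{rank}.mdpa") as f:   # hypothetical naming, one file per rank
    my_part = f.read()

print(f"rank {rank}: read {len(my_part)} bytes")
```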
Slide 24
Preparation and simulation
[Workflow diagram] Geometry (conditions, materials) → coarse mesh generation → partitioning → distribution (communication plan) → parts 1…n → refinement → calculation → results 1…n → merge → visualize
Slide 25
Local refinement: triangle
[Diagram: splitting patterns for a triangle (nodes i, j, k) with edge midpoints l, m, n, producing 2, 3 or 4 child triangles depending on how many edges are split]
Slide 26
Local refinement: triangle
- The splitting case is selected according to the node Ids
- The decision is not driven by element quality, but it is very good for parallelization (OpenMP and MPI): every partition makes the same choice independently (see the sketch below)
[Diagram: the two possible diagonals when two edges are split]
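A sketch of the idea; the tie-breaking rule shown is hypothetical, Kratos's actual rule may differ, but any rule based only on global node Ids has the same property:

```python
def split_two_edges(i, j, k, l, m):
    """Triangle (i, j, k) with midpoint l on edge i-j and m on edge j-k.
    The remaining quad (i, l, m, k) can be cut along diagonal i-m or l-k;
    basing the choice only on global node Ids lets every OpenMP thread or
    MPI rank decide identically, with no communication."""
    children = [(l, j, m)]                    # corner triangle at node j
    if min(i, m) < min(l, k):                 # deterministic, Id-based choice
        children += [(i, l, m), (i, m, k)]    # diagonal i-m
    else:
        children += [(i, l, k), (l, m, k)]    # diagonal l-k
    return children

print(split_two_edges(10, 42, 7, 55, 56))     # same answer on every partition
```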
Slide 27
Local refinement: tetrahedron
[Diagram: father element and its child elements]
Slide 28
Local refinement: examples
Slide 29
Local refinement: examples
Slide 30
Local refinement: examples
Slide 31
Local refinement: uniform
- A uniform refinement can be used to obtain a mesh with 8 times more elements (a generic sketch follows)
- It does not improve the geometry representation
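For illustration, a self-contained sketch of the classic 1:8 uniform split (Bey-style, cutting the central octahedron along one conventional diagonal); this is a generic scheme, not necessarily the exact one implemented in Kratos:

```python
def uniform_refine_tet(n0, n1, n2, n3, midpoint):
    """Split one tetrahedron into 8 children via edge midpoints.
    midpoint(a, b) must return the node id of the midpoint of edge
    (a, b), shared consistently across neighbouring elements."""
    m01, m02, m03 = midpoint(n0, n1), midpoint(n0, n2), midpoint(n0, n3)
    m12, m13, m23 = midpoint(n1, n2), midpoint(n1, n3), midpoint(n2, n3)
    return [
        # four corner tetrahedra
        (n0, m01, m02, m03), (m01, n1, m12, m13),
        (m02, m12, n2, m23), (m03, m13, m23, n3),
        # central octahedron cut along the m02-m13 diagonal
        (m01, m02, m03, m13), (m01, m02, m12, m13),
        (m02, m03, m13, m23), (m02, m12, m13, m23),
    ]

# Usage sketch: a midpoint "factory" handing out consistent new ids.
mids, next_id = {}, 100          # assume original node ids are below 100
def midpoint(a, b):
    global next_id
    key = (min(a, b), max(a, b))
    if key not in mids:
        mids[key] = next_id
        next_id += 1
    return mids[key]

print(uniform_refine_tet(0, 1, 2, 3, midpoint))   # 8 child tetrahedra
```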
Slide 32
Parallel calculation
[Workflow diagram] Geometry (conditions, materials) → coarse mesh generation → partitioning → distribution (communication plan) → parts 1…n → refinement → calculation → results 1…n → merge → visualize
Slide 33
Parallel calculation
- Calculated using 12 x 8 = 96 MPI processes
- Less than 1 day for 400 time steps
- About 180 GB memory usage
- A single volume mesh of 103 million tetrahedra, split into 96 files (each holding a mesh portion and its results)
Slide 34
Overview
- Introduction
- Preparation and simulation: more efficient partitioning, parallel element splitting
- Post-processing: results cache, merging many partitions, memory usage, off-screen mode
- Conclusions, future lines and acknowledgements
Slide 35
Post-processing
[Workflow diagram] Geometry (conditions, materials) → coarse mesh generation → partitioning → distribution (communication plan) → parts 1…n → refinement → calculation → results 1…n → merge → visualize
Slide 36
Post-processing
Challenges to face:
- A single node
- Big files: tens or hundreds of GB
- Merging: lots of files
- Batch post-processing
- Maintaining generality
Slide 37
Big files: results cache
- Uses a defined memory pool to store results
- Used to cache results stored in files (single files, multiple files, or files being merged)
- The pool holds mesh information, created results (cuts, extrusions, Tcl-generated), and temporal results
- The pool size is user definable
Slide 38
Big files: results cache
[Diagram: the bookkeeping structures, mirrored in the sketch below]
- Results cache table: one RC entry per result, each with a timestamp
- Per-result RC info: the file, the offset within it, the type, and the memory footprint; the granularity is one result
- Open files table: file, handle and type
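The bookkeeping could be mirrored like this (field names are our reading of the diagram, not GiD's actual code):

```python
from dataclasses import dataclass

@dataclass
class ResultRCInfo:
    """Where one result lives on disk and what it costs to load."""
    file: str
    offset: int
    type: str
    memory_footprint: int   # bytes once loaded; granularity is one result

@dataclass
class RCEntry:
    """One slot of the results cache table."""
    result: object          # the loaded data, or None if evicted
    info: ResultRCInfo
    timestamp: float        # updated on every use ("touch on use")

@dataclass
class OpenFileEntry:
    """One slot of the open files table."""
    file: str
    handle: object
    type: str
```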
Slide 39
Big files: results cache
- Verifies the results file(s) and records each result's position in the file and its memory footprint
- The results of the latest analysis step stay in memory
- Results are loaded on demand; the oldest results are unloaded when space is needed
- Entries are touched on use (see the sketch below)
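A minimal sketch of that policy: a fixed memory pool, load on demand, touch on use, evict oldest first. The class and the loader callback are our own names:

```python
from collections import OrderedDict

class ResultsCache:
    """Fixed-size pool of results, evicting the least recently used."""
    def __init__(self, pool_bytes, loader):
        self.pool_bytes = pool_bytes
        self.loader = loader          # loader(key) -> (result, nbytes)
        self.used = 0
        self.entries = OrderedDict()  # key -> (result, nbytes), oldest first

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)            # touch on use
            return self.entries[key][0]
        result, nbytes = self.loader(key)             # load on demand
        while self.used + nbytes > self.pool_bytes and self.entries:
            _, (_, old) = self.entries.popitem(last=False)  # evict oldest
            self.used -= old
        self.entries[key] = (result, nbytes)
        self.used += nbytes
        return result
```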
Slide 40
Big files: results cache
Chinese harbour example:
- 104 GB results file
- 7.6 million tetrahedra
- 2,292 time steps
- 3.16 GB memory usage (2 GB results cache)
Slide 42
Merging many partitions
- Before: 2, 4, … 10 partitions; now: 32, 64, 128, … partitions of a single volume mesh
- Postpone any calculation (see the sketch below): skin extraction, finding boundary edges, smoothed normals, neighbour information, creation of graphical objects
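A sketch of the "postpone any calculation" idea: derived data is built lazily, on first access, instead of during the merge. The skin extraction shown is a generic stand-in (boundary faces appear exactly once), not GiD's implementation:

```python
from collections import Counter
from functools import cached_property

class MergedMesh:
    """Derived data is only computed when the viewer first asks for it."""
    def __init__(self, elements):
        self.elements = elements          # tetrahedra as 4-tuples of node ids

    @cached_property
    def skin(self):
        # Faces seen exactly once belong to the boundary.
        faces = Counter()
        for a, b, c, d in self.elements:
            for f in ((a, b, c), (a, b, d), (a, c, d), (b, c, d)):
                faces[tuple(sorted(f))] += 1
        return [f for f, n in faces.items() if n == 1]

mesh = MergedMesh([(0, 1, 2, 3), (1, 2, 3, 4)])
print(len(mesh.skin))   # computed only now, on first access: 6 boundary faces
```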
Slide 43
Merging many partitions
Telescope example: 23,870,544 tetrahedra (times as min:s)
- Before, 32 partitions: 24:10
- After, 32 partitions: 4:34
- After, 128 partitions: 10:43
- After, single file: 2:16
Slide 44
Merging many partitions
Slide 45
Merging many partitions
Racing car example: 103,671,344 tetrahedra (times as min:s)
- Before, 96 partitions: > 5 hours
- After, 96 partitions: 51:21
- After, single file: 13:25
Slide 46
Memory usage
Around 12 GB of memory used, with a spike of 15 GB (MS Windows) or 17.5 GB (Linux), including:
- Volume mesh (103 million tetrahedra)
- Skin mesh (6 million triangles)
- Several surface and cut meshes
- Stream line search tree
- 2 GB of results cache
- Animations
Slide 47
Pictures
Slide 48
Pictures
Slide 49
Pictures
Slide 50
Batch post-processing: off-screen
GiD with no interaction and no window.
Command line: gid -offscreen [WxH] -b+g batch_file_to_run
Useful to:
- launch costly animations in the background or in a queue
- use GiD as a template generator
- use GiD behind a web server: Flash Video animation
The animation window gained a button that generates the batch file for off-screen GiD, ready to be sent to a batch queue (example below).
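For example, a queue script might wrap the call like this; only the flags shown above are used, while the resolution argument's exact form and the file name are our assumptions:

```python
import subprocess

# Launch GiD off-screen on a hypothetical batch file produced by the
# animation window; "1280x720" stands in for the optional [WxH].
subprocess.run(
    ["gid", "-offscreen", "1280x720", "-b+g", "make_animation.bat"],
    check=True,
)
```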
Slide 51
Animation
Slide 52
Overview
- Introduction
- Preparation and simulation: more efficient partitioning, parallel element splitting
- Post-processing: results cache, merging many partitions, memory usage, off-screen mode
- Conclusions, future lines and acknowledgements
Slide 53
Conclusions
- The implemented improvements helped us achieve the milestone: prepare, mesh, calculate and visualize a CFD simulation with 103 million tetrahedra
- GiD: modest machines also benefit from these improvements
Slide 54
Future lines
- Faster tree creation for stream lines; currently ~90 s creation time, 2-3 s per stream line
- Mesh simplification, LOD (level of detail) criteria for geometry and results
- Surface meshes, iso-surfaces, cuts: faster drawing
- Volume meshes: faster cuts and stream lines
- Near real-time visualization
- Parallelize other algorithms in GiD: skin and boundary edge extraction, parallel creation of cuts and stream lines
Slide 55
Challenges
- 10^9 to 10^10 tetrahedra, 6x10^8 to 6x10^9 triangles
- A large workstation with Infiniband to the cluster and 80 GB or 800 GB of RAM? Hard disk?
- Post-processing as the backend of a web server in the cluster? Security issues?
- Post-processing embedded in the solver? Output of both the original mesh and a simplified one?
Slide 56
Acknowledgements
- Ministerio de Ciencia e Innovación, E-DAMS project
- European Commission, Real-time project
Slide 57
Comments, questions?
Slide 58
Thanks for your attention
Scalable System for Large Unstructured Mesh Simulation