Scalable System for Large Unstructured Mesh Simulation
Miguel A. Pasenau, Pooyan Dadvand, Jordi Cotela, Abel Coll and Eugenio Oñate
Slide 2
Overview
- Introduction
- Preparation and simulation: more efficient partitioning, parallel element splitting
- Post-processing: results cache, merging many partitions, memory usage, off-screen mode
- Conclusions, future lines and acknowledgements
Slide 4
Introduction
- Education: Masters in Numerical Methods, trainings, seminars, etc.
- Publishing: magazines, books, etc.
- Research: PhDs, conferences, projects, etc.
- One of the international centers of excellence on Simulation-Based Engineering and Sciences [Glotzer et al., WTEC Panel Report on International Assessment of Research and Development in Simulation-Based Engineering and Science, World Technology Evaluation Center (wtec.org), 2009].
Slide 5
Introduction: simulation of structures
Slide 6
Introduction: CFD, Computational Fluid Dynamics
Slide 8
Introduction: the simulation cycle
- Geometry description: provided by CAD or created in GiD
- Preparation of analysis data (GiD)
- Computer analysis
- Visualization of results (GiD)
Slide 9
Introduction: analysis data generation
- Read in and correct CAD data
- Assignment of boundary conditions
- Definition of analysis parameters
- Assignment of material properties, etc.
- Generation of analysis data
Slide 10
Introduction: visualization of numerical results
- Deformed shapes, temperature distributions, pressures, etc.
- Vector and contour plots, graphs and line diagrams, result surfaces
- Animated sequences
- Particle line flow diagrams
Slide 12
Introduction: the goal
Run a CFD simulation with 100 million elements using in-house tools.
Hardware: a cluster with
- Master node: 2 x Intel Quad Core E5410, 32 GB RAM; 3 TB disk with a dedicated Gigabit link to the master node
- 10 nodes: 2 x Intel Quad Core E5410, 16 GB RAM each
- 2 nodes: 2 x AMD Opteron Quad Core 2356, 32 GB RAM each
- Total: 96 compute cores and 224 GB RAM available (excluding the master node)
- Infiniband 4x DDR, 20 Gbps
Slide 13
Introduction: airflow around an F1 car model
Slide 14
Introduction: the tools
- Kratos: multi-physics, open-source framework, parallelized for shared- and distributed-memory machines
- GiD: geometry handling and data management, first coarse mesh, merging and post-processing of results
Slide 15
Introduction
[Workflow diagram] Geometry (conditions, materials) → coarse mesh generation → partitioning → distribution (communication plan) → parts 1…n → refinement → calculation → results 1…n → merge → visualize
Slide 16
Overview
- Introduction
- Preparation and simulation: more efficient partitioning, parallel element splitting
- Post-processing: results cache, merging many partitions, memory usage, off-screen mode
- Conclusions, future lines and acknowledgements
Slide 17
Preparation and simulation
[Workflow diagram] Geometry (conditions, materials) → coarse mesh generation → partitioning → distribution (communication plan) → parts 1…n → refinement → calculation → results 1…n → merge → visualize
Slide 18
Meshing
A single workstation has limited memory and time, so meshing is done in three steps (a quick element-count check follows):
- Single node: GiD generates a coarse mesh of 13 million tetrahedra
- Single node: Kratos + METIS divide and distribute the mesh
- In parallel: Kratos refines the mesh locally
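As a rough sanity check (our assumption: roughly one uniform 1:8 refinement level over the whole domain), the coarse-mesh size is consistent with the final element count reported later:

```python
# Back-of-envelope check (assumption: one uniform 1:8 refinement level).
coarse_tets = 13_000_000        # coarse mesh generated by GiD
children_per_tet = 8            # one split level: 1 tetrahedron -> 8 children
print(coarse_tets * children_per_tet)  # 104,000,000, close to the 103 M used later
```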
Slide 19
Preparation and simulation
[Workflow diagram] Geometry (conditions, materials) → coarse mesh generation → partitioning → distribution (communication plan) → parts 1…n → refinement → calculation → results 1…n → merge → visualize
Slide 20
Efficient partitioning: before
Rank 0 reads the model, partitions it, and sends the partitions to the other ranks.
[Diagram: ranks 0 to 3]
Slide 22
Efficient partitioning: before (sketch below)
- Requires large memory on node 0
- Uses cluster time for partitioning, which could be done outside
- Each rerun needs repartitioning
- Same working procedure for OpenMP and MPI runs
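A minimal sketch of this old flow with mpi4py (our reconstruction, not the actual Kratos code); the model reader and the METIS call are replaced by trivial stand-ins so the script runs as-is under mpirun:

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank == 0:
    elements = list(range(1000))                      # stand-in for the whole model
    parts = [elements[i::size] for i in range(size)]  # stand-in for METIS
else:
    parts = None

# Rank 0 must hold the full model *and* every partition at once:
# the memory bottleneck described above.
my_part = comm.scatter(parts, root=0)
print(f"rank {rank}: {len(my_part)} elements")
```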
Slide 23
Efficient partitioning: now (sketch below)
- The partitions are divided and written on another machine
- Each rank reads its own data separately
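A minimal sketch of the new flow, again with mpi4py; the partition file naming is a hypothetical convention, not necessarily Kratos's:

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each rank opens only its own pre-written partition file; no rank
# ever needs to hold the whole model.
with open(f"model_part_{rank}.mdpa") as f:   # hypothetical naming, one file per rank
    my_part = f.read()

print(f"rank {rank}: read {len(my_part)} bytes")
```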
Slide 24
Preparation and simulation
[Workflow diagram] Geometry (conditions, materials) → coarse mesh generation → partitioning → distribution (communication plan) → parts 1…n → refinement → calculation → results 1…n → merge → visualize
Slide 25
Local refinement: triangle
[Diagram: splitting patterns for a triangle (nodes i, j, k) with edge midpoints l, m, n, producing 2, 3 or 4 child triangles depending on how many edges are split]
Slide 26
Local refinement: triangle
- The splitting case is selected according to the node Ids
- The decision is not driven by element quality, but it is very good for parallelization (OpenMP and MPI): every partition makes the same choice independently (see the sketch below)
[Diagram: the two possible diagonals when two edges are split]
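A sketch of the idea; the tie-breaking rule shown is hypothetical, Kratos's actual rule may differ, but any rule based only on global node Ids has the same property:

```python
def split_two_edges(i, j, k, l, m):
    """Triangle (i, j, k) with midpoint l on edge i-j and m on edge j-k.
    The remaining quad (i, l, m, k) can be cut along diagonal i-m or l-k;
    basing the choice only on global node Ids lets every OpenMP thread or
    MPI rank decide identically, with no communication."""
    children = [(l, j, m)]                    # corner triangle at node j
    if min(i, m) < min(l, k):                 # deterministic, Id-based choice
        children += [(i, l, m), (i, m, k)]    # diagonal i-m
    else:
        children += [(i, l, k), (l, m, k)]    # diagonal l-k
    return children

print(split_two_edges(10, 42, 7, 55, 56))     # same answer on every partition
```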
Slide 27
Local refinement: tetrahedron
[Diagram: father element and its child elements]
Slide 28
Local refinement: examples
Slide 29
Local refinement: examples
Slide 30
Local refinement: examples
Slide 31
Local refinement: uniform
- A uniform refinement can be used to obtain a mesh with 8 times more elements (a generic sketch follows)
- It does not improve the geometry representation
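For illustration, a self-contained sketch of the classic 1:8 uniform split (Bey-style, cutting the central octahedron along one conventional diagonal); this is a generic scheme, not necessarily the exact one implemented in Kratos:

```python
def uniform_refine_tet(n0, n1, n2, n3, midpoint):
    """Split one tetrahedron into 8 children via edge midpoints.
    midpoint(a, b) must return the node id of the midpoint of edge
    (a, b), shared consistently across neighbouring elements."""
    m01, m02, m03 = midpoint(n0, n1), midpoint(n0, n2), midpoint(n0, n3)
    m12, m13, m23 = midpoint(n1, n2), midpoint(n1, n3), midpoint(n2, n3)
    return [
        # four corner tetrahedra
        (n0, m01, m02, m03), (m01, n1, m12, m13),
        (m02, m12, n2, m23), (m03, m13, m23, n3),
        # central octahedron cut along the m02-m13 diagonal
        (m01, m02, m03, m13), (m01, m02, m12, m13),
        (m02, m03, m13, m23), (m02, m12, m13, m23),
    ]

# Usage sketch: a midpoint "factory" handing out consistent new ids.
mids, next_id = {}, 100          # assume original node ids are below 100
def midpoint(a, b):
    global next_id
    key = (min(a, b), max(a, b))
    if key not in mids:
        mids[key] = next_id
        next_id += 1
    return mids[key]

print(uniform_refine_tet(0, 1, 2, 3, midpoint))   # 8 child tetrahedra
```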
Slide 32
Parallel calculation
[Workflow diagram] Geometry (conditions, materials) → coarse mesh generation → partitioning → distribution (communication plan) → parts 1…n → refinement → calculation → results 1…n → merge → visualize
Slide 33
Parallel calculation
- Calculated using 12 x 8 = 96 MPI processes
- Less than 1 day for 400 time steps
- About 180 GB memory usage
- A single volume mesh of 103 million tetrahedra, split into 96 files (each holding a mesh portion and its results)
Slide 34
Overview
- Introduction
- Preparation and simulation: more efficient partitioning, parallel element splitting
- Post-processing: results cache, merging many partitions, memory usage, off-screen mode
- Conclusions, future lines and acknowledgements
Slide 35
Post-processing
[Workflow diagram] Geometry (conditions, materials) → coarse mesh generation → partitioning → distribution (communication plan) → parts 1…n → refinement → calculation → results 1…n → merge → visualize
Slide 36
Post-processing
Challenges to face:
- A single node
- Big files: tens or hundreds of GB
- Merging: lots of files
- Batch post-processing
- Maintaining generality
Slide 37
Big files: results cache
- Uses a defined memory pool to store results
- Used to cache results stored in files (single files, multiple files, or files being merged)
- The pool holds mesh information, created results (cuts, extrusions, Tcl-generated), and temporal results
- The pool size is user definable
Slide 38
Big files: results cache
[Diagram: the bookkeeping structures, mirrored in the sketch below]
- Results cache table: one RC entry per result, each with a timestamp
- Per-result RC info: the file, the offset within it, the type, and the memory footprint; the granularity is one result
- Open files table: file, handle and type
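The bookkeeping could be mirrored like this (field names are our reading of the diagram, not GiD's actual code):

```python
from dataclasses import dataclass

@dataclass
class ResultRCInfo:
    """Where one result lives on disk and what it costs to load."""
    file: str
    offset: int
    type: str
    memory_footprint: int   # bytes once loaded; granularity is one result

@dataclass
class RCEntry:
    """One slot of the results cache table."""
    result: object          # the loaded data, or None if evicted
    info: ResultRCInfo
    timestamp: float        # updated on every use ("touch on use")

@dataclass
class OpenFileEntry:
    """One slot of the open files table."""
    file: str
    handle: object
    type: str
```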
Slide 39
Big files: results cache
- Verifies the results file(s) and records each result's position in the file and its memory footprint
- The results of the latest analysis step stay in memory
- Results are loaded on demand; the oldest results are unloaded when space is needed
- Entries are touched on use (see the sketch below)
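A minimal sketch of that policy: a fixed memory pool, load on demand, touch on use, evict oldest first. The class and the loader callback are our own names:

```python
from collections import OrderedDict

class ResultsCache:
    """Fixed-size pool of results, evicting the least recently used."""
    def __init__(self, pool_bytes, loader):
        self.pool_bytes = pool_bytes
        self.loader = loader          # loader(key) -> (result, nbytes)
        self.used = 0
        self.entries = OrderedDict()  # key -> (result, nbytes), oldest first

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)            # touch on use
            return self.entries[key][0]
        result, nbytes = self.loader(key)             # load on demand
        while self.used + nbytes > self.pool_bytes and self.entries:
            _, (_, old) = self.entries.popitem(last=False)  # evict oldest
            self.used -= old
        self.entries[key] = (result, nbytes)
        self.used += nbytes
        return result
```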
Slide 40
Big files: results cache
Chinese harbour example:
- 104 GB results file
- 7.6 million tetrahedra
- 2,292 time steps
- 3.16 GB memory usage (2 GB results cache)
Slide 42
Merging many partitions
- Before: 2, 4, … 10 partitions; now: 32, 64, 128, … partitions of a single volume mesh
- Postpone any calculation (see the sketch below): skin extraction, finding boundary edges, smoothed normals, neighbour information, creation of graphical objects
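A sketch of the "postpone any calculation" idea: derived data is built lazily, on first access, instead of during the merge. The skin extraction shown is a generic stand-in (boundary faces appear exactly once), not GiD's implementation:

```python
from collections import Counter
from functools import cached_property

class MergedMesh:
    """Derived data is only computed when the viewer first asks for it."""
    def __init__(self, elements):
        self.elements = elements          # tetrahedra as 4-tuples of node ids

    @cached_property
    def skin(self):
        # Faces seen exactly once belong to the boundary.
        faces = Counter()
        for a, b, c, d in self.elements:
            for f in ((a, b, c), (a, b, d), (a, c, d), (b, c, d)):
                faces[tuple(sorted(f))] += 1
        return [f for f, n in faces.items() if n == 1]

mesh = MergedMesh([(0, 1, 2, 3), (1, 2, 3, 4)])
print(len(mesh.skin))   # computed only now, on first access: 6 boundary faces
```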
Slide 43
Merging many partitions
Telescope example: 23,870,544 tetrahedra (times as min:s)
- Before, 32 partitions: 24:10
- After, 32 partitions: 4:34
- After, 128 partitions: 10:43
- After, single file: 2:16
Slide 44
Merging many partitions
Slide 45
Merging many partitions
Racing car example: 103,671,344 tetrahedra (times as min:s)
- Before, 96 partitions: > 5 hours
- After, 96 partitions: 51:21
- After, single file: 13:25
Slide 46
Memory usage
Around 12 GB of memory used, with a spike of 15 GB (MS Windows) or 17.5 GB (Linux), including:
- Volume mesh (103 million tetrahedra)
- Skin mesh (6 million triangles)
- Several surface and cut meshes
- Stream line search tree
- 2 GB of results cache
- Animations
Slide 47
Pictures
Slide 48
Pictures
Slide 49
Pictures
Slide 50
Batch post-processing: off-screen
GiD with no interaction and no window.
Command line: gid -offscreen [WxH] -b+g batch_file_to_run
Useful to:
- launch costly animations in the background or in a queue
- use GiD as a template generator
- use GiD behind a web server: Flash Video animation
The animation window gained a button that generates the batch file for off-screen GiD, ready to be sent to a batch queue (example below).
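For example, a queue script might wrap the call like this; only the flags shown above are used, while the resolution argument's exact form and the file name are our assumptions:

```python
import subprocess

# Launch GiD off-screen on a hypothetical batch file produced by the
# animation window; "1280x720" stands in for the optional [WxH].
subprocess.run(
    ["gid", "-offscreen", "1280x720", "-b+g", "make_animation.bat"],
    check=True,
)
```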
Slide 51
Animation
Slide 52
Overview
- Introduction
- Preparation and simulation: more efficient partitioning, parallel element splitting
- Post-processing: results cache, merging many partitions, memory usage, off-screen mode
- Conclusions, future lines and acknowledgements
Slide 53
Conclusions
- The implemented improvements helped us achieve the milestone: prepare, mesh, calculate and visualize a CFD simulation with 103 million tetrahedra
- GiD: modest machines also benefit from these improvements
Slide 54
Future lines
- Faster tree creation for stream lines; currently ~90 s creation time, 2-3 s per stream line
- Mesh simplification, LOD (level of detail) criteria for geometry and results
- Surface meshes, iso-surfaces, cuts: faster drawing
- Volume meshes: faster cuts and stream lines
- Near real-time visualization
- Parallelize other algorithms in GiD: skin and boundary edge extraction, parallel creation of cuts and stream lines
Slide 55
Challenges
- 10^9 to 10^10 tetrahedra, 6x10^8 to 6x10^9 triangles
- A large workstation with Infiniband to the cluster and 80 GB or 800 GB of RAM? Hard disk?
- Post-processing as the backend of a web server in the cluster? Security issues?
- Post-processing embedded in the solver? Output of both the original mesh and a simplified one?
Slide 56
Acknowledgements
- Ministerio de Ciencia e Innovación, E-DAMS project
- European Commission, Real-time project
Slide 57
Comments, questions?
Slide 58
Thanks for your attention
Scalable System for Large Unstructured Mesh Simulation