Numerical Methods in Geotechnical Engineering – Benz & Nordal (eds) © 2010 Taylor & Francis Group, London, ISBN 978-0-415-59239-0

3D parallel computing FEA in offshore foundation design

Lars Andresen, Hendrik Sturm, Malte Vöge & Kristoffer Skau
Norwegian Geotechnical Institute, NGI

ABSTRACT: Several large scale 3d finite element analyses for the design of offshore foundations have recently been carried out at NGI. The commercial software code Abaqus was used for parallel computation on an in-house computer cluster in order to obtain a reduction of the required calculation time (speedup). This paper describes the computer environment, key data of the finite element models and the speedup obtained for two design problems. In addition, results from a systematic benchmark study on two typical use cases of geotechnical engineering are presented.

1 INTRODUCTION

Parallel computing is a technique that allows for the computation of complex boundary value problems with a large number of degrees of freedom (DOF). Recently, several 3d finite element analyses have been performed at NGI, mainly for the design of foundations for offshore structures. For this purpose, the commercial FE-program Abaqus/Standard Version 6.9 has been employed. It incorporates message passing interface (MPI) and thread based parallelisation techniques.

The boundary value problems presented are characterised by complex 3d geometries which require detailed meshes to minimise discretisation (mesh) errors. Accounting also for soil-structure interaction (SSI) and highly non-linear soil response, we have to deal with large scale problems with up to 2.5 million DOFs, computer memory requirements of up to 40 GB and computer run times of several days. The calculations are run on an in-house computer cluster consisting of several multi-core computers running a Linux operating system.

The objective of this paper is to present experiences gained on parallel computation of typical geotechnical problems. The results are supplemented by a systematic benchmark test based on two representative use cases. The aim of this study is to demonstrate the feasibility and usability of presently available software tools for parallel computation of geotechnical problems. The theoretical and mathematical background of parallel computation is not presented. For details on programming aspects, we refer to the relevant literature (e.g. Smith and Griffiths 2004).

2 COMPUTATIONAL ENVIRONMENT

The hardware setup that is used for parallel computations is a cluster of 4 HP ProLiant BL460c G1 compute nodes. Each of these nodes contains 2 Intel Xeon QuadCore 3 GHz processors (CPU) with 4 cores each. Thus, the cluster provides a total of 32 (processor) cores for parallel computations. Three of the nodes provide 16 GB memory, shared by the respective cores. One node provides 40 GB memory for exceptionally large models. An additional HP ProLiant DL360 G5 Base server with an Intel Xeon DualCore 2.3 GHz processor serves as master to the cluster nodes. The master is not involved in any computation, but merely provides the cluster nodes with essential services, e.g. system image provisioning, user disk space and the cluster job queue. Master and nodes are connected via 1 GBit Ethernet and form a local cluster network. Only the master server is connected to NGI's internal network, see Figure 1.

Figure 1. Schematic hardware system overview. The cluster master provides all necessary services to the diskless nodes, i.e., system image, user disk area and job queue.

The cluster is operated by the Linux operating system (Fedora Core 10). The master server provides a graphical user interface for remote login, so every user of the cluster can submit and monitor jobs from their desktop computer. The cluster nodes boot via the network and load a slim Linux system from the master onto a RAM disk. This task is performed by the provisioning system Perceus (The official Perceus/warewulf cluster portal 2009). Abaqus and the user directories on the master node are mounted onto the compute nodes via the network. In addition, each node contains a 250 GB hard drive for local storage, e.g. for out of core computations.

The job queue on the master is a Torque/PBS system (Cluster Resources, Inc. 2009). When a modelling job is performed on multiple cores of a single compute node, Abaqus accomplishes the parallelisation of the calculation by shared memory communication, so-called thread based parallelisation. When a job is submitted to multiple nodes, the parallelisation is accomplished via MPI communication. The particular implementation that Abaqus uses is HP-MPI, which ships with Abaqus. In case a cluster system makes use of special network communication hardware, Abaqus can be configured to use a different MPI implementation. However, since the present cluster system uses standard 1 GBit Ethernet network communication, we have used HP-MPI for the calculations presented in this article.
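To make the submission workflow concrete, the sketch below shows how such a Torque/PBS job could be generated and submitted from Python. It is a minimal illustration under stated assumptions rather than a description of NGI's actual setup: the queue name, resource line and file names are hypothetical, and the Abaqus command line options (job, input, cpus, mp_mode) should be checked against the locally installed Abaqus documentation.

import subprocess
import textwrap

def submit_abaqus_job(job_name, input_file, nodes=2, cores_per_node=8,
                      queue="cluster"):
    """Build a Torque/PBS script for a parallel Abaqus run and submit it
    with qsub. All names and options are illustrative, not prescriptive."""
    total_cores = nodes * cores_per_node
    # mp_mode=mpi distributes the job over several nodes via MPI;
    # on a single node, mp_mode=threads would use shared-memory threads.
    script = textwrap.dedent(f"""\
        #!/bin/bash
        #PBS -N {job_name}
        #PBS -q {queue}
        #PBS -l nodes={nodes}:ppn={cores_per_node}
        cd $PBS_O_WORKDIR
        abaqus job={job_name} input={input_file} \\
               cpus={total_cores} mp_mode=mpi interactive
        """)
    # qsub reads the job script from stdin and returns the job id.
    result = subprocess.run(["qsub"], input=script, text=True,
                            capture_output=True, check=True)
    return result.stdout.strip()

if __name__ == "__main__":
    print(submit_abaqus_job("troll_stiffness", "troll_model.inp"))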

3 FINITE ELEMENT MODELS

Recently, several large scale 3d finite element analyses for the design work of offshore foundations have been carried out at NGI. The problems have been solved on the computational environment described in Section 2 by utilising parallel computation. Two such analyses are briefly presented.

3.1 Rotational stiffness of Troll A

Troll A is a concrete gravity base platform installed in 1995 in the Norwegian trench at a water depth of 305 m. The foundation design is described in Hansen et al. (1992).

Recently, NGI has performed re-calculations of the platform in order to predict the serviceability, i.e. the updated rotational stiffness and the cyclic displacement during a design storm. The FE-model of the foundation subjected to the design load is shown in Figure 2. It was discretised with 266576 C3D10H¹ elements, which resulted in 1.33 million DOFs. The soil was described with an in-house non-linear elastic user material while the structure was modelled as linear elastic.

The non-linear curve for the rotation versus the applied overturning moment was established by the incremental, iterative automatic step size procedure of Abaqus using a direct sparse solver and linear extrapolation.

¹ 10-noded quadratic tetrahedron element with hybrid formulation.

Figure 2. Abaqus FE-model in deformed shape and contours of deformation during maximum wave loading.

Figure 3. Achieved speedup for running the Troll FE-model on 1, 4, 6 and 8 cores.

The achieved normalised speedup Sp for different numbers of employed cores is shown in Figure 3. Although Sp is smaller than the theoretically possible linear speedup, the computation time of one simulation could be reduced to one fifth when employing 8 cores.

3.2 Capacity of shallow skirted foundations

NGI has been responsible for the foundation design of the Adriatic Sea LNG terminal and the Sakhalin 1 Arkutun-Dagi platform. Both are founded on flat grouted concrete bases of ≈100 m width and ≈100–200 m length equipped with a system of short (≈1 m) corrugated steel skirts that penetrate into the seabed to provide additional horizontal capacity. Large deformation FEA has been used to calculate the ultimate bearing capacity under combined vertical, horizontal and overturning moment loading. The term ultimate capacity, or failure, is understood as a zero stiffness, fully plastic failure mode. The soil behaviour has been described by a linear-elastic, perfectly plastic material incorporating a Tresca failure criterion.

Figure 4. Different mesh refinements around the skirts.

Table 1. Normalised computation time t for one iteration for the different meshes shown in Figure 4.

Mesh   No. of elements   No. of DOFs   No. of cores/nodes   Normalised time t
A       34 818             187 021       16/2                 1.19
B       75 982             400 153       16/2                 4.03
C      190 906             999 521       16/2                 7.85
D      368 914           1 907 922       16/2                17.63
E      506 566           2 619 917       14/4                21.77

Since the capacity Fult predicted with the finite element method is generally mesh dependent, a mesh refinement study has been carried out. The refinement increases from Mesh A to Mesh E and was concentrated around the skirts, since the developed failure mechanism was close to the base of the foundation. The element type used for the discretisation was again the C3D10H element. Detailed views of the corresponding meshes are shown in Figure 4(a) to 4(e); corresponding key data are listed in Table 1.

The asymptotic convergence of Fult towards a constant value with increasing refinement, i.e. number of elements, is shown in Figure 5. The computed capacity has been normalised with the design wave load Fd. The predicted capacity of Mesh E seems to represent the converged value; hence one can conclude that the coarsest Mesh A predicts an overshoot of almost 4%, while the medium coarse Mesh C predicts only a small overshoot for the production runs.
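As a sketch of the convergence check described above, the snippet below computes the overshoot of each mesh relative to the finest mesh, which is taken as the converged reference. The capacity values are hypothetical placeholders, not the data behind Figure 5; they are merely chosen so that the coarsest mesh overshoots by roughly the 4% quoted in the text.

# Hypothetical normalised capacities Fult/Fd for meshes A-E; the real
# values belong to Figure 5 and are not reproduced here.
capacity = {"A": 1.30, "B": 1.28, "C": 1.26, "D": 1.255, "E": 1.25}

# The finest mesh (E) is taken as the converged reference value.
F_ref = capacity["E"]

for mesh, F in capacity.items():
    overshoot = (F - F_ref) / F_ref * 100.0
    print(f"Mesh {mesh}: overshoot = {overshoot:+.1f} %")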

From Table 1 it can be seen that mesh refinement is accompanied by a significant increase of the required calculation time, which is the wall clock time t_user normalised by the number of iterations n_iterations and the number of employed cores n_cores, defined as

t = t_user / (n_iterations · n_cores)

Figure 5. Normalised capacity versus no. of elements for different mesh refinements for the models shown in Figure 4.

Figure 6. Computation time versus no. of DOFs.

The normalised time t̄ = t_Mesh X/t_Mesh A versus the normalised number of DOFs, DOF = DOF_Mesh X/DOF_Mesh A, is plotted in Figure 6. It can be seen that t̄ increases faster than the number of DOFs, which indicates a decreasing efficiency of the parallel computation with an increasing number of employed cores.
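The trend in Figure 6 can be recomputed directly from the data in Table 1. The short script below, a minimal sketch using only the table values, forms the normalised DOF and time ratios with Mesh A as reference and confirms that the computation time grows faster than the number of DOFs.

# Key data from Table 1: (no. of DOFs, normalised time t) per mesh.
meshes = {
    "A": (187_021, 1.19),
    "B": (400_153, 4.03),
    "C": (999_521, 7.85),
    "D": (1_907_922, 17.63),
    "E": (2_619_917, 21.77),
}

dof_A, t_A = meshes["A"]
for name, (dof, t) in meshes.items():
    dof_ratio = dof / dof_A   # normalised DOF  = DOF_Mesh X / DOF_Mesh A
    t_ratio = t / t_A         # normalised time = t_Mesh X / t_Mesh A
    print(f"Mesh {name}: DOF ratio = {dof_ratio:5.2f}, "
          f"time ratio = {t_ratio:5.2f}")
# For every refinement step the time ratio exceeds the DOF ratio,
# i.e. the cost grows faster than the problem size.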

4 SPEEDUP

There are mainly two different use cases in which one may wish – and expect – a significant performance increase, i.e. a speedup of the calculation time, when performing parallel computation. The first case is complex 3d soil-structure interaction (SSI) problems, like the examples presented in Section 3. The number of DOFs is generally large and a single force equilibrium iteration may take several minutes. The second case is boundary value problems consisting of rather simple geometries but highly non-linear loading conditions. If in addition non-linear constitutive models are used for the description of the stress-strain behaviour of the soils, the calculation time increases significantly due to the mandatory small time stepping. Both use cases are discussed in the following in detail with respect to their performance with parallel computation.

The achievable performance increase depends onseveral different factors:

– model discretisation: geometry, no. of DOFs, boundary conditions and output requests;

– software: FE-program (mainly the solver), the MPI implementation and the operating system (OS);

– hardware: CPU speed (governed mainly by the clock rate, system bus bandwidth and cache size), RAM (random-access memory) size, read/write speed of the hard disk drive or the RAID (redundant array of independent disks), the employed chipset (responsible for the communication between the different hardware components such as CPU, RAM, hard drive, network, . . .) and the network system (e.g. Ethernet, FDDI).

In order to judge the efficiency of a parallel computation over a computation on a single core for a specific use case, the so-called speedup factor Sp is employed, which is defined as

Sp = (t_user / n_iterations)_1 core / (t_user / n_iterations)_n cores

with t_user being the total calculation time and n_iterations the no. of iterations. The simulation on one core serves as reference.
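A minimal sketch of how this speedup factor could be evaluated from measured run data is given below. The wall clock times and iteration counts are hypothetical and serve only to illustrate the definition; the single-core run is the reference.

def speedup(t_user_ref, n_iter_ref, t_user_p, n_iter_p):
    """Speedup factor Sp: time per iteration on one core divided by
    time per iteration on p cores (single-core run as reference)."""
    return (t_user_ref / n_iter_ref) / (t_user_p / n_iter_p)

# Hypothetical wall clock times (s) and iteration counts for illustration.
runs = {1: (36_000, 120), 4: (12_600, 120), 8: (7_200, 120)}
t1, n1 = runs[1]
for cores, (t, n) in runs.items():
    print(f"{cores} cores: Sp = {speedup(t1, n1, t, n):.2f}")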

Throughout all simulations the Direct Sparse Solver has been used, although the Iterative Solver based on a domain decomposition method is supposed to be more suitable for geotechnical problems, since the models are generally very compact. However, the Iterative Solver in Abaqus has some restrictions which often cannot be fulfilled for typical geotechnical problems, e.g. the requirement of symmetric matrices.

4.1 2d FE-model

Figure 7 shows the boundary value problem, a 2d plane strain FE-model of an embedded foundation with a flat tip. Undrained soil behaviour was assumed in the simulations, approximated by means of a simple linear elastic, perfectly plastic model incorporating a von Mises failure criterion with a normalised shear modulus of G/su = 150.

The model has 183009 DOFs and was discretised with 25981 CPE6H elements, a 6-noded quadratic plane strain triangular element with a hybrid formulation. The loading history consists of two load steps; after establishing an initial force equilibrium state, the foundation is pushed downwards by 25 cm, which corresponds to 5 times the thickness of the foundation.

Figure 7. The plane strain FE-model of a penetration problem used for the benchmark tests.

Figure 8. Achievable speedup depending on the no. of coresand nodes of the 2d model.

This model has been chosen for the benchmark test because it requires very small time stepping due to the large deformations. Hence it represents the second use case described above.

Figure 8 shows the achieved speedup Sp depending on the no. of used cores and nodes. In addition, the ideal speedup curve is plotted in the figure. It represents an idealised linear proportional increase of Sp with the no. of employed cores n_cores.

The most obvious observation in Figure 8 is that the maximum achieved speedup is only Sp = 3.5, obtained for the simulation of the plane strain model on 8 cores and 1 node. This indicates that the communication between the cores is significant compared to the actual calculation time. Since the communication within a node is generally faster than between nodes, a simulation distributed over several nodes is slower. However, a simulation on 4 nodes is somewhat faster than on 2 nodes. This becomes plausible when the connections between the nodes are considered: while 2 nodes are connected via 1 cable, 4 nodes are connected via 3 cables, which theoretically allows more communication processes to run simultaneously.

Figure 9. Average core load depending on the no. of cores and nodes of the 2d model.

Figure 10. Achievable speedup depending on the no. of cores and iterations of the 2d model. Simulations were performed on 1 node.

The increase of communication time with an increasing no. of employed cores and nodes, respectively, also becomes apparent from Figure 9, which shows the average core load for the different simulations. The simulation on 4 nodes even causes somewhat less load than the corresponding simulation on 2 nodes, which indicates a more efficient node-internal communication if fewer cores per node are employed.
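How the average core load could be sampled during a run is sketched below: the script reads the 1-minute load average from /proc/loadavg on each compute node and relates it to the number of cores per node. The host names and the assumption of passwordless SSH from the master are illustrative and not part of the setup described in this paper.

import subprocess

NODES = ["node1", "node2", "node3", "node4"]   # hypothetical host names
CORES_PER_NODE = 8

def load_average(node):
    """Return the 1-minute load average from /proc/loadavg on a node.
    Assumes passwordless SSH access from the cluster master."""
    out = subprocess.run(["ssh", node, "cat", "/proc/loadavg"],
                         capture_output=True, text=True, check=True)
    return float(out.stdout.split()[0])

if __name__ == "__main__":
    for node in NODES:
        load = load_average(node)
        # Load divided by the core count approximates the average core load.
        print(f"{node}: load {load:.2f} "
              f"({100.0 * load / CORES_PER_NODE:.0f} % of {CORES_PER_NODE} cores)")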

Figure 10 presents the achieved speedup for the same simulations but with different extrapolation methods. The influence of the no. of iterations performed within one increment on the overall speedup is likewise small, although the no. of iterations was almost 40 times higher without extrapolation compared to linear extrapolation. Similar observations can be made for the iterative solver. The effective speedup is almost identical to that of the direct solver, which contradicts the recommendation given by Abaqus.

Figure 11. A 3d FE-model of a cavity expansion problem.

Figure 12. Achievable speedup depending on the no. of cores and nodes of the 3d model.

4.2 3d FE-model

The second example used for the benchmark test is shown in Figure 11. It is a 3d model of a cavity expansion problem. It was discretised with 177600 C3D8RH elements, a linear 8-noded brick element with reduced integration and a hybrid formulation. The model has 647136 DOFs. The material definition used was the same as for the 2d model.

This model was chosen because of its homogeneous and isotropic material behaviour as well as its axisymmetric loading and boundary conditions, which are assumed to perform well in parallel computation.

This assumption seems to be confirmed by the achieved speedup, as shown in Figure 12. The no. of employed nodes does not affect the speedup. However, the corresponding core load presented in Figure 13 shows a similar decreasing efficiency with an increasing no. of cores and nodes, respectively. The explanation is the same as stated for the 2d model; the average speed loss caused by the network traffic is compensated by several positive effects, such as the larger available RAM per core with an increasing no. of nodes.


Figure 13. Average core load depending on the no. of coresand nodes of the 3d model.

5 CONCLUDING REMARKS

The parallel computation functionality of the commercial FE-program Abaqus/Standard, executed on an in-house cluster at NGI, has been used over the last years for several simulations of large scale 3d geotechnical problems. This paper presents the employed computational environment and the achieved speedup, both in simulations for the design of offshore foundations and in a benchmark test.

In all examples shown here, parallel computing has given a reduction of the required calculation time. This is in accordance with the benchmark tests provided by Abaqus² and has been confirmed earlier by e.g. Henke and Hügel (2007) for Abaqus/Explicit. A decrease of the speedup Sp with a further increase of employed cores, as shown by Smith (2000), could not be observed.

² http://www.simulia.com/support/v69/v69_performance.php

However, the speedup increase is less distinct than generally reported. The highest speedup achieved was Sp = 5 for the 3d models run on 8 cores, corresponding to a ratio Sp/n_cores of about 0.6. A further increase of n_cores did not show considerably larger speedup values. The 2d model performed even worse. This can be explained by the increasing number of communication processes as well as by the restrictions of the used software with respect to the solver. Generally the iterative solver is recommended for parallel computation of very compact geometries, but it can often not be used for geotechnical problems, e.g. if constitutive models with non-associative flow rules are used. The influence of the communication processes cannot be assessed directly, but can be estimated from the average core load, which decreased significantly with an increasing number of used cores.

REFERENCES

Cluster Resources, Inc. (2009). Torque resource manager. www.clusterresources.com.

Hansen, B., F. Nowacki, E. Skomedal, and J. Hermstad (1992). Foundation design, Troll platform. In BOSS 92, 6th International Conference on the Behaviour of Offshore Structures, Volume 2, London, pp. 921–936.

Henke, S. and H. Hügel (2007). Räumliche Analysen zur quasi-statischen und dynamischen Penetration von Bauteilen in den Untergrund. In Tagungsband zur 19. Deutschen Abaqus-Benutzerkonferenz in Baden-Baden, Number 2.13.

Smith, I. (2000). A general purpose system for finite element analyses in parallel. Engineering Computations 17(1), 75–91.

Smith, I. and V. Griffiths (2004). Programming the Finite Element Method (4th ed.). John Wiley & Sons, Inc.

The official Perceus/warewulf cluster portal (2009). Perceus – Cluster provisioning toolkit. www.perceus.org.
