Recent Progress of Parallel SAMCEF with MUMPS MUMPS User...
Transcript of Recent Progress of Parallel SAMCEF with MUMPS MUMPS User...
Jean-Pierre Delsemme
Product Development Manager Clamart, May 29th and 30th 2013
Recent Progress of Parallel SAMCEF with MUMPS
MUMPS User Group Meeting 2013
Clamart, May 29th and 30th 2013 2. MUMPS User Group Meeting 2013
Summary
SAMCEF, a brief history
Co-simulation, a good candidate for parallel processing
MAAXIMUS, The "Gigadof" challenge
SLEPC/MUMPS, a parallel eigenvalue solver
Test the limit of your machine
Conclusions
Clamart, May 29th and 30th 2013 3. MUMPS User Group Meeting 2013
SAMCEF, a brief history
SAMCEF: general purpose FE software
First line of code written in 1967 at University of Liege
First customer in aeronautic in 1977
Creation of SAMTECH in 1986 (9 people)
Implementation of BCSLIB in 1998
Implementation of MUMPS 2005
Acquisition of 60% of SAMTECH by LMS in 2011
(9 subsidiaries, 300 people, 30 MEuro per year)
Clamart, May 29th and 30th 2013 4. MUMPS User Group Meeting 2013
3rd participation…
MUMPS User Group Meeting, Toulouse April 15th an 16th 2010
Clamart, May 29th and 30th 2013 5. MUMPS User Group Meeting 2013
Parallel computing in SAMCEF
What is parallel computing in SAMCEF ?
ASEF Linear static analysis Use of MUMPS for linear system resolution
MECANO Non-linear static and dynamic
analysis
Elements generation / System resolution /
Results archive (depending on PARALL)
DYNAM Modal analysis Use of SLEPC and MUMPS for
eigenvalues problem resolution
STABI Linear buckling analysis Use of SLEPC and MUMPS for
eigenvalues problem resolution
BACON Ray Tracing
(.MCRT command) Distribution of ray tracing computation
Clamart, May 29th and 30th 2013 6. MUMPS User Group Meeting 2013
Summary
SAMCEF, a brief history
Co-simulation, a good candidate for parallel processing
MAAXIMUS, The "Gigadof" challenge
SLEPC/MUMPS, a parallel eigenvalue solver
Test the limit of your machine
Conclusions
Clamart, May 29th and 30th 2013 7. MUMPS User Group Meeting 2013
MECANO-MECANO Co-simulation
Co-simulation principle
2 fields of unknown
2 governing equations
Function of time
Staggered solution scheme
Field are exchanged at time step
level
Approximation due to delay of the
information
Can lead to numerical problem
0
0
,,
,,,,
tyxg
tyxftyxF
Clamart, May 29th and 30th 2013 8. MUMPS User Group Meeting 2013
MECANO-MECANO Co-simulation
Monolithic solution scheme
Coupling at iteration level
All equations treated at once
Master – slave approach
Internal dof's "11"
Interface dof's "22"
At each iteration, exchange of:
• Interface unknowns q2, R2
• Schur complement of the interface
and linear constrains
S
S
M
M
S
S
M
M
SM
STSS
SS
MTMM
MM
R
R
R
R
d
dq
dq
dq
dq
BB
BKK
KK
BKK
KK
2
1
2
1
2
1
2
1
2221
1211
2221
1211
000
00
000
00
000
SSSSS
TS
KKKKK
B
BK
121121
*
22
*
22
*
22
0
Clamart, May 29th and 30th 2013 9. MUMPS User Group Meeting 2013
MECANO-MECANO Co-simulation
Example: 4 wheel car model
400 000 dof's per tires
2 phase loading
• Tire inflate without contact
• Apply gravity on wheel to force contact with ground
Non optimum automatic
domain decomposition
Clamart, May 29th and 30th 2013 10. MUMPS User Group Meeting 2013
MECANO-MECANO Co-simulation
Example: 8 wheel car model
400 000 dof's per tires
Impressive computation
time reduction
Configuration Time (min) Speed up
monolithic 1 node, 4 procs 182 1.0
monolithic 2 nodes, 8 procs 45 4.0
monolithic 4 nodes, 16 procs 22 8.3
monolithic 8 nodes, 32 procs 28 6.5
New Method 5 nodes, 18 procs 12 15.1
New Method 9 nodes, 34 procs 7 26.0
Clamart, May 29th and 30th 2013 11. MUMPS User Group Meeting 2013
Summary
SAMCEF, a brief history
Co-simulation, a good candidate for parallel processing
MAAXIMUS, The "Gigadof" challenge
SLEPC/MUMPS, a parallel eigenvalue solver
Test the limit of your machine
Conclusions
Clamart, May 29th and 30th 2013 12. MUMPS User Group Meeting 2013
Maaximus, the "Gigadof" challenge
Calculate a fuselage made of,
as much as possible,
identical single barrels
~4,500,000 dof's per barrel
Identification of new limitations
"standardization" of long long integer version
Clamart, May 29th and 30th 2013 13. MUMPS User Group Meeting 2013
Maaximus, the "Gigadof" challenge
3 sections of fuselage
13,956,620 dof's
1,891,176 shell elem.
2322011 elem. In total
7 h 24' on a cluster of 8 nodes,
up to 100% of prescribed load
5 time steps, 1 rejected,
19 iterations
Intel(R) Core(TM) 2.67 GHz
12 Gb per node
Clamart, May 29th and 30th 2013 14. MUMPS User Group Meeting 2013
Maaximus, the "Gigadof" challenge
4 sections of fuselage
18,580,217 dof's
2,521,568 shell elem.
3,092,810 elem. In total
28 h 30' on a cluster
of 10 nodes.
7 time steps, 5 rejected,
57 iterations up to 91% of load
Intel(R) Core(TM) 2.67 GHz
12 Gb per node
Clamart, May 29th and 30th 2013 15. MUMPS User Group Meeting 2013
Maaximus, the "Gigadof" challenge
6 sections of fuselage
27,823,805 dof's
3782746 shell elem.
4,635,044 elem. In total
37h 30’ on a cluster of 8 nodes.
7 time steps, 3 rejected,
41 iterations up to 100% of load
Intel(R) Core(TM) 2.67 GHz
12 Gb per node
Clamart, May 29th and 30th 2013 16. MUMPS User Group Meeting 2013
Maaximus, the "Gigadof" challenge
7 sections of fuselage
37,127,423 dof's
5044328 shell elem.
6,177,110 elem. In total
0h 0’ on a cluster of 8 nodes.
0 time steps, 0 rejected,
0 iterations up to 0% of load
Intel(R) Core(TM) 2.67 GHz
12 Gb per node Segmentation fault in METIS ! Maybe long long integer needed
Can anyone help ?
Clamart, May 29th and 30th 2013 17. MUMPS User Group Meeting 2013
Summary
SAMCEF, a brief history
Co-simulation, a good candidate for parallel processing
MAAXIMUS, The "Gigadof" challenge
SLEPC/MUMPS, a parallel eigenvalue solver
Test the limit of your machine
Conclusions
Clamart, May 29th and 30th 2013 18. MUMPS User Group Meeting 2013
Parallel Eigenvalue Solver
Since version 15 (partially available in 14) an interface with
SLEPC solver has been implemented.
SLEPC library is a public domain software developed by
Universidad Politecnica de Valencia, Spain
V. Hernandez, J. E. Roman and V. Vidal (2005),
"SLEPc: A Scalable and Flexible Toolkit for the Solution of
Eigenvalue Problems", ACM Trans. Math. Softw. 31(3), 351-362
SLEPC perform mathematical operations in parallel based on:
PETSC for matrix – vector product.
MUMPS for liner system solve.
Clamart, May 29th and 30th 2013 19. MUMPS User Group Meeting 2013
Parallel Eigenvalue Solver
Benefit for users
Faster computation
Possibility to use several computers and than more memory
Possible disadvantage
MUMPS is able to work out-of-core but PETSC isn’t.
On a given computer (fixed memory) there is maximum problem
size that is lower than BCSLIB maximum size.
Clamart, May 29th and 30th 2013 20. MUMPS User Group Meeting 2013
Parallel Eigenvalue Solver
Scalable test case: parametric model
Comparison of a sequential BCSLIB run with parallel 4 proc. SLEPC run.
Intel(R) Xeon(R) @ 2.66GHz, 48 GBytes of memory, 8 cores
0
1
2
3
4
5
0
5000
10000
15000
20000
25000
1 2 3 4 5 6 7 8 9
Sp
eed
up
Ela
ps
ed
tim
e (
sec)
Problem size (Mdof)
BCSLIB Elapsed
PEIGDY2 Elapsed
Speedup
Clamart, May 29th and 30th 2013 21. MUMPS User Group Meeting 2013
Parallel Eigenvalue Solver
Scalable test case: parametric model
Allows to use several nodes in a cluster and than more memory.
Intel(R) Core(TM) i7-2600 @ 3.40GHz,
16 Gbytes of memory per node, 4 cores
4 threads per node; 100 eigenvalues
346 594
806 1132
89400
171
426 524 642 890
2 4 5 6 8
Ela
psed
So
lver t
ime,
Lo
g1
0 (
sec.)
Degrees of Freedom, (Mdof)
4 Nodes
8 Nodes
Out-of-Core
Clamart, May 29th and 30th 2013 22. MUMPS User Group Meeting 2013
Summary
SAMCEF, a brief history
Co-simulation, a good candidate for parallel processing
MAAXIMUS, The "Gigadof" challenge
SLEPC/MUMPS, a parallel eigenvalue solver
Test the limit of your machine
Conclusions
Clamart, May 29th and 30th 2013 23. MUMPS User Group Meeting 2013
Customer question…
On my machine,
what is the maximum model size I can analyze?
or
To analyze my model,
what is the minimum machine configuration (cheapest) I should
acquire?
Answer: "test it yourself"
Clamart, May 29th and 30th 2013 24. MUMPS User Group Meeting 2013
Parametric model
Realistic "fuselage" model
Made of shell elements
With skin, frames and stringers
2 types of analysis
A non-linear static loading
2 iterations (2 factors, 2solves)
An eigenvalue analysis
2 eigenvalues, 1 factor, n solves (18 or 27)
Clamart, May 29th and 30th 2013 25. MUMPS User Group Meeting 2013
Design of experiment
Hardware
Cluster of 16 (identical) nodes
Processor: Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz
# Cores: 6
MemTotal: 32855900 kB
SwapTotal: 33559776 kB
Clamart, May 29th and 30th 2013 26. MUMPS User Group Meeting 2013
Design of experiment
Software
10 model sizes [1-10] million of dof's
2 solvers BCSLIB and MUMPS or SLEPC/MUMPS
Sequential BCSLIB with 1, 2, 4 or 6 MKL Threads
Sequential MUMPS with 1, 2, 4 or 6 MKL Threads
Parallel MUMPS on 2 procs. with 1 or 2 MKL Threads
Parallel MUMPS on 3 procs. with 2 MKL Threads
Parallel MUMPS on 4 procs. with 1 MKL Threads
Parallel MUMPS on 6 procs. with 1 MKL Threads
9 among 14 possible cases
Total of 10*2*(4 + 4 + 2 + 1 + 1 +1) = 260 analyses
Total of 19days 14h 03min 26.05sec…
Clamart, May 29th and 30th 2013 27. MUMPS User Group Meeting 2013
Model properties
0
2000000
4000000
6000000
8000000
10000000
12000000
1 2 3 4 5 6 7 8 9 10
# Dof's
0
500
1000
1500
2000
2500
3000
3500
4000
4500
5000
0.0E+00
5.0E+12
1.0E+13
1.5E+13
2.0E+13
2.5E+13
0.0E+00 2.0E+06 4.0E+06 6.0E+06 8.0E+06 1.0E+07 1.2E+07
Flo
ati
ng
po
int
op
era
tio
ns
BCSLIB ops. F
MUMPS ops. F
Arithmetic Intensity
0
10000
20000
30000
40000
50000
60000
0.0E+00 2.0E+06 4.0E+06 6.0E+06 8.0E+06 1.0E+07 1.2E+07
Mem
ory
(M
byte
s)
BCSLIB Min OOC Mb
BCSLIB Min IC Mb
MUMPS Min OOC Mb
MUMPS Min IC Mb
ICNTL(22)
MemTotal
Clamart, May 29th and 30th 2013 28. MUMPS User Group Meeting 2013
Non-linear static analysis
0
2,000
4,000
6,000
8,000
10,000
12,000
14,000
16,000
18,000
20,000
1 2 3 4 5 6 7 8 9 10
Solv
er e
lap
sed
tim
e (
sec.
)
MUMPS 1th 1ProcMUMPS 1th 2ProcMUMPS 1th 4ProcMUMPS 1th 6ProcMUMPS 2th 1ProcMUMPS 2th 2ProcMUMPS 2th 3ProcMUMPS 4th 1ProcMUMPS 6th 1ProcBCSLIB 1thBCSLIB 2thBCSLIB 4thBCSLIB 6th
0
500
1,000
1,500
2,000
2,500
3,000
3,500
4,000
1 2 3 4 5
Solv
er e
lap
sed
tim
e (
sec.
)
MUMPS 1th 1ProcMUMPS 1th 2ProcMUMPS 1th 4ProcMUMPS 1th 6ProcMUMPS 2th 1ProcMUMPS 2th 2ProcMUMPS 2th 3ProcMUMPS 4th 1ProcMUMPS 6th 1ProcBCSLIB 1thBCSLIB 2thBCSLIB 4thBCSLIB 6th
Clamart, May 29th and 30th 2013 29. MUMPS User Group Meeting 2013
Non-linear static analysis
0.0
0.2
0.4
0.6
0.8
1.0
1.2
1.4
1.6
1.8
2.0
0
1000
2000
3000
4000
5000
6000
7000
8000
9000
BCSLIB1th
BCSLIB2th
MUMPS1th
4Proc
MUMPS1th
1Proc
BCSLIB4th
BCSLIB6th
MUMPS1th
2Proc
MUMPS2th
3Proc
MUMPS2th
1Proc
MUMPS2th
2Proc
MUMPS4th
1Proc
MUMPS6th
1Proc
MUMPS1th
6Proc
Sp
eed
up
To
tal
Ela
psed
Tim
e (
sec.)
Total Elapsed time, 8 Mdof
Clamart, May 29th and 30th 2013 30. MUMPS User Group Meeting 2013
Non-linear static analysis
0.0
0.5
1.0
1.5
2.0
2.5
3.0
3.5
4.0
0
500
1000
1500
2000
2500
3000
BCSLIB2th
BCSLIB1th
BCSLIB6th
MUMPS1th
1Proc
BCSLIB4th
MUMPS1th
6Proc
MUMPS2th
1Proc
MUMPS4th
1Proc
MUMPS1th
2Proc
MUMPS1th
4Proc
MUMPS6th
1Proc
MUMPS2th
2Proc
MUMPS2th
3Proc
Sp
eed
up
To
tal
Ela
psed
Tim
e (
sec.)
Total Elapsed time, 4 Mdof
Clamart, May 29th and 30th 2013 31. MUMPS User Group Meeting 2013
Eigenvalue analysis
0
5,000
10,000
15,000
20,000
25,000
30,000
35,000
1 2 3 4 5 6 7 8 9 10
Tota
l Ela
pse
d t
ime
(se
c.)
MUMPS 1th 1ProcMUMPS 1th 2ProcMUMPS 1th 4ProcMUMPS 1th 6ProcMUMPS 2th 1ProcMUMPS 2th 2ProcMUMPS 2th 3ProcMUMPS 4th 1ProcMUMPS 6th 1ProcBCSLIB 1thBCSLIB 2thBCSLIB 4thBCSLIB 6th
0
500
1,000
1,500
2,000
2,500
3,000
3,500
4,000
1 2 3 4 5
Tota
l Ela
pse
d t
ime
(se
c.)
MUMPS 1th 1ProcMUMPS 1th 2ProcMUMPS 1th 4ProcMUMPS 1th 6ProcMUMPS 2th 1ProcMUMPS 2th 2ProcMUMPS 2th 3ProcMUMPS 4th 1ProcMUMPS 6th 1ProcBCSLIB 1thBCSLIB 2thBCSLIB 4thBCSLIB 6th
Clamart, May 29th and 30th 2013 32. MUMPS User Group Meeting 2013
Eigenvalue analysis
0.0
0.5
1.0
1.5
2.0
2.5
3.0
3.5
4.0
4.5
0
5000
10000
15000
20000
25000
30000
MUMPS1th
2Proc
MUMPS1th
6Proc
MUMPS1th
1Proc
MUMPS1th
4Proc
MUMPS2th
3Proc
MUMPS2th
1Proc
MUMPS2th
2Proc
MUMPS6th
1Proc
MUMPS4th
1Proc
BCSLIB6th
BCSLIB4th
BCSLIB1th
BCSLIB2th
Sp
ee
du
p
To
tal E
lap
se
d T
ime
(se
c.)
Total Elapsed time, =8 Mdof
Clamart, May 29th and 30th 2013 33. MUMPS User Group Meeting 2013
Eigenvalue analysis
0.0
1.0
2.0
3.0
4.0
5.0
6.0
7.0
0
200
400
600
800
1000
1200
1400
MUMPS2th
1Proc
BCSLIB1th
BCSLIB4th
BCSLIB2th
MUMPS1th
2Proc
BCSLIB6th
MUMPS1th
1Proc
MUMPS2th
2Proc
MUMPS4th
1Proc
MUMPS6th
1Proc
MUMPS1th
6Proc
MUMPS2th
3Proc
MUMPS1th
4Proc
Sp
ee
du
p
To
tal E
lap
se
d T
ime
(se
c.)
Total Elapsed time, 4 Mdof
Clamart, May 29th and 30th 2013 34. MUMPS User Group Meeting 2013
Conclusions
SAMCEF still improves parallel computing
Better parallel processing of non-linear analysis
Co-simulation helps parallel processing
Parallel eigenvalue solver SLEPC/MUMPS
Need help for METIS long long integer
Test the limits of your machine
Parametric model available
No need to try too far out of memory
Optimum balance MKL thread / MUMPS proc to find
SLEPC shows good performance if it can run in core
Real need for efficient eigenvalue solver for large problem
Jean-Pierre Delsemme
Product Development Manager Clamart, May 29th and 30th 2013
Thank You
Clamart, May 29th and 30th 2013 36. MUMPS User Group Meeting 2013
If you come to Liège