Parallel Processing in High Performance Computational Electromagnetics
David Abraham, under the supervision of Professor Dennis D. Giannacopoulos
Department of Electrical and Computer Engineering, McGill University, Montreal, Québec, Canada
INTRODUCTION
Modern simulation methods based on techniques such as finite element analysis have had much success performing electromagnetic modelling and simulation on traditional single-processor CPUs. However, owing to the ever-increasing complexity of modern simulations and applications, these computationally intensive methods now incur very long computation times. As such, greater focus is being given to methods that increase the computation speed of electromagnetic simulations. One promising candidate is to parallelize the computations, taking advantage of the rapid increase in the production, popularity, and availability of multi-core processors.
OBJECTIVE
We wish to determine the effects of parallel computing on elementary finite element simulations by comparing computation times between simulations run on 1, 2, 3, and 4 cores of a modern multi-core processor.
The same basic finite element simulation was run multiple times in order to acquire the following data. To ensure accuracy and to account for delays caused by other processes running on the machine at the time of execution, each simulation was repeated 5 times and the results were statistically analyzed. Data were collected for each of 1, 2, 3, and 4 cores (also known as "workers" in the Matlab environment) in 3 simulation scenarios containing different numbers of unknown variables and elements. The results are summarized in the tables below.
On average, the simulation ran 2.12 times as fast on 4 cores as on 1. As expected, the more cores that were used, the faster the simulation, with the speedup being largely independent of the number of elements.
RESULTS
Computation Time in Seconds vs. Number of Workers
(Speedup (S/M) is the single-core average divided by the multi-core average; % of Serial Time expresses the multi-core average as a percentage of the single-core average.)

8040 potentials, 15721 elements:
Cores | Trial 1   | Trial 2   | Trial 3   | Trial 4   | Trial 5   | Average    | Std. Dev.   | Speedup (S/M) | % of Serial Time
1     | 15.811531 | 16.051262 | 15.764749 | 16.608443 | 16.389314 | 16.1250598 | 0.366421853 | 1.00          | 100%
2     | 9.801491  | 10.185931 | 10.504653 | 10.250295 | 10.397885 | 10.228051  | 0.26906067  | 1.58          | 63.43%
3     | 9.082127  | 9.22324   | 9.705682  | 9.632143  | 9.501662  | 9.4289708  | 0.267281605 | 1.71          | 58.47%
4     | 7.438147  | 7.164749  | 7.404521  | 7.627576  | 7.408214  | 7.4086414  | 0.164547564 | 2.18          | 45.87%

31721 potentials, 62726 elements:
Cores | Trial 1    | Trial 2    | Trial 3    | Trial 4    | Trial 5    | Average     | Std. Dev.   | Speedup (S/M) | % of Serial Time
1     | 176.885318 | 174.052936 | 178.592732 | 177.473435 | 175.431596 | 176.4872034 | 1.775131224 | 1.00          | 100%
2     | 122.885197 | 120.186661 | 122.058474 | 126.028733 | 124.249454 | 123.0817038 | 2.208818043 | 1.43          | 69.74%
3     | 103.414182 | 105.35724  | 103.801511 | 107.268522 | 107.790581 | 105.5264072 | 1.976539066 | 1.67          | 59.79%
4     | 78.870842  | 82.338225  | 80.445475  | 80.832922  | 83.103374  | 81.1181676  | 1.659362622 | 2.18          | 45.87%

126008 potentials, 250588 elements:
Cores | Trial 1     | Trial 2     | Trial 3     | Trial 4     | Trial 5     | Average     | Std. Dev.   | Speedup (S/M) | % of Serial Time
1     | 2559.257279 | 2685.694734 | 2685.591752 | 2674.833031 | 2672.823466 | 2655.640052 | 54.20716589 | 1.00          | 100%
2     | 1812.530122 | 1644.850036 | 1644.980962 | 1678.549471 | 1649.60475  | 1686.103068 | 72.05175966 | 1.57          | 63.49%
3     | 1588.346659 | 1597.217871 | 1570.345639 | 1534.964378 | 1564.359257 | 1571.046761 | 24.1421111  | 1.70          | 58.91%
4     | 1237.873169 | 1333.481115 | 1322.490923 | 1390.133754 | 1377.979164 | 1332.391625 | 60.0943562  | 1.99          | 50.17%
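The Average, Std. Dev., and Speedup (S/M) columns can be reproduced directly from the raw trial times. A quick sketch for the 8040-potential case (shown in Python for illustration; the original analysis was done in Matlab):

```python
from statistics import mean, stdev

# Trial times in seconds from the first table (8040 potentials, 15721 elements)
serial = [15.811531, 16.051262, 15.764749, 16.608443, 16.389314]  # 1 core
quad   = [7.438147, 7.164749, 7.404521, 7.627576, 7.408214]       # 4 cores

avg_s, avg_q = mean(serial), mean(quad)
speedup  = avg_s / avg_q        # Speedup (S/M): serial average over multi-core average
pct_time = 100 * avg_q / avg_s  # % of Serial Time
print(f"avg = {avg_q:.7f} s, std = {stdev(quad):.7f}, "
      f"speedup = {speedup:.2f}, % time = {pct_time:.2f}%")
```

Note that the sample standard deviation (dividing by n - 1) matches the tabulated values.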
CONCLUSION
In conclusion, it has been demonstrated that even at an elementary level, finite element analysis can benefit from substantial reductions in computation time by utilizing modern multi-core processors. Although only portions of the matrix algorithms were parallelized in this treatment, it is believed that even better performance could be obtained by parallelizing other steps, such as the triangulation and the solving of the linear system. The results shown here could also apply to many other programming fields where an increase in speed is desired.
FUTURE PROJECTS
Future areas of interest include exploring cloud or cluster computing, as well as executing simulations on graphics cards.
BACKGROUND
One of the most widespread and useful techniques in electromagnetic simulation is Finite Element Analysis (FEA). This method breaks a region down into a finite number of segments or pieces known as elements (generally triangular or tetrahedral in shape), upon which an approximate solution to Maxwell's equations can be determined. Within this method there are two key steps required to complete the simulation: the first is to discretize the region into elements to allow the approximation, and the second is to solve the matrix of coefficients associated with the resulting set of linear equations in order to obtain the unknown potentials.
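These two steps can be illustrated on a toy problem. The sketch below is a hypothetical 1-D analogue in Python/NumPy (not the poster's 2-D Matlab code): it assembles and solves -u'' = 1 on [0, 1] with zero boundary potentials.

```python
import numpy as np

# Hypothetical 1-D analogue of the two FEA steps:
# solve -u'' = 1 on [0, 1] with u(0) = u(1) = 0 using n linear elements.
n = 8                # number of elements
h = 1.0 / n          # element size
n_int = n - 1        # interior nodes = unknown potentials

# Step 1: discretize. In 1-D the "mesh" is just n uniform segments; each
# element contributes a 2x2 local stiffness matrix (1/h) * [[1, -1], [-1, 1]].
K = np.zeros((n_int, n_int))
f = np.full(n_int, h)            # load vector: integral of each hat function
for e in range(n):
    for a in range(2):
        for b in range(2):
            i, j = e + a - 1, e + b - 1   # interior-node indices
            if 0 <= i < n_int and 0 <= j < n_int:
                K[i, j] += (1.0 / h) * (1.0 if a == b else -1.0)

# Step 2: solve the assembled system of linear equations for the potentials.
u = np.linalg.solve(K, f)
# For this problem the nodal values match the exact solution u(x) = x(1-x)/2.
```

Even in this miniature form, the assembly loop (Step 1) and the linear solve (Step 2) are the same two stages whose cost dominates a full 2-D or 3-D simulation.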
Although decreasing the size of the elements (and hence increasing their number) increases the accuracy of the solution, it also results in increased computation times. Since modern applications demand ever larger and more accurate simulations, improving the efficiency and speed of FEA is becoming a necessity. The method of interest here is parallel computing: allowing multiple processors, or multiple cores of a single processor, to work on different parts of the simulation simultaneously.
By parallelizing the most time-consuming steps, such as the triangulation of the region or the linear algebra routines, significant improvements can be gained depending on the number of processors or "workers" available for the computation.
MATERIALS AND METHODS
In order to test the effects of parallel computing on finite element analysis simulations, a very basic finite element simulation program was written in Matlab and made to perform a variety of simulations serially (see the images in the previous sections for example solutions). The time required for each simulation was measured and averaged over 5 iterations. After serial execution, the Parallel Computing Toolbox within Matlab was used to modify the code so that the matrix-population step (the rate-determining step in this code) could be parallelized. The simulations varied in size from 15721 to 250588 elements and were executed on 1, 2, 3, and 4 cores of an Intel i7 Q740 processor running at 1.734 GHz.
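The repeat-and-average protocol can be sketched as a small timing harness (Python shown for illustration; the actual measurements were taken in Matlab, and solve_fem below is a hypothetical stand-in for the simulation under test):

```python
import time
from statistics import mean, stdev

def time_simulation(run, repeats=5):
    """Time run() several times and average the results, so that delays
    caused by other processes on the machine are smoothed out."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        times.append(time.perf_counter() - start)
    return mean(times), stdev(times)

# Hypothetical usage for one table row:
# avg, sd = time_simulation(lambda: solve_fem(cores=4))
```

Reporting the standard deviation alongside the average, as in the results tables, makes it easy to spot runs contaminated by background load.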
ACKNOWLEDGEMENTS
I would like to thank Prof. D. Giannacopoulos for all his help and insight during the course of this project, the Faculty of Engineering at McGill for giving me the opportunity to participate, and the Natural Sciences and Engineering Research Council of Canada for providing funding.