Fast Integral Equation Methods for BIOEMfor BIOEM in the...

83
Fast Integral Equation Methods for BIOEM for BIOEM in the Petascale Era: Kilo Cores, Mega Codes, Giga Equations, Kilo Cores, Mega Codes, Giga Equations, Tera Bytes, and Peta Multiplications Ali E. YILMAZ Department of Electrical & Computer Engineering University of Texas at Austin University of Illinois at Urbana Champaign, Urbana, IL, Apr. 19, 2011

Transcript of Fast Integral Equation Methods for BIOEMfor BIOEM in the...

Page 1: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Fast Integral Equation Methods for BIOEMfor BIOEM

in the Petascale Era:Kilo Cores, Mega Codes, Giga Equations,Kilo Cores, Mega Codes, Giga Equations,

Tera Bytes, and Peta Multiplications

Ali E. YILMAZ

Department of Electrical & Computer EngineeringUniversity of Texas at Austin

University of Illinois at Urbana Champaign,Urbana, IL, Apr. 19, 2011

Page 2: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Outline

• Motivation- Body-Centric Wireless Systems- Historical Perspective: Computational Bioelectromagnetics- Computing Trends

• Backgroundac g ou d- FDTD vs. MOM- Pre-corrected FFT/Adaptive Integral Method- Computational Complexity

• Parallelization- General Limits of Parallelization - Traditional Scheme and Its LimitsTraditional Scheme and Its Limits- Proposed Parallelization- Comparison

• BIOEM Problems• BIOEM Problems- High-Resolution Simulations- AustinMan Antrophomorphic Human Model

Page 3: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Body-Centric Wireless Systems

- More functionality near-, on-, in-body - Wide-spread use - Marketing

- Health concerns

Images from news articles in newscientist.com, switched.com, pulse2.com

Page 4: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Computational BIOEM Requirements

• Phantoms (“tissue-equivalent” liquids)IEEE Std. C95.3-1991/2002 and 1528-2003- Resolve wavelength/skin depth in material

(900 MHz)• Anthropomorphic Human Models

/10 5.4mm, /10 3.6 mmλ δ≈ ≈

- Resolve wavelength/skin depth in tissuesGabriel, Lau, Gabriel, "The dielectric properties of biological tissues: III.Parametric models for the dielectric spectrum of tissues," Phys. Med.Biol., Nov. 1996. http://niremf.ifac.cnr.it/tissprop/htmlclie/htmlclie.htm

conformity.com/artman/publish/printer_74.shtml

CerebroSpinalFluid 2.4126 68.638 0.70202 38.147 19.215

Tissue Name Conductivity[S/m]

RelativePermittivity

LossTangent

Wavelength[mm]

Skin Depth[mm]

BoneCortical 0.14331 12.454 0.22984 93.782 131.57

BoneMarrow 0.040208 5.5043 0.1459 141.61 310.58

BrainGrayMatter 0.94227 52.725 0.35694 45.182 41.536

/10 3.8mm, /10 1.9 mmλ δ≈ ≈Muscle 0.94294 55.032 0.34222 44.277 42.355

3Finite element size ~ 10-100 mm- Resolve tissue boundaries

/10 3.8mm, /10 1.9 mmλ δ Finite element size 10 100 mm6~ 10 100 10N − ×

3Finite element size ~0.1-1 mm 9~ 1 10 10N − ×

Page 5: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Computational BIOEMHistorical Perspective: 1970sHistorical Perspective: 1970s

CDC 66002-3 MFlops

Chen, Guru, “Internal EM field and absorbed powerdensity in human torsos induced by 1-500 MHz EMwaves,” IEEE Trans. MTT, Sep. 1977.“Our numerical computation was performed with a CDC

Hagmann, Gandhi, Durney, “Numerical calculation ofelectromagnetic energy deposition for a realisticmodel of man,” IEEE Trans. MTT, Sep. 1979.“A total of 180 cubical cells of various sizes was used“Our numerical computation was performed with a CDC

6500 computer which has a maximum capacity ofinverting a 120x120 matrix. This limits the maximumnumber of cells to 40…A cell size of 10 cm3 seemsappropriate.”

“A total of 180 cubical cells of various sizes was usedto obtain a best fit of the contour on diagrams of the50th percentile standard man…the matrix is 270 by270…”

Page 6: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Computational BIOEM80s and 90s: CG FFT and FDTD80s and 90s: CG-FFT and FDTD

Borup, Gandhi, “Fast-Fourier-Transform method forcalculation of SAR distributions in finely discretized

Borup, Sullivan, Gandhi, “Comparison of the FFTconjugate gradient method and the finite-differencecalculation of SAR distributions in finely discretized

inhomogeneous models of biological bodies,” IEEETrans. MTT, Apr. 1984.“It is planned to extend the FFT method to 3-Dproblems. Thus far, the only technique available…formodels of man is the MOM ”

j g gtime-domain method for the 2-D absorption problem,”IEEE Trans. MTT, Apr. 1987.“…serious errors exist in solutions for the TEpolarization…fictitious line charge sources…havebeen removed these modifications prevent the usemodels of man…is the MOM.”

“An obvious difficulty…is the creation of the model…Another problem to be solved is the presentation of theresults.”

been removed…these modifications prevent…the useof the FFT-CGM and thus require O(N3) operations.”“In contrast, FD-TD…yields excellent solutions…FD-TD has great potential for solving the high-resolutionmodels needed in bioelectromagnetics.”

Page 7: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Computational BIOEM90s and 00s: FDTD Dominance90s and 00s: FDTD Dominance

Sullivan, Gandhi, Taflove, “Use of the finite-difference time-domain method for calculatingEM absorption in man models,” IEEE Trans.p ,Biomed. Eng., Mar. 1988.“FDTD is used to calculate the detailed SARwithin the human body…Comparison…with themethod of moments.”H b h M i B kh d K h K t “Th

SEMCAD X

Hombach, Meier, Burkhards, Kuhn, Kuster, “Thedependence of EM energy absorption uponhuman head modeling at 900 MHz,” IEEE Trans.MTT, Oct. 1996.“Different numerical head phantoms based onpMRI scans of three different adults were usedwith voxel sizes down to 1 mm3…up to 13 tissuetypes. The numerical results are compared withthose of measurements in a multitissuephantom ”phantom…

Wiart, Hadjem, Wong, Bloch, “Analysis of RFexposure in the head tissues of children andadults,” Phys. Med. Biology, IEEE Trans. MTT,June 2008June 2008.“…major efforts…to improve the numericaldosimetry…in particular the FDTD has beenimproved and adapted to this domain.”

Page 8: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Computational BIOEM90s and 00s: FDTD Controversies90s and 00s: FDTD Controversies

Lin, “Cellular mobile telephones and children,” IEEEAntennas Propagat. Mag., Oct. 2002.“Are children more vulnerable to the microwave radiationfrom cellular mobile telephones? Numerical dosimetry

Wust, Nadobny, Seebass, Stalling, Gellermann,Hege, Deuflhard, Felix, “Influence of patient modelsand numerical methods on predicted powerdeposition patterns ” Int J Hyperthermia Jan 1999

Mason, Hurt, Walters, D’Andrea, Gajsek, Ryan, Nelson,

from cellular mobile telephones?…Numerical dosimetry…one of the most productive research areas during thepast decade…Instead of providing a consensus, theseSAR studies have spawned diverse opinions.”

deposition patterns, Int. J. Hyperthermia, Jan. 1999.“CT data sets are transformed…FDTD-method is theapplied on a cubic lattice (voxel model) or atetrahedron grid (region-based segmentation)…Forcomparison, the VSIE method is performed on the

Smith, Ziriax, “Effects of frequency, permittivity, and voxelsize on predicted specific absorption rate values inbiological tissue during electromagnetic-field exposure,”IEEE Trans. MTT, Nov. 2000.“Comparing localized SAR values determined

same tetrahedron grid…power deposition patterns inthe interior of the patient depended strongly on themodels…Due to its shorter calculation time, theFDTD method is currently used in the clinic.Predictions without prior corrections of tissueComparing localized SAR values determined

experimentally...with those obtained using the FDTDmethod demonstrate good agreement, except when theFDTD method calculates “hot or cold” spots (relative towhole-body average SAR).” Kainz, Christ, Kellom, Seidman, Nikoloski, Beard,

Kuster, “Dosimetric comparison of the SAM to 14

Predictions…without prior corrections of tissuespecifications are not always supported by clinicalexperiences.”

Gjonaj, Bartsch, Clemens, Schupp, Weiland, “High-resolution human anatomy models for advancedelectromagnetic field computations,” IEEE Trans. Mag.,Mar. 2002.“ induced SAR in the human head due to cellular

Kuster, Dosimetric comparison of the SAM to 14anatomical head models using a novel definition forthe mobile phone positioning,” Phys. Med. Biol., Nov.2000.“Numerous numerical dosimetric studies have been

bli h d b t th f bil h…induced SAR…in the human head due to cellularphone radiation…for heterogeneous and homogeneousHUGO models. The discrepancy of the results,concerning the location of hot spots…necessity of usinghigh-resolution anatomy models for realistic simulations.”

published about the exposure of mobile phone userswhich concluded with conflicting results. However,many of these studies lack reproducibility due toshortcomings in the description of the phoneposition.”

Page 9: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

In the mean time…

Massoudi, Durney, Iskander, “Limitations of the cubicalblock model of man in calculating SAR distributions,”IEEE Trans MTT Aug 1984IEEE Trans. MTT, Aug. 1984.“…while the moment-method, using pulse basisfunctions, gives good values for whole-body averageSAR, the convergence of the solutions for SARdistributions is questionable…Pulse functions should

Jin, Chen, Chew, Gan, Magin, Dimbylow, “Computation

be replaced by a better approximation.”

Schaubert, Wilton, Glisson, “A tetrahedral modelingmethod for electromagnetic scattering by arbitrarilyshaped inhomogeneous dielectric bodies ” IEEE

9 tissues, 8 mm3 resolution

of electromagnetic fields for high-frequency magneticresonance imaging applications,” Phys. Med. Biol., Dec.1996.“The use of pulse basis functions yielded slowconvergence and poor results when dealing with

shaped inhomogeneous dielectric bodies, IEEETrans. AP, Jan.1984.“Special basis functions are defined within tetrahedralvolume elements to insure that the normal electric fieldsatisfies the correct jump condition at interfaces

convergence and poor results when dealing withmaterials with high dielectric contrast. Betterformulations were later proposed…most use mixed-order basis functions and Galerkin’s testing.”

Zwamborn, Vandenberg, “The 3-D weak form of theconjugate-gradient FFT method for solving scatteringproblems,” IEEE Trans. MTT, Sep. 1992.

between different dielectric media.”

“…BCG-FFT method reduces the memory requirementto O(N) and computational complexity to O(NiterNlogN),making the problem solvable on a workstation. Forlossy material…Niter is found to be small.”

“A weak form of the integral equation…is obtained bytesting it with appropriate testing functions. In view ofthe partial derivatives...the volumetric rooftop functionsare chosen as testing and expansion functions.”

Page 10: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Historical Perspective: SupercomputingTop Supercomputers

(top500.org)

• 1. Tian-He 1AIntel X56706 Core 2930 MHz JaguarTian-He 1A Ranger• 2. Jaguar AMD x86_64 Opteron 6 Core 2600 MHz

• 15. Ranger

JaguarTian-He 1A Ranger

gAMD x86_64 Opteron 4 Core 2300 MHz

• Blue Waters (this year)IBM POWER7 processor 8 Core p

Page 11: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Historical Perspective: Limits of Serial ComputingComputing

Page 12: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

BACKGROUND

- FDTD vs. MOM

- Pre-corrected FFT/Adaptive Integral Method

- Computational Complexity

Page 13: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

FDTD vs. MOM for BIOEM

• FDTD • MOM+FFT-based Algorithm

+ Historical momentum/easy to code+ Matrix free- Uniform grids/staircasing

- Rarer/harder to code- Dense matrix (+ fast algorithm)+ Arbitrary mesh g g

- Special grid truncation- Extended grids

Courant stability condition

y+ Exact radiation condition+ Never mesh free space+ Unknowns only in space- Courant stability condition

-+ Phase dispersion-+ Time-domain (transients)

+ Unknowns only in space+- Incorrect Green function phase speed+- Frequency-domain (dispersive materials)

Simulation Time:

Memory:

Simulation Time:

Memory:iter( log )O N N N( )tO N N ′

( )O N( )O N ′Memory: e o y ( )O N( )O N

For the same accuracy: N N′>>err(FDTD)>>For err(MOM):N N′=

Page 14: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

FDTD vs. MOM for BIOEM

Error

Controversialres lts

Computational

results

Computational Research Increases in

Raw Computer

Power

Increasedexpectations

Curiosity

New applications

Power

yUtopia Controversial

resultsTime/Memory Cost

Page 15: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Volume Electric Field Integral Equation

• Time-harmonic VEFIE for Inhomogeneous and Lossy Dielectric Scatterer

≡( ) ( )ε σ μr r

0 0,ε μ incE

( )J r0 0,ε μ incE

( )( ) ( )

ε εω

= +r

r r

( ) ( ) ( )ε=D r r E r0

( ), ( ),ε σ μr rV 0 0

,ε μ0

( )

( ) 1 ( )( )j

εω

ε

⎛ ⎞⎟⎜ ⎟= −⎜ ⎟⎜ ⎟⎜⎝ ⎠J r D r

rV

( ) ( ) ( )

0

inc sca

0

( ) ( ) ( ) ( ) ( ) ( )

( ) ( )( ) ( )

jk

j

eG dv G

ω φ

μ ′− −

= − = + +∇

⎧ ⎫ ⎧ ⎫′⎪ ⎪ ⎪ ⎪⎪ ⎪ ⎪ ⎪′ ′ ′⎨ ⎬ ⎨ ⎬∫∫∫r r

E r E r E r E r A r r

A r J rr r r r

( )κ r

0

( , ) , ( , )( ) ( ) / ( ) 4

V

G dv Gjφ ωε π

= =⎨ ⎬ ⎨ ⎬′ ′⎪ ⎪ ⎪ ⎪−∇ ⋅ ′−⎪ ⎪ ⎪ ⎪⎩ ⎭ ⎩ ⎭∫∫∫ r r r r

r J r r r

inc 2( ) ( , ) ( ) ( )( ) ( ) ( ) ( ) ( ) V

Gj G dv dv

κω μ κ

′ ′ ′ ′∇ ⋅′ ′ ′ ′ ′= + ∇ ∈∫∫∫ ∫∫∫D r r r r D r

E r r r r D r r0

0

( ) ( ) ( , ) ( ) ( ) V( )

V V

j G dv dvω μ κε ε

= + −∇ ∈∫∫∫ ∫∫∫E r r r r D r rr

Page 16: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Method of Moments

• MOM( ) ( ) ( ) ( )G ′ ′ ′ ′∇D D

- Discretize & Galerkin test

inc 20

0

( ) ( , ) ( ) ( )( ) ( ) ( , ) ( ) ( ) V

( )V V

Gj G dv dv

κω μ κ

ε ε′ ′ ′ ′∇ ⋅′ ′ ′ ′ ′= + −∇ ∈∫∫∫ ∫∫∫

D r r r r D rE r r r r D r r

r

1 1

( ) ( ) ( ) ( )

( ) ( ) VEFIE for 1,...,

N N

k k k k kk k

k k

I jw I

dv k N

κ

κ

′ ′ ′ ′ ′′ ′= =

≅ ≅

=

∑ ∑

∫∫∫

D r V r J r V r

r V r ik−p

+

kV −

- Fill matrix & solve

( ) ( ) , ,k k

V∫∫∫

k+p

kV +ka

/ (3 )a v V± ± ±⎧⎪ ∈p r

kε+

kε−

inc N N

=Z I F / (3 )( )

0 elsewherek k k k

k

a v V⎪ ∈⎪= ⎨⎪⎪⎩

p rV r

N N×Z I F

2( )O N Schaubert, Wilton, Glisson, “A tetrahedral modeling

• Computational Cost

Matrix fill time:2( )O N2( )O N

( )O Nmethod for electromagnetic scattering by arbitrarilyshaped inhomogeneous dielectric bodies,” IEEETrans. AP, Jan.1984.

Time per iteration:

Matrix fill time:

Memory:

Page 17: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Pre-corrected FFT/Adaptive Integral Method

Step 1: Project/AnterpolateEmbed in a regular grid with nodes

CNV

Step 2: Propagate

Step 3: InterpolateStep 4: CorrectS ep Co ec

C ti i i 1

near {IFFT FF [ ] [F ]F }TTT≈ + ⋅Λ ΛGZZI I I

Δ

Correction region size: 1γ =

ΛTΛ

G cΔ

• Computational Costar Cne( + + )O NN pNMatrix fill time:

near C C l + + )o+ g (N N NpN pNO

Λ ΛnearZT Λ−ΛZ G

Time per iteration:

ar Cne( + + )O NN pNMemory:

Page 18: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Benchmark Problems

CAD Sphere Phantom Voxel-Based Sphere Phantom AustinMan Partial BodyΔV (mm) ΔV (mm) ΔV (mm)l / lλ / lδ l / lλ / lδ lN N N

(mm3) (mm) (mm3) (mm) (mm3) (mm)

1030 21 2.4 1.7 ~104 683 20 2.5 1.8 ~1.5 104 683 20 ~4 104

15 5 9.8 7.0 ~6 105 11 5 9.9 7.1 ~9 105 11 5 ~2.5 106

l / lλ / lδ l / lλ / lδ lN N N

EincEincEinc

0.3 1.4 36 26 ~3.2 107 0.2 1.3 40 29 ~5.6 107 0.2 1.3 ~1.6 108

50 mm50 mm50 mm 50 mm

radius=104 mm

41 5 0 97ε σ

radius=104 mm

41 5 0 97ε σ

3500 300 350mm× ×

41 5 0 97. , .ε σ= =r

41 5 0 97. , .ε σ= =r

f = 900 MHz

Page 19: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Definitions for Post-Processing

( , ) ( ) ( , ) ( , )P f f t t dtσ⟨ ⟩ = ⋅∫r r E r E rabsorbed

22

AIM sinMIE d dπ π

θθ θθσ σ θ θ φ−∫ ∫R l i /

( ) ( , ) ( , )

f

f fσ ∗= ⋅r E r E r

1

12

0 02

2

0 0

sinMIE d d

θθ θθ

π π

θθ

φ

σ θ θ φ

=∫ ∫

∫ ∫

Relative RCS Error

Einc

0 0

Einc Einc

50 mm50 mm 50 mm 50 mm

radius=104 mm

41 5 0 97ε σ

radius=104 mm

41 5 0 97ε σ

3500 300 350mm× ×f = 900 MHz

41 5 0 97. , .ε σ= =r

41 5 0 97. , .ε σ= =r

Page 20: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Accuracy683 mm30.2 mm3 11 mm3

P⟨ ⟩ P⟨ ⟩P⟨ ⟩absorbed P⟨ ⟩absorbed

1030 mm315 mm30 3 mm3 1030 mm15 mm30.3 mm

Page 21: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Computational Costs

ΔV=683 mm3

ΔV=0.2 mm3

Page 22: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Need for Parallelization:Minimum P for Different ConstraintsMinimum P for Different Constraints

Page 23: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Parallelization- General Limits

- Traditional Scheme and Its Limitations

- Proposed Scheme

- Comparison

Page 24: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

General Limits of Parallelization

Sequential part of the code (Amdahl’s Law)

• (Hard) Scalability Limitations 1 q p ( )

Load imbalance/Can’t divide computations further (run out of parallelism)2

Communication latency/bandwidth3

e or

Mem

ory

e or

Mem

ory

k Ti

me

all C

lock

Tim

e

all C

lock

Tim

e

Wal

l Clo

ck

Wa

P 31 2

Wa

P P

Page 25: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Traditional Parallelization Scheme• Single Workload Distribution Strategy [1]

1. Project/Anterpolate: 1-D Slab xy 6P P= =2. Propagate: 1-D Slab3. Interpolate: 1-D Slab4. Correct: 1-D Slab5. Communicate Near

6P P1D cx cymax

min( , )P N N=

y y

P1 P2 P3 P4 P5 P6x

y

z x

y

zP1 P2 P3 P4 P5 P6

[1] Aluru, Nadkarni, White, “A parallel precorrected FFT based capacitance extraction program for signal integrity analysis,” Proc. Design Automat. Conf., 1996.

Page 26: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Traditional Parallelization Scheme• Single Workload Distribution Strategy [1]

1. Project/Anterpolate: 1-D Slab xy 6P P= =2. Propagate: 1-D Slab3. Interpolate: 1-D Slab4. Correct: 1-D Slab5. Communicate Near

6P P1D cx cymax

min( , )P N N=

P6

P4

P5

P5

P6

P2

P3

P3

P4

y y

P1P2

P1

x

y

z x

y

z

[1] Aluru, Nadkarni, White, “A parallel precorrected FFT based capacitance extraction program for signal integrity analysis,” Proc. Design Automat. Conf., 1996.

P1

Page 27: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Traditional Parallelization Scheme

1. Project/Anterpolate: 1-D Slab• Single Workload Distribution Strategy

2. Propagate: 1-D Slaba) FFTZY – Transpose – FFTXb) Multiplyc) IFFTX – Transpose – IFFTYZ) X p YZ

3. Interpolate: 1-D Slab4. Correct: 1-D Slab5. Communicate Near

P1 P2 P3 P4 P5 P6near IFFT{FFT[ ] FFT[ ]}T+Λ ⋅≈ ΛZ I GZI I x

y

z

near C Clog+ + + )(

NO

N N pNp N

near C

1Dmax

+ + +?

( )??? ??

NO

p pNN N

PMemory:

CPU time 1Dmax

+ + +?? ?

)???

(OP

near 1Dmax

+( )PO C

per iteration:

per iteration:Comm time

Page 28: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Traditional Parallelization Scheme

1. Project/Anterpolate: 1-D Slab• Single Workload Distribution Strategy

2. Propagate: 1-D Slaba) FFTZY – Transpose – FFTXb) Multiplyc) IFFTX – Transpose – IFFTYZ) X p YZ

3. Interpolate: 1-D Slab4. Correct: 1-D Slab5. Communicate Near

P1 P2 P3 P4 P5 P6near IFFT{FFT[ ] T }[ ]FFT+Λ ⋅≈ ΛZ I GZI I x

y

z

nea Cr Cl+ +(

og)+

NNO

N pNp N

nea C

1D

r

max

+ +? ?

)??

( +? ?

N p

P

pNN NOMemory:

CPU time 1Dmax

+ +?? ?

( )???

+OP

near 1Dmax

+( )PO C

per iteration:

per iteration:Comm time

Page 29: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Traditional Parallelization Scheme

1. Project/Anterpolate: 1-D Slab• Single Workload Distribution Strategy

P6

2. Propagate: 1-D Slaba) FFTZY – Transpose – FFTXb) Multiplyc) IFFTX – Transpose – IFFTYZ

P4

P5

) X p YZ3. Interpolate: 1-D Slab4. Correct: 1-D Slab5. Communicate Near P2

P3

P1

near IFFT{FFT[ ] T }[ ]FFT+Λ ⋅≈ ΛZ I GZI I x

y

z

P4

P5

P6

P4

P5

P6

nea Cr Cl+ +(

og)+

NNO

N pNp N

nea C

1D

r

max

+ +? ?

)??

( +? ?

N p

P

pNN NOMemory:

CPU time

P2

P1

P3

P2

P1

P31Dmax

+ +?? ?

( )???

+OP

ne 1Dm

arax

+( )PO C

per iteration:

per iteration:Comm time

Page 30: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Traditional Parallelization Scheme

1. Project/Anterpolate: 1-D Slab• Single Workload Distribution Strategy

P6

2. Propagate: 1-D Slaba) FFTZY – Transpose – FFTXb) Multiplyc) IFFTX – Transpose – IFFTYZ

P4

P5

) X p YZ3. Interpolate: 1-D Slab4. Correct: 1-D Slab5. Communicate Near P2

P3

near IFFT{FFT[ ] T }[ ]FFT+Λ ⋅≈ ΛZ I GZI I

P1

x

y

z

nea Cr Cl+ +(

og)+

NNO

N pNp N

nea C

1D

r

max

+ +? ?

)??

( +? ?

N p

P

pNN NOMemory:

CPU time 1Dmax

+ +?? ?

( )???

+OP

ne 1Dm

arax

+( )PO C

per iteration:

per iteration:Comm time

Page 31: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Traditional Parallelization Scheme

1. Project/Anterpolate: 1-D Slab• Single Workload Distribution Strategy

P6

2. Propagate: 1-D Slaba) FFTZY – Transpose – FFTXb) Multiplyc) IFFTX – Transpose – IFFTYZ

P4

P5

) X p YZ3. Interpolate: 1-D Slab4. Correct: 1-D Slab5. Communicate Near P2

P3

near IFFT{ [FFT }] F [F T ]T+Λ ⋅≈ ΛGZI IZ I

P1

x

y

z

nea Cr Cl+ +(

og)+

NNO

N pNp N

nea C

1D

r

max

+ +? ?

)??

( +? ?

N p

P

pNN NOMemory:

CPU time 1Dmax

+ +?? ?

( )???

+OP

ne 1Dm

arax

+( )PO C

per iteration:

per iteration:Comm time

Page 32: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Traditional Parallelization Scheme

1. Project/Anterpolate: 1-D Slab• Single Workload Distribution Strategy

P6

2. Propagate: 1-D Slaba) FFTZY – Transpose – FFTXb) Multiplyc) IFFTX – Transpose – IFFTYZ

P4

P5

) X p YZ3. Interpolate: 1-D Slab4. Correct: 1-D Slab5. Communicate Near P2

P3

near {IFFT FF [ ] [ ]}FFTTT+Λ ⋅≈ ΛGZZI II

P1

x

y

z

nea Cr Cl+ +(

og)+

NNO

N pNp N

nea C

1D

r

max

+ +? ?

)??

( +? ?

N p

P

pNN NOMemory:

CPU time 1Dmax

+ +?? ?

( )???

+OP

ne 1Dm

arax

+( )PO C

per iteration:

per iteration:Comm time

Page 33: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Traditional Parallelization Scheme

1. Project/Anterpolate: 1-D Slab• Single Workload Distribution Strategy

2. Propagate: 1-D Slaba) FFTZY – Transpose – FFTXb) Multiplyc) IFFTX – Transpose – IFFTYZ) X p YZ

3. Interpolate: 1-D Slab4. Correct: 1-D Slab5. Communicate Near

P1 P2 P3 P4 P5 P6near {IFFT FF [ ] [ ]}FFTTT+Λ ⋅≈ ΛGZZI II x

y

z

P4

P5

P6

P4

P5

P6

nea Cr Cl+ +(

og)+

NNO

N pNp N

nea C

1D

r

max

+ +? ?

)??

( +? ?

N p

P

pNN NOMemory:

CPU time

P1 P1

P2

P3

P2

P31Dmax

+ +?? ?

( )???

+OP

ne 1Dm

arax

+( )PO C

per iteration:

per iteration:Comm time

Page 34: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Traditional Parallelization Scheme

1. Project/Anterpolate: 1-D Slab• Single Workload Distribution Strategy

2. Propagate: 1-D Slaba) FFTZY – Transpose – FFTXb) Multiplyc) IFFTX – Transpose – IFFTYZ) X p YZ

3. Interpolate: 1-D Slab4. Correct: 1-D Slab5. Communicate Near

P1 P2 P3 P4 P5 P6near {IFFT FF [ ] [ ]}FFTTT+Λ ⋅≈ ΛGZZI II x

y

z

nea Cr Cl+ +(

og)+

NNO

N pNp N

nea C

1D

r

max

+ +? ?

)??

( +? ?

N p

P

pNN NOMemory:

CPU time 1Dmax

+ +?? ?

( )???

+OP

ne 1Dm

arax

+( )PO C

per iteration:

per iteration:Comm time

Page 35: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Traditional Parallelization Scheme

1. Project/Anterpolate: 1-D Slab• Single Workload Distribution Strategy

2. Propagate: 1-D Slaba) FFTZY – Transpose – FFTXb) Multiplyc) IFFTX – Transpose – IFFTYZ) X p YZ

3. Interpolate: 1-D Slab4. Correct: 1-D Slab5. Communicate Near

P1 P2 P3 P4 P5 P6near {IFFT FF [ ] [ ]}FFTTT+Λ ⋅≈ ΛGZZI II x

y

z

nea Cr Cl+

g(

o+ + )NN Np pNN

O

nea C

Dm

r

1ax

( ++?? ?

)??

+?

N NO

p pN

P

NMemory:

CPU time 1Dmax

+??? ? ??

( + + )P

O

ne 1Dm

arax

+( )PO C

per iteration:

per iteration:Comm time

Page 36: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Traditional Parallelization Scheme

1. Project/Anterpolate: 1-D Slab• Single Workload Distribution Strategy

2. Propagate: 1-D Slaba) FFTZY – Transpose – FFTXb) Multiplyc) IFFTX – Transpose – IFFTYZ) X p YZ

3. Interpolate: 1-D Slab4. Correct: 1-D Slab5. Communicate Near

P1 P2 P3 P4 P5 P6x

y

znear {IFFT FF [ ] [F ]F }TTT≈ + ⋅Λ ΛGZZI I I

nea Cr Cl( +

og+ + )NN Np pNN

O

near C

1Dmax

( + +? ??

+ )???

N NO

pN pN

PMemory:

CPU time 1Dmax

( +?? ?? ??

+ + )P

O

ne 1Dm

arax

+( )PO C

per iteration:

per iteration:Comm time

Page 37: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Traditional Parallelization Scheme• Single Workload Distribution Strategy

1. Project/Anterpolate: 1-D Slab2. Propagate: 1-D Slab

a) FFTZY – Transpose – FFTXb) Multiplyc) IFFTX – Transpose – IFFTYZ) X p YZ

3. Interpolate: 1-D Slab4. Correct: 1-D Slab5. Communicate Near

P1 P2 P3 P4 P5 P6near {IFFT FF [ ] [F ]F }TTT≈ + ⋅Λ ΛGZZI I I x

y

z

P4

P5P6

P4

P5P6

nea Cr Cl( +

og+ + )NN Np pNN

O

near C

1Dmax

( + +? ??

+ )???

N NO

pN pN

PMemory:

CPU time

P2P1

P3

P2P1

P31Dmax

( +?? ?? ??

+ + )P

O

near 1Dmax

( + )PO C

per iteration:

per iteration:Comm time

Page 38: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Limits of Traditional Parallelization Scheme

1. Anterpolation, Correction, Interpolation computations not balanced

2. “Grid limited” 1D cx cymax

min( , )P N N=

Page 39: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

General Limits of Parallelization

Sequential part of the code (Amdahl’s Law)

• (Hard) Scalability Limitations 1 q p ( )

Load imbalance/Can’t divide computations further (run out of parallelism)2

Communication latency/bandwidth3

e or

Mem

ory

e or

Mem

ory

k Ti

me

all C

lock

Tim

e

all C

lock

Tim

e

Wal

l Clo

ck

Wa

P 31 2

Wa

P P

Page 40: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Limits of Traditional Parallelization Maximum P and Maximum NMaximum P and Maximum N

radius=104 mm

41 5 0 97. , .ε σ= =r

Page 41: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Parallelization- General Limits

- Traditional Scheme and Its Limitations

- Proposed Scheme

- Comparison

Page 42: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Proposed Parallelization Scheme• Two Workload Distribution Strategies [1]

1. Project/Anterpolate: 2-D Pencil2. Propagate: 2-D Pencil3. Interpolate: 2-D Pencil3.5. Redistribute4. Correct: 3-D Block5. Communicate Near/Redistribute

P4 P5 P6 P4 P5 P6

P1 P2 P3 P1 P2 P3yy

xzxz

[1] Wei, Yılmaz,“A 2-D decomposition based parallelization of AIM for 3-D BIOEM problems,” IEEE APS Symp., 2011

Page 43: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Proposed Parallelization Scheme• Two Workload Distribution Strategies

1. Project/Anterpolate: 2-D Pencil

P4 P5 P6

2. Propagate: 2-D Pencil3. Interpolate: 2-D Pencil3.5. Redistribute4. Correct: 3-D Block

a) Shift Balance in X Dimensionb) Shift Balance in Y Dimensionc) Shift Balance in Z Dimension

5 Communicate Near/RedistributeP1 P2 P3

y

5. Communicate Near/Redistribute

xz

Page 44: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Proposed Parallelization Scheme• Two Workload Distribution Strategies

1. Project/Anterpolate: 2-D Pencil2. Propagate: 2-D Pencil3. Interpolate: 2-D Pencil3.5. Redistribute4. Correct: 3-D Block

P4 P5 P6

a) Shift Balance in X Dimensionb) Shift Balance in Y Dimensionc) Shift Balance in Z Dimension

5 Communicate Near/Redistribute5. Communicate Near/RedistributeP1 P2 P3

y

xz

Page 45: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Proposed Parallelization Scheme• Two Workload Distribution Strategies

1. Project/Anterpolate: 2-D Pencil2. Propagate: 2-D Pencil3. Interpolate: 2-D Pencil3.5. Redistribute4. Correct: 3-D Block P4

P5P6

a) Shift Balance in X Dimensionb) Shift Balance in Y Dimensionc) Shift Balance in Z Dimension

5 Communicate Near/Redistribute5. Communicate Near/RedistributeP1 P2

P3

y

xz

Page 46: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Proposed Parallelization Scheme• Two Workload Distribution Strategies

1. Project/Anterpolate: 2-D Pencil2. Propagate: 2-D Pencil3. Interpolate: 2-D Pencil3.5. Redistribute4. Correct: 3-D Block P4

P5P6

a) Shift Balance in X Dimensionb) Shift Balance in Y Dimensionc) Shift Balance in Z Dimension

5 Communicate Near/Redistribute5. Communicate Near/RedistributeP1 P2

P3

y

xz

Page 47: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Proposed Parallelization Scheme• Two Workload Distribution Strategies

1. Project/Anterpolate: 2-D Pencil xy yz 3 2P P P= × = ×2. Propagate: 2-D Pencil3. Interpolate: 2-D Pencil3.5. Redistribute4. Correct: 3-D Block

3 2P P P× ×2D xy yz

max max maxP P P= ×

cx cy cy czmin( , ) min( , )N N N N= ×

5. Communicate Near/Redistribute

P4 P5 P6

P4 P5 P6

P1 P2 P3

x

y

z x

y

z

P1 P2 P3

Page 48: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Proposed Parallelization Scheme• Two Workload Distribution Strategies

1. Project/Anterpolate: 2-D Pencil xy yz 3 2P P P= × = ×2. Propagate: 2-D Pencil3. Interpolate: 2-D Pencil3.5. Redistribute4. Correct: 3-D Block

3 2P P P× ×

cx cy cy czmin( , ) min( , )N N N N= ×

2D xy yzmax max maxP P P= ×

5. Communicate Near/RedistributeP1 P2 P3

P4 P5 P6

P4 P5 P6

x

y

z x

y

z

Page 49: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Proposed Parallelization Scheme• Two Workload Distribution Strategies

1. Project/Anterpolate: 2-D Pencil xy yz 3 2P P P= × = ×2. Propagate: 2-D Pencil3. Interpolate: 2-D Pencil3.5. Redistribute4. Correct: 3-D Block

3 2P P P× ×

cx cy cy czmin( , ) min( , )N N N N= ×

2D xy yzmax max maxP P P= ×

5. Communicate Near/Redistribute

P3P6

P2

P3P6

P6

P1

P2P5

P5

x

y

z

P1

P4P4

x

y

z

Page 50: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Proposed Parallelization Scheme• Two Workload Distribution Strategies

1. Project/Anterpolate: 2-D Pencil2. Propagate: 2-D Pencil3. Interpolate: 2-D Pencil3.5. Redistribute4. Correct: 3-D Block

P4 P5 P6

5. Communicate Near/Redistribute

y

P1 P2 P3

xz

Page 51: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Proposed Parallelization Scheme• Two Workload Distribution Strategies

1. Project/Anterpolate: 2-D Pencil2. Propagate: 2-D Pencil3. Interpolate: 2-D Pencil3.5. Redistribute4. Correct: 3-D Block

P4 P5 P6

5. Communicate Near/Redistribute

y

P1 P2 P3

xz

P4

P5P6

P4

P5P6

P2P1

P3P4

P2P1

P3P4

Page 52: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Proposed Parallelization Scheme• Two Workload Distribution Strategies

1. Project/Anterpolate: 2-D Pencil2. Propagate: 2-D Pencil3. Interpolate: 2-D Pencil3.5. Redistribute4. Correct: 3-D Block P4

P5P6

5. Communicate Near/Redistribute

P1 P2P3

y

xz

P4

P5P6

P4

P5P6

P2P1

P3P4

P2P1

P3P4

Page 53: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Proposed Parallelization Scheme• Two Workload Distribution Strategies

1. Project/Anterpolate: 2-D Pencil2. Propagate: 2-D Pencil

a) FFTZ – ZY Transpose – FFTY– YX Transpose – FFTX

b) Multiply

P4 P5 P6

) p yc) IFFTX – XY Transpose – IFFTY

– YZ Transpose – IFFTZ3. Interpolate: 2-D Pencil3 5 Redistribute P1 P2 P33.5. Redistribute4. Correct: 3-D Block5. Communicate Near/Redistribute

near IFFT{FFT[ ] FFT[ ]}T+Λ ⋅≈ ΛZ I GZI I x

y

z

P1 P2 P3

{ [ ] [ ]}

near C Clog+ + +( )

N pN N N pNO

near C

xy yzmax max

+ +??? ??

( )+?

N pN N

P

pN

POMemory:

CPU time

x

y

z

xy yzmax max

+ + +?? ? ??

( )? P P

O

near rd

yz xymax max

+ +( )OP P

C CP P

+

per iteration:

per iteration:Comm time

Page 54: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Proposed Parallelization Scheme• Two Workload Distribution Strategies

1. Project/Anterpolate: 2-D Pencil2. Propagate: 2-D Pencil

a) FFTZ – ZY Transpose – FFTY– YX Transpose – FFTX

b) Multiply

P4 P5 P6

) p yc) IFFTX – XY Transpose – IFFTY

– YZ Transpose – IFFTZ3. Interpolate: 2-D Pencil3 5 Redistribute P1 P2 P33.5. Redistribute4. Correct: 3-D Block5. Communicate Near/Redistribute

near IFFT{FFT[ ] T }[ ]FFT+Λ ⋅≈ ΛZ I GZI I

P1 P2 P3

x

y

z{ [ ] }[ ]

C Cnear

+ +( +l go

)N N

OpNN pN

C

xy yzmax m

e

ax

n ar

+ +?? ??

( +??

)N

P PO

pNN pNMemory:

CPU time xy yz

max max

+ +? ??

( +??

)? P P

O

near rd

yz xymax max

+ +( )OP P

C CP P

+

per iteration:

per iteration:Comm time

Page 55: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Proposed Parallelization Scheme• Two Workload Distribution Strategies

1. Project/Anterpolate: 2-D Pencil

P1 P2 P3

P4 P5 P6

2. Propagate: 2-D Pencila) FFTZ – ZY Transpose – FFTY

– YX Transpose – FFTXb) Multiply) p yc) IFFTX – XY Transpose – IFFTY

– YZ Transpose – IFFTZ3. Interpolate: 2-D Pencil3 5 Redistribute3.5. Redistribute4. Correct: 3-D Block5. Communicate Near/Redistribute

near IFFT{FFT[ ] T }[ ]FFT+Λ ⋅≈ ΛZ I GZI I x

y

z

P5

P3P6

P5

P3P6

{ [ ] }[ ]

C Cnear

+ +( +l go

)N N

OpNN pN

C

xy yzmax m

e

ax

n ar

+ +?? ??

( +??

)N

P PO

pNN pNMemory:

CPU time

P4P1

P2

P4P1

P2xy yzmax max

+ +? ??

( +??

)? P P

O

near rd

yzm x

xymaxa

( + )+OP

PC

P

PC +

per iteration:

per iteration:Comm time

Page 56: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Proposed Parallelization Scheme• Two Workload Distribution Strategies

1. Project/Anterpolate: 2-D Pencil

P1 P2 P3

P4 P5 P6

2. Propagate: 2-D Pencila) FFTZ – ZY Transpose – FFTY

– YX Transpose – FFTXb) Multiply) p yc) IFFTX – XY Transpose – IFFTY

– YZ Transpose – IFFTZ3. Interpolate: 2-D Pencil3 5 Redistribute3.5. Redistribute4. Correct: 3-D Block5. Communicate Near/Redistribute

near IFFT{FFT[ ] T }[ ]FFT+Λ ⋅≈ ΛZ I GZI I x

y

z{ [ ] }[ ]

C Cnear

+ +( +l go

)N N

OpNN pN

C

xy yzmax m

e

ax

n ar

+ +?? ??

( +??

)N

P PO

pNN pNMemory:

CPU time xy yz

max max

+ +? ??

( +??

)? P P

O

near rd

yzm x

xymaxa

( + )+OP

PC

P

PC +

per iteration:

per iteration:Comm time

Page 57: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Proposed Parallelization Scheme• Two Workload Distribution Strategies

1. Project/Anterpolate: 2-D Pencil P3

2. Propagate: 2-D Pencila) FFTZ – ZY Transpose – FFTY

– YX Transpose – FFTXb) Multiply P2

P6

) p yc) IFFTX – XY Transpose – IFFTY

– YZ Transpose – IFFTZ3. Interpolate: 2-D Pencil3 5 Redistribute

P1

P5

3.5. Redistribute4. Correct: 3-D Block5. Communicate Near/Redistribute

near IFFT{FFT[ ] T }[ ]FFT+Λ ⋅≈ ΛZ I GZI I x

y

z

P4

P4

P5P6

P4

P5P6

{ [ ] }[ ]

C Cnear

+ +( +l go

)N N

OpNN pN

C

xy yzmax m

e

ax

n ar

+ +?? ??

( +??

)N

P PO

pNN pNMemory:

CPU time

P2P1

P3

P2P1

P3xy yzmax max

+ +? ??

( +??

)? P P

O

yz xymax

near rd

max

( + + )C CP P

PO

P+

per iteration:

per iteration:Comm time

Page 58: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Proposed Parallelization Scheme• Two Workload Distribution Strategies

1. Project/Anterpolate: 2-D Pencil P3

2. Propagate: 2-D Pencila) FFTZ – ZY Transpose – FFTY

– YX Transpose – FFTXb) Multiply P2

P6

) p yc) IFFTX – XY Transpose – IFFTY

– YZ Transpose – IFFTZ3. Interpolate: 2-D Pencil3 5 Redistribute

P1

P5

3.5. Redistribute4. Correct: 3-D Block5. Communicate Near/Redistribute

near IFFT{FFT[ ] T }[ ]FFT+Λ ⋅≈ ΛZ I GZI I x

y

z

P4

{ [ ] }[ ]

C Cnear

+ +( +l go

)N N

OpNN pN

C

xy yzmax m

e

ax

n ar

+ +?? ??

( +??

)N

P PO

pNN pNMemory:

CPU time xy yz

max max

+ +? ??

( +??

)? P P

O

yz xymax

near rd

max

( + + )C CP P

PO

P+

per iteration:

per iteration:Comm time

Page 59: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Proposed Parallelization Scheme• Two Workload Distribution Strategies

1. Project/Anterpolate: 2-D Pencil P3

2. Propagate: 2-D Pencila) FFTZ – ZY Transpose – FFTY

– YX Transpose – FFTXb) Multiply P2

P6

) p yc) IFFTX – XY Transpose – IFFTY

– YZ Transpose – IFFTZ3. Interpolate: 2-D Pencil3 5 Redistribute

P1

P5

3.5. Redistribute4. Correct: 3-D Block5. Communicate Near/Redistribute

near IFFT{ [FFT }] F [F T ]T+Λ ⋅≈ ΛGZI IZ I x

y

z

P4

{ [ }] [ ]

C Cnear

+ +( +l go

)N N

OpNN pN

C

xy yzmax m

e

ax

n ar

+ +?? ??

( +??

)N

P PO

pNN pNMemory:

CPU time xy yz

max max

+ +? ??

( +??

)? P P

O

yz xymax

near rd

max

( + + )C CP P

PO

P+

per iteration:

per iteration:Comm time

Page 60: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Proposed Parallelization Scheme• Two Workload Distribution Strategies

1. Project/Anterpolate: 2-D Pencil P3

2. Propagate: 2-D Pencila) FFTZ – ZY Transpose – FFTY

– YX Transpose – FFTXb) Multiply P2

P6

) p yc) IFFTX – XY Transpose – IFFTY

– YZ Transpose – IFFTZ3. Interpolate: 2-D Pencil3 5 Redistribute

P1

P5

3.5. Redistribute4. Correct: 3-D Block5. Communicate Near/Redistribute

near {IFFT FF [ ] [ ]}T FFTT+Λ ⋅≈ ΛGZZI II x

y

z

P4

{ [ ] [ ]}

C Cnear

+ +( +l go

)N N

OpNN pN

C

xy yzmax m

e

ax

n ar

+ +?? ??

( +??

)N

P PO

pNN pNMemory:

CPU time xy yz

max max

+ +? ??

( +??

)? P P

O

yz xymax

near rd

max

( + + )C CP P

PO

P+

per iteration:

per iteration:Comm time

Page 61: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Proposed Parallelization Scheme• Two Workload Distribution Strategies

1. Project/Anterpolate: 2-D Pencil

P1 P2 P3

P4 P5 P6

2. Propagate: 2-D Pencila) FFTZ – ZY Transpose – FFTY

– YX Transpose – FFTXb) Multiply) p yc) IFFTX – XY Transpose – IFFTY

– YZ Transpose – IFFTZ3. Interpolate: 2-D Pencil3 5 Redistribute3.5. Redistribute4. Correct: 3-D Block5. Communicate Near/Redistribute

near {IFFT FF [ ] [ ]}T FFTT+Λ ⋅≈ ΛGZZI II x

y

z

P4

P5P6

P4

P5P6

{ [ ] [ ]}

C Cnear

+ +( +l go

)N N

OpNN pN

C

xy yzmax m

e

ax

n ar

+ +?? ??

( +??

)N

P PO

pNN pNMemory:

CPU time

P2P1

P3

P2P1

P3xy yzmax max

+ +? ??

( +??

)? P P

O

yz xymax

near rd

max

( + + )C CP P

PO

P+

per iteration:

per iteration:Comm time

Page 62: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Proposed Parallelization Scheme• Two Workload Distribution Strategies

1. Project/Anterpolate: 2-D Pencil

P1 P2 P3

P4 P5 P6

2. Propagate: 2-D Pencila) FFTZ – ZY Transpose – FFTY

– YX Transpose – FFTXb) Multiply) p yc) IFFTX – XY Transpose – IFFTY

– YZ Transpose – IFFTZ3. Interpolate: 2-D Pencil3 5 Redistribute3.5. Redistribute4. Correct: 3-D Block5. Communicate Near/Redistribute

near {IFFT FF [ ] [ ]}T FFTT+Λ ⋅≈ ΛGZZI II x

y

z{ [ ] [ ]}

C Cnear

+ +( +l go

)N N

OpNN pN

C

xy yzmax m

e

ax

n ar

+ +?? ??

( +??

)N

P PO

pNN pNMemory:

CPU time xy yz

max max

+ +? ??

( +??

)? P P

O

yz xymax

near rd

max

( + + )C CP P

PO

P+

per iteration:

per iteration:Comm time

Page 63: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Proposed Parallelization Scheme• Two Workload Distribution Strategies

1. Project/Anterpolate: 2-D Pencil2. Propagate: 2-D Pencil

a) FFTZ – ZY Transpose – FFTY– YX Transpose – FFTX

b) Multiply

P4 P5 P6

) p yc) IFFTX – XY Transpose – IFFTY

– YZ Transpose – IFFTZ3. Interpolate: 2-D Pencil3 5 Redistribute P1 P2 P33.5. Redistribute4. Correct: 3-D Block5. Communicate Near/Redistribute

near {IFFT FF [ ] [ ]}T FFTT+Λ ⋅≈ ΛGZZI II x

y

z

P1 P2 P3

P5

P3P6

P5

P3P6

{ [ ] [ ]}

C Cnear

+ +( +l go

)N N

OpNN pN

C

xy yzmax m

e

ax

n ar

+ +?? ??

( +??

)N

P PO

pNN pNMemory:

CPU time

P4P1

P2

P4P1

P2xy yzmax max

+ +? ??

( +??

)? P P

O

yz xymax

near rd

max

( + + )C CP P

PO

P+

per iteration:

per iteration:Comm time

Page 64: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Proposed Parallelization Scheme• Two Workload Distribution Strategies

1. Project/Anterpolate: 2-D Pencil2. Propagate: 2-D Pencil

a) FFTZ – ZY Transpose – FFTY– YX Transpose – FFTX

b) Multiply

P4 P5 P6

) p yc) IFFTX – XY Transpose – IFFTY

– YZ Transpose – IFFTZ3. Interpolate: 2-D Pencil3 5 Redistribute P1 P2 P33.5. Redistribute4. Correct: 3-D Block5. Communicate Near/Redistribute

near {IFFT FF [ ] [ ]}T FFTT+Λ ⋅≈ ΛGZZI II x

y

z

P1 P2 P3

{ [ ] [ ]}

C Cnear

+ +( +l go

)N N

OpNN pN

C

xy yzmax m

e

ax

n ar

+ +?? ??

( +??

)N

P PO

pNN pNMemory:

CPU time xy yz

max max

+ +? ??

( +??

)? P P

O

yz xymax

near rd

max

( + + )C CP P

PO

P+

per iteration:

per iteration:Comm time

Page 65: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Proposed Parallelization Scheme• Two Workload Distribution Strategies

1. Project/Anterpolate: 2-D Pencil2. Propagate: 2-D Pencil

a) FFTZ – ZY Transpose – FFTY– YX Transpose – FFTX

b) Multiply

P4 P5 P6

) p yc) IFFTX – XY Transpose – IFFTY

– YZ Transpose – IFFTZ3. Interpolate: 2-D Pencil3 5 Redistribute P1 P2 P33.5. Redistribute4. Correct: 3-D Block5. Communicate Near/Redistribute

near {IFFT FF [ ] [ ]}T FFTT+Λ ⋅≈ ΛGZZI II x

y

z

P1 P2 P3

{ [ ] [ ]}

C Cnear

+lo

+ +g

( )N N

OpN pNN

C

xy yzmax ma

n ar

x

e

??+( + +

????)

N

P PO

pN pNNMemory:

CPU time

x

y

z

xy yzmax max

+? ?? ???

+ +( )P P

O

yz xymax

near rd

max

( + + )C CP P

PO

P+

per iteration:

per iteration:Comm time

Page 66: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Proposed Parallelization Scheme• Two Workload Distribution Strategies

1. Project/Anterpolate: 2-D Pencil2. Propagate: 2-D Pencil

a) FFTZ – ZY Transpose – FFTY– YX Transpose – FFTX

b) Multiply

P4 P5 P6

) p yc) IFFTX – XY Transpose – IFFTY

– YZ Transpose – IFFTZ3. Interpolate: 2-D Pencil3 5 Redistribute P1 P2 P33.5. Redistribute4. Correct: 3-D Block5. Communicate Near/Redistribute

near {IFFT FF [ ] [ ]}T FFTT+Λ ⋅≈ ΛGZZI II x

y

z{ [ ] [ ]}

C Cnear

+lo

+ +g

( )N N

OpN pNN

C

xy yzmax ma

n ar

x

e

??+( + +

????)

N

P PO

pN pNNMemory:

CPU time P4

P5P6

P4

P5P6

xy yzmax max

+? ?? ???

+ +( )P P

O

rd

yz xym

ne

ax

ar

max

+( + )P P

CP P

CO +

per iteration:

per iteration:Comm time P2

P1

P3

P2P1

P3

Page 67: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Proposed Parallelization Scheme• Two Workload Distribution Strategies

1. Project/Anterpolate: 2-D Pencil2. Propagate: 2-D Pencil

a) FFTZ – ZY Transpose – FFTY– YX Transpose – FFTX

b) Multiply

P4P5

P6

) p yc) IFFTX – XY Transpose – IFFTY

– YZ Transpose – IFFTZ3. Interpolate: 2-D Pencil3 5 Redistribute P1 P23.5. Redistribute4. Correct: 3-D Block5. Communicate Near/Redistribute

P1P3

x

y

znear {IFFT FF [ ] [F ]F }T TT≈ + ⋅Λ ΛGZZI I I{ [ ] [ ]}

C Cnear

( + + +log

)N NpN p

ONN

C

xy yzmax ma

ne r

x

a

??( + +

?+

? ?)

N

P PO

pN pNNP

Memory:

CPU time xy yz

max xma

( + + +?? ?? ?

)OPPP

rd

yz xym

ne

ax

ar

max

+( + )P P

CP P

CO +

per iteration:

per iteration:Comm time

Page 68: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Proposed Parallelization Scheme• Two Workload Distribution Strategies

1. Project/Anterpolate: 2-D Pencil2. Propagate: 2-D Pencil

a) FFTZ – ZY Transpose – FFTY– YX Transpose – FFTX

b) Multiply

P4P5

P6

) p yc) IFFTX – XY Transpose – IFFTY

– YZ Transpose – IFFTZ3. Interpolate: 2-D Pencil3 5 Redistribute P1 P23.5. Redistribute4. Correct: 3-D Block5. Communicate Near/Redistribute

near {IFFT FF [ ] [F ]F }T TT≈ + ⋅Λ ΛGZZI I I

P1 P3

x

y

z{ [ ] [ ]}

C Cnear

( + + +log

)N NpN p

ONN

C

xy yzmax ma

ne r

x

a

??( + +

?+

? ?)

N

P PO

pN pNNP

Memory:

CPU time P4

P5P6

P4

P5P6

xy yzmax xma

( + + +?? ?? ?

)OPPP

near rd

yz xymax max

( + + )P

CP

OP

CP

+

per iteration:

per iteration:Comm time P2

P1

P3

P2P1

P3

Page 69: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Parallelization- General Limits

- Traditional Scheme and Its Limitations

- Proposed Scheme

- Comparison

Page 70: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Traditional vs. Proposed Scheme

1. Correction computations better balanced

2. Grid limitation extended 2D cx cy cy czmax

min( , )min( , )P N N N N=

Page 71: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Traditional vs. ProposedMaximum P and Maximum NMaximum P and Maximum N

radius=104 mm

41 5 0 97. , .ε σ= =r

Page 72: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

P vs. N (Relative Efficiency=90%)

0.83N

0.59N

0 68

Ranger

radius=104 mm

41 5 0 97. , .ε σ= =r

0.68N

Page 73: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

P vs. N (Relative Efficiency=50%)

0.43N

Ranger

radius=104 mm

41 5 0 97. , .ε σ= =r

0.66N

Page 74: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

General Limits of Parallelization

Sequential part of the code (Amdahl’s Law)• (Hard) Scalability Limitations 1

N i N tiLoad imbalance/Can’t divide computations further (run out of parallelism)

C i ti l t /b d idth3

2- No size N arrays or operations

- Two workload distribution strategies- 2D pencil decomposition based 3D FFTs [1]

Communication latency/bandwidth

Mem

ory

3M

emor

y- Hybrid MPI/OpenMP [2]

e

ock

Tim

e or

M

ock

Tim

e or

M

all C

lock

Tim

e

Wal

l Clo

P 31 2

Wal

l Clo

P

Wa

P31 2

[2] Wei, Yılmaz,“A hybrid message passing/shared memory parallelization of the adaptive integral method for multi-core clusters,” Parallel Comput., 2011.

[1] Wei, Yılmaz,“A 2-D decomposition based parallelization of AIM for 3-D BIOEM problems,” IEEE APS Symp., 2011

Page 75: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

BIOEM Problems- High-Resolution Simulations

- AustinMan Antropohomorphic Model

Page 76: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

High-Resolution Simulations683 mm31030 mm3 683 mm3

Pabsorbed

11 mm3 11 mm315 mm3

0.2 mm30.3 mm3 0.2 mm3

Page 77: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Sample Tissues: 11 mm3 vs. 0.2 mm3

Pabsorbed

Muscle + Tendon

Skin Dry + Fat

Bone Cortical+ Bone Marrow

Page 78: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Sample Tissues: 11 mm3 vs. 0.2 mm3

1 2

Pabsorbed

3

4

1. White+Gray Matter2. Teeth3. Tongue4 Gland4. Gland5. Spinal Chord6. Mucous Membrane7. Vitreous Humor8. Blood Vessel

Page 79: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Sample Tissues: 11 mm3 vs. 0.2 mm3

5 6

Pabsorbed

8

71. White+Gray Matter2. Teeth3. Tongue4 Gland4. Gland5. Spinal Chord6. Mucous Membrane7. Vitreous Humor8. Blood Vessel

Page 80: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

AustinMan Anthropomorphic Model

CAT MRI C S tiCAT MRI Cross Sections

• Semi-automated+ Automated: Color based identification+ Automated: Color based identification- Manual: Knowledge of anatomy

• Boundary&Region Identification Software Kit (BRISkit)+ Create masks+ Region growing - MATLAB graphical user interface

AustinMan Model

g- Identify regions- Edit boundaries

Page 81: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

AustinMan Anthropomorphic Model

Sub-mmpixel

l tiresolution(1/3 mm)

Page 82: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

AustinMan Anthropomorphic ModelDownload at: web2.corral.tacc.utexas.edu/AustinManEMVoxels/

Page 83: Fast Integral Equation Methods for BIOEMfor BIOEM in the ...users.ece.utexas.edu/~yilmaz/CEM_APPLICATIONS/yilmaz_uiuc11.pdfHombach, Meier, Burkhards, Kuhn, Kuster, e dependence of

Acknowledgements

Fangzhou Wei Madison Ball

Normalized E-Field

0

Jackson MasseyCemil GeyikDr. Tahir MalasDr Victor Eijkhout

-10

-20

Dr. Victor EijkhoutProf. Paul FindellProf. John PearceProf. Leszek Demkowicz

11 mm315 mm3 11 mm3-30

-40

Prof. Robert van de Geijn

0.2 mm30.3 mm3 0.2 mm3