Distributed computing of the GEOS-Chem model
1
Distributed computing of the GEOS-Chem model
Kevin Bowman
Lei Pan, Qinbin Li, and Paul von Allmen
California Institute of Technology
Jet Propulsion Laboratory
2
Objectives
• The objective of this activity is to develop a scalable, parallel version of the GEOS-Chem code based on a distributed computing architecture that is suitable for the JPL 1024-processing-element (PE) institutional cluster.
• The goal is to improve the GEOS-Chem wall-clock performance by at least one order of magnitude over the current capability.
• The current capability: the speedup of GEOS-Chem with the number of CPUs currently plateaus at 4 processors on a shared-memory platform such as the SGI O2K. Best wall-clock performance is completion of a 1-month model simulation on a 200 x 250 km grid within 1 day.
3
Approach
• The primary calculations in GEOS-Chem are:
– Chemistry (60%)
– Transport, deposition, emissions (40%)
• The chemistry component is inherently parallel and is therefore the most logical starting point.
• The initial stage is to use a master/slave architecture for the parallelization of the chemistry.
• The second stage is to migrate towards a domain decomposition design that will handle both transport and chemistry.
4
GEOS-Chem computational flow

• Initialization
• Start 6-h loop
– Met fields (a3 & a6): unzip, read
– Seasonal, monthly, daily data
– Interpolate met fields
– Compute air mass quantities
– Unit conversion: kg -> v/v
• Start dynamic time step
– Archive diagnostics (diag3)
– Timeseries diagnostics (diag49)
– Transport: DO_TRANSPORT (transport_mod.f) -> TPCORE (tpcore_mod.f), TPCORE_FVDAS (tpcore_fvdas_mod.f90)
– Turbulent mixing: TURBDAY (turbday.f)
– Convection: DO_CONVECTION (convection_mod.f) -> NFCLDMX (convection_mod.f), FVDAS_CONVECT (fvdas_convect_mod.f)
– Dry deposition: DO_DRYDEP (drydep_mod.f) -> DEPVEL (drydep_mod.f)
– Emissions: DO_EMISSIONS (emissions_mod.f) -> EMISSDR (emissdr.f)
– Compute air mass quantities
– Upper boundary flux conditions
– Unit conversion: kg -> v/v
– Chemistry: DO_CHEMISTRY (chemistry_mod.f) -> CHEM (chem.f) -> PHYSPROC (physproc.f) -> CALCRATE (calcrate.f), SMVGEAR (smvgear.f)
– Wet deposition: DO_WETDEP (wetscav_mod.f) -> WETDEP (wetscav_mod.f)
• End dynamic time step
• End 6-h loop
• Individual operations are annotated with 15-min and 60-min time steps in the original flow diagram.
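The operator-split flow above can be summarized as a schematic driver loop. The sketch below is illustrative only: the routine names are stand-ins for the GEOS-Chem entry points listed above, and the loop counts are hypothetical examples, not the model's actual time-step settings.

```fortran
! Schematic sketch of the operator-split driver loop implied by the
! flow above.  Routine names and loop counts are illustrative stand-ins,
! not the real GEOS-Chem entry points.
program geoschem_flow_sketch
  implicit none
  integer :: n6h, nstep
  integer, parameter :: n_6h_periods = 4     ! e.g. one model day
  integer, parameter :: steps_per_6h = 24    ! e.g. 15-min dynamic steps

  call phase('Initialization')
  do n6h = 1, n_6h_periods
     call phase('Read/interpolate met fields (a3 & a6)')
     do nstep = 1, steps_per_6h
        call phase('Archive diagnostics (diag3, diag49)')
        call phase('Transport        (TPCORE / TPCORE_FVDAS)')
        call phase('Turbulent mixing (TURBDAY)')
        call phase('Convection       (NFCLDMX / FVDAS_CONVECT)')
        call phase('Dry deposition   (DEPVEL)')
        call phase('Emissions        (EMISSDR)')
        call phase('Chemistry        (CALCRATE + SMVGEAR)')
        call phase('Wet deposition   (WETDEP)')
     end do
  end do

contains
  subroutine phase(name)
    character(len=*), intent(in) :: name
    ! Stand-in for the corresponding GEOS-Chem call.
    print *, trim(name)
  end subroutine phase
end program geoschem_flow_sketch
```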
5
Master/slave architecture
Logical sequence within one time step:
• GEOS-Chem master node: transport, turbulent mixing, convection; dry deposition, emissions
• Slave nodes: chemistry, executed in parallel across the slave PEs
• GEOS-Chem master node: wet deposition
6
Chemistry
Current (serial) column loop:
PHYSPROC
FOR ii = 1, 2300 DO
  CALCRATE
  SMVGEAR
ENDDO

Master/slave version: the loop over ~2300 grid columns is split across N PEs (PE 1, PE 2, ..., PE N), with MPI-SEND / MPI-RECEIVE used to distribute the column data and collect the results:
PHYSPROC
FOR ii = 1, 2300/N DO   (on each PE)
  CALCRATE
  SMVGEAR
ENDDO
MPI-SEND / MPI-RECEIVE
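A minimal MPI sketch of this master/slave chemistry split is shown below. It is an illustration under simplifying assumptions only: the packing of each column into a flat array of nvar reals, the even division of the 2300 columns among PEs, and the chem_column routine (standing in for CALCRATE + SMVGEAR) are all hypothetical placeholders, not the actual GEOS-Chem interfaces.

```fortran
! Minimal sketch of the master/slave chemistry split.  Each of the
! ~2300 chemistry columns is assumed to pack into nvar reals; the
! routine chem_column stands in for CALCRATE + SMVGEAR.
program master_slave_chem
  use mpi
  implicit none
  integer, parameter :: ncols = 2300, nvar = 100
  integer :: ierr, rank, nprocs, nlocal, p, i
  integer :: status(MPI_STATUS_SIZE)
  real(8), allocatable :: cols(:,:), mycols(:,:)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
  nlocal = ncols / nprocs                       ! assume even division for simplicity
  allocate(mycols(nvar, nlocal))

  if (rank == 0) then
     allocate(cols(nvar, ncols)); cols = 1.0d0  ! placeholder column data
     do p = 1, nprocs-1                         ! master sends one block per slave (MPI-SEND)
        call MPI_Send(cols(1, p*nlocal+1), nvar*nlocal, MPI_DOUBLE_PRECISION, &
                      p, 0, MPI_COMM_WORLD, ierr)
     end do
     mycols = cols(:, 1:nlocal)                 ! master keeps the first block
  else
     call MPI_Recv(mycols, nvar*nlocal, MPI_DOUBLE_PRECISION, 0, 0, &
                   MPI_COMM_WORLD, status, ierr)
  end if

  do i = 1, nlocal                              ! chemistry for this PE's columns
     call chem_column(mycols(:, i))             ! stands in for CALCRATE + SMVGEAR
  end do

  if (rank == 0) then
     cols(:, 1:nlocal) = mycols
     do p = 1, nprocs-1                         ! master collects results (MPI-RECEIVE)
        call MPI_Recv(cols(1, p*nlocal+1), nvar*nlocal, MPI_DOUBLE_PRECISION, &
                      p, 1, MPI_COMM_WORLD, status, ierr)
     end do
  else
     call MPI_Send(mycols, nvar*nlocal, MPI_DOUBLE_PRECISION, 0, 1, &
                   MPI_COMM_WORLD, ierr)
  end if

  call MPI_Finalize(ierr)
contains
  subroutine chem_column(c)
    real(8), intent(inout) :: c(:)
    c = 0.99d0 * c                              ! placeholder "chemistry"
  end subroutine chem_column
end program master_slave_chem
```

The point-to-point traffic funnelled through the single master node is exactly where the contention reported on the Performance slide arises.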
7
Amdahl’s Law
Amdahl's law describes the speed-up from parallelization as a function of processor number, the non-parallelizable component, processor communication, and contention:

$$\mathrm{Speedup} = \frac{T_{seq}}{T_{np} + T_{com}(P) + T_{cont}(P) + \dfrac{T_{seq} - T_{np}}{P}}$$

Tseq: sequential time
Tnp: non-parallelizable component time
Tcom: communication time between processors
Tcont: contention time between processors
P: number of processors
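For orientation (a straightforward consequence of the formula above, not part of the original slide): with communication and contention neglected, the expression reduces to the classical Amdahl bound, which is limited by the non-parallelizable component:

$$\mathrm{Speedup}\Big|_{T_{com}=T_{cont}=0} = \frac{T_{seq}}{T_{np} + \dfrac{T_{seq}-T_{np}}{P}} \;\longrightarrow\; \frac{T_{seq}}{T_{np}} \quad (P \to \infty)$$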
8
Performance
• Test run on 4x5 deg, full chemistry
• 1024-processor (dual CPU/node) Dell cluster
• Xeon processors, ~3 Tflops theoretical peak, ~2 Tbyte RAM
• Pentium 3.2 GHz and 2 GB RAM
• Communication and contention cost removed for analysis
• Tseq: 649.83 sec
• Chemistry (seq): 432.21 sec (66.5%)
• SMVGEAR+CALCRATE: 0.0076 sec/node
• Optimal trade-off between speedup and processor count is reached with 32 processors

However,
• Total time with master/slave architecture is 2230 sec
• Contention time: 1825.95 sec, or 82% of wall-clock time
• Communication time: ~0.0063*2300 sec
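As a quick consistency check using only the numbers above (a back-of-the-envelope calculation, not from the original slide):

$$T_{com} \approx 0.0063 \times 2300 \approx 14.5\ \mathrm{s}, \qquad \frac{2230\ \mathrm{s}}{649.83\ \mathrm{s}} \approx 3.4$$

so communication itself is cheap, but contention makes the master/slave run roughly 3.4 times slower than the sequential code.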
Master/slave architecture is not a viable option for chemistry or transport.
9
Domain Decomposition
• All computations (transport, chemistry) for a grid cell are performed on one processor
• For transport, ghost boundaries must be used

(Figure: the horizontal domain is split across a 2-D processor grid with processors labeled PE 1,1, PE 1,2, PE 1,3, ...; the legend marks grid cells and ghost boundaries.)
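A minimal sketch of setting up such a 2-D processor grid with MPI is given below. The global grid size, the periodicity choices, and the even division of cells among processors are illustrative assumptions, not the project's actual decomposition.

```fortran
! Generic sketch of a 2-D horizontal domain decomposition with MPI.
! Grid sizes are illustrative; the real decomposition would partition
! the longitude-latitude grid of the chosen resolution.
program domain_decomp_sketch
  use mpi
  implicit none
  integer, parameter :: nx_global = 72, ny_global = 46   ! e.g. the 4x5 degree grid
  integer :: ierr, rank, nprocs, cart_comm
  integer :: dims(2), coords(2)
  logical :: periods(2)
  integer :: west, east, south, north, nx_local, ny_local

  call MPI_Init(ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  dims = 0
  call MPI_Dims_create(nprocs, 2, dims, ierr)        ! factor nprocs into a 2-D grid
  periods = (/ .true., .false. /)                    ! periodic in longitude only
  call MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, .true., cart_comm, ierr)
  call MPI_Comm_rank(cart_comm, rank, ierr)
  call MPI_Cart_coords(cart_comm, rank, 2, coords, ierr)

  ! Neighbours for the ghost-boundary exchange
  call MPI_Cart_shift(cart_comm, 0, 1, west, east, ierr)
  call MPI_Cart_shift(cart_comm, 1, 1, south, north, ierr)

  ! Local subdomain size (assuming even division for simplicity)
  nx_local = nx_global / dims(1)
  ny_local = ny_global / dims(2)
  print *, 'rank', rank, 'coords', coords, 'owns', nx_local, 'x', ny_local, 'cells'

  call MPI_Finalize(ierr)
end program domain_decomp_sketch
```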
10
Ghost Boundaries
(Figure: time sequence t, t+dt, t+2dt, t+3dt for two neighbouring processors PE1 and PE2, with message passing between t+2dt and t+3dt.)

Process
• Time t: current values of fields on all grid points are accessible by PE1 and PE2.
• Times t+dt and t+2dt: current values of fields are accessible by both PE1 and PE2 on a reduced set of grid points.
• Message passing: current values of fields are made accessible to both PE1 and PE2 on all grid points.
• Time t+3dt: situation identical to time t.

Salient features
• Information is exchanged between PE1 and PE2 every 3 time steps.
• Fields on all the grid points in the ghost boundary are exchanged.
• Fields on some grid points are computed redundantly by both PE1 and PE2.
Optimization of ghost boundary size
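A minimal sketch of such a ghost-boundary (halo) exchange is shown below, assuming a 1-D decomposition, a ghost width of 3 cells, and an exchange every 3 time steps as in the figure. The field layout, the stand-in advance routine, and the periodic neighbour choice are hypothetical placeholders, not the project's code.

```fortran
! Illustrative halo exchange for a 1-D decomposition of a single field.
! Ghost width NG = 3 lets each PE advance 3 local time steps before the
! next exchange, at the cost of redundantly recomputing points near the
! subdomain edges (as described above).  All names are placeholders.
program ghost_exchange_sketch
  use mpi
  implicit none
  integer, parameter :: ng = 3, nx_local = 20, nsteps = 12
  integer :: ierr, rank, nprocs, left, right, step
  integer :: status(MPI_STATUS_SIZE)
  real(8) :: field(1-ng:nx_local+ng)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
  left  = mod(rank - 1 + nprocs, nprocs)       ! periodic neighbours
  right = mod(rank + 1, nprocs)
  field = real(rank, 8)                        ! placeholder initial data

  do step = 1, nsteps
     if (mod(step - 1, ng) == 0) then
        ! Exchange the full ghost region with both neighbours.
        call MPI_Sendrecv(field(nx_local-ng+1), ng, MPI_DOUBLE_PRECISION, right, 0, &
                          field(1-ng),          ng, MPI_DOUBLE_PRECISION, left,  0, &
                          MPI_COMM_WORLD, status, ierr)
        call MPI_Sendrecv(field(1),             ng, MPI_DOUBLE_PRECISION, left,  1, &
                          field(nx_local+1),    ng, MPI_DOUBLE_PRECISION, right, 1, &
                          MPI_COMM_WORLD, status, ierr)
     end if
     call advance(field)                       ! stands in for one transport step
  end do

  call MPI_Finalize(ierr)
contains
  subroutine advance(f)
    real(8), intent(inout) :: f(1-ng:nx_local+ng)
    integer :: i
    real(8) :: tmp(1-ng:nx_local+ng)
    tmp = f
    ! Simple 3-point smoother as a stand-in for the transport stencil;
    ! interior and (redundant) ghost-adjacent points are updated.
    do i = 2-ng, nx_local+ng-1
       f(i) = (tmp(i-1) + tmp(i) + tmp(i+1)) / 3.0d0
    end do
  end subroutine advance
end program ghost_exchange_sketch
```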
11
Future Directions and Conclusions
• We have a preliminary design for the domain decomposition
• We expect to achieve roughly P^(1/2) (square-root-of-P) speed-up with this design.
• The I/O bottleneck (large volumes of data written to files) will be resolved by using a Parallel Virtual File System (PVFS) and MPI-IO (ROMIO) in order to maintain the scaling for a larger number of processors; a sketch of this kind of parallel output follows this list.
• We expect this approach to enable GEOS-Chem users to address a broad range of questions that are currently inhibited by computational constraints.
• These techniques will be beneficial not only to large systems, such as the JPL institutional cluster, but also to more modest cluster systems.
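As referenced in the I/O bullet above, a collective MPI-IO write of per-processor diagnostics might look like the following sketch. The file name, slice sizes, and byte layout are hypothetical; this is not the planned GEOS-Chem I/O layer, only an illustration of the ROMIO-style calls that PVFS supports.

```fortran
! Hypothetical sketch of a collective MPI-IO write, the kind of parallel
! output that PVFS + ROMIO support.  File name, sizes, and data layout
! are illustrative only.
program mpi_io_sketch
  use mpi
  implicit none
  integer, parameter :: nlocal = 1000
  integer :: ierr, rank, fh, i
  integer(kind=MPI_OFFSET_KIND) :: offset
  real(8) :: diag(nlocal)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  diag = (/ (real(rank*nlocal + i, 8), i = 1, nlocal) /)   ! placeholder diagnostics

  call MPI_File_open(MPI_COMM_WORLD, 'diag_output.bin', &
                     MPI_MODE_WRONLY + MPI_MODE_CREATE, MPI_INFO_NULL, fh, ierr)

  ! Each rank writes its own slice at a disjoint offset; the collective
  ! call lets ROMIO aggregate the requests for the parallel file system.
  offset = int(rank, MPI_OFFSET_KIND) * nlocal * 8_MPI_OFFSET_KIND
  call MPI_File_write_at_all(fh, offset, diag, nlocal, MPI_DOUBLE_PRECISION, &
                             MPI_STATUS_IGNORE, ierr)

  call MPI_File_close(fh, ierr)
  call MPI_Finalize(ierr)
end program mpi_io_sketch
```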
12
Distributed Computation
• Data on full grid
• Distribute data (MP)
• Distributed computation on P1, P2, ..., PN: chemistry, transport t → t+dt
• Inject boundary data (MP)
• Distributed computation on P1, P2, ..., PN: chemistry, transport t → t+dt
• Gather data (MP)
• Data on full grid
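A compact sketch of this distribute/compute/gather cycle, using MPI collectives for the distribute and gather steps, is given below. Field sizes and the number of time steps are illustrative; the boundary-injection step is only indicated by a comment, since it corresponds to the halo exchange sketched earlier.

```fortran
! Illustrative top-level cycle: scatter the full grid, run several
! chemistry/transport steps locally, then gather the full grid back.
! All sizes and names are placeholders.
program distribute_compute_gather
  use mpi
  implicit none
  integer, parameter :: nglobal = 4096, nsteps = 4
  integer :: ierr, rank, nprocs, nlocal, step
  real(8), allocatable :: full(:), local(:)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
  nlocal = nglobal / nprocs                    ! assume even division
  allocate(local(nlocal))
  if (rank == 0) then
     allocate(full(nglobal)); full = 1.0d0     ! data on full grid
  else
     allocate(full(1))                         ! dummy; unused on non-root ranks
  end if

  ! Distribute data (MP)
  call MPI_Scatter(full, nlocal, MPI_DOUBLE_PRECISION, &
                   local, nlocal, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)

  do step = 1, nsteps
     local = 0.99d0 * local                    ! stands in for chemistry + transport t -> t+dt
     ! Inject boundary data (MP): the ghost-boundary exchange with the
     ! neighbouring PEs would go here (see the halo-exchange sketch above).
  end do

  ! Gather data (MP): data on full grid again
  call MPI_Gather(local, nlocal, MPI_DOUBLE_PRECISION, &
                  full, nlocal, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)

  call MPI_Finalize(ierr)
end program distribute_compute_gather
```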