The Application of POSIX Threads and OpenMP to the U.S. NRC Neutron Kinetics Code PARCS

D.J. Lee and T.J. Downar
School of Nuclear Engineering, Purdue University
July, 2001
2
Contents
• Introduction
• Parallelism in PARCS
• Parallel Performance of PARCS
• Cache Analysis
• Conclusions
3
Introduction
4
PARCS
• "Purdue Advanced Reactor Core Simulator"
• U.S. NRC (Nuclear Regulatory Commission) Code for Nuclear Reactor Safety Analysis
• Developed at the School of Nuclear Engineering of Purdue University
• A Multi-Dimensional, Multi-Group Reactor Kinetics Code Based on the Nonlinear Nodal Method
5
Nuclear Power Plant

[Figure: schematic of a nuclear power plant, highlighting the nuclear reactor core]
6
Equations Solved in PARCS
• Time-Dependent Boltzmann Transport Equation

  \[
  \frac{1}{v}\frac{\partial \phi(\vec r,E,\hat\Omega,t)}{\partial t}
  + \hat\Omega\cdot\nabla\phi(\vec r,E,\hat\Omega,t)
  + \Sigma_t(\vec r,E)\,\phi(\vec r,E,\hat\Omega,t)
  = \int dE'\!\int d\hat\Omega'\;\Sigma_s(\vec r,E'\!\rightarrow\!E,\hat\Omega'\!\rightarrow\!\hat\Omega)\,\phi(\vec r,E',\hat\Omega',t)
  + S(\vec r,E,\hat\Omega,t)
  \]

• T/H Field Equations
  – Heat Conduction Equation
  – Heat Convection Equation
7
Spatial Coupling

Thermal-Hydraulics:
• Computes new coolant/fuel properties
• Sends moderator temp., vapor and liquid densities, void fraction, boron conc., and average, centerline, and surface fuel temp.
• Uses neutronic power as heat source for conduction

Neutronics:
• Uses coolant and fuel properties for local node conditions
• Updates macroscopic cross sections based on local node conditions
• Computes 3-D flux
• Sends node-wise power distribution
8
High Necessity of HPC for PARCS
• Acceleration Techniques in PARCS
  – Nonlinear CMFD Method: Global (Low Order) + Local (High Order)
  – BILU3D Preconditioned BICGSTAB
  – Wielandt Shift Method
• Still, the Computational Burden of PARCS is Very Large
  – Typically, the Calculation Speed is More Than an Order of Magnitude Slower Than Real Time
  – Examples
    • NEACRP Benchmark: Several Tens of Seconds for a 0.5 sec. Simulation
    • PARCS/TRAC Coupled Run: 4 Hours for a 100 sec. Simulation
9
Parallelism in PARCS
10
PARCS Computational Modules
• CMFD: Solves the "Global" Coarse Mesh Finite Difference Equation
• NODAL: Solves "Local" Higher Order Differenced Equations
• XSEC: Provides Temperature/Fluid Feedback through Cross Sections (Coefficients of the Boltzmann Equation)
• T/H: Solution of Temperature/Fluid Field Equations
11
Parallelism in PARCS
• NODAL and XSEC Modules:
  – Node-by-Node Calculation
  – Naturally Parallelizable
• T/H Module:
  – Channel-by-Channel Calculation
  – Naturally Parallelizable
• CMFD Module:
  – Domain Decomposition Preconditioning
  – Example: Split the Reactor into Two Halves
  – The Number of Iterations Depends on the Number of Domains
12
Why Multi-Threaded Programming?
• Coupling of Domains
  – The Information of One Plane at the Interface of Two Domains Must Be Transferred to Each Other
  – The Size of the Information to Be Exchanged is NOT SMALL Compared with the Amount of Calculation for Each Domain
• Message Passing
  – Large Communication Overhead
• Multi-Threading
  – Shared Address Space
  – Negligible Communication Overhead
13
Multi-threaded Programming
• OpenMP
  – FORTRAN, C, C++
  – Simple Implementation Based on Directives
• POSIX Threads
  – No Interface to FORTRAN
  – Developed a FORTRAN-to-C Wrapper
  – Much Caution Required to Avoid Race Conditions
14
POSIX Threads with FORTRAN: nuc_threads
• Mixed-language interface accessible to both the Fortran and C sections of the code
• Minimal set of threads functions:
  – nuc_init(*ncpu): initializes mutex and condition variables.
  – nuc_frk(*func_name, *nuc_arg, *arg): creates the POSIX threads.
  – nuc_bar(*iam): used for synchronization.
  – nuc_gsum(*iam, *A, *globsum): used to get a global sum of an array updated by each thread.
15
Implementation of OpenMP and Pthreads

[Figure: fork/join diagrams for Pthreads and OpenMP with two threads each. In the Pthreads implementation, threads are forked once at the beginning and joined once at the end, with synchronization points in between; in the OpenMP implementation, threads are forked and joined around each parallel region, idling between synchronizations.]
16
Parallel Performance of PARCS
17
Applications
• Matrix-Vector Multiplication
  – Subroutine "MatVec" of PARCS
  – Size of the Matrix is the Same as the NEACRP Benchmark
• NEACRP Reactor Transient Benchmark
  – Control Rod Ejection from Hot Zero Power Condition
  – Full 3-Dimensional Transient
18
Specification of Machine

Platform        | SUN ULTRA-80                     | SGI ORIGIN 2000
Number of CPUs  | 2                                | 32
CPU Type        | ULTRA SPARC II, 450 MHz          | MIPS R10000, 250 MHz, 4-way superscalar
L1 Cache        | 16 KB D-cache, 16 KB I-cache,    | 32 KB D-cache, 32 KB I-cache,
                | cache line size: 32 bytes        | cache line size: 32 bytes
L2 Cache        | 4 MB                             | 4 MB per CPU, cache line size: 128 bytes
Main Memory     | 1 GB                             | 16 GB
Compiler        | SUN Workshop 6 (FORTRAN 90 6.1)  | MIPSpro Compiler 7.2.1 (FORTRAN 90)
19
Specification of Machine

Platform        | LINUX Machine
Number of CPUs  | 4
CPU Type        | Intel Pentium-III, 550 MHz
L1 Cache        | 16 KB D-cache, 16 KB I-cache, cache line size: ? bytes
L2 Cache        | 512 KB
Main Memory     | 1 GB
Compiler        | NAGWare FORTRAN 90 Version 4.2

ftp://download.intel.com/design/PentiumIII/xeon/datashts/24509402.pdf
Slot 2 technology, 100 MHz bus, non-blocking cache
20
Matrix-Vector Multiplication (MatVec Subroutine of PARCS)

Time in seconds *2) with speedup in parentheses *3).

Machine (Serial) | OpenMP: 1 *1)  | 2            | 4                | 8           | Pthreads: 1 | 2           | 4           | 8
SGI (1.73)       | 1.73 (1.00)   | 0.92 (1.89)  | 0.52 (3.30) *4)  | 0.37 (4.72) | 1.72 (1.01) | 1.80 (0.96) | 1.91 (0.91) | 1.96 (0.88)
SUN (3.76)       | 23.43 (0.16)  | 13.26 (0.28) | --               | --          | 3.71 (1.02) | 1.93 (1.95) | --          | --

*1) Number of Threads  *2) Time (seconds)  *3) Speedup  *4) Core is Divided into 18 Planes
21
Matrix-Vector Multiplication (MatVec Subroutine of PARCS)

[Figure: speedup bar charts comparing OpenMP and Pthreads on a 0–5 speedup axis, for 1, 2, 4, and 8 threads on the SGI (serial run time: 1.73 s) and 1 and 2 threads on the SUN (serial run time: 3.76 s).]
22
NEACRP Benchmark (Simulation with Multiple Threads)

[Figure: Transient Power — Power (%) vs. TIME (sec) over 0–0.5 sec on a 0–500% axis, for the serial, 2-thread, 4-thread, and 8-thread runs.]
23
Parallel Performance (SUN)

Time (sec):
Module | Serial | Pthreads 1 *) | Pthreads 2 | Speedup
CMFD   | 36.7   | 32.1          | 20.8       | 1.77
Nodal  | 11.5   | 11.3          |  6.4       | 1.78
T/H    | 29.6   | 27.9          | 14.5       | 2.04
Xsec   |  7.6   |  7.1          |  3.7       | 2.04
Total  | 85.4   | 78.5          | 45.5       | 1.88

# of Updates:
Module | Serial | Pthreads 1 *) | Pthreads 2
CMFD   | 445    | 445           | 456
Nodal  |  31    |  31           |  33
T/H    | 216    | 216           | 216
Xsec   | 225    | 225           | 226

*) Number of Threads
24
Parallel Performance (SGI)

Time (sec) and Speedup:
Module | Serial | OpenMP 1 *1) | 2    | Speedup | 4    | Speedup | 8    | Speedup
CMFD   | 19.8   | 19.3         | 12.1 | 1.63    | 8.93 | 2.21    | 8.85 | 2.23
Nodal  |  9.0   |  9.2         |  5.8 | 1.55    | 3.56 | 2.53    | 2.87 | 3.14
T/H    | 26.6   | 25.3         | 12.3 | 2.17    | 8.92 | 2.99    | 7.14 | 3.73
Xsec   |  4.8   |  4.4         |  2.4 | 2.01    | 1.37 | 3.53    | 1.11 | 4.35
Total  | 60.2   | 58.1         | 32.6 | 1.85    | 22.8 | 2.64 *2)| 20.0 | 3.02 *2)

# of Updates:
Module | Serial | OpenMP 1 | 2   | 4   | 8
CMFD   | 445    | 445      | 456 | 497 | 565
Nodal  |  31    |  31      |  33 |  38 |  39
T/H    | 216    | 216      | 216 | 216 | 217
Xsec   | 225    | 225      | 226 | 228 | 227

*1) Number of Threads  *2) Core is divided into 18 planes
25
Cache Analysis
26
Memory Access Time

[Figure: memory hierarchy — CPU, L1 Cache, L2 Cache, Memory]

Typical Memory Access Cycles (SGI)

Memory Access Type                        | Cycles
L1 cache hit                              | 2
L1 cache miss satisfied by L2 cache hit   | 8
L2 cache miss satisfied from memory       | 75
27
Cache Miss Measurements (SGI)

Module      | Cache | Serial  | OpenMP 1 *1) | 2       | 4       | 8
CMFD (BICG) | L1    | 477,691 | 479,474      | 258,027 | 156,461 | 105,733
CMFD (BICG) | L2    |  28,242 |  29,650      |  17,007 |  11,751 |   9,309
Nodal       | L1    | 857,744 | 853,866      | 444,849 | 249,507 | 160,699
Nodal       | L2    |  54,163 |  55,534      |  33,846 |  19,016 |  12,848
T/H (TRTH)  | L1    | 165,133 |  60,587      |  39,419 |  25,850 |  19,816
T/H (TRTH)  | L2    |   9,551 |   9,512      |   9,673 |   6,451 |   4,620
XSEC        | L1    |  62,324 |  57,462      |  29,845 |  17,715 |  11,344
XSEC        | L2    |   9,456 |   9,518      |   5,517 |   3,737 |   2,578

*1) Number of Threads
28
Cache Miss & Speedup of XSEC Module (SGI)

[Figure: L2 misses (0–10,000) and speedup (0–5) vs. number of CPUs (0–10) for the XSEC module; L2 misses fall as the speedup rises with the number of CPUs.]
29
Cache Miss Ratio (SGI)

Module      | Cache | Serial | OpenMP 1 *1) | 2    | 4    | 8
CMFD (BICG) | L1    | 1.00   | 1.00         | 1.85 | 3.05 | 4.52
CMFD (BICG) | L2    | 1.00   | 0.95         | 1.66 | 2.40 | 3.03
Nodal       | L1    | 1.00   | 1.00         | 1.93 | 3.44 | 5.34
Nodal       | L2    | 1.00   | 0.98         | 1.60 | 2.85 | 4.22
T/H (TRTH)  | L1    | 1.00   | 2.73         | 4.19 | 6.39 | 8.33
T/H (TRTH)  | L2    | 1.00   | 1.00         | 0.99 | 1.48 | 2.07
XSEC        | L1    | 1.00   | 1.08         | 2.09 | 3.52 | 5.49
XSEC        | L2    | 1.00   | 0.99         | 1.71 | 2.53 | 3.67

Cache Miss Ratio = (Cache Misses of Serial Execution) / (Cache Misses of Parallel Execution)

*1) Number of Threads
30
Speedup Estimation Using Cache Misses

• Speedup

  S = T_total^serial / T_total^2th

  where
  T_total^serial = Total data access time for serial execution
  T_total^2th = Total data access time for 2-thread execution.

• Data Access Time

  T_total = T_L2 + T_mem = n_L2 * t_L2 + n_Mem * t_Mem

  where
  T_L2 = Total L2 cache access time
  T_mem = Total memory access time
  n_L2 = Number of L1 data cache misses satisfied by L2 cache hit
  n_Mem = Number of L2 data cache misses satisfied from main memory
  t_L2 = L2 cache access time for 1 word
  t_Mem = Main memory access time for 1 word.
31
Estimated 2-thread Speedup Based on Data Cache Misses for OpenMP on SGI

Module      | Measured Speedup | Predicted Speedup
CMFD (BICG) | 1.63             | 1.78
Nodal       | 1.55             | 1.80
T/H (TRTH)  | 2.17             | 2.04
XSEC        | 2.01             | 1.86
32
Conclusions
33
Conclusions
• Comparison of OpenMP and POSIX Threads
  – OpenMP is Comparable to POSIX Threads in Terms of Parallel Performance
  – OpenMP is Much Easier to Implement than POSIX Threads due to its Directive-Based Nature
• Cache Analysis
  – The Prediction of Speedup Based on Data Cache Misses Agrees Well with the Measured Speedup
34
Continuing Work
• Algorithmic
  – 3-D Domain Decomposition
• Software
  – SUN Compiler
  – Pthreads Scheduling on SGI
• Alternate Platforms