Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load
Javier Cuenca, Domingo Giménez
José González
Jack Dongarra, Kenneth Roche
Optimisation of Linear Algebra Routines
•Traditional method: Hand-Optimisation for each platform
› Time-consuming
› Incompatible with hardware evolution
› Incompatible with changes in the system (architecture and basic libraries)
› Unsuitable for systems with variable load
› Misuse by non-expert users
Our Approach
DESIGN: Modelling the Linear Algebra Routine (LAR)
› Texec = f(SP, AP, n)
› SP: System Parameters
› AP: Algorithmic Parameters
› n: Problem size
INSTALLATION: Estimation of SP
RUN-TIME: Selection of AP values, then Execution of LAR
Our Approach
LARs
› Jacobi methods for the symmetric eigenvalue problem
› Gauss elimination
› LU factorisation
› QR factorisation
Platforms
› Cluster of Workstations
› Cluster of PCs
› SGI Origin 2000
› IBM SP2
Static Model of LAR: Situation of platform at installation time
Dynamic Model of LAR: Situation of platform at run-time.
DESIGN PROCESS
LAR: Linear Algebra Routine, made by the LAR Designer
Example of LAR: Parallel Block LU factorisation
Modelling the LAR
MODEL: Texec = f(SP, AP, n)
› SP: System Parameters
› AP: Algorithmic Parameters
› n: Problem size
Made by the LAR Designer, only once per LAR
MODEL for the LAR Parallel Block LU factorisation
› SP: k3, k2, ts, tw
› AP: p, b
› n: Problem size
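To make the shape of such a model concrete, here is a minimal Python sketch of a function Texec = f(SP, AP, n). Only the parameter names (k3, k2, ts, tw, p, b, n) come from the slides. The cost terms, the reading of k3 and k2 as per-flop costs of matrix-matrix and matrix-vector kernels, and the reading of ts as a message start-up time are illustrative assumptions, not the model used by the authors.

```python
import math
from dataclasses import dataclass


@dataclass
class SystemParams:
    k3: float  # assumed: time per flop in matrix-matrix kernels
    k2: float  # assumed: time per flop in matrix-vector kernels
    ts: float  # assumed: start-up time of one message
    tw: float  # time to send one word


def t_exec(sp: SystemParams, p: int, b: int, n: int) -> float:
    """Hypothetical Texec = f(SP, AP, n) for a parallel block LU.

    All cost terms below are assumptions chosen for illustration only.
    """
    if p < 1 or b < 1 or n < 1:
        raise ValueError("p, b and n must be positive")
    # Assumed: 2/3 n^3 flops in block updates, shared evenly by p processes.
    updates = (2.0 / 3.0) * n ** 3 / p * sp.k3
    # Assumed: about n^2 * b / 2 flops in panel factorisations,
    # shared by one process column of sqrt(p) processes.
    panels = (n ** 2) * b / 2.0 / math.sqrt(p) * sp.k2
    # Assumed: n / b steps, each with tree broadcasts of depth log2(p).
    depth = math.log2(p)
    startups = (n / b) * depth * sp.ts
    volume = (n ** 2) / math.sqrt(p) * depth * sp.tw
    return updates + panels + startups + volume
```

With p = 1 the communication terms vanish, which is a quick sanity check on any model of this kind.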
Implementation of SP-Estimators
Estimators of Arithmetic-SP: Computation Kernel of the LAR
› Similar storage scheme
› Similar quantity of data
Estimators of Communication-SP: Communication Kernel of the LAR
› Similar kind of communication
› Similar quantity of data
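As an illustration of an arithmetic SP-Estimator, the sketch below times a b × b matrix product and divides by the number of floating-point operations. The function name estimate_k3 and the pure-Python triple loop are assumptions made to keep the example self-contained; a real estimator would time the computation kernel that the LAR actually calls, with a similar storage scheme and quantity of data.

```python
import time


def estimate_k3(b: int, repeats: int = 3) -> float:
    """Hypothetical estimator: seconds per flop of a b x b matrix product.

    Times a plain triple loop (a stand-in for the LAR's real kernel)
    and keeps the best of several runs to reduce timing noise.
    """
    a = [[1.0] * b for _ in range(b)]
    x = [[2.0] * b for _ in range(b)]
    best = float("inf")
    for _ in range(repeats):
        c = [[0.0] * b for _ in range(b)]
        start = time.perf_counter()
        for i in range(b):
            row_a, row_c = a[i], c[i]
            for k in range(b):
                aik, row_x = row_a[k], x[k]
                for j in range(b):
                    row_c[j] += aik * row_x[j]
        best = min(best, time.perf_counter() - start)
    # Each inner iteration performs one multiplication and one addition.
    return best / (2 * b ** 3)
```

Running it for several block sizes, for example 16, 32, 64 and 128, gives one estimate per block size.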
INSTALLATION PROCESS
› Only once per platform
› Done by the System Manager
Estimation of Static-SP
Inputs: SP-Estimators, Basic Libraries, Installation-File
Result: Static-SP-File
Basic Libraries
› Basic Communication Library: MPI, PVM
› Basic Linear Algebra Library: reference-BLAS, machine-specific-BLAS, ATLAS
Installation File
› SP values are obtained using the information (n and AP values) of this file.
Estimation of the Static-SP tw-static (in sec)
Message size (Kbytes)   32      256     1024    2048
tw-static               0.700   0.690   0.680   0.675
Platform: Cluster of Pentium III + Fast Ethernet
Basic Libraries: ATLAS and MPI
Estimation of the Static-SP k3-static (in sec)
Block size   16       32       64       128
k3-static    0.0038   0.0033   0.0030   0.0027
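The two tables above can be stored and queried as in the following sketch. The values are the ones reported on the slides; the helper name lookup and the use of linear interpolation between measured points (with clamping outside the measured range) are assumptions, since the slides do not say how intermediate sizes are handled.

```python
from bisect import bisect_left

# (message size in Kbytes, tw-static) as reported on the slides
TW_STATIC = [(32, 0.700), (256, 0.690), (1024, 0.680), (2048, 0.675)]
# (block size, k3-static) as reported on the slides
K3_STATIC = [(16, 0.0038), (32, 0.0033), (64, 0.0030), (128, 0.0027)]


def lookup(table, x):
    """Return the tabulated value at x.

    Assumption: linear interpolation between neighbouring entries,
    clamped to the first or last entry outside the measured range.
    """
    xs = [key for key, _ in table]
    if x <= xs[0]:
        return table[0][1]
    if x >= xs[-1]:
        return table[-1][1]
    i = bisect_left(xs, x)  # first index with xs[i] >= x
    (x0, y0), (x1, y1) = table[i - 1], table[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
```

For instance, lookup(K3_STATIC, 48) returns a value between the entries for block sizes 32 and 64.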
RUN-TIME PROCESS
RUN-TIME PROCESS: Static approach
› Selection of Optimum AP, using the MODEL and the Static-SP-File (result: Optimum-AP)
› Execution of LAR
RUN-TIME PROCESS: Dynamic Approach
Call to NWS (Network Weather Service)
The NWS is called and it reports:
› the fraction of available CPU (fCPU)
› the current word-sending time (tw-current) for specific n and AP values (n0, AP0).
Then the fraction of available network is calculated.
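The formula for the fraction of available network is not preserved in this text. One plausible reading, sketched below, compares the word-sending time measured at installation with the one currently reported, both for the same (n0, AP0). The function name, the ratio tw-static / tw-current and the cap at 1.0 are assumptions, not the authors' stated definition.

```python
def fraction_available_network(tw_static: float, tw_current: float) -> float:
    """Assumed definition: fnet = tw-static / tw-current, capped at 1.0.

    Both times must refer to the same problem size and AP values (n0, AP0).
    """
    if tw_static <= 0 or tw_current <= 0:
        raise ValueError("word-sending times must be positive")
    return min(1.0, tw_static / tw_current)


# Example with values that appear on the slides: a static value of 0.7
# and a current value of 1.8 under load.
f_net = fraction_available_network(0.7, 1.8)
```

Under this reading an unloaded network (tw-current equal to tw-static) gives a fraction of 1.0, and a slower network gives a proportionally smaller fraction.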
Dynamic Adjustment of SP
The values of the SP in the Static-SP-File are adjusted according to the current situation (NWS Information), giving the Current-SP.
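The adjustment rule itself is not preserved in this text. A simple possibility, shown here as a sketch, divides each arithmetic parameter by the fraction of available CPU and each communication parameter by the fraction of available network. The function adjust_sp, the dictionary layout and this scaling rule are assumptions made for illustration.

```python
def adjust_sp(static_sp: dict, f_cpu: float, f_net: float) -> dict:
    """Assumed rule: scale static SP values by the available fractions.

    Arithmetic parameters (k3, k2) grow as less CPU is available;
    communication parameters (ts, tw) grow as less network is available.
    """
    if not (0 < f_cpu <= 1 and 0 < f_net <= 1):
        raise ValueError("fractions must lie in (0, 1]")
    return {
        "k3": static_sp["k3"] / f_cpu,
        "k2": static_sp["k2"] / f_cpu,
        "ts": static_sp["ts"] / f_net,
        "tw": static_sp["tw"] / f_net,
    }


# k3 and tw are values from the slides; k2 and ts are placeholders
# invented for this example because the slides do not report them.
static_sp = {"k3": 0.0030, "k2": 0.02, "ts": 100.0, "tw": 0.7}
current_sp = adjust_sp(static_sp, f_cpu=0.6, f_net=0.5)
```

With both fractions equal to 1.0 the Current-SP equals the Static-SP, so the static approach is the special case of an unloaded platform.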
Selection of Optimum AP
The Optimum-AP is selected using the MODEL and the Current-SP.
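Selecting the AP can be sketched as an exhaustive search: evaluate the model for every candidate combination and keep the one with the smallest predicted time. The function select_optimum_ap and the cost terms in toy_model are assumptions for illustration; the candidate meshes and block sizes are ones that appear elsewhere in the slides.

```python
from itertools import product


def select_optimum_ap(model, sp, n, mesh_candidates, block_candidates):
    """Return the (mesh, b) pair with the smallest modelled time.

    `model(sp, mesh, b, n)` must return a predicted execution time.
    """
    return min(
        product(mesh_candidates, block_candidates),
        key=lambda ap: model(sp, ap[0], ap[1], n),
    )


def toy_model(sp, mesh, b, n):
    """Hypothetical stand-in for Texec = f(SP, AP, n); not the authors' model."""
    r, c = mesh
    p = r * c
    arithmetic = (2.0 / 3.0) * n ** 3 * sp["k3"] / p
    startups = (n / b) * (r + c) * sp["ts"]
    volume = n * n * (r + c) / p * sp["tw"]
    return arithmetic + startups + volume


# ts is a placeholder value; k3 and tw are taken from the slides.
sp = {"k3": 0.0030, "ts": 100.0, "tw": 0.7}
meshes = [(4, 2), (2, 2), (2, 1)]
blocks = [16, 32, 64, 128]
best_mesh, best_block = select_optimum_ap(toy_model, sp, 1024, meshes, blocks)
```

Because the search space is small (a few meshes times a few block sizes), evaluating the model for every combination is cheap compared with running the routine.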
Execution of LAR
The LAR is executed with the Optimum-AP.
Platform load: different situations studied
Situation                 nodo1    nodo2    nodo3    nodo4    nodo5    nodo6    nodo7    nodo8
A          CPU avail.     100%     100%     100%     100%     100%     100%     100%     100%
           tw-current     0.7sec   0.7sec   0.7sec   0.7sec   0.7sec   0.7sec   0.7sec   0.7sec
B          CPU avail.     80%      80%      80%      80%      100%     100%     100%     100%
           tw-current     0.8sec   0.8sec   0.8sec   0.8sec   0.7sec   0.7sec   0.7sec   0.7sec
C          CPU avail.     60%      60%      60%      60%      100%     100%     100%     100%
           tw-current     1.8sec   1.8sec   1.8sec   1.8sec   0.7sec   0.7sec   0.7sec   0.7sec
D          CPU avail.     60%      60%      60%      60%      100%     100%     80%      80%
           tw-current     1.8sec   1.8sec   1.8sec   1.8sec   0.7sec   0.7sec   0.8sec   0.8sec
E          CPU avail.     60%      60%      60%      60%      100%     100%     50%      50%
           tw-current     1.8sec   1.8sec   1.8sec   1.8sec   0.7sec   0.7sec   4.0sec   4.0sec
Optimum AP for the different situations studied
Block size
        Situations of the Platform Load
n       A      B      C      D      E
1024    32     32     64     64     64
2048    64     64     64     128    128
3072    64     64     128    128    128
Number of nodes to use, p = r × c
        Situations of the Platform Load
n       A      B      C      D      E
1024    4×2    4×2    2×2    2×2    2×1
2048    4×2    4×2    2×2    2×2    2×1
3072    4×2    4×2    2×2    2×2    2×1
Experimental Time: deviations from the Optimum, n = 1024
[Figure: bar chart of the deviation from the optimum (0% to 100%) for situations of platform load A to E; series: Static Model, Dynamic Model]
Experimental Time: deviations from the Optimum, n = 2048
[Figure: bar chart of the deviation from the optimum (0% to 160%) for situations of platform load A to E; series: Static Model, Dynamic Model]
Experimental Time: deviations from the Optimum, n = 3072
[Figure: bar chart of the deviation from the optimum (0% to 100%) for situations of platform load A to E; series: Static Model, Dynamic Model]
Conclusions and Future Work
•The use of the proposed methodology is viable in systems where the load is stable or variable.
•Software like NWS is suitable for the adjustment of the system parameters’ values obtained at installation time.
•The heterogeneous load case offers many more possibilities than the one studied.