1. 2 Define the purpose of MKL Upon completion of this module, you will be able to: Identify and...
Upon completion of this module, you will be able to:
- Identify and discuss MKL contents
- Describe the MKL environment
Performance Features
Using the Library
MKL addresses:
- Solvers (BLAS, LAPACK)
- Eigenvector/eigenvalue solvers (BLAS, LAPACK)
- Some quantum chemistry needs (dgemm)
- PDEs, signal processing, seismic, solid-state physics (FFTs)
- General scientific and financial codes: vector transcendental functions (VML) and vector random number generators (VSL)
Software Construction
Geometric Transformation
Don’t use Intel® Math Kernel Library (Intel® MKL) on …
- Don’t use Intel® MKL on “small” counts.
- Don’t call vector math functions on small n.
  - But you could use Intel® Performance Primitives.
BLAS (Basic Linear Algebra Subprograms)
- Level 1 BLAS: vector-vector operations (15 function types, 48 functions)
- Level 2 BLAS: matrix-vector operations (26 function types, 66 functions)
- Level 3 BLAS: matrix-matrix operations (9 function types, 30 functions)
- Extended BLAS: level 1 BLAS for sparse vectors (8 function types, 24 functions)
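To make the three levels concrete, here is a minimal sketch of one representative operation per level, written as plain reference loops. The names `ref_axpy`, `ref_gemv` and `ref_gemm` are hypothetical and are not MKL entry points; the optimized library routines compute the same kind of result far faster. Square row-major matrices are assumed for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Level 1 (vector-vector): y <- alpha*x + y. */
void ref_axpy(size_t n, double alpha, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += alpha * x[i];
}

/* Level 2 (matrix-vector): y <- A*x, A is n-by-n in row-major order. */
void ref_gemv(size_t n, const double *a, const double *x, double *y)
{
    assert(y != x);  /* the output must not alias the input vector */
    for (size_t i = 0; i < n; ++i) {
        double sum = 0.0;
        for (size_t j = 0; j < n; ++j)
            sum += a[i * n + j] * x[j];
        y[i] = sum;
    }
}

/* Level 3 (matrix-matrix): C <- A*B, all n-by-n in row-major order. */
void ref_gemm(size_t n, const double *a, const double *b, double *c)
{
    assert(c != a && c != b);  /* the output must not alias the inputs */
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}
```

The loop nesting depth (one, two, three) mirrors the level number and the growth in arithmetic work.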
LAPACK (Linear Algebra PACKage)
- Solvers and eigensolvers; many hundreds of routines in total
- More than 1000 user-callable and support routines in total

Discrete Fourier Transforms (DFT)
- Mixed radix, multi-dimensional transforms
- Multithreaded

VML (Vector Math Library)
- Set of vectorized transcendental functions
- Most libm functions, but faster

VSL (Vector Statistics Library)
- Set of vectorized random number generators
- BLAS and LAPACK* are both Fortran, a legacy of high-performance computation.
- VSL and VML have Fortran and C interfaces.
- DFTs have Fortran 95 and C interfaces.
- CBLAS interface: a more convenient way for a C/C++ programmer to call BLAS.
- Supports 32-bit and 64-bit Intel processors
- Large set of examples and tests
- Extensive documentation
The goal of all optimization is maximum speed.
Resource-limited optimization: exhaust one or more resources of the system:
- CPU: register use, FP units
- Cache: keep data in cache as long as possible; deal with cache interleaving
- TLBs: maximally use the data on each page
- Memory bandwidth: minimally access memory
- Computer: use all available processors through threading
- System: use all available nodes (cluster software)
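As an illustration of the cache item above, the following sketch multiplies two matrices tile by tile so that each tile is reused while it is still cache resident. It is a generic textbook blocking scheme, not the library's actual implementation; the function name `blocked_matmul` and the `block` parameter are assumptions, and a good tile size depends on the cache of the target processor.

```c
#include <assert.h>
#include <stddef.h>

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* C <- C + A*B for row-major n-by-n matrices, processed in block-by-block
 * tiles so that a tile of B is reused for several rows of A. */
void blocked_matmul(size_t n, size_t block,
                    const double *a, const double *b, double *c)
{
    assert(block > 0);
    for (size_t ii = 0; ii < n; ii += block)
        for (size_t kk = 0; kk < n; kk += block)
            for (size_t jj = 0; jj < n; jj += block) {
                size_t i_end = min_size(ii + block, n);
                size_t k_end = min_size(kk + block, n);
                size_t j_end = min_size(jj + block, n);
                for (size_t i = ii; i < i_end; ++i)
                    for (size_t k = kk; k < k_end; ++k) {
                        double aik = a[i * n + k];  /* kept in a register */
                        for (size_t j = jj; j < j_end; ++j)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
}
```

The result is the same as the plain triple loop; only the order in which memory is visited changes.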
Most of Intel MKL could be threaded, but:
- The limited resource is memory bandwidth
- Threading level 1 and level 2 BLAS is mostly ineffective (O(n))

There are numerous opportunities for threading:
- Level 3 BLAS (O(n^3))
- LAPACK* (O(n^3))
- FFTs (O(n log(n)))
- VML, VSL? Depends on processor and function

All threading is via OpenMP*.
All Intel MKL is designed and compiled for thread safety.
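One way to see why memory bandwidth limits the lower BLAS levels is to count floating-point operations per value moved to or from memory. The sketch below uses an idealized model in which every operand is transferred exactly once; the function names and the model itself are illustrative assumptions, not measurements of the library.

```c
#include <assert.h>

/* y <- alpha*x + y: 2n flops; reads x and y, writes y (3n values). */
double intensity_level1(double n)
{
    assert(n > 0.0);
    return (2.0 * n) / (3.0 * n);
}

/* y <- A*x: 2n^2 flops; reads A and x, writes y (n^2 + 2n values). */
double intensity_level2(double n)
{
    assert(n > 0.0);
    return (2.0 * n * n) / (n * n + 2.0 * n);
}

/* C <- A*B: 2n^3 flops; reads A and B, writes C (3n^2 values). */
double intensity_level3(double n)
{
    assert(n > 0.0);
    return (2.0 * n * n * n) / (3.0 * n * n);
}
```

Under this model, levels 1 and 2 stay below two flops per value no matter how large n is, so extra threads mostly wait on memory, while level 3 grows linearly with n and leaves room for threading to pay off.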
Scenario 1: ifort, BLAS, IA-32 processor:

    ifort myprog.f mkl_c.lib

Scenario 2: CVF, LAPACK, IA-32 processor:

    f77 myprog.f mkl_s.lib

Scenario 3: Statically link a C program with a DLL linked at run time:

    link myprog.obj mkl_c_dll.lib

Note: The optimal binary code is selected at run time based on the processor.
Most important LAPACK optimizations:
- Threading: effectively uses multiple CPUs
- Recursive factorization
  - Reduces scalar time (Amdahl’s law: t = t_scalar + t_parallel / p)
  - Extends blocking further into the code
- No runtime library support required
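The Amdahl's law expression above can be evaluated directly to see why shrinking the scalar part matters. This is a small sketch with hypothetical helper names; `t_scalar` and `t_parallel` are the single-processor times of the two parts and `p` is the processor count.

```c
#include <assert.h>

/* Run time on p processors: t = t_scalar + t_parallel / p. */
double amdahl_time(double t_scalar, double t_parallel, int p)
{
    assert(p >= 1);
    return t_scalar + t_parallel / p;
}

/* Speedup relative to running everything on one processor. */
double amdahl_speedup(double t_scalar, double t_parallel, int p)
{
    return (t_scalar + t_parallel) / amdahl_time(t_scalar, t_parallel, p);
}
```

For example, with `t_parallel = 8` on 4 processors, a scalar part of 1 gives a speedup of 3, while halving the scalar part to 0.5 raises it to 3.4.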
- One-dimensional, two-dimensional, three-dimensional
- Multithreaded
- Mixed radix
- User-specified scaling and transform sign
- Transforms on embedded matrices
- Multiple one-dimensional transforms in a single call
- Strides
- C and F90 interfaces
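To show what transform sign, user-specified scaling, strides and several one-dimensional transforms per call mean in practice, here is a toy sketch restricted to length-4 transforms, where the twiddle factors are just ±1 and ±i and no trigonometry is needed. The type `cplx`, the function `dft4_batch` and its parameters are hypothetical and do not reflect the library's interface.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double re, im; } cplx;

static cplx add(cplx a, cplx b) { cplx r = { a.re + b.re, a.im + b.im }; return r; }
static cplx sub(cplx a, cplx b) { cplx r = { a.re - b.re, a.im - b.im }; return r; }

/* Multiply z by sign*i, the twiddle factor exp(sign * 2*pi*i / 4). */
static cplx rot(cplx z, int sign)
{
    cplx r = { -sign * z.im, sign * z.re };
    return r;
}

/* In-place length-4 DFTs: X[k] = scale * sum_j x[j] * exp(sign*2*pi*i*j*k/4).
 * Transform t starts at data[t*dist]; its elements are `stride` apart. */
void dft4_batch(cplx *data, size_t howmany, size_t stride, size_t dist,
                int sign, double scale)
{
    assert(data != NULL && (sign == 1 || sign == -1));
    for (size_t t = 0; t < howmany; ++t) {
        cplx *x = data + t * dist;
        cplx s02 = add(x[0], x[2 * stride]);
        cplx d02 = sub(x[0], x[2 * stride]);
        cplx s13 = add(x[stride], x[3 * stride]);
        cplx wd  = rot(sub(x[stride], x[3 * stride]), sign);
        cplx out[4] = { add(s02, s13), add(d02, wd), sub(s02, s13), sub(d02, wd) };
        for (size_t k = 0; k < 4; ++k) {
            x[k * stride].re = scale * out[k].re;
            x[k * stride].im = scale * out[k].im;
        }
    }
}
```

With `stride = 2` and `dist = 1`, two interleaved sequences are transformed in one call; `sign = -1` is the usual forward convention and `scale = 0.25` with `sign = +1` would give the matching inverse.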
Basically a three-step process:
1. Create a descriptor:
   Status = DftiCreateDescriptor(MDH, …)
2. Commit the descriptor (instantiates it):
   Status = DftiCommitDescriptor(MDH)
3. Perform the transform:
   Status = DftiComputeForward(MDH, X)

Optionally, free the descriptor:
   Status = DftiFreeDescriptor(MDH)
- Vector Math Library: vectorized transcendental functions, like libm but faster
- Interface: both Fortran and C interfaces
- Multiple accuracies
  - High accuracy (< 1 ulp)
  - Lower accuracy, faster (< 4 ulps)
- Special value handling: √(-a), sin(0), and so on
- Error handling: cannot duplicate libm here
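Accuracy in ulps (units in the last place) can be checked with a simple bit-level comparison. The sketch below counts how many representable single-precision values lie between a computed result and a reference result; it assumes IEEE 754 binary32 `float` and finite inputs, and the helper names are hypothetical. Note that it measures an integer distance between two floats, which is only a proxy for the fractional error bound quoted above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Map a finite float to an integer that grows monotonically with its value. */
static int64_t ordered_bits(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);              /* reinterpret the bit pattern */
    if (u & 0x80000000u)                   /* negative: sign-magnitude form */
        return -(int64_t)(u & 0x7fffffffu);
    return (int64_t)u;
}

/* Number of representable floats between a and b (0 if they are equal). */
int64_t ulp_distance(float a, float b)
{
    assert(a == a && b == b);              /* NaN has no meaningful distance */
    int64_t d = ordered_bits(a) - ordered_bits(b);
    return d < 0 ? -d : d;
}

/* Nonzero if computed is within max_ulps of reference. */
int within_ulps(float computed, float reference, int64_t max_ulps)
{
    return ulp_distance(computed, reference) <= max_ulps;
}
```

A test harness could compare a fast vector result against a high-accuracy reference with `within_ulps(result, reference, 4)`.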
- VML is important for financial codes (Monte Carlo simulations)
  - Exponentials, logarithms
- Other scientific codes depend on transcendental functions
  - Error functions can be big time sinks in some codes
- Set of random number generators (RNGs)
- Numerous non-uniform distributions
- VML used extensively for transformations
- Parallel computation support: some functions
- User can supply own BRNG or transformations
- Five basic RNGs (BRNGs): bits, integer, FP
  - MCG31, R250, MRG32, MCG59, WH
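To illustrate what a basic generator such as MCG31 does, here is a sketch of a 31-bit multiplicative congruential generator with modulus 2^31 - 1. The multiplier 1132489760 is an assumption taken from memory of the vendor notes and should be checked there; the type and function names are hypothetical, and the output is not claimed to match the library's streams.

```c
#include <assert.h>
#include <stdint.h>

#define MCG31_M 2147483647u   /* modulus 2^31 - 1 (prime) */
#define MCG31_A 1132489760u   /* multiplier: an assumption, verify before use */

typedef struct { uint32_t state; } mcg31_stream;

/* Seed the stream; zero is a fixed point of the recurrence, so avoid it. */
void mcg31_init(mcg31_stream *s, uint32_t seed)
{
    assert(s != 0);
    uint32_t x = seed % MCG31_M;
    s->state = x ? x : 1u;
}

/* Integer output: x_n = A * x_{n-1} mod M, always in [1, M-1]. */
uint32_t mcg31_next(mcg31_stream *s)
{
    s->state = (uint32_t)(((uint64_t)MCG31_A * s->state) % MCG31_M);
    return s->state;
}

/* Floating-point output: scale the integer to the interval from a to b. */
float mcg31_uniform(mcg31_stream *s, float a, float b)
{
    double u = (double)mcg31_next(s) / (double)MCG31_M;
    return (float)(a + (b - a) * u);
}
```

The same integer recurrence feeds both the integer and the floating-point outputs, which is the sense in which one BRNG serves several output types.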
- Gaussian (two methods)
- Exponential
- Laplace
- Weibull
- Cauchy
- Rayleigh
- Lognormal
- Gumbel
Basically a 3-step process:
1. Create a stream pointer.
   VSLStreamStatePtr stream;
2. Create a stream.
   vslNewStream(&stream, VSL_BRNG_MCG31, seed);
3. Generate a set of random numbers.
   vsRngUniform(0, stream, size, out, start, end);

Delete the stream (optional).
   vslDeleteStream(&stream);
- Compare the performance of C source code (RAND function) and VSL.
- Exercise control of the threading capabilities in MKL/VSL.
- Intel® Math Kernel Library is a broad scientific/engineering math library.
- It is optimized for Intel® processors.
- It is threaded for effective use on SMP machines.