Post on 13-Dec-2015
Performance Libraries: Intel Math Kernel Library (MKL)
Intel Software College
Copyright © 2006, Intel Corporation. All rights reserved.
Performance Libraries: Intel® Math Kernel Library (MKL)
Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.
Agenda
Introduction
• Purpose of Library
• Intel® Math Kernel Library (Intel® MKL) Contents
Performance Features
• Resource Limited Optimization
• Threading
Using the Library
The Library Sections
• BLAS
• LAPACK*
• DFTs
• VML
• VSL
Intel® Math Kernel Library Purpose
Performance, Performance, Performance!
Intel’s engineering, scientific, and financial math library
Addresses:
• Solvers (BLAS, LAPACK)
• Eigenvector/eigenvalue solvers (BLAS, LAPACK)
• Some quantum chemistry needs (dgemm)
• PDEs, signal processing, seismic, solid-state physics (FFTs)
• General scientific, financial [vector transcendental functions (VML) and vector random number generators (VSL)]
Tuned for Intel® processors – current and future
Intel® Math Kernel Library Purpose – Don’ts
But don’t use Intel® Math Kernel Library (Intel® MKL) on …
Don’t use Intel® MKL on “small” counts.
Don’t call vector math functions on small n.
Example: geometric transformation of a single point
[X’ Y’ Z’ W’] = (4x4 transformation matrix) x [X Y Z W]
• But you could use Intel® Performance Primitives.
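To make the "small counts" point concrete, here is a minimal sketch in plain C of the 4x4 transformation applied to one homogeneous point. The function name `transform4` and the flat row-major layout are assumptions chosen for illustration; nothing here comes from Intel MKL or Intel Performance Primitives.

```c
#include <stddef.h>

/* Apply a 4x4 transformation matrix (flat, row-major, 16 entries) to one
   homogeneous point: out = m * v. For a fixed, tiny size like this the
   compiler can unroll the loops, so a library call buys little. */
void transform4(const double m[16], const double v[4], double out[4])
{
    for (size_t i = 0; i < 4; i++) {
        double sum = 0.0;
        for (size_t j = 0; j < 4; j++)
            sum += m[4 * i + j] * v[j];
        out[i] = sum;
    }
}
```

With only 16 multiply-adds per point, the work is likely too small to amortize the overhead of a general-purpose library routine, which is the case the slide warns about.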
Intel® Math Kernel Library Contents
BLAS (Basic Linear Algebra Subprograms)
Level 1 BLAS – vector-vector operations
• 15 function types
• 48 functions
Level 2 BLAS – matrix-vector operations
• 26 function types
• 66 functions
Level 3 BLAS – matrix-matrix operations
• 9 function types
• 30 functions
Extended BLAS – level 1 BLAS for sparse vectors
• 8 function types
• 24 functions
Intel® Math Kernel Library Contents
LAPACK (Linear Algebra PACKage)
Solvers and eigensolvers: many hundreds of routines.
More than 1000 routines in total, counting user-callable and support routines
DFTs (Discrete Fourier transforms)
Mixed radix, multi-dimensional transforms
Multithreaded
VML (Vector Math Library)
Set of vectorized transcendental functions
Most of libm functions, but faster
VSL (Vector Statistical Library)
Set of vectorized random number generators
Intel® Math Kernel Library Contents
BLAS and LAPACK* both have Fortran interfaces.
• Legacy of high performance computation
VSL and VML have Fortran and C interfaces.
DFTs have Fortran 95 and C interfaces.
CBLAS interface: a more convenient way for a C/C++ programmer to call BLAS.
Intel® Math Kernel Library (Intel® MKL) Environment
Supports 32-bit and 64-bit Intel® processors
Large set of examples and tests
Extensive documentation
            Windows*                 Linux*
Compilers   Intel, CVF, Microsoft    Intel, GNU
Libraries   .dll, .lib               .a, .so
Resource Limited Optimization
The goal of all optimization is maximum speed.
Resource limited optimization – exhaust one or more resources of the system:
• CPU: Register use, FP units.
• Cache: Keep data in cache as long as possible; deal with cache interleaving.
• TLBs: Maximally use data on each page.
• Memory bandwidth: Minimally access memory.
• Computer: Use all the processors available using threading.
• System: Use all the nodes available (cluster software).
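As an illustration of the cache item in the list above, the following sketch shows cache blocking (tiling) for a square matrix multiply in plain C. It is a teaching example under stated assumptions, not Intel MKL's implementation: the function names and the tile size parameter `bs` (assumed to be at least 1) are hypothetical.

```c
#include <stddef.h>

/* Reference: straightforward triple loop, C += A * B
   (all n x n, row-major, contiguous). */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t k = 0; k < n; k++)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
}

/* Cache-blocked variant: work on bs x bs tiles so that a tile of A, B and C
   is reused while it is still in cache. bs (>= 1) is a hypothetical tuning
   knob; a good value depends on the cache sizes of the target processor. */
void matmul_blocked(size_t n, size_t bs,
                    const double *a, const double *b, double *c)
{
    for (size_t ii = 0; ii < n; ii += bs)
        for (size_t kk = 0; kk < n; kk += bs)
            for (size_t jj = 0; jj < n; jj += bs) {
                size_t imax = ii + bs < n ? ii + bs : n;
                size_t kmax = kk + bs < n ? kk + bs : n;
                size_t jmax = jj + bs < n ? jj + bs : n;
                for (size_t i = ii; i < imax; i++)
                    for (size_t k = kk; k < kmax; k++) {
                        double aik = a[i * n + k];
                        for (size_t j = jj; j < jmax; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
}
```

Both functions compute the same result; the blocked version only changes the order in which memory is touched, which is the kind of resource-driven restructuring the slide describes.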
Threading
Most of Intel® Math Kernel Library (Intel® MKL) could be threaded, but:
• The limiting resource is memory bandwidth.
• Threading Level 1 and Level 2 BLAS is mostly ineffective (O(n) and O(n^2) work on just as much data)
There are numerous opportunities for threading:
• Level 3 BLAS (O(n^3))
• LAPACK* (O(n^3))
• FFTs (O(n log n))
• VML, VSL: depends on processor and function
All threading is via OpenMP*.
All Intel MKL is designed and compiled for thread safety.
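A rough way to see why memory bandwidth limits threading for Level 1 and Level 2 BLAS is to count floating-point operations per distinct operand element. The sketch below does this for square problems of size n; the function names and the simplified operand counts (each element counted once, daxpy for Level 1, dgemv for Level 2, dgemm for Level 3) are assumptions made for illustration.

```c
/* Flops per distinct operand element, for square problems of size n.
   Simplified model: each vector or matrix element is counted once. */

/* Level 1, y = a*x + y: 2n flops on 2n elements (x and y). */
double intensity_daxpy(double n)
{
    return (2.0 * n) / (2.0 * n);
}

/* Level 2, y = A*x: 2n^2 flops on n^2 + 2n elements (A, x, y). */
double intensity_dgemv(double n)
{
    return (2.0 * n * n) / (n * n + 2.0 * n);
}

/* Level 3, C = A*B: 2n^3 flops on 3n^2 elements (A, B, C). */
double intensity_dgemm(double n)
{
    return (2.0 * n * n * n) / (3.0 * n * n);
}
```

Under this model the Level 1 and Level 2 ratios stay at or below about 2 however large n gets, while the Level 3 ratio grows with n, so only Level 3 gives extra threads enough arithmetic to do per element fetched from memory.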
Linking with Intel® Math Kernel Library (Intel® MKL)
Scenario 1: ifort, BLAS, IA-32 processor:
ifort myprog.f mkl_c.lib
Scenario 2: CVF, LAPACK, IA-32 processor:
f77 myprog.f mkl_s.lib
Scenario 3: Statically link a C program with DLL linked at runtime:
link myprog.obj mkl_c_dll.lib
Note: Optimal binary code will execute at run time based on processor.
Matrix Multiplication: Roll Your Own/Dot Product
Roll Your Own
for( i = 0; i < n; i++ ){
  for( j = 0; j < m; j++ ){
    for( k = 0; k < kk; k++ )
      c[i][j] += a[i][k] * b[k][j];
  }
}
ddot
/* dot product of row i of a (stride incx = 1) with column j of b (stride incy = m) */
for( i = 0; i < n; i++ ){
  for( j = 0; j < m; j++ )
    c[i][j] = cblas_ddot( kk, &a[i][0], incx, &b[0][j], incy );
}
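The dot-product formulation can be checked without the library. The sketch below uses `ref_ddot`, a hypothetical plain-C stand-in that takes its arguments in the same order as `cblas_ddot` (length, x, stride of x, y, stride of y), and assumes contiguous row-major matrices with positive strides.

```c
/* Stand-in with the same argument order as cblas_ddot:
   returns the sum of x[i*incx] * y[i*incy] for i in [0, n).
   Assumes positive strides. */
double ref_ddot(int n, const double *x, int incx, const double *y, int incy)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += x[i * incx] * y[i * incy];
    return sum;
}

/* C (n x m) = A (n x kk) * B (kk x m), all row-major and contiguous. */
void matmul_ddot(int n, int m, int kk,
                 const double *a, const double *b, double *c)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < m; j++)
            /* row i of A has stride 1; column j of B has stride m */
            c[i * m + j] = ref_ddot(kk, &a[i * kk], 1, &b[j], m);
}
```

The point to notice is the stride pair: walking a row of a row-major matrix uses stride 1, while walking a column uses the row length as the stride.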
Matrix Multiplication: DGEMV/DGEMM
dgemv
/* column j of c = a (n x kk) times column j of b; ldb and ldc are passed as the vector strides */
for( j = 0; j < m; j++ )
  cblas_dgemv( CblasRowMajor, CblasNoTrans, n, kk, alpha, a, lda, &b[0][j], ldb, beta, &c[0][j], ldc );
dgemm
cblas_dgemm( CblasColMajor, CblasNoTrans, CblasNoTrans, m, n, kk, alpha, b, ldb, a, lda, beta, c, ldc );
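The dgemm call above passes b before a together with a column-major layout flag. A minimal sketch of why that works for row-major C arrays: a row-major matrix read as column-major is its transpose, and C^T = B^T * A^T. The routine `ref_dgemm_colmajor` below is a hypothetical plain-C reference whose argument order mirrors `cblas_dgemm` after the layout and transpose flags; it is not the library routine.

```c
/* Reference column-major GEMM without transposes:
   C (m x n) = alpha * A (m x k) * B (k x n) + beta * C,
   with element (i, j) of X stored at X[i + j * ldx]. */
void ref_dgemm_colmajor(int m, int n, int k, double alpha,
                        const double *a, int lda,
                        const double *b, int ldb,
                        double beta, double *c, int ldc)
{
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++) {
            double sum = 0.0;
            for (int p = 0; p < k; p++)
                sum += a[i + p * lda] * b[p + j * ldb];
            /* when beta is zero, do not read the old contents of C */
            double old = (beta == 0.0) ? 0.0 : beta * c[i + j * ldc];
            c[i + j * ldc] = alpha * sum + old;
        }
}

/* Row-major C (n x m) = A (n x kk) * B (kk x m), obtained by asking the
   column-major routine for C^T = B^T * A^T: swap the operands and swap
   the row and column counts. Leading dimensions are the row lengths. */
void matmul_rowmajor(int n, int m, int kk,
                     const double *a, const double *b, double *c)
{
    ref_dgemm_colmajor(m, n, kk, 1.0, b, m, a, kk, 0.0, c, m);
}
```

The alternative is to pass a row-major layout flag and keep a before b; the swapped form is common in code that was originally written against the Fortran interface.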
Activity 1: DGEMM
Compare the performance of matrix multiply as implemented by C source code, DDOT, DGEMV, and DGEMM.
Exercise control of the threading capabilities in MKL/BLAS.
Intel® Math Kernel Library Optimizations in LAPACK*
Most important LAPACK optimizations:
• Threading – effectively uses multiple CPUs
• Recursive factorization
  • Reduces scalar time (Amdahl’s law: t = t_scalar + t_parallel/p)
  • Extends blocking further into the code
No runtime library support required
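The Amdahl's law expression quoted on this slide can be worked through numerically. The sketch below is a direct transcription of t = t_scalar + t_parallel/p plus the resulting speedup over one processor; the function names are hypothetical.

```c
/* Run time on p processors when only the parallel part scales:
   t = t_scalar + t_parallel / p. */
double amdahl_time(double t_scalar, double t_parallel, int p)
{
    return t_scalar + t_parallel / p;
}

/* Speedup relative to running everything on one processor. */
double amdahl_speedup(double t_scalar, double t_parallel, int p)
{
    return (t_scalar + t_parallel) / amdahl_time(t_scalar, t_parallel, p);
}
```

For example, with 1 unit of scalar time and 8 units of parallel time on 4 processors, the run time is 1 + 8/4 = 3 units, a speedup of 3 rather than 4. Halving the scalar part to 0.5 raises the speedup, which is why reducing scalar time matters for the threaded LAPACK routines.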
Discrete Fourier Transforms
One dimensional, two-dimensional, three-dimensional…
Multithreaded
Mixed radix
User-specified scaling, transform sign
Transforms on embedded matrices
Multiple one-dimensional transforms on single call
Strides
C and F90 interfaces
Using the Intel® Math Kernel Library DFTs
Basically a 3-step Process
Create a descriptor.
• Status = DftiCreateDescriptor(MDH, …)
Commit the descriptor (instantiates it).
• Status = DftiCommitDescriptor(MDH)
Perform the transform.
• Status = DftiComputeForward(MDH, X)
Optionally free the descriptor.
MDH: MyDescriptorHandle
Now supports FFTW interface
Vector Math Library (VML) Features/Issues
Vector Math Library: vectorized transcendental functions – like libm but better (faster)
Interface: Have both Fortran and C interfaces
Multiple accuracies
• High accuracy ( < 1 ulp )
• Lower accuracy, faster ( < 4 ulps )
Special value handling: √(-a), sin(0), and so on
Error handling – cannot duplicate libm here
VML: Why Does It Matter?
It is important for financial codes (Monte Carlo simulations).
• Exponentials, logarithms
Other scientific codes depend on transcendental functions.
Error functions can be big time sinks in some codes.
And so on
Vector Statistical Library (VSL)
Set of random number generators (RNGs)
Numerous non-uniform distributions
VML used extensively for transformations
Parallel computation support – some functions
User can supply own BRNG or transformations
Five basic RNGs (BRNGs) – bits, integer, FP
• MCG31, R250, MRG32, MCG59, WH
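To show what a basic generator of the multiplicative congruential kind does, here is a minimal sketch in plain C with modulus 2^31 - 1. The multiplier 16807 is the classic Park-Miller "minimal standard" value, used here only as an illustration; it is an assumption of this sketch and is not claimed to be the multiplier used by the library's MCG31 generator. The function names are hypothetical.

```c
#include <stdint.h>

/* Multiplicative congruential generator: x <- (a * x) mod m,
   with m = 2^31 - 1. The multiplier is the Park-Miller value,
   chosen for illustration only (not taken from Intel MKL). */
#define MCG_A 16807u
#define MCG_M 2147483647u

/* Advance the state and return it as an integer in [1, m - 1].
   The state must be seeded with a value in [1, m - 1]. */
uint32_t mcg_next(uint32_t *state)
{
    *state = (uint32_t)(((uint64_t)MCG_A * *state) % MCG_M);
    return *state;
}

/* Map the integer output to a floating-point value in (0, 1). */
double mcg_next_double(uint32_t *state)
{
    return (double)mcg_next(state) / (double)MCG_M;
}
```

The same integer stream can be exposed as integers or as floating-point values, which matches the "bits, integer, FP" outputs listed for the basic generators.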
Non-Uniform RNGs
Gaussian (two methods)
Exponential
Laplace
Weibull
Cauchy
Rayleigh
Lognormal
Gumbel
Using VSL
Basically a 3-step Process
Create a stream pointer.
VSLStreamStatePtr stream;
Create a stream.
vslNewStream( &stream, VSL_BRNG_MCG31, seed );
Generate a set of random numbers.
vsRngUniform( 0, stream, size, out, start, end );
Delete a stream (optional).
vslDeleteStream( &stream );
Activity: Calculating Pi using a Monte Carlo method
Compare the performance of C source code (RAND function) and VSL.
Exercise control of the threading capabilities in MKL/VSL.
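A possible baseline for the C side of this activity is sketched below: it estimates pi by sampling points in the unit square with the standard `rand` function and counting those that fall inside the quarter circle. The function name `estimate_pi` and its parameters are assumptions; the actual lab code may be organized differently.

```c
#include <stdlib.h>

/* Monte Carlo estimate of pi: the fraction of uniformly random points
   in the unit square that land inside the quarter circle of radius 1
   approaches pi/4. Uses the C library rand() as the baseline generator. */
double estimate_pi(unsigned int seed, long samples)
{
    long inside = 0;
    srand(seed);
    for (long i = 0; i < samples; i++) {
        double x = (double)rand() / RAND_MAX;
        double y = (double)rand() / RAND_MAX;
        if (x * x + y * y <= 1.0)
            inside++;
    }
    return 4.0 * (double)inside / (double)samples;
}
```

This version draws one number per call. A VSL-based version would instead fill a whole buffer of uniform numbers per call and then loop over the buffer, which is where a performance difference would be expected to come from.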
Performance Libraries: Intel® MKL
What’s Been Covered
Intel® Math Kernel Library is a broad scientific/engineering math library.
It is optimized for Intel® processors.
It is threaded for effective use on SMP machines.