SIMD Processing Using Compiler Intrinsics
![Page 1: SIMD Processing Using Compiler Intrinsics](https://reader036.fdocuments.us/reader036/viewer/2022082720/5881af231a28abdd348b4cad/html5/thumbnails/1.jpg)
SIMD Processing Using Compiler Intrinsics
Richard Thomson
[email protected]
@LegalizeAdulthd
github.com/LegalizeAdulthood
![Page 2: SIMD Processing Using Compiler Intrinsics](https://reader036.fdocuments.us/reader036/viewer/2022082720/5881af231a28abdd348b4cad/html5/thumbnails/2.jpg)
SIMD
Single Instruction Multiple Data
![Page 3: SIMD Processing Using Compiler Intrinsics](https://reader036.fdocuments.us/reader036/viewer/2022082720/5881af231a28abdd348b4cad/html5/thumbnails/3.jpg)
SIMD Exploits Data Parallelism
Image Processing
Array Processing
Scientific Computing
3D Graphics
![Page 4: SIMD Processing Using Compiler Intrinsics](https://reader036.fdocuments.us/reader036/viewer/2022082720/5881af231a28abdd348b4cad/html5/thumbnails/4.jpg)
Brief History of CPU SIMD
Year  Extension  Register Size
1997  MMX        64 bits
1999  SSE        128 bits
2001  SSE2       128 bits
2004  SSE3       128 bits
2006  SSE4       128 bits
2008  AVX        256 bits
2015  AVX-512    512 bits
![Page 5: SIMD Processing Using Compiler Intrinsics](https://reader036.fdocuments.us/reader036/viewer/2022082720/5881af231a28abdd348b4cad/html5/thumbnails/5.jpg)
Data Types
8-bit integers
16-bit integers
32-bit integers
64-bit integers
16-bit floats
32-bit floats
64-bit floats
Multiple smaller quantities are packed into registers ("multiple data")
Alignment requirements on data
Older extensions do not support all data types
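To make "multiple data" concrete, the sketch below counts how many values of each element type fit in one 128-bit SSE register. The helper name lanes is made up for illustration, and the only assumption is an x86 compiler that ships <emmintrin.h>.

```cpp
#include <emmintrin.h> // SSE2 register types: __m128, __m128d, __m128i
#include <cstddef>
#include <cstdint>

// Hypothetical helper: how many elements of type T fit in a register
// that is register_bytes wide.
template <typename T>
constexpr std::size_t lanes(std::size_t register_bytes)
{
    return register_bytes / sizeof(T);
}

// A 128-bit SSE register holds 16 bytes, however they are subdivided.
static_assert(sizeof(__m128i) == 16, "SSE registers are 128 bits wide");
static_assert(lanes<std::int8_t>(sizeof(__m128i)) == 16, "sixteen 8-bit integers");
static_assert(lanes<std::int16_t>(sizeof(__m128i)) == 8, "eight 16-bit integers");
static_assert(lanes<std::int32_t>(sizeof(__m128i)) == 4, "four 32-bit integers");
static_assert(lanes<float>(sizeof(__m128)) == 4, "four 32-bit floats");
static_assert(lanes<double>(sizeof(__m128d)) == 2, "two 64-bit floats");
```

The same arithmetic applies to wider registers: passing 32 bytes (256 bits) gives eight 32-bit floats per register.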
![Page 6: SIMD Processing Using Compiler Intrinsics](https://reader036.fdocuments.us/reader036/viewer/2022082720/5881af231a28abdd348b4cad/html5/thumbnails/6.jpg)
Alignment C++11
struct alignas(16) foo
{
    int i;                 // 4 bytes
    int j;                 // 4 bytes
    alignas(4) char s[3];  // 3 bytes
    short q;               // 2 bytes
};
// outputs 16:
std::cout << alignof(foo) << '\n';
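As a small follow-up, here is one way to check at run time that an object really sits on the boundary that alignas requested. The type vec4 and the helper is_aligned are illustrative names, not part of the talk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Four floats forced onto a 16-byte boundary, the size of one SSE register.
struct alignas(16) vec4
{
    float v[4];
};

// Hypothetical helper: true when p sits on an alignment-byte boundary.
inline bool is_aligned(const void* p, std::size_t alignment)
{
    assert(alignment != 0);
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

static_assert(alignof(vec4) == 16, "alignas(16) sets the type's alignment");
static_assert(sizeof(vec4) == 16, "four 4-byte floats, no padding needed");
```

Because sizeof(vec4) is a multiple of its alignment, every element of a vec4 array is aligned too, so is_aligned(&arr[i], 16) holds for each index.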
![Page 7: SIMD Processing Using Compiler Intrinsics](https://reader036.fdocuments.us/reader036/viewer/2022082720/5881af231a28abdd348b4cad/html5/thumbnails/7.jpg)
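Alignment C++03
// pre-C++11
// MSVC:
struct __declspec(align(16)) foo
{
    // ...
};

// gcc: the attribute goes right after the struct keyword
// (or after the closing brace), not after the struct name
struct __attribute__((aligned(16))) foo
{
    // ...
};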
![Page 8: SIMD Processing Using Compiler Intrinsics](https://reader036.fdocuments.us/reader036/viewer/2022082720/5881af231a28abdd348b4cad/html5/thumbnails/8.jpg)
Boost.Align
Handles heap allocation of aligned memory
Query the alignment requirements of a type
Declare alignment to the compiler portably
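To show what aligned heap allocation involves, the sketch below wraps the compiler-provided _mm_malloc and _mm_free pair in a std::unique_ptr. It deliberately does not use Boost.Align; the names mm_free_deleter, aligned_floats and make_aligned_floats are invented for this example, and an x86 compiler with <xmmintrin.h> is assumed.

```cpp
#include <xmmintrin.h> // _mm_malloc, _mm_free
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Memory from _mm_malloc must be released with _mm_free, not delete[].
struct mm_free_deleter
{
    void operator()(float* p) const { _mm_free(p); }
};

using aligned_floats = std::unique_ptr<float[], mm_free_deleter>;

// Hypothetical helper: allocate count floats on an alignment-byte boundary.
// The alignment must be a power of two. Returns an empty pointer on failure.
inline aligned_floats make_aligned_floats(std::size_t count, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    float* raw = static_cast<float*>(_mm_malloc(count * sizeof(float), alignment));
    assert(raw == nullptr || reinterpret_cast<std::uintptr_t>(raw) % alignment == 0);
    return aligned_floats(raw);
}
```

A call such as make_aligned_floats(8, 32) yields a buffer suitable for one aligned 256-bit load, and the deleter releases it automatically when the pointer goes out of scope.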
![Page 9: SIMD Processing Using Compiler Intrinsics](https://reader036.fdocuments.us/reader036/viewer/2022082720/5881af231a28abdd348b4cad/html5/thumbnails/9.jpg)
Compiler Intrinsics
A function whose implementation is handled directly by the compiler.
SIMD registers exposed as data types
    __m64, __m128, __m128d, __m128i, etc.
SIMD instructions exposed as intrinsic functions
    _m_paddb, _m_paddd, _m_paddsb, etc.
Register allocation, instruction scheduling and addressing modes handled by the compiler
Proper alignment of operands is assumed
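A minimal sketch of these points, using SSE rather than the MMX intrinsics named on the slide: the register appears as the __m128 type, each instruction is an ordinary-looking function call, and the aligned load and store require 16-byte-aligned pointers. The function name add_arrays and its preconditions are assumptions made for the example.

```cpp
#include <xmmintrin.h> // SSE: __m128, _mm_load_ps, _mm_add_ps, _mm_store_ps
#include <cassert>
#include <cstddef>
#include <cstdint>

// Computes out[i] = a[i] + b[i], four floats per add instruction.
// Assumes count is a multiple of 4 and all three pointers are
// 16-byte aligned, which _mm_load_ps and _mm_store_ps require.
inline void add_arrays(const float* a, const float* b, float* out, std::size_t count)
{
    assert(count % 4 == 0);
    assert(reinterpret_cast<std::uintptr_t>(a) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(b) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(out) % 16 == 0);

    for (std::size_t i = 0; i < count; i += 4)
    {
        const __m128 va = _mm_load_ps(a + i); // aligned load of 4 floats
        const __m128 vb = _mm_load_ps(b + i);
        _mm_store_ps(out + i, _mm_add_ps(va, vb)); // 4 additions at once
    }
}
```

The code never names a register or an addressing mode; the compiler chooses them. If alignment cannot be guaranteed, _mm_loadu_ps and _mm_storeu_ps are the unaligned counterparts.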
![Page 10: SIMD Processing Using Compiler Intrinsics](https://reader036.fdocuments.us/reader036/viewer/2022082720/5881af231a28abdd348b4cad/html5/thumbnails/10.jpg)
Options Available
Option                   Pros                 Cons
Assembly                 + Direct control     - Hard to program
Intrinsics               + Pure C/C++         - Hard to program
Class Library            + Easier to program  - Less control
Automatic Vectorization                       - Very little control
![Page 11: SIMD Processing Using Compiler Intrinsics](https://reader036.fdocuments.us/reader036/viewer/2022082720/5881af231a28abdd348b4cad/html5/thumbnails/11.jpg)
Proposed Boost.Simd
https://github.com/NumScale/boost.simd
Seems promising; easier to program without loss of control?
I had problems using it on Windows (issue #189)
Abstracts away the different sizes of registers as packs
Provides facilities to deal with alignment
Provides natural syntax for manipulating packs, e.g. a+b adds two packs together
Single code base can target multiple extensions
Templates expand to calls to intrinsics
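The pack idea can be illustrated with a toy wrapper around a single SSE register. This is not the Boost.Simd API: the type pack4f and its members are hypothetical, fixed at four floats, and exist only to show how operator overloads can expand to intrinsic calls.

```cpp
#include <xmmintrin.h> // SSE: __m128 and the _mm_*_ps intrinsics
#include <cassert>

// Hypothetical four-float pack; a real library would template this
// on element type and register width.
struct pack4f
{
    __m128 reg;

    explicit pack4f(__m128 r) : reg(r) {}
    explicit pack4f(float x) : reg(_mm_set1_ps(x)) {} // same value in every lane
    pack4f(float a, float b, float c, float d) : reg(_mm_setr_ps(a, b, c, d)) {}

    // Reads one lane by spilling the register to an aligned temporary.
    float operator[](int lane) const
    {
        assert(lane >= 0 && lane < 4);
        alignas(16) float tmp[4];
        _mm_store_ps(tmp, reg);
        return tmp[lane];
    }
};

// Natural syntax: each operator forwards to exactly one intrinsic.
inline pack4f operator+(pack4f a, pack4f b) { return pack4f(_mm_add_ps(a.reg, b.reg)); }
inline pack4f operator-(pack4f a, pack4f b) { return pack4f(_mm_sub_ps(a.reg, b.reg)); }
inline pack4f operator*(pack4f a, pack4f b) { return pack4f(_mm_mul_ps(a.reg, b.reg)); }
```

With this in place, pack4f(1, 2, 3, 4) + pack4f(10.0f) adds all four lanes in one instruction while reading like scalar arithmetic.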
![Page 12: SIMD Processing Using Compiler Intrinsics](https://reader036.fdocuments.us/reader036/viewer/2022082720/5881af231a28abdd348b4cad/html5/thumbnails/12.jpg)
Group Exercise
Convert BasicMandel to use intrinsics
AVX packs eight 32-bit floats into a single 256-bit register
AVX Intrinsics:
    #include <immintrin.h>
    __m256 _mm256_add_ps(__m256 a, __m256 b)
    __m256 _mm256_mul_ps(__m256 a, __m256 b)
    __m256 _mm256_sub_ps(__m256 a, __m256 b)
    __m256 _mm256_load_ps(float const *c)
    __m256 _mm256_cmp_ps(__m256 a, __m256 b, const int compOp)
    __m256i _mm256_castps_si256(__m256 a)
Intel Intrinsics Guide
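One possible shape for the exercise is sketched below. BasicMandel is not reproduced in this transcript, so the scalar loop being vectorized is an assumption: iterate z = z*z + c from z = 0 and count iterations while |z|^2 <= 4. To compile without extra flags the sketch uses 128-bit SSE2 (four points at a time) instead of AVX; the AVX intrinsics listed above follow the same pattern with eight lanes. The function name mandel4 is made up.

```cpp
#include <emmintrin.h> // SSE2: float and 32-bit integer operations on 128-bit registers
#include <cassert>
#include <cstdint>

// Writes the iteration count for four points c = cx[k] + i*cy[k] to counts[k].
inline void mandel4(const float* cx, const float* cy, int max_iter, std::int32_t* counts)
{
    assert(max_iter >= 0);
    const __m128 cr = _mm_loadu_ps(cx);
    const __m128 ci = _mm_loadu_ps(cy);
    const __m128 four = _mm_set1_ps(4.0f);
    __m128 zr = _mm_setzero_ps();
    __m128 zi = _mm_setzero_ps();
    __m128i n = _mm_setzero_si128();
    __m128 active = _mm_castsi128_ps(_mm_set1_epi32(-1)); // all lanes still iterating

    for (int i = 0; i < max_iter; ++i)
    {
        const __m128 zr2 = _mm_mul_ps(zr, zr);
        const __m128 zi2 = _mm_mul_ps(zi, zi);
        // The comparison yields all-ones (-1 as an integer) where |z|^2 <= 4.
        const __m128 inside = _mm_cmple_ps(_mm_add_ps(zr2, zi2), four);
        active = _mm_and_ps(active, inside); // an escaped lane stays escaped
        if (_mm_movemask_ps(active) == 0)
            break; // every lane has escaped
        // Subtracting -1 adds 1 to the count of each active lane.
        n = _mm_sub_epi32(n, _mm_castps_si128(active));
        const __m128 zri = _mm_mul_ps(zr, zi);
        zr = _mm_add_ps(_mm_sub_ps(zr2, zi2), cr); // re(z*z + c)
        zi = _mm_add_ps(_mm_add_ps(zri, zri), ci); // im(z*z + c)
    }
    _mm_storeu_si128(reinterpret_cast<__m128i*>(counts), n);
}
```

The mask replaces the scalar loop's early exit: lanes that have escaped keep computing but stop counting, and the loop ends only when no lane remains active. An AVX port would swap in the _mm256_ equivalents, with _mm256_cmp_ps taking a predicate such as _CMP_LE_OQ, and would need a 256-bit integer subtract (AVX2) or a float-based counter for the iteration counts.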