OpenSSL acceleration using Graphics Processing Units
Pedro Miguel Costa Saraiva
Introduction
• Cryptography: the study of techniques for secure communication
• SSL: a protocol for authenticated, encrypted client/server communication
  • De facto standard for secure electronic communications
  • Computationally intensive
  • Large volumes of SSL traffic impact performance
Introduction
• GPU: a specialised processing unit designed to manipulate graphics
  • Originally used solely for graphics calculations
  • Recent developments enable its use for general-purpose computing
  • Massive computational power
Introduction
• OpenSSL
  • Open-source implementation of the SSL and TLS protocols
  • Its core library implements a wide variety of cryptographic functions
  • Used by a very large number of both open-source and proprietary applications
Introduction
• Objectives
  • Efficiently offload cryptographic operations onto a GPU
  • Add GPU functionality to OpenSSL
  • Lighten the load on the CPU
Introduction
• Structure
  • State of the art
    • OpenSSL
    • GPU
    • Programming the GPU
    • OpenCL
    • CUDA
    • OpenCL vs CUDA
    • Main challenges
  • Implementation
  • Results
  • Conclusion
State of the art
OpenSSL
• Commercial-grade, full-featured open-source toolkit
• Divided into libssl (the SSL/TLS protocols) and libcrypto (general-purpose cryptography)
• Core library written in C
• Supports accelerator hardware through engines
State of the art
GPU
• Massive parallel processing power
• Roughly ten times the floating-point capability of a high-end CPU
• Faster growth rate than CPUs
State of the art
GPU - Programming
• At the end of the 1990s, graphics cards could not be programmed
• Things changed around 2001, when DirectX 8 and OpenGL extensions introduced programmable shaders
• Programmers had to express general-purpose computations in terms of textures, vertices and shader programs
State of the art
GPU - Programming
• 2006: NVIDIA introduced the CUDA framework
• ATI introduced the CTM (Close to Metal) low-level framework
• 2008: NVIDIA and ATI (by then part of AMD) joined the Khronos Group's OpenCL effort
  • Development of an industry standard for heterogeneous computing
  • OpenCL version 1.0 released in December 2008
State of the art
GPU - OpenCL
• Open, royalty-free standard for general-purpose parallel programming
• Supports CPUs, GPUs, and other types of processors
• Maintained by the non-profit Khronos Group consortium
• Adopted by Intel, AMD, NVIDIA, and ARM Holdings
State of the art
GPU - OpenCL
• API for coordinating parallel computation across different processors
• Cross-platform programming language (OpenCL C)
  • Based on a subset of ISO C99, with extensions
• Lower performance than CUDA on NVIDIA GPUs
State of the art
GPU - CUDA
• Proprietary hardware and software architecture designed by NVIDIA
• Manages computations on the GPU
• Programmed with “C for CUDA”, an extension of C
• Third-party wrappers available for other languages
State of the art
GPU - Main Challenges
• Well suited to massively parallel problems
• Interaction between threads should be minimal
• Divergent execution paths are slow
• Limited memory
• Slow memory transfers between host and GPU
• Data-intensive operations are discouraged
• No file or standard I/O operations
Implementation
Structure
• OpenSSL
• AES
• RSA Key Generation
• RSA Cipher
Implementation
OpenSSL
• The ENGINE component supports alternative implementations of cryptographic algorithms
• External engines can be loaded dynamically
Implementation
OpenSSL Engine
• A binding function defines which algorithms the engine supports
• The engine supplies pointers to the functions implementing those algorithms
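In OpenSSL 1.x, a dynamically loadable engine typically does this in a bind function exported through `IMPLEMENT_DYNAMIC_BIND_FN`, which calls `ENGINE_set_id`, `ENGINE_set_name` and `ENGINE_set_ciphers` (the ENGINE API is deprecated as of OpenSSL 3.0). The sketch below does not use that API; it only models the idea in self-contained C: a binding step records algorithm names next to function pointers, and a lookup returns NULL when the engine does not provide an algorithm. All names (`toy_engine`, `bind_gpu_engine`, `engine_lookup`, the placeholder `gpu_*` functions) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of an engine: NOT the real OpenSSL ENGINE API. */
typedef int (*cipher_fn)(const unsigned char *in, unsigned char *out, size_t len);

typedef struct {
    const char *name;   /* algorithm the engine claims to support */
    cipher_fn impl;     /* pointer to the function implementing it */
} algo_entry;

typedef struct {
    const char *id;
    algo_entry algos[8];
    size_t n_algos;
} toy_engine;

/* Placeholder "GPU" implementations: they only copy the data. A real engine
   would transfer it to the device and launch a kernel here. */
static int gpu_aes_ecb(const unsigned char *in, unsigned char *out, size_t len) {
    memcpy(out, in, len);
    return 1;
}
static int gpu_aes_cbc_decrypt(const unsigned char *in, unsigned char *out, size_t len) {
    memcpy(out, in, len);
    return 1;
}

static int register_algo(toy_engine *e, const char *name, cipher_fn impl) {
    if (e->n_algos >= sizeof e->algos / sizeof e->algos[0]) return 0;
    e->algos[e->n_algos].name = name;
    e->algos[e->n_algos].impl = impl;
    e->n_algos++;
    return 1;
}

/* Binding function: declares the supported algorithms and their implementations. */
static int bind_gpu_engine(toy_engine *e) {
    assert(e != NULL);
    e->id = "gpu";
    e->n_algos = 0;
    return register_algo(e, "aes-128-ecb", gpu_aes_ecb) &&
           register_algo(e, "aes-128-cbc-decrypt", gpu_aes_cbc_decrypt);
}

/* Dispatch: NULL means "not provided by this engine", i.e. use the built-in code. */
static cipher_fn engine_lookup(const toy_engine *e, const char *name) {
    for (size_t i = 0; i < e->n_algos; i++)
        if (strcmp(e->algos[i].name, name) == 0) return e->algos[i].impl;
    return NULL;
}
```

Keeping CBC encryption out of the table mirrors the design choice discussed in the AES slides: an algorithm the GPU handles poorly can simply be left to the default CPU implementation.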
Implementation
AES
• CBC mode encryption cannot be parallelised
  • Each plaintext block is XORed with the previous ciphertext block before encryption, so the previous block must be finished before the next one can start
• CBC mode decryption can be parallelised
  • Each plaintext block depends only on two ciphertext blocks, all of which are available up front, so all blocks can be decrypted in parallel
• ECB mode can be parallelised
  • Every block is encrypted or decrypted independently
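The asymmetry follows from the CBC equations: encryption is C[i] = E_K(P[i] XOR C[i-1]) and decryption is P[i] = D_K(C[i]) XOR C[i-1], with C[0] taken from the IV. A minimal sketch of both directions, assuming a toy invertible byte transform in place of AES (it is NOT a real or secure cipher) so the example stays short; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK 16  /* AES block size in bytes */

/* Toy invertible block transform standing in for AES. NOT a real or secure cipher. */
static uint8_t rotl8(uint8_t x, int r) { return (uint8_t)((x << r) | (x >> (8 - r))); }
static uint8_t rotr8(uint8_t x, int r) { return (uint8_t)((x >> r) | (x << (8 - r))); }

static void toy_encrypt_block(const uint8_t k[BLK], const uint8_t in[BLK], uint8_t out[BLK]) {
    for (int j = 0; j < BLK; j++) out[j] = rotl8((uint8_t)(in[j] ^ k[j]), 3);
}
static void toy_decrypt_block(const uint8_t k[BLK], const uint8_t in[BLK], uint8_t out[BLK]) {
    for (int j = 0; j < BLK; j++) out[j] = (uint8_t)(rotr8(in[j], 3) ^ k[j]);
}

/* CBC encryption: block i needs ciphertext block i-1, a loop-carried dependency
   that forces serial execution (the single-thread case on the GPU). */
static void cbc_encrypt(const uint8_t k[BLK], const uint8_t iv[BLK],
                        const uint8_t *pt, uint8_t *ct, size_t nblocks) {
    const uint8_t *prev = iv;
    uint8_t tmp[BLK];
    for (size_t i = 0; i < nblocks; i++) {
        for (int j = 0; j < BLK; j++) tmp[j] = (uint8_t)(pt[i * BLK + j] ^ prev[j]);
        toy_encrypt_block(k, tmp, &ct[i * BLK]);
        prev = &ct[i * BLK];
    }
}

/* CBC decryption of block i reads only ciphertext, which is all known up front,
   so each block could be handled by its own GPU thread. */
static void cbc_decrypt_block(const uint8_t k[BLK], const uint8_t iv[BLK],
                              const uint8_t *ct, uint8_t *pt, size_t i) {
    assert(ct != pt);  /* in place, one thread could overwrite a block another still needs */
    const uint8_t *prev = (i == 0) ? iv : &ct[(i - 1) * BLK];
    toy_decrypt_block(k, &ct[i * BLK], &pt[i * BLK]);
    for (int j = 0; j < BLK; j++) pt[i * BLK + j] ^= prev[j];
}
```

Because `cbc_decrypt_block` never reads another block's output, calling it for the blocks in any order (for example last to first) yields the same plaintext, which is the property that allows one thread per block. The separate output buffer matters for the same reason.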
Implementation
AES
• Initialisation
  • Key expansion is performed on the CPU
• Cipher
  • Initialises the GPU
  • Allocates host and GPU memory for input and output data
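Key expansion runs once per key and is inherently sequential, so doing it on the CPU is cheap; OpenSSL's own CPU routine for this is `AES_set_encrypt_key`. A minimal sketch of the AES-128 key schedule from FIPS-197, assuming the resulting 44 words (176 bytes) would then be copied to GPU memory once per key. To stay short it computes S-box entries on the fly instead of using a lookup table; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiplication in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0));
        b >>= 1;
    }
    return p;
}

/* AES S-box computed on the fly: inverse (a^254) followed by the affine transform. */
static uint8_t sbox(uint8_t x) {
    uint8_t inv = 0;
    if (x != 0) {
        inv = 1;
        for (int i = 0; i < 254; i++) inv = gf_mul(inv, x);
    }
    uint8_t s = inv;
    for (int r = 1; r <= 4; r++) s ^= (uint8_t)((inv << r) | (inv >> (8 - r)));
    return (uint8_t)(s ^ 0x63);
}

static uint32_t sub_word(uint32_t w) {
    return (uint32_t)sbox((uint8_t)(w >> 24)) << 24 | (uint32_t)sbox((uint8_t)(w >> 16)) << 16 |
           (uint32_t)sbox((uint8_t)(w >> 8)) << 8 | sbox((uint8_t)w);
}

/* AES-128 key expansion: 16-byte key -> 44 round-key words (11 round keys). */
static void aes128_expand_key(const uint8_t key[16], uint32_t w[44]) {
    assert(key != NULL && w != NULL);
    uint8_t rcon = 0x01;
    for (int i = 0; i < 4; i++)
        w[i] = (uint32_t)key[4 * i] << 24 | (uint32_t)key[4 * i + 1] << 16 |
               (uint32_t)key[4 * i + 2] << 8 | key[4 * i + 3];
    for (int i = 4; i < 44; i++) {
        uint32_t t = w[i - 1];
        if (i % 4 == 0) {
            /* RotWord, SubWord, then XOR with the round constant */
            t = sub_word((t << 8) | (t >> 24)) ^ ((uint32_t)rcon << 24);
            rcon = gf_mul(rcon, 2);
        }
        w[i] = w[i - 4] ^ t;
    }
}
```

With the FIPS-197 example key `2b7e151628aed2a6abf7158809cf4f3c`, the first derived word is `a0fafe17` and the last is `b6630ca6`.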
Implementation
AES
• Cipher
  • Input data is transferred to GPU memory
    • All data is transferred at once
  • The GPU kernel is launched
  • Output data is transferred back from GPU memory
Implementation
AES
• GPU kernel
  • For CBC encryption, a single thread is launched
    • It encrypts every block serially
  • For CBC decryption and ECB operations, one thread is launched per block
    • All blocks are processed in parallel
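The "one thread per block" mapping comes down to simple index arithmetic on the host and in the kernel. A minimal sketch in plain C, assuming 256 threads per CUDA thread block (a common choice, not a figure from the slides) and input already padded to a multiple of 16 bytes; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define AES_BLOCK 16
#define THREADS_PER_BLOCK 256   /* assumed launch configuration, not from the slides */

/* Threads to launch: one for CBC encryption (serial), one per AES block otherwise. */
static size_t threads_needed(int is_cbc_encrypt, size_t nbytes) {
    assert(nbytes % AES_BLOCK == 0);   /* input assumed to be padded already */
    return is_cbc_encrypt ? 1 : nbytes / AES_BLOCK;
}

/* CUDA thread blocks needed to cover `threads` threads (ceiling division). */
static size_t grid_size(size_t threads) {
    return (threads + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;
}

/* Byte offset of the AES block handled by global thread `tid`
   (blockIdx.x * blockDim.x + threadIdx.x in CUDA), or (size_t)-1 for the
   surplus threads in the last, partly filled thread block. */
static size_t thread_offset(size_t tid, size_t nbytes) {
    size_t off = tid * AES_BLOCK;
    return off < nbytes ? off : (size_t)-1;
}
```

For example, a 1 MiB buffer holds 65,536 AES blocks, which under this configuration means 256 thread blocks of 256 threads for ECB or CBC decryption, versus a single thread for CBC encryption. The bounds check in `thread_offset` is what lets the last thread block be only partly used.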
Implementation
RSA Key Generation
• Generation function (CPU side)
  • Calls the GPU to generate a large pool of prime candidates
  • The GPU is not called again until this pool is exhausted
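The CPU side therefore behaves like a cache in front of a batch producer: one expensive GPU call fills the pool, and later requests are served from it. A minimal sketch of that policy, assuming `uint64_t` values as stand-ins for OpenSSL BIGNUMs and a trial-division function as a stand-in for the GPU batch; the pool size and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_CAP 1024   /* hypothetical pool size */

/* Batch producer, e.g. one GPU call; returns how many primes it wrote. */
typedef size_t (*refill_fn)(uint64_t *out, size_t max);

typedef struct {
    uint64_t primes[POOL_CAP];  /* uint64_t stands in for OpenSSL BIGNUMs */
    size_t count;               /* primes still unused */
    refill_fn refill;           /* invoked only when the pool is empty */
    size_t refills;             /* number of batch (GPU) calls made so far */
} prime_pool;

/* Hand out one prime; call the batch producer only when the pool is exhausted. */
static int pool_take(prime_pool *p, uint64_t *out) {
    assert(p != NULL && out != NULL);
    if (p->count == 0) {
        p->count = p->refill(p->primes, POOL_CAP);
        p->refills++;
        if (p->count == 0) return 0;
    }
    *out = p->primes[--p->count];
    return 1;
}

/* Stand-in for the GPU batch: small odd primes found by trial division. */
static size_t demo_refill(uint64_t *out, size_t max) {
    size_t n = 0;
    for (uint64_t c = 3; n < max; c += 2) {
        int is_prime = 1;
        for (uint64_t d = 3; d * d <= c; d += 2)
            if (c % d == 0) { is_prime = 0; break; }
        if (is_prime) out[n++] = c;
    }
    return n;
}
```

Batching amortises the fixed cost of a GPU call (initialisation, allocation, transfers) over many keys, which fits the later observation that key generation is faster when many keys are needed.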
Implementation
RSA Key Generation
• Generation function (GPU call)
  • The GPU random number generator is initialised
  • Device memory is allocated
  • A large number of threads is launched to generate prime BIGNUMs
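The slides do not say which generator was used. With cuRAND, a common pattern is to call `curand_init` with one seed and the thread index as the subsequence number, so every thread draws from its own stream. A minimal plain-C sketch of the same two ideas (a per-thread stream derived from a seed and a thread index, and shaping raw bits into an odd candidate of the requested size), assuming SplitMix64 as the generator. SplitMix64 is NOT cryptographically secure; real RSA key generation must draw from a CSPRNG. Names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* SplitMix64: small, fast PRNG. NOT cryptographically secure. */
static uint64_t splitmix64(uint64_t *state) {
    uint64_t z = (*state += 0x9E3779B97F4A7C15ull);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ull;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBull;
    return z ^ (z >> 31);
}

/* Per-thread initial state derived from a global seed and the thread index,
   so that threads do not all produce the same candidates. */
static uint64_t thread_rng_init(uint64_t seed, uint64_t tid) {
    uint64_t s = seed ^ (tid * 0xD1B54A32D192ED03ull);
    return splitmix64(&s);
}

/* Random `bits`-bit candidate: the top bit is set so the value really has
   `bits` bits, and the low bit is set so it is odd. */
static uint64_t random_candidate(uint64_t *state, int bits) {
    assert(bits >= 2 && bits <= 64);
    uint64_t x = splitmix64(state);
    if (bits < 64) x &= (1ull << bits) - 1;
    return x | (1ull << (bits - 1)) | 1ull;
}
```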
Implementation
RSA Key Generation
• Generation function (GPU kernel)
  • A random BIGNUM p is generated
  • p is tested for primality with the Miller-Rabin probabilistic test
  • BIGNUMs found to be prime are written into global memory
  • Each thread tests one BIGNUM
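Each thread's work is a self-contained Miller-Rabin test, which is why the kernel needs no inter-thread communication. A minimal sketch for 64-bit integers, assuming `uint64_t` in place of BIGNUMs and a portable double-and-add `mulmod` in place of multi-precision arithmetic. For 64-bit inputs the twelve fixed witnesses below make the test exact; for RSA-sized numbers, witnesses are random and the test remains probabilistic. Names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* (a * b) mod m without 128-bit arithmetic, via double-and-add. Requires m > 0. */
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t m) {
    uint64_t r = 0;
    a %= m;
    while (b) {
        if (b & 1) r = (r >= m - a) ? r - (m - a) : r + a;  /* r = (r + a) mod m */
        a = (a >= m - a) ? a - (m - a) : a + a;             /* a = 2a mod m */
        b >>= 1;
    }
    return r;
}

static uint64_t powmod(uint64_t b, uint64_t e, uint64_t m) {
    uint64_t r = 1 % m;
    b %= m;
    while (e) {
        if (e & 1) r = mulmod(r, b, m);
        b = mulmod(b, b, m);
        e >>= 1;
    }
    return r;
}

/* One Miller-Rabin round with witness a; returns 0 if a proves n composite.
   Requires n odd and n > a. */
static int mr_round(uint64_t n, uint64_t a) {
    uint64_t d = n - 1;
    int s = 0;
    while ((d & 1) == 0) { d >>= 1; s++; }   /* n - 1 = d * 2^s with d odd */
    uint64_t x = powmod(a, d, n);
    if (x == 1 || x == n - 1) return 1;
    for (int i = 1; i < s; i++) {
        x = mulmod(x, x, n);
        if (x == n - 1) return 1;
    }
    return 0;
}

/* The first twelve primes as witnesses make the test exact for every 64-bit n. */
static int is_probable_prime(uint64_t n) {
    static const uint64_t w[] = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37};
    const size_t nw = sizeof w / sizeof w[0];
    if (n < 2) return 0;
    for (size_t i = 0; i < nw; i++) {
        if (n == w[i]) return 1;
        if (n % w[i] == 0) return 0;   /* cheap trial division first */
    }
    for (size_t i = 0; i < nw; i++)
        if (!mr_round(n, w[i])) return 0;
    return 1;
}
```

The trial division also guarantees that every witness is smaller than n when `mr_round` runs.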
Implementation
RSA Key Generation
• Generation function (GPU call)
  • Output data is copied back to the host
• This required implementing the entire OpenSSL BIGNUM library on the GPU
Implementation
RSA Cipher
• BIGNUMs used in RSA must be broken down into small words
  • Multiple threads can each process one word
• The Chinese Remainder Theorem splits each private-key operation into two independent exponentiations with half-size moduli (p and q)
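With the usual precomputed values dp = d mod (p-1), dq = d mod (q-1) and qinv = q^-1 mod p, the private-key operation becomes m1 = c^dp mod p, m2 = c^dq mod q, h = qinv * (m1 - m2) mod p, and m = m2 + h * q. A minimal sketch on the classic textbook toy key (p = 61, q = 53, e = 17, d = 2753), assuming moduli below 2^32 so plain 64-bit products suffice; real keys need multi-precision arithmetic. Names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Modular exponentiation; toy version that requires m < 2^32 so products fit in 64 bits. */
static uint64_t powmod32(uint64_t b, uint64_t e, uint64_t m) {
    assert(m > 0 && m < (1ull << 32));
    uint64_t r = 1 % m;
    b %= m;
    while (e) {
        if (e & 1) r = r * b % m;
        b = b * b % m;
        e >>= 1;
    }
    return r;
}

/* RSA private-key operation with the CRT (Garner's recombination).
   The two exponentiations are independent and use half-size moduli,
   so they could run concurrently, e.g. on two groups of GPU threads. */
static uint64_t rsa_crt_decrypt(uint64_t c, uint64_t p, uint64_t q,
                                uint64_t dp, uint64_t dq, uint64_t qinv) {
    uint64_t m1 = powmod32(c, dp, p);
    uint64_t m2 = powmod32(c, dq, q);
    uint64_t h = qinv * ((m1 + p - m2 % p) % p) % p;
    return m2 + h * q;
}
```

For the toy key, encrypting m = 65 gives c = 2790, and both the direct exponentiation c^d mod n and the CRT path (dp = 53, dq = 49, qinv = 38) recover 65.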
Implementation
RSA Cipher
• Multi-precision algorithm
  • A k-bit integer A is broken into s = k/64 words
  • O(s) parallel implementation
  • Runs s threads in two phases
Implementation
RSA Cipher
• The first phase accumulates the s partial products in 2s steps
  • Carries are accumulated in a separate array
• The second phase adds the carries to the intermediate result
  • Worst case: s-1 iterations
  • Usually only one or two
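The slides do not give the exact step schedule, so the following is one plausible reading of the two-phase scheme, not the author's code. It assumes 32-bit words (so a word product fits in `uint64_t`) rather than the 64-bit words quoted above, emulates the s threads with loops, and splits each partial product into a low-half step and a high-half step so that within a step every "thread" writes a different column, giving 2s steps. Overflows go into a separate carry array that the second phase folds in, repeating while new carries appear. A serial schoolbook multiply is included as a reference. Names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NW 4   /* words per operand: 128-bit toy operands with 32-bit words */

/* Add v into r[pos]; an overflow is recorded as one carry owed to column pos+1. */
static void add_word(uint32_t *r, uint32_t *carry, int pos, uint32_t v) {
    uint32_t old = r[pos];
    r[pos] = old + v;
    if (r[pos] < old) carry[pos + 1]++;
}

/* Two-phase product r = a * b (little-endian words). Each inner i-loop models
   NW threads; within one step they write distinct columns, so no conflicts. */
static void mp_mul_two_phase(const uint32_t a[NW], const uint32_t b[NW],
                             uint32_t r[2 * NW], int *passes) {
    uint32_t carry[2 * NW + 1] = {0};
    memset(r, 0, 2 * NW * sizeof r[0]);

    /* Phase 1: 2*NW steps (low halves, then high halves, of a[i] * b[k]). */
    for (int k = 0; k < NW; k++) {
        for (int i = 0; i < NW; i++)
            add_word(r, carry, i + k, (uint32_t)((uint64_t)a[i] * b[k]));
        for (int i = 0; i < NW; i++)
            add_word(r, carry, i + k + 1, (uint32_t)(((uint64_t)a[i] * b[k]) >> 32));
    }
    assert(carry[2 * NW] == 0);  /* the full product always fits in 2*NW words */

    /* Phase 2: fold the carry array into r, repeating while new carries appear. */
    *passes = 0;
    int pending = 1;
    while (pending) {
        uint32_t next[2 * NW + 1] = {0};
        pending = 0;
        for (int p = 0; p < 2 * NW; p++) {   /* every column is independent */
            uint32_t old = r[p];
            r[p] = old + carry[p];
            if (r[p] < old) { next[p + 1]++; pending = 1; }
        }
        assert(next[2 * NW] == 0);
        memcpy(carry, next, sizeof carry);
        (*passes)++;
    }
}

/* Serial schoolbook multiplication, used as a reference. */
static void mp_mul_ref(const uint32_t a[NW], const uint32_t b[NW], uint32_t r[2 * NW]) {
    memset(r, 0, 2 * NW * sizeof r[0]);
    for (int i = 0; i < NW; i++) {
        uint64_t c = 0;
        for (int j = 0; j < NW; j++) {
            uint64_t t = (uint64_t)a[i] * b[j] + r[i + j] + c;
            r[i + j] = (uint32_t)t;
            c = t >> 32;
        }
        r[i + NW] = (uint32_t)c;
    }
}
```

The `passes` counter reports how many second-phase iterations were needed, the quantity the slides describe as usually one or two.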
Results
Testing Framework
• Intel Core i7 950 CPU, 3.07 GHz
• NVIDIA GeForce GTX 580
• The stress tool was used for the heavy CPU load tests
  • 300 workers looping on sqrt, malloc/free and sync
Results
AES – CBC Decryption
Results
AES – CBC Encryption
Results
AES – ECB Encryption
Results
AES – ECB Decryption
Results
RSA Key Generation
Results
RSA Key Generation – Heavy CPU load
Results
RSA Cipher
Charts: single message; single message, heavy CPU load; multiple messages (4096-bit)
Results
RSA Key Generation – Heavy CPU load
Results
RSA Key Generation – Heavy CPU load
Conclusion
• Significant performance boost for AES ECB and CBC Decryption
• AES CBC Encryption is slower, but significantly lighter on the CPU
• RSA Key Generation is significantly faster for multiple keys
• RSA Cipher is significantly slower
Future Work
• AES CTR Cipher Mode
  • OpenSSL implementation still unstable
• Manager to cache RSA requests for more effective use of the GPU