AES on modern GPUs

AES encryption using GPU architectures

Politehnica University of Bucharest

Automatic Control and Computers Faculty

Computer Science Department

Author: Grigore Lupescu

Scientific Advisor: Emil Slusanschi

Scientific Student Projects Session - May 2014

AES Encryption (1)

17.05.2014, Scientific Student Projects Session - May 2014

Mode of operation: an algorithm that repeatedly applies a block cipher (e.g. AES) to the input plaintext

Most operation modes require an initialization vector

Most used cipher modes: Cipher-block chaining (CBC), Counter (CTR)

Other cipher modes: Electronic codebook (ECB), Output feedback (OFB)

Why use ECB?

Simple, fast, highly parallelizable, maximum throughput

Provides a good estimate of how CTR would perform
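Why ECB throughput predicts CTR throughput can be sketched in a few lines of C: in both modes every block is processed independently with one block-cipher call, and CTR adds only a counter setup and an XOR. This is a minimal sketch under stated assumptions, not the kernel from these slides. `toy_block_encrypt` is a hypothetical, insecure stand-in for AES, and the counter layout (8-byte nonce followed by an 8-byte block index) is chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK 16

/* Hypothetical stand-in for AES: rotates the block by one byte and XORs the key.
   NOT secure; only its 16-bytes-in, 16-bytes-out shape matters here. */
static void toy_block_encrypt(const uint8_t *key, const uint8_t *in, uint8_t *out) {
    for (int i = 0; i < BLOCK; i++)
        out[i] = (uint8_t)(in[(i + 1) % BLOCK] ^ key[i]);
}

/* ECB: ciphertext block b depends only on plaintext block b,
   so every iteration could run as a separate GPU work-item. */
static void ecb_encrypt(const uint8_t *key, const uint8_t *src, uint8_t *dst, size_t nblocks) {
    assert(src != dst); /* the toy cipher above cannot work in place */
    for (size_t b = 0; b < nblocks; b++)
        toy_block_encrypt(key, src + b * BLOCK, dst + b * BLOCK);
}

/* CTR: keystream block b depends only on (nonce, b), so it is just as parallel.
   The same call encrypts and decrypts. */
static void ctr_crypt(const uint8_t *key, uint64_t nonce,
                      const uint8_t *src, uint8_t *dst, size_t nblocks) {
    for (size_t b = 0; b < nblocks; b++) {
        uint8_t ctr[BLOCK], ks[BLOCK];
        for (int i = 0; i < 8; i++) {
            ctr[i] = (uint8_t)(nonce >> (8 * i));           /* assumed layout: nonce */
            ctr[8 + i] = (uint8_t)((uint64_t)b >> (8 * i)); /* then block index */
        }
        toy_block_encrypt(key, ctr, ks);
        for (int i = 0; i < BLOCK; i++)
            dst[b * BLOCK + i] = (uint8_t)(src[b * BLOCK + i] ^ ks[i]);
    }
}
```

With two identical plaintext blocks, `ecb_encrypt` produces two identical ciphertext blocks while `ctr_crypt` does not, and calling `ctr_crypt` a second time with the same key and nonce restores the plaintext.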

AES Encryption (2)

KeyExpansion: round keys are derived from the cipher key.

InitialRound: AddRoundKey.

Rounds:

SubBytes: substitution step where each byte is replaced with another according to the S-box table.

ShiftRows: transposition step where the last three rows of the state are shifted cyclically.

MixColumns: a mixing operation on the columns of the state. The operations (+, *) are redefined over the Galois finite field GF(2^8).

AddRoundKey: bitwise XOR of each byte of the state with the round key.

FinalRound: SubBytes, ShiftRows, AddRoundKey (no MixColumns).
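The redefined arithmetic behind MixColumns can be illustrated with a short C sketch: addition is XOR, and multiplication is shift-and-add reduced by the AES polynomial x^8 + x^4 + x^3 + x + 1. The function names (`xtime`, `gmul`, `mix_column`) are illustrative and not taken from the slides.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply by {02} in GF(2^8): shift left, then reduce by 0x11b if bit 7 was set. */
static uint8_t xtime(uint8_t a) {
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
}

/* General GF(2^8) multiplication by shift-and-add, where "add" is XOR. */
static uint8_t gmul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;
        a = xtime(a);
        b >>= 1;
    }
    return p;
}

/* MixColumns on one 4-byte column: multiply by the fixed matrix
   [2 3 1 1; 1 2 3 1; 1 1 2 3; 3 1 1 2] over GF(2^8). */
static void mix_column(uint8_t c[4]) {
    uint8_t a0 = c[0], a1 = c[1], a2 = c[2], a3 = c[3];
    c[0] = (uint8_t)(gmul(a0, 2) ^ gmul(a1, 3) ^ a2 ^ a3);
    c[1] = (uint8_t)(a0 ^ gmul(a1, 2) ^ gmul(a2, 3) ^ a3);
    c[2] = (uint8_t)(a0 ^ a1 ^ gmul(a2, 2) ^ gmul(a3, 3));
    c[3] = (uint8_t)(gmul(a0, 3) ^ a1 ^ a2 ^ gmul(a3, 2));
}

/* Usage: the worked example from FIPS-197, {57} * {83} = {c1}. */
static void demo_gmul(void) {
    assert(xtime(0x57) == 0xae);
    assert(gmul(0x57, 0x83) == 0xc1);
}
```

Applying `mix_column` to the column {db, 13, 53, 45} gives {8e, 4d, a1, bc}, a commonly used test vector for this step.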

Target System (1)

SoC CPU – AMD A4 4000K (2 cores @ 3.0 GHz, Richland architecture, AES-NI), cores denoted by BLUE

SoC integrated GPU HD7480 (iGPU), 2 SIMD units of 64 cores each (VLIW4 architecture), SIMD units denoted by RED

Discrete GPU AMD R7 250 (dGPU), 6 SIMD units of 64 cores each (GCN architecture), PCIe 2.0 x16 bus, SIMD units denoted by RED

Data to be encrypted denoted by GREEN

Software – C/C++/OpenCL, Linux Ubuntu 14.04 x64

Target System (2)

Algorithm Opt_1

• Array “indata” will reside in global device memory (__global)

• Variable “state”, which holds the intermediate transformations, will reside in GPU cache (__local)

• Simple operation “ShiftRows” is implemented with vector component addressing (state.s05AF49E38..)

• Simple operation “AddRoundKey” is a single XOR (state ^ key)

• Complex operation “SubBytes” will use a precomputed S-box table, stored in constant memory

• Complex operation “MixColumns” will use precomputed Galois finite field tables, stored in constant memory

• Host sample code below (simple blocking enqueues)

while (!done()) {
    writeData(32MB, &offset);   /* host -> device */
    execKernel(32MB, &offset);  /* encrypt chunk */
    readData(32MB, &offset);    /* device -> host */
}
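The two "simple" operations above can be sketched in plain C. The permutation table is the standard AES ShiftRows mapping for a state stored column by column (byte index = 4*col + row). Its first entries (0, 5, A, F, 4, 9, E, 3, 8) match the swizzle prefix quoted on the slide, but the full swizzle used in the kernel is not shown in the source, so treat the table as an assumption about it. All names here are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Source index for each output byte; state stored column by column. */
static const uint8_t SHIFT_ROWS[16] = {0, 5, 10, 15, 4, 9, 14, 3,
                                       8, 13, 2, 7, 12, 1, 6, 11};

/* ShiftRows as a single gather, the scalar analogue of a 16-component swizzle. */
static void shift_rows(uint8_t state[16]) {
    uint8_t tmp[16];
    for (int i = 0; i < 16; i++)
        tmp[i] = state[SHIFT_ROWS[i]];
    memcpy(state, tmp, 16);
}

/* AddRoundKey: XOR every state byte with the matching round-key byte. */
static void add_round_key(uint8_t state[16], const uint8_t round_key[16]) {
    for (int i = 0; i < 16; i++)
        state[i] ^= round_key[i];
}

/* Usage: run both steps on a state holding the bytes 0..15. */
static void demo_round_steps(void) {
    uint8_t s[16], k[16];
    for (int i = 0; i < 16; i++) { s[i] = (uint8_t)i; k[i] = 0xff; }
    shift_rows(s);
    assert(s[1] == 5 && s[13] == 1); /* row 1 rotated left by one column */
    add_round_key(s, k);
    assert(s[0] == 0xff);            /* 0x00 ^ 0xff */
}
```

On a GPU both steps map to a single vector operation on a 16-byte value, which is presumably why the slides call them simple.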

Results Opt_1

• AMD CodeXL profiling, initial results – iGPU A4 4000, ~100 MB/s, AES-128 ECB

Algorithm Opt_2

• Simple operation “ShiftRows” - unchanged

• Simple operation “AddRoundKey” – unchanged

• Complex operation “SubBytes” will use a precomputed S-box table, stored in cache memory (__local)

• Complex operation “MixColumns” computes values on the fly instead of using precomputed tables (an optimized version of MixColumns is used)

• Host sample code – unchanged
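The trade-off between the two MixColumns variants can be sketched in C as follows. The table version performs two memory lookups per output byte, while the computed version uses one `xtime` and a few XORs, relying on the identity 2*a0 ^ 3*a1 = xtime(a0 ^ a1) ^ a1. The slides do not show which optimized formulation was actually used, so this is one common option. The names `MUL2` and `MUL3` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply by {02} in GF(2^8), reducing by the AES polynomial (0x11b). */
static uint8_t xtime(uint8_t a) {
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
}

/* Precomputed multiplication tables, as in the table-based variant. */
static uint8_t MUL2[256], MUL3[256];

static void build_tables(void) {
    for (int v = 0; v < 256; v++) {
        MUL2[v] = xtime((uint8_t)v);
        MUL3[v] = (uint8_t)(MUL2[v] ^ v); /* 3*v = 2*v ^ v */
    }
}

/* First output byte of MixColumns via table lookups (memory bound). */
static uint8_t mix_byte_table(const uint8_t c[4]) {
    return (uint8_t)(MUL2[c[0]] ^ MUL3[c[1]] ^ c[2] ^ c[3]);
}

/* Same byte computed on the fly (ALU bound, no table traffic). */
static uint8_t mix_byte_compute(const uint8_t c[4]) {
    return (uint8_t)(xtime((uint8_t)(c[0] ^ c[1])) ^ c[1] ^ c[2] ^ c[3]);
}

/* Usage: both variants agree on a sample column. */
static void demo_tables(void) {
    const uint8_t col[4] = {0xdb, 0x13, 0x53, 0x45};
    build_tables();
    assert(mix_byte_table(col) == mix_byte_compute(col));
}
```

Which variant wins depends on how costly table reads are relative to arithmetic on the target device, so it is worth measuring both.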

Results Opt_2

• Profiling, Opt_1 – iGPU A4 4000, ~100 MB/s, AES-128 ECB

• Profiling, Opt_2 – iGPU A4 4000, ~210 MB/s, AES-128 ECB

Algorithm Opt_3

• Simple operation “ShiftRows” - unchanged

• Simple operation “AddRoundKey” – unchanged

• Complex operation “SubBytes” – unchanged

• Complex operation “MixColumns” - unchanged

• Host sample code – overlap execution with I/O by creating multiple queues (R = read, W = write, E = execute)
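The effect of overlapping can be estimated with a small timing model in C. It compares the blocking loop with an idealized three-stage pipeline in which each queue processes chunks in order and a stage starts as soon as the previous stage has finished the same chunk. The per-stage times are hypothetical inputs, not measurements from these slides. The model also assumes the three stages can truly run concurrently, so it overestimates the gain on hardware where transfers and kernels compete for the same memory.

```c
#include <assert.h>
#include <stddef.h>

/* Blocking host loop: write, execute and read one chunk before starting the next. */
static double serial_time(size_t chunks, double t_write, double t_exec, double t_read) {
    assert(t_write >= 0.0 && t_exec >= 0.0 && t_read >= 0.0);
    return (double)chunks * (t_write + t_exec + t_read);
}

/* Three in-order queues (W, E, R): a stage begins once its own queue is free
   and the previous stage has finished the same chunk. */
static double overlapped_time(size_t chunks, double t_write, double t_exec, double t_read) {
    assert(t_write >= 0.0 && t_exec >= 0.0 && t_read >= 0.0);
    double w_done = 0.0, e_done = 0.0, r_done = 0.0;
    for (size_t i = 0; i < chunks; i++) {
        w_done = w_done + t_write;
        e_done = (e_done > w_done ? e_done : w_done) + t_exec;
        r_done = (r_done > e_done ? r_done : e_done) + t_read;
    }
    return r_done;
}
```

For 4 chunks with one time unit per stage, the model gives 12 units for the serial loop and 6 for the pipeline. In OpenCL the cross-queue ordering assumed here would be enforced with events passed in the wait lists of the enqueue calls.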

Algorithm Opt_3 (2)

Results Opt_3

• Figure (right): AES-128 ECB throughput in MB/s, serial (Opt_2) vs overlapped (Opt_3)

• Figure (below): 3 OpenCL queues (R, W, E) used for asynchronous enqueues, so that execution overlaps with I/O

Conclusions

iGPU AES performance is good: faster than the CPU without AES-NI, although the CPU with AES-NI is fastest

Prefer cache (__local) over constant memory

Where possible, compare precomputed tables against computation on the fly

Overlapping execution with I/O could improve iGPU performance by 10-20%

The share of the x86 SoC die occupied by the iGPU increases with each generation, so its contribution to AES throughput should increase as well

Memory transfers are expected to improve with each new generation, and CPU/iGPU performance with them
