pqm4: Testing and Benchmarking NIST PQC on ARM Cortex-M4
Matthias Kannwischer, Joost Rijneveld, Peter Schwabe, and Ko Stoffelen
Radboud University, Nijmegen, The Netherlands
Aug 24, 2019, 2nd PQC Standardization Conference, Santa Barbara

Motivation
- “Performance will play a larger role in the second round”
- Round 1: Focus on Intel/AVX2 implementations
- But: Majority of cryptographic devices is way smaller
  - Limited RAM
  - No/limited vector instructions
  - Side-channels?
- Challenges
  - Do schemes even fit in limited RAM + flash?
  - Are schemes efficient on small ARMs?
  - What is the overhead of masking?

Post-quantum on small devices
“It’s big and it’s slow” – everyone, always
- STM32F4DISCOVERY
  - ARM Cortex-M4
  - 32-bit, ARMv7E-M
  - 192 KiB RAM, 168 MHz
- PQM4: test and optimize on the Cortex-M4
  - github.com/mupq/pqm4

Rationale for using STM32F4DISCOVERY boards
- They are cheap (< $30)
- They are huge in terms of RAM and flash
  - Great for PQC – many schemes fit
  - Unfortunately, pqRSA did not terminate within round 1
- ARMv7E-M more interesting for assembly optimizations
- NIST recommended Cortex-M4 for PQC evaluation
- We’re using it for teaching
  - We have dozens of them lying around
  - Our students know how to work with them

pqm4
- Goals
  - Framework that eases optimization for this platform
  - Automate testing and benchmarking
  - Include as many schemes as possible
- 4 types of implementations
  - ref: Reference C implementations from submission packages
  - clean: Slightly modified reference implementations to satisfy basic code quality requirements ¹
  - opt: Optimized portable C implementations
  - m4: Optimized using ARMv7E-M assembly

¹ see https://github.com/PQClean/PQClean
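
To make the "automate testing" goal concrete, here is a minimal host-side sketch of a KEM round-trip check. It is not pqm4's actual test code. The toy KEM below is deliberately insecure and exists only so that the harness has something to call; the names `toy_keypair`, `toy_enc`, `toy_dec`, `roundtrip_ok`, and `tampered_ciphertext_detected` are all hypothetical.

```python
import hashlib
import os

# Toy stand-in for a KEM, for illustration only: it is NOT secure and is not
# any scheme from the NIST process. It only mimics the three-function shape
# (keypair / encapsulate / decapsulate) that a test harness would exercise.
def toy_keypair():
    sk = os.urandom(32)
    pk = hashlib.sha3_256(sk).digest()
    return pk, sk

def toy_enc(pk):
    ct = os.urandom(32)
    ss = hashlib.sha3_256(pk + ct).digest()
    return ct, ss

def toy_dec(ct, sk):
    pk = hashlib.sha3_256(sk).digest()
    return hashlib.sha3_256(pk + ct).digest()

def roundtrip_ok(keypair, enc, dec, runs=10):
    """Return True if both sides derive the same shared secret in every run."""
    for _ in range(runs):
        pk, sk = keypair()
        ct, ss_sender = enc(pk)
        if dec(ct, sk) != ss_sender:
            return False
    return True

def tampered_ciphertext_detected(keypair, enc, dec):
    """Flip one ciphertext bit; the receiver should derive a different secret."""
    pk, sk = keypair()
    ct, ss_sender = enc(pk)
    bad_ct = bytes([ct[0] ^ 1]) + ct[1:]
    return dec(bad_ct, sk) != ss_sender
```

Running the same two checks against every scheme and every implementation type (ref, clean, opt, m4) is one way to catch an optimized variant that no longer agrees with its reference.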

Schemes included in pqm4 – KEMs

| Scheme | reference | optimized |
|---|---|---|
| BIKE | ✗ Lib | - |
| Classic McEliece | ✗ Key | - |
| CRYSTALS-Kyber | ✓ | ✓ [BKS19] |
| Frodo-KEM | ✓ | ✓ [BFM+18] |
| HQC | ✗ Lib | - |
| LAC | ✓ | - |
| LEDAcrypt | ✗ RAM | WIP |
| NewHope | ✓ | ✓ [AJS16] |
| NTRU | ✓ | ✓ [KRS19] |
| NTRU Prime | ✓ | - |
| NTS-KEM | ✗ Key | - |
| ROLLO | ✗ Lib | - |
| Round5 | ✓ | ✓ Round5 team |
| RQC | ✗ Lib | - |
| SABER | ✓ | ✓ [KRS19] |
| SIKE | ✓ | - |
| ThreeBears | ✓ | ✓ ThreeBears team |

✗ Key: keys too large
✗ RAM: implementation uses too much RAM
✗ Lib: available implementations depend on external libraries

Schemes included in pqm4 – Signatures

| Scheme | reference | optimized |
|---|---|---|
| CRYSTALS-Dilithium | ✓ | ✓ [GKOS18, RSGCB19] |
| FALCON | ✗ RAM | ✓ Falcon team |
| GeMSS | ✗ Key | - |
| LUOV | ✓ | - |
| MQDSS | ✗ RAM | - |
| Picnic | ✗ RAM | - |
| qTESLA | ✓ | - |
| Rainbow | ✗ Key | - |
| SPHINCS+ | ✓ | - |

✗ Key: keys too large
✗ RAM: implementation uses too much RAM
✗ Lib: available implementations depend on external libraries

Benchmarking: Cycle Counts and RNG
- Cycle counts
  - We don’t want to benchmark the memory controller
    - Downclock core to 24 MHz → no wait states
    - Gives comprehensible cycle counts
- randombytes
  - We use the hardware RNG of our platform
  - Most schemes only sample a seed, so speed doesn’t matter
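
Cycle counts taken at 24 MHz can be turned into wall-clock time with simple arithmetic. The sketch below shows the conversion; the cycle count is a made-up example value, not a measured pqm4 result, and the function name `cycles_to_ms` is hypothetical. The 168 MHz figure should be read as an optimistic lower bound, since it assumes the same cycle count would hold at full speed, where memory wait states may add cycles.

```python
BENCH_CLOCK_HZ = 24_000_000   # clock used for benchmarking (from the slide)
MAX_CLOCK_HZ = 168_000_000    # maximum core clock of the board (from the slide)

def cycles_to_ms(cycles, clock_hz):
    """Convert a cycle count into milliseconds at the given clock frequency."""
    return cycles * 1000 / clock_hz

# Hypothetical cycle count, chosen only to make the arithmetic easy to follow.
example_cycles = 1_200_000

ms_at_bench_clock = cycles_to_ms(example_cycles, BENCH_CLOCK_HZ)  # 50.0 ms
ms_at_max_clock = cycles_to_ms(example_cycles, MAX_CLOCK_HZ)      # about 7.14 ms
print(ms_at_bench_clock, round(ms_at_max_clock, 2))
```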

Benchmarking: Fast Hashing
- Submission packages often come with different implementations of SHA-2, SHA-3, or AES
  - We don’t want to benchmark those
    - Our approach: Replace those with a single fast implementation to allow fair comparison
    - SHA-3: ARMv7-M assembly implementation from XKCP ¹
    - SHA-2: Fast C implementation from SUPERCOP ²
    - AES: ARMv7-M assembly implementation from [SS16] ³

¹ https://github.com/XKCP/XKCP
² https://bench.cr.yp.to/supercop.html
³ Schwabe and Stoffelen, SAC 2016
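
One reason the hash implementation matters is that several schemes stretch a short random seed into a much longer pseudorandom stream with an extendable-output function such as SHAKE. The following host-side sketch illustrates that pattern with Python's standard `hashlib`; it is not taken from pqm4, and the helper name `expand_seed`, the all-zero seed, and the output length are placeholders chosen for illustration.

```python
import hashlib

def expand_seed(seed: bytes, out_len: int) -> bytes:
    """Deterministically stretch a short seed into out_len pseudorandom bytes."""
    return hashlib.shake_128(seed).digest(out_len)

# All-zero placeholder seed; on the board the seed would come from randombytes.
seed = bytes(32)

# 32 bytes of seed become 1024 bytes of output, all produced by the hash.
stream = expand_seed(seed, 1024)
print(len(stream))
```

Because every output byte comes from the hash function, a slow SHA-3 implementation would inflate the measured cost of any scheme built this way, which is what a single shared fast implementation avoids.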
§ For the full results
‚ see paper at https://ia.cr/2019/844
‚ see https://github.com/mupq/pqm4
Results
§ Not all schemes have been optimized for this platform yet
§ We tried to collect as many implementations as possible
§ If we missed something
‚ Send us an e-mail or talk to us
‚ Open a pull request
§ For the following results, we restrict to
‚ Only schemes that have been optimized
‚ NIST security level 1
‚ For KEMs: CCA variants
‚ Only SHA-3/SHAKE variants
10
![Page 59: pqm4: Testing and Benchmarking NIST PQC on ARM Cortex-M4](https://reader030.fdocuments.us/reader030/viewer/2022032817/623f66c4877cd334bf73c892/html5/thumbnails/59.jpg)
‚ see paper at https://ia.cr/2019/844
‚ see https://github.com/mupq/pqm4
Results
§ Not all schemes have been optimized for this platform yet
§ We tried to collect as many implementations as possible
§ If we missed something
‚ Send us an e-mail or talk to us
‚ Open a pull request
§ For the following results, we restrict to
‚ Only schemes that have been optimized
‚ NIST security level 1
‚ For KEMs: CCA variants
‚ Only SHA-3/SHAKE variants
§ For the full results
10
![Page 60: pqm4: Testing and Benchmarking NIST PQC on ARM Cortex-M4](https://reader030.fdocuments.us/reader030/viewer/2022032817/623f66c4877cd334bf73c892/html5/thumbnails/60.jpg)
‚ see https://github.com/mupq/pqm4
Results
§ Not all schemes have been optimized for this platform yet
§ We tried to collect as many implementations as possible
§ If we missed something
‚ Send us an e-mail or talk to us
‚ Open a pull request
§ For the following results, we restrict to
‚ Only schemes that have been optimized
‚ NIST security level 1
‚ For KEMs: CCA variants
‚ Only SHA-3/SHAKE variants
§ For the full results
‚ see paper at https://ia.cr/2019/844
10
![Page 61: pqm4: Testing and Benchmarking NIST PQC on ARM Cortex-M4](https://reader030.fdocuments.us/reader030/viewer/2022032817/623f66c4877cd334bf73c892/html5/thumbnails/61.jpg)
Results
§ Not all schemes have been optimized for this platform yet
§ We tried to collect as many implementations as possible
§ If we missed something
‚ Send us an e-mail or talk to us
‚ Open a pull request
§ For the following results, we restrict to
‚ Only schemes that have been optimized
‚ NIST security level 1
‚ For KEMs: CCA variants
‚ Only SHA-3/SHAKE variants
§ For the full results
‚ see paper at https://ia.cr/2019/844
‚ see https://github.com/mupq/pqm4
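The selection rule on this slide can be read as a simple filter over benchmarked implementations. The sketch below is only an illustration of that rule: the `Implementation` record, its field names, and every entry in `candidates` are hypothetical placeholders, not real pqm4 entries or data from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Implementation:
    name: str         # hypothetical label, not a real pqm4 entry
    kind: str         # "kem" or "sign"
    optimized: bool   # True if tuned for the Cortex-M4
    nist_level: int   # claimed NIST security level
    cca: bool         # only meaningful for KEMs
    uses_shake: bool  # True for SHA-3/SHAKE-based variants


def selected(impl: Implementation) -> bool:
    """Apply the four restrictions listed on the slide."""
    if not impl.optimized or impl.nist_level != 1 or not impl.uses_shake:
        return False
    # The CCA restriction applies to KEMs only.
    return impl.cca if impl.kind == "kem" else True


# Placeholder records chosen so that each restriction rejects one entry.
candidates = [
    Implementation("kem-a-m4", "kem", True, 1, True, True),
    Implementation("kem-a-ref", "kem", False, 1, True, True),
    Implementation("kem-b-cpa-m4", "kem", True, 1, False, True),
    Implementation("kem-c-aes-m4", "kem", True, 1, True, False),
    Implementation("sig-a-m4", "sign", True, 1, False, True),
    Implementation("sig-b-l3-m4", "sign", True, 3, False, True),
]

shown = [impl.name for impl in candidates if selected(impl)]
print(shown)
```

Only the optimized, level 1, SHAKE-based entries survive, and the CCA check is skipped for signature schemes.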
KEM Speed
[Figure: bar chart of KeyGen, Encaps, and Decaps speed in k cycles (y-axis 0 to 160000) for babybear-opt, frodokem640shake-m4, kyber512-m4, lightsaber-m4, ntruhps2048509-m4, ntruhrss701-m4, r5nd-1kemcca-0d-m4, and r5nd-1kemcca-5d-m4]
KEM Speed (2)
[Figure: the same bar chart of KeyGen, Encaps, and Decaps speed in k cycles with the y-axis limited to 0 to 2000, for babybear-opt, frodokem640shake-m4, kyber512-m4, lightsaber-m4, ntruhps2048509-m4, ntruhrss701-m4, r5nd-1kemcca-0d-m4, and r5nd-1kemcca-5d-m4]
Signature Speed
[Figure: bar chart of KeyGen, Sign, and Verify speed in k cycles (y-axis 0 to 200000) for dilithium2-m4 and falcon512-m4-ct]
Signature Speed (2)
[Figure: the same bar chart of KeyGen, Sign, and Verify speed in k cycles with the y-axis limited to 0 to 40000, for dilithium2-m4 and falcon512-m4-ct]
KEM RAM consumption
[Figure: bar chart of KeyGen, Encaps, and Decaps RAM consumption in bytes (y-axis 0 to 70000) for babybear-opt, frodokem640shake-m4, kyber512-m4, lightsaber-m4, ntruhps2048509-m4, ntruhrss701-m4, r5nd-1kemcca-0d-m4, and r5nd-1kemcca-5d-m4]
KEM RAM consumption (2)
[Figure: the same bar chart of KeyGen, Encaps, and Decaps RAM consumption in bytes with the y-axis limited to 0 to 9000, for babybear-opt, frodokem640shake-m4, kyber512-m4, lightsaber-m4, ntruhps2048509-m4, ntruhrss701-m4, r5nd-1kemcca-0d-m4, and r5nd-1kemcca-5d-m4]
Signature RAM consumption
[Figure: bar chart of KeyGen, Sign, and Verify RAM consumption in bytes (y-axis 0 to 60000) for dilithium2-m4 and falcon512-m4-ct]
Signature RAM consumption (2)
[Figure: the same bar chart of KeyGen, Sign, and Verify RAM consumption in bytes with the y-axis limited to 0 to 5000, for dilithium2-m4 and falcon512-m4-ct]
Conclusion
§ pqm4 currently includes
‚ 10 KEMs (6 optimized)
‚ 5 signature schemes (2 optimized)
§ Current implementations of Classic McEliece, LEDAcrypt, NTS-KEM, GeMSS, MQDSS, Picnic, and Rainbow consume more than 128 KiB of RAM
→ they don’t fit
§ BIKE, HQC, ROLLO, and RQC use OpenSSL/NTL/GMP
→ these libraries need to be replaced to make the schemes work
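The 128 KiB limit translates into a simple byte budget. The following is a minimal sketch of that arithmetic, assuming a single peak RAM figure per implementation; the function names and the sample inputs are hypothetical, not measurements from pqm4.

```python
KIB = 1024
RAM_BUDGET_BYTES = 128 * KIB  # the 128 KiB limit quoted on the slide


def fits_in_ram(peak_ram_bytes: int, budget_bytes: int = RAM_BUDGET_BYTES) -> bool:
    """Return True if the peak RAM use stays within the budget."""
    return peak_ram_bytes <= budget_bytes


def headroom_bytes(peak_ram_bytes: int, budget_bytes: int = RAM_BUDGET_BYTES) -> int:
    """Bytes left over (negative if the implementation does not fit)."""
    return budget_bytes - peak_ram_bytes


# Hypothetical peak RAM figures, for illustration only.
for peak in (9_000, 70_000, 200_000):
    print(peak, fits_in_ram(peak), headroom_bytes(peak))
```

In practice the peak figure would have to cover stack, globals, and key material together, which this sketch treats as one number.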
Conclusion
§ Still many schemes left to optimize → please open a pull request
§ Level of optimization greatly differs
‚ Most implementations don’t optimize RAM consumption
‚ No implementations optimize code size
‚ What does NIST care about?
§ Currently, Round5 seems to be the fastest on this platform
‚ But Kyber, NTRU, Saber, and ThreeBears are very close
https://github.com/mupq/pqm4
slides and paper available at kannwischer.eu
Thank you!