Evaluation of the CAESAR Hardware API for Lightweight Implementations
Panasayya Yalla, Jens-Peter Kaps
Department of Electrical and Computer Engineering, George Mason University, Fairfax, Virginia 22030, USA
Abstract
The Competition for Authenticated Encryption: Security, Applicability, and Robustness (CAESAR) requires that all hardware implementations of candidate algorithms adhere to the CAESAR Hardware API [1]. The CAESAR Hardware API is supported by a development package which includes VHDL code for universal pre- and post-processors for high-speed and recently also for lightweight implementations. These processors are designed to make a cipher core compliant with the API. In this work we verify that the lightweight package has a smaller area footprint than the high-speed package. We also show that the overhead of using the generic lightweight pre- and post-processors over integrating their functionality into the cipher core is negligible. As part of these case studies, we have developed the first lightweight implementations of Ketje-Sr, Ascon-128, and Ascon-128a.
Introduction and Motivation
- CAESAR evaluates candidates for a final portfolio of new Authenticated Encryption with Associated Data (AEAD) algorithms.
- All candidates must adhere to the CAESAR hardware (HW) Application Programming Interface (API).
- The HW API is one component which enables a fair comparison among algorithms.
  - Independent FIFO inputs for public data (PDI) and secret data (SDI) and FIFO output (DO).
  - In-band signaling for commands and data types using a simple protocol.
- CAESAR HW API is supported by an implementer’s guide and development package [2].
  - Includes VHDL code for high-speed (HS) and lightweight (LW) implementations.
  - Pre- and PostProcessor separate protocol from cryptographic algorithm.
  - Bypass FIFO stores and passes header information to PostProcessor.
- It is generally assumed that having generic pre- and post-processors increases the area consumption over merging their functionality with the cipher cores.
Differences between HS vs. LW Packages
High-Speed
- Supports bus width 32 ≤ w ≤ 256 in multiples of 8.
- PreProcessor expands PDI and SDI data to full block size for CipherCore.
- PreProcessor stores one block of PDI and SDI data.
- PreProcessor contains universal padding unit.
- Tag comparison has to be performed in CipherCore.

Lightweight
- Supports bus width w of 8, 16, and 32.
- PreProcessor, CipherCore, Bypass FIFO, and PostProcessor have equal bus width.
- PreProcessor has no data storage.
- Assumes padding is performed in CipherCore.
- PostProcessor supports tag comparison.
CAESAR High-Speed Block Diagram
[Figure: block diagram of the high-speed AEAD entity with PreProcessor, CipherCore (Datapath and Controller), CMD FIFO, PostProcessor, and a Two-Pass FIFO; blocks are marked Required or Optional. External ports: pdi_data/pdi_valid/pdi_ready (width w), sdi_data/sdi_valid/sdi_ready (width sw), do_data/do_valid/do_ready (width w). Internal signals: key, key_valid, key_ready, key_update, decrypt, bdi, bdi_valid, bdi_ready, bdi_type, bdi_eot, bdi_eoi, bdi_partial, bdi_size, bdi_pad_loc, bdi_valid_bytes, bdo, bdo_valid, bdo_ready, bdo_size, msg_auth, msg_auth_valid, msg_auth_ready, cmd, cmd_valid, cmd_ready, fdi_data/fdi_valid/fdi_ready, fdo_data/fdo_valid/fdo_ready. Width labels: KEY_SIZE, DBLK_SIZE, DBLK_SIZE/8, LBS_BYTES+1, 24, 3.]
CAESAR Lightweight Block Diagram
[Figure: block diagram of the lightweight AEAD entity with PreProcessor, CipherCore, Header/Tag FIFO, and PostProcessor with Tag Comparator; signals are marked Required or Optional. External ports: pdi_data/pdi_valid/pdi_ready (width w), sdi_data/sdi_valid/sdi_ready (width sw), do_data/do_valid/do_ready (width w). Internal signals: key, key_valid, key_ready, key_update, decrypt, bdi, bdi_valid, bdi_ready, bdi_type, bdi_eot, bdi_eoi, bdi_partial, bdi_size, bdi_pad_loc, bdi_valid_bytes, bdo, bdo_valid, bdo_ready, bdo_type, bdo_valid_bytes, end_of_block, msg_auth, msg_auth_valid, msg_auth_ready, cmd, cmd_valid, cmd_ready. Width labels: w, w/8, w/8+1, 4.]
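Protocol: Instruction
[Figure: instruction format. With w=16, the 16-bit instruction is a single word holding a 4-bit Opcode or Status field and 12 Reserved bits. With w=8, the 16-bit instruction is split across two words: the first carries the 4-bit Opcode or Status field plus Reserved bits, the second is Reserved. State diagram "States for Processing Instruction": states Rd-Inst, Rd-Rsvd, and Rd-Hdr, with transitions labeled w=8 and w=16, 32.]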
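Protocol: Segment Header
[Figure: segment header format. The 32-bit header consists of an 8-bit Info field (4-bit Segment Type followed by the 1-bit flags Partial, EOI, EOT, and Last), an 8-bit Reserved field, and a 16-bit Segment Length. With w=8 and w=16 the header is transferred in several words: Info, Reserved, Segment Length (MSB), Segment Length (LSB). State diagram "States for Processing Header": states Rd-Hdr, Rd-Rsvd, SegLen (Seglen1 and Seglen2), and Ld-Data, with transitions labeled w=8, w=16, and w=32.]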
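The header layout in the figure can be illustrated with a short Python sketch. This is not the package's VHDL: the function names are hypothetical, and the bit positions (Segment Type in the top four bits, then Partial, EOI, EOT, Last, with the words of a narrow bus arriving most significant first) are assumptions read off the figure.

```python
from dataclasses import dataclass


@dataclass
class SegmentHeader:
    seg_type: int   # 4-bit segment type
    partial: bool
    eoi: bool       # end of input
    eot: bool       # end of type
    last: bool
    length: int     # 16-bit segment length


def parse_header(word: int) -> SegmentHeader:
    """Split a 32-bit header into its fields (assumed bit order, see lead-in)."""
    if not 0 <= word < (1 << 32):
        raise ValueError("header must fit in 32 bits")
    info = (word >> 24) & 0xFF          # Info field: top 8 bits
    return SegmentHeader(
        seg_type=info >> 4,
        partial=bool(info & 0x8),
        eoi=bool(info & 0x4),
        eot=bool(info & 0x2),
        last=bool(info & 0x1),
        length=word & 0xFFFF,           # bits 23..16 are Reserved and ignored
    )


def header_from_words(words, w: int) -> SegmentHeader:
    """Reassemble a header received as 32/w words of w bits, MSB word first."""
    if w not in (8, 16, 32):
        raise ValueError("lightweight bus width must be 8, 16, or 32")
    if len(words) != 32 // w:
        raise ValueError("wrong number of words for this bus width")
    value = 0
    for word in words:
        value = (value << w) | (word & ((1 << w) - 1))
    return parse_header(value)
```

With w=8 the four calls to the bus correspond to the Rd-Hdr, Rd-Rsvd, Seglen1, and Seglen2 states; with w=32 a single word is enough.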
Case Study
1. Determine overhead of CAESAR LW package.
   - Ketje-Sr implementation with integrated support of CAESAR API.
   - Ketje-Sr implementation using new CAESAR lightweight development package.
2. Determine overhead of CAESAR LW package vs. HS package.
   - Implementation of Ascon using CAESAR LW package.
   - Using existing Ascon HS implementation.
Ketje-Sr
- Ketje [3] is based on round-reduced Keccak-f in a construction called MonkeyWrap.
- Has four variants, Ketje-Jr, Ketje-Sr, Ketje-Minor, and Ketje-Major, which use Keccak-p∗[200], Keccak-p∗[400], Keccak-p∗[800], and Keccak-p∗[1600] respectively.
- Each round of Keccak-p∗ consists of five steps θ, ρ, π, χ, and ι.
- In the θ step, each bit in the state is XORed with the parities of two neighboring columns.
- In the ρ step, the bits of each lane are rotated by one of 25 different offsets.
- Lanes are rearranged in π; χ is the nonlinear step, combining bits along each row with AND and NOT (multiplication in GF(2)).
- The last step is ι, where a round constant is added.
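As a rough illustration of the θ and χ steps listed above, the sketch below applies the standard Keccak definitions to a 5×5 state of 16-bit lanes, the lane width of the 400-bit permutation. It is an assumption-laden model: the state is indexed as a[x][y], the helper names are hypothetical, and the lane re-ordering that distinguishes Keccak-p∗ from Keccak-p is not modeled.

```python
LANE_BITS = 16                      # 400-bit state = 25 lanes of 16 bits
MASK = (1 << LANE_BITS) - 1


def rotl(value: int, n: int) -> int:
    """Rotate a lane left by n bits."""
    n %= LANE_BITS
    return ((value << n) | (value >> (LANE_BITS - n))) & MASK


def theta(a):
    """XOR every lane with the parities of two neighboring columns."""
    c = [a[x][0] ^ a[x][1] ^ a[x][2] ^ a[x][3] ^ a[x][4] for x in range(5)]
    d = [c[(x - 1) % 5] ^ rotl(c[(x + 1) % 5], 1) for x in range(5)]
    return [[a[x][y] ^ d[x] for y in range(5)] for x in range(5)]


def chi(a):
    """Nonlinear step: a[x] ^= (NOT a[x+1]) AND a[x+2] along each row."""
    return [
        [a[x][y] ^ ((~a[(x + 1) % 5][y] & MASK) & a[(x + 2) % 5][y])
         for y in range(5)]
        for x in range(5)
    ]


def popcount(a) -> int:
    """Number of set bits in the whole state."""
    return sum(bin(lane).count("1") for column in a for lane in column)
```

Feeding θ a state with a single set bit shows the diffusion the bullet describes: the bit itself survives and two whole columns pick up its parity.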
Ketje-Sr Datapath
- We implemented Ketje-Sr using a 16-bit datapath and interface.
- Datapath is the same for integrated CAESAR API support and using CAESAR LW package.
- State is stored in a dual-port memory (RAM) with one read/write port and one read-only port.
- To reduce the complexity of padding for the key, the key size is fixed to 128 bits.
- Two memory units (RAMK1 and RAMK2) with pre-stored values and a register (reg-K) for key storage and KeyPack operations.
- Padding for message and AD using multiplexers.
- Needs 160 clock cycles to process a 32-bit block.
- $TP = \frac{32}{160} \cdot F$, where $F$ is the clock frequency.
[Figure: Ketje-Sr datapath with 16-bit buses. Labeled blocks: dual-port RAM (Port-A, Port-B), Rho, <<<1, Keypack, RAMK1 (MSB), RAMK2 (LSB), reg-K, reg-A, rcon, Padding. Ports: sdi_data, pdi_data, do_data.]
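The throughput formula can be checked numerically. The sketch below uses a hypothetical helper name and plugs in the 122.4 MHz clock frequency reported for the integrated Ketje-Sr design in Case Study 1.

```python
def throughput_mbps(block_bits: int, cycles_per_block: int, freq_mhz: float) -> float:
    """Throughput in Mbps: bits per block / cycles per block * frequency in MHz."""
    return block_bits / cycles_per_block * freq_mhz


# Ketje-Sr: a 32-bit block every 160 clock cycles at 122.4 MHz
ketje_sr_tp = throughput_mbps(32, 160, 122.4)
print(f"Ketje-Sr throughput: {ketje_sr_tp:.2f} Mbps")
```

The result agrees with the 24.48 Mbps listed for that design in the results table.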
Ascon
- Ascon [4] is a permutation-based authenticated cipher.
- Ascon-128 and Ascon-128a are two variants with block sizes of 64 and 128 bits respectively.
- In each round, three sub-transformations called constant-addition, substitution, and linear diffusion are applied.
- Constant-addition is the first operation in the round, where a constant is added to one of the five words. Twelve round constants are used.
- Substitution layer uses 5x5 S-boxes.
- Linear diffusion layer provides diffusion within each of the five 64-bit words using circular shifts and XORs.
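The linear diffusion layer can be sketched in a few lines of Python. The rotation amounts are the ones given in the Ascon specification [4]; the function names are hypothetical, and this models only the arithmetic, not the multiplexer-based hardware used in this work.

```python
MASK64 = (1 << 64) - 1

# Rotation pairs for words x0..x4, as listed in the Ascon specification
ROTATIONS = ((19, 28), (61, 39), (1, 6), (10, 17), (7, 41))


def rotr64(value: int, n: int) -> int:
    """Rotate a 64-bit word right by n bits (0 < n < 64)."""
    return ((value >> n) | (value << (64 - n))) & MASK64


def linear_diffusion(state):
    """XOR each 64-bit word with two rotated copies of itself."""
    return [
        x ^ rotr64(x, r1) ^ rotr64(x, r2)
        for x, (r1, r2) in zip(state, ROTATIONS)
    ]
```

Each word is processed independently, so a hardware design can reuse one rotate-and-XOR unit for all five words at the cost of extra clock cycles.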
Ascon Datapath
- This design uses a 64-bit datapath and a 32-bit interface.
- The state is stored in a dual-port RAM.
- Key is stored in a RAM.
- The substitution layer is implemented in a bit-slice fashion.
- Due to the contention on the RAM ports, 18 operations take 33 clock cycles.
- The round constants are generated using two 4-bit registers and adders.
- Two 5-to-1 multiplexers are used to perform circular shifts in the linear diffusion step (LDiff).
- $TP = \frac{128}{33 \cdot 8} \cdot F$, where $F$ is the clock frequency.
[Figure: Ascon datapath with a 64-bit internal path and 32-bit interfaces. Labeled blocks: dual-port RAM (Port-A, Port-B), RAMK, register R1, Padding, rcon, LDiff. Ports: pdi_data, sdi_data, do_data.]
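One way to picture the round-constant generation from two 4-bit registers is the sketch below. It relies on the structure of the Ascon round constants in the specification [4], where the upper nibble counts down from 0xF and the lower nibble counts up from 0x0; mapping each nibble to one register is an assumption about this design, and the function name is hypothetical.

```python
def ascon_round_constants(rounds: int = 12):
    """Generate Ascon round constants from two 4-bit counters."""
    high, low = 0xF, 0x0            # assumed reset values of the two registers
    constants = []
    for _ in range(rounds):
        constants.append((high << 4) | low)
        high = (high - 1) & 0xF     # 4-bit decrement
        low = (low + 1) & 0xF       # 4-bit increment
    return constants


print([hex(c) for c in ascon_round_constants()])
```

Permutations with fewer than twelve rounds use the tail of this sequence, so the same two counters can serve them by starting from a later value.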
Case Study 1: Integrated vs. LW Package
Implementation Results on Xilinx Spartan-6 FPGA using ATHENa [5]
Design            Slices  LUTs  Flip-Flops  Freq [MHz]  TP [Mbps]  TP/Area [Mbps/slice]
KETJE-SR (1)         140   436          98       122.4      24.48                  0.17
KETJE-SR (2)         155   450         114       120.1      24.03                  0.16
Overhead              15    14          16
ASCON-128 (2)        231   684         268       216.0      60.10                  0.26
ASCON-128a (2)       231   684         268       216.0     119.16                  0.52
Joltik [6] (3)       168   534         381       200.0     426.67                  2.54
ACORN [6] (4)        202   540         383       231.6   1,852.80                  9.17

(1) Dedicated CAESAR API; (2) CAESAR LW Package; (3) Not compliant to CAESAR API; (4) Tweaked CAESAR HS Package
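The derived column and the overhead row can be recomputed from the raw numbers. The following sketch copies slices and throughput from the table; the dictionary keys and helper names are hypothetical.

```python
# (slices, throughput in Mbps) copied from the table above
RESULTS = {
    "Ketje-Sr, dedicated API": (140, 24.48),
    "Ketje-Sr, LW package": (155, 24.03),
    "Ascon-128, LW package": (231, 60.10),
    "Ascon-128a, LW package": (231, 119.16),
    "Joltik": (168, 426.67),
    "ACORN": (202, 1852.80),
}


def tp_per_slice(slices: int, tp_mbps: float) -> float:
    """Throughput-to-area ratio in Mbps per slice."""
    return tp_mbps / slices


def relative_increase(new: float, old: float) -> float:
    """Fractional increase of new over old."""
    return (new - old) / old


for name, (slices, tp) in RESULTS.items():
    print(f"{name:26s} {tp_per_slice(slices, tp):5.2f} Mbps/slice")

# Slice overhead of the LW package over the dedicated Ketje-Sr design
ketje_slice_overhead = relative_increase(155, 140)
print(f"Ketje-Sr slice overhead: {ketje_slice_overhead:.1%}")
```

Expressing the overhead as a fraction of the dedicated design puts the 15-slice difference in context without changing any reported number.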
- Using CAESAR LW Package leads to a small area increase.
  - Three separate counters for the sdi, pdi, and do buses are used for simplicity and parallel operation.
  - The counter for sdi can be dropped if the cipher core provides an end_of_key signal.
- Comparing our designs with each other and with other reported implementations:
  - Ascon-128a has more than 4 times the TP of Ketje-Sr while consuming only about 50% more slices.
  - The Joltik implementation is not compliant with the CAESAR API but performs significantly better.
  - ACORN is based on a stream cipher; stream ciphers typically perform very well in lightweight implementations.
Case Study 2: Area Overhead HS vs. LW Pkg.
Area Overhead High-Speed (HS) vs. Lightweight (LW) Packages
Implementation Results on Xilinx Spartan-6 FPGA using ATHENa [5]

Design         Top-level    Slices  LUTs  Flip-Flops
LW Ascon       AEAD (1)        231   684         268
               CipherCore      196   606         212
               Overhead         35    78          56
HS Ascon [6]   AEAD (2)        416  1282         792
               CipherCore      379  1033         529
               Overhead         37   249         263

(1) CAESAR LW Package; (2) CAESAR HS Package
- Adding CAESAR API support to
  - LW core using LW Package leads to a small area increase,
  - HS core using HS Package leads to a larger area increase.
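The overhead rows are the difference between the top-level AEAD entity and the bare CipherCore. The sketch below recomputes them from the reported Ascon figures and also expresses them relative to the core; the data layout and function names are hypothetical.

```python
# (slices, LUTs, flip-flops) copied from the Case Study 2 results
AREA = {
    "LW": {"AEAD": (231, 684, 268), "CipherCore": (196, 606, 212)},
    "HS": {"AEAD": (416, 1282, 792), "CipherCore": (379, 1033, 529)},
}


def overhead(package: str):
    """Absolute area added by the pre- and post-processors."""
    top = AREA[package]["AEAD"]
    core = AREA[package]["CipherCore"]
    return tuple(t - c for t, c in zip(top, core))


def overhead_percent(package: str):
    """Overhead as a percentage of the CipherCore area."""
    core = AREA[package]["CipherCore"]
    return tuple(100.0 * o / c for o, c in zip(overhead(package), core))


for package in ("LW", "HS"):
    slices, luts, ffs = overhead(package)
    print(f"{package}: +{slices} slices, +{luts} LUTs, +{ffs} flip-flops")
```

In these numbers the gap between the two packages appears mainly in LUTs and flip-flops, while the slice counts differ by only two.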
Conclusions
- The graph shows implementation results of Ketje-Sr on Spartan-6.
- Using the CAESAR LW Package leads to a small area increase over integrated designs.
- This small increase can easily be mitigated.
[Figure: Case Study 1. Bar chart comparing Slices, LUTs, and FFs for Integrated vs. LW Package.]
- The graph shows the overhead incurred for implementations of Ascon on Spartan-6.
- The CAESAR HS Package leads to a much larger area increase than the LW Package as it expands the data and key buses to the full block size.
[Figure: Case Study 2. Bar chart comparing Slices, LUTs, and FFs for LW Overhead vs. HS Overhead.]
- The CAESAR LW Package allows for bus widths of 8 and 16 bits, which are not currently supported by the CAESAR HS Package.
- The CAESAR LW Package reduces the design time for LW implementations.
- The CAESAR LW Package will be included in the next release of the Development Package for the CAESAR Hardware API.
- The usage will be documented in the next release of the Implementer’s Guide to the CAESAR Hardware API.
Acknowledgment
The CAESAR Lightweight API Support Package was developed in collaboration with Fabrizio De Santis and Michael Tempelmeier from
References
[1] E. Homsirikamol, W. Diehl, A. Ferozpuri, F. Farahmand, P. Yalla, J.-P. Kaps, and K. Gaj, “CAESAR hardware API,” Cryptology ePrint Archive, Report 2016/626, 2016, http://eprint.iacr.org/2016/626.
[2] “Development package for the CAESAR hardware API v1.2,” https://cryptography.gmu.edu/athena/AEAD/GMU_AEAD_HW_API_v1_2.zip, accessed: 2017-06-30.
[3] G. Bertoni, J. Daemen, M. Peeters, G. Van Assche, and R. Van Keer, “CAESAR submission: Ketje v2,” Submission to CAESAR (Round 3), September 2016, https://competitions.cr.yp.to/round3/ketjev2.pdf.
[4] C. Dobraunig, M. Eichlseder, F. Mendel, and M. Schläffer, “ASCON v1.2,” Submission to CAESAR (Round 3), September 2016.
[5] K. Gaj, J.-P. Kaps, V. Amirineni, M. Rogawski, E. Homsirikamol, and B. Y. Brewster, “ATHENa – automated tool for hardware evaluation: Toward fair and comprehensive benchmarking of cryptographic hardware using FPGAs,” in 20th International Conference on Field Programmable Logic and Applications - FPL 2010. IEEE, 2010, pp. 414–421, winner of the FPL Community Award.
[6] “ATHENa database of FPGA results for authenticated ciphers,” https://cryptography.gmu.edu/athenadb/fpga_auth_cipher/table_view, accessed: 2017-07-30.
Cryptographic Engineering Research Group (CERG) Department of Electrical and Computer Engineering George Mason University http://cryptography.gmu.edu