Evaluation of the CAESAR Hardware API for Lightweight 0 ... · Evaluation of the CAESAR Hardware...

1
Evaluation of the CAESAR Hardware API for Lightweight Implementations Panasayya Yalla, Jens-Peter Kaps Department of Electrical and Computer Engineering, George Mason University, Fairfax, Virginia 22030, USA Abstract The Competition for Authenticated Encryption: Security, Applicability, and Robustness (CAESAR) requires that all hardware implementations of candidate algorithms adhere to the CAESAR Hardware API [1]. The CAESAR Hardware API is supported by a development package which includes VHDL code for universal pre- and post-processors for high-speed and recently also for lightweight implementations. These processors are designed to make a cipher core compliant with the API. In this work we verify that the lightweight package has a smaller area footprint than the high-speed package. We also show that the overhead of using the generic lightweight pre- and post-processors over integrating their functionality into the cipher core is negligible. As part of these case studies, we have developed the first lightweight implementations of Ketje-Sr, Ascon-128, and Ascon-128a. Introduction and Motivation I CAESAR evaluates candidates for a final portfolio of new Authenticated Encryption with Associated Data (AEAD) algorithms. I All candidates must adhere to the CAESAR hardware (HW) Application Programming Interface (API). I The HW API is one component which enables a fair comparison among algorithms. I Independent FIFO inputs for public data (PDI) and secret data (SDI) and FIFO output (DO). I In-band signaling for commands and data types using a simple protocol. I CAESAR HW API is supported by an implementer’s guide and development package [2]. I Includes VHDL code for high-speed (HS) and lightweight (LW) implementations. I Pre- and PostProcessor separate protocol from cryptographic algorithm. I Bypass FIFO stores and passes header information to PostProcessor. I It is generally assumed that having generic pre-and post-processors increases the area consumption over merging their functionality with the cipher cores. Differences between HS vs. LW Packages High-Speed I Supports bus width 32 w 256 in multiples of 8. I PreProcessor expands PDI and SDI data to full block size for CipherCore. I PreProcessor stores one block of PDI and SDI data. I PreProcessor contains universal padding unit. I Tag comparison has to be performed in CipherCore. Lightweight I Supports bus width w of 8, 16, and 32. I PreProcessor, CipherCore, Bypass FIFO, and PostProcessor have equal bus width. I PreProcessor has no data storage. I Assumes padding is performed in CipherCore. I PostProcessor supports tag comparison. CAESAR High-Speed Block Diagram 24 24 sdi_valid sdi_ready sdi_ready sdi_valid sdi_data sdi_data sw DBLK_SIZE Datapath CipherCore DBLK_SIZE DBLK_SIZE din_valid din_ready din FIFO CMD dout dout_ready dout_valid key_update bdi_eot bdi_eoi bdi_type bdi_ready 3 bdi_valid key_update bdi_eot bdi_eoi bdi_type bdi_ready bdo_size bdo_ready Controller CipherCore bdi_valid bdo_valid bdo_size bdo_ready bdo_valid key_valid key_ready key_valid key_ready LBS_BYTES+1 decrypt decrypt bdi_valid_bytes bdi_pad_loc DBLK_SIZE/8 DBLK_SIZE/8 bdi_size bdi_pad_loc bdi_valid_bytes bdi_size LBS_BYTES+1 CipherCore pdi_valid pdi_ready pdi_ready pdi_valid Optional Required do_valid do_valid pdi_data do_data do_data pdi_data w w cmd_valid cmd_ready cmd cmd_valid cmd_ready cmd bdi key bdo bdi key DBLK_SIZE bdo KEY_SIZE AEAD_TP fdo_valid fdi_ready dout_valid dout_ready din_valid din_ready dout din bdi_partial bdi_partial fdi_data fdo_data fdi_valid fdo_ready Two-Pass FIFO msg_auth_valid msg_auth_valid msg_auth msg_auth msg_auth_ready msg_auth_ready Processor Pre Processor Post do_ready do_ready CAESAR Lightweight Block Diagram sdi_valid sdi_ready sdi_ready sdi_valid sdi_data sdi_data Processor Pre sw w w/8 w/8 w/8+1 w w w 4 Processor Post w/8 Tag Comparator w pdi_valid pdi_ready pdi_ready pdi_valid pdi_data pdi_data do_ready do_ready do_valid do_valid do_data do_data w 4 bdi AEAD bdi_eoi bdi_eot key_valid key_ready key cmd_valid cmd_ready Required Optional bdi_type decrypt key_update CipherCore key key_valid key_ready bdi bdi_ready bdi_valid bdi_valid bdi_ready bdi_partial bdi_eot bdi_eoi bdi_type decrypt_in key_update bdo bdo_valid bdo_ready din_valid din_ready dout_valid dout_ready dout Header/Tag FIFO cmd_valid cmd_ready din end_of_block bdo_ready bdo_valid bdo end_of_block cmd cmd bdi_partial bdi_size bdi_size bdi_valid_bytes bdi_valid_bytes bdi_pad_loc bdi_pad_loc bdo_valid_bytes bdo_valid_bytes msg_auth_ready msg_auth_ready msg_auth_valid msg_auth_valid msg_auth msg_auth bdo_type bdo_type sw Protocol: Instruction LSB Status Opcode or 4 12 MSB Reserved 16-bit Instruction with w=16 LSB 4 2 Reserved Reserved Opcode Status or MSB 4 16-bit Instruction with w=8 Rd-Inst Rd-Rsvd Rd-Hdr w=16, 32 w=8 States for Processing Instruction Protocol: Segment Header EOI 8 1 1 1 1 4 16 Reserved Info Segment Length MSB Type Segment LSB EOT Partial Last 8 32-bit Header (LSB) 4 Info Reserved MSB LSB Segment Length (MSB) Segment Length 8 Segment Length 8 2 Info Reserved MSB LSB 8 With w= 8, and 16 Rd-Hdr Rd-Rsvd Seglen1 SegLen Seglen2 Ld-Data w=8 w=16 w= 32 States for Processing Header Case Study 1. Determine overhead of CAESAR LW package. I Ketje-Sr implementation with integrated support of CAESAR API. I Ketje-Sr implementation using new CAESAR lightweight development package. 2. Determine overhead of CAESAR LW package vs HS package. I Implementation of Ascon using CAESAR LW package. I Using existing Ascon HS implementation. Ketje-Sr I Ketje [3] is based on round reduced Keccak-f called MonkeyWrap. I Has four variants Ketje-Jr, Ketje-Sr, Ketje-Minor, and Ketje-Major which use Keccak-p * [200], Keccak-p * [400], Keccak-p * [800], and Keccak-p * [1600] respectively. I Each round of Keccak-p * consists of five steps θ, ρ, π, χ, and ι. I In θ step, each bit in the state is Xored with two other bits from two different columns. I The state bits are rotated for each lane using one of the 25 different offsets in ρ step I Lanes are rearranged in π , integer multiplication in χ. I The last step is ι, where a round constant is added. Ketje-Sr Datapath I We implemented a Ketje-Sr using a 16-bit datapath and interface. I Datapath is the same for integrated CAESAR API support and using CAESAR LW package. I State is stored in a dual-port memory (RAM) with one Keypack <<<1 Port-B Port-A RAM Rho 16 RAMK2 (LSB) 8 15 RAMK1 (MSB) 0 7 reg-K 15 8 7 0 Padding reg-A rcon 16 16 16 sdi_data pdi_data do_data read/write and one read-only ports. I To reduce the complexity of padding for key, the key size is fixed to 128-bits. I Two memory units (RAMK1, and RAMK2) with pre-stored values and a register (reg-K) for key storage and KeyPack operations. I Padding for message and AD using multiplexers. I Needs 160 clock cycles to process a 32-bit block. I TP = 32 160 · F Ascon I Ascon[4] is a permutation based authenticated cipher. I Ascon-128, and Ascon-128a - two variants with block sizes of 64 and 128 respectively. I In each round, three sub transformations called constant-addition, substitution, and linear diffusion are applied. I Constant-addition is the first operation in the round, where a constant is added to one of the five words. Twelve round constants are used. I Substitution layer uses 5x5 S-boxes. I Linear diffusion layer for diffusion across each of the five 64-bit words using circular shifts and an XOR. Ascon Datapath I A 64-bit datapath is used in this design and a 32-bit interface. I The state is stored in a dual-port RAM. I Key is stored in a RAM. I The substitution layer is implemented in a bit-slice fashion. I Due to the contention on the RAM ports, 18 operations take 33 clock cycles. 31 0 63 32 32 Padding 31 0 RAMK 63 32 R1 32 32 64 32 32 pdi_data sdi_data LDiff 63 1 0 0 0 0 63 8 rcon 7 0 do_data 64 1 0 0 64 Port-B Port-A RAM I The round constants are generated using two 4 bit registers and adders. I Two 5-to-1 multiplexers are used to perform circular shifts in linear diffusion step (LDiff). I TP = 128 33·8 · f Case Study 1: Integrated vs. LW Package Implementation Results on Xilinx Spartan-6 FPGA using ATHENa [5] Flip- Freq TP TP/Area Design Slices LUTs Flops [MHz] [Mbps] [Mbps/slice] KETJE-SR 1 140 436 98 122.4 24.48 0.17 KETJE-SR 2 155 450 114 120.1 24.03 0.16 Overhead 15 14 16 ASCON-128 2 231 684 268 216.0 60.10 0.26 ASCON-128a 2 231 684 268 216.0 119.16 0.52 Joltik [6] 3 168 534 381 200.0 426.67 2.54 ACORN [6] 4 202 540 383 231.6 1,852.80 9.17 1 Dedicated CAESAR API; 2 CAESAR LW Package; 3 Not compliant to CAESAR API; 4 Tweaked CAESAR HS Package I Using CAESAR LW Package leads to a small area increase. I Three separate counters for sdi, pdi and do buses are used for simplicity and parallel operation. I Counter for sdi can be dropped if cipher core provides end_of_key signal. I Comparing our designs which each other and other reported implementations. I Ascon-128a has 4 times the TP while consuming only 50% more slices. I Joltik implementation is not compliant with CAESAR API but performs significantly better. I ACORN is based on a stream cipher which typically perform very well in lightweight implementations. Case Study 2: Area Overhead HS vs. LW Pkg. Area Overhead High-Speed (HS) vs. LightWeight (LW) Packages Implementation Results on Xilinx Spartan-6 FPGA using ATHENa [5] Design Top-level Slices LUTs Filp-Flops AEAD 1 231 684 268 LW Ascon CipherCore 196 606 212 Overhead 35 78 56 AEAD 2 416 1282 792 HS Ascon [6] CipherCore 379 1033 529 Overhead 37 249 263 1 CAESAR LW Package; 2 CAESAR HS Package I Adding CAESAR API support to I LW core using LW Package leads to a small area increase, I HS core using HS Package leads to a larger area increase. Conclusions I The graph shows implementation results of Ketje-Sr on Spartan-6. I Using the CAESAR LW Package leads to a small area increase over integrated designs. I This small increase can easily be mitigated. Slices LUTs FFs 0 50 100 150 200 250 300 350 400 450 500 Case Study 1 Integrated LW Package I The graph shows the overhead incurred for implementations of Ascon on Spartan-6. I CEASAR HS Package leads to a much larger area increase than the LW Package as it expands the data and key buses to the full block size. Slices LUTs FFs 0 50 100 150 200 250 300 Case Study 2 LW Overhead HS Overhead I CAESAR LW Package allows for bus widths of 8 and 16 bits, which are not currently supported by CAESAR HS Package. I The CAESAR LW-Package reduces the design time for LW implementations. I The CAESAR LW Package will be included in the next release of the Development Package for the CAESAR Hardware API. I The usage will be documented in the next release of the Implementer’s Guide to the CAESAR Hardware API. Acknowledgment The CAESAR Lightweight API Sup- port Package was developed in collab- oration with Fabrizio De Santis and Michael Tempelmeier from References [1] E. Homsirikamol, W. Diehl, A. Ferozpuri, F. Farahmand, P. Yalla, J.-P. Kaps, and K. Gaj, “CAESAR hardware API,” Cryptology ePrint Archive, Report 2016/626, 2016, http://eprint.iacr.org/2016/626. [2] “Development package for the CAESAR hardware APIv1.2,” https://cryptography.gmu.edu/athena/AEAD/GMU_AEAD_HW_API_v1_2.zip, accessed: 2017-06-30. [3] G. Bertoni, J. Daemen, M. Peeters, G. Van Assche, and R. Van Keer, “CAESAR submission:Ketje v2,” Submission to CAESAR (Round3), September 2016, https://competitions.cr.yp.to/round3/ketjev2.pdf . [4] C. Dobraunig, M. Eichlseder, F. Mendel, and M. Schläffer, “ASCON v1.2,” Submission to CAESAR (Round3), September 2016. [5] K. Gaj, J.-P. Kaps, V. Amirineni, M. Rogawski, E. Homsirikamol, and B. Y. Brewster, “ATHENa – automated tool for hardware evaluation: Toward fair and comprehensive benchmarking of cryptographic hardware using FPGAs,” in 20th International Conference on Field Programmable Logic and Applications - FPL 2010. IEEE, 2010, pp. 414–421, winner of the FPL Community Award. [6] “ATHENa database of FPGA results for authenticated ciphers,” https://cryptography.gmu.edu/athenadb/fpga_auth_cipher/table_view, accessed: 2017-07-30. Cryptographic Engineering Research Group (CERG) Department of Electrical and Computer Engineering George Mason University http://cryptography.gmu.edu

Transcript of Evaluation of the CAESAR Hardware API for Lightweight 0 ... · Evaluation of the CAESAR Hardware...

Page 1: Evaluation of the CAESAR Hardware API for Lightweight 0 ... · Evaluation of the CAESAR Hardware API for Lightweight ... Rd-Hdr w=16, 32 w=8 ... Evaluation of the CAESAR Hardware

Evaluation of the CAESAR Hardware API for LightweightImplementations

Panasayya Yalla, Jens-Peter KapsDepartment of Electrical and Computer Engineering, George Mason University, Fairfax, Virginia 22030, USA

AbstractThe Competition for Authenticated Encryption: Security, Applicability,and Robustness (CAESAR) requires that all hardware implementationsof candidate algorithms adhere to the CAESAR Hardware API [1]. TheCAESAR Hardware API is supported by a development package whichincludes VHDL code for universal pre- and post-processors forhigh-speed and recently also for lightweight implementations. Theseprocessors are designed to make a cipher core compliant with the API.In this work we verify that the lightweight package has a smaller areafootprint than the high-speed package. We also show that the overheadof using the generic lightweight pre- and post-processors over integratingtheir functionality into the cipher core is negligible. As part of these casestudies, we have developed the first lightweight implementations ofKetje-Sr, Ascon-128, and Ascon-128a.

Introduction and MotivationI CAESAR evaluates candidates for a final portfolio of newAuthenticated Encryption with Associated Data (AEAD) algorithms.

I All candidates must adhere to the CAESAR hardware (HW)Application Programming Interface (API).

I The HW API is one component which enables a fair comparisonamong algorithms.I Independent FIFO inputs for public data (PDI) and secret data (SDI) andFIFO output (DO).

I In-band signaling for commands and data types using a simple protocol.I CAESAR HW API is supported by an implementer’s guide anddevelopment package [2].I Includes VHDL code for high-speed (HS) and lightweight (LW)implementations.

I Pre- and PostProcessor separate protocol from cryptographic algorithm.I Bypass FIFO stores and passes header information to PostProcessor.

I It is generally assumed that having generic pre-and post-processorsincreases the area consumption over merging their functionality withthe cipher cores.

Differences between HS vs. LW Packages

High-SpeedI Supports bus width32 ≤ w ≤ 256 in multiples of 8.

I PreProcessor expands PDI andSDI data to full block size forCipherCore.

I PreProcessor stores one block ofPDI and SDI data.

I PreProcessor contains universalpadding unit.

I Tag comparison has to beperformed in CipherCore.

LightweightI Supports bus width w of 8, 16,and 32.

I PreProcessor, CipherCore,Bypass FIFO, and PostProcessorhave equal bus width.

I PreProcessor has no datastorage.

I Assumes padding is performed inCipherCore.

I PostProcessor supports tagcomparison.

CAESAR High-Speed Block Diagram

24 24

sdi_valid

sdi_readysdi_ready

sdi_valid

sdi_data sdi_datasw

DBLK_SIZEDatapath

CipherCore

DBLK_SIZE DBLK_SIZE

din_valid

din_ready

din FIFO

CMD

dout

dout_ready

dout_valid

key_update

bdi_eot

bdi_eoi

bdi_type

bdi_ready

3

bdi_valid

key_update

bdi_eot

bdi_eoi

bdi_type

bdi_ready

bdo_size

bdo_ready

Controller

CipherCorebdi_valid bdo_valid

bdo_size

bdo_ready

bdo_valid

key_valid

key_ready

key_valid

key_ready

LBS_BYTES+1

decrypt decrypt

bdi_valid_bytes

bdi_pad_loc

DBLK_SIZE/8

DBLK_SIZE/8

bdi_size

bdi_pad_loc

bdi_valid_bytes

bdi_sizeLBS_BYTES+1

CipherCore

pdi_valid

pdi_readypdi_ready

pdi_valid

OptionalRequired

do_valid do_valid

pdi_data

do_datado_data

pdi_dataw

w

cm

d_valid

cm

d_re

ady

cm

d

cm

d_va

lid

cm

d_

read

y

cm

d

bdi

key

bdo

bdi

key

DBLK_SIZEbdo

KEY_SIZE

AEAD_TP

fdo_valid

fdi_

ready

dout_

valid

dout_

ready

din

_valid

din

_re

ady

dout

din

bdi_partialbdi_partial

fdi_

data

fdo_data

fdi_

valid

fdo_re

ady

Two−Pass

FIFO

msg_auth_valid msg_auth_valid

msg_authmsg_auth

msg_auth_ready msg_auth_ready

Processor

Pre

Processor

Post

do_ready do_ready

CAESAR Lightweight Block Diagram

sdi_valid

sdi_readysdi_ready

sdi_valid

sdi_data sdi_data

Processor

Pre

sw

w

w/8

w/8

w/8+1

w w

w

4

Processor

Post

w/8

Tag

Comparator

w

pdi_valid

pdi_readypdi_ready

pdi_valid

pdi_data pdi_data

do_ready do_ready

do_valid do_valid

do_datado_dataw

4

bdi

AEAD

bdi_eoi

bdi_eot

key_valid

key_ready

key

cmd_valid

cmd_ready

Required Optional

bdi_type

decrypt

key_update

CipherCore

key

key_valid

key_ready

bdi

bdi_ready

bdi_valid bdi_valid

bdi_ready

bdi_partial

bdi_eot

bdi_eoi

bdi_type

decrypt_in

key_update

bdo

bdo_valid

bdo_ready

din_valid

din_ready

dout_valid

dout_ready

doutHeader/Tag

FIFO cmd_valid

cmd_ready

din

end_of_block

bdo_ready

bdo_valid

bdo

end_of_block

cmd cmd

bdi_partial

bdi_size bdi_size

bdi_valid_bytes bdi_valid_bytes

bdi_pad_locbdi_pad_loc bdo_valid_bytes bdo_valid_bytes

msg_auth_readymsg_auth_ready

msg_auth_valid msg_auth_valid

msg_authmsg_auth

bdo_type bdo_type

sw

Protocol: InstructionLSB

Status

Opcodeor

4 12

MSB

Reserved

16-bit Instruction with w=16LSB

4

2

Reserved

Reserved

Opcode

Statusor

MSB

4

16-bit Instruction with w=8

Rd−Inst Rd−Rsvd

Rd−Hdr w=16, 32w=8

States for ProcessingInstruction

Protocol: Segment Header

EOI

8

1 1 1 14

16

ReservedInfo Segment Length

MSB

TypeSegment

LSB

EOT

Partial

Last

8

32-bit Header

(LSB)

4

Info

Reserved

MSB LSB

Segment Length(MSB)

Segment Length

8

Segment Length

8

2

Info Reserved

MSB LSB

8

With w= 8, and 16

Rd−Hdr Rd−Rsvd

Seglen1SegLen

Seglen2Ld−Data

w=8w=16

w= 32

States forProcessing Header

Case Study1. Determine overhead of CAESAR LW package.

I Ketje-Sr implementation with integrated support of CAESAR API.I Ketje-Sr implementation using new CAESAR lightweight developmentpackage.

2. Determine overhead of CAESAR LW package vs HS package.I Implementation of Ascon using CAESAR LW package.I Using existing Ascon HS implementation.

Ketje-SrI Ketje [3] is based on round reduced Keccak-f called MonkeyWrap.I Has four variants Ketje-Jr, Ketje-Sr, Ketje-Minor, and Ketje-Majorwhich use Keccak-p∗[200], Keccak-p∗[400], Keccak-p∗[800], andKeccak-p∗[1600] respectively.

I Each round of Keccak-p∗ consists of five steps θ, ρ, π, χ, and ι.I In θ step, each bit in the state is Xored with two other bits fromtwo different columns.

I The state bits are rotated for each lane using one of the 25 differentoffsets in ρ step

I Lanes are rearranged in π, integer multiplication in χ.I The last step is ι, where a round constant is added.

Ketje-Sr Datapath

I We implemented aKetje-Sr using a 16-bitdatapath and interface.

I Datapath is the samefor integrated CAESARAPI support and usingCAESAR LW package.

I State is stored in adual-port memory(RAM) with one

Keypack

<<<1

Port−B

Port−A

RAM

Rho

16

RAMK2(LSB)

8

15

RAMK1(MSB)

0

7 reg−K

15

8

7

0

Padding

reg−Arcon16

16

16

sdi_data

pdi_data

do_data

read/write and one read-only ports.I To reduce the complexity of padding for key, the key size is fixed to128-bits.

I Two memory units (RAMK1, and RAMK2) with pre-stored valuesand a register (reg-K) for key storage and KeyPack operations.

I Padding for message and AD using multiplexers.I Needs 160 clock cycles to process a 32-bit block.I TP = 32

160 · F

AsconI Ascon[4] is a permutation based authenticated cipher.I Ascon-128, and Ascon-128a - two variants with block sizes of 64and 128 respectively.

I In each round, three sub transformations called constant-addition,substitution, and linear diffusion are applied.

I Constant-addition is the first operation in the round, where aconstant is added to one of the five words. Twelve round constantsare used.

I Substitution layer uses 5x5 S-boxes.I Linear diffusion layer for diffusion across each of the five 64-bitwords using circular shifts and an XOR.

Ascon Datapath

I A 64-bit datapath isused in this design anda 32-bit interface.

I The state is stored in adual-port RAM.

I Key is stored in a RAM.I The substitution layeris implemented in abit-slice fashion.

I Due to the contentionon the RAM ports, 18operations take 33clock cycles.

31

063

32

32

Padding

31

0 RAMK

63

32

R1

32

32

64

32

32

pdi_data

sdi_data

LDiff

63

10

00

0

63

8

rcon7

0

do_data

64

1

0

0

64

Port−B

Port−A

RAM

I The round constants are generated using two 4 bit registers andadders.

I Two 5-to-1 multiplexers are used to perform circular shifts in lineardiffusion step (LDiff).

I TP = 12833·8 · f

Case Study 1: Integrated vs. LW Package

Implementation Results on Xilinx Spartan-6 FPGA using ATHENa [5]

Flip- Freq TP TP/AreaDesign Slices LUTs Flops [MHz] [Mbps] [Mbps/slice]

KETJE-SR1 140 436 98 122.4 24.48 0.17KETJE-SR2 155 450 114 120.1 24.03 0.16Overhead 15 14 16ASCON-1282 231 684 268 216.0 60.10 0.26ASCON-128a2 231 684 268 216.0 119.16 0.52Joltik [6]3 168 534 381 200.0 426.67 2.54ACORN [6]4 202 540 383 231.6 1,852.80 9.17

1 ⇒ Dedicated CAESAR API; 2 ⇒ CAESAR LW Package; 3 ⇒ Not compliant to CAESAR API;4 ⇒ Tweaked CAESAR HS Package

I Using CAESAR LW Package leads to a small area increase.I Three separate counters for sdi, pdi and do buses are used for simplicity andparallel operation.

I Counter for sdi can be dropped if cipher core provides end_of_key signal.I Comparing our designs which each other and other reportedimplementations.I Ascon-128a has 4 times the TP while consuming only 50% more slices.I Joltik implementation is not compliant with CAESAR API but performssignificantly better.

I ACORN is based on a stream cipher which typically perform very well inlightweight implementations.

Case Study 2: Area Overhead HS vs. LW Pkg.

Area Overhead High-Speed (HS) vs. LightWeight (LW) PackagesImplementation Results on Xilinx Spartan-6 FPGA using ATHENa [5]

Design Top-level Slices LUTs Filp-FlopsAEAD1 231 684 268

LW Ascon CipherCore 196 606 212Overhead 35 78 56

AEAD2 416 1282 792HS Ascon [6] CipherCore 379 1033 529Overhead 37 249 263

1 ⇒ CAESAR LW Package; 2 ⇒ CAESAR HS Package

I Adding CAESAR API support toI LW core using LW Package leads to a small area increase,I HS core using HS Package leads to a larger area increase.

ConclusionsI The graph showsimplementation results ofKetje-Sr on Spartan-6.

I Using the CAESAR LWPackage leads to a small areaincrease over integrateddesigns.

I This small increase can easilybe mitigated. Slices LUTs FFs

050

100150200250300350400450500

Case Study 1

IntegratedLW Package

I The graph shows the overheadincurred for implementationsof Ascon on Spartan-6.

I CEASAR HS Package leads toa much larger area increasethan the LW Package as itexpands the data and keybuses to the full block size.

Slices LUTs FFs 0

50

100

150

200

250

300Case Study 2

LW OverheadHS Overhead

I CAESAR LW Package allows for bus widths of 8 and 16 bits, which arenot currently supported by CAESAR HS Package.

I The CAESAR LW-Package reduces the design time for LWimplementations.

I The CAESAR LW Package will be included in the next release of theDevelopment Package for the CAESAR Hardware API.

I The usage will be documented in the next release of the Implementer’sGuide to the CAESAR Hardware API.

AcknowledgmentThe CAESAR Lightweight API Sup-port Package was developed in collab-oration with Fabrizio De Santis andMichael Tempelmeier from

References[1] E. Homsirikamol, W. Diehl, A. Ferozpuri, F. Farahmand, P. Yalla, J.-P. Kaps, and K. Gaj, “CAESAR

hardware API,” Cryptology ePrint Archive, Report 2016/626, 2016, http://eprint.iacr.org/2016/626.[2] “Development package for the CAESAR hardware APIv1.2,”

https://cryptography.gmu.edu/athena/AEAD/GMU_AEAD_HW_API_v1_2.zip, accessed:2017-06-30.

[3] G. Bertoni, J. Daemen, M. Peeters, G. Van Assche, and R. Van Keer, “CAESAR submission:Ketje v2,”Submission to CAESAR (Round3), September 2016,https://competitions.cr.yp.to/round3/ketjev2.pdf.

[4] C. Dobraunig, M. Eichlseder, F. Mendel, and M. Schläffer, “ASCON v1.2,” Submission to CAESAR(Round3), September 2016.

[5] K. Gaj, J.-P. Kaps, V. Amirineni, M. Rogawski, E. Homsirikamol, and B. Y. Brewster, “ATHENa –automated tool for hardware evaluation: Toward fair and comprehensive benchmarking of cryptographichardware using FPGAs,” in 20th International Conference on Field Programmable Logic andApplications - FPL 2010. IEEE, 2010, pp. 414–421, winner of the FPL Community Award.

[6] “ATHENa database of FPGA results for authenticated ciphers,”https://cryptography.gmu.edu/athenadb/fpga_auth_cipher/table_view, accessed: 2017-07-30.

Cryptographic Engineering Research Group (CERG) Department of Electrical and Computer Engineering George Mason University http://cryptography.gmu.edu