Vivado HLS Implementation of Round-2 SHA-3 Candidates

21
1 Vivado HLS Implementation of Round-2 SHA-3 Candidates Farnoud Farahmand ECE 646 CERG: http://cryptography.gmu.edu

Transcript of Vivado HLS Implementation of Round-2 SHA-3 Candidates

Page 1: Vivado HLS Implementation of Round-2 SHA-3 Candidates

1

Vivado HLS Implementation of Round-2 SHA-3 Candidates

Farnoud FarahmandECE 646

CERG: http://cryptography.gmu.edu

Page 2: Vivado HLS Implementation of Round-2 SHA-3 Candidates

2

Cryptographic Standard Contests

time97 98 99 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17

AES

NESSIE

CRYPTREC

eSTREAM

SHA-3

34 stream 4 HW winnersciphers → + 4 SW winners

51 hash functions → 1 winner

15 block ciphers → 1 winnerIX.1997 X.2000

I.2000 XII.2002

IV.2008

X.2007 X.2012

XI.2004

CAESARI.2013

57 authenticated ciphers → multiple winnersXII.2017

Page 3: Vivado HLS Implementation of Round-2 SHA-3 Candidates

3

Evaluation Criteria

Security

Software Efficiency Hardware Efficiency

Simplicity

FPGAs ASICs

Flexibility Licensing

µProcessors µControllers

Page 4: Vivado HLS Implementation of Round-2 SHA-3 Candidates

4

Traditional Development & Benchmarking Flow

ManualDesign

HDLCode

Manual OptimizationFPGATools

Netlist

PostPlace&Route

Results

Functional Verification

Timing Verification

InformalSpecification TestVectors

Page 5: Vivado HLS Implementation of Round-2 SHA-3 Candidates

5

• Large number of candidates• Long time necessary to develop and verify

RTL (Register-Transfer Level)Hardware Description Language (HDL) codes

• Multiple variants of algorithms (e.g., multiple hash value size)

• Multiple hardware architectures • Dependence on skills of designers

Remaining Difficulties of Hardware Benchmarking

Page 6: Vivado HLS Implementation of Round-2 SHA-3 Candidates

6

High-Level Synthesis (HLS)

High Level Language(e.g. C, C++, SystemC)

Hardware Description Language(e.g., VHDL or Verilog)

High-Level Synthesis

Page 7: Vivado HLS Implementation of Round-2 SHA-3 Candidates

7

High-Level Synthesis

HDLCode

Automated OptimizationFPGATools

Netlist

PostPlace&Route

Results

Functional Verification

Timing Verification

ReferenceImplementationinC

TestVectorsManual Modifications

(pragmas, tweaks)

HLS-readyCcode

Proposed HLS-Based Development and Benchmarking Flow

SoftwareVerification

Page 8: Vivado HLS Implementation of Round-2 SHA-3 Candidates

8

Examples of Source Code Modifications (1)

for (i = 0; i < 16; i++) #pragma HLS UNROLL

b[i] = a[i]+4;

Unrolling of loops:

void main() { int A[16];#pragma HLS ARRAY_RESHAPE Variable=A complete dim=1

...}

Array Reshaping:

Page 9: Vivado HLS Implementation of Round-2 SHA-3 Candidates

9

Examples of Source Code Modifications (2)

Function Reuse:

Page 10: Vivado HLS Implementation of Round-2 SHA-3 Candidates

10

• 4 Round 2 SHA-3 candidates + 5 Finalists are done previously • Basic iterative architecture• GMU Hardware Interface for SHA core• Padding done in software• RTL Implementations developed previously • Starting point: Informal specifications and reference software

implementations in C provided by the algorithm authors• Post P&R results generated for

- Xilinx Virtex 5 using Xilinx ISE • No use of BRAMs or DSP Units

Our Test Case

Page 11: Vivado HLS Implementation of Round-2 SHA-3 Candidates

11

Parameters of Hash Functions

Algorithm Number of Rounds

Block Size Hash size Chain Size

Luffa 8 256 256 768Cubehash 16 256 256 1024BMW 1 512 256 512ECHO 8 1536 256 512

Page 12: Vivado HLS Implementation of Round-2 SHA-3 Candidates

12

SHA core Interface

Page 13: Vivado HLS Implementation of Round-2 SHA-3 Candidates

13

HLS and RTL Implementations Parameters

Algorithm #Rounds Cycles/BlockRTL

Cycles/BlockHLS

IntervalHLS

Luffa 8 9 10 11Cubehash 16 16 17 18BMW 1 2 1 1ECHO 8 26 25 26

Interval = Number of Cycles require to processtwo consecutives blocks

Page 14: Vivado HLS Implementation of Round-2 SHA-3 Candidates

14

Datapath vs. Control Unit

Datapath ControlUnit

Data Inputs

Data Outputs

Control Inputs

Control Outputs

Control Signals

StatusSignals

Determines• Area• Clock Frequency

Determines• Number of clock cycles

Page 15: Vivado HLS Implementation of Round-2 SHA-3 Candidates

15

Control Unit suboptimal• Difficulty in inferring an overlap between completing the last

round and reading the next input block• One additional clock cycle used for initialization of the state at

the beginning of each round

Encountered Problems

Page 16: Vivado HLS Implementation of Round-2 SHA-3 Candidates

16

0

1,000

2,000

3,000

4,000

5,000

6,000

7,000

HLS RTL

Area

Cubehash Luffa BMW ECHO

RTL vs. HLS (Area):

Page 17: Vivado HLS Implementation of Round-2 SHA-3 Candidates

17

0

2,000

4,000

6,000

8,000

10,000

12,000

14,000

HLS RTL

Throughput

Cubehash BMW Luffa ECHO

RTL vs. HLS (Throughput):

Page 18: Vivado HLS Implementation of Round-2 SHA-3 Candidates

18

Throughput vs. Area:

0

2,000

4,000

6,000

8,000

10,000

12,000

14,000

0 1,000 2,000 3,000 4,000 5,000 6,000 7,000

Thro

ughp

ut

Area

HLSLuffaCubeHashBMWECHO

0

2,000

4,000

6,000

8,000

10,000

12,000

14,000

0 1,000 2,000 3,000 4,000 5,000 6,000 7,000

Thro

ughp

utArea

RTLLuffaCubeHashBMWECHO

Page 19: Vivado HLS Implementation of Round-2 SHA-3 Candidates

19

Overall Results:

HLS RTLFreq. TP A TP/A Freq TP A TP/A

Luffa 217.39 5,565 1,390 4.00 334.4 9,513 1,023 9.29Cubehash 168.71 2,399 1,076 2.23 248.3 3,972 663 5.99BMW 9.45 4,838 4,859 0.99 34 8,704 4,350 2.00ECHO 145.77 8,612 6,389 1.34 203.8 12,094 4,888 2.47

RTL/HLSFreq. TP A TP/A

Luffa 1.53 1.70 0.73 2.32Cube Hash 1.47 1.65 0.61 2.68BMW 3.59 1.79 0.89 2.00ECHO 1.39 1.40 0.76 1.83

Page 20: Vivado HLS Implementation of Round-2 SHA-3 Candidates

20

Conclusions

• High-level synthesis offers a potential to facilitate hardware benchmarking during the design of cryptographic algorithms and at the early stages of cryptographic contests

• Case study based on 4 Round 2 SHA-3 candidates demonstrated correct ranking for all of candidates using all major performance metrics

• More research & development needed to overcome remaining difficulties

• Suboptimal control unit of HLS implementations• Wide range of RTL to HLS performance metric ratios• A few potentially suboptimal HLS or RTL implementations• Efficient and reliable generation of HLS-ready C codes

Page 21: Vivado HLS Implementation of Round-2 SHA-3 Candidates

21

Comments? Questions?

Suggestions?

CERG: http://cryptography.gmu.edu

Thank you!