Hardware Accelerators for ECC and HECC
Arnaud Tisserand
CNRS, IRISA laboratory, CAIRN research team
ECC, Bordeaux
Sep. 29–30, 2015
Summary
• Introduction
• Accelerator architecture and units
• Accelerator programming
• Implementation results: comparison ECC vs HECC on FPGA
• Conclusion & current/future works
A. Tisserand, CNRS–IRISA–CAIRN. Hardware Accelerators for ECC and HECC 2/35
Current Projects on (H)ECC Accelerators
PAVOIS project 2012–2016
Arithmetic Protections Against Physical Attacks for Elliptic Curve based Cryptography
• IRISA (Lannion)
• LIRMM (Perpignan, Montpellier & Toulon)
http://pavois.irisa.fr/
ANR 12 BS02 002
HAH project 2014–2017
Hardware and Arithmetic for Hyperelliptic Curves Cryptography
• IRISA (Lannion)
• IRMAR (Rennes)
http://h-a-h.inria.fr/
Labex
Introduction
Levels (figure): protocol level (encryption, signature, etc.), curve level ([k]P, ADD(P, Q), DBL(P) = P + P), field level (x ± y, x × y, . . .)
E : $y^2 = x^3 + 4x + 20$ over GF(1009)
• points: P, Q = (x, y) or (x, y, z) or . . .
• coordinates: x, y, z ∈ GF(·), with GF(·) = $\mathbb{F}_p$ or $\mathbb{F}_{2^m}$, t: 80–600 bits
• $k = (k_{t-1} k_{t-2} \ldots k_1 k_0)_2 \in \mathbb{N}$
Scalar multiplication operation:
  for i from 0 to t − 1 do
    if $k_i = 1$ then Q = ADD(P, Q)
    P = DBL(P)
Point addition/doubling operations: sequence of finite field operations
• DBL: $v_1 = z_1^2$, $v_2 = x_1 - v_1$, . . .
• ADD: $w_1 = z_1^2$, $w_2 = z_1 \times w_1$, . . .
$\mathbb{F}_p$ or $\mathbb{F}_{2^m}$ operations: operation modulo a large prime ($\mathbb{F}_p$) or an irreducible polynomial ($\mathbb{F}_{2^m}$)
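The scalar multiplication loop of this slide can be sketched in Python on the toy curve E over GF(1009). This is only an illustration under stated assumptions: affine coordinates with the textbook chord-and-tangent formulas (the accelerator uses other coordinate systems and formulas), and the names `add`, `dbl`, `scalar_mul`, `on_curve` and the base point `G = (1, 5)` are hypothetical choices made here, not taken from the talk.

```python
# Toy curve from the slide: y^2 = x^3 + 4x + 20 over GF(1009)
P_MOD, A, B = 1009, 4, 20
INF = None  # point at infinity (neutral element)

def on_curve(pt):
    """Check the curve equation (the point at infinity is accepted)."""
    if pt is INF:
        return True
    x, y = pt
    return (y * y - (x * x * x + A * x + B)) % P_MOD == 0

def add(p, q):
    """Affine point addition; also handles doubling and the neutral element."""
    if p is INF:
        return q
    if q is INF:
        return p
    x1, y1 = p
    x2, y2 = q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF  # p + (-p)
    if p == q:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD)
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)
    lam %= P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)

def dbl(p):
    return add(p, p)

def scalar_mul(k, p):
    """Right-to-left binary method, as in the slide's loop (k_0 first)."""
    q = INF
    while k:
        if k & 1:          # if k_i = 1 then Q = ADD(P, Q)
            q = add(p, q)
        p = dbl(p)         # P = DBL(P)
        k >>= 1
    return q

G = (1, 5)  # 5^2 = 25 = 1 + 4 + 20, so G lies on E
print(scalar_mul(13, G), on_curve(scalar_mul(13, G)))
```

The modular inverse `pow(x, -1, m)` needs Python 3.8 or later. Each call to `add` or `dbl` expands into a handful of field operations, which is the three-level structure (protocol, curve, field) shown on the slide.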
Side Channel Attacks
Levels (figure): protocol level (encryption, signature, etc.), curve level ([k]P, ADD(P, Q), DBL(P)), field level (x ± y, x × y, . . .)
Side channel trace (figure): six DBL and two ADD operations can be distinguished, which reveals the key bits 0 0 0 1 1 0
Scalar multiplication operation:
  for i from 0 to t − 1 do
    if $k_i = 1$ then Q = ADD(P, Q)
    P = DBL(P)
• simple power analysis (& variants)
• differential power analysis (& variants)
• horizontal/vertical/. . . attacks
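Why the unprotected loop leaks can be shown with a small Python sketch. It assumes an idealized attacker who can tell ADD from DBL perfectly in the trace (real traces are noisy); the function names `operation_trace` and `recover_bits` are hypothetical.

```python
def operation_trace(key_bits):
    """Operations executed by the loop, key bits given in processing order (k_0 first)."""
    trace = []
    for bit in key_bits:
        if bit == 1:
            trace.append("ADD")   # executed only when k_i = 1
        trace.append("DBL")       # executed at every iteration
    return trace

def recover_bits(trace):
    """Simple-power-analysis style recovery: an ADD right before a DBL means k_i = 1."""
    bits = []
    saw_add = False
    for op in trace:
        if op == "ADD":
            saw_add = True
        else:                     # each DBL closes one loop iteration
            bits.append(1 if saw_add else 0)
            saw_add = False
    return bits

key = [0, 0, 0, 1, 1, 0]          # the bit pattern shown on the slide
trace = operation_trace(key)
print(trace)
print(recover_bits(trace))
```

For the slide's six bits, the trace contains six DBL and two ADD operations, and the bit sequence is recovered from the operation pattern alone, without any knowledge of the data being processed.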
Objectives of Our Research Group
• Study and implementation of efficient hardware supports:
  - Cryptography over (hyper)-elliptic curves (H)ECC
  - Operations over finite fields $\mathbb{F}_p$ & $\mathbb{F}_{2^m}$ and curve points
  - Hardware targets: FPGAs and ASICs
  - Flexibility: programmable in software
• Study and implementation of protections against physical attacks:
  - Passive attacks: measure of power consumption, electromagnetic radiations, timings
  - Active attacks: fault injection (in progress)
• Levels: algorithm, representation, operator, architecture, circuit
• Trade-offs between: performance, cost (area/energy), security
• Study, development and distribution of an open source (H)ECC accelerator and its programming tools
Accelerator Specifications
Levels (figure, annotated with HW/SW labels): protocol level (encryption, signature, etc.), curve level ([k]P, ADD(P, Q), DBL(P) = P + P), field level (x ± y, x × y, . . .)
• Performances ⇒ hardware (HW)
  - dedicated functional units
  - internal parallelism
• Limited cost (embedded systems)
  - reduced silicon area
  - low energy (& power consumption)
  - large area used at each clock cycle
• Flexibility ⇒ software (SW)
  - curves, algorithms, representations (points/elements), k recoding, . . .
  - at design time / at run time
• Security against SCAs ⇒ HW
  - secure units ($\mathbb{F}_{2^m}$, $\mathbb{F}_p$)
  - secure key storage/management
  - secure control
Accelerator Architecture
Block diagram (figure): external interface; accelerator with interconnect, CTRL, code memory, key management (key mng.), register file and functional units FU1, FU2, FU3
Data: w-bit (32, . . . , 128) except for k digits; control: a few bits per unit
Functional Units for Field Level Operations
Figure: functional unit FUα with w-bit data ports x[i], y[i], r[i] and a few control bits
Notation: x[i] is the i-th w-bit word of x ∈ $\mathbb{F}_q$
Units:
• $\mathbb{F}_p$: addition/subtraction, multiplication (2-step, Montgomery, variants), inversion
• $\mathbb{F}_{2^m}$ (polynomial basis, normal basis & variants): addition/subtraction, multiplication (Montgomery, Mastrovito, 2-step), square, inversion
Internal parameters: number of sub-blocks, radix, pipelining scheme, countermeasure, mapping of local registers, output/input bypass, . . .
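To make the word notation x[i] concrete, here is a minimal Python sketch of a word-serial Montgomery multiplication over a prime field, one of the multiplier types listed above. It is a textbook radix-2^w formulation, not the accelerator's actual unit: the function name `mont_mul`, the choice of the NIST P-192 prime and the parameters w = 32, n = 6 are assumptions for illustration.

```python
def mont_mul(x, y, p, w, n):
    """Return x * y * R^-1 mod p with R = 2^(w*n), scanning x one w-bit word at a time.

    Assumes p odd, p < R and 0 <= x, y < p.
    """
    mask = (1 << w) - 1
    p_neg_inv = (-pow(p, -1, 1 << w)) & mask     # -p^-1 mod 2^w (precomputed constant)
    acc = 0
    for i in range(n):
        x_i = (x >> (w * i)) & mask              # x[i]: i-th w-bit word of x
        acc += x_i * y
        q = ((acc & mask) * p_neg_inv) & mask    # makes acc + q*p divisible by 2^w
        acc = (acc + q * p) >> w                 # exact division by 2^w
    return acc - p if acc >= p else acc          # single final correction

# Example with a 192-bit prime and 32-bit words (6 words per element)
p = 2**192 - 2**64 - 1
w, n = 32, 6
R = 1 << (w * n)
x = 0x123456789ABCDEF0123456789ABCDEF0123456789ABCDEF % p
y = 0xFEDCBA9876543210FEDCBA9876543210FEDCBA98765432 % p
print(mont_mul(x, y, p, w, n) == (x * y * pow(R, -1, p)) % p)
```

The loop runs once per word of x, which is why the number and width of sub-blocks inside a unit trade area against the number of clock cycles.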
Register File (≈ Dual Port Memory)
Figure: memory of field elements (size ≥ m bits) stored as w-bit words, with ports x[i], y[i], r[i]
Control signals: addresses (port A, port B), read/write, write enable
Specific addressing model for $\mathbb{F}_q$ elements (through an intermediate address table with hardware loop)
• linear addresses, SW: LOAD @x ⇒ HW: loop x[0], x[1], . . . , x[ℓ − 1]
• randomized addresses
Key Management Unit
Figure: the key management unit (key mng.) holds the key k and recodes it into digits $k_i$ sent to CTRL
• On-the-fly recoding of k: binary, λ-NAF (λ ∈ {2, 3, 4, 5}), variants (fixed/sliding), double-base [1] and multiple-base [2] number systems (w/wo randomization), addition chains [12], other?
• Specific private path in the interconnect (no key leaks in RF or FUs)
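As an illustration of the recodings listed above, the following Python sketch produces the λ-NAF digits of k one at a time, least significant digit first. It is the standard textbook algorithm, assumed here for illustration; the hardware recoding unit and its digit order may differ, and the function name `lambda_naf` is hypothetical.

```python
def lambda_naf(k, lam):
    """Width-lam non-adjacent form of k >= 0, least significant digit first.

    Nonzero digits are odd and lie in (-2^(lam-1), 2^(lam-1)); any lam
    consecutive digits contain at most one nonzero digit.
    """
    digits = []
    while k > 0:
        if k & 1:
            d = k % (1 << lam)            # k mod 2^lam
            if d >= 1 << (lam - 1):
                d -= 1 << lam             # signed residue
            k -= d                        # now k is divisible by 2^lam
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits

for lam in (2, 3, 4, 5):
    print(lam, lambda_naf(2015, lam))
```

Fewer nonzero digits mean fewer ADD operations per scalar multiplication, at the price of precomputed odd multiples of P for λ > 2.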
External Interface(s)
Under development:
• Basic (neither clock rate nor width adaptation)
• ARM Cortex cores in Zynq 7 FPGAs (through AXI bus)
• MicroBlaze softcore processor for Xilinx FPGAs
  - AXI bus (V6+)
  - PLB bus (V2 – V5)
• Specific for a “small” ASIC pad ring
Future development:
• NIOS softcore processor for Altera FPGAs
• LEON softcore processor (depending on internal demand)
Protected $\mathbb{F}_{2^m}$ Multipliers
Figure: activity traces (#transitions vs. cycles, 0–500 cycles, with a zoom on cycles 200–250) of the Mastrovito 233 multiplier, unprotected and protected
Overhead: area/time < 10 %
References: PhD D. Pamula [8]; articles: [11], [10], [9]
Protected (Old) Accelerator for $\mathbb{F}_{2^m}$
Figure panels (Mastrovito multiplier; horizontal axis: cycles):
• DBL operation, unprotected: activity trace (#transitions) and current measures [mA]
• DBL operation, protected: activity trace (#transitions) and current measures [mA]
• ADD operation, protected: activity trace (#transitions)
Warning: old dedicated accelerator (similar behavior is expected for our new one)
Circuit-Level Protections for Arithmetic Operators
References: [4] and [3]
Units Impact on Side Channel Information (1/2)
Activity traces measured with CABA (Cycle Accurate Bit Accurate) simulations for three configurations of the multiplier (1, 2, 4 sub-blocks of 32 bits) and a very small accelerator
Figure: activity [#transitions] vs. time [clock cycles] for ADD (top row) and DBL (bottom row), one panel per configuration (1, 2 and 4 sub-blocks)
Units Impact on Side Channel Information (2/2)
Figure: zoom on two activity traces (activity [#transitions] vs. time [clock cycles], cycles 16700–16860 and 6500–6660). Each panel is annotated with the executed instructions: three sequences READ, LAUNCH addition, WAIT addition, WRITE, followed by READ, LAUNCH multiplication, WAIT multiplication, WRITE
Developed Programming Tools
Figure: tool flow and its evolution over time (versions V0, V1, V2, with "now" marked on the time axis)
• V0: accelerator modules, configurations, CAD tools, selection, user, crypto. lib., assembler, binary implementation
• V1 adds: compiler (Python), API/TLS-SSL
• V2: accelerator modules, crypto. lib., compiler (Sage), API/TLS-SSL
Instruction Set
READ FUid @Rid @Rid B/U
WRITE FUid @Rid
LAUNCH FUid MODE
WAIT FUid
SETADDRO @Rid OFFSET
SETADDRN @Rid #WORD
WRITEK #WORD
CALL @DEST
RET
BZ @DEST
BNZ @DEST
JMP @DEST
CMPD DIGIT
SET FLAGid
TST FLAGid
Address Model in the Register File
RF requirements:
• 5–16 registers of m-bit $\mathbb{F}_q$ elements
• worst case: w small (16 bits) and m large (600 bits) ⇒ 550+ words and 10-bit physical addresses
x ∈ $\mathbb{F}_q$ is addressed by one entry (notation @Rid) of the intermediate address table (IAT) with 2 values:
• offset of the first word (e.g. x[0])
• number of w-bit words
Figure: CTRL sends @Rid to the address table, which sends the physical address to the register file
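The intermediate address table can be modelled in a few lines of Python. This is a hypothetical software model, not the hardware design: the class `AddressTable`, its methods and the packed linear layout are assumptions, and the mapping of `SETADDRO` and `SETADDRN` to the two IAT values is inferred from the instruction names.

```python
from math import ceil

class AddressTable:
    """Toy model of the IAT: @Rid -> [offset of first word, number of w-bit words]."""

    def __init__(self):
        self.entries = {}

    def set_offset(self, rid, offset):     # assumed effect of SETADDRO @Rid OFFSET
        self.entries.setdefault(rid, [0, 0])[0] = offset

    def set_nwords(self, rid, nwords):     # assumed effect of SETADDRN @Rid #WORD
        self.entries.setdefault(rid, [0, 0])[1] = nwords

    def physical_addresses(self, rid):
        """Addresses generated by the hardware loop for x[0], x[1], ..., x[l-1]."""
        offset, nwords = self.entries[rid]
        return [offset + i for i in range(nwords)]

def linear_layout(n_regs, m, w):
    """Pack n_regs field elements of m bits into consecutive w-bit words."""
    words = ceil(m / w)
    iat = AddressTable()
    for rid in range(n_regs):
        iat.set_offset(rid, rid * words)
        iat.set_nwords(rid, words)
    return iat, words

# Worst case of the slide: w = 16 bits, m = 600 bits, 16 registers
iat, words = linear_layout(16, 600, 16)
last = iat.physical_addresses(15)[-1]
print(words, last + 1, last.bit_length())
```

With these worst-case numbers each element takes 38 words, so 16 registers need 608 words and the largest address fits in 10 bits, in line with the slide's "550+ words and 10-bit physical addresses". Randomized addressing would only change the order in which the loop visits the words.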
Code Memory
Behavior:
• Specific private path in the interconnect for code download (no leaks in RF or FUs)
• Code input can be disabled (ROM mode with code in the FPGA bitstream)
• Instruction CALL: push PC then jump to @DEST
• Instruction RET: jump to (pop) + 1
Memory mapping to be defined
Internal Parallelism Model
Non-blocking instruction decoding (i.e. always do PC ← PC + 1 or PC ← cst) except for the WAIT instruction
Example of an operation sequence, its dependency graph and assembly code for 2 multipliers:
r = ((a × b) + c) + (d × e)
Dependency graph (figure): a (register 0) and b (register 1) feed a multiplier M whose result goes to register 5; d (register 3) and e (register 4) feed a second multiplier M whose result goes to register 6; an addition A combines register 5 and c (register 2) into register 5; a final addition A combines registers 5 and 6 into register 5, which holds r
Assembly code (instruction, then comment):
  read fu mul 0, 0, 1         read a & b
  launch fu mul 0             start ab
  read fu mul 1, 3, 4         read d & e
  launch fu mul 1             start de
  wait fu mul 0               wait for ab
  write fu mul 0, 5           write ab
  set OPMODE, 0               addition mode (+)
  read fu add sub 0, 5, 2     read ab & c
  launch fu add sub 0         start (ab) + c
  wait fu mul 1               wait for de
  write fu mul 1, 6           write de
  wait fu add sub 0           wait for (ab) + c
  write fu add sub 0, 5       write (ab) + c
  read fu add sub 0, 5, 6     read (ab) + c & de
  launch fu add sub 0         start ((ab) + c) + (de)
  wait fu add sub 0           wait for ((ab) + c) + (de)
  write fu add sub 0, 5       write ((ab) + c) + (de)
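The non-blocking model can be explored with a toy Python interpreter for the program of this slide. Everything numeric here is an assumption made for illustration: the unit latencies, the one-instruction-per-cycle issue rate, the toy prime 1009 and the unit labels `fu_mul_0`, `fu_mul_1`, `fu_add_sub_0`; only the instruction order and register numbers come from the slide.

```python
P = 1009                          # toy prime field, for illustration only
LATENCY = {"mul": 8, "add": 2}    # assumed unit latencies in clock cycles

def run(program, regs):
    """Execute the program; return the final registers and the cycle count."""
    cycle = 0
    units = {}                    # FU name -> latched operands, result, ready cycle
    for op, unit, *args in program:
        cycle += 1                # decoding never blocks: one instruction per cycle
        if op == "set":           # OPMODE selection; only addition is modelled here
            continue
        fu = units.setdefault(unit, {"ops": None, "res": None, "ready": 0})
        if op == "read":
            fu["ops"] = (regs[args[0]], regs[args[1]])
        elif op == "launch":
            x, y = fu["ops"]
            kind = "mul" if "mul" in unit else "add"
            fu["res"] = (x * y) % P if kind == "mul" else (x + y) % P
            fu["ready"] = cycle + LATENCY[kind]
        elif op == "wait":        # the only instruction that can stall
            cycle = max(cycle, fu["ready"])
        elif op == "write":
            regs[args[0]] = fu["res"]
    return regs, cycle

PROGRAM = [
    ("read", "fu_mul_0", 0, 1), ("launch", "fu_mul_0"),
    ("read", "fu_mul_1", 3, 4), ("launch", "fu_mul_1"),
    ("wait", "fu_mul_0"), ("write", "fu_mul_0", 5),
    ("set", "OPMODE", 0),
    ("read", "fu_add_sub_0", 5, 2), ("launch", "fu_add_sub_0"),
    ("wait", "fu_mul_1"), ("write", "fu_mul_1", 6),
    ("wait", "fu_add_sub_0"), ("write", "fu_add_sub_0", 5),
    ("read", "fu_add_sub_0", 5, 6), ("launch", "fu_add_sub_0"),
    ("wait", "fu_add_sub_0"), ("write", "fu_add_sub_0", 5),
]

a, b, c, d, e = 3, 5, 7, 11, 13
regs, cycles = run(PROGRAM, [a, b, c, d, e, 0, 0])
print(regs[5], cycles)
```

Because the second multiplication is launched before the first `wait`, it overlaps with the first multiplication and with the first addition; changing `LATENCY` shows how much of the multiplier latency the schedule hides.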
ECC Accelerator with Addition Chains
First full hardware implementation of recoding using addition chains
FPGA implementation: Spartan-6 XC6SLX9, 192-bit $\mathbb{F}_p$, very small configuration
Figure: block diagram of the recoding unit (Euclid-based computation of C, computation of k/φ, BRAM memory holding 1/φ, k, k/φ, a, b, C, C′, address control with one offset per stored value, w-bit data words)

recoding  BRAM  optim.  area              freq.  dura.  SCA
method          target  slices (FF/LUT)   MHz    ms     prot.
EAC       3     area    534 (1813/1508)   132    35.8   Y
EAC       3     speed   556 (1872/1523)   137    34.5   Y
DA        2     area    429 (1243/1134)   191    30     N
DA        2     speed   399 (1302/1222)   177    32.5   N
ML        2     area    429 (1243/1134)   191    42.5   Y
ML        2     speed   399 (1302/1222)   177    45.8   Y
UF        2     area    429 (1243/1134)   191    50.4   Y
UF        2     speed   399 (1302/1222)   177    54.4   Y
NAF-3     2     area    422 (1280/1157)   181    25.2   N
NAF-3     2     speed   423 (1321/1242)   175    26.1   N
NAF-4     2     area    420 (1277/1161)   158    27.3   N
NAF-4     2     speed   425 (1233/1246)   177    24.4   N

EAC: Euclidean addition chains, DA: dbl-and-add, ML: Montgomery ladder, UF: unified formula
See details in [12]
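The table marks the Montgomery ladder (ML) as protected while plain double-and-add (DA) is not. A short Python sketch shows the reason at the level of the operation sequence only: the ladder performs one ADD and one DBL per key bit whatever the bit value. The group is passed in as functions so that the block stays self-contained; using plain integers as a stand-in group, and the names `montgomery_ladder`, `add`, `dbl`, are assumptions for illustration. The Python `if` itself is of course not a side-channel-safe construct.

```python
def montgomery_ladder(k, p, add, dbl, zero):
    """Left-to-right Montgomery ladder; returns ([k]p, list of executed operations)."""
    r0, r1 = zero, p                  # invariant: r1 = r0 + p
    ops = []
    for i in reversed(range(k.bit_length())):
        if (k >> i) & 1:
            r0, r1 = add(r0, r1), dbl(r1)
        else:
            r0, r1 = dbl(r0), add(r0, r1)
        ops += ["ADD", "DBL"]         # same two operations for both bit values
    return r0, ops

# Stand-in group: integers under addition, so [k]p is simply k * p
result, ops = montgomery_ladder(0b000110, 7, lambda x, y: x + y, lambda x: 2 * x, 0)
print(result, ops)
```

The regular pattern costs more field operations per bit than NAF recodings, which is consistent with the longer durations reported for ML in the table.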
Comparison ECC 256 vs HECC 128 (1/7)
Figure: dataflow graphs of the ADD and DBL formulas (nodes: mul, sqr, add, sub; edges: point coordinates and intermediate values v0, v1, . . .)

        field $\mathbb{F}_p$   ADD         DBL
ECC     ℓ bits                 12M + 2S    6M + 5S
HECC    ℓ/2 bits               47M + 4S    38M + 6S

(M: field multiplication, S: field squaring)
Configurations on a XC6SLX75 FPGA (details in [5]):
• w = 32 bits internal words
• 1 adder/subtracter, 1 inversion unit
• nM multipliers (Montgomery) with nB w-bit sub-blocks
• No DSP blocks
• ISE 14.6 Xilinx CAD tools, standard efforts (synthesis and P&R)
Comparison ECC 256 vs HECC 128 (2/7)
• Compared recoding techniques:
  - BIN: standard binary from left to right
  - NAF: non-adjacent form
  - λ-NAF: window methods with λ ∈ {3, 4}
• Implementation results for a full ECC accelerator (nM = 1, nB = 1):

Recoding                 BIN               NAF               3-NAF             4-NAF
area, slices (FF/LUT)    565 (1321/1461)   570 (1340/1479)   571 (1344/1495)   503 (1348/1489)
freq. (MHz)              225               228               237               217

All other results are reported for 4-NAF
Comparison ECC 256 vs HECC 128 (3/7)
Impact of the number/size of multipliers on the area and frequency (area in slices (FF/LUT), freq. in MHz):

                  nB = 1                    nB = 2                    nB = 4
      nM  BRAM    area              freq.   area              freq.   area              freq.
ECC    1   3      547 (1374/1460)   231     573 (1476/1625)   233     673 (1674/1875)   233
       2   3      722 (1776/1903)   220     811 (1979/2210)   227     942 (2377/2701)   220
       3   3      810 (2174/2236)   221     915 (2480/2698)   215     1130 (3077/3430)  214
       4   3      952 (2569/2656)   215     1100 (2977/3282)  217     1512 (3771/4293)  216
       5   3      1064 (2982/3136)  210     1405 (3492/3902)  206     1722 (4487/5122)  209
HECC   1   4      514 (1336/1374)   235     549 (1434/1513)   234
       2   4      646 (1716/1783)   220     737 (1912/2055)   234
       3   4      732 (2092/2075)   224     826 (2386/2485)   225
       4   4      870 (2476/2424)   218     1022 (2868/2987)  214
       5   4      976 (2865/2773)   219     1115 (3355/3465)  210
       6   4      1089 (3233/3092)  203     1240 (3821/3908)  208
       7   4      1145 (3601/3426)  213     1372 (4287/4365)  205
       8   4      1281 (3981/3809)  191     1552 (4765/4890)  183
       9   4      1379 (4363/4051)  202     1691 (5245/5277)  199
      10   4      1543 (4739/4435)  196     1856 (5719/5801)  198
      11   4      1547 (5114/4750)  189     1936 (6192/6240)  198
      12   4      1738 (5499/5128)  191     2100 (6675/6771)  188
Comparison ECC 256 vs HECC 128 (4/7)
Impact of the number/size of multipliers on the average time (ms):

            nM:   1     2     3     4     5     6     7     8     9     10    11    12
HECC  nB = 1     15.6   8.6   5.7   4.7   3.9   3.7   3.3   3.6   3.4   3.5   3.6   3.6
      nB = 2     11.9   6.2   4.5   3.6   3.2   2.8   2.8   3.0   2.7   2.7   2.8   2.9
ECC   nB = 1     28.1  15.3  12.4  12.4  12.7
      nB = 2     17.7   9.6   8.3   8.0   8.4
      nB = 4     11.1   6.2   5.4   5.1   5.3

Standard deviation for 1000 [k]P:

configuration (nM, nB)     ECC (1,1)   ECC (3,4)   HECC (1,1)   HECC (6,2)
average time [ms]          28.1        5.4         15.6         2.8
standard deviation [ms]    0.289       0.056       0.324        0.045
Comparison ECC 256 vs HECC 128 (5/7)
Figure: time [ms] (5 to 30) vs. area [slices] (600 to 2200) for all ECC configurations (nM = 1 to 5, nB ∈ {1, 2, 4}) and HECC configurations (nM = 1 to 12, nB ∈ {1, 2}); each point is labelled nM,nB
On average HECC is 40 % faster than ECC for a similar silicon cost
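The "similar silicon cost" comparison can be reproduced approximately from the area and time tables of slides 3/7 and 4/7. The Python sketch below pairs each ECC configuration with the HECC configuration of nearest area and reports the relative time reduction. The nearest-area pairing is an assumption chosen here for illustration; it is not necessarily how the 40 % average on the slide was obtained, so the individual percentages should not be expected to match it.

```python
# (nM, nB): (area in slices, average [k]P time in ms), transcribed from slides 3/7 and 4/7
ECC = {
    (1, 1): (547, 28.1), (2, 1): (722, 15.3), (3, 1): (810, 12.4),
    (4, 1): (952, 12.4), (5, 1): (1064, 12.7),
    (1, 2): (573, 17.7), (2, 2): (811, 9.6), (3, 2): (915, 8.3),
    (4, 2): (1100, 8.0), (5, 2): (1405, 8.4),
    (1, 4): (673, 11.1), (2, 4): (942, 6.2), (3, 4): (1130, 5.4),
    (4, 4): (1512, 5.1), (5, 4): (1722, 5.3),
}
HECC = {
    (1, 1): (514, 15.6), (2, 1): (646, 8.6), (3, 1): (732, 5.7), (4, 1): (870, 4.7),
    (5, 1): (976, 3.9), (6, 1): (1089, 3.7), (7, 1): (1145, 3.3), (8, 1): (1281, 3.6),
    (9, 1): (1379, 3.4), (10, 1): (1543, 3.5), (11, 1): (1547, 3.6), (12, 1): (1738, 3.6),
    (1, 2): (549, 11.9), (2, 2): (737, 6.2), (3, 2): (826, 4.5), (4, 2): (1022, 3.6),
    (5, 2): (1115, 3.2), (6, 2): (1240, 2.8), (7, 2): (1372, 2.8), (8, 2): (1552, 3.0),
    (9, 2): (1691, 2.7), (10, 2): (1856, 2.7), (11, 2): (1936, 2.8), (12, 2): (2100, 2.9),
}

def nearest_hecc(area):
    """HECC configuration whose area is closest to the given ECC area."""
    return min(HECC, key=lambda cfg: abs(HECC[cfg][0] - area))

def time_reduction(ecc_cfg):
    """Return (paired HECC configuration, relative time reduction vs. the ECC one)."""
    area, t_ecc = ECC[ecc_cfg]
    hecc_cfg = nearest_hecc(area)
    return hecc_cfg, 1.0 - HECC[hecc_cfg][1] / t_ecc

for cfg in sorted(ECC):
    hecc_cfg, gain = time_reduction(cfg)
    print(f"ECC {cfg} vs HECC {hecc_cfg}: {100 * gain:.0f} % less time")
```

With this pairing every ECC configuration has a faster HECC counterpart of comparable area, which agrees qualitatively with the slide's claim.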
Comparison ECC 256 vs HECC 128 (6/7)
Figure: bar charts with axes labelled "% usage" (0 to 100), "× area" and "speedup" for selected configurations (nM,nB): ECC 1,1 1,2 1,4 2,4 3,4 4,4 and HECC 1,1 1,2 2,1 3,1 3,2 5,2 8,2
Comparison ECC 256 vs HECC 128 (7/7)

Source     FPGA        area                   freq.   duration [k]P
                       slices / DSP blocks    MHz     ms
ECC 1,2    Spartan 6   573 / 0                233     17.7
ECC 1,4    Spartan 6   673 / 0                233     11.1
ECC 2,4    Spartan 6   942 / 0                220     6.2
ECC 3,4    Spartan 6   1 130 / 0              214     5.4
[7]        Virtex-5    1 725 / 37             291     0.38
[7]        Virtex-4    4 655 / 37             250     0.44
[6]        Virtex-4    13 661 / 0             43      9.2
[6]        Virtex-4    20 123 / 0             43      7.7
Conclusion & Current/Future Works
• HECC is efficient in hardware (40 % speedup vs ECC)
• Flexible architecture and tools for research activities
• Advanced recoding schemes are efficient in hardware
Current/future works:
• Hardware implementation of halving based method(s)
• Protections against fault injection
• HECC extensions of the accelerator (and tools)
• ASIC (CMOS 65nm) implementation of the accelerator
• Side channel evaluation of (some) proposed protections
• HW/SW Code distribution under free license
• More advanced architecture/circuit level protections
• Collaboration with other research groups
Our Long Term Objectives
Study the links between:
• curves
• arithmetic algorithms
• $\mathbb{F}_q$, pts representations
• architecture & units
• circuit styles
to ensure
• high security against
  - theoretical attacks
  - physical attacks
• low design cost
• low silicon cost
• low energy(/power)
• high performances
• high flexibility
Figure: area 1 → 1 + a, delay 1 → 1 + t, energy 1 → 1 + e, with a, t, e ∈ {0 %, 5 %, 10 %, . . . , 100 %}; security 1 → ×10, ×100
References
[1] T. Chabrier, D. Pamula, and A. Tisserand. Hardware implementation of DBNS recoding for ECC processor. In Proc. 44th Asilomar Conference on Signals, Systems and Computers, pages 1129–1133, Pacific Grove, California, U.S.A., November 2010. IEEE.
[2] T. Chabrier and A. Tisserand. On-the-fly multi-base recoding for ECC scalar multiplication without pre-computations. In A. Nannarelli, P.-M. Seidel, and P. T. P. Tang, editors, Proc. 21st Symposium on Computer Arithmetic (ARITH), pages 219–228, Austin, TX, U.S.A., April 2013. IEEE Computer Society.
[3] J. Chen, A. Tisserand, E. Popovici, and S. Cotofana. Asynchronous charge sharing power consistent Montgomery multiplier. In J. Sparso and E. Yahya, editors, Proc. 21st IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC), pages 132–138, Mountain View, California, U.S.A., May 2015.
[4] J. Chen, A. Tisserand, E. M. Popovici, and S. Cotofana. Robust sub-powered asynchronous logic. In J. Becker and M. R. Adrover, editors, Proc. 24th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS), pages 1–7, Palma de Mallorca, Spain, September 2014. IEEE.
[5] G. Gallin, A. Tisserand, and N. Veyrat-Charvillon. Comparaison expérimentale d’architectures de crypto-processeurs pour courbes elliptiques et hyper-elliptiques. In Actes Conférence d’informatique en Parallélisme, Architecture et Système (ComPAS), Lille, France, June 2015. Best paper award, architecture track.
[6] S. Ghosh, M. Alam, D. Roychowdhury, and I. S. Gupta. Parallel crypto-devices for GF(p) elliptic curve multiplication resistant against side channel attacks. Computers and Electrical Engineering, 35(2):329–338, March 2009.
[7] Y. Ma, Z. Liu, W. Pan, and J. Jing. A high-speed elliptic curve cryptographic processor for generic curves over GF(p). In Proc. 20th International Workshop on Selected Areas in Cryptography (SAC), volume 8282 of LNCS, pages 421–437, Burnaby, BC, Canada, August 2013. Springer.
[8] D. Pamula. Arithmetic Operators on GF($2^m$) for Cryptographic Applications: Performance - Power Consumption - Security Tradeoffs. PhD thesis, University of Rennes 1 and Silesian University of Technology, December 2012.
[9] D. Pamula, E. Hrynkiewicz, and A. Tisserand. Analysis of GF($2^{233}$) multipliers regarding elliptic curve cryptosystem applications. In 11th IFAC/IEEE International Conference on Programmable Devices and Embedded Systems (PDeS), pages 252–257, Brno, Czech Republic, May 2012.
[10] D. Pamula and A. Tisserand. GF($2^m$) finite-field multipliers with reduced activity variations. In 4th International Workshop on the Arithmetic of Finite Fields, volume 7369 of LNCS, pages 152–167, Bochum, Germany, July 2012. Springer.
[11] D. Pamula and A. Tisserand. Fast and secure finite field multipliers. In Proc. Euromicro Conference on Digital System Design (DSD), pages 1–8, Funchal, Portugal, August 2015.
[12] J. Proy, N. Veyrat-Charvillon, A. Tisserand, and N. Meloni. Full hardware implementation of short addition chains recoding for ECC scalar multiplication. In Actes Conférence d’informatique en Parallélisme, Architecture et Système (ComPAS), Lille, France, June 2015.
The end, questions?
Contact:
• mailto:[email protected]
• http://people.irisa.fr/Arnaud.Tisserand/
• CAIRN Group http://www.irisa.fr/cairn/
• IRISA Laboratory, CNRS–INRIA–Univ. Rennes 1, 6 rue Kerampont, CS 80518, F-22305 Lannion cedex, France
Thank you