PowerPoint Presentation · 2018. 10. 24. · Title: PowerPoint Presentation
PowerPoint Presentation · Title: PowerPoint Presentation Author: Stv Created Date: 4/26/2018...
An Energy-Efficient Reconfigurable
DTLS Cryptographic Engine for
End-to-End Security in IoT Applications
Utsav Banerjee, Chiraag Juvekar, Andrew Wright,
Arvind, Anantha P. Chandrakasan
Massachusetts Institute of Technology
Motivation
[Figure: Sensor Node, Untrusted Gateway, Cloud Server]
Goal: end-to-end security
Need to establish a secure channel between two parties in the
presence of untrusted network components
End-to-End Security through DTLS
DTLS: Datagram Transport Layer Security
Phase I: Handshake
Mutually authenticated key exchange establishes secure channel
Phase II: Application Data
Secure channel used for encrypted data communication
Challenges
[Figure: Micro-Processor with I$ and DMEM]
DTLS in software:
• ~30 mJ
• ~90 KB code
Energy constraints; increased processor memory usage
Elliptic curve cryptography (ECC) operations account for
99% of DTLS handshake energy
Proposed Solution
[Figure: Micro-Processor with I$ and DMEM, alongside a DTLS Engine with DTLS Controller, ECC, AES and SHA blocks]
DTLS in hardware:
• ~70 µJ
• ~8 KB code
DTLS Engine: Energy-efficient cryptographic hardware with dedicated protocol controller
Hardware-accelerated DTLS is 438x more energy-efficient than software
Elliptic Curve Cryptography
Elliptic Curve Scalar Multiplication (ECSM):
Inputs: Point P, Scalar k
Output: Point Q = kP
Security protocols may choose from a wide variety of curves
⇒ Motivation for reconfigurable ECC hardware
Prior work in hardware ECC:
WISTP 2011 (NIST P-192), CHES 2015 (Curve25519), …
[Figure: points P and Q = kP]
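The ECSM definition above (inputs P and k, output Q = kP) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration only: the toy curve y^2 = x^3 + 2x + 2 over F_17 with base point (5, 1), the function names, and the plain double-and-add loop. The slides do not describe the algorithm the chip uses, and plain double-and-add is not SPA-secure.

```python
# Toy short-Weierstrass curve y^2 = x^3 + A*x + B over F_P.
# Assumed textbook example, not a curve from the slides.
P_MOD, A, B = 17, 2, 2
G = (5, 1)  # base point; the point at infinity is represented as None


def ec_add(p1, p2):
    """Add two affine points on the toy curve."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # P + (-P) is the point at infinity
    if p1 == p2:
        # Tangent slope for point doubling
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:
        # Chord slope for adding distinct points
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def ecsm(k, point):
    """Left-to-right double-and-add: returns k * point."""
    result = None
    for bit in bin(k)[2:]:
        result = ec_add(result, result)    # double
        if bit == "1":
            result = ec_add(result, point)  # add
    return result
```

Calling `ecsm(k, G)` walks the bits of the scalar from most significant to least significant, so the cost grows with the bit length of k rather than with k itself.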
ECC Architecture
1 Efficient modular multiplier
Multiplication with interleaved modular reduction
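As a rough illustration of multiplication with interleaved modular reduction, the sketch below uses a textbook bit-serial scheme: the accumulator is reduced after every shift-and-add step, so it never grows beyond a few multiples of the modulus. The function name and the bit-serial structure are assumptions; the slides do not specify the datapath width or the exact algorithm of the hardware multiplier.

```python
def interleaved_modmul(a, b, p):
    """Compute (a * b) mod p, reducing after every partial-product step.

    Assumes 0 <= a, b < p. Illustrative bit-serial variant only.
    """
    acc = 0
    for i in reversed(range(p.bit_length())):
        acc <<= 1                 # shift the accumulator (acc < 2p)
        if (a >> i) & 1:
            acc += b              # add the partial product (acc < 3p)
        # Two conditional subtractions bring acc back below p
        if acc >= p:
            acc -= p
        if acc >= p:
            acc -= p
    return acc
```

Because reduction is folded into each step, no double-width product is ever formed, which is the property that makes this style attractive for compact hardware.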
2 Comb pre-computations [ISPEC 2005]
3 Dedicated modular inverter
[Figure: chart with labels "31 Inv (Fermat)", "1220 Mul", "128 Inv (Euclid)", "320 Mul" and "1.9x"]
Affine coordinates with Euclidean inverter adds < 10% to total system area
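The Fermat and Euclid labels refer to two standard ways of computing a modular inverse. The sketch below shows both in their textbook form, assuming a prime modulus; the function names are illustrative, and the variant implemented by the chip's dedicated inverter (for example a binary Euclidean algorithm) is not stated in the slides.

```python
def inv_fermat(a, p):
    """Inverse via Fermat's little theorem: a^(p-2) mod p, for prime p.

    Built entirely from modular multiplications.
    """
    return pow(a, p - 2, p)


def inv_euclid(a, p):
    """Inverse via the extended Euclidean algorithm."""
    r0, r1 = p, a % p
    t0, t1 = 0, 1
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        t0, t1 = t1, t0 - q * t1
    if r0 != 1:
        raise ValueError("a is not invertible modulo p")
    return t0 % p
```

Both functions return the same value for any nonzero a modulo a prime p; they differ in the operations they need, which is why the choice affects cycle count and area in hardware.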
ECSM Energy
For 256-bit prime curve:
• Comb pre-computations ≈ 320K cycles
• SPA-secure ECSM ≈ 180K cycles
At VDD = 0.8 V and f = 16 MHz:
Operation              Time (s)   Energy (µJ)
Software Comb + ECSM   8.5        4180
Hardware Comb + ECSM   0.031      17.60
238x reduction in energy compared to S/W
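The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below assumes that the hardware time is simply the sum of the two approximate cycle counts divided by the 16 MHz clock; the variable names are illustrative.

```python
# Approximate cycle counts for a 256-bit prime curve (from the slide)
comb_cycles = 320_000
ecsm_cycles = 180_000
clock_hz = 16e6  # f = 16 MHz at VDD = 0.8 V

# Hardware time: total cycles divided by clock frequency
hw_time_s = (comb_cycles + ecsm_cycles) / clock_hz  # 0.03125 s, listed as 0.031 s

# Energy ratio between the software and hardware rows
sw_energy_uj = 4180.0
hw_energy_uj = 17.60
energy_ratio = sw_energy_uj / hw_energy_uj  # about 237.5, reported as 238x
```

The cycle counts are approximate, so the computed time agrees with the table only to the precision shown there.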
ECC Reconfigurability
Supports all Weierstrass and Montgomery curves over primes up to 256 bits
Supported elliptic curves include:
• NIST P-256
• Curve25519
• brainpoolP256t1
• BN(2,254)
• secp256k1
• ANSSI FRP256v1
Base point ECSM measurements for 160b, 192b, 224b and 256b prime curves
DTLS Engine Overview
1 Cryptographic accelerators
Improves energy-efficiency and performance
2 DTLS controller
Reduces program code size
3 DTLS RAM
Reduces data memory usage
Chip Architecture
• RISC-V processor with 3-stage pipeline
• 16 KB I-Cache and 64 KB D-Mem
• SD card used to store larger programs
• Memory-mapped interface to access
DTLS engine and peripherals
Chip Specifications (DTLS Engine)
[Figure: Chip micrograph of the 2 mm x 2 mm die showing DMEM, I$, RISC-V and DTLS Engine]
Chip Specifications
Technology 65 nm LP CMOS
Supply voltage 0.8 - 1.2 V
Package 64-pin QFN
Die size 2 mm x 2 mm
DTLS Cryptographic Engine
Logic gates 149k (NAND2 equiv.)
SRAM 6.75 KB
Max. frequency 16 MHz at 0.8 V & 20 MHz at 1.2 V
DTLS energy 44.08 µJ (Handshake) at 0.8 V
0.89 nJ/B (App. Data) at 0.8 V
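To give a feel for how the two DTLS energy figures combine, the sketch below estimates the engine energy for one session. It assumes a single handshake followed by application data, that the two contributions simply add, and that radio and processor energy are excluded; the function and constant names are hypothetical.

```python
HANDSHAKE_UJ = 44.08          # handshake energy at 0.8 V (from the slide)
APP_DATA_NJ_PER_BYTE = 0.89   # application data energy at 0.8 V (from the slide)


def session_energy_uj(payload_bytes):
    """Estimated DTLS engine energy in µJ for one handshake plus payload.

    Assumes a simple additive model; not a measured result.
    """
    return HANDSHAKE_UJ + APP_DATA_NJ_PER_BYTE * payload_bytes / 1000.0
```

Under this model, `session_energy_uj(1000)` is about 44.97 µJ, so the handshake dominates the total for short payloads.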
Chip Specifications (RISC-V)
RISC-V Processor
Logic gates 34k (NAND2 equiv.)
SRAM 16 KB Instr. Cache & 64 KB Data Mem.
Max. frequency 20 MHz at 0.8 V & 78 MHz at 1.2 V
Dhrystone energy 40.36 µW/MHz at 0.8 V
(0.96 DMIPS/MHz)
DTLS Energy-Efficiency
Comparison of 3 DTLS implementations:
• SW: Full software DTLS on the RISC-V
• SW + HW: DTLS controller in software & cryptographic primitives in hardware
• HW: Full hardware DTLS on the DTLS engine
Improvements in speed and energy-efficiency primarily due to efficient crypto hardware
[Figure: comparison chart annotated with 518x and 438x]
DTLS Memory Usage
Comparison of 3 DTLS implementations:
• SW: Full software DTLS on the RISC-V
• SW + HW: DTLS controller in software & cryptographic primitives in hardware
• HW: Full hardware DTLS on the DTLS engine
[Figure: memory usage chart annotated with 78KB and 20KB]
Reduction in code and data memory usage primarily due to hardware DTLS controller
Performance Comparison
Specifications                WISTP’11 (a)       CHES’15 (a)        VLSIC’17    This work
Technology                    350 nm             130 nm             40 nm       65 nm
Supply voltage (V)            3.3                1.2                0.7         0.8
Cryptographic Accelerator
Logic gates / SRAM            12.8k / 0.25 KB    32.6k / 0.28 KB    ‒ / 8 KB    149k / 6.75 KB
Hardware ECSM support         Only NIST P-192    Only Curve25519    ‒ (b)       All prime curves up to 256 bits
Base point ECSM energy (µJ)   1423.6 (192 bit)   ‒                  ‒ (b)       3.11 (192 bit)
                              ‒                  56.8 (255 bit)                 6.34 (256 bit)
AES energy (nJ)               8558.04            521.01 (c)         7.05        6.21
SHA energy (nJ)               6876.3 (d)         ‒                  48.7 (d)    24.3 (d)
DTLS in hardware              No                 No                 No          Yes
(a) Post-synthesis data reported for [4] and [5]
(b) [6] implements only modular multiplication in binary fields in hardware
(c) [5] implements Salsa20 instead of AES for encryption
(d) [4] implements SHA-1; [6] implements SHA-3; This work implements SHA-2
Summary
• Energy-efficient DTLS cryptographic engine for IoT
• Reconfigurable ECC is 238x more energy-efficient than
software implementation
• DTLS controller reduces code and data memory usage
by 78 KB and 20 KB respectively
• Protocols beyond DTLS can also be implemented using
the RISC-V processor and cryptographic accelerators
Acknowledgements
• Qualcomm Innovation Fellowship and Texas Instruments
for funding
• TSMC University Shuttle Program for chip fabrication
Questions