Experimental Testing of the Gigabit IPSec-Compliant ...kgaj/publications/conferences/GMU...CAST-128...
Experimental Testing of the Gigabit IPSec-Compliant Implementations of Rijndael and Triple DES Using SLAAC-1V FPGA Accelerator Board
Paweł Chodowiec & Kris Gaj, George Mason University
Peter Bellows & Brian Schott, USC - Information Sciences Institute
http://ece.gmu.edu/crypto-text.htm
IPSec: Transport Mode
[Figure: Host - Gateway - Internet - Gateway - Host; the hosts themselves are the cryptographic end points.]
IPSec: Tunnel Mode
[Figure: several hosts behind each of two security gateways connected through the Internet; the security gateways are the cryptographic end points.]
IPSec: Need for hardware accelerators
• large number of security associations processed by a single device
• cryptographic operations computationally expensive compared with regular IP operations
Cryptographic transformations in IPSec
Security services: confidentiality (this project), authentication, key exchange
IPSec: Cryptographic algorithms, Confidentiality (1)

Required (document: RFC 2405):
Algorithm    Key length [bits]
DES          56

Optional (document: RFC 2451):
Algorithm    Key length [bits]   Popular sizes      Default size
Triple DES   168                 168                168
Blowfish     40..448             128                128
CAST-128     40..128             40, 64, 80, 128    128
IDEA         128                 128                128
RC5          40..2040            40, 64, 80, 128    128
Breaking DES: Deep Crack (Electronic Frontier Foundation, 1998)
• total cost: $220,000
• average time of search: 4.5 days/key
• 1800 ASIC chips, 40 MHz clock
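The quoted figures can be cross-checked with a few lines of arithmetic. This is a sketch under one stated assumption: an average search examines half of the 2^56 DES key space. It estimates the key-testing rate the figures imply for the whole machine, per chip, and per chip per clock cycle.

```python
# Rough consistency check of the Deep Crack figures quoted above.
# Assumption: an average search examines half of the 2**56 DES key space.
KEY_BITS = 56
AVG_SEARCH_DAYS = 4.5
NUM_CHIPS = 1800
CLOCK_HZ = 40e6


def implied_key_rate(key_bits: int, avg_days: float) -> float:
    """Keys tested per second implied by an average search time."""
    keys_on_average = 2 ** (key_bits - 1)  # half of the key space
    seconds = avg_days * 24 * 3600
    return keys_on_average / seconds


total_rate = implied_key_rate(KEY_BITS, AVG_SEARCH_DAYS)
per_chip_rate = total_rate / NUM_CHIPS
keys_per_chip_per_cycle = per_chip_rate / CLOCK_HZ

print(f"whole machine: {total_rate:.3e} keys/s")
print(f"per chip: {per_chip_rate:.3e} keys/s")
print(f"per chip per clock cycle: {keys_per_chip_per_cycle:.2f} keys")
```

Under that assumption the figures imply roughly 9 x 10^10 keys per second for the whole machine, or slightly more than one key per chip per clock cycle.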
Triple DES (Diffie, Hellman, 1977)
EDE mode: plaintext → DES encryption (K1, 56 bits) → DES^-1 decryption (K2, 56 bits) → DES encryption (K3, 56 bits) → ciphertext
$C = E_{K_3}(D_{K_2}(E_{K_1}(P)))$
K = (K1, K2, K3): 168 bits of the key
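The EDE composition can be sketched in a few lines. A DES implementation would not fit here, so the sketch assumes a hypothetical toy 64-bit block cipher (`toy_encrypt`/`toy_decrypt`, NOT secure, with 64-bit rather than 56-bit keys) in place of DES; only the E-D-E structure is the point.

```python
MASK64 = (1 << 64) - 1


# Hypothetical toy 64-bit block cipher standing in for DES; NOT secure.
def toy_encrypt(key: int, block: int) -> int:
    x = ((block ^ key) + key) & MASK64
    return ((x << 13) | (x >> 51)) & MASK64  # rotate left by 13


def toy_decrypt(key: int, block: int) -> int:
    x = ((block >> 13) | (block << 51)) & MASK64  # rotate right by 13
    return ((x - key) & MASK64) ^ key


def ede_encrypt(k1: int, k2: int, k3: int, plaintext: int) -> int:
    """EDE: encrypt with K1, decrypt with K2, encrypt with K3."""
    return toy_encrypt(k3, toy_decrypt(k2, toy_encrypt(k1, plaintext)))


def ede_decrypt(k1: int, k2: int, k3: int, ciphertext: int) -> int:
    """Inverse: decrypt with K3, encrypt with K2, decrypt with K1."""
    return toy_decrypt(k1, toy_encrypt(k2, toy_decrypt(k3, ciphertext)))
```

With K1 = K2 the first two stages cancel, so EDE reduces to single encryption with K3; in particular, K1 = K2 = K3 = K gives single encryption with K, which keeps an EDE unit compatible with single DES.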
AES Contest Effort
• June 1998: 15 candidates from USA, Canada, Belgium, France, Germany, Norway, UK, Israel, Korea, Japan, Australia, Costa Rica
• Round 1 (criteria: security, software efficiency)
• August 1999: 5 final candidates: Mars, RC6, Rijndael, Serpent, Twofish
• Round 2 (criteria: security, hardware efficiency)
• October 2000: 1 winner: Rijndael (Belgium)
IPSec: Cryptographic algorithms, Confidentiality (2)

Proposed (document: Internet Draft, November 2000):
Algorithm        Key length [bits]   Popular sizes    Default size
AES (Rijndael)   128, 192, 256       128, 192, 256    128
MARS             128..448            128, 192, 256    128
RC6              ≤ 2040              128, 192, 256    128
Serpent          ≤ 256               128, 192, 256    128
Twofish          ≤ 256               128, 192, 256    128
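RFC 2405 modes of operation: CBC
[Figure: CBC encryption and decryption chains for blocks M1 ... MN and C1 ... CN, starting from IV.]
Encryption: $C_0 = IV,\; C_i = E(M_i \oplus C_{i-1})$ for $i = 1, \dots, N$
Decryption: $M_i = D(C_i) \oplus C_{i-1}$ for $i = 1, \dots, N$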
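A minimal sketch of the CBC equations, assuming a hypothetical toy 64-bit cipher (`toy_encrypt`/`toy_decrypt`, NOT secure) in place of DES, with blocks represented as Python integers. The loop in `cbc_encrypt` makes the chaining explicit: each ciphertext block is an input to the next encryption.

```python
MASK64 = (1 << 64) - 1


# Hypothetical invertible toy cipher on 64-bit blocks, standing in for DES.
# It only illustrates the chaining and is NOT secure.
def toy_encrypt(key: int, block: int) -> int:
    x = ((block ^ key) + key) & MASK64
    return ((x << 13) | (x >> 51)) & MASK64  # rotate left by 13


def toy_decrypt(key: int, block: int) -> int:
    x = ((block >> 13) | (block << 51)) & MASK64  # rotate right by 13
    return ((x - key) & MASK64) ^ key


def cbc_encrypt(key: int, iv: int, blocks: list[int]) -> list[int]:
    """C_i = E(M_i XOR C_{i-1}), C_0 = IV: each step needs the previous C."""
    out, prev = [], iv
    for m in blocks:
        prev = toy_encrypt(key, m ^ prev)
        out.append(prev)
    return out


def cbc_decrypt(key: int, iv: int, blocks: list[int]) -> list[int]:
    """M_i = D(C_i) XOR C_{i-1}, C_0 = IV."""
    out, prev = [], iv
    for c in blocks:
        out.append(toy_decrypt(key, c) ^ prev)
        prev = c
    return out
```

Changing M1 changes every later ciphertext block of the packet; this dependency is what forces the blocks of one packet through the encryption circuit one after another.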
Modes of operation: Current standard - CBC
[Figure: CBC encryption chain from IV and M1 ... MN to C1 ... CN; each encryption E needs the previous ciphertext block.]
Problems:
- No parallel processing of blocks from the same packet
- No speed-up by preprocessing
- No integrity or authentication
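Counter mode
[Figure: the counter values IV, IV+1, ..., IV+N are encrypted (E) to produce K0 ... KN, which are XORed with M0 ... MN to give C0 ... CN.]
$K_i = E(IV + i),\; C_i = M_i \oplus K_i$ for $i = 0, \dots, N$
Features:
+ Potential for parallel processing
+ Speed-up by preprocessing
- No integrity or authentication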
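Counter mode can be sketched the same way, assuming a hypothetical toy 64-bit cipher in place of a real block cipher and a 64-bit counter that wraps around. Because `keystream_block` depends only on the key and IV + i, the keystream can be computed before the data arrives and for all blocks independently; only the encryption function E is ever used.

```python
MASK64 = (1 << 64) - 1


def toy_encrypt(key: int, block: int) -> int:
    """Hypothetical toy 64-bit cipher standing in for E (NOT secure)."""
    x = ((block ^ key) + key) & MASK64
    return ((x << 13) | (x >> 51)) & MASK64


def keystream_block(key: int, iv: int, i: int) -> int:
    """K_i = E(IV + i); independent of every other block."""
    return toy_encrypt(key, (iv + i) & MASK64)


def ctr_crypt(key: int, iv: int, blocks: list[int]) -> list[int]:
    """C_i = M_i XOR K_i. The same function also decrypts."""
    # The keystream could be precomputed or spread over parallel units;
    # here it is generated in reverse order to show that order does not matter.
    ks = {i: keystream_block(key, iv, i) for i in reversed(range(len(blocks)))}
    return [m ^ ks[i] for i, m in enumerate(blocks)]
```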
Operating Modes Contest
• 4 old modes (CBC, CFB, OFB, ECB) + counter mode (summer 2001) → 5 standard modes
• April 2001: 10 new candidates from Egypt, Estonia, Norway, Sweden, Thailand, USA → new standard modes (2002)
IPSec: Why reconfigurable hardware?
Frequently changing algorithms and their parameters:
- AES
- new modes of operation
- new hash functions
- parameters of public key cryptosystems
Capability for reconfiguration =
- algorithm agility
- scalable security
- flexible architecture
- remote error correction
Reconfigurability
An external ROM and a microprocessor enable changing the function of an FPGA in several milliseconds.
• Encryption vs. decryption vs. key scheduling: one FPGA reconfigured between key scheduling, encryption, and decryption, 5-15 ms per change
• Various algorithms: one FPGA reconfigured between Triple DES, IDEA, and AES, 5-15 ms per change
SLAAC-1V
[Figure: SLAAC-1V board. Standard interface (PCI interface + control module): 64-bit/66 MHz PCI, interface device (IF), and configuration control device. User-programmed part: Xilinx FPGA devices X0, X1, X2 with SRAM, connected by a 72-bit ring bus (64-bit data + 8-bit control) and a 72-bit shared bus.]
Target FPGA devices
Xilinx Virtex XCV1000
• 0.22 µm CMOS process
• 1 million equivalent logic gates
• 12,288 CLB slices
• up to 200 MHz clock
• 32 4-kbit block RAMs
[Figure: Virtex device structure: configurable logic block slices (CLB slices), block RAMs, programmable interconnects.]
Methodology and Tools
Code in VHDL
1. Functional simulation (Aldec, Active-HDL)
2. Synthesis and implementation (Xilinx, Foundation Series v. 3.1i), producing a netlist with timing and a bitstream
3. Timing simulation of the netlist with timing (Aldec, Active-HDL)
4. Experimental testing of the bitstream (USC-ISI, SLAAC-1V FPGA board)
Steps 3 and 4: implementation verification
Primary parameters of hardware implementations for secret-key block ciphers
• Latency: time to encrypt/decrypt a single block of data (Mi → Ci)
• Throughput: number of bits encrypted/decrypted in a unit of time (Mi, Mi+1, Mi+2 → Ci, Ci+1, Ci+2)
$\text{Throughput} = \dfrac{\text{Block\_size} \cdot \text{Number\_of\_blocks\_processed\_simultaneously}}{\text{Latency}}$
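The throughput formula written out as code, with hypothetical illustration values (a 128-bit block that needs 10 clock cycles at 100 MHz); these are not results from this work.

```python
def latency_s(cycles_per_block: int, clock_hz: float) -> float:
    """Time to encrypt/decrypt one block: clock cycles times clock period."""
    return cycles_per_block / clock_hz


def throughput_bps(block_size_bits: int, blocks_simultaneously: int,
                   latency: float) -> float:
    """Throughput = Block_size * Number_of_blocks_processed_simultaneously / Latency."""
    return block_size_bits * blocks_simultaneously / latency


# Hypothetical example: one block in flight, 10 cycles per block, 100 MHz.
lat = latency_s(10, 100e6)
print(f"latency: {lat * 1e9:.0f} ns")
print(f"throughput: {throughput_bps(128, 1, lat) / 1e6:.0f} Mbit/s")
```

With these values the latency is 100 ns and the throughput 1280 Mbit/s; processing two independent blocks at once with the same latency would double the throughput.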
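Dependence of the encryption time on latency and throughput
$\text{Encryption time} = \text{Latency} + \dfrac{\text{Message\_size} - \text{Block\_size}}{\text{Throughput}}$
[Figure: encryption time vs. message size: a straight line starting at Latency (one block) with slope 1/Throughput.]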
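A sketch of the encryption-time formula, assuming the message is a whole number of blocks. The hypothetical values (100 ns latency, 128-bit block, 10-block message) show that for a unit with one block in flight the formula reduces to number of blocks times latency, while a unit with the same latency and ten times the throughput finishes the same message in under a fifth of the time.

```python
def encryption_time_s(message_bits: int, block_bits: int,
                      latency: float, throughput: float) -> float:
    """Encryption time = Latency + (Message_size - Block_size) / Throughput.

    The first block takes the full latency; the remaining bits stream out
    at the circuit throughput.
    """
    if message_bits < block_bits:
        raise ValueError("message must contain at least one block")
    return latency + (message_bits - block_bits) / throughput


# Hypothetical 10-block message (1280 bits) with 128-bit blocks.
basic = encryption_time_s(1280, 128, 100e-9, 1.28e9)      # one block in flight
pipelined = encryption_time_s(1280, 128, 100e-9, 12.8e9)  # ten blocks in flight
print(f"basic: {basic * 1e9:.0f} ns, pipelined: {pipelined * 1e9:.0f} ns")
```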
Typical Flow Diagram of a Secret-Key Block Cipher
1. Initial transformation with Round Key[0]
2. i := 1
3. Cipher round with Round Key[i]; i := i + 1
4. If i ≤ #rounds, go to step 3 (the cipher round is executed #rounds times)
5. Final transformation with Round Key[#rounds+1]
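The flow diagram maps onto a short loop. This sketch is generic: `initial`, `cipher_round`, and `final` are hypothetical caller-supplied functions (for a concrete cipher they would be its initial transformation, round, and final transformation), and `round_keys` is assumed to hold Round Key[0] through Round Key[#rounds+1].

```python
from typing import Callable, Sequence, TypeVar

S = TypeVar("S")  # cipher state
K = TypeVar("K")  # round key


def run_cipher(block: S, round_keys: Sequence[K],
               initial: Callable[[S, K], S],
               cipher_round: Callable[[S, K], S],
               final: Callable[[S, K], S]) -> S:
    """Initial transformation, #rounds cipher rounds, final transformation."""
    n_rounds = len(round_keys) - 2          # keys 0 and #rounds+1 are extra
    state = initial(block, round_keys[0])   # Round Key[0]
    i = 1
    while i <= n_rounds:                    # executed #rounds times
        state = cipher_round(state, round_keys[i])
        i += 1
    return final(state, round_keys[n_rounds + 1])
```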
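Basic iterative architecture
[Figure: a multiplexer selects the input block or the fed-back register value; combinational logic implementing one round, driven by the round key, feeds the register.]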
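A cycle-level sketch of the basic iterative architecture, under simplifying assumptions: one round is computed per clock cycle, the round keys are already available, and the round logic is a hypothetical function passed in by the caller. The multiplexer selects the new input block on the first cycle and the register contents afterwards, so a cipher with #rounds rounds needs #rounds cycles per block.

```python
from typing import Callable, List


class BasicIterativeUnit:
    """Multiplexer -> one-round combinational logic -> register, with feedback."""

    def __init__(self, round_logic: Callable[[int, int], int], round_keys: List[int]):
        self.round_logic = round_logic  # combinational logic for one round
        self.round_keys = round_keys    # one round key per clock cycle
        self.register = 0
        self.cycles = 0

    def process_block(self, data_in: int) -> int:
        for r, key in enumerate(self.round_keys):
            # Multiplexer: new input on the first cycle, feedback afterwards.
            mux_out = data_in if r == 0 else self.register
            # Clock edge: the register captures the output of one round.
            self.register = self.round_logic(mux_out, key)
            self.cycles += 1
        return self.register
```

Since only one block is in flight, the throughput of this architecture is Block_size · clock frequency / #rounds.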
Triple DES: Basic Architecture, Encryption/Decryption Core
[Figure: inputs Ln-1 and Rn-1 (32 bits each) pass through multiplexers mux1-mux4 and the F function, which takes round key Kn, to produce outputs Ln and Rn.]
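The core implements a Feistel round, L_n = R_{n-1} and R_n = L_{n-1} XOR F(R_{n-1}, K_n). The sketch below uses a hypothetical toy `toy_f` in place of the DES F function (expansion, S-boxes, permutation) and checks the property the Rijndael vs. Triple DES comparison relies on: the same round circuit decrypts when the round keys are applied in reverse order.

```python
MASK32 = 0xFFFFFFFF


def toy_f(right: int, round_key: int) -> int:
    """Hypothetical stand-in for the DES F function (NOT DES).
    A Feistel network works even if F is not invertible."""
    x = (right * 0x9E3779B1 + round_key) & MASK32
    return x ^ (x >> 15)


def feistel_round(left: int, right: int, round_key: int) -> tuple[int, int]:
    """L_n = R_{n-1};  R_n = L_{n-1} XOR F(R_{n-1}, K_n)."""
    return right, left ^ toy_f(right, round_key)


def feistel(block64: int, round_keys: list[int]) -> int:
    """Run all rounds, then swap halves (as DES does) so that the
    identical circuit decrypts with the round keys reversed."""
    left, right = block64 >> 32, block64 & MASK32
    for k in round_keys:
        left, right = feistel_round(left, right, k)
    return (right << 32) | left
```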
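Triple DES: Basic Architecture, Key scheduling
[Figure: the 64-bit key input passes through PC-1 (64 → 56 bits) into four banks of key memory; the 56-bit key is rotated by <<<1 or <<<2 for encryption (e) and by >>>1 or >>>2 for decryption (d), and PC-2 (56 → 48 bits) produces the 48-bit round key.]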
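The rotations in the key-scheduling unit follow the standard DES shift schedule (1 or 2 positions per round, FIPS 46-3). This sketch tracks only the two 28-bit halves C and D after PC-1 (PC-1 and PC-2 are fixed bit selections and are left out) and shows why the decryption path can use right rotations starting from the unrotated key: the 16 shifts add up to 28, a full rotation.

```python
# Left-rotation amounts of the 16 DES rounds (FIPS 46-3 key schedule).
SHIFTS = [1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1]
MASK28 = (1 << 28) - 1


def rotl28(x: int, r: int) -> int:
    return ((x << r) | (x >> (28 - r))) & MASK28


def rotr28(x: int, r: int) -> int:
    return ((x >> r) | (x << (28 - r))) & MASK28


def encryption_order(c0: int, d0: int) -> list[tuple[int, int]]:
    """(C_n, D_n) for n = 1..16, using left rotations <<<1 / <<<2.
    PC-2 (56 -> 48 bits) would be applied to each pair; it is omitted here."""
    c, d, out = c0, d0, []
    for s in SHIFTS:
        c, d = rotl28(c, s), rotl28(d, s)
        out.append((c, d))
    return out


def decryption_order(c0: int, d0: int) -> list[tuple[int, int]]:
    """(C_n, D_n) for n = 16..1, using right rotations >>>1 / >>>2.
    The shifts add up to 28, so (C_16, D_16) == (C_0, D_0) and the first
    decryption key needs no rotation at all."""
    c, d, out = c0, d0, []
    for s in [0] + SHIFTS[:0:-1]:
        c, d = rotr28(c, s), rotr28(d, s)
        out.append((c, d))
    return out
```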
AES - Rijndael: Basic Architecture
[Figure: data input passes through an encryption circuit (ByteSub, ShiftRow, MixColumn) and a decryption circuit (InvByteSub, InvShiftRow, InvMixColumn), with a shared ByteSub & InvByteSub unit and round keys supplied to each stage; registers R0-R4 (including R2a-R2d), two 16 x 128-bit buffers, blocks B1-B4, multiplexers mux1, mux2, M1, M2, and IV registers lead to the data output.]
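ByteSub and InvByteSub are byte-wise table lookups, and in hardware such tables are often realized as small ROMs. This sketch derives both tables from the Rijndael definition (multiplicative inverse in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1, followed by an affine transformation) rather than assuming how this particular design stores them.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return p


def gf_inv(a: int) -> int:
    """Multiplicative inverse in GF(2^8) as a^254; 0 maps to 0 by convention."""
    if a == 0:
        return 0
    result, base, e = 1, a, 254
    while e:
        if e & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        e >>= 1
    return result


def rotl8(x: int, r: int) -> int:
    return ((x << r) | (x >> (8 - r))) & 0xFF


def build_sboxes() -> tuple[list[int], list[int]]:
    sbox = [0] * 256
    for x in range(256):
        b = gf_inv(x)
        # Affine transformation over GF(2)
        sbox[x] = b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63
    inv_sbox = [0] * 256
    for x, y in enumerate(sbox):
        inv_sbox[y] = x  # InvByteSub is the inverse permutation
    return sbox, inv_sbox


SBOX, INV_SBOX = build_sboxes()
```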
AES - Rijndael: 3-in-1 Key Scheduling Unit
[Figure: from 64-bit input words, the unit computes two 32-bit words wi and wi+1 per clock cycle from previous words (wi-1 ... wi-8, wi-Nk, wi-Nk+1) using Rot, Sub, and Rcon[i/Nk]; the 64-bit output goes to the banks of round keys.]
[Figure: the main key (in 64-bit words) feeds the 3-in-1 key scheduling unit, which fills two 256 x 64-bit RAMs holding 16 banks of round keys; two 64-bit halves form each 128-bit round key.]
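The key-expansion recurrence that a 3-in-1 unit evaluates can be written in software following FIPS-197, for Nk = 4, 6, or 8 words (128-, 192-, and 256-bit keys). This sketch produces one 32-bit word per loop iteration, whereas the unit on the slide produces two words (64 bits) per clock cycle; in both cases the words come out in only one order, which is why round keys must be stored before use. The S-box is rebuilt inline so the block is self-contained.

```python
def _xtime(a: int) -> int:
    """Multiply by x in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a


def _gf_mul(a: int, b: int) -> int:
    p = 0
    while b:
        if b & 1:
            p ^= a
        a = _xtime(a)
        b >>= 1
    return p


def _build_sbox() -> list[int]:
    box = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if _gf_mul(x, y) == 1), 0)
        s = inv
        for r in range(1, 5):  # affine transformation
            s ^= ((inv << r) | (inv >> (8 - r))) & 0xFF
        box.append(s ^ 0x63)
    return box


SBOX = _build_sbox()


def sub_word(w: int) -> int:
    return int.from_bytes(bytes(SBOX[b] for b in w.to_bytes(4, "big")), "big")


def rot_word(w: int) -> int:
    return ((w << 8) | (w >> 24)) & 0xFFFFFFFF


def expand_key(key: bytes) -> list[int]:
    """Return the 4*(Nr+1) words w[0..] of the AES key schedule."""
    if len(key) not in (16, 24, 32):
        raise ValueError("key must be 128, 192 or 256 bits")
    nk = len(key) // 4
    nr = nk + 6
    w = [int.from_bytes(key[4 * i:4 * i + 4], "big") for i in range(nk)]
    rcon = 0x01
    for i in range(nk, 4 * (nr + 1)):
        temp = w[i - 1]
        if i % nk == 0:
            temp = sub_word(rot_word(temp)) ^ (rcon << 24)  # Rcon[i/Nk]
            rcon = _xtime(rcon)
        elif nk > 6 and i % nk == 4:
            temp = sub_word(temp)  # extra SubWord for 256-bit keys
        w.append(w[i - nk] ^ temp)
    return w
```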
Rijndael vs. Triple DES: External differences
          Triple DES   AES-Rijndael
input     64 bits      128 bits
output    64 bits      128 bits
key       168 bits     128, 192, and 256 bits
Rijndael vs. Triple DES: Internal differences
Rijndael:
• substitution-linear transformation network
• internal operations optimized for software and hardware
• separate encryption and decryption units
• larger area
• different maximum encryption and decryption speeds
Triple DES:
• Feistel network
• internal operations optimized for hardware
• the same circuit used for encryption and decryption
• compact design
• the same speed for encryption and decryption
Rijndael vs. Triple DES: Functional differences
Rijndael:
• round keys generated from the main key in only one order
• 1/4 or 1/2 of a round key per clock cycle
• round keys need to be precomputed and stored in internal memory
Triple DES:
• round keys generated from the main key in arbitrary order
• one round key per clock cycle
• round keys can be computed on the fly
Testing Procedure
1. Functional testing
   • tests based on NIST Special Publication 800-20: Known Answer Tests, Monte Carlo Test
2. Maximum clock frequency test
   • clock frequency varied using binary search
   • 1 GB of data encrypted or decrypted in the CBC mode
   • results compared with results from software implementation
3. Maximum encryption/decryption throughput test
   • maximum clock frequency
   • 4 GB of data encrypted or decrypted in the CBC mode
   • time necessary to complete all operations determined
Maximum Clock Frequency Test (1)
1. Generate and upload key and IV; set DMA to send and receive 1 GB of data.
2. Perform reference encryption/decryption in software.
3. Set upper and lower bounds for the clock frequency.
4. Test clock frequency = (upper bound + lower bound)/2.
5. Encrypt/decrypt the data in hardware at the test clock frequency.
6. If the result is the same as in software, lower bound = test clock frequency; otherwise, upper bound = test clock frequency.
7. If the bounds are not yet close, go to step 4; otherwise, stop.
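The flowchart is a bisection search. In this sketch `passes_at` is a hypothetical callback standing in for steps 5 and 6 (run the 1 GB CBC transfer at frequency f and compare with the software result); it assumes the starting lower bound passes and that the circuit works at every frequency below its maximum.

```python
from typing import Callable


def find_max_clock(passes_at: Callable[[float], bool],
                   lower_mhz: float, upper_mhz: float,
                   tolerance_mhz: float = 0.5) -> float:
    """Highest tested frequency at which hardware output matched software."""
    while upper_mhz - lower_mhz > tolerance_mhz:   # "bounds close?"
        test = (upper_mhz + lower_mhz) / 2
        if passes_at(test):        # result same as in software
            lower_mhz = test
        else:
            upper_mhz = test
    return lower_mhz
```

Each iteration halves the interval, so narrowing a 190 MHz range to 0.5 MHz takes only nine hardware runs.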
Results for basic architectures
Maximum clock frequency [MHz] (static analysis / experiment):
• Triple DES enc + dec: 72 / 91
• Rijndael enc + dec: 47 / 52
• Rijndael enc: 60 (static analysis)
Corresponding circuit throughputs [Mbit/s] (static analysis / derived from experimental clock frequency):
• Triple DES enc + dec: 91 / 116
• Rijndael enc + dec: 521 / 577
• Rijndael enc: 665 (static analysis)
• values shown for the experiment series: 108, 404
Use of resources: basic architecture
[Chart: percentage of the Virtex 1000 device resources (CLBs and block RAMs) used by Triple DES and Rijndael; values shown: 5%, 10%, 56%.]
Increasing throughput using parallel processing
[Figure: packets 1, 2, 3 (IV1, a1, a2, ..., aK; IV2, b1, b2, ..., bL; IV3, c1, c2, ..., cM), each processed by a separate encryption/decryption unit with its own memory of subkeys.]
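Blocks from different packets have no chaining dependence on one another, so a scheduler can interleave them. The sketch below assumes a hypothetical toy cipher in place of DES or Rijndael and a separate key per packet (mirroring the per-unit memory of subkeys); it encrypts packets round-robin and the result matches encrypting each packet on its own in CBC mode.

```python
MASK64 = (1 << 64) - 1


def toy_encrypt(key: int, block: int) -> int:
    """Hypothetical toy 64-bit cipher standing in for E (NOT secure)."""
    x = ((block ^ key) + key) & MASK64
    return ((x << 13) | (x >> 51)) & MASK64


def cbc_encrypt(key: int, iv: int, blocks: list[int]) -> list[int]:
    out, prev = [], iv
    for m in blocks:
        prev = toy_encrypt(key, m ^ prev)
        out.append(prev)
    return out


def interleaved_cbc_encrypt(packets: list[tuple[int, int, list[int]]]) -> list[list[int]]:
    """packets: (key, iv, blocks). In each step, take the next block of every
    packet that still has one; blocks in the same step are independent, so
    they could go to separate units (or pipeline stages) at the same time."""
    chain = [iv for _, iv, _ in packets]  # C_{i-1} of each packet
    outputs: list[list[int]] = [[] for _ in packets]
    longest = max((len(blocks) for _, _, blocks in packets), default=0)
    for step in range(longest):
        for p, (key, _, blocks) in enumerate(packets):
            if step < len(blocks):
                chain[p] = toy_encrypt(key, blocks[step] ^ chain[p])
                outputs[p].append(chain[p])
    return outputs
```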
Increasing throughput using pipelining
a) one round, no pipelining: register + combinational logic
b) one round = k pipeline stages: k registers, MUX
c) K rounds, each = k pipeline stages: K·k registers, MUX
d) all #rounds, each = k pipeline stages: #rounds·k registers
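Plugging the four architectures into the throughput formula gives a compact model. This sketch assumes every pipeline stage holds an independent block (for example from different packets, since CBC blocks of one packet cannot overlap), that K divides #rounds, and that `clock_hz` is whatever frequency the given design reaches; the numbers in the example are hypothetical.

```python
def pipelined_throughput_bps(block_bits: int, n_rounds: int, clock_hz: float,
                             k: int = 1, unrolled: int = 1) -> float:
    """Throughput = Block_size * blocks_in_flight / Latency.

    k        -- pipeline stages per round (a: k = 1; b-d: k >= 1)
    unrolled -- K rounds implemented in hardware (a, b: 1; d: #rounds)
    """
    if n_rounds % unrolled:
        raise ValueError("K must divide #rounds")
    latency = n_rounds * k / clock_hz   # k clock cycles per round
    blocks_in_flight = unrolled * k     # one block per pipeline register stage
    return block_bits * blocks_in_flight / latency


# Hypothetical 128-bit cipher with 10 rounds, every design clocked at 100 MHz.
for label, k, K in [("a", 1, 1), ("b", 4, 1), ("c", 2, 5), ("d", 2, 10)]:
    gbps = pipelined_throughput_bps(128, 10, 100e6, k, K) / 1e9
    print(f"{label}) k={k}, K={K}: {gbps:.2f} Gbit/s")
```

At a fixed clock frequency, inner-round pipelining alone (b) leaves the throughput unchanged, because latency in cycles and blocks in flight both grow by k; its benefit is a shorter critical path and therefore a higher clock frequency. Unrolling K rounds multiplies the throughput by K.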
Throughput [Mbit/s] of the basic, inner-round pipelining, and mixed pipelining architectures
[Chart: RC6, Serpent, Twofish, Rijndael, and 3DES; values shown (Mbit/s): 16,768; 15,232; 13,056; 12,160; 7,469; 3,805; 1,265; 994; 699; 431; 414; 177; 143; 135; 62.]
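Area [CLB slices] of the basic, inner-round pipelining, and mixed pipelining architectures
[Chart: Serpent, Twofish, 3DES, Rijndael, and RC6; values shown (CLB slices): 356; 375; 1,076; 1,711; 1,137; 3,458; 2,057 + 8 RAMs; 2,507; 4,507; 5,623; 12,600 + 80 RAMs; 19,700; 21,000; 46,800. The value 12,288 (the number of CLB slices in one XCV1000) is also marked, and the largest designs are annotated as requiring 2, 3, or 4 devices.]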
AES - Rijndael: Extended Architecture
[Figure: the basic-architecture datapath (encryption circuit with ByteSub, ShiftRow, MixColumn; decryption circuit with InvByteSub, InvShiftRow, InvMixColumn; registers R0-R4 including R2a-R2d; two 16 x 128-bit buffers; B1-B4; mux1, mux2, M1, M2; IV) extended with additional registers R5 and R6.]
Triple DES: Extended Architecture
[Figure: the 64-bit key input passes through PC-1 (64 → 56 bits) into 16 banks of main keys; a chain of Next key(1) ... Next key(16) units produces round keys K1 ... K16 for the rounds round(1) ... round(16).]
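Simplification of the key scheduling unit: extended architecture
[Figure: Next key(n) rotates its 56-bit input In-1 by <<< m (encryption, e) or >>> m (decryption, d) to give In, and PC-2 (56 → 48 bits) produces the round key Kn.]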
Triple DES: Key scheduling in basic architecture
[Figure: PC-1 (64 → 56 bits), four banks of key memory, rotations <<<1, <<<2 (encryption) and >>>1, >>>2 (decryption), and PC-2 (56 → 48 bits) producing the round key.]
Tentative results for extended architecture
Maximum clock frequency [MHz] (analysis / experiment):
Basic architectures:
• Triple DES enc + dec: 72 / 91
• Rijndael enc + dec: 47 / 52
• Rijndael enc: 60 (analysis)
Extended architecture:
• Rijndael enc + dec: 76 / 90
Corresponding circuit throughputs [Mbit/s] (analysis / experiment):
Basic architectures:
• Triple DES enc + dec: 91 / 116
• Rijndael enc + dec: 521 / 577
• Rijndael enc: 665 (analysis)
Extended architecture:
• Rijndael enc + dec: 843 / 998
Use of resources by extended architectures
[Chart: percentage of the Virtex device resources (CLBs and block RAMs) used by Triple DES and Rijndael. Basic architectures: 5%, 10%, 56%. Extended architectures: 60%, 19%, 56% (estimated).]
Conclusions
• High-speed IPSec-compliant implementations of Rijndael and Triple DES developed and tested experimentally using the SLAAC-1V FPGA accelerator board
• Encryption and decryption throughputs of Rijndael in the range of 1 Gbit/s (998 Mbit/s) demonstrated experimentally
• Integrated 1 Gbit/s implementation of Rijndael and Triple DES shown to require only 80% of the resources of a single Virtex XCV1000 FPGA device
• SLAAC-1V accelerator board capable of supporting encryption & decryption throughputs in the range of 3 Gbit/s