FPGA Implementations of Pairing using Residue Number ... · Question : Can we bridge the gap to...
Transcript of FPGA Implementations of Pairing using Residue Number ... · Question : Can we bridge the gap to...
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
FPGA Implementations of Pairing usingResidue Number System and Lazy Reduction
Ray Cheung 1 Sylvain Duquesne 2 Junfeng Fan 4
Nicolas Guillermin 2,3 Ingrid Verbauwhede 4 Gavin Yao1
1 Department of Electronic EngineeringCity University of Hong Kong, Hong Kong SAR,
2 IRMAR, UMR CNRS 6625, Université Rennes 1Campus de Beaulieu, 35042 Rennes cedex, France
3 DGA.IS, La Roche Marguerite - 35170 - Bruz, France4 ESAT/SCD-COSIC, Katholieke Universiteit Leuven
Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium
Oct, 2011CHES 2011, Nara FPGA Implementation of Pairing using RNS 1/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Agenda
1 Motivations
2 Backgrounds
3 Pairing Coprocessor Design
4 Implementation Results
5 Conclusions
CHES 2011, Nara FPGA Implementation of Pairing using RNS 2/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Motivations
Getting the fastest hardware implementationfor a 128 bit security Pairing
supersingular curves/small char. ordinary curves/big char.
faster hardware architecture faster software implementationsfrobenius embedding degreeparallelism smaller curves
Question : Can we bridge the gap to make ordinary curveswin?
CHES 2011, Nara FPGA Implementation of Pairing using RNS 3/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Motivations
Getting the fastest hardware implementationfor a 128 bit security Pairing
supersingular curves/small char. ordinary curves/big char.
faster hardware architecture faster software implementationsfrobenius embedding degreeparallelism smaller curves
Question : Can we bridge the gap to make ordinary curveswin?
CHES 2011, Nara FPGA Implementation of Pairing using RNS 3/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Motivations
Getting the fastest hardware implementationfor a 128 bit security Pairing
supersingular curves/small char. ordinary curves/big char.
faster hardware architecture
faster software implementations
frobenius
embedding degree
parallelism
smaller curves
Question : Can we bridge the gap to make ordinary curveswin?
CHES 2011, Nara FPGA Implementation of Pairing using RNS 3/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Motivations
Getting the fastest hardware implementationfor a 128 bit security Pairing
supersingular curves/small char. ordinary curves/big char.
faster hardware architecture faster software implementationsfrobenius embedding degreeparallelism smaller curves
Question : Can we bridge the gap to make ordinary curveswin?
CHES 2011, Nara FPGA Implementation of Pairing using RNS 3/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Motivations
Getting the fastest hardware implementationfor a 128 bit security Pairing
supersingular curves/small char. ordinary curves/big char.
faster hardware architecture faster software implementationsfrobenius embedding degreeparallelism smaller curves
Question : Can we bridge the gap to make ordinary curveswin?
CHES 2011, Nara FPGA Implementation of Pairing using RNS 3/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Pairings and Barreto-Naehrig CurvesResidue number system (RNS)RNS Montgomery
Agenda
1 Motivations
2 Backgrounds
3 Pairing Coprocessor Design
4 Implementation Results
5 Conclusions
CHES 2011, Nara FPGA Implementation of Pairing using RNS 4/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Pairings and Barreto-Naehrig CurvesResidue number system (RNS)RNS Montgomery
What is Pairing and how do we use it?A pairing is a bilinear map e : G1 �G2 Ñ GT with G1 �G2 and GT groupswith hard Discrete Logarithm
Useful for :
Three-party one-roundDiffie-Hellman keyagreement [Joux’00]
Identity-based encryption[Boneh�01]
Short signature[Boneh�01]
Blind signature[Boldyreva’03]
A
B C
A
B C
(a, ga)
(b, gb) (c, gc)
A
B C
(a, ga)
(b, gb) (c, gc)
ga ↔
gb g a ↔
g c
gb ↔ gc
A
B C
(a, ga)
(b, gb) (c, gc)
ga ↔
gb g a ↔
g c
gb ↔ gc
to B: gab
to C: gac
to A: gab
to C: gbcto A: gac
to B: gbc
A
B C
(a, ga)
(b, gb) (c, gc)gb gc
ga
A
B C
(a, ga)
(b, gb) (c, gc)gb
e(gb, gc)a = e(g, g)abc
e(ga, gc)b = e(g, g)abc e(ga, gb)c = e(g, g)abc
gc
ga
CHES 2011, Nara FPGA Implementation of Pairing using RNS 5/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Pairings and Barreto-Naehrig CurvesResidue number system (RNS)RNS Montgomery
What is Pairing and how do we use it?A pairing is a bilinear map e : G1 �G2 Ñ GT with G1 �G2 and GT groupswith hard Discrete Logarithm
Useful for :
Three-party one-roundDiffie-Hellman keyagreement [Joux’00]
Identity-based encryption[Boneh�01]
Short signature[Boneh�01]
Blind signature[Boldyreva’03]
A
B C
A
B C
(a, ga)
(b, gb) (c, gc)
A
B C
(a, ga)
(b, gb) (c, gc)
ga ↔
gb g a ↔
g c
gb ↔ gc
A
B C
(a, ga)
(b, gb) (c, gc)
ga ↔
gb g a ↔
g c
gb ↔ gc
to B: gab
to C: gac
to A: gab
to C: gbcto A: gac
to B: gbc
A
B C
(a, ga)
(b, gb) (c, gc)gb gc
ga
A
B C
(a, ga)
(b, gb) (c, gc)gb
e(gb, gc)a = e(g, g)abc
e(ga, gc)b = e(g, g)abc e(ga, gb)c = e(g, g)abc
gc
ga
CHES 2011, Nara FPGA Implementation of Pairing using RNS 5/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Pairings and Barreto-Naehrig CurvesResidue number system (RNS)RNS Montgomery
What is Pairing and how do we use it?A pairing is a bilinear map e : G1 �G2 Ñ GT with G1 �G2 and GT groupswith hard Discrete Logarithm
Useful for :
Three-party one-roundDiffie-Hellman keyagreement [Joux’00]
Identity-based encryption[Boneh�01]
Short signature[Boneh�01]
Blind signature[Boldyreva’03]
A
B C
A
B C
(a, ga)
(b, gb) (c, gc)
A
B C
(a, ga)
(b, gb) (c, gc)
ga ↔
gb g a ↔
g c
gb ↔ gc
A
B C
(a, ga)
(b, gb) (c, gc)
ga ↔
gb g a ↔
g c
gb ↔ gc
to B: gab
to C: gac
to A: gab
to C: gbcto A: gac
to B: gbc
A
B C
(a, ga)
(b, gb) (c, gc)gb gc
ga
A
B C
(a, ga)
(b, gb) (c, gc)gb
e(gb, gc)a = e(g, g)abc
e(ga, gc)b = e(g, g)abc e(ga, gb)c = e(g, g)abc
gc
ga
CHES 2011, Nara FPGA Implementation of Pairing using RNS 5/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Pairings and Barreto-Naehrig CurvesResidue number system (RNS)RNS Montgomery
What is Pairing and how do we use it?A pairing is a bilinear map e : G1 �G2 Ñ GT with G1 �G2 and GT groupswith hard Discrete Logarithm
Useful for :
Three-party one-roundDiffie-Hellman keyagreement [Joux’00]
Identity-based encryption[Boneh�01]
Short signature[Boneh�01]
Blind signature[Boldyreva’03]
A
B C
A
B C
(a, ga)
(b, gb) (c, gc)
A
B C
(a, ga)
(b, gb) (c, gc)
ga ↔
gb g a ↔
g c
gb ↔ gc
A
B C
(a, ga)
(b, gb) (c, gc)
ga ↔
gb g a ↔
g c
gb ↔ gc
to B: gab
to C: gac
to A: gab
to C: gbcto A: gac
to B: gbc
A
B C
(a, ga)
(b, gb) (c, gc)gb gc
ga
A
B C
(a, ga)
(b, gb) (c, gc)gb
e(gb, gc)a = e(g, g)abc
e(ga, gc)b = e(g, g)abc e(ga, gb)c = e(g, g)abc
gc
ga
CHES 2011, Nara FPGA Implementation of Pairing using RNS 5/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Pairings and Barreto-Naehrig CurvesResidue number system (RNS)RNS Montgomery
What is Pairing and how do we use it?A pairing is a bilinear map e : G1 �G2 Ñ GT with G1 �G2 and GT groupswith hard Discrete Logarithm
Useful for :
Three-party one-roundDiffie-Hellman keyagreement [Joux’00]
Identity-based encryption[Boneh�01]
Short signature[Boneh�01]
Blind signature[Boldyreva’03]
A
B C
A
B C
(a, ga)
(b, gb) (c, gc)
A
B C
(a, ga)
(b, gb) (c, gc)
ga ↔
gb g a ↔
g c
gb ↔ gc
A
B C
(a, ga)
(b, gb) (c, gc)
ga ↔
gb g a ↔
g c
gb ↔ gc
to B: gab
to C: gac
to A: gab
to C: gbcto A: gac
to B: gbc
A
B C
(a, ga)
(b, gb) (c, gc)gb gc
ga
A
B C
(a, ga)
(b, gb) (c, gc)gb
e(gb, gc)a = e(g, g)abc
e(ga, gc)b = e(g, g)abc e(ga, gb)c = e(g, g)abc
gc
ga
CHES 2011, Nara FPGA Implementation of Pairing using RNS 5/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Pairings and Barreto-Naehrig CurvesResidue number system (RNS)RNS Montgomery
What is Pairing and how do we use it?A pairing is a bilinear map e : G1 �G2 Ñ GT with G1 �G2 and GT groupswith hard Discrete Logarithm
Useful for :
Three-party one-roundDiffie-Hellman keyagreement [Joux’00]
Identity-based encryption[Boneh�01]
Short signature[Boneh�01]
Blind signature[Boldyreva’03]
A
B C
A
B C
(a, ga)
(b, gb) (c, gc)
A
B C
(a, ga)
(b, gb) (c, gc)
ga ↔
gb g a ↔
g c
gb ↔ gc
A
B C
(a, ga)
(b, gb) (c, gc)
ga ↔
gb g a ↔
g c
gb ↔ gc
to B: gab
to C: gac
to A: gab
to C: gbcto A: gac
to B: gbc
A
B C
(a, ga)
(b, gb) (c, gc)gb gc
ga
A
B C
(a, ga)
(b, gb) (c, gc)gb
e(gb, gc)a = e(g, g)abc
e(ga, gc)b = e(g, g)abc e(ga, gb)c = e(g, g)abc
gc
ga
CHES 2011, Nara FPGA Implementation of Pairing using RNS 5/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Pairings and Barreto-Naehrig CurvesResidue number system (RNS)RNS Montgomery
What is Pairing and how do we use it?A pairing is a bilinear map e : G1 �G2 Ñ GT with G1 �G2 and GT groupswith hard Discrete Logarithm
Useful for :
Three-party one-roundDiffie-Hellman keyagreement [Joux’00]
Identity-based encryption[Boneh�01]
Short signature[Boneh�01]
Blind signature[Boldyreva’03]
A
B C
A
B C
(a, ga)
(b, gb) (c, gc)
A
B C
(a, ga)
(b, gb) (c, gc)
ga ↔
gb g a ↔
g c
gb ↔ gc
A
B C
(a, ga)
(b, gb) (c, gc)
ga ↔
gb g a ↔
g c
gb ↔ gc
to B: gab
to C: gac
to A: gab
to C: gbcto A: gac
to B: gbc
A
B C
(a, ga)
(b, gb) (c, gc)gb gc
ga
A
B C
(a, ga)
(b, gb) (c, gc)gb
e(gb, gc)a = e(g, g)abc
e(ga, gc)b = e(g, g)abc e(ga, gb)c = e(g, g)abc
gc
ga
CHES 2011, Nara FPGA Implementation of Pairing using RNS 5/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Pairings and Barreto-Naehrig CurvesResidue number system (RNS)RNS Montgomery
What is Pairing and how do we use it?A pairing is a bilinear map e : G1 �G2 Ñ GT with G1 �G2 and GT groupswith hard Discrete Logarithm
Useful for :
Three-party one-roundDiffie-Hellman keyagreement [Joux’00]
Identity-based encryption[Boneh�01]
Short signature[Boneh�01]
Blind signature[Boldyreva’03]
A
B C
A
B C
(a, ga)
(b, gb) (c, gc)
A
B C
(a, ga)
(b, gb) (c, gc)
ga ↔
gb g a ↔
g c
gb ↔ gc
A
B C
(a, ga)
(b, gb) (c, gc)
ga ↔
gb g a ↔
g c
gb ↔ gc
to B: gab
to C: gac
to A: gab
to C: gbcto A: gac
to B: gbc
A
B C
(a, ga)
(b, gb) (c, gc)gb gc
ga
A
B C
(a, ga)
(b, gb) (c, gc)gb
e(gb, gc)a = e(g, g)abc
e(ga, gc)b = e(g, g)abc e(ga, gb)c = e(g, g)abc
gc
ga
CHES 2011, Nara FPGA Implementation of Pairing using RNS 5/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Pairings and Barreto-Naehrig CurvesResidue number system (RNS)RNS Montgomery
Barreto-Naehrig Curves
BN curve over Fp:y 2� x 3
� bwhere b � 0 such that #E � `
p � 36u4 � 36u3 � 24u2 � 6u � 1
` � 36u4 � 36u3 � 18u2 � 6u � 1
for u P Z and p, ` primes.
Very adapted for 128 bits security
CHES 2011, Nara FPGA Implementation of Pairing using RNS 6/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Pairings and Barreto-Naehrig CurvesResidue number system (RNS)RNS Montgomery
Barreto-Naehrig Curves
BN curve over Fp:y 2� x 3
� bwhere b � 0 such that #E � `
p � 36u4 � 36u3 � 24u2 � 6u � 1
` � 36u4 � 36u3 � 18u2 � 6u � 1
for u P Z and p, ` primes.
Very adapted for 128 bits security
CHES 2011, Nara FPGA Implementation of Pairing using RNS 6/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Pairings and Barreto-Naehrig CurvesResidue number system (RNS)RNS Montgomery
Pairing Parameter Selection
Chosen curves
Security u rlog2 ps
126-bit �p262 � 255 � 1q 254128-bit �p263 � 222 � 218 � 27 � 1q 258192-bit �p2160 � 274 � 212 � 1q 646
CHES 2011, Nara FPGA Implementation of Pairing using RNS 7/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Pairings and Barreto-Naehrig CurvesResidue number system (RNS)RNS Montgomery
Optimal Ate Pairing on BN Curves
pairing
Tate [Frey+94]Ate [Hess+06]Optimal [Lee+08,Vercauteren'10]pairing
Miller's loopFinal exponentiation
Tate [Frey+94]Ate [Hess+06]Optimal [Lee+08,Vercauteren'10]
[Miller'04]
pairing
Miller's loopFinal exponentiation
BN, Fp12, Fp2, Fp arithmetic
Tate [Frey+94]Ate [Hess+06]Optimal [Lee+08,Vercauteren'10]
[Miller'04]
KaratsubaLazy reduction[Scott'08, Aranha+11]
pairing
Miller's loopFinal exponentiation
BN, Fp12, Fp2, Fp arithmetic
Fp arithmetic (modular operations)
Tate [Frey+94]Ate [Hess+06]Optimal [Lee+08,Vercauteren'10]
[Miller'04]
KaratsubaLazy reduction[Scott'08, Aranha+11]
Barrett [Barrett'86]Montgomery [Montgomery'85]FVV [Fan+11]Blakley [Ghosh+10]RNS [Kawamura+00]
CHES 2011, Nara FPGA Implementation of Pairing using RNS 8/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Pairings and Barreto-Naehrig CurvesResidue number system (RNS)RNS Montgomery
Optimal Ate Pairing on BN Curves
pairing
Tate [Frey+94]Ate [Hess+06]Optimal [Lee+08,Vercauteren'10]
pairing
Miller's loopFinal exponentiation
Tate [Frey+94]Ate [Hess+06]Optimal [Lee+08,Vercauteren'10]
[Miller'04]
pairing
Miller's loopFinal exponentiation
BN, Fp12, Fp2, Fp arithmetic
Tate [Frey+94]Ate [Hess+06]Optimal [Lee+08,Vercauteren'10]
[Miller'04]
KaratsubaLazy reduction[Scott'08, Aranha+11]
pairing
Miller's loopFinal exponentiation
BN, Fp12, Fp2, Fp arithmetic
Fp arithmetic (modular operations)
Tate [Frey+94]Ate [Hess+06]Optimal [Lee+08,Vercauteren'10]
[Miller'04]
KaratsubaLazy reduction[Scott'08, Aranha+11]
Barrett [Barrett'86]Montgomery [Montgomery'85]FVV [Fan+11]Blakley [Ghosh+10]RNS [Kawamura+00]
CHES 2011, Nara FPGA Implementation of Pairing using RNS 8/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Pairings and Barreto-Naehrig CurvesResidue number system (RNS)RNS Montgomery
Optimal Ate Pairing on BN Curves
pairing
Tate [Frey+94]Ate [Hess+06]Optimal [Lee+08,Vercauteren'10]pairing
Miller's loopFinal exponentiation
Tate [Frey+94]Ate [Hess+06]Optimal [Lee+08,Vercauteren'10]
[Miller'04]
pairing
Miller's loopFinal exponentiation
BN, Fp12, Fp2, Fp arithmetic
Tate [Frey+94]Ate [Hess+06]Optimal [Lee+08,Vercauteren'10]
[Miller'04]
KaratsubaLazy reduction[Scott'08, Aranha+11]
pairing
Miller's loopFinal exponentiation
BN, Fp12, Fp2, Fp arithmetic
Fp arithmetic (modular operations)
Tate [Frey+94]Ate [Hess+06]Optimal [Lee+08,Vercauteren'10]
[Miller'04]
KaratsubaLazy reduction[Scott'08, Aranha+11]
Barrett [Barrett'86]Montgomery [Montgomery'85]FVV [Fan+11]Blakley [Ghosh+10]RNS [Kawamura+00]
CHES 2011, Nara FPGA Implementation of Pairing using RNS 8/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Pairings and Barreto-Naehrig CurvesResidue number system (RNS)RNS Montgomery
Optimal Ate Pairing on BN Curves
pairing
Tate [Frey+94]Ate [Hess+06]Optimal [Lee+08,Vercauteren'10]pairing
Miller's loopFinal exponentiation
Tate [Frey+94]Ate [Hess+06]Optimal [Lee+08,Vercauteren'10]
[Miller'04]
pairing
Miller's loopFinal exponentiation
BN, Fp12, Fp2, Fp arithmetic
Tate [Frey+94]Ate [Hess+06]Optimal [Lee+08,Vercauteren'10]
[Miller'04]
KaratsubaLazy reduction[Scott'08, Aranha+11]
pairing
Miller's loopFinal exponentiation
BN, Fp12, Fp2, Fp arithmetic
Fp arithmetic (modular operations)
Tate [Frey+94]Ate [Hess+06]Optimal [Lee+08,Vercauteren'10]
[Miller'04]
KaratsubaLazy reduction[Scott'08, Aranha+11]
Barrett [Barrett'86]Montgomery [Montgomery'85]FVV [Fan+11]Blakley [Ghosh+10]RNS [Kawamura+00]
CHES 2011, Nara FPGA Implementation of Pairing using RNS 8/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Pairings and Barreto-Naehrig CurvesResidue number system (RNS)RNS Montgomery
Optimal Ate Pairing on BN Curves
pairing
Tate [Frey+94]Ate [Hess+06]Optimal [Lee+08,Vercauteren'10]pairing
Miller's loopFinal exponentiation
Tate [Frey+94]Ate [Hess+06]Optimal [Lee+08,Vercauteren'10]
[Miller'04]
pairing
Miller's loopFinal exponentiation
BN, Fp12, Fp2, Fp arithmetic
Tate [Frey+94]Ate [Hess+06]Optimal [Lee+08,Vercauteren'10]
[Miller'04]
KaratsubaLazy reduction[Scott'08, Aranha+11]
pairing
Miller's loopFinal exponentiation
BN, Fp12, Fp2, Fp arithmetic
Fp arithmetic (modular operations)
Tate [Frey+94]Ate [Hess+06]Optimal [Lee+08,Vercauteren'10]
[Miller'04]
KaratsubaLazy reduction[Scott'08, Aranha+11]
Barrett [Barrett'86]Montgomery [Montgomery'85]FVV [Fan+11]Blakley [Ghosh+10]RNS [Kawamura+00]
CHES 2011, Nara FPGA Implementation of Pairing using RNS 8/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Pairings and Barreto-Naehrig CurvesResidue number system (RNS)RNS Montgomery
Residue Number System (RNS)
RNS is defined by n pairwise coprime integer constants:B � tb1,b2, � � � ,bnu.
MB :�±n
i�1 bi ,bi P B
Any integer X ,0 ¤ X MB, X is uniquely represented by:X � tX mod b1,X mod b2, � � � ,X mod bnu,
Arithmetic operations on RNS (Z{MBZ)
Normal RNSR � X � Y mod MB ri � xi � yi mod biR � X � Y mod MB ri � xi � yi mod biR � X {Y mod MB ri � xi y�1
i mod bi
only if gcdpY ,MBq � 1
CHES 2011, Nara FPGA Implementation of Pairing using RNS 9/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Pairings and Barreto-Naehrig CurvesResidue number system (RNS)RNS Montgomery
Residue Number System (RNS)
RNS is defined by n pairwise coprime integer constants:B � tb1,b2, � � � ,bnu.
MB :�±n
i�1 bi ,bi P B
Any integer X ,0 ¤ X MB, X is uniquely represented by:X � tX mod b1,X mod b2, � � � ,X mod bnu,
Arithmetic operations on RNS (Z{MBZ)
Normal RNSR � X � Y mod MB ri � xi � yi mod biR � X � Y mod MB ri � xi � yi mod biR � X {Y mod MB ri � xi y�1
i mod bi
only if gcdpY ,MBq � 1
CHES 2011, Nara FPGA Implementation of Pairing using RNS 9/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Pairings and Barreto-Naehrig CurvesResidue number system (RNS)RNS Montgomery
Residue Number System (RNS)
RNS is defined by n pairwise coprime integer constants:B � tb1,b2, � � � ,bnu.
MB :�±n
i�1 bi ,bi P B
Any integer X ,0 ¤ X MB, X is uniquely represented by:X � tX mod b1,X mod b2, � � � ,X mod bnu,
Arithmetic operations on RNS (Z{MBZ)
Normal RNSR � X � Y mod MB ri � xi � yi mod biR � X � Y mod MB ri � xi � yi mod biR � X {Y mod MB ri � xi y�1
i mod bi
only if gcdpY ,MBq � 1
CHES 2011, Nara FPGA Implementation of Pairing using RNS 9/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Pairings and Barreto-Naehrig CurvesResidue number system (RNS)RNS Montgomery
RNS Montgomery reduction [Bajard�98]
RNS Montgomery – MB
Input: A � aMB mod p andB � bMB mod p
Output: T � abMB mod p
in B1: TB Ð ABBB
2: QB Ð TB � p�pq�1
3: QBBase Extension
ÝÝÝÝÝÝÝÝÝÝÝÝÝÑQC
3: SB Ð pT �QBpq{MB
5: SBBase Extension
ÐÝÝÝÝÝÝÝÝÝÝÝÝÝSC
gcdpMB,MBq � MB � 1M�1
B does not exist in B.
CHES 2011, Nara FPGA Implementation of Pairing using RNS 10/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Pairings and Barreto-Naehrig CurvesResidue number system (RNS)RNS Montgomery
RNS Montgomery reduction [Bajard�98]
RNS Montgomery – MB
Input: A � aMB mod p andB � bMB mod p
Output: T � abMB mod p
in B1: TB Ð ABBB
2: QB Ð TB � p�pq�1
3: QBBase Extension
ÝÝÝÝÝÝÝÝÝÝÝÝÝÑQC
3: SB Ð pT �QBpq{MB
5: SBBase Extension
ÐÝÝÝÝÝÝÝÝÝÝÝÝÝSC
gcdpMB,MBq � MB � 1M�1
B does not exist in B.
CHES 2011, Nara FPGA Implementation of Pairing using RNS 10/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Pairings and Barreto-Naehrig CurvesResidue number system (RNS)RNS Montgomery
RNS Montgomery reduction [Bajard�98]
RNS Montgomery – MB
Input: A � aMB mod p andB � bMB mod p
Output: T � abMB mod p
in B in C1: TB Ð ABBB TC Ð ACBC
2: QB Ð TB � p�pq�1
3: QBBase Extension
ÝÝÝÝÝÝÝÝÝÝÝÝÝÑQC
4: SC Ð pTC �QCpqpM�1B qC
5: SBBase Extension
ÐÝÝÝÝÝÝÝÝÝÝÝÝÝSC
Introduce a new base C to perform division by MB.
Needs the computation of Base Extension [Kawamura�00]
CHES 2011, Nara FPGA Implementation of Pairing using RNS 10/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Pairings and Barreto-Naehrig CurvesResidue number system (RNS)RNS Montgomery
RNS Montgomery reduction [Bajard�98]
RNS Montgomery – MB
Input: A � aMB mod p andB � bMB mod p
Output: T � abMB mod p
in B in C1: TB Ð ABBB TC Ð ACBC
2: QB Ð TB � p�pq�1
3: QBBase Extension
ÝÝÝÝÝÝÝÝÝÝÝÝÝÑQC
4: SC Ð pTC �QCpqpM�1B qC
5: SBBase Extension
ÐÝÝÝÝÝÝÝÝÝÝÝÝÝSC
Introduce a new base C to perform division by MB.Needs the computation of Base Extension [Kawamura�00]
CHES 2011, Nara FPGA Implementation of Pairing using RNS 10/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Pairings and Barreto-Naehrig CurvesResidue number system (RNS)RNS Montgomery
RNS complexity and Lazy reduction
RNS Montgomery
Multiplication :2n MUL
Reduction (RED):2n2 � 3n MUL
Conventional Montgomery
Multiplication:n2 MUL
Reduction:n2 � n MUL
AB mod p:2n2 � 5n MUL
AB mod p:2n2 � n MUL
AB � CD mod p:2n2 � 7n MUL
AB � CD mod p:3n2 � n MUL
°ki�1 Ai Bi mod p:
2n2 � p3 � 2k qn MUL
°ki�1 Ai Bi mod p:
p1 � k qn2 � n MUL
CHES 2011, Nara FPGA Implementation of Pairing using RNS 11/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Pairings and Barreto-Naehrig CurvesResidue number system (RNS)RNS Montgomery
RNS complexity and Lazy reduction
RNS Montgomery
Multiplication :2n MUL
Reduction (RED):2n2 � 3n MUL
Conventional Montgomery
Multiplication:n2 MUL
Reduction:n2 � n MUL
AB mod p:2n2 � 5n MUL
AB mod p:2n2 � n MUL
AB � CD mod p:2n2 � 7n MUL
AB � CD mod p:3n2 � n MUL
°ki�1 Ai Bi mod p:
2n2 � p3 � 2k qn MUL
°ki�1 Ai Bi mod p:
p1 � k qn2 � n MUL
CHES 2011, Nara FPGA Implementation of Pairing using RNS 11/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Pairings and Barreto-Naehrig CurvesResidue number system (RNS)RNS Montgomery
RNS complexity and Lazy reduction
RNS Montgomery
Multiplication :2n MUL
Reduction (RED):2n2 � 3n MUL
Conventional Montgomery
Multiplication:n2 MUL
Reduction:n2 � n MUL
AB mod p:2n2 � 5n MUL
AB mod p:2n2 � n MUL
AB � CD mod p:2n2 � 7n MUL
AB � CD mod p:3n2 � n MUL
°ki�1 Ai Bi mod p:
2n2 � p3 � 2k qn MUL
°ki�1 Ai Bi mod p:
p1 � k qn2 � n MUL
CHES 2011, Nara FPGA Implementation of Pairing using RNS 11/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Pairings and Barreto-Naehrig CurvesResidue number system (RNS)RNS Montgomery
Theoretical conclusions
Lazy reduction reduces the complexity of Pairings[Aranha�11]
RNS reduces the complexity of lazy reductionRNS involves easy parallelism
Remains to verify it "in real world"
CHES 2011, Nara FPGA Implementation of Pairing using RNS 12/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Pairings and Barreto-Naehrig CurvesResidue number system (RNS)RNS Montgomery
Theoretical conclusions
Lazy reduction reduces the complexity of Pairings[Aranha�11]
RNS reduces the complexity of lazy reduction
RNS involves easy parallelism
Remains to verify it "in real world"
CHES 2011, Nara FPGA Implementation of Pairing using RNS 12/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Pairings and Barreto-Naehrig CurvesResidue number system (RNS)RNS Montgomery
Theoretical conclusions
Lazy reduction reduces the complexity of Pairings[Aranha�11]
RNS reduces the complexity of lazy reductionRNS involves easy parallelism
Remains to verify it "in real world"
CHES 2011, Nara FPGA Implementation of Pairing using RNS 12/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Pairings and Barreto-Naehrig CurvesResidue number system (RNS)RNS Montgomery
Theoretical conclusions
Lazy reduction reduces the complexity of Pairings[Aranha�11]
RNS reduces the complexity of lazy reductionRNS involves easy parallelism
Remains to verify it "in real world"
CHES 2011, Nara FPGA Implementation of Pairing using RNS 12/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Architectural Design IFurther OptimizationArchitectural Design II
Agenda
1 Motivations
2 Backgrounds
3 Pairing Coprocessor Design
4 Implementation Results
5 Conclusions
CHES 2011, Nara FPGA Implementation of Pairing using RNS 13/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Architectural Design IFurther OptimizationArchitectural Design II
Cox-Rower Architecture [Kawamura�00]
Main featuren rowers
MUL: 2 cyclesRED: 2n+3 cycles
One rower,one channelMicrocodedsequencer
Row
er0
ControllerROM
(BRAM)
Row
er1
...
Row
er7
DATACTRL
ξ register
cox
CHES 2011, Nara FPGA Implementation of Pairing using RNS 14/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Architectural Design IFurther OptimizationArchitectural Design II
Rower Design
2 accumulatorsSmall-constant multiplier3-port RAMsSame data path
36x36
Multiplier
Constant Multiplier{2,3,4,6}
Acc0 Acc1
Accumulators
Channel Reduction
ξj
ρ
ξi
RAM0
CHES 2011, Nara FPGA Implementation of Pairing using RNS 15/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Architectural Design IFurther OptimizationArchitectural Design II
Rower Design
Underlying Field
Fp2 � Fpris{pi2 � 1q
Fp12 � Fp2rγs{pγ6 � p1 � iqq
px0 � x1iqpy0 � y1iqp1� iq�px0y0 � x1y1 � x0y1 � x1y0q�
px0y0 � x1y1 � x0y1 � x1y0qi
36x36
Multiplier
Constant Multiplier{2,3,4,6}
Acc0 Acc1
Accumulators
Channel Reduction
ξj
ρ
ξi
RAM0
CHES 2011, Nara FPGA Implementation of Pairing using RNS 15/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Architectural Design IFurther OptimizationArchitectural Design II
Rower Design
Underlying Field
Fp2 � Fpris{pi2 � 1q
Fp12 � Fp2rγs{pγ6 � p1 � iqq
px0 � x1iqpy0 � y1iqp1� iq
�px0y0 � x1y1 � x0y1 � x1y0q�
px0y0 � x1y1 � x0y1 � x1y0qi
36x36
Multiplier
Constant Multiplier{2,3,4,6}
Acc0 Acc1
Accumulators
Channel Reduction
ξj
ρ
ξi
RAM0
CHES 2011, Nara FPGA Implementation of Pairing using RNS 15/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Architectural Design IFurther OptimizationArchitectural Design II
Rower Design
Underlying Field
Fp2 � Fpris{pi2 � 1q
Fp12 � Fp2rγs{pγ6 � p1 � iqq
px0 � x1iqpy0 � y1iqp1� iq�px0y0 � x1y1 � x0y1 � x1y0q�
px0y0 � x1y1 � x0y1 � x1y0qi
36x36
Multiplier
Constant Multiplier{2,3,4,6}
Acc0 Acc1
Accumulators
Channel Reduction
ξj
ρ
ξi
RAM0
CHES 2011, Nara FPGA Implementation of Pairing using RNS 15/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Architectural Design IFurther OptimizationArchitectural Design II
Further Optimization
ObservationsComputational intensiveness: base extension.
Base extension: n2 multiplications by constant.Constant is determined by base B and C.
Constant size is 35 bits.With selected bases, bit-length: 35 Ñ 25.Bit-length of the constant is shortened.
CHES 2011, Nara FPGA Implementation of Pairing using RNS 16/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Architectural Design IFurther OptimizationArchitectural Design II
Further Optimization
ObservationsComputational intensiveness: base extension.Base extension: n2 multiplications by constant.
Constant is determined by base B and C.
Constant size is 35 bits.With selected bases, bit-length: 35 Ñ 25.Bit-length of the constant is shortened.
CHES 2011, Nara FPGA Implementation of Pairing using RNS 16/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Architectural Design IFurther OptimizationArchitectural Design II
Further Optimization
ObservationsComputational intensiveness: base extension.Base extension: n2 multiplications by constant.Constant is determined by base B and C.
Constant size is 35 bits.With selected bases, bit-length: 35 Ñ 25.Bit-length of the constant is shortened.
CHES 2011, Nara FPGA Implementation of Pairing using RNS 16/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Architectural Design IFurther OptimizationArchitectural Design II
Further Optimization
ObservationsComputational intensiveness: base extension.Base extension: n2 multiplications by constant.Constant is determined by base B and C.
Constant size is 35 bits.
With selected bases, bit-length: 35 Ñ 25.Bit-length of the constant is shortened.
CHES 2011, Nara FPGA Implementation of Pairing using RNS 16/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Architectural Design IFurther OptimizationArchitectural Design II
Further Optimization
ObservationsComputational intensiveness: base extension.Base extension: n2 multiplications by constant.Constant is determined by base B and C.
Constant size is 35 bits.With selected bases, bit-length: 35 Ñ 25.
Bit-length of the constant is shortened.
CHES 2011, Nara FPGA Implementation of Pairing using RNS 16/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Architectural Design IFurther OptimizationArchitectural Design II
Further Optimization
ObservationsComputational intensiveness: base extension.Base extension: n2 multiplications by constant.Constant is determined by base B and C.
Constant size is 35 bits.With selected bases, bit-length: 35 Ñ 25.Bit-length of the constant is shortened.
CHES 2011, Nara FPGA Implementation of Pairing using RNS 16/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Architectural Design IFurther OptimizationArchitectural Design II
Dual mode 4-stage pipelined MUL
One 35�35multiplier
Two 35�25multipliers
@250MHzon Virtex-6
MUL: 2 cycles
RED: n+4 cycles(v.s. 2n+3 cycles)
35x25 35x2535x35
CHES 2011, Nara FPGA Implementation of Pairing using RNS 17/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Architectural Design IFurther OptimizationArchitectural Design II
Dual mode 4-stage pipelined MUL
One 35�35multiplier
Two 35�25multipliers
@250MHzon Virtex-6
MUL: 2 cycles
RED: n+4 cycles(v.s. 2n+3 cycles)
35x25 35x2535x35
CHES 2011, Nara FPGA Implementation of Pairing using RNS 17/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Comparison
Agenda
1 Motivations
2 Backgrounds
3 Pairing Coprocessor Design
4 Implementation Results
5 Conclusions
CHES 2011, Nara FPGA Implementation of Pairing using RNS 18/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Comparison
Pairing Parameter Selection
Design I
Security u rlog2 ps126-bit �p262 � 255 � 1q 254128-bit �p263 � 222 � 218 � 27 � 1q 258192-bit �p2160 � 274 � 212 � 1q 646
Design II
126-bit �p262 � 255 � 1q 254
CHES 2011, Nara FPGA Implementation of Pairing using RNS 19/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Comparison
Logic Utilization and Cycle Count
Logic utilizationDesign n Device Multipliers Logic Embedded
Elements Memory8 Cyclone II 35 18multipliers 14274 LC 67 M4k
I 8 Stratix III 72 DSP18el 4233 ALMs 1 M144k + 18 M9k(Altera) 19 Stratix III 171 DSP18el 9910 ALMs 1 M144k + 40 M9k
II 8 Virtex-6 32 DSP48E1s 7032 Slices 45 18Kb BRAMs(Xilinx)
Cycle count and Latency
Curve Cycles Technology Frequency Latency
Design I
BN126 176111 Cyclone II 91 MHz 1.93 msBN126 176111 Stratix III 165 MHz 1.07 msBN128 192502 Stratix III 165 MHz 1.16 msBN192 789849 Stratix III 131 MHz 6.02 ms
Design II BN126 143111 Virtex-6 250 MHz 0.57 ms
CHES 2011, Nara FPGA Implementation of Pairing using RNS 20/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Comparison
Logic Utilization and Cycle Count
Logic utilizationDesign n Device Multipliers Logic Embedded
Elements Memory8 Cyclone II 35 18multipliers 14274 LC 67 M4k
I 8 Stratix III 72 DSP18el 4233 ALMs 1 M144k + 18 M9k(Altera) 19 Stratix III 171 DSP18el 9910 ALMs 1 M144k + 40 M9k
II 8 Virtex-6 32 DSP48E1s 7032 Slices 45 18Kb BRAMs(Xilinx)
Cycle count and Latency
Curve Cycles Technology Frequency Latency
Design I
BN126 176111 Cyclone II 91 MHz 1.93 msBN126 176111 Stratix III 165 MHz 1.07 msBN128 192502 Stratix III 165 MHz 1.16 msBN192 789849 Stratix III 131 MHz 6.02 ms
Design II BN126 143111 Virtex-6 250 MHz 0.57 ms
CHES 2011, Nara FPGA Implementation of Pairing using RNS 20/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
Comparison
Comparison
Design Pairing/ Platform Algorithm Area Freq. Cycle DelaySecurity[bit] [MHz] [ms]
Design optimal ate Altera RNS 4233 ALMs165 176,111 1.07
I 126 (Stratix III) (Parallel) 72 DSPsDesign optimal ate Xilinx RNS 7032 slices
250 143,111 0.573II 126 (Virtex-6) (Parallel) 32 DSPs
Fan�11ate/128 Xilinx HMM 4014 slices
210336,366 1.60
opt. ate/128 (Virtex-6) (Parallel) 42 DSPs 245,430 1.17
Estibals’10Tate F35�97 Xilinx
-4755 Slices
192 428,853 2.23128 (Virtex-4) 7 BRAMs
Aranha�10opt. Eta F2367 Xilinx
- 4518 Slices 220 773,960� 3.52128 (Virtex-4)
Ghosh�11ηTF21223 Xilinx
- 15167 Slices 250 76,000� 0.19128 (Virtex-6)
Beuchat�10optimal ate Core
Montgomery - 2800 2,330,000 0.83126 i7
Aranha�11optimal ate Phenom
Montgomery - 3000 1,562,000 0.52126 II
CHES 2011, Nara FPGA Implementation of Pairing using RNS 21/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
ConclusionsQ&A
Conclusions
Conclusions1 Pairing using RNS + lazy reduction.2 Novel base selection specification.3 Hardware architectures.4 New speed record.
CHES 2011, Nara FPGA Implementation of Pairing using RNS 22/24
MotivationsBackgrounds
Pairing Coprocessor DesignImplementation Results
Conclusions
ConclusionsQ&A
Thank you!
CHES 2011, Nara FPGA Implementation of Pairing using RNS 23/24