Motivations | Backgrounds | Pairing Coprocessor Design | Implementation Results | Conclusions

FPGA Implementations of Pairing using Residue Number System and Lazy Reduction

Ray Cheung (1), Sylvain Duquesne (2), Junfeng Fan (4), Nicolas Guillermin (2,3), Ingrid Verbauwhede (4), Gavin Yao (1)

(1) Department of Electronic Engineering, City University of Hong Kong, Hong Kong SAR
(2) IRMAR, UMR CNRS 6625, Université Rennes 1, Campus de Beaulieu, 35042 Rennes cedex, France
(3) DGA.IS, La Roche Marguerite, 35170 Bruz, France
(4) ESAT/SCD-COSIC, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium

Oct 2011, CHES 2011, Nara

Agenda

1 Motivations

2 Backgrounds

3 Pairing Coprocessor Design

4 Implementation Results

5 Conclusions

Motivations

Getting the fastest hardware implementation for a 128-bit security pairing

supersingular curves / small char.    ordinary curves / big char.
faster hardware architecture          faster software implementations
frobenius                             embedding degree
parallelism                           smaller curves

Question: Can we bridge the gap to make ordinary curves win?

Pairings and Barreto-Naehrig Curves | Residue number system (RNS) | RNS Montgomery

What is a pairing and how do we use it?

A pairing is a bilinear map $e : G_1 \times G_2 \to G_T$, where $G_1$, $G_2$ and $G_T$ are groups with hard discrete logarithm.

Useful for:

Three-party one-round Diffie-Hellman key agreement [Joux'00]
Identity-based encryption [Boneh+01]
Short signature [Boneh+01]
Blind signature [Boldyreva'03]

[Figure: key agreement among three parties A, B and C holding $(a, g^a)$, $(b, g^b)$ and $(c, g^c)$. Without a pairing, two rounds are needed: the parties first exchange $g^a$, $g^b$, $g^c$ pairwise, then forward $g^{ab}$, $g^{ac}$, $g^{bc}$ to the third party. With a pairing, each party publishes its value once and computes $e(g^b, g^c)^a = e(g^a, g^c)^b = e(g^a, g^b)^c = e(g, g)^{abc}$.]
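The one-round agreement above can be sketched with a toy bilinear map. This is only an illustration under loud assumptions: the groups, the parameters `q`, `r`, `h`, `P` and the helper names are hypothetical and chosen so the arithmetic is easy to follow. The map is bilinear but offers no security, since discrete logarithms in these tiny groups are trivial. The source group is written additively here, so `a * P` plays the role of $g^a$.

```python
# Toy bilinear map, for illustration only (NOT secure).
# Assumed toy parameters: G1 = G2 = (Z_r, +) with generator P, and
# GT = the order-r subgroup of Z_q^* generated by h.
q, r, h, P = 23, 11, 2, 4  # 2 has multiplicative order 11 modulo 23


def e(X, Y):
    """Bilinear map: e(x*P, y*P) = h^(x*y*P^2) in GT."""
    return pow(h, (X * Y) % r, q)


def public(secret):
    """Public value of a party, the analogue of g^secret."""
    return (secret * P) % r


def shared_key(my_secret, pub1, pub2):
    """Combine the two other public values with one's own secret."""
    return pow(e(pub1, pub2), my_secret, q)


a, b, c = 3, 5, 7                      # secrets of A, B, C
A, B, C = public(a), public(b), public(c)

k_a = shared_key(a, B, C)              # e(bP, cP)^a
k_b = shared_key(b, A, C)              # e(aP, cP)^b
k_c = shared_key(c, A, B)              # e(aP, bP)^c
print(k_a == k_b == k_c)
```

Each party needs only the two broadcast values and its own secret, which is why a single round is enough.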

Barreto-Naehrig Curves

BN curve over $F_p$: $y^2 = x^3 + b$, where $b \neq 0$, such that $\#E = \ell$, with

$p = 36u^4 + 36u^3 + 24u^2 + 6u + 1$

$\ell = 36u^4 + 36u^3 + 18u^2 + 6u + 1$

for $u \in \mathbb{Z}$ and $p$, $\ell$ primes.

Well suited to 128-bit security
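As a quick check of the two polynomials, the sketch below evaluates them for one parameter. The choice $u = -(2^{62} + 2^{55} + 1)$ is the 126-bit parameter that appears in this deck's parameter table; the function name and the use of the trace $t = 6u^2 + 1$ (a standard fact for BN curves, so that $\ell = p + 1 - t$) are additions for illustration, not taken from the slides.

```python
def bn_parameters(u):
    """Evaluate the BN polynomials p(u), l(u) and the trace t(u)."""
    p = 36 * u**4 + 36 * u**3 + 24 * u**2 + 6 * u + 1
    ell = 36 * u**4 + 36 * u**3 + 18 * u**2 + 6 * u + 1
    t = 6 * u**2 + 1  # trace of Frobenius: #E = p + 1 - t = ell
    return p, ell, t


u = -(2**62 + 2**55 + 1)
p, ell, t = bn_parameters(u)
print(p.bit_length(), ell.bit_length())
print(ell == p + 1 - t)
```

The only difference between $p$ and $\ell$ is the $6u^2$ term, which is why both have the same bit length.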

Pairing Parameter Selection

Chosen curves

Security | u                                     | $\lceil \log_2 p \rceil$
126-bit  | $-(2^{62} + 2^{55} + 1)$              | 254
128-bit  | $-(2^{63} + 2^{22} + 2^{18} + 2^7 + 1)$ | 258
192-bit  | $-(2^{160} + 2^{74} + 2^{12} + 1)$    | 646

Optimal Ate Pairing on BN Curves

pairing: Tate [Frey+94], Ate [Hess+06], Optimal [Lee+08, Vercauteren'10]

Miller's loop, Final exponentiation: [Miller'04]

BN, Fp12, Fp2, Fp arithmetic: Karatsuba, Lazy reduction [Scott'08, Aranha+11]

Fp arithmetic (modular operations): Barrett [Barrett'86], Montgomery [Montgomery'85], FVV [Fan+11], Blakley [Ghosh+10], RNS [Kawamura+00]

Residue Number System (RNS)

An RNS is defined by $n$ pairwise coprime integer constants: $B = \{b_1, b_2, \dots, b_n\}$, with

$M_B := \prod_{i=1}^{n} b_i, \quad b_i \in B$

Any integer $X$, $0 \le X < M_B$, is uniquely represented by $X = \{X \bmod b_1, X \bmod b_2, \dots, X \bmod b_n\}$.

Arithmetic operations on RNS ($\mathbb{Z}/M_B\mathbb{Z}$)

Normal                        | RNS
$R = X + Y \bmod M_B$         | $r_i = x_i + y_i \bmod b_i$
$R = X \times Y \bmod M_B$    | $r_i = x_i \times y_i \bmod b_i$
$R = X / Y \bmod M_B$         | $r_i = x_i y_i^{-1} \bmod b_i$, only if $\gcd(Y, M_B) = 1$
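The channel-wise rules above can be tried on a small example. This is a minimal sketch, assuming a hypothetical toy base of four small primes. The function names are illustrative, and the conversion back to an integer uses the textbook Chinese Remainder Theorem rather than anything specific to the coprocessor. It needs Python 3.8 or later for `math.prod` and modular inverses via `pow`.

```python
from math import prod

BASE = [13, 17, 19, 23]  # hypothetical toy base, pairwise coprime


def to_rns(x, base=BASE):
    """Represent x by its residues modulo each base element."""
    return [x % b for b in base]


def rns_add(x, y, base=BASE):
    """Channel-wise addition: r_i = x_i + y_i mod b_i."""
    return [(xi + yi) % b for xi, yi, b in zip(x, y, base)]


def rns_mul(x, y, base=BASE):
    """Channel-wise multiplication: r_i = x_i * y_i mod b_i."""
    return [(xi * yi) % b for xi, yi, b in zip(x, y, base)]


def from_rns(res, base=BASE):
    """Recover the integer in [0, M_B) with the CRT."""
    M = prod(base)
    total = 0
    for r, b in zip(res, base):
        Mi = M // b
        total += r * pow(Mi, -1, b) * Mi
    return total % M


X, Y = 1234, 567
M_B = prod(BASE)
print(from_rns(rns_add(to_rns(X), to_rns(Y))) == (X + Y) % M_B)
print(from_rns(rns_mul(to_rns(X), to_rns(Y))) == (X * Y) % M_B)
```

No carries cross between channels, so every channel can be computed independently and in parallel.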

RNS Montgomery reduction [Bajard+98]

RNS Montgomery with Montgomery constant $M_B$

Input: $A = a M_B \bmod p$ and $B = b M_B \bmod p$

Output: $T = a b M_B \bmod p$

Working in base B alone:

1: $T_B \leftarrow A_B B_B$
2: $Q_B \leftarrow T_B \times (-p)^{-1}$
3: $S_B \leftarrow (T + Q_B p)/M_B$

$\gcd(M_B, M_B) = M_B \neq 1$, so $M_B^{-1}$ does not exist in B.

With a second base C:

      in B                                              in C
1:    $T_B \leftarrow A_B B_B$                          $T_C \leftarrow A_C B_C$
2:    $Q_B \leftarrow T_B \times (-p)^{-1}$
3:    $Q_B \xrightarrow{\text{Base Extension}} Q_C$
4:                                                      $S_C \leftarrow (T_C + Q_C p)(M_B^{-1})_C$
5:    $S_B \xleftarrow{\text{Base Extension}} S_C$

Introduce a new base C to perform the division by $M_B$.

This needs the computation of base extensions [Kawamura+00].
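The five steps can be traced on toy numbers. This is a sketch under stated assumptions: the bases, the prime `p` and all function names are hypothetical. The base extension is done exactly through a CRT reconstruction, which is a deliberate simplification and not the Cox-based extension of [Kawamura+00] that the hardware uses. The result is only guaranteed to lie in $[0, 2p)$, as is usual for Montgomery reduction without a final subtraction.

```python
from math import prod


def to_rns(x, base):
    return [x % m for m in base]


def from_rns(res, base):
    """Exact CRT reconstruction in [0, prod(base))."""
    M = prod(base)
    return sum(r * pow(M // m, -1, m) * (M // m) for r, m in zip(res, base)) % M


def base_extend(res, src, dst):
    """Simplified exact base extension (stand-in for the hardware method)."""
    return to_rns(from_rns(res, src), dst)


def rns_montgomery_mul(A, B, p, base_B, base_C):
    """A and B are pairs (residues in base_B, residues in base_C)."""
    M_B = prod(base_B)
    # Step 1: channel-wise product in both bases
    T_B = [x * y % m for x, y, m in zip(A[0], B[0], base_B)]
    T_C = [x * y % m for x, y, m in zip(A[1], B[1], base_C)]
    # Step 2: Q = T * (-p)^(-1) mod M_B, computed channel-wise in B
    Q_B = [(t * (-pow(p, -1, m))) % m for t, m in zip(T_B, base_B)]
    # Step 3: extend Q from B to C
    Q_C = base_extend(Q_B, base_B, base_C)
    # Step 4: S = (T + Q*p) / M_B, done in C where M_B is invertible
    S_C = [((t + q * p) * pow(M_B, -1, m)) % m
           for t, q, m in zip(T_C, Q_C, base_C)]
    # Step 5: extend S back from C to B
    S_B = base_extend(S_C, base_C, base_B)
    return S_B, S_C


base_B, base_C, p = [13, 17, 19, 23], [29, 31, 37, 41], 10007
M_B = prod(base_B)
a, b = 1234, 5678
A_int, B_int = a * M_B % p, b * M_B % p          # Montgomery form
A = (to_rns(A_int, base_B), to_rns(A_int, base_C))
B = (to_rns(B_int, base_B), to_rns(B_int, base_C))
S_B, S_C = rns_montgomery_mul(A, B, p, base_B, base_C)
S = from_rns(S_C, base_C)
print(S % p == a * b * M_B % p)
```

The example relies on $p < M_B$ and $2p < M_C$, so that the product stays below $p M_B$ and the result is uniquely represented in base C.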

RNS complexity and Lazy reduction

Operation                          | RNS Montgomery          | Conventional Montgomery
Multiplication                     | $2n$ MUL                | $n^2$ MUL
Reduction (RED)                    | $2n^2 + 3n$ MUL         | $n^2 + n$ MUL
$AB \bmod p$                       | $2n^2 + 5n$ MUL         | $2n^2 + n$ MUL
$AB + CD \bmod p$                  | $2n^2 + 7n$ MUL         | $3n^2 + n$ MUL
$\sum_{i=1}^{k} A_i B_i \bmod p$   | $2n^2 + (3 + 2k)n$ MUL  | $(1 + k)n^2 + n$ MUL
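The formulas in the last row can be evaluated directly to see where lazy reduction starts to favour RNS. The sketch below is only arithmetic on the slide's expressions. The function names are illustrative, and treating $n$ as the same value in both columns (channels versus words) is an assumption made for the comparison.

```python
def rns_cost(n, k=1):
    """Word multiplications for sum of k products mod p, RNS Montgomery."""
    return 2 * n * n + (3 + 2 * k) * n


def conventional_cost(n, k=1):
    """Word multiplications for the same sum, conventional Montgomery."""
    return (1 + k) * n * n + n


n = 8
for k in (1, 2, 3):
    print(k, rns_cost(n, k), conventional_cost(n, k))
```

With $n = 8$, a single product costs 168 multiplications in RNS against 136 conventionally, but a sum of two products already costs 184 against 200: each extra product adds only $2n$ in RNS and $n^2$ otherwise.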

Theoretical conclusions

Lazy reduction reduces the complexity of pairings [Aranha+11]

RNS reduces the complexity of lazy reduction

RNS involves easy parallelism

It remains to verify this "in the real world"

Architectural Design I | Further Optimization | Architectural Design II

Cox-Rower Architecture [Kawamura+00]

Main features:

n rowers
MUL: 2 cycles
RED: 2n+3 cycles
One rower, one channel
Microcoded sequencer

[Figure: block diagram with Rower 0, Rower 1, ..., Rower 7, a controller with ROM (BRAM), DATA and CTRL connections, the ξ register and the cox.]

Rower Design

2 accumulators
Small-constant multiplier
3-port RAMs
Same data path

Underlying Field

$F_{p^2} = F_p[i]/(i^2 + 1)$

$F_{p^{12}} = F_{p^2}[\gamma]/(\gamma^6 - (1 + i))$

$(x_0 + x_1 i)(y_0 + y_1 i)(1 + i) = (x_0 y_0 - x_1 y_1 - x_0 y_1 - x_1 y_0) + (x_0 y_0 - x_1 y_1 + x_0 y_1 + x_1 y_0) i$

[Figure: rower datapath with a 36x36 multiplier, a constant multiplier {2,3,4,6}, accumulators Acc0 and Acc1, channel reduction, RAM0, and signals ξ_i, ξ_j and ρ.]
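The identity for multiplying in $F_{p^2}$ and then by $1 + i$ can be checked numerically. The sketch below is an illustration only: the small prime `p` and the function names are assumptions. It accumulates the four products of each coordinate without intermediate reduction and reduces once at the end, which is the lazy-reduction pattern the two accumulators serve.

```python
def fp2_mul(x, y, p):
    """Schoolbook product in F_p[i]/(i^2 + 1)."""
    x0, x1 = x
    y0, y1 = y
    return ((x0 * y0 - x1 * y1) % p, (x0 * y1 + x1 * y0) % p)


def fp2_mul_then_xi(x, y, p):
    """Compute x*y*(1+i) with one reduction per coordinate."""
    x0, x1 = x
    y0, y1 = y
    # Unreduced sums of four products each (lazy reduction)
    re = x0 * y0 - x1 * y1 - x0 * y1 - x1 * y0
    im = x0 * y0 - x1 * y1 + x0 * y1 + x1 * y0
    return (re % p, im % p)


p = 10007  # hypothetical small prime with p % 4 == 3, so i^2 = -1 is valid
x, y = (1234, 5678), (4321, 8765)
expected = fp2_mul(fp2_mul(x, y, p), (1, 1), p)
print(fp2_mul_then_xi(x, y, p) == expected)
```

The direct route performs two separate $F_{p^2}$ multiplications with four reductions in total, while the fused form needs only two.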

Further Optimization

Observations

Computational intensiveness: base extension.
Base extension: $n^2$ multiplications by constants.
The constants are determined by bases B and C.
Constant size is 35 bits.
With selected bases, bit-length: 35 → 25.
The bit-length of the constants is shortened.
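To make the $n^2$ constants visible, the following sketch writes a base extension as a matrix of precomputed values $(M_B / b_i) \bmod c_j$. It is an assumption-laden illustration: the toy bases and function names are hypothetical, and the correction factor `k` is computed exactly with big integers, whereas the hardware estimates it in the cox unit [Kawamura+00].

```python
from math import prod


def extension_constants(src, dst):
    """n x n table of constants (M_src / b_i) mod c_j."""
    M = prod(src)
    return [[(M // b) % c for b in src] for c in dst]


def base_extend(x_src, src, dst):
    """Extend residues of X from base src to base dst."""
    M = prod(src)
    # xi_i = x_i * (M / b_i)^(-1) mod b_i, one multiplication per channel
    xi = [(x * pow(M // b, -1, b)) % b for x, b in zip(x_src, src)]
    # X = sum(xi_i * M / b_i) - k * M; here k is computed exactly
    k = sum(s * (M // b) for s, b in zip(xi, src)) // M
    D = extension_constants(src, dst)
    return [(sum(s * d for s, d in zip(xi, row)) - k * (M % c)) % c
            for row, c in zip(D, dst)]


base_B, base_C = [13, 17, 19, 23], [29, 31, 37, 41]
X = 54321                                  # must be below prod(base_B)
x_B = [X % b for b in base_B]
D = extension_constants(base_B, base_C)
print(len(D) * len(D[0]))                  # number of constants: n^2
print(base_extend(x_B, base_B, base_C) == [X % c for c in base_C])
```

Every output channel multiplies all $n$ values $\xi_i$ by its own row of constants, so shortening those constants shortens every one of the $n^2$ multiplications.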

Dual-mode 4-stage pipelined MUL

One 35×35 multiplier
Two 35×25 multipliers
250 MHz on Virtex-6
MUL: 2 cycles
RED: n+4 cycles (vs. 2n+3 cycles)

[Figure: multiplier configurable as one 35x35 unit or two 35x25 units.]

Comparison

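Pairing Parameter Selection

Design I

Security | u                                     | $\lceil \log_2 p \rceil$
126-bit  | $-(2^{62} + 2^{55} + 1)$              | 254
128-bit  | $-(2^{63} + 2^{22} + 2^{18} + 2^7 + 1)$ | 258
192-bit  | $-(2^{160} + 2^{74} + 2^{12} + 1)$    | 646

Design II

Security | u                                     | $\lceil \log_2 p \rceil$
126-bit  | $-(2^{62} + 2^{55} + 1)$              | 254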

Logic Utilization and Cycle Count

Logic utilization

Design      | n  | Device      | Multipliers            | Logic Elements | Embedded Memory
I (Altera)  | 8  | Cyclone II  | 35 18-bit multipliers  | 14274 LC       | 67 M4k
I (Altera)  | 8  | Stratix III | 72 DSP18el             | 4233 ALMs      | 1 M144k + 18 M9k
I (Altera)  | 19 | Stratix III | 171 DSP18el            | 9910 ALMs      | 1 M144k + 40 M9k
II (Xilinx) | 8  | Virtex-6    | 32 DSP48E1s            | 7032 Slices    | 45 18Kb BRAMs

Cycle count and Latency

Design | Curve | Cycles | Technology  | Frequency | Latency
I      | BN126 | 176111 | Cyclone II  | 91 MHz    | 1.93 ms
I      | BN126 | 176111 | Stratix III | 165 MHz   | 1.07 ms
I      | BN128 | 192502 | Stratix III | 165 MHz   | 1.16 ms
I      | BN192 | 789849 | Stratix III | 131 MHz   | 6.02 ms
II     | BN126 | 143111 | Virtex-6    | 250 MHz   | 0.57 ms
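The latency column follows from the cycle count and clock frequency. The short sketch below recomputes it from the figures on this slide; the function name and the tuple layout are illustrative. Small differences from the reported values are expected, since the frequencies on the slide are presumably rounded.

```python
def latency_ms(cycles, freq_mhz):
    """Latency in milliseconds = cycles / frequency."""
    return cycles / (freq_mhz * 1e3)


# (design, curve, device, cycles, MHz, reported latency in ms), from the slide
results = [
    ("I", "BN126", "Cyclone II", 176111, 91, 1.93),
    ("I", "BN126", "Stratix III", 176111, 165, 1.07),
    ("I", "BN128", "Stratix III", 192502, 165, 1.16),
    ("I", "BN192", "Stratix III", 789849, 131, 6.02),
    ("II", "BN126", "Virtex-6", 143111, 250, 0.57),
]

for design, curve, device, cycles, mhz, reported in results:
    print(design, curve, device, round(latency_ms(cycles, mhz), 3), reported)
```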

Comparison

Design      | Pairing                          | Security [bit] | Platform             | Algorithm      | Area                    | Freq. [MHz] | Cycle     | Delay [ms]
Design I    | optimal ate                      | 126            | Altera (Stratix III) | RNS (Parallel) | 4233 ALMs, 72 DSPs      | 165         | 176,111   | 1.07
Design II   | optimal ate                      | 126            | Xilinx (Virtex-6)    | RNS (Parallel) | 7032 slices, 32 DSPs    | 250         | 143,111   | 0.573
Fan+11      | ate                              | 128            | Xilinx (Virtex-6)    | HMM (Parallel) | 4014 slices, 42 DSPs    | 210         | 336,366   | 1.60
Fan+11      | opt. ate                         | 128            | Xilinx (Virtex-6)    | HMM (Parallel) | 4014 slices, 42 DSPs    | 210         | 245,430   | 1.17
Estibals'10 | Tate over $F_{3^{5 \cdot 97}}$   | 128            | Xilinx (Virtex-4)    | -              | 4755 Slices, 7 BRAMs    | 192         | 428,853   | 2.23
Aranha+10   | opt. Eta over $F_{2^{367}}$      | 128            | Xilinx (Virtex-4)    | -              | 4518 Slices             | 220         | 773,960*  | 3.52
Ghosh+11    | $\eta_T$ over $F_{2^{1223}}$     | 128            | Xilinx (Virtex-6)    | -              | 15167 Slices            | 250         | 76,000*   | 0.19
Beuchat+10  | optimal ate                      | 126            | Core i7              | Montgomery     | -                       | 2800        | 2,330,000 | 0.83
Aranha+11   | optimal ate                      | 126            | Phenom II            | Montgomery     | -                       | 3000        | 1,562,000 | 0.52

Conclusions | Q&A

Conclusions

1 Pairing using RNS + lazy reduction.

2 Novel base selection specification.

3 Hardware architectures.

4 New speed record.

Thank you!
