Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g....
Transcript of Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g....
![Page 1: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/1.jpg)
1
Modern ECC signatures
2011 Bernstein–Duif–Lange–
Schwabe–Yang:
Ed25519 signature scheme =
EdDSA using conservative
Curve25519 elliptic curve.
https://ed25519.cr.yp.to
32-byte public keys,
64-byte signatures,
≈2125:8 security level.
Deployed in SSH, Signal,
many more applications:
https://ianix.com/pub
/ed25519-deployment.html
2
Many papers have explored
Curve25519/Ed25519 speed.
e.g. 2015 Chou software:
on Intel Sandy Bridge (2011),
57164 cycles for keygen,
63526 cycles for signature,
205741 cycles for verification,
159128 cycles for ECDH.
Compare to, e.g., 2000 Brown–
Hankerson–Lopez–Menezes:
on Intel Pentium II (1997),
1920000 cycles for ECDH
using NIST P-256 curve.
![Page 2: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/2.jpg)
1
Modern ECC signatures
2011 Bernstein–Duif–Lange–
Schwabe–Yang:
Ed25519 signature scheme =
EdDSA using conservative
Curve25519 elliptic curve.
https://ed25519.cr.yp.to
32-byte public keys,
64-byte signatures,
≈2125:8 security level.
Deployed in SSH, Signal,
many more applications:
https://ianix.com/pub
/ed25519-deployment.html
2
Many papers have explored
Curve25519/Ed25519 speed.
e.g. 2015 Chou software:
on Intel Sandy Bridge (2011),
57164 cycles for keygen,
63526 cycles for signature,
205741 cycles for verification,
159128 cycles for ECDH.
Compare to, e.g., 2000 Brown–
Hankerson–Lopez–Menezes:
on Intel Pentium II (1997),
1920000 cycles for ECDH
using NIST P-256 curve.
3
AC : cycles for alg A on CPU C.
Does AC < BD prove that
A is better than B?
![Page 3: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/3.jpg)
1
Modern ECC signatures
2011 Bernstein–Duif–Lange–
Schwabe–Yang:
Ed25519 signature scheme =
EdDSA using conservative
Curve25519 elliptic curve.
https://ed25519.cr.yp.to
32-byte public keys,
64-byte signatures,
≈2125:8 security level.
Deployed in SSH, Signal,
many more applications:
https://ianix.com/pub
/ed25519-deployment.html
2
Many papers have explored
Curve25519/Ed25519 speed.
e.g. 2015 Chou software:
on Intel Sandy Bridge (2011),
57164 cycles for keygen,
63526 cycles for signature,
205741 cycles for verification,
159128 cycles for ECDH.
Compare to, e.g., 2000 Brown–
Hankerson–Lopez–Menezes:
on Intel Pentium II (1997),
1920000 cycles for ECDH
using NIST P-256 curve.
3
AC : cycles for alg A on CPU C.
Does AC < BD prove that
A is better than B?
![Page 4: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/4.jpg)
1
Modern ECC signatures
2011 Bernstein–Duif–Lange–
Schwabe–Yang:
Ed25519 signature scheme =
EdDSA using conservative
Curve25519 elliptic curve.
https://ed25519.cr.yp.to
32-byte public keys,
64-byte signatures,
≈2125:8 security level.
Deployed in SSH, Signal,
many more applications:
https://ianix.com/pub
/ed25519-deployment.html
2
Many papers have explored
Curve25519/Ed25519 speed.
e.g. 2015 Chou software:
on Intel Sandy Bridge (2011),
57164 cycles for keygen,
63526 cycles for signature,
205741 cycles for verification,
159128 cycles for ECDH.
Compare to, e.g., 2000 Brown–
Hankerson–Lopez–Menezes:
on Intel Pentium II (1997),
1920000 cycles for ECDH
using NIST P-256 curve.
3
AC : cycles for alg A on CPU C.
Does AC < BD prove that
A is better than B?
![Page 5: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/5.jpg)
2
Many papers have explored
Curve25519/Ed25519 speed.
e.g. 2015 Chou software:
on Intel Sandy Bridge (2011),
57164 cycles for keygen,
63526 cycles for signature,
205741 cycles for verification,
159128 cycles for ECDH.
Compare to, e.g., 2000 Brown–
Hankerson–Lopez–Menezes:
on Intel Pentium II (1997),
1920000 cycles for ECDH
using NIST P-256 curve.
3
AC : cycles for alg A on CPU C.
Does AC < BD prove that
A is better than B?
![Page 6: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/6.jpg)
2
Many papers have explored
Curve25519/Ed25519 speed.
e.g. 2015 Chou software:
on Intel Sandy Bridge (2011),
57164 cycles for keygen,
63526 cycles for signature,
205741 cycles for verification,
159128 cycles for ECDH.
Compare to, e.g., 2000 Brown–
Hankerson–Lopez–Menezes:
on Intel Pentium II (1997),
1920000 cycles for ECDH
using NIST P-256 curve.
3
AC : cycles for alg A on CPU C.
Does AC < BD prove that
A is better than B?
No! Beware change in CPU.
Maybe AC > BC ; AD > BD;
C does more work per cycle than
D, thanks to CPU manufacturer.
Sometimes people measure cost
in seconds instead of cycles.
Then they benefit
from more work per cycle and
from more cycles per second.
![Page 7: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/7.jpg)
2
Many papers have explored
Curve25519/Ed25519 speed.
e.g. 2015 Chou software:
on Intel Sandy Bridge (2011),
57164 cycles for keygen,
63526 cycles for signature,
205741 cycles for verification,
159128 cycles for ECDH.
Compare to, e.g., 2000 Brown–
Hankerson–Lopez–Menezes:
on Intel Pentium II (1997),
1920000 cycles for ECDH
using NIST P-256 curve.
3
AC : cycles for alg A on CPU C.
Does AC < BD prove that
A is better than B?
No! Beware change in CPU.
Maybe AC > BC ; AD > BD;
C does more work per cycle than
D, thanks to CPU manufacturer.
Sometimes people measure cost
in seconds instead of cycles.
Then they benefit
from more work per cycle and
from more cycles per second.
4
Better comparisons
(still raising many questions):
ECDH on Intel Pentium II/III
(still not exactly the same):
1920000 cycles for NIST P-256,
832457 cycles for Curve25519.
ECDH on Sandy Bridge:
374000 cycles for NIST P-256
(from 2013 Gueron–Krasnov),
159128 cycles for Curve25519.
Verification on Sandy Bridge:
529000 cycles for ECDSA-P-256,
205741 cycles for Ed25519.
![Page 8: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/8.jpg)
2
Many papers have explored
Curve25519/Ed25519 speed.
e.g. 2015 Chou software:
on Intel Sandy Bridge (2011),
57164 cycles for keygen,
63526 cycles for signature,
205741 cycles for verification,
159128 cycles for ECDH.
Compare to, e.g., 2000 Brown–
Hankerson–Lopez–Menezes:
on Intel Pentium II (1997),
1920000 cycles for ECDH
using NIST P-256 curve.
3
AC : cycles for alg A on CPU C.
Does AC < BD prove that
A is better than B?
No! Beware change in CPU.
Maybe AC > BC ; AD > BD;
C does more work per cycle than
D, thanks to CPU manufacturer.
Sometimes people measure cost
in seconds instead of cycles.
Then they benefit
from more work per cycle and
from more cycles per second.
4
Better comparisons
(still raising many questions):
ECDH on Intel Pentium II/III
(still not exactly the same):
1920000 cycles for NIST P-256,
832457 cycles for Curve25519.
ECDH on Sandy Bridge:
374000 cycles for NIST P-256
(from 2013 Gueron–Krasnov),
159128 cycles for Curve25519.
Verification on Sandy Bridge:
529000 cycles for ECDSA-P-256,
205741 cycles for Ed25519.
![Page 9: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/9.jpg)
2
Many papers have explored
Curve25519/Ed25519 speed.
e.g. 2015 Chou software:
on Intel Sandy Bridge (2011),
57164 cycles for keygen,
63526 cycles for signature,
205741 cycles for verification,
159128 cycles for ECDH.
Compare to, e.g., 2000 Brown–
Hankerson–Lopez–Menezes:
on Intel Pentium II (1997),
1920000 cycles for ECDH
using NIST P-256 curve.
3
AC : cycles for alg A on CPU C.
Does AC < BD prove that
A is better than B?
No! Beware change in CPU.
Maybe AC > BC ; AD > BD;
C does more work per cycle than
D, thanks to CPU manufacturer.
Sometimes people measure cost
in seconds instead of cycles.
Then they benefit
from more work per cycle and
from more cycles per second.
4
Better comparisons
(still raising many questions):
ECDH on Intel Pentium II/III
(still not exactly the same):
1920000 cycles for NIST P-256,
832457 cycles for Curve25519.
ECDH on Sandy Bridge:
374000 cycles for NIST P-256
(from 2013 Gueron–Krasnov),
159128 cycles for Curve25519.
Verification on Sandy Bridge:
529000 cycles for ECDSA-P-256,
205741 cycles for Ed25519.
![Page 10: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/10.jpg)
3
AC : cycles for alg A on CPU C.
Does AC < BD prove that
A is better than B?
No! Beware change in CPU.
Maybe AC > BC ; AD > BD;
C does more work per cycle than
D, thanks to CPU manufacturer.
Sometimes people measure cost
in seconds instead of cycles.
Then they benefit
from more work per cycle and
from more cycles per second.
4
Better comparisons
(still raising many questions):
ECDH on Intel Pentium II/III
(still not exactly the same):
1920000 cycles for NIST P-256,
832457 cycles for Curve25519.
ECDH on Sandy Bridge:
374000 cycles for NIST P-256
(from 2013 Gueron–Krasnov),
159128 cycles for Curve25519.
Verification on Sandy Bridge:
529000 cycles for ECDSA-P-256,
205741 cycles for Ed25519.
![Page 11: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/11.jpg)
3
AC : cycles for alg A on CPU C.
Does AC < BD prove that
A is better than B?
No! Beware change in CPU.
Maybe AC > BC ; AD > BD;
C does more work per cycle than
D, thanks to CPU manufacturer.
Sometimes people measure cost
in seconds instead of cycles.
Then they benefit
from more work per cycle and
from more cycles per second.
4
Better comparisons
(still raising many questions):
ECDH on Intel Pentium II/III
(still not exactly the same):
1920000 cycles for NIST P-256,
832457 cycles for Curve25519.
ECDH on Sandy Bridge:
374000 cycles for NIST P-256
(from 2013 Gueron–Krasnov),
159128 cycles for Curve25519.
Verification on Sandy Bridge:
529000 cycles for ECDSA-P-256,
205741 cycles for Ed25519.
5
For each of these operations,
on each of these curves,
on each of these CPUs:
Simplest implementations
are much, much, much slower.
Questions in algorithm design
and software engineering:
How to build the fastest software
on, e.g., an ARM Cortex-A8 for
Ed25519 signature verification?
Answers feed back into crypto
design: e.g., choosing fast curves.
![Page 12: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/12.jpg)
3
AC : cycles for alg A on CPU C.
Does AC < BD prove that
A is better than B?
No! Beware change in CPU.
Maybe AC > BC ; AD > BD;
C does more work per cycle than
D, thanks to CPU manufacturer.
Sometimes people measure cost
in seconds instead of cycles.
Then they benefit
from more work per cycle and
from more cycles per second.
4
Better comparisons
(still raising many questions):
ECDH on Intel Pentium II/III
(still not exactly the same):
1920000 cycles for NIST P-256,
832457 cycles for Curve25519.
ECDH on Sandy Bridge:
374000 cycles for NIST P-256
(from 2013 Gueron–Krasnov),
159128 cycles for Curve25519.
Verification on Sandy Bridge:
529000 cycles for ECDSA-P-256,
205741 cycles for Ed25519.
5
For each of these operations,
on each of these curves,
on each of these CPUs:
Simplest implementations
are much, much, much slower.
Questions in algorithm design
and software engineering:
How to build the fastest software
on, e.g., an ARM Cortex-A8 for
Ed25519 signature verification?
Answers feed back into crypto
design: e.g., choosing fast curves.
![Page 13: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/13.jpg)
3
AC : cycles for alg A on CPU C.
Does AC < BD prove that
A is better than B?
No! Beware change in CPU.
Maybe AC > BC ; AD > BD;
C does more work per cycle than
D, thanks to CPU manufacturer.
Sometimes people measure cost
in seconds instead of cycles.
Then they benefit
from more work per cycle and
from more cycles per second.
4
Better comparisons
(still raising many questions):
ECDH on Intel Pentium II/III
(still not exactly the same):
1920000 cycles for NIST P-256,
832457 cycles for Curve25519.
ECDH on Sandy Bridge:
374000 cycles for NIST P-256
(from 2013 Gueron–Krasnov),
159128 cycles for Curve25519.
Verification on Sandy Bridge:
529000 cycles for ECDSA-P-256,
205741 cycles for Ed25519.
5
For each of these operations,
on each of these curves,
on each of these CPUs:
Simplest implementations
are much, much, much slower.
Questions in algorithm design
and software engineering:
How to build the fastest software
on, e.g., an ARM Cortex-A8 for
Ed25519 signature verification?
Answers feed back into crypto
design: e.g., choosing fast curves.
![Page 14: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/14.jpg)
4
Better comparisons
(still raising many questions):
ECDH on Intel Pentium II/III
(still not exactly the same):
1920000 cycles for NIST P-256,
832457 cycles for Curve25519.
ECDH on Sandy Bridge:
374000 cycles for NIST P-256
(from 2013 Gueron–Krasnov),
159128 cycles for Curve25519.
Verification on Sandy Bridge:
529000 cycles for ECDSA-P-256,
205741 cycles for Ed25519.
5
For each of these operations,
on each of these curves,
on each of these CPUs:
Simplest implementations
are much, much, much slower.
Questions in algorithm design
and software engineering:
How to build the fastest software
on, e.g., an ARM Cortex-A8 for
Ed25519 signature verification?
Answers feed back into crypto
design: e.g., choosing fast curves.
![Page 15: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/15.jpg)
4
Better comparisons
(still raising many questions):
ECDH on Intel Pentium II/III
(still not exactly the same):
1920000 cycles for NIST P-256,
832457 cycles for Curve25519.
ECDH on Sandy Bridge:
374000 cycles for NIST P-256
(from 2013 Gueron–Krasnov),
159128 cycles for Curve25519.
Verification on Sandy Bridge:
529000 cycles for ECDSA-P-256,
205741 cycles for Ed25519.
5
For each of these operations,
on each of these curves,
on each of these CPUs:
Simplest implementations
are much, much, much slower.
Questions in algorithm design
and software engineering:
How to build the fastest software
on, e.g., an ARM Cortex-A8 for
Ed25519 signature verification?
Answers feed back into crypto
design: e.g., choosing fast curves.
6
Several levels to optimize:
ECC ops: e.g.,verify SB = R + hA
windowing etc.��
Point ops: e.g.,P;Q 7→ P + Q
faster doubling etc.��
Field ops: e.g.,x1; x2 7→ x1x2 in Fp
delayed carries etc.��
Machine insns: e.g.,32-bit multiplication
��pipelining etc.��
Gates: e.g.,AND, OR, XOR
![Page 16: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/16.jpg)
4
Better comparisons
(still raising many questions):
ECDH on Intel Pentium II/III
(still not exactly the same):
1920000 cycles for NIST P-256,
832457 cycles for Curve25519.
ECDH on Sandy Bridge:
374000 cycles for NIST P-256
(from 2013 Gueron–Krasnov),
159128 cycles for Curve25519.
Verification on Sandy Bridge:
529000 cycles for ECDSA-P-256,
205741 cycles for Ed25519.
5
For each of these operations,
on each of these curves,
on each of these CPUs:
Simplest implementations
are much, much, much slower.
Questions in algorithm design
and software engineering:
How to build the fastest software
on, e.g., an ARM Cortex-A8 for
Ed25519 signature verification?
Answers feed back into crypto
design: e.g., choosing fast curves.
6
Several levels to optimize:
ECC ops: e.g.,verify SB = R + hA
windowing etc.��
Point ops: e.g.,P;Q 7→ P + Q
faster doubling etc.��
Field ops: e.g.,x1; x2 7→ x1x2 in Fp
delayed carries etc.��
Machine insns: e.g.,32-bit multiplication
��pipelining etc.��
Gates: e.g.,AND, OR, XOR
![Page 17: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/17.jpg)
4
Better comparisons
(still raising many questions):
ECDH on Intel Pentium II/III
(still not exactly the same):
1920000 cycles for NIST P-256,
832457 cycles for Curve25519.
ECDH on Sandy Bridge:
374000 cycles for NIST P-256
(from 2013 Gueron–Krasnov),
159128 cycles for Curve25519.
Verification on Sandy Bridge:
529000 cycles for ECDSA-P-256,
205741 cycles for Ed25519.
5
For each of these operations,
on each of these curves,
on each of these CPUs:
Simplest implementations
are much, much, much slower.
Questions in algorithm design
and software engineering:
How to build the fastest software
on, e.g., an ARM Cortex-A8 for
Ed25519 signature verification?
Answers feed back into crypto
design: e.g., choosing fast curves.
6
Several levels to optimize:
ECC ops: e.g.,verify SB = R + hA
windowing etc.��
Point ops: e.g.,P;Q 7→ P + Q
faster doubling etc.��
Field ops: e.g.,x1; x2 7→ x1x2 in Fp
delayed carries etc.��
Machine insns: e.g.,32-bit multiplication
��pipelining etc.��
Gates: e.g.,AND, OR, XOR
![Page 18: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/18.jpg)
5
For each of these operations,
on each of these curves,
on each of these CPUs:
Simplest implementations
are much, much, much slower.
Questions in algorithm design
and software engineering:
How to build the fastest software
on, e.g., an ARM Cortex-A8 for
Ed25519 signature verification?
Answers feed back into crypto
design: e.g., choosing fast curves.
6
Several levels to optimize:
ECC ops: e.g.,verify SB = R + hA
windowing etc.��
Point ops: e.g.,P;Q 7→ P + Q
faster doubling etc.��
Field ops: e.g.,x1; x2 7→ x1x2 in Fp
delayed carries etc.��
Machine insns: e.g.,32-bit multiplication
��pipelining etc.��
Gates: e.g.,AND, OR, XOR
![Page 19: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/19.jpg)
5
For each of these operations,
on each of these curves,
on each of these CPUs:
Simplest implementations
are much, much, much slower.
Questions in algorithm design
and software engineering:
How to build the fastest software
on, e.g., an ARM Cortex-A8 for
Ed25519 signature verification?
Answers feed back into crypto
design: e.g., choosing fast curves.
6
Several levels to optimize:
ECC ops: e.g.,verify SB = R + hA
windowing etc.��
Point ops: e.g.,P;Q 7→ P + Q
faster doubling etc.��
Field ops: e.g.,x1; x2 7→ x1x2 in Fp
delayed carries etc.��
Machine insns: e.g.,32-bit multiplication
��pipelining etc.��
Gates: e.g.,AND, OR, XOR
7
Single-scalar multiplication
Fundamental ECC operation:
n; P 7→ nP .
Input n is integer in, e.g.,˘0; 1; : : : ; 2256 − 1
¯.
Input P is point on elliptic curve.
Will build n; P 7→ nP
using additions P;Q 7→ P + Q
and subtractions P;Q 7→ P − Q.
Later will also look at
double-scalar multiplication
m;P; n;Q 7→ mP + nQ.
![Page 20: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/20.jpg)
5
For each of these operations,
on each of these curves,
on each of these CPUs:
Simplest implementations
are much, much, much slower.
Questions in algorithm design
and software engineering:
How to build the fastest software
on, e.g., an ARM Cortex-A8 for
Ed25519 signature verification?
Answers feed back into crypto
design: e.g., choosing fast curves.
6
Several levels to optimize:
ECC ops: e.g.,verify SB = R + hA
windowing etc.��
Point ops: e.g.,P;Q 7→ P + Q
faster doubling etc.��
Field ops: e.g.,x1; x2 7→ x1x2 in Fp
delayed carries etc.��
Machine insns: e.g.,32-bit multiplication
��pipelining etc.��
Gates: e.g.,AND, OR, XOR
7
Single-scalar multiplication
Fundamental ECC operation:
n; P 7→ nP .
Input n is integer in, e.g.,˘0; 1; : : : ; 2256 − 1
¯.
Input P is point on elliptic curve.
Will build n; P 7→ nP
using additions P;Q 7→ P + Q
and subtractions P;Q 7→ P − Q.
Later will also look at
double-scalar multiplication
m;P; n;Q 7→ mP + nQ.
![Page 21: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/21.jpg)
5
For each of these operations,
on each of these curves,
on each of these CPUs:
Simplest implementations
are much, much, much slower.
Questions in algorithm design
and software engineering:
How to build the fastest software
on, e.g., an ARM Cortex-A8 for
Ed25519 signature verification?
Answers feed back into crypto
design: e.g., choosing fast curves.
6
Several levels to optimize:
ECC ops: e.g.,verify SB = R + hA
windowing etc.��
Point ops: e.g.,P;Q 7→ P + Q
faster doubling etc.��
Field ops: e.g.,x1; x2 7→ x1x2 in Fp
delayed carries etc.��
Machine insns: e.g.,32-bit multiplication
��pipelining etc.��
Gates: e.g.,AND, OR, XOR
7
Single-scalar multiplication
Fundamental ECC operation:
n; P 7→ nP .
Input n is integer in, e.g.,˘0; 1; : : : ; 2256 − 1
¯.
Input P is point on elliptic curve.
Will build n; P 7→ nP
using additions P;Q 7→ P + Q
and subtractions P;Q 7→ P − Q.
Later will also look at
double-scalar multiplication
m;P; n;Q 7→ mP + nQ.
![Page 22: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/22.jpg)
6
Several levels to optimize:
ECC ops: e.g.,verify SB = R + hA
windowing etc.��
Point ops: e.g.,P;Q 7→ P + Q
faster doubling etc.��
Field ops: e.g.,x1; x2 7→ x1x2 in Fp
delayed carries etc.��
Machine insns: e.g.,32-bit multiplication
��pipelining etc.��
Gates: e.g.,AND, OR, XOR
7
Single-scalar multiplication
Fundamental ECC operation:
n; P 7→ nP .
Input n is integer in, e.g.,˘0; 1; : : : ; 2256 − 1
¯.
Input P is point on elliptic curve.
Will build n; P 7→ nP
using additions P;Q 7→ P + Q
and subtractions P;Q 7→ P − Q.
Later will also look at
double-scalar multiplication
m;P; n;Q 7→ mP + nQ.
![Page 23: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/23.jpg)
6
Several levels to optimize:
ECC ops: e.g.,verify SB = R + hA
windowing etc.��
Point ops: e.g.,P;Q 7→ P + Q
faster doubling etc.��
Field ops: e.g.,x1; x2 7→ x1x2 in Fp
delayed carries etc.��
Machine insns: e.g.,32-bit multiplication
��pipelining etc.��
Gates: e.g.,AND, OR, XOR
7
Single-scalar multiplication
Fundamental ECC operation:
n; P 7→ nP .
Input n is integer in, e.g.,˘0; 1; : : : ; 2256 − 1
¯.
Input P is point on elliptic curve.
Will build n; P 7→ nP
using additions P;Q 7→ P + Q
and subtractions P;Q 7→ P − Q.
Later will also look at
double-scalar multiplication
m;P; n;Q 7→ mP + nQ.
8
Left-to-right binary method
def scalarmult(n,P):
if n == 0: return 0
if n == 1: return P
R = scalarmult(n//2,P)
R = R + R
if n % 2: R = R + P
return R
Two Python notes:
• n//2 in Python means bn=2c.
• Recursion depth is limited.
See sys.setrecursionlimit.
![Page 24: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/24.jpg)
6
Several levels to optimize:
ECC ops: e.g.,verify SB = R + hA
windowing etc.��
Point ops: e.g.,P;Q 7→ P + Q
faster doubling etc.��
Field ops: e.g.,x1; x2 7→ x1x2 in Fp
delayed carries etc.��
Machine insns: e.g.,32-bit multiplication
��pipelining etc.��
Gates: e.g.,AND, OR, XOR
7
Single-scalar multiplication
Fundamental ECC operation:
n; P 7→ nP .
Input n is integer in, e.g.,˘0; 1; : : : ; 2256 − 1
¯.
Input P is point on elliptic curve.
Will build n; P 7→ nP
using additions P;Q 7→ P + Q
and subtractions P;Q 7→ P − Q.
Later will also look at
double-scalar multiplication
m;P; n;Q 7→ mP + nQ.
8
Left-to-right binary method
def scalarmult(n,P):
if n == 0: return 0
if n == 1: return P
R = scalarmult(n//2,P)
R = R + R
if n % 2: R = R + P
return R
Two Python notes:
• n//2 in Python means bn=2c.
• Recursion depth is limited.
See sys.setrecursionlimit.
![Page 25: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/25.jpg)
6
Several levels to optimize:
ECC ops: e.g.,verify SB = R + hA
windowing etc.��
Point ops: e.g.,P;Q 7→ P + Q
faster doubling etc.��
Field ops: e.g.,x1; x2 7→ x1x2 in Fp
delayed carries etc.��
Machine insns: e.g.,32-bit multiplication
��pipelining etc.��
Gates: e.g.,AND, OR, XOR
7
Single-scalar multiplication
Fundamental ECC operation:
n; P 7→ nP .
Input n is integer in, e.g.,˘0; 1; : : : ; 2256 − 1
¯.
Input P is point on elliptic curve.
Will build n; P 7→ nP
using additions P;Q 7→ P + Q
and subtractions P;Q 7→ P − Q.
Later will also look at
double-scalar multiplication
m;P; n;Q 7→ mP + nQ.
8
Left-to-right binary method
def scalarmult(n,P):
if n == 0: return 0
if n == 1: return P
R = scalarmult(n//2,P)
R = R + R
if n % 2: R = R + P
return R
Two Python notes:
• n//2 in Python means bn=2c.
• Recursion depth is limited.
See sys.setrecursionlimit.
![Page 26: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/26.jpg)
7
Single-scalar multiplication
Fundamental ECC operation:
n; P 7→ nP .
Input n is integer in, e.g.,˘0; 1; : : : ; 2256 − 1
¯.
Input P is point on elliptic curve.
Will build n; P 7→ nP
using additions P;Q 7→ P + Q
and subtractions P;Q 7→ P − Q.
Later will also look at
double-scalar multiplication
m;P; n;Q 7→ mP + nQ.
8
Left-to-right binary method
def scalarmult(n,P):
if n == 0: return 0
if n == 1: return P
R = scalarmult(n//2,P)
R = R + R
if n % 2: R = R + P
return R
Two Python notes:
• n//2 in Python means bn=2c.
• Recursion depth is limited.
See sys.setrecursionlimit.
![Page 27: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/27.jpg)
7
Single-scalar multiplication
Fundamental ECC operation:
n; P 7→ nP .
Input n is integer in, e.g.,˘0; 1; : : : ; 2256 − 1
¯.
Input P is point on elliptic curve.
Will build n; P 7→ nP
using additions P;Q 7→ P + Q
and subtractions P;Q 7→ P − Q.
Later will also look at
double-scalar multiplication
m;P; n;Q 7→ mP + nQ.
8
Left-to-right binary method
def scalarmult(n,P):
if n == 0: return 0
if n == 1: return P
R = scalarmult(n//2,P)
R = R + R
if n % 2: R = R + P
return R
Two Python notes:
• n//2 in Python means bn=2c.
• Recursion depth is limited.
See sys.setrecursionlimit.
9
This recursion computes nP as
• 2“n
2P”
if n ∈ 2Z.
e.g. 20P = 2 · 10P .
• 2
„n − 1
2P
«+ P if n ∈ 1 + 2Z.
e.g. 21P = 2 · 10P + P .
Base cases in recursion:
0P = 0. For Edwards: 0 = (0; 1).
1P = P . Could omit this case.
Assuming n ≥ 0 for simplicity.
Otherwise use nP = −(−n)P .
![Page 28: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/28.jpg)
7
Single-scalar multiplication
Fundamental ECC operation:
n; P 7→ nP .
Input n is integer in, e.g.,˘0; 1; : : : ; 2256 − 1
¯.
Input P is point on elliptic curve.
Will build n; P 7→ nP
using additions P;Q 7→ P + Q
and subtractions P;Q 7→ P − Q.
Later will also look at
double-scalar multiplication
m;P; n;Q 7→ mP + nQ.
8
Left-to-right binary method
def scalarmult(n,P):
if n == 0: return 0
if n == 1: return P
R = scalarmult(n//2,P)
R = R + R
if n % 2: R = R + P
return R
Two Python notes:
• n//2 in Python means bn=2c.
• Recursion depth is limited.
See sys.setrecursionlimit.
9
This recursion computes nP as
• 2“n
2P”
if n ∈ 2Z.
e.g. 20P = 2 · 10P .
• 2
„n − 1
2P
«+ P if n ∈ 1 + 2Z.
e.g. 21P = 2 · 10P + P .
Base cases in recursion:
0P = 0. For Edwards: 0 = (0; 1).
1P = P . Could omit this case.
Assuming n ≥ 0 for simplicity.
Otherwise use nP = −(−n)P .
![Page 29: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/29.jpg)
7
Single-scalar multiplication
Fundamental ECC operation:
n; P 7→ nP .
Input n is integer in, e.g.,˘0; 1; : : : ; 2256 − 1
¯.
Input P is point on elliptic curve.
Will build n; P 7→ nP
using additions P;Q 7→ P + Q
and subtractions P;Q 7→ P − Q.
Later will also look at
double-scalar multiplication
m;P; n;Q 7→ mP + nQ.
8
Left-to-right binary method
def scalarmult(n,P):
if n == 0: return 0
if n == 1: return P
R = scalarmult(n//2,P)
R = R + R
if n % 2: R = R + P
return R
Two Python notes:
• n//2 in Python means bn=2c.
• Recursion depth is limited.
See sys.setrecursionlimit.
9
This recursion computes nP as
• 2“n
2P”
if n ∈ 2Z.
e.g. 20P = 2 · 10P .
• 2
„n − 1
2P
«+ P if n ∈ 1 + 2Z.
e.g. 21P = 2 · 10P + P .
Base cases in recursion:
0P = 0. For Edwards: 0 = (0; 1).
1P = P . Could omit this case.
Assuming n ≥ 0 for simplicity.
Otherwise use nP = −(−n)P .
![Page 30: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/30.jpg)
8
Left-to-right binary method
def scalarmult(n,P):
if n == 0: return 0
if n == 1: return P
R = scalarmult(n//2,P)
R = R + R
if n % 2: R = R + P
return R
Two Python notes:
• n//2 in Python means bn=2c.
• Recursion depth is limited.
See sys.setrecursionlimit.
9
This recursion computes nP as
• 2“n
2P”
if n ∈ 2Z.
e.g. 20P = 2 · 10P .
• 2
„n − 1
2P
«+ P if n ∈ 1 + 2Z.
e.g. 21P = 2 · 10P + P .
Base cases in recursion:
0P = 0. For Edwards: 0 = (0; 1).
1P = P . Could omit this case.
Assuming n ≥ 0 for simplicity.
Otherwise use nP = −(−n)P .
![Page 31: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/31.jpg)
8
Left-to-right binary method
def scalarmult(n,P):
if n == 0: return 0
if n == 1: return P
R = scalarmult(n//2,P)
R = R + R
if n % 2: R = R + P
return R
Two Python notes:
• n//2 in Python means bn=2c.
• Recursion depth is limited.
See sys.setrecursionlimit.
9
This recursion computes nP as
• 2“n
2P”
if n ∈ 2Z.
e.g. 20P = 2 · 10P .
• 2
„n − 1
2P
«+ P if n ∈ 1 + 2Z.
e.g. 21P = 2 · 10P + P .
Base cases in recursion:
0P = 0. For Edwards: 0 = (0; 1).
1P = P . Could omit this case.
Assuming n ≥ 0 for simplicity.
Otherwise use nP = −(−n)P .
10
If 0 ≤ n < 2b then
this algorithm uses
≤2b − 2 additions: specifically
≤b − 1 doublings and
≤b − 1 additions of P .
Example of worst case:
31P = 2(2(2(2P+P )+P )+P )+P .
31 = (11111)2; b = 5;
4 doublings; 4 more additions.
Average case is better: e.g.
35P = 2(2(2(2(2P ))) + P ) + P .
35 = (100011)2; b = 6;
5 doublings; 2 additions.
![Page 32: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/32.jpg)
8
Left-to-right binary method
def scalarmult(n,P):
if n == 0: return 0
if n == 1: return P
R = scalarmult(n//2,P)
R = R + R
if n % 2: R = R + P
return R
Two Python notes:
• n//2 in Python means bn=2c.
• Recursion depth is limited.
See sys.setrecursionlimit.
9
This recursion computes nP as
• 2“n
2P”
if n ∈ 2Z.
e.g. 20P = 2 · 10P .
• 2
„n − 1
2P
«+ P if n ∈ 1 + 2Z.
e.g. 21P = 2 · 10P + P .
Base cases in recursion:
0P = 0. For Edwards: 0 = (0; 1).
1P = P . Could omit this case.
Assuming n ≥ 0 for simplicity.
Otherwise use nP = −(−n)P .
10
If 0 ≤ n < 2b then
this algorithm uses
≤2b − 2 additions: specifically
≤b − 1 doublings and
≤b − 1 additions of P .
Example of worst case:
31P = 2(2(2(2P+P )+P )+P )+P .
31 = (11111)2; b = 5;
4 doublings; 4 more additions.
Average case is better: e.g.
35P = 2(2(2(2(2P ))) + P ) + P .
35 = (100011)2; b = 6;
5 doublings; 2 additions.
![Page 33: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/33.jpg)
8
Left-to-right binary method
def scalarmult(n,P):
if n == 0: return 0
if n == 1: return P
R = scalarmult(n//2,P)
R = R + R
if n % 2: R = R + P
return R
Two Python notes:
• n//2 in Python means bn=2c.
• Recursion depth is limited.
See sys.setrecursionlimit.
9
This recursion computes nP as
• 2“n
2P”
if n ∈ 2Z.
e.g. 20P = 2 · 10P .
• 2
„n − 1
2P
«+ P if n ∈ 1 + 2Z.
e.g. 21P = 2 · 10P + P .
Base cases in recursion:
0P = 0. For Edwards: 0 = (0; 1).
1P = P . Could omit this case.
Assuming n ≥ 0 for simplicity.
Otherwise use nP = −(−n)P .
10
If 0 ≤ n < 2b then
this algorithm uses
≤2b − 2 additions: specifically
≤b − 1 doublings and
≤b − 1 additions of P .
Example of worst case:
31P = 2(2(2(2P+P )+P )+P )+P .
31 = (11111)2; b = 5;
4 doublings; 4 more additions.
Average case is better: e.g.
35P = 2(2(2(2(2P ))) + P ) + P .
35 = (100011)2; b = 6;
5 doublings; 2 additions.
![Page 34: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/34.jpg)
9
This recursion computes nP as
• 2“n
2P”
if n ∈ 2Z.
e.g. 20P = 2 · 10P .
• 2
„n − 1
2P
«+ P if n ∈ 1 + 2Z.
e.g. 21P = 2 · 10P + P .
Base cases in recursion:
0P = 0. For Edwards: 0 = (0; 1).
1P = P . Could omit this case.
Assuming n ≥ 0 for simplicity.
Otherwise use nP = −(−n)P .
10
If 0 ≤ n < 2b then
this algorithm uses
≤2b − 2 additions: specifically
≤b − 1 doublings and
≤b − 1 additions of P .
Example of worst case:
31P = 2(2(2(2P+P )+P )+P )+P .
31 = (11111)2; b = 5;
4 doublings; 4 more additions.
Average case is better: e.g.
35P = 2(2(2(2(2P ))) + P ) + P .
35 = (100011)2; b = 6;
5 doublings; 2 additions.
![Page 35: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/35.jpg)
9
This recursion computes nP as
• 2“n
2P”
if n ∈ 2Z.
e.g. 20P = 2 · 10P .
• 2
„n − 1
2P
«+ P if n ∈ 1 + 2Z.
e.g. 21P = 2 · 10P + P .
Base cases in recursion:
0P = 0. For Edwards: 0 = (0; 1).
1P = P . Could omit this case.
Assuming n ≥ 0 for simplicity.
Otherwise use nP = −(−n)P .
10
If 0 ≤ n < 2b then
this algorithm uses
≤2b − 2 additions: specifically
≤b − 1 doublings and
≤b − 1 additions of P .
Example of worst case:
31P = 2(2(2(2P+P )+P )+P )+P .
31 = (11111)2; b = 5;
4 doublings; 4 more additions.
Average case is better: e.g.
35P = 2(2(2(2(2P ))) + P ) + P .
35 = (100011)2; b = 6;
5 doublings; 2 additions.
11
Non-adjacent form (NAF)
def scalarmult(n,P):
if n == 0: return 0
if n == 1: return P
if n % 4 == 1:
R = scalarmult((n-1)/4,P)
R = R + R
return (R + R) + P
if n % 4 == 3:
R = scalarmult((n+1)/4,P)
R = R + R
return (R + R) - P
R = scalarmult(n/2,P)
return R + R
![Page 36: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/36.jpg)
9
This recursion computes nP as
• 2“n
2P”
if n ∈ 2Z.
e.g. 20P = 2 · 10P .
• 2
„n − 1
2P
«+ P if n ∈ 1 + 2Z.
e.g. 21P = 2 · 10P + P .
Base cases in recursion:
0P = 0. For Edwards: 0 = (0; 1).
1P = P . Could omit this case.
Assuming n ≥ 0 for simplicity.
Otherwise use nP = −(−n)P .
10
If 0 ≤ n < 2b then
this algorithm uses
≤2b − 2 additions: specifically
≤b − 1 doublings and
≤b − 1 additions of P .
Example of worst case:
31P = 2(2(2(2P+P )+P )+P )+P .
31 = (11111)2; b = 5;
4 doublings; 4 more additions.
Average case is better: e.g.
35P = 2(2(2(2(2P ))) + P ) + P .
35 = (100011)2; b = 6;
5 doublings; 2 additions.
11
Non-adjacent form (NAF)
def scalarmult(n,P):
if n == 0: return 0
if n == 1: return P
if n % 4 == 1:
R = scalarmult((n-1)/4,P)
R = R + R
return (R + R) + P
if n % 4 == 3:
R = scalarmult((n+1)/4,P)
R = R + R
return (R + R) - P
R = scalarmult(n/2,P)
return R + R
![Page 37: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/37.jpg)
9
This recursion computes nP as
• 2“n
2P”
if n ∈ 2Z.
e.g. 20P = 2 · 10P .
• 2
„n − 1
2P
«+ P if n ∈ 1 + 2Z.
e.g. 21P = 2 · 10P + P .
Base cases in recursion:
0P = 0. For Edwards: 0 = (0; 1).
1P = P . Could omit this case.
Assuming n ≥ 0 for simplicity.
Otherwise use nP = −(−n)P .
10
If 0 ≤ n < 2b then
this algorithm uses
≤2b − 2 additions: specifically
≤b − 1 doublings and
≤b − 1 additions of P .
Example of worst case:
31P = 2(2(2(2P+P )+P )+P )+P .
31 = (11111)2; b = 5;
4 doublings; 4 more additions.
Average case is better: e.g.
35P = 2(2(2(2(2P ))) + P ) + P .
35 = (100011)2; b = 6;
5 doublings; 2 additions.
11
Non-adjacent form (NAF)
def scalarmult(n,P):
if n == 0: return 0
if n == 1: return P
if n % 4 == 1:
R = scalarmult((n-1)/4,P)
R = R + R
return (R + R) + P
if n % 4 == 3:
R = scalarmult((n+1)/4,P)
R = R + R
return (R + R) - P
R = scalarmult(n/2,P)
return R + R
![Page 38: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/38.jpg)
10
If 0 ≤ n < 2b then
this algorithm uses
≤2b − 2 additions: specifically
≤b − 1 doublings and
≤b − 1 additions of P .
Example of worst case:
31P = 2(2(2(2P+P )+P )+P )+P .
31 = (11111)2; b = 5;
4 doublings; 4 more additions.
Average case is better: e.g.
35P = 2(2(2(2(2P ))) + P ) + P .
35 = (100011)2; b = 6;
5 doublings; 2 additions.
11
Non-adjacent form (NAF)
def scalarmult(n,P):
if n == 0: return 0
if n == 1: return P
if n % 4 == 1:
R = scalarmult((n-1)/4,P)
R = R + R
return (R + R) + P
if n % 4 == 3:
R = scalarmult((n+1)/4,P)
R = R + R
return (R + R) - P
R = scalarmult(n/2,P)
return R + R
![Page 39: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/39.jpg)
10
If 0 ≤ n < 2b then
this algorithm uses
≤2b − 2 additions: specifically
≤b − 1 doublings and
≤b − 1 additions of P .
Example of worst case:
31P = 2(2(2(2P+P )+P )+P )+P .
31 = (11111)2; b = 5;
4 doublings; 4 more additions.
Average case is better: e.g.
35P = 2(2(2(2(2P ))) + P ) + P .
35 = (100011)2; b = 6;
5 doublings; 2 additions.
11
Non-adjacent form (NAF)
def scalarmult(n,P):
if n == 0: return 0
if n == 1: return P
if n % 4 == 1:
R = scalarmult((n-1)/4,P)
R = R + R
return (R + R) + P
if n % 4 == 3:
R = scalarmult((n+1)/4,P)
R = R + R
return (R + R) - P
R = scalarmult(n/2,P)
return R + R
12
Subtraction on the curve
is as cheap as addition.
NAF takes advantage of this.
31P = 2(2(2(2(2P ))))− P .
31 = (100001)2; 1 denotes −1.
35P = 2(2(2(2(2P )) + P ))− P .
35 = (100101)2.
“Non-adjacent”: ±P ops are
separated by ≥2 doublings.
Worst case: ≈b doublings
plus ≈b=2 additions of ±P .
On average ≈b=3 additions.
![Page 40: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/40.jpg)
10
If 0 ≤ n < 2b then
this algorithm uses
≤2b − 2 additions: specifically
≤b − 1 doublings and
≤b − 1 additions of P .
Example of worst case:
31P = 2(2(2(2P+P )+P )+P )+P .
31 = (11111)2; b = 5;
4 doublings; 4 more additions.
Average case is better: e.g.
35P = 2(2(2(2(2P ))) + P ) + P .
35 = (100011)2; b = 6;
5 doublings; 2 additions.
11
Non-adjacent form (NAF)
def scalarmult(n,P):
if n == 0: return 0
if n == 1: return P
if n % 4 == 1:
R = scalarmult((n-1)/4,P)
R = R + R
return (R + R) + P
if n % 4 == 3:
R = scalarmult((n+1)/4,P)
R = R + R
return (R + R) - P
R = scalarmult(n/2,P)
return R + R
12
Subtraction on the curve
is as cheap as addition.
NAF takes advantage of this.
31P = 2(2(2(2(2P ))))− P .
31 = (100001)2; 1 denotes −1.
35P = 2(2(2(2(2P )) + P ))− P .
35 = (100101)2.
“Non-adjacent”: ±P ops are
separated by ≥2 doublings.
Worst case: ≈b doublings
plus ≈b=2 additions of ±P .
On average ≈b=3 additions.
![Page 41: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/41.jpg)
10
If 0 ≤ n < 2b then
this algorithm uses
≤2b − 2 additions: specifically
≤b − 1 doublings and
≤b − 1 additions of P .
Example of worst case:
31P = 2(2(2(2P+P )+P )+P )+P .
31 = (11111)2; b = 5;
4 doublings; 4 more additions.
Average case is better: e.g.
35P = 2(2(2(2(2P ))) + P ) + P .
35 = (100011)2; b = 6;
5 doublings; 2 additions.
11
Non-adjacent form (NAF)
def scalarmult(n,P):
if n == 0: return 0
if n == 1: return P
if n % 4 == 1:
R = scalarmult((n-1)/4,P)
R = R + R
return (R + R) + P
if n % 4 == 3:
R = scalarmult((n+1)/4,P)
R = R + R
return (R + R) - P
R = scalarmult(n/2,P)
return R + R
12
Subtraction on the curve
is as cheap as addition.
NAF takes advantage of this.
31P = 2(2(2(2(2P ))))− P .
31 = (100001)2; 1 denotes −1.
35P = 2(2(2(2(2P )) + P ))− P .
35 = (100101)2.
“Non-adjacent”: ±P ops are
separated by ≥2 doublings.
Worst case: ≈b doublings
plus ≈b=2 additions of ±P .
On average ≈b=3 additions.
![Page 42: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/42.jpg)
11
Non-adjacent form (NAF)
def scalarmult(n,P):
if n == 0: return 0
if n == 1: return P
if n % 4 == 1:
R = scalarmult((n-1)/4,P)
R = R + R
return (R + R) + P
if n % 4 == 3:
R = scalarmult((n+1)/4,P)
R = R + R
return (R + R) - P
R = scalarmult(n/2,P)
return R + R
12
Subtraction on the curve
is as cheap as addition.
NAF takes advantage of this.
31P = 2(2(2(2(2P ))))− P .
31 = (100001)2; 1 denotes −1.
35P = 2(2(2(2(2P )) + P ))− P .
35 = (100101)2.
“Non-adjacent”: ±P ops are
separated by ≥2 doublings.
Worst case: ≈b doublings
plus ≈b=2 additions of ±P .
On average ≈b=3 additions.
![Page 43: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/43.jpg)
11
Non-adjacent form (NAF)
def scalarmult(n,P):
if n == 0: return 0
if n == 1: return P
if n % 4 == 1:
R = scalarmult((n-1)/4,P)
R = R + R
return (R + R) + P
if n % 4 == 3:
R = scalarmult((n+1)/4,P)
R = R + R
return (R + R) - P
R = scalarmult(n/2,P)
return R + R
12
Subtraction on the curve
is as cheap as addition.
NAF takes advantage of this.
31P = 2(2(2(2(2P ))))− P .
31 = (100001)2; 1 denotes −1.
35P = 2(2(2(2(2P )) + P ))− P .
35 = (100101)2.
“Non-adjacent”: ±P ops are
separated by ≥2 doublings.
Worst case: ≈b doublings
plus ≈b=2 additions of ±P .
On average ≈b=3 additions.
13
Width-2 signed sliding windows
def window2(n,P,P3):
if n == 0: return 0
if n == 1: return P
if n == 3: return P3
if n % 8 == 1:
R = window2((n-1)/8,P,P3)
R = R + R
R = R + R
return (R + R) + P
if n % 8 == 3:
R = window2((n-3)/8,P,P3)
R = R + R
R = R + R
return (R + R) + P3
![Page 44: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/44.jpg)
11
Non-adjacent form (NAF)
def scalarmult(n,P):
if n == 0: return 0
if n == 1: return P
if n % 4 == 1:
R = scalarmult((n-1)/4,P)
R = R + R
return (R + R) + P
if n % 4 == 3:
R = scalarmult((n+1)/4,P)
R = R + R
return (R + R) - P
R = scalarmult(n/2,P)
return R + R
12
Subtraction on the curve
is as cheap as addition.
NAF takes advantage of this.
31P = 2(2(2(2(2P ))))− P .
31 = (100001)2; 1 denotes −1.
35P = 2(2(2(2(2P )) + P ))− P .
35 = (100101)2.
“Non-adjacent”: ±P ops are
separated by ≥2 doublings.
Worst case: ≈b doublings
plus ≈b=2 additions of ±P .
On average ≈b=3 additions.
13
Width-2 signed sliding windows
def window2(n,P,P3):
if n == 0: return 0
if n == 1: return P
if n == 3: return P3
if n % 8 == 1:
R = window2((n-1)/8,P,P3)
R = R + R
R = R + R
return (R + R) + P
if n % 8 == 3:
R = window2((n-3)/8,P,P3)
R = R + R
R = R + R
return (R + R) + P3
![Page 45: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/45.jpg)
11
Non-adjacent form (NAF)
def scalarmult(n,P):
if n == 0: return 0
if n == 1: return P
if n % 4 == 1:
R = scalarmult((n-1)/4,P)
R = R + R
return (R + R) + P
if n % 4 == 3:
R = scalarmult((n+1)/4,P)
R = R + R
return (R + R) - P
R = scalarmult(n/2,P)
return R + R
12
Subtraction on the curve
is as cheap as addition.
NAF takes advantage of this.
31P = 2(2(2(2(2P ))))− P .
31 = (100001)2; 1 denotes −1.
35P = 2(2(2(2(2P )) + P ))− P .
35 = (100101)2.
“Non-adjacent”: ±P ops are
separated by ≥2 doublings.
Worst case: ≈b doublings
plus ≈b=2 additions of ±P .
On average ≈b=3 additions.
13
Width-2 signed sliding windows
def window2(n,P,P3):
if n == 0: return 0
if n == 1: return P
if n == 3: return P3
if n % 8 == 1:
R = window2((n-1)/8,P,P3)
R = R + R
R = R + R
return (R + R) + P
if n % 8 == 3:
R = window2((n-3)/8,P,P3)
R = R + R
R = R + R
return (R + R) + P3
![Page 46: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/46.jpg)
12
Subtraction on the curve
is as cheap as addition.
NAF takes advantage of this.
31P = 2(2(2(2(2P ))))− P .
31 = (100001)2; 1 denotes −1.
35P = 2(2(2(2(2P )) + P ))− P .
35 = (100101)2.
“Non-adjacent”: ±P ops are
separated by ≥2 doublings.
Worst case: ≈b doublings
plus ≈b=2 additions of ±P .
On average ≈b=3 additions.
13
Width-2 signed sliding windows
def window2(n,P,P3):
if n == 0: return 0
if n == 1: return P
if n == 3: return P3
if n % 8 == 1:
R = window2((n-1)/8,P,P3)
R = R + R
R = R + R
return (R + R) + P
if n % 8 == 3:
R = window2((n-3)/8,P,P3)
R = R + R
R = R + R
return (R + R) + P3
![Page 47: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/47.jpg)
12
Subtraction on the curve
is as cheap as addition.
NAF takes advantage of this.
31P = 2(2(2(2(2P ))))− P .
31 = (100001)2; 1 denotes −1.
35P = 2(2(2(2(2P )) + P ))− P .
35 = (100101)2.
“Non-adjacent”: ±P ops are
separated by ≥2 doublings.
Worst case: ≈b doublings
plus ≈b=2 additions of ±P .
On average ≈b=3 additions.
13
Width-2 signed sliding windows
def window2(n,P,P3):
if n == 0: return 0
if n == 1: return P
if n == 3: return P3
if n % 8 == 1:
R = window2((n-1)/8,P,P3)
R = R + R
R = R + R
return (R + R) + P
if n % 8 == 3:
R = window2((n-3)/8,P,P3)
R = R + R
R = R + R
return (R + R) + P3
14
if n % 8 == 5:
R = window2((n+3)/8,P,P3)
R = R + R
R = R + R
return (R + R) - P3
if n % 8 == 7:
R = window2((n+1)/8,P,P3)
R = R + R
R = R + R
return (R + R) - P
R = window2(n/2,P,P3)
return R + R
def scalarmult(n,P):
return window2(n,P,P+P+P)
![Page 48: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/48.jpg)
12
Subtraction on the curve
is as cheap as addition.
NAF takes advantage of this.
31P = 2(2(2(2(2P ))))− P .
31 = (100001)2; 1 denotes −1.
35P = 2(2(2(2(2P )) + P ))− P .
35 = (100101)2.
“Non-adjacent”: ±P ops are
separated by ≥2 doublings.
Worst case: ≈b doublings
plus ≈b=2 additions of ±P .
On average ≈b=3 additions.
13
Width-2 signed sliding windows
def window2(n,P,P3):
if n == 0: return 0
if n == 1: return P
if n == 3: return P3
if n % 8 == 1:
R = window2((n-1)/8,P,P3)
R = R + R
R = R + R
return (R + R) + P
if n % 8 == 3:
R = window2((n-3)/8,P,P3)
R = R + R
R = R + R
return (R + R) + P3
14
if n % 8 == 5:
R = window2((n+3)/8,P,P3)
R = R + R
R = R + R
return (R + R) - P3
if n % 8 == 7:
R = window2((n+1)/8,P,P3)
R = R + R
R = R + R
return (R + R) - P
R = window2(n/2,P,P3)
return R + R
def scalarmult(n,P):
return window2(n,P,P+P+P)
![Page 49: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/49.jpg)
12
Subtraction on the curve
is as cheap as addition.
NAF takes advantage of this.
31P = 2(2(2(2(2P ))))− P .
31 = (100001)2; 1 denotes −1.
35P = 2(2(2(2(2P )) + P ))− P .
35 = (100101)2.
“Non-adjacent”: ±P ops are
separated by ≥2 doublings.
Worst case: ≈b doublings
plus ≈b=2 additions of ±P .
On average ≈b=3 additions.
13
Width-2 signed sliding windows
def window2(n,P,P3):
if n == 0: return 0
if n == 1: return P
if n == 3: return P3
if n % 8 == 1:
R = window2((n-1)/8,P,P3)
R = R + R
R = R + R
return (R + R) + P
if n % 8 == 3:
R = window2((n-3)/8,P,P3)
R = R + R
R = R + R
return (R + R) + P3
14
if n % 8 == 5:
R = window2((n+3)/8,P,P3)
R = R + R
R = R + R
return (R + R) - P3
if n % 8 == 7:
R = window2((n+1)/8,P,P3)
R = R + R
R = R + R
return (R + R) - P
R = window2(n/2,P,P3)
return R + R
def scalarmult(n,P):
return window2(n,P,P+P+P)
![Page 50: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/50.jpg)
13
Width-2 signed sliding windows
def window2(n,P,P3):
if n == 0: return 0
if n == 1: return P
if n == 3: return P3
if n % 8 == 1:
R = window2((n-1)/8,P,P3)
R = R + R
R = R + R
return (R + R) + P
if n % 8 == 3:
R = window2((n-3)/8,P,P3)
R = R + R
R = R + R
return (R + R) + P3
14
if n % 8 == 5:
R = window2((n+3)/8,P,P3)
R = R + R
R = R + R
return (R + R) - P3
if n % 8 == 7:
R = window2((n+1)/8,P,P3)
R = R + R
R = R + R
return (R + R) - P
R = window2(n/2,P,P3)
return R + R
def scalarmult(n,P):
return window2(n,P,P+P+P)
![Page 51: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/51.jpg)
13
Width-2 signed sliding windows
def window2(n,P,P3):
if n == 0: return 0
if n == 1: return P
if n == 3: return P3
if n % 8 == 1:
R = window2((n-1)/8,P,P3)
R = R + R
R = R + R
return (R + R) + P
if n % 8 == 3:
R = window2((n-3)/8,P,P3)
R = R + R
R = R + R
return (R + R) + P3
14
if n % 8 == 5:
R = window2((n+3)/8,P,P3)
R = R + R
R = R + R
return (R + R) - P3
if n % 8 == 7:
R = window2((n+1)/8,P,P3)
R = R + R
R = R + R
return (R + R) - P
R = window2(n/2,P,P3)
return R + R
def scalarmult(n,P):
return window2(n,P,P+P+P)
15
Worst case: ≈b doublings plus
≈b=3 additions of ±P or ±3P .
On average ≈b=4 additions.
![Page 52: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/52.jpg)
13
Width-2 signed sliding windows
def window2(n,P,P3):
if n == 0: return 0
if n == 1: return P
if n == 3: return P3
if n % 8 == 1:
R = window2((n-1)/8,P,P3)
R = R + R
R = R + R
return (R + R) + P
if n % 8 == 3:
R = window2((n-3)/8,P,P3)
R = R + R
R = R + R
return (R + R) + P3
14
if n % 8 == 5:
R = window2((n+3)/8,P,P3)
R = R + R
R = R + R
return (R + R) - P3
if n % 8 == 7:
R = window2((n+1)/8,P,P3)
R = R + R
R = R + R
return (R + R) - P
R = window2(n/2,P,P3)
return R + R
def scalarmult(n,P):
return window2(n,P,P+P+P)
15
Worst case: ≈b doublings plus
≈b=3 additions of ±P or ±3P .
On average ≈b=4 additions.
![Page 53: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/53.jpg)
13
Width-2 signed sliding windows
def window2(n,P,P3):
if n == 0: return 0
if n == 1: return P
if n == 3: return P3
if n % 8 == 1:
R = window2((n-1)/8,P,P3)
R = R + R
R = R + R
return (R + R) + P
if n % 8 == 3:
R = window2((n-3)/8,P,P3)
R = R + R
R = R + R
return (R + R) + P3
14
if n % 8 == 5:
R = window2((n+3)/8,P,P3)
R = R + R
R = R + R
return (R + R) - P3
if n % 8 == 7:
R = window2((n+1)/8,P,P3)
R = R + R
R = R + R
return (R + R) - P
R = window2(n/2,P,P3)
return R + R
def scalarmult(n,P):
return window2(n,P,P+P+P)
15
Worst case: ≈b doublings plus
≈b=3 additions of ±P or ±3P .
On average ≈b=4 additions.
![Page 54: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/54.jpg)
14
if n % 8 == 5:
R = window2((n+3)/8,P,P3)
R = R + R
R = R + R
return (R + R) - P3
if n % 8 == 7:
R = window2((n+1)/8,P,P3)
R = R + R
R = R + R
return (R + R) - P
R = window2(n/2,P,P3)
return R + R
def scalarmult(n,P):
return window2(n,P,P+P+P)
15
Worst case: ≈b doublings plus
≈b=3 additions of ±P or ±3P .
On average ≈b=4 additions.
![Page 55: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/55.jpg)
14
if n % 8 == 5:
R = window2((n+3)/8,P,P3)
R = R + R
R = R + R
return (R + R) - P3
if n % 8 == 7:
R = window2((n+1)/8,P,P3)
R = R + R
R = R + R
return (R + R) - P
R = window2(n/2,P,P3)
return R + R
def scalarmult(n,P):
return window2(n,P,P+P+P)
15
Worst case: ≈b doublings plus
≈b=3 additions of ±P or ±3P .
On average ≈b=4 additions.
Width-3 signed sliding windows:
Precompute P; 3P; 5P; 7P .
On average ≈b=5 additions.
![Page 56: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/56.jpg)
14
if n % 8 == 5:
R = window2((n+3)/8,P,P3)
R = R + R
R = R + R
return (R + R) - P3
if n % 8 == 7:
R = window2((n+1)/8,P,P3)
R = R + R
R = R + R
return (R + R) - P
R = window2(n/2,P,P3)
return R + R
def scalarmult(n,P):
return window2(n,P,P+P+P)
15
Worst case: ≈b doublings plus
≈b=3 additions of ±P or ±3P .
On average ≈b=4 additions.
Width-3 signed sliding windows:
Precompute P; 3P; 5P; 7P .
On average ≈b=5 additions.
Width 4: Precompute
P; 3P; 5P; 7P; 9P; 11P; 13P; 15P .
On average ≈b=6 additions.
![Page 57: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/57.jpg)
14
if n % 8 == 5:
R = window2((n+3)/8,P,P3)
R = R + R
R = R + R
return (R + R) - P3
if n % 8 == 7:
R = window2((n+1)/8,P,P3)
R = R + R
R = R + R
return (R + R) - P
R = window2(n/2,P,P3)
return R + R
def scalarmult(n,P):
return window2(n,P,P+P+P)
15
Worst case: ≈b doublings plus
≈b=3 additions of ±P or ±3P .
On average ≈b=4 additions.
Width-3 signed sliding windows:
Precompute P; 3P; 5P; 7P .
On average ≈b=5 additions.
Width 4: Precompute
P; 3P; 5P; 7P; 9P; 11P; 13P; 15P .
On average ≈b=6 additions.
Cost of precomputation
eventually outweighs savings.
Optimal: ≈b doublings plus
roughly b=lg b additions.
![Page 58: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/58.jpg)
14
if n % 8 == 5:
R = window2((n+3)/8,P,P3)
R = R + R
R = R + R
return (R + R) - P3
if n % 8 == 7:
R = window2((n+1)/8,P,P3)
R = R + R
R = R + R
return (R + R) - P
R = window2(n/2,P,P3)
return R + R
def scalarmult(n,P):
return window2(n,P,P+P+P)
15
Worst case: ≈b doublings plus
≈b=3 additions of ±P or ±3P .
On average ≈b=4 additions.
Width-3 signed sliding windows:
Precompute P; 3P; 5P; 7P .
On average ≈b=5 additions.
Width 4: Precompute
P; 3P; 5P; 7P; 9P; 11P; 13P; 15P .
On average ≈b=6 additions.
Cost of precomputation
eventually outweighs savings.
Optimal: ≈b doublings plus
roughly b=lg b additions.
16
Double-scalar multiplication
Want to quickly compute
m;P; n;Q 7→ mP + nQ.
e.g. verify signature (R; S)
by computing h = H(R;M),
computing SB − hA,
checking whether R = SB − hA.
Obvious approach:
Compute mP ; compute nQ; add.
e.g. b = 256:
≈256 doublings for mP ,
≈256 doublings for nQ,
≈50 additions for mP ,
≈50 additions for nQ.
![Page 59: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/59.jpg)
14
if n % 8 == 5:
R = window2((n+3)/8,P,P3)
R = R + R
R = R + R
return (R + R) - P3
if n % 8 == 7:
R = window2((n+1)/8,P,P3)
R = R + R
R = R + R
return (R + R) - P
R = window2(n/2,P,P3)
return R + R
def scalarmult(n,P):
return window2(n,P,P+P+P)
15
Worst case: ≈b doublings plus
≈b=3 additions of ±P or ±3P .
On average ≈b=4 additions.
Width-3 signed sliding windows:
Precompute P; 3P; 5P; 7P .
On average ≈b=5 additions.
Width 4: Precompute
P; 3P; 5P; 7P; 9P; 11P; 13P; 15P .
On average ≈b=6 additions.
Cost of precomputation
eventually outweighs savings.
Optimal: ≈b doublings plus
roughly b=lg b additions.
16
Double-scalar multiplication
Want to quickly compute
m;P; n;Q 7→ mP + nQ.
e.g. verify signature (R; S)
by computing h = H(R;M),
computing SB − hA,
checking whether R = SB − hA.
Obvious approach:
Compute mP ; compute nQ; add.
e.g. b = 256:
≈256 doublings for mP ,
≈256 doublings for nQ,
≈50 additions for mP ,
≈50 additions for nQ.
![Page 60: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/60.jpg)
14
if n % 8 == 5:
R = window2((n+3)/8,P,P3)
R = R + R
R = R + R
return (R + R) - P3
if n % 8 == 7:
R = window2((n+1)/8,P,P3)
R = R + R
R = R + R
return (R + R) - P
R = window2(n/2,P,P3)
return R + R
def scalarmult(n,P):
return window2(n,P,P+P+P)
15
Worst case: ≈b doublings plus
≈b=3 additions of ±P or ±3P .
On average ≈b=4 additions.
Width-3 signed sliding windows:
Precompute P; 3P; 5P; 7P .
On average ≈b=5 additions.
Width 4: Precompute
P; 3P; 5P; 7P; 9P; 11P; 13P; 15P .
On average ≈b=6 additions.
Cost of precomputation
eventually outweighs savings.
Optimal: ≈b doublings plus
roughly b=lg b additions.
16
Double-scalar multiplication
Want to quickly compute
m;P; n;Q 7→ mP + nQ.
e.g. verify signature (R; S)
by computing h = H(R;M),
computing SB − hA,
checking whether R = SB − hA.
Obvious approach:
Compute mP ; compute nQ; add.
e.g. b = 256:
≈256 doublings for mP ,
≈256 doublings for nQ,
≈50 additions for mP ,
≈50 additions for nQ.
![Page 61: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/61.jpg)
15
Worst case: ≈b doublings plus
≈b=3 additions of ±P or ±3P .
On average ≈b=4 additions.
Width-3 signed sliding windows:
Precompute P; 3P; 5P; 7P .
On average ≈b=5 additions.
Width 4: Precompute
P; 3P; 5P; 7P; 9P; 11P; 13P; 15P .
On average ≈b=6 additions.
Cost of precomputation
eventually outweighs savings.
Optimal: ≈b doublings plus
roughly b=lg b additions.
16
Double-scalar multiplication
Want to quickly compute
m;P; n;Q 7→ mP + nQ.
e.g. verify signature (R; S)
by computing h = H(R;M),
computing SB − hA,
checking whether R = SB − hA.
Obvious approach:
Compute mP ; compute nQ; add.
e.g. b = 256:
≈256 doublings for mP ,
≈256 doublings for nQ,
≈50 additions for mP ,
≈50 additions for nQ.
![Page 62: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/62.jpg)
15
Worst case: ≈b doublings plus
≈b=3 additions of ±P or ±3P .
On average ≈b=4 additions.
Width-3 signed sliding windows:
Precompute P; 3P; 5P; 7P .
On average ≈b=5 additions.
Width 4: Precompute
P; 3P; 5P; 7P; 9P; 11P; 13P; 15P .
On average ≈b=6 additions.
Cost of precomputation
eventually outweighs savings.
Optimal: ≈b doublings plus
roughly b=lg b additions.
16
Double-scalar multiplication
Want to quickly compute
m;P; n;Q 7→ mP + nQ.
e.g. verify signature (R; S)
by computing h = H(R;M),
computing SB − hA,
checking whether R = SB − hA.
Obvious approach:
Compute mP ; compute nQ; add.
e.g. b = 256:
≈256 doublings for mP ,
≈256 doublings for nQ,
≈50 additions for mP ,
≈50 additions for nQ.
17
Joint doublings
Do much better by merging
2X + 2Y into 2(X + Y ).
def scalarmult2(m,P,n,Q):
if m == 0:
return scalarmult(n,Q)
if n == 0:
return scalarmult(m,P)
R = scalarmult2(m//2,P,n//2,Q)
R = R + R
if m % 2: R = R + P
if n % 2: R = R + Q
return R
![Page 63: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/63.jpg)
15
Worst case: ≈b doublings plus
≈b=3 additions of ±P or ±3P .
On average ≈b=4 additions.
Width-3 signed sliding windows:
Precompute P; 3P; 5P; 7P .
On average ≈b=5 additions.
Width 4: Precompute
P; 3P; 5P; 7P; 9P; 11P; 13P; 15P .
On average ≈b=6 additions.
Cost of precomputation
eventually outweighs savings.
Optimal: ≈b doublings plus
roughly b=lg b additions.
16
Double-scalar multiplication
Want to quickly compute
m;P; n;Q 7→ mP + nQ.
e.g. verify signature (R; S)
by computing h = H(R;M),
computing SB − hA,
checking whether R = SB − hA.
Obvious approach:
Compute mP ; compute nQ; add.
e.g. b = 256:
≈256 doublings for mP ,
≈256 doublings for nQ,
≈50 additions for mP ,
≈50 additions for nQ.
17
Joint doublings
Do much better by merging
2X + 2Y into 2(X + Y ).
def scalarmult2(m,P,n,Q):
if m == 0:
return scalarmult(n,Q)
if n == 0:
return scalarmult(m,P)
R = scalarmult2(m//2,P,n//2,Q)
R = R + R
if m % 2: R = R + P
if n % 2: R = R + Q
return R
![Page 64: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/64.jpg)
15
Worst case: ≈b doublings plus
≈b=3 additions of ±P or ±3P .
On average ≈b=4 additions.
Width-3 signed sliding windows:
Precompute P; 3P; 5P; 7P .
On average ≈b=5 additions.
Width 4: Precompute
P; 3P; 5P; 7P; 9P; 11P; 13P; 15P .
On average ≈b=6 additions.
Cost of precomputation
eventually outweighs savings.
Optimal: ≈b doublings plus
roughly b=lg b additions.
16
Double-scalar multiplication
Want to quickly compute
m;P; n;Q 7→ mP + nQ.
e.g. verify signature (R; S)
by computing h = H(R;M),
computing SB − hA,
checking whether R = SB − hA.
Obvious approach:
Compute mP ; compute nQ; add.
e.g. b = 256:
≈256 doublings for mP ,
≈256 doublings for nQ,
≈50 additions for mP ,
≈50 additions for nQ.
17
Joint doublings
Do much better by merging
2X + 2Y into 2(X + Y ).
def scalarmult2(m,P,n,Q):
if m == 0:
return scalarmult(n,Q)
if n == 0:
return scalarmult(m,P)
R = scalarmult2(m//2,P,n//2,Q)
R = R + R
if m % 2: R = R + P
if n % 2: R = R + Q
return R
![Page 65: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/65.jpg)
16
Double-scalar multiplication
Want to quickly compute
m;P; n;Q 7→ mP + nQ.
e.g. verify signature (R; S)
by computing h = H(R;M),
computing SB − hA,
checking whether R = SB − hA.
Obvious approach:
Compute mP ; compute nQ; add.
e.g. b = 256:
≈256 doublings for mP ,
≈256 doublings for nQ,
≈50 additions for mP ,
≈50 additions for nQ.
17
Joint doublings
Do much better by merging
2X + 2Y into 2(X + Y ).
def scalarmult2(m,P,n,Q):
if m == 0:
return scalarmult(n,Q)
if n == 0:
return scalarmult(m,P)
R = scalarmult2(m//2,P,n//2,Q)
R = R + R
if m % 2: R = R + P
if n % 2: R = R + Q
return R
![Page 66: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/66.jpg)
16
Double-scalar multiplication
Want to quickly compute
m;P; n;Q 7→ mP + nQ.
e.g. verify signature (R; S)
by computing h = H(R;M),
computing SB − hA,
checking whether R = SB − hA.
Obvious approach:
Compute mP ; compute nQ; add.
e.g. b = 256:
≈256 doublings for mP ,
≈256 doublings for nQ,
≈50 additions for mP ,
≈50 additions for nQ.
17
Joint doublings
Do much better by merging
2X + 2Y into 2(X + Y ).
def scalarmult2(m,P,n,Q):
if m == 0:
return scalarmult(n,Q)
if n == 0:
return scalarmult(m,P)
R = scalarmult2(m//2,P,n//2,Q)
R = R + R
if m % 2: R = R + P
if n % 2: R = R + Q
return R
18
For example: merge
35P = 2(2(2(2(2P ))) + P ) + P ,
31Q = 2(2(2(2Q+Q)+Q)+Q)+Q
into 35P + 31Q =
2(2(2(2(2P+Q)+Q)+Q)+P+Q)
+P+Q.
≈b doublings (merged!),
≈b=2 additions of P ,
≈b=2 additions of Q.
Combine idea with windows: e.g.,
≈256 doublings for b = 256,
≈50 additions using P ,
≈50 additions using Q.
![Page 67: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/67.jpg)
16
Double-scalar multiplication
Want to quickly compute
m;P; n;Q 7→ mP + nQ.
e.g. verify signature (R; S)
by computing h = H(R;M),
computing SB − hA,
checking whether R = SB − hA.
Obvious approach:
Compute mP ; compute nQ; add.
e.g. b = 256:
≈256 doublings for mP ,
≈256 doublings for nQ,
≈50 additions for mP ,
≈50 additions for nQ.
17
Joint doublings
Do much better by merging
2X + 2Y into 2(X + Y ).
def scalarmult2(m,P,n,Q):
if m == 0:
return scalarmult(n,Q)
if n == 0:
return scalarmult(m,P)
R = scalarmult2(m//2,P,n//2,Q)
R = R + R
if m % 2: R = R + P
if n % 2: R = R + Q
return R
18
For example: merge
35P = 2(2(2(2(2P ))) + P ) + P ,
31Q = 2(2(2(2Q+Q)+Q)+Q)+Q
into 35P + 31Q =
2(2(2(2(2P+Q)+Q)+Q)+P+Q)
+P+Q.
≈b doublings (merged!),
≈b=2 additions of P ,
≈b=2 additions of Q.
Combine idea with windows: e.g.,
≈256 doublings for b = 256,
≈50 additions using P ,
≈50 additions using Q.
![Page 68: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/68.jpg)
16
Double-scalar multiplication
Want to quickly compute
m;P; n;Q 7→ mP + nQ.
e.g. verify signature (R; S)
by computing h = H(R;M),
computing SB − hA,
checking whether R = SB − hA.
Obvious approach:
Compute mP ; compute nQ; add.
e.g. b = 256:
≈256 doublings for mP ,
≈256 doublings for nQ,
≈50 additions for mP ,
≈50 additions for nQ.
17
Joint doublings
Do much better by merging
2X + 2Y into 2(X + Y ).
def scalarmult2(m,P,n,Q):
if m == 0:
return scalarmult(n,Q)
if n == 0:
return scalarmult(m,P)
R = scalarmult2(m//2,P,n//2,Q)
R = R + R
if m % 2: R = R + P
if n % 2: R = R + Q
return R
18
For example: merge
35P = 2(2(2(2(2P ))) + P ) + P ,
31Q = 2(2(2(2Q+Q)+Q)+Q)+Q
into 35P + 31Q =
2(2(2(2(2P+Q)+Q)+Q)+P+Q)
+P+Q.
≈b doublings (merged!),
≈b=2 additions of P ,
≈b=2 additions of Q.
Combine idea with windows: e.g.,
≈256 doublings for b = 256,
≈50 additions using P ,
≈50 additions using Q.
![Page 69: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/69.jpg)
17
Joint doublings
Do much better by merging
2X + 2Y into 2(X + Y ).
def scalarmult2(m,P,n,Q):
if m == 0:
return scalarmult(n,Q)
if n == 0:
return scalarmult(m,P)
R = scalarmult2(m//2,P,n//2,Q)
R = R + R
if m % 2: R = R + P
if n % 2: R = R + Q
return R
18
For example: merge
35P = 2(2(2(2(2P ))) + P ) + P ,
31Q = 2(2(2(2Q+Q)+Q)+Q)+Q
into 35P + 31Q =
2(2(2(2(2P+Q)+Q)+Q)+P+Q)
+P+Q.
≈b doublings (merged!),
≈b=2 additions of P ,
≈b=2 additions of Q.
Combine idea with windows: e.g.,
≈256 doublings for b = 256,
≈50 additions using P ,
≈50 additions using Q.
![Page 70: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/70.jpg)
17
Joint doublings
Do much better by merging
2X + 2Y into 2(X + Y ).
def scalarmult2(m,P,n,Q):
if m == 0:
return scalarmult(n,Q)
if n == 0:
return scalarmult(m,P)
R = scalarmult2(m//2,P,n//2,Q)
R = R + R
if m % 2: R = R + P
if n % 2: R = R + Q
return R
18
For example: merge
35P = 2(2(2(2(2P ))) + P ) + P ,
31Q = 2(2(2(2Q+Q)+Q)+Q)+Q
into 35P + 31Q =
2(2(2(2(2P+Q)+Q)+Q)+P+Q)
+P+Q.
≈b doublings (merged!),
≈b=2 additions of P ,
≈b=2 additions of Q.
Combine idea with windows: e.g.,
≈256 doublings for b = 256,
≈50 additions using P ,
≈50 additions using Q.
19
Batch verification
Verifying many signatures:
need to be confident that
S1B = R1 + h1A1,
S2B = R2 + h2A2,
S3B = R3 + h3A3,
etc.
Obvious approach:
Check each equation separately.
![Page 71: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/71.jpg)
17
Joint doublings
Do much better by merging
2X + 2Y into 2(X + Y ).
def scalarmult2(m,P,n,Q):
if m == 0:
return scalarmult(n,Q)
if n == 0:
return scalarmult(m,P)
R = scalarmult2(m//2,P,n//2,Q)
R = R + R
if m % 2: R = R + P
if n % 2: R = R + Q
return R
18
For example: merge
35P = 2(2(2(2(2P ))) + P ) + P ,
31Q = 2(2(2(2Q+Q)+Q)+Q)+Q
into 35P + 31Q =
2(2(2(2(2P+Q)+Q)+Q)+P+Q)
+P+Q.
≈b doublings (merged!),
≈b=2 additions of P ,
≈b=2 additions of Q.
Combine idea with windows: e.g.,
≈256 doublings for b = 256,
≈50 additions using P ,
≈50 additions using Q.
19
Batch verification
Verifying many signatures:
need to be confident that
S1B = R1 + h1A1,
S2B = R2 + h2A2,
S3B = R3 + h3A3,
etc.
Obvious approach:
Check each equation separately.
![Page 72: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/72.jpg)
17
Joint doublings
Do much better by merging
2X + 2Y into 2(X + Y ).
def scalarmult2(m,P,n,Q):
if m == 0:
return scalarmult(n,Q)
if n == 0:
return scalarmult(m,P)
R = scalarmult2(m//2,P,n//2,Q)
R = R + R
if m % 2: R = R + P
if n % 2: R = R + Q
return R
18
For example: merge
35P = 2(2(2(2(2P ))) + P ) + P ,
31Q = 2(2(2(2Q+Q)+Q)+Q)+Q
into 35P + 31Q =
2(2(2(2(2P+Q)+Q)+Q)+P+Q)
+P+Q.
≈b doublings (merged!),
≈b=2 additions of P ,
≈b=2 additions of Q.
Combine idea with windows: e.g.,
≈256 doublings for b = 256,
≈50 additions using P ,
≈50 additions using Q.
19
Batch verification
Verifying many signatures:
need to be confident that
S1B = R1 + h1A1,
S2B = R2 + h2A2,
S3B = R3 + h3A3,
etc.
Obvious approach:
Check each equation separately.
![Page 73: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/73.jpg)
18
For example: merge
35P = 2(2(2(2(2P ))) + P ) + P ,
31Q = 2(2(2(2Q+Q)+Q)+Q)+Q
into 35P + 31Q =
2(2(2(2(2P+Q)+Q)+Q)+P+Q)
+P+Q.
≈b doublings (merged!),
≈b=2 additions of P ,
≈b=2 additions of Q.
Combine idea with windows: e.g.,
≈256 doublings for b = 256,
≈50 additions using P ,
≈50 additions using Q.
19
Batch verification
Verifying many signatures:
need to be confident that
S1B = R1 + h1A1,
S2B = R2 + h2A2,
S3B = R3 + h3A3,
etc.
Obvious approach:
Check each equation separately.
![Page 74: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/74.jpg)
18
For example: merge
35P = 2(2(2(2(2P ))) + P ) + P ,
31Q = 2(2(2(2Q+Q)+Q)+Q)+Q
into 35P + 31Q =
2(2(2(2(2P+Q)+Q)+Q)+P+Q)
+P+Q.
≈b doublings (merged!),
≈b=2 additions of P ,
≈b=2 additions of Q.
Combine idea with windows: e.g.,
≈256 doublings for b = 256,
≈50 additions using P ,
≈50 additions using Q.
19
Batch verification
Verifying many signatures:
need to be confident that
S1B = R1 + h1A1,
S2B = R2 + h2A2,
S3B = R3 + h3A3,
etc.
Obvious approach:
Check each equation separately.
Much faster approach:
Check random linear combination
of the equations.
![Page 75: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/75.jpg)
18
For example: merge
35P = 2(2(2(2(2P ))) + P ) + P ,
31Q = 2(2(2(2Q+Q)+Q)+Q)+Q
into 35P + 31Q =
2(2(2(2(2P+Q)+Q)+Q)+P+Q)
+P+Q.
≈b doublings (merged!),
≈b=2 additions of P ,
≈b=2 additions of Q.
Combine idea with windows: e.g.,
≈256 doublings for b = 256,
≈50 additions using P ,
≈50 additions using Q.
19
Batch verification
Verifying many signatures:
need to be confident that
S1B = R1 + h1A1,
S2B = R2 + h2A2,
S3B = R3 + h3A3,
etc.
Obvious approach:
Check each equation separately.
Much faster approach:
Check random linear combination
of the equations.
20
Pick independent uniform random
128-bit z1; z2; z3; : : :.
Check whether
(z1S1 + z2S2 + z3S3 + · · ·)B =
z1R1 + (z1h1)A1 +
z2R2 + (z2h2)A2 +
z3R3 + (z3h3)A3 + · · ·.
(If 6=: See 2012 Bernstein–
Doumen–Lange–Oosterwijk.)
Easy to prove:
forgeries have probability ≤2−128
of fooling this check.
![Page 76: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/76.jpg)
18
For example: merge
35P = 2(2(2(2(2P ))) + P ) + P ,
31Q = 2(2(2(2Q+Q)+Q)+Q)+Q
into 35P + 31Q =
2(2(2(2(2P+Q)+Q)+Q)+P+Q)
+P+Q.
≈b doublings (merged!),
≈b=2 additions of P ,
≈b=2 additions of Q.
Combine idea with windows: e.g.,
≈256 doublings for b = 256,
≈50 additions using P ,
≈50 additions using Q.
19
Batch verification
Verifying many signatures:
need to be confident that
S1B = R1 + h1A1,
S2B = R2 + h2A2,
S3B = R3 + h3A3,
etc.
Obvious approach:
Check each equation separately.
Much faster approach:
Check random linear combination
of the equations.
20
Pick independent uniform random
128-bit z1; z2; z3; : : :.
Check whether
(z1S1 + z2S2 + z3S3 + · · ·)B =
z1R1 + (z1h1)A1 +
z2R2 + (z2h2)A2 +
z3R3 + (z3h3)A3 + · · ·.
(If 6=: See 2012 Bernstein–
Doumen–Lange–Oosterwijk.)
Easy to prove:
forgeries have probability ≤2−128
of fooling this check.
![Page 77: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/77.jpg)
18
For example: merge
35P = 2(2(2(2(2P ))) + P ) + P ,
31Q = 2(2(2(2Q+Q)+Q)+Q)+Q
into 35P + 31Q =
2(2(2(2(2P+Q)+Q)+Q)+P+Q)
+P+Q.
≈b doublings (merged!),
≈b=2 additions of P ,
≈b=2 additions of Q.
Combine idea with windows: e.g.,
≈256 doublings for b = 256,
≈50 additions using P ,
≈50 additions using Q.
19
Batch verification
Verifying many signatures:
need to be confident that
S1B = R1 + h1A1,
S2B = R2 + h2A2,
S3B = R3 + h3A3,
etc.
Obvious approach:
Check each equation separately.
Much faster approach:
Check random linear combination
of the equations.
20
Pick independent uniform random
128-bit z1; z2; z3; : : :.
Check whether
(z1S1 + z2S2 + z3S3 + · · ·)B =
z1R1 + (z1h1)A1 +
z2R2 + (z2h2)A2 +
z3R3 + (z3h3)A3 + · · ·.
(If 6=: See 2012 Bernstein–
Doumen–Lange–Oosterwijk.)
Easy to prove:
forgeries have probability ≤2−128
of fooling this check.
![Page 78: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/78.jpg)
19
Batch verification
Verifying many signatures:
need to be confident that
S1B = R1 + h1A1,
S2B = R2 + h2A2,
S3B = R3 + h3A3,
etc.
Obvious approach:
Check each equation separately.
Much faster approach:
Check random linear combination
of the equations.
20
Pick independent uniform random
128-bit z1; z2; z3; : : :.
Check whether
(z1S1 + z2S2 + z3S3 + · · ·)B =
z1R1 + (z1h1)A1 +
z2R2 + (z2h2)A2 +
z3R3 + (z3h3)A3 + · · ·.
(If 6=: See 2012 Bernstein–
Doumen–Lange–Oosterwijk.)
Easy to prove:
forgeries have probability ≤2−128
of fooling this check.
![Page 79: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/79.jpg)
19
Batch verification
Verifying many signatures:
need to be confident that
S1B = R1 + h1A1,
S2B = R2 + h2A2,
S3B = R3 + h3A3,
etc.
Obvious approach:
Check each equation separately.
Much faster approach:
Check random linear combination
of the equations.
20
Pick independent uniform random
128-bit z1; z2; z3; : : :.
Check whether
(z1S1 + z2S2 + z3S3 + · · ·)B =
z1R1 + (z1h1)A1 +
z2R2 + (z2h2)A2 +
z3R3 + (z3h3)A3 + · · ·.
(If 6=: See 2012 Bernstein–
Doumen–Lange–Oosterwijk.)
Easy to prove:
forgeries have probability ≤2−128
of fooling this check.
21
Multi-scalar multiplication
Review of asymptotic speeds:
1939 Brauer (windows):
≈ (1 + 1=lg b)b
additions to compute
P 7→ nP if n < 2b.
1964 Straus (joint doublings):
≈ (1 + k=lg b)b
additions to compute
P1; : : : ; Pk 7→ n1P1 + · · ·+ nkPkif n1; : : : ; nk < 2b.
![Page 80: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/80.jpg)
19
Batch verification
Verifying many signatures:
need to be confident that
S1B = R1 + h1A1,
S2B = R2 + h2A2,
S3B = R3 + h3A3,
etc.
Obvious approach:
Check each equation separately.
Much faster approach:
Check random linear combination
of the equations.
20
Pick independent uniform random
128-bit z1; z2; z3; : : :.
Check whether
(z1S1 + z2S2 + z3S3 + · · ·)B =
z1R1 + (z1h1)A1 +
z2R2 + (z2h2)A2 +
z3R3 + (z3h3)A3 + · · ·.
(If 6=: See 2012 Bernstein–
Doumen–Lange–Oosterwijk.)
Easy to prove:
forgeries have probability ≤2−128
of fooling this check.
21
Multi-scalar multiplication
Review of asymptotic speeds:
1939 Brauer (windows):
≈ (1 + 1=lg b)b
additions to compute
P 7→ nP if n < 2b.
1964 Straus (joint doublings):
≈ (1 + k=lg b)b
additions to compute
P1; : : : ; Pk 7→ n1P1 + · · ·+ nkPkif n1; : : : ; nk < 2b.
![Page 81: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/81.jpg)
19
Batch verification
Verifying many signatures:
need to be confident that
S1B = R1 + h1A1,
S2B = R2 + h2A2,
S3B = R3 + h3A3,
etc.
Obvious approach:
Check each equation separately.
Much faster approach:
Check random linear combination
of the equations.
20
Pick independent uniform random
128-bit z1; z2; z3; : : :.
Check whether
(z1S1 + z2S2 + z3S3 + · · ·)B =
z1R1 + (z1h1)A1 +
z2R2 + (z2h2)A2 +
z3R3 + (z3h3)A3 + · · ·.
(If 6=: See 2012 Bernstein–
Doumen–Lange–Oosterwijk.)
Easy to prove:
forgeries have probability ≤2−128
of fooling this check.
21
Multi-scalar multiplication
Review of asymptotic speeds:
1939 Brauer (windows):
≈ (1 + 1=lg b)b
additions to compute
P 7→ nP if n < 2b.
1964 Straus (joint doublings):
≈ (1 + k=lg b)b
additions to compute
P1; : : : ; Pk 7→ n1P1 + · · ·+ nkPkif n1; : : : ; nk < 2b.
![Page 82: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/82.jpg)
20
Pick independent uniform random
128-bit z1; z2; z3; : : :.
Check whether
(z1S1 + z2S2 + z3S3 + · · ·)B =
z1R1 + (z1h1)A1 +
z2R2 + (z2h2)A2 +
z3R3 + (z3h3)A3 + · · ·.
(If 6=: See 2012 Bernstein–
Doumen–Lange–Oosterwijk.)
Easy to prove:
forgeries have probability ≤2−128
of fooling this check.
21
Multi-scalar multiplication
Review of asymptotic speeds:
1939 Brauer (windows):
≈ (1 + 1=lg b)b
additions to compute
P 7→ nP if n < 2b.
1964 Straus (joint doublings):
≈ (1 + k=lg b)b
additions to compute
P1; : : : ; Pk 7→ n1P1 + · · ·+ nkPkif n1; : : : ; nk < 2b.
![Page 83: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/83.jpg)
20
Pick independent uniform random
128-bit z1; z2; z3; : : :.
Check whether
(z1S1 + z2S2 + z3S3 + · · ·)B =
z1R1 + (z1h1)A1 +
z2R2 + (z2h2)A2 +
z3R3 + (z3h3)A3 + · · ·.
(If 6=: See 2012 Bernstein–
Doumen–Lange–Oosterwijk.)
Easy to prove:
forgeries have probability ≤2−128
of fooling this check.
21
Multi-scalar multiplication
Review of asymptotic speeds:
1939 Brauer (windows):
≈ (1 + 1=lg b)b
additions to compute
P 7→ nP if n < 2b.
1964 Straus (joint doublings):
≈ (1 + k=lg b)b
additions to compute
P1; : : : ; Pk 7→ n1P1 + · · ·+ nkPkif n1; : : : ; nk < 2b.
22
1976 Yao:
≈ (1 + k=lg b)b
additions to compute
P 7→ n1P; : : : ; nkP
if n1; : : : ; nk < 2b.
1976 Pippenger:
Similar asymptotics,
but replace lg b with lg(kb).
Faster than Straus and Yao
if k is large.
(Knuth says “generalization”
as if speed were the same.)
![Page 84: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/84.jpg)
20
Pick independent uniform random
128-bit z1; z2; z3; : : :.
Check whether
(z1S1 + z2S2 + z3S3 + · · ·)B =
z1R1 + (z1h1)A1 +
z2R2 + (z2h2)A2 +
z3R3 + (z3h3)A3 + · · ·.
(If 6=: See 2012 Bernstein–
Doumen–Lange–Oosterwijk.)
Easy to prove:
forgeries have probability ≤2−128
of fooling this check.
21
Multi-scalar multiplication
Review of asymptotic speeds:
1939 Brauer (windows):
≈ (1 + 1=lg b)b
additions to compute
P 7→ nP if n < 2b.
1964 Straus (joint doublings):
≈ (1 + k=lg b)b
additions to compute
P1; : : : ; Pk 7→ n1P1 + · · ·+ nkPkif n1; : : : ; nk < 2b.
22
1976 Yao:
≈ (1 + k=lg b)b
additions to compute
P 7→ n1P; : : : ; nkP
if n1; : : : ; nk < 2b.
1976 Pippenger:
Similar asymptotics,
but replace lg b with lg(kb).
Faster than Straus and Yao
if k is large.
(Knuth says “generalization”
as if speed were the same.)
![Page 85: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/85.jpg)
20
Pick independent uniform random
128-bit z1; z2; z3; : : :.
Check whether
(z1S1 + z2S2 + z3S3 + · · ·)B =
z1R1 + (z1h1)A1 +
z2R2 + (z2h2)A2 +
z3R3 + (z3h3)A3 + · · ·.
(If 6=: See 2012 Bernstein–
Doumen–Lange–Oosterwijk.)
Easy to prove:
forgeries have probability ≤2−128
of fooling this check.
21
Multi-scalar multiplication
Review of asymptotic speeds:
1939 Brauer (windows):
≈ (1 + 1=lg b)b
additions to compute
P 7→ nP if n < 2b.
1964 Straus (joint doublings):
≈ (1 + k=lg b)b
additions to compute
P1; : : : ; Pk 7→ n1P1 + · · ·+ nkPkif n1; : : : ; nk < 2b.
22
1976 Yao:
≈ (1 + k=lg b)b
additions to compute
P 7→ n1P; : : : ; nkP
if n1; : : : ; nk < 2b.
1976 Pippenger:
Similar asymptotics,
but replace lg b with lg(kb).
Faster than Straus and Yao
if k is large.
(Knuth says “generalization”
as if speed were the same.)
![Page 86: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/86.jpg)
21
Multi-scalar multiplication
Review of asymptotic speeds:
1939 Brauer (windows):
≈ (1 + 1=lg b)b
additions to compute
P 7→ nP if n < 2b.
1964 Straus (joint doublings):
≈ (1 + k=lg b)b
additions to compute
P1; : : : ; Pk 7→ n1P1 + · · ·+ nkPkif n1; : : : ; nk < 2b.
22
1976 Yao:
≈ (1 + k=lg b)b
additions to compute
P 7→ n1P; : : : ; nkP
if n1; : : : ; nk < 2b.
1976 Pippenger:
Similar asymptotics,
but replace lg b with lg(kb).
Faster than Straus and Yao
if k is large.
(Knuth says “generalization”
as if speed were the same.)
![Page 87: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/87.jpg)
21
Multi-scalar multiplication
Review of asymptotic speeds:
1939 Brauer (windows):
≈ (1 + 1=lg b)b
additions to compute
P 7→ nP if n < 2b.
1964 Straus (joint doublings):
≈ (1 + k=lg b)b
additions to compute
P1; : : : ; Pk 7→ n1P1 + · · ·+ nkPkif n1; : : : ; nk < 2b.
22
1976 Yao:
≈ (1 + k=lg b)b
additions to compute
P 7→ n1P; : : : ; nkP
if n1; : : : ; nk < 2b.
1976 Pippenger:
Similar asymptotics,
but replace lg b with lg(kb).
Faster than Straus and Yao
if k is large.
(Knuth says “generalization”
as if speed were the same.)
23
More generally, Pippenger’s
algorithm computes
‘ sums of multiples of k inputs.
≈„
min{k; ‘}+k‘
lg(k‘b)
«b adds
if all coefficients are below 2b.
Within 1 + › of optimal.
![Page 88: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/88.jpg)
21
Multi-scalar multiplication
Review of asymptotic speeds:
1939 Brauer (windows):
≈ (1 + 1=lg b)b
additions to compute
P 7→ nP if n < 2b.
1964 Straus (joint doublings):
≈ (1 + k=lg b)b
additions to compute
P1; : : : ; Pk 7→ n1P1 + · · ·+ nkPkif n1; : : : ; nk < 2b.
22
1976 Yao:
≈ (1 + k=lg b)b
additions to compute
P 7→ n1P; : : : ; nkP
if n1; : : : ; nk < 2b.
1976 Pippenger:
Similar asymptotics,
but replace lg b with lg(kb).
Faster than Straus and Yao
if k is large.
(Knuth says “generalization”
as if speed were the same.)
23
More generally, Pippenger’s
algorithm computes
‘ sums of multiples of k inputs.
≈„
min{k; ‘}+k‘
lg(k‘b)
«b adds
if all coefficients are below 2b.
Within 1 + › of optimal.
![Page 89: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/89.jpg)
21
Multi-scalar multiplication
Review of asymptotic speeds:
1939 Brauer (windows):
≈ (1 + 1=lg b)b
additions to compute
P 7→ nP if n < 2b.
1964 Straus (joint doublings):
≈ (1 + k=lg b)b
additions to compute
P1; : : : ; Pk 7→ n1P1 + · · ·+ nkPkif n1; : : : ; nk < 2b.
22
1976 Yao:
≈ (1 + k=lg b)b
additions to compute
P 7→ n1P; : : : ; nkP
if n1; : : : ; nk < 2b.
1976 Pippenger:
Similar asymptotics,
but replace lg b with lg(kb).
Faster than Straus and Yao
if k is large.
(Knuth says “generalization”
as if speed were the same.)
23
More generally, Pippenger’s
algorithm computes
‘ sums of multiples of k inputs.
≈„
min{k; ‘}+k‘
lg(k‘b)
«b adds
if all coefficients are below 2b.
Within 1 + › of optimal.
![Page 90: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/90.jpg)
22
1976 Yao:
≈ (1 + k=lg b)b
additions to compute
P 7→ n1P; : : : ; nkP
if n1; : : : ; nk < 2b.
1976 Pippenger:
Similar asymptotics,
but replace lg b with lg(kb).
Faster than Straus and Yao
if k is large.
(Knuth says “generalization”
as if speed were the same.)
23
More generally, Pippenger’s
algorithm computes
‘ sums of multiples of k inputs.
≈„
min{k; ‘}+k‘
lg(k‘b)
«b adds
if all coefficients are below 2b.
Within 1 + › of optimal.
![Page 91: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/91.jpg)
22
1976 Yao:
≈ (1 + k=lg b)b
additions to compute
P 7→ n1P; : : : ; nkP
if n1; : : : ; nk < 2b.
1976 Pippenger:
Similar asymptotics,
but replace lg b with lg(kb).
Faster than Straus and Yao
if k is large.
(Knuth says “generalization”
as if speed were the same.)
23
More generally, Pippenger’s
algorithm computes
‘ sums of multiples of k inputs.
≈„
min{k; ‘}+k‘
lg(k‘b)
«b adds
if all coefficients are below 2b.
Within 1 + › of optimal.
Various special cases of
Pippenger’s algorithm were
reinvented and patented by
1993 Brickell–Gordon–McCurley–
Wilson, 1995 Lim–Lee, etc.
Is that the end of the story?
![Page 92: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/92.jpg)
22
1976 Yao:
≈ (1 + k=lg b)b
additions to compute
P 7→ n1P; : : : ; nkP
if n1; : : : ; nk < 2b.
1976 Pippenger:
Similar asymptotics,
but replace lg b with lg(kb).
Faster than Straus and Yao
if k is large.
(Knuth says “generalization”
as if speed were the same.)
23
More generally, Pippenger’s
algorithm computes
‘ sums of multiples of k inputs.
≈„
min{k; ‘}+k‘
lg(k‘b)
«b adds
if all coefficients are below 2b.
Within 1 + › of optimal.
Various special cases of
Pippenger’s algorithm were
reinvented and patented by
1993 Brickell–Gordon–McCurley–
Wilson, 1995 Lim–Lee, etc.
Is that the end of the story?
24
No! 1989 Bos–Coster:
If n1 ≥ n2 ≥ · · · then
n1P1 + n2P2 + n3P3 + · · · =
(n1 − qn2)P1 + n2(qP1 + P2) +
n3P3 + · · · where q = bn1=n2c.
Remarkably simple;
competitive with Pippenger
for random choices of ni ’s;
much better memory usage.
![Page 93: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/93.jpg)
22
1976 Yao:
≈ (1 + k=lg b)b
additions to compute
P 7→ n1P; : : : ; nkP
if n1; : : : ; nk < 2b.
1976 Pippenger:
Similar asymptotics,
but replace lg b with lg(kb).
Faster than Straus and Yao
if k is large.
(Knuth says “generalization”
as if speed were the same.)
23
More generally, Pippenger’s
algorithm computes
‘ sums of multiples of k inputs.
≈„
min{k; ‘}+k‘
lg(k‘b)
«b adds
if all coefficients are below 2b.
Within 1 + › of optimal.
Various special cases of
Pippenger’s algorithm were
reinvented and patented by
1993 Brickell–Gordon–McCurley–
Wilson, 1995 Lim–Lee, etc.
Is that the end of the story?
24
No! 1989 Bos–Coster:
If n1 ≥ n2 ≥ · · · then
n1P1 + n2P2 + n3P3 + · · · =
(n1 − qn2)P1 + n2(qP1 + P2) +
n3P3 + · · · where q = bn1=n2c.
Remarkably simple;
competitive with Pippenger
for random choices of ni ’s;
much better memory usage.
![Page 94: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/94.jpg)
22
1976 Yao:
≈ (1 + k=lg b)b
additions to compute
P 7→ n1P; : : : ; nkP
if n1; : : : ; nk < 2b.
1976 Pippenger:
Similar asymptotics,
but replace lg b with lg(kb).
Faster than Straus and Yao
if k is large.
(Knuth says “generalization”
as if speed were the same.)
23
More generally, Pippenger’s
algorithm computes
‘ sums of multiples of k inputs.
≈„
min{k; ‘}+k‘
lg(k‘b)
«b adds
if all coefficients are below 2b.
Within 1 + › of optimal.
Various special cases of
Pippenger’s algorithm were
reinvented and patented by
1993 Brickell–Gordon–McCurley–
Wilson, 1995 Lim–Lee, etc.
Is that the end of the story?
24
No! 1989 Bos–Coster:
If n1 ≥ n2 ≥ · · · then
n1P1 + n2P2 + n3P3 + · · · =
(n1 − qn2)P1 + n2(qP1 + P2) +
n3P3 + · · · where q = bn1=n2c.
Remarkably simple;
competitive with Pippenger
for random choices of ni ’s;
much better memory usage.
![Page 95: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/95.jpg)
23
More generally, Pippenger’s
algorithm computes
‘ sums of multiples of k inputs.
≈„
min{k; ‘}+k‘
lg(k‘b)
«b adds
if all coefficients are below 2b.
Within 1 + › of optimal.
Various special cases of
Pippenger’s algorithm were
reinvented and patented by
1993 Brickell–Gordon–McCurley–
Wilson, 1995 Lim–Lee, etc.
Is that the end of the story?
24
No! 1989 Bos–Coster:
If n1 ≥ n2 ≥ · · · then
n1P1 + n2P2 + n3P3 + · · · =
(n1 − qn2)P1 + n2(qP1 + P2) +
n3P3 + · · · where q = bn1=n2c.
Remarkably simple;
competitive with Pippenger
for random choices of ni ’s;
much better memory usage.
![Page 96: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/96.jpg)
23
More generally, Pippenger’s
algorithm computes
‘ sums of multiples of k inputs.
≈„
min{k; ‘}+k‘
lg(k‘b)
«b adds
if all coefficients are below 2b.
Within 1 + › of optimal.
Various special cases of
Pippenger’s algorithm were
reinvented and patented by
1993 Brickell–Gordon–McCurley–
Wilson, 1995 Lim–Lee, etc.
Is that the end of the story?
24
No! 1989 Bos–Coster:
If n1 ≥ n2 ≥ · · · then
n1P1 + n2P2 + n3P3 + · · · =
(n1 − qn2)P1 + n2(qP1 + P2) +
n3P3 + · · · where q = bn1=n2c.
Remarkably simple;
competitive with Pippenger
for random choices of ni ’s;
much better memory usage.
25
Example of Bos–Coster:
000100000 = 32
000010000 = 16
100101100 = 300
010010010 = 146
001001101 = 77
000000010 = 2
000000001 = 1
Goal: Compute 32P , 16P ,
300P , 146P , 77P , 2P , 1P .
![Page 97: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/97.jpg)
23
More generally, Pippenger’s
algorithm computes
‘ sums of multiples of k inputs.
≈„
min{k; ‘}+k‘
lg(k‘b)
«b adds
if all coefficients are below 2b.
Within 1 + › of optimal.
Various special cases of
Pippenger’s algorithm were
reinvented and patented by
1993 Brickell–Gordon–McCurley–
Wilson, 1995 Lim–Lee, etc.
Is that the end of the story?
24
No! 1989 Bos–Coster:
If n1 ≥ n2 ≥ · · · then
n1P1 + n2P2 + n3P3 + · · · =
(n1 − qn2)P1 + n2(qP1 + P2) +
n3P3 + · · · where q = bn1=n2c.
Remarkably simple;
competitive with Pippenger
for random choices of ni ’s;
much better memory usage.
25
Example of Bos–Coster:
000100000 = 32
000010000 = 16
100101100 = 300
010010010 = 146
001001101 = 77
000000010 = 2
000000001 = 1
Goal: Compute 32P , 16P ,
300P , 146P , 77P , 2P , 1P .
![Page 98: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/98.jpg)
23
More generally, Pippenger’s
algorithm computes
‘ sums of multiples of k inputs.
≈„
min{k; ‘}+k‘
lg(k‘b)
«b adds
if all coefficients are below 2b.
Within 1 + › of optimal.
Various special cases of
Pippenger’s algorithm were
reinvented and patented by
1993 Brickell–Gordon–McCurley–
Wilson, 1995 Lim–Lee, etc.
Is that the end of the story?
24
No! 1989 Bos–Coster:
If n1 ≥ n2 ≥ · · · then
n1P1 + n2P2 + n3P3 + · · · =
(n1 − qn2)P1 + n2(qP1 + P2) +
n3P3 + · · · where q = bn1=n2c.
Remarkably simple;
competitive with Pippenger
for random choices of ni ’s;
much better memory usage.
25
Example of Bos–Coster:
000100000 = 32
000010000 = 16
100101100 = 300
010010010 = 146
001001101 = 77
000000010 = 2
000000001 = 1
Goal: Compute 32P , 16P ,
300P , 146P , 77P , 2P , 1P .
![Page 99: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/99.jpg)
24
No! 1989 Bos–Coster:
If n1 ≥ n2 ≥ · · · then
n1P1 + n2P2 + n3P3 + · · · =
(n1 − qn2)P1 + n2(qP1 + P2) +
n3P3 + · · · where q = bn1=n2c.
Remarkably simple;
competitive with Pippenger
for random choices of ni ’s;
much better memory usage.
25
Example of Bos–Coster:
000100000 = 32
000010000 = 16
100101100 = 300
010010010 = 146
001001101 = 77
000000010 = 2
000000001 = 1
Goal: Compute 32P , 16P ,
300P , 146P , 77P , 2P , 1P .
![Page 100: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/100.jpg)
24
No! 1989 Bos–Coster:
If n1 ≥ n2 ≥ · · · then
n1P1 + n2P2 + n3P3 + · · · =
(n1 − qn2)P1 + n2(qP1 + P2) +
n3P3 + · · · where q = bn1=n2c.
Remarkably simple;
competitive with Pippenger
for random choices of ni ’s;
much better memory usage.
25
Example of Bos–Coster:
000100000 = 32
000010000 = 16
100101100 = 300
010010010 = 146
001001101 = 77
000000010 = 2
000000001 = 1
Goal: Compute 32P , 16P ,
300P , 146P , 77P , 2P , 1P .
26
Reduce largest row:
000100000 = 32
000010000 = 16
010011010 = 154 ←010010010 = 146
001001101 = 77
000000010 = 2
000000001 = 1
Goal: Compute 32P , 16P ,
154P , 146P , 77P , 2P , 1P .
Plus one extra addition:
add 146P into 154P ,
obtaining 300P .
![Page 101: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/101.jpg)
24
No! 1989 Bos–Coster:
If n1 ≥ n2 ≥ · · · then
n1P1 + n2P2 + n3P3 + · · · =
(n1 − qn2)P1 + n2(qP1 + P2) +
n3P3 + · · · where q = bn1=n2c.
Remarkably simple;
competitive with Pippenger
for random choices of ni ’s;
much better memory usage.
25
Example of Bos–Coster:
000100000 = 32
000010000 = 16
100101100 = 300
010010010 = 146
001001101 = 77
000000010 = 2
000000001 = 1
Goal: Compute 32P , 16P ,
300P , 146P , 77P , 2P , 1P .
26
Reduce largest row:
000100000 = 32
000010000 = 16
010011010 = 154 ←010010010 = 146
001001101 = 77
000000010 = 2
000000001 = 1
Goal: Compute 32P , 16P ,
154P , 146P , 77P , 2P , 1P .
Plus one extra addition:
add 146P into 154P ,
obtaining 300P .
![Page 102: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/102.jpg)
24
No! 1989 Bos–Coster:
If n1 ≥ n2 ≥ · · · then
n1P1 + n2P2 + n3P3 + · · · =
(n1 − qn2)P1 + n2(qP1 + P2) +
n3P3 + · · · where q = bn1=n2c.
Remarkably simple;
competitive with Pippenger
for random choices of ni ’s;
much better memory usage.
25
Example of Bos–Coster:
000100000 = 32
000010000 = 16
100101100 = 300
010010010 = 146
001001101 = 77
000000010 = 2
000000001 = 1
Goal: Compute 32P , 16P ,
300P , 146P , 77P , 2P , 1P .
26
Reduce largest row:
000100000 = 32
000010000 = 16
010011010 = 154 ←010010010 = 146
001001101 = 77
000000010 = 2
000000001 = 1
Goal: Compute 32P , 16P ,
154P , 146P , 77P , 2P , 1P .
Plus one extra addition:
add 146P into 154P ,
obtaining 300P .
![Page 103: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/103.jpg)
25
Example of Bos–Coster:
000100000 = 32
000010000 = 16
100101100 = 300
010010010 = 146
001001101 = 77
000000010 = 2
000000001 = 1
Goal: Compute 32P , 16P ,
300P , 146P , 77P , 2P , 1P .
26
Reduce largest row:
000100000 = 32
000010000 = 16
010011010 = 154 ←010010010 = 146
001001101 = 77
000000010 = 2
000000001 = 1
Goal: Compute 32P , 16P ,
154P , 146P , 77P , 2P , 1P .
Plus one extra addition:
add 146P into 154P ,
obtaining 300P .
![Page 104: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/104.jpg)
25
Example of Bos–Coster:
000100000 = 32
000010000 = 16
100101100 = 300
010010010 = 146
001001101 = 77
000000010 = 2
000000001 = 1
Goal: Compute 32P , 16P ,
300P , 146P , 77P , 2P , 1P .
26
Reduce largest row:
000100000 = 32
000010000 = 16
000001000 = 8 ←010010010 = 146
001001101 = 77
000000010 = 2
000000001 = 1
plus 2 additions.
![Page 105: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/105.jpg)
25
Example of Bos–Coster:
000100000 = 32
000010000 = 16
100101100 = 300
010010010 = 146
001001101 = 77
000000010 = 2
000000001 = 1
Goal: Compute 32P , 16P ,
300P , 146P , 77P , 2P , 1P .
26
Reduce largest row:
000100000 = 32
000010000 = 16
000001000 = 8
001000101 = 69 ←001001101 = 77
000000010 = 2
000000001 = 1
plus 3 additions.
![Page 106: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/106.jpg)
25
Example of Bos–Coster:
000100000 = 32
000010000 = 16
100101100 = 300
010010010 = 146
001001101 = 77
000000010 = 2
000000001 = 1
Goal: Compute 32P , 16P ,
300P , 146P , 77P , 2P , 1P .
26
Reduce largest row:
000100000 = 32
000010000 = 16
000001000 = 8
001000101 = 69
000001000 = 8 ←000000010 = 2
000000001 = 1
plus 4 additions.
![Page 107: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/107.jpg)
25
Example of Bos–Coster:
000100000 = 32
000010000 = 16
100101100 = 300
010010010 = 146
001001101 = 77
000000010 = 2
000000001 = 1
Goal: Compute 32P , 16P ,
300P , 146P , 77P , 2P , 1P .
26
Reduce largest row:
000100000 = 32
000010000 = 16
000001000 = 8
000100101 = 37 ←000001000 = 8
000000010 = 2
000000001 = 1
plus 5 additions.
![Page 108: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/108.jpg)
25
Example of Bos–Coster:
000100000 = 32
000010000 = 16
100101100 = 300
010010010 = 146
001001101 = 77
000000010 = 2
000000001 = 1
Goal: Compute 32P , 16P ,
300P , 146P , 77P , 2P , 1P .
26
Reduce largest row:
000100000 = 32
000010000 = 16
000001000 = 8
000000101 = 5 ←000001000 = 8
000000010 = 2
000000001 = 1
plus 6 additions.
![Page 109: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/109.jpg)
25
Example of Bos–Coster:
000100000 = 32
000010000 = 16
100101100 = 300
010010010 = 146
001001101 = 77
000000010 = 2
000000001 = 1
Goal: Compute 32P , 16P ,
300P , 146P , 77P , 2P , 1P .
26
Reduce largest row:
000010000 = 16 ←000010000 = 16
000001000 = 8
000000101 = 5
000001000 = 8
000000010 = 2
000000001 = 1
plus 7 additions.
![Page 110: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/110.jpg)
25
Example of Bos–Coster:
000100000 = 32
000010000 = 16
100101100 = 300
010010010 = 146
001001101 = 77
000000010 = 2
000000001 = 1
Goal: Compute 32P , 16P ,
300P , 146P , 77P , 2P , 1P .
26
Reduce largest row:
000000000 = 0
000010000 = 16
000001000 = 8
000000101 = 5
000001000 = 8
000000010 = 2
000000001 = 1
plus 7 additions.
![Page 111: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/111.jpg)
25
Example of Bos–Coster:
000100000 = 32
000010000 = 16
100101100 = 300
010010010 = 146
001001101 = 77
000000010 = 2
000000001 = 1
Goal: Compute 32P , 16P ,
300P , 146P , 77P , 2P , 1P .
26
Reduce largest row:
000000000 = 0
000001000 = 8 ←000001000 = 8
000000101 = 5
000001000 = 8
000000010 = 2
000000001 = 1
plus 8 additions.
![Page 112: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/112.jpg)
25
Example of Bos–Coster:
000100000 = 32
000010000 = 16
100101100 = 300
010010010 = 146
001001101 = 77
000000010 = 2
000000001 = 1
Goal: Compute 32P , 16P ,
300P , 146P , 77P , 2P , 1P .
26
Reduce largest row:
000000000 = 0
000000000 = 0 ←000001000 = 8
000000101 = 5
000001000 = 8
000000010 = 2
000000001 = 1
plus 8 additions.
![Page 113: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/113.jpg)
25
Example of Bos–Coster:
000100000 = 32
000010000 = 16
100101100 = 300
010010010 = 146
001001101 = 77
000000010 = 2
000000001 = 1
Goal: Compute 32P , 16P ,
300P , 146P , 77P , 2P , 1P .
26
Reduce largest row:
000000000 = 0
000000000 = 0
000000000 = 0 ←000000101 = 5
000001000 = 8
000000010 = 2
000000001 = 1
plus 8 additions.
![Page 114: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/114.jpg)
25
Example of Bos–Coster:
000100000 = 32
000010000 = 16
100101100 = 300
010010010 = 146
001001101 = 77
000000010 = 2
000000001 = 1
Goal: Compute 32P , 16P ,
300P , 146P , 77P , 2P , 1P .
26
Reduce largest row:
000000000 = 0
000000000 = 0
000000000 = 0
000000101 = 5
000000011 = 3 ←000000010 = 2
000000001 = 1
plus 9 additions.
![Page 115: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/115.jpg)
25
Example of Bos–Coster:
000100000 = 32
000010000 = 16
100101100 = 300
010010010 = 146
001001101 = 77
000000010 = 2
000000001 = 1
Goal: Compute 32P , 16P ,
300P , 146P , 77P , 2P , 1P .
26
Reduce largest row:
000000000 = 0
000000000 = 0
000000000 = 0
000000010 = 2 ←000000011 = 3
000000010 = 2
000000001 = 1
plus 10 additions.
![Page 116: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/116.jpg)
25
Example of Bos–Coster:
000100000 = 32
000010000 = 16
100101100 = 300
010010010 = 146
001001101 = 77
000000010 = 2
000000001 = 1
Goal: Compute 32P , 16P ,
300P , 146P , 77P , 2P , 1P .
26
Reduce largest row:
000000000 = 0
000000000 = 0
000000000 = 0
000000010 = 2
000000001 = 1 ←000000010 = 2
000000001 = 1
plus 11 additions.
![Page 117: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/117.jpg)
25
Example of Bos–Coster:
000100000 = 32
000010000 = 16
100101100 = 300
010010010 = 146
001001101 = 77
000000010 = 2
000000001 = 1
Goal: Compute 32P , 16P ,
300P , 146P , 77P , 2P , 1P .
26
Reduce largest row:
000000000 = 0
000000000 = 0
000000000 = 0
000000000 = 0 ←000000001 = 1
000000010 = 2
000000001 = 1
plus 11 additions.
![Page 118: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/118.jpg)
25
Example of Bos–Coster:
000100000 = 32
000010000 = 16
100101100 = 300
010010010 = 146
001001101 = 77
000000010 = 2
000000001 = 1
Goal: Compute 32P , 16P ,
300P , 146P , 77P , 2P , 1P .
26
Reduce largest row:
000000000 = 0
000000000 = 0
000000000 = 0
000000000 = 0
000000001 = 1
000000001 = 1 ←000000001 = 1
plus 12 additions.
![Page 119: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/119.jpg)
25
Example of Bos–Coster:
000100000 = 32
000010000 = 16
100101100 = 300
010010010 = 146
001001101 = 77
000000010 = 2
000000001 = 1
Goal: Compute 32P , 16P ,
300P , 146P , 77P , 2P , 1P .
26
Reduce largest row:
000000000 = 0
000000000 = 0
000000000 = 0
000000000 = 0
000000000 = 0 ←000000001 = 1
000000001 = 1
plus 12 additions.
![Page 120: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/120.jpg)
25
Example of Bos–Coster:
000100000 = 32
000010000 = 16
100101100 = 300
010010010 = 146
001001101 = 77
000000010 = 2
000000001 = 1
Goal: Compute 32P , 16P ,
300P , 146P , 77P , 2P , 1P .
26
Reduce largest row:
000000000 = 0
000000000 = 0
000000000 = 0
000000000 = 0
000000000 = 0
000000000 = 0 ←000000001 = 1
plus 12 additions.
![Page 121: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/121.jpg)
25
Example of Bos–Coster:
000100000 = 32
000010000 = 16
100101100 = 300
010010010 = 146
001001101 = 77
000000010 = 2
000000001 = 1
Goal: Compute 32P , 16P ,
300P , 146P , 77P , 2P , 1P .
26
Reduce largest row:
000000000 = 0
000000000 = 0
000000000 = 0
000000000 = 0
000000000 = 0
000000000 = 0
000000000 = 0 ←
plus 12 additions.
Final addition chain: 1, 2, 3, 5, 8,
16, 32, 37, 69, 77, 146, 154, 300.
Short, no temporary storage,
low two-operand complexity.
![Page 122: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/122.jpg)
25
Example of Bos–Coster:
000100000 = 32
000010000 = 16
100101100 = 300
010010010 = 146
001001101 = 77
000000010 = 2
000000001 = 1
Goal: Compute 32P , 16P ,
300P , 146P , 77P , 2P , 1P .
26
Reduce largest row:
000000000 = 0
000000000 = 0
000000000 = 0
000000000 = 0
000000000 = 0
000000000 = 0
000000000 = 0 ←
plus 12 additions.
Final addition chain: 1, 2, 3, 5, 8,
16, 32, 37, 69, 77, 146, 154, 300.
Short, no temporary storage,
low two-operand complexity.
27
Revised goal: Compute
32P1 + 16P2 + 300P3 + 146P4 +
77P5 + 2P6 + 1P7.
First compute P ′4 = P4 + P3
and then recursively compute
32P1 + 16P2 + 154P3 + 146P ′4 +
77P5 + 2P6 + 1P7.
Same scalars show up as before.
Ed25519 batch verification:
verify batch of 64 signatures
about twice as fast as
verifying each separately.
![Page 123: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/123.jpg)
25
Example of Bos–Coster:
000100000 = 32
000010000 = 16
100101100 = 300
010010010 = 146
001001101 = 77
000000010 = 2
000000001 = 1
Goal: Compute 32P , 16P ,
300P , 146P , 77P , 2P , 1P .
26
Reduce largest row:
000000000 = 0
000000000 = 0
000000000 = 0
000000000 = 0
000000000 = 0
000000000 = 0
000000000 = 0 ←
plus 12 additions.
Final addition chain: 1, 2, 3, 5, 8,
16, 32, 37, 69, 77, 146, 154, 300.
Short, no temporary storage,
low two-operand complexity.
27
Revised goal: Compute
32P1 + 16P2 + 300P3 + 146P4 +
77P5 + 2P6 + 1P7.
First compute P ′4 = P4 + P3
and then recursively compute
32P1 + 16P2 + 154P3 + 146P ′4 +
77P5 + 2P6 + 1P7.
Same scalars show up as before.
Ed25519 batch verification:
verify batch of 64 signatures
about twice as fast as
verifying each separately.
![Page 124: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/124.jpg)
25
Example of Bos–Coster:
000100000 = 32
000010000 = 16
100101100 = 300
010010010 = 146
001001101 = 77
000000010 = 2
000000001 = 1
Goal: Compute 32P , 16P ,
300P , 146P , 77P , 2P , 1P .
26
Reduce largest row:
000000000 = 0
000000000 = 0
000000000 = 0
000000000 = 0
000000000 = 0
000000000 = 0
000000000 = 0 ←
plus 12 additions.
Final addition chain: 1, 2, 3, 5, 8,
16, 32, 37, 69, 77, 146, 154, 300.
Short, no temporary storage,
low two-operand complexity.
27
Revised goal: Compute
32P1 + 16P2 + 300P3 + 146P4 +
77P5 + 2P6 + 1P7.
First compute P ′4 = P4 + P3
and then recursively compute
32P1 + 16P2 + 154P3 + 146P ′4 +
77P5 + 2P6 + 1P7.
Same scalars show up as before.
Ed25519 batch verification:
verify batch of 64 signatures
about twice as fast as
verifying each separately.
![Page 125: Schwabe{Yang: Curve25519/Ed25519 speed. … Many papers have explored Curve25519/Ed25519 speed. e.g. 2015 Chou software: on Intel Sandy Bridge (2011), 57164 cycles for keygen, 63526](https://reader031.fdocuments.us/reader031/viewer/2022022804/5c98ccf509d3f253358c5b04/html5/thumbnails/125.jpg)
26
Reduce largest row:
000000000 = 0
000000000 = 0
000000000 = 0
000000000 = 0
000000000 = 0
000000000 = 0
000000000 = 0 ←
plus 12 additions.
Final addition chain: 1, 2, 3, 5, 8,
16, 32, 37, 69, 77, 146, 154, 300.
Short, no temporary storage,
low two-operand complexity.
27
Revised goal: Compute
32P1 + 16P2 + 300P3 + 146P4 +
77P5 + 2P6 + 1P7.
First compute P ′4 = P4 + P3
and then recursively compute
32P1 + 16P2 + 154P3 + 146P ′4 +
77P5 + 2P6 + 1P7.
Same scalars show up as before.
Ed25519 batch verification:
verify batch of 64 signatures
about twice as fast as
verifying each separately.