Salsa20

INDIAN STATISTICAL INSTITUTE

Salsa: A Detailed Study on Salsa under the guidance of Dr. Bimal K. Roy

Amit Kumar Ghosh, Abhijnan Chattopadhyay, Priyanka Syal, Preetha Bhattacharjee

The document gives an overall description of the specification of Salsa20 as a hash function, an expansion function, and an encryption function; presents a range of benchmarks relevant to cryptographic speed; explains, at a lower level, the techniques used to achieve this performance; and discusses modern cryptanalysis and the security of Salsa20, together with the alternative proposals and other variants published to date.


This report was prepared by Amit (facebook.com/titanium009) as a summer project at ISI, Kolkata.




Table of Contents

Designing Salsa20
    Introduction
    Operations
    Encryption
    Hashing
Benchmarking Salsa20
    The Salsa20 structure
    Salsa20 on different Platforms
Salsa20 specification
    Defining Functions
    Specification
        The Salsa20 hash function
        The Salsa20 expansion function
        The Salsa20 encryption function
Security and cryptanalysis of Salsa20
    Side Channel Attacks
    Notes on the uniform randomness and diagonal constants
    Differential Cryptanalysis of Salsa20/8
        Truncated differential cryptanalysis of five rounds of Salsa20
    Algebraic attacks
    Other notions of security
Alternative Proposals
    Extending the Salsa20 nonce


Designing Salsa20

Introduction

eSTREAM, the ECRYPT Stream Cipher Project, called for submissions of stream ciphers in November 2004. It received more than 30 proposals from 97 cryptographers in 19 countries, and over the subsequent years collected a total of 200 papers. The final eSTREAM portfolio, containing four software stream ciphers and four hardware stream ciphers, was announced in April 2008. The portfolio was revised in September 2008 to eliminate a hardware stream cipher, F-FCSR-H v2, that had been broken.

Salsa20/r is a software-oriented (profile 1) stream cipher proposed by Daniel J. Bernstein. The algorithm supports keys of 128 bits and 256 bits. During its operation, the key, a 64-bit nonce (unique message number), a 64-bit counter and four 32-bit constants are used to construct the 512-bit initial state. After r iterations of the Salsa20/r round function, the updated state is used as a 512-bit keystream output. Each such output block is an independent combination of the key, nonce, and counter and, since there is no chaining between blocks, the operation of Salsa20/r resembles the operation of a block cipher in counter mode. Salsa20/r therefore shares the very same implementation advantages, in particular the ability to generate output blocks in any order and in parallel. The maximum length of the keystream produced by Salsa20/r is $2^{70}$ bytes.
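The keystream limit follows from the parameters stated in this introduction (a 64-bit block counter and 512-bit output blocks). As a short worked calculation, which gives $2^{70}$ bytes, or equivalently $2^{73}$ bits:

```latex
\[
\underbrace{2^{64}}_{\text{counter values}} \times
\underbrace{64\ \text{bytes}}_{\text{one 512-bit block}}
= 2^{64} \cdot 2^{6}\ \text{bytes}
= 2^{70}\ \text{bytes}
\;(= 2^{73}\ \text{bits}).
\]
```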

Operations

Integer multiplications

Argument: The basic argument for integer multiplication is that the output bits are complicated functions of the input bits, mixing the inputs more thoroughly than a few simple integer operations.

Counter-argument: The basic counterargument is that integer multiplication takes several cycles on typical CPUs, and many more cycles on some CPUs. For comparison, a comparably complex series of simple integer operations is always reasonably fast. Multiplication might be slightly faster on some CPUs but it is not consistently fast. A further argument against integer multiplication is that it increases the risk of timing leaks. What really matters is not the speed of integer multiplication, but the speed of constant-time integer multiplication, which is often much slower.

S-box lookups

An S-box lookup is an array lookup using an input-dependent index. Most ciphers are designed to take advantage of this operation. For example, typical high-speed AES software has several 1024-byte S-boxes, each of which converts 8-bit inputs to 32-bit outputs.

Argument: The basic argument for S-boxes is that a single table lookup can mangle its input more thoroughly than a chain of a few simple integer operations taking the same amount of time.

Counter-argument: The basic counterargument is that a simple integer operation takes one or two 32-bit inputs rather than one 8-bit input, so it effectively mangles several 8-bit inputs at once. It is not obvious that a series of S-box lookups (even with rather large S-boxes, as in AES, increasing L1 cache pressure on large CPUs and forcing different implementation techniques for small CPUs) is faster than a comparably complex series of integer operations. A further argument against S-box lookups is that, on most platforms, they are vulnerable to timing attacks. NIST's statement to the contrary (table lookup is "not vulnerable to timing attacks") is erroneous.

Rotations

Rotations account for about one third of the integer operations in Salsa20, and more on the UltraSPARC. Replacing some of the rotations with a comparable number of additions might achieve comparable diffusion in fewer rounds.

Argument: The basic argument for rotations is that one xor of a rotated quantity provides as much diffusion as two xors of shifted quantities.
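The rotation argument can be made concrete with a few lines of Python. This is only an illustrative sketch: the helper names (`rotl32`, `xor_rotated`, `xor_two_shifts`) and the sample words are chosen here and do not come from the report. It shows that xoring in one rotated word gives the same result as xoring in two shifted copies of that word, because the two shifted parts occupy disjoint bit positions.

```python
MASK32 = 0xFFFFFFFF  # keep every intermediate value within 32 bits


def rotl32(x: int, n: int) -> int:
    """Rotate the 32-bit word x left by n bits (0 < n < 32)."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def xor_rotated(a: int, b: int, n: int) -> int:
    """One xor of a rotated quantity."""
    return a ^ rotl32(b, n)


def xor_two_shifts(a: int, b: int, n: int) -> int:
    """Two xors of shifted quantities that reach the same result."""
    return a ^ ((b << n) & MASK32) ^ (b >> (32 - n))


if __name__ == "__main__":
    a, b = 0x01234567, 0x89ABCDEF
    for n in (7, 9, 13, 18):  # the rotation distances used by Salsa20
        assert xor_rotated(a, b, n) == xor_two_shifts(a, b, n)
    print("one rotated xor matches two shifted xors")
```

On a CPU with a native rotate instruction the first form costs one operation fewer per step, which is the saving the argument refers to.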

Encryption

Different encryption and decryption

The popularity of CBC appears to be a historical accident. Bernstein reports having found very few people arguing for CBC over counter mode, and none of the arguments even marginally convincing. On occasion one encounters the superstitious notion that encryption by xor is "too simple"; but a one-time pad (in conjunction with a Wegman-Carter MAC) provably achieves perfect secrecy (and any desired level of integrity), so there is obviously nothing wrong with xor. There are several clear arguments against CBC. One disadvantage of CBC is that it requires different code for encryption and decryption, increasing costs in many contexts. Another disadvantage of CBC is that the extra communication from the cryptanalyst into the cipher state is a security threat; regaining the original level of confidence means adding rounds, taking additional time.

Stream's dependency on plaintext

Argument: The basic argument for incorporating plaintext into the stream (specifically, incorporating plaintext bytes into subsequent bytes of the stream) is that this allows message authentication "for free." After encrypting the plaintext, one can generate a constant number of additional stream bytes and output them as an authenticator of the plaintext.

Counter-argument: One counterargument is that "free" is a wild exaggeration. Incorporating the plaintext into the stream takes time for every block, and generating an authenticator takes time for every message. Another counterargument is that incorporation of plaintext, being extra communication from the cryptanalyst into the cipher state, is a security threat. Regaining the original level of confidence means adding rounds, which takes additional time for every block.

State

Argument: The argument for a larger state is that one does not need as many cipher rounds to achieve the same conjectured security level. Copying state across blocks seems to provide just as much mixing as the first few cipher rounds. A larger state therefore saves some time after the first block.

Counter-argument: A larger state loses time in some contexts. Reuse forces serialization: one cannot take advantage of extra hardware to reduce the latency of encrypting or decrypting long messages. Furthermore, large states reduce the number of messages that can be processed simultaneously on limited hardware.

Block Size

Argument: The basic argument for a larger block size, say 256 bytes, is that one does not need as many cipher rounds to achieve the same conjectured security level. Using a larger block size, like copying state across blocks, seems to provide just as much mixing as the first few cipher rounds. A larger block size therefore saves time.

Counter-argument: A larger block size also loses time. On most CPUs, the communication cost of sweeping through a 256-byte block is a bottleneck; CPUs are designed for computations that don't involve so much data.
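The point about CBC needing different code for encryption and decryption, while xor does not, can be illustrated with a minimal Python sketch. The function name `xor_with_keystream` is hypothetical, and the keystream below is a fixed placeholder rather than real cipher output; a real stream cipher would supply the keystream bytes.

```python
def xor_with_keystream(data: bytes, keystream: bytes) -> bytes:
    """Xor data with the leading bytes of keystream.

    The same function serves as both encryption and decryption.
    """
    if len(keystream) < len(data):
        raise ValueError("keystream is shorter than the data")
    return bytes(d ^ k for d, k in zip(data, keystream))


if __name__ == "__main__":
    # Placeholder keystream for illustration only; NOT produced by Salsa20.
    keystream = bytes(range(64))
    plaintext = b"attack at dawn"

    ciphertext = xor_with_keystream(plaintext, keystream)
    recovered = xor_with_keystream(ciphertext, keystream)

    assert recovered == plaintext
    print(ciphertext.hex())
```

Because the plaintext never feeds back into the keystream, an attacker-chosen message cannot influence the cipher state, which is the security argument made in the counter-arguments of this section.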

Hashing

Implementation of Block cipher

Argument: The basic argument for a block cipher, that is, for keeping the k words independent of the n words, is that, for fixed k, it is easy to make a block cipher an invertible function of n. But this feature seems to be of purely historical interest. Invertibility is certainly not necessary for encryption.

Counter-argument: The basic disadvantage of a block cipher is that the k words consume valuable communication resources. A 64-byte block cipher with a 32-byte key would need to repeatedly sweep through 96 bytes of memory (plus a few bytes of temporary storage) for its 64 bytes of output; in contrast, Salsa20 repeatedly sweeps through just 64 bytes of memory (plus a few bytes of temporary storage) for its 64 bytes of output.

Code-Length

Argument: The basic argument for using two different kinds of rounds is the idea that attacks will have some extra difficulty passing through the switch from one kind to another. This extra difficulty would allow the cipher to reach the same security level with fewer rounds.

Counter-argument: The basic counterargument is that extra code is expensive in many contexts. It increases pressure on a CPU's L1 cache, for example, and it increases the minimum size of a hardware implementation.

Diffusion among words

Salsa20 views its 16 words as a 4 × 4 array. During the first round, there is no communication between columns; each column has its own chain of 12 serial operations modifying the words in that column. During the second round, there is no communication between rows; each row has its own chain of 12 serial operations modifying the words in that row. Et cetera.

There are pairs (i, j) such that a change in word i has no opportunity to affect word j until the third round. A different communication structure would allow much faster diffusion of changes through all 16 words. On the other hand, it doesn't appear to be possible to achieve much faster diffusion of changes through all 512 bits.

Modifications other than add-rotate-xor

There are many plausible ways to modify each word in a column using other words in the same column. Bernstein settled on "xor a rotated sum" as bouncing back and forth between incompatible structures on the critical path. He chose "xor a rotated sum" over "add a rotated xor" for simple performance reasons: the x86 architecture has a three-operand addition (LEA) but not a three-operand xor.

Benchmarking Salsa20

The Salsa20 structure

Encryption of a 64-byte block is xor with the output of the Salsa20 hash function, where the input consists of the 32-byte Salsa20 key, the 8-byte nonce (unique message number), the 8-byte block counter, and 16 constant bytes. The reader is cautioned that encryption time is slightly longer than hashing time: in particular, a 64-byte xor is not free.

The Salsa20 hash function regards its 64-byte input x as an array of 16 words in little-endian form. It performs 320 invertible modifications, where each modification changes one word of the array. The resulting words are added to the original words, producing, in little-endian form, the 64-byte output Salsa20(x). Each modification involves xoring into one word a rotated version of the sum of two other words. Thus the 320 modifications involve, overall, 320 additions, 320 rotations, and 320 xors. The rotations are all by constant distances.

The entire series of modifications is a series of 10 identical double-rounds. Each double-round is a series of 2 rounds. Each round is a set of 4 parallel quarter-rounds. Each quarter-round is a series of 4 word modifications.
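The figure of 320 modifications follows from the round structure just described. Written out as a worked product:

```latex
\[
\underbrace{10}_{\text{double-rounds}} \times
\underbrace{2}_{\text{rounds per double-round}} \times
\underbrace{4}_{\text{quarter-rounds per round}} \times
\underbrace{4}_{\text{modifications per quarter-round}}
= 320 .
\]
```

Since each modification is one addition, one rotation and one xor, the same product gives the 320 additions, 320 rotations and 320 xors quoted in the text.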

Salsa20 on different Platforms

AMD Athlon

Implementation: salsa20_word_pm software takes 29.25 Athlon cycles for a Salsa20 round, totalling 585 cycles (9.15 cycles/byte) for 20 rounds, totalling 645 cycles (10.08 cycles/byte) for the Salsa20 hash function, timed as 680 cycles with 35 cycles timing overhead. (The timings are actually 655 or 656 cycles most of the time but 849 cycles on every eighth call, presumably because of branch mispredictions.) The compiled code occupies 1248 bytes. Its main loop occupies 937 bytes and handles 4 rounds.

Comparison to AES timings: Osvik reports that unpublished software (with no protection against timing leaks) takes 225 Athlon cycles (over 14 cycles/byte) to encrypt a 16-byte block with a 16-byte AES key, assuming that the key was pre-expanded into 176 bytes. One can reasonably extrapolate that similar software would take over 300 Athlon cycles (over 18 cycles/byte) to encrypt a 16-byte block with a 32-byte AES key, assuming that the key was pre-expanded into 240 bytes.

IBM PowerPC RS64 IV

Implementation: salsa20_word_aix software takes 33 PowerPC RS64 IV cycles for each Salsa20 round, totalling 660 cycles (10.32 cycles/byte) for 20 rounds, totalling 756 cycles (11.82 cycles/byte) for the Salsa20 hash function, timed as 770 cycles with 14 cycles timing overhead. The compiled code for the Salsa20 hash function occupies 768 bytes. Its main loop occupies 392 bytes and handles 2 rounds.

Intel Pentium III

Implementation: salsa20_word_pii software takes 37.5 Pentium III cycles for each Salsa20 round, totalling 750 cycles (11.72 cycles/byte) for 20 rounds, totalling 837 cycles (13.08 cycles/byte) for the Salsa20 hash function, timed as 872 cycles with 35 cycles timing overhead. (The timings are actually 859 cycles most of the time but 908 cycles on every fourth call, presumably because of branch mispredictions.) The compiled code for the Salsa20 hash function occupies 1280 bytes. Its main loop occupies 937 bytes and handles 4 rounds.

Comparison to AES timings: Osvik reports that unpublished software (with no protection against timing leaks) takes 224 Pentium III cycles (14 cycles/byte) to encrypt a 16-byte block with a 16-byte AES key, assuming that the key was pre-expanded into 176 bytes. One can reasonably extrapolate that similar software would take over 300 Pentium III cycles (over 18 cycles/byte) to encrypt a 16-byte block with a 32-byte AES key, assuming that the key was pre-expanded into 240 bytes.

Intel Pentium 4 f12

Implementation: salsa20_word_p4 software takes 48 Pentium 4 f12 (Willamette) cycles for each Salsa20 round, totalling 960 cycles (15 cycles/byte) for 20 rounds, totalling 1052 cycles (16.44 cycles/byte) for the Salsa20 hash function, timed as 1136 cycles with 84 cycles timing overhead. The compiled code for the Salsa20 hash function occupies 1144 bytes. Its main loop occupies 629 bytes and handles 4 rounds.

Comparison to AES timings: Osvik reports that unpublished software (with no protection against timing leaks) takes 260 Pentium 4 (f12?) cycles (16.25 cycles/byte) to encrypt a 16-byte block with a 16-byte AES key, assuming that the key was pre-expanded into 176 bytes. Matsui and Fukuda report that unpublished software (with no protection against timing leaks) takes 251 Pentium 4 (f29?) cycles (15.68 cycles/byte) and 284 Pentium 4 f33 cycles (17.75 cycles/byte). One can reasonably extrapolate that similar software would take over 340 Pentium 4 f12 cycles (over 21 cycles/byte) to encrypt a 16-byte block with a 32-byte AES key, assuming that the key was pre-expanded into 240 bytes.

Intel Pentium M

Implementation: salsa20_word_pm software takes 33.75 Pentium M cycles for each Salsa20 round, totalling 675 cycles (10.55 cycles/byte) for 20 rounds, totalling 740 cycles (11.57 cycles/byte) for the Salsa20 hash function, timed as 790 cycles with 50 cycles timing overhead. (The timings are actually 780 or 781 cycles most of the time but 856 cycles on every eighth call, presumably because of branch mispredictions.) The compiled code for the Salsa20 hash function occupies 1248 bytes. Its main loop occupies 937 bytes and handles 4 rounds.

Comparison to AES timings: The Pentium M might compute AES in marginally less time than the Pentium III, but both CPUs face the same basic AES bottleneck: encrypting a 16-byte block with a 16-byte AES key requires 200 S-box lookups, which cannot take fewer than 200 cycles (12.5 cycles/byte). Similarly, encrypting a 16-byte block with a 32-byte AES key requires 280 S-box lookups, which cannot take fewer than 280 cycles (17.5 cycles/byte). Even more S-box lookups are required if keys are not pre-expanded.

Motorola PowerPC 7410

Implementation: salsa20_word_macos software takes 24.5 PowerPC 7410 cycles for each Salsa20 round, totalling 490 cycles (7.66 cycles/byte) for 20 rounds, totalling approximately 570 cycles (8.91 cycles/byte) for the Salsa20 hash function, timed as approximately 584 cycles with 14 cycles timing overhead. (Precise timings are difficult: the CPU's cycle counter has 16-cycle resolution.) The compiled code for the Salsa20 hash function occupies 768 bytes. Its main loop occupies 392 bytes and handles 2 rounds.

Comparison to AES timings: Lipmaa reports that AES software by Ahrens (with, presumably, no protection against timing leaks) takes 401 PowerPC 7400 cycles (over 25 cycles/byte) to encrypt a 16-byte block with a 16-byte AES key, assuming that the key was pre-expanded into 176 bytes. Bernstein is not aware of any relevant differences between the PowerPC 7400 and the PowerPC 7410. It should be possible to do somewhat better (Bernstein's own public-domain AES software, including key expansion, takes about 490 cycles on the PowerPC 7410), but AES is clearly much slower than Salsa20 on this CPU.

Sun UltraSPARC II

Implementation: salsa20_word_sparc software takes 40.5 UltraSPARC II cycles for each Salsa20 round, totalling 810 cycles (12.66 cycles/byte) for 20 rounds, totalling 881 cycles (13.77 cycles/byte) for the Salsa20 hash function, timed as 892 cycles with 11 cycles timing overhead. The compiled code for the Salsa20 hash function occupies 936 bytes. Its main loop occupies 652 bytes and handles 2 rounds.

Comparison to AES timings: Lipmaa reports that unpublished software (with, presumably, no protection against timing leaks) takes 270 UltraSPARC II cycles (over 16 cycles/byte) to encrypt a 16-byte block with a 16-byte AES key, assuming that the key was pre-expanded into 176 bytes. One can reasonably extrapolate that similar software would take over 370 UltraSPARC II cycles (over 23 cycles/byte) to encrypt a 16-byte block with a 32-byte AES key, assuming that the key was pre-expanded into 240 bytes.

Sun UltraSPARC III

Implementation: salsa20_word_sparc software takes 41 UltraSPARC III cycles for each Salsa20 round, totalling 820 cycles (12.82 cycles/byte) for 20 rounds, totalling 889 cycles (13.90 cycles/byte) for the Salsa20 hash function, timed as 905 cycles with 16 cycles timing overhead. The compiled code for the Salsa20 hash function occupies 936 bytes. Its main loop occupies 652 bytes and handles 2 rounds.

Comparison to AES timings: AES on an UltraSPARC III is at least as slow as AES on an UltraSPARC II.
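The cycles/byte figures in these entries are the cycle counts divided by the block size: 64 bytes for a Salsa20 hash output, 16 bytes for an AES block. The quoted values appear to be rounded up to two decimals, so the sketch below does the same; that rounding rule is an inference from the numbers, and the names `SALSA20_HASH_CYCLES` and `cycles_per_byte` are chosen here for illustration.

```python
# Cycle counts for one 64-byte Salsa20 hash, as quoted in the entries above.
SALSA20_HASH_CYCLES = {
    "AMD Athlon": 645,
    "IBM PowerPC RS64 IV": 756,
    "Intel Pentium III": 837,
    "Intel Pentium 4 f12": 1052,
    "Intel Pentium M": 740,
    "Motorola PowerPC 7410": 570,  # approximate in the source
    "Sun UltraSPARC II": 881,
    "Sun UltraSPARC III": 889,
}


def cycles_per_byte(cycles: int, block_bytes: int) -> float:
    """Cycles per byte, rounded up to two decimals (assumed rounding rule)."""
    hundredths = -(-cycles * 100 // block_bytes)  # integer ceiling division
    return hundredths / 100


if __name__ == "__main__":
    for platform, cycles in SALSA20_HASH_CYCLES.items():
        print(f"{platform}: {cycles_per_byte(cycles, 64):.2f} cycles/byte")

    # Lower bound for table-based AES: one cycle per S-box lookup,
    # 200 lookups (16-byte key) or 280 lookups (32-byte key) per 16-byte block.
    print(f"AES-128 bound: {cycles_per_byte(200, 16):.2f} cycles/byte")
    print(f"AES-256 bound: {cycles_per_byte(280, 16):.2f} cycles/byte")
```

Comparing the two sets of numbers per platform shows the report's point: Salsa20's full 20-round hash sits at or below the per-byte cost of the AES figures quoted for the same CPU.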

Salsa20 specification

Defining Functions

Functions Inputs & Outputs

Definition

The quarterround function

If y is a 4-word sequence then quarterround(y) is a 4-word sequence
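For reference, the published Salsa20 specification (reference 1) defines quarterround as four add-rotate-xor steps with rotation distances 7, 9, 13 and 18. A minimal Python sketch of that definition follows; the helper name `rotl32` is chosen here for illustration.

```python
MASK32 = 0xFFFFFFFF  # words are 32-bit; additions are taken modulo 2**32


def rotl32(x: int, n: int) -> int:
    """Rotate the 32-bit word x left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def quarterround(y0: int, y1: int, y2: int, y3: int):
    """Map the 4-word sequence (y0, y1, y2, y3) to (z0, z1, z2, z3)."""
    z1 = y1 ^ rotl32((y0 + y3) & MASK32, 7)
    z2 = y2 ^ rotl32((z1 + y0) & MASK32, 9)
    z3 = y3 ^ rotl32((z2 + z1) & MASK32, 13)
    z0 = y0 ^ rotl32((z3 + z2) & MASK32, 18)
    return z0, z1, z2, z3


if __name__ == "__main__":
    print([hex(w) for w in quarterround(1, 0, 0, 0)])
```

Each step only xors a value into one word, so the whole function is invertible: the steps can be undone in reverse order.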


The rowround function

If y is a 16-word sequence then rowround(y) is a 16-word sequence.

The columnround function

If x is a 16-word sequence then columnround(x) is a 16-word sequence.

The doubleround function

If x is a 16-word sequence then doubleround(x) is a 16-word sequence.

A double round is a column round followed by a row round: doubleround(x) = rowround(columnround(x)).

The littleendian function

If b is a 4-byte sequence then littleendian(b) is a word.
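The round functions listed above can be sketched in Python as follows. The index patterns are those of the published Salsa20 specification (reference 1): each row or column is fed to quarterround starting from its diagonal element. This is an illustrative model, not optimized code.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK32


def quarterround(y0, y1, y2, y3):
    z1 = y1 ^ rotl32((y0 + y3) & MASK32, 7)
    z2 = y2 ^ rotl32((z1 + y0) & MASK32, 9)
    z3 = y3 ^ rotl32((z2 + z1) & MASK32, 13)
    z0 = y0 ^ rotl32((z3 + z2) & MASK32, 18)
    return z0, z1, z2, z3


def rowround(y):
    """Apply quarterround to each row, starting at the diagonal word."""
    z = [0] * 16
    z[0], z[1], z[2], z[3] = quarterround(y[0], y[1], y[2], y[3])
    z[5], z[6], z[7], z[4] = quarterround(y[5], y[6], y[7], y[4])
    z[10], z[11], z[8], z[9] = quarterround(y[10], y[11], y[8], y[9])
    z[15], z[12], z[13], z[14] = quarterround(y[15], y[12], y[13], y[14])
    return z


def columnround(x):
    """Apply quarterround to each column, starting at the diagonal word."""
    y = [0] * 16
    y[0], y[4], y[8], y[12] = quarterround(x[0], x[4], x[8], x[12])
    y[5], y[9], y[13], y[1] = quarterround(x[5], x[9], x[13], x[1])
    y[10], y[14], y[2], y[6] = quarterround(x[10], x[14], x[2], x[6])
    y[15], y[3], y[7], y[11] = quarterround(x[15], x[3], x[7], x[11])
    return y


def doubleround(x):
    """A column round followed by a row round."""
    return rowround(columnround(x))


def littleendian(b):
    """Interpret the 4-byte sequence b as a 32-bit little-endian word."""
    return b[0] | (b[1] << 8) | (b[2] << 16) | (b[3] << 24)


if __name__ == "__main__":
    print(hex(littleendian(bytes([86, 75, 30, 9]))))
    print([hex(w) for w in rowround([1, 0, 0, 0] * 4)[:4]])
```

A column round is the same computation as a row round applied to the transposed 4 × 4 array, which is why a double round mixes every row and every column once.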

Specifications

Functions Inputs & Outputs Definition

The Salsa20 hash function

If x is a 64-byte sequence then Salsa20(x) is a 64-byte sequence.
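In the published specification (reference 1), the hash function reads the input as 16 little-endian words, applies ten double rounds, and then adds the original words back in, word by word modulo $2^{32}$. Written out, with $x_i$ the input words and $z_i$ the words after ten double rounds:

```latex
\begin{align*}
x_i &= \mathrm{littleendian}\bigl(x[4i],\, x[4i+1],\, x[4i+2],\, x[4i+3]\bigr),
      \qquad i = 0, \dots, 15, \\
(z_0, \dots, z_{15}) &= \mathrm{doubleround}^{10}(x_0, \dots, x_{15}), \\
\mathrm{Salsa20}(x) &= \bigl(\mathrm{littleendian}^{-1}(z_0 + x_0),\ \dots,\
      \mathrm{littleendian}^{-1}(z_{15} + x_{15})\bigr).
\end{align*}
```

The final addition is what makes the function non-invertible: without it, the ten double rounds could simply be run backwards.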


The Salsa20 expansion function

If k is a 32-byte or 16-byte sequence and n is a 16-byte sequence then $\mathrm{Salsa20}_k(n)$ is a 64-byte sequence.
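The expansion function works by interleaving the key, the 16-byte input n and four constant words into a 64-byte block, which is then passed to the hash function. The sketch below builds only that 64-byte block, following the layout in the published specification (reference 1); the function name `expansion_input` is hypothetical. The constants are the ASCII strings "expand 32-byte k" and "expand 16-byte k".

```python
SIGMA = b"expand 32-byte k"  # constants used with a 32-byte key
TAU = b"expand 16-byte k"    # constants used with a 16-byte key


def expansion_input(key: bytes, n: bytes) -> bytes:
    """Return the 64-byte block that the expansion function feeds to the hash."""
    if len(n) != 16:
        raise ValueError("n must be 16 bytes")
    if len(key) == 32:
        c, k0, k1 = SIGMA, key[:16], key[16:]
    elif len(key) == 16:
        c, k0, k1 = TAU, key, key  # a 16-byte key is used twice
    else:
        raise ValueError("key must be 16 or 32 bytes")
    # Constant words land on the diagonal of the 4 x 4 word array
    # (word positions 0, 5, 10 and 15).
    return c[0:4] + k0 + c[4:8] + n + c[8:12] + k1 + c[12:16]


if __name__ == "__main__":
    block = expansion_input(bytes(32), bytes(16))
    diagonal = [int.from_bytes(block[4 * i:4 * i + 4], "little") for i in (0, 5, 10, 15)]
    print([hex(w) for w in diagonal])
```

For a 32-byte key the four diagonal words read as 0x61707865, 0x3320646e, 0x79622d32 and 0x6b206574 when interpreted in little-endian form.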

The Salsa20 encryption function

Let k be a 32-byte or 16-byte sequence. Let v be an 8-byte sequence. Let m be an $\ell$-byte sequence for some $\ell \in \{1, 2, \ldots, 2^{70}\}$. The Salsa20 encryption of m with nonce v under key k, denoted $\mathrm{Salsa20}_k(v) \oplus m$, is an $\ell$-byte sequence.
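Putting the pieces together, encryption xors the message with the keystream blocks obtained by expanding the key with the nonce followed by an 8-byte little-endian block counter. The following compact Python model follows the published specification (reference 1); it is a readability-first sketch with function names chosen here, not a vetted or constant-time implementation, and it should not be used to protect real data.

```python
import struct

MASK32 = 0xFFFFFFFF
SIGMA = b"expand 32-byte k"
TAU = b"expand 16-byte k"
# Word indices of the four quarter-rounds in a column round and in a row round.
COLUMNS = ((0, 4, 8, 12), (5, 9, 13, 1), (10, 14, 2, 6), (15, 3, 7, 11))
ROWS = ((0, 1, 2, 3), (5, 6, 7, 4), (10, 11, 8, 9), (15, 12, 13, 14))


def rotl32(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK32


def quarterround(s, a, b, c, d):
    """Apply one quarter-round in place to the words s[a], s[b], s[c], s[d]."""
    s[b] ^= rotl32((s[a] + s[d]) & MASK32, 7)
    s[c] ^= rotl32((s[b] + s[a]) & MASK32, 9)
    s[d] ^= rotl32((s[c] + s[b]) & MASK32, 13)
    s[a] ^= rotl32((s[d] + s[c]) & MASK32, 18)


def salsa20_hash(block: bytes) -> bytes:
    """64-byte input to 64-byte output: ten double rounds plus feed-forward."""
    x = list(struct.unpack("<16I", block))
    s = x[:]
    for _ in range(10):
        for indices in COLUMNS + ROWS:  # one double round
            quarterround(s, *indices)
    return struct.pack("<16I", *((a + b) & MASK32 for a, b in zip(s, x)))


def salsa20_expand(key: bytes, n: bytes) -> bytes:
    """Expansion function: key (16 or 32 bytes) and 16-byte n to 64 bytes."""
    if len(key) == 32:
        c, k0, k1 = SIGMA, key[:16], key[16:]
    elif len(key) == 16:
        c, k0, k1 = TAU, key, key
    else:
        raise ValueError("key must be 16 or 32 bytes")
    return salsa20_hash(c[0:4] + k0 + c[4:8] + n + c[8:12] + k1 + c[12:16])


def salsa20_encrypt(key: bytes, nonce: bytes, message: bytes) -> bytes:
    """Xor the message with the keystream; decryption is the same call."""
    if len(nonce) != 8:
        raise ValueError("nonce must be 8 bytes")
    out = bytearray()
    for offset in range(0, len(message), 64):
        counter = struct.pack("<Q", offset // 64)  # 8-byte block number
        keystream = salsa20_expand(key, nonce + counter)
        out.extend(m ^ k for m, k in zip(message[offset:offset + 64], keystream))
    return bytes(out)


if __name__ == "__main__":
    key, nonce = bytes(range(32)), bytes(8)
    ciphertext = salsa20_encrypt(key, nonce, b"hello, salsa20" * 10)
    assert salsa20_encrypt(key, nonce, ciphertext) == b"hello, salsa20" * 10
    print(ciphertext[:16].hex())
```

Because each block depends only on the key, nonce and block number, the loop body could process blocks in any order or in parallel, as noted in the introduction.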

Security and cryptanalysis of Salsa20

Side-channel attacks

Natural Salsa20 implementations take constant time on a huge variety of CPUs; here constant means input-independent. There is no incentive for the authors of Salsa20 software to use variable-time operations such as S-box lookups. Timing attacks against Salsa20 are therefore just as difficult as pure cryptanalysis of the Salsa20 outputs. The operations in Salsa20 are also among the easiest to protect against power attacks and other side-channel attacks.

Notes on the uniform randomness and diagonal constants

• Salsa20 column and row rounds: Each Salsa20 column round affects each column in the same way, starting from the diagonal. Each Salsa20 row round affects each row in the same way, starting from the diagonal. Consequently, shifting the entire Salsa20 hash-function input array along the diagonal has exactly the same effect on the output.

• Salsa20 hash function: The operations are almost compatible with rotation of each input word by, say, 10 bits. Rotation changes the effect of carries that cross the rotation boundary, but it is consistent with all other carries, and with the Salsa20 operations other than addition.

• Salsa20 expansion function:
  o It eliminates the shift structure by limiting the attacker's control over the hash-function input. In particular, the input diagonal is always 0x61707865, 0x3320646e, 0x79622d32, 0x6b206574, which is different from all its nontrivial shifts. In other words, two distinct arrays with this diagonal are always in distinct orbits under the shift group.
  o It eliminates the rotation structure. The input diagonal is different from all its nontrivial shifts and all its nontrivial rotations and all nontrivial shifts of its nontrivial rotations. In other words, two distinct arrays with this diagonal are always in distinct orbits under the shift/rotate group.

• Attacks based on non-randomness: Simon Fischer, Willi Meier, Côme Berbain, Jean-François Biasse and M. J. B. Robshaw published a paper arguing that stream cipher initialisation should ensure that the initial state or keystream is not detectably related to the key and initialisation vector. The paper analyses the key/IV setup of the eSTREAM Phase 2 candidates Salsa20 and TSC-4. For Salsa20 it demonstrates a key recovery attack on six rounds and observes non-randomness after seven. For TSC-4, non-randomness is detected over the full eight-round initialisation phase and would persist for more rounds.
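The diagonal-shift property in the first note can be checked numerically. The sketch below models the raw hash function on 16 words, following the round structure of the published specification (reference 1), and confirms that shifting the input array one step along the diagonal shifts the output the same way. The helper names (`salsa20_hash_words`, `shift_diagonal`) and the test input are choices made here for illustration.

```python
MASK32 = 0xFFFFFFFF
# Quarter-round word indices: four columns, then four rows (one double round).
QUARTERS = (
    (0, 4, 8, 12), (5, 9, 13, 1), (10, 14, 2, 6), (15, 3, 7, 11),
    (0, 1, 2, 3), (5, 6, 7, 4), (10, 11, 8, 9), (15, 12, 13, 14),
)


def rotl32(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK32


def salsa20_hash_words(x):
    """Raw Salsa20 hash on a list of 16 words (no byte conversion)."""
    s = list(x)
    for _ in range(10):
        for a, b, c, d in QUARTERS:
            s[b] ^= rotl32((s[a] + s[d]) & MASK32, 7)
            s[c] ^= rotl32((s[b] + s[a]) & MASK32, 9)
            s[d] ^= rotl32((s[c] + s[b]) & MASK32, 13)
            s[a] ^= rotl32((s[d] + s[c]) & MASK32, 18)
    return [(u + v) & MASK32 for u, v in zip(s, x)]


def shift_diagonal(x):
    """Move the word at (row, col) to (row + 1, col + 1), wrapping around."""
    out = [0] * 16
    for r in range(4):
        for c in range(4):
            out[4 * ((r + 1) % 4) + (c + 1) % 4] = x[4 * r + c]
    return out


if __name__ == "__main__":
    # Arbitrary attacker-chosen input with no fixed diagonal constants.
    x = [(0x9E3779B9 * (i + 1)) & MASK32 for i in range(16)]
    assert salsa20_hash_words(shift_diagonal(x)) == shift_diagonal(salsa20_hash_words(x))
    print("hash commutes with the diagonal shift")
```

The expansion function removes this symmetry in practice: shifting an array whose diagonal holds the four fixed constants moves those constants to different positions, so the shifted array is never a valid expansion input.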

Differential Cryptanalysis of Salsa20/8

The idea of a differential attack is that some “small” differences in input states have a perceptible chance of producing “small” differences after the first step of the computation, the second step of the computation, etc.

Salsa20 is quite different in this respect from ciphers such as AES where the input size is as large as the state size.

AES: AES has 16-byte inputs, 16-byte outputs, and (at least) 16-byte keys; there are $2^{384}$ choices of $(n, n', k)$, so presumably a large number of those choices make both of the 128-bit quantities $n \oplus n'$ and $\mathrm{AES}_k(n) \oplus \mathrm{AES}_k(n')$ "small".

Salsa20: Salsa20 has 16-byte inputs, 64-byte outputs, and 32-byte keys; there are $2^{512}$ choices of $(n, n', k)$, so there is no a-priori reason to believe that any of the choices make both the 128-bit quantity $n \oplus n'$ and the 512-bit quantity $\mathrm{Salsa20}_k(n) \oplus \mathrm{Salsa20}_k(n')$ "small".

• Yukiyasu Tsunoo, Teruo Saito, Hiroyasu Kubo, Tomoyasu Suzaki, and Hiroki Nakashima published a paper presenting a cryptanalysis of the Salsa20 stream cipher, which was proposed in 2005 and submitted to eSTREAM, the ECRYPT Stream Cipher Project. The cipher uses bitwise XOR, addition modulo $2^{32}$, and constant-distance rotation operations on an internal state of 16 32-bit words.

• The paper reports a significant bias in the differential probability for Salsa20's 4th-round internal state. It further shows that, using this bias, it is possible to break 8-round reduced Salsa20 with a 256-bit secret key at a lower computational complexity than an exhaustive key search. The cryptanalysis method exploits characteristics of addition, and succeeds in reducing the computational complexity compared to previous methods.

Truncated differential cryptanalysis of five rounds of Salsa20

In addition to the work of Tsunoo et al., Paul Crowley published a paper titled "Truncated differential cryptanalysis of five rounds of Salsa20", which presents an attack on Salsa20 reduced to five of its twenty rounds. This attack uses many clusters of truncated differentials and requires $2^{165}$ work and $2^{6}$ plaintexts. This conclusion leaves some open questions.

It is clear that a naive attack of this type cannot be extended to more than a handful of rounds; this has no negative implications for the security of the full 20-round Salsa20 presented to eSTREAM. Nonetheless, the degree of clustering exhibited by these differential characteristics is surprising; it is more usual for a single differential trail to dominate. It is also striking to find differential trails whose overall probability is so greatly mispredicted by the products of the probabilities of their components, marking a violation of the independence assumption usual in differential cryptanalysis. In both instances, it would bear investigation whether other ciphers that rely heavily on addition mod $2^n$ to introduce nonlinearity in GF(2) would also show these properties in differential cryptanalysis, or related properties in other forms of cryptanalysis.

Algebraic attacks

General-purpose equation-solving methods, notably Buchberger's algorithm for computing Groebner bases, are remarkably powerful. Clegg, Edmonds, and Impagliazzo proved, for a comparable problem (namely, finding proofs in propositional logic), that a Groebner-basis computation can quickly solve any problem that can be quickly solved by various ad-hoc proof-finding techniques. Even better, the Groebner-basis computation can quickly solve other problems that cannot be quickly solved by the ad-hoc techniques. It would be interesting to see analogous theorems regarding various ad-hoc cryptanalytic techniques.

Fortunately, there does not seem to exist any "small" set of equations for the state bits in Salsa20. Each of the 320 32-bit additions in the Salsa20 computation requires dozens of quadratic equations, producing a substantially larger system of equations than is required to describe, for example, the bits in AES. Groebner-basis techniques for solving the AES-bit equations are, by the most optimistic estimates, slightly faster than brute-force search for a 256-bit key, but they use vastly more memory and thus have a much worse price-performance ratio. Algebraic attacks against Salsa20 appear to be even more difficult.

Other notions of security

Weak-key attacks: This type of attack seems highly implausible for Salsa20. The Salsa20 key is mangled along with the input in an extremely complicated way. Any key differences rapidly spread through the entire Salsa20 state for the same reason that input differences do.

Equivalent-key attacks: This type of attack, like a weak-key attack, seems highly implausible for Salsa20, as it would violate the Salsa20 security conjecture. In other words, there is no need to make a separate conjecture regarding equivalent keys.

Related-key attacks: The standard solutions to all the standard cryptographic problems (encryption, authentication, etc.) are protocols that do not allow related-key attacks on the underlying primitives. No such attack has been reported to date.

Key recovery attack: At FSE 2008 Aumasson et al. improved the earlier attack on Salsa20/7 and presented the first key-recovery attack on Salsa20/8. It is a differential attack based on a technique called probabilistic neutral bits. The same paper also analyses the Rumba compression function: the authors identify collision and preimage attacks for two simplified variants, discuss differential attacks on the original version, and exploit a high-probability differential to reduce the complexity of collision search from $2^{256}$ to $2^{79}$ for 3-round Rumba.


Alternative Proposals

Extending the Salsa20 nonce

Daniel J. Bernstein, the creator of Salsa20, published another paper, entitled "Extending the Salsa20 nonce", which introduces the XSalsa20 stream cipher. XSalsa20 is based upon the Salsa20 stream cipher but has a much longer nonce: 192 bits instead of 64 bits. XSalsa20 has exactly the same streaming speed as Salsa20, and its extra nonce-setup cost is slightly smaller than the cost of generating one block of Salsa20 output. The paper proves that XSalsa20 is secure if Salsa20 is secure: any fast attack on XSalsa20 using q queries and succeeding with probability p can be converted into a fast attack on Salsa20 succeeding with probability at least p/(q + 1).

The paper introduces a new family of stream ciphers, XSalsa20. XSalsa20 is, at first glance, quite similar to Salsa20: it is built from exactly the same operations, has exactly the same protections against side-channel attacks, has exactly the same streaming speed, supports 256-bit keys, and allows reduced-round variants such as XSalsa20/12. Note that the speed reports above are for full-round Salsa20/20, not Salsa20/12.

The advantage of XSalsa20 over Salsa20 is a longer nonce: 192 bits rather than 64 bits. The disadvantage is that nonce setup is less efficient, but the extra cost here is comparable to, and in fact slightly smaller than, the cost of generating a single Salsa20 output block.

XSalsa20 might at first appear to be an ad-hoc design, following standard principles but potentially vulnerable to new attacks. On the contrary! The paper proves that any fast successful attack on XSalsa20 can be converted into a fast successful attack on Salsa20. Confidence in the security of Salsa20 therefore implies confidence in the security of XSalsa20.

The paper is not meant to take a position in the dispute regarding the necessity of longer nonces. The paper does not claim any benefits for XSalsa20 in an application that already works with Salsa20's 64-bit nonces. What the paper shows is that, in case an application does want longer nonces, the Salsa20 nonce can be safely extended at surprisingly low cost.

References

1. Daniel J. Bernstein, Salsa20 - Design, Specification, Security and Speed. URL: http://www.ecrypt.eu.org/stream/p3ciphers/salsa20/salsa20_p3.zip

2. Paul Crowley, Truncated differential cryptanalysis of five rounds of Salsa20. URL: http://eprint.iacr.org/2005/375

3. Yukiyasu Tsunoo, Teruo Saito, Hiroyasu Kubo, Tomoyasu Suzaki, and Hiroki Nakashima, Differential Cryptanalysis of Salsa20/8. URL: http://sasc.crypto.rub.de/files/sasc2007_039.pdf

4. Jean-Philippe Aumasson, Simon Fischer, Shahram Khazaei, Willi Meier and Christian Rechberger, New Features of Latin Dances: Analysis of Salsa, ChaCha, and Rumba. URL: http://www.springerlink.com/content/j35241j881018085/

5. Simon Fischer, Willi Meier, Côme Berbain, Jean-François Biasse and M. J. B. Robshaw, Non-randomness in eSTREAM Candidates Salsa20 and TSC-4. URL: http://www.springerlink.com/content/46wv58h040218wp4/

6. Daniel J. Bernstein, Extending the Salsa20 nonce. URL: http://cr.yp.to/snuffle/xsalsa-20110204.pdf

7. Robshaw, Matthew; Billet, Olivier (Eds.), New Stream Cipher Designs. URL: http://www.springer.com/computer/security+and+cryptology/book/978-3-540-68350-6