[ACM Press the 4th international conference - Sydney, Australia (2011.11.14-2011.11.19)] Proceedings...



Algebraic Analysis of GOST Encryption Algorithm 

Ludmila Babenko
College of Information Security, Taganrog Institute of Technology, Southern Federal University
ul. Chekhova, 2, 347928, Taganrog, Russia
Phone: +7 8634 371905
[email protected]

Evgeniya Ishchukova
College of Information Security, Taganrog Institute of Technology, Southern Federal University
ul. Chekhova, 2, 347928, Taganrog, Russia
Phone: +7 8634 371905
[email protected]

Ekaterina Maro
College of Information Security, Taganrog Institute of Technology, Southern Federal University
ul. Chekhova, 2, 347928, Taganrog, Russia
Phone: +7 8634 371905
[email protected]

ABSTRACT

This paper investigates the resistance of the GOST algorithm to algebraic cryptanalysis. GOST is a state standard of the Russian Federation. Its characteristic features are variable S-blocks and simple mathematical operations. It is commonly assumed that any choice of S-block values provides enough strength to resist attacks. The general idea of algebraic analysis is to represent the encryption algorithm as a system of multivariate quadratic equations that relate the secret key to the ciphertext. The extended linearization (XL) method is evaluated as a method for solving this nonlinear system.

The most challenging problem of the analysis is caused by addition modulo $2^n$ in GOST. To simplify the analysis, we consider a simplified scheme in which addition modulo $2^n$ is replaced by addition modulo 2 (denoted GOST⊕). We propose an analysis algorithm for GOST based on experimental data.

The research has shown that 32-round GOST is described by a system of 5376 quadratic equations, which characterize the dependencies between the inputs and outputs of the S-blocks. The total number of variables is 2048, and the system contains 9472 monomials. Generating the system for a single-round GOST takes about 14 hours (on an AMD Athlon X2 Dual-Core 3800+ processor with 1 GB RAM).

Categories and Subject Descriptors

F.2.0 Theory of Computation, Analysis of Algorithms and Problem Complexity, General.

General Terms Algorithms, Security.

Keywords

GOST, GOST⊕, S-box, systems of multivariate quadratic equations, algebraic cryptanalysis, extended linearization method, Gaussian elimination.

1. INTRODUCTION

The first assumption on the possibility of algebraic attacks was made by Claude Shannon in [1]; he argued that breaking a good cipher should require “as much work as solving a system of simultaneous equations in a large number of unknowns”. Based on Shannon’s work one can conclude that cryptographic attacks on many symmetric and asymmetric encryption algorithms reduce to solving a system of nonlinear equations over a finite field [2-4]. This problem (also known as the MQ problem) is in turn NP-complete, and its complexity grows exponentially with the number of unknown variables.

One of the possible methods of solving MQ problems is the Gröbner basis method, which is based on the algorithm proposed in 1965 in [5]. Modifications of this method are also widely used in practice, e.g. the F4 [6] and F5 [7] algorithms proposed by Jean-Charles Faugère. In 1999, Aviad Kipnis and Adi Shamir proposed the relinearization method [8]. The authors proposed to reduce the MQ problem to a linear representation, which can be solved with known efficient algorithms, and to obtain extra equations using existing relations between variables. The extended linearization (XL) method was proposed by Nicolas Courtois, Alexander Klimov, Jacques Patarin, and Adi Shamir as an extension of the relinearization method. The method also relies on the transformation to a linear representation, but the extra equations are formed by multiplying all the equations by monomials of a predefined degree. In paper [4], which is devoted to an algebraic attack on AES, it was found that the nonlinear system is sparse. Another method, called extended sparse linearization (XSL), was proposed in order to exploit this property. Unlike the XL method, which multiplies the equations by all monomials of a certain fixed degree, XSL uses only specially selected monomials. For instance, the monomials that already appear in other equations are recommended. In this case, although the number of equations increases, the number of unique products remains the same.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIN’11, November 14-19, 2011, Sydney, Australia. Copyright 2011 ACM 978-1-4503-1020-8/11/11.

The GOST algorithm has been the national symmetric encryption standard of the Russian Federation for over twenty years. During that time, no attack that compromises its cryptographic strength has been found. Before proceeding, we outline some results on GOST analysis. The research described in [9] is devoted to the resistance of GOST to differential cryptanalysis; this method is inapplicable if sufficiently “strong” S-blocks are used. The authors of [10] consider the applicability of the slide attack to GOST. The reflection attack against 30-round GOST, which exploits the use of the same round function for both encryption and decryption, is described in [11]. Finding a key required $2^{224}$ encryption operations, provided that $2^{32}$ plaintext/ciphertext pairs are known. The reflection MITM attack [10] improved on the reflection attack so that $2^{225}$ encryption operations were sufficient for the full GOST given the same number of $2^{32}$ text pairs. All of these attacks demand an unreasonably large amount of known data and extremely complex computations. We believe that algebraic attacks of lower complexity can be found.

The aim of this research is to investigate the possibility of applying algebraic attacks against GOST encryption. Since the key mixing in a single round involves both the S-block input and the plaintext, we start for simplicity by evaluating the GOST⊕ algorithm described in [10]. GOST⊕ was chosen because in it addition modulo $2^{32}$ is replaced by the simpler addition modulo 2. This substitution makes it possible to map secret key bits to known data. To move on to algebraic attacks against GOST itself, one can use the description of addition modulo $2^{32}$ given in [12]. We use the XL method for solving nonlinear equation systems.

The paper is structured as follows. Section 2 contains a brief description of GOST. Basic principles of algebraic cryptanalysis are summarized in Section 3. Section 4 contains basic approaches to the analysis of GOST⊕ as well as an analysis algorithm for the authentic GOST.

2. GOST OVERVIEW

The GOST 28147-89 encryption algorithm is the state standard of the Russian Federation, and its use is mandatory for Russian state organizations. GOST is a symmetric block cipher built on the Feistel scheme. It takes 64-bit blocks of data as input and converts them into 64-bit blocks of encrypted data under a 256-bit key. In each round the right half of the block is processed by the function F, which applies three cryptographic operations: addition of the data and the subkey modulo $2^{32}$, substitution of the data using S-boxes, and a left cyclic shift by 11 positions. The output of the F-function is added modulo 2 to the left half of the block, and then the right and left halves are swapped for the next round. The algorithm has 32 rounds. In the last round of encryption the right and left halves are not swapped. The overall dataflow diagram of GOST is shown in Fig. 1.
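The round structure described above can be sketched in Python as follows. This is a minimal illustration, not the standard's reference code: the helper names (`rotl32`, `substitute`, `round_function`, `feistel_round`) are hypothetical, the nibble ordering is an assumption, and the single S-box from Table 2 is reused for all eight positions only to keep the example short.

```python
MASK32 = 0xFFFFFFFF

# Assumption: the exemplary S-block of Table 2 is reused for all eight
# positions; a real configuration has eight separate 4-bit tables.
SBOX = [4, 10, 9, 2, 13, 8, 0, 14, 6, 11, 1, 12, 7, 15, 5, 3]


def rotl32(value, shift):
    """Rotate a 32-bit word left by `shift` positions."""
    shift %= 32
    return ((value << shift) | (value >> (32 - shift))) & MASK32


def substitute(word):
    """Apply the 4-bit S-box to each of the eight nibbles of a 32-bit word."""
    out = 0
    for nibble in range(8):
        x = (word >> (4 * nibble)) & 0xF
        out |= SBOX[x] << (4 * nibble)
    return out


def round_function(right, subkey, xor_variant=False):
    """F-function: key mixing, S-box layer, cyclic left shift by 11."""
    if xor_variant:
        mixed = right ^ subkey             # simplified scheme: addition modulo 2
    else:
        mixed = (right + subkey) & MASK32  # authentic GOST: addition modulo 2^32
    return rotl32(substitute(mixed), 11)


def feistel_round(left, right, subkey, xor_variant=False):
    """One round: F(right) is XORed into the left half, then the halves swap."""
    return right, left ^ round_function(right, subkey, xor_variant)
```

Because the round is a Feistel step, it can be undone without inverting F: the old left half equals the new right half XORed with `round_function` applied to the new left half.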

GOST uses 8 S-boxes, which convert 4-bit input to 4-bit output. Unlike most encryption algorithms, GOST has no predefined S-boxes and any values can be used for them.

The secret key contains 256 bits and is represented as a sequence of eight 32-bit words: K1, K2, K3, K4, K5, K6, K7 and K8. In each round of encryption one of these 32-bit words is used as the round subkey. The subkeys are selected as follows: from round 1 to round 24 the order is direct (K1, K2, K3, K4, K5, K6, K7, K8, K1, K2, etc.); from round 25 to round 32 the order is reversed (K8, K7, K6, K5, K4, K3, K2, K1). Thus the same subkey K1 is used in both the first and the last rounds. The only distinction between GOST⊕ and GOST is that the former uses addition modulo 2 instead of addition modulo $2^{32}$.
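The subkey order described above can be written down directly. The following sketch uses hypothetical helper names, and the way the 256-bit key is split into words (K1 from the least significant 32 bits) is an assumption, since the text does not fix a word order.

```python
def subkey_order():
    """Return the 1-based subkey index used in each of the 32 rounds."""
    forward = list(range(1, 9))         # K1..K8
    return forward * 3 + forward[::-1]  # rounds 1-24 direct, rounds 25-32 reversed


def split_key(key_256):
    """Split a 256-bit integer into eight 32-bit words K1..K8.

    Assumption: K1 is taken from the least significant 32 bits.
    """
    return [(key_256 >> (32 * i)) & 0xFFFFFFFF for i in range(8)]


def round_subkeys(key_256):
    """List of the 32 round subkeys in the order they are applied."""
    words = split_key(key_256)
    return [words[i - 1] for i in subkey_order()]
```

Every key word is used exactly four times, so the full cipher involves only 256 unknown key bits regardless of the number of rounds.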

Fig. 1 – GOST Dataflow

3. ALGEBRAIC ANALYSIS

Algebraic attacks are based on finding a secret key from a system of quadratic equations that contains known data. It is usually assumed that the analyst knows a set of plaintexts with the corresponding ciphertexts. For block ciphers, S-blocks serve as the source of equations because substitution is the only non-linear operation. One has to create a set of equations for an S-block that characterize its input/output relations and hold with probability 1 for all possible input values.

Assume an $s \times s$-bit S-block. Denote the number of all compositions (monomials) of input and output bits as $t$. Since the truth table of these monomials has only $2^s$ rows, at least $t - 2^s$ linearly independent equations exist that hold with probability 1.


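For instance, a $3 \times 3$-bit S-block is characterized by equations of the form (1):

$$\sum_{1 \le i < j \le 3} \alpha_{ij} x_i x_j \oplus \sum_{1 \le i < j \le 3} \beta_{ij} y_i y_j \oplus \sum_{i,j=1}^{3} \gamma_{ij} x_i y_j \oplus \sum_{i=1}^{3} \delta_i x_i \oplus \sum_{i=1}^{3} \varepsilon_i y_i \oplus \eta = 0 \qquad (1)$$

where $\alpha_{ij}$, $\beta_{ij}$, $\gamma_{ij}$, $\delta_i$, $\varepsilon_i$, $\eta$ are binary coefficients;

$x_i$ and $y_i$ are the input and output bits of the S-block, respectively.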

The number of possible monomials of degree 2 or less is calculated according to formula (2):

$$t = \binom{2s}{2} + 2s + 1 \qquad (2)$$

i.e. for the $3 \times 3$-bit S-block we obtain $t = \binom{6}{2} + 6 + 1 = 22$.

Therefore, it is possible to find $r \ge t - 2^3 = 14$ linearly independent equations. If the system also contains cubic terms, the number of linearly independent equations reaches $r \ge 26$.
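Formula (2) and the bound on the number of equations are easy to check numerically. The short sketch below uses hypothetical function names and evaluates both for the 3-bit example above and for a 4-bit S-block of the size GOST uses.

```python
from math import comb


def monomial_count(s):
    """Number of monomials of degree <= 2 in the 2s input/output bits, formula (2)."""
    return comb(2 * s, 2) + 2 * s + 1


def min_independent_equations(s):
    """Lower bound t - 2^s on the number of linearly independent equations."""
    return monomial_count(s) - 2 ** s


for s in (3, 4):
    print(s, monomial_count(s), min_independent_equations(s))
```

For s = 4 this gives 37 monomials and at least 21 equations per S-block, the figures used in Section 4.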

Algebraic attacks include two stages:

1. Creating a non-linear system of equations.
2. Solving the MQ problem.

In the first stage, at most $2^t$ equations can be formed. However, only some of them hold for the given S-block. To check which ones do, we build a truth table that contains the value of each monomial for all possible S-block inputs (see Table 1).

Only linearly independent equations should be selected. To achieve this, one can add the equations to each other modulo 2 (Gaussian elimination over GF(2)), which eliminates the linearly dependent ones. The properties of the resulting system influence the choice of method for the second stage.
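One way to carry out the first stage without enumerating all $2^t$ candidate equations is to compute the null space of the monomial truth table over GF(2): every vector in that null space is a coefficient vector of an equation that holds for all inputs. The sketch below is an illustration under that assumption, not the procedure the experiments used; it takes the exemplary S-block of Table 2, and the function names are hypothetical.

```python
from itertools import combinations

SBOX = [4, 10, 9, 2, 13, 8, 0, 14, 6, 11, 1, 12, 7, 15, 5, 3]  # Table 2
S = 4  # S-block width in bits


def monomial_row(x):
    """Values of all monomials of degree <= 2 for S-block input x."""
    y = SBOX[x]
    bits = [(x >> i) & 1 for i in range(S)] + [(y >> i) & 1 for i in range(S)]
    row = [1] + bits                                  # constant and linear terms
    row += [a & b for a, b in combinations(bits, 2)]  # quadratic terms
    return row


def null_space_gf2(rows, ncols):
    """Basis of coefficient vectors c with (row . c) = 0 mod 2 for every row."""
    rows = [r[:] for r in rows]
    pivots = []  # pivot column of each reduced row
    rank = 0
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[rank])]
        pivots.append(col)
        rank += 1
    basis = []
    for free in (c for c in range(ncols) if c not in pivots):
        vec = [0] * ncols
        vec[free] = 1
        for r, col in enumerate(pivots):
            vec[col] = rows[r][free]  # pivot variable determined by the free one
        basis.append(vec)
    return basis


table = [monomial_row(x) for x in range(2 ** S)]
equations = null_space_gf2(table, len(table[0]))
print(len(table[0]), "monomials,", len(equations), "independent equations")
```

The truth table has 16 rows and 37 columns, so its rank is at most 16 and the basis contains at least 21 equations.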

This research aims to investigate the possibility of using the extended linearization method presented in [13]. The XL algorithm is described as follows:

1. Find the maximal degree of the monomials by which the equations are multiplied.
2. Multiply the equations by the monomials.
3. Combine the original system and the additional equations.
4. Substitute all non-linear terms with new variables (linearization).
5. Solve the linear system (Gauss method).

The original system contains $r$ equations, $2s$ variables, and $t$ monomials. Denote the maximal degree of monomials in the system as $d$; the equations are then multiplied by monomials of degree $(d-2)$.

Table 1. Truth table for an s-bit S-block

Rows: all possible S-block inputs (from 0 to $2^s - 1$)

| S-block input | S-block output | All compositions of S-block inputs and outputs |
| $x_s$ … $x_1$ | $y_s$ … $y_1$ | $x_s x_{s-1}$ … $x_2 x_1$ ; $y_s y_{s-1}$ … $y_2 y_1$ ; $x_s y_s$ … $x_1 y_1$ |
| 0 … 0 | 1 … 1 | |
| … | … | |
| 1 … 1 | 0 … 1 | |

After multiplication we obtain $r' = r \cdot \binom{2s}{d-2}$ extra equations, while the number of monomials reaches $t' = \binom{2s}{d}$. The system of equations can be solved with linearization only if the number of linearly independent equations is not smaller than the number of monomials, i.e. inequality (3) must hold.

$$r' \ge t' \qquad (3)$$

From (3) we obtain the degree of the monomials (4).

$$d \ge \frac{2s}{\sqrt{r}} \qquad (4)$$
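The step from (3) to (4) can be spelled out as follows. This is a sketch of the usual XL estimate, assuming that $d$ is small compared with $2s$, so that the exact binomial ratio can be replaced by its leading term.

```latex
\frac{t'}{r'}
  = \frac{\binom{2s}{d}}{r\,\binom{2s}{d-2}}
  = \frac{(2s-d+2)(2s-d+1)}{r\,d\,(d-1)} \le 1
\;\Longrightarrow\;
r \ge \frac{(2s-d+2)(2s-d+1)}{d\,(d-1)} \approx \frac{(2s)^2}{d^2}
\;\Longrightarrow\;
d \gtrsim \frac{2s}{\sqrt{r}} .
```

For $s = 4$, $r = 21$ and $d = 3$ the exact ratio is $(7 \cdot 6)/(3 \cdot 2) = 7 \le 21$, so condition (3) holds with margin.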

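A further requirement is that $d$ be greater than 2; otherwise, no new equations can be found. We obtain the system (5), which takes all the conditions into account.

$$\begin{cases} d \ge \dfrac{2s}{\sqrt{r}}, & \text{if } \dfrac{2s}{\sqrt{r}} > 2, \\[2mm] d = 3, & \text{if } \dfrac{2s}{\sqrt{r}} \le 2 \end{cases} \qquad (5)$$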
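The choice of the XL parameters can be sketched numerically. The function names below are hypothetical, and taking the smallest integer that satisfies the first case of (5) is an assumption of this sketch; the text only gives a lower bound for $d$.

```python
from math import ceil, comb, sqrt


def xl_degree(s, r):
    """Degree d chosen according to system (5)."""
    bound = 2 * s / sqrt(r)
    # Assumption: when the bound exceeds 2, take the smallest admissible integer.
    return 3 if bound <= 2 else ceil(bound)


def xl_extra_equations(s, r, d):
    """Number of products of the r equations with monomials of degree d - 2."""
    return r * comb(2 * s, d - 2)


s, r = 4, 21                           # 4-bit S-block with 21 quadratic equations
d = xl_degree(s, r)
extra = xl_extra_equations(s, r, d)
print(d, extra, r + extra)
```

With 21 equations for a 4-bit S-block the bound $2s/\sqrt{r}$ is below 2, so the second case of (5) applies and the equations are multiplied by the eight first-degree monomials.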


4. ALGEBRAIC CRYPTANALYSIS OF GOST⊕ WITH EXTENDED LINEARIZATION

Consider a single round of GOST⊕. One has to generate a system of equations for 8 S-blocks that work simultaneously. Consider obtaining the equations for a single block, which is defined in Table 2.

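First of all, we generate all the equations described by (6). The total number of these equations is $2^{37} = 137438953472$; the number of monomials is 37; and the number of variables is 8.

$$\sum_{1 \le i < j \le 4} \alpha_{ij} x_i x_j \oplus \sum_{1 \le i < j \le 4} \beta_{ij} y_i y_j \oplus \sum_{i,j=1}^{4} \gamma_{ij} x_i y_j \oplus \sum_{i=1}^{4} \delta_i x_i \oplus \sum_{i=1}^{4} \varepsilon_i y_i \oplus \eta = 0 \qquad (6)$$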

Then, select the equations that correspond to the predefined S-block as presented in Fig. 2.

Among all the equations, only 2097151 proved correct for the defined S-block. This number equals $2^{21} - 1$, i.e. all non-zero combinations of $37 - 2^4 = 21$ linearly independent equations, which can be extracted from them. Suppose that the minimal number of equations (21) is obtained. Consider applying the extended linearization method to the system. First of all, the parameter $d$ should be found. Since $2s/\sqrt{r} \le 2$, we take $d = 3$. Hence, the equations are multiplied by first-degree monomials: {x1, x2, x3, x4, y1, y2, y3, y4}. Therefore $21 \cdot 8 = 168$ extra equations are added. The target system will contain 189 equations and 75 monomials, which are treated as new variables after linearization.

Thus, a system of $8 \cdot 189 = 1512$ equations that describes the dependencies between S-block inputs and outputs should be generated for a single round. The number of variables is 64, while the number of monomials is $75 \cdot 8 = 600$. Even if some equations prove linearly dependent on the others, the number of remaining equations is sufficient for the linearization method.

After the system has been generated, we have to pass from the S-block inputs and outputs to the round keys. To achieve this, we define the $i$th bit of the S-block input as the sum modulo 2 of the right part of the plaintext and the round key, as presented in equation (7).

$$x_i = k_i \oplus tR_i \qquad (7)$$

where $k_i$ is the $i$th bit of the round key, and $tR_i$ is the $i$th bit of the right part of the plaintext.

Table 2. Exemplary S-block

x 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

y=S(x) 4 10 9 2 13 8 0 14 6 11 1 12 7 15 5 3

Fig. 2 – Truth table

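From the structure of GOST⊕, the output of an S-block is related to the ciphertext by (8).

$$cL_i = (y_i \ll 11) \oplus tL_i \qquad (8)$$

where $y_i$ is the $i$th bit of the S-block output, $tL_i$ is the $i$th bit of the left part of the plaintext, $cL_i$ is the $i$th bit of the left part of the ciphertext, and $\ll$, $\gg$ denote cyclic shifts.

The output bit of the S-block is then defined by formula (9).

$$y_i = (cL_i \gg 11) \oplus (tL_i \gg 11) \qquad (9)$$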

Therefore, when considering a single-round system, the number of variables in the non-linear system can be halved because S-block outputs are defined unambiguously.
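For a single round of the XOR variant, relations (7)-(9) already determine the round key from one known plaintext/ciphertext pair. The sketch below illustrates this under several assumptions: the Table 2 S-block is reused for all eight positions, the halves are not swapped after the round (as in (8)), and all function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
SBOX = [4, 10, 9, 2, 13, 8, 0, 14, 6, 11, 1, 12, 7, 15, 5, 3]  # Table 2
INV_SBOX = [SBOX.index(v) for v in range(16)]                  # Table 2 is a permutation


def rotl32(value, shift):
    """Rotate a 32-bit word left by `shift` positions."""
    shift %= 32
    return ((value << shift) | (value >> (32 - shift))) & MASK32


def rotr32(value, shift):
    """Rotate a 32-bit word right by `shift` positions."""
    return rotl32(value, 32 - shift % 32)


def sub_layer(word, table):
    """Apply a 4-bit table to each of the eight nibbles of a 32-bit word."""
    return sum(table[(word >> (4 * i)) & 0xF] << (4 * i) for i in range(8))


def encrypt_one_round(t_left, t_right, key):
    """Single round of the XOR variant, halves not swapped."""
    y = sub_layer(t_right ^ key, SBOX)      # (7): x = k XOR tR, then y = S(x)
    return rotl32(y, 11) ^ t_left, t_right  # (8): cL = (y <<< 11) XOR tL


def recover_round_key(t_left, t_right, c_left):
    """Invert (8) and (7) to obtain the 32-bit round key."""
    y = rotr32(c_left ^ t_left, 11)         # (9)
    x = sub_layer(y, INV_SBOX)              # S-block input from its output
    return x ^ t_right                      # (7) solved for k
```

This direct inversion works only because the S-block output of a single round is visible through the ciphertext; in the multi-round case the intermediate outputs are unknown, which is why the equation system is needed.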

If we turn to the full GOST, the system of quadratic equations before XL linearization comprises $21 \cdot 8 \cdot 32 = 5376$ equations, $32 \cdot 64 = 2048$ variables, and $37 \cdot 8 \cdot 32 = 9472$ monomials. After multiplying it by first-degree monomials we obtain a system of 48384 cubic equations and 19200 monomials. The attack on the first round is similar to the single-round attack, except that the output $y_{1,i}$ remains unknown. Based on the Feistel scheme, the inputs of the second-round S-blocks are found as (10).

$$x_{2,i} = k_{2,i} \oplus tL_i \oplus (y_{1,i} \ll 11) \qquad (10)$$

The output $y_{2,i}$ remains unknown as well. Relations between inputs and outputs of subsequent S-blocks are characterized by (11).

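$$x_{n,i} = \begin{cases} k_{n,i} \oplus tL_i \oplus \bigoplus\limits_{j=1}^{n-1} (y_{j,i} \ll 11), & \text{if } n \text{ is even}, \\[2mm] k_{n,i} \oplus tR_i \oplus \bigoplus\limits_{j=2}^{n-1} (y_{j,i} \ll 11), & \text{if } n \text{ is odd} \end{cases} \qquad (11)$$

In each case the index $j$ advances in steps of 2 (odd $j$ for even $n$, even $j$ for odd $n$), as follows from the Feistel structure.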
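Relation (11) can be checked experimentally at the level of 32-bit words. The sketch below runs the XOR variant, records the S-layer inputs and outputs of every round, and compares each recorded input with the value predicted from the plaintext, the round key and the earlier outputs. It assumes that the XOR in (11) runs over every second round (odd j for even n, even j for odd n), which is what the Feistel structure yields; the Table 2 S-block is reused for all positions and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
SBOX = [4, 10, 9, 2, 13, 8, 0, 14, 6, 11, 1, 12, 7, 15, 5, 3]  # Table 2


def rotl32(value, shift):
    """Rotate a 32-bit word left by `shift` positions."""
    shift %= 32
    return ((value << shift) | (value >> (32 - shift))) & MASK32


def sub_layer(word):
    """Apply the S-box to each of the eight nibbles of a 32-bit word."""
    return sum(SBOX[(word >> (4 * i)) & 0xF] << (4 * i) for i in range(8))


def trace_rounds(t_left, t_right, subkeys):
    """Run the XOR variant and record S-layer inputs x_n and outputs y_n."""
    xs, ys = [], []
    left, right = t_left, t_right
    for key in subkeys:
        x = right ^ key
        y = sub_layer(x)
        xs.append(x)
        ys.append(y)
        left, right = right, left ^ rotl32(y, 11)
    return xs, ys


def predicted_input(n, t_left, t_right, subkeys, ys):
    """S-layer input of round n (1-based) predicted from relation (11)."""
    acc = subkeys[n - 1] ^ (t_left if n % 2 == 0 else t_right)
    start = 1 if n % 2 == 0 else 2
    for j in range(start, n, 2):  # every second round up to n - 1
        acc ^= rotl32(ys[j - 1], 11)
    return acc
```

For n = 2 the prediction reduces to relation (10), and for n = 1 to relation (7).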

Substitution for the last round of GOST⊕ can be done according to (12) and (13).

$$x_{32,i} = cR_i \oplus k_{32,i} \qquad (12)$$

$$y_{32,i} = (cR_i \gg 11) \oplus (tR_i \gg 11) \oplus \bigoplus_{j=1}^{31} (y_{j,i} \ll 11) \qquad (13)$$

In order to reduce the number of unknown variables in the system, one can use features of a round key calculation algorithm. The unknown parameters of the system in this case will comprise 8 round keys, i.e. 8·32=256 unknown bits and yi (32·32=1024 unknown bits).
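The size figures quoted in this section follow from per-S-block counts multiplied by 8 S-blocks and 32 rounds. The short sketch below reproduces that arithmetic; the constant and function names are hypothetical, and the per-S-block values (21, 37, 189, 75) are taken from the text.

```python
S_BLOCKS, ROUNDS = 8, 32
VARS_PER_SBLOCK = 8                          # 4 input bits + 4 output bits
EQ_PER_SBLOCK, MONO_PER_SBLOCK = 21, 37      # quadratic system, before XL
XL_EQ_PER_SBLOCK, XL_MONO_PER_SBLOCK = 189, 75


def system_size(equations, monomials):
    """Scale per-S-block counts to the full 32-round cipher."""
    scale = S_BLOCKS * ROUNDS
    return equations * scale, monomials * scale


quadratic = system_size(EQ_PER_SBLOCK, MONO_PER_SBLOCK)
after_xl = system_size(XL_EQ_PER_SBLOCK, XL_MONO_PER_SBLOCK)
variables = VARS_PER_SBLOCK * S_BLOCKS * ROUNDS
key_bits = 8 * 32                            # eight 32-bit round keys
output_bits = 32 * 32                        # S-layer outputs of all 32 rounds
print(quadratic, after_xl, variables, key_bits + output_bits)
```

Expressing the S-block inputs through the key and the earlier outputs reduces the 2048 variables to 256 key bits plus 1024 output bits.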

5. CONCLUSION

The research considers the applicability of the algebraic attack against GOST algorithms. This paper is devoted particularly to the simplified version of GOST known as GOST⊕. As a result, we obtained a system of 5376 quadratic equations and 9472 monomials that describes the full GOST encryption. Naïve application of linearization did not allow us to find a suitable solution, but the XL method helped to effectively increase the number of linearly independent equations. The XL method increased the number of equations to 48384 and the number of monomials to 19200. The obtained system was solved by substituting new variables for all the quadratic and cubic terms and then applying the Gauss method.

The presented calculations suggest that further research may uncover additional features of GOST. The need for a substantially large number of known plaintexts and corresponding ciphertexts is also an important feature of the presented attack. To decrease the number of variables, one can use the round-key schedule. We plan to investigate the efficiency of this approach later.

6. REFERENCES

[1] Shannon C.E. Communication theory of secret systems. Bell System Technical Journal 28, 704 (1949)

[2] Nicolas Courtois, Gregory V. Bard: Algebraic Cryptanalysis of the Data Encryption Standard, In 11-th IMA Conference, Cirencester, UK, 18-20 December 2007, Springer LNCS 4887.

[3] Patarin J. Hidden Fields Equations (HFE) and Isomorphisms of Polynomials (IP): two new families of Asymmetric Algorithms; in Eurocrypt’96, Springer Verlag, pp. 33-48.

[4] Nicolas Courtois and Josef Pieprzyk, Cryptanalysis of Block Ciphers with Overdefined Systems of Equations In Yuliang Zheng, editor, ASIACRYPT 2002, volume 2501 of Lecture Notes in Computer Science, pages 267–287. Springer, 2002.

[5] Bruno Buchberger. Ein Algorithmus zum Auffinden der Basiselemente des Restklassenringes nach einem nulldimensionalen Polynomideal. PhD thesis, 1965.

[6] Jean-Charles Faugère, A new efficient algorithm for computing Gröbner bases (F4), Journal of Pure and Applied Algebra 139 (1999) pp. 61-88.

[7] Jean-Charles Faugère, A new efficient algorithm for computing Gröbner basis without reduction to 0 F5, In T. Mora, editor, Proceeding of ISSAC, pages 75-83, ACM Press, July 2002.

[8] A.Kipnis, A. Shamir. Cryptanalysis of the HFE Public Key Cryptosystem by Relinearization. Crypto99, LNCS 142,144. Springer-Verlag, pp.19-31.

[9] L. Babenko, E. Ishchukova, Differential Analysis GOST Encryption Algorithm // Proceedings of the 3rd International Conference of Security of Information and Networks (SIN 2010), p.149-157. ACM, New York, 2010.

[10] A. Biryukov and D. Wagner. Advanced Slide Attacks. In Proc. EUROCRYPT 2000, LNCS 1807, pp.589-606, Springer, 2000.

[11] Orhun Kara. Reflection Attacks on Product Ciphers. Cryptology ePrint Archive, Report 2007/043, 2007. http://eprint.iacr.org/

[12] Nicolas Courtois and Blandine Debraize: Algebraic Description and Simultaneous Linear Approximations of Addition in Snow 2.0., In ICICS 2008, 10th International Conference on Information and Communications Security, 20 - 22 October, 2008, Birmingham, UK. In LNCS 5308, pp. 328-344, Springer, 2008.

[13] N. Courtois, A. Klimov, J. Patarin, A. Shamir. Efficient Algorithms for solving Overdefined System of Multivariate Polynomial Equations. Eurocrypt'2000, LNCS 1807. Springer-Verlag, pp. 392-407.
