Finite Field Arithmetic Architecture Based on Cellular Array

7/27/2019 Finite Field Arithmetic Architecture Based on Cellular Array

1/8

International Journal of Cyber-Security and Digital Forensics (IJCSDF) 1(2): 122-129The Society of Digital Information and Wireless Communications, 2012 (ISSN: 2305-0012)

*Institute of Media Content, Dankook University

152, Jukjeon-ro, Suji-gu, Yongin, Gyeonggi-do, 448-701, Korea**Dept. of Computer Engineering, Kumoh National Institute of Technology

61, Daehak-ro, Gumi, Gyeongbuk, 730-701, Korea

[email protected]*, [email protected](corresponding author)

**

ABSTRACT

KEYWORDS

cellular array, finite field, semi-systolicstructure, Montgomery multiplication,

arithmetic architecture

1 INTRODUCTION

Finite field arithmetic operations,

especially for the binary field GF(2m),

have been widely used in the areas ofdata communication and network

security applications such as error-

correcting codes [1,2] and cryptosystems

such as ECC(Elliptic Curve

Cryptosystem) [3,4]. The finite fieldmultiplication is the most frequently

studied. This is because the time-consuming operations such as

exponentiation, division, and

multiplicative inversion can be

decomposed into repeatedmultiplications. Thus, the fast

multiplication architecture with low

complexity is needed to design dedicated

high-speed circuits.

Certainly, one of most interesting and

useful advances in this realm has been

the Montgomery multiplicationalgorithm, introduced by Montgomery

[5] for fast modular integer

multiplication. The multiplication wassuccessfully adapted to finite fieldGF(2

m) by Koc and Acar [6]. They have

proposed three Montgomery

multiplication algorithms for bit-serial,digit-serial, and bit-parallel

multiplication. They have chosen the

Montgomery factor, R=xm for efficient

implementation of the multiplication inhardware and software.

Wu [7] has chosen a new Montgomeryfactor and shown that choosing the

middle term of the irreducible trinomial

G()= m+

k+1 as the Montgomery

factor, i.e.,R=xk, results in more efficient

bit-parallel architectures. In [8], MM is

implemented using systolic arrays for

all-one polynomials and trinomials. Chiuet al. [9]proposed semi-systolic array

structure for MM which uses R=xm.

Hariri and Reyhani-Masoleh [10]

proposed a number of bit-serial and bit-parallel Montgomery multipliers and

showed that MM can accelerate the ECC

scalar multiplication. Recently, in [11],they have considered concurrent error

detection for MM over binary field.

122

Finite Field Arithmetic Architecture Based on Cellular Array

Kee-Won Kim*and Jun-Cheol Jeon

**

Recently, various finite field arithmeticstructures are introduced for VLSI circuitimplementation on cryptosystems and error

correcting codes. In this study, we presentan efficient finite field arithmetic

architecture based on cellular semi-systolicarray for Montgomery multiplication bychoosing a proper Montgomery factor which

is highly suitable for the design on parallelstructures. Therefore, our architecture has

reduced a time complexity by 50%compared to typical architecture.


2/8


Three different multipliers, namely the

bit-serial, digit-serial, and bit-parallel

multipliers, have been considered andthe concurrent error detection scheme

has been derived and implemented for

each of them.

Chiou [12] used the recomputing with

shifted operands (RESO) to provide a

concurrent error detection method forpolynomial basis multipliers using an

irreducible all-one polynomial, which is

a special case of a general polynomial.

Lee et al. [13] described a concurrenterror detection (CED) method for a

polynomial multiplier with an

irreducible general polynomial. Chiou etal. [9] also developed a Montgomery

multiplier with concurrent error

detection capability. Bayat-Sarmadi and

Hasan [14] proposed semi-systolicmultipliers for various bases, such as the

polynomial, dual, type I and type II

optimal normal bases. They have alsopresented semi-systolic multipliers with

CED using RESO.

Recently, Huang et al. [15] proposed thesemi-systolic polynomial basis

multiplier over GF(2m) to reduce both

space and time complexities. Also theyproposed the semi-systolic polynomial

basis multipliers with concurrent error

detection and correction capability.Various approaches adopt semi-systolic

architectures to reduce the total number

of latches and computation latency

because of permitting the broadcastsignals. However, almost existing

polynomial multipliers suffer from

several shortcomings, including large

time and/or hardware overhead, and lowperformance.

In this paper, we consider theshortcomings that the typical

architectures have, and propose a semi-

systolic Montgomery multiplier with a

new Montgomery factor. We show thatan efficient multiplication architecture

can be obtained by choosing a proper

Montgomery factor, and reduces timecomplexity.

The remainder of this paper is organized

as follows. Section 2 introducesMontgomery multiplication over finite

fields. In Section 3, we propose a

Montgomery multiplication architecture

based on our algorithm which is highlyoptimized for hardware implementation.

In Section 4, we analyze and compare

our architecture with recent study.Finally, Section 5 gives our conclusion.

2 MONTGOMERY

MULTIPLICATION ON FINITE

FIELDS

GF(2m) is a kind of finite field [16] that

contains 2m different elements. This

finite field is an extension of GF(2) and

anyA GF(2m) can be represented as a

polynomial of degree m

1 over GF(2),such as

01

1

1 axaxaA m

m ,

where ai{0,1}, 0 im1.

Let x be a root of the polynomial, then

the irreducible polynomial G isrepresented as a following equation.

01 gxgxgG m

m , (1)

wheregiGF(2), 0 im1.

Let and be two elements of GF(2m),

then we define = modG. Also, let

A and B be two Montgomery residues,

then they are defined as A= R modG

and B=R modG, where GCD(R,G) =

123


3/8


1. Then, the Montgomery multiplication

algorithm over GF(2m) can be

formulated as

,mod1 GRBAP

whereR1

is the inverse of Rmodulo G,and RR

1+GG=1 [17]. Thus, by the

definition of the Montgomery residue,

the equation can be expressed as

follows.

GR

GRRRP

mod

mod)()( 1

It means that P is the Montgomery

residue of . This makes it possible toconvert the operands to Montgomery

residues once at the beginning, and then,do several consecutive multiplications/

squarings, and convert the final result to

the original representation. The finalconversion is a multiplication byR

1, i.e.,

= PR1 mod G. The polynomial R

plays an important role in the complexity

of the algorithm as we need to domodulo R multiplication and a final

division byR.

3 PROPOSED ARCHITECTURE

This section describes the proposed

Montgomery multiplication algorithmand architecture.

3.1 Proposed Algorithm

Based on the property of parallel

architecture, we choose the Montgomeryfactor, 2/m

xR . Then, theMontgomery multiplication over GF(2

m)

can be formulated as

GxBAP m mod2/ (2)

We know that x is a root of G and mg

and 0g have always 1 over all

irreducible polynomials. Thus, theequations can be rewritten as follows.

1

mod

1

1

1 xgxg

Gx

m

m

m

(3)

1

2

1

1

1 mod

gxgx

Gx

m

m

m

(4)

Meanwhile, (2) is represented by

substitutingAandBas follows.

]

[

mod

12/

1

22/

2

12/2/

2/

0

12/

1

222/112/

m

m

m

m

mm

mm

mm

AxbAxb

AxbAb

AxbAxb

AxbAxb

GP

(5)

Now, it expresses that Pcan be dividedinto two parts. One is based on the

negative powers of x and the other is

based on the positive powers of x. (5)

can be denoted byP= C+D, where

,mod]

[

2/

0

12/

1

2

22/

1

12/

GAxbAxb

AxbAxbC

mm

mm

.mod]

[

12/

1

22/

2

12/2/

GAxbAxb

AxbAbD

m

m

m

m

mm

Meanwhile, let )(iA and )(iA be

GAx i mod and GAxi mod ,

respectively. Then, based on (3) and (4),the equations can be expressed as

124


4/8


,)(

)(

mod)

(

mod

1)1(

0

2

1

)1(

0

)1(

1

1

)1(

0

)1(

1

1)1(

1

2)1(

2

)1(

1

)1(

0

1

)1(1)(

mim

m

ii

m

ii

mi

m

mi

m

ii

ii

xaxgaa

gaa

Gxaxa

xaax

GAxA

,)(

)(

mod)

(

mod

1

1

)1(

1

)1(

2

1

)1(

1

)1(

0

)1(

1

1)1(

1

2)1(

2

)1(

1

)1(

0

)1()(

m

m

i

m

i

m

i

m

ii

m

mi

m

mi

m

ii

ii

xgaa

xgaaa

Gxaxa

xaax

GxAA

where

,1,

20,

)1(

0

1

)1(

0

)1(

1

)(

mja

mjgaa

a

i

j

ii

j

i

j

(6)

0,

11,

)1(

1

)1(

1

)1(

1

)(

ja

mjgaa

a

i

m

j

i

m

i

j

i

j

(7)

Also, using the formulae of )(iA and)(i

A , the terms Cand D are represented

as follows.

)2/(0

)12/(

1

)2(

22/

)1(

12/

)0(

1

12/

2

22/

12/

1

2/

0

]

[

mod

mm

mm

mm

mm

AbAb

AbAbAz

AxbAxb

AxbAxb

GC

(8)

,

]

[

mod

)12/(

1

)22/(

2

)1(

12/

)0(

2/

12/

1

22/

2

12/2/

m

m

m

m

mm

m

m

m

m

mm

AbAb

AbAb

AxbAxb

AxbAb

GD

(9)

where 0z .

The coefficients of C and D are

produced by summing the corresponding

coefficients of each term in (8) and (9),

respectively. It means that cjand dj, for 0jm1 are represented as

)2/(0

)12/(

1

)2(

22/

)1(

12/

)0(

m

j

m

j

jmjmjj

abab

ababazc

Algorithm 1. COM_C(A,B,G)

Input:

),,,,( 0121 aaaaA mm ,

),,,,(' 0122/12/ bbbbB mm ,

),,,,( 0121 ggggG mm

Output:

GAxbAxb

AxbAxbC

mm

mm

mod]

[

1

12/

2

22/

12/

1

2/

0

;)0( jj aa ;0)0( jc 0z ;

for 1i to 12/ m dofor 0j to 1m in parallel do

if (j = 0) then /* 0j */)1(0

)(

1

ii

m aa ;

)1(

012/

)1(

0

)(

0

iimii abcc

(or )0(0)(

0

)1(

0 azcc i if 0i );

else /* 1,2,,2,1 mmj */

jm

ii

jm

i

jm gaaa

)1(

0

)1()(

1 ;

)1(

12/

)1()(

i

jmim

i

jm

i

jm abcc

(or )1()1()(

i

jm

i

jm

i

jm azcc if

0

i );end ifend for

end for

return C

125


5/8


.)12/(

1

)22/(

2

)1(

12/

)0(

2/

m

jm

m

jm

jmjmj

abab

ababd

Now, we obtain the following recurrence

equations from the above equations.

,12/1,

1,

)1(

12/

)1(

)1()1(

)(

miabc

iazc

c

i

jim

i

j

i

j

i

j

i

j

where 0)0( jc for 10 mj and

0z , and

)1(

12/

)1()(

ijimi

j

i

j abdd 2/1, mi

where 0)0( jd for 10 mj .

Algorithm 2. COM_D(A,B,G)

Input:

),,,,( 0121 aaaaA mm ,

),,,,(" 1212/2/ mmmm bbbbB ,

),,,,( 0121 ggggG mm

Output:

GAxbAxb

AxbAbD

m

m

m

m

mm

mod]

[

12/

1

22/

2

12/2/

;)0( jj aa ;0)0( jd

for 1i to 2/m dofor 0j to 1m in parallel do

if (j=0) then /* 0j */)1(

1

)(

0

i

m

i aa ;

)1(

112/

)1(

1

)(

1

i

mim

i

m

i

m abdd ;

else /* 1,2,,2,1 mmj */

j

i

m

i

j

i

j gaaa )1(

1

)1(

1

)(

;

)1(

112/)1(

1)(1

ijim

ij

ij abdd ;

end if

end for

end forreturn D

As shown in Algorithm 1 and 2, the

parallel computational algorithms for C

andDare driven by the above equations.The proposed COM_C(A,B,G) and

COM_D(A,B,G) algorithms can be

executed simultaneously since there isno data dependency between computingCandD.

3.2 Proposed Multiplier

Based on the proposed algorithms, the

hardware architecture of the proposed

semi-systolic Montgomery multiplier isshown in Figure 1. The upper, lower,

middle part of the array computes C, D,

and C+D, respectively. Our architectureis composed of 12/ m

)(

0U i cells,

)12/()2( mm )(U ij cells, 2/m

)(

0V i cells, 2/)2( mm

)(V ij cells,

and one S cell.

1mb

2mb

12/ imb

12/ mb

2/mb

2c1mc jmc 1c0c

1ma

0a

1md1jd 0d0p1pjp

2mp1mp

S

1ma 1mg 2g0 1a0 2a 1gjmg jma 0 0 0

1mg 2ma 2mg 1jajg3ma 1g 0a

)0(z

12/ mb

12/ imb

0b

1b

(1)

1-Um(1)

2-Um(1)

U j(1)

1U(1)

0U

)2(

1U )2(

2U m)2(

Uj(2)

1-Um(2)

0U

)(

0U i )(

2Ui

m

)(

Ui

j

)(

1Ui

m

)(

1Ui

)/2(0

U m )/2(

1-U

m

m )/2(U m

j )/2(

2U m

m )/2(

1U m

1)/2(0U m 1)/2(

1-U m

m 1)/2(U m

j 1)/2(

2U m

m 1)/2(

1U m

2md 3md

0000 0

(1)

0V(1)

1V(1)

Vj(1)

1V m(1)

2V m

(2)

0V(2)

1V(2)

Vj(2)

1V m(2)

2V m

)(

0V i)(

1Vi)(V

i

j)(

1Vi

m)(

2Vi

m

12/1V

m 12/V m

j 12/1V

m

m 12/

0V m 12/

2V

m

m

2/1V m 2/V

m

j 2/1V

m

m 2/

0V m 2/

2V m

m

Figure 1. The proposed semi-systolic

Montgomery multiplier over GF(2m)

126


6/8


The detailed circuits of the cells in

Figure 1 are depicted in Figure 2 thru

Figure 4, and , , and D denote XORgate, AND gate, and one-bit latch(flip-flop), respectively.

)1(

0

i

c

D D

)(

0

ic )( 1i

ma

12/ imb

)1(0

ia

(a)

)(0U

i

jmg )1(

i

jmc)1(

i

jma

)1(

0

ia

D D D

)(i

jmc )(

1

i

jma

12/ imb

(b)

)(

U

i

j

Figure 2.Circuit configuration of)(

0U i and

)(U ij cell

The latency of the proposed semi-

systolic multiplier requires m/2+1clock cycles. Each clock cycle takes thedelay of one 2-input AND gate, one 2-

input XOR gate, and one 1-bit latch. The

space complexity of this multiplierrequires 2m2+m1 2-input AND gates,

2m2+2m1 2-input XOR gates, and

3m2+2m1(for odd m) or 3m

2+3m1(for

even m) 1-bit latches.

Note that )(U ij ( )(

0U i ) and )(V ij (

)(

0V i )

cells in Figure 2 and 3 are functionally

equivalent cells and the computations

can be executed in parallel, and the

computed results are added in S cell. InFigure 4, D

*denotes one bit latch when

mis even, otherwise it is ignored.

)1(

1

i

md

)1(

1

i

ma

)(

1

i

md )(

0

i

a

12/ imb

DD

(a)

)(

0V i

12/ imb

)1(

1

i

jd )1(

1

i

ja

)1(

1

i

ma

)(

1

i

jd )(i

ja

D D D

jg

(b)

)(V ij

Figure 3. Circuit configuration of)(

0V i and

)(V ij cell

4 COMPLEXITY ANALYSIS

In CMOS VLSI technology, each gate is

composed of several transistors [18]. We

adopt that AAND2 = 6, AXOR2 = 6, and

ALATCH1 = 8, where AGATEn denotestransistor count of an n-input gate,

respectively. Also, for a furthercomparison of time complexity, we

adopt the practical integrated circuits in

[19] and the following assumptions, as

discussed in detail in [15], are made:TAND2= 7, TXOR2= 12, and TLATCH1= 13,

127


7/8


where TGATEn denotes the propagation

delay of an i-input gate, respectively.

2mc1mc jc 1c0c

1d 1mdjd 0d

0p

1p

jp

2mp

1mp

2md

D*

D*

D*

D*

D*

Figure 4. Circuit configuration of S cell

Table 1. Comparison of semi-systolic

polynomial basis architectures

gate/delayIn

[15]

Fig. 1

even m/odd m

Number of

cells2

m

U

: 12/ m U :

)12/()2( mm

V : 2/m

V : 2/)2( mm

S :1

2-input AND 22m 12 2 mm

2-input XOR 22m 122 2 mm

3-input XOR 0 0

one-bit latch 23m 133

2 mm /

123 2 mm

Total

transistorcount

2

48m

204248 2 mm /

203448 2 mm

Cell delay(ns) 32 32

Latency m 15.0 m / 5.05.0 m

Total

delay(ns)m32 3216 m / 1616 m

A circuit comparison between theproposed multiplier and the related

multiplier is given in Table 1. Althoughthe proposed multiplier has nearly thesame space complexity compared toHuang et al.[15], the time complexity isapproximately reduced by 50%.

5 CONCLUSION

In this paper, we propose a cellular semi-

systolic architecture for Montgomery

multiplication over finite fields. We

choose a novel Montgomery factorwhich is highly suitable for the design of

parallel structures. We also divided our

architecture into three parts, and

computed two parts of them in parallel

so that we reduced the time complexityby nearly 50% compared to the recent

study in spite of maintaining similarspace complexity. We expect that our

architecture can be efficiently used for

various applications, which demandhigh-speed computation, based on

arithmetic operations.

6 ACKNOWLEDGMENT

This research was supported by BasicScience Research Program through the

National Research Foundation of

Korea(NRF) funded by the Ministry of

Education, Science and Technology(2011-0014977).

7 REFERENCES

1. W. W. Peterson, and E. J. Weldon, Error-Correcting Codes, MIT Press, Cambridge

(1972).

2. R. E. Blahut. Theory and Practice of ErrorControl Codes, Addison-Wesley, Reading

(1983).

3. W. Diffie and M. E. Hellman, Newdirections in cryptography, IEEE

Transactions on Information Theory, vol.

22, no. 6, pp. 644-654 (1976).

4. B. Schneier, Applied Cryptography, JohnWiley & Sons press, 2nd edition (1996).

128


8/8


5. P. Montgomery, Modular Multiplicationwithout Trial Division, Mathematics of

Computation, vol. 44, no. 170, pp. 519521

(1985).

6. C. Koc and T. Acar, MontgomeryMultiplication in GF(2k), Designs, Codes

and Cryptography, vol. 14, no. 1, pp. 5769

(1998).

7. H. Wu, Montgomery Multiplier andSquarer for a Class of Finite Fields, IEEE

Trans. Computers, vol. 51, no. 5, pp. 521-

529 (2002).

8. C. Y. Lee, J. S. Horng, I. C. Jou and E. H.Lu, Low-Complexity Bit-Parallel Systolic

Montgomery Multipliers for Special Classes

of GF(2m), IEEE Transactions on

Computers, vol. 54, no. 9, pp. 10611070

(2005).

9. C. W. Chiou, C. Y. Lee, A. W. Deng and J.M. Lin, Concurrent Error Detection in

Montgomery Multiplication over GF(2m),IEICE Trans. Fundamentals of Electronics,

Communications and Computer Sciences,

vol. E89-A, no. 2, pp. 566-574, 2006.

10. A. Hariri and A. Reyhani-Masoleh, Bit-Serial and Bit-Parallel Montgomery

Multiplication and Squaring over GF(2m),

IEEE Trans. Computers, vol. 58, no. 10, pp.

1332-1345 (2009).

11. A. Hariri and A. Reyhani-Masoleh,Concurrent Error Detection in Montgomery

Multiplication over Binary Extension

Fields,IEEE Trans. Computers, vol. 60, no.

9, pp. 1341-1353 (2011).12. C. W. Chiou, Concurrent Error Detection

in Array Multipliers for GF(2m) Fields, IEE

Electronics Letters, vol. 38, no. 14, pp. 688

689 (2002).

13. C. Y. Lee, C. W. Chiou, and J. M. Lin,Concurrent Error Detection in a

Polynomial Basis Multiplier over GF(2m),

J. Electronic Testing: Theory and

Applications, vol. 22, no. 2, pp. 143-150

(2006).

14. S. Bayat-Sarmadi and M.A. Hasan,Concurrent Error Detection in Finite Field

Arithmetic Operations Using Pipelined andSystolic Architectures, IEEE Trans.

Computers, vol. 58, no. 11, pp. 1553-1567

(2009).

15. W. T. Huang, C. H. Chang, C. W. Chiouand F. H. Chou, Concurrent error detection

and correction in a polynomial basis

multiplier over GF(2m), IET Information

Security, vol. 4, no. 3, pp. 111-124 (2010).

16. R. Lidl and H. Niederreiter, Introduction toFinite Fields and Their Applications,

Cambridge Univ. Press (1986).

17. J. C. Jeon and K. Y. Yoo, Montgomeryexponent architecture based on

programmable cellular automata,

Mathematics and Computers in Simulation,

vol. 79, pp. 1189-1196 (2008).

18.N. Weste, K. Eshraghian, Principles ofCMOS VLSI design: a system perspective,

Addison-Wesley, Reading, MA (1985).

19. STMicroelectronics, Available athttp://www.st.com/

129

Finite Field Arithmetic Architecture Based on Cellular Array

Documents

Transcript of Finite Field Arithmetic Architecture Based on Cellular Array