Finite Field Arithmetic Architecture Based on Cellular Array

download Finite Field Arithmetic Architecture Based on Cellular Array

of 8

Transcript of Finite Field Arithmetic Architecture Based on Cellular Array

  • 7/27/2019 Finite Field Arithmetic Architecture Based on Cellular Array

    1/8

    International Journal of Cyber-Security and Digital Forensics (IJCSDF) 1(2): 122-129The Society of Digital Information and Wireless Communications, 2012 (ISSN: 2305-0012)

    *Institute of Media Content, Dankook University

    152, Jukjeon-ro, Suji-gu, Yongin, Gyeonggi-do, 448-701, Korea**Dept. of Computer Engineering, Kumoh National Institute of Technology

    61, Daehak-ro, Gumi, Gyeongbuk, 730-701, Korea

    [email protected]*, [email protected](corresponding author)

    **

    ABSTRACT

    KEYWORDS

    cellular array, finite field, semi-systolicstructure, Montgomery multiplication,

    arithmetic architecture

    1 INTRODUCTION

    Finite field arithmetic operations,

    especially for the binary field GF(2m),

    have been widely used in the areas ofdata communication and network

    security applications such as error-

    correcting codes [1,2] and cryptosystems

    such as ECC(Elliptic Curve

    Cryptosystem) [3,4]. The finite fieldmultiplication is the most frequently

    studied. This is because the time-consuming operations such as

    exponentiation, division, and

    multiplicative inversion can be

    decomposed into repeatedmultiplications. Thus, the fast

    multiplication architecture with low

    complexity is needed to design dedicated

    high-speed circuits.

    Certainly, one of most interesting and

    useful advances in this realm has been

    the Montgomery multiplicationalgorithm, introduced by Montgomery

    [5] for fast modular integer

    multiplication. The multiplication wassuccessfully adapted to finite fieldGF(2

    m) by Koc and Acar [6]. They have

    proposed three Montgomery

    multiplication algorithms for bit-serial,digit-serial, and bit-parallel

    multiplication. They have chosen the

    Montgomery factor, R=xm for efficient

    implementation of the multiplication inhardware and software.

    Wu [7] has chosen a new Montgomeryfactor and shown that choosing the

    middle term of the irreducible trinomial

    G()= m+

    k+1 as the Montgomery

    factor, i.e.,R=xk, results in more efficient

    bit-parallel architectures. In [8], MM is

    implemented using systolic arrays for

    all-one polynomials and trinomials. Chiuet al. [9]proposed semi-systolic array

    structure for MM which uses R=xm.

    Hariri and Reyhani-Masoleh [10]

    proposed a number of bit-serial and bit-parallel Montgomery multipliers and

    showed that MM can accelerate the ECC

    scalar multiplication. Recently, in [11],they have considered concurrent error

    detection for MM over binary field.

    122

    Finite Field Arithmetic Architecture Based on Cellular Array

    Kee-Won Kim*and Jun-Cheol Jeon

    **

    Recently, various finite field arithmeticstructures are introduced for VLSI circuitimplementation on cryptosystems and error

    correcting codes. In this study, we presentan efficient finite field arithmetic

    architecture based on cellular semi-systolicarray for Montgomery multiplication bychoosing a proper Montgomery factor which

    is highly suitable for the design on parallelstructures. Therefore, our architecture has

    reduced a time complexity by 50%compared to typical architecture.

  • 7/27/2019 Finite Field Arithmetic Architecture Based on Cellular Array

    2/8

    International Journal of Cyber-Security and Digital Forensics (IJCSDF) 1(2): 122-129The Society of Digital Information and Wireless Communications, 2012 (ISSN: 2305-0012)

    Three different multipliers, namely the

    bit-serial, digit-serial, and bit-parallel

    multipliers, have been considered andthe concurrent error detection scheme

    has been derived and implemented for

    each of them.

    Chiou [12] used the recomputing with

    shifted operands (RESO) to provide a

    concurrent error detection method forpolynomial basis multipliers using an

    irreducible all-one polynomial, which is

    a special case of a general polynomial.

    Lee et al. [13] described a concurrenterror detection (CED) method for a

    polynomial multiplier with an

    irreducible general polynomial. Chiou etal. [9] also developed a Montgomery

    multiplier with concurrent error

    detection capability. Bayat-Sarmadi and

    Hasan [14] proposed semi-systolicmultipliers for various bases, such as the

    polynomial, dual, type I and type II

    optimal normal bases. They have alsopresented semi-systolic multipliers with

    CED using RESO.

    Recently, Huang et al. [15] proposed thesemi-systolic polynomial basis

    multiplier over GF(2m) to reduce both

    space and time complexities. Also theyproposed the semi-systolic polynomial

    basis multipliers with concurrent error

    detection and correction capability.Various approaches adopt semi-systolic

    architectures to reduce the total number

    of latches and computation latency

    because of permitting the broadcastsignals. However, almost existing

    polynomial multipliers suffer from

    several shortcomings, including large

    time and/or hardware overhead, and lowperformance.

    In this paper, we consider theshortcomings that the typical

    architectures have, and propose a semi-

    systolic Montgomery multiplier with a

    new Montgomery factor. We show thatan efficient multiplication architecture

    can be obtained by choosing a proper

    Montgomery factor, and reduces timecomplexity.

    The remainder of this paper is organized

    as follows. Section 2 introducesMontgomery multiplication over finite

    fields. In Section 3, we propose a

    Montgomery multiplication architecture

    based on our algorithm which is highlyoptimized for hardware implementation.

    In Section 4, we analyze and compare

    our architecture with recent study.Finally, Section 5 gives our conclusion.

    2 MONTGOMERY

    MULTIPLICATION ON FINITE

    FIELDS

    GF(2m) is a kind of finite field [16] that

    contains 2m different elements. This

    finite field is an extension of GF(2) and

    anyA GF(2m) can be represented as a

    polynomial of degree m

    1 over GF(2),such as

    01

    1

    1 axaxaA m

    m ,

    where ai{0,1}, 0 im1.

    Let x be a root of the polynomial, then

    the irreducible polynomial G isrepresented as a following equation.

    01 gxgxgG m

    m , (1)

    wheregiGF(2), 0 im1.

    Let and be two elements of GF(2m),

    then we define = modG. Also, let

    A and B be two Montgomery residues,

    then they are defined as A= R modG

    and B=R modG, where GCD(R,G) =

    123

  • 7/27/2019 Finite Field Arithmetic Architecture Based on Cellular Array

    3/8

    International Journal of Cyber-Security and Digital Forensics (IJCSDF) 1(2): 122-129The Society of Digital Information and Wireless Communications, 2012 (ISSN: 2305-0012)

    1. Then, the Montgomery multiplication

    algorithm over GF(2m) can be

    formulated as

    ,mod1 GRBAP

    whereR1

    is the inverse of Rmodulo G,and RR

    1+GG=1 [17]. Thus, by the

    definition of the Montgomery residue,

    the equation can be expressed as

    follows.

    GR

    GRRRP

    mod

    mod)()( 1

    It means that P is the Montgomery

    residue of . This makes it possible toconvert the operands to Montgomery

    residues once at the beginning, and then,do several consecutive multiplications/

    squarings, and convert the final result to

    the original representation. The finalconversion is a multiplication byR

    1, i.e.,

    = PR1 mod G. The polynomial R

    plays an important role in the complexity

    of the algorithm as we need to domodulo R multiplication and a final

    division byR.

    3 PROPOSED ARCHITECTURE

    This section describes the proposed

    Montgomery multiplication algorithmand architecture.

    3.1 Proposed Algorithm

    Based on the property of parallel

    architecture, we choose the Montgomeryfactor, 2/m

    xR . Then, theMontgomery multiplication over GF(2

    m)

    can be formulated as

    GxBAP m mod2/ (2)

    We know that x is a root of G and mg

    and 0g have always 1 over all

    irreducible polynomials. Thus, theequations can be rewritten as follows.

    1

    mod

    1

    1

    1 xgxg

    Gx

    m

    m

    m

    (3)

    1

    2

    1

    1

    1 mod

    gxgx

    Gx

    m

    m

    m

    (4)

    Meanwhile, (2) is represented by

    substitutingAandBas follows.

    ]

    [

    mod

    12/

    1

    22/

    2

    12/2/

    2/

    0

    12/

    1

    222/112/

    m

    m

    m

    m

    mm

    mm

    mm

    AxbAxb

    AxbAb

    AxbAxb

    AxbAxb

    GP

    (5)

    Now, it expresses that Pcan be dividedinto two parts. One is based on the

    negative powers of x and the other is

    based on the positive powers of x. (5)

    can be denoted byP= C+D, where

    ,mod]

    [

    2/

    0

    12/

    1

    2

    22/

    1

    12/

    GAxbAxb

    AxbAxbC

    mm

    mm

    .mod]

    [

    12/

    1

    22/

    2

    12/2/

    GAxbAxb

    AxbAbD

    m

    m

    m

    m

    mm

    Meanwhile, let )(iA and )(iA be

    GAx i mod and GAxi mod ,

    respectively. Then, based on (3) and (4),the equations can be expressed as

    124

  • 7/27/2019 Finite Field Arithmetic Architecture Based on Cellular Array

    4/8

    International Journal of Cyber-Security and Digital Forensics (IJCSDF) 1(2): 122-129The Society of Digital Information and Wireless Communications, 2012 (ISSN: 2305-0012)

    ,)(

    )(

    mod)

    (

    mod

    1)1(

    0

    2

    1

    )1(

    0

    )1(

    1

    1

    )1(

    0

    )1(

    1

    1)1(

    1

    2)1(

    2

    )1(

    1

    )1(

    0

    1

    )1(1)(

    mim

    m

    ii

    m

    ii

    mi

    m

    mi

    m

    ii

    ii

    xaxgaa

    gaa

    Gxaxa

    xaax

    GAxA

    ,)(

    )(

    mod)

    (

    mod

    1

    1

    )1(

    1

    )1(

    2

    1

    )1(

    1

    )1(

    0

    )1(

    1

    1)1(

    1

    2)1(

    2

    )1(

    1

    )1(

    0

    )1()(

    m

    m

    i

    m

    i

    m

    i

    m

    ii

    m

    mi

    m

    mi

    m

    ii

    ii

    xgaa

    xgaaa

    Gxaxa

    xaax

    GxAA

    where

    ,1,

    20,

    )1(

    0

    1

    )1(

    0

    )1(

    1

    )(

    mja

    mjgaa

    a

    i

    j

    ii

    j

    i

    j

    (6)

    0,

    11,

    )1(

    1

    )1(

    1

    )1(

    1

    )(

    ja

    mjgaa

    a

    i

    m

    j

    i

    m

    i

    j

    i

    j

    (7)

    Also, using the formulae of )(iA and)(i

    A , the terms Cand D are represented

    as follows.

    )2/(0

    )12/(

    1

    )2(

    22/

    )1(

    12/

    )0(

    1

    12/

    2

    22/

    12/

    1

    2/

    0

    ]

    [

    mod

    mm

    mm

    mm

    mm

    AbAb

    AbAbAz

    AxbAxb

    AxbAxb

    GC

    (8)

    ,

    ]

    [

    mod

    )12/(

    1

    )22/(

    2

    )1(

    12/

    )0(

    2/

    12/

    1

    22/

    2

    12/2/

    m

    m

    m

    m

    mm

    m

    m

    m

    m

    mm

    AbAb

    AbAb

    AxbAxb

    AxbAb

    GD

    (9)

    where 0z .

    The coefficients of C and D are

    produced by summing the corresponding

    coefficients of each term in (8) and (9),

    respectively. It means that cjand dj, for 0jm1 are represented as

    )2/(0

    )12/(

    1

    )2(

    22/

    )1(

    12/

    )0(

    m

    j

    m

    j

    jmjmjj

    abab

    ababazc

    Algorithm 1. COM_C(A,B,G)

    Input:

    ),,,,( 0121 aaaaA mm ,

    ),,,,(' 0122/12/ bbbbB mm ,

    ),,,,( 0121 ggggG mm

    Output:

    GAxbAxb

    AxbAxbC

    mm

    mm

    mod]

    [

    1

    12/

    2

    22/

    12/

    1

    2/

    0

    ;)0( jj aa ;0)0( jc 0z ;

    for 1i to 12/ m dofor 0j to 1m in parallel do

    if (j = 0) then /* 0j */)1(0

    )(

    1

    ii

    m aa ;

    )1(

    012/

    )1(

    0

    )(

    0

    iimii abcc

    (or )0(0)(

    0

    )1(

    0 azcc i if 0i );

    else /* 1,2,,2,1 mmj */

    jm

    ii

    jm

    i

    jm gaaa

    )1(

    0

    )1()(

    1 ;

    )1(

    12/

    )1()(

    i

    jmim

    i

    jm

    i

    jm abcc

    (or )1()1()(

    i

    jm

    i

    jm

    i

    jm azcc if

    0

    i );end ifend for

    end for

    return C

    125

  • 7/27/2019 Finite Field Arithmetic Architecture Based on Cellular Array

    5/8

    International Journal of Cyber-Security and Digital Forensics (IJCSDF) 1(2): 122-129The Society of Digital Information and Wireless Communications, 2012 (ISSN: 2305-0012)

    .)12/(

    1

    )22/(

    2

    )1(

    12/

    )0(

    2/

    m

    jm

    m

    jm

    jmjmj

    abab

    ababd

    Now, we obtain the following recurrence

    equations from the above equations.

    ,12/1,

    1,

    )1(

    12/

    )1(

    )1()1(

    )(

    miabc

    iazc

    c

    i

    jim

    i

    j

    i

    j

    i

    j

    i

    j

    where 0)0( jc for 10 mj and

    0z , and

    )1(

    12/

    )1()(

    ijimi

    j

    i

    j abdd 2/1, mi

    where 0)0( jd for 10 mj .

    Algorithm 2. COM_D(A,B,G)

    Input:

    ),,,,( 0121 aaaaA mm ,

    ),,,,(" 1212/2/ mmmm bbbbB ,

    ),,,,( 0121 ggggG mm

    Output:

    GAxbAxb

    AxbAbD

    m

    m

    m

    m

    mm

    mod]

    [

    12/

    1

    22/

    2

    12/2/

    ;)0( jj aa ;0)0( jd

    for 1i to 2/m dofor 0j to 1m in parallel do

    if (j=0) then /* 0j */)1(

    1

    )(

    0

    i

    m

    i aa ;

    )1(

    112/

    )1(

    1

    )(

    1

    i

    mim

    i

    m

    i

    m abdd ;

    else /* 1,2,,2,1 mmj */

    j

    i

    m

    i

    j

    i

    j gaaa )1(

    1

    )1(

    1

    )(

    ;

    )1(

    112/)1(

    1)(1

    ijim

    ij

    ij abdd ;

    end if

    end for

    end forreturn D

    As shown in Algorithm 1 and 2, the

    parallel computational algorithms for C

    andDare driven by the above equations.The proposed COM_C(A,B,G) and

    COM_D(A,B,G) algorithms can be

    executed simultaneously since there isno data dependency between computingCandD.

    3.2 Proposed Multiplier

    Based on the proposed algorithms, the

    hardware architecture of the proposed

    semi-systolic Montgomery multiplier isshown in Figure 1. The upper, lower,

    middle part of the array computes C, D,

    and C+D, respectively. Our architectureis composed of 12/ m

    )(

    0U i cells,

    )12/()2( mm )(U ij cells, 2/m

    )(

    0V i cells, 2/)2( mm

    )(V ij cells,

    and one S cell.

    1mb

    2mb

    12/ imb

    12/ mb

    2/mb

    2c1mc jmc 1c0c

    1ma

    0a

    1md1jd 0d0p1pjp

    2mp1mp

    S

    1ma 1mg 2g0 1a0 2a 1gjmg jma 0 0 0

    1mg 2ma 2mg 1jajg3ma 1g 0a

    )0(z

    12/ mb

    12/ imb

    0b

    1b

    (1)

    1-Um(1)

    2-Um(1)

    U j(1)

    1U(1)

    0U

    )2(

    1U )2(

    2U m)2(

    Uj(2)

    1-Um(2)

    0U

    )(

    0U i )(

    2Ui

    m

    )(

    Ui

    j

    )(

    1Ui

    m

    )(

    1Ui

    )/2(0

    U m )/2(

    1-U

    m

    m )/2(U m

    j )/2(

    2U m

    m )/2(

    1U m

    1)/2(0U m 1)/2(

    1-U m

    m 1)/2(U m

    j 1)/2(

    2U m

    m 1)/2(

    1U m

    2md 3md

    0000 0

    (1)

    0V(1)

    1V(1)

    Vj(1)

    1V m(1)

    2V m

    (2)

    0V(2)

    1V(2)

    Vj(2)

    1V m(2)

    2V m

    )(

    0V i)(

    1Vi)(V

    i

    j)(

    1Vi

    m)(

    2Vi

    m

    12/1V

    m 12/V m

    j 12/1V

    m

    m 12/

    0V m 12/

    2V

    m

    m

    2/1V m 2/V

    m

    j 2/1V

    m

    m 2/

    0V m 2/

    2V m

    m

    Figure 1. The proposed semi-systolic

    Montgomery multiplier over GF(2m)

    126

  • 7/27/2019 Finite Field Arithmetic Architecture Based on Cellular Array

    6/8

    International Journal of Cyber-Security and Digital Forensics (IJCSDF) 1(2): 122-129The Society of Digital Information and Wireless Communications, 2012 (ISSN: 2305-0012)

    The detailed circuits of the cells in

    Figure 1 are depicted in Figure 2 thru

    Figure 4, and , , and D denote XORgate, AND gate, and one-bit latch(flip-flop), respectively.

    )1(

    0

    i

    c

    D D

    )(

    0

    ic )( 1i

    ma

    12/ imb

    )1(0

    ia

    (a)

    )(0U

    i

    jmg )1(

    i

    jmc)1(

    i

    jma

    )1(

    0

    ia

    D D D

    )(i

    jmc )(

    1

    i

    jma

    12/ imb

    (b)

    )(

    U

    i

    j

    Figure 2.Circuit configuration of)(

    0U i and

    )(U ij cell

    The latency of the proposed semi-

    systolic multiplier requires m/2+1clock cycles. Each clock cycle takes thedelay of one 2-input AND gate, one 2-

    input XOR gate, and one 1-bit latch. The

    space complexity of this multiplierrequires 2m2+m1 2-input AND gates,

    2m2+2m1 2-input XOR gates, and

    3m2+2m1(for odd m) or 3m

    2+3m1(for

    even m) 1-bit latches.

    Note that )(U ij ( )(

    0U i ) and )(V ij (

    )(

    0V i )

    cells in Figure 2 and 3 are functionally

    equivalent cells and the computations

    can be executed in parallel, and the

    computed results are added in S cell. InFigure 4, D

    *denotes one bit latch when

    mis even, otherwise it is ignored.

    )1(

    1

    i

    md

    )1(

    1

    i

    ma

    )(

    1

    i

    md )(

    0

    i

    a

    12/ imb

    DD

    (a)

    )(

    0V i

    12/ imb

    )1(

    1

    i

    jd )1(

    1

    i

    ja

    )1(

    1

    i

    ma

    )(

    1

    i

    jd )(i

    ja

    D D D

    jg

    (b)

    )(V ij

    Figure 3. Circuit configuration of)(

    0V i and

    )(V ij cell

    4 COMPLEXITY ANALYSIS

    In CMOS VLSI technology, each gate is

    composed of several transistors [18]. We

    adopt that AAND2 = 6, AXOR2 = 6, and

    ALATCH1 = 8, where AGATEn denotestransistor count of an n-input gate,

    respectively. Also, for a furthercomparison of time complexity, we

    adopt the practical integrated circuits in

    [19] and the following assumptions, as

    discussed in detail in [15], are made:TAND2= 7, TXOR2= 12, and TLATCH1= 13,

    127

  • 7/27/2019 Finite Field Arithmetic Architecture Based on Cellular Array

    7/8

    International Journal of Cyber-Security and Digital Forensics (IJCSDF) 1(2): 122-129The Society of Digital Information and Wireless Communications, 2012 (ISSN: 2305-0012)

    where TGATEn denotes the propagation

    delay of an i-input gate, respectively.

    2mc1mc jc 1c0c

    1d 1mdjd 0d

    0p

    1p

    jp

    2mp

    1mp

    2md

    D*

    D*

    D*

    D*

    D*

    Figure 4. Circuit configuration of S cell

    Table 1. Comparison of semi-systolic

    polynomial basis architectures

    gate/delayIn

    [15]

    Fig. 1

    even m/odd m

    Number of

    cells2

    m

    U

    : 12/ m U :

    )12/()2( mm

    V : 2/m

    V : 2/)2( mm

    S :1

    2-input AND 22m 12 2 mm

    2-input XOR 22m 122 2 mm

    3-input XOR 0 0

    one-bit latch 23m 133

    2 mm /

    123 2 mm

    Total

    transistorcount

    2

    48m

    204248 2 mm /

    203448 2 mm

    Cell delay(ns) 32 32

    Latency m 15.0 m / 5.05.0 m

    Total

    delay(ns)m32 3216 m / 1616 m

    A circuit comparison between theproposed multiplier and the related

    multiplier is given in Table 1. Althoughthe proposed multiplier has nearly thesame space complexity compared toHuang et al.[15], the time complexity isapproximately reduced by 50%.

    5 CONCLUSION

    In this paper, we propose a cellular semi-

    systolic architecture for Montgomery

    multiplication over finite fields. We

    choose a novel Montgomery factorwhich is highly suitable for the design of

    parallel structures. We also divided our

    architecture into three parts, and

    computed two parts of them in parallel

    so that we reduced the time complexityby nearly 50% compared to the recent

    study in spite of maintaining similarspace complexity. We expect that our

    architecture can be efficiently used for

    various applications, which demandhigh-speed computation, based on

    arithmetic operations.

    6 ACKNOWLEDGMENT

    This research was supported by BasicScience Research Program through the

    National Research Foundation of

    Korea(NRF) funded by the Ministry of

    Education, Science and Technology(2011-0014977).

    7 REFERENCES

    1. W. W. Peterson, and E. J. Weldon, Error-Correcting Codes, MIT Press, Cambridge

    (1972).

    2. R. E. Blahut. Theory and Practice of ErrorControl Codes, Addison-Wesley, Reading

    (1983).

    3. W. Diffie and M. E. Hellman, Newdirections in cryptography, IEEE

    Transactions on Information Theory, vol.

    22, no. 6, pp. 644-654 (1976).

    4. B. Schneier, Applied Cryptography, JohnWiley & Sons press, 2nd edition (1996).

    128

  • 7/27/2019 Finite Field Arithmetic Architecture Based on Cellular Array

    8/8

    International Journal of Cyber-Security and Digital Forensics (IJCSDF) 1(2): 122-129The Society of Digital Information and Wireless Communications, 2012 (ISSN: 2305-0012)

    5. P. Montgomery, Modular Multiplicationwithout Trial Division, Mathematics of

    Computation, vol. 44, no. 170, pp. 519521

    (1985).

    6. C. Koc and T. Acar, MontgomeryMultiplication in GF(2k), Designs, Codes

    and Cryptography, vol. 14, no. 1, pp. 5769

    (1998).

    7. H. Wu, Montgomery Multiplier andSquarer for a Class of Finite Fields, IEEE

    Trans. Computers, vol. 51, no. 5, pp. 521-

    529 (2002).

    8. C. Y. Lee, J. S. Horng, I. C. Jou and E. H.Lu, Low-Complexity Bit-Parallel Systolic

    Montgomery Multipliers for Special Classes

    of GF(2m), IEEE Transactions on

    Computers, vol. 54, no. 9, pp. 10611070

    (2005).

    9. C. W. Chiou, C. Y. Lee, A. W. Deng and J.M. Lin, Concurrent Error Detection in

    Montgomery Multiplication over GF(2m),IEICE Trans. Fundamentals of Electronics,

    Communications and Computer Sciences,

    vol. E89-A, no. 2, pp. 566-574, 2006.

    10. A. Hariri and A. Reyhani-Masoleh, Bit-Serial and Bit-Parallel Montgomery

    Multiplication and Squaring over GF(2m),

    IEEE Trans. Computers, vol. 58, no. 10, pp.

    1332-1345 (2009).

    11. A. Hariri and A. Reyhani-Masoleh,Concurrent Error Detection in Montgomery

    Multiplication over Binary Extension

    Fields,IEEE Trans. Computers, vol. 60, no.

    9, pp. 1341-1353 (2011).12. C. W. Chiou, Concurrent Error Detection

    in Array Multipliers for GF(2m) Fields, IEE

    Electronics Letters, vol. 38, no. 14, pp. 688

    689 (2002).

    13. C. Y. Lee, C. W. Chiou, and J. M. Lin,Concurrent Error Detection in a

    Polynomial Basis Multiplier over GF(2m),

    J. Electronic Testing: Theory and

    Applications, vol. 22, no. 2, pp. 143-150

    (2006).

    14. S. Bayat-Sarmadi and M.A. Hasan,Concurrent Error Detection in Finite Field

    Arithmetic Operations Using Pipelined andSystolic Architectures, IEEE Trans.

    Computers, vol. 58, no. 11, pp. 1553-1567

    (2009).

    15. W. T. Huang, C. H. Chang, C. W. Chiouand F. H. Chou, Concurrent error detection

    and correction in a polynomial basis

    multiplier over GF(2m), IET Information

    Security, vol. 4, no. 3, pp. 111-124 (2010).

    16. R. Lidl and H. Niederreiter, Introduction toFinite Fields and Their Applications,

    Cambridge Univ. Press (1986).

    17. J. C. Jeon and K. Y. Yoo, Montgomeryexponent architecture based on

    programmable cellular automata,

    Mathematics and Computers in Simulation,

    vol. 79, pp. 1189-1196 (2008).

    18.N. Weste, K. Eshraghian, Principles ofCMOS VLSI design: a system perspective,

    Addison-Wesley, Reading, MA (1985).

    19. STMicroelectronics, Available athttp://www.st.com/

    129