Finite Field Arithmetic Architecture Based on Cellular Array
Transcript of Finite Field Arithmetic Architecture Based on Cellular Array
-
7/27/2019 Finite Field Arithmetic Architecture Based on Cellular Array
1/8
International Journal of Cyber-Security and Digital Forensics (IJCSDF) 1(2): 122-129The Society of Digital Information and Wireless Communications, 2012 (ISSN: 2305-0012)
*Institute of Media Content, Dankook University
152, Jukjeon-ro, Suji-gu, Yongin, Gyeonggi-do, 448-701, Korea**Dept. of Computer Engineering, Kumoh National Institute of Technology
61, Daehak-ro, Gumi, Gyeongbuk, 730-701, Korea
[email protected]*, [email protected](corresponding author)
**
ABSTRACT
KEYWORDS
cellular array, finite field, semi-systolicstructure, Montgomery multiplication,
arithmetic architecture
1 INTRODUCTION
Finite field arithmetic operations,
especially for the binary field GF(2m),
have been widely used in the areas ofdata communication and network
security applications such as error-
correcting codes [1,2] and cryptosystems
such as ECC(Elliptic Curve
Cryptosystem) [3,4]. The finite fieldmultiplication is the most frequently
studied. This is because the time-consuming operations such as
exponentiation, division, and
multiplicative inversion can be
decomposed into repeatedmultiplications. Thus, the fast
multiplication architecture with low
complexity is needed to design dedicated
high-speed circuits.
Certainly, one of most interesting and
useful advances in this realm has been
the Montgomery multiplicationalgorithm, introduced by Montgomery
[5] for fast modular integer
multiplication. The multiplication wassuccessfully adapted to finite fieldGF(2
m) by Koc and Acar [6]. They have
proposed three Montgomery
multiplication algorithms for bit-serial,digit-serial, and bit-parallel
multiplication. They have chosen the
Montgomery factor, R=xm for efficient
implementation of the multiplication inhardware and software.
Wu [7] has chosen a new Montgomeryfactor and shown that choosing the
middle term of the irreducible trinomial
G()= m+
k+1 as the Montgomery
factor, i.e.,R=xk, results in more efficient
bit-parallel architectures. In [8], MM is
implemented using systolic arrays for
all-one polynomials and trinomials. Chiuet al. [9]proposed semi-systolic array
structure for MM which uses R=xm.
Hariri and Reyhani-Masoleh [10]
proposed a number of bit-serial and bit-parallel Montgomery multipliers and
showed that MM can accelerate the ECC
scalar multiplication. Recently, in [11],they have considered concurrent error
detection for MM over binary field.
122
Finite Field Arithmetic Architecture Based on Cellular Array
Kee-Won Kim*and Jun-Cheol Jeon
**
Recently, various finite field arithmeticstructures are introduced for VLSI circuitimplementation on cryptosystems and error
correcting codes. In this study, we presentan efficient finite field arithmetic
architecture based on cellular semi-systolicarray for Montgomery multiplication bychoosing a proper Montgomery factor which
is highly suitable for the design on parallelstructures. Therefore, our architecture has
reduced a time complexity by 50%compared to typical architecture.
-
7/27/2019 Finite Field Arithmetic Architecture Based on Cellular Array
2/8
International Journal of Cyber-Security and Digital Forensics (IJCSDF) 1(2): 122-129The Society of Digital Information and Wireless Communications, 2012 (ISSN: 2305-0012)
Three different multipliers, namely the
bit-serial, digit-serial, and bit-parallel
multipliers, have been considered andthe concurrent error detection scheme
has been derived and implemented for
each of them.
Chiou [12] used the recomputing with
shifted operands (RESO) to provide a
concurrent error detection method forpolynomial basis multipliers using an
irreducible all-one polynomial, which is
a special case of a general polynomial.
Lee et al. [13] described a concurrenterror detection (CED) method for a
polynomial multiplier with an
irreducible general polynomial. Chiou etal. [9] also developed a Montgomery
multiplier with concurrent error
detection capability. Bayat-Sarmadi and
Hasan [14] proposed semi-systolicmultipliers for various bases, such as the
polynomial, dual, type I and type II
optimal normal bases. They have alsopresented semi-systolic multipliers with
CED using RESO.
Recently, Huang et al. [15] proposed thesemi-systolic polynomial basis
multiplier over GF(2m) to reduce both
space and time complexities. Also theyproposed the semi-systolic polynomial
basis multipliers with concurrent error
detection and correction capability.Various approaches adopt semi-systolic
architectures to reduce the total number
of latches and computation latency
because of permitting the broadcastsignals. However, almost existing
polynomial multipliers suffer from
several shortcomings, including large
time and/or hardware overhead, and lowperformance.
In this paper, we consider theshortcomings that the typical
architectures have, and propose a semi-
systolic Montgomery multiplier with a
new Montgomery factor. We show thatan efficient multiplication architecture
can be obtained by choosing a proper
Montgomery factor, and reduces timecomplexity.
The remainder of this paper is organized
as follows. Section 2 introducesMontgomery multiplication over finite
fields. In Section 3, we propose a
Montgomery multiplication architecture
based on our algorithm which is highlyoptimized for hardware implementation.
In Section 4, we analyze and compare
our architecture with recent study.Finally, Section 5 gives our conclusion.
2 MONTGOMERY
MULTIPLICATION ON FINITE
FIELDS
GF(2m) is a kind of finite field [16] that
contains 2m different elements. This
finite field is an extension of GF(2) and
anyA GF(2m) can be represented as a
polynomial of degree m
1 over GF(2),such as
01
1
1 axaxaA m
m ,
where ai{0,1}, 0 im1.
Let x be a root of the polynomial, then
the irreducible polynomial G isrepresented as a following equation.
01 gxgxgG m
m , (1)
wheregiGF(2), 0 im1.
Let and be two elements of GF(2m),
then we define = modG. Also, let
A and B be two Montgomery residues,
then they are defined as A= R modG
and B=R modG, where GCD(R,G) =
123
-
7/27/2019 Finite Field Arithmetic Architecture Based on Cellular Array
3/8
International Journal of Cyber-Security and Digital Forensics (IJCSDF) 1(2): 122-129The Society of Digital Information and Wireless Communications, 2012 (ISSN: 2305-0012)
1. Then, the Montgomery multiplication
algorithm over GF(2m) can be
formulated as
,mod1 GRBAP
whereR1
is the inverse of Rmodulo G,and RR
1+GG=1 [17]. Thus, by the
definition of the Montgomery residue,
the equation can be expressed as
follows.
GR
GRRRP
mod
mod)()( 1
It means that P is the Montgomery
residue of . This makes it possible toconvert the operands to Montgomery
residues once at the beginning, and then,do several consecutive multiplications/
squarings, and convert the final result to
the original representation. The finalconversion is a multiplication byR
1, i.e.,
= PR1 mod G. The polynomial R
plays an important role in the complexity
of the algorithm as we need to domodulo R multiplication and a final
division byR.
3 PROPOSED ARCHITECTURE
This section describes the proposed
Montgomery multiplication algorithmand architecture.
3.1 Proposed Algorithm
Based on the property of parallel
architecture, we choose the Montgomeryfactor, 2/m
xR . Then, theMontgomery multiplication over GF(2
m)
can be formulated as
GxBAP m mod2/ (2)
We know that x is a root of G and mg
and 0g have always 1 over all
irreducible polynomials. Thus, theequations can be rewritten as follows.
1
mod
1
1
1 xgxg
Gx
m
m
m
(3)
1
2
1
1
1 mod
gxgx
Gx
m
m
m
(4)
Meanwhile, (2) is represented by
substitutingAandBas follows.
]
[
mod
12/
1
22/
2
12/2/
2/
0
12/
1
222/112/
m
m
m
m
mm
mm
mm
AxbAxb
AxbAb
AxbAxb
AxbAxb
GP
(5)
Now, it expresses that Pcan be dividedinto two parts. One is based on the
negative powers of x and the other is
based on the positive powers of x. (5)
can be denoted byP= C+D, where
,mod]
[
2/
0
12/
1
2
22/
1
12/
GAxbAxb
AxbAxbC
mm
mm
.mod]
[
12/
1
22/
2
12/2/
GAxbAxb
AxbAbD
m
m
m
m
mm
Meanwhile, let )(iA and )(iA be
GAx i mod and GAxi mod ,
respectively. Then, based on (3) and (4),the equations can be expressed as
124
-
7/27/2019 Finite Field Arithmetic Architecture Based on Cellular Array
4/8
International Journal of Cyber-Security and Digital Forensics (IJCSDF) 1(2): 122-129The Society of Digital Information and Wireless Communications, 2012 (ISSN: 2305-0012)
,)(
)(
mod)
(
mod
1)1(
0
2
1
)1(
0
)1(
1
1
)1(
0
)1(
1
1)1(
1
2)1(
2
)1(
1
)1(
0
1
)1(1)(
mim
m
ii
m
ii
mi
m
mi
m
ii
ii
xaxgaa
gaa
Gxaxa
xaax
GAxA
,)(
)(
mod)
(
mod
1
1
)1(
1
)1(
2
1
)1(
1
)1(
0
)1(
1
1)1(
1
2)1(
2
)1(
1
)1(
0
)1()(
m
m
i
m
i
m
i
m
ii
m
mi
m
mi
m
ii
ii
xgaa
xgaaa
Gxaxa
xaax
GxAA
where
,1,
20,
)1(
0
1
)1(
0
)1(
1
)(
mja
mjgaa
a
i
j
ii
j
i
j
(6)
0,
11,
)1(
1
)1(
1
)1(
1
)(
ja
mjgaa
a
i
m
j
i
m
i
j
i
j
(7)
Also, using the formulae of )(iA and)(i
A , the terms Cand D are represented
as follows.
)2/(0
)12/(
1
)2(
22/
)1(
12/
)0(
1
12/
2
22/
12/
1
2/
0
]
[
mod
mm
mm
mm
mm
AbAb
AbAbAz
AxbAxb
AxbAxb
GC
(8)
,
]
[
mod
)12/(
1
)22/(
2
)1(
12/
)0(
2/
12/
1
22/
2
12/2/
m
m
m
m
mm
m
m
m
m
mm
AbAb
AbAb
AxbAxb
AxbAb
GD
(9)
where 0z .
The coefficients of C and D are
produced by summing the corresponding
coefficients of each term in (8) and (9),
respectively. It means that cjand dj, for 0jm1 are represented as
)2/(0
)12/(
1
)2(
22/
)1(
12/
)0(
m
j
m
j
jmjmjj
abab
ababazc
Algorithm 1. COM_C(A,B,G)
Input:
),,,,( 0121 aaaaA mm ,
),,,,(' 0122/12/ bbbbB mm ,
),,,,( 0121 ggggG mm
Output:
GAxbAxb
AxbAxbC
mm
mm
mod]
[
1
12/
2
22/
12/
1
2/
0
;)0( jj aa ;0)0( jc 0z ;
for 1i to 12/ m dofor 0j to 1m in parallel do
if (j = 0) then /* 0j */)1(0
)(
1
ii
m aa ;
)1(
012/
)1(
0
)(
0
iimii abcc
(or )0(0)(
0
)1(
0 azcc i if 0i );
else /* 1,2,,2,1 mmj */
jm
ii
jm
i
jm gaaa
)1(
0
)1()(
1 ;
)1(
12/
)1()(
i
jmim
i
jm
i
jm abcc
(or )1()1()(
i
jm
i
jm
i
jm azcc if
0
i );end ifend for
end for
return C
125
-
7/27/2019 Finite Field Arithmetic Architecture Based on Cellular Array
5/8
International Journal of Cyber-Security and Digital Forensics (IJCSDF) 1(2): 122-129The Society of Digital Information and Wireless Communications, 2012 (ISSN: 2305-0012)
.)12/(
1
)22/(
2
)1(
12/
)0(
2/
m
jm
m
jm
jmjmj
abab
ababd
Now, we obtain the following recurrence
equations from the above equations.
,12/1,
1,
)1(
12/
)1(
)1()1(
)(
miabc
iazc
c
i
jim
i
j
i
j
i
j
i
j
where 0)0( jc for 10 mj and
0z , and
)1(
12/
)1()(
ijimi
j
i
j abdd 2/1, mi
where 0)0( jd for 10 mj .
Algorithm 2. COM_D(A,B,G)
Input:
),,,,( 0121 aaaaA mm ,
),,,,(" 1212/2/ mmmm bbbbB ,
),,,,( 0121 ggggG mm
Output:
GAxbAxb
AxbAbD
m
m
m
m
mm
mod]
[
12/
1
22/
2
12/2/
;)0( jj aa ;0)0( jd
for 1i to 2/m dofor 0j to 1m in parallel do
if (j=0) then /* 0j */)1(
1
)(
0
i
m
i aa ;
)1(
112/
)1(
1
)(
1
i
mim
i
m
i
m abdd ;
else /* 1,2,,2,1 mmj */
j
i
m
i
j
i
j gaaa )1(
1
)1(
1
)(
;
)1(
112/)1(
1)(1
ijim
ij
ij abdd ;
end if
end for
end forreturn D
As shown in Algorithm 1 and 2, the
parallel computational algorithms for C
andDare driven by the above equations.The proposed COM_C(A,B,G) and
COM_D(A,B,G) algorithms can be
executed simultaneously since there isno data dependency between computingCandD.
3.2 Proposed Multiplier
Based on the proposed algorithms, the
hardware architecture of the proposed
semi-systolic Montgomery multiplier isshown in Figure 1. The upper, lower,
middle part of the array computes C, D,
and C+D, respectively. Our architectureis composed of 12/ m
)(
0U i cells,
)12/()2( mm )(U ij cells, 2/m
)(
0V i cells, 2/)2( mm
)(V ij cells,
and one S cell.
1mb
2mb
12/ imb
12/ mb
2/mb
2c1mc jmc 1c0c
1ma
0a
1md1jd 0d0p1pjp
2mp1mp
S
1ma 1mg 2g0 1a0 2a 1gjmg jma 0 0 0
1mg 2ma 2mg 1jajg3ma 1g 0a
)0(z
12/ mb
12/ imb
0b
1b
(1)
1-Um(1)
2-Um(1)
U j(1)
1U(1)
0U
)2(
1U )2(
2U m)2(
Uj(2)
1-Um(2)
0U
)(
0U i )(
2Ui
m
)(
Ui
j
)(
1Ui
m
)(
1Ui
)/2(0
U m )/2(
1-U
m
m )/2(U m
j )/2(
2U m
m )/2(
1U m
1)/2(0U m 1)/2(
1-U m
m 1)/2(U m
j 1)/2(
2U m
m 1)/2(
1U m
2md 3md
0000 0
(1)
0V(1)
1V(1)
Vj(1)
1V m(1)
2V m
(2)
0V(2)
1V(2)
Vj(2)
1V m(2)
2V m
)(
0V i)(
1Vi)(V
i
j)(
1Vi
m)(
2Vi
m
12/1V
m 12/V m
j 12/1V
m
m 12/
0V m 12/
2V
m
m
2/1V m 2/V
m
j 2/1V
m
m 2/
0V m 2/
2V m
m
Figure 1. The proposed semi-systolic
Montgomery multiplier over GF(2m)
126
-
7/27/2019 Finite Field Arithmetic Architecture Based on Cellular Array
6/8
International Journal of Cyber-Security and Digital Forensics (IJCSDF) 1(2): 122-129The Society of Digital Information and Wireless Communications, 2012 (ISSN: 2305-0012)
The detailed circuits of the cells in
Figure 1 are depicted in Figure 2 thru
Figure 4, and , , and D denote XORgate, AND gate, and one-bit latch(flip-flop), respectively.
)1(
0
i
c
D D
)(
0
ic )( 1i
ma
12/ imb
)1(0
ia
(a)
)(0U
i
jmg )1(
i
jmc)1(
i
jma
)1(
0
ia
D D D
)(i
jmc )(
1
i
jma
12/ imb
(b)
)(
U
i
j
Figure 2.Circuit configuration of)(
0U i and
)(U ij cell
The latency of the proposed semi-
systolic multiplier requires m/2+1clock cycles. Each clock cycle takes thedelay of one 2-input AND gate, one 2-
input XOR gate, and one 1-bit latch. The
space complexity of this multiplierrequires 2m2+m1 2-input AND gates,
2m2+2m1 2-input XOR gates, and
3m2+2m1(for odd m) or 3m
2+3m1(for
even m) 1-bit latches.
Note that )(U ij ( )(
0U i ) and )(V ij (
)(
0V i )
cells in Figure 2 and 3 are functionally
equivalent cells and the computations
can be executed in parallel, and the
computed results are added in S cell. InFigure 4, D
*denotes one bit latch when
mis even, otherwise it is ignored.
)1(
1
i
md
)1(
1
i
ma
)(
1
i
md )(
0
i
a
12/ imb
DD
(a)
)(
0V i
12/ imb
)1(
1
i
jd )1(
1
i
ja
)1(
1
i
ma
)(
1
i
jd )(i
ja
D D D
jg
(b)
)(V ij
Figure 3. Circuit configuration of)(
0V i and
)(V ij cell
4 COMPLEXITY ANALYSIS
In CMOS VLSI technology, each gate is
composed of several transistors [18]. We
adopt that AAND2 = 6, AXOR2 = 6, and
ALATCH1 = 8, where AGATEn denotestransistor count of an n-input gate,
respectively. Also, for a furthercomparison of time complexity, we
adopt the practical integrated circuits in
[19] and the following assumptions, as
discussed in detail in [15], are made:TAND2= 7, TXOR2= 12, and TLATCH1= 13,
127
-
7/27/2019 Finite Field Arithmetic Architecture Based on Cellular Array
7/8
International Journal of Cyber-Security and Digital Forensics (IJCSDF) 1(2): 122-129The Society of Digital Information and Wireless Communications, 2012 (ISSN: 2305-0012)
where TGATEn denotes the propagation
delay of an i-input gate, respectively.
2mc1mc jc 1c0c
1d 1mdjd 0d
0p
1p
jp
2mp
1mp
2md
D*
D*
D*
D*
D*
Figure 4. Circuit configuration of S cell
Table 1. Comparison of semi-systolic
polynomial basis architectures
gate/delayIn
[15]
Fig. 1
even m/odd m
Number of
cells2
m
U
: 12/ m U :
)12/()2( mm
V : 2/m
V : 2/)2( mm
S :1
2-input AND 22m 12 2 mm
2-input XOR 22m 122 2 mm
3-input XOR 0 0
one-bit latch 23m 133
2 mm /
123 2 mm
Total
transistorcount
2
48m
204248 2 mm /
203448 2 mm
Cell delay(ns) 32 32
Latency m 15.0 m / 5.05.0 m
Total
delay(ns)m32 3216 m / 1616 m
A circuit comparison between theproposed multiplier and the related
multiplier is given in Table 1. Althoughthe proposed multiplier has nearly thesame space complexity compared toHuang et al.[15], the time complexity isapproximately reduced by 50%.
5 CONCLUSION
In this paper, we propose a cellular semi-
systolic architecture for Montgomery
multiplication over finite fields. We
choose a novel Montgomery factorwhich is highly suitable for the design of
parallel structures. We also divided our
architecture into three parts, and
computed two parts of them in parallel
so that we reduced the time complexityby nearly 50% compared to the recent
study in spite of maintaining similarspace complexity. We expect that our
architecture can be efficiently used for
various applications, which demandhigh-speed computation, based on
arithmetic operations.
6 ACKNOWLEDGMENT
This research was supported by BasicScience Research Program through the
National Research Foundation of
Korea(NRF) funded by the Ministry of
Education, Science and Technology(2011-0014977).
7 REFERENCES
1. W. W. Peterson, and E. J. Weldon, Error-Correcting Codes, MIT Press, Cambridge
(1972).
2. R. E. Blahut. Theory and Practice of ErrorControl Codes, Addison-Wesley, Reading
(1983).
3. W. Diffie and M. E. Hellman, Newdirections in cryptography, IEEE
Transactions on Information Theory, vol.
22, no. 6, pp. 644-654 (1976).
4. B. Schneier, Applied Cryptography, JohnWiley & Sons press, 2nd edition (1996).
128
-
7/27/2019 Finite Field Arithmetic Architecture Based on Cellular Array
8/8
International Journal of Cyber-Security and Digital Forensics (IJCSDF) 1(2): 122-129The Society of Digital Information and Wireless Communications, 2012 (ISSN: 2305-0012)
5. P. Montgomery, Modular Multiplicationwithout Trial Division, Mathematics of
Computation, vol. 44, no. 170, pp. 519521
(1985).
6. C. Koc and T. Acar, MontgomeryMultiplication in GF(2k), Designs, Codes
and Cryptography, vol. 14, no. 1, pp. 5769
(1998).
7. H. Wu, Montgomery Multiplier andSquarer for a Class of Finite Fields, IEEE
Trans. Computers, vol. 51, no. 5, pp. 521-
529 (2002).
8. C. Y. Lee, J. S. Horng, I. C. Jou and E. H.Lu, Low-Complexity Bit-Parallel Systolic
Montgomery Multipliers for Special Classes
of GF(2m), IEEE Transactions on
Computers, vol. 54, no. 9, pp. 10611070
(2005).
9. C. W. Chiou, C. Y. Lee, A. W. Deng and J.M. Lin, Concurrent Error Detection in
Montgomery Multiplication over GF(2m),IEICE Trans. Fundamentals of Electronics,
Communications and Computer Sciences,
vol. E89-A, no. 2, pp. 566-574, 2006.
10. A. Hariri and A. Reyhani-Masoleh, Bit-Serial and Bit-Parallel Montgomery
Multiplication and Squaring over GF(2m),
IEEE Trans. Computers, vol. 58, no. 10, pp.
1332-1345 (2009).
11. A. Hariri and A. Reyhani-Masoleh,Concurrent Error Detection in Montgomery
Multiplication over Binary Extension
Fields,IEEE Trans. Computers, vol. 60, no.
9, pp. 1341-1353 (2011).12. C. W. Chiou, Concurrent Error Detection
in Array Multipliers for GF(2m) Fields, IEE
Electronics Letters, vol. 38, no. 14, pp. 688
689 (2002).
13. C. Y. Lee, C. W. Chiou, and J. M. Lin,Concurrent Error Detection in a
Polynomial Basis Multiplier over GF(2m),
J. Electronic Testing: Theory and
Applications, vol. 22, no. 2, pp. 143-150
(2006).
14. S. Bayat-Sarmadi and M.A. Hasan,Concurrent Error Detection in Finite Field
Arithmetic Operations Using Pipelined andSystolic Architectures, IEEE Trans.
Computers, vol. 58, no. 11, pp. 1553-1567
(2009).
15. W. T. Huang, C. H. Chang, C. W. Chiouand F. H. Chou, Concurrent error detection
and correction in a polynomial basis
multiplier over GF(2m), IET Information
Security, vol. 4, no. 3, pp. 111-124 (2010).
16. R. Lidl and H. Niederreiter, Introduction toFinite Fields and Their Applications,
Cambridge Univ. Press (1986).
17. J. C. Jeon and K. Y. Yoo, Montgomeryexponent architecture based on
programmable cellular automata,
Mathematics and Computers in Simulation,
vol. 79, pp. 1189-1196 (2008).
18.N. Weste, K. Eshraghian, Principles ofCMOS VLSI design: a system perspective,
Addison-Wesley, Reading, MA (1985).
19. STMicroelectronics, Available athttp://www.st.com/
129