Effect of glitches against masked AES S-box implementation and countermeasure


www.ietdl.org

Published in IET Information Security
Received on 15th April 2008
Revised on 1st October 2008
doi: 10.1049/iet-ifs:20080041

ISSN 1751-8709

M. Alam, S. Ghosh, M.J. Mohan, D. Mukhopadhyay, D.R. Chowdhury, I.S. Gupta
Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India
E-mail: [email protected]

Abstract: Masking of gates is one of the most popular techniques to prevent differential power analysis (DPA) of the AES algorithm. It has been shown that the logic circuits used in the implementation of cryptographic algorithms leak side-channel information in spite of masking, which can be exploited in differential power attacks. The phenomenon in CMOS circuits responsible for the leakage of masked circuits is known as glitching. Motivated by this fact, the authors analyse the effect of glitches in CMOS circuits against masked implementations of the AES S-box. The authors explicitly demonstrate that glitches do not always cause leakage. There exists a relation between the combinational path delay of the circuit and the timing difference of the input vectors to the circuit, which has a bearing on the amount of information leaked by the masked gates. A balanced masked S-box circuit is proposed where the inputs are synchronised by sequential components. Detailed SPICE results are shown to support the claim that the modifications indeed reduce the vulnerability of the masked AES S-box against DPA attacks.

1 Introduction

Secure communication is of crucial importance in everyday digital applications. Cryptographic algorithms are essential building blocks of various security protocols. These algorithms can be implemented in both software and hardware. Attacks on cryptographic algorithms are usually divided into mathematical and implementation attacks. The latter are based on weaknesses in the implementation and can be passive or active. These attacks are also referred to as side-channel attacks, as they benefit from side-channel information that is obtained by measuring some physical quantity.

Nowadays, CMOS is by far the most commonly used technology to implement digital integrated circuits. The dominating component of the power consumption of a CMOS gate is the dynamic power consumption. Two types of power analysis attacks are known. In the simple power analysis attack, several measurements for the same input (or plaintext) are usually used and averaged in order to filter out the noise. In differential power analysis (DPA)

The Institution of Engineering and Technology 2009

attack, the same approach is used for several inputs (or plaintexts) instead of one. DPA exploits the relationship between the processed data and the power leakage. This attack was first introduced by Kocher et al. in [1]. Subsequently, various works [2–5] have described how to exploit side-channel information based on power leakage to attack crypto devices.

Yen, in [3], showed the first practical power analysis on AES hardware. Subsequently, several research works have been carried out to develop design alternatives that diminish power-based side-channel leakage. These research works have mainly progressed in two directions. First, the algorithms are rewritten so that the intermediate results are random. This is primarily done by adding masking schemes at the algorithmic level so that the operations inside the device are performed on randomised values instead of the actual sequences. However, the primary inputs and outputs of these implementations are identical to those of their unmasked versions. Different masking strategies for secret key and public key ciphers are reported in [6–8]. Several patents exist on the gate level masking

IET Inf. Secur., 2009, Vol. 3, Iss. 1, pp. 34–44
doi: 10.1049/iet-ifs:20080041


strategies [12–14]. Second, several implementation countermeasures have been adopted to overcome the side-channel leakage in CMOS circuits. Some of these solutions are based on changing the design flow of custom ICs and using special libraries. Tiri et al., in [11], have implemented a secure IC for the AES algorithm by using wave dynamic differential logic (WDDL). It needs a special routing technique, called differential routing. Two conditions must be satisfied to obtain constant power dissipating logic: (1) a logic gate must have exactly one charging event per clock cycle; and (2) the logic gate must charge a constant capacitance in that event. The fabricated IC uses WDDL to fulfil the first condition, and it uses differential routing to accomplish the second condition. However, these techniques make the design slower than a state-of-the-art CMOS design. Badman and Zwolinski [15] have proposed a DPA countermeasure that uses dynamic voltage and frequency scaling. Although it requires neither a special cell library nor a special design flow, it makes the system costly because a random supply voltage variation has to be ensured.

In the aforementioned masking strategies, it was assumed that each gate in the masked circuit switches only once per clock cycle. Unfortunately, in practical CMOS circuits the outputs of internal gates naturally switch more than once. The number of unwanted switching events depends on the path delays inside the circuit [6]. The transitions at the output of a gate that occur before the gate switches to the correct output are called glitches. The fact that glitches occur in digital CMOS circuits is well known. Glitches contribute significantly to the power consumption of CMOS circuits and hence are very relevant to DPA attacks. A detailed theoretical analysis and some practical experimental results on DPA attacks by Mangard et al. [16] showed that all proposed masked gates are vulnerable to power-based side-channel leakage in the presence of glitches. This publication and, thereafter, [17] show that AES implementations based on masked gates are not without threats. A masked AES S-box implementation was first successfully attacked by Mangard et al. [5].

Glitch-free secured masked implementations in CMOS technology are therefore of immense importance. Recently, Lin et al. [18] came up with a logic design style called pre-charge masked Reed–Muller logic to overcome the glitch and dissipation timing skew problems in the design of DPA-resistant cryptographic hardware. It is based on an old logic style and cannot be implemented using a standard CMOS library. In [19, 20], secured masked implementations of the AES S-box in the presence of glitches have been discussed theoretically, but they have not been implemented.

The current article describes a countermeasure against side-channel leakage by proposing a synchronous masked AES S-box. Along these lines, the present work explains that even if glitches exist, the circuit will not produce any DPA leakage, as the XOR gates do not absorb signals if the combinational path delay of the circuit and the timing difference of the input vectors follow a certain relationship. The proposed masked architecture is balanced to reduce the internal glitches. The paper develops a masked design of the AES S-box, using sequential elements to synchronise the input signals. Detailed post-synthesis SPICE simulations are presented to confirm that the modifications indeed reduce the vulnerability to DPA attacks.

The outline of this paper is as follows: Section 2 gives an overview of DPA attacks. This section also presents the existing masking schemes and the effect of glitches. Section 3 deals with our balanced pipelined masked AES S-box architectures, which reduce the effect of glitches and hence diminish side-channel leakage. Section 4 discusses, with simulation results, how the proposed designs resist side-channel leakage. Finally, Section 5 concludes the paper.

2 Power consumption model

In DPA, the attacker tries to find a correlation between externally known (and guessed) data and internally processed signals. Since the attacker cannot obtain these internal signal data directly, he is obliged to use physical effects (the side channels) that are in turn correlated to the internal signals/data. One of the side channels is the current consumption. Since we are interested in protecting the side channels at gate level, our first step has to be the definition of a power consumption model of a gate: a gate g with n inputs and one output will be interpreted as a function $g : \mathbb{F}_2^n \to \mathbb{F}_2$. Our premise is that the power consumption during one clock cycle (in a synchronous design) depends on the input at the time $t_0$ shortly before the clock edge (the old input) and the input after the clock edge (the new input), for example at $t_1$ shortly before the next clock edge.

Let $g : \mathbb{F}_2^n \to \mathbb{F}_2$ be a gate. Denote the input at time $t_0$, at or shortly before the rising edge of a clock cycle, as $a = (a_1, \ldots, a_n) \in \mathbb{F}_2^n$ and the input at time $t_1$, at or shortly before the next rising edge, as $x = (x_1, \ldots, x_n) \in \mathbb{F}_2^n$. Then the energy consumption of the gate during this transition is given by the real number $E_g(a, x) \in \mathbb{R}$. Hence the energy function of the gate g is defined to be the map

$$E_g : \mathbb{F}_2^n \times \mathbb{F}_2^n \to \mathbb{R}, \qquad (a, x) \mapsto E_g(a, x)$$

The energy function of a gate may be different for individual gates in a circuit, even if they are functionally equal. The reason is that the energy depends mainly on the individual capacitive load the gate has to drive.


2.1 DPA attack

2.1.1 Simplistic model: In a simplistic energy consumption model, one mainly identifies the power consumption of a gate with the energy needed to drive the output capacitance if the output toggles. The energy consumption of a gate is described only by its digital output behaviour. Hence it is determined by the output values of g at times $t_0$ and $t_1$ and a fixed tuple $(E_{g,0\to 0}, E_{g,0\to 1}, E_{g,1\to 0}, E_{g,1\to 1}) \in \mathbb{R}^4$. If, for example, at time $t_0$ the output value of g is 1 and at time $t_1$ it is 0, then the energy for this clock cycle is $E_{g,1\to 0}$. Hence, in this model the energy function of the gate g is given by

$$E_g(a, x) := E_{g,\,g(a)\to g(x)}$$
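As an illustration of the simplistic model, the following minimal Python sketch looks up the energy of one clock cycle from the old and new output values only. The numeric values in `ENERGY` and the helper names are hypothetical and are not taken from the paper.

```python
# Hypothetical per-transition energies (arbitrary units), i.e. the tuple
# (E_{g,0->0}, E_{g,0->1}, E_{g,1->0}, E_{g,1->1}); the values are illustrative only.
ENERGY = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.8, (1, 1): 0.1}


def and_gate(bits):
    """Example gate g : F_2^2 -> F_2."""
    return bits[0] & bits[1]


def simplistic_energy(gate, old_input, new_input, energy=ENERGY):
    """E_g(a, x) := E_{g, g(a) -> g(x)}: only the output transition matters."""
    return energy[(gate(old_input), gate(new_input))]


# Output goes from 1 (inputs 1,1) to 0 (inputs 0,1): the 1 -> 0 entry is used.
print(simplistic_energy(and_gate, (1, 1), (0, 1)))
```

Two different input pairs that produce the same output transition are indistinguishable in this model, which is exactly why it cannot describe glitches.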

2.1.2 Differential power analysis: Assume we have a cryptographic algorithm with some secret (key) implemented as a CMOS circuit. Further assume that there is a gate $g : \mathbb{F}_2^2 \to \mathbb{F}_2$ within this circuit. The input values of g at time $t_0$ are $(a, b) \in \mathbb{F}_2^2$ and later at $t_1$ are $(x, y) \in \mathbb{F}_2^2$. Since an attacker will survey the energy consumption of this gate during several runs of the algorithm with different messages, these values may be seen as random variables $a, b, x, y : \Omega \to \mathbb{F}_2$ on some probability space $(\Omega, \Sigma, P)$. This gives rise to the following concatenation

$$\varepsilon_g := E_g \circ (a, b, x, y) : \Omega \to \mathbb{F}_2^2 \times \mathbb{F}_2^2 \to \mathbb{R}$$

According to the hypothesis about the value of the secret key (or parts of it), one may construct a partition of $\Omega$ into two disjoint measurable subsets A and B such that $\Omega = A \cup B$, with the property

$$\Delta E \neq 0, \qquad \Delta E = \mathbf{E}(\varepsilon_g \mid A) - \mathbf{E}(\varepsilon_g \mid B)$$

where $\mathbf{E}$ denotes the mean of a distribution. If, however, the aforementioned construction is done with a wrong hypothesis, it yields $\mathbf{E}(\varepsilon_g \mid A) = \mathbf{E}(\varepsilon_g \mid B)$. One classical example [10] is the partition of $\Omega$ into

$$A = \{\omega \in \Omega : g(x(\omega), y(\omega)) = 1\}$$

$$B = \{\omega \in \Omega : g(x(\omega), y(\omega)) = 0\}$$

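With the simplistic energy model we obtain $\mathbf{E}(\varepsilon_g \mid A) = \alpha E_{g,0\to 1} + \alpha' E_{g,1\to 1}$ and $\mathbf{E}(\varepsilon_g \mid B) = \beta E_{g,0\to 0} + \beta' E_{g,1\to 0}$, where $\alpha := P(\{\omega \in \Omega : g(a(\omega), b(\omega)) = 0\} \mid A)$, $\alpha' := 1 - \alpha$ and $\beta := P(\{\omega \in \Omega : g(a(\omega), b(\omega)) = 0\} \mid B)$, $\beta' := 1 - \beta$; that is, $\alpha$ and $\beta$ are the conditional probabilities that the old output at $t_0$ was 0. In general these two expectation values are not equal (if the hypothesis was correct). This gives rise to the classical DPA.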

It is clear that, if $E_{g,0\to 0} = E_{g,0\to 1} = E_{g,1\to 0} = E_{g,1\to 1}$, then the two expectation values are always equal, independent of whether the hypothesis was right or wrong. Hence no DPA is possible. In general terms, if the energy function $E_g : \mathbb{F}_2^2 \times \mathbb{F}_2^2 \to \mathbb{R}$ is constant, then the gate does not leak information and a DPA on the gate is not possible. In practice, these conditions are only met if a logic style is chosen for the implementation that guarantees the constancy of the energy function itself.
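The difference of means for the classical partition can be evaluated exactly for a small gate. The sketch below is only an illustration: it assumes uniformly distributed, independent old and new inputs, and the energy tables are hypothetical values chosen for the example.

```python
from itertools import product


def and_gate(p, q):
    return p & q


def difference_of_means(gate, energy):
    """Return E(eps | A) - E(eps | B) for A = {g(x, y) = 1}, B = {g(x, y) = 0}.

    Assumes uniform, independent old inputs (a, b) and new inputs (x, y),
    and the simplistic model E_g = E_{g, g(a,b) -> g(x,y)}.
    """
    samples = {0: [], 1: []}
    for a, b, x, y in product((0, 1), repeat=4):
        old_out, new_out = gate(a, b), gate(x, y)
        samples[new_out].append(energy[(old_out, new_out)])
    mean_a = sum(samples[1]) / len(samples[1])
    mean_b = sum(samples[0]) / len(samples[0])
    return mean_a - mean_b


# Hypothetical, data-dependent energies: the partition yields a non-zero difference.
leaky = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.8, (1, 1): 0.1}
# Constant energy function: the difference vanishes, so no DPA is possible.
constant = {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.5}

print(difference_of_means(and_gate, leaky))
print(difference_of_means(and_gate, constant))
```

For the AND gate the first call averages three 0 to 1 and one 1 to 1 transitions in class A, and nine 0 to 0 and three 1 to 0 transitions in class B, which is where the non-zero difference comes from.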

2.2 Masking on gate level as countermeasure

The countermeasure using masking on the gate level aims at randomising the intermediate results such that $\Delta E = 0$. In this approach, each signal b that might be attacked is represented by $b_m = b \oplus m_b$, where $m_b$ is a uniformly distributed random variable (i.e. $p(m_b = 0) = p(m_b = 1) = 1/2$) that is independent of b. Consequently, $b_m$ is also a uniformly distributed random variable. In the masking approach, a circuit is replaced with a masked implementation, as shown in Fig. 1. This is done on a gate-by-gate basis. The process of replacing a gate by its masked version is commonly known as lifting of the gate. The old gate $g : \mathbb{F}_2^2 \to \mathbb{F}_2$ is replaced by $g'$, which is a function $g' : \mathbb{F}_2^2 \times \mathbb{F}_2^2 \times \mathbb{F}_2 \to \mathbb{F}_2$ with the property $g(a, b) = g'(a_m, m_a, b_m, m_b, m_c) + m_c$, where $a = a_m + m_a$ and $b = b_m + m_b$. The function $g'$ is called the masked lifting of g. Fig. 1 shows an example of a circuit using masked lifting of gates (left-hand sketch) and a realisation of a lifting of an AND gate [7, 16] (right-hand sketch).
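The lifting property can be checked exhaustively for a masked AND gate. The sketch below assumes the usual sum-of-products realisation (four AND terms XORed together with the output mask); the function name `masked_and_lifting` is hypothetical, and the gate structure in Fig. 1 may differ in detail.

```python
from itertools import product


def masked_and_lifting(a_m, m_a, b_m, m_b, m_c):
    """Assumed realisation of g' for g = AND: four products XORed with the mask m_c."""
    return (a_m & b_m) ^ (a_m & m_b) ^ (m_a & b_m) ^ (m_a & m_b) ^ m_c


def lifting_property_holds():
    """Check g(a, b) = g'(a_m, m_a, b_m, m_b, m_c) + m_c for all values and masks."""
    for a, b, m_a, m_b, m_c in product((0, 1), repeat=5):
        a_m, b_m = a ^ m_a, b ^ m_b          # masked inputs
        unmasked = masked_and_lifting(a_m, m_a, b_m, m_b, m_c) ^ m_c
        if unmasked != (a & b):
            return False
    return True


print(lifting_property_holds())
```

Addition in $\mathbb{F}_2$ is XOR, so removing the output mask $m_c$ is again a single XOR.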

Another choice is using two gates $(g'_1, g'_2) : \mathbb{F}_2^2 \times \mathbb{F}_2^2 \to \mathbb{F}_2^2$ with the property $g'(a, b) = g'_1(a_1, a_2, b_1, b_2) + g'_2(a_1, a_2, b_1, b_2)$. The pair $(g'_1, g'_2)$ is called a randomised lifting of $g'$. The simplistic energy consumption model gives

$$E_{g'}((a', b', m_c), (x', y', m_z)) = E_{g',\,g'(a', b', m_c)\to g'(x', y', m_z)}$$

where $(a', b', m_c) \in \mathbb{F}_2^2 \times \mathbb{F}_2^2 \times \mathbb{F}_2$ is the input at time $t_0$ and $(x', y', m_z) \in \mathbb{F}_2^2 \times \mathbb{F}_2^2 \times \mathbb{F}_2$ is the input at time $t_1$, with the abbreviations $a' = (a_m, m_a)$ and so on; $m_z$ is interpreted as a random variable. The energy consumption $E_g((a, b), (x, y))$ has to be substituted by the expected value $\mathbf{E}(E_{g'}((a', b', m_c), (x', y', m_z)))$.

Figure 1 Masked lifting of gates [7]



2.3 Analysis of glitch problem over mask implementation

Mangard et al. [16] observed that in realistic CMOS implementations the different signals $a_m$, $m_a$, $b_m$, $m_b$, $m_q$ may not arrive at the gate $g'$ at the same time. In the example circuit of Fig. 1, signal $q_m$ may arrive at the input of gate $g'_2$ with a delay compared with signals $m_q$, $c_m$, $m_c$ because of the gate delay imposed by $g'_1$. Furthermore, all input signals of gate $g'_2$, in general, have different additional delays due to the propagation delay caused by wire capacitances. Thereafter, Mangard et al. in [17] pinpointed that the XOR gates in masked gates are responsible for the correlation between the power consumption and the value of q. In the common mask multiplier (Fig. 1) glitches exist for the following reasons. The masked multiplier consists of four unmasked multipliers that calculate the intermediate values $i_1, \ldots, i_4$. These intermediate values are then summed by 4n XOR gates. A mask multiplier of this kind has been used as a masked AND gate (n = 1) in [7]. Since the inputs $i_1$, $i_2$, $i_3$ and $i_4$ to the four XOR gates are uncorrelated to the unmasked values, it was expected that the outputs of the XOR gates would also be uncorrelated to the unmasked values. However, XOR gates absorb certain transitions when both input values change simultaneously or within a small interval of time. It was shown that, because of the difference in arrival times between the inputs to the XOR gates, the transitions at the outputs of the XOR gates were correlated to the delays in the circuit, which were in turn correlated to the unmasked values.

3 Countermeasures

Consider the example in which the signals arrive in the distinct order $a_m, b_m, m_a, m_b, m_q$. Let the arrival times of the inputs $a_m$, $b_m$, $m_a$, $m_b$ and $m_q$ be $t_1$, $t_2$, $t_3$, $t_4$ and $t_5$, respectively. In this case, the output value of the gate may change not just once during the clock cycle but up to five times, leading to the consecutive output transitions

$$\begin{aligned}
d_1 &:= g'_1(a_m, b_m, m_a, m_b, m_q) \\
\to d_2 &:= g'_1(a_m(t_1), b_m, m_a, m_b, m_q) \\
\to d_3 &:= g'_1(a_m(t_1), b_m(t_2), m_a, m_b, m_q) \\
\to d_4 &:= g'_1(a_m(t_1), b_m(t_2), m_a(t_3), m_b, m_q) \\
\to d_5 &:= g'_1(a_m(t_1), b_m(t_2), m_a(t_3), m_b(t_4), m_q) \\
\to d_6 &:= g'_1(a_m(t_1), b_m(t_2), m_a(t_3), m_b(t_4), m_q(t_5))
\end{aligned}$$

Therefore the energy consumption will be given by the sum

$$E_{g'_1,\,d_1\to d_2} + E_{g'_1,\,d_2\to d_3} + E_{g'_1,\,d_3\to d_4} + E_{g'_1,\,d_4\to d_5} + E_{g'_1,\,d_5\to d_6}$$

Our experiment focuses only on the mask lifting of g, that is, $g'_1$.
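The sequence $d_1, \ldots, d_6$ and the resulting energy sum can be traced with a short Python sketch. It treats the masked gate as a single Boolean function that re-evaluates each time one input arrives; the gate realisation, the energy values and the names `masked_gate` and `glitch_energy` are assumptions made for illustration only.

```python
# Hypothetical per-transition energies (arbitrary units).
ENERGY = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.8, (1, 1): 0.1}


def masked_gate(a_m, b_m, m_a, m_b, m_q):
    """Assumed masked AND: output is (a AND b) XOR m_q, computed from masked inputs."""
    return (a_m & b_m) ^ (a_m & m_b) ^ (m_a & b_m) ^ (m_a & m_b) ^ m_q


def glitch_energy(old, new, arrival_order, energy=ENERGY):
    """Update the inputs one at a time and sum the energy of every output step.

    old, new: tuples (a_m, b_m, m_a, m_b, m_q) at t_0 and after all arrivals.
    arrival_order: input indices in the order in which their new values arrive.
    Returns the output sequence d_1..d_6 and the summed energy.
    """
    current = list(old)
    outputs = [masked_gate(*current)]
    for index in arrival_order:
        current[index] = new[index]
        outputs.append(masked_gate(*current))
    total = sum(energy[(p, n)] for p, n in zip(outputs, outputs[1:]))
    return outputs, total


outputs, total = glitch_energy((0, 0, 0, 0, 0), (1, 1, 1, 1, 1), [0, 1, 2, 3, 4])
print(outputs, total)
```

In this example the output toggles three times although the net change is a single 0 to 1 transition, so the summed energy is larger than the simplistic model would predict.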

3.1 Glitches do not always have an effect

Observation 1: Let us treat the circuit (Fig. 1) as a black box. Let the arrival times of the inputs $a_m$, $b_m$, $m_a$, $m_b$, $m_q$ be $t_1$, $t_2$, $t_3$, $t_4$ and $t_5$, respectively. Let us define $\Delta t$ as the maximum of the absolute differences of $t_i$ and $t_j$, that is, $\Delta t = \max(|t_i - t_j|)$ over all $\{i, j\} \subseteq \{1, \ldots, 5\}$. If $\lambda$ is the minimum path delay of the circuit g, then glitches can have an effect if and only if $\Delta t < \lambda$.

The observation signifies that, even if glitches exist in the masked S-box, they might not threaten or affect its security with respect to power consumption. This can be explained easily. The main source of glitches in the architecture of the multiplier shown in Fig. 1 is the XOR gates. If $\Delta t \geq \lambda$, the XOR gates do not absorb any transition; thus, the circuit changes its output as many times in a single clock cycle as its inputs change. But if $\Delta t < \lambda$, the output changes only once in a single clock cycle. In that case, the transitions are absorbed by the XOR gates without affecting the output of the circuit, because of the difference in arrival times between the inputs to the XOR gates. The transitions at the outputs of the XOR gates are then correlated to the delays in the circuit, which are in turn correlated to the unmasked values. Even if $\Delta t = 0$, there exist glitches in the circuit that may cause the XOR gates to leak side-channel information.
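The condition in Observation 1 is easy to evaluate for a given set of arrival times. The following sketch is only a restatement of the inequality; the function names and the example delay of 15 ns are hypothetical.

```python
def arrival_spread(arrival_times):
    """Delta t = max |t_i - t_j| over all pairs, which equals max(t) - min(t)."""
    return max(arrival_times) - min(arrival_times)


def glitches_can_have_effect(arrival_times, min_path_delay):
    """Observation 1: glitches can have an effect if and only if Delta t < lambda."""
    return arrival_spread(arrival_times) < min_path_delay


# Times in ns for (a_m, b_m, m_a, m_b, m_q); lambda = 15 ns is an assumed value.
print(glitches_can_have_effect([0, 0, 0, 0, 10], 15))     # spread 10 ns < 15 ns
print(glitches_can_have_effect([0, 10, 20, 30, 40], 15))  # spread 40 ns >= 15 ns
```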

3.2 Countermeasure using clock

Our objective is to design a balanced multiplier architecture where glitches are minimised and do not affect the side-channel vulnerability of the circuit. Fig. 2 depicts our synchronous balanced mask multiplier. In the proposed structure, if all the inputs arrive at the same time, they reach all the internal gates at the same time. Our balanced masked multiplier removes the effect of the internal gates, which may occur in a skewed design such as the previous circuit. The multiplier primarily has two n-bit masked inputs $a_m$ and $b_m$ and their masking agents $m_a$ and $m_b$. We need another masking agent $m_q$ for robustness. The masking agents are required to reach the multipliers at exactly the same time, that is, $\Delta t = 0$, which is achieved by synchronous positive edge triggered D-type register elements.

The GF(2) masked multiplier essentially generates the 1-bit output $q_m = ab \oplus m_q$. The output $q_m$ is a function of the intermediate outputs $i_1$, $i_2$, $i_3$, $i_4$ and $i_5$, and it is computed as $q_m = i_1 \oplus i_2 \oplus i_3 \oplus i_4 \oplus i_5$. At this stage we have two

Figure 2 Our proposed architecture of a balanced masked multiplier


objectives. First, no intermediate output is correlated to the actual result ab [8]. Second, the organisation of the XOR operations is balanced so that no glitch occurs. In order to satisfy the second objective, we have added one extra multiplier (M) module that performs the bitwise $m_q \cdot 1$ operation. As we have five outputs after the M operations ($i_1, \ldots, i_5$), an inverted binary tree like structure does not satisfy the second criterion. We first build an inverted binary tree structure for generating the intermediate result $i_8$ from $\{i_2, \ldots, i_5\}$, satisfying our first objective. For computing the final result $q_m$, we need to perform another XOR operation between $i_1$ and $i_8$, which lie on paths of different lengths in the masked multiplier circuit. Therefore, at this stage, the signals $i_1$ and $i_8$ are synchronised at the XOR inputs. As may be seen in Fig. 2, they are passed to the XOR gate through two synchronous negative edge triggered D flip-flops. Essentially, the proposed architecture splits the circuit into two stages. Within a stage, all XOR input signal paths have the same length, thus minimising the internal glitches. The inputs of the individual stages are synchronised by the D flip-flops; that is, all the input signals change their state at the same time instant.

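The correctness of the circuit can be easily verified by the following equation

$$\begin{aligned}
q_m &= q \oplus m_q \\
&= (ab) \oplus m_q \\
&= (a_m \oplus m_a)(b_m \oplus m_b) \oplus m_q \\
&= a_m b_m \oplus b_m m_a \oplus a_m m_b \oplus m_a m_b \oplus m_q \cdot 1
\end{aligned}$$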

In our multiplier circuit, none of the intermediate results $i_1, \ldots, i_8$ is correlated to the original data, assuming that the masking agents $m_a$, $m_b$ and $m_q$ are randomly chosen. The expressions of $i_1, \ldots, i_8$ are as follows

$$i_1 = a_m b_m, \quad i_2 = b_m m_a, \quad i_3 = a_m m_b, \quad i_4 = m_a m_b, \quad i_5 = m_q \cdot 1$$

$$i_6 = b_m m_a \oplus a_m m_b, \quad i_7 = m_a m_b \oplus m_q$$

$$i_8 = i_6 \oplus i_7 = b_m m_a \oplus a_m m_b \oplus m_a m_b \oplus m_q$$
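The data flow through $i_1, \ldots, i_8$ can be checked functionally with a short Python sketch. It models only the Boolean values of the two stages, not the clocking or the timing behaviour that the architecture is designed to control; the function name and the returned dictionary are illustrative assumptions.

```python
from itertools import product


def balanced_masked_multiplier(a_m, b_m, m_a, m_b, m_q):
    """Functional model of the two-stage GF(2) masked multiplier (timing not modelled)."""
    # Stage 1: five multiplier (AND) modules followed by a balanced XOR tree on i2..i5.
    i1 = a_m & b_m
    i2 = b_m & m_a
    i3 = a_m & m_b
    i4 = m_a & m_b
    i5 = m_q & 1
    i6 = i2 ^ i3
    i7 = i4 ^ i5
    i8 = i6 ^ i7
    # Stage 2: i1 and i8 would be registered here, then combined by the final XOR.
    q_m = i1 ^ i8
    return q_m, {"i1": i1, "i6": i6, "i7": i7, "i8": i8}


def is_functionally_correct():
    """Check q_m = (a AND b) XOR m_q for every value and every mask."""
    for a, b, m_a, m_b, m_q in product((0, 1), repeat=5):
        q_m, _ = balanced_masked_multiplier(a ^ m_a, b ^ m_b, m_a, m_b, m_q)
        if q_m != ((a & b) ^ m_q):
            return False
    return True


print(is_functionally_correct())
```

The check covers all 32 combinations of data and mask bits, so it confirms the algebraic identity but says nothing about leakage.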

Fig. 3a presents the design of an AES S-box for reducing the glitches. The blocks MAP and MAP$^{-1}$ are used to transform an element of $GF(2^8)$ to $GF(2^4)^2$ and vice versa. To reduce the glitches in the circuit, all inputs of the masked S-box are synchronised by the clock signal. There are five stages in the S-box circuit. This ensures that the inputs (8-bit A and 4-bit fresh masks) are synchronised by a clock even if they arrive at different times. REG1, REG2, REG3 and REG4 are the pipeline registers. The multipliers in the pipelined stages of the S-box are the previously described masked multipliers, redrawn in Fig. 3b. Note that the register inside the multiplier is clocked at the negative edge of the clock if the registers REG1–REG4 are clocked at the positive edge. This clocking strategy ensures that the output of the multiplier is obtained in one clock event.

The five-stage pipelined S-box is useful for high-speed implementation of AES with 16 parallel S-boxes for bulk encryption. Owing to the pipelined stages, the circuit operates at a high frequency at the cost of more hardware.

4 Implementation results

Both the proposed architecture and the previously known designs [7] have been implemented at the CMOS transistor level using CMOS9 libraries. The post place-and-route netlists are created, and the post-layout designs are simulated using a SPICE tool (Spectre SPICE). The DPA results of both the multiplier and the S-box are presented below.

4.1 DPA analysis on sequential masked multiplier

The designs have been simulated with a 40 ns clock period for all the possible 1024 input transitions (as there are four transitions, i.e. 0→0, 0→1, 1→0 and 1→1, for each of the five input bits, as described in [16]). We have made the observations by varying the timing of the input signals. In one case, all the input signals are applied simultaneously. In other cases, one of the input transitions occurs 10 ns before the other transitions. Likewise, two of the input signals occur at different times, maintaining a time interval of 10 ns between them and with respect to the other signals. Thus, there can be six combinations, such as all inputs come at the same

Figure 3 Sub-pipelined and proposed architectures

a Our sub-pipelined architecture of a masked S-box
b Our proposed architecture of a masked multiplier used in Fig. 3a



time, one input at a different time, two inputs at different times and, at most, all five inputs at different times. Each time we have collected the current drawn by the circuit to estimate the power consumption of the device. Using these simulation results, we have performed the DPA, which is described next.

As discussed for our sequential design (Fig. 2), the whole $GF(2^n)$ masked multiplier is divided into two stages. The input transition of the first stage happens at the positive edge of the clock, whereas the second stage is activated at the negative edge of the clock. In our experiment, the first positive edge comes 20 ns after the start of the simulation and the first negative edge occurs after 40 ns. It is assumed that the output will be stable before the next positive edge, which occurs 60 ns after the start of the simulation. Therefore, for the side-channel leakage analysis it is necessary to trace the power leakage of the device from 0 to 60 ns for every input transition.

The obtained current (power) values have been divided into two classes: the first one when the output is 1 and the second one when the output is 0. Our second analysis is based on the difference of means of the sampled data for the two classes. The analysis is performed as shown in [16] on the masked AND gate. As described, we have separated the set of $E_{\Delta t=0}$ data (power consumption for no delay) into two different subsets or classes. The first set contains the $E_{\Delta t=0}$ values (say $E^0_{\Delta t=0}$) for which the final output q of the masked multiplier is 0, and the second set contains the values (say $E^1_{\Delta t=0}$) for which q is 1. The set is divided equally, and each subset contains exactly 512 candidates. We have then calculated the mean values of both sets, $\mathbf{E}(E^0_{\Delta t=0})$ and $\mathbf{E}(E^1_{\Delta t=0})$, by the following equations, where the index i runs over the 512 candidates of the respective set

$$\mathbf{E}(E^0_{\Delta t=0}) = \frac{\sum_{i=0}^{511} E^0_{\Delta t=0,\,i}}{512}$$

$$\mathbf{E}(E^1_{\Delta t=0}) = \frac{\sum_{i=0}^{511} E^1_{\Delta t=0,\,i}}{512}$$

Finally, the difference of means $\Delta E_{\Delta t=0}$ is calculated as $|\mathbf{E}(E^0_{\Delta t=0}) - \mathbf{E}(E^1_{\Delta t=0})|$. Similarly, the difference of means when the inputs arrive with different delays can be measured as $\Delta E_{\Delta t\neq 0} = |\mathbf{E}(E^0_{\Delta t\neq 0}) - \mathbf{E}(E^1_{\Delta t\neq 0})|$, and these differences of means have been plotted.
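The difference-of-means computation can be sketched in Python on synthetic data. The sketch below enumerates the 1024 input transitions of a functional multiplier model and labels each sample by the output bit of the multiplier, which is one reading of the classification described above and is an assumption; the energy values are hypothetical stand-ins for the SPICE current samples at one time instant.

```python
from itertools import product

# Hypothetical energies per output transition, standing in for measured current.
ENERGY = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.8, (1, 1): 0.1}


def multiplier_output(a_m, b_m, m_a, m_b, m_q):
    """Functional output of the masked multiplier: (a AND b) XOR m_q."""
    return ((a_m ^ m_a) & (b_m ^ m_b)) ^ m_q


def difference_of_means(samples):
    """samples: iterable of (class_label, value). Returns |mean_0 - mean_1| and class sizes."""
    classes = {0: [], 1: []}
    for label, value in samples:
        classes[label].append(value)
    mean0 = sum(classes[0]) / len(classes[0])
    mean1 = sum(classes[1]) / len(classes[1])
    return abs(mean0 - mean1), len(classes[0]), len(classes[1])


samples = []
for old in product((0, 1), repeat=5):        # 32 old input vectors
    for new in product((0, 1), repeat=5):    # 32 new input vectors -> 1024 transitions
        q_old, q_new = multiplier_output(*old), multiplier_output(*new)
        samples.append((q_new, ENERGY[(q_old, q_new)]))

delta_e, n0, n1 = difference_of_means(samples)
print(len(samples), n0, n1, delta_e)
```

With real traces, `value` would be the current sample at one time instant, and the computation would be repeated for every instant in the 0 to 60 ns window to obtain a plot of the difference of means over time.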

Figure 4 Transient power consumption for different inputs on our proposed

a Masked multiplier
b $GF(2^4)$ masked multiplier used in AES S-box


The transient power analysis of the circuits is done with the Spectre SPICE tool. The transient power consumption of the proposed masked multiplier and of the $GF(2^4)$ masked multiplier used in the AES S-box circuit is shown in Fig. 4. It is clear that there is some difference in power consumption for processing different inputs, that is, for producing different output transitions. This difference in power consumption is exploited in the DPA attack.

In our results, we refer to the architecture proposed in [7] as the common architecture. The results of the attacks for the common architecture are shown in Fig. 5. Note that we have chosen $\Delta t = 10$ ns, so that it is less than the $\lambda$ value of the circuit. The chosen $\Delta t$ value ensures that the XOR gates absorb the transitions and the glitches leak side-channel information. This tests the circuits in an adverse environment. The Y-axis of the plots is the difference of means, whereas the X-axis shows the various instants of time during the 60 ns period at which the observations are made. We have seen that the time difference of the inputs affects the common architecture. We observe that the maximum of the difference of means is around 80–90 mA, which is large enough to be exploited in an attack. Another key point to be noted is that even when $\Delta t = 0$ (all the inputs arrive at the same time) the difference of means has a significant value.

The results of the attack (difference of means) for our suggested architecture are shown in Fig. 6. The plots show that the differences of means (25–35 mA) are much smaller than those of the common architecture. This is due to the prevention mechanism inside the masked multiplier circuit.

Fig. 7 portrays the average values of the difference of means of the leakage current for the common and the proposed masked multiplier. These plots are drawn by considering all the differences of means of the leakage current that are plotted in Figs. 5 and 6. From the plots it can

Figure 5 Difference of means of leakage current for common masked multiplier [7] ($\Delta t < \lambda$)

a All inputs make transition at same time
b $m_q$ makes transition 10 ns after other inputs
c Two inputs make transition in different times
d Three inputs make transition in different times
e Four inputs make transition in different times
f All inputs make transition in different times



Figure 6 Difference of means of leakage current for our modified masked multiplier ($\Delta t < \lambda$)

a All inputs make transition at same time
b $m_q$ makes transition 10 ns after other inputs
c Two inputs make transition in different times
d Three inputs make transition in different times
e Four inputs make transition in different times
f All inputs make transition in different times

Figure 7 Average difference of means of leakage current for

a Common multiplier (considering all cases of Fig. 5)
b Our proposed multiplier (considering all cases of Fig. 6)


be claimed that the average difference of means of the leakage current is reduced by 55% by our proposed DPA countermeasure. Therefore our sequential masked multiplier provides better security against DPA attacks in the presence of glitches.


4.2 DPA analysis of pipelined masked AES S-box

The DPA analysis of the AES S-box is almost the same as previously described, with a small modification. As


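described in Fig. 3a, the S-box has a 12-bit input and an 8-bit output. We have $4^{12}$ input transitions for a set of $2^8$ different outputs. Each output is essentially generated from $2^{16}$ (64 K) different input transitions. The outputs are indicated by the integer variables i, j, where $1 \leq i, j \leq 2^8$. For DPA, the mean values for the sets i and j are calculated as

$$\mathbf{E}(E^i_{\Delta t=0}) = \frac{\sum_{k=0}^{2^{16}-1} E^i_{\Delta t=0,\,k}}{2^{16}}$$

$$\mathbf{E}(E^j_{\Delta t=0}) = \frac{\sum_{k=0}^{2^{16}-1} E^j_{\Delta t=0,\,k}}{2^{16}}$$

where the index k runs over the input transitions that produce the respective output. Similarly, the difference of means $\Delta E_{\Delta t=0}$ is calculated as $|\mathbf{E}(E^i_{\Delta t=0}) - \mathbf{E}(E^j_{\Delta t=0})|$. In the same way, $\Delta E_{\Delta t\neq 0}$ is calculated.

We have chosen the values of i and j as {00}h and {FF}h, respectively.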
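The transition counts used for the S-box analysis follow from simple arithmetic, sketched below under the assumption that the input transitions are spread evenly over the $2^8$ output values. The constant names are illustrative.

```python
INPUT_BITS = 12            # 8-bit data plus 4 mask bits, as stated in the text
OUTPUT_BITS = 8
TRANSITIONS_PER_BIT = 4    # 0->0, 0->1, 1->0, 1->1

total_transitions = TRANSITIONS_PER_BIT ** INPUT_BITS   # 4^12 = 2^24
# Assumes an even spread of transitions over the 2^8 possible outputs.
per_output = total_transitions // 2 ** OUTPUT_BITS

print(total_transitions, per_output)
```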

The results of the attacks for the common S-box architecture [7] and for our suggested one are shown in Figs. 8 and 9, respectively. All results are obtained from the post-layout simulation model. All figures show some correlation peaks, which are what a DPA attack exploits. Our experiment shows that there exists a correlation between the leakage signals and the data processed in the S-box. This correlation may reveal the secret keys (or data). So the peaks

Figure 8 Difference of means of leakage current for common S-box architecture [7] ($\Delta t < \lambda$)

a All inputs make a transition at same time
b At least one input makes a transition 6 ns after other inputs

Figure 9 Difference of means of leakage current for our suggested S-box architecture ($\Delta t < \lambda$)

a All inputs make transition at same time
b At least one input makes transition 6 ns after other inputs

Figure 10 Average difference of means of leakage current for

a Common S-box [7] (considering both cases of Fig. 8)
b Our proposed S-box (considering both cases of Fig. 9)



of the correlation may be regarded as a measure of the strength of the S-box against DPA.

From the figures, it may be observed that the difference-of-means peaks of the common architecture are much higher than those of our suggested one. A higher peak indicates higher vulnerability to DPA attacks. Fig. 10 shows that the average difference of means for the common S-box is nearly 85 mA, whereas that for our proposed S-box is nearly 25 mA. Thus, our suggested S-box is about 70% less vulnerable than the common one.
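The quoted reduction is a relative change between the two average values. A minimal Python sketch of the arithmetic, using the approximate averages read from the text and a hypothetical helper name:

```python
def relative_reduction(common, proposed):
    """Fractional reduction of the average difference of means."""
    return (common - proposed) / common


# Approximate averages quoted for the common and the proposed S-box.
reduction = relative_reduction(85.0, 25.0)
print(f"{reduction:.1%}")
```

Since both inputs are quoted as approximate values, the result is only indicative and is consistent with the figure of about 70% stated above.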

4.3 Hardware comparison of proposed scheme

The SPICE simulations show that the suggested schemes improve the security of the S-box against DPA attacks. Table 1 gives a brief comparison of existing masked AES S-boxes [21] with our suggested one. The critical path of the masked multiplier of [7] is 1 multiplier + 4 XOR gates, whereas the critical path of our proposed masked multiplier is 1 multiplier + 3 XOR gates.

5 Conclusion

The paper has examined DPA of the AES S-box in the presence of glitches. A relation between the input timing difference and the combinational path delay indicates when the glitches become harmful. The paper has proposed a masking technique for multipliers that reduces the effects of glitches. The multipliers have been used to realise a balanced five-stage pipelined S-box that is stronger against DPA. SPICE simulation results of the post-layout circuit have been presented to support our claims.

6 Acknowledgment

We are grateful to the Department of Information Technology (DIT), Govt. of India for funding this work.

7 References

[1] KOCHER P., JAFFE J., JUN B.: ‘Introduction to differential power analysis and related attacks’, 1998, http://www.cryptography.com/

Table 1 Hardware comparison

Design         Area, gates   Speed, MHz   Leakage current, mA
[7]            3628          16.91        70
[18]           16 465        9.82         85
our modified   5478          120          35

Inf. Secur., 2009, Vol. 3, Iss. 1, pp. 34–44: 10.1049/iet-ifs:20080041

[2] STANDAERT F., ORS S., PRENEEL B.: ‘Power analysis of an FPGA implementation of Rijndael: is pipelining a DPA countermeasure?’. CHES, 2004, (LNCS, 3156), pp. 30–44

[3] YEN S.M.: ‘Amplified differential power cryptanalysis on Rijndael implementations with exponentially fewer power traces’. Information Security and Privacy – ACISP 2003, Wollongong, Australia, 2003, (LNCS, 2727), pp. 106–117

[4] ORS S.B., GURKAYNAK F., OSWALD E., PRENEEL B.: ‘Power-analysis attack on an ASIC AES implementation’. Information Technology: Coding and Computing, 2004, (LNCS, 2), pp. 546–552

[5] MANGARD S., PRAMSTALLER N., OSWALD E.: ‘Successfully attacking masked AES hardware implementations’. CHES 2005, Edinburgh, Scotland, August 2005, (LNCS, 3659), pp. 157–171

[6] GOLIC J.D., TYMEN C.: ‘Multiplicative masking and power analysis of AES’. CHES 2002, Redwood Shores, CA, USA, August 2002, (LNCS, 2535), pp. 198–212, Revised Papers

[7] TRICHINA E.: ‘Combinational logic design for AES subbyte transformation on masked data’. Cryptology ePrint Archive Report 2003/236 (http://eprint.iacr.org/)

[8] TRICHINA E., SETA D.D., GERMANI L.: ‘Simplified adaptive multiplicative masking for AES’. CHES 2002, Redwood Shores, CA, USA, August 2002, (LNCS, 2535), pp. 187–197, Revised Papers

[9] BLOMER J., GUAJARDO J., KRUMMEL V.: ‘Provably secure masking of AES’. Selected Areas in Cryptography – SAC 2004, Waterloo, Canada, August 2004, (LNCS, 3357), pp. 69–83, Revised Selected Papers

[10] TRICHINA E., KORKISHKO T., LEE K.H.: ‘Small size, low power, side channel-immune AES coprocessor: design and synthesis results’. Advanced Encryption Standard – AES 2004, Bonn, Germany, May 2004, (LNCS, 3373), pp. 113–127, Revised Selected and Invited Papers

[11] TIRI K., SCHAUMONT P.: ‘Changing the odds against masked logic’. Selected Areas of Cryptography (SAC’06), 2006, (LNCS, 3156), pp. 30–44

[12] MESSERGES T.S., DABBISH E.A., PUHL L.: ‘Method and apparatus for preventing information leakage attacks on a microelectronic assembly’. US Patent 6,295,606, September 2001

[13] OSWALD E., MANGARD S., PRAMSTALLER N., RIJMEN V.: ‘A side-channel analysis resistant description of the AES S-box’. Fast Software Encryption – FSE 2005, Paris, France, February 2005, (LNCS, 3557), pp. 413–423


[14] TIRI K., HWANG D., HODJAT A., ET AL.: ‘A side-channel leakage free coprocessor IC in 0.18 µm CMOS for embedded AES-based cryptographic and biometric processing’. Design Automation Conf. – DAC 2005, Anaheim, California, USA, June 2005

[15] BADDAM K., ZWOLINSKI M.: ‘Evaluation of dynamic voltage and frequency scaling as a differential power analysis countermeasure’. 20th VLSI Design – 6th Embedded Systems – VLSID 2007, Bangalore, India, January 2007, pp. 854–859

[16] MANGARD S., POPP T., GAMMEL B.M.: ‘Side-channel leakage of masked CMOS gates’. Topics in Cryptology – CT-RSA 2005, The Cryptographers’ Track at the RSA Conf. 2005, San Francisco, CA, USA, February 2005, (LNCS, 3376), pp. 351–365

[17] MANGARD S., SCHRAMM K.: ‘Pinpointing the side-channel leakage of masked AES hardware implementations’. Cryptographic Hardware and Embedded Systems – CHES 2006, Tokio, Japan, September 2006, (LNCS, 3738), pp. 156–171

[18] LIN K., FANG S., YANG S., LO C.: ‘Overcoming glitches and dissipation timing skew in design of DPA resistant cryptographic hardware’. Design Automation and Test in Europe (DATE’07), Nice, France, April 2007

[19] FISCHER W., GAMMEL B.: ‘Masking at gate level in the presence of glitches’. CHES 2005, 2005, (LNCS, 3659), pp. 187–200

[20] NIKOVA S., RECHBERGER C., RIJMEN V.: ‘Threshold implementations against side-channel attacks and glitches’. 8th Int. Conf. Information and Communications Security (ICICS’06), 2006, (LNCS, 4307), pp. 529–545

[21] POPP T., MANGARD S.: ‘Masked dual-rail pre-charge logic: DPA-resistance without routing constraints’. CHES 2005, 2005, (LNCS, 3659), pp. 172–186

