Low power scalable encryption for wireless systems

Wireless Networks 4 (1998) 55–70

Low power scalable encryption for wireless systems ∗

James Goodman and Anantha P. Chandrakasan
Massachusetts Institute of Technology, Cambridge, MA, USA

Secure transmission of multimedia information (e.g., voice, video, data, etc.) is critical in many wireless network applications. Wireless transmission imposes constraints not found in typical wired systems such as low power consumption, tolerance to high bit error rates, and scalability. A variety of low power techniques have been developed to reduce the power of several encryption algorithms. One key idea involves exploiting the variation in computation requirements to dynamically vary the power supply voltage. Application of low power techniques to a wireless camera application yields more than an order of magnitude reduction in power consumption over conventional design methods. Test circuits for five algorithms have been fabricated in a 0.6 µm process and the resulting power consumption of each is presented. In addition, a low power hybrid system that combines a power-efficient keystream generator with a secure pseudo-random seed generator is proposed that provides 1 Mbps data encryption at a total estimated power consumption of 150 µW.

1. Introduction

Two of the biggest trends in computing today are global networking and mobile computing. The popularity of the Internet is an example of the drive towards a global web that allows users to communicate and share information with other systems located around the world. At the same time the current trend in computing hardware is towards portable, battery-operated nomadic computing terminals. The popularity of wireless networks is a direct result of these two trends as people strive to remain connected to the global web without having to be tied down to a wired link.

Unfortunately, wireless networks are known for their susceptibility to tampering and eavesdropping. In a wired network, the fact that a user must be physically connected to the network, and that information is transmitted within protected physical links (wires), offers the user some measure of security. However, in a wireless network there are no such guarantees as anyone with a simple radio transmitter can pretend to be a valid network user. In addition, the transmission medium is the open air, which implies that anyone with the appropriate radio scanner can eavesdrop as well. In the North American cellular phone network, for example, service providers put annual losses due to fraud at over $500 million [10,17]. The need for developing secure wireless transmission systems becomes even more apparent when one considers the gradual migration of conventional commerce, electronic banking, and other applications to the Internet; a trend that will inevitably continue to wireless networks.

From a security standpoint a wireless network user is concerned with two primary issues: guaranteeing the identity of other users on the network with whom they are communicating (authentication), and ensuring that any information regarding their identity and their data is not exposed to unauthorized eavesdroppers (privacy). The wireless network administrator in turn is concerned both with providing the above guarantees to ensure user satisfaction, and preventing fraudulent usage of the network.

∗ The work described within this paper is funded by DARPA contract DAAL-01-95-K3526.

Authentication is performed via protocols (e.g., [4,8,27,52]) that require each party to demonstrate knowledge of some secret information that only a valid user would know, without exposing what that secret is. In some instances authentication is performed in only one direction (e.g., GSM [18]), while in others both parties must authenticate themselves (e.g., [4]).

Privacy is achieved through the use of data encryption for encoding the data stream, and the use of temporary or encoded IDs to provide user anonymity. Depending on the type of encryption used the communicating parties may need to share some secret piece of information called the key that is used to encrypt/decrypt the data. If a key is required then users must perform some form of key agreement protocol in which they agree upon or exchange the value of the secret key (e.g., [4,15,16]). In some applications (e.g., GSM [18]) both parties might already share some secret information and use it to generate a temporary session key.

In order to maximize the battery lifetime, mobile computing units must feature ultra low power electronics. Low power design requires a systematic optimization approach at all levels of the design. A variety of low power techniques exist in the literature (e.g., [11,40]).

By dynamically scaling the encryption algorithms the system can allocate security where it is needed within the data stream. High priority data can be encrypted using stronger encryption (which will require increased power consumption), while low priority data can be encrypted with weaker encryption. Hence, the system can dynamically allocate the battery power to where it will be best used in terms of security.

J.C. Baltzer AG, Science Publishers

56 J. Goodman, A.P. Chandrakasan / Low power scalable encryption for wireless systems

Figure 1. MIT wireless video camera project.

The work presented in this paper is motivated by an ultra low power wireless video camera that is currently being developed at MIT (figure 1). The goal is to develop a wireless video camera capable of operating at a wide range of data rates (e.g., 1 bps–1 Mbps) for a prolonged period of time (e.g., 100 hours of battery-powered active operation). The design specifications of the camera require its size to be no greater than 1 in³, and its power budget to be approximately 50 mW. While the encryption work presented in this paper is driven by this project, it is broadly applicable to a variety of wireless systems.

2. Encryption algorithms

Cryptographic algorithms involve performing various operations on a given input to produce an encoded output, based on the value of a secret control input called the key. The security of a cryptographic algorithm is dictated by the difficulty of deducing the input given the output and all details regarding the encoding algorithm with the exception of the value of the key. Thus, the security of the algorithm rests solely in the secrecy of the key and the computational complexity of deducing its value.

Further information regarding cryptography and encryption algorithms can be found in any of the numerous available references (e.g., [30,45,48,49], the Journal of Cryptology, and the proceedings of the CRYPTO and EUROCRYPT conferences).

2.1. Algorithm classification

At a high level, cryptographic algorithms fall into one of two categories: asymmetric algorithms, which are commonly known as public key algorithms, and symmetric algorithms, which are commonly referred to as secret key algorithms. Public key algorithms have the advantage that no secret information need ever be shared between sender and receiver. Unfortunately, public key algorithms are typically based on hard number-theoretic problems which can be too computationally intensive for use as a low power encryption system. However, for applications such as key exchange and authentication, which are performed very infrequently, it may be feasible to implement them in software using a low power microprocessor.

Secret key systems rely on the fact that both sender and receiver share a piece of secret information (i.e., the key). Secret key systems can be made very computationally efficient as the security of the system is based on the secrecy of the key, and not the computational complexity of the algorithm.

Secret key systems can be categorized into either block or stream ciphers. By definition block ciphers (e.g., DES [34], IDEA [25], BLOWFISH [46], GOST [47], SAFER K-64 [31], and RC5 [41]) are memoryless algorithms that permute N-bit blocks of data under the influence of the secret key and generate N-bit blocks of encrypted data. Block ciphers are designed so that inputs differing by so much as a single bit generate widely varying outputs. This property, while being highly desirable for security reasons, leads to the phenomenon of error propagation in block ciphers. Error propagation is the reason that block ciphers are not well suited to applications where high bit error rates are commonplace.

Stream ciphers (e.g., SEAL [44], WAKE [51], RC4 [48, pp. 397–398] and PIKE [3]) contain internal state and typically operate serially by generating a stream of pseudo-random key bits (i.e., the keystream) that are then XORed with the data to encrypt/decrypt it. Stream ciphers do not suffer from error propagation as each bit is encrypted independently of any other.
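The XOR construction described above can be illustrated with a minimal Python sketch. The keystream source here is Python's `random` module, used purely as a stand-in (it is not cryptographically secure and is not one of the ciphers discussed in this paper); the function names `keystream`, `xor_cipher`, and `bit_errors` are hypothetical. The sketch flips one ciphertext bit and counts the resulting plaintext bit errors.

```python
import random


def keystream(seed: int, nbytes: int) -> bytes:
    """Stand-in keystream generator (assumption: NOT secure, illustration only)."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(nbytes))


def xor_cipher(data: bytes, ks: bytes) -> bytes:
    """Encrypt or decrypt by XORing each data byte with a keystream byte."""
    return bytes(d ^ k for d, k in zip(data, ks))


def bit_errors(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


plaintext = b"raw pixel data from the camera"
ks = keystream(seed=1234, nbytes=len(plaintext))
ciphertext = xor_cipher(plaintext, ks)

# Simulate a single channel bit error in the ciphertext.
corrupted = bytearray(ciphertext)
corrupted[5] ^= 0x08
recovered = xor_cipher(bytes(corrupted), ks)

print("bit errors after decryption:", bit_errors(plaintext, recovered))
```

Because decryption is a bitwise XOR with the same keystream, the corrupted bit maps to exactly one wrong plaintext bit and no others.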

2.2. Comparison of block vs. stream ciphers

A variety of issues must be considered in order to choose between using a block or stream cipher for a given application. In this section we compare several inherent properties of block and stream ciphers to determine which is best suited to our application.

2.2.1. Error propagation

As previously mentioned, block ciphers will cause single bit errors in the encrypted block to propagate into multiple bit errors in the decrypted block. For wireless communication channels, one must be especially concerned with error propagation as BERs can be as high as $10^{-2}$.

Figure 2 shows the effects of error propagation for a video data stream consisting of raw pixel data encrypted using a stream cipher and a 64-bit block cipher. As seen from the figure, a single bit error will become a large burst of errors for the block cipher due to error propagation. If the data stream encoding is intolerant to even single bit errors (e.g., Huffman encoding) then error propagation is not an issue as it involves the propagation of existing errors. Hence an error has already occurred and the data is lost regardless of how many other errors are generated due to error propagation. However, if the encoding is tolerant to errors (e.g., bit-mapped data), then error propagation analysis is extremely important as the increased BER translates into image degradation.


Figure 2. Effects of error propagation on an uncompressed video image.

Assuming that channel bit errors can be modeled as independent, identically distributed events with some probability $P_e$, then the channel BER for a block of length $B$ bits can be expressed as

$$\mathrm{BER} = \frac{1}{B}\sum_{i=1}^{B}\left( i\,P_e^{\,i}\,(1-P_e)^{B-i}\binom{B}{i} \right) = P_e. \tag{1}$$

Now assume that a block cipher is used to encrypt the data before it is transmitted over the channel. The block cipher operates on blocks of length $B$, and exhibits excellent diffusion characteristics such that even a single bit error will affect all other bits within the block with probability 1/2. Hence a single bit error in the encrypted block will decrypt to a block with, on average, $B/2$ bit errors. Under these assumptions, the BER for a block of length $B$ bits that is encrypted before being transmitted, and then decrypted at the output of the channel (i.e., $\mathrm{BER}_{encrypt}$) can be expressed as

$$\mathrm{BER}_{encrypt} = \frac{1}{B}\sum_{i=1}^{B}\left( \left(\frac{B}{2}\right) P_e^{\,i}\,(1-P_e)^{B-i}\binom{B}{i} \right). \tag{2}$$

Figure 3 shows the resulting $\mathrm{BER}_{encrypt}$, assuming a typical block length of 64 (e.g., IDEA and DES algorithms) for channel BER values ranging from $10^{-1}$ to $10^{-5}$. Note that the bit error rate with encryption is approximately $B/2 = 32$ times that of the original channel for BERs below $10^{-3}$.
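As a numerical check on equations (1) and (2), the following Python sketch evaluates both sums directly for a 64-bit block. The function names `channel_ber` and `encrypted_ber` are hypothetical, and the set of $P_e$ values is chosen only to span the range quoted above.

```python
from math import comb


def channel_ber(pe: float, block_bits: int = 64) -> float:
    """Equation (1): expected fraction of bits in error without encryption."""
    total = sum(
        i * comb(block_bits, i) * pe**i * (1 - pe) ** (block_bits - i)
        for i in range(1, block_bits + 1)
    )
    return total / block_bits


def encrypted_ber(pe: float, block_bits: int = 64) -> float:
    """Equation (2): any corrupted block decrypts to B/2 bit errors on average."""
    total = sum(
        (block_bits / 2) * comb(block_bits, i) * pe**i * (1 - pe) ** (block_bits - i)
        for i in range(1, block_bits + 1)
    )
    return total / block_bits


for pe in (1e-1, 1e-2, 1e-3, 1e-4, 1e-5):
    ratio = encrypted_ber(pe) / channel_ber(pe)
    print(f"Pe={pe:.0e}  BER_encrypt={encrypted_ber(pe):.3e}  ratio={ratio:.2f}")
```

Summing the binomial terms in equation (2) gives the closed form $(1 - (1-P_e)^B)/2$, which approaches $B P_e / 2$ for small $P_e$ and saturates at 1/2 for large $P_e$; this is why the multiplicative penalty only reaches 32 at low channel error rates.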

Figure 3. Theoretical BER for block size = 64 bits.

Experiments were conducted using a 100-frame video sequence. The channel BER was varied from $10^{-2}$ to $10^{-4}$, and the 64-bit IDEA block cipher was used for encryption/decryption. Figure 4 shows the multiplicative effects of error propagation for both the simulation, and that predicted by equation (2).

2.2.2. Decoupling the data stream

The inherent structure of a stream cipher can be used to decouple the encryption function from the data stream by buffering the keystream. This decoupling of the encryption function from the data stream provides additional latitude when it comes time to apply power reduction techniques, a fact which we will later exploit. This freedom makes the stream ciphers much more flexible than the block ciphers which operate directly on the data stream and hence cannot be decoupled.

Figure 4. Error propagation ratio.

2.2.3. Software efficiency

In some wireless applications the mobile unit may include a microprocessor that can be used to implement the data encryption/decryption operation in software. Microprocessors operate on words of data and, depending on the processor, the size of these words can range anywhere from 8 to 64 bits. This word-level granularity might lead one to believe that it would be much more efficient to implement a block cipher using software running on a microprocessor. However, this is not the case. Newly proposed stream ciphers such as SEAL and WAKE have been designed with a software implementation in mind, yielding encryption rates of over 100 Mbps running on a 133 MHz Alpha processor. In comparison, block cipher algorithms running on the same processor achieved encryption rates of at most 27 Mbps [43]. Hence stream ciphers can be much more efficient in applications where encryption is to be implemented in software running on an embedded microprocessor.

2.2.4. Summary of comparison

The decision was made to utilize a stream cipher because of its immunity to error propagation, increased flexibility, and greater software efficiency. However, there are numerous stream cipher algorithms and the question remains as to which is best suited to a low power implementation. We explored two very different stream cipher constructions: one based on the theory of linear feedback shift registers, and the other based on the theory of quadratic residues. The two constructions represent very different methods – one features high hardware efficiency at the cost of reduced security, while the other provides very high security at the cost of increased computational complexity.

Figure 5. n-bit linear feedback shift register.

3. Linear feedback shift register based stream ciphers

One popular class of stream ciphers is based on the theory of Linear Feedback Shift Registers (LFSRs) [20]. LFSRs consist of a shift register whose input is computed using a linear recursion based on the current state of the shift register:

$$x_1(t+1) = \sum_{i=1}^{n} c_i\,x_i(t). \tag{3}$$

Alternatively, the feedback expression can be thought of as an nth degree feedback polynomial given by the expression

$$f(x) = c_n x^n + c_{n-1} x^{n-1} + \cdots + c_2 x^2 + c_1 x^1 + 1 = 0, \tag{4}$$

where $c_i$ and $x$ are binary values. The above summation is performed modulo-2 and can be implemented using simple XORs, thus resulting in a very simple and efficient hardware implementation (figure 5).
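The recursion in equation (3) can be sketched in a few lines of Python. This is a bit-level software model for illustration, not the hardware implementation of figure 5; the names `lfsr_step` and `lfsr_period`, the choice of the last stage $x_n$ as the output, and the 4-bit example with taps $c_1 = c_4 = 1$ (feedback polynomial $x^4 + x + 1$) are all assumptions made for this example.

```python
def lfsr_step(state, taps):
    """Advance an LFSR by one clock.

    state: list [x_1, ..., x_n] of bits; taps: 1-based indices i with c_i = 1.
    Returns (new_state, output_bit), where the output is taken from x_n.
    """
    feedback = 0
    for i in taps:
        feedback ^= state[i - 1]  # modulo-2 sum of the tapped stages
    output = state[-1]
    return [feedback] + state[:-1], output  # shift right, insert feedback at x_1


def lfsr_period(seed, taps):
    """Number of clocks until the state returns to the seed (None if it never does)."""
    state = list(seed)
    for steps in range(1, 2 ** len(seed) + 1):
        state, _ = lfsr_step(state, taps)
        if state == list(seed):
            return steps
    return None


seed = [1, 0, 0, 0]
taps = (1, 4)  # c_1 = c_4 = 1, i.e. f(x) = x^4 + x + 1
state, bits = list(seed), []
for _ in range(15):
    state, out = lfsr_step(state, taps)
    bits.append(out)

print("period:", lfsr_period(seed, taps))
print("one period of output:", bits)
```

For this 4-bit example the register visits all 15 non-zero states before repeating, and one period of output contains eight 1's and seven 0's.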

An n-bit LFSR can be in any of $2^n$ states and can cycle through at most $2^n - 1$ of them.¹ Any LFSR that cycles through all $2^n - 1$ non-zero states is said to be a maximal length LFSR and its output is called an m-sequence. The LFSR's feedback polynomial must be primitive (and therefore irreducible²) in order to generate an m-sequence. There are a total of $\varphi(2^n - 1)/n$ primitive polynomials for an n-bit LFSR, where $\varphi$ is Euler's totient function.

One interesting property of a maximal length LFSR is that its output is pseudo-random and exhibits a uniform distribution (over sequences up to length n) in terms of the number of 1's and 0's, making it appear to be an excellent candidate for use as a stream cipher keystream generator. Unfortunately LFSRs are completely insecure as any n-bit LFSR can be cracked after observing only 2n output bits using the Berlekamp–Massey algorithm [29]. However, due to their trivial hardware requirements and excellent output statistics, LFSRs are typically used as building blocks in more complicated stream cipher systems. Such systems utilize multiple LFSRs that are irregularly clocked and whose outputs are combined using output functions to yield more cryptographically secure outputs.
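To make the 2n-bit attack concrete, here is a compact Python sketch of the Berlekamp–Massey algorithm over GF(2). It is a textbook formulation written for this example rather than code from [29]; the helper `extend` and the 4-bit test recurrence $a_t = a_{t-1} \oplus a_{t-4}$ are assumptions chosen for illustration.

```python
def berlekamp_massey(bits):
    """Return (L, c): the shortest LFSR length and connection coefficients
    c[0..L] such that bits[i] = XOR of c[j] & bits[i - j] for j = 1..L."""
    n = len(bits)
    c = [0] * n
    b = [0] * n
    c[0] = b[0] = 1
    length, m = 0, -1
    for i in range(n):
        # Discrepancy between the observed bit and the current LFSR's prediction.
        d = bits[i]
        for j in range(1, length + 1):
            d ^= c[j] & bits[i - j]
        if d:
            previous = c[:]
            shift = i - m
            for j in range(n - shift):
                c[j + shift] ^= b[j]
            if 2 * length <= i:
                length, m, b = i + 1 - length, i, previous
    return length, c[: length + 1]


def extend(bits, length, c, count):
    """Predict `count` further bits using the recovered recurrence."""
    out = list(bits)
    for _ in range(count):
        nxt = 0
        for j in range(1, length + 1):
            nxt ^= c[j] & out[-j]
        out.append(nxt)
    return out


# Output of a 4-bit maximal-length LFSR: a_t = a_{t-1} XOR a_{t-4}.
seq = [1, 0, 0, 0]
while len(seq) < 30:
    seq.append(seq[-1] ^ seq[-4])

# Observe only 2n = 8 bits, then predict the rest of the sequence.
L, c = berlekamp_massey(seq[:8])
predicted = extend(seq[:8], L, c, len(seq) - 8)
print("recovered length:", L, "coefficients:", c)
```

After observing eight bits the sketch recovers a 4-stage recurrence and reproduces every subsequent bit, which is why a bare LFSR offers no security on its own.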

¹ The zero state will cause the LFSR to become stuck.
² An nth degree polynomial f(x) is irreducible if no polynomial of degree k, 0 < k < n, divides f(x).


Numerous LFSR-based stream cipher designs have been proposed and subsequently broken due to weaknesses in their construction (e.g., [6,9,19,23]). As a result we decided to utilize existing LFSR-based stream cipher algorithms that appear to be secure, to the limits discussed within the following descriptions.

3.1. Shrinking and Self-Shrinking Generators

The Shrinking Generator (SG) was proposed by Coppersmith et al. [13] as an efficient and secure pseudo-random bit generator for use in stream cipher systems. The generator is based on the idea of using a pseudo-random sequence to variably decimate another.

The Shrinking Generator utilizes a selection LFSR (LFSRsel) to determine whether the output of the generating LFSR (LFSRgen) is to be used within the keystream (figure 6). If the output of LFSRsel is a “1” then the output of LFSRgen is added to the keystream; if not, then the output of LFSRgen is discarded. The output of the generator represents a varying decimation of LFSRgen, as governed by LFSRsel.
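The selection rule above can be modeled in Python as follows. This is a sketch under assumptions: the register lengths (3 and 4 bits), tap positions, seeds, and the names `lfsr_bits` and `shrinking_generator` are toy choices for illustration and are far too small for real use.

```python
from itertools import islice


def lfsr_bits(seed, taps):
    """Yield the output bit (last stage) of an LFSR on every clock.

    seed: list of bits [x_1..x_n]; taps: 1-based stage indices XORed for feedback.
    """
    state = list(seed)
    while True:
        yield state[-1]
        feedback = 0
        for i in taps:
            feedback ^= state[i - 1]
        state = [feedback] + state[:-1]


def shrinking_generator(sel_bits, gen_bits):
    """Keep a generating bit only on clocks where the selection bit is 1."""
    for s, g in zip(sel_bits, gen_bits):
        if s:
            yield g


# Toy registers (assumed parameters): 3-bit selector, 4-bit generator.
lfsr_sel = lfsr_bits([1, 0, 0], (1, 3))
lfsr_gen = lfsr_bits([1, 0, 0, 0], (1, 4))
keystream = list(islice(shrinking_generator(lfsr_sel, lfsr_gen), 16))
print(keystream)
```

Since roughly half of the selection bits are 0, about two clocks are needed on average per keystream bit, and the output arrives at irregular intervals.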

One obvious shortcoming of the Shrinking Generator is that it does not guarantee that a keystream bit will be output on any given clock cycle. Hence, the generator must utilize a higher clock frequency and an output buffer to ensure that there is always a keystream bit available. Kessler and Krawczyk proposed a model [24] for determining the required clock rate and buffer length in order to guarantee a certain miss probability. For large buffer sizes (e.g., 1024 bits) and modest clock rate increases (e.g., 2.5×), the miss probability can be effectively forced to zero.

The best known practical attack on the Shrinking Generator reduces its effective key size to $l_{sel}$, the length of LFSRsel ($2l_{sel}$ if the feedback polynomials are programmable).

Figure 6. Shrinking Generator.

Figure 7. Self-Shrinking Generator.

The Self-Shrinking Generator (SSG) [32] is a modification of the Shrinking Generator that was intended to reduce the amount of hardware required by incorporating both LFSRsel and LFSRgen into one LFSR (figure 7). On each clock cycle, the most significant bit of the LFSR is examined and if it is a “1” the second most significant bit of the LFSR is added to the keystream. The shift register is then clocked twice and the whole process is repeated with the two new most significant bits.
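A minimal Python sketch of the self-shrinking rule follows. It assumes the two most significant bits examined on each step can be modeled as consecutive bits of the LFSR's output sequence, which is an interpretation made for this example; the function name `self_shrinking` and the sample input are hypothetical.

```python
def self_shrinking(lfsr_output):
    """Consume LFSR output bits in pairs (a, b); emit b only when a == 1."""
    it = iter(lfsr_output)
    for a in it:
        b = next(it, None)
        if b is None:  # odd number of input bits: drop the incomplete pair
            return
        if a == 1:
            yield b


# Pairs: (1,0) -> 0, (0,1) -> skip, (1,1) -> 1, (0,0) -> skip, (1,0) -> 0
sample = [1, 0, 0, 1, 1, 1, 0, 0, 1, 0]
print(list(self_shrinking(sample)))
```

Each pair costs two register clocks and only about half of the pairs produce an output, so on average roughly four clocks are spent per keystream bit.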

One obvious drawback of the Self-Shrinking Generator is that the LFSR must be clocked at twice the rate of the Shrinking Generator, which in turn is operating at some multiple of the data rate. This higher clock rate will directly affect the power consumption of the algorithm.

Early cryptanalytic results in [32] demonstrate that the Self-Shrinking Generator has an effective key size of $0.375l$ for an LFSR of length $l$, or $0.875l$ if the feedback polynomials are programmable. However, it must be emphasized that these are preliminary results – the algorithm is still too new to have had a thorough cryptanalysis.

3.2. Alternating Stop and Go Generator

The Alternating Stop and Go Generator (ASG) [21] is an example of an LFSR-based stream cipher that utilizes clock control and an output combining function to generate a more secure pseudo-random keystream sequence.

The Alternating Stop and Go Generator XORs the outputs of two LFSRs (LFSR1 and LFSR2) whose clocks are gated by the output of a third LFSR (LFSR3). If the output of LFSR3 is “1” then LFSR1 is clocked, otherwise LFSR2 is clocked (figure 8).

The best known attack on the Alternating Stop and Go Generator is a correlation attack on LFSR3 [21] that reduces the effective key size to $l_3$, the length of LFSR3.
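The clock gating described above can be sketched in Python as follows. The register lengths, taps, and seeds are toy values, the class and function names are hypothetical, and the exact ordering of "gate, clock, then sample" within a cycle is an assumption of this model rather than a detail taken from [21].

```python
class LFSR:
    """Small Fibonacci LFSR model; taps are 1-based stage indices."""

    def __init__(self, seed, taps):
        self.state = list(seed)
        self.taps = taps
        self.clocks = 0  # number of times this register has been clocked

    @property
    def out(self):
        return self.state[-1]

    def clock(self):
        feedback = 0
        for i in self.taps:
            feedback ^= self.state[i - 1]
        self.state = [feedback] + self.state[:-1]
        self.clocks += 1


def alternating_stop_and_go(lfsr1, lfsr2, lfsr3, nbits):
    """LFSR3 decides which of LFSR1 / LFSR2 advances; the output is their XOR."""
    keystream = []
    for _ in range(nbits):
        if lfsr3.out == 1:
            lfsr1.clock()
        else:
            lfsr2.clock()
        lfsr3.clock()
        keystream.append(lfsr1.out ^ lfsr2.out)
    return keystream


r1 = LFSR([1, 0, 0, 0], (1, 4))
r2 = LFSR([1, 1, 0], (1, 3))
r3 = LFSR([1, 0, 1, 1, 0], (2, 5))
ks = alternating_stop_and_go(r1, r2, r3, 32)
print(ks)
```

Exactly one of LFSR1 and LFSR2 advances per output bit, so unlike the shrinking constructions this generator produces a keystream bit on every cycle.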

Figure 8. Alternating Stop & Go Generator.


Figure 9. GSM cipher (i.e., A5).

3.3. Proposed GSM cipher (i.e., A5)

The proposed stream cipher algorithm used in the GSM cellular network standard (i.e., A5) is somewhat of a mystery and still officially remains a secret. However, cryptographers have speculated as to its design and have proposed the construction described below [2] based on information that they have gleaned from various sources.

A5 utilizes three fixed-length LFSRs (LFSR19, LFSR22, and LFSR23) with fixed feedback polynomial connections (figure 9). The clock control signal for each register is derived by XORing that register's middle bit with the inverted carry-out bit from the summation of the three LFSRs' middle bits. This ensures that at least two of the three registers are clocked on any given cycle.
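The clock-control rule can be checked exhaustively with a short Python sketch. Only the control function described above is modeled; the feedback taps of the three registers are not given in the text and are not assumed here. The function name `clock_enables` is hypothetical.

```python
from itertools import product


def clock_enables(m19, m22, m23):
    """Return a clock-enable bit for each register given the three middle bits.

    The carry-out of the 3-bit sum is 1 when at least two middle bits are 1;
    each enable is the register's middle bit XOR the inverted carry.
    """
    carry = (m19 + m22 + m23) >> 1
    inverted_carry = carry ^ 1
    return tuple(inverted_carry ^ m for m in (m19, m22, m23))


for middle_bits in product((0, 1), repeat=3):
    enables = clock_enables(*middle_bits)
    print(middle_bits, "->", enables, "registers clocked:", sum(enables))
```

The carry-out equals the majority of the three middle bits, so a register is clocked exactly when its middle bit agrees with the majority; at most one register can disagree, which gives the "at least two of three" guarantee.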

Some early analysis of the proposed structure of A5 has led to the conclusion that it is far from secure due to its relatively short LFSRs and the fact that the feedback connections are fixed. A simple attack has been proposed [2] that reduces the effective key length to just 41 bits³ by guessing the contents of LFSR19 and LFSR22, determining the contents of LFSR23 from an examination of the resulting keystream, and then continuing on to determine if the guesses were in fact correct.

The security of A5 could be greatly improved by utilizing longer LFSRs and programmable feedback polynomials.

3.4. Reducing the power consumption

The major source of power dissipation in modern digital CMOS integrated circuits arises from the charging of load capacitances within the circuit and can be estimated by [12]

$$P_{switching} = C_{switched}\,V_{dd}^{2}\,f, \tag{5}$$

where $C_{switched}$ is the capacitance switched per sample, $V_{dd}$ is the supply voltage, and $f$ is the sampling frequency.

³ Some sources quote the effective key length on the order of 50 bits due to the setup overhead outlined by the GSM standard.

Figure 10. Propagation delay vs. supply voltage.

3.4.1. Conventional supply scaling

As noted in equation (5), the power dissipated in a digital CMOS circuit varies with the square of the supply voltage. Hence, it is desirable to reduce the supply voltage to the lowest possible level. Unfortunately, propagation delays in CMOS integrated circuits increase as the supply voltage is lowered according to the relation [12]

$$T_{delay} = \frac{k\,V_{dd}}{(V_{dd} - V_t)^{2}}, \tag{6}$$

where $k$ is a process dependent constant, $V_{dd}$ is the supply voltage, and $V_t$ is the threshold voltage of the process. Figure 10 shows the propagation delay vs. supply voltage characteristic for a single stage of a ring oscillator fabricated using a 0.6 µm CMOS process. This dependence limits the amount that the supply voltage can be reduced as the circuit must satisfy its throughput requirements. Hence the supply voltage is reduced until the critical path delay of the circuit equals the clock period required for the given throughput.
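The trade-off between equations (5) and (6) can be tabulated with a short Python sketch. The threshold voltage of 0.7 V, the constant $k = 1$, and the 5 V reference supply are assumptions chosen only for illustration; they are not measured values for the 0.6 µm process, and the function names are hypothetical.

```python
def switching_power(c_switched: float, vdd: float, freq: float) -> float:
    """Equation (5): P = C * Vdd^2 * f."""
    return c_switched * vdd**2 * freq


def propagation_delay(vdd: float, vt: float = 0.7, k: float = 1.0) -> float:
    """Equation (6): T = k * Vdd / (Vdd - Vt)^2 (vt and k are assumed values)."""
    if vdd <= vt:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return k * vdd / (vdd - vt) ** 2


V_REF = 5.0  # assumed reference supply
for vdd in (5.0, 3.3, 1.5, 1.1):
    power_ratio = switching_power(1.0, vdd, 1.0) / switching_power(1.0, V_REF, 1.0)
    delay_ratio = propagation_delay(vdd) / propagation_delay(V_REF)
    print(f"Vdd={vdd:.1f} V  power x{power_ratio:.3f}  delay x{delay_ratio:.1f}")
```

At a fixed clock frequency and switched capacitance, the power ratio depends only on the square of the supply ratio, while the delay penalty grows quickly as $V_{dd}$ approaches $V_t$; the lowest usable supply is the one at which the critical path still fits within the clock period.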

The critical path of LFSR-based stream ciphers is limited by either the XOR summation tree, the clock control circuitry, or the output combining function. Our experience has shown that these functions have very short delays and, at clock frequencies up to 10 MHz, will be able to operate at supply voltages down to 1.1 V.

3.4.2. Parallelizing the LFSRs

The shift registers used to construct the LFSRs can be parallelized as shown in figure 11. By N-way parallelizing the shift registers we are able to reduce the clock rate by a factor of N, which corresponds to a linear reduction in the power consumption. The costs of parallelizing the shift registers are the additional multiplexors and routing, as well as the control circuitry required to generate multiple clock phases for each of the parallel registers. The multiplexor, routing, and clock generation circuitry overhead is what ultimately determines the optimal degree of parallelization [5] (figure 12).

Figure 11. 2-way parallel shift register.

Figure 12. Degree of parallelism vs. normalized power.

Figure 13. 2-way parallel LFSR.

Parallelizing LFSRs is not quite so straightforward. The feedback tap positions must change their relative positions with time as the location of bit i within the LFSR changes from cycle to cycle. In general, for an N-way parallelized LFSR, the location of bit i can be any of N positions. The additional multiplexing required for the feedback taps increases the overhead dramatically (figure 13), making the optimum parallelization quite different from that of the simple shift register. Analysis using 520-bit matched filters with 32 feedback taps showed that the optimal degree of parallelism is only 2 [5].

Table 1. Layout/power statistics for stream cipher test chip.

| Design | Transistor count | Area (mm²) | Power (Vdd = 1.5 V) |
|---|---|---|---|
| Shrinking Generator | 7,998 | 0.98 | 65 µW |
| Self-Shrinking Generator | 4,167 | 0.52 | 108 µW |
| Alternating Stop & Go Generator | 11,923 | 1.48 | 33 µW |
| GSM cipher (A5) | 3,822 | 0.50 | 17 µW |

3.5. Hardware implementation and test results

The four LFSR-based stream ciphers were implemented using a 0.6 µm CMOS process (figure 14). All of the designs are quite similar, consisting of multiple LFSRs with algorithm-specific output combiner and clock-control circuitry, as described in sections 3.1–3.3. Three of the four designs feature variable length LFSRs that can range from 7 to 65 bits in length, and all feature fully-programmable feedback functions.

The entire test chip contained a total of 27,910 devices. The active circuitry of the ciphers occupied a total area of 3.5 mm², which could be significantly reduced by a more careful layout. The physical statistics of the individual ciphers are given in table 1.

Initial testing has verified correct operation at supply voltages down to 1.1 V. Power consumption measurements (table 1) have shown that the power consumption of the LFSR-based stream ciphers at a data rate of 1 Mbps is on the order of tens of microwatts.

4. Quadratic Residue Generator stream cipher

The Quadratic Residue Generator stream cipher (QRG) is based on the Blum–Blum–Shub generator (BBS) [7] which performs repeated modular squarings:

$$x_i = x_{i-1}^{2} \bmod n. \tag{7}$$

The least significant $\log_2(\log_2(x_i))$ bits of each result are used as the keystream. $n$ is the modulus used and is the product of two prime numbers, $p$ and $q$, that satisfy the relation $p \bmod 4 = q \bmod 4 = 3$. The initial seed to the generator, $x_0$, must be relatively prime to the modulus $n$.
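The recurrence in equation (7) is easy to model in Python using arbitrary-precision integers. The sketch below uses the toy modulus $n = 11 \times 19 = 209$ and seed $x_0 = 3$, which are assumptions chosen so the arithmetic can be followed by hand; a real modulus would be hundreds of bits long. The function name `qrg_bits` and the `bits_per_step` parameter are hypothetical.

```python
from math import gcd


def qrg_bits(x0: int, n: int, count: int, bits_per_step: int = 1):
    """Generate keystream bits by repeated modular squaring (equation (7)).

    bits_per_step least significant bits are taken from each result; the text
    uses about log2(log2(x_i)) bits, this toy example uses 1.
    """
    if gcd(x0, n) != 1:
        raise ValueError("seed must be relatively prime to the modulus")
    x, out = x0, []
    while len(out) < count:
        x = (x * x) % n
        for k in range(bits_per_step):
            out.append((x >> k) & 1)
    return out[:count]


p, q = 11, 19  # toy primes, both congruent to 3 mod 4
n = p * q
# Successive states from x0 = 3: 9, 81, 82, 36, 42
print(qrg_bits(3, n, 5))
```

Each keystream bit (or small group of bits) costs one full-width modular squaring, which is the source of the hardware complexity discussed in section 4.2.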

4.1. Security of the QRG

Unlike the LFSR-based stream ciphers discussed previously, the QRG generates a pseudo-random sequence that is cryptographically secure in the sense that even given all previous outputs of the QRG, the attacker has no better than a 50% chance of guessing the next output bit assuming that they do not know how to factor the modulus $n$. The security thus relies on the assumption that factoring composite numbers is computationally infeasible for large (e.g., > 512-bit) numbers; the same assumption that the RSA public-key algorithm [42] is based upon.

Figure 14. LFSR-based stream cipher test chip die photo.

Table 2. Algorithm parameters for several factoring algorithms.

| Algorithm | a | ν |
|---|---|---|
| General Number Field Sieve | $(64/9)^{1/3}$ | 1/3 |
| Special Number Field Sieve | $(32/9)^{1/3}$ | 1/3 |
| Quadratic Sieve | 1 | 1/2 |

There are numerous algorithms for factoring large integers (e.g., the Number Field Sieve [26] and the Quadratic Sieve [38]), the running times of which all have the general form

$$L\big(N, \nu, a + o(1)\big) = e^{(a + o(1))\,(\ln N)^{\nu}\,(\ln \ln N)^{1-\nu}}, \tag{8}$$

where $a$ and $\nu$ are algorithm-specific constants (e.g., table 2) and $N$ is the n-bit integer to be factored. The $o(1)$ term is difficult to calculate and is eliminated from the analysis by approximating the time required to factor $N$ given the known time to factor some integer $M$ using the expression

$$T_N = T_M\,\frac{L(N, \nu, a)}{L(M, \nu, a)}. \tag{9}$$

Practice has shown this to be a reasonable approximation [35]. Currently the best algorithm for factoring large integers is the General Number Field Sieve (GNFS). A variant of the GNFS is the Special Number Field Sieve (SNFS) that can be used with integers of the form $(a^k + b)$ where $a$ and $b$ are small (typically $a = 2$, $b = 1$). Using the current version of the GNFS, it is predicted that it will require $3 \times 10^{4}$ MIPS-years⁴ to factor a 512-bit number. Figure 15 shows how the amount of computation scales with modulus length for the GNFS, SNFS, and Quadratic Sieve (QS).

Figure 15. Amount of computation required for factoring large integers.

⁴ A MIPS-year is the number of computations performed by a 1 MIPS computer operating non-stop for a year ($\sim 3 \times 10^{13}$ computations).
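The scaling rule of equation (9) can be evaluated with a few lines of Python. The sketch anchors the estimate at the $3 \times 10^{4}$ MIPS-years quoted above for a 512-bit GNFS factorization and uses the constants from table 2; the function names and the choice of modulus lengths to print are assumptions, and the results are rough extrapolations that ignore the $o(1)$ term.

```python
from math import exp, log


def l_function(bits: int, nu: float, a: float) -> float:
    """Equation (8) with the o(1) term dropped, for an integer of `bits` bits."""
    ln_n = bits * log(2)
    return exp(a * ln_n**nu * log(ln_n) ** (1 - nu))


def mips_years(bits: int, nu: float, a: float,
               ref_bits: int = 512, ref_cost: float = 3e4) -> float:
    """Equation (9): scale a known factoring cost to a different modulus length."""
    return ref_cost * l_function(bits, nu, a) / l_function(ref_bits, nu, a)


GNFS = dict(nu=1 / 3, a=(64 / 9) ** (1 / 3))

for bits in (512, 640, 768, 1024):
    print(f"{bits:5d}-bit modulus: {mips_years(bits, **GNFS):.2e} MIPS-years (GNFS)")
```

Because the same reference point is reused for every length, the output should be read as a relative growth curve rather than as an absolute prediction.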


4.2. Modular multiplication algorithm

The price for such security is the complexity of the hardware required to perform the large bit-width modular squaring operation. The performance of the QRG depends entirely on the ability to compute this result quickly and efficiently.

High speed multiplication is for the most part a simple task – there are many very efficient algorithms available to a hardware designer (e.g., [28,36,53]). However, modular multiplication is considerably more difficult and as a result, care must be taken to select and/or develop an efficient algorithm that best suits the design requirements.

Modular multiplication can be performed in either of two ways: perform an n × n bit multiplication and then reduce the result utilizing a 2n × n bit division, or perform an n × n bit multiplication with concurrent partial reductions. Unfortunately the first approach has numerous inefficiencies: the intermediate result requires a 2n bit register, the n × n bit multiplication requires a time consuming 2n bit carry propagate addition, and the division circuit must be capable of handling a 2n bit operand. Using a simple array multiplier coupled with an iterative divider will require $2n + n^2$ operations, or approximately 0.25 ms per 512-bit modular multiplication assuming a typical gate delay is 1 ns. The end result is a slow and inefficient algorithm that generates a keystream at just 32 kbps. The second approach requires very little additional intermediate storage (e.g., two additional digits [50]), and is currently the algorithm of choice for performing modular multiplication. There are many existing algorithms in the literature (e.g., [33,37,50]) that utilize this concurrent approach to create very efficient implementations.
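The second approach, multiplication with concurrent partial reductions, can be illustrated with a simple radix-2 Python sketch. This is a generic interleaved shift-add-reduce scheme written for this example; it is not Takagi's algorithm [50], which additionally uses a higher radix and a redundant number representation. The function name `modmul_interleaved` and the test operands are hypothetical.

```python
def modmul_interleaved(a: int, b: int, n: int) -> int:
    """Compute (a * b) mod n, reducing the partial result on every iteration.

    The multiplier a is scanned from MSB to LSB. Because the running result r
    is kept below n, the intermediate value never exceeds 3n, so no 2n-bit
    product register is needed.
    """
    if not 0 <= b < n:
        raise ValueError("b must already be reduced modulo n")
    r = 0
    for i in reversed(range(a.bit_length())):
        r <<= 1                 # shift the partial result
        if (a >> i) & 1:
            r += b              # conditionally add the multiplicand
        while r >= n:           # partial reduction (at most two subtractions)
            r -= n
    return r


n = (1 << 127) - 1
a = (1 << 126) + 12345
b = (1 << 125) + 678
print(modmul_interleaved(a, b, n) == (a * b) % n)
```

The loop runs once per multiplier bit, which matches the roughly $n / \log_2 r$ iteration count of concurrent algorithms for radix $r = 2$; a higher radix would process several bits per iteration.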

The concurrent algorithms are iterative in nature, requiring approximately $n / \log_2 r$ iterations to perform an n bit modular multiplication, where $r$ is the radix of the multiplication. Hence, one way to speed up the algorithm is to utilize a higher radix. The costs of using a higher radix are the additional storage requirements, and increased cycle times due to operand recoding. Another method of speeding up the multiplication is to utilize a redundant number representation to eliminate time-consuming carry-propagation chains. Eliminating carry propagation has the added benefit of minimizing the glitching effects that occur when a carry pulse is propagated through a long addition chain. Glitching causes energy-consuming transitions that can then propagate throughout the circuit, possibly causing a significant increase in the power consumption. However, the cost of using a redundant number representation is that at some point the result must be converted into a non-redundant binary form, a step that will require a carry-propagating addition. Hence, care must be taken to choose an algorithm where this conversion is not in the critical path of the multiplier.

Takagi’s modular multiplication algorithm [50] was selected as it utilizes both a higher radix and a redundant number representation, without incurring most of the costs of either optimization. The operand recoding can be done in parallel with the next iteration of the loop, and both the algorithm’s inputs and outputs utilize the same redundant representation so the conversion to a non-redundant binary form can be done offline, in parallel with the next multiplication.

Figure 16. Takagi’s algorithm.

Takagi’s algorithm operates by performing both an n-bit multiplication and modular reduction (i.e., division) concurrently. On each iteration, the operands are scanned from MSB to LSB to form an intermediate result whose most significant digits are then scanned to form a quotient estimate that is used to partially reduce the intermediate result. The use of a quotient estimate requires that the final result undergo one additional iteration to account for any errors introduced by the approximations in the estimation process. The algorithm requires approximately $n/2$ iterations, each of which is performed within a single clock cycle. The formal description of Takagi’s algorithm is given in figure 16.

Takagi’s algorithm is ideally suited to a VLSI implementation due to its regular cellular array structure, which can be implemented in a bitslice structure (figure 17). One particularly nice feature of the architecture is that the logic depth, and hence the cycle time, of the multiplier is independent of the length of the modulus due to the use of redundant number representations that eliminate long carry chains. The use of redundant number representations requires that all interconnections between the bitslices be local, which eliminates the need for large global busses and their accompanying drivers. The elimination of long carry chains and global bussing allows for short cycle times (on the order of 20–30 ns) and hence fast modular multiplications, which is exactly what is required for implementing the QRG. Another nice feature of Takagi’s algorithm is that it is inherently scalable. To perform an m-bit multiplication (m < n) using an n-bit implementation of Takagi’s multiplier, all operands are aligned along the most significant digit and the algorithm is iterated for fewer cycles (i.e., m/2 + 1 instead of n/2 + 1). The unused least significant bitslices can be disabled in order to reduce the power consumption of the multiplier.

Figure 17. Architecture of Takagi’s multiplier.

4.3. Reducing the power consumption

The power consumption of the QRG-based cipher is prohibitively high due to the large word widths (∼512 bits). The use of redundant number representations amplifies this problem by requiring multiple bits to represent each digit of the operands (e.g., the digit set {−1, 0, 1} requires 2 bits per digit), all of which must be registered on the chip. The algorithm used requires numerous iterations as well. All of these factors add up to give a very large power dissipation, estimated using EPIC’s PowerMill power estimation tool [39] to be on the order of 938 mW at a supply voltage of 5 V and a data rate of 1 Mbps! The following techniques can be utilized to minimize the power consumption of the QRG, in an effort to make it a feasible part of a low power encryption system.

4.3.1. Concurrency driven voltage scaling

The critical path of the QRG (figure 18) was extracted and simulated using Hspice. Our simulations showed that the supply voltage must be kept at its maximum value of 5 V in order to operate at the 32 MHz clock rate needed to meet the 1 Mbps throughput requirement.

Figure 18. Critical path of Takagi’s multiplier.

A common technique for reducing the supply voltage, and hence the power dissipation, involves parallelizing the computation to increase the effective cycle time of the circuitry for a fixed throughput. By increasing the effective cycle time the circuit delays can be increased and the supply voltage can be reduced. The cost of parallelizing the computation is the area overhead in creating parallel implementations of the circuitry and the increase in the switched capacitance due to the multiple circuits and the additional multiplexing and routing that is required. However, the additional reduction in supply voltage will result in lower overall power consumption [12].
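The trade-off can be written out with the first-order switching-power relation (capacitance times voltage squared times frequency). The symbols below are introduced here for illustration and are assumptions, not part of the original analysis: $N_{\text{par}}$ is the number of parallel units, $\kappa \geq 1$ is a capacitance overhead factor for the extra multiplexing and routing, and $V_{\text{par}}$ is the reduced supply at which each unit still meets its relaxed cycle time.

```latex
P_{\text{ref}} = C\, V_{\text{ref}}^{2}\, f,
\qquad
P_{\text{par}} = \left(\kappa\, N_{\text{par}}\, C\right) V_{\text{par}}^{2}\, \frac{f}{N_{\text{par}}}
             = \kappa \left(\frac{V_{\text{par}}}{V_{\text{ref}}}\right)^{2} P_{\text{ref}}
```

Under these assumptions, parallelism reduces power whenever $\kappa\,(V_{\text{par}}/V_{\text{ref}})^{2} < 1$, i.e., whenever the quadratic gain from the lower supply outweighs the added switched capacitance.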

Investigation of Takagi’s algorithm revealed that only the computation of the next Y_j value can be performed in parallel with the current iteration of the loop, resulting in a slight reduction in the critical path and hence a small reduction in the supply voltage.

Another technique for reducing the supply voltage is to pipeline the computation by partitioning it into n steps, each of which needs to be completed in the original clock period. Thus delays can be increased by a factor of n and hence the supply voltage can be decreased by a factor of n. The costs of pipelining are the additional pipeline registers that must be added to the circuit and the increased latency of the pipeline (T_calc = n T_pd). Unfortunately, the iterative nature of Takagi’s algorithm, coupled with the dependency of each iteration on the results of the one before it, precludes the use of pipelining as a power reduction technique.

Figure 19. Activity factor per frame.

4.3.2. Exploiting signal statistics

In many applications, the average data rate is often less than the peak rate. For example, in a video camera system utilizing video compression, the throughput at the output of the video compression unit is much lower than the peak for the majority of the time (figure 19) due to the high correlation between video frames and the differential encoding of the video stream. Given a circuit with a maximum throughput of N bps and a current required throughput of n bps, the activity factor α can be defined as

$$\alpha = \frac{n}{N}, \tag{10}$$

where α represents the fraction of the sample time that the circuit must operate at full speed (i.e., N bps) in order to process the current required load. At a fixed supply, the circuit operates for a fraction α of the time and is powered down for the remaining fraction (1 − α). Hence, the average power is

$$P_{\text{avg}} = \bar{\alpha}\, C_{\text{bit}}\, V^{2}\, f_{\text{sample}}, \tag{11}$$

where $\bar{\alpha}$ is the average value of $\alpha$, $C_{\text{bit}}$ is the physical capacitance switched to generate one keystream bit, $V$ is the supply voltage, and $f_{\text{sample}}$ is the keystream bit rate. Hence, the power consumption can be reduced linearly by powering down the circuit.
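Equations (10) and (11) translate directly into a few lines of code. The sketch below is illustrative: the function names are hypothetical, and the capacitance used in the test that accompanies it is an arbitrary placeholder rather than a measured value.

```python
def activity_factor(required_bps: float, max_bps: float) -> float:
    """Equation (10): fraction of the time the circuit must run at full speed."""
    if max_bps <= 0 or not 0 <= required_bps <= max_bps:
        raise ValueError("need 0 <= required_bps <= max_bps and max_bps > 0")
    return required_bps / max_bps


def average_power_with_shutdown(required_bps_per_frame, max_bps,
                                c_bit, v_supply, f_sample):
    """Equation (11): average power at a fixed supply with power-down.

    required_bps_per_frame: required throughput for each frame (bps)
    c_bit: capacitance switched per keystream bit (farads)
    v_supply: fixed supply voltage (volts)
    f_sample: keystream bit rate at full speed (bits per second)
    """
    alphas = [activity_factor(r, max_bps) for r in required_bps_per_frame]
    mean_alpha = sum(alphas) / len(alphas)   # the averaged activity factor
    return mean_alpha * c_bit * v_supply ** 2 * f_sample
```

Because the supply is fixed, halving the average activity halves the average power; the saving is linear, which is the limitation the variable-supply approach is meant to overcome.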

4.3.3. Variable supply voltage

Further power reduction for the QRG can be achieved by utilizing a variable power supply controller that can monitor the workload of the encryption module and reduce the supply voltage adaptively to the minimum required value on a frame by frame basis. The basic idea is to lower the voltage when the activity is less than peak rather than working at a fixed supply and idling for a fraction of the time. By reducing the supply voltage, the power consumption can be reduced significantly more than when idling is used, due to the quadratic dependency of power on supply voltage.

Figure 20. Supply voltage per frame.

Figure 21. Normalized power consumption per frame.

Note that when the supply voltage is reduced the circuit delays increase (figure 10), so the clock frequency must be reduced as well. One way to scale the clock proportionally with the voltage supply is to use a ring oscillator connected to the varying supply voltage as the clock. The clock period will then track the supply voltage, as the delays of the inverters in the ring oscillator will vary with the supply voltage. Figure 20 shows the resulting supply voltage required per frame for the compressed video sequence of figure 19. A preliminary version of this power supply is described in [22].
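A rough sketch of how the per-frame supply could be chosen, and how the result compares with idling at a fixed supply, is given below. It rests on two assumptions that are not taken from the measurements in this paper: a first-order delay model in which gate delay is proportional to V/(V − V_T)^2, and a threshold voltage V_T of 0.8 V. The measured delay curve of figure 10 would replace `gate_delay` in a real design; all names are hypothetical.

```python
V_MAX = 5.0   # volts, full-speed supply quoted in the text
V_T = 0.8     # volts, ASSUMED threshold voltage (not from the paper)


def gate_delay(v: float) -> float:
    """ASSUMED first-order delay model, in arbitrary units."""
    return v / (v - V_T) ** 2


def supply_for_activity(alpha: float, tol: float = 1e-6) -> float:
    """Lowest supply whose delay is at most 1/alpha times the delay at V_MAX."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    allowed = gate_delay(V_MAX) / alpha
    lo, hi = V_T + 1e-3, V_MAX
    while hi - lo > tol:                  # bisection: delay falls as v rises
        mid = (lo + hi) / 2
        if gate_delay(mid) > allowed:
            lo = mid                      # too slow, a higher supply is needed
        else:
            hi = mid                      # fast enough, try a lower supply
    return hi


def normalized_power_shutdown(alpha: float) -> float:
    """Fixed supply: run at full speed for a fraction alpha, then power down."""
    return alpha


def normalized_power_variable(alpha: float) -> float:
    """Variable supply: same number of bits, each switched at a lower voltage."""
    v = supply_for_activity(alpha)
    return alpha * (v / V_MAX) ** 2
```

Under this model the ratio of the two schemes for a given frame is simply (V_MAX / V)^2, which is why the saving grows quickly as the activity drops.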

The above technique was applied to three compressed video streams. Figure 21 shows the resulting power consumption for sequence #1. The power consumption of the fixed supply scheme is shown for reference. Table 3 shows the results of utilizing this method for all three video streams.


Table 3
Power reduction factor using a variable supply relative to a fixed supply scheme.

Sequence #    Pfixed-supply/Pvariable-supply
1             7.14
2             9.04
3             7.00

4.3.4. Averaging the workload

Still another improvement can be made by introducing buffering at the encryption module input and averaging the workload over multiple frames. It can be shown that by averaging the workload the average power consumption is reduced, based on the argument that the power consumption vs. supply voltage curve is convex and hence satisfies Jensen’s inequality [14] (i.e., E[f(X)] ≥ f(E[X])).
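The effect of Jensen’s inequality can be seen numerically with a small sketch. The power model here is an assumption made purely for illustration: if the required supply voltage scaled linearly with the workload, normalized power would vary as the cube of the load, which is convex. The function names and the sample workload are hypothetical.

```python
def normalized_power(load: float) -> float:
    """ASSUMED convex power model: supply tracks load linearly, so P ~ load**3."""
    return load ** 3


def mean_power_with_averaging(loads, window: int) -> float:
    """Average power when the workload is smoothed over `window` frames.

    Each group of `window` frames is buffered and processed at the group's
    mean load, so every frame in the group sees the same operating point.
    """
    if window <= 0 or len(loads) % window:
        raise ValueError("window must be positive and divide len(loads)")
    total = 0.0
    for start in range(0, len(loads), window):
        chunk = loads[start:start + window]
        smoothed = sum(chunk) / window
        total += window * normalized_power(smoothed)
    return total / len(loads)
```

With a bursty load such as `[1.0, 0.1, 0.1, 0.1]`, widening the window from 1 to 2 to 4 frames lowers the average power at each step, at the cost of the extra buffering latency noted in the text.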

Averaging over multiple samples will increase latency, but for bursty data patterns, such as differentially encoded video signals with a frequent initialization frame (figure 22), it will result in lower average power consumption. Figure 23 shows the normalized power consumption for a bursty 128-frame compressed video sequence utilizing the aforementioned variable voltage supply technique. Figure 24 shows the normalized power consumption per frame for varying averaging intervals. Figure 25 shows the normalized power consumption as a function of the number of samples that are averaged for this video stream.

Figure 22. Activity factor per frame (bursty data).

Figure 23. Power consumption per frame (bursty data).

There is an inherent trade-off between the variable voltage supply and averaging techniques. Variable voltage supply techniques favour data streams that are highly correlated (i.e., when the data rate is low with infrequent peaks, such as in the aforementioned compressed video sequences), whereas averaging techniques favour bursty data streams (i.e., when the data rate has frequent peaks, such as in figure 22). Averaging techniques do not provide a significant amount of power reduction for non-bursty data streams (table 4), while variable supply voltage techniques do not provide as noticeable an effect for bursty data streams (e.g., a reduction of 3.36 for the bursty data stream given in this section). However, when the two techniques are combined the power reduction factor approaches an order of magnitude for both types of data streams (table 5). Hence the two techniques can be used in a complementary fashion to provide significant power reduction across a wider range of data stream patterns.

Figure 24. Power consumption per frame for varying sample sizes.

Figure 25. Power reduction factor for varying sample sizes relative to a fixed supply scheme.

Table 4
Power reduction factors using averaging relative to a fixed supply scheme.

              Number of frames per sample
Sequence #    1       2       4       8       16
1             7.14    7.65    7.86    8.06    8.36
2             9.05    9.88    10.09   10.17   10.38
3             7.00    7.39    7.53    7.66    7.75

Table 5
Power reduction factor of complementary scheme relative to a fixed supply scheme.

Sequence #    Pfixed/Pvar&average (averaged over 16 samples)
1             8.36
2             10.38
3             7.75
4             7.91

4.4. Hardware implementation and test results

The scalable nature of Takagi’s algorithm was exploited to yield power consumption estimates of a full-scale 512-bit implementation from an 8-bit test implementation that was fabricated in a 0.6 µm CMOS process (figure 26).

The registers, control logic, and digit selection circuitry were implemented using standard cell techniques. The bitslice, which will make up the majority of the full-scale multiplier chip, was implemented using a full-custom layout in order to minimize the area. The physical statistics of the design are given in table 6.

In order to satisfy the throughput requirements of the 1 Mbps data rate, the multiplier must operate at a clock frequency of 32 MHz. Preliminary test results have shown that the multiplier can operate at the required clock rate at a supply voltage of 4 V. The power consumption of the test chip operating at a data rate of 1 Mbps is 539 µW. Using the scalable nature of Takagi’s algorithm, the power consumption of a full-scale implementation is estimated to be 514 mW, operating at a clock frequency of 32 MHz and a supply voltage of 4 V. Utilizing the power reduction techniques outlined within section 4.3.2, the power consumption can be reduced by an order of magnitude to approximately 50 mW.

Table 6
Layout/power statistics for modular multiplier test chip.

Design                Device count    Area           Power (Vdd = 4 V)
Bitslice              246             170 × 90 µm    –
8-bit multiplier      5963            5.6 mm^2       539 µW
512-bit multiplier    ∼200,000        25 mm^2        514 mW

5. Scalability

Many transmitted data streams have a structured format containing both high priority and low priority information. For example, consider a differential video encoding scheme in which the initial frame is transmitted uncompressed, and then a sequence of difference frames is transmitted. In this example the initial frame would be labeled with a higher priority than the difference frames, as without it the difference frames yield little useful information, whereas the difference frames could be approximated using interpolation. The system designer can utilize their knowledge of the data stream’s structure to dynamically reconfigure the encryption module to provide varying levels of security for the data stream based on the priorities of the data being transmitted. This is similar to the idea of priority encoding (e.g., [1]) that is used to allocate additional error recovery coding for portions of the data stream that are deemed important (i.e., high priority), and reduced error correction coding for the lower priority portions of the data stream.

Figure 26. Modular multiplier test chip die photo.

Figure 27. Power consumption for varying multiplier widths.

Figure 28. MIPS-years/Watt for varying multiplier widths.

The security of the QRG scales exponentially with the size of the modulus, and hence the width of the modular multiplier, which can be dynamically scaled to meet the varying security requirements of the data stream. The power consumption of the multiplier varies approximately with the fourth power of the multiplier width (figure 27), while the security scales exponentially with the multiplier width (figure 15). Hence, for relatively small increases in the power consumption, it is possible to dramatically increase the amount of security of the QRG. Figure 28 shows the amount of computation required to factor the modulus per watt expended in the modular multiplier for varying modulus sizes. It illustrates the large increase in security that is possible for relatively small increases in power consumption.
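As a rough worked example of the fourth-power trend (the exponent is approximate, and the widths 256 and 512 are chosen here only for illustration), the relative power of two multiplier widths $w_1$ and $w_2$ can be estimated as

```latex
\frac{P(w_2)}{P(w_1)} \approx \left(\frac{w_2}{w_1}\right)^{4},
\qquad \text{e.g.} \quad \left(\frac{512}{256}\right)^{4} = 16
```

so doubling the width costs a fixed polynomial factor in power, whereas the attacker’s effort grows exponentially over the same range.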

The security of the LFSR-based stream ciphers scales exponentially with the lengths of the LFSRs used. The power consumption, however, scales linearly with the length of the LFSRs, and hence small increases in the power consumption can lead to large increases in security. Scalable LFSR-based stream ciphers can easily be constructed using variable length LFSRs as basic building blocks.
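A variable-length LFSR building block can be sketched as follows. This is a minimal illustration of a single Fibonacci-style register, not one of the stream ciphers evaluated in this paper, and a bare LFSR is not secure on its own. The class name, the table of tap sets and the `period` helper are hypothetical; the tap sets are commonly tabulated maximal-length choices and are assumptions here.

```python
# Tap sets (exponents of the feedback polynomial), keyed by register length.
# Illustrative assumptions, not taken from the paper.
TAPS_BY_LENGTH = {4: (4, 3), 16: (16, 14, 13, 11)}


class ScalableLFSR:
    """Fibonacci LFSR whose length is chosen at construction time."""

    def __init__(self, length: int, seed: int):
        if length not in TAPS_BY_LENGTH:
            raise ValueError("no tap set stored for this length")
        self.length = length
        self.taps = TAPS_BY_LENGTH[length]
        self.state = seed & ((1 << length) - 1)
        if self.state == 0:
            raise ValueError("the all-zero state never changes")

    def next_bit(self) -> int:
        out = self.state & 1                       # output the low bit
        feedback = 0
        for tap in self.taps:                      # XOR the tapped positions
            feedback ^= (self.state >> (self.length - tap)) & 1
        # Shift right and insert the feedback bit at the top.
        self.state = (self.state >> 1) | (feedback << (self.length - 1))
        return out


def period(length: int, seed: int = 1) -> int:
    """Count steps until the register returns to its starting state."""
    lfsr = ScalableLFSR(length, seed)
    start = lfsr.state
    steps = 0
    while True:
        lfsr.next_bit()
        steps += 1
        if lfsr.state == start:
            return steps
```

Going from the 4-bit to the 16-bit register multiplies the number of flip-flops by four, while the sequence period grows from 15 to 65535, which is the linear-cost, exponential-benefit scaling described above.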

6. A low power hybrid approach

Even utilizing the power reduction techniques developed earlier, the power consumption of the QRG is still on the order of 50 mW. While this may be low enough for some applications, it is prohibitive for others such as the aforementioned wireless camera, where the entire system power budget is on the order of 50 mW. Hence the QRG should not be used as a keystream generator in such applications. However, as previously stated, its cryptographically secure output would make it an excellent pseudo-random seed generator. If used in such a role, the generator could operate in the background at a much lower data rate and supply voltage. Given the reduced supply voltage and frequency, the power consumption of the hybrid system can be estimated using

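$$P_{\text{hybrid}} = P_{\text{QRG}} \left(\frac{V_{\text{hybrid}}}{V_{\text{QRG}}}\right)^{2} \left(\frac{f_{\text{hybrid}}}{f_{\text{QRG}}}\right) \alpha_{\text{hybrid}} = 514\ \text{mW} \left(\frac{V_{\text{hybrid}}}{4}\right)^{2} \left(\frac{f_{\text{hybrid}}}{1 \times 10^{6}}\right) \alpha_{\text{hybrid}}, \tag{12}$$

where we have utilized the measured values of $V_{\text{QRG}}$ and $f_{\text{QRG}}$.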

As an example, consider a 1 Mbps keystream that is reinitialized every frame (i.e., every 1/30 s), and that requires approximately 100 bits per initialization. The seed generator data rate is 100/(1/30) = 3 kbps, corresponding to an allowable delay increase by a factor of 1 Mbps/3 kbps = 333. At a supply voltage of 1 V (Vhybrid = 1 V) circuit delays increase by a factor of 33.33, so the clock frequency can be brought down to 32 MHz/33.33 = 960 kHz. At a 960 kHz clock frequency the QRG will output 960 kHz/256 (cycles/multiply) × 8 (bits/multiply) = 30 kbps (fhybrid = 0.03 MHz). Thus the generator need only operate 1/10 of the time (αhybrid = 0.1) and can be turned off for the majority of the time. This yields an estimated power consumption for the hybrid system of 96 µW, obtained by substituting these values into (12):

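$$P_{\text{hybrid}} = 514\ \text{mW} \left(\frac{1}{4}\right)^{2} \left(\frac{30 \times 10^{3}}{1 \times 10^{6}}\right) \left(\frac{1}{10}\right) = 96\ \mu\text{W}. \tag{13}$$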
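The arithmetic of (12) and (13) can be reproduced with a short script. The function names are hypothetical; the numerical defaults (514 mW, 4 V, 1 Mbps) and the example figures (100 seed bits per frame, 30 frames per second, a 33.33 times slowdown at 1 V, 256 cycles and 8 bits per multiplication) are the values quoted in the text.

```python
def hybrid_power_w(v_hybrid: float, f_hybrid_bps: float, duty: float,
                   p_qrg_w: float = 0.514, v_qrg: float = 4.0,
                   f_qrg_bps: float = 1e6) -> float:
    """Equation (12): scale the measured QRG power by voltage, rate and duty."""
    return p_qrg_w * (v_hybrid / v_qrg) ** 2 * (f_hybrid_bps / f_qrg_bps) * duty


def worked_example_w() -> float:
    """Numbers from the 1 Mbps, 30 frames/s example in the text."""
    seed_rate_bps = 100 * 30                    # 100 seed bits per 1/30 s frame
    clock_hz = 32e6 / 33.33                     # clock slowed for 1 V operation
    output_rate_bps = clock_hz / 256 * 8        # 8 bits per 256-cycle multiply
    duty = seed_rate_bps / output_rate_bps      # fraction of time switched on
    return hybrid_power_w(1.0, output_rate_bps, duty)
```

One point the code makes visible is that the product of the output rate and the duty cycle is just the seed rate, so at a given supply voltage the estimate depends only on how many seed bits per second are needed, not on how fast the generator is clocked while it is on.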

The final proposed system (figure 29) utilizes this low power hybrid approach: a simple, low-power LFSR-based stream cipher generates the output keystream, which is decoupled from the data stream through an output buffer, and the more complex and power-intensive QRG serves as the seed generator. At the time of re-synchronization the LFSR-based cipher is re-initialized with the pseudo-random bits in the seed buffer, and the feedback polynomial registers are loaded with values from a low-power polynomial ROM that are selected using several pseudo-random bits from the seed buffer. The estimated power consumption of the entire system is on the order of 150 µW.

Figure 29. Proposed low power encryption scheme.
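The reseeding flow can be sketched end to end as follows. This is a toy illustration under several assumptions, not the proposed hardware: the QRG is modelled as a quadratic-residue (x squared mod N) generator with a deliberately tiny, insecure modulus and one output bit per squaring; the keystream generator is a single 16-bit LFSR rather than a full LFSR-based cipher; and the two-entry polynomial ROM holds assumed tap sets. All names and parameters are hypothetical.

```python
from math import gcd

# Hypothetical polynomial ROM: illustrative tap sets, selected by one seed bit.
POLY_ROM = [(16, 14, 13, 11), (16, 15, 13, 4)]


class ToyQRG:
    """Toy quadratic-residue seed generator (insecure parameter sizes)."""

    def __init__(self, p: int, q: int, seed: int):
        if p % 4 != 3 or q % 4 != 3:
            raise ValueError("p and q must both be congruent to 3 mod 4")
        self.n = p * q
        if gcd(seed, self.n) != 1:
            raise ValueError("seed must be coprime to the modulus")
        self.x = seed * seed % self.n

    def bits(self, count: int) -> int:
        value = 0
        for _ in range(count):
            self.x = self.x * self.x % self.n      # one modular squaring
            value = (value << 1) | (self.x & 1)    # keep the low bit
        return value


def lfsr_keystream(taps, state: int, n_bits: int):
    """Generate n_bits from a Fibonacci LFSR with the given taps and state."""
    length = max(taps)
    out = []
    for _ in range(n_bits):
        out.append(state & 1)
        feedback = 0
        for tap in taps:
            feedback ^= (state >> (length - tap)) & 1
        state = (state >> 1) | (feedback << (length - 1))
    return out


def crypt_frames(frames, qrg: ToyQRG):
    """XOR each frame with a keystream that is reseeded from the QRG per frame."""
    result = []
    for frame in frames:
        taps = POLY_ROM[qrg.bits(1)]               # choose a feedback polynomial
        state = qrg.bits(16) or 1                  # avoid the all-zero state
        stream = lfsr_keystream(taps, state, 8 * len(frame))
        out = bytearray()
        for i, byte in enumerate(frame):
            key_byte = 0
            for bit in stream[8 * i:8 * i + 8]:
                key_byte = (key_byte << 1) | bit
            out.append(byte ^ key_byte)
        result.append(bytes(out))
    return result
```

Since encryption is an XOR with the keystream, running `crypt_frames` again with a `ToyQRG` built from the same parameters recovers the original frames. Each frame draws only 17 bits from the seed generator here, which mirrors why the QRG can run at a small fraction of the keystream rate.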

The proposed hybrid system also features the additional benefit that it increases the amount of work that is required by an attacker to crack the system. This results from the fact that the attacker is forced to restart their attack after every initialization period because of the pseudo-randomness of the seeds generated by the QRG. In addition, the constant reinitialization also serves to minimize the amount of information that is exposed by any given successful attack: just a single frame of video in the above example. If the frames are differentially encoded then it follows that the attacker must repeatedly crack the system until they succeed in uncovering an initial frame from which they can then decode the differential video stream. Hence the system is very computationally expensive to attack.

7. Conclusions

LFSR-based stream ciphers are ideally suited to low-power wireless communications as they can be constructed from very simple and power-efficient hardware, and they do not suffer from the effects of error propagation as block ciphers do.

Several techniques for reducing the power consumption of the encryption module have been proposed. A combination of varying the supply voltage and averaging the workload yielded an order of magnitude reduction in power consumption for the experiments conducted. The concept of scalable encryption was introduced to allow varying levels of encryption for data streams with varying priorities. In addition, a hybrid scheme was proposed in which the complex, but secure, QRG operates in the background at very low speeds and voltages to generate pseudo-random seeds for a faster, more power-efficient LFSR-based cipher. The estimated power consumption of this hybrid scheme is approximately 150 µW at a peak data rate of 1 Mbps.

Acknowledgements

The authors would like to thank Ingrid Verbauwhede for providing insightful comments at the outset of this project.

The work described within this paper is funded by DARPA contract DAAL-01-95-K3526.

References

[1] A. Albanese, J. Blomer, J. Edmonds, M. Luby and M. Sudan, Priority encoding transmission, in: Proceedings of 35th FOCS (1994) pp. 604–612.

[2] R.J. Anderson, Posting to sci.crypt USENET newsgroup (June 17, 1994).

[3] R.J. Anderson, On Fibonacci keystream generators, in: Fast Software Encryption – Second International Workshop (1995) pp. 346–352.

[4] A. Aziz and W. Diffie, Privacy and authentication for wireless local area networks, IEEE Personal Communications (First Quarter 1994) 25–31.

[5] T. Barber, BodyLAN: A low power communications system, S.M. Thesis, Massachusetts Institute of Technology (1996).

[6] T. Beth and F.C. Piper, The stop-and-go generator, in: Advances in Cryptology: Proceedings of EUROCRYPT ’84 (1984) pp. 88–92.

[7] L. Blum, M. Blum and M. Shub, A simple unpredictable pseudo-random number generator, SIAM Journal on Computing 15 (1986) 364–383.

[8] D. Brown, Techniques for privacy and authentication in personal communication systems, IEEE Personal Communications (August 1995) 6–10.

[9] J.O. Bruer, On pseudo random sequences as crypto generators, in: Proceedings of the International Zurich Seminar on Digital Communication (1984).

[10] Cellular Telecommunications Industry Association, National phone fraud expert testifies in favour of Maryland State Police proposal, CTIA Press Release (January 31, 1996).

[11] A.P. Chandrakasan, Low Power Digital CMOS Design (Kluwer Academic Publishers, Boston, 1995).

[12] A.P. Chandrakasan, S. Sheng and R.W. Brodersen, Low-power CMOS digital design, IEEE Journal of Solid-State Circuits 27 (April 1992) 473–484.

[13] D. Coppersmith, H. Krawczyk and Y. Mansour, The shrinking generator, in: Advances in Cryptology – CRYPTO ’93 Proceedings (1994) pp. 22–39.

[14] T.M. Cover and J.A. Thomas, Elements of Information Theory (Wiley, New York, 1991) p. 25.

[15] W. Diffie and M. Hellman, New directions in cryptography, IEEE Transactions on Information Theory 22 (November 1976) 644–654.

[16] W. Diffie, P.C. van Oorschot and M.J. Wiener, Authentication and authenticated key exchanges, Designs, Codes and Cryptography 2 (1992) 107–125.

[17] J. Eckhouse, Hackers hurt cellular industry, San Francisco Chronicle (January 25, 1993) C1.

[18] European Telecommunications Standards Institute, Security aspects, Recommendation GSM 02.09.

[19] P.R. Geffe, How to protect data with ciphers that are really hard to break, Electronics 46 (January 1973) 99–101.

[20] S.W. Golomb, Shift Register Sequences (Holden-Day, San Francisco, 1967).

[21] C.G. Gunther, Alternating step sequences controlled by de Bruijn sequences, in: Advances in Cryptology – EUROCRYPT ’87 Proceedings (1988) pp. 5–14.

[22] V. Gutnik and A.P. Chandrakasan, An efficient controller for variable supply-voltage low power processing, in: 1996 Symposium on VLSI Circuits. Digest of Technical Papers (1996).


[23] S.M. Jennings, Multiplexed sequences: some properties of the minimum polynomial, in: Cryptography: Proceedings of the Workshop on Cryptography (1983) pp. 189–206.

[24] I. Kessler, Minimum buffer length and clock rate for the shrinking generator cryptosystem, IBM Research Report RC 19938 (88322) (February 17, 1995).

[25] X. Lai, On the Design and Security of Block Ciphers, ETH Series in Information Processing 1 (Konstanz: Hartung-Gorre Verlag, 1992).

[26] A.K. Lenstra, H.W. Lenstra Jr, M.S. Manasse and J.M. Pollard, The number field sieve, in: Proceedings of the Twenty Second Annual ACM Symposium on Theory of Computing (1990) pp. 564–572.

[27] H. Lin and L. Harn, Authentication in wireless communications, in: Proceedings of GLOBECOM ’93 (1993) pp. 550–554.

[28] H. Makino et al., An 8.8 ns 54*54 bit multiplier with high speed redundant binary architecture, IEEE Journal of Solid-State Circuits 31 (June 1996) 773–783.

[29] J.L. Massey, Shift-register synthesis and BCH decoding, IEEE Transactions on Information Theory 15 (January 1969) 122–127.

[30] J.L. Massey, An introduction to contemporary cryptology, Proceedings of the IEEE 76 (May 1988) 533–549.

[31] J.L. Massey, SAFER K-64: A byte-oriented block-ciphering algorithm, in: Fast Software Encryption, Cambridge Security Workshop Proceedings (1994) pp. 1–17.

[32] W. Meier and O. Staffelbach, The self-shrinking generator, in: Advances in Cryptology – EUROCRYPT ’94 Proceedings (1995) pp. 205–214.

[33] H. Morita, A fast modular-multiplication algorithm based on a higher radix, in: Advances in Cryptology – CRYPTO ’89 Proceedings (1990) pp. 387–399.

[34] National Institute of Standards and Technology, Data Encryption Standard (NIST FIPS PUB 46-), U.S. Department of Commerce (December 1993).

[35] A.M. Odlyzko, The future of integer factorization, CryptoBytes, RSA Laboratories 1 (Summer 1995) 5–12.

[36] N. Ohkubo et al., A 4.4 ns CMOS 54*54-b multiplier using pass-transistor multiplexers, in: Proceedings of IEEE Custom Integrated Circuits Conference – CICC ’94 (1994) pp. 599–602.

[37] H. Orup and P. Kornerup, A high-radix hardware algorithm for calculating the exponential M^E modulo N, in: Proc. 10th IEEE Symposium on Computer Arithmetic (1991) pp. 51–57.

[38] C. Pomerance, The quadratic sieve factoring algorithm, in: Advances in Cryptology – Proceedings of EUROCRYPT ’84 (1985) pp. 169–182.

[39] PowerMill User Manual Release 3.4 (EPIC Design Technologies, Inc., 1995).

[40] J.M. Rabaey and M. Pedram, eds., Low Power Design Methodologies (Kluwer Academic Publishers, Boston, 1996).

[41] R.L. Rivest, The RC5 encryption algorithm, in: Fast Software Encryption – Second International Workshop (1995) pp. 86–96.

[42] R.L. Rivest, A. Shamir and L.M. Adleman, A method for obtaining digital signatures and public-key cryptosystems, Communications of the ACM 21 (February 1978) 120–126.

[43] M. Roe, Performance of block ciphers and hash functions – one year later, in: Fast Software Encryption – Second International Workshop (1994) pp. 359–362.

[44] P. Rogaway and D. Coppersmith, A software optimized encryption algorithm, in: Fast Software Encryption, Cambridge Security Workshop Proceedings (1994) pp. 56–63.

[45] R.A. Rueppel, Analysis and Design of Stream Ciphers (Springer, 1986).

[46] B. Schneier, Description of a new variable-length key, 64-bit block cipher (Blowfish), in: Fast Software Encryption, Cambridge Security Workshop Proceedings (1994) pp. 191–204.

[47] B. Schneier, The GOST encryption algorithm, Dr. Dobb’s Journal 20 (January 1995) 123–124.

[48] B. Schneier, Applied Cryptography (Wiley, New York, 2nd ed., 1996).

[49] D. Stinson, Cryptography: Theory and Practice (CRC Press, Boca Raton, 1995).

[50] N. Takagi, A radix-4 modular multiplication hardware algorithm for modular exponentiation, IEEE Transactions on Computers 41 (August 1992) 949–956.

[51] D. Wheeler, A bulk data encryption algorithm, in: Fast Software Encryption, Cambridge Security Workshop Proceedings (1994) pp. 56–63.

[52] J.E. Wilkes, Privacy and authentication needs of PCS, IEEE Personal Communications (August 1995) 11–15.

[53] R.K. Yu and G.B. Zyner, 167 MHz radix-4 floating point multiplier, in: Proceedings of the 12th Symposium on Computer Arithmetic (1995) pp. 149–154.

James Goodman received the B.A.Sc. degree in electrical engineering from the University of Waterloo, Waterloo, Canada, in 1994. He received his M.S. degree in electrical engineering and computer science from the Massachusetts Institute of Technology, Cambridge, in 1996, where he is currently pursuing his Ph.D. His research interests include low power implementation of cryptographic algorithms and protocols for wireless systems, and low power asynchronous design. He has held a variety of industrial positions both as a student and full-time engineer at companies such as Bell-Northern Research Ltd., CAE Electronics Ltd., and DY-4 Electronics Inc., working on a wide variety of projects ranging from virtual reality hardware engines to real-time CASE tools.
E-mail: [email protected]

Anantha P. Chandrakasan received the B.S., M.S. and Ph.D. degrees in electrical engineering and computer sciences from the University of California, Berkeley, in 1989, 1990 and 1994, respectively. Since September 1994, he has been the Analog Devices career development assistant professor of Electrical Engineering at the Massachusetts Institute of Technology, Cambridge. He received the NSF Career Development Award in 1995, the IBM Faculty Development Award in 1995 and the National Semiconductor Faculty Development Award in 1996. He received the IEEE Communications Society 1993 Best Tutorial Paper Award for the IEEE Communications Magazine paper entitled “A Portable Multimedia Terminal”. His research interests include the ultra low power implementation of custom and programmable digital signal processors, wireless sensors and multimedia devices, emerging technologies, and CAD tools for VLSI. He is a co-author of the book entitled “Low Power Digital CMOS Design”, published by Kluwer Academic Publishers. He has served on the technical program committee of various conferences including ISSCC, DAC, ISLPED, and ICCD. He is the technical program co-chair for the 1997 International Symposium on Low-power Electronics and Design.
