Fiber-optic array algebraic processing architectures

Nam Q. Ngo and Le Nguyen Binh

A high-accuracy fiber-optic array processor (FOAP) based on the algorithm of digital multiplication by analog convolution is proposed. The FOAP architecture is a local regularly interconnected processor that utilizes an array of identical all-optical elemental-processing lattice units, namely, an optical splitter, an optical combiner, and a binary programmable fiber-optic transversal filter. Various FOAP matrix multipliers are proposed for nonnegative and twos-complement binary arithmetic matrix–vector, matrix–matrix, triple-matrix, and high-order matrix operations. The overall performances of the FOAP matrix multipliers are compared with the time-integrating and space-integrating architectures and with the digital multipliers. Extension of the digital-multiplication-by-analog-convolution algorithm is also considered.

Key words: Optical computing, optical information processing, fiber-optic array processing, discrete convolution, twos-complement binary arithmetic, fiber-optic transversal filters, code generation.

1. Introduction

Optical computing has drawn great attention due to the inherently ultrafast and parallel-processing capability of many optical processors [1]. Multiplication is one of the most important mathematical operations for image processing, signal processing, and computing tasks. Incoherent optical analog processors have been proposed to perform systolic matrix–vector multiplication with fiber-optic (FO) scattering processors [2] and vector–matrix and matrix–matrix products with electro-optical engagement-array architecture [3] and acousto-optic (AO) convolvers [4–6]. However, the drawback of these inherently analog processors is that they are limited to the low-accuracy (8–10 bits) range, although they often require less hardware than their optical digital counterparts.

Fortunately, this problem was overcome when Whitehouse and Speiser [7] demonstrated to the optical-signal-processing community that the multiplication of two digital numbers (originally known as the Swartzlander multiplier [8]) is equivalent to the convolution of their binary representations, provided that the analog result of the digital product is represented in the mixed-binary format. This digital multiplication by analog convolution has since been known as the DMAC algorithm in the literature. An analog-to-digital converter (ADC) and a shift-and-add (S/A) circuit are required in the postprocessing unit in order to achieve the standard binary representation of the digital product. This significant finding was extensively pursued and first applied to optics by Psaltis et al. [9], who used a space-integrating (SI) AO convolver architecture. This basic idea has since been exploited and applied to matrix–vector and matrix–matrix operations with various optical architectures [4–6,10–21]. Most of these optical processors have been developed to perform nonnegative matrix operations. However, complex matrix computation can be achieved by partitioning of the complex numbers into four unipolar parts [22], while real-valued matrix calculations can be achieved with twos-complement binary (TCB) arithmetic [4,13,19,23]. The DMAC technique is attractive to many optical scientists and engineers because many optical processors are capable of performing high-speed discrete-time convolution.

Photonic incoherent signal processors comprising low-loss (0.1 dB/km at 1.5-µm wavelength) and large time–bandwidth product ($> 10^5$) single-mode optical fiber and low-loss (< 0.1 dB) tunable fiber directional couplers are important for processing broadband-modulated signals in signal-processing systems. They have been demonstrated to be capable of performing systolic analog matrix–vector multiplication, frequency filtering, code generation, matched filtering, and high-speed pulse generation [2,24].

In this paper we describe a fiber-optic array processor (FOAP) based on the DMAC technique that utilizes three all-optical elemental-processing lattice units, namely, an optical splitter, an optical combiner, and a binary programmable optical transversal filter, to perform positive and real-valued matrix–vector, matrix–matrix, triple-matrix, and high-order matrix operations. These three basic building block optical components consist of simple 3-dB fiber directional couplers and erbium-doped fiber amplifiers (EDFA's), and their basic operations are discussed in Section 2. A brief review of the DMAC algorithm and its extension and also TCB arithmetic are presented in Section 3. Section 4 discusses various DMAC FOAP multipliers for matrix operations. Section 5 describes the performance comparison between the FOAP and the AO matrix multipliers and the digital multipliers. Section 6 continues with discussion, and a conclusion is given in Section 7. Mainly binary-encoded data are considered in this paper unless otherwise stated.

The authors are with the Department of Electrical and Computer Systems Engineering, Monash University, Clayton, Victoria 3168, Australia.
Received 5 October 1993; revised manuscript received 6 August 1994.
0003-6935/95/050803-13$06.00/0.
© 1995 Optical Society of America.
10 February 1995 / Vol. 34, No. 5 / APPLIED OPTICS 803

2. All-Optical Elemental-Processing Lattice Structures

It is assumed that all optical processors considered in this paper operate with light intensities that are always nonnegative. The optically encoded signals to be processed are modulated onto the optical carrier whose coherence time $T_p$ is very short compared with the basic time delay $T$ in the processor, i.e., $T_p \ll T$. It is assumed that the modulated laser pulse has a pulse width $T_p$ and a height of one unit. Encoding and sampling of the data can be accomplished either directly by modulation of the injection current by gain-switched lasers [25] or by modulation of the intensity of the light in an electro-optic modulator such as a LiNbO3 waveguide electro-optic device [26]. Thus the phase of the optical carrier is not important, and the signals add on an intensity basis. In other words, incoherent optical processors are dealt with in this paper. This section describes several FO elemental-processing lattice structures, namely, an optical splitter, an optical combiner, and a binary programmable optical transversal filter. They serve as the basic building block optical components for matrix computations, which are discussed in Section 4.

2.A. Optical Splitter and Optical Combiner

The $1 \times n$ optical splitter and the $n \times 1$ optical combiner are shown schematically in Figs. 1(a) and 2(a), and their block-diagram representations are shown in Figs. 1(b) and 2(b), respectively. The $1 \times n$ optical splitter distributes the incoming light intensity evenly into $n$ output fiber lines, while the $n \times 1$ optical combiner collects light intensity of $n$ input fiber lines into a single output. Each $1 \times n$ splitter and $n \times 1$ combiner is constructed from $n - 1$ 3-dB FO directional couplers arranged in several stages in a binary-tree fashion.

The signal intensity coming into the optical splitter will experience a 3-dB coupling loss at every stage of the tree, and this results in a common power loss of $10 \log n$ (dB) at each of the $n$ output lines of the power divider device. An EDFA gain, $G_{\text{split}}$, is incorporated at the start of the tree to compensate for such a loss. Thus the signal intensity at each of the output lines is equal to the incoming signal intensity provided that $G_{\text{split}} = 10 \log n$ (dB). Likewise, the incoming signal intensity at each of the $n$ input lines of the optical combiner will also experience a common loss of $10 \log n$ (dB) at the end of the tree. The signal intensity of each input line is recovered at the output line, provided that an in-line EDFA gain, $G_{\text{comb}} = 10 \log n$ (dB), is incorporated at the output of the tree. The integer value $n$ can take a large value because the optical gain varies as the logarithm of $n$, and current EDFA technology can provide gain up to 40 dB without having the device working in the saturation region. An EDFA is considered as the preferred amplifier source because it provides large spectral bandwidth, low noise level, and is more compatible with the single-mode fiber used in the processors than other semiconductor amplifiers [27]. $T_{\text{split}}$ and $T_{\text{comb}}$ are the propagation delays of the splitter and the combiner, respectively, arising from the fiber length of the EDFA's.

Fig. 1. Optical splitter: (a) schematic showing several stages of $2 \times 2$ 3-dB fiber directional couplers in a binary-tree structure, (b) block-diagram representation.

Fig. 2. Same as Fig. 1 but for an optical combiner.
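The loss budget of the binary-tree splitter can be checked numerically. The following Python sketch is illustrative only: the function name `splitter_budget` is hypothetical, and it assumes that $n$ is a power of two, that the couplers are ideal (each stage halves the intensity), and that the logarithm in $10 \log n$ is base 10, as is usual for decibels.

```python
import math


def splitter_budget(n):
    """Stage count, coupler count, and common loss of a 1 x n tree splitter.

    Assumptions (not from the paper): n is a power of two and every
    3-dB coupler is ideal, so each stage halves the light intensity.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    stages = n.bit_length() - 1        # log2(n) stages in the binary tree
    couplers = n - 1                   # 1 + 2 + 4 + ... + n/2 couplers
    loss_db = 10 * math.log10(n)       # common loss at each output line
    return stages, couplers, loss_db


if __name__ == "__main__":
    for n in (2, 8, 128):
        stages, couplers, loss_db = splitter_budget(n)
        # The EDFA gain G_split that restores the input intensity equals loss_db.
        print(f"n={n:4d} stages={stages} couplers={couplers:4d} "
              f"G_split={loss_db:5.2f} dB")
```

Under the 40-dB gain figure quoted above, the condition $10 \log n \le 40$ dB corresponds to $n \le 10^4$, which is why the fan-out can be large.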

2.B. Binary Programmable Fiber-Optic Transversal Filter

The binary programmable FO transversal filter (or FO convolver) is shown schematically in Fig. 3(a), and the block diagram representation is shown in Fig. 3(b). The binary-code selector chooses the fiber delay paths through which the signal intensity from the optical splitter is routed. This can be achieved by the use of $n$ $2 \times 2$ or $n$ $1 \times 2$ optical switches, corresponding to the input code word $b_i = \{s_0 s_1 \ldots s_{n-1}\}$; $s_0$ and $s_{n-1}$ designate the least-significant bit (LSB) and the most-significant bit (MSB), respectively. The binary-code selector is bit parallel loaded into the FO convolver, which then simultaneously activates the switches according to the code word $b_i$. Each optical switch permits the light intensity either to connect (binary state $s_i = 1$) or to bypass (binary state $s_i = 0$) a particular fiber delay path. The optical combiner collects the signal intensity according to the binary state of the switches. The optical splitter and combiner are those of Figs. 1 and 2, respectively, except that there is no incorporation of an EDFA, i.e., $G_{\text{split}} = G_{\text{comb}} = 0$ dB. Instead, the EDFA gain $G_f$ at the start of the splitter is sufficient to compensate for the common losses of the splitter and the combiner. That is, the signal intensity at the output line of the FO convolver is the same as the incoming signal intensity provided that $G_f = 20 \log n$ (dB), where $n$ is the number of binary bits in a word $b_i$. The total number of fiber couplers required is $2(n - 1)$. $T_f$ is the propagation delay, $\tau$ is the smallest time delay, and $T$ is the delay difference between adjacent bits in the filter impulse response. $T$ is referred to as the sampling period of the filter (or bit time), and $1/T$ is the filter sampling frequency. Provided that $G_f = 20 \log n$ (dB), the filter impulse response is given by

$$
h(t) = \sum_{i=0}^{n-1} s_i \, \delta(t - T_{FO} - iT), \tag{1}
$$

where $s_i \in \{0, 1\}$, $\delta(t)$ is the delta function, and $T_{FO} = \tau + T_f$ is the filter propagation delay.

Another form of the FO programmable transversal filter is in the forward-flow bus structure [27] that has a common signal loss of $3(n + 1)$ dB at the output and requires $2n$ fiber directional couplers. The bus structure has loss that varies linearly with the number of bits $n$, but the common loss of our transversal filter (i.e., $20 \log n$ dB) varies as the logarithm of $n$. The proposed binary programmable optical transversal filter is thus chosen here because of its major advantages in terms of less loss and fewer couplers required. The significant advantage of a binary (or analog [27]) optical programmable transversal filter is its adaptive feature, permitting a particular filter impulse response to be easily obtained by means of external electronic and/or optical control.

Fig. 3. (a) Schematic of the binary programmable fiber-optic transversal filter (FO convolver), (b) block-diagram representation.
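To make the tap structure of Eq. (1) and the loss comparison concrete, here is a minimal Python sketch. The function names (`impulse_response`, `tree_loss_db`, `bus_loss_db`) are hypothetical, the code word is assumed to be given LSB first, and the logarithm is taken as base 10.

```python
import math


def impulse_response(code_bits, T=1.0, T_FO=0.0):
    """Taps (delay, weight) of Eq. (1) for a code word {s_0 ... s_{n-1}}.

    code_bits is assumed LSB first; a tap exists only where s_i = 1,
    at delay T_FO + i*T.
    """
    return [(T_FO + i * T, s) for i, s in enumerate(code_bits) if s]


def tree_loss_db(n):
    """Common loss of the tree-based filter: 20 log n dB."""
    return 20 * math.log10(n)


def bus_loss_db(n):
    """Common loss of the forward-flow bus structure: 3(n + 1) dB."""
    return 3 * (n + 1)


if __name__ == "__main__":
    # Code word for decimal 6, {110} written MSB first, i.e. [0, 1, 1] LSB first.
    print(impulse_response([0, 1, 1], T=1.0, T_FO=0.5))
    for n in (4, 8, 16, 32):
        print(f"n={n:2d} tree={tree_loss_db(n):6.2f} dB "
              f"bus={bus_loss_db(n):3d} dB")
```

The loop shows how quickly the linear bus loss outgrows the logarithmic tree loss as the word length increases.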

3. Digital-Multiplication-by-Analog-Convolution Algorithm and Its Extended Version

This section considers only the binary-encoded numbers, although the DMAC technique is generally valid for data encoded in any radix [14,17]. Subsection 3.A describes the well-known DMAC algorithm through a graphical technique that helps in understanding the operation of the FO convolver. Subsection 3.B considers the extension of the DMAC algorithm to digital multiplication of $x$ positive $n$-bit binary numbers, which to our knowledge has not been discussed before in the literature. Subsection 3.C considers the FOAP architectures incorporating the TCB arithmetic.

3.A. Digital-Multiplication-by-Analog-Convolution Algorithm

The DMAC algorithm can be best understood through a simple example of multiplying two unsigned $n$-bit words whose decimal numbers are $f = 6$ and $g = 3$. The standard binary representations of these sequences are $f = \{110\}$ and $g = \{011\}$, respectively. Figures 4(a) and 4(b) show the discrete-time representation of these sequences, with $f[i]$ being the FO convolver impulse response and $g[i]$ representing the modulated input light pulses (rectangular profile assumed) of unity height to be launched into the convolver. The convolver pulse response, which is simply the discrete convolution of its impulse response $f[i]$ with the input pulse sequence $g[i]$, can be written as

$$
y[k] = \sum_{i=0}^{k} f[i] \, g[k - i], \qquad (0 \le k \le 2n - 2), \tag{2}
$$

which can be evaluated graphically as shown in Figs. 4(c)–4(g) for $n = 3$. $g[i]$ is folded about $i = 0$ to become $g[-i]$, as shown in Fig. 4(c), and is slid past (with the LSB first) the digits of $f[i]$. The sequence $g[k - i]$ is simply the folded sequence of $g[i]$ shifted to the right by $k$ units of delay $T$, as depicted in Figs. 4(c)–4(g) for $k = 0, 1, 2, 3, 4$, respectively. The pulse shown by the dotted curve corresponds to the LSB of the next word. $t_0$ and $t_1$ denote the starting times of the LSB of the first and the second words, respectively. The word strobe period $T_w = t_{j+1} - t_j$ is the time separation between the LSB of the previous word and the LSB of the next word and is given by $T_w = (2n - 1)T$. There are $n - 1$ zeros for padding between the words to avoid overflow. The analog result $y[k]$ is shown in Fig. 4(h), with $T_{FO}$ being the propagation delay of the FO convolver, which has been ignored during the convolution process in Figs. 4(c)–4(g) for the sake of clarity. These mixed-binary intensity pulses are converted into their binary representations by a 2-bit ADC and shifted by a 2-bit S/A circuit in the postprocessing unit, as shown in Fig. 4(i). The standard binary representation $\{010010\}$ corresponds to the integer 18, which is expected from the product (6)(3).
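The worked example can be reproduced in a few lines of Python. This is a sketch for illustration, not the authors' implementation: the helper names `to_bits`, `convolve`, and `shift_and_add` are hypothetical, and sequences are stored LSB first so that index $k$ carries weight $2^k$.

```python
def to_bits(value, n):
    """LSB-first list of the n binary digits of a nonnegative integer."""
    return [(value >> i) & 1 for i in range(n)]


def convolve(f, g):
    """Discrete convolution of Eq. (2); the result has len(f)+len(g)-1 points."""
    y = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            y[i + j] += fi * gj
    return y


def shift_and_add(y):
    """Postprocessing: weight the k-th mixed-binary point by 2**k and sum."""
    return sum(yk << k for k, yk in enumerate(y))


if __name__ == "__main__":
    f = to_bits(6, 3)            # convolver impulse response, {110}
    g = to_bits(3, 3)            # input pulse sequence, {011}
    y = convolve(f, g)
    print(y)                     # [0, 1, 2, 1, 0]  (mixed-binary, LSB first)
    print(shift_and_add(y))      # 18
```

The largest mixed-binary value here is 2, which is consistent with the 2-bit ADC mentioned in the text.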

Fig. 4. Graphical illustration of the DMAC technique: (a) $g[i]$ is the laser input pulse sequence, (b) $f[i]$ is the convolver impulse response, (c)–(g) the convolution operation, (h) the convolution output $y[k]$ is in the mixed-binary format, (i) the analog output is operated on by the postprocessing unit to obtain the expected standard binary representation of the decimal number 18.

3.B. Extension of the Digital-Multiplication-by-Analog-Convolution Algorithm

It is shown here that the digital multiplication of $x$ $n$-bit binary numbers is equivalent to convolving their binary representations. The product $P$ of such $x$ $n$-bit words is given by

$$
\begin{aligned}
P &= f(1)\, f(2) \cdots f(x-1)\, f(x) \\
  &= \left[\sum_{\alpha=0}^{n-1} f(1,\alpha)\, 2^{\alpha}\right]
     \left[\sum_{\beta=0}^{n-1} f(2,\beta)\, 2^{\beta}\right] \cdots
     \left[\sum_{a=0}^{n-1} f(x-1,a)\, 2^{a}\right]
     \left[\sum_{b=0}^{n-1} f(x,b)\, 2^{b}\right],
\end{aligned}
\tag{3}
$$

where $f(1,\alpha), f(2,\beta), \ldots, f(x-1,a), f(x,b) \in \{0, 1\}$. On substitution of $\gamma = \alpha + \beta + \cdots + a + b$ or $b = \gamma - \alpha - \beta - \cdots - a$, Eq. (3) becomes

$$
P = \sum_{\gamma=0}^{x(n-1)} 2^{\gamma} y_{\gamma}, \tag{4a}
$$

where

$$
y_{\gamma} = \sum_{\alpha=0}^{\gamma} \sum_{\beta=0}^{\gamma} \cdots \sum_{a=0}^{\gamma}
f(1,\alpha)\, f(2,\beta) \cdots f(x-1,a)\, f(x, \gamma - \alpha - \beta - \cdots - a), \tag{4b}
$$

for $0 \le \gamma \le x(n-1)$, and $f(1,\alpha), f(2,\alpha), \ldots, f(x-1,\alpha), f(x,\alpha) = 0$ for $\alpha < 0$ or $\alpha > n - 1$. Equation (4b) can be recognized as the discrete convolution of $x$ $n$-bit binary sequences

$$
y_{\gamma} = f(1) * f(2) * \cdots * f(x-1) * f(x), \tag{5}
$$

where $*$ designates the convolution operation. The order of performing the convolution in Eq. (5) is unimportant because convolution is commutative and associative. In general, there are $x(n-1) + 1$ mixed-binary points as a result of the convolution of $x$ $n$-bit words, the word cycle is $T_w = [x(n-1) + 1]T$, and there are $(x-1)(n-1)$ zeros for padding. These mixed-binary points can be converted to the standard binary string by the postprocessing unit, as described above.
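The extended algorithm can be sketched in Python by chaining convolutions. The helper names below (`to_bits`, `convolve`, `dmac_product`) are hypothetical and the example operands are arbitrary; sequences are held LSB first.

```python
from functools import reduce


def to_bits(value, n):
    """LSB-first list of the n binary digits of a nonnegative integer."""
    return [(value >> i) & 1 for i in range(n)]


def convolve(f, g):
    """Discrete convolution of two finite sequences."""
    y = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            y[i + j] += fi * gj
    return y


def dmac_product(words, n):
    """Convolve x n-bit words, as in Eq. (5), then shift-and-add, as in Eq. (4a)."""
    y = reduce(convolve, (to_bits(w, n) for w in words))
    product = sum(yk << k for k, yk in enumerate(y))
    return y, product


if __name__ == "__main__":
    n, words = 3, (6, 3, 5)
    y, product = dmac_product(words, n)
    x = len(words)
    print(len(y) == x * (n - 1) + 1)   # number of mixed-binary points
    print(product)                     # 90
```

Because convolution is commutative and associative, permuting `words` changes neither the mixed-binary sequence nor the decoded product.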

3.C. Fiber-Optic-Array-Processor Architectures Using Twos-Complement Binary Arithmetic

The TCB representation is a powerful encoding scheme that permits both positive and negative real numbers to be represented in the binary form. The encoding scheme requires a sign bit (SB) to be attached to the leftmost bit of the binary number, i.e., SB = 0 for a positive number and SB = 1 for a negative number. For example, the TCB sequence of the positive number +5.5 is {0 1 0 1.1}, and that of the negative number −13.375 is {1 0 0 1 0.1 0 1}.

The multiplication of two real numbers based on the TCB arithmetic [4,13,19,23] requires that the input numbers be represented by the same number of bits as the output. For example, the TCB representation of the product (+5.5)(−13.375) = −73.5625 is {1 0 1 1 0 1 1 0.0 1 1 1}, which is a 12-bit word. The 12-bit TCB sequences of the input numbers can thus be obtained by insertion of seven zeros to the left of the SB of +5.5 to become {0 0 0 0 0 0 0 0 1 0 1.1} and of four ones to the left of the SB of −13.375 to give {1 1 1 1 1 0 0 1 0.1 0 1}. The discrete convolution of these two sequences results in the 23-point mixed-binary sequence {1 1 1 2.1 2 0 2 2 2 3 3 2 1 1 0 0 0 0 0 0 0 0}, listed with the LSB first, in which the last 11 analog bits are discarded. The standard TCB format of the first 12 analog bits, {1 1 1 2.1 2 0 2 2 2 3 3}, after the ADC and the S/A operations is {1 0 1 1 0 1 1 0.0 1 1 1}, as expected for the number −73.5625. Implementation of the radix (decimal) point shifting is not discussed here because it is not a hardware issue. The DMAC technique in conjunction with the TCB arithmetic also holds for the multiplication of $x$ real numbers, whose nonnegative integer counterpart has been described in Subsection 3.B.

The major disadvantage of any optical DMAC processor employing the TCB arithmetic is the reduction in the preprocessing speed, and this can be explained as follows. The convolution of $x$ $n'$-bit TCB numbers generates $[x(n' - 1) + 1]$ mixed-binary points, but only the first $n'$ analog bits are useful for decoding into the expected result, and $T'_w = [x(n' - 1) + 1]T$, where primes are included for TCB variables. If the first useful $n'$ analog bits of the TCB convolution result are equal to the $[x(n - 1) + 1]$ analog bits of the unsigned convolution, as described in Subsection 3.B, i.e., $n' = x(n - 1) + 1$, then the following relationship is obtained:

$$
T'_w = x T_w + (1 - x) T, \tag{6}
$$

in which the first term is dominant for large word length. Thus the processing power of the DMAC FOAP architectures incorporating the TCB arithmetic is reduced approximately by a factor of $x$ (i.e., $T'_w \approx x T_w$, where $x$ is the number of matrices to be multiplied) as compared with their unsigned counterparts. Optical DMAC processors based on the TCB arithmetic also suffer from an increase in the resolution bits of the ADC and the S/A circuit because all the bits representing the real numbers are not fully utilized. This has the effect of reducing the preprocessing speed because a higher-resolution ADC operates at a much lower speed. Nevertheless, optical DMAC processors incorporating the TCB representations enable bipolar number handling [13] and eliminate the need for matrix partitioning [23].
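The TCB example can be traced with the Python sketch below. It is a simplified model under stated assumptions: the real operands are scaled to integers (5.5 with 1 fractional bit becomes 11, and −13.375 with 3 fractional bits becomes −107, so the result carries 4 fractional bits), the S/A stage is assumed to keep only the lowest $n'$ result bits, and the helper names `tcb_bits`, `convolve`, and `tcb_multiply` are hypothetical.

```python
def tcb_bits(value, n):
    """LSB-first n-bit twos-complement digits of a signed integer."""
    return [((value % (1 << n)) >> i) & 1 for i in range(n)]


def convolve(f, g):
    """Discrete convolution of two finite sequences."""
    y = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            y[i + j] += fi * gj
    return y


def tcb_multiply(a, b, n):
    """DMAC product of two signed integers held as n-bit TCB words."""
    y = convolve(tcb_bits(a, n), tcb_bits(b, n))
    kept = y[:n]                                   # first n analog bits only
    # Assumed S/A behaviour: carries beyond n result bits are dropped.
    raw = sum(yk << k for k, yk in enumerate(kept)) % (1 << n)
    signed = raw - (1 << n) if raw >> (n - 1) else raw
    return y, signed


if __name__ == "__main__":
    y, signed = tcb_multiply(11, -107, 12)   # 5.5 * 2 and -13.375 * 8
    print(y[:12])                # [1, 1, 1, 2, 1, 2, 0, 2, 2, 2, 3, 3]
    print(len(y) - 12)           # 11 analog bits are discarded
    print(signed / 16)           # -73.5625 (4 fractional bits)
```

Equation (6) follows from the same bookkeeping: with $n' = x(n-1) + 1$, the TCB word cycle is $[x^2(n-1) + 1]T$, which equals $xT_w + (1 - x)T$.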

4. Fiber-Optic-Array-Processor Matrix Multipliers

This section describes the proposed FOAP architectures for implementations of the nonnegative matrix–vector, matrix–matrix, and triple-matrix products by use of the DMAC algorithm and its extended version, as described in Subsections 3.A and 3.B, respectively.

4.A. Matrix–Vector Multiplication

The matrix–vector product of an $M \times N$ matrix $A$ and an $N \times 1$ column vector $B$ to form an $M \times 1$ column vector $C$ is defined as

$$
\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_M \end{bmatrix}
=
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1N} \\
a_{21} & a_{22} & \cdots & a_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
a_{M1} & a_{M2} & \cdots & a_{MN}
\end{bmatrix}
\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_N \end{bmatrix},
\tag{7}
$$

where

$$
c_i = \sum_{j=1}^{N} a_{ij} b_j \qquad (i = 1, 2, \ldots, M), \tag{8}
$$

is the binary word of vector $C$. $a_{ij}$ and $b_j$ are the $n$-bit binary words of matrix $A$ and vector $B$, respectively.

Figure 5 shows the proposed FOAP matrix–vector multiplier based on the vector inner-product operations [13] and the DMAC algorithm. At time $t_0$ the $N$ $n$-bit binary words $(b_1 b_2 \ldots b_N)$ of vector $B$ are bit parallel loaded into the corresponding $N$ FO convolvers by activation of the appropriate optical switches (see Fig. 3). This process is equivalent to setting the FO convolvers' impulse responses, which remain unchanged over the duration of convolution. At this time the LSB's of the $N$ $n$-bit words $(a_{11} a_{12} \ldots a_{1N})$ on the first row of matrix $A$ are serial fed into the appropriate inputs of the $N$ FO convolvers. The $N$ convoluted output bits are then added by the $N \times 1$ optical combiner (see Fig. 2, where $n$ now becomes $N$) to form a single output bit. This analog signal is then detected by an integrated detector/amplifier and converted into its binary representation by an ADC and the S/A circuit of $\log_2 Nn$ resolution bits. At a later bit time $T$ (i.e., $t_0 + T$) the second bits of the same words $(a_{11} a_{12} \ldots a_{1N})$ are loaded into the FO convolvers, and the output bits are then combined by the optical combiner. This detected analog pulse is analog to digital converted into its binary format, which is then upshifted by one position to the left and binary added to the previous $\log_2 Nn$ binary bits to form a new binary sequence. The third and next $n - 3$ bits of $(a_{11} a_{12} \ldots a_{1N})$ are loaded into the convolvers at the discrete times $2T, 3T, \ldots, (n - 1)T$. This process continues until the convolution computation of the first row of matrix $A$ is completed. It is assumed that the ADC conversion time $T_{ADC}$ is smaller than the bit time $T$. At time $t_1 + T$ the first element of the column vector $C$ (i.e., $c_1$) is computed and stored in an output buffer. At the same time, the LSB's of the $N$ words on the second row of the matrix, i.e., $(a_{21} a_{22} \ldots a_{2N})$, are convoluted with the LSB's of the $N$ FO convolvers. This convolution process is repeated as before, and the second element of the vector $C$ is available after two word cycles (i.e., $2T_w$) and stored in the buffer. There are $n - 1$ zeros for padding between the words to avoid overflow [see Fig. 4(c)]. The computation time of the FOAP matrix–vector multiplier, which takes $M$ word cycles, is thus given by

$$
T_{MV} = (2n - 1)MT + T_{FO} + T_{\text{comb}}, \tag{9}
$$

provided that $T_{ADC} < T$.
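The data flow of the matrix–vector multiplier can be mimicked in software. The sketch below is a behavioural model only, assuming ideal intensity addition at the combiner and an ideal ADC; the function names (`foap_matvec`, `t_mv`) and the test matrices are hypothetical.

```python
def to_bits(value, n):
    """LSB-first list of the n binary digits of a nonnegative integer."""
    return [(value >> i) & 1 for i in range(n)]


def convolve(f, g):
    """Discrete convolution of two finite sequences."""
    y = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            y[i + j] += fi * gj
    return y


def foap_matvec(A, B, n):
    """Behavioural model of Fig. 5: one word cycle per row of A."""
    C = []
    for row in A:
        analog = [0] * (2 * n - 1)
        for a, b in zip(row, B):                   # N convolvers in parallel
            for k, yk in enumerate(convolve(to_bits(b, n), to_bits(a, n))):
                analog[k] += yk                    # N x 1 combiner sums intensities
        C.append(sum(yk << k for k, yk in enumerate(analog)))  # ADC + S/A
    return C


def t_mv(n, M, T, T_FO=0.0, T_comb=0.0):
    """Computation time of Eq. (9)."""
    return (2 * n - 1) * M * T + T_FO + T_comb


if __name__ == "__main__":
    A = [[6, 3], [1, 7]]
    B = [3, 5]
    print(foap_matvec(A, B, n=3))          # [33, 38]
    # 32-bit words, M = 128 rows, 100-MHz bit rate (T = 0.01 microseconds).
    print(t_mv(n=32, M=128, T=0.01))       # about 80.64 microseconds
```

With the propagation delays neglected, the second call gives the 80.64-µs figure that appears for the FOAP row of Table 2.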

Fig. 5. Proposed FOAP matrix–vector multiplier based on vector inner-product operations and the DMAC algorithm.

4.B. Matrix–Matrix Multiplication

The matrix–matrix product of an $M \times N$ matrix $A$ and an $N \times P$ matrix $B$ results in an $M \times P$ matrix $C$, given by

$$
\begin{bmatrix}
c_{11} & c_{12} & \cdots & c_{1P} \\
c_{21} & c_{22} & \cdots & c_{2P} \\
\vdots & \vdots & \ddots & \vdots \\
c_{M1} & c_{M2} & \cdots & c_{MP}
\end{bmatrix}
=
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1N} \\
a_{21} & a_{22} & \cdots & a_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
a_{M1} & a_{M2} & \cdots & a_{MN}
\end{bmatrix}
\begin{bmatrix}
b_{11} & b_{12} & \cdots & b_{1P} \\
b_{21} & b_{22} & \cdots & b_{2P} \\
\vdots & \vdots & \ddots & \vdots \\
b_{N1} & b_{N2} & \cdots & b_{NP}
\end{bmatrix},
\tag{10}
$$

where

$$
c_{ij} = \sum_{k=1}^{N} a_{ik} b_{kj}, \tag{11}
$$

for $i = 1, 2, \ldots, M$ and $j = 1, 2, \ldots, P$. The $n$-bit binary words $a_{ik}$ and $b_{kj}$ are the elements of matrices $A$ and $B$, respectively.

Figure 6 shows the proposed FOAP matrix–matrix multiplier based on the outer-product operations [3,12,18] and the DMAC algorithm. At time $t_0$, all the $n$-bit binary words $(b_{11} b_{12} \ldots b_{NP})$ of matrix $B$ are bit parallel loaded into the FO convolvers by activation of the appropriate optical switches. At the same time the $N$ $n$-bit words $(a_{11} a_{12} \ldots a_{1N})$ on the first row of matrix $A$ are bit serial fed (with the LSB's first) into the $N$ $1 \times P$ optical splitters. The LSB's of these words are then bit serial fed into the inputs of the appropriate FO convolvers, where convolution computations begin. This process is equivalent to performing the outer-product operations between the $k$th column vector of $A$ and the $k$th row vector of $B$. These convoluted output bits are then added by the $P$ $N \times 1$ optical combiners with a propagation delay $T_{\text{comb}}$. The $P$ postprocessors (see Fig. 5) will perform the ADC and the S/A operations to yield the standard binary sequence for the first row of matrix $C$, i.e., $(c_{11} c_{12} \ldots c_{1P})$, which takes one word cycle $T_w$, where $T_w = (2n - 1)T$. At time $t_1 + T$, the LSB's of the $N$ $n$-bit elements $(a_{21} a_{22} \ldots a_{2N})$ on the second row of matrix $A$ are fed sequentially into the $N$ splitters, and the process repeats as before until time $t_{M-1}$, which takes $M$ word cycles. Thus the matrix–matrix multiplication is performed by the matrix–matrix convolution in the preprocessing unit. The computation time of the FOAP matrix–matrix multiplier is thus given by

$$
T_{MM} = (2n - 1)MT + T_{\text{split}} + T_{FO} + T_{\text{comb}}, \tag{12}
$$

provided that $T_{ADC} < T$. Both the ADC and the S/A circuit in each postprocessor require $\log_2 Nn$ resolution bits, the same number as that required by the FOAP matrix–vector multiplier of Fig. 5. The convolution time in the first term of Eq. (12) is exactly the same as the first term of Eq. (9), and this is clearly due to the massive-parallel-processing capability of the FOAP multiplier.
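A behavioural sketch of the matrix–matrix multiplier is given below. It also records the largest analog level seen at any combiner output, which is what sets the ADC resolution. The function name `foap_matmul` and the test matrices are hypothetical, and ideal components are assumed throughout.

```python
def to_bits(value, n):
    """LSB-first list of the n binary digits of a nonnegative integer."""
    return [(value >> i) & 1 for i in range(n)]


def convolve(f, g):
    """Discrete convolution of two finite sequences."""
    y = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            y[i + j] += fi * gj
    return y


def foap_matmul(A, B, n):
    """Behavioural model of Fig. 6; returns (C, peak analog level)."""
    N, P = len(B), len(B[0])
    C, peak = [], 0
    for row in A:                        # one word cycle per row of A
        out = []
        for j in range(P):               # P combiners work in parallel
            analog = [0] * (2 * n - 1)
            for k in range(N):           # splitter k feeds convolver (k, j)
                y = convolve(to_bits(B[k][j], n), to_bits(row[k], n))
                for m, ym in enumerate(y):
                    analog[m] += ym
            peak = max(peak, max(analog))
            out.append(sum(ym << m for m, ym in enumerate(analog)))
        C.append(out)
    return C, peak


if __name__ == "__main__":
    A = [[7, 7], [1, 2]]
    B = [[7, 3], [7, 4]]
    C, peak = foap_matmul(A, B, n=3)
    print(C)        # [[98, 49], [21, 11]]
    print(peak)     # 6, i.e. N * n for all-ones words
```

The peak of $Nn$ occurs when every word is all ones, which is the worst case behind the $\log_2 Nn$ resolution figure quoted in the text.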

Fig. 6. Proposed FOAP matrix–matrix multiplier based on vector outer-product operations and the DMAC algorithm.

4.C. Triple-Matrix Multiplication

The triple-matrix product of an $M \times N$ matrix $A$, an $N \times P$ matrix $B$, and a $P \times Q$ matrix $C$ results in an $M \times Q$ matrix $D$, given as

$$
\begin{bmatrix}
d_{11} & d_{12} & \cdots & d_{1Q} \\
d_{21} & d_{22} & \cdots & d_{2Q} \\
\vdots & \vdots & \ddots & \vdots \\
d_{M1} & d_{M2} & \cdots & d_{MQ}
\end{bmatrix}
=
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1N} \\
a_{21} & a_{22} & \cdots & a_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
a_{M1} & a_{M2} & \cdots & a_{MN}
\end{bmatrix}
\begin{bmatrix}
b_{11} & b_{12} & \cdots & b_{1P} \\
b_{21} & b_{22} & \cdots & b_{2P} \\
\vdots & \vdots & \ddots & \vdots \\
b_{N1} & b_{N2} & \cdots & b_{NP}
\end{bmatrix}
\begin{bmatrix}
c_{11} & c_{12} & \cdots & c_{1Q} \\
c_{21} & c_{22} & \cdots & c_{2Q} \\
\vdots & \vdots & \ddots & \vdots \\
c_{P1} & c_{P2} & \cdots & c_{PQ}
\end{bmatrix},
\tag{13}
$$

where

$$
d_{il} = \sum_{j=1}^{P} \sum_{k=1}^{N} a_{ik} b_{kj} c_{jl}, \tag{14}
$$

for $i = 1, 2, \ldots, M$ and $l = 1, 2, \ldots, Q$. The $n$-bit binary words $a_{ik}$, $b_{kj}$, and $c_{jl}$ are the elements of matrices $A$, $B$, and $C$, respectively.

Figure 7 shows the proposed FOAP triple-matrix multiplier based on the outer-product operations and the extension of the DMAC algorithm, as described in Subsection 3.B, and its operating principle is similar to that described in Subsection 4.B for double-matrix computation. The outputs of the first highlighted block (i.e., matrix $B$ of Fig. 7) generate a series of sums of convolutions of the elements of matrix $A$ and the corresponding elements of matrix $B$, i.e., $A * B$. At this point the preprocessing architecture is exactly the same as the preprocessing unit of the matrix–matrix multiplier of Fig. 6. The analog output bits of $A * B$ are serial fed to the $P$ $1 \times Q$ optical splitters and then to the FO convolvers, whose impulse responses correspond to the binary words of matrix $C$. The outputs of the $Q$ $P \times 1$ optical combiners yield the analog results $(A * B) * C$. Thus we obtain the triple-matrix convolution $A * B * C$ by computing the double-matrix convolution $A * B$, whose analog bits are then convoluted with the binary bits of the words of matrix $C$ to give $(A * B) * C$. The word cycle $T_w$ is given by $T_w = (3n - 2)T$, in agreement with $T_w = [x(n - 1) + 1]T$ of Subsection 3.B for $x = 3$, and the number of zeros for padding between the words is $2(n - 1)$. The largest analog value of the convolution of three $n$-bit words (i.e., $a_{ik} * b_{kj} * c_{jl}$) in which all binary bits have logic 1 is given by $0.75n^2$ when $n$ is even and $0.25(3n^2 + 1)$ when $n$ is odd. Thus the largest analog bit of $d_{il}$ is, from Eq. (14), given by $0.25NP(3n^2 + 1)$, where $N$ and $P$ are the numbers of summations performed by $N \times 1$ and $P \times 1$ optical combiners to generate $(A * B)$ and $(A * B) * C$, respectively. Thus both the ADC and the S/A circuit in the postprocessing unit require $\log_2 [0.25NP(3n^2 + 1)]$ resolution bits. The computation time of the FOAP triple-matrix multiplier, which takes $M$ word cycles, is thus given by

$$
T_{MMM} = (3n - 2)MT + 2T_{\text{split1}} + 2T_{FO} + T_{\text{comb1}} + T_{\text{split2}}, \tag{15}
$$

provided that $T_{ADC} < T$ and $T_{\text{split1}} = T_{\text{comb2}}$. The major advantage of the DMAC technique lies in the fact that a combination of convolution and addition operations can be performed in the preprocessing unit. We can easily achieve the product of $x$ matrices by cascading $x - 1$ of the highlighted basic building blocks of Fig. 7, where each block corresponds to the elements of each matrix.

Fig. 7. Proposed FOAP triple-matrix multiplier based on the extension of the DMAC algorithm.
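The peak-value expressions for the triple convolution can be checked by brute force. The Python sketch below uses hypothetical helper names (`peak_triple`, `peak_formula`, `adc_bits_triple`) and compares the closed forms quoted above with a direct convolution of three all-ones words.

```python
import math


def convolve(f, g):
    """Discrete convolution of two finite sequences."""
    y = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            y[i + j] += fi * gj
    return y


def peak_triple(n):
    """Largest mixed-binary value of three convolved all-ones n-bit words."""
    ones = [1] * n
    return max(convolve(convolve(ones, ones), ones))


def peak_formula(n):
    """Closed form from the text: 0.75 n^2 (n even), 0.25 (3 n^2 + 1) (n odd)."""
    return 3 * n * n // 4 if n % 2 == 0 else (3 * n * n + 1) // 4


def adc_bits_triple(N, P, n):
    """Resolution estimate log2[0.25 N P (3 n^2 + 1)] from the text."""
    return math.log2(0.25 * N * P * (3 * n * n + 1))


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, peak_triple(n), peak_formula(n), 3 * n - 2)  # last: word length
    print(adc_bits_triple(N=128, P=128, n=32))
```

The last column printed in the loop is the number of mixed-binary points per word, $3n - 2$, which matches the word cycle used in Eq. (15).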

5. Performance and Comparison

Subsection 5.A shows that the FOAP architectures perform better with binary-encoded (or base-2) data. Subsections 5.B and 5.C describe the performance comparison among the FOAP, the time-integrating, the space-integrating, the digital matrix–vector, and the matrix–matrix multipliers. Subsection 5.D considers the performance of the high-order FOAP matrix multiplier. The propagation delays $T_{\text{split}}$, $T_{\text{comb}}$, and $T_{FO}$ of the FOAP architectures are assumed to be negligible compared with the word-cycle terms [i.e., the first term of Eq. (9)] because the matrix dimensions considered here are large.

5.A. Fiber-Optic-Array-Processor Architectures UsingNon-Binary-Encoded Data

It is shown in this subsection that the overall perfor-mance of the FOAP architectures, in which the FOAPmatrix–vector multiplier of Fig. 5 is chosen as anexample, is best achieved with base-2 1or binary-encoded2 numbers. The number of bits n requiredin basic b for representing an m-bit number is n $logb 2m. The matrix dimension N of the FOAP

matrix–vector multiplier of Fig. 5 is limited by thedynamic range of the ADC as N # 2NADC@n1b 2 122,where NADC are the ADC resolution bits and b is thebase. The FOAP matrix–vector multiplier of Fig. 5has the processing speed given by

MOPS 5 N@12n 2 12T, 1162

where MOPS means megaoperations per second and1@T is in megahertz. The Psaltis–Athale ratio R1 ofthe FOAPmatrix–vector multiplier is given by

R1 5 N@12n 2 12. 1172

The Psaltis–Athale ratio¹⁵ $R_1$ is given by the number of multiplications per ADC conversion. It can be obtained by dividing the number of operations per second (OPS), where one operation is considered in this paper to be equivalent to one multiplication and one addition, by the number of ADC conversions per second [the number of output channels (or ADC's) times the clock rate]. The ratio $R_1$ is thus independent of the clock rate and must exceed unity in order to keep the complexity of the electronics to a minimum.

Table 1 describes the performance comparison for various case studies of the FOAP matrix–vector multiplier of Fig. 5. The resolution bits $N_{\mathrm{ADC}}$ and speed $1/T$ of the 12-bit 100-MHz ADC are fixed for cases 1–3 of Table 1. Cases 1–3 show that increasing the base $b$ results in decreasing the values of $n$ and $N$. Consequently the number of MOPS and the value $R_1$ are significantly reduced. Thus the FOAP matrix–vector multiplier performs best for binary-encoded data (case 1) when the ADC bits and speed are fixed. Cases 4–6 correspond to a fixed matrix dimension $N = 128$. The ADC resolution bits and speed are obtained from the present state-of-the-art electronic data. Increasing the base $b$ results in increasing $N_{\mathrm{ADC}}$, which greatly reduces the processing speed MOPS because higher-resolution ADC's operate at much lower speeds. However, the FOAP matrix–vector multiplier still outperforms its digital counterpart for a fixed $N$ because the ratio $R_1$ increases with decreasing $n$ as a result of increasing $b$. The overall performance of the FOAP matrix–vector architecture is best achieved with binary-encoded data because of the desirable value of the MOPS in addition to its advantage over its digital counterpart, as described in case 4. The FOAP matrix–matrix multiplier of Fig. 6 for 32-bit multiplication also has the same performance parameters as those in Table 1 except that it has $P$ times the values of the MOPS of those in Table 1. The analysis here also applies to higher-order FOAP matrix multipliers, and we can conclude that the FOAP architectures perform better with binary-encoded data.

Table 1. Performance Comparison for Case Studies of the Fiber-Optic Array-Processor Matrix–Vector Multiplier of Fig. 5 for 32-bit ($m = 32$) Multiplicationsᵃ

| Case | $b$ | $n$ | $N$ | $N_{\mathrm{ADC}}$ | $1/T$ (MHz) | MOPS | $R_1$ | Remarks |
|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 32 | 128 | 12 | 100 | 203.2 | 2.032 | Constant clock rate $1/T$, constant ADC bits $N_{\mathrm{ADC}}$ |
| 2 | 4 | 16 | 28 | 12 | 100 | 90.3 | 0.903 | |
| 3 | 8 | 11 | 7 | 12 | 100 | 33.3 | 0.333 | |
| 4 | 2 | 32 | 128 | 12 | 100 | 203.2 | 2.032 | Constant matrix dimension $N$ |
| 5 | 4 | 16 | 128 | 16 | 20 | 82.6 | 4.13 | |
| 6 | 8 | 11 | 128 | 18 | 0.20 | 1.22 | 6.1 | |

ᵃ $b$ is the base used, $n$ is the digits of accuracy, $N$ is the number of optical convolvers of Fig. 5, $N_{\mathrm{ADC}}$ is the ADC resolution bits, MOPS is megaoperations per second, and $R_1$ is the Psaltis–Athale ratio.

10 February 1995 / Vol. 34, No. 5 / APPLIED OPTICS 811

Table 2. Performance Comparison of Several Optical Matrix–Vector Multipliers

| Processor Typeᵃ | Number of Clock Cycles per Operationᵇ | Timeᶜ (µs) | MOPSᶜ | Number of Detectors | Number of ADC's | Psaltis–Athale Ratio $R_1$ |
|---|---|---|---|---|---|---|
| 1-D SI (Ref. 11) | $(2N + M - 1)n - N$ | 121.3 | 135.1 | $M$ | $M$ | 0.0106 |
| 1-D TI (I) | $(M + N - 1)(2n - 1)$ | 160.7 | 102 | $M(2n - 1)$ | $M$ | 0.00797 |
| 1-D TI (II) (Ref. 23) | $(M + N - 1)n$ | 81.6 | 200.8 | $Mn$ | $M$ | 0.0157 |
| 1-D TI (III) (Ref. 14) | $M(2n - 1)$ | 80.64 | 203.2 | $n$ | 1 | 2.032 |
| 2-D SI | $(M + N - 1)(2n - 1)$ | 160.7 | 102 | $M$ | $M$ | 0.00797 |
| 2-D TI (I) | $N(2n - 1) + (M - 1)$ | 81.9 | 200 | $M(2n - 1)$ | $M$ | 0.0156 |
| 2-D TI (II) (Ref. 13) | $M(2n - 1)$ | 80.64 | 203.2 | $M(2n - 1)$ | $M$ | 0.0159 |
| FM | $N(2n - 1)$ | 80.64 | 203.2 | $M$ | $M$ | 0.0159 |
| 1-D OP (Ref. 3) | $NMn$ | 5243 | 3.125 | $M(2n - 1)$ | $M$ | 0.000244 |
| 2-D OP (Ref. 3) | $2N$ | 2.56 | 6400 | $M(2n - 1)$ | $M$ | 0.50 |
| FOAP | $M(2n - 1)$ | 80.64 | 203.2 | 1 | 1 | 2.032 |

ᵃ 1-D SI, one-dimensional space integrating; 1-D TI, one-dimensional time integrating; FM, frequency multiplexed; 1-D OP, outer products with one-dimensional modulators; 2-D OP, outer products with two-dimensional modulators; FOAP, fiber-optic array processor of Fig. 5.

ᵇ Parameters for the product of an $M \times N$ matrix and an $N \times 1$ vector to $n$ digits of accuracy.

ᶜ Assuming a 100-MHz clock rate; MOPS, megaoperations per second for ($n = 32$, $M = N = 128$).
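As a quick check of Table 1, Eqs. (16) and (17) can be evaluated directly. The following Python sketch does so for the six cases; the function names `mops_foap` and `r1_foap` are hypothetical, and the inputs ($n$, $N$, $1/T$) are copied from the table rather than derived.

```python
def r1_foap(N, n):
    """Psaltis-Athale ratio of the FOAP matrix-vector multiplier, Eq. (17)."""
    return N / (2 * n - 1)


def mops_foap(N, n, clock_mhz):
    """Megaoperations per second, Eq. (16), with the clock rate 1/T given in MHz."""
    return N * clock_mhz / (2 * n - 1)


# (case, b, n, N, 1/T in MHz) as listed in Table 1
cases = [
    (1, 2, 32, 128, 100.0),
    (2, 4, 16, 28, 100.0),
    (3, 8, 11, 7, 100.0),
    (4, 2, 32, 128, 100.0),
    (5, 4, 16, 128, 20.0),
    (6, 8, 11, 128, 0.20),
]

for case, b, n, N, clock_mhz in cases:
    print(f"case {case}: b={b}, MOPS={mops_foap(N, n, clock_mhz):.4g}, "
          f"R1={r1_foap(N, n):.4g}")
```

Because $R_1$ depends only on $N$ and $n$, cases 4–6 show $R_1$ rising with the base while the MOPS falls with the slower high-resolution ADC's.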

5.B. Performance Comparison of Nonfiber and Fiber-Optic Array Processor Matrix Multipliers

We now compare the performances of several previously proposed nonfiber matrix multipliers with the FOAP architectures. Cartwright²⁸ has given an excellent treatment of the performance comparison of various nonfiber matrix–vector and matrix–matrix multipliers, as summarized in Tables 2 and 3, respectively. The clock rate is assumed to be 100 MHz because present 12-bit ADC's are capable of achieving a conversion time of 9 ns. For comparison purposes the dimensions of the matrix–vector and matrix–matrix multipliers are specifically chosen to be ($n = 32$, $M = N = 128$) and ($n = 32$, $M = N = P = 128$) for Tables 2 and 3, respectively. Both the FOAP matrix–vector and matrix–matrix multipliers of Figs. 5 and 6, respectively, require 12-bit ADC's, and it is assumed that other nonfiber architectures also require ADC's of no more than 12-bit resolution. Reference numbers are also included in the first columns of Tables 2 and 3.

Table 3. Performance Comparison of Several Optical Matrix–Matrix Multipliers

| Processor Typeᵃ | Number of Clock Cycles per Multiplicationᵇ | Timeᶜ (µs) | GOPSᶜ | Number of Detectors | Number of ADC's | Psaltis–Athale Ratio $R_1$ |
|---|---|---|---|---|---|---|
| SI (Ref. 11) | $(2N + M - 1)n - N + (P - 1)(2n - 1)$ | 201.3 | 10.42 | $MP$ | $MP$ | 0.00636 |
| TI (I) | $(M + N - 1)(2n - 1) + (P - 1)$ | 161.9 | 12.95 | $MP(2n - 1)$ | $MP$ | 0.0079 |
| TI (II) (Ref. 4) | $(M + N - 1)n + (P - 1)$ | 82.87 | 25.31 | $MPn$ | $MP$ | 0.0154 |
| TI (III) (Ref. 13) | $(P + M - 1)(2n - 1)$ | 160.7 | 13.05 | $M(2n - 1)$ | $M$ | 1.02 |
| 1-D OP (Ref. 3) | $NMn$ | 5243 | 0.40 | $MP(2n - 1)$ | $MP$ | 0.000244 |
| 2-D OP (Ref. 3) | $2N$ | 2.56 | 819.2 | $NP(2n - 1)$ | $NP$ | 0.50 |
| FOAP | $M(2n - 1)$ | 80.64 | 26 | $P$ | $P$ | 2.032 |

ᵃ SI, space integrating; TI, time integrating; 1-D OP, outer products with one-dimensional modulators; 2-D OP, outer products with two-dimensional modulators; FOAP, fiber-optic array processor of Fig. 6.

ᵇ Parameters for the product of an $M \times N$ matrix and an $N \times P$ matrix to $n$ digits of accuracy.

ᶜ Assuming a 100-MHz clock rate; GOPS, gigaoperations per second for ($n = 32$, $M = N = P = 128$).

The time-integrating (TI), one-dimensional outer-product (1-D OP), and two-dimensional outer-product (2-D OP) architectures generate the mixed-binary result in a parallel fashion and thereby require a larger number of detectors than ADC's. This fact can be observed in columns 5 and 6 of Tables 2 and 3. Consequently, additional electronic switching is necessary to convert these parallel outputs to serial data strings for further ADC and S/A operations. In order to exploit fully the processing capability of optical computing architectures, one should use as few electronic components as possible. Otherwise, there may be no significant advantage of optical DMAC processors over their digital counterparts. The SI, TI, frequency-multiplexed (FM), and FOAP multipliers have very minor differences (less than an order of magnitude) in the processing-power MOPS (see column 4 of Tables 2 and 3). However, the 1-D OP and the 2-D OP processors have the smallest and the largest computing powers, respectively, as compared with other architectures. The preprocessing speed of a particular optical architecture cannot be used alone as the main basis for assessing its superiority over other processors because the postprocessor, in which the ADC consumes the most power and may be the slowest component, must be considered.

The FOAP matrix–vector and matrix–matrix multipliers require only one ADC (and one detector) and $P$ ADC's (and $P$ detectors), respectively, as shown in columns 5 and 6 of the tables. Other nonfiber processors require a large number of detectors and ADC's and thereby increase the complexity of the electronics. The overall performances of the nonfiber architectures, except the 1-D TI (III) matrix–vector multiplier in Table 2, are thus reduced as described by the small values of the ratio $R_1$, as indicated in the last columns of Tables 2 and 3. Although the preprocessing speed of the 2-D OP architecture is extremely fast, its overall performance is greatly affected by the large number of ADC's (and even larger number of detectors), as indicated by the ratio $R_1 = 0.5$ for both cases. Table 2 shows that the 1-D TI (III) matrix–vector multiplier shares operational parameters that are very similar to those of the FOAP matrix–vector multiplier except that the former requires an array of $n$ detectors, whereas the FOAP matrix–vector multiplier requires only a single detector. The FOAP and the 1-D TI (III) matrix–vector multipliers achieve a very desirable ratio of $R_1 = 2.032$ because the binary words of one vector are loaded in parallel (instead of serially) into the convolvers, and both require only one ADC in the postprocessing unit. Thus these two architectures outperform other optical DMAC [with FOAP outperforming 1-D TI (III)] and digital matrix–vector multipliers. Table 3 shows that the TI (III) architecture is the only nonfiber matrix–matrix multiplier that is capable of achieving a ratio slightly greater than unity. The overall performance of the FOAP matrix–matrix multiplier is superior to that of other nonfiber and digital processors because of its large value of $R_1 = 2.032$, as shown in Table 3.

The performances of the 1-D TI (III) matrix–vector multiplier and the 2-D OP architectures are limited by the time–bandwidth product⁶ (or aperture time) of the AO cell. This restricts the number of digits that can fit in parallel into the cell because the elements of the vector of these architectures are required to be fed in parallel to the cell. Their performances are also limited by the spatial bandwidth as a result of large pixel sizes.²⁸ It is now clear that the overall performances of the FOAP matrix–vector and matrix–matrix multipliers are superior to those of other nonfiber architectures and the digital multipliers.
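The derived columns of Tables 2 and 3 follow from the clock-cycle count. The sketch below recomputes a few rows, assuming (as in the table footnotes) a 100-MHz clock and counting one operation per multiply–add, i.e., $MN$ operations for a matrix–vector product and $MNP$ for a matrix–matrix product. The function name `derived_columns` and its argument names are hypothetical.

```python
def derived_columns(cycles, operations, num_adcs, clock_hz=100e6):
    """Return (time in microseconds, operations per second, Psaltis-Athale ratio R1)."""
    time_s = cycles / clock_hz
    ops_per_s = operations / time_s
    # R1 = operations per second / ADC conversions per second.
    r1 = ops_per_s / (num_adcs * clock_hz)
    return time_s * 1e6, ops_per_s, r1


n, M, N, P = 32, 128, 128, 128

# Table 2 (matrix-vector): FOAP uses M(2n - 1) cycles and one ADC.
t, ops, r1 = derived_columns(M * (2 * n - 1), M * N, num_adcs=1)
print(f"FOAP   MV: {t:.2f} us, {ops / 1e6:.1f} MOPS, R1 = {r1:.3f}")

# Table 2: the 2-D OP processor uses 2N cycles and M ADC's.
t, ops, r1 = derived_columns(2 * N, M * N, num_adcs=M)
print(f"2-D OP MV: {t:.2f} us, {ops / 1e6:.0f} MOPS, R1 = {r1:.2f}")

# Table 3 (matrix-matrix): FOAP uses M(2n - 1) cycles and P ADC's.
t, ops, r1 = derived_columns(M * (2 * n - 1), M * N * P, num_adcs=P)
print(f"FOAP   MM: {t:.2f} us, {ops / 1e9:.1f} GOPS, R1 = {r1:.3f}")
```

The same function applies to any other row once its cycle-count expression and number of ADC's are substituted.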

5.C. Performance Comparison of Fiber-Optic-Array-Processor and Acousto-Optic Matrix–Vector Multipliers

The 1-D TI (III) matrix–vector multiplier¹⁴ as described in Table 2 was later pursued by Baranoski and Casasent,¹⁷ who used an array of $n$ ADC's instead of only one ADC, as described previously.¹⁴ The performance of the AO matrix–vector multiplier¹⁷ is chosen here for comparison with the FOAP matrix–vector multiplier because of its superiority over other nonfiber architectures and its close performance to the FOAP matrix–vector multiplier. Baranoski and Casasent suggested a new performance measure as an alternative approach to the Psaltis–Athale ratio. The Baranoski–Casasent ratio $R_4$ is given by the value of the optical OPS/comp divided by the value of the digital OPS/comp, as¹⁷

$$R_4 = \frac{(\mathrm{OPS/comp})_{\mathrm{optical}}}{(\mathrm{OPS/comp})_{\mathrm{digital}}}, \tag{18}$$

where OPS/comp is the number of operations per second performed by one component, where an optical component is an ADC and a digital component is a multiplier with the same resolution. The $(\mathrm{OPS/comp})_{\mathrm{digital}}$ for digital computation of $x$ $n$-bit multiplications with $n$-bit devices is given by $1/(x^2 T)$, which is a constant value regardless of the parallelism employed. The value $R_4$ of the FOAP matrix–vector multiplier of Fig. 5 is thus given by

$$R_{4,\mathrm{FOAP}} = \frac{4N}{2n-1}, \tag{19}$$

where $x = 2$ is used for two $n$-bit multiplications. It is noted that $R_4$ in Eq. (19) is four times $R_1$, as given in Eq. (17) (i.e., $R_4 = 4R_1$). This is because Psaltis and Athale assumed that a digital multiplier is capable of performing one multiplication (or operation) per second. Baranoski and Casasent proposed that only 1/4 of an operation performed per second can be achieved by a digital multiplier.

The AO matrix–vector architecture is capable of performing the number of MOPS given by¹⁷

$$\mathrm{MOPS}_{\mathrm{AO}} = \frac{M}{nT}, \tag{20}$$

where $1/T$ is in megahertz. The number of zeros for padding (or overflow) was ignored in Eq. (20) because it can be easily detected and handled. For comparison purposes the number of zeros for padding is included here to give

$$\mathrm{MOPS}_{\mathrm{AO}} = \frac{M}{(2n-1)T}, \tag{21}$$

which is equal to $\mathrm{MOPS}_{\mathrm{FOAP}}$ when $N = M$ according to Eq. (16). The performance ratio $R_{4,\mathrm{AO}}$ of the AO matrix–vector multiplier is thus given by

$$R_{4,\mathrm{AO}} = \frac{4M}{(2n-1)n}. \tag{22}$$

The factor $n$ in Eq. (22) is due to an array of $n$ ADC's required by the AO architecture, while the FOAP architecture requires only one ADC. It is clear from Eqs. (19) and (22) that the FOAP matrix–vector multiplier outperforms the AO matrix–vector multiplier according to

$$R_{4,\mathrm{FOAP}} = \frac{Nn}{M}\,R_{4,\mathrm{AO}}. \tag{23}$$
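The steps from Eq. (18) to Eqs. (19), (22), and (23) can be written out explicitly. The following is a sketch that assumes, as stated in the text, one ADC for the FOAP multiplier, $n$ ADC's for the AO multiplier, and $(\mathrm{OPS/comp})_{\mathrm{digital}} = 1/(x^2 T)$ with $x = 2$.

```latex
\begin{align*}
R_{4,\mathrm{FOAP}} &= \frac{N/[(2n-1)T]}{1/(4T)} = \frac{4N}{2n-1}, \\
R_{4,\mathrm{AO}}   &= \frac{M/[(2n-1)nT]}{1/(4T)} = \frac{4M}{(2n-1)n}, \\
\frac{R_{4,\mathrm{FOAP}}}{R_{4,\mathrm{AO}}}
                    &= \frac{4N}{2n-1}\cdot\frac{(2n-1)n}{4M} = \frac{Nn}{M}.
\end{align*}
```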

For square-matrix computation (i.e., $M = N$) and both architectures using the same base $b$, we have $R_{4,\mathrm{FOAP}} = nR_{4,\mathrm{AO}}$, which indicates the superiority of the FOAP architecture over the AO architecture. It will be shown that the permissible $M$ is much less than $N$, and the AO architecture performs better with a higher base.

We now consider a numerical example of the performance measures between the AO and the FOAP matrix–vector architectures. For ease of comparison a square matrix is assumed here such that $N = M$. The following AO parameters were obtained¹⁷ for a 16-bit word and a 100-MHz ADC as $b = 8$, $n = 6$, $N_{\mathrm{ADC}} = 12$, and $M = 83$. The AO matrix–vector multiplier is capable of performing the maximum permissible values of $\mathrm{MOPS}_{\mathrm{AO}} = 754$ and $R_{4,\mathrm{AO}} = 5$. The larger base $b = 8$ (hence smaller $n = 6$ instead of $n = 16$ and larger $M = 83$) was specifically chosen here because the AO architecture performs better with a higher base. The FOAP matrix–vector multiplier requires the following parameters: $b = 2$, $n = 16$, and $N_{\mathrm{ADC}} = 12$. The maximum permissible value of $N$ is $N = 256$. Thus the maximum values of $\mathrm{MOPS}_{\mathrm{FOAP}}$ and $R_{4,\mathrm{FOAP}}$ that the FOAP architecture can achieve are $\mathrm{MOPS}_{\mathrm{FOAP}} = 825$ and $R_{4,\mathrm{FOAP}} = 33$. It is clear from this example that the maximum permissible value $M$ of the AO architecture is much less than the value $N$ of the FOAP architecture. Consequently the performance ratio $R_{4,\mathrm{AO}}$ of the AO processor is significantly less than that of the FOAP architecture.

The aperture time $T_A$ of the AO cell was chosen to be $T_A = 10$ µs for the technology available at that time.¹⁷ It is not surprising that present technology can provide $T_A$ in the range 20–30 µs. Increasing the value $T_A$ results in increasing $M$, which in turn increases $N_{\mathrm{ADC}}$. Consequently the following performance measures, $\mathrm{MOPS}_{\mathrm{AO}}$ and $R_{4,\mathrm{AO}}$, are also increased. However, the performance measures $\mathrm{MOPS}_{\mathrm{FOAP}}$ and $R_{4,\mathrm{FOAP}}$ of the FOAP matrix–vector multiplier are also increased with increasing $N_{\mathrm{ADC}}$. From this specific case study it is clear that the FOAP matrix–vector architecture outperforms the AO matrix–vector multiplier because the performance of the AO processor is limited by the aperture time of the AO cell. This study also shows the massive-parallelism capability of the FOAP architecture; i.e., the values $n$ and $N$ are not limited by any physical constraints imposed in the preprocessing unit.

5.D. High-Order Fiber-Optic Array Processor Matrix Multiplier

We have described in Subsection 5.B that the FOAP matrix–matrix multiplier achieves a ratio $R_1 = 2.032$, which indicates its superiority over other nonfiber and digital processors. We now describe its performance for higher-order matrix operations in which the matrix is assumed to be square and has dimension $M$ for the sake of simplicity in the analysis. The high-order FOAP matrix multiplier requires $M$ word cycles to perform the product of $x$ $M$-dimensional square matrices, and its total computation time is $M[x(n-1)+1]T$. The number of operations involved in the product of $x$ matrices is $(x-1)M^3$. Thus the processing-power OPS of the FOAP matrix multiplier per unit time is $\mathrm{OPS} = (x-1)M^2/\{[x(n-1)+1]T\}$. The number of ADC's required by the FOAP matrix multiplier is always $M$. The performance ratio $R_4$ is thus given by

$$R_4 = \frac{x^2(x-1)M}{x(n-1)+1} \approx \frac{x(x-1)M}{n}. \tag{24}$$

The value $R_4$ in Eq. (24) is much greater than unity because the matrix dimension $M$ is usually much larger than the word length $n$ for many practical higher-order matrix operations. This clearly shows the superiority of the high-order FOAP matrix multiplier over its digital counterpart because the value $(\mathrm{OPS/comp})_{\mathrm{FOAP}}$ increases with increasing value of $x$ and/or $M$, while the value $(\mathrm{OPS/comp})_{\mathrm{digital}}$ is fixed. Thus the high-order FOAP matrix multiplier can perform various linear algebraic operations, such as solutions of algebraic equations, 2-D mathematical transforms, matrix inversions, and pattern recognition, which require higher-order matrix operations.⁴
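To give a feel for the size of $R_4$ in Eq. (24) and for the quality of the approximation, the following Python sketch evaluates both forms. The function names are hypothetical, and the values $M = 128$ and $n = 32$ are illustrative choices borrowed from Tables 2 and 3, not a case study from the analysis above.

```python
def r4_high_order(x, M, n):
    """Exact form of Eq. (24)."""
    return x**2 * (x - 1) * M / (x * (n - 1) + 1)


def r4_high_order_approx(x, M, n):
    """Approximate form of Eq. (24), which replaces x(n - 1) + 1 by x*n."""
    return x * (x - 1) * M / n


M, n = 128, 32  # illustrative values only
for x in (2, 3, 4):
    exact = r4_high_order(x, M, n)
    approx = r4_high_order_approx(x, M, n)
    print(f"x = {x}: exact R4 = {exact:.2f}, approximate R4 = {approx:.2f}")
```

For $x = 2$ the exact expression reduces to $4M/(2n-1)$, which has the same form as Eq. (19) with $N = M$.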

6. Discussion

Despite the lossy nature of the FOAP processing architectures, they are highly accurate because light intensity can be easily guided and recombined in a low-loss single-mode optical fiber and collected onto a single, fast, integrated detector/amplifier. One can keep the propagation delays of the optical elemental processing units, which arise from the in-line EDFA's, reasonably small by using short and highly doped fibers and by increasing the pump power of the laser diodes that pump the EDFA's. The performance of the TI and the SI architectures incorporating the AO cells is limited by the acoustic wave propagation velocity, the AO material response time, and, more severely, the acoustic diffraction, which results in cross talk. An integrated version, for example, a guided-wave 4 × 4 optical gate matrix switch²⁹ consisting of high-silica guided-wave optical splitters, combiners, and semiconductor laser-diode optical gates on a single silicon substrate of size 10 mm × 25 mm, was experimentally demonstrated; the proposed all-optical elemental-processing lattice structures can thus be extremely attractive and competitive with other DMAC convolvers and their digital counterparts.

It is clear that the limitation of any optical DMAC processor is mainly the slow processing speed of the electronic postprocessing units, in which the ADC is often the slowest component, with a bit-time limit of $T$. Thus the overall performance of any optical DMAC system is highly compromised. A state-of-the-art electronic 8-bit ADC can operate at 1.5 GHz, and future development of a 6-bit ADC at 6 GHz is feasible.³⁰ Electro-optic 2- and 4-bit ADC's were experimentally demonstrated to be capable of operating in the gigahertz range.³¹ Future development of high-speed optical ADC's can greatly improve the overall performance of optical DMAC computers; for example, a recently proposed³² optical ADC using optical logic and a lookup table method could make the DMAC implementations very competitive with their electronic counterparts.

7. Conclusion

We have described the FOAP matrix–vector, matrix–matrix, triple-matrix, and high-order matrix multipliers based on the DMAC algorithm and its extended version. We have shown that the processing power of the DMAC FOAP matrix multipliers incorporating the TCB arithmetic is reduced by a factor equal to approximately the number of matrices to be multiplied compared with their positive counterparts. The FOAP matrix multipliers have been seen to perform best with binary-encoded data. We have employed the Psaltis–Athale and Baranoski–Casasent ratios as the performance measures to show the superiority of the FOAP matrix–vector and matrix–matrix multipliers over the TI and the SI architectures and the digital multipliers. The high-order FOAP matrix multiplier has also been shown to be capable of achieving many operations per second per ADC component. The FOAP architecture may be considered as a general optical digital computing network whose structural arrangements depend on a particular algebraic operation.

This work was supported in part by a special grant from the Dean of Engineering, Monash University, Australia. We thank the reviewers for providing many useful comments and suggestions, without which the paper could not have been completed in its present form.

References and Notes

1. H. J. Caulfield, S. Horvitz, G. P. Tricoles, and W. A. Von Winkle, "Scanning the special issue: the special issue on optical computing," Proc. IEEE 72, 755–757 (1984).

2. B. Moslehi, J. W. Goodman, M. Tur, and H. J. Shaw, "Fiber-optic lattice signal processing," Proc. IEEE 72, 909–930 (1984).

3. R. A. Athale and W. C. Collins, "Optical matrix–matrix multiplier based on outer-product decomposition," Appl. Opt. 21, 2089–2090 (1982).

4. R. P. Bocker, "Optical digital RUBIC (rapid unbiased bipolar incoherent calculator) cube processor," Opt. Eng. 23, 26–32 (1984).

5. P. S. Guilfoyle, "Systolic acousto-optic binary convolver," Opt. Eng. 23, 20–25 (1984).

6. W. T. Rhodes and P. S. Guilfoyle, "Acousto-optic algebraic processing architectures," Proc. IEEE 72, 820–830 (1984).

7. H. J. Whitehouse and J. M. Speiser, "Linear signal processing architectures," in Aspects of Signal Processing. Part 2, G. Tacconi, ed. (NATO Advanced Study Institute, Boston, Mass., 1976), pp. 669–702.

8. E. E. Swartzlander, "The quasi-serial multiplier," IEEE Trans. Comput. C-22, 317–321 (1973).

9. D. Psaltis, D. Casasent, D. Neft, and M. Carlotto, "Accurate numerical computation by optical convolution," in 1980 International Optical Computing Conference II, W. T. Rhodes, ed., Proc. Soc. Photo-Opt. Instrum. Eng. 232, 151–156 (1980).

10. R. A. Athale, W. C. Collins, and P. D. Stilwell, "High-accuracy matrix multiplication with outer-product optical processor," Appl. Opt. 22, 368–370 (1983).

11. S. Cartwright, "New optical matrix–vector multiplier," Appl. Opt. 23, 1683–1684 (1984).

12. R. Athale and J. N. Lee, "Optical processing using outer-product concepts," Proc. IEEE 72, 931–941 (1984).

13. A. P. Goutzoulis, "Systolic time-integrating acousto-optic binary processor," Appl. Opt. 23, 4095–4099 (1984).

14. D. Casasent and B. K. Taylor, "Banded-matrix high-performance algorithm and architecture," Appl. Opt. 24, 1476–1480 (1985).

15. D. Psaltis and R. A. Athale, "High-accuracy computation with linear analog optical systems: a critical study," Appl. Opt. 25, 3071–3077 (1986).

16. D. Casasent and S. Riedl, "Direct finite-element solution on an optical laboratory matrix–vector processor," Opt. Commun. 65, 329–333 (1988).

17. E. J. Baranoski and D. P. Casasent, "High-accuracy optical processors: a new performance comparison," Appl. Opt. 28, 5351–5357 (1989).

18. F. T. S. Yu and M. F. Cao, "Digital optical matrix multiplication based on a systolic outer-product method," Opt. Eng. 26, 1229–1233 (1987).

19. G. Eichmann, Y. Li, P. P. Ho, and R. R. Alfano, "Digital optical isochronous array processing," Appl. Opt. 26, 2726–2733 (1987).

20. Y. Li, G. Eichmann, and R. R. Alfano, "Fast parallel optical digital multiplication," Opt. Commun. 64, 99–104 (1987).

21. Y. Li, B. Ha, and G. Eichmann, "Fast digital optical multiplication using an array of binary symmetric logic counters," Appl. Opt. 30, 531–539 (1991).

22. W. C. Collins, R. A. Athale, and P. D. Stilwell, "Improved accuracy for an optical iterative processor," in Bragg Signal Processing and Output Devices, T. Kooij and B. V. Markevitch, eds., Proc. Soc. Photo-Opt. Instrum. Eng. 352, 59–63 (1983).

23. R. P. Bocker, S. R. Clayton, and K. Bromley, "Electro-optical matrix multiplication using the twos-complement arithmetic for improved accuracy," Appl. Opt. 22, 2019–2021 (1983).

24. K. P. Jackson, S. A. Newton, B. Moslehi, M. Tur, C. C. Cutler, J. W. Goodman, and H. J. Shaw, "Optical fiber delay-line signal processing," IEEE Trans. Microwave Theory Tech. MTT-33, 193–210 (1985).

25. R. S. Tucker, J. M. Wiesenfeld, A. H. Gnauck, and J. E. Bowers, "8 Gbit/s return-to-zero modulation of a semiconductor laser by gain-switching," Electron. Lett. 22, 1329–1330 (1986).

26. R. C. Alferness, "Waveguide electro-optic modulator," IEEE Trans. Microwave Theory Tech. MTT-30, 1121–1137 (1982).

27. J. Capmany and J. Cascon, "Optical programmable transversal filters using fiber amplifiers," Electron. Lett. 28, 1245–1246 (1992).

28. S. Cartwright, "Optical matrix multiplication," in Optical Computing: Digital and Symbolic, R. Arrathoon, ed. (Dekker, New York, 1989), pp. 185–219.

29. A. Himeno, H. Terui, and M. Kobayashi, "Guided-wave optical gate matrix switch," J. Lightwave Technol. 6, 30–35 (1988).

30. C. A. Liechti, "High-speed transistors: directions for the 1990's," Microwave J. 30, 165–177 (1989).

31. R. A. Becker, C. E. Woodward, F. J. Leonberger, and R. C. Williamson, "Wideband electro-optic guided-wave analog-to-digital converters," Proc. IEEE 72, 802–819 (1984).

32. A. D. McAulay, "Optical analog-to-digital converter using optical logic and table lookup," Opt. Eng. 29, 114–120 (1990).
