Fast and Precise Fourier Transforms
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 46, NO. 1, JANUARY 2000
Joe Buhler, M. Amin Shokrollahi, and Volker Stemann

Abstract: Many applications of fast Fourier transforms (FFTs), such as computer tomography, geophysical signal processing, high-resolution imaging radars, and prediction filters, require high-precision output. An error analysis reveals that the usual method of fixed-point computation of FFTs of vectors of length 2^n leads to an average loss of n/2 bits of precision. This phenomenon, often referred to as computational noise, causes major problems for arithmetic units with limited precision which are often used for real-time applications. Several researchers have noted that calculation of FFTs with algebraic integers avoids computational noise entirely; see, e.g., [1]. We will combine a new algorithm for approximating complex numbers by cyclotomic integers with Chinese remaindering strategies to give an efficient algorithm to compute b-bit precision FFTs of length 2^n. More precisely, we will approximate complex numbers by cyclotomic integers in Z[e^(2πi/2^n)] whose coefficients, when expressed as polynomials in e^(2πi/2^n), are bounded in absolute value by some integer M. For fixed n our algorithm runs in time O(log(M)), and produces an approximation with worst case error of O(1/M^(2^(n-2)-1)). We will prove that this algorithm has optimal worst case error by proving a corresponding lower bound on the worst case error of any approximation algorithm for this task. The main tool for designing the algorithms is the use of the cyclotomic units, a subgroup of finite index in the unit group of the cyclotomic field. First implementations of our algorithms indicate that they are fast enough to be used for the design of low-cost high-speed/high-precision FFT chips.

Index Terms: Chinese remaindering, continued fractions, cyclotomic fields, cyclotomic units, effective Chebotarev theorem, Fourier transforms, integer linear programming, residue number systems, signal processing.
I. INTRODUCTION

The discrete Fourier transform (DFT) of a complex vector x = (x_0, ..., x_{N-1}) is defined as the vector y = (y_0, ..., y_{N-1}), where
Manuscript received December 21, 1997; revised July 21, 1999. M. Shokrollahi was supported by a post-doctoral fellowship of the International Computer Science Institute and a Habilitationsstipendium of the Deutsche Forschungsgemeinschaft under Grant Sh-57/1. V. Stemann was supported by a post-doctoral fellowship of the International Computer Science Institute. M. A. Shokrollahi's and V. Stemann's research on this work was done while they were employed by the International Computer Science Institute, Berkeley, CA. J. Buhler is with the Department of Mathematics, Reed College, Portland, OR 97202 USA, and the Mathematical Sciences Research Institute, Berkeley, CA USA (e-mail: [email protected]). M. A. Shokrollahi is with the Department of Fundamental Mathematics, Bell Laboratories, Rm. 2C-353, Murray Hill, NJ USA (e-mail: [email protected]). V. Stemann is with Deutsche Bank AG, Global Markets, Frankfurt, Germany (e-mail: [email protected]). Communicated by C. Herley, Associate Editor for Estimation. Publisher Item Identifier S 0018-9448(00)00068-7.
y_k = Σ_{j=0}^{N-1} x_j ω^{jk}, 0 ≤ k < N, and ω = e^(2πi/N). Evidently, direct computation of y uses O(N^2) complex operations. In their celebrated 1965 paper, Cooley and Tukey [2] (re-)discovered a method with cost O(N log(N)) for computing the discrete Fourier transform of a vector whose length N is a power of 2. This method, called the fast Fourier transform (FFT), and variants thereof have ever since been an indispensable tool in many different areas that deal with processing of digital signals [3], [4]. Many applications of FFTs such as computer tomography, geophysical signal processing, and high-resolution imaging radars require high-precision output. An error analysis reveals that, when performed on fixed-point arithmetic units, the usual Cooley-Tukey FFT algorithms lose roughly n/2 bits, on average, of precision for Fourier transforms of vectors of length 2^n [5]. This phenomenon, often referred to as computational noise, is caused by inaccurate computations due to limited dynamic range. Since long FFTs are unavoidable for a high-resolution analysis of signals (such as shock waves in oil search, or acoustic echoes in imaging radars), it is common to switch to custom hardware based on floating-point arithmetic. Unfortunately, floating-point integrated circuits are more complex and expensive than fixed-point processors. Any improvement of fixed-point computation so as to be more robust and accurate could lead to significantly less expensive chips in high-precision FFT processors. These issues have led to suggestions for alternate FFT computations by many researchers [6]-[12]. Much of the research has concentrated on residue number system (RNS) processors. The idea is as follows: the complex numbers constituting the input and the twiddle factors (i.e., the roots of unity involved) are approximated by Gaussian integers, that is, complex numbers of the form a + bi, where a and b are integers and i^2 = -1. At this point the computations can be performed exactly with integers, without any loss of precision.
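The RNS idea just described can be sketched in a few lines. The following toy (our own illustration, with arbitrarily chosen primes 97 and 193) computes an exact cyclic convolution by running a number-theoretic transform in each prime field and gluing the results with the Chinese remainder theorem, loosely mirroring how the hardware pipeline glues per-field FFT results:

```python
# Sketch of the RNS strategy: exact integer arithmetic distributed over
# two prime fields and reassembled via the Chinese remainder theorem.

def find_root(p, n):
    """Brute-force an element of order n in Z/p (requires n | p - 1)."""
    for g in range(2, p):
        if pow(g, n, p) == 1 and pow(g, n // 2, p) != 1:
            return g
    raise ValueError("no order-%d element mod %d" % (n, p))

def ntt(vec, p, w):
    """Naive O(n^2) number-theoretic transform of vec over Z/p."""
    n = len(vec)
    return [sum(vec[j] * pow(w, j * k, p) for j in range(n)) % p
            for k in range(n)]

def cyclic_conv_mod(a, b, p, n):
    """Exact cyclic convolution of a and b over Z/p via the NTT."""
    w = find_root(p, n)
    fa, fb = ntt(a, p, w), ntt(b, p, w)
    prod = [x * y % p for x, y in zip(fa, fb)]
    winv, ninv = pow(w, -1, p), pow(n, -1, p)
    return [ninv * sum(prod[k] * pow(winv, j * k, p) for k in range(n)) % p
            for j in range(n)]

def cyclic_conv(a, b, p1=97, p2=193):
    """Glue the two modular results with the Chinese remainder theorem."""
    n = len(a)
    r1 = cyclic_conv_mod(a, b, p1, n)
    r2 = cyclic_conv_mod(a, b, p2, n)
    P = p1 * p2
    m1 = p2 * pow(p2, -1, p1)   # CRT idempotents: m1 = 1 mod p1, 0 mod p2
    m2 = p1 * pow(p1, -1, p2)
    return [(x * m1 + y * m2) % P for x, y in zip(r1, r2)]
```

The per-prime transforms are entirely independent, which is why the paper can assign one fixed-point processor per field.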
This approach has two major problems: the first is that the initial quantization error is large. The second is that during the course of the computation one has to perform arithmetic operations on large integers. The latter problem is usually solved by using residue number systems: the Chinese remainder theorem allows the computation to be distributed over an appropriate set of finite fields. The FFT is computed over each of these fields separately in parallel, with one fixed-point processor per field. Finally, these results are glued together as is standard in Chinese remainder calculations. In practice, one would specify an upper bound for the maximum length of the FFT, choose appropriate primes, and hard-wire them together with the Chinese remaindering transforms on the board. This brings us to the more serious of the above mentioned problems. The bottleneck of the standard RNS approach is that it is not possible to approximate complex numbers by Gaussian integers with arbitrary precision, since the Gaussian integers
form a discrete subset of the set of complex numbers. In a pioneering paper Cozzens and Finkelstein [7] suggested that the fourth root of unity i could be replaced by a 2^n-th root of unity ω = e^(2πi/2^n). Hence, Gaussian integers are replaced by integral linear combinations of powers of ω. These constitute a ring which we call Z[ω]. The fundamental difference between the set of Gaussian integers and Z[ω] is that the latter is dense in the complex plane. Hence, any complex number can be approximated to arbitrary precision by an element of Z[ω]. Because of the limited precision available, it turns out that the right object to study is the set Z_M[ω] of all integral linear combinations of powers of ω whose coefficients, when expressed as polynomials in ω, are bounded in absolute value by M.
Fig. 1. FFT processor chart for two FFT units.
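As a toy illustration (ours, not the paper's) of why the bounded-coefficient sets Z_M[ω] are the right object: the sets are nested, Z_1[ω] ⊂ Z_2[ω] ⊂ …, so the best approximation error to a fixed target can only decrease as M grows. For ω = e^(2πi/8) and tiny M this can be checked by exhaustive search:

```python
# Brute-force best approximation of a complex number by an element of
# Z_M[omega], omega = e^(2*pi*i/8), coefficients bounded by M.  Only
# feasible for tiny M; the paper's point is precisely that smarter
# algorithms are needed for realistic M.
import cmath, itertools

OMEGA = cmath.exp(2j * cmath.pi / 8)
BASIS = [OMEGA ** k for k in range(4)]   # 1, w, w^2, w^3 span Z[w]

def best_approx_error(z, M):
    """Smallest |z - x| over all x in Z_M[omega]."""
    return min(abs(z - sum(c * b for c, b in zip(coeffs, BASIS)))
               for coeffs in itertools.product(range(-M, M + 1), repeat=4))
```

Since Z_1[ω] is contained in Z_3[ω], the error at M = 3 is guaranteed to be no larger than at M = 1.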
If b-bit precision FFTs of length N are desired, then the usual straightforward high-precision fixed-point computation takes time O(N log(N) µ(b)), where µ(b) is the time required for the processor to compute the product of two b-bit integers. Measuring the running time of our algorithms with respect to the multiplication time seems to be sensible because, for large b, the running time of the basic arithmetic operations on integers of size b has the same order of magnitude as that of the multiplication of such integers. To compare the multiple precision approach against the running time of an RNS approach, we first start by assuming that n is fixed. Note that the closest point of Z_M[ω] to some z in the unit circle has distance Ω(1/M^(2^(n-2)-1)) from z.¹ (This is not hard to see; see Theorem 2.) To obtain b-bit approximations, we thus need to have M^(2^(n-2)-1) ≥ 2^b. Using exhaustive search for finding this point takes … operations. For … larger than …, this step alone is already of an order inferior to the multiple precision approach. So, the approach becomes attractive only if the approximation step can be performed quickly. Since any complex number is the sum of a Gaussian integer and a complex number inside the unit circle, we can assume that the complex number to be approximated has absolute value at most 1. The overall running time of our approach can be summarized as follows: suppose that we have an algorithm that approximates complex numbers inside the unit circle by elements of Z_M[ω] to within an absolute error of … in time …. Then, using the approach above, we can compute b-bit approximations of the DFT of a complex vector of length N with entries inside the unit circle in time
… as opposed to the running time of O(N log(N) µ(b)) of the multiple precision approach. An abstract processor chart for FFTs using this strategy is given in Fig. 1. Our fundamental problem now is the design of a fast algorithm which takes as input a complex number z and computes an element of Z_M[ω] close to z. The main technical contribution of this paper is the proof of the following result.

Theorem 1: Let n ≥ 3 be an integer, ω = e^(2πi/2^n), and r = 2^(n-2) - 1. For any M one can construct in time … an approximation algorithm with the following properties: for fixed n, on input z, |z| ≤ 1, it computes, in time O(log(M)), an element ζ of Z_M[ω] such that |z - ζ| = O(1/M^r). Furthermore, the worst case error of the approximation algorithm has optimal order of magnitude for fixed n.

The last claim in the above theorem needs some explanation. We will show in the next section that for any approximation algorithm there is a z in the unit circle such that the distance between the approximation produced by the algorithm and z is at least c/M^r for some constant c depending on n only. As a result, the worst case error of the algorithm we have presented has optimal order of magnitude if we fix n and let M go to infinity. The only prior approximation algorithm with a reasonable running time, due to Games [13], [14], is for the case n = 3. It has a worst case approximation error of … and a running time of …. Games et al. [15] also report on the design of a brassboard for filtering data based on approximation by eighth roots of unity. The approximation unit in their board is replaced by a ROM which contains, for various inputs, approximations by integral linear combinations of powers of the eighth root of unity. In Appendix C we will give a practical and efficient algorithm for the case n = 3 which is a modification of our general algorithm. In particular, our algorithm will have a worst case error of O(1/M) and a running time of O(log(M)). How do we prove Theorem 1? The lower bound of Ω(1/M^r) on the worst case approximation error is easy to prove and is based on a volume argument. Details are found in Section II. The harder part is the proof of a matching upper bound.
We first reduce the problem to that of approximating real numbers.

Remark 1: Let z have real part x and imaginary part y. Suppose that a and b are real elements of Z_M[ω] that are approximations of x and y, respectively, and that |x - a|, |y - b| ≤ ε for some ε. Then a + ib
An exact version of this result and its proof will be presented in Appendix D. This theorem proves that the cyclotomic integer approximation and Chinese remainder techniques are superior to multiple precision strategies, given a good approximation algorithm for the input. Assuming …, which seems to be reasonable for currently envisioned processors, the running time of our RNS approach would be ….

¹Here and in the sequel we use f(x) = Ω(g(x)) to denote that f(x) ≥ c·g(x) ≥ 0 for all x, for some fixed constant c.
is an approximation of z with error at most √2·ε.

This remark calls for some extra notation to be used throughout the paper. Let ω = e^(2πi/2^n). Often, we will suppress the dependency of ω on n to ease the notation. Let θ = ω + ω^(-1) and, for j, let θ_j = ω^j + ω^(-j). It is a well-known fact that the real elements in Z[ω] are the integral linear combinations of the θ_j. For a positive integer M we define
of the algorithm. These versions are described in Appendix C and Section III, respectively. Our algorithms use heavily some results in the theory of cyclotomic fields, which are gathered in Appendices A and B.

II. LOWER BOUNDS

In this section we will prove, by using a volume argument, that any algorithm that approximates complex numbers in the unit circle by elements of Z_M[ω] will have a worst case approximation error of Ω(1/M^r) for fixed n, where r = 2^(n-2) - 1. A matching upper bound will be derived in the next sections. Furthermore, we will show that the order of the smallest absolute value of an element in Z_M[ω] is also … for fixed n. This result, which is of independent interest, will be applied in the next sections to analyze the running time of our approximation algorithms.

Proposition 1: Let …, …, and …. Then there exists … such that for all … we have ….
In view of the previous remark, it is sufficient to approximate real numbers by elements in Z_M[ω] ∩ R. To solve the approximation problem for real numbers, we will design an algorithm that runs in … levels. In a preprocessing phase we compute for each level a magic set … consisting of a few small positive real elements of Z[ω]. Roughly speaking, it has the property that for any … there exists … such that … is in …. Level … accepts two inputs: the complex number to be approximated, and a current approximation obtained from the previous level. Using the magic set, we will improve our current approximation by adding elements of …, in a way that the bound on the maximum size of the coefficients is not violated. The elements in … will be smaller than those in …; so, in level …, this procedure will produce a better approximation of ….

The main property of the elements of the magic sets is related to the Galois theory of the field Q(ω + ω^(-1)). The Galois group of this field is cyclic, and a generator σ is given by …. For the rest of this paper we will denote the element … by σ. Using this terminology, the elements of the magic sets have the property that their Galois spectrum is spike-like, i.e., for … there is a position … such that … is much larger than … for …. The precise conditions are stated in Section IV. To guarantee them, we construct … as a set of power products of cyclotomic units in Z[ω]. The crucial property of these units is that their regulator is nonzero. This means that the matrix whose (i, j)-entry is the absolute value of the logarithm of the jth Galois conjugate of the ith unit is nonsingular. This is a deep fact from algebraic number theory related to the nonvanishing of certain Dirichlet L-series at s = 1. In this respect, the exponent 2^(n-2) - 1 in the worst case approximation error of our algorithm should be interpreted as the unit-rank of Z[ω], i.e., as the number of fundamental units of Z[ω]. The fact that there is also a lower bound of this order of magnitude gives an a posteriori justification for the use of units for the approximation problem.
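The nonvanishing-regulator fact quoted above can at least be checked numerically. The following sketch (our own; the unit indices and embeddings are chosen for the maximal real subfield of Q(ζ_16)) builds the 3×3 log-embedding matrix of the cyclotomic units sin(πj/16)/sin(π/16) and verifies that its determinant is far from zero:

```python
# Numerical check of the nonzero regulator for Q(zeta_16)^+.  The
# cyclotomic unit c_j satisfies |sigma_s(c_j)| = sin(pi*j*s/16)/sin(pi*s/16);
# the field has unit rank 3, so we take three units and drop one embedding.
import math

def log_embedding(j, s, n=16):
    """log |sigma_s(c_j)| for the cyclotomic unit c_j of Q(zeta_n)^+."""
    return math.log(abs(math.sin(math.pi * j * s / n) /
                        math.sin(math.pi * s / n)))

UNITS = [3, 5, 7]        # indices j of independent cyclotomic units
EMBED = [1, 3, 5]        # one archimedean place dropped, as usual

R = [[log_embedding(j, s) for s in EMBED] for j in UNITS]

def det3(m):
    """Determinant of a 3x3 matrix, expanded along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
```

A nonzero determinant is exactly what guarantees that exponent vectors for the power products in the later sections can always be solved for.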
This correspondence is further exploited in Section VI. To decide which power products of cyclotomic units will belong to …, we set up an integer linear programming problem whose solutions are the corresponding integral exponents. Then we proceed to show how to find close-to-optimal solutions of this integer linear programming problem in time polynomial in … and …. This will finish the proof of our main theorem. The general strategy used above can be specialized in the cases … and … to give a more streamlined version
For the proof of this proposition we need the following result.

Lemma 1: There are at most … elements of Z_M[ω] inside the unit circle.

Proof: Let … be given. Then there exists exactly one pair … such that
Hence, there is at most one pair , such that
with
Thus the assertion follows.

Proof of Proposition 1: (See also Fig. 2.) Let …. It suffices to prove that the circles of radius … around the elements of … cover the unit circle. Hence, by Lemma 1, …, which proves the proposition.

Theorem 2: Let ω = e^(2πi/2^n). There exists a constant c depending on n such that for any M there exists z, |z| ≤ 1, such that for all ζ in Z_M[ω] we have |z - ζ| ≥ c/M^(2^(n-2)-1).

Proof: Let … and … be as in Proposition 1, and set ….
We do this for two reasons: first, this serves as a predecessor to the general approximation algorithm given in the next section, and hence eases the understanding of the main mechanism of the latter. Second, this algorithm is much simpler than the one given in the next section when specialized to the case of 16th roots of unity, and it runs much faster in practice. We adopt the notation of the previous sections and, in addition, we set ω = e^(2πi/16) throughout this section.

A. The Main Idea

Let us start by describing the main idea of the algorithm (which is quite similar to the one given in Appendix C). As was described in Section I, our algorithm runs in levels. At each level, we will use a different set of elements of Z_M[ω] to improve the quality of approximation, starting with the approximation obtained from the previous level. In the following, we will concentrate on one level only, which we may as well assume to be the first. Here, we start with zero as the first approximation and increase the value of the approximation step by step by adding a small algebraic integer whose signature is inverse to that of the current approximation. Loosely speaking, the signature of an element in Z[ω] ∩ R is the vector of signs of its coefficients when represented in the basis consisting of the θ_j. By adding a small element with inverse signature, we ensure that the absolute values of the coefficients remain bounded. For example, suppose that we want an approximation to … in Z_M[ω] ∩ R. Let S be a set whose elements have the following representation with respect to the basis:
Fig. 2. Circle of radius … and the unit circle.
The next theorem gives a lower bound on the size of the elements of Z_M[ω].

Theorem 3: For fixed n there exists a constant c depending on n such that
Proof: We will only prove this statement for the elements of …; the corresponding assertion for … can be proved in exactly the same way. During this proof we set …. Let … be the smallest element of this set, and let … be the matrix defined in Appendix A. Then, using (4) of Appendix A, we obtain … where …. The 1-norm of a real-valued vector is the sum of the absolute values of its entries. Note that if A is a square matrix, then the 1-norm of Av is at most c times that of v, where c is the maximum of the 1-norms of the columns of A. Hence, taking 1-norms of the above equation, we obtain
Note that …, and that the elements of S are of different signatures. We start with an element of S that is closest to …; this is …. Next, we choose from the set S an element whose signature is inverse to that of …, i.e., it has signature …. We obtain … where …. Let … be the norm of …. We have …. We proceed by adding to … an element of S with signature …. The third component can be plus or minus, since the third component of … is zero. This gives … where …. Observing that … has signature …, we obtain
where c is a constant depending on n. Since … is a nonzero integer, we have …, which implies the assertion.

III. 16TH ROOTS OF UNITY

In this section we describe in detail the approximation of real numbers in the interval … by elements from Z_M[ω] ∩ R.
which has larger distance to … than … does. Hence, we stop with the approximation …. The error of this approximation is 0.00121…. Several questions remain about this algorithm: how do we obtain the set S? What if during the approximation we encounter an element whose inverse signature is not contained in S? What is the quality of the approximation produced by this algorithm? In the remaining part of this section we will answer all these questions.
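A drastically simplified version of the level idea (our own toy, not the ABS16 algorithm itself): in the real subring Z[√2] of Z[ω_8], the powers of the small unit √2 − 1 have integer coefficients of opposite signs, so greedily adding them with the appropriate sign keeps the coefficients bounded while the error shrinks geometrically from level to level.

```python
# Toy signature-style approximation of a real x in [0, 1] inside Z[sqrt(2)]
# by greedily adding signed powers of the unit sqrt(2) - 1.  Its k-th power
# p_k + q_k*sqrt(2) has p_k and q_k of opposite sign, so each addition
# nudges the two coefficients in opposite directions.
import math

SQRT2 = math.sqrt(2.0)

def unit_power(k):
    """Integer coefficients (p, q) with (sqrt(2)-1)^k = p + q*sqrt(2)."""
    p, q = 1, 0
    for _ in range(k):
        p, q = -p + 2 * q, p - q          # multiply by sqrt(2) - 1
    return p, q

def approximate(x, levels=10, M=10000):
    """Greedy approximation; returns (a, b) with value a + b*sqrt(2)."""
    a, b, val = 0, 0, 0.0
    for k in range(levels + 1):
        p, q = unit_power(k)
        u = p + q * SQRT2                  # = (sqrt(2)-1)^k > 0
        while abs(x - val) > u / 2:
            s = 1 if x > val else -1
            if abs(a + s * p) > M or abs(b + s * q) > M:
                break                      # coefficient bound would break
            a, b = a + s * p, b + s * q
            val = a + b * SQRT2
    return a, b
```

Each level halves-or-better the remaining gap, mirroring the "smaller elements at deeper levels" structure of the algorithm in the text.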
B. Signatures

As was mentioned above, the signature of an element in Z[ω] ∩ R is, roughly speaking, the vector of signs of its coefficients. Unfortunately, the sign of zero is not uniquely determined, so we have to use a more technical definition. For … let … if …, … if …, and … if …. For … and … let …. Examples are
Proof: Let … and …. The assumption implies that … for all …. Since …, we obtain … for all ….
In order to carry out the idea of using signatures, we would have to construct a set of small elements of Z_M[ω] ∩ R having all possible signatures. This may not be a feasible task. Instead, the following lemma shows that it is enough to construct a set with only six appropriate signatures, at the cost of a small sacrifice on the bound …. Let …, …, …, and for … let … be such that … and ….

Lemma 4: Let
and
For … let … and define …. Hence …. The following properties of the sgn-function can be easily verified.

Remark 2: Let
Then there exists … or … such that ….

Proof: If … then apply Lemma 3. Suppose …. Then either … or …, and let … be such that …. Suppose that …. Since …, there exists ….
Then a) …; b) …; c) …; d) if … then … and ….

An important relationship between the sgn-function and the bound on the coefficients is given in the next lemma.

Lemma 2: Let … and
Let …. Then for … we have … and for … we have …, since …. Hence … by Lemma 2. The case … is handled analogously.
a) If … then there exists … such that ….
b) If … then there exists … such that ….
We will construct S from a smaller set having only three elements with disjoint signatures. The missing signatures can be obtained by multiplication with appropriate elements of ….

Lemma 5: Suppose that …, …, and …. Let … be such that … for …. Then
a) …;
b) …;
c) ….
Proof: a) There are three … by Remark 2d). Suppose that all of them are …. Then … for …, a contradiction.
b) There are three … by Remark 2d). Suppose that all of them are …. Then … for all …, a contradiction.

The idea of using signatures is captured by the following result.

Lemma 3: Suppose that … and that …. Then ….
Proof: Part c) is obvious, so we concentrate on the first two parts. Let
holds. Using Lemma 9 and Example 2, we obtain the following explicit form:
From this equation we can derive a relationship between the position of the maximal conjugate of an element and its signature.

Proposition 2: Suppose that …. Then for some … we have: … if …; … if …; … if …; … if …; … if …; … if ….

Proof: Suppose that …, …. Then, for
Fig. 3. Approximation by signatures (ABS16) in Z[e^(2πi/16)].
Let … be such that … and …. We prove that …. Note first that
The remaining cases can be proved analogously. Furthermore, a direct calculation shows that … for all …; the other cases, dealing with multiplication with …, are handled analogously. The algorithm Approximation by Signatures (ABS16) is given in Fig. 3 and will be analyzed later in this section. The basic precomputation step of this algorithm is the construction of sets … and …, …, satisfying ….

C. Construction of … and Analysis of ABS16

The next theorem gives sufficient conditions for elements to belong to the set S.

Theorem 4: Suppose that … is such that … for some …, and ….
. This set will consist of cyclotomic It remains to construct units, see Appendix B. By(4) in Appendix A we know that for
Then … and ….

Proof: The assertion on … has been proved in the previous proposition. The other claim is obtained using Lemma 10 in Appendix B-A and the equation …, where …. This proves the assertion.
The main ingredient of our algorithm is the choice of the elements of S as power products of ….
Suppose that … and that … is such that … for …, and …. Then … satisfies the assumptions of the above theorem. Taking logarithms we obtain … and …. Minimizing … subject to the above inequalities gives an integer linear programming (ILP) problem in three variables. One can prove that the optimal value of the objective function of this ILP is …; see the proof of Theorem 7. Now we can analyze Algorithm ABS16.

Theorem 5: Algorithm ABS16 computes the desired output in less than … iterations, where … is an absolute constant.

Proof: Let …. First we show that ABS16 computes the desired output in no more than
Similar to the algorithm presented in the previous section, the general algorithm also works in levels, using at each level a magic set consisting of small elements such that for each approximation in that level, there is an element in the magic set whose addition to the current approximation does not violate the bounds on the maximum size of the coefficients. However, unlike the algorithm of the previous section, we do not have a signature technique available. Instead, we will use a technique based on the magnitudes of the Galois conjugates of the given element: note that, in view of Lemma 1 in Appendix B-A, the condition … suffices to guarantee that …. Hence, if we add to … an element whose maximum of the absolute values of its Galois conjugates is at the same location as that of …, but with a different sign, then adding that element to … does again lead to an element inside …, provided the other conjugates of that element are small. This, and ideas to compute the magic set, are addressed in the remainder of the section.

A. The Algorithm

For … let
iterations. At step …, the difference between the …s obtained from two successive runs of the while-loop is at least …, and the while-loop terminates iff …. Hence, the while-loop is performed at most … times. For …, the while-loop is performed at most … times. This proves the claim. Hence, for proving the theorem, it suffices to show that
The basic precomputation step is that of finding a subset S of … consisting of positive elements with the following properties:
a) for all … we have … and …;
b) for all …: …;
c) ….

We first show that if S has the above properties, then for an arbitrary element … in … there always exists … such that …. Roughly speaking, one has to choose an element from S that has its maximum conjugate at the same position as …, but with a different sign. Recall that the sign of a nonzero real number x, denoted by sgn(x), is +1 if x is positive and -1 if it is negative.

Lemma 6: Let …, …, and let … be defined as above, where …. Further, let … if … and … if …. Then ….
for some constants … and …. But this follows from the above discussion and Theorem 3. Note that in ABS16 one has to compute a real approximation of … and compare it to that of … at each step. This can be done in time … at round … if we precompute approximations to … (we have to add/subtract …-bit integers). As a result, the overall running time of the algorithm is ….

IV. THE GENERAL APPROXIMATION ALGORITHM

In this section we describe a general procedure to approximate real numbers in the interval … by real elements of Z_M[ω], and prove Theorem 1. We adopt the notation of the previous sections, and set …, …. Given … and some …, our aim is to find … such that … for some constant … (possibly depending on n).
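The conjugate bookkeeping that replaces signatures can be sketched as follows (our own helper functions, written for the maximal real subfield of Q(ζ_16)): each real element is tracked through its four real embeddings, and the algorithm steers by the position and sign of the largest one.

```python
# Galois-conjugate bookkeeping for real elements of Z[zeta_16]: an element
# x = sum_k c_k * (zeta^k + zeta^-k) has four real conjugates sigma_s(x),
# s in {1, 3, 5, 7}, and the "spike" is the largest one in absolute value.
import math

N = 16
EMBED = [1, 3, 5, 7]                      # sigma_s : zeta -> zeta^s

def conjugates(coeffs):
    """sigma_s(x) for x = sum_k coeffs[k-1] * 2*cos(2*pi*k/N), k = 1, 2, ..."""
    return [sum(c * 2.0 * math.cos(2.0 * math.pi * k * s / N)
                for k, c in enumerate(coeffs, start=1))
            for s in EMBED]

def max_conjugate(coeffs):
    """(index, sign) of the largest |sigma_s(x)| -- the 'spike'."""
    vals = conjugates(coeffs)
    i = max(range(len(vals)), key=lambda j: abs(vals[j]))
    return i, (1 if vals[i] > 0 else -1)
```

A useful sanity check: the sum of the conjugates is the trace, and every θ_k with k not divisible by 16 has trace zero here.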
Proof: Let …. Then …. Suppose first that …. Then …, since by Condition b) on S we have ….
the inner loop is performed at most … times. For …, the inner loop is performed at most … times.
The remaining (and more difficult) problem is the design of the sets S. We will study the more general problem of constructing S for arbitrary n. In a first step we show how to construct S from a set with only half as many elements. The elements of S have their maximal conjugates at different positions. The idea is to construct from an element in S another element having its maximal conjugate at the same position but with a different sign, by multiplying this element with an appropriate ….

Lemma 7: Let …, …, and suppose that
Fig. 4. General approximation algorithm (GAA).
…, … odd, be such that …, …, and let …. Suppose now that …. Then
a) …;
b) …;
c) ….

Proof: Let …. Then … by Proposition 5. Hence a) follows.
b) We have …, where the last inequality follows from Condition c) on S.
The idea of the approximation algorithm is now fairly simple. We compute sets S for different … and, starting from …, we improve our approximation by adding elements from the sets until no further improvement is possible. Using Lemma 6 we know that we can always find such elements without violating the bound on the size of the coefficients. If …, we increase the value of … and start all over again. Suppose that we have found sets … for …, such that conditions a)-c) are satisfied. Let … and ….
c) By Proposition 5 we know that …. Hence for all … we get ….
Using the above lemma we only need to construct a set …, described below, which has only half as many elements as the set S. The construction is summarized as follows.

Remark 3: Let … be a subset such that
The general approximation algorithm (GAA) is given in pseudocode in Fig. 4.

Theorem 6: Algorithm GAA computes an element … such that … and uses no more than
for all …. For each … let …, where … divides …. Then by Lemma 7, the set … satisfies conditions a)-c) stated at the beginning of this section. Note that … and … since … by Proposition 5.

B. Design of …
iterations.

Proof: At step …, the difference between the …s obtained from two successive runs of the inner loop is at least …, and the inner loop terminates iff …. Hence, the inner loop is performed at most
To construct the sets S we will use power products of the cyclotomic units introduced in Appendix B-B. For … let …. For a given … we want to find integers … such that … satisfies the inequalities given in Remark 3. (To keep the notation simple, we
suppress the dependency of the … on ….) In fact, we will find … satisfying the stronger conditions … and …. Taking logarithms this gives
The next proposition shows that the optimal value of the objective function of ILP1 is … for fixed n, and completes the proof of Theorem 7.

Proposition 4: With the notation of the previous lemma let …. Then for fixed n we have ….

Proof: With the notation of the proof of the previous proposition we have for …
Recall the matrix … defined in Appendix B-B. Let … be the matrix obtained from … by subtracting the …th row from the …th for …, and leaving the …th row unchanged. Obviously, … is invertible by Theorem 9. In terms of this matrix the above inequalities can be summarized as

… (1)
Hence, there exists a nonnegative … depending on … but not on … such that …. Analogously, there is a nonnegative … depending on … and not on … such that …. Since … is a unit, we have
where … is the vector having entry … at position … and entry … at positions …, and the inequalities are to hold component-wise. Our aim is to find a small … satisfying these inequalities. This gives us the following integer linear programming problem:

ILP1: Minimize … subject to
Summing up, we obtain
While branch and bound methods easily yield optimal solutions of the above problem for small values of … (e.g., … or …), the integer linear programming approach is not feasible for larger …. Below we will give another approach to find (asymptotically optimal) solutions of ILP1, which will also prove the following.

Theorem 7: The integer linear programming problem ILP1 is solvable, and for fixed n the optimal value of its objective function is ….

Let …, and for each … let …. Let … be the vector having entry … at position … and entry … at position …. The following proposition shows that ILP1 is always solvable.

Proposition 3: Let
Hence, for fixed n we get the assertion.
The last two propositions form the proof of Theorem 1.

Proof of Theorem 1: We first compute the sets S for …. Using the previous propositions we see that each of these sets can be constructed in time … (solving linear equations). The computation of all the … takes … time. From … we construct … using Remark 3. This takes … time. We then incorporate these sets into Algorithm GAA given in Fig. 4. Since the maximal and the minimal elements of the sets … are of order … for fixed n by the previous proposition and Theorem 3, Theorem 6 implies that Algorithm GAA computes … with … and … in time …. Theorem 2 in Section II shows that the approximation error given in Theorem 1 is essentially optimal.

It should be noted that the restriction to … is not essential. Actually, for any … we can compute an approximation in the following way. We first approximate … with an element of …, and then add … to this approximation to obtain an element in …. Furthermore, combining the approximation algorithm with scaling techniques, we can obtain approximations of order … by elements of the set … (see also Section V). Furthermore, it is easy to show that the overall running time of GAA is …: at round … of the algorithm, we need
… and …. Then ….

Proof: Let … satisfy the inequalities (1). The …th entry of … gives the equation …, where … is the …th entry of …. Hence … by the definition of ….
time, since we are dealing with addition/subtraction of …-bit integers, which requires … time.

V. IMPLEMENTATION RESULTS

In this section we have gathered results of our implementations of the approximation algorithms discussed in this paper. Three different algorithms were implemented: approximation by eighth roots of unity, as described in Appendix C; ABS16; and GAA for the case of 32nd roots of unity.

A. Eighth Roots of Unity

We designed a data type for the elements of Z_M[ω] consisting of two integer coefficients and a value (IEEE 64-bit double format) corresponding to the value of the integer as a real number. This reduced computing the value of an approximation to floating-point addition. (In a hardware implementation, the floating-point addition would have to be replaced with a fixed-point addition, of course.) For each M, we ran our algorithm for 1000 random inputs in the interval …. The maximum absolute error observed was slightly less than the bounds computed in Appendix C. The number of iterations needed to reach the desired bound was, however, significantly less than those bounds, with ratios ranging between … for small M (around …) to … for larger M (around …). Our algorithm runs fast in practice. For instance, 100 000 approximations of random numbers in the interval … took 0.2 s on an ULTRASPARC-1 with 167 MHz. Due to its simple form, it is well suited for real-time applications when combined with scaling methods.

B. 16th Roots of Unity

The units giving rise to the sets … were computed by solving the integer linear programming problems of Section III. The resulting ILPs were solved by the integer linear programming package … [16]. Using these units we computed upper bounds for theoretical worst case running times (in terms of the number of iterations) of the approximation algorithm as given in Theorems 5 and 6. Here too, we ran the algorithm on 1000 random numbers in the interval … for M in a set of 13 values ranging from … to ….
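Our reading of the data type described in Section V-A can be sketched as follows (all names are ours): exact integer coefficients travel together with a cached double, so updating an approximation costs a single floating-point addition.

```python
# Sketch of the eighth-roots data type: two integer coefficients plus a
# cached IEEE double, so that combining approximations needs only one
# floating-point add (the integer part stays exact).
import math

class RealZ8:
    """a + b*sqrt(2): a real element of Z[omega_8], omega_8 = e^(2*pi*i/8)."""
    __slots__ = ("a", "b", "value")

    def __init__(self, a, b):
        self.a, self.b = a, b
        self.value = a + b * math.sqrt(2.0)   # cached numeric value

    def add(self, other):
        """Exact integer addition; the cached value needs one float add."""
        out = RealZ8.__new__(RealZ8)
        out.a, out.b = self.a + other.a, self.b + other.b
        out.value = self.value + other.value
        return out
```

In hardware, the cached double would be a fixed-point accumulator, as the text notes.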
For the implementation of ABS16 we deviated slightly from the algorithm given in Section III by replacing the single addition steps with multisteps, obtained by multiplying the current unit by an appropriate multiple. Using this strategy, the observed number of iterations was less than the theoretical upper bounds by a factor ranging from for small to for larger . Our implementation is quite fast: for instance, 13 000 real approximations were accomplished in 0.7 s on an ULTRASPARC-1 with 167 MHz.

C. 32nd Roots of Unity

In a preprocessing phase we computed the sets for a special sequence of described below. The resulting ILPs were solved by the integer linear programming package lp_solve [16]. All the computations in the field , including those of the units, were done with the package PARI [17]. For
each element found in this way we also computed and stored all of its conjugates. This data was used by the approximation algorithm to reduce the computation of the conjugates of the approximations to table lookups and additions/subtractions. To avoid a large number of iterations, we had to design the sets in such a way as to minimize the sum given in Theorem 6. Deviating from the sequence to resulted in a good tradeoff between the number of iterations and the amount of memory used to store all the . Again, for each in a set of 22 different values ranging from to , we ran our algorithm for 1000 random numbers in the interval . We observed that the theoretical upper bounds on the number of iterations obtained via Theorem 6 were too pessimistic, by a factor ranging from to .

Due to the complicated data structures necessary for this case, our implementation was not as fast as the ones reported earlier in this section. For instance, 22 000 real approximations took 17 s on an ULTRASPARC-1 with 167 MHz.

In general, our implementations and the large number of examples we have computed so far support the conjecture that the bounds we have obtained on the range of the approximation are quite pessimistic in real applications.

D. Error of the FFT

Suppose that we want to compute the DFT of a vector and assume that all the entries of are smaller than 1 in absolute value. The approximation unit produces a vector whose entries differ from the corresponding entries of by at most ε, say, in absolute value. Then the Euclidean norm of the difference of the two vectors is at most ε√n, where n is the length of the vectors. Since the (normalized) DFT-operator of length n is unitary, it preserves the Euclidean norm. Since our FFT procedure does not add additional computational noise, it computes the transform of the approximating vector exactly, and thus we see that the maximal error of our routine is bounded above by ε√n. The reader can now plug in the different values of ε obtained from the theory given above, or from the experiments reported in this section, to derive upper bounds for the error in the DFT-routine.

VI.
EXTENSION TO OTHER FIELDS

In this section we prepare the ground for a far-reaching generalization of the main results of this paper. We will start with the characterization of those complex numbers such that is dense in ; see Theorem 8. This theorem has been taken from [18]. It turns out that the necessary conditions for this to hold are also sufficient: if is neither real nor an algebraic integer of degree 2 (in which case is a lattice), then is dense.

Theorem 8: For a complex number the following statements are equivalent.
a) is dense in .
b) is not real and is not an algebraic integer of degree 2.

Proof: It is obvious that a) implies b), hence we focus on the converse. The proof proceeds in several steps.

Claim 1: If and , then is dense in .

To prove this, note that, according to our assumptions, all are two-dimensional lattices in . Hence,
for every there exist uniquely determined such that

Since the diameter of is smaller than and , our first claim follows.

Claim 2: If contains a nonzero with for all , then is dense in . This is obvious by Claim 1, since and .

Now let . If contains a nonzero with , then a) follows by Claim 2. So we are left with the case

(2)

We are going to show that this case is impossible. To begin with, we note that is discrete, since by (2) the difference of any two different elements in has absolute value . Hence there exists satisfying
index of the group of units of the ring , where the bar means complex conjugation. Then, in exactly the same way as in Section IV, we can construct an approximation algorithm whose worst case approximation error is in if the absolute values of the coefficients are bounded by , where is the maximal number of independent units of the ring. By Dirichlet's Unit Theorem [19] this number equals . Abelian number fields satisfy the above assumptions; hence, our algorithm is readily extendable to these fields.

VII. CONCLUSION

In this paper we have introduced an abstract processor for high-precision FFT computations which operates using fixed-point arithmetic. As fixed-point processors are generally cheaper than floating-point processors, this approach could ultimately lead to considerable cost reductions in areas where high-precision FFTs are needed. These areas include computer tomography, high-resolution radar, and geophysical signal processing. In the latter application, for instance, one tries to obtain information about different layers of the earth by studying the shock waves of deliberate, controlled explosions. Longer FFTs lead to a better resolution of the image, and more precise FFTs ultimately lead to more reliable data.

The processor we have introduced combines old and new techniques. It uses the well-known approach of Chinese remaindering. This approach has not been very successful in the past because of the lack of a reasonable unit to convert the input data into the domain in which the Chinese remaindering operates. These domains are often finite fields obtained by reducing rings of algebraic integers of certain number fields modulo appropriate prime divisors. Using the theory of cyclotomic fields, we have designed a fast and practical approximation algorithm, and have also shown its optimality with respect to its worst case approximation error.
The main tools for the design of our algorithm are the Galois theory of cyclotomic fields and the theory of cyclotomic units. In particular, the approximation that our algorithm computes is a sum of appropriately chosen cyclotomic units. The best implementation results are obtained for the cases of approximation by eighth and 16th roots of unity: in these cases, the Galois-spectrum technique of the general case can be replaced by simple signature techniques requiring only additions/subtractions and table lookups. We have also shown that the methodology of our algorithm is, in fact, applicable to a much wider class of fields, including Abelian fields, thereby revealing the intimate relationship between the worst case error of approximation algorithms and the unit rank of the field over which the approximation algorithm is defined.

APPENDIX A
GALOIS EXTENSIONS

Let be a Galois extension of fields with group . Further, let be a -basis of , with . We wish to derive a relationship between the sets and , . We start by noting that for any nonzero and
Claim 3: To see this, let and with Since and using
. Since is not real, there exist such that we have . But then (see (2)) we get
If , we get by minimality of . If , then ; hence . Combining this with , we get by (2). In both cases , and our claim follows.

To finish the proof we derive the contradiction . By Claim 3 we already know that and for suitable . Hence
This completes the proof of the theorem.

In the same way as above we can also show that for real the set is dense in if and only if is not an integer.

This result suggests the following generalization of the general approximation algorithm GAA: suppose that , where is a totally real algebraic integer. Suppose further that we already know a subgroup of finite
there exists an invertible matrix , such that

(3)
so Gal . Note that is the complex conjugation. Hence its fixed field, denoted by , is a subfield of and has index in . Let , and for let . The elements of form an integral basis of , i.e., they are a -basis of the ring of integers of . For a positive integer we define
The map (the regular representation) is an injective homomorphism of into GL . Since the entries of belong to , they are invariant under . Hence, taking the -conjugates of (3) yields

where
By Galois theory we have Gal . An explicit isomorphism is obtained as follows: let be in Gal . Then is defined by
Example 1: Let and . The images of the under are given by and , and is the diagonal matrix with diagonal entries . The following result is well known [20, Sec. 5, Corollary 5.4].

Lemma 8: is invertible.

In the following we will assume that and that is the identity map. The definition of implies that (multiply both sides of (3) with from the left). Hence, noting that the first row of is the
all-one vector, we obtain

(4)

For we set . In accordance with Appendix A we define in this case; is invertible by Lemma 8, and we can explicitly determine .

Lemma 9: We have

Proof: For , the -entry of equals
APPENDIX B
CYCLOTOMIC FIELDS

In this appendix we review some basic and well-known facts about cyclotomic fields. Proofs of the classic results not explicitly proved here can be found, e.g., in Washington's book [19].

A. Explicit Galois Theory

Let , and let be the cyclotomic field generated over by . It has a -basis , and its ring of integers is
Recall that for a positive integer we denote by the subset of . It is well known that is a Galois extension of with Galois group isomorphic to under the canonical isomorphism Gal
where is the Kronecker function. (Note that if , then the sum of the conjugates of under Gal is zero.) Analogously one proves that the -entry of equals .

Example 2: For we have
and
B. Cyclotomic Units

The norm of an element is defined as
For
let
The norm is related to the different conjugates of in the following way.

Lemma 10: For we have
If is an algebraic integer, then ; is called a unit if . A set of units is called independent if , , implies .

For general number fields, it can be quite hard to find a maximal set of independent units. For cyclotomic fields and their maximal real subfields, however, the situation is quite different: the cyclotomic units form a maximal set of independent units. They are defined as follows: for we set [19, Ch. 8]. The following important fact holds for the matrix .

Theorem 9: The matrix is invertible.
As
Proof: By (4) we know that
Since are
for all , the absolute values of the entries of by Lemma 9.
We will use this lemma in the following form. Corollary 1: If is such that
A proof of this classic result can be found in [19, Ch. 8.1]. The regularity of this matrix is a consequence of the nonvanishing of the Dirichlet -series at for nontrivial characters .
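Numerically, the unit property of the cyclotomic units can be observed directly: the product of all Galois conjugates of (1 − ζ^a)/(1 − ζ), i.e., its norm, has absolute value 1. The following Python sketch (our illustration with n = 8, not part of the original text) checks this in floating-point arithmetic:

```python
import cmath
import math

n = 8
zeta = cmath.exp(2j * math.pi / n)        # primitive n-th root of unity

def norm_of_unit(a):
    """Product of the Galois conjugates of u_a = (1 - zeta^a)/(1 - zeta),
    obtained by applying sigma_c : zeta -> zeta^c for every c coprime to n."""
    p = 1 + 0j
    for c in range(1, n):
        if math.gcd(c, n) == 1:
            p *= (1 - zeta ** (a * c)) / (1 - zeta ** c)
    return p

norms = [abs(norm_of_unit(a)) for a in (3, 5, 7)]   # each should equal 1
```

Since the conjugation permutes the factors 1 − ζ^a among themselves, the product telescopes to 1 exactly; the floating-point check recovers this up to rounding.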
for some positive integer , then we have

APPENDIX C
APPROXIMATION BY EIGHTH ROOTS OF UNITY

Recall that for we have . In this section we will design an algorithm for approximating real numbers in the interval by real elements of the ring . In particular, we will prove Theorem 1 for the case . The reason for including this case separately is that it is much simpler than the general case and leads to an interesting, fast, and practical algorithm.

As was described earlier in Section I, our algorithm works in levels. At level the algorithm uses a magic set of small positive elements of . These elements have inverse signatures, i.e., the signs of their two coefficients are and . Using them, we can improve our approximation at level by adding an element of the magic set whose signature is the inverse of that of the current approximation. In this way we make sure that the coefficients of the new approximation will be less than in absolute value.

Construction of the magic set uses the continued fraction expansion of . In this respect, and only in this, our algorithm is similar to that of Games [14]. Let and be sequences of integers defined by
Lemma 10 explains the significance of this quantity: we always have .

Proposition 5: Let and be such that and is odd. Then
Proof: Let us first prove that for all odd . Hence and . Next we show that for all
: note that .
Notice that for all ; hence, since , we need only prove the left equality. Observe that . Thus , and for odd we have
(5)

Hence . The quotients are called the convergents of the continued fraction expansion of . The following (well-known) results will be useful in analyzing our approximation algorithm.

Lemma 11: The following assertions hold for :
a)
b)
Now note that . By the above we have (replace by ).
Theorem 10: Algorithm ABS8 computes
with
Fig. 5. Approximation by signatures (ABS8) for .
c) d) e)
Proof: a) This is obvious from the recursion formulas. b) The right-hand inequality is well known, see [21]. For the left-hand inequality observe first that
in at most iterations.

Proof: Let . Lemma 12 assures that . The approximation error at step can be upper-bounded by , which is less than by Lemma 11d). Let us study how many times the inner loop is performed at step . First, notice that for each the algorithm produces an approximation of at step . Furthermore, for some nonnegative integers and . The number of times the inner loop is performed at step is then . Since and , we obtain . This gives , , , and rules out the case . Hence . At step 1 the loop is performed at most times. This implies the assertion.

For the error bound given in algorithm ABS8 can be improved to if we do not require that the approximation be less than . Using Remark 1 this yields an approximation error of for the approximation of complex numbers. If is an arbitrary integer, we have to multiply all these bounds by an additional factor of .

APPENDIX D
ANALYSIS OF THE CHINESE REMAINDERING

This appendix is devoted to the proof of the following theorem.
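The convergent recurrences (5) and the role of the small quantities |p_k − q_k√2| in ABS8 can be illustrated with a simplified greedy variant. In the Python sketch below (the function names and the greedy loop are our simplification, not the paper's pseudocode of Fig. 5), we build the convergents of √2, confirm the classical identity p_k² − 2q_k² = ±1, and approximate a real x in [0, 1) by an element a + b√2:

```python
import math

def convergents(k):
    """First k convergents p/q of sqrt(2) = [1; 2, 2, 2, ...]:
    p_j = 2*p_{j-1} + p_{j-2},  q_j = 2*q_{j-1} + q_{j-2}."""
    ps, qs = [1, 3], [1, 2]
    while len(ps) < k:
        ps.append(2 * ps[-1] + ps[-2])
        qs.append(2 * qs[-1] + qs[-2])
    return ps, qs

def greedy_approx(x, levels):
    """Approximate x by a + b*sqrt(2), consuming the shrinking
    quantities |p_j - q_j*sqrt(2)| = (sqrt(2)-1)^(j+1) level by level."""
    ps, qs = convergents(levels)
    s2 = math.sqrt(2.0)
    a = b = 0
    r = x                                  # remaining error
    for p, q in zip(ps, qs):
        d = p - q * s2                     # alternates in sign as j grows
        m = int(r // abs(d))               # copies of |d| that fit into r
        s = 1 if d > 0 else -1
        a += m * s * p                     # m*|d| = m*s*(p - q*sqrt(2))
        b -= m * s * q
        r -= m * abs(d)
    return a, b, r

ps, qs = convergents(6)                    # 1/1, 3/2, 7/5, 17/12, 41/29, 99/70
a, b, r = greedy_approx(0.7182818, 12)     # final error below (sqrt(2)-1)^12
```

Note that a + b√2 + r equals the input by construction, so the remaining error r is bounded by the smallest quantity consumed at the last level.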
Hence, by a) we get
. The assumption would yield the contradiction
for . For the assertion can be verified directly.

c)–e) The formulas for and follow from (5). The inequalities can be easily derived from these formulas.

For let
The algorithm approximation by signatures (ABS8) is given as pseudocode in Fig. 5 and works as follows. Suppose that . Starting with , we run through all between and and improve our approximation by adding to it the number or , depending on the sign of the first coefficient of . As we are adding up positive numbers, the algorithm will eventually terminate. The following lemma is the basis of our approximation algorithm.

Lemma 12: Let , ,
Theorem 11: Let be a fixed integer. Let , be integers, , and let , where , be a constant. Suppose that we have an algorithm that approximates complex numbers inside the unit circle by elements of to within an absolute error of in time . Then, using the Chinese remainder strategy described in this paper, we can compute -bit approximations of the DFT of a complex vector of length with entries inside the unit circle in time , where is the time required to compute the product of two -bit numbers.

Let be a vector of length of complex numbers. We will assume that the components have been scaled beforehand so that all of the lie inside the unit circle, and we will assume that the are known to a precision of bits, which means that is a Gaussian integer. Our goal is to compute bits of the components of the Fourier transform sequence
and , and if Proof: Since Suppose that shows that Similarly, is handled analogously.
. If , then , . Then
, then . is equivalent to , which . If this is done by any of the usual FFT algorithms using fixed-point arithmetic, then the calculation will require arithmetic operations, and one expects to lose bits in each component due to computational noise [4], [22]. Fix an integer as in Theorem 1, and let . Use the approximation algorithm described there to calculate
. The case
BUHLER et al.: FAST AND PRECISE FOURIER TRANSFORMS
227
an approximation to the real and imaginary components of each of the that is accurate to bits; for this, it suffices to take , where is a constant which guarantees the approximation error to be less than . The result is a vector whose entries are in , where . Similarly, we compute once and for all -bit approximations to the roots of unity (twiddle factors) that arise in computing -point FFTs.

Now we choose prime numbers such that . Each of these primes splits into prime ideals in ; concretely, this corresponds to a value of modulo . More precisely, for each such prime we replace in our expressions for the components of (or the twiddle factors) by an integer , to get an element of the field with elements. We then perform the FFT by any of the usual variants of the Cooley–Tukey algorithm. The arithmetic is all done modulo and is exact.

The Chinese remainder reconstruction has two phases. The first uses the values modulo to reconstruct the Fourier transform vector modulo ; the second uses the usual Chinese remainder techniques to reconstruct . The former amounts to a discrete Fourier transform of length (we know values of a polynomial of degree less than and can recover the coefficients). The reconstruction of from the is done by standard Chinese remainder techniques; the running times of both of these phases will be analyzed below.

Finally, we convert the elements of to complex numbers by replacing and its powers by -bit complex approximations. This finishes the description of the algorithm.

In order to assess the asymptotic complexity of the procedure, we need to make assumptions about the various parameters. For the sake of clarity and simplicity, we choose to think of the approximation degree as fixed, with the precision and the length of the Fourier transform going to infinity. Each of the initial approximations can be calculated in time
[23] says that the number of primes up to that are congruent to is within an error of of its asymptotic value , where the error term is
Thus for large enough it suffices to take, for instance, . Then we get almost primes, each with at least bits. Their product has bits and is certainly larger than . In addition, each prime has at most bits, so that the arithmetic certainly involves at most operations, as claimed.

Thus we have primes (this is an overestimate in reality), and the computational complexity of a single FFT is . The overall complexity of the FFT phase is then .

The first Chinese remainder step involves, in each of components, discrete Fourier transforms of length of integers modulo . The total complexity is, therefore,
and this initial phase of the computations thus takes time . This is a key phase of the computation. Note that in practice the denominator , which is constant in our theoretical model, actually makes this idea even more attractive; in any event, the running time given in Theorem 1 shows that this complexity will clearly be dominated by the complexity of the ensuing FFTs.

Each of the FFTs takes arithmetic operations on integers of size . In order for the Chinese remainder techniques to work, we need the product of the primes to exceed . We choose successive primes congruent to that exceed .

We need to know that the primes involved are not significantly larger than , so that the FFT arithmetic is -bit arithmetic. In practice, it is easy to find the necessary primes. We can also easily prove that there are enough primes, at least if we assume the extended Riemann hypothesis. Indeed, the effective version of Dirichlet's theorem on arithmetic progressions in
which is majorized by the previous FFT step.

The second Chinese remainder step involves a Chinese remainder computation on primes of length for each of the components. To perform this step efficiently, we assume that we have already performed the following precomputation: for each prime found above, we compute a positive integer less than the product of the primes such that it is congruent to modulo and congruent to modulo the other primes involved. In this way we can perform the inverse of the Chinese remaindering operations: we need to compute expressions of the form by using . The inner summands can be computed using operations (using standard multiplication). As there are primes in total, the running time of this stage of the algorithm is , even using the grammar school method of integer multiplication.

At this stage of the algorithm we have obtained a vector of length whose entries are integral linear combinations of powers of with coefficients bounded by in absolute value. For each of these entries we multiply precomputed -bit approximations of the roots of unity with the -bit coefficients. Since , the total running time for the final conversion is . All in all, the overall running time of the algorithm is of order
which is
. This proves Theorem 1.

We would like to emphasize that the technique introduced in this paper can be used (almost verbatim) to compute split-radix fast Fourier transforms as well.
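The pipeline just analyzed — exact transforms modulo several primes p ≡ 1 (mod n), followed by Chinese remainder reconstruction — can be demonstrated in miniature. The Python sketch below is our toy illustration (it uses a naive O(n²) transform where the paper would use Cooley–Tukey, and toy-sized primes): it computes a length-8 cyclic convolution of integer vectors exactly, with no computational noise, and checks the result against a direct convolution.

```python
n = 8
primes = [17, 97]                  # both congruent to 1 mod 8

def find_root(p):
    """A primitive n-th root of unity in F_p (exists since p = 1 mod n)."""
    for g in range(2, p):
        w = pow(g, (p - 1) // n, p)
        if all(pow(w, k, p) != 1 for k in range(1, n)):
            return w

def dft_mod(v, w, p):
    """Naive O(n^2) DFT over F_p; exact modular arithmetic throughout."""
    return [sum(v[j] * pow(w, i * j, p) for j in range(n)) % p
            for i in range(n)]

def convolve_mod(x, y, p):
    """Cyclic convolution modulo p via forward DFT, pointwise product,
    and inverse DFT (root w^-1, scaling by n^-1)."""
    w = find_root(p)
    X, Y = dft_mod(x, w, p), dft_mod(y, w, p)
    Z = [a * b % p for a, b in zip(X, Y)]
    ninv = pow(n, -1, p)
    return [z * ninv % p for z in dft_mod(Z, pow(w, -1, p), p)]

def crt(r1, p1, r2, p2):
    """Reconstruct the residue modulo p1*p2 from residues mod p1 and mod p2."""
    return (r1 + p1 * (((r2 - r1) * pow(p1, -1, p2)) % p2)) % (p1 * p2)

x = [3, 1, 4, 1, 5, 9, 2, 6]
y = [2, 7, 1, 8, 2, 8, 1, 8]
zs = [convolve_mod(x, y, p) for p in primes]
result = [crt(a, primes[0], b, primes[1]) for a, b in zip(*zs)]
direct = [sum(x[j] * y[(i - j) % n] for j in range(n)) for i in range(n)]
```

Since every entry of the true convolution is smaller than 17 · 97 = 1649, the reconstruction is exact and `result` agrees with `direct` over the integers. (The three-argument `pow` with a negative exponent requires Python 3.8+.)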
as
Remark 4: The most realistic value for with reasonable values of the parameters is ; in theory, one might replace this with the asymptotically fast value [24]. It is also conceivable that special-purpose chips would be constructed, so that would represent a more general notion of complexity, such as chip area or running time.

We should note that in practice the original values of the might have been quantized from values known to greater than -bit precision. Random quantization error of this kind tends to produce fewer problems in FFT algorithms than computational noise [22]; a standard stochastic analysis says that on average this sort of error contaminates bits of the result [25]. Since this applies equally to the usual fixed-point model and to our model (and even to floating-point models), and since it gives lower order terms in any complexity analysis, we have ignored this kind of quantization error throughout.

ACKNOWLEDGMENT

The authors wish to thank the three anonymous referees for their suggestions on improving the presentation of the material in this paper.

REFERENCES

[1] J. H. McClellan and C. M. Rader, Number Theory in Digital Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1979.
[2] J. W. Cooley and J. W. Tukey, "An algorithm for the machine calculation of complex Fourier series," Math. Comp., vol. 19, pp. 297–301, 1965.
[3] J. H. Karl, An Introduction to Digital Signal Processing. Boston, MA: Academic, 1989.
[4] A. V. Oppenheim and R. W. Schafer, Digital Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1975.
[5] P. D. Welch, "A fixed-point fast Fourier transform error analysis," IEEE Trans. Audio Electroacoust., vol. AU-17, pp. 151–157, 1969.
[6] R. C. Agarwal and C. S. Burrus, "Fast convolution using Fermat number transforms with applications to digital filtering," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-22, pp. 87–97, 1974.
[7] J. H. Cozzens and L. A. Finkelstein, "Computing the discrete Fourier transform using residue number systems in a ring of algebraic integers," IEEE Trans. Inform. Theory, vol. IT-31, no. 5, pp. 580–588, 1985.
[8] E. Dubois and A. N. Venetsanopoulos, "The discrete Fourier transform over finite rings with application to fast convolution," IEEE Trans. Comput., vol. C-27, pp. 586–593, 1978.
[9] C. M. Rader, "Discrete convolution via Mersenne transforms," IEEE Trans. Comput., vol. C-21, pp. 1269–1273, 1972.
[10] I. S. Reed and T. K. Truong, "The use of finite fields to compute convolutions," IEEE Trans. Inform. Theory, vol. IT-21, pp. 208–213, 1975.
[11] ——, "Complex integer convolution over a direct sum of Galois fields," IEEE Trans. Inform. Theory, vol. IT-22, pp. 475–568, 1976.
[12] ——, "Convolutions over residue classes of quadratic integers," IEEE Trans. Inform. Theory, vol. IT-24, pp. 343–344, 1976.
[13] R. A. Games, "Complex approximations using algebraic integers," IEEE Trans. Inform. Theory, vol. IT-31, no. 5, pp. 565–579, 1985.
[14] ——, "An algorithm for complex approximation in Z[e^{2πi/8}]," IEEE Trans. Inform. Theory, vol. IT-32, pp. 603–607, July 1986.
[15] R. A. Games, D. Moulin, S. D. O'Neil, and J. J. Rushanan, "Algebraic integer quantization and residue number system processing," in ICASSP 89, 1989, pp. 948–951.
[16] M. Berkelaar, lp_solve 2.0 release. [Online]. Available via anonymous ftp from ftp://ftp.es.ele.tue.nl/pub/lp_solve/.
[17] C. Batut, D. Bernardi, H. Cohen, and M. Olivier, User's Guide to PARI-GP, Univ. Bordeaux, obtainable via anonymous ftp from megrez.math.u-bordeaux.fr, May 1995.
[18] M. Clausen and M. A. Shokrollahi, "Dense Z-modules in C," unpublished, 1988.
[19] L. C. Washington, Introduction to Cyclotomic Fields (Graduate Texts in Mathematics, vol. 83), 2nd ed. New York, NY: Springer-Verlag, 1996.
[20] S. Lang, Algebra, 3rd ed. Reading, MA: Addison-Wesley, 1993.
[21] I. Niven and H. S. Zuckerman, An Introduction to the Theory of Numbers, 4th ed.
New York, NY: Wiley, 1980.
[22] R. Storn, "Algorithmen und Architekturen der diskreten Fourier-Transformation zur schnellen Faltung reeller Signale," Ph.D. dissertation, Universität Stuttgart, Stuttgart, Germany, 1990.
[23] J. Oesterlé, "Versions effectives du théorème de Chebotarev sous l'hypothèse de Riemann généralisée," Astérisque, vol. 61, pp. 165–167, 1979.
[24] A. Schönhage and V. Strassen, "Schnelle Multiplikation großer Zahlen," Computing, vol. 7, pp. 281–292, 1971.
[25] U. Heute, "Fehler im DFT und FFT: Neue Aspekte in Theorie und Anwendung," Ph.D. dissertation, Universität Erlangen, Erlangen, Germany, 1982.