Transcript of Lecture 7, Inner product spaces (cont’d) – links.uwaterloo.ca/amath391w13docs/set3.pdf
Lecture 7
Inner product spaces (cont’d)
The “moral of the story” regarding discontinuities: They affect the rate of convergence of Fourier series
As suggested by the previous example, discontinuities of a function f(x) create problems for its Fourier
series expansion by slowing down its rate of convergence. At a jump discontinuity, the convergence
may be quite slow, with the partial sums demonstrating Gibbs’ “ringing.”
Another way to look at this situation is as follows: Generally, a higher number of terms in the
Fourier series expansion – or “higher frequencies” – are needed in order to approximate a function
f(x) near points of discontinuity.
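The Gibbs “ringing” mentioned above can be seen numerically. The sketch below (Python with NumPy; the function name and the grid are our own choices, not from the notes) sums the Fourier series of the square wave f2(x) = sign(x), whose nonzero coefficients are 4/(πk) for odd k. The peak of the partial sum just to the right of the jump does not approach 1: it stays near the classical Gibbs value (2/π) Si(π) ≈ 1.179, no matter how many terms are taken.

```python
import numpy as np

def square_wave_partial_sum(x, n_terms):
    """Partial sum of the Fourier series of f2(x) = sign(x) on [-pi, pi]:
    f2(x) ~ (4/pi) * sum over odd k of sin(kx)/k."""
    s = np.zeros_like(x)
    for k in range(1, 2 * n_terms, 2):   # odd frequencies k = 1, 3, ..., 2*n_terms - 1
        s += (4.0 / np.pi) * np.sin(k * x) / k
    return s

x = np.linspace(1e-4, np.pi / 2, 20000)  # approach the jump at x = 0 from the right
for n in (10, 50, 200):
    print(n, square_wave_partial_sum(x, n).max())
# the overshoot does not die out as n grows; it tends to (2/pi) Si(pi) ~ 1.179
```

The persistent ~9% overshoot of the jump is the Gibbs phenomenon.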
But, in fact, it doesn’t stop there – the existence of points of discontinuity actually affects the rate of
convergence in other regions of the interval of expansion. To see this, let’s return to the two examples
studied above, i.e., the functions
f1(x) = |x| = { −x,  −π < x ≤ 0;   x,  0 < x ≤ π }   (1)
and
f2(x) = { −1,  −π < x ≤ 0;   1,  0 < x ≤ π }.   (2)
Note that we have subscripted them for convenience. Recall that the function f1(x) is continuous on
[−π, π] and its 2π-extension is continuous for all x ∈ R. On the other hand f2(x) has a discontinuity
at x = 0 and its 2π-extension has discontinuities at all points kπ.
We noticed how well a rather low number of terms (i.e., 5) in the Fourier expansion of f1(x)
approximated it over the interval [−π, π]. On the other hand, we saw how the discontinuities of f2(x)
affected the performance of the Fourier expansion, even for a much larger number of terms (i.e., 50).
This is not so surprising when we examine the decay rates of the Fourier series coefficients for each
function:
1. For f1(x), the coefficients ak decay as O(1/k²) as k → ∞.
2. For f2(x), the coefficients bk decay as O(1/k) as k → ∞.
The coefficients for f1 are seen to decay more rapidly than those of f2. As such, you don’t have to
go to such high k values (which multiply sine and cosine functions, of maximum absolute value 1) for
the coefficients ak to become negligible to some prescribed accuracy ε. (Of course, there is the infinite
“tail” of the series to worry about, but the above reasoning is still valid.)
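The two decay rates are easy to confirm numerically. In the sketch below (NumPy; the midpoint-rule grid and helper names are our own choices), ak·k² for f1(x) = |x| and bk·k for f2(x) = sign(x) come out roughly constant over odd k – exactly the O(1/k²) and O(1/k) behaviour quoted above.

```python
import numpy as np

# midpoints of an equipartition of [-pi, pi]
N = 200000
dx = 2 * np.pi / N
x = (np.arange(N) + 0.5) * dx - np.pi

def cosine_coeff(f, k):
    # a_k = (1/pi) * integral of f(x) cos(kx) over [-pi, pi], midpoint rule
    return (f(x) * np.cos(k * x)).sum() * dx / np.pi

def sine_coeff(f, k):
    # b_k = (1/pi) * integral of f(x) sin(kx) over [-pi, pi], midpoint rule
    return (f(x) * np.sin(k * x)).sum() * dx / np.pi

for k in (1, 3, 5, 11, 21):
    ak = cosine_coeff(np.abs, k)     # f1(x) = |x|: exact value -4/(pi k^2) for odd k
    bk = sine_coeff(np.sign, k)      # f2(x) = sign(x): exact value 4/(pi k) for odd k
    print(k, ak * k**2, bk * k)      # roughly constant -> O(1/k^2) and O(1/k)
```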
The other important point is that the rate of decay of the coefficients affects the convergence over
the entire interval, not just around points of discontinuity. This has been viewed as a disadvantage
of Fourier series expansions: that a “bad point,” p, i.e. a point of discontinuity, even near or at the
end of an interval will affect the convergence of a Fourier series over the entire interval, even if the
function f(x) is “very nice” on the other side of the interval. We illustrate this situation in the sketch
on the left in the figure below.
Researchers in the signal/image processing community recognized this problem years ago and
came up with a clever solution: If the convergence of the Fourier series over the entire interval [a, b]
is being affected by such a bad point p, why not split the interval into two subintervals, say A = [a, c]
and B = [c, b] and perform separate Fourier series expansions over each subinterval. Perhaps in this
way, the number of coefficients saved by the “niceness” of f(x) over [a, c] might exceed the number of
coefficients needed to accommodate the “bad” point p. The idea is illustrated in the sketch on the right
in the figure below.
The above discussion is, of course, rather simplified, but it does describe the basic idea behind
block coding, i.e., partitioning a signal or image into subblocks and Fourier coding each subblock, as
opposed to coding the entire signal/image.
Block coding is the basis of the JPEG compression method for images as well as for the MPEG
method for video sequences. More on this later.
[Figure: Left, a function y = f(x) on [a, b] with a “bad” point of discontinuity p near one end and a “nice” region of smoothness elsewhere; a single Fourier series is used on [a, b]. Right, the interval is split at c, with separate Fourier series on [a, c] and [c, b].]
Greater degree of smoothness implies faster decay of Fourier series coefficients
The effect of discontinuities on the rate of convergence of Fourier series expansions does not end with
the discussion above. Recall that the Fourier series for the continuous function f1(x) given above
demonstrated quite rapid convergence. But it is possible that a series will demonstrate even more rapid
convergence due to the fact that the Fourier series coefficients ak and bk decay even more rapidly
than 1/k2. Recall that the function f1(x) is continuous, but that its derivative f ′(x) is only piecewise
continuous, having discontinuities at x = 0 and x = ±π. Functions with greater degrees of smoothness,
i.e., higher-order continuous derivatives will have Fourier series with more rapid convergence. We
simply state the following result without proof:
Theorem: Suppose that f(x) is 2π-periodic and Cn[−π, π], for some n > 0 – that is, its nth derivative
(and all lower order derivatives) is continuous. Then the Fourier series coefficients ak and bk decay as
ak, bk = O(1/k^{n+1}),  as k → ∞.
An idea of the proof is as follows. To avoid complications, suppose first that f is piecewise continuous,
corresponding to n = 0 above. Then the coefficients must decay at least as quickly as 1/k, since they
comprise a square-summable sequence in l². Now consider the function
g(x) = ∫₀ˣ f(s) ds,   (3)
which is a continuous function of x (Exercise). The Fourier series coefficients of g(x) may be obtained
by termwise integration of the Fourier series of f(x) (AMATH 231). This implies that the series
coefficients of g(x) will decay at least as quickly as 1/k². Integrate again, etc.
In other words, the more “regular” or “smooth” a function f(x) is, the faster the decay of its
Fourier series coefficients, implying that you can generally approximate f(x) to a desired accuracy
over the interval with a fewer number of terms in the Fourier series expansion. Conversely, the more
“irregular” a function f(x) is, the slower the decay of its FS coefficients, so that you’ll need more terms
in the FS expansion to approximate it to a desired accuracy. This feature of regularity/approximability
is very well-known and appreciated in the signal and image processing field. In fact, it is a very
important, and still ongoing, field of research in analysis.
The above discussion may seem somewhat “handwavy” and imprecise. Let’s look at the problem
in a little more detail. And we’ll consider the more general case in which a function f(x) is expressed
in terms of a set of functions, {φk(x)}∞k=1, which form a complete and orthonormal basis on an
interval [a, b], i.e.,
f(x) = ∑_{k=1}^∞ ck φk(x),   ck = 〈f, φk〉.   (4)
Here, the equation is understood in the L2 sense, i.e., the sequence of partial sums, Sn(x), defined as
follows,
Sn(x) = ∑_{k=1}^n ck φk(x),   (5)
converges to f in L2 norm/metric, i.e.,
‖f − Sn‖2 → 0 as n → ∞. (6)
The expression in the above equation is the magnitude of the error associated with the approximation
f(x) ∼= Sn(x), which we shall simply refer to as the error in the approximation. This error may be
expressed in terms of the Fourier coefficients ck. First note that
f(x) − Sn(x) = ∑_{k=n+1}^∞ ck φk(x).   (7)
Therefore the L2-squared error is given by
‖f − Sn‖₂² = 〈f − Sn, f − Sn〉
           = 〈 ∑_{k=n+1}^∞ ck φk, ∑_{l=n+1}^∞ cl φl 〉
           = ∑_{k=n+1}^∞ |ck|².   (8)
Thus,
‖f − Sn‖₂ = [ ∑_{k=n+1}^∞ |ck|² ]^{1/2}.   (9)
Recall that for the above sum of an infinite series to be finite, the coefficients ck must tend to
zero sufficiently rapidly. The above summation of coefficients starting at k = n + 1 may be viewed as
involving the “tail” of the infinite sequence of coefficients ck, as sketched schematically below.
For a fixed n > 0, the greater the rate of decay of the coefficients ck, the smaller the area under
the curve that connects the tops of these lines representing the coefficient magnitudes, i.e., the smaller
the magnitude of the term on the right of Eq. (9), hence the smaller the error in the approximation.
From a signal processing point of view, more of the signal is concentrated in the first n coefficients ck.
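The role of the tail can be illustrated with synthetic coefficient sequences (a NumPy sketch; the cutoff at 10⁶ and the two sample decay rates are our own choices). A sequence decaying like 1/k – the rate produced by a jump, as for f2 – leaves a much fatter tail than one decaying like 1/k², the rate for the continuous f1.

```python
import numpy as np

K = np.arange(1, 10**6 + 1, dtype=float)

def tail_error(c, n):
    # ||f - S_n||_2 = sqrt( sum_{k > n} |c_k|^2 ), as in Eq. (9)
    return np.sqrt((c[n:] ** 2).sum())

slow = 1.0 / K       # c_k ~ 1/k:   decay rate at a jump discontinuity (like f2)
fast = 1.0 / K**2    # c_k ~ 1/k^2: decay rate of a continuous function (like f1)
for n in (10, 100, 1000):
    print(n, tail_error(slow, n), tail_error(fast, n))
# the 1/k tail shrinks like n^(-1/2), the 1/k^2 tail like n^(-3/2)
```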
[Figure: plot of |ck|² vs. k, with the “tail” of the infinite sequence of coefficients, k ≥ n + 1, indicated.]
From the examples presented earlier, we see that singularities in the function/signal, e.g., discon-
tinuities of the function, will generally reduce the rate of decay of the Fourier coefficients. As such,
for a given n, the error of approximation by the partial sum Sn will be larger. This implies that in
order to achieve a certain accuracy in our approximation, we shall have to employ more coefficients
in our expansion. In the case of the Fourier series, this implies the use of functions sin kx and cos kx
with higher k, i.e., higher frequencies.
Unfortunately, such singularities cannot be avoided, especially in the case of images. Images are
defined by edges, i.e., sharp changes in greyscale values, which are precisely the points of discontinuity
in an image.
However, singularities are not the only reason that the rate of decay of Fourier coefficients may
be reduced, as we’ll see below.
Higher variation means higher frequencies are needed
In the previous discussion, we saw how the irregularity or lack of smoothness of a function f(x) –
for example, points of discontinuity in f(x) or its derivatives – affects the convergence of its Fourier
series expansion. This phenomenon is very important in signal and image processing, particularly in
the field of signal/image compression, where we wish to store approximations to the signal f(x) to a
prescribed accuracy with as few coefficients as possible.
In addition to smoothness, however, the rate of change of f , as measured by the magnitude of its
derivative, |f ′(x)|, or gradient ‖∇f‖, also affects the convergence. Contrast the two functions sketched
below. The function on the left, g(x), has little variation over the interval [a, b] whereas the one on
the right, h(x), has significant variation.
In order to accommodate the more rapid change in h(x), i.e., in order to approximate such a function
better, sine and cosine functions of higher frequencies, i.e., higher oscillation, are required. In other
words, we expect that the Fourier series coefficients of g(x) will decay more rapidly than those of h(x).
Example 1: We can illustrate this point with the help of the following analytical example. Consider
the normalized Gaussian function,
gσ(x) = (1/√(2πσ²)) e^{−x²/(2σ²)},   (10)
which you have probably encountered in a course on probability or statistics. The variance of this
function is σ2 and its standard deviation is σ. As σ decreases toward zero, the graph of gσ(x) becomes
more peaked – higher and narrower – as shown in the figure below. In what follows, we’ll consider the
function gσ(x) as defined only over the interval [−π, π] so that we may examine its Fourier series.
[Figure: graphs of the Gaussian functions gσ(x) for σ = 1, 0.5, 0.25. As σ decreases, the peak becomes higher and narrower.]
Clearly, as σ decreases, the magnitude of the derivative of gσ(x) near x = 0 increases. Let us now observe
the effect of this increase on the Fourier coefficients of gσ(x). Since it is an even function, its Fourier
series will be composed only of cosine functions, i.e.,
gσ(x) = a0 φ0(x) + ∑_{k=1}^∞ ak φk(x),   (11)
where we are using the orthonormal cosine basis set (see earlier notes),
φ0(x) = 1/√(2π),   φk(x) = (1/√π) cos kx,  k ≥ 1.   (12)
Technically, the computation of the integrals of the Gaussian function is rather complicated, since we
are integrating only over the finite interval [−π, π]. For sufficiently small σ, the “tail” of gσ(x) lying
outside this interval is very small – in fact, it is exponentially small, therefore negligible. To a good
approximation, therefore,
a0 = ∫_{−π}^{π} gσ(x) φ0(x) dx ≅ (1/√(2π)) ∫_{−∞}^{∞} gσ(x) dx = 1/√(2π),   (13)
and
ak = (1/√π) ∫_{−π}^{π} gσ(x) cos kx dx
   ≅ (1/√π) ∫_{−∞}^{∞} gσ(x) cos kx dx
   = (1/√π) ∫_{−∞}^{∞} (1/√(2πσ²)) e^{−x²/(2σ²)} cos kx dx
   = (1/√π) e^{−σ²k²/2}.   (14)
These results can be derived from the following formula, which can be found in integral tables,
∫₀^∞ e^{−a²x²} cos bx dx = (√π/(2a)) e^{−b²/(4a²)}.   (15)
Let a² = 1/(2σ²) and then do some algebra.
Note that the distribution of ak values with respect to k > 0 – we don’t even have to square
them, since they are all positive – is a Gaussian distribution with variance 1/σ². As we let σ → 0+,
[Figure: profile of the ak coefficients vs. k, a Gaussian of width 1/σ.]
the distribution spreads out, in complete opposition to the function gσ(x) getting more concentrated
at x = 0. (We’ll return to this theme – the complementarity of space and frequency – later in this
course.)
Example 2: This is a numerical version of the previous example. For 0 < a < π, let ga(x) denote
the function,
ga(x) = { √(3/(2a)) (1 − x/a),  0 ≤ x ≤ a;
          √(3/(2a)) (1 + x/a),  −a ≤ x ≤ 0;
          0,  a < |x| ≤ π.   (16)
A sample graph of this function is sketched in the figure below.
[Figure: graph of y = ga(x), a triangular peak of height √(3/(2a)) centered at x = 0 and vanishing for a < |x| ≤ π.]
The multiplicative factor √(3/(2a)) was chosen so that
‖ga‖2 = 1,   (17)
for all a > 0, a kind of normalization condition. Note that as a approaches zero, the peak becomes
more pronounced, since the magnitudes of the slopes of the peak are given by |g′a(x)| = √(3/2) a^{−3/2}.
Since the function ga(x) is even, it will admit a Fourier cosine series (i.e., the coefficients bk of
all sine terms are zero). Here we consider the expansion of ga(x) in terms of the orthonormal cosine
basis,
e0(x) = 1/√(2π),   ek(x) = (1/√π) cos kx,  k ≥ 1.   (18)
Then
ga(x) = c0 e0(x) + ∑_{k=1}^∞ ck ek(x),   (19)
where
ck = 〈ga, ek〉.   (20)
For example,
c0 = (1/√(2π)) · 2 · ∫₀^a ga(x) dx = (1/2) √(3a/π).   (21)
Since ga ∈ L2[−π, π], the sequence of Fourier coefficients c = (c0, c1, c2, · · ·) is square summable, i.e.,
c lies in the sequence space l². Moreover, from a previous lecture,
‖ga‖L2 = ‖c‖l2 = 1,   (22)
implying that
∑_{k=0}^∞ ck² = 1.   (23)
In the figure below are plotted the coefficients cn, 0 ≤ n ≤ 20, for a values 1.0, 0.5, 0.25, 0.1, 0.05.
(The coefficients were computed using MAPLE.) The plots clearly show that the rate of decay of the
coefficients decreases as a is decreased. For a = 1.0, the coefficients cn appear to be negligible for
n > 5, at least to the resolution of the plot. This would suggest that the partial sum function S5(x),
composed of cosine terms with coefficients c0 to c5 would provide an excellent approximation to ga(x)
over the interval. On the other hand, for a = 0.5, it appears that we would have to use the partial
sum S10(x), and so on.
In order to understand this more quantitatively, the partial sums S20(x) were computed for the a-
values shown in the above figure. From these partial sums, the L2 distances ‖ga−S20‖2 were computed
(using MAPLE). These distances represent the L2 error in approximating ga with S20. The results
are presented in the table below. Clearly, as a is decreased, the error in approximation by the partial
sums S20 increases. There appears to be a dramatic increase between a = 0.25 and a = 0.1.
Improvement by “block coding”. In light of the earlier discussion on “block coding,” let us see if
we can improve the approximation to the above triangular peak function by dividing up the interval
Coefficients cn of Fourier cosine series expansion of the triangular peak-function ga(x) defined in Eq. (16), for
a = 1.0, 0.5, 0.25, 0.1, 0.05. As a decreases, the rate of decay of the Fourier coefficients cn is seen to decrease.
a ‖ga − S20‖2
1 0.012
0.5 0.026
0.25 0.056
0.1 0.460
0.05 0.733
Error in approximation to ga(x) afforded by partial sum functions S20(x) comprised of Fourier coefficients c0
to c20.
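The table can be reproduced, at least approximately, without MAPLE: by Parseval, the squared L2 error is 1 minus the sum of the squares of the retained coefficients. A NumPy sketch (midpoint quadrature; the grid size and helper name are our own choices):

```python
import numpy as np

def triangle_coeffs(a, kmax, N=200001):
    """Fourier cosine coefficients c_k = <g_a, e_k> of the peak function, Eq. (16),
    computed by the midpoint rule on [-pi, pi]."""
    x = (np.arange(N) + 0.5) * (2 * np.pi / N) - np.pi
    dx = 2 * np.pi / N
    g = np.where(np.abs(x) <= a, np.sqrt(3 / (2 * a)) * (1 - np.abs(x) / a), 0.0)
    c = [(g / np.sqrt(2 * np.pi)).sum() * dx]          # c_0, against e_0 = 1/sqrt(2 pi)
    for k in range(1, kmax + 1):
        c.append((g * np.cos(k * x) / np.sqrt(np.pi)).sum() * dx)
    return np.array(c)

for a in (1.0, 0.5, 0.25, 0.1, 0.05):
    c = triangle_coeffs(a, 20)
    err = np.sqrt(max(1.0 - (c ** 2).sum(), 0.0))      # ||g_a - S_20||_2 by Parseval
    print(a, round(err, 3))
```

The printed errors should track the table above: small for a = 1, growing sharply once a drops below 0.25.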
and coding the function separately over the subintervals. In the following experiment, the interval
I = [−π, π] was partitioned into the three subintervals,
I1 = [−π,−π/3], I2 = [−π/3, π/3], I3 = [π/3, π]. (24)
For a ≤ 1, the approximation of ga(x) over intervals I1 and I3 is trivial since ga(x) = 0. As such we
don’t even have to supply any Fourier coefficients but we should record the use of the first coefficient
c0 = 0. After all, the function ga(x) is constant on these intervals, and we should specify the value of
the constant. Since 21 coefficients were used in the previous experiment (S20(x) uses ck, 0 ≤ k ≤ 20),
we shall use 19 coefficients to code the function ga(x) over interval I2.
It remains to construct the Fourier series approximation to ga(x) over interval I2 = [−π/3, π/3].
From Lecture 7, we must employ the basis set
{ek} = { 1/√(2a), (1/√a) cos(πx/a), (1/√a) sin(πx/a), (1/√a) cos(2πx/a), · · · }   (25)
where a = π/3. Once again, the sine functions are discarded since ga(x) is an even function. This was
easily done in MAPLE: For each a value, the necessary integrals were computed (actually only the
integrals over [0, a] were computed), followed by the L2 distance between ga and the S18(x) partial
sum functions. The results are presented in the table below. We can see an improvement for all a
a ‖ga − S18‖2
1 0.002
0.5 0.007
0.25 0.021
0.1 0.054
0.05 0.294
Error in approximation to ga(x) afforded by partial sums S18 of Fourier cosine series over interval [−π/3, π/3]
employing Fourier coefficients c0 to c18, along with the trivial Fourier expansions c0 = 0 on [−π,−π/3) and
(π/3, π].
values – a roughly five-fold decrease in the error for a = 1 and about a three-fold decrease for a = 0.05.
This very simple implementation of “block coding” has achieved the goal of decreasing the error with
a given number of coefficients.
Question: The fact that the Fourier series over [−π/3, π/3] works better to approximate the function
ga(x) might appear rather magical. Can you come up with a rather simple explanation for the
improvement in accuracy?
That being said, the improvement is rather impressive in this case because we know the function
essentially to infinite accuracy, i.e., we have its formula. If we had only a finite set of discrete data
points representing sampled values of the function, the improvement would not be so dramatic. We’ll
return to this matter after looking at discrete Fourier transforms.
Fourier series on the interval [−a, a], even and odd extensions
In a previous lecture, it was mentioned that the following functions comprise an orthonormal set on
the interval [−a, a], where a > 0:
e0 = 1/√(2a),   e1 = (1/√a) cos(πx/a),   e2 = (1/√a) sin(πx/a),   e3 = (1/√a) cos(2πx/a),   · · · .   (26)
Moreover, this set serves as a complete orthonormal basis for the space L2[−a, a] of square-integrable
functions on [−a, a]. Thus, for an f ∈ L2[−a, a],
f = ∑_{k=0}^∞ 〈f, ek〉 ek.   (27)
This may be translated to the following standard (unnormalized) Fourier series expansion having the
form
f(x) = a0 + ∑_{k=1}^∞ [ ak cos(kπx/a) + bk sin(kπx/a) ],   (28)
where
a0 = (1/(2a)) ∫_{−a}^{a} f(x) dx
ak = (1/a) ∫_{−a}^{a} f(x) cos(kπx/a) dx
bk = (1/a) ∫_{−a}^{a} f(x) sin(kπx/a) dx.   (29)
(We use the term “unnormalized” since the coefficients ak, bk multiply the unnormalized functions
cos(kπx/a) and sin(kπx/a). The normalization factors, which involve √a factors that become a
upon squaring, are swept into the ak and bk coefficients, which accounts for the factors appearing in
front of the above integrals.) Once again, in the special case a = π, the above formulas become the
standard formulas for Fourier series on [−π, π], cf. Eq. (1), Lecture 1 of these notes.
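The formulas in Eq. (29) are straightforward to apply numerically. A NumPy sketch (midpoint quadrature; the test function f(x) = x on [−2, 2] is our own choice): since this f is odd, a0 and all ak should vanish, while bk = (−1)^{k+1} · 2a/(kπ).

```python
import numpy as np

a = 2.0
N = 200001
dx = 2 * a / N
x = (np.arange(N) + 0.5) * dx - a          # midpoints of [-a, a]
f = x                                       # sample input: f(x) = x on [-2, 2]

# Eq. (29), midpoint rule
a0 = f.sum() * dx / (2 * a)
ak = [(f * np.cos(k * np.pi * x / a)).sum() * dx / a for k in range(1, 6)]
bk = [(f * np.sin(k * np.pi * x / a)).sum() * dx / a for k in range(1, 6)]
# f is odd, so a0 and every a_k vanish; b_k = (-1)^(k+1) * 2a/(k pi)
print(a0, ak, bk)
```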
Fourier cosine series on [−a, a] and periodic extensions
In the case that f(x) is even, i.e., f(x) = f(−x), then all coefficients bk = 0, so that the expansion in
(28) becomes a Fourier cosine series expansion. Moreover, since f(x) is even, it need only be defined
on the interval [0, a], and the expressions for the coefficients ak become
a0 = (1/a) ∫₀^a f(x) dx,   ak = (2/a) ∫₀^a f(x) cos(kπx/a) dx,  k ≥ 1.   (30)
Now suppose that we are given a function f(x) defined on the interval [0, a] as input data. From
this data, we may construct the ak coefficients – these coefficients define a Fourier cosine series that
converges to the even 2a-extension of f(x), constructed from f(x) by means of two steps, illustrated
schematically in the figure below,
1. A “flipping” of the graph of f(x) with respect to the y-axis to produce an even function on
[−a, a].
2. Copying this graph on the intervals [a, 3a], [3a, 5a], etc. and [−3a,−a], [−5a,−3a], etc..
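The two steps can be written out directly (a Python sketch; the function name is our own). The extension below is exactly the function the cosine series reconstructs from data given on [0, a].

```python
import math

def even_extension(f, a):
    """Even 2a-periodic extension of data f given on [0, a]: flip the graph about
    the y-axis (step 1), then copy it with period 2a (step 2)."""
    def g(x):
        x = math.fmod(x, 2 * a)          # reduce to (-2a, 2a)
        if x < 0:
            x += 2 * a                   # now 0 <= x < 2a
        return f(x) if x <= a else f(2 * a - x)
    return g

g = even_extension(lambda x: x, 1.0)     # f(x) = x on [0, 1] -> a triangle wave
print(g(0.25), g(1.75), g(2.25), g(-0.25))   # all equal 0.25
```

Note that g is continuous at the patch points x = (2k − 1)a, in line with the remark below.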
[Figure: original data y = f(x) on [0, a]; its even extension to [−a, 0]; and the 2a-periodic copies of this graph on [a, 3a], [−3a, −a], etc.]
Note that the resulting 2a-extension is continuous at all “patch points,” i.e., x = (2k−1)a, k ∈ Z.
For this reason, Fourier cosine series are usually employed in the coding of signals and images. The
JPEG/MPEG standards are based on versions of the discrete cosine transform.
Fourier sine series on [−a, a] and periodic extensions
In the case that f(x) is odd, i.e., f(x) = −f(−x), then all coefficients ak = 0, so that the expansion
in (28) becomes a Fourier sine series expansion. Moreover, since f(x) is odd, it need only be defined
on the interval [0, a] as well. The expression for the coefficients bk becomes
bk = (2/a) ∫₀^a f(x) sin(kπx/a) dx,  k ≥ 1.   (31)
Once again, suppose that we are given a function f(x) defined on the interval [0, a] as input data.
From this data, we may construct the bk coefficients – these coefficients define a Fourier sine series that
converges to the odd 2a-extension of f(x), constructed from f(x) by means of two steps, illustrated
schematically in the figure below,
1. An inversion of the graph of f(x) with respect to the origin to produce an odd function on [−a, a].
(If f(0) ≠ 0, then one of the points (0, ±f(0)) will have to be deleted for f to be single-valued
at x = 0.)
2. Copying this graph on the intervals [a, 3a], [3a, 5a], etc. and [−3a,−a], [−5a,−3a], etc.. (Once
again, some endpoints of the pieces of the graph will have to be deleted to make f single-valued.)
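These two steps can also be coded directly (a Python sketch; the function name is our own, and the value 0 assigned at the origin and at x = ±2a, ±4a, . . . is the value to which the sine series converges at those points).

```python
import math

def odd_extension(f, a):
    """Odd 2a-periodic extension of data f given on (0, a]: invert the graph through
    the origin (step 1), then copy it with period 2a (step 2)."""
    def g(x):
        x = math.fmod(x, 2 * a)          # reduce to (-2a, 2a)
        if x < 0:
            x += 2 * a                   # now 0 <= x < 2a
        if x == 0.0:
            return 0.0                   # midpoint value at 0, +/-2a, ... (series limit)
        return f(x) if x <= a else -f(2 * a - x)
    return g

g = odd_extension(lambda x: 1.0, 1.0)    # f(x) = 1 on (0, 1] -> a square wave
print(g(0.5), g(1.5), g(2.5), g(-0.5))   # 1.0, -1.0, 1.0, -1.0
```

With f(0) ≠ 0, as here, the extension jumps at 0, ±2a, . . . – exactly the discontinuities noted below.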
[Figure: original data y = f(x) on [0, a]; its odd extension to [−a, 0]; and the 2a-periodic copies of this graph on [a, 3a], [−3a, −a], etc.]
Note that the resulting 2a-extension need not be continuous at the “patch points,” i.e., x =
(2k − 1)a, k ∈ Z. Indeed, if f(0) ≠ 0, then the odd extension of f(x) will not even be continuous at
0, ±2a, ±4a, etc.
The two-dimensional case: image functions
Note: The discussion in the first two paragraphs is slightly more general than that presented in class.
We now examine briefly the Fourier analysis of two-dimensional functions, which will be used
primarily to represent images. We shall consider an image function f(x, y) to be defined over a
suitable rectangular region D ⊂ R2. For the moment, let D be defined as the rectangular region
−a ≤ x ≤ a, −b ≤ y ≤ b, centered at the origin. A suitable function space for the representation of
images will be the space of square-integrable functions on D, i.e., L2(D):
L2(D) = { f : D → R | ∫_D |f(x, y)|² dA < ∞ }.   (32)
Now let
1. {ek(x)}∞1 denote the orthonormal set of sine and cosine functions on the space L2[−a, a].
2. {ok(y)}∞1 denote the orthonormal set of sine and cosine functions in the space L2[−b, b].
Theorem: The set of all product functions {φkl(x, y) = ek(x)ol(y)}, k = 1, 2, · · ·, l = 1, 2, · · ·, forms an
orthonormal basis of L2(D).
For simplicity, we now assume that our images are defined on square regions, i.e., a = b, and
further assume that a = b = 1. In this case the basis functions ek and ok have the same functional
form:
{ek}∞1 = { 1/√2, cos(πx), sin(πx), cos(2πx), sin(2πx), · · · }.   (33)
The set of all products ek(x)el(y) will lead to a complicated mixture of sine and cosine functions.
It is convenient to assume that the image function f(x, y) is an even function with respect to both x
and y, implying that we use only the cosine functions in our basis. In essence, this amounts to the
assumption that the actual image being analyzed lies in the region [0, 1] × [0, 1]. Analogous to the
one-dimensional case, the use of only cosine functions will perform an even 2-periodic extension of
this image, in both the x and y directions. Let us examine this further.
1. Even w.r.t. x: f(x, y) = f(−x, y).
2. Even w.r.t. y: f(x, y) = f(x,−y).
3. From 1 and 2: f(−x, y) = f(x,−y), implying that f(x, y) = f(−x,−y), i.e., symmetry w.r.t.
inversion about (0, 0).
This means that the graph of f(x, y) in the first quadrant, i.e., [0, 1] × [0, 1], i.e., the input image,
is “flipped” w.r.t. the y-axis, then “flipped” w.r.t. the x-axis, and finally “flipped” w.r.t. the point
(0, 0). The result is an even 2-periodic extension of the function f(x, y). The process is illustrated below.
The advantage of an even extension in both directions is that no discontinuities are introduced.
The function f(x, y) is continuous at all points on the x and y-axes. As such, no complications
regarding convergence of the Fourier series are introduced artificially.
The net result is that the input image function f(x, y) defined on the region [0, 1]× [0, 1] will admit a
Fourier cosine series expansion of the form,
f(x, y) = a00 + ∑_{k=1}^∞ ak0 cos(kπx) + ∑_{l=1}^∞ a0l cos(lπy) + ∑_{k=1}^∞ ∑_{l=1}^∞ akl cos(kπx) cos(lπy).   (34)
[Figure: input image f(x, y), 0 ≤ x, y ≤ 1, and its even extension to [−1, 1] × [−1, 1], which the Fourier cosine expansion copies periodically in the x and y directions.]
The series coefficients akl could be obtained from the expansion for f in terms of the orthonormal
basis functions or by simply multiplying both sides of (34) with the function cos(mπx) cos(nπy) and
integrating x and y over [0, 1], and exploiting the orthogonality of the cosine functions. The net result
is
a00 = ∫₀¹ ∫₀¹ f(x, y) dx dy,
a0l = 2 ∫₀¹ ∫₀¹ f(x, y) cos(lπy) dx dy,  l ≥ 1,
ak0 = 2 ∫₀¹ ∫₀¹ f(x, y) cos(kπx) dx dy,  k ≥ 1,
akl = 4 ∫₀¹ ∫₀¹ f(x, y) cos(kπx) cos(lπy) dx dy,  k, l ≥ 1.   (35)
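These coefficient formulas can be checked on a separable test image (a NumPy sketch; the image f(x, y) = cos(πx) cos(2πy), the grid size, and the helper name are our own choices). For this image a12 = 1 and every other coefficient vanishes.

```python
import numpy as np

N = 512
t = (np.arange(N) + 0.5) / N                    # midpoints of [0, 1]
X, Y = np.meshgrid(t, t, indexing="ij")
f = np.cos(np.pi * X) * np.cos(2 * np.pi * Y)   # test image: a_{12} should equal 1

def a_kl(f, k, l):
    # Eq. (35): the weight is 1, 2 or 4 according to whether k or l is zero;
    # .mean() is the midpoint-rule double integral over [0,1] x [0,1]
    w = (1 if k == 0 else 2) * (1 if l == 0 else 2)
    return w * (f * np.cos(k * np.pi * X) * np.cos(l * np.pi * Y)).mean()

print(a_kl(f, 1, 2), a_kl(f, 0, 0), a_kl(f, 2, 1))   # ~1, ~0, ~0
```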
Lecture 8
The Discrete Fourier Transform
We now turn to the analysis of discrete data, e.g., sets of measurements, yk, k = 0, 1, 2, · · ·, as opposed
to signals in continuous time, e.g., f(t). We also assume that the measurements are evenly spaced in
time/space, i.e., there is a fixed time interval T > 0 between each measurement. This is necessary for
the basic theory to be presented below. That being said, it is very often the procedure employed in
scientific experiments, e.g., measuring the temperature at a particular location at hourly intervals.
At this time, we shall simply assume that the measurements correspond to the values of a function
f(t) at discrete times, tn = nT . In the signal processing literature, the usual notation for such a
sampling is as follows,
f [n] := f(nT ), n ∈ {0, 1, 2, · · ·} or n ∈ {· · · ,−1, 0, 1, · · ·}. (36)
The square brackets are rather cumbersome – some authors employ the notation “fn”, but we shall
reserve this notation for other purposes. The idea is sketched below.
[Figure: samples f[0], f[1], f[2], . . . of a function y = f(t), taken at the points t = nT.]
We now assume that we are working with a set of N such consecutive data points which will
comprise an N -vector, indexed as follows,
f = (f [0], f [1], · · · , f [N − 1]). (37)
These measurements could be complex-valued, so that f ∈ CN . Furthermore, we assume that this set
of measurements is then periodized, i.e., extended into the future and backwards into the past, so that
f [k + N ] = f [k], k ∈ Z. (38)
This represents a periodic extension of the data, a discrete analogy to the periodization of functions
produced by Fourier series representations.
A “derivation” of the DFT from Fourier expansion in terms of complex exponentials
Let us first assume that we are working with a function f(t) that is a-periodic, i.e.,
f(t + a) = f(t), t ∈ R. (39)
We now use the fact that the following doubly-infinite set of functions,
ek(t) = (1/√a) e^{i2πkt/a},  k ∈ {· · · ,−2, −1, 0, 1, 2, · · ·},   (40)
forms an orthonormal basis for the space of functions L2[0, a]. (It is a good exercise to verify the
orthonormality of these functions over the interval [0, a]. In Lecture 6, we introduced a set of complex
exponential functions that were orthonormal over the interval [−a, a].) We now expand f(t) in terms
of this basis over [0, a],
f(t) = ∑_{k=−∞}^∞ ck ek(t),   (41)
where the Fourier coefficients ck are given by the complex scalar product
ck = 〈f, ek〉 = ∫₀^a f(t) ēk(t) dt = (1/√a) ∫₀^a f(t) e^{−i2πkt/a} dt.   (42)
Ignoring the constant, we now construct Riemann sum approximations to the above integral
following the usual procedure from first-year Calculus. Let N ≥ 1 be a fixed integer. Construct an
equipartition of the interval [0, a] in the usual way, i.e., let
∆t = a/N,   (43)
and define the partition points,
tn = n∆t = na/N,  n = 0, 1, 2, · · · , N.   (44)
We use the Riemann sum that is produced by evaluating f(t) at the left-endpoints of each of the N
subintervals In = [tn, tn+1], n = 0, 1, 2, · · · , N − 1, i.e.,
∫₀^a f(t) exp(−i2πkt/a) dt ≈ ∑_{n=0}^{N−1} f(tn) exp(−i2πktn/a) ∆t
= ∑_{n=0}^{N−1} f(tn) exp(−i2πkn/N) (a/N)
= (a/√N) · (1/√N) ∑_{n=0}^{N−1} f(tn) exp(−i2πkn/N).   (45)
We now ignore the constant factor a/√N and focus on the remaining summation. The f(tn) are
viewed as discrete samples of the function f(t) so we define, as before,
f [n] = f(tn) = f(nT ), (46)
where the sampling time is given by T = ∆t = a/N . The summation in Eq. (45) may then be written
as follows,
c[k] := (1/√N) ∑_{n=0}^{N−1} f[n] exp(−i2πkn/N).   (47)
This has the form of a complex scalar product between the N -vector of sampled data points,
f = (f[0], f[1], · · · , f[N − 1]),   (48)
defined earlier and the complex N-vector ek,
ek = (ek[0], ek[1], · · · , ek[N − 1]),   (49)
with components
ek[n] = (1/√N) exp(i2πkn/N),  n = 0, 1, · · · , N − 1.   (50)
(We’ll show below that the vectors ek are orthonormal.) The index n plays the role of the time or
spatial variable and k is the index of the frequency. We haven’t said anything about the frequency k
so far. In the continuous formulation, we required all integer values of k. From Eq. (47), it is easily
shown (we’ll do it later) that
c[k + N ] = c[k], (51)
i.e., the complex N -vector, c, defined as
c = (c[0], c[1], · · · , c[N − 1]), (52)
is N -periodic, as is the N -vector of sampled data, f . As such, the frequency index k may be constrained
to the values 0, 1, · · · , N − 1.
The complex N -vector c defined in Eq. (52) is known as a discrete Fourier transform (DFT) of
the (discrete) N -vector f in Eq. (48). Let us now investigate this DFT in terms of complex periodic
N -vectors.
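The DFT of Eq. (47) can be written out directly and compared with a library FFT (a NumPy sketch; the helper name and sample vector are our own choices). Note that numpy.fft.fft computes the same sum without the 1/√N factor, so the two agree after dividing by √N.

```python
import numpy as np

def dft(f):
    """c[k] = (1/sqrt(N)) * sum_n f[n] exp(-i 2 pi k n / N), Eq. (47)."""
    N = len(f)
    n = np.arange(N)
    return np.array([(f * np.exp(-2j * np.pi * k * n / N)).sum() / np.sqrt(N)
                     for k in range(N)])

f = np.array([1.0, 2.0, 0.0, -1.0])
c = dft(f)
print(c)
# numpy's FFT uses the same sum with no 1/sqrt(N) normalization
assert np.allclose(c, np.fft.fft(f) / np.sqrt(len(f)))
# replacing k by k + N leaves every term exp(-i 2 pi k n / N) unchanged,
# which is the periodicity c[k + N] = c[k] of Eq. (51)
```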
An orthonormal periodic basis in CN
The goal is to provide a representation of a set of data in terms of periodic basis vectors in CN . First
of all, the following inner product will be used in CN :
〈f, g〉 = ∑_{n=0}^{N−1} f[n] ḡ[n],   (53)
where the bar once again denotes complex conjugation.
Of course, any orthogonal set of complex N -vectors will serve as a basis for CN , but we wish to
use a set of periodic vectors. The family uk ∈ CN discovered in the previous section will do the trick:
For k = 0, 1, · · · , N − 1, define the vector
uk = (uk[0], uk[1], · · · , uk[N − 1]),
with components
uk[n] = exp(i2πkn/N),  n = 0, 1, · · · , N − 1.   (55)
Once again, the index n plays the role of the time or spatial variable and k is the index of the frequency.
Note that in the special case k = 0, all elements uk[n] = 1. In other words, for all N ≥ 2, the
N-vector u0 ∈ CN is a row of 1’s:
u0 = (1, 1, 1, · · · , 1).   (56)
This will have important implications for the DFT.
Let us now show that the vectors uk are N-periodic. First consider a given k ∈ {0, 1, · · · , N − 1}. Then consider a given component uk[n], n ∈ {0, 1, · · · , N − 1}, in the vector uk. From Eq. (55),
uk[n + N ] = exp
(
i2πk(n + N)
N
)
= exp
(
i2πkn
N
)
exp
(
i2πkN
N
)
= exp
(
i2πkn
N
)
exp(i2πk)
= exp
(
i2πkn
N
)
= uk[n]. (57)
We claim that the set of N -vectors {uk} forms an orthogonal set in CN . To prove this, consider
the inner product between two elements, uk and ul:
〈uk, ul〉 = ∑_{n=0}^{N−1} exp(i2πkn/N) exp(−i2πln/N) = ∑_{n=0}^{N−1} exp(i2π(k − l)n/N).   (58)
Case 1: k = l. In this case, the above inner product reduces to
〈uk, ul〉 = ∑_{n=0}^{N−1} 1 = N.   (59)
Case 2: k ≠ l. First let p = k − l, an integer. Then the inner product in (58) becomes
〈uk, ul〉 = ∑_{n=0}^{N−1} exp(i2πpn/N) = ∑_{n=0}^{N−1} [exp(i2πp/N)]ⁿ = 1 + r + · · · + r^{N−1},   (60)
where r = exp(i2πp/N). The sum of this finite geometric series is
S =1 − rN
1 − r=
1 − ei2πp
1 − r=
1 − 1
1 − r= 0. (61)
Therefore,
〈u_k, u_l〉 = N δ_{kl}, (62)
i.e., the set {u_k} is an orthogonal set of N nonzero vectors. Therefore it is a basis of C^N. In particular, it is the desired basis because of its internal periodicity. Once again, we may view the n index as a spatial index – in fact, n/N plays the role of t or x.
From this orthogonal basis set {u_k}, we construct the orthonormal basis vectors

e_k = (1/√N) u_k, k = 0, 1, · · · , N − 1, (63)

with components

e_k[n] = (1/√N) exp(i2πkn/N), n = 0, 1, · · · , N − 1. (64)
Once again, the case k = 0 is special. For N ≥ 2,

e_0 = (1/√N)(1, 1, · · · , 1). (65)
Examples:
1. N = 2:

In this very simple case, one can probably guess the vectors that are generated. First of all, from Eq. (65), for k = 0,

e_0 = (1/√2)(1, 1). (66)

For k = 1, the component with n = 0 is e_1[0] = 1/√2, while

e_1[1] = (1/√2) exp(i2π · 1 · 1/2) = (1/√2) exp(iπ) = −1/√2. (67)

Therefore,

e_1 = (1/√2)(1, −1). (68)
2. N = 3:

Once again, the case k = 0 is simple. From Eq. (65),

e_0 = (1/√3)(1, 1, 1). (69)
For k = 1, using Eq. (64),

(a) n = 0:

e_1[0] = (1/√3) exp(0) = 1/√3. (70)

(b) n = 1:

e_1[1] = (1/√3) exp(i2π/3) = (1/√3)[−1/2 + (√3/2)i]. (71)

(c) n = 2:

e_1[2] = (1/√3) exp(i4π/3) = (1/√3)[−1/2 − (√3/2)i]. (72)

In summary,

e_1 = (1/√3)(1, −1/2 + (√3/2)i, −1/2 − (√3/2)i). (73)
For k = 2, using Eq. (64),

(a) n = 0:

e_2[0] = (1/√3) exp(0) = 1/√3. (74)

(b) n = 1:

e_2[1] = (1/√3) exp(i4π/3) = (1/√3)[−1/2 − (√3/2)i]. (75)

(c) n = 2:

e_2[2] = (1/√3) exp(i8π/3) = (1/√3)[−1/2 + (√3/2)i]. (76)

In summary,

e_2 = (1/√3)(1, −1/2 − (√3/2)i, −1/2 + (√3/2)i). (77)
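The hand computations for N = 3 can be confirmed numerically. The following sketch (using NumPy) builds the vectors e_k from Eq. (64), compares them with Eqs. (73) and (77), and checks orthonormality:

```python
import numpy as np

# Sketch: build e_k[n] = exp(i 2 pi k n / N) / sqrt(N) for N = 3 and
# compare with the hand computations in Eqs. (69)-(77).
N = 3
n = np.arange(N)
E = np.array([np.exp(2j * np.pi * k * n / N) / np.sqrt(N) for k in range(N)])

w = -0.5 + (np.sqrt(3) / 2) * 1j  # exp(i 2 pi / 3) = -1/2 + (sqrt(3)/2) i
assert np.allclose(E[1], np.array([1, w, w.conjugate()]) / np.sqrt(N))  # Eq. (73)
assert np.allclose(E[2], np.array([1, w.conjugate(), w]) / np.sqrt(N))  # Eq. (77)

# Orthonormality: <e_k, e_l> = delta_{kl}
assert np.allclose(E @ E.conj().T, np.eye(N))
```

Note that e_2 is the componentwise complex conjugate of e_1, as Eqs. (73) and (77) show.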
Discrete Fourier Transform, Version 1
We now employ the orthonormal basis developed above to construct our first version of the DFT. Any element f ∈ C^N will have an expansion of the form

f = ∑_{k=0}^{N-1} 〈f, e_k〉 e_k. (78)
In component form,

f[n] = ∑_{k=0}^{N-1} 〈f, e_k〉 e_k[n] = ∑_{k=0}^{N-1} c[k] e_k[n], (79)
where the c[k] = 〈f, ek〉 denote the Fourier coefficients of f in the ek basis. Let us now examine these
coefficients:
c[k] = 〈f, e_k〉 = ∑_{n=0}^{N-1} f[n] \overline{e_k[n]}, (80)

or

c[k] = (1/√N) ∑_{n=0}^{N-1} f[n] exp(−i2πkn/N), k = 0, 1, · · · , N − 1. (81)
This relation defines a discrete Fourier transform (DFT) of f. The components of the vector c = (c[0], c[1], · · · , c[N − 1]) comprise the DFT of the vector f = (f[0], f[1], · · · , f[N − 1]). Mathematically, we can write
c = Ff, (82)
where F : C^N → C^N denotes the discrete Fourier transform operator on complex N-vectors.
Important comment: Note the choice of “a” instead of “the” before “discrete Fourier transform.”
Unfortunately, there are several closely-related definitions, and it is important to recognize this fact.
For this reason, we refer to the above DFT as DFT, Version 1.
Let us return to Eq. (81) to show that, indeed, the DFT vector c with components c[k] is N-periodic:

c[k + N] = (1/√N) ∑_{n=0}^{N-1} f[n] exp(−i2π(k + N)n/N)
= (1/√N) ∑_{n=0}^{N-1} f[n] exp(−i2πkn/N) exp(−i2πNn/N)
= (1/√N) ∑_{n=0}^{N-1} f[n] exp(−i2πkn/N) exp(−i2πn)
= (1/√N) ∑_{n=0}^{N-1} f[n] exp(−i2πkn/N)
= c[k]. (83)
Eq. (81) is the definition of the discrete Fourier transform implemented in the MAPLE programming
language. In MAPLE, the relation would be written as
c = FourierTransform(f)
f = InverseFourierTransform(c),
where we still have to define the inverse DFT.
Mathematically, the above formula is elegant because of the following result:

‖f‖_2 = ‖c‖_2, (84)

where ‖·‖_2 denotes the L² norm defined by the complex inner product in C^N. To see this:
‖f‖_2² = 〈f, f〉
= ∑_{n=0}^{N-1} f[n] \overline{f[n]}
= ∑_{n=0}^{N-1} [∑_{k=0}^{N-1} c[k] e_k[n]] [∑_{l=0}^{N-1} \overline{c[l]} \overline{e_l[n]}]
= ∑_{k=0}^{N-1} ∑_{l=0}^{N-1} c[k] \overline{c[l]} [∑_{n=0}^{N-1} e_k[n] \overline{e_l[n]}]
= ∑_{k=0}^{N-1} ∑_{l=0}^{N-1} c[k] \overline{c[l]} 〈e_k, e_l〉
= ∑_{k=0}^{N-1} c[k] \overline{c[k]}
= ‖c‖_2², (85)
from which (84) follows. This means that the DFT operator F is norm-preserving, i.e., the norm
of c is the norm of f .
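The norm-preserving property (84) is easy to test. The following sketch (using NumPy; the length N = 16 and the random test vector are arbitrary choices of mine) implements Eq. (81) directly as a matrix-vector product:

```python
import numpy as np

# Sketch of DFT Version 1, Eq. (81):
#   c[k] = (1/sqrt(N)) * sum_n f[n] * exp(-i 2 pi k n / N).
# With this normalization the transform is norm-preserving, Eq. (84).
def dft_v1(f):
    N = len(f)
    n = np.arange(N)
    W = np.exp(-2j * np.pi * np.outer(n, n) / N)  # W[k, n] = e^{-i2πkn/N}
    return (W @ f) / np.sqrt(N)

rng = np.random.default_rng(0)
f = rng.standard_normal(16) + 1j * rng.standard_normal(16)
c = dft_v1(f)
assert np.isclose(np.linalg.norm(f), np.linalg.norm(c))  # ||f||_2 = ||c||_2
```

This direct implementation costs O(N²) operations; it is meant only to make the definition concrete, not to compete with the FFT.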
Inverse DFT, Version 1
Let us now see if we can find a result for the inverse discrete Fourier transform, i.e., given the DFT
c, how can we find f , written mathematically as
f = F^{−1} c. (86)
In order to invert relation (81), we shall utilize the orthonormality of the e_k vectors in (64). For a particular value of m ∈ {0, 1, · · · , N − 1}, multiply both sides of Eq. (81) by (1/√N) exp(i2πkm/N) and then sum over k:
(1/√N) ∑_{k=0}^{N-1} c[k] exp(i2πkm/N) = (1/N) ∑_{n=0}^{N-1} f[n] ∑_{k=0}^{N-1} exp(i2πk(m − n)/N). (87)
We have already seen earlier that the final summation over k is N δ_{mn}. Thus, for each m, only the term n = m from the sum over n contributes. As a result, we have
f[m] = (1/√N) ∑_{k=0}^{N-1} c[k] exp(i2πkm/N). (88)
. (88)
This relation is true for each m = 0, 1, · · · , N − 1. It is customary to let n denote the spatial or time
variable, so we rewrite the above result as
f[n] = (1/√N) ∑_{k=0}^{N-1} c[k] exp(i2πkn/N), n = 0, 1, · · · , N − 1. (89)
These relations comprise the inverse discrete Fourier transform (IDFT) associated with the DFT in
Eq. (81).
A closer look at Eq. (89) shows that, in fact, the inverse DFT is nothing more than the expansion of the discrete vector f in terms of the orthonormal basis {e_k}: the DFT coefficients c[k] are used to construct the “signal” elements f[n].
We now summarize the results obtained above:
DFT and IDFT, Version 1
c[k] = (1/√N) ∑_{n=0}^{N-1} f[n] exp(−i2πkn/N), k = 0, 1, · · · , N − 1,

f[n] = (1/√N) ∑_{k=0}^{N-1} c[k] exp(i2πkn/N), n = 0, 1, · · · , N − 1. (90)
DFT and IDFT, Version 2
A second version of the DFT and its inverse is employed in many mathematics books (e.g., the book
by Kammler). Unlike the first version, it is not symmetric. But there is a legitimate reason for its
definition, since it arises naturally from a discretization of the integrals used to compute Fourier series
coefficients. We shall postpone the discussion of this result to another lecture. For the moment, we
simply state the second version of the DFT.
The DFT, Version 2, is defined as follows:

F[k] = (1/N) ∑_{n=0}^{N-1} f[n] exp(−i2πkn/N), k = 0, 1, · · · , N − 1. (91)
Note the use of F to denote the DFT: It is customary to let capital letters denote the FT/DFTs of
functions. The only difference between this version and Version 1 in Eq. (81) is that the factor in front is 1/N instead of 1/√N. In the same manner as was done for Version 1, the inverse DFT associated with the above DFT is given by
f[n] = ∑_{k=0}^{N-1} F[k] exp(i2πkn/N), n = 0, 1, · · · , N − 1. (92)
DFT and IDFT, Version 3
This is the version that appears in most of the signal processing literature (e.g. Mallat) as well as
mathematics books that deal with signal processing applications (e.g., Boggess and Narcowich). It
appears to be the version that is most widely used by research workers in signal and image processing,
as witnessed by the fact that it is the version implemented in MATLAB. As such, unless specified
otherwise, this will be the version used in this course.
The DFT, Version 3, is defined as follows:

F[k] = ∑_{n=0}^{N-1} f[n] exp(−i2πkn/N), k = 0, 1, · · · , N − 1. (93)
There is no factor in front of the summation. The inverse DFT associated with this DFT is given by

f[n] = (1/N) ∑_{k=0}^{N-1} F[k] exp(i2πkn/N), n = 0, 1, · · · , N − 1. (94)
In MATLAB, the DFT and IDFT are denoted as follows,
F = fft(f),
f = ifft(F).
Using the orthogonality property of the complex exponential functions established earlier, i.e., 〈u_k, u_l〉 = N δ_{kl}, it can be shown (a simple modification of the derivation for DFT, Version 1, Eq. (85)) that this particular version of the DFT satisfies the relation

‖f‖_2² = (1/N) ‖F‖_2², (95)

where ‖·‖_2 denotes the L²/Euclidean norm on C^N. From this point onward, we shall omit the subscript 2 from the norm and write ‖·‖, with the understanding that it represents the L² norm.
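These conventions are easy to confirm in software. As a sketch (using NumPy, whose fft/ifft follow the same Version 3 normalization as MATLAB's fft/ifft; the length N = 64 and the random test vector are arbitrary):

```python
import numpy as np

# Sketch: numpy.fft.fft implements DFT Version 3 (no prefactor forward,
# 1/N in the inverse), the same convention as MATLAB's fft/ifft.
rng = np.random.default_rng(1)
f = rng.standard_normal(64)
F = np.fft.fft(f)

# Check Eq. (93) directly for one value of k
N, k = len(f), 5
n = np.arange(N)
assert np.isclose(F[k], np.sum(f * np.exp(-2j * np.pi * k * n / N)))

# Round trip, Eq. (94), and the modified Parseval relation, Eq. (95)
assert np.allclose(np.fft.ifft(F), f)
assert np.isclose(np.linalg.norm(f) ** 2, np.linalg.norm(F) ** 2 / N)
```

The 1/N in the Parseval relation (95) is exactly the normalization factor missing from the forward transform in this version.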
Matrix form of DFT
You’ll note that all the coefficients multiplying the f[n] elements in the DFT of Eq. (93) involve powers of the complex number

ω = exp(−i2π/N) = cos(2π/N) − i sin(2π/N). (96)
A closer examination shows that if f and F are written as column N -vectors, f and F, respectively,
then the DFT relation in (93) may be written in matrix form as
F = Ff , (97)
where F is an N × N complex matrix having the form

[ 1      1        1         · · ·  1
  1      ω        ω²        · · ·  ω^{N−1}
  1      ω²       ω⁴        · · ·  ω^{2N−2}
  ⋮      ⋮        ⋮                ⋮
  1      ω^{N−1}  ω^{2N−2}  · · ·  ω^{(N−1)(N−1)} ]   (98)
The kth entry of the vector F is given by
F[k] = f[0] + ω^k f[1] + ω^{2k} f[2] + · · · + ω^{(N−1)k} f[N − 1]
= f[0] + f[1] ω^k + f[2] (ω^k)² + · · · + f[N − 1] (ω^k)^{N−1}, (99)
where the second line indicates that F [k] is a polynomial in ωk. This suggests that it may be evaluated
recursively, as opposed to computing the terms separately and adding them up. The following is a
pseudocode version of “Horner’s algorithm” for computing the entire vector F :
z := 1
ω := exp(−i2π/N)
for k = 0, 1, ..., N−1 do
    S := f[N−1]
    for l = 2, 3, ..., N do
        S := z*S + f[N−l]
    od
    F[k] := S
    z := z*ω
od
In this form, the computation of the DFT F requires N2 complex operations, which translates to
4N2 real operations. For special values of N , the procedure can be optimized, utilizing the fact that
ω is a root of unity. This is the basis of the fast Fourier transform (FFT) which we may discuss a
little later in the course.
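The pseudocode above can be transcribed almost line for line. The following sketch is a direct Python rendering (still O(N²), unlike the O(N log N) FFT), checked against a library FFT:

```python
import numpy as np

# Sketch: direct transcription of Horner's algorithm above.  At the start
# of outer iteration k we have z = omega^k, and the inner loop evaluates
# the polynomial F[k] = f[0] + f[1] z + ... + f[N-1] z^{N-1}, Eq. (99).
def dft_horner(f):
    N = len(f)
    w = np.exp(-2j * np.pi / N)     # omega, Eq. (96)
    F = np.zeros(N, dtype=complex)
    z = 1.0 + 0j
    for k in range(N):
        S = f[N - 1]
        for l in range(2, N + 1):   # Horner evaluation in z
            S = z * S + f[N - l]
        F[k] = S
        z *= w
    return F

f = np.array([1.0, 2.0, 3.0, 4.0])
assert np.allclose(dft_horner(f), np.fft.fft(f))  # agrees with the FFT
```

For k = 0 we have z = 1, so the inner loop simply sums the entries of f, recovering F[0] = ∑ f[n].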
Lecture 9
Discrete Fourier Transform (cont’d)
We now examine the DFT a little further, with the help of some examples. As mentioned earlier, we
shall be using the DFT, Version 3 – the “MATLAB” formula – summarized again below:
DFT and IDFT, Version 3
F[k] = ∑_{n=0}^{N-1} f[n] exp(−i2πkn/N), k = 0, 1, · · · , N − 1, (100)

f[n] = (1/N) ∑_{k=0}^{N-1} F[k] exp(i2πkn/N), n = 0, 1, · · · , N − 1. (101)
An important note: As we know from before, the DFT and IDFT may be viewed as inner
products between appropriate N -vectors. In the special case k = 0, all of the complex exponentials
in Eq. (100) are equal to 1. This is because the particular element F [0] is the inner product between
the N -vector f and the unnormalized N -vector u0 = (1, 1, · · · , 1). As such,
F[0] = 〈f, u_0〉 = ∑_{n=0}^{N-1} f[n]. (102)
Some examples:
For N = 4:
1. f = (1, 1, 1, 1), F = (4, 0, 0, 0) (This illustrates Eq. (102) above.)
2. g = (0, 1, 0, 1), G = (2, 0,−2, 0)
3. h = (1, 2, 1, 2), H = (6, 0,−2, 0).
4. a = (1, 2, 3, 4), A = (10,−2 + 2i,−2,−2 − 2i).
Comments:
1. In 1, the signal f is a constant signal, i.e., no variation. This means that the only frequency component is zero frequency, i.e., k = 0. This is why the first element, k = 0, corresponding to the constant vector u_0, is the only nonzero component of F. The signal f is orthogonal to all other vectors u_k, k ≠ 0.
2. In 2, the signal g has period 2, i.e., it oscillates at twice the fundamental frequency of the period-4 sampling interval. This accounts for the nonzero entry G[2] = −2.
3. Note that the third result is in accordance with the linearity of the DFT: h = f + g implies that

H = F(h) = F(f + g) = F(f) + F(g) = F + G. (103)
4. The result in 4 shows that a real-valued signal can have a complex-valued DFT.
5. Each of the four results above demonstrates the modified Parseval equality for the DFT, Version 3, mentioned earlier, i.e.,

‖f‖² = (1/N) ‖F‖². (104)
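All of the N = 4 examples and comments above can be checked in a few lines. A sketch using NumPy (whose fft matches the Version 3 convention used here):

```python
import numpy as np

# Sketch: reproduce the N = 4 examples with numpy.fft.fft (DFT Version 3).
f = np.array([1, 1, 1, 1])
g = np.array([0, 1, 0, 1])
a = np.array([1, 2, 3, 4])

assert np.allclose(np.fft.fft(f), [4, 0, 0, 0])                       # Eq. (102)
assert np.allclose(np.fft.fft(g), [2, 0, -2, 0])
assert np.allclose(np.fft.fft(f + g), np.fft.fft(f) + np.fft.fft(g))  # linearity
assert np.allclose(np.fft.fft(a), [10, -2 + 2j, -2, -2 - 2j])         # complex DFT

# Modified Parseval equality, Eq. (104), for each signal
for x in (f, g, a):
    X = np.fft.fft(x)
    assert np.isclose(np.sum(np.abs(x) ** 2), np.sum(np.abs(X) ** 2) / 4)
```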
Some more complicated examples:
5. We consider the function f(x) = cos(2x) defined on the interval 0 ≤ x ≤ 2π. From this function
we construct N = 256 equally-spaced samples,
f[n] = f(x_n) = cos(2x_n), x_n = 2πn/N, n = 0, 1, · · · , N − 1. (105)
The samples are plotted on the left in the figure below.
Sampled signal f[n] = cos(2x_n), n = 0, 1, · · · , 255, and magnitudes |F[k]| of its DFT.
Numerically, we find that all DFT coefficients F [k] are zero, except for two elements:
F[2] = 128, F[254] = 128. (106)
A plot of the magnitudes |F[k]| of the DFT coefficients is presented on the right in the figure above. The nonzero entry F[2] picks out the k = 2 frequency of the signal. We’ll see later that the F[254] component does likewise.
Numerically, we also find that
‖f‖² = ∑_{n=0}^{255} |f[n]|² = 128 (to two decimals), (107)

and

(1/N) ‖F‖² = (1/256)(F[2]² + F[254]²) = (1/256)(128² + 128²) = 128. (108)
Thus, Eq. (95) is satisfied.
6. Now consider the function f(x) = sin(2x) defined on the interval 0 ≤ x ≤ 2π. From this function
we construct N = 256 equally-spaced samples,
f[n] = f(x_n) = sin(2x_n), x_n = 2πn/N, n = 0, 1, · · · , N − 1. (109)
The samples are plotted on the left in the figure below.
Sampled signal f[n] = sin(2x_n), n = 0, 1, · · · , 255, and magnitudes |F[k]| of its DFT.
Numerically, we find that all DFT coefficients F [k] are zero, except for two elements:
F[2] = −128i, F[254] = 128i. (110)
Of course, there is a similarity between this spectrum and that of Example 5 in that the peaks coincide at k = 2 (and 254), corresponding to the common frequency k = 2. As such, a plot of the magnitudes |F[k]| of the DFT coefficients, presented on the right in the figure above, is identical to the corresponding plot of Example 5.
On the other hand, the DFT coefficients for the sin(2x) function are complex. In fact, they are purely imaginary. The coefficient F[2] for the sin(2x) function is obtained from the coefficient of the cos(2x) function by multiplication by −i = e^{−iπ/2}. This might have something to do with the fact that the sin function is a shifted version of the cos function. More on this later.
The moral of the story is that the magnitudes |F [k]| do not contain all of the information about
a signal. If we write
F[k] = |F[k]| e^{iφ_k}, (111)
then the phases φk also contain information about the signal and cannot be ignored.
Numerically, we also find that
‖f‖² = ∑_{n=0}^{255} |f[n]|² = 128 (to two decimals), (112)

and

(1/N) ‖F‖² = (1/256)(128² + 128²) = 128. (113)
Thus, Eq. (95) is satisfied.
7. We now consider the function f(x) = sin(2x) + 5 sin(5x) defined on the interval 0 ≤ x ≤ 2π. We have added a higher-frequency term to the function of Example 6.
From this function we again construct N = 256 equally-spaced samples,
f[n] = f(x_n) = sin(2x_n) + 5 sin(5x_n), x_n = 2πn/N, n = 0, 1, · · · , N − 1. (114)
The samples are plotted on the left in the figure below.

Sampled signal f[n], n = 0, 1, · · · , 255, and magnitudes |F[k]| of the DFT, for Example 7.

Numerically, we find that all DFT coefficients F[k] are zero, except for four elements:
F[2] = −128i, F[254] = 128i, (115)
as expected, corresponding to the sin(2x) component, and
F[5] = −640i, F[251] = 640i, (116)
corresponding to the sin(5x) component. From the linearity of the DFT, the DFT of the sum of these two functions is the sum of their DFTs. Also note that the ratio of magnitudes of these two sets follows the 1 : 5 ratio of the sin(2x) and sin(5x) components. A plot of the magnitudes |F[k]| of the DFT coefficients is presented on the right in the figure above.
Numerically, we also find that
‖f‖² = ∑_{n=0}^{255} |f[n]|² = 3328, (117)

and

(1/N) ‖F‖² = (1/256)(2 · 128² + 2 · 640²) = 3328, (118)
once again in accordance with Eq. (95).
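Example 7 can be reproduced directly. A sketch using NumPy's fft (Version 3 convention):

```python
import numpy as np

# Sketch: check the DFT peaks and Parseval relation of Example 7.
N = 256
x = 2 * np.pi * np.arange(N) / N          # sample points x_n = 2 pi n / N
f = np.sin(2 * x) + 5 * np.sin(5 * x)
F = np.fft.fft(f)

assert np.isclose(F[2], -128j) and np.isclose(F[254], 128j)   # sin(2x), Eq. (115)
assert np.isclose(F[5], -640j) and np.isclose(F[251], 640j)   # sin(5x), Eq. (116)
assert np.isclose(np.sum(np.abs(f) ** 2), 3328)               # Eq. (117)
assert np.isclose(np.sum(np.abs(F) ** 2) / N, 3328)           # Eq. (118)
```

The 640/128 = 5 ratio of peak magnitudes matches the 1 : 5 amplitude ratio of the two sinusoidal components.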
8. Let us now generalize the results from the previous two examples. Suppose that we have the
complex-valued function f(x) = exp(ik0x), defined on the interval [0, 2π], with k0 an integer.
For the moment, we assume that k0 ∈ {0, 1, 2, · · · , N − 1}. From this function, we extract N
equally-spaced samples at the sample points x_n = 2πn/N, n = 0, 1, · · · , N − 1, i.e.,

f[n] = f(x_n) = exp(ik_0x_n) = exp(i2πk_0n/N), n = 0, 1, · · · , N − 1. (119)
The function f(x) is 2π-periodic. But what is its DFT?
By definition, its DFT is given by

F[k] = ∑_{n=0}^{N-1} f[n] exp(−i2πkn/N) = ∑_{n=0}^{N-1} exp(i2πk_0n/N) exp(−i2πkn/N). (120)
Now recall from the previous lecture that the discrete exponential vectors u_k, with components

u_k[n] = exp(i2πkn/N), n = 0, 1, · · · , N − 1, (121)

form an orthogonal set in C^N, i.e.,

〈u_k, u_l〉 = N δ_{kl}. (122)
This means that F[k] = 0 in Eq. (120) unless k = k_0. In other words, the N-point DFT of the exponential function exp(ik_0x) sampled on [0, 2π] is given by

F[k] = N δ_{k,k_0}. (123)
The DFT consists of a single peak of magnitude N at k = k0.
But wait just one minute!
The above result applies to the case k0 ∈ {0, 1, 2, · · · , N − 1}. What happens if it is not in the set
of frequencies {0, 1, 2, · · · , N − 1} covered by the DFT? For example, if N = 256, what happens if
k0 = 260? Will the function be oscillating too quickly to be detected?
The answer is “No, it will be detected.” Somehow, one gets the feeling that everything here happens
“modulo N ,” because of the periodicity of the vectors. And that is what happens in frequency space
as well. Let us replace k0 in Eq. (120) with k0 + N . Then the RHS becomes
∑_{n=0}^{N-1} exp(i2π(k_0 + N)n/N) exp(−i2πkn/N)
= ∑_{n=0}^{N-1} exp(i2πk_0n/N) exp(i2πNn/N) exp(−i2πkn/N)
= ∑_{n=0}^{N-1} exp(i2πk_0n/N) exp(i2πn) exp(−i2πkn/N)
= ∑_{n=0}^{N-1} exp(i2πk_0n/N) exp(−i2πkn/N). (124)
In other words, the result in Eq. (120) is unchanged. Therefore the same result holds for k0 +pN ,
where p is an integer. This implies that a peak will show up at k = k0 mod N . So the final result is:
The N-point DFT of the sampled function exp(ik_0x), 0 ≤ x ≤ 2π, i.e., f[n] given in Eq. (119), is given by a single peak:

F[k] = { N, k = k_0 mod N,
       { 0, otherwise. (125)
Note: We actually proved this result earlier, when we showed that the discrete Fourier coefficient vectors c = (c[0], c[1], · · · , c[N − 1]) associated with the orthonormal basis e_k are N-periodic. But it doesn’t hurt to revisit this result.
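The "modulo N" behaviour in Eq. (125) is easy to demonstrate. A sketch (using NumPy) with the values suggested above, N = 256 and k_0 = 260:

```python
import numpy as np

# Sketch: aliasing "modulo N".  Sampling exp(i k0 x) with k0 = 260 and
# N = 256 points produces a single DFT peak at k = 260 mod 256 = 4,
# exactly as Eq. (125) predicts.
N, k0 = 256, 260
x = 2 * np.pi * np.arange(N) / N
F = np.fft.fft(np.exp(1j * k0 * x))

peak = int(np.argmax(np.abs(F)))
assert peak == k0 % N                                 # peak at k = 4
assert np.isclose(F[peak], N)                         # height N, Eq. (125)
assert np.allclose(np.delete(F, peak), 0, atol=1e-8)  # all other entries vanish
```

The samples of exp(i·260·x) are identical to those of exp(i·4·x), which is why the two frequencies cannot be distinguished from the data.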
From this property, we may now go back and verify that the calculations of Examples 5-7 above, involving sine and cosine functions, are correct.
1. Since
cos(k_0x) = (1/2) e^{ik_0x} + (1/2) e^{−ik_0x}, (126)
it follows, from the linearity of the DFT, and the “modulo N” property derived above, that the
N -point DFT of cos(k0x) consists of two peaks of height N/2, i.e.,
F[k] = { N/2, k = k_0,
       { N/2, k = N − k_0,
       { 0, otherwise. (127)
The peak at k = N − k0 comes from the “modulo N” property. The second exponential in Eq.
(126) would produce a peak at k = −k0 which, in turn, because of the N -periodicity of the DFT,
produces a peak at k = −k0 + N = N − k0.
Note that this result is in agreement with the computation in Example 5 above.
2. Since

sin(k_0x) = (1/2i) e^{ik_0x} − (1/2i) e^{−ik_0x}, (128)
it follows, once again from the linearity of the DFT, and the “modulo N” property derived
above, that the N -point DFT of sin(k0x) consists of the following two peaks,
F[k] = { N/(2i) = −(N/2)i, k = k_0,
       { −N/(2i) = (N/2)i, k = N − k_0,
       { 0, otherwise. (129)
Note that these peaks have the same magnitudes as for the cosine case, but that they are now
complex. Moreover, the two peaks of the DFT of the sine function are complex conjugates of
each other. More on this later.
For the moment, we note that this result is in agreement with the computation in Example 6
above.
3. Finally, notice that if we add up the DFTs of the sine and cosine function appropriately, we
retrieve the DFT of the exponential, i.e.,
F(cos(k0x) + i sin(k0x)) = F(exp(ik0x)). (130)
The peaks of the cosine and sine at k = k0 “constructively interfere” whereas their peaks at
k = N − k0 “destructively interfere.” The result is a single peak of height N at k = k0.
We now consider a slightly perturbed version of Example 6 above, namely the function f(x) = sin(2.1x) defined on the interval 0 ≤ x ≤ 2π. From this function we construct N = 256 equally-spaced samples,
f[n] = f(x_n) = sin(2.1x_n), x_n = 2πn/N, n = 0, 1, · · · , N − 1. (131)
The samples are plotted on the left in the figure below. Note that this signal is not 2π-periodic, but the sampling and resulting DFT produce a 2π-periodic extension. As such, there is a significant jump between f[255] and f[256] = f[0].
This time, we find that the DFT spectrum of coefficients F[k] is not as simple as in the first two examples. First of all, with the exception of F[0] ≈ 3.41068, all DFT coefficients are complex, i.e., have nonzero imaginary part. A plot of the magnitudes |F[k]| of the DFT coefficients is presented on the right in the figure below. There is still a dominant peak at k = 2, but it is not a single sharp peak – it is somewhat diffuse.
Sampled signal f[n] = sin(2.1x_n), n = 0, 1, · · · , 255, and magnitudes |F[k]| of the DFT, for Example 9.
In order to show the diffuseness of the DFT spectrum, the coefficients are plotted on a different
scale so that the enormous peaks at k = 2 and 254 do not mask their behaviour as in the previous
plot.
Plot of magnitudes |F[k]| of the DFT of the sin(2.1x_n) signal of Example 9, magnified to show the diffuse structure around the dominant peaks at k = 2 and 254.
Ideally, the DFT would like to place a peak at the frequency k = 2.1, but no such component exists. As such, the dominant peaks are found at k = 2 and 254. But all other frequencies are needed to accommodate this “nonexistent” or irregular frequency – note that their contribution decreases as we move away from the peaks.
If this appears to be a rather “bizarre” phenomenon, just go back and think about the Fourier
(sine) series of this function, i.e.,
sin(2.1x) = ∑_{k=1}^{∞} b_k sin(kx). (132)
In fact, the coefficients bk can be computed rather easily, and one observes that they produce a
somewhat “diffuse” Fourier spectrum that peaks at k = 2.
The reader may wish to examine the effect of further perturbing the frequency of the sampled signal, i.e., the function f(x) = sin((2 + ε)x) as ε is increased. For example, will the k = 3 (and 253) components of the DFT increase in magnitude? And for ε > 0.5, does k = 3 “take over” in magnitude?
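The reader's experiment is a few lines of code. A sketch (using NumPy; the particular ε values 0.1 and 0.6 are my choices for illustration):

```python
import numpy as np

# Sketch: spectral "leakage" for the non-integer frequency in Example 9,
# and the suggested experiment with sin((2 + eps) x).
N = 256
x = 2 * np.pi * np.arange(N) / N

# For eps = 0.1 (frequency 2.1) the nearest integer frequency is 2,
# so |F[2]| dominates |F[3]|.
F1 = np.fft.fft(np.sin((2 + 0.1) * x))
assert abs(F1[2]) > abs(F1[3])

# For eps = 0.6 (frequency 2.6) the nearest integer frequency is 3,
# so the dominant peak "takes over" at k = 3.
F2 = np.fft.fft(np.sin((2 + 0.6) * x))
assert abs(F2[3]) > abs(F2[2])
```

In both cases the remaining coefficients are nonzero but decay away from the dominant peaks, in agreement with the diffuse spectra plotted above.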