Lecture Notes in Harmonic Analysis

Lectures by Dr. Charles Moore

Throughout this document, □ signifies the end of a proof, and N signifies the end of an example.

Table of Contents

Lecture 1  Introduction to Fourier Analysis
    1.1  Fourier Analysis
    1.2  In more general settings...

Lecture 2  More Fourier Analysis
    2.1  Elementary Facts from Fourier Analysis

Lecture 3  Convolving Functions
    3.1  Properties of Convolution

Lecture 4  An Application
    4.1  Photographing a Star
    4.2  Results in L²

Lecture 5  Hilbert Spaces
    5.1  Fourier Series on L²

Lecture 6  More on Hilbert Spaces
    6.1  Haar Functions
    6.2  Fourier Transform on L²

Lecture 7  Inverse Fourier Transform
    7.1  Undoing Fourier Transforms

Lecture 8  Fejér Kernels
    8.1  Fejér Kernels and Approximate Identities

Lecture 9  Convergence of Cesàro Means
    9.1  Convergence of Fourier Sums

Lecture 10  Toward Convergence of Partial Sums
    10.1  Dirichlet Kernels
    10.2  Convergence for Continuous Functions

Lecture 11  Convergence in Lᵖ
    11.1  Convergence in Lᵖ
    11.2  Almost Everywhere Convergence

Lecture 12  Maximal Functions
    12.1  Hardy-Littlewood Maximal Functions

Lecture 13  More on Maximal Functions
    13.1  Proof of Hardy-Littlewood's Theorem

Lecture 14  Marcinkiewicz Interpolation
    14.1  Proof of Marcinkiewicz Interpolation Theorem

Lecture 15  Lebesgue Differentiation Theorem
    15.1  A Note About Maximal Functions
    15.2  Lebesgue Differentiation Theorem

Lecture 16  Maximal Functions and Kernels
    16.1  Generalising Lebesgue Differentiation Theorem

Lecture 17  Rising Sun Lemma
    17.1  Nontangential Maximal Function
    17.2  Riesz's Proof of the Hardy-Littlewood Theorem

Lecture 18  Calderón-Zygmund Decomposition of Functions
    18.1  Higher-Dimensional Rising Sun Lemma

Lecture 19  Density of Sets
    19.1  Hardy-Littlewood's Theorem from Calderón-Zygmund
    19.2  Density of Sets

Lecture 20  Marcinkiewicz Integral
    20.1  Convergence of Marcinkiewicz Integral

Lecture 21  Integral Operators
    21.1  Schur's Lemma

Lecture 22  Integral Operators continued
    22.1  Singular Integrals

Lecture 23  Integral Operators continued
    23.1  Finishing the Proof

Lecture 24  Integral Operators continued
    24.1  Proof of the Lemma

Lecture 25  Integral Operators continued
    25.1  Finalising the Proof

Index

Notes by Jakob Streipel. Last updated December 1, 2017.


    Lecture 1 Introduction to Fourier Analysis

Harmonic analysis is a broad field involving a great deal of subjects concerning the art of decomposing functions into constituent parts. These might be Fourier coefficients, breaking functions down into exponential parts, wavelet theory, tools to deal with partial differential equations, or Sobolev spaces.

    This course will deal with the following:

    • Fourier analysis,

    • Harmonic functions,

    • Singular integrals, and

    • Maximal functions.

    1.1 Fourier Analysis

Definition 1.1.1 (Inner product space). Let V be a finite-dimensional vector space over C. Then V is called an inner product space if there is a mapping 〈·, ·〉 : V × V → C which satisfies the following for all vectors x, y, z ∈ V and scalars α ∈ C:

(i) Conjugate symmetry, meaning that $\langle x, y\rangle = \overline{\langle y, x\rangle}$;

(ii) Linearity in the first argument, i.e. 〈αx, y〉 = α〈x, y〉 and 〈x + y, z〉 = 〈x, z〉 + 〈y, z〉;

(iii) Positive-definiteness, meaning that 〈x, x〉 ≥ 0, and 〈x, x〉 = 0 if and only if x = 0.

Every inner product space automatically carries a norm, namely ‖v‖ = 〈v, v〉^{1/2}. Moreover normed spaces are automatically metric spaces via d(v, w) = ‖v − w‖. Hence d has the following properties:

Definition 1.1.2 (Metric). A function d(·, ·) : V × V → R is called a metric if

(i) d(x, y) ≥ 0, and d(x, y) = 0 if and only if x = y;

(ii) symmetry, i.e. d(x, y) = d(y, x); and

(iii) d(x, z) ≤ d(x, y) + d(y, z)

for all x, y, z ∈ V.

Since we have an inner product, we are able to define all manner of other interesting concepts.

Definition 1.1.3 (Orthogonal, orthonormal). A basis B = { v₁, v₂, ..., v_n } is called orthogonal if the basis elements are pairwise orthogonal, i.e. 〈v_i, v_j〉 = 0 for all i ≠ j. Moreover the basis is called orthonormal if in addition ‖v_i‖ = 1 for all i.


Since a basis spans the space, we can write v = c₁v₁ + c₂v₂ + ... + c_n v_n. Now if the basis in addition is orthonormal, we have the illuminating property that

\[
\langle v, v_1\rangle = \langle c_1 v_1 + c_2 v_2 + \ldots + c_n v_n,\; v_1\rangle = c_1\langle v_1, v_1\rangle + c_2\langle v_2, v_1\rangle + \ldots + c_n\langle v_n, v_1\rangle = c_1\|v_1\|^2 + 0 + \ldots + 0 = c_1,
\]

and similarly 〈v, v_i〉 = c_i for all i. Therefore

\[
v = \langle v, v_1\rangle v_1 + \langle v, v_2\rangle v_2 + \ldots + \langle v, v_n\rangle v_n,
\]

and the 〈v, v_i〉 are called the Fourier coefficients of v.
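This finite-dimensional expansion is easy to check numerically. The following sketch (our own illustration, not part of the notes) builds a random orthonormal basis of C³ via a QR factorisation and recovers a vector from its Fourier coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)

# Build an orthonormal basis of C^3: the columns of Q from a QR
# factorisation of a random complex matrix are orthonormal.
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
Q, _ = np.linalg.qr(A)
basis = [Q[:, i] for i in range(3)]

v = rng.standard_normal(3) + 1j * rng.standard_normal(3)

# Fourier coefficients c_i = <v, v_i>; np.vdot conjugates its first argument,
# so <v, v_i> = sum_k v_k * conj(v_i[k]) = np.vdot(v_i, v).
coeffs = [np.vdot(b, v) for b in basis]

# v = sum_i <v, v_i> v_i
reconstruction = sum(c * b for c, b in zip(coeffs, basis))
assert np.allclose(reconstruction, v)
```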

We can do the same thing on a slightly more interesting space than ordinary Euclidean space, namely L¹:

Definition 1.1.4. Suppose f is a function on [−π, π) with

\[ \int_{-\pi}^{\pi} |f(x)|\,dx < \infty, \]

and equip this space with the inner product

\[ \langle f, g\rangle = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\overline{g(x)}\,dx. \]


Now suppose that n ≠ m are integers. Then

\[
\langle e^{inx}, e^{imx}\rangle = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{inx}\overline{e^{imx}}\,dx = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{inx}e^{-imx}\,dx = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{i(n-m)x}\,dx
= \frac{1}{2\pi}\,\frac{e^{i(n-m)x}}{i(n-m)}\bigg|_{x=-\pi}^{\pi} = \frac{1}{2\pi}\bigg(\frac{e^{i(n-m)\pi}}{i(n-m)} - \frac{e^{-i(n-m)\pi}}{i(n-m)}\bigg) = 0,
\]

and

\[
\|e^{inx}\| = \langle e^{inx}, e^{inx}\rangle^{1/2} = \bigg(\frac{1}{2\pi}\int_{-\pi}^{\pi} e^{inx}\overline{e^{inx}}\,dx\bigg)^{1/2} = \bigg(\frac{1}{2\pi}\int_{-\pi}^{\pi} e^{0}\,dx\bigg)^{1/2} = 1.
\]

Therefore { e^{inx} }_{n∈Z} is an orthonormal set.

Given a function f, we defined f̂(n) = 〈f, e^{inx}〉. We would like for this to mean that

\[ f = \sum_{n\in\mathbb{Z}}\hat f(n)e^{inx}, \]

like

\[ v = \langle v, v_1\rangle v_1 + \langle v, v_2\rangle v_2 + \ldots + \langle v, v_n\rangle v_n. \]

But is it a basis? In what sense does this infinite sum converge?
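As a numerical sanity check (our own illustration), one can approximate the normalised inner product 〈f, g〉 = (1/2π)∫ f·conj(g) dx by a Riemann sum on a uniform grid and verify the orthonormality just computed:

```python
import numpy as np

N = 4096
x = -np.pi + 2 * np.pi * np.arange(N) / N   # uniform grid on [-pi, pi)

def inner(n, m):
    # (1/2pi) * sum f * conj(g) * dx reduces to a mean, since dx/(2pi) = 1/N
    return np.mean(np.exp(1j * n * x) * np.conj(np.exp(1j * m * x)))

assert abs(inner(3, 3) - 1) < 1e-12   # <e^{i3x}, e^{i3x}> = 1
assert abs(inner(3, 5)) < 1e-12       # <e^{i3x}, e^{i5x}> = 0
```

For distinct frequencies the Riemann sum is a full geometric sum over the grid, so it vanishes up to floating-point error rather than mere quadrature accuracy.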

1.2 In more general settings. . .

Given f on [−π, π) with $\int_{-\pi}^{\pi}|f(x)|\,dx < \infty$, we can carry out the same construction in other settings as well.


Indeed we could consider n-dimensional tori too, if we like, where we'd be doing the exact same thing, just over a different integration domain (and with a different scalar in front).

More generally we may do this over any locally compact Abelian group G (which is what it sounds like, with the caveat that the group structure and the topological structure be connected in that the group operations of addition and negation are both continuous).

It is a fact that on G there exists a measure µ, called the Haar measure, which is translation invariant, i.e. µ(E + x) = µ(E) for every Borel set E ⊆ G and x ∈ G.

If γ : G → C is a mapping such that γ(x + y) = γ(x)γ(y) for every x, y ∈ G, e.g. γ(x) = e^{inx}, then γ is called a (multiplicative) character on G.

In L²([−π, π)), the e^{inx} are characters. Moreover the set of all characters of a group is called the dual group, often denoted Γ.

Now we can define

\[ \hat f(\gamma) = \int_G f(x)\overline{\gamma(x)}\,d\mu(x). \]

(These things are discussed in Rudin's Fourier Analysis on Groups.)

    Lecture 2 More Fourier Analysis

Recall from last time that we write

\[ f \sim \sum_{n=-\infty}^{\infty}\hat f(n)e^{inx} \]

to mean that

\[ \hat f(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)e^{-int}\,dt, \]

for functions f ∈ L¹([−π, π)), which means that $\int_{-\pi}^{\pi}|f(t)|\,dt < \infty$.

Similarly if f is defined on R^d we write

\[ \hat f(\xi) = \int_{\mathbb{R}^d} f(t)e^{-i\xi\cdot t}\,dt. \]

    2.1 Elementary Facts from Fourier Analysis

Proposition 2.1.1. With these definitions,

(i) $|\hat f(n)| \le \frac{1}{2\pi}\int_{-\pi}^{\pi}|f(t)|\,dt$, and

(ii) $|\hat f(\xi)| \le \int_{\mathbb{R}^d}|f(t)|\,dt$.


Proof. We prove the first one; the second one is almost identical:

\[
|\hat f(n)| = \bigg|\frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)e^{-int}\,dt\bigg| \le \frac{1}{2\pi}\int_{-\pi}^{\pi}|f(t)|\,|e^{-int}|\,dt = \frac{1}{2\pi}\int_{-\pi}^{\pi}|f(t)|\,dt.
\]

    We can do better, in fact.

Proposition 2.1.2 (Riemann-Lebesgue lemma). (i) If f ∈ L¹([−π, π)) then |f̂(n)| → 0 as |n| → ∞, and

(ii) If f ∈ L¹(R^d) then f̂(ξ) is continuous and |f̂(ξ)| → 0 as |ξ| → ∞.

Note that whilst (ii) might seem stronger, (i) concerns a sequence, which is automatically continuous in the discrete topology.

Proof. We prove the first one (since, save for the continuity, the second one is similar), but we prove it only for indicator functions. This is fine since their linear combinations are dense in the space of L¹ functions.

For an indicator function

\[
\chi_{(a,b)}(x) = \begin{cases} 1, & \text{if } x \in (a, b), \\ 0, & \text{otherwise,} \end{cases}
\]

we have

\[
\hat\chi_{(a,b)}(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi}\chi_{(a,b)}(x)e^{-inx}\,dx = \frac{1}{2\pi}\int_a^b e^{-inx}\,dx = \frac{1}{2\pi}\,\frac{e^{-inx}}{-in}\bigg|_a^b = \frac{1}{2\pi}\bigg(\frac{e^{-inb}-e^{-ina}}{-in}\bigg) \to 0
\]

as |n| → ∞, since |e^{ix}| ≤ 1.

Now since linear combinations of characteristic functions are dense in L¹, for any ε > 0 there exists a function g that is a linear combination of characteristic functions such that

\[ \int_{-\pi}^{\pi}|f(x)-g(x)|\,dx < \varepsilon. \]

Since ĝ(n) → 0 as |n| → ∞, there exists an M such that |ĝ(n)| < ε for all |n| > M. Take such an n; then

\[
|\hat f(n)| = \bigg|\frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)e^{-int}\,dt\bigg| = \bigg|\frac{1}{2\pi}\int_{-\pi}^{\pi}\big(f(t)-g(t)\big)e^{-int}\,dt + \frac{1}{2\pi}\int_{-\pi}^{\pi} g(t)e^{-int}\,dt\bigg| \le \frac{1}{2\pi}\int_{-\pi}^{\pi}|f(t)-g(t)|\,dt + |\hat g(n)| \le \frac{\varepsilon}{2\pi} + \varepsilon,
\]

which can of course be made arbitrarily small.


For the continuity in the second case, let ξ ∈ R^d and moreover let ξ_j ∈ R^d be such that ξ_j → ξ. Then

\[
|\hat f(\xi_j) - \hat f(\xi)| = \bigg|\int_{\mathbb{R}^d} f(t)e^{-i\xi_j\cdot t}\,dt - \int_{\mathbb{R}^d} f(t)e^{-i\xi\cdot t}\,dt\bigg| \le \int_{\mathbb{R}^d}|f(t)|\,\big|e^{-i\xi_j\cdot t} - e^{-i\xi\cdot t}\big|\,dt \to 0
\]

as j → ∞.

Remark 2.1.3. Note how without comment we took the limit from outside of the integral to the inside of the integral just then. This is allowed by dominated convergence, since the integrand is bounded by the integrable function 2|f(t)|.

This is contrary to classical examples such as f_n(x) = nχ_{(0,1/n)}(x). Since this converges pointwise to 0 as n goes to infinity, we have

\[ \int_0^1\lim_{n\to\infty} f_n(x)\,dx = 0 \quad\text{and}\quad \lim_{n\to\infty}\int_0^1 f_n(x)\,dx = 1. \]

Remark 2.1.4. The Riemann-Lebesgue lemma is not true for measures. Take for instance the Dirac measure δ, for which we have

\[ \hat\delta(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-int}\,d\delta(t) = \frac{1}{2\pi} \]

for all n, so it certainly does not go to 0.

Remark 2.1.5. We have now shown that the Fourier coefficients of an L¹ function f form a two-sided decaying sequence. One might ask, then, whether for each decaying two-sided sequence we might find an L¹ function which has precisely that sequence as its Fourier coefficients.

The answer to this question is no, by Bochner's theorem, but more on this later.

Proposition 2.1.6. Let f ∈ L¹([−π, π)). Extend f to all of R as a 2π-periodic function, i.e. f(2πn + x) = f(x) for all n ∈ Z. Then

(i) Let y ∈ R and define g(x) = f(x − y). Then ĝ(n) = f̂(n)e^{−iyn};

(ii) If m ∈ Z and g(x) = f(x)e^{imx}, then ĝ(n) = f̂(n − m).

Proof. (i) We compute, making the change of variable s = t − y along the way:

\[
\hat g(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} g(t)e^{-int}\,dt = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(t-y)e^{-int}\,dt = \frac{1}{2\pi}\int_{-\pi-y}^{\pi-y} f(s)e^{-in(s+y)}\,ds = \frac{e^{-iny}}{2\pi}\int_{-\pi-y}^{\pi-y} f(s)e^{-ins}\,ds = e^{-iny}\hat f(n),
\]

where in the last step we use that the integrand is 2π-periodic, so integrating over [−π − y, π − y) is the same as integrating over [−π, π).

(ii) is similar.

Something similar is true for products of functions—almost.

Definition 2.1.7 (Convolution). If f ∈ Lᵖ(R^d), g ∈ L¹(R^d), then we let

\[ f * g(x) = \int_{\mathbb{R}^d} f(x-y)g(y)\,dy, \]

called the convolution of f and g.

Now it turns out that $\widehat{fg} \ne \hat f\hat g$ in general, but $\widehat{f*g} = \hat f\cdot\hat g$.

    Lecture 3 Convolving Functions

    3.1 Properties of Convolution

Recall that for f, g ∈ L¹([−π, π)) we define

\[ f * g(x) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x-y)g(y)\,dy, \]

and for f, g ∈ L¹(R^d) we define

\[ f * g(x) = \int_{\mathbb{R}^d} f(x-y)g(y)\,dy. \]

Proposition 3.1.1 (Young's inequality). If f ∈ Lᵖ and g ∈ L¹, then ‖f ∗ g‖_p ≤ ‖f‖_p‖g‖₁.

We will need the following lemma:

Lemma 3.1.2 (Hölder's inequality). Let 1 ≤ p, q ≤ ∞ be such that 1/p + 1/q = 1. Then

\[
\int_{\mathbb{R}^d}|h(x)k(x)|\,dx \le \bigg(\int_{\mathbb{R}^d}|h(x)|^p\,dx\bigg)^{1/p}\bigg(\int_{\mathbb{R}^d}|k(x)|^q\,dx\bigg)^{1/q}.
\]

Proof (of Young's inequality). We prove the case of R^d; the other one is similar. Writing |g(y)| = |g(y)|^{1/p}|g(y)|^{1/q} and applying Hölder's inequality in the y variable,

\[
\|f*g\|_p^p = \int_{\mathbb{R}^d}|f*g(x)|^p\,dx = \int_{\mathbb{R}^d}\bigg|\int_{\mathbb{R}^d} f(x-y)g(y)\,dy\bigg|^p dx
\le \int_{\mathbb{R}^d}\bigg(\int_{\mathbb{R}^d}|f(x-y)|\,|g(y)|^{1/p}|g(y)|^{1/q}\,dy\bigg)^p dx
\]
\[
\le \int_{\mathbb{R}^d}\bigg(\int_{\mathbb{R}^d}|f(x-y)|^p|g(y)|\,dy\bigg)\bigg(\int_{\mathbb{R}^d}|g(y)|\,dy\bigg)^{p/q} dx
= \|g\|_1^{p/q}\int_{\mathbb{R}^d}\int_{\mathbb{R}^d}|f(x-y)|^p|g(y)|\,dy\,dx
\]
\[
= \|g\|_1^{p/q}\int_{\mathbb{R}^d}\int_{\mathbb{R}^d}|f(x-y)|^p|g(y)|\,dx\,dy
= \|g\|_1^{p/q}\int_{\mathbb{R}^d}|g(y)|\int_{\mathbb{R}^d}|f(x-y)|^p\,dx\,dy
= \|g\|_1^{p/q}\|f\|_p^p\|g\|_1 = \|f\|_p^p\|g\|_1^{p/q+1} = \|f\|_p^p\|g\|_1^p,
\]

where the order of integration is switched by Tonelli's theorem, and p/q + 1 = p since 1/q = 1 − 1/p. Taking pth roots finishes the proof.
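A numerical illustration of Young's inequality on the circle, using the normalised convolution and norms above (the test functions and grid are our own choices):

```python
import numpy as np

n = 4096
x = -np.pi + 2 * np.pi * np.arange(n) / n
dx = 2 * np.pi / n
p = 3.0

f = np.cos(5 * x)       # an L^p function on the circle
g = np.exp(-x**2)       # an L^1 function

# f*g(x) = (1/2pi) int f(x-y) g(y) dy, as a circular convolution via the FFT
conv = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(g))) * dx / (2 * np.pi)

def norm_p(h, q):
    # normalised L^q norm ((1/2pi) int |h|^q)^{1/q}
    return (np.sum(np.abs(h)**q) * dx / (2 * np.pi))**(1 / q)

lhs = norm_p(conv, p)
rhs = norm_p(f, p) * np.sum(np.abs(g)) * dx / (2 * np.pi)
assert lhs <= rhs   # ||f*g||_p <= ||f||_p ||g||_1
```

Here the inequality holds with a wide margin: convolving the oscillating f against a wide Gaussian damps it heavily.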


Proposition 3.1.3. (i) If f, g ∈ L¹([−π, π)) then

\[ \widehat{f*g}(n) = \hat f(n)\hat g(n). \]

(ii) If f, g ∈ L¹(R^d) then

\[ \widehat{f*g}(\xi) = \hat f(\xi)\hat g(\xi). \]

Proof. We prove (i):

\[
\widehat{f*g}(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f*g(t)e^{-int}\,dt = \frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{1}{2\pi}\int_{-\pi}^{\pi} f(t-s)g(s)\,ds\;e^{-int}\,dt
= \frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{1}{2\pi}\int_{-\pi}^{\pi} f(t-s)g(s)e^{-int}\,dt\,ds
\]
\[
= \frac{1}{2\pi}\int_{-\pi}^{\pi} g(s)\,\frac{1}{2\pi}\int_{-\pi}^{\pi} f(t-s)e^{-int}\,dt\,ds
= \frac{1}{2\pi}\int_{-\pi}^{\pi} g(s)\,\frac{1}{2\pi}\int_{-\pi-s}^{\pi-s} f(v)e^{-in(s+v)}\,dv\,ds
= \frac{1}{2\pi}\int_{-\pi}^{\pi} g(s)e^{-ins}\,\frac{1}{2\pi}\int_{-\pi-s}^{\pi-s} f(v)e^{-inv}\,dv\,ds = \hat g(n)\hat f(n),
\]

using the substitution v = t − s and, in the last step, the 2π-periodicity of the integrand.
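A discrete analogue of this proposition (our own illustration, not the continuous statement just proved): for the DFT, circular convolution in "time" becomes pointwise multiplication in "frequency".

```python
import numpy as np

rng = np.random.default_rng(1)
f = rng.standard_normal(64)
g = rng.standard_normal(64)
N = len(f)

# circular convolution (f*g)[k] = sum_j f[(k-j) mod N] g[j]
conv = np.array([sum(f[(k - j) % N] * g[j] for j in range(N))
                 for k in range(N)])

# DFT of the convolution equals the product of the DFTs
assert np.allclose(np.fft.fft(conv), np.fft.fft(f) * np.fft.fft(g))
```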

Proposition 3.1.4. (i) If f ∈ L¹(R^d) and x_k f(x) ∈ L¹(R^d) (where x = (x₁, x₂, ..., x_d)), then f̂(ξ) is differentiable with respect to ξ_k and

\[ \frac{\partial}{\partial\xi_k}\hat f(\xi) = \widehat{-ix_k f}(\xi). \]

(ii) If f, ∂f/∂x_k ∈ L¹(R^d), then

\[ \widehat{\frac{\partial f}{\partial x_k}}(\xi) = i\xi_k\hat f(\xi). \]

Proof. Set h = (0, ..., 0, h, 0, ..., 0), with h being in the kth position. Consider

\[
\frac{\hat f(\xi + h) - \hat f(\xi)}{h} = \frac{1}{h}\int_{\mathbb{R}^d} f(t)\big(e^{-i(\xi+h)\cdot t} - e^{-i\xi\cdot t}\big)\,dt = \int_{\mathbb{R}^d} f(t)e^{-i\xi\cdot t}\,\frac{e^{-iht_k}-1}{h}\,dt.
\]

Thus taking limits we get the partial derivative in ξ_k, whereby

\[
\frac{\partial\hat f}{\partial\xi_k}(\xi) = \int_{\mathbb{R}^d} f(t)e^{-i\xi\cdot t}\lim_{h\to 0}\frac{e^{-iht_k}-1}{h}\,dt = \int_{\mathbb{R}^d} f(t)e^{-i\xi\cdot t}(-it_k)\,dt = \widehat{-it_k f}(\xi),
\]

where passing the limit inside is justified by dominated convergence, since |e^{−iht_k} − 1|/|h| ≤ |t_k| and t_k f ∈ L¹.


Proposition 3.1.5. If f, df/dx ∈ L¹([−π, π)) then

\[ \widehat{\frac{df}{dx}}(n) = in\hat f(n). \]

Proof. By simple computation using integration by parts,

\[
\widehat{\frac{df}{dx}}(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{df}{dx}(x)e^{-inx}\,dx = \frac{1}{2\pi}f(x)e^{-inx}\bigg|_{-\pi}^{\pi} - \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)(-in)e^{-inx}\,dx = \frac{in}{2\pi}\int_{-\pi}^{\pi} f(x)e^{-inx}\,dx = in\hat f(n),
\]

the boundary term vanishing by 2π-periodicity.

    Lecture 4 An Application

    4.1 Photographing a Star

We will describe an application of the Fourier transform. Imagine taking a two-dimensional photograph of a star from Earth. This star looks like a disc from our perspective, and due to there being atmosphere between us and the star, the picture will be blurry. A question one might ask oneself then is: despite the blurriness, would it be possible to determine the radius of the star?

    To describe how this is accomplished using the Fourier transform, we firstset the problem up in terms of functions on the plane.

The true image of the star can be described as

\[ f(x) = \lambda\chi_B\Big(\frac{x-y}{\varepsilon}\Big), \]

where x = (x₁, x₂), y = (y₁, y₂) is the centre of the disc (which is the image of the star), ε is the radius of the disc, and λ is the brightness or luminosity of the star. Moreover χ_B is the characteristic function of the unit disc, from which it follows that the above is the disc of radius ε centred on y with brightness λ.

The blurry photograph of the star can be described by

\[ f*k(x) = \int_{\mathbb{R}^2} f(y)k(x-y)\,dy, \]

where k is some sort of smooth function with a bump around the origin and 0 elsewhere. Such a function k is called a mollifier, which comes from the fact that the convolution above will produce an image of f, with the caveat that sharp edges have been smoothed out.

Now imagine taking n photos, getting f ∗ k₁, f ∗ k₂, ..., f ∗ k_n. We superimpose these, yielding f ∗ k₁ + f ∗ k₂ + ... + f ∗ k_n, and then take the Fourier transform, producing

\[ \hat f\hat k_1 + \hat f\hat k_2 + \ldots + \hat f\hat k_n = \hat f(\hat k_1 + \hat k_2 + \ldots + \hat k_n). \]

Now f̂ has zeros, as do k̂₁, ..., k̂_n. Let us assume that the zeros of this Fourier transform are the zeros of f̂ (i.e. that the zeros of the Fourier transforms of the mollifiers don't happen to coincide with each other and the zeros of f̂). Then, using the substitutions s = x − y and u = s/ε along the way,

\[
\hat f(\xi) = \int_{\mathbb{R}^2} e^{-ix\cdot\xi}\lambda\chi_B\Big(\frac{x-y}{\varepsilon}\Big)\,dx = \lambda\int_{\mathbb{R}^2} e^{-i(s+y)\cdot\xi}\chi_B(s/\varepsilon)\,ds = \lambda e^{-iy\cdot\xi}\int_{\mathbb{R}^2} e^{-is\cdot\xi}\chi_B(s/\varepsilon)\,ds
\]
\[
= \varepsilon^2\lambda e^{-iy\cdot\xi}\int_{\mathbb{R}^2} e^{-i\varepsilon u\cdot\xi}\chi_B(u)\,du = \varepsilon^2\lambda e^{-iy\cdot\xi}\int_{\mathbb{R}^2} e^{-iu\cdot(\varepsilon\xi)}\chi_B(u)\,du = \varepsilon^2\lambda e^{-iy\cdot\xi}\hat\chi_B(\varepsilon\xi).
\]

It happens to be a fact that the Fourier transform of the characteristic function of the unit disc is

\[ \hat\chi_B(\xi) = \frac{2\pi J_1(|\xi|)}{|\xi|}, \]

where J₁ is the Bessel function of the first kind. Therefore

\[ \hat f(\xi) = 2\pi\varepsilon\lambda e^{-iy\cdot\xi}\,\frac{J_1(\varepsilon|\xi|)}{|\xi|}, \]

so under our assumption the zeros of the superimposed image of the star should be the zeros of f̂, which are then exactly characterised by the zeros of the Bessel function, which are well known. Knowing that the distance between the observed zeros is the distance between the zeros of the Bessel function scaled by the radius ε, we can recover this radius.
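The recovery can be sketched numerically (all values here are our own toy choices; J₁ is computed from its standard integral representation J₁(z) = (1/π)∫₀^π cos(θ − z sin θ) dθ):

```python
import numpy as np

theta = np.linspace(0.0, np.pi, 2001)
dtheta = theta[1] - theta[0]

def J1(z):
    # trapezoidal rule for (1/pi) int_0^pi cos(theta - z sin theta) dtheta
    vals = np.cos(theta - z * np.sin(theta))
    return (np.sum(vals) - 0.5 * (vals[0] + vals[-1])) * dtheta / np.pi

eps = 0.25                                    # "true" star radius (toy value)
xi = np.linspace(0.1, 60.0, 6000)
profile = np.array([J1(eps * s) for s in xi]) # radial profile of f-hat's zeros

# observed zero locations, found by sign changes
signs = np.sign(profile)
zeros = xi[:-1][signs[:-1] * signs[1:] < 0]

# the first positive zero of J_1 is j_{1,1} ~ 3.8317, so the first observed
# zero sits near j_{1,1}/eps, from which eps can be read off
assert abs(3.8317 / zeros[0] - eps) < 0.005
```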

4.2 Results in L²

A question we have asked ourselves previously is when a function f ∈ L¹([−π, π)) is equal to its own Fourier series. In other words, when is

\[ f(x) = \sum_{n=-\infty}^{\infty}\hat f(n)e^{inx}? \]

Recall first that f ∈ L²([−π, π)) means that $\int_{-\pi}^{\pi}|f(t)|^2\,dt < \infty$.

Proposition 4.2.1. If { e₁, e₂, ..., e_N } is an orthonormal set and a₁, ..., a_N ∈ C, then

\[ \bigg\|\sum_{n=1}^{N} a_n e_n\bigg\|^2 = \sum_{n=1}^{N}|a_n|^2. \]

Proof. That this is true is quite obvious: expanding the left-hand norm as an inner product, we get mostly mixed terms 〈e_n, e_m〉, which by orthogonality are zero, and the remaining 〈e_n, e_n〉 are 1 by normality.

Proposition 4.2.2. Let f be a function in an inner product space, and { e₁, e₂, ..., e_N } an orthonormal set. Then

\[ \bigg\|\sum_{n=1}^{N}\langle f, e_n\rangle e_n - f\bigg\|^2 = \|f\|^2 - \sum_{n=1}^{N}|\langle f, e_n\rangle|^2. \]

Proof. By computation:

\[
\bigg\|\sum_{n=1}^{N}\langle f, e_n\rangle e_n - f\bigg\|^2 = \bigg\langle\sum_{n=1}^{N}\langle f, e_n\rangle e_n - f,\;\sum_{n=1}^{N}\langle f, e_n\rangle e_n - f\bigg\rangle
\]
\[
= \bigg\langle\sum_{n=1}^{N}\langle f, e_n\rangle e_n,\;\sum_{n=1}^{N}\langle f, e_n\rangle e_n\bigg\rangle - \bigg\langle\sum_{n=1}^{N}\langle f, e_n\rangle e_n,\; f\bigg\rangle - \bigg\langle f,\;\sum_{n=1}^{N}\langle f, e_n\rangle e_n\bigg\rangle + \langle f, f\rangle
\]
\[
= \sum_{n=1}^{N}|\langle f, e_n\rangle|^2 - 2\sum_{n=1}^{N}|\langle f, e_n\rangle|^2 + \|f\|^2 = \|f\|^2 - \sum_{n=1}^{N}|\langle f, e_n\rangle|^2.
\]

Corollary 4.2.3 (Bessel's inequality). In the same setting,

\[ \sum_{n=1}^{N}|\langle f, e_n\rangle|^2 \le \|f\|^2. \]

Proof. This is immediate by the previous proposition, since norms are nonnegative.

Example 4.2.4. For any N,

\[ \sum_{n=-N}^{N}|\hat f(n)|^2 \le \|f\|_2^2, \]

whereby

\[ \sum_{n=-\infty}^{\infty}|\hat f(n)|^2 \le \|f\|_2^2. \]

N


    Lecture 5 Hilbert Spaces

    5.1 Fourier Series on L2

Recall that L²([−π, π)) is the set of all functions f satisfying

\[ \|f\|_2 = \bigg(\frac{1}{2\pi}\int_{-\pi}^{\pi}|f(t)|^2\,dt\bigg)^{1/2} < \infty. \]


Remark 5.1.6. We don't actually know yet that { e^{inx} } ⊂ L²([−π, π)) is complete, i.e. that 〈f, e^{inx}〉 = 0 for all n ∈ Z implies that f = 0. This is intuitively true: the scalar products somehow measure how much f oscillates at the given frequency, so if f is constant (i.e. doesn't oscillate at all), then all scalar products except for n = 0 would be 0, and finally for this last one to be 0 the constant would indeed have to be 0.

Proof. (ii) We know that

\[ \sum_\alpha|\langle f, e_\alpha\rangle|^2 \le \|f\|_2^2, \]

i.e. Bessel's inequality. Suppose there exists some ε > 0 such that

\[ \bigg\|f - \sum_{\alpha\in S} a_\alpha e_\alpha\bigg\| > \varepsilon \]

for every finite set S and a_α ∈ C. Let

\[ M = \overline{\bigg\{\sum_{\alpha\in S} a_\alpha e_\alpha \;\bigg|\; S \text{ finite and } a_\alpha\in\mathbb{C}\bigg\}}, \]

the closure of the set of finite linear combinations. An elementary fact about Hilbert spaces is that we can write f = f_M + f_{M^⊥}, where f_M ∈ M and 〈f_{M^⊥}, m〉 = 0 for all m ∈ M.

Since f ∉ M (f is a positive distance ε away from everything in the set before taking closure, so it remains outside afterward; just put a ball of radius ε/2 around it), we have f_{M^⊥} ≠ 0.

But 〈f_{M^⊥}, e_α〉 = 0 for every α since e_α ∈ M, so by completeness f_{M^⊥} = 0, which is a contradiction. Therefore we cannot bound the distance between f and its Fourier series from below by any positive value.

A simple computation shows that

\[ \bigg\|f - \sum_{\alpha\in S}\langle f, e_\alpha\rangle e_\alpha\bigg\| \le \bigg\|f - \sum_{\alpha\in S} a_\alpha e_\alpha\bigg\|, \]

i.e. this sort of quantity is minimised by the Fourier coefficients. Moreover from last lecture

\[ \bigg\|f - \sum_{\alpha\in S}\langle f, e_\alpha\rangle e_\alpha\bigg\|^2 = \|f\|^2 - \sum_{\alpha\in S}|\langle f, e_\alpha\rangle|^2, \]

and by (ii) we can make this arbitrarily small, and so we have (i). Furthermore

\[ \langle f, g\rangle - \bigg\langle\sum_{\alpha\in S}\langle f, e_\alpha\rangle e_\alpha,\; g\bigg\rangle = \langle f, g\rangle - \sum_{\alpha\in S}\langle f, e_\alpha\rangle\overline{\langle g, e_\alpha\rangle} = \langle f, g\rangle - \sum_{\alpha\in S}\hat f(\alpha)\overline{\hat g(\alpha)}, \]

but

\[ \bigg|\langle f, g\rangle - \bigg\langle\sum_{\alpha\in S}\langle f, e_\alpha\rangle e_\alpha,\; g\bigg\rangle\bigg| = \bigg|\bigg\langle f - \sum_{\alpha\in S}\langle f, e_\alpha\rangle e_\alpha,\; g\bigg\rangle\bigg|,
\]

which by the Cauchy-Schwarz inequality is bounded by

\[ \bigg\|f - \sum_{\alpha\in S}\langle f, e_\alpha\rangle e_\alpha\bigg\|\cdot\|g\|, \]

and this we can make arbitrarily small by (ii), and so (iii) follows.

Theorem 5.1.7 (Riesz-Fischer). (i) Given f ∈ H and { e_α } an orthonormal set indexed by α ∈ A, then { f̂(α) }_{α∈A} ∈ ℓ²(A).

(ii) Conversely, given a sequence { a_α } ∈ ℓ²(A), then

\[ \sum_{\alpha\in A} a_\alpha e_\alpha \]

defines an element of H.

Recall before we proceed that ℓ²(A) is the set of square-summable sequences indexed by A, where we mean that the supremum of all finite square sums is finite.

Example 5.1.8. For instance,

\[ \ell^2(\mathbb{N}) = \bigg\{\{a_n\}\;\bigg|\;\sum_{n=1}^{\infty}|a_n|^2 < \infty\bigg\}. \]

N

Proof (of (ii)). Since Σ_α |a_α|² < ∞, we can pick finite sets B₁ ⊆ B₂ ⊆ ... such that the tails satisfy Σ_{α∉B_m}|a_α|² ≤ 2^{−m}. Then for n > m, since we are operating on an orthonormal set,

\[ \bigg\|\sum_{\alpha\in B_n} a_\alpha e_\alpha - \sum_{\alpha\in B_m} a_\alpha e_\alpha\bigg\|^2 = \sum_{\alpha\in B_n\setminus B_m}|a_\alpha|^2 \le 2^{-m}, \]

which can be made arbitrarily small, whereby these sums indexed by n form a Cauchy sequence in H, and since H is a Hilbert space it must converge to an element of H.


Example 5.1.9. Consider a positive function h on [−1, 1] such that

\[ \int_{-1}^{1} h(t)\,dt = 1. \]

Suppose further that

\[ \int_{-1}^{1} t^n h(t)\,dt < \infty \]

for all n = 0, 1, 2, .... Then the set 1, x, x², ... isn't orthonormal (with respect to the inner product 〈f, g〉 = ∫_{−1}^{1} f(t)g(t)h(t) dt), but we can orthogonalise by Gram-Schmidt, i.e. let v₀ = 1,

\[ v_1 = x - \frac{\langle x, v_0\rangle}{\|v_0\|^2}v_0, \]

and

\[ v_2 = x^2 - \frac{\langle x^2, v_1\rangle}{\|v_1\|^2}v_1 - \frac{\langle x^2, v_0\rangle}{\|v_0\|^2}v_0, \]

and so on. Finally normalise by u_i = v_i/‖v_i‖, and we have an orthonormal set of functions (which will be complete for some choices of h).

For instance, h(x) = 1/2 yields the Legendre polynomials, h(x) = (1 − x)^α(1 + x)^β produces Jacobi polynomials, and a Gaussian h gives Hermite functions, and so on.

These special functions arise as solutions to some ordinary differential equations, as orthonormal bases for some L² spaces, or from certain recurrence relations. N
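The case h(x) = 1/2 can be checked directly: Gram-Schmidt on 1, x, x² should reproduce, up to normalisation, the Legendre polynomials P₁(x) = x and P₂(x) = (3x² − 1)/2. The discretisation below is our own sketch:

```python
import numpy as np

t = np.linspace(-1, 1, 20001)
dt = t[1] - t[0]

def inner(p, q):
    # trapezoidal rule for int_{-1}^{1} p(t) q(t) h(t) dt with h = 1/2
    vals = 0.5 * p * q
    return (vals.sum() - 0.5 * (vals[0] + vals[-1])) * dt

v0 = np.ones_like(t)
v1 = t - inner(t, v0) / inner(v0, v0) * v0
v2 = (t**2
      - inner(t**2, v1) / inner(v1, v1) * v1
      - inner(t**2, v0) / inner(v0, v0) * v0)

# v1 = x and v2 = x^2 - 1/3, proportional to P_1 and P_2
assert np.allclose(v1, t, atol=1e-6)
assert np.allclose(v2, t**2 - 1/3, atol=1e-6)
```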

    Lecture 6 More on Hilbert Spaces

    6.1 Haar Functions

Example 6.1.1. Consider the following set of functions on [0, 1], with the inner product

\[ \langle f, g\rangle = \int_0^1 f(t)\overline{g(t)}\,dt. \]

Let h₀(x) = 1, and let h(x) = χ_{[0,1/2)}(x) − χ_{[1/2,1)}(x). Now we define

\[ h_{j,k}(x) = 2^{j/2}h(2^j x - k) \]

for all j = 0, 1, 2, ..., where for each j we have k ∈ { 0, 1, 2, ..., 2^j − 1 }. It is then straightforward to verify that

\[ \|h_{j,k}\|^2 = \int_0^1|h_{j,k}(x)|^2\,dx = 1 \]

and that

\[ \langle h_{j,k}, h_{l,m}\rangle = \int_0^1 h_{j,k}(x)h_{l,m}(x)\,dx \]

is 0 if (j, k) ≠ (l, m). This is clear: if j ≠ l, then either their supports are disjoint, in which case we get 0, or they aren't, but then the coarser function is constant on the support of the finer one, which averages to 0 there. Similarly if j = l but k ≠ m, then the supports are disjoint, and again we get 0.

Hence the Haar functions form an orthonormal set. Moreover it is complete, and any function in L²[0, 1] can be written as a (possibly infinite) linear combination of Haar functions. N
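The orthonormality claims are easy to verify numerically on a dyadic grid, where the midpoint rule is exact for these piecewise constant functions (the indices below are our own examples):

```python
import numpy as np

n = 2**12
x = (np.arange(n) + 0.5) / n   # midpoints of a dyadic partition of [0, 1)

def h(y):
    # the mother function: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere
    return np.where(y < 0, 0.0,
           np.where(y < 0.5, 1.0,
           np.where(y < 1.0, -1.0, 0.0)))

def haar(j, k):
    return 2**(j / 2) * h(2**j * x - k)

def inner(f, g):
    return np.mean(f * g)      # midpoint rule on [0, 1]

assert abs(inner(haar(2, 1), haar(2, 1)) - 1) < 1e-12  # normalised
assert abs(inner(haar(2, 1), haar(2, 3))) < 1e-12      # disjoint supports
assert abs(inner(haar(1, 0), haar(3, 1))) < 1e-12      # nested supports
```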

    6.2 Fourier Transform on L2

    We will consider functions in L1(Rd) ∩ L2(Rd).

Lemma 6.2.1. Let f(x) = e^{−a|x|²}, with a > 0. Then

\[ \hat f(\xi) = \Big(\frac{\pi}{a}\Big)^{d/2}e^{-|\xi|^2/(4a)}. \]

In other words, the Fourier transform of a Gaussian is (up to some constant) again a Gaussian.

Proof. It is pretty much by straightforward—but long—computation:

\[
\hat f(\xi) = \int_{\mathbb{R}^d} e^{-ix\cdot\xi}e^{-a|x|^2}\,dx = \int_{\mathbb{R}^d} e^{-i(x_1\xi_1+\ldots+x_d\xi_d)-a(x_1^2+\ldots+x_d^2)}\,dx
= \int_{\mathbb{R}}\cdots\int_{\mathbb{R}} e^{-ix_1\xi_1-ax_1^2}\cdot\ldots\cdot e^{-ix_d\xi_d-ax_d^2}\,dx_1\ldots dx_d = \prod_{j=1}^{d}\int_{\mathbb{R}} e^{-ix_j\xi_j-ax_j^2}\,dx_j,
\]

meaning that it is sufficient to evaluate this integral in one variable. So we compute, by completing the square:

\[
\int_{\mathbb{R}} e^{-ix\xi-ax^2}\,dx = \int_{\mathbb{R}} e^{-a(x^2+(i\xi/a)x)}\,dx = e^{-\xi^2/(4a)}\int_{\mathbb{R}} e^{-a(x+i\xi/(2a))^2}\,dx.
\]

We evaluate this by means of a contour integral of g(z) = e^{−az²} around the curve C which is the rectangle with corners in N, N + iξ/(2a), −N + iξ/(2a), and −N. Then

\[
\oint_C g(z)\,dz = \int_{-N}^{N} e^{-ax^2}\,dx + i\int_0^{\xi/(2a)} e^{-a(N+is)^2}\,ds - \int_{-N}^{N} e^{-a(x+i\xi/(2a))^2}\,dx - i\int_0^{\xi/(2a)} e^{-a(-N+is)^2}\,ds.
\]

The integrals on the vertical parts both vanish as N → ∞, since they're products of a negative exponential and bounded terms. Moreover the whole thing is 0, since we're integrating an entire function around a simple, closed curve. Therefore

\[ \int_{\mathbb{R}} e^{-ax^2}\,dx = \int_{\mathbb{R}} e^{-a(x+i\xi/(2a))^2}\,dx, \]

and the left-hand side is well known to be √(π/a). Therefore

\[
\hat f(\xi) = \prod_{j=1}^{d}\int_{\mathbb{R}} e^{-ix_j\xi_j-ax_j^2}\,dx_j = \prod_{j=1}^{d}\Big(\frac{\pi}{a}\Big)^{1/2}e^{-\xi_j^2/(4a)} = \Big(\frac{\pi}{a}\Big)^{d/2}e^{-|\xi|^2/(4a)}.
\]
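A quadrature check of the lemma in one dimension (the grid parameters are our own): the transform of e^{−ax²} should be √(π/a)·e^{−ξ²/(4a)}.

```python
import numpy as np

a = 2.0
x = np.linspace(-15, 15, 300001)
dx = x[1] - x[0]
f = np.exp(-a * x**2)

def fhat(xi):
    # Riemann sum for int f(x) e^{-i xi x} dx; the Gaussian decays so fast
    # that truncation and discretisation errors are far below 1e-8
    return np.sum(f * np.exp(-1j * xi * x)) * dx

for xi in [0.0, 1.0, 3.0]:
    expected = np.sqrt(np.pi / a) * np.exp(-xi**2 / (4 * a))
    assert abs(fhat(xi) - expected) < 1e-8
```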

    Lecture 7 Inverse Fourier Transform

    7.1 Undoing Fourier Transforms

    It is helpful to know that we can ‘move the hat’ inside of integrals:

Lemma 7.1.1. If f, g ∈ L¹(R^d), then

\[ \int_{\mathbb{R}^d}\hat f(y)g(y)\,dy = \int_{\mathbb{R}^d} f(y)\hat g(y)\,dy. \]

Proof. We prove it by straightforward computation:

\[ \int_{\mathbb{R}^d}\hat f(y)g(y)\,dy = \int_{\mathbb{R}^d}\int_{\mathbb{R}^d} e^{-iy\cdot x}f(x)\,dx\;g(y)\,dy, \]

and by Fubini's theorem we can switch the order of integration, since the integrand is absolutely integrable: ∫∫|f(x)||g(y)| dx dy = ‖f‖₁‖g‖₁ < ∞. So

\[
\int_{\mathbb{R}^d}\int_{\mathbb{R}^d} e^{-ix\cdot y}f(x)g(y)\,dy\,dx = \int_{\mathbb{R}^d} f(x)\int_{\mathbb{R}^d} e^{-ix\cdot y}g(y)\,dy\,dx = \int_{\mathbb{R}^d} f(x)\hat g(x)\,dx.
\]

Definition 7.1.2 (Inverse Fourier transform). For f ∈ L¹(R^d) we define

\[ \check f(x) = \frac{1}{(2\pi)^d}\int_{\mathbb{R}^d} e^{ix\cdot t}f(t)\,dt, \]

the so-called inverse Fourier transform of f.

Theorem 7.1.3 (Fourier inversion theorem). If f, f̂ ∈ L¹(R^d) and f is bounded, then

\[ f(x) = \check{\hat f}(x) \]

almost everywhere.

Proof. It follows from computation and our two latest lemmas. Let φ(t) = e^{ix·t − ε²|t|²}, with x fixed. Then

\[
\hat\varphi(\xi) = \int_{\mathbb{R}^d} e^{ix\cdot t-\varepsilon^2|t|^2}e^{-i\xi\cdot t}\,dt = \int_{\mathbb{R}^d} e^{-i(\xi-x)\cdot t-\varepsilon^2|t|^2}\,dt.
\]

By the lemma from last lecture (with a = ε²) this is

\[ \Big(\frac{\sqrt\pi}{\varepsilon}\Big)^d e^{-|\xi-x|^2/(4\varepsilon^2)}. \]

Now for convenience write

\[ g(\xi) = \pi^{d/2}e^{-|\xi|^2/4}, \]

so that φ̂(ξ) = ε^{−d}g((x − ξ)/ε); note that ∫_{R^d} g(u) du = (2π)^d. Therefore, by Lemma 7.1.1,

\[
\frac{1}{(2\pi)^d}\int_{\mathbb{R}^d} f(y)\,\frac{1}{\varepsilon^d}g\Big(\frac{x-y}{\varepsilon}\Big)\,dy = \frac{1}{(2\pi)^d}\int_{\mathbb{R}^d} f(y)\hat\varphi(y)\,dy = \frac{1}{(2\pi)^d}\int_{\mathbb{R}^d}\hat f(y)\varphi(y)\,dy = \frac{1}{(2\pi)^d}\int_{\mathbb{R}^d}\hat f(y)e^{ix\cdot y-\varepsilon^2|y|^2}\,dy.
\]

Now let ε → 0. The right-hand side tends to \(\check{\hat f}(x)\), where we can pass the limit inside since the integrand is bounded by the integrable function |f̂|.

For the left-hand side, suppose f is continuous at x and consider

\[ \bigg|\frac{1}{(2\pi)^d}\int_{\mathbb{R}^d} f(y)\,\frac{1}{\varepsilon^d}g\Big(\frac{x-y}{\varepsilon}\Big)\,dy - f(x)\bigg|. \]

Since

\[ \frac{1}{(2\pi)^d}\int_{\mathbb{R}^d}\frac{1}{\varepsilon^d}g\Big(\frac{x-y}{\varepsilon}\Big)\,dy = 1, \]

we may write f(x) as an integral against the same kernel and combine the integrals, so the above is bounded by

\[ \frac{1}{(2\pi)^d}\int_{\mathbb{R}^d}|f(y)-f(x)|\,\frac{1}{\varepsilon^d}g\Big(\frac{x-y}{\varepsilon}\Big)\,dy, \]

and if we let u = (x − y)/ε this becomes

\[ \frac{1}{(2\pi)^d}\int_{\mathbb{R}^d}|f(x-\varepsilon u)-f(x)|\,g(u)\,du. \]

Letting ε tend to 0, this tends to 0 by dominated convergence (using that f is bounded). This shows f = \check{\hat f} at every point of continuity, and more generally that the left-hand side converges to f in L¹; but we want almost everywhere.

The L¹ convergence means that there exists a sequence ε_j tending to 0 along which the convergence holds almost everywhere, which yields the claim.

Theorem 7.1.4 (Plancherel's theorem). Suppose f, g ∈ L¹ ∩ L²(R^d). Then

(i) \[ \int_{\mathbb{R}^d} f(x)\overline{g(x)}\,dx = \frac{1}{(2\pi)^d}\int_{\mathbb{R}^d}\hat f(\xi)\overline{\hat g(\xi)}\,d\xi, \]

(ii) \[ \int_{\mathbb{R}^d}|f(x)|^2\,dx = \frac{1}{(2\pi)^d}\int_{\mathbb{R}^d}|\hat f(y)|^2\,dy. \]

Proof. We compute the first part:

\[
\int_{\mathbb{R}^d} f(x)\overline{g(x)}\,dx = \int_{\mathbb{R}^d}\check{\hat f}(x)\overline{g(x)}\,dx = \int_{\mathbb{R}^d}\frac{1}{(2\pi)^d}\int_{\mathbb{R}^d} e^{iy\cdot x}\hat f(y)\,dy\;\overline{g(x)}\,dx
= \frac{1}{(2\pi)^d}\int_{\mathbb{R}^d}\hat f(y)\,\overline{\int_{\mathbb{R}^d} e^{-ix\cdot y}g(x)\,dx}\,dy = \frac{1}{(2\pi)^d}\int_{\mathbb{R}^d}\hat f(y)\overline{\hat g(y)}\,dy.
\]

The second part follows immediately by letting g = f.
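A discrete analogue (our own illustration): NumPy's unnormalised DFT satisfies Σ|f|² = (1/N)Σ|f̂|², mirroring part (ii) with (2π)^d replaced by N.

```python
import numpy as np

rng = np.random.default_rng(2)
f = rng.standard_normal(128) + 1j * rng.standard_normal(128)
fhat = np.fft.fft(f)   # unnormalised DFT

# Parseval for the DFT: sum |f|^2 = (1/N) sum |fhat|^2
assert np.allclose(np.sum(np.abs(f)**2), np.sum(np.abs(fhat)**2) / len(f))
```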


Corollary 7.1.5. Let B(0, r) be the ball of radius r centred on 0 in R^d, and let f ∈ L¹ ∩ L²(R^d). Then

\[ \big(\chi_{B(0,r)}\hat f\big)^{\vee} \to f \]

in L² as r → ∞.

Compare this with how partial sums of Fourier series converge to the function in L².

One might ask similar questions for Lᵖ ∩ L². In one dimension this is true, but in dimensions 2 or greater it is not. This is due to Fefferman in the seventies, part of what earned him the Fields Medal.

Lecture 8 Fejér Kernels

8.1 Fejér Kernels and Approximate Identities

One of our fondest hopes in this course is that the Fourier series of a function converges, in some reasonable way, to the function itself. Another way of asking if this happens is to study the partial sums

\[ S_n f(x) = \sum_{k=-n}^{n}\hat f(k)e^{ikx} \]

and ask whether S_n f(x) → f(x) in some sense of convergence, be it in Lᵖ norm, almost everywhere, uniformly, and so on.

What Fejér showed is that if we define

\[ \sigma_n = \frac{S_0 + S_1 + \ldots + S_n}{n+1}, \]

i.e. the arithmetic average of the first n + 1 partial sums, then σ_n f → f in Lᵖ and almost everywhere.

In general, suppose { a_n }_{n=0}^∞ ⊂ R and a_n → L; then the averages σ_n = (a_0 + a_1 + ... + a_n)/(n + 1) → L as well. We say that a sequence for which σ_n converges is Cesàro summable.

    The converse is in general not true:

Example 8.1.1. Let a_n = (−1)^n. Clearly this does not converge to anything—it jumps between 1 and −1 indefinitely. However it does converge in the Cesàro sense, since σ₀ = 1, σ₁ = 0, σ₂ = 1/3, σ₃ = 0, σ₄ = 1/5, and so on, so σ_n → 0. N
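The example is easy to compute directly (our own sketch): the Cesàro means of a_n = (−1)^n tend to 0.

```python
import numpy as np

a = np.array([(-1)**n for n in range(10000)])
sigma = np.cumsum(a) / np.arange(1, len(a) + 1)  # sigma_n = (a_0+...+a_n)/(n+1)

assert sigma[0] == 1 and sigma[1] == 0
assert abs(sigma[2] - 1/3) < 1e-15
assert abs(sigma[-1]) < 1e-3   # sigma_n -> 0
```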

There are other curious ways to sum things:

Example 8.1.2. For a sequence a0, a1, …, let 0 < r < 1 and consider S(r) = a0 + a1r + a2r² + …. If

lim_{r→1⁻} S(r)

exists, we say that the sequence a0, a1, … is Abel summable. N

Let f ∈ L¹([−π, π)). Then

Snf(x) = Σ_{k=−n}^{n} f̂(k)e^{ikx} = Σ_{k=−n}^{n} ((1/(2π)) ∫_{−π}^{π} f(t)e^{−ikt} dt) e^{ikx}
= (1/(2π)) ∫_{−π}^{π} f(t) Σ_{k=−n}^{n} e^{ik(x−t)} dt.


We identify the inner sum

Dn(x) = Σ_{k=−n}^{n} e^{ikx},

called the Dirichlet kernel. Therefore

Skf(x) = (1/(2π)) ∫_{−π}^{π} f(t)Dk(x − t) dt

and

σnf(x) = (1/(2π)) ∫_{−π}^{π} f(t) (Σ_{k=0}^{n} Dk(x − t)/(n + 1)) dt.

Now we once more identify the inner sum as a new piece of notation; this time it will turn out to be very useful, after some algebra:

Kn(x) = (Σ_{k=0}^{n} Dk(x))/(n + 1)

is the so-called Fejér kernel, and using it we have

σnf(x) = (1/(2π)) ∫_{−π}^{π} f(t)Kn(x − t) dt.

We will now spend some time rewriting the Fejér kernel in a more practical way:

Kn(x) = (e^{i·0·x} + Σ_{k=−1}^{1} e^{ikx} + Σ_{k=−2}^{2} e^{ikx} + … + Σ_{k=−n}^{n} e^{ikx})/(n + 1),

and if we just count how many times each e^{ikx} appears for each k, we can clearly rewrite it as

Kn(x) = Σ_{l=−n}^{n} ((n + 1) − |l|)/(n + 1) e^{ilx}.

Now as an aside, note that

(Σ_{j=0}^{n} e^{i(j−n/2)x})² = Σ_{l=0}^{2n} (Σ_{j+k=l} 1) e^{i(l−n)x} = Σ_{l=0}^{2n} ((n + 1) − |l − n|) e^{i(l−n)x},

and therefore

Kn(x) = (1/(n + 1)) (Σ_{j=0}^{n} e^{i(j−n/2)x})² = (1/(n + 1)) (e^{−inx/2} Σ_{j=0}^{n} e^{ijx})²,

and the sum in the last step is geometric, so this is the same as

(1/(n + 1)) (e^{−inx/2} (1 − e^{i(n+1)x})/(1 − e^{ix}))²,

and by multiplying and dividing by e^{−ix/2} we get

(1/(n + 1)) ((e^{−i(n+1)x/2} − e^{i(n+1)x/2})/(e^{−ix/2} − e^{ix/2}))² = (1/(n + 1)) ((−2i sin((n + 1)x/2))/(−2i sin(x/2)))²,

and simplifying, this is just

Kn(x) = (1/(n + 1)) (sin((n + 1)x/2)/sin(x/2))².
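The algebra above is easy to sanity-check numerically; the sketch below (hypothetical helper names, not from the notes) compares the closed form against the defining average of Dirichlet kernels, and also checks that the mean of Kn over [−π, π) is 1.

```python
import cmath
import math

def dirichlet(n, x):
    # D_n(x) = sum_{k=-n}^{n} e^{ikx} (real-valued by symmetry)
    return sum(cmath.exp(1j * k * x) for k in range(-n, n + 1)).real

def fejer_avg(n, x):
    # K_n as the average (D_0 + ... + D_n)/(n + 1)
    return sum(dirichlet(k, x) for k in range(n + 1)) / (n + 1)

def fejer_closed(n, x):
    # the closed form derived above
    return (math.sin((n + 1) * x / 2) / math.sin(x / 2)) ** 2 / (n + 1)

n = 7
max_err = max(abs(fejer_avg(n, x) - fejer_closed(n, x))
              for x in (0.3, 1.1, 2.5, -0.7))

# (1/2pi) * int K_n over [-pi, pi): an equally spaced Riemann sum with more
# than 2n + 1 nodes is exact for a trigonometric polynomial of degree n
N = 64
mass = sum(fejer_avg(n, -math.pi + 2 * math.pi * j / N) for j in range(N)) / N
print(max_err, mass)
```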


Lemma 8.1.3 (Properties of the Fejér kernel). (i) Kn(x) ≥ 0 for all x ∈ [−π, π).

(ii) Fix a δ > 0. Then Kn(x) → 0 as n → ∞ uniformly on [−π, π) \ (−δ, δ).

(iii) (1/(2π)) ∫_{−π}^{π} Kn(x) dx = 1.

Proof. (i) is quite clear, since we have the square of a real number.

For (ii), note that the sine in the numerator is bounded by 1, while in the denominator |x| > δ gives sin(x/2)² ≥ sin(δ/2)², so

Kn(x) = (1/(n + 1)) (sin((n + 1)x/2)/sin(x/2))² ≤ (1/(n + 1)) (1/sin(δ/2)²),

which goes to 0 uniformly as n → ∞, since the estimate is independent of x so long as |x| > δ.

(iii) We just compute, recalling one of the earlier forms of the Fejér kernel:

(1/(2π)) ∫_{−π}^{π} Kn(x) dx = (1/(2π)) ∫_{−π}^{π} Σ_{k=−n}^{n} ((n + 1 − |k|)/(n + 1)) e^{ikx} dx
= Σ_{k=−n}^{n} (1/(2π)) ((n + 1 − |k|)/(n + 1)) ∫_{−π}^{π} e^{ikx} dx,

and this last integral is 0 unless k = 0, and so the whole thing is equal to

(1/(2π)) ((n + 1)/(n + 1)) 2π = 1.

Remark 8.1.4. Any sequence of functions satisfying (i)–(iii) is called an approximate identity.

Theorem 8.1.5. (i) If f ∈ Lᵖ([−π, π)), 1 ≤ p < ∞, then σnf → f in the Lᵖ([−π, π)) norm, i.e.

‖σnf − f‖p = ((1/(2π)) ∫_{−π}^{π} |σnf(x) − f(x)|ᵖ dx)^{1/p} → 0

as n → ∞.

    (ii) If f ∈ L1, and f is continuous at x, then σnf(x)→ f(x).

    (iii) If f ∈ L1([−π, π)) then σnf(x)→ f(x) almost everywhere.

Remark 8.1.6. The proof of this relies only on the fact that Kn(x) is an approximate identity, and on no other special properties of Kn(x). Therefore any other approximate identity has the same properties.

Remark 8.1.7. The answer to our original question, actual convergence of the partial sums rather than of their means, has been the subject of much study. Carleson proved that Snf(x) → f(x) almost everywhere for f ∈ L², and Hunt later proved the same in Lᵖ for 1 < p < ∞.


Lecture 9 Convergence of Cesàro Means

    9.1 Convergence of Fourier Sums

    We prove the theorem stated at the end of last lecture.

Theorem 9.1.1. (i) If f ∈ Lᵖ([−π, π)), 1 ≤ p < ∞, then σnf → f in the Lᵖ([−π, π)) norm, i.e.

‖σnf − f‖p = ((1/(2π)) ∫_{−π}^{π} |σnf(x) − f(x)|ᵖ dx)^{1/p} → 0

as n → ∞.

    (ii) If f ∈ L1, and f is continuous at x, then σnf(x)→ f(x).

    (iii) If f ∈ L1([−π, π)) then σnf(x)→ f(x) almost everywhere.

Proof. We start by proving (ii). Fix an x. Then

σnf(x) − f(x) = (1/(2π)) ∫_{−π}^{π} f(t)Kn(x − t) dt − f(x)
= (1/(2π)) ∫_{−π}^{π} f(x − t)Kn(t) dt − (1/(2π)) ∫_{−π}^{π} f(x)Kn(t) dt
= (1/(2π)) ∫_{−π}^{π} (f(x − t) − f(x))Kn(t) dt.

Let ε > 0. By continuity there exists a δ > 0 such that if |t| < δ then |f(x − t) − f(x)| < ε.

By the second property of approximate identities, we can choose n large enough that Kn(t) < ε/(‖f‖₁ + |f(x)|) for all t ∈ [−π, π) \ (−δ, δ). Then

|σnf(x) − f(x)| ≤ (1/(2π)) ∫_{−π}^{π} |f(x − t) − f(x)|Kn(t) dt
= (1/(2π)) ∫_{−δ}^{δ} |f(x − t) − f(x)|Kn(t) dt + (1/(2π)) ∫_{[−π,π)\(−δ,δ)} |f(x − t) − f(x)|Kn(t) dt
≤ ε · 1 + ε(‖f‖₁ + |f(x)|)/(‖f‖₁ + |f(x)|) = 2ε.

For (i), we compute and cleverly use Jensen's inequality at one point:

(1/(2π)) ∫_{−π}^{π} |σnf(x) − f(x)|ᵖ dx = (1/(2π)) ∫_{−π}^{π} |(1/(2π)) ∫_{−π}^{π} (f(x − t) − f(x))Kn(t) dt|ᵖ dx
≤ (1/(2π)) ∫_{−π}^{π} (1/(2π)) ∫_{−π}^{π} |f(x − t) − f(x)|ᵖ Kn(t) dt dx.

The inequality is Jensen's inequality, applied to the measure dµ = Kn(t)/(2π) dt, which has total mass 1, together with the convexity of |·|ᵖ. Swapping the order of integration, the last expression equals

(1/(2π)) ∫_{−π}^{π} h(t)Kn(t) dt, where h(t) = (1/(2π)) ∫_{−π}^{π} |f(x − t) − f(x)|ᵖ dx.

We therefore have

(1/(2π)) ∫_{−π}^{π} h(t)Kn(t) dt = (1/(2π)) ∫_{−π}^{π} h(0 − t)Kn(t) dt,

but this goes to h(0) = 0 as n → ∞, by (ii).

(iii) We have

|σnf(x)| = (1/(2π)) |∫_{−π}^{π} Kn(t)f(x − t) dt| ≤ (1/(2π)) ∫_{−π}^{π} Kn(t)|f(x − t)| dt
= (1/(2π)) ∫_{−π}^{π} (∫_{0}^{Kn(t)} dr) |f(x − t)| dt
= (1/(2π)) ∫_{−π}^{π} ∫_{0}^{∞} χ_{[0,Kn(t))}(r) dr |f(x − t)| dt
= ∫_{0}^{∞} (1/(2π)) ∫_{−π}^{π} χ_{[0,Kn(t))}(r)|f(x − t)| dt dr.

Now χ_{[0,Kn(t))}(r) is the same as χ_{(r,∞)}(Kn(t)), meaning that we have

∫_{0}^{∞} (1/(2π)) ∫_{−π}^{π} χ_{(r,∞)}(Kn(t))|f(x − t)| dt dr.

Letting Ir be the subset of [−π, π) where that characteristic function is 1, we have

∫_{0}^{∞} (|Ir|/(2π)) (1/|Ir|) ∫_{Ir} |f(x − t)| dt dr.

The inner average is bounded by the maximal average of f, defined by

Mf(x) = sup_{r>0} (1/|Br(x)|) ∫_{Br(x)} |f(t)| dt,

which by a theorem of Hardy and Littlewood is in Lᵖ if f is. So our expression is bounded by

Mf(x) ∫_{0}^{∞} (|Ir|/(2π)) dr = Mf(x) (1/(2π)) ∫_{−π}^{π} Kn(t) dt = Mf(x).

We have thus shown |σnf(x)| ≤ Mf(x). Now let

Tf(x) = lim sup_{n→∞} |σnf(x) − f(x)|.

If we can show that Tf(x) = 0 almost everywhere, we are done. Let N ∈ ℕ, and choose a continuous function g such that

‖f − g‖₁ = (1/(2π)) ∫_{−π}^{π} |f(t) − g(t)| dt < 1/N.


In fact (ii) essentially shows that σng → g uniformly. Then

|σnf(x) − f(x)| = |σn(f − g)(x) + σng(x) − (f − g)(x) − g(x)|
≤ |σn(f − g)(x)| + |σng(x) − g(x)| + |(f − g)(x)|
≤ M(f − g)(x) + |σng(x) − g(x)| + |(f − g)(x)|.

Taking lim sup, we get Tf(x) ≤ M(f − g)(x) + |f(x) − g(x)|, and

{x | Tf(x) > ε} ⊂ {x | M(f − g)(x) > ε/2} ∪ {x | |f(x) − g(x)| > ε/2},

and taking measures of these sets (using the weak (1, 1) bound for M and Chebyshev's inequality) we have

|{x | Tf(x) > ε}| ≤ (C/N)/(ε/2) + (1/N)/(ε/2).

Now let N → ∞; then |{x | Tf(x) > ε}| = 0. Take a countable sequence of εn going to 0; then |{x | Tf(x) > 0}| = 0, and we're done.

Lecture 10 Toward Convergence of Partial Sums

    10.1 Dirichlet Kernels

Recall that

Snf(x) = Σ_{k=−n}^{n} f̂(k)e^{ikx} = Σ_{k=−n}^{n} ((1/(2π)) ∫_{−π}^{π} f(t)e^{−ikt} dt) e^{ikx}
= (1/(2π)) ∫_{−π}^{π} f(t) Σ_{k=−n}^{n} e^{ik(x−t)} dt = (1/(2π)) ∫_{−π}^{π} f(t)Dn(x − t) dt,

where

Dn(s) = Σ_{k=−n}^{n} e^{iks}

is the Dirichlet kernel. We can rewrite

e^{is/2}Dn(s) − e^{−is/2}Dn(s) = Σ_{k=−n}^{n} e^{i(k+1/2)s} − Σ_{k=−n}^{n} e^{i(k−1/2)s} = e^{i(n+1/2)s} − e^{−i(n+1/2)s},

since the two sums telescope. Therefore

Dn(s) = (e^{i(n+1/2)s} − e^{−i(n+1/2)s})/(e^{is/2} − e^{−is/2}) = (2i sin((n + 1/2)s))/(2i sin(s/2)) = sin((n + 1/2)s)/sin(s/2).

So in all

Snf(x) = (1/(2π)) ∫_{−π}^{π} f(t) (sin((n + 1/2)(x − t))/sin((x − t)/2)) dt.


Remark 10.1.1. Note that the Dirichlet kernel Dn is not an approximate identity; certainly it changes sign, and also

(1/(2π)) ∫_{−π}^{π} |Dn(s)| ds = (1/(2π)) ∫_{−π}^{π} |sin((n + 1/2)s)/sin(s/2)| ds ≥ (1/(2π)) ∫_{−π}^{π} 2|sin((n + 1/2)s)|/|s| ds
= (1/π) ∫_{−(n+1/2)π}^{(n+1/2)π} |sin u|/|u| du = (2/π) ∫_{0}^{(n+1/2)π} (|sin u|/u) du
≥ (2/π) Σ_{k=1}^{n−1} (1/((k + 1)π)) ∫_{kπ}^{(k+1)π} |sin u| du = (4/π²) Σ_{k=1}^{n−1} 1/(k + 1) ≈ (4/π²) log n,

using |sin(s/2)| ≤ |s|/2 in the first inequality and the substitution u = (n + 1/2)s in the next step. This means that the L¹ norm of the Dirichlet kernel diverges as n → ∞.

There are now two principal things we wish to discuss. First: convergence of the partial sums for continuous functions.
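The logarithmic growth is easy to observe numerically; a sketch (midpoint-rule integration, helper name made up for illustration):

```python
import math

# (1/2pi) * int_{-pi}^{pi} |D_n(s)| ds, with D_n(s) = sin((n + 1/2)s)/sin(s/2).
def dirichlet_l1(n, samples=100000):
    h = 2 * math.pi / samples
    total = 0.0
    for j in range(samples):
        s = -math.pi + (j + 0.5) * h          # midpoints avoid s = 0
        total += abs(math.sin((n + 0.5) * s) / math.sin(s / 2)) * h
    return total / (2 * math.pi)

norms = {n: dirichlet_l1(n) for n in (8, 16, 32, 64)}
print(norms)  # each doubling of n adds roughly (4/pi^2) * log 2 =~ 0.28
```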

    10.2 Convergence for Continuous Functions

For f ∈ C([−π, π)), set

Tnf = Snf(0) = Σ_{k=−n}^{n} e^{ik·0} f̂(k) = Σ_{k=−n}^{n} f̂(k).

Note that

|f̂(k)| = |(1/(2π)) ∫_{−π}^{π} f(t)e^{−ikt} dt| ≤ ‖f‖∞

if f is bounded. Then also

|Tnf| ≤ (2n + 1)‖f‖∞.

In other words every Tn is a bounded linear functional from C([−π, π)) to ℂ; however, the bound grows with n.

We'll now do something clever: for a fixed n, construct the function g such that

g(t) = 1 if Dn(t) ≥ 0, and g(t) = −1 if Dn(t) < 0.

In other words g is a bunch of line segments at y = 1 and y = −1, jumping between the two. Certainly g is discontinuous at these jumps, but we can approximate it to any desired accuracy by gj ∈ C([−π, π)) such that ‖gj‖∞ ≤ 1 and gj(t) → g(t) pointwise, where we connect the line segments with steeper and steeper lines for each j.

Then

lim_{j→∞} Tngj = lim_{j→∞} Sngj(0) = lim_{j→∞} (1/(2π)) ∫_{−π}^{π} gj(t)Dn(0 − t) dt = (1/(2π)) ∫_{−π}^{π} |Dn(t)| dt ≈ c log n

by dominated convergence, while ‖gj‖∞ ≤ 1 for every j. Therefore

‖Tn‖ = sup_{g ∈ C([−π,π)), g ≠ 0} |Tng|/‖g‖∞ ≥ c log n.

Therefore we have a Banach space carrying a family of bounded linear functionals Tn whose norms blow up. Recall the following from functional analysis:

Theorem 10.2.1 (Principle of uniform boundedness). Suppose X is a Banach space and Tα, α ∈ Λ, is a family of bounded linear functionals on X, i.e. for all α

‖Tα‖ = sup_{x ∈ X, x ≠ 0} |Tαx|/‖x‖ < ∞.

If sup_{α∈Λ} |Tαx| < ∞ for every x ∈ X, then sup_{α∈Λ} ‖Tα‖ < ∞.


In order to study this, we will make use of the conjugate series of f, which is defined as

f̃ ∼ Σ_{k=−∞}^{∞} −i sgn(k) f̂(k)e^{ikx}.

This turns out to be an interesting construction:

Theorem 11.1.2 (M. Riesz). Given f ∈ Lᵖ([−π, π)), 1 < p < ∞, f̃ defines a unique function in Lᵖ, i.e. there exists a unique function f̃ ∈ Lᵖ([−π, π)) whose Fourier coefficients are −i sgn(k)f̂(k) for every k.

Furthermore there exists a constant Cp such that ‖f̃‖p ≤ Cp‖f‖p.

Remark 11.1.3. Given a harmonic function u on the unit disc, and assuming u is somewhat well-behaved, it has boundary values on the unit circle. It turns out then that

lim_{r→1⁻} u(re^{iθ}) = f(θ),

and with v being the harmonic conjugate of u we have v(θ) = f̃(θ) for the corresponding boundary values. That is to say, this conjugate series does not come from nowhere!

For f ∈ Lᵖ([−π, π)), define P−f = (f − if̃)/2 and P+f = (f + if̃)/2. Then

‖P−f‖p ≤ ‖f‖p/2 + ‖f̃‖p/2 ≤ ‖f‖p/2 + Cp‖f‖p/2 = ((1 + Cp)/2)‖f‖p,

so P− : Lᵖ([−π, π)) → Lᵖ([−π, π)) is a bounded operator, since

‖P−‖ = sup_{f ∈ Lᵖ, f ≠ 0} ‖P−f‖p/‖f‖p ≤ (1 + Cp)/2.

In the same way P+ is a bounded operator. We call these P− and P+ because they are projections (here we use the convention sgn(0) = 1, so that the k = 0 term carries through):

P+f ∼ Σ_{k=−∞}^{∞} (f̂(k)/2) e^{ikx} + i Σ_{k=−∞}^{∞} (−i sgn(k)f̂(k)/2) e^{ikx}
= Σ_{k=−∞}^{∞} ((f̂(k) + sgn(k)f̂(k))/2) e^{ikx} = Σ_{k=0}^{∞} f̂(k)e^{ikx}.

If we now apply P+ again,

P+ ∘ P+(f) = Σ_{k=0}^{∞} (f̂(k)/2) e^{ikx} + i Σ_{k=0}^{∞} (−i sgn(k)f̂(k)/2) e^{ikx} = Σ_{k=0}^{∞} f̂(k)e^{ikx} = P+f.

In other words, P+ sets negative Fourier coefficients to 0, and P− sets nonnegative ones to 0.
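A finite-dimensional sketch of these projections (code not from the lecture; names are made up): a trigonometric polynomial is stored as a dict of Fourier coefficients k ↦ c_k, and we apply the formulas P±f = (f ± if̃)/2 with the convention sgn(0) = 1.

```python
def sgn(k):
    # convention sgn(0) = 1, matching the computation above
    return 1 if k >= 0 else -1

def conjugate_series(c):
    # Fourier coefficients of f~: -i * sgn(k) * c_k
    return {k: -1j * sgn(k) * v for k, v in c.items()}

def p_plus(c):
    ct = conjugate_series(c)
    return {k: (v + 1j * ct[k]) / 2 for k, v in c.items()}

def p_minus(c):
    ct = conjugate_series(c)
    return {k: (v - 1j * ct[k]) / 2 for k, v in c.items()}

f = {-2: 1 + 2j, -1: 3.0, 0: 4.0, 1: -1j, 2: 0.5}
pf = p_plus(f)     # keeps only k >= 0
ppf = p_plus(pf)   # applying P+ twice changes nothing
pm = p_minus(f)    # keeps only k < 0
print(pf, pm)
```

P+ and P− here really are complementary projections: P+f + P−f recovers f coefficient by coefficient.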

The reason this is interesting is this: Snf is also the Fourier series of f with a bunch of coefficients set to 0;

Snf(x) = Σ_{k=−n}^{n} f̂(k)e^{ikx} = e^{ix(n+1)} Σ_{k=−n}^{n} f̂(k)e^{ix(k−n−1)}
= e^{ix(n+1)} P−(Σ_{k=−n}^{∞} f̂(k)e^{ix(k−n−1)})
= e^{ix(n+1)} P−(e^{−ix(2n+1)} Σ_{k=−n}^{∞} f̂(k)e^{ix(k+n)})
= e^{ix(n+1)} P−(e^{−ix(2n+1)} P+(Σ_{k=−∞}^{∞} f̂(k)e^{ix(k+n)}))
= e^{ix(n+1)} P−(e^{−ix(2n+1)} P+(e^{inx} Σ_{k=−∞}^{∞} f̂(k)e^{ikx})),

which means that ‖Snf‖p ≤ Cp²‖f‖p, since multiplication by a unimodular exponential does not change the Lᵖ norm.

Now we're ready to prove that Snf → f in Lᵖ:

Proof. Let ε > 0. Pick a trigonometric polynomial q (a Fejér mean σmf for large m, for instance) with ‖f − q‖p < ε. Then

‖Snf − f‖p ≤ ‖Snf − Snq‖p + ‖Snq − q‖p + ‖q − f‖p
= ‖Sn(f − q)‖p + ‖Snq − q‖p + ‖q − f‖p
≤ Cp²‖f − q‖p + ‖Snq − q‖p + ‖q − f‖p,

which is bounded by Cp²ε + ε, since the middle term is 0 once n exceeds the degree of q.

    11.2 Almost Everywhere Convergence

Kolmogorov proved in 1925 that there exists a function f ∈ L¹([−π, π)) such that Snf(x) diverges almost everywhere. He later showed that in fact there exists such an f whose Fourier series diverges at every x, not just on a set of positive measure.

    Carleson proved in 1966 that if you instead work in L2, then Snf(x)→ f(x)almost everywhere.

Hunt improved this in 1967 to work for f ∈ Lᵖ for 1 < p < ∞.


Proof. Set

Tf(x) = lim sup_{n→∞} |Snf(x) − f(x)|.

We want Tf(x) = 0 almost everywhere. Let N ∈ ℕ, and choose a trigonometric polynomial q such that ‖f − q‖p < 1/N. Then

|Snf − f| ≤ |Snf − Snq| + |Snq − q| + |q − f|.

Note that Snq → q at every x, since they are equal for large n. Taking lim sup we get

Tf ≤ M(f − q) + |f − q|,

and so

|{x | Tf(x) > ε}| ≤ |{x | M(f − q)(x) > ε/2}| + |{x | |f − q|(x) > ε/2}|,

but by the lemma ‖M(f − q)‖p ≤ Cp‖f − q‖p. By Chebyshev,

|{x | Tf(x) > ε}| ≤ ‖M(f − q)‖pᵖ/(ε/2)ᵖ + ‖f − q‖pᵖ/(ε/2)ᵖ ≤ (2ᵖ/εᵖ)(Cpᵖ + 1)‖f − q‖pᵖ ≤ (2ᵖ/εᵖ)(Cpᵖ + 1)/Nᵖ,

which goes to 0 as N → ∞.

    Lecture 12 Maximal Functions

    12.1 Hardy-Littlewood Maximal Functions

Definition 12.1.1. Suppose f is a Lebesgue measurable function on ℝᵈ. We say that f ∈ L¹loc(ℝᵈ) if

∫_B |f(x)| dx < ∞

for every ball B ⊂ ℝᵈ.


• Suppose f ∈ Lᵖ(ℝᵈ), and let B ⊂ ℝᵈ be a ball. Then, with q the conjugate exponent of p,

∫_B |f(y)| dy = ∫_{ℝᵈ} χB(y)|f(y)| dy ≤ (∫_{ℝᵈ} |χB(y)|^q dy)^{1/q} (∫_{ℝᵈ} |f(y)|ᵖ dy)^{1/p} = |B|^{1/q}‖f‖p.

In other words Lᵖ ⊂ L¹loc, as hinted at above.

• If µ is a positive Borel measure, then we can define the maximal function of a measure analogously:

Mµ(x) = sup_{r>0} µ(B(x, r))/|B(x, r)| = sup_{r>0} (1/|B(x, r)|) ∫_{B(x,r)} dµ.

This is a generalisation of the previous definition, since dµ = |f(y)| dy defines a measure.

Proposition 12.1.4. If µ is a Borel measure, then Mµ is a Borel measurable function.

Proof. Let λ > 0, and let Eλ = {x | Mµ(x) > λ}. Take x ∈ Eλ. Then there exists some r0 > 0 such that

µ(B(x, r0))/|B(x, r0)| = t > λ.

Choose δ such that (r0 + δ)ᵈ < r0ᵈ t/λ, which is possible since t/λ > 1. Suppose y ∈ B(x, δ). Then B(y, r0 + δ) ⊃ B(x, r0); this follows directly from the triangle inequality, for if z ∈ B(x, r0), then

d(z, y) ≤ d(x, y) + d(x, z) < δ + r0.

Therefore, since (r0 + δ)ᵈ/r0ᵈ is the ratio of the volumes of the two balls,

µ(B(y, r0 + δ)) ≥ µ(B(x, r0)) = t|B(x, r0)| > λ ((r0 + δ)ᵈ/r0ᵈ) |B(x, r0)| = λ|B(y, r0 + δ)|.

This means that Mµ(y) > λ for every y ∈ B(x, δ), so Eλ is open, making it measurable, which in turn makes Mµ a measurable function.

Remark 12.1.5. One can also define Mµ and Mf using cubes instead of balls, say Q(x, r) = {y ∈ ℝᵈ | |xi − yi| < r, i = 1, 2, …, d}. Then

MQµ(x) = sup_{r>0} µ(Q(x, r))/|Q(x, r)|,

using which

µ(B(x, r))/|B(x, r)| ≤ (µ(Q(x, r))/|Q(x, r)|) (|Q(x, r)|/|B(x, r)|) = (µ(Q(x, r))/|Q(x, r)|) (2ᵈ/cd),

where cd = π^{d/2}/Γ(d/2 + 1) is the volume of the d-dimensional unit ball. In other words these different ways of defining the maximal function are the same up to multiplication by some constant depending on the dimension.


Remark 12.1.6. One can also define it in terms of surface integrals over the boundaries of balls, but this is less well-understood.

Theorem 12.1.7 (Hardy-Littlewood, 1930). If µ is a positive Borel measure on ℝᵈ, then for every λ > 0,

|{x ∈ ℝᵈ | Mµ(x) > λ}| ≤ (3ᵈ/λ) µ(ℝᵈ).

Remark 12.1.8. If f ∈ L¹(ℝᵈ), then

µ(E) = ∫_E |f(y)| dy

defines a positive Borel measure with Mµ = Mf, meaning that

|{x ∈ ℝᵈ | Mf(x) > λ}| ≤ (3ᵈ/λ) ∫_{ℝᵈ} |f(y)| dy = (3ᵈ/λ)‖f‖₁.

Remark 12.1.9. Let δ0 be the Dirac measure at 0, i.e. δ0(E) = 1 if 0 ∈ E, and 0 otherwise. In ℝ¹ we then have

δ0(B(x, |x| + ε))/|B(x, |x| + ε)| = 1/(2(|x| + ε)),

meaning that Mδ0(x) ≥ 1/(2|x|). Rearranging we therefore have

{x | Mδ0(x) > λ} ⊃ {x | |x| < 1/(2λ)},

which if we measure the sets yields

|{x | Mδ0(x) > λ}| ≥ |{x | |x| < 1/(2λ)}| = 1/λ.

This serves to demonstrate that the bound in Hardy-Littlewood's theorem is about as good as it gets.

Remark 12.1.10. Suppose f ∈ L¹(ℝᵈ). Then the theorem says

|{x ∈ ℝᵈ | Mf(x) > λ}| ≤ (3ᵈ/λ)‖f‖₁.

Suppose it were the case that we knew

∫_{ℝᵈ} Mf(x) dx ≤ C ∫_{ℝᵈ} |f(x)| dx.

Then

λ|{x | Mf(x) > λ}| ≤ ∫_{{x | Mf(x)>λ}} Mf(x) dx ≤ ∫_{ℝᵈ} Mf(x) dx ≤ C ∫_{ℝᵈ} |f(x)| dx.

It turns out, however, that what we assumed above is never true: Mf is not integrable unless f = 0 almost everywhere.


We will state and prove a lemma that takes us most of the way toward the Hardy-Littlewood theorem:

Lemma 12.1.11 (Wiener's covering lemma). Suppose W is a set in ℝᵈ, and that

W ⊂ ⋃_{i=1}^{N} B(xi, ri),

i.e. W can be covered by a finite set of balls. Then there exists a set of indices S ⊂ {1, 2, …, N} such that

(i) the balls {B(xi, ri) | i ∈ S} are disjoint,

(ii) W ⊂ ⋃_{i∈S} B(xi, 3ri), and

(iii) |W| ≤ 3ᵈ Σ_{i∈S} |B(xi, ri)|.

This is a so-called covering lemma, since it tells us about covers. In particular it tells us that if we can cover a set by a finite set of balls, then we can pick a disjoint subcollection of those balls that still covers the set once we blow each remaining ball up to thrice its original radius.

Proof. (ii) implies (iii) quite trivially:

|W| ≤ |⋃_{i∈S} B(xi, 3ri)| ≤ Σ_{i∈S} |B(xi, 3ri)| = 3ᵈ Σ_{i∈S} |B(xi, ri)|.

By reordering, we may assume r1 ≥ r2 ≥ … ≥ rN. Let i1 = 1, and consider the biggest ball B(x1, r1) = B(xi1, ri1). Discard all balls that intersect this one (the idea being that the three-fold enlargement of this ball envelopes any ball that intersects it, so we don't need them anyway).

If no balls remain, we simply stop. Otherwise, let B(xi2, ri2) be the largest remaining ball in the list, and throw away any ball that intersects it. If no ball remains, stop; otherwise continue in the same fashion.

This process must eventually terminate, since we started with a finite set of balls. Thus S = {i1, i2, …, iℓ}. Part (i), the balls being disjoint, is clear by construction. To get (ii) we consider some ball B(xj, rj) in the original list. If B(xj, rj) is in the new list, we're good to go, since trivially

B(xj, rj) ⊂ B(xj, 3rj) ⊂ ⋃_{i∈S} B(xi, 3ri).

If B(xj, rj) was discarded, it is because it intersected some B(xik, rik) with rik ≥ rj. Then the discarded ball is contained in the three-fold enlargement of B(xik, rik), since if we take a point z in the intersection and a point y in the discarded ball,

d(y, xik) ≤ d(y, xj) + d(xj, z) + d(z, xik) ≤ rj + rj + rik ≤ 3rik.
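The greedy selection in this proof can be sketched in code. Below is a one-dimensional version (balls are intervals, given as (center, radius) pairs; the helper name and example data are made up), together with checks of the disjointness and 3r-cover claims:

```python
def wiener_select(balls):
    """Greedy selection from the proof: repeatedly keep the largest remaining
    ball and discard every ball intersecting it."""
    remaining = sorted(balls, key=lambda b: -b[1])
    chosen = []
    while remaining:
        c, r = remaining.pop(0)
        chosen.append((c, r))
        # keep only balls disjoint from the one just chosen
        remaining = [(c2, r2) for c2, r2 in remaining
                     if abs(c2 - c) >= r2 + r]
    return chosen

balls = [(0.0, 1.0), (0.5, 0.4), (2.1, 0.3), (1.8, 0.25), (4.0, 0.5), (3.6, 0.2)]
chosen = wiener_select(balls)

# (i): the chosen balls are pairwise disjoint
disjoint = all(abs(c1 - c2) >= r1 + r2
               for i, (c1, r1) in enumerate(chosen)
               for (c2, r2) in chosen[i + 1:])

# (ii): every original ball lies inside the 3-fold enlargement of a chosen one
covered = all(any(abs(c - cc) + r <= 3 * rr for cc, rr in chosen)
              for c, r in balls)
print(chosen, disjoint, covered)
```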


    Lecture 13 More on Maximal Functions

    13.1 Proof of Hardy-Littlewood’s Theorem

    We wish to prove the theorem stated last lecture. The proof is almost done,given Wiener’s covering lemma that we proved at the end of last lecture.

Theorem 13.1.1 (Hardy-Littlewood, 1930). If µ is a positive Borel measure on ℝᵈ, then for every λ > 0,

|{x ∈ ℝᵈ | Mµ(x) > λ}| ≤ (3ᵈ/λ) µ(ℝᵈ).

Proof. Fix λ > 0. Let K ⊂ {x ∈ ℝᵈ | Mµ(x) > λ} be any compact subset. Then for each x ∈ K, choose a ball B(x, rx) such that

µ(B(x, rx))/|B(x, rx)| > λ,

which is possible since the supremum of these quantities exceeds λ, and so at least one ball must as well. Now K is compact, so some B(x1, r1), …, B(xN, rN) form a finite subcover of K.

If we now use Wiener's covering lemma on this finite subcover, we get a further reduction to {B(xi, ri)}_{i∈S} for some S ⊂ {1, 2, …, N}. Then

|K| ≤ 3ᵈ Σ_{i∈S} |B(xi, ri)| ≤ 3ᵈ Σ_{i∈S} µ(B(xi, ri))/λ ≤ (3ᵈ/λ) µ(ℝᵈ),

since the last sum is over a collection of disjoint balls, so the sum of the measures is the measure of the union, and the measure of that union is naturally bounded by the measure of the entire space.

    Now by inner regularity,

    |{x ∈ Rd |Mµ(x) > λ }| = sup{ |K| |K ⊂ {x ∈ Rd |Mµ(x) > λ }, K compact }.

    Consequently we also have the previous bound for the entire set we considered.

In summary, what we know about the Hardy-Littlewood maximal function is this:

Theorem 13.1.2 (Hardy-Littlewood, 1930). Let f be a measurable function on ℝᵈ. Then

(i) If f ∈ Lᵖ(ℝᵈ), 1 ≤ p ≤ ∞, then Mf is finite almost everywhere.

(ii) If f ∈ L¹(ℝᵈ), then for every λ > 0

|{x ∈ ℝᵈ | Mf(x) > λ}| ≤ (3ᵈ/λ)‖f‖₁.

(iii) For 1 < p ≤ ∞, there exists a constant Ap such that ‖Mf‖p ≤ Ap‖f‖p.


It turns out Ap is monotone in p, and approaches ∞ as p approaches 1.

Remark 13.1.3. Note that

Mf(x) = sup_{r>0} (1/|B(x, r)|) ∫_{B(x,r)} |f(y)| dy ≤ sup_{r>0} (1/|B(x, r)|) ∫_{B(x,r)} ‖f‖∞ dy = ‖f‖∞.

In other words we know A∞ = 1.

Definition 13.1.4 (Weak and (strong) type operators). Let T : Lᵖ(ℝᵈ) → Lq(ℝᵈ), q ≤ p, q ≤ ∞. Then

(i) T is said to be of (strong) type (p, q) if there exists a constant A such that ‖Tf‖q ≤ A‖f‖p for all f ∈ Lᵖ(ℝᵈ).

(ii) T is called weak type (p, q) if there exists a constant A such that

|{x ∈ ℝᵈ | |Tf(x)| > λ}| ≤ (A‖f‖p/λ)^q

for all f ∈ Lᵖ(ℝᵈ) and all λ > 0, with A being independent of these.

Remark 13.1.5. If an operator T is of type (p, q), then it is also of weak type (p, q), by Chebyshev's inequality: we have ‖Tf‖q ≤ A‖f‖p, and the left-hand side satisfies

‖Tf‖q = (∫_{ℝᵈ} |Tf(x)|^q dx)^{1/q} ≥ (∫_{{x | |Tf(x)|>λ}} |Tf(x)|^q dx)^{1/q} ≥ λ|{x | |Tf(x)| > λ}|^{1/q},

meaning that

|{x | |Tf(x)| > λ}| ≤ (A‖f‖p/λ)^q.

    Remark 13.1.6. Therefore M is of (strong) type (∞,∞) and weak type (1, 1).We will introduce the following notation in order to make the upcoming discus-sion nicer:

    Definition 13.1.7. The sum of Lp spaces is defined as:

    Lp1(Rd) + Lp2(Rd) = { f1 + f2 | f1 ∈ Lp1(Rd), f2 ∈ Lp2(Rd) }.

    Proposition 13.1.8. Suppose p1 < p < p2. Then Lp(Rd) ⊂ Lp1(Rd)+Lp2(Rd).

Proof. Let f ∈ Lᵖ(ℝᵈ), and γ > 0. Set

f1(x) = f(x) if |f(x)| ≥ γ, and 0 otherwise,

and

f2(x) = f(x) if |f(x)| < γ, and 0 otherwise.

Therefore by construction f = f1 + f2. Then, since p1 − p < 0,

∫_{ℝᵈ} |f1(x)|^{p1} dx = ∫_{ℝᵈ} |f1(x)|ᵖ|f1(x)|^{p1−p} dx = ∫_{{x | |f(x)|≥γ}} |f(x)|ᵖ|f(x)|^{p1−p} dx ≤ γ^{p1−p} ∫_{{x | |f(x)|≥γ}} |f(x)|ᵖ dx ≤ γ^{p1−p}‖f‖pᵖ < ∞,

and similarly ∫_{ℝᵈ} |f2(x)|^{p2} dx ≤ γ^{p2−p}‖f‖pᵖ < ∞ when p2 < ∞ (if p2 = ∞, simply note |f2| ≤ γ). Hence f1 ∈ L^{p1}(ℝᵈ) and f2 ∈ L^{p2}(ℝᵈ).


    Lecture 14 Marcinkiewicz Interpolation

    14.1 Proof of Marcinkiewicz Interpolation Theorem

We will now prove the interpolation theorem we stated and used at the end of last lecture.

Proof. First suppose p2 ≠ ∞. We wish to show that ‖Tf‖p ≤ Ap‖f‖p for every f ∈ Lᵖ(ℝᵈ), with Ap not depending on f (but probably depending on p, T, and d). Let m(λ) = |{x ∈ ℝᵈ | |Tf(x)| > λ}|. Then

∫_{ℝᵈ} |Tf(x)|ᵖ dx = ∫_{0}^{∞} pλ^{p−1} m(λ) dλ.

This is effectively the layer cake theorem. We therefore need to estimate m(λ), so fix λ > 0 and let

f1(x) = f(x) if |f(x)| > λ, and 0 otherwise,

and

f2(x) = f(x) if |f(x)| ≤ λ, and 0 otherwise.

Therefore f = f1 + f2, and by assumption we have sublinearity of T, so

|Tf(x)| = |T(f1 + f2)(x)| ≤ |Tf1(x)| + |Tf2(x)|,

whereby

m(λ) = |{x | |Tf(x)| > λ}| ≤ |{x | |Tf1(x)| > λ/2}| + |{x | |Tf2(x)| > λ/2}|,

but by weak type we have

|{x | |Tf1(x)| > λ/2}| ≤ (A1‖f1‖_{p1}/(λ/2))^{p1} and |{x | |Tf2(x)| > λ/2}| ≤ (A2‖f2‖_{p2}/(λ/2))^{p2}.

Therefore

∫_{ℝᵈ} |Tf(x)|ᵖ dx = ∫_{0}^{∞} pλ^{p−1} m(λ) dλ ≤ ∫_{0}^{∞} pλ^{p−1} (A1‖f1‖_{p1}/(λ/2))^{p1} dλ + ∫_{0}^{∞} pλ^{p−1} (A2‖f2‖_{p2}/(λ/2))^{p2} dλ = I + II.

Studying the two integrals one at a time, we have

I = p(2A1)^{p1} ∫_{0}^{∞} λ^{p−p1−1} ‖f1‖_{p1}^{p1} dλ
= p(2A1)^{p1} ∫_{0}^{∞} λ^{p−p1−1} ∫_{{x | |f(x)|>λ}} |f(x)|^{p1} dx dλ
= p(2A1)^{p1} ∫_{ℝᵈ} ∫_{0}^{|f(x)|} λ^{p−p1−1} |f(x)|^{p1} dλ dx
= p(2A1)^{p1} ∫_{ℝᵈ} |f(x)|^{p1} (|f(x)|^{p−p1}/(p − p1)) dx = (p(2A1)^{p1}/(p − p1)) ‖f‖pᵖ,

and, since p − p2 < 0 makes the inner λ-integral converge at infinity,

II = p(2A2)^{p2} ∫_{0}^{∞} λ^{p−p2−1} ‖f2‖_{p2}^{p2} dλ
= p(2A2)^{p2} ∫_{0}^{∞} λ^{p−p2−1} ∫_{{x | |f(x)|≤λ}} |f(x)|^{p2} dx dλ
= p(2A2)^{p2} ∫_{ℝᵈ} ∫_{|f(x)|}^{∞} λ^{p−p2−1} |f(x)|^{p2} dλ dx
= p(2A2)^{p2} ∫_{ℝᵈ} |f(x)|^{p2} (|f(x)|^{p−p2}/(p2 − p)) dx = (p(2A2)^{p2}/(p2 − p)) ‖f‖pᵖ.

Therefore

∫_{ℝᵈ} |Tf(x)|ᵖ dx ≤ (p(2A1)^{p1}/(p − p1) + p(2A2)^{p2}/(p2 − p)) ‖f‖pᵖ.

Notice that this quantity blows up near p1 and p2; if not, we could have taken clever limits and turned weak type into strong type at the endpoints.
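The whole argument rests on the layer-cake formula ∫|Tf|ᵖ dx = ∫₀^∞ pλ^{p−1} m(λ) dλ; here is a quick numerical check of that formula with counting measure standing in for Lebesgue measure (illustrative values only, not from the notes):

```python
# Layer cake: sum_i |g_i|^p = int_0^inf p * lam^(p-1) * m(lam) d(lam),
# where m(lam) = #{i : |g_i| > lam}.

p = 2.5
g = [0.3, -1.2, 2.0, 0.7, -0.1, 1.5]

lhs = sum(abs(v) ** p for v in g)

n_steps = 100000
top = max(abs(v) for v in g)          # m(lam) = 0 beyond this
h = top / n_steps
rhs = 0.0
for j in range(n_steps):
    lam = (j + 0.5) * h               # midpoint rule
    m = sum(1 for v in g if abs(v) > lam)
    rhs += p * lam ** (p - 1) * m * h
print(lhs, rhs)  # the two agree up to quadrature error
```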

Now suppose p2 = ∞. We proceed almost as before, but define f1 and f2 slightly differently:

f1(x) = f(x) if |f(x)| > λ/(2A2), and 0 otherwise,

and

f2(x) = f(x) if |f(x)| ≤ λ/(2A2), and 0 otherwise.

Once again f = f1 + f2, but this time ‖Tf2‖∞ ≤ A2‖f2‖∞ ≤ A2 λ/(2A2) = λ/2, whereby |{x | |Tf2(x)| > λ/2}| = 0. Therefore

m(λ) ≤ |{x | |Tf1(x)| > λ/2}|,


and

∫_{ℝᵈ} |Tf(x)|ᵖ dx = ∫_{0}^{∞} pλ^{p−1} m(λ) dλ ≤ ∫_{0}^{∞} pλ^{p−1} |{x | |Tf1(x)| > λ/2}| dλ
≤ ∫_{0}^{∞} pλ^{p−1} (2A1‖f1‖_{p1}/λ)^{p1} dλ
= p(2A1)^{p1} ∫_{0}^{∞} λ^{p−p1−1} ∫_{ℝᵈ} |f1(x)|^{p1} dx dλ
= p(2A1)^{p1} ∫_{0}^{∞} λ^{p−p1−1} ∫_{{x | |f(x)|>λ/(2A2)}} |f(x)|^{p1} dx dλ
= p(2A1)^{p1} ∫_{ℝᵈ} |f(x)|^{p1} ∫_{0}^{2A2|f(x)|} λ^{p−p1−1} dλ dx
= (p(2A1)^{p1}(2A2)^{p−p1}/(p − p1)) ‖f‖pᵖ,

and we are done.

The more general version hinted at at the end of the last lecture is proven in much the same way, but is messier. We can actually interpolate from even weaker assumptions:

Definition 14.1.1 (Restricted weak type). Let 1 ≤ p ≤ ∞. An operator T is said to be of restricted weak type (p, p) if

|{x | |TχE(x)| > λ}| ≤ (Ap‖χE‖p/λ)ᵖ

for all measurable sets E.

Theorem 14.1.2 (Stein-Weiss). Suppose 1 ≤ p1 < p2 ≤ ∞, and let T be an operator from L^{p1}(ℝᵈ) + L^{p2}(ℝᵈ) to the space of measurable functions. Assume T is sublinear, and that T is of restricted weak type (p1, p1) and (p2, p2). Then T is of strong type (p, p) for all p1 < p < p2.

    Lecture 15 Lebesgue Differentiation Theorem

    15.1 A Note About Maximal Functions

Recall that the Hardy-Littlewood maximal function can be defined in terms of cubes instead of balls, but that it does not work over arbitrary rectangles. It does, however, work over rectangles so long as we require their sides to be parallel to the axes, i.e.

R = [a1, b1] × [a2, b2] × … × [ad, bd].

Then

(1/|R|) ∫_R |f(y)| dy = (1/|R|) ∫_{ad}^{bd} … ∫_{a1}^{b1} |f(y1, …, yd)| dy1 … dyd
= (1/|[ad, bd]|) ∫_{ad}^{bd} … (1/|[a1, b1]|) ∫_{a1}^{b1} |f(y1, …, yd)| dy1 … dyd
≤ Md(…(M2(M1(f)))…)(x),


where by Mi we mean the maximal function in the i-th variable. So if we define

Mf(x) = sup_{R ∋ x} (1/|R|) ∫_R |f(y)| dy,

where the rectangles R have sides parallel to the axes, then

Mf(x) ≤ Md(…(M2(M1(f)))…)(x)

and, applying the one-dimensional maximal theorem in one variable at a time (with Fubini),

∫_{ℝᵈ} |Mf(x)|ᵖ dx ≤ ∫_ℝ … ∫_ℝ (Md(… M1f …)(x))ᵖ dxd … dx1 ≤ C‖f‖pᵖ.

In other words rectangles with arbitrary orientations are bad, but rectangles in the same orientation are OK. One then asks how many directions are OK; it has been shown that rectangles with major axes at angles π/2ᵏ work.

    15.2 Lebesgue Differentiation Theorem

Theorem 15.2.1. Suppose 1 ≤ p ≤ ∞ and f ∈ Lᵖ(ℝᵈ). Then for almost every x ∈ ℝᵈ,

lim_{r→0} (1/|B(x, r)|) ∫_{B(x,r)} |f(y) − f(x)| dy = 0.

Corollary 15.2.2. If f ∈ Lᵖ(ℝᵈ), then for almost every x ∈ ℝᵈ

lim_{r→0} (1/|B(x, r)|) ∫_{B(x,r)} f(y) dy = f(x).

Proof. Assuming the theorem, we have

|(1/|B(x, r)|) ∫_{B(x,r)} f(y) dy − f(x)| ≤ (1/|B(x, r)|) ∫_{B(x,r)} |f(y) − f(x)| dy → 0

as r → 0, from which the corollary follows.

Remark 15.2.3. In one dimension we have

lim_{r→0} (1/(2r)) ∫_{x−r}^{x+r} f(y) dy = f(x)

almost everywhere by the fundamental theorem of calculus; the above is just a limit of a difference quotient.

A point x ∈ ℝᵈ is said to be a Lebesgue point of f if

lim_{r→0} (1/|B(x, r)|) ∫_{B(x,r)} |f(y) − f(x)| dy = 0.

We will let Lf denote the set of all Lebesgue points.


Lemma 15.2.4. If x is a point of continuity of f, then x ∈ Lf.

Proof. Let ε > 0. Then there exists some δ > 0 such that if |x − y| < δ, then |f(x) − f(y)| < ε. So if r < δ, then

(1/|B(x, r)|) ∫_{B(x,r)} |f(y) − f(x)| dy ≤ (1/|B(x, r)|) ∫_{B(x,r)} ε dy = ε.

Proof of the theorem. Let f ∈ Lᵖ, 1 ≤ p < ∞. Fix ε > 0 and let k ∈ ℕ. Choose a continuous function g ∈ Lᵖ(ℝᵈ) such that ‖f − g‖p < 1/k. Then, setting

Trf(x) = (1/|B(x, r)|) ∫_{B(x,r)} |f(y) − f(x)| dy,

we have

Trf(x) ≤ (1/|B(x, r)|) ∫_{B(x,r)} |(f − g)(y) − (f − g)(x)| dy + (1/|B(x, r)|) ∫_{B(x,r)} |g(y) − g(x)| dy
≤ (1/|B(x, r)|) ∫_{B(x,r)} |(f − g)(y)| dy + |(f − g)(x)| + Trg(x).

Therefore, with Tf(x) = lim sup_{r→0} Trf(x),

Tf(x) ≤ M(f − g)(x) + |(f − g)(x)| + Tg(x).

The last term is 0 by the lemma, and the first term comes from lim sup ≤ sup. Therefore

|{x | Tf(x) > ε}| ≤ |{x | M(f − g)(x) > ε/2}| + |{x | |(f − g)(x)| > ε/2}|,

but

|{x | M(f − g)(x) > ε/2}| ≤ C‖f − g‖pᵖ/(ε/2)ᵖ < C (1/kᵖ)(2ᵖ/εᵖ),

and similarly for the second set by Chebyshev. Therefore

|{x | Tf(x) > ε}| ≤ C (2ᵖ/εᵖ)(1/kᵖ).

Let k → ∞, and we get |{x | Tf(x) > ε}| = 0. Thus

{x | Tf(x) > 0} = ⋃_{n=1}^{∞} {x | Tf(x) > 1/n},

and so Tf(x) = 0 almost everywhere.

For p = ∞, fix N and consider fχ_{B(0,N)} ∈ L¹(ℝᵈ). By the above, almost every x ∈ B(0, N − 1) is then a Lebesgue point of f. Let N go to ∞, and we capture everything.

    Corollary 15.2.5. If f ∈ L1loc(Rd), then almost every x is a Lebesgue point.


We can manage a generalisation of this theorem that seems quite powerful, but isn't actually all that impressive.

Definition 15.2.6 (Regular family). A family of sets {Ek(x)}_{k∈ℕ} is said to be regular at x if there exist an α > 0 and radii rk decreasing to 0 such that Ek(x) ⊂ B(x, rk) and |Ek(x)| > α|B(x, rk)| for every k.

Theorem 15.2.7. Suppose f ∈ L¹loc(ℝᵈ), and suppose {Ek(x)} is a regular family at x. If x ∈ Lf, then

lim_{k→∞} (1/|Ek(x)|) ∫_{Ek(x)} |f(y) − f(x)| dy = 0.

Corollary 15.2.8. Suppose f ∈ L¹loc(ℝᵈ), and suppose at each x there exists a regular family {Ek(x)}. Then

lim_{k→∞} (1/|Ek(x)|) ∫_{Ek(x)} |f(y) − f(x)| dy = 0

almost everywhere.

Proof of theorem. It's pretty much straightforward from what we already know:
\[
\frac{1}{|E_{k_i}(x)|}\int_{E_{k_i}(x)} |f(y)-f(x)|\,dy \le \frac{1}{|E_{k_i}(x)|}\int_{B(x,k_i)} |f(y)-f(x)|\,dy
\le \frac{1}{\alpha_x |B(x,k_i)|}\int_{B(x,k_i)} |f(y)-f(x)|\,dy \to 0
\]
as i → ∞, since |E_{k_i}(x)| > α_x |B(x, k_i)|.

    Lecture 16 Maximal Functions and Kernels

    16.1 Generalising Lebesgue Differentiation Theorem

We showed that if f ∈ L^1_loc(R^d), then
\[ \lim_{r\to 0} \frac{1}{|B(x,r)|}\int_{B(x,r)} f(y)\,dy = f(x) \]
almost everywhere. We can rewrite this in the following way:
\[
\frac{1}{|B(x,r)|}\int_{B(x,r)} f(y)\,dy = \int_{R^d} \chi_{B(x,r)}(y)\,\frac{1}{|B(x,r)|}\,f(y)\,dy
= \int_{R^d} \frac{\chi_{B(0,r)}(x-y)}{r^d |B(0,1)|}\,f(y)\,dy
= \int_{R^d} \frac{\chi_{B(0,1)}(\frac{x-y}{r})}{r^d |B(0,1)|}\,f(y)\,dy
= \int_{R^d} \frac{1}{r^d}\,\varphi\Big(\frac{x-y}{r}\Big) f(y)\,dy,
\]
where
\[ \varphi(s) = \frac{\chi_{B(0,1)}(s)}{|B(0,1)|}. \]
Note that
\[ \int_{R^d} \varphi(s)\,ds = 1. \]

We ask ourselves the following question: Given a ϕ with total mass 1, when is it true that
\[ \lim_{r\to 0} \int_{R^d} \frac{1}{r^d}\,\varphi\Big(\frac{x-y}{r}\Big) f(y)\,dy = f(x) \]
almost everywhere, like with the above? In order to make life easier on ourselves we will often write
\[ \varphi_r(s) = \frac{1}{r^d}\,\varphi(s/r). \]
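For reasonable kernels the answer turns out to be yes, and the phenomenon is easy to observe numerically. A small sketch follows; the triangle kernel, f = cos, and the sample point x = 0.3 are illustrative assumptions, not from the notes.

```python
# Check numerically that ϕ_r * f(x) → f(x) as r → 0 when ∫ϕ = 1.
# Here ϕ is the triangle kernel ϕ(s) = max(0, 1 − |s|) on R (total mass 1).
import math

def phi(s):
    return max(0.0, 1.0 - abs(s))

def phi_r_conv(f, x, r, n=4000):
    # midpoint Riemann sum for ∫ (1/r) ϕ((x − y)/r) f(y) dy over the support [x − r, x + r]
    total, h = 0.0, 2 * r / n
    for i in range(n):
        y = x - r + (i + 0.5) * h
        total += phi((x - y) / r) / r * f(y) * h
    return total

x = 0.3
errs = [abs(phi_r_conv(math.cos, x, r) - math.cos(x)) for r in (0.5, 0.05, 0.005)]
print(errs)    # decreasing to 0; for this kernel the error is about cos(x) · r²/12

assert errs[0] > errs[1] > errs[2]
assert errs[2] < 1e-4
```

The r²/12 rate is particular to the triangle kernel (its second moment); the almost-everywhere statement for general f ∈ L^p is what the rest of the lecture establishes.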

With ϕ(s) some function on R^d with total mass 1, let
\[ v(x,t) = \varphi_t * f(x) = \int_{R^d} \varphi_t(x-y) f(y)\,dy = \int_{R^d} \frac{1}{t^d}\,\varphi\Big(\frac{x-y}{t}\Big) f(y)\,dy. \]
We will think of v(x, t) as a function on R^{d+1}_+ = {(x, t) | x ∈ R^d, t > 0}. This means that the question we're asking, namely whether
\[ \lim_{t\to 0} v(x,t) = f(x) \]
almost everywhere, boils down to asking what happens as we project radially downwards onto R^d.

We can make a slight generalisation without trouble: Consider
\[ \Gamma_\alpha(x) = \{\,(y,t) \mid y \in R^d,\ t > 0,\ |x-y| < \alpha t\,\}, \]
i.e. a cone above the fixed point x. We may then consider limits of the form
\[ \lim_{\substack{(y,t)\to(x,0)\\ (y,t)\in\Gamma_\alpha(x)}} v(y,t) = f(x) \]
almost everywhere. This type of limit is called a nontangential limit, since a path approaching (x, 0) inside the cone cannot be tangent to R^d at x.

Recall how when we proved the Lebesgue differentiation theorem, we wrote f = (f − g) + g with g a continuous function, then proved that the theorem is true for g, and finally controlled f − g using the maximal function M(f − g). We use exactly the same strategy for this more general result:

Lemma 16.1.1. Suppose ϕ ∈ L^1(R^d) is such that
\[ \int_{R^d} \varphi(x)\,dx = 1, \]
and let α > 0. Finally let g be continuous with compact support, and set v(y, t) = ϕ_t ∗ g(y). Then
\[ \lim_{\substack{(y,t)\to(x,0)\\ (y,t)\in\Gamma_\alpha(x)}} v(y,t) = g(x) \]
at all x.


Proof. Take (y_j, t_j) ∈ Γ_α(x) such that (y_j, t_j) → (x, 0) as j → ∞. Then, using that ϕ has total mass 1,
\[
|v(y_j,t_j) - g(x)| = |(\varphi_{t_j} * g)(y_j) - g(x)|
= \Big| \int_{R^d} \frac{\varphi(\frac{y_j-s}{t_j})}{t_j^d}\,g(s)\,ds - g(x) \Big|
= \Big| \int_{R^d} \frac{\varphi(\frac{y_j-s}{t_j})}{t_j^d}\,g(s)\,ds - \int_{R^d} \frac{\varphi(\frac{y_j-s}{t_j})}{t_j^d}\,g(x)\,ds \Big|
\le \int_{R^d} \Big| \frac{\varphi(\frac{y_j-s}{t_j})}{t_j^d} \Big|\,|g(s)-g(x)|\,ds.
\]
We make a change of variable, taking u = (y_j − s)/t_j, meaning that s = y_j − t_j u and du = (−1)^d/t_j^d ds. Note that the (−1)^d will disappear when we switch the limits of integration in each variable, should d happen to be odd. Then the above is equal to
\[ \int_{R^d} |\varphi(u)|\,|g(y_j - t_j u) - g(x)|\,du, \]
and since g is continuous,
\[ |y_j - t_j u - x| \le |y_j - x| + t_j |u| < (\alpha + |u|)\,t_j \]
(since |y_j − x| < αt_j in Γ_α(x)) implies that
\[ \lim_{j\to\infty} |g(y_j - t_j u) - g(x)| = 0 \]
for every u. Moreover
\[ |\varphi(u)|\,|g(y_j - t_j u) - g(x)| \le 2\|g\|_\infty |\varphi(u)| \]
since g is continuous with compact support. Therefore, by the Lebesgue dominated convergence theorem,
\[ \lim_{j\to\infty} \int_{R^d} |\varphi(u)|\,|g(y_j - t_j u) - g(x)|\,du = \int_{R^d} \lim_{j\to\infty} |\varphi(u)|\,|g(y_j - t_j u) - g(x)|\,du = 0. \]

Definition 16.1.2. A function Ψ on R^d is said to be radial if Ψ(x) = Ψ(y) whenever |x| = |y|. In other words, and hence the name, the value of the function at a point depends only on the distance of that point from the origin. Sometimes, in an abuse of notation, we will write Ψ(r), with r ≥ 0 being the radius.

Lemma 16.1.3. Suppose Ψ is radial, bounded and positive. Suppose moreover that Ψ ∈ L^1(R^d) and that Ψ(x) is decreasing as |x| → ∞. Set
\[ \tilde\Psi(x) = \sup\{\, \Psi(y) \mid |y - x| \le \alpha \,\}. \]
Then Ψ̃ is radial, bounded, decreasing as |x| → ∞, and Ψ̃ ∈ L^1(R^d).


Proof. By definition, for |x| ≤ α we have Ψ̃(x) = Ψ(0), and if |x| > α, then Ψ̃(x) = Ψ(|x| − α). From this the boundedness and radialness follow by definition, and that the function is decreasing is clear from the above.

Definition 16.1.4. For α > 0, ϕ ∈ L^1(R^d), and f ∈ L^p(R^d) for 1 ≤ p ≤ ∞, set v(x, t) = ϕ_t ∗ f(x) and let
\[ N_\alpha v(x) = \sup\{\, |v(y,t)| \mid (y,t) \in \Gamma_\alpha(x) \,\}, \]
called the nontangential maximal function. In other words, it is the supremum of all the v(y, t) in the cones discussed before.

This plays the same role the Hardy-Littlewood maximal function did in the Lebesgue differentiation theorem. As it turns out, we needn't reinvent the wheel either—much of what we know about the Hardy-Littlewood maximal function transfers!

Theorem 16.1.5. Suppose ϕ ∈ L^1(R^d) is bounded. Let
\[ \Psi(x) = \operatorname{ess\,sup}_{|y| \ge |x|} |\varphi(y)|, \]
which is integrable. Then if f ∈ L^p(R^d) with 1 ≤ p ≤ ∞,
\[ N_\alpha v(x) \le C\,M f(x), \]
where C depends only on d, ϕ, and α.

The function Ψ as defined above is called the least decreasing radial majorant of ϕ.

Proof. Take (y, t) ∈ Γ_α(x). Then
\[
|\varphi_t * f(y)| = \Big| \int_{R^d} \frac{\varphi(\frac{y-s}{t})}{t^d}\,f(s)\,ds \Big|
\le \int_{R^d} \frac{\Psi(\frac{y-s}{t})}{t^d}\,|f(s)|\,ds
\le \int_{R^d} \frac{\tilde\Psi(\frac{x-s}{t})}{t^d}\,|f(s)|\,ds,
\]
since
\[ \Big| \frac{x-s}{t} - \frac{y-s}{t} \Big| = \frac{|x-y|}{t} < \alpha. \]
Therefore
\[ N_\alpha v(x) \le \sup_{t>0} \int_{R^d} \frac{\tilde\Psi(\frac{x-s}{t})}{t^d}\,|f(s)|\,ds. \]

    Lecture 17 Rising Sun Lemma

    17.1 Nontangential Maximal Function

    We start by proving the theorem stated at the end of last lecture.


Proof continued. We left off last time at
\[ N_\alpha v(x) \le \sup_{t>0} \int_{R^d} \frac{\tilde\Psi(\frac{x-s}{t})}{t^d}\,|f(s)|\,ds. \]
To finish the proof we would like to show that
\[ \sup_{t>0} \int_{R^d} \frac{\tilde\Psi(\frac{x-s}{t})}{t^d}\,|f(s)|\,ds \le C\,M f(x) = C \sup_{r>0} \int_{R^d} \frac{\chi_{B(x,r)}(y)}{|B(x,r)|}\,|f(y)|\,dy. \]
We restrict our study to the case of x = 0—all other cases can be derived from this by a change of variable in the ordinary way. Therefore we instead need to show that
\[ \sup_{t>0} \int_{R^d} \frac{\tilde\Psi(\frac{s}{t})}{t^d}\,|f(s)|\,ds \le C\,M f(0) = C \sup_{r>0} \int_{R^d} \frac{\chi_{B(0,r)}(y)}{|B(0,r)|}\,|f(y)|\,dy. \]
Fix t. Using Fubini's theorem to switch the order of integration, we get
\[
\int_{R^d} \frac{\tilde\Psi(\frac{s}{t})}{t^d}\,|f(s)|\,ds
= \int_{R^d} \int_0^{\tilde\Psi(s/t)/t^d} dr\,|f(s)|\,ds
= \int_0^\infty \int_{\{s \mid \tilde\Psi(s/t)/t^d > r\}} |f(s)|\,ds\,dr
= \int_0^\infty \frac{|\{s \mid \tilde\Psi(s/t)/t^d > r\}|}{|\{s \mid \tilde\Psi(s/t)/t^d > r\}|} \int_{\{s \mid \tilde\Psi(s/t)/t^d > r\}} |f(s)|\,ds\,dr.
\]
Now since Ψ̃ is radial and decreasing, its level sets are balls centred at the origin. The maximal function is the supremum over all such balls, so it dominates the above:
\[
\le \int_0^\infty |\{s \mid \tilde\Psi(s/t)/t^d > r\}|\,Mf(0)\,dr = Mf(0) \int_{R^d} \frac{\tilde\Psi(s/t)}{t^d}\,ds = Mf(0) \int_{R^d} \tilde\Psi(s)\,ds.
\]
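The Fubini step rests on the layer-cake identity ∫ g(s) ds = ∫₀^∞ |{s | g(s) > r}| dr for g ≥ 0. A discrete sanity check of that identity follows; the triangle-shaped g and the grid sizes are illustrative assumptions, not from the notes.

```python
# Approximate both sides of ∫ g(s) ds = ∫_0^∞ |{ s : g(s) > r }| dr
# for g(s) = max(0, 1 − |s|), whose integral is 1.

h = 0.001
xs = [-2 + i * h for i in range(4001)]
g = [max(0.0, 1.0 - abs(x)) for x in xs]

lhs = sum(v * h for v in g)                 # ∫ g(s) ds, as a Riemann sum

dr = 0.001
rs = [j * dr for j in range(1000)]          # r runs over [0, 1); g ≤ 1 anyway
rhs = sum(sum(h for v in g if v > r) * dr for r in rs)   # ∫ |{g > r}| dr

print(lhs, rhs)
assert abs(lhs - 1.0) < 1e-2
assert abs(lhs - rhs) < 1e-2
```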

Theorem 17.1.1. Suppose ϕ ∈ L^1(R^d) is bounded, that its integral is 1, and that its least decreasing radial majorant Ψ is integrable. Then for f ∈ L^p(R^d), 1 ≤ p ≤ ∞,
\[ \lim_{\substack{(y,t)\to(x,0)\\ (y,t)\in\Gamma_\alpha(x)}} \varphi_t * f(y) = f(x) \]
almost everywhere.

Proof. Set
\[ Tf(x) = \limsup_{\substack{(y,t)\to(x,0)\\ (y,t)\in\Gamma_\alpha(x)}} |\varphi_t * f(y) - f(x)|. \]
Let ε > 0 and k a positive integer. Choose a function g that is continuous with compact support such that ‖f − g‖_p < 1/k. Then
\[ \varphi_t * f(y) - f(x) = \varphi_t * f(y) - \varphi_t * g(y) + \varphi_t * g(y) - g(x) + g(x) - f(x). \]
By the lemma from last time, Tg(x) = 0. Thus
\[ Tf(x) \le T(f-g)(x) + |g(x) - f(x)| \le N_\alpha(\varphi_t * (f-g))(x) + |g(x) - f(x)|, \]
so Tf(x) ≤ CM(f − g)(x) + |f(x) − g(x)|. Therefore
\[ |\{x \mid Tf(x) > \varepsilon\}| \le |\{x \mid M(f-g)(x) > \varepsilon/(2C)\}| + |\{x \mid |f(x)-g(x)| > \varepsilon/2\}|, \]
which, by the weak-type bounds for the maximal function and using Chebyshev on the second set, is bounded by
\[ \frac{D\|f-g\|_p^p}{(\varepsilon/(2C))^p} + \frac{\|f-g\|_p^p}{(\varepsilon/2)^p} \]
for a constant D. All of this is therefore bounded by 1/k^p times a constant, but as we let k → ∞ all goes to 0. Therefore
\[ |\{x \mid Tf(x) > \varepsilon\}| = 0, \]
and if we take some countable sequence of ε_n going to 0 and take the union over these we get
\[ |\{x \mid Tf(x) > 0\}| = 0, \]
and we are done.

    17.2 Riesz’s Proof of the Hardy-Littlewood Theorem

We'll take a moment to appreciate the beautifully simple proof Riesz produced for the Hardy-Littlewood Maximal Function Theorem.

Consider
\[ M_R f(x) = \sup_{\xi > x} \frac{1}{\xi - x} \int_x^\xi |f(t)|\,dt, \]
making it a right-handed maximal function, and similarly for M_L. Then clearly Mf ≤ M_R f + M_L f. Now set
\[ F(x) = \int_0^x |f(t)|\,dt, \]
making F an increasing function. Fix λ and imagine rays from a sun infinitely far away shining down on the graph of F, the rays coming in with slope λ.

Then there will be areas that are in the shadow, which we can characterise as intervals (a_i, b_i). Then M_R f(x) > λ if and only if there exists some ξ > x such that
\[ \frac{1}{\xi - x} \int_x^\xi |f(t)|\,dt > \lambda, \]
which in turn is true if and only if there exists some ξ > x such that
\[ \frac{F(\xi) - F(x)}{\xi - x} > \lambda, \]
which moreover is true if and only if x is in the shadow. Therefore {x | M_R f(x) > λ} is the same as the set of x in the shadows, which is the union
\[ \bigcup_i (a_i, b_i). \]


For each (a_i, b_i) we have exactly
\[ \frac{F(b_i) - F(a_i)}{b_i - a_i} = \lambda. \]
Therefore
\[
|\{x \mid M_R f(x) > \lambda\}| = \sum_i (b_i - a_i) = \frac{1}{\lambda} \sum_i \bigl(F(b_i) - F(a_i)\bigr)
= \frac{1}{\lambda} \sum_i \int_{a_i}^{b_i} |f(t)|\,dt \le \frac{1}{\lambda} \int_R |f(t)|\,dt = \frac{\|f\|_1}{\lambda}.
\]
So M_R is weak type (1, 1), by a really, really simple argument. The same argument gives M_L as weak type (1, 1), and together M is weak type (1, 1), and we've proven the Hardy-Littlewood Maximal Function Theorem.
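The whole argument is concrete enough to check on a grid. Below is a sketch of the one-sided bound |{M_R f > λ}| ≤ ‖f‖₁/λ; the particular f and the grid are illustrative assumptions, not from the notes, and the discrete M_R is computed by brute force from the cumulative sums F.

```python
# Discretise f on [0, 5), form F(x) = ∫_0^x f, compute the one-sided
# maximal function M_R f at each grid point as the largest slope of F to
# the right, and verify |{ M_R f > λ }| ≤ ‖f‖_1 / λ.

h = 0.01
xs = [i * h for i in range(500)]
f = [abs(2.0 - x) if 1 <= x <= 3 else 0.1 for x in xs]   # some f ≥ 0

F = [0.0]
for v in f:
    F.append(F[-1] + v * h)      # F[i] = ∫_0^{x_i} f

MR = []
for i in range(len(xs)):
    slopes = ((F[j] - F[i]) / ((j - i) * h) for j in range(i + 1, len(F)))
    MR.append(max(slopes))       # sup over grid points ξ > x_i

norm1 = sum(v * h for v in f)
for lam in (0.2, 0.5, 1.0):
    measure = sum(h for m in MR if m > lam)
    assert measure <= norm1 / lam + 5 * h    # small slack for the grid
print("weak (1,1) bound for M_R holds on the grid")
```

The slope-of-F formulation is exactly the rising-sun picture: MR[i] > λ means some chord of F starting at x_i has slope exceeding λ, i.e. x_i lies in a shadow interval.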

    Lecture 18 Calderón-Zygmund Decomposition ofFunctions

    18.1 Higher-Dimensional Rising Sun Lemma

Theorem 18.1.1 (Calderón-Zygmund, 1952). Let f ≥ 0 be a function in L^1(R^d), and let λ > 0. Then there exists a decomposition of R^d so that

(i) R^d = F ∪ Ω and F ∩ Ω = ∅;

(ii) f ≤ λ almost everywhere on F;

(iii) Ω = ⋃_k Q_k, where the Q_k are cubes whose interiors are disjoint and such that for each cube Q_k we have
\[ \lambda < \frac{1}{|Q_k|} \int_{Q_k} f(x)\,dx \le 2^d \lambda. \]

    Remark 18.1.2. Note that F ⊂ {x | f ≤ λ } but they need not be equal.

Definition 18.1.3 (Dyadic cube). A dyadic cube in R^d is a cube of the form
\[ Q = \Big[\frac{k_1}{2^j}, \frac{k_1+1}{2^j}\Big] \times \Big[\frac{k_2}{2^j}, \frac{k_2+1}{2^j}\Big] \times \dots \times \Big[\frac{k_d}{2^j}, \frac{k_d+1}{2^j}\Big] \]
for some integers k_1, k_2, . . . , k_d and an integer j.

Proof of theorem. Since f ∈ L^1(R^d), the integral
\[ \int_{R^d} f(t)\,dt \]
over the whole space is some finite number, and so we can choose j large enough that
\[ \frac{1}{2^{jd}} \int_{R^d} f(t)\,dt < \lambda. \]
Then for every dyadic cube Q of side length 2^j (so |Q| = 2^{jd}) we have
\[ \lambda > \frac{1}{|Q|} \int_{R^d} f(t)\,dt \ge \frac{1}{|Q|} \int_Q f(t)\,dt. \]

Now let Q′ be one of the cubes in this family. Divide Q′ into 2^d dyadic subcubes, and let Q′′ be one of them. There are two possibilities:

Type 1:
\[ \frac{1}{|Q''|} \int_{Q''} f(x)\,dx \le \lambda; \]

Type 2:
\[ \frac{1}{|Q''|} \int_{Q''} f(x)\,dx > \lambda. \]

If Q′′ is a cube of Type 2, we leave it alone. Otherwise, we divide it into 2^d subcubes again and repeat. Continue this process, always leaving fixed the cubes of Type 2 and dividing the cubes of Type 1. Let {Q_k} be the collection of all Type 2 cubes. Set Ω = ⋃_k Q_k. By construction the Q_k have disjoint interiors.

    By construction Qk have disjoint interiors.Now consider a typical Qk. It was the product of dividing a Type 1 cube,

    so if we for Qk let Q̃k denote the parent cube it stemmed from, we have

    λ <1

    |Qk|

    ∫Qk

    f(x) dx and1

    Q̃k

    ∫Q̃k

    f(x) dx ≤ λ.

    So we have

    1

    |Qk|

    ∫Qk

    f(x) dx ≤ |Q̃k||Qk|

    · 1|Q̃k|

    ∫Q̃k

    f(x) dx ≤ 2dλ

    since the radio between |Q̃k| and |Qk| is precisely 2d since we divided the formerinto 2d equal parts to get the latter.

Now set F = Ω^c. Suppose x ∉ Ω, meaning that x is never in a Type 2 cube. Then there exist dyadic cubes Q_ℓ of Type 1 with side lengths going to 0 such that x ∈ Q_ℓ for all ℓ. Since these Q_ℓ are shrinking, they are regular at x: if Q_ℓ has side length 1/2^ℓ, we can cover it by a ball B(x, √d/2^ℓ) ⊃ Q_ℓ. So for almost every x ∈ F, by the Lebesgue differentiation theorem we have
\[ \lambda \ge \frac{1}{|Q_\ell|} \int_{Q_\ell} f(y)\,dy \to f(x), \]
and so f(x) ≤ λ for almost every x ∈ F, as claimed.

Note that this method of proof isn't all that unfamiliar: it is exactly the same strategy one uses when proving, say, the Bolzano-Weierstrass theorem about convergent subsequences of bounded sequences.
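The stopping-time construction is easy to carry out in d = 1 on [0, 1). A sketch follows; the function f(x) = 1/√x (integrable but unbounded near 0), the level λ = 4, and the recursion cap are illustrative assumptions, not from the notes. Averages are computed exactly from the antiderivative 2√x.

```python
# Calderón–Zygmund stopping time on dyadic subintervals of [0, 1):
# subdivide intervals whose average is ≤ λ (Type 1), select those whose
# average exceeds λ (Type 2).  Selected intervals then satisfy
# λ < average ≤ 2λ, which is the bound 2^d λ with d = 1.
import math

def cz_decompose(avg, a, b, lam, depth, max_depth):
    """avg(a, b) must return the average of f over [a, b)."""
    if avg(a, b) > lam:
        return [(a, b)]                      # Type 2: select and stop
    if depth == max_depth:
        return []                            # resolution cap for the sketch
    m = (a + b) / 2                          # Type 1: subdivide
    return (cz_decompose(avg, a, m, lam, depth + 1, max_depth)
            + cz_decompose(avg, m, b, lam, depth + 1, max_depth))

# f(x) = 1/sqrt(x): its average over [a, b) is 2(sqrt(b) - sqrt(a)) / (b - a).
avg = lambda a, b: 2 * (math.sqrt(b) - math.sqrt(a)) / (b - a)

lam = 4.0
cubes = cz_decompose(avg, 0.0, 1.0, lam, 0, 12)
print(cubes)                                 # the selected intervals hug 0
for a, b in cubes:
    assert lam < avg(a, b) <= 2 * lam        # λ < average ≤ 2^d λ
assert cubes == [(0.0, 0.125)]
```

For this f only the interval [0, 1/8) is selected: its average is 2·√(1/8)/(1/8) ≈ 5.66 ∈ (λ, 2λ], while every interval away from 0 has average below λ.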


    Lecture 19 Density of Sets

    19.1 Hardy-Littlewood’s Theorem from Calderón-Zygmund

We will show, mostly for fun, that if we have a Calderón-Zygmund decomposition of R^d, then the weak type (1, 1) bound for Mf follows.

Fix λ and suppose x ∈ R^d has Mf(x) > cλ for some constant c we'll fix later. Then there exists a ball B(x, r) such that
\[ \frac{1}{|B(x,r)|} \int_{B(x,r)} |f(y)|\,dy > c\lambda. \]
Let j be the largest integer so that a dyadic cube of side length 2^j fits inside B(x, r). Consider all dyadic cubes of side length 2^j that intersect B(x, r)—call these c_1, c_2, . . . , c_m, where the number m of them depends only on the dimension d. Then
\[ \int_{\bigcup_{j=1}^m c_j} |f(y)|\,dy \ge \int_{B(x,r)} |f(y)|\,dy > c\lambda\,|B(x,r)| \ge c\lambda\,|c_j| \]
for every j. So there exists at least one c_j such that
\[ \int_{c_j} |f(y)|\,dy > \frac{c}{m}\,\lambda\,|c_j| = \lambda\,|c_j|, \]

if we now fix c = m.

We claim that c_j ⊂ ⋃_k Q_k, where the Q_k are the cubes from the Calderón-Zygmund decomposition of |f| at level λ. Since dyadic cubes are nested and every ancestor of a selected cube has average at most λ, the cube c_j, whose average exceeds λ, cannot strictly contain any Q_k; so if c_j were contained in no Q_k, we would have c_j ∩ Q_k = ∅ for all k, meaning that c_j ⊂ F. But |f| ≤ λ almost everywhere on F, which gives rise to a contradiction, since it implies
\[ \frac{1}{|c_j|} \int_{c_j} |f(y)|\,dy \le \lambda, \]
even though we know the same is strictly greater than λ. Therefore c_j lies in some Q_k, which satisfies
\[ \lambda < \frac{1}{|Q_k|} \int_{Q_k} |f(y)|\,dy \le 2^d \lambda. \]

There exists a constant D, depending on the dimension d, such that the D-fold enlargement of c_j contains B(x, r). Call this enlargement c̃_j. Similarly, let Q̃_k be the D-fold enlargements of the Calderón-Zygmund cubes, and consider the union ⋃_k Q̃_k. Then
\[ \{x \in R^d \mid Mf(x) > c\lambda\} \subset \bigcup_k \tilde Q_k. \]
The cubes Q_k have disjoint interiors, so
\[ |\{x \in R^d \mid Mf(x) > c\lambda\}| \le \sum_k |\tilde Q_k| \le D^d \sum_k |Q_k|, \]


and the measure of each Q_k can be bounded by an integral by rearranging its property from the decomposition, so the above is less than or equal to
\[ D^d \sum_k \frac{1}{\lambda} \int_{Q_k} |f(y)|\,dy \le \frac{D^d}{\lambda} \int_{\bigcup_k Q_k} |f(y)|\,dy. \]
Thus
\[ |\{x \in R^d \mid Mf(x) > c\lambda\}| \le \frac{D^d}{\lambda} \int_{\bigcup_k Q_k} |f(y)|\,dy \le \frac{D^d}{\lambda} \int_{R^d} |f(y)|\,dy = \frac{D^d}{\lambda}\,\|f\|_1, \]
and replacing λ by λ/c,
\[ |\{x \in R^d \mid Mf(x) > \lambda\}| \le \frac{c\,D^d}{\lambda}\,\|f\|_1, \]
meaning that we have weak type (1, 1).

There is a problem with this line of reasoning: we used the Lebesgue Differentiation Theorem to prove the existence of the Calderón-Zygmund decomposition, but our proof of the Lebesgue Differentiation Theorem in turn required the Hardy-Littlewood theorem for the maximal function.

    19.2 Density of Sets

Definition 19.2.1 (Point of density). Let E ⊂ R^d be a measurable set. We say that x is a point of density of E if
\[ \lim_{r\to 0} \frac{|E \cap B(x,r)|}{|B(x,r)|} = 1. \]

    Theorem 19.2.2. Let E ⊂ Rd be measurable. Then almost every point of E isa point of density of E.

Proof. By the Lebesgue Differentiation Theorem we have
\[ \lim_{r\to 0} \frac{1}{|B(x,r)|} \int_{B(x,r)} f(y)\,dy = f(x) \]
for almost every x. Thus in particular, taking f = χ_E, we have
\[ \lim_{r\to 0} \frac{1}{|B(x,r)|} \int_{B(x,r)} \chi_E(y)\,dy = \chi_E(x) \]
almost everywhere. But for x ∈ E this is the same as
\[ \lim_{r\to 0} \frac{|B(x,r) \cap E|}{|B(x,r)|} = 1. \]

Example 19.2.3. Let E be the set of irrational numbers in R. Then
\[ \frac{|E \cap B(x,r)|}{|B(x,r)|} = 1 \]
for all x and r, so every x ∈ R is a point of density of E. N

Example 19.2.4. Let E = Q. Then |Q ∩ B(x, r)| = 0 for all x and r, so Q has no points of density. N
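A quick computation also shows why the theorem only claims almost every point of E: for E = [0, 1] the interior points have density 1, while the boundary point 0 (which lies in E) has density 1/2. The set E and the sample points here are illustrative choices, not from the notes.

```python
# |E ∩ B(x, r)| / |B(x, r)| for E = [0, 1] ⊂ R, computed exactly.

def density_ratio(x, r):
    lo, hi = max(0.0, x - r), min(1.0, x + r)   # E ∩ (x − r, x + r)
    return max(0.0, hi - lo) / (2 * r)

for r in (0.1, 0.01, 0.001):
    assert abs(density_ratio(0.5, r) - 1.0) < 1e-9   # interior point: density 1
    assert abs(density_ratio(0.0, r) - 0.5) < 1e-9   # boundary point: density 1/2
print("interior density 1, boundary density 1/2")
```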


Definition 19.2.5 (Distance to a set). Let F ⊂ R^d. Set
\[ \delta(x, F) = \inf\{\, |x - y| \mid y \in F \,\}, \]
called the distance from x to the set F. We write δ(x) if the set F is understood.

It is a fact that if F is closed, then x ∈ F is equivalent to δ(x, F) = 0.

Proposition 19.2.6. Let F ⊂ R^d be closed. If x ∈ F, then for every y ∈ R^d we have δ(x + y) ≤ |y|.

Proof. This is completely straightforward: certainly we have |y| = |(x + y) − x| with x ∈ F. Taking the infimum over points of F, we get δ(x + y) ≤ |y|.

    Proposition 19.2.7. Let F ⊂ Rd be