
Three Lectures on Littlewood-Paley Theory

Michael Wilson¹

University of Vermont, Burlington, Vermont, USA

and Universidad de Sevilla

These lectures were given while I was visiting the Universidad de Sevilla during the academic year 2004-2005.

I can never adequately thank my colleagues at the Universidad de Sevilla for the many kindnesses my family and I received during my sabbatical year in Sevilla. I wish to thank in particular Renato Alvarez Nodarse, Guillermo Curbera Costello, Rafael Espinola Garcia, Genaro Lopez Acedo, Alejandro Rodriguez Martinez, Alfonso Montes Rodriguez, Luis Rodriguez Piazza, Juan Arias de Reyna, and Francisco Jose Freniche Ibanez for the warm welcome and the help they gave to me and my family. However, my deepest gratitude must be reserved for my friend, Carlos Perez Moreno, without whose kind invitation and tireless work on our behalf, our visit to Sevilla would not have been possible.

While at the Universidad de Sevilla, my research was funded by a fellowship (number SAB2003-0003) from the Ministerio de Educacion, Cultura, y Deporte. I am very pleased to acknowledge the Ministerio’s generous support.

LECTURE I

Littlewood-Paley theory offers an approach to the following question: Given a function $f$, what is $f$, really?

This question is not stupid. Functions in nature (signals) are only known by their interactions with measuring devices (such as eyes and ears). We model this interaction by means of inner products. The mathematical problem, “Given $f$, what is $f$?”, then becomes, “Given that we know, with a high degree of accuracy, $f$’s inner products $\langle f, \phi_n\rangle$ with a suitable family of functions $\{\phi_n\}$, what can we infer about $f$?”

An example of a suitable family is the Fourier system $\{e^{in\theta}\}$ for $\theta \in [0, 2\pi)$. The Fourier coefficients of $f \in L^1$ are defined by the usual formula:

\[ \hat f(n) \equiv \frac{1}{2\pi} \int_0^{2\pi} f(\theta)\, e^{-in\theta}\, d\theta. \]

We can then identify f with its Fourier series:

\[ f \sim \sum_{n=-\infty}^{\infty} \hat f(n)\, e^{in\theta}. \tag{1.1} \]

But “identify” must be used guardedly, for at least 2 reasons.

¹ This work was supported by a fellowship (number SAB2003-0003) from the Ministerio de Educacion, Cultura, y Deporte.


Reason 1. If we sum up the series (1.1) in the naive fashion, as the limit of

\[ \sum_{n=-N}^{N} \hat f(n)\, e^{in\theta} \]

as $N \to \infty$, it can fail to converge at many points $\theta$, even if $f$ is continuous; and, if $f$ is merely integrable, it can diverge everywhere. These problems can be fixed by summing the series in a gentler fashion; e.g., using Poisson or Fejer summation. However, even “mollified” summation does nothing to fix the other problem:

Reason 2. We do not actually sum up the series (1.1), but a series

\[ \sum_{n=-\infty}^{\infty} \hat f(n)(1 + \epsilon_n)\, e^{in\theta}, \]

where the $\epsilon_n$’s are (we hope) very small, but random errors. If each $\epsilon_n$ equals $\pm\epsilon$, where $\epsilon > 0$ is fixed and the ‘$\pm$’ denotes independent, equal-probability sign changes, then the expected value of our error,

\[ \epsilon \sum_{n=-\infty}^{\infty} \pm \hat f(n)\, e^{in\theta}, \]

will be comparable to

\[ \epsilon \left( \sum_{n=-\infty}^{\infty} |\hat f(n)|^2 \right)^{1/2}, \]

which will be infinite if $f \notin L^2$. But the situation is even worse than that. It is a theorem that, if $1 < p < \infty$ and $p \ne 2$, then, with probability 1, the Fourier multiplier that maps

\[ \sum_{n=-\infty}^{\infty} \hat f(n)\, e^{in\theta} \mapsto \sum_{n=-\infty}^{\infty} \pm \hat f(n)\, e^{in\theta} \]

is NOT bounded on $L^p$. Thus, even if the relative errors in our measured values of $\hat f(n)$ are very small, and even if we choose a gentle way of summing (1.1), the noise can easily drown out the signal.

Littlewood-Paley theory provides us with a more stable way of decomposing (and re-composing) functions $f$. Now we will show how one form of it works.

Definition 1.1. We use $\mathcal{D}$ to denote the family of dyadic intervals on $\mathbb{R}$:

\[ \mathcal{D} = \{ [j2^k, (j+1)2^k) : j, k \in \mathbb{Z} \}. \]

It is easy to see that, if $I$ and $J$ belong to $\mathcal{D}$, either $I \subset J$, $J \subset I$, or $I \cap J = \emptyset$.

If $I$ is an interval, we use $\ell(I)$ to denote its length. As usual, we use $|E|$ to denote the Lebesgue measure of a set $E$.

Every $I \in \mathcal{D}$ has a left half, $I_l$, and a right half, $I_r$, that are also dyadic intervals. If $I$ is any dyadic interval, we define $h_{(I)}$, the Haar function adapted to $I$, by

\[ h_{(I)}(x) \equiv \begin{cases} |I|^{-1/2} & \text{if } x \in I_l; \\ -|I|^{-1/2} & \text{if } x \in I_r; \\ 0 & \text{if } x \notin I. \end{cases} \]

It is not hard to show that the family of functions $\{h_{(I)}\}_{I \in \mathcal{D}}$ is orthonormal in $L^2(\mathbb{R})$. If $f$ is locally integrable and $I \in \mathcal{D}$, we define

\[ \lambda_I(f) \equiv \int f(x)\, h_{(I)}(x)\, dx, \]

which we call $f$’s Haar coefficient on $I$.

I claim that $\{h_{(I)}\}_{I \in \mathcal{D}}$ is a complete orthonormal system. To show this, and to better understand how the Haar system represents functions, it is convenient to see how the Haar coefficients are related to $f$’s averages over dyadic intervals. We begin with a definition.

Definition 1.2. If $J$ is any interval and $f$ is locally integrable, we set

\[ f_J \equiv \frac{1}{|J|} \int_J f\, dx, \]

the average value of $f$ on $J$.

An easy computation shows that, for any $I \in \mathcal{D}$,

\[ \lambda_I(f) = (f_{I_l} - f_I)|I|^{1/2} = -(f_{I_r} - f_I)|I|^{1/2}, \tag{1.2} \]

with the consequence that, if $I'$ is $I_l$ or $I_r$, and $x \in I'$, then

\[ f_{I'} - f_I = \lambda_I(f)\, h_{(I)}(x). \]
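The identity (1.2) is easy to test numerically. The following is a minimal sketch, assuming that $f$ is constant on each of $2^n$ equal cells of $[0, 1)$ and is stored as a list of cell values; the function names (`block_average`, `haar_coefficient`, `max_discrepancy`) are hypothetical helpers introduced only for this illustration.

```python
import math

def block_average(values, start, length):
    """Average f_J of f over the dyadic block values[start:start+length]."""
    return sum(values[start:start + length]) / length

def haar_coefficient(values, start, length):
    """Haar coefficient lambda_I(f) = <f, h_(I)> for the dyadic interval I
    made of cells start .. start+length-1 (length even), where f is constant
    on each of the len(values) equal cells of [0, 1)."""
    n = len(values)
    size = length / n                                        # |I|
    half = length // 2
    left = sum(values[start:start + half]) / n               # integral of f over I_l
    right = sum(values[start + half:start + length]) / n     # integral of f over I_r
    return (left - right) / math.sqrt(size)

def max_discrepancy(values):
    """Largest violation of identity (1.2) over all dyadic blocks of length >= 2."""
    n = len(values)
    worst = 0.0
    length = n
    while length >= 2:
        half = length // 2
        root = math.sqrt(length / n)                         # |I|^(1/2)
        for start in range(0, n, length):
            lam = haar_coefficient(values, start, length)
            f_I = block_average(values, start, length)
            f_left = block_average(values, start, half)
            f_right = block_average(values, start + half, half)
            worst = max(worst,
                        abs(lam - (f_left - f_I) * root),
                        abs(lam + (f_right - f_I) * root))
        length //= 2
    return worst

samples = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, -6.0]        # f on 8 cells of [0, 1)
print(haar_coefficient(samples, 0, 8))   # Haar coefficient on I = [0, 1)
print(max_discrepancy(samples))          # rounding-error level if (1.2) holds
```

For these sample values the integral of $f$ over the left half of $[0, 1)$ is $7/8$ and over the right half is $0$, so the coefficient on $[0, 1)$ is $7/8$.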

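Suppose $J$ is any dyadic interval, and we write $f = f_1 + f_2$, where

\[ f_1(x) = \begin{cases} f(x) - f_J & \text{if } x \in J; \\ 0 & \text{otherwise.} \end{cases} \]

Using (1.2), it is easy to see that, if $I \in \mathcal{D}$, then

\[ \lambda_I(f) = \begin{cases} \lambda_I(f_1) & \text{if } I \subset J; \\ \lambda_I(f_2) & \text{otherwise.} \end{cases} \tag{1.3} \]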

This is a special case of a general phenomenon. Suppose that $\{J_k\}_k$ is an arbitrary disjoint collection of dyadic intervals. Now split $f$ into $f_1 + f_2$, where

\[ f_1(x) = \begin{cases} f(x) - f_{J_k} & \text{if } x \in J_k; \\ 0 & \text{if } x \notin \cup_k J_k. \end{cases} \]

This definition forces $f_2$ to equal

\[ f_2(x) = \begin{cases} f_{J_k} & \text{if } x \in J_k; \\ f(x) & \text{if } x \notin \cup_k J_k. \end{cases} \]

This has the consequence that, if $I$ is any dyadic interval not properly contained in some $J_k$, then

\[ \int_I f\, dx = \int_I f_2\, dx. \]

From this it is easy to show that, for all dyadic intervals $I$,

\[ \lambda_I(f) = \begin{cases} \lambda_I(f_1) & \text{if } I \text{ is a subset of some } J_k; \\ \lambda_I(f_2) & \text{otherwise.} \end{cases} \tag{1.4} \]

What (1.3) and (1.4) are saying is that the inner product $\langle \cdot, h_{(I)}\rangle$ acts like a filter, which “catches” only the oscillation of $f$ on $I$, at scale roughly equal to $\ell(I)$.

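Let us return to (1.2). Suppose that we have a “tower” of dyadic intervals $I_0 \subset I_1 \subset I_2 \subset \cdots \subset I_N$, where $\ell(I_{k+1}) = 2\ell(I_k)$ for all $0 \le k < N$. We will refer to $I_0$ as $I$ and to $I_N$ as $J$, and we’ll set $\ell(I) = 2^p$ and $\ell(J) = 2^r$, where $r = p + N$. Repeated applications of (1.2) yield:

\[ f_I - f_J = \sum_{\substack{K \in \mathcal{D}:\ I \subset K \subset J \\ \ell(I) < \ell(K) \le \ell(J)}} \lambda_K(f)\, h_{(K)}(x). \tag{1.5} \]

Equation (1.5) holds for any $x$, and for any pair of dyadic intervals $I$ and $J$ such that $x \in I \subset J$ and $\ell(I) = 2^p$ and $\ell(J) = 2^r$. Therefore,

\[ \sum_{I \in \mathcal{D}:\ \ell(I) = 2^p} f_I \chi_I(x) - \sum_{J \in \mathcal{D}:\ \ell(J) = 2^r} f_J \chi_J(x) = \sum_{\substack{K \in \mathcal{D} \\ 2^p < \ell(K) \le 2^r}} \lambda_K(f)\, h_{(K)}(x). \tag{1.6} \]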

It is well worth the reader’s time to thoroughly understand what is going on in (1.6). The sum on the far left is what you get when you replace $f$ by its average values over dyadic intervals of length $2^p$. The sum contains infinitely many terms, but, for every $x$, at most one term will be non-zero. Similar comments apply to the second sum on the left. The sum on the right side of the equals sign also has infinitely many terms. However, for any $x$, at most $N$ of these terms will be non-zero.

It is now easy to show completeness of the Haar system. Suppose that $f \in L^2$ and $\lambda_I(f) = 0$ for all $I \in \mathcal{D}$. We need to show that $f = 0$ a.e., and for this it will be enough to show that $f$ is a.e.-constant on the half-lines $(-\infty, 0)$ and $[0, \infty)$. Let $x$ and $y$ be points in $f$’s Lebesgue set that also lie in the right half-line. We can find dyadic intervals $I_0$, $J_0$, and $K_0$ such that

\[ x \in J_0 \subset I_0, \qquad y \in K_0 \subset I_0, \]

and $f(x) - f_{J_0}$ and $f(y) - f_{K_0}$ are very small. Equation (1.6) and the assumption that $\lambda_I(f) = 0$ for all $I \in \mathcal{D}$ imply that $f_{J_0} = f_{I_0} = f_{K_0}$, which implies that $f(x) - f(y)$ is as small as we like. This shows that $f$ is a.e.-constant on $[0, \infty)$, which proves completeness.


Remark. We have actually shown that if $f$ is locally integrable, and $\lambda_I(f) = 0$ for all $I$, then $f$ is a.e. constant on $(-\infty, 0)$ and $[0, \infty)$. This is significantly stronger than completeness.

Notice what happens if we let $p \to -\infty$ and $r \to \infty$. If $f \in L^2$, the far-left sum converges to $f$ almost everywhere, while the second sum on the left converges to zero everywhere; so, the left-hand side of (1.6) converges to $f$ almost everywhere. Meanwhile, because of completeness, the right-hand side of (1.6) will converge to $f$ in $L^2$. THEREFORE, the left-hand side of (1.6) also converges to $f$ in $L^2$, and the right-hand side also converges to $f$ almost everywhere.

But more is true. Completeness ensures that, for $f \in L^2$, the sum $\sum_I \lambda_I(f)\, h_{(I)}$ converges to $f$ unconditionally in the $L^2$ norm. This means that if $F_1 \subset F_2 \subset F_3 \subset \cdots$ is any increasing chain of finite subsets of $\mathcal{D}$ such that $\cup_k F_k = \mathcal{D}$, then

\[ \sum_{I \in F_k} \lambda_I(f)\, h_{(I)} \to f \]

in $L^2$. This is a very strong statement.

It is natural to ask if (and when) we can do the same for $L^p$ when $p \ne 2$.

Another consequence of completeness is the equation

\[ \int |f|^2\, dx = \sum_I |\lambda_I(f)|^2, \tag{1.7} \]

valid for all $f \in L^2$.

We will now rewrite the right side of (1.7) in a funny way. Noticing that

\[ 1 = \frac{1}{|I|} \int \chi_I(x)\, dx, \]

we get:

\[ \begin{aligned} \sum_I |\lambda_I(f)|^2 &= \sum_I |\lambda_I(f)|^2\, \frac{1}{|I|} \int \chi_I(x)\, dx \\ &= \int \left( \sum_I \frac{|\lambda_I(f)|^2}{|I|}\, \chi_I(x) \right) dx. \end{aligned} \]

The one-dimensional dyadic square function, which makes sense for any $f \in L^1_{loc}(\mathbb{R})$, is defined by the equation

\[ S(f)(x) \equiv \left( \sum_I \frac{|\lambda_I(f)|^2}{|I|}\, \chi_I(x) \right)^{1/2}. \]

Equation (1.7) simply states that

\[ \|f\|_2 = \|S(f)\|_2 \tag{1.8} \]

for $f \in L^2$. Before going further, let us talk about what (1.8) means in terms of summation error. A theorem from probability, which we alluded to earlier, states that if $a_1, a_2, a_3, \ldots, a_N$ are complex numbers, then the expected value of $|\pm a_1 \pm a_2 \pm a_3 \cdots \pm a_N|$, averaged over all changes in sign, will be comparable to $\left( \sum_1^N |a_i|^2 \right)^{1/2}$. If we sum up a Haar function series $\sum_I \lambda_I(f)\, h_{(I)}$ to get $f$, we will typically not have the true series, but one with errors in it, something like:

\[ \sum_I \lambda_I(f)(1 + \epsilon_I)\, h_{(I)}, \]

where, to make things simple, we’ll assume that each $\epsilon_I = \pm\epsilon$, where $\epsilon$ is small. If the plus-or-minus signs are independent and fairly distributed, the expected size of the accumulated error in the sum will be comparable to

\[ \epsilon \left( \sum_I |\lambda_I(f)|^2\, |h_{(I)}|^2 \right)^{1/2}. \]

But the last quantity (check it) is just $\epsilon$ times $S(f)$!

Equation (1.8) says that if we use a Haar series to reproduce an $L^2$ function, and make reasonable assumptions about the errors in our coefficients, we can trust that, in some averaged sense, the “noise” will not overwhelm the signal.
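Identity (1.8) can be illustrated on a discretized function. The sketch below assumes that $f$ is constant on $2^n$ equal cells of $[0, 1)$, vanishes off $[0, 1)$, and has mean zero; under these assumptions only the dyadic intervals $I \subset [0, 1)$ of length at least two cells carry non-zero Haar coefficients, so the finite sum computed here is the whole square function on $[0, 1)$. The names `square_function` and `l2_norm` are hypothetical helpers, not notation from the lectures.

```python
import math
import random

def square_function(values):
    """Samples of S(f) on each cell, for f constant on len(values) equal cells
    of [0, 1) (len a power of 2), summing over dyadic I inside [0, 1)."""
    n = len(values)
    s2 = [0.0] * n
    length = n
    while length >= 2:
        half = length // 2
        size = length / n                                      # |I|
        for start in range(0, n, length):
            left = sum(values[start:start + half]) / n         # integral over I_l
            right = sum(values[start + half:start + length]) / n
            lam = (left - right) / math.sqrt(size)             # lambda_I(f)
            for i in range(start, start + length):
                s2[i] += lam * lam / size                      # |lambda_I|^2 / |I| on I
        length //= 2
    return [math.sqrt(v) for v in s2]

def l2_norm(samples):
    """L^2([0, 1)) norm of a function that is constant on equal cells."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))

random.seed(0)
raw = [random.uniform(-1.0, 1.0) for _ in range(64)]
mean = sum(raw) / len(raw)
f = [v - mean for v in raw]              # subtract the mean so the average over [0, 1) is 0

norm_f = l2_norm(f)
norm_Sf = l2_norm(square_function(f))
print(norm_f, norm_Sf)                   # the two norms should agree up to rounding
```

If the mean is not subtracted, the intervals $[0, 2^k)$ with $k \ge 1$ also contribute to $S(f)$, and the truncated sum above would underestimate $\|S(f)\|_2$.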

The beauty of the Haar system is that, unlike with Fourier series, the same thing can be said of Haar function decompositions in $L^p$ for $p$’s different from 2. This is a result of the following theorem:

Theorem 1.3. If $1 < p < \infty$, there are constants $c_1(p)$ and $c_2(p)$ such that, for all $f \in L^p$,

\[ c_1(p)\|f\|_p \le \|S(f)\|_p \le c_2(p)\|f\|_p. \tag{1.9} \]

We will prove Theorem 1.3 in the next lecture. For now let us notice that it answers, in the affirmative, the question we posed a few paragraphs ago.

Corollary 1.4. Let $F_1 \subset F_2 \subset F_3 \subset \cdots$ be an increasing chain of finite subsets of $\mathcal{D}$ such that $\cup_k F_k = \mathcal{D}$. If $1 < p < \infty$ and $f \in L^p$ then

\[ \sum_{I \in F_k} \lambda_I(f)\, h_{(I)} \to f \]

in $L^p$.

In the language of Banach space theory, Corollary 1.4 says that the Haar system is an unconditional basis for $L^p$ when $1 < p < \infty$.

The proof of Corollary 1.4 is easy, given Theorem 1.3. For each $k$, set

\[ f_{(k)} = \sum_{I \in F_k} \lambda_I(f)\, h_{(I)}, \]

for $f \in L^p$. The right-hand inequality of (1.9) implies that $S(f) < \infty$ almost everywhere, from which it is easy to see that $S(f - f_{(k)}) \to 0$ almost everywhere. It is trivial that $S(f - f_{(k)}) \le S(f)$ pointwise. Therefore, the Dominated Convergence Theorem implies that $\|S(f - f_{(k)})\|_p \to 0$, and now the left-hand inequality in (1.9) yields that $\|f - f_{(k)}\|_p \to 0$.

The proof of Theorem 1.3 will use this well-known result from harmonic analysis.

Theorem 1.5. If $f$ is locally integrable, we set

\[ M_d(f)(x) \equiv \sup_{I:\ x \in I \in \mathcal{D}} \frac{1}{|I|} \int_I |f|\, dt, \]

which is called the dyadic Hardy-Littlewood maximal function of $f$. For $1 < p \le \infty$ there is a constant $c_p$ such that

\[ \|M_d(f)\|_p \le c_p \|f\|_p \]

for all $f \in L^p$.

Theorem 1.5 fails for $p = 1$. Not surprisingly, Theorem 1.3 also fails for $p = 1$. What should surprise the reader is that only the right-hand inequality of (1.9) fails when $p = 1$: the left-hand inequality is true. Showing this is not extremely difficult, but it goes beyond the scope of these lectures. Both parts of (1.9) fail when $p = \infty$. Showing this is easy, and it will make an excellent exercise for the reader.

LECTURE II

We will prove Theorem 1.3 in 3 easy steps. Steps 1 and 2 are merely a “dyadic” rephrasing of arguments given in [CW]¹ and [CWW], while Step 3 is a standard duality trick, such as can be found in [St].

Step 1. We will show that, for any non-negative function $v$ and any $f$,

\[ \int (S(f))^2\, v\, dx \le C \int |f|^2\, M_d(v)\, dx, \]

where $C$ is an absolute constant.

Step 2. We will show that, for all $f \in L^1$ and all $\lambda > 0$,

\[ |\{x : S(f)(x) > \lambda\}| \le \frac{C}{\lambda} \int |f|\, dx, \]

where, again, $C$ is an absolute constant.

Step 1 implies that $\|S(f)\|_p \le C_p \|f\|_p$ for all $2 \le p < \infty$, while Step 2, with interpolation, implies that $\|S(f)\|_p \le C_p \|f\|_p$ for all $1 < p < 2$.

We will obtain the converse inequality, $\|f\|_p \le C_p \|S(f)\|_p$, by means of a duality trick. That will be Step 3. The duality trick will make use of a little result which is of value in its own right. The proof of this result uses the following handy definition.

Definition 2.1. If $N$ is a positive integer, we set

\[ \mathcal{D}(N) = \{ I \in \mathcal{D} : I \subset [-2^N, 2^N),\ \ell(I) > 2^{-N} \}. \]

Each collection $\mathcal{D}(N)$ consists of dyadic intervals that aren’t too big, aren’t too small, and aren’t too far from the origin. The families $\mathcal{D}(N)$ are finite, increasing with $N$, and $\cup_N \mathcal{D}(N) = \mathcal{D}$. We denote the two maximal elements of $\mathcal{D}(N)$, namely $[-2^N, 0)$ and $[0, 2^N)$, by $J_N^-$ and $J_N^+$, respectively.

The little result is:

Theorem 2.2. If $f \in L^p$, $1 < p < \infty$, then

\[ \sum_{I \in \mathcal{D}(N)} \lambda_I(f)\, h_{(I)} \to f \]

in $L^p$.

Remark. Theorem 2.2 FAILS when $p = 1$ and $p = \infty$.

¹ The argument in [CW] actually shows that

\[ \int (S(f))^p\, v\, dx \le C(p) \int |f|^p\, M_d(v)\, dx \]

for $1 < p \le 2$. Their proof is elegant and not too long, but it makes use of a deep result of Fefferman and Stein [FS], and this was only a three-day course! We enthusiastically refer the reader to [CW].

Proof. By our previous work, relating Haar decompositions to averages, we can see that

\[ \sum_{I \in \mathcal{D}(N)} \lambda_I(f)\, h_{(I)} = \sum_{\substack{I \in \mathcal{D},\ \ell(I) = 2^{-N} \\ I \subset [-2^N, 2^N)}} f_I \chi_I - \left( f_{J_N^-} \chi_{J_N^-} + f_{J_N^+} \chi_{J_N^+} \right). \]

The expression on the right converges to $f(x)$ almost everywhere and is pointwise dominated by $2M_d(f)$. The result now follows from the Dominated Convergence Theorem.

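Proof of Step 1. For $k = 0, \pm 1, \pm 2, \ldots$, set

\[ E_k \equiv \{ I \in \mathcal{D} : 2^k < v_I \le 2^{k+1} \}. \]

Notice that every $I \in E_k$ is a subset of $\Omega_k \equiv \{x : M_d(v)(x) > 2^k\}$.

We write:

\[ \begin{aligned} \int (S(f))^2\, v\, dx &= \int \left( \sum_I \frac{|\lambda_I(f)|^2}{|I|}\, \chi_I \right) v\, dx \\ &= \sum_I |\lambda_I(f)|^2\, v_I \\ &= \sum_k \sum_{I \in E_k} |\lambda_I(f)|^2\, v_I \\ &\le \sum_k 2^{k+1} \sum_{I \in E_k} |\lambda_I(f)|^2. \end{aligned} \]

However, if $I \in E_k$,

\[ \lambda_I(f) = \langle f, h_{(I)} \rangle = \langle f\chi_I, h_{(I)} \rangle = \langle f\chi_{\Omega_k}, h_{(I)} \rangle, \]

and therefore

\[ \sum_{I \in E_k} |\lambda_I(f)|^2 = \sum_{I \in E_k} |\langle f\chi_{\Omega_k}, h_{(I)} \rangle|^2 \le \int |f|^2 \chi_{\Omega_k}\, dx. \]

Plugging this back in, and summing, we get

\[ \int (S(f))^2\, v\, dx \le \int |f|^2 \left( \sum_k 2^{k+1} \chi_{\Omega_k} \right) dx \le C \int |f|^2\, M_d(v)\, dx, \]

which proves Step 1.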
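The weighted inequality of Step 1 can be tried out on discretized data. The following is a rough sketch under stated assumptions: $f$ and $v \ge 0$ are constant on $2^n$ equal cells of $[0, 1)$, and both the square function and the maximal function are restricted to dyadic intervals inside $[0, 1)$ (the argument above goes through unchanged for this restricted family). Tracing the constants in the proof, $\sum_{k:\, 2^k < M} 2^{k+1} < 4M$, so $C = 4$ should suffice in this setting; that value is read off from the proof here, not stated in the lectures. All function names are hypothetical.

```python
import random

def dyadic_blocks(n):
    """Yield (start, length) for every dyadic block of an n-cell grid, n a power of 2."""
    length = n
    while length >= 1:
        for start in range(0, n, length):
            yield start, length
        length //= 2

def square_function_squared(f):
    """Samples of S(f)^2 on each cell, using dyadic intervals inside [0, 1)."""
    n = len(f)
    s2 = [0.0] * n
    for start, length in dyadic_blocks(n):
        if length < 2:
            continue                                   # a single cell carries no oscillation
        half = length // 2
        size = length / n                              # |I|
        left = sum(f[start:start + half]) / n          # integral of f over I_l
        right = sum(f[start + half:start + length]) / n
        lam_sq = (left - right) ** 2 / size            # |lambda_I(f)|^2
        for i in range(start, start + length):
            s2[i] += lam_sq / size
    return s2

def dyadic_maximal(v):
    """Samples of the dyadic maximal function M_d(v), restricted to blocks inside [0, 1)."""
    n = len(v)
    m = [0.0] * n
    for start, length in dyadic_blocks(n):
        avg = sum(abs(x) for x in v[start:start + length]) / length
        for i in range(start, start + length):
            m[i] = max(m[i], avg)
    return m

def weighted_sides(f, v):
    """Return (integral of S(f)^2 v, integral of |f|^2 M_d(v)) over [0, 1)."""
    n = len(f)
    s2 = square_function_squared(f)
    mv = dyadic_maximal(v)
    lhs = sum(a * b for a, b in zip(s2, v)) / n
    rhs = sum(x * x * y for x, y in zip(f, mv)) / n
    return lhs, rhs

random.seed(1)
f = [random.uniform(-1.0, 1.0) for _ in range(32)]
v = [random.choice([0.0, 0.5, 8.0]) for _ in range(32)]     # a spiky non-negative weight
lhs, rhs = weighted_sides(f, v)
print(lhs, rhs, lhs <= 4.0 * rhs)
```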

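Proof of Step 2. Take $f \in L^1$ and, for $\lambda > 0$, let $\{I_k^\lambda\}$ be the maximal dyadic intervals such that

\[ \frac{1}{|I_k^\lambda|} \int_{I_k^\lambda} |f|\, dx > \lambda. \]

Because of maximality, each of these intervals also satisfies

\[ \frac{1}{|I_k^\lambda|} \int_{I_k^\lambda} |f|\, dx \le 2\lambda. \]

We also have the estimate

\[ \sum_k |I_k^\lambda| \le \frac{1}{\lambda} \int |f|\, dx. \]

Write $f = f_1 + f_2$, where

\[ f_1(x) = \begin{cases} f(x) - f_{I_k^\lambda} & \text{if } x \in I_k^\lambda; \\ 0 & \text{if } x \notin \cup_k I_k^\lambda. \end{cases} \]

This forces

\[ f_2(x) = \begin{cases} f_{I_k^\lambda} & \text{if } x \in I_k^\lambda; \\ f(x) & \text{if } x \notin \cup_k I_k^\lambda. \end{cases} \]

We wish to show that

\[ |\{x : S(f)(x) > \lambda\}| \le \frac{C}{\lambda} \int |f|\, dx, \tag{2.1} \]

and for this it will be enough to show that $|\{x : S(f_1)(x) > \lambda/2\}|$ and $|\{x : S(f_2)(x) > \lambda/2\}|$ are bounded by the right-hand side of (2.1).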

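The bound on $S(f_1)$ is easy because, as the reader can quickly verify, the support of $S(f_1)$ is contained in the union of the $I_k^\lambda$’s, whose total measure is at most $\lambda^{-1} \int |f|\, dx$. The proof of this relies on the “filtering” properties of the Haar coefficients we observed in Lecture I.

The bound on $S(f_2)$ uses our (unweighted) $L^2$ estimate for the square function, and easy pointwise estimates on $f_2$. We have:

\[ \begin{aligned} \int (S(f_2))^2\, dx &= \int |f_2|^2\, dx \\ &= \int_{\mathbb{R} \setminus \cup_k I_k^\lambda} |f_2|^2\, dx + \int_{\cup_k I_k^\lambda} |f_2|^2\, dx \\ &= \int_{\mathbb{R} \setminus \cup_k I_k^\lambda} |f|^2\, dx + \sum_k |f_{I_k^\lambda}|^2\, |I_k^\lambda| \\ &\le \lambda \int_{\mathbb{R} \setminus \cup_k I_k^\lambda} |f|\, dx + 4\lambda^2 \sum_k |I_k^\lambda| \\ &\le \lambda \int |f|\, dx + 4\lambda \int |f|\, dx. \end{aligned} \]

Chebyshev’s inequality (dividing by $(\lambda/2)^2$) now yields our estimate.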

Combining Step 1 and Step 2, we now have that $\|S(f)\|_p \le C(p)\|f\|_p$ when $1 < p < \infty$.

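Proof of Step 3. We want to show that $\|f\|_p \le C_p \|S(f)\|_p$, for $1 < p < \infty$. Take $f \in L^p$. Because of Theorem 2.2, we can assume that $f$ is a finite linear sum of Haar functions:

\[ f = \sum_I \lambda_I(f)\, h_{(I)}. \]

Let $\phi \in L^{p'}$, where $p'$ is $p$’s dual index, and suppose that $\|\phi\|_{p'} = 1$. Write:

\[ \begin{aligned} \left| \int f(x)\phi(x)\, dx \right| &= \left| \int \left( \sum_I \lambda_I(f)\, h_{(I)} \right) \phi\, dx \right| \\ &= \left| \sum_I \lambda_I(f)\lambda_I(\phi) \right| \\ &\le \sum_I |\lambda_I(f)||\lambda_I(\phi)| \\ &= \int \left( \sum_I \frac{|\lambda_I(f)||\lambda_I(\phi)|}{|I|}\, \chi_I \right) dx \\ &\le \int S(f)\, S(\phi)\, dx \\ &\le \|S(f)\|_p \|S(\phi)\|_{p'} \\ &\le C_{p'} \|S(f)\|_p, \end{aligned} \]

finishing the proof.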

Theorem 1.3 is a quantitative statement of the fact that the size of $f$ controls the size of $S(f)$ and vice-versa. But this control is really much tighter than Theorem 1.3 might lead one to believe.

Theorem 2.3. There are positive constants $C_1$ and $C_2$ such that, if $\|f\|_\infty \le 1$ and $f$’s support is contained in $I \in \mathcal{D}$, then, for all $\lambda > 0$,

\[ |\{x \in I : S(f)(x) > \lambda\}| \le C_1 \exp(-C_2 \lambda^2)\, |I|. \]

Theorem 2.4. There are positive constants $C_1$ and $C_2$ such that, if $\|S(f)\|_\infty \le 1$, $f$’s support is contained in $I \in \mathcal{D}$, and $\int_I f = 0$, then, for all $\lambda > 0$,

\[ |\{x \in I : |f(x)| > \lambda\}| \le C_1 \exp(-C_2 \lambda^2)\, |I|. \]

Moreover, in this case we may take $C_1 = 2$ and $C_2 = 1/2$.

The proof of Theorem 2.3 is very fast, and we will give it right now. The proof of Theorem 2.4 will come in the next lecture.

Proof of Theorem 2.3. There is a constant $C$ such that, if $v$ is any non-negative function supported in $I \in \mathcal{D}$, then

\[ \int_I M_d(v)\, dx \le C \int_I v(x) \log(e + v(x)/v_I)\, dx. \]

(See [St] for a proof.) Set

\[ E_\lambda = \{x \in I : S(f)(x) > \lambda\}. \]

We have:

\[ \int (S(f))^2\, v\, dx \le C \int |f|^2\, M_d(v)\, dx. \]

If we now put $v = \chi_{E_\lambda}$, our hypotheses on $f$ and some computations imply:

\[ \lambda^2 |E_\lambda| \le C |E_\lambda| \log(e + |I|/|E_\lambda|). \]

If $|E_\lambda| = 0$ there is nothing to prove. Otherwise, a little algebra quickly yields:

\[ \frac{|E_\lambda|}{|I|} \le C_1 \exp(-C_2 \lambda^2). \]

LECTURE III

It is enough to prove Theorem 2.4 when $I = I_0 \equiv [0, 1)$. The theorem is an immediate consequence of the following lemma ([CWW]).

Lemma 3.1. Let $g \in L^1$ have support contained in $I_0$ and satisfy $\int g\, dx = 0$. Then

\[ \int_{I_0} \exp\left( g(x) - (1/2)(S(g)(x))^2 \right) dx \le 1. \]

Assuming we have the lemma, we prove Theorem 2.4 this way. Set $g = \lambda f$. Lemma 3.1 and the bound on $\|S(f)\|_\infty$ imply

\[ \int_{I_0} \exp(\lambda f(x))\, dx \le \exp((1/2)\lambda^2), \]

from which Chebyshev’s inequality yields

\[ |\{x \in I_0 : f(x) > \lambda\}| \le \exp(-(1/2)\lambda^2). \]

The same argument, applied to $g = -\lambda f$, yields

\[ |\{x \in I_0 : f(x) < -\lambda\}| \le \exp(-(1/2)\lambda^2), \]

and these together imply

\[ |\{x \in I_0 : |f(x)| > \lambda\}| \le 2\exp(-(1/2)\lambda^2), \]

which is Theorem 2.4.

We now prove Lemma 3.1.

Define $g_0 \equiv 0$, and, for $k \ge 1$, set

\[ g_k = g_{k-1} + \sum_{I:\ \ell(I) = 2^{-k+1}} \lambda_I(g)\, h_{(I)}. \]

Let the reader note that the function $g_k$ is simply

\[ \sum_{\substack{I \subset [0,1) \\ \ell(I) = 2^{-k}}} g_I \chi_I; \]

and that

\[ (S(g_k))^2 = \sum_{I:\ \ell(I) > 2^{-k}} \frac{|\lambda_I(g)|^2}{|I|}\, \chi_I \]

and that both of these functions are constant across dyadic intervals of length $2^{-k}$.

It is clear that $g_k \to g$ a.e. and that $S(g_k) \nearrow S(g)$ everywhere as $k \to \infty$. Therefore, by Fatou’s Lemma, it is enough to show that

\[ \int_{I_0} \exp\left( g_k(x) - (1/2)(S(g_k)(x))^2 \right) dx \le 1 \]

for all $k$. We will prove this latter fact by induction.

The statement is trivial for $k = 0$. So, assume it’s true for $k$. Then,

\[ g_{k+1} - (1/2)(S(g_{k+1}))^2 = (g_{k+1} - g_k) + \left( g_k - (1/2)(S(g_k))^2 \right) - (1/2) \sum_{I:\ \ell(I) = 2^{-k}} \frac{|\lambda_I(g)|^2}{|I|}\, \chi_I. \]

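But $g_k - (1/2)(S(g_k))^2$ is constant across dyadic intervals of length $2^{-k}$. Let $J$ be one such interval. We can write

\[ \begin{aligned} \int_J \exp\left( g_{k+1} - (1/2)(S(g_{k+1}))^2 \right) dx &= \left( \int_J \exp\left( g_k - (1/2)(S(g_k))^2 \right) dx \right) \times \left( \frac{1}{|J|} \int_J \exp\left( g_{k+1} - g_k - (1/2) \sum_{I:\ \ell(I) = 2^{-k}} \frac{|\lambda_I(g)|^2}{|I|}\, \chi_I \right) dx \right) \\ &= \left( \int_J \exp\left( g_k - (1/2)(S(g_k))^2 \right) dx \right) \times \left( \frac{1}{|J|} \int_J \exp\left( g_{k+1} - g_k - (1/2) \frac{|\lambda_J(g)|^2}{|J|} \right) dx \right) \\ &\equiv A(J) \cdot B(J), \end{aligned} \]

where the summation inside the exponential goes away because, on $J$, it has only one term; namely, $I = J$.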

Our induction hypothesis says that $\sum_J A(J) \le 1$. Therefore we will be done if we can show that $B(J) \le 1$ for each $J$, where

\[ B(J) = \frac{1}{|J|} \int_J \exp\left( g_{k+1} - g_k - (1/2) \frac{|\lambda_J(g)|^2}{|J|} \right) dx. \]

On the interval $J$,

\[ g_{k+1}(x) - g_k(x) = \begin{cases} c & \text{on } J_l; \\ -c & \text{on } J_r, \end{cases} \]

for some real number $c$. On the other hand,

\[ (1/2) \frac{|\lambda_J(g)|^2}{|J|} = (1/2)c^2. \]

Therefore $B(J)$ is equal to

\[ \exp(-(1/2)c^2)\, \frac{1}{|J|} \int_J \left( e^{c} \chi_{J_l} + e^{-c} \chi_{J_r} \right) dx = \exp(-(1/2)c^2) \cosh(c). \]

But the last quantity is less than or equal to 1 (compare the power series of $e^{x^2/2}$ and $\cosh(x)$). That proves Lemma 3.1 and thus Theorem 2.4.
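Lemma 3.1 can be checked exactly on step functions, since for a mean-zero $g$ that is constant on $2^n$ equal cells of $[0, 1)$ the square function involves only finitely many intervals. The sketch below is an illustration under those assumptions; `lemma_integral` is a hypothetical helper name, and the random test data are arbitrary.

```python
import math
import random

def lemma_integral(g):
    """Integral over [0, 1) of exp(g - S(g)^2 / 2), for g constant on len(g)
    equal cells (len a power of 2) and with mean zero on [0, 1)."""
    n = len(g)
    s2 = [0.0] * n
    length = n
    while length >= 2:
        half = length // 2
        size = length / n                                   # |I|
        for start in range(0, n, length):
            left = sum(g[start:start + half]) / n           # integral of g over I_l
            right = sum(g[start + half:start + length]) / n
            term = (left - right) ** 2 / size ** 2          # |lambda_I(g)|^2 / |I|
            for i in range(start, start + length):
                s2[i] += term
        length //= 2
    return sum(math.exp(gv - 0.5 * sv) for gv, sv in zip(g, s2)) / n

random.seed(2)
raw = [random.uniform(-3.0, 3.0) for _ in range(128)]
mean = sum(raw) / len(raw)
g = [x - mean for x in raw]              # enforce the mean-zero hypothesis
value = lemma_integral(g)
print(value)                             # Lemma 3.1 predicts a value <= 1

# The one-step inequality behind the induction: cosh(c) <= exp(c^2 / 2).
one_step_ok = all(math.cosh(c) <= math.exp(0.5 * c * c) + 1e-12
                  for c in [0.0, 0.1, 0.5, 1.0, 2.0, 5.0])
print(one_step_ok)
```

Taking $g \equiv 0$ gives the value 1, so the bound in the lemma cannot be improved.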

We have shown that $f$ and $S(f)$ control each other, essentially, to the same degree: a bounded $f$ implies an exponentially $L^2$ bounded $S(f)$, and vice-versa. It is natural to ask how sharp this control is.

The answer is: very sharp.

Let $r_0(t)$ be the unique function with period equal to 2 defined by

\[ r_0(t) \equiv \begin{cases} 1 & \text{if } 0 \le t < 1; \\ -1 & \text{if } 1 \le t < 2; \end{cases} \]

and, for $n \ge 1$ define $r_n(t) \equiv r_0(2^n t)$. These are the Rademacher functions. They are independent random variables on the probability space $[0, 1)$.

For $N \ge 1$ and $t \in [0, 1)$ define

\[ f_N(t) \equiv \frac{1}{\sqrt{N}} \sum_{n=1}^{N} r_n(t). \]

It is easy to compute that $S(f_N) \equiv 1$. According to the central limit theorem ([C]), for every $\lambda > 0$,

\[ |\{t \in [0, 1) : |f_N(t)| > \lambda\}| \to \frac{1}{\sqrt{2\pi}} \int_{|y| > \lambda} \exp(-y^2/2)\, dy \]

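as $N \to \infty$.

For the other direction, define

\[ g_0(t) \equiv \begin{cases} 2 & \text{if } 1/4 \le t < 1/2; \\ -1 & \text{if } 1/2 \le t < 1; \\ 0 & \text{otherwise;} \end{cases} \]

and for $n \ge 1$ set $g_n(t) \equiv g_0(4^n t)$. The functions $g_n$ have disjoint supports, and therefore

\[ \Big\| \sum_{n=0}^{\infty} g_n \Big\|_\infty = 2. \]

For any dyadic interval $I$, at most one of the inner products $\langle g_n, h_{(I)} \rangle$ is non-zero, and therefore

\[ \Big( S\Big( \sum_{n=0}^{\infty} g_n \Big) \Big)^2 = \sum_{n=0}^{\infty} (S(g_n))^2. \]

The function $(S(g_0))^2$ equals 1 on $[1/2, 1)$ and 2 on $[0, 1/2)$. Similarly, each $(S(g_n))^2$ equals 1 on $[(1/2)4^{-n}, 4^{-n})$ and 2 on $[0, (1/2)4^{-n})$. Thus $S(\sum_0^\infty g_n) \sim \sqrt{k}$ when $x \sim 4^{-k}$, which means that $S(\sum_0^\infty g_n) \sim \sqrt{\log(1/x)}$ as $x \to 0^+$; and this is (locally) in exponential $L^2$, but no better.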
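The Rademacher example can be reproduced on a grid. This sketch samples $f_N$ on the $2^N$ dyadic cells of $[0, 1)$, on which every $r_n$ with $1 \le n \le N$ is constant, computes the square function over dyadic intervals inside $[0, 1)$, and compares the measure of $\{|f_N| > \lambda\}$ with the Gaussian tail. The helper names are hypothetical, and the choices $N = 10$, $\lambda = 1$ are arbitrary.

```python
import math

def rademacher_sum(N):
    """Samples of f_N = N^(-1/2) (r_1 + ... + r_N) on the 2^N dyadic cells of [0, 1)."""
    out = []
    for i in range(2 ** N):
        total = 0
        for n in range(1, N + 1):
            # For t in cell i, floor(2^n t) = i >> (N - n); r_n is +1 when it is even.
            total += 1 if ((i >> (N - n)) & 1) == 0 else -1
        out.append(total / math.sqrt(N))
    return out

def square_function(values):
    """Samples of S(f) per cell, summing over dyadic intervals inside [0, 1)."""
    n = len(values)
    s2 = [0.0] * n
    length = n
    while length >= 2:
        half = length // 2
        size = length / n                                      # |I|
        for start in range(0, n, length):
            left = sum(values[start:start + half]) / n
            right = sum(values[start + half:start + length]) / n
            term = (left - right) ** 2 / size ** 2             # |lambda_I|^2 / |I|
            for i in range(start, start + length):
                s2[i] += term
        length //= 2
    return [math.sqrt(v) for v in s2]

N = 10
lam = 1.0
f_N = rademacher_sum(N)
S = square_function(f_N)
fraction = sum(1 for x in f_N if abs(x) > lam) / len(f_N)   # |{ |f_N| > lam }|
gaussian_tail = math.erfc(lam / math.sqrt(2.0))             # (2 pi)^(-1/2) * integral over |y| > lam
print(min(S), max(S))            # both should be 1 up to rounding
print(fraction, gaussian_tail)   # close for large N, by the central limit theorem
```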


There is a little more to this story. It is natural to ask why we didn’t prove Theorem 2.4 by first showing

\[ \int |f|^2\, v\, dx \le C \int (S(f))^2\, M_d(v)\, dx. \tag{3.1} \]

The reason is simple: (3.1) is false. For $N \gg 1$, set $v = 2^N \chi_{[0, 2^{-N})}$. For $k = 0, 1, 2, \ldots$, let $I_k = [0, 2^{-k})$, and define

\[ f(x) = \sum_{k=0}^{N-1} \frac{1}{2^{k/2}(N - k)}\, h_{(I_k)}(x). \]

Then $|f| \sim \log N$ on $[0, 2^{-N})$, implying $\int |f|^2\, v\, dx \ge C(\log N)^2$.

On the other hand, when $x \in [2^{-k-1}, 2^{-k})$ ($0 < k < N - 1$), $M_d(v) \sim 2^k$ and

\[ S^2(f) \le \frac{1}{(N - k)^2} + \frac{1}{(N - k + 1)^2} + \cdots + \frac{1}{N^2} \le C/(N - k), \]

implying $\int (S(f))^2\, M_d(v)\, dx \le C \log N$.

So, it seems that, while $f$ and $S(f)$ exercise essentially equal degrees of control over each other, the control exercised by $f$ is just a little bit stronger; or that, in some averaged, asymptotic sense, $S(f)$ is a little bit smaller than $f$.

We want to say a few words about how results on the dyadic square function can be extended to higher dimensions, after which we will talk about one of its many “continuous” analogues.

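The family of dyadic cubes $\mathcal{D}_d$ in $\mathbb{R}^d$ is defined by

\[ \mathcal{D}_d \equiv \{ Q = I_1 \times \cdots \times I_d : I_i \in \mathcal{D},\ \ell(I_1) = \cdots = \ell(I_d) \}, \]

and we use $\ell(Q)$ to denote the common length $\ell(I_i)$. If $f \in L^1_{loc}(\mathbb{R}^d)$ and $Q \in \mathcal{D}_d$ we set

\[ f_Q \equiv \frac{1}{|Q|} \int_Q f\, dx \]

and, for $k$ an integer,

\[ f_k = \sum_{Q \in \mathcal{D}_d:\ \ell(Q) = 2^k} f_Q \chi_Q(x). \]

If $Q \in \mathcal{D}_d$ and $\ell(Q) = 2^k$, we define

\[ a_{(Q)}(f) \equiv (f_{k-1} - f_k)\chi_Q. \]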


The functions $a_{(Q)}(f)$ play a role analogous to that of $\langle f, h_{(I)} \rangle h_{(I)}$ in one dimension. In particular, the family $\{a_{(Q)}(f)\}_Q$ is pairwise orthogonal and, if $f \in L^2(\mathbb{R}^d)$, the sum $\sum_{Q \in \mathcal{D}_d} a_{(Q)}(f)$ converges to $f$ in $L^2$, implying that

\[ \begin{aligned} \int |f|^2\, dx &= \sum_{Q \in \mathcal{D}_d} \|a_{(Q)}(f)\|_2^2 \\ &= \int \left( \sum_{Q \in \mathcal{D}_d} \frac{\|a_{(Q)}(f)\|_2^2}{|Q|}\, \chi_Q \right) dx \\ &= \int (S_d(f))^2\, dx, \end{aligned} \]

where we define

\[ S_d(f)(x) \equiv \left( \sum_{Q \in \mathcal{D}_d} \frac{\|a_{(Q)}(f)\|_2^2}{|Q|}\, \chi_Q \right)^{1/2}, \]

which is the $d$-dimensional dyadic square function.

This square function satisfies the same boundedness properties as its one-dimensional version, with proofs that are essentially identical. In particular, we have $\|S_d(f)\|_p \sim \|f\|_p$ for $f \in L^p(\mathbb{R}^d)$ ($1 < p < \infty$) and that

\[ \int (S_d(f))^2\, v\, dx \le C(d) \int |f|^2\, M_d(v)\, dx \]

for all weights $v$, where here $M_d(\cdot)$ denotes the maximal operator in which the averages are taken over dyadic cubes instead of intervals¹. As in $\mathbb{R}^1$, this last inequality implies that, if $f \in L^\infty$, $S_d(f)$ is locally exponentially square integrable. The converse implication also holds, but for this the changes in the proof are not entirely trivial.

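Let us note that, for every $Q \in \mathcal{D}_d$,

\[ \frac{\|a_{(Q)}(f)\|_2^2}{|Q|} \sim \|a_{(Q)}(f)\|_\infty^2, \]

with approximate proportionality constants that only depend on $d$. Therefore

\[ S_d(f) \sim S_\infty(f), \]

where

\[ S_\infty(f) \equiv \left( \sum_{Q \in \mathcal{D}_d} \|a_{(Q)}(f)\|_\infty^2\, \chi_Q \right)^{1/2}, \]

and it is enough to show the following: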

¹ Also as in $\mathbb{R}^1$, this inequality holds for $1 < p < 2$ as well; see [CW].


Theorem 3.2. There are positive constants $C_1(d)$ and $C_2(d)$ such that if $\|S_\infty(f)\|_\infty \le 1$, the support of $f$ is contained in a dyadic cube $Q$, and $\int f = 0$, then, for all $\lambda > 0$,

\[ |\{x \in Q : |f(x)| > \lambda\}| \le C_1(d) \exp(-C_2(d)\lambda^2)\, |Q|. \]

If the reader tries to redo the proof of Theorem 2.4 here, he will find that everything goes fine until he comes to the estimate:

\[ \frac{1}{|Q|} \int_Q \exp(a_{(Q)}(f))\, dx \le \exp\left( c\|a_{(Q)}(f)\|_\infty^2 \right). \]

This estimate is the burden of the following lemma:

Lemma 3.3. Let $(\Omega, P)$ be a probability space and let $\phi : \Omega \to \mathbb{R}$ satisfy (i) $\|\phi\|_\infty \le t$ and (ii) $\int_\Omega \phi\, dP = 0$. Then

\[ \int_\Omega \exp(\phi)\, dP \le \cosh(t). \tag{iii} \]

Proof of Lemma. Let $\mu$ be $\phi$’s probability distribution. This is a Borel measure, supported on $\mathbb{R}$, such that, for all Borel $E \subset \mathbb{R}$,

\[ P(\{\omega \in \Omega : \phi(\omega) \in E\}) = \mu(E). \]

Hypotheses (i) and (ii) are respectively equivalent to (i′) $\operatorname{supp} \mu \subset [-t, t]$ and (ii′) $\int x\, d\mu(x) = 0$. The lemma’s conclusion (iii) is equivalent to

\[ \int_{-t}^{t} \exp(x)\, d\mu(x) \le \cosh(t). \tag{iii′} \]

But (iii′) is easy. On $[-t, t]$ the graph of $\exp(x)$ lies below the straight line passing through $(-t, \exp(-t))$ and $(t, \exp(t))$, which has equation $y = mx + b$, with $b = \cosh(t)$. Therefore,

\[ \begin{aligned} \int_{-t}^{t} \exp(x)\, d\mu(x) &\le \int_{-t}^{t} (mx + b)\, d\mu(x) \\ &= \int_{-t}^{t} b\, d\mu(x) \\ &= b, \end{aligned} \]

which proves the lemma.
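The chord argument is simple enough to test on finite probability spaces. In the sketch below, the slope of the chord is written out as $m = \sinh(t)/t$ (this follows from the two endpoints, although only the intercept $b = \cosh(t)$ is needed in the proof), and $\phi$ is an arbitrary centered random variable with finitely many values; the function names are hypothetical.

```python
import math
import random

def chord(x, t):
    """Line through (-t, exp(-t)) and (t, exp(t)): slope sinh(t)/t, intercept cosh(t)."""
    return (math.sinh(t) / t) * x + math.cosh(t)

def lemma_sides(values, weights):
    """For outcomes `values` with unnormalized probabilities `weights`, center the
    variable (hypothesis (ii)), take t = max |phi| (hypothesis (i)), and return
    (E[exp(phi)], cosh(t))."""
    total = sum(weights)
    probs = [w / total for w in weights]
    mean = sum(p * x for p, x in zip(probs, values))
    phi = [x - mean for x in values]
    t = max(abs(x) for x in phi)
    expectation = sum(p * math.exp(x) for p, x in zip(probs, phi))
    return expectation, math.cosh(t)

random.seed(3)
results = []
for _ in range(200):
    size = random.randint(2, 8)
    values = [random.uniform(-2.0, 2.0) for _ in range(size)]
    weights = [random.uniform(0.1, 1.0) for _ in range(size)]
    results.append(lemma_sides(values, weights))

lemma_ok = all(lhs <= rhs + 1e-12 for lhs, rhs in results)
t = 1.5
chord_ok = all(math.exp(x) <= chord(x, t) + 1e-12
               for x in [-t + 2 * t * j / 100 for j in range(101)])
print(lemma_ok, chord_ok)
```

Equality in (iii) holds for the fair coin taking the values $\pm t$, which is why the bound $\cosh(t)$ is the right one.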

Remark. Juan Sueiro Bal taught me this elegant proof when he was a graduate student at the University of Wisconsin, Madison.

The preceding “discrete” analysis can be made “continuous” by means of the celebrated Calderon reproducing formula. Let $\psi \in C_0^\infty(\mathbb{R}^d)$ be real, radial, supported in $\{|x| \le 1\}$, satisfy $\int \psi = 0$, and also satisfy

\[ \int_0^\infty |\hat\psi(y\xi)|^2\, \frac{dy}{y} \equiv 1 \]

for all $\xi \ne 0$. If $f$ also belongs to $C_0^\infty(\mathbb{R}^d)$ and satisfies $\int f = 0$, then

\[ f(x) = \int_{\mathbb{R}^{d+1}_+} (f * \psi_y(t))\, \psi_y(x - t)\, \frac{dt\, dy}{y}, \tag{3.2} \]

where we are using $\psi_y(t)$ to denote $y^{-d}\psi(t/y)$. The proof of (3.2) relies on the fact that we can take the Fourier transform of both sides for $f$’s in our test class, with the Fourier transform of the right-hand side being

\[ \hat f(\xi) \int_0^\infty |\hat\psi(y\xi)|^2\, \frac{dy}{y} \equiv \hat f(\xi) \]

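for all $\xi$. It is not hard to extend (3.2) to $f$’s in $L^2$. Plancherel’s Theorem implies that, for any $f \in L^2$,

\[ \int_{\mathbb{R}^{d+1}_+} |f * \psi_y(t)|^2\, \frac{dt\, dy}{y} = \int_{\mathbb{R}^d} |f|^2\, dx. \]

If $K \subset \mathbb{R}^{d+1}_+$ is compact then

\[ f_K(x) \equiv \int_K (f * \psi_y(t))\, \psi_y(x - t)\, \frac{dt\, dy}{y} \]

belongs to $L^2$. The inner product of $f_K$ with any $h \in L^2$ is bounded by

\[ \int_K |f * \psi_y(t)|\, |h * \psi_y(t)|\, \frac{dt\, dy}{y} \le \left( \int_K |f * \psi_y(t)|^2\, \frac{dt\, dy}{y} \right)^{1/2} \left( \int_{\mathbb{R}^{d+1}_+} |h * \psi_y(t)|^2\, \frac{dt\, dy}{y} \right)^{1/2}, \]

implying that

\[ \|f_K\|_2 \le \left( \int_K |f * \psi_y(t)|^2\, \frac{dt\, dy}{y} \right)^{1/2}. \]

From this it is easy to show that if $\{K_i\}$ is any increasing sequence of compact sets such that $\cup K_i = \mathbb{R}^{d+1}_+$, and $f \in L^2$, then $f_{K_i}$ has an $L^2$ limit $g$, with $\|g\|_2 \le \|f\|_2$. But this limit has to be $f$, if $f$ belongs to $C_0^\infty(\mathbb{R}^d)$ and has integral equal to 0; therefore the ($L^2$) limit is $f$ for all $f \in L^2$. That is the rigorous meaning of formula (3.2) for $f \in L^2$.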

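For $Q \in \mathcal{D}_d$ define $T(Q) \equiv Q \times [\ell(Q)/2, \ell(Q))$. The collection $\{T(Q)\}_Q$ tiles $\mathbb{R}^{d+1}_+$, and it provides a convenient way to decompose the integral (3.2); namely,

\[ \begin{aligned} f &= \int_{\mathbb{R}^{d+1}_+} (f * \psi_y(t))\, \psi_y(x - t)\, \frac{dt\, dy}{y} \\ &= \sum_Q \int_{T(Q)} (f * \psi_y(t))\, \psi_y(x - t)\, \frac{dt\, dy}{y} \\ &\equiv \sum_Q \lambda_Q\, a_{(Q)}, \end{aligned} \tag{3.3} \]

where

\[ \lambda_Q = \left( \int_{T(Q)} |f * \psi_y(t)|^2\, \frac{dt\, dy}{y} \right)^{1/2}. \]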

The functions $a_{(Q)}$ are smooth and satisfy: (1) $\operatorname{supp} a_{(Q)} \subset \tilde Q$, the triple of $Q$; (2) $\|\nabla a_{(Q)}\|_\infty \le c\,\ell(Q)^{-1}|Q|^{-1/2}$; (3) $\int a_{(Q)}\, dx = 0$. They are called “adapted functions.” One can think of them as “continuous Haar functions.” However, notice that, unlike Haar functions (but like the $a_{(Q)}(f)$’s from the higher dimensional dyadic setting), the family $\{a_{(Q)}\}_Q$ depends on the function $f$. This dependence can be removed by dealing with special (wavelet) families instead. But, for many purposes, such a sophisticated approach is not needed, and it adds nothing more.

Corresponding to the decomposition (3.3) we define a square function $S_c(f)$:

\[ S_c(f) \equiv \left( \sum_Q \frac{|\lambda_Q|^2}{|Q|}\, \chi_Q \right)^{1/2}. \]

Unfortunately, a detailed analysis of $S_c(f)$ goes beyond the scope of these lectures. Suffice it to say that the essential properties of $S(f)$ and $S_d(f)$ also hold for $S_c(f)$:

\[ \|S_c(f)\|_p \sim \|f\|_p \qquad (1 < p < \infty) \]

and

\[ \int_{\mathbb{R}^d} (S_c(f))^p\, v\, dx \le C(p, d, \psi) \int |f|^p\, M_d(v)\, dx \qquad (1 < p \le 2). \]

We refer the reader to [CW] and [W] (and especially the former) for detailed treatments of these.

We also have the important fact that if $S_c(f)$ is bounded, then $f$ is locally exponentially $L^2$. The reason for this last fact is that sums like $\sum_Q \lambda_Q\, a_{(Q)}$ can be handled almost as if they were sums of the form $\sum_Q a_{(Q)}(f)$. The big problem with doing something like this is that the triples of the dyadic cubes do not have the same exclusion/inclusion properties as the cubes in $\mathcal{D}_d$ do (if $Q$ and $Q'$ belong to $\mathcal{D}_d$, either they are disjoint, or one is a subset of the other). But this problem can be fixed by means of a simple combinatorial trick.

Lemma Trick. The family $\tilde{\mathcal{D}}_d$, the triples of the dyadic cubes, can be decomposed into $3^d$ disjoint families $G_i$ such that, for each $i$ ($1 \le i \le 3^d$): (1) bisecting the sides of any $Q \in G_i$ yields $2^d$ other cubes in $G_i$; (2) every $Q \in G_i$ is obtained by bisecting the sides of some larger $Q' \in G_i$; (3) if $Q$ and $Q'$ belong to $G_i$, either the cubes are disjoint, or one is a subset of the other.

Hints toward a proof. It is enough to prove the lemma when $d = 1$. If $I$ is the triple of a dyadic interval then

\[ I = [(3j + s)2^k, (3j + 3 + s)2^k), \tag{3.4} \]

where $j$ and $k$ are integers, and $s$ is 0, 1, or 2. The right and left halves of $I$ each have the form $[(3j' + 2s)2^{k-1}, (3j' + 3 + 2s)2^{k-1})$. Similarly, a computation shows that $I$ is the right or left half of an interval having the form $[(3j'' + 2s)2^{k+1}, (3j'' + 3 + 2s)2^{k+1})$ (it’s important here that $2^2 = 1$, mod 3). For every integer $k$, and $s \in \{0, 1, 2\}$, let $F_{k,s}$ be the family of intervals having the form (3.4). Then the required sets $G_s$ ($s \in \{0, 1, 2\}$) can be defined by

\[ G_s \equiv \bigcup_{k=-\infty}^{\infty} F_{k,\, 2^{|k|}s}, \]

where $2^{|k|}s$ means ‘$2^{|k|}s$, mod 3.’
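The combinatorics in these hints can be verified by brute force in dimension one. The sketch below is restricted, for convenience, to scales $k \ge 0$ and to left endpoints in a finite window, so that every interval has integer endpoints; a triple $[a, a + 3 \cdot 2^k)$ is stored as the pair `(a, k)`. The residues are read modulo 3, matching the three possible values of $s$. All names are hypothetical helpers for this illustration.

```python
def residue(k, s):
    """Residue class (mod 3) used at scale 2**k by the family G_s: 2**|k| * s mod 3."""
    return (pow(2, abs(k), 3) * s) % 3

def family_of(a, k):
    """Index s of the family G_s containing the triple [a, a + 3 * 2**k),
    where a is an integer multiple of 2**k and k >= 0."""
    r = (a // 2 ** k) % 3
    for s in range(3):
        if residue(k, s) == r:
            return s
    raise ValueError("unreachable: multiplication by 2**|k| is a bijection mod 3")

def halves(a, k):
    """Left and right halves of [a, a + 3 * 2**k), as (endpoint, scale) pairs; needs k >= 1."""
    return [(a, k - 1), (a + 3 * 2 ** (k - 1), k - 1)]

def nested_or_disjoint(p, q):
    """True if the intervals p = (a, k) and q = (b, m) are disjoint or one contains the other."""
    (a, k), (b, m) = p, q
    a_end, b_end = a + 3 * 2 ** k, b + 3 * 2 ** m
    if a_end <= b or b_end <= a:
        return True
    return (a <= b and b_end <= a_end) or (b <= a and a_end <= b_end)

def check(max_k=4, window=96):
    """Check properties (1) and (3) of the lemma for all triples with
    0 <= k <= max_k and left endpoint in [-window, window)."""
    triples = [(a, k) for k in range(max_k + 1)
               for a in range(-window, window) if a % 2 ** k == 0]
    fam = {t: family_of(*t) for t in triples}
    halves_ok = all(family_of(b, m) == fam[(a, k)]
                    for (a, k) in triples if k >= 1
                    for (b, m) in halves(a, k))
    nesting_ok = all(nested_or_disjoint(p, q)
                     for p in triples for q in triples if fam[p] == fam[q])
    return halves_ok, nesting_ok

result = check()
print(result)
```

By contrast, the triples $[0, 3)$ and $[1, 4)$ overlap without being nested; they land in different families, which is exactly what the decomposition is designed to arrange.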

Remark. The preceding lemma readily generalizes to the family $m\mathcal{D}_d$, the $m$-fold dilates of the dyadic cubes, where $m$ is any odd positive integer. Another exposition of this trick (and its earliest occurrence, as far as the author knows) is cited on page 416 of [G].

Once we have the families $G_i$, we can, for each $i$, define “dyadic” averages and “dyadic” square functions relative to the cubes $Q \in G_i$. We can then decompose $f = \sum_{i=1}^{3^d} f_{(i)}$, where each $f_{(i)}$ is

\[ f_{(i)} \equiv \sum_{Q \in G_i} a_{(Q)}(f). \]

(Note that the $Q$’s in the summation are triples of dyadic cubes!) We get a collection of corresponding “dyadic” square functions:

\[ S_{G_i}(f_{(i)})(x) \equiv \left( \sum_{Q \in G_i} \frac{\|a_{(Q)}(f)\|_2^2}{|Q|}\, \chi_Q(x) \right)^{1/2}. \]

It is then a theorem (not hard, but a little technical; see [W], Lemma 2.2) that, for each $i$,

\[ S_{G_i}(f_{(i)}) \le C(d)\, S_c(f) \]

pointwise. Thus, any bound on $S_c(f)$ automatically implies a corresponding bound on the $S_{G_i}(f_{(i)})$’s, to which we can apply “dyadic” arguments. We refer the reader to [W] for a more detailed exposition of this method.


References.

[C] K. L. Chung, A Course in Probability Theory Revised, Academic Press, New York (2000).

[CW] S. Chanillo, R. L. Wheeden, “Some weighted norm inequalities for the area integral,” Indiana Univ. Math. Journal 36 (1987), 277-294.

[CWW] S. Y. A. Chang, J. M. Wilson, T. H. Wolff, “Some weighted norm inequalities concerning the Schrodinger operators,” Comm. Math. Helv. 60 (1985), 217-246.

[FS] C. Fefferman, E. M. Stein, “Some maximal inequalities,” Amer. Journal of Math. 93 (1971), 107-115.

[G] J. B. Garnett, Bounded Analytic Functions, Academic Press, New York (1981).

[GR] J. Garcia-Cuerva, J. L. Rubio de Francia, Weighted Norm Inequalities and Related Topics, North-Holland Math. Studies 116, North Holland, Amsterdam (1985).

[St] E. M. Stein, Singular Integrals and Differentiability Properties of Functions, Princeton University Press, Princeton NJ (1970).

[W] J. M. Wilson, “Weighted norm inequalities for the continuous square function,” Trans. Amer. Math. Soc. 314 (1989), 661-692.
