Self-Similar Models and How to Find Them:
A Moment Theory Approach
Christopher James Williams
Supervisor: Michael Barnsley
May 2019
An Honours thesis submitted for the degree of Bachelor of Philosophy (PhB.) – Mathematics
of the Australian National University
Declaration
The work in this thesis is my own except where otherwise stated.
Christopher James Williams
Acknowledgements
Firstly I would like to thank Joan for being an extraordinary convener of the honours program
and accommodating my specialised honours ‘year’.
To the honours/masters students: Alex, Adele, Jack, Yossi, Hugh, Kelly, Adwait, James,
Yiming and Michael I am thankful for the friendships you have all provided in this time period.
I am especially thankful for: Jane’s ubiquitous influence on the tikz design of many (most)
figures, after witnessing my colour-blind palette design; Kyle’s enforcement of detail in many
(most) of my proofs that I had considered complete; Feng and Sam’s willingness to endlessly
talk about the failure of many (most) of our Python programs; and Wenqi’s analytical input in
bounding quantities that in many (most) cases resulted in repeated use of the triangle inequality.
In my third year of study, I was determined to do honours in a more applied area than
mathematics. Doing Pierre’s ‘applied’ math of finance course and a computational project
with Markus that semester showed me how some pretty abstract ideas made their way into
application. Markus has always continued to give me computational advice since that project,
much of which has been used in my examples here. Their influence rid me of much fear of
integrals and computation, both of which are abundant in this thesis.
Michael has always been a force of encouragement, support, and like Kyle, an excellent
supervisor — although, unlike Kyle, his organisational skills may come exclusively from Louisa.
I am very lucky to have Michael and Louisa’s continuing input to both my thesis, general
knowledge of mathematics and the overall guidance they have provided me throughout my
studies for the past three years. All the extra effort Michael and Louisa have given, outside of
writing a thesis, I will always remember.
Lastly I would like to thank my family for their continued support for my higher education.
In particular, my partner Caroline has kept my life together during this period; without her,
my organisational skills would be completely reminiscent of Michael\Louisa.
Abstract
Self-similar modelling aims to capture how an object relates to itself, and can be done
through fractal geometry. Creating a fractal that looks like a natural object is easy;
finding a fractal that models a given self-similar object is hard. The only known success
of this inverse problem comes from fractal image compression (FIC). In this thesis we
develop an alternative solution to the problem. Explicitly, we formulate a fractal moment
approximation theory that provides more flexibility in self-similar modelling than that
present in FIC. This is done through placing a normalised measure on a self-similar set
and using this measure’s moments in its reconstruction. It is proven that the vector of
these moment values is the unique fixed point of a linear operator on 1ℓ∞, the space of
bounded sequences whose first element is 1. A relationship between self-similar sets and
measures is given through tools found in fractal tiling and dimension theory. An
approximation theory is developed that gives a computationally feasible way to approximate
(possibly non-self-similar) objects. These approximations are investigated in both theory
and computation. Through our novel examples, this theory is extended to local fractal
models that are frequent in application, and their relation to image compression is discussed.
Contents
Acknowledgements
Abstract
Notation and Terminology
1 Motivation and Outline
2 The Home H(X) and Introductory Fractal Modelling
  2.1 The Hutchinson Operator
  2.2 Introduction to Fractal Approximations
  2.3 One-Dimensional Fractal Image Compression
3 The Probabilists P(X) and Self-Similar Measures
  3.1 Self-Similar Measures
  3.2 Fractal Moment Theory
  3.3 Computation of Two-Dimensional Affine Moments
4 The Inverse Problem of Moments
  4.1 Algebraic Solution to Polynomial Systems
  4.2 Hausdorff Dimension and The Open Set Condition
  4.3 The Collage Theorem for Moments
  4.4 Computational Experiments
5 Loosening Self-Similarity
  5.1 Graph-Directed IFS (GD-IFS)
  5.2 A Commentary of some Modern Approaches
Bibliography
A Further Fractal Figures
Notation and Terminology
Notation
N0 the natural numbers including zero, {0, 1, 2, . . . }.
N, Q, R, C the natural, rational, real and complex numbers.
[N ] the set of numbers {1, . . . , N} for N ∈ N.
[N ]n, [N ]N the strings of numbers of length n ∈ N from the alphabet [N ], and infinite
strings of numbers from the alphabet [N ].
(X, d) a complete (or sometimes compact) metric space.
B(X) the Borel subsets of the metric space (X, d).
1·(·) the indicator function, 1B : X → {0, 1}, that returns one if x ∈ B ∈ B(X)
and zero otherwise. Through a mild abuse of notation, we also take 1·(x) :
B(X) → {0, 1}, where 1B(x) returns one if x ∈ B, and zero otherwise.
Lp(Rn) the space of Lebesgue measurable functions, f on Rn, such that∫|f |p <∞.
H(X) the space of non-empty compact subsets of a complete metric space (X, d).
P(X) the space of normalised Borel measures on a compact metric space (X, d).
ℓ∞, 1ℓ∞ the space of bounded sequences in the ‖·‖∞ norm, and the space of bounded
sequences whose first term is one.
(M)i,j the (i, j)-th entry of a matrix M .
Terminology
IFS an iterated function system, F = {(X, d); f1, . . . , fN}, see Defini-
tion 2.3.
attractor the unique fixed point of an IFS when viewed as an operator F :
H(X)→ H(X), see Definition 2.5.
attractive measure the unique fixed point of an IFS with probabilities, M : P(X) → P(X), see Definition 3.6.
Chapter 1
Motivation and Outline
Since the inception of calculus, mathematical modelling has been dominated by smooth func-
tions. To their credit, the approximations made by polynomials, splines and cosines have proven
to be impressive tools in many modern mathematical models. But there are disciplines in which
such approximants are notoriously unhelpful — said disciplines often include data which pertains
to nature. A fractalist, in response to this situation, might declare that calculus is a fallacy,
first-year analysis courses amount to brainwashing and that the world is by nature rough,
jagged and complex. They might then, ironically, proceed to launch into a long discussion about
Sierpinski triangles with no explanation as to how these very artificial fractal objects relate to
modelling nature. Straight lines appear as seldom in nature as do smooth functions, and Sier-
pinski triangles even more so. The aim of this thesis therefore is not to find the ‘best’ fractal
approximation of a given object but rather to give insight into some methods of fractal approx-
imation and how one can transition from knowledge of Sierpinski triangles to the creation of
fractal models.
Broadly, a self-similar object is anything that may be described with reference to itself. General
examples are abundant; a dictionary contains words whose definitions are constructed by their
relation to other words, and population growth is described in terms of the current population.
Mathematically, to construct a self-similar model the following three items are generally needed:
1. a compact, or possibly complete, metric space (X, d);
2. a space of ‘models’ M on (X, d);
3. an operator T : M→M constructed of ‘simple maps’.
Examples of the first two items include: (R2, ‖ · ‖2) whose models are non-empty compact sets
to form Sierpinski triangles (Example 2.10); (C([0, 1]), ‖ · ‖∞) to model continuous self-similar
functions (Example 2.11); and probability measures on ([0, 1]2, ‖ · ‖2) that can model leaves
(Figure 3.3).
How to create a self-similar object that resembles natural phenomena such as leaves, mountains
and clouds is well documented. This is the forward problem. But given a self-similar object,
how does one fit a fractal model to it? This is the inverse problem and the main question being
addressed in this thesis.
We break up the study of this inverse problem into three main parts. We first review techniques
of fractal image compression in the most simplified setting possible — approximating a ‘one-
dimensional’ image. Fractal image compression is the only success of this inverse problem
known to date. As such, we then highlight areas where the study of this inverse problem is
non-existent. For instance, the only approach to modelling self-similar objects with fractal models
— namely, fractal image compression — will never yield an exact construction of self-similar
objects like Cantor sets and Sierpinski triangles. The next part of this thesis is the formulation
of an approximation method that can do exactly this task through fractal moment theory.
Finally, we show with simplified computational examples how the ideas from our newly created
approximation of self-similar objects pertain to modelling. The thesis concludes by providing a
commentary of how our simplified examples can be generalised to explicit use in modelling and
by outlining what ideas are present in modern techniques.
The structure of this thesis is as follows.
Chapter 2 introduces the home of fractal geometry H(X), the space of non-empty compact
subsets of a complete metric space (X, d). Three self-similar objects are created here to later
model through the use of iterated function systems (IFSs) — this will be the operator made
from the ‘simple maps’ alluded to above. Traditional fractal image compression (FIC) will be
performed in a simplified setting on the last of the three objects created — a self-similar function.
Since an image can be thought of as the graph of a function, the ‘one-dimensional image’ we will
make will be the graph of the simplified self-similar function I : [0, 1] → [0, 1]. To present FIC,
a relationship between the space of models of sets in H(X) and models of continuous functions
in C(X) is established. This is exactly how knowledge of Sierpinski triangles transitions to
modelling photographs. A computational example is then provided to show how this is done
explicitly. The reader should refer back to the structure of this chapter while continuing through
the remainder of this thesis. Although the structures of this chapter and of the thesis are similar,
the content is subtly different and is in many cases of the writer’s own formulation for a different
method of fractal approximation, unless otherwise referenced.
After presenting FIC, a more flexible modelling tool is desired. This is done in P(X), the space
of normalised Borel measures on a compact metric space X. In Chapter 3, this space is
built and the theory of integration with respect to self-similar measures is defined. The ergodic
theory of a self-similar measure with respect to an IFS is explored for the efficient computation
of a fractal image. Through investigating this fractal generation technique — referred to as the
chaos game — the computation of the measure’s moments is proposed. This computation is
done for three types of IFS: for one-dimensional real affine IFSs, for one-dimensional complex
affine IFSs, and for two-dimensional affine IFSs. The last of these computations is a novel
addition to the literature. These moments can be calculated in an efficient manner through an
iterative process on 1ℓ∞: a closed subset of the Banach space of bounded sequences.
Chapter 4 investigates reconstructing a self-similar measure through its moments. To do this, a
multivariate polynomial system of equations must be solved. In the case of a Cantor measure,
this is done exactly through tools in computational algebraic geometry — specifically Gröbner
bases. Like how a relationship between C(X) and H(X) was created in Chapter 2, a relationship
between P(X) and H(X) is established. This is done through modern tools in fractal tiling theory
and Hausdorff dimension to show the core principles of the computation in a simplified setting.
This relationship is then used to allow moments to model a generalised Sierpinski triangle —
specifically a Steemson triangle. Finally, through using the iterative formula created in Chapter
3, an approximation theory for a self-similar object using moment values is created. This is
analogous to the approximation theory made in Chapter 2. A computational example is
provided for modelling the same object presented in Chapter 2 in a subtly different way.
In describing these techniques, this thesis provides the history of how fractal approximations
have been progressively refined. After the success of FIC, there were two clear paths for the
progression of image compression: one that strived for efficient computation and description,
the other that aimed to model a self-similar object at a potentially larger computational cost.
Chapter 5 reveals how the simple modelling examples made in the thesis could possibly be
extended to image compression, specifically by use of local fractal models and their respective
moment theory, which is derived in that chapter.
Code for numerical experiments can be found at:
https://github.com/Chr1sWilliams/Chris Williams Honours,
in aptly named Jupyter notebooks pertaining to each chapter. The code is not optimised and is
written for readability in the notebook. The thesis is self-contained, although Chapter 4 can be
read interactively with the notebooks provided.
Chapter 2
The Home H(X) and Introductory
Fractal Modelling
‘What is the simplest example?’ — Markus Hegland.
Self-referential modelling has had great success in capturing the details of natural objects that
are rich in self-similarity, such as leaves, clouds and mountain ranges. The premise of fractal
modelling is that an object is described by its relations to itself — avoiding the potential dif-
ficulty of directly describing the object. Heuristically, a ‘good’ fractal model can be achieved
through the ability to accurately and concisely capture the self-referential aspects an object
might exhibit. Currently there are two effective methodologies used to find such aspects: the
graduate student algorithm and one stemming from fractal image compression (FIC). In this
chapter we explore the latter. This is done by firstly introducing the construction of self-similar
objects, then presenting fundamental ideas from FIC in the simplest possible setting as an in-
troduction to fractal modelling.
The structure of the chapter is as follows. The construction of self-referential objects such as
Cantor sets and Sierpinski triangles is introduced in the home of fractal geometry, H(X): the
space of non-empty compact subsets of a complete metric space (X, d). We then progress to how
one can use the ideas formed from working in H(X) to begin self-referential modelling. Explicitly,
we show how one fits a self-referential model to a function in C([0, 1]) — the space of continuous
functions on the unit interval — by introducing fractal interpolation functions (FIFs). This is
done to reveal techniques and tools from FIC and fractal dimension theory as a sampling of
the ideas and methodology introduced throughout the thesis. A computational example is
included at the end of the chapter to explore traditional fractal approximation in a simplified
setting.
2.1 The Hutchinson Operator
The first step in creating our fractal objects is the construction of a “suitable ‘home’ in which
to describe them”. To start, a complete metric space is needed, for which we usually select
(R2, ‖ · ‖2) where ‖ · ‖2 is the Euclidean norm. We will turn this metric space into a home
through the following:
Definition 2.1. Let (X, d) be a complete metric space and H(X) be the space of non-empty
compact subsets of X. The Hausdorff metric dH on H(X) is given through
dH(U, V ) = inf_{r>0} { r | V ⊂ ⋃u∈U Br(u) and U ⊂ ⋃v∈V Br(v) },

where Br(x) = {y | d(x, y) ≤ r}.
Remark 2.2. The above metric can be described intuitively as follows: given two sets
U, V ∈ H(X), the set U is inflated by replacing each point in U with a ball of radius r centred
at the point replaced. This inflation continues until the set V is consumed; the same is then done
in the reverse fashion, and the maximum inflation needed from either case is the distance between
the sets. This metric is not a norm, as H(X) is not a vector space.
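For finite subsets of the plane the infimum in Definition 2.1 is attained, so the inflation description above reduces to a max/min computation. The following is a minimal sketch of our own (the function names are not from the thesis code):

```python
import math

def hausdorff_distance(U, V):
    """Hausdorff distance between two finite, non-empty point sets in R^2.

    For finite sets, d_H(U, V) = max( max_{u in U} min_{v in V} |u - v|,
                                      max_{v in V} min_{u in U} |u - v| ).
    """
    def directed(A, B):
        # smallest radius r such that inflating B by r-balls covers A
        return max(min(math.dist(a, b) for b in B) for a in A)

    return max(directed(U, V), directed(V, U))

# Inflating {(0, 0)} until it swallows {(3, 4)} needs radius 5, and vice versa.
print(hausdorff_distance([(0.0, 0.0)], [(3.0, 4.0)]))  # 5.0
```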
It is always true that if (X, d) is a complete metric space then (H(X), dH) is also complete
[Hau78]. Now that we have created a home for our fractals, we need to know how to construct
them. This is done with the aid of Iterated Function Systems (IFSs) [BD85] and the Hutchinson
operator [Hut79] as defined below.
Definition 2.3. An iterated function system (IFS) is a complete metric space (X, d) with a
finite collection of contractive functions {fi : X → X}Ni=1. We denote this collection F =
{(X, d); f1, . . . , fN}.
Remark 2.4. A contractive function fi : X → X on a metric space (X, d) obeys d(fi(x), fi(y)) ≤
λi d(x, y) for all x, y ∈ X and for a fixed λi ∈ [0, 1). The letter λ will be exclusively used
Through a mild and convenient abuse of notation, we use the same symbol F for an operator
F : H(X)→ H(X).
Definition 2.5. Let F = {(X, d); f1, . . . , fN} be an IFS. Then the Hutchinson operator F :
H(X)→ H(X) is defined through
F(V ) = ⋃_{i=1}^{N} fi(V ) for all V ∈ H(X).
Remark 2.6. The notation above is ambiguous as fi : X → X, yet we have used fi above to
act on elements of H(X). To remedy this we define fi acting on non-empty subsets of X through
fi(U) = {fi(u) | u ∈ U}.
When each of the fi in an IFS are contractive functions, the Hutchinson operator is a contraction
on the space H(X) with respect to the metric dH. That is, for all U, V ∈ H(X), let r = dH(U, V );
then

V ⊂ ⋃u∈U Br(u) ⇒ fi(V ) ⊂ fi(⋃u∈U Br(u)) = ⋃u∈U fi(Br(u)) ⊂ ⋃u∈U Bλir(fi(u)),
where λi < 1 is the contractivity factor for fi, which implies the contractivity of F acting on
H(X). Through Banach’s Fixed Point Theorem we know that F admits a unique fixed point A.
We call this element A ∈ H(X) the attractor of an IFS, as by Banach’s Fixed Point Theorem,
it is attractive under the iteration of F . Explicitly, for any V ∈ H(X) we have Fn(V ) → A as n → ∞.
For the most part it is assumed that the functions fi are affine and (X, d) = (R2, ‖ · ‖2); that
is, they take the form x → fi(x) = Aix + bi where Ai is a linear operator Ai : R2 → R2 and bi
is a translation. In particular cases, we use a specific type of affine function called a similitude.
Similitudes are affine functions with the added requirement that Ai = λiOi, where λi ∈ [0, 1)
and Oi is an isometry. To make this distinction clear, when similitudes in two real dimensions
are used, the functions in the IFS may be identified as complex variable isometries of the space
(C, | · |) multiplied by a scaling factor, which creates a fractal object in H(C).
It is often useful to place some type of separation property on an IFS being studied for reasons
that will become abundantly clear when analysing the dynamics and dimension theory present
on a fractal attractor. This is an assumption on how the functions in an IFS map certain sets.
These extra assumptions enable us to determine a wealth of information about the topology of
an IFS’s attractor without being too restrictive on the type of IFS being studied. There are
various separation conditions to choose from (see, for instance, [KR15]); we select the open set
condition (OSC) defined below.
Definition 2.7. An IFS F = {(X, d); f1, . . . , fN} obeys the open set condition (OSC) if there
exists a non-empty open set O ⊂ X such that:
1. ⋃_{i=1}^{N} fi(O) ⊂ O;
2. fi(O) ∩ fj(O) = ∅ for all i ≠ j.
Remark 2.8. In the first assumption of the OSC, it would be incorrect to write F(O) ⊂ O as
O may not lie in H(X). It is worth noting that the OSC is a condition which pertains to an
IFS, not its attractor.
With these formalities defined, we present three examples of IFSs that obey the open set con-
dition. These will be our test cases for the fractal models described in this thesis.
Firstly, we have the traditional Cantor set created by an IFS that mimics its typical set theoretic
construction.
Example 2.9. Take the IFS FCant = {([0, 1], | · |); f1(x) = x/3, f2(x) = x/3 + 2/3}. Then the
attractor of this IFS is the traditional Cantor set, as evinced through the following iteration.
[figure: the sets F^n_Cant([0, 1]) for n = 0, . . . , 4 drawn on the interval from 0 to 1, each step
removing the open middle third of every remaining interval]
This IFS obeys the open set condition witnessed by (0, 1). In one real dimension, affine functions
are precisely similitudes — the nomenclature chosen is to refer to them as affine functions.
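The convergence of F^n_Cant([0, 1]) can be mimicked numerically by applying the Hutchinson operator to the finite set of interval endpoints. A small sketch of our own (not taken from the thesis notebooks):

```python
def hutchinson(maps, points):
    """One application of the Hutchinson operator, F(V) = union of the f_i(V),
    acting on a finite point approximation of a set V."""
    return sorted({f(x) for f in maps for x in points})

# The Cantor IFS of Example 2.9.
f1 = lambda x: x / 3
f2 = lambda x: x / 3 + 2 / 3

V = [0.0, 1.0]  # the endpoints of [0, 1]
for _ in range(4):
    V = hutchinson([f1, f2], V)

# V now holds the 32 endpoints of the 16 intervals of F^4_Cant([0, 1]).
print(len(V), min(V), max(V))
```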
Next we present a Steemson triangle, a generalised Sierpinski triangle [SW18]. This is an IFS
acting on the space H(C), which is made from complex similitude functions. One could construct
a synonymous IFS acting on the space H(R2), but the maps become more cumbersome to write
and hinder the observation of how the maps are explicitly acting on H(C).
Example 2.10. [SW18] Consider the IFS of three maps:

FSteem = {(C, | · |) ; f1(z) = αe^{iθ} z̄, f2(z) = βz + (1 − β), f3(z) = γz + (1 − γ)(Cx + iCy)},

where α = b/(b² + 1), β = 1/(b² + 1), γ = b²/(b² + 1), Cx = (b² − a² + 1)/2,
Cy = √(4b² − (b² − a² + 1)²)/2 and θ = cos⁻¹(Cx/b); and a = 1.1, b = 0.85. The attractor of
this IFS is a Steemson triangle from [SW18], a generalised Sierpinski triangle.
Unlike the previous example whose attractor was the traditional Cantor set, it is not straight-
forward to identify the attractor of this IFS. To gain some insight, we provide the following
discussion of the maps in the IFS. Broadly speaking, the maps are parameterised by the values
a, b > 0, which become the side lengths (along with 1) of the triangular convex hull that contains
the attractor of this IFS. This can be identified by noting the fixed point of each fi is a vertex
of this triangular hull. We have identified the ‘home’ for this fractal to be (C, | · |), so by our
convention these maps are similitudes. These functions are isometries multiplied by a scaling
factor (of magnitude less than 1); where f2 and f3 are scaling functions composed with transla-
tions, and f1 is a scaling function with a flip about the real axis followed by an anti-clockwise
rotation. This means that distances between points are uniformly scaled under the maps fi, or
geometrically, the group of similitude transformations maps squares to squares.
The following page shows an arbitrary set V ∈ H(X) (to which some colouring on the set is
added) under iteration of FSteem. This set was chosen as it is easy to identify the orientation of
its images under the functions fi.
Figure 2.1: Elements of H(C) (with added colouring, explicitly Bobby the Crimson Rosella)
converging to a Steemson triangle under the iteration of FSteem. This IFS obeys the open set
condition, seen by considering the interior of the set under iteration.
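As a sanity check on this parameterisation, the fixed points of the three maps can be computed and the side lengths of the triangular hull compared with 1, b and a. This is our own illustration, not code from the thesis repository, and it assumes the conjugated form of f1 (the flip about the real axis described in the discussion above):

```python
import cmath
import math

a, b = 1.1, 0.85
alpha = b / (b**2 + 1)
beta = 1 / (b**2 + 1)
gamma = b**2 / (b**2 + 1)
Cx = (b**2 - a**2 + 1) / 2
Cy = math.sqrt(4 * b**2 - (b**2 - a**2 + 1) ** 2) / 2
theta = math.acos(Cx / b)

f1 = lambda z: alpha * cmath.exp(1j * theta) * z.conjugate()  # flip, then rotate
f2 = lambda z: beta * z + (1 - beta)
f3 = lambda z: gamma * z + (1 - gamma) * complex(Cx, Cy)

# The fixed point of each map is a vertex of the triangular convex hull.
v1, v2, v3 = 0j, 1 + 0j, complex(Cx, Cy)
assert abs(f1(v1) - v1) < 1e-12
assert abs(f2(v2) - v2) < 1e-12
assert abs(f3(v3) - v3) < 1e-12

# The hull's side lengths are 1, b and a, matching the discussion above.
print(abs(v2 - v1), abs(v3 - v1), abs(v3 - v2))
```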
Our final example is an IFS whose maps are affine in two dimensions and has an attractor that
creates the graph of a continuous function. This form of IFS is known as a fractal interpolation
function (FIF) and is presented in depth here on the historic basis of being one of the first
applications of fractal geometry to modelling problems [BH89]. In the application of affine
maps, straight lines are mapped to straight lines, but objects such as squares may be mapped to
other quadrilaterals. In general, the theory of IFSs comprising affine maps is far from being
well-understood. To determine if there exists a topology on X = Rn such that an affine IFS
is contractive on H(X) requires bounding the maps’ joint spectral radius by one [ABVW10], a
difficult computation when n > 1 and there is more than one affine function in the IFS.
Classically, the construction of a non-differentiable continuous function was a non-trivial task
[Kol31]. In this example, constructing what was once regarded as a pathological function is as
straightforward as creating a generalised linear interpolant.
Example 2.11. [BH89] Suppose data {(xi, yi)}Ni=0 ⊂ R2 is given, and the data points are
enumerated such that xi < xi+1 for all i ∈ {0, . . . , N − 1}. This data looks to be sampled from
a continuous rough function. We wish to build an IFS such that the attractor G ∈ H(X) is
the graph of a continuous interpolant of the data. Through a change in coordinates by means
of an affine transformation, we may assume that our data obeys (x0, y0) = (0, 0), xN = 1 and
{(xi, yi)}Ni=0 ⊂ [0, 1]2 to simplify calculations. Based on this assumption we may define the maps
fi((x, y)) = ( xi − xi−1           0  ) ( x )     ( xi−1 )
             ( yi − yi−1 − diyN    di ) ( y )  +  ( yi−1 ),
for i ∈ {1, . . . , N} := [N ] and |di| < 1. Define the IFS FFIF = {(R2, ‖ · ‖2); f1, · · · , fN}. A
graphical example of one such IFS for N = 4 is shown below, where the outer rectangle is
mapped to the shaded regions under the maps fi.
Figure 2.2: The unit square [0, 1]2 and the corresponding images fi([0, 1]2).
We reveal the explicit details of this attractor in the next section. For now, consider selecting
di = 0; this maps the whole outer rectangle to the graph of a linear interpolant of the data,
indicating that this IFS might be a good candidate for data interpolation. The sign of the value
di is noteworthy in constructing these maps. A negative value for di means the orientation of
the unit square mapped under fi (see map f3 in the diagram above) is flipped about the x-axis,
in the exact same way complex conjugation was used in the construction of a Steemson triangle.
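The maps above can be assembled programmatically. The sketch below is our own illustration, using hypothetical interpolation data and d_i values of our own choosing:

```python
def fif_maps(data, d):
    """Build the affine FIF maps of Example 2.11.

    data: points (x_i, y_i) for i = 0, ..., N with (x_0, y_0) = (0, 0) and
    x_N = 1; d: the vertical scalings d_1, ..., d_N with |d_i| < 1.
    """
    xs, ys = zip(*data)
    N = len(data) - 1
    maps = []
    for i in range(1, N + 1):
        def f(p, i=i):  # default argument pins the loop index
            x, y = p
            return ((xs[i] - xs[i - 1]) * x + xs[i - 1],
                    (ys[i] - ys[i - 1] - d[i - 1] * ys[N]) * x
                    + d[i - 1] * y + ys[i - 1])
        maps.append(f)
    return maps

# Hypothetical data; each f_i sends (0, 0) to (x_{i-1}, y_{i-1}) and
# (1, y_N) to (x_i, y_i), so the attractor interpolates the data.
data = [(0.0, 0.0), (0.3, 0.6), (0.7, 0.2), (1.0, 0.5)]
maps = fif_maps(data, d=[0.4, -0.3, 0.5])
print(maps[0]((0.0, 0.0)), maps[0]((1.0, 0.5)))
```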
Before we may continue, this example elucidates one of the first difficulties of working with affine
IFSs in dimensions larger than one: the existence of an attractor. For this, we require that each
function in an IFS be contractive in some underlying metric space from which H(X) can be
constructed. It is straightforward to check this in the similitude case, as one can always write
for a similitude s : X → X that ‖s(x) − s(y)‖ = λ‖x − y‖ for some λ > 0 and norm ‖ · ‖. In
the affine case it is unclear by inspection if the above maps are indeed contractive, and in many
metric spaces they are not — including (R2, ‖ · ‖2). However, by using a different metric on the
underlying space R2, the functions for a FIF can always be made contractive. Define the norm
‖ · ‖θ on R2 through ‖(x, y)‖θ = |x|+ θ|y| for θ > 0 and any (x, y) ∈ R2. It directly follows for
fi as defined above that
‖fi((x1, y1)) − fi((x2, y2))‖θ ≤ (|xi − xi−1| + θ|yi − yi−1 − diyN |)|x1 − x2| + θ|di||y1 − y2|
≤ λ‖(x1 − x2, y1 − y2)‖θ,

where λ = max{|di|, |xi − xi−1| + θ|yi − yi−1 − diyN |}, which can be made less than one for
sufficiently small θ because |xi − xi−1| < 1. Geometrically the metric ‖ · ‖θ is ‘squishing’ the
y axis to gain a contraction, with the requirement that the function is a contraction in the x
co-ordinate. The space H(R2) is then constructed with the norm ‖ · ‖θ to gain the existence of
an attractor, which can be related back to the initial space through the equivalence of
norms in finite dimensions. To see a full justification of this, see [Bar14] page 215.
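For concreteness, the bound above can be evaluated on sample data (hypothetical values of our own choosing); λ < 1 already holds for a modest θ:

```python
# Hypothetical FIF data: knots x_i, values y_i and vertical scalings d_i.
xs = [0.0, 0.3, 0.7, 1.0]
ys = [0.0, 0.6, 0.2, 0.5]
d = [0.4, -0.3, 0.5]
theta = 0.05  # 'squish' the y axis by this factor

# lambda = max{ |d_i|, |x_i - x_{i-1}| + theta * |y_i - y_{i-1} - d_i * y_N| },
# the contractivity factor from the inequality in the text.
lam = max(max(abs(di) for di in d),
          max(abs(xs[i] - xs[i - 1])
              + theta * abs(ys[i] - ys[i - 1] - d[i - 1] * ys[-1])
              for i in range(1, len(xs))))
print(lam, lam < 1)
```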
To understand why this formulation might produce the graph of a continuous interpolant, we go
through the following thought process. Draw your favourite two distinct continuous interpolants
of the data; for simplicity, we assume that they are contained in the unit square. Now apply the
map fi to these two (refer to Figure 2.2). It is clear that continuous interpolants are mapped
to continuous interpolants and by assumption these graphs lie in the images of the unit square
under the fi. Furthermore, when viewed as compact sets inside R2, these graphs of functions
are getting ‘squished’ together (in the Hausdorff metric sense) under each application of FFIF.
Though we are still working in H(R2), it is as if a contraction has been constructed on the Banach
space (C[0, 1], ‖ · ‖∞). In the next section, we explore the relationship between (H(X), dH) and
(C[0, 1], ‖ · ‖∞).
2.2 Introduction to Fractal Approximations
A square greyscale image can be thought of as the graph of a function I : [0, 1]2 → [0, 1].
Generally, few assumptions can be placed on such a function; notions of smoothness or even
continuity are unrealistic — but many images exhibit obvious visual structure despite lacking
these qualities. In fractal image compression (FIC), the structure exploited is that many images
are in some sense self-similar. In this section, we take a simplified instance of this application,
whereby a bounded self-similar function defined in one dimension on the unit interval can be
interpreted as a ‘one-dimensional image’. Furthermore, we place the unrealistic assumption of
continuity on this function for simplicity.
To convey the key ideas that are found in traditional fractal modelling in the simplest possible
manner, we will show how the final example presented in the preceding section is related to
function spaces and approximation. This requires identifying how our home H([0, 1]2) interacts
with standard function spaces such as L∞([0, 1]) and L2([0, 1]). Throughout this working, results
are presented only to convey the ideas fostered in fractal modelling, so the explicit details of
many results are omitted or deferred to later chapters.
2.2.1 From H([0, 1]2) to C([0, 1])
It is still unclear what the attractor is of the final IFS given in the preceding section. This question is
extraordinarily difficult to answer concisely in H([0, 1]2) but becomes clear in the space C([0, 1]).
To begin, the relationship between these two must be established which is done in the following
lemmas.
Lemma 2.12. If f, g ∈ L∞([0, 1]) then ‖f − g‖∞ ≥ dH(Graph(f),Graph(g)).
Proof. Fix f, g and consider the set Br(Graph(f)) = {z : z ∈ ⋃x∈Graph(f) Br(x)}. Then
Graph(g) ⊂ B‖f−g‖∞(Graph(f)) and, symmetrically, Graph(f) ⊂ B‖f−g‖∞(Graph(g)), giving the claim.
Lemma 2.13. The action of FFIF on the graph of a function can be written as an operator
FFIF : C([0, 1])→ C([0, 1]), defined through
FFIF(g)(x) = Σ_{i=1}^{N} ( (x − xi−1)(yi − yi−1 − diyN)/(xi − xi−1) + yi−1
             + di g((x − xi−1)/(xi − xi−1)) ) · 1[xi−1,xi)(x).
Proof. It is readily checked when g is continuous that FFIF(g) is as well. By construction we
require FFIF to obey the equation below for a point ui(x) ∈ [xi−1, xi):
FFIF((x, g(x))) = ( xi − xi−1          0  ) (  x   )     ( xi−1 )     (      ui(x)      )
                  ( yi − yi−1 − diyN   di ) ( g(x) )  +  ( yi−1 )  =  ( FFIF(g)(ui(x)) ),
where ui(x) = (xi − xi−1)x + xi−1. This maps the interval [0, 1] to [xi−1, xi]. The equation
above comes from looking at where a point (x, g(x)) on the attractor (or graph of the function)
2.2. INTRODUCTION TO FRACTAL APPROXIMATIONS 13
is mapped to under FFIF and equating it to its respective image. The above is a slight abuse
of notation as F acts on sets, so we are really doing the calculation on the singleton {(x, g(x))}.Solving the above for FFIF(g)(x) yields the desired result.
Proposition 2.14. [BH89] The attractor G of the IFS in Example 2.11 is the graph of a continuous interpolant of the data {(x_i, y_i)}_{i=0}^{N}.

Proof. Fix f, g ∈ C([0, 1]). Using an identical calculation to that in the proof of Lemma 2.21, we obtain the inequality

    \|F_{FIF}(f) - F_{FIF}(g)\|_\infty \le \max_{1 \le i \le N} |d_i| \cdot \|f - g\|_\infty.

Thus FFIF is a contraction in the supremum norm on C([0, 1]), so it has a unique fixed point, which is a continuous function. Note that f_i((0, 0)) = (x_{i−1}, y_{i−1}) and f_i((1, y_N)) = (x_i, y_i), so the fixed point of FFIF interpolates the data; denote it g∗. Finally, from Lemmas 2.12 and 2.13,

    d_H(\mathrm{Graph}(F^n g), \mathrm{Graph}(g^*)) \le \|F_{FIF}^n g - g^*\|_\infty \xrightarrow{\, n \to \infty \,} 0.

Since a graph uniquely defines its function, it must be the case that Graph(g∗) := G ∈ H([0, 1]2) is the attractor of FFIF.
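The convergence above can be sketched numerically. The following is not from the thesis: a minimal Python sketch of Lemma 2.13 and Proposition 2.14, with illustrative interpolation data {(0, 0), (0.5, 1), (1, 0)} and vertical scalings d = (0.3, −0.3) (both assumptions). Iterating F_FIF from the zero function converges to a fixed point that interpolates the data.

```python
# Sketch: iterate the FIF operator F_FIF on a sampled function and check that
# the fixed point interpolates the data. The data set and the scalings d_i
# below are illustrative assumptions, not taken from the thesis.
X = [0.0, 0.5, 1.0]          # interpolation abscissae x_0, ..., x_N (y_0 = 0)
Y = [0.0, 1.0, 0.0]          # ordinates y_0, ..., y_N
D = [0.3, -0.3]              # vertical scaling factors d_i, |d_i| < 1

M = 1001                     # grid resolution
grid = [j / (M - 1) for j in range(M)]

def interp(g, x):
    """Piecewise-linear evaluation of grid samples g at a point x in [0, 1]."""
    t = min(x * (M - 1), M - 1 - 1e-12)
    j = int(t)
    return g[j] + (t - j) * (g[j + 1] - g[j])

def F(g):
    """One application of the FIF operator from Lemma 2.13."""
    out = []
    for x in grid:
        # locate the subinterval [x_{i-1}, x_i) containing x (last one closed)
        i = next(k for k in range(1, len(X)) if x < X[k] or k == len(X) - 1)
        a = X[i] - X[i - 1]
        u = (x - X[i - 1]) / a
        out.append((x - X[i - 1]) * (Y[i] - Y[i - 1] - D[i - 1] * Y[-1]) / a
                   + Y[i - 1] + D[i - 1] * interp(g, u))
    return out

g = [0.0] * M                # start from the zero function
for _ in range(40):          # contraction factor max|d_i| = 0.3, so 40 steps suffice
    g = F(g)

for xi, yi in zip(X, Y):     # the fixed point interpolates the data
    print(xi, round(interp(g, xi), 6), yi)
```

The contraction factor here is max|d_i| = 0.3, so the iteration converges geometrically; the grid resolution and iteration count are arbitrary choices.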
Remark 2.15. A consequence of the above relation between H(R2) and C([0, 1]) is that one may now produce a sequence of elements from H(R2) approaching an element of C([0, 1]). That is, a sequence of disconnected ‘Rosella sets’ as in Example 2.10 can be produced so that they converge to the graph of a non-differentiable continuous function in the Hausdorff topology.

In this formulation, the values d_i ∈ (−1, 1) can be thought of as ‘free variables’. The typical manner in which these are selected is to capture the roughness desired from a fractal approximant, as explained in the following subsection.
2.2.2 Box-Counting Dimension
When modelling rough data, it is reasonable to capture ‘how rough’ the data is, and one particular method for this is matching the box-counting dimension of an object with its approximant.
Definition 2.16. [Fal04] The box-counting (or fractal/Minkowski) dimension of a set V ⊂ R2 is given by

    \dim_B(V) := \lim_{\varepsilon \to 0} \frac{-\log(\mathcal{N}(\varepsilon))}{\log(\varepsilon)},

where N(ε) is the minimum number of boxes of diameter ε needed to cover the set V. If the above limit does not exist, the upper/lower box dimension is evaluated through the corresponding lim sup/lim inf respectively, which is used as an estimated bound.
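The limit in the definition can be estimated numerically. Below is a minimal sketch (not from the thesis) that recovers the dimension log 2/log 3 of the middle-third Cantor set from box counts at several scales; the construction level and range of scales are arbitrary choices.

```python
import math

# Sketch: estimate the box-counting dimension of the middle-third Cantor set
# from its level-8 construction. Endpoints are kept as integers over 3^8 so
# the box counts at scale 3^(-k) are exact.
LEVEL = 8
pts = [0]
for _ in range(LEVEL):                       # ternary digits restricted to {0, 2}
    pts = [3 * p for p in pts] + [3 * p + 2 for p in pts]

def box_count(k):
    """Number of boxes of side 3^(-k) meeting the level-LEVEL construction."""
    return len({p // 3 ** (LEVEL - k) for p in pts})

# least-squares slope of log N(eps) against -log(eps) over several scales
data = [(k * math.log(3), math.log(box_count(k))) for k in range(2, 8)]
mx = sum(x for x, _ in data) / len(data)
my = sum(y for _, y in data) / len(data)
slope = (sum((x - mx) * (y - my) for x, y in data)
         / sum((x - mx) ** 2 for x, _ in data))
print(slope)  # close to log 2 / log 3 ≈ 0.6309
```

This is the log–log regression mentioned below; for real data the box counts are no longer exact and the slope is only an estimate.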
14 CHAPTER 2. THE HOME H(X) AND INTRODUCTORY FRACTAL MODELLING
In later chapters we will discuss more formally the notion of non-integer dimension, but for now it is only presented for exploratory purposes. The definition given above is a practical notion of (possibly) non-integer dimension, as the limit can be estimated through various methods, such as a log–log plot of ε against N(ε) (see [Bar14]). Once this is obtained, we want to ensure the attractor of our IFS has the same box-counting dimension as the object it is approximating. There are two fundamental theorems relating the IFSs presented thus far in this thesis to the box-counting dimensions of their respective attractors.
Theorem 2.17. [Mor46] If an IFS of N similitudes with scaling factors λ_i obeys the open set condition, then the fractal dimension of its attractor is the unique positive solution D to \sum_{i=1}^{N} \lambda_i^D = 1.
Remark 2.18. The function D ↦ \sum_{i=1}^{N} \lambda_i^D − 1 is convex, decreasing, equal to N − 1 at D = 0, and has an asymptote at −1 as D → ∞. These are desirable properties for a numerical solver: in particular, under these conditions, Newton's root-finding method can solve this nonlinear equation to machine accuracy.
Example 2.19. The fractal dimension of the Steemson triangle given in Example 2.10 is D ≈ 1.5867. This was numerically calculated through Newton's method with an initialisation of D_0 = 1.5.
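A minimal sketch of this computation (not from the thesis) is below. The scaling factors used are an assumption, three similitudes of ratio 1/2 (a Sierpinski-type triangle), not the Steemson triangle's factors, which are not restated here; the exact root in this assumed case is log 3/log 2.

```python
import math

# Sketch: solve the Moran equation  sum_i lambda_i^D = 1  for D with Newton's
# method, as suggested in Remark 2.18. The scaling factors below are an
# illustrative assumption (three similitudes of ratio 1/2).
def moran_dimension(lams, d0=1.5, tol=1e-12):
    d = d0
    for _ in range(100):
        f = sum(l ** d for l in lams) - 1.0           # Moran equation residual
        df = sum((l ** d) * math.log(l) for l in lams)  # derivative in D
        step = f / df
        d -= step
        if abs(step) < tol:
            break
    return d

print(moran_dimension([0.5, 0.5, 0.5]))  # log 3 / log 2 ≈ 1.58496
```

Because the function is convex and decreasing (Remark 2.18), Newton's method from a reasonable initialisation converges to the unique positive root.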
Theorem 2.20. [BH89] The fractal dimension of G defined in Example 2.11 is the unique solution D to

    \sum_{i=1}^{N} |d_i|\, a_i^{D-1} = 1,

where a_i = x_i − x_{i−1}, provided the interpolation points do not lie on a straight line and \sum_{i=1}^{N} |d_i| > 1; otherwise D = 1.
The proof and discussion of these statements are deferred to Chapter 4.
2.2.3 Moving to L2([0, 1])
Working with the Hausdorff distance or the uniform metric makes it computationally difficult to solve the optimisation problems required to fit a fractal model. A standard solution to this problem is to revert in some way to a Euclidean-type distance, where such minimisation problems are simple to solve. This is precisely the key idea underlying traditional fractal image compression (FIC). While we now have the tools to fit a fractal interpolant to a given data set and match its dimension, our method still contains a major flaw in that we have used all of our data in the creation of our model. When the overall goal of modelling is to gain a better representation of a function for compression, such a construction is not useful.

To progress, we will try to obtain a ‘good’ approximant that uses as few maps as possible in an IFS. For computational reasons, FIC always resorts to trading the Hausdorff distance for Lp with p ≠ ∞. We show how to make this transition in the case p = 2, improving on the methodology presented in [IS13].
Lemma 2.21. The operator FFIF is a contraction on the metric space C([0, 1]) equipped with the L2 norm.

Proof. Through a direct computation, with the change of variables u = (x − x_{i−1})/(x_i − x_{i−1}) on each subinterval, we have

    \|F_{FIF}g - F_{FIF}f\|_2 = \sqrt{ \sum_{i=1}^{N} d_i^2 \int_0^1 \left( g\!\left( \frac{x - x_{i-1}}{x_i - x_{i-1}} \right) - f\!\left( \frac{x - x_{i-1}}{x_i - x_{i-1}} \right) \right)^2 \mathbf{1}_{[x_{i-1}, x_i]}(x)\, dx }
    \le \max_{1 \le i \le N} |d_i| \sqrt{ \sum_{i=1}^{N} (x_i - x_{i-1}) \int_0^1 (g(u) - f(u))^2\, du }
    = \max_{1 \le i \le N} |d_i| \cdot \|f - g\|_2 < \|f - g\|_2,

since \sum_{i=1}^{N} (x_i - x_{i-1}) = 1.
Remark 2.22. It is not stated in [IS13], but the operator FFIF may not be well defined for a general L2 function, hence the restriction to C([0, 1]) here.

Fundamentally, fractal image compression is made computationally feasible by the Collage Theorem below. This result is often proved en route to Banach's Fixed Point Theorem, but its reinterpretation in the context of fractal geometry is where it gains its strength.
Theorem 2.23. [Bar14] Let (X, d) be a non-empty complete metric space and T : X → X be a contraction mapping on X with contractivity factor λ. For the unique fixed point x∗ of T we have

    d(x, x^*) \le \frac{d(x, T(x))}{1 - \lambda} \qquad \forall x \in X.
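A one-line numerical illustration of the bound (not from the thesis); the map T below is an arbitrary affine contraction on the real line, for which the bound holds with equality.

```python
# Sketch: the Collage Theorem bound for a simple contraction on the real line.
# T has contractivity 1/2 and fixed point x* = 2; for any x, the distance to
# the fixed point is bounded by the 'collage error' d(x, T(x)) / (1 - lambda).
lam = 0.5
T = lambda x: lam * x + 1.0     # fixed point x* = 1 / (1 - lam) = 2
x_star = 2.0

for x in [-3.0, 0.0, 1.9, 10.0]:
    bound = abs(x - T(x)) / (1 - lam)
    print(abs(x - x_star), "<=", bound)
```

The practical reading is the one given below: a set that is left almost unchanged by an IFS is close to that IFS's attractor.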
Typically, the contraction mapping is the action of an IFS, with the fixed point as the attractor. The bound above states that if you have found an element of H(X) that remains relatively ‘unchanged’ by the action of your IFS, then this element is close to your attractor, which provides continuity (in the Hausdorff sense) in the IFS parameters. The theorem's name stems from the fact that, if you take an arbitrary compact set and try to form this set by shrunken copies of itself, the maps made in this collaging process define an IFS whose attractor approximates the set. Doing this process by hand is what is referred to as the graduate student algorithm, and is currently what gives the ‘most realistic’ fractal models.
For the following selection of the free parameters di we will use the Collage Theorem and follow a
method in [IS13]. Assume that we are given a continuous function g ∈ L2([0, 1]) and wish to find
a FIF with attractor g∗ such that it is close to this function (in the L2 sense, as an approximation
to the Hausdorff distance). Using the Collage Theorem, we overcome the difficulty of having no direct relationship between the IFS parameters and the optimisation being solved. First, for some C > 0 (given that N > 2),

    \|g - g^*\|_2^2 < C\, \|g - F_{FIF}g\|_2^2 = C \sum_{i=1}^{N} \int_{x_{i-1}}^{x_i} \left( g(x) - \frac{(x - x_{i-1})(y_i - y_{i-1} - d_i y_N)}{x_i - x_{i-1}} - y_{i-1} - d_i\, g\!\left( \frac{x - x_{i-1}}{x_i - x_{i-1}} \right) \right)^2 dx,
where g∗ is such that Graph(g∗) = G. Note that the above functional is convex in each d_i, being a coercive quadratic form. Therefore we can solve the minimisation by simply finding the critical point of the functional, which we achieve by setting partial derivatives to zero. This yields

    d_i^{\min} = \frac{ \displaystyle \int_{x_{i-1}}^{x_i} \left( y_{i-1} + \frac{(x - x_{i-1})(y_i - y_{i-1})}{x_i - x_{i-1}} - g(x) \right) \left( \frac{y_N (x - x_{i-1})}{x_i - x_{i-1}} - g\!\left( \frac{x - x_{i-1}}{x_i - x_{i-1}} \right) \right) dx }{ \displaystyle \int_{x_{i-1}}^{x_i} \left( \frac{y_N (x - x_{i-1})}{x_i - x_{i-1}} - g\!\left( \frac{x - x_{i-1}}{x_i - x_{i-1}} \right) \right)^2 dx }. \tag{2.1}
To discretise this result for data points, we can pose the same problem in a different measure space. We may simply replace the Lebesgue measure on [0, 1] in the above formulation with Dirac masses at the recorded data points; that is, instead of approximating g we approximate {(x_i, y_i)}_{i=1}^{M} := {(x_i, g(x_i))}_{i=1}^{M}. Furthermore, to compute g((x − x_{i−1})/(x_i − x_{i−1})) we make the nearest-neighbour approximation

    g \approx \sum_{i=1}^{M} y_i\, \mathbf{1}_{\{x \,:\, |x - x_i| < |x - x_j|\ \forall j \ne i\}}.
In [IS13], this was the method taken without accounting for the requirement that |d_i| < 1. The above will typically give a valid solution if the fixed interpolation points are chosen carefully, such as selecting local extrema in an interval. To see how the constraint can fail, consider the instance when we have data {(0, 0), (0.1, −0.1), (0.5, 1), (0.6, 1), (1, 0)} and select {(0, 0), (0.1, −0.1), (1, 0)} as our interpolation points. The above formulation yields d_2 ≈ 1.05 > 1. By adding this requirement to the posed minimisation problem, we also add a constraint on the dimension, giving a computationally feasible and novel fractal approximation to functions in C([0, 1]) in the L2 sense. In the next section, using this novel construction, we present the limitations of the methodology in FIC.
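The failure described above can be reproduced directly. The sketch below (not from the thesis) discretises Equation (2.1) with Dirac masses and the nearest-neighbour approximation of g, for the data and interpolation points just given.

```python
# Sketch reproducing the d_2 > 1 failure: discretise Equation (2.1) with
# Dirac masses at the data points and a nearest-neighbour approximation of g,
# for interpolation points {(0,0), (0.1,-0.1), (1,0)}.
data_x = [0.0, 0.1, 0.5, 0.6, 1.0]
data_y = [0.0, -0.1, 1.0, 1.0, 0.0]
interp_x = [0.0, 0.1, 1.0]
interp_y = [0.0, -0.1, 0.0]
yN = interp_y[-1]

def g(x):
    """Nearest-neighbour approximation of the data."""
    j = min(range(len(data_x)), key=lambda k: abs(x - data_x[k]))
    return data_y[j]

def d_min(i):
    """Discretised unconstrained optimum d_i^min of Equation (2.1)."""
    x0, x1 = interp_x[i - 1], interp_x[i]
    y0, y1 = interp_y[i - 1], interp_y[i]
    a = x1 - x0
    num = den = 0.0
    for x, y in zip(data_x, data_y):
        if not (x0 <= x <= x1):
            continue
        p = y0 + (x - x0) * (y1 - y0) / a - y       # linear interpolant minus g
        q = yN * (x - x0) / a - g((x - x0) / a)     # scaled endpoint term minus g(u)
        num += p * q
        den += q * q
    return num / den

print(d_min(2))  # exceeds 1, violating |d_i| < 1
```

With this data the second interval gives d_2 = 1.05, so the unconstrained optimum is infeasible and the constrained problem of the next section is needed.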
2.3 One-Dimensional Fractal Image Compression
In the following computational example, we will show the methodology of fractal approximation in its simplest form: finding a self-similar continuous function defined on [0, 1] that approximates a (possibly non-smooth) function displaying self-similar properties. This yields the following optimisation problem:
    solve \quad \arg\min_{d_i \in \mathbb{R}} \; \sum_{i=1}^{n} \left( c_{2i} d_i^2 + c_{1i} d_i + c_{0i} \right)
    subject to \quad |d_i| < 1, \qquad \sum_{i=1}^{n} |d_i| > 1, \qquad \sum_{i=1}^{n} k_i |d_i| = 1;
where

    k_i = a_i^{D-1},
    c_{0i} = \int_{x_{i-1}}^{x_i} \left( y_{i-1} + \frac{(x - x_{i-1})(y_i - y_{i-1})}{x_i - x_{i-1}} - g(x) \right)^2 dx,
    c_{1i} = -\int_{x_{i-1}}^{x_i} 2 \left( y_{i-1} + \frac{(x - x_{i-1})(y_i - y_{i-1})}{x_i - x_{i-1}} - g(x) \right) \left( \frac{y_N (x - x_{i-1})}{x_i - x_{i-1}} - g\!\left( \frac{x - x_{i-1}}{x_i - x_{i-1}} \right) \right) dx,
    c_{2i} = \int_{x_{i-1}}^{x_i} \left( \frac{y_N (x - x_{i-1})}{x_i - x_{i-1}} - g\!\left( \frac{x - x_{i-1}}{x_i - x_{i-1}} \right) \right)^2 dx,

are all assumed to be known values (so that d_i^{\min} = -c_{1i}/(2c_{2i}) recovers Equation (2.1)). The constraints in this problem are non-smooth, so standard optimisation techniques are not directly applicable.
By assumption k_i = a_i^{D−1} < 1, so the third constraint implies the second. Geometrically, the second condition represents the points outside the L1 unit ball, and the third is an L1 ‘ellipse’ in this region. The points of non-differentiability are eliminated by how the first constraint intersects the L1 ellipse \sum_{i=1}^{n} k_i |d_i| = 1. This leaves 2^n disconnected convex sets of the form {d : \sum_{i=1}^{n} s_i k_i d_i = 1, s_i d_i ∈ [0, 1)} for all combinations of s_i ∈ {−1, 1}. By relaxing the problem to allow |d_i| ∈ [0, 1], we create a compact convex set with smooth boundary on which we wish to minimise a convex function. This can be graphed for N = 2.
[Plot over the (d_1, d_2)-plane on [−1, 1]².]

Figure 2.3: Feasible set of solutions for N = 2. The constraint regions are coloured by |d_i| < 1 (red) and \sum_{i=1}^{n} |d_i| > 1 (blue), with their intersection in purple, and \sum_{i=1}^{n} k_i |d_i| = 1 in pink. The intersection of the three constraints gives 2² = 4 disconnected convex bodies with smooth boundary.
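One of these 2^n convex subproblems can be sketched directly. The following is not from the thesis: it assumes fixed positive signs and illustrative (made-up) values for the coefficients c_{ji} and k_i. With the signs fixed, the L1 ellipse becomes a linear constraint and the problem reduces to a one-dimensional quadratic.

```python
# Sketch: minimise  sum_i (c2[i] d_i^2 + c1[i] d_i + c0[i])  over one of the
# convex bodies above, with the signs s_i fixed (here both positive) so the
# L1 'ellipse' constraint becomes linear:  k1*d1 + k2*d2 = 1, d_i in [0, 1].
# All coefficient values are illustrative assumptions.
c2 = [2.0, 1.0]
c1 = [-1.0, -0.5]
k = [0.8, 0.7]   # k_i = a_i^(D-1) < 1; assumes the feasible segment is non-empty

# Substitute d2 = (1 - k1*d1) / k2 and minimise the resulting quadratic in d1.
A = c2[0] + c2[1] * (k[0] / k[1]) ** 2
B = c1[0] - 2 * c2[1] * k[0] / k[1] ** 2 - c1[1] * k[0] / k[1]
d1_free = -B / (2 * A)                       # unconstrained critical point

# Feasible range for d1 so that both d1 and d2 lie in [0, 1].
lo = max(0.0, (1 - k[1]) / k[0])
hi = min(1.0, 1 / k[0])
d1 = min(max(d1_free, lo), hi)               # project onto the interval
d2 = (1 - k[0] * d1) / k[1]
print(d1, d2, k[0] * d1 + k[1] * d2)
```

Solving each of the 2^n sign patterns this way and keeping the best objective value is the brute-force version of the method; the text below explains why this does not scale and how the signs are chosen instead.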
The solution above is incomplete because, as n grows, there are exponentially many optimisations to solve. For general fractal image compression, both a continuous and a discrete optimisation must be performed to model a photograph. The continuous optimisation finds the parameter magnitudes, which are the magnitudes of the d_i above. The discrete optimisation finds the orientations of the maps, the signs of the d_i, determining whether or not each map ‘flips’ across the x-axis. In our simplified example, we eliminate the discrete optimisation by taking the signs of the d_i given by the unconstrained optimum of Equation (2.1); this projects the global optimum onto the (L2-)closest convex body in our feasible set of solutions.
On the next page we provide a computational experiment implementing the algorithm given above, highlighting the limitations of this method. A FIF consisting of four maps defined on the interval [0, 5/3] was constructed and truncated to [0, 1]. This creates a function that is rich in self-similarity, and it is not unreasonable to expect this function to be well approximated by another FIF. However, implementing the given method with the agnostic choice of interpolation points evenly spaced in the domain of our approximating function, our FIF approximation to the truncated FIF will never describe this function exactly, since 4 and 5 are co-prime.

In the first given FIC algorithm [Jac92], an exact description of an (infinite-resolution) image of a Cantor set will never be achieved. Simply stated, in this algorithm the generalised ‘points of interpolation’ are chosen dyadically, and two is co-prime to three (the Cantor set sitting on the points without a one in their ternary expansion). Generally, it is not known explicitly what type of functions the methods from FIC approximate; it has only been shown that they work well empirically for photographs. For a graphically convincing approximation of this Cantor set image, a great number of maps (upwards of six) is needed, despite only two maps being required to generate the image to infinite resolution. Likewise, 16 maps were needed to gain an approximation that resembled our self-similar function. As a final hindrance of this method, by moving to L2 much of the intuition from the formulation in H(X) is lost.
Example 2.24. Take the (discretised) one-dimensional image I ∈ R1000 given by I := e1 =
(1, 0, . . . , 0), that is, sampled values from a function on the unit interval. An approximant to
this image is given by e2 = (0, 1, 0, . . . , 0). In any Lp norm for 1 ≤ p ≤ ∞ it will be the case that
e2 is a ‘bad’ approximant to e1. In the Hausdorff distance e2 is still close to e1, and visually, e2
is a ‘good’ approximation to e1.
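Example 2.24 can be verified directly. The sketch below is not from the thesis; it assumes the convention of discretising each image's graph as a point set over an even grid.

```python
import math

# Sketch of Example 2.24: two 'one-dimensional images' that are far apart in
# every Lp norm yet close in the Hausdorff distance between their graphs.
n = 1000
xs = [i / (n - 1) for i in range(n)]
e1 = [1.0] + [0.0] * (n - 1)
e2 = [0.0, 1.0] + [0.0] * (n - 2)

sup_norm = max(abs(a - b) for a, b in zip(e1, e2))
l2_norm = math.sqrt(sum((a - b) ** 2 for a, b in zip(e1, e2)))

def hausdorff(P, Q):
    """Hausdorff distance between two finite point sets in the plane."""
    d = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
    h = lambda A, B: max(min(d(p, q) for q in B) for p in A)
    return max(h(P, Q), h(Q, P))

G1, G2 = list(zip(xs, e1)), list(zip(xs, e2))
h_dist = hausdorff(G1, G2)
print(sup_norm, l2_norm, h_dist)  # large Lp distances, tiny Hausdorff distance
```

The sup and L2 distances are of order one, while the Hausdorff distance between the graphs is only one grid spacing, matching the visual judgement described in the example.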
The goal for the remainder of this thesis is to create a methodology that focuses on fractally modelling an object, as opposed to describing it, in the hope that when we do describe a self-similar object we also capture its self-similar properties. For example, given an exact copy of the Cantor set, how does one deduce whether this set is the attractor of an affine IFS? Currently there is no solution to this question recorded in the literature. We want an approximation method that may find an exact solution to such an inverse problem when it exists.
[Figure 2.4 shows four panels of data against approximant: box-counting dimensions D = 1.2, 1.4, 1.6 and 1.8, with Hausdorff distances 0.072, 0.046, 0.132 and 0.346 respectively.]

Figure 2.4: Rough data (blue) and self-similar function approximants (in the L2 sense) with varying box-counting dimensions.
[Figure 2.5 shows one panel: an approximant with box-counting dimension D = 1.538 and Hausdorff distance 0.097 to the data.]

Figure 2.5: Data (blue) and an approximant that is self-similar, matches the data's box-counting dimension, and is close to the data in the L2 sense; in this instance it also approximates the Hausdorff distance well.
Chapter 3
The Probabilists P(X) and
Self-Similar Measures
‘Let’s first talk about a coin. Understand the coin, understand probability.’
— Pierre Portal.
In this chapter we move to P(X), the space of compactly supported normalised Borel measures
on a metric space (X, d). By working with measures, we can use the measure theoretic quantities
of moment data to assist in creating better models for self-similar objects. This leads us to the
main problem in this chapter: how do we explicitly relate moment values of a measure to our
IFS parameters? To address this, we derive formulas relating the moments of a one-dimensional
affine self-similar measure, complex similitude self-similar measure and two-dimensional affine
self-similar measure to the IFS parameters that generate them.
The moments of a self-similar measure have in general been of interest for constructing a Hilbert space and studying its orthogonal polynomials [Man96, Man13] (see Figure A.2). We intend to use them in addressing the inverse problem of reconstructing a self-similar measure from its moments. To this end, we hope that simply measuring the moments of a data set will lead to constructing an IFS that approximates a self-similar object. The formulas derived in the one-dimensional and complex cases have been studied elsewhere, but to the best of our knowledge the formulas for the two-dimensional case are a novel addition to the literature.

The chapter has the following structure. The space P(X) and a contractive operator whose fixed point yields a ‘fractal measure’ are constructed. This is analogous to how we built self-similar functions in H(X). We then move to deriving the moment formulas for a variety of self-similar measures. This includes measures given by the attractors of IFSs consisting of one-dimensional real and complex affine functions, and finally two-dimensional affine functions. We conclude with computational examples of calculating the moments of two-dimensional self-affine measures.
3.1 Self-Similar Measures
To motivate moving on from H(X) to a more abstract space, consider the following example.
Example 3.1. Let the IFS F_{Cant} = {([0, 1], | · |); f_1(x) = \frac{1}{3}x, f_2(x) = \frac{1}{3}x + \frac{2}{3}} be given. The action of F_{Cant} is given by the Hutchinson operator (Definition 2.1), F_{Cant} : H([0, 1]) → H([0, 1]), and its attractor is the Cantor set C.
Given the Cantor set C, is there a computational way to find the IFS F_{Cant}, or any other IFS that gives the same attractor? To simplify the problem, assume that the Cantor set is the attractor of an IFS F containing two affine maps acting on [0, 1]. Computationally, we are fitting the fractal model F = {([0, 1], | · |); f_1(x) = a_1 x + b_1, f_2(x) = a_2 x + b_2} to C and solving for the IFS parameters. Using the methods developed from fractal image compression in the former chapter yields the following optimisation problem:

    solve \quad \arg\min_{\substack{a_1, a_2 \in (-1, 1) \\ b_1, b_2 \in [0, 1]}} d_H(C, A) \quad subject to \quad F(A) = A,

where A ∈ H(X), and the constraint in this optimisation implies that A is the attractor of the IFS F. Ideally one would wish to minimise the Hausdorff distance directly, but the space (H(X), d_H) lacks a norm, so any general theory of optimisation pertaining to a Banach space is lost. Furthermore, in this example one can no longer switch to C([0, 1]) and create a least-squares-type solution as we did in our simplified FIC method, as the Cantor set cannot be identified in such a space. In H(X) there is no immediate solution to the inverse problem above.
A new home must be created for our fractal models. In an analogous fashion to how H(X) was created, the space P(X) of compactly supported normalised Borel measures on a compact metric space will be constructed. Here we place the stronger assumption of (X, d) being compact rather than just complete, so that the arguments presented are more concise.

Remark 3.2. Given a metric space (X, d), the Borel sets are precisely those which can be formed through countable unions, countable intersections and complements of the topology's open sets. This collection of sets will be denoted B(X).
Definition 3.3. Let (X, d) be a compact metric space; then P(X) is the space of normalised Borel measures on X. This space is metrisable with the Monge–Kantorovich metric given through

    d_P(\mu, \nu) := \sup_{g \in \mathrm{Lip}_1(X)} \left\{ \int g \, d(\mu - \nu) \right\},

where the notation \int g \, d(\mu - \nu) := \int g \, d\mu - \int g \, d\nu is used and \mathrm{Lip}_1(X) := \{ f : X \to \mathbb{R} \,:\, |f(x) - f(y)| \le d(x, y)\ \forall x, y \in X \} is the space of Lipschitz functions X → R whose Lipschitz constant is no larger than one.
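To make the definition concrete, here is a small sketch (not from the thesis) computing d_P for discrete measures on the line, using the standard duality fact that on R the Monge–Kantorovich metric equals the integral of the absolute difference of the cumulative distribution functions; the measures chosen are illustrative.

```python
# Sketch: for measures on the real line, the Monge-Kantorovich metric equals
# the integral of |F_mu - F_nu|, where F denotes the CDF (a standard duality
# fact). Discrete measures are represented as {atom: mass} dicts.
def dP(mu, nu):
    xs = sorted(set(mu) | set(nu))
    total, Fmu, Fnu = 0.0, 0.0, 0.0
    for a, b in zip(xs, xs[1:]):
        Fmu += mu.get(a, 0.0)
        Fnu += nu.get(a, 0.0)
        total += abs(Fmu - Fnu) * (b - a)   # CDFs are constant on [a, b)
    return total

delta0 = {0.0: 1.0}
delta1 = {1.0: 1.0}
half = {0.0: 0.5, 1.0: 0.5}
print(dP(delta0, delta1))   # distance between unit masses at 0 and 1
print(dP(delta0, half))
```

As expected, the distance between point masses at 0 and 1 is the distance between their supports, and splitting half the mass halves the distance.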
Proposition 3.4. Let (X, d) be a compact metric space; then the metric space (P(X), d_P) is compact.

A sketch of this proof will be briefly discussed, to highlight why the claim is true and also to show that the topology on this space is equivalent to a weak topology.

Assume (X, d) is a compact metric space and let C(X, R) be the space of continuous functions from X to R. Then every element of C(X, R) is necessarily bounded. The dual of this space, denoted C∗(X, R), is the space of regular Borel measures on X with bounded variation, through the Riesz–Markov Representation Theorem. Note that the measures in C∗(X, R) are not required to be normalised. In the norm induced on the dual space, the unit ball of C∗(X, R) is precisely the set of measures of total variation at most one. By the Banach–Alaoglu Theorem, the unit ball of C∗(X, R) is compact under the weak-∗ topology. It is a known fact [Hut79] that this topology is metrisable with the metric d_P on the closed subset P(X) of positive measures µ with µ(X) = 1.
In analogy with how an IFS was defined in the previous chapter, an IFS with probabilities will now be defined. Let a stochastic vector of length N be p = (p_1, . . . , p_N) ∈ R^N such that 0 < p_i < 1 and \sum_{i=1}^{N} p_i = 1.
Definition 3.5. Let an IFS F = {(X, d); f1, . . . , fN} be given, where it is further assumed
(X, d) is compact. Then given a stochastic vector p = (p1, . . . , pN ), an IFS with probabilities is
the collection
M := {(X, d); f1, . . . , fN ; p}.
Finally a contractive operator on our newly created compact metric space will be defined.
Through another mild abuse of notation, the symbol M for an IFS with probabilities is also
used to represent this operator M : P(X)→ P(X) in the definition below.
Definition 3.6. Let an IFS with probabilities M = {(X, d); f_1, . . . , f_N ; p} be given. Define f_i^{-1} : B(X) → B(X) through f_i^{-1}(U) := {u ∈ X : f_i(u) ∈ U} for U ∈ B(X). Then the operator M : P(X) → P(X) is given through

    (M(\mu))(S) = \sum_{i=1}^{N} p_i \, \mu(f_i^{-1}(S)) \qquad \forall S \in B(X).
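A minimal sketch (not from the thesis) of this operator acting on a finitely supported measure, represented as a dict from atoms to masses; the Cantor IFS with equal probabilities is used, anticipating Example 3.7.

```python
# Sketch: apply the operator M to a discrete measure by pushing each atom
# through every map f_i with weight p_i (the push-forward form of the
# definition above), here for the Cantor IFS with p_1 = p_2 = 1/2.
maps = [lambda x: x / 3, lambda x: x / 3 + 2 / 3]
probs = [0.5, 0.5]

def M(mu):
    out = {}
    for x, w in mu.items():
        for f, p in zip(maps, probs):
            y = f(x)
            out[y] = out.get(y, 0.0) + p * w
    return out

mu = {0.0: 1.0}              # start from a unit mass at 0
for _ in range(5):
    mu = M(mu)

print(len(mu), sum(mu.values()))   # 2^5 atoms of total mass 1
```

Each iteration doubles the number of atoms while preserving total mass, and after one step every atom lies in the first Cantor approximation [0, 1/3] ∪ [2/3, 1].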
This operator M is a contraction on P(X) equipped with the metric d_P. To see this in a simplified instance, take the case when our IFS with probabilities consists of only one map f_1, forcing p_1 = 1, granting

    d_P(M(\mu), M(\nu)) = \sup_{g \in \mathrm{Lip}_1(X)} \int g \, d(M\mu - M\nu) = \sup_{g \in \mathrm{Lip}_1(X)} \int g \circ f_1 \, d(\mu - \nu) = \sup_{g \in \mathrm{Lip}_1(X)} \int \lambda_1 \tilde{g} \, d(\mu - \nu) \le \lambda_1 \, d_P(\mu, \nu),

where λ_1 is the contractivity factor for the map f_1, and it is easily checked by definition that \tilde{g} := \frac{1}{\lambda_1} g \circ f_1 ∈ Lip_1(X). The argument above extends to an IFS with multiple maps, but the notation makes it less clear.
This implies the operator M has a unique attractive fixed point through Banach’s Fixed Point
Theorem. We will denote this fixed point µA and call this the attractive measure of an IFS with
probabilities. To demonstrate the relationship between an IFS and an IFS with probabilities we
present the following example.
Example 3.7. Let M_{Cant} = {([0, 1], | · |); f_1(x) = \frac{1}{3}x, f_2(x) = \frac{1}{3}x + \frac{2}{3}; p_1 = p_2 = \frac{1}{2}} be an IFS with probabilities. Then taking M_{Cant} : P([0, 1]) → P([0, 1]) as defined above, we may iterate the operator M_{Cant} on any element of P([0, 1]) to aid in finding the attractive measure µ_A.
Seeing how FCant acted on an arbitrary set in H([0, 1]) in Example 3.1 was difficult, thus the
unit interval was used as an ‘easy’ set to iterate. To stay in familiar territory here, we iterate
a measure that is absolutely continuous with respect to Lebesgue measure which we denote by
µL. Specifically, let dM^n/dµ_L denote the Radon–Nikodym derivative of M^n_{Cant}(µ_L) with respect to Lebesgue measure. Then dM^0/dµ_L(x) = 1 for µ_L-almost every x ∈ [0, 1]. Proceeding with one iteration of M_{Cant} reveals some intuition:

    (M_{Cant}(\mu_L))(S) = \frac{1}{2} \left( \mu_L(f_1^{-1}(S)) + \mu_L(f_2^{-1}(S)) \right) = \int_S \frac{3}{2} \left( \mathbf{1}_{[0,1/3]}(x) + \mathbf{1}_{[2/3,1]}(x) \right) d\mu_L(x) \;\Rightarrow\; \frac{dM^1}{d\mu_L}(x) = \frac{3}{2} \left( \mathbf{1}_{[0,1/3]}(x) + \mathbf{1}_{[2/3,1]}(x) \right),

where we are identifying dM^1/dµ_L up to Lebesgue-almost-everywhere equivalence. After repeatedly applying M_{Cant}, the general relationship

    \frac{dM^n}{d\mu_L}(x) = \left( \frac{3}{2} \right)^n \mathbf{1}_{F^n_{Cant}([0,1])}(x)

can be identified. This is the uniform measure on the nth approximation of the Cantor set given in Example 3.1. The attractor of M_{Cant} is ‘uniform measure on the Cantor set’, which we denote µ_C. In general, the fixed point measure µ_A is supported on a subset of A [Bar14]. A ‘traditional’ probability density cannot be written for µ_C, as this measure is singular with respect to Lebesgue measure, hence the Radon–Nikodym derivative with respect to µ_L is not defined. This restriction is similar to how we cannot write the Cantor set as a union of intervals, yet its most common construction comes from the closure of the limit of a union of intervals.
In the example above, it was not straightforward to make a series of pictures converging to our fractal attractor as we did in Example 2.1. We now ask: is there a way to visualise our fractal measures? The short answer is yes [Elt87], through an algorithm called the chaos game [BE88]. This algorithm will give an efficient means to compute images of fractal objects in H(X), as the support of µ_A is contained in A. Furthermore, this algorithm will allude to how measure-theoretic quantities aid in the reconstruction of a fractal object in H(X). Finally, we rid ourselves of some misconceptions of fractal geometry given by the deterministic algorithm of fractal generation (see Example 2.1).
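The chaos game itself can be sketched in a few lines. The following is not from the thesis; the map choice is uniform, matching p_1 = p_2 = 1/2, and the starting point, seed and iteration count are arbitrary choices.

```python
import random

# Sketch of the chaos game for the Cantor IFS with probabilities: repeatedly
# apply a randomly chosen map; the orbit distributes itself over the support
# of the attractive measure mu_A.
random.seed(0)
maps = [lambda x: x / 3, lambda x: x / 3 + 2 / 3]

x = random.random()          # arbitrary starting point in [0, 1]
orbit = []
for _ in range(10000):
    x = random.choice(maps)(x)
    orbit.append(x)

# After the first step the orbit never enters the removed middle third.
inside_gap = sum(1 for p in orbit if 1 / 3 < p < 2 / 3)
print(inside_gap)
```

A histogram of such an orbit approximates µ_A, which is how pictures of fractal measures (and hence of attractors in H(X)) are produced in practice.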
As a side note, the following survey on an easily made mistake in fractal geometry was conducted on several Honours/Masters students at the ANU MSI. Take the IFS F_{Cant} from Example 3.1 and look at the limit F^n_{Cant}([0, 1]) as n tends to infinity. When questioned, those asked responded that this limit (in the Hausdorff sense) will be the Cantor set. Abusing some notation, the question of F^n_{Cant}((0, 1)) as n tends to infinity was then posed. The answer always given was the empty set. In iterating (0, 1), however, we have thrown away only a countable number of points in comparison to iterating [0, 1], whereas the Cantor set is uncountable. The actual answer to the second question is that we are left with µ_C-almost all of the Cantor set. Less intuitive examples of this same misunderstanding can be made, such as removing countably many line segments from a Sierpinski triangle. This point will be formalised later in the language of measure theory.

What is noteworthy is the different perspective that thinking in a measure-theoretic manner grants, whereby the confusion above should never happen. This is a minor insight into what the space P(X) has to offer in better understanding fractal objects. We will see that studying fractals in P(X) will lead to new ways in which we can create fractal models.
Before explaining this new algorithm, some properties of self-similar measures need to be deduced. These properties are most easily seen through an explicit example, with the general theory quoted. The tools required to explicitly prove the quoted results will be presented when we unite the spaces H(X) and P(X) in the next chapter.
Example 3.8. Consider the IFS F_L = {([0, 1], | · |); f_1(x) = \frac{1}{2}x, f_2(x) = \frac{1}{2}x + \frac{1}{2}}. Define an associated IFS with probabilities M_L by equipping the maps in F_L with p_1 = p_2 = \frac{1}{2}. The respective attractors are [0, 1] ∈ H([0, 1]) and Lebesgue measure on [0, 1], which we previously denoted by µ_L ∈ P([0, 1]).
Using this example, a dynamical system will be created; that is, the tuple (X, B(X), µ, T) where X is a set, B(X) its associated Borel subsets, µ a finite measure on B(X) and T : B(X) → B(X) a transformation such that µ(T^{-1}(O)) = µ(O) for all O ∈ B(X). In our example above, we have the first three items of this tuple. To construct the transformation T we require the following definition.
Definition 3.9. If an IFS F = {(X, d); f_1, . . . , f_N} is such that each of the maps f_i is invertible, then F is an invertible IFS. In this case, the inverse IFS is given by F^{-1} = {(X, d); f_1^{-1}, . . . , f_N^{-1}}, and we define F^{-1} : B(X) → B(X) through

    \mathcal{F}^{-1}(O) = \bigcup_{i=1}^{N} f_i^{-1}(O).
Example 3.10. (continuing Example 3.8)
The inverse IFS of F_L is F_L^{-1} = {([0, 1], | · |); f_1^{-1}(x) = 2x|_{[0,1/2]}, f_2^{-1}(x) = (2x − 1)|_{[1/2,1]}}, where the notation |_{[0,1/2]} is taken to mean that the domain of the function is restricted to the interval [0, 1/2]. Typically this restriction need not be written, by the definition of F^{-1} : B(X) → B(X), but we do it here for illustrative reasons that become clear in a later example. Note that F_L^{-1}({1/2}) = {0, 1}, whilst for every x ∈ [0, 1] \ {1/2} the singleton {x} is mapped to another singleton under F_L^{-1}. This point will be formalised later, but the idea here is that {1/2} is a µ_L-null set, so the above ambiguity does not matter for ‘almost everywhere’ statements.
Lemma 3.11. Let F be an IFS consisting of invertible functions and M be a corresponding IFS with probabilities. Let the fixed points/attractors of these be A ∈ H(X) and µ_A ∈ P(X) respectively. Then F^{-1} preserves the measure µ_A; that is, µ_A(F(O)) = µ_A(O) for all O ∈ B(X).

Proof. Take O ∈ B(X). Since µ_A is the fixed point of M,

    \mu_A(\mathcal{F}(O)) = (M(\mu_A))(\mathcal{F}(O)) = \sum_{i=1}^{N} p_i \, \mu_A(f_i^{-1} f_i O) = \sum_{i=1}^{N} p_i \, \mu_A(O) = \mu_A(O).
Definition 3.12. A dynamical system is a Borel measurable space (X, B(X)) with a normalised Borel measure µ : B(X) → [0, 1], and a measure-preserving transformation T : (X, B(X), µ) → (X, B(X), µ) as above. We denote this by the tuple (X, B(X), µ, T).

Corollary 3.13. Given an invertible IFS with probabilities, the tuple (X, B(X), µ_A, F^{-1}) forms a dynamical system.
Remark 3.14. The above proof of Lemma 3.11 does not depend on the probabilities chosen. In particular, there are uncountably many measures, given by equipping an IFS F with various stochastic vectors, for which F^{-1} is a measure-preserving transformation.
Example 3.15. (continuing Example 3.8)
The tuple ([0, 1], B([0, 1]), µ_L, F_L^{-1}) forms a dynamical system; explicitly, it is an n-ary system [Wal00] with n = 2 that can be rewritten (up to Lebesgue-almost-everywhere equivalence) as ([0, 1], B([0, 1]), µ_L, x ↦ 2x mod 1). The graph of this function is shown below. Note that x ↦ 2x mod 1 maps the unit interval to the unit interval; in the language of IFSs, F^{-1} maps the attractor A to the attractor A in general.

Figure 3.1: The n-ary transformation for n = 2 and the (splitting) orbit of 1/8 under F_L^{-1}.
The ambiguity alluded to in Example 3.10 will now be addressed. The function x ↦ 2x mod 1 is a well-defined function; however, F^{-1} : [0, 1] → [0, 1] is not well defined on singletons, as again F^{-1}({1/2}) = {0, 1}. In the definition of a measure-preserving transformation, however, we only require that µ(T^{-1}(O)) = µ(O). Therefore, we only require that F^{-1} and x ↦ 2x mod 1 agree on a set of µ-measure one. Explicitly, the set we take is U = [0, 1] \ \bigcup_{n \in \mathbb{N}} \mathcal{F}^n(\{1/2\}), which ensures that this ambiguity does not happen for any iterate of F^{-1}. It is easily seen that µ_L(U) = 1, as U disregards only a countable collection of Lebesgue-null sets.
It is convenient to use fractal codespace to continue with our example. The theory presented in this example holds rather generally, so the machinery is meant to be illustrative.

Recall [N] := {1, . . . , N} and let [N]^k be the words of length k, with [N]^ℕ = \lim_{k \to \infty} [N]^k the infinite words. A (metrisable) topology on [N]^ℕ can be defined by letting the sets of the form

    \Delta(\omega_1 \cdots \omega_l) = \{ \omega \in [N]^{\mathbb{N}} : \omega|_l = \omega_1 \cdots \omega_l \}, \qquad \omega_i \in [N],

form a topological sub-base, where ω|_l is the lth truncation of the infinite word ω. In general ergodic theory, sets of the above form are referred to as cylinder sets [Wal00]. Furthermore, for each letter i ∈ [N] we associate a fixed probability p_i ∈ (0, 1) such that \sum_{i=1}^{N} p_i = 1, and define the measure ν : B([N]^ℕ) → [0, 1] by ν(\Delta(\omega_1 \cdots \omega_l)) = \prod_{i=1}^{l} p_{\omega_i}.
A dynamical system on the measure space ([N]^N, B([N]^N), ν) can be formed by considering the
operator S : [N]^N → [N]^N given by S(ω_1 ω_2 ω_3 . . . ) = (ω_2 ω_3 . . . ). It is readily checked that S is
ν-measure preserving on the cylinder sets defined above through the following calculation:

ν(S^{-1}(∆(ω_1 · · · ω_l))) = ν(⋃_{j∈[N]} ∆(j ω_1 · · · ω_l)) = Σ_{j∈[N]} p_j ν(∆(ω_1 · · · ω_l)) = ν(∆(ω_1 · · · ω_l)),

where we have used that the sets ∆(j ω_1 · · · ω_l) are disjoint for j ∈ [N]. A dynamical system is
given by the tuple ([N]^N, B([N]^N), ν, S), which is better known as a Bernoulli shift on N letters
with probabilities p.
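The measure-preservation calculation above is simple to check numerically. The following sketch (the function names are our own, not from the thesis) computes the Bernoulli measure of a cylinder set and of its preimage under the shift S:

```python
def nu(word, p):
    # Bernoulli measure of the cylinder set Delta(word), letters in {1, ..., N}
    out = 1.0
    for w in word:
        out *= p[w - 1]
    return out

p = [0.3, 0.7]          # probabilities attached to the letters 1 and 2
word = (1, 2, 1)
# S^{-1} Delta(word) is the disjoint union of Delta((j,) + word) over letters j
preimage_measure = sum(nu((j,) + word, p) for j in (1, 2))
```

Since the letter probabilities sum to one, the preimage has the same measure as the cylinder itself, which is exactly the calculation displayed above.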
With this notation in place, we may introduce a fundamental structure in fractal geometry.
Proposition 3.16. The dynamical system ([0, 1], B([0, 1]), λ, F_L^{-1}) is isomorphic (in the ergodic
theory sense [Wal00]) to a Bernoulli shift on two letters with uniform probabilities.
Proof. Two dynamical systems are isomorphic if there exists a bijective doubly measure-preserving
map π (that is, π and π^{-1} are measure-preserving transformations) such that the following
diagram commutes:

    ([0, 1], B([0, 1]))  --F_L^{-1}-->  ([0, 1], B([0, 1]))
          | π^{-1}                            | π^{-1}
          v                                   v
    ([N]^N, B([N]^N))  ------S----->   ([N]^N, B([N]^N))

(and similarly with π in the upward direction).
28 CHAPTER 3. THE PROBABILISTS P(X) AND SELF-SIMILAR MEASURES
Let U = [0, 1] \ {x : x = i/2^k, k ∈ N, i ∈ {1, . . . , 2^k − 1}} and note that λ(U) = 1. Now given
a, b ∈ U with a < b, we can uniquely represent a and b by their dyadic expansions a = Σ_{i=1}^∞ a_i/2^i
and b = Σ_{i=1}^∞ b_i/2^i for a_i, b_i ∈ {0, 1}. Define the mapping π^{-1} : U → [2]^N through

π^{-1}(a) = (a_1 + 1, a_2 + 1, . . . ),

where the a_i are the digits given by the number a's dyadic expansion. It is easily checked that
π^{-1} is bijective and doubly measure preserving on dyadic intervals, whose unions can approximate
any interval arbitrarily well. This shows π^{-1} is measure preserving on the generators of our Borel
subsets, which means π^{-1} is doubly measure preserving (Theorem 1.1 [Wal00]). Finally, we have
for an arbitrary a ∈ U that

π^{-1} ∘ F_L^{-1}(a) = π^{-1} ∘ F_L^{-1}(Σ_{i=1}^∞ a_i/2^i) = π^{-1}(Σ_{i=1}^∞ a_{i+1}/2^i) = (a_2 + 1, a_3 + 1, . . . ),

and

S ∘ π^{-1}(a) = (a_2 + 1, a_3 + 1, . . . ).

Thus, π^{-1} ∘ F_L^{-1} = S ∘ π^{-1}, giving the claim.
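The conjugacy in this proof can be illustrated concretely: doubling a number in [0, 1) shifts its dyadic digit sequence, which is exactly the action of S under π^{-1}. A minimal sketch (the helper names are our own; doubling a binary float is exact, so the digit comparison is reliable here):

```python
def dyadic_digits(a, n):
    # first n dyadic digits of a in [0, 1)
    out = []
    for _ in range(n):
        a *= 2
        d = int(a)
        out.append(d)
        a -= d
    return out

def pi_inv(a, n):
    # pi^{-1}(a) truncated to n letters: digits shifted into the alphabet {1, 2}
    return [d + 1 for d in dyadic_digits(a, n)]

a = 0.3
left = pi_inv((2 * a) % 1.0, 8)    # pi^{-1} applied after F_L^{-1}
right = pi_inv(a, 9)[1:]           # shift S applied after pi^{-1}
```

Both sides produce the same word, matching π^{-1} ∘ F_L^{-1} = S ∘ π^{-1}.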
The above map π, defined as the inverse of π^{-1}, is often referred to as the projection map from
codespace. Proposition 3.16 is a simplification of the stronger result given below.
Remark 3.17. If an IFS that does not obey the OSC is selected, as opposed to the one given in
the example above, then π will not be bijective. In this case, the projected dynamical system
from codespace to the attractor becomes a factor of a Bernoulli shift [Wal00] and leads into the
theory of fractal tops.
Theorem 3.18. Assume an invertible IFS F of N maps that obeys the open set condition is
given. For any associated IFS with probabilities M, the dynamical system (X, B(X), µ_A, F^{-1})
is isomorphic to a Bernoulli shift on N letters with probabilities p.
Remark 3.19. To prove the theorem above with the tools we have presented thus far would be
cumbersome, so we delay the key detail to Chapter 4. Instead, we will highlight the importance
of the OSC. This assumption gives sufficient separation to prove that points on the attractor of
F with multiple addresses form a µA-null set and may be disregarded.
The isomorphism in Theorem 3.18 gives a great amount of structure on our fractal objects when
they obey the open set condition, in the sense that we may think of them in the same way as
one would model flipping a coin repeatedly.
Corollary 3.20. The dynamical system (X, B(X), µ_A, F^{-1}) is ergodic. That is, any B ∈ B(X)
which satisfies F(B) = B has µ_A-measure 0 or 1.
Given this structure, another way to generate our fractal objects emerges: we approximate the
fixed-point measure µ_A of an IFS with probabilities through ergodic theory.
Given a dynamical system (X, B(X), µ, T), the Point-Wise Ergodic Theorem states

lim_{n→∞} (1/n) Σ_{i=1}^n f(T^i(x_0)) = ∫ f dµ

for µ-almost any x_0 and any continuous f : X → R. The interpretation of this equality is that
the average obtained by following the orbit of a single point x_0 equals the average of the measure.
Equivalently,

(1/n) Σ_{k=1}^n 1_(·)(x_k) → µ weakly,

where {x_k} is the orbit of a point in the support of µ under iteration of T. The quantity on the
left is what we will refer to as the empirical distribution of µ.
Through the ergodic theorem, we are able to construct an algorithm that approximates the fixed
point measure µ of an IFS with probabilities in a computationally efficient manner. This concept
is explored deeply in the work of John Elton [Elt87], whereby the language of iterated function
systems and ergodic theory are united. The key detail in this work is that it is non-trivial to
sample a point on the support of the fixed-point measure µA. In Elton’s work, he shows under
far fewer assumptions than what we have here that when the IFS is ‘contractive on average’,
one can create such an algorithm. We present the main lemma he proved, which is trivial under
our current assumptions, to elucidate this central algorithm.
Notation 3.21. Given a word ω_1 ω_2 ω_3 · · · = ω ∈ [N]^∞ for ω_i ∈ [N], we recall ω|_n is the nth
level truncation of this word. For instance, take [3] = {1, 2, 3}; then (123123111 . . . )|_3 = (123).
Furthermore, given an IFS F = {(X, d); f_1, . . . , f_N} we define f_{ω|n} : X → X through

f_{ω|n}(x) := f_{ω_1} ∘ · · · ∘ f_{ω_n}(x).
Lemma 3.22. Let x, y ∈ X with x ≠ y, any ε > 0, and any word ω ∈ [N]^∞ be given. Then
there exists n ∈ N such that

d(f_{ω|m}(x), f_{ω|m}(y)) < ε d(x, y),

for all m ≥ n.
This result states that given any point y ∈ X and any point x in the support of µ, these
points are mapped arbitrarily close to one another under f_{ω|m} for m large enough. If this
sequence of functions is chosen with the appropriate probabilities to approximate the orbit of a
sequence {T^k(x_0)}_{k=0}^n, then we have effectively approximated a sample from the measure µ and
the algorithm is complete. An example is shown below.
Figure 3.2: The empirical distribution (1/n) Σ_{k=1}^n 1_(·)(x_k) for various n (with added colouring
corresponding to which map was last applied to x_k in its orbit) weakly approaching a measure
supported on a Steemson triangle.
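The sampling algorithm behind figures of this kind is often called the chaos game: follow a single random orbit, applying map i with probability p_i, and record the visited points after a short burn-in. A minimal sketch (the code, names, and the three Sierpinski-style maps below are our own illustrative choices, not the thesis implementation):

```python
import random

def chaos_game(maps, weights, n, x0=(0.0, 0.0), burn=100):
    # Follow one random orbit of the IFS. The empirical distribution of the
    # recorded points weakly approximates the invariant measure.
    random.seed(0)
    x, y = x0
    pts = []
    for k in range(n + burn):
        a, b, c, d, e, f = random.choices(maps, weights)[0]
        x, y = a * x + b * y + e, c * x + d * y + f
        if k >= burn:
            pts.append((x, y))
    return pts

# three maps contracting by 1/2 towards the corners (0,0), (1,0), (0,1)
maps = [(0.5, 0, 0, 0.5, e, f) for e, f in [(0, 0), (0.5, 0), (0, 0.5)]]
pts = chaos_game(maps, [1, 1, 1], 20000)
```

With uniform probabilities the sample mean of the x-coordinates approaches the centroid value 1/3, in line with the ergodic averages discussed above.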
Given this algorithm to approximate a fractal measure, we may demonstrate how H(X) and
P(X) differ in modelling photographs. Two fractal models of leaves are shown below, one in
H([0, 1]^2) and the other in P([0, 1]^2). In this example, the self-similar measure is represented as
a grey-scale image by making a frequency plot of the empirical distribution, whilst an element
of H([0, 1]^2), being a set, is shown as a black and white image. From a modelling perspective
P(X) is a 'larger' space.
Figure 3.3: The attractors of an IFS in H([0, 1]^2), and an analogous IFS with probabilities in
P([0, 1]^2) produced by the graduate student algorithm. In the first instance, H([0, 1]^2) has the
ability to model sets and is therefore represented by a black and white image. In the second
case we display the empirical distribution of our fractal measure as a grey-scale image. In this
instance we can also model the leaf interior through carefully selecting the self-similar measure
supported on the self-similar set A.
Given the image of the leaf on the right, can we find the IFS that generates it? This is one
of the major open problems in fractal reconstruction and approximation, which we will aim to
answer in a simplified instance. We aim to do this by relating where the orbit {x_k}_{k=0}^N lands
'on average' in the algorithm above to the IFS maps, and further, to the map parameters that
generate this orbit.
Specifically, we view µ_A as a probability measure and aim to evaluate ∫ x dµ_A(x), which is the
expectation of where this point lands. More broadly we would like to calculate the moments of the
probability distribution, {∫ x^k dµ_A(x)}_{k=0}^∞. Note that as the measure µ_A has compact support,
all of these integrals are finite for each k ∈ N_0. Furthermore, such a measure can be described
by the infinite collection of such integrals; this creates a determined moment problem.
Proposition 3.23. Let a compactly supported finite Borel measure µ on R^n be given. Then all
the moments m_n := ∫ x^n dµ are finite and, further, uniquely determine the measure µ.
Proof. We may re-interpret the metric d_P through stating that a measure is characterised by
the integrals of continuous functions; that is, if

∫ f dµ = ∫ f dν for all continuous f,

then µ = ν. If all the moments exist and agree for the measures µ and ν, then we have
that ∫ q dµ = ∫ q dν for all polynomials q. As we assumed that µ and ν have compact support,
continuous functions are uniformly approximated by polynomials, granting that for any ε > 0
and any continuous function f we may find a polynomial q′ such that

|∫ f dµ − ∫ f dν| ≤ ∫ |f − q′| dµ + ∫ |q′ − f| dν < ε.

This implies that d_P(µ, ν) = 0, so µ = ν.
Remark 3.24. The statement above is not always true when a measure does not have compact
support, as polynomials may not uniformly approximate continuous functions. In general, a
measure is not uniquely determined by finitely many moments; this is an undetermined moment
problem.
The fractal models presented in Figure 3.3 were formed by looking at the self-similar properties
of a maple leaf. The current standard for such a modelling task is to have a human heuristically
create the maps that approximate the leaf; recall this is the graduate student algorithm. No
computer algorithm is available to automate such a process. As shown above, the grey-scale
leaf model is characterised by its infinite collection of moments. Our goal is to use the moments
of an object to assist a computer in creating such self-similar models. Now that the moment
problem has been motivated, the following section describes how one efficiently calculates the
moments for a given affine IFS.
3.2 Fractal Moment Theory
Given an IFS whose functions are all affine, one may identify a recurrence relation to calculate
the moment integrals. This allows for explicit closed-form formulas for the moments of a
self-similar measure, depending only on the IFS parameters. For the calculation of fractal
moments, we rely heavily on the following proposition.
Proposition 3.25. Let M = {(X, d); f_1, . . . , f_N; p} be an invertible IFS with probabilities. Then
the attractive measure µ_A of M obeys

∫ g(x) dµ_A(x) = Σ_{i=1}^N p_i ∫ g(f_i(x)) dµ_A(x),

for all g ∈ L^1(X, B(X), µ_A).
Proof. Fix g ∈ L^1(X, B(X), µ_A); then by direct computation,

∫ g(x) dµ_A(x) = ∫ g(x) dMµ_A(x) = ∫ g(x) d(Σ_{i=1}^N p_i µ_A(f_i^{-1}(x)))
= Σ_{i=1}^N p_i ∫ g(x) dµ_A(f_i^{-1}(x)) = Σ_{i=1}^N p_i ∫ g(f_i(x)) dµ_A(x),

where we have used a change of variable and the linearity of the integral above.
We will investigate three cases of calculating the moments of a self-similar measure. These
cases pertain to the type of functions given in an IFS with probabilities. The functions f_i
in our IFS will have the forms:
1. f : R → R: affine functions in one real dimension;
2. f : C → C: affine functions in one complex dimension, excluding complex conjugation;
3. f : R^2 → R^2: affine functions in two real dimensions.
The first two instances have already been investigated in [BD85, EST07, JKS11]. Only in [BD85]
were these moment integrals considered for practical use. In [EST07] a neat convergence result
is shown for the complex case, and in [JKS11] this result is quoted and used to study the Hilbert
spaces that can be formed from a self-similar measure. We first review these two cases, then
present an extension that has use for fractal modelling.
3.2.1 One-Dimensional Affine Moments
In this section we restrict our IFSs to those of the form

F = {([0, 1], | · |); f_1(x) = a_1 x + b_1, . . . , f_N(x) = a_N x + b_N}, (3.1)

where |a_i| < 1 and b_i ∈ [0, 1]. This simple case is presented first to gain a better understanding
of the calculation of these moments before moving to more complicated examples.
Remark 3.26. Taking the metric space above to be ([0, 1], | · |) rather than (R, | · |) loses no
generality. Any fractal attractor in R is a compact subset and can be (uniformly) scaled and
translated to fit inside the unit interval.
Consider Proposition 3.25 and take g : R → R given by g(x) = x^n. This function is in
L^1([0, 1], B([0, 1]), µ_A) as our measure is normalised and compactly supported.
Proposition 3.27. Let F be an IFS of the form given in Equation 3.1. Associate a stochastic
vector p to create the IFS with probabilities, say M, and let µ_A be its attractive measure. Then
the moments m_n := ∫ x^n dµ_A(x) obey

m_n = Σ_{i=1}^N p_i Σ_{j=0}^n (n choose j) a_i^j m_j b_i^{n−j}.
Proof. Using the recurrence in Proposition 3.25 with g : R → R given by g(x) = x^n yields

∫ x^n dµ_A(x) = Σ_{i=1}^N p_i ∫ (f_i(x))^n dµ_A(x).

Substituting the f_i into this equation and using the binomial expansion gives

∫ x^n dµ_A(x) = Σ_{i=1}^N p_i ∫ (a_i x + b_i)^n dµ_A(x) = Σ_{i=1}^N p_i ∫ Σ_{j=0}^n (n choose j) a_i^j x^j b_i^{n−j} dµ_A(x)
= Σ_{i=1}^N p_i Σ_{j=0}^n (n choose j) a_i^j b_i^{n−j} ∫ x^j dµ_A(x) = Σ_{i=1}^N p_i Σ_{j=0}^n (n choose j) a_i^j m_j b_i^{n−j}.
We have now established a recurrence relation for the calculation of moments. Of particular
interest is that the equation above is linear in the m_n terms and m_0 = 1 as µ_A is normalised,
giving the following corollary.
Corollary 3.28. The moments of µ_A defined above can be computed exactly through the recurrence,
for n > 0,

m_n = [Σ_{i=1}^N p_i Σ_{j=0}^{n−1} (n choose j) a_i^j m_j b_i^{n−j}] / [1 − Σ_{i=1}^N p_i a_i^n],

where m_0 = 1.
Remark 3.29. The denominator in the above formula will not be 0, as |Σ_{i=1}^N p_i a_i^n| < 1 for all
n ∈ N.
Example 3.30. Let M_Cant = {([0, 1], | · |); f_1(x) = (1/3)x, f_2(x) = (1/3)x + 2/3; p_1 = p_2 = 1/2} be as
in Example 3.1, with the attractive measure being the uniform Cantor measure µ_C. The graph of
the cumulative distribution (the real-valued function c ↦ ∫ 1_{[0,c]} dµ_C) is shown below, and is
commonly known as the 'Devil's Staircase'. This function is known to be (Hölder) continuous,
indicating that the measure µ_C has no atoms, that is, singleton sets are of µ_C-measure zero.
Figure 3.4: Cumulative distribution of the Cantor measure, which is also a self-similar function.
The first ten moments of the (uniform) Cantor measure are readily calculated and equal to

M_10 = (1, 1/2, 3/8, 5/16, 87/320, 31/128, 10215/46592, 2675/13312, 2030721/10915840, 3791353/21831680)^T.
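The recurrence of Corollary 3.28 is straightforward to implement exactly with rational arithmetic. A short sketch reproducing the moments above (our own code, not from the thesis):

```python
from fractions import Fraction
from math import comb

def moments(a, b, p, n_max):
    # m_n via Corollary 3.28 for the IFS f_i(x) = a_i x + b_i with probabilities p
    m = [Fraction(1)]
    for n in range(1, n_max + 1):
        num = sum(pi * sum(comb(n, j) * ai**j * m[j] * bi**(n - j) for j in range(n))
                  for pi, ai, bi in zip(p, a, b))
        den = 1 - sum(pi * ai**n for pi, ai in zip(p, a))
        m.append(num / den)
    return m

third, half = Fraction(1, 3), Fraction(1, 2)
M = moments([third, third], [Fraction(0), Fraction(2, 3)], [half, half], 9)
```

For the Cantor IFS this yields 1, 1/2, 3/8, 5/16, 87/320, 31/128, . . . exactly, with no floating-point error.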
Proposition 3.27 shows that the higher order moments are linearly related to the lower order
terms. The following passage aims to capture this structure in a systematic way so that the
moments can be calculated efficiently. Riordan arrays are infinite lower triangular matrices that
can be used to capture a sequence of terms through matrix multiplication [WW08]. We will
be using a simple example of these in the form of a variant of the Pascal matrix for binomial
expansions. The reason we mention the general theory is that in calculating the moments in
higher dimensions, the Riordan array structure is lost, a structure which [JKS11] found desirable
to keep.
Notation 3.31. Let m_n be defined as above (m_n := ∫ x^n dµ_A(x)). Let M_x be in ℓ^∞(R), the space
of bounded countably infinite sequences with entries from R, with entries given by (M_x)_i = m_i.
Note that since our self-similar measure is assumed to be normalised and contained in the unit
interval, the moments of such a measure are necessarily bounded and it is appropriate for M_x
to be in ℓ^∞(R). Graphically we have:

M_x := (∫ x^0 dµ_A(x), ∫ x^1 dµ_A(x), ∫ x^2 dµ_A(x), . . . )^T = (m_0, m_1, m_2, . . . )^T.

We call M_x the moment vector associated with the measure µ_A. Furthermore we let M_{x,n} be
the nth level truncation of this vector for n ∈ N_0.
This notation grants an easy simplification of our moment formula through the following
proposition.
Proposition 3.32. Let f : R → R with f(x) = ax + b, and let M_x ∈ ℓ^∞ be a vector of moments.
Then

φ_f(M_x) := (∫ f(x)^0 dµ_A(x), ∫ f(x)^1 dµ_A(x), ∫ f(x)^2 dµ_A(x), . . . )^T = R(a, b) M_x,

where R(a, b) is an (infinite) matrix defined entry-wise by (R(a, b))_{i,j} = 1_{i≥j} (i choose j) a^j b^{i−j} for
i, j ∈ N_0.
Proof. Recall that ∫ f(x)^n dµ_A(x) = Σ_{j=0}^n (n choose j) a^j b^{n−j} m_j. Then through a direct calculation

(φ_f(M_x))_n = Σ_{k=0}^∞ (R(a, b))_{n,k} (M_x)_k = Σ_{k=0}^∞ 1_{n≥k} (n choose k) a^k b^{n−k} m_k
= Σ_{k=0}^n (n choose k) a^k b^{n−k} m_k = ∫ f(x)^n dµ_A(x).
Writing out the first few terms of the matrix R(a, b) reveals its structure:

R(a, b) =
( (0 choose 0) a^0 b^0        0                      0                 · · ·
  (1 choose 0) a^0 b^1   (1 choose 1) a^1 b^0        0                 · · ·
  (2 choose 0) a^0 b^2   (2 choose 1) a^1 b^1   (2 choose 2) a^2 b^0   · · ·
        . . .                  . . .                  . . .            . . . ).
Using the above proposition we may easily rewrite the formula given in Proposition 3.27.

Corollary 3.33. Let R be defined as in the preceding proposition. Then the recurrence in
Proposition 3.27 for a specified IFS with probabilities M can be rewritten as

M_x = Σ_{i=1}^N p_i R(a_i, b_i) M_x. (3.2)

Define R_n(a, b) to be the n × n truncation of R(a, b) for a, b ∈ R. Furthermore, define the
operator Φ_{M,n} : R^n → R^n associated to an IFS with probabilities M through:

Φ_{M,n}(x) := Σ_{i=1}^N p_i R_n(a_i, b_i) x.
Remark 3.34. By putting M in the subscript of the operator Φ, we have specified all the
information needed to form Φ. This is important to note, as we will keep the notation Φ for
more general IFSs than the one first presented. Further, the operator Φ_{M,n} is linear on a
finite-dimensional space, so it may be represented by an explicit matrix. For this simple initial
case, we restrict the convergence analysis of an iterative formula for these moments to finite
dimensions.
Observation 3.35. The matrix that represents Φ_{M,n} is lower triangular, being a sum of lower
triangular matrices. Furthermore it has eigenvalues {Σ_{i=1}^N p_i a_i^j}_{j=0}^n.
Proposition 3.36. The sequence {Φ^k_{M,n}(v)}_{k=1}^∞ converges to a multiple of M_{x,n} as k → ∞ for
Lebesgue-almost any v ∈ R^n.

Proof. Equation 3.2 shows that M_{x,n} is an eigenvector of Φ_{M,n} with corresponding eigenvalue
one. By Observation 3.35, one is the maximum eigenvalue of Φ_{M,n}, so by the Von Mises iteration
(power iteration), {Φ^k_{M,n}(v)}_{k=1}^∞ converges to the dominant eigenvector for v not orthogonal to
M_{x,n}.
The above working yields two systematic, stable methods to calculate the moments of a one-dimensional
affine IFS in an efficient manner. One may either solve the linear system given in
Equation 3.2 or use the iterative method given above. The first method involves solving a lower
triangular system of equations, which can be done by forward substitution and recovers the
formula obtained in Corollary 3.28. The latter method gives a convergent geometric series whose
limit is a multiple of M_x. The convergence ratio for this geometric series is no larger than
Σ_{i=1}^N |p_i a_i| (this bounds the norm of the second largest eigenvalue of the operator Φ_{M,n}). The
latter method also avoids matrix inversion.
Example 3.37. Consider the IFS with probabilities given in Example 3.30. Then using the
iterative formula, we compute

Φ^5_{M_Cant,10}((1, . . . , 1)^T) = (1, 0.5000, 0.3750, 0.3125, 0.2718, 0.2421, 0.2192, 0.2009, 0.1860, 0.1736)^T,

which agrees with the exact calculation up to four decimal places.
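This iteration is equally direct to implement in floating point. A sketch (our own code, assuming numpy is available; because the first entry of the iterates stays fixed at 1, the power iteration converges to the moment vector itself rather than just a multiple of it):

```python
import numpy as np
from math import comb

def R_trunc(n, a, b):
    # n x n truncation of R(a, b): entries (i choose j) a^j b^(i-j) for j <= i
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1):
            M[i, j] = comb(i, j) * a**j * b**(i - j)
    return M

def iterate_moments(a, b, p, n=10, k=40):
    # power iteration for Phi_{M,n}; dominant eigenvalue is 1
    Phi = sum(pi * R_trunc(n, ai, bi) for pi, ai, bi in zip(p, a, b))
    v = np.ones(n)
    for _ in range(k):
        v = Phi @ v
    return v

m = iterate_moments([1/3, 1/3], [0.0, 2/3], [0.5, 0.5])
```

The second largest eigenvalue here is 1/3, so a few dozen iterations already agree with the exact Cantor moments to machine precision.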
3.2.2 One Dimensional Affine Complex Moments
This section presents the commonly used moment matrices studied in [EST07, JKS11], and
what was cited as a solution to the 'two-dimensional moment inverse problem' in [HM90].
The matrix equations below should be familiar from the previous subsection. We restrict
attention to IFSs with probabilities of the form

M = {(B_{1,C}, | · |); f_1(z) = a_1 z + b_1, . . . , f_N(z) = a_N z + b_N; p}, (3.3)

where B_{1,C} := {z ∈ C : |z| ≤ 1} and a_i, b_i ∈ B_{1,C} with |a_i| < 1.
Remark 3.38. As in the remark in the previous section, the assumption that the metric space
above is not all of C is not restrictive, as one can always scale an IFS attractor to fit inside the
unit ball.
Recall µ_A is the attractive measure for this IFS with probabilities. Let M_{z×z̄} ∈ ℓ^∞(C) × ℓ^∞(C),
the space of infinite matrices with bounded coefficients, be such that m_{i,j} := (M_{z×z̄})_{i,j} =
∫ z^i z̄^j dµ_A(z), with |m_{i,j}| ≤ 1. Finally let M_{z×z̄,n} be the n × n truncation of M_{z×z̄}.
Proposition 3.39. Let M be of the form 3.3. Then the moment matrix M_{z×z̄} of the invariant
measure µ_A obeys

M_{z×z̄} = Σ_{i=1}^N p_i R(a_i, b_i) M_{z×z̄} R*(a_i, b_i),

where R* is the conjugate transpose of R.
Proof. Through a calculation on the components of the matrix,

(Σ_{r=1}^N p_r R(a_r, b_r) M_{z×z̄} R*(a_r, b_r))_{i,j}
= Σ_{r=1}^N p_r Σ_{k=0}^∞ Σ_{l=0}^∞ 1_{i≥k} 1_{j≥l} (i choose k) a_r^k b_r^{i−k} (j choose l) ā_r^l b̄_r^{j−l} ∫ z^k z̄^l dµ_A(z)
= Σ_{r=1}^N p_r ∫ (a_r z + b_r)^i (ā_r z̄ + b̄_r)^j dµ_A(z)
= ∫ z^i z̄^j dMµ_A(z) = ∫ z^i z̄^j dµ_A(z) = (M_{z×z̄})_{i,j},

where the invariance of the measure µ_A has been used above.
Observation 3.40. The family of IFSs considered in this subsection (Equation 3.3) is a subset
of the similitude transformations. For instance, the generalised Sierpinski triangles (Example 2.1)
cannot be created with the above family of IFS (excluding the traditional Sierpinski), as complex
conjugation is not an isometry obtainable with the maps listed.
Similar to the real case in the previous subsection, an iterative formula can be constructed by
forming an appropriate linear transformation. Define the operator Φ_M : ℓ^∞(C) × ℓ^∞(C) →
ℓ^∞(C) × ℓ^∞(C) through:

Φ_M(Z) = Σ_{i=1}^N p_i R(a_i, b_i) Z R*(a_i, b_i).
The convergence of the iterates of this operator to the matrix of moments M_{z×z̄} has been
studied in [EST07, JKS11]. In [EST07], a fixed point method is constructed to prove uniform
convergence by selecting an appropriate closed subset of a Banach space on which this operator
is contractive. We will generalise this result in the next subsection. The convergence is justified
in [JKS11] by noting that M^n(µ) → µ_A weakly for any µ ∈ P(X) and identifying each entry
of the matrix as the integral of a specific continuous function. In our case, we will simply prove
the result in finite dimensions through an argument identical to that given in the previous
subsection.
In both of the aforementioned papers, the Hankel structure of the matrix R(a, b) was desirable
and thus kept, so that the Riordan arrays [WW08] could be used for cleanliness of computation.
We will rid ourselves of this structure now to reveal possible generalisations that may have been
overlooked. Let (e_{n1,n2})_{i,j} = 1_{i=n1, j=n2} for e_{n1,n2} ∈ ℓ^∞(C) × ℓ^∞(C), and let e′_n ∈ ℓ^∞(C) be
defined analogously. Define the vectorisation map v : ℓ^∞(C) × ℓ^∞(C) → ℓ^∞(C) through:

v(e_{n1,n2}) = e′_{τ(n1,n2)},

where τ(n1, n2) = (n1 + n2)(n1 + n2 + 1)/2 + n2. To aid with visualisation, this map does the
following on M_{z×z̄}:

M_{z×z̄} = ( m_{0,0} m_{0,1} m_{0,2} · · · ; m_{1,0} m_{1,1} · · · ; m_{2,0} · · · )
↦ (m_{0,0}, m_{1,0}, m_{0,1}, m_{2,0}, m_{1,1}, m_{0,2}, . . . )^T = v(M_{z×z̄}).

It is clear that this map has an inverse,

v^{-1}(e′_k) = e_{⌊k⌋_t − (k − ⌊k⌋_t(⌊k⌋_t + 1)/2), k − ⌊k⌋_t(⌊k⌋_t + 1)/2},

where ⌊x⌋_t := ⌊(1 + √(1 + 8x))/2⌋ − 1 gives the largest d with the triangular number d(d + 1)/2
not exceeding x. This inverse can be determined through a straightforward counting argument
using Pascal's triangle.
The map v is an isometry from (ℓ^∞(C) × ℓ^∞(C), ‖ · ‖_{∞,∞}) to (ℓ^∞(C), ‖ · ‖_∞). Define the linear
operator v ∘ Φ_M ∘ v^{-1} : ℓ^∞(C) → ℓ^∞(C), which may be represented by an infinite matrix; we
keep the symbol Φ_M for this vectorised operator, and let Φ_{M,n} be its n × n truncation.

Lemma 3.41. The vectorised operator v ∘ Φ_{M,n} ∘ v^{-1} has the same eigenvalues as Φ_{M,n}.

Proof. Take an eigenvalue λ and corresponding eigenvector X of Φ_{M,n} and note

Φ_{M,n} X = λX ⇐⇒ (v ∘ Φ_{M,n} ∘ v^{-1})(v X) = λ (v X).
Observation 3.42. The matrix that represents the vectorised operator Φ_{M,n} is lower triangular.
We illustrate this with a single-map IFS (N = 1) and n = 6:

Φ_{M,6} =
( 1         0          0          0        0        0
  b_1       a_1        0          0        0        0
  b̄_1       0          ā_1        0        0        0
  b_1^2     2a_1b_1    0          a_1^2    0        0
  b_1b̄_1    a_1b̄_1     ā_1b_1     0        a_1ā_1   0
  b̄_1^2     0          2ā_1b̄_1    0        0        ā_1^2 ).  (3.4)
For any n ∈ N_0, the operator Φ_{M,n} will be lower triangular from the ordering chosen in the
vectorisation map. The largest-degree polynomial term of z in (az + b)^i (āz̄ + b̄)^j will always
be a^i z^i ā^j z̄^j, which by construction of the vectorisation map v lies on the main diagonal of the
matrix. Additionally, by the ordering chosen in the vectorisation map, any lower-degree terms
will sit beneath the main diagonal. When additional maps are added to the IFS, we gain a sum
of lower triangular matrices, which is itself lower triangular. Therefore, in general Φ_{M,n} has
eigenvalues contained in {Σ_{k=1}^N p_k a_k^i ā_k^j}_{i,j∈N_0}, which are all of magnitude less than one for
(i, j) ≠ (0, 0) (this follows from the fact that |a_k| < 1, together with the triangle inequality).
The lemma and observation above, combined with the equation given in Proposition 3.39, show
that M_{z×z̄,n} is an eigenvector of Φ_{M,n} with eigenvalue one. Again by the reasoning given in
the real case of the former subsection, we have for almost any z ∈ C^n (with respect to Lebesgue
measure) that Φ^k_{M,n}(z) → c M_{z×z̄,n} as k → ∞ for some c ∈ C, as this is the dominant
eigenvector. This gives an iterative method to compute the moments in the complex case that
once again forms a geometric sequence, with ratio Σ_{i=1}^N p_i |a_i|. There is some merit to this
observation beyond explanatory reasons: we have determined precisely the eigenvalues of the
truncated linear operator, which gives strict bounds on the convergence of the iterative formula
in finite dimensions; this was noted in [EST07], where the authors state their bound is far from
strict, and in [JKS11], where no bound was needed and hence none was given.
3.2.3 Two Dimensional Affine Moments
The matrix in Equation 3.4 is sparse under the main diagonal. This alludes to the idea that
there may be a more general class of IFS whose moments may be computed with this structure.
In this subsection, it is found that the moments of an attractive measure given by an IFS
consisting of two-dimensional affine maps acting on (R^2, ‖ · ‖_2)^1 can be computed in a very
similar fashion. This is done by 'filling in' the matrix created in Equation 3.4. We do this by
considering an IFS F of N maps that has functions of the form (x, y) ↦ h_i(x, y) = A_i(x, y)^T + b_i,
where A_i ∈ R^{2×2} and b_i ∈ R^{2×1}. We adopt the standard notation for this type of IFS:

A_i = ( a_i  b_i ; c_i  d_i ),   b_i = ( e_i ; f_i ),   a_i, b_i, c_i, d_i, e_i, f_i ∈ R. (3.5)
Associate this IFS with a vector of probabilities p to make this an IFS with probabilities. Now
we aim to explicitly evaluate

m_{x^{n1}, y^{n2}} := ∫ x^{n1} y^{n2} dµ_A(x, y),
^1 As the attractor is compact, one can always make the underlying metric space compact by taking a sufficiently
large closed ball around the origin such that the attractor of the IFS is a subset and the maps in the IFS send
this ball into itself.
where µ_A is, as always, the invariant measure of the IFS with probabilities M. In this subsection
we again give two methods to compute these moments: the first based on solving a matrix
system, and the other based on an iterative formula. The analysis of the convergence of the
iterative formula will be done in infinite dimensions, unlike the two previous subsections, as
this is the most general case considered here. Before presenting any results, an example of
calculating the first three moments of an affine IFS is shown briefly so that the presented
content is not lost in notation.
Example 3.43. Consider an IFS of the form 3.5 and note m_{x^0,y^0} = 1. Further,

m_{x^1,y^0} = ∫ x dµ_A(x, y) = Σ_{i=1}^N p_i ∫ (a_i x + b_i y + e_i) dµ_A(x, y) = Σ_{i=1}^N p_i (a_i m_{x^1,y^0} + b_i m_{x^0,y^1} + e_i),
m_{x^0,y^1} = ∫ y dµ_A(x, y) = Σ_{i=1}^N p_i ∫ (c_i x + d_i y + f_i) dµ_A(x, y) = Σ_{i=1}^N p_i (c_i m_{x^1,y^0} + d_i m_{x^0,y^1} + f_i).

This yields

( m_{x^0,y^0} ; m_{x^1,y^0} ; m_{x^0,y^1} ) =
( 1                0                0
  Σ_{i=1}^N p_i e_i   Σ_{i=1}^N p_i a_i   Σ_{i=1}^N p_i b_i
  Σ_{i=1}^N p_i f_i   Σ_{i=1}^N p_i c_i   Σ_{i=1}^N p_i d_i ) ( m_{x^0,y^0} ; m_{x^1,y^0} ; m_{x^0,y^1} ).
Solving the above system of equations will give the first three moments of the measure µ_A. The
matrix above is not lower triangular. This means that an explicit recursive formula cannot be
defined for this calculation as was previously done in the one-dimensional case, nor can a Hankel
structure be identified or a Riordan array associated.
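Solving the 3 × 3 system for a concrete IFS is immediate. As a sketch (the Sierpinski-style maps below are an illustrative choice of ours, not from the thesis), the first moments recover the centroid of the attractor:

```python
import numpy as np

# h_i(x, y) = 0.5 (x, y) + t_i with t_i in {(0,0), (1/2,0), (0,1/2)}, p_i = 1/3
maps = [(0.5, 0.0, 0.0, 0.5, e, f) for e, f in [(0, 0), (0.5, 0), (0, 0.5)]]
p = [1 / 3] * 3

T = np.zeros((3, 3))
T[0, 0] = 1.0
for pi, (a, b, c, d, e, f) in zip(p, maps):
    T[1] += pi * np.array([e, a, b])   # row for m_{x^1, y^0}
    T[2] += pi * np.array([f, c, d])   # row for m_{x^0, y^1}

# the moment vector is a fixed point of T; impose m_{x^0,y^0} = 1 and solve
A = np.eye(3) - T
A[0] = [1.0, 0.0, 0.0]
m = np.linalg.solve(A, np.array([1.0, 0.0, 0.0]))
```

For this IFS the solution is (1, 1/3, 1/3): the expected landing point of the orbit is the centroid of the triangle with vertices (0, 0), (1, 0), (0, 1).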
Observation 3.44. The moments of an affine IFS in two dimensions can be calculated through
solving a system of linear equations. See the expansion:

m_{x^{n1}, y^{n2}} = ∫ x^{n1} y^{n2} dµ_A(x, y)
= Σ_{r=1}^N p_r ∫ (a_r x + b_r y + e_r)^{n1} (c_r x + d_r y + f_r)^{n2} dµ_A(x, y)
= Σ_{r=1}^N p_r ∫ [ Σ_{i+j+k=n1} (n1!/(i! j! k!)) (a_r x)^i (b_r y)^j (e_r)^k ] [ Σ_{i′+j′+k′=n2} (n2!/(i′! j′! k′!)) (c_r x)^{i′} (d_r y)^{j′} (f_r)^{k′} ] dµ_A(x, y)
= Σ_{r=1}^N p_r Σ_{i+j+k=n1, i′+j′+k′=n2} (n1! n2!/(i! j! k! i′! j′! k′!)) ∫ (a_r x)^i (b_r y)^j (e_r)^k (c_r x)^{i′} (d_r y)^{j′} (f_r)^{k′} dµ_A(x, y)
= Σ_{r=1}^N p_r Σ_{i+j+k=n1, i′+j′+k′=n2} (n1! n2!/(i! j! k! i′! j′! k′!)) a_r^i c_r^{i′} b_r^j d_r^{j′} e_r^k f_r^{k′} m_{x^{i+i′}, y^{j+j′}},
where the linearity of the integral has been used above and finite sums exchanged.
Let deg(m_{x^{n1},y^{n2}}) = n1 + n2; then deg(m_{x^{i+i′}, y^{j+j′}}) under the constraints i + j + k = n1 and
i′ + j′ + k′ = n2 is bounded by n1 + n2. For fixed n1 and n2 with the aforementioned constraints
enforced, the number of ways deg(m_{x^{i+i′}, y^{j+j′}}) may achieve this bound for i, i′, j, j′ ∈ N_0 is
n1 + n2 + 1.
Likewise, the number of solutions (n1, n2) ∈ N_0^2 to n1 + n2 = d is precisely d + 1. Consequently
the system of equations created is perfectly determined when the moment equations are linearly
independent.
As in the previous subsections, we aim to create a linear operator that allows for the iterative
calculation of these moments. As this is the most general case investigated in this chapter,
convergence analysis will be done on the infinite collection of moments for a specified self-similar
measure.
Notation 3.45. By selecting the terms that pertain to a fixed m_{n1,n2} in the expansion from the
observation above, define the (infinite) matrix φ_{h_r} associated with an affine map h_r by letting
(φ_{h_r})_{u1,u2} be

Σ_{τ(n1,n2)=u1, τ(i+i′, j+j′)=u2} [n1! n2! / (i! j! (n1−i−j)! i′! j′! (n2−i′−j′)!)] a_r^i c_r^{i′} b_r^j d_r^{j′} e_r^{n1−i−j} f_r^{n2−i′−j′} 1_{i+j≤n1} 1_{i′+j′≤n2},

for i, j, i′, j′, n1, n2 ∈ N_0. Note that τ is a bijective function in the above notation. The indicator
functions in the formula above ensure that the exponents on the e_r and f_r terms are natural
numbers, as in the derivation in the sum above.
Through construction,

M_{x,y} = Σ_{r=1}^N p_r φ_{h_r} M_{x,y}, (3.6)

for the infinite vector (M_{x,y})_{τ(i,j)} := m_{x^i,y^j} = ∫ x^i y^j dµ_A(x, y), which may not necessarily lie in
ℓ^∞ as no assumption on the attractor of the IFS has been made.
At this stage there are two options for calculating the moment vector M_{x,y} of µ_A:
1. truncate the matrix that represents Φ_M at the ⌊k⌋_t level for k ∈ N, to ensure a completely
determined system is left, and solve this system of equations;
2. find an iterative formula based on the operator Φ_M so that matrix inversion is avoided.
We will investigate the iterative formula here, as avoiding matrix inversion is desirable.
The form in which we have presented small examples is deceptive: as the size of the matrix
grows, we stray from a diagonal-like form and have to solve a system of equations which
is far more dense and difficult to solve. With this in place, if an iterative method were obtained,
then the larger-order moments could be computed with ease.
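To make the iterative option concrete, the truncated operator can be assembled directly from the multinomial expansion in Observation 3.44 and power-iterated; by the degree argument above, truncating to all moments of total degree at most some D is self-consistent. A sketch (all names and the Sierpinski-style test IFS are our own illustrative choices):

```python
import numpy as np
from math import factorial

def trinomials(n):
    # yields (i, j, k, n!/(i! j! k!)) over all i + j + k = n
    for i in range(n + 1):
        for j in range(n + 1 - i):
            k = n - i - j
            yield i, j, k, factorial(n) // (factorial(i) * factorial(j) * factorial(k))

def moment_operator(maps, p, deg):
    # truncated operator on moments m_{x^{n1} y^{n2}} with n1 + n2 <= deg,
    # ordered diagonal by diagonal as in the vectorisation map tau
    pairs = [(d - n2, n2) for d in range(deg + 1) for n2 in range(d + 1)]
    idx = {pq: t for t, pq in enumerate(pairs)}
    T = np.zeros((len(pairs), len(pairs)))
    for pr, (a, b, c, d, e, f) in zip(p, maps):
        for n1, n2 in pairs:
            for i, j, k, c1 in trinomials(n1):
                for i2, j2, k2, c2 in trinomials(n2):
                    coef = c1 * c2 * a**i * b**j * e**k * c**i2 * d**j2 * f**k2
                    T[idx[(n1, n2)], idx[(i + i2, j + j2)]] += pr * coef
    return T, pairs

maps = [(0.5, 0.0, 0.0, 0.5, e, f) for e, f in [(0, 0), (0.5, 0), (0, 0.5)]]
T, pairs = moment_operator(maps, [1 / 3] * 3, 4)
v = np.ones(len(pairs))
for _ in range(120):
    v = T @ v   # first entry stays 1, so v converges to the moment vector
```

For this IFS the iterates converge to the exact moments, e.g. m_{x^1,y^0} = m_{x^0,y^1} = 1/3 and m_{x^2,y^0} = 5/27, which one can check by hand from the fixed-point equation.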
Let 1ℓ^∞ := {s ∈ ℓ^∞(R) : (s)_0 = 1}, the space of bounded sequences that begin with 1. Equipping
this space with the infinity norm makes it complete, as 1ℓ^∞ is a closed subset of ℓ^∞ in the
supremum topology. We define the operator Φ_M : 1ℓ^∞ → 1ℓ^∞ through

Φ_M(x) = Σ_{r=1}^N p_r φ_{h_r} x.
It is readily checked that (Φ_M(x))_0 = 1 for any x ∈ 1ℓ^∞. It is untrue that the operator above
is bounded for every measure µ with compact support in R^2. For example, any compactly
supported measure whose support lies outside the ‖ · ‖_∞ unit ball will result in an unbounded
operator in the supremum topology, because the integrals ∫ x^{n1} y^{n2} dµ(x, y) can grow arbitrarily
large for n1, n2 ∈ N_0. Equation 3.6 shows that the moment vector of the measure µ_A is a fixed
point of Φ_M. Our goal now is to show that this fixed point is unique in 1ℓ^∞ equipped with a
certain topology.
The convergence proof we provide comes from a contractive fixed point argument in an
appropriate topology, generalising a result proven in [EST07]. The structure of this argument
is straightforward and will be summarised briefly should the reader choose to skim the details
of the following section. Firstly, under restrictive conditions the convergence is guaranteed
through a contractive fixed point argument. This simple case is then related to the general
instance, and finally an appropriate Banach space in which the general case reduces to the
restrictive solution is constructed.
Lemma 3.46. Let x, y ∈ 1ℓ∞ such that ‖x − y‖∞ < 1. Then
\[ \|\phi_{h_r}x - \phi_{h_r}y\|_{\infty} \le \sup_{n_1, n_2 \in \mathbb{N}_0} (|a_r| + |b_r| + |e_r|)^{n_1}(|c_r| + |d_r| + |f_r|)^{n_2}. \]
This follows through application of the triangle inequality and summing the rows of the absolute values of the matrix φhr. From now on we will assume that any element x ∈ ℓ∞ under consideration is also in 1ℓ∞. For notational convenience we will take ‖φhr‖∞ = sup{‖φhr x‖∞ : ‖x‖∞ ≤ 1, (x)0 = 0}, so that we may simply write the norm of the operator and not carry the difference φhr x − φhr y as above.
If each of these bounds is less than one, it is immediate that the operator ΦM is a contraction in the supremum topology. This first case is trivial, but the assumption is artificial: the convergence relies on the translation terms er, fr of the IFS, and their dependence should be nullified. We do this by appropriately transforming our space.
Lemma 3.47. Given an IFS F = {(X, d); f1, . . . , fn} with attractor A, the IFS g ◦ F ◦ g−1 =
{(X, d); g ◦ f1 ◦ g−1, . . . , g ◦ fn ◦ g−1} for a continuous invertible function g has attractor g(A)
when each of the functions g ◦ fi ◦ g−1 is contractive with respect to the metric d.
3.2. FRACTAL MOMENT THEORY 43
Proof. First note that g(A) ∈ H(X) as the image of a compact set under a continuous function. Now we can verify directly that
\[ (g \circ F \circ g^{-1})(g(A)) = \bigcup_{i=1}^{N} g \circ f_i \circ g^{-1}(g(A)) = g\Big(\bigcup_{i=1}^{N} f_i(A)\Big) = g(A), \]
where we have used the fact that F(A) = A.
Lemma 3.48. Given the assumptions of Lemma 3.47, if an associated IFS with probabilities M is given with attractor µA, then the IFS with probabilities g ◦ M ◦ g−1 has the attractor µA ◦ g−1.
Proof. By the definition of g−1 : H(X) → H(X), if x ∉ g(X) then g−1(x) = ∅. Consequently, g−1 is measurable. This ensures that the measure µA ◦ g−1 is well-defined. Now again we may directly compute
\[ \mu_A \circ g^{-1} = \Big(\sum_{i=1}^{N} p_i\, \mu_A \circ f_i^{-1}\Big) \circ g^{-1} = \sum_{i=1}^{N} p_i\, (\mu_A \circ g^{-1}) \circ g \circ f_i^{-1} \circ g^{-1}, \]
where again we have used the fixed point equation of µA.
Lemma 3.49. Given the function g : R² → R² such that (x, y) ↦ g(x, y) = (αx, βy) for α, β > 0, and the moments m_{µ,x^{n1},y^{n2}} := ∫x^{n1}y^{n2} dµ(x, y) of a two-dimensional measure µ, the measure µ ◦ g−1 has moments
\[ m_{\mu \circ g^{-1},\, x^{n_1},\, y^{n_2}} = \alpha^{n_1}\beta^{n_2}\, m_{\mu,\, x^{n_1},\, y^{n_2}}. \]
Proof. Through a direct calculation,
\[ \int x^{n_1} y^{n_2}\, d\mu \circ g^{-1}(x, y) = \alpha^{n_1}\beta^{n_2} \int x^{n_1} y^{n_2}\, d\mu(x, y). \]
Corollary 3.50. Define the linear operator S : 1ℓ∞ → 1ℓ∞ through
\[ (S(z))_{\tau(n_1, n_2)} = \alpha^{n_1}\beta^{n_2} z_{\tau(n_1, n_2)}. \]
Then the operator Φ_{g◦M◦g−1} : 1ℓ∞ → 1ℓ∞ acting on a moment vector M ∈ 1ℓ∞ is
\[ \Phi_{g \circ M \circ g^{-1}}(M) = \sum_{i=1}^{N} p_i\, S\phi_{h_i}S^{-1} M. \]
Remark 3.51. The operator S can be represented by an (infinite) diagonal matrix; that is, we are simply re-weighting the dimensions of the space. The corollary above is just noting that we
have changed the co-ordinates of our space, exactly in the same manner one proves the existence
of an attractor for a FIF in Example 2.11.
Proposition 3.52. Define the norm ‖·‖g on 1ℓ∞ through ‖·‖g := ‖S · ‖∞ and let an IFS with probabilities M be given. Then the corresponding moment operator ΦM obeys
\[ \|\Phi_M(x)\|_g \le \big\|\Phi_{g \circ M \circ g^{-1}}\big\|_{\infty} \|x\|_g, \]
for g of the form given in Lemma 3.49 and any x ∈ 1ℓ∞.
Proof. It is clear that ‖·‖g is a well-defined norm as S is linear and bijective. Recalling the notation given in Lemma 3.46, we have the bounds
\[ \|\Phi_M(x)\|_g = \Big\|\sum_{i=1}^{N} p_i \phi_{h_i}(x)\Big\|_g = \Big\|S\sum_{i=1}^{N} p_i \phi_{h_i}(x)\Big\|_{\infty} = \Big\|\sum_{i=1}^{N} p_i\, S\phi_{h_i}S^{-1} S(x)\Big\|_{\infty} \le \big\|\Phi_{g \circ M \circ g^{-1}}\big\|_{\infty} \|x\|_g, \]
where we have used that S is linear and bijective, and Corollary 3.50 in the last line.
Theorem 3.53. Let F be an IFS of the form described in Equation 3.5 with the added requirement that each bi = 0 and |ai|, |di| < 1. Then for the associated IFS with probabilities, M, with any stochastic vector p of length N, the iteration Φ_M^n(x0) converges to the vector of moments Mx,y of the attractive measure µA in the ‖·‖g topology for any initial x0 ∈ 1ℓ∞.
The assumptions above describe a reasonable type of IFS, such as those which generate fractal interpolation functions (Example 2.11). Indeed, calculating the joint spectral radius of the IFS shows that the only contractive triangular affine maps are those which obey the conditions above [ABVW10]. This instance of two-dimensional IFS is generally well understood in the literature; for example, its dimension theory is reasonably developed [BRS16].
Proof. Select g : R² → R² such that g(x, y) = (αx, βy) for α, β > 0; then each of the maps in the IFS g ◦ fi ◦ g−1 has the form
\[ g \circ f_i \circ g^{-1}(x, y) = \Big(a_i x + \alpha e_i,\ \frac{\beta}{\alpha} c_i x + d_i y + \beta f_i\Big). \]
These functions can be made contractive in the (R², ‖·‖θ) metric space in an identical fashion to Example 2.11, meaning that we have a valid IFS. Select α, β > 0 such that (|ai| + α|ei|) < 1 and ((β/α)|ci| + |di| + β|fi|) < 1, granting
\[ \big\|\Phi_{g \circ M \circ g^{-1}}\big\|_{\infty} \le \max_{r \in [N]} \sup_{n_1, n_2 \in \mathbb{N}} (|a_r| + \alpha|e_r|)^{n_1}\Big(\frac{\beta}{\alpha}|c_r| + |d_r| + \beta|f_r|\Big)^{n_2} < 1. \]
By Proposition 3.52, ΦM : (1ℓ∞, ‖·‖g) → (1ℓ∞, ‖·‖g) is a contraction. Note that as the first element of x ∈ 1ℓ∞ is fixed, the supremum above is only over N for n1 and n2. It is immediate that (1ℓ∞, ‖·‖g) is complete as (1ℓ∞, ‖·‖∞) is complete. Equation 3.6 then shows that the unique fixed point of the operator ΦM is the moment vector Mx,y of µA.
Remark 3.54. The proof above works for triangular systems, as we know precisely when such an IFS is contractive in terms of its map parameters. Furthermore, the infinite matrix representing the operator in this instance is triangular, so in finite dimensions one could argue convergence as in the previous subsections. The result is less refined when the matrix is dense, even in the case of two-dimensional similitudes, as evinced in the argument below. This result is sufficient and constructive for our usage in the next chapter, so we include it for thoroughness.
Theorem 3.55. Take an IFS of the form described in Equation 3.5, where we additionally assume that the matrix Ai has the form λiOi for an isometry Oi and 0 < λi < 1; that is, the functions in the IFS are similitudes. Then associating this IFS with probabilities in the same fashion as above, the iteration Φ_M^n(x0) converges to the vector of moments Mx,y of the attractive measure µA in the ‖·‖g topology for any x0 ∈ 1ℓ∞ when λi < 1/√2 for all i ∈ [N].
Proof. Select g(x, y) = (αx, αy) for α > 0; then each of the maps in the IFS g ◦ F ◦ g−1 has the form
\[ g \circ f_i \circ g^{-1}(x, y) = (a_i x + b_i y + \alpha e_i,\ c_i x + d_i y + \alpha f_i). \]
From the form of the functions in the IFS, the parameters necessarily obey
\[ \frac{1}{\lambda_i}(|a_i| + |b_i|) = \frac{1}{\lambda_i}(|c_i| + |d_i|) = |\sin(\theta)| + |\cos(\theta)| \le \sqrt{2} \]
for some θ ∈ [0, 2π], as the matrix Ai has entries relating to a rotation matrix. To gain a contraction as in the previous argument, we must additionally assume that λi < 1/√2, and then the argument follows in an identical manner.
Remark 3.56. The assumption λi < 1/√2 is unnatural. Recall that the topology on P(X) is equivalent to a weak topology. The operator ΦM is really acting on vectors of integrals of polynomials (which are dense in the continuous functions on the support of µA). Such integrals determine these measures under our assumptions on the IFS (Proposition 3.23). The operator ΦM maps the vector of moments corresponding to one measure µ to the vector of moments corresponding to M(µ). The convergence of lim_{n→∞} Φ_M^n(x), intuitively speaking, should depend only on the existence of an attractive measure for M. In general, this question is hard to answer in terms of the IFS parameters.
3.3 Computation of Two-Dimensional Affine Moments
Using Elton's Theorem [Elt87] we may computationally validate the new formulas and algorithms we have produced for calculating the moments arising from a two-dimensional affine IFS with probabilities. Here we go through the calculation of the first 21 moments of the Steemson triangle given in Example 2.1. A Python notebook with this experiment may be found here.
The moments of the Steemson triangle from our example were calculated directly by solving the set of linear equations. This calculation was done in 32-bit floating point arithmetic; the code provided can instead compute the moments exactly by this method when the IFS parameters are given as exact rational numbers. This computation yields
\[ M_{21} = \big(1,\ 0.4241632143,\ 0.2631243811,\ \ldots,\ 0.0225122445\big). \]
Two computational experiments were run here. Firstly, the moments were calculated through the iterative method discussed. We expected Φ^i_{M_{Steem},21}(x0) to converge uniformly at a geometric rate to M21, where x0 was selected as the vector of 1's. This is what is observed in the graph below on the left, whereby plotting the −log of the residuals shows the digits of accuracy increase with the number of iterations.
Next, the measure was estimated in the weak sense through Elton's theorem; pictures of the convergence of the empirical distribution are provided. The moment integral through this method is known to converge at a rate of O(1/√i), where i represents the length of an orbit of the dynamical system. The error between M21 and Elton's approximation was estimated for i ∈ {1, . . . , 10^6} and converged at the expected rate, as shown in the figure on the bottom right.
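The estimation scheme itself is short. Below is a minimal chaos-game sketch for a one-dimensional stand-in whose exact moments we know, the uniform Cantor measure with m1 = 1/2 and m2 = 3/8; the helper name and the choice of example are ours, not the notebook's.

```python
import random

def elton_moment_estimate(maps, probs, n_steps, powers, seed=0):
    """Estimate the moments int x^k dmu_A by an ergodic average along one
    chaos-game orbit, as justified by Elton's theorem; the error decays at
    roughly O(1/sqrt(n_steps))."""
    rng = random.Random(seed)
    x = 0.0   # any starting point works; 0 already lies on the attractor here
    sums = {k: 0.0 for k in powers}
    for _ in range(n_steps):
        # pick a map at random according to the probability vector, apply it
        x = rng.choices(maps, weights=probs)[0](x)
        for k in powers:
            sums[k] += x**k
    return {k: s / n_steps for k, s in sums.items()}

# Uniform Cantor measure: exact moments m_1 = 1/2 and m_2 = 3/8
maps = [lambda x: x / 3, lambda x: x / 3 + 2 / 3]
est = elton_moment_estimate(maps, [0.5, 0.5], 200_000, powers=(1, 2))
```

With 2 × 10^5 orbit points the estimates typically agree with the exact moments to two or three decimal places, consistent with the O(1/√i) rate.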
[Left panel: digits of accuracy, −log10 ‖M21 − Φ^i_{M_{Steem},21}(x0)‖, against iteration i ∈ [0, 20]. Right panel: ‖M21 − M_{elt,i}‖ against orbit length i up to 10^6.]
Figure 3.5: Convergence of the iterative formula and Elton's Theorem validation of the moment formulas.
Chapter 4
The Inverse Problem of Moments
‘Some systems of hundreds of polynomials can be solved, whilst others of five cannot.’
— Anand Deopurkar.
Moment reconstruction of a self-similar measure was the initial premise of attempts at the inverse problem of fractal approximation [BD85]. Shortly after this idea was given, many attempts were made to use moments in aid of reproducing a fractal measure [MS89, HM90, Man13]. Such methods aim to reconstruct self-similar measures in P(X); however this space is 'large': for every self-similar set in H(X) there are uncountably many self-similar measures supported on it. The associated difficulty of this problem led to the discovery of the algorithms used for fractal image compression (FIC), which all stay within the space H(X). Fundamentally, what we do differently in this chapter is to reconstruct elements of H(X) with methods made in P(X), in the same fashion that Lp was used to approximate elements of H(X) in Chapter 2. The intention here is that these methods from P(X) may create a more flexible model than those given by considering function spaces.
Three instances of fractal reconstruction through moments are given, each pertaining to one of the three examples presented in Chapter 2. Recall the examples are: a Cantor set, a Steemson triangle, and a truncated fractal interpolation function that has no exact IFS description. Firstly, numerical methods for solving polynomial systems are utilised in finding a one-dimensional self-similar measure through knowledge of its moments. This is done in exact rational arithmetic through the use of Grobner bases. Next, a relationship between H(X) and P(X) is established through theory pertaining to the open set condition, fractal tiling, and dimension theory [BHR06, BV18, BRS19]. This relationship is exploited to create a method whereby the attractor in H(X) given by an IFS of similitudes that obeys the OSC can be reconstructed numerically with moment and dimension theory methods. Finally, using the iterative methods for calculating moments found in the former chapter yields a moment approximation theory that can be used in approximating an object that is not self-similar. The method presented here has the ability to find an exact (up to rounding error) description of a self-similar object if one exists, unlike the method presented in Example 2.11.
4.1 Algebraic Solution to Polynomial Systems
This section introduces modern algebraic polynomial-solving techniques with an explicit example, showing a constructive formulation of a solution to the problem stated in the previous chapter (Example 3.1) that has not been previously investigated. This method is not generally suitable for modelling purposes, but can aid in the explicit study of an exact affine self-similar measure and will give some insight into the non-linear system of equations that we aim to solve. Constructive answers to questions like 'what are the dynamical systems that can be produced from an affine IFS of N maps given a self-similar measure?' can be given through this method. In particular, one may have an IFS that does not obey the open set condition and ask for the existence of another affine IFS that does obey the OSC whose IFS with probabilities has the same attractive measure. Simple examples of this situation can be constructed.
Example 4.1. Take the IFS FCant, which we have seen obeys the OSC and has the attractive measure of 'uniform Cantor measure' when given uniform probabilities on the maps. Then the following IFS
\[ F = \big\{([0, 1], |\cdot|);\ f_1(x) = \tfrac{x}{3},\ f_2(x) = \tfrac{x}{3} + \tfrac{2}{3},\ f_3(x) = \tfrac{x}{6},\ f_4(x) = \tfrac{x}{6} + \tfrac{2}{9}\big\} \]
has the Cantor set as an attractor and clearly does not obey the OSC, through a simple set containment argument. Equipping this IFS with the probability vector p = (1/4, 1/2, 1/8, 1/8) grants that the attractive measure is uniform Cantor measure.
This is an example of a perfectly overlapping IFS, made explicit through Hochman's Exponential Separation Condition (HESC) [BRS16, BRS18, BRS19], a separation condition less restrictive than the OSC. Determining when an IFS is perfectly overlapping is an open question in fractal geometry, which we mention as the tools and examples we develop in this chapter may aid in better understanding this problem.
Through the methods developed in the former chapter, the first n moments of any affine IFS in one or two dimensions can be calculated. Now assume these are given and that the measure from which they were taken is to be recovered. In this section, the technique used gives all the IFSs with probabilities whose attractive measure has the same first n moments. In many cases, when n is larger than the number of free parameters being solved for, this produces precisely all the IFSs which share a common attractive measure, as the non-linear system becomes completely determined by this finite collection of moments. We do not prove this in general as it would be a cumbersome detour from our modelling goal, but show a novel instance that also exemplifies how moment theory may intertwine the study of polynomials and self-similar measures.
Proposition 4.2. Assume the polynomials in the variables ai and bi,
\[ \sum_{j=0}^{n} \binom{n}{j} a_i^j b_i^{n-j} m_j, \]
have the form q_i^n for a fixed polynomial qi in ai, bi, for i ∈ [N], n ∈ N0, ai ∈ (−1, 1). Then if an IFS with probabilities is made from the parameters ai, bi with these constraints, its attractive measure µA is a point mass if ai ≠ 0, and otherwise µA is a sum of point masses.
Proof. Assume ai ≠ 0 for all i ∈ [N]. Looking at the case n = 1, it must be that qi = ai m1 + bi by the linearity of the integral. We claim that mn = m1^n; this is easily seen through induction. The base case is clear; then a calculation with the inductive hypothesis gives
\[ (a_i m_1 + b_i)^n + a_i^n(m_n - m_1^n) = (a_i m_1 + b_i)^n, \]
showing mn = m1^n when ai ≠ 0. In the latter case, the attractive measure is trivially a collection of point masses as ai = 0. Now we have the identity
\[ \sum_{i=1}^{N} p_i (a_i m_1 + b_i)^n = m_n = \sum_{i=1}^{N} p_i \int (a_i x + b_i)^n\, d\mu_A(x) \quad \forall n \in \mathbb{N}_0, \]
which occurs exactly when µA = 1_{(\cdot)}(m_1), the point mass at m1.
Remark 4.3. This proposition firstly shows, as one would expect, that a point mass with one free variable needs only one moment equation to fully describe the measure. Secondly, this example shows how the divisibility of polynomials given by the moment equations can be used to study the properties of a self-similar measure.
We now indirectly address the problem posed in Example 3.1 by viewing the system ΦM Mx = Mx, which is linear in the moment values, as a non-linear system of equations in the IFS parameters. In one dimension we have the (infinite) polynomial system of equations
\[ \bigg\{\, q_n(\mathbf{a}, \mathbf{b}, p) := \sum_{i=1}^{N} p_i \sum_{j=0}^{n} \binom{n}{j} a_i^j\, m_j\, b_i^{n-j} - m_n \,\bigg\}_{n=0}^{\infty}, \tag{4.1} \]
where we seek the common zeros of all of the polynomials in the above set. As a convenient notation, we have used qn(a, b, p) to mean qn is a polynomial in the variables ai, bi, pi for all i ∈ [N]. In general, one cannot solve a polynomial in one variable of degree five or larger in closed form, let alone an infinite multi-variable system of polynomials of increasing degree. For this reason, we must specialise to a certain self-similar measure for which to solve this inverse problem. We will solve this system of equations exactly for a single simple example, so that this might point toward a numerically feasible method for the general instance.
A brief overview of the key tools needed to analyse a polynomial system of equations (most of which can be found in the paper [MT01]) will be given. This example shows a link between the very heuristic method of identifying an IFS attractor and the systematic working found in computational algebra. From now on, this chapter can be read interactively with the Python notebook provided.
Formalism for how polynomial systems are typically structured is required to solve this system. A monomial in n variables x1, . . . , xn is the product x^j := x1^{j1} · · · xn^{jn} for j ∈ N0^n. The set of monomials in n variables is denoted T^n. A (real) polynomial in n variables is of the form p = ∑_j cj x^j with cj ∈ R, where we assume the sum has only finitely many terms. As is common notation, F[x] is the field F adjoining the variables x. Furthermore, if I := {qi}_{i=0}^{n} is a collection of n + 1 polynomials in the variables x whose coefficients come from F, then ⟨I⟩ is the ideal generated by these polynomials. We use the notation F[x] mod ⟨I⟩ to mean the polynomial ring F[x] quotiented by our polynomial ideal. For the remainder of this section we will always take F = Q, so we are dealing with polynomials that have rational coefficients; this allows for computation in exact arithmetic. It is clear from the equations derived in the former chapter that selecting IFS parameters with rational coefficients will always yield rational moments of the invariant measure.
There are 'good' and 'bad' generators for a polynomial ideal. A 'good' generating set is one for which certain information about the ideal is easy to deduce from its generators, such as the determination of a (linear1) basis for the quotient ring Q[x] mod ⟨I⟩. A Grobner basis is one such collection of generators and may be produced computationally through Buchberger's algorithm [Buc76]. To properly define a Grobner basis, a total term ordering on monomials is needed.
Definition 4.4. (total ordering)
Given the set of monomials in n variables
\[ T^n := \{x_1^{i_1} \cdots x_n^{i_n} \mid i_1, \ldots, i_n \in \mathbb{N}_0\}, \]
a total ordering ≺T is an operation on a pair of elements in T^n such that:
(i) 1 ≺T t for all t ∈ T^n \ {1};
(ii) if t1 ≺T t2 then t1 t ≺T t2 t for all t ∈ T^n.
That is, 1 is the smallest element in T^n and the ordering respects monomial multiplication.
We have implicitly used this definition in creating the map τ : N0² → N0 in the previous chapter, using the graded lexicographic ordering defined below with x ≺grlex y. It is useful to introduce the function LT≺T that returns the largest monomial term of a polynomial q, with its field coefficient, according to the ordering ≺T. This is aptly called the leading term of a polynomial (under the ordering ≺T).
1In the sense of linear algebra, as opposed to a polynomial basis where the coefficients themselves are permitted to be polynomials.
Example 4.5. Two possible examples of total orders are:
1. lexicographic ordering, ≺lex:
x1^{i1} · · · xn^{in} ≺lex x1^{j1} · · · xn^{jn} if il < jl for the largest l ∈ {1, . . . , n} such that il ≠ jl.
2. graded lexicographic ordering, ≺grlex:
x1^{i1} · · · xn^{in} ≺grlex x1^{j1} · · · xn^{jn} if ∑_{l=1}^{n} il < ∑_{l=1}^{n} jl. In the case of equality, lexicographic ordering is used.
For an explicit example, when n = 2 we have x1^4 ≺lex x2 and x2 ≺grlex x1^4.
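For concreteness, the two orderings can be sketched in a few lines of Python operating on exponent tuples; these helpers are illustrative only and not part of the accompanying notebook.

```python
def lex_less(i, j):
    """x^i < x^j in lexicographic order: compare exponents at the largest
    index where they differ (variables ordered x_1 < x_2 < ... < x_n)."""
    for a, b in zip(reversed(i), reversed(j)):
        if a != b:
            return a < b
    return False   # equal monomials are not strictly comparable

def grlex_less(i, j):
    """x^i < x^j in graded lexicographic order: total degree first,
    with ties broken by the lexicographic order."""
    if sum(i) != sum(j):
        return sum(i) < sum(j)
    return lex_less(i, j)

# In two variables: x1^4 precedes x2 lexicographically,
# yet x2 precedes x1^4 in the graded order.
x1_pow4, x2_pow1 = (4, 0), (0, 1)
```

The example from the text is recovered: `lex_less(x1_pow4, x2_pow1)` and `grlex_less(x2_pow1, x1_pow4)` both hold.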
Remark 4.6. Total orderings are often regarded as artificial constraints for computation, as the theory only needs the partial ordering on monomials through their divisibility. Whilst this is mathematically true, the ordering chosen plays a great role in the simplification of a polynomial system. It is known that working with lexicographic ordering is computationally more expensive than working with graded lexicographic ordering.
A negative neighbour of a monomial ∏_{i=1}^{n} xi^{ji} is another monomial of the form xk^{jk−1} ∏_{i≠k} xi^{ji} for some k ∈ {1, . . . , n} with jk > 0. A set of monomials is called closed if each monomial in the set contains all of its negative neighbours.
Definition 4.7. For a specified closed set V ⊂ T^s, the border set is B(V) = {x^j ∈ T^s : x^j ∉ V, but some negative neighbour of x^j is in V}.
The definition given above is all that is needed to define a (border) Grobner basis. Before we give such a formulation, an example of some border sets and closed sets is provided.
Example 4.8. Consider the closed monomial sets in two variables x1, x2:
\[ \{1,\ x_1,\ x_2,\ x_1 x_2\} \qquad \{1,\ x_1,\ x_1^2,\ x_1^3\}. \]
To visualise these sets and deduce the corresponding border set, a lattice plot is enlightening.
[Two lattice plots over the monomials 1, x1, . . . , x1^4 and x2, . . . , x2^3.]
Figure 4.1: Two closed sets of monomials (large circles) and their corresponding border sets (diamonds). Additional details of the figure will be described later.
We may now define a Grobner basis.
Proposition 4.9. Let a polynomial ideal ⟨I⟩ be given. Then there exists a unique closed set V for a specified term order ≺T such that the polynomials
\[ g_k := z_k - (z_k \bmod \langle I \rangle) \quad \forall z_k \in B(V), \]
form a generating set for the ideal ⟨I⟩. This set of generators we define as the (border1) Grobner basis of ⟨I⟩.
As we are using this proposition for applied purposes, we will show an example and mention how one can construct a proof based on the example, as opposed to providing a proof explicitly; such a proof can be found in [Ste04].
Example 4.10. Take the polynomial system in two variables R = {r1(x1, x2) = x1² + 4x2² − 4, r2(x1, x2) = 9x1² + x2² − 2x2 − 8}, and define the ideal ⟨R⟩. The zero sets of r1 and r2 (the (x1, x2) such that r1 = r2 = 0) represent two ellipses in the (x1, x2)-plane. The solution of this polynomial system corresponds to how the ellipses r1 = 0 and r2 = 0 intersect one another; for this example they are constructed so that there are four distinct points of intersection. We may now compute explicitly the border Grobner basis for graded lexicographic ordering. We do this by displaying a simplified version of the general algorithm by which it can be computed.

Let lcm_{r1,r2} be the least common multiple of the leading monomials of r1 and r2; then
\[ S_{r_1, r_2} := \frac{\mathrm{lcm}_{r_1, r_2}}{LT_{\prec_{\mathrm{grlex}}}(r_2)}\, r_2 - \frac{\mathrm{lcm}_{r_1, r_2}}{LT_{\prec_{\mathrm{grlex}}}(r_1)}\, r_1 = r_2 - \tfrac{1}{4} r_1, \]
which, after clearing denominators, is 4r2 − r1 = 35x1² − 8x2 − 28.

The polynomial S_{r1,r2} is a syzygy: a non-trivial polynomial combination of r1 and r2. To compute a Grobner basis, one forms all such syzygies from the initial generators of the ideal. After this has been completed, the syzygies are reduced through multivariate polynomial division; repeating this process in a specific manner is Buchberger's algorithm for finding a Grobner basis.
For the sake of brevity we reduce our example directly. Normalising, set g1 := (4r2 − r1)/35 = x1² − (8/35)x2 − 4/5; combining g1 with r1 gains the polynomial g2 := x2² + (2/35)x2 − 4/5. An elementary argument shows ⟨r1, r2⟩ = ⟨g1, g2⟩. Define g3 = x2 g1 and g4 = x1 g2; then ⟨g1, g2⟩ = ⟨g1, g2, g3, g4⟩. Let V be the closed set displayed in Figure 4.1 on the left; then by uniqueness {gk}_{k=1}^{4} are the generators of the border Grobner basis of the ideal ⟨R⟩.
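The point of the reduction is that the system becomes triangular: a univariate polynomial in x2 plus a back-substitution for x1. A small numerical check of this, using only the two ellipse equations (the elimination r2 − 9r1 removes x1² directly), can be sketched as:

```python
import math

# The two ellipses from Example 4.10.
def r1(x1, x2):
    return x1**2 + 4 * x2**2 - 4

def r2(x1, x2):
    return 9 * x1**2 + x2**2 - 2 * x2 - 8

# Eliminating x1^2 (here via r2 - 9*r1) leaves a univariate quadratic:
#   -35*x2^2 - 2*x2 + 28 = 0,  i.e.  x2^2 + (2/35)*x2 - 4/5 = 0,
# so the system is triangular and can be solved by back-substitution.
b, c = 2 / 35, -4 / 5
disc = math.sqrt(b * b - 4 * c)
roots_x2 = [(-b + disc) / 2, (-b - disc) / 2]

points = []
for x2 in roots_x2:
    x1 = math.sqrt(4 - 4 * x2**2)          # back-substitute into r1 = 0
    points.extend([(x1, x2), (-x1, x2)])   # four intersection points in total
```

All four recovered points satisfy both original equations to floating point accuracy, matching the four intersections of the two ellipses.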
Although the border Grobner basis is unique for a specified term order, not all of its polynomials are needed to generate the ideal. In the instance above, we saw ⟨g1, g2⟩ generated ⟨R⟩; this is the
1A Grobner basis need not be minimal and therefore is not unique; thus in the phrasing 'the' we have actually described a 'border Grobner basis' that does enjoy these qualities, up to normalisation of the leading term. A particular subset of this border set, referred to as the corner set, is called a reduced Grobner basis and is what is most commonly used for computation.
reduced Grobner basis; its generators' leading monomials are marked with crosses in Figure 4.1. This is the corner set of the border set, defined as the border elements whose negative neighbours are all contained in the closed set V. If we carried out the same sequence of calculations with lexicographic ordering instead, we would recover the image on the right.
To see why a Grobner basis is a 'good' generating set, an analysis of Figure 4.1 will be given. Let a polynomial r ∈ Q[x1, x2] be given and assume the graded lexicographic ordering on monomials. Assume r is normalised in the sense that its leading term has coefficient one. We wish to represent r in the quotient ring Q[x1, x2] mod ⟨R⟩. With reference to Figure 4.1 we have two cases to consider.
1. If LT(r) ≺ LT(gk) for every gk, then r = r mod ⟨R⟩. If this were not the case, then there would exist some zk ∈ B(V) such that lcm(zk, LT(r)) = zk, so the generator gk = zk − (zk mod ⟨R⟩) could be replaced by the polynomial r, altering the closed set V. This contradicts the uniqueness of the border Grobner basis.
2. Otherwise, there exists a generator gk that can reduce r by polynomial division. The leading term of r, marked on the lattice given in Figure 4.1, would have its remainder's leading term strictly less than the monomial that divided it (this ordering is given by dashed black lines in the figure). The algorithm runs until the remainder of such divisions obeys the first case, in which scenario the representation of r mod ⟨R⟩ has been determined.
The monomials shaded in blue in Figure 4.1 can all be reduced to the closed set V through either the border Grobner basis or the (more minimal) reduced Grobner basis. The monomials that may be reduced by an element of the reduced Grobner basis are encapsulated by the dashed blue lines in Figure 4.1.
Thus, the closed set V defines a (linear) basis for the quotient ring Q[x] mod ⟨I⟩ [Ste04]. This quotient ring is finite dimensional precisely when V is finite. Solving a polynomial system is computationally feasible when the quotient ring is finite dimensional, or equivalently when there are finitely many solutions to the polynomial equations [Ste04]. Some self-similar measures are degenerate in this regard.
Example 4.11. Lebesgue measure supported on [0, 1] will always have a one-dimensional component in its solution set for the polynomials given by Equation 4.1 when N = 2. Consider the family of IFSs with probabilities
\[ M_\alpha = \{([0, 1], |\cdot|);\ f_1(x) = \alpha x,\ f_2(x) = (1 - \alpha)x + \alpha;\ p_1 = \alpha,\ p_2 = 1 - \alpha\}, \]
whose attractive measure is Lebesgue measure on the unit interval for any α ∈ (0, 1).
Intuitively, selecting a fractal measure whose support has fewer affine symmetries than a straight line should yield a problem that we may readily solve. Referring back to the first of our examples presented in Chapter 2, a Cantor set equipped with uniform measure, which has only finitely many affine IFSs of two maps that construct it (found through the current standard of the 'graduate student algorithm'), will be explicitly solved as our example. In light of Example 3.30, this produces the following polynomial system {qn(a, b, p)}_{n=0}^{5}:
\begin{align*}
q_0 &= p_1 + p_2 - 1;\\
q_1 &= a_1 p_1/2 + a_2 p_2/2 + b_1 p_1 + b_2 p_2 - 1/2;\\
q_2 &= 3a_1^2 p_1/8 + a_1 b_1 p_1 + 3a_2^2 p_2/8 + a_2 b_2 p_2 + b_1^2 p_1 + b_2^2 p_2 - 3/8;\\
q_3 &= 5a_1^3 p_1/16 + 9a_1^2 b_1 p_1/8 + 3a_1 b_1^2 p_1/2 + 5a_2^3 p_2/16 + 9a_2^2 b_2 p_2/8 + 3a_2 b_2^2 p_2/2 + b_1^3 p_1 + b_2^3 p_2 - 5/16;\\
q_4 &= 87a_1^4 p_1/320 + 5a_1^3 b_1 p_1/4 + 9a_1^2 b_1^2 p_1/4 + 2a_1 b_1^3 p_1 + 87a_2^4 p_2/320 + 5a_2^3 b_2 p_2/4\\
&\quad + 9a_2^2 b_2^2 p_2/4 + 2a_2 b_2^3 p_2 + b_1^4 p_1 + b_2^4 p_2 - 87/320;\\
q_5 &= 31a_1^5 p_1/128 + 87a_1^4 b_1 p_1/64 + 25a_1^3 b_1^2 p_1/8 + 15a_1^2 b_1^3 p_1/4 + 5a_1 b_1^4 p_1/2 + 31a_2^5 p_2/128\\
&\quad + 87a_2^4 b_2 p_2/64 + 25a_2^3 b_2^2 p_2/8 + 15a_2^2 b_2^3 p_2/4 + 5a_2 b_2^4 p_2/2 + b_1^5 p_1 + b_2^5 p_2 - 31/128.
\end{align*}
For readability, the explicit Grobner basis polynomials are not shown but are presented in the accompanying Jupyter notebook. Appealing to symbolic computing, our system has the following corner set under the graded lexicographic ordering given by p2 ≺ p1 ≺ a2 ≺ a1 ≺ b2 ≺ b1, which we will denote pi ≺ ai ≺ bi from now on for ease:
\[ \{a_1^4 a_2^2,\ a_1^2 a_2^4,\ a_1^2 a_2^2 b_1,\ a_2^4 b_1,\ a_1^2 b_2^3,\ a_2^2 b_2^2 p_2,\ a_1^4 b_2,\ a_1^2 a_2^2 b_2,\ a_2^2 b_2 p_2^2,\ a_2^4 p_2,\ a_2^2 b_1^2,\ a_1^2 b_1 b_2,\ b_2^3 p_2,\ b_1^2 b_2,\ b_1 b_2^2,\ a_1^2 p_2,\ b_1 p_2,\ p_1\}. \]
Note that the monomials {p_2^n}_{n=1}^{∞} are not 'bounded' by the leading monomials of our Grobner basis. This result is unintuitive, as it states that we have infinitely many solutions to our polynomial system and thus infinitely many ways to construct uniform Cantor measure with two maps. This statement is false. As it stands, the polynomial system of equations given addresses the unconstrained inverse problem. It is true that any IFS parameters that produce a Cantor set will be a solution to the system of polynomials {qn(a, b, p)}_{n=0}^{∞}; the converse is false, as evinced by the following observation.
Observation 4.12. The infinite polynomial system {qn(a, b, p)}_{n=0}^{∞} has the following (N − 1)-dimensional zero set
\[ \Big\{\, a_i, b_i, p_i \in \mathbb{R} \ \Big|\ a_i \in \{-1, 1\},\ b_i = 0\ \ \forall i \in \{1, \ldots, N\};\ \sum_{i=1}^{N} a_i p_i = 1 \,\Big\} \]
contained in the combined solution set.
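One instance of this zero set from Observation 4.12 can be checked directly: take ai = 1 and bi = 0 for both maps, with any stochastic p (so that ∑ ai pi = 1 automatically). A small exact-arithmetic sketch, reusing the Cantor moments as the mn:

```python
from fractions import Fraction as F
from math import comb

# Moments of uniform Cantor measure (the constants appearing in q_0, ..., q_5).
m = [F(1), F(1, 2), F(3, 8), F(5, 16), F(87, 320), F(31, 128)]

def q(n, a, b, p):
    """The moment polynomial q_n of Equation 4.1 evaluated at (a, b, p)."""
    return sum(pi * sum(comb(n, j) * ai**j * m[j] * bi**(n - j)
                        for j in range(n + 1))
               for ai, bi, pi in zip(a, b, p)) - m[n]

# A 'trivial' zero: a_i = 1, b_i = 0, any stochastic p gives sum a_i p_i = 1,
# yet these parameters describe no contractive IFS at all.
trivial = [q(n, [F(1), F(1)], [F(0), F(0)], [F(1, 3), F(2, 3)]) for n in range(6)]
```

Every qn vanishes even though the parameters correspond to identity maps rather than a contractive IFS, which is exactly why these zeros must be excluded before any numerical search.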
Selecting ai ∉ (−1, 1) contradicts the maps fi being contractive. As it stands, this observation kills any numerical feasibility of using these moment equations for modelling purposes. The moment method of fractal reconstruction is known to be notoriously unstable in computation [HM90, Man13]. If we were to naively look for a least squares solution to our polynomial system for measure reconstruction, one of the uncountably many 'trivial' zeros given above may be found. We cannot restrict an optimisation to a ∈ (−1, 1) as this is an open set, and numerical methods typically require at least a closed set for their convergence theorems to hold.
Let P be a polynomial system in n variables x1, . . . , xn. Adding the polynomial qt(xi, t) = (xi − a)t − 1, for a newly introduced variable t and constant a ∈ Q, to this system guarantees that xi = a is not a solution of the system of equations. Furthermore, any solution of P that does not have xi = a remains a solution of the new polynomial system. Finally, when the underlying field has an ordering on its elements (such as R or Q, but not C), adding the polynomial qt(xi, t) = (xi − a)t² − 1 forces the constraint xi > a. With these points made, our polynomial system can be turned into a constrained problem through the addition of these constraint variables.
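The mechanism is elementary: a root with xi = a cannot be extended to a value of t satisfying (xi − a)t − 1 = 0, while every other root extends uniquely. A toy sketch with the system {x² − 1} and the excluded value a = 1 (helper name ours):

```python
def solutions_with_exclusion(roots, a):
    """Keep only roots x for which (x - a) t - 1 = 0 is solvable for t,
    i.e. x != a; the excluded root cannot extend to the augmented system."""
    kept = []
    for x in roots:
        if x != a:
            t = 1 / (x - a)                # the unique extension in t
            assert (x - a) * t - 1 == 0    # the added constraint is satisfied
            kept.append((x, t))
    return kept

# Roots of x^2 - 1 are +-1; adjoining (x - 1) t - 1 removes the root x = 1.
kept = solutions_with_exclusion([1.0, -1.0], 1.0)
```

Only the pair (x, t) = (−1, −1/2) survives, mirroring how the constraint polynomials below will remove the trivial ai = ±1 zeros.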
Unfortunately, placing all of the constraints needed to enforce a_i ∈ (−1, 1) on our polynomial system
prevents the computation being completed in any reasonable time1. The trivial solutions found
in Observation 4.12 can be parameterised by a_i = ±1, granting that only the additional constraint
polynomials q_{i1}(a_i, t_{i1}) := (a_i − 1)t_{i1} − 1 and q_{i2}(a_i, t_{i2}) := (a_i + 1)t_{i2} − 1 for all i ∈ [N] need
to be added to eliminate this one-dimensional (trivial) solution set. As our system grows
in total degree with the increase of the order of the moments, choosing any type of lexicographic
ordering will result in large computation times, so using a graded ordering was an intuitive
choice. Explicitly, we may select the ordering b_i ≺ a_i ≺ t_i ≺ p_i and use graded lexicographic
ordering, yielding the following simple set of equations as the corresponding reduced Gröbner basis:
a_2 b_2 − a_2/2 + b_2^2 − b_2 + 1/6,        a_1/2 + a_2/2 + b_1 + b_2 − 1,
9a_2/8 + t_{21} + 9/8,        9a_2/8 + t_{22} − 9/8,
9a_1/8 + t_{11} + 9/8,        9a_1/8 + t_{12} − 9/8,
a_1^2 − 1/9,        a_2^2 − 1/9,
p_1 − 1/2,        p_2 − 1/2.
The system above can be easily solved directly, so we do not continue with the work presented
in [MT01]. We will make mention of the technique they give for when there are finitely many
solutions to the polynomial system, to show this procedure could be used for more complicated
fractal measures. When the quotient ring is zero dimensional, then the common zeros of the
polynomials in the system can be enumerated as {z_i}_{i=1}^d, where z_i ∈ Q^n. For the sake of simplicity
we assume that these are simple zeros, that is, there are no repeated roots of the polynomials.
Then one can represent the polynomials in our quotient ring with the Lagrange interpolating
polynomial basis, whereby the basis polynomials L_i are chosen such that L_i(z_j) = 1_{i=j}.
Informally, one may think of the quotient ring as a d-dimensional space; the linear basis of this
1This computation was left for one week without completing. The only bounds given for algorithms to produce
a Gröbner basis are ‘super exponential’.
space given by the Lagrange polynomials forms the ‘standard unit basis’ {e_i}_{i=1}^d.
It follows that multiplication by a monomial term in the quotient ring using this Lagrange basis
can be represented by a diagonal matrix, and so the multiplication matrices given in any linear
basis of this space are related to the Lagrange basis by an eigenvalue decomposition. An
in-depth discussion of this can be found in [MT01, Ste04], but the key detail is that once the
monomial multiplication matrices are identified (with some additional mild assumptions on the
border basis), the solution of the system can be obtained by finding the eigenvectors of the
multiplication matrix given in any basis, which are easily deduced from a Gröbner basis.
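As a sanity check on the reduced basis displayed earlier, its polynomials can be verified to vanish on all eight solutions of the Cantor inverse problem (a Python sketch with exact rational arithmetic; the polynomial forms are our reconstruction of the displayed basis, omitting the t-variables, which are determined by the a_i):

```python
from fractions import Fraction as Fr

third = Fr(1, 3)
# (a1, b1, a2, b2) for the eight solutions F1..F8
solutions = [
    (third, Fr(0), third, Fr(2, 3)),
    (third, Fr(2, 3), third, Fr(0)),
    (-third, Fr(1), third, Fr(0)),
    (-third, Fr(1, 3), third, Fr(2, 3)),
    (third, Fr(0), -third, Fr(1)),
    (third, Fr(2, 3), -third, Fr(1, 3)),
    (-third, Fr(1, 3), -third, Fr(1)),
    (-third, Fr(1), -third, Fr(1, 3)),
]

def basis(a1, b1, a2, b2, p1, p2):
    # the reduced Groebner basis elements not involving the t-variables
    return [
        a2 * b2 - a2 / 2 + b2**2 - b2 + Fr(1, 6),
        a1 / 2 + a2 / 2 + b1 + b2 - 1,
        a1**2 - Fr(1, 9),
        a2**2 - Fr(1, 9),
        p1 - Fr(1, 2),
        p2 - Fr(1, 2),
    ]

for a1, b1, a2, b2 in solutions:
    assert all(g == 0 for g in basis(a1, b1, a2, b2, Fr(1, 2), Fr(1, 2)))
print("all eight solutions annihilate the basis")
```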
With this stated, the solution of our Cantor set inverse problem is readily obtained. Let
F_Cantor = {F_i = {([0, 1], | · |); f_1 = a_1 x + b_1, f_2 = a_2 x + b_2}}_{i=1}^8, where the coefficients of
each F_i are given in the table below:
        F1     F2     F3     F4     F5     F6     F7     F8
a1     1/3    1/3   −1/3   −1/3    1/3    1/3   −1/3   −1/3
a2     1/3    1/3    1/3    1/3   −1/3   −1/3   −1/3   −1/3
b1      0     2/3     1     1/3     0     2/3    1/3     1
b2     2/3     0      0     2/3     1     1/3     1     1/3
It is immediate that all of these solutions will give a Cantor set, and furthermore, uniform
Cantor measure when the maps are given equal probabilities. This gives the following corollary
through realising that the solutions given above are not only the rational solutions, but all of
the real ones as well, noting the simplicity of our reduced polynomial system.
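The claim that each listed pair of maps reproduces the Cantor set can be spot-checked in exact arithmetic: every column must send [0, 1] onto the two first-level Cantor intervals (a small Python sketch, transcribing the coefficient table):

```python
from fractions import Fraction as Fr

# (a1, b1, a2, b2) for F1..F8, transcribed from the coefficient table
coeffs = [
    (Fr(1, 3), Fr(0),    Fr(1, 3),  Fr(2, 3)),
    (Fr(1, 3), Fr(2, 3), Fr(1, 3),  Fr(0)),
    (Fr(-1, 3), Fr(1),   Fr(1, 3),  Fr(0)),
    (Fr(-1, 3), Fr(1, 3), Fr(1, 3), Fr(2, 3)),
    (Fr(1, 3), Fr(0),    Fr(-1, 3), Fr(1)),
    (Fr(1, 3), Fr(2, 3), Fr(-1, 3), Fr(1, 3)),
    (Fr(-1, 3), Fr(1, 3), Fr(-1, 3), Fr(1)),
    (Fr(-1, 3), Fr(1),   Fr(-1, 3), Fr(1, 3)),
]

def image(a, b):
    # image of [0, 1] under x -> a*x + b, returned as sorted endpoints
    return tuple(sorted((b, a + b)))

for a1, b1, a2, b2 in coeffs:
    pieces = sorted([image(a1, b1), image(a2, b2)])
    # first-level Cantor intervals: [0, 1/3] and [2/3, 1]
    assert pieces == [(Fr(0), Fr(1, 3)), (Fr(2, 3), Fr(1))]
print("all eight IFSs reproduce the first Cantor generation")
```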
Corollary 4.13. The only affine IFSs of two maps that produce uniform Cantor measure are
those listed in F_Cantor. Furthermore, all IFSs of N maps that produce uniform Cantor measure
are scaled copies of these, in the sense that if
µ_C = ∑_{i=1}^N p_i µ_C ◦ f_i^{−1},
then f_i(x) = (1/3^n)(a_j x + b_j) + c for a_j, b_j in the table above, n ∈ {0, . . . , N − 2} and
c ∈ F_Cant^{N−1}({0, 1}). Note this includes IFSs that do not obey the OSC.
The methodology here answers a related question to that posed in Example 3.1 exactly and
systematically. This is a constructive method that can be performed with modern computing
power to find all IFSs of N maps with probabilities for a given attractive measure. Asking ‘how
many ways one can construct a Sierpinski triangle’ motivated the discovery of the generalised
Sierpinski triangles which have since appeared in [SW18, BV18, Ste18, Gra18]. It would be
interesting to study questions such as ‘when is an attractive measure uniquely represented by
an IFS with probabilities of N maps’, through analysing the polynomial systems given by the
moments. This example reveals a new tool in fractal geometry through this link to computational
algebraic geometry.
Observation 4.14. There are infinitely many ways to form uniform Cantor measure using an
affine IFS with probabilities of three or more maps. For instance, the family
M_P = {([0, 1], | · |); f_1 = x/3, f_2 = x/3 + 2/3, f_3 = x/3 + 2/3; p = (1/2, α, 1/2 − α)}
for α ∈ [0, 1/2] has an attractive measure of µ_C. However, there are only finitely many ways to
construct uniform Cantor measure with N maps when the underlying IFS obeys the OSC. We
investigate the OSC in the next section.
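The invariance in α can be checked on the moments: iterating the self-similarity relation for moments from Chapter 3 (our own transcription of the one-dimensional recursion, a sketch rather than the thesis's exact operator) gives the same moment vector for every α, since the two duplicated maps carry total probability 1/2:

```python
from fractions import Fraction as Fr
from math import comb

def cantor_moments(alpha, n_max=4):
    # maps x/3, x/3 + 2/3, x/3 + 2/3 with probabilities (1/2, alpha, 1/2 - alpha)
    ifs = [(Fr(1, 3), Fr(0)), (Fr(1, 3), Fr(2, 3)), (Fr(1, 3), Fr(2, 3))]
    p = [Fr(1, 2), alpha, Fr(1, 2) - alpha]
    M = [Fr(1)]                       # M_0 = 1 (normalised measure)
    for n in range(1, n_max + 1):
        # self-similarity: M_n = sum_i p_i sum_k C(n,k) a_i^k b_i^(n-k) M_k
        rhs = sum(pi * sum(comb(n, k) * a**k * b**(n - k) * M[k]
                           for k in range(n))
                  for pi, (a, b) in zip(p, ifs))
        M.append(rhs / (1 - sum(pi * a**n for pi, (a, b) in zip(p, ifs))))
    return M

vals = {alpha: cantor_moments(alpha) for alpha in (Fr(0), Fr(1, 4), Fr(1, 2))}
assert len(set(map(tuple, vals.values()))) == 1   # independent of alpha
assert vals[Fr(0)][1] == Fr(1, 2) and vals[Fr(0)][2] == Fr(3, 8)
print(vals[Fr(0)])
```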
As previously commented, the space P(X) is ‘larger’ than H(X). The observation given above
shows that solving the inverse problem in P(X) in an unconstrained manner is exceptionally
difficult. Since our original modelling goal was to find a computational way to find an IFS for a
given self-affine set (an element of H(X)), a relationship between H(X) and P(X) must be established.
Assume a black and white image of a Cantor set (of infinite resolution) is given. To model this,
it can be viewed as either an element of H(X) or an empirical distribution of a measure in P(X).
There should be some choice of the additional parameters, the probabilities, restricting the IFSs
acting on P(X) so that the corresponding IFS on H(X) has an attractor modelling the same black
and white image. In general, the answer to this is unknown [BRS16]. When the IFS consists of
similitudes and obeys the open set condition, an answer is available. To explain this relationship,
a more in-depth understanding of the OSC, tiling theory and Hausdorff dimensions is needed.
4.2 Hausdorff Dimension and The Open Set Condition
In our introductory modelling chapter we introduced the concepts of box-counting dimension and
the open set condition at a very shallow level to reveal considerations taken in traditional fractal
modelling. In this section, a deeper insight into these topics is given. This was intentionally
done so that we may reveal the interplay between H(X) and P(X) from a modelling perspective
now that both of these spaces have been introduced. For clarity we will limit the form of the
IFSs discussed here to only contain functions which are similitudes.
Firstly, a constructive viewpoint of the OSC is shown, then the Hausdorff dimension is defined,
and finally these theories are combined to unite H(X) and P(X).
4.2.1 The Open Set Condition
Recall an IFS, F = {(X, d); f_1, . . . , f_N}, obeys the open set condition if there exists a non-empty
open set O ⊂ X such that ⋃_{i=1}^N f_i(O) ⊂ O and f_i(O) ∩ f_j(O) = ∅ for i ≠ j (Definition 2.7).
From a constructive point of view, one may equivalently state the second condition of the OSC as
O ∩ f_i^{−1}f_j(O) = ∅ (for invertible f_i) for i ≠ j; that is, none of the points f_i^{−1}f_j(x) for x ∈ O
can be contained in O if O is to be a suitable open set. Call a point x ∈ X forbidden if no
open set that contains x is a suitable candidate to fulfil the OSC. All elements of the sets
f_i^{−1}f_j(A) for i ≠ j in [N] are forbidden, through a straightforward calculation with the second
requirement of the OSC. Through identical reasoning, this construction generalises to f_i^{−1}f_j(A)
for strings i, j ∈ ⋃_{n∈N}[N]^n with i_1 ≠ j_1 (the restriction i_1 ≠ j_1 prevents the entire attractor
from being forbidden). Finally, a constructive interpretation of the OSC can be presented through
the following definition [BHR06].
Definition 4.15. The maps in
N = { f_i^{−1} f_j | i, j ∈ [N]^∗, i_1 ≠ j_1 }
are called the neighbour maps for the IFS F. Furthermore, the sets h(A) for h ∈ N are the
neighbour sets. Define the central open set as
U_c = {x | d(x, A) < d(x, H)},
where H denotes the union of the neighbour sets and d(x, S) = inf_{s∈S} d(x, s).
The central open set is clearly open and, by definition, stays away from the forbidden points
formulated above. It is shown in [BHR06] that this construction gives a feasible open set for the
OSC if and only if the IFS obeys the open set condition; otherwise U_c = ∅. The definitions given
above, and their relation to tiling theory, are more easily digested through an explicit example.
Example 4.16. Take the IFS F = {(C, | · |); f_1(z) = z/2, f_2(z) = (z + 1)/2, f_3(z) = (2z + √3 i + 1)/4};
then the attractor is an equilateral Sierpinski triangle, contained in the unit square, with unit side
lengths. A first approximation to this attractor is shown in black below, with the central open set
in blue. The neighbour sets which intersect the attractor are coloured in red. Correspondingly,
the images of the central open set under the neighbour maps that create these neighbour sets
are shaded in red.
Figure 4.2: Sierpinski triangle hull (black) and the neighbour sets that intersect the attractor
(red). The central open set U_c is shaded in blue and the images of U_c under the listed neighbour
maps f_1^{−1}f_2, f_1^{−1}f_3, f_2^{−1}f_1, f_2^{−1}f_3, f_3^{−1}f_1, f_3^{−1}f_2 are shaded in red.
This example may foster some naive conjectures on the properties of the central open set. For
instance, the set above is connected, convex, bounded and has a smooth1 boundary. These
qualities are generally untrue for the central open set, as shown through the following two
counterexamples. The central open set for the Twin Dragon IFS [BD85] trivially has non-smooth
boundary, as it is the interior of the Twin Dragon fractal. The other properties are seen in the
example below.
Example 4.17. Consider an IFS acting on (C, | · |) with four maps:
f_1(z) = z/4,    f_2(z) = z/4 + (1 + i)/4,    f_3(z) = z/4 + 2 + 2i,    f_4(z) = z/4 + (9 + 9i)/4.
Take the convention z = Re(z) + Im(z)i for Re(z), Im(z) ∈ R. It follows that the attractor of this
IFS is A = {z : Re(z) = Im(z), Re(z) ∈ [0, 1] ∪ [2, 3]}, and the IFS obeys the open set condition.
A straightforward calculation shows the neighbour sets of this IFS are completely contained in
the line Re(z) = Im(z); furthermore, the neighbour sets union to this line set-minus the
attractor. Below is the attractor (coloured according to the partitioning given by A_i := f_i(A)
for i ∈ [4]) and the corresponding central open set shaded in blue.
Explicitly,
U_c = {z : 0 < Re(z) + Im(z) < 2 or 4 < Re(z) + Im(z) < 6}.
In this example U_c is not connected, convex or bounded.
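This description of U_c can be sanity-checked numerically from the definition d(x, A) < d(x, H) (our sketch; the unbounded neighbour rays on the line Re(z) = Im(z) are truncated at ±100 for the distance computation):

```python
import math

def seg_dist(p, a, b):
    # Euclidean distance from point p to the closed segment [a, b] in R^2
    (px, py), (ax, ay), (bx, by) = p, a, b
    vx, vy = bx - ax, by - ay
    t = ((px - ax) * vx + (py - ay) * vy) / (vx * vx + vy * vy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * vx), py - (ay + t * vy))

# attractor: diagonal segments over Re(z) in [0,1] and [2,3]; the neighbour
# sets fill the rest of the line Re(z) = Im(z) (rays truncated at +-100)
A = [((0, 0), (1, 1)), ((2, 2), (3, 3))]
H = [((-100, -100), (0, 0)), ((1, 1), (2, 2)), ((3, 3), (100, 100))]

def in_Uc(p):
    return min(seg_dist(p, *s) for s in A) < min(seg_dist(p, *s) for s in H)

def formula(p):
    s = p[0] + p[1]
    return 0 < s < 2 or 4 < s < 6

for p in [(0.5, 0.5), (1.5, 0.0), (3.0, 0.0), (2.5, 2.5), (5.0, 0.2), (-1.0, 0.5)]:
    assert in_Uc(p) == formula(p)
print("definition and explicit description agree on the sample points")
```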
A property of the central open set which holds generally is that, when non-empty, it has
µ_A-measure one for any associated IFS with probabilities.
Proposition 4.18. Let F be an IFS that obeys the OSC and has central open set Uc 6= ∅. Let
M be an associated IFS with probabilities whose invariant measure is µA. Then µA(Uc) = 1.
1Integer fractal dimension boundary.
Proof. The only points of A not contained in U_c are those which lie in the attractor’s dynamical
boundary, defined by
∂A := ⋃_{k=1}^∞ F^{−k}(C) ∩ A,
where C is the set of overlap of the attractor A, C := ⋃_{i≠j} f_i(A) ∩ f_j(A).
Informally speaking, the set of overlap is where the pieces {f_i(A)}_{i=1}^N ‘just touch’ each other
(intuitively, look at the images of the vertices of a Sierpinski triangle under its three maps).
Taking the pre-image of all such points captures all of the points for which f_i(A) ∩ f_j(A) =
A ∩ f_i^{−1}f_j(A) ≠ ∅ for finite strings i, j ∈ ⋃_{n∈N}[N]^n.
Let a probability vector p be associated with the IFS F. Then, for the Bernoulli probability
measure ([N]^N, B([N]^N), ν_p), the set π^{−1}(A \ ∂A) has full ν_p-measure. To see this we must
consider the set of disjunctive sequences in [N]^N. The disjunctive sequences D [BBHV16] are
those which have a dense orbit under the shift map S : [N]^N → [N]^N, or equivalently the infinite
sequences that contain every finite string as a sub-string. Fix a finite string i ∈ [N]^∗; then,
through a straightforward calculation, the set of words in [N]^N that contain i as a sub-string is
a ν_p-full measure set. Taking the countable intersection over all such finite sub-words grants that
ν_p(D) = 1 (look at the complement of each of these sets, take the countable union over N, and
apply the monotonicity of a measure and De Morgan’s laws to yield ν_p(D) = 1).
We now show that if ω ∈ D then π(ω) ∉ ∂A. Let x ∈ A; then π(ω) = x for some word ω, which
we do not assume to be unique. In this representation, from the commutative diagram in
Proposition 3.16 we have that π(S^k(ω)) ∈ F^{−k}(x). If x ∈ ∂A, then all of the iterates π(S^k(ω))
also belong to ∂A. However, if ω ∈ D then this orbit under F^{−1} would be dense in A, but as F
obeys the open set condition we must have that ∂A ≠ A [Mor99]. Therefore ω ∈ D ⇒ π(ω) ∉ ∂A.
As the invariant measure µ_A is normalised,
µ_A(A \ ∂A) = ν_p(π^{−1}(A \ ∂A)) ≥ ν_p(D) = 1,
granting µ_A(A \ ∂A) = µ_A(U_c) = 1.
An immediate consequence of the Proposition above is that there is no ambiguity in the state-
ment presented in Chapter 3. So with complete clarity, we may now state that any IFS with
probabilities whose IFS obeys the OSC is isomorphic to a Bernoulli shift.
Corollary 4.19. Theorem 3.18 is an immediate consequence of Proposition 4.18 with identical
working given in Proposition 3.16.
This theory on the OSC has a relation to fractal tiling theory. A tile is an element of H(X)
and a fractal tiling is a collection of tiles whose union equals the attractor of an IFS.
In this regard, we aim to tile the ambient space X with a tiling system given by an arbitrary
IFS. The figure in Example 4.16 may mislead one into assuming that such a tiling can be created
through the use of neighbour sets. The Sierpinski triangle has many symmetries. These symmetries
actually enforce that any fractal tilings made from this IFS are in some sense isomorphic to one
another [BV18]. Loosely, the neighbour sets are really constructing the set of all possible tiles
for all tilings for a given IFS, and when such tilings are isomorphic these tiles perfectly overlay
on one another, a rigidity property in tiling theory. Thus, neighbour sets are not directly
applicable, as this rigidity does not hold generally; instead we prove the following theorem.
To continue, we require the following notation.
Notation 4.20. Let F = {(X, d); f_1, . . . , f_N} be an invertible IFS and θ = (θ_1, θ_2, . . . ) ∈ [N]^N.
Define f_{θ|i}^{−1} to be
f_{θ|i}^{−1} := f_{θ_1}^{−1} ◦ · · · ◦ f_{θ_i}^{−1}.
The following theorem does not assume that the fractal attractor has non-empty interior, in
comparison to what is shown in [Vin18]. Therefore, this may be considered a minor extension
of that work.
Theorem 4.21. Let F = {(X, d); f_1, . . . , f_N} be an invertible IFS that obeys the OSC and whose
central open set U_c is bounded. Then:
1. there exists an IFS F̄ with Condensation1 of N + 1 maps acting on (X, d) that obeys the
OSC and whose attractor is the closure of U_c;
2. for θ ∈ [N]^N such that π(θ) ∉ ∂A,
X = lim_{i→∞} f_{θ|i}^{−1}(U_c) = ⋃_{i∈N} ⋃_{j∈[N+1]^i} f_{θ|i}^{−1} ◦ f_j(U_c); (4.2)
3. for a fixed i ∈ N, the open sets {f_{θ|i}^{−1} ◦ f_j(U_c)}_{j∈[N]^i} are disjoint.
In the statement above, each f_{θ|i}^{−1} ◦ f_j(U_c) will be a tile of non-negligible Lebesgue measure;
their interiors being disjoint assures that the overlap of these tiles is negligible in the
measure-theoretic sense described in the next section. The inner union collects all the tiles of a
certain size, while the outer union is over an increasing sequence of sets that limits to the
ambient space under appropriate assumptions. To show the intuition of this proof, we explain an
algorithm that generates a fractal tiling; this algorithm was used to create the tiling photo of a
generalised Sierpinski triangle in [BV18] and used in [Gra18, Ste18] (see Figure A.1).
Let an IFS F = {(R^2, ‖·‖_2); f_1, . . . , f_N} be given with attractor A. Associate R^2 with an
infinite sheet of plastic with an image of A on it. Choose a point x on the attractor; this point
has at least one address θ ∈ π^{−1}(x) ≠ ∅. Find this point x on the piece of plastic, place a pin
in it, and cut a circular piece of plastic around this point. Now stretch the plastic with equal
force around this circle, with the pin fixed. How much this circle is stretched is parameterised
by i in the above notation, and as i → ∞ we stretch the plastic to infinity and create a fractal
blow-up which we aim to tile. If a point is chosen not according to what is assumed above, for
example a vertex of a Steemson triangle’s triangular convex hull, then the fractal blow-up may
not expand in all directions of the space X.
1An IFS whose functions are defined f_i : H(X) → H(X), not f_i : X → X.
Proof. Let U_c be the central open set of F. As F obeys the OSC, this set is non-empty and
contains all the points of A \ ∂A by the preceding proposition. Define F̄ = {(X, d); f_1, . . . , f_N , f_c},
where f_c : H(X) → H(X) is given through f_c(V) := U_c \ F(U_c): a ‘constant’ function on H(X)
that is well defined when U_c is bounded. If this set is empty, then the IFS F already obeys
the statement given. Write f_{N+1} = f_c. It is immediate that F̄ is a contraction on H(X)
whenever F is; furthermore F̄(U_c) = U_c and U_c is a suitable open set for the OSC.
As π(θ) := x ∉ ∂A, we have x ∈ U_c. There exists ε > 0 such that B_ε(x) ⊂ U_c. Let λ_max be
the maximum contractivity of the functions f_1, . . . , f_N; then
B_{ε/λ_max^i}(x) ⊂ f_{θ|i}^{−1}(U_c) = ⋃_{j∈[N+1]^i} f_{θ|i}^{−1} ◦ f_j(U_c),  with B_{ε/λ_max^i}(x) → X as i → ∞.
By the construction of U_c not containing any neighbour sets, {f_{θ|i}^{−1} ◦ f_j(U_c)}_{j∈[N]^i} are
disjoint. It is untrue that {f_{θ|i}^{−1} ◦ f_j(U_c)}_{j∈[N+1]^i} are disjoint, as we have the triviality
f_{θ|i}^{−1} ◦ f_{(N+1, j_2, ..., j_i)}(U_c) = f_{θ|i}^{−1} ◦ f_{(N+1, j'_2, ..., j'_i)}(U_c) for j_k and j'_k distinct in [N + 1].
With additional notation and ordering one could remove this case, but it is unnecessary here.
More intuitively, the closure of the central open set tessellates the space X given a fractal tiling
system. For instance, a hexagonal tiling of R^2 can be made from the Sierpinski triangle IFS given
in Example 4.16. Typical fractal tiling systems aim to tile a fractal blow-up; this construction
uses the fractal tiling system to tile the ambient space. We use such a construction in analysing
the dimension theory of an IFS attractor.
4.2.2 Hausdorff Dimension
Fractals are sometimes considered to be objects that are ‘rough’. The notion of roughness
typically used is that of non-integer dimensions, made explicit with the aid of the Hausdorff
measure and dimension.
Definition 4.22. Let δ > 0 and E ⊂ X for a metric space (X, d). The Hausdorff content of E
is defined through
H^α_δ(E) := inf_{V_i ∈ B(X)} { ∑_{i=1}^∞ |V_i|^α : E ⊂ ⋃_{i=1}^∞ V_i, |V_i| < δ },
where |V_i| is the diameter of the set, given by |V_i| := sup_{x,y∈V_i} d(x, y). Furthermore, let
H^α(E) := lim_{δ→0} H^α_δ(E).
The first quantity defined is the α-Hausdorff content of a set E; the second is called the
Hausdorff measure (which is a positive measure). The first value α for which the Hausdorff
measure is finite is the Hausdorff dimension of the set E, that is, dim_H(E) = inf_{α∈R+}{α : H^α(E) <
∞}. This is well defined as the function is decreasing in α. The exact formalities of these points
can be found in most introductory textbooks pertaining to the subject (for instance [Fal04]).
The relationship this has to the box-counting dimension is that here we may select any covering
of a set E, whereas the box-counting method allowed only for coverings by boxes. Hence
dim_B(E) ≥ dim_H(E). To illustrate that this idea coincides with our traditional (topological)
idea of dimension, we will do the following example of a line segment.
Example 4.23. Consider the set A = [0, 1] ⊂ R. We may cover A by dyadic intervals, that is,
A = ⋃_{i=1}^{2^j} [ (i−1)/2^j , i/2^j ) ∪ {1}.
As we have equality, this covering is optimal and disjoint; furthermore, for any δ > 0 we can
make the interval width sufficiently small. It is immediate that for d ≠ 0 we have H^d({1}) = 0,
thus
H^d_{2^{−j}}(A) = ∑_{i=1}^{2^j} 1/2^{dj} = 2^j / 2^{dj},
which limits to zero as j → ∞ if and only if d > 1. Therefore A has Hausdorff dimension one
through the infimum.
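The computation can be mirrored numerically: the content of the dyadic cover is 2^{j(1−d)}, which collapses to zero precisely when d > 1 (a one-function sketch):

```python
def dyadic_content(d, j):
    # 2^j intervals of diameter 2^(-j): sum of |V_i|^d = 2^j * (2^(-j))^d
    return 2.0 ** (j * (1 - d))

assert dyadic_content(1.0, 200) == 1.0      # borderline exponent stays at 1
assert dyadic_content(1.1, 200) < 1e-3      # d > 1: content vanishes
assert dyadic_content(0.9, 200) > 1e3       # d < 1: content blows up
print("dyadic content behaves as in the example")
```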
Moran’s Theorem for the box-counting dimension (Theorem 2.17) remains true for the Hausdorff
dimension. Furthermore, the proof of the theorem becomes relatively straightforward with the
theoretic structure that the Hausdorff dimension provides.
Theorem 4.24. [Mor46] Given an IFS that obeys the OSC and consists of N similitudes
whose scaling factors are {λ_i}_{i=1}^N, the attractor A obeys dim_H(A) = D, where ∑_{i=1}^N λ_i^D = 1.
A sketch of this can be made through the observation that if ⋃_{i=1}^∞ U_i is a covering of a
self-similar set A = ⋃_{i=1}^N f_i(A), then f_j(⋃_{i=1}^∞ U_i) covers f_j(A). The assumption of the
OSC grants that taking the image of an optimal covering under each f_j and taking the union over j
gives another optimal covering of A; for instance, one might consider using scaled copies of the
central open set (union the dynamical boundary) as a covering. As the functions f_j are taken to
be similitudes, we have that |f_j(U_i)| = λ_j δ when |U_i| = δ. In the language of the Hausdorff
content:
H^α_δ(A) = H^α_δ( ⋃_{j=1}^N f_j(A) ) = ∑_{j=1}^N H^α_δ(f_j(A)) = ∑_{j=1}^N λ_j^α H^α_{δ/λ_j}(A),
where we have used that the scaled coverings can be made almost-disjoint through the OSC.
Allowing δ → 0 gives H^α(A) = ∑_{j=1}^N λ_j^α H^α(A). It is not straightforward to show that
dim_H(A) = inf_{α∈R+}{α : H^α(A) < ∞} > 0; this is done in [Sch94] by assuming the OSC, but
once this is obtained the result is immediate.
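Moran's equation ∑_i λ_i^D = 1 rarely has a closed form, but since the left side is strictly decreasing in D it is easily solved by bisection (a small sketch; the initial bracket assumes the dimension is below the number of maps, which holds for the examples shown):

```python
import math

def moran_dimension(ratios, tol=1e-12):
    # solve sum(r**D for r in ratios) == 1 by bisection; the left side is
    # strictly decreasing in D since every ratio lies in (0, 1)
    lo, hi = 0.0, float(len(ratios))   # assumes sum(r**len) < 1 < sum(r**0)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sum(r ** mid for r in ratios) > 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(moran_dimension([1/3, 1/3]))       # Cantor set: log 2 / log 3
print(moran_dimension([1/2, 1/2, 1/2]))  # Sierpinski triangle: log 3 / log 2
```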
4.2.3 A Relationship Between H(X) and P(X)
With the machinery developed in the previous two subsections, we may present a view of how
the fractal models in H(X) and P(X) can be related to one another. This relationship is best
seen through how the dimension theory interplays between fractal attractors and self-similar
measures. We present this relationship in the case of IFSs that obey the OSC and whose
functions are similitudes. Only under these assumptions is the dimension theory for an IFS
attractor concisely shown. To elucidate this point, a current research paper on the theory
of two-dimensional affine attractors begins with, ‘The dimension theory of self-affine sets and
measures is far from being completely understood’ [BRS16].
The following result is proved in one dimension in both [Fal04] (under the assumption the
attractor is disconnected) and [BRS19], although the geometric proof given in [BRS19] has an
error in their use of the OSC, with the IFS given in Example 4.17 projected onto the Re(z)-axis
serving as a counterexample to their argument. A similar result in n dimensions under HESC is
stated in [BRS16, BRS18, BRS19] and used as the basis for approximating coverings of an affine
IFS. The following geometric proof is of the writer’s own creation and uses the OSC and tiling
theory already shown.
Firstly we define the Hausdorff dimension of a measure.
Definition 4.25. The Hausdorff dimension of a measure µ on (X, B(X)) is given by
dim_H(µ) := inf_{V∈B(X)} {dim_H(V) | µ(V) > 0}.
Equivalently, if lim_{r→0} log(µ(B_r(x)))/log(r) = α for a fixed α ∈ R+ and µ-almost every
x ∈ X, then dim_H(µ) = α [MMR00].
Intuitively, dimH(µ) is the Hausdorff dimension of the smallest1 set of non-negligible µ-measure.
From this definition it is clear that for a self-similar measure, dimH(µA) ≤ dimH(A). Some
preliminary statements are needed before we present this relationship between spaces.
Lemma 4.26. Let F = {(R^n, ‖ · ‖_2); {f_j}_{j=1}^N} be an IFS of similitudes that obeys the OSC
and M be an associated IFS with probabilities. Let A be the attractor and µ_A be the attractive
measure. Then, for µ_A-almost all x ∈ A and for any ε > 0, there exist open sets U_1, U_2 such
that:
1. U_i = r_i Ũ_c + k_i, where Ũ_c ⊂ U_c, for some r_i > 0 and k_i ∈ R^n;
2. µ_A(Ũ_c) = 1;
3. U_1 ⊂ B_ε(x) ⊂ U_2.
1In the Hausdorff measure sense.
Proof. Let x ∈ A \ ⋃_{i∈N} F^i(C); from Proposition 4.18, this is a µ_A-full measure set from which
x is taken. From this choice of x we have that x ∈ U_c; thus B_ε(x) ⊂ U_c as U_c is open, which up
to renormalisation and translation gives B_ε(x) ⊂ U_2.
Furthermore, x has a unique code-space address ω := π^{−1}(x) ∈ [N]^N through the injective
coding map π^{−1} : A \ ⋃_{i∈N} F^i(C) → [N]^N. It follows that f_ω(Ũ_c) = {x}, so x ∈ f_{ω|n_2}(Ũ_c) ⊂
f_{ω|n_1}(Ũ_c) for n_2 ≥ n_1. If necessary, redefine Ũ_c := {x | d(x, A) < min{1, d(x, H)}} to
ensure this set is bounded. The diameter |f_{ω|n}(Ũ_c)| = (∏_{i=1}^n λ_{ω_i}) |Ũ_c| → 0 as n → ∞, as
we constructed |Ũ_c| < ∞. Select n large enough such that r_1 = ∏_{i=1}^n λ_{ω_i} gives U_1 ⊂ B_ε(x),
and the result is given.
Lemma 4.27. Let Ũ_c be defined as in the preceding lemma. Then the set f_{ω|n}(Ũ_c) has
µ_A-measure ∏_{i=1}^n p_{ω_i}.
Proof. From Proposition 4.18, we have that µ_A(A \ ⋃_{j=0}^∞ F^j(C)) = 1. Recall that µ_A obeys
the fixed point equation; hence
∫_A 1_{f_i(Ũ_c)}(x) dµ_A(x) = ∫_{A \ ⋃_{j=0}^∞ F^j(C)} 1_{f_i(Ũ_c)}(x) dµ_A(x) = p_i ∫_{A \ ⋃_{j=0}^∞ F^j(C)} 1_{f_i(Ũ_c)}(f_i(x)) dµ_A(x) = p_i,
where above we have used the construction of the central open set to ensure that 1_{f_i(Ũ_c)}(f_j(x)) =
1_{i=j} for any x ∈ A \ F(C). As µ_A(Ũ_c) = 1, then µ_A(f_i(Ũ_c)) = p_i. Through a simple induction
the result is given.
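Lemma 4.27 can be probed numerically with the chaos game, which samples the invariant measure: for the two-map Cantor IFS with probabilities (0.3, 0.7) (our choice for illustration), the fraction of orbit points landing in f_1([0, 1]) = [0, 1/3] estimates µ_A(f_1(Ũ_c)) = p_1:

```python
import random

random.seed(0)
p1 = 0.3                               # probabilities (0.3, 0.7), for illustration
maps = [lambda x: x / 3, lambda x: x / 3 + 2 / 3]

x, hits, used = 0.5, 0, 0
for k in range(200_000):
    x = maps[0](x) if random.random() < p1 else maps[1](x)
    if k >= 100:                       # discard a short burn-in
        used += 1
        hits += x < 0.5                # orbit point lies in f1(A), a subset of [0, 1/3]

freq = hits / used
print(freq)                            # close to p1
assert abs(freq - p1) < 0.01
```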
Notation 4.28. A convenient notation is given through introducing the (injective) function
ω : A \ ⋃_{j=0}^∞ F^j(C) → [N]^N, where this mapping is defined component-wise through
ω_i(x) = (π^{−1}(x))_i = ∑_{j=1}^N j · 1_{π(j)}((F^{−1})^{i−1}(x)),
where F^{−1} denotes the inverse IFS to F and (F^{−1})^{i−1} its iterates. In the one-dimensional
case, this can be thought of as the digit map of a dynamical system, such as deducing the binary
expansion of a number in the unit interval from the IFS in Proposition 3.16.
Before reading the following proof, a geometric interpretation is given. We aim to calculate the
point-wise dimension of a self-similar measure µ_A-almost everywhere. We assume the OSC, so
one may create a tiling around a point x ∈ A \ ⋃_{j=0}^∞ F^j(C) in the fashion given by Theorem
4.21. Two such (finite) tilings are created: one for the fractal attractor, and another for the
extended IFS, whose attractor is the closure of U_c, which tiles the ambient space. These tilings
are over-layed on one another. Intuitively, we have broken the attractor into several fractal tiles
whose µ_A-measures are individually easy to find. Furthermore, a covering of these fractal tiles
by non-overlapping open sets is provided by our secondary ambient-space tiling. In this instance,
we know precisely the size of such tiles and can use this relationship to compute the point-wise
dimension d_{µ_A}(x).
Theorem 4.29. Let F = {(R^n, ‖ · ‖_2); f_1, . . . , f_N} be an IFS that obeys the OSC and has
functions of the form f_i(x) = λ_i O_i x + b_i, where λ_i < 1, O_i is an isometry on R^n and b_i ∈ R^n.
Associate this IFS with probabilities p = {p_i}_{i=1}^N to create an IFS with probabilities M. Then
the Hausdorff dimension of the attractive measure µ_A is given by
dim_H(µ_A) = ( ∑_{i=1}^N p_i log(p_i) ) / ( ∑_{i=1}^N p_i log(λ_i) ). (4.3)
Proof. Fix x ∈ A \ ⋃_{j=0}^∞ F^j(C); we aim to evaluate the point-wise dimension of the measure
µ_A:
d_{µ_A}(x) = lim_{ε→0} log(µ_A(B_ε(x))) / log(ε).
By Lemma 4.26 we can control the convergence of this value by
d_{µ_A}(x) = lim_{ε→0} log(µ_A(U_1)) / log(|U_1|) = lim_{n→∞} log(µ_A(f_{ω|n}(Ũ_c))) / log(|f_{ω|n}(Ũ_c)|),
where |f_{ω|n}(Ũ_c)| is the diameter of the set f_{ω|n}(Ũ_c). We can evaluate this quantity directly
through
d_{µ_A}(x) = lim_{n→∞} log( ∏_{i=1}^n p_{ω_i} ) / log( |Ũ_c| · ∏_{i=1}^n λ_{ω_i} ),
where above we have used that similitudes uniformly scale length in all directions. If we did
not have this assumption, then we would not be able to directly evaluate this limit, which is
the source of difficulty in calculating the dimension of an affine IFS. In light of Notation 4.28,
we have that 1_{π(j)}((F^{−1})^{i−1}(x)) = 1 if and only if ω_i = j for j ∈ [N]. The function
x ↦ ∑_{j=1}^N p_j 1_{π(j)}((F^{−1})^{i−1}(x)) takes the value p_{ω_i} for each i ∈ N, and as the p_j < ∞
this function is in L^1(X, B(X), µ_A). Continuing with our calculation:
d_{µ_A}(x) = lim_{n→∞} log( ∏_{i=1}^n ∑_{j=1}^N p_j 1_{π(j)}((F^{−1})^{i−1}(x)) ) / log( |Ũ_c| · ∏_{i=1}^n ∑_{j=1}^N λ_j 1_{π(j)}((F^{−1})^{i−1}(x)) )
= lim_{n→∞} [ (1/n) ∑_{i=1}^n log( ∑_{j=1}^N p_j 1_{π(j)}((F^{−1})^{i−1}(x)) ) ] / [ (1/n) ∑_{i=1}^n log( ∑_{j=1}^N λ_j 1_{π(j)}((F^{−1})^{i−1}(x)) ) + (1/n) log(|Ũ_c|) ]
= ∫ log( ∑_{j=1}^N p_j 1_{π(j)}(z) ) dµ_A(z) / ∫ log( ∑_{j=1}^N λ_j 1_{π(j)}(z) ) dµ_A(z). (Birkhoff’s Ergodic Theorem)
The quantity in the numerator is the entropy of µ_A. It is shown in [Wal00], Corollary 4.9.1, that
this value is finite, and so the above limit exists as the denominator is clearly bounded away
from zero.
What is left is the quotient of integrals, which can be readily evaluated through noting that
∫ log( ∑_{j=1}^N c_j 1_{π(j)}(z) ) dµ_A(z) = ∑_{i=1}^N p_i log(c_i)
for constants c_j > 0, granting
d_{µ_A}(x) = ( ∑_{i=1}^N p_i log(p_i) ) / ( ∑_{i=1}^N p_i log(λ_i) ).
The above value is constant for µ_A-almost every x, giving the result. This quantity is the entropy
of µ_A divided by its Lyapunov exponent.
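The ergodic averaging step can be illustrated by simulating the symbol sequence directly (our sketch; the two-map similitude IFS with ratios 1/4 and 1/2 and probabilities (0.3, 0.7) is a hypothetical example obeying the OSC, e.g. x/4 and x/2 + 1/2):

```python
import math, random

random.seed(1)
p, lam = [0.3, 0.7], [0.25, 0.5]       # probabilities and scaling ratios

closed_form = (sum(pi * math.log(pi) for pi in p)
               / sum(pi * math.log(li) for pi, li in zip(p, lam)))

# sample a long i.i.d. word omega with law p and form the Birkhoff quotient
omega = random.choices(range(2), weights=p, k=200_000)
ratio = (sum(math.log(p[i]) for i in omega)
         / sum(math.log(lam[i]) for i in omega))

print(ratio, closed_form)              # the two agree to a few decimal places
assert abs(ratio - closed_form) < 0.01
```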
Observation 4.30. Making the selection p_i = λ_i^D, where D is the Hausdorff dimension of A,
grants that
dim_H(µ_A) = ( ∑_{i=1}^N λ_i^D log(λ_i^D) ) / ( ∑_{i=1}^N λ_i^D log(λ_i) ) = dim_H(A).
Furthermore, the function p ↦ ( ∑_{i=1}^N p_i log(p_i) ) / ( ∑_{i=1}^N p_i log(λ_i) ) is concave in
p_i ∈ (0, 1) under the constraint ∑_{i=1}^N p_i = 1 [Fal04, Proposition 17.8]. A brief explanation of
this is that the (negative of the) numerator can be viewed as an entropy, which is always concave.
The function in the denominator is both concave and convex, being linear in the p_i. The quotient
of such functions is typically quasi-concave, which in this instance may be lifted to concavity.
One may also calculate the second derivative directly in one variable for the case N = 2 (through
noting p_2 = 1 − p_1) and check it is negative.
In particular, this function has a unique maximum value, which is achieved for the selection
p_i = λ_i^D since, as previously stated, dim_H(µ_A) ≤ dim_H(A). Hausdorff dimensions can be
estimated by box-counting dimensions, so making such a choice in computation for modelling a
fractal object is possible.
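A quick numeric scan illustrates the unique maximum, assuming the uniform Cantor scaling factors λ = (1/3, 1/3):

```python
import math

lam = [1/3, 1/3]                      # Cantor IFS scaling factors
D = math.log(2) / math.log(3)         # Hausdorff dimension of the attractor

def dim_measure(p1):
    # formula (4.3) in one free variable, p = (p1, 1 - p1)
    p = [p1, 1 - p1]
    return (sum(pi * math.log(pi) for pi in p)
            / sum(pi * math.log(li) for pi, li in zip(p, lam)))

grid = [k / 1000 for k in range(1, 1000)]
best_val, best_p = max((dim_measure(q), q) for q in grid)
print(best_p, best_val)               # maximum at p1 = 1/2, value log 2 / log 3
assert abs(best_p - 0.5) < 1e-9 and abs(best_val - D) < 1e-9
```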
The result above is what grants the link between P(X) and H(X) in a modelling context, where
a black and white image of a fractal attractor can be considered a set, or a ‘uniform’ measure on
such a set. Any IFS of similitudes that obeys the OSC may be equipped with a unique set of
probabilities such that dim_H(µ_A) = dim_H(A). This uniqueness grants the following corollary.
Corollary 4.31. Let the Cantor set C ∈ H([0, 1]) be given. Then all the affine IFSs of two
maps, F = {([0, 1], | · |); f_1, f_2}, that obey the OSC and whose attractor is C are listed in F_Cantor.
Proof. Take an IFS that obeys the OSC and whose attractor is the Cantor set C. Then there
exists a unique set of probabilities to create an IFS with probabilities whose attractive measure
is uniform measure on the Cantor set, where ‘uniform’ measure here is defined through
dim_H(µ_A) = dim_H(A). All such IFSs with probabilities are listed in Corollary 4.13.
The assumption of the OSC can be loosened to a ‘perfectly overlapping IFS’ [BRS18], which
we do not present here for clarity, but can exploit in the following statement: all affine IFSs of
N maps on ((0, 1), | · |) that obey HESC [BRS18] and whose attractor is a Cantor set have
functions of the form given in Corollary 4.13. The selection of the p_i for a specific instance of an
IFS with probabilities that obeys the OSC allows for the modelling of objects in H(X). This will
be done explicitly in the next computational section.
4.3 The Collage Theorem for Moments
The iterative formula created in Chapter 3 for the calculation of moments of a self-similar
measure serves another purpose. Namely, as we extracted a contraction mapping on 1ℓ∞ (under
certain conditions) we recover another form of the Collage Theorem for moments.
Theorem 4.32. Let M be an IFS with probabilities of the form given in either Theorem 3.53
or 3.55. Then its associated vector of moments M*_{x,y} obeys the inequality
‖M − M*_{x,y}‖_g ≤ ‖ΦM − M‖_g / (1 − λ),
where M ∈ 1ℓ∞ and λ is the contractivity factor of the operator Φ : (1ℓ∞, ‖ · ‖_g) → (1ℓ∞, ‖ · ‖_g).
This theorem has the following interpretation for modelling: given a vector of moments M, we
want to find a contractive operator Φ whose fixed point M* is close to M. The bound states that
if M is almost a fixed point of Φ, then the actual fixed point of Φ is near M in the ‖ · ‖_g-norm
sense.
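A small numeric illustration of the bound, assuming the uniform two-map Cantor IFS and a one-dimensional moment operator transcribed from the Chapter 3 recursion (our sketch; the contractivity λ = 1/2 used here is a crude row-sum estimate of our own for this truncated system, not the ‖ · ‖_g factor of the theorem):

```python
import math
from fractions import Fraction as Fr

ifs = [(Fr(1, 3), Fr(0), Fr(1, 2)), (Fr(1, 3), Fr(2, 3), Fr(1, 2))]  # (a, b, p)
N = 6                                  # truncation level

def phi(M):
    # moment operator: (Phi M)_n = sum_i p_i sum_{k<=n} C(n,k) a_i^k b_i^(n-k) M_k
    out = [Fr(1)]
    for n in range(1, N + 1):
        out.append(sum(p * sum(math.comb(n, k) * a**k * b**(n - k) * M[k]
                               for k in range(n + 1))
                       for a, b, p in ifs))
    return out

# true moment vector: the fixed point of phi, solved one moment at a time
M_star = [Fr(1)]
for n in range(1, N + 1):
    rhs = sum(p * sum(math.comb(n, k) * a**k * b**(n - k) * M_star[k]
                      for k in range(n))
              for a, b, p in ifs)
    M_star.append(rhs / (1 - sum(p * a**n for a, b, p in ifs)))

# perturb the moments and compare the true error with the collage residual
M = [m + Fr(1, 50) for m in M_star]
M[0] = Fr(1)                           # M_0 = 1 is fixed for a probability measure
lhs = max(abs(x - y) for x, y in zip(M, M_star))
res = max(abs(x - y) for x, y in zip(phi(M), M))
lam = Fr(1, 2)                         # crude row-sum contractivity bound (ours)
assert lhs <= res / (1 - lam)
print(lhs, res / (1 - lam))
```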
Proposition 4.33. Let M be an IFS of the form of the above, then under the norm ‖ · ‖g we
have Mx,y ∈ c0: the space of convergent sequences in (1`∞, ‖ · ‖g) approaching zero.
Proof. Select the ‖ · ‖g norm such that the operator Φ is contractive and the attractor of the
IFS Fg◦f◦g−1 is (strictly) contained in the ‖ · ‖∞-unit ball — such a choice is always possible as
the scaling of the function g can be arbitrarily close to zero and continue to make Φ contractive.
We may either view Mx,y ∈ (1`∞, ‖ · ‖g) as the vector of moments for µA or Mx,y ∈ (1`∞, ‖ · ‖∞)
as the vector of moments for µA ◦ g⁻¹ in light of Lemma 3.48. Then for n1, n2 ∈ N0,
$$\left\|\int x^{n_1} y^{n_2}\, d\mu_{g(A)}(x, y)\right\|_\infty \le \int \|x^{n_1} y^{n_2}\|_\infty\, d\mu_{g(A)}(x, y) \le \max_{x, y \in g(A)} \|x^{n_1} y^{n_2}\|_\infty \xrightarrow{\;n_1 + n_2 \to \infty\;} 0,$$
where we have used that µA is normalised and that every point of g(A) has norm less than one.
Through the (graded lexicographic) ordering chosen in the vectorisation map v, the claim follows.
Corollary 4.34. Let M be of the form assumed above. Let M∗x,y be the vector of moments of
the invariant measure µA that is in ¹ℓ∞. Let M ∈ ¹ℓ∞ and let Mn be its nth level truncation, chosen
such that n = bnct; then we obtain the following chain of bounds:
$$\|M^{*}_{x,y} - M\|_g \;\le\; \frac{\|\Phi_M M - M\|_g}{1 - \lambda} \;\le\; \frac{\|\Phi_{M,n} M_n - M_n\|_g}{1 - \lambda} + \varepsilon(n) \;\le\; C(\lambda)\,\|\Phi_{M,n} M_n - M_n\|_2 + \varepsilon(n),$$
for some constants ε, C > 0 that depend on the truncation level n and the contractivity λ, and
where ε can be made small by increasing n.
Remark 4.35. The above Corollary firstly uses the Collage Theorem for moments, then Proposition 3.52, which states that we may appropriately re-scale the space so that the measures considered
have moments that tend to zero. This allows for truncation to finitely many moment terms Mn
plus some error term ε > 0, which can be made arbitrarily small with increasing n by Proposition
4.33. Finally, as Mn is finite dimensional, we may swap to the L2 norm at the cost of a constant
that depends on the contractivity of the operator Φ. This creates a feasible computational
problem.
The Corollary above justifies truncating to finitely many moment equations and finding a least
squares solution; such a solution need not solve our problem exactly, it need only be a local
minimum of the moment equations. For instance, let M21 be the vector of the first 21 moments of
the Steemson triangle in Example 2.10. A 21-moment approximation of a Steemson triangle was
found through a least squares optimisation, with ‖ΦM,21M21 − M21‖2 ≈ 6 · 10⁻⁶ for the affine
approximant and ≈ 9 · 10⁻⁵ for the similitude approximant. In this approximation, the maps were
intentionally given incorrect orientations so as not to be an exact reconstruction.
Figure 4.3: Affine (left) and similitude (right) moment approximation of a Steemson triangle.
In particular, we have created an approximation method that does not require the object that we
are approximating to be strictly self-similar but may find an exact solution if it exists (see Figure
4.5). This bound also makes sense of the trivial solutions spotted in the algebraic approach to
solving these moment equations, seen through the following example.
Example 4.36. Recall the trivial solutions to the polynomial system in one dimension:
$$\left\{ a_i, b_i, p_i \in \mathbb{R} \;\middle|\; a_i \in \{-1, 1\},\; b_i = 0 \;\; \forall i \in \{1, \dots, N\};\; \sum_{i=1}^{N} a_i p_i = 1 \right\}.$$
In one dimension we have that |ai| = λi. The bound given by the Collage Theorem for moments
is meaningless for these trivial solutions, as the constant C(λ) → ∞ as maxi |ai| → 1. This can be
seen by identifying that the contractivity of Φ in one dimension is λ = maxi∈[N]{λi}, where the
λi are the contractivity factors of the maps fi in the IFS. In the two-dimensional case, trivial
solutions to the moment equations are even more abundant and may ruin any numerical
experiment if not appropriately controlled. These solutions cannot in general be written out, as
knowing them precisely would imply knowing when an affine IFS is contractive in terms of its map
parameters.
The above shows one caveat that is often left unmentioned in much of the literature: the quality
of the bound given by the Collage Theorem is highly dependent on the operator Φ. In practice
this is not commonly a problem, as the model fit is sufficiently constrained, consciously or
unconsciously, to make sure this bound does not blow up. If this problem is addressed, then we
have the following interpretation of the bound. The vector equation ΦM − M = 0 represents our
system of non-linear equations, for which (in practical use) it is unreasonable to assume that an
exact solution exists. The bound states that if ‖ΦM − M‖g is small for fixed IFS parameters
(fixing Φ), then we obtain an approximate solution; more explicitly, a local minimum of
‖ΦM − M‖g may produce a 'good' approximant. Thus, in an analogous fashion to the 'standard'
Collage Theorem, we explore this interpretation computationally in the next section.
4.4 Computational Experiments
The following experiments are made available here in Jupyter notebooks. Justification and
results of these computations are shown here for completeness.
4.4.1 One-Dimensional Example with Polynomials
Generally the coefficients of a polynomial system have errors, unlike our Cantor example given
in exact rational arithmetic. If rational computation was not used in our Grobner basis section,
no solutions would have survived the floating point rounding errors. In fact, the floating point
implementation of polynomial division in Python1 returns one (trivial division) for all floating
point numbers that are not in their exact representation as rational numbers (multiples of the
floating point mantissa) and had to be reimplemented for many of our calculations when using
Grobner bases.
For error analysis, we aim to bound the backwards error δ of our polynomial system. The
backwards error is the smallest distance a rounded input is from solving the set of equations,
given an exact input. 'Input' here is misleading: the errors in the problem arise from the
coefficients of the polynomial system, so the true way to view 'input' is through the dual space
of (finitely many) monomial evaluations.
Assume an exact solution to a single polynomial q in one variable x1 exists and call it z ∈ R.
The finite collection of monomials that make up q can then be evaluated exactly at z; this
vector of evaluations is an element of Rⁿ when q is a polynomial of degree n − 1. We now ask
how far the solution of a perturbed copy of q is from giving the correct solution to the exact
problem. Different polynomials evaluated at z can be thought of as linear functionals
¹SymPy.
acting on this vector of monomial evaluations, represented as vectors of the coefficients of the
polynomials. This reasoning generalises to polynomial systems in multiple variables [Ste04].
Naturally, the distance between two elements of the dual space is the induced norm, which can
be any norm ‖ · ‖∗ as we are in finite dimensions. Let Mn be the vector of moments estimated
from an empirical distribution through Elton’s Theorem and Mx,n be the exact values of the
moments. The backwards error δ ∈ R≥0 is
$$\delta(n) = \|M_{x,n} - M_n\|_{*} = O\!\left(\frac{1}{\sqrt{n}}\right).$$
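This rate is easy to observe on the simplest example. As a sketch (assuming the standard two-map middle-thirds Cantor IFS with equal probabilities; the notebook referred to above is not reproduced here), the exact moments follow from the self-similarity recursion of Chapter 3, while Elton's Theorem lets us estimate the same moments from chaos-game samples:

```python
import random
from math import comb

def cantor_moments(k):
    # Exact moments of the uniform Cantor measure from the self-similarity
    # recursion m_n = sum_i p_i sum_{j<=n} C(n,j) a_i^j m_j b_i^(n-j);
    # the j = n term is moved to the left-hand side and solved for m_n.
    a, b, p = [1/3, 1/3], [0.0, 2/3], [0.5, 0.5]
    m = [1.0]
    for n in range(1, k + 1):
        rhs, coeff = 0.0, 1.0
        for i in range(2):
            coeff -= p[i] * a[i]**n
            for j in range(n):
                rhs += p[i] * comb(n, j) * a[i]**j * m[j] * b[i]**(n - j)
        m.append(rhs / coeff)
    return m

def empirical_moments(k, samples=100_000, seed=0):
    # Elton's Theorem: time averages along the chaos game converge to the
    # moments of the invariant measure, with Monte Carlo error O(1/sqrt(n)).
    rng = random.Random(seed)
    x, sums = 0.0, [0.0] * (k + 1)
    for _ in range(samples):
        x = x / 3 if rng.random() < 0.5 else x / 3 + 2 / 3
        for n in range(k + 1):
            sums[n] += x**n
    return [s / samples for s in sums]

exact = cantor_moments(3)     # [1, 1/2, 3/8, 5/16]
approx = empirical_moments(3)
```

Doubling the sample size shrinks the gap between `exact` and `approx` by roughly a factor of 1/√2, matching the backwards-error rate above.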
The interpretation of this bound is that as the sample size of the empirical distribution increases,
so does the accuracy of the moment values obtained through Elton's Theorem, and the IFS
parameters found in solving the inverse problem gain accuracy at the same rate as the moment
values converge. This can be observed in the Python notebook given. Finally, when aiming to
solve the least squares problem ‖ΦM − M‖²₂ ≈ 0, it is valid to do these computations in floating
point arithmetic, as minor changes in the input polynomial coefficients lead to minor changes
in the value ‖ΦM − M‖²₂. This statement is very different from stating that a minor change
in the coefficients leads to a minor change in a polynomial root. For instance, take the real
polynomial q(x) = x², which has a single (repeated) root at zero. The polynomial q̃(x) = x² + 0.001
has coefficients close to those of q in the dual space, yet q̃ does not even possess a root in R.
With this stated, we aim to solve the following system numerically:
$$\underset{a_i,\, b_i,\, p_i,\, t_{1i},\, t_{2i} \in \mathbb{R}}{\arg\min} \;\; \|\Phi_n M_n - M_n\|_2^2 + \sum_{i=1}^{N} \big((a_i - 1) t_{1i} - 1\big)^2 + \big((a_i + 1) t_{2i} - 1\big)^2$$
$$= \sum_{n=0}^{k} \Bigg( \sum_{i=1}^{N} p_i \sum_{j=0}^{n} \binom{n}{j} a_i^{j} m_j b_i^{\,n-j} - m_n \Bigg)^{2} + \sum_{i=1}^{N} \big((a_i - 1) t_{1i} - 1\big)^2 + \big((a_i + 1) t_{2i} - 1\big)^2.$$
This least squares objective function, which we call Q, is clearly not convex and will be plagued
by several local minima that a standard solver will fall victim to before a global minimum is
obtained. One solution is to use non-convex optimisation techniques such as simulated annealing.
Instead of this, we use a form of preconditioning on our system of equations so that our initial
point of iteration may find a global minimum. Note the partial derivatives of ΦnMn − Mn:
$$\frac{\partial q_n}{\partial p_i} = \sum_{j=0}^{n} \binom{n}{j} a_i^{j} m_j b_i^{\,n-j}, \qquad \frac{\partial q_n}{\partial a_i} = p_i \sum_{j=0}^{n} j \binom{n}{j} a_i^{\,j-1} m_j b_i^{\,n-j}, \qquad \frac{\partial q_n}{\partial b_i} = p_i \sum_{j=0}^{n} \binom{n}{j} (n-j)\, a_i^{j} m_j b_i^{\,n-j-1},$$
which can be computed in an efficient vectorised manner. This is implemented in the notebook
provided to efficiently calculate the Jacobian of the constrained polynomial system that rids us of
the trivial solutions where ai = ±1. Let x0 ∈ R⁵ᴺ be the vector of our IFS parameters with their
constraint variables. To pre-condition the initial starting point, the Fredman-Altman iteration
[Alt80] is used through the update rule:
$$x_{k+1} = x_k - \frac{\|Q(x_k)\|_2^2}{\|J_Q^{T} Q(x_k)\|_2^{2}}\, J_Q^{T} Q(x_k) = x_k - \frac{\|Q(x_k)\|_2^2}{\|J_Q^{T} Q(x_k)\|_2}\; \frac{J_Q^{T} Q(x_k)}{\|J_Q^{T} Q(x_k)\|_2},$$
where JQ is the Jacobian of the constrained polynomial system. Geometrically, the unit vector
of J_Q^T Q(xk) chooses the direction in which to descend to a local optimum. This is not a descent
method: the step length ‖Q(xk)‖²₂/‖J_Q^T Q(xk)‖₂ only decreases to zero when the numerator
approaches zero 'sufficiently fast', so local minima with ‖Q(xk)‖²₂ ≠ 0 are avoided. At such a
local minimum this value becomes large, as the denominator approaches zero, forcing the algorithm
to 'jump out' of the minimum. Convergence to any minimum is slow with this update rule, so
after applying this preconditioning to a point, standard local optimisation methods can be
applied. The polynomial system constructed here can be made arbitrarily large in both its
variables and equations, making it a potentially challenging problem for numerical polynomial
solvers.
Example 4.37. The inversion of the nth level Hilbert matrix is a difficult task for linear nu-
merical solvers. This is precisely a matrix of moments for Lebesgue measure on the unit interval
[JKS11]. We pose a similar numerically difficult task for polynomial systems.
Let Fk = {([0, 1], | · |); {fi}ᵏᵢ₌₁} where fi(x) = x/(i(i+1)) + 1/(i+1). For every k ∈ N the attractor Ak of
this IFS is totally disconnected: the scaling factors sum to less than one, the IFS obeys
the OSC, and so by Moran's Theorem dimH(Ak) < 1 [Fal04]. Let p ∈ Rᵏ be such that pi = 1/(i(i+1))
for i ∈ [k − 1] and pk = 1 − Σᵢ₌₁ᵏ⁻¹ 1/(i(i+1)); then the associated IFS with probabilities Mk will have
an attractive measure that is singular with respect to Lebesgue. Restrict the fractal model fitted
to this measure, through constraint polynomials, to be F̃k = {([0, 1], | · |); {f̃i := ai · + (b − Σⱼ₌₁ⁱ⁻¹ aj)}ᴺᵢ₌₁}
with b, ai ∈ (0, 1). Then the polynomial system formed from this measure will
have a unique solution for each k ∈ N by construction. However, the attractive measure of
limk→∞ Mk is µL¹, and so the polynomial system will have uncountably many solutions. This is
a contrasting example to that given in Example 3.1, where a sequence of measures absolutely
continuous with respect to Lebesgue was given that limited to a µL-singular measure.
Here we have briefly discussed the subtle points of computation in the one-dimensional case.
From now on, our computations are more exploratory in nature, and so these points will not be
made again.
4.4.2 Similitude Example with Dimension Theory
This subsection is a numerical experiment to show how the theory derived explicitly relates to
fitting a fractal model.
Assume that a black and white image of a Steemson triangle is given, and we may measure its
moments from its empirical distribution. Assume that this image is identified as the attractor
of an IFS given by three similitude maps — an element of H(X). Through the relationship
between H(X) and P(X) found earlier in the chapter, we fit an affine IFS with probabilities
¹This limiting dynamical system is an 'infinite IFS', related to Lüroth series.
model Mmodel to our black and white image. In this instance we place the assumption of the
OSC on our solution. Computationally we aim to solve the non-linear system:
‖ΦMmodel,21M21 −M21‖22 = 0,
where ΦMmodel,21 represents a system of 21 polynomial equations in 21 unknown IFS parameters.
Initially this system was constrained so that the model parameters ai, bi, ci, di had magnitude
less than 1/√2, to keep the contractivity of ΦMmodel,21 away from one; removing this constraint,
the computation reached the same result. As per the theory derived, the probabilities of
these maps are chosen such that pi = λi^D, where D is the box-counting/Hausdorff¹ dimension of
the image (Example 2.19), leaving only 18 unknown variables. For an affine map, the
contractivity factors λi are approximated numerically by the square root of the absolute value
of the determinant of the linear component of the affine map. More sophisticated optimisation
techniques, such as a sub-gradient method to handle the non-smooth absolute value function,
could be used, but we do not deem these necessary for our illustrative purposes. In the
Jupyter notebook provided, one also has the option to project from an affine map to a similitude
(in the induced L2 norm) at each iteration of the minimisation. This procedure is what
produced the pictures in Figure 4.3.
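Measuring the moments of a black and white image, as assumed above, amounts to treating its black pixels as samples of a uniform probability measure on [0, 1]². A small sketch (the pixel-centre convention and the row flip are our own assumptions, not fixed by the text):

```python
import numpy as np

def image_moments(img, k):
    """Empirical moments m_{n1,n2} of a binary image, viewed as the uniform
    probability measure on its set of nonzero pixels rescaled to [0,1]^2."""
    ys, xs = np.nonzero(img)
    h, w = img.shape
    x = (xs + 0.5) / w            # pixel centres mapped into [0, 1]
    y = 1.0 - (ys + 0.5) / h      # flip: image rows grow downwards
    return np.array([[np.mean(x**n1 * y**n2) for n2 in range(k + 1)]
                     for n1 in range(k + 1)])

# Sanity check: a fully black image gives (up to pixel discretisation error)
# the moments of Lebesgue measure on the unit square.
M = image_moments(np.ones((100, 100)), 3)
```

For a genuine attractor image, the resulting m_{n1,n2} are the data fed into the 21-equation system above.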
Here are some observations from our numerical experiment.
In making this selection of the probabilities, the first equation in our moment polynomials is
$$1 = \sum_{i=1}^{N} p_i = \sum_{i=1}^{N} \lambda_i^{D},$$
so our choice of the pi implicitly forces any (non-trivial) solution found to have the same
Hausdorff dimension as the object we are approximating, when the maps in the model are
similitudes and the IFS obeys the OSC.
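For similitudes obeying the OSC, the exponent D in this choice of pi is the Moran solution of Σᵢ λᵢᴰ = 1, which is cheap to compute by bisection since the left-hand side is strictly decreasing in D for ratios below one. A sketch (the Sierpinski-triangle ratios are an illustrative assumption):

```python
import math

def moran_dimension(ratios, lo=0.0, hi=2.0, tol=1e-12):
    # Bisect on D -> sum_i ratios_i^D - 1, strictly decreasing for ratios < 1.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sum(r**mid for r in ratios) > 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

ratios = [0.5, 0.5, 0.5]       # three similitudes of ratio 1/2 (Sierpinski)
D = moran_dimension(ratios)    # log 3 / log 2
p = [r**D for r in ratios]     # probabilities giving the 'uniform' measure
```

With these p the first moment equation Σ pi = 1 is satisfied automatically, as noted above.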
Different 'initial guesses' lead to different solutions found by the numerical solver. This is
expected, as we have a non-convex system of equations with various local minima; these local
minima are examined in the next section with the Collage Theorem for Moments.
On the following page, we show an initial guess of how to construct a Steemson triangle and
how minimising the moments reconstructs the fractal to machine accuracy. The interpretation
we give to this observation is a form of 'fractal polish'. It is known that an IFS's attractor is
continuous in its map parameters. Thus, a good initial guess can be made by forming an IFS
model that gives a similar image, and our moment minimisation acts as a refinement method.
For instance, the starting guess taken on the next page was three affine maps with orientations
'near correct', whereby the moment equations may deduce the rest of the fractal image.
Imagine that the maple leaf given through the 'graduate student algorithm' in Figure 3.3 was
used as an initial guess for a black and white photo of a real world maple leaf. Being able to make
a generic model, such as our fractal leaf, and use moments to capture the subtle differences a
collection of real maple leaves possess would be an interesting computational experiment.
¹These coincide for similitudes.
Figure 4.4: A black and white photo of a Steemson Triangle, that we wish to fit a fractal model
to. This photo can either be thought of as a set in H([0, 1]2) or ‘uniform’ measure supported on
a Steemson triangle.
Figure 4.5: Six fractal models given by an affine IFS with probabilities of three maps. Each
has its probabilities selected so that its attractive measure is uniform when the IFS obeys
the OSC. This sequence was produced by progressively minimising ‖Φ21M21 − M21‖²₂.
4.4.3 FIF Example with The Collage Theorem for Moments
Finally, we give an example of the third part of the theory created. That is, finding a self-
similar approximant of an object that does not have an exact self-affine representation through
the method of moment matching. We present the following proposition which is a specialisation
of the theory derived that pertains to FIFs, this gives justification to our numerical experiment.
Proposition 4.38. Let ΦM be given as in Theorem 3.53, where we restrict the underlying IFS
to be of the form given in Example 2.11. Further assume that the probabilities pi are chosen to
equal ai := xi − xi−1 for i ∈ [N]. Then:
1. there exists g such that the contractivity λ > 0 of Φ : (¹ℓ∞, ‖ · ‖g) → (¹ℓ∞, ‖ · ‖g) obeys
λ ≤ maxi∈[N] |ai||di| + ε = maxi∈[N] |xi − xi−1||di| + ε for any ε > 0;
2. µA(Projx(B)) = µL(Projx(B)) for all B ∈ B([0, 1]²), where µL is Lebesgue measure;
3. the moment vector for µA obeys Mx,y ∈ (c0, ‖ · ‖∞) and decays at a rate of O(1/√n);
4. the constants given in Corollary 4.34 obey C(λ) → 1 as N → ∞ and ε(n) → 0 as n → ∞ when
the interpolation points are chosen such that
$$\lim_{N \to \infty} \bigcup_{i \in [N]} \{x_i\} = [0, 1].$$
Proof. From Theorem 3.53 we have that, for a fixed g : R² → R² given through g(x, y) = (αx, βy),
we can bound ‖Φ‖g by
$$\big\|\Phi_{g \circ M \circ g^{-1}}\big\|_\infty \le \max_{i \in [N]} \sup_{n_1, n_2 \in \mathbb{N}} \big(|a_i| + \alpha |e_i|\big)^{n_1} \Big(\frac{\beta}{\alpha}|c_i| + |d_i| + \beta |f_i|\Big)^{n_2} < 1.$$
For any ε > 0 we may select α, β > 0 such that 1 > ε > maxi∈[N] max{α|ei|, (β/α)|ci|, β|fi|}. Then
‖Φ_{g∘M∘g⁻¹}‖∞ ≤ maxi∈[N] |ai||di| + O(ε) =: maxi∈[N] |ai||di| + ε, granting the first claim.
To show the second claim, it is sufficient to consider sets that form the topological sub-base of
[0, 1]² generating its Borel sets. These sets have the form B = [b1, b2] × [b3, b4] for bi ∈ [0, 1].
Define Projx(B) := [b1, b2] × [0, 1] =: Bx; sets of the form Bx then generate a sub-sigma-algebra
of B([0, 1]²). We see through a calculation:
$$\mu_A(B_x) = \sum_{i=1}^{N} p_i\, \mu_A\big(f_i^{-1}(B_x)\big) = \sum_{i=1}^{N} p_i \int \mathbf{1}_{f_i^{-1}(B_x)}\, d\mu_A = \sum_{i=1}^{N} p_i \int \mathbf{1}_{f_{x_i}^{-1}([b_1, b_2]) \times [0,1]}\, d\mu_A = \sum_{i=1}^{N} p_i\, \mu_A\big(f_{x_i}^{-1}(B_x)\big) =: \mu_{A_x}(B_x),$$
where we define fxi : R → R by fxi(x) := (xi − xi−1)x + xi−1, and above have used that
f_i⁻¹(Bx) ∩ [0, 1]² = (f_xi⁻¹([b1, b2]) × [0, 1]) ∩ [0, 1]², together with µA being supported on
A ⊂ [0, 1]². Now we have the projected IFS Fx = {([0, 1] × {[0, 1]}, dx); fx1, . . . , fxN}, where
dx(x × {[0, 1]}, y × {[0, 1]}) := |x − y|; this makes a compact metric space. The attractor of this
IFS is Ax = [0, 1] × {[0, 1]}, and when equipped with the probabilities pi = xi − xi−1 the invariant
measure obeys µAx(Bx) = b2 − b1 = µL(Bx) = µA(Bx). This gives the second claim.
Singleton sets are of µA-measure zero when the underlying IFS consists of more than one map,
and this IFS has only finitely many points outside the open ‖ · ‖∞ unit ball, granting the following
bound:
$$\int |x^{n_1} y^{n_2}|\, d\mu_A(x, y) = \int_{|x|, |y| < 1} |x^{n_1} y^{n_2}|\, d\mu_A(x, y) \xrightarrow{\;n_1 + n_2 \to \infty\;} 0.$$
To determine the rate of this moment convergence, recall from claim two that µA projected onto
the x-axis is Lebesgue measure supported on the unit interval, yielding
$$\int x^{n_1} y^{n_2}\, d\mu_A(x, y) \le \int x^{n_1}\, d\mu_A(x, y) = \int_0^1 x^{n_1}\, dx = \frac{1}{n_1 + 1} = (M_{x,y})_{\tau(n_1, 0)} = (M_{x,y})_{n_1(n_1+1)/2},$$
which through a change of variable gives the third claim. Note that in the case n1 = bn1ct we have
equality, so a better bound may not be achieved for the decay of the whole sequence.
The fourth claim is an immediate consequence of the first and third claims.
The interpretation of the fourth part of this proposition is: the bound in Corollary 4.34 improves
when more interpolation points are used in our FIF interpolant, and when these points
are chosen in a 'dense' fashion the contractivity factor of ΦM approaches zero. Furthermore,
the bound improves when more moment equations are considered in the minimisation.
In our numerical test, we are interested in minimising, in the variables xr, yr ∈ [0, 1] and
dr ∈ (0, 1), the function ‖ΦM,nMn − Mn‖²₂, equal to
$$\sum_{\tau(n_1, n_2) \le n} \Bigg( m_{n_1, n_2} - \sum_{I} \frac{n_1!\, n_2!}{i_1!\, k_1!\, i_2!\, j!\, k_2!}\, (x_r - x_{r-1})^{i_1 + 1}\, (y_r - y_{r-1} - d_r y_N)^{i_2}\, d_r^{\,j}\, x_{r-1}^{k_1}\, y_{r-1}^{k_2}\, m_{i_1 + i_2,\, j} \Bigg)^{2},$$
where I = {(i₁, k₁, i₂, j, k₂, r) : i₁ + k₁ = n₁, i₂ + j + k₂ = n₂, r ∈ [N]}, the selection
pr = |xr − xr−1| has been made, and only a local minimum of the function is needed. If we take
the parameterisation x0 = 0 and xN = 1 with xi = xi−1 + δi−1 for δi ∈ (0, 1), our model is flexible
enough to approximate another FIF made from N maps exactly; that is, we have a global minimum
of ‖ΦM,nMn − Mn‖²₂ = 0. This may prove difficult in computation as δr → 1 for some r ∈ [N].
Lemma 4.39. Let an IFS F of N maps be given that obeys the open set condition. Then there
exists another IFS F̃ of Ñ ≥ N maps that obeys the OSC, such that Ñ mod (N − 1) = 1 and
F̃(A) = F(A) = A for a unique A ∈ H(X).
This result is immediate, but grants the following corollary.
Corollary 4.40. Let an IFS F be of the form given in Example 2.11, whose attractor is a
fractal interpolation function, and let 0 < δ < 1. Then there exists an IFS F̃ of Ñ maps, related
to F as in the preceding lemma, such that the attractor of F̃ is the same as that of F and the
interpolation points obey |xi−1 − xi| < δ for i ∈ [Ñ].
Given a FIF constructed from an IFS with N maps, there is another IFS of Ñ maps with the
same attractor such that the interpolation points used to construct it are spaced no more than
δ apart in the x direction. In particular, when modelling a FIF with another FIF we may
constrain the parameterisation above to consider only δi < δ and still find an exact representation
if we select Ñ sufficiently large. Therefore, we could select a grid of points {xi}ᴺᵢ₌₀
such that xi+1 − xi < δ, where δ bounds the contractivity λ < δ of ΦM, so that we both control
the bound in the Collage Theorem for moments and have a model flexible enough to find an exact
solution. Our truncated FIF example from Chapter 2 does not have an exact representation, and
therefore we must approximate.
Our computational example has the following design. The first 36 moments of the empirical
distribution given by the measured data were calculated; selecting 36 moments grants ε(n) < 1/√36
from Proposition 4.38. A FIF model parametrised by the 'interpolation points' {xr, yr}ᴺᵣ₌₀,
where we constrain x0 = 0 and xN = 1, was made. An initial approximation is found through the
FIC method presented in Chapter 2, with N = 6 and the di chosen to match the fractal dimension
of the data given; see Figure 4.6. This N was chosen to give the same number of free parameters
as the example given with 16 maps in Chapter 2. Let the parameters used to generate this model
be denoted {xFICi, yFICi, dFICi}ᴺᵢ₌₁. We then apply 'fractal polish'; see Figure 4.7. A regularisation
term, Σᵢ₌₁ᴺ ‖xi − xFICi‖² + ‖yi − yFICi‖² + ‖di − dFICi‖², was added in this minimisation
to ensure that no gap |xi−1 − xi| becomes too large¹, recalling Proposition 4.38; the
selection pi = ai = xi − xi−1 is made. If the data were 'uniformly distributed' on
the fractal curve, we could exploit a simultaneous dimension/moment matching technique: a
generalisation of Theorem 4.29 to triangular IFSs is to select the probabilities pi = |ai|^(D−1)|di|
to ensure the attractive measure is 'uniform' on the IFS attractor, as one may conjecture
given Theorem 2.20 [BRS16].
Example 4.41. Recall the discretised one-dimensional image in [0, 1]² given in Example 2.24,
represented by the vector e1 = (1, 0, . . . , 0) ∈ R¹⁰⁰⁰. This image may be considered as the graph
of a function or as a self-similar measure on the unit interval (a discretised point mass at zero).
Taking the function perspective: in the Lᵖ norm the approximation e2 = (0, 1, 0, . . . , 0) was 'bad',
as ‖e1 − e2‖ᵖₚ = 2, although it visually looks like e1. If we instead think of the images as the
measures e1 = δ₀ and e2 = δ_{1/1000}, then dP(e1, e2) ≤ 1/1000, making this a 'good' approximation
visually, in the moment sense, and in the Hausdorff sense.
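The two readings of this example can be checked numerically; a minimal sketch (computing the Lᵖ gap exactly, and the trivial transport bound on the Prokhorov distance):

```python
import numpy as np

# e1 and e2 as functions: indicator vectors that never overlap, so the p-th
# power of every L^p distance is maximal regardless of how close the spikes sit.
e1, e2 = np.zeros(1000), np.zeros(1000)
e1[0], e2[1] = 1.0, 1.0
lp_gap_pth_power = np.sum(np.abs(e1 - e2) ** 2)   # ||e1 - e2||_p^p = 2 for all p

# e1 and e2 as measures: point masses at 0 and 1/1000; moving all mass a
# distance of 1/1000 matches them, bounding the Prokhorov distance.
prokhorov_bound = abs(0.0 - 1.0 / 1000.0)
```

The function view reports a maximal error, while the measure view reports an error at the resolution of a single pixel, which is the behaviour the moment method inherits.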
The assumption of the function being continuous, or even of the object we are approximating
being connected, is superfluous to the moment technique, as witnessed in our Steemson
triangle example. Thus, this method can be thought of as an extension of that shown in [BV15]
to discontinuous models. To gain a method that can find a truncated FIF exactly, we must
progress from our simple example to local fractal models.
¹We thank Markus Hegland for the suggestion to use regularisation.
[Plot: Approximant Dimension = 1.538, Hausdorff Distance = 0.223. Series: Data, Approximant.]
Figure 4.6: A one-dimensional image (blue) and a FIF approximation made with the techniques
given in Chapter 2 that matches the fractal dimension of the data and is close in the L2 sense.
[Plot: Approximant Dimension = 1.52, Hausdorff Distance = 0.08. Series: Data, Approximant, Interpolation Points.]
Figure 4.7: A moment approximation with the methodology described in the previous section.
The interpolation points need not lie on the function given. In adding the moment quantities, the
approximant becomes closer to the given function, in the Hausdorff sense, than the approximant
above, at the expense of the dimensions not matching exactly.
Chapter 5
Loosening Self-Similarity
‘Fractals Everywhere.’ – Michael Barnsley.
It is naive to assume that an object is globally self-similar. Fractal modelling is based on a very
simple premise: in many objects, such as images of nature, parts of the object are similar to other
parts. Being able to model these relationships leads to an efficient description, and therefore
storage, of the object. For instance, within a picture of a tree, one single leaf on the tree will
look very similar to another leaf. However, it is untrue that the whole image of the tree looks
like a leaf. Therefore, we must model real world objects locally, not globally as we have in our
simplified examples.
In this chapter we give a brief introduction to how one moves from the global models presented
thus far to local ones. Once this is done, we show that one can calculate the moments of a
self-affine local fractal as a generalisation of the tools we have created so far. Finally, a
commentary is made on how modern image compression differs from the theory built up
in this thesis and how the work done here fits into the overall story of fractal approximation.
In this commentary, only the ideas are sketched, along with how they are fundamentally different
from the approach explored here.
5.1 Graph-Directed IFS (GD-IFS)
Recall that when we had an IFS that obeyed the OSC, then equipped this IFS with probabilities,
the dynamical system one could create was isomorphic to a Bernoulli shift. A natural extension
of a Bernoulli shift is a Markov shift, whereby place dependent probabilities are put on the
construction of our measure. This is precisely the type of construction we move to here.
5.1.1 Construction
From Chapter 3 we know that when (P(X), dP) is a compact metric space, this implies (X, d)
is also a compact metric space. In this chapter a contractive operator, called an IFS of prob-
abilities, M : (P(X), dP) → (P(X), dP) was built. Now we aim to build a vectorised version of
this construction. A summary of this construction is that we will build an IFS that acts on the
collection {(Xi, di)}Ni=1 of compact metric spaces. We will firstly build a collection of operators
Mi,j : P(Xi) → P(Xj) in a near identical fashion to Chapter 3. Finally, let P =⊗V
i=1 P(Xi),
then an operator (and IFS) are built acting on M : P → P; also a compact metric space as
the product of compact metric spaces. The analogous fixed point of such a space is a vector of
normalised Borel measures.
Select Xi = [0, 1]² for all i ∈ [V]. We keep the general notation, as the subscripts aid in
understanding where the functions map between. For this section, we take hi,j,k to be a function
hi,j,k : Xi → Xj for k ∈ [Ni,j], where Ni,j is the number of functions acting from Xi to Xj.
Let (Xi, di) and (Xj, dj) be compact metric spaces. Then define the operator Mi,j : P(Xi) → P(Xj) through
$$(\mathcal{M}_{i,j}(\mu))(S) := \sum_{k=1}^{N_{i,j}} p_{i,j,k}\, \mu\big(h_{i,j,k}^{-1}(S)\big) \qquad (5.1)$$
for an IFS with probabilities¹ Mi,j = {(Xi, di) → (Xj, dj); hi,j,1, . . . , hi,j,Ni,j; pi,j,1, . . . , pi,j,Ni,j}.
Now define the operator M : P → P component-wise through
$$(\mathcal{M}(\mu))_j(S_j) := \sum_{i=1}^{V} P_i\, (\mathcal{M}_{i,j}(\mu_i))(S_j) = \sum_{i=1}^{V} P_i \sum_{k=1}^{N_{i,j}} p_{i,j,k}\, \mu_i\big(h_{i,j,k}^{-1}(S_j)\big),$$
where Pi ∈ (0, 1) are normalising constants such that Σⱽᵢ₌₁ Pi = 1, and Sj is the jth component
projection of the Borel set S ∈ B(P) given by the product topology. There is a natural metric
given by a vectorised form of dP:
$$d(\mu, \nu) = \sup\left\{ \sum_{i=1}^{V} \int f_i\, d(\mu_i - \nu_i) \;:\; f_i \in \mathrm{Lip}_1(X_i) \right\},$$
under which, by reasoning identical to that of Chapter 3, M is contractive when each of the
functions hi,j,k : Xi → Xj is. This assumption is stronger than required, as alluded to by the
point made on the chaos game in Lemma 3.22: one only requires that the operator above has a
unique invariant measure, with the maps being 'contractive on average' [Elt87].
¹This is slightly incorrect, as we have defined an IFS with probabilities to act from a space to itself. We
allow this minor abuse of correctness for pedagogical reasons.
Example 5.1. The following example is a generalisation of the Generalised Sierpinski Triangles
given in [SW18]; in particular, it is a combination of a Steemson and a Pedal triangle.
Let A1 = A2 be the triangle contained in the unit square with one side being the line segment
from zero to one, and the two other side lengths given by the values a = 1.1 and b = 0.85 in
the figure below, with Cx, Cy defined in Example 2.10. Note that the triangles Ai,j and Ai are
similar for i ∈ [2] (here V = 2) and j ∈ [3]. The similarities given on the left may be used to create
a Steemson triangle; those on the right make a Pedal triangle.
[Diagram: the triangle Ai ⊂ Xi = [0, 1]², with side lengths b, a, 1 and apex (Cx, Cy), subdivided into the similar triangles Ai,1, Ai,2, Ai,3; shown for i = 1 (Steemson) and i = 2 (Pedal).]
Figure 5.1: First Order Steemson and Pedal fractal triangles with a = 1.1 and b = 0.85
Let two copies of the unit square in R² be given and equip these with the Euclidean metric,
making compact metric spaces to house our figures; call these X1 and X2. We now describe
visually the similitude maps that create a GD-IFS constructed from the two generalised
Sierpinski triangles. Referring to the figure above, let hk,k,l : Xk → Xk map Ak through a
similitude to Ak,l+1 for l ∈ {1, 2}. Furthermore, let hi,j,1 : Xi → Xj for i ≠ j and i, j ∈ {1, 2},
where hi,j,1 maps Ai to Aj,1; this is possible as Ai and Aj,1 are similar triangles. These
relations can be represented by the following directed graph.
[Directed graph: vertices X1, X2; self-loops h1,1,1, h1,1,2 at X1 and h2,2,1, h2,2,2 at X2; edges h1,2,1 : X1 → X2 and h2,1,1 : X2 → X1.]
We may equip these maps with any set of probabilities pi,j,k and normalising probabilities Pi
to form the operator M. Using Elton's Theorem we may generate an image of the attractor of
this GD-IFS; as alluded to in Lemma 3.22, this theorem holds rather generally, even in the
case of place dependent probabilities in the 'Markov system' we have here.
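The chaos game for a GD-IFS is exactly such a Markov system: a random walk on the directed graph that applies the map attached to each traversed edge and bins each point by the vertex it arrives at. A one-dimensional sketch with hypothetical affine edge maps (not the triangle similitudes above), purely to show the mechanics:

```python
import random

# edges[v] lists the outgoing edges of vertex v as (map h : X_v -> X_w, w).
edges = {1: [(lambda x: x / 3, 1), (lambda x: x / 2, 2)],
         2: [(lambda x: x / 3 + 2 / 3, 1), (lambda x: x / 2 + 1 / 2, 2)]}

def gd_chaos_game(edges, steps=20_000, seed=1):
    rng = random.Random(seed)
    v, x = 1, 0.0
    orbits = {w: [] for w in edges}   # one point cloud per vertex space
    for _ in range(steps):
        h, v = rng.choice(edges[v])   # traverse a random outgoing edge
        x = h(x)                      # the point now lives in X_v
        orbits[v].append(x)
    return orbits

orbits = gd_chaos_game(edges)         # approximates the two attractors
```

For this symmetric example, uniform edge choices sample the invariant vector of measures with P1 = P2 = 1/2; plotting each orbit gives pictures analogous to Figure 5.2.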
Figure 5.2: The two attractors of the GD-IFS given above, coloured in accordance with the
edges of the directed graph. Notice that neither attractor is globally self-similar.
5.1.2 Moment Calculation for GD-IFS
The point of this section is to show how the simplified examples shown in this thesis relate to
image compression. In particular, the machinery in this thesis is readily generalised to local
fractal models which are more abundant in practice [Jac92]. Let µ ∈ P := ⊗Vi=1P(Xi) where
Xi is a compact subset of R2 and Mj := (1,∫xdµj(x, y),
∫ydµj(x, y), . . . ). We then have the
recurrence relationship through recalling the notation (Notation 3.45) used in Chapter 3:
\[
(M_j)_{\tau(n,m)} = \int x^n y^m \, d(M(\mu))_j(x, y) = \sum_{i=1}^{V} P_i \sum_{k=1}^{N_{i,j}} p_{i,j,k} \int (x^n y^m) \circ h_{i,j,k} \, d\mu_i(x, y),
\]
which gives the vector equation, with the notation used in Chapter 3:
\[
M_j = \sum_{i=1}^{V} P_i \sum_{k=1}^{N_{i,j}} p_{i,j,k} \, \phi_{h_{i,j,k}} M_i.
\]
When the above operator is truncated to the $n$th level, $n \in \mathbb{N}_0$, we can construct the following block matrix form of a perfectly determined system of equations:
\[
M = \begin{pmatrix} M_1 \\ \vdots \\ M_V \end{pmatrix}
= \begin{pmatrix}
P_1 \sum_{k=1}^{N_{1,1}} p_{1,1,k} \phi_{h_{1,1,k}} & \cdots & P_V \sum_{k=1}^{N_{V,1}} p_{V,1,k} \phi_{h_{V,1,k}} \\
\vdots & \ddots & \vdots \\
P_1 \sum_{k=1}^{N_{1,V}} p_{1,V,k} \phi_{h_{1,V,k}} & \cdots & P_V \sum_{k=1}^{N_{V,V}} p_{V,V,k} \phi_{h_{V,V,k}}
\end{pmatrix}
\begin{pmatrix} M_1 \\ \vdots \\ M_V \end{pmatrix}. \tag{5.2}
\]
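The triangular structure behind systems of this kind is easiest to see in one dimension. The following Python sketch is a simplified single-vertex analogue, not the full GD-IFS machinery: for an affine IFS $h_k(x) = a_k x + e_k$ with probabilities $p_k$, pushing $x^n$ through $h_k$ and expanding by the binomial theorem (the role played by the $\phi_h$ operators above) yields a triangular linear system that can be solved level by level.

```python
from math import comb

def ifs_moments(maps, probs, n_max):
    """Moments M_0..M_n_max of the invariant measure of a 1-D affine IFS.

    maps:  list of (a_k, e_k) with |a_k| < 1, for h_k(x) = a_k x + e_k.
    probs: probabilities p_k summing to 1.
    Uses M_n = sum_k p_k sum_{m<=n} C(n,m) a_k^m e_k^(n-m) M_m, solved
    for M_n since the m = n coefficient sum_k p_k a_k^n is below 1.
    """
    M = [1.0]  # M_0 = 1 for a probability measure
    for n in range(1, n_max + 1):
        lower = sum(p * sum(comb(n, m) * a**m * e**(n - m) * M[m]
                            for m in range(n))
                    for (a, e), p in zip(maps, probs))
        diag = sum(p * a**n for (a, e), p in zip(maps, probs))
        M.append(lower / (1.0 - diag))
    return M

# Uniform middle-thirds Cantor measure: h_1(x) = x/3, h_2(x) = x/3 + 2/3.
# This gives M_1 = 1/2 and M_2 = 3/8.
moments = ifs_moments([(1 / 3, 0.0), (1 / 3, 2 / 3)], [0.5, 0.5], 4)
```

The full system (5.2) replaces each scalar coefficient by a block acting across vertices, but the level-by-level solvability is the same phenomenon.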
This is completely reminiscent of the construction in Chapter 3. Naturally we aim to make use of the machinery created there: a contractive operator whose fixed point is the vector of moment sequences. We use the notation built in Chapter 3.
Let ${}_1\tilde{\ell}^{\infty} = \bigotimes_{i=1}^{V} {}_1\ell^{\infty}$, then define the operator $\Phi : {}_1\tilde{\ell}^{\infty} \to {}_1\tilde{\ell}^{\infty}$ through
\[
(\Phi(x))_j = \Phi_j(x_j) := \sum_{i=1}^{V} P_i \sum_{k=1}^{N_{i,j}} p_{i,j,k} \, \phi_{h_{i,j,k}} x_i,
\]
where $x = (x_1, \ldots, x_V)$ with $x_i \in {}_1\ell^{\infty}$.
Theorem 5.2. Let a graph-directed IFS $M : P \to P$ be given such that each IFS $M_{i,j} : X_i \subset \mathbb{R}^2 \to X_j \subset \mathbb{R}^2$ has functions $h_{i,j,k}$, for $k \in [N_{i,j}]$, of the form
\[
h_{i,j,k} \begin{pmatrix} x \\ y \end{pmatrix}
= \begin{pmatrix} a_{i,j,k} & 0 \\ c_{i,j,k} & d_{i,j,k} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}
+ \begin{pmatrix} e_{i,j,k} \\ f_{i,j,k} \end{pmatrix},
\qquad |a_{i,j,k}|, |d_{i,j,k}| < 1.
\]
Define the norm $\| \cdot \|_g$ on ${}_1\tilde{\ell}^{\infty}$ through $\|x\|_g = \sum_{j=1}^{V} \|x_j\|_{g_j}$, where $\| \cdot \|_{g_j}$ is the norm defined in Proposition 3.52. Then there exist $g_i$ such that $\Phi : ({}_1\tilde{\ell}^{\infty}, \| \cdot \|_g) \to ({}_1\tilde{\ell}^{\infty}, \| \cdot \|_g)$ is a contraction.
Proof. Recall that the working shown in Theorem 3.53 allows the selection of functions $g_i$ mapping $(x, y) \mapsto (\alpha_i x, \beta_i y)$ such that $\|\Phi_{g_j \circ M_{i,j} \circ g_j^{-1}}\|_\infty < 1$ for all $i, j \in [V]$. Then, through near-identical working to Theorem 3.53, we have
\[
\|\Phi(x)\|_g = \sum_{j=1}^{V} \|\Phi_j(x_j)\|_{g_j}
\le \sum_{j=1}^{V} \|\Phi_{g_j \circ M_{i,j} \circ g_j^{-1}}\|_\infty \|x_j\|_{g_j}
\le \max_{j \in [V]} \|\Phi_{g_j \circ M_{i,j} \circ g_j^{-1}}\|_\infty \, \|x\|_g
< \|x\|_g,
\]
completing the claim.
Corollary 5.3. Let $M$ be of the form assumed above and let $M^*$ be the vector of moments of the invariant measure $\mu \in P$, lying in $({}_1\tilde{\ell}^{\infty}, \| \cdot \|_g)$. Let $M \in ({}_1\tilde{\ell}^{\infty}, \| \cdot \|_g)$ and let $M_n$ be its $n$th level truncation. Then
\[
\|M^* - M\|_g \le C(\lambda) \, \|\Phi_{M,n} M_n - M_n\|_2 + \varepsilon(n)
\]
for some constants $C(\lambda), \varepsilon(n) > 0$ that depend on the truncation level $n$ and the contractivity $\lambda$, where $\varepsilon(n)$ can be made small by increasing $n$.
Remark 5.4. Given a one-dimensional image, the theorem above is general enough to use moments in the fractal image compression used in practice [Jac92], namely graph-directed FIF functions, which are explained in detail in [DO17]. Using the graded lexicographical ordering $x \prec_{\mathrm{lex}} y \prec_{\mathrm{lex}} z$, the results in this thesis could be lifted to the three-dimensional case for use in current image compression. Finally, Corollary 5.3 justifies finding a local minimum of the moment equations to approximate, with moments, a real-world image that is not locally self-similar.
Using identical working to the above, combined with Theorem 3.55, we may iteratively calculate the moments of our GD-IFS generalised Sierpinski triangles.
Example 5.5. Let the graph-directed IFS be given as in Example 5.1. For this calculation, we have made the selection $P_i \cdot p_{i,j,k} = \lambda_{i,j,k}^{D_j}$, where $D_j$ is the root of $\lambda_{j,j,1}^{D_j} + \lambda_{j,j,2}^{D_j} + \lambda_{i,j,1}^{D_j} = 1$ for $i, j \in [2]$. The first six moments for these attractors are
\[
(1, 0.4265, 0.2646, 0.241, 0.0979, 0.118), \quad (1, 0.4327, 0.2705, 0.2446, 0.1022, 0.1208),
\]
which agree with the result computed here through Elton's Theorem.
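The exponents $D_j$ in Example 5.5 solve a Moran-type equation, and since $D \mapsto \sum_k \lambda_k^D$ is strictly decreasing for ratios in $(0,1)$, a simple bisection finds the root. The Python sketch below uses illustrative contraction ratios (three half-scale maps), not the actual Steemson and Pedal ratios of the example.

```python
def moran_root(ratios, lo=0.0, hi=10.0, tol=1e-12):
    """Solve sum(r**D for r in ratios) = 1 for D by bisection.

    Valid for ratios in (0, 1): the left side decreases monotonically
    from len(ratios) at D = 0 towards 0, so the root is unique.
    """
    f = lambda D: sum(r**D for r in ratios) - 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:      # sum still too large: need a larger exponent
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Three maps of ratio 1/2: the root is log 3 / log 2 (the Sierpinski
# triangle's similarity dimension).
D = moran_root([0.5, 0.5, 0.5])
# The products P_i * p_{i,j,k} in Example 5.5 are then ratio**D,
# which sum to 1 by construction of D.
probs = [0.5**D, 0.5**D, 0.5**D]
```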
5.1.3 Application of Moments to Fractal Image Compression
This thesis has begun the study of a tool for fractal approximation using moments. As alluded to in our graph-directed example, the structure created in Chapter 3 generalises to the IFSs used in practice. As we saw in Chapter 4, the bound in the Collage Theorem for moments gives an approximation theory for triangular IFSs, which in this chapter was shown to generalise to local models. In Chapter 2 we saw how $L^p$ spaces were used as tools to approximate elements of $\mathbb{H}(X)$, and the methods in this space were sufficient to model photographs. For example, the methodology presented in Chapter 2 with a local fractal model produces the following approximant to a picture of a hot air balloon.
Figure 5.3: Original photo (left), FIC compressed photo (right) [B+96].
This is ‘fractal enhancement’. When data lies on a straight line, it is not ridiculous to interpolate it linearly; here we have self-referential data and a (locally) self-referential interpolation enhancing the image. In allowing more variability in the fractal model of a photograph, as our moment approximations aim to do, one better models the data of the image, which leads to effective prediction from a resolution-independent model as shown above.
The use of moments to approximate the Hausdorff distance to create such models has been done heuristically in [Wel99]. In that work, the design of a neural network to automate finding self-similarities in an image is given. The key heuristic observation is that the current FIC algorithm is too restrictive in searching for self-similarity in images. Concisely, current FIC algorithms are constrained so that the costly optimisation being performed is discrete, and they further assume that an image is sufficiently rich in self-similarity that fixing some map parameters will still yield a good approximant. Our moment method lets the parameters live in a continuum, granting more flexibility as seen in Proposition 4.38. Investigating whether the moment theory derived in this thesis could be used to make a less restrained fractal image compression would be an interesting area of application.
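To illustrate the discrete search criticised above, here is a toy one-dimensional Python sketch of the block-matching step in standard FIC in the spirit of [Jac92]: for each range block, only a contrast $s$ and brightness $o$ are fitted per candidate domain block, the map geometry being fixed. Real implementations work on 2-D blocks with downsampling, quantise $(s, o)$, and clamp $|s|$ for contractivity; none of that is shown here.

```python
def best_domain(range_block, domain_blocks):
    """Least-squares fit range ~ s * domain + o for each candidate domain.

    Returns (error, index, s, o) for the best-matching domain block;
    this per-pair fit is the only continuous optimisation in standard
    FIC, the rest of the search being a discrete scan over blocks.
    """
    n = len(range_block)
    mean_r = sum(range_block) / n
    best = None
    for idx, d in enumerate(domain_blocks):
        mean_d = sum(d) / n
        var_d = sum((x - mean_d) ** 2 for x in d)
        s = 0.0 if var_d == 0 else sum(
            (x - mean_d) * (y - mean_r)
            for x, y in zip(d, range_block)) / var_d
        o = mean_r - s * mean_d
        err = sum((s * x + o - y) ** 2 for x, y in zip(d, range_block))
        if best is None or err < best[0]:
            best = (err, idx, s, o)
    return best
```

A moment-based method would instead optimise over the continuum of map parameters directly, rather than scanning a finite list of candidate blocks.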
5.2 A Commentary on Some Modern Approaches
Fractal image compression is a modelling tool, not an effective means to describe an object. The
moment method presented in this thesis was on the side of assuming very little when modelling
a self-similar object. Given a Cantor set, the moment method could deduce all the IFSs that
generate it. This ability is accompanied by the drawback of increased difficulty in computation
as witnessed in Chapter 4. In Chapter 2, we saw in our least squares example that placing more
assumptions on an IFS can make the computations easier at the expense of flexibility in the
fractal model. To create fractal models in application, such as the picture shown in Figure 5.3,
these opposing ideas must be balanced.
If one sways to the side taken in this thesis, the area of application is that of ‘super-resolution’. In this area, data is given at measured finite resolutions and one wants to make deductions at a higher resolution; this is a computationally expensive interpolation problem. In contrast, if computability is favoured, then stronger assumptions on the IFS maps are made. For many modern image compression techniques, this is the case.
A common idea used in compression is to find a ‘good basis’ in which to describe data, one in which analysis of complicated objects, like images, can actually be performed. One such example is the use of wavelets and a multiresolution basis. The intuition for this basis is that, when approximating functions, one should first make a crude approximation that agrees with the function on a large scale and then include additional terms, capturing the finer details, to improve the approximation. Imagine we have a function and its approximation graphed on a wall. How would we measure ‘how good’ the approximation is? The multiresolution approach is to look at the wall, start walking away from it, and stop when the two are indistinguishable to the eye. We then measure how many steps we took from the wall, which we consider to be ‘how far away’ the approximation is from the function. For an explicit example, refer to Figure A.4 and do this experiment. The construction of a wavelet basis from FIFs with fixed map parameters is shown in [DGHM96].
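The ‘walk away from the wall’ intuition can be sketched numerically with the simplest multiresolution system, the Haar basis: project a sampled signal onto successively coarser resolution spaces by averaging blocks, and watch the sup-norm error shrink as finer levels are restored. This is an illustrative stdlib-only sketch, not the FIF-wavelet construction of [DGHM96].

```python
def haar_projection(signal, level):
    """Project onto the Haar resolution space V_level.

    Averages over blocks of size 2**level and holds the average constant,
    i.e. the piecewise-constant approximation at that scale. level = 0 is
    the identity; larger levels are coarser.
    """
    block = 2 ** level
    out = []
    for start in range(0, len(signal), block):
        chunk = signal[start:start + block]
        avg = sum(chunk) / len(chunk)
        out.extend([avg] * len(chunk))
    return out

def sup_error(f, g):
    """Sup-norm distance: 'how many steps from the wall' before they blur."""
    return max(abs(a - b) for a, b in zip(f, g))

signal = [1.0, 3.0, 2.0, 2.0, 5.0, 7.0, 6.0, 4.0]
# Errors at successively finer levels: coarse-to-fine, nonincreasing.
errors = [sup_error(signal, haar_projection(signal, j)) for j in (3, 2, 1, 0)]
```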
In [MS15] the authors fix IFS map parameters for a superIFS, an IFS on $(\mathbb{H}(\mathbb{H}(X)), d_{\mathbb{H}})$. To describe an image, a vectorised version of this space, $(\mathbb{H}(\mathbb{H}^V(X)), d_{\mathbb{H}})$, is needed. This space is ‘huge’ and a model here must be greatly constrained for computation; in fact the IFS used is a two-dimensional version of Example 3.8. A compression scheme is given that allows for $V$ different similarities (see Figure A.5) in their ‘basis’, but, like wavelets, this is an effective way to describe an image, not model it. Compression is achieved by using IFS theory (specifically codespace) to describe the image efficiently. Again, resolution independence is lost here, as we are fundamentally describing the self-similarities in the image, not using the self-similarities to model how the image relates to itself.
Bibliography
[ABVW10] Ross Atkins, Michael F Barnsley, Andrew Vince, and David C Wilson. A char-
acterization of hyperbolic affine iterated function systems. Topology Proceedings,
36:189–211, 2010.
[Alt80] Mieczysław Altman. Iterative methods of contractor directions. Nonlinear Analysis:
Theory, Methods & Applications, 4(4):761–771, 1980.
[B+96] Michael F Barnsley et al. Fractal image compression. Notices of the AMS, 43(6):657–
662, 1996.
[Bar14] Michael F Barnsley. Fractals everywhere. Academic press, 2014.
[BBHV16] Christoph Bandt, Michael Barnsley, Markus Hegland, and Andrew Vince. Old
wine in fractal bottles: Orthogonal expansions on self-referential spaces via fractal
transformations. Chaos, Solitons & Fractals, 91:478–489, 2016.
[BD85] Michael F Barnsley and Stephen Demko. Iterated function systems and the global
construction of fractals. Proceedings of the Royal Society of London. A. Mathemat-
ical and Physical Sciences, 399(1817):243–275, 1985.
[BE88] Michael F Barnsley and John H Elton. A new class of Markov processes for image
encoding. Advances in Applied Probability, 20(1):14–32, 1988.
[BH89] Michael F Barnsley and Andrew N Harrington. The calculus of fractal interpolation
functions. Journal of Approximation Theory, 57(1):14–34, 1989.
[BHR06] Christoph Bandt, Nguyen Hung, and Hui Rao. On the open set condition for self-
similar fractals. Proceedings of the American Mathematical Society, 134(5):1369–
1374, 2006.
[BRS16] Balázs Bárány, Michał Rams, and Károly Simon. On the dimension of self-affine
sets and measures with overlaps. Proceedings of the American Mathematical Society,
144(10):4427–4440, 2016.
[BRS18] Balázs Bárány, Michał Rams, and Károly Simon. Dimension of the repeller for a
piecewise expanding affine map. arXiv preprint arXiv:1803.03788, 2018.
[BRS19] Balázs Bárány, Michał Rams, and Károly Simon. Dimension theory of some non-
Markovian repellers part I: A gentle introduction. arXiv preprint arXiv:1901.04035,
2019.
[Buc76] Bruno Buchberger. A theoretical basis for the reduction of polynomials to canonical
forms. ACM SIGSAM Bulletin, 10(3):19–29, 1976.
[BV15] Michael F Barnsley and P Viswanathan. Discontinuous fractal functions and fractal
histopolation. arXiv preprint arXiv:1503.06903, 2015.
[BV18] Michael F Barnsley and Andrew Vince. Tiling iterated function systems and
Anderson-Putnam theory. arXiv preprint arXiv:1805.00180, 2018.
[Dau92] Ingrid Daubechies. Ten lectures on wavelets, volume 61. SIAM, 1992.
[DGHM96] George C Donovan, Jeffrey S Geronimo, Douglas P Hardin, and Peter R Massopust.
Construction of orthogonal wavelets using fractal interpolation functions. SIAM
Journal on Mathematical Analysis, 27(4):1158–1192, 1996.
[DO17] Ali Deniz and Yunus Ozdemir. Graph-directed fractal interpolation functions. Turk-
ish Journal of Mathematics, 41(4):829–840, 2017.
[Elt87] John H Elton. An ergodic theorem for iterated maps. Ergodic Theory and Dynamical
Systems, 7(4):481–488, 1987.
[EST07] C Escribano, MA Sastre, and E Torrano. A fixed point theorem for moment ma-
trices of self-similar measures. Journal of Computational and Applied Mathematics,
207(2):352–359, 2007.
[Fal04] Kenneth Falconer. Fractal geometry: mathematical foundations and applications.
John Wiley & Sons, 2004.
[Gra18] Alexandra Grant. Mixed tiling systems and anderson putnam theory. Honours
thesis, Australian National University, Acton, Canberra, 2018.
[Hau78] Felix Hausdorff. Grundzüge der Mengenlehre, volume 61. American Mathematical
Soc., 1978.
[HM90] CR Handy and Giorgio Mantica. Inverse problems in fractal construction: moment
method solution. Physica D: Nonlinear Phenomena, 43(1):17–36, 1990.
[Hut79] John E Hutchinson. Fractals and self similarity. University of Melbourne, Department
of Mathematics, 1979.
[IS13] Konstantin Igudesman and Gleb Shabernev. Novel method of fractal approximation.
Lobachevskii Journal of Mathematics, 34(2):125–132, 2013.
[Jac92] Arnaud E Jacquin. Image coding based on a fractal theory of iterated contractive
image transformations. IEEE Transactions on Image Processing, 1(1):18–30, 1992.
[JKS11] Palle ET Jørgensen, Keri A Kornelson, and Karen L Shuman. Iterated function sys-
tems, moments, and transformations of infinite matrices. American Mathematical
Soc., 2011.
[Kol31] Andrei Kolmogoroff. Über die analytischen Methoden in der Wahrscheinlichkeits-
rechnung. Mathematische Annalen, 104(1):415–458, 1931.
[KR15] Antti Käenmäki and Eino Rossi. Weak separation condition, Assouad dimension,
and Furstenberg homogeneity. arXiv preprint arXiv:1506.07851, 2015.
[Man96] Giorgio Mantica. A stable Stieltjes technique for computing orthogonal polynomi-
als and Jacobi matrices associated with a class of singular measures. Constructive
Approximation, 12(4):509–530, 1996.
[Man13] Giorgio Mantica. Direct and inverse computation of Jacobi matrices of infinite iter-
ated function systems. Numerische Mathematik, 125(4):705–731, 2013.
[MMR00] Pertti Mattila, Manuel Moran, and Jose-Manuel Rey. Dimension of a measure.
Studia Math, 142(3):219–233, 2000.
[Mor46] Pat AP Moran. Additive functions of intervals and Hausdorff measure. In Mathe-
matical Proceedings of the Cambridge Philosophical Society, volume 42, pages 15–23.
Cambridge University Press, 1946.
[Mor99] Manuel Moran. Dynamical boundary of a self-similar set. Fundamenta Mathemati-
cae, 160(1):1–14, 1999.
[MS89] Giorgio Mantica and Alan Sloan. Chaotic optimization and the construction of
fractals: solution of an inverse problem. Complex Systems, 3(1), 1989.
[MS15] Franklin Mendivil and Örjan Stenflo. V-variable image compression. Fractals,
23(02):1550007, 2015.
[MT01] H Michael Möller and Ralf Tenberg. Multivariate polynomial system solving using
intersections of eigenspaces. Journal of Symbolic Computation, 32(5):513–531, 2001.
[Sch94] Andreas Schief. Separation properties for self-similar sets. Proceedings of the Amer-
ican Mathematical Society, 122(1):111–115, 1994.
[Ste04] Hans J Stetter. Numerical polynomial algebra, volume 85. SIAM, 2004.
[Ste18] Kyle Steemson. The open set condition and neighbour graphs. Honours thesis,
Australian National University, Acton, Canberra, 2018.
[SW18] Kyle Steemson and Christopher Williams. Generalised Sierpinski triangles. arXiv
preprint arXiv:1803.00411, 2018.
[Vin18] Andrew Vince. Global fractal transformations and global addressing. Journal of
Fractal Geometry, 5(4):387–418, 2018.
[Wal00] Peter Walters. An introduction to ergodic theory, volume 79. Springer Science &
Business Media, 2000.
[Wel99] Stephen T Welstead. Fractal and wavelet image compression techniques. SPIE
Optical Engineering Press Bellingham, Washington, 1999.
[WW08] Weiping Wang and Tianming Wang. Generalized riordan arrays. Discrete Mathe-
matics, 308(24):6466–6500, 2008.
Appendix A
Further Fractal Figures
‘Let me check your figures, last time you made scary photos of Griff.’
– Jane Tan.
‘You need to have our triangles, with their names, somewhere.’
– Kyle Steemson.
Figure A.1: A (periodic) fractal tiling given by a Williams triangle, as named in [Gra18].
Figure A.2: First eight orthogonal polynomials for uniform Cantor measure µC computed by the
method given in [JKS11].
Figure A.3: A FIF of two maps (blue) with a moment approximation by a FIF made from two
maps (orange). Next, a FIF of three maps approximated by a FIF of two maps with moments.
Figure A.4: Visualisation of an image projected to the various resolution spaces for a Haar
wavelet compression, see [Dau92].
Figure A.5: Image storage using a V-variable coding of $[0, 1]^2$ for $V = 4^j$ where $j \in \{0, \ldots, 5\}$; for compression algorithm details see [MS15].