Self-Similar Models and How to Find Them:
A Moment Theory Approach
Christopher James Williams
Supervisor: Michael Barnsley
May 2019
An Honours thesis submitted for the degree of Bachelor of Philosophy (PhB.) – Mathematics
of the Australian National University
Declaration
The work in this thesis is my own except where otherwise stated.
Christopher James Williams
Acknowledgements
Firstly I would like to thank Joan for being an extraordinary convener of the honours program
and accommodating my specialised honours ‘year’.
To the honours/masters students: Alex, Adele, Jack, Yossi, Hugh, Kelly, Adwait, James,
Yiming and Michael I am thankful for the friendships you have all provided in this time period.
I am especially thankful for: Jane’s ubiquitous influence on the tikz design of many (most)
figures, after witnessing my colour-blind palette design; Kyle’s enforcement of detail in many
(most) of my proofs that I had considered complete; Feng and Sam’s willingness to endlessly
talk about the failure of many (most) of our Python programs; and Wenqi’s analytical input in
bounding quantities that in many (most) cases resulted in repeated use of the triangle inequality.
In my third year of study, I was determined to do honours in a more applied area than
mathematics. Doing Pierre’s ‘applied’ math of finance course and a computational project
with Markus that semester showed me how some pretty abstract ideas made their way into
application. Markus has always continued to give me computational advice since that project,
much of which has been used in my examples here. Their influence rid me of much fear of
integrals and computation, both of which are abundant in this thesis.
Michael has always been a force of encouragement, support, and like Kyle, an excellent
supervisor — although, unlike Kyle, his organisational skills may come exclusively from Louisa.
I am very lucky to have Michael and Louisa’s continuing input to both my thesis, general
knowledge of mathematics and the overall guidance they have provided me throughout my
studies for the past three years. All the extra effort Michael and Louisa have given, outside of
writing a thesis, I will always remember.
Lastly I would like to thank my family for their continued support for my higher education.
In particular, my partner Caroline has kept my life together during this period; without her,
my organisational skills would be completely reminiscent of Michael\Louisa.
Abstract
Self-similar modelling aims to capture how an object relates to itself, and can be done
through fractal geometry. Creating a fractal that looks like a natural object is easy;
finding a fractal that models a given self-similar object is hard. The only known success
of this inverse problem comes from fractal image compression (FIC). In this thesis we
develop an alternative solution to the problem. Explicitly, we formulate a fractal moment
approximation theory that provides more flexibility in self-similar modelling than that
present in FIC. This is done through placing a normalised measure on a self-similar set
and using this measure’s moments in its reconstruction. It is proven that the vector of
these moment values is the unique fixed point of a linear operator on 1ℓ∞, the space of
bounded sequences whose first element is 1. A relationship between self-similar sets and
measures is given through tools found in fractal tiling and dimension theory. An
approximation theory is developed that gives a computationally feasible way to approximate
(possibly non-self-similar) objects. These approximations are investigated in both theory
and computation. Through our novel examples, this theory is extended to local fractal
models that are frequent in application, and their relation to image compression is discussed.
Contents
Acknowledgements
Abstract
Notation and Terminology
1 Motivation and Outline
2 The Home H(X) and Introductory Fractal Modelling
  2.1 The Hutchinson Operator
  2.2 Introduction to Fractal Approximations
  2.3 One-Dimensional Fractal Image Compression
3 The Probabilists P(X) and Self-Similar Measures
  3.1 Self-Similar Measures
  3.2 Fractal Moment Theory
  3.3 Computation of Two-Dimensional Affine Moments
4 The Inverse Problem of Moments
  4.1 Algebraic Solution to Polynomial Systems
  4.2 Hausdorff Dimension and The Open Set Condition
  4.3 The Collage Theorem for Moments
  4.4 Computational Experiments
5 Loosening Self-Similarity
  5.1 Graph-Directed IFS (GD-IFS)
  5.2 A Commentary of some Modern Approaches
Bibliography
A Further Fractal Figures
Notation and Terminology
Notation
N0 the natural numbers including zero, {0, 1, 2, . . . }.
N, Q, R, C the natural, rational, real and complex numbers.
[N ] the set of numbers {1, . . . , N} for N ∈ N.
[N ]n, [N ]N the strings of numbers of length n ∈ N from the alphabet [N ], and infinite
strings of numbers from the alphabet [N ].
(X, d) a complete (or sometimes compact) metric space.
B(X) the Borel subsets of the metric space (X, d).
1·(·) the indicator function, 1B : X → {0, 1}, that returns one if x ∈ B ∈ B(X)
and zero otherwise. Through a mild abuse of notation, we also take 1·(x) :
B(X) → {0, 1}, where 1B(x) returns one if x ∈ B, and zero otherwise.
Lp(Rn) the space of Lebesgue measurable functions, f on Rn, such that∫|f |p <∞.
H(X) the space of non-empty compact subsets of a complete metric space (X, d).
P(X) the space of normalised Borel measures on a compact metric space (X, d).
ℓ∞, 1ℓ∞ the space of bounded sequences in the ‖·‖∞ norm, and the space of bounded
sequences whose first term is one.
(M)i,j the (i, j)-th entry of a matrix M .
Terminology
IFS an iterated function system, F = {(X, d); f1, . . . , fN}, see Defini-
tion 2.3.
attractor the unique fixed point of an IFS when viewed as an operator F :
H(X)→ H(X), see Definition 2.5.
attractive measure the unique fixed point of an IFS with probabilities, M : P(X) → P(X), see Definition 3.6.
Chapter 1
Motivation and Outline
Since the inception of calculus, mathematical modelling has been dominated by smooth func-
tions. To their credit, the approximations made by polynomials, splines and cosines have proven
to be impressive tools in many modern mathematical models. But there are disciplines in which
such approximants are notoriously unhelpful — said disciplines often include data which pertains
to nature. A fractalist, in response to this situation, might declare that calculus is a fallacy,
first-year analysis courses amount to brainwashing and that the world is by nature rough,
jagged and complex. They might then, ironically, proceed to launch into a long discussion about
Sierpinski triangles with no explanation as to how these very artificial fractal objects relate to
modelling nature. Straight lines appear as seldom in nature as do smooth functions, and Sier-
pinski triangles even more so. The aim of this thesis therefore is not to find the ‘best’ fractal
approximation of a given object but rather to give insight into some methods of fractal approx-
imation and how one can transition from knowledge of Sierpinski triangles to the creation of
fractal models.
Broadly, a self-similar object is anything that may be described with reference to itself. General
examples are abundant; a dictionary contains words whose definitions are constructed by their
relation to other words, and population growth is described in terms of the current population.
Mathematically, to construct a self-similar model the following three items are generally needed:
1. a compact, or possibly complete, metric space (X, d);
2. a space of ‘models’ M on (X, d);
3. an operator T : M→M constructed of ‘simple maps’.
Examples of the first two items include: (R2, ‖ · ‖2) whose models are non-empty compact sets
to form Sierpinski triangles (Example 2.10); (C([0, 1]), ‖ · ‖∞) to model continuous self-similar
functions (Example 2.11); and probability measures on ([0, 1]2, ‖ · ‖2) that can model leaves
(Figure 3.3).
How to create a self-similar object that resembles natural phenomena such as leaves, mountains
and clouds is well documented. This is the forward problem. But given a self-similar object,
how does one fit a fractal model to it? This is the inverse problem and the main question being
addressed in this thesis.
We break up the study of this inverse problem into three main parts. We first review techniques
of fractal image compression in the most simplified setting possible — approximating a ‘one-
dimensional’ image. Fractal image compression is the only success of this inverse problem
known to date. As such, we then highlight areas where the study of this inverse problem is
non-existent. For instance, the only approach to modelling self-similar objects with fractal models
— namely, fractal image compression — will never yield an exact construction of self-similar
objects like Cantor sets and Sierpinski triangles. The next part of this thesis is the formulation
of an approximation method that can do exactly this task through fractal moment theory.
Finally, we show with simplified computational examples how the ideas from our newly created
approximation of self-similar objects pertain to modelling. The thesis concludes by providing a
commentary of how our simplified examples can be generalised to explicit use in modelling and
by outlining what ideas are present in modern techniques.
The structure of this thesis is as follows.
Chapter 2 introduces the home of fractal geometry H(X), the space of non-empty compact
subsets of a complete metric space (X, d). Three self-similar objects are created here to later
model through the use of iterated function systems (IFSs) — this will be the operator made
from the ‘simple maps’ alluded to above. Traditional fractal image compression (FIC) will be
performed in a simplified setting on the last of the three objects created — a self-similar function.
Since an image can be thought of as the graph of a function, the ‘one-dimensional image’ we will
make will be the graph of the simplified self-similar function I : [0, 1] → [0, 1]. To present FIC,
a relationship between the space of models of sets in H(X) and models of continuous functions
in C(X) is established. This is exactly how knowledge of Sierpinski triangles transitions to
modelling photographs. A computational example is then provided to show how this is done
explicitly. The reader should refer back to the structure of this chapter while continuing through
the remainder of this thesis. Although the structures of this chapter and of the thesis are similar,
the content is subtly different and is in many cases of the writer’s own formulation for a different
method of fractal approximation, unless otherwise referenced.
After presenting FIC, a more flexible modelling tool is desired. This is done in P(X), the space
of normalised Borel measures on a compact metric space X. In Chapter 3, this space is
built and the theory of integration with respect to self-similar measures is defined. The ergodic
theory of a self-similar measure with respect to an IFS is explored for the efficient computation
of a fractal image. Through investigating this fractal generation technique — referred to as the
chaos game — the computation of the measure’s moments is proposed. This computation is
done for three types of IFS: for one-dimensional real affine IFSs, for one-dimensional complex
affine IFSs, and for two-dimensional affine IFSs. The last of these computations is a novel
addition to the literature. These moments can be calculated in an efficient manner through an
iterative process on 1ℓ∞: a closed subset of the Banach space of bounded sequences.
Chapter 4 investigates reconstructing a self-similar measure through its moments. To do this, a
multivariate polynomial system of equations must be solved. In the case of a Cantor measure,
this is done exactly through tools in computational algebraic geometry — specifically Gröbner
bases. Like how a relationship between C(X) and H(X) was created in Chapter 2, a relationship
between P(X) and H(X) is established. This is done through modern tools in fractal tiling theory
and Hausdorff dimension to show the core principles of the computation in a simplified setting.
This relationship is then used to allow moments to model a generalised Sierpinski triangle —
specifically a Steemson triangle. Finally, through using the iterative formula created in Chapter
3, an approximation theory for a self-similar object using moment values is created. This is
analogous to the approximation theory made in Chapter 2. A computational example is
provided for modelling the same object presented in Chapter 2 in a subtly different way.
In describing these techniques, this thesis provides the history of how fractal approximations
have been progressively refined. After the success of FIC, there were two clear paths for the
progression of image compression: one that strived for efficient computation and description,
the other that aimed to model a self-similar object at a potentially larger computational cost.
Chapter 5 reveals how the simple modelling examples made in the thesis could possibly be
extended to image compression, specifically by use of local fractal models and their respective
moment theory, which is derived in that chapter.
Code for numerical experiments can be found at:
https://github.com/Chr1sWilliams/Chris Williams Honours,
in aptly named Jupyter notebooks pertaining to each chapter. The code is not optimised and is
written for readability in the notebook. The thesis is self-contained, although Chapter 4 can be
read interactively with the notebooks provided.
Chapter 2
The Home H(X) and Introductory
Fractal Modelling
‘What is the simplest example?’ — Markus Hegland.
Self-referential modelling has had great success in capturing the details of natural objects that
are rich in self-similarity, such as leaves, clouds and mountain ranges. The premise of fractal
modelling is that an object is described by its relations to itself — avoiding the potential dif-
ficulty of directly describing the object. Heuristically, a ‘good’ fractal model can be achieved
through the ability to accurately and concisely capture the self-referential aspects an object
might exhibit. Currently there are two effective methodologies used to find such aspects: the
graduate student algorithm and one stemming from fractal image compression (FIC). In this
chapter we explore the latter. This is done by firstly introducing the construction of self-similar
objects, then presenting fundamental ideas from FIC in the simplest possible setting as an in-
troduction to fractal modelling.
The structure of the chapter is as follows. The construction of self-referential objects such as
Cantor sets and Sierpinski triangles is introduced in the home of fractal geometry, H(X): the
space of non-empty compact subsets of a complete metric space (X, d). We then progress to how
one can use the ideas formed from working in H(X) to begin self-referential modelling. Explicitly,
we show how one fits a self-referential model to a function in C([0, 1]) — the space of continuous
functions on the unit interval — by introducing fractal interpolation functions (FIFs). This is
done to reveal techniques and tools from FIC and fractal dimension theory as a sampling of
the ideas and methodology introduced throughout the thesis. A computational example is
included at the end of the chapter to explore traditional fractal approximation in a simplified
setting.
2.1 The Hutchinson Operator
The first step in creating our fractal objects is the construction of a “suitable ‘home’ in which
to describe them”. To start, a complete metric space is needed, for which we usually select
(R2, ‖ · ‖2) where ‖ · ‖2 is the Euclidean norm. We will turn this metric space into a home
through the following:
Definition 2.1. Let (X, d) be a complete metric space and H(X) be the space of non-empty
compact subsets of X. The Hausdorff metric dH on H(X) is given through
dH(U, V ) = inf_{r>0} { r | V ⊂ ⋃u∈U Br(u) and U ⊂ ⋃v∈V Br(v) },

where Br(x) = {y | d(x, y) ≤ r}.
Remark 2.2. The above metric can be described intuitively as follows: given two sets
U, V ∈ H(X), the set U is inflated by replacing each point in U with a ball of radius r centred
at the point replaced. This inflation continues until the set V is consumed; the same is then done
in the reverse fashion, and the maximum inflation needed from either case is the distance between
the sets. This metric is not a norm, as H(X) is not a vector space.
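For finite subsets of the plane the infimum in Definition 2.1 is attained, so the inflation description above reduces to a max/min computation. The following is a minimal sketch of our own (the function names are not from the thesis code):

```python
import math

def hausdorff_distance(U, V):
    """Hausdorff distance between two finite, non-empty point sets in R^2.

    For finite sets, d_H(U, V) = max( max_{u in U} min_{v in V} |u - v|,
                                      max_{v in V} min_{u in U} |u - v| ).
    """
    def directed(A, B):
        # smallest radius r such that inflating B by r-balls covers A
        return max(min(math.dist(a, b) for b in B) for a in A)

    return max(directed(U, V), directed(V, U))

# Inflating {(0, 0)} until it swallows {(3, 4)} needs radius 5, and vice versa.
print(hausdorff_distance([(0.0, 0.0)], [(3.0, 4.0)]))  # 5.0
```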
It is always true that if (X, d) is a complete metric space then (H(X), dH) is also complete
[Hau78]. Now that we have created a home for our fractals, we need to know how to construct
them. This is done with the aid of Iterated Function Systems (IFSs) [BD85] and the Hutchinson
operator [Hut79] as defined below.
Definition 2.3. An iterated function system (IFS) is a complete metric space (X, d) with a
finite collection of contractive functions {fi : X → X}Ni=1. We denote this collection F =
{(X, d); f1, . . . , fN}.
Remark 2.4. A contractive function fi : X → X on a metric space (X, d) obeys d(fi(x), fi(y)) ≤
λi d(x, y) for all x, y ∈ X and for a fixed λi ∈ [0, 1). The letter λ will be exclusively used
Through a mild and convenient abuse of notation, we use the same symbol F for an operator
F : H(X)→ H(X).
Definition 2.5. Let F = {(X, d); f1, . . . , fN} be an IFS. Then the Hutchinson operator F :
H(X)→ H(X) is defined through
F(V ) = ⋃_{i=1}^{N} fi(V ) for all V ∈ H(X).
Remark 2.6. The notation above is ambiguous as fi : X → X, yet we have used fi above to
act on elements of H(X). To remedy this we define fi acting on non-empty subsets of X through
fi(U) = {fi(u) | u ∈ U}.
When each of the fi in an IFS are contractive functions, the Hutchinson operator is a contraction
on the space H(X) with respect to the metric dH. That is, for all U, V ∈ H(X), let r = dH(U, V );
then

V ⊂ ⋃u∈U Br(u) ⇒ fi(V ) ⊂ fi(⋃u∈U Br(u)) = ⋃u∈U fi(Br(u)) ⊂ ⋃u∈U Bλir(fi(u)),
where λi < 1 is the contractivity factor for fi, which implies the contractivity of F acting on
H(X). Through Banach’s Fixed Point Theorem we know that F admits a unique fixed point A.
We call this element A ∈ H(X) the attractor of an IFS, as by Banach’s Fixed Point Theorem,
it is attractive under the iteration of F . Explicitly, for any V ∈ H(X) we have Fn(V ) → A as n → ∞.
For the most part it is assumed that the functions fi are affine and (X, d) = (R2, ‖ · ‖2); that
is, they take the form x → fi(x) = Aix + bi where Ai is a linear operator Ai : R2 → R2 and bi
is a translation. In particular cases, we use a specific type of affine function called a similitude.
Similitudes are affine functions with the added requirement that Ai = λiOi, where λi ∈ [0, 1)
and Oi is an isometry. To make this distinction clear, when similitudes in two real dimensions
are used, the functions in the IFS may be identified as complex variable isometries of the space
(C, | · |) multiplied by a scaling factor, which creates a fractal object in H(C).
It is often useful to place some type of separation property on an IFS being studied for reasons
that will become abundantly clear when analysing the dynamics and dimension theory present
on a fractal attractor. This is an assumption on how the functions in an IFS map certain sets.
These extra assumptions enable us to determine a wealth of information about the topology of
an IFS’s attractor without being too restrictive on the type of IFS being studied. There are
various separation conditions to choose from (see, for instance, [KR15]); we select the open set
condition (OSC) defined below.
Definition 2.7. An IFS F = {(X, d); f1, . . . , fN} obeys the open set condition (OSC) if there
exists a non-empty open set O ⊂ X such that:
1. ⋃_{i=1}^{N} fi(O) ⊂ O;
2. fi(O) ∩ fj(O) = ∅ for all i ≠ j.
Remark 2.8. In the first assumption of the OSC, it would be incorrect to write F(O) ⊂ O as
O may not lie in H(X). It is worth noting that the OSC is a condition which pertains to an
IFS, not its attractor.
With these formalities defined, we present three examples of IFSs that obey the open set con-
dition. These will be our test cases for the fractal models described in this thesis.
Firstly, we have the traditional Cantor set created by an IFS that mimics its typical set theoretic
construction.
Example 2.9. Take the IFS FCant = {([0, 1], | · |); f1(x) = x/3, f2(x) = x/3 + 2/3}. Then the
attractor of this IFS is the traditional Cantor set, as evinced through the following iteration.
[figure: the sets F^n_Cant([0, 1]) for n = 0, . . . , 4 drawn on the interval from 0 to 1, each step
removing the open middle third of every remaining interval]
This IFS obeys the open set condition witnessed by (0, 1). In one real dimension, affine functions
are precisely similitudes — the nomenclature chosen is to refer to them as affine functions.
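The convergence of F^n_Cant([0, 1]) can be mimicked numerically by applying the Hutchinson operator to the finite set of interval endpoints. A small sketch of our own (not taken from the thesis notebooks):

```python
def hutchinson(maps, points):
    """One application of the Hutchinson operator, F(V) = union of the f_i(V),
    acting on a finite point approximation of a set V."""
    return sorted({f(x) for f in maps for x in points})

# The Cantor IFS of Example 2.9.
f1 = lambda x: x / 3
f2 = lambda x: x / 3 + 2 / 3

V = [0.0, 1.0]  # the endpoints of [0, 1]
for _ in range(4):
    V = hutchinson([f1, f2], V)

# V now holds the 32 endpoints of the 16 intervals of F^4_Cant([0, 1]).
print(len(V), min(V), max(V))
```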
Next we present a Steemson triangle, a generalised Sierpinski triangle [SW18]. This is an IFS
acting on the space H(C), which is made from complex similitude functions. One could construct
a synonymous IFS acting on the space H(R2), but the maps become more cumbersome to write
and hinder the observation of how the maps are explicitly acting on H(C).
Example 2.10. [SW18] Consider the IFS of three maps:

FSteem = {(C, | · |) ; f1(z) = αe^{iθ} z̄, f2(z) = βz + (1 − β), f3(z) = γz + (1 − γ)(Cx + iCy)},

where α = b/(b² + 1), β = 1/(b² + 1), γ = b²/(b² + 1), Cx = (b² − a² + 1)/2,
Cy = √(4b² − (b² − a² + 1)²)/2 and θ = cos⁻¹(Cx/b); and a = 1.1, b = 0.85. The attractor of
this IFS is a Steemson triangle from [SW18], a generalised Sierpinski triangle.
Unlike the previous example whose attractor was the traditional Cantor set, it is not straight-
forward to identify the attractor of this IFS. To gain some insight, we provide the following
discussion of the maps in the IFS. Broadly speaking, the maps are parameterised by the values
a, b > 0, which become the side lengths (along with 1) of the triangular convex hull that contains
the attractor of this IFS. This can be identified by noting the fixed point of each fi is a vertex
of this triangular hull. We have identified the ‘home’ for this fractal to be (C, | · |), so by our
convention these maps are similitudes. These functions are isometries multiplied by a scaling
factor (of magnitude less than 1); where f2 and f3 are scaling functions composed with transla-
tions, and f1 is a scaling function with a flip about the real axis followed by an anti-clockwise
rotation. This means that distances between points are uniformly scaled under the maps fi, or
geometrically, the group of similitude transformations maps squares to squares.
The following page shows an arbitrary set V ∈ H(X) (to which some colouring on the set is
added) under iteration of FSteem. This set was chosen as it is easy to identify the orientation of
its images under the functions fi.
Figure 2.1: Elements of H(C) (with added colouring, explicitly Bobby the Crimson Rosella)
converging to a Steemson triangle under the iteration of FSteem. This IFS obeys the open set
condition, seen by considering the interior of the set under iteration.
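As a sanity check on this parameterisation, the fixed points of the three maps can be computed and the side lengths of the triangular hull compared with 1, b and a. This is our own illustration, not code from the thesis repository, and it assumes the conjugated form of f1 (the flip about the real axis described in the discussion above):

```python
import cmath
import math

a, b = 1.1, 0.85
alpha = b / (b**2 + 1)
beta = 1 / (b**2 + 1)
gamma = b**2 / (b**2 + 1)
Cx = (b**2 - a**2 + 1) / 2
Cy = math.sqrt(4 * b**2 - (b**2 - a**2 + 1) ** 2) / 2
theta = math.acos(Cx / b)

f1 = lambda z: alpha * cmath.exp(1j * theta) * z.conjugate()  # flip, then rotate
f2 = lambda z: beta * z + (1 - beta)
f3 = lambda z: gamma * z + (1 - gamma) * complex(Cx, Cy)

# The fixed point of each map is a vertex of the triangular convex hull.
v1, v2, v3 = 0j, 1 + 0j, complex(Cx, Cy)
assert abs(f1(v1) - v1) < 1e-12
assert abs(f2(v2) - v2) < 1e-12
assert abs(f3(v3) - v3) < 1e-12

# The hull's side lengths are 1, b and a, matching the discussion above.
print(abs(v2 - v1), abs(v3 - v1), abs(v3 - v2))
```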
Our final example is an IFS whose maps are affine in two dimensions and has an attractor that
creates the graph of a continuous function. This form of IFS is known as a fractal interpolation
function (FIF) and is presented in depth here on the historic basis of being one of the first
applications of fractal geometry to modelling problems [BH89]. In the application of affine
maps, straight lines are mapped to straight lines, but objects such as squares may be mapped to
other quadrilaterals. In general, the theory of IFSs comprising affine maps is far from being
well-understood. To determine if there exists a topology on X = Rn such that an affine IFS
is contractive on H(X) requires bounding the maps’ joint spectral radius by one [ABVW10], a
difficult computation when n > 1 and there is more than one affine function in the IFS.
Classically, the construction of a non-differentiable continuous function was a non-trivial task
[Kol31]. In this example, constructing what was once regarded as a pathological function is as
straightforward as creating a generalised linear interpolant.
Example 2.11. [BH89] Suppose data {(xi, yi)}Ni=0 ⊂ R2 is given, and the data points are
enumerated such that xi < xi+1 for all i ∈ {0, . . . , N − 1}. This data looks to be sampled from
a continuous rough function. We wish to build an IFS such that the attractor G ∈ H(X) is
the graph of a continuous interpolant of the data. Through a change in coordinates by means
of an affine transformation, we may assume that our data obeys (x0, y0) = (0, 0), xN = 1 and
{(xi, yi)}Ni=0 ⊂ [0, 1]2 to simplify calculations. Based on this assumption we may define the maps
fi((x, y)) = ( xi − xi−1           0  ) ( x )     ( xi−1 )
             ( yi − yi−1 − diyN    di ) ( y )  +  ( yi−1 ),
for i ∈ {1, . . . , N} := [N ] and |di| < 1. Define the IFS FFIF = {(R2, ‖ · ‖2); f1, · · · , fN}. A
graphical example of one such IFS for N = 4 is shown below, where the outer rectangle is
mapped to the shaded regions under the maps fi.
Figure 2.2: The unit square [0, 1]2 and the corresponding images fi([0, 1]2).
We reveal the explicit details of this attractor in the next section. For now, consider selecting
di = 0; this maps the whole outer rectangle to the graph of a linear interpolant of the data,
indicating that this IFS might be a good candidate for data interpolation. The sign of the value
di is noteworthy in constructing these maps. A negative value for di means the orientation of
the unit square mapped under fi (see map f3 in the diagram above) is flipped about the x-axis,
in the exact same way complex conjugation was used in the construction of a Steemson triangle.
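The maps above can be assembled programmatically. The sketch below is our own illustration, using hypothetical interpolation data and d_i values of our own choosing:

```python
def fif_maps(data, d):
    """Build the affine FIF maps of Example 2.11.

    data: points (x_i, y_i) for i = 0, ..., N with (x_0, y_0) = (0, 0) and
    x_N = 1; d: the vertical scalings d_1, ..., d_N with |d_i| < 1.
    """
    xs, ys = zip(*data)
    N = len(data) - 1
    maps = []
    for i in range(1, N + 1):
        def f(p, i=i):  # default argument pins the loop index
            x, y = p
            return ((xs[i] - xs[i - 1]) * x + xs[i - 1],
                    (ys[i] - ys[i - 1] - d[i - 1] * ys[N]) * x
                    + d[i - 1] * y + ys[i - 1])
        maps.append(f)
    return maps

# Hypothetical data; each f_i sends (0, 0) to (x_{i-1}, y_{i-1}) and
# (1, y_N) to (x_i, y_i), so the attractor interpolates the data.
data = [(0.0, 0.0), (0.3, 0.6), (0.7, 0.2), (1.0, 0.5)]
maps = fif_maps(data, d=[0.4, -0.3, 0.5])
print(maps[0]((0.0, 0.0)), maps[0]((1.0, 0.5)))
```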
Before we may continue, this example elucidates one of the first difficulties of working with affine
IFSs in dimensions larger than one: the existence of an attractor. For this, we require that each
function in an IFS be contractive in some underlying metric space from which H(X) can be
constructed. It is straightforward to check this in the similitude case, as one can always write
for a similitude s : X → X that ‖s(x) − s(y)‖ = λ‖x − y‖ for some λ > 0 and norm ‖ · ‖. In
the affine case it is unclear by inspection if the above maps are indeed contractive, and in many
metric spaces they are not — including (R2, ‖ · ‖2). However, by using a different metric on the
underlying space R2, the functions for a FIF can always be made contractive. Define the norm
‖ · ‖θ on R2 through ‖(x, y)‖θ = |x|+ θ|y| for θ > 0 and any (x, y) ∈ R2. It directly follows for
fi as defined above that
‖fi((x1, y1)) − fi((x2, y2))‖θ ≤ (|xi − xi−1| + θ|yi − yi−1 − diyN |)|x1 − x2| + θ|di||y1 − y2|
≤ λ‖(x1 − x2, y1 − y2)‖θ,

where λ = max{|di|, |xi − xi−1| + θ|yi − yi−1 − diyN |}, which can be made less than one for
sufficiently small θ because |xi − xi−1| < 1. Geometrically the metric ‖ · ‖θ is ‘squishing’ the
y axis to gain a contraction, with the requirement that the function is a contraction in the x
co-ordinate. The space H(R2) is then constructed with the norm ‖ · ‖θ to gain the existence of
an attractor, which can be related back to the initial space through the equivalence of
norms in finite dimensions. To see a full justification of this, see [Bar14] page 215.
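For concreteness, the bound above can be evaluated on sample data (hypothetical values of our own choosing); λ < 1 already holds for a modest θ:

```python
# Hypothetical FIF data: knots x_i, values y_i and vertical scalings d_i.
xs = [0.0, 0.3, 0.7, 1.0]
ys = [0.0, 0.6, 0.2, 0.5]
d = [0.4, -0.3, 0.5]
theta = 0.05  # 'squish' the y axis by this factor

# lambda = max{ |d_i|, |x_i - x_{i-1}| + theta * |y_i - y_{i-1} - d_i * y_N| },
# the contractivity factor from the inequality in the text.
lam = max(max(abs(di) for di in d),
          max(abs(xs[i] - xs[i - 1])
              + theta * abs(ys[i] - ys[i - 1] - d[i - 1] * ys[-1])
              for i in range(1, len(xs))))
print(lam, lam < 1)
```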
To understand why this formulation might produce the graph of a continuous interpolant, we go
through the following thought process. Draw your favourite two distinct continuous interpolants
of the data; for simplicity, we assume that they are contained in the unit square. Now apply the
map fi to these two (refer to Figure 2.2). It is clear that continuous interpolants are mapped
to continuous interpolants and by assumption these graphs lie in the images of the unit square
under the fi. Furthermore, when viewed as compact sets inside R2, these graphs of functions
are getting ‘squished’ together (in the Hausdorff metric sense) under each application of FFIF.
Though we are still working in H(R2), it is as if a contraction has been constructed on the Banach
space (C[0, 1], ‖ · ‖∞). In the next section, we explore the relationship between (H(X), dH) and
(C[0, 1], ‖ · ‖∞).
2.2 Introduction to Fractal Approximations
A square greyscale image can be thought of as the graph of a function I : [0, 1]2 → [0, 1].
Generally, few assumptions can be placed on such a function; notions of smoothness or even
continuity are unrealistic — but many images exhibit obvious visual structure despite lacking
these qualities. In fractal image compression (FIC), the structure exploited is that many images
are in some sense self-similar. In this section, we take a simplified instance of this application,
whereby a bounded self-similar function defined in one dimension on the unit interval can be
interpreted as a ‘one-dimensional image’. Furthermore, we place the unrealistic assumption of
continuity on this function for simplicity.
To convey the key ideas that are found in traditional fractal modelling in the simplest possible
manner, we will show how the final example presented in the preceding section is related to
function spaces and approximation. This requires identifying how our home H([0, 1]2) interacts
with standard function spaces such as L∞([0, 1]) and L2([0, 1]). Throughout this working, results
are presented only to convey the ideas fostered in fractal modelling, so the explicit details of
many results are omitted or deferred to later chapters.
2.2.1 From H([0, 1]2) to C([0, 1])
It is still unclear what the attractor is of the final IFS given in the preceding section. This question is
extraordinarily difficult to answer concisely in H([0, 1]2) but becomes clear in the space C([0, 1]).
To begin, the relationship between these two must be established which is done in the following
lemmas.
Lemma 2.12. If f, g ∈ L∞([0, 1]) then ‖f − g‖∞ ≥ dH(Graph(f),Graph(g)).
Proof. Fix f, g and consider the set Br(Graph(f)) = {z : z ∈ ⋃x∈Graph(f) Br(x)}. Then
Graph(g) ⊂ B‖f−g‖∞(Graph(f)) and, symmetrically, Graph(f) ⊂ B‖f−g‖∞(Graph(g)), giving the claim.
Lemma 2.13. The action of FFIF on the graph of a function can be written as an operator
FFIF : C([0, 1])→ C([0, 1]), defined through
FFIF(g)(x) = Σ_{i=1}^{N} ( (x − xi−1)(yi − yi−1 − diyN)/(xi − xi−1) + yi−1
             + di g((x − xi−1)/(xi − xi−1)) ) · 1[xi−1,xi)(x).
Proof. It is readily checked when g is continuous that FFIF(g) is as well. By construction we
require FFIF to obey the equation below for a point ui(x) ∈ [xi−1, xi):
FFIF((x, g(x))) = ( xi − xi−1          0  ) (  x   )     ( xi−1 )     (      ui(x)      )
                  ( yi − yi−1 − diyN   di ) ( g(x) )  +  ( yi−1 )  =  ( FFIF(g)(ui(x)) ),
where ui(x) = (xi − xi−1)x + xi−1. This maps the interval [0, 1] to [xi−1, xi]. The equation
above comes from looking at where a point (x, g(x)) on the attractor (or graph of the function)
2.2. INTRODUCTION TO FRACTAL APPROXIMATIONS 13
is mapped to under FFIF and equating it to its respective image. The above is a slight abuse
of notation as F acts on sets, so we are really doing the calculation on the singleton {(x, g(x))}.Solving the above for FFIF(g)(x) yields the desired result.
Proposition 2.14. [BH89] The attractor G of the IFS in Example 2.11 is the graph of a continuous interpolant of the data {(x_i, y_i)}_{i=0}^{N}.

Proof. Fix f, g ∈ C([0, 1]). Using an identical calculation to that in the proof of Lemma 2.21, we obtain the inequality

    \|F_{FIF}(f) - F_{FIF}(g)\|_\infty \le \max_{1 \le i \le N} |d_i| \cdot \|f - g\|_\infty.

Thus FFIF is a contraction in the supremum norm on C([0, 1]), so it has a unique fixed point, which is a continuous function. Note that f_i((0, 0)) = (x_{i−1}, y_{i−1}) and f_i((1, y_N)) = (x_i, y_i), so the fixed point of FFIF interpolates the data; denote it g∗. Finally, from Lemmas 2.12 and 2.13,

    d_H(\mathrm{Graph}(F^n g), \mathrm{Graph}(g^*)) \le \|F_{FIF}^n g - g^*\|_\infty \xrightarrow{\, n \to \infty \,} 0.

Since a graph uniquely defines its function, it must be the case that Graph(g∗) := G ∈ H([0, 1]2) is the attractor of FFIF.
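The convergence above can be sketched numerically. The following is not from the thesis: a minimal Python sketch of Lemma 2.13 and Proposition 2.14, with illustrative interpolation data {(0, 0), (0.5, 1), (1, 0)} and vertical scalings d = (0.3, −0.3) (both assumptions). Iterating F_FIF from the zero function converges to a fixed point that interpolates the data.

```python
# Sketch: iterate the FIF operator F_FIF on a sampled function and check that
# the fixed point interpolates the data. The data set and the scalings d_i
# below are illustrative assumptions, not taken from the thesis.
X = [0.0, 0.5, 1.0]          # interpolation abscissae x_0, ..., x_N (y_0 = 0)
Y = [0.0, 1.0, 0.0]          # ordinates y_0, ..., y_N
D = [0.3, -0.3]              # vertical scaling factors d_i, |d_i| < 1

M = 1001                     # grid resolution
grid = [j / (M - 1) for j in range(M)]

def interp(g, x):
    """Piecewise-linear evaluation of grid samples g at a point x in [0, 1]."""
    t = min(x * (M - 1), M - 1 - 1e-12)
    j = int(t)
    return g[j] + (t - j) * (g[j + 1] - g[j])

def F(g):
    """One application of the FIF operator from Lemma 2.13."""
    out = []
    for x in grid:
        # locate the subinterval [x_{i-1}, x_i) containing x (last one closed)
        i = next(k for k in range(1, len(X)) if x < X[k] or k == len(X) - 1)
        a = X[i] - X[i - 1]
        u = (x - X[i - 1]) / a
        out.append((x - X[i - 1]) * (Y[i] - Y[i - 1] - D[i - 1] * Y[-1]) / a
                   + Y[i - 1] + D[i - 1] * interp(g, u))
    return out

g = [0.0] * M                # start from the zero function
for _ in range(40):          # contraction factor max|d_i| = 0.3, so 40 steps suffice
    g = F(g)

for xi, yi in zip(X, Y):     # the fixed point interpolates the data
    print(xi, round(interp(g, xi), 6), yi)
```

The contraction factor here is max|d_i| = 0.3, so the iteration converges geometrically; the grid resolution and iteration count are arbitrary choices.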
Remark 2.15. A consequence of the above relation between H(R2) and C([0, 1]) is that one may now produce a sequence of elements from H(R2) approaching an element of C([0, 1]). That is, a sequence of disconnected ‘Rosella sets’ as in Example 2.10 can be produced so that they converge to the graph of a non-differentiable continuous function in the Hausdorff topology.

In this formulation, the values d_i ∈ (−1, 1) can be thought of as ‘free variables’. The typical manner in which these are selected is to capture the roughness desired from a fractal approximant, as explained in the following subsection.
2.2.2 Box-Counting Dimension
When modelling rough data, it is reasonable to capture ‘how rough’ the data is, and one particular method for this is matching the box-counting dimension of an object with its approximant.
Definition 2.16. [Fal04] The box-counting (or fractal/Minkowski) dimension of a set V ⊂ R2 is given by

    \dim_B(V) := \lim_{\varepsilon \to 0} \frac{-\log(\mathcal{N}(\varepsilon))}{\log(\varepsilon)},

where N(ε) is the minimum number of boxes of diameter ε needed to cover the set V. If the above limit does not exist, the upper/lower box dimension is evaluated through the corresponding lim sup/lim inf respectively, which is used as an estimated bound.
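The limit in the definition can be estimated numerically. Below is a minimal sketch (not from the thesis) that recovers the dimension log 2/log 3 of the middle-third Cantor set from box counts at several scales; the construction level and range of scales are arbitrary choices.

```python
import math

# Sketch: estimate the box-counting dimension of the middle-third Cantor set
# from its level-8 construction. Endpoints are kept as integers over 3^8 so
# the box counts at scale 3^(-k) are exact.
LEVEL = 8
pts = [0]
for _ in range(LEVEL):                       # ternary digits restricted to {0, 2}
    pts = [3 * p for p in pts] + [3 * p + 2 for p in pts]

def box_count(k):
    """Number of boxes of side 3^(-k) meeting the level-LEVEL construction."""
    return len({p // 3 ** (LEVEL - k) for p in pts})

# least-squares slope of log N(eps) against -log(eps) over several scales
data = [(k * math.log(3), math.log(box_count(k))) for k in range(2, 8)]
mx = sum(x for x, _ in data) / len(data)
my = sum(y for _, y in data) / len(data)
slope = (sum((x - mx) * (y - my) for x, y in data)
         / sum((x - mx) ** 2 for x, _ in data))
print(slope)  # close to log 2 / log 3 ≈ 0.6309
```

This is the log–log regression mentioned below; for real data the box counts are no longer exact and the slope is only an estimate.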
14 CHAPTER 2. THE HOME H(X) AND INTRODUCTORY FRACTAL MODELLING
In later chapters we will discuss more formally the notion of non-integer dimension, but for now it is only presented for exploratory purposes. The definition given above is a practical notion of (possibly) non-integer dimension, as the limit can be estimated through various methods, such as a log–log plot of ε against N(ε) (see [Bar14]). Once this is obtained, we want to ensure the attractor of our IFS has the same box-counting dimension as the object it is approximating. There are two fundamental theorems relating the IFSs presented thus far in this thesis to the box-counting dimensions of their respective attractors.
Theorem 2.17. [Mor46] If an IFS of N similitudes with scaling factors λ_i obeys the open set condition, then the fractal dimension of its attractor is the unique positive solution D to \sum_{i=1}^{N} \lambda_i^D = 1.
Remark 2.18. The function D ↦ \sum_{i=1}^{N} \lambda_i^D − 1 is convex, decreasing, equal to N − 1 at D = 0, and has an asymptote at −1 as D → ∞. These are desirable properties for a numerical solver: in particular, under these conditions, Newton's root-finding method can solve this nonlinear equation to machine accuracy.
Example 2.19. The fractal dimension of the Steemson triangle given in Example 2.10 is D ≈ 1.5867. This was numerically calculated through Newton's method with an initialisation of D_0 = 1.5.
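A minimal sketch of this computation (not from the thesis) is below. The scaling factors used are an assumption, three similitudes of ratio 1/2 (a Sierpinski-type triangle), not the Steemson triangle's factors, which are not restated here; the exact root in this assumed case is log 3/log 2.

```python
import math

# Sketch: solve the Moran equation  sum_i lambda_i^D = 1  for D with Newton's
# method, as suggested in Remark 2.18. The scaling factors below are an
# illustrative assumption (three similitudes of ratio 1/2).
def moran_dimension(lams, d0=1.5, tol=1e-12):
    d = d0
    for _ in range(100):
        f = sum(l ** d for l in lams) - 1.0           # Moran equation residual
        df = sum((l ** d) * math.log(l) for l in lams)  # derivative in D
        step = f / df
        d -= step
        if abs(step) < tol:
            break
    return d

print(moran_dimension([0.5, 0.5, 0.5]))  # log 3 / log 2 ≈ 1.58496
```

Because the function is convex and decreasing (Remark 2.18), Newton's method from a reasonable initialisation converges to the unique positive root.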
Theorem 2.20. [BH89] The fractal dimension of G defined in Example 2.11 is the unique solution D to

    \sum_{i=1}^{N} |d_i|\, a_i^{D-1} = 1,

where a_i = x_i − x_{i−1}, provided the interpolation points do not lie on a straight line and \sum_{i=1}^{N} |d_i| > 1; otherwise D = 1.
The proof and discussion of these statements are deferred to Chapter 4.
2.2.3 Moving to L2([0, 1])
Working with the Hausdorff distance or the uniform metric makes it computationally difficult to solve the optimisation problems required to fit a fractal model. A standard solution to this problem is to revert in some way to a Euclidean-type distance, where such minimisation problems are simple to solve. This is precisely the key idea underlying traditional fractal image compression (FIC). While we now have the tools to fit a fractal interpolant to a given data set and match its dimension, our method still contains a major flaw in that we have used all of our data in the creation of our model. When the overall goal of modelling is to gain a better representation of a function for compression, such a construction is not useful.

To progress, we will try to obtain a ‘good’ approximant that uses as few maps as possible in an IFS. For computational reasons, FIC always resorts to trading the Hausdorff distance for Lp with p ≠ ∞. We show how to make this transition in the case p = 2, improving on the methodology presented in [IS13].
Lemma 2.21. The operator FFIF is a contraction on the metric space C([0, 1]) equipped with the L2 norm.

Proof. Through a direct computation, with the change of variables u = (x − x_{i−1})/(x_i − x_{i−1}) on each subinterval, we have

    \|F_{FIF}g - F_{FIF}f\|_2 = \sqrt{ \sum_{i=1}^{N} d_i^2 \int_0^1 \left( g\!\left( \frac{x - x_{i-1}}{x_i - x_{i-1}} \right) - f\!\left( \frac{x - x_{i-1}}{x_i - x_{i-1}} \right) \right)^2 \mathbf{1}_{[x_{i-1}, x_i]}(x)\, dx }
    \le \max_{1 \le i \le N} |d_i| \sqrt{ \sum_{i=1}^{N} (x_i - x_{i-1}) \int_0^1 (g(u) - f(u))^2\, du }
    = \max_{1 \le i \le N} |d_i| \cdot \|f - g\|_2 < \|f - g\|_2,

since \sum_{i=1}^{N} (x_i - x_{i-1}) = 1.
Remark 2.22. It is not stated in [IS13], but the operator FFIF may not be well defined for a general L2 function, hence the restriction to C([0, 1]) here.

Fundamentally, fractal image compression is made computationally feasible by the Collage Theorem below. This result is often proved en route to Banach's Fixed Point Theorem, but its reinterpretation in the context of fractal geometry is where it gains its strength.
Theorem 2.23. [Bar14] Let (X, d) be a non-empty complete metric space and T : X → X be a contraction mapping on X with contractivity factor λ. For the unique fixed point x∗ of T we have

    d(x, x^*) \le \frac{d(x, T(x))}{1 - \lambda} \qquad \forall x \in X.
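A one-line numerical illustration of the bound (not from the thesis); the map T below is an arbitrary affine contraction on the real line, for which the bound holds with equality.

```python
# Sketch: the Collage Theorem bound for a simple contraction on the real line.
# T has contractivity 1/2 and fixed point x* = 2; for any x, the distance to
# the fixed point is bounded by the 'collage error' d(x, T(x)) / (1 - lambda).
lam = 0.5
T = lambda x: lam * x + 1.0     # fixed point x* = 1 / (1 - lam) = 2
x_star = 2.0

for x in [-3.0, 0.0, 1.9, 10.0]:
    bound = abs(x - T(x)) / (1 - lam)
    print(abs(x - x_star), "<=", bound)
```

The practical reading is the one given below: a set that is left almost unchanged by an IFS is close to that IFS's attractor.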
Typically, the contraction mapping is the action of an IFS, with the fixed point as the attractor. The bound above states that if you have found an element of H(X) that remains relatively ‘unchanged’ by the action of your IFS, then this element is close to your attractor, which provides continuity (in the Hausdorff sense) in the IFS parameters. The theorem's name stems from the fact that, if you take an arbitrary compact set and try to form this set by shrunken copies of itself, the maps made in this collaging process define an IFS whose attractor approximates the set. Doing this process by hand is what is referred to as the graduate student algorithm, and is currently what gives the ‘most realistic’ fractal models.
For the following selection of the free parameters di we will use the Collage Theorem and follow a
method in [IS13]. Assume that we are given a continuous function g ∈ L2([0, 1]) and wish to find
a FIF with attractor g∗ such that it is close to this function (in the L2 sense, as an approximation
to the Hausdorff distance). Using the Collage Theorem, we overcome the difficulty of having no direct relationship between the IFS parameters and the optimisation being solved. First, for some C > 0 (given that N > 2),

    \|g - g^*\|_2^2 < C\, \|g - F_{FIF}g\|_2^2 = C \sum_{i=1}^{N} \int_{x_{i-1}}^{x_i} \left( g(x) - \frac{(x - x_{i-1})(y_i - y_{i-1} - d_i y_N)}{x_i - x_{i-1}} - y_{i-1} - d_i\, g\!\left( \frac{x - x_{i-1}}{x_i - x_{i-1}} \right) \right)^2 dx,
where g∗ is such that Graph(g∗) = G. Note that the above functional is convex in each d_i, being a coercive quadratic form. Therefore we can solve the minimisation by simply finding the critical point of the functional, which we achieve by setting partial derivatives to zero. This yields

    d_i^{\min} = \frac{ \displaystyle \int_{x_{i-1}}^{x_i} \left( y_{i-1} + \frac{(x - x_{i-1})(y_i - y_{i-1})}{x_i - x_{i-1}} - g(x) \right) \left( \frac{y_N (x - x_{i-1})}{x_i - x_{i-1}} - g\!\left( \frac{x - x_{i-1}}{x_i - x_{i-1}} \right) \right) dx }{ \displaystyle \int_{x_{i-1}}^{x_i} \left( \frac{y_N (x - x_{i-1})}{x_i - x_{i-1}} - g\!\left( \frac{x - x_{i-1}}{x_i - x_{i-1}} \right) \right)^2 dx }. \tag{2.1}
To discretise this result for data points, we can pose the same problem in a different measure space. We may simply replace the Lebesgue measure on [0, 1] in the above formulation with Dirac masses at the recorded data points; that is, instead of approximating g we approximate {(x_i, y_i)}_{i=1}^{M} := {(x_i, g(x_i))}_{i=1}^{M}. Furthermore, to compute g((x − x_{i−1})/(x_i − x_{i−1})) we make the nearest-neighbour approximation

    g \approx \sum_{i=1}^{M} y_i\, \mathbf{1}_{\{x \,:\, |x - x_i| < |x - x_j|\ \forall j \ne i\}}.
In [IS13], this was the method taken without accounting for the requirement that |d_i| < 1. The above will typically give a valid solution if the fixed interpolation points are chosen carefully, such as selecting local extrema in an interval. To see how the constraint can fail, consider the instance when we have data {(0, 0), (0.1, −0.1), (0.5, 1), (0.6, 1), (1, 0)} and select {(0, 0), (0.1, −0.1), (1, 0)} as our interpolation points. The above formulation yields d_2 ≈ 1.05 > 1. By adding this requirement to the posed minimisation problem, we also add a constraint on the dimension, giving a computationally feasible and novel fractal approximation to functions in C([0, 1]) in the L2 sense. In the next section, using this novel construction, we present the limitations of the methodology in FIC.
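The failure described above can be reproduced directly. The sketch below (not from the thesis) discretises Equation (2.1) with Dirac masses and the nearest-neighbour approximation of g, for the data and interpolation points just given.

```python
# Sketch reproducing the d_2 > 1 failure: discretise Equation (2.1) with
# Dirac masses at the data points and a nearest-neighbour approximation of g,
# for interpolation points {(0,0), (0.1,-0.1), (1,0)}.
data_x = [0.0, 0.1, 0.5, 0.6, 1.0]
data_y = [0.0, -0.1, 1.0, 1.0, 0.0]
interp_x = [0.0, 0.1, 1.0]
interp_y = [0.0, -0.1, 0.0]
yN = interp_y[-1]

def g(x):
    """Nearest-neighbour approximation of the data."""
    j = min(range(len(data_x)), key=lambda k: abs(x - data_x[k]))
    return data_y[j]

def d_min(i):
    """Discretised unconstrained optimum d_i^min of Equation (2.1)."""
    x0, x1 = interp_x[i - 1], interp_x[i]
    y0, y1 = interp_y[i - 1], interp_y[i]
    a = x1 - x0
    num = den = 0.0
    for x, y in zip(data_x, data_y):
        if not (x0 <= x <= x1):
            continue
        p = y0 + (x - x0) * (y1 - y0) / a - y       # linear interpolant minus g
        q = yN * (x - x0) / a - g((x - x0) / a)     # scaled endpoint term minus g(u)
        num += p * q
        den += q * q
    return num / den

print(d_min(2))  # exceeds 1, violating |d_i| < 1
```

With this data the second interval gives d_2 = 1.05, so the unconstrained optimum is infeasible and the constrained problem of the next section is needed.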
2.3 One-Dimensional Fractal Image Compression
In the following computational example, we will show the methodology of fractal approximation in its simplest form: finding a self-similar continuous function defined on [0, 1] that approximates a (possibly non-smooth) function displaying self-similar properties. This yields the following optimisation problem:
    solve \quad \arg\min_{d_i \in \mathbb{R}} \; \sum_{i=1}^{n} \left( c_{2i} d_i^2 + c_{1i} d_i + c_{0i} \right)
    subject to \quad |d_i| < 1, \qquad \sum_{i=1}^{n} |d_i| > 1, \qquad \sum_{i=1}^{n} k_i |d_i| = 1;
where

    k_i = a_i^{D-1},
    c_{0i} = \int_{x_{i-1}}^{x_i} \left( y_{i-1} + \frac{(x - x_{i-1})(y_i - y_{i-1})}{x_i - x_{i-1}} - g(x) \right)^2 dx,
    c_{1i} = -\int_{x_{i-1}}^{x_i} 2 \left( y_{i-1} + \frac{(x - x_{i-1})(y_i - y_{i-1})}{x_i - x_{i-1}} - g(x) \right) \left( \frac{y_N (x - x_{i-1})}{x_i - x_{i-1}} - g\!\left( \frac{x - x_{i-1}}{x_i - x_{i-1}} \right) \right) dx,
    c_{2i} = \int_{x_{i-1}}^{x_i} \left( \frac{y_N (x - x_{i-1})}{x_i - x_{i-1}} - g\!\left( \frac{x - x_{i-1}}{x_i - x_{i-1}} \right) \right)^2 dx,

are all assumed to be known values (so that d_i^{\min} = -c_{1i}/(2c_{2i}) recovers Equation (2.1)). The constraints in this problem are non-smooth, so standard optimisation techniques are not directly applicable.
By assumption k_i = a_i^{D−1} < 1, so the third constraint implies the second. Geometrically, the second condition represents the points outside the L1 unit ball, and the third is an L1 ‘ellipse’ in this region. The points of non-differentiability are eliminated by how the first constraint intersects the L1 ellipse \sum_{i=1}^{n} k_i |d_i| = 1. This leaves 2^n disconnected convex sets of the form {d : \sum_{i=1}^{n} s_i k_i d_i = 1, s_i d_i ∈ [0, 1)} for all combinations of s_i ∈ {−1, 1}. By relaxing the problem to allow |d_i| ∈ [0, 1], we create a compact convex set with smooth boundary on which we wish to minimise a convex function. This can be graphed for N = 2.
[Plot over the (d_1, d_2)-plane on [−1, 1]².]

Figure 2.3: Feasible set of solutions for N = 2. The constraint regions are coloured by |d_i| < 1 (red) and \sum_{i=1}^{n} |d_i| > 1 (blue), with their intersection in purple, and \sum_{i=1}^{n} k_i |d_i| = 1 in pink. The intersection of the three constraints gives 2² = 4 disconnected convex bodies with smooth boundary.
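One of these 2^n convex subproblems can be sketched directly. The following is not from the thesis: it assumes fixed positive signs and illustrative (made-up) values for the coefficients c_{ji} and k_i. With the signs fixed, the L1 ellipse becomes a linear constraint and the problem reduces to a one-dimensional quadratic.

```python
# Sketch: minimise  sum_i (c2[i] d_i^2 + c1[i] d_i + c0[i])  over one of the
# convex bodies above, with the signs s_i fixed (here both positive) so the
# L1 'ellipse' constraint becomes linear:  k1*d1 + k2*d2 = 1, d_i in [0, 1].
# All coefficient values are illustrative assumptions.
c2 = [2.0, 1.0]
c1 = [-1.0, -0.5]
k = [0.8, 0.7]   # k_i = a_i^(D-1) < 1; assumes the feasible segment is non-empty

# Substitute d2 = (1 - k1*d1) / k2 and minimise the resulting quadratic in d1.
A = c2[0] + c2[1] * (k[0] / k[1]) ** 2
B = c1[0] - 2 * c2[1] * k[0] / k[1] ** 2 - c1[1] * k[0] / k[1]
d1_free = -B / (2 * A)                       # unconstrained critical point

# Feasible range for d1 so that both d1 and d2 lie in [0, 1].
lo = max(0.0, (1 - k[1]) / k[0])
hi = min(1.0, 1 / k[0])
d1 = min(max(d1_free, lo), hi)               # project onto the interval
d2 = (1 - k[0] * d1) / k[1]
print(d1, d2, k[0] * d1 + k[1] * d2)
```

Solving each of the 2^n sign patterns this way and keeping the best objective value is the brute-force version of the method; the text below explains why this does not scale and how the signs are chosen instead.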
The solution above is incomplete because, as n grows, there are exponentially many optimisations to solve. For general fractal image compression, both a continuous and a discrete optimisation must be performed to model a photograph. The continuous optimisation finds the parameter magnitudes, which are the magnitudes of the d_i above. The discrete optimisation finds the orientations of the maps, the signs of the d_i, determining whether or not each map ‘flips’ across the x-axis. In our simplified example, we eliminate the discrete optimisation by taking the signs of the d_i given by the unconstrained optimum of Equation (2.1); this projects the global optimum onto the (L2-)closest convex body in our feasible set of solutions.
On the next page we provide a computational experiment implementing the algorithm given above, highlighting the limitations of this method. A FIF consisting of four maps defined on the interval [0, 5/3] was constructed and truncated to [0, 1]. This creates a function that is rich in self-similarity, and it is not unreasonable to expect this function to be well approximated by another FIF. However, implementing the given method with the agnostic choice of interpolation points evenly spaced in the domain of our approximating function, our FIF approximation to the truncated FIF will never describe this function exactly, since 4 and 5 are co-prime.

In the first given FIC algorithm [Jac92], an exact description of an (infinite-resolution) image of a Cantor set will never be achieved. Simply stated, in this algorithm the generalised ‘points of interpolation’ are chosen dyadically, and two is co-prime to three (the Cantor set sitting on the points without a one in their ternary expansion). Generally, it is not known explicitly what type of functions the methods from FIC approximate; it has only been shown that they work well empirically for photographs. For a graphically convincing approximation of this Cantor set image, a great number of maps (upwards of six) is needed, despite only two maps being required to generate the image to infinite resolution. Likewise, 16 maps were needed to gain an approximation that resembled our self-similar function. As a final hindrance of this method, by moving to L2 much of the intuition from the formulation in H(X) is lost.
Example 2.24. Take the (discretised) one-dimensional image I ∈ R1000 given by I := e1 =
(1, 0, . . . , 0), that is, sampled values from a function on the unit interval. An approximant to
this image is given by e2 = (0, 1, 0, . . . , 0). In any Lp norm for 1 ≤ p ≤ ∞ it will be the case that
e2 is a ‘bad’ approximant to e1. In the Hausdorff distance e2 is still close to e1, and visually, e2
is a ‘good’ approximation to e1.
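Example 2.24 can be verified directly. The sketch below is not from the thesis; it assumes the convention of discretising each image's graph as a point set over an even grid.

```python
import math

# Sketch of Example 2.24: two 'one-dimensional images' that are far apart in
# every Lp norm yet close in the Hausdorff distance between their graphs.
n = 1000
xs = [i / (n - 1) for i in range(n)]
e1 = [1.0] + [0.0] * (n - 1)
e2 = [0.0, 1.0] + [0.0] * (n - 2)

sup_norm = max(abs(a - b) for a, b in zip(e1, e2))
l2_norm = math.sqrt(sum((a - b) ** 2 for a, b in zip(e1, e2)))

def hausdorff(P, Q):
    """Hausdorff distance between two finite point sets in the plane."""
    d = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
    h = lambda A, B: max(min(d(p, q) for q in B) for p in A)
    return max(h(P, Q), h(Q, P))

G1, G2 = list(zip(xs, e1)), list(zip(xs, e2))
h_dist = hausdorff(G1, G2)
print(sup_norm, l2_norm, h_dist)  # large Lp distances, tiny Hausdorff distance
```

The sup and L2 distances are of order one, while the Hausdorff distance between the graphs is only one grid spacing, matching the visual judgement described in the example.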
The goal for the remainder of this thesis is to create a methodology that focuses on fractally modelling an object, as opposed to describing it, in the hope that when we do describe a self-similar object we also capture its self-similar properties. For example, given an exact copy of the Cantor set, how does one deduce whether this set is the attractor of an affine IFS? Currently there is no solution to this question recorded in the literature. We want an approximation method that may find an exact solution to such an inverse problem when it exists.
[Figure 2.4 shows four panels of data against approximant: box-counting dimensions D = 1.2, 1.4, 1.6 and 1.8, with Hausdorff distances 0.072, 0.046, 0.132 and 0.346 respectively.]

Figure 2.4: Rough data (blue) and self-similar function approximants (in the L2 sense) with varying box-counting dimensions.
[Figure 2.5 shows one panel: an approximant with box-counting dimension D = 1.538 and Hausdorff distance 0.097 to the data.]

Figure 2.5: Data (blue) and an approximant that is self-similar, matches the data's box-counting dimension, and is close to the data in the L2 sense; in this instance it also approximates the Hausdorff distance well.
Chapter 3
The Probabilists P(X) and
Self-Similar Measures
‘Let’s first talk about a coin. Understand the coin, understand probability.’
— Pierre Portal.
In this chapter we move to P(X), the space of compactly supported normalised Borel measures
on a metric space (X, d). By working with measures, we can use the measure theoretic quantities
of moment data to assist in creating better models for self-similar objects. This leads us to the
main problem in this chapter: how do we explicitly relate moment values of a measure to our
IFS parameters? To address this, we derive formulas relating the moments of a one-dimensional
affine self-similar measure, complex similitude self-similar measure and two-dimensional affine
self-similar measure to the IFS parameters that generate them.
The moments of a self-similar measure have in general been of interest for constructing a Hilbert space and studying its orthogonal polynomials [Man96, Man13] (see Figure A.2). We intend to use them in addressing the inverse problem of reconstructing a self-similar measure from its moments. To this end, we hope that simply measuring the moments of a data set will lead to constructing an IFS that approximates a self-similar object. The formulas derived in the one-dimensional and complex cases have been studied elsewhere, but to the best of our knowledge the formulas for the two-dimensional case are a novel addition to the literature.

The chapter has the following structure. The space P(X) and a contractive operator whose fixed point yields a ‘fractal measure’ are constructed. This is analogous to how we built self-similar functions in H(X). We then move to deriving the moment formulas for a variety of self-similar measures. This includes measures given by the attractors of IFSs consisting of one-dimensional real and complex affine functions, and finally two-dimensional affine functions. We conclude with computational examples of calculating the moments of two-dimensional self-affine measures.
3.1 Self-Similar Measures
To motivate moving on from H(X) to a more abstract space, consider the following example.
Example 3.1. Let the IFS F_{Cant} = {([0, 1], | · |); f_1(x) = \frac{1}{3}x, f_2(x) = \frac{1}{3}x + \frac{2}{3}} be given. The action of F_{Cant} is given by the Hutchinson operator (Definition 2.1), F_{Cant} : H([0, 1]) → H([0, 1]), and its attractor is the Cantor set C.
Given the Cantor set C, is there a computational way to find the IFS F_{Cant}, or any other IFS that gives the same attractor? To simplify the problem, assume that the Cantor set is the attractor of an IFS F containing two affine maps acting on [0, 1]. Computationally, we are fitting the fractal model F = {([0, 1], | · |); f_1(x) = a_1 x + b_1, f_2(x) = a_2 x + b_2} to C and solving for the IFS parameters. Using the methods developed from fractal image compression in the former chapter yields the following optimisation problem:

    solve \quad \arg\min_{\substack{a_1, a_2 \in (-1, 1) \\ b_1, b_2 \in [0, 1]}} d_H(C, A) \quad subject to \quad F(A) = A,

where A ∈ H(X), and the constraint in this optimisation implies that A is the attractor of the IFS F. Ideally one would wish to minimise the Hausdorff distance directly, but the space (H(X), d_H) lacks a norm, so any general theory of optimisation pertaining to a Banach space is lost. Furthermore, in this example one can no longer switch to C([0, 1]) and create a least-squares-type solution as we did in our simplified FIC method, as the Cantor set cannot be identified in such a space. In H(X) there is no immediate solution to the inverse problem above.
A new home must be created for our fractal models. In an analogous fashion to how H(X) was created, the space P(X) of compactly supported normalised Borel measures on a compact metric space will be constructed. Here we place the stronger assumption of (X, d) being compact rather than just complete, so that the arguments presented are more concise.

Remark 3.2. Given a metric space (X, d), the Borel sets are precisely those which can be formed through countable unions, countable intersections and complements of the topology's open sets. This collection of sets will be denoted B(X).
Definition 3.3. Let (X, d) be a compact metric space; then P(X) is the space of normalised Borel measures on X. This space is metrisable with the Monge–Kantorovich metric given through

    d_P(\mu, \nu) := \sup_{g \in \mathrm{Lip}_1(X)} \left\{ \int g \, d(\mu - \nu) \right\},

where the notation \int g \, d(\mu - \nu) := \int g \, d\mu - \int g \, d\nu is used and \mathrm{Lip}_1(X) := \{ f : X \to \mathbb{R} \,:\, |f(x) - f(y)| \le d(x, y)\ \forall x, y \in X \} is the space of Lipschitz functions X → R whose Lipschitz constant is no larger than one.
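To make the definition concrete, here is a small sketch (not from the thesis) computing d_P for discrete measures on the line, using the standard duality fact that on R the Monge–Kantorovich metric equals the integral of the absolute difference of the cumulative distribution functions; the measures chosen are illustrative.

```python
# Sketch: for measures on the real line, the Monge-Kantorovich metric equals
# the integral of |F_mu - F_nu|, where F denotes the CDF (a standard duality
# fact). Discrete measures are represented as {atom: mass} dicts.
def dP(mu, nu):
    xs = sorted(set(mu) | set(nu))
    total, Fmu, Fnu = 0.0, 0.0, 0.0
    for a, b in zip(xs, xs[1:]):
        Fmu += mu.get(a, 0.0)
        Fnu += nu.get(a, 0.0)
        total += abs(Fmu - Fnu) * (b - a)   # CDFs are constant on [a, b)
    return total

delta0 = {0.0: 1.0}
delta1 = {1.0: 1.0}
half = {0.0: 0.5, 1.0: 0.5}
print(dP(delta0, delta1))   # distance between unit masses at 0 and 1
print(dP(delta0, half))
```

As expected, the distance between point masses at 0 and 1 is the distance between their supports, and splitting half the mass halves the distance.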
Proposition 3.4. Let (X, d) be a compact metric space; then the metric space (P(X), d_P) is compact.

A sketch of this proof will be briefly discussed, to highlight why the claim is true and also to show that the topology on this space is equivalent to a weak topology.

Assume (X, d) is a compact metric space and let C(X, R) be the space of continuous functions from X to R. Then every element of C(X, R) is necessarily bounded. The dual of this space, denoted C∗(X, R), is the space of regular Borel measures on X with bounded variation, through the Riesz–Markov Representation Theorem. Note that the measures in C∗(X, R) are not required to be normalised. In the norm induced on the dual space, the unit ball of C∗(X, R) is precisely the set of measures of total variation at most one. By the Banach–Alaoglu Theorem, the unit ball of C∗(X, R) is compact under the weak-∗ topology. It is a known fact [Hut79] that this topology is metrisable with the metric d_P on the closed subset P(X) of positive measures µ with µ(X) = 1.
In analogy with how an IFS was defined in the previous chapter, an IFS with probabilities will now be defined. Let a stochastic vector of length N be p = (p_1, . . . , p_N) ∈ R^N such that 0 < p_i < 1 and \sum_{i=1}^{N} p_i = 1.
Definition 3.5. Let an IFS F = {(X, d); f1, . . . , fN} be given, where it is further assumed
(X, d) is compact. Then given a stochastic vector p = (p1, . . . , pN ), an IFS with probabilities is
the collection
M := {(X, d); f1, . . . , fN ; p}.
Finally a contractive operator on our newly created compact metric space will be defined.
Through another mild abuse of notation, the symbol M for an IFS with probabilities is also
used to represent this operator M : P(X)→ P(X) in the definition below.
Definition 3.6. Let an IFS with probabilities M = {(X, d); f_1, . . . , f_N ; p} be given. Define f_i^{-1} : B(X) → B(X) through f_i^{-1}(U) := {u ∈ X : f_i(u) ∈ U} for U ∈ B(X). Then the operator M : P(X) → P(X) is given through

    (M(\mu))(S) = \sum_{i=1}^{N} p_i \, \mu(f_i^{-1}(S)) \qquad \forall S \in B(X).
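A minimal sketch (not from the thesis) of this operator acting on a finitely supported measure, represented as a dict from atoms to masses; the Cantor IFS with equal probabilities is used, anticipating Example 3.7.

```python
# Sketch: apply the operator M to a discrete measure by pushing each atom
# through every map f_i with weight p_i (the push-forward form of the
# definition above), here for the Cantor IFS with p_1 = p_2 = 1/2.
maps = [lambda x: x / 3, lambda x: x / 3 + 2 / 3]
probs = [0.5, 0.5]

def M(mu):
    out = {}
    for x, w in mu.items():
        for f, p in zip(maps, probs):
            y = f(x)
            out[y] = out.get(y, 0.0) + p * w
    return out

mu = {0.0: 1.0}              # start from a unit mass at 0
for _ in range(5):
    mu = M(mu)

print(len(mu), sum(mu.values()))   # 2^5 atoms of total mass 1
```

Each iteration doubles the number of atoms while preserving total mass, and after one step every atom lies in the first Cantor approximation [0, 1/3] ∪ [2/3, 1].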
This operator M is a contraction on P(X) equipped with the metric d_P. To see this in a simplified instance, take the case when our IFS with probabilities consists of only one map f_1, forcing p_1 = 1, granting

    d_P(M(\mu), M(\nu)) = \sup_{g \in \mathrm{Lip}_1(X)} \int g \, d(M\mu - M\nu) = \sup_{g \in \mathrm{Lip}_1(X)} \int g \circ f_1 \, d(\mu - \nu) = \sup_{g \in \mathrm{Lip}_1(X)} \int \lambda_1 \tilde{g} \, d(\mu - \nu) \le \lambda_1 \, d_P(\mu, \nu),

where λ_1 is the contractivity factor for the map f_1, and it is easily checked by definition that \tilde{g} := \frac{1}{\lambda_1} g \circ f_1 ∈ Lip_1(X). The argument above extends to an IFS with multiple maps, but the notation makes it less clear.
This implies the operator M has a unique attractive fixed point through Banach’s Fixed Point
Theorem. We will denote this fixed point µA and call this the attractive measure of an IFS with
probabilities. To demonstrate the relationship between an IFS and an IFS with probabilities we
present the following example.
Example 3.7. Let M_{Cant} = {([0, 1], | · |); f_1(x) = \frac{1}{3}x, f_2(x) = \frac{1}{3}x + \frac{2}{3}; p_1 = p_2 = \frac{1}{2}} be an IFS with probabilities. Then taking M_{Cant} : P([0, 1]) → P([0, 1]) as defined above, we may iterate the operator M_{Cant} on any element of P([0, 1]) to aid in finding the attractive measure µ_A.
Seeing how FCant acted on an arbitrary set in H([0, 1]) in Example 3.1 was difficult, thus the
unit interval was used as an ‘easy’ set to iterate. To stay in familiar territory here, we iterate
a measure that is absolutely continuous with respect to Lebesgue measure which we denote by
µL. Specifically, let dM^n/dµ_L denote the Radon–Nikodym derivative of M^n_{Cant}(µ_L) with respect to Lebesgue measure. Then dM^0/dµ_L(x) = 1 for µ_L-almost every x ∈ [0, 1]. Proceeding with one iteration of M_{Cant} reveals some intuition:

    (M_{Cant}(\mu_L))(S) = \frac{1}{2} \left( \mu_L(f_1^{-1}(S)) + \mu_L(f_2^{-1}(S)) \right) = \int_S \frac{3}{2} \left( \mathbf{1}_{[0,1/3]}(x) + \mathbf{1}_{[2/3,1]}(x) \right) d\mu_L(x) \;\Rightarrow\; \frac{dM^1}{d\mu_L}(x) = \frac{3}{2} \left( \mathbf{1}_{[0,1/3]}(x) + \mathbf{1}_{[2/3,1]}(x) \right),

where we are identifying dM^1/dµ_L up to Lebesgue-almost-everywhere equivalence. After repeatedly applying M_{Cant}, the general relationship

    \frac{dM^n}{d\mu_L}(x) = \left( \frac{3}{2} \right)^n \mathbf{1}_{F^n_{Cant}([0,1])}(x)

can be identified. This is the uniform measure on the nth approximation of the Cantor set given in Example 3.1. The attractor of M_{Cant} is ‘uniform measure on the Cantor set’, which we denote µ_C. In general, the fixed point measure µ_A is supported on a subset of A [Bar14]. A ‘traditional’ probability density cannot be written for µ_C, as this measure is singular with respect to Lebesgue measure, hence the Radon–Nikodym derivative with respect to µ_L is not defined. This restriction is similar to how we cannot write the Cantor set as a union of intervals, yet its most common construction comes from the closure of the limit of a union of intervals.
In the example above, it was not straightforward to make a series of pictures converging to our fractal attractor as we did in Example 2.1. We now ask: is there a way to visualise our fractal measures? The short answer is yes [Elt87], through an algorithm called the chaos game [BE88]. This algorithm will give an efficient means to compute images of fractal objects in H(X), as the support of µ_A is contained in A. Furthermore, this algorithm will allude to how measure-theoretic quantities aid in the reconstruction of a fractal object in H(X). Finally, we rid ourselves of some misconceptions of fractal geometry given by the deterministic algorithm of fractal generation (see Example 2.1).
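The chaos game itself can be sketched in a few lines. The following is not from the thesis; the map choice is uniform, matching p_1 = p_2 = 1/2, and the starting point, seed and iteration count are arbitrary choices.

```python
import random

# Sketch of the chaos game for the Cantor IFS with probabilities: repeatedly
# apply a randomly chosen map; the orbit distributes itself over the support
# of the attractive measure mu_A.
random.seed(0)
maps = [lambda x: x / 3, lambda x: x / 3 + 2 / 3]

x = random.random()          # arbitrary starting point in [0, 1]
orbit = []
for _ in range(10000):
    x = random.choice(maps)(x)
    orbit.append(x)

# After the first step the orbit never enters the removed middle third.
inside_gap = sum(1 for p in orbit if 1 / 3 < p < 2 / 3)
print(inside_gap)
```

A histogram of such an orbit approximates µ_A, which is how pictures of fractal measures (and hence of attractors in H(X)) are produced in practice.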
As a side note, the following survey on an easily made mistake in fractal geometry was conducted on several Honours/Masters students at the ANU MSI. Take the IFS F_{Cant} from Example 3.1 and look at the limit F^n_{Cant}([0, 1]) as n tends to infinity. When questioned, those asked responded that this limit (in the Hausdorff sense) will be the Cantor set. Abusing some notation, the question of F^n_{Cant}((0, 1)) as n tends to infinity was then posed. The answer always given was the empty set. In iterating (0, 1), however, we have thrown away only a countable number of points in comparison to iterating [0, 1], whereas the Cantor set is uncountable. The actual answer to the second question is that we are left with µ_C-almost all of the Cantor set. Less intuitive examples of this same misunderstanding can be made, such as removing countably many line segments from a Sierpinski triangle. This point will be formalised later in the language of measure theory.

What is noteworthy is the different perspective that thinking in a measure-theoretic manner grants, whereby the confusion above should never happen. This is a minor insight into what the space P(X) has to offer in better understanding fractal objects. We will see that studying fractals in P(X) will lead to new ways in which we can create fractal models.
Before explaining this new algorithm, some properties of self-similar measures need to be deduced. These properties are most easily seen through an explicit example, with the general theory quoted. The tools required to explicitly prove the quoted results will be presented when we unite the spaces H(X) and P(X) in the next chapter.
Example 3.8. Consider the IFS F_L = {([0, 1], | · |); f_1(x) = \frac{1}{2}x, f_2(x) = \frac{1}{2}x + \frac{1}{2}}. Define an associated IFS with probabilities M_L by equipping the maps in F_L with p_1 = p_2 = \frac{1}{2}. The respective attractors are [0, 1] ∈ H([0, 1]) and Lebesgue measure on [0, 1], which we previously denoted by µ_L ∈ P([0, 1]).
Using this example, a dynamical system will be created; that is, the tuple (X, B(X), µ, T) where X is a set, B(X) its associated Borel subsets, µ a finite measure on B(X) and T : B(X) → B(X) a transformation such that µ(T^{-1}(O)) = µ(O) for all O ∈ B(X). In our example above, we have the first three items of this tuple. To construct the transformation T we require the following definition.
Definition 3.9. If an IFS F = {(X, d); f_1, . . . , f_N} is such that each of the maps f_i is invertible, then F is an invertible IFS. In this case, the inverse IFS is given by F^{-1} = {(X, d); f_1^{-1}, . . . , f_N^{-1}}, and we define F^{-1} : B(X) → B(X) through

    \mathcal{F}^{-1}(O) = \bigcup_{i=1}^{N} f_i^{-1}(O).
Example 3.10. (continuing Example 3.8)
The inverse IFS of F_L is F_L^{-1} = {([0, 1], | · |); f_1^{-1}(x) = 2x|_{[0,1/2]}, f_2^{-1}(x) = (2x − 1)|_{[1/2,1]}}, where the notation |_{[0,1/2]} is taken to mean that the domain of the function is restricted to the interval [0, 1/2]. Typically this restriction need not be written, by the definition of F^{-1} : B(X) → B(X), but we do it here for illustrative reasons that become clear in a later example. Note that F_L^{-1}({1/2}) = {0, 1}, whilst for every x ∈ [0, 1] \ {1/2} the singleton {x} is mapped to another singleton under F_L^{-1}. This point will be formalised later, but the idea here is that {1/2} is a µ_L-null set, so the above ambiguity does not matter for ‘almost everywhere’ statements.
Lemma 3.11. Let F be an IFS consisting of invertible functions and M be a corresponding IFS with probabilities. Let the fixed points/attractors of these be A ∈ H(X) and µ_A ∈ P(X) respectively. Then F^{-1} preserves the measure µ_A; that is, µ_A(F(O)) = µ_A(O) for all O ∈ B(X).

Proof. Take O ∈ B(X). Since µ_A is the fixed point of M,

    \mu_A(\mathcal{F}(O)) = (M(\mu_A))(\mathcal{F}(O)) = \sum_{i=1}^{N} p_i \, \mu_A(f_i^{-1} f_i O) = \sum_{i=1}^{N} p_i \, \mu_A(O) = \mu_A(O).
Definition 3.12. A dynamical system is a Borel measurable space (X, B(X)) with a normalised Borel measure µ : B(X) → [0, 1], and a measure-preserving transformation T : (X, B(X), µ) → (X, B(X), µ) as above. We denote this by the tuple (X, B(X), µ, T).

Corollary 3.13. Given an invertible IFS with probabilities, the tuple (X, B(X), µ_A, F^{-1}) forms a dynamical system.
Remark 3.14. The above proof of Lemma 3.11 does not depend on the probabilities chosen. In particular, there are uncountably many measures, given by equipping an IFS F with various stochastic vectors, for which F^{-1} is a measure-preserving transformation.
Example 3.15. (continuing Example 3.8)
The tuple ([0, 1], B([0, 1]), µ_L, F_L^{-1}) forms a dynamical system; explicitly, it is an n-ary system [Wal00] with n = 2 that can be rewritten (up to Lebesgue-almost-everywhere equivalence) as ([0, 1], B([0, 1]), µ_L, x ↦ 2x mod 1). The graph of this function is shown below. Note that x ↦ 2x mod 1 maps the unit interval to the unit interval; in the language of IFSs, F^{-1} maps the attractor A to the attractor A in general.

Figure 3.1: The n-ary transformation for n = 2 and the (splitting) orbit of 1/8 under F_L^{-1}.
The ambiguity alluded to in Example 3.10 will now be addressed. The function x ↦ 2x mod 1 is a well-defined function; however, F^{-1} : [0, 1] → [0, 1] is not well defined on singletons, as again F^{-1}({1/2}) = {0, 1}. In the definition of a measure-preserving transformation, however, we only require that µ(T^{-1}(O)) = µ(O). Therefore, we only require that F^{-1} and x ↦ 2x mod 1 agree on a set of µ-measure one. Explicitly, the set we take is U = [0, 1] \ \bigcup_{n \in \mathbb{N}} \mathcal{F}^n(\{1/2\}), which ensures that this ambiguity does not happen for any iterate of F^{-1}. It is easily seen that µ_L(U) = 1, as U disregards only a countable collection of Lebesgue-null sets.
It is convenient to use fractal codespace to continue with our example. The theory presented in this example holds rather generally, so the machinery is meant to be illustrative.

Recall [N] := {1, . . . , N} and let [N]^k be the words of length k, with [N]^ℕ = \lim_{k \to \infty} [N]^k the infinite words. A (metrisable) topology on [N]^ℕ can be defined by letting the sets of the form

    \Delta(\omega_1 \cdots \omega_l) = \{ \omega \in [N]^{\mathbb{N}} : \omega|_l = \omega_1 \cdots \omega_l \}, \qquad \omega_i \in [N],

form a topological sub-base, where ω|_l is the lth truncation of the infinite word ω. In general ergodic theory, sets of the above form are referred to as cylinder sets [Wal00]. Furthermore, for each letter i ∈ [N] we associate a fixed probability p_i ∈ (0, 1) such that \sum_{i=1}^{N} p_i = 1, and define the measure ν : B([N]^ℕ) → [0, 1] by ν(\Delta(\omega_1 \cdots \omega_l)) = \prod_{i=1}^{l} p_{\omega_i}.
A dynamical system on the measure space ([N]^N, B([N]^N), ν) can be formed by considering the
operator S : [N]^N → [N]^N given by S(ω_1 ω_2 ω_3 . . . ) = (ω_2 ω_3 . . . ). It is readily checked that S is
ν-measure preserving on the cylinder sets defined above through the following calculation:

ν(S^{-1}(∆(ω_1 · · · ω_l))) = ν(⋃_{j∈[N]} ∆(j ω_1 · · · ω_l)) = Σ_{j∈[N]} p_j ν(∆(ω_1 · · · ω_l)) = ν(∆(ω_1 · · · ω_l)),

where we have used that the sets ∆(j ω_1 · · · ω_l) are disjoint for j ∈ [N]. A dynamical system is
given by the tuple ([N]^N, B([N]^N), ν, S), which is better known as a Bernoulli shift on N letters
with probabilities p.
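The measure-preservation calculation above is simple to check numerically. The following sketch (the function names are our own, not from the thesis) computes the Bernoulli measure of a cylinder set and of its preimage under the shift S:

```python
def nu(word, p):
    # Bernoulli measure of the cylinder set Delta(word), letters in {1, ..., N}
    out = 1.0
    for w in word:
        out *= p[w - 1]
    return out

p = [0.3, 0.7]          # probabilities attached to the letters 1 and 2
word = (1, 2, 1)
# S^{-1} Delta(word) is the disjoint union of Delta((j,) + word) over letters j
preimage_measure = sum(nu((j,) + word, p) for j in (1, 2))
```

Since the letter probabilities sum to one, the preimage has the same measure as the cylinder itself, which is exactly the calculation displayed above.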
With this notation in place, we may introduce a fundamental structure in fractal geometry.
Proposition 3.16. The dynamical system ([0, 1], B([0, 1]), λ, F_L^{-1}) is isomorphic (in the ergodic
theory sense [Wal00]) to a Bernoulli shift on two letters with uniform probabilities.
Proof. Two dynamical systems are isomorphic if there exists a bijective doubly measure-preserving
map π (that is, π and π^{-1} are measure-preserving transformations) such that the following
diagram commutes:

    ([0, 1], B([0, 1]))  --F_L^{-1}-->  ([0, 1], B([0, 1]))
          | π^{-1}                            | π^{-1}
          v                                   v
    ([N]^N, B([N]^N))  ------S----->   ([N]^N, B([N]^N))

(and similarly with π in the upward direction).
28 CHAPTER 3. THE PROBABILISTS P(X) AND SELF-SIMILAR MEASURES
Let U = [0, 1] \ {x : x = i/2^k, k ∈ N, i ∈ {1, . . . , 2^k − 1}} and note that λ(U) = 1. Now given
a, b ∈ U with a < b, we can uniquely represent a and b by their dyadic expansions a = Σ_{i=1}^∞ a_i/2^i
and b = Σ_{i=1}^∞ b_i/2^i for a_i, b_i ∈ {0, 1}. Define the mapping π^{-1} : U → [2]^N through

π^{-1}(a) = (a_1 + 1, a_2 + 1, . . . ),

where the a_i are the digits given by the number a's dyadic expansion. It is easily checked that
π^{-1} is bijective and doubly measure preserving on dyadic intervals, whose unions can approximate
any interval arbitrarily well. This shows π^{-1} is measure preserving on the generators of our Borel
subsets, which means π^{-1} is doubly measure preserving (Theorem 1.1 [Wal00]). Finally, we have
for an arbitrary a ∈ U that

π^{-1} ∘ F_L^{-1}(a) = π^{-1} ∘ F_L^{-1}(Σ_{i=1}^∞ a_i/2^i) = π^{-1}(Σ_{i=1}^∞ a_{i+1}/2^i) = (a_2 + 1, a_3 + 1, . . . ),

and

S ∘ π^{-1}(a) = (a_2 + 1, a_3 + 1, . . . ).

Thus, π^{-1} ∘ F_L^{-1} = S ∘ π^{-1}, giving the claim.
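The conjugacy in this proof can be illustrated concretely: doubling a number in [0, 1) shifts its dyadic digit sequence, which is exactly the action of S under π^{-1}. A minimal sketch (the helper names are our own; doubling a binary float is exact, so the digit comparison is reliable here):

```python
def dyadic_digits(a, n):
    # first n dyadic digits of a in [0, 1)
    out = []
    for _ in range(n):
        a *= 2
        d = int(a)
        out.append(d)
        a -= d
    return out

def pi_inv(a, n):
    # pi^{-1}(a) truncated to n letters: digits shifted into the alphabet {1, 2}
    return [d + 1 for d in dyadic_digits(a, n)]

a = 0.3
left = pi_inv((2 * a) % 1.0, 8)    # pi^{-1} applied after F_L^{-1}
right = pi_inv(a, 9)[1:]           # shift S applied after pi^{-1}
```

Both sides produce the same word, matching π^{-1} ∘ F_L^{-1} = S ∘ π^{-1}.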
The above map π, defined as the inverse of π^{-1}, is often referred to as the projection map from
codespace. Proposition 3.16 is a simplification of the stronger result given below.
Remark 3.17. If an IFS that does not obey the OSC is selected, as opposed to the one given in
the example above, then π will not be bijective. In this case, the projected dynamical system
from codespace to the attractor becomes a factor of a Bernoulli shift [Wal00] and leads into the
theory of fractal tops.
Theorem 3.18. Assume an invertible IFS F of N maps that obeys the open set condition is
given. For any associated IFS with probabilities M, the dynamical system (X, B(X), µ_A, F^{-1})
is isomorphic to a Bernoulli shift on N letters with probabilities p.
Remark 3.19. To prove the theorem above with the tools we have presented thus far would be
cumbersome, so we delay the key detail to Chapter 4. Instead, we will highlight the importance
of the OSC. This assumption gives sufficient separation to prove that points on the attractor of
F with multiple addresses form a µA-null set and may be disregarded.
The isomorphism in Theorem 3.18 gives a great amount of structure on our fractal objects when
they obey the open set condition, in the sense that we may think of them in the same way as
one would model flipping a coin repeatedly.
Corollary 3.20. The dynamical system (X, B(X), µ_A, F^{-1}) is ergodic. That is, any B ∈ B(X)
which satisfies F(B) = B has µ_A-measure 0 or 1.
Given this structure, another way to generate our fractal objects emerges: we approximate the
fixed-point measure µ_A of an IFS with probabilities through ergodic theory.
Given a dynamical system (X, B(X), µ, T), the Point-Wise Ergodic Theorem states

lim_{n→∞} (1/n) Σ_{i=1}^n f(T^i(x_0)) = ∫ f dµ

for µ-almost any x_0 and any continuous f : X → R. The interpretation of this equality is that
the average obtained by following the orbit of a single point x_0 equals the average of the measure.
Equivalently,

(1/n) Σ_{k=1}^n 1_(·)(x_k) → µ weakly,

where {x_k} is the orbit of a point in the support of µ under iteration of T. The quantity on the
left is what we will refer to as the empirical distribution of µ.
Through the ergodic theorem, we are able to construct an algorithm that approximates the fixed
point measure µ of an IFS with probabilities in a computationally efficient manner. This concept
is explored deeply in the work of John Elton [Elt87], whereby the language of iterated function
systems and ergodic theory are united. The key detail in this work is that it is non-trivial to
sample a point on the support of the fixed-point measure µA. In Elton’s work, he shows under
far fewer assumptions than what we have here that when the IFS is ‘contractive on average’,
one can create such an algorithm. We present the main lemma he proved, which is trivial under
our current assumptions, to elucidate this central algorithm.
Notation 3.21. Given a word ω_1 ω_2 ω_3 · · · = ω ∈ [N]^∞ for ω_i ∈ [N], we recall ω|_n is the nth
level truncation of this word. For instance, take [3] = {1, 2, 3}; then (123123111 . . . )|_3 = (123).
Furthermore, given an IFS F = {(X, d); f_1, . . . , f_N} we define f_{ω|n} : X → X through

f_{ω|n}(x) := f_{ω_1} ∘ · · · ∘ f_{ω_n}(x).
Lemma 3.22. Let x, y ∈ X with x ≠ y, any ε > 0, and any word ω ∈ [N]^∞ be given. Then
there exists n ∈ N such that

d(f_{ω|m}(x), f_{ω|m}(y)) < ε d(x, y),

for all m ≥ n.
This result states that given any point y ∈ X and any point x in the support of µ, these
points are mapped arbitrarily close to one another under f_{ω|m} for m large enough. If this
sequence of functions is chosen with the appropriate probabilities to approximate the orbit of a
sequence {T^k(x_0)}_{k=0}^n, then we have effectively approximated a sample from the measure µ and
the algorithm is complete. An example is shown below.
Figure 3.2: The empirical distribution (1/n) Σ_{k=1}^n 1_(·)(x_k) for various n (with added colouring
corresponding to which map was last applied to x_k in its orbit) weakly approaching a measure
supported on a Steemson triangle.
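The sampling algorithm behind figures of this kind is often called the chaos game: follow a single random orbit, applying map i with probability p_i, and record the visited points after a short burn-in. A minimal sketch (the code, names, and the three Sierpinski-style maps below are our own illustrative choices, not the thesis implementation):

```python
import random

def chaos_game(maps, weights, n, x0=(0.0, 0.0), burn=100):
    # Follow one random orbit of the IFS. The empirical distribution of the
    # recorded points weakly approximates the invariant measure.
    random.seed(0)
    x, y = x0
    pts = []
    for k in range(n + burn):
        a, b, c, d, e, f = random.choices(maps, weights)[0]
        x, y = a * x + b * y + e, c * x + d * y + f
        if k >= burn:
            pts.append((x, y))
    return pts

# three maps contracting by 1/2 towards the corners (0,0), (1,0), (0,1)
maps = [(0.5, 0, 0, 0.5, e, f) for e, f in [(0, 0), (0.5, 0), (0, 0.5)]]
pts = chaos_game(maps, [1, 1, 1], 20000)
```

With uniform probabilities the sample mean of the x-coordinates approaches the centroid value 1/3, in line with the ergodic averages discussed above.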
Given this algorithm to approximate a fractal measure, we may demonstrate how H(X) and
P(X) differ in modelling photographs. Two fractal models of leaves are shown below, one in
H([0, 1]^2) and the other in P([0, 1]^2). In this example, the self-similar measure is represented as
a grey-scale image by making a frequency plot of the empirical distribution, whilst an element
of H([0, 1]^2), being a set, is shown as a black and white image. From a modelling perspective
P(X) is a 'larger' space.
Figure 3.3: The attractors of an IFS in H([0, 1]^2), and an analogous IFS with probabilities in
P([0, 1]^2) produced by the graduate student algorithm. In the first instance, H([0, 1]^2) has the
ability to model sets and is therefore represented by a black and white image. In the second
case we display the empirical distribution of our fractal measure as a grey-scale image. In this
instance we can also model the leaf interior through carefully selecting the self-similar measure
supported on the self-similar set A.
Given the image of the leaf on the right, can we find the IFS that generates it? This is one
of the major open problems in fractal reconstruction and approximation, which we will aim to
answer in a simplified instance. We aim to do this by relating where the orbit {x_k}_{k=0}^N lands
'on average' in the algorithm above to the IFS maps, and further, to the map parameters that
generate this orbit.
Specifically, we view µ_A as a probability measure and aim to evaluate ∫ x dµ_A(x), which is the
expectation of where this point lands. More broadly we would like to calculate the moments of the
probability distribution, {∫ x^k dµ_A(x)}_{k=0}^∞. Note that as the measure µ_A has compact support,
all of these integrals are finite for each k ∈ N_0. Furthermore, such a measure can be described
by the infinite collection of such integrals; this creates a determined moment problem.
Proposition 3.23. Let a compactly supported finite Borel measure µ on R^n be given. Then all
the moments m_n := ∫ x^n dµ are finite and, further, uniquely determine the measure µ.
Proof. We may re-interpret the metric d_P through stating that a measure is characterised by
the integrals of continuous functions; that is, if

∫ f dµ = ∫ f dν for all continuous f,

then µ = ν. If all the moments exist and agree for the measures µ and ν, then we have
that ∫ q dµ = ∫ q dν for all polynomials q. As we assumed that µ and ν have compact support,
continuous functions are uniformly approximated by polynomials, granting that for any ε > 0
and any continuous function f we may find a polynomial q′ such that

|∫ f dµ − ∫ f dν| ≤ ∫ |f − q′| dµ + ∫ |q′ − f| dν < ε.

This implies that d_P(µ, ν) = 0, so µ = ν.
Remark 3.24. The statement above is not always true when a measure does not have compact
support, as polynomials may not uniformly approximate continuous functions. In general, a
measure is not uniquely determined by finitely many moments; this is an undetermined moment
problem.
The fractal models presented in Figure 3.3 were formed by looking at the self-similar properties
of a maple leaf. The current standard for such a modelling task is to have a human heuristically
create the maps that approximate the leaf; recall this is the graduate student algorithm. No
computer algorithm is available to automate such a process. As shown above, the grey-scale
leaf model is characterised by its infinite collection of moments. Our goal is to use the moments
of an object to assist a computer in creating such self-similar models. Now that the moment
problem has been motivated, the following section describes how one efficiently calculates the
moments for a given affine IFS.
3.2 Fractal Moment Theory
Given an IFS whose functions are all affine, one may identify a recurrence relation to calculate
the moment integrals. This allows for explicit closed-form formulas for the moments of a
self-similar measure, depending only on the IFS parameters. For the calculation of fractal
moments, we rely heavily on the following proposition.
Proposition 3.25. Let M = {(X, d); f_1, . . . , f_N; p} be an invertible IFS with probabilities. Then
the attractive measure µ_A of M obeys

∫ g(x) dµ_A(x) = Σ_{i=1}^N p_i ∫ g(f_i(x)) dµ_A(x),

for all g ∈ L^1(X, B(X), µ_A).
Proof. Fix g ∈ L^1(X, B(X), µ_A); then by direct computation,

∫ g(x) dµ_A(x) = ∫ g(x) dMµ_A(x) = ∫ g(x) d(Σ_{i=1}^N p_i µ_A(f_i^{-1}(x)))
= Σ_{i=1}^N p_i ∫ g(x) dµ_A(f_i^{-1}(x)) = Σ_{i=1}^N p_i ∫ g(f_i(x)) dµ_A(x),

where we have used a change of variable and the linearity of the integral above.
We will investigate three cases of calculating the moments of a self-similar measure. These
cases pertain to the type of functions given in an IFS with probabilities. The functions f_i
in our IFS will have the forms:
1. f : R → R: affine functions in one real dimension;
2. f : C → C: affine functions in one complex dimension, excluding complex conjugation;
3. f : R^2 → R^2: affine functions in two real dimensions.
The first two instances have already been investigated in [BD85, EST07, JKS11]. Only in [BD85]
were these moment integrals considered for practical use. In [EST07] a neat convergence result
is shown for the complex case, and in [JKS11] this result is quoted and used to study the Hilbert
spaces that can be formed from a self-similar measure. We first review these two cases, then
present an extension that has use for fractal modelling.
3.2.1 One-Dimensional Affine Moments
In this section we restrict our IFSs to those of the form

F = {([0, 1], | · |); f_1(x) = a_1 x + b_1, . . . , f_N(x) = a_N x + b_N}, (3.1)

where |a_i| < 1 and b_i ∈ [0, 1]. This simple case is presented first to gain a better understanding
of the calculation of these moments before moving to more complicated examples.
Remark 3.26. Taking the metric space above to be ([0, 1], | · |) rather than (R, | · |) loses no
generality. Any fractal attractor in R is a compact subset and can be (uniformly) scaled and
translated to fit inside the unit interval.
Consider Proposition 3.25 and take g : R → R given by g(x) = x^n. This function is in
L^1([0, 1], B([0, 1]), µ_A) as our measure is normalised and compactly supported.
Proposition 3.27. Let F be an IFS of the form given in Equation 3.1. Associate a stochastic
vector p to create the IFS with probabilities, say M, and let µ_A be its attractive measure. Then
the moments m_n := ∫ x^n dµ_A(x) obey

m_n = Σ_{i=1}^N p_i Σ_{j=0}^n (n choose j) a_i^j m_j b_i^{n−j}.
Proof. Using the recurrence in Proposition 3.25 with g : R → R given by g(x) = x^n yields

∫ x^n dµ_A(x) = Σ_{i=1}^N p_i ∫ (f_i(x))^n dµ_A(x).

Substituting the f_i into this equation and using the binomial expansion gives

∫ x^n dµ_A(x) = Σ_{i=1}^N p_i ∫ (a_i x + b_i)^n dµ_A(x) = Σ_{i=1}^N p_i ∫ Σ_{j=0}^n (n choose j) a_i^j x^j b_i^{n−j} dµ_A(x)
= Σ_{i=1}^N p_i Σ_{j=0}^n (n choose j) a_i^j b_i^{n−j} ∫ x^j dµ_A(x) = Σ_{i=1}^N p_i Σ_{j=0}^n (n choose j) a_i^j m_j b_i^{n−j}.
We have now established a recurrence relation for the calculation of moments. Of particular
interest is that the equation above is linear in the m_n terms and m_0 = 1 as µ_A is normalised,
giving the following corollary.
Corollary 3.28. The moments of µ_A defined above can be computed exactly through the recurrence,
for n > 0,

m_n = [Σ_{i=1}^N p_i Σ_{j=0}^{n−1} (n choose j) a_i^j m_j b_i^{n−j}] / [1 − Σ_{i=1}^N p_i a_i^n],

where m_0 = 1.
Remark 3.29. The denominator in the above formula will not be 0, as |Σ_{i=1}^N p_i a_i^n| < 1 for all
n ∈ N.
Example 3.30. Let M_Cant = {([0, 1], | · |); f_1(x) = (1/3)x, f_2(x) = (1/3)x + 2/3; p_1 = p_2 = 1/2} be as
in Example 3.1, with the attractive measure being the uniform Cantor measure µ_C. The graph of
the cumulative distribution (the real-valued function c ↦ ∫ 1_{[0,c]} dµ_C) is shown below, and is
commonly known as the 'Devil's Staircase'. This function is known to be (Hölder) continuous,
indicating that the measure µ_C has no atoms, that is, singleton sets are of µ_C-measure zero.
Figure 3.4: Cumulative distribution of the Cantor measure, which is also a self-similar function.
The first ten moments of the (uniform) Cantor measure are readily calculated and equal to

M_10 = (1, 1/2, 3/8, 5/16, 87/320, 31/128, 10215/46592, 2675/13312, 2030721/10915840, 3791353/21831680)^T.
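The recurrence of Corollary 3.28 is straightforward to implement exactly with rational arithmetic. A short sketch reproducing the moments above (our own code, not from the thesis):

```python
from fractions import Fraction
from math import comb

def moments(a, b, p, n_max):
    # m_n via Corollary 3.28 for the IFS f_i(x) = a_i x + b_i with probabilities p
    m = [Fraction(1)]
    for n in range(1, n_max + 1):
        num = sum(pi * sum(comb(n, j) * ai**j * m[j] * bi**(n - j) for j in range(n))
                  for pi, ai, bi in zip(p, a, b))
        den = 1 - sum(pi * ai**n for pi, ai in zip(p, a))
        m.append(num / den)
    return m

third, half = Fraction(1, 3), Fraction(1, 2)
M = moments([third, third], [Fraction(0), Fraction(2, 3)], [half, half], 9)
```

For the Cantor IFS this yields 1, 1/2, 3/8, 5/16, 87/320, 31/128, . . . exactly, with no floating-point error.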
Proposition 3.27 shows that the higher order moments are linearly related to the lower order
terms. The following passage aims to capture this structure in a systematic way so that the
moments can be calculated efficiently. Riordan arrays are infinite lower triangular matrices that
can be used to capture a sequence of terms through matrix multiplication [WW08]. We will
be using a simple example of these in the form of a variant of the Pascal matrix for binomial
expansions. The reason we mention the general theory is that in calculating the moments in
higher dimensions, the Riordan array structure is lost, a structure which [JKS11] found desirable
to keep.
Notation 3.31. Let m_n be defined as above (m_n := ∫ x^n dµ_A(x)). Let M_x be in ℓ^∞(R), the space
of bounded countably infinite sequences with entries from R, with entries given by (M_x)_i = m_i.
Note that since our self-similar measure is assumed to be normalised and contained in the unit
interval, the moments of such a measure are necessarily bounded and it is appropriate for M_x
to be in ℓ^∞(R). Graphically we have:

M_x := (∫ x^0 dµ_A(x), ∫ x^1 dµ_A(x), ∫ x^2 dµ_A(x), . . . )^T = (m_0, m_1, m_2, . . . )^T.

We call M_x the moment vector associated with the measure µ_A. Furthermore we let M_{x,n} be
the nth level truncation of this vector for n ∈ N_0.
This notation grants an easy simplification of our moment formula through the following
proposition.
Proposition 3.32. Let f : R → R with f(x) = ax + b, and let M_x ∈ ℓ^∞ be a vector of moments.
Then

φ_f(M_x) := (∫ f(x)^0 dµ_A(x), ∫ f(x)^1 dµ_A(x), ∫ f(x)^2 dµ_A(x), . . . )^T = R(a, b) M_x,

where R(a, b) is an (infinite) matrix defined entry-wise by (R(a, b))_{i,j} = 1_{i≥j} (i choose j) a^j b^{i−j} for
i, j ∈ N_0.
Proof. Recall that ∫ f(x)^n dµ_A(x) = Σ_{j=0}^n (n choose j) a^j b^{n−j} m_j. Then through a direct calculation

(φ_f(M_x))_n = Σ_{k=0}^∞ (R(a, b))_{n,k} (M_x)_k = Σ_{k=0}^∞ 1_{n≥k} (n choose k) a^k b^{n−k} m_k
= Σ_{k=0}^n (n choose k) a^k b^{n−k} m_k = ∫ f(x)^n dµ_A(x).
Writing out the first few terms of the matrix R(a, b) reveals its structure:

R(a, b) =
( (0 choose 0) a^0 b^0        0                      0                 · · ·
  (1 choose 0) a^0 b^1   (1 choose 1) a^1 b^0        0                 · · ·
  (2 choose 0) a^0 b^2   (2 choose 1) a^1 b^1   (2 choose 2) a^2 b^0   · · ·
        . . .                  . . .                  . . .            . . . ).
Using the above proposition we may easily rewrite the formula given in Proposition 3.27.

Corollary 3.33. Let R be defined as in the preceding proposition. Then the recurrence in
Proposition 3.27 for a specified IFS with probabilities M can be rewritten as

M_x = Σ_{i=1}^N p_i R(a_i, b_i) M_x. (3.2)

Define R_n(a, b) to be the n × n truncation of R(a, b) for a, b ∈ R. Furthermore, define the
operator Φ_{M,n} : R^n → R^n associated to an IFS with probabilities M through:

Φ_{M,n}(x) := Σ_{i=1}^N p_i R_n(a_i, b_i) x.
Remark 3.34. By putting M in the subscript of the operator Φ, we have specified all the
information needed to form Φ. This is important to note, as we will keep the notation Φ for
more general IFSs than the one first presented. Further, the operator Φ_{M,n} is linear on a
finite-dimensional space, so it may be represented by an explicit matrix. For this simple initial
case, we restrict the convergence analysis of an iterative formula for these moments to finite
dimensions.
Observation 3.35. The matrix that represents Φ_{M,n} is lower triangular, being a sum of lower
triangular matrices. Furthermore it has eigenvalues {Σ_{i=1}^N p_i a_i^j}_{j=0}^n.
Proposition 3.36. The sequence {Φ^k_{M,n}(v)}_{k=1}^∞ converges to a multiple of M_{x,n} as k → ∞ for
Lebesgue-almost any v ∈ R^n.

Proof. Equation 3.2 shows that M_{x,n} is an eigenvector of Φ_{M,n} with corresponding eigenvalue
one. By Observation 3.35, one is the maximum eigenvalue of Φ_{M,n}, so by the Von Mises iteration
(power iteration), {Φ^k_{M,n}(v)}_{k=1}^∞ converges to the dominant eigenvector for v not orthogonal to
M_{x,n}.
The above working yields two systematic, stable methods to calculate the moments of a one-dimensional
affine IFS in an efficient manner. One may either solve the linear system given in
Equation 3.2 or use the iterative method given above. The first method involves solving a lower
triangular system of equations, which can be done by forward substitution and recovers the
formula obtained in Corollary 3.28. The latter method gives a convergent geometric series whose
limit is a multiple of M_x. The convergence ratio for this geometric series is no larger than
Σ_{i=1}^N |p_i a_i| (this bounds the norm of the second largest eigenvalue of the operator Φ_{M,n}). The
latter method also avoids matrix inversion.
Example 3.37. Consider the IFS with probabilities given in Example 3.30. Then using the
iterative formula, we compute

Φ^5_{M_Cant,10}((1, . . . , 1)^T) = (1, 0.5000, 0.3750, 0.3125, 0.2718, 0.2421, 0.2192, 0.2009, 0.1860, 0.1736)^T,

which agrees with the exact calculation up to four decimal places.
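This iteration is equally direct to implement in floating point. A sketch (our own code, assuming numpy is available; because the first entry of the iterates stays fixed at 1, the power iteration converges to the moment vector itself rather than just a multiple of it):

```python
import numpy as np
from math import comb

def R_trunc(n, a, b):
    # n x n truncation of R(a, b): entries (i choose j) a^j b^(i-j) for j <= i
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1):
            M[i, j] = comb(i, j) * a**j * b**(i - j)
    return M

def iterate_moments(a, b, p, n=10, k=40):
    # power iteration for Phi_{M,n}; dominant eigenvalue is 1
    Phi = sum(pi * R_trunc(n, ai, bi) for pi, ai, bi in zip(p, a, b))
    v = np.ones(n)
    for _ in range(k):
        v = Phi @ v
    return v

m = iterate_moments([1/3, 1/3], [0.0, 2/3], [0.5, 0.5])
```

The second largest eigenvalue here is 1/3, so a few dozen iterations already agree with the exact Cantor moments to machine precision.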
3.2.2 One Dimensional Affine Complex Moments
This section presents the commonly used moment matrices studied in [EST07, JKS11], and
what was cited as a solution to the 'two-dimensional moment inverse problem' in [HM90].
The matrix equations below should be familiar from the previous subsection. We restrict
attention to IFSs with probabilities of the form

M = {(B_{1,C}, | · |); f_1(z) = a_1 z + b_1, . . . , f_N(z) = a_N z + b_N; p}, (3.3)

where B_{1,C} := {z ∈ C : |z| ≤ 1} and a_i, b_i ∈ B_{1,C} with |a_i| < 1.
Remark 3.38. As in the remark in the previous section, the assumption that the metric space
above is not all of C is not restrictive, as one can always scale an IFS attractor to fit inside the
unit ball.
Recall µ_A is the attractive measure for this IFS with probabilities. Let M_{z×z̄} ∈ ℓ^∞(C) × ℓ^∞(C),
the space of infinite matrices with bounded coefficients, be such that m_{i,j} := (M_{z×z̄})_{i,j} =
∫ z^i z̄^j dµ_A(z), with |m_{i,j}| ≤ 1. Finally let M_{z×z̄,n} be the n × n truncation of M_{z×z̄}.
Proposition 3.39. Let M be of the form 3.3. Then the moment matrix M_{z×z̄} of the invariant
measure µ_A obeys

M_{z×z̄} = Σ_{i=1}^N p_i R(a_i, b_i) M_{z×z̄} R*(a_i, b_i),

where R* is the conjugate transpose of R.
Proof. Through a calculation on the components of the matrix,

(Σ_{r=1}^N p_r R(a_r, b_r) M_{z×z̄} R*(a_r, b_r))_{i,j}
= Σ_{r=1}^N p_r Σ_{k=0}^∞ Σ_{l=0}^∞ 1_{i≥k} 1_{j≥l} (i choose k) a_r^k b_r^{i−k} (j choose l) ā_r^l b̄_r^{j−l} ∫ z^k z̄^l dµ_A(z)
= Σ_{r=1}^N p_r ∫ (a_r z + b_r)^i (ā_r z̄ + b̄_r)^j dµ_A(z)
= ∫ z^i z̄^j dMµ_A(z) = ∫ z^i z̄^j dµ_A(z) = (M_{z×z̄})_{i,j},

where the invariance of the measure µ_A has been used above.
Observation 3.40. The family of IFSs considered in this subsection (Equation 3.3) is a subset
of the similitude transformations. For instance, the generalised Sierpinski triangles (Example 2.1)
cannot be created with the above family of IFS (excluding the traditional Sierpinski), as complex
conjugation is not an isometry obtainable with the maps listed.
Similar to the real case in the previous subsection, an iterative formula can be constructed by
forming an appropriate linear transformation. Define the operator Φ_M : ℓ^∞(C) × ℓ^∞(C) →
ℓ^∞(C) × ℓ^∞(C) through:

Φ_M(Z) = Σ_{i=1}^N p_i R(a_i, b_i) Z R*(a_i, b_i).
The convergence of the iterates of this operator to the matrix of moments M_{z×z̄} has been
studied in [EST07, JKS11]. In [EST07], a fixed point method is constructed to prove uniform
convergence by selecting an appropriate closed subset of a Banach space on which this operator
is contractive. We will generalise this result in the next subsection. The convergence is justified
in [JKS11] by noting that M^n(µ) → µ_A weakly for any µ ∈ P(X) and identifying each entry
of the matrix as the integral of a specific continuous function. In our case, we will simply prove
the result in finite dimensions through an argument identical to that given in the previous
subsection.
In both of the aforementioned papers, the Hankel structure of the matrix R(a, b) was desirable
and thus kept, so that the Riordan arrays [WW08] could be used for cleanliness of computation.
We will rid ourselves of this structure now to reveal possible generalisations that may have been
overlooked. Let (e_{n1,n2})_{i,j} = 1_{i=n1, j=n2} for e_{n1,n2} ∈ ℓ^∞(C) × ℓ^∞(C), and let e′_n ∈ ℓ^∞(C) be
defined analogously. Define the vectorisation map v : ℓ^∞(C) × ℓ^∞(C) → ℓ^∞(C) through:

v(e_{n1,n2}) = e′_{τ(n1,n2)},

where τ(n1, n2) = (n1 + n2)(n1 + n2 + 1)/2 + n2. To aid with visualisation, this map does the
following on M_{z×z̄}:

M_{z×z̄} = ( m_{0,0} m_{0,1} m_{0,2} · · · ; m_{1,0} m_{1,1} · · · ; m_{2,0} · · · )
↦ (m_{0,0}, m_{1,0}, m_{0,1}, m_{2,0}, m_{1,1}, m_{0,2}, . . . )^T = v(M_{z×z̄}).

It is clear that this map has an inverse,

v^{-1}(e′_k) = e_{⌊k⌋_t − (k − ⌊k⌋_t(⌊k⌋_t + 1)/2), k − ⌊k⌋_t(⌊k⌋_t + 1)/2},

where ⌊x⌋_t := ⌊(1 + √(1 + 8x))/2⌋ − 1 gives the largest d with the triangular number d(d + 1)/2
not exceeding x. This inverse can be determined through a straightforward counting argument
using Pascal's triangle.
The map v is an isometry from (ℓ^∞(C) × ℓ^∞(C), ‖ · ‖_{∞,∞}) to (ℓ^∞(C), ‖ · ‖_∞). Define the linear
operator v ∘ Φ_M ∘ v^{-1} : ℓ^∞(C) → ℓ^∞(C), which may be represented by an infinite matrix; we
keep the symbol Φ_M for this vectorised operator, and let Φ_{M,n} be its n × n truncation.

Lemma 3.41. The vectorised operator v ∘ Φ_{M,n} ∘ v^{-1} has the same eigenvalues as Φ_{M,n}.

Proof. Take an eigenvalue λ and corresponding eigenvector X of Φ_{M,n} and note

Φ_{M,n} X = λX ⇐⇒ (v ∘ Φ_{M,n} ∘ v^{-1})(v X) = λ (v X).
Observation 3.42. The matrix that represents the vectorised operator Φ_{M,n} is lower triangular.
We illustrate this with a single-map IFS (N = 1) and n = 6:

Φ_{M,6} =
( 1         0          0          0        0        0
  b_1       a_1        0          0        0        0
  b̄_1       0          ā_1        0        0        0
  b_1^2     2a_1b_1    0          a_1^2    0        0
  b_1b̄_1    a_1b̄_1     ā_1b_1     0        a_1ā_1   0
  b̄_1^2     0          2ā_1b̄_1    0        0        ā_1^2 ).  (3.4)
For any n ∈ N_0, the operator Φ_{M,n} will be lower triangular from the ordering chosen in the
vectorisation map. The largest-degree polynomial term of z in (az + b)^i (āz̄ + b̄)^j will always
be a^i z^i ā^j z̄^j, which by construction of the vectorisation map v lies on the main diagonal of the
matrix. Additionally, by the ordering chosen in the vectorisation map, any lower-degree terms
will sit beneath the main diagonal. When additional maps are added to the IFS, we gain a sum
of lower triangular matrices, which is itself lower triangular. Therefore, in general Φ_{M,n} has
eigenvalues contained in {Σ_{k=1}^N p_k a_k^i ā_k^j}_{i,j∈N_0}, which are all of magnitude less than one for
(i, j) ≠ (0, 0) (this follows from the fact that |a_k| < 1, together with the triangle inequality).
The lemma and observation above, combined with the equation given in Proposition 3.39, show
that M_{z×z̄,n} is an eigenvector of Φ_{M,n} with eigenvalue one. Again by the reasoning given in
the real case of the former subsection, we have for almost any z ∈ C^n (with respect to Lebesgue
measure) that Φ^k_{M,n}(z) → c M_{z×z̄,n} as k → ∞ for some c ∈ C, as this is the dominant
eigenvector. This gives an iterative method to compute the moments in the complex case that
once again forms a geometric sequence, with ratio Σ_{i=1}^N p_i |a_i|. There is some merit to this
observation beyond explanatory reasons: we have determined precisely the eigenvalues of the
truncated linear operator, which gives strict bounds on the convergence of the iterative formula
in finite dimensions; this was noted in [EST07], where the authors state their bound is far from
strict, and in [JKS11], where no bound was needed and hence none was given.
3.2.3 Two Dimensional Affine Moments
The matrix in Equation 3.4 is sparse under the main diagonal. This alludes to the idea that
there may be a more general class of IFS whose moments may be computed with this structure.
In this subsection, it is found that the moments of an attractive measure given by an IFS
consisting of two-dimensional affine maps acting on (R^2, ‖ · ‖_2)^1 can be computed in a very
similar fashion. This is done by 'filling in' the matrix created in Equation 3.4. We do this by
considering an IFS F of N maps that has functions of the form (x, y) ↦ h_i(x, y) = A_i(x, y)^T + b_i,
where A_i ∈ R^{2×2} and b_i ∈ R^{2×1}. We adopt the standard notation for this type of IFS:

A_i = ( a_i  b_i ; c_i  d_i ),   b_i = ( e_i ; f_i ),   a_i, b_i, c_i, d_i, e_i, f_i ∈ R. (3.5)
Associate this IFS with a vector of probabilities p to make this an IFS with probabilities. Now
we aim to explicitly evaluate

m_{x^{n1}, y^{n2}} := ∫ x^{n1} y^{n2} dµ_A(x, y),
^1 As the attractor is compact, one can always make the underlying metric space compact by taking a sufficiently
large closed ball around the origin such that the attractor of the IFS is a subset and the maps in the IFS send
this ball into itself.
where µ_A is, as always, the invariant measure of the IFS with probabilities M. In this subsection
we again give two methods to compute these moments: the first based on solving a matrix
system, and the other based on an iterative formula. The analysis of the convergence of the
iterative formula will be done in infinite dimensions, unlike the two previous subsections, as
this is the most general case considered here. Before presenting any results, an example of
calculating the first three moments of an affine IFS is shown briefly so that the presented
content is not lost in notation.
Example 3.43. Consider an IFS of the form 3.5 and note m_{x^0,y^0} = 1. Further,

m_{x^1,y^0} = ∫ x dµ_A(x, y) = Σ_{i=1}^N p_i ∫ (a_i x + b_i y + e_i) dµ_A(x, y) = Σ_{i=1}^N p_i (a_i m_{x^1,y^0} + b_i m_{x^0,y^1} + e_i),
m_{x^0,y^1} = ∫ y dµ_A(x, y) = Σ_{i=1}^N p_i ∫ (c_i x + d_i y + f_i) dµ_A(x, y) = Σ_{i=1}^N p_i (c_i m_{x^1,y^0} + d_i m_{x^0,y^1} + f_i).

This yields

( m_{x^0,y^0} ; m_{x^1,y^0} ; m_{x^0,y^1} ) =
( 1                0                0
  Σ_{i=1}^N p_i e_i   Σ_{i=1}^N p_i a_i   Σ_{i=1}^N p_i b_i
  Σ_{i=1}^N p_i f_i   Σ_{i=1}^N p_i c_i   Σ_{i=1}^N p_i d_i ) ( m_{x^0,y^0} ; m_{x^1,y^0} ; m_{x^0,y^1} ).
Solving the above system of equations will give the first three moments of the measure µ_A. The
matrix above is not lower triangular. This means that an explicit recursive formula cannot be
defined for this calculation as was previously done in the one-dimensional case, nor can a Hankel
structure be identified or a Riordan array associated.
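Solving the 3 × 3 system for a concrete IFS is immediate. As a sketch (the Sierpinski-style maps below are an illustrative choice of ours, not from the thesis), the first moments recover the centroid of the attractor:

```python
import numpy as np

# h_i(x, y) = 0.5 (x, y) + t_i with t_i in {(0,0), (1/2,0), (0,1/2)}, p_i = 1/3
maps = [(0.5, 0.0, 0.0, 0.5, e, f) for e, f in [(0, 0), (0.5, 0), (0, 0.5)]]
p = [1 / 3] * 3

T = np.zeros((3, 3))
T[0, 0] = 1.0
for pi, (a, b, c, d, e, f) in zip(p, maps):
    T[1] += pi * np.array([e, a, b])   # row for m_{x^1, y^0}
    T[2] += pi * np.array([f, c, d])   # row for m_{x^0, y^1}

# the moment vector is a fixed point of T; impose m_{x^0,y^0} = 1 and solve
A = np.eye(3) - T
A[0] = [1.0, 0.0, 0.0]
m = np.linalg.solve(A, np.array([1.0, 0.0, 0.0]))
```

For this IFS the solution is (1, 1/3, 1/3): the expected landing point of the orbit is the centroid of the triangle with vertices (0, 0), (1, 0), (0, 1).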
Observation 3.44. The moments of an affine IFS in two dimensions can be calculated through
solving a system of linear equations. See the expansion:

m_{x^{n1}, y^{n2}} = ∫ x^{n1} y^{n2} dµ_A(x, y)
= Σ_{r=1}^N p_r ∫ (a_r x + b_r y + e_r)^{n1} (c_r x + d_r y + f_r)^{n2} dµ_A(x, y)
= Σ_{r=1}^N p_r ∫ [ Σ_{i+j+k=n1} (n1!/(i! j! k!)) (a_r x)^i (b_r y)^j (e_r)^k ] [ Σ_{i′+j′+k′=n2} (n2!/(i′! j′! k′!)) (c_r x)^{i′} (d_r y)^{j′} (f_r)^{k′} ] dµ_A(x, y)
= Σ_{r=1}^N p_r Σ_{i+j+k=n1, i′+j′+k′=n2} (n1! n2!/(i! j! k! i′! j′! k′!)) ∫ (a_r x)^i (b_r y)^j (e_r)^k (c_r x)^{i′} (d_r y)^{j′} (f_r)^{k′} dµ_A(x, y)
= Σ_{r=1}^N p_r Σ_{i+j+k=n1, i′+j′+k′=n2} (n1! n2!/(i! j! k! i′! j′! k′!)) a_r^i c_r^{i′} b_r^j d_r^{j′} e_r^k f_r^{k′} m_{x^{i+i′}, y^{j+j′}},
where the linearity of the integral has been used above and finite sums exchanged.
Let deg(m_{x^{n1},y^{n2}}) = n1 + n2; then deg(m_{x^{i+i′}, y^{j+j′}}) under the constraints i + j + k = n1 and
i′ + j′ + k′ = n2 is bounded by n1 + n2. For fixed n1 and n2 with the aforementioned constraints
enforced, the number of ways deg(m_{x^{i+i′}, y^{j+j′}}) may achieve this bound for i, i′, j, j′ ∈ N_0 is
n1 + n2 + 1.
Likewise, the number of solutions (n1, n2) ∈ N_0^2 to n1 + n2 = d is precisely d + 1. Consequently
the system of equations created is perfectly determined when the moment equations are linearly
independent.
As in the previous subsections, we aim to create a linear operator that allows for the iterative
calculation of these moments. As this is the most general case investigated in this chapter,
convergence analysis will be done on the infinite collection of moments for a specified self-similar
measure.
Notation 3.45. By selecting the terms that pertain to a fixed m_{n1,n2} in the expansion from the
observation above, define the (infinite) matrix φ_{h_r} associated with an affine map h_r by letting
(φ_{h_r})_{u1,u2} be

Σ_{τ(n1,n2)=u1, τ(i+i′, j+j′)=u2} [n1! n2! / (i! j! (n1−i−j)! i′! j′! (n2−i′−j′)!)] a_r^i c_r^{i′} b_r^j d_r^{j′} e_r^{n1−i−j} f_r^{n2−i′−j′} 1_{i+j≤n1} 1_{i′+j′≤n2},

for i, j, i′, j′, n1, n2 ∈ N_0. Note that τ is a bijective function in the above notation. The indicator
functions in the formula above ensure that the exponents on the e_r and f_r terms are natural
numbers, as in the derivation in the sum above.
Through construction,

M_{x,y} = Σ_{r=1}^N p_r φ_{h_r} M_{x,y}, (3.6)

for the infinite vector (M_{x,y})_{τ(i,j)} := m_{x^i,y^j} = ∫ x^i y^j dµ_A(x, y), which may not necessarily lie in
ℓ^∞ as no assumption on the attractor of the IFS has been made.
At this stage there are two options for calculating the moment vector M_{x,y} of µ_A:
1. truncate the matrix that represents Φ_M at the ⌊k⌋_t level for k ∈ N, to ensure a completely
determined system is left, and solve this system of equations;
2. find an iterative formula based on the operator Φ_M so that matrix inversion is avoided.
We will investigate the iterative formula here, as avoiding matrix inversion is desirable.
The form in which we have presented small examples is deceptive: as the size of the matrix
grows, we stray from a diagonal-like form and have to solve a system of equations which
is far more dense and difficult to solve. With this in place, if an iterative method were obtained,
then the larger-order moments could be computed with ease.
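To make the iterative option concrete, the truncated operator can be assembled directly from the multinomial expansion in Observation 3.44 and power-iterated; by the degree argument above, truncating to all moments of total degree at most some D is self-consistent. A sketch (all names and the Sierpinski-style test IFS are our own illustrative choices):

```python
import numpy as np
from math import factorial

def trinomials(n):
    # yields (i, j, k, n!/(i! j! k!)) over all i + j + k = n
    for i in range(n + 1):
        for j in range(n + 1 - i):
            k = n - i - j
            yield i, j, k, factorial(n) // (factorial(i) * factorial(j) * factorial(k))

def moment_operator(maps, p, deg):
    # truncated operator on moments m_{x^{n1} y^{n2}} with n1 + n2 <= deg,
    # ordered diagonal by diagonal as in the vectorisation map tau
    pairs = [(d - n2, n2) for d in range(deg + 1) for n2 in range(d + 1)]
    idx = {pq: t for t, pq in enumerate(pairs)}
    T = np.zeros((len(pairs), len(pairs)))
    for pr, (a, b, c, d, e, f) in zip(p, maps):
        for n1, n2 in pairs:
            for i, j, k, c1 in trinomials(n1):
                for i2, j2, k2, c2 in trinomials(n2):
                    coef = c1 * c2 * a**i * b**j * e**k * c**i2 * d**j2 * f**k2
                    T[idx[(n1, n2)], idx[(i + i2, j + j2)]] += pr * coef
    return T, pairs

maps = [(0.5, 0.0, 0.0, 0.5, e, f) for e, f in [(0, 0), (0.5, 0), (0, 0.5)]]
T, pairs = moment_operator(maps, [1 / 3] * 3, 4)
v = np.ones(len(pairs))
for _ in range(120):
    v = T @ v   # first entry stays 1, so v converges to the moment vector
```

For this IFS the iterates converge to the exact moments, e.g. m_{x^1,y^0} = m_{x^0,y^1} = 1/3 and m_{x^2,y^0} = 5/27, which one can check by hand from the fixed-point equation.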
Let 1ℓ^∞ := {s ∈ ℓ^∞(R) : (s)_0 = 1}, the space of bounded sequences that begin with 1. Equipping
this space with the infinity norm makes it complete, as 1ℓ^∞ is a closed subset of ℓ^∞ in the
supremum topology. We define the operator Φ_M : 1ℓ^∞ → 1ℓ^∞ through

Φ_M(x) = Σ_{r=1}^N p_r φ_{h_r} x.
It is readily checked that (Φ_M(x))_0 = 1 for any x ∈ 1ℓ^∞. It is untrue that the operator above
is bounded for every measure µ with compact support in R^2. For example, any compactly
supported measure whose support lies outside the ‖ · ‖_∞ unit ball will result in an unbounded
operator in the supremum topology, because the integrals ∫ x^{n1} y^{n2} dµ(x, y) can grow arbitrarily
large for n1, n2 ∈ N_0. Equation 3.6 shows that the moment vector of the measure µ_A is a fixed
point of Φ_M. Our goal now is to show that this fixed point is unique in 1ℓ^∞ equipped with a
certain topology.
The convergence proof we provide comes from a contractive fixed point argument in an
appropriate topology, generalising a result proven in [EST07]. The structure of this argument
is straightforward and will be summarised briefly should the reader choose to skim the details
of the following section. Firstly, under restrictive conditions the convergence is guaranteed
through a contractive fixed point argument. This simple case is then related to the general
instance, and finally an appropriate Banach space in which the general case reduces to the
restrictive solution is constructed.
Lemma 3.46. Let x, y ∈ 1ℓ∞ such that ‖x − y‖∞ < 1. Then
\[ \|\phi_{h_r}x - \phi_{h_r}y\|_{\infty} \le \sup_{n_1, n_2 \in \mathbb{N}_0} (|a_r| + |b_r| + |e_r|)^{n_1}(|c_r| + |d_r| + |f_r|)^{n_2}. \]
This follows through application of the triangle inequality and summing the rows of the absolute values of the matrix φhr. From now on we will assume that any element x ∈ ℓ∞ under consideration is also in 1ℓ∞. For notational convenience we will take ‖φhr‖∞ = sup{‖φhr x‖∞ : ‖x‖∞ ≤ 1, (x)0 = 0}, so that we may simply write the norm of the operator and not carry the difference φhr x − φhr y as above.
If each of these bounds is less than one, it is immediate that the operator ΦM is a contraction in the supremum topology. This first case is trivial, but the assumption is artificial: the convergence relies on the translation terms er, fr of the IFS, and their dependence should be nullified. We do this by appropriately transforming our space.
Lemma 3.47. Given an IFS F = {(X, d); f1, . . . , fn} with attractor A, the IFS g ◦ F ◦ g−1 =
{(X, d); g ◦ f1 ◦ g−1, . . . , g ◦ fn ◦ g−1} for a continuous invertible function g has attractor g(A)
when each of the functions g ◦ fi ◦ g−1 is contractive with respect to the metric d.
3.2. FRACTAL MOMENT THEORY 43
Proof. First note that g(A) ∈ H(X) as the image of a compact set under a continuous function. Now we can verify directly that
\[ (g \circ F \circ g^{-1})(g(A)) = \bigcup_{i=1}^{N} g \circ f_i \circ g^{-1}(g(A)) = g\Big(\bigcup_{i=1}^{N} f_i(A)\Big) = g(A), \]
where we have used the fact that F(A) = A.
Lemma 3.48. Given the assumptions of Lemma 3.47, if an associated IFS with probabilities M is given with attractor µA, then the IFS with probabilities g ◦ M ◦ g−1 has the attractor µA ◦ g−1.
Proof. By the definition of g−1 : H(X) → H(X), if x ∉ g(X) then g−1(x) = ∅. Consequently, g−1 is measurable. This ensures that the measure µA ◦ g−1 is well-defined. Now again we may directly compute
\[ \mu_A \circ g^{-1} = \Big(\sum_{i=1}^{N} p_i\, \mu_A \circ f_i^{-1}\Big) \circ g^{-1} = \sum_{i=1}^{N} p_i\, (\mu_A \circ g^{-1}) \circ g \circ f_i^{-1} \circ g^{-1}, \]
where again we have used the fixed point equation of µA.
Lemma 3.49. Given the function g : R² → R² such that (x, y) ↦ g(x, y) = (αx, βy) for α, β > 0, and the moments m_{µ,x^{n1},y^{n2}} := ∫x^{n1}y^{n2} dµ(x, y) of a two-dimensional measure µ, the measure µ ◦ g−1 has moments
\[ m_{\mu \circ g^{-1},\, x^{n_1},\, y^{n_2}} = \alpha^{n_1}\beta^{n_2}\, m_{\mu,\, x^{n_1},\, y^{n_2}}. \]
Proof. Through a direct calculation,
\[ \int x^{n_1} y^{n_2}\, d\mu \circ g^{-1}(x, y) = \alpha^{n_1}\beta^{n_2} \int x^{n_1} y^{n_2}\, d\mu(x, y). \]
Corollary 3.50. Define the linear operator S : 1ℓ∞ → 1ℓ∞ through
\[ (S(z))_{\tau(n_1, n_2)} = \alpha^{n_1}\beta^{n_2} z_{\tau(n_1, n_2)}. \]
Then the operator Φ_{g◦M◦g−1} : 1ℓ∞ → 1ℓ∞ acting on a moment vector M ∈ 1ℓ∞ is
\[ \Phi_{g \circ M \circ g^{-1}}(M) = \sum_{i=1}^{N} p_i\, S\phi_{h_i}S^{-1} M. \]
Remark 3.51. The operator S can be represented by an (infinite) diagonal matrix; that is, we are simply re-weighting the dimensions of the space. The corollary above is just noting that we
have changed the co-ordinates of our space, exactly in the same manner one proves the existence
of an attractor for a FIF in Example 2.11.
Proposition 3.52. Define the norm ‖·‖g on 1ℓ∞ through ‖·‖g := ‖S · ‖∞ and let an IFS with probabilities M be given. Then the corresponding moment operator ΦM obeys
\[ \|\Phi_M(x)\|_g \le \big\|\Phi_{g \circ M \circ g^{-1}}\big\|_{\infty} \|x\|_g, \]
for g of the form given in Lemma 3.49 and any x ∈ 1ℓ∞.
Proof. It is clear that ‖·‖g is a well-defined norm as S is linear and bijective. Recalling the notation given in Lemma 3.46, we have the bounds
\[ \|\Phi_M(x)\|_g = \Big\|\sum_{i=1}^{N} p_i \phi_{h_i}(x)\Big\|_g = \Big\|S\sum_{i=1}^{N} p_i \phi_{h_i}(x)\Big\|_{\infty} = \Big\|\sum_{i=1}^{N} p_i\, S\phi_{h_i}S^{-1} S(x)\Big\|_{\infty} \le \big\|\Phi_{g \circ M \circ g^{-1}}\big\|_{\infty} \|x\|_g, \]
where we have used that S is linear and bijective, and Corollary 3.50 in the last line.
Theorem 3.53. Let F be an IFS of the form described in Equation 3.5 with the added requirement that each bi = 0 and |ai|, |di| < 1. Then for the associated IFS with probabilities, M, with any stochastic vector p of length N, the iteration Φ_M^n(x0) converges to the vector of moments Mx,y of the attractive measure µA in the ‖·‖g topology for any initial x0 ∈ 1ℓ∞.
The assumptions above describe a reasonable type of IFS, such as those which generate fractal interpolation functions (Example 2.11). Indeed, calculating the joint spectral radius of the IFS shows that the only contractive triangular affine maps are those which obey the conditions above [ABVW10]. This instance of two-dimensional IFS is generally well understood in the literature; for example, its dimension theory is reasonably developed [BRS16].
Proof. Select g : R² → R² such that g(x, y) = (αx, βy) for α, β > 0; then each of the maps in the IFS g ◦ fi ◦ g−1 has the form
\[ g \circ f_i \circ g^{-1}(x, y) = \Big(a_i x + \alpha e_i,\ \frac{\beta}{\alpha} c_i x + d_i y + \beta f_i\Big). \]
These functions can be made contractive in the (R², ‖·‖θ) metric space in an identical fashion to Example 2.11, meaning that we have a valid IFS. Select α, β > 0 such that (|ai| + α|ei|) < 1 and ((β/α)|ci| + |di| + β|fi|) < 1, granting
\[ \big\|\Phi_{g \circ M \circ g^{-1}}\big\|_{\infty} \le \max_{r \in [N]} \sup_{n_1, n_2 \in \mathbb{N}} (|a_r| + \alpha|e_r|)^{n_1}\Big(\frac{\beta}{\alpha}|c_r| + |d_r| + \beta|f_r|\Big)^{n_2} < 1. \]
By Proposition 3.52, ΦM : (1ℓ∞, ‖·‖g) → (1ℓ∞, ‖·‖g) is a contraction. Note that as the first element of x ∈ 1ℓ∞ is fixed, the supremum above is only over N for n1 and n2. It is immediate that (1ℓ∞, ‖·‖g) is complete as (1ℓ∞, ‖·‖∞) is complete. Equation 3.6 then shows that the unique fixed point of the operator ΦM is the moment vector Mx,y of µA.
Remark 3.54. The proof above works for triangular systems, as we know precisely when such an IFS is contractive in terms of its map parameters. Furthermore, the infinite matrix representing the operator in this instance is triangular, so in finite dimensions one could argue convergence as in the previous subsections. The result is less refined when the matrix is dense, even in the case of two-dimensional similitudes, as evinced in the argument below. This result is sufficient and constructive for our usage in the next chapter, so we include it for thoroughness.
Theorem 3.55. Take an IFS of the form described in Equation 3.5, where we additionally assume that the matrix Ai has the form λiOi for an isometry Oi and 0 < λi < 1; that is, the functions in the IFS are similitudes. Then associating this IFS with probabilities in the same fashion as above, the iteration Φ_M^n(x0) converges to the vector of moments Mx,y of the attractive measure µA in the ‖·‖g topology for any x0 ∈ 1ℓ∞ when λi < 1/√2 for all i ∈ [N].
Proof. Select g(x, y) = (αx, αy) for α > 0; then each of the maps in the IFS g ◦ F ◦ g−1 has the form
\[ g \circ f_i \circ g^{-1}(x, y) = (a_i x + b_i y + \alpha e_i,\ c_i x + d_i y + \alpha f_i). \]
From the form of the functions in the IFS, the parameters necessarily obey
\[ \frac{1}{\lambda_i}(|a_i| + |b_i|) = \frac{1}{\lambda_i}(|c_i| + |d_i|) = |\sin(\theta)| + |\cos(\theta)| \le \sqrt{2} \]
for some θ ∈ [0, 2π], as the matrix Ai has entries relating to a rotation matrix. To gain a contraction as in the previous argument, we must additionally assume that λi < 1/√2, and then the argument follows in an identical manner.
Remark 3.56. The assumption λi < 1/√2 is unnatural. Recall that the topology on P(X) is equivalent to a weak topology. The operator ΦM is really acting on vectors of integrals of polynomials (which are dense in the continuous functions on the support of µA). Such integrals determine these measures under our assumptions on the IFS (Proposition 3.23). The operator ΦM maps the vector of moments corresponding to one measure µ to the vector of moments corresponding to M(µ). The convergence of lim_{n→∞} Φ_M^n(x), intuitively speaking, should depend only on the existence of an attractive measure for M. In general, this question is hard to answer in terms of the IFS parameters.
3.3 Computation of Two-Dimensional Affine Moments
Using Elton's Theorem [Elt87] we may computationally validate the new formulas and algorithms we have produced for calculating the moments arising from a two-dimensional affine IFS with probabilities. Here we go through the calculation of the first 21 moments of the Steemson triangle given in Example 2.1. A Python notebook with this experiment may be found here.
The moments of the Steemson triangle from our example were calculated directly by solving the set of linear equations. This calculation was done in 32-bit floating point arithmetic; the code provided can instead compute the moments exactly by this method when the IFS parameters are given as exact rational numbers. This computation yields
\[ M_{21} = \big(1,\ 0.4241632143,\ 0.2631243811,\ \ldots,\ 0.0225122445\big). \]
Two computational experiments were run here. Firstly, the moments were calculated through the iterative method discussed. We expected Φ^i_{M_{Steem},21}(x0) to converge uniformly at a geometric rate to M21, where x0 was selected as the vector of 1's. This is what is observed in the graph below on the left, whereby plotting the −log of the residuals shows the digits of accuracy increase with the number of iterations.
Next, the measure was estimated in the weak sense through Elton's theorem; pictures of the convergence of the empirical distribution are provided. The moment integral through this method is known to converge at a rate of O(1/√i), where i represents the length of an orbit of the dynamical system. The error between M21 and Elton's approximation was estimated for i ∈ {1, . . . , 10^6} and converged at the expected rate, as shown in the figure on the bottom right.
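The estimation scheme itself is short. Below is a minimal chaos-game sketch for a one-dimensional stand-in whose exact moments we know, the uniform Cantor measure with m1 = 1/2 and m2 = 3/8; the helper name and the choice of example are ours, not the notebook's.

```python
import random

def elton_moment_estimate(maps, probs, n_steps, powers, seed=0):
    """Estimate the moments int x^k dmu_A by an ergodic average along one
    chaos-game orbit, as justified by Elton's theorem; the error decays at
    roughly O(1/sqrt(n_steps))."""
    rng = random.Random(seed)
    x = 0.0   # any starting point works; 0 already lies on the attractor here
    sums = {k: 0.0 for k in powers}
    for _ in range(n_steps):
        # pick a map at random according to the probability vector, apply it
        x = rng.choices(maps, weights=probs)[0](x)
        for k in powers:
            sums[k] += x**k
    return {k: s / n_steps for k, s in sums.items()}

# Uniform Cantor measure: exact moments m_1 = 1/2 and m_2 = 3/8
maps = [lambda x: x / 3, lambda x: x / 3 + 2 / 3]
est = elton_moment_estimate(maps, [0.5, 0.5], 200_000, powers=(1, 2))
```

With 2 × 10^5 orbit points the estimates typically agree with the exact moments to two or three decimal places, consistent with the O(1/√i) rate.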
[Left panel: digits of accuracy, −log10 ‖M21 − Φ^i_{M_{Steem},21}(x0)‖, against iteration i ∈ [0, 20]. Right panel: ‖M21 − M_{elt,i}‖ against orbit length i up to 10^6.]
Figure 3.5: Convergence of the iterative formula and Elton's Theorem validation of the moment formulas.
Chapter 4
The Inverse Problem of Moments
‘Some systems of hundreds of polynomials can be solved, whilst others of five cannot.’
— Anand Deopurkar.
Moment reconstruction of a self-similar measure was the initial premise of attempts at the inverse problem of fractal approximation [BD85]. Shortly after this idea was given, many attempts were made to use moments in aid of reproducing a fractal measure [MS89, HM90, Man13]. Such methods aim to reconstruct self-similar measures in P(X); however this space is 'large': for every self-similar set in H(X) there are uncountably many self-similar measures supported on it. The associated difficulty of this problem led to the discovery of the algorithms used for fractal image compression (FIC), which all stay within the space H(X). Fundamentally, what we do differently in this chapter is to reconstruct elements of H(X) with methods made in P(X), in the same fashion that Lp was used to approximate elements of H(X) in Chapter 2. The intention here is that these methods from P(X) may create a more flexible model than those given by considering function spaces.
Three instances of fractal reconstruction through moments are given, each pertaining to one of the three examples presented in Chapter 2. Recall the examples are: a Cantor set, a Steemson triangle, and a truncated fractal interpolation function that has no exact IFS description. Firstly, numerical methods for solving polynomial systems are utilised in finding a one-dimensional self-similar measure through knowledge of its moments. This is done in exact rational arithmetic through the use of Grobner bases. Next, a relationship between H(X) and P(X) is established through theory pertaining to the open set condition, fractal tiling, and dimension theory [BHR06, BV18, BRS19]. This relationship is exploited to create a method whereby the attractor in H(X) given by an IFS of similitudes that obeys the OSC can be reconstructed numerically with moment and dimension theory methods. Finally, using the iterative methods for calculating moments found in the former chapter yields a moment approximation theory that can be used in approximating an object that is not self-similar. The method presented here has the ability to find an exact (up to rounding error) description of a self-similar object if one exists, unlike the method presented in Example 2.11.
4.1 Algebraic Solution to Polynomial Systems
This section introduces modern algebraic polynomial-solving techniques with an explicit example, showing a constructive formulation of a solution to the problem stated in the previous chapter (Example 3.1) that has not been previously investigated. This method is not generally suitable for modelling purposes, but can aid in the explicit study of an exact affine self-similar measure and will give some insight into the non-linear system of equations that we aim to solve. Constructive answers to questions like 'what are the dynamical systems that can be produced from an affine IFS of N maps given a self-similar measure?' can be given through this method. In particular, one may have an IFS that does not obey the open set condition and ask for the existence of another affine IFS that does obey the OSC whose IFS with probabilities has the same attractive measure. Simple examples of this situation can be constructed.
Example 4.1. Take the IFS FCant, which we have seen obeys the OSC and has the attractive measure of 'uniform Cantor measure' when given uniform probabilities on the maps. Then the following IFS
\[ F = \big\{([0, 1], |\cdot|);\ f_1(x) = \tfrac{x}{3},\ f_2(x) = \tfrac{x}{3} + \tfrac{2}{3},\ f_3(x) = \tfrac{x}{6},\ f_4(x) = \tfrac{x}{6} + \tfrac{2}{9}\big\} \]
has the Cantor set as an attractor and clearly does not obey the OSC, through a simple set containment argument. Equipping this IFS with the probability vector p = (1/4, 1/2, 1/8, 1/8) grants that the attractive measure is uniform Cantor measure.
This is an example of a perfectly overlapping IFS, made explicit through Hochman's Exponential Separation Condition (HESC) [BRS16, BRS18, BRS19], a separation condition less restrictive than the OSC. Determining when an IFS is perfectly overlapping is an open question in fractal geometry, which we mention as the tools and examples we develop in this chapter may aid in better understanding this problem.
Through the methods developed in the former chapter, the first n moments of any affine IFS in one or two dimensions can be calculated. Now assume these are given and that the measure from which they were taken is to be recovered. In this section, the technique used gives all the IFSs with probabilities whose attractive measure has the same first n moments. In many cases, when n is larger than the number of free parameters being solved for, this produces precisely all the IFSs which share a common attractive measure, as the non-linear system becomes completely determined by this finite collection of moments. We do not prove this in general as it would be a cumbersome detour from our modelling goal, but show a novel instance that also exemplifies how moment theory may intertwine the study of polynomials and self-similar measures.
Proposition 4.2. Assume the polynomials in the variables ai and bi,
\[ \sum_{j=0}^{n} \binom{n}{j} a_i^j b_i^{n-j} m_j, \]
have the form q_i^n for a fixed polynomial qi in ai, bi, for i ∈ [N], n ∈ N0, ai ∈ (−1, 1). Then if an IFS with probabilities is made from the parameters ai, bi with these constraints, its attractive measure µA is a point mass if ai ≠ 0, and otherwise µA is a sum of point masses.
Proof. Assume ai ≠ 0 for all i ∈ [N]. Looking at the case n = 1, it must be that qi = ai m1 + bi by the linearity of the integral. We claim that mn = m1^n; this is easily seen through induction. The base case is clear; then a calculation with the inductive hypothesis gives
\[ (a_i m_1 + b_i)^n + a_i^n(m_n - m_1^n) = (a_i m_1 + b_i)^n, \]
showing mn = m1^n when ai ≠ 0. In the latter case, the attractive measure is trivially a collection of point masses as ai = 0. Now we have the identity
\[ \sum_{i=1}^{N} p_i (a_i m_1 + b_i)^n = m_n = \sum_{i=1}^{N} p_i \int (a_i x + b_i)^n\, d\mu_A(x) \quad \forall n \in \mathbb{N}_0, \]
which occurs exactly when µA = 1_{(\cdot)}(m_1), the point mass at m1.
Remark 4.3. This proposition firstly shows, as one would expect, that a point mass with one free variable needs only one moment equation to fully describe the measure. Secondly, this example shows how the divisibility of polynomials given by the moment equations can be used to study the properties of a self-similar measure.
We now indirectly address the problem posed in Example 3.1 by viewing the system ΦM Mx = Mx, which is linear in the moment values, as a non-linear system of equations in the IFS parameters. In one dimension we have the (infinite) polynomial system of equations
\[ \bigg\{\, q_n(\mathbf{a}, \mathbf{b}, p) := \sum_{i=1}^{N} p_i \sum_{j=0}^{n} \binom{n}{j} a_i^j\, m_j\, b_i^{n-j} - m_n \,\bigg\}_{n=0}^{\infty}, \tag{4.1} \]
where we seek the common zeros of all of the polynomials in the above set. As a convenient notation, we have used qn(a, b, p) to mean qn is a polynomial in the variables ai, bi, pi for all i ∈ [N]. In general, one cannot solve a polynomial in one variable of degree five or larger in closed form, let alone an infinite multi-variable system of polynomials of increasing degree. For this reason, we must specialise to a certain self-similar measure for which to solve this inverse problem. We will solve this system of equations exactly for a single simple example, so that this might point toward a numerically feasible method for the general instance.
A brief overview of the key tools needed to analyse a polynomial system of equations (most of which can be found in the paper [MT01]) will be given. This example shows a link between the very heuristic method of identifying an IFS attractor and the systematic working found in computational algebra. From now on, this chapter can be read interactively with the Python notebook provided.
Formalism for how polynomial systems are typically structured is required to solve this system. A monomial in n variables x1, . . . , xn is the product x^j := x1^{j1} · · · xn^{jn} for j ∈ N0^n. The set of monomials in n variables is denoted T^n. A (real) polynomial in n variables is of the form p = ∑_j cj x^j with cj ∈ R, where we assume the sum has only finitely many terms. As is common notation, F[x] is the field F adjoining the variables x. Furthermore, if I := {qi}_{i=0}^{n} is a collection of n + 1 polynomials in the variables x whose coefficients come from F, then ⟨I⟩ is the ideal generated by these polynomials. We use the notation F[x] mod ⟨I⟩ to mean the polynomial ring F[x] quotiented by our polynomial ideal. For the remainder of this section we will always take F = Q, so we are dealing with polynomials that have rational coefficients; this allows for computation in exact arithmetic. It is clear from the equations derived in the former chapter that selecting IFS parameters with rational coefficients will always yield rational moments of the invariant measure.
There are 'good' and 'bad' generators for a polynomial ideal. A 'good' generating set is one for which certain information about the ideal is easy to deduce from its generators, such as the determination of a (linear1) basis for the quotient ring Q[x] mod ⟨I⟩. A Grobner basis is one such collection of generators and may be produced computationally through Buchberger's algorithm [Buc76]. To properly define a Grobner basis, a total term ordering on monomials is needed.
Definition 4.4. (total ordering)
Given the set of monomials in n variables
\[ T^n := \{x_1^{i_1} \cdots x_n^{i_n} \mid i_1, \ldots, i_n \in \mathbb{N}_0\}, \]
a total ordering ≺T is an operation on a pair of elements in T^n such that:
(i) 1 ≺T t for all t ∈ T^n \ {1};
(ii) if t1 ≺T t2 then t1 t ≺T t2 t for all t ∈ T^n.
That is, 1 is the smallest element in T^n and the ordering respects monomial multiplication.
We have implicitly used this definition in creating the map τ : N0² → N0 in the previous chapter, using the graded lexicographic ordering defined below with x ≺grlex y. It is useful to introduce the function LT≺T that returns the largest monomial term of a polynomial q, with its field coefficient, according to the ordering ≺T. This is aptly called the leading term of a polynomial (under the ordering ≺T).
1In the sense of linear algebra, as opposed to a polynomial basis where the coefficients themselves are permitted to be polynomials.
Example 4.5. Two possible examples of total orders are:
1. lexicographic ordering, ≺lex:
x1^{i1} · · · xn^{in} ≺lex x1^{j1} · · · xn^{jn} if il < jl for the largest l ∈ {1, . . . , n} such that il ≠ jl.
2. graded lexicographic ordering, ≺grlex:
x1^{i1} · · · xn^{in} ≺grlex x1^{j1} · · · xn^{jn} if ∑_{l=1}^{n} il < ∑_{l=1}^{n} jl. In the case of equality, lexicographic ordering is used.
For an explicit example, when n = 2 we have x1^4 ≺lex x2 and x2 ≺grlex x1^4.
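For concreteness, the two orderings can be sketched in a few lines of Python operating on exponent tuples; these helpers are illustrative only and not part of the accompanying notebook.

```python
def lex_less(i, j):
    """x^i < x^j in lexicographic order: compare exponents at the largest
    index where they differ (variables ordered x_1 < x_2 < ... < x_n)."""
    for a, b in zip(reversed(i), reversed(j)):
        if a != b:
            return a < b
    return False   # equal monomials are not strictly comparable

def grlex_less(i, j):
    """x^i < x^j in graded lexicographic order: total degree first,
    with ties broken by the lexicographic order."""
    if sum(i) != sum(j):
        return sum(i) < sum(j)
    return lex_less(i, j)

# In two variables: x1^4 precedes x2 lexicographically,
# yet x2 precedes x1^4 in the graded order.
x1_pow4, x2_pow1 = (4, 0), (0, 1)
```

The example from the text is recovered: `lex_less(x1_pow4, x2_pow1)` and `grlex_less(x2_pow1, x1_pow4)` both hold.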
Remark 4.6. Total orderings are often regarded as artificial constraints for computation, as the theory only needs the partial ordering on monomials through their divisibility. Whilst this is mathematically true, the ordering chosen plays a great role in the simplification of a polynomial system. It is known that working with lexicographic ordering is computationally more expensive than working with graded lexicographic ordering.
A negative neighbour of a monomial ∏_{i=1}^{n} xi^{ji} is another monomial of the form xk^{jk−1} ∏_{i≠k} xi^{ji} for some k ∈ {1, . . . , n} with jk > 0. A set of monomials is called closed if each monomial in the set contains all of its negative neighbours.
Definition 4.7. For a specified closed set V ⊂ T^s, the border set is B(V) = {x^j ∈ T^s : x^j ∉ V, but some negative neighbour of x^j is in V}.
The definition given above is all that is needed to define a (border) Grobner basis. Before we give such a formulation, an example of some border sets and closed sets is provided.
Example 4.8. Consider the closed monomial sets in two variables x1, x2:
\[ \{1,\ x_1,\ x_2,\ x_1 x_2\} \qquad \{1,\ x_1,\ x_1^2,\ x_1^3\}. \]
To visualise these sets and deduce the corresponding border set, a lattice plot is enlightening.
[Two lattice plots over the monomials 1, x1, . . . , x1^4 and x2, . . . , x2^3.]
Figure 4.1: Two closed sets of monomials (large circles) and their corresponding border sets (diamonds). Additional details of the figure will be described later.
We may now define a Grobner basis.
Proposition 4.9. Let a polynomial ideal ⟨I⟩ be given. Then there exists a unique closed set V for a specified term order ≺T such that the polynomials
\[ g_k := z_k - (z_k \bmod \langle I \rangle) \quad \forall z_k \in B(V), \]
form a generating set for the ideal ⟨I⟩. This set of generators we define as the (border1) Grobner basis of ⟨I⟩.
As we are using this proposition for applied purposes, we will show an example and mention how one can construct a proof based on the example, as opposed to providing a proof explicitly; such a proof can be found in [Ste04].
Example 4.10. Take the polynomial system in two variables R = {r1(x1, x2) = x1² + 4x2² − 4, r2(x1, x2) = 9x1² + x2² − 2x2 − 8}, and define the ideal ⟨R⟩. The zero sets of r1 and r2 (the (x1, x2) such that r1 = r2 = 0) represent two ellipses in the (x1, x2)-plane. The solution of this polynomial system corresponds to how the ellipses r1 = 0 and r2 = 0 intersect one another; for this example they are constructed so that there are four distinct points of intersection. We may now compute explicitly the border Grobner basis for graded lexicographic ordering. We do this by displaying a simplified version of the general algorithm by which it can be computed.

Let lcm_{r1,r2} be the least common multiple of the leading monomials of r1 and r2; then
\[ S_{r_1, r_2} := \frac{\mathrm{lcm}_{r_1, r_2}}{LT_{\prec_{\mathrm{grlex}}}(r_2)}\, r_2 - \frac{\mathrm{lcm}_{r_1, r_2}}{LT_{\prec_{\mathrm{grlex}}}(r_1)}\, r_1 = r_2 - \tfrac{1}{4} r_1, \]
which, after clearing denominators, is 4r2 − r1 = 35x1² − 8x2 − 28.

The polynomial S_{r1,r2} is a syzygy: a non-trivial polynomial combination of r1 and r2. To compute a Grobner basis, one forms all such syzygies from the initial generators of the ideal. After this has been completed, the syzygies are reduced through multivariate polynomial division; repeating this process in a specific manner is Buchberger's algorithm for finding a Grobner basis.
For the sake of brevity we reduce our example directly. Normalising, set g1 := (4r2 − r1)/35 = x1² − (8/35)x2 − 4/5; combining g1 with r1 gains the polynomial g2 := x2² + (2/35)x2 − 4/5. An elementary argument shows ⟨r1, r2⟩ = ⟨g1, g2⟩. Define g3 = x2 g1 and g4 = x1 g2; then ⟨g1, g2⟩ = ⟨g1, g2, g3, g4⟩. Let V be the closed set displayed in Figure 4.1 on the left; then by uniqueness {gk}_{k=1}^{4} are the generators of the border Grobner basis of the ideal ⟨R⟩.
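The point of the reduction is that the system becomes triangular: a univariate polynomial in x2 plus a back-substitution for x1. A small numerical check of this, using only the two ellipse equations (the elimination r2 − 9r1 removes x1² directly), can be sketched as:

```python
import math

# The two ellipses from Example 4.10.
def r1(x1, x2):
    return x1**2 + 4 * x2**2 - 4

def r2(x1, x2):
    return 9 * x1**2 + x2**2 - 2 * x2 - 8

# Eliminating x1^2 (here via r2 - 9*r1) leaves a univariate quadratic:
#   -35*x2^2 - 2*x2 + 28 = 0,  i.e.  x2^2 + (2/35)*x2 - 4/5 = 0,
# so the system is triangular and can be solved by back-substitution.
b, c = 2 / 35, -4 / 5
disc = math.sqrt(b * b - 4 * c)
roots_x2 = [(-b + disc) / 2, (-b - disc) / 2]

points = []
for x2 in roots_x2:
    x1 = math.sqrt(4 - 4 * x2**2)          # back-substitute into r1 = 0
    points.extend([(x1, x2), (-x1, x2)])   # four intersection points in total
```

All four recovered points satisfy both original equations to floating point accuracy, matching the four intersections of the two ellipses.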
Although the border Grobner basis is unique for a specified term order, not all of its polynomials are needed to generate the ideal. In the instance above, we saw ⟨g1, g2⟩ generated ⟨R⟩; this is the
1A Grobner basis need not be minimal and therefore is not unique; thus in the phrasing 'the' we have actually described a 'border Grobner basis' that does enjoy these qualities, up to normalisation of the leading term. A particular subset of this border set, referred to as the corner set, is called a reduced Grobner basis and is what is most commonly used for computation.
reduced Grobner basis; its generators' leading monomials are marked with crosses in Figure 4.1. This is the corner set of the border set, defined as the border elements whose negative neighbours are all contained in the closed set V. If we carried out the same sequence of calculations with lexicographic ordering instead, we would recover the image on the right.
To see why a Grobner basis is a 'good' generating set, an analysis of Figure 4.1 will be given. Let a polynomial r ∈ Q[x1, x2] be given and assume the graded lexicographic ordering on monomials. Assume r is normalised in the sense that its leading term has coefficient one. We wish to represent r in the quotient ring Q[x1, x2] mod ⟨R⟩. With reference to Figure 4.1 we have two cases to consider.
1. If LT(r) ≺ LT(gk) for every gk, then r = r mod ⟨R⟩. If this were not the case, then there would exist some zk ∈ B(V) such that lcm(zk, LT(r)) = zk, so the generator gk = zk − (zk mod ⟨R⟩) could be replaced by the polynomial r, altering the closed set V. This contradicts the uniqueness of the border Grobner basis.
2. Otherwise, there exists a generator gk that can reduce r by polynomial division. The leading term of r, marked on the lattice given in Figure 4.1, would have its remainder's leading term strictly less than the monomial that divided it (this ordering is given by dashed black lines in the figure). The algorithm runs until the remainder of such divisions obeys the first case, in which scenario the representation of r mod ⟨R⟩ has been determined.
The monomials shaded in blue in Figure 4.1 can all be reduced to the closed set V through either the border Grobner basis or the (more minimal) reduced Grobner basis. The monomials that may be reduced by an element of the reduced Grobner basis are encapsulated by the dashed blue lines in Figure 4.1.
Thus, the closed set V defines a (linear) basis for the quotient ring Q[x] mod ⟨I⟩ [Ste04]. This quotient ring is finite dimensional precisely when V is finite. Solving a polynomial system is computationally feasible when the quotient ring is finite dimensional, or equivalently when there are finitely many solutions to the polynomial equations [Ste04]. Some self-similar measures are degenerate in this regard.
Example 4.11. Lebesgue measure supported on [0, 1] will always have a one-dimensional component in its solution set for the polynomials given by Equation 4.1 when N = 2. Consider the family of IFSs with probabilities
\[ M_\alpha = \{([0, 1], |\cdot|);\ f_1(x) = \alpha x,\ f_2(x) = (1 - \alpha)x + \alpha;\ p_1 = \alpha,\ p_2 = 1 - \alpha\}, \]
whose attractive measure is Lebesgue measure on the unit interval for any α ∈ (0, 1).
Intuitively, selecting a fractal measure whose support has fewer affine symmetries than a straight line should yield a problem that we may readily solve. Referring back to the first of our examples presented in Chapter 2, a Cantor set equipped with uniform measure, which has only finitely many affine IFSs of two maps that construct it (found through the current standard of the 'graduate student algorithm'), will be explicitly solved as our example. In light of Example 3.30, this produces the following polynomial system {qn(a, b, p)}_{n=0}^{5}:
\begin{align*}
q_0 &= p_1 + p_2 - 1;\\
q_1 &= a_1 p_1/2 + a_2 p_2/2 + b_1 p_1 + b_2 p_2 - 1/2;\\
q_2 &= 3a_1^2 p_1/8 + a_1 b_1 p_1 + 3a_2^2 p_2/8 + a_2 b_2 p_2 + b_1^2 p_1 + b_2^2 p_2 - 3/8;\\
q_3 &= 5a_1^3 p_1/16 + 9a_1^2 b_1 p_1/8 + 3a_1 b_1^2 p_1/2 + 5a_2^3 p_2/16 + 9a_2^2 b_2 p_2/8 + 3a_2 b_2^2 p_2/2 + b_1^3 p_1 + b_2^3 p_2 - 5/16;\\
q_4 &= 87a_1^4 p_1/320 + 5a_1^3 b_1 p_1/4 + 9a_1^2 b_1^2 p_1/4 + 2a_1 b_1^3 p_1 + 87a_2^4 p_2/320 + 5a_2^3 b_2 p_2/4\\
&\quad + 9a_2^2 b_2^2 p_2/4 + 2a_2 b_2^3 p_2 + b_1^4 p_1 + b_2^4 p_2 - 87/320;\\
q_5 &= 31a_1^5 p_1/128 + 87a_1^4 b_1 p_1/64 + 25a_1^3 b_1^2 p_1/8 + 15a_1^2 b_1^3 p_1/4 + 5a_1 b_1^4 p_1/2 + 31a_2^5 p_2/128\\
&\quad + 87a_2^4 b_2 p_2/64 + 25a_2^3 b_2^2 p_2/8 + 15a_2^2 b_2^3 p_2/4 + 5a_2 b_2^4 p_2/2 + b_1^5 p_1 + b_2^5 p_2 - 31/128.
\end{align*}
For readability, the explicit Grobner basis polynomials are not shown but are presented in the accompanying Jupyter notebook. Appealing to symbolic computing, our system has the following corner set under the graded lexicographic ordering given by p2 ≺ p1 ≺ a2 ≺ a1 ≺ b2 ≺ b1, which we will denote pi ≺ ai ≺ bi from now on for ease:
\[ \{a_1^4 a_2^2,\ a_1^2 a_2^4,\ a_1^2 a_2^2 b_1,\ a_2^4 b_1,\ a_1^2 b_2^3,\ a_2^2 b_2^2 p_2,\ a_1^4 b_2,\ a_1^2 a_2^2 b_2,\ a_2^2 b_2 p_2^2,\ a_2^4 p_2,\ a_2^2 b_1^2,\ a_1^2 b_1 b_2,\ b_2^3 p_2,\ b_1^2 b_2,\ b_1 b_2^2,\ a_1^2 p_2,\ b_1 p_2,\ p_1\}. \]
Note that the monomials {p_2^n}_{n=1}^{∞} are not 'bounded' by the leading monomials of our Grobner basis. This result is unintuitive, as it states that we have infinitely many solutions to our polynomial system and thus infinitely many ways to construct uniform Cantor measure with two maps. This statement is false. As it stands, the polynomial system of equations given addresses the unconstrained inverse problem. It is true that any IFS parameters that produce a Cantor set will be a solution to the system of polynomials {qn(a, b, p)}_{n=0}^{∞}; the converse is false, as evinced by the following observation.
Observation 4.12. The infinite polynomial system {qn(a, b, p)}_{n=0}^{∞} has the following (N − 1)-dimensional zero set
\[ \Big\{\, a_i, b_i, p_i \in \mathbb{R} \ \Big|\ a_i \in \{-1, 1\},\ b_i = 0\ \ \forall i \in \{1, \ldots, N\};\ \sum_{i=1}^{N} a_i p_i = 1 \,\Big\} \]
contained in the combined solution set.
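One instance of this zero set from Observation 4.12 can be checked directly: take ai = 1 and bi = 0 for both maps, with any stochastic p (so that ∑ ai pi = 1 automatically). A small exact-arithmetic sketch, reusing the Cantor moments as the mn:

```python
from fractions import Fraction as F
from math import comb

# Moments of uniform Cantor measure (the constants appearing in q_0, ..., q_5).
m = [F(1), F(1, 2), F(3, 8), F(5, 16), F(87, 320), F(31, 128)]

def q(n, a, b, p):
    """The moment polynomial q_n of Equation 4.1 evaluated at (a, b, p)."""
    return sum(pi * sum(comb(n, j) * ai**j * m[j] * bi**(n - j)
                        for j in range(n + 1))
               for ai, bi, pi in zip(a, b, p)) - m[n]

# A 'trivial' zero: a_i = 1, b_i = 0, any stochastic p gives sum a_i p_i = 1,
# yet these parameters describe no contractive IFS at all.
trivial = [q(n, [F(1), F(1)], [F(0), F(0)], [F(1, 3), F(2, 3)]) for n in range(6)]
```

Every qn vanishes even though the parameters correspond to identity maps rather than a contractive IFS, which is exactly why these zeros must be excluded before any numerical search.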
Selecting ai ∉ (−1, 1) contradicts the maps fi being contractive. As it stands, this observation kills any numerical feasibility of using these moment equations for modelling purposes. The moment method of fractal reconstruction is known to be notoriously unstable in computation [HM90, Man13]. If we were to naively look for a least squares solution to our polynomial system for measure reconstruction, one of the uncountably many 'trivial' zeros given above may be found. We cannot restrict an optimisation to a ∈ (−1, 1) as this is an open set, and numerical methods typically require at least a closed set for their convergence theorems to hold.
Let P be a polynomial system in n variables x1, . . . , xn. Adding the polynomial qt(xi, t) = (xi − a)t − 1, for a newly introduced variable t and constant a ∈ Q, to this system guarantees that xi = a is not a solution of the system of equations. Furthermore, any solution of P that does not have xi = a remains a solution of the new polynomial system. Finally, when the underlying field has an ordering on its elements (such as R or Q, but not C), adding the polynomial qt(xi, t) = (xi − a)t² − 1 forces the constraint xi > a. With these points made, our polynomial system can be turned into a constrained problem through the addition of these constraint variables.
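The mechanism is elementary: a root with xi = a cannot be extended to a value of t satisfying (xi − a)t − 1 = 0, while every other root extends uniquely. A toy sketch with the system {x² − 1} and the excluded value a = 1 (helper name ours):

```python
def solutions_with_exclusion(roots, a):
    """Keep only roots x for which (x - a) t - 1 = 0 is solvable for t,
    i.e. x != a; the excluded root cannot extend to the augmented system."""
    kept = []
    for x in roots:
        if x != a:
            t = 1 / (x - a)                # the unique extension in t
            assert (x - a) * t - 1 == 0    # the added constraint is satisfied
            kept.append((x, t))
    return kept

# Roots of x^2 - 1 are +-1; adjoining (x - 1) t - 1 removes the root x = 1.
kept = solutions_with_exclusion([1.0, -1.0], 1.0)
```

Only the pair (x, t) = (−1, −1/2) survives, mirroring how the constraint polynomials below will remove the trivial ai = ±1 zeros.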
Unfortunately, placing all of the constraints needed to enforce a_i ∈ (−1, 1) on our polynomial system
prevents the computation being completed in any reasonable time1. The trivial solutions found
in Observation 4.12 can be parameterised by a_i = ±1, granting that only the additional constraint
polynomials q_{i1}(a_i, t_{i1}) := (a_i − 1)t_{i1} − 1 and q_{i2}(a_i, t_{i2}) := (a_i + 1)t_{i2} − 1 for all i ∈ [N] need
to be added to eliminate this one-dimensional (trivial) solution set. As our system grows
in total degree with the increase of the order of the moments, choosing any type of lexicographic
ordering will result in large computation times, so using a graded ordering was an intuitive
choice. Explicitly, we may select the ordering b_i ≺ a_i ≺ t_i ≺ p_i and use graded lexicographic
ordering, yielding the following simple set of equations as the corresponding reduced Gröbner basis:
a_2 b_2 − a_2/2 + b_2^2 − b_2 + 1/6,        a_1/2 + a_2/2 + b_1 + b_2 − 1,
9a_2/8 + t_{21} + 9/8,        9a_2/8 + t_{22} − 9/8,
9a_1/8 + t_{11} + 9/8,        9a_1/8 + t_{12} − 9/8,
a_1^2 − 1/9,        a_2^2 − 1/9,
p_1 − 1/2,        p_2 − 1/2.
The system above can be easily solved directly, so we do not continue with the work presented
in [MT01]. We will make mention of the technique they give for when there are finitely many
solutions to the polynomial system, to show this procedure could be used for more complicated
fractal measures. When the quotient ring is zero dimensional, then the common zeros of the
polynomials in the system can be enumerated as {z_i}_{i=1}^d, where z_i ∈ Q^n. For the sake of simplicity
we assume that these are simple zeros, that is, there are no repeated roots of the polynomials.
Then one can represent the polynomials in our quotient ring with the Lagrange interpolating
polynomial basis, whereby the basis polynomials L_i are chosen such that L_i(z_j) = 1_{i=j}.
Informally, one may think of the quotient ring as a d-dimensional space; the linear basis of this
1This computation was left for one week without completing. The only bounds given for algorithms to produce
a Gröbner basis are ‘super exponential’.
space given by the Lagrange polynomials forms the ‘standard unit basis’ {e_i}_{i=1}^d.
It follows that multiplication by a monomial term in the quotient ring using this Lagrange basis
can be represented by a diagonal matrix, and so the multiplication matrices given in any linear
basis of this space are related to the Lagrange basis by an eigenvalue decomposition. An
in-depth discussion of this can be found in [MT01, Ste04], but the key detail is that once the
monomial multiplication matrices are identified (with some additional mild assumptions on the
border basis), the solution of the system can be obtained by finding the eigenvectors of the
multiplication matrix given in any basis, which are easily deduced from a Gröbner basis.
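As a sanity check on the reduced basis displayed earlier, its polynomials can be verified to vanish on all eight solutions of the Cantor inverse problem (a Python sketch with exact rational arithmetic; the polynomial forms are our reconstruction of the displayed basis, omitting the t-variables, which are determined by the a_i):

```python
from fractions import Fraction as Fr

third = Fr(1, 3)
# (a1, b1, a2, b2) for the eight solutions F1..F8
solutions = [
    (third, Fr(0), third, Fr(2, 3)),
    (third, Fr(2, 3), third, Fr(0)),
    (-third, Fr(1), third, Fr(0)),
    (-third, Fr(1, 3), third, Fr(2, 3)),
    (third, Fr(0), -third, Fr(1)),
    (third, Fr(2, 3), -third, Fr(1, 3)),
    (-third, Fr(1, 3), -third, Fr(1)),
    (-third, Fr(1), -third, Fr(1, 3)),
]

def basis(a1, b1, a2, b2, p1, p2):
    # the reduced Groebner basis elements not involving the t-variables
    return [
        a2 * b2 - a2 / 2 + b2**2 - b2 + Fr(1, 6),
        a1 / 2 + a2 / 2 + b1 + b2 - 1,
        a1**2 - Fr(1, 9),
        a2**2 - Fr(1, 9),
        p1 - Fr(1, 2),
        p2 - Fr(1, 2),
    ]

for a1, b1, a2, b2 in solutions:
    assert all(g == 0 for g in basis(a1, b1, a2, b2, Fr(1, 2), Fr(1, 2)))
print("all eight solutions annihilate the basis")
```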
With this stated, the solution of our Cantor set inverse problem is readily obtained. Let
F_Cantor = {F_i = {([0, 1], | · |); f_1 = a_1 x + b_1, f_2 = a_2 x + b_2}}_{i=1}^8, where the coefficients of
each F_i are given in the table below:
        F1     F2     F3     F4     F5     F6     F7     F8
a1     1/3    1/3   −1/3   −1/3    1/3    1/3   −1/3   −1/3
a2     1/3    1/3    1/3    1/3   −1/3   −1/3   −1/3   −1/3
b1      0     2/3     1     1/3     0     2/3    1/3     1
b2     2/3     0      0     2/3     1     1/3     1     1/3
It is immediate that all of these solutions will give a Cantor set, and furthermore, uniform
Cantor measure when the maps are given equal probabilities. This gives the following corollary
through realising that the solutions given above are not only the rational solutions, but all of
the real ones as well, noting the simplicity of our reduced polynomial system.
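The claim that each listed pair of maps reproduces the Cantor set can be spot-checked in exact arithmetic: every column must send [0, 1] onto the two first-level Cantor intervals (a small Python sketch, transcribing the coefficient table):

```python
from fractions import Fraction as Fr

# (a1, b1, a2, b2) for F1..F8, transcribed from the coefficient table
coeffs = [
    (Fr(1, 3), Fr(0),    Fr(1, 3),  Fr(2, 3)),
    (Fr(1, 3), Fr(2, 3), Fr(1, 3),  Fr(0)),
    (Fr(-1, 3), Fr(1),   Fr(1, 3),  Fr(0)),
    (Fr(-1, 3), Fr(1, 3), Fr(1, 3), Fr(2, 3)),
    (Fr(1, 3), Fr(0),    Fr(-1, 3), Fr(1)),
    (Fr(1, 3), Fr(2, 3), Fr(-1, 3), Fr(1, 3)),
    (Fr(-1, 3), Fr(1, 3), Fr(-1, 3), Fr(1)),
    (Fr(-1, 3), Fr(1),   Fr(-1, 3), Fr(1, 3)),
]

def image(a, b):
    # image of [0, 1] under x -> a*x + b, returned as sorted endpoints
    return tuple(sorted((b, a + b)))

for a1, b1, a2, b2 in coeffs:
    pieces = sorted([image(a1, b1), image(a2, b2)])
    # first-level Cantor intervals: [0, 1/3] and [2/3, 1]
    assert pieces == [(Fr(0), Fr(1, 3)), (Fr(2, 3), Fr(1))]
print("all eight IFSs reproduce the first Cantor generation")
```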
Corollary 4.13. The only affine IFSs of two maps that produce uniform Cantor measure are
those listed in F_Cantor. Furthermore, all IFSs of N maps that produce uniform Cantor measure
are scaled copies of these, in the sense that if
µ_C = ∑_{i=1}^N p_i µ_C ◦ f_i^{−1},
then f_i(x) = (1/3^n)(a_j x + b_j) + c for a_j, b_j in the table above, n ∈ {0, . . . , N − 2} and
c ∈ F_Cant^{N−1}({0, 1}). Note this includes IFSs that do not obey the OSC.
The methodology here answers a related question to that posed in Example 3.1 exactly and
systematically. This is a constructive method that can be performed with modern computing
power to find all IFSs of N maps with probabilities for a given attractive measure. Asking ‘how
many ways one can construct a Sierpinski triangle’ motivated the discovery of the generalised
Sierpinski triangles which have since appeared in [SW18, BV18, Ste18, Gra18]. It would be
interesting to study questions such as ‘when is an attractive measure uniquely represented by
an IFS with probabilities of N maps’, through analysing the polynomial systems given by the
moments. This example reveals a new tool in fractal geometry through this link to computational
algebraic geometry.
Observation 4.14. There are infinitely many ways to form uniform Cantor measure using an
affine IFS with probabilities of three or more maps. For instance, the family
M_P = {([0, 1], | · |); f_1 = x/3, f_2 = x/3 + 2/3, f_3 = x/3 + 2/3; p = (1/2, α, 1/2 − α)}
for α ∈ [0, 1/2] has an attractive measure of µ_C. However, there are only finitely many ways to
construct uniform Cantor measure with N maps when the underlying IFS obeys the OSC. We
investigate the OSC in the next section.
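The invariance in α can be checked on the moments: iterating the self-similarity relation for moments from Chapter 3 (our own transcription of the one-dimensional recursion, a sketch rather than the thesis's exact operator) gives the same moment vector for every α, since the two duplicated maps carry total probability 1/2:

```python
from fractions import Fraction as Fr
from math import comb

def cantor_moments(alpha, n_max=4):
    # maps x/3, x/3 + 2/3, x/3 + 2/3 with probabilities (1/2, alpha, 1/2 - alpha)
    ifs = [(Fr(1, 3), Fr(0)), (Fr(1, 3), Fr(2, 3)), (Fr(1, 3), Fr(2, 3))]
    p = [Fr(1, 2), alpha, Fr(1, 2) - alpha]
    M = [Fr(1)]                       # M_0 = 1 (normalised measure)
    for n in range(1, n_max + 1):
        # self-similarity: M_n = sum_i p_i sum_k C(n,k) a_i^k b_i^(n-k) M_k
        rhs = sum(pi * sum(comb(n, k) * a**k * b**(n - k) * M[k]
                           for k in range(n))
                  for pi, (a, b) in zip(p, ifs))
        M.append(rhs / (1 - sum(pi * a**n for pi, (a, b) in zip(p, ifs))))
    return M

vals = {alpha: cantor_moments(alpha) for alpha in (Fr(0), Fr(1, 4), Fr(1, 2))}
assert len(set(map(tuple, vals.values()))) == 1   # independent of alpha
assert vals[Fr(0)][1] == Fr(1, 2) and vals[Fr(0)][2] == Fr(3, 8)
print(vals[Fr(0)])
```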
As previously commented, the space P(X) is ‘larger’ than H(X). The observation given above
shows that solving the inverse problem in P(X) in an unconstrained manner is exceptionally
difficult. Since our original modelling goal was to find a computational way to find an IFS for a
given self-affine set (an element of H(X)), a relationship between H(X) and P(X) must be established.
Assume a black and white image of a Cantor set (of infinite resolution) is given. To model this,
it can be viewed as either an element of H(X) or an empirical distribution of a measure in P(X).
There should be some choice of the additional parameters, the probabilities, restricting the IFSs
acting on P(X) so that the corresponding IFS on H(X) has an attractor modelling the same black
and white image. In general, the answer to this is unknown [BRS16]. When the IFS consists of
similitudes and obeys the open set condition, an answer is available. To explain this relationship,
a more in-depth understanding of the OSC, tiling theory and Hausdorff dimensions is needed.
4.2 Hausdorff Dimension and The Open Set Condition
In our introductory modelling chapter we introduced the concepts of box-counting dimension and
the open set condition at a very shallow level to reveal considerations taken in traditional fractal
modelling. In this section, a deeper insight into these topics is given. This was intentionally
done so that we may reveal the interplay between H(X) and P(X) from a modelling perspective
now that both of these spaces have been introduced. For clarity we will limit the form of the
IFSs discussed here to only contain functions which are similitudes.
Firstly, a constructive viewpoint of the OSC is shown, then the Hausdorff dimension is defined,
and finally these theories are combined to unite H(X) and P(X).
4.2.1 The Open Set Condition
Recall an IFS, F = {(X, d); f_1, . . . , f_N}, obeys the open set condition if there exists a non-empty
open set O ⊂ X such that ⋃_{i=1}^N f_i(O) ⊂ O and f_i(O) ∩ f_j(O) = ∅ for i ≠ j (Definition 2.7).
From a constructive point of view, one may equivalently state the second condition of the OSC as
O ∩ f_i^{−1}f_j(O) = ∅ (for invertible f_i) for i ≠ j; that is, none of the points f_i^{−1}f_j(x) for x ∈ O
can be contained in O if O is to be a suitable open set. Call a point x ∈ X forbidden if no
open set that contains x is a suitable candidate to fulfil the OSC. All elements of the sets
f_i^{−1}f_j(A) for i ≠ j in [N] are forbidden, through a straightforward calculation with the second
requirement of the OSC. Through identical reasoning, this construction generalises to f_i^{−1}f_j(A)
for strings i, j ∈ ⋃_{n∈N}[N]^n with i_1 ≠ j_1 (the restriction i_1 ≠ j_1 prevents the entire attractor
from being forbidden). Finally, a constructive interpretation of the OSC can be presented through
the following definition [BHR06].
Definition 4.15. The maps in
N = { f_i^{−1} f_j | i, j ∈ [N]^∗, i_1 ≠ j_1 }
are called the neighbour maps for the IFS F. Furthermore, the sets h(A) for h ∈ N are the
neighbour sets. Define the central open set as
U_c = {x | d(x, A) < d(x, H)},
where H denotes the union of the neighbour sets and d(x, S) = inf_{s∈S} d(x, s).
The central open set is clearly open and, by definition, stays away from the forbidden points
formulated above. It is shown in [BHR06] that this construction gives a feasible open set for the
OSC if and only if the IFS obeys the open set condition; otherwise U_c = ∅. The definitions given
above, and their relation to tiling theory, are more easily digested through an explicit example.
Example 4.16. Take the IFS F = {(C, | · |); f_1(z) = z/2, f_2(z) = (z + 1)/2, f_3(z) = (2z + √3 i + 1)/4};
then the attractor is an equilateral Sierpinski triangle, contained in the unit square, with unit side
lengths. A first approximation to this attractor is shown in black below, with the central open set
in blue. The neighbour sets which intersect the attractor are coloured in red. Correspondingly,
the images of the central open set under the neighbour maps that create these neighbour sets
are shaded in red.
Figure 4.2: Sierpinski triangle hull (black) and the neighbour sets that intersect the attractor
(red). The central open set U_c is shaded in blue and the images of U_c under the listed neighbour
maps f_1^{−1}f_2, f_1^{−1}f_3, f_2^{−1}f_1, f_2^{−1}f_3, f_3^{−1}f_1, f_3^{−1}f_2 are shaded in red.
This example may foster some naive conjectures on the properties of the central open set. For
instance, the set above is connected, convex, bounded and has a smooth1 boundary. These
qualities are generally untrue for the central open set, as shown through the following two
counterexamples. The central open set for the Twin Dragon IFS [BD85] trivially has non-smooth
boundary, as it is the interior of the Twin Dragon fractal. The other properties are seen in the
example below.
Example 4.17. Consider an IFS acting on (C, | · |) with four maps:
f_1(z) = z/4,    f_2(z) = z/4 + (1 + i)/4,    f_3(z) = z/4 + 2 + 2i,    f_4(z) = z/4 + (9 + 9i)/4.
Take the convention z = Re(z) + Im(z)i for Re(z), Im(z) ∈ R. It follows that the attractor of this
IFS is A = {z : Re(z) = Im(z), Re(z) ∈ [0, 1] ∪ [2, 3]}, and the IFS obeys the open set condition.
A straightforward calculation shows the neighbour sets of this IFS are completely contained in
the line Re(z) = Im(z); furthermore, the neighbour sets union to this line set-minus the
attractor. Below is the attractor (coloured according to the partitioning given by A_i := f_i(A)
for i ∈ [4]) and the corresponding central open set shaded in blue.
Explicitly,
U_c = {z : 0 < Re(z) + Im(z) < 2 or 4 < Re(z) + Im(z) < 6}.
In this example U_c is not connected, convex or bounded.
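This description of U_c can be sanity-checked numerically from the definition d(x, A) < d(x, H) (our sketch; the unbounded neighbour rays on the line Re(z) = Im(z) are truncated at ±100 for the distance computation):

```python
import math

def seg_dist(p, a, b):
    # Euclidean distance from point p to the closed segment [a, b] in R^2
    (px, py), (ax, ay), (bx, by) = p, a, b
    vx, vy = bx - ax, by - ay
    t = ((px - ax) * vx + (py - ay) * vy) / (vx * vx + vy * vy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * vx), py - (ay + t * vy))

# attractor: diagonal segments over Re(z) in [0,1] and [2,3]; the neighbour
# sets fill the rest of the line Re(z) = Im(z) (rays truncated at +-100)
A = [((0, 0), (1, 1)), ((2, 2), (3, 3))]
H = [((-100, -100), (0, 0)), ((1, 1), (2, 2)), ((3, 3), (100, 100))]

def in_Uc(p):
    return min(seg_dist(p, *s) for s in A) < min(seg_dist(p, *s) for s in H)

def formula(p):
    s = p[0] + p[1]
    return 0 < s < 2 or 4 < s < 6

for p in [(0.5, 0.5), (1.5, 0.0), (3.0, 0.0), (2.5, 2.5), (5.0, 0.2), (-1.0, 0.5)]:
    assert in_Uc(p) == formula(p)
print("definition and explicit description agree on the sample points")
```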
A property of the central open set which holds generally is that, when non-empty, it has
µ_A-measure one for any associated IFS with probabilities.
Proposition 4.18. Let F be an IFS that obeys the OSC and has central open set Uc 6= ∅. Let
M be an associated IFS with probabilities whose invariant measure is µA. Then µA(Uc) = 1.
1Integer fractal dimension boundary.
Proof. The only points of A not contained in U_c are those which lie in the attractor’s dynamical
boundary, defined by
∂A := ⋃_{k=1}^∞ F^{−k}(C) ∩ A,
where C is the set of overlap of the attractor A, C := ⋃_{i≠j} f_i(A) ∩ f_j(A).
Informally speaking, the set of overlap is where the pieces {f_i(A)}_{i=1}^N ‘just touch’ each other
(intuitively, look at the images of the vertices of a Sierpinski triangle under its three maps).
Taking the pre-image of all such points captures all of the points for which f_i(A) ∩ f_j(A) =
A ∩ f_i^{−1}f_j(A) ≠ ∅ for finite strings i, j ∈ ⋃_{n∈N}[N]^n.
Let a probability vector p be associated with the IFS F. Then, for the Bernoulli probability
measure ([N]^N, B([N]^N), ν_p), the set π^{−1}(A \ ∂A) has full ν_p-measure. To see this we must
consider the set of disjunctive sequences in [N]^N. The disjunctive sequences D [BBHV16] are
those which have a dense orbit under the shift map S : [N]^N → [N]^N, or equivalently the infinite
sequences that contain every finite string as a sub-string. Fix a finite string i ∈ [N]^∗; then,
through a straightforward calculation, the set of words in [N]^N that contain i as a sub-string is
a ν_p-full measure set. Taking the countable intersection over all such finite sub-words grants that
ν_p(D) = 1 (look at the complement of each of these sets, take the countable union over N, and
apply the monotonicity of a measure and De Morgan’s laws to yield ν_p(D) = 1).
We now show that if ω ∈ D then π(ω) ∉ ∂A. Let x ∈ A; then π(ω) = x for some word ω, which
we do not assume to be unique. In this representation, from the commutative diagram in
Proposition 3.16 we have that π(S^k(ω)) ∈ F^{−k}(x). If x ∈ ∂A, then all of the iterates π(S^k(ω))
also belong to ∂A. However, if ω ∈ D then this orbit under F^{−1} would be dense in A, but as F
obeys the open set condition we must have that ∂A ≠ A [Mor99]. Therefore ω ∈ D ⇒ π(ω) ∉ ∂A.
As the invariant measure µ_A is normalised,
µ_A(A \ ∂A) = ν_p(π^{−1}(A \ ∂A)) ≥ ν_p(D) = 1,
granting µ_A(A \ ∂A) = µ_A(U_c) = 1.
An immediate consequence of the Proposition above is that there is no ambiguity in the state-
ment presented in Chapter 3. So with complete clarity, we may now state that any IFS with
probabilities whose IFS obeys the OSC is isomorphic to a Bernoulli shift.
Corollary 4.19. Theorem 3.18 is an immediate consequence of Proposition 4.18 with identical
working given in Proposition 3.16.
This theory on the OSC has a relation to fractal tiling theory. A tile is an element of H(X)
and a fractal tiling is a collection of tiles whose union equals the attractor of an IFS.
In this regard, we aim to tile the ambient space X with a tiling system given by an arbitrary
IFS. The figure in Example 4.16 may mislead one into assuming that such a tiling can be created
through the use of neighbour sets. The Sierpinski triangle has many symmetries. These symmetries
actually enforce that any fractal tilings made from this IFS are in some sense isomorphic to one
another [BV18]. Loosely, the neighbour sets are really constructing the set of all possible tiles
for all tilings for a given IFS, and when such tilings are isomorphic these tiles perfectly overlay
on one another, a rigidity property in tiling theory. Thus, neighbour sets are not directly
applicable, as this rigidity does not hold generally; instead we prove the following theorem.
To continue, we require the following notation.
Notation 4.20. Let F = {(X, d); f_1, . . . , f_N} be an invertible IFS and θ = (θ_1, θ_2, . . . ) ∈ [N]^N.
Define f_{θ|i}^{−1} to be
f_{θ|i}^{−1} := f_{θ_1}^{−1} ◦ · · · ◦ f_{θ_i}^{−1}.
The following theorem does not assume that the fractal attractor has non-empty interior, in
comparison to what is shown in [Vin18]. Therefore, this may be considered a minor extension
of that work.
Theorem 4.21. Let F = {(X, d); f_1, . . . , f_N} be an invertible IFS that obeys the OSC and whose
central open set U_c is bounded. Then:
1. there exists an IFS F̄ with Condensation1 of N + 1 maps acting on (X, d) that obeys the
OSC and whose attractor is the closure of U_c;
2. for θ ∈ [N]^N such that π(θ) ∉ ∂A,
X = lim_{i→∞} f_{θ|i}^{−1}(U_c) = ⋃_{i∈N} ⋃_{j∈[N+1]^i} f_{θ|i}^{−1} ◦ f_j(U_c); (4.2)
3. for a fixed i ∈ N, the open sets {f_{θ|i}^{−1} ◦ f_j(U_c)}_{j∈[N]^i} are disjoint.
In the statement above, each f_{θ|i}^{−1} ◦ f_j(U_c) will be a tile of non-negligible Lebesgue measure;
their interiors being disjoint assures that the overlap of these tiles is negligible in the
measure-theoretic sense described in the next section. The inner union collects all the tiles of a
certain size, while the outer union is over an increasing sequence of sets that limits to the
ambient space under appropriate assumptions. To show the intuition of this proof, we explain an
algorithm that generates a fractal tiling; this algorithm was used to create the tiling photo of a
generalised Sierpinski triangle in [BV18] and used in [Gra18, Ste18] (see Figure A.1).
Let an IFS F = {(R^2, ‖·‖_2); f_1, . . . , f_N} be given with attractor A. Associate R^2 with an
infinite sheet of plastic with an image of A on it. Choose a point x on the attractor; this point
has at least one address θ ∈ π^{−1}(x) ≠ ∅. Find this point x on the piece of plastic, place a pin
in it, and cut a circular piece of plastic around this point. Now stretch the plastic with equal
force around this circle, with the pin fixed. How much this circle is stretched is parameterised
by i in the above notation, and as i → ∞ we stretch the plastic to infinity and create a fractal
blow-up which we aim to tile. If a point is chosen not according to what is assumed above, for
example a vertex of a Steemson triangle’s triangular convex hull, then the fractal blow-up may
not expand in all directions of the space X.
1An IFS whose functions are defined f_i : H(X) → H(X), not f_i : X → X.
Proof. Let U_c be the central open set of F. As F obeys the OSC, this set is non-empty and
contains all the points of A \ ∂A by the preceding proposition. Define F̄ = {(X, d); f_1, . . . , f_N , f_c},
where f_c : H(X) → H(X) is given through f_c(V) := U_c \ F(U_c): a ‘constant’ function on H(X)
that is well defined when U_c is bounded. If this set is empty, then the IFS F already obeys
the statement given. Write f_{N+1} = f_c. It is immediate that F̄ is a contraction on H(X)
whenever F is; furthermore F̄(U_c) = U_c and U_c is a suitable open set for the OSC.
As π(θ) := x ∉ ∂A, we have x ∈ U_c. There exists ε > 0 such that B_ε(x) ⊂ U_c. Let λ_max be
the maximum contractivity of the functions f_1, . . . , f_N; then
B_{ε/λ_max^i}(x) ⊂ f_{θ|i}^{−1}(U_c) = ⋃_{j∈[N+1]^i} f_{θ|i}^{−1} ◦ f_j(U_c),  with B_{ε/λ_max^i}(x) → X as i → ∞.
By the construction of U_c not containing any neighbour sets, {f_{θ|i}^{−1} ◦ f_j(U_c)}_{j∈[N]^i} are
disjoint. It is untrue that {f_{θ|i}^{−1} ◦ f_j(U_c)}_{j∈[N+1]^i} are disjoint, as we have the triviality
f_{θ|i}^{−1} ◦ f_{(N+1, j_2, ..., j_i)}(U_c) = f_{θ|i}^{−1} ◦ f_{(N+1, j'_2, ..., j'_i)}(U_c) for j_k and j'_k distinct in [N + 1].
With additional notation and ordering one could remove this case, but it is unnecessary here.
More intuitively, the closure of the central open set tessellates the space X given a fractal tiling
system. For instance, a hexagonal tiling of R^2 can be made from the Sierpinski triangle IFS given
in Example 4.16. Typical fractal tiling systems aim to tile a fractal blow-up; this construction
uses the fractal tiling system to tile the ambient space. We use such a construction in analysing
the dimension theory of an IFS attractor.
4.2.2 Hausdorff Dimension
Fractals are sometimes considered to be objects that are ‘rough’. The notion of roughness
typically used is that of non-integer dimensions, made explicit with the aid of the Hausdorff
measure and dimension.
Definition 4.22. Let δ > 0 and E ⊂ X for a metric space (X, d). The Hausdorff content of E
is defined through
H^α_δ(E) := inf_{V_i ∈ B(X)} { ∑_{i=1}^∞ |V_i|^α : E ⊂ ⋃_{i=1}^∞ V_i, |V_i| < δ },
where |V_i| is the diameter of the set, given by |V_i| := sup_{x,y∈V_i} d(x, y). Furthermore, let
H^α(E) := lim_{δ→0} H^α_δ(E).
The first quantity defined is the α-Hausdorff content of a set E; the second is called the
Hausdorff measure (which is a positive measure). The first value α for which the Hausdorff
measure is finite is the Hausdorff dimension of the set E, that is, dim_H(E) = inf_{α∈R+}{α : H^α(E) <
∞}. This is well defined as the function is decreasing in α. The exact formalities of these points
can be found in most introductory textbooks pertaining to the subject (for instance [Fal04]).
The relationship this has to the box-counting dimension is that here we may select any covering
of a set E, whereas the box-counting method allowed only for coverings by boxes. Hence
dim_B(E) ≥ dim_H(E). To illustrate that this idea coincides with our traditional (topological)
idea of dimension, we will do the following example of a line segment.
Example 4.23. Consider the set A = [0, 1] ⊂ R. We may cover A by dyadic intervals, that is,
A = ⋃_{i=1}^{2^j} [ (i−1)/2^j , i/2^j ) ∪ {1}.
As we have equality, this covering is optimal and disjoint; furthermore, for any δ > 0 we can
make the interval width sufficiently small. It is immediate that for d ≠ 0 we have H^d({1}) = 0,
thus
H^d_{2^{−j}}(A) = ∑_{i=1}^{2^j} 1/2^{dj} = 2^j / 2^{dj},
which limits to zero as j → ∞ if and only if d > 1. Therefore A has Hausdorff dimension one
through the infimum.
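The computation can be mirrored numerically: the content of the dyadic cover is 2^{j(1−d)}, which collapses to zero precisely when d > 1 (a one-function sketch):

```python
def dyadic_content(d, j):
    # 2^j intervals of diameter 2^(-j): sum of |V_i|^d = 2^j * (2^(-j))^d
    return 2.0 ** (j * (1 - d))

assert dyadic_content(1.0, 200) == 1.0      # borderline exponent stays at 1
assert dyadic_content(1.1, 200) < 1e-3      # d > 1: content vanishes
assert dyadic_content(0.9, 200) > 1e3       # d < 1: content blows up
print("dyadic content behaves as in the example")
```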
Moran’s Theorem for the box-counting dimension (Theorem 2.17) remains true for the Hausdorff
dimension. Furthermore, the proof of the theorem becomes relatively straightforward with the
theoretic structure that the Hausdorff dimension provides.
Theorem 4.24. [Mor46] Given an IFS that obeys the OSC and consists of N similitudes
whose scaling factors are {λ_i}_{i=1}^N, the attractor A obeys dim_H(A) = D, where ∑_{i=1}^N λ_i^D = 1.
A sketch of this can be made through the observation that if ⋃_{i=1}^∞ U_i is a covering of a
self-similar set A = ⋃_{i=1}^N f_i(A), then f_j(⋃_{i=1}^∞ U_i) covers f_j(A). The assumption of the
OSC grants that taking the image of an optimal covering under each f_j and taking the union over j
gives another optimal covering of A; for instance, one might consider using scaled copies of the
central open set (union the dynamical boundary) as a covering. As the functions f_j are taken to
be similitudes, we have that |f_j(U_i)| = λ_j δ when |U_i| = δ. In the language of the Hausdorff
content:
H^α_δ(A) = H^α_δ( ⋃_{j=1}^N f_j(A) ) = ∑_{j=1}^N H^α_δ(f_j(A)) = ∑_{j=1}^N λ_j^α H^α_{δ/λ_j}(A),
where we have used that the scaled coverings can be made almost-disjoint through the OSC.
Allowing δ → 0 gives H^α(A) = ∑_{j=1}^N λ_j^α H^α(A). It is not straightforward to show that
dim_H(A) = inf_{α∈R+}{α : H^α(A) < ∞} > 0; this is done in [Sch94] by assuming the OSC, but
once this is obtained the result is immediate.
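Moran's equation ∑_i λ_i^D = 1 rarely has a closed form, but since the left side is strictly decreasing in D it is easily solved by bisection (a small sketch; the initial bracket assumes the dimension is below the number of maps, which holds for the examples shown):

```python
import math

def moran_dimension(ratios, tol=1e-12):
    # solve sum(r**D for r in ratios) == 1 by bisection; the left side is
    # strictly decreasing in D since every ratio lies in (0, 1)
    lo, hi = 0.0, float(len(ratios))   # assumes sum(r**len) < 1 < sum(r**0)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sum(r ** mid for r in ratios) > 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(moran_dimension([1/3, 1/3]))       # Cantor set: log 2 / log 3
print(moran_dimension([1/2, 1/2, 1/2]))  # Sierpinski triangle: log 3 / log 2
```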
4.2.3 A Relationship Between H(X) and P(X)
With the machinery developed in the previous two subsections, we may present a view of how
the fractal models in H(X) and P(X) can be related to one another. This relationship is best
seen through how the dimension theory interplays between fractal attractors and self-similar
measures. We present this relationship in the case of IFSs that obey the OSC and whose
functions are similitudes. Only under these assumptions is the dimension theory for an IFS
attractor concisely shown. To elucidate this point, a current research paper on the theory
of two-dimensional affine attractors begins with, ‘The dimension theory of self-affine sets and
measures is far from being completely understood’ [BRS16].
The following result is proved in one dimension in both [Fal04] (under the assumption the
attractor is disconnected) and [BRS19], although the geometric proof given in [BRS19] has an
error in their use of the OSC, with the IFS given in Example 4.17 projected onto the Re(z)-axis
serving as a counterexample to their argument. A similar result in n dimensions under HESC is
stated in [BRS16, BRS18, BRS19] and used as the basis for approximating coverings of an affine
IFS. The following geometric proof is of the writer’s own creation and uses the OSC and tiling
theory already shown.
Firstly we define the Hausdorff dimension of a measure.
Definition 4.25. The Hausdorff dimension of a measure µ on (X, B(X)) is given by
dim_H(µ) := inf_{V∈B(X)} {dim_H(V) | µ(V) > 0}.
Equivalently, if lim_{r→0} log(µ(B_r(x)))/log(r) = α for a fixed α ∈ R+ and µ-almost every
x ∈ X, then dim_H(µ) = α [MMR00].
Intuitively, dimH(µ) is the Hausdorff dimension of the smallest1 set of non-negligible µ-measure.
From this definition it is clear that for a self-similar measure, dimH(µA) ≤ dimH(A). Some
preliminary statements are needed before we present this relationship between spaces.
Lemma 4.26. Let F = {(R^n, ‖ · ‖_2); {f_j}_{j=1}^N} be an IFS of similitudes that obeys the OSC
and M be an associated IFS with probabilities. Let A be the attractor and µ_A be the attractive
measure. Then, for µ_A-almost all x ∈ A and for any ε > 0, there exist open sets U_1, U_2 such
that:
1. U_i = r_i Ũ_c + k_i, where Ũ_c ⊂ U_c, for some r_i > 0 and k_i ∈ R^n;
2. µ_A(Ũ_c) = 1;
3. U_1 ⊂ B_ε(x) ⊂ U_2.
1In the Hausdorff measure sense.
Proof. Let x ∈ A \ ⋃_{i∈N} F^i(C); from Proposition 4.18, this is a µ_A-full measure set from which
x is taken. From this choice of x we have that x ∈ U_c; thus B_ε(x) ⊂ U_c as U_c is open, which up
to renormalisation and translation gives B_ε(x) ⊂ U_2.
Furthermore, x has a unique code-space address ω := π^{−1}(x) ∈ [N]^N through the injective
coding map π^{−1} : A \ ⋃_{i∈N} F^i(C) → [N]^N. It follows that f_ω(Ũ_c) = {x}, so x ∈ f_{ω|n_2}(Ũ_c) ⊂
f_{ω|n_1}(Ũ_c) for n_2 ≥ n_1. If necessary, redefine Ũ_c := {x | d(x, A) < min{1, d(x, H)}} to
ensure this set is bounded. The diameter |f_{ω|n}(Ũ_c)| = (∏_{i=1}^n λ_{ω_i}) |Ũ_c| → 0 as n → ∞, as
we constructed |Ũ_c| < ∞. Select n large enough such that r_1 = ∏_{i=1}^n λ_{ω_i} gives U_1 ⊂ B_ε(x),
and the result is given.
Lemma 4.27. Let Ũ_c be defined as in the preceding lemma. Then the set f_{ω|n}(Ũ_c) has
µ_A-measure ∏_{i=1}^n p_{ω_i}.
Proof. From Proposition 4.18, we have that µ_A(A \ ⋃_{j=0}^∞ F^j(C)) = 1. Recall that µ_A obeys
the fixed point equation; hence
∫_A 1_{f_i(Ũ_c)}(x) dµ_A(x) = ∫_{A \ ⋃_{j=0}^∞ F^j(C)} 1_{f_i(Ũ_c)}(x) dµ_A(x) = p_i ∫_{A \ ⋃_{j=0}^∞ F^j(C)} 1_{f_i(Ũ_c)}(f_i(x)) dµ_A(x) = p_i,
where above we have used the construction of the central open set to ensure that 1_{f_i(Ũ_c)}(f_j(x)) =
1_{i=j} for any x ∈ A \ F(C). As µ_A(Ũ_c) = 1, then µ_A(f_i(Ũ_c)) = p_i. Through a simple induction
the result is given.
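Lemma 4.27 can be probed numerically with the chaos game, which samples the invariant measure: for the two-map Cantor IFS with probabilities (0.3, 0.7) (our choice for illustration), the fraction of orbit points landing in f_1([0, 1]) = [0, 1/3] estimates µ_A(f_1(Ũ_c)) = p_1:

```python
import random

random.seed(0)
p1 = 0.3                               # probabilities (0.3, 0.7), for illustration
maps = [lambda x: x / 3, lambda x: x / 3 + 2 / 3]

x, hits, used = 0.5, 0, 0
for k in range(200_000):
    x = maps[0](x) if random.random() < p1 else maps[1](x)
    if k >= 100:                       # discard a short burn-in
        used += 1
        hits += x < 0.5                # orbit point lies in f1(A), a subset of [0, 1/3]

freq = hits / used
print(freq)                            # close to p1
assert abs(freq - p1) < 0.01
```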
Notation 4.28. A convenient notation is given through introducing the (injective) function
ω : A \ ⋃_{j=0}^∞ F^j(C) → [N]^N, where this mapping is defined component-wise through
ω_i(x) = (π^{−1}(x))_i = ∑_{j=1}^N j · 1_{π(j)}((F^{−1})^{i−1}(x)),
where F^{−1} denotes the inverse IFS to F and (F^{−1})^{i−1} its iterates. In the one-dimensional
case, this can be thought of as the digit map of a dynamical system, such as deducing the binary
expansion of a number in the unit interval from the IFS in Proposition 3.16.
Before reading the following proof, a geometric interpretation is given. We aim to calculate the
point-wise dimension of a self-similar measure µ_A-almost everywhere. We assume the OSC, so
one may create a tiling around a point x ∈ A \ ⋃_{j=0}^∞ F^j(C) in the fashion given by Theorem
4.21. Two such (finite) tilings are created: one for the fractal attractor, and another for the
extended IFS, whose attractor is the closure of U_c, which tiles the ambient space. These tilings
are over-layed on one another. Intuitively, we have broken the attractor into several fractal tiles
whose µ_A-measures are individually easy to find. Furthermore, a covering of these fractal tiles
by non-overlapping open sets is provided by our secondary ambient-space tiling. In this instance,
we know precisely the size of such tiles and can use this relationship to compute the point-wise
dimension d_{µ_A}(x).
Theorem 4.29. Let F = {(R^n, ‖ · ‖_2); f_1, . . . , f_N} be an IFS that obeys the OSC and has
functions of the form f_i(x) = λ_i O_i x + b_i, where λ_i < 1, O_i is an isometry on R^n and b_i ∈ R^n.
Associate this IFS with probabilities p = {p_i}_{i=1}^N to create an IFS with probabilities M. Then
the Hausdorff dimension of the attractive measure µ_A is given by
dim_H(µ_A) = ( ∑_{i=1}^N p_i log(p_i) ) / ( ∑_{i=1}^N p_i log(λ_i) ). (4.3)
Proof. Fix x ∈ A \ ⋃_{j=0}^∞ F^j(C); we aim to evaluate the point-wise dimension of the measure
µ_A:
d_{µ_A}(x) = lim_{ε→0} log(µ_A(B_ε(x))) / log(ε).
By Lemma 4.26 we can control the convergence of this value by
d_{µ_A}(x) = lim_{ε→0} log(µ_A(U_1)) / log(|U_1|) = lim_{n→∞} log(µ_A(f_{ω|n}(Ũ_c))) / log(|f_{ω|n}(Ũ_c)|),
where |f_{ω|n}(Ũ_c)| is the diameter of the set f_{ω|n}(Ũ_c). We can evaluate this quantity directly
through
d_{µ_A}(x) = lim_{n→∞} log( ∏_{i=1}^n p_{ω_i} ) / log( |Ũ_c| · ∏_{i=1}^n λ_{ω_i} ),
where above we have used that similitudes uniformly scale length in all directions. If we did
not have this assumption, then we would not be able to directly evaluate this limit, which is
the source of difficulty in calculating the dimension of an affine IFS. In light of Notation 4.28,
we have that 1_{π(j)}((F^{−1})^{i−1}(x)) = 1 if and only if ω_i = j for j ∈ [N]. The function
x ↦ ∑_{j=1}^N p_j 1_{π(j)}((F^{−1})^{i−1}(x)) takes the value p_{ω_i} for each i ∈ N, and as the p_j < ∞
this function is in L^1(X, B(X), µ_A). Continuing with our calculation:
d_{µ_A}(x) = lim_{n→∞} log( ∏_{i=1}^n ∑_{j=1}^N p_j 1_{π(j)}((F^{−1})^{i−1}(x)) ) / log( |Ũ_c| · ∏_{i=1}^n ∑_{j=1}^N λ_j 1_{π(j)}((F^{−1})^{i−1}(x)) )
= lim_{n→∞} [ (1/n) ∑_{i=1}^n log( ∑_{j=1}^N p_j 1_{π(j)}((F^{−1})^{i−1}(x)) ) ] / [ (1/n) ∑_{i=1}^n log( ∑_{j=1}^N λ_j 1_{π(j)}((F^{−1})^{i−1}(x)) ) + (1/n) log(|Ũ_c|) ]
= ∫ log( ∑_{j=1}^N p_j 1_{π(j)}(z) ) dµ_A(z) / ∫ log( ∑_{j=1}^N λ_j 1_{π(j)}(z) ) dµ_A(z). (Birkhoff’s Ergodic Theorem)
The quantity in the numerator is the entropy of µ_A. It is shown in [Wal00], Corollary 4.9.1, that
this value is finite, and so the above limit exists as the denominator is clearly bounded away
from zero.
What is left is the quotient of integrals, which can be readily evaluated through noting that
∫ log( ∑_{j=1}^N c_j 1_{π(j)}(z) ) dµ_A(z) = ∑_{i=1}^N p_i log(c_i)
for constants c_j > 0, granting
d_{µ_A}(x) = ( ∑_{i=1}^N p_i log(p_i) ) / ( ∑_{i=1}^N p_i log(λ_i) ).
The above value is constant for µ_A-almost every x, giving the result. This quantity is the entropy
of µ_A divided by its Lyapunov exponent.
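The ergodic averaging step can be illustrated by simulating the symbol sequence directly (our sketch; the two-map similitude IFS with ratios 1/4 and 1/2 and probabilities (0.3, 0.7) is a hypothetical example obeying the OSC, e.g. x/4 and x/2 + 1/2):

```python
import math, random

random.seed(1)
p, lam = [0.3, 0.7], [0.25, 0.5]       # probabilities and scaling ratios

closed_form = (sum(pi * math.log(pi) for pi in p)
               / sum(pi * math.log(li) for pi, li in zip(p, lam)))

# sample a long i.i.d. word omega with law p and form the Birkhoff quotient
omega = random.choices(range(2), weights=p, k=200_000)
ratio = (sum(math.log(p[i]) for i in omega)
         / sum(math.log(lam[i]) for i in omega))

print(ratio, closed_form)              # the two agree to a few decimal places
assert abs(ratio - closed_form) < 0.01
```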
Observation 4.30. Making the selection p_i = λ_i^D, where D is the Hausdorff dimension of A,
grants that
dim_H(µ_A) = ( ∑_{i=1}^N λ_i^D log(λ_i^D) ) / ( ∑_{i=1}^N λ_i^D log(λ_i) ) = dim_H(A).
Furthermore, the function p ↦ ( ∑_{i=1}^N p_i log(p_i) ) / ( ∑_{i=1}^N p_i log(λ_i) ) is concave in
p_i ∈ (0, 1) under the constraint ∑_{i=1}^N p_i = 1 [Fal04, Proposition 17.8]. A brief explanation of
this is that the (negative of the) numerator can be viewed as an entropy, which is always concave.
The function in the denominator is both concave and convex, being linear in the p_i. The quotient
of such functions is typically quasi-concave, which in this instance may be lifted to concavity.
One may also calculate the second derivative directly in one variable for the case N = 2 (through
noting p_2 = 1 − p_1) and check it is negative.
In particular, this function has a unique maximum value, which is achieved for the selection
p_i = λ_i^D since, as previously stated, dim_H(µ_A) ≤ dim_H(A). Hausdorff dimensions can be
estimated by box-counting dimensions, so making such a choice in computation for modelling a
fractal object is possible.
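A quick numeric scan illustrates the unique maximum, assuming the uniform Cantor scaling factors λ = (1/3, 1/3):

```python
import math

lam = [1/3, 1/3]                      # Cantor IFS scaling factors
D = math.log(2) / math.log(3)         # Hausdorff dimension of the attractor

def dim_measure(p1):
    # formula (4.3) in one free variable, p = (p1, 1 - p1)
    p = [p1, 1 - p1]
    return (sum(pi * math.log(pi) for pi in p)
            / sum(pi * math.log(li) for pi, li in zip(p, lam)))

grid = [k / 1000 for k in range(1, 1000)]
best_val, best_p = max((dim_measure(q), q) for q in grid)
print(best_p, best_val)               # maximum at p1 = 1/2, value log 2 / log 3
assert abs(best_p - 0.5) < 1e-9 and abs(best_val - D) < 1e-9
```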
The result above is what grants the link between P(X) and H(X) in a modelling context, where
a black and white image of a fractal attractor can be considered a set, or a ‘uniform’ measure on
such a set. Any IFS of similitudes that obeys the OSC may be equipped with a unique set of
probabilities such that dim_H(µ_A) = dim_H(A). This uniqueness grants the following corollary.
Corollary 4.31. Let the Cantor set C ∈ H([0, 1]) be given. Then all the affine IFSs of two
maps, F = {([0, 1], | · |); f_1, f_2}, that obey the OSC and whose attractor is C are listed in F_Cantor.
Proof. Take an IFS that obeys the OSC and whose attractor is the Cantor set C. Then there
exists a unique set of probabilities to create an IFS with probabilities whose attractive measure
is uniform measure on the Cantor set, where ‘uniform’ measure here is defined through
dim_H(µ_A) = dim_H(A). All such IFSs with probabilities are listed in Corollary 4.13.
The assumption of the OSC can be loosened to a ‘perfectly overlapping IFS’ [BRS18], which
we do not present here for clarity, but can exploit in the following statement: all affine IFSs of
N maps on ((0, 1), | · |) that obey HESC [BRS18] and whose attractor is a Cantor set have
functions of the form given in Corollary 4.13. The selection of the p_i for a specific instance of an
IFS with probabilities that obeys the OSC allows for the modelling of objects in H(X). This will
be done explicitly in the next computational section.
4.3 The Collage Theorem for Moments
The iterative formula created in Chapter 3 for the calculation of moments of a self-similar
measure serves another purpose. Namely, as we extracted a contraction mapping on 1ℓ∞ (under
certain conditions) we recover another form of the Collage Theorem for moments.
Theorem 4.32. Let M be an IFS with probabilities of the form given in either Theorem 3.53
or 3.55. Then its associated vector of moments M*_{x,y} obeys the inequality
‖M − M*_{x,y}‖_g ≤ ‖ΦM − M‖_g / (1 − λ),
where M ∈ 1ℓ∞ and λ is the contractivity factor of the operator Φ : (1ℓ∞, ‖ · ‖_g) → (1ℓ∞, ‖ · ‖_g).
This theorem has the following interpretation for modelling: given a vector of moments M, we
want to find a contractive operator Φ whose fixed point M* is close to M. The bound states that
if M is almost a fixed point of Φ, then the actual fixed point of Φ is near M in the ‖ · ‖_g-norm
sense.
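A small numeric illustration of the bound, assuming the uniform two-map Cantor IFS and a one-dimensional moment operator transcribed from the Chapter 3 recursion (our sketch; the contractivity λ = 1/2 used here is a crude row-sum estimate of our own for this truncated system, not the ‖ · ‖_g factor of the theorem):

```python
import math
from fractions import Fraction as Fr

ifs = [(Fr(1, 3), Fr(0), Fr(1, 2)), (Fr(1, 3), Fr(2, 3), Fr(1, 2))]  # (a, b, p)
N = 6                                  # truncation level

def phi(M):
    # moment operator: (Phi M)_n = sum_i p_i sum_{k<=n} C(n,k) a_i^k b_i^(n-k) M_k
    out = [Fr(1)]
    for n in range(1, N + 1):
        out.append(sum(p * sum(math.comb(n, k) * a**k * b**(n - k) * M[k]
                               for k in range(n + 1))
                       for a, b, p in ifs))
    return out

# true moment vector: the fixed point of phi, solved one moment at a time
M_star = [Fr(1)]
for n in range(1, N + 1):
    rhs = sum(p * sum(math.comb(n, k) * a**k * b**(n - k) * M_star[k]
                      for k in range(n))
              for a, b, p in ifs)
    M_star.append(rhs / (1 - sum(p * a**n for a, b, p in ifs)))

# perturb the moments and compare the true error with the collage residual
M = [m + Fr(1, 50) for m in M_star]
M[0] = Fr(1)                           # M_0 = 1 is fixed for a probability measure
lhs = max(abs(x - y) for x, y in zip(M, M_star))
res = max(abs(x - y) for x, y in zip(phi(M), M))
lam = Fr(1, 2)                         # crude row-sum contractivity bound (ours)
assert lhs <= res / (1 - lam)
print(lhs, res / (1 - lam))
```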
Proposition 4.33. Let M be an IFS of the form of the above, then under the norm ‖ · ‖g we
have Mx,y ∈ c0: the space of convergent sequences in (1`∞, ‖ · ‖g) approaching zero.
Proof. Select the ‖ · ‖g norm such that the operator Φ is contractive and the attractor of the
IFS Fg◦f◦g−1 is (strictly) contained in the ‖ · ‖∞-unit ball — such a choice is always possible as
the scaling of the function g can be arbitrarily close to zero and continue to make Φ contractive.
We may either view Mx,y ∈ (1`∞, ‖ · ‖g) as the vector of moments for µA or Mx,y ∈ (1`∞, ‖ · ‖∞)
as the vector of moments for µA ◦ g⁻¹ in light of Lemma 3.48. Then for n1, n2 ∈ N0,
$$\left\|\int x^{n_1} y^{n_2}\, d\mu_{g(A)}(x, y)\right\|_\infty \le \int \|x^{n_1} y^{n_2}\|_\infty\, d\mu_{g(A)}(x, y) \le \max_{x, y \in g(A)} \|x^{n_1} y^{n_2}\|_\infty \xrightarrow{\;n_1 + n_2 \to \infty\;} 0,$$
where we have used that µA is normalised and that every point of g(A) has norm less than one.
Through the (graded lexicographic) ordering chosen in the vectorisation map v, the claim follows.
Corollary 4.34. Let M be of the form assumed above. Let M∗x,y be the vector of moments of
the invariant measure µA that is in ¹ℓ∞. Let M ∈ ¹ℓ∞ and let Mn be its nth level truncation, chosen
such that n = bnct; then we obtain the following chain of bounds:
$$\|M^{*}_{x,y} - M\|_g \;\le\; \frac{\|\Phi_M M - M\|_g}{1 - \lambda} \;\le\; \frac{\|\Phi_{M,n} M_n - M_n\|_g}{1 - \lambda} + \varepsilon(n) \;\le\; C(\lambda)\,\|\Phi_{M,n} M_n - M_n\|_2 + \varepsilon(n),$$
for some constants ε, C > 0 that depend on the truncation level n and the contractivity λ, and
where ε can be made small by increasing n.
Remark 4.35. The above Corollary firstly uses the Collage Theorem for moments, then Proposition 3.52, which states that we may appropriately re-scale the space so that the measures considered
have moments that tend to zero. This allows for truncation to finitely many moment terms Mn
plus some error term ε > 0, which can be made arbitrarily small with increasing n by Proposition
4.33. Finally, as Mn is finite dimensional, we may swap to the L2 norm at the cost of a constant
that depends on the contractivity of the operator Φ. This creates a feasible computational
problem.
The Corollary above justifies truncating to finitely many moment equations and finding a least
squares solution; such a solution need not solve our problem exactly, it need only be a local
minimum of the moment equations. For instance, let M21 be the vector of the first 21 moments of
the Steemson triangle in Example 2.10. A 21-moment approximation of a Steemson triangle was
found through a least squares optimisation, with ‖ΦM,21M21 − M21‖2 ≈ 6 · 10⁻⁶ for the affine
approximant and ≈ 9 · 10⁻⁵ for the similitude approximant. In this approximation, the maps were
intentionally given incorrect orientations so as not to be an exact reconstruction.
Figure 4.3: Affine (left) and similitude (right) moment approximation of a Steemson triangle.
In particular, we have created an approximation method that does not require the object that we
are approximating to be strictly self-similar but may find an exact solution if it exists (see Figure
4.5). This bound also makes sense of the trivial solutions spotted in the algebraic approach to
solving these moment equations, seen through the following example.
Example 4.36. Recall the trivial solutions to the polynomial system in one dimension:
$$\left\{ a_i, b_i, p_i \in \mathbb{R} \;\middle|\; a_i \in \{-1, 1\},\; b_i = 0 \;\; \forall i \in \{1, \dots, N\};\; \sum_{i=1}^{N} a_i p_i = 1 \right\}.$$
In one dimension we have that |ai| = λi. The bound given by the Collage Theorem for moments
is meaningless for these trivial solutions, as the constant C(λ) → ∞ as maxi |ai| → 1. This can be
seen by identifying that the contractivity of Φ in one dimension is λ = maxi∈[N]{λi}, where the
λi are the contractivity factors of the maps fi in the IFS. In the two-dimensional case, trivial
solutions to the moment equations are even more abundant and may ruin any numerical
experiment if not appropriately controlled. These solutions cannot in general be written out, as
knowing them precisely would imply knowing when an affine IFS is contractive in terms of its map
parameters.
The above shows one caveat that is often left unmentioned in much of the literature: the quality
of the bound given by the Collage Theorem is highly dependent on the operator Φ. In practice
this is not commonly a problem, as the model fit is sufficiently constrained, consciously or
unconsciously, to make sure this bound does not blow up. If this problem is addressed, then we
have the following interpretation of the bound. The vector equation ΦM − M = 0 represents our
system of non-linear equations, for which (in practical use) it is unreasonable to assume that an
exact solution exists. The bound states that if ‖ΦM − M‖g is small for fixed IFS parameters
(fixing Φ), then we obtain an approximate solution; more explicitly, a local minimum of
‖ΦM − M‖g may produce a 'good' approximant. Thus, in an analogous fashion to the 'standard'
Collage Theorem, we explore this interpretation computationally in the next section.
4.4 Computational Experiments
The following experiments are made available here in Jupyter notebooks. Justification and
results of these computations are shown here for completeness.
4.4.1 One-Dimensional Example with Polynomials
Generally the coefficients of a polynomial system have errors, unlike our Cantor example given
in exact rational arithmetic. If rational computation was not used in our Grobner basis section,
no solutions would have survived the floating point rounding errors. In fact, the floating point
implementation of polynomial division in Python1 returns one (trivial division) for all floating
point numbers that are not in their exact representation as rational numbers (multiples of the
floating point mantissa) and had to be reimplemented for many of our calculations when using
Grobner bases.
For error analysis, we aim to bound the backwards error δ of our polynomial system. The
backwards error is the smallest distance a rounded input is from solving the set of equations,
given an exact input. 'Input' here is misleading: the errors in the problem arise from the
coefficients of the polynomial system, so the true way to view 'input' is through the dual space
of (finitely many) monomial evaluations.
Assume an exact solution to a single polynomial q in one variable x1 exists and call it z ∈ R.
The finite collection of monomials that make up q can then be evaluated exactly at z; this
vector of evaluations is an element of Rⁿ when q is a polynomial of degree n − 1. We now ask
how far the solution of a perturbed copy of q is from giving the correct solution to the exact
problem. Different polynomials evaluated at z can be thought of as linear functionals
¹SymPy.
acting on this vector of monomial evaluations, represented as vectors of the coefficients of the
polynomials. This reasoning generalises to polynomial systems in multiple variables [Ste04].
Naturally, the distance between two elements of the dual space is the induced norm, which can
be any norm ‖ · ‖∗ as we are in finite dimensions. Let Mn be the vector of moments estimated
from an empirical distribution through Elton’s Theorem and Mx,n be the exact values of the
moments. The backwards error δ ∈ R≥0 is
$$\delta(n) = \|M_{x,n} - M_n\|_{*} = O\!\left(\frac{1}{\sqrt{n}}\right).$$
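This rate is easy to observe on the simplest example. As a sketch (assuming the standard two-map middle-thirds Cantor IFS with equal probabilities; the notebook referred to above is not reproduced here), the exact moments follow from the self-similarity recursion of Chapter 3, while Elton's Theorem lets us estimate the same moments from chaos-game samples:

```python
import random
from math import comb

def cantor_moments(k):
    # Exact moments of the uniform Cantor measure from the self-similarity
    # recursion m_n = sum_i p_i sum_{j<=n} C(n,j) a_i^j m_j b_i^(n-j);
    # the j = n term is moved to the left-hand side and solved for m_n.
    a, b, p = [1/3, 1/3], [0.0, 2/3], [0.5, 0.5]
    m = [1.0]
    for n in range(1, k + 1):
        rhs, coeff = 0.0, 1.0
        for i in range(2):
            coeff -= p[i] * a[i]**n
            for j in range(n):
                rhs += p[i] * comb(n, j) * a[i]**j * m[j] * b[i]**(n - j)
        m.append(rhs / coeff)
    return m

def empirical_moments(k, samples=100_000, seed=0):
    # Elton's Theorem: time averages along the chaos game converge to the
    # moments of the invariant measure, with Monte Carlo error O(1/sqrt(n)).
    rng = random.Random(seed)
    x, sums = 0.0, [0.0] * (k + 1)
    for _ in range(samples):
        x = x / 3 if rng.random() < 0.5 else x / 3 + 2 / 3
        for n in range(k + 1):
            sums[n] += x**n
    return [s / samples for s in sums]

exact = cantor_moments(3)     # [1, 1/2, 3/8, 5/16]
approx = empirical_moments(3)
```

Doubling the sample size shrinks the gap between `exact` and `approx` by roughly a factor of 1/√2, matching the backwards-error rate above.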
The interpretation of this bound is that as the sample size of the empirical distribution increases,
so does the accuracy of the moment values obtained through Elton's Theorem, and the IFS
parameters found in solving the inverse problem gain accuracy at the same rate as the moment
values converge. This can be observed in the Python notebook given. Finally, when aiming to
solve the least squares problem ‖ΦM − M‖²₂ ≈ 0, it is valid to do these computations in floating
point arithmetic, as minor changes in the input polynomial coefficients lead to minor changes
in the value ‖ΦM − M‖²₂. This statement is very different from stating that a minor change
in the coefficients leads to a minor change in a polynomial root. For instance, take the real
polynomial q(x) = x², which has a single (repeated) root at zero. The polynomial q̃(x) = x² + 0.001
has coefficients close to those of q in the dual space, yet q̃ does not even possess a root in R.
With this stated, we aim to solve the following system numerically:
$$\underset{a_i,\, b_i,\, p_i,\, t_{1i},\, t_{2i} \in \mathbb{R}}{\arg\min} \;\; \|\Phi_n M_n - M_n\|_2^2 + \sum_{i=1}^{N} \big((a_i - 1) t_{1i} - 1\big)^2 + \big((a_i + 1) t_{2i} - 1\big)^2$$
$$= \sum_{n=0}^{k} \Bigg( \sum_{i=1}^{N} p_i \sum_{j=0}^{n} \binom{n}{j} a_i^{j} m_j b_i^{\,n-j} - m_n \Bigg)^{2} + \sum_{i=1}^{N} \big((a_i - 1) t_{1i} - 1\big)^2 + \big((a_i + 1) t_{2i} - 1\big)^2.$$
This least squares objective function, which we call Q, is clearly not convex and will be plagued
by several local minima that a standard solver will fall victim to before a global minimum is
obtained. One solution is to use non-convex optimisation techniques such as simulated annealing.
Instead of this, we use a form of preconditioning on our system of equations so that our initial
point of iteration may find a global minimum. Note the partial derivatives of ΦnMn − Mn:
$$\frac{\partial q_n}{\partial p_i} = \sum_{j=0}^{n} \binom{n}{j} a_i^{j} m_j b_i^{\,n-j}, \qquad \frac{\partial q_n}{\partial a_i} = p_i \sum_{j=0}^{n} j \binom{n}{j} a_i^{\,j-1} m_j b_i^{\,n-j}, \qquad \frac{\partial q_n}{\partial b_i} = p_i \sum_{j=0}^{n} \binom{n}{j} (n-j)\, a_i^{j} m_j b_i^{\,n-j-1},$$
which can be computed in an efficient vectorised manner. This is implemented in the notebook
provided to efficiently calculate the Jacobian of the constrained polynomial system that rids us of
the trivial solutions where ai = ±1. Let x0 ∈ R⁵ᴺ be the vector of our IFS parameters with their
constraint variables. To pre-condition the initial starting point, the Fredman-Altman iteration
[Alt80] is used through the update rule:
$$x_{k+1} = x_k - \frac{\|Q(x_k)\|_2^2}{\|J_Q^{T} Q(x_k)\|_2^{2}}\, J_Q^{T} Q(x_k) = x_k - \frac{\|Q(x_k)\|_2^2}{\|J_Q^{T} Q(x_k)\|_2}\; \frac{J_Q^{T} Q(x_k)}{\|J_Q^{T} Q(x_k)\|_2},$$
where JQ is the Jacobian of the constrained polynomial system. Geometrically, the unit vector
of J_Q^T Q(xk) chooses the direction in which to descend to a local optimum. This is not a descent
method: the step length ‖Q(xk)‖²₂/‖J_Q^T Q(xk)‖₂ only decreases to zero when the numerator
approaches zero 'sufficiently fast', so local minima with ‖Q(xk)‖²₂ ≠ 0 are avoided. At such a
local minimum this value becomes large, as the denominator approaches zero, forcing the algorithm
to 'jump out' of the minimum. Convergence to any minimum is slow with this update rule, so
after applying this preconditioning to a point, standard local optimisation methods can be
applied. The polynomial system constructed here can be made arbitrarily large in both its
variables and equations, making it a potentially challenging problem for numerical polynomial
solvers.
Example 4.37. The inversion of the nth level Hilbert matrix is a difficult task for linear nu-
merical solvers. This is precisely a matrix of moments for Lebesgue measure on the unit interval
[JKS11]. We pose a similar numerically difficult task for polynomial systems.
Let Fk = {([0, 1], | · |); {fi}ᵏᵢ₌₁} where fi(x) = x/(i(i+1)) + 1/(i+1). For every k ∈ N the attractor Ak of
this IFS is totally disconnected: the scaling factors sum to less than one, the IFS obeys
the OSC, and so by Moran's Theorem dimH(Ak) < 1 [Fal04]. Let p ∈ Rᵏ be such that pi = 1/(i(i+1))
for i ∈ [k − 1] and pk = 1 − Σᵢ₌₁ᵏ⁻¹ 1/(i(i+1)); then the associated IFS with probabilities Mk will have
an attractive measure that is singular with respect to Lebesgue. Restrict the fractal model fitted
to this measure, through constraint polynomials, to be F̃k = {([0, 1], | · |); {f̃i := ai · + (b − Σⱼ₌₁ⁱ⁻¹ aj)}ᴺᵢ₌₁}
with b, ai ∈ (0, 1). Then the polynomial system formed from this measure will
have a unique solution for each k ∈ N by construction. However, the attractive measure of
limk→∞ Mk is µL¹, and so the polynomial system will have uncountably many solutions. This is
a contrasting example to that given in Example 3.1, where a sequence of measures absolutely
continuous with respect to Lebesgue was given that limited to a µL-singular measure.
Here we have briefly discussed the subtle points of computation in the one-dimensional case.
From now on, our computations are more exploratory in nature, and so these points will not be
made again.
4.4.2 Similitude Example with Dimension Theory
This subsection is a numerical experiment to show how the theory derived explicitly relates to
fitting a fractal model.
Assume that a black and white image of a Steemson triangle is given, and we may measure its
moments from its empirical distribution. Assume that this image is identified as the attractor
of an IFS given by three similitude maps — an element of H(X). Through the relationship
between H(X) and P(X) found earlier in the chapter, we fit an affine IFS with probabilities
¹This limiting dynamical system is an 'infinite IFS', related to Lüroth series.
model Mmodel to our black and white image. In this instance we place the assumption of the
OSC on our solution. Computationally we aim to solve the non-linear system:
‖ΦMmodel,21M21 −M21‖22 = 0,
where ΦMmodel,21 represents a system of 21 polynomial equations in 21 unknown IFS parameters.
Initially this system was constrained so that the model parameters ai, bi, ci, di had magnitude
less than 1/√2, to keep the contractivity of ΦMmodel,21 away from one; removing this constraint,
the computation reached the same result. As per the theory derived, the probabilities of
these maps are chosen such that pi = λi^D, where D is the box-counting/Hausdorff¹ dimension of
the image (Example 2.19), leaving only 18 unknown variables. For an affine map, the
contractivity factors λi are approximated numerically by the square root of the absolute value
of the determinant of the linear component of the affine map. More sophisticated optimisation
techniques, such as a sub-gradient method to handle the non-smooth absolute value function,
could be used, but we do not deem these necessary for our illustrative purposes. In the
Jupyter notebook provided, one also has the option to project from an affine map to a similitude
(in the induced L2 norm) at each iteration of the minimisation. This procedure is what
produced the pictures in Figure 4.3.
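Measuring the moments of a black and white image, as assumed above, amounts to treating its black pixels as samples of a uniform probability measure on [0, 1]². A small sketch (the pixel-centre convention and the row flip are our own assumptions, not fixed by the text):

```python
import numpy as np

def image_moments(img, k):
    """Empirical moments m_{n1,n2} of a binary image, viewed as the uniform
    probability measure on its set of nonzero pixels rescaled to [0,1]^2."""
    ys, xs = np.nonzero(img)
    h, w = img.shape
    x = (xs + 0.5) / w            # pixel centres mapped into [0, 1]
    y = 1.0 - (ys + 0.5) / h      # flip: image rows grow downwards
    return np.array([[np.mean(x**n1 * y**n2) for n2 in range(k + 1)]
                     for n1 in range(k + 1)])

# Sanity check: a fully black image gives (up to pixel discretisation error)
# the moments of Lebesgue measure on the unit square.
M = image_moments(np.ones((100, 100)), 3)
```

For a genuine attractor image, the resulting m_{n1,n2} are the data fed into the 21-equation system above.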
Here are some observations from our numerical experiment.
In making this selection of the probabilities, the first equation in our moment polynomials is
$$1 = \sum_{i=1}^{N} p_i = \sum_{i=1}^{N} \lambda_i^{D},$$
so our choice of the pi implicitly forces any (non-trivial) solution found to have the same
Hausdorff dimension as the object we are approximating, when the maps in the model are
similitudes and the IFS obeys the OSC.
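For similitudes obeying the OSC, the exponent D in this choice of pi is the Moran solution of Σᵢ λᵢᴰ = 1, which is cheap to compute by bisection since the left-hand side is strictly decreasing in D for ratios below one. A sketch (the Sierpinski-triangle ratios are an illustrative assumption):

```python
import math

def moran_dimension(ratios, lo=0.0, hi=2.0, tol=1e-12):
    # Bisect on D -> sum_i ratios_i^D - 1, strictly decreasing for ratios < 1.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sum(r**mid for r in ratios) > 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

ratios = [0.5, 0.5, 0.5]       # three similitudes of ratio 1/2 (Sierpinski)
D = moran_dimension(ratios)    # log 3 / log 2
p = [r**D for r in ratios]     # probabilities giving the 'uniform' measure
```

With these p the first moment equation Σ pi = 1 is satisfied automatically, as noted above.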
Different 'initial guesses' lead to different solutions found by the numerical solver. This is
expected, as we have a non-convex system of equations with various local minima; these local
minima are examined in the next section with the Collage Theorem for Moments.
On the following page, we show an initial guess of how to construct a Steemson triangle and
how minimising the moments reconstructs the fractal to machine accuracy. The interpretation
we give to this observation is a form of 'fractal polish'. It is known that an IFS's attractor is
continuous in its map parameters. Thus, a good initial guess can be made by forming an IFS
model that gives a similar image, and our moment minimisation acts as a refinement method.
For instance, the starting guess taken on the next page was three affine maps with orientations
'near correct', whereby the moment equations may deduce the rest of the fractal image.
Imagine that the maple leaf given through the 'graduate student algorithm' in Figure 3.3 was
used as an initial guess for a black and white photo of a real world maple leaf. Being able to make
a generic model, such as our fractal leaf, and use moments to capture the subtle differences a
collection of real maple leaves possess would be an interesting computational experiment.
¹These coincide for similitudes.
Figure 4.4: A black and white photo of a Steemson Triangle, that we wish to fit a fractal model
to. This photo can either be thought of as a set in H([0, 1]2) or ‘uniform’ measure supported on
a Steemson triangle.
Figure 4.5: Six fractal models given by an affine IFS with probabilities of three maps. Each
has its probabilities selected so that its attractive measure is uniform when the IFS obeys
the OSC. This sequence was produced by progressively minimising ‖Φ21M21 − M21‖²₂.
4.4.3 FIF Example with The Collage Theorem for Moments
Finally, we give an example of the third part of the theory created. That is, finding a self-
similar approximant of an object that does not have an exact self-affine representation through
the method of moment matching. We present the following proposition which is a specialisation
of the theory derived that pertains to FIFs, this gives justification to our numerical experiment.
Proposition 4.38. Let ΦM be given as in Theorem 3.53, where we restrict the underlying IFS
to be of the form given in Example 2.11. Further assume that the probabilities pi are chosen to
equal ai := xi − xi−1 for i ∈ [N]. Then:
1. there exists g such that the contractivity λ > 0 of Φ : (¹ℓ∞, ‖ · ‖g) → (¹ℓ∞, ‖ · ‖g) obeys
λ ≤ maxi∈[N] |ai||di| + ε = maxi∈[N] |xi − xi−1||di| + ε for any ε > 0;
2. µA(Projx(B)) = µL(Projx(B)) for all B ∈ B([0, 1]²), where µL is Lebesgue measure;
3. the moment vector for µA obeys Mx,y ∈ (c0, ‖ · ‖∞) and decays at a rate of O(1/√n);
4. the constants given in Corollary 4.34 obey C(λ) → 1 as N → ∞ and ε(n) → 0 as n → ∞ when
the interpolation points are chosen such that
$$\lim_{N \to \infty} \bigcup_{i \in [N]} \{x_i\} = [0, 1].$$
Proof. From Theorem 3.53 we have that, for a fixed g : R² → R² given through g(x, y) = (αx, βy),
we can bound ‖Φ‖g by
$$\big\|\Phi_{g \circ M \circ g^{-1}}\big\|_\infty \le \max_{i \in [N]} \sup_{n_1, n_2 \in \mathbb{N}} \big(|a_i| + \alpha |e_i|\big)^{n_1} \Big(\frac{\beta}{\alpha}|c_i| + |d_i| + \beta |f_i|\Big)^{n_2} < 1.$$
For any ε > 0 we may select α, β > 0 such that 1 > ε > maxi∈[N] max{α|ei|, (β/α)|ci|, β|fi|}. Then
‖Φ_{g∘M∘g⁻¹}‖∞ ≤ maxi∈[N] |ai||di| + O(ε) =: maxi∈[N] |ai||di| + ε, granting the first claim.
To show the second claim, it is sufficient to consider sets that form the topological sub-base of
[0, 1]² generating its Borel sets. These sets have the form B = [b1, b2] × [b3, b4] for bi ∈ [0, 1].
Define Projx(B) := [b1, b2] × [0, 1] =: Bx; sets of the form Bx then generate a sub-sigma-algebra
of B([0, 1]²). We see through a calculation:
$$\mu_A(B_x) = \sum_{i=1}^{N} p_i\, \mu_A\big(f_i^{-1}(B_x)\big) = \sum_{i=1}^{N} p_i \int \mathbf{1}_{f_i^{-1}(B_x)}\, d\mu_A = \sum_{i=1}^{N} p_i \int \mathbf{1}_{f_{x_i}^{-1}([b_1, b_2]) \times [0,1]}\, d\mu_A = \sum_{i=1}^{N} p_i\, \mu_A\big(f_{x_i}^{-1}(B_x)\big) =: \mu_{A_x}(B_x),$$
where we define fxi : R → R by fxi(x) := (xi − xi−1)x + xi−1, and above have used that
f_i⁻¹(Bx) ∩ [0, 1]² = (f_xi⁻¹([b1, b2]) × [0, 1]) ∩ [0, 1]², together with µA being supported on
A ⊂ [0, 1]². Now we have the projected IFS Fx = {([0, 1] × {[0, 1]}, dx); fx1, . . . , fxN}, where
dx(x × {[0, 1]}, y × {[0, 1]}) := |x − y|; this makes a compact metric space. The attractor of this
IFS is Ax = [0, 1] × {[0, 1]}, and when equipped with the probabilities pi = xi − xi−1 the invariant
measure obeys µAx(Bx) = b2 − b1 = µL(Bx) = µA(Bx). This gives the second claim.
Singleton sets are of µA-measure zero when the underlying IFS consists of more than one map,
and this IFS has only finitely many points outside the open ‖ · ‖∞ unit ball, granting the following
bound:
$$\int |x^{n_1} y^{n_2}|\, d\mu_A(x, y) = \int_{|x|, |y| < 1} |x^{n_1} y^{n_2}|\, d\mu_A(x, y) \xrightarrow{\;n_1 + n_2 \to \infty\;} 0.$$
To determine the rate of this moment convergence, recall from claim two that µA projected onto
the x-axis is Lebesgue measure supported on the unit interval, yielding
$$\int x^{n_1} y^{n_2}\, d\mu_A(x, y) \le \int x^{n_1}\, d\mu_A(x, y) = \int_0^1 x^{n_1}\, dx = \frac{1}{n_1 + 1} = (M_{x,y})_{\tau(n_1, 0)} = (M_{x,y})_{n_1(n_1+1)/2},$$
which through a change of variable gives the third claim. Note that in the case n1 = bn1ct we have
equality, so a better bound may not be achieved for the decay of the whole sequence.
The fourth claim is an immediate consequence of the first and third claims.
The interpretation of the fourth part of this proposition is: the bound in Corollary 4.34 improves
when more interpolation points are used in our FIF interpolant, and when these points
are chosen in a 'dense' fashion the contractivity factor of ΦM approaches zero. Furthermore,
the bound improves when more moment equations are considered in the minimisation.
In our numerical test, we are interested in minimising, in the variables xr, yr ∈ [0, 1] and
dr ∈ (0, 1), the function ‖ΦM,nMn − Mn‖²₂, equal to
$$\sum_{\tau(n_1, n_2) \le n} \Bigg( m_{n_1, n_2} - \sum_{I} \frac{n_1!\, n_2!}{i_1!\, k_1!\, i_2!\, j!\, k_2!}\, (x_r - x_{r-1})^{i_1 + 1}\, (y_r - y_{r-1} - d_r y_N)^{i_2}\, d_r^{\,j}\, x_{r-1}^{k_1}\, y_{r-1}^{k_2}\, m_{i_1 + i_2,\, j} \Bigg)^{2},$$
where I = {(i₁, k₁, i₂, j, k₂, r) : i₁ + k₁ = n₁, i₂ + j + k₂ = n₂, r ∈ [N]}, the selection
pr = |xr − xr−1| has been made, and only a local minimum of the function is needed. If we take
the parameterisation x0 = 0 and xN = 1 with xi = xi−1 + δi−1 for δi ∈ (0, 1), our model is flexible
enough to approximate another FIF made from N maps exactly; that is, we have a global minimum
of ‖ΦM,nMn − Mn‖²₂ = 0. This may prove difficult in computation as δr → 1 for some r ∈ [N].
Lemma 4.39. Let an IFS F of N maps be given that obeys the open set condition. Then there
exists another IFS F̃ of Ñ ≥ N maps that obeys the OSC, such that Ñ mod (N − 1) = 1 and
F̃(A) = F(A) = A for a unique A ∈ H(X).
This result is immediate, but grants the following corollary.
Corollary 4.40. Let an IFS F be of the form given in Example 2.11, whose attractor is a
fractal interpolation function, and let 0 < δ < 1. Then there exists an IFS F̃ of Ñ maps, related
to F as in the preceding lemma, such that the attractor of F̃ is the same as that of F and the
interpolation points obey |xi−1 − xi| < δ for i ∈ [Ñ].
Given a FIF constructed from an IFS with N maps, there is another IFS of Ñ maps with the
same attractor such that the interpolation points used to construct it are spaced no more than
δ apart in the x direction. In particular, when modelling a FIF with another FIF we may
constrain the parameterisation above to consider only δi < δ and still find an exact representation
if we select Ñ sufficiently large. Therefore, we could select a grid of points {xi}ᴺᵢ₌₀
such that xi+1 − xi < δ, where δ bounds the contractivity λ < δ of ΦM, so that we both control
the bound in the Collage Theorem for moments and have a model flexible enough to find an exact
solution. Our truncated FIF example from Chapter 2 does not have an exact representation, and
therefore we must approximate.
Our computational example has the following design. The first 36 moments of the empirical
distribution given by the measured data were calculated; selecting 36 moments grants ε(n) < 1/√36
from Proposition 4.38. A FIF model parametrised by the 'interpolation points' {xr, yr}ᴺᵣ₌₀,
where we constrain x0 = 0 and xN = 1, was made. An initial approximation is found through the
FIC method presented in Chapter 2, with N = 6 and the di chosen to match the fractal dimension
of the data given; see Figure 4.6. This N was chosen to give the same number of free parameters
as the example given with 16 maps in Chapter 2. Let the parameters used to generate this model
be denoted {xFICi, yFICi, dFICi}ᴺᵢ₌₁. We then apply 'fractal polish'; see Figure 4.7. A regularisation
term, Σᵢ₌₁ᴺ ‖xi − xFICi‖² + ‖yi − yFICi‖² + ‖di − dFICi‖², was added in this minimisation
to ensure that no gap |xi−1 − xi| becomes too large¹, recalling Proposition 4.38; the
selection pi = ai = xi − xi−1 is made. If the data were 'uniformly distributed' on
the fractal curve, we could exploit a simultaneous dimension/moment matching technique: a
generalisation of Theorem 4.29 to triangular IFSs is to select the probabilities pi = |ai|^(D−1)|di|
to ensure the attractive measure is 'uniform' on the IFS attractor, as one may conjecture
given Theorem 2.20 [BRS16].
Example 4.41. Recall the discretised one-dimensional image in [0, 1]² given in Example 2.24,
represented by the vector e1 = (1, 0, . . . , 0) ∈ R¹⁰⁰⁰. This image may be considered as the graph
of a function or as a self-similar measure on the unit interval (a discretised point mass at zero).
Taking the function perspective: in the Lᵖ norm the approximation e2 = (0, 1, 0, . . . , 0) was 'bad',
as ‖e1 − e2‖ᵖₚ = 2, although it visually looks like e1. If we instead think of the images as the
measures e1 = δ₀ and e2 = δ_{1/1000}, then dP(e1, e2) ≤ 1/1000, making this a 'good' approximation
visually, in the moment sense, and in the Hausdorff sense.
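The two readings of this example can be checked numerically; a minimal sketch (computing the Lᵖ gap exactly, and the trivial transport bound on the Prokhorov distance):

```python
import numpy as np

# e1 and e2 as functions: indicator vectors that never overlap, so the p-th
# power of every L^p distance is maximal regardless of how close the spikes sit.
e1, e2 = np.zeros(1000), np.zeros(1000)
e1[0], e2[1] = 1.0, 1.0
lp_gap_pth_power = np.sum(np.abs(e1 - e2) ** 2)   # ||e1 - e2||_p^p = 2 for all p

# e1 and e2 as measures: point masses at 0 and 1/1000; moving all mass a
# distance of 1/1000 matches them, bounding the Prokhorov distance.
prokhorov_bound = abs(0.0 - 1.0 / 1000.0)
```

The function view reports a maximal error, while the measure view reports an error at the resolution of a single pixel, which is the behaviour the moment method inherits.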
The assumption of the function being continuous, or even of the object we are approximating
being connected, is superfluous to the moment technique, as witnessed in our Steemson
triangle example. Thus, this method can be thought of as an extension of that shown in [BV15]
to discontinuous models. To gain a method that can find a truncated FIF exactly, we must
progress from our simple example to local fractal models.
¹We thank Markus Hegland for the suggestion to use regularisation.
[Plot: Approximant Dimension = 1.538, Hausdorff Distance = 0.223. Series: Data, Approximant.]
Figure 4.6: A one-dimensional image (blue) and a FIF approximation made with the techniques
given in Chapter 2 that matches the fractal dimension of the data and is close in the L2 sense.
[Plot: Approximant Dimension = 1.52, Hausdorff Distance = 0.08. Series: Data, Approximant, Interpolation Points.]
Figure 4.7: A moment approximation with the methodology described in the previous section.
The interpolation points need not lie on the function given. In adding the moment quantities, the
approximant becomes closer to the given function, in the Hausdorff sense, than the approximant
above, at the expense of the dimensions not matching exactly.
Chapter 5
Loosening Self-Similarity
‘Fractals Everywhere.’ – Michael Barnsley.
It is naive to assume that an object is globally self-similar. Fractal modelling is based on a very
simple premise: in many objects, such as images of nature, parts of the object are similar to other
parts. Being able to model these relationships leads to an efficient description, and therefore
storage, of the object. For instance, within a picture of a tree, one single leaf on the tree will
look very similar to another leaf. However, it is untrue that the whole image of the tree looks
like a leaf. Therefore, we must model real world objects locally, not globally as we have in our
simplified examples.
In this chapter we give a brief introduction to how one moves from the global models presented
thus far to local ones. Once this is done, we show that one can calculate the moments of a
self-affine local fractal as a generalisation of the tools we have created so far. Finally, a
commentary is made on how modern image compression differs from the theory built up
in this thesis and how the work done here fits into the overall story of fractal approximation.
In this commentary, only the ideas are sketched, along with how they are fundamentally different
from the approach explored here.
5.1 Graph-Directed IFS (GD-IFS)
Recall that when we had an IFS that obeyed the OSC, then equipped this IFS with probabilities,
the dynamical system one could create was isomorphic to a Bernoulli shift. A natural extension
of a Bernoulli shift is a Markov shift, whereby place dependent probabilities are put on the
construction of our measure. This is precisely the type of construction we move to here.
5.1.1 Construction
From Chapter 3 we know that when (P(X), dP) is a compact metric space, this implies (X, d)
is also a compact metric space. In this chapter a contractive operator, called an IFS of prob-
abilities, M : (P(X), dP) → (P(X), dP) was built. Now we aim to build a vectorised version of
this construction. A summary of this construction is that we will build an IFS that acts on the
collection {(Xi, di)}Ni=1 of compact metric spaces. We will firstly build a collection of operators
Mi,j : P(Xi) → P(Xj) in a near identical fashion to Chapter 3. Finally, let P =⊗V
i=1 P(Xi),
then an operator (and IFS) are built acting on M : P → P; also a compact metric space as
the product of compact metric spaces. The analogous fixed point of such a space is a vector of
normalised Borel measures.
Select Xi = [0, 1]² for all i ∈ [V]. We keep the general notation, as the subscripts aid in
understanding where the functions map between. For this section, we take hi,j,k to be a function
hi,j,k : Xi → Xj for k ∈ [Ni,j], where Ni,j is the number of functions acting from Xi to Xj.
Let (Xi, di) and (Xj, dj) be compact metric spaces. Then define the operator Mi,j : P(Xi) → P(Xj) through
$$(\mathcal{M}_{i,j}(\mu))(S) := \sum_{k=1}^{N_{i,j}} p_{i,j,k}\, \mu\big(h_{i,j,k}^{-1}(S)\big) \qquad (5.1)$$
for an IFS with probabilities¹ Mi,j = {(Xi, di) → (Xj, dj); hi,j,1, . . . , hi,j,Ni,j; pi,j,1, . . . , pi,j,Ni,j}.
Now define the operator M : P → P component-wise through
$$(\mathcal{M}(\mu))_j(S_j) := \sum_{i=1}^{V} P_i\, (\mathcal{M}_{i,j}(\mu_i))(S_j) = \sum_{i=1}^{V} P_i \sum_{k=1}^{N_{i,j}} p_{i,j,k}\, \mu_i\big(h_{i,j,k}^{-1}(S_j)\big),$$
where Pi ∈ (0, 1) are normalising constants such that Σⱽᵢ₌₁ Pi = 1, and Sj is the jth component
projection of the Borel set S ∈ B(P) given by the product topology. There is a natural metric
given by a vectorised form of dP:
$$d(\mu, \nu) = \sup\left\{ \sum_{i=1}^{V} \int f_i\, d(\mu_i - \nu_i) \;:\; f_i \in \mathrm{Lip}_1(X_i) \right\},$$
under which, by reasoning identical to that of Chapter 3, M is contractive when each of the
functions hi,j,k : Xi → Xj is. This assumption is stronger than required, as alluded to by the
point made on the chaos game in Lemma 3.22: one only requires that the operator above has a
unique invariant measure, with the maps being 'contractive on average' [Elt87].
¹This is slightly incorrect, as we have defined an IFS with probabilities to act from a space to itself. We
allow this minor abuse of correctness for pedagogical reasons.
Example 5.1. The following example is a generalisation of the Generalised Sierpinski Triangles
given in [SW18]; in particular, it is a combination of a Steemson and a Pedal triangle.
Let A1 = A2 be the triangle contained in the unit square with one side being the line segment
from zero to one, and the two other side lengths given by the values a = 1.1 and b = 0.85 in
the figure below, with Cx, Cy defined in Example 2.10. Note that the triangles Ai,j and Ai are
similar for i ∈ [2] (here V = 2) and j ∈ [3]. The similarities given on the left may be used to create
a Steemson triangle; those on the right make a Pedal triangle.
[Diagram: the triangle Ai ⊂ Xi = [0, 1]², with side lengths b, a, 1 and apex (Cx, Cy), subdivided into the similar triangles Ai,1, Ai,2, Ai,3; shown for i = 1 (Steemson) and i = 2 (Pedal).]
Figure 5.1: First Order Steemson and Pedal fractal triangles with a = 1.1 and b = 0.85
Let two copies of the unit square in R² be given and equip these with the Euclidean metric,
making compact metric spaces to house our figures; call these X1 and X2. We now describe
visually the similitude maps that create a GD-IFS constructed from the two generalised
Sierpinski triangles. Referring to the figure above, let hk,k,l : Xk → Xk map Ak through a
similitude to Ak,l+1 for l ∈ {1, 2}. Furthermore, let hi,j,1 : Xi → Xj for i ≠ j and i, j ∈ {1, 2},
where hi,j,1 maps Ai to Aj,1; this is possible as Ai and Aj,1 are similar triangles. These
relations can be represented by the following directed graph.
[Directed graph: vertices X1, X2; self-loops h1,1,1, h1,1,2 at X1 and h2,2,1, h2,2,2 at X2; edges h1,2,1 : X1 → X2 and h2,1,1 : X2 → X1.]
We may equip these maps with any set of probabilities pi,j,k and normalising probabilities Pi
to form the operator M. Using Elton's Theorem we may generate an image of the attractor of
this GD-IFS; as alluded to in Lemma 3.22, this theorem holds rather generally, even in the
case of place dependent probabilities in the 'Markov system' we have here.
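The chaos game for a GD-IFS is exactly such a Markov system: a random walk on the directed graph that applies the map attached to each traversed edge and bins each point by the vertex it arrives at. A one-dimensional sketch with hypothetical affine edge maps (not the triangle similitudes above), purely to show the mechanics:

```python
import random

# edges[v] lists the outgoing edges of vertex v as (map h : X_v -> X_w, w).
edges = {1: [(lambda x: x / 3, 1), (lambda x: x / 2, 2)],
         2: [(lambda x: x / 3 + 2 / 3, 1), (lambda x: x / 2 + 1 / 2, 2)]}

def gd_chaos_game(edges, steps=20_000, seed=1):
    rng = random.Random(seed)
    v, x = 1, 0.0
    orbits = {w: [] for w in edges}   # one point cloud per vertex space
    for _ in range(steps):
        h, v = rng.choice(edges[v])   # traverse a random outgoing edge
        x = h(x)                      # the point now lives in X_v
        orbits[v].append(x)
    return orbits

orbits = gd_chaos_game(edges)         # approximates the two attractors
```

For this symmetric example, uniform edge choices sample the invariant vector of measures with P1 = P2 = 1/2; plotting each orbit gives pictures analogous to Figure 5.2.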
Figure 5.2: The two attractors of the GD-IFS given above, coloured in accordance with the
edges of the directed graph. Notice that neither attractor is globally self-similar.
5.1.2 Moment Calculation for GD-IFS
The point of this section is to show how the simplified examples shown in this thesis relate to
image compression. In particular, the machinery in this thesis is readily generalised to local
fractal models which are more abundant in practice [Jac92]. Let µ ∈ P := ⊗Vi=1P(Xi) where
Xi is a compact subset of R2 and Mj := (1,∫xdµj(x, y),
∫ydµj(x, y), . . . ). We then have the
recurrence relationship through recalling the notation (Notation 3.45) used in Chapter 3:
\[
(M_j)_{\tau(n,m)} = \int x^n y^m \, d(M(\mu))_j(x, y) = \sum_{i=1}^{V} P_i \sum_{k=1}^{N_{i,j}} p_{i,j,k} \int (x^n y^m) \circ h_{i,j,k} \, d\mu_i(x, y),
\]
which gives the vector equation, with the notation used in Chapter 3:
\[
M_j = \sum_{i=1}^{V} P_i \sum_{k=1}^{N_{i,j}} p_{i,j,k} \, \phi_{h_{i,j,k}} M_i.
\]
When the above operator is truncated to the $n$th level, $n \in \mathbb{N}_0$, we can construct the following block matrix form of a perfectly determined system of equations:
\[
M = \begin{pmatrix} M_1 \\ \vdots \\ M_V \end{pmatrix}
= \begin{pmatrix}
P_1 \sum_{k=1}^{N_{1,1}} p_{1,1,k} \phi_{h_{1,1,k}} & \cdots & P_V \sum_{k=1}^{N_{V,1}} p_{V,1,k} \phi_{h_{V,1,k}} \\
\vdots & \ddots & \vdots \\
P_1 \sum_{k=1}^{N_{1,V}} p_{1,V,k} \phi_{h_{1,V,k}} & \cdots & P_V \sum_{k=1}^{N_{V,V}} p_{V,V,k} \phi_{h_{V,V,k}}
\end{pmatrix}
\begin{pmatrix} M_1 \\ \vdots \\ M_V \end{pmatrix}. \tag{5.2}
\]
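The triangular structure behind systems of this kind is easiest to see in one dimension. The following Python sketch is a simplified single-vertex analogue, not the full GD-IFS machinery: for an affine IFS $h_k(x) = a_k x + e_k$ with probabilities $p_k$, pushing $x^n$ through $h_k$ and expanding by the binomial theorem (the role played by the $\phi_h$ operators above) yields a triangular linear system that can be solved level by level.

```python
from math import comb

def ifs_moments(maps, probs, n_max):
    """Moments M_0..M_n_max of the invariant measure of a 1-D affine IFS.

    maps:  list of (a_k, e_k) with |a_k| < 1, for h_k(x) = a_k x + e_k.
    probs: probabilities p_k summing to 1.
    Uses M_n = sum_k p_k sum_{m<=n} C(n,m) a_k^m e_k^(n-m) M_m, solved
    for M_n since the m = n coefficient sum_k p_k a_k^n is below 1.
    """
    M = [1.0]  # M_0 = 1 for a probability measure
    for n in range(1, n_max + 1):
        lower = sum(p * sum(comb(n, m) * a**m * e**(n - m) * M[m]
                            for m in range(n))
                    for (a, e), p in zip(maps, probs))
        diag = sum(p * a**n for (a, e), p in zip(maps, probs))
        M.append(lower / (1.0 - diag))
    return M

# Uniform middle-thirds Cantor measure: h_1(x) = x/3, h_2(x) = x/3 + 2/3.
# This gives M_1 = 1/2 and M_2 = 3/8.
moments = ifs_moments([(1 / 3, 0.0), (1 / 3, 2 / 3)], [0.5, 0.5], 4)
```

The full system (5.2) replaces each scalar coefficient by a block acting across vertices, but the level-by-level solvability is the same phenomenon.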
This is completely reminiscent of the construction in Chapter 3. Naturally we aim to make use of the machinery created there: a contractive operator whose fixed point is the vector of moment sequences. We use the notation built in Chapter 3.
Let ${}_1\tilde{\ell}^{\infty} = \bigotimes_{i=1}^{V} {}_1\ell^{\infty}$, then define the operator $\Phi : {}_1\tilde{\ell}^{\infty} \to {}_1\tilde{\ell}^{\infty}$ through
\[
(\Phi(x))_j = \Phi_j(x_j) := \sum_{i=1}^{V} P_i \sum_{k=1}^{N_{i,j}} p_{i,j,k} \, \phi_{h_{i,j,k}} x_i,
\]
where $x = (x_1, \ldots, x_V)$ with $x_i \in {}_1\ell^{\infty}$.
Theorem 5.2. Let a graph-directed IFS $M : P \to P$ be given such that each IFS $M_{i,j} : X_i \subset \mathbb{R}^2 \to X_j \subset \mathbb{R}^2$ has functions $h_{i,j,k}$, for $k \in [N_{i,j}]$, of the form
\[
h_{i,j,k} \begin{pmatrix} x \\ y \end{pmatrix}
= \begin{pmatrix} a_{i,j,k} & 0 \\ c_{i,j,k} & d_{i,j,k} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}
+ \begin{pmatrix} e_{i,j,k} \\ f_{i,j,k} \end{pmatrix},
\qquad |a_{i,j,k}|, |d_{i,j,k}| < 1.
\]
Define the norm $\| \cdot \|_g$ on ${}_1\tilde{\ell}^{\infty}$ through $\|x\|_g = \sum_{j=1}^{V} \|x_j\|_{g_j}$, where $\| \cdot \|_{g_j}$ is the norm defined in Proposition 3.52. Then there exist $g_i$ such that $\Phi : ({}_1\tilde{\ell}^{\infty}, \| \cdot \|_g) \to ({}_1\tilde{\ell}^{\infty}, \| \cdot \|_g)$ is a contraction.
Proof. Recall that the working shown in Theorem 3.53 allows the selection of functions $g_i$ mapping $(x, y) \mapsto (\alpha_i x, \beta_i y)$ such that $\|\Phi_{g_j \circ M_{i,j} \circ g_j^{-1}}\|_\infty < 1$ for all $i, j \in [V]$. Then, through near-identical working to Theorem 3.53, we have
\[
\|\Phi(x)\|_g = \sum_{j=1}^{V} \|\Phi_j(x_j)\|_{g_j}
\le \sum_{j=1}^{V} \|\Phi_{g_j \circ M_{i,j} \circ g_j^{-1}}\|_\infty \|x_j\|_{g_j}
\le \max_{j \in [V]} \|\Phi_{g_j \circ M_{i,j} \circ g_j^{-1}}\|_\infty \, \|x\|_g
< \|x\|_g,
\]
completing the claim.
Corollary 5.3. Let $M$ be of the form assumed above and let $M^*$ be the vector of moments of the invariant measure $\mu \in P$, lying in $({}_1\tilde{\ell}^{\infty}, \| \cdot \|_g)$. Let $M \in ({}_1\tilde{\ell}^{\infty}, \| \cdot \|_g)$ and let $M_n$ be its $n$th level truncation. Then
\[
\|M^* - M\|_g \le C(\lambda) \, \|\Phi_{M,n} M_n - M_n\|_2 + \varepsilon(n)
\]
for some constants $C(\lambda), \varepsilon(n) > 0$ that depend on the truncation level $n$ and the contractivity $\lambda$, where $\varepsilon(n)$ can be made small by increasing $n$.
Remark 5.4. Given a one-dimensional image, the theorem above is general enough to use moments in the fractal image compression used in practice [Jac92], namely graph-directed FIF functions, which are explained in detail in [DO17]. Using the graded lexicographical ordering $x \prec_{\mathrm{lex}} y \prec_{\mathrm{lex}} z$, the results in this thesis could be lifted to the three-dimensional case for use in current image compression. Finally, Corollary 5.3 justifies finding a local minimum of the moment equations to approximate, with moments, a real-world image that is not locally self-similar.
Using identical working to the above, combined with Theorem 3.55, we may iteratively calculate the moments of our GD-IFS generalised Sierpinski triangles.
Example 5.5. Let the graph-directed IFS be given as in Example 5.1. For this calculation, we have made the selection $P_i \cdot p_{i,j,k} = \lambda_{i,j,k}^{D_j}$, where $D_j$ is the root of $\lambda_{j,j,1}^{D_j} + \lambda_{j,j,2}^{D_j} + \lambda_{i,j,1}^{D_j} = 1$ for $i, j \in [2]$. The first six moments for these attractors are
\[
(1, 0.4265, 0.2646, 0.241, 0.0979, 0.118), \quad (1, 0.4327, 0.2705, 0.2446, 0.1022, 0.1208),
\]
which agree with the result computed here through Elton's Theorem.
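The exponents $D_j$ in Example 5.5 solve a Moran-type equation, and since $D \mapsto \sum_k \lambda_k^D$ is strictly decreasing for ratios in $(0,1)$, a simple bisection finds the root. The Python sketch below uses illustrative contraction ratios (three half-scale maps), not the actual Steemson and Pedal ratios of the example.

```python
def moran_root(ratios, lo=0.0, hi=10.0, tol=1e-12):
    """Solve sum(r**D for r in ratios) = 1 for D by bisection.

    Valid for ratios in (0, 1): the left side decreases monotonically
    from len(ratios) at D = 0 towards 0, so the root is unique.
    """
    f = lambda D: sum(r**D for r in ratios) - 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:      # sum still too large: need a larger exponent
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Three maps of ratio 1/2: the root is log 3 / log 2 (the Sierpinski
# triangle's similarity dimension).
D = moran_root([0.5, 0.5, 0.5])
# The products P_i * p_{i,j,k} in Example 5.5 are then ratio**D,
# which sum to 1 by construction of D.
probs = [0.5**D, 0.5**D, 0.5**D]
```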
5.1.3 Application of Moments to Fractal Image Compression
This thesis has begun the study of a tool for fractal approximation using moments. As alluded to in our graph-directed example, the structure created in Chapter 3 generalises to the IFSs used in practice. As we saw in Chapter 4, the bound in the Collage Theorem for moments gives an approximation theory for triangular IFSs, which in this chapter was shown to generalise to local models. In Chapter 2 we saw how $L^p$ spaces were used as tools to approximate elements of $\mathbb{H}(X)$, and the methods in this space were sufficient to model photographs. For example, the methodology presented in Chapter 2 with a local fractal model produces the following approximant to a picture of a hot air balloon.
Figure 5.3: Original photo (left), FIC compressed photo (right) [B+96].
This is ‘fractal enhancement’. When data lies on a straight line, it is not ridiculous to interpolate it linearly; here we have self-referential data and a (locally) self-referential interpolation enhancing the image. In allowing more variability in the fractal model of a photograph, as our moment approximations aim to do, one better models the data of the image, which leads to effective prediction from a resolution-independent model as shown above.
The use of moments to approximate the Hausdorff distance to create such models has been done heuristically in [Wel99]. In that work, the design of a neural network to automate finding self-similarities in an image is given. The key heuristic observation is that the current FIC algorithm is too restrictive in searching for self-similarity in images. Concisely, current FIC algorithms are constrained so that the costly optimisation being performed is discrete, and they further assume that an image is sufficiently rich in self-similarity that fixing some map parameters will still yield a good approximant. Our moment method lets the parameters live in a continuum, granting more flexibility as seen in Proposition 4.38. Investigating whether the moment theory derived in this thesis could be used to make a less restrained fractal image compression would be an interesting area of application.
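To illustrate the discrete search criticised above, here is a toy one-dimensional Python sketch of the block-matching step in standard FIC in the spirit of [Jac92]: for each range block, only a contrast $s$ and brightness $o$ are fitted per candidate domain block, the map geometry being fixed. Real implementations work on 2-D blocks with downsampling, quantise $(s, o)$, and clamp $|s|$ for contractivity; none of that is shown here.

```python
def best_domain(range_block, domain_blocks):
    """Least-squares fit range ~ s * domain + o for each candidate domain.

    Returns (error, index, s, o) for the best-matching domain block;
    this per-pair fit is the only continuous optimisation in standard
    FIC, the rest of the search being a discrete scan over blocks.
    """
    n = len(range_block)
    mean_r = sum(range_block) / n
    best = None
    for idx, d in enumerate(domain_blocks):
        mean_d = sum(d) / n
        var_d = sum((x - mean_d) ** 2 for x in d)
        s = 0.0 if var_d == 0 else sum(
            (x - mean_d) * (y - mean_r)
            for x, y in zip(d, range_block)) / var_d
        o = mean_r - s * mean_d
        err = sum((s * x + o - y) ** 2 for x, y in zip(d, range_block))
        if best is None or err < best[0]:
            best = (err, idx, s, o)
    return best
```

A moment-based method would instead optimise over the continuum of map parameters directly, rather than scanning a finite list of candidate blocks.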
5.2 A Commentary on Some Modern Approaches
Fractal image compression is a modelling tool, not an effective means to describe an object. The
moment method presented in this thesis was on the side of assuming very little when modelling
a self-similar object. Given a Cantor set, the moment method could deduce all the IFSs that
generate it. This ability is accompanied by the drawback of increased difficulty in computation
as witnessed in Chapter 4. In Chapter 2, we saw in our least squares example that placing more
assumptions on an IFS can make the computations easier at the expense of flexibility in the
fractal model. To create fractal models in application, such as the picture shown in Figure 5.3,
these opposing ideas must be balanced.
If one sways to the side taken in this thesis, the area of application is that of ‘super-resolution’. In this area, data is given at measured finite resolutions and one wants to make deductions at a higher resolution; this is a computationally expensive interpolation problem. In contrast, if computability is favoured, then stronger assumptions on the IFS maps are made. For many modern image compression techniques, this is the case.
A common idea used in compression is to find a ‘good basis’ in which to describe data, one in which analysis of complicated objects, like images, can actually be performed. One such example is the use of wavelets and a multiresolution basis. The intuition for this basis is that, when approximating functions, one should first make a crude approximation that agrees with the function on a large scale and then include additional terms, capturing the finer details, to improve the approximation. Imagine we have a function and its approximation graphed on a wall. How would we measure ‘how good’ the approximation is? The multiresolution approach is to look at the wall, start walking away from it, and stop when the two are indistinguishable to the eye. We then measure how many steps we took from the wall, which we consider to be ‘how far away’ the approximation is from the function. For an explicit example, refer to Figure A.4 and do this experiment. The construction of a wavelet basis from FIFs with fixed map parameters is shown in [DGHM96].
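The ‘walk away from the wall’ intuition can be sketched numerically with the simplest multiresolution system, the Haar basis: project a sampled signal onto successively coarser resolution spaces by averaging blocks, and watch the sup-norm error shrink as finer levels are restored. This is an illustrative stdlib-only sketch, not the FIF-wavelet construction of [DGHM96].

```python
def haar_projection(signal, level):
    """Project onto the Haar resolution space V_level.

    Averages over blocks of size 2**level and holds the average constant,
    i.e. the piecewise-constant approximation at that scale. level = 0 is
    the identity; larger levels are coarser.
    """
    block = 2 ** level
    out = []
    for start in range(0, len(signal), block):
        chunk = signal[start:start + block]
        avg = sum(chunk) / len(chunk)
        out.extend([avg] * len(chunk))
    return out

def sup_error(f, g):
    """Sup-norm distance: 'how many steps from the wall' before they blur."""
    return max(abs(a - b) for a, b in zip(f, g))

signal = [1.0, 3.0, 2.0, 2.0, 5.0, 7.0, 6.0, 4.0]
# Errors at successively finer levels: coarse-to-fine, nonincreasing.
errors = [sup_error(signal, haar_projection(signal, j)) for j in (3, 2, 1, 0)]
```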
In [MS15] the authors fix IFS map parameters for a superIFS, an IFS on $(\mathbb{H}(\mathbb{H}(X)), d_{\mathbb{H}})$. To describe an image, a vectorised version of this space, $(\mathbb{H}(\mathbb{H}^V(X)), d_{\mathbb{H}})$, is needed. This space is ‘huge’ and a model here must be greatly constrained for computation; in fact the IFS used is a two-dimensional version of Example 3.8. A compression scheme is given that allows for $V$ different similarities (see Figure A.5) in their ‘basis’, but, like wavelets, this is an effective way to describe an image, not model it. Compression is achieved by using IFS theory (specifically codespace) to describe the image efficiently. Again, resolution independence is lost here, as we are fundamentally describing the self-similarities in the image, not using the self-similarities to model how the image relates to itself.
Bibliography
[ABVW10] Ross Atkins, Michael F Barnsley, Andrew Vince, and David C Wilson. A char-
acterization of hyperbolic affine iterated function systems. Topology Proceedings,
36:189–211, 2010.
[Alt80] Mieczysław Altman. Iterative methods of contractor directions. Nonlinear Analysis:
Theory, Methods & Applications, 4(4):761–771, 1980.
[B+96] Michael F Barnsley et al. Fractal image compression. Notices of the AMS, 43(6):657–
662, 1996.
[Bar14] Michael F Barnsley. Fractals everywhere. Academic press, 2014.
[BBHV16] Christoph Bandt, Michael Barnsley, Markus Hegland, and Andrew Vince. Old
wine in fractal bottles: Orthogonal expansions on self-referential spaces via fractal
transformations. Chaos, Solitons & Fractals, 91:478–489, 2016.
[BD85] Michael F Barnsley and Stephen Demko. Iterated function systems and the global
construction of fractals. Proceedings of the Royal Society of London. A. Mathemat-
ical and Physical Sciences, 399(1817):243–275, 1985.
[BE88] Michael F Barnsley and John H Elton. A new class of Markov processes for image
encoding. Advances in Applied Probability, 20(1):14–32, 1988.
[BH89] Michael F Barnsley and Andrew N Harrington. The calculus of fractal interpolation
functions. Journal of Approximation Theory, 57(1):14–34, 1989.
[BHR06] Christoph Bandt, Nguyen Hung, and Hui Rao. On the open set condition for self-
similar fractals. Proceedings of the American Mathematical Society, 134(5):1369–
1374, 2006.
[BRS16] Balázs Bárány, Michał Rams, and Károly Simon. On the dimension of self-affine
sets and measures with overlaps. Proceedings of the American Mathematical Society,
144(10):4427–4440, 2016.
[BRS18] Balázs Bárány, Michał Rams, and Károly Simon. Dimension of the repeller for a
piecewise expanding affine map. arXiv preprint arXiv:1803.03788, 2018.
[BRS19] Balázs Bárány, Michał Rams, and Károly Simon. Dimension theory of some non-
Markovian repellers part I: A gentle introduction. arXiv preprint arXiv:1901.04035,
2019.
[Buc76] Bruno Buchberger. A theoretical basis for the reduction of polynomials to canonical
forms. ACM SIGSAM Bulletin, 10(3):19–29, 1976.
[BV15] Michael F Barnsley and P Viswanathan. Discontinuous fractal functions and fractal
histopolation. arXiv preprint arXiv:1503.06903, 2015.
[BV18] Michael F Barnsley and Andrew Vince. Tiling iterated function systems and
Anderson-Putnam theory. arXiv preprint arXiv:1805.00180, 2018.
[Dau92] Ingrid Daubechies. Ten lectures on wavelets, volume 61. SIAM, 1992.
[DGHM96] George C Donovan, Jeffrey S Geronimo, Douglas P Hardin, and Peter R Massopust.
Construction of orthogonal wavelets using fractal interpolation functions. SIAM
Journal on Mathematical Analysis, 27(4):1158–1192, 1996.
[DO17] Ali Deniz and Yunus Ozdemir. Graph-directed fractal interpolation functions. Turk-
ish Journal of Mathematics, 41(4):829–840, 2017.
[Elt87] John H Elton. An ergodic theorem for iterated maps. Ergodic Theory and Dynamical
Systems, 7(4):481–488, 1987.
[EST07] C Escribano, MA Sastre, and E Torrano. A fixed point theorem for moment ma-
trices of self-similar measures. Journal of Computational and Applied Mathematics,
207(2):352–359, 2007.
[Fal04] Kenneth Falconer. Fractal geometry: mathematical foundations and applications.
John Wiley & Sons, 2004.
[Gra18] Alexandra Grant. Mixed tiling systems and anderson putnam theory. Honours
thesis, Australian National University, Acton, Canberra, 2018.
[Hau78] Felix Hausdorff. Grundzüge der Mengenlehre, volume 61. American Mathematical
Soc., 1978.
[HM90] CR Handy and Giorgio Mantica. Inverse problems in fractal construction: moment
method solution. Physica D: Nonlinear Phenomena, 43(1):17–36, 1990.
[Hut79] John E Hutchinson. Fractals and self similarity. University of Melbourne, Department
of Mathematics, 1979.
[IS13] Konstantin Igudesman and Gleb Shabernev. Novel method of fractal approximation.
Lobachevskii Journal of Mathematics, 34(2):125–132, 2013.
[Jac92] Arnaud E Jacquin. Image coding based on a fractal theory of iterated contractive
image transformations. IEEE Transactions on Image Processing, 1(1):18–30, 1992.
[JKS11] Palle ET Jørgensen, Keri A Kornelson, and Karen L Shuman. Iterated function sys-
tems, moments, and transformations of infinite matrices. American Mathematical
Soc., 2011.
[Kol31] Andrei Kolmogoroff. Über die analytischen Methoden in der Wahrscheinlichkeits-
rechnung. Mathematische Annalen, 104(1):415–458, 1931.
[KR15] Antti Käenmäki and Eino Rossi. Weak separation condition, Assouad dimension,
and Furstenberg homogeneity. arXiv preprint arXiv:1506.07851, 2015.
[Man96] Giorgio Mantica. A stable Stieltjes technique for computing orthogonal polynomi-
als and Jacobi matrices associated with a class of singular measures. Constructive
Approximation, 12(4):509–530, 1996.
[Man13] Giorgio Mantica. Direct and inverse computation of Jacobi matrices of infinite iter-
ated function systems. Numerische Mathematik, 125(4):705–731, 2013.
[MMR00] Pertti Mattila, Manuel Moran, and Jose-Manuel Rey. Dimension of a measure.
Studia Math, 142(3):219–233, 2000.
[Mor46] Pat AP Moran. Additive functions of intervals and Hausdorff measure. In Mathe-
matical Proceedings of the Cambridge Philosophical Society, volume 42, pages 15–23.
Cambridge University Press, 1946.
[Mor99] Manuel Moran. Dynamical boundary of a self-similar set. Fundamenta Mathemati-
cae, 160(1):1–14, 1999.
[MS89] Giorgio Mantica and Alan Sloan. Chaotic optimization and the construction of
fractals: solution of an inverse problem. Complex Systems, 3(1), 1989.
[MS15] Franklin Mendivil and Örjan Stenflo. V-variable image compression. Fractals,
23(02):1550007, 2015.
[MT01] H Michael Möller and Ralf Tenberg. Multivariate polynomial system solving using
intersections of eigenspaces. Journal of Symbolic Computation, 32(5):513–531, 2001.
[Sch94] Andreas Schief. Separation properties for self-similar sets. Proceedings of the Amer-
ican Mathematical Society, 122(1):111–115, 1994.
[Ste04] Hans J Stetter. Numerical polynomial algebra, volume 85. SIAM, 2004.
[Ste18] Kyle Steemson. The open set condition and neighbour graphs. Honours thesis,
Australian National University, Acton, Canberra, 2018.
[SW18] Kyle Steemson and Christopher Williams. Generalised Sierpinski triangles. arXiv
preprint arXiv:1803.00411, 2018.
[Vin18] Andrew Vince. Global fractal transformations and global addressing. Journal of
Fractal Geometry, 5(4):387–418, 2018.
[Wal00] Peter Walters. An introduction to ergodic theory, volume 79. Springer Science &
Business Media, 2000.
[Wel99] Stephen T Welstead. Fractal and wavelet image compression techniques. SPIE
Optical Engineering Press Bellingham, Washington, 1999.
[WW08] Weiping Wang and Tianming Wang. Generalized riordan arrays. Discrete Mathe-
matics, 308(24):6466–6500, 2008.
Appendix A
Further Fractal Figures
‘Let me check your figures, last time you made scary photos of Griff.’
– Jane Tan.
‘You need to have our triangles, with their names, somewhere.’
– Kyle Steemson.
Figure A.1: A (periodic) fractal tiling given by a Williams triangle, as named in [Gra18].
Figure A.2: First eight orthogonal polynomials for uniform Cantor measure µC computed by the
method given in [JKS11].
Figure A.3: A FIF of two maps (blue) with a moment approximation by a FIF made from two
maps (orange). Next, a FIF of three maps approximated by a FIF of two maps with moments.
Figure A.4: Visualisation of an image projected to the various resolution spaces for a Haar
wavelet compression, see [Dau92].
Figure A.5: Image storage using a V-variable coding of $[0, 1]^2$ for $V = 4^j$ where $j \in \{0, \ldots, 5\}$; for compression algorithm details see [MS15].