MSc Physics and Astronomy
Track: Theoretical Physics

Master Thesis

Convolutional Neural Network Assisted Limited Angle Tomography
by FBP, SIRT and TV-MIN Approximation

by Wiebe Stolp (10346996)

July 12, 2018

Supervisor: Prof. Dr. Sander M. Bohté
Assessor: Prof. Dr. Bernard Nienhuis
Co-assessor: Dr. Greg J. Stephens



Abstract

Tomography, as done in CT and X-ray scanning, revolves around the conversion of projection data, obtained at a number of different angles by emitting and measuring penetrating radiation on opposite sides of an unknown object, into a reconstructed image of this object. Since penetrating radiation is generally harmful, limiting the necessary number of projection angles to a minimum is one of tomography's essential topics. Various mathematically backed reconstruction methods exist, but retaining quality with fewer projection angles comes at the cost of computation time. This thesis explores the ability of a newly popularized computational tool that mimics some principles of the biological brain, the convolutional neural network, to quickly approximate higher quality reconstructions produced by three different tomographic methods, using only a lower quality, computationally cheap limited angle reconstruction as input. By training the network on a known set of simulated example pairs, both in the presence and absence of measurement noise, it is able to successfully learn such an approximation and thereby either avoid significant computation time or reduce the number of necessary projection angles.


Contents

1 Introduction
  1.1 Tomography
  1.2 Artificial Neural Networks
  1.3 Our Contribution

2 Theory
  2.1 Tomography
    2.1.1 Analytical Methods
    2.1.2 Algebraic Methods
    2.1.3 Practicalities & Noise
  2.2 Artificial Neural Networks
    2.2.1 The Multilayer Perceptron
    2.2.2 Convolutional Neural Networks

3 Related Work
  3.1 Neural Network Filtered Backprojection
  3.2 FBPConvNet

4 Network
  4.1 Structure
  4.2 Training Data
  4.3 Computational Cost
  4.4 Reading Results

5 Many angle analytical target
  5.1 Training Data
  5.2 Results

6 Limited angle algebraic target
  6.1 Training Data
  6.2 SIRT
    6.2.1 Results
  6.3 TV-MIN
    6.3.1 Results


7 Conclusion
  7.1 General Results
    7.1.1 Correlations between SNRs and network capacity
    7.1.2 Correlation between SNR_phantom and SNR_target
    7.1.3 Outperforming the target methods
  7.2 Conclusion
  7.3 Discussion


Chapter 1

Introduction

Inverse problems arise in situations where certain properties of a source are reconstructed from observations that indirectly originate from this source. A difficulty often intrinsic to problems in this group is that a range of possible solutions will fit the observations, while only one solution can correspond to the original source.

1.1 Tomography

In tomography a 3D image of an object is estimated from a stack of 2D sections of this object. The sections are acquired through the use of penetrating radiation. The main tomographic challenge, however, lies not in stacking these sections together but in computing accurate representations of the sections using 1D projections that are acquired at a number of different angles within each section plane. A schematic representation of the process that takes us from 1D measurement data to a 2D section to a 3D reconstruction is given in figure 1.1. This research focuses solely on improving the core process that takes us from a collection of 1D projections to a 2D section.

Tomographic problems fall into the category of inverse problems. Finding a solution for the image that approximates the original object most accurately is relevant in applications such as X-ray tomography in medical imaging and electron tomography in materials science. Because of this practical importance, a significant amount of research on the subject has already been done [15]. However, today's reconstruction methods still leave room for improvement. Summarized: mathematical and physical issues, both intrinsic and extrinsic to the tomographic problem, interfere with the process that takes us from projections to image in ways that undermine quality, reconstruction time, or both.

In terms of reconstruction quality, the merit of projection data is determined in particular by the number and range of projection angles along which the penetrating radiation is recorded, the number of detectors responsible for this job and the presence of noise in their measurements. In practical applications it


Figure 1.1: A collection of 1D detector values recorded at a number of different projection angles is used to tomographically generate a 2D reconstruction of the unknown contents of a head. Sections harvested at different depths can be stacked together for a 3D reconstruction. This research focuses only on the process that brings us from the detector values to the 2D reconstruction. Adapted from [17].


is often favorable to limit the number of angles while collecting projection data, as too much radiation exposure can be damaging to the object of which we want to reconstruct an image. Furthermore, the acquisition time of projection data scales with the number of angles, and apart from this acquisition time being precious in itself, minimizing it will also benefit the quality of reconstructions of moving objects. It is therefore useful to find ways to reconstruct images based on projection data gathered at smaller numbers of angles, so-called limited angle projection data, as opposed to projection data gathered at more angles, so-called many angle projection data.

In addition to limiting the number of projection angles, radiation exposure can be lowered by simply decreasing the radiation intensity penetrating the object. This, however, results in increased measurement noise.

Angle and exposure limitations both produce significant artifacts in tomographic reconstructions. This research explores possible improvements in the areas of both reconstruction time and reconstruction quality using artificial neural network computation, through approximation of different existing traditional tomographic methods. Both actual reconstruction quality and method approximation quality are examined as functions of different network properties, of the number of projection angles, of the amount of noise and of the specifics of the traditional tomographic method that the networks are made to approximate.

Reconstruction quality is thus limited in a natural way by the quality of the source data used for the reconstruction, i.e. the number of projection angles, the number of detectors and the presence of noise. However, an equally important influence on reconstruction quality is the choice of reconstruction method. Practically all traditional reconstruction methods can be divided into two categories: analytical methods, which use a direct mapping from the measured projection data to its corresponding reconstruction, and algebraic methods, which repeatedly map a guessed reconstruction to its corresponding theoretical projection data until it fits the measured projection data sufficiently.

Analytical methods, of which filtered backprojection [15] is used most commonly, are based on a continuous and noiseless representation of the reconstruction problem in which a mathematical relationship between image and projection data is exploited, resulting in a direct mapping from projection data to reconstruction. This direct mapping makes analytical methods computationally cheap, but as the mapping is built on a relation that is in principle only valid in the continuous angle limit, reconstruction quality decreases drastically when projections at only a limited number of angles are available, due to the appearance of artifacts. This, combined with a high sensitivity to measurement noise, means that analytical methods often produce unacceptable results in these limited angle scenarios.

Algebraic methods, such as the Simultaneous Iterative Reconstruction Technique (SIRT) and the Simultaneous Algebraic Reconstruction Technique [15], work differently. They do not try to invert the straightforward mapping between reconstruction and projection data, but use it as is to map an initially


guessed image to its theoretical projection, and then, based on the difference with the actual projection data, try to iteratively find an image that minimizes this difference, using various algorithms. These methods are based on a discrete representation of the problem from the start and are therefore less prone to producing artifacts. Furthermore, they are flexible enough to simultaneously implement additional techniques. Examples are total variation minimization based methods such as FISTA [4], which reduce the effects of noise and the formation of artifacts in the reconstruction by exploiting prior knowledge about the image space of the image that is to be reconstructed.

A disadvantage resulting from the iterative nature of algebraic methods is the longer computation time needed to obtain satisfactory results. Furthermore, imposing prior knowledge about the object may in some cases limit the types of objects that can accurately be reconstructed. See section 2.1 for a more in-depth discussion of analytical and algebraic methods.

1.2 Artificial Neural Networks

An increasingly popular tool in computer science that has found applications in widely varying fields, ranging from image and speech recognition to the generation of music and the playing of board games, is the artificial neural network. An artificial neural network (ANN) can generally be described as a digital collection of interconnected units that mimics some principles of the biological brain. Most commonly used ANNs are simple mathematical functions that are shaped mainly by the strengths of the connections between their units, the network parameters. An algorithm is used to iteratively adjust these parameters to optimally fit a training set of known corresponding input and output pairs to each other during the so-called training phase. The idea is that, once this training phase is over, new unseen inputs from the same category as the inputs of the training set are mapped to their unknown corresponding output values.
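As a concrete, purely illustrative instance of the above, the sketch below trains a tiny network of this kind with plain gradient descent; the toy mapping y = x², the layer sizes and the learning rate are our own choices and not details from this thesis.

```python
import numpy as np

# Illustrative sketch: a small network whose connection strengths (the network
# parameters) are iteratively adjusted to fit a training set of known
# input/output pairs -- here the toy mapping y = x^2 on [-1, 1].
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, (256, 1))                  # training inputs
Y = X ** 2                                            # known corresponding outputs

W1 = rng.normal(0.0, 1.0, (1, 16)); b1 = np.zeros(16)   # hidden-layer parameters
W2 = rng.normal(0.0, 1.0, (16, 1)); b2 = np.zeros(1)    # output-layer parameters

def forward(x):
    h = np.tanh(x @ W1 + b1)                          # hidden unit activations
    return h, h @ W2 + b2                             # network prediction

lr = 0.1
for _ in range(5000):                                 # the "training phase"
    h, pred = forward(X)
    err = pred - Y                                    # mismatch with the targets
    gW2 = h.T @ err / len(X); gb2 = err.mean(axis=0)  # mean-squared-error grads
    dh = (err @ W2.T) * (1.0 - h ** 2)                # backpropagated error
    gW1 = X.T @ dh / len(X); gb1 = dh.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

# After training, unseen inputs from the same range are mapped close to their
# (previously unknown) corresponding outputs.
print(float(forward(np.array([[0.5]]))[1]))           # should lie near 0.25
```

The loop is exactly the scheme described above: the parameters are repeatedly nudged so that the training inputs map onto their known outputs, after which the learned mapping generalizes to new inputs of the same kind.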

1.3 Our Contribution

In short, this thesis aims to explore the power of neural network computation in the context of tomography. This is done by training a network in two ways: either to imitate many angle analytical reconstructions without actually needing many projection angles after the training phase, or to imitate computationally expensive algebraic reconstructions without actually being computationally expensive after the training phase.

In [20] it is shown that the calculations done by arguably the simplest version of a neural network, a multilayer perceptron such as described in section 2.2.1, can after training be viewed as a weighted sum of filtered backprojections if the sinograms in its training set are preprocessed in a certain way. This so-called Neural Network Filtered Backprojection (NNFBP) method is explained in more detail in section 3.1.


A more recent neural network assisted approach to tomography called FBPConvNet [14] has been successfully trained to map complete noiseless limited angle FBP reconstructions to noiseless many angle FBP reconstructions. FBPConvNet uses a more modern convolutional network structure, the basics of which are illuminated in section 2.2.2.

Perhaps the most straightforward example of an inverse problem is the problem of super resolution, where an attempt is made to obtain a realistic high resolution image from a low resolution input image. This problem intuitively demonstrates how a lack of input information can prevent us from finding a desirable solution. Since there is generally a broad range of high resolution images that could theoretically correspond to the same low resolution image, the oldest super resolution methods usually produce no more than a blurred version of their inputs that has more pixels but no more useful information about the content that the image represents. However, a recent neural network assisted super resolution method has shown surprising advances in solving this difficult problem through its ability to extract additional generalized information from the high and low resolution examples in its training set. This Super-Resolution Convolutional Neural Network [8] can also be trained for use as a denoising tool.

Because the tomographic problem appears to share essential similarities with the problem of super resolution, in this thesis we propose a network whose structure is inspired by this Super-Resolution Convolutional Neural Network [8]. Our network is thus also convolutional, but otherwise structurally different from FBPConvNet [14]. It is furthermore trained on noisy data and uses a different kind of output target. More details on the network are found in chapter 4. The ways we use our network can be summarized in two groups, characterized by the network's input and output images:

1. Many angle analytical target: The mapping of a limited angle NNFBP [20] and limited angle FBP reconstruction pair to one many angle FBP reconstruction. For the training phase many angle projection data is required, but for application it is not. This method is discussed in chapter 5.

2. Limited angle algebraic target: The mapping of a limited angle FBP reconstruction to a limited angle algebraic reconstruction. In this use case no many angle projection data is needed at all. Two algebraic reconstruction methods are explored: SIRT [15] and the total variation based method FISTA [4]. This method is discussed in chapter 6.

We show that the proposed network structures are impressively capable of learning the mappings in both these use cases. Usually an expected trend is visible where network performance increases with the number of network parameters, but this is not a rule. We see that better approximation of the target reconstruction method does not necessarily imply a better reconstruction. Especially in the presence of noise, better target method approximations often result in lower quality reconstructions when compared to their original phantoms. On the other hand, target reconstructions that are already closer to their


original phantoms seem to be approximated more easily by our network. This indicates that the practical value of our network increases in multiple ways with the properties of the approximated reconstruction method:

• Approximating a higher quality reconstruction method naturally increases reconstruction quality by definition.

• The network approximates a high quality reconstruction more easily than a low quality reconstruction.

• The network's approximation quality does not depend on the computational cost of the target reconstruction method. The relative reduction in computation time thus increases with the computational cost of the target reconstruction method.


Chapter 2

Theory

2.1 Tomography

Tomography is 3D imaging through the use of 2D sections. The largest difficulties usually lie in obtaining these 2D sections from a set of 1D measurements obtained by a CT-scanner at different angles. The most commonly used and intuitive setup, which is also used in this research, can be understood in terms of the parallel beam geometry coordinate system. From a mechanical perspective, this setup constitutes a line-shaped penetrative radiation source that emits collimated radiation from one side of an object towards a block of detectors measuring the leftover radiation intensity on the other side of the object. After a projection is done, either the object or the radiation source and detector block are rotated by a small angle to get in position for recording the next projection. For optimal reconstruction results the set of angles is equally divided over a range of 180°. The setup is pictured in figure 2.1.

A theoretical projection of an object along a certain angle corresponds to a set of line integrals through this object. In practice such a line represents a narrow beam of radiation, and the line integral therefore represents the total accumulated attenuation of the beam after traveling from its source to the detector array. Attenuation happens exponentially, and as such we can describe the final intensity I of a beam relative to its initial intensity I0, after a journey along the straight line l, generally as:

\[ \frac{I}{I_0} = e^{-\int_l \mu(x,y)\, ds} \tag{2.1} \]

where µ : R² → R is the local attenuation coefficient as a function of 2D space. In a CT-scanner, I0 is known and I is measured by a detector at the end of l. The tomographic goal is to find a direct description of µ(x, y) based on a collection of measurements of I at different detector positions and at different angles. Using t to parameterize the continuous position on the detector array, we define the projection p : R → R of a single beam that lands at t:


\[ p(t) = -\ln\!\left(\frac{I(t)}{I_0}\right) = \int_{l_t} \mu(x,y)\, ds \tag{2.2} \]

with l_t the path of the beam leading to position t on the detector array. In the parallel beam geometry coordinate system pictured in figure 2.1, a beam going through some point in space with coordinates (x, y) at an angle θ will land on the detector array at a location t specified by:

t = x cos θ + y sin θ

For any given t and θ, we call the set of points in space obeying this equation the line l_{t,θ}, the beam's path between source and detector traveling through the object. Thus, we can write equation 2.2 for the projection done at an angle θ as [15]:

\[ p_\theta(t) = \int_{l_{t,\theta}} \mu(x,y)\, ds = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \mu(x,y)\, \delta(x\cos\theta + y\sin\theta - t)\, dx\, dy \tag{2.3} \]

This is called the Radon transform of µ(x, y). In short, for a specified t and θ, p_θ(t) is the scalar value corresponding to a single line integral over the attenuation coefficient along the line l_{t,θ} through the object, parameterized by l_{t,θ}: x cos θ + y sin θ = t.
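The Radon transform of equation 2.3 can be made concrete numerically. The brute-force sampler below is our own illustration (nearest-neighbour sampling along each line l_{t,θ}), not the projection code used in this research:

```python
import numpy as np

def radon(mu, thetas):
    """Discrete sketch of eq. 2.3: one line integral per (theta, t) pair."""
    n = mu.shape[0]
    c = (n - 1) / 2.0
    ts = np.arange(n) - c            # detector positions t (one per detector)
    ss = np.arange(n) - c            # sample positions s along each beam
    sino = np.zeros((len(thetas), n))
    for k, th in enumerate(thetas):
        # Points on the line l_{t,theta}: (x, y) = t(cos, sin) + s(-sin, cos)
        x = ts[:, None] * np.cos(th) - ss[None, :] * np.sin(th) + c
        y = ts[:, None] * np.sin(th) + ss[None, :] * np.cos(th) + c
        xi = np.clip(np.round(x).astype(int), 0, n - 1)
        yi = np.clip(np.round(y).astype(int), 0, n - 1)
        inside = (x > -0.5) & (x < n - 0.5) & (y > -0.5) & (y < n - 0.5)
        sino[k] = (mu[yi, xi] * inside).sum(axis=1)   # sum mu along each beam
    return sino

# A square phantom and its sinogram over 180 degrees.
mu = np.zeros((64, 64))
mu[24:40, 24:40] = 1.0
thetas = np.linspace(0.0, np.pi, 18, endpoint=False)
sino = radon(mu, thetas)
print(sino.shape)   # (18, 64): rows of equal theta, columns of equal tau_i
```

At θ = 0 each detector simply sums one image column, and at θ = 90° one image row, which is a quick sanity check on the geometry.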

For any use outside of theoretical understanding, this model must be viewed in a discrete light, where the line becomes a strip and the value p_θ^{τ_i} recorded by the i-th detector in reality corresponds to the value of p_θ(t) integrated over a small range of t values that spans the detector's width:

\[ p_\theta^{\tau_i} = \int_{i\Delta}^{i\Delta + \Delta} p_\theta\!\left(t - \tfrac{1}{2} N_d \Delta\right) dt \tag{2.4} \]

where N_d is the number of detectors on the array and ∆ is the width of one detector. The set {p_θ^{τ_i} | θ ∈ Θ, i ∈ {0..N_d − 1}}, with Θ a finite set of projection angles and N_d a finite number of detectors, organized as pixels in a 2D image with rows corresponding to equal θ and columns corresponding to equal τ_i, is referred to as a sinogram. In tomographic scenarios, the perfect description of µ that any reconstruction algorithm tries to approximate is called the phantom. Figure 2.2 shows an example of a (discrete) phantom and its corresponding sinogram.

In order to get to a reconstruction that approximates µ, it is necessary to find a method to somehow invert the relation between µ and p given in equation 2.3. These so-called tomographic reconstruction methods can be divided into two categories: analytical methods, which directly invert the relation, and algebraic methods, which use the relation as it is and iteratively look for a µ that fits the measured p most accurately.


Figure 2.1: Visualization of a projection p_θ(t) of an object µ(x, y) along a certain angle θ as done in a CT-scanner. Both the detector block and the radiation source are rotated around µ(x, y) for various values of θ. Adapted from [20].

(a) phantom (b) sinogram

Figure 2.2: A phantom and its corresponding sinogram. Tomographic methods try to come as close as possible to reconstructing 2.2a from 2.2b.


2.1.1 Analytical Methods

Analytical tomographic reconstruction methods are based on a continuous representation of the reconstruction problem that is discretized in order to fit practical usage. With the object to be reconstructed modeled as a continuous density function µ, the values of the sinogram are given by equation 2.3.

We can take its Fourier transform F[p_θ(t)] = P_θ(ω) and obtain:

\[
\begin{aligned}
P_\theta(\omega) &= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \mu(x,y)\,\delta(x\cos\theta + y\sin\theta - t)\, e^{-2\pi i t\omega}\, dt\, dx\, dy \\
&= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \mu(x,y)\, e^{-2\pi i \omega (x\cos\theta + y\sin\theta)}\, dx\, dy \\
&= F[\mu(x,y)] = M(\omega\cos\theta, \omega\sin\theta)
\end{aligned} \tag{2.5}
\]

This relation is known as the Fourier Slice Theorem. It appears that to get µ, all that is needed is to take the inverse Fourier transform of P_θ(ω):

\[ \mu(x,y) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} P_\theta(\omega)\, e^{2\pi i \omega (x\cos\theta + y\sin\theta)}\, d(\omega\cos\theta)\, d(\omega\sin\theta) \tag{2.6} \]

We now switch from the Cartesian coordinates (ω cos θ, ω sin θ) to the polar coordinates (ω, θ) to obtain:

\[ \mu(x,y) = \int_0^{2\pi}\!\int_0^{\infty} P_\theta(\omega)\, e^{2\pi i \omega (x\cos\theta + y\sin\theta)}\, \omega\, d\omega\, d\theta \tag{2.7} \]

Using the fact that in polar coordinates g(ω, θ + π) = g(−ω, θ) for any g, we can rewrite the integral as:

\[ \mu(x,y) = \int_0^{\pi}\!\int_{-\infty}^{\infty} P_\theta(\omega)\, |\omega|\, e^{2\pi i \omega (x\cos\theta + y\sin\theta)}\, d\omega\, d\theta \tag{2.8} \]

This integral is commonly expressed in two parts:

\[ \mu(x,y) = \int_0^{\pi} Q_\theta(x\cos\theta + y\sin\theta)\, d\theta \tag{2.9} \]

with

\[ Q_\theta(t) = \int_{-\infty}^{\infty} P_\theta(\omega)\, |\omega|\, e^{2\pi i \omega t}\, d\omega \tag{2.10} \]

The convolution theorem states that F[f ∗ g] = F[f] · F[g], and equation 2.10 can thus equally well be written as a convolution, for a perhaps slightly more intuitive description:

\[ Q_\theta(t) = \int_{-\infty}^{\infty} p_\theta(\tau)\, h(t - \tau)\, d\tau \tag{2.11} \]


where F[h(t)] = |ω|. Relations 2.9 and 2.10, depending on integrals over continuous variables while a CT-scanner can only measure data in discrete steps, have no practical use in this form. Analytical methods in practice therefore use a discretized version to get an estimate of µ(x, y). The widely used Filtered Backprojection (FBP) method [15] can be built with the discretized versions of equations 2.9 and 2.11:

\[ \mu(x,y) \approx \mathrm{FBP}_h(x,y) = \sum_{\theta \in \Theta} \sum_{\tau \in T} h(\tau)\, p_\theta(x\cos\theta + y\sin\theta - \tau) \tag{2.12} \]

where T is the set of detector positions as pictured in figure 2.1, Θ is the set of angles at which a projection has been measured and h(τ) is called the filter. In the case of a discrete p_θ, linearly interpolated values between two pixels can be used. Furthermore, the convolution can again be replaced by a multiplication in Fourier space.
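Equation 2.12 and the two steps around it can be sketched as follows. This is a minimal illustration with our own naming and grid choices, assuming a centered parallel-beam sinogram with detector spacing equal to the pixel spacing; it is not the thesis's actual implementation:

```python
import numpy as np

# Minimal FBP sketch of eq. 2.12: step one filters every sinogram row with
# H(omega) = |omega| in Fourier space (zero-padded to suppress wrap-around);
# step two backprojects by summing, for every pixel (x, y), the filtered rows
# at the linearly interpolated landing position t = x cos(theta) + y sin(theta).
def fbp(sino, thetas):
    n_theta, n_d = sino.shape
    ramp = np.abs(np.fft.fftfreq(2 * n_d))              # discretized |omega|
    filtered = np.real(np.fft.ifft(
        np.fft.fft(sino, 2 * n_d, axis=1) * ramp, axis=1))[:, :n_d]

    c = (n_d - 1) / 2.0
    ys, xs = np.mgrid[0:n_d, 0:n_d] - c                 # pixel coordinates
    recon = np.zeros((n_d, n_d))
    for k, th in enumerate(thetas):
        t = xs * np.cos(th) + ys * np.sin(th) + c       # landing positions
        i0 = np.clip(np.floor(t).astype(int), 0, n_d - 2)
        frac = t - i0
        recon += (1 - frac) * filtered[k, i0] + frac * filtered[k, i0 + 1]
    return recon * np.pi / n_theta                      # discretized d(theta)

# Demo: a centered disk of radius r has projection 2*sqrt(r^2 - t^2) at every
# angle, so its sinogram can be written down analytically.
n = 64
ts = np.arange(n) - (n - 1) / 2.0
p = 2.0 * np.sqrt(np.clip(20.0 ** 2 - ts ** 2, 0.0, None))
thetas = np.linspace(0.0, np.pi, 90, endpoint=False)
rec = fbp(np.tile(p, (90, 1)), thetas)
print(rec.shape)   # (64, 64)
```

The reconstructed values inside the disk come out near the phantom density 1 and near 0 outside, illustrating that the discretized pair 2.9/2.11 indeed inverts the Radon transform.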

As the inverse Fourier transform h(t) = F⁻¹[H(ω)] only exists for square integrable H(ω), and H(ω) = |ω| does not meet this requirement, various discrete approximations are used in practice, such as the Ram-Lak filter, the Shepp-Logan filter and the Hann filter [9]. The filter used in FBP reconstructions throughout this research is the Ram-Lak filter h(t) = F⁻¹[H(ω)], defined by:

\[ H(\omega) = \begin{cases} |\omega| & |\omega| \le \omega_c \\ 0 & |\omega| > \omega_c \end{cases} \tag{2.13} \]

for a certain cut-off frequency ω_c. This gives

\[ h(t) = 2\omega_c^2\, \mathrm{sinc}(2\omega_c t) - \omega_c^2\, \mathrm{sinc}^2(\omega_c t) \tag{2.14} \]

The Nyquist-Shannon theorem [22] states that if an original continuous signal contains no frequencies higher than some frequency ω_c, it can be reliably represented and reconstructed using a sampling interval of 1/(2ω_c). Following this idea, if we want to transform to Fourier space an unknown signal that has already been sampled using intervals of width ∆, frequencies higher than 1/(2∆) are not needed in our Fourier transformed signal, since they would represent information that was not originally contained in the discrete signal. The best cut-off frequency to define our Ram-Lak filter is therefore ω_c = 1/(2∆), where ∆ is the detector width. Both H(ω) and h(t) are depicted in figure 2.3, where this value for ω_c is used.

The process of turning a sinogram into a reconstruction can be divided into two steps: in the first step all rows, corresponding to all different θ, are convolved with the filter. In the second step pixels x and y determine an interpolated value in each row that is summed over all θ. The convolution of a detector array P_θ of length N_d and a discretized filter array h of length N_h, done directly as in equation 2.12, takes O(N_d N_h) time. However, the same calculation can be done in Fourier space as a simple multiplication, according to the convolution theorem. Any length-n array can be efficiently converted to Fourier space in O(n log n)


(a) The Ram-Lak filter in Fourier space.

(b) The Ram-Lak filter in real space. h(t) represents the ideal continuous version and h(n∆) the set of points in the discretized version.

Figure 2.3: Visualisations of the Ram-Lak filter.


time using a fast Fourier transform algorithm. The filter has to be converted only once, so after that the convolution with any P_θ takes just O(N_d log N_d) time. Step one, convolving a full N_θ × N_d sinogram with a filter, is therefore done in O(N_θ N_d log N_d) time.

As linear interpolation is hardwired on GPUs and takes no more time than direct memory access [3], step two, in which the convolved sinogram is summed over all θ ∈ Θ for N different interpolated values of x cos θ and N different interpolated values of y sin θ, is done in O(N_θ N²) time. In practice, since usually N_d log N_d ≪ N², FBP reconstruction times are governed by step two.

The FBP method produces excellent results when low noise projection data is available at many angles. However, the quality of the output produced by the FBP method is heavily dependent on the range of projection angles and the number of detectors. Especially the former presents a problem in many applications, as it is not always wanted or possible to obtain projections at a large number of angles.

2.1.2 Algebraic Methods

Algebraic methods use a representation of the tomographic problem that is discrete to begin with, by describing the pixels of the sinogram as a set of equations that depend linearly on the pixels of the reconstruction. Because this approach is discrete from the start and not a discretized version of a continuous approach, algebraic methods are often better at handling projection data recorded at fewer angles. This comes at a price however, as solving large systems of linear equations is usually computationally expensive.

If we superimpose a square N × N pixel grid on our original image µ(x, y), assume µ(x, y) to be constant in each cell and consider a detector array with N_d detectors, we can discretize µ(x, y) into a vector \(\vec{\mu} \in \mathbb{R}^{N^2}\) and define a measured projection \(\vec{p}_\theta \in \mathbb{R}^{N_d}\) at an angle θ as:

\[ \sum_j^{N^2} w_{\theta,ij}\, \mu_j = p_{\theta,i} \tag{2.15} \]

or in matrix form:

\[ W_\theta \vec{\mu} = \vec{p}_\theta \tag{2.16} \]

Here the elements w_{θ,ij} of W_θ specify the contribution of the content of cell j to detector i for the projection done at an angle θ by the ray emitted opposite to the detector. The matrix W_θ does not depend on the image that is to be reconstructed, only on the projection geometry of the system. Note that for any given θ most elements of W_θ are 0. Instead of a matrix W_θ of dimensions N_d × N² and a vector \(\vec{p}_\theta\) of length N_d, we can group all angles together into one single matrix equation, with a matrix W of dimensions N_d N_θ × N² and a vector \(\vec{p}\) of length N_d N_θ. This is also pictured in figure 2.4.

W~µ = ~p (2.17)


CHAPTER 2. THEORY 16

Figure 2.4: Visualization of the elements of projection matrix W . From [15]


The task at hand now is to solve for ~µ. The system of equations can be underdetermined if N² > NdNθ, allowing for an infinite number of equally possible solutions. It can also be overdetermined if N² < NdNθ. Since in practice ~p is usually corrupted by noise, this does not imply redundant information. Due to noise-corrupted measurement data, such an overdetermined system has no unique solution. The solution we are looking for can be defined in the following way:

~µsolution = arg min_~µ ||~p − W~µ||²    (2.18)

Since in practice the systems of linear equations are large but sparse, finding a solution does not involve matrix inversion but is done by use of an iterative algorithm to save memory. The SIRT algorithm goes through the linear equations one by one from an initial guess ~µ0 of ~µsolution that is repeatedly forward projected and updated in the following way:

~µi = ~µi−1 − ((~µi−1 · ~wi − pi) / (~wi · ~wi)) ~wi    (2.19)

Here ~wi = (w_{i0}, w_{i1}, w_{i2}, ..., w_{iN²}) is row i of matrix W. After the last row at i = NdNθ − 1 is reached, the algorithm will continue at the top row of W.

Any underdetermined system of equations will converge to the ~µ that is closest to ~µ0. An overdetermined noisy system of equations will not converge to a unique solution, but will oscillate between solutions specified by the row of W that was seen last.
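The row-by-row update of equation 2.19 (often known as the Kaczmarz or ART iteration) can be sketched as follows. A small dense W is used purely for illustration; practical implementations exploit the sparsity of W:

```python
import numpy as np

def row_action_sweep(W, p, mu, n_sweeps=10):
    """Repeatedly apply the update of eq. 2.19: project the current
    estimate onto the hyperplane of each row equation w_i . mu = p_i,
    cycling back to the top row after the last one."""
    for _ in range(n_sweeps):
        for i in range(W.shape[0]):
            w = W[i]
            denom = w @ w
            if denom == 0.0:  # skip all-zero rows
                continue
            mu = mu - ((mu @ w - p[i]) / denom) * w
    return mu

# toy consistent system: the iterate converges to the exact solution
W = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
true_mu = np.array([2.0, -1.0])
p = W @ true_mu
mu = row_action_sweep(W, p, np.zeros(2), n_sweeps=50)
```

For a consistent system such as this toy example the iterate converges to the exact solution; for an inconsistent (noisy) system it keeps oscillating, as described above.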

An additional advantage of these iterative algorithms is the possibility to incorporate abstract prior knowledge of the scanned object in between iterations. The easiest example of this would be disallowing negative pixel values in the reconstruction by setting them to zero. Another possibility is the addition of a 'punishment' function to equation 2.18 that is low for 'good' reconstructions and high for 'bad' ones. This is called regularization and can be understood in the following way:

~µsolution = arg min_~µ ( ||~p − W~µ||² + λ freg(~µ) )    (2.20)

Here freg : R^(N²) → R can be any regularization function and λ is a constant controlling its relative importance. A regularization function could for example be a discretized summed gradient of the whole reconstruction, discouraging the formation of small artifacts, as is done in total variation minimization based algorithms. More complicated requirements that depend on the object to be reconstructed can also be enforced. Requirements posed on the solution are meant to have a convergence-inducing effect on reconstruction algorithms.
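As a concrete illustration, a summed-gradient (total-variation-style) regularizer of the kind mentioned above could be sketched as below; the exact discretization used in TV-MIN solvers may differ:

```python
import numpy as np

def tv_reg(mu_img):
    """Anisotropic total-variation regularizer: summed absolute finite
    differences between neighbouring pixels. Small scattered artifacts
    raise this value; smooth regions do not."""
    return (np.abs(np.diff(mu_img, axis=0)).sum()
            + np.abs(np.diff(mu_img, axis=1)).sum())

def regularized_objective(W, p, mu, lam, n):
    """The objective of eq. 2.20 for an n x n image flattened into mu."""
    return np.sum((p - W @ mu) ** 2) + lam * tv_reg(mu.reshape(n, n))
```

A perfectly flat image has a regularization value of zero, so the term penalizes only variation, weighted by λ.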

Although we can easily define the perfect ~µsolution, in practice the iterative algorithms used to solve sets of linear equations will only approximate it with an accuracy that depends on the number of iterations. Furthermore, the direction in reconstruction space from which the solution is approximated depends on the algorithm and starting conditions used.

For a ratio dpix/dτ between pixel width and detector width, we can infer from figure 2.4 that for each θ the number of non-zero matrix elements will roughly scale with (dpix/dτ)N², and thus with (dpix/dτ)NθN² for the whole matrix W. Since the computation time of an iteration in 2.19 is governed by the forward projections ~µi−1 · ~wi, going through the full matrix W once is comparable in computation time to performing one FBP reconstruction. The total computation time of iterative reconstruction methods naturally scales with the number of iterations.

2.1.3 Practicalities & Noise

The relation between the attenuation coefficient as a continuous function of space and the noiseless sinogram as a continuous function of projection angle and detector position is described by the clearly defined 1:1 mapping in equation 2.3. Apart from the aforementioned discretization issues however, in order to make a reasonable simulation of the realistic pitfalls surrounding the tomographic problem, it is necessary to reconsider the meaning behind the line integrals that produce the sinogram's pixel values by investigating the physical process that underlies equation 2.1.

A standard CT scanner uses energy integrating detectors to get an estimate of I at each detector site. Simplified, an energy integrating detector setup consists of a radiation source at one end of the object that is to be reconstructed, and a scintillator coupled to a photodiode array at the other end. This setup is pictured in figure 2.5. Although electron beams can be and sometimes are used in CT, the most common radiation source is an X-ray tube.

An X-ray tube is an evacuated chamber subject to an electric field resulting from a voltage difference between the two ends of the tube. By accelerating electrons across the tube onto a metal target, it converts electrical energy into heat and polychromatic radiation with photon energies up to the energy corresponding to the applied voltage difference. Usually a filter is applied after the tube to prevent the emission of lower energy photons that would not survive the journey towards the detector array. An X-ray beam therefore consists of a collection of X-ray photons with different wavelengths, distributed according to an energy spectrum characteristic of the tube and filter. The total energy carried by all photons emitted by an X-ray tube in the direction of one detector during the recording of one projection is referred to as I0 in equation 2.1.

A scintillator is a material that emits lower energy photons upon excitation by ionizing radiation such as X-ray photons. To minimize radiation exposure to the object, a good scintillator is dense or large enough to absorb close to all energy contained in the X-ray beam, and in this way maximize the information we extract from it. Scintillators do not emit all their energy immediately after absorption, but are bound to a decay time in which their photon emission rate decreases exponentially. This decay time depends on the material and its impurities. Scintillator decay times should ideally be as short as possible in order for the detector to be ready for the next



Figure 2.5: Schematic representation of the CT-scanner detection mechanism.

projection in time.

A photodiode produces an increasing electrical current upon absorption of low energy photons, giving us a way to estimate the total energy contained in the incident X-ray beam.

The energy of such an incident beam scales with the number of photons emitted per unit time from the X-ray tube and with the 'shutter' time used for each projection. The rate at which a radiation source emits photons is not constant but subject to Poisson distributed random fluctuations that propagate their way through the detection mechanism and show up in the integrated current produced by the photodiodes over the course of one projection. In other words, a higher current could either mean the beam went through lower density material, or the X-ray tube coincidentally produced more photons during the time period of the projection. Following the Poisson distribution, these photon emission fluctuations increase with decreasing total photon count and result in so-called 'spot noise' in the sinogram. As the total photon count is directly related to the radiation induced damage to the scanned object that is to be minimized, for any practical application it is essential to implement the effect of photon count on reconstruction quality in our simulations. In this research, spot noise is simulated by adding Poisson distributed random fluctuations to the radiation intensity I0, resulting in sinogram pixel value fluctuations determined by equation 2.2. These fluctuations are applied on a pixel-by-pixel basis without any inter-pixel correlation, as is the case in reality.
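A minimal sketch of this noise model, assuming the sinogram stores the line integrals p = ln(I0/I) of equations 2.1–2.2:

```python
import numpy as np

def add_spot_noise(sinogram, I0, rng=None):
    """Simulate Poisson 'spot noise': map each line integral back to an
    expected photon intensity via Beer-Lambert, draw an independent
    Poisson count per pixel, and recompute the noisy line integral."""
    rng = np.random.default_rng() if rng is None else rng
    I_expected = I0 * np.exp(-sinogram)
    I_noisy = np.maximum(rng.poisson(I_expected), 1.0)  # avoid log(0)
    return np.log(I0 / I_noisy)
```

Because the relative width of a Poisson distribution shrinks as 1/√I0, taking I0 very large reproduces the noiseless sinogram, matching the I0 → ∞ limit discussed below.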

The width of the Poisson distribution and thus the magnitude of the fluctuations also depends on I0, which can now be viewed not as the exact, but as the average intensity of an X-ray beam over time as it comes out of the tube. As the relative fluctuations in a Poisson distribution decrease with an increasing average, a noiseless scenario corresponds to an X-ray tube with I0 → ∞. Spot noise in a sinogram results in a situation where the system of linear equations as described in equation 2.17 is inconsistent.

In reality the attenuation coefficient µ depends not only on the density of the material, but also on the wavelength of the photon traveling through it. This results in a process called beam hardening, in which the peak of the energy distribution shifts to shorter wavelengths as the beam travels through the object. This research does not account for the effects of beam hardening on reconstruction quality.

2.2 Artificial Neural Networks

An artificial neural network (ANN) is a collection of interconnected units that mimics some basic principles of the biological brain. In this analogy a unit can be compared to a biological neuron and the connections between units to its axons. Connections between units have a certain strength called a weight. Each unit has a number of input channels that pass on scalar values produced by predecessor units. The set of input values that is passed on to a unit through its input channels can be organized in a vector. Using a relatively simple function,


a unit produces a scalar output value based on its input vector and the set of weights that comes with its input channels. The output value is passed on to any input channels connected to the unit, which are in turn connected to successive units. This process continues until the final output units have been reached, producing an output vector that is the network's answer to the question encoded in the input vector. An ANN thus maps an input vector to an output vector and can be described as a function f : R^I → R^O, where I is the size of the input vector and O the size of the output vector. The specific properties of f depend on the weights.

The question remains how to find the weights or parameters that give us the desired solution for a certain input vector. The process of finding these parameters is called learning or training. This text addresses only one kind of learning: supervised learning, in which a training set of inputs and their correct outputs is known and a cost function can be designed that produces a single scalar error value by comparing the network's output to the correct output after propagating the corresponding input through all the layers using equation 2.21. In imaging problems it is common to use a mean squared error or similar cost function. Before any training is done, the parameter values are drawn from a certain probability distribution for the network to be able to produce its first output. After obtaining the first output and the corresponding error value, the parameters are iteratively adjusted with steps that try to minimize the error value using stochastic gradient descent or more advanced variations of it, such as ADADELTA [23]. This goes on until the difference between the network's outputs and the correct outputs becomes smaller than a certain error margin. The idea is that if the training set is large and diverse enough, the network will not just provide correct outputs for the inputs of the training set, but, due to generalization, also for new inputs that do not yet have a known correct output. A training set that is larger and more diverse will generalize better. So far the idea is completely general.
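The training loop just described can be illustrated with a minimal example: fitting a two-parameter model y = wx + b to a toy supervised set with plain stochastic gradient descent (all names and values here are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)

# toy training set of inputs with known correct outputs: y = 3x + 1
x = rng.uniform(-1.0, 1.0, size=100)
y = 3.0 * x + 1.0

# parameters drawn from an initial probability distribution
w, b = rng.normal(size=2)

lr = 0.1  # step size of the gradient descent
for _ in range(500):
    i = rng.integers(len(x))       # pick one random training example
    err = (w * x[i] + b) - y[i]    # from the gradient of the squared-error cost
    w -= lr * err * x[i]
    b -= lr * err
```

After a few hundred of these noisy single-example steps, w and b approach the values 3 and 1 that generated the training set.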

This general nature of the ANN concept allows it to find use in many different applications, one of which is tomography. A recent paper [20] has shown that a simple kind of neural network called a multilayer perceptron can be used to reconstruct tomographic images in a way that shows similarities to the filtered backprojection method. This increases the ability to handle limited angle projection data, while keeping the computational cost low. Furthermore it opens up the possibility to incorporate prior knowledge of the scanned object into the process of finding an output, something that cannot easily be done in the filtered backprojection method.

2.2.1 The Multilayer Perceptron

A multilayer perceptron is an ANN consisting of a collection of successive layers, or columns of units. The values contained in the input and output units can represent any desired kind of digital data, e.g. a picture or an audio fragment, but also any vector with a more abstract meaning. Between the input and output layer lie a number of hidden layers. The number of units in a hidden layer,



Figure 2.6: A Multilayer Perceptron with one hidden layer.

or the network's width, can vary wildly, and finding the width that is optimal for a given problem can be difficult. The number of hidden layers is not fixed either, ranging from one to occasionally several hundreds. A neural network with more hidden layers is called deeper, while one with fewer hidden layers is called shallower. A simple multilayer perceptron with one hidden layer is shown in figure 2.6, where connections between units are depicted as arrows.

The weights connecting units between layers can be organized in matrix form. If matrix W^l contains the weights of the connections between layer l and layer l−1, element w^l_ij of W^l is the weight of the connection between

unit j of layer l − 1 and unit i of layer l, matching the index order of equation 2.21. Furthermore, a vector of biases ~b^l is commonly added to the values of the units, where element b^l_j is added to the value of unit j in layer l. The weights and biases together form the parameters. After applying these two operations to a layer, an activation function σ^l(x) is applied element-wise by the units of the layer. This function is necessary to allow the total operation performed by the network to be non-linear and therefore has to be a non-linear function. Activation functions can take many forms and can even differ per layer. The Rectified Linear Unit (ReLU) [12] is biologically motivated and mathematically justified [11], and the most popular activation function in the field of deep learning today [16]. A standard ReLU simply returns σ(x) = max(0, x).

After the last hidden layer comes an output layer, sometimes followed by another activation function, producing the answer that the network has come up with. A group of arrows pointing towards a single unit as in figure 2.6


represents the following computation of its corresponding value a^l_i:

a^l_i = σ^l( ∑_k w^l_{ik} a^{l−1}_k + b^l_i )    (2.21)

and as such can only be done once all previous values a^{l−1}_k have been computed.

In matrix-vector notation:

~a^l = σ^l( W^l ~a^{l−1} + ~b^l )    (2.22)

In these formulas l = 0 corresponds to the input layer. The function that the network applies to its input is therefore determined completely by its activation functions and its set of parameters, the weights and biases. Although the choice of activation functions matters for the output, they are conceptually less important than the parameters.
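Equation 2.22 translates almost literally into code; a minimal sketch with ReLU activations:

```python
import numpy as np

def relu(x):
    """ReLU activation: sigma(x) = max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

def mlp_forward(a, weights, biases):
    """Propagate an input vector through the layers using eq. 2.22:
    a^l = sigma(W^l a^{l-1} + b^l)."""
    for W, b in zip(weights, biases):
        a = relu(W @ a + b)
    return a

# one layer applied to a 2-unit input vector
out = mlp_forward(np.array([3.0, 1.0]),
                  [np.array([[1.0, -1.0], [0.0, 1.0]])],
                  [np.array([0.0, -2.0])])
```

Each entry of `weights`/`biases` corresponds to one layer; a deeper network is just a longer list.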

2.2.2 Convolutional Neural Networks

The multilayer perceptron described in section 2.2.1 has connections between all units in adjacent layers.

Assuming the width of the network (the number of units in the hidden layers) is increased proportionally to the number of input units, the total number of parameters in such a network structure will grow with the square of the number of input units.

As the parameters essentially determine all the properties of the network and thus its ability to do a certain task, the number of parameters is called the capacity. The minimum capacity necessary for successful training and implementation of a neural network is dictated by the complexity of the function that is to be approximated by the network, i.e. the problem that is to be solved. Although it is difficult to find an exact value for this minimum, it is essential to stay above it in order to produce optimal results.

At the same time, a network with a capacity far above this minimum is more prone to overfitting, a process in which a network's capacity is so large that it is able to memorize its training set instead of generalizing from it. This results in a lower error on the training examples, but a higher error in the evaluation of previously unseen input data. Furthermore, increasing the capacity increases training and computation time, which can be problematic in applications involving larger numbers of input units, such as image processing. Apart from increasing training time due to the need to adjust a greater number of parameters, a larger parameter space can also make it more difficult to find a parameter set that produces desirable results.

It is therefore useful to consider adaptations to the multilayer perceptron network structure in order to minimize this scaling effect. It is generally believed that earlier layers in a network process more 'low level' or local information, while information processing in later layers concerns more 'high level' or abstract


features of the input. From a theoretical perspective, it seems unnecessary for uncorrelated units in the input layer, such as pixels in different corners of an image, or samples on both ends of an audio fragment, to be connected to the same unit in the next layer, as the value of this unit should not depend on both of these previous units. One way to enforce this assumption of locality is to simply not allow any direct connections between uncorrelated units by setting those matrix elements to zero.

Furthermore, it seems reasonable to assume that for many types of data, low level information emerges from a set of features that is universal across different locations in the data. Using a typical photograph as an example, we can think of low level information as pixels and high level information as certain notions about specific objects. In between them are increasingly complex features, starting with e.g. lines and dots, then shapes, then combined shapes, etc. Many of the functions we could want a neural network to perform depend on the recognition of these features at lower and higher levels, but do not change with the location of these features in the image: they are translationally invariant.

Both the assumption of locality and the need for translational invariance naturally point us towards the use of the convolution operation instead of matrix multiplication. In mathematics, a convolution ∗ is an operation on two functions f and g, producing a third function f ∗ g that can be viewed as a version of f modified by a filter or kernel g. A 1D convolution of f : R → R and g : R → R is defined as:

(f ∗ g)(t) = ∫_{−∞}^{∞} f(τ) · g(t − τ) dτ    (2.23)

or discretized for finite arrays

(f ∗ g)[t] = ∑_{τ∈T} f[τ] · g[t − τ]    (2.24)

Confusingly however, the definition of convolution as used in machine learning has a plus sign instead of the minus sign:

(f ∗ g)[t] = ∑_{τ∈T} f[τ] · g[t + τ]    (2.25)

This is in fact the definition of cross-correlation commonly stated in mathematics, but from here on, the word convolution will mean the operation done in equation 2.25. Instead of a multiplication by a learned matrix as done in a fully connected layer, a convolutional layer as found in a CNN convolves its input with a learned discrete filter. A convolution with a discrete filter is equivalent to multiplication with a matrix in which only a small band of matrix elements around the diagonal can be non-zero, providing us with only local connections. Convolutions are furthermore translationally invariant by definition. CNNs are often used in image processing, where they apply discrete 2D convolutions to 2D images. A discrete 2D convolution of an image f with a filter g is described mathematically in the following equation and visually in figure 2.7.


Figure 2.7: Convolving one 2D input image with one 2D filter/kernel. From[10].

(f ∗ g)[x, y] = ∑_{i∈I} ∑_{j∈J} f[i, j] · g[x + i, y + j]    (2.26)

To avoid shrinking of the images after each convolution, an image can be zero padded, i.e. surrounded by a frame of zeros.
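A direct implementation of the machine-learning convolution of equation 2.26 with zero padding can be sketched as follows (practical libraries use far more optimized routines):

```python
import numpy as np

def conv2d(image, kernel):
    """Discrete 2D 'convolution' in the machine-learning sense
    (cross-correlation, eq. 2.26). Zero padding keeps the output
    the same shape as the input."""
    n, m = kernel.shape
    padded = np.pad(image, ((n // 2, n // 2), (m // 2, m // 2)))
    out = np.zeros(image.shape)
    for x in range(image.shape[0]):
        for y in range(image.shape[1]):
            # slide the kernel over the padded image
            out[x, y] = np.sum(padded[x:x + n, y:y + m] * kernel)
    return out
```

Convolving with a kernel that is 1 at its centre and 0 elsewhere returns the image unchanged, which is a quick sanity check for the indexing.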

While the connections between layers in a multilayer perceptron are described by a single matrix multiplication, a convolutional layer typically applies many filters to the incoming information, resulting in a multi-channel information stream. Convolutions on different channels done with the same filter are then summed together before they are passed on to the next layer. The whole process is pictured schematically in figure 2.8.

Even though CNNs can have many filters, their capacity does not increase with the number of input units, making them ideal for imaging and other high dimensionality computations.

Direct 2D convolution of an N × N image with an n × n filter takes O(N²n²) time. As seen earlier, both could be converted to the 2D Fourier domain to turn the convolution into a simple multiplication. The filter could be trained in Fourier space to skip one conversion, in which case convolving it with an



Figure 2.8: Three convolutional layers with three filters per layer, applied toone input image. An arrow signifies one convolution operation as described infigure 2.7. Different colors signify different trainable filters. Stacks of imagesare added together after convolution to create one 2D image before the nextconvolution is applied.

image would take O(N² log² N) time. However, in practice usually n < log N, resulting in longer computation time via this route. Since each 2D convolution is done in O(N²n²) time and the number of needed convolution operations scales with the number of filters squared, propagating an N × N image through an Nl layer CNN featuring Nf n × n filters per layer takes O(Nl Nf² N² n²) time.


Chapter 3

Related Work

A first approach in trying to apply the power of neural network computation to the tomographic reconstruction problem would be to simply try to train a multilayer perceptron to map limited angle sinograms directly to their higher quality reconstructions in a supervised learning phase. This treats the network as a black box that is responsible for each step in the reconstruction process. However, apart from its more delicate and complicated aspects, we can be sure that any well-functioning reconstruction process would incorporate a conversion between polar and Cartesian coordinates somewhere between sinogram and reconstruction. By supplying the network with a cheap FBP reconstruction instead of the sinogram, this unavoidable conversion does not have to be looked for in the depths of the network's parameter space. Furthermore, the correlations between input and output are more local if both input and output live in the same space. This shortens the minimum needed scope, the distance between input and output pixels that should be able to influence each other. As this scope becomes longer with larger filters and more layers, a shorter minimum scope allows for fewer parameters, which is good for training. When not taking into consideration the discretization induced information loss, converting the sinogram to an FBP reconstruction before feeding it to the network can be seen as a change in information representation without an actual change in information. This is comparable to using the frequency spectrum of an audio fragment instead of the waveform itself. The scale of the information loss that inevitably does come with this conversion to reconstruction space decreases with an increasing number of detectors and projection angles. Both of the following methods have in different ways implemented this conversion in their neural network assisted reconstruction method.

3.1 Neural Network Filtered Backprojection

In [20] it is shown that by feeding the network a preprocessed version of the sinogram as input instead of the sinogram itself, it is possible to let the network



reconstruct just a single pixel at a time instead of reconstructing the whole image at once. This corresponds to a network with just a single output unit that returns values in R. The function that preprocesses the sinogram exploits the linear shift invariance of the tomographic problem by shifting and flipping the pixel values of the sinogram and then summing the result over all angles, as explained in more detail in [20]. The exact operation to get the ith element of the network's input vector for the reconstruction of a single specified phantom pixel (x, y) looks as follows:

~a^in_i(τi, x, y) = ∑_{θ∈Θ} Pθ(x cos θ + y sin θ − τi)    (3.1)

with τi ∈ T, the set of detector positions, and θ ∈ Θ, the set of angles. This is simply equation 2.12 without the convolution with the filter. This gives us a vector with a length equal to the number of detectors in the CT scanner, which is used as the network's input. By bringing the size of the input layer down in this fashion, while also reducing the number of output units to one, the number of parameters needed to create the network is decreased immensely.
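Equation 3.1 amounts to an unfiltered shift-and-sum over the sinogram rows; a sketch, assuming the sinogram rows are sampled at detector positions `taus` for the angles in `thetas` (both names are illustrative):

```python
import numpy as np

def nnfbp_input(sinogram, thetas, taus, x, y):
    """Build the length-Nd input vector of eq. 3.1 for pixel (x, y):
    for every angle, interpolate P_theta at the shifted positions
    x cos(theta) + y sin(theta) - tau_i and sum over all angles."""
    a_in = np.zeros(len(taus))
    for row, theta in zip(sinogram, thetas):
        s = x * np.cos(theta) + y * np.sin(theta)
        # values outside the detector array contribute zero
        a_in += np.interp(s - taus, taus, row, left=0.0, right=0.0)
    return a_in
```

The same routine is evaluated once per reconstructed pixel, which is what makes the single-output-unit network of [20] possible.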

We can conclude from equation 2.12 and figure 2.3 that in a regular Ram-Lak FBP reconstruction, elements near the middle of ~a^in contribute more to the final value of the reconstructed pixel than elements near the beginning and end of ~a^in, as h(τ) is much larger near τ = 0. Assuming our trained network will do something similar that is just more optimized for our specific task, we can force this knowledge of relative vector element importance into the network using a technique called exponential binning, in which the vector is shortened by summing elements in groups that exponentially increase in size while going from the middle to the ends of the vector/detector array. Since h(τ) is also symmetric, by the same line of reasoning a further reduction in input size can be obtained by also summing together elements that live on opposite sides of the middle. We can tweak the relative importance of pixels away from the middle by starting the exponential binning process l elements after the middle instead of immediately. An example of an ~a^in that is exponentially binned with base 2, starting at element l after the middle m, looks like:

a^{in,exp}_i =
    a_m                                                  i = 0
    a_{m+i} + a_{m−i}                                    0 < i ≤ l
    ∑_{j=2^{i−1−l}}^{2^{i−l}−1} (a_{m+j} + a_{m−j})      i > l
                                                         (3.2)

The preprocessing action performed on the sinogram depends only on the coordinates of the pixel that the network is asked to reconstruct. After preprocessing, the vector is related to the reconstruction pixel in a way that no longer depends on the location of the pixel. As the network does not know where the pixel it is reconstructing belongs, it is sufficient to pick a number of random pixels to reconstruct for each image, instead of training the network on


full images. In this way the training set can be reduced in size, without reducing training sample diversity.

Furthermore, it is shown in [20] that propagating the preprocessed sinogram through the network can be interpreted as a series of FBP reconstructions of the sinogram that are summed with learned weights after applying the network's activation function to them. However, these FBP reconstructions are done using a collection of filters that do not necessarily satisfy or even approximate the condition imposed on the filter h(τ) in equation 2.11, stating that F⁻¹[h(τ)] = |ω| must hold. Instead they are optimized for the specific properties of the images in the training set, such as the number of angles and detectors and the amount of noise, but also the type of image content. The set of filters in this method corresponds to a certain selection of learned network weights. For this reason, the method is named Neural Network Filtered Backprojection (NNFBP). The network has just one hidden layer, and the value of its single output unit is calculated by the following formula:

a^out(~a^in(x, y)) = σ( ∑_{k=1}^{Nhid} q_k σ( FBP_{~wk}(x, y) + b_k ) + b^out )    (3.3)

where a^out is the value of the reconstructed pixel, ~w_k represents a row of the weight matrix that is used as an FBP filter, Nhid is the number of units in the single hidden layer, and q_k, b_k and b^out make up the rest of the network's weights. This approach opens up the possibility to decrease reconstruction time by the use of highly optimized Fast Fourier Transform algorithms to do the FBP reconstructions, instead of letting the network perform its matrix multiplications in the regular way. An obvious direction to follow in an attempt to improve this method is the addition of more hidden layers. After some experimenting however, we did not find any significant improvement in reconstruction quality using this approach. This might be explained by the strong compression of available information during the sinogram's preprocessing, removing e.g. the possibility to use information from a reconstruction pixel's surroundings. Additional layers can furthermore not be interpreted as an FBP with a custom filter. Together with the fact that neural network and computer technology in general have advanced so rapidly that parameter reduction measures need not be as drastic anymore, an incentive has emerged to look beyond the NNFBP method.
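Given precomputed values FBP_{~wk}(x, y) for one pixel, evaluating formula 3.3 is a small computation; a sketch, with the activation σ left as a generic callable since its particular form is a design choice:

```python
import numpy as np

def nnfbp_pixel(fbp_values, q, b, b_out, sigma):
    """Evaluate eq. 3.3: apply sigma to each biased FBP value for this
    pixel, take the weighted sum with q_k, add b_out and apply sigma
    once more to obtain the reconstructed pixel value a_out."""
    hidden = sigma(np.asarray(fbp_values) + np.asarray(b))
    return sigma(np.dot(q, hidden) + b_out)
```

The expensive part, computing the Nhid filtered backprojections, happens outside this function, which is exactly why FFT-based FBP routines can be reused.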

Since preprocessing of the sinograms in this research is not done with an efficient GPU algorithm, it is difficult to provide useful actual NNFBP network runtimes; however, the more conceptual time complexity of the NNFBP network can still be investigated. Formula 3.3 starts by calculating FBP reconstructions with Nhid different filters, taking O(NθN²) time for each of these conversions of a sinogram with Nθ projection angles towards an N × N reconstruction grid. The remaining operations all take O(N²) time, resulting in a final time complexity of O(Nhid Nθ N²), indeed significantly faster than the time complexity of the 2D convolutional network described in section 2.2.2.


CHAPTER 3. RELATED WORK 30

3.2 FBPConvNet

In [14] a CNN based method named FBPConvNet is proposed that is trained to map FBP reconstructions from noiseless limited angle projection data to their many angle counterparts, improving image quality by reducing artifacts. The method uses many angle and limited angle reconstructions of a number of randomly positioned ellipses as training examples. This method has the following disadvantages:

• The best way to implement their method in practice would involve a large dataset of experimentally measured many angle projection data and their respective FBP reconstructions. However, many angle projection data is scarce and expensive to produce, creating a situation in which the obtained qualitative results on simulation data don't translate well to real life use.

• The FBPConvNet paper does not take into account the effects of noisy projection data on reconstruction quality.


Chapter 4

Network

The key problem that holds us back in tomography is the fact that projection data generally contains less accessible information about the object than the ideal reconstruction we are trying to map it to, resulting in an underdetermined situation in which a range of reconstructions fit the projection data equally well. In order to specify a preferred solution within this range it is therefore necessary to somehow inject additional information into the process, information that is not directly present in the projection data but can be viewed as foreknowledge about our reconstruction image space. This works because although theoretically an n-bit N × N px image has an image space of (2ⁿ)^{N²} possible combinations, images that represent something from our physical world usually inhabit an image space that is many times smaller.

In analytical methods these preferred solutions are specified in a primitive way by the choice of filter. The iterative SIRT algorithm does not specify any preferred solution and therefore has a tendency to explore the range of possible solutions instead of converging to a specific one. The iterative TV-MIN algorithm already adds arguably more sophisticated knowledge than an analytical filter: the knowledge that real world images do not have a high total variation. Requiring an image to have a low total variation poses an enormous constraint on the range of possible solutions, steering the algorithm away from unphysicality.

However, the range of images that is allowed by TV-MIN algorithms is heavily dependent on only a few parameters that, since optimal parameter values depend on the specific content and type of data that is to be reconstructed, need to be heuristically found and manually set. A broader selection of parameters would be better suited to specify the complicated rules that underlie the image space of our physical reality, but such a selection would be impossible to tune manually. Since neural networks are essentially large collections of automatically tuned parameters, they seem to be perfectly fit for the tomographic task. A trained network that is presented with unseen projection data can incorporate part of the information present in the full training set to come to its solution. The NNFBP method described in section 3.1 looks for the right parameters in


the form of a custom FBP filter that best fits the data presented. However, the elegant parameter reduction measures that were able to turn the problem from a sinogram→image mapping into a vector→pixel mapping at the same time eliminate the possibility for the network to find global and semi-global properties of images in our reconstruction space.

Since computer and neural network development have brought us to a point where the fast training of larger networks is not as problematic as it once was, it seems fruitful to find out whether these newer ideas and technologies can improve on the NNFBP method by allowing for the learning of these more global image properties. Because the task at hand, injecting additional global information based on an educated guess, is similar to the problem of super resolution, super resolution networks have been the predominant source of inspiration in the search for a network fit for our task.

4.1 Structure

Our network is fully convolutional and can therefore be applied to images of any size, even after training is already done (although how this affects reconstruction quality has not been investigated). Inspired by [8], it consists of a number of successive convolutional layers with a number of filters that is halved after the first two hidden layers and then after every three hidden layers. The last layer has only one filter. Following [8], the first layer has filters of size 5² while all subsequent layers have filters of size 3². So if we specify a network using (#filters_layer 1 × filter size_layer 1, #filters_layer 2 × filter size_layer 2, etc.), an example of a 9 layer, 128 filter network would look like (128×5², 128×3², 64×3², 64×3², 64×3², 32×3², 32×3², 32×3², 1×3²) after the input layer. To get a feel for the optimal size and configuration of this network, it has been trained and tested at depths of 3, 6, 9 and 12 layers excluding the input layer, in combination with 32, 64, 128 or 256 starting filters. The network makes use of residual learning [13], featuring skip connections between every third layer in order to ease the training of higher capacity networks. Between adjacent layers a ReLU [12] is applied. Our network takes two input channels for the method described in chapter 5 and one input channel for the method described in chapter 6. A visual representation of the network structure is given in figure 4.1.
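The filter-count schedule described above (the starting count for the first two layers, halved after every subsequent group of three layers, a single filter in the last layer) can be written down explicitly; the helper name is ours:

```python
def filter_schedule(depth, start_filters):
    """Number of filters per convolutional layer, following the text:
    the starting count for the first two layers, halved after every
    subsequent group of three layers, and a single filter at the end.
    Kernel sizes would be 5x5 for the first layer and 3x3 afterwards."""
    counts = []
    f = start_filters
    for i in range(depth - 1):
        counts.append(f)
        # halve after the first two layers, then after every three
        if i == 1 or (i > 1 and (i - 1) % 3 == 0):
            f //= 2
    counts.append(1)
    return counts

# The 9 layer, 128 filter example from the text:
# filter_schedule(9, 128) -> [128, 128, 64, 64, 64, 32, 32, 32, 1]
```

Each of these counts would feed the `filters` argument of a Keras `Conv2D` layer in the actual model.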

The quantitative measure of reconstruction quality used throughout this research is the Signal to Noise Ratio (SNR), defined as:

$$\mathrm{SNR}(\vec{x}_{\mathrm{true}}, \vec{x}_{\mathrm{pred}}) = 10 \log_{10} \frac{\|\vec{x}_{\mathrm{true}}\|^2}{\|\vec{x}_{\mathrm{true}} - \vec{x}_{\mathrm{pred}}\|^2} \qquad (4.1)$$

where x⃗_pred would be the reconstruction predicted by our method and x⃗_true the reconstruction we are attempting to approximate. It should be noted, however, that even though the original source of all projection data in our simulation is available, these phantoms are never used during the training phase of our network, as they would not be available in the real life scenarios that we



Figure 4.1: Structure of the proposed deep convolutional network. This example starts with 128 filters and has 6 layers.

are trying to imitate. Instead, the SNR that is to be maximized by the network during the training phase is computed with respect to the target outputs, either many angle FBP reconstructions or limited angle algebraic reconstructions. A higher SNR value therefore corresponds to a better approximation of the target reconstruction method and not necessarily to a reconstruction closer to the original phantom. For target reconstructions that are close to their corresponding phantoms this problem is largely theoretical, but as the sinogram's noise increases or as its resolution decreases, traditional reconstructions can diverge quite far from their original phantoms. To keep an eye on this, SNR values of network reconstructions after training are not just computed with respect to the network's targets, but also with a test set of original phantoms. Even though this comparison could not be done in reality, it allows for some interesting results without influencing the training and reconstruction process. Since the SNR is a log function, SNR values can also be negative.
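Applied to flattened image vectors, equation 4.1 is a one-liner; this sketch follows the definition directly:

```python
import numpy as np

def snr(x_true, x_pred):
    """Signal to Noise Ratio of equation 4.1, in dB: the log ratio of
    the squared norm of the reference to the squared error norm."""
    x_true = np.asarray(x_true, dtype=float).ravel()
    x_pred = np.asarray(x_pred, dtype=float).ravel()
    return 10.0 * np.log10(np.sum(x_true**2) / np.sum((x_true - x_pred)**2))
```

A prediction whose error norm exceeds the signal norm yields a negative value, as noted above.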

The network is created with Keras [6] and uses GPU acceleration. The SNR is used as loss value for training and maximized using the ADADELTA optimization algorithm [23]. A training set of 900 samples is used with a validation set of 50 and a test set of 50 samples. Training is done for a maximum of 150 epochs, but is stopped if the SNR on the validation set improves by less than 0.0001 within a window of 10 epochs, to prevent overfitting on the training set. The network parameters are stored after each epoch and the network with the highest SNR on the validation set is picked out. This network is then applied to the test set, producing the final SNR value that is reported here as a quality measure.
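In Keras this stopping-and-checkpointing procedure is typically realized with the `EarlyStopping` and `ModelCheckpoint` callbacks; stripped of the framework, the rule reduces to the following framework-free sketch (the function name is ours):

```python
def best_epoch(val_snrs, min_delta=1e-4, patience=10):
    """Index of the epoch whose stored weights would be picked out:
    the highest validation SNR seen before training stops, where
    training stops once no improvement larger than min_delta has
    occurred for patience consecutive epochs."""
    best, best_i, stale = float("-inf"), 0, 0
    for i, s in enumerate(val_snrs):
        improved = s > best + min_delta          # a real improvement?
        if s > best:                             # checkpoint the best epoch
            best, best_i = s, i
        stale = 0 if improved else stale + 1
        if stale >= patience:                    # stop: stagnant window
            break
    return best_i
```

Applied to a monotonically improving validation curve it returns the last epoch; on a plateau it stops after 10 stagnant epochs and keeps the best one seen.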


4.2 Training Data

Of equal importance to the final result as the specific network configurations are the characteristics of the data flowing through these configurations during training. All training data used in this research is simulated. Let N_d again represent the number of detectors, N_θ the number of projection angles and N the width and height of a reconstruction image. In a real world situation, the complete tomographic task we try to fulfill consists of mapping a point in the limited angle sinogram space ℝ^{N_d N_θ^lim} measured by the CT-scanner to a point in reconstruction space ℝ^{N²} that is thought to best represent the scanned object. This research, however, focuses on mappings from one point to another within this reconstruction space ℝ^{N²}, where the output point is an in some way improved version of the input point lying somewhere in its neighborhood. As also mentioned in chapter 3, this ensures an easier and more localized relationship between in- and output, a feature that is beneficial in convolutional neural networks especially, as explained earlier in section 2.2.2.

For the network to learn a mapping in the training phase that is useful in any way, the reconstruction that is used as the network's target output must be closer to the phantom image than the initial reconstruction used as the network's input, resulting from some beneficial difference in their respective reconstruction procedures. In this research two possible scenarios are considered that result in such a beneficial difference: a higher number of projection angles in the sinogram while using the same reconstruction method, or a different and more computationally expensive reconstruction method while using the same number of projection angles.

In chapter 5 limited angle FBP reconstructions and limited angle NNFBP reconstructions are mapped towards many angle FBP reconstructions, thus exploring the first scenario. Chapter 6 focuses on mapping limited angle FBP reconstructions towards limited angle algebraic reconstructions, thus exploring the second scenario. In both scenarios the influence of noise in the original projection data is investigated.

The following describes the basic process that precedes all different training sets used in this research:


Foundation for a training set

1. A set of 1000 high resolution N_high × N_high px grayscale phantom images is created that contain a number of randomly rotated and shaped ellipses of different randomly picked shades, with float pixel values between 0 and 1.

2. A set of many angle √2 N_high × N_θ^many px sinograms with √2 N_high detectors and N_θ^many projection angles is created from the set of N_high × N_high px phantoms. The factor of √2 ensures that the detector array is as long as the diagonal length of the square phantom given a detector length of one pixel width.

3. Using anti-aliasing, each sinogram is scaled down along the detector dimension to produce an N_d × N_θ^many px image corresponding to a sinogram with a lower number of wider detectors, but with the detector array spanning the same length overall.

4. Sets of many angle noisy sinograms are created by adding noise to the noiseless sinogram pixels following a Poisson distribution characterized by the beam intensity parameter I_0 from equation 2.1.

5. From the noiseless set and the noisy sets of many angle sinograms, sets of limited angle N_d × N_θ^lim px sinograms are created by taking out detector rows at a certain selection of angles.

6. All limited angle and many angle sinograms are now converted back to N_low × N_low reconstruction space using the computationally cheap FBP method with the Ram-Lak filter. The limited angle FBP reconstructions will be used as inputs in the different training sets. The many angle FBP reconstructions will function as target outputs for the networks described in chapter 5 and as a simple quality comparison in chapter 6.

7. To be able to compare reconstructions with the phantoms they initially originated from, the original phantoms too are downscaled from N_high × N_high px to N_low × N_low px.
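Step 4 leaves the exact noise model implicit; under the usual Beer-Lambert counting interpretation of the beam intensity I_0, it could be sketched as follows (the clipping of zero counts is our own guard, not something the text specifies):

```python
import numpy as np

def add_poisson_noise(sinogram, I0, rng=None):
    """Add counting noise to a noiseless sinogram of line integrals:
    each detector pixel records Poisson(I0 * exp(-p)) photons, and the
    noisy projection value is recovered as -log(counts / I0)."""
    rng = rng if rng is not None else np.random.default_rng()
    counts = rng.poisson(I0 * np.exp(-np.asarray(sinogram, dtype=float)))
    counts = np.maximum(counts, 1)   # guard: a zero count would give log(0)
    return -np.log(counts / I0)
```

A lower I_0 means fewer photons and therefore noisier projection values, which is what distinguishes the I_0 = 1000 and I_0 = 10000 sets used in practice.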

In practice we have used N_d = 729, N_high = 2048, N_low = 512 and N_θ^many = 1000. Two options for N_θ^lim are chosen: N_θ^lim = 50 (every twentieth detector row) and N_θ^lim = 143 (every seventh detector row). Noisy sinograms are created with two options for I_0: I_0 = 1000 and I_0 = 10000. Including the noiseless set, which can be understood as I_0 = ∞, this process produces 2 × 3 = 6 different sets of limited angle FBP reconstructions, characterized by the different combinations of respectively N_θ^lim and I_0, that all originate from the same set of phantoms. For each combination a number of different networks is trained, as detailed in


chapters 5 and 6.

For the creation of sinograms, the addition of noise and the creation of FBP reconstructions the ASTRA Toolbox [18, 2, 1] is used.

The downscaling of the sinogram is done to ensure that the sinogram's pixel values are comparable to real world measurements, in which the 'resolution' of physical space is much higher than the measurement resolution obtained by our detector array that is recording properties of this physical space.

4.3 Computational Cost

To complement the notions on computational complexity at the end of section 2.2.2, figure 4.2 plots the time needed to propagate one input image through our network for combinations of different network depths and numbers of filters, using Keras [6] on a system with one Nvidia GeForce GTX Titan X GPU driven by CUDA [7] and cuDNN [5]. The 512 × 512 px Ram-Lak FBP reconstruction that is needed as input for these networks takes an additional 4.5 ms to reconstruct for a 50 angle × 729 detector sinogram and 8.4 ms for a 143 angle × 729 detector sinogram using the ASTRA Toolbox with the same setup. These reconstruction times do not depend on the type of target reconstruction and thus are the same for both the many angle analytical and the limited angle algebraic use cases.

The training needed for the network to converge on our training set of 1000 input-output pairs takes anywhere between 20 minutes for the lowest and 5 hours for the highest capacity networks on a system using one Nvidia GeForce GTX Titan X GPU driven by CUDA [7] and cuDNN [5].

Figure 6.1 contains the computation times that come with various SIRT reconstructions for comparison.

4.4 Reading Results

Performances of all individual networks trained on all combinations of data are presented as bar charts in the results sections, using two quality measures: SNR_phantom and SNR_target. They are organized first by the number of projection angles N_θ that is used to create the input FBP reconstructions and then by their value for the beam intensity I_0 that quantifies the amount of noise.

A copy of bar chart 5.5c, corresponding to a network trained to map noisy FBPs with I_0 = 1000 and N_θ = 50 towards many angle noisy FBPs, is also displayed in figure 4.3 to provide an example for the following description. Information in bar charts is communicated through bar heights and bar colors.

• SNR_phantom: the average Signal to Noise Ratio between network reconstructions and phantoms in the test set, without involving the test set of target reconstructions. This value determines the height of a bar and is also printed above the top of each bar for more accurate comparisons.


Figure 4.2: Computation time of reconstructions done using versions of the CNN with different combinations of layers and starting filters. All network propagations are done using Keras [6] on a system using one Nvidia GeForce GTX Titan X GPU driven by CUDA [7] and cuDNN [5].

[Figure 4.3 bar chart: SNR_phantom (bar height) and SNR_target (bar color) for networks of 3, 6, 9 and 12 layers with 32, 64, 128 and 256 starting filters, trained on noisy projection data with I_0 = 1000; reference bars: NNFBP_50 = 10.76, FBP_50 = −8.74, FBP_1000 = 4.11.]

Figure 4.3: Example bar chart and copy of figure 5.5c to illustrate the explanation in section 4.4. The SNR_phantom values are higher than the SNR_target values in this scenario because the network's failure to reproduce the noise in the target reconstructions results in reconstructions more similar to the phantom.


• SNR_target: the average Signal to Noise Ratio between network reconstructions and the target reconstructions in the test set. This value determines the color of a bar and is also printed below the top of each bar for more accurate comparisons.

Bars that do not correspond to any convolutional networks do not have an SNR_target and are left white. At the bottom of each bar the number of starting filters of the network is specified. Bars are furthermore grouped by network depth. Because both color and height scales are set per chart, inter-chart comparisons can only be done using the actual numerical SNR values above and below the top of each bar, not their heights and colors.


Chapter 5

Many angle analytical target

When compared to FBP reconstructions from the same limited angle sinograms, NNFBP [20] reconstructions, although more blurry, exhibit a smaller occurrence of artifacts induced by noise and lack of projection angles. The difference in reconstruction quality becomes more notable in scenarios with smaller photon counts and fewer projection angles. Example reconstructions of both the NNFBP and FBP method in a 50 angle scenario are visually compared in a noiseless and noisy case in respectively figure 5.1 and 5.2. This qualitative reconstruction difference has inspired the idea that another trained neural network might be able to combine the useful information of one kind of reconstruction with the useful information of another to form a new reconstruction that features the best of both. This task could be described as a deblurring + denoising operation on the NNFBP reconstruction that is somehow 'guided' by the FBP reconstruction, and therefore we have used a network inspired by the Super-Resolution Convolutional Neural Network described in [8]. In this case that would mean picking out the 'sharpness' of the FBP reconstruction while retaining the decreased occurrence of artifacts in the NNFBP reconstruction. This method therefore attempts to learn the mapping of the combination of a limited angle FBP reconstruction plus a limited angle NNFBP reconstruction towards a many angle FBP reconstruction, based solely on examples in the training set.

5.1 Training Data

The process described in section 4.2 provides us with a set of limited angle FBP reconstructions and many angle FBP reconstructions. For the purpose of this network use case, however, we also need a set of NNFBP reconstructions to use as input alongside the FBP reconstructions. Since the sinograms that are converted into these NNFBP reconstructions should not have been seen earlier by


(a) 1000 angle sinogram (b) 1000 angle Ram-Lak FBP

(c) 50 angle sinogram

(d) 50 angle NNFBP (e) 50 angle Ram-Lak FBP

(f) mse in 5.1d (SNR = 13.2) (g) mse in 5.1e (SNR = 12.1)

Figure 5.1: NNFBP vs. FBP reconstructions resulting from noiseless 50 angle projection data. 5.1a is the source for 5.1b. 5.1c is simply every twentieth row of 5.1a and is the source for 5.1d and 5.1e. The mean squared error distributions in 5.1f and 5.1g are computed relative to 5.1b.


(a) 1000 angle sinogram (b) 1000 angle Ram-Lak FBP

(c) 50 angle sinogram

(d) 50 angle NNFBP (e) 50 angle Ram-Lak FBP

(f) error in 5.2d (SNR = 5.1) (g) error in 5.2e (SNR = −6.8)

Figure 5.2: NNFBP vs. FBP reconstructions resulting from 50 angle projection data influenced by Poisson spot noise due to a finite photon count I_0 = 1000. 5.2a is the source for 5.2b; 5.2c is simply every twentieth row of 5.2a and is the source for 5.2d and 5.2e. The error distributions in 5.2f and 5.2g are computed relative to 5.2b.


[Diagram: a limited angle sinogram is reconstructed by both NNFBP and FBP; the two limited angle reconstructions feed the CNN, which is trained towards the many angle FBP reconstruction.]

Figure 5.3: Schematic representation to illustrate training sequence

the NNFBP network in its own training process, the NNFBP network is trained on a different training set before using it to produce the inputs for our CNN training set. The NNFBP is trained as described in section 3.1 using preprocessed sinograms that are exponentially binned in accordance with equation 3.2 with l = 3. An element of the training set for our CNN now consists of a two channel image as input, containing a limited angle NNFBP reconstruction in one channel and a limited angle FBP reconstruction in the other, and a regular one channel image as output, containing a many angle FBP reconstruction. This is pictured in figure 5.3.
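Assembling one such training pair is just channel stacking; the channels-last layout and the channel order here are our own illustrative choices:

```python
import numpy as np

def make_training_pair(fbp_lim, nnfbp_lim, fbp_many):
    """Build one chapter-5 training example: a two channel input with
    the limited angle NNFBP and FBP reconstructions, and a one channel
    target with the many angle FBP reconstruction."""
    x = np.stack([nnfbp_lim, fbp_lim], axis=-1)   # (N, N, 2) input
    y = np.asarray(fbp_many)[..., np.newaxis]     # (N, N, 1) target
    return x, y
```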

5.2 Results

Figure 5.4 displays a selection of example reconstructions done by networks with different numbers of layers and filters after training them using the many angle analytical target training procedure described earlier, along with their relevant FBP and NNFBP reconstructions. The corresponding phantoms are omitted in order to save space, but to the human observer they look exactly like the 1000 angle FBP reconstructions in figures 5.4a and 5.4c; they can also be imagined, since they are just the same transparent ellipses, but of exactly constant shade, stacked on top of each other.


(a) 50 angle noiseless projection data (I0 =∞)

(b) 50 angle noisy projection data with I0 = 1000

(c) 143 angle noiseless projection data (I0 =∞)

(d) 143 angle noisy projection data with I_0 = 10000

Figure 5.4: A selection of CNN reconstructions from the test set with their corresponding 1000 angle Ram-Lak FBP targets and limited angle Ram-Lak (NN)FBP inputs.


[Figure 5.5 bar charts (50 angle inputs): SNR_phantom and SNR_target for networks of 3, 6, 9 and 12 layers with 32, 64, 128 and 256 starting filters.
(a) noiseless (I_0 = ∞); reference bars: NNFBP_50 = 13.04, FBP_50 = 11.39, FBP_1000 = 20.67.
(b) noisy, I_0 = 10000; reference bars: NNFBP_50 = 13.00, FBP_50 = 0.89, FBP_1000 = 13.30.
(c) noisy, I_0 = 1000; reference bars: NNFBP_50 = 10.76, FBP_50 = −8.74, FBP_1000 = 4.11.]

Figure 5.5: SNR values on the test set of networks trained on Ram-Lak targets reconstructed from 50 angle sinograms with different amounts of noise.


[Figure 5.6 bar charts (143 angle inputs): SNR_phantom and SNR_target for networks of 3, 6, 9 and 12 layers with 32, 64, 128 and 256 starting filters.
(a) noiseless (I_0 = ∞); reference bars: NNFBP_143 = 17.91, FBP_143 = 17.65, FBP_1000 = 20.67.
(b) noisy, I_0 = 10000; reference bars: NNFBP_143 = 17.20, FBP_143 = 5.50, FBP_1000 = 13.30.
(c) noisy, I_0 = 1000; reference bars: NNFBP_143 = 10.77, FBP_143 = −4.25, FBP_1000 = 4.11.]

Figure 5.6: SNR values on the test set of networks trained on Ram-Lak FBP targets reconstructed from 143 angle sinograms with different amounts of noise.


The reconstructions in 5.4 merely exemplify some of the network reconstructions whose errors give rise to the bars in the example figure 4.3 and in figures 5.5 and 5.6, but through visualization they can grant a more intuitive insight into the specific origins of the individual pixel value errors that add up to produce these bars.

At first sight we can conclude from both the example reconstructions and the bar charts that the CNN (first column in figure 5.4) does a good job of approximating the many angle FBPs (second column) using only the limited angle FBPs and NNFBPs (third and fourth columns) as input in the noiseless scenarios, since in those cases the CNN's reconstructions are virtually indistinguishable from their many angle FBP targets. High SNR_target values and SNR_phantom values that approach those of the FBP_1000 bar in figures 5.5a and 5.6a confirm this.

It is the noisy scenarios in figures 5.4b and 5.4d, though, that inspire a closer look. Comparing the CNN's reconstructions with their respective 1000 angle FBP targets, the differences are clearly visible and characterized mostly by noise of lower amplitude and larger radius, and by less pronounced object borders that still seem to deviate from their original shape.

Quantitative measures in figures 5.5b, 5.5c, 5.6b and 5.6c even show the remarkable fact that all networks trained towards noisy many angle FBPs greatly outperform their own target method, corresponding to the FBP_1000 bar, when both reconstruction methods are compared to the downscaled phantoms by their SNR_phantom values.

To explain this, we observe that only noisy data brings this effect, as it is not visible in 5.5a and 5.6a. Seemingly the network fails to learn the precise mapping from the limited angle noisy FBP reconstruction towards the 1000 angle noisy FBP reconstruction and, instead of 'guessing' the noise, settles on the pixel values that fluctuate less overall and produce the highest achievable SNR_target.

This behavior makes sense, since the noise that is present in the 1000 angle FBP reconstructions stems from noise in the 1000 angle noisy sinograms. This information is internally uncorrelated and absent from the leftover angles in the limited angle sinograms that underlie the FBPs, and therefore inaccessible to the network. A reconstructed image with more guessed noise, which might look more similar to the 1000 angle FBP to a human observer, would actually have a lower SNR_target than the images produced by our network, whose pixel values lie closer to the average of their surroundings. This effect, although dependent on the image content, automatically produces an image with a lower total variation and in this way improves its SNR_phantom to levels above the many angle FBP reconstructions. Thus the failure of the network to learn and predict noise, something that might be considered disadvantageous or neutral initially, proves to be advantageous in this sort of environment.
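This argument is easy to check numerically: because the target's noise is uncorrelated with anything the network can see, guessing independent noise doubles the error variance, while an averaging prediction wins on both SNR measures. A toy sketch using the SNR of equation 4.1 (the stand-in phantom, noise level, and the 3×3 mean filter standing in for the network's smoothing are all illustrative assumptions):

```python
import numpy as np

def snr(x_true, x_pred):
    return 10 * np.log10(np.sum(x_true**2) / np.sum((x_true - x_pred)**2))

def box_blur(img):
    """3x3 mean filter, standing in for the network's 'averaging'."""
    p = np.pad(img, 1, mode="edge")
    n = img.shape[0]
    return sum(p[i:i + n, j:j + n] for i in range(3) for j in range(3)) / 9.0

rng = np.random.default_rng(0)
phantom = np.fromfunction(                      # smooth stand-in phantom
    lambda i, j: np.exp(-((i - 32.0)**2 + (j - 32.0)**2) / 200.0), (64, 64))
target = phantom + rng.normal(0, 0.05, phantom.shape)   # 'noisy many angle FBP'
smooth = box_blur(target)                               # averaging prediction
guessed = phantom + rng.normal(0, 0.05, phantom.shape)  # independently guessed noise

assert snr(target, smooth) > snr(target, guessed)    # higher SNR_target
assert snr(phantom, smooth) > snr(phantom, guessed)  # and higher SNR_phantom
```

The smoothed prediction even beats the noisy target itself when both are compared to the phantom, mirroring the effect observed in the bar charts.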

The reader might wonder whether the general success of this method is indeed related to the ‘best of both worlds’ idea stated at the beginning of chapter 5, or whether the network secretly learns to use only one of the images while more or less ignoring the other, perhaps resulting effectively in a situation comparable to [14]. We have done a small sample check omitting the NNFBP channel reconstruction from training and reconstructing with a 9 layer, 256 starting filter network and compared it with the ordinary situation. In the I0 = ∞, I0 = 10000 and I0 = 1000 scenarios the addition of the NNFBP network accounted for a 10%, 16% and 20% decrease in the MSE respectively. This has sufficed for us to consider the addition of the NNFBP reconstruction beneficial and significant in general.

It should be noted that this decrease in MSE could, theoretically speaking, originate from the fact that an additional 1000 images were available to the ordinary FBP + NNFBP → FBP1000 training process as a whole when compared to the simpler FBP → FBP1000 training process, since we have not used the exact same training sets for the training of our networks. However, since the information presented to both ends of the NNFBP network is already compressed so much, and since the NNFBP network has only a few parameters to begin with, it seems unlikely that different images (of the same kind) produce a significantly different walk through parameter space.


Chapter 6

Limited angle algebraic target

6.1 Training Data

The method that maps limited to many angle FBP reconstructions described in chapter 5 initially needs a substantial number of many angle sinograms to train successfully. Although this need disappears after the training phase, many angle sinograms are difficult to acquire in real life. Fortunately, better tomographic methods than FBP have been developed that are often used in situations where reconstruction quality takes priority over computation time, or where FBP algorithms simply do not produce usable results due to a lack of useful information in the projection data. These algebraic or iterative algorithms, explained earlier in section 2.1.2, usually perform remarkably better than the FBP method in limited angle and noisy scenarios. It remains an open question whether it is possible to arrive at the exact outcome of an iterative reconstruction algorithm without performing all the iterations, but being able to save the often immense amount of computation time by approximating the result might be good enough for practical purposes, and for this reason it is useful to try the neural network approach.

In this research project two algebraic methods are explored for the production of limited angle reconstructions to use as target outputs for the training of our CNN: the Simultaneous Iterative Reconstruction Technique (SIRT) introduced earlier in section 2.1.2, and the Total Variation MINimization (TV-MIN) method. In short, TV-MIN is simply SIRT with a regularization term added to the error function that hinders the formation of high total variation artifacts and reduces the number of possible solutions that fit the system of equations. The input image that is mapped to the target is again a limited angle FBP reconstruction, but this time the additional NNFBP reconstruction in the second input channel is left out for simplicity and time reasons. SIRT reconstructions are created using the ASTRA Toolbox and TV-MIN reconstructions are created


using PyTV-TOMO [19], an implementation of the FISTA [4] algorithm.

Since it does not apply a regularization function in between iterations, the SIRT algorithm is faster than TV-MIN. In low noise scenarios, however, SIRT can still produce reconstructions of a quality comparable to or even better than TV-MIN, since there is a more direct relationship with the projection data. In scenarios with more noise the SIRT method cannot keep up with TV-MIN quality-wise, but in such an environment a sufficiently good and fast SIRT approximator could still be used in combination with a regularization function to speed up the TV-MIN reconstruction process. Both CNN assisted SIRT approximation and CNN assisted TV-MIN approximation are therefore useful areas to explore.
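The TV-MIN idea just described can be made concrete with a small numerical sketch. The snippet below minimizes the objective E(x) = ½‖Ax − p‖² + λ·TV(x) by plain gradient descent on a smoothed 1D total variation. This is an illustration of the principle only: PyTV-TOMO actually uses FISTA with a proper TV proximal step, and the toy matrix, step size and smoothing constant here are all assumptions made for the example.

```python
import numpy as np

def tv(x, eps=1e-6):
    # Smoothed 1D total variation: sum of |x[i+1] - x[i]|, made
    # differentiable with a small eps (an illustrative simplification).
    d = np.diff(x)
    return np.sum(np.sqrt(d**2 + eps))

def tvmin(A, p, lam, steps=5000, lr=0.01):
    # Plain gradient descent on the TV-MIN objective
    #   E(x) = 0.5 * ||A x - p||^2 + lam * TV(x)
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        d = np.diff(x)
        w = d / np.sqrt(d**2 + 1e-6)   # gradient of the smoothed TV term
        g_tv = np.zeros_like(x)
        g_tv[:-1] -= w
        g_tv[1:] += w
        x -= lr * (A.T @ (A @ x - p) + lam * g_tv)
    return x

# Tiny underdetermined system: 4 "measurements" of a 6-pixel signal.
rng = np.random.default_rng(1)
A = rng.normal(size=(4, 6))
x_true = np.array([0., 0., 1., 1., 1., 0.])  # piecewise constant "phantom"
p = A @ x_true
x_rec = tvmin(A, p, lam=2**-8)
print(np.round(x_rec, 2))
```

The TV penalty steers the underdetermined solution towards piecewise constant images, which is exactly why TV-MIN suppresses the high total variation artifacts mentioned above.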

To make a fair comparison between our network and the NNFBP method in [20], an NNFBP-style network is also trained towards single SIRT and TV-MIN target pixels instead of single many angle FBP pixels. SNR values for these networks are labeled NNSIRT and NNTVMIN respectively in the bar chart figures. These NNTVMIN and NNSIRT networks work internally in exactly the same way as the NNFBP network; the only difference lies in their training set. NNFBP, NNSIRT and NNTVMIN therefore have equal computational costs and training times.

The results of using our CNN to map limited angle FBP reconstructions towards limited angle SIRT and TV-MIN reconstructions are described in sections 6.2 and 6.3 respectively.

6.2 SIRT

Figure 6.1 gives an impression of the computational cost of the SIRT reconstruction process. SIRT reconstruction times are displayed for a number of different iteration counts and projection angles, all performed on sinograms with 729 horizontal pixels or detector values using the ASTRA Toolbox and CUDA with the same Nvidia GeForce GTX Titan X GPU that was used to produce the computation times displayed in figure 4.2.

Because the system of equations that is to be solved by the SIRT algorithm can be underdetermined or inconsistent and the algorithm goes through the equations one by one, increasing the number of SIRT iterations does not necessarily result in a better reconstruction. Especially in noisy scenarios it is important to stop the process in time in order to prevent exorbitant formation of artifacts. The number of iterations needed to arrive at an optimal reconstruction depends on the amount, quality and content of the projection data and is difficult to determine exactly. In this research the number of SIRT iterations is heuristically chosen to be 50 for the noisy scenarios of figures 6.3b, 6.3c, 6.4b and 6.4c, and 500 in the noiseless scenarios of 6.3a and 6.4a. A 50 iteration noiseless SIRT approximation has not been attempted, since a working 500 iteration approximation would be more useful than a working 50 iteration one in most cases where the reconstruction quality does not decline at this higher number of iterations.
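This need to stop in time is the classic semiconvergence behavior of iterative methods on noisy data, and it can be reproduced on a toy problem. The sketch below uses Landweber iteration, a simpler relative of SIRT (not the ASTRA implementation used in this research), on a small ill-conditioned system with deterministic “noise”; all numbers are illustrative assumptions.

```python
import numpy as np

# Semiconvergence sketch: on noisy data, more iterations first reduce
# and then increase the error to the true object, because late
# iterations fit the noise in the weakly determined directions.
A = np.diag([1.0, 0.3, 0.05])       # ill-conditioned "projection" matrix
x_true = np.array([1.0, 1.0, 1.0])
noise = np.array([0.0, 0.0, 0.2])   # noise hitting the weak direction
p = A @ x_true + noise

x = np.zeros(3)
errors = []
for k in range(600):
    x = x + A.T @ (p - A @ x)       # Landweber step (relaxation 1)
    errors.append(np.linalg.norm(x - x_true))

best = int(np.argmin(errors))       # the optimal stopping iteration
print(best, errors[best], errors[-1])
```

The error to the true object reaches its minimum well before the last iteration and grows afterwards, which is exactly why the iteration counts above are capped in the noisy scenarios.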


[Bar chart: SIRT reconstruction time in ms (up to roughly 800 ms) against the number of iterations (50, 100, 200), for 50, 143 and 1000 projection angles.]

Figure 6.1: Durations of SIRT reconstructions with 729 detector pixels and 50, 143 or 1000 projection angles. All reconstructions are run with the ASTRA Toolbox [18, 2, 1] on a system using one Nvidia GeForce GTX Titan X GPU driven by CUDA [7].


The SIRT method can already be approximated well in a completely different way: by FBP using a specifically optimized filter [21] instead of the Ram-Lak filter used in this research. By implementing an additional non-negativity constraint in our SIRT algorithm that sets all negative pixel values to zero in between iterations, we increase the complexity of the relation between the algorithm’s in- and output to such an extent that it becomes more difficult to approximate using this special FBP filter, thereby increasing the merit our network could bring if it were to correctly learn this relationship. Apart from increasing the complexity between in- and output, this constraint should also improve reconstruction quality with respect to the phantom.
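A minimal sketch of such a constrained SIRT iteration, in the common formulation x ← x + C Aᵀ R (p − A x) with R and C the inverse row and column sums of A, and with negative pixels clipped to zero in between iterations. This is an illustration on a toy system of ray sums, not the ASTRA implementation used in this research:

```python
import numpy as np

def sirt_nonneg(A, p, iterations):
    # SIRT update x <- x + C A^T R (p - A x), where R and C hold the
    # inverse row and column sums of A; negative pixels are set to
    # zero in between iterations (the non-negativity constraint).
    R = 1.0 / A.sum(axis=1)            # inverse row sums
    C = 1.0 / A.sum(axis=0)            # inverse column sums
    x = np.zeros(A.shape[1])
    for _ in range(iterations):
        x = x + C * (A.T @ (R * (p - A @ x)))
        x = np.maximum(x, 0.0)         # non-negativity constraint
    return x

# Toy system: 5 "rays", each summing a subset of 4 non-negative pixels.
A = np.array([[1., 1., 0., 0.],
              [0., 1., 1., 0.],
              [0., 0., 1., 1.],
              [1., 0., 0., 1.],
              [1., 0., 1., 0.]])
x_true = np.array([0.0, 2.0, 1.0, 0.5])
p = A @ x_true
x_rec = sirt_nonneg(A, p, 200)
print(np.round(x_rec, 3))
```

On this small consistent system the constrained iteration recovers the non-negative solution; the clipping step is what makes the overall input-to-output map nonlinear and thus harder to mimic with a single optimized FBP filter.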

6.2.1 Results

Figure 6.2 shows example reconstructions done by a small selection of CNNs trained with SIRT targets. As in figure 5.4, the corresponding phantoms are left out to save space, but they can be imagined as a collection of clean, edgy, transparent, single shade ellipses stacked on top of each other, very similar to the 143 angle SIRT reconstruction in figure 6.2a. A first glance at figure 6.2 indicates that the network is quite capable of learning a meaningful relationship between its computationally cheap limited angle analytical FBP reconstruction inputs and its computationally expensive limited angle algebraic SIRT targets. Even in the seemingly difficult high noise scenario depicted in 6.2b the network comes up with a reconstruction that is virtually indistinguishable from the reconstruction produced by the algorithm it tries to approximate.

6.3 TV-MIN

The computation of TV-MIN reconstructions in this research has been done on a CPU and not, like the SIRT and network reconstructions, on a faster GPU. Actual TV-MIN computation times would therefore be meaningless in an attempt to compare computational costs between the methods mentioned in this research. However, when running both TV-MIN and SIRT reconstructions on the same CPU, TV-MIN reconstructions take about 2.2, 1.9 and 1.6 times as long as SIRT reconstructions with the same number of iterations when applied to sinograms with 729 detector pixels and respectively 50, 143 and 1000 projection angles. These ratios do not depend on the number of iterations. This should give a rough indication of the computational cost of a TV-MIN reconstruction.

Since PyTV-TOMO is run on a CPU, all TV-MIN reconstructions are done with 50 iterations to limit the time needed for creating the training set. The TV-MIN parameter λ in PyTV-TOMO, which determines the weight of the total variation penalty, is set to λ = 2^−8. The value of this parameter has significant effects on the outcome of the TV-MIN reconstructions, and the SNR results presented in this part might depend on it strongly, but this dependence has not been investigated.


(a) 50 angle noiseless projection data (I0 = ∞)

(b) 143 angle noisy projection data with I0 = 1000

(c) 143 angle noiseless projection data (I0 = ∞)

Figure 6.2: A selection of CNN reconstructions from the test set with their corresponding limited angle 50(0) iteration SIRT targets and limited angle Ram-Lak FBP inputs.


[Bar charts: SNRphantom and SNRtarget values on the test set for 3, 6, 9 and 12 layer networks with 32, 64, 128 and 256 filters. Reference SNRphantom values per panel:
(a) Noiseless projection data (I0 = ∞), 50 angle FBP input, 50 angle SIRT target (500 iterations): NNSIRT50 12.54, FBP50 11.39, FBP1000 20.67, SIRT50 18.73.
(b) Noisy projection data with I0 = 10000, 50 angle SIRT target (50 iterations): NNSIRT50 12.85, FBP50 0.89, FBP1000 13.30, SIRT50 15.21.
(c) Noisy projection data with I0 = 1000, 50 angle SIRT target (50 iterations): NNSIRT50 10.91, FBP50 −8.74, FBP1000 4.11, SIRT50 12.13.]

Figure 6.3: SNR values on the test set of networks trained on 50(0) iteration SIRT targets reconstructed from 50 angle sinograms with different amounts of noise.


[Bar charts: SNRphantom and SNRtarget values on the test set for 3, 6, 9 and 12 layer networks with 32, 64, 128 and 256 filters. Reference SNRphantom values per panel:
(a) Noiseless projection data (I0 = ∞), 143 angle FBP input, 143 angle SIRT target (500 iterations): NNSIRT143 18.04, FBP143 17.65, FBP1000 20.67, SIRT143 20.47.
(b) Noisy projection data with I0 = 10000, 143 angle SIRT target (50 iterations): NNSIRT143 15.32, FBP143 5.50, FBP1000 13.30, SIRT143 15.75.
(c) Noisy projection data with I0 = 1000, 143 angle SIRT target (50 iterations): NNSIRT143 13.86, FBP143 −4.25, FBP1000 4.11, SIRT143 13.69.]

Figure 6.4: SNR values on the test set of networks trained on 50(0) iteration SIRT targets reconstructed from 143 angle sinograms with different amounts of noise.


6.3.1 Results

Figure 6.5 shows example reconstructions done by a selection of CNNs trained with TV-MIN targets. As in the SIRT case, the network reconstructions and their TV-MIN targets are much alike, even in the high noise case of figure 6.5b where details are barely visible. To the attentive eye figure 6.5a still displays some traces of FBP stripe artifacts, showing that even in noiseless scenarios the network is not always able to discern artifacts from content as well as the target method used for training.

Bar charts 6.6 and 6.7 in the appendix depict the network’s achievements in approximating its TV-MIN targets in the ellipse training set.


(a) 50 angle noiseless projection data (I0 = ∞)

(b) 50 angle noisy projection data with I0 = 1000

(c) 143 angle noisy projection data with I0 = 10000

Figure 6.5: A selection of CNN reconstructions from the test set with their corresponding limited angle 50 iteration TV-MIN targets and limited angle Ram-Lak FBP inputs.


[Bar charts: SNRphantom and SNRtarget values on the test set for 3, 6, 9 and 12 layer networks with 32, 64, 128 and 256 filters. Reference SNRphantom values per panel:
(a) Noiseless projection data (I0 = ∞), 50 angle FBP input, 50 angle TV-MIN target (50 iterations): NNTVMIN50 13.33, FBP50 11.39, FBP1000 20.67, TVMIN50 16.68.
(b) Noisy projection data with I0 = 10000, 50 angle TV-MIN target (50 iterations): NNTVMIN50 12.86, FBP50 0.89, FBP1000 13.30, TVMIN50 16.57.
(c) Noisy projection data with I0 = 1000, 50 angle TV-MIN target (50 iterations): NNTVMIN50 12.47, FBP50 −8.74, FBP1000 4.11, TVMIN50 15.71.]

Figure 6.6: SNR values on the test set of networks trained on 50 iteration TV-MIN targets reconstructed from 50 angle sinograms with different amounts of noise.


[Bar charts: SNRphantom and SNRtarget values on the test set for 3, 6, 9 and 12 layer networks with 32, 64, 128 and 256 filters. Reference SNRphantom values per panel:
(a) Noiseless projection data (I0 = ∞), 143 angle FBP input, 143 angle TV-MIN target (50 iterations): NNTVMIN143 16.18, FBP143 17.65, FBP1000 20.67, TVMIN143 16.78.
(b) Noisy projection data with I0 = 10000, 143 angle TV-MIN target (50 iterations): NNTVMIN143 15.81, FBP143 5.50, FBP1000 13.30, TVMIN143 16.74.
(c) Noisy projection data with I0 = 1000, 143 angle TV-MIN target (50 iterations): NNTVMIN143 14.80, FBP143 −4.25, FBP1000 4.11, TVMIN143 16.38.]

Figure 6.7: SNR values on the test set of networks trained on 50 iteration TV-MIN targets reconstructed from 143 angle sinograms with different amounts of noise.


Chapter 7

Conclusion

7.1 General Results

The convolutional neural network approach to approximating existing higher quality tomographic methods from limited angle FBP reconstructions generally works well in all usage scenarios. For practical use the best choice of network depends on properties of the usage scenario and not solely on simple network characteristics such as depth, number of filters and computation time. In fact, these characteristics often have little effect on the final reconstruction quality.

7.1.1 Correlations between SNRs and network capacity

In all cases except for the noiseless 143 angle SIRT approximations and the noiseless many angle FBP approximations an expected trend is visible: higher capacity networks obtain higher SNRtarget values. In other words, high capacity networks are usually better able to approximate their targets than low capacity networks. The relationship between network depth and SNRphantom is less obvious.

Throughout the results in the appendix we see that the SNRtarget values in the test set do not necessarily reflect their SNRphantom values. When varying network depths and numbers of filters, sometimes only a minor increase in the SNRtarget comes with a relatively major increase in the SNRphantom (e.g. figure 5.5c). Sometimes a major increase in the SNRtarget comes with only a minor increase in the SNRphantom (5.6a, 6.3c). And sometimes a major increase in the SNRtarget even comes with a decrease in the SNRphantom (6.3b, 6.4c, 6.7a). In these last examples, since the SNRtarget values do increase with network capacity as expected, a negative correlation between network capacity and SNRphantom emerges. It appears as if deepening the network or increasing its number of filters can sometimes have a detrimental rather than a beneficial effect. Restricting the network’s capacity might therefore in these cases be understood as a form of regularization: by hindering the network in learning how the more intricate properties of limited angle FBPs relate to the unwanted artifacts in its target outputs, these artifacts do not end up in the network’s prediction, which increases the SNRphantom.

Related to this might be the fact that for a given Nθ, SNRtarget values grow more slowly with network capacity when the networks are trained on data with more noise than when they are trained on data with less noise, something that can easily be observed by comparing the scales of the SNRtarget legend bars on the right of each bar chart. Apparently, when processing noisier reconstructions the network more easily reaches a ceiling beyond which it cannot make up for the lack of information presented to it, and this ceiling cannot be surpassed by adding parameters in the form of filters or network depth.

7.1.2 Correlation between SNRphantom and SNRtarget

In all of the scenarios we observe an information dependent approximation quality: generally speaking, both analytically and iteratively obtained reconstructions can be approximated better by the network when they are created from sinograms that have more useful information readily available, due to either a smaller amount of noise or a higher number of projection angles. In other words, there is a positive correlation between SNRtarget and Nθ, and between SNRtarget and I0. Since here we are comparing network reconstructions with each other and with the reconstructions they are trained on, and not with the original phantoms, this is not necessarily an obvious observation.

Focusing again solely on the SNRtarget values, we see a similar thing happen between TV-MIN approximation networks and SIRT approximation networks: for a given amount of noise and number of projection angles, the TV-MIN reconstructions are without exception approximated better than their SIRT counterparts. This is likely because the relation between FBP and SIRT artifacts is too complex or even impossible to learn, resulting in SIRT reconstruction artifacts that the network cannot predict from the corresponding FBP reconstructions. Since the TV-MIN algorithm is designed to remove these often high total variation inducing artifacts, in the TV-MIN approximation the network is asked to perform a task more similar to noise reduction instead of a transformation between two different kinds of noise, which is arguably more straightforward.

The traditional reconstruction method with the highest SNRphantom, the SIRT reconstruction resulting from a 143 angle sinogram after 500 iterations, also has the highest scoring network in terms of SNRtarget.

From the three observations just mentioned, it seems that the mapping between limited angle FBPs and target reconstructions is easier for the network to learn when the targets are closer to the phantom, which happens either because the method we try to approximate is better in its specific environment, or because it has more quality information at its disposal.

However, as the original sinograms used to create the network’s target reconstructions are not directly available to the network itself, this could in principle at least in part be caused by the information loss that comes with the conversion from sinogram to FBP, which is performed before the network even sees any information.

7.1.3 Outperforming the target methods

In section 5.2 we learned that the network likely outperformed its 1000 angle FBP target method because it was not given the same (corrupted) input information. The networks trained on limited angle algebraic targets in figures 6.2 and 6.5 show no clearly visible discrepancy between noisy and noiseless scenarios, as is to be expected since the limited angle iterative reconstructions are created from exactly the same data as the network’s input FBPs.

But although not as strongly, bar charts 6.3 and 6.4 in the appendix show that again, in all scenarios there is at least one network that outperforms its own target method in terms of the SNRphantom. This might be partially explained by the artifacts in algebraic reconstructions that are difficult or impossible to relate to the limited angle FBP reconstruction. By failing to recreate these artifacts the network might produce an image with a higher SNRphantom.

Something else to consider is the fact that in a real world scenario the SNRphantom values would not have been available, and the best network choice would have been the network with the highest SNRtarget value, which more often than not is a different network. The fact that some of the networks outperform their target methods could therefore in part be attributed to a phenomenon of a more statistical nature.

In all scenarios the CNN outperforms the NNFBP (and NNSIRT and NNTVMIN) method and regular FBP reconstructions with the same input at their disposal.

7.2 Conclusion

Looking at the bar charts and the example reconstructions we can conclude that the types of neural networks considered are capable of mapping limited angle FBP reconstructions to many angle FBP reconstructions of the ellipse data type. In cases where enough many angle training data is available, such a network could provide a valuable alternative to limited angle FBP reconstructions in terms of quality, and to algebraic reconstruction methods in terms of computation time.

Similarly, we can see that limited angle algebraic SIRT and TV-MIN reconstructions of the ellipse data type can be closely approximated from limited angle FBP reconstructions, providing an alternative that does not need special high quality training data at any point during the process.

However, since much is yet to be learned in the field of neural network learning, this neural network approximation method remains essentially a black box. Although the box is deterministic in nature, any previously unseen input or type of input could theoretically produce unexpected results, e.g. due to overfitting. This risk can be lowered by increasing the size and variety of the training set, but it cannot be eliminated without a better understanding of the mechanisms underlying the training process.

Following the hypothesis that the complexity of the relation between input and output decreases when the output resembles the original phantom more, we could expect the network’s approximation quality to improve along with the accuracy of the reconstruction method used for target creation. Since neural network computation times are fixed, the added value of a neural network approximator over the method it approximates increases with the computation time needed by the original method. A hypothetical method producing reconstructions that are almost perfect but that is too slow for practical purposes would probably be approximated even better by our neural network than the ones used in this research. A fast neural network assisted approximator could then turn this hypothetical method from practically useless into a useful invention that can be run on any home computer.

7.3 Discussion

It should be noted that due to the stochastic nature of the network’s parameter initialization, final SNR results as presented in the appendix will always fluctuate depending on the random seed used at the start, and as such comparisons between individual networks should be done carefully. Comparisons of groups of networks based on network depth or number of filters are more significant. It is generally believed that high capacity networks are more difficult to train than low capacity ones. So, although skip connections should decrease the effects of this problem, it is still likely that higher capacity networks are more susceptible to this random seed dependent fluctuation than lower capacity networks. This issue could easily be resolved by training the networks a number of times and taking the average SNRs on the test set, but this has not been done.

As mentioned above, another simple way to improve on the concept of this research is to increase the size and variety of the images in the training set.

An interesting direction for future research lies in varying some of the other properties that characterize our tomographic data. Reducing radiation exposure by limiting the number of detectors could be investigated as an alternative to limiting the number of projection angles, but since detectors are relatively cheap this investigation would be of mostly theoretical interest. Since noisy images result from projection data recorded at low radiation exposure, a neural network that can successfully and reliably map noisy data to less noisy data would have more practical benefit.
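Training pairs for such a denoising network could be simulated rather than measured. The sketch below assumes a simple photon-counting model with Poisson statistics and Beer-Lambert attenuation; the photon budget `photons` is an assumed illustrative parameter, not a value from this thesis, and lower budgets yield noisier sinograms.

```python
import numpy as np

def simulate_low_dose(sinogram, photons=1000, rng=None):
    """Create a noisy copy of a clean sinogram via Poisson photon statistics.

    `sinogram` holds attenuation line integrals; `photons` is the assumed
    photon count per detector at zero attenuation (lower means noisier).
    """
    rng = np.random.default_rng(rng)
    # Expected detector counts under Beer-Lambert attenuation.
    expected_counts = photons * np.exp(-sinogram)
    counts = rng.poisson(expected_counts)
    counts = np.maximum(counts, 1)  # avoid log(0) for fully absorbed rays
    # Convert measured counts back to attenuation line integrals.
    return -np.log(counts / photons)
```

Pairs of `(simulate_low_dose(s), s)` over a set of clean simulated sinograms would then play the same role as the limited-angle/full-angle pairs used in this research.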
