A Unified Approach to Statistical Tomography Using
Coordinate Descent Optimization †‡
Charles A. Bouman
School of Electrical Engineering
Purdue University
West Lafayette, IN 47907-0501
(317) 494-0340
Ken Sauer
Laboratory for Image and Signal Analysis
Department of Electrical Engineering
University of Notre Dame
Notre Dame, IN 46556
(219) 631-6999
December 4, 1996
Abstract¹
Over the past ten years there has been considerable interest in statistically optimal reconstruction of image cross-sections from tomographic data. In particular, a variety of such algorithms have been proposed for maximum a posteriori (MAP) reconstruction from emission tomographic data. While MAP estimation requires the solution of an optimization problem, most existing reconstruction algorithms take an indirect approach based on the expectation maximization (EM) algorithm.

In this paper we propose a new approach to statistically optimal image reconstruction based on direct optimization of the MAP criterion. The key to this direct optimization approach is greedy pixel-wise computations known as iterative coordinate descent (ICD). We show that the ICD iterations require approximately the same amount of computation per iteration as EM based approaches, but the new method converges much more rapidly (in our experiments typically 5 iterations). Other advantages of the ICD method are that it is easily applied to MAP estimation of transmission tomograms, and typical convex constraints, such as positivity, are simply incorporated.
IP EDICS #2.3 Computed Imaging: Tomography
†This work was supported by the National Science Foundation under Grant no. MIP93-00560, Electricité de France under Grant no. P21L03/2K3208/EP542, and an NEC Faculty Fellowship.
‡IEEE Trans. on Image Processing, vol. 5, no. 3, pp. 480-492, March 1996.
¹Permission to publish this abstract separately is granted.
1 Introduction
In the past decade, emission tomography and other photon-limited imaging problems have
benefited greatly from the introduction of statistical methods of reconstruction. Unlike
the relatively rigid deterministically-based methods such as filtered backprojection, statis-
tical methods can be applied without modification to data with missing projections or low
signal-to-noise ratios. This makes statistical reconstruction methods well suited to emission
problems, or transmission tomograms of dense materials. The Poisson processes in emission
and transmission tomography invite the application of maximum-likelihood (ML) estimation,
simply choosing the parameters in the discretized reconstruction which best match the data.
Due to the typical limits in fidelity of data, however, ML estimates are unstable, and have
been improved upon by regularized estimation, such as maximum a posteriori probability
(MAP) estimation [1], or the method of sieves [2].
Both the ML and MAP reconstructions may be formulated as the solution to an optimiza-
tion problem. However, this optimization problem is a formidable numerical task due to both
the number of parameters in the estimate (pixels or voxels) and the number of observations
(photon counting measurements). Since the work of Shepp and Vardi[3], the method of choice
for finding ML estimates in emission tomography has been the expectation-maximization
(EM) algorithm. The EM algorithm is based on the notion of a set of “complete” data
which, if available, would make the problem easier.
For the emission tomography problem, the complete data are usually the number of
photons emitted from each cell in the discretized reconstruction. While the use of such a
large complete data set appears to simplify the computation of the ML estimate, Fessler
and Hero[4] have shown that a large or “informative” complete data space also slows down
convergence. This suggests that the fastest convergence should be achieved by using the
actual observations as the complete data set, which is equivalent to direct optimization of
the ML or MAP functionals.
In the Bayesian problem, the EM approach is less simple to apply to emission tomographic
reconstruction. This is because the maximization step has no closed form and itself requires
the use of an iterative optimization technique. To address this problem, many approaches
have been proposed, but all of them approximate the otherwise intractable maximization
step. For the case of Gaussian prior densities, Liang and Hart[5, 6], Herman and Odhner[7]
and Herman, De Pierro, and Gai [8] have modified the EM approach to include Bayesian
estimation. A variety of methods have also been proposed for adapting the EM algorithm
to more general Markov random field priors. These methods include the Generalized EM
(GEM) algorithm proposed by Hebert and Leahy[9, 10], the one-step-late (OSL) method
proposed by Green[11], and a more general form of De Pierro’s method[12]. Most recently,
Fessler and Hero have proposed SAGE [13], a collection of methods designed to minimize
the complete data with each pixel update in EM reconstruction.
For the transmission problem, the EM methods are more difficult to apply because there is
not such an obvious choice for the complete data space[14]. Ollinger used the EM approach
to solve the transmission problem and found that convergence required from 200 to 2000
iterations[15].
In this paper, we expand on the work first presented in [16] and take a direct optimiza-
tion approach to the problem of MAP image reconstruction of emission and transmission
tomograms. It is interesting to note that in spite of their relative simplicity, such direct
optimization methods do not seem to have been investigated for exact statistical image re-
construction with Poisson measurement noise. However, we show that direct optimization
is quite tractable for this problem. Moreover, a direct approach to optimization without
the use of EM allows one to use a wide range of efficient optimization algorithms to achieve
fast convergence. It may also be viewed as maximizing convergence speed by employing a
minimally informative complete data set.
Our approach to optimization is based on the sequential greedy optimization of pixel (or
voxel) values in the reconstruction. This method, which we refer to as iterative coordinate
descent (ICD),² goes by a variety of other names including iterated conditional modes (ICM)
for MAP segmentation [17], and Gauss-Seidel (GS) for partial differential equations [18]. All
these algorithms work by iteratively updating individual pixels or coordinates to minimize
a cost functional. The ICD method was first applied to least squares Bayesian tomographic
reconstruction by Sauer and Bouman [19]. More recently, Fessler has applied ICD to the
least squares emission problem and shown that underrelaxation methods can speed ICD
convergence[20].
The ICD method has a number of important advantages which make it a good choice for
direct optimization. First, ICD can be efficiently applied to the log likelihood expressions re-
sulting from photon limited imaging systems. In fact, each ICD iteration is similar in nature
and computational complexity to an iteration of the EM algorithm; however, the function we
attack is the posterior probability function rather than the Q function of EM. Second, the
ICD algorithm is demonstrated to converge very rapidly (in our experiments typically 5-10
iterations) when initialized with the filtered back projection (FBP) reconstruction. This is
not surprising since the FBP reconstruction is accurate at low spatial frequencies, and in
[19] we showed that the ICD method (also known as Gauss-Seidel) has rapid convergence
at high spatial frequencies. The third important advantage of the ICD algorithm is that
it easily incorporates convex constraints and non-Gaussian prior distributions.³ In par-
ticular, positivity is an important convex constraint which can both improve the quality of
reconstructions and significantly speed numerical convergence[21]. Non-Gaussian prior dis-
tributions are also important since they can substantially reduce noise while preserving edge
detail[11, 22, 23, 24].
Our analysis starts with an approximate analysis of the optimization problem based
on a Taylor expansion of the log likelihood function. This analysis is important for two
²We choose to use the name ICD since it is most descriptive of the algorithmic approach.
³We note that convex constraints, such as positivity, may be thought of as a special type of non-Gaussian prior information, and are therefore consistent with the Bayesian problem formulation.
reasons. It provides a common conceptual framework for both the emission and transmission
problems, and it gives insight into the best choice of numerical techniques for the exact
optimization problem. We also show that this Taylor series approximation leads to an
expression similar to one proposed by Fessler[20] for the modeling of accidental coincidences
in emission reconstructions.
We next develop the framework for the optimization of the exact posterior distribution.
Based on the approximate analysis, we implement the ICD pixel updates using a variant on
the classical Newton-Raphson root finding method. The per-iteration computational cost of
ICD/Newton-Raphson is found to be similar to that of the EM algorithm, but unlike EM,
the ICD/Newton-Raphson algorithm is easily adapted to the Bayesian problem.
2 Formulation of Statistical Problem
In this section, we will develop the statistical framework for the MAP reconstruction problem
for both the emission and transmission case. We will also review the conventional EM
approach for later comparison.
For the emission problem, λ is the N dimensional vector of emission rates, Y is the
M dimensional vector of Poisson-distributed photon counts, λj represents the emission rate
from pixel (voxel) j, and Pij is the probability that an emission from pixel j is registered by
the ith detector. Thus, according to the standard emission tomographic model, the random
vector Y has the distribution
P(Y = y \mid \lambda) = \prod_{i=1}^{M} \frac{e^{-P_{i*}\lambda} \, (P_{i*}\lambda)^{y_i}}{y_i!} \qquad (1)
where the matrix P contains the probabilities Pij, and Pi∗ denotes the vector formed by
its i-th row. This formulation is general enough to include a wide variety of photon-limited
imaging problems, and the entries of P may also incorporate the effects of detector response
and attenuation. Using (1), the log likelihood may be computed.
(emission) \qquad \log P(Y = y \mid \lambda) = \sum_{i=1}^{M} \left( -P_{i*}\lambda + y_i \log P_{i*}\lambda - \log(y_i!) \right) \qquad (2)
The corresponding result for the transmission case is discussed in [19]. In order to
emphasize the similarity of the transmission problem, we use the same notation but interpret
λ as the attenuation density of a pixel, and y as the observed photon counts. The log
likelihood is then given by
(transmission) \qquad \log P(Y = y \mid \lambda) = \sum_{i=1}^{M} \left( -y_T e^{-P_{i*}\lambda} + y_i (\log y_T - P_{i*}\lambda) - \log(y_i!) \right) , \qquad (3)
where yT is the photon dosage per ray. Both log likelihood functions have the form
\log P(Y = y \mid \lambda) = -\sum_{i=1}^{M} f_i(P_{i*}\lambda) \qquad (4)
where fi(·) are convex and differentiable functions. This common form will lead to similar
methods of solving these two problems.
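To make the common form concrete, the following sketch (our own illustration, not from the paper; the toy system and function names are made up) evaluates both log likelihoods for a small discretized system:

```python
import numpy as np
from math import lgamma

def emission_loglik(P, lam, y):
    """Emission log likelihood of Eq. (2); assumes all projections (P lam)_i > 0."""
    p = P @ lam
    return float(np.sum(-p + y * np.log(p)) - sum(lgamma(v + 1) for v in y))

def transmission_loglik(P, lam, y, yT):
    """Transmission log likelihood of Eq. (3) with photon dosage yT per ray."""
    p = P @ lam
    return float(np.sum(-yT * np.exp(-p) + y * (np.log(yT) - p))
                 - sum(lgamma(v + 1) for v in y))

# Toy 2-pixel, 2-ray system (illustrative numbers only).
P = np.array([[1.0, 0.5], [0.25, 1.0]])
lam = np.array([2.0, 1.0])
y = np.array([3.0, 2.0])
print(emission_loglik(P, lam, y), transmission_loglik(P, lam, y, yT=500.0))
```

In both cases only the projections P_{i*}λ enter the data term, which is exactly the structure captured by the common form (4).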
For the emission problem, maximum-likelihood (ML) estimation of λ from y yields the
optimization problem
\hat{\lambda}_{ML} = \arg\min_{\lambda} \sum_{i=1}^{M} \left( P_{i*}\lambda - y_i \log P_{i*}\lambda \right) .
Probably the most widely applied algorithm for finding λML is expectation-maximization
(EM)[25], which was first applied by Shepp and Vardi[3] to the emission tomographic prob-
lem. EM solves the ML estimation by hypothesising the existence of complete data, which
would allow very simple estimation of λ if available. For the emission tomographic problem,
these observations are the number of photons actually emitted from each discretized cell of
the reconstruction region. The iteration resulting from the EM formulation is
\lambda_j^{n+1} = \frac{\lambda_j^n}{\sum_{m=1}^{M} P_{mj}} \sum_{i=1}^{M} \frac{y_i P_{ij}}{\sum_{l=1}^{N} P_{il} \lambda_l^n} \qquad (5)

where n indexes the iteration, each of which updates the entire reconstruction. Because
the log likelihood is concave, this approach can be shown to converge to the ML estimate[3].
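In matrix form, one EM iteration of (5) backprojects the ratio of observed to predicted counts and rescales by the column sums of P. The following sketch is our own illustration on a made-up system, not code from the paper:

```python
import numpy as np

def em_update(P, lam, y):
    """One EM iteration, Eq. (5): multiply lam_j by the backprojected ratio
    y_i / (P lam)_i, normalized by the column sum of P for pixel j."""
    p = P @ lam                          # current predicted projections
    return lam * (P.T @ (y / p)) / P.sum(axis=0)

# On a tiny invertible system with a positive exact solution, the iterates
# drive the predicted projections toward the observed counts.
P = np.array([[1.0, 0.2], [0.3, 1.0]])
y = np.array([4.0, 3.0])
lam = np.ones(2)
for _ in range(500):
    lam = em_update(P, lam, y)
```

Each iteration costs one forward projection and one backprojection of the sparse matrix, consistent with the 2 M0 N multiply count listed for EM in Table 1.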
In Sections 3 and 4, we will show that the exact ML or MAP reconstruction may also be
computed through direct optimization using the ICD algorithm. The ICD algorithm works
by sequentially optimizing the log likelihood with respect to each pixel (or voxel) value, λj .
In [19], we introduced a fast algorithm for implementing ICD in the transmission problem
when the log likelihood is approximated by a single quadratic function. This basic ICD
algorithm exploits the sparse nature of the projection matrix by maintaining a state vector
p^n = P\lambda^n of the projected values.
In this paper, we will investigate three new techniques for applying ICD to emission
and transmission MAP reconstruction. The first technique is as in [19], but relies on a new
quadratic approximation to the log likelihood derived for the emission problem. The second
technique, which we will refer to as ICD/Half Interval, finds the solution through greedy
sequential pixel updates, using a half-interval search to solve the problem exactly at each
step. Finally, the method we call ICD/Newton Raphson (ICD/NR) forms a revised quadratic
approximation to the log likelihood at each new pixel update. The ICD/Newton Raphson
method solves the MAP reconstruction problem exactly, with the optimum estimate being
its only fixed point. The details of computation will follow in Sections 3 and 4.
In order to compare these various algorithms, we will need an objective measure of com-
putational complexity which is independent of the specific hardware or software implemen-
tation. For this purpose, we use two figures of merit: the approximate number of multiplies
plus divides per full iteration, and the number of complete reads of the P matrix. In practice,
we have found the number of matrix reads to be a good predictor of algorithm speed since
computation is often dominated by memory access time and indexing overhead. Table 1 lists
these two performance measures for the EM algorithm in terms of M0, the average number of
nonzero projection values associated with each pixel.⁴ N M0 is then the number of nonzero
entries in the sparse projection matrix P . Notice that one iteration of the EM algorithm
⁴The entries in Table 1 assume that M0 ≫ 1, that the sparse matrix P is precomputed and stored, and that sums independent of the data are precomputed.
Algorithm                                        # of mult./div.   # of matrix reads
Filtered Backprojection                          M0 N              1
EM (maximum likelihood)                          2 M0 N            2
ICD (Gauss-Seidel), quadratic approximation      3 M0 N            2
ICD/Half Interval, exact                         3K M0 N           2
ICD/Newton-Raphson, exact                        4 M0 N            2

Table 1: Two measures of computational complexity which are independent of hardware/software implementation. The first measure is the number of multiplies plus divides per full iteration of each method. The second is the number of times the P matrix is read from memory. N is the number of points in the image, M0 is the average number of nonzero projections associated with each image pixel, and K is the average number of iterations required for the half interval search.
requires the computation of two iterations of filtered back projection.
For low signal-to-noise ratio medical imaging problems, the shortcomings of ML esti-
mation are well documented[2, 26, 27]. Therefore, many researchers have resorted to some
form of regularized estimation for tomographic inversion. Maximum a posteriori probability
(MAP) estimation addresses this problem by adding regularization in the form of an a priori
density for λ. The MAP estimate has been shown to substantially improve performance in
many image reconstruction and estimation problems. In addition, the computation of the
MAP estimate is not prohibitively difficult provided that the log of the prior density is a
concave function of λ.
A frequent choice for a prior model is the Gaussian MRF, but the quadratic penalty
exacted by the Gaussian often causes excessive smoothing of edges. Several prior models
have been developed which include desirable edge-preserving properties, and which maintain
convexity in their log prior densities[11, 28, 23, 24]. Provided we choose one of these models,
the MAP problem is also convex, and iterations converge to the unique global minimum
solution. We use the Generalized Gaussian MRF (GGMRF) model proposed in [23] with
the density function
p_\lambda(\lambda) = \frac{1}{z} \exp\left\{ -\gamma^q \sum_{\{j,k\} \in C} b_{j-k} |\lambda_j - \lambda_k|^q \right\}
where C is the set of all neighboring pixel pairs, bj−k is the coefficient linking pixels j and
k, γ is a scale parameter, and 1 ≤ q ≤ 2 is a parameter which controls the smoothness of
the reconstruction. This model includes a Gaussian MRF for q = 2, and an absolute-value
potential function with q = 1. In general, smaller values of q allow sharper edges to form in
reconstructed images.
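As an illustration of how the GGMRF energy behaves (a sketch of our own; the 8-point weights below are the normalized values used in the experiments of Section 6, and the test images are made up):

```python
import numpy as np

def ggmrf_penalty(x, gamma, q):
    """GGMRF energy gamma^q * sum_{{j,k} in C} b_{j-k} |x_j - x_k|^q on a 2-D
    image with an 8-point neighborhood, each clique counted once."""
    bn = 1.0 / (2.0 * np.sqrt(2.0) + 4.0)   # nearest-neighbor weight
    bd = 1.0 / (4.0 + 4.0 * np.sqrt(2.0))   # diagonal weight
    e = bn * np.sum(np.abs(np.diff(x, axis=0)) ** q)        # vertical cliques
    e += bn * np.sum(np.abs(np.diff(x, axis=1)) ** q)       # horizontal cliques
    e += bd * np.sum(np.abs(x[1:, 1:] - x[:-1, :-1]) ** q)  # diagonal cliques
    e += bd * np.sum(np.abs(x[1:, :-1] - x[:-1, 1:]) ** q)  # anti-diagonal
    return gamma ** q * e

# A sharp vertical edge of height 2 is penalized less under q = 1.1 than q = 2,
# which is why smaller q allows sharper edges to form.
edge = np.zeros((8, 8)); edge[:, 4:] = 2.0
print(ggmrf_penalty(edge, 1.0, 1.1), ggmrf_penalty(edge, 1.0, 2.0))
```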
Prior information may also be available in the form of constraints on the reconstructed
solution. We will assume that the set of feasible reconstructions Ω is convex, and in all
experiments we will choose Ω to be the set of positive reconstructions.⁵ Combining this
prior model with the log likelihood expression of (4) yields the expression for the MAP
reconstruction.
\hat{\lambda}_{MAP} = \arg\min_{\lambda \in \Omega} \left\{ \sum_{i=1}^{M} f_i(P_{i*}\lambda) + \gamma^q \sum_{\{j,k\} \in C} b_{j-k} |\lambda_j - \lambda_k|^q \right\} \qquad (6)
While the EM algorithm is not difficult to implement or understand in the ML case, it
is not directly and simply applicable to MAP estimation when the complete data is taken
to be the number of photons emitted from each pixel. This is because there is no closed
form solution for the maximization step of the iteration. Hebert and Leahy[9, 10] have
developed the GEM algorithm to cope with these effects. The GEM algorithm takes the
form of coordinate gradient ascent of the MAP EM cost functional with a heuristic step size
which can be adjusted to guarantee convergence. De Pierro’s majorization method for MAP
reconstruction is also guaranteed to converge[12]. The OSL method proposed by Green[11]
uses an approximate maximization step based on the previous values for neighboring pixels.
For all three of these algorithms, computation per iteration is approximately the same as
listed for EM in Table 1.
⁵This is consistent with the Bayesian formulation since we may condition on the information that λ ∈ Ω.
3 Computation of Approximate MAP estimate
The first step toward efficient direct optimization of (6) will be to develop a quadratic ap-
proximation to the log likelihood functions for the emission and transmission problems. This
approximation is useful because it will guide the design of efficient optimization techniques
for the exact problem, and because it gives important intuition into the method and its
relationship to existing reconstruction algorithms.
In the Appendix, we compute the first two terms in a Taylor expansion to find a quadratic
approximation to the emission log likelihood of (2). In [19], a similar quadratic approximation
was derived for the transmission problem. Both approximations have the form
\log P(Y = y \mid \lambda) \approx -\tfrac{1}{2} (p - P\lambda)^T D (p - P\lambda) + c(y) \qquad (7)
where p is a vector of projection measurements, D is a diagonal matrix, and c(y) is some
function of the data. For the purposes of MAP estimation c(y) may be ignored since it does
not depend on λ. For the emission case, p and D are given by⁶

(emission) \qquad p = y , \qquad D = \mathrm{diag}\{ 1/y_i \}

while for the transmission case they are given by

(transmission) \qquad p_i = \ln(y_T / y_i) \qquad (8)
\qquad\qquad\qquad\quad\; D = \mathrm{diag}\{ y_i \} . \qquad (9)
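A minimal sketch (ours; the counts are made up) of assembling these weights and evaluating the approximation:

```python
import numpy as np

def quadratic_terms(y, yT=None):
    """Return (p, d), with D = diag{d}, for the quadratic approximation (7).
    Emission (yT is None): p = y, d_i = 1/y_i.
    Transmission: p_i = ln(yT / y_i), d_i = y_i.  Assumes all y_i > 0."""
    y = np.asarray(y, dtype=float)
    if yT is None:
        return y.copy(), 1.0 / y
    return np.log(yT / y), y.copy()

def approx_loglik(p, d, P, lam):
    """-(1/2)(p - P lam)^T D (p - P lam), dropping the constant c(y)."""
    r = p - P @ lam
    return -0.5 * float(r @ (d * r))
```

Note that the transmission weights d_i = y_i emphasize high-count, high-SNR rays, while the emission weights d_i = 1/y_i do the opposite.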
The placement and character of the diagonal matrix D gives immediate insight into both
problems. For the transmission problem, D more heavily weights those projections which
correspond to large photon counts, and therefore have high signal-to-noise ratios. The emission problem is the reverse, with larger photon counts resulting in greater measurement variance.
⁶For notational simplicity, we assume that all y_i > 0. The Appendix gives a more general quadratic approximation obtained by treating the terms corresponding to y_i = 0 separately.
[Figure 1 appears here: panel (a) "Transmission Case" and panel (b) "Emission Case", each plotting log likelihood against the projection value P_{1*}λ (integral density / integral emission); plot data not reproduced.]

Figure 1: Plots of the relative log-likelihoods for (a) transmission and (b) emission tomography as a function of a single projection across the reconstructed image. The exact Poisson model (dashed lines) and the quadratic approximation (solid lines) are each plotted for three values of y1, the observed photon count. In the transmission case, yT = 500. Each of these plots covers a confidence interval of 0.99 for a maximum-likelihood estimate of the projection value.
Fig. 1 compares the exact log likelihood functions and their quadratic approximations
over a range of photon counts. Both plots assume a single projection y1 and use a 99%
confidence interval. In the Appendix, we show that for both transmission and emission cases
the approximation error bound is proportional to \sum_{i=1}^{M} 1/\sqrt{y_i} for large y_i. This is consistent
with the plots which show better approximations for larger yi. In practice, the accuracy of
this least squares approximation to the log likelihood function will depend on the dosage
(transmission) or emission rates (emission), and the particular application.
Fessler[20] has independently developed a closely related approximation to the log like-
lihood for reconstruction of PET imagery from data precompensated for accidental coinci-
dences. In this case, the Poisson model is violated since the observed counts of the compen-
sated data can be negative. Fessler employed a Gaussian approximation for the density of
the corrected counts y, yielding the objective function

\Phi(\lambda) = (y - P\lambda)^T \, \mathrm{diag}\{\sigma_i^{-2}\} \, (y - P\lambda) .
Here the variance estimates are given by

\sigma_i^2 = n_i a_i^2 \left( a_i^{-1} \bar{y}_i + 2 r_i \right) ,

where n_i represents detector efficiency, a_i represents attenuation correction factors, \bar{y} is a
smoothed version of y constrained to be positive, and r_i are accidental coincidence counts
made in an independent measurement. Under this least squares formulation, Fessler found
ICD to converge relatively quickly, improving on the speed of EM iterations.
Our approximate formulation of the log likelihood in (7) is similar to Fessler’s approx-
imation without accidental coincidences, except for the smoothing of the entries in y. In
section 5, we will show how the basic emission model of (2) can also be used to optimally
account for accidental coincidences when calibration measurements are available.
4 Computation of Exact MAP estimate
To compute the exact MAP reconstruction, we must perform the optimization of (6). Of
course, there is a wide variety of possible optimization techniques from which to choose, but
we will use the approximate quadratic structure derived in section 3 to guide our selection.
Optimization techniques such as gradient ascent are undesirable because of their slow con-
vergence [19, 29]. Alternatively, conjugate gradient[30] or preconditioned gradient descent[31,
32] techniques may be used since they are known to have rapid convergence for quadratic
optimization problems. However, the performance of these techniques is less predictable for
the nonquadratic problems resulting from non-Gaussian prior distributions. In fact, as q
approaches 1, the log prior distribution is not differentiable, and gradient based optimiza-
tion methods can become unreliable[33]. Possibly the most significant drawback of conjugate
gradient and preconditioning methods is the difficulty of incorporating positivity constraints.
The incorporation of positivity constraints for these methods requires the use of “bending”
techniques which are potentially very computationally intensive[34].
We choose to use the ICD algorithm for a number of reasons. First, it was shown in [19]
that the greedy pixel-wise updates of the ICD algorithm produce rapid convergence of the
high spatial frequency components. Since the FBP can be used as a starting point of the
algorithm, the convergence of the low spatial frequencies is less important. Second, the ICD
updates work well with non-Gaussian prior models. In fact, the ICM algorithm, which is
functionally equivalent to ICD, was developed for the MAP segmentation problem with a
discrete prior model. Finally, the ICD algorithm is easily implemented along with convex
constraints such as positivity. For this example, each pixel update is simply constrained to
be positive.
The ICD algorithm is implemented by sequentially updating each pixel of the image.
With each update the current pixel is chosen to minimize the MAP cost function. For
emission tomography, the ICD update of the jth pixel is given by
\lambda_j^{n+1} = \arg\min_{x \ge 0} \left\{ \sum_{i=1}^{M} \left[ P_{ij} x - y_i \log\left( P_{ij}(x - \lambda_j^n) + P_{i*}\lambda^n \right) \right] + \gamma^q \sum_{k \in N_j} b_{j-k} |x - \lambda_k^n|^q \right\} , \qquad (10)
where Nj is the set of pixels neighboring j. Notice that in this case λn and λn+1 differ at a
single pixel, so a full update of the image requires that (10) be applied sequentially at each
pixel.
No simple closed-form expression for λn+1j results from (10), but there are many op-
timization techniques which can be employed to find its minimum. We will describe two
strategies to the solution of (10). The first strategy is a direct application of half interval
search. However, each iteration of this direct approach is significantly more computationally
expensive than an iteration of EM based methods. The second strategy uses a technique
similar to Newton-Raphson search to substantially reduce computation by exploiting the
approximately quadratic nature of the log likelihood function.
Half interval search may be directly applied to solve (10) by searching for a zero in the
derivative of the cost function. The cost function may be analytically differentiated to yield
the expression
\sum_{i} P_{ij} \left( 1 - \frac{y_i}{P_{ij}(x - \lambda_j^n) + P_{i*}\lambda^n} \right) + q \gamma^q \sum_{k \in N_j} b_{j-k} |x - \lambda_k^n|^{q-1} \, \mathrm{sign}(x - \lambda_k^n) .
The disadvantage of this direct approach is that it requires the repeated computation of
the derivative. By maintaining a state vector for Pi∗λn, the derivative may be evaluated
with approximately 3M0 multiplies and divides. Therefore, the computation required for
a complete update is given in Table 1 as 3KM0N where K is the average number of half
interval iterations required for convergence to the desired precision.
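A sketch of one such update for the emission cost (our own conventions: a dense P, explicit neighbor arrays, and a fixed iteration count standing in for the precision test; it assumes every ray receives contributions from more than one pixel so the projections stay positive):

```python
import numpy as np

def icd_half_interval(j, lam, p, P, y, nbr_idx, nbr_b, gamma, q, iters=40):
    """One ICD update of pixel j by half-interval search on the derivative of
    the cost in Eq. (10).  p is the maintained state vector P @ lam (updated
    in place); nbr_idx/nbr_b describe the prior cliques of pixel j."""
    col, lam_j = P[:, j], lam[j]

    def dcost(x):
        proj = p + col * (x - lam_j)          # projections if lam_j -> x
        d = float(np.sum(col * (1.0 - y / proj)))
        diff = x - lam[nbr_idx]
        d += q * gamma**q * float(np.sum(nbr_b * np.abs(diff)**(q - 1)
                                         * np.sign(diff)))
        return d

    lo, hi = 0.0, max(2.0 * lam_j, 1.0)
    while dcost(hi) < 0.0:                    # grow the bracket until slope >= 0
        hi *= 2.0
    if dcost(lo) >= 0.0:                      # positivity-constrained minimum
        x = lo
    else:
        for _ in range(iters):                # K half-interval iterations
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if dcost(mid) < 0.0 else (lo, mid)
        x = 0.5 * (lo + hi)
    p += col * (x - lam_j)                    # keep the state consistent
    lam[j] = x
```

Each evaluation of dcost touches the nonzero entries of the j-th column once, which is where the factor K in the 3K M0 N count of Table 1 comes from.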
We may reduce computation of ICD updates by exploiting the fact that the log likelihood
function is approximately quadratic. Newton-Raphson optimization works by applying a
second order Taylor series approximation to the function being minimized. For example, if
g(x) is the function being minimized and x^n is the current value, then the new value x^{n+1}
is given by

x^{n+1} = \arg\min_x \left\{ g(x^n) + (x - x^n) g'(x^n) + \tfrac{1}{2} (x - x^n)^2 g''(x^n) \right\} = x^n - g'(x^n)/g''(x^n)

where g'(x) and g''(x) are the first and second derivatives of g(·), respectively. Should g(·)
be quadratic, we find the exact solution in a single step.
We apply the Newton-Raphson approach to the ICD updates by locally approximating
the log likelihood function as quadratic. However, our method will deviate from conventional
Newton-Raphson because we retain the exact expression for the log likelihood of the prior
distribution. This is because the prior term is generally not well approximated by a quadratic
function. Let θ1 and θ2 be the first and second derivatives of the log likelihood function
evaluated for the current pixel value λnj . Using our Newton-Raphson update, the new pixel
value is given by
\lambda_j^{n+1} = \arg\min_{x \ge 0} \left\{ \theta_1 (x - \lambda_j^n) + \frac{\theta_2}{2} (x - \lambda_j^n)^2 + \gamma^q \sum_{k \in N_j} b_{j-k} |x - \lambda_k^n|^q \right\} . \qquad (11)
This equation may be solved by analytically calculating the derivative and then numerically
computing the derivative’s root using the half interval method. This root finding computa-
tion is relatively inexpensive since θ1 and θ2 are precomputed and the number of neighboring
pixels is generally small.
The complete set of ICD/Newton-Raphson update equations for emission data is then
given by
\theta_1 = \sum_{i=1}^{M} P_{ij} \left( 1 - \frac{y_i}{p_i^n} \right) \qquad (12)

\theta_2 = \sum_{i=1}^{M} y_i \left( \frac{P_{ij}}{p_i^n} \right)^2 \qquad (13)

0 = \theta_1 + \theta_2 (x - \lambda_j^n) + q \gamma^q \sum_{k \in N_j} b_{j-k} |x - \lambda_k^n|^{q-1} \, \mathrm{sign}(x - \lambda_k^n) \Bigg|_{x = \lambda_j^{n+1}} \qquad (14)

p^{n+1} = P_{*j} (\lambda_j^{n+1} - \lambda_j^n) + p^n \qquad (15)
The root finding operation of (14) is usually computationally inexpensive, since the neighborhood N_j typically contains only a few pixels.⁷ Therefore, the computation is dominated
by the 4M0 total multiplies and divides required to compute θ1 and θ2. A full iteration
consists of applying a single Newton-Raphson update to each pixel in λ. This results in
4M0N operations per full image update. In practice, the computation is often dominated by
the time required to index through the data. By this measure, the ICD/Newton-Raphson
and EM algorithms are computationally equivalent, both requiring two indexings through
the projection matrix P .
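For the Gaussian prior q = 2, the root of (14) is even available in closed form, which permits a compact sketch of the update cycle (12)-(15) (our own code and naming; the positivity constraint is handled by clipping, which is the exact constrained minimizer of the quadratic surrogate):

```python
import numpy as np

def icd_newton_q2(j, lam, p, P, y, gamma, nbr_idx, nbr_b):
    """ICD/Newton-Raphson update of pixel j for emission data with the
    Gaussian (q = 2) prior.  theta1, theta2 follow Eqs. (12)-(13) at the
    current state p = P @ lam; Eq. (14) is then linear in x, and Eq. (15)
    updates the projection state in place."""
    col = P[:, j]
    theta1 = float(np.sum(col * (1.0 - y / p)))        # Eq. (12)
    theta2 = float(np.sum(y * (col / p) ** 2))         # Eq. (13)
    g2 = 2.0 * gamma ** 2
    num = theta2 * lam[j] - theta1 + g2 * float(np.sum(nbr_b * lam[nbr_idx]))
    x = max(0.0, num / (theta2 + g2 * float(np.sum(nbr_b))))  # positivity clip
    p += col * (x - lam[j])                            # Eq. (15)
    lam[j] = x
```

For q < 2, the same θ1 and θ2 are used, but the root of (14) is found by a half-interval search over the non-quadratic prior term.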
It should be noted that the distinction between the ICD/Newton-Raphson method and
the approximate method of section 3 is that for ICD/Newton-Raphson the parameters of
the quadratic approximation are recomputed for each new update. This guarantees that the
exact MAP reconstruction is the only fixed point of the algorithm. While we have not yet
shown that the ICD/Newton-Raphson method is guaranteed to converge to its fixed point,
⁷In some cases, computation time can be reduced by replacing the power function x^{q-1} by a linearly interpolated lookup table.
there are a number of reasons to believe that its convergence should be very stable. Since
the true log likelihood is very close to quadratic, the approximation made by the Newton-Raphson updates is quite accurate. Of course, in the quadratic case the ICD/Newton-Raphson method is very stable. In fact, the ICD/Newton-Raphson method has an intrinsic safety factor since it remains stable in the quadratic case even with overrelaxation by a factor of two [18].
In practice, we have never observed ICD/Newton-Raphson to have unstable or unreliable
convergence.
The ICD/Newton-Raphson method may also be applied to the transmission tomography
problem. In this case, the parameters θ1 and θ2 of (12) and (13) are then given by the
following equations.
(transmission) \qquad \theta_1 = \sum_{i=1}^{M} P_{ij} \left( y_i - y_T e^{-p_i^n} \right)

\qquad\qquad\qquad\quad\; \theta_2 = \sum_{i=1}^{M} P_{ij}^2 \, y_T e^{-p_i^n} .
We note that these updates require the evaluation of exponential functions.
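A corresponding sketch (ours) of these two sums for the transmission case:

```python
import numpy as np

def transmission_thetas(j, P, p, y, yT):
    """theta1 and theta2 for the transmission ICD/Newton-Raphson update,
    evaluated at the current projection state p = P @ lam:
      theta1 = sum_i P_ij (y_i - yT exp(-p_i)),
      theta2 = sum_i P_ij^2 yT exp(-p_i)."""
    col = P[:, j]
    expected = yT * np.exp(-p)           # expected counts at the current state
    return (float(np.sum(col * (y - expected))),
            float(np.sum(col ** 2 * expected)))
```

At a state where the expected counts match the data (p_i = ln(yT / y_i)), theta1 vanishes, i.e. the likelihood term is stationary.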
5 Attenuation and Accidental Coincidence Effects
Typical emission tomographic systems include imperfections which have been incorporated into
EM-based estimation approaches. For example, emitted photons have some non-zero proba-
bility of attenuation before they are registered. This probability is related to the transmission
characteristics of the object, and can typically be measured by a preliminary transmission
scan. In this case, the attenuation probability can be included directly in the Pij values.
Another possibility is the joint estimation of emission and attenuation properties[35]. For
this work, we have assumed attenuation probabilities are included into P .
In positron emission tomography (PET) imaging, a significant fraction of registered pho-
tons are caused by “accidental coincidences,” i.e. readings from distinct positron/electron
annihilations which are counted as having arisen from a single event. This increases the
expected value of the count yi by some number ri. If this value is assumed known, as in [36],
then the log likelihood values may simply be chosen to reflect a Poisson distribution with
the adjusted mean.
Another possibility is that independent calibration measurements are made which result
in Poisson random variables Ri with mean ri. In principle this measurement can be made for
a PET imaging system by counting the number of coincidences that occur between delayed
time intervals[20, 37]. In this case, the optimal reconstruction can be performed by simply
augmenting the vector of projection measurements, y, to include the independent Poisson
calibration measurements r = [r1, · · · , rM ]t. We define the quantities
λ =
[
λr
]
y =
[
yr
]
P =
[
P I0 I
]
where I is the identity matrix. Since \tilde{y} is a vector of independent Poisson measurements
with mean \tilde{P}\tilde{\lambda}, all the techniques of the previous sections may be used directly. We note
that updates to the components of r are simple to compute since only two “projection”
measurements are associated with each value ri.
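The augmented system above can be formed explicitly. Below is a small sketch under the assumption of dense arrays; the function name is ours, and a practical implementation would exploit the sparsity of P and of the identity blocks.

```python
import numpy as np

def augment_for_randoms(P, y, r):
    """Stack the calibration counts r under y and extend P with identity
    blocks, so that the stacked measurement vector has mean
    P_aug @ [lam; r] = [P lam + r; r] (sketch of Section 5's construction)."""
    M, N = P.shape
    I = np.eye(M)
    P_aug = np.block([[P, I],
                      [np.zeros((M, N)), I]])
    y_aug = np.concatenate([y, r])
    return P_aug, y_aug
```

With this construction, the top block of the mean is Pλ + r (true coincidences plus accidentals) and the bottom block is r, matching the independent Poisson model described in the text.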
6 Experimental Results
Fig. 2 shows the phantoms we used for our emission and transmission tomography experi-
ments together with the FBP reconstructions. The emission phantom in Fig. 2(a) represents
higher emission rates with higher image intensity, having zero emission from the background.
Rates are scaled to yield a total count of approximately 5 × 10^4 for the section in Fig. 2(a),
with readings taken at 64 equally spaced angles, and 64 perfectly collimated detectors at
each angle. The transmission phantom represents an object of diameter 20 cm and density
0.2 cm^-1, with higher density regions of up to 0.48 cm^-1 added. The dosage per ray (yT)
of only 500 in this experiment results in zero counts at many detectors. Though this is
well below typical medical transmission CT imaging rates, low dosage reconstructions are
useful in reconstructing an approximate attenuation map for emission imaging[38]. The
FBP reconstructions of Fig. 2 show significant noise and streaking artifacts. This is typical
of FBP reconstructions since they do not account for the relative accuracy of projection
measurements.
We choose an 8-point neighborhood system for the GGMRF, with normalization of the
weights b_{j-k} to a total of 1.0 for each j: b_{j-k} = (2√2 + 4)^-1 for nearest neighbors and
b_{j-k} = (4 + 4√2)^-1 for diagonal neighbors. In order to illustrate the effect of the Bayesian
prior, we will compute reconstructions for both q = 2 and q = 1.1. The first case is equivalent
to the common Gaussian prior, and the second does a better job of preserving edges. Since
the log prior is strictly concave and differentiable in both cases, convergence of numerical
algorithms can be guaranteed. We choose scale parameters yielding the qualitatively best
results for comparison of reconstructions under the exact and approximate likelihoods. The
parameters of the prior model were (q = 2.0, γ = 1.0), (q = 1.1, γ = 3.0) for the emission
problem, and (q = 2.0, γ = 15.0), (q = 1.1, γ = 40.0) for the transmission problem.
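The weight normalization stated above can be verified directly: four nearest neighbors at (2√2 + 4)^-1 and four diagonal neighbors at (4 + 4√2)^-1 sum to exactly one.

```python
import math

# GGMRF neighborhood weights from the text: nearest neighbors get
# (2*sqrt(2) + 4)^-1 and diagonal neighbors (4 + 4*sqrt(2))^-1.
b_nearest = 1.0 / (2 * math.sqrt(2) + 4)
b_diag = 1.0 / (4 + 4 * math.sqrt(2))

# Four neighbors of each kind; the normalization makes the eight
# weights sum to 1 for every interior pixel j.
total = 4 * b_nearest + 4 * b_diag
```

Algebraically, 4/(4 + 2√2) + 4/(4 + 4√2) = (4 + 3√2)/(4 + 3√2) = 1, and the diagonal weight is the nearest-neighbor weight divided by √2, reflecting the larger inter-pixel distance.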
Fig. 3 and Fig. 4 show the results of MAP reconstruction for the emission and transmis-
sion problems. In each figure, the exact and approximate MAP reconstructions are shown.
The exact MAP reconstructions were computed using a large number of iterations of the
ICD/Newton-Raphson algorithm of section 4. In each case the cost function being mini-
mized is convex and the algorithm converges to the global minimum. Therefore, the MAP
reconstruction computed using the EM algorithm will be identical. Since the form of the
approximate log likelihood (7) is the same for the emission and transmission problems, the
ICD algorithm (called Gauss-Seidel) of [19] was used to calculate both approximate MAP
estimates. This algorithm has been shown to converge rapidly and requires approximately
3M0N operations per iteration, as listed in Table 1.
In both examples of Fig. 3, there are small but perceptible differences between the exact
and approximate reconstructions. Fessler has found that the quadratic approximation can
introduce significant bias into transmission reconstructions under low dosages[39]. So exact
MAP transmission reconstruction may be of value if precise and absolute density measure-
ments are required.
We concentrate on the exact emission reconstruction problem for comparison of conver-
gence rates to previously proposed algorithms, since it is in this arena that the majority of
recent research activity has taken place. (Convergence of ICD in transmission tomography
was treated in [19].) Three alternatives to ICD/NR appear in the plots. Green’s one-step-late
(OSL) algorithm allows EM to be applied to MAP problems by adding a regularizing term
to each EM maximization step which is based on the previous iteration’s pixel values[11].
The update is made by setting to zero the sum of the gradient of the EM functional, and the
gradient of the log of the prior density, evaluated at pixel values from the previous iteration.
The OSL is very simple to compute, but may fail to converge. The generalized expectation-
maximization (GEM) of Hebert and Leahy [9] substitutes an increase in the MAP/EM
objective function for the more difficult maximization. GEM features an adjustable step
size for the update accompanied by evaluation of the cost functional to guarantee increase
in a posteriori likelihood. While the vector p is updated after all pixels have been visited,
the pixel values used in evaluating the log of the prior density are updated sequentially. Fi-
nally, we include DePierro’s method[8], which guarantees convergence through a MAP/EM
approach which decouples the computation of pixel updates in the maximization step. This
technique may therefore be applied to complete parallel updates. While DePierro’s method
was originally designed for the case of a Gaussian prior density, it applies for other convex
penalties as well[12], such as the GGMRF with q = 1.1.
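As a concrete illustration of the first of these alternatives, Green's one-step-late step described above might be sketched as follows. This is our reading of the description, not the original implementation; the function and variable names, and the scalar prior-gradient interface, are assumptions.

```python
import numpy as np

def osl_update(lam, P, y, prior_grad):
    """One-step-late (OSL) emission update: the gradient of the negative
    log prior is evaluated at the previous iterate lam, so each step
    costs about as much as one EM iteration. With prior_grad returning
    zeros this reduces to the standard ML-EM update."""
    p = P @ lam                      # current projections p = P lambda
    backproj = P.T @ (y / p)         # EM backprojection of the ratio y/p
    sens = P.sum(axis=0)             # column sums of P (sensitivity)
    return lam * backproj / (sens + prior_grad(lam))
```

Because the prior term in the denominator lags one iteration behind, the step is simple but, as the experiments below show, is not guaranteed to converge.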
The three alternative algorithms were implemented without modification to their origi-
nally proposed forms. All methods could include a one-dimensional search for each pixel’s
update, which is assumed available in both De Pierro’s method and ICD/NR. Both of these
techniques require the minimization of a non-quadratic function at each pixel in the case of
a non-Gaussian prior. The adjustable step size of GEM may be replaced by a minimization,
and OSL may be augmented to vary the influence of the derivative of the log of the prior at
each step to guarantee convergence as well[11]. While computational costs are often dom-
inated by accessing the data and projection matrix, the expense of the 1-D minimizations
may be appreciable for certain priors.
We will plot convergence performance in terms of complete updates of the image since,
like the computational cost measures of Table 1, this measure is independent of implementation. Each of the
four methods was initialized in all cases with an FBP reconstruction, which is of negligible
cost relative to the ensuing computation. Since the a posteriori log likelihood is strictly
convex, the solution will not be influenced by this choice of initial condition. Because low
frequency components in the error between the FBP image and the MAP reconstruction will
converge most slowly[19], we correct the zero-frequency component of the initial condition
with a least-squares estimate directly from the data.
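One way to implement such a zero-frequency correction is to shift the initial image by the constant that minimizes the weighted least-squares data mismatch. The paper does not give this exact form, so the following is a sketch under that assumption; all names are ours.

```python
import numpy as np

def correct_dc(x, P, y, w):
    """Shift the initial image x by the scalar c minimizing
    sum_i w_i (y_i - P_i*(x + c*1))^2, i.e. a least-squares fit of the
    image's DC component directly to the data (a sketch)."""
    s = P @ np.ones(x.shape[0])      # projection of a constant image
    resid = y - P @ x                # current data residual
    c = np.sum(w * s * resid) / np.sum(w * s * s)
    return x + c
```

Setting the derivative of the weighted squared error with respect to c to zero gives the closed form above, so the corrected image leaves no weighted residual along the constant-image direction.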
Figure 5 shows the convergence rates for the maximum-likelihood problem (γ = 0). In
this case, all three alternative methods reduce to the EM algorithm. The ICD/NR estimate
has, for practical purposes, converged after 5 or 6 iterations, while EM appears to require
over an order of magnitude more. This behavior is at least partly explained by the similarity
of EM to gradient ascent, which is particularly slow for this type of problem[19].
Figures 6 and 7 illustrate similar results for the MAP problem. With the Gaussian prior
model, the three EM-based algorithms perform similarly in early iterations, but OSL fails
to converge for this case, settling into an oscillation significantly below the maximum a
posteriori likelihood. For a scale of γ = 2.0 with the same data, OSL diverged badly from
the solution. Both GEM and DePierro’s method approach the optimum, but as in Fig. 5,
the convergence is much slower than ICD/NR.
Non-Gaussian models allow better preservation of abrupt transitions in MAP estimates.
One example which possesses this advantage along with convexity of the potential function
is the GGMRF with values of q near 1.0. We use q = 1.1, which affords good edge preser-
vation with tractable optimization. While this likelihood has first derivatives which are well
behaved, the second derivative of the function is unbounded, which may have varying effects
on the optimization approaches. Although convergence is similar in this case also, interesting
differences exhibit themselves in Fig. 7. OSL again fails to converge, which is not surprising
given the character of the derivative of |x|^1.1 near the origin. GEM is the fastest of the three
EM-type methods in this problem, reaching parity with the ICD/NR solution at about 50
iterations. There is also some potentially interesting asymptotic behavior. After about 60
iterations, when the estimate is undergoing very minor changes, the log likelihood of the
GEM estimate slightly exceeds that of ICD/NR. Asymptotic characteristics of ICD/NR in
nonlinear problems may require further study and improvement.
7 Conclusion
We have presented a new method of computing MAP reconstructions using direct optimization of the log likelihood function. Each iteration of our proposed ICD/Newton-Raphson
algorithm has computation comparable to an iteration of the EM algorithm. However, the
new method works well with Bayesian prior distributions and converges rapidly. The direct
optimization approach also gives a common framework for solving both the emission and
transmission tomography problems. Our experiments indicated that while a fixed quadratic
approximation is adequate for some transmission problems, optimizing the exact likelihood
yielded improved results.
Figure 2: Original synthetic phantoms and their FBP reconstructions for emission and transmission examples. (a) Emission phantom with higher emission intensities in lighter areas, and (b) FBP emission reconstruction. (c) Transmission phantom with higher density in lighter areas, and (d) FBP transmission reconstruction. FBP reconstructions were computed using a raised cosine rolloff filter, and served as the initial estimate for the iterative statistical methods.
Figure 3: Emission MAP reconstructions with a Gaussian MRF prior, and (a) exact reconstruction using ICD/Newton-Raphson; (b) quadratic approximation. MAP estimates resulting from GGMRF model with q = 1.1, and (c) exact reconstruction using ICD/Newton-Raphson; (d) quadratic approximation.
Figure 4: Transmission MAP reconstructions with a Gaussian MRF prior, and (a) exact reconstruction using ICD/Newton-Raphson; (b) quadratic approximation. MAP estimates resulting from GGMRF model with q = 1.1, and (c) exact reconstruction using ICD/Newton-Raphson; (d) quadratic approximation.
[Plot: log a posteriori likelihood vs. iteration number (0 to 140), titled "Maximum Likelihood"; curves: ICD/NR and EM.]
Figure 5: Convergence of ML estimates using ICD/Newton-Raphson updates and EM. The a posteriori likelihood function values are plotted as a function of full iterations.
[Plot: log a posteriori likelihood vs. iteration number (0 to 50), Gaussian prior (q = 2.0, γ = 1); curves: ICD/NR, GEM, OSL, and De Pierro's method.]
Figure 6: Convergence of MAP estimates using ICD/Newton-Raphson updates, Green's OSL, Hebert/Leahy's GEM, and De Pierro's method, with a Gaussian prior model with γ = 1.0.
[Plot: log a posteriori likelihood vs. iteration number (0 to 50), GGMRF prior (q = 1.1, γ = 3.0); curves: ICD/NR, GEM, OSL, and De Pierro's method.]
Figure 7: Convergence of MAP estimates with a generalized Gaussian prior model with q = 1.1 and γ = 3.0.
8 Appendix: Taylor Series Approximation of Emission
Log Likelihood
In this Appendix, we derive the Taylor series approximation for the emission log likelihood
of (2). We will also determine the convergence behavior of the quadratic approximation for
both the transmission and emission case.
Using the convention that $p_i = P_{i*}\lambda$, the log likelihood may be expressed as
$$\log P(Y = y|\lambda) = -\sum_i f_i(p_i)$$
where for the emission case
$$f_i(p_i) = p_i - y_i \log p_i + \log(y_i!) \, .$$
If we assume that $y_i > 0$, then the gradient of the likelihood function evaluated at $p = y$ has
entries
$$\left. \frac{\partial \log P(Y = y|\lambda)}{\partial p_i} \right|_{p=y} = \left. \left( -1 + \frac{y_i}{p_i} \right) \right|_{p=y} = 0 \, .$$
The Hessian is diagonal, with
$$\left. \frac{\partial^2 \log P(Y = y|\lambda)}{\partial p_i^2} \right|_{p=y} = \left. \frac{-y_i}{p_i^2} \right|_{p=y} = -\frac{1}{y_i} \, .$$
We may account for the terms with $y_i = 0$ by including them separately. Let $S_0 = \{i : y_i = 0\}$
and let $S_1 = S - S_0$. Then the approximation for the log likelihood is given by
$$\log P(Y = y|\lambda) \approx \sum_{i \in S_1} -\frac{1}{2 y_i} (y_i - P_{i*}\lambda)^2 + \sum_{i \in S_0} (y_i - P_{i*}\lambda) + c(y) \, .$$
(For $i \in S_0$, $f_i(p_i) = p_i$, so these terms are exact.) If we ignore the terms in $S_0$, then the log likelihood may be simply written as
$$\log P(Y = y|\lambda) \approx -\frac{1}{2} (y - P\lambda)^T D (y - P\lambda) + c(y),$$
with $D = \mathrm{diag}\{ y_i^{-1} \}$.
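This quadratic approximation can be checked numerically. Below is a minimal sketch (function names are ours) assuming all $y_i > 0$ and dropping the constant $\log(y_i!)$ terms consistently from both expressions.

```python
import numpy as np

def emission_loglike(p, y):
    """Exact emission log likelihood -sum_i (p_i - y_i log p_i), dropping
    the constant log(y_i!) terms (assumes all y_i > 0)."""
    return np.sum(-(p - y * np.log(p)))

def quad_approx(p, y):
    """Second-order Taylor expansion of the same quantity about p = y:
    c(y) - (1/2) sum_i (y_i - p_i)^2 / y_i, where c(y) is the exact
    value at p = y."""
    c = np.sum(-(y - y * np.log(y)))
    return c - 0.5 * np.sum((y - p) ** 2 / y)
```

For $p$ within a few standard deviations ($\sqrt{y_i}$) of $y$, the two expressions agree up to the third-order remainder, consistent with the error analysis that follows.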
We next show that the quadratic approximation for the log likelihood function converges
as $\sum_i 1/\sqrt{y_i} \rightarrow 0$ for both the transmission and emission cases. For the Taylor series
approximation of a function $g(x)$,
$$g(x) = g(a) + g'(a)(x - a) + \frac{g''(a)(x - a)^2}{2!} + R_3 \, ,$$
we have the Lagrange form of the remainder
$$R_3 = \frac{g^{(3)}(\epsilon)(x - a)^3}{3!} \, , \qquad (16)$$
where $\epsilon$ is between $a$ and $x$. We will show, in both the transmission and emission tomographic
cases, that (16) goes to zero on arbitrarily large confidence intervals as the photon counts,
$y_i$, become large.
In order to cover an arbitrarily large confidence interval for the parameter $p_i$, we will
assume that $p_i \in [y_i - k\sqrt{y_i},\, y_i + k\sqrt{y_i}]$ where $k$ is any fixed positive integer. Since
$f_i^{(3)}(p_i) = -2 y_i / p_i^3$ in the emission case, the total error for large $y_i$ is then bounded by
$$|E| = \left| \sum_i \frac{f_i^{(3)}(\epsilon_i)(p_i - y_i)^3}{3!} \right| = \left| \sum_i \frac{2 y_i}{\epsilon_i^3} \frac{(p_i - y_i)^3}{6} \right| \leq \sum_i \frac{y_i (k\sqrt{y_i})^3}{3 (y_i - k\sqrt{y_i})^3} \approx \frac{k^3}{3} \sum_i \frac{1}{\sqrt{y_i}} \, .$$
Thus the error goes to zero as $\sum_i 1/\sqrt{y_i}$ does.
For the transmission problem,
$$f_i(p_i) = y_T e^{-p_i} - y_i (\log y_T - p_i) + \log(y_i!) \, .$$
To cover an arbitrary confidence interval, we assume $p_i \in [\bar{p}_i - k/\sqrt{y_i},\, \bar{p}_i + k/\sqrt{y_i}]$ where
$\bar{p}_i = \log(y_T / y_i)$. Since $f_i^{(3)}(p_i) = -y_T e^{-p_i}$ in this case, we obtain the following error bound for large $y_i$:
$$|E| = \left| \sum_i \frac{f_i^{(3)}(\epsilon_i)(p_i - \bar{p}_i)^3}{3!} \right| = \left| \sum_i \frac{y_T e^{-\epsilon_i} (p_i - \bar{p}_i)^3}{6} \right| \leq \sum_i \frac{k^3 e^{k/\sqrt{y_i}}}{6 \sqrt{y_i}} \approx \frac{k^3}{6} \sum_i \frac{1}{\sqrt{y_i}} \, .$$
Thus, in both cases, the bound on the error magnitude goes to zero as $\sum_i 1/\sqrt{y_i} \rightarrow 0$.
References
[1] S. Geman and D. McClure, "Bayesian Image Analysis: An Application to Single Photon Emission Tomography," in Proc. Statist. Comput. Sect. Amer. Stat. Assoc., Washington, DC, pp. 12-18, 1985.
[2] D. Snyder and M. Miller, "The use of Sieves to Stabilize Images Produced with the EM Algorithm for Emission Tomography," IEEE Trans. on Nuclear Science, vol. NS-32, no. 5, pp. 3864-3871, Oct. 1985.
[3] L. Shepp and Y. Vardi, "Maximum Likelihood Reconstruction for Emission Tomography," IEEE Trans. Med. Im., vol. MI-1, no. 2, Oct. 1982.
[4] J.A. Fessler and A.O. Hero, "Complete-Data Spaces and Generalized EM Algorithms," Proc. ICASSP 1993, vol. 5, pp. 1-4, April 27-30, 1993, Minneapolis, MN.
[5] H. Hart and Z. Liang, "Bayesian Image Processing in Two Dimensions," IEEE Trans. Med. Im., vol. MI-6, no. 3, pp. 201-208, Sept. 1987.
[6] Z. Liang and H. Hart, "Bayesian Image Processing of Data from Constrained Source Distributions; I: Non-valued, uncorrelated and correlated constraints," Bull. Math. Biol., vol. 49, no. 1, pp. 51-74, 1987.
[7] G.T. Herman and D. Odhner, "Performance Evaluation of an Iterative Image Reconstruction Algorithm for Positron Emission Tomography," IEEE Trans. Med. Im., vol. 10, no. 3, Sept. 1991.
[8] G. T. Herman, A. R. De Pierro, and N. Gai, "On Methods for Maximum A Posteriori Image Reconstruction with a Normal Prior," J. Visual Comm. Image Rep., vol. 3, no. 4, pp. 316-324, Dec. 1992.
[9] T. Hebert and R. Leahy, "A Generalized EM Algorithm for 3-D Bayesian Reconstruction from Poisson data Using Gibbs Priors," IEEE Trans. Med. Im., vol. 8, no. 2, pp. 194-202, June 1989.
[10] T. Hebert and S. Gopal, "The GEM MAP Algorithm with 3-D SPECT System Response," IEEE Trans. Med. Im., vol. 11, no. 1, pp. 81-90, March 1992.
[11] P. J. Green, "Bayesian Reconstructions from Emission Tomography Data Using a Modified EM Algorithm," IEEE Trans. Med. Im., vol. 9, no. 1, pp. 84-93, March 1990.
[12] A.R. De Pierro, "A Modified Expectation Maximization Algorithm for Penalized Likelihood Estimation in Emission Tomography," to appear in IEEE Trans. on Med. Im., Dec. 1994.
[13] J. A. Fessler and A. O. Hero, "Space-Alternating Generalized EM Algorithm for Penalized Maximum-Likelihood Reconstruction," University of Michigan, Technical Report No. 286, Feb. 1994.
[14] K. Lange and R. Carson, "EM Reconstruction Algorithms for Emission and Transmission Tomography," Journal of Computer Assisted Tomography, vol. 8, no. 2, pp. 306-316, April 1994.
[15] J. M. Ollinger, "Maximum-Likelihood Reconstruction of Transmission Images in Emission Computed Tomography via the EM Algorithm," IEEE Trans. Med. Im., vol. 13, no. 1, pp. 89-101, March 1994.
[16] C. Bouman and K. Sauer, "Fast Numerical Methods for Emission and Transmission Tomographic Reconstruction," Proceedings of the Twenty-seventh Annual Conference on Information Sciences and Systems, pp. 611-616, The Johns Hopkins University, Baltimore, MD, March 1993.
[17] J. Besag, "On the Statistical Analysis of Dirty Pictures," J. Roy. Statist. Soc. B, vol. 48, no. 3, pp. 259-302, 1986.
[18] D. M. Young, Iterative Solution of Large Linear Systems, Academic Press, New York, 1971.
[19] K. Sauer and C. Bouman, "A Local Update Strategy for Iterative Reconstruction from Projections," IEEE Trans. on Sig. Proc., vol. 41, no. 2, pp. 533-548, Feb. 1993.
[20] J.A. Fessler, "Penalized Weighted Least-Squares Image Reconstruction for Positron Emission Tomography," IEEE Trans. Med. Im., vol. 13, no. 2, pp. 290-300, June 1994.
[21] P. Oskoui-Fard and H. Stark, "Tomographic Image Reconstruction Using the Theory of Convex Projections," IEEE Trans. Med. Im., vol. 7, no. 1, pp. 45-58, March 1988.
[22] T. A. Gooley and H. H. Barrett, "Evaluation of Statistical Methods of Image Reconstruction through ROC Analysis," IEEE Trans. Med. Im., vol. 11, no. 2, June 1992.
[23] C. Bouman and K. Sauer, "A Generalized Gaussian Image Model for Edge-Preserving MAP Estimation," IEEE Trans. Image Proc., vol. 2, no. 3, pp. 296-310, July 1993.
[24] R. Stevenson, B. Schmitz, and E. Delp, "Discontinuity Preserving Regularization of Inverse Visual Problems," IEEE Trans. Systems, Man, and Cybernetics, vol. 24, no. 3, March 1994.
[25] A.P. Dempster, N.M. Laird, and D.B. Rubin, "Maximum Likelihood from Incomplete Data via the EM Algorithm," J. Royal Stat. Soc. B, vol. 39, pp. 1-38, 1977.
[26] E. Veklerov and J. Llacer, "Stopping Rule for the MLE Algorithm Based on Statistical Hypothesis Testing," IEEE Trans. Med. Im., vol. MI-6, pp. 313-319, 1987.
[27] T. Hebert, R. Leahy, and M. Singh, "Fast MLE for SPECT Using an Intermediate Polar Representation and a Stopping Criterion," IEEE Trans. Nucl. Sci., vol. NS-35, pp. 615-619, Feb. 1988.
[28] K. Lange, "Convergence of EM Image Reconstruction Algorithms with Gibbs Priors," IEEE Trans. Med. Im., vol. 9, no. 4, pp. 439-446, Dec. 1990.
[29] W. Press, B. Flannery, S. Teukolsky, and W. Vetterling, Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press, Cambridge, 1988.
[30] F. Beckman, "The Solution of Linear Equations by the Conjugate Gradient Method," in A. Ralston, H. Wilf, and K. Enslein, eds., Mathematical Methods for Digital Computers, Wiley, 1960.
[31] T.-S. Pan, A. E. Yagle, N. H. Clinthorne, and W. L. Rogers, "Acceleration and Filtering in the Generalized Landweber Iteration Using a Variable Shaping Matrix," IEEE Trans. Med. Im., vol. 12, no. 2, pp. 278-286, June 1993.
[32] N. H. Clinthorne, T.-S. Pan, P.-C. Chiao, W. L. Rogers, and J. A. Stamos, "Preconditioning Methods for Improved Convergence Rates in Iterative Reconstruction," IEEE Trans. Med. Im., vol. 12, no. 1, March 1993.
[33] K. Sauer and C. Bouman, "Bayesian Estimation of Transmission Tomograms Using Segmentation Based Optimization," IEEE Trans. on Nuclear Science, vol. 39, no. 4, pp. 1144-1152, Aug. 1992.
[34] D. M. Goodman, "Applying Conjugate Gradient to Image Processing Problems," Proceedings of the Seventh Workshop on Multidimensional Signal Processing, p. 1.1, September 23-25, 1991, Lake Placid, New York.
[35] N.H. Clinthorne, J.A. Fessler, G.D. Hutchins, and W.L. Rogers, "Joint Maximum Likelihood Estimation of Emission and Attenuation Densities in PET," Proc. 1991 IEEE Nucl. Sci. Symp. & Med. Im. Conf., Nov. 2-9, 1991, Santa Fe, NM, pp. 1927-1932.
[36] D.G. Politte and D.L. Snyder, "Corrections for Accidental Coincidences and Attenuation in Maximum-Likelihood Image Reconstruction for Positron-Emission Tomography," IEEE Trans. Med. Im., vol. 10, no. 1, pp. 82-89, March 1991.
[37] T. Spinks, T. Jones, M. Gilardi, and J. Heather, "Physical Performance of the Latest Generation of Commercial Positron Scanner," IEEE Trans. Nucl. Sci., vol. 35, no. 1, pp. 721-725, Feb. 1988.
[38] R.H. Huesman, S.E. Derenzo, J.L. Cahoon, A.B. Geyer, W.W. Moses, D.C. Uber, T. Vuletich, and T.F. Budinger, "Orbiting Transmission Source for Positron Tomography," IEEE Trans. Nucl. Sci., vol. NS-35, no. 1, pp. 735-739, Feb. 1988.
[39] J. A. Fessler, "Sequential Iterative Algorithms for Image Reconstruction," presented at the Midwest Workshop on Iterative Image Reconstruction, June 17-18, 1994, Washington University, St. Louis, Missouri.