A Unified Approach to Statistical Tomography Using
Coordinate Descent Optimization †‡
Charles A. Bouman
School of Electrical Engineering
Purdue University
West Lafayette, IN 47907-0501
(317) 494-0340
Ken Sauer
Laboratory for Image and Signal Analysis
Department of Electrical Engineering
University of Notre Dame
Notre Dame, IN 46556
(219) 631-6999
December 4, 1996
Abstract¹
Over the past ten years there has been considerable interest in statistically optimal reconstruction of image cross-sections from tomographic data. In particular, a variety of such algorithms have been proposed for maximum a posteriori (MAP) reconstruction from emission tomographic data. While MAP estimation requires the solution of an optimization problem, most existing reconstruction algorithms take an indirect approach based on the expectation maximization (EM) algorithm.

In this paper we propose a new approach to statistically optimal image reconstruction based on direct optimization of the MAP criterion. The key to this direct optimization approach is greedy pixel-wise computations known as iterative coordinate descent (ICD). We show that the ICD iterations require approximately the same amount of computation per iteration as EM based approaches, but the new method converges much more rapidly (in our experiments typically 5 iterations). Other advantages of the ICD method are that it is easily applied to MAP estimation of transmission tomograms, and typical convex constraints, such as positivity, are simply incorporated.
IP EDICS #2.3 Computed Imaging: Tomography
†This work was supported by the National Science Foundation under Grant no. MIP93-00560, Electricité de France under Grant no. P21L03/2K3208/EP542, and an NEC Faculty Fellowship.
‡IEEE Trans. on Image Processing, vol. 5, no. 3, pp. 480-492, March 1996.
¹Permission to publish this abstract separately is granted.
1 Introduction
In the past decade, emission tomography and other photon-limited imaging problems have
benefited greatly from the introduction of statistical methods of reconstruction. Unlike
the relatively rigid deterministically-based methods such as filtered backprojection, statis-
tical methods can be applied without modification to data with missing projections or low
signal-to-noise ratios. This makes statistical reconstruction methods well suited to emission
problems, or transmission tomograms of dense materials. The Poisson processes in emission
and transmission tomography invite the application of maximum-likelihood (ML) estimation,
simply choosing the parameters in the discretized reconstruction which best match the data.
Due to the typical limits in fidelity of data, however, ML estimates are unstable, and have
been improved upon by regularized estimation, such as maximum a posteriori probability
(MAP) estimation [1], or the method of sieves [2].
Both the ML and MAP reconstructions may be formulated as the solution to an optimiza-
tion problem. However, this optimization problem is a formidable numerical task due to both
the number of parameters in the estimate (pixels or voxels) and the number of observations
(photon counting measurements). Since the work of Shepp and Vardi[3], the method of choice
for finding ML estimates in emission tomography has been the expectation-maximization
(EM) algorithm. The EM algorithm is based on the notion of a set of “complete” data
which, if available, would make the problem easier.
For the emission tomography problem, the complete data are usually the number of
photons emitted from each cell in the discretized reconstruction. While the use of such a
large complete data set appears to simplify the computation of the ML estimate, Fessler
and Hero[4] have shown that a large or “informative” complete data space also slows down
convergence. This suggests that the fastest convergence should be achieved by using the
actual observations as the complete data set, which is equivalent to direct optimization of
the ML or MAP functionals.
In the Bayesian problem, the EM approach is less simple to apply to emission tomographic
reconstruction. This is because the maximization step has no closed form and itself requires
the use of an iterative optimization technique. To address this problem, many approaches
have been proposed, but all of them approximate the otherwise intractable maximization
step. For the case of Gaussian prior densities, Liang and Hart[5, 6], Herman and Odhner[7]
and Herman, De Pierro, and Gai [8] have modified the EM approach to include Bayesian
estimation. A variety of methods have also been proposed for adapting the EM algorithm
to more general Markov random field priors. These methods include the Generalized EM
(GEM) algorithm proposed by Hebert and Leahy[9, 10], the one-step-late (OSL) method
proposed by Green[11], and a more general form of De Pierro’s method[12]. Most recently,
Fessler and Hero have proposed SAGE [13], a collection of methods designed to minimize
the complete data with each pixel update in EM reconstruction.
For the transmission problem, the EM methods are more difficult to apply because there is
not such an obvious choice for the complete data space[14]. Ollinger used the EM approach
to solve the transmission problem and found that convergence required from 200 to 2000
iterations[15].
In this paper, we expand on the work first presented in [16] and take a direct optimiza-
tion approach to the problem of MAP image reconstruction of emission and transmission
tomograms. It is interesting to note that in spite of their relative simplicity, such direct
optimization methods do not seem to have been investigated for exact statistical image re-
construction with Poisson measurement noise. However, we show that direct optimization
is quite tractable for this problem. Moreover, a direct approach to optimization without
the use of EM allows one to use a wide range of efficient optimization algorithms to achieve
fast convergence. It may also be viewed as maximizing convergence speed by employing a
minimally informative complete data set.
Our approach to optimization is based on the sequential greedy optimization of pixel (or
voxel) values in the reconstruction. This method, which we refer to as iterative coordinate
descent (ICD),² goes by a variety of other names including iterated conditional modes (ICM)
for MAP segmentation [17], and Gauss-Seidel (GS) for partial differential equations [18]. All
these algorithms work by iteratively updating individual pixels or coordinates to minimize
a cost functional. The ICD method was first applied to least squares Bayesian tomographic
reconstruction by Sauer and Bouman [19]. More recently, Fessler has applied ICD to the
least squares emission problem and shown that underrelaxation methods can speed ICD
convergence[20].
The ICD method has a number of important advantages which make it a good choice for
direct optimization. First, ICD can be efficiently applied to the log likelihood expressions re-
sulting from photon limited imaging systems. In fact, each ICD iteration is similar in nature
and computational complexity to an iteration of the EM algorithm; however, the function we
attack is the posterior probability function rather than the Q function of EM. Second, the
ICD algorithm is demonstrated to converge very rapidly (in our experiments typically 5-10
iterations) when initialized with the filtered back projection (FBP) reconstruction. This is
not surprising since the FBP reconstruction is accurate at low spatial frequencies, and in
[19] we showed that the ICD method (also known as Gauss-Seidel) has rapid convergence
at high spatial frequencies. The third important advantage of the ICD algorithm is that
it easily incorporates convex constraints and non-Gaussian prior distributions.³ In par-
ticular, positivity is an important convex constraint which can both improve the quality of
reconstructions and significantly speed numerical convergence[21]. Non-Gaussian prior dis-
tributions are also important since they can substantially reduce noise while preserving edge
detail[11, 22, 23, 24].
Our analysis starts with an approximate analysis of the optimization problem based
on a Taylor expansion of the log likelihood function. This analysis is important for two
²We choose to use the name ICD since it is most descriptive of the algorithmic approach.
³We note that convex constraints, such as positivity, may be thought of as a special type of non-Gaussian prior information, and are therefore consistent with the Bayesian problem formulation.
reasons. It provides a common conceptual framework for both the emission and transmission
problems, and it gives insight into the best choice of numerical techniques for the exact
optimization problem. We also show that this Taylor series approximation leads to an
expression similar to one proposed by Fessler[20] for the modeling of accidental coincidences
in emission reconstructions.
We next develop the framework for the optimization of the exact posterior distribution.
Based on the approximate analysis, we implement the ICD pixel updates using a variant on
the classical Newton-Raphson root finding method. The per-iteration computational cost of
ICD/Newton-Raphson is found to be similar to that of the EM algorithm, but unlike EM,
the ICD/Newton-Raphson algorithm is easily adapted to the Bayesian problem.
2 Formulation of Statistical Problem
In this section, we will develop the statistical framework for the MAP reconstruction problem
for both the emission and transmission case. We will also review the conventional EM
approach for later comparison.
For the emission problem, λ is the N dimensional vector of emission rates, Y is the
M dimensional vector of Poisson-distributed photon counts, λj represents the emission rate
from pixel (voxel) j, and Pij is the probability that an emission from pixel j is registered by
the ith detector. Thus, according to the standard emission tomographic model, the random
vector Y has the distribution
P(Y = y \mid \lambda) = \prod_{i=1}^{M} \frac{e^{-P_{i*}\lambda} \, (P_{i*}\lambda)^{y_i}}{y_i!} \qquad (1)
where the matrix P contains the probabilities Pij, and Pi∗ denotes the vector formed by
its i-th row. This formulation is general enough to include a wide variety of photon-limited
imaging problems, and the entries of P may also incorporate the effects of detector response
and attenuation. Using (1), the log likelihood may be computed.
(emission) \qquad \log P(Y = y \mid \lambda) = \sum_{i=1}^{M} \left( -P_{i*}\lambda + y_i \log P_{i*}\lambda - \log(y_i!) \right) \qquad (2)
The corresponding result for the transmission case is discussed in [19]. In order to
emphasize the similarity of the transmission problem, we use the same notation but interpret
λ as the attenuation density of a pixel, and y as the observed photon counts. The log
likelihood is then given by
(transmission) \qquad \log P(Y = y \mid \lambda) = \sum_{i=1}^{M} \left( -y_T e^{-P_{i*}\lambda} + y_i (\log y_T - P_{i*}\lambda) - \log(y_i!) \right) , \qquad (3)
where yT is the photon dosage per ray. Both log likelihood functions have the form
\log P(Y = y \mid \lambda) = -\sum_{i=1}^{M} f_i(P_{i*}\lambda) \qquad (4)
where fi(·) are convex and differentiable functions. This common form will lead to similar
methods of solving these two problems.
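To make the common form concrete, the following sketch (our own illustration, not from the paper; the toy system and function names are made up) evaluates both log likelihoods for a small discretized system:

```python
import numpy as np
from math import lgamma

def emission_loglik(P, lam, y):
    """Emission log likelihood of Eq. (2); assumes all projections (P lam)_i > 0."""
    p = P @ lam
    return float(np.sum(-p + y * np.log(p)) - sum(lgamma(v + 1) for v in y))

def transmission_loglik(P, lam, y, yT):
    """Transmission log likelihood of Eq. (3) with photon dosage yT per ray."""
    p = P @ lam
    return float(np.sum(-yT * np.exp(-p) + y * (np.log(yT) - p))
                 - sum(lgamma(v + 1) for v in y))

# Toy 2-pixel, 2-ray system (illustrative numbers only).
P = np.array([[1.0, 0.5], [0.25, 1.0]])
lam = np.array([2.0, 1.0])
y = np.array([3.0, 2.0])
print(emission_loglik(P, lam, y), transmission_loglik(P, lam, y, yT=500.0))
```

In both cases only the projections P_{i*}λ enter the data term, which is exactly the structure captured by the common form (4).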
For the emission problem, maximum-likelihood (ML) estimation of λ from y yields the
optimization problem
\hat{\lambda}_{ML} = \arg\min_{\lambda} \sum_{i=1}^{M} \left( P_{i*}\lambda - y_i \log P_{i*}\lambda \right) .
Probably the most widely applied algorithm for finding λML is expectation-maximization
(EM)[25], which was first applied by Shepp and Vardi[3] to the emission tomographic prob-
lem. EM solves the ML estimation by hypothesising the existence of complete data, which
would allow very simple estimation of λ if available. For the emission tomographic problem,
these observations are the number of photons actually emitted from each discretized cell of
the reconstruction region. The iteration resulting from the EM formulation is
\lambda_j^{n+1} = \frac{\lambda_j^n}{\sum_{m=1}^{M} P_{mj}} \sum_{i=1}^{M} \frac{y_i P_{ij}}{\sum_{l=1}^{N} P_{il} \lambda_l^n} \qquad (5)

where n indexes the iteration, each of which updates the entire reconstruction. Because
the log likelihood is concave, this approach can be shown to converge to the ML estimate[3].
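In matrix form, one EM iteration of (5) backprojects the ratio of observed to predicted counts and rescales by the column sums of P. The following sketch is our own illustration on a made-up system, not code from the paper:

```python
import numpy as np

def em_update(P, lam, y):
    """One EM iteration, Eq. (5): multiply lam_j by the backprojected ratio
    y_i / (P lam)_i, normalized by the column sum of P for pixel j."""
    p = P @ lam                          # current predicted projections
    return lam * (P.T @ (y / p)) / P.sum(axis=0)

# On a tiny invertible system with a positive exact solution, the iterates
# drive the predicted projections toward the observed counts.
P = np.array([[1.0, 0.2], [0.3, 1.0]])
y = np.array([4.0, 3.0])
lam = np.ones(2)
for _ in range(500):
    lam = em_update(P, lam, y)
```

Each iteration costs one forward projection and one backprojection of the sparse matrix, consistent with the 2 M0 N multiply count listed for EM in Table 1.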
In Sections 3 and 4, we will show that the exact ML or MAP reconstruction may also be
computed through direct optimization using the ICD algorithm. The ICD algorithm works
by sequentially optimizing the log likelihood with respect to each pixel (or voxel) value, λj .
In [19], we introduced a fast algorithm for implementing ICD in the transmission problem
when the log likelihood is approximated by a single quadratic function. This basic ICD
algorithm exploits the sparse nature of the projection matrix by maintaining a state vector
p^n = P\lambda^n of the projected values.
In this paper, we will investigate three new techniques for applying ICD to emission
and transmission MAP reconstruction. The first technique is as in [19], but relies on a new
quadratic approximation to the log likelihood derived for the emission problem. The second
technique, which we will refer to as ICD/Half Interval, finds the solution through greedy
sequential pixel updates, using a half-interval search to solve the problem exactly at each
step. Finally, the method we call ICD/Newton Raphson (ICD/NR) forms a revised quadratic
approximation to the log likelihood at each new pixel update. The ICD/Newton Raphson
method solves the MAP reconstruction problem exactly, with the optimum estimate being
its only fixed point. The details of computation will follow in Sections 3 and 4.
In order to compare these various algorithms, we will need an objective measure of com-
putational complexity which is independent of the specific hardware or software implemen-
tation. For this purpose, we use two figures of merit: the approximate number of multiplies
plus divides per full iteration, and the number of complete reads of the P matrix. In practice,
we have found the number of matrix reads to be a good predictor of algorithm speed since
computation is often dominated by memory access time and indexing overhead. Table 1 lists
these two performance measures for the EM algorithm in terms of M0, the average number of
nonzero projection values associated with each pixel.⁴ N M0 is then the number of nonzero
entries in the sparse projection matrix P . Notice that one iteration of the EM algorithm
⁴The entries in Table 1 assume that M0 ≫ 1, that the sparse matrix P is precomputed and stored, and that sums independent of the data are precomputed.
Algorithm                                        # of mult./div.   # of matrix reads
Filtered Backprojection                          M0 N              1
EM (maximum likelihood)                          2 M0 N            2
ICD (Gauss-Seidel), quadratic approximation      3 M0 N            2
ICD/Half Interval, exact                         3K M0 N           2
ICD/Newton-Raphson, exact                        4 M0 N            2

Table 1: Two measures of computational complexity which are independent of hardware/software implementation. The first measure is the number of multiplies plus divides per full iteration of each method. The second is the number of times the P matrix is read from memory. N is the number of points in the image, M0 is the average number of nonzero projections associated with each image pixel, and K is the average number of iterations required for the half interval search.
requires the computation of two iterations of filtered back projection.
For low signal-to-noise ratio medical imaging problems, the shortcomings of ML esti-
mation are well documented[2, 26, 27]. Therefore, many researchers have resorted to some
form of regularized estimation for tomographic inversion. Maximum a posteriori probability
(MAP) estimation addresses this problem by adding regularization in the form of an a priori
density for λ. The MAP estimate has been shown to substantially improve performance in
many image reconstruction and estimation problems. In addition, the computation of the
MAP estimate is not prohibitively difficult provided that the log of the prior density is a
concave function of λ.
A frequent choice for a prior model is the Gaussian MRF, but the quadratic penalty
exacted by the Gaussian often causes excessive smoothing of edges. Several prior models
have been developed which include desirable edge-preserving properties, and which maintain
convexity in their log prior densities[11, 28, 23, 24]. Provided we choose one of these models,
the MAP problem is also convex, and iterations converge to the unique global minimum
solution. We use the Generalized Gaussian MRF (GGMRF) model proposed in [23] with
the density function
p_\lambda(\lambda) = \frac{1}{z} \exp\left\{ -\gamma^q \sum_{\{j,k\} \in C} b_{j-k} |\lambda_j - \lambda_k|^q \right\}
where C is the set of all neighboring pixel pairs, bj−k is the coefficient linking pixels j and
k, γ is a scale parameter, and 1 ≤ q ≤ 2 is a parameter which controls the smoothness of
the reconstruction. This model includes a Gaussian MRF for q = 2, and an absolute-value
potential function with q = 1. In general, smaller values of q allow sharper edges to form in
reconstructed images.
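As an illustration of how the GGMRF energy behaves (a sketch of our own; the 8-point weights below are the normalized values used in the experiments of Section 6, and the test images are made up):

```python
import numpy as np

def ggmrf_penalty(x, gamma, q):
    """GGMRF energy gamma^q * sum_{{j,k} in C} b_{j-k} |x_j - x_k|^q on a 2-D
    image with an 8-point neighborhood, each clique counted once."""
    bn = 1.0 / (2.0 * np.sqrt(2.0) + 4.0)   # nearest-neighbor weight
    bd = 1.0 / (4.0 + 4.0 * np.sqrt(2.0))   # diagonal weight
    e = bn * np.sum(np.abs(np.diff(x, axis=0)) ** q)        # vertical cliques
    e += bn * np.sum(np.abs(np.diff(x, axis=1)) ** q)       # horizontal cliques
    e += bd * np.sum(np.abs(x[1:, 1:] - x[:-1, :-1]) ** q)  # diagonal cliques
    e += bd * np.sum(np.abs(x[1:, :-1] - x[:-1, 1:]) ** q)  # anti-diagonal
    return gamma ** q * e

# A sharp vertical edge of height 2 is penalized less under q = 1.1 than q = 2,
# which is why smaller q allows sharper edges to form.
edge = np.zeros((8, 8)); edge[:, 4:] = 2.0
print(ggmrf_penalty(edge, 1.0, 1.1), ggmrf_penalty(edge, 1.0, 2.0))
```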
Prior information may also be available in the form of constraints on the reconstructed
solution. We will assume that the set of feasible reconstructions Ω is convex, and in all
experiments we will choose Ω to be the set of positive reconstructions.⁵ Combining this
prior model with the log likelihood expression of (4) yields the expression for the MAP
reconstruction.
\hat{\lambda}_{MAP} = \arg\min_{\lambda \in \Omega} \left\{ \sum_{i=1}^{M} f_i(P_{i*}\lambda) + \gamma^q \sum_{\{j,k\} \in C} b_{j-k} |\lambda_j - \lambda_k|^q \right\} \qquad (6)
While the EM algorithm is not difficult to implement or understand in the ML case, it
is not directly and simply applicable to MAP estimation when the complete data is taken
to be the number of photons emitted from each pixel. This is because there is no closed
form solution for the maximization step of the iteration. Hebert and Leahy[9, 10] have
developed the GEM algorithm to cope with these effects. The GEM algorithm takes the
form of coordinate gradient ascent of the MAP EM cost functional with a heuristic step size
which can be adjusted to guarantee convergence. De Pierro’s majorization method for MAP
reconstruction is also guaranteed to converge[12]. The OSL method proposed by Green[11]
uses an approximate maximization step based on the previous values for neighboring pixels.
For all three of these algorithms, computation per iteration is approximately the same as
listed for EM in Table 1.
⁵This is consistent with the Bayesian formulation since we may condition on the information that λ ∈ Ω.
3 Computation of Approximate MAP estimate
The first step toward efficient direct optimization of (6) will be to develop a quadratic ap-
proximation to the log likelihood functions for the emission and transmission problems. This
approximation is useful because it will guide the design of efficient optimization techniques
for the exact problem, and because it gives important intuition into the method and its
relationship to existing reconstruction algorithms.
In the Appendix, we compute the first two terms in a Taylor expansion to find a quadratic
approximation to the emission log likelihood of (2). In [19], a similar quadratic approximation
was derived for the transmission problem. Both approximations have the form
\log P(Y = y \mid \lambda) \approx -\tfrac{1}{2} (p - P\lambda)^T D (p - P\lambda) + c(y) \qquad (7)
where p is a vector of projection measurements, D is a diagonal matrix, and c(y) is some
function of the data. For the purposes of MAP estimation c(y) may be ignored since it does
not depend on λ. For the emission case, p and D are given by⁶

(emission) \qquad p = y , \qquad D = \mathrm{diag}\{ 1/y_i \}

while for the transmission case they are given by

(transmission) \qquad p_i = \ln(y_T / y_i) \qquad (8)
\qquad\qquad\qquad\quad\; D = \mathrm{diag}\{ y_i \} . \qquad (9)
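A minimal sketch (ours; the counts are made up) of assembling these weights and evaluating the approximation:

```python
import numpy as np

def quadratic_terms(y, yT=None):
    """Return (p, d), with D = diag{d}, for the quadratic approximation (7).
    Emission (yT is None): p = y, d_i = 1/y_i.
    Transmission: p_i = ln(yT / y_i), d_i = y_i.  Assumes all y_i > 0."""
    y = np.asarray(y, dtype=float)
    if yT is None:
        return y.copy(), 1.0 / y
    return np.log(yT / y), y.copy()

def approx_loglik(p, d, P, lam):
    """-(1/2)(p - P lam)^T D (p - P lam), dropping the constant c(y)."""
    r = p - P @ lam
    return -0.5 * float(r @ (d * r))
```

Note that the transmission weights d_i = y_i emphasize high-count, high-SNR rays, while the emission weights d_i = 1/y_i do the opposite.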
The placement and character of the diagonal matrix D gives immediate insight into both
problems. For the transmission problem, D more heavily weights those projections which
correspond to large photon counts, and therefore have high signal-to-noise ratios. The emission problem is the reverse, with larger photon counts resulting in greater measurement variance.
⁶For notational simplicity, we assume that all y_i > 0. The Appendix gives a more general quadratic approximation obtained by treating the terms corresponding to y_i = 0 separately.
[Figure 1 appears here: panel (a) "Transmission Case" and panel (b) "Emission Case", each plotting log likelihood against the projection value P_{1*}λ (integral density / integral emission); plot data not reproduced.]

Figure 1: Plots of the relative log-likelihoods for (a) transmission and (b) emission tomography as a function of a single projection across the reconstructed image. The exact Poisson model (dashed lines) and the quadratic approximation (solid lines) are each plotted for three values of y1, the observed photon count. In the transmission case, yT = 500. Each of these plots covers a confidence interval of 0.99 for a maximum-likelihood estimate of the projection value.
Fig. 1 compares the exact log likelihood functions and their quadratic approximations
over a range of photon counts. Both plots assume a single projection y1 and use a 99%
confidence interval. In the Appendix, we show that for both transmission and emission cases
the approximation error bound is proportional to \sum_{i=1}^{M} 1/\sqrt{y_i} for large y_i. This is consistent
with the plots which show better approximations for larger yi. In practice, the accuracy of
this least squares approximation to the log likelihood function will depend on the dosage
(transmission) or emission rates (emission), and the particular application.
Fessler[20] has independently developed a closely related approximation to the log like-
lihood for reconstruction of PET imagery from data precompensated for accidental coinci-
dences. In this case, the Poisson model is violated since the observed counts of the compen-
sated data can be negative. Fessler employed a Gaussian approximation for the density of
the corrected counts y, yielding the objective function

\Phi(\lambda) = (y - P\lambda)^T \, \mathrm{diag}\{\sigma_i^{-2}\} \, (y - P\lambda) .
Here the variance estimates are given by

\sigma_i^2 = n_i a_i^2 \left( a_i^{-1} \bar{y}_i + 2 r_i \right) ,

where n_i represents detector efficiency, a_i represents attenuation correction factors, \bar{y} is a
smoothed version of y constrained to be positive, and r_i are accidental coincidence counts
made in an independent measurement. Under this least squares formulation, Fessler found
ICD to converge relatively quickly, improving on the speed of EM iterations.
Our approximate formulation of the log likelihood in (7) is similar to Fessler’s approx-
imation without accidental coincidences, except for the smoothing of the entries in y. In
section 5, we will show how the basic emission model of (2) can also be used to optimally
account for accidental coincidences when calibration measurements are available.
4 Computation of Exact MAP estimate
To compute the exact MAP reconstruction, we must perform the optimization of (6). Of
course, there is a wide variety of possible optimization techniques from which to choose, but
we will use the approximate quadratic structure derived in section 3 to guide our selection.
Optimization techniques such as gradient ascent are undesirable because of their slow con-
vergence [19, 29]. Alternatively, conjugate gradient[30] or preconditioned gradient descent[31,
32] techniques may be used since they are known to have rapid convergence for quadratic
optimization problems. However, the performance of these techniques is less predictable for
the nonquadratic problems resulting from non-Gaussian prior distributions. In fact, as q
approaches 1, the log prior distribution is not differentiable, and gradient based optimiza-
tion methods can become unreliable[33]. Possibly the most significant drawback of conjugate
gradient and preconditioning methods is the difficulty of incorporating positivity constraints.
The incorporation of positivity constraints for these methods requires the use of “bending”
techniques which are potentially very computationally intensive[34].
We choose to use the ICD algorithm for a number of reasons. First, it was shown in [19]
that the greedy pixel-wise updates of the ICD algorithm produce rapid convergence of the
high spatial frequency components. Since the FBP can be used as a starting point of the
algorithm, the convergence of the low spatial frequencies is less important. Second, the ICD
updates work well with non-Gaussian prior models. In fact, the ICM algorithm, which is
functionally equivalent to ICD, was developed for the MAP segmentation problem with a
discrete prior model. Finally, the ICD algorithm is easily implemented along with convex
constraints such as positivity. For this example, each pixel update is simply constrained to
be positive.
The ICD algorithm is implemented by sequentially updating each pixel of the image.
With each update the current pixel is chosen to minimize the MAP cost function. For
emission tomography, the ICD update of the jth pixel is given by
\lambda_j^{n+1} = \arg\min_{x \ge 0} \left\{ \sum_{i=1}^{M} \left[ P_{ij} x - y_i \log\left( P_{ij}(x - \lambda_j^n) + P_{i*}\lambda^n \right) \right] + \gamma^q \sum_{k \in N_j} b_{j-k} |x - \lambda_k^n|^q \right\} , \qquad (10)
where Nj is the set of pixels neighboring j. Notice that in this case λn and λn+1 differ at a
single pixel, so a full update of the image requires that (10) be applied sequentially at each
pixel.
No simple closed-form expression for λn+1j results from (10), but there are many op-
timization techniques which can be employed to find its minimum. We will describe two
strategies to the solution of (10). The first strategy is a direct application of half interval
search. However, each iteration of this direct approach is significantly more computationally
expensive than an iteration of EM based methods. The second strategy uses a technique
similar to Newton-Raphson search to substantially reduce computation by exploiting the
approximately quadratic nature of the log likelihood function.
Half interval search may be directly applied to solve (10) by searching for a zero in the
derivative of the cost function. The cost function may be analytically differentiated to yield
the expression
\sum_{i} P_{ij} \left( 1 - \frac{y_i}{P_{ij}(x - \lambda_j^n) + P_{i*}\lambda^n} \right) + q \gamma^q \sum_{k \in N_j} b_{j-k} |x - \lambda_k^n|^{q-1} \, \mathrm{sign}(x - \lambda_k^n) .
The disadvantage of this direct approach is that it requires the repeated computation of
the derivative. By maintaining a state vector for Pi∗λn, the derivative may be evaluated
with approximately 3M0 multiplies and divides. Therefore, the computation required for
a complete update is given in Table 1 as 3KM0N where K is the average number of half
interval iterations required for convergence to the desired precision.
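A sketch of one such update for the emission cost (our own conventions: a dense P, explicit neighbor arrays, and a fixed iteration count standing in for the precision test; it assumes every ray receives contributions from more than one pixel so the projections stay positive):

```python
import numpy as np

def icd_half_interval(j, lam, p, P, y, nbr_idx, nbr_b, gamma, q, iters=40):
    """One ICD update of pixel j by half-interval search on the derivative of
    the cost in Eq. (10).  p is the maintained state vector P @ lam (updated
    in place); nbr_idx/nbr_b describe the prior cliques of pixel j."""
    col, lam_j = P[:, j], lam[j]

    def dcost(x):
        proj = p + col * (x - lam_j)          # projections if lam_j -> x
        d = float(np.sum(col * (1.0 - y / proj)))
        diff = x - lam[nbr_idx]
        d += q * gamma**q * float(np.sum(nbr_b * np.abs(diff)**(q - 1)
                                         * np.sign(diff)))
        return d

    lo, hi = 0.0, max(2.0 * lam_j, 1.0)
    while dcost(hi) < 0.0:                    # grow the bracket until slope >= 0
        hi *= 2.0
    if dcost(lo) >= 0.0:                      # positivity-constrained minimum
        x = lo
    else:
        for _ in range(iters):                # K half-interval iterations
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if dcost(mid) < 0.0 else (lo, mid)
        x = 0.5 * (lo + hi)
    p += col * (x - lam_j)                    # keep the state consistent
    lam[j] = x
```

Each evaluation of dcost touches the nonzero entries of the j-th column once, which is where the factor K in the 3K M0 N count of Table 1 comes from.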
We may reduce computation of ICD updates by exploiting the fact that the log likelihood
function is approximately quadratic. Newton-Raphson optimization works by applying a
second order Taylor series approximation to the function being minimized. For example, if
g(x) is the function being minimized and x^n is the current value, then the new value x^{n+1}
is given by

x^{n+1} = \arg\min_x \left\{ g(x^n) + (x - x^n) g'(x^n) + \tfrac{1}{2} (x - x^n)^2 g''(x^n) \right\} = x^n - g'(x^n)/g''(x^n)

where g'(x) and g''(x) are the first and second derivatives of g(·), respectively. Should g(·)
be quadratic, we find the exact solution in a single step.
We apply the Newton-Raphson approach to the ICD updates by locally approximating
the log likelihood function as quadratic. However, our method will deviate from conventional
Newton-Raphson because we retain the exact expression for the log likelihood of the prior
distribution. This is because the prior term is generally not well approximated by a quadratic
function. Let θ1 and θ2 be the first and second derivatives of the log likelihood function
evaluated for the current pixel value λnj . Using our Newton-Raphson update, the new pixel
value is given by
\lambda_j^{n+1} = \arg\min_{x \ge 0} \left\{ \theta_1 (x - \lambda_j^n) + \frac{\theta_2}{2} (x - \lambda_j^n)^2 + \gamma^q \sum_{k \in N_j} b_{j-k} |x - \lambda_k^n|^q \right\} . \qquad (11)
This equation may be solved by analytically calculating the derivative and then numerically
computing the derivative’s root using the half interval method. This root finding computa-
tion is relatively inexpensive since θ1 and θ2 are precomputed and the number of neighboring
pixels is generally small.
The complete set of ICD/Newton-Raphson update equations for emission data is then
given by
\theta_1 = \sum_{i=1}^{M} P_{ij} \left( 1 - \frac{y_i}{p_i^n} \right) \qquad (12)

\theta_2 = \sum_{i=1}^{M} y_i \left( \frac{P_{ij}}{p_i^n} \right)^2 \qquad (13)

0 = \theta_1 + \theta_2 (x - \lambda_j^n) + q \gamma^q \sum_{k \in N_j} b_{j-k} |x - \lambda_k^n|^{q-1} \, \mathrm{sign}(x - \lambda_k^n) \Bigg|_{x = \lambda_j^{n+1}} \qquad (14)

p^{n+1} = P_{*j} (\lambda_j^{n+1} - \lambda_j^n) + p^n \qquad (15)
The root finding operation of (14) is usually computationally inexpensive, since the neighborhood N_j typically contains only a few pixels.⁷ Therefore, the computation is dominated
by the 4M0 total multiplies and divides required to compute θ1 and θ2. A full iteration
consists of applying a single Newton-Raphson update to each pixel in λ. This results in
4M0N operations per full image update. In practice, the computation is often dominated by
the time required to index through the data. By this measure, the ICD/Newton-Raphson
and EM algorithms are computationally equivalent, both requiring two indexings through
the projection matrix P .
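For the Gaussian prior q = 2, the root of (14) is even available in closed form, which permits a compact sketch of the update cycle (12)-(15) (our own code and naming; the positivity constraint is handled by clipping, which is the exact constrained minimizer of the quadratic surrogate):

```python
import numpy as np

def icd_newton_q2(j, lam, p, P, y, gamma, nbr_idx, nbr_b):
    """ICD/Newton-Raphson update of pixel j for emission data with the
    Gaussian (q = 2) prior.  theta1, theta2 follow Eqs. (12)-(13) at the
    current state p = P @ lam; Eq. (14) is then linear in x, and Eq. (15)
    updates the projection state in place."""
    col = P[:, j]
    theta1 = float(np.sum(col * (1.0 - y / p)))        # Eq. (12)
    theta2 = float(np.sum(y * (col / p) ** 2))         # Eq. (13)
    g2 = 2.0 * gamma ** 2
    num = theta2 * lam[j] - theta1 + g2 * float(np.sum(nbr_b * lam[nbr_idx]))
    x = max(0.0, num / (theta2 + g2 * float(np.sum(nbr_b))))  # positivity clip
    p += col * (x - lam[j])                            # Eq. (15)
    lam[j] = x
```

For q < 2, the same θ1 and θ2 are used, but the root of (14) is found by a half-interval search over the non-quadratic prior term.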
It should be noted that the distinction between the ICD/Newton-Raphson method and
the approximate method of section 3 is that for ICD/Newton-Raphson the parameters of
the quadratic approximation are recomputed for each new update. This guarantees that the
exact MAP reconstruction is the only fixed point of the algorithm. While we have not yet
shown that the ICD/Newton-Raphson method is guaranteed to converge to its fixed point,
⁷In some cases, computation time can be reduced by replacing the power function x^{q-1} by a linearly interpolated lookup table.
there are a number of reasons to believe that its convergence should be very stable. Since
the true log likelihood is very close to quadratic, the approximation made by the Newton-Raphson updates is quite accurate. Of course, in the quadratic case the ICD/Newton-Raphson method is very stable. In fact, the ICD/Newton-Raphson method has an intrinsic safety factor since it remains stable in the quadratic case even with overrelaxation by a factor of two [18].
In practice, we have never observed ICD/Newton-Raphson to have unstable or unreliable
convergence.
The ICD/Newton-Raphson method may also be applied to the transmission tomography
problem. In this case, the parameters θ1 and θ2 of (12) and (13) are then given by the
following equations.
(transmission) \qquad \theta_1 = \sum_{i=1}^{M} P_{ij} \left( y_i - y_T e^{-p_i^n} \right)

\qquad\qquad\qquad\quad\; \theta_2 = \sum_{i=1}^{M} P_{ij}^2 \, y_T e^{-p_i^n} .
We note that these updates require the evaluation of exponential functions.
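A corresponding sketch (ours) of these two sums for the transmission case:

```python
import numpy as np

def transmission_thetas(j, P, p, y, yT):
    """theta1 and theta2 for the transmission ICD/Newton-Raphson update,
    evaluated at the current projection state p = P @ lam:
      theta1 = sum_i P_ij (y_i - yT exp(-p_i)),
      theta2 = sum_i P_ij^2 yT exp(-p_i)."""
    col = P[:, j]
    expected = yT * np.exp(-p)           # expected counts at the current state
    return (float(np.sum(col * (y - expected))),
            float(np.sum(col ** 2 * expected)))
```

At a state where the expected counts match the data (p_i = ln(yT / y_i)), theta1 vanishes, i.e. the likelihood term is stationary.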
5 Attenuation and Accidental Coincidence Effects
Typical emission tomographic systems include imperfections which have been incorporated into
EM-based estimation approaches. For example, emitted photons have some non-zero proba-
bility of attenuation before they are registered. This probability is related to the transmission
characteristics of the object, and can typically be measured by a preliminary transmission
scan. In this case, the attenuation probability can be included directly in the Pij values.
Another possibility is the joint estimation of emission and attenuation properties[35]. For
this work, we have assumed attenuation probabilities are included into P .
In positron emission tomography (PET) imaging, a significant fraction of registered pho-
tons are caused by “accidental coincidences,” i.e. readings from distinct positron/electron
annihilations which are counted as having arisen from a single event. This increases the
expected value of the count yi by some number ri. If this value is assumed known, as in [36],
then the log likelihood values may simply be chosen to reflect a Poisson distribution with
the adjusted mean.
Another possibility is that independent calibration measurements are made which result
in Poisson random variables Ri with mean ri. In principle this measurement can be made for
a PET imaging system by counting the number of coincidences that occur between delayed
time intervals[20, 37]. In this case, the optimal reconstruction can be performed by simply
augmenting the vector of projection measurements, y, to include the independent Poisson
calibration measurements r = [r1, · · · , rM ]t. We define the quantities
λ =
[
λr
]
y =
[
yr
]
P =
[
P I0 I
]
where I is the identity matrix. Since \tilde{y} is a vector of independent Poisson measurements
with mean \tilde{P}\tilde{\lambda}, all the techniques of the previous sections may be used directly. We note
that updates to the components of r are simple to compute since only two “projection”
measurements are associated with each value ri.
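The augmented system above can be formed explicitly. Below is a small sketch under the assumption of dense arrays; the function name is ours, and a practical implementation would exploit the sparsity of P and of the identity blocks.

```python
import numpy as np

def augment_for_randoms(P, y, r):
    """Stack the calibration counts r under y and extend P with identity
    blocks, so that the stacked measurement vector has mean
    P_aug @ [lam; r] = [P lam + r; r] (sketch of Section 5's construction)."""
    M, N = P.shape
    I = np.eye(M)
    P_aug = np.block([[P, I],
                      [np.zeros((M, N)), I]])
    y_aug = np.concatenate([y, r])
    return P_aug, y_aug
```

With this construction, the top block of the mean is Pλ + r (true coincidences plus accidentals) and the bottom block is r, matching the independent Poisson model described in the text.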
6 Experimental Results
Fig. 2 shows the phantoms we used for our emission and transmission tomography experi-
ments together with the FBP reconstructions. The emission phantom in Fig. 2(a) represents
higher emission rates with higher image intensity, having zero emission from the background.
Rates are scaled to yield a total count of approximately 5 × 10^4 for the section in Fig. 2(a),
with readings taken at 64 equally spaced angles, and 64 perfectly collimated detectors at
each angle. The transmission phantom represents an object of diameter 20 cm and density
0.2 cm^-1, with higher density regions of up to 0.48 cm^-1 added. The dosage per ray (yT)
of only 500 in this experiment results in zero counts at many detectors. Though this is
well below typical medical transmission CT imaging rates, low dosage reconstructions are
useful in reconstructing an approximate attenuation map for emission imaging[38]. The
FBP reconstructions of Fig. 2 show significant noise and streaking artifacts. This is typical
of FBP reconstructions since they do not account for the relative accuracy of projection
measurements.
We choose an 8-point neighborhood system for the GGMRF, with normalization of the
weights b_{j-k} to a total of 1.0 for each j: b_{j-k} = (2√2 + 4)^-1 for nearest neighbors and
b_{j-k} = (4 + 4√2)^-1 for diagonal neighbors. In order to illustrate the effect of the Bayesian
prior, we will compute reconstructions for both q = 2 and q = 1.1. The first case is equivalent
to the common Gaussian prior, and the second does a better job of preserving edges. Since
the log prior is strictly concave and differentiable in both cases, convergence of numerical
algorithms can be guaranteed. We choose scale parameters yielding the qualitatively best
results for comparison of reconstructions under the exact and approximate likelihoods. The
parameters of the prior model were (q = 2.0, γ = 1.0), (q = 1.1, γ = 3.0) for the emission
problem, and (q = 2.0, γ = 15.0), (q = 1.1, γ = 40.0) for the transmission problem.
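The weight normalization stated above can be verified directly: four nearest neighbors at (2√2 + 4)^-1 and four diagonal neighbors at (4 + 4√2)^-1 sum to exactly one.

```python
import math

# GGMRF neighborhood weights from the text: nearest neighbors get
# (2*sqrt(2) + 4)^-1 and diagonal neighbors (4 + 4*sqrt(2))^-1.
b_nearest = 1.0 / (2 * math.sqrt(2) + 4)
b_diag = 1.0 / (4 + 4 * math.sqrt(2))

# Four neighbors of each kind; the normalization makes the eight
# weights sum to 1 for every interior pixel j.
total = 4 * b_nearest + 4 * b_diag
```

Algebraically, 4/(4 + 2√2) + 4/(4 + 4√2) = (4 + 3√2)/(4 + 3√2) = 1, and the diagonal weight is the nearest-neighbor weight divided by √2, reflecting the larger inter-pixel distance.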
Fig. 3 and Fig. 4 show the results of MAP reconstruction for the emission and transmis-
sion problems. In each figure, the exact and approximate MAP reconstructions are shown.
The exact MAP reconstructions were computed using a large number of iterations of the
ICD/Newton-Raphson algorithm of section 4. In each case the cost function being mini-
mized is convex and the algorithm converges to the global minimum. Therefore, the MAP
reconstruction computed using the EM algorithm will be identical. Since the form of the
approximate log likelihood (7) is the same for the emission and transmission problems, the
ICD algorithm (called Gauss-Seidel) of [19] was used to calculate both approximate MAP
estimates. This algorithm has been shown to converge rapidly and requires approximately
3M0N operations per iteration, as listed in Table 1.
In both examples of Fig. 3, there are small but perceptible differences between the exact
and approximate reconstructions. Fessler has found that the quadratic approximation can
introduce significant bias into transmission reconstructions under low dosages[39]. So exact
MAP transmission reconstruction may be of value if precise and absolute density measure-
ments are required.
We concentrate on the exact emission reconstruction problem for comparison of conver-
gence rates to previously proposed algorithms, since it is in this arena that the majority of
recent research activity has taken place. (Convergence of ICD in transmission tomography
was treated in [19].) Three alternatives to ICD/NR appear in the plots. Green’s one-step-late
(OSL) algorithm allows EM to be applied to MAP problems by adding a regularizing term
to each EM maximization step which is based on the previous iteration’s pixel values[11].
The update is made by setting to zero the sum of the gradient of the EM functional, and the
gradient of the log of the prior density, evaluated at pixel values from the previous iteration.
The OSL is very simple to compute, but may fail to converge. The generalized expectation-
maximization (GEM) of Hebert and Leahy [9] substitutes an increase in the MAP/EM
objective function for the more difficult maximization. GEM features an adjustable step
size for the update accompanied by evaluation of the cost functional to guarantee increase
in a posteriori likelihood. While the vector p is updated after all pixels have been visited,
the pixel values used in evaluating the log of the prior density are updated sequentially. Fi-
nally, we include DePierro’s method[8], which guarantees convergence through a MAP/EM
approach which decouples the computation of pixel updates in the maximization step. This
technique may therefore be applied to complete parallel updates. While DePierro’s method
was originally designed for the case of a Gaussian prior density, it applies for other convex
penalties as well[12], such as the GGMRF with q = 1.1.
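As a concrete illustration of the first of these alternatives, Green's one-step-late step described above might be sketched as follows. This is our reading of the description, not the original implementation; the function and variable names, and the scalar prior-gradient interface, are assumptions.

```python
import numpy as np

def osl_update(lam, P, y, prior_grad):
    """One-step-late (OSL) emission update: the gradient of the negative
    log prior is evaluated at the previous iterate lam, so each step
    costs about as much as one EM iteration. With prior_grad returning
    zeros this reduces to the standard ML-EM update."""
    p = P @ lam                      # current projections p = P lambda
    backproj = P.T @ (y / p)         # EM backprojection of the ratio y/p
    sens = P.sum(axis=0)             # column sums of P (sensitivity)
    return lam * backproj / (sens + prior_grad(lam))
```

Because the prior term in the denominator lags one iteration behind, the step is simple but, as the experiments below show, is not guaranteed to converge.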
The three alternative algorithms were implemented without modification to their origi-
nally proposed forms. All methods could include a one-dimensional search for each pixel’s
update, which is assumed available in both De Pierro’s method and ICD/NR. Both of these
techniques require the minimization of a non-quadratic function at each pixel in the case of
a non-Gaussian prior. The adjustable step size of GEM may be replaced by a minimization,
and OSL may be augmented to vary the influence of the derivative of the log of the prior at
each step to guarantee convergence as well[11]. While computational costs are often dom-
inated by accessing the data and projection matrix, the expense of the 1-D minimizations
may be appreciable for certain priors.
We will plot convergence performance in terms of complete updates of the image since,
like the computational cost measures of Table 1, this measure is independent of implementation. Each of the
four methods was initialized in all cases with an FBP reconstruction, which is of negligible
cost relative to the ensuing computation. Since the a posteriori log likelihood is strictly
convex, the solution will not be influenced by this choice of initial condition. Because low
frequency components in the error between the FBP image and the MAP reconstruction will
converge most slowly[19], we correct the zero-frequency component of the initial condition
with a least-squares estimate directly from the data.
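One way to implement such a zero-frequency correction is to shift the initial image by the constant that minimizes the weighted least-squares data mismatch. The paper does not give this exact form, so the following is a sketch under that assumption; all names are ours.

```python
import numpy as np

def correct_dc(x, P, y, w):
    """Shift the initial image x by the scalar c minimizing
    sum_i w_i (y_i - P_i*(x + c*1))^2, i.e. a least-squares fit of the
    image's DC component directly to the data (a sketch)."""
    s = P @ np.ones(x.shape[0])      # projection of a constant image
    resid = y - P @ x                # current data residual
    c = np.sum(w * s * resid) / np.sum(w * s * s)
    return x + c
```

Setting the derivative of the weighted squared error with respect to c to zero gives the closed form above, so the corrected image leaves no weighted residual along the constant-image direction.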
Figure 5 shows the convergence rates for the maximum-likelihood problem (γ = 0). In
this case, all three alternative methods reduce to the EM algorithm. The ICD/NR estimate
has, for practical purposes, converged after 5 or 6 iterations, while EM appears to require
over an order of magnitude more. This behavior is at least partly explained by the similarity
of EM to gradient ascent, which is particularly slow for this type of problem[19].
Figures 6 and 7 illustrate similar results for the MAP problem. With the Gaussian prior
model, the three EM-based algorithms perform similarly in early iterations, but OSL fails
to converge for this case, settling into an oscillation significantly below the maximum a
posteriori likelihood. For a scale of γ = 2.0 with the same data, OSL diverged badly from
the solution. Both GEM and DePierro’s method approach the optimum, but as in Fig. 5,
the convergence is much slower than ICD/NR.
Non-Gaussian models allow better preservation of abrupt transitions in MAP estimates.
One example which possesses this advantage along with convexity of the potential function
is the GGMRF with values of q near 1.0. We use q = 1.1, which affords good edge preser-
vation with tractable optimization. While this likelihood has first derivatives which are well
behaved, the second derivative of the function is unbounded, which may have varying effects
on the optimization approaches. Although convergence is similar in this case also, interesting
differences exhibit themselves in Fig. 7. OSL again fails to converge, which is not surprising
given the character of the derivative of |x|^1.1 near the origin. GEM is the fastest of the three
EM-type methods in this problem, reaching parity with the ICD/NR solution at about 50
iterations. There is also some potentially interesting asymptotic behavior. After about 60
iterations, when the estimate is undergoing very minor changes, the log likelihood of the
GEM estimate slightly exceeds that of ICD/NR. Asymptotic characteristics of ICD/NR in
nonlinear problems may require further study and improvement.
7 Conclusion
We have presented a new method of computing MAP reconstructions using direct optimization of the log likelihood function. Each iteration of our proposed ICD/Newton-Raphson
algorithm has computation comparable to an iteration of the EM algorithm. However, the
new method works well with Bayesian prior distributions and converges rapidly. The direct
optimization approach also gives a common framework for solving both the emission and
transmission tomography problems. Our experiments indicated that while a fixed quadratic
approximation is adequate for some transmission problems, optimizing the exact likelihood
yielded improved results.
Figure 2: Original synthetic phantoms and their FBP reconstructions for emission and transmission examples. (a) Emission phantom with higher emission intensities in lighter areas, and (b) FBP emission reconstruction. (c) Transmission phantom with higher density in lighter areas, and (d) FBP transmission reconstruction. FBP reconstructions were computed using a raised cosine rolloff filter, and served as the initial estimate for the iterative statistical methods.
Figure 3: Emission MAP reconstructions with a Gaussian MRF prior, and (a) exact reconstruction using ICD/Newton-Raphson; (b) quadratic approximation. MAP estimates resulting from GGMRF model with q = 1.1, and (c) exact reconstruction using ICD/Newton-Raphson; (d) quadratic approximation.
Figure 4: Transmission MAP reconstructions with a Gaussian MRF prior, and (a) exact reconstruction using ICD/Newton-Raphson; (b) quadratic approximation. MAP estimates resulting from GGMRF model with q = 1.1, and (c) exact reconstruction using ICD/Newton-Raphson; (d) quadratic approximation.
[Plot: log a posteriori likelihood vs. iteration number (0 to 140), titled "Maximum Likelihood"; curves: ICD/NR and EM.]
Figure 5: Convergence of ML estimates using ICD/Newton-Raphson updates and EM. The a posteriori likelihood function values are plotted as a function of full iterations.
[Plot: log a posteriori likelihood vs. iteration number (0 to 50), Gaussian prior (q = 2.0, γ = 1); curves: ICD/NR, GEM, OSL, and De Pierro's method.]
Figure 6: Convergence of MAP estimates using ICD/Newton-Raphson updates, Green's OSL, Hebert/Leahy's GEM, and De Pierro's method, with a Gaussian prior model with γ = 1.0.
[Plot: log a posteriori likelihood vs. iteration number (0 to 50), GGMRF prior (q = 1.1, γ = 3.0); curves: ICD/NR, GEM, OSL, and De Pierro's method.]
Figure 7: Convergence of MAP estimates with a generalized Gaussian prior model with q = 1.1 and γ = 3.0.
8 Appendix: Taylor Series Approximation of Emission
Log Likelihood
In this Appendix, we derive the Taylor series approximation for the emission log likelihood
of (2). We will also determine the convergence behavior of the quadratic approximation for
both the transmission and emission case.
Using the convention that $p_i = P_{i*}\lambda$, the log likelihood may be expressed as
$$\log P(Y = y|\lambda) = -\sum_i f_i(p_i)$$
where for the emission case
$$f_i(p_i) = p_i - y_i \log p_i + \log(y_i!) \, .$$
If we assume that $y_i > 0$, then the gradient of the likelihood function evaluated at $p = y$ has
entries
$$\left. \frac{\partial \log P(Y = y|\lambda)}{\partial p_i} \right|_{p=y} = \left. \left( -1 + \frac{y_i}{p_i} \right) \right|_{p=y} = 0 \, .$$
The Hessian is diagonal, with
$$\left. \frac{\partial^2 \log P(Y = y|\lambda)}{\partial p_i^2} \right|_{p=y} = \left. \frac{-y_i}{p_i^2} \right|_{p=y} = -\frac{1}{y_i} \, .$$
We may account for the terms with $y_i = 0$ by including them separately. Let $S_0 = \{i : y_i = 0\}$
and let $S_1 = S - S_0$. Then the approximation for the log likelihood is given by
$$\log P(Y = y|\lambda) \approx \sum_{i \in S_1} -\frac{1}{2 y_i} (y_i - P_{i*}\lambda)^2 + \sum_{i \in S_0} (y_i - P_{i*}\lambda) + c(y) \, .$$
(For $i \in S_0$, $f_i(p_i) = p_i$, so these terms are exact.) If we ignore the terms in $S_0$, then the log likelihood may be simply written as
$$\log P(Y = y|\lambda) \approx -\frac{1}{2} (y - P\lambda)^T D (y - P\lambda) + c(y),$$
with $D = \mathrm{diag}\{ y_i^{-1} \}$.
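This quadratic approximation can be checked numerically. Below is a minimal sketch (function names are ours) assuming all $y_i > 0$ and dropping the constant $\log(y_i!)$ terms consistently from both expressions.

```python
import numpy as np

def emission_loglike(p, y):
    """Exact emission log likelihood -sum_i (p_i - y_i log p_i), dropping
    the constant log(y_i!) terms (assumes all y_i > 0)."""
    return np.sum(-(p - y * np.log(p)))

def quad_approx(p, y):
    """Second-order Taylor expansion of the same quantity about p = y:
    c(y) - (1/2) sum_i (y_i - p_i)^2 / y_i, where c(y) is the exact
    value at p = y."""
    c = np.sum(-(y - y * np.log(y)))
    return c - 0.5 * np.sum((y - p) ** 2 / y)
```

For $p$ within a few standard deviations ($\sqrt{y_i}$) of $y$, the two expressions agree up to the third-order remainder, consistent with the error analysis that follows.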
We next show that the quadratic approximation for the log likelihood function converges
as $\sum_i 1/\sqrt{y_i} \rightarrow 0$ for both the transmission and emission cases. For the Taylor series
approximation of a function $g(x)$,
$$g(x) = g(a) + g'(a)(x - a) + \frac{g''(a)(x - a)^2}{2!} + R_3 \, ,$$
we have the Lagrange form of the remainder
$$R_3 = \frac{g^{(3)}(\epsilon)(x - a)^3}{3!} \, , \qquad (16)$$
where $\epsilon$ is between $a$ and $x$. We will show, in both the transmission and emission tomographic
cases, that (16) goes to zero on arbitrarily large confidence intervals as the photon counts,
$y_i$, become large.
In order to cover an arbitrarily large confidence interval for the parameter $p_i$, we will
assume that $p_i \in [y_i - k\sqrt{y_i},\, y_i + k\sqrt{y_i}]$ where $k$ is any fixed positive integer. Since
$f_i^{(3)}(p_i) = -2 y_i / p_i^3$ in the emission case, the total error for large $y_i$ is then bounded by
$$|E| = \left| \sum_i \frac{f_i^{(3)}(\epsilon_i)(p_i - y_i)^3}{3!} \right| = \left| \sum_i \frac{2 y_i}{\epsilon_i^3} \frac{(p_i - y_i)^3}{6} \right| \leq \sum_i \frac{y_i (k\sqrt{y_i})^3}{3 (y_i - k\sqrt{y_i})^3} \approx \frac{k^3}{3} \sum_i \frac{1}{\sqrt{y_i}} \, .$$
Thus the error goes to zero as $\sum_i 1/\sqrt{y_i}$ does.
For the transmission problem,
$$f_i(p_i) = y_T e^{-p_i} - y_i (\log y_T - p_i) + \log(y_i!) \, .$$
To cover an arbitrary confidence interval, we assume $p_i \in [\bar{p}_i - k/\sqrt{y_i},\, \bar{p}_i + k/\sqrt{y_i}]$ where
$\bar{p}_i = \log(y_T / y_i)$. Since $f_i^{(3)}(p_i) = -y_T e^{-p_i}$ in this case, we obtain the following error bound for large $y_i$:
$$|E| = \left| \sum_i \frac{f_i^{(3)}(\epsilon_i)(p_i - \bar{p}_i)^3}{3!} \right| = \left| \sum_i \frac{y_T e^{-\epsilon_i} (p_i - \bar{p}_i)^3}{6} \right| \leq \sum_i \frac{k^3 e^{k/\sqrt{y_i}}}{6 \sqrt{y_i}} \approx \frac{k^3}{6} \sum_i \frac{1}{\sqrt{y_i}} \, .$$
Thus, in both cases, the bound on the error magnitude goes to zero as $\sum_i 1/\sqrt{y_i} \rightarrow 0$.
References
[1] S. Geman and D. McClure, "Bayesian Image Analysis: An Application to Single Photon Emission Tomography," in Proc. Statist. Comput. Sect. Amer. Stat. Assoc., Washington, DC, pp. 12-18, 1985.
[2] D. Snyder and M. Miller, "The use of Sieves to Stabilize Images Produced with the EM Algorithm for Emission Tomography," IEEE Trans. on Nuclear Science, vol. NS-32, no. 5, pp. 3864-3871, Oct. 1985.
[3] L. Shepp and Y. Vardi, "Maximum Likelihood Reconstruction for Emission Tomography," IEEE Trans. Med. Im., vol. MI-1, no. 2, Oct. 1982.
[4] J.A. Fessler and A.O. Hero, "Complete-Data Spaces and Generalized EM Algorithms," Proc. ICASSP 1993, vol. 5, pp. 1-4, April 27-30, 1993, Minneapolis, MN.
[5] H. Hart and Z. Liang, "Bayesian Image Processing in Two Dimensions," IEEE Trans. Med. Im., vol. MI-6, no. 3, pp. 201-208, Sept. 1987.
[6] Z. Liang and H. Hart, "Bayesian Image Processing of Data from Constrained Source Distributions; I: Non-valued, uncorrelated and correlated constraints," Bull. Math. Biol., vol. 49, no. 1, pp. 51-74, 1987.
[7] G.T. Herman and D. Odhner, "Performance Evaluation of an Iterative Image Reconstruction Algorithm for Positron Emission Tomography," IEEE Trans. Med. Im., vol. 10, no. 3, Sept. 1991.
[8] G. T. Herman, A. R. De Pierro, and N. Gai, "On Methods for Maximum A Posteriori Image Reconstruction with a Normal Prior," J. Visual Comm. Image Rep., vol. 3, no. 4, pp. 316-324, Dec. 1992.
[9] T. Hebert and R. Leahy, "A Generalized EM Algorithm for 3-D Bayesian Reconstruction from Poisson data Using Gibbs Priors," IEEE Trans. Med. Im., vol. 8, no. 2, pp. 194-202, June 1989.
[10] T. Hebert and S. Gopal, "The GEM MAP Algorithm with 3-D SPECT System Response," IEEE Trans. Med. Im., vol. 11, no. 1, pp. 81-90, March 1992.
[11] P. J. Green, "Bayesian Reconstructions from Emission Tomography Data Using a Modified EM Algorithm," IEEE Trans. Med. Im., vol. 9, no. 1, pp. 84-93, March 1990.
[12] A.R. De Pierro, "A Modified Expectation Maximization Algorithm for Penalized Likelihood Estimation in Emission Tomography," to appear in IEEE Trans. on Med. Im., Dec. 1994.
[13] J. A. Fessler and A. O. Hero, "Space-Alternating Generalized EM Algorithm for Penalized Maximum-Likelihood Reconstruction," University of Michigan, Technical Report No. 286, Feb. 1994.
[14] K. Lange and R. Carson, "EM Reconstruction Algorithms for Emission and Transmission Tomography," Journal of Computer Assisted Tomography, vol. 8, no. 2, pp. 306-316, April 1994.
[15] J. M. Ollinger, "Maximum-Likelihood Reconstruction of Transmission Images in Emission Computed Tomography via the EM Algorithm," IEEE Trans. Med. Im., vol. 13, no. 1, pp. 89-101, March 1994.
[16] C. Bouman and K. Sauer, "Fast Numerical Methods for Emission and Transmission Tomographic Reconstruction," Proceedings of the Twenty-seventh Annual Conference on Information Sciences and Systems, pp. 611-616, The Johns Hopkins University, Baltimore, MD, March 1993.
[17] J. Besag, "On the Statistical Analysis of Dirty Pictures," J. Roy. Statist. Soc. B, vol. 48, no. 3, pp. 259-302, 1986.
[18] D. M. Young, Iterative Solution of Large Linear Systems, Academic Press, New York, 1971.
[19] K. Sauer and C. Bouman, "A Local Update Strategy for Iterative Reconstruction from Projections," IEEE Trans. on Sig. Proc., vol. 41, no. 2, pp. 533-548, Feb. 1993.
[20] J.A. Fessler, "Penalized Weighted Least-Squares Image Reconstruction for Positron Emission Tomography," IEEE Trans. Med. Im., vol. 13, no. 2, pp. 290-300, June 1994.
[21] P. Oskoui-Fard and H. Stark, "Tomographic Image Reconstruction Using the Theory of Convex Projections," IEEE Trans. Med. Im., vol. 7, no. 1, pp. 45-58, March 1988.
[22] T. A. Gooley and H. H. Barrett, "Evaluation of Statistical Methods of Image Reconstruction through ROC Analysis," IEEE Trans. Med. Im., vol. 11, no. 2, June 1992.
[23] C. Bouman and K. Sauer, "A Generalized Gaussian Image Model for Edge-Preserving MAP Estimation," IEEE Trans. Image Proc., vol. 2, no. 3, pp. 296-310, July 1993.
[24] R. Stevenson, B. Schmitz, and E. Delp, "Discontinuity Preserving Regularization of Inverse Visual Problems," IEEE Trans. Systems, Man, and Cybernetics, vol. 24, no. 3, March 1994.
[25] A.P. Dempster, N.M. Laird, and D.B. Rubin, "Maximum Likelihood from Incomplete Data via the EM Algorithm," J. Royal Stat. Soc. B, vol. 39, pp. 1-38, 1977.
[26] E. Veklerov and J. Llacer, "Stopping Rule for the MLE Algorithm Based on Statistical Hypothesis Testing," IEEE Trans. Med. Im., vol. MI-6, pp. 313-319, 1987.
[27] T. Hebert, R. Leahy, and M. Singh, "Fast MLE for SPECT Using an Intermediate Polar Representation and a Stopping Criterion," IEEE Trans. Nucl. Sci., vol. NS-35, pp. 615-619, Feb. 1988.
[28] K. Lange, "Convergence of EM Image Reconstruction Algorithms with Gibbs Priors," IEEE Trans. Med. Im., vol. 9, no. 4, pp. 439-446, Dec. 1990.
[29] W. Press, B. Flannery, S. Teukolsky, and W. Vetterling, Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press, Cambridge, 1988.
[30] F. Beckman, "The Solution of Linear Equations by the Conjugate Gradient Method," in A. Ralston, H. Wilf, and K. Enslein, eds., Mathematical Methods for Digital Computers, Wiley, 1960.
[31] T.-S. Pan, A. E. Yagle, N. H. Clinthorne, and W. L. Rogers, "Acceleration and Filtering in the Generalized Landweber Iteration Using a Variable Shaping Matrix," IEEE Trans. Med. Im., vol. 12, no. 2, pp. 278-286, June 1993.
[32] N. H. Clinthorne, T.-S. Pan, P.-C. Chiao, W. L. Rogers, and J. A. Stamos, "Preconditioning Methods for Improved Convergence Rates in Iterative Reconstruction," IEEE Trans. Med. Im., vol. 12, no. 1, March 1993.
[33] K. Sauer and C. Bouman, "Bayesian Estimation of Transmission Tomograms Using Segmentation Based Optimization," IEEE Trans. on Nuclear Science, vol. 39, no. 4, pp. 1144-1152, Aug. 1992.
[34] D. M. Goodman, "Applying Conjugate Gradient to Image Processing Problems," Proceedings of the Seventh Workshop on Multidimensional Signal Processing, p. 1.1, September 23-25, 1991, Lake Placid, New York.
[35] N.H. Clinthorne, J.A. Fessler, G.D. Hutchins, and W.L. Rogers, "Joint Maximum Likelihood Estimation of Emission and Attenuation Densities in PET," Proc. 1991 IEEE Nucl. Sci. Symp. & Med. Im. Conf., Nov. 2-9, 1991, Santa Fe, NM, pp. 1927-1932.
[36] D.G. Politte and D.L. Snyder, "Corrections for Accidental Coincidences and Attenuation in Maximum-Likelihood Image Reconstruction for Positron-Emission Tomography," IEEE Trans. Med. Im., vol. 10, no. 1, pp. 82-89, March 1991.
[37] T. Spinks, T. Jones, M. Gilardi, and J. Heather, "Physical Performance of the Latest Generation of Commercial Positron Scanner," IEEE Trans. Nucl. Sci., vol. 35, no. 1, pp. 721-725, Feb. 1988.
[38] R.H. Huesman, S.E. Derenzo, J.L. Cahoon, A.B. Geyer, W.W. Moses, D.C. Uber, T. Vuletich, and T.F. Budinger, "Orbiting Transmission Source for Positron Tomography," IEEE Trans. Nucl. Sci., vol. NS-35, no. 1, pp. 735-739, Feb. 1988.
[39] J. A. Fessler, "Sequential Iterative Algorithms for Image Reconstruction," presented at the Midwest Workshop on Iterative Image Reconstruction, June 17-18, 1994, Washington University, St. Louis, Missouri.