Motion Compensation for Cone-Beam CT Using Fourier … · 2017. 12. 6. · Motion Compensation for...

Motion Compensation for Cone-Beam CT Using

Fourier Consistency Conditions

M. Berger1,2, Y. Xia1,3, W. Aichinger1, K. Mentl1,

M. Unberath1,3, A. Aichert1, C. Riess1,2, J. Hornegger1,2,3,

R. Fahrig4, A. Maier1,2,3

1Pattern Recognition Lab, Friedrich-Alexander-Universtat Erlangen-Nurnberg, 91058

Erlangen, Germany2Graduate School 1773, Heterogeneous Image Systems, 91058 Erlangen, Germany3Erlangen Graduate School in Advanced Optical Technologies (SAOT), 91052

Erlangen, Germany4Radiological Sciences Laboratory, Stanford University, Stanford, California 94305

E-mail: [email protected]

May 26, 2017

Abstract. In cone-beam CT, involuntary patient motion and inaccurate or

irreproducible scanner motion substantially degrades image quality. To avoid artifacts

this motion needs to be estimated and compensated during image reconstruction. In

previous work we showed that Fourier Consistency Conditions (FCC) can be used in

fan-beam CT to estimate motion in the sinogram domain. This work extends the

FCC to 3-D cone-beam CT. We derive an efficient cost function to compensate for 3-D

motion using 2-D detector translations.

The extended FCC method have been tested with five translational motion

patterns, using a challenging numerical phantom. We evaluated the root-mean-square-

error (RMSE) and the structural-similarity-index (SSIM) between motion corrected

and motion-free reconstructions. Additionally, we computed the mean-absolute-

difference (MAD) between the estimated and the ground-truth motion. The practical

applicability of the method is demonstrated by application to respiratory motion

estimation in rotational angiography, but also to motion correction for weight-bearing

imaging of knees. Where the latter makes use of specifically modified FCC version

which is robust to axial truncation.

The results show a great reduction of motion artifacts. Accurate estimation results

were achieved with a maximum MAD value of 708 µm and 1184 µm for motion along

the vertical and horizontal detector direction, respectively. The image quality of

reconstructions obtained with the proposed method is close to that of motion corrected

reconstructions based on the ground-truth motion. Simulations using noise-free and

noisy data demonstrate that FCC are robust to noise. Even high-frequency motion was

accurately estimated leading to a considerable reduction of streaking artifacts. The

method is purely image-based and therefore independent of any auxiliary data.

PACS numbers: XXXXX

Submitted to: Phys. Med. Biol.

Motion Compensation for CBCT Using Fourier Consistency Conditions 2

1. Introduction

Classical CT reconstruction algorithms assume a static object and an ideal scanning

trajectory. In many clinical applications this assumption is violated due to movements

of the patient, such as respiratory motion (Rit et al.; 2009), leading to inconsistencies

in the projection data. Depending on the amplitude and frequency of the underlying

motion, this can cause streaking and blurring artifacts in the reconstruction.

Different approaches exist to reduce motion-induced artifacts. Typically, the

first step is to detect and estimate the motion, followed by a motion compensated

reconstruction that takes the estimated motion into account. The majority of methods

require additional auxiliary signals for motion estimation. In the field of cardiovascular

imaging, artifacts can be reduced by clustering consistent projection images using an

ECG signal (Schaefer et al.; 2006). Subsequently, the image may be further improved via

registration in projection (Schwemmer et al.; 2013) or reconstruction domain (Muller et

al.; 2014). Another type of motion estimation uses fiducial markers, that are identified

in projection images and tracked over time (Li et al.; 2006; Choi et al.; 2014). This

approach is known to be accurate but requires the attachment or even surgical placement

of metallic markers to the anatomy of interest (Shirato et al.; 2003). A key problem

of using auxiliary signals is that they usually reduce patient comfort and increase

acquisition time thus hindering clinical workflow. Further, these signals are dedicated

to very specific use cases or anatomies and may not be used in other acquisition types.

For example, an ECG signal will not correlate with skeletal motion while implanting

markers for conventional heart imaging does not seem reasonable. Another approach

is to perform 2-D/3-D registration of a known volume or an initial reconstruction to

the individual projection images (Zeng et al.; 2005; Gendrin et al.; 2012; Ouadah et al.;

2016; Berger et al.; 2016). Registration approaches have been investigated extensively

for tumor tracking in the field of radiotherapy, where we refer to Rit et al. (2013) for a

comprehensive review.

In principle, motion estimation methods relying on the acquired projection images

only can alleviate most of these problems. Only little work has been done on purely

image-based motion estimation. Some methods impose assumptions on the imaged

object and optimize entropy or positivity measures for the 3-D reconstruction (Rohkohl

et al.; 2013; Ens et al.; 2010). Others use mathematically formulated data consistency

conditions (CC) that describe redundancies in projection or sinogram domain.

The best known CC are the Helgason-Ludwig consistency conditions (HLCC),

derived by Helgason (1980) and Ludwig (1966). The HLCC state that after building

image moments along the detector direction of a parallel-beam sinogram, the Fourier

series expansion along the projection angles needs to be zero at well defined positions.

An intuitive representation of different image moments is given by Welch et al. (1998)

who use the HLCC for attenuation correction in PET acquisitions. Several extensions

of the HLCC exist for the fan-beam geometry and have been used for motion estimation

or detection by (Yu and Wang; 2007; Leng et al.; 2007; Clackdoyle and Desbat; 2015).


Recently, an extension to three-dimensional cone-beam geometries has been proposed

by Clackdoyle and Desbat (2013). It requires that all X-ray source positions are on a

plane which does not intersect the object. Therefore it is not yet applicable for circular

cone-beam CT.

In addition to the HLCC there also exist FCC that are defined in the Fourier

domain of the sinogram. It has been shown by Edholm et al. (1986) and Natterer

(1986) that specific triangular regions in the spectrum of a sinogram have to have an

absolute value close to zero. The size of these regions depends on the scanner geometry

and the maximum extent of the object with respect to the system’s center of rotation.

Thus, knowing the object’s extent the triangular zero regions can be precomputed.

Extensions of FCC to the fan-beam geometry have been proposed by Hawkins et al.

(1988); Natterer (1993); Mazin and Pelc (2010). Recently, we used the FCC for motion

estimation in fan-beam CT (Berger et al.; 2014). Detector shifts are estimated for

each projection angle, such that the energy in the triangular regions of the sinogram’s

spectrum is minimized. Initial work for an extension of the FCC theorem to cone-beam

projections was presented by Brokish and Bresler (2006). However, to our knowledge,

the FCC have never been used for artifact correction in cone-beam geometries.

Both HLCC and FCC have initially been proposed for parallel-beam geometries and

were extended to fan- and cone-beam geometries. In contrast, John (1938) presented full

CC directly for cone-beam geometries that are related to second-order partial differential

equations. The theorem provides a system of ultrahyperbolic partial differential

equations for the projection images. The applicability to an actual reconstruction was

however limited. Yet, extensions to a more practical formulation have more recently

been made by Patch (2002b) with the application of estimating missing projection data

(Patch; 2002b,a).

In this work, we present an extension of the FCC to the cone-beam geometry and

apply it for motion estimation of simulated cone-beam CT data. Besides the thorough

evaluation of translational motion using a challenging numerical phantom, we apply our

method for respiratory motion estimation in rotational angiography, which was recently

published by Unberath et al. (2017). Additionally, we show its applicability to motion

correction for cone-beam CT of knees under weight-bearing conditions. Parts of this

work were recently shown in Berger (2016). The extension to cone-beam CT is based

on the findings of Brokish and Bresler (2006). We define a cost function that estimates

horizontal and vertical detector shifts to correct for translational object motion in 3-D.

The optimization is gradient-based and uses an analytic gradient computation.

2. Methods and Materials

In the following we present the FCC and its extension to the 3-D cone-beam geometry

in Section 2.1. Further, we describe a cost function for the estimation of arbitrary

motion models in Section 2.2. An efficient variant of the cost function and its gradient

with respect to the motion parameters is presented in Section 2.3. Implications of


vu L

y

x

D

λ

z

(a) Cone-beam CT

u

λ

(b) Sinogram

ξ

ω

(c) Spectrum

Figure 1: Cone-beam CT geometry and example of triangular regions.

sampling and discretization are outlined in Section 2.4. Additionally, Section 2.5

provides information about the numeric optimization followed by an approach to increase

the cost function’s robustness to data truncation in Section 2.6. Finally, we present

implementation details in Section 2.7.

2.1. Extended Fourier Properties of the Sinogram

It has been shown that the Fourier transform of a parallel- and fan-beam sinogram

contains triangular regions with an absolute value close to zero (Edholm et al.; 1986;

Natterer; 1986; Mazin and Pelc; 2010). An example of such regions is shown in Figure 1c,

showing the logarithmically scaled spectrum of the sinogram in Figure 1b. The size and

orientation of the empty triangular spectral regions depend on the maximum distance

of the object to the center of rotation rp, as well as the source-isocenter-distance L and

the detector-isocenter-distance D. According to (Mazin and Pelc; 2010) the triangular

regions for a flat detector fan-beam geometry can be described by∣∣∣∣ γ

γ − ξ(L+D)

∣∣∣∣ > rpL

, (1)

where γ and ξ are the frequency variables corresponding to the projection angles and the

detector rows, respectively. These triangular regions exist for perfect sinograms, that

is, a perfectly circular scan without motion, truncation or beam-hardening. As shown

by Berger et al. (2014) potential patient or scanner motion violates this requirement.

Hence, the overall energy in the triangular spectral regions increases.

Figure 1a gives an overview of the cone-beam CT acquisition geometry and the

corresponding mapping from the world to the detector coordinate system. Let ψ be

the frequency variable that corresponds to the vertical detector directions. We now

want to use the 2-D fan-beam properties of Eqn. 1 for the 3-D Fourier space of a

cone-beam projection volume acquired with a circular source trajectory. Brokish and

Bresler (2006) investigated the essential support of this 3-D Fourier domain and derived

minimum sampling conditions that ensure an aliasing-free sampling of the object. The

support is the opposite to the triangular zero regions and described by reversing the

inequality of Eqn. 1. Further, Brokish and Bresler showed that for the majority of


cone-beam imaging devices it is sufficient to determine the support of γ for ψ = 0, i.e.,

in the fan-beam case. From their numerical simulation results it becomes clear that

the variation of the support along the ψ-direction is negligible. Therefore, we use the

fan-beam properties from Eqn. 1 and expand them in the direction of ψ to construct

the triangular zero regions for the 3-D Fourier transform.

2.2. Cost Function for Arbitrary Motion Models

To minimize the energy in the regions described by Eqn. 1, we define a cost function for

a set of K projection images of size I×J , i.e., for a total of M = I ·J ·K measurements.

The cost function is given by

e(α) =∥∥∥W (F p(α))

∥∥∥2

2, (2)

where F ∈ CM×M is a symmetric matrix that performs the 3-D discrete Fourier

transform (DFT), p(α) ∈ RM is the projection data in vector format with respect to the

motion parameters given by α, and W = diag (w) ∈ RM×M is the diagonal matrix of

vector w ∈ RM that represents the 3-D mask for the zero regions. The function diag(·)takes a vector and returns a diagonal matrix with the vector elements on its diagonal.

Further, let ξi, ψj and γk be the frequency values of the Fourier transforms’ discrete

axes, over detector u- and v-directions and the projection angles λ, respectively. Using

the detector spacings ∆u and ∆v and the angular spacing ∆λ they can be computed by

ξi = f(i,∆u, I)

ψj = f(j,∆v, J) with f(l,∆, N) =l − 1− bN

2c

N∆,

γk = f(k,∆λ,K)

where N is the signal’s length, l ∈ [1, . . . , N ] its index, and ∆ the sampling period. The

mask w is defined as

w =(w1,1,1, . . . , wi,j,k, . . . , wI,J,K

)T

, (3)

where the entries of the mask are computed by an extension of the 2-D mask given in

Eqn. 1

wi,j,k =

1, if∣∣∣ γkγk−ξi(L+D)

∣∣∣ > rpL

0, otherwise. (4)

2.3. Efficient Cost Function to Estimate Detector Shifts

Given a reasonably large L, shifts in detector u- and v-direction can be used to

approximate 3-D translations of moderate amplitude. Our image coordinate system

is aligned with the plane of rotation for CT acquisition. In that case, 3-D motion in

direction of the detector u-axis can be associated with detector u-shifts. In contrast,

a 3-D translation towards and away from the detector makes the object or patient


appear bigger and smaller, which could be roughly approximated by an image scaling.

However, the distance an object has to be moved orthogonal to the detector to introduce

an error of about one pixel depends on the cone angle and is typically larger than the

corresponding distance of motion parallel to the detector. Additionally, 2-D translations

are applied directly in the 2-D Fourier domain of the projection images using the shift

theorem. This reduces the computational cost as the 2-D-FFT of the projection images

can be precomputed.

We obtain a parameter vector α ∈ R2K ,

α =(s1, t1, s2, t2, . . . , sK , tK

)T

, (5)

where sk and tk are the detector shifts for projection index k in u- and v-direction,

respectively. A single cost function evaluation requires only the 1-D Fourier transforms

over the rotation angles, i.e., over index k. The complexity of the FFT thus reduces

from O(M logM) to O(M logK). We can rewrite Eqn. 2 to

e(α) =∥∥∥x(α)

∥∥∥2

2=∥∥∥W (F λ (T (α)F uvp))

∥∥∥2

2, (6)

where F λ ∈ CM×M and F uv ∈ CM×M denote the 1-D Fourier transforms over the

angles and the 2-D Fourier transforms over the projection images, respectively. Further,

p ∈ RM represents the original projection data and T (α) ∈ CM×M is a diagonal matrix

that holds the phase factors which encode the shifts in α. Assuming that p is ordered

first over the rows u, then over the columns v and finally over the angles λ, T (α) is

given by

T (α) = diag((e−ı2π(ξ1s1+ψ1t1), . . . , e−ı2π(ξIs1+ψJ t1),

e−ı2π(ξ1s2+ψ1t2), . . . , e−ı2π(ξIsK+ψJ tK))T)

.(7)

To allow for a gradient-based minimization of Eqn. 6 we need to compute the

gradient with respect to the shifting parameter vector α. It is defined by

∇e(α) =(∂e(α)∂s1

, ∂e(α)∂t1

, · · · , ∂e(α)∂sK

, ∂e(α)∂tK

)T

. (8)

Hence, the computation of the gradient requires the partial derivatives of Eqn. 6 with

respect to the individual shifting parameters. The partial derivatives are computed by

∂e(α)

∂αl=

∂

∂αl

∥∥∥x(α)∥∥∥2

2

=∂

∂αl

(x(α)Hx(α)

)= 2x(α)H

(∂

∂αlx(α)

)(9)

= 2(pHF uv

H T (α)HF λHW H

)WF λ

(∂

∂αlT (α)

)F uvp

= 2pHF uvH T (α)HF λ

HWF λ

(∂

∂αlT (α)

)F uvp ,


where αl is an element of vector α and corresponds to either sl or tl, i.e. a shift of

the l-th projection image in u- or in v-direction, respectively. Further, �H denotes

the transposed and complex conjugated of a vector or matrix. Due to the diagonal and

binary structure of W it holds that W H = W and that W HW = W which was used in

the last step of Eqn. 9. Note, that x(α) has already been computed in the cost function

evaluation and does not need to be recomputed for the evaluation of the gradient. The

partial derivatives of T (α) are given by

∂

∂αlT (α) = diag (s) T (α) (10)

s =

(0, . . . , 0, −ı2πξ1, . . . , −ı2πξI , 0, . . . , 0)T if αl = sl

(0, . . . , 0, −ı2πψ1, . . . , −ı2πψJ , 0, . . . , 0)T if αl = tl(11)

where s ∈ CM holds the partial derivatives of the arguments of the exponential functions

in T (α). Note, that a change in the parameters sl or tl only effects the 2-D Fourier

transform of the l-th projection image. Therefore, s is only non-zero at elements in

the range of [(l− 1)IJ + 1, (l)IJ ]. Note also, that the multiplication in Fourier domain

by diag (s), corresponds to a spatial derivative of the l-th projection image in u- or

v-direction.

2.4. Discretization and Adjustment of the Mask

The implementation of the mask w according to Eqn. 4 represents a discretization of

the continuous function given in Eqn. 1. Discretization includes the intersection of the

continuous straight lines at non-integer positions with the discretized DFT grid, making

the determination of the mask’s boundaries challenging. This process is illustrated in

Figure 2a, where we can see the sampling grid as well as the discretized mask along

with the continuous straight lines defined by Eqn. 1. We can see that some pixels of

the mask extend over the boundary defined by the straight lines, whereas others barely

touch the boundary. The implications of this inaccuracy are two-fold. First, a mask

that is too large prevents the motion-induced spectral energy to be redistributed close

to the support boundary. Thus, we implicitly assume that these frequencies are zero

even for ideal motion-free data which may not be entirely true. Second, if the mask is

too narrow, energy may be shifted to spectral bins that are close to the boundary but

not accurately covered by the mask. Hence, these bins have none or only little influence

on the cost function and are therefore not important for the actual consistency. During

our experiments we have noticed that the latter, i.e., a too narrow mask is less robust

for motion estimation leading to a low-frequency sinusoidal signal with high amplitude

which interferes with the actual motion signal. Please see the discussion in Section 5

for more details.

As a result we present an adjustment of Eqn. 4 such that the mask region slightly

extends over the lines. The outcome can be seen in Figure 2b, where the adjusted


ξu

[

10−3

mm

]

-5 0 5

ωk

[

1

rad

]

-1

-0.5

0

0.5

1

(a) Direct discretization

ξu

[

10−3

mm

]

-5 0 5

ωk

[

1

rad

]

-1

-0.5

0

0.5

1

(b) Adjusted mask

Figure 2: (a) Direct discretization of triangular regions, (b) Heuristic adjustment of

mask to ensure that all motion related energies are covered.

mask is again visualized as gray area. The mask now extends over the solid blue lines

that indicate the original boundary. As new discretization boundaries we define four

shifted lines (solid, red lines) parallel to the original triangle boundary (dashed, red

lines). These lines form two overlapping triangles which define our new mask region

as depicted in Figure 2b. To ensure that the zero-frequencies are not contained in

the adjusted mask, we exclude frequencies that are covered by both triangles. The

adjustment can be easily done by modifying the distance ε of ideal triangle boundaries

to the shifted, parallel lines.

2.5. Regularization, Optimization and Motion Compensation

By definition the FCC cost function in Eqn. 6 is invariant to shifts in u- or v-direction

which are constant over the projection images, e.g. a global detector offset. Let ϕi,j,nbe the complex Fourier coefficient of the 2-D Fourier transform of the n-th projection

image along u- and v-direction and c1 and c2 are constant shift values, respectively.

Further, xi,j,k is the 3-D Fourier coefficient of frequency bin (i, j, k) computed by

xi,j,k(α) = wi,j,k

K∑n=1

ϕi,j,n e−ı2π(ξisn+ψjtn) e−ı2πkn/K . (12)

If we set sk = c1 and tk = c2, we can factorize the phase factor out of the Fourier

transform’s summation

xi,j,k = wi,j,k e−ı2π(ξic1+ψjc2)K∑n=1

ϕi,j,n e−ı2πkn/K

xi,j,k = wi,j,k e−ı2π(ξic1+ψjc2) Φi,j,k

, (13)


where Φi,j,k is the 3-D Fourier coefficient of the shift-free projection image stack at

frequency bin (i, j, k). The FCC theorem considers only the absolute value of the

spectrum which is why the cost function in Eqn. 6 uses the complex, squared L2-norm.

This is equal to the sum of element-wise complex conjugated multiplications

e(α) =I∑i=1

J∑j=1

K∑k=1

wi,j,k xi,j,k xi,j,k

e(α) =I∑i=1

J∑j=1

K∑k=1

wi,j,k Φi,j,k Φi,j,k eı2π(ξic1+ψjc2) e−ı2π(ξic1+ψjc2)︸︷︷︸=1

, (14)

showing that the cost function is independent of any constant shifts c1 or c2.

While a constant offset in v-direction corresponds only to an object shift in z-

direction (not considering cone-beam artifacts), a constant shift in u-direction leads to

distortions in the reconstruction. To account for this limitation and to ensure stability

of the algorithm we decided to provide our method with constant motion parameters

for the first projection image, i.e., s1 = s and t1 = t.

We incorporate s and t as regularizer that extends the cost function in Eqn. 6. Our

final cost function including the regularization is given by

e(α) = e(α) + κ1

2

∥∥∥∥∥(s1 − st1 − t

)∥∥∥∥∥2

2

. (15)

As the regularizer is incorporated with the Lagrangian multiplier κ, the gradient can be

computed in a straightforward manner yielding

∇e(α) = ∇e(α) + κ(

(s1 − s) (t1 − t) 0 · · · 0)T

. (16)

We use an unconstrained gradient-based optimizer for the minimization of Eqn. 15.

The gradient is computed analytically following Eqn. 16 whilst the inverse Hessian is

approximated from the gradient using a BFGS scheme. To determine the step direction

we attempt a Newton step based on the inverted Hessian. Finally, the step size is given

by a line-search method. Further information about the optimizer may be obtained

from Maier and Berger (2010); Verrill (2005).

The final motion estimates are directly incorporated into the projection matrices

to compensate motion during the backprojection step. The function p(pk,α) is used to

update the original, motion corrupted projection matrices pk ∈ R3×4 according to

p(pk,α) =

1 0 − sk∆u

0 1 − tk∆v

0 0 1

pk . (17)

2.6. Axial Truncation: Problems and Solutions

FCC are not theoretically defined for truncated projection data. For the proposed

method the truncation problem becomes visible after application of detector


λ

u

(a) Truncated fan-beam sinogram

λ

u

(b) Sinogram after Fourier-based shifting

Figure 3: (a) Truncated fan-beam sinogram. (b) Reappearance of sinogram at its

boundaries after applying translations in Fourier domain. The erroneously introduced

intensity variations over λ may violate FCC.

translations. A detector translation using phase factors, causes image content at the

detector boundary to reappear at the opposite side. This is shown in in Figures 3a

and 3b, using a truncated fan-beam sinogram and its shifted version using a simulated

detector translation. Application of the translation in spatial domain would also not

solve the problem as the periphery is still unknown. We propose a practical extension

to FCC, that allow their application to truncated data. In principle the concept would

also be suitable for lateral truncation, however, within the scope of this work we focus

on axial truncation only. Thus, the method becomes applicable to more realistic data,

where axial truncation is usually unavoidable.

To ensure that the projection image is of limited extent, we multiply each image

with an apodization function that smoothly fades out to zero in horizontal detector

direction. This weighting emulates a maximum object size along the scanner’s rotation

axis. However, apodization has to be performed after application of estimated motion

parameters in order to avoid intensity variations due to the windowing function itself.

For motion estimation we use a Blackman-Harris apodization function as depicted in

Fig. 4. The function is then multiplied with each column of all projection images, after

applying the estimated translations. However, instead of a regular Blackman-Harris

window, that covers the full detector height, we limit the range by tmax at its borders.

As a result, any translation tk ∈ [−tmax, tmax] will not cause a periodic reappearance at

bottom or top of the detector.

Apodization is easily integrated into the cost functions in Equations 2 and 6. Let

A = diag(a) ∈ RIJ×IJ be the diagonal matrix which holds all weights a ∈ RIJ of the

apodization function. Following Harris (1978) it is defined by,

ai,j,k =

0 if j /∈

[d tmax

∆ve+ 1, J − d tmax

∆ve]

3∑l=0

cl · cos(l (2πj′)J ′

)otherwise

(18)

with adjusted index j′ = j − dtmax/∆ve − 1 and reduced window width J ′ = J −2dtmax/∆ve, where d·e represents rounding to the next highest integer. The coefficients


1 ⌈ tmax

∆v⌉+ 1 J

2J − ⌈ tmax

∆v⌉ J

0

0.2

0.4

0.6

0.8

1

Figure 4: Adjusted window function to allow for translations tk ∈ [−tmax, tmax] on

truncated data . Yellow, dashed line: Original Blackman-Harris function with a width of

j ∈ [1, J ]. Blue, solid line: Adjusted function ranging from j ∈[d tmax

∆ve+ 1, J − d tmax

∆ve].

are given by c0 = 0.35875, c1 = −0.48829, c2 = 0.14128, c3 = −0.01168.

The cost functions in Eqn. 2 is now adjusted to

e(α) =∥∥∥W (F Ap(α))

∥∥∥2

2. (19)

However, incorporating A into the efficient cost function described by Eqn. 6,

requires an additional inverse (i.e., F Huv) and forward (i.e., F uv) 2-D FFT, yielding

e(α) =1

2

∥∥∥WF λ (F uvAF

Huv)︸︷︷︸

apodization

(T (α)F uvp)

∥∥∥2

2. (20)

Consequently, the computational benefit of this description would vanish. Yet we

can exploit the convolution theorem and apply the window by convolution with its

Fourier transform. It is particularly beneficial that the Blackman-Harris window, has

only a small number of nonzero Fourier coefficients (Harris; 1978). For tmax = 0 it holds

that only four unique spectral coefficients are present, requiring seven multiplications

and six summations at each Fourier coefficient to perform apodization. This is no longer

true for tmax > 0, as the window’s spectrum is no longer sparse. However, in practice

using only a small number of most significant coefficients, reduces complexity and still

leads to a good approximation of the actual window. In practice, we chose to retain

eight unique spectral coefficients, which led already to a very close approximation to

the actual window function. Note, that the cost function’s gradient in Eqn. 9 is still

valid. Yet, it now also includes an apodization step, thus, obtaining partial derivatives

also requires described convolution in spectral domain.

An unwanted side effect of apodization is that the overall energy of the spectrum

may not remain constant. Hence, energy in the triangular regions can also be reduced

by translations that move high intensities such that they observe a lower weight


during apodization. To reduce this effect, we propose regularization with the motion

parameters’ squared L2 norm. The updated cost function is given by

e′(p) = e(p) + κ′(

1

2IJKpTp

). (21)

Consequently, the overall gradient of the cost function may be computed by

∇e′(p) = ∇e(p) + κ′(

1

IJKp

). (22)

Similar to Eqn. 15 an additional Lagrangian multiplier κ′ is used to create a single

combined cost function for optimization.

2.7. Implementation Details

The runtime of a single cost function and gradient evaluation is crucial for the

practical applicability of our algorithm. First tests based on a single-threaded CPU

implementation revealed a high runtime of several minutes for only a single evaluation

of the cost function due to multiple iterations over the projection volume and the 3-D

Fourier transform. We save time by exploiting the shift theorem and use Eqn. 6 so that

F uvp can be precomputed prior to the optimization. To further speed up the recurring

1-D-Fourier transform over the angular direction, we also swapped dimensions such that

the projection angles are stored sequentially in memory. Further, we decided to exploit

the parallelizable nature of the problem and implemented all necessary computations

on the GPU using OpenCL.

2.7.1. Cost Function Evaluation For large data (in our case 640 × 480 × 512 × 2 ×4 Byte/10242 ≈ 1200 MB), it may be infeasible to store multiple copies of the projection

stack in GPU memory. Therefore, we only keep a single copy of the data on the GPU,

representing the shifted projection images. The parameter vector for the n-th iteration,

i.e. αn is given by the optimizer and builds upon the original motion-corrupted data.

As a first step we build a relative parameter vector ∆αn = αn − αn−1, where αn−1

denotes the parameter vector of the previous iteration. Instead of applying αn, we use

the relative shifts in ∆αn and apply them directly to the already shifted projection

volume.

The algorithm for a single cost function evaluation consists of the following steps:

(i) Compute relative parameter vector: ∆αn = αn −αn−1

(ii) Apply shifts: Performs the complex multiplication with the phase factors derived

from ∆αn. We precompute two vectors holding all ψ and ξ values. In every

iteration, only the relative parameter vector ∆αn needs to be transferred to the

GPU.

(iii) Angular 1-D FFTs: We use a GPU-based OpenCL library for the 1-D FFT along

the projection angles. This implementation does not support memory efficient in-

place transforms, which is why we store the 3-D FFT result separately.


(iv) L2-norm and mask: The sum of all magnitudes in the mask region is built using a

GPU-based parallel reduction. Note, that only the summation result needs to be

transferred back to the host. The mask indicating the frequency-region we want

to minimize is precomputed according to Eqn. 3, hence, only the final result of the

summation needs to be transferred back to the host.

2.7.2. Gradient Computation Eqns. 8 and 9 state that the gradient vector requires

computation of x(α) ∂∂αlx(α) for each of the 2K partial derivatives. x(α) is already

computed, as the gradient computation is executed after the cost function evaluation.∂∂αlx(α) consists of a multiplication of the precomputed 2-D Fourier transforms F uv p,

with the partial derivatives of the shifting matrix ∂∂αlT (α), followed by the angular

Fourier transform F λ. Thus, naive implementation requires one Fourier transform per

partial derivative. However, a mathematical trick allows to avoid computation of the

angular FFTs entirely.

We exploit the fact that ∂∂αlT (α) will only be non-zero for values corresponding to

the l-th projection image. The same holds for ∂∂αlT (α)F uv p. Using the notation in

Section 2.5, we define the angular FFT of the partial derivative as

∂∂αlxi,j,k(α) = wi,j,k

K∑n=1

ϕi,j,n

(∂

∂αle−ı2π(ξisn+ψjtn)

)︸︷︷︸

=0 ∀n6=l

e−ı2πkn/K ,

∂∂αlxi,j,k(α) = wi,j,k ϕi,j,l

(∂∂αl

e−ı2π(ξisl+ψjtl))e−ı2πkl/K .

(23)

As a result, the angular Fourier transform breaks down to a simple multiplication by

e−ı2πkl/K which drastically reduces the computational complexity.

Further, we make sure that the partial derivatives ∂∂sle(α) and ∂

∂tle(α) are computed

within the same GPU kernel call. As both values are based on the same data, and data

transfer is the dominant time factor, this step almost halves the overall runtime.

3. Simulation and Evaluation Protocol

To evaluate our method, we perform a numerical phantom study using the 3-D

FORBILD head phantom (Lauritsch and Bruder; 2001). Overall, we created five

projection image stacks, where three of them suffer from high-frequency object motion

and two contain only low-frequency motion. We also rendered motion-free projections

to obtain ground-truth images and a reference reconstruction.

All projection images have been created using a monochromatic absorption model

with a photon energy of 80 keV. The rendering was performed as described by Maier

et al. (2012). It uses analytic descriptions of the phantoms which allows for a direct

computation of intersection lengths. We performed noise-free and noisy simulations

using a Poisson distributed noise model. The number of photons emitted from the X-

ray source was limited to 5000 per detector pixel which resulted in a high noise level.


A small number of rays had large intersection lengths with bone material, so that we

encountered photon starvation. Those rays produced infinite line integral values which

were subsequently limited to the maximum value of the pixels that did not suffer from

photon starvation.

One application field we aim for with the development of FCC-based motion

estimation, is knee imaging under weight-bearing conditions. Therefore, we also evaluate

the proposed method on a simulated but highly realistic dataset of two knees under load.

The data is generated with the 4-D XCAT model, which is based on segmentations of

real anatomies (Segars et al.; 2010). Moreover, the simulated motion is taken from a real

subject by incorporating motion recordings of subjects holding a squat for 20 s using an

optical camera tracking system. For details on the acquisition process and registration

of the recorded to the XCAT motion we refer to (Choi et al.; 2013).

In contrast to the FORBILD phantom, axial truncation is inevitable when imaging

knees. Yet, the datasets do not suffer from lateral truncation as the detector spacing was

specifically increased to allow full coverage of both legs. Note, that the FCC method is

not yet applicable to real knee acquisitions, as that would require full scans and possibly

also truncation-free acquisitions in lateral direction. Nevertheless, we consider this

experiment as a first step towards an application in real scenarios. We are specifically

interested, if the proposed extension based on dynamic apodization of projection images

in detector v-direction can account for axial truncation.

In addition to weight-bearing imaging of the knee, we recently proposed the

usage of FCC for respiratory motion estimation in rotational angiography of coronary

arteries (Unberath et al.; 2017). In this field, the respiratory motion of the heart is

often approximated by only vertical detector translations. Thus, we restricted the

optimization to only vertical translations by removing sk from the parameter vector.

A synthetic rotational angiography dataset, also based on the XCAT phantom, was

used for projection generation. The background structures, e.g., soft tissue and rips,

were removed by a novel approach for virtual, digital subtraction angiography (DSA)

to achieve truncation-free projection images, depicting contrast filled coronary arteries

only (Unberath et al.; 2016). For further information on data acquisition and the setup

of the experiment, we refer to (Unberath et al.; 2017). Note, that this study did not

include subsequent image reconstruction, which is still challenging due to the heart’s

contraction.

Table 1 shows the geometry and reconstruction parameters for the simulations. We

estimate the object extent rp directly from the motion-corrupted data. We estimate rp by

using the motion that shows the largest displacement in horizontal detector direction.

This way, rp will be rather overestimated, causing smaller triangular regions, which

prevents the mask from extending into spectral energies originating from the scanned

object. First, we build the sum over dimension J and dimension K, resulting in a vector

of size I. Then, we manually identify the distances of the left and right zero-edge to the

center, i.e. ul and ur, respectively. The maximum object extent is estimated by fitting


Parameter Symbol Unit Ideal / Noisy XCAT Knee

SID L mm 600 780

DID D mm 600 418

Object size rp mm 125.0 112.3

Mask adj. ε - 3× 10−3 1× 10−3

Detector Size I×J - 640× 480 620× 480

Pixel size ∆u×∆v mm2 1.2× 1.2 1.2× 1.2

#Projections K - 512 256

Angular spacing ∆λ deg 0.703 1.406

Reconstr. size Rx×Ry×Rz - 512×512×512 512×512×256

Isotr. voxel size - mm 0.5 0.5

#Photons per pixel - - - / 5000 -

Absorption model - - monochromatic

Photon Energy - keV 80

Table 1: Parameters used for FORBILD and XCAT knee cases. For the rotational

angiography dataset we refer to (Unberath et al.; 2017).

a circle to the backprojection of max(ul, ur). This can be formalized by

rp = max(ul, ur)L

L+D(24)

rp = rpL√

L2 + (rp)2, (25)

where Eqn. 24 corrects for magnification and Eqn. 25 for the fan-angle.

As described in Section 2.5, constants s and t have to be selected for the shift values

of the first projection. We decided to set these values to the 2-D ground-truth motion

to allow for an accurate quantitative comparison. Further implications on determining

s and t are discussed in Section 5. We normalized the data term e(α) such that it

evaluates to 100 in the first iteration and the Lagrangian multiplier of the regularizer

was set to κ = 1. Finally, the gradient-based optimizer was restricted to a maximum of

10000 iterations, yet, this limit was never reached in practice (see Section 4.6).

For the XCAT knee dataset the Lagrangian for L2 regularization κ′ as well as ε were

heuristically determined, whereas other parameters remained constant and identical to

the FORBILD cases. We did not have access to ground-truth detector translations.

Thus, we decided to remove the regularizer presented in Eqn. 15 by setting its Lagrangian

multiplier κ = 0. However, a constant bias in motion parameters is also prevented by

usage of the L2 regularizer.

Respiratory motion estimation on coronary artery data generally followed the same

pipeline as was used for the FORBILD cases. Apodization was not necessary as

truncation could already be removed using a virtual DSA approach.


3.1. Simulation of Translational 3-D Motion for FORBILD Phantom

In addition to a motion-free case, we simulated three high-frequency translational motion

patterns and two low-frequency cases. When selecting the motion patterns we made sure

that their visual effect to the triangular regions is widespread, as can be obtained from

the differences of the uncorrected spectra in Figure 8 and Figure 9. We use concepts

of projective geometry to efficiently describe the imaging geometry of a cone-beam CT,

i.e., the mapping of 3-D world to 2-D detector coordinates, using projection matrices

and homogeneous coordinates (Hartley and Zisserman; 2004). Thus, the motion can be

easily incorporated by a right-multiplication with the projection matrices

P k = P k

1 0 0 ∆xk0 1 0 ∆yk0 0 1 ∆zk0 0 0 1

∀k ∈ [1, · · · , K] , (26)

where P k ∈ R3×4 are the original projection matrices of the circular trajectory. The

matrices P k are then used for the creation of the motion-corrupted projection images.

Note, that Eqns. 27 to 31 are normalized such that they take τk = k−1K−1

∈ [0, 1] as

a temporal parameter, which would effectively equal a total scan time of 1 s. Thus,

motion frequencies reported below have to be interpreted in perspective and would have

to be divided by a typically longer scan time. Fig. 5 provides an overview of all simulated

3-D translational motion patterns.

Oscillating Motion (Oscil) Our first motion type represents an accelerated,

oscillating translation along the x-, y- and z-direction. It is a straightforward 3-D

extension of the 2-D motion used in our fan-beam study (Berger et al.; 2014). This

motion has a constant cosine oscillation with zero-mean. Low-frequency contributions

are small due to a moderate setting for the acceleration factor. The translations are

computed as

∆xk = ∆yk = ∆zk = moscil(a, b, f) = a( 2

1 + exp(b cos(2πfτk))− 1), (27)

where a controls the amplitude, b the acceleration and f is the frequency. We

set a = 3 mm, b = 4 and f = 16 Hz. Note, that the amplitude is given by

a(exp(b)− 1)/(exp(b) + 1) ≈ 2.89 mm.

Linear Increase in Frequency (Chirp) This motion describes an oscillating cosine

motion with the same amplitude along the x-, y- and z- direction. It has a zero-mean

and only a moderate amplitude but the frequency increases linearly between [0, p] over

the acquisition time. We use this type of translation characterize the behavior of our

method for a combination of low- and high-frequency motion. It is define by

∆xk = ∆yk = ∆zk = a cos(2πfkτk) , (28)


-202

Oscil

-101

Chirp

-2

0

2

Rect

0246

LF-1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Normalized scan duration τk

0123

LF-2

Figure 5: Plot of all simulated 3-D translations. Except for LF-1 , all translations have

been simultaneously applied to x-, y- and z-direction, to ensure that the object motion

covers all possible directions.

where we set the amplitude to a = 1.5 mm, the frequency to fk = (p τk) and the

frequency pitch to p = 64 s−2.

Rectangular Steps (Rect) With this motion, we model sharp transitions of motion

states by two overlapping rectangular functions of different amplitude and frequency.

The rectangular function is emulated by the Oscil function in Eqn. 27 with an extreme

setting for the acceleration factor b, i.e.,

∆xk = ∆yk = ∆zk = moscil(a1, b, f1) +moscil(a2, b, f2) , (29)

where a1 and a2 as well as f1 and f2 correspond to the amplitude and frequency of

the first and second rectangular signal, respectively. We set a1 = 1.5 mm, f1 = 16 Hz,

a2 = 1.0 mm, f2 = 4 Hz and b = 128.

Low-Frequency Motion 1 (LF-1) We adopted this motion from Yu et al. (2006)

who used the x- and y-axis translation for their evaluation using fan-beam CC. The

object moves linearly over time in along x-, y- and z-axis with three different slopes.

The motion is therefore low-frequent and has a nonzero-mean. The starting and end

position are not identical, leading to a sharp transition in the periodic extension over

the projection angles. This leads to strong streaking artifacts and influences the angular

Fourier transform as it assumes a periodic extension. The motion is defined by

∆xk = q1τk, ∆yk = q2τk, ∆zk = q3τk , (30)


where q1, q2 and q3 are the slopes for x-, y- and z-axis which we set to q1 = 6 mm s−1,

q2 = 4 mm s−1 and q3 = 3 mm s−1.

Low-Frequency Motion 2 (LF-2) In contrast to LF-1 , this low-frequency motion

has the same start and end position. It performs a single forth-and-back transition

along x-, y- and z-axis. The reconstruction artifacts are limited to motion blur and

low-frequency intensity gradients as no sharp transitions exist in the angular direction

of the projection stack. The motion is given by

∆xk = ∆yk = ∆zk = a

(exp(−cos(2πfτk) + 1)− 1

exp(2)− 1

), (31)

where we set the amplitude to a =√

252

mm and the frequency to f = 1 Hz.

3.2. Evaluation Protocol of Estimated 2-D Motion Parameters

We perform a qualitative comparison of the estimated motion parameters using plot

overlays. Additionally, we compute the mean-absolute-distance (MAD) and its standard

deviation (SD) between the estimated and the ground-truth shifting parameters. The

measures are calculated separately for each detector direction for all cases, including

motion-free, noise-free and noisy data. They are defined by

MAD(α) = 1K

K∑k=1

(|sk − gk1||tk − gk2|

)

SD(α) = 1K−1

K∑k=1

((|sk − gk1| −MAD(α)1)2

(|tk − gk2| −MAD(α)2)2

).

(32)

where gk =(g1k g2k

)T

denotes the 2-D ground-truth motion. We compute gk by

projecting the coordinate center as well as the 3-D translations(

∆xk ∆yk ∆zk

)T

to

2-D detector coordinates. Due to the structure of the projection matrices, the results are

directly given by the last columns of P k and P k, which we call ck =(ck,1 ck,2 ck,3

)T

and ck =(ck,1 ck,2 ck,3

)T

, respectively. The projection to 2-D is then given as division

by the third component. As shown in Eqn. 33, gk can be computed by the difference of

the projected 2-D points, scaled by the detector pixel sizes.

gk =

(∆u 0

0 ∆v

)((ck,1/ck,3ck,2/ck,3

)−

(ck,1/ck,3ck,2/ck,3

))(33)

3.2.1. Image-Based Evaluation Protocol We performed three different reconstructions

for all cases using the reconstruction settings mentioned in Table 1. The standard

Feldkamp-Davis-Kress (FDK) algorithm (Feldkamp et al.; 1984) was used for image

reconstruction, followed by clamping negative reconstruction intensities to zero.


Note that the upper bound for the image quality after a correction is determined

by the limitations of the motion model. Detector shifts do not take the scaling effects

into account that occur from motion orthogonal to the detector. To allow for a

separate evaluation of the motion model and the motion estimation we perform three

types of reconstructions. Initially, we created uncorrected reconstructions using the

original projection matrices P k. Thereafter, the parameter vector as obtained from the

minimization of Eqn. 6 was used to create the FCC-based reconstruction. Additionally,

we built reconstructions using the extracted ground-truth motion gk directly as input

to Eqn. 17. These reconstructions serve as an upper bound for the image quality that

can be achieved given this motion model.

On top of a qualitative comparison we added a quantitative evaluation of the

reconstructed volumes with respect to the motion-free reconstruction. We computed

the root-mean-squared-error (RMSE) as well as the structural-similarity index (SSIM)

proposed by Wang et al. (2004). Let f(x, y, z) be the reconstruction that we want to

evaluate and r(x, y, z) the motion-free reference reconstruction. Then, the RMSE can

be computed as

RMSE(f, r) =

(1

RxRyRz

Rx∑x=1

Ry∑y=1

Rz∑z=1

(f(x, y, z)− r(x, y, z))2

) 12

. (34)

It has been shown that the RMSE may not reflect the visual perception of image

differences (Wang et al.; 2004). In contrast, the SSIM was designed to incorporate this

information by merging information about intensity, contrast and structural differences

of two images. It is defined by

SSIM(f, r) =(2µfµr + C1)(2σfr + C2)

(µ2f + µ2

r + C1)(σ2f + σ2

r + C2), (35)

where f and r correspond to volume regions, µf and µr are the mean intensity values of

the regions, σf and σr are their standard deviations and σfr their correlation. Further,

C1 and C2 are constants that assure stability for regions which either have a zero-mean

(µ2f +µ2

r ≈ 0) or are free of variations (σ2f +σ2

r ≈ 0). We adjusted them to C1 = (0.01B)2

and C2 = (0.03B)2 as proposed by Wang et al. (2004), where B is the intensity range

of the full reference volume r(x, y, z). We evaluate the SSIM (see Eqn. 35) locally for

each voxel, using a cubical neighborhood of size 9 × 9 × 9. The final value is obtained

by the mean over all voxel-wise SSIM values.

4. Results

4.1. Qualitative Results for the FORBILD Cases

To provide a qualitative comparison we extracted the central axial slices of the

reconstructed volumes. The first row of Figure 6 shows the reconstructions of the

motion-free reference and the following rows show the reconstruction of high-frequency


motion cases Oscil , Chirp and Rect , respectively. Similarly, Figure 7 depicts two cases

that have initially been corrupted with a low-frequency motion pattern, namely LF-1

and LF-2 .

On the columns of Figure 6 and Figure 7 we show reconstructed slices without

correction (NoCorr), corrected using the proposed method (FCC), and the reference

reconstruction using the ground-truth 2-D motion (CorrGT). The fourth column shows

zoomed details of the head’s anterior region, containing a high amount of bone as well as

a low-contrast region representing the left and right eyeball. Similarly, the fifth column

depicts details of the ear region which contains high intensity variations due to bone

material and inserted air bubbles. The zoom regions are indicated by yellow (dotted)

and green (dashed) bounding boxes on the corresponding NoCorr reconstructions. From

top to bottom the zoomed images correspond to NoCorr, FCC, and CorrGT.

For the Oscil (cf. Figure 6(f-j)) as well as for the Rect motion (cf. Figure 6(p-

t)), we can see clear streaking and blurring artifacts in case no correction was applied.

The Chirp motion has a lower amplitude, yet, a broad range of motion frequencies

is observed, mainly resulting in streaking artifacts (cf. Figure 6(k-o)). The proposed

method was able to improve image quality for all motion types. The detailed views of

the anterior and the ear show that the structural loss of the Rect and Oscil motion could

be well restored and the low-contrast eyeballs and the cerebral fluid can be identified

after the correction. The structural image quality of the corrected reconstructions is

close to that of the reconstructions using the ground-truth motion. Streaking artifacts

are also reduced to a large extent, which can be clearly seen when comparing Figure 6k

with Figure 6l. From the reconstructions based on the ground-truth 2-D motion we can

see that a more accurate motion estimation could further improve streak reduction.

When applied to motion-free projection images our method slightly misestimates

motion in detector u-direction as can be seen in Figure 11a. Note that Figure 6a is

identical to the reconstruction in Figure 6c as the 2-D ground-truth motion was zero.

Both correspond to the motion-free reference volume r(x, y, z).

The reconstructions of the low-frequency motion in Figure 7 are dominated by

blurring artifacts. Streaking is only present for the LF-1 motion due to an angular

discontinuity based on the different starting and end positions of the object. The

streaking could be well reduced by our correction as can be seen in Figure 7b and

Figure 7d. The low-frequency intensity variations resulting from the LF-2 motion

appear well restored when comparing Figure 7f and Figure 7g. The zoomed images

of the ear region in Figure 7e and Figure 7j reveal that the spatial alignment could be

enhanced for both motion types. However, there is still a noticeable difference in edge

sharpness when comparing reconstructions based on the ground-truth motion and the

proposed method.

In addition to the reconstruction results we show the sinograms and the

logarithmically scaled magnitude of their 2-D spectra. The sinograms and spectra

correspond to the central detector row. They are depicted for the high- and low-

frequency motion in Figure 8 and Figure 9, respectively. The first three columns show


(a) NoCorr (b) FCC (c) CorrGT (d) Anterior (e) Ear

(f) NoCorr (g) FCC (h) CorrGT (i) Anterior (j) Ear

(k) NoCorr (l) FCC (m) CorrGT (n) Anterior (o) Ear

(p) NoCorr (q) FCC (r) CorrGT (s) Anterior (t) Ear

Figure 6: Top to bottom: Axial reconstructions for motion-free (None) and high-

frequency motion data (Oscil , Chirp and Rect). Left to right: Reconstructions without

correction (NoCorr), corrected using FCC and corrected with the ground-truth motion

(CorrGT). Zoomed regions of Anterior and Ear regions with results from NoCorr, FCC

and CorrGT as rows. (W: 697 HU, C: 105 HU).

the motion-corrupted sinogram, the corrected sinogram and a zoom to the sinograms’

region highlighted by a red (dotted) box in the motion-corrupted sinograms. The next

two columns depict the corresponding spectra with a zoom to a low-frequency region in

the last column.



(f) NoCorr (g) FCC (h) CorrGT (i) Anterior (j) Ear

Figure 7: Axial reconstructions for low-frequency motion data (LF-1 and LF-2 ).

Arrangement of results identical to Figure 6. (W: 697 HU, C: 105 HU).

High-frequency motion artifacts are clearly visible in the sinogram domain,

especially for Oscil (cf. Figure 8g) and Rect (cf. Figure 8s), due to their higher

motion amplitudes. Also the low-amplitude Chirp motion can be well identified in

the zoomed sinogram in Figure 8o. Compared to the spectrum of the motion-free

reference in Figure 8d, the triangular regions for all high-frequency motion patterns

show a substantial increase of energy. Interestingly, Chirp motion exhibits a wider

redistribution of spectral energies into the triangular regions as the Oscil motion.

This indicates that the redistribution of energies is more sensitive to the motion’s

frequency rather than its amplitude. Our correction method was able to restore the

distorted sinogram outline which can be clearly seen in the zoomed sinogram regions of

Figure 8(i,o,u). Additionally, large portions of the spectral energies could be moved back

to the object’s support region, leading to a clearly visible restoration of the triangular

regions.

When applying our motion correction to the motion-free data we notice very little

difference in the sinogram (cf. Figure 8a and 8b) as well as the spectral domain (cf.

Figure 8d and 8e).

Due to the low-frequency object motion, only little effect can be seen in the sinogram

domain for the LF-1 and LF-2 motion types. Similarly, the difference between corrected

and motion-corrupted sinograms is only visible from a slight shift and deformation of

the sinogram boundary in Figure 9c and 9i. The changes to the triangular regions

are substantially smaller than for the high-frequency motion patterns. The vertical

bar in the center of the spectrum for the LF-1 motion type corresponds to the angular


(a) NoCorr (b) FCC (c) Edge (d) NoCorr (e) FCC (f) LF

(g) NoCorr (h) FCC (i) Edge (j) NoCorr (k) FCC (l) LF

(m) NoCorr (n) FCC (o) Edge (p) NoCorr (q) FCC (r) LF

(s) NoCorr (t) FCC (u) Edge (v) NoCorr (w) FCC (x) LF

Figure 8: Sinograms and their log-spectra for the central slice of the projection stack.

Top to bottom: Motion-free (None) and high-frequency motion data (Oscil , Chirp and

Rect). Left to right: Sinogram without correction (NoCorr), corrected using FCC, two

zooms at the sinograms’ edge (Edge), the log-spectrum of the uncorrected and corrected

sinogram and a zoom to the lower frequencies of the spectra (LF). Visualization window

for sinograms and log-spectra were [1.5, 5] and [5, 10], respectively.

discontinuity due to varying start and end positions of the object. In contrast corrections

for motion type LF-2 in Figure 9j are almost identical to the motion-free spectrum in

Figure 8d. Our correction method was able to visually reduce the energy inside the

spectral regions belonging to the LF-1 motion. However, the energy spread of the LF-2


(a) NoCorr (b) FCC (c) Edge (d) NoCorr (e) FCC (f) LF

(g) NoCorr (h) FCC (i) Edge (j) NoCorr (k) FCC (l) LF

Figure 9: Sinograms and log-spectra for low-frequency motion data (LF-1 and LF-2 ).

Arrangement of results and visualization windows identical to Figure 8.

motion in Figure 9l into the triangular regions even appears to increase.

In Figure 10 we show examples of reconstructions, sinograms, and spectral domain

visualizations of a noise-prone case. In general, our correction method performed

similarly when applied to data with a considerable amount of noise. Despite the

reduction in image quality due to the noise itself, we could not observe a lower accuracy

of the motion estimation when compared to the noise-free examples. This is further

supported by the quantitative evaluation of the motion estimates in Table 3. Even

though Figure 10i and 10j clearly show the noise in the spectral domain, we observed

only minor differences between the motion estimates of noisy and noise-free data.

4.2. Quantitative Results for the FORBILD Cases

4.2.1. Image-Based Measures The SSIM and RMSE results are given in Table 2a and

Table 2b, respectively. The first three rows of the tables show the results for the ideal,

noise-free data, whereas the last three rows correspond to the noisy data. We evaluated

reconstructions without correction (NoCorr), the corrected versions using the proposed

method (FCC) and the corrections using the 2-D ground-truth motion (CorrGT). For

presentation SSIM and RMSE values are scaled by a factor 100. Thus, the maximum

SSIM value is 100, which corresponds to the minimum RMSE of 0, i.e., both volumes

would be identical. Note that the CorrGT values build an upper bound for the FCC

results, as they merely reflect loss of image quality due to the motion model itself.

For all motion-corrupted datasets, the FCC led to a substantial increase of the

SSIM value and a clear reduction of the RMSE when compared to the NoCorr values.

The RMSE values of the FCC results are in a comparable range for all motion cases.



(f) NoCorr (g) FCC (h) Edge (i) NoCorr (j) FCC (k) LF

Figure 10: Top: Axial reconstructions for the noisy Oscil data. Bottom: Sinograms

and spectra for central slice of projection stack. The motion estimation worked equally

well as for the ideal data, despite the high amount of noise. Visualization windows are

equal to Figure 6 and Figure 8.

In contrast, the improvements in SSIM are not as consistent. Cases with very high-

frequency motion, i.e., Chirp and Rect , showed the highest difference between CorrGT

and FCC, whereas, for other motion types FCC yielded measures close to the CorrGT

results. From the SSIM results of CorrGT we can see that the high-frequency motions,

i.e., Oscil , Chirp, and Rect , were most difficult to reconstruct, which may correspond

to the increased level of streaking artifacts due to fast variations in the motion. These

motion types also had the lowest SSIM if no correction was applied (NoCorr). When

applying the FCC to a motion-free dataset (None), we notice a decrease of the SSIM

by 1.6 and 1.7 for the ideal and noisy data, respectively. Similarly, Table 2b shows

an increase of the RMSE by 2.6 for the ideal and 1.5 for the noisy data. Noise did

not substantially change the RMSE value in case of the motion-corrupted data, yet the

uncorrected, motion-free reconstruction increased from 0 to 1.4. The SSIM appears to

be more sensitive to the noise with a reduction from 100 to 85.3 in case of the motion-

free reference reconstruction. Despite the general reduction of SSIM values in case of

noise, the relative changes of the FCC compared to NoCorr results are similar to those

of the noise free cases, showing the methods robustness against noise.

4.2.2. Motion Estimation Accuracy Figure 11 shows the estimated motion sk and tk in

blue for the high-frequency and the low-frequency motion, and the projected 2-D ground-

truth motion gk in red on noise-free data. Additionally, Table 3 shows the MAD and


None Oscil Chirp Rect LF-1 LF-2

Idea

l NoCorr 100.0 61.7 73.6 69.2 85.1 92.7

FCC 98.4 87.6 84.0 88.5 96.2 97.4

CorrGT 100.0 90.7 95.4 94.2 97.6 99.1N

oisy

NoCorr 85.3 53.8 64.2 60.6 72.4 78.5

FCC 83.6 74.9 72.3 75.7 81.9 82.9

CorrGT 85.3 77.5 81.2 80.3 83.2 84.6

(a) SSIM results


Idea

l NoCorr 0.0 11.4 4.8 8.0 13.7 8.2

FCC 2.6 3.6 3.3 3.3 3.4 3.4

CorrGT 0.0 2.1 0.8 1.7 2.6 2.0

Noi

sy

NoCorr 1.4 11.5 5.1 8.1 13.7 8.3

FCC 2.9 3.9 3.6 3.6 3.7 3.7

CorrGT 1.4 2.6 1.7 2.2 3.0 2.4

(b) RMSE results

Table 2: Quantiative results for reconstructions without correction (NoCorr), corrected

using the FCC method, and corrected using the 2-D ground-truth motion (CorrGT).

Evaluation was performed for all motion types (see columns) with ideal and noisy data.

its SD between 2-D ground-truth and estimated motion for the ideal and noisy data.

Note, that the geometric effect at the center of the reconstruction is approximately

scaled by the ratio of source-isocenter- to source-detector-distance LL+D

, which in our

case evaluates to 12.

The motion in vertical detector direction tk is estimated very well in all cases and

can even follow the sharp transitions of the Chirp and Rect motion (cf. Figure 11c and

11d). Also, on motion-free data (None) only very little fluctuation occurs. Quantitative

results for tk in Table 3 support these findings, where the highest MAD values evaluate

to 708 µm for Chirp, 415 µm for Oscil and 293 µm for Rect . After scaling by 12

to estimate

potential effects on the central reconstruction domain, all MAD values for tk result in

values below the reconstruction voxel size of 500 µm.

Somewhat larger deviations are observed for the horizontal detector direction sk.

As can be seen in Figure 11 all estimated motion curves show a low-frequency offset

to the ground-truth. Yet, the general trend of the motions’ variation is well covered,

particularly for high-frequency motion in Figures 11(b,c,d). The reduced accuracy is

also reflected in the MAD and SD values in Table 3, where the MAD values range

between 814 µm for the motion-free data up to 1184 µm for the Chirp motion. The SD

values are generally higher than those of tk with the highest values for Chirp and Rect .


-2

-1

0

1

sk[m

m]

100 200 300 400 500

Projection k

-0.5

0

0.5

tk[m

m]

(a) None

-10

-5

0

5

sk[m

m]

100 200 300 400 500

Projection k

-5

0

5

tk[m

m]

(b) Oscil

-6-4-2024

sk[m

m]

100 200 300 400 500

Projection k

-2

0

2

tk[m

m]

(c) Chirp

-10

-5

0

5

sk[m

m]

100 200 300 400 500

Projection k

-5

0

5

tk[m

m]

(d) Rect

-5

0

5

10

sk[m

m]

100 200 300 400 500

Projection k

0

2

4

6

tk[m

m]

(e) LF-1

-8-6-4-2024

sk[m

m]

100 200 300 400 500

Projection k

0

2

4

6

tk[m

m]

(f) LF-2

Figure 11: Plots of estimated motion parameters sk and tk (solid, blue line) and the 2-D

ground-truth motion (dotted, red line).

We made an interesting observation when having a closer look at the difference of

estimated and ground-truth motion for the horizontal detector direction, i.e., sk−g1k . In

Figure 12a we plotted this difference for all cases. It shows that the error is dominated

by the motion estimated on the motion-free data. Thus, the estimated motion sk always

consists of an additive mixture of the actual motion signal and the error estimated on


None Oscil Chirp Rect LF1 LF2Id

eal s 814 (472) 934 (548) 1184 (681) 913 (632) 909 (631) 960 (447)

t 11 (0) 415 (121) 708 (295) 293 (228) 217 (5) 194 (4)

Noi

sy s 800 (457) 923 (541) 1171 (673) 899 (620) 881 (543) 946 (431)

t 11 (0) 418 (123) 708 (295) 293 (228) 125 (5) 195 (4)

Table 3: Mean absolute difference and standard deviation (in brackets) between

estimated motion parameters and 2-D ground-truth motion given in µm.

100 200 300 400 500

Projection k

-2

0

2

(sk−

g1 k)[m

m]

(a) Horizontal motion error

-6-4-2024

sk[m

m]

-2

-1

0

1

sk[m

m]

-4

-2

0

2

4

s k[m

m]

=

−

(b) Additive model

Figure 12: (a) Differences between estimated and ground-truth motion show very similar

deviations for horizontal direction. (b) Subtracting the motion, estimated on the motion-

free reference data, from the estimated motion in case of motion-corrupted data, reveals

a very accurate motion estimation.

the motion-free data. In Figure 12b we subtract the motion estimated on the motion-

free data (cf. Figure 11a) from the estimated Chirp motion in Figure 11c resulting in

a very accurate estimation of the Chirp motion. We noticed the same effect for the

other motion patterns. Note, that the approach shown in Figure 12b is for visualization

purposes only and was not applied to obtain reconstruction results.

The deviation for the motion-free data has an MAD of 814 µm and is in the range

of −2.33 mm to 1.78 mm. The approximate motion effect in at the isocenter will be

scaled by a factor of 12

(see Section 4.2.2) resulting in 407 µm which is still below the

reconstruction voxel size of 500 µm.

4.3. XCAT Knee Phantom Under Weight-Bearing Conditions

Fig. 13 shows reconstructed images of the XCAT phantom. The images show axial

slices through tibia and fibula (top row) and sagittal slices of right-sided femur, patella,

and tibia (bottom row). From left to right, we can see reconstructions without motion

correction (NoCorr) in Fig. 13a, after correction with FCC in Fig. 13b, and the motion-

free data (GT) in Fig. 13c. Severe motion artifacts can be seen in case no correction is

applied, manifested by streaking and blurring. In addition, the outlines of femur and


(a) NoCorr (b) FCC (c) GT

Figure 13: Reconstructions of the XCAT squat phantom. Top row: Axial slices inferior

to the knee joint. Bottom row: Sagittal slices through the right leg. The colmuns

correspond to reconstructions without correction (NoCorr), after correction using FCC,

and of the motion-free projection images (GT). (W: 1815 HU, C: 246 HU).

patella are hardly visible in the sagittal slice of Fig. 13a. Our method was able to restore

most of the structure at left and right tibia and shows a clearer visibility of the fibulas.

The structure of femur and patella improved by a large amount, yet, the difference to

the ground-truth reconstruction is still substantial. Overall, the skin surface and soft

tissue appears sharper after correction and contains less streaking artifacts.

The same quantitative evaluation as used for the FORBILD phantoms was

performed. The resulting values support the visual impression in Fig. 13, where NoCorr

yielded an SSIM of 69.7 and an RMSE of 5.69. After applying the proposed motion

correction, image quality improved, resulting in an SSIM of 79.2 and an RMSE of 5.19.

Sinograms of the XCAT knee phantom are shown in the top row of Fig. 14. Irregular

trajectories of object boundaries clearly depict the motion artifacts for the uncorrected

sinogram in Fig. 14a. A location with a very apparent motion artifact is indicated by

a red arrow. After correction with FCC, the sinogram appears smoother and a clear

improvement can be seen at the location of severe motion. Yet, we still observe several

differences when comparing FCC and GT (cf. Figures 14b and 14c). The spectra, shown

with a logarithmic scaling at the bottom row of Fig. 14, depict the triangular regions

for a vertical frequency of ψ = 0 mm−1. The proposed motion correction could remove

a large amount of energy from the triangular regions compared to NoCorr, yielding a

sharper outline of the zero regions. However, after correction we still notice a higher

amount of energies located within the zero regions, compared to the ground truth in

Fig. 14c. In general, the spectral energies are in line with their assigned sinograms and


(a) NoCorr (b) FCC (c) GT

Figure 14: Sinograms and spectra for the XCAT squat phantom. Top row: Sinograms

extracted at a central slice. Bottom row: Spectra of the central slice, i.e., for ψ = 0 mm−1

after logarithmic scaling. Visualization windows of sinograms and spectra are [0.05, 0.5]

and [4, 12], respectively.

(a) Projection of DSA

20 40 60 80 100 120

Projection k

-20

0

20

tk[m

m]

(b) Respiratory motion estimates

Figure 15: (a) Angiographic image after virtual DSA. (b) Comparison of estimated

(blue) and ground-truth (red) detector translations for respiratory motion estimation.

the achieved image quality.

4.4. Respiratory Motion Estimation in Rotational Angiography

We conducted a feasibility study for respiratory motion in rotational angiography. The

breathing motion is estimated by vertical detector translations only. Ground-truth

translations were extracted by tracking two bifurcation points at a proximal and distal

point within the coronary tree. Finally, the ground-truth is given by the average

displacement of the tracked point locations.


Fig. 15a shows a typical projection image after application of virtual DSA, with the

background being well removed. The respiratory motion estimated by FCC is depicted

as blue curve in Fig. 15b. We can clearly see that the estimation accurately follows the

extracted ground truth, shown as red curve. Even the local maxima, which correspond

to craniocaudal displacement during the cardiac cycle, are well covered by FCC. Even

though FCC are not yet applicable for non-rigid deformations, they may be used to

improve initialization for a non-rigid cardiac reconstruction.

4.5. Mask Sensitivity Analysis

The goal of this evaluation is to verify the effect of the mask’s parameters. Additionally,

we aim to obtain a deeper insight in the systematic deviation of motion estimated in

detector u-direction. According to our definition in Eqn. 3 and the adjustment from

Section 2.4, the shape of the triangular regions depends on the estimated object radius

rp and the adjustment parameter ε. To get a better understanding of their non-intuitive

effect on the estimation accuracy, we performed a 2-D grid-search over rp and ε values.

The MAD has been computed for each parameter configuration and for each FORBILD

case. Note, that due the computational complexity (see Section 4.6), the projection

volume was downsampled by a factor of four, which also led to generally worse results

of the MAD.

We varied the object radius rp from its estimated value (cf. Eqn. 25) in the range

of ±50 mm with a sampling distance of 2 mm. Accordingly, the extension parameter

ε has been adjusted in the range of ε ∈ [−0.01, 0.01] with a sampling distance of

4× 10−4. Note, that a negative ε actually decreases the triangular regions, similar

to a morphological erosion, whereas, a positive value increases the region’s coverage as

proposed in Fig. 2.

Fig. 16 shows the MAD results for selected FORBILD cases. The images hold MAD

errors in its gray values whereas the image x- and y-axis encode rp and ε, respectively.

The top row corresponds to MAD(α)1 and the bottom row to MAD(α)2, i.e., to

detector u- and v-direction, respectively. Interestingly, the regions where ε > 0, i.e.,

the mask has been extended, are reproducible over all cases. However, for ε < 0,

i.e., for a reduced mask size, the results seem to depend strongly on the type of

simulated motion. For detector v-direction we observe generally better results than

for the horizontal translations. With an ε > 0 consistently good results were achieved,

relatively independent of variations in rp, which indicates a robust method for estimation

of vertical detector translations.

The orange cross represents the parameter settings used for our evaluation, thus,

they correspond to the additive bias presented in Fig. 12. For certain motion types we

were able to find parameter settings with an ε < 0 (smaller mask) that led to improved

motion estimates in u-direction. For None, this is the case within the black area in

the lower right of the top image in Fig. 16a. For Oscil these beneficial configurations

are scattered within the streaky pattern, meaning that small variations could cause


0

0

ε

ε

rp

(a) None

rp

(b) Oscil

rp

(c) Chirp

rp

(d) LF-2

Figure 16: Sensitivity analysis of mask parameters. Intensities correspond to MAD,

evaluated after each optimization. Top row: Results for detector u-direction. Bottom

row: Results for detector v-direction. Parameter ε ∈ [−0.01, 0.01] and rp is deviating

from its initial estimate by ±50 mm. Note, the reproducibility over all cases for ε > 0

(larger mask), whereas results for ε <= 0 (smaller mask) depend stronger on the motion

type. Visualization window: 0 mm to 10 mm.

large changes in the accuracy of the estimated motion. None of these settings could be

identified for low-frequency motions LF-1 and LF-2 .

In conclusion, an ε > 0 led to higher robustness and stable results over all motion

types. Yet, depending on the motion, the error might be reducible with a reduction of

the mask size (ε < 0). However, the outcome may be less predictable and parameter

tuning becomes more difficult.

Generally, we noticed that u-direction motion estimates with a larger error, i.e.,

corresponding to the white areas of the images in Fig. 16, are always dominated by

a relatively harmonic, low-frequency signal. This is well visualized in Fig. 17, where

we show 100 estimated motion curves for randomly selected parameter configurations

of the Oscil case. For all estimations we see that the actual oscillating motion is well

estimated, yet, it is added to the clearly visible erroneous low-frequency perturbation.

One reason for the presented low-frequency bias could be the heuristic extension of

2-D to 3-D FCC introduced in Section 2.1. To verify the validity of our 3-D extension, we

performed an identical sensitivity analysis for mask parameters on 2-D fan-beam data.

We repeated the same numerical FORBILD simulations in 2-D, including the rendering

of sinograms. Therefore, only the central slice of the detector is used and the axial

direction of the individual motion types was set to ∆zk = 0. Due to the computational


100 200 300 400 500

Projection k

-100

-50

0

50

sk[m

m]

Figure 17: Random selection of estimated u-direction translations for Oscil motion for

various combinations of rp and ε.

0

ε

rprp

(a) None

rp

(b) Oscil

rp

(c) Chirp

rp

(d) LF-2

Figure 18: Sensitivity analysis of mask parameters for 2-D fan-beam datasets. The

parameter ranges for rp and ε are identical to the 3-D evaluation, yet, a lower a sampling

rate has been applied. Behavior, of optimization method follows same trend as shown

for the 3-D results (cf. Fig. 16). Visualization window: 0 mm to 30 mm.

complexity, we used a lower sampling rate, yet, the parameter ranges were identical to

the 3-D experiment.

The resulting sensitivity maps for the 2-D fan-beam case are shown in Fig. 18.

The motion refers only to detector shifts in u-direction, as for the fan-beam case no

v-direction is present. We also observe reproducible results for ε > 0, whereas the

distributions of MAD values for ε < 0 are very similar to the 3-D evaluation. We

also observed very similar motion patterns than were estimated for the 3-D cases.

This includes the low-frequency bias shown in 17. Overall, the 3-D extension of FCC

performed very similar to the already more established 2-D FCC. In conclusion, the

results indicate that the systematic low-frequency bias does not originate from the

approximations made when extending FCC to 3-D cone-beam CT.

4.6. Robustness and Runtime

To evaluate the computational effort of the proposed algorithm, we performed runtime

measurements for all simulations involving the FORBILD phantom. Thus, a total of



250500750

1000125015001750

Runtime[s]

32 28

214 283

196

191 229

208

222 297

42 42

Figure 19: Total runtime of FCC motion estimation for all FORBILD cases. Depending

on the number of iterations (printed in white), the computation times lie between 3 min

to 30 min.

12 cases have been analyzed, including different motion patterns and noisy as well as

noise-free data. All performance measurements were collected on a desktop computer

(Windows 7 64-Bit, Intel Xeon CPU E5-1650 v3, 3.5 GHz, 32 GB RAM, Nvidia Quadro

K5200, 8 GB Graphics Memory). The maximum number of iterations was never reached

by the optimizer, thus, the optimization was either terminated by too small values within

the function gradient or too little variations of the motion parameters.

Fig. 19 depicts the total runtime for all FORBILD cases, where the left and right

bar of each case correspond to noise-free and noisy data, respectively. The number of

iterations used by the optimizer is reported within the bars. The longest computation

time of 1802 s was measured for the noisy LF-1 data and the lowest time of 189 s for the

noisy, motion-free data. The overall runtime appears to scale linearly with the number

of iterations. Runtimes of individual components are independent of the iteration count

and provide a fairer comparison. Overall, 1984 iterations were performed, using the

same number of gradient computations. The median gradient computation took 5734

[5725, 5743] ms, where the brackets report lower and upper quartiles. Each gradient

computation consisted of 512 computation steps for partial derivatives with respect to

sk and tk, having a median computation time of 11.2 [11.15, 11.24] ms. The number of

cost-function evaluations was 2125, hence, higher than the iteration count due to the

optimizer’s line-search approach. The median evaluation time was 87.14 [87.58, 88.27]

ms.

During our evaluation we initialized the estimated motion parameters with a

null vector. To evaluate the robustness of the optimization process with respect to

larger deviations in the initial motion estimates, we repeatedly initialized the motion

parameters with detector translations drawn from a 2-D Gaussian distribution. A

maximum intensity was determined by setting the standard deviations in u- and v-

direction to σu = 131 mm3

and σv = 25 mm3

, i.e., to a third of the non-radiated image

borders. Translations larger than the border were truncated to the border width. Thus,

no unrealistic periodic reappearance of the image content was ensured. The standard

deviations were varied ten times by a factor equally distributed within [0, 1]. For each

factor, we repeatedly applied the FCC algorithm ten times, yielding a total of 100


1 2 3 4 5 6 7 8 9 10

Parameter Noise Intensity

0.81

1.21.41.61.8

MAD(α

) 1[m

m]

(a) Detector u-direction

1 2 3 4 5 6 7 8 9 10

Parameter Noise Intensity

0.4

0.6

MAD(α

) 2[m

m]

(b) Detector v-direction

Figure 20: MAD for the Oscil dataset. Parameters were initialized with increasing

intensity of Gaussian noise. Blue crosses show ten repetitions per intensity level, while

the red curve depicts the average error.

runs. Fig. 20 shows that the estimation error increases only moderately with increasing

noise level. This supports that the parameter estimation is reproducible and robust to

variances in parameter initialization.

5. Discussion and Conclusion

We extended the Fourier Consistency Conditions (FCC) from fan-beam (Berger et al.;

2014) to cone-beam CT. It is applied directly in the Fourier-space of the projection

image stack. Our experiments show that it substantially reduces artifacts due to 3-D

translational object motion, with the ability to detect even very high-frequency motion.

A clear advantage of our method is that we do not require any temporal assumptions for

the motion: for example Yu et al. (2006) limited their estimation to a polynomial model

along time resulting in only 6 parameters. In contrast, we were able to successfully

estimate 2K − 2 = 1022 detector shifts independently for each projection, which allows

compensation to a great variety of translational motion shapes. An extension to the

full set of affine or even non-linear deformations is possible using the more general cost

function in Eqn. 2. These deformations would need to be implemented in the spatial

domain which prohibits the use of precomputed 2-D FFTs on the projection images and

requires a 3-D FFT for each partial derivative and cost function evaluation.

Large improvements in image quality were achieved by the FCC correction method

on the FORBILD phantom for all tested motion patterns, with SSIM and RMSE

values close to that of the CorrGT reconstructions in most of the cases. In general,

the high-frequency motions, i.e., Oscil , Chirp, and Rect , had the lowest SSIM when

reconstructing with the ground-truth motion CorrGT. This may correspond to the

increased level of streaking artifacts caused by the fast variations within these motion

patterns. The visually observed energy within the triangular regions after applying

the proposed method is in agreement with the observed residual reconstruction errors,

i.e., higher for high-frequency motion and lower for low-frequency motion. Certain

motion types, such as LF-2 , appear to have only limited effect to the triangular regions.

Nevertheless, reconstruction results improved also for the LF-2 motion and the MAD


is comparable to other motion patterns.

The visual impression of the reconstructions is in our case better reflected by SSIM

than RMSE. We believe that this is due to the SSIM’s sensitivity to the edge alignment

of the object which is very indicative for quantifying motion artifacts. The SSIM also

allows the verification of the applicability of a plain shifting motion model for different

motion cases using the CorrGT reconstructions. The reasons for the reduction in SSIM

in case of the CorrGT reconstructions are two-fold. First, the SSIM is influenced by

inaccuracies of the approximative motion compensation algorithm (Schaefer et al.; 2006)

especially for higher motion amplitudes. Second, a plain shifting motion model can

only approximate 3-D translational motion, as the shape of the projection as well as its

intensities may vary based on the magnification factor as well as the cone-angle of the

CT system.

Our results show that the FCC cost function is robust to even severe Poisson

distributed noise artifacts. This can be clearly seen by comparing the results of Figure 10

to the corresponding noise-free results in Figure 6(f-j) and Figure 8(g-l). Further, the

relative improvement of SSIM and RMSE in Table 2 and for MAD and SD in Table 3,

are not much affected by noise.

The proposed method showed very accurate results when estimating axial motion

using the vertical detector direction. This makes it a promising candidate for correction

of respiratory motion, where often only vertical detector translations are estimated,

e.g., in the field of radiotherapy (Rit et al.; 2013). In addition to the FORBILD cases

we could show that FCC are also applicable to more realistic cases. We show that

respiratory motion can be accurately tracked when applied to an XCAT-based rotational

angiography of the coronary arteries. Additionally, we were able to successfully apply

FCC to a knee phantom simulated under weight-bearing conditions, which was based

on real patient motion. This also allowed testing of the truncation robust extension of

FCC using apodization windows, as introduced in Section 2.6.

A limitation of our method is that it is invariant to constant detector offsets,

i.e., a general misalignment of the detector cannot be corrected. For our experiments

we decided to alleviate this problem by fixing the motion parameters of the first

projection image to the ground-truth motion. This was helpful in multiple ways.

First, we made sure that an undefined value does not reduce image quality which

allows a fair qualitative and quantitative comparison. Second, adjusting the initial

parameters to the ground-truth motion implicitly aligns the reconstructions’ coordinate

systems for all tested motion cases which is essential for a meaningful SSIM and RMSE

computation. However, we are aware that the ground-truth motion is not available

in a real scenario. We expect that a moderate offset in v-direction will manifest in a

slight increase of cone-beam artifacts and an object shift in z-direction. In contrast a

constant shift in u-direction has more severe implications on the reconstruction result.

Thus, estimating only one parameter may be sufficient to obtain a reasonable image

quality. A straightforward estimation method is to create reconstructions for various

u-direction offsets and either manually select the visually most accurate reconstruction


or automatically select the best reconstruction based on image quality measures such

as Entropy or Positivity (Rohkohl et al.; 2013).

An important result of our study is the systematic low-frequency deviation of the

motion in horizontal detector direction as outlined in Figures 12a and 12b. The key

finding is that the erroneously estimated motion in case of the motion-free data can

also be found as additive bias in the residual error of the motion estimates for the

motion-corrupted data. Figure 17 shows that the low-frequency deviation is often

reflected by sinusoidal harmonics, depending on the parameter configuration of the

mask. Yet, we were not able to specify its source exactly. One reason could be the

mask’s discretization, as shown in Section 2.4. We propose a slight increase of the mask

size to avoid undefined spectral regions. Our results indicate that this extension is able

to stabilize the optimization and reduces the deviation to a much smaller amplitude.

Additionally, we were able to show that the same deviations are also present in 2-D

fan-beam geometries, proving that they do not originate from approximations during

our extension to 3-D cone-beam CT.

For future work we plan an extension of the method to also lateral truncations.

Additionally, we will aim to get a better understanding of the sources for the reported

deviations in horizontal detector direction, e.g., by a comparison with the parallel beam

geometry, where FCC have been initially derived for. The goal is to reduce the residual

error for the horizontal detector motion to a similar level than for the vertical direction,

yielding very accurate motion estimates and improved image quality.

6. Conclusion

In summary, this work presents a novel approach for motion estimation and

compensation in 3-D cone-beam CT. The method is based on FCC that are defined in

the Fourier domain of the acquired projection image stack. The method works directly

in the projection domain and estimates 2-D detector shifts for each projection image

in order to optimize an FCC-based cost function. Even though detector shifts may

be limited to simple 3-D motion, the presented method does not rely on temporal

assumptions and can therefore be considered for many types of motion. Resulting

motion parameters can be easily incorporated during the final backprojection step of

the image reconstruction, thus, no iterative or intermediate reconstruction is necessary.

We do not require any prior reconstructions and also no additional signals such as

externally attached fiducial markers or an ECG. Our extension of FCC to 3-D cone-

beam CT showed promising results, when applied to a variety of motion patterns on

a challenging numerical phantom. Furthermore, we could successfully apply FCC to a

realistic phantom of knees under weight-bearing conditions and for respiratory motion

estimation in rotational angiography of the coronaries. Particularly good results were

achieved for motion along the vertical detector directions, i.e., collinear to the CT’s

rotation axis. Even high-frequency motion could be accurately identified and estimated

such that we noticed a great reduction of motion-induced streaking artifacts.

REFERENCES 38

Acknowledgments

The authors gratefully acknowledge funding of the Research Training Group 1773

Heterogeneous Image Systems and of the Erlangen Graduate School in Advanced Optical

Technologies (SAOT) by the German Research Foundation (DFG) in the framework of

the German excellence initiative. Further, we acknowledge funding support from the

NIH Shared Instrument Grant S10 RR026714 supporting the zeego@StanfordLab, the

NIH Weight-bearing imaging Grant R01 AR065248, and Siemens AT.

References

Berger, M. (2016). Motion-Corrected Reconstruction in Cone-Beam Computed

Tomography for Knees under Weight-Bearing Condition, PhD thesis, Friedrich-

Alexander-Universitat Erlangen-Nurnberg.

Berger, M., Maier, A., Xia, Y., Hornegger, J. and Fahrig, R. (2014). Motion

Compensated Fan-Beam CT by Enforcing Fourier Properties of the Sinogram, in

F. Noo (ed.), Proceedings of the third international conference on image formation in

x-ray computed tomography, pp. 329–332.

Berger, M., Muller, K., Aichert, A., Unberath, M., Thies, J., Choi, J.-H., Fahrig, R.

and Maier, A. (2016). Marker-free motion correction in weight-bearing cone-beam ct

of the knee joint, Medical Physics 43(3): 1235–1248.

Brokish, J. and Bresler, Y. (2006). Sampling Requirements for Circular Cone Beam

Tomography, Nuclear Science Symposium Conference Record, 2006. IEEE, Vol. 5,

pp. 2882–2884.

Choi, J.-H., Fahrig, R., Keil, A., Besier, T. F., Pal, S., McWalter, E. J., Beaupre,

G. S. and Maier, A. (2013). Fiducial marker-based correction for involuntary motion

in weight-bearing C-arm CT scanning of knees. Part I. Numerical model-based

optimization., Medical Physics 40(9): 091905.

Choi, J.-H., Maier, A., Keil, A., Pal, S., McWalter, E. J., Beaupre, G. S., Gold, G. E. and

Fahrig, R. (2014). Fiducial marker-based correction for involuntary motion in weight-

bearing C-arm CT scanning of knees. II. Experiment, Medical Physics 41(6): 061902.

Clackdoyle, R. and Desbat, L. (2013). Full data consistency conditions for cone-beam

projections with sources on a plane, Physics in Medicine and Biology 58(23): 8437.

Clackdoyle, R. and Desbat, L. (2015). Data consistency conditions for truncated

fanbeam and parallel projections, Medical Physics 42(2): 831–845.

Edholm, P. R., Lewitt, R. M. and Lindholm, B. (1986). Novel Properties Of The Fourier

Decomposition Of The Sinogram, in T. F. Budinger, Z.-H. Cho and O. Nalcioglu

(eds), Proc. SPIE 0671, Physics and Engineering of Computerized Multidimensional

Imaging and Processing, International Society for Optics and Photonics, pp. 8–18.

Ens, S., Ulrici, J., Hell, E. and Buzug, T. M. (2010). Automatic detection of patient

REFERENCES 39

motion in cone-beam computed tomography, 2010 IEEE International Symposium on

Biomedical Imaging: From Nano to Macro, IEEE, pp. 1257–1260.

Feldkamp, L. A., Davis, L. C. and Kress, J. W. (1984). Practical cone-beam algorithm,

Journal of the Optical Society of America A 1(6): 612–619.

Gendrin, C., Furtado, H., Weber, C., Bloch, C., Figl, M., Pawiro, S. A., Bergmann,

H., Stock, M., Fichtinger, G., Georg, D. and Birkfellner, W. (2012). Monitoring

tumor motion by real time 2D/3D registration during radiotherapy., Radiotherapy and

oncology : journal of the European Society for Therapeutic Radiology and Oncology

102(2): 274–80.

Harris, F. J. (1978). On the use of windows for harmonic analysis with the discrete

fourier transform, Proceedings of the IEEE 66(1): 51–83.

Hartley, R. I. and Zisserman, A. (2004). Multiple View Geometry in Computer Vision,

2 edn, Cambridge University Press, Cambridge, Great Britain.

Hawkins, W., Leichner, P. and Yang, N.-C. (1988). The circular harmonic transform

for spect reconstruction and boundary conditions on the fourier transform of the

sinogram, Medical Imaging, IEEE Transactions on 7(2): 135–138.

Helgason, S. (1980). The Radon Transform, 1 edn, Birkhauser Boston, Boston, MA,

USA.

John, F. (1938). The ultrahyperbolic differential equation with four independent

variables, Duke Mathematical Journal 4(2): 300–322.

Lauritsch, G. and Bruder, H. (2001). Forbild head phantom, Website. last checked:

2016/01/21.

Leng, S., Nett, B., Speidel, M. and Chen, G.-H. (2007). Motion artifact reduction in

fan-beam and cone-beam computed tomography via the fan-beam data consistency

condition (FDCC), Proc. SPIE, Vol. 6510, pp. 65101W–65101W–13.

Li, T., Xing, L., Munro, P., McGuinness, C., Chao, M., Yang, Y., Loo, B. and Koong,

A. (2006). Four-dimensional cone-beam computed tomography using an on-board

imager, Medical Physics 33(10): 3825–3833.

Ludwig, D. (1966). The radon transform on euclidean space, Communications on Pure

and Applied Mathematics 19(1): 49–81.

Maier, A. and Berger, M. (2010). Jpop: Java parallel optimization package, Website.

https://www5.cs.fau.de/research/software/java-parallel-optimization-package/, last

checked: 2017/05/23.

Maier, A., Hofmann, H., Schwemmer, C., Hornegger, J., Keil, A. and Fahrig, R. (2012).

Fast Simulation of X-ray Projections of Spline-based Surfaces using an Append Buffer,

Physics in Medicine and Biology 57(19): 6193–6210.

Mazin, S. R. and Pelc, N. J. (2010). Fourier properties of the fan-beam sinogram.,

Medical physics 37(4): 1674–80.

REFERENCES 40

Muller, K., Maier, A., Zheng, Y., Wang, Y., Lauritsch, G., Schwemmer, C., Rohkohl,

C., Hornegger, J. and Fahrig, R. (2014). Interventional heart wall motion analysis

with cardiac C-arm CT systems, Physics in Medicine and Biology 59(9): 2265–2294.

Natterer, F. (1986). The mathematics of computerized tomography, Springer.

Natterer, F. (1993). Sampling in fan beam tomography, SIAM Journal on Applied

Mathematics 53(2): 358–380.

Ouadah, S., Stayman, J. W., Gang, G. J., Ehtiati, T. and Siewerdsen, J. H. (2016).

Self-calibration of cone-beam CT geometry using 3D-2D image registration, Physics

in Medicine and Biology 61(7): 2613–2632.

Patch, S. K. (2002a). Computation of unmeasured third-generation VCT views from

measured views, Medical Imaging, IEEE Transactions on 21(7): 801–813.

Patch, S. K. (2002b). Consistency conditions upon 3D CT data and the wave equation,

Physics in Medicine and Biology 47(15): 2637.

Rit, S., Sarrut, D. and Desbat, L. (2009). Comparison of analytic and algebraic

methods for motion-compensated cone-beam ct reconstruction of the thorax, IEEE

Transactions on Medical Imaging 28(10): 1513–1525.

Rit, S., Sarrut, D. and Sonke, J.-J. (2013). Respiratory Motion Correction in Cone-Beam

CT for Image-Guided Radiotherapy, in J. Ehrhardt and C. Lorenz (eds), 4D Modeling

and Estimation of Respiratory Motion for Radiation Therapy, Biological and Medical

Physics, Biomedical Engineering, Springer Berlin Heidelberg, pp. 319–334.

Rohkohl, C., Bruder, H., Stierstorfer, K. and Flohr, T. (2013). Improving best-phase

image quality in cardiac CT by motion correction with MAM optimization., Medical

physics 40(3): 031901.

Schaefer, D., Borgert, J., Rasche, V. and Grass, M. (2006). Motion-compensated

and gated cone beam filtered back-projection for 3-D rotational X-ray angiography,

Medical Imaging, IEEE Transactions on 25(7): 898–906.

Schwemmer, C., Rohkohl, C., Lauritsch, G., Muller, K. and Hornegger, J. (2013).

Residual motion compensation in ECG-gated interventional cardiac vasculature

reconstruction, Physics in Medicine and Biology 58(11): 3717.

Segars, W. P., Sturgeon, G., Mendonca, S., Grimes, J. and Tsui, B. M. W. (2010). 4D

XCAT phantom for multimodality imaging research, Medical Physics 37(9): 4902–

4915.

Shirato, H., Harada, T., Harabayashi, T., Hida, K., Endo, H., Kitamura, K., Onimaru,

R., Yamazaki, K., Kurauchi, N., Shimizu, T., Shinohara, N., Matsushita, M., Dosaka-

Akita, H. and Miyasaka, K. (2003). Feasibility of insertion/implantation of 2.0-mm-

diameter gold internal fiducial markers for precise setup and real-time tumor tracking

in radiotherapy, International Journal of Radiation Oncology*Biology*Physics

56(1): 240–247.

Unberath, M., Aichert, A., Achenbach, S. and Maier, A. (2016). Virtual Single-frame

REFERENCES 41

Subtraction Imaging, in M. Kachelries (ed.), Proceedings of the 4th CT Meeting,

pp. 89–92.

Unberath, M., Berger, M., Aichert, A. and Maier, A. (2017). Fourier Consistency-Based

Motion Estimation in Rotational Angiography, in DKFZ (ed.), Proc. Bildverarbeitung

fur die Medizin (BVM) 2017, Heidelberg, pp. 110–116.

Verrill, S. (2005). An unconstrained nonlinear optimization solver: A users’s guide,

Website. https://www1.fpl.fs.fed.us/uncmin userguide.pdf, last checked: 2017/05/23.

Wang, Z., Bovik, A., Sheikh, H. and Simoncelli, E. (2004). Image Quality

Assessment: From Error Visibility to Structural Similarity, IEEE Transactions on

Image Processing 13(4): 600–612.

Welch, A., Campbell, C., Clackdoyle, R., Natterer, F., Hudson, M., Bromiley, A.,

Mikecz, P., Chillcot, F., Dodd, M., Hopwood, P., Craib, S., Gullberg, G. T. and Sharp,

P. (1998). Attenuation correction in PET using consistency information, Nuclear

Science, IEEE Transactions on 45(6): 3134–3141.

Yu, H. and Wang, G. (2007). Data Consistency Based Rigid Motion Artifact Reduction

in Fan-Beam CT, Medical Imaging, IEEE Transactions on 26(2): 249–260.

Yu, H., Wei, Y., Hsieh, J. and Wang, G. (2006). Data consistency based translational

motion artifact reduction in fan-beam CT, IEEE Transactions on Medical Imaging

25(6): 792–803.

Zeng, R., Fessler, J. A. and Balter, J. M. (2005). Respiratory motion estimation from

slowly rotating x-ray projections: Theory and simulation, Medical Physics 32(4): 984.

Motion Compensation for Cone-Beam CT Using Fourier … · 2017. 12. 6. · Motion Compensation for...

Documents

Transcript of Motion Compensation for Cone-Beam CT Using Fourier … · 2017. 12. 6. · Motion Compensation for...