Hidden Markov Models-based 3D MRI Brain Segmentation
Hidden Markov models-based 3D MRI brain segmentation
M. Ibrahim, N. John, M. Kabuka *, A. Younis
Department of Electrical and Computer Engineering, College of Engineering, University of Miami,
1251 Memorial Drive, Room 406, Coral Gables, FL 33146, USA
Received 18 September 2004; received in revised form 4 February 2006; accepted 1 March 2006
Abstract
This paper introduces a 3D MRI segmentation algorithm based on Hidden Markov Models (HMMs). The mathematical models for the HMM
that forms the basis of the segmentation algorithm for both the continuous and discrete cases are developed and contrasted with Hidden Markov Random Fields in terms of complexity and extensibility to larger fields. The presented algorithm clearly demonstrates the capacity of HMM to
tackle multi-dimensional classification problems.
The HMM-based segmentation algorithm was evaluated through application to simulated brain images from the McConnell Brain Imaging
Centre, Montreal Neurological Institute, McGill University as well as real brain images from the Internet Brain Segmentation Repository (IBSR),
Harvard University. The HMM model exhibited high accuracy in segmenting the simulated brain data and an even higher accuracy when
compared to other techniques applied to the IBSR 3D MRI data sets. The achieved accuracy of the segmentation results is attributed to the HMM
foundation and the utilization of the 3D model of the data. The IBSR 3D MRI data sets encompass various levels of difficulty and artifacts that
were chosen to pose a wide range of challenges, which required handling of sudden intensity variations and the need for global intensity level
correction and 3D anisotropic filtering. During segmentation, each class of MR tissue was assigned to a separate HMM and all of the models were
trained using the discriminative MCE training algorithm. The results were numerically assessed and compared to those reported using other
techniques applied to the same data sets, including manual segmentations establishing the ground truth for real MR brain data. The results
obtained using the HMM-based algorithm were the closest to the manual segmentation ground truth in terms of an objective measure of overlap
compared to other methods.
© 2006 Elsevier B.V. All rights reserved.
Keywords: Hidden Markov Models; Image segmentation; Medical imaging
1. Introduction
Interpretation of the biomedical imaging of the brain plays
an important part in diagnosis of various diseases and injury.
Due to the importance of brain imaging interpretation,
significant research efforts have been devoted to developing
better and more efficient techniques in several related areas
including processing, modeling, and understanding of brain
images. In particular, the problem of automating 3D
segmentation of brain imaging using Magnetic Resonance
Imaging (MRI), Computed Tomography (CT), Positron
Emission Tomography (PET) or other modalities, has received
special attention, as evidenced by numerous published research
works [1–3]. This is mainly due to the multitude of benefits that
may be gained from accurate automated 3D brain
segmentation.
Segmentation frameworks based on Markov Random Fields
(MRF) and Hidden Markov Random Fields (HMRF) were
introduced in several reported efforts [9–12]. MRFs and
HMRFs share the common property of revealing the
dependency between the imaging voxels to be segmented and
their first-degree neighbors. However, both frameworks are
computationally intensive, which adversely affects their
practical applicability in medical environments. On the other
hand, Hidden Markov Models (HMMs) have proven valuable
when applied to Automatic Speech Recognition (ASR) [4],
where ASR is essentially a pattern recognition problem. In fact,
HMRFs, which are mainly applied in computer vision and
image processing, grew out of further developments of HMMs.
Hidden Markov Chains have also been reported for image
segmentation using radar, synthetic and multi-sensor images
[31–33]. A generalized mixture estimation approach is
Image and Vision Computing xx (2006) 1–15
www.elsevier.com/locate/imavis
0262-8856/$ - see front matter © 2006 Elsevier B.V. All rights reserved.
doi:10.1016/j.imavis.2006.03.001
* Corresponding author. Tel.: +1 305 284 2212; fax: +1 305 284 4044.
E-mail address: [email protected] (M. Kabuka).
presented for unsupervised classification of Hilbert-Peano
scans of radar images [31], which combines Hidden Markov
Chain models and Hidden Markov Random Field models.
Similarly, pairwise Markov Random Chain models provided
the basis for unsupervised signal and image segmentation of
simulated as well as radar images [32]. Another approach
utilizing Hidden Markov Chains was presented for image
segmentation of synthetic and multi-sensor radar images [33].
These techniques provide promising results for utilizing
HMMs for MR image segmentation.
HMMs, implemented using the Viterbi algorithm, are
sufficiently capable of encoding the first-degree relationships
and can be extended to higher degrees. Encoding first-degree
relationships among the voxels will be shown, as evidenced by
the experimental results, to provide sufficient information for
accurate segmentation of 3D MRI brain imaging data. The
main training algorithms that have been developed for HMMs
are the Baum–Welch algorithm [4] and the Maximum Mutual
Information (MMI) algorithm [5]. The inefficiency of both
techniques has been argued in the context of Bayesian classification,
where it is shown that neither algorithm necessarily results
in the best Bayesian threshold [6]. Consequently, a new
algorithm, namely the Minimum Classification Error (MCE),
was developed [6], which takes into consideration exposing
each of the HMM nodes to both the patterns to be rejected as
well as the patterns to be recognized. As a result, the HMM
nodes can minimize the accompanying error rate by moving
the Bayesian threshold closer to the correct location as shown
in Fig. 1.
Many advances in brain MR image segmentation have
relied on a Bayesian framework and Markov Random Fields
(MRFs) [17]. In [15], the smoothness and piecewise contiguous
nature of the tissue regions in MR cerebral images was
modeled using a 3D MRF. A segmentation algorithm, based on
the statistical model, finds the approximate Maximum A
Posteriori (MAP) estimation of the segmentation model
parameters from the MR imaging data. Another scheme for
segmentation was based on the Iterative Conditional Modes
(ICM) algorithm [18], in which measurement model para-
meters were estimated using local information at each site, and
the prior model parameters were estimated using the
segmentation results after each cycle of iterations. In this
case, MRFs were used to model only the intensity process, and
the segmentation results were improved by incorporating the
discontinuity process into the prior model. The scheme also
addressed the effect of magnetic field inhomogeneities and
biological variations of tissues as variations of the model
parameters. Unfortunately, this model did not investigate the
discontinuity process in the 3D MR volumes.
A fully automated 3D-segmentation technique for MR brain
images was introduced in [19] that relied on a MRF model to
capture the non-parametric distributions of tissue intensities,
neighborhood correlations, and signal inhomogeneities in MR
images. The technique used two algorithms based on Simulated
Annealing and on Iterative Conditional Modes and started with
a training process of typical echo intensities and setting one of
the MRF parameters according to the expected inhomogeneity.
The technique was able to automatically segment the entire 3D
MR volume, as well as different MR images acquired using the
same MR sequence. Another study [20] involved embedding
the problem of functional MRI (fMRI) analysis into a Bayesian
framework, and then provided an algorithm to restore and
analyze fMRI using MRFs in a Bayesian framework. The study
analyzed the shortcomings of the Statistical Parameter Map
(SPM) by using a 3D MRF where the third dimension
represents time, and then the proposed restoration approach
was applied before using SPM, which resulted in an
improvement of the detection sensitivity. This study also
analyzed the hemodynamic response using three parameters,
the norm, the maximum and the time when the maximum
occurs, where it was shown that when the values of these
parameters in neighboring voxels are far from each other, the
probability of detection is lower since the associated
hemodynamic responses are not consistent in the spatial
domain. Hence, the problem was modeled using two-level
MRF interactions between the activation map and the three
parameter maps. The detection of an activated area, thus,
depends on the norm of the hemodynamic response and some
contextual information on this norm as well as the consistency
of the hemodynamic function parameters across this area.
Another fully automated method for model-based tissue
classification of magnetic resonance (MR) images of the brain
was introduced in [16]. The method relies on MRFs to
incorporate contextual information and uses a digital brain
atlas for the expected a priori information of the spatial
locations of the tissue classes. The main idea of the method is
to interleave the classification with MR bias field correction,
intensity distribution estimation, and estimation of MRF
parameters. Hence, each iteration improves the classification
of the segmented single and multi-spectral MR
images and corrects the MR signal inhomogeneities. The
proposed strategy can be considered a fully automated method
for tissue classification that produces objective and reprodu-
cible results. Another automatic method is presented in [21],
where the objective of the study is to classify the brain tissue
while taking into account the partial volume effect, which
results in MR image volumes being composed of a mixture of
several tissue types. This study assumes that the brain dataset is
composed of gray matter, white matter, cerebro-spinal fluid,
[Fig. 1. Correct Bayesian threshold vs. erroneous one. Plot of probability against argument showing the class PDF, the non-class PDF, the erroneous threshold and the Bayesian threshold.]
and mixtures (called mix-classes). The study provided a
statistical model of the mix-classes and it showed that it
could be approximated by a Gaussian function under some
conditions. The proposed method used a two-step strategy; in
the first step, it segmented the brain into pure and mix-classes
while the second step is to re-classify the mix-classes into the
pure classes using knowledge about the obtained pure classes.
Both steps use MRF models as well as the multi-fractal
dimension describing the topology of the brain to provide an
additional energy term in the MRF model to improve
discrimination of the mix-classes. The proposed strategy is
unsupervised, fully automatic, and uses only T1-weighted
images. In [22], a statistical framework for partial volume
segmentation of MR images of the brain was introduced. The
framework starts by segmenting the image using a parametric
statistical model in which each voxel is classified to one single
type of tissue. Then, it uses a down-sampling step that
addresses partial volumes along the borders between tissues. In
this step, a number of voxels in the original image grid
contribute to the intensity of each voxel in the resulting image
grid. The framework also uses an Expectation Maximization
(EM) approach to estimate the parameters of the new model
and to perform the partial volume classification.
In [23], a statistical segmentation framework of brain MR
images based on Hidden Markov Random Field (HMRF) is
introduced, which overcomes the problems of Finite Mixture
(FM) models [24,25] that do not take into account the spatial
properties of the image. The HMRF model is an MRF model
whose state sequence cannot be observed directly but can be
indirectly estimated through observations. The strategy also
uses an EM algorithm to provide an accurate and robust
segmentation. The study in [26] introduced an efficient and
accurate automatic 3D segmentation approach for brain MR
images. The approach uses a brain atlas in conjunction with a
robust registration procedure to find a non-rigid transformation
that maps the standard brain to the specimen to be segmented,
and hence, is used to segment the brain from non-brain tissues
and compute prior probabilities for each class at each voxel
location. The approach also involved a fast and accurate way to
find optimal segmentations based on EM models, given the
intensity models along with the spatial coherence assumption.
Unfortunately, the study does not take the Partial Volume (PV)
effect into account.
A contextual segmentation technique to detect brain
activation from functional brain images based on a Bayesian
framework is presented in [28], which uses an MRF model to
represent configurations of activated brain voxels. It also uses
likelihoods given by statistical parametric maps to find the
maximum a posteriori estimation of segmentation. The
technique is capable of analyzing experiments involving
multiple-input stimuli. The study in [27] introduced a model-
based approach for automatic segmentation and classification
of multi-parameter MR brain images into 15 tissue classes. The
model approximated the spatial distribution of tissue classes by
a Gaussian MRF and used the maximum likelihood method to
estimate class probabilities and transitional probabilities for
each pixel of the image. The proposed algorithm is not only
accurate compared to manual segmentation but also can learn
new tissue classes. An unsupervised tissue characterization
algorithm was introduced in [29] that is both statistically
principled and patient specific. The method used adaptive
standard finite normal mixture and inhomogeneous MRF
models, whose parameters were estimated using ER method
and relaxation labeling algorithms under information theoretic
criteria.
A technique for assessing the accuracy of segmentation
algorithms was presented in [10] and applied to the
performance evaluation of brain editing and brain tissue
segmentation algorithms for MR images. It relied on
distance-based discrepancy features between the ground truth
obtained from realistic digital brain phantom, which is taken as
a reference, and the edited/segmented brain tissues. The
proposed strategy can be used to evaluate and validate any
segmentation algorithm, and it is able to determine quantitat-
ively to what extent a segmentation algorithm is sensitive to
internal parameters, noise, artifacts or distortions when a
ground truth is given.
In this paper, a segmentation algorithm based on Hidden
Markov Models is presented, in conjunction with the required
preprocessing, for MR data. The algorithm is multi-dimen-
sional and demonstrates a high degree of accuracy for 3D MRI
brain segmentation, compared to other techniques. Unlike
generic pre-processing used in most image processing and
computer vision applications, the pre-processing phases used in
this algorithm are specifically developed to handle problems
encountered in 3D MRI brain segmentation. These problems
include correction of sudden intensity variations resulting from
artifacts during the acquisition process and global brightness
and contrast correction, with both problems showing a
significant impact on segmentation accuracy. In addition to
its segmentation accuracy, the HMM-based segmentation
algorithm's distinguishing characteristics include efficient
computational requirements, unique scanning of the 3D MRI
data that enables the modeling of the voxel's neighborhood
effect on that voxel's segmentation, and generic applicability to
larger neighborhoods that is important for the detection of
larger features that exceed the high-resolution neighborhood
size.
The 3D MRI segmentation algorithm was evaluated using
simulated 3D MRI brain data sets obtained from McConnell
Brain Imaging Centre, Montreal Neurological Institute, McGill
University (http://www.bic.mni.mcgill.ca/) and real 3D MRI
brain data sets obtained from the Internet Brain Segmentation
Repository (IBSR), Center for Morphometric Analysis at
Massachusetts General Hospital (http://www.cma.mgh.harvard.
edu/ibsr/). The 3D MRI data sets are used to perform an
objective assessment of the segmentation results based on a
metric that enables the comparison of the segmentation results
obtained using the presented algorithm as well as clinical
experts performing segmentation manually, which are avail-
able from the IBSR web site. The metric is termed the
overlapping coefficient and is equal to one if the automatic
segmentation results were identical to the manual ones and
reduces to 0 with no intersection. The quality of the
segmentation results obtained using the algorithm presented in
this paper was further evaluated by comparison with the
results of other algorithms applied to the same data sets and
published on the IBSR website.
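The endpoints described for the overlapping coefficient (one for identical segmentations, zero for no intersection) match the Jaccard index, |A ∩ B| / |A ∪ B|. The sketch below assumes that definition, which this section does not state explicitly; the function name `overlap_coefficient` and the flat label lists are illustrative only.

```python
def overlap_coefficient(auto_labels, manual_labels, tissue):
    """Jaccard-style overlap for one tissue class (assumed definition)."""
    if len(auto_labels) != len(manual_labels):
        raise ValueError("label volumes must have the same number of voxels")
    # Index sets of the voxels assigned to the tissue by each segmentation.
    auto = {i for i, lab in enumerate(auto_labels) if lab == tissue}
    manual = {i for i, lab in enumerate(manual_labels) if lab == tissue}
    union = auto | manual
    if not union:
        # Class absent from both volumes: treated as full agreement (a convention).
        return 1.0
    return len(auto & manual) / len(union)
```

A 3D label volume would be flattened to one list per segmentation before the call, and the coefficient would be computed once per tissue class.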
This paper is organized as follows: Section 2 describes the
underlying mathematical foundation upon which the algorithm
is based. Section 3 provides the details of the adopted
mathematical model for the discrete Hidden Markov Models
and is followed by the mathematical foundation of the
continuous case. Then, a complexity analysis for comparing
Markov Random Fields and Hidden Markov Models is
presented in Section 3. Section 5 details the training and
segmentation steps in both the continuous and discrete cases.
Section 4 provides the detail of the preprocessing phases.
Finally, experimental results using both real and simulated 3D
MRI data sets are presented in Section 6.
2. Mathematical model
The basic foundation of the presented algorithm relies on
the ability of the underlying Hidden Markov Model (HMM) to
build knowledge about the input multi-dimensional data
vectors or sequences that reflect the parameters of the MR
imaging modality, i.e. intensity information about the voxel
and its neighborhood. Hidden Markov Models are descendants
of Markov Chains, which are made of different states
statistically bound by transition probabilities. A HMM is
characterized by a set of internal states, the transition
probabilities among the states in response to an input symbol
from the sequence, and the emission probabilities of symbols
from the different states. The HMM knowledge is built in the
form of the transition as well as the emission probabilities of
the states that are conditioned in response to the input symbols
of the sequence during the learning stage based on two
mathematical assumptions. First, the Markovianity assump-
tion, which is expressed as follows:
$$p(q_i = s_i \mid q_{i-1} = s_a, q_{i-2} = s_b, \ldots) = p(q_i = s_i \mid q_{i-1} = s_a) \tag{1}$$
Eq. (1) imposes the condition that the probability $p$ of
transition from one state $q_{i-1}$ to another $q_i$ is only dependent
on the previous state $q_{i-1}$. In other words, the probability is
independent of the states prior to $q_{i-1}$.
Second, the assumption that the emission probabilities from
each state are independent of each other, which leads to the
output probability being the product of the emission
probabilities of all states, as expressed in Eq. (2) as follows
$$p(O \mid \Lambda) = \sum_{q_1, q_2, \ldots, q_n} \pi_{q_1} b_{q_1}(O_1)\, a_{q_1 q_2} b_{q_2}(O_2)\, a_{q_2 q_3} \cdots b_{q_n}(O_n) \tag{2}$$
where $p$ is the output probability of a chain $O = O_1 O_2 \cdots O_n$,
$b_{q_x}(O_y)$ is the emission probability of pattern $O_y$ from state $q_x$,
$a_{ij}$ is the transition probability from state $i$ to state $j$, $\pi_{q_x}$ is the
initial probability of state $q_x$, and $\Lambda$ is a vector representing the
model parameters. Higher-order Markov Models increase the
level of dependency, which complicates the analysis of higher-
order systems. Moreover, first-order Hidden Markov Models
assume that the states are hidden and cannot be observed at the
output stage. Instead, only the outputs emitted from those states
are observable without knowing which states emitted those
outputs.
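Eq. (2) sums over every possible state chain, which the standard forward recursion evaluates without enumerating the chains. The sketch below is a minimal illustration under assumed conventions: `pi`, `a` and `b` are plain nested lists, observations are integer symbol indices, and all function names are hypothetical. A brute-force version that follows Eq. (2) literally is included for comparison.

```python
from itertools import product


def output_probability(obs, pi, a, b):
    """Eq. (2) evaluated with the forward recursion."""
    n_states = len(pi)
    # alpha[s]: probability of the outputs seen so far, ending in state s.
    alpha = [pi[s] * b[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        alpha = [sum(alpha[r] * a[r][s] for r in range(n_states)) * b[s][o]
                 for s in range(n_states)]
    return sum(alpha)


def output_probability_brute(obs, pi, a, b):
    """Eq. (2) evaluated literally: one product per state chain."""
    total = 0.0
    for chain in product(range(len(pi)), repeat=len(obs)):
        p = pi[chain[0]] * b[chain[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= a[chain[t - 1]][chain[t]] * b[chain[t]][obs[t]]
        total += p
    return total
```

The forward version costs on the order of n·Q² operations for n outputs and Q states, whereas the literal sum grows as Q^n.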
This is true when the Hidden Markov Models are viewed
from a similar perspective to the one presented in [4], where the
HMM was imagined as a process generating output symbols
and the observations were viewed from the outside without
knowing which states emitted them. At that point, the emission
probability of one state can well be assumed to be independent
of the other outputs. However, a different case exists when the
HMM is used for MR image segmentation, where the objective
is to find the best state sequence that might have produced an
output. By inspecting (2), the output probability for each
segment is calculated using the most probable path only, i.e.
without a summation over all possible paths. During training,
the goal of the training algorithms is to increase the output
probability of input sequences representing a certain class of
tissues. Hence, the transition and emission probabilities are
updated in a manner that maximizes the output probability of a
given class of tissues. This in some cases entails changing
transition and emission probabilities of prior states in order to
maximize the output probability given a certain terminating
output, as is clear when the output probability is
considered to be due only to the most probable path. This
mechanism in turn encodes some form of relationship between
the terminal and the input sequences. The encoding of relations
arises from the fact that upon updating the transition
probabilities of the prior states their values are decreased,
forcing the most probable chain of states to change to another
set of states having higher transition and emission probabilities.
The fact that relation encoding takes place is demonstrated
through a numerical example that shows that emission
probabilities during classification with HMMs are conditioned
by non-neighboring outputs.
The encoding of this relation is demonstrated through the
example HMM, shown in Fig. 2, which involves two states,
State 0 and State 1, with the following initial probabilities,
$\pi_{q_0} = \pi_{q_1} = 0.5$, emission probabilities, $b_{q_0}(0) = 0.8$, $b_{q_0}(1) = 0.2$,
$b_{q_1}(0) = 0.4$, $b_{q_1}(1) = 0.6$, and transition probabilities $a_{00} = 0.3$,
$a_{01} = 0.7$, $a_{10} = 0.6$, $a_{11} = 0.4$. This was tested using a
sequence of five outputs 00000 and the most probable chain
was found to be 01010 with probability of 0.016257.
However, when the last output is changed to 00001, the
most probable chain changes not only in terms of the last state
but also in terms of the first state to be 10101 with probability
of 0.008129. Changing the output emitted by the final state induced a
change in the initial state and consequently changes the emission
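[Fig. 2. Example HMM. Two states, State 0 and State 1, each with initial probability 0.5; State 0 emits with b(0)=0.8 and b(1)=0.2, State 1 emits with b(0)=0.4 and b(1)=0.6; the transition probabilities shown are 0.3, 0.7, 0.6 and 0.4.]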
probability of O1 depending on O4. This shows that the
emission probability of output intensities can be conditioned by
the presence of other output intensities emitted by non-
neighboring states. A simple argument based on those results
shows that the HMM can encode relations in more than one
dimension, since the intensities in these sequences are
constructed from a 3×3×3 neighborhood of voxels. Moreover,
the HMM encodes relations between intensities of
non-neighboring voxels in the same 3×3×3 neighborhood, even if
they do not reside in the same clique, as defined in HMRF
models. The knowledge stored in the HMM encodes the
conditional dependency of the voxel intensities and the class
of tissue to which they belong in the form of the initial
probabilities, transition probabilities and emission probabilities
which are based on the mathematical model of the HMM
transition among the constituent states. In contrast, Hidden
Markov Random Fields (HMRFs) are based on Gibbs
distribution, which encodes relations between voxels through
the usage of cliques and mathematical modeling of the
potential.
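The most probable chain in the two-state example can be traced with a short Viterbi sketch. It uses the parameters quoted in the text; the function name `viterbi`, the list layout and the tie-breaking rule (first maximum wins) are assumptions. The path probability it returns is the single-path product of Eq. (2) with the initial probability included, and no attempt is made to reproduce the reported probability values.

```python
def viterbi(obs, pi, a, b):
    """Return (most probable state chain, probability of that chain)."""
    n = len(pi)
    delta = [pi[s] * b[s][obs[0]] for s in range(n)]
    back = []  # back[t][s]: best predecessor of state s at step t + 1
    for o in obs[1:]:
        step, new_delta = [], []
        for s in range(n):
            best = max(range(n), key=lambda r: delta[r] * a[r][s])
            step.append(best)
            new_delta.append(delta[best] * a[best][s] * b[s][o])
        back.append(step)
        delta = new_delta
    last = max(range(n), key=lambda s: delta[s])
    chain = [last]
    for step in reversed(back):
        chain.append(step[chain[-1]])
    chain.reverse()
    return chain, delta[last]


PI = [0.5, 0.5]               # initial probabilities
A = [[0.3, 0.7], [0.6, 0.4]]  # A[i][j]: transition from state i to state j
B = [[0.8, 0.2], [0.4, 0.6]]  # B[s][o]: emission of symbol o from state s

chain, prob = viterbi([0, 0, 0, 0, 0], PI, A, B)
```

For the all-zero sequence the sketch returns the chain 01010. For 00001, several chains (10101 among them) attain the same maximum under these parameters, so the chain returned depends on tie-breaking and floating-point rounding.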
In other words, both MRFs and HMRFs provide a
mathematical model for the dependency between voxel
intensities. However, HMMs can establish similar dependen-
cies among pixel/voxel intensities that are in larger regions or
do not belong to the same clique, as will be shown in Section 3
addressing the HMM mathematical model. In this work, when
presenting the pixel/voxel data to the HMM-based segmenta-
tion module, each pixel/voxel is represented by a vector
composed of its grayscale/color value as well as those of other
pixels/voxels in its neighborhood, giving a 9-pixel vector and a 27-voxel
vector for 2D and 3D imaging data, respectively. The vector is
presented to the HMM models and the probability of output is
calculated using prior training knowledge stored in the model.
Labeling takes place by setting the label of the voxel to that of
the HMM showing the highest output probability.
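The vector construction and labeling rule described above can be sketched as follows. The raster ordering of the 27 neighbors and the clamping of out-of-volume neighbors are assumptions, since the scanning order and border handling are not specified here; `neighborhood_vector`, `label_voxel` and the `class_scores` mapping are hypothetical names.

```python
def neighborhood_vector(volume, z, y, x):
    """Return the 27 intensities of the 3x3x3 block centred on (z, y, x).

    Neighbours that fall outside the volume are clamped to the nearest
    voxel, which is an assumed border policy.
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])

    def clamp(v, size):
        return min(max(v, 0), size - 1)

    return [volume[clamp(z + dz, depth)][clamp(y + dy, height)][clamp(x + dx, width)]
            for dz in (-1, 0, 1) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def label_voxel(vector, class_scores):
    """Pick the class whose model scores the vector highest.

    class_scores maps a tissue label to a callable that returns that
    model's output probability (or log-probability) for the vector.
    """
    return max(class_scores, key=lambda label: class_scores[label](vector))
```

In use, each callable in `class_scores` would wrap one trained HMM, and `label_voxel` would be applied to every voxel's vector in turn.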
The outputs of a HMM can be discrete, acquiring certain
specific quantized levels or continuous based on continuous
probability density functions (PDFs). The most common
continuous PDF representation is a Multivariate Gaussian
distribution whose covariances are assumed to be zero,
reducing to an additive mixture of normal distributions. Classification
proceeds by estimating the probability that a pattern was generated by
each HMM, and the model most likely to have produced that
pattern determines its tissue type or class. HMMs were
previously used successfully in automatic speech recognition
(ASR) and are commonly used with the Minimum Classifi-
cation Error (MCE) training algorithm described in [6,7],
which forms the foundation of the learning process employed
in the proposed segmentation framework.
During the MCE training, the derivatives of the output are
computed with respect to every parameter to be updated. Since
the output we seek is the class number, a continuous
differentiable formula is required that evaluates the correctness
of the result by replacing the non-differentiable discrete on/off
output. The mathematical model of the loss in [6,7] was used
for that case where
$$l_i = \mathrm{sigmoid}(d_i) = \frac{1}{1 + e^{-\gamma d_i + \theta}} \tag{3}$$
where $\gamma$ is the sigmoid slope, $\theta$ is a shift and $d_i$ is a continuous
variable that is more negative when the result is more correct,
i.e. when the HMM of class $i$ has higher probability, which can
be expressed as follows:
$$d_i = -g_i(X; \Lambda) + \left[ \frac{1}{k-1} \sum_{j=1,\, j \neq i}^{k} g_j(X; \Lambda)^{\eta} \right]^{1/\eta} \tag{4}$$
The right term of Eq. (4) approaches $\max_{j \neq i} g_j(X; \Lambda)$ as $\eta \to \infty$,
which leads to $d_i$ being negative if the HMM of class
$i$ showed the highest probability, and to the corresponding
$l_i$ being small.
$g_x$ is a discriminant function for each class, which does not
necessarily correspond to a probability since no restrictions
are imposed for that purpose. However, by using HMMs the
output is the probability of the pattern, and the used
discriminant is the probability due to the most probable path.
k is the number of models involved.
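Eqs. (3) and (4) can be sketched numerically as below. The sketch assumes positive discriminant values (for example path probabilities) so that the power mean in Eq. (4) is well defined; the function names and the default shift of zero are illustrative choices.

```python
import math


def misclassification_measure(g, i, eta):
    """Eq. (4): d_i = -g_i + [mean over j != i of g_j**eta]**(1/eta)."""
    others = [g[j] for j in range(len(g)) if j != i]
    competitor = (sum(v ** eta for v in others) / len(others)) ** (1.0 / eta)
    return -g[i] + competitor


def mce_loss(d, gamma, theta=0.0):
    """Eq. (3): sigmoid loss of the misclassification measure."""
    return 1.0 / (1.0 + math.exp(-gamma * d + theta))
```

With a large `eta` the competitor term is dominated by the strongest rival class, so a correctly classified pattern gives a negative `d` and a loss below one half when `theta` is zero.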
The MCE updates each parameter trying to reach the
minimum of $l_i$. For a certain parameter $x$, this update proceeds
as follows
$$x_{t+1} = x_t - \varepsilon \frac{\partial l_i}{\partial x} \tag{5}$$
where $\varepsilon$ is the learning rate.
In the MCE algorithm [7], it is discussed that if the learning
rate is chosen such that the following conditions are satisfied
$$\sum_{t=0}^{\infty} \varepsilon(t) \to \infty \tag{6}$$
$$\sum_{t=0}^{\infty} \varepsilon(t)^2 < \infty \tag{7}$$
the model parameters $\Lambda$ approach at least a local minimum
$\Lambda^*$. It is also described that by using a small sigmoid slope that
increases across iterations, the global minimum is achievable
with a higher probability than other training algorithms due to
smoothing of the error surface. Both considerations were
addressed in the context of this paper, where the learning rate is
given by:
$$\varepsilon(t) = \frac{\varepsilon(0)}{1 + a t} \tag{8}$$
The integration of the learning rate from zero to infinity is infinity, and the
integration of the learning rate squared is equal to $\varepsilon(0)^2/a$, where
$a$ is a constant, and $t$ is time, which is substituted by the
iteration number, i.e. (6) and (7) are satisfied. In other words,
the proposed HMM is accurate since it converges to the global
minimum as well as robust since the convergence is only
dependent on the established learning rate.
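The update rule of Eq. (5) combined with the decaying learning rate of Eq. (8) can be illustrated on a toy problem. The quadratic loss below is a stand-in chosen only to show the mechanics; it is not an HMM loss, and the names `learning_rate` and `descend` as well as the numeric settings are hypothetical.

```python
def learning_rate(t, eps0, a):
    """Eq. (8): eps(t) = eps(0) / (1 + a * t)."""
    return eps0 / (1.0 + a * t)


def descend(grad, x0, eps0, a, steps):
    """Apply Eq. (5) repeatedly with the decaying rate of Eq. (8)."""
    x = x0
    for t in range(steps):
        x -= learning_rate(t, eps0, a) * grad(x)
    return x


# Toy loss (x - 3)**2 with gradient 2 * (x - 3); its minimum is at x = 3.
x_final = descend(lambda x: 2.0 * (x - 3.0), x0=0.0, eps0=0.4, a=0.01, steps=200)
```

In the MCE setting, `grad` would be replaced by the derivative of the loss with respect to one substituted model parameter, and `t` by the training iteration number.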
Two HMM models will be considered during analysis. The
first one is a binary discrete HMM where each node has an
emission probability for zero and an emission probability for
one. Consequently, the input is taken in the form of a long
vector having the binary equivalent of the intensities
represented in eight bits each. The second model is a
continuous model where each node represents the emission
probabilities in the form of a Gaussian Mixture. The analysis of
the discrete model will be first presented followed by the
formulas necessary for the continuous Gaussian Mixture.
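For the binary discrete model, the input vector described above can be formed by concatenating the eight-bit binary form of each intensity. The sketch assumes integer intensities in the range 0 to 255 and most-significant-bit-first ordering; the bit order and the function name are assumptions not fixed by the text.

```python
def intensities_to_bits(intensities):
    """Concatenate the 8-bit binary form of each intensity (MSB first)."""
    bits = []
    for value in intensities:
        if not 0 <= value <= 255:
            raise ValueError("intensity must fit in eight bits")
        # Extract bit 7 down to bit 0 of the current intensity.
        bits.extend((value >> shift) & 1 for shift in range(7, -1, -1))
    return bits
```

A 27-voxel neighborhood then yields a sequence of 216 binary symbols for the discrete HMM.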
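Since $l_i$ is a function of $d_i$, and $d_i$ is a function of $g_x$, $x = 1, \ldots, k$,
then the derivative of $l_i$ with respect to a certain
parameter $x$ using the chain rule is given by:
$$\frac{\partial l_i}{\partial x} = \frac{\partial l_i}{\partial d_i} \frac{\partial d_i}{\partial g_x} \frac{\partial g_x}{\partial x} \tag{9}$$
$$\frac{\partial l_i}{\partial d_i} = \gamma\, l_i (1 - l_i) \tag{10}$$
$$\frac{\partial d_i}{\partial g_x} = \begin{cases} -1, & x = i \\ \dfrac{g_x^{\eta - 1}}{k-1} \left[ \dfrac{1}{k-1} \sum_{j=1,\, j \neq i}^{k} g_j^{\eta} \right]^{1/\eta - 1}, & x \neq i \end{cases} \tag{11}$$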
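The output of the HMM can take several forms:
$$g_i = \sum_{q \in C} g(x, q; \Lambda) \tag{12}$$
$$g_i = \max_{q \in C} g(x, q; \Lambda) \tag{13}$$
$$g_i = \left[ \frac{1}{N(C)} \sum_{q \in C} g(x, q; \Lambda)^{\eta} \right]^{1/\eta} \tag{14}$$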
(14)
where C represents the set of possible chains and N(C) is the
number of elements in C. The output can be any of the previous
forms or functions of them. MCE training discussed in [6,7]
was based on Eq. (13), which is called the segmental form
where only the most optimal path is considered for update
during the Generalized Probabilistic Descent update step.
Since the logarithm is monotonic, minimizing or maximizing a positive
function is equivalent to minimizing or maximizing its logarithm, so we choose the
discriminant function given by
$$g_i = \log \pi_{q_0} + \sum_{t=1}^{T} \left[ \log a_{q_{t-1} q_t} + \log b_{q_t} \right] \tag{15}$$
where the $b$ terms are the output functions, the $a$ terms are the transition
probabilities and the $\pi$ terms are the initial probabilities.
The HMM imposes constraints on most of the
parameters associated with each model. Such constraints
include the summation of all transition probabilities going
out of a state which must be one, the summation of all initial
probabilities and many other constraints, which have to be
satisfied during parameter update.
For that reason, a substitution was used in [6] which
guarantees those constraints, where the substituted parameter is
the one that gets updated in each step. The substitution
previously used in [6] for the initial probability is:
$$\pi_x = \frac{\exp(\bar{\pi}_x)}{\sum_{q=1}^{Q} \exp(\bar{\pi}_q)} \tag{16}$$
The previous substitution works well except for the fact that
it uses exponentials, which slows down execution. Another
substitution is proposed and used in this research that does not
depend on exponentials:
$$\pi_x = \frac{\bar{\pi}_x^2}{\sum_{q=1}^{Q} \bar{\pi}_q^2} \tag{17}$$
$$a_{ix} = \frac{\bar{a}_{ix}^2}{\sum_{q=1}^{Q} \bar{a}_{iq}^2} \tag{18}$$
$$P_0 = \frac{\bar{P}_0^2}{\bar{P}_0^2 + \bar{P}_1^2}, \qquad P_1 = \frac{\bar{P}_1^2}{\bar{P}_0^2 + \bar{P}_1^2} \tag{19}$$
where $P_0$ and $P_1$ belong to a certain state, and represent the
emission probabilities of zeros and ones, respectively. The parameters that
get updated are the substituted bar parameters.
To update the initial probabilities we need to find the
derivative of gx with respect to every pq, where qZ1,.,Q. If
qZq0 then
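$$ \frac{\partial g_x}{\partial \bar\pi_{q_0}} = \frac{2 (1 - \pi_{q_0})}{\bar\pi_{q_0}} $$ (20)
On the other hand, if $q \neq q_0$ (say $q = z$), a dependency still exists
through the normalization formula (17), and the derivative
becomes:
$$ \frac{\partial g_x}{\partial \bar\pi_z} = -\frac{2\, \pi_{q_0} \bar\pi_z}{\bar\pi_{q_0}^2} $$ (21)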
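The squared-parameter substitution of Eq. (17) and the derivatives of Eqs. (20) and (21) can be checked numerically. The following is a minimal sketch rather than the authors' implementation: the function names and the example values are illustrative assumptions, and only the initial-probability term of the discriminant function is differentiated.

```python
import math

def normalize_squares(bar):
    """Eq. (17): map unconstrained parameters to probabilities without exponentials."""
    total = sum(v * v for v in bar)
    return [v * v / total for v in bar]

def grad_log_pi(bar, q0):
    """Derivative of log(pi[q0]) with respect to every barred parameter."""
    pi = normalize_squares(bar)
    grad = []
    for z, bz in enumerate(bar):
        if z == q0:
            grad.append(2.0 * (1.0 - pi[q0]) / bar[q0])       # Eq. (20)
        else:
            grad.append(-2.0 * pi[q0] * bz / (bar[q0] ** 2))  # Eq. (21)
    return grad

def numeric_grad(bar, q0, eps=1e-6):
    """Central finite differences of log(pi[q0]), used only as a check."""
    out = []
    for z in range(len(bar)):
        up = list(bar)
        up[z] += eps
        dn = list(bar)
        dn[z] -= eps
        diff = math.log(normalize_squares(up)[q0]) - math.log(normalize_squares(dn)[q0])
        out.append(diff / (2 * eps))
    return out

bar_pi = [0.5, 1.2, 0.8]  # hypothetical unconstrained values
analytic = grad_log_pi(bar_pi, q0=1)
numeric = numeric_grad(bar_pi, q0=1)
```

Because the normalization divides by the sum of squares, the resulting probabilities sum to one for any non-zero barred values, so a gradient step on the barred parameters never violates the constraint.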
A similar case holds for the transition probabilities. During
the update, we will consider the derivatives of the transition
probability going out from a certain state i to a state j.
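$$ \frac{\partial g_x}{\partial \bar a_{ij}} = \sum_{t=1}^{T} \frac{1}{a_{q_{t-1} q_t}} \frac{\partial a_{q_{t-1} q_t}}{\partial \bar a_{ij}} $$ (22)
$$ \frac{\partial a_{ij}}{\partial \bar a_{ij}} = \frac{2\, a_{ij} (1 - a_{ij})}{\bar a_{ij}} $$ (23)
$$ \frac{\partial a_{ix}}{\partial \bar a_{ij}} = -\frac{2\, a_{ix}^2\, \bar a_{ij}}{\bar a_{ix}^2} $$ (24)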
Updating the output probabilities is much easier than the
rest of the parameters. The first step is to find the derivative of g
with respect to b.
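$$ \frac{\partial g_x}{\partial b_q} = \frac{1}{b_q} $$ (25)
$$ \frac{\partial b_x}{\partial \bar P_1} = \frac{2 P_0 P_1 (2x - 1)}{\bar P_1} $$ (26)
$$ \frac{\partial b_x}{\partial \bar P_0} = \frac{2 P_0 P_1 (1 - 2x)}{\bar P_0} $$ (27)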
The only difference between the discrete and the continuous
HMM models is the way the output probability is calculated.
In the continuous case, $b(x)$ is derived from a Gaussian mixture
as follows
$$ b(x) = \sum_{k=1}^{K} C_k \, N(x; \mu_k, \bar\sigma_k^2) $$ (28)
where two constraints are imposed. The first is that the
weights $C_k$ must sum to one. The second is that
the standard deviation $\sigma_k$ is always positive. To guarantee that,
$\bar\sigma_k^2$ is used for the standard deviation while $\bar\sigma_k^4$ is used for the
variance. $\mu_k$ is the mean of distribution $k$ and $x$ is the input
variable. The substitution used for the weights is given by
$$ C_k = \frac{\bar C_k^2}{\sum_{x=1}^{K} \bar C_x^2} $$ (29)
where K is the number of mixtures used. Updating the
parameters is governed by the following equations:
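$$ \frac{\partial C_x}{\partial \bar C_x} = \frac{2\, C_x (1 - C_x)}{\bar C_x} $$ (30)
$$ \frac{\partial C_y}{\partial \bar C_x} = -\frac{2\, C_y^2\, \bar C_x}{\bar C_y^2} $$ (31)
$$ \frac{\partial b_x}{\partial C_j} = N(x; \mu_j, \bar\sigma_j^2) $$ (32)
$$ \frac{\partial b_x}{\partial \mu_j} = \frac{\partial b_x}{\partial C_j} \, \frac{C_j (x - \mu_j)}{\bar\sigma_j^4} $$ (33)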
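$$ \frac{\partial b_x}{\partial \bar\sigma_j} = -\frac{\partial b_x}{\partial \mu_j} \, \frac{4 (x - \mu_j)^2 + 2 \bar\sigma_j^4}{\bar\sigma_j} $$ (34)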
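The mixture output of Eq. (28), the weight substitution of Eq. (29), and the mean derivative of Eq. (33) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names and example values are hypothetical, and the standard normal density (with the factor 1/2 in the exponent) is assumed for N.

```python
import math

def gaussian(x, mean, std):
    """Normal density with the given mean and standard deviation (assumed form of N)."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

def mixture_output(x, bar_c, means, bar_sigma):
    """b(x) of Eq. (28) with weights from Eq. (29) and std = bar_sigma ** 2."""
    total = sum(c * c for c in bar_c)
    weights = [c * c / total for c in bar_c]
    return sum(w * gaussian(x, m, s * s)
               for w, m, s in zip(weights, means, bar_sigma))

def d_output_d_mean(x, j, bar_c, means, bar_sigma):
    """Eq. (33): (db/dC_j) * C_j * (x - mu_j) / bar_sigma_j ** 4."""
    total = sum(c * c for c in bar_c)
    c_j = bar_c[j] ** 2 / total
    d_b_d_c = gaussian(x, means[j], bar_sigma[j] ** 2)  # Eq. (32)
    return d_b_d_c * c_j * (x - means[j]) / bar_sigma[j] ** 4

# Hypothetical two-component mixture.
bar_c = [1.0, 2.0]
means = [0.0, 3.0]
bar_sigma = [1.0, 1.2]
b_value = mixture_output(2.0, bar_c, means, bar_sigma)
slope = d_output_d_mean(2.0, 1, bar_c, means, bar_sigma)
```

Squaring the barred standard deviation keeps it positive for any non-zero barred value, which is the purpose of the substitution described above.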
This then leads to the general form of the training and
segmentation algorithms for a 3×3×3 3D neighborhood.
Voxel data is represented as a vector composed of 27 floating-point
numbers: the intensity of the voxel and the intensity of each
of its 26 3D neighbors.
This vector is presented to the HMM model and the probability
of output is calculated using prior training knowledge stored in
each model. Labeling takes place by setting the label of the
voxel to that of the node showing the highest output
probability.
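The vector construction and the labeling rule described above can be sketched as follows. The scoring functions here are simple stand-ins, not trained HMMs, and all names are illustrative assumptions; border voxels, which lack a full neighborhood, are not handled.

```python
def neighborhood_vector(volume, z, y, x):
    """Return the 27 intensities of the 3x3x3 block centered on (z, y, x)."""
    return [float(volume[z + dz][y + dy][x + dx])
            for dz in (-1, 0, 1)
            for dy in (-1, 0, 1)
            for dx in (-1, 0, 1)]

def label_voxel(vector, class_scorers):
    """Pick the class whose scorer gives the highest output for the vector."""
    return max(class_scorers, key=lambda name: class_scorers[name](vector))

# Toy 3x3x3 volume and two stand-in scorers (hypothetical, not trained HMMs).
volume = [[[10 * z + 3 * y + x for x in range(3)] for y in range(3)] for z in range(3)]
vec = neighborhood_vector(volume, 1, 1, 1)
scorers = {
    "gray": lambda v: -abs(sum(v) / len(v) - 14.0),
    "white": lambda v: -abs(sum(v) / len(v) - 40.0),
}
label = label_voxel(vec, scorers)
```

In the setting of the paper, each scorer would be replaced by the output probability of the HMM trained for the corresponding tissue class.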
3. Comparison with Hidden Markov Random Field
Comparison of the HMM and HMRF in the context of MRI
segmentation will be presented from two points of view. The
first is performance where the complexity analysis of both is
presented. The second is the ability of encoding relations
among voxels in larger neighborhoods.
In order to assess the computational efficiency of the
proposed HMM-based segmentation framework, the complexity
of the HMM-based segmentation is compared to that of the widely
utilized HMRF-based segmentation.
Since the continuous Gaussian Mixture HMM is similar to
HMRF segmentation, its complexity analysis is used for
performance comparisons. This starts with the estimation of the
Gaussian mixture given by
$$ O = \sum_{i=1}^{G} \frac{w_i}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x - \mu)^2}{\sigma^2}} $$ (35)
where $w_i$ is the weight associated with this Gaussian response.
The number of floating-point operations required, $N_f$, for such an
operation is given by
$$ O_{HMM} = 9 \times G $$ (36)
where the nine operations account for finding $(x - \mu)$, squaring
it, finding $\sigma^2$, negating $(x - \mu)^2$, finding the exponent,
calculating $2\pi\sigma^2$, finding the square root, dividing $w_i$ by the square
root, and multiplying by the exponent, and $G$ is the number of
mixtures used.
Hence, to find the output probability of a certain number in a
sequence, Eq. (36) gives the number of required floating-point
operations. In its first iteration, the Viterbi algorithm computes
the output probability of the first pattern in the sequence,
multiplied by the initial probability of each node, which forms
$n \times (1 + 9 \times G)$ computations for $n$ nodes. In the subsequent
operations, the Viterbi algorithm multiplies the current
probability set at each node by the transition probability to
each node, which requires an extra $n^2$ operations, and adds the
output probability of the current pattern, which needs $1 + 9 \times G$
operations, so the total number of operations is given by
$$ O_{HMM} = n (1 + 9G) + n^2 (1 + 9G)(L - 1) \;\Rightarrow\; O_{HMM} = O(n^2 G L) $$ (37)
where $L$ is the length of the sequence.
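As a worked illustration of Eq. (37), the operation count can be evaluated directly. The sketch below is an assumption-labeled example: the function name is hypothetical, and the values of 10 nodes, 15 mixtures, and a sequence length of 27 are chosen only to resemble the settings mentioned elsewhere in the paper.

```python
def hmm_viterbi_operations(n_nodes, n_mixtures, seq_len):
    """Operation count of Eq. (37): n(1 + 9G) + n^2 (1 + 9G)(L - 1)."""
    per_output = 1 + 9 * n_mixtures  # Eq. (36) plus one extra operation
    first_step = n_nodes * per_output
    later_steps = n_nodes ** 2 * per_output * (seq_len - 1)
    return first_step + later_steps

# 10 nodes, 15 mixtures, one 27-element neighborhood vector:
# 10 * 136 + 100 * 136 * 26 = 1360 + 353600
ops = hmm_viterbi_operations(n_nodes=10, n_mixtures=15, seq_len=27)
```

The count grows linearly in the sequence length, which is why moving to a larger neighborhood only changes L in this expression.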
HMRF models start by counting the number of cliques in the
3×3×3 neighborhood. Those cliques can only be formed as
2×2×2 neighborhoods, i.e. composed of eight voxels. Any
combination of voxels larger than two will form a clique in that
neighborhood. Each of the cliques requires the evaluation of the
potential. Since the complexity of computing the potential
depends on the model being used, the potential is assumed to
require one cycle per voxel and another cycle for the clique,
which results in the best case scenario for the HMRF models.
This can be demonstrated by the simplest case of subtracting
the mean out of each voxel, squaring and summing all the
potentials together. More complicated models will, in turn,
require higher complexities. The number of operations $N_V$
required to carry out these computations of the potential will
thus be:
$$ N_V = 2 \times 16 \times (2 + 1) + 4 \times 4 \times (2 + 1) + 1 + \sum_{v=3}^{8} \binom{8}{v} \times 4 \times (v + v - 1) $$ (38)
Since the probability distribution $P(f)$ of the configuration
is a Gibbs distribution (with respect to the neighborhood
system used), it can be given by:
$$ P(f) = \frac{1}{Z} \times e^{-\frac{1}{T} U(f)} \quad \text{where} \quad U(f) = \sum_{c \,\in\, \text{allCliques}} V_c(f) $$ (39)
If $Z$ is assumed constant by restricting the cliques either to
single locations or to single and double locations, then the
computational complexity of computing the Gibbs distribution will be O(1),
since it requires a constant
number of operations irrespective of any of the model
parameters. However, for more accurate computations the
estimation of $Z$ increases the order of complexity.
Moreover, for a continuous case like that presented in this
paper it is impossible to find the exact value of $Z$, as it would be
the result of 27 nested integrals. This leads to estimation, which
in turn affects the accuracy of the computed probability. The
complexity becomes:
$$ O_{HMRF} = O(Z) $$ (40)
So, in the continuous case, the HMRF becomes the
computation of 27 nested integrals, whereas the HMM is
dependent on the number of classes, number of nodes, and the
size of the input vector.
In the HMM, application to larger neighborhoods occurs
with a change in the size of the input vectors used for training
and segmentation to represent the larger neighborhood since
the parameter updates (11)–(34) do not rely on the size of the
input vector. No further change in the HMM algorithm is
necessary. As a result, the HMM provides a robust foundation
that is generically applicable to the segmentation of
multi-dimensional datasets in arbitrarily large neighborhoods, i.e.
applicable to MRI as well as MRSI data.
Larger neighborhoods raise a computational concern in the
case of HMRF segmentation. For example, the classification of
a voxel based on a neighborhood larger than 3×3×3 involves
the mapping from MRF to Gibbs distribution, which, in turn,
entails computing the Gibbs distribution in a 3×3×3
neighborhood. Larger regions necessitate the analysis of higher
order Markov Fields, which requires the re-definition of the
neighborhood. To successfully relate larger neighborhoods,
HMRF must be used in which iterative segmentation takes
place. This is due to the dependency of segmenting each voxel
on its neighbors and their prior segmentations, which are used
to compute the potential. Thus, the segmentation becomes
subject to iterative local maximization/minimization
algorithms like the Expectation Maximization (EM) and the
Iterative Conditional Modes (ICM), which are typically used to
avoid the analytically intractable nature of estimating the best
solution for the HMRF. A common concern with such methods
is their sensitivity to the initialization conditions and the
reaction of the system to input sequences during segmentation.
HMMs are easily applicable to larger neighborhoods at the cost
of the additional computation required by the
algorithm (increase in L in Eq. (37)), instead of adding
increased sensitivities to initial conditions and reactions to
input sequences. The problem of applicability to larger
neighborhoods is specifically important in the context of
segmentation of biomedical imaging data from multiple
modalities where the voxel neighborhood must be extended
across modalities or across time, e.g. in functional MRI,
beyond the 3×3×3 neighborhood. Although this increase may
provide better segmentation accuracy, the increase in accuracy
is bounded, i.e. will occur up to a certain neighborhood, after
which there could not be any significant change in the
segmentation accuracy due to the smoothing effect of utilizing
a larger neighborhood. This issue can even further complicate
the choice of the appropriate neighborhood size, since if the
neighborhood becomes very large, the segmentation accuracy
can be negatively affected. Hence, the contribution of the
different neighbors in the segmentation process may be
weighted according to their distance to the voxel under
investigation. These weights can be inversely proportional to
the distance between the neighboring pixels/voxels and the
investigated pixel/voxel. In other words, the significance of the
neighboring pixels/voxels in the segmentation strategy
increases as the neighbors become closer to the pixel/voxel
under investigation.
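The distance-based weighting suggested above can be sketched as follows. This is only an illustration of the idea, since the paper does not specify a weighting formula: the function name is hypothetical, and assigning weight 1 to the center voxel (whose distance is zero) is an assumption.

```python
import math

def inverse_distance_weights(radius=1):
    """Weights for a (2r+1)^3 neighborhood, inversely proportional to distance."""
    weights = {}
    for dz in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                dist = math.sqrt(dx * dx + dy * dy + dz * dz)
                # The center voxel has distance 0; giving it weight 1 is an assumption.
                weights[(dz, dy, dx)] = 1.0 if dist == 0 else 1.0 / dist
    return weights

w = inverse_distance_weights(1)
```

With a radius of 1 this yields 27 weights; face neighbors receive a larger weight than edge neighbors, which in turn outweigh corner neighbors.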
4. Preprocessing phase
Preprocessing steps aim to reduce the effects of noise,
address intensity inhomogeneities, and perform global
intensity level correction and are applied prior to segmentation.
These are based on existing techniques and are only presented
here for completeness, but are not discussed in detail.
4.1. Intensity inhomogeneities
Intensity inhomogeneities are defined as variations in voxel
intensities through or across imaging data sets, which appear as
either sudden or slow variations. Handling both types of
intensity variations in a pre-processing phase to segmentation
results in improving the segmentation accuracy through the
control of adverse effects caused by such inhomogeneities. A
normalized histogram intersection between each two consecu-
tive images in a data set is used for this purpose. The
distributions of pixel intensities between each pair of
consecutive images are expected to change slowly. If the
mean and variance across slices nearly match, then the
distribution will change slowly. Assuming that Ii is the
intensity of pixel i in an image, then the standard deviation
of the image is given by
$$ \sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (I_i - \mu)^2} $$ (41)
where $\mu$ is the mean intensity. If we assume that a contrast $a$
and a brightness $b$ are applied, making the standard deviation of the
voxel intensity distribution become $\sigma'$, then:
$$ \sigma' = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (a I_i + b - a\mu - b)^2} = a\sigma $$ (42)
This shows that the standard deviation is only affected by
the contrast. This case maps to the correction of
each slice with respect to its preceding slice. The slices that are
considered are those having non-empty preceding slices,
which can be determined since after skull peeling (cerebrum
reconstruction, or skull stripping) all the background voxels
end up being exactly zero.
A similar argument holds for the brightness, where
$$ \mu' = a\mu + b $$ (43)
which means that, knowing $a$ from Eq. (42), $b$ can be
estimated from Eq. (43) (Fig. 3).
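Eqs. (42) and (43) give a direct recipe for recovering the contrast and brightness between two intensity distributions. The sketch below illustrates this under assumptions: the function names and sample intensities are hypothetical, the contrast is assumed positive, and the two inputs are assumed to differ only by a linear intensity change.

```python
import math

def mean_and_std(values):
    """Population mean and standard deviation, as in Eq. (41)."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / n)
    return mu, sigma

def estimate_contrast_brightness(reference, observed):
    """Estimate a and b such that observed is approximately a * reference + b."""
    mu, sigma = mean_and_std(reference)
    mu_p, sigma_p = mean_and_std(observed)
    a = sigma_p / sigma  # Eq. (42): sigma' = a * sigma
    b = mu_p - a * mu    # Eq. (43): mu' = a * mu + b
    return a, b

reference = [12.0, 40.0, 55.0, 80.0, 97.0]      # hypothetical non-background intensities
observed = [1.5 * v + 10.0 for v in reference]  # same values after contrast 1.5, brightness 10
a, b = estimate_contrast_brightness(reference, observed)
```

Inverting the estimated transform, i.e. mapping each observed intensity I to (I - b) / a, would bring the observed slice back to the intensity range of the reference slice.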
4.2. Global intensity level correction
Global intensity correction is addressed after handling both
sudden and slow intensity variations. Since the HMM-based
segmentation utilizes the grayscale or color information of
voxels, it is sensitive to global variations among data sets. In
order to remedy this condition, global correction is employed
to maximize the histogram intersection between the data sets,
so that errors due to intensity differences are minimized.
In order to achieve the required global intensity correction,
normalized histograms are utilized because the number of
pixels/voxels differs between data sets. The
histogram, which represents the frequency of the intensities, is
normalized against the total number of non-background pixels
present in each data set. The search for the brightness that
maximizes the integral of the histogram intersection,
expressed as follows, is performed after applying an
anisotropic filtering stage:
$$ \hat H_{int} = \int_{v=0}^{V_{Max}} \hat H(I_A, v) \cap \hat H(I_B, v) \, dv $$ (44)
The brightness and contrast values were estimated in the
same way as in Eqs. (42) and (43), and were applied
after the sudden intensity correction and before the filtering.
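A discrete version of the histogram intersection of Eq. (44) can be sketched as follows. Several choices here are assumptions not stated in the paper: the intersection of two normalized histograms is taken as the sum of bin-wise minima, intensities are binned as integers, and the brightness is found by an exhaustive search over candidate shifts. All function names are hypothetical.

```python
def normalized_histogram(intensities, n_bins):
    """Histogram of non-background (non-zero) intensities, normalized to sum to 1."""
    counts = [0] * n_bins
    for v in intensities:
        if v > 0:  # background voxels are exactly zero
            counts[min(int(v), n_bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts

def histogram_intersection(hist_a, hist_b):
    """Discrete form of Eq. (44): sum of bin-wise minima."""
    return sum(min(pa, pb) for pa, pb in zip(hist_a, hist_b))

def best_brightness(data_a, data_b, n_bins, shifts):
    """Try each candidate shift on data_b and keep the one maximizing the intersection."""
    hist_a = normalized_histogram(data_a, n_bins)

    def score(shift):
        shifted = [v + shift if v > 0 else 0 for v in data_b]
        return histogram_intersection(hist_a, normalized_histogram(shifted, n_bins))

    return max(shifts, key=score)

data_a = [0, 5, 5, 6, 7, 7, 7]  # hypothetical intensities, 0 = background
data_b = [0, 3, 3, 4, 5, 5, 5]  # same distribution, two levels darker
shift = best_brightness(data_a, data_b, n_bins=16, shifts=range(-3, 4))
```

Normalizing by the non-background count makes the intersection comparable between data sets of different sizes, with a value of 1 indicating identical intensity distributions.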
5. Training and segmentation steps
Based on the mathematical models of both the discrete and
continuous HMM-based segmentation techniques, the general
form of the HMM-based training and segmentation algorithms
for a 3D neighborhood N involves representing each voxel by a
vector or sequence of symbols v. The sequence represents the
relevant parameters of the voxel and the voxel's neighbors in
N. The representative vector or symbol of each voxel is
presented to a set of HMM models, each corresponding to a
separate class or tissue type, and the output probabilities are
calculated using prior training knowledge stored in each HMM
model. Labeling takes place by assigning to the voxel the label
associated with the HMM showing the highest output
probability. Training of both the continuous and the discrete
models follows the same procedure (Fig. 4). The segmentation
also follows the same procedure for both discrete and
continuous HMM-based techniques (Fig. 5). If labeling
encounters segments whose characteristics are not consistent
with any of the known tissue types, these are classed as
unknown tissue. A clinical expert is then requested to assign a
Fig. 3. Sudden intensity correction steps.
Fig. 4. Training the HMM.
label to the unknown tissues. The segment's characteristics are
then used to initialize the knowledge of the newly identified
tissue and the corresponding HMM. The acquired knowledge is
then used to label new segments that belong to the newly
identified tissue type.
6. Experimental results
Three types of preprocessing were applied. The first was 3D anisotropic
filtering as described in [13], using k = 5 and 10 iterations.
Previously in [14], we showed how global intensity level
correction could be applied to MRI sequences. Also in [14], we
showed how sudden intensity variations that appear in many
MR sequences could be accounted for. The same techniques
were used here for preprocessing of images prior to
segmentation. For the discrete HMM, the number of states
was successively increased and after 15 states no significant
improvement was detected. The number of states used was 10,
with a Gaussian mixture of 15 distributions. The maximum
number of iterations was set to 30,000 and the sigmoid slope to
0.08.
Fig. 6 shows the classification accuracy (1-loss) averaged
for every 1000 iterations for the discrete model. It is clear that
the training algorithm reaches the minimum of the error surface
after around 10,000 iterations, which justifies why during
experimentation we chose 30,000 as an upper bound for the
number of iterations.
6.1. BrainWeb data results
The algorithm was tested using simulated digital phantoms
from the BrainWeb MR simulator (http://www.bic.mni.mcgill.ca/brainweb/).
The digital phantoms were obtained using an
isotropic voxel size of 1–9 mm to investigate the influence of
noise, field inhomogeneity, and contrast (T1-weighted, using
18 ms, 10 ms, and 30° for TR, TE, and flip angle), with varying
levels of noise from 1 to 9% and varying levels of spatial
inhomogeneity, i.e. intensity variations for each tissue class,
from 0 to 40%. The comparison was performed on the basis of
the Dice similarity coefficient, which measures the overlap
between two segmentations X and Y
$$ D(X, Y) = \frac{2\,|X \cap Y|}{|X| + |Y|} $$ (46)
where $|r|$ represents the number of voxels in segment $r$. The
Dice coefficient was computed for both gray matter and white
matter segmentation. The results are shown in Tables 1–5.
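The Dice coefficient of Eq. (46) can be computed per tissue from two label volumes flattened to lists. The sketch below is illustrative: the function name, the label strings, and the convention of returning 1 when both segments are empty are assumptions.

```python
def dice(labels_a, labels_b, tissue):
    """Dice overlap of Eq. (46) for one tissue label over two flat label lists."""
    x = {i for i, lab in enumerate(labels_a) if lab == tissue}
    y = {i for i, lab in enumerate(labels_b) if lab == tissue}
    if not x and not y:
        return 1.0  # both empty: treated as perfect agreement (assumption)
    return 2.0 * len(x & y) / (len(x) + len(y))

# Hypothetical six-voxel example: ground truth versus a segmentation result.
truth = ["WM", "WM", "GM", "GM", "GM", "BG"]
result = ["WM", "GM", "GM", "GM", "WM", "BG"]
dice_wm = dice(truth, result, "WM")  # 2 * 1 / (2 + 2) = 0.5
dice_gm = dice(truth, result, "GM")  # 2 * 2 / (3 + 3) = 0.667 (rounded)
```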
As can be seen from the tables, the Dice similarity
coefficients show that the HMM segmentation
provides accurate segmentation of the White Matter (WM)
and Gray Matter (GM) even in the presence of increasing noise
and spatial inhomogeneities. The increase in the slice thickness
has the expected effect of reducing the accuracy of the
algorithm as evidenced by the similarity coefficient. This is
expected as the algorithm itself is geared toward use with 3D image
[Fig. 6 plot: classification accuracy (vertical axis, ticks from 0.62 to 0.76) versus iterations (horizontal axis, ticks from 1.00E+03 to 1.00E+06).]
Fig. 6. Classification accuracy (1-loss) evaluated across iterations for every
1000 iterations.
Table 1
BrainWeb results, 1 mm slice
                    Spatial inhomogeneity
           0%               20%              40%
Noise      White   Gray     White   Gray     White   Gray
0%         0.831   0.872    0.831   0.872    0.708   0.756
1%         0.825   0.870    0.756   0.801    0.706   0.756
3%         0.815   0.869    0.772   0.828    0.713   0.773
5%         0.793   0.860    0.765   0.833    0.717   0.793
7%         0.739   0.832    0.739   0.825    0.702   0.797
9%         0.663   0.796    0.682   0.799    0.672   0.787
Average    0.778   0.85     0.758   0.826    0.703   0.777
Fig. 5. Segmenting with the HMM.
data sets in which the neighboring images are of a similar
distance as the pixel distances (i.e. that the slice thickness is
close to the pixel distance). In the case of the simulated dataset,
the ground truth is established from the original data
creation, whereby each tissue is clearly established and the
segmentation is completely known. Thus, this data set does not
need an expert segmentation for comparison. Additionally,
expected results should be better than for real data. Note that
Table 2
BrainWeb results, 3 mm slice
                    Spatial inhomogeneity
           0%               20%              40%
Noise      White   Gray     White   Gray     White   Gray
0%         0.707   0.790    0.671   0.727    0.637   0.707
1%         0.708   0.792    0.672   0.741    0.636   0.707
3%         0.703   0.802    0.682   0.765    0.641   0.723
5%         0.695   0.804    0.685   0.783    0.646   0.749
7%         0.668   0.796    0.664   0.782    0.642   0.758
9%         0.600   0.770    0.620   0.772    0.618   0.755
Average    0.68    0.792    0.666   0.762    0.637   0.733

Table 3
BrainWeb results, 5 mm slice
                    Spatial inhomogeneity
           0%               20%              40%
Noise      White   Gray     White   Gray     White   Gray
0%         0.629   0.707    0.607   0.662    0.582   0.637
1%         0.628   0.711    0.609   0.669    0.583   0.641
3%         0.634   0.733    0.624   0.704    0.591   0.664
5%         0.620   0.750    0.625   0.719    0.601   0.697
7%         0.618   0.755    0.611   0.745    0.603   0.714
9%         0.598   0.753    0.600   0.752    0.601   0.729
Average    0.621   0.735    0.613   0.709    0.594   0.68

Table 4
BrainWeb results, 7 mm slice
                    Spatial inhomogeneity
           0%               20%              40%
Noise      White   Gray     White   Gray     White   Gray
0%         0.568   0.630    0.559   0.589    0.524   0.567
1%         0.568   0.634    0.560   0.594    0.543   0.570
3%         0.577   0.662    0.564   0.613    0.551   0.595
5%         0.569   0.578    0.581   0.652    0.562   0.629
7%         0.563   0.708    0.577   0.671    0.556   0.641
9%         0.510   0.698    0.596   0.697    0.555   0.690
Average    0.559   0.652    0.573   0.636    0.549   0.615

Table 5
BrainWeb results, 9 mm slice
                    Spatial inhomogeneity
           0%               20%              40%
Noise      White   Gray     White   Gray     White   Gray
0%         0.526   0.573    0.523   0.539    0.512   0.518
1%         0.532   0.576    0.523   0.538    0.513   0.521
3%         0.534   0.598    0.532   0.570    0.519   0.538
5%         0.529   0.630    0.539   0.593    0.527   0.594
7%         0.508   0.659    0.535   0.634    0.533   0.596
9%         0.527   0.673    0.529   0.646    0.534   0.630
Average    0.526   0.618    0.53    0.587    0.523   0.566
this is used to initially test the capability of the algorithm to carry
out segmentation; further testing with real data is also
presented that shows comparisons with existing and expert
segmentation.
Sample images for 1 mm slices are shown in Fig. 7. The
leftmost images are the original phantom data, the center is the
data used to generate the phantom (i.e. the ground truth data),
and the rightmost is the segmented result. As can be seen, subjectively, the segmentation results reflect the objective
overlap results. One noticeable exception is that the segmenta-
tion algorithm currently is configured to only look at gray and
white matter, and ignores all other tissue types. Further work is
continuing in expanding the algorithm for use with additional
tissue types.
6.2. IBSR data results
After segmenting each case, the accuracy of the HMM-based
segmentation relative to the manual segmentations as
well as the results of existing techniques, including the
maximum likelihood, was determined using the Tanimoto
coefficient, which was previously used in existing techniques,
and is given by
$$ T(X, Y) = \frac{|X \cap Y|}{|X \cup Y|} $$ (47)
where $|r|$ represents the number of voxels in segment $r$. By the
definitions of the Dice and Tanimoto coefficients,
$T(X,Y) \le D(X,Y)$. So, the Tanimoto is more conservative
than the Dice, where equality is subject to the condition that
Table 6
IBSR reported results
White    Gray     Method
0.567    0.564    Adaptive MAP
0.562    0.558    Biased MAP
0.567    0.473    Fuzzy c-means
0.554    0.550    Maximum A posteriori Probability (MAP)
0.551    0.535    Maximum-Likelihood
0.571    0.477    Tree-structure k-means
0.832    0.876    Manual (4 brains averaged over 2 experts)
Fig. 7. Sample segmentation of simulated digital phantoms.
Table 7
Overlapping results obtained from applying both HMM algorithms on the IBSR
data after training
           Discrete         Continuous
Sequence   White   Gray     White   Gray
100_23     0.517   0.694    0.774   0.879
11_3       0.537   0.718    0.778   0.878
110_3      0.589   0.747    0.746   0.869
111_2      0.614   0.737    0.748   0.857
112_2      0.610   0.761    0.761   0.874
12_3       0.574   0.748    0.784   0.881
191_3      0.504   0.708    0.762   0.870
13_3       0.617   0.746    0.761   0.868
202_3      0.566   0.743    0.756   0.864
205_3      0.499   0.623    0.723   0.782
7_8        0.587   0.745    0.758   0.869
8_4        0.595   0.723    0.742   0.853
17_3       0.613   0.716    0.735   0.854
4_8        0.590   0.690    0.669   0.813
15_3       0.592   0.697    0.669   0.817
5_8        0.578   0.635    0.731   0.854
16_3       0.604   0.719    0.702   0.842
2_4        0.596   0.684    0.635   0.797
6_10       0.528   0.582    0.752   0.855
Average    0.546   0.671    0.699   0.809
$X - Y = \emptyset$. Either coefficient can be utilized in the evaluation.
However, the Tanimoto coefficient is chosen here to ensure a consistent
comparison of the published results with the HMM-based
segmentation results. The Tanimoto coefficient was computed
for both gray matter and white matter segmentation based on an
analysis of variance in which the coefficient is the dependent
variable while the training dataset and the tissue type are the
independent factors. The results are demonstrated in Table 7.
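The relation between the two overlap measures can be illustrated on sets of voxel indices. The sketch below uses hypothetical names and values; it also checks the standard identity T = D / (2 - D), which follows directly from the two definitions and implies that the Tanimoto coefficient never exceeds the Dice coefficient.

```python
def tanimoto(x, y):
    """Tanimoto overlap of Eq. (47) for two sets of voxel indices."""
    union = x | y
    return len(x & y) / len(union) if union else 1.0

def dice(x, y):
    """Dice overlap of Eq. (46) for the same two sets."""
    total = len(x) + len(y)
    return 2.0 * len(x & y) / total if total else 1.0

# Hypothetical voxel index sets for one tissue in two segmentations.
x = {1, 2, 3, 4}
y = {3, 4, 5}
t = tanimoto(x, y)  # 2 / 5
d = dice(x, y)      # 4 / 7
```

Because of this one-to-one relation, results reported with either coefficient can be converted to the other when comparing against published figures.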
Table 6 shows the average results that were reported on the
IBSR website using the same data that was used in this study.
The BMAP algorithm described in [11] is based on HMRF
computation. Although the results of the discrete model are
close to most of the reported ones, the results of the
continuous model are superior, even when compared to
HMRF. The IBSR data consists of various image sequences
representing differing real-world data sets. The HMM was
trained with one data set, and then used to segment the
remaining data sets. In both Tables 7 and 8, the first column
represents the image sequence numbers. The averages for the
HMM are shown in the last row for comparison with existing
results.
For a fair comparison, the preprocessing phase of intensity
variations was removed, and the results were compared to those
of the Adaptive MAP algorithm [15], which takes care of
intensity variations of segments through an ML stage for
initialization purposes. The AMAP is based on Hidden Markov
Random Fields, so after removing this preprocessing phase the
comparison comes close to a direct comparison of the two
algorithms, except for the difference in filtering.
We compare with the AMAP and not the BMAP because the
latter models the bias field, which was not considered in our
analysis. The results in Table 8 demonstrate that the HMMs
were able to segment the brain with higher accuracies.
This supports the initial argument presented at the
beginning of the paper, which is that HMMs during
classification encode relations not only between neighboring
voxels, but also between voxels present in non-neighboring
sites, which is not the case in the HMRF.
Inspection of segmented slices from case 5_8, which was
described earlier in the preprocessing phase during sudden
intensity correction, is shown in Fig. 8. The comparison reveals
the expected outcome: without sudden intensity correction the whole
slice is erroneously segmented as gray, and the bright one gets
segmented as white.
The results in the tables provide an
objective assessment of the quality of the algorithms, yet
accuracy in practical cases may be much higher since each of the
IBSR data sets contains at least one form of difficulty.
Table 8
Overlapping results obtained without carrying out sudden intensity correction
Sequence   White      Gray
100_23     0.792087   0.867957
11_3       0.795936   0.863375
110_3      0.756762   0.850009
111_2      0.777369   0.844556
112_2      0.775694   0.854445
12_3       0.814668   0.873579
191_3      0.798191   0.864343
13_3       0.799298   0.869574
202_3      0.793114   0.857034
205_3      0.760803   0.761042
7_8        0.743964   0.836973
8_4        0.734167   0.817848
17_3       0.710917   0.814474
4_8        0.631508   0.774435
15_3       0.700746   0.790821
5_8        0.238758   0.690957
16_3       0.72307    0.815892
2_4        0.632165   0.77051
6_10       0.386919   0.668827
Average    0.668307   0.774333
Fig. 8. Comparison between segmented slices from image sequence 5_8 with and without carrying out sudden intensity correction.
7. Conclusion
In this paper, a 3D MRI segmentation algorithm based on
HMMs is presented. The algorithm demonstrates the ability of
HMMs to handle multi-dimensional classification, whereas
HMMs were previously considered as candidates for 1D
classification only. The HMM model, together with carefully
constructed preprocessing steps, showed significant improvement
in the quality of 3D MRI segmentation when objectively
compared to other results obtained using the same data. Both
simulated and real data were used in the evaluation of the
algorithm with promising results. The objective measure on the
simulated phantoms (created from images used to establish
ground truth) showed that the algorithm, although currently
restricted to only gray and white matter, accurately identifies
these tissues within limits of error. Further work is progressing
on increasing the number of identified tissues. The results from
the real data (using expert manual segmentations as ground
truth) showed that the overlap measures are better than those of
previously established methods, and are within the limits of
error. This is easily seen, since even the expert manual segmentation
shows errors from one operator to another.
Comparisons between HMMs and HMRFs concerning the
complexity of computations involved and the ability to
segment based on the decision made from larger regions are
presented. The comparative results indicate that the current
mathematical model of MRF using Gibbs distribution can be
extended to neighborhoods larger than a 3×3×3 neighborhood
if the Markovianity assumption is extended to higher
orders, a restriction that is not required for the current HMM
modeling scheme.
The challenge in using HMRFs for larger neighborhoods
requires more research in terms of innovative modeling
schemes. In the HMM, application to larger neighborhoods
occurs with only a change in the input vectors used for training
and segmentation to represent this larger neighborhood since
the parameter updates do not rely on the size of the input
vector. No further change in the HMM algorithm is necessary.
For that purpose, the proposed HMM provides a robust
foundation that is not sensitive to the initial conditions for
enabling the segmentation of MR imaging data.
The problem that affects the application of MRF in
segmentation is the classification of the voxel based on regions
larger than 3×3×3 neighborhoods. The basis of the mapping
from HMRF to Gibbs distribution forces the neighborhood
used to compute the Gibbs distribution to be a 3×3×3
neighborhood. Larger regions necessitate the analysis of higher
order Markov Fields, which in turn needs the re-definition of
the neighborhood. To successfully relate larger neighborhoods,
HMRF must be used, where iterative segmentation takes place.
This is due to the dependency of segmenting each voxel on its
neighbors and their prior segments, which are used to compute
the potential. This in turn becomes subject to iterative local
maximization/minimization algorithms like the Expectation
Maximization (EM) and the Iterative Conditional Modes
(ICM). A common and crucial problem with such methods is
their sensitivity to the initialization conditions and the reaction
of the system to input patterns during segmentation.
The cost of increasing the neighborhood size in the context
of the proposed segmentation strategy is the extra computations
required by the algorithm (increase in L in Eq. (37)).
The problem of applicability to larger neighborhoods is
specifically important in the context of segmentation of
biomedical imaging data from multiple modalities where the
pixel/voxel neighborhood must be extended across modalities
or across time, e.g. in functional MRI, beyond the 3×3×3
neighborhood. Although this increase may provide better
segmentation accuracy, this will only occur up to a certain
point after which there would not be any significant change in
the segmentation accuracy. This issue is under investigation.
It can further complicate the choice of the
appropriate neighborhood size, since if the neighborhood
becomes very large, the segmentation accuracy can be
negatively affected. Hence, the contribution of the different
neighbors in the segmentation strategy can be weighted
according to their distance to the voxel under investigation.
These weights can be inversely proportional to the distance
between the neighboring pixels/voxels and the investigated
pixel/voxel; in other words, the significance of the neighboring
pixels/voxels in the segmentation strategy increases as the
neighbors become closer to the pixel/voxel under investigation.
Further work is continuing on the effect of increased
neighborhood sizes.
Other considerations that will enhance the accuracy of
segmentation include the usage of multi-spectral images, not
only T1 but also T2 and PD. The vectors used in classification
will then be extracted from each voxel and its neighbors in the
three images, forming a 27×3 input. In this case, a 3D
Gaussian mixture model can be used, where the input to each
state is a vector of the three intensities. Similar work using
multi-sensor data and Hidden Markov Chains has been
reported [33] and concludes that the applicability of HMC to
these problems is appropriate. This presents promising
results that have yet to be applied to MR imaging data. Further
work in applying HMM to multi-spectral MR imaging data is
currently in progress.
References
[1] W. Grimson, G. Ettinger, T. Kapur, M. Leventon, W. Wells, R. Kikinis,
Utilizing segmented MRI data in image-guided surgery, International
Journal of Pattern Recognition and Artificial Intelligence 11 (8) (1998)
1367–1397.
[2] S. Warfield, J. Dengler, J. Zaers, C.R.G. Guttmann, W.M. Wells,
G.J. Ettinger, J. Hiller, R. Kikinis, Automatic identification of grey matter
structures from MRI to improve the segmentation of white matter lesions,
Journal of Image Guided Surgery 1 (6) (1996) 326–338.
[3] E. Grimson, M. Leventon, G. Ettinger, A. Chabrerie, S. Nakajima,
F. Ozlen, H. Atsumi, R. Kikinis, P. Black, Clinical Experience with a
High Precision Image-Guided Neurosurgery System, MICCAI, Springer,
Berlin, 1998. pp. 63–73.
[4] L.R. Rabiner, A tutorial on hidden Markov models and selected
applications in speech recognition, Proceedings of the IEEE (1989)
257–286.
[5] L. Bahl, P.F. Brown, P.V. de Souza, R.L. Mercer, Maximum mutual
information estimation of hidden Markov parameters for speech
recognition, Proceedings of the IEEE (1988) 49-52.
[6] Biing-Hwang Juang, W. Chou, Chin-Hui Lee, Minimum classification
error rate methods for speech recognition, IEEE Transactions on Speech
and Audio Processing (1997) 257-265.
[7] S. Katagiri, Biing-Hwang Juang, Chin-Hui Lee, Pattern recognition
using a family of design algorithms based upon the generalized
probabilistic descent method, Proceedings of the IEEE 86 (11) (1998)
2345-2373.
[9] Y. Zhang, M. Brady, S. Smith, Segmentation of brain MR images
through a hidden Markov random field model and the expectation
maximization algorithm, IEEE Transactions on Medical Imaging 20 (1)
(2001) 45-57.
[10] J.C. Rajapakse, J. Piyaratna, Bayesian approach to segmentation of
statistical parametric maps, IEEE Transactions on Biomedical
Engineering 48 (10) (2001) 1186-1194.
[11] J.C. Rajapakse, F. Kruggel, Segmentation of MR images with intensity
inhomogeneities, Image and Vision Computing 16 (1998) 165-180.
[12] Jagath C. Rajapakse, Jay N. Giedd, Judith L. Rapoport, Statistical
approach to segmentation of single-channel cerebral MR images, IEEE
Transactions on Medical Imaging 16 (2) (1997) 176-186.
[13] N.M. John, A three dimensional statistical model for image segmentation
and its application to MR brain images, PhD thesis, University of Miami,
1999.
[14] N.M. John, M. Kabuka, M.O. Ibrahim, Multivariate statistical model for
3D image segmentation with application to medical images, Journal of
Digital Imaging 16 (4) (2004) 365-377.
[15] J.C. Rajapakse, J.N. Giedd, J.L. Rapoport, Statistical approach to
segmentation of single-channel cerebral MR images, IEEE Transactions
on Medical Imaging 16 (2) (1997) 176-186.
[16] K. Van Leemput, F. Maes, D. Vandermeulen, P. Suetens, Automated
model-based tissue classification of MR images of the brain, IEEE
Transactions on Medical Imaging 18 (10) (1999) 897-908.
[17] S.Z. Li, Markov random field models in computer vision, in: Proceedings
of the European Conference on Computer Vision, Stockholm, Sweden,
1994, pp. 361-370.
[18] J. Besag, On the statistical analysis of dirty pictures, Journal of the
Royal Statistical Society, Series B 48 (3) (1986) 259-302.
[19] K. Held, E.R. Kops, B.J. Krause, W.M. Wells III, R. Kikinis,
H.-W. Muller-Gartner, Markov random field segmentation of brain MR
images, IEEE Transactions on Medical Imaging 16 (6) (1997) 878-886.
[20] X. Descombes, F. Kruggel, D.Y. von Cramon, Spatio-temporal fMRI
analysis using Markov random fields, IEEE Transactions on Medical
Imaging 17 (6) (1998) 1028-1039.
[21] S. Ruan, C. Jaggi, J. Xue, J. Fadili, D. Bloyet, Brain tissue classification of
magnetic resonance images using partial volume modeling, IEEE
Transactions on Medical Imaging 19 (12) (2000) 1179-1187.
[22] K. Van Leemput, F. Maes, D. Vandermeulen, P. Suetens, A unifying
framework for partial volume segmentation of brain MR images, IEEE
Transactions on Medical Imaging 22 (1) (2003) 105-113.
[23] W.M. Wells, E.L. Grimson, R. Kikinis, F.A. Jolesz, Adaptive
segmentation of MRI data, IEEE Transactions on Medical Imaging 15
(8) (1996) 429-442.
[24] R. Guillemaud, J.M. Brady, Estimating the bias field of MR images, IEEE
Transactions on Medical Imaging 16 (6) (1997) 238-251.
[25] J.L. Marroquin, B.C. Vemuri, S. Botello, F. Calderon, A. Fernandez-
Bouzas, An accurate and efficient Bayesian method for automatic
segmentation of brain MRI, IEEE Transactions on Medical Imaging 21
(8) (2002) 934-945.
[26] B. Moretti, L.M. Fadili, S. Ruan, N. Bloyet, B. Mazoyer, Phantom-based
performance evaluation: application to brain segmentation from magnetic
resonance images, Medical Image Analysis 4 (4) (2000) 303-316.
[27] A. Zavaljevski, A.P. Dhawan, M. Gaskil, W. Ball, J.D. Johnson, Multi-
level adaptive segmentation of multi-parameter MR brain images,
Computerized Medical Imaging and Graphics 24 (2) (2000) 87-98.
[28] Y. Wang, T. Adali, J. Xuan, Z. Szabo, Magnetic resonance image analysis
by information theoretic criteria and stochastic site models, IEEE
Transactions on Information Technology in Biomedicine 5 (2) (2001)
150-158.
[29] K. Van Leemput, F. Maes, D. Vandermeulen, P. Suetens, A statistical
framework for partial volume segmentation, Lecture Notes in Computer
Science 2208 (2001) 204-212.
[31] R. Fjortoft, Y. Delignon, W. Pieczynski, M. Sigelle, F. Tupin,
Unsupervised classification of radar images using hidden Markov chains
and hidden Markov random fields, IEEE Transactions on Geoscience and
Remote Sensing 41 (3) (2003) 675-685.
[32] S. Derrode, W. Pieczynski, Signal and image segmentation using pairwise
Markov chains, IEEE Transactions on Signal Processing 52 (9) (2004)
2477-2489.
[33] N. Giordana, W. Pieczynski, Estimation of generalized multisensor
hidden Markov chains and unsupervised image segmentation, IEEE
Transactions on Pattern Analysis and Machine Intelligence 19 (5) (1997)
465-475.
M. Ibrahim et al. / Image and Vision Computing xx (2006) 1-15