Hidden Markov Models-based 3D MRI Brain Segmentation


M. Ibrahim, N. John, M. Kabuka *, A. Younis

Department of Electrical and Computer Engineering, College of Engineering, University of Miami,

1251 Memorial Drive, Room 406, Coral Gables, FL 33146, USA

Received 18 September 2004; received in revised form 4 February 2006; accepted 1 March 2006

Abstract

This paper introduces a 3D MRI segmentation algorithm based on Hidden Markov Models (HMMs). The mathematical models for the HMM

that forms the basis of the segmentation algorithm for both the continuous and discrete cases are developed and contrasted with Hidden Markov Random Field in terms of complexity and extensibility to larger fields. The presented algorithm clearly demonstrates the capacity of HMM to

tackle multi-dimensional classification problems.

The HMM-based segmentation algorithm was evaluated through application to simulated brain images from the McConnell Brain Imaging

Centre, Montreal Neurological Institute, McGill University as well as real brain images from the Internet Brain Segmentation Repository (IBSR),

Harvard University. The HMM model exhibited high accuracy in segmenting the simulated brain data and an even higher accuracy when

compared to other techniques applied to the IBSR 3D MRI data sets. The achieved accuracy of the segmentation results is attributed to the HMM

foundation and the utilization of the 3D model of the data. The IBSR 3D MRI data sets encompass various levels of difficulty and artifacts that

were chosen to pose a wide range of challenges, which required handling of sudden intensity variations and the need for global intensity level

correction and 3D anisotropic filtering. During segmentation, each class of MR tissue was assigned to a separate HMM and all of the models were

trained using the discriminative MCE training algorithm. The results were numerically assessed and compared to those reported using other

techniques applied to the same data sets, including manual segmentations establishing the ground truth for real MR brain data. The results

obtained using the HMM-based algorithm were the closest to the manual segmentation ground truth in terms of an objective measure of overlap

compared to other methods. © 2006 Elsevier B.V. All rights reserved.

Keywords: Hidden Markov Models; Image segmentation; Medical imaging

1. Introduction

Interpretation of the biomedical imaging of the brain plays

an important part in diagnosis of various diseases and injury.

Due to the importance of brain imaging interpretation,

significant research efforts have been devoted to developing

better and more efficient techniques in several related areas

including processing, modeling, and understanding of brain

images. In particular, the problem of automating 3D

segmentation of brain imaging using Magnetic Resonance

Imaging (MRI), Computed Tomography (CT), Positron

Emission Tomography (PET) or other modalities, has received

special attention as evidenced by numerous published research work [1–3]. This is mainly due to the multitude of benefits that

may be gained from accurate automated 3D brain

segmentation.

Segmentation frameworks based on Markov Random Fields

(MRF) and Hidden Markov Random Fields (HMRF) were

introduced in several reported efforts [9–12]. MRFs and

HMRFs share the common property of revealing the

dependency between the imaging voxels to be segmented and

their first-degree neighbors. However, both frameworks are

computationally intensive, which adversely affects their

practical applicability in medical environments. On the other

hand, Hidden Markov Models (HMMs) have proven valuable

when applied to Automatic Speech Recognition (ASR) [4],

where ASR is essentially a pattern recognition problem. In fact,

HMRFs, which are mainly applied in computer vision and

image processing, grew out of further developments of HMMs.

Hidden Markov Chains have also been reported for image

segmentation using radar, synthetic and multi-sensor images

[31–33]. A generalized mixture estimation approach is

Image and Vision Computing xx (2006) 1–15

www.elsevier.com/locate/imavis

0262-8856/$ - see front matter © 2006 Elsevier B.V. All rights reserved.

doi:10.1016/j.imavis.2006.03.001

* Corresponding author. Tel.: +1 305 284 2212; fax: +1 305 284 4044.

E-mail address: [email protected] (M. Kabuka).


presented for unsupervised classification of Hilbert-Peano

scans of radar images [31], which combines Hidden Markov

Chain models and Hidden Markov Random Field models.

Similarly, pairwise Markov Random Chain models provided

the basis for unsupervised signal and image segmentation of

simulated as well as radar images [32]. Another approach

utilizing Hidden Markov Chains was presented for image segmentation of synthetic and multi-sensor radar images [33].

These techniques provide promising results for utilizing

HMMs for MR image segmentation.

HMMs, implemented using the Viterbi algorithm, are

sufficiently capable of encoding the first-degree relationships

and can be extended to higher degrees. Encoding first-degree

relationships among the voxels will be shown, as evidenced by

the experimental results, to provide sufficient information for

accurate segmentation of 3D MRI brain imaging data. The

main training algorithms that have been developed for HMMs

are the Baum–Welch algorithm [4] and the Maximum Mutual

Information (MMI) algorithm [5]. Both techniques have been argued to be inefficient in the context of Bayesian classification, where it is shown that neither algorithm necessarily results in the best Bayesian threshold [6]. Consequently, a new

algorithm, namely the Minimum Classification Error (MCE),

was developed [6], which takes into consideration exposing

each of the HMM nodes to both the patterns to be rejected as

well as the patterns to be recognized. As a result, the HMM

nodes can minimize the accompanying error rate by moving

the Bayesian threshold closer to the correct location as shown

in Fig. 1.

Many advances in brain MR image segmentation have

relied on a Bayesian framework and Markov Random Fields

(MRFs) [17]. In [15], the smoothness and piecewise contiguous nature of the tissue regions in MR cerebral images were modeled using a 3D MRF. A segmentation algorithm, based on

the statistical model, finds the approximate Maximum A

Posteriori (MAP) estimation of the segmentation model

parameters from the MR imaging data. Another scheme for

segmentation was based on the Iterative Conditional Modes

(ICM) algorithm [18], in which measurement model parameters were estimated using local information at each site, and

the prior model parameters were estimated using the

segmentation results after each cycle of iterations. In this

case, MRFs were used to model only the intensity process, and

the segmentation results were improved by incorporating the

discontinuity process into the prior model. The scheme also

addressed the effect of magnetic field inhomogeneities and

biological variations of tissues as variations of the model

parameters. Unfortunately, this model did not investigate the

discontinuity process in the 3D MR volumes.

A fully automated 3D-segmentation technique for MR brain

images was introduced in [19] that relied on an MRF model to

capture the non-parametric distributions of tissue intensities,

neighborhood correlations, and signal inhomogeneities in MR

images. The technique used two algorithms based on Simulated

Annealing and on Iterative Conditional Modes and started with

a training process of typical echo intensities and setting one of the MRF parameters according to the expected inhomogeneity.

The technique was able to automatically segment the entire 3D

MR volume, as well as different MR images acquired using the

same MR sequence. Another study [20] involved embedding

the problem of functional MRI (fMRI) analysis into a Bayesian

framework, and then provided an algorithm to restore and

analyze fMRI using MRFs in a Bayesian framework. The study

analyzed the shortcomings of the Statistical Parameter Map

(SPM) by using a 3D MRF where the third dimension

represents time, and then the proposed restoration approach

was applied before using SPM, which resulted in an

improvement of the detection sensitivity. This study also

analyzed the hemodynamic response using three parameters,

the norm, the maximum and the time when the maximum

occurs, where it was shown that when the values of these

parameters in neighboring voxels are far from each other, the

probability of detection is lower since the associated

hemodynamic responses are not consistent in the spatial domain. Hence, the problem was modeled using two-level

MRF interactions between the activation map and the three

parameter maps. The detection of an activated area, thus,

depends on the norm of the hemodynamic response and some

contextual information on this norm as well as the consistency

of the hemodynamic function parameters across this area.

Another fully automated method for model-based tissue

classification of magnetic resonance (MR) images of the brain

was introduced in [16]. The method relies on MRFs to

incorporate contextual information and uses a digital brain

atlas for the expected a priori information of the spatial

locations of the tissue classes. The main idea of the method is

to interleave the classification with MR bias field correction,

intensity distribution estimation, and estimation of MRF

parameters. Hence, each iteration improves the classification of the single and multi-spectral MR images and corrects the MR signal inhomogeneities. The proposed strategy can be considered a fully automated method for tissue classification that produces objective and reproducible results. Another automatic method is presented in [21],

where the objective of the study is to classify the brain tissue

while taking into account the partial volume effect, which

results in MR image volumes being composed of a mixture of

several tissue types. This study assumes that the brain dataset is composed of gray matter, white matter, cerebro-spinal fluid, and mixtures (called mix-classes).

[Figure 1: class PDF and non-class PDF plotted as probability against argument, with the erroneous threshold and the Bayesian threshold marked.]

Fig. 1. Correct Bayesian threshold vs. erroneous one.

The study provided a

statistical model of the mix-classes and it showed that it

could be approximated by a Gaussian function under some

conditions. The proposed method used a two-step strategy; in

the first step, it segmented the brain into pure and mix-classes

while the second step is to re-classify the mix-classes into the

pure classes using knowledge about the obtained pure classes.Both steps use MRF models as well as the multi-fractal

dimension describing the topology of the brain to provide an

additional energy term in the MRF model to improve

discrimination of the mix-classes. The proposed strategy is

unsupervised, fully automatic, and uses only T1-weighted

images. In [22], a statistical framework for partial volume

segmentation of MR images of the brain was introduced. The

framework starts by segmenting the image using a parametric

statistical model in which each voxel is classified to one single

type of tissue. Then, it uses a down-sampling step that

addresses partial volumes along the borders between tissues. In

this step, a number of voxels in the original image grid

contribute to the intensity of each voxel in the resulting image

grid. The framework also uses an Expectation Maximization

(EM) approach to estimate the parameters of the new model

and to perform the partial volume classification.

In [23], a statistical segmentation framework of brain MR

images based on Hidden Markov Random Field (HMRF) is

introduced, which overcomes the problems of Finite Mixture

(FM) models [24,25] that do not take into account the spatial

properties of the image. The HMRF model is an MRF model

whose state sequence cannot be observed directly but can be

indirectly estimated through observations. The strategy also

uses an EM algorithm to provide an accurate and robust

segmentation. The study in [26] introduced an efficient and accurate automatic 3D segmentation approach for brain MR

images. The approach uses a brain atlas in conjunction with a

robust registration procedure to find a non-rigid transformation

that maps the standard brain to the specimen to be segmented,

and hence, is used to segment the brain from non-brain tissues

and compute prior probabilities for each class at each voxel

location. The approach also involved a fast and accurate way to

find optimal segmentations based on EM models, given the

intensity models along with the spatial coherence assumption.

Unfortunately, the study does not take the Partial Volume (PV)

effect into account.

A contextual segmentation technique to detect brain

activation from functional brain images based on a Bayesian

framework is presented in [28], which uses an MRF model to

represent configurations of activated brain voxels. It also uses

likelihoods given by statistical parametric maps to find the

maximum a posteriori estimation of segmentation. The

technique is capable of analyzing experiments involving

multiple-input stimuli. The study in [27] introduced a model-

based approach for automatic segmentation and classification

of multi-parameter MR brain images into 15 tissue classes. The

model approximated the spatial distribution of tissue classes by

a Gaussian MRF and used the maximum likelihood method to

estimate class probabilities and transitional probabilities for

each pixel of the image. The proposed algorithm is not only

accurate compared to manual segmentation but also can learn

new tissue classes. An unsupervised tissue characterization

algorithm was introduced in [29] that is both statistically

principled and patient specific. The method used adaptive

standard finite normal mixture and inhomogeneous MRF

models, whose parameters were estimated using ER method

and relaxation labeling algorithms under information theoretic criteria.

A technique for assessing the accuracy of segmentation

algorithms was presented in [10] and applied to the

performance evaluation of brain editing and brain tissue

segmentation algorithms for MR images. It relied on distance-based discrepancy features between the ground truth obtained from a realistic digital brain phantom, which is taken as

a reference, and the edited/segmented brain tissues. The

proposed strategy can be used to evaluate and validate any

segmentation algorithm, and it is able to determine quantitatively to what extent a segmentation algorithm is sensitive to

internal parameters, noise, artifacts or distortions when a

ground truth is given.

In this paper, a segmentation algorithm based on Hidden

Markov Models is presented, in conjunction with the required

preprocessing, for MR data. The algorithm is multi-dimensional and demonstrates a high degree of accuracy for 3D MRI

brain segmentation, compared to other techniques. Unlike

generic pre-processing used in most image processing and

computer vision applications, the pre-processing phases used in

this algorithm are specifically developed to handle problems

encountered in 3D MRI brain segmentation. These problems

include correction of sudden intensity variations resulting from

artifacts during the acquisition process and global brightness

and contrast correction, with both problems showing a significant impact on segmentation accuracy. In addition to its segmentation accuracy, the distinguishing characteristics of the HMM-based segmentation algorithm include efficient computational requirements, unique scanning of the 3D MRI data that enables the modeling of the effect of a voxel's neighborhood on that voxel's segmentation, and generic applicability to

larger neighborhoods that is important for the detection of

larger features that exceed the high-resolution neighborhood

size.

The 3D MRI segmentation algorithm was evaluated using

simulated 3D MRI brain data sets obtained from McConnell

Brain Imaging Centre, Montreal Neurological Institute, McGill

University (http://www.bic.mni.mcgill.ca/) and real 3D MRI

brain data sets obtained from the Internet Brain Segmentation

Repository (IBSR), Center for Morphometric Analysis at

Massachusetts General Hospital (http://www.cma.mgh.harvard.edu/ibsr/). The 3D MRI data sets are used to perform an

objective assessment of the segmentation results based on a

metric that enables the comparison of the segmentation results obtained using the presented algorithm with those of clinical experts performing segmentation manually, which are available from the IBSR web site. The metric is termed the overlapping coefficient; it is equal to one if the automatic segmentation results are identical to the manual ones and reduces to zero when there is no intersection. The quality of the segmentation results obtained using the algorithm presented in this paper was further evaluated by comparison with the

results of other algorithms applied to the same data sets and

published on the IBSR website.

This paper is organized as follows: Section 2 describes the

underlying mathematical foundation upon which the algorithm

is based. Section 3 provides the details of the adopted mathematical model for the discrete Hidden Markov Models

and is followed by the mathematical foundation of the

continuous case. Then, a complexity analysis for comparing

Markov Random Fields and Hidden Markov Models is

presented in Section 3. Section 5 details the training and

segmentation steps in both the continuous and discrete cases.

Section 4 provides the detail of the preprocessing phases.

Finally, experimental results using both real and simulated 3D

MRI data sets are presented in Section 6.

2. Mathematical model

The basic foundation of the presented algorithm relies on

the ability of the underlying Hidden Markov Model (HMM) to

build knowledge about the input multi-dimensional data

vectors or sequences that reflect the parameters of the MR

imaging modality, i.e. intensity information about the voxel

and its neighborhood. Hidden Markov Models are descendants

of Markov Chains, which are made of different states

statistically bound by transition probabilities. A HMM is

characterized by a set of internal states, the transition

probabilities among the states in response to an input symbol

from the sequence, and the emission probabilities of symbols

from the different states. The HMM knowledge is built in the form of the transition as well as the emission probabilities of

the states that are conditioned in response to the input symbols

of the sequence during the learning stage based on two mathematical assumptions. First, the Markovianity assumption, which is expressed as follows:

$$p(q_i = s_i \mid q_{i-1} = s_a,\; q_{i-2} = s_b,\; \ldots) = p(q_i = s_i \mid q_{i-1} = s_a) \tag{1}$$

Eq. (1) imposes the condition that the probability $p$ of transition from one state $q_{i-1}$ to another $q_i$ is only dependent on the previous state $q_{i-1}$. In other words, the probability is independent of the states prior to $q_{i-1}$.

Second, the assumption that the emission probabilities from each state are independent of each other, which leads to the output probability of each path being a product of the emission and transition probabilities of its states, summed over all paths as expressed in Eq. (2):

$$p(O \mid \Lambda) = \sum_{q_1, q_2, \ldots, q_n} \pi_{q_1}\, b_{q_1}(O_1)\, a_{q_1 q_2}\, b_{q_2}(O_2)\, a_{q_2 q_3} \cdots b_{q_n}(O_n) \tag{2}$$

where $p$ is the output probability of a chain $O = O_1 O_2 \cdots O_n$, $b_{q_x}(O_y)$ is the emission probability of pattern $O_y$ from state $q_x$, $a_{ij}$ is the transition probability from state $i$ to state $j$, $\pi_{q_x}$ is the initial probability of state $q_x$, and $\Lambda$ is a vector representing the model parameters. Higher-order Markov Models increase the level of dependency, which complicates the analysis of higher-order systems. Moreover, first-order Hidden Markov Models assume that the states are hidden and cannot be observed at the output stage. Instead, only the outputs emitted from those states are observable without knowing which states emitted those outputs.
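As a minimal sketch of Eq. (2), the sum over all state sequences can be evaluated either literally, by enumerating every path, or with the standard forward recursion, which gives the same value at far lower cost. The function names and the small two-state model below are assumptions chosen for illustration only; they are not taken from the paper's implementation.

```python
from itertools import product


def sequence_probability(pi, a, b, obs):
    """Evaluate Eq. (2) with the forward recursion (sum over all paths)."""
    n_states = len(pi)
    # alpha[s]: probability of the outputs seen so far, ending in state s
    alpha = [pi[s] * b[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[r] * a[r][s] for r in range(n_states)) * b[s][o]
            for s in range(n_states)
        ]
    return sum(alpha)


def brute_force_probability(pi, a, b, obs):
    """Evaluate Eq. (2) literally by enumerating every state sequence."""
    total = 0.0
    for path in product(range(len(pi)), repeat=len(obs)):
        p = pi[path[0]] * b[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= a[path[t - 1]][path[t]] * b[path[t]][obs[t]]
        total += p
    return total


# Hypothetical two-state model with binary outputs (illustration only).
PI = [0.5, 0.5]                 # initial probabilities
A = [[0.3, 0.7], [0.6, 0.4]]    # A[i][j]: transition from state i to state j
B = [[0.8, 0.2], [0.4, 0.6]]    # B[s][o]: emission of output o from state s
OBS = [0, 0, 1, 0]

print(sequence_probability(PI, A, B, OBS))
print(brute_force_probability(PI, A, B, OBS))
```

The enumeration grows exponentially with the sequence length, whereas the recursion is linear in it, which is why the recursion is the usual choice in practice.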

This is true when the Hidden Markov Models are viewed

from a similar perspective to the one presented in [4], where the

HMM was imagined as a process generating output symbols and the observations were viewed from the outside without

knowing which states emitted them. At that point, the emission

probability of one state can well be assumed to be independent

of the other outputs. However, a different case exists when the

HMM is used for MR image segmentation, where the objective

is to find the best state sequence that might have produced an

output. In contrast to (2), the output probability for each segment is calculated using the most probable path only, i.e.

without a summation over all possible paths. During training,

the goal of the training algorithms is to increase the output

probability of input sequences representing a certain class of

tissues. Hence, the transition and emission probabilities are

updated in a manner that maximizes the output probability of a

given class of tissues. This in some cases entails changing

transition and emission probabilities of prior states in order to

maximize the output probability given a certain terminating

output. This is clear if the output probability is considered only due to the most probable path. This

mechanism in turn encodes some form of relationship between

the terminal and the input sequences. The encoding of relations

arises from the fact that upon updating the transition

probabilities of the prior states their values are decreased,

forcing the most probable chain of states to change to another

set of states having higher transition and emission probabilities.

The fact that relation encoding takes place is demonstrated through a numerical example that shows that emission

probabilities during classification with HMMs are conditioned

by non-neighboring outputs.

The encoding of this relation is demonstrated through the

example HMM, shown in Fig. 2, which involves two states,

State 0 and State 1, with the initial probabilities $\pi_{q_0} = \pi_{q_1} = 0.5$, emission probabilities $b_{q_0}(0) = 0.8$, $b_{q_0}(1) = 0.2$, $b_{q_1}(0) = 0.4$, $b_{q_1}(1) = 0.6$, and transition probabilities $a_{00} = 0.3$, $a_{01} = 0.7$, $a_{10} = 0.6$, $a_{11} = 0.4$. This was tested using a sequence of five outputs, 00000, and the most probable chain was found to be 01010 with a probability of 0.016257.

However, when the last output is changed, giving 00001, the most probable chain changes not only in its last state but also in its first state, becoming 10101 with a probability of 0.008129. Changing the output emitted by the final state thus induces a change in the initial state and consequently changes the emission probability of $O_1$ depending on $O_4$.

[Figure 2: two-state HMM. State 0: $\pi = 0.5$, $b(0) = 0.8$, $b(1) = 0.2$. State 1: $\pi = 0.5$, $b(0) = 0.4$, $b(1) = 0.6$. Transition probabilities: 0.3 (State 0 to State 0), 0.7 (State 0 to State 1), 0.6 (State 1 to State 0), 0.4 (State 1 to State 1).]

Fig. 2. Example HMM.

This shows that the

emission probability of output intensities can be conditioned by

the presence of other output intensities emitted by non-

neighboring states. A simple argument based on those results

shows that the HMM can encode relations in more than one

dimension, since the intensities in these sequences are

constructed from a 3×3×3 neighborhood of voxels. Moreover, the HMM encodes relations between intensities of non-neighboring voxels in the same 3×3×3 neighborhood, even if they do not reside in the same clique, as defined in HMRF models. The knowledge stored in the HMM encodes the conditional dependency of the voxel intensities and the class of tissue to which they belong in the form of the initial probabilities, transition probabilities and emission probabilities, which are based on the mathematical model of the HMM transition among the constituent states. In contrast, Hidden Markov Random Fields (HMRFs) are based on the Gibbs distribution, which encodes relations between voxels through the usage of cliques and mathematical modeling of the potential.
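The two-state example of Fig. 2 can be checked with a short Viterbi sketch. The code below is an illustration under stated assumptions, not the authors' implementation. With State 1's emission probabilities exactly as printed ($b(0) = 0.4$, $b(1) = 0.6$), this sketch yields a best-path probability of about 0.007225 for the outputs 00000. The quoted chains 01010 and 10101 and the quoted probabilities 0.016257 and 0.008129 are reproduced when State 1's emission probabilities are taken as $b(0) = 0.6$, $b(1) = 0.4$; treating those as the intended values is an assumption made here, labelled `B_ASSUMED` in the code.

```python
def viterbi(pi, a, b, obs):
    """Return (most probable state path, its probability) for a discrete HMM."""
    n = len(pi)
    delta = [pi[s] * b[s][obs[0]] for s in range(n)]
    back = []  # back[t][s]: best predecessor of state s at step t + 1
    for o in obs[1:]:
        step, new_delta = [], []
        for s in range(n):
            # best predecessor of state s (the first one wins on ties)
            r = max(range(n), key=lambda r: delta[r] * a[r][s])
            step.append(r)
            new_delta.append(delta[r] * a[r][s] * b[s][o])
        back.append(step)
        delta = new_delta
    last = max(range(n), key=lambda s: delta[s])
    path = [last]
    for step in reversed(back):
        path.append(step[path[-1]])
    path.reverse()
    return path, delta[last]


PI = [0.5, 0.5]
A = [[0.3, 0.7], [0.6, 0.4]]            # A[i][j]: transition from i to j
B_PRINTED = [[0.8, 0.2], [0.4, 0.6]]    # emission values as printed in the text
B_ASSUMED = [[0.8, 0.2], [0.6, 0.4]]    # assumption: State 1 values swapped

print(viterbi(PI, A, B_ASSUMED, [0, 0, 0, 0, 0]))  # path 01010, p ≈ 0.016257
print(viterbi(PI, A, B_ASSUMED, [0, 0, 0, 0, 1]))  # path 10101, p ≈ 0.008129
print(viterbi(PI, A, B_PRINTED, [0, 0, 0, 0, 0])[1])  # p ≈ 0.007225
```

Under the assumed values, changing only the last output flips every state of the most probable chain, including the first, which is the dependency between non-neighboring outputs described in the text.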

In other words, both MRFs and HMRFs provide a

mathematical model for the dependency between voxel

intensities. However, HMMs can establish similar dependencies among pixel/voxel intensities that are in larger regions or do not belong to the same clique, as will be shown in Section 3 addressing the HMM mathematical model. In this work, when presenting the pixel/voxel data to the HMM-based segmentation module, each pixel/voxel is represented by a vector composed of its grayscale/color value as well as those of other pixels/voxels in its neighborhood: a 9-element vector for 2D imaging data and a 27-element vector for 3D imaging data. The vector is

presented to the HMM models and the probability of output is

calculated using prior training knowledge stored in the model.

Labeling takes place by setting the label of the voxel to that of

the HMM showing the highest output probability.
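The vector construction and the labeling rule described above can be sketched as follows. The raster ordering of the 27 intensities, the clamping of indices at the volume border, and the scoring functions standing in for trained per-class HMMs are all assumptions of this sketch; the text does not specify the scan order or the border handling at this point.

```python
def neighborhood_vector(volume, z, y, x):
    """Collect the 27 intensities of the 3x3x3 block centred on (z, y, x).

    `volume` is a nested list indexed as volume[z][y][x]. Border voxels are
    handled by clamping indices, which is an assumption of this sketch.
    """
    def clamp(i, size):
        return min(max(i, 0), size - 1)

    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    return [
        volume[clamp(z + dz, depth)][clamp(y + dy, height)][clamp(x + dx, width)]
        for dz in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
    ]


def label_voxel(vector, class_scores):
    """Return the class whose model gives the vector the highest score.

    `class_scores` maps a class name to a scoring function; each function is a
    hypothetical placeholder for the output probability of that class's HMM.
    """
    return max(class_scores, key=lambda name: class_scores[name](vector))


# Tiny 3x3x3 volume holding the values 0..26, for illustration only.
volume = [[[z * 9 + y * 3 + x for x in range(3)] for y in range(3)] for z in range(3)]
placeholder_models = {"bright": lambda v: sum(v), "dark": lambda v: -sum(v)}
print(label_voxel(neighborhood_vector(volume, 1, 1, 1), placeholder_models))
```

In an actual run, each placeholder would be replaced by the output probability computed from the trained model of the corresponding tissue class.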

The outputs of an HMM can be discrete, acquiring certain specific quantized levels, or continuous, based on continuous probability density functions (PDFs). The most common continuous PDF representation is a multivariate Gaussian distribution whose covariances are assumed to be zero, reducing to an additive mixture of normal distributions. Classification proceeds by estimating the probability that a pattern was generated by each HMM, and the model most likely to have produced the pattern determines its tissue type or class. HMMs were previously used successfully in automatic speech recognition (ASR) and are commonly used with the Minimum Classification Error (MCE) training algorithm described in [6,7], which forms the foundation of the learning process employed in the proposed segmentation framework.

During the MCE training, the derivatives of the output are

computed with respect to every parameter to be updated. Since

the output we seek is the class number, a continuous

differentiable formula is required that evaluates the correctness

of the result by replacing the non-differentiable discrete on/off

output. The mathematical model of the loss in [6,7] was used

for that case where

$$l_i = \operatorname{sigmoid}(d_i) = \frac{1}{1 + e^{-\gamma d_i + \theta}} \tag{3}$$

where $\gamma$ is the sigmoid slope, $\theta$ is a shift and $d_i$ is a continuous variable that is more negative when the result is more correct, i.e. when the HMM of class $i$ has higher probability, which can be expressed as follows:

$$d_i = -g_i(X;\Lambda) + \left[\frac{1}{k-1}\sum_{\substack{j=1\\ j\neq i}}^{k} g_j(X;\Lambda)^{\eta}\right]^{1/\eta} \tag{4}$$

The right term of Eq. (4) approaches $\max_{j \neq i} g_j(X;\Lambda)$ as $\eta \to \infty$, which leads to $d_i$ being negative if the HMM model of class $i$ showed the highest probability, and the corresponding $l_i$ will then be low.

$g_x$ is a discriminant function for each class, which does not necessarily correspond to a probability since no restrictions are imposed for that purpose. However, by using HMMs the output is the probability of the pattern, and the discriminant used is the probability due to the most probable path. $k$ is the number of models involved.
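A minimal sketch of the loss in Eqs. (3) and (4) is given below. It assumes positive discriminant values (for instance probabilities), and the default values of the slope, shift and exponent are arbitrary choices made for illustration.

```python
import math


def misclassification_measure(g, i, eta):
    """Eq. (4): d_i = -g_i + [mean over j != i of g_j**eta] ** (1 / eta)."""
    others = [g[j] for j in range(len(g)) if j != i]
    mean_power = sum(v ** eta for v in others) / len(others)
    return -g[i] + mean_power ** (1.0 / eta)


def mce_loss(g, i, gamma=1.0, theta=0.0, eta=4.0):
    """Eq. (3): sigmoid of the misclassification measure d_i."""
    d = misclassification_measure(g, i, eta)
    return 1.0 / (1.0 + math.exp(-gamma * d + theta))


# Hypothetical discriminant values for three class models.
g = [0.7, 0.2, 0.1]
print(mce_loss(g, 0))  # true class has the highest score: loss below 0.5
print(mce_loss(g, 2))  # true class has the lowest score: loss above 0.5
```

For a large exponent, the bracketed term is dominated by the strongest competing model, so the measure approaches the difference between the best competitor and the true class.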

The MCE updates each parameter trying to reach the minimum of $l_i$. For a certain parameter $x$, this update proceeds as follows

$$x^{t+1} = x^{t} - \varepsilon \frac{\partial l_i}{\partial x} \tag{5}$$

where $\varepsilon$ is the learning rate.

In the MCE algorithm [7], it is discussed that if the learning rate is chosen such that the following conditions are satisfied

$$\sum_{t=0}^{\infty} \varepsilon(t) \to \infty \tag{6}$$

$$\sum_{t=0}^{\infty} \varepsilon(t)^2 < \infty \tag{7}$$

the model parameters $\Lambda$ approach at least a local minimum $\Lambda^*$. It is also described that by using a small sigmoid slope that increases across iterations, the global minimum is achievable with a higher probability than with other training algorithms, due to smoothing of the error surface. Both considerations were addressed in the context of this paper, where the learning rate is given by:

$$\varepsilon(t) = \frac{\varepsilon(0)}{1 + at} \tag{8}$$

The integral of the learning rate from zero to infinity is infinite, and the integral of the squared learning rate is equal to $\varepsilon(0)^2/a$, where $a$ is a constant and $t$ is time, which is substituted by the iteration number, i.e. (6) and (7) are satisfied. In other words, the training of the proposed HMM is expected to be accurate, since the global minimum is reached with high probability, as well as robust, since the convergence is only dependent on the established learning rate.
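The two conditions can be checked for the schedule of Eq. (8) by direct integration. The short derivation below assumes $a > 0$ and relies on the integral test, which applies because the schedule is positive and decreasing in $t$.

```latex
\int_{0}^{\infty} \frac{\varepsilon(0)}{1 + at}\, dt
  = \frac{\varepsilon(0)}{a} \Big[ \ln(1 + at) \Big]_{0}^{\infty} = \infty ,
\qquad
\int_{0}^{\infty} \frac{\varepsilon(0)^{2}}{(1 + at)^{2}}\, dt
  = \frac{\varepsilon(0)^{2}}{a} \Big[ -\frac{1}{1 + at} \Big]_{0}^{\infty}
  = \frac{\varepsilon(0)^{2}}{a} .
```

The first integral diverges logarithmically, matching condition (6), while the second is finite, matching condition (7).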





Two HMM models will be considered during analysis. The

first one is a binary discrete HMM where each node has an

emission probability for zero and an emission probability for

one. Consequently, the input is taken in the form of a long

vector having the binary equivalent of the intensities

represented in eight bits each. The second model is a

continuous model where each node represents the emission probabilities in the form of a Gaussian Mixture. The analysis of

the discrete model will be first presented followed by the

formulas necessary for the continuous Gaussian Mixture.
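The input to the binary discrete model can be sketched as a concatenation of eight-bit codes, one per intensity. The bit order within each byte (most significant bit first) and the function name are assumptions of this sketch.

```python
def intensities_to_bits(intensities):
    """Concatenate the 8-bit binary codes of integer intensities in 0..255.

    Most-significant-bit-first ordering is an assumption of this sketch.
    """
    bits = []
    for value in intensities:
        if not 0 <= value <= 255:
            raise ValueError("intensity must fit in eight bits")
        bits.extend((value >> shift) & 1 for shift in range(7, -1, -1))
    return bits


# A 27-voxel neighborhood becomes a sequence of 27 * 8 = 216 binary symbols.
print(len(intensities_to_bits(list(range(27)))))
```

Each binary symbol is then scored against the emission probability for zero or for one of the state that emits it.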

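Since $l_i$ is a function of $d_i$, and $d_i$ is a function of $g_x$, $x = 1, \ldots, k$, the derivative of $l_i$ with respect to a certain parameter $x$ using the chain rule is given by:

$$\frac{\partial l_i}{\partial x} = \frac{\partial l_i}{\partial d_i}\,\frac{\partial d_i}{\partial g_x}\,\frac{\partial g_x}{\partial x} \tag{9}$$

$$\frac{\partial l_i}{\partial d_i} = \gamma\, l_i (1 - l_i) \tag{10}$$

$$\frac{\partial d_i}{\partial g_x} = \begin{cases} -1, & x = i \\[4pt] \dfrac{g_x^{\eta-1}}{k-1}\left[\dfrac{1}{k-1}\displaystyle\sum_{\substack{j=1\\ j\neq i}}^{k} g_j^{\eta}\right]^{1/\eta - 1}, & x \neq i \end{cases} \tag{11}$$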
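Eqs. (10) and (11) can be sanity-checked numerically. The sketch below computes the analytic derivative of $l_i$ with respect to each discriminant value $g_x$ and can be compared against a central finite difference; the function names and the sample values are assumptions made for illustration, and positive discriminant values are assumed.

```python
import math


def loss(g, i, gamma, theta, eta):
    """Eqs. (3) and (4) combined: sigmoid of the misclassification measure."""
    others = [g[j] for j in range(len(g)) if j != i]
    d = -g[i] + (sum(v ** eta for v in others) / len(others)) ** (1.0 / eta)
    return 1.0 / (1.0 + math.exp(-gamma * d + theta))


def loss_gradient(g, i, gamma, theta, eta):
    """Analytic derivative of l_i with respect to every g_x, Eqs. (10)-(11)."""
    k = len(g)
    l = loss(g, i, gamma, theta, eta)
    dl_dd = gamma * l * (1.0 - l)                       # Eq. (10)
    mean_power = sum(g[j] ** eta for j in range(k) if j != i) / (k - 1)
    grad = []
    for x in range(k):
        if x == i:
            dd_dg = -1.0                                # Eq. (11), x = i
        else:                                           # Eq. (11), x != i
            dd_dg = (g[x] ** (eta - 1) / (k - 1)) * mean_power ** (1.0 / eta - 1.0)
        grad.append(dl_dd * dd_dg)
    return grad


# Hypothetical values: three class models, true class 0.
print(loss_gradient([0.7, 0.2, 0.1], 0, gamma=2.0, theta=0.1, eta=3.0))
```

The remaining factor of Eq. (9), the derivative of $g_x$ with respect to the model parameter itself, depends on the parameter type and is treated separately for each case.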

The output of the HMM can take several forms

$$g_i = \sum_{q \in C} g(x, q; \Lambda) \tag{12}$$

$$g_i = \max_{q \in C}\, g(x, q; \Lambda) \tag{13}$$

$$g_i = \left[\frac{1}{N(C)} \sum_{q \in C} g(x, q; \Lambda)^{\eta}\right]^{1/\eta} \tag{14}$$

where $C$ represents the set of possible chains and $N(C)$ is the number of elements in $C$. The output can be any of the previous forms or functions of them. MCE training discussed in [6,7] was based on Eq. (13), which is called the segmental form, where only the optimal path is considered for update during the Generalized Probabilistic Descent update step. Since the logarithm is monotonic, minimizing or maximizing a function is equivalent to minimizing or maximizing its logarithm, so we choose the discriminant function given by

$$g_i = \log \pi_{q_0} + \sum_{t=1}^{T}\left[\log a_{q_{t-1} q_t} + \log b_{q_t}\right] \tag{15}$$

where the $b$'s are the output functions, the $a$'s are the transition probabilities and the $\pi$'s are the initial probabilities.

The HMM imposes constraints on most of the parameters associated with each model. Such constraints include that the transition probabilities going out of a state must sum to one, that the initial probabilities must sum to one, and many other constraints, which have to be satisfied during parameter update.

For that reason, a substitution was used in [6] which guarantees those constraints, where the substituted parameter is the one that gets updated in each step. The substitution previously used in [6] for the initial probability is:

$$\pi_x = \frac{\exp(\bar{\pi}_x)}{\sum_{q=1}^{Q} \exp(\bar{\pi}_q)} \tag{16}$$

The previous substitution works well except for the fact that it uses exponentials, which slows down execution. Another substitution is proposed and used in this research that does not depend on exponentials

$$\pi_x = \frac{\bar{\pi}_x^2}{\sum_{q=1}^{Q} \bar{\pi}_q^2} \tag{17}$$

$$a_{ix} = \frac{\bar{a}_{ix}^2}{\sum_{q=1}^{Q} \bar{a}_{iq}^2} \tag{18}$$

$$P_0 = \frac{\bar{P}_0^2}{\bar{P}_0^2 + \bar{P}_1^2}, \qquad P_1 = \frac{\bar{P}_1^2}{\bar{P}_0^2 + \bar{P}_1^2} \tag{19}$$

where $P_0$ and $P_1$ belong to a certain state and represent the emission probabilities of zeros and ones, respectively. The parameters that get updated are the substituted bar parameters.
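The two substitutions can be compared with a short sketch. Both map unconstrained values to non-negative numbers that sum to one; the squared form of Eqs. (17) to (19) avoids the exponential of Eq. (16). The function names and sample values are assumptions made for illustration, and the squared form is undefined when every unconstrained value is zero.

```python
import math


def squared_normalize(raw):
    """Eqs. (17)-(19): p_x = raw_x**2 / sum over q of raw_q**2."""
    total = sum(r * r for r in raw)
    return [r * r / total for r in raw]


def softmax_normalize(raw):
    """Eq. (16): p_x = exp(raw_x) / sum over q of exp(raw_q)."""
    exps = [math.exp(r) for r in raw]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical unconstrained ("bar") parameters; negative values are allowed.
raw = [0.5, -1.2, 2.0]
print(squared_normalize(raw))
print(softmax_normalize(raw))
```

Because a gradient step acts on the unconstrained values, the normalized probabilities remain valid after every update without any explicit projection.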

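To update the initial probabilities we need to find the derivative of $g_x$ with respect to every $\bar{\pi}_q$, where $q = 1, \ldots, Q$. If $q = q_0$ then

$$\frac{\partial g_x}{\partial \bar{\pi}_{q_0}} = \frac{2(1 - \pi_{q_0})}{\bar{\pi}_{q_0}} \tag{20}$$

On the other hand, if $q \neq q_0$, a dependency still exists through the normalization formula (17), and the derivative with respect to $\bar{\pi}_z$, $z \neq q_0$, becomes:

$$\frac{\partial g_x}{\partial \bar{\pi}_z} = -\frac{2\, \pi_{q_0}\, \bar{\pi}_z}{\bar{\pi}_{q_0}^2} \tag{21}$$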

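A similar case holds for the transition probabilities. During the update, we will consider the derivatives of the transition probability going out from a certain state $i$ to a state $j$.

$$\frac{\partial g_x}{\partial \bar{a}_{ij}} = \sum_{t=1}^{T} \frac{1}{a_{q_{t-1} q_t}} \frac{\partial a_{q_{t-1} q_t}}{\partial \bar{a}_{ij}} \tag{22}$$

$$\frac{\partial a_{ij}}{\partial \bar{a}_{ij}} = \frac{2\, a_{ij} (1 - a_{ij})}{\bar{a}_{ij}} \tag{23}$$

$$\frac{\partial a_{ix}}{\partial \bar{a}_{ij}} = -\frac{2\, a_{ix}^2\, \bar{a}_{ij}}{\bar{a}_{ix}^2} \tag{24}$$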

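Updating the output probabilities is much easier than updating the rest of the parameters. The first step is to find the derivative of $g$ with respect to $b$.

$$\frac{\partial g_x}{\partial b_q} = \frac{1}{b_q} \tag{25}$$

$$\frac{\partial b(x)}{\partial \bar{P}_1} = \frac{2 P_0 P_1 (2x - 1)}{\bar{P}_1} \tag{26}$$

$$\frac{\partial b(x)}{\partial \bar{P}_0} = \frac{2 P_0 P_1 (1 - 2x)}{\bar{P}_0} \tag{27}$$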

The only difference between the discrete and the continuous HMM models is the way the output probability is calculated. In the continuous case, $b(x)$ is derived from a Gaussian mixture as follows

$$b(x) = \sum_{k=1}^{K} C_k\, N(x; \mu_k, \sigma_k^2) \tag{28}$$

where two constraints are imposed. The first is that the weights $C_k$ must sum to one. The second is that the standard deviation $\sigma_k$ is always positive. To guarantee that, $\sigma_k^2$ is used for the standard deviation while $\sigma_k^4$ is used for the variance. $\mu_k$ is the mean of distribution $k$ and $x$ is the input variable. The substitution used for the weights is given by

$$C_k = \frac{\bar{C}_k^2}{\sum_{x=1}^{K} \bar{C}_x^2} \tag{29}$$

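where $K$ is the number of mixtures used. Updating the parameters is governed by the following equations:

$$\frac{\partial C_x}{\partial \bar{C}_x} = \frac{2\, C_x (1 - C_x)}{\bar{C}_x} \tag{30}$$

$$\frac{\partial C_y}{\partial \bar{C}_x} = -\frac{2\, C_y^2\, \bar{C}_x}{\bar{C}_y^2} \tag{31}$$

$$\frac{\partial b(x)}{\partial C_j} = N(x; \mu_j, \sigma_j^2) \tag{32}$$

$$\frac{\partial b(x)}{\partial \mu_j} = \frac{\partial b(x)}{\partial C_j}\, \frac{C_j (x - \mu_j)}{\sigma_j^4} \tag{33}$$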

vbxvsjZKvbxvmj

4xKmj2C2s4jsj

(34)
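The continuous output probability of Eq. (28) and the mean derivative of Eq. (33) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the standard normal density is assumed, the updated parameter is squared to give the standard deviation (so its fourth power is the variance), and all names and numeric values are hypothetical.

```python
import math


def gaussian(x, mean, raw_sigma):
    """Normal density with standard deviation raw_sigma**2 (variance raw_sigma**4)."""
    std = raw_sigma ** 2
    var = raw_sigma ** 4
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / (math.sqrt(2.0 * math.pi) * std)


def mixture_weights(raw_weights):
    """Squared substitution for the mixture weights, Eq. (29)."""
    squares = [w * w for w in raw_weights]
    total = sum(squares)
    return [s / total for s in squares]


def emission(x, raw_weights, means, raw_sigmas):
    """Gaussian-mixture output probability b(x), Eq. (28)."""
    weights = mixture_weights(raw_weights)
    return sum(c * gaussian(x, m, s) for c, m, s in zip(weights, means, raw_sigmas))


def d_emission_d_mean(x, j, raw_weights, means, raw_sigmas):
    """Derivative of b(x) with respect to the mean of component j, Eqs. (32)-(33)."""
    weights = mixture_weights(raw_weights)
    n_j = gaussian(x, means[j], raw_sigmas[j])                     # Eq. (32)
    return n_j * weights[j] * (x - means[j]) / raw_sigmas[j] ** 4  # Eq. (33)


# Hypothetical two-component mixture.
raw_weights, means, raw_sigmas = [1.0, 2.0], [0.0, 2.0], [1.0, 1.2]
x, eps = 1.2, 1e-6
analytic = d_emission_d_mean(x, 1, raw_weights, means, raw_sigmas)
numeric = (emission(x, raw_weights, [means[0], means[1] + eps], raw_sigmas)
           - emission(x, raw_weights, [means[0], means[1] - eps], raw_sigmas)) / (2.0 * eps)
```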

This then leads to the general form of the training and segmentation algorithms for a 3×3×3 3D neighborhood. Voxel data is represented as a vector composed of 27 floating-point numbers. These numbers represent the intensity of the voxel and the intensity of each of its 26 3D neighbors. This vector is presented to the HMM model and the probability of output is calculated using prior training knowledge stored in each model. Labeling takes place by setting the label of the voxel to that of the node showing the highest output probability.
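The labeling step described above can be sketched as follows. The scoring functions stand in for trained HMMs and are purely hypothetical (a negative squared distance to an assumed class mean); only the 27-element neighborhood vector and the highest-score rule come from the text.

```python
def neighborhood_vector(volume, z, y, x):
    """Return the 27 intensities of voxel (z, y, x) and its 26 neighbors.

    `volume` is a nested list indexed as volume[z][y][x]; only interior
    voxels are supported in this sketch.
    """
    return [float(volume[z + dz][y + dy][x + dx])
            for dz in (-1, 0, 1)
            for dy in (-1, 0, 1)
            for dx in (-1, 0, 1)]


def label_voxel(vector, models):
    """Pick the label whose model gives the highest score for the vector."""
    return max(models, key=lambda label: models[label](vector))


def make_toy_model(class_mean):
    """Hypothetical stand-in for a trained HMM: higher score means a better match."""
    return lambda vec: -sum((v - class_mean) ** 2 for v in vec)


# Toy 3x3x3 volume of constant intensity and two assumed tissue classes.
volume = [[[100 for _ in range(3)] for _ in range(3)] for _ in range(3)]
models = {"white": make_toy_model(110.0), "gray": make_toy_model(80.0)}
vector = neighborhood_vector(volume, 1, 1, 1)
label = label_voxel(vector, models)
```

In an actual system each entry of `models` would return the output probability of the corresponding trained HMM for the presented vector.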

3. Comparison with Hidden Markov Random Field

Comparison of the HMM and HMRF in the context of MRI segmentation will be presented from two points of view. The first is performance, where the complexity analysis of both is presented. The second is the ability to encode relations among voxels in larger neighborhoods.

In order to assess the computational efficiency of the proposed HMM-based segmentation framework, the complexity of the HMM-based segmentation is compared to that of the widely utilized HMRF-based segmentation.

Since the continuous Gaussian Mixture HMM is similar to HMRF segmentation, its complexity analysis is used for performance comparisons. This starts with the estimation of the Gaussian mixture given by

$$O = \sum_{i=1}^{G} \frac{w_i}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - \mu)^2}{\sigma^2}} \tag{35}$$

where $w_i$ is the weight associated with this Gaussian response. The number of floating-point operations $N_f$ required for such an operation is given by

$$O_{HMM} = 9 \times G \tag{36}$$

where the nine operations account for finding $(x - \mu)$, squaring it, finding $\sigma^2$, negating $(x - \mu)^2$, finding the exponent, calculating $2\pi\sigma^2$, finding the square root, dividing $w_i$ by the square root, and multiplying by the exponent, and $G$ is the number of mixtures used.

Hence, to find the output probability of a certain number in a sequence, Eq. (36) gives the number of required floating-point operations. In its first iteration, the Viterbi algorithm computes the output probability of the first pattern in the sequence, multiplied by the initial probability of each node, which amounts to $n \times (1 + 9 \times G)$ computations for $n$ nodes. In the subsequent operations, the Viterbi algorithm multiplies the current probability set at each node by the transition probability to each node, which requires an extra $n^2$ operations, and adds the output probability of the current pattern, which needs $1 + 9 \times G$ operations, so the total number of operations is given by

$$O_{HMM} = n (1 + 9G) + n^2 (1 + 9G)(L - 1) \;\Rightarrow\; O_{HMM} = O(n^2 G L) \tag{37}$$

where $L$ is the length of the sequence.
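For a sense of scale, the operation count can be evaluated directly. The sketch below assumes the reading of Eq. (37) in which the first step costs n(1 + 9G) operations and each of the remaining L - 1 steps costs n²(1 + 9G); the function names are hypothetical, and the sample values (10 states, 15 mixtures, a 27-element sequence) are illustrative.

```python
def gaussian_mixture_ops(num_mixtures):
    """Floating-point operations for one mixture evaluation, Eq. (36)."""
    return 9 * num_mixtures


def viterbi_ops(num_nodes, num_mixtures, seq_length):
    """Operation count under the assumed reading of Eq. (37)."""
    per_output = 1 + gaussian_mixture_ops(num_mixtures)
    first_step = num_nodes * per_output
    later_steps = num_nodes * num_nodes * per_output * (seq_length - 1)
    return first_step + later_steps


# Illustrative values: 10 states, 15 mixtures, one 3x3x3 neighborhood vector.
total_ops = viterbi_ops(10, 15, 27)
```

Doubling the number of nodes roughly quadruples the count for long sequences, which matches the quadratic dependence on n in the stated order of complexity.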

HMRF models start by counting the number of cliques in the 3×3×3 neighborhood. Those cliques can only be formed as 2×2×2 neighborhoods, i.e. composed of eight voxels. Any combination of more than two voxels will form a clique in that neighborhood. Each of these cliques requires the evaluation of the potential. Since the complexity of computing the potential depends on the model being used, the potential is assumed to require one cycle per voxel and another cycle for the clique, which is the best-case scenario for the HMRF models. This can be demonstrated by the simplest case of subtracting the mean from each voxel, squaring, and summing all the potentials together. More complicated models will, in turn, require higher complexities. The number of operations $N_V$ required to carry out these computations of the potential will thus be:

$$N_V = 2 \times 16 \times (2 + 1) + 4 \times 4 \times (2 + 1) + 1 + \sum_{v=3}^{8} \binom{8}{v} \times 4 \times (v + v - 1) \tag{38}$$

The probability distribution $P(f)$ of the configuration is a Gibbs distribution (with respect to the neighborhood system used), which can be given by:

$$P(f) = \frac{1}{Z} \times e^{-\frac{1}{T} U(f)} \quad \text{where} \quad U(f) = \sum_{\text{all cliques}} V_c(f) \tag{39}$$

If $Z$ is assumed constant by restricting the cliques either to single locations or to single and double locations, then the computational complexity of computing the Gibbs distribution will be $O(1)$, since it requires a constant number of operations irrespective of any of the model parameters. However, for more accurate computations the estimation of $Z$ increases the order of complexity. Moreover, for a continuous case like that presented in this paper, it is impossible to find the exact value of $Z$, as it is the result of 27 nested integrals. This leads to estimation, which in turn affects the accuracy of the computed probability. The complexity becomes:

$$O_{HMRF} = O(Z) \tag{40}$$

So, in the continuous case, the HMRF becomes the computation of 27 nested integrals, whereas the HMM is dependent on the number of classes, the number of nodes, and the size of the input vector.

In the HMM, application to larger neighborhoods occurs

with a change in the size of the input vectors used for training

and segmentation to represent the larger neighborhood since

the parameter updates (11)-(34) do not rely on the size of the

input vector. No further change in the HMM algorithm is

necessary. As a result, the HMM provides a robust foundation

that is generically applicable to the segmentation of multi-

dimensional datasets in arbitrarily large neighborhoods, i.e.

applicable to MRI as well as MRSI data.

Larger neighborhoods raise a computational concern in the

case of HMRF segmentation. For example, the classification of

a voxel based on a neighborhood larger than 3×3×3 involves

the mapping from MRF to Gibbs distribution, which, in turn,

entails computing the Gibbs distribution in a 3×3×3

neighborhood. Larger regions necessitate the analysis of higher

order Markov Fields, which requires the re-definition of the

neighborhood. To successfully relate larger neighborhoods,

HMRF must be used in which iterative segmentation takes

place. This is due to the dependency of segmenting each voxel

on its neighbors and their prior segmentations, which are used

to compute the potential. Thus, the segmentation becomes

subject to iterative local maximization/minimization

algorithms like the Expectation Maximization (EM) and the

Iterative Conditional Modes (ICM), which are typically used to

avoid the analytically intractable nature of estimating the best

solution for the HMRF. A common concern with such methods

is their sensitivity to the initialization conditions and the

reaction of the system to input sequences during segmentation.

HMMs are easily applicable to larger neighborhoods at the cost of the additional computation required by the algorithm (an increase in L in Eq. (37)), rather than at the cost of increased sensitivity to initial conditions and to input sequences. The problem of applicability to larger neighborhoods is specifically important in the context of segmentation of biomedical imaging data from multiple modalities, where the voxel neighborhood must be extended across modalities or across time, e.g. in functional MRI, beyond the 3×3×3 neighborhood. Although this increase may provide better segmentation accuracy, the increase in accuracy is bounded, i.e. it will occur up to a certain neighborhood size, after which there would not be any significant change in the segmentation accuracy due to the smoothing effect of utilizing a larger neighborhood. This issue can further complicate the choice of the appropriate neighborhood size, since if the neighborhood becomes very large, the segmentation accuracy can be negatively affected. Hence, the contribution of the different neighbors in the segmentation process may be weighted according to their distance to the voxel under investigation. These weights can be inversely proportional to the distance between the neighboring pixels/voxels and the investigated pixel/voxel. In other words, the significance of the neighboring pixels/voxels in the segmentation strategy increases as the neighbors become closer to the pixel/voxel under investigation.
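One possible form of such a weighting is sketched below, assuming Euclidean distance in voxel units. The function name and the choice of giving the center voxel the same weight as a face neighbor are assumptions of this sketch; the text only states that the weights can be inversely proportional to distance.

```python
import math


def inverse_distance_weights(radius=1):
    """Weights for a (2*radius+1)^3 neighborhood, inversely proportional to distance."""
    weights = {}
    offsets = range(-radius, radius + 1)
    for dz in offsets:
        for dy in offsets:
            for dx in offsets:
                distance = math.sqrt(dz * dz + dy * dy + dx * dx)
                # Assumption: the center voxel (distance 0) gets weight 1,
                # the same as a face neighbor at distance 1.
                weights[(dz, dy, dx)] = 1.0 if distance == 0.0 else 1.0 / distance
    return weights


weights = inverse_distance_weights(radius=1)
```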

4. Preprocessing phase

Preprocessing steps aim to reduce the effects of noise,

address intensity inhomogeneities, and perform global

intensity level correction and are applied prior to segmentation.

These are based on existing techniques and are only presented

here for completeness, but are not discussed in detail.

4.1. Intensity inhomogeneities

Intensity inhomogeneities are defined as variations in voxel

intensities through or across imaging data sets, which appear as

either sudden or slow variations. Handling both types of

intensity variations in a pre-processing phase to segmentation

results in improving the segmentation accuracy through the

control of adverse effects caused by such inhomogeneities. A

normalized histogram intersection between each two consecu-

tive images in a data set is used for this purpose. The

distributions of pixel intensities between each pair of

consecutive images are expected to change slowly. If the

mean and variance across slices nearly match, then the

distribution will change slowly. Assuming that $I_i$ is the intensity of pixel $i$ in an image, then the standard deviation of the image is given by

$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (I_i - \mu)^2} \tag{41}$$

where $\mu$ is the mean intensity. If we assume that a contrast $a$ and a brightness $b$ are applied, which make the standard deviation of the voxel intensity distribution become $\sigma'$, then:

$$\sigma' = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (a I_i + b - a\mu - b)^2} = a\sigma \tag{42}$$

This shows that the standard deviation is only affected by the contrast. That case maps to the correction of each slice with respect to its preceding slice. The slices that are considered are those having non-empty preceding slices, which can be determined since after skull peeling (cerebrum reconstruction, or skull stripping) all the background voxels end up being exactly zero.

A similar argument holds for the brightness, where

$$\mu' = a\mu + b \tag{43}$$

which means that, knowing $a$ from Eq. (42), $b$ can be estimated from Eq. (43) (Fig. 3).
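The estimation of the contrast and the brightness from the standard deviations and the means of two consecutive slices can be sketched as follows. The function names and sample intensities are hypothetical, and the sketch assumes that both slices cover the same non-background voxels, with background stored as exact zeros.

```python
import math


def mean_and_std(values):
    """Population mean and standard deviation, as in Eq. (41)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return mean, std


def estimate_contrast_brightness(previous_slice, current_slice):
    """Estimate a from Eq. (42) and b from Eq. (43), ignoring zero background."""
    prev = [v for v in previous_slice if v != 0]
    curr = [v for v in current_slice if v != 0]
    mu, sigma = mean_and_std(prev)
    mu_new, sigma_new = mean_and_std(curr)
    if sigma == 0.0:
        raise ValueError("previous slice has no intensity variation")
    a = sigma_new / sigma      # Eq. (42): sigma' = a * sigma
    b = mu_new - a * mu        # Eq. (43): mu' = a * mu + b
    return a, b


def undo_contrast_brightness(current_slice, a, b):
    """Map the current slice back to the intensity scale of the previous one."""
    return [0.0 if v == 0 else (v - b) / a for v in current_slice]


# Hypothetical slices: the second is the first with contrast 1.5 and brightness 4.
previous_slice = [0, 10, 20, 30, 40]
current_slice = [0, 19, 34, 49, 64]
a, b = estimate_contrast_brightness(previous_slice, current_slice)
corrected = undo_contrast_brightness(current_slice, a, b)
```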

4.2. Global intensity level correction

Global intensity correction is addressed after handling both

sudden and slow intensity variations. Since the HMM-based

segmentation utilizes the grayscale or color information of

voxels, it is sensitive to global variations among data sets. In

order to remedy this condition, global correction is employed

to maximize the histogram intersection between the data sets,

so that errors due to intensity differences are minimized.

In order to achieve the required global intensity correction,

the normalized histograms are utilized due to differences

between the number of pixels/voxels of different data sets. The

histogram, which represents the frequency of the intensities, is

normalized against the total number of non-background pixels

present in each data set. The brightness adjustment that leads to the maximization of the integral of the histogram intersection, expressed as follows, is performed after applying an anisotropic filtering stage:

$$\hat{H}_{int} = \int_{v=0}^{V_{Max}} \hat{H}(I_A, v) \cap \hat{H}(I_B, v)\, dv \tag{44}$$

The brightness and contrast values were estimated in the same way as in Eqs. (42) and (43), and were applied after the sudden intensity correction and before the filtering.
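A discrete version of the normalized histogram intersection can be sketched as follows. The sketch assumes that the intersection of two histograms is taken as the bin-wise minimum, that background voxels are stored as non-positive values, and that the brightness is searched over a small list of candidate shifts; the function names, bin settings, and sample data are all hypothetical.

```python
def normalized_histogram(values, num_bins, max_value):
    """Histogram of non-background intensities, normalized to sum to one."""
    counts = [0] * num_bins
    kept = 0
    for v in values:
        if v <= 0:
            continue  # background voxels are excluded from the normalization
        index = min(int(v * num_bins / max_value), num_bins - 1)
        counts[index] += 1
        kept += 1
    if kept == 0:
        raise ValueError("no non-background voxels")
    return [c / kept for c in counts]


def histogram_intersection(hist_a, hist_b):
    """Discrete counterpart of Eq. (44), assuming a bin-wise minimum."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


def best_brightness(values_a, values_b, candidates, num_bins=64, max_value=64.0):
    """Pick the brightness shift of data set B that maximizes the intersection with A."""
    hist_a = normalized_histogram(values_a, num_bins, max_value)

    def score(shift):
        shifted = [v + shift for v in values_b if v > 0]
        return histogram_intersection(
            hist_a, normalized_histogram(shifted, num_bins, max_value))

    return max(candidates, key=score)


# Hypothetical data sets: B is A brightened by 5 intensity levels.
data_a = [0, 10, 10, 20, 30]
data_b = [0, 15, 15, 25, 35]
shift = best_brightness(data_a, data_b, candidates=range(-10, 11))
```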

5. Training and segmentation steps

Based on the mathematical models of both the discrete and

continuous HMM-based segmentation techniques, the general

form of the HMM-based training and segmentation algorithms

for a 3D neighborhood N involves representing each voxel by a vector or sequence of symbols v. The sequence represents the relevant parameters of the voxel and the voxel's neighbors in N. The representative vector or symbol of each voxel is

presented to a set of HMM models, each corresponding to a

separate class or tissue type, and the output probabilities are

calculated using prior training knowledge stored in each HMM

model. Labeling takes place by assigning to the voxel the label

associated with the HMM showing the highest output

probability. Training of both the continuous and the discrete models follows the same procedure (Fig. 4). The segmentation

also follows the same procedure for both discrete and

continuous HMM-based techniques (Fig. 5). If labeling

encounters segments whose characteristics are not consistent

with any of the known tissue types, these are classed as

unknown tissue. A clinical expert is then requested to assign a label to the unknown tissues. The segments' characteristics are then used to initialize the knowledge of the newly identified tissue and the corresponding HMM. The acquired knowledge is then used to label new segments that belong to the newly identified tissue type.

Fig. 3. Sudden intensity correction steps.

Fig. 4. Training the HMM.

6. Experimental results

Three types of preprocessing were applied. The first was 3D anisotropic filtering as described in [13], using k = 5 for 10 iterations. Previously in [14], we showed how global intensity level correction could be applied to MRI sequences. Also in [14], we showed how sudden intensity variations that appear in many MR sequences could be accounted for. The same techniques were used here for preprocessing of images prior to segmentation. For the discrete HMM, the number of states was successively increased and after 15 states no significant improvement was detected. The number of states used was 10, with a Gaussian mixture of 15 distributions. The maximum number of iterations was set to 30,000 and the sigmoid slope to 0.08.

Fig. 6 shows the classification accuracy (1-loss) averaged

for every 1000 iterations for the discrete model. It is clear that

the training algorithm reaches the minimum of the error surface

after around 10,000 iterations, which justifies why during

experimentation we chose 30,000 as an upper bound for the

number of iterations.

6.1. BrainWeb data results

The algorithm was tested using simulated digital phantoms from the BrainWeb MR simulator (http://www.bic.mni.mcgill.ca/brainweb/). The digital phantoms were obtained using an isotropic voxel size of 1-9 mm to investigate the influence of noise, field inhomogeneity, and contrast (T1-weighted, using 18 ms, 10 ms, and 30° for TR, TE, and flip angle, respectively), with varying levels of noise from 1 to 9% and varying levels of spatial inhomogeneity, i.e. intensity variations for each tissue class, from 0 to 40%. The comparison was performed on the basis of the Dice similarity coefficient, which measures the overlap between two segmentations X and Y:

$$D(X, Y) = \frac{2\, |X \cap Y|}{|X| + |Y|} \tag{46}$$

where $|r|$ represents the number of voxels in segment $r$. The Dice coefficient was computed for both gray matter and white matter segmentation. The results are shown in Tables 1-5.
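The Dice similarity coefficient of Eq. (46) can be computed directly from two sets of voxel indices. The sketch below is illustrative: the function name, the convention of returning 1.0 for two empty segments, and the sample sets are assumptions.

```python
def dice(segment_x, segment_y):
    """Dice similarity coefficient of Eq. (46) for two sets of voxel indices."""
    x, y = set(segment_x), set(segment_y)
    if not x and not y:
        return 1.0  # assumption: two empty segments are treated as identical
    return 2.0 * len(x & y) / (len(x) + len(y))


# Hypothetical segments given as sets of voxel indices.
automatic = {1, 2, 3}
ground_truth = {2, 3, 4}
overlap = dice(automatic, ground_truth)
```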

As can be seen from the tables, the Dice similarity coefficient results show that the HMM-based segmentation provides accurate segmentation of the White Matter (WM) and Gray Matter (GM) even in the presence of increasing noise and spatial inhomogeneities. The increase in the slice thickness has the expected effect of reducing the accuracy of the algorithm, as evidenced by the similarity coefficient. This is expected, as the algorithm itself is geared to use in 3D image

Fig. 6. Classification accuracy (1-loss) evaluated across iterations for every 1000 iterations. (Plot axes: Iterations, 1.00E+03 to 1.00E+06, versus Classification accuracy, 6.20E-01 to 7.60E-01.)

Table 1
BrainWeb results, 1 mm slice

          Spatial inhomogeneity
          0%              20%             40%
Noise     White   Gray    White   Gray    White   Gray
0%        0.831   0.872   0.831   0.872   0.708   0.756
1%        0.825   0.870   0.756   0.801   0.706   0.756
3%        0.815   0.869   0.772   0.828   0.713   0.773
5%        0.793   0.860   0.765   0.833   0.717   0.793
7%        0.739   0.832   0.739   0.825   0.702   0.797
9%        0.663   0.796   0.682   0.799   0.672   0.787
Average   0.778   0.85    0.758   0.826   0.703   0.777

Fig. 5. Segmenting with the HMM.



data sets in which the neighboring images are of a similar

distance as the pixel distances (i.e. that the slice thickness is

close to the pixel distance). In the case of the simulated dataset,

the ground truth is established from the original data

creation, whereby each tissue is clearly established and the

segmentation is completely known. Thus, this data set does not

need an expert segmentation for comparison. Additionally,

the expected results should be better than for real data. Note that

Table 4

          Spatial inhomogeneity
          0%              20%             40%
Noise     White   Gray    White   Gray    White   Gray
0%        0.568   0.630   0.559   0.589   0.524   0.567
1%        0.568   0.634   0.560   0.594   0.543   0.570
3%        0.577   0.662   0.564   0.613   0.551   0.595
5%        0.569   0.578   0.581   0.652   0.562   0.629
7%        0.563   0.708   0.577   0.671   0.556   0.641
9%        0.510   0.698   0.596   0.697   0.555   0.690
Average   0.559   0.652   0.573   0.636   0.549   0.615

Table 5

          Spatial inhomogeneity
          0%              20%             40%
Noise     White   Gray    White   Gray    White   Gray
0%        0.526   0.573   0.523   0.539   0.512   0.518
1%        0.532   0.576   0.523   0.538   0.513   0.521
3%        0.534   0.598   0.532   0.570   0.519   0.538
5%        0.529   0.630   0.539   0.593   0.527   0.594
7%        0.508   0.659   0.535   0.634   0.533   0.596
9%        0.527   0.673   0.529   0.646   0.534   0.630
Average   0.526   0.618   0.53    0.587   0.523   0.566

Table 3

          Spatial inhomogeneity
          0%              20%             40%
Noise     White   Gray    White   Gray    White   Gray
0%        0.629   0.707   0.607   0.662   0.582   0.637
1%        0.628   0.711   0.609   0.669   0.583   0.641
3%        0.634   0.733   0.624   0.704   0.591   0.664
5%        0.620   0.750   0.625   0.719   0.601   0.697
7%        0.618   0.755   0.611   0.745   0.603   0.714
9%        0.598   0.753   0.600   0.752   0.601   0.729
Average   0.621   0.735   0.613   0.709   0.594   0.68

Table 2

          Spatial inhomogeneity
          0%              20%             40%
Noise     White   Gray    White   Gray    White   Gray
0%        0.707   0.790   0.671   0.727   0.637   0.707
1%        0.708   0.792   0.672   0.741   0.636   0.707
3%        0.703   0.802   0.682   0.765   0.641   0.723
5%        0.695   0.804   0.685   0.783   0.646   0.749
7%        0.668   0.796   0.664   0.782   0.642   0.758
9%        0.600   0.770   0.620   0.772   0.618   0.755
Average   0.68    0.792   0.666   0.762   0.637   0.733





this is used as an initial test of the capability of the algorithm to carry out segmentation; further testing with real data is also presented, showing comparisons with existing and expert segmentations.

Sample images for 1 mm slices are shown in Fig. 7. The leftmost images are the original phantom data, the center images are the data used to generate the phantom (i.e. the ground truth data), and the rightmost images are the segmented results. As can be seen, subjectively, the segmentation results reflect the objective overlap results. One noticeable exception is that the segmentation algorithm is currently configured to look only at gray and white matter, and ignores all other tissue types. Further work is continuing on expanding the algorithm for use with additional tissue types.

6.2. IBSR data results

After segmenting each case, the accuracy of the HMM-based segmentation relative to the manual segmentations as well as the results of existing techniques, including the maximum likelihood, was determined using the Tanimoto coefficient, which was previously used in existing techniques, and is given by

$$T(X, Y) = \frac{|X \cap Y|}{|X \cup Y|} \tag{47}$$

where $|r|$ represents the number of voxels in segment $r$. By the definitions of the Dice and Tanimoto coefficients, $T(X, Y) \le D(X, Y)$. So, the Tanimoto coefficient is more conservative than the Dice coefficient, where equality is subject to the condition that

Table 6

IBSR reported results

White Gray Method

0.567 0.564 Adaptive MAP

0.562 0.558 Biased MAP

0.567 0.473 Fuzzy c-means

0.554 0.550 Maximum A posteriori Probability (MAP)

0.551 0.535 Maximum-Likelihood

0.571 0.477 Tree-structure k-means

0.832 0.876 Manual (4 brains averaged over 2 experts)

Fig. 7. Sample segmentation of simulated digital phantoms.

Table 7
Overlapping results obtained from applying both HMM algorithms on the IBSR data after training

           Discrete          Continuous
Sequence   White   Gray      White   Gray
100_23     0.517   0.694     0.774   0.879
11_3       0.537   0.718     0.778   0.878
110_3      0.589   0.747     0.746   0.869
111_2      0.614   0.737     0.748   0.857
112_2      0.610   0.761     0.761   0.874
12_3       0.574   0.748     0.784   0.881
191_3      0.504   0.708     0.762   0.870
13_3       0.617   0.746     0.761   0.868
202_3      0.566   0.743     0.756   0.864
205_3      0.499   0.623     0.723   0.782
7_8        0.587   0.745     0.758   0.869
8_4        0.595   0.723     0.742   0.853
17_3       0.613   0.716     0.735   0.854
4_8        0.590   0.690     0.669   0.813
15_3       0.592   0.697     0.669   0.817
5_8        0.578   0.635     0.731   0.854
16_3       0.604   0.719     0.702   0.842
2_4        0.596   0.684     0.635   0.797
6_10       0.528   0.582     0.752   0.855
Average    0.546   0.671     0.699   0.809





$X - Y = \emptyset$. Either coefficient can be utilized in the evaluation.

However, the choice is dictated to ensure the consistent

comparison of the published results with the HMM-based

segmentation results. The Tanimoto coefficient was computed

for both gray matter and white matter segmentation based on an

analysis of variance in which the coefficient is the dependent

variable while the training dataset and the tissue type are the

independent factors. The results are demonstrated in Table 7.
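The relation between the two overlap measures can be checked numerically. From the set definitions it follows that T = D / (2 - D), so the Tanimoto value never exceeds the Dice value; the sketch below illustrates this on hypothetical sets of voxel indices, with function names chosen only for this example.

```python
def tanimoto(segment_x, segment_y):
    """Tanimoto coefficient of Eq. (47): intersection over union."""
    x, y = set(segment_x), set(segment_y)
    return len(x & y) / len(x | y)


def dice(segment_x, segment_y):
    """Dice coefficient of Eq. (46), included so the block is self-contained."""
    x, y = set(segment_x), set(segment_y)
    return 2.0 * len(x & y) / (len(x) + len(y))


# Hypothetical segments given as sets of voxel indices.
automatic = {1, 2, 3}
manual = {2, 3, 4}
t_value = tanimoto(automatic, manual)
d_value = dice(automatic, manual)
```

This conversion is useful when comparing results reported with one coefficient against results reported with the other.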

Table 6 shows the average results that were reported on the IBSR website using the same data that was used in this study. The BMAP algorithm described in [11] is based on HMRF computation. Although the results of the discrete model are close to most of the reported ones, the results of the continuous model are superior, even when compared to HMRF. The IBSR data consists of various image sequences representing differing real-world data sets. The HMM was trained with one data set, and then used to segment the remaining data sets. In both Tables 7 and 8, the first column represents the image sequence numbers. The averages for the HMM are shown in the last row for comparison with existing results.

For a fair comparison, the preprocessing phase of intensity variations was removed, and the results were compared to those of the Adaptive MAP algorithm [15], which takes care of intensity variations of segments through an ML stage for initialization purposes. The AMAP is based on Hidden Markov Random Fields, so after removing this preprocessing phase the comparison becomes close to a direct comparison of the two algorithms, except for the difference in filtering.

We compare with the AMAP and not the BMAP because the latter models the bias field, which was not considered in our analysis. The results in Table 8 demonstrate that the HMMs were able to segment the brain with higher accuracies. This supports the initial argument presented at the beginning of the paper, which is that HMMs during classification encode relations not only between neighboring voxels, but also between voxels present in non-neighboring sites, which is not the case in the HMRF.

Inspection of segmented slices from case 5_8, which was described earlier in the preprocessing phase during sudden intensity correction, is shown in Fig. 8. The comparison reveals the expected result: without sudden intensity correction the whole slice is erroneously segmented as gray, and the bright one gets segmented as being white.

The results in the tables provide an objective assessment of the quality of the algorithms, yet accuracies in practical cases may be much higher, since each of the IBSR data sets contains at least one form of difficulty.

Table 8
Overlapping results obtained without carrying out sudden intensity correction

Sequence   White      Gray
100_23     0.792087   0.867957
11_3       0.795936   0.863375
110_3      0.756762   0.850009
111_2      0.777369   0.844556
112_2      0.775694   0.854445
12_3       0.814668   0.873579
191_3      0.798191   0.864343
13_3       0.799298   0.869574
202_3      0.793114   0.857034
205_3      0.760803   0.761042
7_8        0.743964   0.836973
8_4        0.734167   0.817848
17_3       0.710917   0.814474
4_8        0.631508   0.774435
15_3       0.700746   0.790821
5_8        0.238758   0.690957
16_3       0.72307    0.815892
2_4        0.632165   0.77051
6_10       0.386919   0.668827
Average    0.668307   0.774333

Fig. 8. Comparison between segmented slices from image sequence 5_8 with and without carrying out sudden intensity correction.





7. Conclusion

In this paper, a 3D MRI segmentation algorithm based on HMMs is presented. The algorithm demonstrates the ability of HMMs to handle multi-dimensional classification, whereas HMMs were previously considered as candidates for 1D classification only. The HMM model, together with carefully constructed preprocessing steps, showed significant improvement in the quality of 3D MRI segmentation when objectively compared to other results obtained using the same data. Both simulated and real data were used in the evaluation of the algorithm with promising results. The objective measure on the simulated phantoms (created from images used to establish ground truth) showed that the algorithm, although currently restricted to only gray and white matter, accurately identifies these tissues within limits of error. Further work is progressing on increasing the number of identified tissues. The results from the real data (using expert manual segmentations as ground truth) showed that the overlap measures are better than those of previously established methods, and are within the limits of error. This is easily seen, since even the expert manual segmentation shows errors from one operator to another.

Comparisons between HMMs and HMRFs concerning the complexity of computations involved and the ability to segment based on the decision made from larger regions are presented. The comparative results indicate that the current mathematical model of MRF using Gibbs distribution can be extended to neighborhoods larger than a 3×3×3 neighborhood if the Markovianity assumption is extended to higher orders, a restriction that is not required for the current HMM modeling scheme.

The challenge in using HMRFs for larger neighborhoods

requires more research in terms of innovative modeling

schemes. In the HMM, application to larger neighborhoods

occurs with only a change in the input vectors used for training

and segmentation to represent this larger neighborhood since

the parameter updates do not rely on the size of the input

vector. No further change in the HMM algorithm is necessary.

For that purpose, the proposed HMM provides a robust

foundation that is not sensitive to the initial conditions for

enabling the segmentation of MR imaging data.

The problem that affects the application of MRF in segmentation is the classification of the voxel based on regions larger than 3×3×3 neighborhoods. The basis of the mapping from HMRF to Gibbs distribution forces the neighborhood used to compute the Gibbs distribution to be a 3×3×3 neighborhood. Larger regions necessitate the analysis of higher order Markov Fields, which in turn needs the re-definition of the neighborhood. To successfully relate larger neighborhoods, HMRF must be used, where iterative segmentation takes place. This is due to the dependency of segmenting each voxel on its neighbors and their prior segmentations, which are used to compute the potential. This in turn becomes subject to iterative local maximization/minimization algorithms like the Expectation Maximization (EM) and the Iterative Conditional Modes (ICM). A common and crucial problem with such methods is their sensitivity to the initialization conditions and the reaction of the system to input patterns during segmentation.

The cost of increasing the neighborhood size in the context of the proposed segmentation strategy is the extra computation required by the algorithm (an increase in L in Eq. (37)). The problem of applicability to larger neighborhoods is specifically important in the context of segmentation of biomedical imaging data from multiple modalities, where the pixel/voxel neighborhood must be extended across modalities or across time, e.g. in functional MRI, beyond the 3×3×3 neighborhood. Although this increase may provide better segmentation accuracy, this will only occur up to a certain point, after which there would not be any significant change in the segmentation accuracy. This issue, which is under investigation, can further complicate the choice of the appropriate neighborhood size, since if the neighborhood becomes very large, the segmentation accuracy can be negatively affected. Hence, the contribution of the different neighbors in the segmentation strategy can be weighted according to their distance to the voxel under investigation. These weights can be inversely proportional to the distance between the neighboring pixels/voxels and the investigated pixel/voxel; in other words, the significance of the neighboring pixels/voxels in the segmentation strategy increases as the neighbors become closer to the pixel/voxel under investigation. Further work is continuing on the effect of increased neighborhood sizes.

Other considerations that will enhance the accuracy of segmentation include the usage of multi-spectral images, not only T1 but also T2 and PD. The vectors used in classification will then be extracted from each voxel and its neighbors in the three images, forming a 27×3 input. In this case, a 3D Gaussian mixture model can be used, where the input to each state is a vector of the three intensities. Similar work using multi-sensor data and Hidden Markov Chains has been reported [33] and concludes that HMCs are applicable to these problems. These promising results have yet to be applied to MR imaging data. Further work in applying HMMs to multi-spectral MR imaging data is currently in progress.

References

[1] W. Grimson, G. Ettinger, T. Kapur, M. Leventon, W. Wells, R. Kikinis,

Utilizing segmented MRI data in image-guided surgery, International

Journal of Pattern Recognition and Artificial Intelligence 11 (8) (1998)

1367-1397.

[2] S. Warfield, J. Dengler, J. Zaers, C.R.G. Guttmann, W.M. Wells,

G.J. Ettinger, J. Hiller, R. Kikinis, Automatic identification of grey matter

structures from MRI to improve the segmentation of white matter lesions,

Journal of Image Guided Surgery 1 (6) (1996) 326-338.

[3] E. Grimson, M. Leventon, G. Ettinger, A. Chabrerie, S. Nakajima,

F. Ozlen, H. Atsumi, R. Kikinis, P. Black, Clinical Experience with a

High Precision Image-Guided Neurosurgery System, MICCAI, Springer,

Berlin, 1998, pp. 63-73.

[4] L.R. Rabiner, A tutorial on hidden markov models and selected

applications in speech recognition, Proceedings of the IEEE (1989)

257-286.





[5] L. Bahl, P.F. de Souza Brown, P.V., K.L. Mercer, Maximum mutual

information estimation of hidden markov parameters for speech

recognition, Proceedings of the IEEE (1988) 49-52.

[6] Biing-Hwang Juang, W. Chou, Chin-Hui Lee, Minimum classification

error rate methods for speech recognition, IEEE Transactions on Speech

and Audio Processing (1997) 257–265.

[7] S. Katagiri, Biing-Hwang Juang, Chin-Hui Lee, Pattern recognition

using a family of design algorithms based upon the generalized

probabilistic descent method, Proceedings of the IEEE 86 (11) (1998)

2345–2373.

[9] Y. Zhang, M. Brady, S. Smith, Segmentation of brain MR images

through a hidden Markov random field model and the expectation-

maximization algorithm, IEEE Transactions on Medical Imaging 20 (1)

(2001) 45–57.

[10] J.C. Rajapakse, J. Piyaratna, Bayesian approach to segmentation of

statistical parametric maps, IEEE Transactions on Biomedical

Engineering 48 (10) (2001) 1186–1194.

[11] J.C. Rajapakse, F. Kruggel, Segmentation of MR images with intensity

inhomogeneities, Image and Vision Computing 16 (1998) 165–180.

[12] J.C. Rajapakse, J.N. Giedd, J.L. Rapoport, Statistical

approach to segmentation of single-channel cerebral MR images, IEEE

Transactions on Medical Imaging 16 (2) (1997) 176–186.

[13] N.M. John, A three dimensional statistical model for image segmentation

and its application to MR brain images, PhD thesis, University of Miami,

1999.

[14] N.M. John, M. Kabuka, M.O. Ibrahim, Multivariate statistical model for

3D image segmentation with application to medical images, Journal of

Digital Imaging 16 (4) (2004) 365–377.

[15] J.C. Rajapakse, J.N. Giedd, J.L. Rapoport, Statistical approach to

segmentation of single-channel cerebral MR images, IEEE Transactions

on Medical Imaging 16 (2) (1997) 176–186.

[16] K. Van Leemput, F. Maes, D. Vandermeulen, P. Suetens, Automated

model-based tissue classification of MR images of the brain, IEEE


[17] S.Z. Li, Markov random field models in computer vision, in: Proceedings

of the European Conference on Computer Vision, Stockholm, Sweden,

1994, pp. 361–370.

[18] J. Besag, On the statistical analysis of dirty pictures, Journal of the

Royal Statistical Society, Series B 48 (3) (1986) 259–302.

[19] K. Held, E.R. Kops, B.J. Krause, W.M. Wells III, R. Kikinis,

H.-W. Muller-Gartner, Markov random field segmentation of brain MR

images, IEEE Transactions on Medical Imaging 16 (6) (1997) 878–886.

[20] X. Descombes, F. Kruggel, D.Y. von Cramon, Spatio-temporal fMRI

analysis using Markov random fields, IEEE Transactions on Medical

Imaging 17 (6) (1998) 1028–1039.

[21] S. Ruan, C. Jaggi, J. Xue, J. Fadili, D. Bloyet, Brain tissue classification of

magnetic resonance images using partial volume modeling, IEEE


[22] K. Van Leemput, F. Maes, D. Vandermeulen, P. Suetens, A unifying

framework for partial volume segmentation of brain MR images, IEEE


[23] W.M. Wells, E.L. Grimson, R. Kikinis, F.A. Jolesz, Adaptive

segmentation of MRI data, IEEE Transactions on Medical Imaging 15

(8) (1996) 429–442.

[24] R. Guillemaud, J.M. Brady, Estimating the bias field of MR images, IEEE


[25] J.L. Marroquin, B.C. Vemuri, S. Botello, F. Calderon, A. Fernandez-

Bouzas, An accurate and efficient Bayesian method for automatic

segmentation of brain MRI, IEEE Transactions on Medical Imaging 21

(8) (2002) 934–945.

[26] B. Moretti, L.M. Fadili, S. Ruan, N. Bloyet, B. Mazoyer, Phantom-based

performance evaluation: application to brain segmentation from magnetic

resonance images, Medical Image Analysis 4 (4) (2000) 303–316.

[27] A. Zavaljevski, A.P. Dhawan, M. Gaskil, W. Ball, J.D. Johnson, Multi-

level adaptive segmentation of multi-parameter MR brain images,

Computerized Medical Imaging and Graphics 24 (2) (2000) 87–98.

[28] Y. Wang, T. Adali, J. Xuan, Z. Szabo, Magnetic resonance image analysis

by information theoretic criteria and stochastic site models, IEEE

Transactions on Information Technology in Biomedicine 5 (2) (2001)

150–158.

[29] K. Van Leemput, F. Maes, D. Vandermeulen, P. Suetens, A statistical

framework for partial volume segmentation, Lecture Notes in Computer

Science 2208 (2001) 204–212.

[31] R. Fjortoft, Y. Delignon, W. Pieczynski, M. Sigelle, F. Tupin,

Unsupervised classification of radar images using hidden Markov chains

and hidden Markov random fields, IEEE Transactions on Geoscience and

Remote Sensing 41 (3) (2003) 675–685.

[32] S. Derrode, W. Pieczynski, Signal and image segmentation using pairwise

Markov chains, IEEE Transactions on Signal Processing 52 (9) (2004)

2477–2489.

[33] N. Giordana, W. Pieczynski, Estimation of generalized multisensor

hidden Markov chains and unsupervised image segmentation, IEEE

Transactions on Pattern Analysis and Machine Intelligence 19 (5) (1997)

465–475.


