Hidden Markov Models-based 3D MRI Brain Segmentation


    Hidden Markov models-based 3D MRI brain segmentation

    M. Ibrahim, N. John, M. Kabuka *, A. Younis

    Department of Electrical and Computer Engineering, College of Engineering, University of Miami,

    1251 Memorial Drive, Room 406, Coral Gables, FL 33146, USA

    Received 18 September 2004; received in revised form 4 February 2006; accepted 1 March 2006

    Abstract

    This paper introduces a 3D MRI segmentation algorithm based on Hidden Markov Models (HMMs). The mathematical models for the HMM

    that forms the basis of the segmentation algorithm for both the continuous and discrete cases are developed and contrasted with Hidden Markov Random Field in terms of complexity and extensibility to larger fields. The presented algorithm clearly demonstrates the capacity of HMM to

    tackle multi-dimensional classification problems.

    The HMM-based segmentation algorithm was evaluated through application to simulated brain images from the McConnell Brain Imaging

    Centre, Montreal Neurological Institute, McGill University as well as real brain images from the Internet Brain Segmentation Repository (IBSR),

    Harvard University. The HMM model exhibited high accuracy in segmenting the simulated brain data and an even higher accuracy when

    compared to other techniques applied to the IBSR 3D MRI data sets. The achieved accuracy of the segmentation results is attributed to the HMM

    foundation and the utilization of the 3D model of the data. The IBSR 3D MRI data sets encompass various levels of difficulty and artifacts that

    were chosen to pose a wide range of challenges, which required handling of sudden intensity variations and the need for global intensity level

    correction and 3D anisotropic filtering. During segmentation, each class of MR tissue was assigned to a separate HMM and all of the models were

    trained using the discriminative MCE training algorithm. The results were numerically assessed and compared to those reported using other

    techniques applied to the same data sets, including manual segmentations establishing the ground truth for real MR brain data. The results

    obtained using the HMM-based algorithm were the closest to the manual segmentation ground truth in terms of an objective measure of overlap

    compared to other methods. © 2006 Elsevier B.V. All rights reserved.

    Keywords: Hidden Markov Models; Image segmentation; Medical imaging

    1. Introduction

    Interpretation of the biomedical imaging of the brain plays

    an important part in diagnosis of various diseases and injury.

    Due to the importance of brain imaging interpretation,

    significant research efforts have been devoted to developing

    better and more efficient techniques in several related areas

    including processing, modeling, and understanding of brain

    images. In particular, the problem of automating 3D

    segmentation of brain imaging using Magnetic Resonance

    Imaging (MRI), Computed Tomography (CT), Positron

    Emission Tomography (PET) or other modalities, has received

    special attention as evidenced by numerous published research

    works [13]. This is mainly due to the multitude of benefits that

    may be gained from accurate automated 3D brain

    segmentation.

    Segmentation frameworks based on Markov Random Fields

    (MRF) and Hidden Markov Random Fields (HMRF) were

    introduced in several reported efforts [9–12]. MRFs and

    HMRFs share the common property of revealing the

    dependency between the imaging voxels to be segmented and

    their first-degree neighbors. However, both frameworks are

    computationally intensive, which adversely affects their

    practical applicability in medical environments. On the other

    hand, Hidden Markov Models (HMMs) have proven valuable

    when applied to Automatic Speech Recognition (ASR) [4],

    where ASR is essentially a pattern recognition problem. In fact,

    HMRFs, which are mainly applied in computer vision and

    image processing, grew out of further developments of HMMs.

    Hidden Markov Chains have also been reported for image

    segmentation using radar, synthetic and multi-sensor images

    [31–33]. A generalized mixture estimation approach is

    Image and Vision Computing xx (2006) 1–15

    www.elsevier.com/locate/imavis

    0262-8856/$ - see front matter © 2006 Elsevier B.V. All rights reserved.

    doi:10.1016/j.imavis.2006.03.001

    * Corresponding author. Tel.: +1 305 284 2212; fax: +1 305 284 4044.

    E-mail address: [email protected] (M. Kabuka).


    presented for unsupervised classification of Hilbert-Peano

    scans of radar images [31], which combines Hidden Markov

    Chain models and Hidden Markov Random Field models.

    Similarly, pairwise Markov Random Chain models provided

    the basis for unsupervised signal and image segmentation of

    simulated as well as radar images [32]. Another approach

    utilizing Hidden Markov Chains was presented for image segmentation of synthetic and multi-sensor radar images [33].

    These techniques provide promising results for utilizing

    HMMs for MR image segmentation.

    HMMs, implemented using the Viterbi algorithm, are

    sufficiently capable of encoding the first-degree relationships

    and can be extended to higher degrees. Encoding first-degree

    relationships among the voxels will be shown, as evidenced by

    the experimental results, to provide sufficient information for

    accurate segmentation of 3D MRI brain imaging data. The

    main training algorithms that have been developed for HMMs

    are the Baum–Welch algorithm [4] and the Maximum Mutual

    Information (MMI) algorithm [5]. The inefficiency of both

    techniques is argued in the context of Bayesian classification,

    where it is shown that neither algorithm necessarily results

    in the best Bayesian threshold [6]. Consequently, a new

    algorithm, namely the Minimum Classification Error (MCE),

    was developed [6], which takes into consideration exposing

    each of the HMM nodes to both the patterns to be rejected as

    well as the patterns to be recognized. As a result, the HMM

    nodes can minimize the accompanying error rate by moving

    the Bayesian threshold closer to the correct location as shown

    in Fig. 1.

    Many advances in brain MR image segmentation have

    relied on a Bayesian framework and Markov Random Fields

    (MRFs) [17]. In [15], the smoothness and piecewise contiguous nature of the tissue regions in MR cerebral images was

    modeled using a 3D MRF. A segmentation algorithm, based on

    the statistical model, finds the approximate Maximum A

    Posteriori (MAP) estimation of the segmentation model

    parameters from the MR imaging data. Another scheme for

    segmentation was based on the Iterative Conditional Modes

    (ICM) algorithm [18], in which measurement model para-

    meters were estimated using local information at each site, and

    the prior model parameters were estimated using the

    segmentation results after each cycle of iterations. In this

    case, MRFs were used to model only the intensity process, and

    the segmentation results were improved by incorporating the

    discontinuity process into the prior model. The scheme also

    addressed the effect of magnetic field inhomogeneities and

    biological variations of tissues as variations of the model

    parameters. Unfortunately, this model did not investigate the

    discontinuity process in the 3D MR volumes. A fully automated 3D-segmentation technique for MR brain

    images was introduced in [19] that relied on a MRF model to

    capture the non-parametric distributions of tissue intensities,

    neighborhood correlations, and signal inhomogeneities in MR

    images. The technique used two algorithms based on Simulated

    Annealing and on Iterative Conditional Modes and started with

    a training process of typical echo intensities and setting one of

    the MRF parameters according to the expected inhomogeneity.

    The technique was able to automatically segment the entire 3D

    MR volume, as well as different MR images acquired using the

    same MR sequence. Another study [20] involved embedding

    the problem of functional MRI (fMRI) analysis into a Bayesian

    framework, and then provided an algorithm to restore and

    analyze fMRI using MRFs in a Bayesian framework. The study

    analyzed the shortcomings of the Statistical Parameter Map

    (SPM) by using a 3D MRF where the third dimension

    represents time, and then the proposed restoration approach

    was applied before using SPM, which resulted in an

    improvement of the detection sensitivity. This study also

    analyzed the hemodynamic response using three parameters,

    the norm, the maximum and the time when the maximum

    occurs, where it was shown that when the values of these

    parameters in neighboring voxels are far from each other, the

    probability of detection is lower since the associated

    hemodynamic responses are not consistent in the spatial domain. Hence, the problem was modeled using two-level

    MRF interactions between the activation map and the three

    parameter maps. The detection of an activated area, thus,

    depends on the norm of the hemodynamic response and some

    contextual information on this norm as well as the consistency

    of the hemodynamic function parameters across this area.

    Another fully automated method for model-based tissue

    classification of magnetic resonance (MR) images of the brain

    was introduced in [16]. The method relies on MRFs to

    incorporate contextual information and uses a digital brain

    atlas for the expected a priori information of the spatial

    locations of the tissue classes. The main idea of the method is

    to interleave the classification with MR bias field correction,

    intensity distribution estimation, and estimation of MRF

    parameters. Hence, it improves the classification in each

    iteration of the segmented single and multi-spectral MR

    images, and corrects MR signal inhomogeneities. The

    proposed strategy can be considered a fully automated method

    for tissue classification that produces objective and reprodu-

    cible results. Another automatic method is presented in [21],

    where the objective of the study is to classify the brain tissue

    while taking into account the partial volume effect, which

    results in MR image volumes being composed of a mixture of

    several tissue types. This study assumes that the brain dataset is

    composed of gray matter, white matter, cerebro-spinal fluid,

    Fig. 1. Correct Bayesian threshold vs. erroneous one. (Plot of probability against argument showing the class PDF, the non-class PDF, the erroneous threshold and the Bayesian threshold.)


    and mixtures (called mix-classes). The study provided a

    statistical model of the mix-classes and it showed that it

    could be approximated by a Gaussian function under some

    conditions. The proposed method used a two-step strategy: in

    the first step, it segmented the brain into pure and mix-classes,

    while in the second step it re-classified the mix-classes into the

    pure classes using knowledge about the obtained pure classes. Both steps use MRF models as well as the multi-fractal

    dimension describing the topology of the brain to provide an

    additional energy term in the MRF model to improve

    discrimination of the mix-classes. The proposed strategy is

    unsupervised, fully automatic, and uses only T1-weighted

    images. In [22], a statistical framework for partial volume

    segmentation of MR images of the brain was introduced. The

    framework starts by segmenting the image using a parametric

    statistical model in which each voxel is classified to one single

    type of tissue. Then, it uses a down-sampling step that

    addresses partial volumes along the borders between tissues. In

    this step, a number of voxels in the original image grid

    contribute to the intensity of each voxel in the resulting image

    grid. The framework also uses an Expectation Maximization

    (EM) approach to estimate the parameters of the new model

    and to perform the partial volume classification.

    In [23], a statistical segmentation framework of brain MR

    images based on Hidden Markov Random Field (HMRF) is

    introduced, which overcomes the problems of Finite Mixture

    (FM) models [24,25] that do not take into account the spatial

    properties of the image. The HMRF model is an MRF model

    whose state sequence cannot be observed directly but can be

    indirectly estimated through observations. The strategy also

    uses an EM algorithm to provide an accurate and robust

    segmentation. The study in [26] introduced an efficient and accurate automatic 3D segmentation approach for brain MR

    images. The approach uses a brain atlas in conjunction with a

    robust registration procedure to find a non-rigid transformation

    that maps the standard brain to the specimen to be segmented,

    and hence, is used to segment the brain from non-brain tissues

    and compute prior probabilities for each class at each voxel

    location. The approach also involved a fast and accurate way to

    find optimal segmentations based on EM models, given the

    intensity models along with the spatial coherence assumption.

    Unfortunately, the study does not take the Partial Volume (PV)

    effect into account.

    A contextual segmentation technique to detect brain

    activation from functional brain images based on a Bayesian

    framework is presented in [28], which uses an MRF model to

    represent configurations of activated brain voxels. It also uses

    likelihoods given by statistical parametric maps to find the

    maximum a posteriori estimation of segmentation. The

    technique is capable of analyzing experiments involving

    multiple-input stimuli. The study in [27] introduced a model-

    based approach for automatic segmentation and classification

    of multi-parameter MR brain images into 15 tissue classes. The

    model approximated the spatial distribution of tissue classes by

    a Gaussian MRF and used the maximum likelihood method to

    estimate class probabilities and transitional probabilities for

    each pixel of the image. The proposed algorithm is not only

    accurate compared to manual segmentation but also can learn

    new tissue classes. An unsupervised tissue characterization

    algorithm was introduced in [29] that is both statistically

    principled and patient specific. The method used adaptive

    standard finite normal mixture and inhomogeneous MRF

    models, whose parameters were estimated using ER method

    and relaxation labeling algorithms under information theoretic criteria.

    A technique for assessing the accuracy of segmentation

    algorithms was presented in [10] and applied to the

    performance evaluation of brain editing and brain tissue

    segmentation algorithms for MR images. It relied on

    distance-based discrepancy features between the ground truth

    obtained from a realistic digital brain phantom, which is taken as

    a reference, and the edited/segmented brain tissues. The

    proposed strategy can be used to evaluate and validate any

    segmentation algorithm, and it is able to determine quantitat-

    ively to what extent a segmentation algorithm is sensitive to

    internal parameters, noise, artifacts or distortions when a

    ground truth is given.

    In this paper, a segmentation algorithm based on Hidden

    Markov Models is presented, in conjunction with the required

    preprocessing, for MR data. The algorithm is multi-dimen-

    sional and demonstrates a high degree of accuracy for 3D MRI

    brain segmentation, compared to other techniques. Unlike

    generic pre-processing used in most image processing and

    computer vision applications, the pre-processing phases used in

    this algorithm are specifically developed to handle problems

    encountered in 3D MRI brain segmentation. These problems

    include correction of sudden intensity variations resulting from

    artifacts during the acquisition process and global brightness

    and contrast correction, with both problems showing a significant impact on segmentation accuracy. In addition to

    its segmentation accuracy, the HMM-based segmentation

    algorithm's distinguishing characteristics include efficient

    computational requirements, unique scanning of the 3D MRI

    data that enables the modeling of the voxel's neighborhood

    effect on that voxel's segmentation, and generic applicability to

    larger neighborhoods that is important for the detection of

    larger features that exceed the high-resolution neighborhood

    size.

    The 3D MRI segmentation algorithm was evaluated using

    simulated 3D MRI brain data sets obtained from McConnell

    Brain Imaging Centre, Montreal Neurological Institute, McGill

    University (http://www.bic.mni.mcgill.ca/) and real 3D MRI

    brain data sets obtained from the Internet Brain Segmentation

    Repository (IBSR), Center for Morphometric Analysis at

    Massachusetts General Hospital (http://www.cma.mgh.harvard.

    edu/ibsr/). The 3D MRI data sets are used to perform an

    objective assessment of the segmentation results based on a

    metric that enables the comparison of the segmentation results

    obtained using the presented algorithm as well as clinical

    experts performing segmentation manually, which are avail-

    able from the IBSR web site. The metric is termed the

    overlapping coefficient and is equal to one if the automatic

    segmentation results were identical to the manual ones and

    reduces to 0 with no intersection. The quality of the


    segmentation results obtained using the algorithm presented in

    this paper was further evaluated by comparison with the

    results of other algorithms applied to the same data sets and

    published on the IBSR website.

    This paper is organized as follows: Section 2 describes the

    underlying mathematical foundation upon which the algorithm

    is based. Section 3 provides the details of the adopted mathematical model for the discrete Hidden Markov Models

    and is followed by the mathematical foundation of the

    continuous case. Then, a complexity analysis for comparing

    Markov Random Fields and Hidden Markov Models is

    presented in Section 3. Section 5 details the training and

    segmentation steps in both the continuous and discrete cases.

    Section 4 provides the detail of the preprocessing phases.

    Finally, experimental results using both real and simulated 3D

    MRI data sets are presented in Section 6.

    2. Mathematical model

    The basic foundation of the presented algorithm relies on

    the ability of the underlying Hidden Markov Model (HMM) to

    build knowledge about the input multi-dimensional data

    vectors or sequences that reflect the parameters of the MR

    imaging modality, i.e. intensity information about the voxel

    and its neighborhood. Hidden Markov Models are descendants

    of Markov Chains, which are made of different states

    statistically bound by transition probabilities. A HMM is

    characterized by a set of internal states, the transition

    probabilities among the states in response to an input symbol

    from the sequence, and the emission probabilities of symbols

    from the different states. The HMM knowledge is built in the form of the transition as well as the emission probabilities of

    the states that are conditioned in response to the input symbols

    of the sequence during the learning stage based on two

    mathematical assumptions. First, the Markovianity assump-

    tion, which is expressed as follows:

    \[ p(q_i = s_i \mid q_{i-1} = s_a,\; q_{i-2} = s_b,\; \ldots) = p(q_i = s_i \mid q_{i-1} = s_a) \tag{1} \]

    Eq. (1) imposes the condition that the probability $p$ of

    transition from one state $q_{i-1}$ to another $q_i$ is only dependent

    on the previous state $q_{i-1}$. In other words, the probability is

    independent of the states prior to $q_{i-1}$.

    Second, the assumption that the emission probabilities from

    each state are independent of each other, which leads to the

    output probability being the product of the emission

    probabilities of all states, as expressed in Eq. (2) as follows

    \[ p(O \mid \Lambda) = \sum_{q_1, q_2, \ldots, q_n} \pi_{q_1}\, b_{q_1}(O_1)\, a_{q_1 q_2}\, b_{q_2}(O_2)\, a_{q_2 q_3} \cdots b_{q_n}(O_n) \tag{2} \]

    where $p$ is the output probability of a chain $O = O_1 O_2 \cdots O_n$,

    $b_{q_x}(O_y)$ is the emission probability of pattern $O_y$ from state $q_x$,

    $a_{ij}$ is the transition probability from state $i$ to state $j$, $\pi_{q_x}$ is the

    initial probability of state $q_x$, and $\Lambda$ is a vector representing the

    model parameters. Higher-order Markov Models increase the

    level of dependency, which complicates the analysis of higher-

    order systems. Moreover, first-order Hidden Markov Models

    assume that the states are hidden and cannot be observed at the

    output stage. Instead, only the outputs emitted from those states

    are observable without knowing which states emitted those

    outputs.
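
    As a minimal sketch of Eq. (2), the output probability can be computed either by summing over every state sequence or by the standard forward recursion, which gives the same value at much lower cost. The two-state parameter values and all function names below are illustrative assumptions, not taken from the authors' implementation.

```python
from itertools import product

# Hypothetical two-state HMM; every number here is an illustrative assumption.
PI = [0.5, 0.5]                  # PI[q]: initial probability of state q
A = [[0.3, 0.7], [0.6, 0.4]]     # A[i][j]: transition probability from i to j
B = [[0.8, 0.2], [0.4, 0.6]]     # B[q][o]: probability that state q emits symbol o


def path_probability(path, obs):
    """Joint probability of one state path and the observations (one term of Eq. 2)."""
    p = PI[path[0]] * B[path[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
    return p


def output_probability_bruteforce(obs):
    """Eq. (2) taken literally: sum over every possible state sequence."""
    n_states = len(PI)
    return sum(path_probability(path, obs)
               for path in product(range(n_states), repeat=len(obs)))


def output_probability_forward(obs):
    """Same quantity via the forward recursion, linear in the sequence length."""
    n_states = len(PI)
    alpha = [PI[q] * B[q][obs[0]] for q in range(n_states)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][o]
                 for j in range(n_states)]
    return sum(alpha)
```

    The brute-force version enumerates a number of paths that grows exponentially with the sequence length, so it is only useful as a check on short sequences.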

    This is true when the Hidden Markov Models are viewed

    from a similar perspective to the one presented in [4], where the

    HMM was imagined as a process generating output symbols and the observations were viewed from the outside without

    knowing which states emitted them. At that point, the emission

    probability of one state can well be assumed to be independent

    of the other outputs. However, a different case exists when the

    HMM is used for MR image segmentation, where the objective

    is to find the best state sequence that might have produced an

    output. By inspecting (2), the output probability for each

    segment is calculated using the most probable path only, i.e.

    without a summation over all possible paths. During training,

    the goal of the training algorithms is to increase the output

    probability of input sequences representing a certain class of

    tissues. Hence, the transition and emission probabilities are

    updated in a manner that maximizes the output probability of a

    given class of tissues. This in some cases entails changing

    transition and emission probabilities of prior states in order to

    maximize the output probability given a certain terminating

    output, a case that is clear if the output probability is

    considered only due to the most probable path. This

    mechanism in turn encodes some form of relationship between

    the terminal and the input sequences. The encoding of relations

    arises from the fact that upon updating the transition

    probabilities of the prior states their values are decreased,

    forcing the most probable chain of states to change to another

    set of states having higher transition and emission probabilities.

    The fact that relation encoding takes place is demonstrated through a numerical example that shows that emission

    probabilities during classification with HMMs are conditioned

    by non-neighboring outputs.

    The encoding of this relation is demonstrated through the

    example HMM, shown in Fig. 2, which involves two states,

    State 0 and State 1 with the following initial probability,

    $\pi_{q_0} = \pi_{q_1} = 0.5$, emission probabilities $b_{q_0}(0) = 0.8$, $b_{q_0}(1) = 0.2$,

    $b_{q_1}(0) = 0.4$, $b_{q_1}(1) = 0.6$, and transition probabilities $a_{00} = 0.3$,

    $a_{01} = 0.7$, $a_{10} = 0.6$, $a_{11} = 0.4$. This was tested using a

    sequence of five outputs 00000 and the most probable chain

    was found to be 01010 with probability of 0.016257.

    However, when the last output is changed, giving 00001, the most probable chain changes not only in terms of the last state

    but also in terms of the first state to be 10101 with probability

    of 0.008129. Changing the output emitted by the final state induced a

    change in the initial state and consequently changes the emission

    Fig. 2. Example HMM. State 0: Pi = 0.5, b(0) = 0.8, b(1) = 0.2. State 1: Pi = 0.5, b(0) = 0.4, b(1) = 0.6. Transition probabilities: 0.3, 0.7, 0.6, 0.4.


    probability of O1 depending on O4. This shows that the

    emission probability of output intensities can be conditioned by

    the presence of other output intensities emitted by non-

    neighboring states. A simple argument based on those results

    shows that the HMM can encode relations in more than one

    dimension, since the intensities in these sequences are

    constructed from a 3×3×3 neighborhood of voxels. Moreover, the HMM encodes relations between intensities of non-neighboring

    voxels in the same 3×3×3 neighborhood, even if

    they do not reside in the same clique, as defined in HMRF

    models. The knowledge stored in the HMM encodes the

    conditional dependency of the voxels' intensities and the class

    of tissue to which they belong in the form of the initial

    probabilities, transition probabilities and emission probabilities

    which are based on the mathematical model of the HMM

    transition among the constituent states. In contrast, Hidden

    Markov Random Fields (HMRFs) are based on Gibbs

    distribution, which encodes relations between voxels through

    the usage of cliques and mathematical modeling of the potential.
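
    The most probable chain used in the two-state example can be found with the Viterbi algorithm. The sketch below is a minimal illustration, assuming the example parameters are read literally from the text and that ties are broken in favor of the lower-numbered state; the function and constant names are hypothetical. The probability values it returns depend on that reading and on the tie-breaking rule, so it is not claimed to reproduce the probabilities reported above.

```python
# Example parameters as read from the text (assumed indexing: state, then symbol).
EXAMPLE_PI = [0.5, 0.5]
EXAMPLE_A = [[0.3, 0.7], [0.6, 0.4]]   # EXAMPLE_A[i][j]: transition i -> j
EXAMPLE_B = [[0.8, 0.2], [0.4, 0.6]]   # EXAMPLE_B[q][o]: state q emits symbol o


def viterbi(obs, pi, a, b):
    """Return (most probable state path, joint probability of that path and obs)."""
    n = len(pi)
    delta = [pi[q] * b[q][obs[0]] for q in range(n)]
    backpointers = []
    for o in obs[1:]:
        prev = delta
        step_ptr, delta = [], []
        for j in range(n):
            # Best predecessor of state j; max() keeps the first maximum on ties.
            best_i = max(range(n), key=lambda i: prev[i] * a[i][j])
            step_ptr.append(best_i)
            delta.append(prev[best_i] * a[best_i][j] * b[j][o])
        backpointers.append(step_ptr)
    last = max(range(n), key=lambda q: delta[q])
    path = [last]
    for step_ptr in reversed(backpointers):
        path.append(step_ptr[path[-1]])
    path.reverse()
    return path, delta[last]
```

    Calling `viterbi([0, 0, 0, 0, 0], EXAMPLE_PI, EXAMPLE_A, EXAMPLE_B)` returns the alternating chain 0, 1, 0, 1, 0. For the sequence ending in 1, several chains are tied under this literal reading, so which one is returned depends on the tie-breaking rule.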

    In other words, both MRFs and HMRFs provide a

    mathematical model for the dependency between voxel

    intensities. However, HMMs can establish similar dependen-

    cies among pixel/voxel intensities that are in larger regions or

    do not belong to the same clique, as will be shown in Section 3

    addressing the HMM mathematical model. In this work, when

    presenting the pixel/voxel data to the HMM-based segmentation

    module, each pixel/voxel is represented by a vector

    composed of its grayscale/color value as well as those of other

    pixels/voxels in its neighborhood: a 9-pixel vector and a 27-voxel

    vector for 2D and 3D imaging data, respectively. The vector is

    presented to the HMM models and the probability of output is

    calculated using prior training knowledge stored in the model.

    Labeling takes place by setting the label of the voxel to that of

    the HMM showing the highest output probability.
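
    The vector construction and labeling rule described above can be sketched as follows. This is an illustration under stated assumptions: the raster ordering of the 27 neighbors, the clamping of coordinates at the volume border, and the names `neighborhood_vector`, `label_voxel` and `class_scores` are all hypothetical, and each scoring callable merely stands in for one trained HMM.

```python
def neighborhood_vector(volume, z, y, x):
    """Flatten the 3x3x3 neighborhood centred on (z, y, x) into 27 intensities.

    `volume` is a nested list indexed as volume[z][y][x]. Coordinates outside
    the volume are clamped to the nearest valid index (an assumed border rule).
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    vector = []
    for dz in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                zz = min(max(z + dz, 0), depth - 1)
                yy = min(max(y + dy, 0), height - 1)
                xx = min(max(x + dx, 0), width - 1)
                vector.append(volume[zz][yy][xx])
    return vector


def label_voxel(vector, class_scores):
    """Return the label whose scoring callable gives the highest value.

    `class_scores` maps a tissue label to a callable standing in for the
    output probability of that class's trained HMM.
    """
    return max(class_scores, key=lambda label: class_scores[label](vector))
```

    In a full pipeline each callable would evaluate the corresponding HMM on the 27-element vector; here any function of the vector can be plugged in to exercise the labeling rule.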

    The outputs of a HMM can be discrete, acquiring certain

    specific quantized levels or continuous based on continuous

    probability density functions (PDFs). The most common

    continuous PDF representation is a Multivariate Gaussian

    distribution whose co-variances are assumed to be zeros,

    reducing to an additive mixture of normal distributions. Classification

    proceeds by estimating the probability that a pattern was generated by

    each HMM; the most probable model to produce that

    pattern is regarded as its tissue type or class. HMMs were previously used successfully in automatic speech recognition

    (ASR) and are commonly used with the Minimum Classifi-

    cation Error (MCE) training algorithm described in [6,7],

    which forms the foundation of the learning process employed

    in the proposed segmentation framework.

    During the MCE training, the derivatives of the output are

    computed with respect to every parameter to be updated. Since

    the output we seek is the class number, a continuous

    differentiable formula is required that evaluates the correctness

    of the result by replacing the non-differentiable discrete on/off

    output. The mathematical model of the loss in [6,7] was used

    for that case where

    \[ l_i = \mathrm{sigmoid}(d_i) = \frac{1}{1 + e^{-\gamma d_i + \theta}} \tag{3} \]

    where $\gamma$ is the sigmoid slope, $\theta$ is a shift and $d_i$ is a continuous

    variable that is more negative when the result is more correct,

    i.e. when the HMM of class $i$ has higher probability, which can

    be expressed as follows:

    \[ d_i = -g_i(X;\Lambda) + \left[ \frac{1}{k-1} \sum_{j=1,\, j \neq i}^{k} g_j(X;\Lambda)^{\eta} \right]^{1/\eta} \tag{4} \]

    The right term of Eq. (4) approaches $\max_{j \neq i} g_j(X;\Lambda)$ as $\eta \to \infty$,

    which leads to $d_i$ being negative if the HMM model of class

    $i$ showed the highest probability, so that the corresponding

    $l_i$ is small.

    $g_x$ is a discriminant function for each class, which does not necessarily correspond to a probability since no restrictions

    are imposed for that purpose. However, by using HMMs the

    output is the probability of the pattern, and the used

    discriminant is the probability due to the most probable path.

    k is the number of models involved.
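
    A small numerical sketch of Eqs. (3) and (4) may help to see how the loss behaves. It assumes that every discriminant value is a positive number (for example, a path probability) so that raising it to the power $\eta$ is well defined; the function names and default parameter values are hypothetical.

```python
import math


def misclassification_measure(g, i, eta):
    """Eq. (4): d_i = -g_i + [ (1/(k-1)) * sum_{j != i} g_j**eta ]**(1/eta).

    Assumes every discriminant value in `g` is positive.
    """
    k = len(g)
    others = [g[j] ** eta for j in range(k) if j != i]
    return -g[i] + (sum(others) / (k - 1)) ** (1.0 / eta)


def mce_loss(g, i, gamma=1.0, theta=0.0, eta=2.0):
    """Eq. (3): smooth 0-1 loss, l_i = 1 / (1 + exp(-gamma * d_i + theta))."""
    d = misclassification_measure(g, i, eta)
    return 1.0 / (1.0 + math.exp(-gamma * d + theta))
```

    With discriminant values `[0.6, 0.3, 0.1]`, the measure for class 0 is negative and its loss falls below 0.5, while the measure for class 2 is positive and its loss rises above 0.5. Increasing `eta` moves the second term of Eq. (4) toward the largest competing discriminant.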

    The MCE updates each parameter trying to reach the

    minimum of $l_i$. For a certain parameter $x$, this update proceeds

    as follows

    \[ x^{t+1} = x^{t} - \varepsilon \frac{\partial l_i}{\partial x} \tag{5} \]

    where $\varepsilon$ is the learning rate.

    In the MCE algorithm [7], it is discussed that if the learning rate was chosen such that the following conditions are satisfied

    \[ \sum_{t=0}^{\infty} \varepsilon_t \to \infty \tag{6} \]

    \[ \sum_{t=0}^{\infty} \varepsilon_t^{2} < \infty \tag{7} \]

    the model parameters $\Lambda$ approach at least a local minimum

    $\Lambda^{*}$. It is also described that by using a small sigmoid slope that

    increases across iterations, the global minimum is achievable

    with a higher probability than with other training algorithms due to smoothing of the error surface. Both considerations were

    addressed in the context of this paper, where the learning rate is

    given by:

    \[ \varepsilon(t) = \frac{\varepsilon(0)}{1 + a t} \tag{8} \]

    The integration of the learning rate from zero to infinity is infinity, and the

    integration of the learning rate squared is equal to $\varepsilon(0)^{2}/a$, where

    $a$ is a constant, and $t$ is time, which is substituted by the

    iteration number, i.e. (6) and (7) are satisfied. In other words,

    the proposed HMM is accurate since it converges to the global

    minimum as well as robust since the convergence is only

    dependent on the established learning rate.
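
    The decaying learning rate and the update rule can be illustrated with a short sketch. The values chosen for the initial rate and the decay constant, the function names, and the quadratic objective used in the usage note are assumptions made only for illustration.

```python
def learning_rate(t, eps0=0.1, a=0.01):
    """Decaying schedule: eps(t) = eps0 / (1 + a * t), with t the iteration number."""
    return eps0 / (1.0 + a * t)


def descend(grad, x0, steps, eps0=0.1, a=0.01):
    """Repeat the update x <- x - eps(t) * grad(x) for `steps` iterations."""
    x = x0
    for t in range(steps):
        x -= learning_rate(t, eps0, a) * grad(x)
    return x
```

    For example, `descend(lambda x: 2 * (x - 3), 0.0, 2000)` drives `x` toward 3, the minimizer of the hypothetical objective `(x - 3)**2`. Summing `learning_rate(t)` over more and more iterations keeps growing, whereas the sum of its squares stays bounded, which is the behavior the two conditions on the learning rate require.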


    Two HMM models will be considered during analysis. The

    first one is a binary discrete HMM where each node has an

    emission probability for zero and an emission probability for

    one. Consequently, the input is taken in the form of a long

    vector having the binary equivalent of the intensities

    represented in eight bits each. The second model is a

    continuous model where each node represents the emission probabilities in the form of a Gaussian Mixture. The analysis of

    the discrete model will be first presented followed by the

    formulas necessary for the continuous Gaussian Mixture.

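    Since $l_i$ is a function of $d_i$, and $d_i$ is a function of $g_x$, $x = 1, \ldots, k$, the derivative of $l_i$ with respect to a certain parameter $x$ is given by the chain rule:

    $$\frac{\partial l_i}{\partial x} = \frac{\partial l_i}{\partial d_i}\,\frac{\partial d_i}{\partial g_x}\,\frac{\partial g_x}{\partial x} \tag{9}$$

    $$\frac{\partial l_i}{\partial d_i} = \gamma\, l_i (1 - l_i) \tag{10}$$

    $$\frac{\partial d_i}{\partial g_x} = \begin{cases} -1, & x = i \\ \dfrac{g_x^{\eta - 1}}{k - 1} \left[ \dfrac{1}{k - 1} \sum_{j = 1,\, j \neq i}^{k} g_j^{\eta} \right]^{1/\eta - 1}, & x \neq i \end{cases} \tag{11}$$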

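    The output of the HMM can take several forms

    $$g_i = \sum_{q \in C} g(x, q; \Lambda) \tag{12}$$

    $$g_i = \max_{q \in C}\, g(x, q; \Lambda) \tag{13}$$

    $$g_i = \left[ \frac{1}{N(C)} \sum_{q \in C} g(x, q; \Lambda)^{\eta} \right]^{1/\eta} \tag{14}$$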

    where $C$ represents the set of possible chains and $N(C)$ is the number of elements in $C$. The output can be any of the previous forms or functions of them. MCE training discussed in [6,7] was based on Eq. (13), which is called the segmental form, where only the optimal path is considered for update during the Generalized Probabilistic Descent update step.

    Since minimizing or maximizing a positive function is equivalent to minimizing or maximizing its logarithm, we choose the discriminant function given by

    $$g_i = \log \pi_{q_0} + \sum_{t=1}^{T} \left[ \log a_{q_{t-1} q_t} + \log b_{q_t} \right] \tag{15}$$

    where the $b$'s are the output functions, the $a$'s are the transition probabilities, and the $\pi$'s are the initial probabilities.
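The segmental discriminant of Eqs. (13) and (15) is the best-path log score, which can be computed with the Viterbi recursion in the log domain. The sketch below is a minimal pure-Python illustration under stated assumptions: the two-state model, its probabilities, and the function names are invented for the example, and the emissions are passed in as precomputed per-step probabilities for each state.

```python
import itertools
import math


def viterbi_log_score(log_pi, log_a, log_b_seq):
    """Maximum over state chains q_0..q_T of
    log pi[q_0] + sum_{t=1..T} (log a[q_{t-1}][q_t] + log b_t[q_t])."""
    delta = list(log_pi)          # best log score ending in each state at t = 0
    n = len(delta)
    for log_b in log_b_seq:       # one entry per time step t = 1..T
        delta = [
            max(delta[i] + log_a[i][j] for i in range(n)) + log_b[j]
            for j in range(n)
        ]
    return max(delta)


def brute_force_log_score(pi, a, b_seq):
    """Reference value obtained by enumerating every chain (tiny models only)."""
    n, T = len(pi), len(b_seq)
    best = 0.0
    for q in itertools.product(range(n), repeat=T + 1):
        p = pi[q[0]]
        for t in range(1, T + 1):
            p *= a[q[t - 1]][q[t]] * b_seq[t - 1][q[t]]
        best = max(best, p)
    return math.log(best)


# Hypothetical two-state model (illustrative numbers only).
pi = [0.6, 0.4]
a = [[0.7, 0.3], [0.2, 0.8]]
b_seq = [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]]   # b_t[state] for t = 1, 2, 3

g = viterbi_log_score(
    [math.log(p) for p in pi],
    [[math.log(v) for v in row] for row in a],
    [[math.log(v) for v in b_t] for b_t in b_seq],
)
g_reference = brute_force_log_score(pi, a, b_seq)
```

Working with sums of logarithms avoids the numerical underflow that products of many small probabilities would cause for long sequences.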

    The HMM imposes constraints on most of the parameters associated with each model. Such constraints include the requirement that all transition probabilities going out of a state sum to one, that all initial probabilities sum to one, and many other constraints, which have to be satisfied during parameter update.

    For that reason, in [6] a substitution was used which guarantees those constraints, where the substituted parameter is the one that gets updated in each step. The substitution previously used in [6] for the initial probability is:

    $$\pi_x = \frac{\exp(\bar{\pi}_x)}{\sum_{q=1}^{Q} \exp(\bar{\pi}_q)} \tag{16}$$

    The previous substitution works well except for the fact that it uses exponentials, which slows down execution. Another substitution, which does not depend on exponentials, is proposed and used in this research:

    $$\pi_x = \frac{\bar{\pi}_x^2}{\sum_{q=1}^{Q} \bar{\pi}_q^2} \tag{17}$$

    $$a_{ix} = \frac{\bar{a}_{ix}^2}{\sum_{q=1}^{Q} \bar{a}_{iq}^2} \tag{18}$$

    $$P_0 = \frac{\bar{P}_0^2}{\bar{P}_0^2 + \bar{P}_1^2}, \qquad P_1 = \frac{\bar{P}_1^2}{\bar{P}_0^2 + \bar{P}_1^2} \tag{19}$$

    where $P_0$ and $P_1$ belong to a certain state and represent the emission probabilities of zeros and ones, respectively. The parameters that get updated are the substituted (barred) parameters.

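    To update the initial probabilities we need to find the derivative of $g_x$ with respect to every $\bar{\pi}_q$, where $q = 1, \ldots, Q$. If $q = q_0$ then

    $$\frac{\partial g_x}{\partial \bar{\pi}_{q_0}} = \frac{2 (1 - \pi_{q_0})}{\bar{\pi}_{q_0}} \tag{20}$$

    On the other hand, if $q \neq q_0$, a dependency still exists through the normalization formula (17), and the derivative with respect to $\bar{\pi}_z$, $z \neq q_0$, becomes:

    $$\frac{\partial g_x}{\partial \bar{\pi}_z} = -\frac{2\, \pi_{q_0}\, \bar{\pi}_z}{\bar{\pi}_{q_0}^2} \tag{21}$$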
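The squared-parameter substitution of Eq. (17) and the derivatives in Eqs. (20) and (21) can be checked against a finite-difference gradient. The sketch below assumes, following Eq. (15), that the only term of the discriminant depending on the initial probabilities is the logarithm of the initial probability of state q0. The parameter values and function names are illustrative assumptions.

```python
import math


def normalize_squares(bar):
    """Eq. (17): map unconstrained parameters to probabilities that sum to one."""
    s = sum(v * v for v in bar)
    return [v * v / s for v in bar]


def grad_log_pi(bar, q0):
    """Analytic gradient of log(pi[q0]) with respect to every barred parameter."""
    pi = normalize_squares(bar)
    grad = []
    for z, v in enumerate(bar):
        if z == q0:
            grad.append(2.0 * (1.0 - pi[q0]) / bar[q0])        # Eq. (20)
        else:
            grad.append(-2.0 * pi[q0] * v / (bar[q0] ** 2))    # Eq. (21)
    return grad


def numeric_grad_log_pi(bar, q0, h=1e-6):
    """Central finite-difference approximation of the same gradient."""
    grad = []
    for z in range(len(bar)):
        up = list(bar)
        dn = list(bar)
        up[z] += h
        dn[z] -= h
        f_up = math.log(normalize_squares(up)[q0])
        f_dn = math.log(normalize_squares(dn)[q0])
        grad.append((f_up - f_dn) / (2.0 * h))
    return grad


bar = [0.5, 1.2, 0.8]            # hypothetical unconstrained parameters
pi = normalize_squares(bar)      # valid probabilities by construction
analytic = grad_log_pi(bar, q0=1)
numeric = numeric_grad_log_pi(bar, q0=1)
```

Because the constraint is built into the substitution, a gradient step on the barred parameters never needs a separate renormalization or projection step.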

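    A similar case holds for the transition probabilities. During the update, we will consider the derivatives of the transition probability going out from a certain state $i$ to a state $j$.

    $$\frac{\partial g_x}{\partial \bar{a}_{ij}} = \sum_{t=1}^{T} \frac{1}{a_{q_{t-1} q_t}}\, \frac{\partial a_{q_{t-1} q_t}}{\partial \bar{a}_{ij}} \tag{22}$$

    $$\frac{\partial a_{ij}}{\partial \bar{a}_{ij}} = \frac{2\, a_{ij} (1 - a_{ij})}{\bar{a}_{ij}} \tag{23}$$

    $$\frac{\partial a_{ix}}{\partial \bar{a}_{ij}} = -\frac{2\, a_{ix}^2\, \bar{a}_{ij}}{\bar{a}_{ix}^2} \tag{24}$$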

    Updating the output probabilities is much easier than the

    rest of the parameters. The first step is to find the derivative of g

    with respect to b.


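    $$\frac{\partial g_x}{\partial b_q} = \frac{1}{b_q} \tag{25}$$

    $$\frac{\partial b(x)}{\partial \bar{P}_1} = \frac{2 P_0 P_1 (2x - 1)}{\bar{P}_1} \tag{26}$$

    $$\frac{\partial b(x)}{\partial \bar{P}_0} = \frac{2 P_0 P_1 (1 - 2x)}{\bar{P}_0} \tag{27}$$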

    The only difference between the discrete and the continuous HMM models is the way the output probability is calculated. In the continuous case, $b(x)$ is derived from a Gaussian mixture as follows

    $$b(x) = \sum_{k=1}^{K} C_k\, N(x; \mu_k, \sigma_k^2) \tag{28}$$

    where two constraints are imposed. The first is that the weights $C_k$ must sum to one. The second is that the standard deviation $\sigma_k$ is always positive. To guarantee that, $\bar{\sigma}_k^2$ is used for the standard deviation while $\bar{\sigma}_k^4$ is used for the variance. $\mu_k$ is the mean of distribution $k$ and $x$ is the input variable. The substitution used for the weights is given by

    $$C_k = \frac{\bar{C}_k^2}{\sum_{x=1}^{K} \bar{C}_x^2} \tag{29}$$

    where $K$ is the number of mixtures used. Updating the parameters is governed by the following equations:

    $$\frac{\partial C_x}{\partial \bar{C}_x} = \frac{2\, C_x (1 - C_x)}{\bar{C}_x} \tag{30}$$

    $$\frac{\partial C_y}{\partial \bar{C}_x} = -\frac{2\, C_y^2\, \bar{C}_x}{\bar{C}_y^2} \tag{31}$$

    $$\frac{\partial b(x)}{\partial C_j} = N(x; \mu_j, \sigma_j^2) \tag{32}$$

    $$\frac{\partial b(x)}{\partial \mu_j} = \frac{\partial b(x)}{\partial C_j}\, \frac{C_j (x - \mu_j)}{\bar{\sigma}_j^4} \tag{33}$$

    vbxvsjZKvbxvmj

    4xKmj2C2s4jsj

    (34)

    This then leads to the general form of the training and segmentation algorithms for a 3 × 3 × 3 3D neighborhood.

    Voxel data is represented as a vector composed of 27 floating-

    point numbers. Each of these numbers represents the intensity

    of the voxel and the intensity of each of its 26 3D neighbors.

    This vector is presented to the HMM model and the probability

    of output is calculated using prior training knowledge stored in

    each model. Labeling takes place by setting the label of the

    voxel to that of the node showing the highest output

    probability.
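The voxel representation and labeling rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the zero padding at the volume border, the function names, and the toy scoring functions that stand in for trained HMMs are all hypothetical.

```python
def neighborhood_vector(volume, z, y, x):
    """Return the 27 intensities of the 3x3x3 neighborhood centred on (z, y, x).

    volume is a nested list indexed as volume[z][y][x]. Neighbors that fall
    outside the volume are filled with 0.0 (an assumption; the source does
    not state how borders are handled).
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    vec = []
    for dz in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                zz, yy, xx = z + dz, y + dy, x + dx
                inside = 0 <= zz < depth and 0 <= yy < height and 0 <= xx < width
                vec.append(float(volume[zz][yy][xx]) if inside else 0.0)
    return vec


def label_voxel(vec, scorers):
    """Assign the label whose model gives the highest output score."""
    return max(scorers, key=lambda label: scorers[label](vec))


# Toy 3x3x3 volume holding the intensities 0..26.
volume = [[[float(9 * z + 3 * y + x) for x in range(3)] for y in range(3)]
          for z in range(3)]

# Stand-ins for trained per-tissue HMMs: higher score means a better match.
scorers = {
    "gray": lambda v: -sum((i - 10.0) ** 2 for i in v),
    "white": lambda v: -sum((i - 20.0) ** 2 for i in v),
}

center_vec = neighborhood_vector(volume, 1, 1, 1)
corner_vec = neighborhood_vector(volume, 0, 0, 0)
label = label_voxel(center_vec, scorers)
```

In the actual method each scorer would be the output probability of the HMM trained for one tissue class; the toy scorers here only make the example self-contained.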

    3. Comparison with Hidden Markov Random Field

    Comparison of the HMM and HMRF in the context of MRI

    segmentation will be presented from two points of view. The

    first is performance where the complexity analysis of both is

    presented. The second is the ability of encoding relations

    among voxels in larger neighborhoods.

    In order to assess the computational efficiency of the

    proposed HMM-based segmentation framework, the complex-

    ity of the HMM-based segmentation is compared to the widely

    utilized HMRF-based segmentation in terms of performance.

    Since the continuous Gaussian Mixture HMM is similar to

    HMRF segmentation, its complexity analysis is used for

    performance comparisons. This starts with the estimation of the

    Gaussian mixture given by

    $$O = \sum_{i=1}^{G} \frac{w_i}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - \mu)^2}{\sigma^2}} \tag{35}$$

    where $w_i$ is the weight associated with this Gaussian response. The number of floating-point operations $N_f$ required for such an operation is given by

    $$O_{HMM} = 9 \times G \tag{36}$$

    where the nine operations account for finding $(x - \mu)$, squaring it, finding $\sigma^2$, negating $(x - \mu)^2$, finding the exponent, calculating $2\pi\sigma^2$, finding the square root, dividing $w_i$ by the square root, and multiplying by the exponent, and $G$ is the number of mixtures used.

    Hence, to find the output probability of a certain number in a sequence, Eq. (36) gives the number of required floating-point operations. In its first iteration, the Viterbi algorithm computes the output probability of the first pattern in the sequence, multiplied by the initial probability of each node, which amounts to $n \times (1 + 9 \times G)$ computations for $n$ nodes. In the subsequent iterations, the Viterbi algorithm multiplies the current probability set at each node by the transition probability to each node, which requires an extra $n^2$ operations, and adds the output probability of the current pattern, which needs $1 + 9 \times G$ operations, so the total number of operations is given by

    $$O_{HMM} = n(1 + 9G) + n^2 (1 + 9G)(L - 1) \;\Rightarrow\; O_{HMM} = O(n^2 G L) \tag{37}$$

    where $L$ is the length of the sequence.
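To give a feel for the magnitude of this count, the sketch below evaluates the operation count for one voxel. It assumes that Eq. (37) reads n(1 + 9G) + n²(1 + 9G)(L - 1), which is one reading of the printed formula, and it plugs in the configuration reported later in the experiments (10 states, 15 mixture components) with the 27-element input vector. The function names are hypothetical.

```python
def gaussian_mixture_flops(G):
    """Eq. (36): nine floating-point operations per mixture component."""
    return 9 * G


def viterbi_flops(n, G, L):
    """Operation count under the assumed reading of Eq. (37).

    n: number of nodes (states), G: number of mixture components,
    L: length of the input sequence.
    """
    per_emission = 1 + gaussian_mixture_flops(G)
    first_step = n * per_emission
    later_steps = n * n * per_emission * (L - 1)
    return first_step + later_steps


# 10 states, 15 mixtures, 27-element neighborhood vector.
flops_per_voxel = viterbi_flops(n=10, G=15, L=27)

# Doubling the sequence length roughly doubles the cost (linear in L).
flops_long_sequence = viterbi_flops(n=10, G=15, L=54)
```

The count grows linearly with the sequence length L and quadratically with the number of states n, which is the scaling that the O(n²GL) bound summarizes.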

    HMRF models start by counting the number of cliques in the

    3 × 3 × 3 neighborhood. Those cliques can only be formed as 2 × 2 × 2 neighborhoods, i.e. composed of eight voxels. Any combination of more than two voxels will form a clique in that neighborhood. Each clique requires the evaluation of the

    potential. Since the complexity of computing the potential

    depends on the model being used, the potential is assumed to

    require one cycle per voxel and another cycle for the clique,

    which results in the best case scenario for the HMRF models.

    This can be demonstrated by the simplest case of subtracting


    the mean out of each voxel, squaring and summing all the

    potentials together. More complicated models will, in turn,

    require higher complexities. The number of operations $N_V$ required to carry out these computations of the potential will thus be:

    $$N_V = 2 \times 16 \times (2 + 1) + 4 \times 4 \times (2 + 1) + 1 + \sum_{v=3}^{8} \binom{8}{v} \times 4 \times (v + v - 1) \tag{38}$$

    The probability distribution $P(f)$ of the configuration is a Gibbs distribution (with respect to the neighborhood system used), which is given by:

    $$P(f) = \frac{1}{Z} \times e^{-\frac{1}{T} U(f)}, \quad \text{where } U(f) = \sum_{\text{all cliques}} V_c(f) \tag{39}$$

    If $Z$ is assumed constant by restricting the cliques either to single locations or to single and double locations, then the computational complexity of computing the Gibbs distribution $G$ will be $O(1)$, since it requires a constant number of operations irrespective of any of the model parameters. However, for more accurate computations, the estimation of $Z$ increases the order of complexity. Moreover, for a continuous case like that presented in this paper, it is impossible to find the exact value of $Z$, as it will be the result of 27 nested integrals. This leads to estimation, which in turn affects the accuracy of the computed probability. The complexity becomes:

    $$O_{HMRF} = O(Z) \tag{40}$$

    So, in the continuous case, the HMRF requires the computation of 27 nested integrals, whereas the HMM is dependent on the number of classes, the number of nodes, and the size of the input vector.

    In the HMM, application to larger neighborhoods occurs

    with a change in the size of the input vectors used for training

    and segmentation to represent the larger neighborhood since

    the parameter updates (11)–(34) do not rely on the size of the

    input vector. No further change in the HMM algorithm is

    necessary. As a result, the HMM provides a robust foundation

    that is generically applicable to the segmentation of multi-

    dimensional datasets in arbitrarily large neighborhoods, i.e.

    applicable to MRI as well as MRSI data.

    Larger neighborhoods raise a computational concern in the

    case of HMRF segmentation. For example, the classification of

    a voxel based on a neighborhood larger than 3 × 3 × 3 involves the mapping from MRF to Gibbs distribution, which, in turn, entails computing the Gibbs distribution in a 3 × 3 × 3

    neighborhood. Larger regions necessitate the analysis of higher

    order Markov Fields, which requires the re-definition of the

    neighborhood. To successfully relate larger neighborhoods,

    HMRF must be used in which iterative segmentation takes

    place. This is due to the dependency of segmenting each voxel

    on its neighbors and their prior segmentations, which are used

    to compute the potential. Thus, the segmentation becomes

    subject to iterative local maximization/minimization

    algorithms like the Expectation Maximization (EM) and the

    Iterated Conditional Modes (ICM), which are typically used to

    avoid the analytically intractable nature of estimating the best

    solution for the HMRF. A common concern with such methods

    is their sensitivity to the initialization conditions and the

    reaction of the system to input sequences during segmentation.

    HMMs are easily applicable to larger neighborhoods at the cost of additional computational complexity (an increase in $L$ in Eq. (37)), instead of adding

    increased sensitivities to initial conditions and reactions to

    input sequences. The problem of applicability to larger

    neighborhoods is specifically important in the context of

    segmentation of biomedical imaging data from multiple

    modalities where the voxel neighborhood must be extended

    across modalities or across time, e.g. in functional MRI,

    beyond the 3 × 3 × 3 neighborhood. Although this increase may provide better segmentation accuracy, the increase in accuracy is bounded, i.e. it will occur up to a certain neighborhood size, after which there is no significant change in the

    segmentation accuracy due to the smoothing effect of utilizing

    a larger neighborhood. This issue can even further complicate

    the choice of the appropriate neighborhood size, since if the

    neighborhood becomes very large, the segmentation accuracy

    can be negatively affected. Hence, the contribution of the

    different neighbors in the segmentation process may be

    weighted according to their distance to the voxel under

    investigation. These weights can be inversely proportional to

    the distance between the neighboring pixels/voxels and the

    investigated pixel/voxel. In other words, the significance of the

    neighboring pixels/voxels in the segmentation strategy

    increases as the neighbors become closer to the pixel/voxel

    under investigation.

    4. Preprocessing phase

    Preprocessing steps aim to reduce the effects of noise,

    address intensity inhomogeneities, and perform global

    intensity level correction and are applied prior to segmentation.

    These are based on existing techniques and are only presented

    here for completeness, but are not discussed in detail.

    4.1. Intensity inhomogeneities

    Intensity inhomogeneities are defined as variations in voxel

    intensities through or across imaging data sets, which appear as

    either sudden or slow variations. Handling both types of

    intensity variations in a pre-processing phase to segmentation

    results in improving the segmentation accuracy through the

    control of adverse effects caused by such inhomogeneities. A

    normalized histogram intersection between each two consecu-

    tive images in a data set is used for this purpose. The

    distributions of pixel intensities between each pair of

    consecutive images are expected to change slowly. If the

    mean and variance across slices nearly match, then the

    distribution will change slowly. Assuming that Ii is the

    intensity of pixel i in an image, then the standard deviation

    of the image is given by


    $$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (I_i - \mu)^2} \tag{41}$$

    where $\mu$ is the mean intensity. If a contrast $a$ and a brightness $b$ are applied, so that the standard deviation of the voxel intensity distribution becomes $\sigma'$, then:

    $$\sigma' = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (a I_i + b - a\mu - b)^2} = a\sigma \tag{42}$$

    This shows that the standard deviation is only affected by

    the contrast. Obviously, that case maps to the correction of

    each slice, with respect to its preceding slice. The slices that are

    considered are those ones having non-empty preceding slices,

    which can be determined since after skull peeling (cerebrum

    reconstruction, or skull stripping) all the background voxels

    end up being exactly zero.

    A similar argument holds for the brightness, where

    $$\mu' = a\mu + b \tag{43}$$

    which means that, knowing $a$ from Eq. (42), $b$ can be estimated from Eq. (43) (Fig. 3).
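The estimation of contrast and brightness from Eqs. (42) and (43) can be sketched as follows. This is a minimal illustration, assuming the intensities passed in are the non-background voxels of a reference slice and of the slice to be corrected. The function names and sample values are hypothetical.

```python
import math


def mean_and_std(values):
    """Mean and (population) standard deviation, as in Eq. (41)."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / n)
    return mu, sigma


def estimate_contrast_brightness(reference, observed):
    """Estimate (a, b) such that observed is approximately a * reference + b."""
    mu, sigma = mean_and_std(reference)
    mu_obs, sigma_obs = mean_and_std(observed)
    a = sigma_obs / sigma          # Eq. (42): sigma' = a * sigma
    b = mu_obs - a * mu            # Eq. (43): mu' = a * mu + b
    return a, b


def undo_contrast_brightness(observed, a, b):
    """Map the observed intensities back to the reference intensity scale."""
    return [(v - b) / a for v in observed]


reference = [10.0, 20.0, 30.0, 40.0, 55.0]          # toy reference slice
observed = [1.5 * v + 7.0 for v in reference]       # same slice, altered
a_hat, b_hat = estimate_contrast_brightness(reference, observed)
corrected = undo_contrast_brightness(observed, a_hat, b_hat)
```

With real slices the two intensity lists come from different anatomy, so the recovered values are only estimates that rely on consecutive slices having similar distributions.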

    4.2. Global intensity level correction

    Global intensity correction is addressed after handling both

    sudden and slow intensity variations. Since the HMM-based

    segmentation utilizes the grayscale or color information of

    voxels, it is sensitive to global variations among data sets. In

    order to remedy this condition, global correction is employed

    to maximize the histogram intersection between the data sets,

    so that errors due to intensity differences are minimized.

    In order to achieve the required global intensity correction,

    the normalized histograms are utilized due to differences

    between the number of pixels/voxels of different data sets. The

    histogram, which represents the frequency of the intensities, is

    normalized against the total number of non-background pixels

    present in each data set. The brightness that maximizes the integral of the histogram intersection, expressed as follows, is applied after an anisotropic filtering stage:

    $$\hat{H}_{int} = \int_{v=0}^{V_{Max}} \hat{H}(I_A, v) \cap \hat{H}(I_B, v)\, dv \tag{44}$$

    The brightness and contrast values were estimated in the same way as in Eqs. (42) and (43), and they were applied after the sudden intensity correction and before the filtering.
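A discrete version of the histogram intersection in Eq. (44) can be sketched as below, with the intersection taken as the bin-wise minimum of two normalized histograms. The exhaustive search over integer brightness shifts is an assumption made for illustration; the source does not state how the maximization is carried out. All names and sample intensities are hypothetical.

```python
def normalized_histogram(values, n_bins=256):
    """Histogram of non-background (non-zero) integer intensities, summing to 1."""
    counts = [0] * n_bins
    total = 0
    for v in values:
        if v != 0:
            counts[v] += 1
            total += 1
    if total == 0:
        return [0.0] * n_bins
    return [c / total for c in counts]


def histogram_intersection(hist_a, hist_b):
    """Discrete counterpart of Eq. (44): sum of bin-wise minima."""
    return sum(min(p, q) for p, q in zip(hist_a, hist_b))


def best_brightness_shift(values_a, values_b, max_shift=50, n_bins=256):
    """Brute-force search for the shift of data set B that best matches A."""
    hist_a = normalized_histogram(values_a, n_bins)
    best_shift, best_score = 0, -1.0
    for shift in range(-max_shift, max_shift + 1):
        # Clamp to [1, n_bins - 1] so shifted voxels never become background.
        shifted = [min(max(v + shift, 1), n_bins - 1) for v in values_b if v != 0]
        score = histogram_intersection(hist_a, normalized_histogram(shifted, n_bins))
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift, best_score


values_a = [100, 100, 110, 120, 120, 130, 0, 0]   # zeros are background
values_b = [80, 80, 90, 100, 100, 110, 0]         # same tissue, 20 levels darker
shift, score = best_brightness_shift(values_a, values_b)
```

Normalizing by the number of non-background voxels makes the comparison independent of how many voxels each data set contains.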

    5. Training and segmentation steps

    Based on the mathematical models of both the discrete and

    continuous HMM-based segmentation techniques, the general

    form of the HMM-based training and segmentation algorithms

    for a 3D neighborhood $N$ involves representing each voxel by a vector or sequence of symbols $v$. The sequence represents the relevant parameters of the voxel and the voxel's neighbors in $N$. The representative vector or symbol of each voxel is

    presented to a set of HMM models, each corresponding to a

    separate class or tissue type, and the output probabilities are

    calculated using prior training knowledge stored in each HMM

    model. Labeling takes place by assigning to the voxel the label

    associated with the HMM showing the highest output

    probability. Training of both the continuous and the discrete models follows the same procedure (Fig. 4). The segmentation

    also follows the same procedure for both discrete and

    continuous HMM-based techniques (Fig. 5). If labeling

    encounters segments whose characteristics are not consistent

    with any of the known tissue types, these are classed as

    unknown tissue. A clinical expert is then requested to assign a label to the unknown tissues. The segment's characteristics are then used to initialize the knowledge of the newly identified tissue and the corresponding HMM. The acquired knowledge is then used to label new segments that belong to the newly identified tissue type.

    Fig. 3. Sudden intensity correction steps.

    Fig. 4. Training the HMM.

    6. Experimental results

    Three types of preprocessing were applied. The first was 3D anisotropic filtering as described in [13], using k = 5 for 10 iterations. Previously in [14], we showed how global intensity level correction could be applied to MRI sequences. Also in [14], we showed how sudden intensity variations that appear in many MR sequences could be accounted for. The same techniques were used here for preprocessing of images prior to segmentation. For the discrete HMM, the number of states was successively increased and after 15 states no significant improvement was detected. The number of states used was 10, with a Gaussian mixture of 15 distributions. The maximum number of iterations was set to 30,000 and the sigmoid slope to 0.08.

    Fig. 6 shows the classification accuracy (1-loss) averaged

    for every 1000 iterations for the discrete model. It is clear that

    the training algorithm reaches the minimum of the error surface

    after around 10,000 iterations, which justifies why during

    experimentation we chose 30,000 as an upper bound for the

    number of iterations.

    6.1. BrainWeb data results

    The algorithm was tested using simulated digital phantoms

    from the BrainWeb MR simulator (http://www.bic.mni.mcgill.ca/brainweb/). The digital phantoms were obtained using an isotropic voxel size of 1–9 mm to investigate the influence of noise, field inhomogeneity, and contrast (T1-weighted, using TR = 18 ms, TE = 10 ms, and a flip angle of 30°) with varying levels of noise from 1 to 9% and varying levels of spatial inhomogeneity, i.e. intensity variations for each tissue class, from 0 to 40%. The comparison was performed on the basis of the Dice similarity coefficient, which measures the overlap between two segmentations $X$ and $Y$

    $$D(X, Y) = \frac{2\,|X \cap Y|}{|X| + |Y|} \tag{46}$$

    where $|r|$ represents the number of voxels in segment $r$. The Dice coefficient was computed for both gray matter and white matter segmentation. The results are shown in Tables 1–5.

    As can be seen from the tables, the results for the Dice similarity coefficient show that the HMM-based segmentation provides accurate segmentation of the White Matter (WM) and Gray Matter (GM) even in the presence of increasing noise and spatial inhomogeneities. The increase in the slice thickness

    has the expected effect of reducing the accuracy of the

    algorithm as evidenced by the similarity coefficient. This is

    expected as the algorithm itself is geared to use in 3D images

    Fig. 6. Classification accuracy (1-loss) evaluated across iterations for every 1000 iterations. (Horizontal axis: iterations, from 1.00E+03 to 1.00E+06; vertical axis: classification accuracy, from 0.62 to 0.76.)

    Table 1
    BrainWeb results, 1 mm slice

                Spatial inhomogeneity
                0%              20%             40%
    Noise       White   Gray    White   Gray    White   Gray
    0%          0.831   0.872   0.831   0.872   0.708   0.756
    1%          0.825   0.870   0.756   0.801   0.706   0.756
    3%          0.815   0.869   0.772   0.828   0.713   0.773
    5%          0.793   0.860   0.765   0.833   0.717   0.793
    7%          0.739   0.832   0.739   0.825   0.702   0.797
    9%          0.663   0.796   0.682   0.799   0.672   0.787
    Average     0.778   0.85    0.758   0.826   0.703   0.777

    Fig. 5. Segmenting with the HMM.


    data sets in which the neighboring images are of a similar

    distance as the pixel distances (i.e. that the slice thickness is

    close to the pixel distance). In the case of the simulated dataset,

    the ground truth is established from the original data

    creation, whereby each tissue is clearly established and the

    segmentation is completely known. Thus, this data set does not

    need an expert segmentation for comparison. Additionally, the results are expected to be better than for real data. Note that

    Table 4
    BrainWeb results, 7 mm slice

                Spatial inhomogeneity
                0%              20%             40%
    Noise       White   Gray    White   Gray    White   Gray
    0%          0.568   0.630   0.559   0.589   0.524   0.567
    1%          0.568   0.634   0.560   0.594   0.543   0.570
    3%          0.577   0.662   0.564   0.613   0.551   0.595
    5%          0.569   0.578   0.581   0.652   0.562   0.629
    7%          0.563   0.708   0.577   0.671   0.556   0.641
    9%          0.510   0.698   0.596   0.697   0.555   0.690
    Average     0.559   0.652   0.573   0.636   0.549   0.615

    Table 5
    BrainWeb results, 9 mm slice

                Spatial inhomogeneity
                0%              20%             40%
    Noise       White   Gray    White   Gray    White   Gray
    0%          0.526   0.573   0.523   0.539   0.512   0.518
    1%          0.532   0.576   0.523   0.538   0.513   0.521
    3%          0.534   0.598   0.532   0.570   0.519   0.538
    5%          0.529   0.630   0.539   0.593   0.527   0.594
    7%          0.508   0.659   0.535   0.634   0.533   0.596
    9%          0.527   0.673   0.529   0.646   0.534   0.630
    Average     0.526   0.618   0.53    0.587   0.523   0.566

    Table 3
    BrainWeb results, 5 mm slice

                Spatial inhomogeneity
                0%              20%             40%
    Noise       White   Gray    White   Gray    White   Gray
    0%          0.629   0.707   0.607   0.662   0.582   0.637
    1%          0.628   0.711   0.609   0.669   0.583   0.641
    3%          0.634   0.733   0.624   0.704   0.591   0.664
    5%          0.620   0.750   0.625   0.719   0.601   0.697
    7%          0.618   0.755   0.611   0.745   0.603   0.714
    9%          0.598   0.753   0.600   0.752   0.601   0.729
    Average     0.621   0.735   0.613   0.709   0.594   0.68

    Table 2
    BrainWeb results, 3 mm slice

                Spatial inhomogeneity
                0%              20%             40%
    Noise       White   Gray    White   Gray    White   Gray
    0%          0.707   0.790   0.671   0.727   0.637   0.707
    1%          0.708   0.792   0.672   0.741   0.636   0.707
    3%          0.703   0.802   0.682   0.765   0.641   0.723
    5%          0.695   0.804   0.685   0.783   0.646   0.749
    7%          0.668   0.796   0.664   0.782   0.642   0.758
    9%          0.600   0.770   0.620   0.772   0.618   0.755
    Average     0.68    0.792   0.666   0.762   0.637   0.733


    this is used to initially test the capability of the algorithm to carry out segmentation; further testing with real data is also presented, showing comparisons with existing and expert segmentations.

    Sample images for 1 mm slices are shown in Fig. 7. The

    leftmost images are the original phantom data, the center is the

    data used to generate the phantom (i.e. the ground truth data),

    and the rightmost is the segmented result. As can be seen, subjectively, the segmentation results reflect the objective

    overlap results. One noticeable exception is that the segmenta-

    tion algorithm currently is configured to only look at gray and

    white matter, and ignores all other tissue types. Further work is

    continuing in expanding the algorithm for use with additional

    tissue types.

    6.2. IBSR data results

    After segmenting each case, the accuracy of the HMM-based segmentation relative to the manual segmentations, as well as to the results of existing techniques, including maximum likelihood, was determined using the Tanimoto coefficient, which was previously used in existing techniques, and is given by

    $$T(X, Y) = \frac{|X \cap Y|}{|X \cup Y|} \tag{47}$$

    where $|r|$ represents the number of voxels in segment $r$. By the definitions of the Dice and Tanimoto coefficients, $T(X, Y) \leq D(X, Y)$. So, the Tanimoto is more conservative than the Dice, where equality is subject to the condition that

    Table 6
    IBSR reported results

    White   Gray    Method
    0.567   0.564   Adaptive MAP
    0.562   0.558   Biased MAP
    0.567   0.473   Fuzzy c-means
    0.554   0.550   Maximum A posteriori Probability (MAP)
    0.551   0.535   Maximum-Likelihood
    0.571   0.477   Tree-structure k-means
    0.832   0.876   Manual (4 brains averaged over 2 experts)

    Fig. 7. Sample segmentation of simulated digital phantoms.

    Table 7
    Overlapping results obtained from applying both HMM algorithms on the IBSR data after training

                Discrete        Continuous
    Case        White   Gray    White   Gray
    100_23      0.517   0.694   0.774   0.879
    11_3        0.537   0.718   0.778   0.878
    110_3       0.589   0.747   0.746   0.869
    111_2       0.614   0.737   0.748   0.857
    112_2       0.610   0.761   0.761   0.874
    12_3        0.574   0.748   0.784   0.881
    191_3       0.504   0.708   0.762   0.870
    13_3        0.617   0.746   0.761   0.868
    202_3       0.566   0.743   0.756   0.864
    205_3       0.499   0.623   0.723   0.782
    7_8         0.587   0.745   0.758   0.869
    8_4         0.595   0.723   0.742   0.853
    17_3        0.613   0.716   0.735   0.854
    4_8         0.590   0.690   0.669   0.813
    15_3        0.592   0.697   0.669   0.817
    5_8         0.578   0.635   0.731   0.854
    16_3        0.604   0.719   0.702   0.842
    2_4         0.596   0.684   0.635   0.797
    6_10        0.528   0.582   0.752   0.855
    Average     0.546   0.671   0.699   0.809


    $X - Y = \emptyset$. Either coefficient can be utilized in the evaluation.

    However, the choice is dictated to ensure the consistent

    comparison of the published results with the HMM-based

    segmentation results. The Tanimoto coefficient was computed

    for both gray matter and white matter segmentation based on an

    analysis of variance in which the coefficient is the dependent

    variable while the training dataset and the tissue type are the

    independent factors. The results are demonstrated in Table 7.
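The two overlap measures can be computed directly from sets of voxel indices. The sketch below is a minimal illustration with hypothetical segmentations; it assumes both segments are non-empty, and it also shows the identity T = D / (2 - D), which follows from the two definitions and explains why the Tanimoto value never exceeds the Dice value.

```python
def dice(x, y):
    """Dice similarity coefficient: 2|X ∩ Y| / (|X| + |Y|)."""
    x, y = set(x), set(y)
    return 2.0 * len(x & y) / (len(x) + len(y))


def tanimoto(x, y):
    """Tanimoto coefficient: |X ∩ Y| / |X ∪ Y|."""
    x, y = set(x), set(y)
    return len(x & y) / len(x | y)


# Hypothetical segmentations given as sets of voxel indices.
manual = set(range(0, 10))       # voxels 0..9
automatic = set(range(5, 15))    # voxels 5..14, overlapping on 5..9

d = dice(manual, automatic)
t = tanimoto(manual, automatic)
t_from_d = d / (2.0 - d)         # same value as t, by the identity above
```

For these sets the intersection has 5 voxels and the union has 15, so the Dice value is 0.5 and the Tanimoto value is one third; both reach 1 only when the two segmentations coincide.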

    Table 6 shows the average results that were reported on the IBSR website using the same data that was used in this study. The BMAP algorithm described in [11] is based on HMRF computation. Although the results of the discrete model are close to most of the reported ones, the results of the continuous model are superior, even when compared to the HMRF-based methods. The IBSR data consists of various image sequences representing differing real-world data sets. The HMM was trained with one data set, and then used to segment the remaining data sets. In both Tables 7 and 8, the first column represents the image sequence numbers. The averages for the HMM are shown in the last row for comparison with existing results.

    For a fair comparison, the preprocessing phase of intensity variations was removed, and the results were compared to those of the Adaptive MAP algorithm [15], which takes care of intensity variations of segments through an ML stage for initialization purposes. The AMAP is based on Hidden Markov Random Fields, so after removing this preprocessing phase the comparison comes close to a direct comparison of the two algorithms, except for the difference in filtering.

    We compare with the AMAP and not the BMAP because the

    latter models the bias field, which was not considered in our

    analysis. The results in Table 8 demonstrate that the HMMs were able to segment the brain with higher accuracies. This supports the initial argument presented at the beginning of the paper, which is that HMMs during classification encode relations not only between neighboring voxels, but also between voxels present in non-neighboring sites, which is not the case in the HMRF.

    Inspection of segmented slices from case 5_8, which was described earlier in the preprocessing phase during sudden intensity correction, is shown in Fig. 8. The comparison reveals the expected behavior: without sudden intensity correction, the whole slice is erroneously segmented as gray, and the bright one gets segmented as white.

    The results in the tables provide an objective assessment of the quality of the algorithms, yet the accuracy in practical cases may be much higher, since each of the IBSR data sets contains at least one form of difficulty.

    Table 8
    Overlapping results obtained without carrying out sudden intensity correction

    Case        White       Gray
    100_23      0.792087    0.867957
    11_3        0.795936    0.863375
    110_3       0.756762    0.850009
    111_2       0.777369    0.844556
    112_2       0.775694    0.854445
    12_3        0.814668    0.873579
    191_3       0.798191    0.864343
    13_3        0.799298    0.869574
    202_3       0.793114    0.857034
    205_3       0.760803    0.761042
    7_8         0.743964    0.836973
    8_4         0.734167    0.817848
    17_3        0.710917    0.814474
    4_8         0.631508    0.774435
    15_3        0.700746    0.790821
    5_8         0.238758    0.690957
    16_3        0.72307     0.815892
    2_4         0.632165    0.77051
    6_10        0.386919    0.668827
    Average     0.668307    0.774333

    Fig. 8. Comparison between segmented slices from image sequence 5_8 with and without carrying out sudden intensity correction.


    7. Conclusion

    In this paper, a 3D MRI segmentation algorithm based on

    HMMs is presented. The algorithm demonstrates the ability of

    HMMs to handle multi-dimensional classification, whereas

    HMMs were previously considered as candidates for 1D

    classification only. The HMM model, together with carefully

    constructed preprocessing steps, showed significant improvement

    in the quality of 3D MRI segmentation when objectively

    compared to other results obtained using the same data. Both

    simulated and real data were used in the evaluation of the

    algorithm with promising results. The objective measure on the

    simulated phantoms (created from images used to establish

    ground truth) showed that the algorithm, although currently

    restricted to only gray and white matter, accurately identifies

    these tissues within limits of error. Further work is progressing

    on increasing the number of identified tissues. The results from

    the real data (using expert manual segmentations as ground

    truth) showed that the overlap measures are better than those of

    previously established methods, and are within the limits of

    error. This is easily seen, since even the expert manual segmentations

    differ from one operator to another.

    Comparisons between HMMs and HMRFs concerning the

    complexity of computations involved and the ability to

    segment based on the decision made from larger regions are

    presented. The comparative results indicate that the current

    mathematical model of MRF using Gibbs distribution can be

    extended to neighborhoods larger than a 3×3×3 neighborhood

    only if the Markovianity assumption is extended to higher

    orders, a restriction that is not required for the current HMM

    modeling scheme.

    Using HMRFs for larger neighborhoods

    requires more research into innovative modeling

    schemes. In the HMM, application to larger neighborhoods

    occurs with only a change in the input vectors used for training

    and segmentation to represent this larger neighborhood since

    the parameter updates do not rely on the size of the input

    vector. No further change in the HMM algorithm is necessary.

    For that purpose, the proposed HMM provides a robust

    foundation that is not sensitive to the initial conditions for

    enabling the segmentation of MR imaging data.
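To illustrate why only the input vector changes when the neighborhood grows, the following minimal sketch flattens the cube of voxels around a site into a single vector. It is not the authors' implementation: the function name `neighborhood_vector`, the nested-list volume layout, and the border handling (clamping to the nearest valid voxel) are all assumptions made for illustration.

```python
def neighborhood_vector(volume, z, y, x, radius=1):
    """Flatten the (2*radius + 1)^3 cube centred on voxel (z, y, x).

    `volume` is a nested list indexed as volume[z][y][x]. Indices that fall
    outside the volume are clamped to the border (an assumed convention).
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    span = range(-radius, radius + 1)
    vector = []
    for dz in span:
        for dy in span:
            for dx in span:
                zz = min(max(z + dz, 0), depth - 1)
                yy = min(max(y + dy, 0), height - 1)
                xx = min(max(x + dx, 0), width - 1)
                vector.append(volume[zz][yy][xx])
    return vector


# Toy 5x5x5 volume whose intensity encodes its own position.
volume = [[[z * 25 + y * 5 + x for x in range(5)]
           for y in range(5)] for z in range(5)]
small = neighborhood_vector(volume, 2, 2, 2, radius=1)  # 3x3x3 neighborhood
large = neighborhood_vector(volume, 2, 2, 2, radius=2)  # 5x5x5 neighborhood
```

Moving from `radius=1` to `radius=2` changes only the length of the vector handed to the model (27 versus 125 intensities); the code that consumes the vector does not need to change.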

    The problem that affects the application of MRF in

    segmentation is the classification of the voxel based on regions

    larger than 3×3×3 neighborhoods. The basis of the mapping

    from HMRF to Gibbs distribution forces the neighborhood

    used to compute the Gibbs distribution to be a 3×3×3

    neighborhood. Larger regions necessitate the analysis of higher

    order Markov fields, which in turn needs a re-definition of

    the neighborhood. To successfully relate larger neighborhoods,

    HMRF must be used, where iterative segmentation takes place.

    This is due to the dependency of segmenting each voxel on its

    neighbors and their prior segments, which are used to compute

    the potential. This in turn becomes subject to iterative local

    maximization/minimization algorithms such as Expectation

    Maximization (EM) and Iterated Conditional Modes

    (ICM). A common and crucial problem with such methods is

    their sensitivity to the initialization conditions and the reaction

    of the system to input patterns during segmentation.
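The dependence on initialization can be seen in a small, generic ICM sketch. This is not one of the HMRF methods compared in this paper: it assumes a 2D grid, a Gaussian data term with a shared standard deviation, a Potts prior over the four nearest neighbors, and hypothetical parameter names (`means`, `sigma`, `beta`).

```python
import math


def icm_segment(image, means, sigma, beta, init_labels, max_iters=10):
    """Iterated Conditional Modes on a 2D grid with a Potts prior."""
    rows, cols = len(image), len(image[0])
    labels = [row[:] for row in init_labels]  # work on a copy
    for _ in range(max_iters):
        changed = False
        for r in range(rows):
            for c in range(cols):
                best_label, best_energy = labels[r][c], math.inf
                for k, mu in enumerate(means):
                    # Data term: negative Gaussian log-likelihood (up to a constant).
                    energy = (image[r][c] - mu) ** 2 / (2.0 * sigma ** 2)
                    # Prior term: beta for every 4-neighbor with a different label.
                    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < rows and 0 <= cc < cols and labels[rr][cc] != k:
                            energy += beta
                    if energy < best_energy:
                        best_label, best_energy = k, energy
                if best_label != labels[r][c]:
                    labels[r][c] = best_label
                    changed = True
        if not changed:  # local optimum reached
            break
    return labels


# A uniform image of intensity 10 with class means 10 and 100.
image = [[10.0] * 3 for _ in range(3)]
good_start = [[0] * 3 for _ in range(3)]
bad_start = [[1] * 3 for _ in range(3)]
from_good = icm_segment(image, [10.0, 100.0], sigma=20.0, beta=100.0, init_labels=good_start)
from_bad = icm_segment(image, [10.0, 100.0], sigma=20.0, beta=100.0, init_labels=bad_start)
```

With a strong prior (`beta=100`), the run started from the wrong labels never leaves them, because each voxel is updated against neighbors that still carry the initial labels. This is the kind of local optimum the text refers to.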

    The cost of increasing the neighborhood size in the context

    of the proposed segmentation strategy is the extra computations

    required by the algorithm (increase in L in Eq. (37)).

    The problem of applicability to larger neighborhoods is

    particularly important in the context of segmentation of

    biomedical imaging data from multiple modalities where the

    pixel/voxel neighborhood must be extended across modalities

    or across time, e.g. in functional MRI, beyond the 3×3×3

    neighborhood. Although this increase may provide better

    segmentation accuracy, this will only occur up to a certain

    point after which there would not be any significant change in

    the segmentation accuracy. This issue is under investigation.

    It further complicates the choice of the

    appropriate neighborhood size, since if the neighborhood

    becomes very large, the segmentation accuracy can be

    negatively affected. Hence, the contribution of the different

    neighbors in the segmentation strategy can be weighted

    according to their distance to the voxel under investigation.

    These weights can be inversely proportional to the distance

    between the neighboring pixels/voxels and the investigated

    pixel/voxel; in other words, the significance of the neighboring

    pixels/voxels in the segmentation strategy increases as the

    neighbors become closer to the pixel/voxel under investigation.

    Further work is continuing on the effect of increased

    neighborhood sizes.
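One possible form of such a weighting is sketched below, assuming Euclidean distance measured in voxel units, exclusion of the center voxel, and normalization so that the weights sum to one. None of these choices is specified in the text, and the function name `inverse_distance_weights` is hypothetical.

```python
import math
from itertools import product


def inverse_distance_weights(radius=1):
    """Map each neighbor offset (dz, dy, dx) to a weight proportional to 1/distance.

    Assumptions: Euclidean distance in voxel units, center voxel excluded,
    weights normalized to sum to 1.
    """
    offsets = [o for o in product(range(-radius, radius + 1), repeat=3)
               if o != (0, 0, 0)]
    raw = {o: 1.0 / math.sqrt(sum(d * d for d in o)) for o in offsets}
    total = sum(raw.values())
    return {o: w / total for o, w in raw.items()}


weights = inverse_distance_weights(radius=1)
face = weights[(0, 0, 1)]    # neighbor sharing a face, distance 1
edge = weights[(0, 1, 1)]    # neighbor sharing an edge, distance sqrt(2)
corner = weights[(1, 1, 1)]  # neighbor sharing a corner, distance sqrt(3)
```

Face neighbors receive the largest weight and corner neighbors the smallest, so closer voxels contribute more as the neighborhood grows.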

    Other considerations that will enhance the accuracy of

    segmentation include the usage of multi-spectral images, not

    only T1 but also T2 and PD. The vectors used in classification

    will then be extracted from each voxel and its neighbors in the

    three images, forming a 27×3 input. In this case, a 3D

    Gaussian mixture model can be used, where the input to each

    state is a vector of the three intensities. Similar work using

    multi-sensor data and Hidden Markov Chains has been

    reported [33], concluding that HMCs are well suited to

    these problems. These results are promising, although the

    approach has yet to be applied to MR imaging data. Further

    work in applying HMM to multi-spectral MR imaging data is

    currently in progress.
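As a minimal sketch of how the 27×3 input described above could be assembled, the code below gathers one (T1, T2, PD) intensity triple per voxel of a 3×3×3 neighborhood. It assumes three co-registered volumes stored as nested lists indexed `[z][y][x]` and an interior voxel; the function name `multispectral_input` is hypothetical.

```python
from itertools import product


def multispectral_input(t1, t2, pd, z, y, x):
    """Return a 27x3 list of [T1, T2, PD] intensities around voxel (z, y, x).

    Assumptions: the three volumes are co-registered and equally sized, and
    (z, y, x) is an interior voxel so that every neighbor exists.
    """
    depth, height, width = len(t1), len(t1[0]), len(t1[0][0])
    if not (1 <= z < depth - 1 and 1 <= y < height - 1 and 1 <= x < width - 1):
        raise ValueError("voxel must not lie on the volume border")
    rows = []
    for dz, dy, dx in product((-1, 0, 1), repeat=3):
        # One row per neighbor: the three intensities at the same site.
        rows.append([vol[z + dz][y + dy][x + dx] for vol in (t1, t2, pd)])
    return rows


def make_volume(offset):
    """Toy 3x3x3 volume whose intensity encodes position plus a channel offset."""
    return [[[offset + z * 9 + y * 3 + x for x in range(3)]
             for y in range(3)] for z in range(3)]


t1, t2, pd = make_volume(0), make_volume(100), make_volume(200)
features = multispectral_input(t1, t2, pd, 1, 1, 1)
```

Each row of `features` is the three-intensity vector that would be passed to one state of the mixture model.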

    References

    [1] W. Grimson, G. Ettinger, T. Kapur, M. Leventon, W. Wells, R. Kikinis,
    Utilizing segmented MRI data in image-guided surgery, International
    Journal of Pattern Recognition and Artificial Intelligence 11 (8) (1998)
    1367–1397.

    [2] S. Warfield, J. Dengler, J. Zaers, C.R.G. Guttmann, W.M. Wells,
    G.J. Ettinger, J. Hiller, R. Kikinis, Automatic identification of grey matter
    structures from MRI to improve the segmentation of white matter lesions,
    Journal of Image Guided Surgery 1 (6) (1996) 326–338.

    [3] E. Grimson, M. Leventon, G. Ettinger, A. Chabrerie, S. Nakajima,
    F. Ozlen, H. Atsumi, R. Kikinis, P. Black, Clinical Experience with a
    High Precision Image-Guided Neurosurgery System, MICCAI, Springer,
    Berlin, 1998, pp. 63–73.

    [4] L.R. Rabiner, A tutorial on hidden Markov models and selected
    applications in speech recognition, Proceedings of the IEEE (1989)
    257–286.


    [5] L. Bahl, P.F. de Souza Brown, P.V., K.L. Mercer, Maximum mutual
    information estimation of hidden Markov parameters for speech
    recognition, Proceedings of the IEEE (1988) 49–52.

    [6] Biing-Hwang Juang, W. Chou, Chin-Hui Lee, Minimum classification
    error rate methods for speech recognition, IEEE Transactions on Speech
    and Audio Processing (1997) 257–265.

    [7] S. Katagiri, Biing-Hwang Juang, Chin-Hui Lee, Pattern recognition
    using a family of design algorithms based upon the generalized
    probabilistic descent method, Proceedings of the IEEE 86 (11) (1998)
    2345–2373.

    [9] Y. Zhang, M. Brady, S. Smith, Segmentation of brain MR images
    through a hidden Markov random field model and the expectation
    maximization algorithm, IEEE Transactions on Medical Imaging 20 (1)
    (2001) 45–57.

    [10] J.C. Rajapakse, J. Piyaratna, Bayesian approach to segmentation of
    statistical parametric maps, IEEE Transactions on Biomedical Engineering
    48 (10) (2001) 1186–1194.

    [11] J.C. Rajapakse, F. Kruggel, Segmentation of MR images with intensity
    inhomogeneities, Image and Vision Computing 16 (1998) 165–180.

    [12] J.C. Rajapakse, J.N. Giedd, J.L. Rapoport, Statistical
    approach to segmentation of single-channel cerebral MR images, IEEE
    Transactions on Medical Imaging 16 (2) (1997) 176–186.

    [13] N.M. John, A three dimensional statistical model for image segmentation
    and its application to MR brain images, PhD thesis, University of Miami,
    1999.

    [14] N.M. John, M. Kabuka, M.O. Ibrahim, Multivariate statistical model for
    3D image segmentation with application to medical images, Journal of
    Digital Imaging 16 (4) (2004) 365–377.

    [15] J.C. Rajapakse, J.N. Giedd, J.L. Rapoport, Statistical approach to
    segmentation of single-channel cerebral MR images, IEEE Transactions
    on Medical Imaging 16 (2) (1997) 176–186.

    [16] K. Van Leemput, F. Maes, D. Vandermeulen, P. Suetens, Automated
    model-based tissue classification of MR images of the brain, IEEE
    Transactions on Medical Imaging 18 (10) (1999) 897–908.

    [17] S.Z. Li, Markov random field models in computer vision, in: Proceedings
    of the European Conference on Computer Vision, Stockholm, Sweden,
    1994, pp. 361–370.

    [18] J. Besag, On the statistical analysis of dirty pictures, Journal of the Royal
    Statistical Society, Series B 48 (3) (1986) 259–302.

    [19] K. Held, E.R. Kops, B.J. Krause, W.M. Wells III, R. Kikinis,
    H.-W. Muller-Gartner, Markov random field segmentation of brain MR
    images, IEEE Transactions on Medical Imaging 16 (6) (1997) 878–886.

    [20] X. Descombes, F. Kruggel, D.Y. von Cramon, Spatio-temporal fMRI
    analysis using Markov random fields, IEEE Transactions on Medical
    Imaging 17 (6) (1998) 1028–1039.

    [21] S. Ruan, C. Jaggi, J. Xue, J. Fadili, D. Bloyet, Brain tissue classification of
    magnetic resonance images using partial volume modeling, IEEE
    Transactions on Medical Imaging 19 (12) (2000) 1179–1187.

    [22] K. Van Leemput, F. Maes, D. Vandermeulen, P. Suetens, A unifying
    framework for partial volume segmentation of brain MR images, IEEE
    Transactions on Medical Imaging 22 (1) (2003) 105–113.

    [23] W.M. Wells, E.L. Grimson, R. Kikinis, F.A. Jolesz, Adaptive
    segmentation of MRI data, IEEE Transactions on Medical Imaging 15
    (8) (1996) 429–442.

    [24] R. Guillemaud, J.M. Brady, Estimating the bias field of MR images, IEEE
    Transactions on Medical Imaging 16 (6) (1997) 238–251.

    [25] J.L. Marroquin, B.C. Vemuri, S. Botello, F. Calderon, A. Fernandez-Bouzas,
    An accurate and efficient Bayesian method for automatic
    segmentation of brain MRI, IEEE Transactions on Medical Imaging 21
    (8) (2002) 934–945.

    [26] B. Moretti, L.M. Fadili, S. Ruan, N. Bloyet, B. Mazoyer, Phantom-based
    performance evaluation: application to brain segmentation from magnetic
    resonance images, Medical Image Analysis 4 (4) (2000) 303–316.

    [27] A. Zavaljevski, A.P. Dhawan, M. Gaskil, W. Ball, J.D. Johnson,
    Multi-level adaptive segmentation of multi-parameter MR brain images,
    Computerized Medical Imaging and Graphics 24 (2) (2000) 87–98.

    [28] Y. Wang, T. Adali, J. Xuan, Z. Szabo, Magnetic resonance image analysis
    by information theoretic criteria and stochastic site models, IEEE
    Transactions on Information Technology in Biomedicine 5 (2) (2001)
    150–158.

    [29] K. Van Leemput, F. Maes, D. Vandermeulen, P. Suetens, A statistical
    framework for partial volume segmentation, Lecture Notes Computer
    Science 2208 (2001) 204–212.

    [31] R. Fjortoft, Y. Delignon, W. Pieczynski, M. Sigelle, F. Tupin,
    Unsupervised classification of radar images using hidden Markov chains
    and hidden Markov random fields, IEEE Transactions of Geoscience and
    Remote Sensing 41 (3) (2003) 675–685.

    [32] S. Derrode, W. Pieczynski, Signal and image segmentation using pairwise
    Markov chains, IEEE Transactions on Signal Processing 52 (9) (2004)
    2477–2489.

    [33] N. Giordana, W. Pieczynski, Estimation of generalized multisensor
    hidden Markov chains and unsupervised image segmentation, IEEE
    Transactions on Pattern Analysis and Machine Intelligence 19 (5) (1997)
    465–475.
