A novel Gaussian mixture model for superpixel
segmentation
Zhihua Ban, Jianguo Liu∗, Li Cao
National Key Laboratory of Science and Technology on Multi-spectral Information Processing, School of Automation, Huazhong University of Science and Technology, Wuhan, Hubei Province 430074, China
Abstract
Superpixel segmentation is used to partition an image into perceptually coherent atomic regions. As a preprocessing step for computer vision applications, it can greatly reduce the number of input entries for subsequent algorithms. Associating each superpixel with a Gaussian distribution, we assume that a pixel is generated by first randomly choosing one of the superpixels and then drawing the pixel from the corresponding Gaussian density. Unlike most applications of Gaussian mixture models in clustering, the data points in our model are assumed to be non-identically distributed. Given an image, a log-likelihood function is constructed for maximization. Based on a solution derived via the expectation-maximization method, a carefully designed algorithm is proposed. Our method has linear complexity with respect to the number of pixels, and it can be implemented using parallel techniques.
To the best of our knowledge, our algorithm outperforms the state-of-the-art in accuracy and presents a competitive performance in computational efficiency.

∗Corresponding author. Tel.: +862787558912; fax: +862787543130.
Email addresses: sawpara@126.com (Zhihua Ban), jgliu@ieee.org (Jianguo Liu)
Preprint submitted to Pattern Recognition, December 30, 2016
arXiv:1612.08792v1 [cs.CV] 28 Dec 2016
Keywords: Expectation-maximization, Gaussian mixture model, parallel
algorithm, superpixel segmentation
1. Introduction
Partitioning an image into superpixels can be used as a preprocessing step for complex computer vision tasks, such as segmentation [1], visual tracking [2], stereo matching [3], edge detection [4], etc. Sophisticated algorithms benefit from working with superpixels, instead of just pixels, because superpixels reduce the number of input entries and enable feature computation on more meaningful regions.
Like many terms in computer vision, there is no rigorous mathematical definition of a superpixel. The commonly accepted description of a superpixel is "a group of connected, perceptually homogeneous pixels which does not overlap any other superpixel." For superpixel segmentation, the
following properties are generally desirable.
Prop. 1. Accuracy. Superpixels should adhere well to object boundaries. Superpixels that cross object boundaries arbitrarily may lead to bad or even catastrophic results for subsequent algorithms [5, 6, 7, 8].
Prop. 2. Regularity. The shape of superpixels should be regular. Superpixels with regular shapes make it easier to construct a graph for subsequent algorithms. Moreover, such superpixels are visually pleasant, which aids algorithm designers' analysis [9, 10, 11].
Prop. 3. Similar size. Superpixels should have similar sizes. This property enables subsequent algorithms to deal with each superpixel without bias [12, 13, 14]. Since pixels have the same "size" and the term "superpixel" originates from "pixel", this property is also intuitively reasonable. It is a key property distinguishing superpixels from other over-segmented regions.
Prop. 4. Efficiency. A superpixel algorithm should have low complexity. Extracting superpixels efficiently is critical for real-time applications [12, 6].
Under the constraint of Prop. 3, the requirements on accuracy and regularity are to a certain extent in opposition. Intuitively, if a superpixel of limited size needs to adhere well to object boundaries, it has to adjust its shape to the object, which may be irregular. To the best of our knowledge, state-of-the-art superpixel algorithms have failed to find a compromise between regularity and accuracy. As shown for four typical algorithms in Fig. 1(b)-1(e), the shapes of superpixels generated by NC [15, 16] (Fig. 1(b)) and LRW [10] (Fig. 1(c)) are more regular than those of superpixels extracted by SEEDS [6] (Fig. 1(d)) and ERS [7] (Fig. 1(e)). Nonetheless, the superpixels generated by SEEDS [6] and ERS [7] adhere to object boundaries better than those of NC [15] and LRW [10]. In this work, a Gaussian mixture model (GMM) and an algorithm derived from expectation-maximization (EM) [17] are built. It is shown that the proposed method can strike a balance between regularity and accuracy. An example is displayed in Fig. 1(a): the compromise is that superpixels in regions with complex textures take irregular shapes to adhere to object boundaries, while in homogeneous regions the superpixels remain regular.
Computational efficiency is a matter of both algorithmic complexity and implementation. Our algorithm has linear complexity with respect to the number of pixels. Since any algorithm has to read all pixels, linear time is theoretically the best possible complexity for the superpixel problem. Algorithms can be categorized into two major groups: parallel algorithms, which can be implemented with parallel techniques and scale with the number of parallel processing units, and serial algorithms, whose implementations are usually executed sequentially so that only part of the resources of a parallel computer can be used. Modern computer architectures are parallel, and applications can benefit from parallel algorithms because parallel implementations generally run faster than serial implementations of the same algorithm. The proposed algorithm is inherently parallel, and our serial implementation easily achieves speedups by adding a few simple OpenMP directives.
Our method is constructed by modelling each pixel with a Gaussian mixture model, associating each superpixel with one of the Gaussian densities, and solving the proposed model with the expectation-maximization algorithm. Differing from the assumption, common in clustering applications, that data points are independent and identically distributed (i.i.d.), pixels are assumed to be independent but non-identically distributed in our model. The proposed approach was tested on the Berkeley Segmentation Data Set and Benchmarks 500 (BSDS500) [18]. To the best of our knowledge, the proposed method outperforms state-of-the-art methods in accuracy and presents a competitive performance in computational efficiency.
The rest of this paper is organized as follows. Section 2 presents an overview of related work on superpixel segmentation. Section 3 introduces the model, solution, algorithm, parallel potential, parameters, and complexity of the proposed method. Experiments are discussed in section 4. Finally, the paper is concluded in section 5.

Figure 1: Superpixel segmentations by five algorithms: (a) our method, (b) NC [15], (c) LRW [10], (d) SEEDS [6], and (e) ERS [7]. Each segmentation has approximately 200 superpixels. The second row zooms in on the regions of interest defined by the black boxes in the first row. In the third row, superpixel boundaries are drawn on purely black images to highlight the shapes of the superpixels.
2. Related works
The concept of the superpixel was first introduced by Xiaofeng Ren and Jitendra Malik in 2003 [19]. Since then, the superpixel problem has been well studied. Existing superpixel algorithms extract superpixels either by optimizing superpixel boundaries, such as finding paths and evolving curves, or by grouping pixels, e.g. the most well-known SLIC [12].
2.1. Optimizing boundaries
Algorithms that extract superpixels not by labelling pixels directly but by marking superpixel boundaries, or by updating only the labels of pixels on superpixel boundaries, fall into this category.
Rohkohl et al. present a superpixel method that iteratively assigns superpixel boundaries to their most similar neighbouring superpixels [20]. A superpixel is represented by a group of pixels randomly selected from that superpixel. The similarity between a pixel and a superpixel is defined as the average of the similarities from the pixel to all the selected representatives.
Aiming to extract lattice-like superpixels, or “superpixel lattices”, [11]
partitions an image into superpixels by gradually adding horizontal and ver-
tical paths in strips of a pre-computed boundary map. The paths are formed
by two different methods: s-t min-cut and dynamic programming. The for-
mer finds paths by graph cuts and the latter constructs paths directly. The
paths are designed so that parallel paths do not cross and perpendicular paths cross only once. The idea of modelling superpixel boundaries as paths (or seam carving [21]) and the use of dynamic programming were borrowed by later variations and improvements [22, 23, 24, 25, 26, 27].
In TurboPixels [14], Levinshtein et al. model the boundary of each super-
pixel as a closed curve. So, the connectivity is naturally guaranteed. Based
on level-set evolution, the curves gradually sweep over the unlabelled pix-
els to form superpixels under the constraints of two velocities. Although
this method can produce superpixels with homogeneous size and shape, its accuracy is relatively poor.
In VCells [5], a superpixel is represented as a mean vector of colour of
pixels in that superpixel. With the designed distance [5], VCells iteratively
updates superpixel boundaries to their nearest neighbouring superpixel. The
iteration stops when no more pixels need to be updated.
SEEDS [28, 6] exchanges superpixel boundaries using a hierarchical structure. In the first iteration, the biggest pixel blocks on superpixel boundaries are updated to improve the energy. The blocks become smaller and smaller as the number of iterations increases. The iteration stops after boundary exchanges have been performed at the pixel level.
Improving on SLIC [12], [29, 30] present more complex energies. To minimize their corresponding energies, [29, 30] update boundary pixels instead of assigning a label to every pixel in each iteration. Based on [29], [30] adds connectivity and superpixel size into its energy. For the pixel updating, [29] uses a hierarchical structure like SEEDS [28], while [30] exchanges labels only at the pixel level. Zhu et al. propose a speedup of SLIC [12] that moves only unstable boundary pixels, i.e. those whose labels changed in the previous iteration [22].
In addition, based on pre-computed line segments or edge maps of the input image, [31, 9] align superpixel boundaries to the lines or edges to form superpixels with very regular shapes.
2.2. Grouping pixels
Superpixel algorithms that assign labels to all pixels in each iteration fall into this category.
With an affinity matrix constructed from a boundary cue [32], the algorithm developed in [16, 19], usually abbreviated as NC, uses normalized cuts [15] to extract superpixels. This method produces very regular superpixels, but its time complexity is approximately O(N^{3/2}) [14], where N is the number of pixels, which is expensive for a preprocessing step.
In Quick Shift (QS) [33], the pixel density is estimated with a Parzen window using a Gaussian kernel. A pixel is assigned to the same group as its parent, which is the nearest pixel having a greater density and lying within a specified distance. QS does not guarantee connectivity; in other words, pixels with the same label may not be connected.
Veksler et al. propose an approach that distributes a number of overlapping square patches over the input image and extracts superpixels by finding, for each pixel, a label from the patches that cover it [34]. The expansion algorithm of [35] is adapted to modify pixel labels within fixed-size local regions in each iteration. This method can generate superpixels with regular shapes, and its run-time is proportional to the number of overlapping patches [34]. A similar solution in [36] formulates the superpixel problem as a two-label problem and builds an algorithm that groups pixels into vertical and horizontal bands. Pixels in the same vertical and horizontal groups form a superpixel.
Starting from an empty graph edge set, ERS [7] sequentially adds edges to the set until the desired number of superpixels is reached. At each step, ERS [7] takes the edge that yields the greatest increase of an objective function. The number of generated superpixels is exactly equal to the desired number. This method adheres to object boundaries well, and its accuracy had not been surpassed until the method proposed here.
SLIC [12] is the most well-known superpixel algorithm due to its efficiency and simplicity. In SLIC [12], each pixel corresponds to a five-dimensional vector comprising colour and spatial location, and k-means is employed to cluster those vectors locally, i.e. each pixel is compared only with superpixels that fall within a specified spatial distance and is assigned to the nearest superpixel. Many variations follow the idea of SLIC in order either to decrease its run-time [37, 38, 39] or to improve its accuracy [40, 29]. LSC [8] also uses a k-means method to refine superpixels. Instead of directly using the 5D vectors of SLIC [12], LSC [8] maps them to a feature space and adopts a weighted k-means to extract superpixels. It is the most recent algorithm that achieves accuracy comparable to ERS [7].
Based on the marker-based watershed transform, [13, 37] incorporate spatial constraints into an image gradient in order to produce superpixels with regular shapes and similar sizes. Generally, these methods run relatively fast but adhere to ground-truth boundaries poorly.
LRW [10] groups pixels using an improved random walk algorithm. By using texture features to optimize an initial superpixel map, this method can produce regular superpixels in regions with complex textures. However, it suffers from very slow speed.
Although FH [41], mean shift [42], and watersheds [43] have been referred to as "superpixel" algorithms in the literature, they are not covered in this paper because the sizes of the regions they produce vary enormously. This is mainly because these algorithms do not offer direct control over the size of the segmented regions. The structure-sensitive or content-sensitive superpixels of [44, 45] are also not considered here, as they do not aim to extract regions of similar size (see Prop. 3 in section 1).
A large number of superpixel algorithms have been proposed; however, few works present novel models, and most of the existing energy functions are variations of the objective function of k-means. In our work, we propose a novel model to tackle the superpixel problem. With a comprehensively designed algorithm, the underlying segmentation of the model is well revealed.
3. The method
The proposed method is described in two steps: first we introduce the proposed model, in which pixels and superpixels are associated with each other; then we construct an algorithm to solve this model. The complexity of the proposed algorithm is presented at the end of this section.
3.1. Model
In the proposed model, let i be the pixel index of an input image I with width W and height H in pixels. The total number of pixels in I is denoted N = W · H. For each pixel i in the image pixel set V = {0, 1, ..., N − 1}, (xi, yi) represents its position on the image plane, where xi ∈ {0, 1, ..., W − 1} and yi ∈ {0, 1, ..., H − 1}, and ci represents its intensity or colour. If a colour image is used, ci is a vector; otherwise, ci is a scalar. To represent pixel i, a random variable Zi = (Xi, Yi, Ci)^T with observed value zi = (xi, yi, ci)^T is used. Note that the random variables Zi, for all i ∈ V, are independent but non-identically distributed, as discussed below.
The width vx and height vy of each superpixel should be specified by the user. If the desired number of superpixels K is preferred instead, we obtain vx and vy using equation (1):

$$v_x = v_y = \left\lfloor \sqrt{\frac{W \cdot H}{K}} \right\rfloor. \tag{1}$$

It is encouraged to use the same value for vx and vy, or at least values that do not differ greatly, as we wish the generated superpixels to be roughly square.
Once vx and vy are obtained, the numbers of superpixels nx and ny along the width and the height of I, respectively, are defined by equation (2):

$$n_x = \left\lfloor \frac{W}{v_x} \right\rfloor, \quad n_y = \left\lfloor \frac{H}{v_y} \right\rfloor. \tag{2}$$

For simplicity of discussion, we assume that W mod vx = 0 and H mod vy = 0. Therefore, the initial number of superpixels K becomes nx · ny.
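Equations (1) and (2) can be sketched as follows (a minimal illustration of ours, not the authors' code):

```python
import math

def superpixel_grid(W, H, K):
    """Compute the seed spacing (eq. 1) and grid dimensions (eq. 2)."""
    # eq. (1): common width/height of a superpixel
    v = int(math.floor(math.sqrt(W * H / K)))
    vx = vy = v
    # eq. (2): number of superpixels along each image axis
    nx = W // vx
    ny = H // vy
    return vx, vy, nx, ny

# e.g. a 480x320 image with K = 150 desired superpixels
vx, vy, nx, ny = superpixel_grid(480, 320, 150)
```

For this example the spacing is 32 pixels and the grid is 15 × 10, so the initial superpixel count nx · ny recovers K exactly when W and H are divisible by the spacing.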
For each individual pixel i, there are two initial superpixel indices, Li,x and Li,y, defined in equation (3):

$$L_{i,x} = \left\lfloor \frac{x_i}{v_x} \right\rfloor, \quad L_{i,y} = \left\lfloor \frac{y_i}{v_y} \right\rfloor. \tag{3}$$
Based on equations (2) and (3), it can be inferred that 0 ≤ Li,x ≤ nx − 1 and 0 ≤ Li,y ≤ ny − 1. Li denotes the random latent variable for pixel i. The possible values of Li lie in a set Ki expressed in equation (4):

$$\mathcal{K}_i = \left\{\, k \mid k = (L_{i,x} + a_x) + (L_{i,y} + a_y)\, n_x,\ k \ge 0 \,\right\}, \tag{4}$$

where ax ∈ {−tx, 1 − tx, ..., tx}, ay ∈ {−ty, 1 − ty, ..., ty}, and tx and ty are positive integers, such as 1. Obviously, Ki is a subset of K = {0, 1, ..., K − 1}.
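Equations (3) and (4) can be sketched per pixel as follows (our illustration; clipping the grid coordinates to the valid range on all four sides is our reading of the constraint that labels stay in K, which the paper writes as k ≥ 0):

```python
def candidate_superpixels(xi, yi, vx, vy, nx, ny, tx=1, ty=1):
    """Return the set K_i of superpixel labels that pixel i may take."""
    # eq. (3): initial superpixel indices of pixel i
    Lx = xi // vx
    Ly = yi // vy
    # eq. (4): labels of the (2*tx+1) x (2*ty+1) neighbouring superpixels
    K_i = set()
    for ax in range(-tx, tx + 1):
        for ay in range(-ty, ty + 1):
            gx, gy = Lx + ax, Ly + ay
            if 0 <= gx < nx and 0 <= gy < ny:   # stay inside the grid
                K_i.add(gx + gy * nx)
    return K_i
```

With tx = ty = 1, an interior pixel has nine candidate superpixels, while a corner pixel has four.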
We assume that a pixel is generated by first randomly choosing one of the Gaussian densities with the same probability 1/K, and then being sampled from the selected Gaussian distribution. With the definitions and notations above, pixel i is described by a mixture of Gaussians as defined in equation (5):

$$p(Z_i = z_i) = \sum_{k \in \mathcal{K}_i} \Pr(L_i = k)\, p(z_i \mid L_i = k; \mu_k, \Sigma_k), \quad i \in \mathcal{V}, \tag{5}$$

where Pr(Li = k) is manually set to 1/K, k ∈ {0, 1, ..., K − 1}. Although this setting implies that $\int_{z_i} p(Z_i = z_i)\, dz_i$ may not equal 1, its effect is removed in our algorithm because the prior distribution of Li takes the same value for every k. For a given k, p(zi | Li = k; µk, Σk) is a Gaussian density function parametrized by a mean vector µk and a covariance matrix Σk, as shown in equation (6):
$$p(z \mid L_i = k; \mu_k, \Sigma_k) = \frac{1}{(2\pi)^{D/2}\sqrt{\det(\Sigma_k)}} \exp\left\{ -\frac{1}{2}(z - \mu_k)^T \Sigma_k^{-1} (z - \mu_k) \right\}, \tag{6}$$

where k ∈ K and D is the number of components in z.
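Equation (6) can be evaluated directly; the following NumPy sketch is our illustration (a real implementation would cache Σk⁻¹ and det(Σk) per superpixel rather than recompute them per pixel):

```python
import numpy as np

def gaussian_density(z, mu, Sigma):
    """Evaluate the D-dimensional Gaussian density of eq. (6) at z."""
    D = z.shape[0]
    diff = z - mu
    # normalization constant (2*pi)^(D/2) * sqrt(det(Sigma))
    norm = (2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma))
    # quadratic form in the exponent
    expo = -0.5 * diff @ np.linalg.inv(Sigma) @ diff
    return np.exp(expo) / norm
```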
Given an image, our model is defined as maximizing equation (7), which extends the log-likelihood function used in many statistical estimation problems:

$$\mathcal{L}(\theta) = \sum_{i \in \mathcal{V}} \ln p(Z_i = z_i) = \sum_{i \in \mathcal{V}} \ln \sum_{k \in \mathcal{K}_i} w_k\, p(z_i \mid L_i = k; \theta_k). \tag{7}$$

In the above equation, θ = (θ0, ..., θK−1) denotes the parameters of the Gaussian densities, where θk = (µk, Σk) and wk = Pr(Li = k).
The label of pixel i is determined by the posterior distribution of Li as shown below:

$$L_i = \arg\max_{k \in \mathcal{K}_i} \Pr(L_i = k \mid Z_i = z_i; \theta_k). \tag{8}$$

The posterior probability of Li can be expressed as

$$\Pr(L_i = k \mid Z_i = z_i) = \frac{p(L_i = k, Z_i = z_i)}{p(Z_i = z_i)} = \frac{\Pr(L_i = k)\, p(Z_i = z_i \mid L_i = k)}{\sum_{k' \in \mathcal{K}_i} \Pr(L_i = k')\, p(Z_i = z_i \mid L_i = k')} = \frac{p(z_i \mid L_i = k; \theta_k)}{\sum_{k' \in \mathcal{K}_i} p(z_i \mid L_i = k'; \theta_{k'})}. \tag{9}$$

Therefore, once we find a solution that maximizes (7), Li can be easily obtained.
3.2. Solution
As Pr(Li = k) is constant, we will use wk to represent it in the following text. According to Jensen's inequality, L(θ) is greater than or equal to Q(R, θ) as shown below:

$$\mathcal{L}(\theta) = \sum_{i \in \mathcal{V}} \ln \sum_{k \in \mathcal{K}_i} R_{i,k}\, \frac{w_k\, p(z_i \mid L_i = k; \theta_k)}{R_{i,k}} \tag{10}$$

$$\ge \sum_{i \in \mathcal{V}} \sum_{k \in \mathcal{K}_i} R_{i,k} \ln \frac{w_k\, p(z_i \mid L_i = k; \theta_k)}{R_{i,k}} = Q(R, \theta), \tag{11}$$

where $R_{i,k} \ge 0$, $\sum_{k \in \mathcal{K}_i} R_{i,k} = 1$, and R = {Ri,k | i ∈ V, k ∈ Ki}. We use the expectation-maximization (EM) algorithm to iteratively maximize Q(R, θ), and thereby approach the maximum of L(θ), by alternating two steps: the expectation step (E-step) and the maximization step (M-step).

E-step: once a guess of θ is given, Q(R, θ) is expected to be tightly attached to L(θ). To this end, R is required to ensure L(θ) = Q(R, θ).
Equation (12) is a sufficient condition for Jensen's inequality to hold with equality:

$$\frac{w_k\, p(z_i \mid L_i = k; \theta_k)}{R_{i,k}} = \alpha, \quad k \in \mathcal{K}_i, \tag{12}$$

where α is a constant. Since $\sum_{k \in \mathcal{K}_i} R_{i,k} = 1$, α can be eliminated and Ri,k can be updated by equation (13):

$$R_{i,k} = \frac{p(z_i \mid L_i = k; \theta_k)}{\sum_{k' \in \mathcal{K}_i} p(z_i \mid L_i = k'; \theta_{k'})}. \tag{13}$$

Notice that equation (13) is exactly the same as equation (9). Therefore, equation (8) can be rewritten as

$$L_i = \arg\max_{k \in \mathcal{K}_i} R_{i,k}. \tag{14}$$
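The E-step of equations (13) and (14) can be sketched per pixel as follows (a minimal illustration of ours: `density` stands in for the Gaussian density of eq. (6), `params` maps each candidate label to its parameters, and the toy 1D density below is only for demonstration):

```python
import math

def e_step(z_i, K_i, params, density):
    """Update responsibilities R_{i,k} (eq. 13) and the label L_i (eq. 14)
    for a single pixel over its candidate superpixels K_i."""
    # unnormalized posteriors: the uniform prior 1/K cancels (see eq. 9)
    p = {k: density(z_i, params[k]) for k in K_i}
    total = sum(p.values())
    R_i = {k: v / total for k, v in p.items()}   # eq. (13)
    L_i = max(R_i, key=R_i.get)                  # eq. (14)
    return R_i, L_i

# toy 1D density standing in for eq. (6)
dens = lambda z, mu: math.exp(-0.5 * (z - mu) ** 2)
R, L = e_step(0.2, {0, 1}, {0: 0.0, 1: 5.0}, dens)
```

Here the pixel at 0.2 is far more likely under the superpixel centred at 0, so L = 0 and the responsibilities sum to one as required.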
M-step: in this step, θ is derived by maximizing Q(R, θ) with a given R. To do this, we first take the derivatives of Q(R, θ) with respect to µk and Σk and set them to zero, as in equations (15)-(17). The parameters are then obtained by solving equation (17):

$$\frac{\partial Q(R,\theta)}{\partial \mu_k} = \sum_{i \in \mathcal{I}_k} R_{i,k} \left\{ \Sigma_k^{-1}(z_i - \mu_k) \right\}, \tag{15}$$

$$\frac{\partial Q(R,\theta)}{\partial \Sigma_k} = \sum_{i \in \mathcal{I}_k} R_{i,k} \left\{ \frac{1}{2} \Sigma_k^{-1}(z_i - \mu_k)(z_i - \mu_k)^T \Sigma_k^{-1} - \frac{1}{2} \Sigma_k^{-1} \right\}, \tag{16}$$

$$\frac{\partial Q(R,\theta)}{\partial \mu_k} = 0, \quad \frac{\partial Q(R,\theta)}{\partial \Sigma_k} = 0, \tag{17}$$

$$\mu_k = \frac{\sum_{i \in \mathcal{I}_k} R_{i,k}\, z_i}{\sum_{i \in \mathcal{I}_k} R_{i,k}}, \tag{18}$$

$$\Sigma_k = \frac{\sum_{i \in \mathcal{I}_k} R_{i,k}\, (z_i - \mu_k)(z_i - \mu_k)^T}{\sum_{i \in \mathcal{I}_k} R_{i,k}}, \tag{19}$$

where Ik = {i | k ∈ Ki, i ∈ V} is a subset of V. The update of θ monotonically improves L(θ): $\mathcal{L}(\theta^{(t+1)}) = Q(R^{(t+1)}, \theta^{(t+1)}) \ge Q(R^{(t)}, \theta^{(t+1)}) \ge Q(R^{(t)}, \theta^{(t)}) = \mathcal{L}(\theta^{(t)})$.
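Equations (18) and (19) are weighted-mean and weighted-covariance updates; the following NumPy sketch for one superpixel k is our illustration, with hypothetical array layouts:

```python
import numpy as np

def m_step(Z, R, I_k):
    """Update mu_k (eq. 18) and Sigma_k (eq. 19) for one superpixel k.

    Z   : (N, D) array of pixel feature vectors z_i
    R   : (N,) array holding R_{i,k} for this k
    I_k : indices of the pixels whose candidate set contains k
    """
    w = R[I_k]                                    # responsibilities R_{i,k}
    zs = Z[I_k]                                   # vectors z_i, i in I_k
    mu = (w[:, None] * zs).sum(axis=0) / w.sum()  # eq. (18)
    diff = zs - mu
    # eq. (19): responsibility-weighted sum of outer products
    outer = diff[:, :, None] * diff[:, None, :]
    Sigma = (w[:, None, None] * outer).sum(axis=0) / w.sum()
    return mu, Sigma
```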
3.3. The algorithm
In this section, we discuss the choice of covariance matrices and the tricks that make the algorithm run well in practice.

Note that although the solution in section 3.2 supports full covariance matrices, i.e. covariance matrices with all elements free as in equation (19), only block diagonal matrices are used in this paper (see equation (20)). This is done for three reasons. First, computing block diagonal matrices is more efficient than computing full matrices. Second, there is generally no strong relation between the spatial coordinates (xi, yi) and the intensity or colour ci, so it is reasonable to treat them separately. Third, full matrices do not bring better performance and even give bad results for colour images. For each colour space, it is encouraged to split components without strong mutual relations into different covariance matrices. For example, if CIELAB is adopted, it is better to put the colour-opponent dimensions a and b into a 2-by-2 covariance matrix; in this case, (20) becomes (21). However, we will keep using (20) to discuss our algorithm for simplicity.
$$\Sigma_k = \begin{bmatrix} \Sigma_{k,s} & 0 \\ 0 & \Sigma_{k,c} \end{bmatrix}, \tag{20}$$

$$\Sigma_k = \begin{bmatrix} \Sigma_{k,s} & 0 & 0 \\ 0 & \sigma_{k,l}^2 & 0 \\ 0 & 0 & \Sigma_{k,(a,b)} \end{bmatrix}, \tag{21}$$

where Σk,s and Σk,c represent the spatial covariance matrix and the colour covariance matrix, respectively. The covariance matrices are updated according to equations (22) and (23), which are derived by replacing Σk in equation (16) with the block diagonal matrices and then solving (17):
$$\Sigma_{k,s} = \frac{\sum_{i \in \mathcal{I}_k} R_{i,k}\, (z_{i,s} - \mu_{k,s})(z_{i,s} - \mu_{k,s})^T}{\sum_{i \in \mathcal{I}_k} R_{i,k}}, \tag{22}$$

$$\Sigma_{k,c} = \frac{\sum_{i \in \mathcal{I}_k} R_{i,k}\, (z_{i,c} - \mu_{k,c})(z_{i,c} - \mu_{k,c})^T}{\sum_{i \in \mathcal{I}_k} R_{i,k}}, \tag{23}$$

where zi,s and µk,s are the spatial components of zi and µk, and zi,c and µk,c are, for a grayscale image, the intensity components, or, for a colour image, the colour components of zi and µk.
Since Σk,s and Σk,c are only guaranteed to be positive semi-definite in practice, they may sometimes not be invertible. To avoid this trouble, we first compute the eigendecompositions of the two covariance matrices as shown in equations (24) and (25), then modify the eigenvalues on the main diagonals of Λk,s and Λk,c using equations (26) and (27), and finally reconstruct Σk,s and Σk,c with equations (28) and (29):

$$\Sigma_{k,s} = Q_{k,s}\, \Lambda_{k,s}\, Q_{k,s}^{-1}, \tag{24}$$

$$\Sigma_{k,c} = Q_{k,c}\, \Lambda_{k,c}\, Q_{k,c}^{-1}, \tag{25}$$

where Λk,s and Λk,c are diagonal matrices with the eigenvalues on their respective main diagonals. λk,s(js) and, for a colour image, λk,c(jc) denote the respective eigenvalues, with js ∈ {0, 1} and jc ∈ {0, 1, 2}. Qk,s and Qk,c are orthogonal matrices. If the input image is grayscale, Qk,c = 1, Σk,c and Λk,c are scalars, and jc reduces to 0.
$$\hat{\lambda}_{k,s}(j_s) = \begin{cases} \lambda_{k,s}(j_s) & \text{if } \lambda_{k,s}(j_s) \ge \varepsilon_s, \\ \varepsilon_s & \text{otherwise}, \end{cases} \tag{26}$$

$$\hat{\lambda}_{k,c}(j_c) = \begin{cases} \lambda_{k,c}(j_c) & \text{if } \lambda_{k,c}(j_c) \ge \varepsilon_c, \\ \varepsilon_c & \text{otherwise}, \end{cases} \tag{27}$$

where εs and εc are two constants.

$$\Sigma_{k,s} = Q_{k,s}\, \hat{\Lambda}_{k,s}\, Q_{k,s}^{-1}, \tag{28}$$

$$\Sigma_{k,c} = Q_{k,c}\, \hat{\Lambda}_{k,c}\, Q_{k,c}^{-1}, \tag{29}$$

where $\hat{\Lambda}_{k,s}$ and $\hat{\Lambda}_{k,c}$ are diagonal matrices with the modified eigenvalues $\hat{\lambda}_{k,s}(j_s)$ and $\hat{\lambda}_{k,c}(j_c)$ on their respective main diagonals.
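The eigenvalue-clipping procedure of equations (24)-(29) can be sketched as follows (our illustration; `numpy.linalg.eigh` returns an orthonormal Q for a symmetric matrix, so Q⁻¹ = Qᵀ):

```python
import numpy as np

def regularize_covariance(Sigma, eps):
    """Floor small eigenvalues (eqs. 24-29) so Sigma is safely invertible."""
    # eqs. (24)-(25): eigendecomposition of the symmetric covariance
    lam, Q = np.linalg.eigh(Sigma)
    # eqs. (26)-(27): floor the eigenvalues at eps
    lam = np.maximum(lam, eps)
    # eqs. (28)-(29): reconstruct with the modified eigenvalues
    return Q @ np.diag(lam) @ Q.T
```

A rank-deficient covariance such as diag(0, 4) with ε = 2 becomes diag(2, 4), which is invertible while preserving the well-conditioned direction.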
After initializing θ, equations (13), (18), (28), and (29) are iterated until
convergence. Once the iteration stops, the superpixel label Li can be obtained
using equation (14).
As the connectivity of superpixels cannot be guaranteed, a postprocessing step is required. This is done by sorting the isolated superpixels in ascending order of size and sequentially merging each small isolated superpixel, i.e. one smaller than one fourth of the desired superpixel size, into its nearest neighbouring superpixel, with only intensity or colour taken into account. Once an isolated superpixel (the source) is merged into another superpixel (the destination), the size of the source is cleared to zero and the size of the destination is updated by adding the size of the source. This size-updating trick prevents the sizes of the produced superpixels from varying significantly.
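The merging step above can be sketched as follows (a simplified illustration of ours: scalar colours, a hypothetical neighbour list per superpixel, and no re-routing when a destination has itself been merged):

```python
def merge_small_superpixels(sizes, colours, neighbours, min_size):
    """Merge each isolated superpixel smaller than min_size into its
    colour-nearest neighbour, applying the size-updating trick."""
    # process isolated superpixels in ascending order of size
    order = sorted(range(len(sizes)), key=lambda k: sizes[k])
    merged_into = {}
    for k in order:
        if sizes[k] == 0 or sizes[k] >= min_size:
            continue
        # nearest neighbouring superpixel by colour distance only
        dest = min(neighbours[k], key=lambda j: abs(colours[j] - colours[k]))
        sizes[dest] += sizes[k]   # destination grows by the source size
        sizes[k] = 0              # source is cleared
        merged_into[k] = dest
    return merged_into
```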
As a preprocessing step, a superpixel algorithm should run as fast as possible. Since in SLIC [12] and LSC [8] iterating a fixed number of times is sufficient for most images without checking convergence, we borrow this trick and set the number of iterations T as a parameter. The proposed algorithm is summarized in Algorithm 1.
Algorithm 1 The proposed superpixel algorithm
Input: vx and vy, or K
Output: Li, i ∈ V
1: Initialize µk, k ∈ K, using K seed pixels placed uniformly over the input image at fixed horizontal and vertical intervals vx and vy.
2: Initialize Σk,s and Σk,c.
3: Compute Li,x and Li,y using equation (3).
4: Calculate R using equation (13); set t = 0.
5: while t < T do
6:   Compute µk using equation (18).
7:   Compute Σk,s and Σk,c using equations (28) and (29).
8:   Update Ri,k using equation (13).
9:   t = t + 1.
10: end while
11: Determine Li by equation (14).
12: Merge small superpixels into their nearest neighbours.
3.4. Parallel potential
As the clock frequency of a single processor is difficult to increase, modern processors are designed with parallel architectures. If an algorithm can be implemented with parallel techniques and scales with the number of parallel processing units, its computational efficiency will be significantly improved. Fortunately, the most expensive parts of our algorithm, namely computing R and θ, can be executed in parallel. Each Ri,k is computed independently, and so are µk and Σk. In our experiments, we will show that our implementation easily achieves speedups on multi-core CPUs.
3.5. Parameters
In addition to the parameters left to users (i.e. vx and vy, or K), tx, ty, T, εs, εc, and the initialization of θ must be assigned before starting the proposed algorithm.

tx and ty control the size of the overlapping regions of neighbouring superpixels. We set both to 1 for all the results in this paper. With a larger tx or ty, the run-time increases considerably but the accuracy does not improve satisfactorily. In general, a larger T gives better performance but, again, sacrifices efficiency. We have found that T = 10 is enough for most images. In most state-of-the-art algorithms, the size of the overlapping region is not exposed as a parameter; we leave these values free so that users can customize the algorithm.
Unlike tx, ty, and T, different values of εs, εc, and the initialization of θ do not change the run-time but do affect the accuracy. Although εs and εc were originally introduced to prevent the covariance matrices from being singular, they also weigh the relative importance of spatial proximity and colour similarity. For instance, a larger εc produces more regular superpixels, and the opposite holds for a smaller εc. As the effects of εc and εs are opposite to each other, we set εs = 2 and leave εc for detailed discussion in section 4. As we hope the superpixels to be local and regularly positioned on the image plane, the µk are initialized regularly as presented in Algorithm 1. For Σk,s, we set the main diagonal to (v_x^2, v_y^2) and the other elements to zero, so that neighbouring superpixels overlap well at the beginning. The initialization of Σk,c is less straightforward; the basic idea is to set its main diagonal using a small colour distance within which two pixels are perceptually uniform. The effect of different initializations of Σk,c is discussed in section 4.
3.6. Complexity
The updating of R has a complexity of O(|Ki| · (T + 1) · N), for i ∈ V. According to equation (4), (T + 1) · N ≤ |Ki| · (T + 1) · N < (2tx + 1) · (2ty + 1) · (T + 1) · N, in which T, tx, and ty are constants in our algorithm. Therefore, the complexity of updating R is O(N).

Based on equations (18), (22), and (23), the complexity of updating θ is O(T · K · |Ik|), for k ∈ {0, 1, ..., K − 1}. According to equations (3) and (4), we have vx · vy ≤ |Ik| ≤ (2tx + 1) · (2ty + 1) · vx · vy. As a result, T · K · vx · vy ≤ T · K · |Ik| ≤ T · (2tx + 1) · (2ty + 1) · K · vx · vy, which means T · N ≤ T · K · |Ik| ≤ T · (2tx + 1) · (2ty + 1) · N. Therefore, the updating of the Gaussian parameters has a complexity of O(N).
In the worst case, the sorting procedure in the postprocessing step requires O(m²) operations, where m is the number of isolated superpixels. The merging step needs O(m · n) operations, where m is the number of small isolated superpixels and n is the average number of their adjacent neighbours. In practice, m² + m · n ≪ T · N, so the operations required for the postprocessing step can be ignored. Therefore, the proposed superpixel algorithm has linear complexity O(N).

Figure 2: Different initializations for Σk,c. Experiments are performed on BSDS500 and results are averaged over 500 images. The results of BR, UE, and ASA are plotted in (a), (b), and (c), respectively. Part of each plot is zoomed in to show more detail. (Better viewed in colour.)
4. Experiment
In this section, algorithms are evaluated in terms of accuracy, compu-
tational efficiency, and visual effects. Like many state-of-the-art superpixel
algorithms, we also use CIELAB colour space for our experiments because it
is perceptually uniform for small colour distance.
Accuracy: three commonly used metrics are adopted: boundary recall (BR), under-segmentation error (UE), and achievable segmentation accuracy (ASA). To assess the performance of the selected algorithms, experiments are conducted on the Berkeley Segmentation Data Set and Benchmarks 500 (BSDS500), which is an extension of BSDS300. These two data sets have been widely used in superpixel research. BSDS500 contains 500 images, each of size 481×321 or 321×481, with at least four ground-truth human annotations.
• BR measures the percentage of ground-truth boundaries correctly recovered by the superpixel boundary pixels. A true boundary pixel is considered correctly recovered if it falls within two pixels of at least one superpixel boundary. A high BR indicates that very few true boundaries are missed.
• A superpixel should not cross a ground-truth boundary, or, in other words, it should not cover more than one object. To quantify this notion, UE accumulates the sizes of superpixels that "leak" across the boundaries of the objects they cover, as shown in equation (30):

$$\mathrm{UE} = -1 + \frac{1}{N} \sum_{|s_k \cap s_g| > \varepsilon |s_k|} |s_k|, \tag{30}$$

where sk and sg are the pixel sets of superpixel k and ground-truth segment g, and the sum runs over all (k, g) pairs satisfying the overlap condition. ε = 0.05 is generally accepted.
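A direct reading of equation (30) can be sketched as follows (our illustration; `labels` and `gt` are integer label maps of equal size):

```python
import numpy as np

def undersegmentation_error(labels, gt, eps=0.05):
    """UE of eq. (30): accumulate |s_k| for every (superpixel, segment)
    pair whose overlap exceeds eps*|s_k|, divide by N, subtract 1."""
    N = labels.size
    total = 0
    for k in np.unique(labels):
        s_k = labels == k
        size_k = int(s_k.sum())
        for g in np.unique(gt[s_k]):
            overlap = int(np.logical_and(s_k, gt == g).sum())
            if overlap > eps * size_k:
                total += size_k
    return total / N - 1.0
```

A perfect segmentation scores 0, since each superpixel then contributes its size exactly once; every leaking superpixel is counted once per segment it significantly overlaps, inflating the score.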
• If we assign every superpixel the label of the ground-truth segment into which most of its pixels fall, how much segmentation accuracy can we achieve, or how many pixels are correctly segmented? ASA is designed to answer this question. Its formula is defined in equation (31), in which G is the set of ground-truth segments:

$$\mathrm{ASA} = \frac{1}{N} \sum_{k \in \mathcal{K}} \max\left\{ |s_k \cap s_g| \;\middle|\; g \in G \right\}. \tag{31}$$

Figure 3: Visual results with (a) λ = 2, (b) λ = 4, (c) λ = 6, (d) λ = 8, and (e) λ = 10. The test image is from BSDS500 and approximately 400 superpixels are extracted in each image.
Computational efficiency: execution time is used to quantify this property.
4.1. Effect of parameters
As mentioned in Section 3.4, this section discusses the effect of εc and
of the initialization of Σk,c.
Σk,c are initialized to diagonal matrices with the same λ2 on their main
diagonals. As shown in Fig. 2, no obvious regularity emerges: the
maximum difference between any two curves is around 0.001∼ 0.006, which is very
small. Although a small λ seems to lead to a slightly better BR result, this does
not hold for UE and ASA; for instance, in the enlarged region of Fig. 2b, the
result for λ = 10 is slightly better than that for λ = 6. Visual results with different
λ are plotted in Fig. 3; it is hard for a human observer to distinguish the five
results.
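The initialization just described amounts to Σk,c ← λ²I. The following is a hypothetical NumPy sketch (the function name is ours, not the authors' implementation):

```python
import numpy as np

def init_covariances(K, d, lam=8.0):
    """Initialise K covariance matrices to lam^2 * I_d, i.e. diagonal
    matrices with the same lam^2 on their main diagonals (Section 4.1)."""
    return np.tile((lam ** 2) * np.eye(d), (K, 1, 1))
```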
εc can be used to control the regularity of the generated superpixels in
each iteration. As shown in Fig. 4, small differences in εc do not produce
obvious variation in UE and ASA, but they do affect the BR results. In
general, a larger εc leads to more regular superpixels; conversely, superpixels
generated with a smaller εc have relatively irregular shapes (see Fig. 5).
Figure 4: Results with different εc ∈ {2, 4, 6, 8, 10}. Experiments are performed on
BSDS500 and results are averaged over 500 images. The results of BR, UE, and ASA are
plotted against the number of superpixels in (a), (b), and (c), respectively; part of each
plot is zoomed in to show more detail. (Best viewed in colour.)
Figure 5: Visual results with (a) εc = 2, (b) εc = 4, (c) εc = 6, (d) εc = 8, and (e) εc = 10.
The test image is from BSDS500 and approximately 400 superpixels are extracted in each
image. The second row enlarges the rectangles marked in the first row.
Because superpixels with irregular shapes produce more boundary pixels,
the BR result with a small εc is better than that with a larger εc.
We will use λ = 8 and εc = 8 in the following experiments. Although
this setting does not give the best accuracy, the superpixels it produces
are regular and visually pleasant (see Fig. 5(d)). Moreover, it is sufficient
to outperform the state-of-the-art algorithms, as shown in Fig. 6.
4.2. Parallel scalability
In order to evaluate scalability with respect to the number of processors, we
test our implementation on a machine equipped with an Intel(R) Xeon(R) CPU
E5-2620 v3 @ 2.40GHz and 8 GB RAM. The source code is not optimized for
any specific architecture. Only two OpenMP directives are added, for the
updates of Σk, µk, and R, since these quantities can be computed independently
(see Section 3.4). As listed in Table 1, for a given image, using more cores
yields better performance.
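The independence that those OpenMP directives exploit can be illustrated outside C++ as well. The sketch below (in Python with a thread pool, our own illustration rather than the paper's implementation, with hypothetical function names) recomputes each superpixel's mean and covariance from soft assignments; the K per-superpixel updates write to disjoint outputs, so they can run concurrently.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def update_means_covs(X, R):
    """One weighted M-step. X is the N x d pixel-feature matrix and R the
    N x K soft-assignment (responsibility) matrix. Each superpixel k owns
    an independent update, mirroring the parallel loops of Section 3.4."""
    N, K = R.shape

    def update_one(k):
        w = R[:, k]
        s = w.sum()
        mu = (w[:, None] * X).sum(axis=0) / s         # weighted mean
        diff = X - mu
        sigma = (w[:, None] * diff).T @ diff / s      # weighted covariance
        return mu, sigma

    with ThreadPoolExecutor() as pool:                # K independent tasks
        results = list(pool.map(update_one, range(K)))
    return (np.stack([m for m, _ in results]),
            np.stack([c for _, c in results]))
```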
Table 1: Run-time (ms) of our implementation on images of various resolutions.
The program is executed using 1, 2, 4, and 6 cores.
Resolution 1 core 2 cores 4 cores 6 cores
240×320 393.646 303.821 227.078 200.708
320×480 776.586 589.785 400.073 321.548
480×640 1569.74 1011.62 743.629 624.561
640×960 3186.71 2244.12 1353.72 1069.79
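A quick way to read Table 1 is to convert the times into speedups t1/tp. The small script below (our own post-processing of the table, not part of the algorithm) shows that larger images benefit more from the six cores:

```python
# Run-times (ms) from Table 1, indexed by core count 1, 2, 4, 6.
table1 = {
    "240x320": [393.646, 303.821, 227.078, 200.708],
    "320x480": [776.586, 589.785, 400.073, 321.548],
    "480x640": [1569.74, 1011.62, 743.629, 624.561],
    "640x960": [3186.71, 2244.12, 1353.72, 1069.79],
}
for res, (t1, t2, t4, t6) in table1.items():
    print(f"{res}: speedup on 6 cores = {t1 / t6:.2f}x")
```

The speedup grows from roughly 2x at 240×320 to roughly 3x at 640×960, suggesting the parallel fraction dominates more as the per-pixel work increases.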
4.3. Comparison with state-of-the-art algorithms
We compare the proposed algorithm to eight state-of-the-art superpixel
segmentation algorithms including LSC1 [8], SLIC2 [12], SEEDS3 [6], ERS4
1 http://jschenthu.weebly.com/projects.html
2 http://ivrl.epfl.ch/research/superpixels
3 http://www.mvdblive.org/seeds/
4 https://github.com/mingyuliutw/ers
[7], TurboPixels5 [14], LRW6 [10], VCells7 [5], and Waterpixels8 [13]. The re-
sults of the eight algorithms are all generated from implementations provided
by the authors on their respective websites with their default parameters ex-
cept for the desired number of superpixels, which is generally decided by
users.
As shown in Fig. 6, our method outperforms the selected state-of-the-art
algorithms, especially on UE and ASA. It is not easy to distinguish our
result from LSC in Fig. 6(a); however, if we use εc = 2, our result clearly
outperforms LSC, as displayed in Fig. 7.
To compare the run-time of the selected algorithms, we test them on
a desktop machine equipped with an Intel(R) Core(TM) i5-4590 CPU @
3.30GHz and 8 GB RAM. The results are plotted in Fig. 8. According to Fig.
8(b), as the size of the input image increases, the run-time of our algorithm
grows linearly, which experimentally confirms that our algorithm is of linear
complexity.
A visual comparison is displayed in Fig. 9. According to the zoomed
regions, only our algorithm correctly reveals the segmentations; our superpixel
boundaries adhere to objects very well. LSC gives a genuinely competitive result;
however, parts of the objects are still under-segmented. The superpixels
extracted by SEEDS and ERS are very irregular and their sizes vary
tremendously. The remaining five algorithms generate regular superpixels,
but they adhere to object boundaries very poorly.
5 http://www.cs.toronto.edu/ babalex/research.html
6 https://github.com/shenjianbing/lrw14
7 http://www-personal.umich.edu/ jwangumi/software.html
8 http://cmm.ensmp.fr/ machairas/waterpixels.html
Figure 6: Comparison with state-of-the-art algorithms (Ours, LSC, SEEDS, ERS, VCells,
LRW, SLIC, Waterpixels, and TurboPixels). Experiments are performed on BSDS500 and
results are averaged over 500 images. The results of BR, UE, and ASA are plotted against
the number of superpixels in (a), (b), and (c), respectively.
5. Conclusion
This paper presents an efficient superpixel segmentation algorithm built
on a novel Gaussian mixture model. With each superpixel associated with a
Gaussian density, each pixel is assumed to be independently distributed
according to a mixture of these densities. To extract superpixels of similar
size, the Gaussian densities are assumed to occur with equal probability. We
formulate a log-likelihood function to describe the probability of an image.
Based on Jensen's inequality and the expectation-maximization algorithm,
an iterative solution is constructed that approaches a maximum of the
log-likelihood by improving its lower bound. The label of each pixel is set
to the one with the maximum posterior probability. The carefully designed
algorithm also offers the opportunity to control the shape of superpixels.
According to our experiments, different initializations of our method produce
results with only negligible differences. The proposed algorithm
is of linear complexity, which has been established by both theoretical analysis
Figure 7: Comparison of BR between LSC and our method. Without changing the default
values of the other parameters, we use εc = 2 in this figure.
Figure 8: Comparison of run-time against image size (from 229×229 to 999×999 pixels).
Seven algorithms are compared in (a). In order to show more details, the run-time of the
fastest four algorithms (Ours, LSC, SEEDS, and SLIC) is plotted in (b). LRW is not
included in the two figures due to its slow speed.
Figure 9: Visual comparison. Rows from top to bottom: Ours, LSC, SEEDS, ERS, VCells,
LRW, SLIC, Waterpixels, and TurboPixels. The test image is selected from BSDS500.
Each algorithm extracts approximately 200 superpixels. For each segmentation, four parts
(a)-(d) are enlarged to display more details.
and experimental results. Moreover, it can be implemented using parallel
techniques, and its run-time scales with the number of processors. The
comparison with the state-of-the-art algorithms shows that our algorithm
outperforms the selected methods in accuracy and presents a competitive
performance in computational efficiency.
As a contribution to the open-source community, we will make our test code
publicly available at https://github.com/ahban.
References
[1] Z. Li, X.-M. Wu, S.-F. Chang, Segmentation using superpixels: A bi-
partite graph partitioning approach, in: CVPR, 2012, pp. 789–796.
[2] F. Yang, H. Lu, M.-H. Yang, Robust superpixel tracking, IP 23 (4)
(2014) 1639–1651.
[3] F. Cheng, H. Zhang, M. Sun, D. Yuan, Cross-trees, edge and superpixel
priors-based cost aggregation for stereo matching, PR 48 (7) (2015) 2269
– 2278.
[4] X. Sun, K. Shang, D. Ming, J. Tian, J. Ma, A biologically-inspired
framework for contour detection using superpixel-based candidates and
hierarchical visual cues, Sensors 15 (10) (2015) 26654–26674.
[5] J. Wang, X. Wang, VCells: Simple and efficient superpixels using edge-
weighted centroidal voronoi tessellations, PAMI 34 (6) (2012) 1241–1247.
[6] M. Van den Bergh, X. Boix, G. Roig, L. Van Gool, SEEDS: Superpixels
extracted via energy-driven sampling, IJCV 111 (3) (2015) 298–314.
[7] M.-Y. Liu, O. Tuzel, S. Ramalingam, R. Chellappa, Entropy rate super-
pixel segmentation, in: CVPR, 2011, pp. 2097–2104.
[8] Z. Li, J. Chen, Superpixel segmentation using linear spectral clustering,
in: CVPR, 2015, pp. 1356–1363.
[9] L. Duan, F. Lafarge, Image partitioning into convex polygons, in:
CVPR, 2015, pp. 3119–3127.
[10] J. Shen, Y. Du, W. Wang, X. Li, Lazy random walks for superpixel
segmentation, IP 23 (4) (2014) 1451–1462.
[11] A. P. Moore, S. Prince, J. Warrell, U. Mohammed, G. Jones, Superpixel
lattices, in: CVPR, 2008, pp. 1–8.
[12] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, S. Susstrunk, SLIC
superpixels compared to state-of-the-art superpixel methods, PAMI
34 (11) (2012) 2274–2282.
[13] V. Machairas, M. Faessel, D. Cardenas-Pena, T. Chabardes, T. Walter,
E. Decenciere, Waterpixels, IP 24 (11) (2015) 3707–3716.
[14] A. Levinshtein, A. Stere, K. N. Kutulakos, D. J. Fleet, S. J. Dickinson,
K. Siddiqi, TurboPixels: Fast superpixels using geometric flows, PAMI
31 (12) (2009) 2290–2297.
[15] J. Shi, J. Malik, Normalized cuts and image segmentation, PAMI 22 (8)
(2000) 888–905.
[16] G. Mori, Guiding model search using segmentation, in: ICCV, 2005, pp.
1417–1423.
[17] A. P. Dempster, N. M. Laird, D. B. Rubin, Maximum likelihood from
incomplete data via the EM algorithm, Journal of the Royal Statistical
Society, Series B (Methodological) (1977) 1–38.
[18] P. Arbelaez, M. Maire, C. Fowlkes, J. Malik, Contour detection and
hierarchical image segmentation, PAMI 33 (5) (2011) 898–916.
[19] X. Ren, J. Malik, Learning a classification model for segmentation, in:
ICCV, 2003, pp. 10–17.
[20] C. Rohkohl, K. Engel, Efficient image segmentation using pairwise pixel
similarities, in: Joint Pattern Recognition Symposium, 2007, pp. 254–
263.
[21] S. Avidan, A. Shamir, Seam carving for content-aware image resizing,
TOG 26 (3).
[22] S. Zhu, D. Cao, S. Jiang, Y. Wu, P. Hu, Fast superpixel segmentation
by iterative edge refinement, EL 51 (3) (2015) 230–232.
[23] A. P. Moore, S. J. Prince, J. Warrell, “Lattice Cut” - constructing super-
pixels using layer constraints, in: CVPR, 2010, pp. 2117–2124.
[24] H. Fu, X. Cao, D. Tang, Y. Han, D. Xu, Regularity preserved superpixels
and supervoxels, MM 16 (4) (2014) 1165–1175.
[25] D. Tang, H. Fu, X. Cao, Topology preserved regular superpixel, in:
ICME, 2012, pp. 765–768.
[26] P. Siva, A. Wong, Grid seams: A fast superpixel algorithm for real-time
applications, in: CRV, 2014, pp. 127–134.
[27] P. Siva, C. Scharfenberger, I. B. Daya, A. Mishra, A. Wong, Return of
grid seams: A superpixel algorithm using discontinuous multi-functional
energy seam carving, in: ICIP, 2015, pp. 1334–1338.
[28] M. Van den Bergh, X. Boix, G. Roig, B. de Capitani, L. Van Gool,
SEEDS: Superpixels extracted via energy-driven sampling, in: ECCV,
2012, pp. 13–26.
[29] K. Yamaguchi, D. McAllester, R. Urtasun, Efficient joint segmentation,
occlusion labeling, stereo and flow estimation, in: ECCV, 2014, pp.
756–771.
[30] J. Yao, M. Boben, S. Fidler, R. Urtasun, Real-time coarse-to-fine topo-
logically preserving segmentation, in: CVPR, 2015, pp. 2947–2955.
[31] L. Li, J. Yao, J. Tu, X. Lu, K. Li, Y. Liu, Edge-based split-and-merge
superpixel segmentation, in: ICIA, 2015, pp. 970–975.
[32] D. R. Martin, C. C. Fowlkes, J. Malik, Learning to detect natural image
boundaries using local brightness, color, and texture cues, PAMI 26 (5)
(2004) 530–549.
[33] A. Vedaldi, S. Soatto, Quick shift and kernel methods for mode seeking,
in: ECCV, 2008, pp. 705–718.
[34] O. Veksler, Y. Boykov, P. Mehrani, Superpixels and supervoxels in an
energy optimization framework, in: ECCV, 2010, pp. 211–224.
[35] Y. Boykov, O. Veksler, R. Zabih, Fast approximate energy minimization
via graph cuts, PAMI 23 (11) (2001) 1222–1239.
[36] Y. Zhang, R. Hartley, J. Mashford, S. Burn, Superpixels via pseudo-
boolean optimization, in: ICCV, 2011, pp. 1387–1394.
[37] P. Neubert, P. Protzel, Compact watershed and preemptive slic: On
improving trade-offs of superpixel segmentation algorithms., in: ICPR,
2014, pp. 996–1001.
[38] Y. Kesavan, A. Ramanan, One-pass clustering superpixels, in: ICIAfS,
2014, pp. 1–5.
[39] C. Y. Ren, V. A. Prisacariu, I. D. Reid, gSLICr: SLIC superpixels at
over 250Hz, ArXiv e-prints, arXiv:1509.04232.
[40] S. Jia, S. Geng, Y. Gu, J. Yang, P. Shi, Y. Qiao, NSLIC: SLIC super-
pixels based on nonstationarity measure, in: ICIP, 2015, pp. 4738–4742.
[41] P. Felzenszwalb, D. Huttenlocher, Efficient graph-based image segmen-
tation, IJCV 59 (2) (2004) 167–181.
[42] D. Comaniciu, P. Meer, Mean shift: A robust approach toward feature
space analysis, PAMI 24 (5) (2002) 603–619.
[43] L. Vincent, P. Soille, Watersheds in digital spaces: an efficient algorithm
based on immersion simulations, PAMI 13 (6) (1991) 583–598.
[44] Y.-J. Liu, C.-C. Yu, M.-J. Yu, Y. He, Manifold slic: A fast method to
compute content-sensitive superpixels, in: CVPR, 2016, pp. 651–659.
[45] P. Wang, G. Zeng, R. Gan, J. Wang, H. Zha, Structure-sensitive super-
pixels via geodesic distance, IJCV 103 (1) (2013) 1–21.