
Epitome Representation of Images and Notes on Epitome-based Silhouette Generation

Shinko Y. Cheng
Electrical and Computer Engineering
University of California, San Diego

La Jolla CA 92037

Abstract

Describing the appearance of images is a task common to many computer vision applications. A novel appearance and shape model called the epitome was recently reported and has drawn some attention in academia. Epitomes are condensed miniatures of images composed of an arrangement of patches from the original image. The result is a compact representation of the textural and shape essence of an image. This arrangement of patches is found using a variant of the EM algorithm called "variational EM". The epitome was introduced in the context of generative image models and of structuring the image as layers of various constitutive elements.

The purpose of this paper is to describe in some detail epitome construction, the use of variational EM and Bayesian networks in epitome estimation, and a layered generative model of images using epitomes. The intended audience of this paper is the student who is interested in epitomic image representation and has some knowledge of the EM algorithm and graphical models.

1. Introduction

Much research has been done on finding an appearance model that is discerning enough to distinguish one image from the next, yet flexible enough to categorize the image as having a particular kind of content. Jojic et al. present epitomes, a novel summarization of an image generated from multiple patches of that image combined into a single miniature image [1]. When used within a hierarchical generative model of an image, that model can produce promising image segmentations by virtue of separating regions of the image that contain particular constituent patches from regions that contain other constituent patches.

This paper will describe epitomes, the estimation of epitomes, and their significance as an image summarization technique. It will attempt to elucidate the derivations behind epitome estimation using the variational EM algorithm. This paper does not attempt to provide a comprehensive experimental evaluation; it will, however, show a few results of the epitome and its likely uses based on the epitome construction. The intended audience of this paper is the student reader. Specifically, this paper will derive the update equations and describe the key assumptions for estimating epitomes and for epitome-based image segmentation. It will describe the variational inference technique, a variant of the EM algorithm called variational EM [4], used to simplify the estimation task computationally. The paper will conclude by exposing possible issues involved in utilizing the epitome representation of images for the purpose of silhouette generation, and by showing a few results of epitome estimation on images.

2. Epitome Model Explained

A condensed version of an image x of size N × M is defined as the epitome e of size n × m. The epitome contains the textural and spatial essence of the original and is found by arranging several patches of different shapes and sizes from the original image so that they best fit into the space of the epitome. The patches in the image may overlap, and the arrangement of these patches in the epitome may overlap.

Given the mapping of each of these patches from the epitome to the original and the epitome itself, it is possible to perform image reconstruction. It is desirable to use patches large enough to capture spatial properties of the image, but small enough to generalize across the image and across a collection of images.

The explanation will begin with a description of the epitome model as a miniature image constructed from a pre-specified selection of patches. Then, patch color estimation is incorporated into the inference algorithm itself, although the original patch's image coordinates are never changed. This allows the formulation to be inserted into larger graphical models for performing other tasks using the essence-extracting characteristic of epitomes. An example of such a larger graphical model is the generative layered model of an image for separating the image into two layers.


Figure 1: Generative Model of the Epitome.

2.1. Epitome as a Generative Model of Image Patches {Z_k}_{k=1}^P

Assume for the moment that an image x is given in the form of only a set of patches {Z_k}_{k=1}^P. The patch shape can be arbitrary, but for the current discussion we assume the patch shape to be a square of size K × K. For each patch Z_k, the generative model uses a hidden mapping T_k that maps coordinates j ∈ E_k in the epitome to the coordinates i ∈ S_k in the patch. An epitomic element contains two parameters, the mean and variance of the epitomic pixel. These variables are shown visually in fig. 1.

Given the epitome e = (µ, φ) and the mapping T_k, the patch Z_k is generated by copying the appropriate pixels from the epitome mean and adding Gaussian noise of the level given in the variance map. This can be formulated as the conditional probability

p(Z_k | T_k, e) = \prod_{i \in S_k} \mathcal{N}(z_{i,k}; \mu_{T_k(i)}, \phi_{T_k(i)})    (1)

where \mathcal{N}(z_{i,k}; \mu_{T_k(i)}, \phi_{T_k(i)}) is a Gaussian distribution over z_{i,k} with mean µ_{T_k(i)} and variance φ_{T_k(i)}.

The graph representation of the epitome definition as a generative model of image patches is shown in fig. 2. The diagram depicts the patches {Z_k}_{k=1}^P as independently observed evidence, and the epitome and mappings as hidden variables to be inferred.

The number of possible mappings T_k is assumed to be finite. Because the patches are square, the patch Z_k can be mapped to the epitome in nm possible ways. Recall that the location of the patch at the image end is fixed, because the patches are assumed given. Define the image-to-epitome coordinate mapping as the function T_k(i) = i − T_k; the i-th pixel in the patch maps to the T_k(i)-th pixel in the epitome, and T_k is the two-dimensional shift. A square block copy implies that the range of i starts at the top-left and ends at the bottom-right of the image patch Z_k, and the range of T_k starts at the top-left and ends at the bottom-right of the epitome e. The mappings could otherwise include rotations, scaling and other deformations of patches, or be completely arbitrary.
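To make the block-copy mapping concrete, the short NumPy sketch below maps the pixel coordinates of a single K × K patch into epitome coordinates. It is only an illustration: the function name is invented, and the toroidal wrap-around mirrors the boundary handling mentioned in Section 3 rather than code from the released software.

```python
import numpy as np

def epitome_coords(patch_top_left, shift, K, epitome_shape):
    """Map the K x K pixel coordinates of a patch into epitome coordinates.

    Illustrative sketch of the block-copy mapping T_k(i) = i - T_k: each
    patch pixel i (in image coordinates) is sent to epitome coordinate
    i - T_k, wrapped toroidally around the epitome boundaries (see Sec. 3).
    """
    r0, c0 = patch_top_left                        # image coords of the corner
    rows = np.arange(r0, r0 + K)
    cols = np.arange(c0, c0 + K)
    ii, jj = np.meshgrid(rows, cols, indexing="ij")  # patch pixel coords i
    er = (ii - shift[0]) % epitome_shape[0]          # i - T_k, wrapped
    ec = (jj - shift[1]) % epitome_shape[1]
    return er, ec

# Example: a 3x3 patch at image position (10, 10) with shift T_k = (8, 8)
# lands at epitome coordinates (2..4, 2..4) in a 6x6 epitome.
er, ec = epitome_coords((10, 10), (8, 8), 3, (6, 6))
```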

Figure 2: Generative Model of the Epitome of Image Patches.

The given patches are assumed to come in multiples of the largest patch to capture large and small patterns in the image. This allows coarse and fine tessellations of the image patches to compete in the inference process. The competition prevents excessive blurring of the epitome, while at the same time allowing grouping of the image features into larger patterns.

The patches are assumed to be generated independently, so the joint distribution is:

p(\{Z_k, T_k\}_{k=1}^P, e) = \prod_{k=1}^P p(Z_k, T_k, e)    (2)
 = \prod_{k=1}^P p(Z_k | T_k, e)\, p(T_k, e)    (3)
 = \prod_{k=1}^P p(Z_k | T_k, e)\, p(T_k | e)\, p(e)    (4)
 = p(e) \prod_{k=1}^P p(T_k)\, p(Z_k | T_k, e)
 = p(e) \prod_{k=1}^P p(T_k) \prod_{i \in S_k} \mathcal{N}(z_{i,k}; \mu_{T_k(i)}, \phi_{T_k(i)})    (5)

We arrive at eqs. (3) and (4) using Bayes' rule on conditional probabilities. We assume that the prior on all epitomes is flat and therefore does not appear in the parameter estimation. The prior on the mappings p(T_k) can be used to favor certain mappings over others, such as favoring larger complete patches rather than several smaller components of that patch. The distribution p(T_k) can also be estimated at the same time as the other parameters. Jojic et al. found that even simple mappings nonetheless resulted in a tidy arrangement of patches in the epitome.


Maximizing Negative Helmholtz Free-Energy as a Variant of the EM Algorithm

The parameters {T_k}_{k=1}^P and e are estimated using a variant of the Expectation-Maximization (EM) algorithm. Learning the configuration of the parameters involves finding the parameters that maximize the log-likelihood. A direct way involves finding the log-likelihood of the data; in the case of epitome estimation, this means finding the log-likelihood of the patches, which requires summing the joint distribution of eq. (5) over all possible mappings for all possible epitomes.

The EM algorithm is an approximate way of estimating these parameters by iterating between an E-step and an M-step. Starting with a non-trivial guess at the configuration of the parameters, the E-step takes the expectation of the complete log-likelihood log p({Z_k, T_k}_{k=1}^P | e) over the hidden variables {T_k}_{k=1}^P given the observed data {Z_k}_{k=1}^P and the parameter e. The M-step then maximizes the resulting (incomplete) likelihood function of the data, L(e | {Z_k, T_k}_{k=1}^P) = log p({Z_k}_{k=1}^P | {T_k}_{k=1}^P, e). The new configuration of parameters that maximizes the likelihood at this iteration replaces the initial guess for the next E-step iteration. The estimated mappings {T_k}_{k=1}^P result from the E-step, and the estimated epitome e results from the M-step.

The EM algorithm is an approximation because, while the direct maximum likelihood technique finds parameters that result in the global maximum, the EM algorithm may find parameters that result only in a local maximum of the likelihood. At each iteration, the likelihood with the newer set of parameters was proven to be no smaller than the likelihood at any previous iteration [6]. That is to say,

L(\theta^{(0)} | z) \le L(\theta^{(1)} | z) \le \cdots \le L(\theta^{(t-1)} | z) \le L(\theta^{(t)} | z)

where θ = e and z = {Z_k, T_k}_{k=1}^P in the case of epitome estimation, and the superscript is the iteration number. The likelihood is proven to converge to a maximum value.

A variant of the EM algorithm, dubbed variational EM, involves forming the negative Helmholtz free-energy −F(q, {T_k}_{k=1}^P, e), which is a lower bound of the log-likelihood. Taking the log-likelihood of the data, L(θ|v) = log p(v|θ), and writing it as a marginalization of the joint distribution of the observed and hidden random variables given the parameters, we get

\log p(v|\theta) = \log \int_h p(v, h|\theta)
 = \log \int_h q(h)\, \frac{p(v, h|\theta)}{q(h)}
 \ge \int_h q(h) \log \frac{p(v, h|\theta)}{q(h)}    (6)
 = -F(q, p)    (7)

where h are the hidden and v the observed random variables, and θ is the set of deterministic parameters. Noting that log(·) is a concave function, we can apply Jensen's inequality in eq. (6) to acquire the lower bound.

The negative of the Helmholtz free-energy can be rearranged to become

-F(q, p) = \int_h q(h) \log \frac{p(v, h|\theta)}{q(h)}
 = \int_h q(h) \log \frac{p(h|v, \theta)\, p(v|\theta)}{q(h)}    (8)
 = \int_h q(h) \log \frac{p(h|v, \theta)}{q(h)} + \log p(v|\theta)
 = -D(q \| p) + L(\theta|v)    (9)

Worthy of note here is that the Kullback-Leibler divergence of the two densities satisfies D(q ‖ p) = 0 if q(h) = p(h|v, θ) for all h, in which case the negative Helmholtz free-energy equals the log-likelihood of the data.

The E-step of the standard EM algorithm involves computing the posterior distribution p(h|v, θ) for calculating the expectation of the complete log-likelihood. The corresponding E-step in the EM algorithm variant maximizes the negative Helmholtz free-energy −F(q, p) with respect to q(h), which has its maximum at q(h) = p(h|v, θ); maximizing the negative Helmholtz free-energy therefore amounts to calculating the posterior distribution. The standard M-step involves maximizing the log-likelihood of the observed data with respect to θ. The M-step in the variant of the EM algorithm maximizes −F(q, p) with respect to the parameters θ, using the q(h) that maximized −F(q, p) in the last E-step, which is the same as maximizing the log-likelihood of the data L(θ|v).

Furthermore, it has been found by Neal and Hinton that if −F(q, p) has a local (or global) maximum at q*(h) and θ*, then a local (or global) maximum of L(θ|v) occurs at θ* as well [4]. This implies that we can maximize −F(q, p) over q(h) and θ instead of maximizing L(θ|v) over the parameters θ.
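The identity in eq. (9) is easy to check numerically. The following sketch builds a toy discrete model with one binary hidden variable, picks an arbitrary q(h), and verifies both that the negative free energy equals the log-likelihood minus the Kullback-Leibler divergence and that it lower-bounds the log-likelihood. All numbers are illustrative; none come from the epitome model itself.

```python
import numpy as np

# Toy discrete model verifying eqn (9): -F(q,p) = L(theta|v) - D(q || p(h|v)).
# Hidden h in {0, 1}, one observed value v; all probabilities are made up.
p_h = np.array([0.3, 0.7])             # prior p(h)
p_v_given_h = np.array([0.9, 0.2])     # p(v | h) for the observed v

p_vh = p_h * p_v_given_h               # joint p(v, h)
log_p_v = np.log(p_vh.sum())           # exact log-likelihood L(theta|v)
post = p_vh / p_vh.sum()               # exact posterior p(h | v)

q = np.array([0.5, 0.5])               # an arbitrary variational q(h)
neg_F = np.sum(q * np.log(p_vh / q))   # -F(q, p), the lower bound of eqn (6)
kl = np.sum(q * np.log(q / post))      # D(q || p(h|v))

assert np.isclose(neg_F, log_p_v - kl)  # the decomposition in eqn (9)
assert neg_F <= log_p_v                 # Jensen's inequality, eqn (6)
```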


Maximizing Negative Helmholtz Free-Energy of the Generative Model of the Epitome

Returning to the estimation of epitomes, the first step in using this variant of the EM algorithm is to find the lower bound of the log-likelihood of the data {Z_k}_{k=1}^P:

\log p(\{Z_k\}_{k=1}^P) = \log \sum_{\{T_k\}_{k=1}^P} \int_e p(\{Z_k, T_k\}_{k=1}^P, e)
 = \log \sum_{\{T_k\}_{k=1}^P} \int_e q(\{T_k\}_{k=1}^P, e)\, \frac{p(\{Z_k, T_k\}_{k=1}^P, e)}{q(\{T_k\}_{k=1}^P, e)}    (10)
 \ge \sum_{\{T_k\}_{k=1}^P} \int_e q(\{T_k\}_{k=1}^P, e) \log\!\left[\frac{p(\{Z_k, T_k\}_{k=1}^P, e)}{q(\{T_k\}_{k=1}^P, e)}\right]    (11)
 = B

where e is the parameter to be estimated. As above, since log(·) is a concave function, we can use Jensen's inequality in eq. (10) to arrive at eq. (11), the negative Helmholtz free-energy expression [4].

We are interested in the estimate of the deterministic parameter e. The deterministic nature of e is reflected in the formulation with a delta function δ(e − ê), where ê is the current estimate. The mappings are described by their distributions q(T_k), where any two mappings are independent of each other. The lower bound is subsequently simplified by using the posterior distribution

q(\{T_k\}_{k=1}^P, e) = \delta(e - \hat{e}) \prod_{k} q(T_k)    (12)

which results in

\log p(\{Z_k\}_{k=1}^P) \ge \sum_{k=1}^P \sum_{T_k} q(T_k)\big[\log p(T_k) - \log q(T_k)\big] + \sum_{k=1}^P \sum_{T_k} q(T_k) \sum_{i \in S_k} \log \mathcal{N}(z_{i,k}; \mu_{T_k(i)}, \phi_{T_k(i)}).    (13)

This lower bound is now easily maximized over q(T_k) and e. As shown above, maximizing the lower bound (the negative Helmholtz free-energy) in this way results in the standard expectation-maximization algorithm. The distribution over the mappings T_k is set to the actual posterior distribution

q(T_k) \propto p(T_k) \prod_{i \in S_k} \mathcal{N}(z_{i,k}; \mu_{T_k(i)}, \phi_{T_k(i)})    (14)

with q(T_k) normalized by summing over all allowed mappings. Evaluating q(T_k) constitutes the E-step.

In the M-step, we optimize the lower bound B of the log-likelihood over the epitome parameters e = (µ, φ). To find the value of the epitome mean µ_j at epitome coordinate j that locally maximizes the log-likelihood, take the derivative of the lower bound with respect to µ_j and set it to zero. The first term on the right-hand side of (13) does not depend on µ_j. We introduce a summation over the mappings T_k for which T_k(i) = j, collecting the Gaussian terms that depend on µ_j and allowing the derivative with respect to µ_j to move inside the summation. The derivative of B with respect to µ_j is then

\frac{\partial B}{\partial \mu_j} = \frac{\partial}{\partial \mu_j} \sum_{k=1}^P \sum_{T_k} q(T_k) \sum_{i \in S_k} \log \mathcal{N}(z_{i,k}; \mu_{T_k(i)}, \phi_{T_k(i)})
 = \sum_{k=1}^P \sum_{i \in S_k} \sum_{T_k : T_k(i) = j} q(T_k)\, \frac{\partial}{\partial \mu_j} \log \mathcal{N}(z_{i,k}; \mu_j, \phi_j)
 = \sum_{k=1}^P \sum_{i \in S_k} \sum_{T_k : T_k(i) = j} q(T_k)\, \frac{\partial}{\partial \mu_j} \left\{ -\frac{1}{2}\left( \log 2\pi\phi_j + \frac{(z_{i,k} - \mu_j)^2}{\phi_j} \right) \right\}
 = \sum_{k=1}^P \sum_{i \in S_k} \sum_{T_k : T_k(i) = j} q(T_k)\, \frac{z_{i,k} - \mu_j}{\phi_j}    (15)

The variance in the denominator drops out after setting the right-hand side to zero, since φ_j appears in every term of the triple summation. The µ_j that maximizes the lower bound of the log-likelihood is therefore

\mu_j = \frac{\sum_{k=1}^P \sum_{i \in S_k} \sum_{T_k : T_k(i) = j} q(T_k)\, z_{i,k}}{\sum_{k=1}^P \sum_{i \in S_k} \sum_{T_k : T_k(i) = j} q(T_k)}    (16)

Because the block-copy mapping T_k(i) = i − T_k is assumed in this discussion, the summation over T_k such that T_k(i) = j results in only one term, T_k = i − j. The summation is nevertheless needed to express the terms in closed form because of the presence of the approximate posterior term q(T_k), and it accounts for the general case in which multiple T_k may produce T_k(i) = j.

To find the variance of the epitome φ_j that locally maximizes the log-likelihood, take the derivative of the lower bound of the log-likelihood with respect to φ_j and set it equal to zero. As in the case of the mean, we introduce a sum to account for the terms that depend on φ_j only.

\frac{\partial B}{\partial \phi_j} = \sum_{k=1}^P \sum_{i \in S_k} \sum_{T_k : T_k(i) = j} q(T_k)\, \frac{\partial}{\partial \phi_j} \left\{ -\frac{1}{2}\left( \log 2\pi\phi_j + \frac{(z_{i,k} - \mu_j)^2}{\phi_j} \right) \right\}
 = \sum_{k=1}^P \sum_{i \in S_k} \sum_{T_k : T_k(i) = j} q(T_k) \left\{ -\frac{1}{2\phi_j} + \frac{(z_{i,k} - \mu_j)^2}{2\phi_j^2} \right\}

The φ_j that maximizes the lower bound of the log-likelihood is therefore

\phi_j = \frac{\sum_{k=1}^P \sum_{i \in S_k} \sum_{T_k : T_k(i) = j} q(T_k)\, (z_{i,k} - \mu_j)^2}{\sum_{k=1}^P \sum_{i \in S_k} \sum_{T_k : T_k(i) = j} q(T_k)}    (17)

In the M-step, the epitome mean and variance are computed by (16) and (17).
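To make the E-step of eq. (14) and the M-step of eqs. (16) and (17) concrete, the following NumPy sketch estimates a small grayscale epitome from a set of square patches. It assumes a flat prior p(T_k), restricts the mappings to toroidal shifts (the boundary handling described in Section 3), and uses the initialization described there; function and variable names are illustrative, and this is a sketch rather than the released implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate_epitome(patches, e_shape, n_iters=10):
    """Minimal EM sketch of Section 2.1: the E-step evaluates q(T_k) as in
    eqn (14) for every toroidal shift of each K x K patch, and the M-step
    re-estimates the epitome mean and variance with eqns (16) and (17).
    A flat prior p(T_k) is assumed; details are illustrative."""
    n, m = e_shape
    K = patches[0].shape[0]
    # Initialization in the spirit of Section 3: average pixel value plus
    # noise for the mean, all ones for the variance map.
    mean_pix = np.mean([z.mean() for z in patches])
    mu = mean_pix + 0.1 * rng.standard_normal(e_shape)
    phi = np.ones(e_shape)

    # Epitome indices covered by a K x K patch at each of the n*m shifts.
    rr, cc = np.meshgrid(np.arange(K), np.arange(K), indexing="ij")
    shifts = [((rr + a) % n, (cc + b) % m) for a in range(n) for b in range(m)]

    for _ in range(n_iters):
        # E-step: q(T_k) proportional to the product of Gaussians, eqn (14).
        all_q = []
        for z in patches:
            log_q = np.array([
                -0.5 * np.sum(np.log(2 * np.pi * phi[idx])
                              + (z - mu[idx]) ** 2 / phi[idx])
                for idx in shifts])
            q = np.exp(log_q - log_q.max())
            all_q.append(q / q.sum())       # normalize over allowed mappings

        # M-step, eqn (16): posterior-weighted average of the patch pixels
        # that map to each epitome coordinate j.
        num = np.zeros(e_shape)
        den = np.zeros(e_shape)
        for z, q in zip(patches, all_q):
            for qk, idx in zip(q, shifts):
                num[idx] += qk * z
                den[idx] += qk
        mu = num / np.maximum(den, 1e-12)

        # M-step, eqn (17): posterior-weighted squared deviation from the
        # newly updated mean.
        num_var = np.zeros(e_shape)
        for z, q in zip(patches, all_q):
            for qk, idx in zip(q, shifts):
                num_var[idx] += qk * (z - mu[idx]) ** 2
        phi = np.maximum(num_var / np.maximum(den, 1e-12), 1e-4)
    return mu, phi

# Example: 20 random 4x4 patches summarized by an 8x8 epitome.
patches = [rng.random((4, 4)) for _ in range(20)]
mu, phi = estimate_epitome(patches, (8, 8), n_iters=5)
```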

2.2. Epitome as a Generative Model of an Entire Image

In the previous section, we assumed that the patches of the image {Z_k}_{k=1}^P were independently chosen. Because epitome estimation will ultimately be used in the context of a larger generative model, it is necessary to augment the previous model and introduce the process of combining the independently selected patches, averaging the overlapping patch pixels to yield the image pixel. In the inference step, the patch pixels are made to agree.
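A minimal sketch of this combination step, assuming NumPy and square patches at known image positions: overlapping patch pixels are averaged (the counts correspond to N_i in eq. (18) below) and Gaussian noise is added. The function name and arguments are illustrative, and a single noise variance psi is used for simplicity.

```python
import numpy as np

rng = np.random.default_rng(2)

def compose_from_patches(patches, positions, image_shape, psi=0.01):
    """Combine independently generated patches into an image by averaging
    the patch pixels that overlap at each image coordinate, then adding
    Gaussian noise of variance psi (cf. eqn (18)). Illustrative sketch."""
    acc = np.zeros(image_shape)
    count = np.zeros(image_shape)        # N_i: number of patches covering i
    for z, (r, c) in zip(patches, positions):
        K = z.shape[0]
        acc[r:r + K, c:c + K] += z
        count[r:r + K, c:c + K] += 1
    mean = np.where(count > 0, acc / np.maximum(count, 1), 0.0)
    return mean + np.sqrt(psi) * rng.standard_normal(image_shape)

# Example: two overlapping 4x4 patches composed into a 10x10 image.
patches = [rng.random((4, 4)), rng.random((4, 4))]
x = compose_from_patches(patches, [(0, 0), (2, 2)], (10, 10))
```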

The conditional probability of the image pixel x_i given the patches {Z_k}_{k=1}^P is

p(x_i | \{Z_k\}_{k=1}^P) = \mathcal{N}\!\left(x_i;\; \frac{1}{N_i} \sum_{k : i \in S_k} z_{i,k},\; \psi_i\right)    (18)

where N_i is the number of patches that overlap at coordinate i. The entire model now has the joint distribution

p(x, {Zk, Tk}Pk=1, e)

=p({Zk, Tk}Pk=1, e)∏

i

p(xi|{Zk}Pk=1)

=p(e)P∏

k=1

p(Tk)∏

i∈Sk

N(zi,k;µTk(i), φTk(i))

·∏

i

N(xi;1Ni

∑k,i∈Sk

zi,k, ψi) (19)

Figure 3: Generative Model of the Epitome of an Image.

where the first term on the right-hand side is the joint distribution of the patches, mappings and epitome defined in the previous section. The content of the patches Z_k is now hidden and has to be inferred from the observed image x. Since we are interested only in patches whose overlapping pixels agree, we introduce the following posterior in the inference process.

q(\{z_{i,k}\}, \{T_k\}, e) = \delta(e - \hat{e}) \prod_k q(T_k) \prod_{i \in S_k} \delta(z_{i,k} - \zeta_i)    (20)

This posterior assumes that the patch pixels that overlap in the image share the same color ζ_i. This posterior also reflects the treatment of ζ_i as a deterministic parameter to be estimated.

Again employing the variant of the EM algorithm for parameter estimation, we find the lower bound of the log-likelihood of the data, in this case x, to form

\log p(x) \ge B = \int_{\{z_{i,k}\}} \sum_{\{T_k\}_{k=1}^P} \int_e q(\{z_{i,k}\}, \{T_k\}_{k=1}^P, e) \log \frac{p(x, \{Z_k, T_k\}_{k=1}^P, e)}{q(\{z_{i,k}\}, \{T_k\}_{k=1}^P, e)}    (21)–(22)


which with the chosen posterior reduces to

B = \sum_{k=1}^P \sum_{T_k} q(T_k)\big[\log p(T_k) - \log q(T_k)\big]
 + \sum_{k=1}^P \sum_{T_k} q(T_k) \sum_{i \in S_k} \log \mathcal{N}(\zeta_i; \mu_{T_k(i)}, \phi_{T_k(i)})
 + \sum_i \log \mathcal{N}(x_i; \zeta_i, \psi_i)    (23)

Since the image x is treated as a deterministic parameter, the E-step consists of the same calculation as in the case of the generative model of only patches: calculating the posterior distribution

q(T_k) \propto p(T_k) \prod_{i \in S_k} \mathcal{N}(\zeta_i; \mu_{T_k(i)}, \phi_{T_k(i)})    (24)

Taking the derivative of the lower bound of the log-likelihood with respect to φ_j, µ_j, and ζ_i and setting it to zero results in the following three update equations for the M-step. The update equation for the color of the patch pixel ζ_i is given by

\zeta_i = \frac{\dfrac{x_i}{\psi_i} + \displaystyle\sum_{k=1}^P \sum_{\substack{u \in S_k \\ u \to i}} q(T_k)\, \frac{\mu_{T_k(u)}}{\phi_{T_k(u)}}}{\dfrac{1}{\psi_i} + \displaystyle\sum_{k=1}^P \sum_{\substack{u \in S_k \\ u \to i}} q(T_k)\, \frac{1}{\phi_{T_k(u)}}}    (25)

where u is in patch coordinates and i is in image coordinates; the expression u : u → i denotes the patch coordinate u that maps to the image coordinate i.
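The structure of eq. (25) is a precision-weighted average of the observed image pixel and the epitome's predictions for that pixel. The snippet below computes ζ_i for a single pixel under that reading, assuming (as in the reconstruction above) that the image-noise variance in both terms is ψ_i; the function name and inputs are illustrative.

```python
import numpy as np

def patch_color_update(x_i, psi_i, q, mu_pred, phi_pred):
    """Precision-weighted combination behind eqn (25), for one image pixel i.

    mu_pred / phi_pred hold the epitome means and variances mu_{T_k(u)} and
    phi_{T_k(u)} for every (patch, mapping) pair whose coordinate u maps to
    i, and q holds the matching posterior weights q(T_k). Illustrative only.
    """
    q, mu_pred, phi_pred = map(np.asarray, (q, mu_pred, phi_pred))
    numer = x_i / psi_i + np.sum(q * mu_pred / phi_pred)
    denom = 1.0 / psi_i + np.sum(q / phi_pred)
    return numer / denom

# Example: one observed pixel combined with two overlapping predictions.
zeta_i = patch_color_update(0.6, 0.05, q=[0.7, 0.3],
                            mu_pred=[0.55, 0.8], phi_pred=[0.02, 0.1])
```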

The update equations for the mean and variance of theepitome are given by

\phi_j = \frac{\sum_k \sum_{i \in S_k} \sum_{T_k : T_k(i) = j} q(T_k)\, (\zeta_i - \mu_j)^2}{\sum_k \sum_{i \in S_k} \sum_{T_k : T_k(i) = j} q(T_k)}    (26)

\mu_j = \frac{\sum_k \sum_{i \in S_k} \sum_{T_k : T_k(i) = j} q(T_k)\, \zeta_i}{\sum_k \sum_{i \in S_k} \sum_{T_k : T_k(i) = j} q(T_k)}    (27)

where i is in patch coordinates and j is in epitome coordinates.

2.3. Shape Epitome and Image Segmentation

In this section, the use of epitomes as a module in a generative model of an image, as presented in the paper by Jojic et al., is explained. There, epitomes are used to model the texture and color of the appearance and shape of an image. Fig. 4 shows a Bayesian network of this generative model, illustrating the dependency relationships between the random variables.

Figure 4: Multi-layer Generative Model of an Image Using Epitomes to Generate Appearance and Shape Layers.

The generative model of overlapping objects consists of one epitome e_s to model the layer appearances s_1, s_2 and another epitome e_m to model the mask image m. The complete image x is composed via the layer equation

x = s_1 \circ m + (1 - m) \circ s_2 + n

where n is Gaussian-distributed noise and ∘ denotes element-wise multiplication. This yields the conditional probability

p(x | s_1, s_2, m) = \mathcal{N}\!\left(x;\; s_1 \circ m + (1 - m) \circ s_2,\; \Psi\right)
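As a small illustration of the layer equation, the following sketch composes an image from two random appearance layers and a binary mask with additive Gaussian noise of variance Ψ; all arrays and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Generate an image from the two-layer model x = s1 ∘ m + (1 - m) ∘ s2 + n.
H, W = 48, 64
s1 = rng.random((H, W))                          # foreground appearance layer
s2 = rng.random((H, W))                          # background appearance layer
m = (rng.random((H, W)) > 0.5).astype(float)     # mask layer (binary here)
Psi = 0.01                                       # per-pixel noise variance

noise = np.sqrt(Psi) * rng.standard_normal((H, W))
x = s1 * m + (1.0 - m) * s2 + noise              # element-wise composition
```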

Bayesian Networks and d-connectivity

Recall the conditions for d-connectivity between nodes in a Bayesian network: a random variable a is dependent on a random variable b given evidence E = {e_1, e_2, ..., e_n} if there is a d-connecting path from a to b given E. A d-connecting path from a to b with respect to the evidence nodes E is a path in which every interior node n has the property that either

1. it is linear or diverging and not a member of E, or

2. it is converging, and either n or one of its descendants is in E.

There are three basic kinds of paths between two nodes: linear, converging and diverging. An example of a linear path from a to b is

a → n → b.

An example of a converging path between the same two nodes is

a → n ← b,

and for a diverging path we reverse the arrows:

a ← n → b.

So, if random variables a and b given evidence E are d-separated (not d-connected) by the conditions named above, then a and b given E are independent and we may write

p(a, b | E) = p(a | E)\, p(b | E)

More details on the dependence relationships between nodes in a Bayesian network are given in [5].

Estimating Image Layers, Appearance and Shape Epitomes

With the notion of d-connectivity in mind, the joint distribution can be written in factored form by observing the dependence structure implied by the Bayesian network.

p(x, s_1, s_2, m, e_s, e_m) = p(x, e_s, e_m | s_1, s_2, m)\, p(s_1, s_2, m)    (28)
 = p(x | s_1, s_2, m)\, p(e_s, e_m | s_1, s_2, m)\, p(s_1, s_2, m)    (29)
 = p(x | s_1, s_2, m)\, p(e_s, e_m, s_1, s_2, m)    (30)
 = p(x | s_1, s_2, m)\, p(s_1, s_2, e_s)\, p(m, e_m)    (31)
 = p(x | s_1, s_2, m)\, p(s_1, s_2 | e_s)\, p(e_s)\, p(m | e_m)\, p(e_m)    (32)
 = p(x | s_1, s_2, m)\, p(s_1 | e_s)\, p(s_2 | e_s)\, p(e_s)\, p(m | e_m)\, p(e_m)    (33)
 = p(x | s_1, s_2, m)\, p(s_1 | e_s)\, p(s_2 | e_s)\, p(m | e_m)\, p(e_s)\, p(e_m)    (34)

Bayes' rule allows us to separate the joint probability to get eq. (28). For the first term in eq. (29), the evidence nodes s_1, s_2, m "block" the linear paths between x and e_s, e_m, violating the first condition for d-connectivity, which says that interior nodes of linear paths should not be in E. Therefore, we can separate the conditional probability into the product of the conditional probability of the image x and that of the epitomes e_s, e_m given the layers s_1, s_2, and m. We can combine the probabilities of the epitomes and layers again using Bayes' rule in eq. (30). Because the five remaining nodes are disconnected without the image node, e_s, s_1, and s_2 as a group are independent of e_m and m, giving eq. (31). Finally, s_1 and s_2 given e_s are d-separated by both conditions above and are therefore independent. Two paths exist between s_1 and s_2 in this situation: a diverging path through e_s and a converging path through x. The diverging path violates the first d-connectivity condition since its interior node e_s is evidence; the converging path violates the second condition since x is not evidence and x does not have any descendants.

This joint distribution is then marginalized to obtain the Helmholtz free-energy lower bound of the log-likelihood of the data x, which is then optimized. Again employing the variant of the EM algorithm, we fix the layers and epitomes as deterministic parameters of a family of distributions by assuming the posterior

q = \delta(s_1 - \hat{s}_1)\, \delta(s_2 - \hat{s}_2)\, \delta(m - \hat{m})\, \delta(e_m - \hat{e}_m)\, \delta(e_s - \hat{e}_s)    (35)

Therefore,

\log p(x) \ge B = \int_{s_1, s_2, m} q \log \frac{p(x, s_1, s_2, m, e_s, e_m)}{q(s_1)\, q(s_2)\, q(m)}
 = \log p(x | \hat{s}_1, \hat{s}_2, \hat{m}) + \log p(\hat{s}_1 | \hat{e}_s) + \log p(\hat{s}_2 | \hat{e}_s) + \log p(\hat{m} | \hat{e}_m) + \text{const}

The first term in the sum is quadratic in the three images s_1, s_2, and m:

\log p(x | \hat{s}_1, \hat{s}_2, \hat{m}) = B_x = \sum_i \log \mathcal{N}\!\left(x_i;\; \zeta^m_i \zeta^{s_1}_i + (1 - \zeta^m_i)\, \zeta^{s_2}_i,\; \psi_i\right)

The remaining terms resemble the log-likelihood functions for the generative epitome model given entire images.

\log p(\hat{s}_l | \hat{e}_s) = B_l = \sum_{k=1}^P \sum_{T^{s_l}_k} q(T^{s_l}_k)\big[\log p(T^{s_l}_k) - \log q(T^{s_l}_k)\big] + \sum_{k=1}^P \sum_{T^{s_l}_k} q(T^{s_l}_k) \sum_{i \in S_k} \log \mathcal{N}(\zeta^{s_l}_i;\; \mu^s_{T^{s_l}_k(i)},\; \phi^s_{T^{s_l}_k(i)})

\log p(\hat{m} | \hat{e}_m) = B_m = \sum_{k=1}^P \sum_{T^m_k} q(T^m_k)\big[\log p(T^m_k) - \log q(T^m_k)\big] + \sum_{k=1}^P \sum_{T^m_k} q(T^m_k) \sum_{i \in S_k} \log \mathcal{N}(\zeta^m_i;\; \mu^m_{T^m_k(i)},\; \phi^m_{T^m_k(i)})

B = B_x + B_1 + B_2 + B_m

For the E-step, as in sec. 2.2 but for every layer and its epitome, calculate the posterior distribution

q(T^{s_l}_k) \propto p(T^{s_l}_k) \prod_{i \in S_k} \mathcal{N}(\zeta^{s_l}_i;\; \mu^s_{T^{s_l}_k(i)},\; \phi^s_{T^{s_l}_k(i)})    (36)

normalized over the possible mappings T^{s_l}_k, and likewise for the mask layer with q(T^m_k).


The M-step consists of a set of bilinear update equations that are iterated until convergence. These update equations are found by taking the derivative of the negative Helmholtz free-energy of the generative layered image model, B, with respect to µ^s_j, φ^s_j, µ^m_j, φ^m_j, ζ^{s_1}_i, ζ^{s_2}_i, and ζ^m_i, and setting it to zero. The update equations are given in Tables 1 and 2.

3. Examples and Discussion

The epitomes of some sample images were estimated and the results are shown in fig. 5. The code that created these examples is obtainable at http://research.microsoft.com/ jojic/software.htm.

The algorithm is initialized by randomly selecting K × K patches over the entire image. The epitome mean is initialized to the average pixel value over the entire image plus Gaussian noise with variance equal to the variance of the pixels in the image. The epitome variance is set to all ones. The patch mapping T_k(i) = i − T_k is defined on a torus; in other words, the patch destination coordinates wrap around the boundaries of the epitome. Eqs. (16) and (17) are iterated through a series of convolutions of the approximate posterior probability map with the patch or epitome.

Because the algorithm starts by randomly choosing patches and randomly choosing positions in the epitome at which to place them, each epitome estimate will be different even for the same image: the positions of the patches change, but the constituent parts of the epitome will be similar. This illustrates the multiple local maxima of the likelihood to which the EM algorithm can converge.

Epitomes of one image may be used as a starting point for estimating the epitomes of other images or of the same image. An example of the epitomes of a sequence of images in a video, where the previous epitome is used as the starting point for estimating the next, is shown in fig. 6. Here we can see that the epitomes in the sequence are similar to one another, even though the patches were chosen randomly from each image during estimation.

Most noteworthy is the use of variational EM in epitome estimation. The variant of the EM algorithm has permitted the estimation of e without the need to find the exact posterior probability p({T_k}_{k=1}^P, e | {Z_k}_{k=1}^P) for the standard E-step, which requires finding the marginal p({Z_k}_{k=1}^P) by summing over all possible patch mappings for each patch, a non-trivial task. Instead, variational EM has allowed us to use an approximate posterior, the only requirements being that it is proportional to the complete likelihood and that it sums to one over all T_k [4].

Parameter estimation is performed by forming the joint distribution of the patches and mappings (for the epitome model) or of the layered generative model of the image, and marginalizing it to obtain the lower bound of the log-likelihood of the data. This lower bound is then maximized over the parameters e and over the approximate posterior, which is constructed from the complete likelihood and normalized by the sum of that function over the hidden variable. The prior on the mappings can have a density that favors larger mappings, but can otherwise be flat. Treating the densities of the epitome and of the patch-pixel colors as delta functions reflects the manner in which point estimates are found in general parameter estimation problems: the parameters are assumed to be deterministic and to have only one configuration.

Of critical concern is the size of the patches. The epitome of an image will differ depending on the maximum and minimum patch sizes specified. When the patch size is limited to only 1 pixel, the epitome and the mappings basically describe the average colors and the frequency with which each of these colors is used in the image; the epitome is then equivalent to a color histogram, and the spatial arrangement within the epitome means little. When the patch size is as large as the epitome, the epitome is like an average of several regions of the image, aligned such that the variance map has small values.

Ideally, the epitome would be generated with the patch size as large as one of the features in a repeating pattern of features in the image. The ability to have several patch sizes compete with each other is built into the epitome estimation algorithm, allowing some patches to survive while others fade away.

4. Conclusion

Epitomes are a novel appearance model composed of patches from an image. The epitome of an image is found using a variant of the EM algorithm. We have shown most of the steps in deriving the update equations for the epitome as a generative model of image patches and of an entire image, and for the image as a generative model of image layers and epitomes.

By incorporating epitomes into larger generative models, their essence-extracting ability can be used for many other purposes, and deriving the update equations using the variational EM inference framework can be relatively straightforward.

References

[1] N. Jojic and B. Frey, "Epitomic analysis of appearance and shape," to be presented at ICCV, 2003.

[2] I. Mikic, et al., "Articulated Body Posture Estimation from Multi-Camera Voxel Data," Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Volume 1, 8-14 Dec. 2001.

[3] B. Frey and N. Jojic, "Advances in algorithms for inference and learning in complex probability models," accepted, IEEE Trans. PAMI, 2003.


Table 1: Update equations for the epitome-based layered generative model of an image.

\mu^s_j = \frac{\sum_{k=1}^P \sum_{i \in S_k} \sum_{T^{s_1}_k : T^{s_1}_k(i) = j} q(T^{s_1}_k)\, \zeta^{s_1}_i + \sum_{k=1}^P \sum_{i \in S_k} \sum_{T^{s_2}_k : T^{s_2}_k(i) = j} q(T^{s_2}_k)\, \zeta^{s_2}_i}{\sum_{k=1}^P \sum_{i \in S_k} \sum_{T^{s_1}_k : T^{s_1}_k(i) = j} q(T^{s_1}_k) + \sum_{k=1}^P \sum_{i \in S_k} \sum_{T^{s_2}_k : T^{s_2}_k(i) = j} q(T^{s_2}_k)}    (37)

\phi^s_j = \frac{\sum_{k=1}^P \sum_{i \in S_k} \sum_{T^{s_1}_k : T^{s_1}_k(i) = j} q(T^{s_1}_k)\, (\zeta^{s_1}_i - \mu^s_j)^2 + \sum_{k=1}^P \sum_{i \in S_k} \sum_{T^{s_2}_k : T^{s_2}_k(i) = j} q(T^{s_2}_k)\, (\zeta^{s_2}_i - \mu^s_j)^2}{\sum_{k=1}^P \sum_{i \in S_k} \sum_{T^{s_1}_k : T^{s_1}_k(i) = j} q(T^{s_1}_k) + \sum_{k=1}^P \sum_{i \in S_k} \sum_{T^{s_2}_k : T^{s_2}_k(i) = j} q(T^{s_2}_k)}    (38)

\mu^m_j = \frac{\sum_{k=1}^P \sum_{i \in S_k} \sum_{T^m_k : T^m_k(i) = j} q(T^m_k)\, \zeta^m_i}{\sum_{k=1}^P \sum_{i \in S_k} \sum_{T^m_k : T^m_k(i) = j} q(T^m_k)}    (39)

\phi^m_j = \frac{\sum_{k=1}^P \sum_{i \in S_k} \sum_{T^m_k : T^m_k(i) = j} q(T^m_k)\, (\zeta^m_i - \mu^m_j)^2}{\sum_{k=1}^P \sum_{i \in S_k} \sum_{T^m_k : T^m_k(i) = j} q(T^m_k)}    (40)


Table 2: Update equations for the epitome-based layered generative model of an image, continued.

\zeta^{s_1}_i = \frac{\dfrac{x_i \zeta^m_i}{2\psi_i} + \displaystyle\sum_{k=1}^P \sum_{\substack{u \in S_k \\ u \to i}} \frac{\mu^s_{T^{s_1}_k(u)}}{\phi^s_{T^{s_1}_k(u)}}}{\dfrac{(2\zeta^m_i - 1)\,\zeta^m_i}{2\psi_i} + \displaystyle\sum_{k=1}^P \sum_{\substack{u \in S_k \\ u \to i}} \frac{1}{\phi^s_{T^{s_1}_k(u)}}}    (41)

\zeta^{s_2}_i = \frac{\dfrac{-x_i (1 - \zeta^m_i)}{2\psi_i} + \displaystyle\sum_{k=1}^P \sum_{\substack{u \in S_k \\ u \to i}} \frac{\mu^s_{T^{s_2}_k(u)}}{\phi^s_{T^{s_2}_k(u)}}}{\dfrac{-\zeta^m_i (1 - \zeta^m_i) + (1 - \zeta^m_i)^2}{2\psi_i} + \displaystyle\sum_{k=1}^P \sum_{\substack{u \in S_k \\ u \to i}} \frac{1}{\phi^s_{T^{s_2}_k(u)}}}    (42)

\zeta^m_i = \frac{\dfrac{(\zeta^{s_2}_i - x_i)(\zeta^{s_1}_i + \zeta^{s_2}_i)}{2\psi_i} + \displaystyle\sum_{k=1}^P \sum_{\substack{u \in S_k \\ u \to i}} \frac{\mu^m_{T^m_k(u)}}{\phi^m_{T^m_k(u)}}}{\dfrac{-(\zeta^{s_1}_i + \zeta^{s_2}_i)^2}{2\psi_i} + \displaystyle\sum_{k=1}^P \sum_{\substack{u \in S_k \\ u \to i}} \frac{1}{\phi^m_{T^m_k(u)}}}    (43)


Figure 5: Sample epitomes of images. These 50 × 50 epitomes are generated from 100 20 × 20 patches over 10 iterations of EM. Convergence was achieved within about 4 iterations.

Figure 6: Sample epitomes of a sequence of images in a video. These 30 × 30 epitomes are generated from 100 20 × 20 patches over 4 iterations of EM; convergence was achieved within about 4 iterations. Note the similarity of the epitomes from one frame to the next.


[4] M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul, "An introduction to variational methods for graphical models," in Learning in Graphical Models, M. I. Jordan, Ed. Kluwer Academic Publishers, Norwell, MA, 1998.

[5] E. Charniak, "Bayesian Networks without Tears," distributed by The American Association for Artificial Intelligence, 1991.

[6] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum Likelihood from Incomplete Data via the EM Algorithm," Journal of the Royal Statistical Society, 1977.

A. Original Project Objectives and Milestones

The primary task is to explicate the definition of the appearance epitome of an image and the derived algorithm for epitome estimation. Namely, this paper's intent is to explain the process of learning the collection of image patches and patch-to-image mappings in epitome estimation. The secondary task is to utilize the epitome representation in a layered generative image model for silhouette generation, a form of image segmentation, and to expose the issues involved in this use. It would then be qualitatively compared with a background-subtraction segmentation technique.

A milestones chart for the project is shown in table 3. The explication of the proof of the epitome estimation algorithm is complete. The derivations of the segmentation algorithm are also complete. Only the estimation of the epitome as a generative model of image patches was experimentally verified with the provided code; those results are given in the main text.

A.1. Target Questions

The questions the resulting paper will try to address are general questions about inference algorithms and statistical generative models, including: What are the appearance and shape epitomes? What are generative models and unsupervised variational inference algorithms (the EM algorithm)? How do a generative model and an unsupervised variational inference algorithm allow extraction of the epitome, segmentation of the foreground and background, and filling of the occluded parts with the appearance predicted from the epitome?

The paper will also attempt to address specific issues of the algorithms pertaining to epitome construction, such as how extreme and intermediate values of K (patch size), N (epitome size), and I (EM iterations) affect the resulting epitome and its usefulness.

With regard to epitomes and segmentation, the paper will try to understand what is required to segment an image of a person into a person silhouette. What are the constraints and parameters? Is it effective? What is the rate of computation and how may it be improved? What advantages does this hold over conventional background-subtraction segmentation techniques?

A.2. Existing Code and Test Images

Experimental data for image segmentation will be gathered from a digital camera available to me to capture various situations, such as an orange-clothed person in a green room for a high-contrast example, a person in orange against a varied background for something more realistic, and finally a plainly clothed person against a varied background.

Table 3: Milestones Table

Milestone 1 (Oct 28):
- Finalize hypotheses and tasks for the project
- Read up on generative models and unsupervised variational inference algorithms (M.I. Jordan)
- *Complete final project proposal

Milestone 2 (Nov 11):
- Explicate the proof of the estimation of the parameters of the epitome as a generative model of image patches and entire images; understand d-connectivity of Bayesian networks

Milestone 3 (Nov 18):
- Understand the notion of Helmholtz free energy and its relationship to the EM algorithm
- *Epitome presentation

Milestone 4 (Nov 25):
- Devise the steps needed for image segmentation and implement it in Matlab if time permits
- *Complete polished draft of project report
- Complete as much of the programming of the segmentation algorithm as possible; capture and apply the algorithm to a series of test sequences for qualitative analysis

Milestone 5 (Dec 4):
- *Submit final draft of project report

Matlab code for epitome generation and epitome coloring for segmentation is provided at http://www.research.microsoft.com/ jojic/software.htm. This software includes the epitome-generating algorithm as a generative model from patches, not images, described in Section 2.1 of the paper by Jojic.

A.3. Previous Work on Epitomes

The parameter estimation technique used in the paper by Jojic [1] for epitomes is described in a general context in [3]. [3] is referred to specifically in Section 2.1 of the epitome paper, which describes "parameter estimation by marginalizing the joint distribution and optimizing the log likelihood of the data using the approximate posterior to compute the lower bound on the log likelihood."

The unsupervised variational inference framework used to jointly extract the epitome, segment the foreground from the background, and fill the occluded parts with the appearance predicted from the epitome is described in [4].
