Novel Approaches to Natural Scene Categorization
Amit Prabhudesai
Roll No. 04307002
amitp@ee.iitb.ac.in
M.Tech Thesis Defence
Under the guidance of
Prof. Subhasis Chaudhuri
Indian Institute of Technology, Bombay
Natural Scene Categorization – p.1/32
Overview of topics to be covered
• Natural Scene Categorization: Challenges
• Our contribution
  ◦ Qualitative visual environment description
    • Portable, real-time system to aid the visually impaired
    • System has peripheral vision!
  ◦ Model-based approaches
    • Use of stochastic models to capture semantics
    • pLSA and maximum entropy models
• Conclusions and Future Work
Natural Scene Categorization – p.2/32
Natural Scene Categorization
• Interesting application of a CBIR system
• Images from a broad image domain: diverse and often ambiguous
• Bridging the semantic gap
• Grouping scenes into semantically meaningful categories could aid further retrieval
• Efficient schemes for grouping images into semantic categories
Natural Scene Categorization – p.3/32
Qualitative Visual Environment Retrieval
[Figure: omnidirectional view of an environment with regions labelled SKY, BUILDING, WOODS, LAWN and WATER BODY, partitioned into sectors (RT, FR, LT, LB, RB) around observer positions P1, P2, P3]
• Use of omnidirectional images
• Challenges
  ◦ Unstructured environment
  ◦ No prior learning (unlike navigation/localization)
• Target application and objective
  ◦ Wearable computing community, emphasis on visually challenged people
  ◦ Real-time operation
Natural Scene Categorization – p.4/32
Qualitative Visual Environment System: Overview
• Environment representation
• Environment retrieval
  ◦ View partitioning
  ◦ Feature extraction
  ◦ Node annotation
  ◦ Dynamic node annotation
  ◦ Real-time operation
• Results
Natural Scene Categorization – p.5/32
System Overview (contd.)
• Environment representation
  ◦ Image database containing images belonging to 6 classes: Lawns (L), Woods (W), Buildings (B), Water-bodies (H), Roads (R) and Traffic (T)
  ◦ Moderately large intra-class variance (in the feature space) in images of each category
  ◦ Description relative to the person using the system: e.g., 'to left of', 'in the front', etc.
  ◦ Topological relationships indicated by a graph
  ◦ Each node annotated by an identifier associated with a class
Natural Scene Categorization – p.6/32
System Overview (contd.)
• Environment Retrieval
  ◦ View Partitioning

[Figure: omnidirectional view partitioned into sectors (LT, FR, RT, RB, BS, LB, XX) along the forward and backward directions, alongside the corresponding graphical representation]

  ◦ Feature Extraction
    • Feature invariant to scaling, viewpoint, illumination changes, and the geometric warping introduced by omnicam images
    • Colour histogram selected as the feature for performing CBIR
Natural Scene Categorization – p.7/32
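A global colour histogram with the stated invariances can be sketched as follows; the 8-bins-per-channel RGB quantization is an illustrative assumption, not necessarily the thesis's actual parameterization:

```python
import numpy as np

def colour_histogram(image, bins_per_channel=8):
    """Global colour histogram of an RGB image (H x W x 3, uint8).

    Normalized to sum to 1, so it is invariant to image scale
    (pixel count) and robust to the geometric warping of omnicam
    images, which rearranges pixels but not their colours.
    """
    pixels = image.reshape(-1, 3)
    # Quantize each channel into bins_per_channel levels.
    q = (pixels // (256 // bins_per_channel)).astype(int)
    # Map (r, g, b) bin triples to a single bin index.
    idx = (q[:, 0] * bins_per_channel + q[:, 1]) * bins_per_channel + q[:, 2]
    hist = np.bincount(idx, minlength=bins_per_channel ** 3).astype(float)
    return hist / hist.sum()
```

Two views can then be compared with histogram intersection, `np.minimum(h1, h2).sum()`, which is cheap enough for the real-time budget discussed below.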
System Overview (contd.)
• Environment Retrieval
  ◦ Node annotation
    • Objective: Robust retrieval against illumination changes and intra-class variations
    • Solution: Annotation decided by a simple voting scheme
  ◦ Dynamic node annotation
    • Temporal evolution of graph Gn with time tn
    • Complete temporal evolution of the graph given by G, obtained by concatenating the subgraphs Gn, i.e., G = {G1, G2, . . . , Gk, . . .}
Natural Scene Categorization – p.8/32
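The slide does not spell the voting scheme out; one plausible reading is a k-nearest-neighbour majority vote over the labelled database histograms, sketched below under that assumption (histogram-intersection similarity; the value of k is hypothetical):

```python
import numpy as np
from collections import Counter

def annotate_node(query_hist, db_hists, db_labels, k=5):
    """Annotate a view with the class winning a majority vote among
    the k database histograms most similar to the query
    (similarity = histogram intersection)."""
    sims = np.minimum(db_hists, query_hist).sum(axis=1)
    top_k = np.argsort(sims)[::-1][:k]        # indices of the k best matches
    votes = Counter(db_labels[i] for i in top_k)
    return votes.most_common(1)[0][0]
```

Voting over several matches, rather than taking the single nearest one, is what buys robustness to illumination changes and intra-class variation.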
System Overview (contd.)
• Environment Retrieval
  ◦ Real-time operation
    • Colour histogram: compact feature vector
    • Pre-computed histograms of all the database images
    • Linear time complexity (O(N)): on a P-IV 2.0 GHz, ∼100 ms for a single omnicam image
  ◦ Portable, low-cost system for the visually impaired
    • Modest hardware and software requirements
    • Easily put together using off-the-shelf components
Natural Scene Categorization – p.9/32
System Overview (contd.)
• Results
◦ Cylindrical concentric mosaics
Natural Scene Categorization – p.10/32
System Overview (contd.)
• Results
◦ Still omnicam image
Natural Scene Categorization – p.11/32
System Overview (contd.)
• Results
◦ Omnivideo sequence
[Figure: dynamic node annotation for an omnivideo sequence; per-frame node labels (mostly W, with occasional B and X nodes) plotted along the forward and backward directions for frames 1 to 25, with R and L annotations along the bottom axis]
Natural Scene Categorization – p.12/32
Analyzing our results
• System accuracy: close to 70% – this is not enough!
• Some scenes are inherently ambiguous!
• Often the second-best class is the correct class
• Limitations
  1. Limited discriminating power of the global colour histogram (GCH)
  2. Local colour histogram (LCH) based on tiling cannot be used
  3. Each frame analyzed independently
• Possible solutions
  1. Adding memory to the system
  2. Clustering scheme before computing the similarity measure
Natural Scene Categorization – p.13/32
Method I. Adding memory to the system
• System uses only the current observation in labeling
• Better to use all observations up to the current one
• Desired: a recursive implementation to calculate the posterior (should be able to do it in real time!)
• Hidden Markov Model: parameter estimation using Kevin Murphy's HMM toolkit
• Challenges
  1. Estimation of the transition matrix – a possible solution is to use limited classes
  2. Enormous training data required
Natural Scene Categorization – p.14/32
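The recursive posterior asked for above is the standard HMM forward (filtering) recursion; a minimal sketch, with the transition matrix and per-frame class likelihoods assumed given:

```python
import numpy as np

def hmm_filter(prior, A, likelihoods):
    """Recursive posterior P(state_t | obs_1..t) for an HMM.

    prior:       (K,) initial state distribution
    A:           (K, K) transition matrix, A[i, j] = P(s_t = j | s_{t-1} = i)
    likelihoods: (T, K) per-frame observation likelihoods P(obs_t | state)
    """
    belief = prior.copy()
    posteriors = []
    for lik in likelihoods:
        belief = lik * (A.T @ belief)   # predict with A, then weight by evidence
        belief /= belief.sum()          # normalize to a distribution
        posteriors.append(belief.copy())
    return np.array(posteriors)
```

Each frame is O(K²), so the memory comes essentially for free on top of the per-frame CBIR cost.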
Adding memory. . . (Results)
• Improved confidence in the results; however, negligible improvement in the accuracy
• Reasons for poor performance
  ◦ Limited number of transitions between categories (as opposed to locations)
  ◦ Typical training data for HMMs runs to thousands of labels: difficult to collect such vast data
• Limitation: makes the system dependent on the training sequence
Natural Scene Categorization – p.15/32
Method II. Preclustering the image
• Presence of clutter, images from a broad domain
• Premise: the part of the image indicative of the semantic category forms a distinct part of the feature space
Some test images belonging to the ‘Water-bodies’ category
• Possible solution: segment out the clutter in the scene
Natural Scene Categorization – p.16/32
Preclustering the image. . .
• K-means clustering of the image
• Use only pixels from the largest cluster to compute the colour histogram

Results of K-means clustering on the test images

• Results
  ◦ Accuracy improves significantly – for the 'water-bodies' class, improvement from 25% to about 72%
• Limitations: What about, say, a traffic scene?!
Natural Scene Categorization – p.17/32
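The preclustering step can be sketched with a small self-contained K-means on pixel colours (the cluster count and iteration budget here are illustrative assumptions); the colour histogram is then computed only over the returned mask:

```python
import numpy as np

def largest_cluster_mask(image, k=3, iters=10, seed=0):
    """K-means on pixel colours; return a boolean mask selecting the
    pixels of the largest cluster (presumed to carry the scene's
    semantic category, with smaller clusters treated as clutter)."""
    pixels = image.reshape(-1, 3).astype(float)
    rng = np.random.default_rng(seed)
    centres = pixels[rng.choice(len(pixels), size=k, replace=False)]
    for _ in range(iters):
        # Assign each pixel to its nearest centre.
        d = ((pixels[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        # Recompute each centre as the mean of its members.
        for j in range(k):
            members = pixels[assign == j]
            if len(members):
                centres[j] = members.mean(axis=0)
    biggest = np.bincount(assign, minlength=k).argmax()
    return (assign == biggest).reshape(image.shape[:2])
```

The histogram would then be built from `image[largest_cluster_mask(image)]` instead of the whole frame.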
Model-based approaches
• Stochastic models used to learn semantic concepts from training images
• Use of normal perspective images
• Use of local image features
• Two models examined
  1. probabilistic Latent Semantic Analysis (pLSA)
  2. Maximum entropy models
• Use of the 'bag of words' approach
Natural Scene Categorization – p.18/32
Bag of words approach
• Local features more robust to occlusions and spatial variations
• Image represented as a collection of local patches
• Image patches are members of a learned (visual) vocabulary
• Positional relationships not considered!
• Data representation by a co-occurrence matrix
• Notation
  ◦ D = {d1, . . . , dN}: corpus of documents
  ◦ W = {w1, . . . , wM}: dictionary of words
  ◦ Z = {z1, . . . , zK}: (latent) topic variables
  ◦ N = {n(w, d)}: co-occurrence table
Natural Scene Categorization – p.19/32
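Building the co-occurrence table N = {n(w, d)} from per-image lists of visual-word indices is straightforward; a sketch, assuming the vocabulary size is known:

```python
import numpy as np

def cooccurrence_matrix(documents, vocab_size):
    """Bag-of-words representation: N[w, d] = count of visual word w
    in document (image) d. Patch positions are discarded."""
    N = np.zeros((vocab_size, len(documents)), dtype=int)
    for d, words in enumerate(documents):
        for w in words:
            N[w, d] += 1
    return N
```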
pLSA model . . .
• Generative model
  ◦ select a document d with probability P(d)
  ◦ select a latent class z with probability P(z|d)
  ◦ select a word w with probability P(w|z)
• Joint observation probability
    P(d, w) = P(d) P(w|d), where P(w|d) = Σ_{z∈Z} P(w|z) P(z|d)
• Modeling assumptions
  1. Observation pairs (d, w) generated independently
  2. Conditional independence assumption: P(w, d|z) = P(w|z) P(d|z)
Natural Scene Categorization – p.20/32
pLSA model . . .
• Model fitting
  ◦ Maximize the log-likelihood function
      L = Σ_{d∈D} Σ_{w∈W} n(d, w) log P(d, w)
  ◦ Equivalent to minimizing the KL divergence between the empirical distribution and the model
  ◦ EM algorithm to learn model parameters
• Evaluating the model on unseen test images
  ◦ P(w|z) and P(z|d) learned from the training dataset
  ◦ 'Fold-in' heuristic for categorization: learned factors P(w|z) are kept fixed, mixing coefficients P(z|d_test) are estimated using the EM iterations
Natural Scene Categorization – p.21/32
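A minimal sketch of the EM iterations for pLSA on the co-occurrence table (random initialization and a fixed iteration count are illustrative choices; no tempering or convergence test):

```python
import numpy as np

def plsa_em(N, K, iters=50, seed=0):
    """Fit pLSA by EM. N[w, d] = n(w, d) co-occurrence counts.
    Returns P(w|z) as a (W, K) array and P(z|d) as a (K, D) array."""
    W, D = N.shape
    rng = np.random.default_rng(seed)
    Pw_z = rng.random((W, K)); Pw_z /= Pw_z.sum(axis=0)
    Pz_d = rng.random((K, D)); Pz_d /= Pz_d.sum(axis=0)
    for _ in range(iters):
        # E-step: P(z|d,w) ∝ P(w|z) P(z|d), stored as (W, K, D)
        post = Pw_z[:, :, None] * Pz_d[None, :, :]
        post /= post.sum(axis=1, keepdims=True)
        # M-step: reweight the posteriors by the counts n(w, d)
        weighted = N[:, None, :] * post               # (W, K, D)
        Pw_z = weighted.sum(axis=2)                   # sum over d
        Pw_z /= Pw_z.sum(axis=0, keepdims=True)
        Pz_d = weighted.sum(axis=0)                   # sum over w
        Pz_d /= Pz_d.sum(axis=0, keepdims=True)
    return Pw_z, Pz_d
```

The random initialization is exactly the local-optima issue analyzed later: each run can converge to a different stationary point of the likelihood.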
pLSA model . . .
• Details of the experiment to evaluate the model
  ◦ 5 categories: houses, forests, mountains, streets and beaches
  ◦ Image dataset: COREL photo CDs, images from internet search engines, and personal image collections
  ◦ 100 images of each category
  ◦ Modifications to Rob Fergus's code for the experiments
  ◦ 128-dim SIFT feature used to represent a patch
  ◦ Visual codebook with 125 entries
• Image annotation: z = arg max_i P(z_i | d_test)
Natural Scene Categorization – p.22/32
pLSA model. . . Results
• 50 runs of the experiment, with random partitioning on each run
• Vastly different accuracy on different runs: best case ∼46%, worst case 5%
• Analysis of the results
  ◦ The confusion matrix gives us further insights
  ◦ Most of the labeling errors occur between houses and streets
  ◦ Ambiguity between mountains and forests
Natural Scene Categorization – p.23/32
Results using the pLSA model
Figure: Some images that were wrongly annotated by our system
Natural Scene Categorization – p.24/32
Results of the pLSA model . . .
• Comparison with the naive Bayes’ classifier
Figure: Confusion matrices for the pLSA and naive Bayes models
• 10-fold cross-validation test on the same dataset: mean accuracy ∼66%
Natural Scene Categorization – p.25/32
Analysis of our results
• Reasons for poor performance
  ◦ Model convergence!
  ◦ Local-optima problem in the EM algorithm
  ◦ The optimum value of the objective function depends on the initialized values
  ◦ We initialize the algorithm randomly on each run!
• Possible solution: the deterministic annealing EM (DAEM) algorithm
• Even with DAEM, no guarantee of converging to the globally optimal solution
Natural Scene Categorization – p.26/32
Maximum entropy models
• Maximum entropy prefers a uniform distribution when no data are available
• The best model is the one that:
  1. Is consistent with the constraints imposed by the training data
  2. Makes as few assumptions as possible
• Training dataset: {(x1, y1), (x2, y2), . . . , (xN, yN)}, where xi represents an image and yi represents a label
• Predicate functions
  ◦ Unigram predicate: co-occurrence statistics of a word and a label
      f_{v1, LABEL}(x, y) = 1 if y = LABEL and v1 ∈ x, and 0 otherwise
Natural Scene Categorization – p.27/32
Maximum entropy models . . .
• Notation
  ◦ f: predicate function
  ◦ p(x, y): empirical distribution of the observed pairs
  ◦ p(y|x): stochastic model to be learnt
• Model fitting: the expected value of the predicate function w.r.t. the stochastic model should equal the expected value of the predicate measured from the training data
• Constrained optimization problem
    Maximize H(p) = −Σ_{x,y} p(x) p(y|x) log p(y|x)
    s.t. Σ_{x,y} p(x, y) f(x, y) = Σ_{x,y} p(x) p(y|x) f(x, y)
• Solution: p(y|x) = (1/Z(x)) exp(Σ_{i=1}^{k} λ_i f_i(x, y))
Natural Scene Categorization – p.28/32
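Once the weights λ_i are learned, evaluating p(y|x) with the unigram predicates above reduces to a softmax over labels; a sketch, where the dictionary-of-weights representation keyed by (word, label) is an illustrative assumption:

```python
import math

def maxent_posterior(x_words, labels, lam):
    """p(y|x) = exp(Σ_i λ_i f_i(x, y)) / Z(x) with unigram predicates
    f_{v,L}(x, y) = 1 iff y == L and visual word v appears in image x.
    lam[(v, L)] holds the learned weight of predicate f_{v,L}."""
    present = set(x_words)           # binary predicates: only presence matters
    scores = {y: sum(lam.get((v, y), 0.0) for v in present) for y in labels}
    z = sum(math.exp(s) for s in scores.values())  # partition function Z(x)
    return {y: math.exp(s) / z for y, s in scores.items()}
```

The annotation is then the label with the highest posterior, which also makes the "second-best label" statistic in the results trivial to read off.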
Results for the maximum entropy model
• Same dataset, feature and codebook as used for the pLSA experiment
• Evaluation using Zhang Le's maximum entropy toolkit
• 25-fold cross-validation accuracy: ∼70%
• The second-best label is often the correct label: accuracy improves to 85%
Figure: Confusion matrices for the maximum entropy and naive Bayes models
Natural Scene Categorization – p.29/32
A comparative study
Method                    # of categories   Training images per category   Accuracy (%)
Maximum entropy           5                 50                             70
pLSA                      5                 50                             46
Naive Bayes' classifier   5                 50                             66
Fei-Fei                   13                100                            64
Vogel                     6                 ∼100                           89.3
Vogel                     6                 ∼100                           67.2
Oliva                     8                 250–300                        89

Table: A performance comparison with other studies reported in the literature.
Natural Scene Categorization – p.30/32
Future Work
• Further investigations into the pLSA model
• The issue of model convergence
• The DAEM algorithm is not the ideal solution
• Using a richer feature set, e.g., a bank of Gabor filters
• For maximum entropy models, ways to define predicates that will capture semantic information better
Natural Scene Categorization – p.31/32
THANK YOU
Natural Scene Categorization – p.32/32