Information-theoretic Computer Vision for Autonomous Robots


Boyan Bonev

Robot Vision Group, University of Alicante

November 26th, 2010

Talk at the Max Planck Institute for Biological Cybernetics, Tübingen, Germany

Boyan Bonev (University of Alicante) Information-theoretic Computer Vision November 26th , 2010 1 / 94


Outline

1 Introduction
    The Robot Vision Group
    Research
    Information theory

2 Theories
    The Method of Types

3 Measures
    Mutual information for alignment
    Entropy in visual navigation
    Jensen-Rényi divergence

4 Principles
    Minimum description length
    Minimax entropy

5 Entropy Estimation
    Feature Selection










University of Alicante

The Robot Vision Group - People

The Robot Vision Group - Research

The Robot Vision Group - Platforms

The Robot Vision Group - Mobile

Bench Project: Collaboration with James Coughlan, Smith-Kettlewell Eye Research Institute (California)










Former research (1)

Quadruped walk calibration

Video: http://video/robocup.wmv

Bonev, Cazorla, Martínez (2005) Walk calibration in a four-legged robot. Climbing and Walking Robots, London, U.K.


Former research (2)

Localization

[Figure: four plots titled "Path followed", Y (mm) against X (mm), comparing the ground-truth and estimated paths; one legend entry reads "speed means + errors (odometry)".]

Bonev, Cazorla, Martín, Matellán (2010) Portable autonomous walk calibration for 4-legged robots. Applied Intelligence



Former research (3)

Architecture and robotic tasks

[Figure: robot architecture diagram. Modules: Commander, Perceptual Anchoring Module, Hierarchical Behaviour Module, Global Map, Hierarchical Finite State Machine, Team Communication Module. Layers: lower, middle, higher and communication. Data flows: sensor data, motor commands, local state, global state, behaviours, and messages exchanged with other robots.]

Probability maps

Martínez, Matellán, Cazorla, Saffiotti, Herrero, Martín, Bonev, LeBlanc (2005) Team Chaos description paper. RoboCup (Competition), Osaka, Japan



Former research (4)

Teamwork

University of Murcia, University Rey Juan Carlos (Madrid), University of Alicante, Örebro University (Sweden)




Motivation

From controlled, constrained, laboratory environments

To different indoor/outdoor environments



Ph.D. Thesis

B. Bonev (2010) Feature Selection based on Information Theory

Supervised by M. Cazorla and F. Escolano

Estimation of mutual information

To optimize (for classification) a high-dimensional set of features:
    Image filters
    Spectral graph features
    Genes

[Figure: microarray preparation diagram (sample RNA, reference cDNA, fluorescent labeling, combination, hybridization) and a heat map titled "MD Feature Selection", showing the number of each selected gene against the class (disease) of each sample.]




Introduction (1)

Information Theory

Specifies how to encode data which obey a probability distribution so that they can be transmitted and then decoded.

Cover and Thomas (1991) Elements of Information Theory. Wiley-Interscience

Information Theory in Computer Vision

    Encoding is performed by light rays reflected off the objects in the scene.
    Depends on the reflectance properties, spatial locations, light sources: encoding is out of our control.
    We can look for common structures or models.

Yuille (2010) An information theory perspective on computational vision. Front. Electr. Electron. Eng. China

[Image: M.C. Escher, Three Worlds]


Introduction (2)

Escolano, Suau, Bonev (2009) Information Theory in Computer Vision and Pattern Recognition. Springer

Introduction (3)

A glance at IT in several computer vision and autonomous robotics tasks. Classification of the topics along four dimensions:



The Method of Types

The method of types (Csiszár and Körner)
    Partition the n-sized samples into classes according to their type (empirical distribution).
    There are only a polynomial number of types (wrt n).
    There are an exponential number of samples of each type.

The sequence {2, 2, 6} has the type P(2) = 2/3, P(6) = 1/3, P(1) = P(3) = P(4) = P(5) = 0.

The type class of P is the set of all sequences of length 3 with two 2s and one 6: T(P) = {226, 262, 622}

For samples drawn i.i.d. according to a distribution Q
    The probability of each type class depends exponentially on the relative entropy distance between the type P and the distribution Q.
    Thus, type classes that are far from the true distribution have exponentially smaller probability.
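
The {2, 2, 6} example can be checked numerically. The following is a minimal Python sketch, not code from the talk: the helper names (`type_of`, `type_class`, `kl_divergence`) are hypothetical, and Q is assumed here to be a fair six-sided die. It enumerates the type class and compares its exact probability under Q with the standard upper bound 2^(-n D(P||Q)) given in Cover and Thomas.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import log2, prod

def type_of(seq, alphabet):
    """Empirical distribution (type) of a sequence over a finite alphabet."""
    counts = Counter(seq)
    return {a: Fraction(counts[a], len(seq)) for a in alphabet}

def type_class(seq):
    """All distinct sequences that share the type of seq."""
    return sorted(set(permutations(seq)))

def kl_divergence(p, q):
    """Relative entropy D(p || q) in bits; symbols with p(a) = 0 contribute 0."""
    return sum(float(pa) * log2(float(pa) / float(q[a]))
               for a, pa in p.items() if pa > 0)

alphabet = range(1, 7)
q = {a: Fraction(1, 6) for a in alphabet}  # assumption: Q is a fair die
seq = (2, 2, 6)

p = type_of(seq, alphabet)
members = type_class(seq)

# Under i.i.d. sampling every member of the type class is equally likely.
exact = len(members) * prod(q[s] for s in seq)
# Upper bound on the probability of the type class: 2^(-n D(P||Q)).
bound = 2 ** (-len(seq) * kl_divergence(p, q))

print(p[2], p[6])   # 2/3 1/3
print(members)      # [(2, 2, 6), (2, 6, 2), (6, 2, 2)]
print(float(exact), bound)
```

The exact probability stays below the bound, and the bound shrinks exponentially as D(P||Q) grows, which is the behaviour the slide describes.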



Filtering

Finding a threshold to discard all points x whose relative entropy at s_max falls below that threshold.

Pixel filtering with increasing threshold values.

Finding the threshold is an image-dependent task. Exploit the method of types to ensure the best filtering.

Bonev, Escolano, Lozano, Suau, Cazorla, Aguilar (2007) Constellations and the Unsupervised Learning of Graphs. 6th IAPR-TC-15 Workshop on Graph-based Representations in Pattern Recognition. Alicante, Spain

Optimal filtering

    P_{on} is the pdf of the relative entropy given that a point is part of the salient regions.
    P_{off} is the pdf of the relative entropy given that a point is part of the discarded regions.


Chernoff information

The Chernoff information

    C(P_{on}, P_{off}) = -\min_{0 \le \lambda \le 1} \log \sum_{j=1}^{J} P_{on}^{\lambda}(x_j) \, P_{off}^{1-\lambda}(x_j),

where x_j is the histogram bin j, measures how discriminable P_{on} and P_{off} are.

    The expected error rate of the likelihood test \log (P_{on} / P_{off}) < T decreases exponentially wrt C(P_{on}, P_{off}).
    T is bounded by the Kullback-Leibler divergences: -D(P_{off} || P_{on}) < T < D(P_{on} || P_{off})
    A low C(P_{on}, P_{off}) means that the images in the set are too heterogeneous, so fewer points will be discarded.
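
As a rough illustration of the quantities on this slide, the sketch below computes the Chernoff information between two histograms by a simple grid search over the exponent, together with the two Kullback-Leibler divergences that bound the threshold T. It is a sketch under stated assumptions rather than the authors' implementation: the function names, the grid search and the two example histograms are all hypothetical, and natural logarithms are used.

```python
import math

def chernoff_information(p_on, p_off, steps=1000):
    """Chernoff information (in nats) between two histograms.

    The minimisation over the exponent in [0, 1] is done by grid search,
    which is an assumption of this sketch.
    """
    best = float("inf")
    for k in range(steps + 1):
        lam = k / steps
        # Bins where either histogram is empty contribute nothing.
        total = sum(a ** lam * b ** (1.0 - lam)
                    for a, b in zip(p_on, p_off) if a > 0 and b > 0)
        if total == 0.0:
            return float("inf")  # disjoint supports
        best = min(best, math.log(total))
    return -best

def kl_divergence(p, q):
    """D(p || q) in nats for histograms with strictly positive bins in q."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

# Hypothetical three-bin histograms of the relative entropy.
p_on = [0.1, 0.2, 0.7]
p_off = [0.6, 0.3, 0.1]

c = chernoff_information(p_on, p_off)
lower = -kl_divergence(p_off, p_on)
upper = kl_divergence(p_on, p_off)
print(c, lower, upper)
```

Any threshold T chosen for the likelihood test has to lie between `lower` and `upper`; a larger `c` indicates that the two histograms are easier to tell apart.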

Video: http://video/trayectoria.avi

Different thresholds

Environment     C(P_{on}, P_{off})   % Points
office          0.2434               33.63%
corridor#1      0.4491               38.58%
corridor#2      0.4223               36.69%
hall            0.2732               34.46%
entrance        0.1405               29.17%
trees-avenue    0.2279               43.00%

Lozano, Escolano, Bonev, Suau, Aguilar, Sáez, Cazorla (2008) Region and constellations based categorization of images with unsupervised graph learning. Image and Vision Computing









Image alignment


Quadcopter video. Vertical camera video.
http://video/videofer.mp4
http://video/videowarping.mp4

Local and global approaches

    Based on local features: SURF, SIFT, saliency, ... (a problem if there are no features or there is noise)
    Based on the global appearance: correlation, mutual information, entropy, ... (time-consuming)


Conditional entropy and mutual information


Among the space of transformations, find a transformation T which maximizes some measure of the alignment between T(I_2) and I_1.

    Conditional entropy: \arg\min_T H(T(I_2) | I_1) (self-predictability problem)
    Mutual information: \arg\max_T I(T(I_2); I_1) = \arg\max_T \{ H(T(I_2)) + H(I_1) - H(T(I_2), I_1) \}

[Figure: Venn diagram relating the entropies H(X), H(Y), H(Z), the conditional entropies H(X|Y,Z), H(Y|X,Z), H(Z|X,Y), the joint entropy H(X,Y,Z) and the mutual information I(X,Y,Z).]
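
To make the alignment criterion concrete, here is a small Python sketch that estimates mutual information from a joint histogram and searches a set of candidate transformations for the one that maximizes it. Everything specific in it is an assumption made for illustration: the "images" are one-dimensional integer signals, the transformations are circular shifts, the second image is an intensity-inverted, shifted copy of the first, and the function names are hypothetical. The `bins` parameter is the number of histogram bins.

```python
import math
from collections import Counter

def entropy(labels):
    """Plug-in Shannon entropy (bits) of a list of discrete labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def quantize(image, bins, max_value=256):
    """Map integer intensities in [0, max_value) to histogram bin indices."""
    return [v * bins // max_value for v in image]

def mutual_information(img_a, img_b, bins=8):
    """I(A; B) = H(A) + H(B) - H(A, B), estimated from the joint histogram."""
    a, b = quantize(img_a, bins), quantize(img_b, bins)
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))

def shift(image, s):
    """Circular shift to the right by s samples (s may be negative)."""
    s %= len(image)
    return image[-s:] + image[:-s] if s else list(image)

# Synthetic reference signal and a shifted, intensity-inverted copy of it.
i1 = [(37 * i) % 256 for i in range(64)]
true_shift = 3
i2 = [255 - v for v in shift(i1, -true_shift)]

# Score every candidate transformation T (here: shifts 0..10) by I(T(I2); I1).
scores = {s: mutual_information(shift(i2, s), i1) for s in range(11)}
best_shift = max(scores, key=scores.get)
print(best_shift, scores[best_shift])
```

Because mutual information only depends on the joint statistics of the intensities, the inverted copy still scores highest when it is correctly aligned, which a correlation measure would not guarantee.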


The histogram-binning problem


Joint histogram

[Figure: joint histogram of the intensities x and y of two images X and Y.]

A high number of bins gives a sparse histogram that is sensitive to noise.

[Figure: histograms with 10, 50 and 255 bins, without noise and with noise.]


The isocontours method (1)


Image considered as a continuous surface, divided into Q iso-intensity lines.

[Figure: iso-intensity lines of an image and five panels, (a) to (e).]

    a) Subpixel interpolation
    b) Iso-intensities intersect inside (vote)
    c) Iso-intensities intersect outside
    d) Iso-intensities are parallel
    e) Intersection area of iso-surfaces

Rajwade, Banerjee, Rangarajan (2009) Probability density estimation using isocontours and isosurfaces: Application to information theoretic image registration. IEEE Transactions on Pattern Analysis and Machine Intelligence



Classical, point-counting and area-based.

[Figure: two surface plots titled "POINT COUNTING" and "AREA BETWEEN ISOCONTOURS".]










Omnidirectional camera


Basic skills for topological navigation in a structured world: finding the direction and avoiding obstacles


Entropy for finding the direction (1)


[Figure: "Entropy approximation": entropy against angle (0 to 360 degrees), showing the entropy and its 2nd-order Fourier series approximation.]
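
The plot pairs an entropy-per-angle profile with its 2nd-order Fourier series approximation. The sketch below shows one way such a smoothing could be computed; it is an illustration, not the method of the cited paper. The function names and the synthetic entropy profile are hypothetical, and taking the maximum of the smoothed curve as the direction is an assumption of this sketch.

```python
import math

def fourier_fit(values, order=2):
    """Least-squares Fourier series of the given order for samples taken
    on a uniform grid of angles over [0, 2*pi)."""
    n = len(values)
    angles = [2 * math.pi * i / n for i in range(n)]
    mean = sum(values) / n
    coeffs = []
    for k in range(1, order + 1):
        a = 2 / n * sum(v * math.cos(k * t) for v, t in zip(values, angles))
        b = 2 / n * sum(v * math.sin(k * t) for v, t in zip(values, angles))
        coeffs.append((k, a, b))

    def approximation(theta):
        return mean + sum(a * math.cos(k * theta) + b * math.sin(k * theta)
                          for k, a, b in coeffs)

    return approximation

def peak_direction(values, order=2, resolution=360):
    """Angle (degrees) at which the smoothed profile is largest."""
    approx = fourier_fit(values, order)
    best = max(range(resolution),
               key=lambda j: approx(2 * math.pi * j / resolution))
    return 360.0 * best / resolution

# Synthetic profile: a smooth bump centred at 90 degrees plus a
# high-frequency ripple that the low-order fit filters out.
entropies = [0.5 + 0.3 * math.cos(2 * math.pi * i / 36 - math.pi / 2)
             + (0.05 if i % 2 else -0.05) for i in range(36)]

print(peak_direction(entropies))
```

Keeping only the first two harmonics acts as a low-pass filter, so a single noisy angular sector does not decide the estimated direction.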




Indoor and outdoor results.

[Figure: two plots titled "Entropy-based direction estimation", angle (degrees) against # frames, comparing the estimated direction with the desired direction.]

Bonev, Cazorla, Escolano (2007) Robot Navigation Behaviors based on Omnidirectional Vision and Information Theory. Journal of Physical Agents


Obstacle avoidance


Visual sonars based on gradient

[Figure: visual sonar diagram with the labels maxD, v_k, f_k and f*.]

Video: http://video/navigation.avi

Bonev, Cazorla, Escolano (2007) Robot Navigation Behaviors based on Omnidirectional Vision and Information Theory.





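The Jensen-Rényi divergence

Jensen-Rényi divergence

    JR_{\alpha}^{\omega}(p_1, \dots, p_n) = H_{\alpha}\left( \sum_{i=1}^{n} \omega_i p_i \right) - \sum_{i=1}^{n} \omega_i H_{\alpha}(p_i)

    p_1, p_2, \dots, p_n are n probability distributions
    H_{\alpha}(p) is the Rényi entropy of order \alpha
    \omega = (\omega_1, \omega_2, \dots, \omega_n) is a weight vector satisfying \sum_{i=1}^{n} \omega_i = 1 with \omega_i \ge 0

    Symmetric
    n weighted distributions
    Robust to noise

Rényi entropy

    H_{\alpha}(X) = \frac{1}{1 - \alpha} \log \sum_{i=1}^{n} x_i^{\alpha}

[Figure: Rényi entropy H_{\alpha}(p) plotted against p for \alpha = 0, 0.2, 0.5, 2 and 10, together with the Shannon entropy.]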
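
The two definitions on this slide translate almost directly into code. The following Python sketch is illustrative only: the function names are hypothetical, logarithms are taken in base 2, and the order-1 case is handled by falling back to the Shannon entropy, which is the limit of the Rényi entropy.

```python
import math

def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha (bits) of a discrete distribution p.

    For alpha = 1 the Shannon entropy (the limiting case) is returned.
    """
    if abs(alpha - 1.0) < 1e-12:
        return -sum(x * math.log2(x) for x in p if x > 0)
    return math.log2(sum(x ** alpha for x in p if x > 0)) / (1.0 - alpha)

def jensen_renyi(distributions, weights, alpha):
    """Jensen-Renyi divergence: entropy of the weighted mixture minus the
    weighted sum of the individual entropies."""
    size = len(distributions[0])
    mixture = [sum(w * p[j] for w, p in zip(weights, distributions))
               for j in range(size)]
    return renyi_entropy(mixture, alpha) - sum(
        w * renyi_entropy(p, alpha) for w, p in zip(weights, distributions))

# Two maximally different histograms and two identical ones.
print(jensen_renyi([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5], alpha=0.5))
print(jensen_renyi([[0.3, 0.7], [0.3, 0.7]], [0.5, 0.5], alpha=0.5))
```

The divergence is zero when all the distributions coincide and grows as they separate, which is what makes it usable as a boundary detector between two windows of images.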


Trajectory segmentation (1)


Segment a sequence of images based on their distributions of low-level filters. Useful for topological localization and navigation.

[Figure: two adjacent windows W1 and W2 slide over the data, from Region A into Region B. The J-R divergence between them takes the values 0.4, 0.5, 0.6, 0.7, 0.6, 0.5, 0.4 along the sequence.]

What window size?

Bonev, Cazorla (2010) Large scale environment partitioning in mobile robotics recognition tasks.



Trajectory segmentation (2)


[Figure: "JR divergence at various window sizes": JR divergence against # image (220 to 280) and window size (0 to 30), with images #241, #254 and #273 marked.]

Discriminative segmentation

Bonev, Cazorla (2010) Large scale environment partitioning in mobile robotics recognition tasks.



Localization


[Figure: "Single similarity function response" H, # reference image against # test image; "Similarity functions responses" for H1 to H20, # reference image against # test image; "Particle filter, iterations 1, 5, 8, 11", likelihood against particle position (# reference image).]


Other IT measures


    Henze-Penrose divergence. Based on the Friedman-Rafsky test (using spanning trees).
    Symmetrized Kullback-Leibler divergence (using k-NN).
    Jensen-Tsallis \alpha-divergence (using k-NN).
    Symmetrized and normalized entropy square variation (using k-NN).
    Total variation divergence (using k-d partitions).

Escolano, Lozano, Bonev, Suau (2010) Bypass information-theoretic shape similarity from non-rigid points-based alignment. Workshop on Non-Rigid Shape Analysis and Deformable Image Alignment (NORDIA), in conjunction with CVPR.






Minimum description length


Minimum description length (MDL) principle
    Formalization of Occam's razor.
    The best hypothesis is the one that leads to the best compression of the data.
    A tradeoff between the complexity of the hypothesis and the complexity of the data given the hypothesis (avoids overfitting).

The MDL principle

For any probability distribution P, it is possible to construct a code C such that C(x) is -\log_2 P(x) bits long.
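
The code-length statement can be illustrated with Shannon code lengths, that is, the negative base-2 logarithm of each probability rounded up to a whole number of bits. The snippet below is a small sketch with a hypothetical four-symbol distribution; it checks the Kraft inequality, which guarantees that a prefix code with those lengths exists, and compares the expected code length with the entropy.

```python
import math

def code_lengths(distribution):
    """Shannon code lengths ceil(-log2 P(x)) for every symbol with P(x) > 0."""
    return {x: math.ceil(-math.log2(p))
            for x, p in distribution.items() if p > 0}

# Hypothetical source distribution.
P = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

lengths = code_lengths(P)
kraft_sum = sum(2.0 ** -l for l in lengths.values())
expected_length = sum(P[x] * lengths[x] for x in P)
entropy = -sum(p * math.log2(p) for p in P.values())

print(lengths)                    # {'a': 1, 'b': 2, 'c': 3, 'd': 3}
print(kraft_sum)                  # 1.0
print(expected_length, entropy)   # 1.75 1.75
```

For this dyadic distribution the expected length equals the entropy exactly; in general the rounding costs less than one extra bit per symbol.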


Example: the EBEM algorithm


Expectation-Maximization (EM) schemes have the model order selection problem.

Example: the Entropy-based EM for Gaussian mixtures (generative approach)
    No initialization problem: starts with one Gaussian kernel, with parameters given by the sample.
    Divides the kernel whose data is not Gaussian enough.
    Need for a stopping criterion: otherwise the maximum likelihood happens when each data point is described by one kernel (overfitting).


EBEM iterations and MDL

Model order selection: How many kernels are we looking for? We cannot establish a threshold without knowledge about the data.

Escolano, Peñalver, Bonev (2010) Entropy-based Variational Scheme for Fast Bayesian Learning of Gaussian Mixtures. Statistical, Structural and Syntactic Pattern Recognition. Çeşme, Turkey


MDL for model order selection

The optimal instance of a model of any order M is the one that minimizes L(D|M) + L(M).

The problem usually is how to estimate the model and code lengths. In the EBEM case:
    L(D|M) is given by the likelihood of the data D given the model M
    L(M) depends on the number of parameters of the mixtures
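
A minimal sketch of how such a two-part score can rank mixtures of different order follows. It is not the EBEM implementation: the model cost is assumed to take the common form of half the number of free parameters times the logarithm of the sample size, the log-likelihood values are made up for illustration, and the function names are hypothetical.

```python
import math

def gmm_num_params(k, d):
    """Free parameters of a k-component, d-dimensional Gaussian mixture with
    full covariances: k - 1 weights, k*d means, k*d*(d+1)/2 covariance terms."""
    return (k - 1) + k * d + k * d * (d + 1) // 2

def description_length(log_likelihood, k, d, n):
    """Two-part code length in nats: L(D|M) + L(M)."""
    data_cost = -log_likelihood                             # L(D|M)
    model_cost = 0.5 * gmm_num_params(k, d) * math.log(n)   # L(M), assumed form
    return data_cost + model_cost

# Hypothetical log-likelihoods of mixtures fitted with 1, 2 and 3 kernels
# to n = 200 one-dimensional samples.
log_likelihoods = {1: -500.0, 2: -420.0, 3: -418.0}
n, d = 200, 1

scores = {k: description_length(ll, k, d, n)
          for k, ll in log_likelihoods.items()}
best_k = min(scores, key=scores.get)
print(best_k)   # 2
```

The third kernel improves the likelihood slightly, but not by enough to pay for its extra parameters, so the score stops the splitting at two kernels.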

Video: http://video/ebem1.wmv

Other approaches

Other approaches related to MDL:
    AIC, An Information Criterion of Akaike.
    BIC, Bayesian Information Criterion of Schwarz.
    MML, Minimum Message Length of Wallace.

Alternatives to MDL:
    Variational EM algorithms which do not need a stopping criterion.

Video: http://video/ebem3.wmv

Example of EBEM segmentation of the colour space.

Peñalver, Escolano, Sáez (2009) Learning Gaussian Mixture Models with Entropy Based Criteria. IEEE Transactions on Neural Networks




The Maximum Entropy principle


Maximum Entropy principle

When learning a probability distribution from data, the most unbiased (neutral) hypothesis is the distribution with maximum entropy which satisfies the expectation constraints on the data's statistics.

    p^*(\xi) = \arg\max_{p(\xi)} -\int p(\xi) \log p(\xi) \, d\xi
    s.t. \int p(\xi) G_i(\xi) \, d\xi = E(G_i(\xi)) = \alpha_i, \quad i = 1, \dots, m
         \int p(\xi) \, d\xi = 1

A way to find the hypothesis with the fewest assumptions (maximize generalization).
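
A small discrete example may help: the maximum entropy distribution over the faces of a die when only its mean is known. With a single expectation constraint, the solution obtained with Lagrange multipliers has the exponential form p_i proportional to exp(lambda * x_i), so it is enough to find the multiplier that matches the mean. The sketch below does this by bisection; the function name, the bisection bracket and the die example are assumptions made for illustration.

```python
import math

def maxent_with_mean(values, target_mean, tol=1e-12):
    """Maximum entropy distribution over `values` with a prescribed mean.

    The solution is p_i proportional to exp(lam * x_i); lam is found by
    bisection because the mean increases monotonically with lam.
    """
    def distribution(lam):
        shift = max(lam * v for v in values)   # avoids overflow in exp
        weights = [math.exp(lam * v - shift) for v in values]
        z = sum(weights)
        return [w / z for w in weights]

    def mean(lam):
        return sum(p * v for p, v in zip(distribution(lam), values))

    lo, hi = -50.0, 50.0                       # assumed search bracket
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return distribution((lo + hi) / 2)

faces = [1, 2, 3, 4, 5, 6]
fair = maxent_with_mean(faces, 3.5)
loaded = maxent_with_mean(faces, 4.5)
print([round(p, 4) for p in fair])
print([round(p, 4) for p in loaded])
```

With the unconstrained mean of 3.5 the result is the uniform distribution; asking for a mean of 4.5 tilts the probabilities towards the high faces without adding any other structure.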


FRAME


Filters, Random Fields and Maximum Entropy (Zhu, Wu, Mumford)

    A statistical theory for texture modeling.
    Textures are modeled by a general filter bank f_i, i = 1, ..., m.
    Generative approach: a pdf of filters is learnt, so textures can be synthesized by sampling the pdf.

The maximum entropy principle is used to learn the pdf:
    Estimate the marginal distributions of f(I) by applying the filters to the texture.
    Derive a maximum entropy distribution p(I) subject to having the same marginal distributions.
    Select a set S_m of filters by filter pursuit through minimax entropy.

Zhu, Wu, Mumford (1997) Minimax entropy principle and its applications to texture modeling. Neural Computation


The Mini-Max Entropy principle

Filter pursuit (incremental feature selection)
    Select the filter which changes the distribution the most (the least redundant with the already selected filters).

Minimax principle:
    The optimal set of filters should be chosen to minimize the Kullback-Leibler divergence between the filters' marginals of the original texture and the synthesized texture.
    As f(I) is fixed, S_m is chosen such that p(I; \Lambda_m, S_m) has the minimum entropy. Thus,

    S_m^* = \arg\min_{S_m \subset S} \max_{\Lambda_m} H(p(I; \Lambda_m, S_m))

Zhu, Wu, Mumford © 1997 MIT Press






Entropy in Information Theory


Entropy is a measure of the disorder or amount of information.
Entropy is related to predictability, with several interpretations:
    Measure of the amount of information that an event provides
    Measure of the uncertainty in the outcome of an event
    Measure of the dispersion in the probability distribution


The problem


Multivariate entropy is difficult to estimate with a high number of dimensions (more than 2).

Points (n=9) following a Gaussian distribution in 1D, 2D and 3D.

Data points get exponentially sparser

Computational issues of the estimation algorithms


Entropy estimation approaches

Entropy estimation

plug-in: Estimate the density p(x) underlying the samples X and plug it into the formula H(X) = -\sum_x p(x) \log p(x)
    Parzen's window with variable kernel width [Viola, 1997], wavelet density estimation [Peter and Rangarajan, 2008]
    Estimation degrades exponentially wrt # dimensions

bypass: Bypass the density estimation and estimate entropy directly from the data, based on:
    Entropic spanning graphs [Hero and Michel, 2002], nearest neighbors [Leonenko et al., 2008], k-d partitioning [Stowell and Plumbley, 2009]


Bypass entropy estimation


Rényi entropy estimator [Hero and Michel, 2002]
    Estimates H_{\alpha}(X) from the length of the minimal spanning tree of the samples
    Rényi entropy has a discontinuity at \alpha = 1
    In the limit it is the Shannon entropy: \lim_{\alpha \to 1} H_{\alpha}(X) = H(X)

[Figure: Rényi entropy H_{\alpha}(p) plotted against p for \alpha = 0, 0.2, 0.5, 2 and 10, together with the Shannon entropy.]




Shannon entropy from the Rényi entropy estimates [Peñalver et al., 2009]
    Find an \alpha value close to 1 to approximate the value of the Shannon entropy
    This \alpha value depends on the number of dimensions, the number of samples and some parameters which depend on the nature of the data and have to be set experimentally.




Shannon entropy from k-d partitioning [Stowell and Plumbley, 2009]
    Estimation depends on the number of samples in each partition and its volume
    Upper and lower limits have to be set



Shannon entropy from the distances of the k-nearest neighbors graph [Leonenko et al., 2008]
    Estimation depends on N, d, and the distances between each sample and its k-NN.

[Figure: 3-NN graphs of two different datasets, for which the result is the same; MSTs of the same two datasets, for which the result is different.]
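
The sketch below shows a nearest-neighbour entropy estimate of the kind this slide refers to: it uses only N, d and the distance from each sample to its k-th nearest neighbour. It is written from the commonly cited Kozachenko-Leonenko form (logarithm of N - 1, minus the digamma function at k, plus the log-volume of the unit ball, plus d times the average log-distance) and should be read as an illustration rather than the exact estimator used in the talk. The brute-force neighbour search, the function names and the Gaussian test data are assumptions.

```python
import math
import random

def digamma_int(k):
    """Digamma function at a positive integer k."""
    euler_gamma = 0.5772156649015329
    return -euler_gamma + sum(1.0 / j for j in range(1, k))

def knn_entropy(samples, k=3):
    """k-NN estimate of the Shannon entropy (nats) of d-dimensional samples.

    samples is a list of equal-length tuples. Coincident samples would give
    a zero distance and break the logarithm.
    """
    n, d = len(samples), len(samples[0])
    unit_ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    log_dist_sum = 0.0
    for i, x in enumerate(samples):
        # Brute-force search for the distance to the k-th nearest neighbour.
        dists = sorted(math.dist(x, y)
                       for j, y in enumerate(samples) if j != i)
        log_dist_sum += math.log(dists[k - 1])
    return (math.log(n - 1) - digamma_int(k) + math.log(unit_ball)
            + d * log_dist_sum / n)

random.seed(0)
samples = [(random.gauss(0.0, 1.0),) for _ in range(400)]
estimate = knn_entropy(samples)
exact = 0.5 * math.log(2 * math.pi * math.e)   # entropy of N(0, 1) in nats
print(estimate, exact)
```

No bandwidth or bin width has to be tuned; the only free choice is k, which matches the slide's remark that the estimator needs no additional parameters.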


Comparison on Gaussian distributions




Comparison


    The k-NN-based estimator is a good alternative to the MST-based estimation (does not need calibration of parameters).
    The k-NN-based estimator may fail with very separated modes.
    The k-NN-based estimator performs well for distributions in R^d.
    The k-d partitioning estimator performs better for distributions with a finite support.
    The k-d partitioning estimator tends to underestimate entropy, while the k-NN-based estimator tends to overestimate it.






Feature selection in supervised classification

Feature selection: discarding from all samples those features or variables which are less useful for some purpose, e.g. classification.

Example: wrapper feature selection for supervised classification.

[Figure: wrapper scheme. M images give M vectors of N_F features (all features); the selection step gives M vectors of N_S features (selected features); 10-fold CV (train, test) yields the error used to search for the best feature set.]

Motivation:
    Datasets with thousands of features
    Dependencies among features


Feature dependencies


Univariate MI does not capture the interactions among features.

Figure by Guyon and Elisseeff, 2003
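
The classic XOR case makes the point in a few lines. In the Python sketch below (an illustration with hypothetical names, using plug-in mutual information on discrete data rather than the k-NN estimator discussed in the talk), each feature alone carries no information about the class, while the pair determines it completely.

```python
import math
from collections import Counter

def entropy(labels):
    """Plug-in Shannon entropy (bits) of a list of discrete labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def mutual_information(features, classes):
    """Plug-in I(S; C) = H(S) + H(C) - H(S, C) for discrete data."""
    return (entropy(features) + entropy(classes)
            - entropy(list(zip(features, classes))))

# Class = XOR of two binary features, all four combinations equally likely.
x1 = [0, 0, 1, 1] * 4
x2 = [0, 1, 0, 1] * 4
c = [a ^ b for a, b in zip(x1, x2)]

mi_x1 = mutual_information(x1, c)
mi_x2 = mutual_information(x2, c)
mi_joint = mutual_information(list(zip(x1, x2)), c)
print(mi_x1, mi_x2, mi_joint)   # 0.0 0.0 1.0
```

A ranking based on univariate MI would discard both features, whereas a criterion that evaluates the joint set keeps them.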


MI-based criterion

Filter feature selection criterion:
    Select the feature which, in combination with the rest, provides more information about the class (no independence assumptions).
    Performs well with thousands of dimensions.
    I(S; C) = H(S) - H(S|C)

k-NN-based estimator:
    small sample performance
    distributions in R^d
    no additional parameters

(Discriminative approach.)

[Figure: diagram of I(X; C) for features x1 to x4 and class C; plot "MD Feature Selection on Microarray" showing the LOOCV error (%) and I(S; C) against # features (0 to 160).]


Application to visual localization

Approach
    Select from a set of general-purpose filters
    Environment independence

Bank of general, low-level filters
    Nitzberg
    Canny
    Horizontal Gradient
    Vertical Gradient
    Gradient Magnitude
    12 Color Filters H_i, 1 \le i \le 12

(Feature independence cannot be assumed)


Feature extraction

Rings: distance dependent, orientation independent


[Figure: each image Img_i is divided into C = 4 rings and filtered with a bank of K = 17 filters, giving C x K = 68 histograms of 4 bins each.]

Feature vector: N = C x K x (B-1) = 204 features


Nearest neighbors (1)

[Figure: "Confusion Trajectory for Fine Localization (P=10NN)", NN # (0 to 800) against Test Image # (0 to 500).]

Escolano, Bonev, Suau, Aguilar, Frauel, Sáez, Cazorla (2007) Contextual visual localization: cascaded submap classification, optimized saliency detection, and fast view matching. IEEE International Conference on Intelligent Robots and Systems. San Diego, California, USA


Nearest neighbors (2)

[Figure: test images shown next to their 1st to 5th nearest neighbors.]

Application to microarray analysis


[Figure: microarray preparation: sample RNA and reference cDNA, fluorescent labeling, combination, hybridization.]

    To predict the tumor class, based on the microarray analysis of a patient.
    To identify a reduced set of genes which are related to the diseases.

Gene selection on the NCI dataset

[Figure: two heat maps (colour scale 0 to 0.9) of the genes selected by MD feature selection and by mRMR feature selection. Rows are the samples, labelled by class (disease); columns are the numbers of the selected genes.]

Datasets and results

    NCI: 60 samples, 14 classes, 6380 features. 10.94% LOOCV error (39 selected)
    Leukemia: 38 + 34 samples, 2 classes, 6817 features. 2.94% test error (7 selected)
    Colon: 62 samples, 2 classes, 2000 features. 0% LOOCV error (15 selected)
    CNS Embryonal: 60 samples, 2 classes, 7129 features. 1.67% LOOCV error (9 selected)
    Prostate: 102 + 43 samples, 2 classes, 12600 features. 5.88% test error (5 selected)

Successful feature selection (outperforming state-of-the-art results) on data sets with
    A low number of samples and a high number of features
    High-order dependencies among informative features

Bonev, Escolano, Cazorla (2008) Feature selection, mutual information, and the classification of high-dimensional patterns. Pattern Analysis and Applications

3D Object classification



Graph extraction

Graphs from 3D shapes (SHREC database)


Extended Reeb graphs: global topological structure; local features identified by a function:
    a) geodesic distance
    b) mass center
    c) center of the circumscribing sphere

Bonev, Escolano, Giorgi, Biasotti (2010) High-dimensional Spectral Feature Selection for 3D Object Recognition based on Reeb Graphs. Statistical, Structural and Syntactic Pattern Recognition. Çeşme, Turkey

Unattributed graphs classification

Spectral features

[Figure: pipeline. For each 3D shape (sample with its class, C1 to C15), three graphs are extracted (sphere, barycenter, geodesic). For each graph the following features are computed: complexity flow, Fiedler vector, adjacency spectrum, degrees, Perron-Frobenius, normalized Laplacian spectrum, node centrality and commute times. They are histogrammed with 2, 4, 6 and 8 bins into one feature vector per sample, followed by feature selection.]

[Figure: "Mutual Information Feature Selection": 10-fold CV error (%) and mutual information against # features (0 to 500), with a marked point at 222 features and 23.33% error.]

[Figure: "Statistics for the first 222 selected features", broken down by feature type, graph type and number of bins.]

Feature analysis

[Figure: "Proportion of features during selection": share of each feature type (Commute Times 1, Commute Times 2, Node Centrality, N. Laplacian Spectrum, Perron-Frobenius, Degrees, Adjacency Spectrum, Fiedler Vector, Complexity Flow) and of each graph type (Geodesic, Barycenter, Sphere) against # features (0 to 500).]

Summary


Information theory as a theoretical framework and a set of tools which help many decoding tasks in computer vision. Some of them:
    Method of Types: error bound analysis
    Measures (entropy, mutual information, divergences): alignment, regions of interest, segmentation, etc.
    Minimum description length: model order selection, avoid overfitting
    Maximum entropy: the most unbiased hypothesis
    Mutual information: feature selection
    IT in both generative and discriminative approaches

Conclusions

Entropy estimation
    the cornerstone of many information-theoretical implementations
    some complexity issues depending on the estimation method
    bypass estimation advances have motivated the use of IT in CVPR

Information theory in CV for robotic tasks
    towards environment-independent applications
    treat images as general information provided to solve a task
    deal with noise and useless information
    minimize the number of assumptions
    challenges related to computational cost (even with low complexity in some cases)

"Everything should be made as simple as possible, but no simpler."

References


A. Yuille (2010) An information theory perspective on computational vision. Front. Electr. Electron. Eng. China

F. Escolano, P. Suau, B. Bonev (2009) Information Theory in Computer Vision and Pattern Recognition. Springer

Peñalver, Escolano, Sáez (2009) Learning Gaussian Mixture Models with Entropy Based Criteria. IEEE Transactions on Neural Networks

B. Bonev, F. Escolano, M. Cazorla (2008) Feature selection, mutual information, and the classification of high-dimensional patterns. Pattern Analysis and Applications
