Biologically Motivated Computer Vision

Digital Image Processing

Sumitha Balasuriya, Department of Computing Science, University of Glasgow

General Vision Problem

• Machine vision has been very successful in finding solutions to specific, well-constrained problems such as optical character recognition or fingerprint recognition. In fact, machine vision has surpassed human vision in many such closed-domain tasks.

• However it is only in biology where we find systems that can handle unconstrained, diverse vision problems.

• How can a biological or machine system which just captures two dimensional visual information from a view of a cluttered field even attempt to reason with and function in the environment? An accurate detailed spatial model of the environment is difficult to compute and the whole problem of scene analysis is ill-posed.

A problem is well posed if (1) a solution exists, (2) the solution is unique, (3) the solution depends continuously on the initial data (stability property).

Ill-posed problem: several possible solutions exist
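
A minimal sketch of why recovering a 3D scene from a single 2D view violates the uniqueness condition, assuming an idealised pinhole camera (the function name `project` and the focal length `f` are illustrative and not from the slides): every 3D point along one viewing ray lands on the same image position.

```python
def project(point, f=1.0):
    """Pinhole projection of a 3D point (X, Y, Z) onto the image plane."""
    X, Y, Z = point
    return (f * X / Z, f * Y / Z)

# Three different 3D points lying on the same viewing ray.
candidates = [(1.0, 2.0, 4.0), (2.0, 4.0, 8.0), (0.5, 1.0, 2.0)]
images = [project(p) for p in candidates]
print(images)  # all three project to the same 2D position
```

Given only the image position, any of the three candidates is an equally valid solution, so extra constraints are needed to choose between them.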

The general vision problem isn’t really solved in biology …

• For example I can't build an accurate spatial world model of the scene I look at ...

• Biological systems have evolved to process visual data to extract just enough information to perform the reasoning for everyday tasks that are part of survival.

• Visual information is combined with higher-level knowledge and other sensory modalities, which constrain the reasoning in the solution space and finally make vision possible.

Visual cortex and a bit more …

Lower visual cortex

Direct feedback projections to V1 originate from: V2 (complex features), V3 (orientation, motion, depth), V4 (colour, attention), MT (motion), MST (motion), FEF (saccades, spatial memory), LIP (saccade planning), IT (recognition)

Feedback from higher cortical areas

Frontal cortex → V2, V4, FEF, IT → V1

Face features → V1

• Newborn kittens

• Placed in a carousel

• One active, the other passively towed along

• Both receive the same stimulation

• The actively moving kitten receives visual stimulation which results from its own movements

• Only the active kitten develops sensory-motor coordination.

Held and Hein, 1963

Conventional Computer Vision Architecture

[Diagram: feedforward pipeline from Input through Feature Extraction and Classification, Recognition, Disparity to Output / Action]

The Future - Biologically Motivated Computer Vision Architecture

[Diagram labels: Task / Goal; Input; hierarchical processing; feedforward processing; feedback processing; lateral processing; more abstract features / symbols (square, triangle); "Is there a square, triangle or circle?"; optical illusions; other modalities]

Biologically Motivated Computer Vision Architectures in action

http://www.lira.dist.unige.it/babybotvideos.htm

Simple colour cues. Foveated sensors.

Also: learnt arm control, learning how to act on objects

Biologically Inspired features

• Machine vision and biological vision systems process similar information (visual scenes) and perform similar tasks (recognition, targeting)

• Not surprisingly, the optimal features extracted by many machine vision systems look surprisingly like those found in biology

• But first ….

Why bother with feature extraction?

• Why not use the actual image/video itself for reasoning/analysis?

INVARIANCE!

• The information we extract (i.e. the features) from the ‘entity’ must be insensitive to changes.

• The extracted features might be invariant to rotation and scaling of objects in images, lighting conditions, partial occlusions
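
A toy illustration of the point, under the assumption that a grey-level histogram is an acceptable stand-in for "a feature" (the helper names `rotate90` and `histogram` are hypothetical): raw pixels change when the image is rotated, while the histogram does not.

```python
from collections import Counter

def rotate90(img):
    """Rotate a 2D list-of-lists image by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def histogram(img):
    """Grey-level histogram: a feature that ignores pixel positions."""
    return Counter(v for row in img for v in row)

img = [[0, 0, 9],
       [0, 5, 9],
       [0, 0, 9]]
rot = rotate90(img)

pixels_equal = img == rot                        # raw pixels are not invariant
hist_equal = histogram(img) == histogram(rot)    # the histogram is invariant
print(pixels_equal, hist_equal)  # False True
```

The histogram buys its invariance by discarding the spatial layout entirely, so it can no longer tell differently arranged images apart.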

What features should we extract?

• Depends…

• Modality (video/image/audio …)

• Task (e.g. topic categorisation / face recognition / audio compression)

• Dimensionality reduction / sparsification

• Invariance vs descriptiveness

If the features are too descriptive they can’t generalise to new examples.

If they generalise too much, everything looks just about the same.

As the feature we extract becomes more complex/descriptive it will also become less invariant to even minor changes in the entity that we are measuring.

Human visual pathway

• Inspiration for feature extraction methodology

Circularly symmetric retinal ganglion receptive fields

Receptive field: area in the FOV in which stimulation leads to a response in the neuron

Orientated simple cell cortical receptive fields (similar to Gabor filter)

Gabor filter

• A function f(t) can be decomposed into cosine (even) and sine (odd) functions. Good for defining periodic structures. Not localised.

• There is an uncertainty relation between a signal's specificity in time and frequency.

• Dennis Gabor defined a family of signals that optimised this trade-off

• Enables us to extract local features

• Daugman (1995) defined a 2D filter based on the above which was called a Gabor filter

• These filters resemble cortical simple cells

Gabor filter

• Localise the sine and cosine functions using a Gaussian envelope.

[Figure: Gaussian envelope × modulating cosine gives the even symmetric cosine Gabor wavelet; Gaussian envelope × modulating sine gives the odd symmetric sine Gabor wavelet]
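
A minimal 1D sketch of the construction, assuming a Gaussian envelope of width `sigma` and a carrier frequency `freq` (both parameter values are illustrative, not taken from the slides): multiplying the envelope by a cosine gives the even wavelet, and by a sine gives the odd wavelet.

```python
import math

def gabor_1d(t, sigma=1.0, freq=0.5):
    """Return (even, odd) Gabor values at t: Gaussian envelope times cosine / sine."""
    envelope = math.exp(-t * t / (2.0 * sigma * sigma))
    phase = 2.0 * math.pi * freq * t
    return envelope * math.cos(phase), envelope * math.sin(phase)

# Sample both wavelets on t in [-5, 5].
ts = [i * 0.1 for i in range(-50, 51)]
even = [gabor_1d(t)[0] for t in ts]
odd = [gabor_1d(t)[1] for t in ts]
```

Unlike the bare cosine and sine, both wavelets decay to almost zero a few `sigma` away from the origin, which is what makes them usable as local feature detectors.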

Assuming a symmetric Gaussian envelope:

$$h(x,y) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2+y^2}{2\sigma^2}}\, e^{j2\pi(Ux+Vy)}$$

$$H(u,v) = e^{-2\pi^2\sigma^2\left[(u-U)^2+(v-V)^2\right]}$$
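
A hedged numerical check, assuming the symmetric-envelope Gabor is h(x, y) = 1/(2πσ²) · exp(-(x² + y²)/(2σ²)) · exp(j2π(Ux + Vy)) with spectrum H(u, v) = exp(-2π²σ²[(u-U)² + (v-V)²]). The function names, the unit-spaced grid and the parameter values are illustrative choices, not part of the slides.

```python
import cmath
import math

def gabor_2d(x, y, sigma, U, V):
    """Complex Gabor h(x, y): symmetric Gaussian envelope times a complex sinusoid."""
    envelope = math.exp(-(x * x + y * y) / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)
    return envelope * cmath.exp(2j * math.pi * (U * x + V * y))

def gabor_spectrum(u, v, sigma, U, V):
    """Analytic Fourier transform H(u, v): a Gaussian centred on (U, V)."""
    return math.exp(-2.0 * math.pi ** 2 * sigma ** 2 * ((u - U) ** 2 + (v - V) ** 2))

def numeric_spectrum(u, v, sigma, U, V, half_width=12):
    """Approximate the Fourier integral by a sum over a unit-spaced grid."""
    total = 0j
    for x in range(-half_width, half_width + 1):
        for y in range(-half_width, half_width + 1):
            total += gabor_2d(x, y, sigma, U, V) * cmath.exp(-2j * math.pi * (u * x + v * y))
    return total

sigma, U, V = 2.0, 0.2, 0.1
for (u, v) in [(U, V), (U + 0.1, V), (0.0, 0.0)]:
    # Numeric magnitude next to the analytic value at the same frequency.
    print(abs(numeric_spectrum(u, v, sigma, U, V)), gabor_spectrum(u, v, sigma, U, V))
```

The two columns should agree closely, with the largest value at (u, v) = (U, V), the centre of the Gaussian in the frequency plane.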

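In the Fourier domain the Gabor is a Gaussian centred about the central frequency (U,V). The orientation of the Gabor in the spatial domain is

$$\theta = \tan^{-1}\frac{V}{U}$$

[Figure: the (u, v) frequency plane showing the Gaussian centred at (U, V), labelled σ]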
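
A small sketch of the relation between the centre frequency (U, V) and the orientation, assuming the orientation is the angle of (U, V) in the frequency plane. Using `atan2` rather than a plain arctangent of V/U is a choice made here so that U = 0 is handled; the helper names are hypothetical.

```python
import math

def centre_frequency(f0, theta):
    """Centre frequency (U, V) for radial frequency f0 and orientation theta (radians)."""
    return f0 * math.cos(theta), f0 * math.sin(theta)

def orientation(U, V):
    """Orientation of the Gabor: the angle of (U, V) in the frequency plane."""
    return math.atan2(V, U)

U, V = centre_frequency(0.25, math.radians(30))
theta_deg = math.degrees(orientation(U, V))
print(theta_deg)  # recovers the 30 degree orientation up to rounding
```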

Spatial Frequency Bandwidth

• Bandwidth at half power point: $u_1 - u_2 = \frac{0.2650}{\sigma}$

• Bandwidth depends on the symmetric Gaussian envelope’s sigma. A large sigma results in a narrow bandwidth, so the Gabor filter filters more exactly at its central frequency. Also, due to the uncertainty relation, a narrow frequency bandwidth will result in reduced spatial localisation by the filter.

[Figure: spatial filter profiles (even symmetric cosine and odd symmetric sine Gabor wavelets) next to their spectral (Fourier) profiles against frequency, for wide and narrow bandwidth]
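
A short worked check of the 0.2650 constant, assuming the symmetric spectrum H = exp(-2π²σ²d²), where d is the distance from the central frequency, and taking "half power" to mean |H|² = 1/2 (the function name is illustrative).

```python
import math

def half_power_bandwidth(sigma):
    """Full width of |H|^2 at half its peak for H = exp(-2*pi^2*sigma^2*d^2)."""
    # |H|^2 = 1/2  ->  4*pi^2*sigma^2*d^2 = ln 2  ->  d = sqrt(ln 2) / (2*pi*sigma)
    half_width = math.sqrt(math.log(2.0)) / (2.0 * math.pi * sigma)
    return 2.0 * half_width

print(round(half_power_bandwidth(1.0), 4))  # 0.265
```

Doubling sigma halves the bandwidth, which is the trade-off between frequency selectivity and spatial localisation described above.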

Gabor filter with asymmetric Gaussian

• However, the Gabor’s Gaussian envelope need not be circularly symmetric! An elliptical spatial Gaussian envelope lets us control orientation bandwidth.

• Better formulation for asymmetric Gaussian envelope

Spatial domain (x' along direction of wave propagation):

$$\psi(x,y) = \frac{f_o^2}{\pi\gamma\eta}\, e^{-\left(\frac{f_o^2}{\gamma^2}x'^2 + \frac{f_o^2}{\eta^2}y'^2\right)}\, e^{j2\pi f_o x'}$$

$$x' = x\cos\theta + y\sin\theta, \qquad y' = -x\sin\theta + y\cos\theta$$

Spectral (Fourier) domain (u' along direction of wave propagation):

$$\Psi(u,v) = e^{-\frac{\pi^2}{f_o^2}\left(\gamma^2 (u'-f_o)^2 + \eta^2 v'^2\right)}$$

$$u' = u\cos\theta + v\sin\theta, \qquad v' = -u\sin\theta + v\cos\theta$$

f_o = central frequency
θ = angle
γ = sigma in direction of propagation
η = sigma perpendicular to direction of propagation
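
A minimal sketch of this formulation, assuming the spatial filter is (f_o²/(πγη)) · exp(-(f_o²/γ²)x'² - (f_o²/η²)y'²) · exp(j2πf_o x') and its spectrum is exp(-(π²/f_o²)(γ²(u'-f_o)² + η²v'²)), with primed coordinates rotated by θ. Function names and parameter values are illustrative only.

```python
import cmath
import math

def rotate(a, b, theta):
    """Rotate coordinates so the first axis runs along the wave direction."""
    return (a * math.cos(theta) + b * math.sin(theta),
            -a * math.sin(theta) + b * math.cos(theta))

def gabor_asym(x, y, f0, theta, gamma, eta):
    """Complex Gabor with an elliptical Gaussian envelope (spatial domain)."""
    xr, yr = rotate(x, y, theta)
    envelope = math.exp(-(f0 ** 2) * (xr ** 2 / gamma ** 2 + yr ** 2 / eta ** 2))
    return (f0 ** 2 / (math.pi * gamma * eta)) * envelope * cmath.exp(2j * math.pi * f0 * xr)

def gabor_asym_spectrum(u, v, f0, theta, gamma, eta):
    """Analytic spectrum: an elliptical Gaussian centred at frequency f0 along theta."""
    ur, vr = rotate(u, v, theta)
    return math.exp(-(math.pi ** 2 / f0 ** 2) * (gamma ** 2 * (ur - f0) ** 2 + eta ** 2 * vr ** 2))

f0, theta, gamma, eta = 0.25, math.pi / 6, 1.0, 1.5
# The spectrum peaks where (u, v) = f0 * (cos(theta), sin(theta)).
peak = gabor_asym_spectrum(f0 * math.cos(theta), f0 * math.sin(theta), f0, theta, gamma, eta)
print(peak)
```

Because γ and η scale the envelope separately along and across the wave, the radial and orientation bandwidths can be tuned independently.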

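Bandwidth of Gabor with asymmetric Gaussian

Half power points:

$$\frac{1}{\sqrt{2}} = e^{-\frac{\pi^2}{f_o^2}\left(\gamma^2 (u'-f_o)^2 + \eta^2 v'^2\right)}$$

Along direction of wave propagation, $v' = 0$:

$$\frac{1}{\sqrt{2}} = e^{-\frac{\pi^2\gamma^2}{f_o^2}(u'-f_o)^2}$$

$$(u'-f_o)^2 = -\frac{f_o^2}{\pi^2\gamma^2}\ln\frac{1}{\sqrt{2}}$$

$$u' = f_o \pm \frac{f_o}{\pi\gamma}\sqrt{-\ln\frac{1}{\sqrt{2}}}$$

Perpendicular to direction of wave propagation, $u' = f_o$:

$$\frac{1}{\sqrt{2}} = e^{-\frac{\pi^2\eta^2}{f_o^2}v'^2}$$

$$v'^2 = -\frac{f_o^2}{\pi^2\eta^2}\ln\frac{1}{\sqrt{2}}$$

$$v' = \pm\frac{f_o}{\pi\eta}\sqrt{-\ln\frac{1}{\sqrt{2}}}$$

Spatial frequency bandwidth in direction of wave propagation:

$$\Delta u' = \frac{2 f_o}{\pi\gamma}\sqrt{-\ln\frac{1}{\sqrt{2}}}$$

Spatial frequency bandwidth perpendicular to wave propagation:

$$\Delta v' = \frac{2 f_o}{\pi\eta}\sqrt{-\ln\frac{1}{\sqrt{2}}}$$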
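
The half-power widths can be checked numerically. This sketch assumes the spectrum exp(-(π²/f_o²)(γ²(u'-f_o)² + η²v'²)) in rotated coordinates and a half-power level of 1/√2; the names `half_power_bandwidths` and `spectrum` are hypothetical.

```python
import math

# sqrt(-ln(1/sqrt(2))), which equals sqrt(ln(2) / 2)
HALF_POWER = math.sqrt(-math.log(1.0 / math.sqrt(2.0)))

def half_power_bandwidths(f0, gamma, eta):
    """Full half-power widths (along, perpendicular to) the wave direction."""
    along = 2.0 * f0 * HALF_POWER / (math.pi * gamma)
    across = 2.0 * f0 * HALF_POWER / (math.pi * eta)
    return along, across

def spectrum(u_r, v_r, f0, gamma, eta):
    """Gabor spectrum evaluated in rotated coordinates (u', v')."""
    return math.exp(-(math.pi ** 2 / f0 ** 2)
                    * (gamma ** 2 * (u_r - f0) ** 2 + eta ** 2 * v_r ** 2))

f0, gamma, eta = 0.25, 1.0, 1.5
along, across = half_power_bandwidths(f0, gamma, eta)
print(spectrum(f0 + along / 2, 0.0, f0, gamma, eta))  # approximately 1/sqrt(2)
print(spectrum(f0, across / 2, f0, gamma, eta))       # approximately 1/sqrt(2)
```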

Orientation Bandwidth

• Orientation bandwidth is related to the number of orientations we want to extract. The half power points of the filters should coincide in the spectral domain.

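[Figure: filters in the (u, v) spectral plane at radius ω_o, showing the orientation bandwidth Δθ, the spatial frequency bandwidth and the half power contours]

If the filter bank consists of k orientated filters, and redundancy in orientation sampling

For small θ, l = rθ, so

$$\Delta v' \approx \frac{f_o\,\pi}{k}$$

$$\frac{2 f_o}{\pi\eta}\sqrt{-\ln\frac{1}{\sqrt{2}}} = \frac{f_o\,\pi}{k}$$

$$\eta = \frac{2k}{\pi^2}\sqrt{-\ln\frac{1}{\sqrt{2}}}$$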
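
A hedged sketch of choosing η from the number of orientations. It assumes the k filters are spaced π/k apart, that the small-angle arc length f_o · π/k is matched to the perpendicular half-power width 2f_o/(πη) · √(ln 2 / 2), and that no extra overlap between neighbouring filters is wanted. The function name is illustrative.

```python
import math

def eta_for_orientations(k):
    """Eta such that adjacent filters' half-power points meet, assuming spacing pi/k."""
    return (2.0 * k / math.pi ** 2) * math.sqrt(-math.log(1.0 / math.sqrt(2.0)))

k, f0 = 8, 0.25
eta = eta_for_orientations(k)

# Consistency check: the perpendicular bandwidth equals the arc length f0 * pi / k.
across = 2.0 * f0 * math.sqrt(math.log(2.0) / 2.0) / (math.pi * eta)
arc = f0 * math.pi / k
print(eta, across, arc)
```

Note that η does not depend on f_o, so under these assumptions the same value serves every radial frequency in the bank.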

Orientation Bandwidth

[Figure: the filter bank shown in the spatial domain and in the frequency domain (u, v axes), with the orientation bandwidth Δθ, the spatial frequency bandwidth, the half power contours and ω_o marked]

Filter bank

Hypercolumn

• Experiments by Hubel and Wiesel (1962, 1968)

• A set of orientation selective units over a common patch of the FOV.

• Organised as a vertical column in the visual cortex

• In a computational system, use the information in the hypercolumn for higher-level reasoning

Feature vector

Only using the even symmetric component in the filter bank
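
A minimal sketch of such a feature vector, assuming k = 8 even-symmetric Gabor filters with a circular Gaussian envelope applied at a single image patch. The kernel size, `f0`, `sigma` and the helper names are illustrative choices, not values from the slides.

```python
import math

def even_gabor_kernel(theta, f0=0.2, sigma=3.0, half=9):
    """Even-symmetric (cosine) Gabor kernel oriented at theta."""
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma ** 2))
            row.append(envelope * math.cos(2.0 * math.pi * f0 * xr))
        kernel.append(row)
    return kernel

def hypercolumn(patch, k=8):
    """Responses of k oriented filters over one patch: the feature vector."""
    responses = []
    for i in range(k):
        kernel = even_gabor_kernel(i * math.pi / k)
        responses.append(sum(kv * pv for krow, prow in zip(kernel, patch)
                             for kv, pv in zip(krow, prow)))
    return responses

def grating(theta, f0=0.2, half=9):
    """Test stimulus: a cosine grating whose wave direction is theta."""
    return [[math.cos(2.0 * math.pi * f0 * (x * math.cos(theta) + y * math.sin(theta)))
             for x in range(-half, half + 1)] for y in range(-half, half + 1)]

responses = hypercolumn(grating(2 * math.pi / 8))
best = responses.index(max(responses))
print(best)  # index of the filter matching the grating orientation
```

Each entry is the inner product of the patch with one oriented kernel, so the vector peaks at the filter whose orientation matches the stimulus.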

Properties of the hypercolumn feature vector

• Invariance to rotation in image plane

[Figure: an even symmetric detector and the hypercolumn responses to a stimulus, with sums of squared responses over the 8 orientations, $\sum_{i=1}^{8} R_i^2$]

Cycle to canonical orientation: cycle the responses in the feature vector.
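
A sketch of cycling to a canonical orientation, assuming the canonical orientation is taken to be that of the strongest response (the slides do not state the rule, so this is one plausible choice) and that an in-plane rotation by a multiple of the filter spacing shows up as a cyclic shift of the vector.

```python
def to_canonical(responses):
    """Cyclically shift the vector so the strongest response comes first."""
    peak = max(range(len(responses)), key=lambda i: responses[i])
    return responses[peak:] + responses[:peak]

r = [0.1, 0.3, 0.9, 0.4, 0.2, 0.1, 0.0, 0.1]
rotated = r[-3:] + r[:-3]          # the same stimulus rotated by three filter steps
print(to_canonical(r) == to_canonical(rotated))  # True
```

After the shift, both versions of the stimulus produce the same feature vector, which is what makes the representation invariant to this kind of rotation.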

Properties of the hypercolumn feature vector

• Invariance to scaling (i.e. spatial frequency)

[Figure: responses to a stimulus plotted against central frequency, with sums of squared responses over the 8 orientations, $\sum_{i=1}^{8} R_i^2$]

Scale Invariant Feature Transform

• Pandemonium model (Selfridge, 1959!)

• Build ever more complex / abstract features along the hierarchy

• Aggregate hypercolumn feature vectors to a complex feature

SIFT features

Hypercolumn features

Complex feature vector

Rotate hypercolumn features to canonical orientation of large support region

Rotate descriptor to canonical orientation of large support region

Recognition

• Extract SIFT features at corner locations (Harris corner detector), and scale space peaks
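
A minimal sketch of the Harris corner response at a single pixel, assuming central-difference gradients, an unweighted square window and the commonly used constant k = 0.04. These are illustrative choices, and the scale-space peak detection mentioned above is not covered here.

```python
def harris_response(img, y, x, k=0.04, radius=2):
    """Harris response R = det(M) - k * trace(M)^2 at pixel (y, x)."""
    h, w = len(img), len(img[0])

    def px(r, c):
        # Clamp indices so gradients near the border stay defined.
        return img[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    sxx = syy = sxy = 0.0
    for r in range(y - radius, y + radius + 1):
        for c in range(x - radius, x + radius + 1):
            ix = (px(r, c + 1) - px(r, c - 1)) / 2.0   # horizontal gradient
            iy = (px(r + 1, c) - px(r - 1, c)) / 2.0   # vertical gradient
            sxx += ix * ix
            syy += iy * iy
            sxy += ix * iy
    return sxx * syy - sxy * sxy - k * (sxx + syy) ** 2

# 16x16 test image: a bright quadrant whose corner sits at (8, 8).
img = [[1.0 if (r >= 8 and c >= 8) else 0.0 for c in range(16)] for r in range(16)]
corner = harris_response(img, 8, 8)
edge = harris_response(img, 12, 8)
flat = harris_response(img, 3, 3)
print(corner, edge, flat)  # positive at the corner, negative on the edge, zero on flat
```

Thresholding the response and keeping local maxima would give candidate corner locations at which descriptors could then be extracted.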

[Figure panels: Training; Recognition]

Recap

• Biologically motivated computer vision architecture

• Feedforward, feedback, lateral processing in architecture

• Hierarchical processing

• Feature extraction provides information about entities which are (somewhat!) invariant to changes

• Gabor filter

• Hypercolumn feature vector.

• SIFT features

The End