Using associative memory principles to enhance perceptual ability of
vision systems (Giving a meaning to what you see)
CVPR Workshop on Face Processing in Video June 28, 2004, Washington, DC, USA
Dr. Dmitry Gorodnichy, Computational Video Group
Institute for Information Technology, National Research Council Canada
www.cv.iit.nrc.ca/~dmitry/pinn
Designing visual memory using attractor-based neural networks with
application to perceptual vision systems
www.cv.iit.nrc.ca/~dmitry/pinn www.perceptual-vision.com/memory
3. Associative memory for video (Dr. Dmitry Gorodnichy)
The unique place of this research
You are here
Computer vision
Pattern recognition
Your eyes, your brain
4. Associative memory for video (Dr. Dmitry Gorodnichy)
Talk overview
1. On the neurobiology side
- How it works in the brain: from the eye retina, to the primary visual cortex, to neurons, to synapses
2. Memories as attractors of an associative neural network
- Finding the best learning rule to tune the synapses
3. On the computer vision path
- Evolution of Perceptual Vision User Interface Systems: from Face Detection to Face Tracking to Face Localization to Face Recognition
4. Putting it all together: visual memory for analyzing faces
- What makes processing in video special
- Canonical face representation
- Memories of faces as attractors of the network
5. Associative memory for video (Dr. Dmitry Gorodnichy)
How we (humans) do it: see & memorize what we see?
• Seeing
Dorsal (“where”) stream: V1, V2, V3… deals with object localization
Ventral (“what”) stream: V1, V2, V4, inferior temporal cortex (TE/IT) deals with object recognition
Refs: Perus, Ungerleider, Haxby, Riesenhuber, Poggio …
6. Associative memory for video (Dr. Dmitry Gorodnichy)
How we (humans) do it: see & memorize what we see? (cont'd)
• Recognizing / Memorizing
• In the brain: $10^{10}$ to $10^{13}$ interconnected neurons
• Neurons are either at rest or activated. They are modelled as binary units $Y_i \in \{+1, -1\}$ whose value depends on the values of the other neurons $Y_j$ and on the strength of the synaptic connections $C_{ij}$
• The brain is thus modelled as a network of binary neurons evolving in time from an initial state (e.g. a stimulus coming from the retina) until it reaches a stable state, an attractor
• The attractors of the network are what we actually remember: associative memory.
Refs: Hebb’49, Little’74,’78, Willshaw’71, …
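The evolution described above can be sketched in a few lines. This is a minimal illustration, assuming synchronous updates of the form Y(t+1) = sign(C·Y(t)) and a tie-break of +1 at zero potential; the function name `evolve` and the toy weight matrix are hypothetical and not taken from the talk.

```python
import numpy as np

def evolve(C, y0, max_iters=100):
    """Synchronously update all neurons until the state stops changing."""
    y = np.asarray(y0, dtype=int).copy()
    for _ in range(max_iters):
        s = C @ y                          # postsynaptic potentials
        y_next = np.where(s >= 0, 1, -1)   # binary neurons take +1 or -1
        if np.array_equal(y_next, y):      # stable state (attractor) reached
            return y, True
        y = y_next
    return y, False

# Toy example: a single stored pattern and a stimulus with one flipped neuron.
v = np.array([1, -1, 1, 1, -1, -1, 1, -1])
C = np.outer(v, v) / v.size
noisy = v.copy()
noisy[0] = -noisy[0]
state, converged = evolve(C, noisy)
print(converged, np.array_equal(state, v))
```

The noisy stimulus plays the role of the initial state coming from the retina; the returned state is the attractor the network settles into.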
7. Associative memory for video (Dr. Dmitry Gorodnichy)
Recognition / memorization: formally
• Attractor-based neural networks
• Main question: how to compute $C_{ij}$ so that
a) the desired patterns $V^m$ become attractors, i.e. $V^m \sim C V^m$, and
b) the network exhibits the best associative (error-correction) properties, i.e.
- largest attraction radius (tolerated noise)
- largest number of prototypes $M$ stored
Refs: Hebb’49, McCulloch-Pitts’43, Amari’71,’77, Hopfield’82, Sejnowski’89, Willshaw’71
8. Associative memory for video (Dr. Dmitry Gorodnichy)
Learning rules: From biologically plausible to mathematically justifiable
Neurophysiological postulate: “If two neurons on either side of a synapse are activated, then the strength of the synapse is increased”
“When a child is born, she knows nothing. As she repeatedly observes, she learns” – postulate from the Montessori approach to infant development.
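Models
How to update weights: $C_{ij}^{m} = C_{ij}^{m-1} + \Delta C_{ij}^{m}$
Hebb: $\Delta C_{ij}^{m} = \frac{1}{N} V_i^m V_j^m$, i.e. $C = \frac{1}{N} V V^T$
Generalized Hebb: $\Delta C_{ij}^{m} = F(V_i^m, V_j^m)$
Better however: $\Delta C_{ij}^{m} = F(C_{ij}^{m-1}, V_i^m, V_j^m)$
or even: $\Delta C_{ij}^{m} = F(C^{m-1}, V^m)$
Refs: Hebb’49, Hopfield’82, Sejnowski’77, Willshaw’71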
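As a small illustration of the incremental form of the Hebb rule, the sketch below adds one outer product per pattern and compares the result with the batch form C = (1/N)·V·Vᵀ. The helper name `hebb_update` and the two example patterns are assumptions made for the illustration only.

```python
import numpy as np

def hebb_update(C, v):
    """One incremental Hebbian step: add (1/N) * v v^T to the current weights."""
    v = np.asarray(v, dtype=float)
    return C + np.outer(v, v) / v.size

# Two mutually orthogonal binary patterns (one per row).
patterns = np.array([
    [1,  1, 1,  1, -1, -1, -1, -1],
    [1, -1, 1, -1,  1, -1,  1, -1],
])
N = patterns.shape[1]

C = np.zeros((N, N))
for v in patterns:                      # present the patterns one at a time
    C = hebb_update(C, v)

V = patterns.T.astype(float)            # columns are the patterns
print(np.allclose(C, V @ V.T / N))      # incremental and batch forms agree
print(np.array_equal(np.sign(C @ V), V))
```

Because the example patterns are orthogonal, each of them is reproduced exactly by C; with correlated patterns the plain Hebb rule introduces cross-talk.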
9. Associative memory for video (Dr. Dmitry Gorodnichy)
Pseudo-inverse as the best learning rule
$C = V V^{+}$
• Obtained mathematically from the stability condition $V^m = C V^m$
• With reduced self-connection ($C_{ii} = 0.15\, C_{ii}$), it is guaranteed [Gorodnichy’97] to retrieve
- M=0.5N patterns from 8% noise
- M=0.7N patterns from 2% noise
(for comparison: the Hebb rule stops retrieving when M=0.14N)
• The Widrow-Hoff (delta) rule is an iterative approximation of it.
• The Hebb rule is a special case of it for orthogonal prototypes.
Refs: Amari’71,’77, Kohonen’72, Personnaz’85, Kanter-Sompolinsky’86, Gorodnichy’95-’99
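The pseudo-inverse rule and the reduced self-connection can be tried out with NumPy. This is only a sketch, not the PINN code from the website: the function name `pseudo_inverse_weights`, the parameter name `D` for the self-connection scaling, and the random prototypes are all assumptions for illustration.

```python
import numpy as np

def pseudo_inverse_weights(V, D=1.0):
    """Projection weights C = V V^+ (columns of V are prototypes).

    D scales the self-connections C_ii; D=1.0 leaves them unchanged.
    """
    C = V @ np.linalg.pinv(V)
    C[np.diag_indices_from(C)] *= D
    return C

rng = np.random.default_rng(0)
N, M = 64, 16
V = rng.choice([-1.0, 1.0], size=(N, M))     # M random binary prototypes

C = pseudo_inverse_weights(V)                # plain projection matrix
C_d = pseudo_inverse_weights(V, D=0.15)      # reduced self-connection

print(np.allclose(C @ V, V))                 # stability condition V = C V
print(np.array_equal(np.sign(C_d @ V), V))   # prototypes stay fixed points
```

Scaling the diagonal by a positive factor keeps every prototype a fixed point, because the diagonal entries of a projection matrix lie between 0 and 1, so each component of C_d·V keeps the sign of the corresponding component of V.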
10. Associative memory for video (Dr. Dmitry Gorodnichy)
What else is good about the PI rule
… besides that it yields the best retrieval for this type of network:
• It is non-iterative, which is good for fast (real-time) learning.
• It is also fast in retrieval.
• The performance of the network can be examined and improved analytically.
• It is guaranteed to converge.
• It can deal with a continuous stream of data without ever becoming saturated, if dynamic desaturation is used:
-> maintaining the capacity of 0.2N (with complete retrieval)
-> providing means for forgetting obsolete data
-> setting the basis for the design of adaptive filters
• All this makes the network very suitable for real-time memorization and recognition, as needed for video processing tasks.
• Finally, there is free C++ code which you can compile and try yourself, at the PINN website: www.cv.iit.nrc.ca/~dmitry/pinn
11. Associative memory for video (Dr. Dmitry Gorodnichy)
These neural networks are known as…
•pseudo-inverse networks - for using the Moore-Penrose pseudoinverse V+ in computing the synapses
•projection networks - for synaptic (weight) matrix C=VV+ being the projection matrix on the space of prototypes
•Hopfield-like networks - for being binary and fully-connected in the stage of learning
•recurrent networks - for evolving in time, based on external input and internal memory
•attractor-based networks - for storing patterns as attractors (i.e. stable states of the network)
•dynamic systems - for allowing the dynamic systems theory to be applied
•associative memory - for being able to memorize, recall and forget patterns, just as much as humans do.
12. Associative memory for video (Dr. Dmitry Gorodnichy)
Analytical examination
By looking at the synaptic
weights Cij, one can say a lot …
about the properties of memory:
- how many main attractors (stored memories) it has.
- how good the retrieval is.
13. Associative memory for video (Dr. Dmitry Gorodnichy)
Attraction Radius as function of weights
Theoretical result:
(for direct attraction radius)
14. Associative memory for video (Dr. Dmitry Gorodnichy)
Dynamics of the network
The behaviour of the network is governed by the energy functions
• The network always converges, as long as $C_{ij} = C_{ji}$
• Cycles are possible when D<1
• However:
-> They are few when D>0.1 [Gorodnichy&Reznik’97]
-> They are detected automatically
15. Associative memory for video (Dr. Dmitry Gorodnichy)
Update flow neuro-processing
[Gorodnichy&Reznik’94]: “Process only those neurons which change during the evolution”, i.e.
instead of N multiplications:
do only a few of them:
-> is very fast (as only a few neurons actually change in one iteration)
-> detects cycles automatically
-> suitable for parallel implementation
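The slide's formulas are not reproduced in this transcript, so the following is only one plausible reading of the update-flow idea: keep the vector of postsynaptic potentials S = C·Y and, after each step, correct it using only the columns of C that belong to neurons whose state flipped. The function name `evolve_update_flow` and the way a cycle is detected here (the new state equals the state two steps back) are assumptions, not the authors' implementation.

```python
import numpy as np

def evolve_update_flow(C, y0, max_iters=100):
    """Synchronous evolution that touches only the neurons that changed."""
    y = np.asarray(y0, dtype=float).copy()
    s = C @ y                        # the full product is computed only once
    prev = None                      # state one step back, for cycle detection
    for _ in range(max_iters):
        y_new = np.where(s >= 0, 1.0, -1.0)
        changed = np.flatnonzero(y_new != y)
        if changed.size == 0:
            return y, "stable"
        if prev is not None and np.array_equal(y_new, prev):
            return y, "cycle"        # the network oscillates between two states
        # A flipped neuron changes by 2*y_new, so S needs only these columns.
        s = s + 2.0 * (C[:, changed] @ y_new[changed])
        prev, y = y, y_new
    return y, "max_iters"

# Stable case: one stored pattern, stimulus with one flipped neuron.
v = np.array([1.0, -1, 1, 1, -1, -1, 1, -1])
C_hebb = np.outer(v, v) / v.size
noisy = v.copy()
noisy[0] = -noisy[0]
state, status = evolve_update_flow(C_hebb, noisy)

# Cycle case: two mutually inhibiting neurons started in the same state.
C_cyc = np.array([[0.0, -1.0], [-1.0, 0.0]])
_, status_cyc = evolve_update_flow(C_cyc, [1.0, 1.0])
print(status, status_cyc)
```

When only a handful of neurons flip per iteration, the correction costs a few column additions instead of a full N×N product, which is where the speed-up comes from.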
16. Associative memory for video (Dr. Dmitry Gorodnichy)
How is video information processed?
• As we know how to memorize, the question is: what should be memorized?
What type of video information needs to be processed?
Let's see what mother nature (neurobiology) tells us
17. Associative memory for video (Dr. Dmitry Gorodnichy)
Visual Processing mechanisms
• Images are of very low resolution except in the fixation point.
• The eyes look at points which attract visual attention.
• Saliency is in a) motion, b) colour, c) disparity, d) intensity.
• These channels are processed independently in the brain. Intensity means: frequencies, orientation, gradient.
• The brain processes sequences of images rather than one image.
- Bad quality of images is compensated by the abundance of images.
• Colour & motion are used for segmentation.
• Intensity is used for recognition.
• Bottom-up (image-driven) visual attention is very fast and precedes top-down (goal-driven) attention: 25 ms vs 1 s.
Refs: Itti, …
18. Associative memory for video (Dr. Dmitry Gorodnichy)
Visual recognition mechanism
- What to learn: generality vs specifics, invariance vs selectivity
- Affine transformations in 2D (rotation in image plane, scale) are easily dealt with.
- No 3D model stored. Instead, several view-based 2D models stored
- One neural network per view.
In context of face recognition:
- Faces are stored in canonical representation
- 2D transformations are easy in image/video processing!
- Video allows one to wait (until a face is in a position in which it was stored)
Refs: Poggio,…
19. Associative memory for video (Dr. Dmitry Gorodnichy)
Orientation selectivity, top-down vs bottom-up detection
From [Riesenhuber-Poggio, Nature Neuroscience,2000]
20. Associative memory for video (Dr. Dmitry Gorodnichy)
On computer vision side
21. Associative memory for video (Dr. Dmitry Gorodnichy)
Perceptual Vision System
Goal: To detect, track and recognize the face and facial movements of the user.
[Figure labels: x, y, z; PUI; monitor; binary event ON/OFF; recognition / memorization; “Unknown User!”]
Setup:
+ face close to the camera (within hand distance)
+ approximately frontal orientation
+ limited number of users and motions
- off-the-shelf camera (low quality, low resolution)
- desktop computer (with limited processing power)
22. Associative memory for video (Dr. Dmitry Gorodnichy)
What can be “perceived”: Face processing tasks
“I look and see…”
Face Segmentation: “Something yellow moves”
Face Detection: “It’s a face”
Face Tracking (crude): “Let’s follow it!”
Face Localization (precise): “It’s at (x,y,z,…)”
Face Classification: “It’s the face of a child”
Facial Event Recognition: “S/he smiles, blinks”
Face Memorization: “Face unknown. Store it!”
Face Identification: “It’s Mila!”
…
23. Associative memory for video (Dr. Dmitry Gorodnichy)
Computer Vision results achieved
• 1998. Proof-of-concept PUI: colour-based tracking [Bradski]
– Unlikely to be used for precise tracking…
• 1999-2002. Several good skin colour models developed (HSV, UCS, YCrCb*)
– Unlikely to get better than that…
• 2002. Subpixel-accuracy convex-shape nose tracking [Nouse]
• 1999-2001. Motion-based segmentation & localization
– 2001. Non-linear change detection
– 2003. Second-order change detection [Double-blink]
• 2001. Viola-Jones face detection using Haar-like wavelets
• 2004. Stereotracking using nose and projective vision [Gorodnichy,Roth-IVC]
24. Associative memory for video (Dr. Dmitry Gorodnichy)
Face Detection and Tracking
25. Associative memory for video (Dr. Dmitry Gorodnichy)
Face Detection and Tracking (lights off)
26. Associative memory for video (Dr. Dmitry Gorodnichy)
Face Detection and Tracking (lights on)
27. Associative memory for video (Dr. Dmitry Gorodnichy)
Demand and applications
Internet, trends & technology
The nose used as a mouse
At the Institute for Information Technology in Canada, a system called Nouse has been developed that allows software to be controlled with movements of the face. The creator of the program, Dmitry Gorodnichy, explained by e-mail to LA NACION LINE how it works and what it can be used for.
For more information, related content, audiovisual material and readers' opinions, go to: http://www.lanacion.com.ar/03/05/21/dg_497588.asp Copyright S. A. LA NACION 2003. All rights reserved.
28. Associative memory for video (Dr. Dmitry Gorodnichy)
On importance of nose
Test: The user rotates his head only! (the shoulders do not move)
Precision / convenience is such that it allows one to use nose as mouse (or a joystick handle) – to Nouse
29. Associative memory for video (Dr. Dmitry Gorodnichy)
Nouse™: range and speed of tracking
30. Associative memory for video (Dr. Dmitry Gorodnichy)
Stereotracking with nose feature
31. Associative memory for video (Dr. Dmitry Gorodnichy)
Second-order change detection
• Detecting change in a change [Gorodnichy’03]
• Non-linear change detection deals with changes due to illumination [Durucan’02]
32. Associative memory for video (Dr. Dmitry Gorodnichy)
Eye Blink Detection
• Previously very difficult with moving heads
• Became possible with second-order change detection
• Is currently used to enable face-to-face communication for people with brain injury [AAATE’03]
33. Associative memory for video (Dr. Dmitry Gorodnichy)
Something (special) about video
Importance:
- Video is becoming ubiquitous. Cameras are everywhere.
- For security, computer–human interaction, video-conferencing, entertainment …
Constraints:
- Real-time processing is required.
- Low resolution: 160x120 images or MPEG-decoded.
- Low quality: weak exposure, blurriness, cheap lenses
Essence:
- It is inherently dynamic! Temporal information makes up for bad quality
- It has parallels with biological vision! It can be processed efficiently
34. Associative memory for video (Dr. Dmitry Gorodnichy)
Applicability of 160x120 video
• According to face anthropometrics (studied on the BioID database)
• Tested with Perceptual User Interfaces

Face size            ½ image   ¼ image   1/8 image   1/16 image
In pixels            80x80     40x40     20x20       10x10
Between eyes (IOD)   40        20        10          5
Eye size             20        10        5           2
Nose size            10        5         -           -
FS                   ✓         ✓         ✓           b
FD                   ✓         ✓         b           -
FT                   ✓         ✓         b           -
FL                   ✓         b         -           -
FER                  ✓         ✓         b           -
FC                   ✓         ✓         b           -
FM / FI              ✓         ✓         -           -

✓ – good; b – barely applicable; - – not good
35. Associative memory for video (Dr. Dmitry Gorodnichy)
Choosing the face model
On the importance of eyes:
• Eyes are the most salient features on a face.
• Besides, there are two of them, which makes them an excellent reference frame.
• They are also the best (and the only) stable landmarks on a face which can be used as a reference.
Intra-ocular distance (IOD) makes a very convenient unit of measurement!
Eye-centered face model
On resolution:
• Lowest resolution possible, so as not to cause overfitting to the noise present (and there’s a lot of noise in video!)
36. Associative memory for video (Dr. Dmitry Gorodnichy)
Eye-centered face representations
Suitable for Face Analysis from video
[Figure labels: d; 24; 2·IOD]
Suitable for Face Recognition in
travel documents [ICAO’02]
Size 24 x 24 is sufficient for face memorization & recognition and is optimal for low-quality video and for fast processing.
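A rough sketch of how a detected face could be brought to such an eye-centered 24 x 24 representation is given below. It assumes the side of the canonical image equals 2·IOD (so the eyes end up 12 pixels apart, in line with the face-size and IOD figures of the applicability table) and uses nearest-neighbour resampling. The function name `canonical_face`, the `eye_row` placement of the eyes and the coordinate convention (x, y) are hypothetical choices, not taken from the talk.

```python
import numpy as np

def canonical_face(image, left_eye, right_eye, size=24, eye_row=8):
    """Resample a grey image so the eyes lie size/2 pixels apart on row eye_row.

    left_eye and right_eye are (x, y) positions in the source image.
    """
    iod_out = size / 2.0                      # canonical IOD: side = 2 * IOD
    left_col = (size - iod_out) / 2.0         # canonical column of the left eye
    le = np.asarray(left_eye, dtype=float)
    re = np.asarray(right_eye, dtype=float)
    d = re - le
    scale = np.hypot(d[0], d[1]) / iod_out    # source pixels per canonical pixel
    angle = np.arctan2(d[1], d[0])            # in-plane rotation of the eye line
    c, s = scale * np.cos(angle), scale * np.sin(angle)

    # Map every canonical pixel (u, v) back to a source position (x, y).
    u, v = np.meshgrid(np.arange(size), np.arange(size))
    du, dv = u - left_col, v - eye_row
    x = le[0] + c * du - s * dv
    y = le[1] + s * du + c * dv
    xi = np.clip(np.rint(x).astype(int), 0, image.shape[1] - 1)
    yi = np.clip(np.rint(y).astype(int), 0, image.shape[0] - 1)
    return image[yi, xi]                      # nearest-neighbour lookup

# Synthetic 160x120 frame with two marked "eye" pixels.
frame = np.zeros((120, 160), dtype=np.uint8)
frame[50, 40] = 200                           # left eye at (x=40, y=50)
frame[50, 64] = 100                           # right eye at (x=64, y=50)
face = canonical_face(frame, (40, 50), (64, 50))
print(face.shape, face[8, 6], face[8, 18])
```

Since only an in-plane rotation, a scale and a shift are involved, the whole normalization is a single coordinate lookup per output pixel, which keeps it cheap enough for live video.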
37. Associative memory for video (Dr. Dmitry Gorodnichy)
From image pixels to feature vectors
• When the eyes are detected and a face is converted to a canonical representation, it is easy to memorize and to recognize
• Using (orientational, frequency) features: Gabor filters?
• As faces are already rectified (to the same scale and orientation), there is no need for complex transformations. Just deal with illumination changes.
• Converting a 24x24 face to a binary feature vector:
A) $V_i = I_{xy} - I_{ave}$, $N = 24 \times 24 = 576$
B) $V_{i,j} = \mathrm{sign}(I_i - I_j)$, $N = 24^4$
C) $V_{i,j} = \text{Haar-like}(i,j,k,l)$, much more
• Some pixels may be ignored (corners, eye location)
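Variants A and B can be sketched as follows. For variant A the slide gives the difference from the average intensity; taking its sign to obtain ±1 values is an assumption made here so that the result is binary, as is the convention that a zero difference maps to +1. The function names are hypothetical.

```python
import numpy as np

def features_mean_threshold(face):
    """Variant A, binarised: +1 where a pixel is at or above the mean, else -1."""
    I = face.astype(float).ravel()
    return np.where(I - I.mean() >= 0, 1, -1)

def features_pairwise(face):
    """Variant B: sign of I_i - I_j for every ordered pixel pair (i, j)."""
    I = face.astype(float).ravel()
    diff = I[:, None] - I[None, :]            # 576 x 576 table of differences
    return np.where(diff >= 0, 1, -1).ravel()

# Synthetic 24x24 "face" with strictly increasing intensities.
face = np.arange(24 * 24).reshape(24, 24)
a = features_mean_threshold(face)
b = features_pairwise(face)
print(a.shape, b.shape)
```

Variant A yields N = 576 neurons, while variant B yields N = 24^4 = 331,776, which shows how quickly the network size grows with pairwise features.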
38. Associative memory for video (Dr. Dmitry Gorodnichy)
Closer to experiments
• A network of size N=576 stores
– M=N/2 states with …
– M=N/4 states with 25%N error correction
• Faces are extracted using the OpenCV Viola-Jones function
• Another way: from blinking, as in [avbpa03]:
39. Associative memory for video (Dr. Dmitry Gorodnichy)
Visual memory for user perception
• What can be retrieved:
– user identity
– face orientation
– facial expression
40. Associative memory for video (Dr. Dmitry Gorodnichy)
Retrieving orientation
41. Associative memory for video (Dr. Dmitry Gorodnichy)
A few more demos: taped and live…
… as time allows…
• Watch how memory is being filled out, as you learn new prototypes
42. Associative memory for video (Dr. Dmitry Gorodnichy)
Conclusions ?
Computer vision
Pattern recognition
Neuro-biology
43. Associative memory for video (Dr. Dmitry Gorodnichy)
Conclusions
• A lot has been done in PR, CV, NB.
– How to know all of these?…
– How to use all of these?…
Or which way you’d prefer?
• Attractor-based network is a great tool:
– very easy to understand what it is doing
– very suitable for live real-time video processing
– very much in line with biological vision
– You are invited to try it yourself, from our website
• Other contributions:
– Canonical face representation for FPIV
• Is it possible to work while on parental leave with two kids?
44. Associative memory for video (Dr. Dmitry Gorodnichy)
Dealing with a stream of data
Dynamic desaturation:
-> maintains the capacity of 0.2N (with complete retrieval)
-> allows data to be stored in real time (no need for iterative learning methods!)
-> provides means for forgetting obsolete data
-> is the basis for the design of adaptive filters