Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple...

66
Challenges in Visual Recognition: Challenges in Visual Recognition: A Historical Perspective A Historical Perspective Jitendra Malik Jitendra Malik Jitendra Malik Jitendra Malik University of California at Berkeley University of California at Berkeley

Transcript of Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple...

Page 1: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

Challenges in Visual Recognition:Challenges in Visual Recognition:A Historical PerspectiveA Historical Perspective

Jitendra MalikJitendra MalikJitendra MalikJitendra MalikUniversity of California at BerkeleyUniversity of California at Berkeley

Page 2: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

The more you look, the more you see!The more you look, the more you see!

Page 3: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

PASCAL Visual Object Challenge

Page 4: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

We want to locate the objectWe want to locate the object

Orig. Image Segmentation Orig. Image Segmentation

Page 5: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

And we want to detect and label parts..

The Visually Tagged HumanThe Visually Tagged Human Projectj

Page 6: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

Categorization at Multiple Levelsg p

Watertd

TigerGrass outdoor

wildlife

Sand back

Tiger

eye

head

tail

eye

legs mouth

Computer Vision GroupUC Berkeleyshadow

Page 7: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.
Page 8: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

Examples of Actionsp• Movement and posture change

run walk crawl jump hop swim skate sit stand kneel lie dance– run, walk, crawl, jump, hop, swim, skate, sit, stand, kneel, lie, dance (various), …

• Object manipulationObject manipulation– pick, carry, hold, lift, throw, catch, push, pull, write, type, touch, hit,

press, stroke, shake, stir, turn, eat, drink, cut, stab, kick, point, drive, bike insert extract juggle play musical instrument (various)bike, insert, extract, juggle, play musical instrument (various)…

• Conversational gesturepoint– point, …

• Sign Language

Page 9: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

Key cues for action recognitiony g

• “Morpho-kinetics” of action (shape andMorpho kinetics of action (shape and movement of the body)Id tit f th bj t/• Identity of the object/s

• Activity contexty

• ACTION = MOVEMENT + GOAL

Page 10: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

Resolution Regimesg

Far field Near fieldFar field Near field

3 i l• 3-pixel man• Blob tracking

• 300-pixel man• Stick Figureg

Page 11: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

Medium-field Recognitiong

The 30-Pixel Man

Page 12: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

The more you look, the more you see!The more you look, the more you see!

Page 13: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

We need to identifyy

• Objects

A t• Agents

• Relationships among objects with objects, objects p g j j jwith agents, agents with agents …

• Events and ActionsEvents and Actions

Computer Vision GroupUniversity of California

Berkeley

Page 14: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

Different aspects of visionDifferent aspects of vision

• Perception: study the “laws of seeing” -predict what a• Perception: study the laws of seeing -predict what a human would perceive in an image.

• Neuroscience: understand the mechanisms in the retina andNeuroscience: understand the mechanisms in the retina and the brain

• Function: how laws of optics, and the statistics of the p ,world we live in, make certain interpretations of an image more likely to be valid

The match between human and computer vision is strongest at the level of function, but since typically the results of computer vision aremeant to be conveyed to humans makes it useful to be consistent

ith h ti N i i f id b t b iwith human perception. Neuroscience is a source of ideas but beingbio-mimetic is not a requirement.

Page 15: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

Taxonomy and Partonomyy y

• Taxonomy: E.g. Cats are in the order Felidae which in turn is in the class Mammalia

Recognition can be at multiple levels of categorization or be identification at– Recognition can be at multiple levels of categorization, or be identification at the level of specific individuals , as in faces.

• Partonomy: Objects have parts, they have subparts and so on. The human body contains the head, which in turn contains the eyes.

• These notions apply equally well to scenes and to activities.

h l i h d h h i b i l l hi h• Psychologists have argued that there is a “basic-level” at which categorization is fastest (Eleanor Rosch et al).

• In a partonomy each level contributes useful information for recognition.In a partonomy each level contributes useful information for recognition.

Computer Vision GroupUC Berkeley

Page 16: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

Visual Processing AreasVisual Processing Areasgg

Page 17: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

Macaque Visual AreasMacaque Visual Areas

Page 18: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

Hubel and Wiesel (1962) discovered orientation sensitive neurons in V1

Page 19: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

These cells respond to edges and bars ..

Page 20: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.
Page 21: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

Orientation based features were inspired by V1 (SIFT GIST HOG GB etc)(SIFT, GIST, HOG, GB etc)

Computer Vision GroupUC Berkeley

Page 22: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

Attneave’s Cat (1954)Line drawings convey most of the informationLine drawings convey most of the information

Computer Vision GroupUC Berkeley

Page 23: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

Modeling simple cellsModeling simple cells

• Elongated directional G i d i i

• Elongated directional G i d i iGaussian derivatives

• 2nd derivative and Gaussian derivatives

• 2nd derivative and Hilbert transform

• L1 normalized for Hilbert transform

• L1 normalized for 1scale invariance

• 6 orientations 3 scales

1scale invariance

• 6 orientations 3 scales6 orientations, 3 scales• Zero mean

6 orientations, 3 scales• Zero mean

Used for texture discrimination and classification by Malik and Perona (1990), Leung and Malik (1999)

Page 24: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

Texton Histogram Model for Recognition(Leung & Malik 1999) cf Bag of Words(Leung & Malik, 1999) cf. Bag of Words

Rough Plastic

Pebbles

Plaster-b

Terrycloth

ICCV '99, Corfu, Greece

Page 25: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

Object Detection can be very fastj y

O k f j d i i l• On a task of judging animal vs no animal, humans can make mostly correct saccades in 150 ms (Kirchner & (Thorpe, 2006)

C bl t ti d l i th ti– Comparable to synaptic delay in the retina, LGN, V1, V2, V4, IT pathway.

– Doesn’t rule out feed back but shows feed f d l i f lforward only is very powerful

• Detection and categorization are ti ll i lt (G ill S tpractically simultaneous (Grill-Spector

& Kanwisher, 2005)

Computer Vision GroupUC Berkeley

Page 26: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

Rolls et al (2000)Rolls et al (2000)Rolls et al (2000)Rolls et al (2000)

Page 27: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.
Page 28: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

Convolutional Neural Networks (LeCun et al)(LeCun et al)

Page 29: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

A brief history of computer vision ..

Those who cannot remember the past are condemned to repeat it-George Santayana

29

Page 30: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

Fifty years of computer vision 1963-2013y y p

• 1960s: Beginnings in artificial intelligence, image processing and pattern recognition

• 1970s: Foundational work on image formation: Horn, Koenderink, Longuet-Higgins …

• 1980s: Vision as applied mathematics: geometry, multi-scale analysis, probabilistic modeling, control theory, optimization

• 1990s: Geometric analysis largely completed, vision meets graphics, statistical learning approaches resurface

• 2000s: Significant advances in visual recognition, range of practical applications

Computer Vision GroupUC Berkeley

Page 31: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

Object recognition in computer visionj g p

• Recognition as Pose Estimation

R iti D i ti i V l t i• Recognition as Description using Volumetric primitives

• Recognition as Pattern Classification

• Recognition as Deformable MatchingRecognition as Deformable Matching

Computer Vision GroupUniversity of California

Berkeley

Page 32: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

Recognition as Pose Estimation:Object as a set of points in 3DObject as a set of points in 3D

• Roberts (1963) , Faugeras & Hebert (1983), Huttenlocher & Ullman (1987)( )

• VariantsGeometric Hashing : Lamdan & Wolfson (1988)– Geometric Hashing : Lamdan & Wolfson (1988)

– Pose Clustering : Stockman (1987), Olson (1994)Linear Combination of Views: Basri & Ullman (1991)– Linear Combination of Views: Basri & Ullman (1991)

Computer Vision GroupUniversity of California

Berkeley

Page 33: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

Huttenlocher & Ullman’s alignment Algorithm (1990)Algorithm (1990)

Page 34: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

Recognition as Fitting Volumetric Primitives: Object as a hierarchy of simple shapesObject as a hierarchy of simple shapes

• Binford (1971) , Marr & Nishihara (1978), Biederman(1987)( )

• Discredited as an approach for recognition in general, it has retained appeal for analyzing images of peopleit has retained appeal for analyzing images of people

Computer Vision GroupUniversity of California

Berkeley

Page 35: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

The Stick Figure IdealThe Stick Figure Ideal

Page 36: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.
Page 37: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

Recognition as Statistical Pattern Classification: Object as a feature vectorObject as a feature vector

• Optical Character Recognition studied as far back as the 1950s. Recent years focus on handwritten digit classification and face detection.

• Some examples:– Neural networks: Neocognitron (Fukushima, 1980, 1988) , Convolution

Neural Networks (LeCun et al), C2 Features (Serre, Wolf & Poggio 2005)

– Support Vector Machines (various)– Decision Trees (Amit, Geman, & Wilder, 1997)– Boosted Decision Trees (Viola & Jones 2001)– Boosted Decision Trees (Viola & Jones, 2001)

Computer Vision GroupUniversity of California

Berkeley

Page 38: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.
Page 39: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

Handwritten digit recognition (MNIST USPS)(MNIST,USPS)

• LeCun’s Convolutional Neural Networks variations (0.8%, 0 6% d 0 4% MNIST)0.6% and 0.4% on MNIST)

• Tangent Distance(Simard, LeCun & Denker: 2.5% on USPS)

• Randomized Decision Trees (Amit, Geman & Wilder, 0.8%)

• K-NN based Shape context/TPS matching (Belongie, Malik & p g ( g ,Puzicha: 0.6% on MNIST)

Computer Vision GroupUniversity of California

Berkeley

Page 40: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

Convolutional Neural Networks (LeCun et al)(LeCun et al)

Page 41: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

The idea behind Tangent Distance (Simard et al)(Simard et al)

Page 42: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.
Page 43: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.
Page 44: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

Amit, Geman & Wilder (1997)( )

Page 45: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

Recognition as Pictorial Structure Matching: Object as a spatial configuration of features

• Transformations to model shape variation - D’Arcy Wentworth Thompson (1910)(1910)

• Grenander (1970s and later) probabilistic models on transformations

Fi hl d El hl (1973) d f bl hi f l d k “ i• Fischler and Elschlager (1973) - deformable matching of landmarks ,“point masses”, in a configuration of “springs” to model deformable templates.

• Von der Malsburg - dynamic link architecture for neural modeling, elastic Vo de a sbu g dy a c a c tectu e o eu a ode g, e ast cgraph matching for face recognition (1993, 1997)

• Felzenszwalb and Huttenlocher (2000) - pictorial structures for aligning h b di t ti k fi i d i ihuman bodies to stick figures using dynamic programming

• Belongie, Malik & Puzicha (2001) use “shape contexts” as point descriptors, and thin plate splines to model deformation.p , p p

Computer Vision GroupUniversity of California

Berkeley

Page 46: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

Modeling shape variation in a categoryg p g y

• D’Arcy Thompson: On Growth and Form, 1917y p ,– studied transformations between shapes of organisms

Computer Vision GroupUniversity of California

Berkeley

Page 47: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

MatchingExampleExample

model target

Computer Vision GroupUniversity of California

Berkeley

Page 48: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

EZ-Gimpy Results (Mori & Malik, 2003)py ( , )

• 171 of 192 images correctly identified: 92 %g y

horse spadep

smile join

Computer Vision GroupUC Berkeleycanvas here

Page 49: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

Face DetectionFace Detection Carnegie Mellon University

R lt i i b itt d t th CMU li f d t tResults on various images submitted to the CMU on‐line face detectorhttp://www.vasc.ri.cmu.edu/cgi‐bin/demos/findface.cgi

Page 50: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

Multiscale sliding windowMultiscale sliding window

Ask this question repeatedly varying position scale categoryAsk this question repeatedly, varying position, scale, category…

Paradigm introduced by Rowley, Baluja & Kanade 96 for face detectionViola & Jones 01 Dalal & Triggs 05 Felzenszwalb McAllester Ramanan 08Viola & Jones 01, Dalal & Triggs 05, Felzenszwalb, McAllester, Ramanan 08

Page 51: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

Caltech-101 [Fei-Fei et al. 04][ ]

• 102 classes, 31-300 images/class

Computer Vision GroupUC Berkeley

Page 52: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

Caltech 101 classification results

(even better by combining cues )(even better by combining cues..)

Page 53: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.
Page 54: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

PASCAL Visual Object Challenge

Page 55: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.
Page 56: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.
Page 57: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.
Page 58: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

A good building block is a linear SVM trainedA good building block is a linear SVM trained on HOG features (Dalal & Triggs)

Page 59: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.
Page 60: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

AP=0.23AP 0.23

Page 61: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.
Page 62: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.
Page 63: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

Datasets and computer vision (slide credit: Fei‐Fei Li)(slide credit: Fei‐Fei Li)

UIUC Cars (2004)S. Agarwal, A. Awan, D. Roth

FERET Faces (1998)P. Phillips, H. Wechsler, J. H P R

CMU/VASC Faces (1998)H. Rowley, S. Baluja, T. Kanade

COIL Objects (1996)S. Nene, S. Nayar, H. Murase

Huang, P. Raus

MNIST di i (1998 10) KTH h i (2004) Si L (2008) S i (2001)MNIST  digits (1998‐10)Y LeCun & C. Cortes

KTH human action (2004)I. Leptev & B. Caputo

Sign Language (2008)P. Buehler, M. Everingham, A. Zisserman 

Segmentation (2001)D. Martin, C. Fowlkes, D. Tal, J. Malik.

3D Textures (2005)S. Lazebnik, C. Schmid, J. Ponce

CuRET Textures (1999)K. Dana B. Van Ginneken S. Nayar J. Koenderink

CAVIAR Tracking (2005)R. Fisher, J. Santos‐Victor J. Crowley 

Middlebury Stereo (2002)D. Scharstein R. Szeliski 

Page 64: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

Comparison among free datasets

10)

p g(slide credit: Fei‐Fei Li)

4ory (lo

g_1

3

PASCAL1LabelMe

er catego

2

Caltech101/256MRSCTiny Images2m

ages pe

1

f clean

 im

1 2 3 4 5# of visual concept categories (log_10)

# of

1. Excluding the Caltech101 datasets from PASCAL2. No image in this dataset is human annotated. The # of clean images per category is a rough estimation

Page 65: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

The more you look, the more you see!The more you look, the more you see!

Page 66: Challenges in Visual Recognition: A Historical Perspective · Project. Categgporization at Multiple ... Recent years focus on handwritten digit classification and face detection.

So much remains to be done…

• Objects, Scenes, Events

• The semantic gap is to be confronted, not avoided!

Computer Vision GroupUC Berkeley