Pictorial Structures for Object Recognition€¦ · Felzenswab Talk, 2007. Find interesting...

Post on 16-Jul-2020

6 views 0 download

Transcript of Pictorial Structures for Object Recognition€¦ · Felzenswab Talk, 2007. Find interesting...

Pictorial Structures for Object Recognition

Pedro F. FelzenszwalbPresented by Hanlin Tang

COS/PSY 598b

Feature detection

Felzenswab Talk, 2007

Presenter
Presentation Notes
Find interesting features Build spatial model on how feature locations vary LOCAL decisions on feature detection difficult – NO CONTEXT

Pictorial Recognition

Felzenswab Talk, 2007

Presenter
Presentation Notes
Springs are cool. Parts put together by spring, and try to make it fit the image! Provides context!

The Task

Presenter
Presentation Notes
Given model from higher being on what a human should look like.. Find it in an image!

Formalizing this intuition

Vertex vi has {ui } Edge has cij

Presenter
Presentation Notes
What exactly does the higher being give you? A set of parameters Object is a graph where the nodes are parts, and the edges are connections U_i is the appearance parameter, what we expect the part to look like pixel-wise C_ij characterizes the connections.. How tight is the spring? How is it oriented? These are all relative aspects, nothing is location fixed!

The Task

Θ ={u,c} I L = {l1 , l2 , l3 …

Bayes Rule to the Rescue…

Assuming part independence,

Presenter
Presentation Notes
P(I|L,theta) -> how likely that the training image comes from this dataset (level of mismatch) P(L|\theta) -> how likely this configuration comes from the model (deformation.. How far from ideal model in terms of stretching) DECOUPLING OF MISMATCH AND DEFORMATION!!!!

A more intuitive formulation:

Fischler and Eschlager (1972)

Mismatch with image

Deformation Cost(cost of stretching springs!)

The Grand AssumptionWant to define

In 1- dimension:

2

⎟⎠⎞

⎜⎝⎛ −

σμx

In N-dimensions:

Let: 21 llx −=

)()( 1 μμ −Σ− − xx T

The Grand Assumptionx

Presenter
Presentation Notes
Done to reduce running time, as explained later!

Iconic Models - Faces

Θ ={u,c} L = {l1 , l2 , l3 …

Presenter
Presentation Notes
Here, l_1 is simply (x,y) reminder: u is what each part should “look like” C is how stiff the spring is and what the ideal spatial location is

What an eye should look like

• The naïve way:

⎟⎟⎟⎟⎟⎟

⎜⎜⎜⎜⎜⎜

=

2.03.03.03.02.01.04.08.03.01.01.01.09.01.01.01.05.06.05.01.02.04.04.04.02.0

μ},{ σμ=u

But can reduce dimensions!

• The intuition:

COS424 notes

COS424 notes

More intuition-building:

Applied to natural data

⊗ =

⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟

⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜

6.05.02.06.04.09.07.01.02.0

= )( ilα

Presenter
Presentation Notes
Robust to gain or bias from lighting conditions Changes in small scale or image deformations

),( iiiu Σ= μ

),),((),|( iiiii ululIp ΣΝ∝ α

Characterizing springs

sij

),,()|,( ijijjiijii sllcllp Σ−Ν=

),( ijijij sc Σ=

Seek models to define:

• Location• Appearance• Connections••

),( iiiu Σ= μ),( yxli =

),( ijijij sc Σ=),),((),|( iiiii ululIp ΣΝ∝ α),,()|,( ijijjiijii sllcllp Σ−Ν=

Presenter
Presentation Notes
Go to blackboard for this part!

Some Math:),,()|,( ijijjiijii sllcllp Σ−Ν=

Presenter
Presentation Notes
If you were a terrorist and tried to hide your face in airport, no luck! 3/5 part detection!
Presenter
Presentation Notes
Gaze direction, location uncertainty when 1 fixed

Articulated Models - Humans

Presenter
Presentation Notes
BLACKBOARD Talk about how to get the probability distributions, and the various definitions. Notes written up on paper for blackboard use!

Seek models to define:

• Location• Appearance• Connections••

),( 21 qqui =),,,( θsyxli =

),,,,,,,,( kyxyxc ijsyxjijiijijij θσσσ=212211 )5.0()1()1(),|( '

22'

11AATnnnn

ii qqqqulIp −−−−=

),0),()(()|,( ijjijijiijii DlTlTcllp −Ν∝

Presenter
Presentation Notes
Go to blackboard for this part!
Presenter
Presentation Notes
Good results
Presenter
Presentation Notes
One Arm estimate wrong because of binary foreground/background selection. But still good estimate because of context
Presenter
Presentation Notes
Deals with noise well.. Model from higher being provides context for robust fitting even with noise
Presenter
Presentation Notes
More good noise countering. Power of context in pictorial representation.

• Restriction on dij allows linear running time for Finding L*

• Efficient ways to sample from the Posterior p(L|I,θ)

Learning

,..},{ 21 III = ,..},{ 21 LLL = Θ ={u,c}

Presenter
Presentation Notes
Note: connections not preset! Algorithm learns itself!

Learning the Model

Learning Appearance and Connections

General Framework

Felzenswab Talk, 2007

Key Points

• Pictorial models bring context to recognition

• Robust to noise, scale, lighting effects• Generalized structure• Heavily dependent on prior training of

model• Require robust definitions of (u,v)

Presenter
Presentation Notes
Robust definitios – i.e. dependent on background subtraction for humans

Fischler and Eschlager (1972)