Page 1:

Conditional Random Fields

Sargur N. Srihari ([email protected])

Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/CSE574/index.html

Page 2:

Outline

1. Generative and Discriminative Models
2. Classifiers: Naïve Bayes and Logistic Regression
3. Sequential Models: HMM and CRF; Markov Random Field
4. CRF vs HMM performance comparison
   NLP: Table extraction, POS tagging, Shallow parsing, Document analysis
5. CRFs in Computer Vision
6. Summary
7. References

Page 3:

Methods for Classification

• Generative Models (Two-step)
  – Infer class-conditional densities p(x|Ck) and priors p(Ck)
  – Then use Bayes' theorem to determine posterior probabilities:

    p(Ck|x) = p(x|Ck) p(Ck) / p(x)

• Discriminative Models (One-step)
  – Directly infer posterior probabilities p(Ck|x)

• Decision Theory
  – In both cases use decision theory to assign each new x to a class
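To make the two-step generative recipe concrete, here is a minimal numerical sketch (not from the slides). It assumes 1-D Gaussian class-conditional densities with made-up means, standard deviations and priors, and combines them with Bayes' theorem to obtain p(Ck|x).

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """1-D Gaussian density used as an assumed class-conditional p(x|Ck)."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

def posterior(x, means, stds, priors):
    """p(Ck|x) = p(x|Ck) p(Ck) / p(x), with p(x) = sum_k p(x|Ck) p(Ck)."""
    likelihoods = np.array([gaussian_pdf(x, m, s) for m, s in zip(means, stds)])
    joint = likelihoods * priors          # p(x|Ck) p(Ck)
    return joint / joint.sum()            # normalize by p(x)

# Two hypothetical classes; all numbers are illustrative.
print(posterior(1.2, means=[0.0, 2.0], stds=[1.0, 1.0], priors=np.array([0.6, 0.4])))
```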

Page 4:

Generative Classifier

• Given M variables x = (x1,..,xM), class variable y and the joint distribution p(x, y) we can
  – Marginalize:  p(x) = Σ_y p(x, y)
  – Condition:    p(y|x) = p(x, y) / p(x)
• By conditioning the joint pdf we form a classifier
• Huge need for samples
  – If the xi are binary, we need 2^M values per class to specify p(x, y)
  – If M = 10 and there are two classes, that is 2 × 2^10 = 2048 probabilities; at 100 samples to estimate each probability, we need 2048 × 100 = 204,800 samples
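The marginalization and conditioning operations can be carried out directly on a tabulated joint. The sketch below uses random, made-up table entries; note that the table already has 2^M rows per class, which is exactly the sample-complexity problem noted above.

```python
import numpy as np

M = 3                                    # number of binary features
rng = np.random.default_rng(0)
joint = rng.random((2 ** M, 2))          # made-up unnormalized p(x, y), y in {0, 1}
joint /= joint.sum()                     # normalize to a proper joint distribution

p_x = joint.sum(axis=1)                  # marginalize: p(x) = sum_y p(x, y)
p_y_given_x = joint / p_x[:, None]       # condition:   p(y|x) = p(x, y) / p(x)

x_index = 0b101                          # one particular assignment of (x1, x2, x3)
print("p(y | x=101) =", p_y_given_x[x_index])
```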

Page 5:

Classification of ML Methods

• Generative Methods
  – "Generative" since sampling can generate synthetic data points
  – Popular models:
    • Naïve Bayes, Mixtures of multinomials
    • Mixtures of Gaussians, Hidden Markov Models
    • Bayesian networks, Markov random fields

• Discriminative Methods
  – Focus on the given task; better performance
  – Popular models:
    • Logistic regression, SVMs
    • Traditional neural networks, Nearest neighbor
    • Conditional Random Fields (CRF)

Page 6:

Generative-Discriminative Pairs

• Naïve Bayes and Logistic Regression form a generative-discriminative pair

• Their relationship mirrors that between HMMs and linear-chain CRFs

Page 7:

Graphical Model Relationship

[Figure: the generative-discriminative grid. The Naïve Bayes Classifier (nodes y and x1,..,xM) models the joint p(y, x); conditioning gives Logistic Regression, which models p(y|x). Moving to sequences, Naïve Bayes becomes the Hidden Markov Model (nodes y1,..,yN and x1,..,xN), which models p(Y, X); conditioning the HMM gives the linear-chain Conditional Random Field, which models p(Y|X).]

Page 8:

Naïve Bayes Classifier

• Goal is to predict a single class variable y given a vector of features x = (x1,..,xM)
• Assume that once the class label is known the features are independent
• Joint probability model has the form

  p(y, x) = p(y) ∏_{m=1}^{M} p(x_m|y)

  – Need to estimate only M probabilities
• Factor graph obtained by defining factors ψ(y) = p(y), ψ_m(y, x_m) = p(x_m|y)
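A small illustrative sketch (toy probabilities, not the slides' code) of this factorization for binary features: the joint is a product of per-feature terms, and normalizing over classes gives the posterior used for prediction.

```python
import numpy as np

prior = np.array([0.5, 0.5])             # p(y) for two classes (assumed values)
theta = np.array([[0.8, 0.1, 0.3],       # p(x_m = 1 | y = 0) for M = 3 features
                  [0.2, 0.7, 0.9]])      # p(x_m = 1 | y = 1)

def joint(x, y):
    """p(y, x) = p(y) * prod_m p(x_m | y) under the independence assumption."""
    per_feature = np.where(x == 1, theta[y], 1.0 - theta[y])
    return prior[y] * per_feature.prod()

x = np.array([1, 0, 1])
joints = np.array([joint(x, y) for y in (0, 1)])
print("p(y | x) =", joints / joints.sum())   # condition by normalizing over y
```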

Page 9:

Logistic Regression Classifier

• Feature vector x
• Two-class classification: class variable y has values C1 and C2
• A posteriori probability p(C1|x) written as

  p(C1|x) = f(x) = σ(w^T x),  where  σ(a) = 1 / (1 + exp(-a))

• Known as logistic regression in statistics
  – Although a model for classification rather than for regression

[Figure: plot of the logistic sigmoid σ(a) versus a.]

Properties of the logistic sigmoid:
A. Symmetry: σ(-a) = 1 - σ(a)
B. Inverse: a = ln(σ / (1 - σ)), known as the logit; also known as the log odds since it is the ratio ln[p(C1|x)/p(C2|x)]
C. Derivative: dσ/da = σ(1 - σ)
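The three listed properties are easy to verify numerically; the check below is purely illustrative.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

a = 0.7
s = sigmoid(a)
print(sigmoid(-a), 1 - s)                    # A. symmetry: sigma(-a) = 1 - sigma(a)
print(np.log(s / (1 - s)), a)                # B. inverse (logit): a = ln(sigma / (1 - sigma))
eps = 1e-6                                   # C. derivative: d sigma/da = sigma * (1 - sigma)
print((sigmoid(a + eps) - sigmoid(a - eps)) / (2 * eps), s * (1 - s))
```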

Page 10:

Relationship between Logistic Regression and Generative Classifier

• Posterior probability of the class variable y is

  p(C1|x) = p(x|C1) p(C1) / [ p(x|C1) p(C1) + p(x|C2) p(C2) ]
          = 1 / (1 + exp(-a)) = σ(a),  where  a = ln [ p(x|C1) p(C1) / ( p(x|C2) p(C2) ) ]

• In a generative model we estimate the class-conditionals (which are used to determine a)
• In the discriminative approach we directly estimate a as a linear function of x, i.e., a = w^T x

Page 11:

Logistic Regression Parameters

• With M variables logistic regression has M parameters w = (w1,..,wM)
• By contrast, the generative approach
  – by fitting Gaussian class-conditional densities, will result in 2M parameters for the means, M(M+1)/2 parameters for the shared covariance matrix, and one for the class prior p(C1)
  – which can be reduced to O(M) parameters by assuming independence via Naïve Bayes

Page 12:

Learning Logistic Parameters

• For data set {x_n, t_n}, t_n ∈ {0, 1}, the likelihood is

  p(t|w) = ∏_{n=1}^{N} y_n^{t_n} (1 - y_n)^{1 - t_n}

  where t = (t1,..,tN)^T and y_n = p(C1|x_n)
• Parameter w specifies the y_n as follows: y_n = σ(a_n) and a_n = w^T x_n
• Defining the cross-entropy error function

  E(w) = -ln p(t|w) = -Σ_{n=1}^{N} { t_n ln y_n + (1 - t_n) ln(1 - y_n) }

• Taking the gradient with respect to w

  ∇E(w) = Σ_{n=1}^{N} (y_n - t_n) x_n

• Same form as the sum-of-squares error gradient for linear regression
  – Can use a sequential algorithm where samples are presented one at a time, using

  w^{(τ+1)} = w^{(τ)} - η ∇E_n
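A minimal sketch of the sequential update above on synthetic data; the learning rate η and the data are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 200, 2
X = rng.normal(size=(N, M))
t = (X @ np.array([1.5, -2.0]) > 0).astype(float)    # synthetic binary targets

w = np.zeros(M)
eta = 0.1                                            # assumed learning rate
for epoch in range(20):
    for n in range(N):                               # samples presented one at a time
        y_n = 1.0 / (1.0 + np.exp(-w @ X[n]))        # y_n = sigma(w^T x_n)
        w -= eta * (y_n - t[n]) * X[n]               # w <- w - eta * (y_n - t_n) x_n

print("learned w:", w)
```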

Page 13:

Multi-class Logistic Regression

• Case of K > 2 classes

  p(C_k|x) = p(x|C_k) p(C_k) / Σ_j p(x|C_j) p(C_j) = exp(a_k) / Σ_j exp(a_j)

• Known as the normalized exponential, where a_k = ln p(x|C_k) p(C_k)
• The normalized exponential is also known as softmax, since if a_k >> a_j for all j ≠ k then p(C_k|x) ≈ 1 and p(C_j|x) ≈ 0
• In logistic regression we assume the activations are given by a_k = w_k^T x
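An illustrative softmax computation with toy weight vectors w_k; subtracting the maximum activation is a standard numerical-stability trick, not part of the slide.

```python
import numpy as np

def softmax(a):
    a = a - a.max()                       # stability shift; leaves the result unchanged
    e = np.exp(a)
    return e / e.sum()                    # exp(a_k) / sum_j exp(a_j)

W = np.array([[ 1.0, -0.5],               # one weight vector w_k per class (toy values)
              [ 0.2,  0.8],
              [-1.0,  0.3]])
x = np.array([0.5, 1.5])
print(softmax(W @ x))                     # p(C_k | x) for K = 3 classes, a_k = w_k^T x
```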

Page 14:

Learning Parameters for Multiclass

• Determining parameters {w_k} by maximum likelihood requires derivatives of y_k with respect to all activations a_j

  ∂y_k/∂a_j = y_k (I_{kj} - y_j)

  where I_{kj} are elements of the identity matrix
• Likelihood function written using 1-of-K coding
  – Target vector t_n for x_n belonging to C_k is a binary vector with all elements zero except for element k, which is one

  p(T|w_1,..,w_K) = ∏_{n=1}^{N} ∏_{k=1}^{K} p(C_k|x_n)^{t_{nk}} = ∏_{n=1}^{N} ∏_{k=1}^{K} y_{nk}^{t_{nk}}

  where T is an N x K matrix of target variables with elements t_{nk}
• Cross-entropy error function

  E(w_1,..,w_K) = -ln p(T|w_1,..,w_K) = -Σ_{n=1}^{N} Σ_{k=1}^{K} t_{nk} ln y_{nk}

• Gradient of the error function is

  ∇_{w_j} E = Σ_{n=1}^{N} (y_{nj} - t_{nj}) x_n

  – which allows a sequential weight-vector update algorithm
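A sketch of gradient descent with the gradient above on synthetic data; the 1/N scaling of the gradient and the step size are implementation choices added for stability, not part of the slide.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, K = 300, 2, 3
X = rng.normal(size=(N, M))
W_true = np.array([[2.0, 0.0], [-1.0, 1.5], [-1.0, -1.5]])   # used only to create labels
labels = (X @ W_true.T).argmax(axis=1)
T = np.eye(K)[labels]                        # 1-of-K target matrix with elements t_nk

def softmax_rows(A):
    A = A - A.max(axis=1, keepdims=True)
    E = np.exp(A)
    return E / E.sum(axis=1, keepdims=True)

W = np.zeros((K, M))                         # one weight vector w_k per row
eta = 0.5
for step in range(200):
    Y = softmax_rows(X @ W.T)                # y_nk = p(C_k | x_n)
    grad = (Y - T).T @ X / N                 # row j holds sum_n (y_nj - t_nj) x_n, scaled by 1/N
    W -= eta * grad

pred = softmax_rows(X @ W.T).argmax(axis=1)
print("training accuracy:", (pred == labels).mean())
```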

Page 15:

Graphical Model for Logistic Regression

• Multiclass logistic regression can be written as

  p(y|x) = (1/Z(x)) exp{ λ_y + Σ_{j=1}^{K} λ_{yj} x_j },  where  Z(x) = Σ_y exp{ λ_y + Σ_{j=1}^{K} λ_{yj} x_j }

• Rather than using one weight per class we can define feature functions that are nonzero only for a single class

  p(y|x) = (1/Z(x)) exp{ Σ_{k=1}^{K} λ_k f_k(y, x) }

• This notation mirrors the usual notation for CRFs
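A sketch (toy weights, bias terms λ_y omitted for brevity) of the feature-function view: f_{c,j}(y, x) = 1{y = c}·x_j is nonzero only when y = c, so the normalized exponential above reduces to the per-class softmax of the earlier slides.

```python
import numpy as np

K = 3                                             # number of classes
lam = {(c, j): w                                  # lambda indexed by (class, feature), toy values
       for c, row in enumerate([[1.0, -0.5], [0.2, 0.8], [-1.0, 0.3]])
       for j, w in enumerate(row)}

def f(c, j, y, x):
    """Feature f_{c,j}(y, x) = 1{y = c} * x_j: active only for class c."""
    return x[j] if y == c else 0.0

def p_y_given_x(x):
    scores = np.array([sum(lam[c, j] * f(c, j, y, x) for (c, j) in lam)
                       for y in range(K)])
    e = np.exp(scores - scores.max())             # shifted for numerical stability
    return e / e.sum()                            # division by Z(x)

print(p_y_given_x(np.array([0.5, 1.5])))
```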

Page 16:

3. Sequence Models

• Classifiers predict only a single class variable
• Graphical models are best to model many variables that are interdependent
• Given a sequence of observations X = {x_n}, n = 1,..,N
• Underlying sequence of states Y = {y_n}, n = 1,..,N

Page 17:

Sequence Models

[Figure: HMM graphical model with state sequence Y and observation sequence X.]

• Hidden Markov Model (HMM)
• Independence assumptions:
  – each state depends only on its immediate predecessor
  – each observation variable depends only on the current state
• Limitations:
  – Strong independence assumptions among the observations X
  – Introduction of a large number of parameters by modeling the joint probability p(y, x), which requires modeling the distribution p(x)

Page 18:

Sequence Models

[Figure: graphical models of the Hidden Markov Model (HMM) and the Conditional Random Field (CRF), each over a label sequence Y and observation sequence X.]

A key advantage of CRFs is their great flexibility to include a wide variety of arbitrary, non-independent features of the observations.

Page 19:

Generative Model: HMM

• X is the observed data sequence to be labeled, Y is the random variable over the label sequences
• HMM is a distribution that models p(Y, X)
• Joint distribution is

  p(Y, X) = ∏_{n=1}^{N} p(y_n|y_{n-1}) p(x_n|y_n)

• The highly structured network indicates conditional independences:
  – past states independent of future states
  – conditional independence of each observation given its state

[Figure: HMM graphical model with states y1, y2,.., yn,.., yN and observations x1, x2,.., xn,.., xN.]
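A minimal sketch of evaluating this joint for one state/observation sequence, with made-up transition and emission tables; p(y_1|y_0) is read as the initial-state distribution.

```python
import numpy as np

pi = np.array([0.6, 0.4])                 # initial state distribution p(y_1)
A = np.array([[0.7, 0.3],                 # A[i, j] = p(y_n = j | y_{n-1} = i)
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],                 # B[i, o] = p(x_n = o | y_n = i)
              [0.2, 0.8]])

def hmm_joint(states, obs):
    """p(Y, X) = prod_n p(y_n | y_{n-1}) p(x_n | y_n)."""
    p = pi[states[0]] * B[states[0], obs[0]]
    for n in range(1, len(states)):
        p *= A[states[n - 1], states[n]] * B[states[n], obs[n]]
    return p

print(hmm_joint(states=[0, 0, 1], obs=[0, 1, 1]))
```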

Page 20:

Discriminative Model for Sequential Data

• CRF models the conditional distribution p(Y|X)
• CRF is a random field globally conditioned on the observation X
• The conditional distribution p(Y|X) that follows from the joint distribution p(Y, X) can be rewritten as a Markov Random Field

[Figure: linear-chain CRF with labels y1, y2,.., yn,.., yN globally conditioned on the observation X.]

Page 21:

Markov Random Field (MRF)

• Also called an undirected graphical model
• Joint distribution of a set of variables x is defined by an undirected graph as

  p(x) = (1/Z) ∏_C ψ_C(x_C)

  where C is a maximal clique (each node connected to every other node),
  x_C is the set of variables in that clique,
  ψ_C is a potential function (or local or compatibility function) such that ψ_C(x_C) > 0, typically ψ_C(x_C) = exp{-E(x_C)}, and

  Z = Σ_x ∏_C ψ_C(x_C)

  is the partition function for normalization
• Model refers to a family of distributions and Field refers to a specific one
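An illustrative brute-force evaluation of p(x) and the partition function Z for a tiny three-node chain with made-up pairwise potentials of the ψ = exp{-E} form (enumeration is only feasible because the example is tiny).

```python
import itertools
import numpy as np

def psi(a, b):
    """Pairwise potential favouring equal neighbours (assumed energy values)."""
    return np.exp(-0.0) if a == b else np.exp(-1.0)

def unnormalized(x):
    """Product of clique potentials for the chain x1 - x2 - x3."""
    return psi(x[0], x[1]) * psi(x[1], x[2])

Z = sum(unnormalized(x) for x in itertools.product([0, 1], repeat=3))   # partition function
x = (0, 0, 1)
print("p(x) =", unnormalized(x) / Z)
```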

Page 22:

MRF with Input-Output Variables

• X is a set of input variables that are observed
  – An element of X is denoted x
• Y is a set of output variables that we predict
  – An element of Y is denoted y
• A are subsets of X ∪ Y
  – Elements of A that are in A ∩ X are denoted x_A
  – Elements of A that are in A ∩ Y are denoted y_A
• Then the undirected graphical model has the form

  p(x, y) = (1/Z) ∏_A Ψ_A(x_A, y_A),  where  Z = Σ_{x,y} ∏_A Ψ_A(x_A, y_A)

Page 23:

MRF Local Function

• Assume each local function has the form

  Ψ_A(x_A, y_A) = exp{ Σ_m θ_{Am} f_{Am}(x_A, y_A) }

  where θ_A is a parameter vector, f_A are feature functions and m = 1,..,M are feature subscripts

Page 24:

From HMM to CRF

• In an HMM

  p(Y, X) = ∏_{n=1}^{N} p(y_n|y_{n-1}) p(x_n|y_n)

• Can be rewritten as

  p(Y, X) = (1/Z) exp{ Σ_n Σ_{i,j∈S} λ_{ij} 1{y_n = i} 1{y_{n-1} = j} + Σ_n Σ_{i∈S} Σ_{o∈O} μ_{oi} 1{y_n = i} 1{x_n = o} }

• Further rewritten as

  p(Y, X) = (1/Z) exp{ Σ_n Σ_{m=1}^{M} λ_m f_m(y_n, y_{n-1}, x_n) }

• Which gives us

  p(Y|X) = p(Y, X) / Σ_{y'} p(y', X)
         = exp{ Σ_n Σ_{m=1}^{M} λ_m f_m(y_n, y_{n-1}, x_n) } / Σ_{y'} exp{ Σ_n Σ_{m=1}^{M} λ_m f_m(y'_n, y'_{n-1}, x_n) }

• Note that Z cancels out

Notes:
– Indicator function: 1{x = x'} takes value 1 when x = x' and 0 otherwise
– Parameters of the distribution: θ = {λ_ij, μ_oi}
– Feature functions have the form f_m(y_n, y_{n-1}, x_n): need one feature for each state transition (i, j), f_ij(y, y', x) = 1{y = i} 1{y' = j}, and one for each state-observation pair, f_io(y, y', x) = 1{y = i} 1{x = o}

Page 25:

CRF definition

• A linear-chain CRF is a distribution p(Y|X) that takes the form

  p(Y|X) = (1/Z(X)) exp{ Σ_n Σ_{m=1}^{M} λ_m f_m(y_n, y_{n-1}, x_n) }

• where Z(X) is an instance-specific normalization function

  Z(X) = Σ_y exp{ Σ_n Σ_{m=1}^{M} λ_m f_m(y_n, y_{n-1}, x_n) }
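A toy linear-chain CRF with HMM-style transition and state-observation features and made-up weights; Z(X) is computed by enumerating all label sequences, which is feasible only for this tiny example (in practice the forward algorithm is used).

```python
import itertools
import numpy as np

K = 2                                           # number of labels
x = [0, 1, 1]                                   # toy observation sequence
W_trans = np.array([[ 0.5, -0.2],               # lambda for transition features f_ij
                    [-0.3,  0.8]])
W_emit = np.array([[ 1.0, -1.0],                # lambda for state-observation features f_io
                   [-1.0,  1.0]])

def score(y):
    """sum_n sum_m lambda_m f_m(y_n, y_{n-1}, x_n) for this feature set."""
    s = sum(W_emit[y[n], x[n]] for n in range(len(x)))
    s += sum(W_trans[y[n - 1], y[n]] for n in range(1, len(x)))
    return s

Z = sum(np.exp(score(y)) for y in itertools.product(range(K), repeat=len(x)))  # Z(X)
y = (0, 1, 1)
print("p(y|x) =", np.exp(score(y)) / Z)
```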

Page 26:

Functional Models

[Figure: functional forms of the four models arranged in the generative-discriminative grid.]

• Naïve Bayes Classifier (generative):

  p(y, x) = p(y) ∏_{m=1}^{M} p(x_m|y)

• Logistic Regression (discriminative):

  p(y|x) = exp{ Σ_{m=1}^{M} λ_m f_m(y, x) } / Σ_{y'} exp{ Σ_{m=1}^{M} λ_m f_m(y', x) }

• Hidden Markov Model (generative, sequence):

  p(Y, X) = ∏_{n=1}^{N} p(y_n|y_{n-1}) p(x_n|y_n)

• Conditional Random Field (discriminative, sequence):

  p(Y|X) = exp{ Σ_n Σ_{m=1}^{M} λ_m f_m(y_n, y_{n-1}, x_n) } / Σ_{y'} exp{ Σ_n Σ_{m=1}^{M} λ_m f_m(y'_n, y'_{n-1}, x_n) }

Page 27:

4. CRF vs Other Models

• Versus Generative Models
  – Relaxes the assumption of conditional independence of the observed data given the labels
  – Can contain arbitrary feature functions
    • Each feature function can use the entire input data sequence; the probability of a label at an observed data segment may depend on any past or future data segments
• Versus Other Discriminative Models
  – Avoids the limitation of other discriminative Markov models, which are biased towards states with few successor states
  – Single exponential model for the joint probability of the entire sequence of labels given the observed sequence
  – Each factor depends only on the previous label, and not on future labels: P(y | x) = product of factors, one for each label

Page 28:

NLP: Part-Of-Speech Tagging

For a sequence of words w = {w1, w2,.., wn} find syntactic labels s for each word:

  w = The quick brown fox jumped over the lazy dog
  s = DET VERB ADJ NOUN-S VERB-P PREP DET ADJ NOUN-S

Baseline is already 90%:
• Tag every word with its most frequent tag
• Tag unknown words as nouns

Per-word error rates for POS tagging on the Penn Treebank:

  Model   Error
  HMM     5.69%
  CRF     5.55%

Page 29:

Table Extraction

To label lines of a text document: whether part of a table, and its role in the table.

Finding tables and extracting information is a necessary component of data mining, question-answering and IR tasks.

  HMM     CRF
  89.7%   99.9%

Page 30:

Shallow Parsing

• Precursor to full parsing or information extraction
  – Identifies non-recursive cores of various phrase types in text (NP chunks)
• Input: words in a sentence annotated automatically with POS tags
• Task: label each word with a label indicating whether the word is outside a chunk (O), starts a chunk (B), or continues a chunk (I)

CRFs beat all reported single-model NP chunking results on the standard evaluation dataset.

Page 31:

Handwritten Word Recognition

Given a word image and a lexicon, find the most probable lexical entry.

Algorithm outline:
• Oversegment the image; segment combinations are potential characters
• Given y = a word in the lexicon, s = a grouping of segments, x = input word-image features
• Find the word in the lexicon and the segment grouping that maximize P(y, s | x)

CRF model:

  P(y|x, θ) = exp(ψ(y, x; θ)) / Σ_{y'} exp(ψ(y', x; θ))

  ψ(y, x; θ) = Σ_{j=1}^{m} [ A(j, y_j, x; θ^s) + Σ_{(j,k)∈E} I(j, k, y_j, y_k, x; θ^t) ]

where y_i ∈ {a-z, A-Z, 0-9} and θ are the model parameters.

Association potential (state term):

  A(j, y_j, x; θ^s) = Σ_i θ^s_{ij} · f^s_i(j, y_j, x)

Interaction potential:

  I(j, k, y_j, y_k, x; θ^t) = Σ_i θ^t_{ijk} · f^t_i(j, k, y_j, y_k, x)

[Figure: precision versus word recognition rank for the CRF and Segment-DP (SDP) approaches.]

Page 32:

Document Zoning (labeling regions) error rates

                          CRF      Neural Network   Naive Bayes
  Machine Printed Text    1.64%     2.35%           11.54%
  Handwritten Text        5.19%    20.90%           25.04%
  Noise                  10.20%    15.00%           12.23%
  Total                   4.25%     7.04%           12.58%

Page 33:

5. CRFs in Computer Vision

Page 34:

Applications in Computer Vision

• Image Modeling
  – Labeling regions in an image (image segmentation)
  – Object detection and recognition (important areas)
  – Sign detection in natural images
  – Natural scene categorization
  – Gesture recognition
• Image retrieval
• Object/motion segmentation in video scenes

[Figure: a small patch is water in one context and sky in another.]

Page 35:

Various tasks in computer vision require explicit consideration of spatial dependencies:
(a) Segmentation and labeling of an input image into meaningful regions.
(b) Detection of structured textures such as buildings.
(c) Restoration of images corrupted by noise.

Page 36:

Image Modeling

Image Labeling: classifying every image pixel/patch into a finite set of classes.
Object Recognition: requires semantic understanding of image content based on labeling.

CRF models:
• Directly predict the segmentation/labeling given the observed image
• Incorporate arbitrary functions of the observed features into the training process

Page 37:

CRFs for Image Modeling

• Discriminative Random Fields (DRF)
  – S. Kumar and M. Hebert. Discriminative fields for modeling spatial dependencies in natural images. 2003.
• Multiscale Conditional Random Fields (MCRF)
  – X. He, R. S. Zemel, and M. Á. Carreira-Perpiñán. Multiscale conditional random fields for image labeling. 2004.
• Hierarchical Conditional Random Field (HCRF)
  – S. Kumar and M. Hebert. A hierarchical field framework for unified context-based classification. 2005.
  – J. Reynolds and K. Murphy. Figure-ground segmentation using a hierarchical conditional random field. 2007.
• Tree-Structured Conditional Random Fields (TCRF)
  – P. Awasthi, A. Gagrani, and B. Ravindran. Image modeling using tree structured conditional random fields. 2007.

Page 38:

Discriminative Random Fields (DRF)

CRF: a normalized product of potential functions:
• the state feature function (association potential)
• the transition feature function (interaction potential), which enforces adaptive, data-dependent smoothing over the label field

In the DRF framework the potentials take into account the neighborhood interaction of the data.

The proposed DRF model was applied to the task of detecting man-made structures in natural scenes.

Page 39:

Structure detection by DRF (label each image site as structured or nonstructured)

For similar detection rates, DRF reduces the false positives considerably.

Though the model outperforms traditional MRFs, it is not strong enough to capture long-range correlations among the labels, due to the rigid lattice-based structure which allows only pairwise interactions.

Page 40:

Multiscale Conditional Random Fields (MCRF)

Combination of three different classifiers operating at local, regional and global contexts respectively.

Two main drawbacks:

• Including additional classifiers operating at different scales into the mCRF framework introduces a large number of model parameters.

• The model assumes conditional independence of hidden variables given the label field.

Page 41:

Hierarchical CRF (HCRF)

Scene context is important in different domains to achieve good classification even when the local appearance is impoverished:
• region-region interaction
• object-region interaction
• object-object interaction

HCRF: incorporates the local as well as the global context of any of the three types in a single model.

Page 42:

Hierarchical CRF (HCRF)

• Layer 1 (short-range): pixelwise labels
• Layer 2 (long-range): contextual objects or regions

Exploits different levels of contextual information in images for robust classification.

Each layer is modeled as a conditional field, which allows one to capture arbitrary observation-dependent label interactions.

Page 43:

Hierarchical CRF (HCRF)

Page 44:

[Figures: examples of region-region interaction and object-region interaction.]

Page 45:

[Figure: example of object-object interaction.]

Page 46:

Tree-Structured CRF (TCRF)

Combines the advantages of hierarchical models and discriminative approaches in a single framework.

To model the association between the labels of a 2×2 neighborhood of pixels:
a. Introduce a weight vector for every edge, representing the compatibility between the labels of the pairwise nodes.
b. Introduce a hidden variable H connected to all 4 nodes. Every value that H takes induces a probability distribution over the labels of the nodes connected to it.

Page 47:

Tree-Structured CRF (TCRF)

• Dividing the whole image into regions of size m × m and introducing a hidden variable for each of them gives a layer of hidden variables over the given label field
• Each label node is associated with a hidden variable in the layer above

[Figure: tree-structured CRF with m = 2.]

Similar to CRFs, the conditional probability p(y, H | x) of a TCRF factors into a product of potential functions.

Page 48:

Tree-Structured CRF (TCRF)

[Figure: TCRF results for object detection and image labeling.]

• Long-range correlations among non-neighboring pixels can easily be modeled as associations among the hidden variables in the layer above.
• The tree structure allows inference to be carried out in time linear in the number of pixels.

Page 49:

Sign Detection in Natural Images

J. Weinman, A. Hanson, and A. McCallum. Sign detection in natural images with conditional random fields. 2004.

Calculates a joint labeling of image patches, rather than labeling patches independently, and uses a CRF to learn the characteristics of regions that contain text.

[Figure: detection results with a MaxEnt classifier alone and after adding the CRF.]

Page 50:

Human Gesture Recognition

S. B. Wang, A. Quattoni, L.-P. Morency, D. Demirdjian, and T. Darrell. Hidden Conditional Random Fields for Gesture Recognition. Computer Science and Artificial Intelligence Laboratory, MIT. 2006.

The gesture classes are: FB - Flip Back, SV - Shrink Vertically, EV - Expand Vertically, DB - Double Back, PB - Point and Back, EH - Expand Horizontally.

Page 51:

Human Gesture Recognition: Hidden Conditional Random Fields

s = {s1, s2, ..., sm} is the set of hidden states in the model; it captures certain underlying structure of each class.

The potential function measures the compatibility between a label, a set of observations, and a configuration of the hidden states.

Page 52:

Hidden states distribution

Page 53:

Natural Scene Categorization

Y. Wang and S. Gong. Conditional Random Field for Natural Scene Categorization. 2007.

Classification Oriented Conditional Random Field (COCRF)

Page 54:

Natural Scene Categorization

COCRF discovers the spatial layout distribution of local patches and their pairwise interactions for a category.

[Figure: pairwise interaction potentials between topics for four categories.]

Page 55:

6. Summary

• Generative and Discriminative methods are two broad approaches: the former involves modeling, the latter directly solves classification
• For classification
  – Naïve Bayes and Logistic Regression form a generative-discriminative pair
• For sequential data
  – HMM and CRF are the corresponding pair
• CRF performs better in language-related tasks
• Generative models are more elegant and have explanatory power

Page 56:

7. References

1. C. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
2. C. Sutton and A. McCallum, An Introduction to Conditional Random Fields for Relational Learning
3. S. Shetty, H. Srinivasan and S. N. Srihari, Handwritten Word Recognition using CRFs, ICDAR 2007
4. S. Shetty, H. Srinivasan and S. N. Srihari, Segmentation and Labeling of Documents using CRFs, SPIE-DRR 2007
5. X. He, R. S. Zemel and M. Á. Carreira-Perpiñán, Multiscale Conditional Random Fields for Image Labeling

Page 57:

Other Topics in Sequential Data

• Sequential Data and Markov Models: http://www.cedar.buffalo.edu/~srihari/CSE574/Chap11/Ch11.1-MarkovModels.pdf
• Hidden Markov Models: http://www.cedar.buffalo.edu/~srihari/CSE574/Chap11/Ch11.2-HiddenMarkovModels.pdf
• Extensions of HMMs: http://www.cedar.buffalo.edu/~srihari/CSE574/Chap11/Ch11.3-HMMExtensions.pdf
• Linear Dynamical Systems: http://www.cedar.buffalo.edu/~srihari/CSE574/Chap11/Ch11.4-LinearDynamicalSystems.pdf