Structured learning: overview
Sunita Sarawagi
IIT Bombay
http://www.cse.iitb.ac.in/~sunita
Constituents of a structured model

Feature vector f(x,y)
  Features: real-valued, typically binary
  User-defined
  Number of features typically very large
Parameter vector w
  Weight of each feature
Score of a prediction y for input x: s(x,y) = w · f(x,y)
  Many interpretations: log unnormalized probability, negative energy
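As a concrete illustration, here is a minimal sketch of s(x,y) = w · f(x,y) with sparse binary features, in the spirit of the sequence-labeling features shown later. The feature names, weights, and example tokens are illustrative assumptions, not taken from the slides.

```python
# A minimal sketch of s(x, y) = w . f(x, y) with sparse binary features.
# Feature names, weights, and the example tokens are illustrative.

def features(x, y):
    """f(x, y) as the set of binary features that fire (value 1)."""
    feats = set()
    for i, (token, label) in enumerate(zip(x, y)):
        feats.add(f"x{i+1}={token} and y{i+1}={label}")
        if i > 0:
            feats.add(f"y{i}={y[i-1]} and y{i+1}={label}")
    return feats

def score(w, x, y):
    """s(x, y) = w . f(x, y); features absent from w have weight 0."""
    return sum(w.get(feat, 0.0) for feat in features(x, y))

w = {"x2=S. and y2=Author": 1.5, "y2=Author and y3=Author": 0.8}
x = ["by", "S.", "Singh"]
y = ["Other", "Author", "Author"]
print(score(w, x, y))  # 1.5 + 0.8 = 2.3
```

Representing f(x,y) as the set of firing features keeps the very large feature space implicit: only features with nonzero value or nonzero weight ever need to be materialized.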
Prediction problem

Predict: y* = argmax_y s(x,y)
Popularly known as MAP estimation
Challenge: the space of possible y is exponentially large
Exploit decomposability of the feature function over parts c of y:
  f(x,y) = Σ_c f(x, y_c, c)
The form of the features and the MAP inference algorithm are structure-specific.
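To make the "exponentially large space" concrete, here is a brute-force MAP sketch on a toy chain model; the labels and pairwise scores are illustrative assumptions. Even for this toy scorer the enumeration touches m^n candidates, which is what the structure-specific algorithms below (Viterbi and friends) avoid.

```python
# Brute-force MAP: enumerate all m**n labelings of a length-n sequence.
# Labels and pairwise part scores are illustrative toy choices.
from itertools import product

LABELS = ["Other", "Title", "Author"]              # m = 3

# Decomposed score: each part c is one adjacent label pair.
PAIR_SCORE = {("Title", "Title"): 1.0, ("Author", "Author"): 1.0}

def score(y):
    return sum(PAIR_SCORE.get((y[i - 1], y[i]), 0.0) for i in range(1, len(y)))

n = 9                                              # sequence length
best = max(product(LABELS, repeat=n), key=score)   # enumerates 3**9 = 19683 y's
print(best, score(best))   # all-'Title' (ties broken by enumeration order), 8.0
```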
Examples
Sequence labeling

Example: "My review of Fermat's last theorem by S. Singh"

t:  1      2       3      4         5      6        7      8       9
x:  My     review  of     Fermat's  last   theorem  by     S.      Singh
y:  Other  Other   Other  Title     Title  Title    Other  Author  Author

Features decompose over adjacent labels.
Sequence labeling: features

Examples of features:
  [x8 = "S." and y8 = "Author"]
  [y8 = "Author" and y9 = "Author"]
MAP: Viterbi finds the best y in O(n m²) (n = sequence length, m = number of labels)
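A minimal Viterbi sketch follows. It exploits the decomposition over adjacent labels to find argmax_y in O(n m²) rather than O(m^n); the emission and transition scores are illustrative stand-ins for w · f(x, y_i, y_{i-1}).

```python
# Viterbi decoding for a chain-structured model, O(n m^2).
# emit(i, y) and trans(yp, y) stand in for the decomposed part scores.

def viterbi(n, labels, emit, trans):
    # V[y] = best score of a prefix ending at the current position with label y
    V = {y: emit(0, y) for y in labels}
    back = []                                      # backpointers per position
    for i in range(1, n):
        bp, nV = {}, {}
        for y in labels:
            prev = max(labels, key=lambda yp: V[yp] + trans(yp, y))
            bp[y] = prev
            nV[y] = V[prev] + trans(prev, y) + emit(i, y)
        V, back = nV, back + [bp]
    # Follow backpointers from the best final label.
    y = max(labels, key=lambda lbl: V[lbl])
    path = [y]
    for bp in reversed(back):
        y = bp[y]
        path.append(y)
    return list(reversed(path))

x = "My review of Fermat's last theorem by S. Singh".split()

def emit(i, y):                      # toy per-position scores
    token = x[i]
    if y == "Author" and i > 0 and token[0].isupper():
        return 1.5
    if y == "Other" and token[0].islower():
        return 0.4
    return 0.0

def trans(yp, y):                    # toy transition score: label continuity
    return 0.5 if yp == y else 0.0

print(viterbi(len(x), ["Other", "Title", "Author"], emit, trans))
# ['Other', 'Other', 'Other', 'Author', 'Other', 'Other', 'Other',
#  'Author', 'Author']   (the toy scores also mark "Fermat's" as Author)
```

The recursion mirrors the feature decomposition: because the score couples only adjacent labels, the best prefix ending in each label summarizes everything the future needs.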
Markov models (CRFs)

Application: image segmentation, among many others
y is a vector y1, y2, …, yn of discrete labels
Features decompose over cliques of a triangulated graph
MAP inference algorithms for graphical models are extensively researched:
  junction trees for exact inference, plus many approximate algorithms
Special case: Viterbi
The framework of structured models subsumes graphical models
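For intuition only, here is a brute-force reference for clique-decomposed MAP on a tiny triangulated graph; junction trees compute the same answer efficiently on large graphs. The graph, cliques, and potentials are illustrative assumptions.

```python
# Exact MAP over a toy triangulated graph by enumeration (reference only;
# junction trees avoid the exponential sweep). All values are toy choices.
from itertools import product

LABELS = [0, 1]
# Two triangles sharing the edge (1, 2): a chordal (triangulated) graph.
CLIQUES = [(0, 1, 2), (1, 2, 3)]

def clique_score(clique, assignment):
    # Toy potential: reward cliques whose variables all take the same label.
    vals = {assignment[v] for v in clique}
    return 1.0 if len(vals) == 1 else 0.0

best, best_s = None, float("-inf")
for assignment in product(LABELS, repeat=4):       # 2**4 joint labelings
    s = sum(clique_score(c, assignment) for c in CLIQUES)
    if s > best_s:
        best, best_s = assignment, s
print(best, best_s)                                # (0, 0, 0, 0) 2.0
```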
Segmentation of a sequence

Applications: speech recognition, information extraction
Output y is a sequence of segments s1, …, sp
The feature vector decomposes over each segment and the label of the previous segment:
  f(x,y) = Σ_{j=1..p} f(x, s_j, y_{j-1})
MAP: easy extension of Viterbi, O(m² n²) (m = number of labels, n = length of the sequence)

Example: "My review of Fermat's last theorem by S. Singh"

x:  My  | review | of  | Fermat's last theorem | by  | S. Singh
y:  Other | Other | Other | Title               | Other | Author
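A sketch of that segment-level (semi-Markov) Viterbi follows. Here seg_score(j, i, y, yp) is an illustrative stand-in for w · f(x, s_j, y_{j-1}), scoring segment x[j:i] with label y given previous label yp; the hand-set scores are assumptions chosen to reproduce the example above.

```python
# Segment-level Viterbi, O(n^2 m^2): each step picks a segment start,
# a segment label, and the previous segment's label.

def semi_markov_viterbi(n, labels, seg_score):
    # best[i][y] = best score of a segmentation of x[:i] ending with label y
    best = [{y: float("-inf") for y in labels} for _ in range(n + 1)]
    back = [{y: None for y in labels} for _ in range(n + 1)]
    best[0] = {None: 0.0}                          # empty prefix
    for i in range(1, n + 1):
        for y in labels:
            for j in range(i):                     # candidate segment x[j:i]
                for yp, s in best[j].items():
                    cand = s + seg_score(j, i, y, yp)
                    if cand > best[i][y]:
                        best[i][y] = cand
                        back[i][y] = (j, yp)
    # Trace back the winning segmentation as (start, end, label) triples.
    y = max(labels, key=lambda lbl: best[n][lbl])
    i, segments = n, []
    while i > 0:
        j, yp = back[i][y]
        segments.append((j, i, y))
        i, y = j, yp
    return list(reversed(segments))

x = "My review of Fermat's last theorem by S. Singh".split()

def seg_score(j, i, y, yp):                        # toy, hand-set scores
    words = x[j:i]
    if y == "Title" and words == ["Fermat's", "last", "theorem"]:
        return 3.0
    if y == "Author" and words == ["S.", "Singh"]:
        return 2.0
    return 0.1 * len(words) if y == "Other" else 0.0

print(semi_markov_viterbi(len(x), ["Other", "Title", "Author"], seg_score))
# [(0, 3, 'Other'), (3, 6, 'Title'), (6, 7, 'Other'), (7, 9, 'Author')]
```

The extra factor of n over plain Viterbi comes from the inner loop over segment start positions.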
Parse tree of a sentence

Input x: "John hit the ball"
Output y: parse tree
Features decompose over the nodes of the tree
MAP: inside/outside algorithm, O(n³)
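The slide cites the inside/outside algorithm; to recover the single best tree, the max-product (CKY-style) variant sketched below runs in the same O(n³). The toy weighted grammar in Chomsky normal form is an illustrative assumption, just large enough to parse the example sentence.

```python
# CKY-style max-product parsing, O(n^3): best-scoring tree under a toy
# weighted CNF grammar (all rules and weights are illustrative).

BINARY = {("S", ("NP", "VP")): 1.0,        # head -> (left, right): score
          ("VP", ("V", "NP")): 1.0,
          ("NP", ("D", "N")): 1.0}
UNARY = {("NP", "John"): 1.0, ("V", "hit"): 1.0,
         ("D", "the"): 1.0, ("N", "ball"): 1.0}

def cky(words):
    n = len(words)
    # chart[(i, j)][A] = (best score, backpointer) for A spanning words[i:j]
    chart = {}
    for i, word in enumerate(words):
        chart[(i, i + 1)] = {A: (s, word)
                             for (A, token), s in UNARY.items() if token == word}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j, cell = i + width, {}
            for k in range(i + 1, j):                  # split point
                for (A, (B, C)), s in BINARY.items():
                    left, right = chart[(i, k)], chart[(k, j)]
                    if B in left and C in right:
                        cand = s + left[B][0] + right[C][0]
                        if cand > cell.get(A, (float("-inf"), None))[0]:
                            cell[A] = (cand, (k, B, C))
            chart[(i, j)] = cell
    def build(i, j, A):                                # read backpointers
        bp = chart[(i, j)][A][1]
        if isinstance(bp, str):                        # leaf: the word itself
            return (A, bp)
        k, B, C = bp
        return (A, build(i, k, B), build(k, j, C))
    return build(0, n, "S")

print(cky("John hit the ball".split()))
# (S, (NP, John), (VP, (V, hit), (NP, (D, the), (N, ball))))
```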
Sentence alignment

Input: sentence pair
Output: alignment
Features decompose over each aligned edge
MAP: maximum-weight matching
(Image from: http://gate.ac.uk/sale/tao/alignment-editor.png)
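A sketch of MAP as maximum-weight bipartite matching, using SciPy's Hungarian-style solver. The score matrix is an illustrative stand-in for the summed edge scores w · f, and a one-to-one alignment is assumed (the slide's alignments may be more general).

```python
# MAP for alignment as maximum-weight bipartite matching (toy scores).
import numpy as np
from scipy.optimize import linear_sum_assignment

# scores[i, j] = score of aligning source sentence i with target sentence j
scores = np.array([[3.0, 0.2, 0.1],
                   [0.5, 2.0, 0.3],
                   [0.1, 0.4, 1.5]])

rows, cols = linear_sum_assignment(scores, maximize=True)
print(list(zip(rows.tolist(), cols.tolist())))  # [(0, 0), (1, 1), (2, 2)]
```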
Training

Given: several input-output pairs (x1, y1), (x2, y2), …, (xN, yN)
Error of an output: Ei(y), e.g. Hamming error, which is also decomposable
Train the parameter vector w to minimize the training error:
  min_w Σ_i E_i(y* = argmax_y w · f(x_i, y))
Two problems: the objective is discontinuous, and it might over-fit the training data
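A minimal sketch of why the objective is discontinuous in w: the error only changes when the argmax flips, so it is piecewise constant with jumps. The single training pair, the two candidate outputs, and their feature vectors are illustrative assumptions.

```python
# The training error as a function of w is piecewise constant:
# sweeping one weight shows a jump where argmax_y flips (toy setup).

def hamming(y, y_true):
    return sum(a != b for a, b in zip(y, y_true))

# Two candidate outputs with their (toy) feature vectors f(x, y).
CANDS = {("Title", "Title"): [1.0, 0.0], ("Other", "Other"): [0.0, 1.0]}
y_true = ("Title", "Title")

def train_error(w):
    # y* = argmax_y w . f(x, y), then E(y*)
    y_star = max(CANDS,
                 key=lambda y: sum(wi * fi for wi, fi in zip(w, CANDS[y])))
    return hamming(y_star, y_true)

for w0 in [0.0, 0.49, 0.51, 1.0]:        # sweep one weight
    print(w0, train_error([w0, 0.5]))    # error jumps from 2 to 0 near w0 = 0.5
```

This is why training instead optimizes smooth surrogates of the error, and regularizes them against over-fitting.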