Structured learning: overview. Sunita Sarawagi, IIT Bombay

Structured learning: overview Sunita Sarawagi IIT Bombay http://www.cse.iitb.ac.in/~sunita


Page 1:

Structured learning: overview

Sunita Sarawagi

IIT Bombay

http://www.cse.iitb.ac.in/~sunita

Page 2:

Constituents of a structured model

Feature vector f(x,y)

Features: real-valued, typically binary; user-defined; number of features typically very large

Parameter vector w

Weight of each feature

Score of a prediction y for input x: $s(x,y) = w \cdot f(x,y)$

Many interpretations: log unnormalized probability, negative energy
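As a rough illustration of the score s(x,y) = w · f(x,y), the sketch below stores both vectors sparsely as dictionaries. The feature map `features`, the feature names ("emit", "trans") and the weights in `w` are all assumptions made up for this example; they are not taken from the slides.

```python
from collections import Counter

def features(x, y):
    """Hypothetical feature map f(x, y): counts of binary indicator features."""
    f = Counter()
    for i, (token, label) in enumerate(zip(x, y)):
        f[("emit", token, label)] += 1        # token/label indicator
        if i > 0:
            f[("trans", y[i - 1], label)] += 1  # adjacent-label indicator
    return f

def score(w, x, y):
    """s(x, y) = w . f(x, y); features missing from w have weight 0."""
    return sum(w.get(name, 0.0) * value for name, value in features(x, y).items())

# Hypothetical weights: only two features carry non-zero weight.
w = {("emit", "S.", "Author"): 2.0, ("trans", "Author", "Author"): 1.5}
x = ["by", "S.", "Singh"]

print(score(w, x, ["Other", "Author", "Author"]))  # 3.5
print(score(w, x, ["Other", "Other", "Other"]))    # 0.0
```

Because the number of features is typically very large while only a few fire for any given (x, y), a sparse representation like this is a natural choice.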

Page 3:

Prediction problem

Predict: $y^* = \arg\max_y s(x,y)$

Popularly known as MAP estimation

Challenge: the space of possible y is exponentially large

Exploit decomposability of the feature function over parts of y: $f(x,y) = \sum_c f(x, y_c, c)$

The form of the features and the MAP inference algorithm are structure specific.

Examples follow.
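To make the size of the search space concrete, here is a minimal sketch that finds the argmax by enumerating every labeling. It assumes a chain-structured y where each part c is a position together with its adjacent label pair; the function `part_score`, the label set and all weights are hypothetical.

```python
from itertools import product

LABELS = ["Other", "Author"]

def part_score(x, i, prev, cur):
    """Hypothetical score w . f(x, y_c, c) of the part at position i."""
    s = 0.0
    if x[i] == "by" and cur == "Other":
        s += 2.0
    if x[i] == "S." and cur == "Author":
        s += 2.0
    if prev == "Author" and cur == "Author":
        s += 1.5
    return s

def total_score(x, y):
    # s(x, y) is a sum over parts because f(x, y) decomposes over parts.
    return sum(part_score(x, i, y[i - 1] if i > 0 else None, y[i])
               for i in range(len(x)))

def brute_force_map(x):
    # Enumerates all m^n labelings, so this is only feasible for tiny inputs.
    candidates = list(product(LABELS, repeat=len(x)))
    best = max(candidates, key=lambda y: total_score(x, y))
    return best, len(candidates)

x = ["by", "S.", "Singh"]
best, n_candidates = brute_force_map(x)
print(best, n_candidates)  # ('Other', 'Author', 'Author') 8
```

With m labels and n positions the loop visits m^n candidates, which is why structure-specific algorithms that exploit the decomposition are needed in practice.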

Page 4:

Sequence labeling

My review of Fermat’s last theorem by S. Singh

Page 5:

Sequence labeling

My review of Fermat’s last theorem by S. Singh

t   1      2       3      4         5      6        7      8       9
x   My     review  of     Fermat’s  last   theorem  by     S.      Singh
y   Other  Other   Other  Title     Title  Title    Other  Author  Author

Label variables: y1 y2 y3 y4 y5 y6 y7 y8 y9

Features decompose over adjacent labels.

Page 6:

Sequence labeling

Examples of features:

[x8=“S.” and y8=“Author”]

[y8=“Author” and y9=“Author”]

MAP: Viterbi can find the best y in $O(nm^2)$ time, where n is the length of the sequence and m is the number of labels.
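The Viterbi recursion for a chain with features over adjacent labels can be sketched as follows. This is a minimal version assuming a non-empty input and a caller-supplied `part_score(x, i, prev, cur)`; that function, its weights and the two-label set are hypothetical and only loosely modeled on the example features above.

```python
def viterbi(x, labels, part_score):
    """Return (best score, best labeling) for a non-empty sequence x."""
    n = len(x)
    # best[i][b]: score of the best labeling of x[0..i] that ends in label b.
    best = [{b: part_score(x, 0, None, b) for b in labels}]
    back = []  # back[i-1][b]: label at position i-1 on that best path
    for i in range(1, n):
        row, ptr = {}, {}
        for b in labels:
            # m choices of previous label for each of m current labels.
            a = max(labels, key=lambda a: best[i - 1][a] + part_score(x, i, a, b))
            row[b] = best[i - 1][a] + part_score(x, i, a, b)
            ptr[b] = a
        best.append(row)
        back.append(ptr)
    last = max(labels, key=lambda b: best[-1][b])
    y = [last]
    for ptr in reversed(back):  # follow back-pointers from the end
        y.append(ptr[y[-1]])
    y.reverse()
    return best[-1][last], y

def part_score(x, i, prev, cur):
    """Hypothetical weighted features for position i."""
    s = 0.0
    if x[i] == "by" and cur == "Other":
        s += 2.0
    if x[i] == "S." and cur == "Author":
        s += 2.0
    if prev == "Author" and cur == "Author":
        s += 1.5
    return s

value, labeling = viterbi(["by", "S.", "Singh"], ["Other", "Author"], part_score)
print(value, labeling)  # 5.5 ['Other', 'Author', 'Author']
```

The two nested loops over labels inside the loop over positions give the O(nm^2) cost, counting one `part_score` evaluation as constant time.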

Page 7:

Markov models (CRFs)

Application: image segmentation and many others

y is a vector y1, y2, .., yn of discrete labels

Features decompose over cliques of a triangulated graph

MAP inference algorithms for graphical models are extensively researched: junction trees for exact inference, many approximate algorithms

Special case: Viterbi

The framework of structured models subsumes graphical models

Page 8:

Segmentation of sequence

Application: speech recognition, information extraction

Output y is a sequence of segments s1,…,sp

Feature f(x,y) decomposes over segment and label of previous segment: $f(x,y) = \sum_{j=1}^{p} f(x, s_j, y_{j-1})$

MAP: easy extension of Viterbi, $O(m^2 n^2)$, where m = number of labels, n = length of a sequence

x   My     | review | of    | Fermat’s last theorem | by    | S. Singh
y   Other  | Other  | Other | Title                 | Other | Author
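One way to picture the segment-level extension of Viterbi is the sketch below, which searches over segment boundaries and labels jointly. It assumes a non-empty input; `seg_score`, its "initial followed by a capitalized word" feature and all weights are hypothetical stand-ins for w · f(x, s_j, y_{j-1}).

```python
def segment_viterbi(x, labels, seg_score):
    """Return (best score, segments) where each segment is (start, end, label)."""
    n = len(x)
    neg_inf = float("-inf")
    # best[e][b]: best score of segmenting x[:e] with the last segment labeled b.
    best = [{b: neg_inf for b in labels} for _ in range(n + 1)]
    back = [{b: None for b in labels} for _ in range(n + 1)]
    for e in range(1, n + 1):        # exclusive end of the last segment
        for s in range(e):           # start of the last segment
            for b in labels:
                if s == 0:           # first segment has no previous label
                    cands = [(seg_score(x, s, e, b, None), None)]
                else:
                    cands = [(best[s][a] + seg_score(x, s, e, b, a), a)
                             for a in labels]
                for value, a in cands:
                    if value > best[e][b]:
                        best[e][b], back[e][b] = value, (s, a)
    b = max(labels, key=lambda lab: best[n][lab])
    total, e, segments = best[n][b], n, []
    while e > 0:                     # follow back-pointers to recover segments
        s, a = back[e][b]
        segments.append((s, e, b))
        e, b = s, a
    segments.reverse()
    return total, segments

def seg_score(x, s, e, label, prev_label):
    """Hypothetical segment score; looks at the whole segment x[s:e]."""
    words = x[s:e]
    if label == "Author":
        ok = len(words) == 2 and words[0].endswith(".") and words[1][0].isupper()
        score = 3.0 if ok else -1.0
    else:
        score = 1.0 if all(w.islower() for w in words) else -1.0
    if prev_label == label:
        score -= 0.5
    return score

total, segments = segment_viterbi(["by", "S.", "Singh"], ["Other", "Author"], seg_score)
print(total, segments)  # 4.0 [(0, 1, 'Other'), (1, 3, 'Author')]
```

The loops over end, start, current label and previous label give O(m^2 n^2) evaluations of `seg_score`, matching the bound quoted above when each evaluation is treated as constant time.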

Page 9:

Parse tree of a sentence

Input x: “John hit the ball”

Output y: parse tree

Features decompose over nodes of the tree

MAP: Inside/outside algorithm, $O(n^3)$

Page 10:

Sentence alignment

Input: sentence pair

Output: alignment

Features decompose over each aligned edge

MAP: Maximum weight matching

Image from: http://gate.ac.uk/sale/tao/alignment-editor.png

Page 11:

Training

Given:

Several input-output pairs (x1, y1), (x2, y2), …, (xN, yN)

Error of an output: Ei(y). Example: Hamming error. Also decomposable.

Train parameter vector w to minimize training error:

$\min_w \sum_i E_i\big(y^* = \arg\max_y w \cdot f(x_i, y)\big)$

Two problems: the objective is discontinuous, and it might over-fit the training data.
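The training error with Hamming error as Ei can be sketched as below. The `predict` argument stands for whatever MAP procedure computes argmax_y w · f(x_i, y); the toy data and the `always_other` predictor are hypothetical and only serve to exercise the functions.

```python
def hamming_error(y_true, y_pred):
    """Number of positions where the predicted label differs from the true one."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    return sum(a != b for a, b in zip(y_true, y_pred))

def training_error(data, predict):
    """Sum over training pairs (x_i, y_i) of E_i(predict(x_i))."""
    return sum(hamming_error(y, predict(x)) for x, y in data)

# Hypothetical toy training set.
data = [
    (["by", "S.", "Singh"], ["Other", "Author", "Author"]),
    (["My", "review"], ["Other", "Other"]),
]

def always_other(x):
    """Hypothetical trivial predictor used in place of a real MAP routine."""
    return ["Other"] * len(x)

print(training_error(data, always_other))  # 2
```

Hamming error is itself a sum over positions, which is the sense in which it is decomposable. The total only changes when some argmax prediction flips, so as a function of w it is piecewise constant, which is one way to see why the objective is discontinuous.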