Structured learning: overview
Sunita Sarawagi
IIT Bombay
http://www.cse.iitb.ac.in/~sunita
Constituents of a structured model

Feature vector f(x,y)
  Features: real-valued, typically binary
  User-defined
  Number of features typically very large
Parameter vector w
  Weight of each feature
Score of a prediction y for input x: s(x,y) = w · f(x,y)
  Many interpretations: log unnormalized probability, negative energy
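As a concrete illustration, here is a minimal sketch of s(x,y) = w · f(x,y) with sparse binary features, in the spirit of the sequence-labeling features shown later. The feature names, weights, and example tokens are illustrative assumptions, not taken from the slides.

```python
# A minimal sketch of s(x, y) = w . f(x, y) with sparse binary features.
# Feature names, weights, and the example tokens are illustrative.

def features(x, y):
    """f(x, y) as the set of binary features that fire (value 1)."""
    feats = set()
    for i, (token, label) in enumerate(zip(x, y)):
        feats.add(f"x{i+1}={token} and y{i+1}={label}")
        if i > 0:
            feats.add(f"y{i}={y[i-1]} and y{i+1}={label}")
    return feats

def score(w, x, y):
    """s(x, y) = w . f(x, y); features absent from w have weight 0."""
    return sum(w.get(feat, 0.0) for feat in features(x, y))

w = {"x2=S. and y2=Author": 1.5, "y2=Author and y3=Author": 0.8}
x = ["by", "S.", "Singh"]
y = ["Other", "Author", "Author"]
print(score(w, x, y))  # 1.5 + 0.8 = 2.3
```

Representing f(x,y) as the set of firing features keeps the very large feature space implicit: only features with nonzero value or nonzero weight ever need to be materialized.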
Prediction problem

Predict: y* = argmax_y s(x,y)
Popularly known as MAP estimation
Challenge: the space of possible y is exponentially large
Exploit decomposability of the feature function over parts c of y:
  f(x,y) = Σ_c f(x, y_c, c)
The form of the features and the MAP inference algorithm are structure-specific.
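To make the "exponentially large space" concrete, here is a brute-force MAP sketch on a toy chain model; the labels and pairwise scores are illustrative assumptions. Even for this toy scorer the enumeration touches m^n candidates, which is what the structure-specific algorithms below (Viterbi and friends) avoid.

```python
# Brute-force MAP: enumerate all m**n labelings of a length-n sequence.
# Labels and pairwise part scores are illustrative toy choices.
from itertools import product

LABELS = ["Other", "Title", "Author"]              # m = 3

# Decomposed score: each part c is one adjacent label pair.
PAIR_SCORE = {("Title", "Title"): 1.0, ("Author", "Author"): 1.0}

def score(y):
    return sum(PAIR_SCORE.get((y[i - 1], y[i]), 0.0) for i in range(1, len(y)))

n = 9                                              # sequence length
best = max(product(LABELS, repeat=n), key=score)   # enumerates 3**9 = 19683 y's
print(best, score(best))   # all-'Title' (ties broken by enumeration order), 8.0
```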
Examples
Sequence labeling

Example: "My review of Fermat's last theorem by S. Singh"

t:  1      2       3      4         5      6        7      8       9
x:  My     review  of     Fermat's  last   theorem  by     S.      Singh
y:  Other  Other   Other  Title     Title  Title    Other  Author  Author

Features decompose over adjacent labels.
Sequence labeling: features

Examples of features:
  [x8 = "S." and y8 = "Author"]
  [y8 = "Author" and y9 = "Author"]
MAP: Viterbi finds the best y in O(n m²) (n = sequence length, m = number of labels)
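A minimal Viterbi sketch follows. It exploits the decomposition over adjacent labels to find argmax_y in O(n m²) rather than O(m^n); the emission and transition scores are illustrative stand-ins for w · f(x, y_i, y_{i-1}).

```python
# Viterbi decoding for a chain-structured model, O(n m^2).
# emit(i, y) and trans(yp, y) stand in for the decomposed part scores.

def viterbi(n, labels, emit, trans):
    # V[y] = best score of a prefix ending at the current position with label y
    V = {y: emit(0, y) for y in labels}
    back = []                                      # backpointers per position
    for i in range(1, n):
        bp, nV = {}, {}
        for y in labels:
            prev = max(labels, key=lambda yp: V[yp] + trans(yp, y))
            bp[y] = prev
            nV[y] = V[prev] + trans(prev, y) + emit(i, y)
        V, back = nV, back + [bp]
    # Follow backpointers from the best final label.
    y = max(labels, key=lambda lbl: V[lbl])
    path = [y]
    for bp in reversed(back):
        y = bp[y]
        path.append(y)
    return list(reversed(path))

x = "My review of Fermat's last theorem by S. Singh".split()

def emit(i, y):                      # toy per-position scores
    token = x[i]
    if y == "Author" and i > 0 and token[0].isupper():
        return 1.5
    if y == "Other" and token[0].islower():
        return 0.4
    return 0.0

def trans(yp, y):                    # toy transition score: label continuity
    return 0.5 if yp == y else 0.0

print(viterbi(len(x), ["Other", "Title", "Author"], emit, trans))
# ['Other', 'Other', 'Other', 'Author', 'Other', 'Other', 'Other',
#  'Author', 'Author']   (the toy scores also mark "Fermat's" as Author)
```

The recursion mirrors the feature decomposition: because the score couples only adjacent labels, the best prefix ending in each label summarizes everything the future needs.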
Markov models (CRFs)

Application: image segmentation, among many others
y is a vector y1, y2, …, yn of discrete labels
Features decompose over cliques of a triangulated graph
MAP inference algorithms for graphical models are extensively researched:
  junction trees for exact inference, plus many approximate algorithms
Special case: Viterbi
The framework of structured models subsumes graphical models
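For intuition only, here is a brute-force reference for clique-decomposed MAP on a tiny triangulated graph; junction trees compute the same answer efficiently on large graphs. The graph, cliques, and potentials are illustrative assumptions.

```python
# Exact MAP over a toy triangulated graph by enumeration (reference only;
# junction trees avoid the exponential sweep). All values are toy choices.
from itertools import product

LABELS = [0, 1]
# Two triangles sharing the edge (1, 2): a chordal (triangulated) graph.
CLIQUES = [(0, 1, 2), (1, 2, 3)]

def clique_score(clique, assignment):
    # Toy potential: reward cliques whose variables all take the same label.
    vals = {assignment[v] for v in clique}
    return 1.0 if len(vals) == 1 else 0.0

best, best_s = None, float("-inf")
for assignment in product(LABELS, repeat=4):       # 2**4 joint labelings
    s = sum(clique_score(c, assignment) for c in CLIQUES)
    if s > best_s:
        best, best_s = assignment, s
print(best, best_s)                                # (0, 0, 0, 0) 2.0
```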
Segmentation of a sequence

Applications: speech recognition, information extraction
Output y is a sequence of segments s1, …, sp
The feature vector decomposes over each segment and the label of the previous segment:
  f(x,y) = Σ_{j=1..p} f(x, s_j, y_{j-1})
MAP: easy extension of Viterbi, O(m² n²) (m = number of labels, n = length of the sequence)

Example: "My review of Fermat's last theorem by S. Singh"

x:  My  | review | of  | Fermat's last theorem | by  | S. Singh
y:  Other | Other | Other | Title               | Other | Author
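A sketch of that segment-level (semi-Markov) Viterbi follows. Here seg_score(j, i, y, yp) is an illustrative stand-in for w · f(x, s_j, y_{j-1}), scoring segment x[j:i] with label y given previous label yp; the hand-set scores are assumptions chosen to reproduce the example above.

```python
# Segment-level Viterbi, O(n^2 m^2): each step picks a segment start,
# a segment label, and the previous segment's label.

def semi_markov_viterbi(n, labels, seg_score):
    # best[i][y] = best score of a segmentation of x[:i] ending with label y
    best = [{y: float("-inf") for y in labels} for _ in range(n + 1)]
    back = [{y: None for y in labels} for _ in range(n + 1)]
    best[0] = {None: 0.0}                          # empty prefix
    for i in range(1, n + 1):
        for y in labels:
            for j in range(i):                     # candidate segment x[j:i]
                for yp, s in best[j].items():
                    cand = s + seg_score(j, i, y, yp)
                    if cand > best[i][y]:
                        best[i][y] = cand
                        back[i][y] = (j, yp)
    # Trace back the winning segmentation as (start, end, label) triples.
    y = max(labels, key=lambda lbl: best[n][lbl])
    i, segments = n, []
    while i > 0:
        j, yp = back[i][y]
        segments.append((j, i, y))
        i, y = j, yp
    return list(reversed(segments))

x = "My review of Fermat's last theorem by S. Singh".split()

def seg_score(j, i, y, yp):                        # toy, hand-set scores
    words = x[j:i]
    if y == "Title" and words == ["Fermat's", "last", "theorem"]:
        return 3.0
    if y == "Author" and words == ["S.", "Singh"]:
        return 2.0
    return 0.1 * len(words) if y == "Other" else 0.0

print(semi_markov_viterbi(len(x), ["Other", "Title", "Author"], seg_score))
# [(0, 3, 'Other'), (3, 6, 'Title'), (6, 7, 'Other'), (7, 9, 'Author')]
```

The extra factor of n over plain Viterbi comes from the inner loop over segment start positions.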
Parse tree of a sentence

Input x: "John hit the ball"
Output y: parse tree
Features decompose over the nodes of the tree
MAP: inside/outside algorithm, O(n³)
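The slide cites the inside/outside algorithm; to recover the single best tree, the max-product (CKY-style) variant sketched below runs in the same O(n³). The toy weighted grammar in Chomsky normal form is an illustrative assumption, just large enough to parse the example sentence.

```python
# CKY-style max-product parsing, O(n^3): best-scoring tree under a toy
# weighted CNF grammar (all rules and weights are illustrative).

BINARY = {("S", ("NP", "VP")): 1.0,        # head -> (left, right): score
          ("VP", ("V", "NP")): 1.0,
          ("NP", ("D", "N")): 1.0}
UNARY = {("NP", "John"): 1.0, ("V", "hit"): 1.0,
         ("D", "the"): 1.0, ("N", "ball"): 1.0}

def cky(words):
    n = len(words)
    # chart[(i, j)][A] = (best score, backpointer) for A spanning words[i:j]
    chart = {}
    for i, word in enumerate(words):
        chart[(i, i + 1)] = {A: (s, word)
                             for (A, token), s in UNARY.items() if token == word}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j, cell = i + width, {}
            for k in range(i + 1, j):                  # split point
                for (A, (B, C)), s in BINARY.items():
                    left, right = chart[(i, k)], chart[(k, j)]
                    if B in left and C in right:
                        cand = s + left[B][0] + right[C][0]
                        if cand > cell.get(A, (float("-inf"), None))[0]:
                            cell[A] = (cand, (k, B, C))
            chart[(i, j)] = cell
    def build(i, j, A):                                # read backpointers
        bp = chart[(i, j)][A][1]
        if isinstance(bp, str):                        # leaf: the word itself
            return (A, bp)
        k, B, C = bp
        return (A, build(i, k, B), build(k, j, C))
    return build(0, n, "S")

print(cky("John hit the ball".split()))
# (S, (NP, John), (VP, (V, hit), (NP, (D, the), (N, ball))))
```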
Sentence alignment

Input: sentence pair
Output: alignment
Features decompose over each aligned edge
MAP: maximum-weight matching
(Image from: http://gate.ac.uk/sale/tao/alignment-editor.png)
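A sketch of MAP as maximum-weight bipartite matching, using SciPy's Hungarian-style solver. The score matrix is an illustrative stand-in for the summed edge scores w · f, and a one-to-one alignment is assumed (the slide's alignments may be more general).

```python
# MAP for alignment as maximum-weight bipartite matching (toy scores).
import numpy as np
from scipy.optimize import linear_sum_assignment

# scores[i, j] = score of aligning source sentence i with target sentence j
scores = np.array([[3.0, 0.2, 0.1],
                   [0.5, 2.0, 0.3],
                   [0.1, 0.4, 1.5]])

rows, cols = linear_sum_assignment(scores, maximize=True)
print(list(zip(rows.tolist(), cols.tolist())))  # [(0, 0), (1, 1), (2, 2)]
```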
Training

Given: several input-output pairs (x1, y1), (x2, y2), …, (xN, yN)
Error of an output: Ei(y), e.g. Hamming error, which is also decomposable
Train the parameter vector w to minimize the training error:
  min_w Σ_i E_i(y* = argmax_y w · f(x_i, y))
Two problems: the objective is discontinuous, and it might over-fit the training data
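A minimal sketch of why the objective is discontinuous in w: the error only changes when the argmax flips, so it is piecewise constant with jumps. The single training pair, the two candidate outputs, and their feature vectors are illustrative assumptions.

```python
# The training error as a function of w is piecewise constant:
# sweeping one weight shows a jump where argmax_y flips (toy setup).

def hamming(y, y_true):
    return sum(a != b for a, b in zip(y, y_true))

# Two candidate outputs with their (toy) feature vectors f(x, y).
CANDS = {("Title", "Title"): [1.0, 0.0], ("Other", "Other"): [0.0, 1.0]}
y_true = ("Title", "Title")

def train_error(w):
    # y* = argmax_y w . f(x, y), then E(y*)
    y_star = max(CANDS,
                 key=lambda y: sum(wi * fi for wi, fi in zip(w, CANDS[y])))
    return hamming(y_star, y_true)

for w0 in [0.0, 0.49, 0.51, 1.0]:        # sweep one weight
    print(w0, train_error([w0, 0.5]))    # error jumps from 2 to 0 near w0 = 0.5
```

This is why training instead optimizes smooth surrogates of the error, and regularizes them against over-fitting.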