Transcript of: Homotopy-based Semi-Supervised Hidden Markov Models for Sequence Labeling

Page 1: Homotopy-based Semi-Supervised Hidden Markov Models for Sequence Labeling

Gholamreza Haffari, Anoop Sarkar
Presenter: Milan Tofiloski
Natural Language Lab, Simon Fraser University

Page 2: Outline

• Motivation & Contributions
• Experiments
• Homotopy method
• More experiments

Page 3: Maximum Likelihood Principle

• Find the parameter setting of the joint input-output model that maximizes the probability of the given data:

  θ* = argmax_θ  Σ_{(x,y) ∈ L} log P_θ(x, y)  +  Σ_{x ∈ U} log Σ_y P_θ(x, y)

• L : labeled data
• U : unlabeled data

Page 4: Deficiency of MLE

• Usually |U| >> |L|, so the unlabeled term dominates and MLE effectively just maximizes Σ_{x ∈ U} log P(x)
• Which means the input-output relationship is ignored when estimating the parameters!
  – MLE focuses on modeling the input distribution P(x)
  – But we are interested in modeling the joint distribution P(x, y)

Page 5: Remedy for the Deficiency

• Balance the effect of the labeled and unlabeled data by weighting the two terms:

  (1 − λ) Σ_{(x,y) ∈ L} log P_θ(x, y)  +  λ Σ_{x ∈ U} log Σ_y P_θ(x, y)

• Find the λ which maximally takes advantage of the labeled and unlabeled data
• MLE is recovered as a special case of this weighted objective
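As a toy illustration of this weighted objective (my own minimal model, not the paper's HMM: y ~ Bernoulli(0.5) and x | y ~ Bernoulli(theta[y]); `lam` plays the role of λ):

```python
import math

# Toy model (an assumption for illustration): y ~ Bernoulli(0.5),
# x | y ~ Bernoulli(theta[y]).  The objective below is
#   (1 - lam) * sum_L log P(x, y)  +  lam * sum_U log P(x).
def log_joint(theta, x, y):
    px = theta[y] if x == 1 else 1.0 - theta[y]
    return math.log(0.5 * px)

def log_marginal(theta, x):
    # log P(x) = log sum_y P(x, y)
    return math.log(sum(math.exp(log_joint(theta, x, y)) for y in (0, 1)))

def objective(theta, labeled, unlabeled, lam):
    ll_lab = sum(log_joint(theta, x, y) for x, y in labeled)
    ll_unlab = sum(log_marginal(theta, x) for x in unlabeled)
    return (1.0 - lam) * ll_lab + lam * ll_unlab

labeled = [(1, 1), (0, 0), (1, 1)]      # pairs (x, y)
unlabeled = [1, 1, 0, 1, 0, 0, 0, 1]    # x only
theta = [0.2, 0.8]
print(objective(theta, labeled, unlabeled, 0.0))   # supervised term only
print(objective(theta, labeled, unlabeled, 1.0))   # unsupervised term only
```

Setting lam = 0 recovers the purely supervised objective, while lam = 1 ignores the labels entirely; the question the talk addresses is where in between to sit.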

Page 6: An Experiment with HMM

(Figure: MLE performance; lower is better)

• MLE can hurt the performance
• Balancing the labeled- and unlabeled-data terms is beneficial

Page 7: Our Contributions

1. Introducing a principled way to choose λ for HMMs in sequence labeling (tagging) tasks
2. Introducing an efficient dynamic programming algorithm to compute second-order statistics in HMMs

Page 8: Outline

• Motivation & Contributions
• Experiments
• Homotopy method
• More experiments

Page 9: Task

• Field segmentation in information extraction
• 13 tag fields: AUTHOR, TITLE, …

Example (each tag row labels the token row below it):

  EDITOR EDITOR EDITOR EDITOR EDITOR EDITOR TITLE
  A . Elmagarmid , editor . Transaction

  TITLE TITLE TITLE TITLE TITLE TITLE PUB
  Models for Advanced Database Applications , Morgan

  PUB PUB PUB DATE DATE
  - Kaufmann , 1992 .

Page 10: Experimental Setup

• Use an HMM with 13 states
  – Freeze the transition (state -> state) probabilities to what has been observed in the labeled data
  – Use the homotopy method to learn just the emission (state -> alphabet) probabilities
  – Apply additive smoothing to the initial values of the emission and transition probabilities
• Data statistics:
  – Average seq. length: 36.7
  – Average number of segments in a seq.: 5.4
  – Size of labeled/unlabeled data is 300/700

Page 11: Baselines

• Held-out: put aside part of the labeled data as a held-out set, and use it to choose λ
• Oracle: choose λ based on test data using per-position accuracy
• Supervised: forget about the unlabeled data, and just use the labeled data

Page 12: Homotopy vs Baselines

(Figure: accuracy of homotopy vs the baselines; higher is better)

• Decoding by the sequence of most probable states; see the paper for more results
• Even very small values of λ can be useful: here homotopy picks λ = .004, while supervised corresponds to λ = 0

Page 13: Outline

• Motivation & Contributions
• Experiments
• Homotopy method
• More experiments

Page 14: Path of Solutions

• Look at the solution θ(λ) as λ changes from 0 to 1
• Choose the best λ based on the path

(Figure: a solution path, showing a discontinuity and a bifurcation)

Page 15: EM for HMM

• Consider the state -> state and state -> observation events in our HMM
• To find the parameter values which (locally) maximize the objective function for a fixed λ:

  Repeat until convergence: θ ← EM(θ)
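What one round of the EM(θ) update can look like for a fixed λ, sketched on a toy two-coin model rather than an HMM (my simplification: y ~ Bernoulli(0.5), x | y ~ Bernoulli(theta[y]); in the HMM the analogous expected event counts come from forward-backward):

```python
def em_step(theta, labeled, unlabeled, lam):
    """One lam-weighted EM round: labeled items contribute hard counts
    with weight (1 - lam); unlabeled items contribute soft (posterior)
    counts with weight lam; the M-step renormalizes."""
    num = [0.0, 0.0]   # weighted count of x == 1 under each y
    den = [0.0, 0.0]   # weighted count of each y
    for x, y in labeled:
        num[y] += (1.0 - lam) * x
        den[y] += 1.0 - lam
    for x in unlabeled:
        # E-step: posterior P(y | x) under the current theta
        joint = [0.5 * (theta[y] if x == 1 else 1.0 - theta[y]) for y in (0, 1)]
        z = joint[0] + joint[1]
        for y in (0, 1):
            post = joint[y] / z
            num[y] += lam * post * x
            den[y] += lam * post
    # M-step: re-estimate each coin's bias
    return [num[y] / den[y] for y in (0, 1)]

theta = [0.3, 0.7]
for _ in range(50):            # "repeat until convergence"
    theta = em_step(theta, [(1, 1), (0, 0)], [1, 1, 1, 0], lam=0.5)
print(theta)
```

With lam = 0 the unlabeled data has no effect and one step already lands on the supervised estimate; for lam > 0 the iteration settles at a fixed point θ = EM(θ), which is exactly the object the next slide exploits.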

Page 16: Fixed Points of EM

• Useful fact: each EM round is an update map θ ← EM(θ)
• At the fixed points, θ − EM(θ) = 0, i.e. the fixed points are roots of θ − EM(θ)
• This is similar to using homotopy for root finding
  – The same numerical techniques should be applicable here

Page 17: Homotopy for Root Finding

• To find a root of G(θ):
  – start from a root of a simple problem F(θ)
  – trace the roots of the intermediate problems H(θ, λ) = (1 − λ) F(θ) + λ G(θ) while morphing F into G
• To find the (θ, λ) which satisfy the above:
  – Differentiating H(θ(λ), λ) = 0 along the path gives a differential equation
  – Numerically solve the resulting differential eqn.
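A 1-D sketch of this recipe (F and G are my illustrative choices, and I step λ discretely with Newton correction rather than solving the differential equation):

```python
# Start at the root of an easy F, then track the root of
#   H(x, lam) = (1 - lam) * F(x) + lam * G(x)
# as lam goes 0 -> 1, Newton-correcting after each small lam step.
def F(x):  return x - 1.0          # easy problem, root at x = 1
def dF(x): return 1.0
def G(x):  return x**3 - 2.0       # target problem, root at 2 ** (1/3)
def dG(x): return 3.0 * x**2

def trace_root(steps=100, newton_iters=5):
    x = 1.0                        # the known root of F
    for k in range(1, steps + 1):
        lam = k / steps
        for _ in range(newton_iters):          # corrector on H(., lam) = 0
            h  = (1 - lam) * F(x) + lam * G(x)
            dh = (1 - lam) * dF(x) + lam * dG(x)
            x -= h / dh
    return x

print(trace_root())                # ~ 2 ** (1/3) = 1.2599...
```

At lam = 1 the intermediate problem has morphed entirely into G, so the traced point is a root of the hard problem, reached via a chain of easy local corrections.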


Page 18: Solving the Differential Eqn

• The path-tangent condition is M · v = 0, where M is the Jacobian of the EM map
• Repeat until λ = 1:
  – Update (θ, λ) in a proper direction parallel to v = Kernel(M)
  – Update M
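The kernel-following update can be illustrated on the same kind of 1-D homotopy (my toy stand-in: here M is the 1 x 2 Jacobian [dH/dx, dH/dlam], whose kernel has a closed form; for the full parameter vector one would extract the null direction numerically, e.g. via an SVD of M):

```python
import math

# H(x, lam) = (1 - lam) * (x - 1) + lam * (x**3 - 2).
# Moving parallel to the kernel of M = [dH/dx, dH/dlam]
# keeps H(x, lam) = 0 to first order (an Euler path-follower).
def follow_path(step=1e-3):
    x, lam = 1.0, 0.0                       # start at the root of F
    while lam < 1.0:                        # "repeat until lam = 1"
        a = (1 - lam) + lam * 3.0 * x**2    # dH/dx
        b = (x**3 - 2.0) - (x - 1.0)        # dH/dlam = G(x) - F(x)
        n = math.hypot(a, b)
        v = (-b / n, a / n)                 # kernel direction of M = [a, b]
        if v[1] < 0:                        # orient the step so lam increases
            v = (-v[0], -v[1])
        x, lam = x + step * v[0], lam + step * v[1]
    return x

print(follow_path())                        # close to the root of G, 2 ** (1/3)
```

The Euler stepping here is the simplest choice; the conclusion slide's "predictor-corrector" suggestion is the standard upgrade to this loop.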

Page 19: Jacobian of EM

• So, we need to compute the covariance matrix of the events (challenging for HMMs; done with a forward-backward-style algorithm)
• The entry in row i and column j of the covariance matrix is the covariance of the counts of events i and j under the model's posterior
• See the paper for details
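What that covariance looks like, brute-forced on a tiny HMM by enumerating all state sequences (toy numbers of my choosing; the point of the paper's algorithm is precisely to avoid this exponential enumeration):

```python
from itertools import product

# Tiny 2-state HMM and one observation sequence (illustrative values).
start = [0.5, 0.5]
T = [[0.7, 0.3], [0.4, 0.6]]          # transition probabilities
E = [[0.9, 0.1], [0.2, 0.8]]          # emission probabilities
x = [0, 1, 1]                         # observed sequence

def joint(y):                         # P(x, y) for one state sequence y
    p = start[y[0]] * E[y[0]][x[0]]
    for t in range(1, len(x)):
        p *= T[y[t - 1]][y[t]] * E[y[t]][x[t]]
    return p

seqs = list(product((0, 1), repeat=len(x)))
z = sum(joint(y) for y in seqs)
post = {y: joint(y) / z for y in seqs}          # posterior P(y | x)

def count(y, event):                  # c_e(y): occurrences of transition (i, j)
    i, j = event
    return sum(1 for t in range(1, len(y)) if (y[t - 1], y[t]) == (i, j))

def cov(a, b):                        # Cov(c_a, c_b) = E[c_a c_b] - E[c_a] E[c_b]
    e_ab = sum(p * count(y, a) * count(y, b) for y, p in post.items())
    e_a = sum(p * count(y, a) for y, p in post.items())
    e_b = sum(p * count(y, b) for y, p in post.items())
    return e_ab - e_a * e_b

print(cov((0, 1), (0, 1)))            # variance of the 0 -> 1 transition count
```

The quadratic terms E[c_a c_b] are the "second-order statistics" of contribution 2; the next slide's dynamic program computes them without enumerating the exponentially many sequences.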

Page 20: Expected Quadratic Counts for HMM

• Dynamic programming algorithm to efficiently compute the expected quadratic counts (EQC)
• Pre-compute a table Zx for each sequence x
• Having the table Zx, the EQC can be computed efficiently
  – The time complexity depends on K, the number of states in the HMM (see paper for more details)

(Figure: trellis between positions i and j of the sequence, with states k1 and k2)

Page 21: How to Choose λ Based on the Path

• monotone: the first point at which the monotonicity of the path changes
• maxEnt: choose the λ for which the model has maximum entropy on the unlabeled data
• minEig: when solving the diff. eqn., consider the minimum singular value of the matrix M; across rounds, choose the λ for which the minimum singular value is the smallest
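The maxEnt rule reduces to an argmax over candidate λ's; in this sketch the candidates and the predictive distributions their models would induce on the unlabeled data are made-up numbers, standing in for models read off the solution path:

```python
import math

def entropy(dist):
    # Shannon entropy (nats) of one predictive distribution
    return -sum(p * math.log(p) for p in dist if p > 0)

# Hypothetical candidates: lam -> list of P(y | x) over unlabeled x's.
candidates = {
    0.004: [[0.5, 0.5], [0.6, 0.4]],
    0.2:   [[0.9, 0.1], [0.8, 0.2]],
    0.8:   [[0.99, 0.01], [0.97, 0.03]],
}
# maxEnt: pick the lam whose model is most uncertain on unlabeled data.
best = max(candidates, key=lambda lam: sum(entropy(d) for d in candidates[lam]))
print(best)   # -> 0.004, the highest-entropy (least overcommitted) model
```

Under these toy numbers the smallest λ wins, echoing the later slide where the λ's selected by maxEnt are much smaller than those of minEig and monotone.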


Page 22: Outline

• Motivation & Contributions
• Experiments
• Homotopy method
• More experiments

Page 23: Varying the Size of Unlab Data

(Figure: performance vs unlabeled-data size; size of the labeled data: 100)

• The three homotopy-based methods outperform EM
• maxEnt outperforms minEig and monotone
• minEig and monotone have similar performances

Page 24: Picked λ Values

(Figure: the λ values picked by each method)

Page 25: Picked λ Values

• EM gives a higher weight to the unlabeled data than the homotopy-based methods do
• The λ's selected by maxEnt are much smaller than those selected by minEig and monotone
• The λ's selected by minEig and monotone are close

Page 26: Conclusion and Future Work

• Using EM can hurt performance in the case |L| << |U|
• Proposed a method to alleviate this problem for HMMs on seq. labeling tasks
• To speed up the method:
  – Use sampling to find an approximation to the covariance matrix
  – Use faster methods for recovering the solution path, e.g. predictor-corrector

Page 27: Questions?

Page 28: Is Oracle Outperformed by Homotopy?

• No!
  – The performance measure used to select λ in the oracle method may be different from that used in comparing homotopy and oracle
  – The decoding alg. used in the oracle method may be different from that used in comparing homotopy and oracle

Page 29: Why Not Just Fix λ?

• This ad hoc way of setting λ has two drawbacks:
  – It may still hurt the performance: the proper λ may be much smaller than the fixed value
  – In some situations the right choice of λ may be a big value; fixing λ at a small value is very conservative and does not fully take advantage of the available unlabeled data

Page 30: Homotopy vs Baselines

(Figure: our method vs the baselines under both decoders; higher is better; see the paper for more results)

– Viterbi decoding: most probable sequence of states
– SMS decoding: sequence of most probable states