Gholamreza Haffari Anoop Sarkar Presenter: Milan Tofiloski

Post on 05-Jan-2016


Transcript of Gholamreza Haffari Anoop Sarkar Presenter: Milan Tofiloski

1

Gholamreza Haffari Anoop Sarkar

Presenter: Milan Tofiloski

Natural Language Lab

Simon Fraser university

Homotopy-based Semi-Supervised Hidden Markov

Models for Sequence Labeling

2


• Motivation & Contributions

• Experiments

• Homotopy method

• More experiments

Outline

3

• Parameter setting for the joint probability of input-output which maximizes the probability of the given data:

θ* = argmax_θ  Σ_{(x,y) ∈ L} log P(x, y; θ)  +  Σ_{x ∈ U} log Σ_y P(x, y; θ)   (1)

• L : labeled data

• U : unlabeled data

Maximum Likelihood Principle

4

Deficiency of MLE

• Usually |U| >> |L|, so the unlabeled term dominates the objective:

Σ_{(x,y) ∈ L} log P(x, y; θ) + Σ_{x ∈ U} log P(x; θ)  ≈  Σ_{x ∈ U} log P(x; θ)

• Which means the relationship of input and output is ignored when estimating the parameters!

– MLE focuses on modeling the input distribution P(x)

– But we are interested in modeling the joint distribution P(x, y)

5

Remedy for the Deficiency

• Balance the effect of lab and unlab data with a weight λ ∈ [0, 1]:

J(θ, λ) = (1 - λ) Σ_{(x,y) ∈ L} log P(x, y; θ) + λ Σ_{x ∈ U} log P(x; θ)

• Find the λ which maximally takes advantage of lab and unlab data

• MLE corresponds to one particular fixed choice of λ
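The balanced objective can be sketched in a few lines of Python. This is an illustrative stand-in, not the paper's HMM: the toy joint distribution `theta` and the data below are made up.

```python
import math

# Hypothetical toy joint model over binary x and y: theta[x][y] = P(x, y).
theta = {0: {0: 0.4, 1: 0.1}, 1: {0: 0.2, 1: 0.3}}

labeled = [(0, 0), (1, 1), (0, 0)]     # (x, y) pairs from L
unlabeled = [0, 1, 1, 0, 1, 1, 0]      # x only, from U (|U| > |L|)

def objective(lam):
    """(1 - lam) * log P(L) + lam * log P(U): lam balances the two terms."""
    ll_lab = sum(math.log(theta[x][y]) for x, y in labeled)
    ll_unlab = sum(math.log(sum(theta[x].values())) for x in unlabeled)
    return (1.0 - lam) * ll_lab + lam * ll_unlab

# lam = 0 ignores U (supervised); as lam -> 1 the unlabeled term dominates,
# which is effectively what happens to plain MLE when |U| >> |L|.
print(objective(0.0), objective(0.5), objective(1.0))
```

Setting `lam = 0` recovers the supervised log-likelihood, so the whole family interpolates between the two data sources.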

6

An experiment with HMM

[Figure: MLE performance as the weight on unlabeled data varies; lower is better]

• MLE can hurt the performance

• Balancing the lab- and unlab-data related terms is beneficial

7

Our Contributions

1. Introducing a principled way to choose λ for HMMs in sequence labeling (tagging) tasks

2. Introducing an efficient dynamic programming algorithm to compute second order statistics in HMM

8


• Motivation & Contributions

• Experiments

• Homotopy method

• More experiments

Outline

9

Task


• Field segmentation in information extraction

• 13 tag fields: AUTHOR, TITLE, …

EDITOR EDITOR EDITOR EDITOR EDITOR EDITOR TITLE

A . Elmagarmid , editor . Transaction

TITLE TITLE TITLE TITLE TITLE TITLE PUB

Models for Advanced Database Applications , Morgan

PUB PUB PUB DATE DATE

- Kaufmann , 1992 .

10

Experimental Setup

• Use an HMM with 13 states

– Freeze the transition (state -> state) probabilities to what has been observed in the lab data

– Use the Homotopy method to learn just the emission (state -> alphabet) probabilities

– Do additive smoothing for the initial values of the emission and transition probabilities

• Data statistics:

– Average seq. length: 36.7

– Average number of segments in a seq: 5.4

– Size of Lab/Unlab data is 300/700

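The additive smoothing of the initial probability tables can be sketched as follows. The counts and the smoothing constant are made-up values, since the slide does not specify them.

```python
import numpy as np

DELTA = 0.1  # assumed smoothing constant; the slide does not give its value

# counts[s, w] = number of times state s emitted word w in the labeled data
counts = np.array([[5., 0., 2.],
                   [0., 3., 1.]])

# Add DELTA to every cell, then renormalize each row: zero counts become
# small positive probabilities instead of hard zeros.
emission = (counts + DELTA) / (counts + DELTA).sum(axis=1, keepdims=True)
print(emission)  # each row sums to 1, and no entry is exactly zero
```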

11

Baselines

• Held-out: put aside part of the lab data as a held-out set, and use it to choose λ

• Oracle: choose λ based on the test data, using per-position accuracy

• Supervised: forget about the unlab data and just use the lab data (λ = 0)


12


Homotopy vs Baselines

[Table: per-position accuracy of homotopy vs. the baselines; higher is better]

• Decoding: sequence of most probable states (see paper for more results)

• Even very small values of λ can be useful: homotopy picks λ = .004, while supervised corresponds to λ = 0

13


• Motivation & Contributions

• Experiments

• Homotopy method

• More experiments

Outline

14

Path of Solutions

• Look at the solution θ(λ) as λ changes from 0 to 1

• Choose the best λ based on the path

[Figure: path of solutions over λ, showing a discontinuity and a bifurcation]

15

EM for HMM

• Let e denote a state -> state or state -> observation event in our HMM, with parameter θ_e

• To find the parameter values which (locally) maximize the objective function for a fixed λ:

Repeat until convergence: θ ← EM(θ)

16

Fixed Points of EM

• Useful fact: θ* (locally) maximizes the objective iff θ* is a fixed point of the EM operator

• At the fixed points θ*: EM(θ*) - θ* = 0

• This is similar to using Homotopy for root finding

– The same numerical techniques should be applicable here

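The fixed-point view can be checked on a toy problem. The sketch below is an assumed example, not the paper's HMM: it runs EM for the mixing weight of two coins with known biases and verifies that at convergence EM(pi) - pi ≈ 0, which is the root-finding form G(θ) = EM(θ) - θ.

```python
# Toy example (not the paper's HMM): EM as a fixed-point map for the mixing
# weight pi of two coins with known head probabilities.
P_HEADS = (0.2, 0.8)
data = [1, 1, 0, 1, 1, 1, 0, 1]   # observed coin flips (made-up)

def em_step(pi):
    """One EM update of pi = P(component 1): E-step, then M-step."""
    post = []
    for x in data:
        lik0 = (1 - pi) * (P_HEADS[0] if x else 1 - P_HEADS[0])
        lik1 = pi * (P_HEADS[1] if x else 1 - P_HEADS[1])
        post.append(lik1 / (lik0 + lik1))   # responsibility of component 1
    return sum(post) / len(post)

pi = 0.5
for _ in range(500):                 # "repeat until convergence"
    new_pi = em_step(pi)
    if abs(new_pi - pi) < 1e-12:
        break
    pi = new_pi

print(pi, em_step(pi) - pi)          # residual is ~0 at the fixed point
```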

17

Homotopy for Root Finding

• To find a root of G(θ):

– start from a root of a simple problem F(θ)

– trace the roots of the intermediate problems H(θ, λ) = (1 - λ) F(θ) + λ G(θ) while morphing F to G

• To find the (θ, λ) which satisfy H(θ, λ) = 0:

– Setting the total derivative of H along the path to zero gives a differential equation

– Numerically solve the resulting differential eqn

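A minimal numerical sketch of this recipe (scalar case, with made-up F and G): start at the known root of F and track the root of H(x, λ) = (1 - λ)F(x) + λG(x) as λ moves from 0 to 1, correcting with Newton steps at each λ.

```python
def F(x): return x - 1.0            # simple problem, root known at x = 1
def G(x): return x**3 - 2.0         # target problem, root at 2 ** (1/3)

def dF(x): return 1.0
def dG(x): return 3.0 * x * x

x = 1.0                              # start at the root of F
steps = 100
for k in range(1, steps + 1):
    lam = k / steps                  # morph F into G in small increments
    for _ in range(20):              # Newton correction at this lam
        h = (1 - lam) * F(x) + lam * G(x)
        dh = (1 - lam) * dF(x) + lam * dG(x)
        x -= h / dh

print(x)  # ≈ 2 ** (1/3), the root of G
```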

18

Solving the Differential Eqn

• The differential equation has the form M · v = 0, where v = (dθ, dλ) is the direction of the path and M involves the Jacobian of the EM operator

• Repeat until λ reaches 1:

– Update (θ, λ) in a proper direction parallel to v = Kernel(M)

– Update M
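The kernel step can be sketched numerically. Here M is a random stand-in for the augmented Jacobian (the real one would come from the HMM), and the null-space direction is read off the SVD.

```python
import numpy as np

# Illustrative only: the tangent direction v of the solution path satisfies
# M v = 0, so v spans the (one-dimensional) kernel of the n x (n+1) matrix M.
rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n + 1))   # stand-in for the augmented Jacobian

# Kernel via SVD: the right-singular vector for the smallest singular value.
_, _, vt = np.linalg.svd(M)
v = vt[-1]                            # unit-norm null-space direction

print(np.abs(M @ v).max())            # ~0: v is in Kernel(M)

# A predictor step would then move (theta, lam) by a small multiple of v
# (with the sign chosen so lam keeps increasing) and recompute M.
```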

19

Jacobian of EM

• So, we need to compute the covariance matrix of the events

• The entry in row i and column j of the covariance matrix is the covariance of the counts of events i and j: E[c_i c_j] - E[c_i] E[c_j] (see the paper for details)

• The first-order expectations E[c_i] come from Forward-Backward; the quadratic terms E[c_i c_j] are challenging to compute for HMMs

20

Expected Quadratic Counts for HMM

• A dynamic programming algorithm to efficiently compute the expected quadratic counts E[c_i c_j]

• Pre-compute a table Zx for each sequence x

• Having the table Zx, the EQC can be computed efficiently

– The time complexity depends on K, the number of states in the HMM (see paper for more details)

[Figure: trellis over positions x_i, x_{i+1}, …, x_j, with a state pair (k1, k2)]
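What the dynamic program computes can be cross-checked by brute force on a tiny HMM (the parameters below are made up): enumerate every state sequence and accumulate the first-order and quadratic expected counts of the transition events.

```python
import itertools
import numpy as np

K, T = 2, 4                                 # tiny HMM: 2 states, length 4
pi = np.array([0.6, 0.4])                   # initial-state probabilities
A = np.array([[0.7, 0.3], [0.4, 0.6]])      # transition probabilities
B = np.array([[0.9, 0.1], [0.2, 0.8]])      # emission probabilities
x = [0, 1, 1, 0]                            # an observation sequence

def trans_counts(s):
    """Count vector over the K*K transition events (k1 -> k2) in path s."""
    c = np.zeros(K * K)
    for a, b in zip(s, s[1:]):
        c[a * K + b] += 1
    return c

Ec = np.zeros(K * K)                        # E[c]
Ecc = np.zeros((K * K, K * K))              # E[c c^T], the quadratic counts
Z = 0.0
for s in itertools.product(range(K), repeat=T):
    p = pi[s[0]] * B[s[0], x[0]]            # joint P(x, s)
    for t in range(1, T):
        p *= A[s[t - 1], s[t]] * B[s[t], x[t]]
    c = trans_counts(s)
    Z += p
    Ec += p * c
    Ecc += p * np.outer(c, c)

Ec, Ecc = Ec / Z, Ecc / Z                   # normalize to the posterior
cov = Ecc - np.outer(Ec, Ec)                # covariance matrix of the events
print(Ec.sum())                             # ≈ T - 1: one transition per step
```

The DP in the paper gets the same quantities without the exponential enumeration; this exhaustive version is only feasible because K and T are tiny.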

21

How to Choose λ Based on the Path

• monotone: the first λ at which the monotonicity of the path changes

• maxEnt: choose the λ for which the model has maximum entropy on the unlab data

• minEig: when solving the diff eqn, consider the minimum singular value of the matrix M; across rounds, choose the λ for which the minimum singular value is the smallest

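The maxEnt criterion can be sketched as follows. The posteriors below are made-up stand-ins for P(tag | token) on the unlabeled data at a few candidate λ values from the path.

```python
import numpy as np

# Hypothetical candidate lambdas -> per-token tag posteriors (2 tokens, 2 tags).
posteriors = {
    0.004: np.array([[0.5, 0.5], [0.6, 0.4]]),
    0.1:   np.array([[0.9, 0.1], [0.8, 0.2]]),
    0.5:   np.array([[0.99, 0.01], [0.97, 0.03]]),
}

def mean_entropy(p):
    """Average Shannon entropy of the rows (one row per unlabeled token)."""
    return float(-(p * np.log(p)).sum(axis=1).mean())

# maxEnt picks the lambda whose model is least confident on the unlab data.
best = max(posteriors, key=lambda lam: mean_entropy(posteriors[lam]))
print(best)  # 0.004: the flattest posteriors have the highest entropy
```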

22


• Motivation & Contributions

• Experiments

• Homotopy method

• More experiments

Outline

23

Varying the Size of Unlab Data


Size of the labeled data: 100

• The three Homotopy-based methods outperform EM

• maxEnt outperforms minEig and monotone

• minEig and monotone have similar performances

24

Picked λ Values


25


Picked λ Values

• EM gives higher weight to the unlabeled data compared to the Homotopy-based methods

• λ values selected by maxEnt are much smaller than those selected by minEig and monotone

• minEig and monotone are close

26

Conclusion and Future Work

• Using EM can hurt performance in the case |L| << |U|

• Proposed a method to alleviate this problem for HMMs on seq. labeling tasks

• To speed up the method:

– Use sampling to approximate the covariance matrix

– Use faster methods for recovering the solution path, e.g. predictor-corrector


27

Questions?


28

Is Oracle outperformed by Homotopy?


• No!

- The performance measure used for selecting λ in the oracle method may be different from the one used when comparing homotopy and oracle

- The decoding algorithm used in the oracle may be different from the one used when comparing homotopy and oracle

29

Why not set λ to a fixed value a priori?


• This ad hoc way of setting λ has two drawbacks:

- It still may hurt the performance; the proper λ may be much smaller than that

- In some situations, the right choice of λ may be a big value; a fixed conservative setting does not fully take advantage of the available unlabeled data

30

Homotopy vs Baselines

– Viterbi decoding: most probable sequence of states

– SMS decoding: sequence of most probable states

[Table: our method vs. the baselines under both decodings; higher is better. See the paper for more results.]