Inference Protocols for Coreference Resolution
Source: web.cs.ucla.edu/~kwchang/documents/slides/practical.pdf
Practical Learning Algorithms for Structured Prediction Models
Kai-Wei Chang
University of Illinois at Urbana-Champaign
2
Dream: Intelligent systems that are
able to read, to see, to talk,
and to answer questions.
3
Personal assistant system
Translation system
Carefully Slide
4
5
小心:
Carefully
Careful
Take Care
Caution
地滑:
Slide
Landslip
Wet Floor
Smooth
Christopher Robin is alive and well. He is the
same person that you read about in the book,
Winnie the Pooh. As a boy, Chris lived in a
pretty home called Cotchfield Farm. When
Chris was three years old, his father wrote a
poem about him. The poem was printed in a
magazine for others to read. Mr. Robin then
wrote a book
6
Q: [Chris] = [Mr. Robin] ?
Slide modified from Dan Roth
Christopher Robin is alive and well. He is the
same person that you read about in the book,
Winnie the Pooh. As a boy, Chris lived in a
pretty home called Cotchfield Farm. When
Chris was three years old, his father wrote a
poem about him. The poem was printed in a
magazine for others to read. Mr. Robin then
wrote a book
7
Complex Decision Structure
Christopher Robin is alive and well. He is the
same person that you read about in the book,
Winnie the Pooh. As a boy, Chris lived in a
pretty home called Cotchfield Farm. When
Chris was three years old, his father wrote a
poem about him. The poem was printed in a
magazine for others to read. Mr. Robin then
wrote a book
8
Co-reference Resolution
Scalability Issues
Large amount of data
Complex decision structure
9
[Slide background: a collage of document excerpts and paper abstracts (coreference passages, a translation snippet, learning-to-search abstracts) illustrating data scale and decision complexity]
[Modeling] Expressive and general formulations
[Algorithms] Principled and efficient
[Applications] Support many applications
Goal: Practical Machine Learning
10
My Research Contributions
Data
Size
Problem Complexity
Limited memory linear classifier [KDD 10, 11, TKDD 12]
Structured prediction models [ICML 14, ECML 13a, 13b, AAAI 15, CoNLL 11, 12]
Latent representation for knowledge bases [EMNLP 13, 14]
Linear classification [ICML 08, KDD 08, JMLR 08a, 10a, 10b, 10c]
11
My Research Contributions
LIBLINEAR [ICML 08, KDD 08, JMLR 08a, 10a, 10b, 10c]
• Implements our proposed learning algorithms
• Supports binary and multiclass classification
Impact: > 60,000 downloads; > 2,600 citations in AI (AAAI, IJCAI), Data Mining (KDD, ICDM), Machine Learning (ICML, NIPS), Computer Vision (ICCV, CVPR), Information Retrieval (WWW, SIGIR), NLP (ACL, EMNLP), Multimedia (ACM-MM), HCI (UIST), and Systems (CCS)
12
My Research Contributions
13
(Selective) Block Minimization [KDD 10, 11, TKDD 12]
Supports learning from large data and streaming data
KDD Best Paper (2010), Yahoo! KSC award (2011)
My Research Contributions
14
Latent Representation for KBs [EMNLP 13b, 14]
Tensor methods for completing missing entries in KBs
Applications: e.g., entity relation extraction, word relation extraction
My Research Contributions
15
Structured Prediction Models [ICML 14, ECML 13a, 13b, AAAI 15, CoNLL 11, 12]
• Design tractable, principled, domain-specific models
• Speed up general structured models
Structured Prediction
Task Input Output
Part-of-speech
Tagging
They operate
ships and banks.
Dependency
Parsing
They operate
ships and banks.
Segmentation
16
Pronoun Verb Noun And Noun
Root They operate ships and banks .
Assign values to a set of interdependent output variables
Structured Prediction Models
Learn a scoring function:
score(output y | input x, model w)
Linear model: S(y | x, w) = wᵀ φ(x, y)
Features: e.g., Verb-Noun, Mary-Noun
Input x: Mary had a little lamb
Output y: Noun Verb Det Adj Noun
Features based on both input and output
17
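The linear scoring model above can be sketched in a few lines. This is an illustrative toy, not the deck's actual feature set: the feature map counts word-tag and tag-tag indicator features (e.g., Mary-Noun, Verb-Det), and the score is the dot product with a weight vector.

```python
# Minimal sketch of S(y | x, w) = w . phi(x, y) for POS tagging.
# Feature names and weights below are illustrative assumptions.

def phi(x, y):
    """Feature map: counts of word-tag and adjacent tag-tag indicators."""
    feats = {}
    for word, tag in zip(x, y):
        feats[(word, tag)] = feats.get((word, tag), 0) + 1
    for prev, cur in zip(y, y[1:]):
        feats[(prev, cur)] = feats.get((prev, cur), 0) + 1
    return feats

def score(x, y, w):
    """Linear model: dot product of the weights with the feature counts."""
    return sum(w.get(f, 0.0) * v for f, v in phi(x, y).items())

x = ["Mary", "had", "a", "little", "lamb"]
y = ["Noun", "Verb", "Det", "Adj", "Noun"]
w = {("Mary", "Noun"): 1.0, ("Verb", "Det"): 0.5, ("Adj", "Noun"): 0.5}
print(score(x, y, w))  # sums the weights of features present in (x, y)
```

Only three of the nine features fired by (x, y) carry weight here, so the score is 1.0 + 0.5 + 0.5.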
Find the best scoring output given the model
argmax_y score(output y | input x, model w)
Output space is usually exponentially large
Inference algorithms:
Specific: e.g., Viterbi (linear chain)
General: Integer linear programming (ILP)
Approximate inference algorithms:
e.g., belief propagation, dual decomposition
Inference
18
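For linear-chain models, the argmax over the exponentially large output space is computed exactly by Viterbi. A minimal sketch with additive scores; the emission and transition tables are made-up illustrative values, not trained weights:

```python
# Viterbi decoding for a linear-chain model under additive scores.

def viterbi(words, tags, emit, trans):
    """Return the highest-scoring tag sequence."""
    # best[t] = best score of any prefix ending in tag t
    best = {t: emit.get((words[0], t), 0.0) for t in tags}
    back = []
    for word in words[1:]:
        prev_best, best, choices = best, {}, {}
        for t in tags:
            s, p = max(
                (prev_best[p] + trans.get((p, t), 0.0), p) for p in tags
            )
            best[t] = s + emit.get((word, t), 0.0)
            choices[t] = p
        back.append(choices)
    # Recover the argmax path by walking the back-pointers.
    last = max(best, key=best.get)
    path = [last]
    for choices in reversed(back):
        path.append(choices[path[-1]])
    return list(reversed(path))

tags = ["Noun", "Verb"]
emit = {("They", "Noun"): 1.0, ("operate", "Verb"): 1.0, ("ships", "Noun"): 1.0}
trans = {("Noun", "Verb"): 0.5, ("Verb", "Noun"): 0.5}
print(viterbi(["They", "operate", "ships"], tags, emit, trans))
```

The dynamic program keeps only the best score per ending tag, so the search is linear in sentence length rather than exponential.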
Learning Structured Models
Online, e.g., Structured Perceptron [Collins 02]
Batch e.g., Structured SVM
Cutting plane: [Tsochantaridis+ 05, Joachims+ 09]
Dual Coordinate Descent: [Shevade+ 11, Chang+ 13]
Block-Coordinate Frank-Wolfe: [Lacoste-Julien+ 13]
Parallel Dual Coordinate Descent: [ECML 13a]
19
Loop: solve inferences → update the model
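The solve-inference-then-update loop above is, in its simplest form, the structured perceptron [Collins 02]. A toy sketch, not the cited implementations: inference is done by exhaustive enumeration (only viable for toy-sized outputs), and the data is a single illustrative example.

```python
# Structured perceptron sketch: solve inference, update on mistakes.
from itertools import product

def phi(x, y):
    """Toy feature map: word-tag indicator counts."""
    feats = {}
    for word, tag in zip(x, y):
        feats[(word, tag)] = feats.get((word, tag), 0) + 1
    return feats

def structured_perceptron(data, phi, argmax_fn, epochs=5, eta=1.0):
    w = {}
    for _ in range(epochs):
        for x, y_gold in data:
            y_pred = argmax_fn(x, w)      # solve inference
            if y_pred != y_gold:          # mistake-driven update
                for f, v in phi(x, y_gold).items():
                    w[f] = w.get(f, 0.0) + eta * v
                for f, v in phi(x, y_pred).items():
                    w[f] = w.get(f, 0.0) - eta * v
    return w

def make_argmax(tags):
    # Exhaustive inference over all tag sequences (toy sizes only).
    def argmax_fn(x, w):
        return max(
            product(tags, repeat=len(x)),
            key=lambda y: sum(w.get(f, 0.0) * v for f, v in phi(x, y).items()),
        )
    return argmax_fn

data = [(("They", "operate"), ("Pronoun", "Verb"))]
w = structured_perceptron(data, phi, make_argmax(["Pronoun", "Verb"]))
print(make_argmax(["Pronoun", "Verb"])(("They", "operate"), w))
```

Real systems replace the exhaustive argmax with Viterbi or an ILP solver; the update rule is unchanged.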
Outline
20
Co-reference
1. Applications:
Co-reference; ESL Grammar Correction; Word Relations
2. Modeling: Supervised Clustering Model
3. Algorithms: Learning with Amortized Inference
ESL grammar
Correction
Word
Relations
Outline
21
1. Applications:
Co-reference; ESL Grammar Correction; Word Relations
Christopher Robin is alive and well. He is the
same person that you read about in the book,
Winnie the Pooh. As a boy, Chris lived in a
pretty home called Cotchfield Farm. When
Chris was three years old, his father wrote a
poem about him. The poem was printed in a
magazine for others to read. Mr. Robin then
wrote a book
22
Co-reference Resolution
Co-reference Resolution [EMNLP 13a, ICML 14, in submission]
Proposed a novel, principled, linguistically motivated model
Performance*
[Bar chart (50–65 scale): Stanford, Chen+, Ours (2012), Martschat+, Ours (2013), Fernandes+, HOTCoref, Berkeley, Ours (2015)]
23
Winner of the CoNLL Shared Task 2011
Winner of the CoNLL Shared Task 2012
*Avg ( MUC, B3, CEAF )
Latent forest structure
Co-reference Resolution Demo
24
http://bit.ly/illinoisCoref
ESL Grammar Error Correction [CoNLL 13, 14]
They believe that such situation must be avoided.
situation
a situation
situations
a situations
First place in the CoNLL Shared Tasks '13 and '14
25
Identifying Relations between Words [EMNLP 14]
GRE antonym task (no context):
Look up in a thesaurus [Encarta]: 56%
Our tensor method [EMNLP 13b]: 77% (the best result so far)
Why?
Considers multiple word relations simultaneously
e.g., inanimate -Ant- alive -Syn- living
Which word is the opposite of adulterate?
(a) renounce (b) forbid (c) purify (d) criticize (e) correct
26
Word Relation Demo
http://bit.ly/wordRelation
Antonym of adulterate?
(a) renounce -0.014
(b) forbid 0.004
(c) purify 0.781
(d) criticize -0.004
(e) correct -0.010
27
Outline
28
Co-reference
2. Modeling: Supervised Clustering Model
ESL grammar
Correction
Word
Relations
Christopher Robin is alive and well. He is the
same person that you read about in the book,
Winnie the Pooh. As a boy, Chris lived in a
pretty home called Cotchfield Farm. When
Chris was three years old, his father wrote a
poem about him. The poem was printed in a
magazine for others to read. Mr. Robin then
wrote a book
29
Co-reference Resolution
Learn a pairwise similarity measure
(local predictor)
Example features:
same sub-string?
positions in the paragraph
30+ other feature types
Key questions:
How to learn the similarity function
How to do clustering
30
Co-reference Resolution
Christopher Robin is
alive and well. He is
the same person that
you read about in the
book, Winnie the
Pooh. As a boy, Chris
lived in a pretty home
called Cotchfield
Farm. When Chris
was three years old,
his father wrote a
poem about him. The
poem was printed in a
magazine for others to
read. Mr. Robin then
wrote a book
Decoupling Approach
A heuristic to learn the model [Soon+ 01, Bengtson+ 08, CoNLL11]
Decouple learning and inference:
31
Learn a pairwise similarity function
Cluster based on this function
Decoupling Approach-Learning
32
Mentions: Chris1, Chris2, his father3, him4, Mr. Robin5
Positive samples:
(Chris1, Chris2), (Chris1, him4), (Chris2, him4), (his father3, Mr. Robin5)
Negative samples:
(Chris1, his father3), (Chris2, his father3), (him4, his father3),
(Chris1, Mr. Robin5), (Chris2, Mr. Robin5), (him4, Mr. Robin5)
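The pair construction on this slide can be sketched directly: every pair of mentions within the same gold cluster becomes a positive example, every cross-cluster pair a negative one. This is a simplification of the cited heuristics ([Soon+ 01] actually restricts which negatives are used); the mention names follow the slide.

```python
# Build pairwise training data from gold coreference clusters.
from itertools import combinations

def make_pairs(mentions, cluster_of):
    """mentions: ordered list; cluster_of: mention -> gold cluster id."""
    pos, neg = [], []
    for a, b in combinations(mentions, 2):
        (pos if cluster_of[a] == cluster_of[b] else neg).append((a, b))
    return pos, neg

mentions = ["Chris1", "Chris2", "his father3", "him4", "Mr. Robin5"]
cluster_of = {"Chris1": 0, "Chris2": 0, "him4": 0,
              "his father3": 1, "Mr. Robin5": 1}
pos, neg = make_pairs(mentions, cluster_of)
print(len(pos), len(neg))  # 4 positives, 6 negatives, as on the slide
```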
Greedy Best-Left-Link Clustering
[Bill Clinton], recently elected as the [President of the USA]
33
Greedy Best-Left-Link Clustering
[Bill Clinton], recently elected as the [President of the USA], has been
invited by the [Russian President]
34
Greedy Best-Left-Link Clustering
[Bill Clinton], recently elected as the [President of the USA], has been
invited by the [Russian President], [Vladimir Putin], to visit [Russia].
[President Clinton]
35
Greedy Best-Left-Link Clustering
[Bill Clinton], recently elected as the [President of the USA], has been
invited by the [Russian President], [Vladimir Putin], to visit [Russia].
[President Clinton] said that [he] looks forward to strengthening ties
between [USA] and [Russia].
36
Best Left-Linking Forest [Soon+ 01, Bengtson+ 08, CoNLL 11]
Nodes: [Bill Clinton], [President of the USA], [Russian President], [Vladimir Putin], [Russia], [Russia], [President Clinton], [he]
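The greedy best-left-link procedure can be sketched as follows: scan mentions left to right, link each mention to its highest-scoring antecedent, and start a new cluster when no antecedent scores above a threshold. The `sim` function here is a stand-in (exact head-word match) for the learned pairwise similarity, not the paper's model.

```python
# Greedy best-left-link clustering sketch.

def best_left_link(mentions, sim, threshold=0.0):
    links = {}  # mention index -> index of its best left antecedent
    for j in range(1, len(mentions)):
        i, s = max(
            ((i, sim(mentions[i], mentions[j])) for i in range(j)),
            key=lambda p: p[1],
        )
        if s > threshold:
            links[j] = i
    # Transitive closure of the left-links yields the clusters (a forest).
    cluster = list(range(len(mentions)))
    for j in sorted(links):
        cluster[j] = cluster[links[j]]
    return cluster

mentions = ["Bill Clinton", "President of the USA", "Russian President",
            "Vladimir Putin", "President Clinton"]

def sim(a, b):
    # Toy similarity: same head (last) word => likely coreferent.
    return 1.0 if a.split()[-1] == b.split()[-1] else -1.0

print(best_left_link(mentions, sim))
```

With this toy similarity, "President Clinton" links back to "Bill Clinton" (shared head word "Clinton") and everything else starts its own cluster.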
Challenges
37
Christopher Robin is alive and well. He is the same person that
you read about in the book, Winnie the Pooh. As a boy, Chris
lived in a pretty home called Cotchfield Farm. When Chris was
three years old, his father wrote a poem about him. The poem
was printed in a magazine for others to read. Mr. Robin then
wrote a book
Decoupling may lose information
Challenges
39
In addition, we need world knowledge
Chris
Chris
his father
1. Complexity: need an efficient algorithm
2. Modeling: learn the metric while clustering
3. Knowledge: augment with knowledge
Structured Learning Approach
Learn the similarity function while clustering
40
Cluster based on this function.
Update the similarity function
Attempt: All-Links Clustering [McCallum+ 03, CoNLL 11]
Define a global scoring function using all within-cluster pairs
The inference problem is too hard
41
Christopher Robin He
Chris
Chris his father
Mr. Robin
Latent Left-Linking Model (L3M) [ICML 14, EMNLP 13]
Score (a clustering C)
= Score (the best left-linking forest that is consistent with C)
= Score of edges in the forests
42
Christopher Robin He
Chris
Chris his father
Mr. Robin
Linguistic Constraints
Must-link constraints:
E.g., SameProperName, …
Cannot-link constraints:
E.g., ModifierMismatch, …
43
[Bill Clinton], recently elected as the [President of the USA], has been
invited by the [Russian President], [Vladimir Putin], to visit [Russia].
[President Clinton] said that [he] looks forward to strengthening ties
between [USA] and [Russia].
Inference in L3M [ICML 14, EMNLP 13]
Solved by a greedy algorithm or formulated as an Integer Linear Program (ILP)
44
argmax_y Σ_{i,j} s_{i,j} y_{i,j},  where s_{i,j} = w · φ(i, j)
s.t. A y ≤ b;  y_{i,j} ∈ {0, 1}
y_{i,j} = 1 ⇔ (i, j) is an edge in the forest
The constraints A y ≤ b encode:
• Modeling constraints
• Linguistic constraints
Learning L3M (simplified version) [ICML 14, EMNLP 13a]
45
Predicted forest vs. latent forest, both drawn over the same passage:
[Bill Clinton], recently elected as the [President of the USA], has been invited by the [Russian President], [Vladimir Putin], to visit [Russia]. [President Clinton] said that [he] looks forward to strengthening ties between [USA] and [Russia].
Learning L3M (simplified version) [ICML 14, EMNLP 13a]
46
Loop until stopping condition is met:
  For each (x_i, y_i) pair:
    (ŷ, ĥ) = argmax_{y, h} w · φ(x_i, y, h)        (best predicted forest)
    h* = argmax_h w · φ(x_i, y_i, h)               (best latent forest consistent with y_i)
    w ← w + η (φ(x_i, y_i, h*) − φ(x_i, ŷ, ĥ)),    η: learning rate
Extension: Probabilistic L3M [ICML 14, EMNLP 13a]
47
Define a log-linear model:
Pr[a clustering C]
= Pr[forests that are consistent with C]
= Π Pr[edges in the forest]
Pr[j links to i] ∝ exp(w · φ(i, j) / γ)   (γ: a temperature parameter)
Regularized Maximum Likelihood Estimation:
min_w LL(w) = λ‖w‖² + Σ_d log Z_d(w) − Σ_d Σ_j log( Σ_{i<j} exp(w · φ(i, j) / γ) C_d(i, j) )
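The left-link probability with temperature γ is an ordinary softmax over candidate-antecedent scores. A sketch with illustrative scores (not trained values): lowering γ sharpens the distribution toward the single best link, recovering the hard argmax of the non-probabilistic model in the limit.

```python
# Softmax over candidate antecedent scores with temperature gamma.
import math

def left_link_probs(scores, gamma=1.0):
    """scores[i] = w . phi(i, j) for each candidate antecedent i of mention j."""
    exps = [math.exp(s / gamma) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Candidate antecedent scores for one mention (illustrative values).
probs = left_link_probs([2.0, 0.5, -1.0], gamma=1.0)
print([round(p, 3) for p in probs])
# As gamma -> 0 the distribution concentrates on the argmax link.
sharp = left_link_probs([2.0, 0.5, -1.0], gamma=0.1)
```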
![Page 48: Inference Protocols for Coreference Resolutionweb.cs.ucla.edu/~kwchang/documents/slides/practical.pdfMy Research Contributions Data Size Problem Complexity Limited memory linear classifier](https://reader033.fdocuments.us/reader033/viewer/2022053012/5f0f66e97e708231d443fb42/html5/thumbnails/48.jpg)
Coreference: OntoNotes-5.0 (with gold mentions)
[Bar chart comparing Decoupled, L3M, and Probabilistic L3M; y-axis: performance* from 72 to 78, higher is better]
48
*Avg(MUC, B³, CEAF)
![Page 49: Inference Protocols for Coreference Resolutionweb.cs.ucla.edu/~kwchang/documents/slides/practical.pdfMy Research Contributions Data Size Problem Complexity Limited memory linear classifier](https://reader033.fdocuments.us/reader033/viewer/2022053012/5f0f66e97e708231d443fb42/html5/thumbnails/49.jpg)
Latent Left-Linking Model (L3M) [ICML 14, EMNLP 13]
49
Advantages:
• Complexity: Very efficient
• Modeling: Learn the metric while clustering
• Knowledge: Easy to incorporate constraints
(must-link or cannot-link)
Can be applied to other supervised clustering problems!
e.g., the posts in a forum, error reports from users …
![Page 50: Inference Protocols for Coreference Resolutionweb.cs.ucla.edu/~kwchang/documents/slides/practical.pdfMy Research Contributions Data Size Problem Complexity Limited memory linear classifier](https://reader033.fdocuments.us/reader033/viewer/2022053012/5f0f66e97e708231d443fb42/html5/thumbnails/50.jpg)
Outline
50
Co-reference
3. Algorithms: Learning with Amortized Inference
ESL grammar
Correction
Word
Relations
![Page 51: Inference Protocols for Coreference Resolutionweb.cs.ucla.edu/~kwchang/documents/slides/practical.pdfMy Research Contributions Data Size Problem Complexity Limited memory linear classifier](https://reader033.fdocuments.us/reader033/viewer/2022053012/5f0f66e97e708231d443fb42/html5/thumbnails/51.jpg)
Learning Structured Models
Online, e.g., Structured Perceptron [Collins 02]
Batch e.g., Structured SVM
Cutting plane: [Tsochantaridis+ 05, Joachims+ 09]
Dual Coordinate Descent: [Shevade+ 11, Chang+ 13]
Block-Coordinate Frank-Wolfe: [Lacoste-Julien+ 13]
Parallel Dual Coordinate Descent: [ECML 13a]
51
Solve inferences Update the model
![Page 52: Inference Protocols for Coreference Resolutionweb.cs.ucla.edu/~kwchang/documents/slides/practical.pdfMy Research Contributions Data Size Problem Complexity Limited memory linear classifier](https://reader033.fdocuments.us/reader033/viewer/2022053012/5f0f66e97e708231d443fb42/html5/thumbnails/52.jpg)
Redundancy in Learning Phase [AAAI 15]
Recognizing Entities and Relations Task
[Plot: counts (0–100k) vs. # training rounds (0–50); the number of inference problems keeps growing, while the number of distinct solutions is far smaller]
52
![Page 53: Inference Protocols for Coreference Resolutionweb.cs.ucla.edu/~kwchang/documents/slides/practical.pdfMy Research Contributions Data Size Problem Complexity Limited memory linear classifier](https://reader033.fdocuments.us/reader033/viewer/2022053012/5f0f66e97e708231d443fb42/html5/thumbnails/53.jpg)
Redundancy of Solutions[Kundu+13]
S1: "He is reading a book" → POS: Pronoun VerbZ VerbG Det Noun
S2: "She is watching a movie" → POS: Pronoun VerbZ VerbG Det Noun
53
Although the inference problems are different,
their solutions might be the same
![Page 54: Inference Protocols for Coreference Resolutionweb.cs.ucla.edu/~kwchang/documents/slides/practical.pdfMy Research Contributions Data Size Problem Complexity Limited memory linear classifier](https://reader033.fdocuments.us/reader033/viewer/2022053012/5f0f66e97e708231d443fb42/html5/thumbnails/54.jpg)
Fewer Inference Calls [AAAI 15]
Obtain the same model with fewer inference calls
[Plot: performance (0.75–0.90) vs. # inference calls (0–25k); our method reaches the baseline's performance with far fewer inference calls]
54
Recognizing Entities and Relations Task
![Page 55: Inference Protocols for Coreference Resolutionweb.cs.ucla.edu/~kwchang/documents/slides/practical.pdfMy Research Contributions Data Size Problem Complexity Limited memory linear classifier](https://reader033.fdocuments.us/reader033/viewer/2022053012/5f0f66e97e708231d443fb42/html5/thumbnails/55.jpg)
A general inference framework
… to represent inference problems
A condition
… to check if two problems have the same solution
Learning with Amortized Inference [AAAI 15]
55
If CONDITION(problem cache, new problem)
then (no need to call the solver)
SOLUTION(new problem) = old solution
Else
Call base solver and update cache
End
(condition check: 0.04 ms vs. base solver call: 2 ms)
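The caching scheme on this slide might look like the following sketch. Here `structure_key`, `condition`, and `base_solver` are hypothetical interfaces standing in for the equivalence-class key (same variables and constraints), the amortization condition, and the underlying ILP solver:

```python
def amortized_solve(problem_c, structure_key, cache, condition, base_solver):
    """Sketch of the amortized-inference loop.

    cache: dict mapping a structure key (the equivalence class: same
    number of variables and same constraints) to a list of
    (coefficients, solution) pairs solved earlier.
    condition(old_c, old_sol, new_c) decides, at a fraction of the
    solver's cost, whether old_sol is also optimal for new_c.
    """
    for old_c, old_sol in cache.get(structure_key, []):
        if condition(old_c, old_sol, problem_c):
            return old_sol                       # cache hit: no solver call
    sol = base_solver(problem_c)                 # cache miss: solve and store
    cache.setdefault(structure_key, []).append((problem_c, sol))
    return sol
```

The payoff is exactly the two timings above: each cache hit replaces a millisecond-scale solver call with a microsecond-scale condition check.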
![Page 56: Inference Protocols for Coreference Resolutionweb.cs.ucla.edu/~kwchang/documents/slides/practical.pdfMy Research Contributions Data Size Problem Complexity Limited memory linear classifier](https://reader033.fdocuments.us/reader033/viewer/2022053012/5f0f66e97e708231d443fb42/html5/thumbnails/56.jpg)
A General Inference Framework
Integer Linear Programming (ILP)
Widely used in NLP & Vision tasks [Roth+ 04]
E.g., Dependency Parsing, Sentence Compression
Any MAP problem w.r.t. any probabilistic model,
can be formulated as an ILP [Roth+ 04, Sontag 10]
Only used for verifying amortized conditions
argmax_y Σ_i c_i y_i   s.t.  Ay ≤ b;  y_i ∈ {0, 1}
56
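For tiny instances, the 0-1 ILP above can be brute-forced by enumeration, which is enough to illustrate the formulation (real systems call an off-the-shelf ILP solver instead):

```python
import itertools

def solve_ilp(c, A, b):
    """Brute-force the 0-1 ILP  max_y c·y  s.t.  Ay <= b,  y_i in {0,1}.

    Exponential in len(c); for illustration on toy problems only.
    """
    best_y, best_val = None, float("-inf")
    for y in itertools.product((0, 1), repeat=len(c)):
        # keep only assignments satisfying every constraint row
        if all(sum(a * yi for a, yi in zip(row, y)) <= bi
               for row, bi in zip(A, b)):
            val = sum(ci * yi for ci, yi in zip(c, y))
            if val > best_val:
                best_y, best_val = list(y), val
    return best_y, best_val
```

On the worked example a few slides ahead (max 2x₁ + 4x₂ + 2x₃ + 0.5x₄ with x₁ + x₂ ≤ 1, x₃ + x₄ ≤ 1), this returns <0, 1, 1, 0> with objective 6.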
![Page 57: Inference Protocols for Coreference Resolutionweb.cs.ucla.edu/~kwchang/documents/slides/practical.pdfMy Research Contributions Data Size Problem Complexity Limited memory linear classifier](https://reader033.fdocuments.us/reader033/viewer/2022053012/5f0f66e97e708231d443fb42/html5/thumbnails/57.jpg)
Amortized Inference Theorem [Kundu+13]
Theorem 1: If the following conditions are satisfied
1. Same # variables & same constraints (same equivalence class)
2. ∀i, (2x*_{P,i} − 1)(c_{Q,i} − c_{P,i}) ≥ 0
(the solution is not sensitive to these changes of the coefficients)
then the optimal solution of Q is x*_P
o x*_P: the solution to P
o c: the coefficients of the ILPs
57
![Page 58: Inference Protocols for Coreference Resolutionweb.cs.ucla.edu/~kwchang/documents/slides/practical.pdfMy Research Contributions Data Size Problem Complexity Limited memory linear classifier](https://reader033.fdocuments.us/reader033/viewer/2022053012/5f0f66e97e708231d443fb42/html5/thumbnails/58.jpg)
Amortized Inference Theorem [Kundu+13]
Theorem 1: If the following conditions are satisfied
1. Same # variables & same constraints (same equivalence class)
2. ∀i, (2x*_{P,i} − 1)(c_{Q,i} − c_{P,i}) ≥ 0
then the optimal solution of Q is x*_P
58
P: max 2x₁ + 4x₂ + 2x₃ + 0.5x₄  s.t. x₁ + x₂ ≤ 1, x₃ + x₄ ≤ 1;  x*_P: <0, 1, 1, 0> (objective 6)
Q: max 2x₁ + 3x₂ + 2x₃ + 1x₄  s.t. x₁ + x₂ ≤ 1, x₃ + x₄ ≤ 1;  x′: <1, 0, 1, 0> (objective 4)
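Condition 2 of Theorem 1 is a one-line check. On my reading of this example, with x*_P = <0, 1, 1, 0> the condition fails for this P/Q pair (x₂ is on but its coefficient drops from 4 to 3), so Q's solver must be called; a hypothetical Q′ whose coefficient changes respect the signs would pass:

```python
def theorem1_condition(x_p, c_p, c_q):
    """Theorem 1, condition 2: (2*x_p[i] - 1)*(c_q[i] - c_p[i]) >= 0 for all i.

    If it holds (and P, Q share variables and constraints), x_p is
    also an optimal solution of Q, with no solver call needed.
    """
    return all((2 * xi - 1) * (cq - cp) >= 0
               for xi, cp, cq in zip(x_p, c_p, c_q))
```

Intuitively, the check says every "on" variable's coefficient may only go up and every "off" variable's coefficient may only go down.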
![Page 59: Inference Protocols for Coreference Resolutionweb.cs.ucla.edu/~kwchang/documents/slides/practical.pdfMy Research Contributions Data Size Problem Complexity Limited memory linear classifier](https://reader033.fdocuments.us/reader033/viewer/2022053012/5f0f66e97e708231d443fb42/html5/thumbnails/59.jpg)
Amortized Inference Theorem [Kundu+13]
Theorem 1: If the following conditions are satisfied
1. Same # variables & same constraints (same equivalence class)
2. ∀i, (2x*_{P,i} − 1)(c_{Q,i} − c_{P,i}) ≥ 0
   i.e., if x*_{P,i} = 1 then (c_{Q,i} − c_{P,i}) ≥ 0
         if x*_{P,i} = 0 then (c_{Q,i} − c_{P,i}) ≤ 0
then the optimal solution of Q is x*_P
o x*_P: the solution to P
o c: the coefficients of the ILPs
59
![Page 60: Inference Protocols for Coreference Resolutionweb.cs.ucla.edu/~kwchang/documents/slides/practical.pdfMy Research Contributions Data Size Problem Complexity Limited memory linear classifier](https://reader033.fdocuments.us/reader033/viewer/2022053012/5f0f66e97e708231d443fb42/html5/thumbnails/60.jpg)
Approx. Amortized Inference [AAAI 15]
Theorem 2: If the following conditions are satisfied
1. Same # variables & same constraints
2. ∀i, (2x*_{P,i} − 1)(c_{Q,i} − c_{P,i}) ≥ −ε|c_{Q,i}|
then x*_P is a 1/(1 + Mε)-approximate solution to Q
o x*_P: the solution to P
o M: a constant
o c: the coefficients of the ILPs
60
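The relaxed condition of Theorem 2 differs from Theorem 1 only by the slack term ε|c_{Q,i}|. A sketch, reusing the hypothetical P/Q pair from the earlier worked-example slide:

```python
def theorem2_condition(x_p, c_p, c_q, eps):
    """Theorem 2, condition 2:
    (2*x_p[i] - 1)*(c_q[i] - c_p[i]) >= -eps * |c_q[i]| for all i.

    If it holds, x_p is a 1/(1 + M*eps)-approximate solution to Q,
    so a slightly stale cached solution can still be reused.
    """
    return all((2 * xi - 1) * (cq - cp) >= -eps * abs(cq)
               for xi, cp, cq in zip(x_p, c_p, c_q))
```

Larger ε admits more cache hits at the price of a looser approximation guarantee; ε = 0 recovers Theorem 1 exactly.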
![Page 61: Inference Protocols for Coreference Resolutionweb.cs.ucla.edu/~kwchang/documents/slides/practical.pdfMy Research Contributions Data Size Problem Complexity Limited memory linear classifier](https://reader033.fdocuments.us/reader033/viewer/2022053012/5f0f66e97e708231d443fb42/html5/thumbnails/61.jpg)
Approx. Amortized Inference [AAAI 15]
Theorem 2 (as above): x*_P is a 1/(1 + Mε)-approximate solution to Q
Corollary 1: Learning structured SVM with approximate amortized inference gives a model with bounded empirical risk
61
![Page 62: Inference Protocols for Coreference Resolutionweb.cs.ucla.edu/~kwchang/documents/slides/practical.pdfMy Research Contributions Data Size Problem Complexity Limited memory linear classifier](https://reader033.fdocuments.us/reader033/viewer/2022053012/5f0f66e97e708231d443fb42/html5/thumbnails/62.jpg)
Approx. Amortized Inference [AAAI 15]
Theorem 2 (as above): x*_P is a 1/(1 + Mε)-approximate solution to Q
Corollary 2: Dual coordinate descent for structured SVM can still return an exact model even if approx. amortized inference is used
62
![Page 63: Inference Protocols for Coreference Resolutionweb.cs.ucla.edu/~kwchang/documents/slides/practical.pdfMy Research Contributions Data Size Problem Complexity Limited memory linear classifier](https://reader033.fdocuments.us/reader033/viewer/2022053012/5f0f66e97e708231d443fb42/html5/thumbnails/63.jpg)
# Solver Calls (Entity-Relation Extraction)
[Bar chart, % solver calls (0–100, lower is better): Baseline (exact) uses 100% of solver calls; Ours (exact) needs far fewer, with Ent F1 87.7 / Rel F1 47.6; Ours (approx.) needs the fewest, with Ent F1 87.3 / Rel F1 47.8]
63
![Page 64: Inference Protocols for Coreference Resolutionweb.cs.ucla.edu/~kwchang/documents/slides/practical.pdfMy Research Contributions Data Size Problem Complexity Limited memory linear classifier](https://reader033.fdocuments.us/reader033/viewer/2022053012/5f0f66e97e708231d443fb42/html5/thumbnails/64.jpg)
Outline
64
Co-reference
1. Applications:
Co-reference; ESL Grammar Correction; Word Relation;
3. Algorithms: Learning with Amortized Inference
2. Modeling: Supervised Clustering Model
ESL grammar
Correction
Word
Relations
![Page 65: Inference Protocols for Coreference Resolutionweb.cs.ucla.edu/~kwchang/documents/slides/practical.pdfMy Research Contributions Data Size Problem Complexity Limited memory linear classifier](https://reader033.fdocuments.us/reader033/viewer/2022053012/5f0f66e97e708231d443fb42/html5/thumbnails/65.jpg)
Other Related Work
65
Co-reference
1. Applications:Dependency Parsing [Arxiv 15 b]; Multi-label Classification [ECML13]
3. Algorithms:Parallel learning algorithms [ECML 13b]
2. Modeling:Semi-Supervised Learning[ECML 13a] Search-Based Model [Arxiv 15 a]
ESL grammar
Correction
Word
Relations
![Page 66: Inference Protocols for Coreference Resolutionweb.cs.ucla.edu/~kwchang/documents/slides/practical.pdfMy Research Contributions Data Size Problem Complexity Limited memory linear classifier](https://reader033.fdocuments.us/reader033/viewer/2022053012/5f0f66e97e708231d443fb42/html5/thumbnails/66.jpg)
My Research Contributions
Data
Size
Problem Complexity
Limited memory linear classifier [KDD 10, 11, TKDD 12]
Structured prediction models [ICML 14, ECML 13a, 13b, AAAI 15, CoNLL 11, 12]
Latent representation for knowledge bases [EMNLP 13, 14]
Linear classification [ICML 08, KDD 08, JMLR 08a, 10a, 10b, 10c]
66
![Page 67: Inference Protocols for Coreference Resolutionweb.cs.ucla.edu/~kwchang/documents/slides/practical.pdfMy Research Contributions Data Size Problem Complexity Limited memory linear classifier](https://reader033.fdocuments.us/reader033/viewer/2022053012/5f0f66e97e708231d443fb42/html5/thumbnails/67.jpg)
Future Work: Practical Machine Learning
67
Co-reference
1. Applications: More applications, easy access tools
3. Algorithms: Handle large & complex data
2. Modeling: Learning from heterogeneous information
ESL grammar
Correction
Others
![Page 68: Inference Protocols for Coreference Resolutionweb.cs.ucla.edu/~kwchang/documents/slides/practical.pdfMy Research Contributions Data Size Problem Complexity Limited memory linear classifier](https://reader033.fdocuments.us/reader033/viewer/2022053012/5f0f66e97e708231d443fb42/html5/thumbnails/68.jpg)
Learning From World Knowledge
Go beyond supervised learning
Learning from indirect supervision signals
68
After the vessel suffered a catastrophic torpedo
detonation, Kursk sank in the waters of Barents
Sea with all hands lost.
![Page 69: Inference Protocols for Coreference Resolutionweb.cs.ucla.edu/~kwchang/documents/slides/practical.pdfMy Research Contributions Data Size Problem Complexity Limited memory linear classifier](https://reader033.fdocuments.us/reader033/viewer/2022053012/5f0f66e97e708231d443fb42/html5/thumbnails/69.jpg)
Learning From World Knowledge
Massive textual data on the Internet
Wikipedia: 4.7M English articles, 35M in total
Tweets: 500M per day & 200 billion per year
Learn world knowledge to support target tasks
Extract knowledge from free text
Handle large-scale data
Inference on knowledge bases
69
[EMNLP 13a, 14, ICML 14]
[Liblinear, KDD 12]
[EMNLP 14b, 14]
![Page 70: Inference Protocols for Coreference Resolutionweb.cs.ucla.edu/~kwchang/documents/slides/practical.pdfMy Research Contributions Data Size Problem Complexity Limited memory linear classifier](https://reader033.fdocuments.us/reader033/viewer/2022053012/5f0f66e97e708231d443fb42/html5/thumbnails/70.jpg)
Applications & Tools
LIBLINEAR: library for classification
Streaming Data SVM:
Support training on very large data
Illinois-SL: library for structured prediction
Support various algorithms; parallel ⇒ very fast
Provide a nice platform
β’ for developing novel methods
β’ for collaboration
β’ for education
More easy-access tools; more collaborations
70
![Page 71: Inference Protocols for Coreference Resolutionweb.cs.ucla.edu/~kwchang/documents/slides/practical.pdfMy Research Contributions Data Size Problem Complexity Limited memory linear classifier](https://reader033.fdocuments.us/reader033/viewer/2022053012/5f0f66e97e708231d443fb42/html5/thumbnails/71.jpg)
Conclusion
Goal: Practical Machine Learning
[Modeling] Expressive and general formulations
[Algorithms] Principled and efficient
[Applications] Support many applications
Code and Demos:
http://www.illinois.edu/~kchang10
71