Search-Guided, Lightly-Supervised Training of Structured … · 2020. 9. 20. · Structured...
Search-Guided, Lightly-Supervised Training of Structured Prediction Energy Networks
Andrew McCallum, Pedram Rooshenas, Dongxu Zhang, Gopal Sharma
Structured Prediction
• We are interested in learning a function from inputs to outputs
• X: input variables
• Y: output variables
• We can define the function as:
• For a Gibbs distribution:
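A sketch of the standard energy-based formulation that these bullets appear to refer to. The symbols F, E, and Z are assumptions taken from the usual SPEN notation, not from the slide itself, whose formulas are not shown here.

```latex
F : \mathcal{X} \to \mathcal{Y}, \qquad
F(x) = \arg\min_{y \in \mathcal{Y}} E(x, y), \qquad
P(y \mid x) = \frac{\exp\bigl(-E(x, y)\bigr)}{Z(x)}, \quad
Z(x) = \sum_{y' \in \mathcal{Y}} \exp\bigl(-E(x, y')\bigr)
```

Under this reading, minimizing the energy E(x, y) over y is the same as maximizing the Gibbs probability P(y | x).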
Structured Prediction Energy Networks (SPENs)
• If E is parameterized using a differentiable model such as a deep neural network, we can find a local minimum of E using gradient descent.
• The energy networks express the correlation among input and output variables.
• Traditionally, graphical models are used to represent the correlation among output variables.
• Inference is intractable for most expressive graphical models.
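Gradient-descent inference over a differentiable energy can be illustrated with a minimal sketch. The quadratic energy below, its weight `lam`, and the function names are hypothetical stand-ins for a neural energy network; only the idea of descending E with respect to a relaxed output y comes from the slide.

```python
# Toy, hypothetical energy over a relaxed output y in [0, 1]^n:
#   E(x, y) = sum_i (y_i - x_i)^2 + lam * sum_i (y_i - y_{i+1})^2
# The first term ties outputs to inputs; the second couples neighboring outputs.

def energy(x, y, lam=0.5):
    local = sum((yi - xi) ** 2 for xi, yi in zip(x, y))
    pairwise = sum((y[i] - y[i + 1]) ** 2 for i in range(len(y) - 1))
    return local + lam * pairwise

def energy_grad(x, y, lam=0.5):
    # Analytic gradient of the toy energy with respect to y.
    g = [2.0 * (yi - xi) for xi, yi in zip(x, y)]
    for i in range(len(y) - 1):
        d = 2.0 * lam * (y[i] - y[i + 1])
        g[i] += d
        g[i + 1] -= d
    return g

def gradient_descent_inference(x, steps=200, lr=0.1):
    y = [0.5] * len(x)  # start from the center of the box
    for _ in range(steps):
        g = energy_grad(x, y)
        # Gradient step, then clip each coordinate back into [0, 1].
        y = [min(1.0, max(0.0, yi - lr * gi)) for yi, gi in zip(y, g)]
    return y

x = [1.0, 1.0, 0.0, 0.0]
y_hat = gradient_descent_inference(x)
```

With a neural energy the analytic gradient would be replaced by automatic differentiation, and the procedure is only guaranteed to reach a local minimum.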
Energy Models
[picture from Belanger (2016)]
[picture from Altinel (2018)]
Training SPENs
• Structural SVM (Belanger and McCallum, 2016)
• End-to-End (Belanger et al., 2017)
• Value-based training (Gygli et al., 2017)
• Inference Network (Lifu Tu and Kevin Gimpel, 2018)
• Rank-Based Training (Rooshenas et al., 2018)
Indirect Supervision
• Data annotation is expensive, especially for structured outputs.
• Domain knowledge can serve as the source of supervision.
• It can be written as reward functions.
• A reward function maps a pair of input and output configurations to a scalar value.
• For a given x, we are looking for the best y that maximizes the reward.
Search-Guided Training
We have a reward function that provides indirect supervision.
We want to learn a smooth version of the reward function so that we can use gradient-descent inference at test time.
We sample a point from the energy function using noisy gradient-descent inference.
[figure, built up over several slides: points y0 to y5]
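Noisy gradient-descent inference can be sketched as ordinary gradient descent on the energy with Gaussian noise added at every step. The toy energy, the step size, the noise scale, and the function names are all assumptions made for illustration; the slides do not specify them.

```python
import random

def energy_grad(x, y):
    # Gradient of a hypothetical toy energy E(x, y) = sum_i (y_i - x_i)^2.
    return [2.0 * (yi - xi) for xi, yi in zip(x, y)]

def noisy_gd_sample(x, steps=50, lr=0.1, noise_std=0.05, seed=0):
    rng = random.Random(seed)
    y = [0.5] * len(x)
    trajectory = [list(y)]
    for _ in range(steps):
        g = energy_grad(x, y)
        # Gradient step plus Gaussian noise, clipped back into [0, 1].
        y = [
            min(1.0, max(0.0, yi - lr * gi + rng.gauss(0.0, noise_std)))
            for yi, gi in zip(y, g)
        ]
        trajectory.append(list(y))
    return y, trajectory

x = [0.8, 0.2, 0.6]
sample, trajectory = noisy_gd_sample(x, seed=1)
```

The noise makes repeated runs land on different low-energy points, which gives training a varied set of candidates to compare against the reward function.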
Then we project the sample onto the domain of the reward function. (The sample is a point in the simplex, but the domain of the reward function is often discrete, i.e., the vertices of the simplex.)
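One simple way to realize this projection, assuming each output variable carries its own probability vector, is to move every vector to its nearest simplex vertex, which is the one-hot vector at the argmax. The function name and the example values are hypothetical.

```python
def project_to_vertices(tag_probs):
    """Map each per-variable distribution to its nearest one-hot vertex (argmax)."""
    projected = []
    for probs in tag_probs:
        best = max(range(len(probs)), key=lambda k: probs[k])
        projected.append([1 if k == best else 0 for k in range(len(probs))])
    return projected

# Three output variables, each a distribution over two labels.
sample = [[0.9, 0.1], [0.4, 0.6], [0.05, 0.95]]
discrete = project_to_vertices(sample)
```

For a point p in the simplex, the squared distance to vertex k is ||p||^2 - 2 p_k + 1, so the closest vertex is the one with the largest coordinate.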
Then the search procedure takes the sample as input and returns an output structure by searching the reward function.
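A minimal sketch of such a search procedure, assuming a budgeted randomized search that changes one position at a time and stops at the first improvement in reward. The search strategy, the budget, and the toy reward are assumptions for illustration, not the search operators used in the talk.

```python
import random

def random_search(y_start, reward, num_labels, budget=100, seed=0):
    rng = random.Random(seed)
    base = reward(y_start)
    for _ in range(budget):
        candidate = list(y_start)
        pos = rng.randrange(len(candidate))
        candidate[pos] = rng.randrange(num_labels)
        if reward(candidate) > base:
            return candidate  # first improvement found within the budget
    return list(y_start)  # no better structure found

# Hypothetical reward: number of positions that agree with a fixed pattern.
target = [0, 1, 1, 0]

def reward(y):
    return sum(int(a == b) for a, b in zip(y, target))

y_projected = [0, 0, 0, 0]
y_search = random_search(y_projected, reward, num_labels=2)
```

The search only queries the reward function, so the reward does not need to be differentiable.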
We expect the two points to have the same ranking under the reward function and under the negative of the energy function.
[figure annotation: ranking violation]
When we find a pair of points that violates the ranking constraints, we update the energy function to reduce the violation.
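The update on a ranking violation can be sketched with a margin-based hinge loss on the energy difference of the pair. The linear energy, the margin scale `alpha`, the learning rate, and the toy reward are assumptions chosen to keep the example small; a SPEN would use a neural energy and backpropagation instead.

```python
def energy(w, y):
    # Hypothetical linear energy: E_w(y) = sum_i w_i * y_i.
    return sum(wi * yi for wi, yi in zip(w, y))

def rank_update(w, y_a, y_b, reward, alpha=1.0, lr=0.1):
    # Order the pair so that y_hi has the higher reward.
    y_hi, y_lo = (y_a, y_b) if reward(y_a) >= reward(y_b) else (y_b, y_a)
    margin = alpha * (reward(y_hi) - reward(y_lo))
    # Hinge: the higher-reward point should have lower energy by the margin.
    violation = margin - energy(w, y_lo) + energy(w, y_hi)
    if violation <= 0:
        return w, 0.0  # ranking already respected, no update
    # The hinge gradient with respect to w is (y_hi - y_lo); step against it.
    new_w = [wi - lr * (h - l) for wi, h, l in zip(w, y_hi, y_lo)]
    return new_w, violation

def reward(y):
    return float(sum(y))  # hypothetical reward: number of active labels

w0 = [0.0, 0.0, 0.0]
w1, v1 = rank_update(w0, [1, 1, 0], [0, 0, 0], reward)
w2, v2 = rank_update(w1, [1, 1, 0], [0, 0, 0], reward)
```

Each update lowers the energy of the higher-reward point relative to the lower-reward one, so the measured violation shrinks from `v1` to `v2`.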
Task-Loss as Reward Function for Multi-Label Classification
• The simplest form of indirect supervision is to use the task loss as the reward function:
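As an illustration of a task loss used as a reward, the sketch below scores a predicted label set by its F1 against the ground truth. The choice of F1 is an assumption, since the formula on this slide is not shown here; any task metric that returns a scalar could take its place.

```python
def f1_reward(y_pred, y_true):
    # Binary indicator vectors: 1 means the label is active.
    tp = sum(1 for p, t in zip(y_pred, y_true) if p == 1 and t == 1)
    n_pred = sum(y_pred)
    n_true = sum(y_true)
    if n_pred + n_true == 0:
        return 1.0  # both label sets empty: treat as a perfect match
    return 2.0 * tp / (n_pred + n_true)

r = f1_reward([1, 0, 1, 0], [1, 1, 0, 0])
```

Here one of the two predicted labels is correct and one of the two true labels is recovered, so `r` is 0.5.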
Domain Knowledge as Reward Function for Citation Field Extraction
Energy Model
[figure: energy network for citation field extraction; labeled parts: tokens (Wei Li . Deep Learning for ...), input embedding, tag distribution over tags (author, title, ...), convolutional layer with multiple filters and different window sizes, max pooling and concatenation, multi-layer perceptron, energy]
Performance on Citation Field Extraction
Semi-Supervised Setting
• Alternatively use the output of search and the ground-truth label for training.
Shape Parser
[figure, built up over several slides: input I, Predict, Parsing, candidate programs combining c(32,32,28), c(32,32,24), and t(32,32,20) with + and -, Graphic Engine, output O]
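A reward for shape parsing could be computed by rendering a predicted program and comparing the result with the input image. The sketch below is a hypothetical stand-in for the graphics engine: it assumes postfix programs, reads a triple (cx, cy, r) as a filled circle, treats + as union and - as difference, and uses intersection over union as the reward. None of these details are confirmed by the slides.

```python
def circle(cx, cy, r, size=64):
    # Set of pixels covered by a filled circle on a size x size grid.
    return {(px, py) for px in range(size) for py in range(size)
            if (px - cx) ** 2 + (py - cy) ** 2 <= r * r}

def run_program(tokens, size=64):
    # Evaluate a postfix program: shapes push masks, "+" and "-" combine them.
    stack = []
    for tok in tokens:
        if tok == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a | b)  # union
        elif tok == "-":
            b, a = stack.pop(), stack.pop()
            stack.append(a - b)  # difference
        else:
            cx, cy, r = tok
            stack.append(circle(cx, cy, r, size))
    return stack.pop()

def iou_reward(pred_pixels, target_pixels):
    union = pred_pixels | target_pixels
    return len(pred_pixels & target_pixels) / len(union) if union else 1.0

target = run_program([(32, 32, 28), (32, 32, 24), "-"])  # a ring
pred = run_program([(32, 32, 28)])                       # a full disc
r = iou_reward(pred, target)
```

Because rendering is not differentiable, a reward of this kind can only be queried, which is the setting the search-guided training targets.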
Shape Parser Energy Model
[figure: labeled parts: input image, CNN, program tokens such as circle(16,16,12), triangle(32,48,16), +, circle(16,24,12), -, output distribution, convolutional layer, multi-layer perceptron, energy]
Search Budget vs. Constraints
Performance on Shape Parser
Conclusion and Future Directions
• If a reward function exists that maps every structured output to a scalar value, we can use unlabeled data to train structured prediction energy networks.
• Domain knowledge or non-differentiable pipelines can be used to define the reward functions.
• The main ingredient for learning from the reward function is the search operator.
• Here we only use simple search operators, but more complex search functions derived from domain knowledge can be used for complicated problems.