Labeling Text in Several Languages with Mul;lingual Hierarchical … · 2021. 1. 18. · June 9,...
Transcript of Labeling Text in Several Languages with Mul;lingual Hierarchical … · 2021. 1. 18. · June 9,...
June9,2017Swisstext,Winterthur
NikolaosPappas,AndreiPopescu-BelisIdiapResearchIns;tute
LabelingTextinSeveralLanguageswithMul;lingualHierarchicalAEen;onNetworks
NikolaosPappas /18
TopicRecogni;on
2
Spamfiltering—MailboxOp;miza;on—CustomerSupport
NikolaosPappas /18
Ques;onAnswering
3
Reading/Naviga;onAssistant—Interac;veSearch
Ques;on:WhichGaudi’screa;onishismasterpiece?Answer:SagradaFamília
NikolaosPappas /18
MachineTransla;on
4
DocumentTransla;on—DialogueTransla;on
NikolaosPappas /18
FundamentalFunc;on:Represen;ngWordSequences
5
• Goal:Learnrepresenta;ons(distributedvectors)ofwordsequenceswhichencodeeffec;velythemeaning/knowledgeneededtoperform
✓ TopicRecogni;on• Ques;onAnswering• MachineTransla;on• Summariza;on
Can we benefit from multiple languages?
…
NikolaosPappas /18
DealingwithMul;pleLanguages:Monolingually
6
• Solu:on?Separatemodelsperlanguage• language-dependentlearning• lineargrowthoftheparameters• lackofcross-languageknowledgetransfer• hierarchicalmodelingatthedocument-level
Documents X = {xi | i=1…n}
Labels Y = {yi | i=1…n}
Models f: X →Y
(Yang et al., 2016)
(Tang et al., 2015)(Lin et al., 2015)
(Kim, 2014)
NikolaosPappas /18
DealingwithMul;pleLanguages:Mul;lingually
7
• Solu:on?Singlemodelwithalignedinputspace• language-independentlearning• constantnumberofparameters• commonlabelsetsacrosslanguages• modelingattheword-level
Model
(Ammar et al., 2016)
(Gouws et al., 2015)
(Herman and Blunsom, 2014)(Klementiev et al., 2012)
NikolaosPappas /18
ModelMModel1 Model2 …
DealingwithMul;pleLanguages:Ourcontribu;on
8
• Solu:on:Singlemodeltrainedoverarbitrarylabelsetswithanalignedinputspace
• language-independentlearning• sub-lineargrowthofparameters• arbitrarylabelsetsacrosslanguages• hierarchicalmodelingatthedocument-level
NikolaosPappas /18
Background:HierarchicalAEen;onNetworks(HANs)
9
Sentences:Document:
Words:
• Input:sequenceofwordvectors
• Output:documentvectoru
• Hierarchicalstructure- Word-levelandsentence-levelabstrac;onlayers-encoder(Hs,Hw)-aEen;onmechanism(aw,αs)
- Classifica;onlayer(Wc)+cross-entropy• Training:usingSGDwithADAM
(Yang et al., 2016)
NikolaosPappas /18
MHANs:Mul;lingualHierarchicalAEen;onNetworks
10
NikolaosPappas /18
• Afewernumberofparametersisneeded• θenc = {H, W(l), H, W(l), W(l)} , θatt = {H(l), W, H(l) , W , W(l)}
• θboth = {H, W, H, W, W(l)} , θmono = {H(l), W(l), H(l), W(l), W(l)}
• Thefollowinginequali;esaretrue:
• ExamplewithsharedaEen;onmechanisms
Mul;lingualAEen;onNetworks:Computa;onalCost
11
Naive DLmultilingual adaptation
fails!
NikolaosPappas /18
Mul;lingualAEen;onNetworks:TrainingStrategy
12
• Minimizingthesumofthecross-entropyerrors
• Issue:Naiveconsecu;vetrainingbiasesthemodel• Sampledocument-labelpairsforeachlanguageinacyclicfashion:
(L1,…,LM)(1)→…→(L1,…,LM)(M)
• Op:mizer:SGDwithADAM(sameasbefore)
NikolaosPappas /18
Dataset:DeutscheWelleCorpus(600kdocs,8langs)
13
Tagged by journalists
NikolaosPappas /18
Full-resourceScenario:BilingualTraining
14
• Mul;lingualmodelsconsistentlyoutperformmonolingualones• SharingaEen;onisthebestconfigura;on(onaverage)• Tradi;onal(bow)vsneural(en+ar,biGRUencoders)
• en:75.8%vs77.8%—ar:81.8%vs84.0%
Input:40-d,Encoders:Dense100-d,AEen;ons:Dense100-dAc;va;on:relu
NikolaosPappas /18
Impr
ovem
ent
low
high
50%
5%
0.5%
Trai
ning
per
cent
age
Low-resourceScenario:BilingualTraining
15
NikolaosPappas /1816
Qualita;veAnalysis:English-German
• Trueposi;vedifference(mul;vsmono)increasesovertheen;respectrum• German
russland(21),berlin(19),irak(14),wahlen(13)andnato(13)
• Englishgermany(259),german(97),soccer(73),football753(47)andmerkel(25)
Cum
ulat
ive
TP d
iffer
ence
Labels sorted by frequency
NikolaosPappas /1817
Qualita;veAnalysis:InterpretableOutput
NikolaosPappas /18
ConclusionandPerspec;ves
18
• Newmul;lingualmodelstolearnshareddocumentstructuresfortextclassifica;on
• Benefitfull-resourceandlow-resourcelanguages• AchievebeEeraccuracywithfewerparameters• Capableofcross-languagetransfer
• Futurework• Removetheconstraintofclosedlabelsets• Incorporatelabelinforma;on• ApplytootherNLUtasks
NikolaosPappas /1819
User group meetingJuly 3, 2017 Caversham, UK
Demos Technical talks
Posters & discussions Contact us if interested!
Thankyou
NikolaosPappas /1820
• Mul;lingualHierarchicalAEen;onNetworksforTextClassifica;on,NikolaosPappasandAndreiPopescu-Belis,2017(submiEed)
• WaleedAmmar,GeorgeMulcaire,YuliaTsvetkov,GuillaumeLample,ChrisDyer,andNoahA.Smith.2016.Massivelymul;lingualwordembeddings.CoRRabs/1602.01925.
• StephanGouws,YoshuaBengio,andGregoryS.Corrado.2015.BilBOWA:Fastbilingualdistributedrepresenta;onswithoutwordalignments.32ndInterna;onalConferenceonMachineLearning.
• KarlMoritzHermannandPhilBlunsom.2014.Mul;lingualmodelsforcomposi;onaldistributedseman;cs.52ndAnnualMee;ngoftheAssocia;onforComputa;onalLinguis;cs.
• AlexandreKlemen;ev,IvanTitov,andBinodBhaEarai.2012.Inducingcrosslingualdistributedrep-894resenta;onsofwords.Interna;onalConferenceonComputa;onalLinguis;cs.
• ZichaoYang,DiyiYang,ChrisDyer,XiaodongHe,AlexSmola,andEduardHovy.2016.HierarchicalaEen;onnetworksfordocumentclassifica;on.InProceedingsofthe2016ConferenceoftheNorthAmericanChapteroftheAssocia;onforComputa;onalLinguis;cs:HumanLanguageTechnologies.
• DuyuTang,BingQin,andTingLiu.2015.Documentmodelingwithgatedrecurrentneuralnetworkforsen;mentclassifica;on.InEmpiricalMethodsonNaturalLanguageProcessing.
• RuiLin,ShujieLiu,MuyunYang,MuLi,MingZhou,andShengLi.2015.Hierarchicalrecurrentneuralnetworkfordocumentmodeling.ConferenceonEmpiricalMethodsinNaturalLanguageProcessing.
• YoonKim.2014.Convolu;onalneuralnetworksforsentenceclassifica;on.ConferenceonEmpiricalMethodsinNaturalLanguageProcessing.
References