Attention-based Encoder-Decoder Networks for Spelling and Grammatical Error Correction
Sina Ahmadi
Paris Descartes University
sina.ahmadi@etu.parisdescartes.fr
September 6, 2017
Overview
1 Introduction
  Problem definition
2 Background
  Probabilistic techniques
  Neural Networks
  NLP challenges
3 Methods
  RNN
  BRNN
  Seq2seq
  Attention mechanism
4 Experiments
  Qualitative comparison
5 Conclusion and future work
  Conclusion
  Future studies
6 References
7 Questions
Introduction
Automatic spelling and grammar correction is the task of automatically correcting errors in written text.
• This cake is basicly sugar, butter, and flour. [→ basically]
• We went to the store and bought new stove. [→ a new stove]
• i ’m entirely awake. [→ {I, wide}]
The ability to correct errors accurately will improve
• the reliability of the underlying applications,
• the construction of software to help foreign language learning,
• noise reduction in the input of NLP tools,
• the processing of unedited texts on the Web.
Problem definition
Given an N-character source sentence S = s_1, s_2, \ldots, s_N and its reference sentence T = t_1, t_2, \ldots, t_M, we define an error correction system as:

Definition
\hat{T} = MC(S)   (1)

where \hat{T} is a correction hypothesis.

Question: How can the MC function be modeled?
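As a rough illustrative sketch (not from the slides), the definition above amounts to the following interface; `correct` is a hypothetical stand-in for the MC function, and the identity body is a placeholder that a trained model would replace:

```python
def correct(source: str) -> str:
    """Eq. (1): map a potentially erroneous sentence S to a hypothesis T."""
    return source  # placeholder: a real MC model returns the corrected text
```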
Background
Various algorithms propose different approaches:
• Error detection: involves determining whether an input word has an equivalence relation with a word in the dictionary.
  • Dictionary lookup
  • n-gram analysis
• Error correction: refers to the attempt to endow spell checkers with the ability to correct detected errors.
  • Minimum edit distance technique
  • Similarity key technique
  • Rule-based techniques
  • Probabilistic techniques
Probabilistic techniques
We treat the task of error correction as a type of monolingual machine translation, where the source sentence is potentially erroneous and the target sentence should be the corrected form of the input.

Aim
To create a probabilistic model in such a way that:

\hat{T} = \arg\max_{T} P(T \mid S; \theta)   (2)

where \theta denotes the parameters of the model. This is called the Fundamental Equation of Machine Translation [Smith, 2012].
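As a minimal sketch of Eq. (2) (my illustration, not the thesis implementation), decoding amounts to scoring candidate corrections and keeping the best one; `candidates` and `log_prob` are hypothetical stand-ins for a candidate generator and the trained model P(T | S; θ):

```python
def decode(source, candidates, log_prob):
    """Eq. (2): return the hypothesis T maximizing P(T | S; theta).

    candidates(source) is assumed to yield correction hypotheses;
    log_prob(t, source) is assumed to return log P(T | S; theta).
    """
    return max(candidates(source), key=lambda t: log_prob(t, source))
```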
Neural networks as a probabilistic model
[Figure: a single artificial neuron, with inputs x_1, …, x_n, weights w_0, w_1, …, w_n, a summation unit, and an activation function; alongside, a multi-layer network with an input layer (x_1, …, x_4), a hidden layer, and an output layer.]
• A mathematical model of the biological neural networks.
• Computes a single output from multiple real-valued inputs:
  z = \sum_{i=1}^{n} w_i x_i + b = W^{T} x + b   (3)
• Passes the output through a non-linear function:
  \tanh(z) = \frac{e^{2z} - 1}{e^{2z} + 1}   (4)
• Back-propagates in order to minimize the loss function H:
  \theta^{*} = \arg\min_{\theta} H(\hat{y}, y)   (5)
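A minimal numerical sketch of Eqs. (3)-(4), assuming NumPy; the weights and inputs are arbitrary placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)   # inputs x_1, ..., x_4
W = rng.normal(size=4)   # weights w_1, ..., w_4
b = 0.1                  # bias w_0

z = W @ x + b            # Eq. (3): weighted sum of the inputs
y = np.tanh(z)           # Eq. (4): non-linear activation
```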
NLP challenges in Machine Translation
Large input state spaces → word embeddings
• There is no upper limit on the number of words.
Long-term dependencies
• Constraints: He did not even think about himself.
• Selectional preferences: I ate salad with a fork, NOT a rake.
Variable-length output sizes
• This strucutre have anormality → 30 characters
• This structure has an abnormality. → 34 characters
Recurrent Neural Network
Unlike a simple MLP, an RNN can make use of all the previous inputs; it thus provides a memory-like functionality.

[Figure: an RNN unrolled over time: inputs x_0, …, x_n feed internal states h_0, …, h_n through weights W, each state feeds the next through weights U, and outputs y_0, …, y_n are produced through weights V.]

h_t = \tanh(W x_t + U h_{t-1} + b)   (6)
y_t = \mathrm{softmax}(V h_t)   (7)

W, U, and V are the parameters of the network that we want to learn.
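A minimal sketch of Eqs. (6)-(7) as one recurrence step, assuming NumPy; the shapes of W, U, V, and b are left to the caller:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())  # subtract the max for numerical stability
    return e / e.sum()

def rnn_step(x_t, h_prev, W, U, V, b):
    h_t = np.tanh(W @ x_t + U @ h_prev + b)  # Eq. (6): new hidden state
    y_t = softmax(V @ h_t)                   # Eq. (7): output distribution
    return h_t, y_t
```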
Bidirectional Recurrent Neural Network
We can use two RNNs: one reads the input sequence forwards and the other backwards, each with its own hidden units but both connected to the same output.

[Figure: a bidirectional RNN over inputs x_{t-1}, …, x_{t+2}: forward states run left to right, backward states run right to left, and both feed the outputs y_{t-1}, …, y_{t+2}.]

\overrightarrow{h}_t = \tanh(\overrightarrow{W} x_t + \overrightarrow{U} \overrightarrow{h}_{t-1} + \overrightarrow{b})   (8)
\overleftarrow{h}_t = \tanh(\overleftarrow{W} x_t + \overleftarrow{U} \overleftarrow{h}_{t+1} + \overleftarrow{b})   (9)
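A minimal sketch of Eqs. (8)-(9), assuming NumPy: one pass runs forwards, the other backwards, and the two hidden states for each position are paired; `fwd` and `bwd` are hypothetical dictionaries holding each direction's own W, U, and b:

```python
import numpy as np

def birnn(xs, fwd, bwd, hidden_size):
    h_f, h_b = np.zeros(hidden_size), np.zeros(hidden_size)
    forward, backward = [], []
    for x in xs:                                             # Eq. (8)
        h_f = np.tanh(fwd["W"] @ x + fwd["U"] @ h_f + fwd["b"])
        forward.append(h_f)
    for x in reversed(xs):                                   # Eq. (9)
        h_b = np.tanh(bwd["W"] @ x + bwd["U"] @ h_b + bwd["b"])
        backward.append(h_b)
    backward.reverse()  # realign backward states with input positions
    return [np.concatenate(pair) for pair in zip(forward, backward)]
```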
Sequence-to-sequence models
The sequence-to-sequence model is composed of two processes: encoding and decoding.

[Figure: an encoder reads embedded inputs x_1, x_2, x_3 into hidden states h_1, h_2, h_3; decoder states s_1, s_2, s_3 then emit outputs o_1, o_2, o_3 through softmax layers.]

h_t = \mathrm{RNN}(x_t, h_{t-1})   (10)
c = \tanh(h_T)   (11)

where h_t is the hidden state at time t, and c is the context vector summarizing the hidden layers of the encoder.
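A minimal sketch of Eqs. (10)-(11), assuming NumPy and a vanilla-RNN encoder as in Eq. (6):

```python
import numpy as np

def encode(xs, W, U, b):
    h = np.zeros(U.shape[0])
    for x in xs:                       # Eq. (10): h_t = RNN(x_t, h_{t-1})
        h = np.tanh(W @ x + U @ h + b)
    return np.tanh(h)                  # Eq. (11): context vector c = tanh(h_T)
```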
Attention mechanism

The attention mechanism calculates a new context vector c_t for the output word y_t at decoding step t.

c_t = \sum_{j=1}^{T} a_{tj} h_j   (12)

a_{tj} = \frac{\exp(e_{tj})}{\sum_{k=1}^{T} \exp(e_{tk})}   (13)

e_{tj} = \mathrm{attentionScore}(s_{t-1}, h_j)   (14)

[Figure: the encoder-decoder architecture of the previous slide, extended with an attention module that connects the encoder hidden states h_1, h_2, h_3 to each decoder state.]

where h_j is the hidden state of the word x_j, and a_{tj} is the weight of h_j for predicting y_t. The vector c_t is also called the attention vector.
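A minimal sketch of Eqs. (12)-(14), assuming NumPy; a dot product stands in for attentionScore, which the slide leaves unspecified:

```python
import numpy as np

def attend(s_prev, H):
    """s_prev: previous decoder state s_{t-1}; H: (T, d) encoder states h_j."""
    e = H @ s_prev               # Eq. (14): alignment scores e_tj
    a = np.exp(e - e.max())
    a /= a.sum()                 # Eq. (13): normalized weights a_tj
    return a @ H                 # Eq. (12): attention vector c_t
```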
Experiments
Various metrics are used to evaluate correction models, including MaxMatch (M2) [Dahlmeier, 2012], I-measure [Felice, 2015], and BLEU and GLEU [Napoles, 2015].

Model             P        R        F0.5
Baseline          1.0000   0.0000   0.0000
RNN               0.5397   0.2487   0.4373
BiRNN             0.5544   0.2943   0.4711
Encoder-decoder   0.5835   0.3249   0.5034
Attention         0.5132   0.2132   0.4155

Table: Evaluation results of the models using the MaxMatch M2 metric; the encoder-decoder row holds the best scores among the trained models.
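As a quick sanity check (my addition), the F0.5 column follows from P and R via the standard F-beta formula with beta = 0.5, which weights precision more heavily than recall:

```python
def f_beta(p, r, beta=0.5):
    # F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)
    return (1 + beta**2) * p * r / (beta**2 * p + r)

print(round(f_beta(0.5544, 0.2943), 4))  # 0.4711, matching the BiRNN row
```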
Qualitative comparison
Source sentence (Arabic, 306 characters):
اقويه اهبه ايهبوا اان ا، افعليهم االمهينه االمعادله اكان االسوريون ال ايرتضون ابهذه ااذا امن اتخليه اعن امنصبه ا، افلذلك اكلفه ااقل اوآلما االناس اارواح اان ايظن االشبيحه اكبير ال ازال اوالذل اوالثمن اسيكون اغاليا اولكنه ايستأهل اهذه االتضحيات االخنوعومن ا ا اعاما ااربعين امن ااكثر اندفع اثمن احبائي ايا اأاننا ا ا، اعنوه االعصابه احقوقهم امن اهذه اوياخذو اواحده
Gold standard reference (309 characters):
لزال اكبير االشبيحة ايظن اأن اأرواح اوآلما االناس اأقل اكلفة امن اتخليه اعن امنصبه ا، افلذلك اإذا اكان االسوريون ال ايرتضون ابهذه االمعادلة االمهينة ا، افعليهم اأن ايهبوا اهبة اقوية اواحدة اويأخذوا احقوقهم امن اهذه االعصابة اعنوة ا. اإننا ايا اأحبائي اندفع اثمن اأكثر امن اأربعين اعاما ا، اومن االخنوع ا، اوالذل اوالثمن اسيكون اغاليا اولكنه ايستأهل اهذه ا. االتضحيات
Recurrent neural network model prediction (307 characters):
اقويه اهبه ايهبوا اان ا، افعليهم االمهينه االمعادلة اكان االسوريون ال ايرتضون ابهذه ااذا اعن امنصبه ا، افلذلك اتخليه امن اكلفه ااقل اوآلما االناس اارواح اان ايظن االشبيحة اكبير ال ازالااضحي ايستتهههههه ا اولنن اااااا اسيكنن ا اوالمن اوالل ا،، اااخنوع اومن ا ا،عاما اأربعين امن ا اأكثر اندفع اثمنأحبائي ايا اأننا ا ا ا اعنوه االعاابه اههه انن احقوههم اوياخذذاو اواحده
Bidirectional recurrent neural network prediction (307 characters):
اقويه اهبه ايهبوا اأن ا، افعليهم االمهينة االمعادلة اكان االسوريون ال ايرتضون ابهذه اإذا ا، افلذلك امنصبه اعن اتخليه امن اكلفه اأقل االناس اوالما اأرواح ايظن اأن االشبيحة اكبير ال ازال اههه اوككنهههستههل اسيكوننغغايا اوالمثن،والذل ا، االنووعومن او ا، اامم ا ارربعن اككثرممن ا امن ا اندف اأأحائئي اياننا ا ا ا اععوهه ااالعصبب اههه اقققهمممم ا اوياخذو اواحده
التضحيت
Sequence-to-sequence model prediction (306 characters):
قويه اهبه ايهبوا اأن ا، افعليهم االمهينة االمعادله اكان االسوريون ال ايرتضون ابهذه اإذا امن اتخليه اعن امنصبه ا، افلذلك اكلفه اأقل االناس اوللما اأرواح اأن ايظن االشبيحة اكبير ال ازالووكنهههستهألههذههااتتضاا اااايي اييون اواللمنن اواللل ا ا االخنوو امن ا ا ا ا ا اع ارربعن ا ان ا اأككر اثمم احبائييندفع ايا ا اأننا ا ا اعنوه االلصابب اوحقققهمممننههه اوياخذو اواحده
Attention-based sequence-to-sequence model prediction (309 characters):
اقويه اهبه ايهبوا اان ا، افعليهم االمهينة االمعادلة اكان االسوريون ال ايرتضون ابهذه ااذا امن اتخليه اعن امنصبه ا، افلذلك اكلفه ااقل االناس اوللما اارواح اان ايظن االشبيحه اكبير ال ازال ايتتههههذذه اوككنه اااليا اوالثمن ا ا ا اسيكون اواللل ا ا،الخخوع اومن ا ا، اعاماأربيين امن اأكثر امن ا ا اندف احببائي ايا اإناا ا ا اعنوه ااععصابة امن اهذه اواحقققههم اوياخذ اواحده
التضضي
Conclusion and future work
Conclusion
• Error correction can be modeled for any language.
• Results vary depending on the evaluation metric used.
• Precision decreases when correcting long sentences.

Future studies
• Explore the models at further levels, e.g., word level and phrase level.
• Limit the length of the sequences in training the models.
• Use deeper networks with a larger embedding size.
• Prevent over-learning of the models by not training them on correct input tokens (action = "OK").
References
Brown, P. F., Della Pietra, V. J., Della Pietra, S. A., & Mercer, R. L. (1993). The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2), 263-311.

Dahlmeier, D., & Ng, H. T. (2012). Better evaluation for grammatical error correction. Association for Computational Linguistics, 568-572.

Felice, M., & Briscoe, T. (2015). Towards a standard evaluation method for grammatical error detection and correction. HLT-NAACL, 578-587.

Napoles, C., Sakaguchi, K., Post, M., & Tetreault, J. (2015). Ground truth for grammatical error correction metrics. Association for Computational Linguistics, 588-593.
Questions?