
SDNet: Contextualized Attention-based Deep Network for Conversational Question Answering

Jan. 31, 2019

Chenguang Zhu
Microsoft Speech and Dialogue Research Group

Motivation

• Traditional MRC models target the single-turn QA scenario

• No correlation between different questions/answers

• Humans interact by conversation, with rich context

• Conversational AI models better match realistic settings, such as customer support chatbots and social chatbots

CoQA Dataset

• From Stanford NLP Lab

• First conversational question answering dataset

• Context can be carried over, requiring coreference resolution

• Model should comprehend both the passage and questions/answers from previous rounds

Example

Our formulation

• Passage (context)
• Previous rounds of questions and answers
• Current question

• We prepend the latest rounds of QAs to the current question (see the sketch below)
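A minimal sketch of this prepending step, not the released SDNet code; the number of rounds kept and all names are illustrative assumptions:

```python
# Sketch only: prepend the previous QA rounds to the current question.
from typing import List, Tuple

def build_question_input(history: List[Tuple[str, str]],
                         current_question: str,
                         history_size: int = 2) -> str:
    """Return the current question prefixed with the latest QA rounds."""
    parts = []
    for q, a in history[-history_size:]:
        parts.extend([q, a])            # earlier question, then its answer
    parts.append(current_question)      # current question comes last
    return " ".join(parts)

# Example:
# build_question_input([("Who wrote it?", "Twain"), ("When?", "1876")],
#                      "Where was he born?")
# -> "Who wrote it? Twain When? 1876 Where was he born?"
```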

Encoding Layer

• For each word in the passage and question:
• 300-dim GloVe word embedding
• 1024-dim BERT contextual embedding

Using BERT

• We utilize BERT as a contextual embedder, with some adaptation

• BERT tokenizes text into byte pair encoding (BPE)

• It may not align with canonical tokenizers like spaCy

Using BERT--Alignment

• Tokenize text into word sequence

• Then use BERT's tokenizer to split each word into BPEs

• After obtaining contextual embedding for each BPE, we average the BERT embeddings for all the BPEs corresponding to a word
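A minimal sketch of this alignment, assuming the Hugging Face transformers library and a spaCy-style word pre-tokenization; the model name and layer choice are assumptions, and the original SDNet implementation may differ:

```python
# Sketch only: align word-level tokens with BERT sub-word pieces and
# average the per-piece embeddings back to one vector per word.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")
model = BertModel.from_pretrained("bert-large-uncased", output_hidden_states=True)
model.eval()

def word_level_bert_embeddings(words):
    """Return one BERT vector per word by averaging its sub-word pieces."""
    pieces, spans = [], []
    for w in words:
        sub = tokenizer.tokenize(w)                    # word -> its BPE pieces
        spans.append((len(pieces), len(pieces) + len(sub)))
        pieces.extend(sub)

    ids = tokenizer.convert_tokens_to_ids(["[CLS]"] + pieces + ["[SEP]"])
    with torch.no_grad():
        out = model(torch.tensor([ids]))
    hidden = out.hidden_states[-1][0]                  # last layer, (seq_len, 1024)

    # +1 offsets skip [CLS]; average the pieces belonging to each word
    return torch.stack([hidden[s + 1:e + 1].mean(dim=0) for s, e in spans])

# Example: word_level_bert_embeddings(["SDNet", "answers", "questions"])
```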

Using BERT—Linear combination

• BERT has many layers and the paper recommends using only the last layer

• We linearly combine all layers of output

• This can greatly improve the results

Using BERT—Linear combination

• BERT_t = Σ_{l=1..L} α_l · h_t^l, where h_t^l is the averaged BPE embedding of word t from BERT layer l, and α_l are the learned linear-combination weights
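A minimal PyTorch sketch of this learned layer combination; the softmax normalization and all names are assumptions, not the released implementation:

```python
import torch
import torch.nn as nn

class BertLayerCombination(nn.Module):
    """Learn one scalar weight per BERT layer and mix the layer outputs."""
    def __init__(self, num_layers: int):
        super().__init__()
        self.alpha = nn.Parameter(torch.zeros(num_layers))   # one weight per layer

    def forward(self, layer_embeddings: torch.Tensor) -> torch.Tensor:
        # layer_embeddings: (num_layers, seq_len, hidden) averaged-BPE word vectors
        weights = torch.softmax(self.alpha, dim=0)            # normalized mixing weights
        return torch.einsum("l,lsh->sh", weights, layer_embeddings)
```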

Using BERT– Lock internal weights

• Locking weights of BERT (no backpropagation)

• In our case, finetuning does not work well

Representation

• Each word in the passage is represented by the concatenation of its GloVe and BERT embeddings from the encoding layer

• Each word in the question is represented by the concatenation of its GloVe and BERT embeddings from the encoding layer

Integration Layer

• Word-level inter-attention
• Attend from the question to the passage (a sketch follows this list)

• Passage words have several more features:
• POS and NER embeddings
• 3-dim exact-matching vector
• A normalized term frequency
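A minimal sketch of word-level inter-attention, assuming scores are computed between GloVe embeddings through a shared ReLU projection (FusionNet-style); the exact scoring function in SDNet may differ:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WordLevelInterAttention(nn.Module):
    """Each passage word attends over question words (on GloVe embeddings)."""
    def __init__(self, glove_dim: int = 300, attn_dim: int = 300):
        super().__init__()
        self.proj = nn.Linear(glove_dim, attn_dim)    # shared projection for both sides

    def forward(self, passage_glove, question_glove):
        # passage_glove: (p_len, glove_dim); question_glove: (q_len, glove_dim)
        p = F.relu(self.proj(passage_glove))           # (p_len, attn_dim)
        q = F.relu(self.proj(question_glove))          # (q_len, attn_dim)
        scores = p @ q.t()                             # (p_len, q_len) similarities
        weights = F.softmax(scores, dim=-1)            # attend over question words
        return weights @ question_glove                # attended question vector per passage word
```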

RNN

• Passage (context) and question words both go through K layers of Bi-LSTMs

• Question words go through one more RNN layer
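A minimal sketch of the stacked Bi-LSTM encoder, keeping each layer's output (used later for multi-level attention); the hyperparameters are illustrative assumptions:

```python
import torch
import torch.nn as nn

class StackedBiLSTM(nn.Module):
    """K stacked Bi-LSTM layers; the output of every layer is kept."""
    def __init__(self, input_dim: int, hidden_dim: int = 125, num_layers: int = 2):
        super().__init__()
        self.layers = nn.ModuleList()
        for k in range(num_layers):
            in_dim = input_dim if k == 0 else 2 * hidden_dim
            self.layers.append(nn.LSTM(in_dim, hidden_dim,
                                       batch_first=True, bidirectional=True))

    def forward(self, x):
        # x: (batch, seq_len, input_dim); returns a list of per-layer outputs
        outputs = []
        for lstm in self.layers:
            x, _ = lstm(x)          # (batch, seq_len, 2 * hidden_dim)
            outputs.append(x)
        return outputs
```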

Question self-attention

• The question contains conversation history
• We need to establish the relationship between the history and the current question
• Self-attention over question words:

• u_i^Q = Σ_j α_{i,j} h_j^Q, where α_{i,j} ∝ exp(score(h_i^Q, h_j^Q)) are attention weights between question words i and j

• u_i^Q is the final representation of the question words
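A minimal sketch of this self-attention step; a plain dot-product scorer is assumed, and SDNet's exact scoring function may differ:

```python
import torch
import torch.nn.functional as F

def question_self_attention(h_q: torch.Tensor) -> torch.Tensor:
    """h_q: (q_len, dim) question word vectors -> (q_len, dim) attended vectors."""
    scores = h_q @ h_q.t()                  # pairwise scores between question words
    weights = F.softmax(scores, dim=-1)     # each word attends over the whole question
    return weights @ h_q                    # u_i^Q: history-aware question representation
```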

Multi-level inter-attention

• Adapted from FusionNet (Huang et al. 2018)

• After multi-level inter-attention, we run an RNN, self-attention, and another RNN to obtain the final representation of the context (see the sketch below)
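A minimal sketch of FusionNet-style multi-level inter-attention, where scores are computed on concatenated "history of word" vectors and applied to one chosen level of question representations; the dimensions and names are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLevelAttention(nn.Module):
    """Score on the full history-of-word; apply to one level of question vectors."""
    def __init__(self, history_dim: int, attn_dim: int = 250):
        super().__init__()
        self.proj = nn.Linear(history_dim, attn_dim)

    def forward(self, passage_history, question_history, question_level):
        # passage_history:  (p_len, history_dim)  concatenated GloVe/BERT/RNN outputs
        # question_history: (q_len, history_dim)
        # question_level:   (q_len, level_dim)    one RNN layer's question output
        p = F.relu(self.proj(passage_history))
        q = F.relu(self.proj(question_history))
        weights = F.softmax(p @ q.t(), dim=-1)      # (p_len, q_len)
        return weights @ question_level             # attended question info per passage word
```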

Generate answer – positional probability

• Firstly, we condense the question into one vector

• Then we generate the probability that the answer starts and ends at each position:

• Start probability: P_i^S ∝ exp((u^Q)^T W_S c_i), where c_i is the final context word representation and u^Q the condensed question vector

• GRU: the question vector is updated with the start-weighted passage summary, t^Q = GRU(u^Q, Σ_i P_i^S c_i)

• End probability: P_i^E ∝ exp((t^Q)^T W_E c_i)

• We generate the probability of “yes”, “no”, “no answer” similarly
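A minimal PyTorch sketch of this answer-pointer step, using bilinear scoring and a GRUCell update in the spirit of the equations above; all names and dimensions are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AnswerPointer(nn.Module):
    """Predict start/end positions from passage vectors and a question vector."""
    def __init__(self, passage_dim: int, question_dim: int):
        super().__init__()
        self.start_bilinear = nn.Linear(passage_dim, question_dim, bias=False)
        self.end_bilinear = nn.Linear(passage_dim, question_dim, bias=False)
        self.gru = nn.GRUCell(passage_dim, question_dim)

    def forward(self, passage, u_q):
        # passage: (p_len, passage_dim); u_q: (question_dim,) condensed question vector
        start_logits = self.start_bilinear(passage) @ u_q           # (p_len,)
        p_start = F.softmax(start_logits, dim=-1)

        # Update the question vector with the start-weighted passage summary
        summary = p_start @ passage                                  # (passage_dim,)
        t_q = self.gru(summary.unsqueeze(0), u_q.unsqueeze(0)).squeeze(0)

        end_logits = self.end_bilinear(passage) @ t_q                # (p_len,)
        return start_logits, end_logits
```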

Overall structure -- SDNet

Training and prediction

• Loss: cross-entropy
• Batch size: 1 dialogue
• 30 epochs
• ~2 hours per epoch on one Volta V100 GPU

• Prediction of the span is done by choosing the maximum (start × end) probability, with span length <= 15 (see the sketch after this list)

• Ensemble model consists of 12 best single models with different random seeds
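A minimal sketch of the constrained span decoding (maximum start × end probability with a length limit of 15); the vectorization is an assumption and differs from any released code:

```python
import torch

def decode_span(p_start: torch.Tensor, p_end: torch.Tensor, max_len: int = 15):
    """Pick (start, end) maximizing p_start[start] * p_end[end] with span length <= max_len."""
    scores = p_start.unsqueeze(1) * p_end.unsqueeze(0)    # (p_len, p_len) joint scores
    p_len = scores.size(0)
    # Keep only spans with start <= end and end - start < max_len
    valid = torch.triu(torch.ones(p_len, p_len, dtype=torch.bool))
    valid &= ~torch.triu(torch.ones(p_len, p_len, dtype=torch.bool), diagonal=max_len)
    scores = scores.masked_fill(~valid, 0.0)
    best = torch.argmax(scores)
    return (best // p_len).item(), (best % p_len).item()
```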

Experiments

• CoQA dataset
• Dev set: 100 passages per in-domain topic
• Test set: 100 passages per in-domain and out-of-domain topic

Leaderboard (As of 12/11/2018)

Ablation Studies

Effect of history

Training

Summary

• SDNet leverages MRC, BERT and conversational history

• BERT is important, but adaptation is required to use it as a module

• The next step is open-domain conversational QA, where no passage is given and the model may also ask questions

Introduction to SDRG

• Microsoft Speech and Dialogue Research Group

• Conduct research on speech recognition and conversational AI

• We’re from Stanford, Princeton, Cambridge, IBM, …

• We’re now hiring researchers and machine learning engineers

• We have a world-class research environment:
• A DGX-1 machine and large clusters with many GPUs

• Welcome to learn more!

Q&A

• Thank you!

• My Email: chezhu@microsoft.com

• More details in our paper:
• SDNet: Contextualized Attention-based Deep Network for Conversational Question Answering – Chenguang Zhu, Michael Zeng, Xuedong Huang
• https://arxiv.org/abs/1812.03593