Study of End-to-End Memory Networks


A presentation on

End-to-End Memory Networks (MemN2N)

Slides: 26 | Time: 15 minutes

IE 594 Data Science 2, University of Illinois at Chicago, February 2017

Under the guidance of Prof. Dr. Ashkan Sharabiani

By Ashish Menkudale

2

The kitchen is north of the hallway.

The bathroom is west of the bedroom.

The den is east of the hallway.

The office is south of the bedroom.

How do you go from the den to the kitchen?

3

The kitchen is north of the hallway.

The bathroom is west of the bedroom.

The den is east of the hallway.

The office is south of the bedroom.

How do you go from the den to the kitchen?

[Figure: map of the rooms — the den is east of the hallway, the kitchen is north of the hallway, the bathroom is west of the bedroom, the office is south of the bedroom.]

Answer: West, North (west from the den to the hallway, then north to the kitchen).

4

Brian is a frog. Lily is grey.

Brian is yellow. Julius is green.

What color is Greg?

Greg is a frog.

5

Brian is a frog.

Lily is grey.

Brian is yellow.

Julius is green.

What color is Greg?

Greg is a frog.

[Figure: attention over the entities Brian, Lily, Julius, and Greg, linking Greg → frog → Brian → yellow.]

Answer: Yellow.

6

External Global Memory

[Diagram: input → controller module ↔ memory module (read/write) → output.]

• Dedicated, separate memory module.

• Memory can be a stack or a list/set of vectors.

• The controller module accesses the memory (read, write).

• Advantage: stable, scalable.
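As a minimal sketch of this idea (my own illustration, not from the slides): the memory is just a list of vectors, and a controller can write new vectors and read back a similarity-weighted combination.

    import numpy as np

    class ExternalMemory:
        """Memory as a list/set of vectors that a controller can read and write."""
        def __init__(self, dim):
            self.dim = dim
            self.slots = []                        # list of stored vectors

        def write(self, vector):
            self.slots.append(np.asarray(vector, dtype=float))

        def read(self, query):
            """Soft read: similarity-weighted sum of all stored vectors."""
            M = np.stack(self.slots)               # (n_slots, dim)
            scores = M @ np.asarray(query)         # inner product per slot
            weights = np.exp(scores - scores.max())
            weights /= weights.sum()               # softmax over slots
            return weights @ M                     # weighted combination

    # Controller usage: write observations, then read with a query vector.
    mem = ExternalMemory(dim=4)
    mem.write([1.0, 0.0, 0.0, 0.0])
    mem.write([0.0, 1.0, 0.0, 0.0])
    print(mem.read([0.9, 0.1, 0.0, 0.0]))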

Charles Babbage (1791–1871): invented the analytical engine concept.

Konrad Zuse (1910–1995): invented the stored-program concept.

7

Warren Sturgis McCulloch (1898–1969): computational model for neural networks.

Memory Networks

• A memory network augments a model with a large external memory, beyond what low-level tasks like object recognition require.

• Writes everything to the memory, but reads only the relevant information.

• Attempts to add a long-term memory component to make the model more like artificial intelligence.

• Two types:
  • Strongly supervised memory network: hard addressing.
  • Weakly supervised memory network: soft addressing.

• Hard addressing: take the max of the inner product between the internal state and the memory contents (a small sketch follows below).

Mary is in garden. John is in office. Q: Where is John? Bob is in kitchen.

Walter Pitts (1923–1969): computational model for neural networks.
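A small illustrative sketch of hard addressing (my own toy vectors, not the slide's): the controller simply picks the memory whose inner product with the internal state is largest.

    import numpy as np

    def hard_address(state, memories):
        """Index of the memory with the largest inner product
        with the internal state (hard, argmax addressing)."""
        scores = memories @ state              # one score per memory slot
        return int(np.argmax(scores))

    # Toy example: three memory vectors, one internal state.
    memories = np.array([[0.1, 0.9],           # "Mary is in garden"
                         [0.8, 0.2],           # "John is in office"
                         [0.4, 0.4]])          # "Bob is in kitchen"
    state = np.array([1.0, 0.0])               # encodes "Where is John?"
    print(hard_address(state, memories))       # -> 1, selects "John is in office"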

8

Memory Vectors

Example: constructing memory vectors with bag of words (BoW) — embed each word, then sum the embedding vectors.

“Sam drops apple”: embed(“Sam”) + embed(“drops”) + embed(“apple”) = memory vector.

Example: temporal structure — add special words for time and include them in the bag of words.

1. Sam moved to garden.
2. Sam moved to kitchen.
3. Sam drops apple.

For sentence 3: embed(“Sam”) + embed(“drops”) + embed(“apple”) + time embedding(time stamp 3) = memory vector.
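A small illustrative sketch of this construction (the vocabulary, embedding dimension, and random matrices below are my own assumptions, not the slide's actual values): each sentence becomes one memory vector by summing word embeddings, optionally plus a time embedding for its position in the story.

    import numpy as np

    rng = np.random.default_rng(0)
    vocab = {"Sam": 0, "moved": 1, "to": 2, "garden": 3,
             "kitchen": 4, "drops": 5, "apple": 6}
    d = 8                                       # embedding dimension (assumed)
    A = rng.normal(size=(len(vocab), d))        # word embedding matrix
    T = rng.normal(size=(10, d))                # time embeddings, one per position

    def bow_memory(sentence, position=None):
        """Bag-of-words memory vector: sum of word embeddings (+ time embedding)."""
        m = sum(A[vocab[w]] for w in sentence.split())
        if position is not None:
            m = m + T[position]                 # temporal feature for sentence order
        return m

    story = ["Sam moved to garden", "Sam moved to kitchen", "Sam drops apple"]
    memories = np.stack([bow_memory(s, i) for i, s in enumerate(story)])
    print(memories.shape)                       # (3, 8): one memory vector per sentence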

9

Bob is in kitchen. Mary is in garden. John is in office. Where is John?

[Diagram: a strongly supervised Memory Network. Each input sentence and the question are embedded; inner products between the question's internal state vector and the memories, followed by a max, select the supporting memory “John is in office”; its embedding is added to the state and a decoder produces the output “Office”. Supervision is applied to the memory controller as well as to the output.]

10

Issues with Memory Network

• Requires explicit supervision of attention during training: we need to tell the model which memory it should use.

• We need a model that requires supervision only at the output, with no supervision of attention.

• Explicit supervision of attention is only feasible for simple tasks, which severely limits the model's applications.

11

End-To-End Memory Networks

• Soft-attention version of the Memory Network.
• Flexible read-only memory.

• Multiple memory lookups (hops).
• Can consider several memories before deciding on the output.
• More reasoning power.

• End-to-end training.
• Only the final output is needed for training.
• Simple back-propagation.

Sainbayar Sukhbaatar

Arthur Szlam

Jason Weston

Rob Fergus

12

MemN2N architecture

[Diagram: the input is processed by a controller module (e.g. an RNN) whose state queries the memory module: a dot product between the state and the memory content, a softmax, and a weighted sum of the memory content produce a read vector, which is combined with the state (sum, linear map, tanh/ReLU) to give the new state. The output is compared against the target by a loss function; supervision is applied only at the output.]

13

MemN2N in action: single memory lookup

Each sentence x_i is embedded with matrix A into a memory vector m_i and with matrix C into an output vector c_i; the question q is embedded with matrix B into the internal state u.

Attention (inner product + softmax): p_i = softmax(u^T m_i).

Output (weighted sum): o = Σ_i p_i c_i.

Predicted answer: â = softmax(W(o + u)).

Example — memory: “Mary is in garden. John is in office. Bob is in kitchen.” Question: “Where is John?” Answer: Office.

Training: estimate the embedding matrices A, B, C and the output matrix W.
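Putting the single lookup together, here is a minimal NumPy sketch of the forward pass (random matrices stand in for the trained A, B, C, W; the vocabulary size and embedding dimension are assumptions):

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def memn2n_single_hop(story_bows, question_bow, A, B, C, W):
        """One memory lookup of MemN2N.
        story_bows: (n_sentences, vocab) bag-of-words rows; question_bow: (vocab,)."""
        m = story_bows @ A.T            # input memory vectors   (n, d)
        c = story_bows @ C.T            # output memory vectors  (n, d)
        u = B @ question_bow            # internal state         (d,)
        p = softmax(m @ u)              # attention over memories
        o = p @ c                       # weighted sum of output vectors
        return softmax(W @ (o + u))     # distribution over the answer vocabulary

    # Toy shapes (all assumed): vocabulary size 20, embedding dimension 8.
    rng = np.random.default_rng(1)
    V, d = 20, 8
    A, B, C = [rng.normal(size=(d, V)) for _ in range(3)]
    W = rng.normal(size=(V, d))
    story = rng.integers(0, 2, size=(3, V)).astype(float)   # 3 sentences as BoW
    question = rng.integers(0, 2, size=V).astype(float)
    answer_dist = memn2n_single_hop(story, question, A, B, C, W)
    print(answer_dist.argmax())         # index of the predicted answer word

Training would estimate A, B, C, and W by back-propagating a loss on the predicted answer, with supervision only at the output.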

14

Multiple Memory Lookups: Multiple Hops

[Diagram: three memory layers (hops) stacked over the same sentences {x_i}, with per-hop embedding matrices A_1, C_1, A_2, C_2, A_3, C_3. The question q gives the initial state u_1; hop k reads the memory to produce output o_k, and the state is updated as u_{k+1} = u_k + o_k. After the last hop, the predicted answer is â = softmax(W(o_3 + u_3)).]
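A sketch of how the hops stack, reusing the single-hop sketch above (per-hop matrices A_k, C_k and the state update u_{k+1} = u_k + o_k are as in the diagram; everything else is an assumption):

    import numpy as np

    def memn2n_multi_hop(story_bows, question_bow, A_list, C_list, B, W):
        """K-hop MemN2N: read the memory K times, updating the state each hop."""
        def softmax(z):
            e = np.exp(z - z.max())
            return e / e.sum()

        u = B @ question_bow                   # u_1
        for A_k, C_k in zip(A_list, C_list):   # one iteration per hop
            m = story_bows @ A_k.T             # input memories for this hop
            c = story_bows @ C_k.T             # output memories for this hop
            p = softmax(m @ u)                 # soft attention
            o = p @ c                          # hop output o_k
            u = u + o                          # u_{k+1} = u_k + o_k
        return softmax(W @ u)                  # u now equals u_K + o_K

With the toy matrices from the previous sketch, passing A_list = [A] * 3 and C_list = [C] * 3 would correspond to layer-wise weight tying (slide 16).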

15

Components

I (Input): no conversion; keep the original text x.

G (Generalization): stores I(x) in the next available memory slot.

O (Output): loops over all memories; finds the memory that best matches x, then the memory that best matches the pair (x, first match). Can be extended to more hops.

R (Response): ranks all words in the dictionary given o and returns the single best word. In fact, an RNN can be used here to produce better full-sentence responses.
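A skeleton showing how the four components fit together (purely illustrative: the scoring function and the toy word-overlap score below are my own stand-ins, not the trained model):

    class MemoryNetwork:
        """Skeleton of the I-G-O-R components of a Memory Network."""
        def __init__(self, score_fn):
            self.memory = []            # G writes here, slot by slot
            self.score = score_fn       # s(query_texts, memory_text) -> number

        def I(self, x):                 # Input: keep the original text unchanged
            return x

        def G(self, x):                 # Generalization: store I(x) in the next free slot
            self.memory.append(self.I(x))

        def O(self, x, hops=2):         # Output: pick the best-matching memories
            supports, query = [], [x]
            for _ in range(hops):
                remaining = [m for m in self.memory if m not in supports]
                best = max(remaining, key=lambda m: self.score(query, m))
                supports.append(best)
                query = [x] + supports  # the next hop conditions on what was found
            return supports

        def R(self, x):                 # Response: here we just return the top supporting
            return self.O(x)[0]         # memory; the real R ranks words or uses an RNN

    # Toy usage with a crude word-overlap score:
    overlap = lambda q, m: len(set(" ".join(q).split()) & set(m.split()))
    net = MemoryNetwork(overlap)
    for s in ["Mary is in garden", "John is in office", "Bob is in kitchen"]:
        net.G(s)
    print(net.R("Where is John"))       # -> "John is in office"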

16

Weight Tying

Weight tying specifies how the embedding matrices applied to the input and output components are shared across layers.

Two methods:

Adjacent: similar to stacked layers; the output embedding of one layer is the input embedding of the next layer.

Layer-wise: the input embedding (and likewise the output embedding) remains the same for every layer in the architecture.
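A sketch of the two tying schemes (a hypothetical helper of my own, not from the referenced repo): given K hops, tying just determines which matrix each hop uses.

    import numpy as np

    def build_tied_embeddings(K, d, V, scheme="adjacent", seed=0):
        """Return per-hop lists (A_list, C_list) under the chosen weight-tying scheme."""
        rng = np.random.default_rng(seed)
        if scheme == "layerwise":
            A = rng.normal(size=(d, V))
            C = rng.normal(size=(d, V))
            return [A] * K, [C] * K            # same A and same C at every hop
        elif scheme == "adjacent":
            C_list = [rng.normal(size=(d, V)) for _ in range(K)]
            A_first = rng.normal(size=(d, V))
            A_list = [A_first] + C_list[:-1]   # A_{k+1} is the previous hop's C_k
            return A_list, C_list
        raise ValueError("scheme must be 'adjacent' or 'layerwise'")

    A_list, C_list = build_tied_embeddings(K=3, d=8, V=20, scheme="adjacent")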

17

Scoring function

Questions and answers are mapped to the story using word embeddings.

Word embedding: maps words into a low-dimensional vector space, which makes it possible to compute distances between word vectors.

This lets us compute a similarity score between sentences and find the ones that correlate most strongly.

Example: match(‘Where is football?’, ‘John picked up the football’).

q^T U^T U d: the default scoring model used in memory networks, where q is the question, d is the answer/candidate sentence, and U is the matrix from which word embeddings are obtained.
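As a code sketch of that scoring model (toy dimensions and random U, my own assumptions): both sides are embedded with the same matrix U and compared by an inner product.

    import numpy as np

    def match_score(q_bow, d_bow, U):
        """Score s(q, d) = q^T U^T U d: embed both with U, then take the inner product."""
        return (U @ q_bow) @ (U @ d_bow)

    rng = np.random.default_rng(2)
    V, k = 30, 10                                   # vocab size, embedding dim (assumed)
    U = rng.normal(size=(k, V))
    q = rng.integers(0, 2, size=V).astype(float)    # "Where is football?" as BoW
    d = rng.integers(0, 2, size=V).astype(float)    # "John picked up the football" as BoW
    print(match_score(q, d, U))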

18

Model Selection

Model selection determines how the story, question, and answer vectors are modeled for word embedding.

Two possible approaches:

Bag-of-words model: considers each word in a sentence, embeds each word, and sums the resulting vectors. Does not take the context of each word into account.

Position encoding: considers the position/context of words within a sentence, accounting for preceding and following words, and maps them into the low-dimensional vector space (see the sketch below).

Model refining: adding noise; increasing the training dataset.
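A sketch of position encoding as used in the MemN2N paper (the helper names are mine): each word's embedding is weighted element-wise by a factor that depends on its position in the sentence, so word order matters.

    import numpy as np

    def position_encoding(J, d):
        """Weights l[j, k] = (1 - j/J) - (k/d) * (1 - 2j/J),
        for word positions j = 1..J and embedding dimensions k = 1..d."""
        j = np.arange(1, J + 1)[:, None]    # word positions
        k = np.arange(1, d + 1)[None, :]    # embedding dimensions
        return (1 - j / J) - (k / d) * (1 - 2 * j / J)

    def pe_memory(word_embeddings):
        """Memory vector: position-weighted sum instead of a plain BoW sum."""
        J, d = word_embeddings.shape        # (words in sentence, embedding dim)
        return (position_encoding(J, d) * word_embeddings).sum(axis=0)

    words = np.random.default_rng(3).normal(size=(4, 8))   # a 4-word sentence, dim 8
    print(pe_memory(words).shape)                          # (8,)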

19

Decisions for Configuration

• Number of hops

• Number of epochs

• Embedding size

• Training dataset

• Validation dataset

• Model selection

• Weight tying

20

RNN viewpoint of MemN2N

Plain RNN: the input sequence is fed to the RNN one element at a time, in order; the RNN has only one chance to look at a given input symbol.

Memory Network: all inputs are placed in the memory, and an addressing signal lets the model decide which part of the memory it reads next; the RNN controller works on the selected input.

21

Advantages of MemN2N over RNN

• More generic input format: any set of vectors can be the input.
• Each vector can be a BoW of symbols (including location) or an image feature plus its position.
• Location can be 1D, 2D, …; variable size.

• Out-of-order access to the input data.

• Less distracted by unimportant inputs.

• Longer-term memorization.

• No vanishing or exploding gradient problems.

bAbI Project: Task Categories

Training dataset: 1,000 questions for each task. Testing dataset: 1,000 questions for each task.

23

Demo for bAbI tasks

24

bAbI Project: Benchmark Results

25

References

1. GitHub project archive: https://github.com/vinhkhuc/MemN2N-babi-python
2. https://www.msri.org/workshops/796/schedules/20462/documents/2704/assets/24734
3. Dynamic Neural Turing Machine with Soft and Hard Addressing Schemes: https://arxiv.org/pdf/1607.00036.pdf
4. bAbI answers: https://arxiv.org/pdf/1502.05698.pdf
5. Memory Networks, Microsoft Research: https://www.youtube.com/watch?v=ZwvWY9Yy76Q&t=1s
6. Memory Networks (Jenil Shah): https://www.youtube.com/watch?v=BN7Kp0JD04o
7. N-grams vs. SVM vs. generative models: http://stackoverflow.com/questions/20315897/n-grams-vs-other-classifiers-in-text-categorization
8. End-To-End Memory Networks (Facebook AI, results for bAbI tasks): https://papers.nips.cc/paper/5846-end-to-end-memory-networks.pdf
9. Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks: https://arxiv.org/pdf/1502.05698.pdf

26

Questions