
Neural Speed Reading via Skim-RNN
Minjoon Seo¹,²*, Sewon Min³*, Ali Farhadi¹,⁴,⁵, Hannaneh Hajishirzi¹

University of Washington¹, NAVER Clova², Seoul National University³, Allen Institute for AI⁴, XNOR.AI⁵

ICLR 2018 @ Vancouver, Canada - April 30 (Mon) · UWNLP

[Figure: Skim-RNN overview. At each time step the model makes a skim decision $Q_t$: READ ($Q_t = 1$) runs the big RNN, which updates the entire hidden state; SKIM ($Q_t = 2$) runs the small RNN and copies the rest of the state. Shown for the inputs "Intelligent", "and", "invigorating", "film" with decisions $Q = [1, 2, 1, 2]$.]

Training

$\mathcal{L}'(\theta; Q) = \mathcal{L}(\theta; Q) + \gamma \cdot \frac{1}{T} \sum_t \big({-\log \Pr(Q_t = 2)}\big)$

(γ: hyperparameter that encourages skimming)

Goal: minimize $\mathbb{E}[\mathcal{L}'(\theta)] = \sum_Q \mathcal{L}'(\theta; Q) \Pr(Q)$ → sample space has size $2^T$ for $Q = [Q_1, \dots, Q_T]$
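A minimal PyTorch sketch of this objective (the name skim_loss, the tensor layout, and the default γ are illustrative, not from the paper):

```python
import torch

def skim_loss(task_loss, skim_probs, gamma=0.1):
    """L'(theta) = L(theta) + gamma * (1/T) * sum_t -log Pr(Q_t = 2).

    task_loss:  scalar task loss L(theta)
    skim_probs: (T,) tensor holding Pr(Q_t = 2) for each time step
    gamma:      hyperparameter; larger values encourage more skimming
    """
    # Average the negative log skim probability over the T steps.
    skim_reg = -skim_probs.clamp_min(1e-8).log().mean()
    return task_loss + gamma * skim_reg
```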

Gumbel-Softmax
• Biased but low variance; good empirical results
• Reparameterize $\tilde{h}_t = r_t^1 h_t^1 + r_t^2 h_t^2$, where

$r_t^i = \dfrac{\exp\big((\log p_t^i + g_t^i)/\tau\big)}{\sum_j \exp\big((\log p_t^j + g_t^j)/\tau\big)}$

($g \sim \mathrm{Gumbel}(0, 1)$, τ: temperature hyperparameter)
• Slowly anneal (decrease τ), making the distribution more discrete, to allow differentiation with stochasticity
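A minimal PyTorch sketch of this reparameterization, assuming the two candidate states $h_t^1$ (big RNN) and $h_t^2$ (small-RNN update plus copied state) are already computed; the function and argument names are illustrative:

```python
import torch
import torch.nn.functional as F

def soft_skim_state(logits, h_big, h_small_copy, tau):
    """h~_t = r_t^1 * h_t^1 + r_t^2 * h_t^2 with r_t ~ Gumbel-Softmax.

    logits:       (batch, 2) unnormalized log-probs over {read, skim}
    h_big:        (batch, d) candidate state from the big RNN
    h_small_copy: (batch, d) small-RNN update + copied tail of h_{t-1}
    tau:          temperature, annealed toward 0 during training
    """
    r = F.gumbel_softmax(logits, tau=tau)  # (batch, 2); rows sum to 1
    return r[:, :1] * h_big + r[:, 1:] * h_small_copy
```

As τ is annealed toward 0, r becomes increasingly one-hot, so the soft mixture approaches the hard READ/SKIM choice used at inference.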

Model
Consists of two RNNs sharing a hidden state:
• Big RNN (hidden size $d$): updates the entire hidden state. $O(d^2)$
• Small RNN (hidden size $d'$, where $d \gg d'$): updates a small portion of the hidden state. $O(d'd)$

$x_t$: input at time $t$; $h_t$: hidden state at time $t$; $Q_t$: random variable for the skim decision at time $t$

$p_t = \mathrm{softmax}\big(\alpha(x_t, h_{t-1})\big), \quad Q_t \sim \mathrm{Multinomial}(p_t)$

$h_t = \begin{cases} f(x_t, h_{t-1}) & \text{if } Q_t = 1 \\ \big[f'(x_t, h_{t-1});\ h_{t-1}[d'+1:d]\big] & \text{if } Q_t = 2 \end{cases}$

($f$, $f'$: big and small RNN, respectively)
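A sketch of one such step as a cell, with simplifications: GRU cells keep the hidden state a single tensor (the paper's experiments use LSTMs), the decision layer α is assumed linear, and one hard decision is shared across the batch for brevity:

```python
import torch
import torch.nn as nn

class SkimRNNCell(nn.Module):
    """One Skim-RNN step with a hard (inference-style) decision."""

    def __init__(self, input_size, d=100, d_small=10):
        super().__init__()
        self.d_small = d_small
        self.big = nn.GRUCell(input_size, d)          # f : O(d^2) per step
        self.small = nn.GRUCell(input_size, d_small)  # f': O(d'd) per step
        self.decide = nn.Linear(input_size + d, 2)    # alpha(x_t, h_{t-1})

    def forward(self, x, h_prev):
        # p_t = softmax(alpha(x_t, h_{t-1}))
        p = torch.softmax(self.decide(torch.cat([x, h_prev], -1)), dim=-1)
        if p[:, 0].mean() > 0.5:                      # Q_t = 1: read
            return self.big(x, h_prev)
        # Q_t = 2: skim -- update the first d' units, copy the rest
        h_small = self.small(x, h_prev[:, :self.d_small])
        return torch.cat([h_small, h_prev[:, self.d_small:]], -1)
```

During training, the hard if/else would be replaced by sampling $Q_t$ and mixing the two candidate states with the Gumbel-Softmax weights above.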

Related Work
LSTM-Jump (Yu et al., 2017)
• Skips rather than skims
• No outputs for skipped time steps

Variable-Computation RNN (VCRNN) (Jernite et al., 2017)
• Uses one RNN and controls the number of hidden units to update by making a $d$-way decision
• Must be invariant to the number of hidden units being updated
• Computing the loss is less tractable ($d^T$ vs. $2^T$ sample space)

Experiments
Text Classification (w/ LSTM)

Question Answering (SQuAD) (w/ LSTM+Attention)

[Figure: Trade-off between F1 (64–76) and FLOP-R (floating point operation reduction, 1x–3x) for $d' = 10$ and $d' = 0$; see the ◀ caption below.]

Dataset     SST                 Rotten Tomatoes   IMDb             AGNews
Regular     86.4                82.5              91.1             93.5
LSTM-Jump   -                   79.3 / 1.6x Sp    89.4 / 1.6x Sp   89.3 / 1.1x Sp
VCRNN       81.9 / 2.6x FLOP-R  81.4 / 1.6x Sp    -                -
Skim-RNN    86.4 / 3.0x FLOP-R  84.2 / 1.3x Sp    91.2 / 2.3x Sp   93.6 / 1.0x Sp

Model      F1    Exact Match   FLOP-R
Regular    75.5  67.0          1.0x
VCRNN      74.9  65.4          1.0x
Skim-RNN   75.0  66.0          2.3x

▲ Accuracy, floating point operation reduction (FLOP-R), and speed-up (Sp). Skim-RNN achieves accuracy comparable to or better than the regular RNN and related work, with up to 3.0x FLOP-R.

▲ Skim-RNN achieves results comparable to the regular RNN on SQuAD with 2.3x FLOP-R.

▲ Examples on IMDb. Skim-RNN skims black words and fully reads blue words.

◀ Trade-off between F1 and FLOP-R, obtained by adjusting the skim threshold (sketched after these captions), for $d' = 10$ and $d' = 0$.

▲ Examples on SQuAD with Skim-RNNs at different layers. Upper LSTMs skim more than lower LSTMs do.
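The threshold knob behind the ◀ trade-off caption can be sketched as a deterministic inference-time rule; skim_threshold is a name introduced here for illustration, and the exact decision rule may differ from the authors':

```python
def skim_decision(p_skim, skim_threshold=0.5):
    """Deterministic inference-time decision from Pr(Q_t = 2).

    Returns 2 (skim) when the skim probability clears the threshold,
    else 1 (read). Lowering the threshold skims more often, which
    raises FLOP-R at some cost in F1.
    """
    return 2 if p_skim >= skim_threshold else 1
```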

Motivation
RNN on CPUs
• RNNs are slow on CPUs/GPUs
• CPUs are often more desirable than GPUs

Human Speed Reading
• Skim unimportant words and fully read important words

Our contributions
• Dynamically decide between a big and a small RNN that share a hidden state
• Same interface as a standard RNN → easily replaceable
• Computational advantage over a standard RNN on CPUs, with comparable or better accuracy