Lili Mou, Ph.D. Candidate Software Institute, Peking...
Transcript of Lili Mou, Ph.D. Candidate Software Institute, Peking...
![Page 1: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/1.jpg)
TreeBased Convolution and its Applications
Lili Mou, Ph.D. CandidateSoftware Institute, Peking University
[email protected]://sei.pku.edu.cn/~moull12
![Page 2: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/2.jpg)
Outline
● Introduction: Sentence Modeling● Related Work: CNNs, RNNs, etc● TreeBased Convolution● Conclusion and Discussion
![Page 3: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/3.jpg)
Discriminative Sentence Modeling
● Sentence modeling– Capture the “meaning” of a sentence
● Discriminative sentence modeling– Classify a sentence according to a certain criterion
E.g., sentiment analysis, polarity classification
![Page 4: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/4.jpg)
Discriminative Sentence Modeling
● Sentence modeling– Capture the “meaning” of a sentence
● Discriminative sentence modeling– Classify a sentence according to certain criterion
● Related to various NLP tasks– Sentence matching: QA [1], conversation [2]– Discourse analysis [3]– Extractive summarization [4]– Parsing [5]– Machine translation [6]
![Page 5: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/5.jpg)
An Example: Sentiment Analysis
![Page 6: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/6.jpg)
Human Engineering
● Feature engineering– Bagofwords, ngram, sentiment lexicon [7]
However, sentence modeling is usually nontrivial [8]
● Kernel machines, e.g., SVM– Circumvent explicit feature representation– The kernel is crucial as it fully summarizes data information
![Page 7: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/7.jpg)
Neural Networks● Automatic feature learning
– Word embeddings [9]– Paragraph vectors [10]
● Prevailing neural sentence models– Convolutional neural networks (CNNs) [11]
� Capture invariant features
� Efficient feature extraction and learning
� Structure insensitive
– Recursive neural networks (RNNs) [8, 12, 13]
� Structure sensitive
� Long propagation path
![Page 8: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/8.jpg)
Our Intuition
● Can we combine?– Short propagation path like convolutional nets– Structuresensitive like recursive nets
● Our solution: Treebased convolution [14, 15, 16]– Design subtree convolutional kernel to extract invariant
structural features
![Page 9: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/9.jpg)
Outline
● Introduction: Sentence Modeling● Related Work: CNNs, RNNs, etc● TreeBased Convolution● Conclusion
![Page 10: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/10.jpg)
Convolutional Neural Networks (CNNs)● Convolution in signal processing: Linear timeinvariant system
– Flip, innerproduct, and slide
● Convolution in the neural network regime– Sliding window
![Page 11: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/11.jpg)
[17] Collobert R, Weston J, Bottou L, Karlen M, Kavukcuoglu K, Kuksa P. Natural language processing (almost) from scratch. JMLR, 2011.
![Page 12: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/12.jpg)
Convolutional Neural Networks (CNNs)
[11] Blunsom, Phil, Edward Grefenstette, and Nal Kalchbrenner. "A Convolutional Neural Network for Modelling Sentences." ACL, 2014.
![Page 13: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/13.jpg)
Recurrent Nerual Networks (RNNs)
● A recurrent net keeps one or a few hidden layers as the state● The state changes according to the discrete input
![Page 14: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/14.jpg)
Problem● Gradient vanishing or exploding
– Backpropagation is a linear system
● Design long short term memory (LSTM) units or gate recurrent units (GRU) to alleviate the problem
(N.B.: Not completely solved)
![Page 15: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/15.jpg)
Example: Relation Classification
[18] Yan Xu, Lili Mou, Ge Li, Yunchuan Chen, Hao Peng and Zhi Jin. "Classifying relations via long short term memory networks along shortest dependency paths." In EMNLP, pages 17851794, 2015.
![Page 16: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/16.jpg)
Recursive Neural Networks (RNNs again)
● Where does the tree come from?– Dynamically constructing a tree structure similar to constituency– Parsed by external parsers
Constituency tree– Leaf nodes = words– Interior nodes = abstract
components of a sentence (e.g., noun phrase)
– Root nodes = the whole sentence
![Page 17: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/17.jpg)
Why parse trees may be important?
[19] Pinker, Steven. The Language Instinct: The New Science of Language and Mind. Penguin UK, 1995.
![Page 18: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/18.jpg)
● Perceptionlike interaction [8]
p= f(W[c1 c2]T)
● Matrixvector interaction [12]
● Tensor interaction [13]
[8] Socher R, et al. Semisupervised recursive autoencoders for predicting sentiment distributions. EMNLP, 2011
[12] Socher, R, et al. "Semantic compositionality through recursive matrixvector spaces." EMNLPCoNLL, 2012.
[13] Socher, R, et al. "Recursive deep models for semantic compositionality over a sentiment treebank." EMNLP, 2013.
Recursive Propagation
![Page 19: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/19.jpg)
Recursive Propagation● Perceptionlike interaction [8]
p= f(W[c1 c2]T)
● Matrixvector interaction [12]
● Tensor interaction [13]
[8] Socher R, et al. Semisupervised recursive autoencoders for predicting sentiment distributions. EMNLP, 2011
[12] Socher, R, et al. "Semantic compositionality through recursive matrixvector spaces." EMNLPCoNLL, 2012.
[13] Socher, R, et al. "Recursive deep models for semantic compositionality over a sentiment treebank." EMNLP, 2013.
![Page 20: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/20.jpg)
Even More Interaction
● LSTM interaction [18, 19, 20]
[18] Tai, Kai Sheng, Richard Socher, and Christopher D. Manning. "Improved semantic representations from treestructured long shortterm memory networks." ACL, 2015
[19] Zhu, Xiaodan, Parinaz Sobihani, and Hongyu Guo. "Long shortterm memory over recursive structures." ICML, 2015.
[20] Le, Phong, and Willem Zuidema. "Compositional distributional semantics with long short term memory." arXiv:1503.02510 (2015).
![Page 21: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/21.jpg)
Outline
● Introduction: Sentence Modeling● Related Work: CNNs, RNNs, etc● TreeBased Convolution● Discussion● Conclusion
![Page 22: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/22.jpg)
TreeBased Convolutional Neural Network (TBCNN)
● CNNs
� Efficient feature learning and extractionThe propagation path is irrelevant to the length of a sentence
� Structureinsensitive● Recursive networks
� Structuresensitive
� Long propagation pathThe problem of “gradient vanishing or explosion”
![Page 23: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/23.jpg)
Our intuition
● Can we combine?– Structuresensitive as recursive neural networks– Short propagation path as convolutional neural networks
● Solution– The treebased convolutional neural network (TBCNN)
● Recall convolution = sliding window in the NN regime● Treebased convolution = sliding window of a subtree
![Page 24: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/24.jpg)
TreeBased Convolution
![Page 25: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/25.jpg)
Technical Difficulties
● How to represent nodes as vectors?● How to determine weights?● How to aggregate information?
![Page 26: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/26.jpg)
Technical Difficulties
● How to represent nodes as vectors?● How to determine weights?● How to aggregate information?
![Page 27: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/27.jpg)
Technical Difficulties
● How to represent nodes as vectors?● How to determine weights?● How to aggregate information?
![Page 28: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/28.jpg)
A Few Variants
● Constituency treebased convolutional neural network (cTBCNN) [15]
● Dependency treebased convolutional neural network
(dTBCNN) [15]● Abstract syntrax treebased convolutional neural network
(asTBCNN) [14]
![Page 29: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/29.jpg)
cTBCNN: constituency tree
● How to represent nodes as vectors?● Pretrain a recursive net
![Page 30: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/30.jpg)
● How to determine weights?● 3 matrices as parameters
– Parent, left child, and right child
cTBCNN: constituency tree
![Page 31: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/31.jpg)
● How to aggregate information? ● Dynamic pooling: 1way v.s. 3way
cTBCNN: constituency tree
![Page 32: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/32.jpg)
dTBCNN: dependency tree
● How to represent nodes as vectors?● Word embeddings
![Page 33: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/33.jpg)
● How to determine weights?● Assign weight by dependency type
(e.g., nsubj)
dTBCNN: dependency tree
![Page 34: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/34.jpg)
● How to aggregate information?● kway pooling
dTBCNN: dependency tree
![Page 35: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/35.jpg)
asTBCNN: abstract syntax tree
Programming language processing
vs
Natural language processing
● Abstract syntax tree of “int a = b + 3”
![Page 36: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/36.jpg)
asTBCNN: abstract syntax tree
● How to represent nodes as vectors?● Pretrain or random initialization
(44 AST symbols)
![Page 37: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/37.jpg)
● How to determine weights?● Continuous binary tree
asTBCNN: abstract syntax tree
![Page 38: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/38.jpg)
● How to aggregate information?● Global pooling or 3way pooling
asTBCNN: abstract syntax tree
![Page 39: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/39.jpg)
Performance
● Sentiment analysis [15]
![Page 40: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/40.jpg)
Performance
● Question classification [15]
![Page 41: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/41.jpg)
Performance
● Natural language inference [16]
(entailment and contradiction recognition )
…
![Page 42: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/42.jpg)
Performance
● 104label program classification [14]
![Page 43: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/43.jpg)
Model Analysis
● TBCNN is not sensitive to pooling
● TBCNN is less sensitive to length visavis recursive nets
![Page 44: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/44.jpg)
Visualization
● TBCNN does mix information
![Page 45: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/45.jpg)
Outline
● Introduction: Sentence Modeling● Related Work: CNNs, RNNs, etc● TreeBased Convolution● Conclusion and Discussion
![Page 46: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/46.jpg)
Conclusion
● A new neural model: Treebased convolution● Applications
– constituency trees, dependency tree, abstract syntax trees
![Page 47: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/47.jpg)
Discussion
● Is it possible for a neural network to “automatically” focus on relevant (e.g., treedependent) information by some attention mechanisms?
● Yes, but...
![Page 48: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/48.jpg)
Challenge of endtoend learning:
avg sum max attention argmax
Differentiability � � � � �
Supervision � � � � �
Scalability � � � � �
Discussion
![Page 49: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/49.jpg)
![Page 50: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/50.jpg)
Max/Avg/Sum pooling
![Page 51: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/51.jpg)
Attention=Weighted avgWeights are selfadaptive.αα
1α2 α
3
α4
α5 α
6
![Page 52: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/52.jpg)
Argmax
Softmax
Recall sentence generation
![Page 53: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/53.jpg)
Intuition
● Using external information to guide an NN instead of designing endtoend machines– Better performance in short term– May or may not conform to the goal of AI,
depending on how strict the external information is
Hard mechanismDifferentiability �
Supervision �
Scalability �
![Page 54: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/54.jpg)
Hard mechanismnon endtoend
Softmax
Recall sentence generation
![Page 55: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/55.jpg)
Thanks for listening!
Q&A?
![Page 56: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/56.jpg)
References[1] Chen K, Wang J, Chen LC, Gao H, Xu W, Nevatia R. ABCCNN: An Attention Based Convolutional Neural Network for Visual Question Answering. arXiv:1511.05960, 2015.
[2] Rui Yan, Yiping Song, Hua Wu. Learning to Respond with Deep Neural Networks for Retrieval based HumanComputer Conversation System. In SIGIR, 2016.
[3] Liu Y, Li S, Zhang X, Sui Z. Implicit discourse relation classification via multitask neural networks. In AAAI, 2016.
[4] Cao Z, Wei F, Dong L, Li S, Zhou M. Ranking with Recursive Neural Networks and Its Application to MultiDocument Summarization. In AAAI, 2015.
[5] Pei W, Ge T, Chang B. An effective neural network model for graphbased dependency parsing. In ACL, 2015.
[6] Meng F, Lu Z, Wang M, Li H, Jiang W, Liu Q. Encoding source language with convolutional neural network for machine translation. In ACL, 2015.
[7] Qiu G, Liu B, Bu J, Chen C. Expanding Domain Sentiment Lexicon through Double Propagation. In IJCAI, 2009.
![Page 57: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/57.jpg)
[8] Socher R., et al. Semisupervised recursive autoencoders for predicting sentiment distributions. EMNLP, 2011.
[9] Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
[10] Le QV, Mikolov T. Distributed representations of sentences and documents. In ICML, 2014.
[11] Blunsom, Phil, Edward Grefenstette, and Nal Kalchbrenner. "A Convolutional Neural Network for Modelling Sentences." ACL, 2014.
[12] Socher, R, et al. "Semantic compositionality through recursive matrixvector spaces." EMNLPCoNLL, 2012.
[13] Socher, R, et al. "Recursive deep models for semantic compositionality over a sentiment treebank." EMNLP, 2013.
[14] Lili Mou, Ge Li, Lu Zhang, Tao Wang, Zhi Jin. "Convolutional neural networks over tree structures for programming language processing." In AAAI, pages 12871293, 2016.
[15] Lili Mou, Hao Peng, Ge Li, Yan Xu, Lu Zhang, Zhi Jin. "Discriminative neural sentence modeling by treebased convolution." In EMNLP, pages 23152325, 2015.
[16] Lili Mou, Rui Men, Ge Li, Yan Xu, Lu Zhang, Rui Yan, Zhi Jin. "Natural language inference by treebased convolution and heuristic matching." ACL(2), 2016.
![Page 58: Lili Mou, Ph.D. Candidate Software Institute, Peking ...sei.pku.edu.cn/~moull12/resource/TBCNN_RForum.pdf · Convolution in signal processing: Linear timeinvariant system ... M, Kavukcuoglu](https://reader033.fdocuments.us/reader033/viewer/2022042005/5e6f799f35aeae56a739775b/html5/thumbnails/58.jpg)
[17] Collobert R, Weston J, Bottou L, Karlen M, Kavukcuoglu K, Kuksa P. Natural language processing (almost) from scratch. JMLR, 2011.
[18] Yan Xu, Lili Mou, Ge Li, Yunchuan Chen, Hao Peng and Zhi Jin. "Classifying relations via long short term memory networks along shortest dependency paths." In EMNLP, 2015.
[19] Pinker, Steven. The Language Instinct: The New Science of Language and Mind. Penguin UK, 1995.
[20] Tai, Kai Sheng, Richard Socher, and Christopher D. Manning. "Improved semantic representations from treestructured long shortterm memory networks." ACL, 2015
[21] Zhu, Xiaodan, Parinaz Sobihani, and Hongyu Guo. "Long shortterm memory over recursive structures." ICML, 2015.
[22] Le, Phong, and Willem Zuidema. "Compositional distributional semantics with long short term memory." arXiv:1503.02510 (2015).