Chiung-Yun Hu, Robert Olivares, Bo Ram Kim, Edger Arriaza, Wen-Hui Chien, Cynthia Tsai. 04/23/2014.
Transcript of "Parsing Natural Scenes and Natural Language with Recursive Neural Networks" by Richard Socher, Cliff Chiung-Yu Lin, Andrew Y. Ng, and Christopher D. Manning
Parsing Natural Scenes and Natural Language with Recursive Neural Networks
Richard Socher
Cliff Chiung-Yu Lin
Andrew Y. Ng
Christopher D. Manning
Slides & Speech: Rui Zhang
Outline
• Motivation & Contribution
• Recursive Neural Network
• Scene Segmentation using RNN
• Learning and Optimization
• Language Parsing using RNN
• Experiments
Motivation
• Data naturally contains recursive structures
  • Image: scenes split into objects, objects split into parts
  • Language: a noun phrase contains a clause, which contains noun phrases of its own
Motivation
• The recursive structure helps to
  • Identify components of the data
  • Understand how the components interact to form the whole
Contribution
• First deep learning method to achieve state-of-the-art performance on scene segmentation and annotation
• Learned deep features outperform hand-crafted ones (e.g., Gist)
• Can be generalized to other tasks, e.g., language parsing
Recursive Neural Network
• Similar to a one-layer fully connected network
• Models the transformation from child nodes to their parent node
• Applied recursively over a tree structure
  • The parent at one layer becomes a child at the layer above
• Parameters are shared across layers

[Diagram: children c₁ and c₂ are concatenated into x and transformed by the shared matrix W_recur to produce the parent h; the same W_recur then combines h with c₃ at the next level.]
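The composition step in the diagram can be sketched as follows; the dimensions, random initialization, and tanh nonlinearity here are illustrative assumptions, not the paper's trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4  # illustrative feature dimension

W_recur = rng.standard_normal((n, 2 * n)) * 0.1  # one matrix, shared across all layers
b = np.zeros(n)

def compose(c1, c2):
    # Parent feature: h = tanh(W_recur [c1; c2] + b)
    x = np.concatenate([c1, c2])
    return np.tanh(W_recur @ x + b)

c1, c2, c3 = (rng.standard_normal(n) for _ in range(3))
h = compose(c1, c2)     # parent of c1 and c2
root = compose(h, c3)   # h becomes a child at the next level, same W_recur
```

Because the parent lives in the same space as the children, the identical `compose` can be stacked to arbitrary tree depth with a single weight matrix.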
Recursive vs. Recurrent NN
• There are two models called RNN: Recursive and Recurrent
• Similar
  • Both have shared parameters that are applied in a recursive fashion
• Different
  • Recursive NNs apply to trees, while Recurrent NNs apply to sequences
  • A Recurrent NN can be considered a Recursive NN over a one-way (chain-shaped) tree
Scene Segmentation Pipeline
1. Over-segment the image into superpixels
2. Extract features from each superpixel
3. Map the features onto the semantic space
4. Enumerate all possible merges (pairs of adjacent nodes)
5. Compute a score for each merge with the RNN
6. Merge the pair of nodes with the highest score
7. Repeat steps 4–6 until only one node is left
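The pipeline can be sketched end-to-end on toy features; all names, sizes, and random weights below are illustrative stand-ins, not the paper's trained model:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4  # illustrative feature dimension

W_recur = rng.standard_normal((n, 2 * n)) * 0.1  # shared composition weights
w_score = rng.standard_normal(n)                 # maps a parent feature to a scalar score

def compose(c1, c2):
    return np.tanh(W_recur @ np.concatenate([c1, c2]))

def score(h):
    return float(w_score @ h)

# Toy input: 4 superpixel features already mapped into the semantic space;
# adjacency is stored as a set of index pairs.
feats = {i: rng.standard_normal(n) for i in range(4)}
adj = {(0, 1), (1, 2), (2, 3)}

while len(feats) > 1:
    # Enumerate all candidate merges and pick the highest-scoring one.
    i, j = max(adj, key=lambda p: score(compose(feats[p[0]], feats[p[1]])))
    new = max(feats) + 1
    feats[new] = compose(feats[i], feats[j])  # parent feature replaces the pair
    # The new node inherits the union of both children's neighbors.
    nbrs = {a for p in adj if i in p or j in p for a in p} - {i, j}
    adj = {p for p in adj if i not in p and j not in p}
    adj |= {(new, m) for m in nbrs}
    del feats[i], feats[j]

root = next(iter(feats.values()))  # feature vector covering the whole image
```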
Input Data Representation
• Image
  • Over-segmented into superpixels
  • Hand-crafted features are extracted
  • Features are mapped onto the semantic space by one fully connected layer to obtain the feature vector
• Each superpixel has a class label
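A minimal sketch of this feature mapping; the names (`W_sem`, `to_semantic`), dimensions, and tanh nonlinearity are hypothetical choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
d_raw, n = 119, 4  # d_raw: size of the hand-crafted feature vector (illustrative); n: semantic-space size

# One fully connected layer maps a superpixel's raw feature vector
# into the semantic space where the recursive network operates.
W_sem = rng.standard_normal((n, d_raw)) * 0.05
b_sem = np.zeros(n)

def to_semantic(f_raw):
    return np.tanh(W_sem @ f_raw + b_sem)

x = to_semantic(rng.standard_normal(d_raw))  # semantic feature of one superpixel
```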
Tree Construction
• Scene parse trees are constructed bottom-up
  • Leaf nodes are the over-segmented superpixels
  • Hand-crafted features are extracted and mapped onto the semantic space by one fully connected layer
  • Each leaf therefore has a feature vector
• An adjacency matrix records the neighboring relations

[Figure: adjacency matrix of the superpixel graph]
Greedy Merging
• Nodes are merged greedily
• In each iteration
  • Enumerate all possible merges (pairs of adjacent nodes)
  • Compute a score for each possible merge
    • A fully connected transformation upon the concatenated child features
  • Merge the pair with the highest score
    • The two child nodes are replaced by the new node
    • The RNN output h₁₂ becomes the feature of the new node
    • The union of the children's neighbors becomes the new node's neighbors
• Repeat until only one node is left

[Diagram: c₁ and c₂ are composed via W_recur into h₁₂, from which W_score produces the merge score.]
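The merge-score computation alone can be sketched as below; the weights, sizes, and the `merge_score` helper are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4  # illustrative feature dimension
W_recur = rng.standard_normal((n, 2 * n)) * 0.1
w_score = rng.standard_normal(n)

def merge_score(c1, c2):
    """Score one candidate merge: the RNN composes the children into h12,
    then a learned row vector maps h12 to a scalar merge score."""
    h12 = np.tanh(W_recur @ np.concatenate([c1, c2]))
    return h12, float(w_score @ h12)

h12, s = merge_score(rng.standard_normal(n), rng.standard_normal(n))
```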
Training (1)

• Max-margin estimation
• Structured margin loss
  • Penalizes merging a segment with a segment of a different label before it has merged with all of its neighbors of the same label
  • Proportional to the number of subtrees not appearing in any correct tree
• Tree score
  • Sum of the merge scores over all non-leaf nodes
• Class label
  • Softmax upon the node feature vector
• Correct trees
  • Adjacent nodes with the same label are merged first
  • One image may have more than one correct tree
Training (2)

• Intuition: we want the score of the highest-scoring correct tree to be larger than the score of any other tree by a margin
• Formulation

  s(RNN(θ, xᵢ, ŷ)) ≥ s(RNN(θ, xᵢ, y)) + Δ(xᵢ, lᵢ, y), for ŷ ∈ Y(xᵢ, lᵢ) and y ∈ T(xᵢ)

• Margin

  Δ(xᵢ, lᵢ, y) = κ · Σ_{d ∈ N(y)} 1{subTree(d) ∉ Y(xᵢ, lᵢ)}

• Loss function

  J(θ) = Σᵢ rᵢ(θ) + (λ/2)‖θ‖²
  rᵢ(θ) = max_{y ∈ T(xᵢ)} [ s(RNN(θ, xᵢ, y)) + Δ(xᵢ, lᵢ, y) ] − max_{ŷ ∈ Y(xᵢ, lᵢ)} s(RNN(θ, xᵢ, ŷ))

• J(θ) is minimized

where θ is the set of all model parameters, i is the index of a training image, xᵢ is the training image, lᵢ is the labels of xᵢ, Y(xᵢ, lᵢ) is the set of correct trees of xᵢ, T(xᵢ) is the set of all possible trees of xᵢ, s is the tree score function, d is a node in a parse tree, and N(y) is the set of nodes of tree y.
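Given precomputed tree scores and structured margins, the per-image risk (the gap between the best margin-augmented tree and the best correct tree) can be sketched as follows; the function name and toy numbers are illustrative:

```python
def risk(s_all, delta, correct):
    """Per-image max-margin risk.

    s_all[y]  : score of candidate tree y
    delta[y]  : structured margin of tree y (kappa * #wrong subtrees)
    correct   : indices of the correct trees among the candidates
    """
    worst = max(s_all[y] + delta[y] for y in range(len(s_all)))
    best_correct = max(s_all[y] for y in correct)
    return max(0.0, worst - best_correct)

# Toy example: 3 candidate trees, tree 0 is the only correct one.
print(risk([2.0, 1.5, 0.5], [0.0, 0.3, 0.6], [0]))  # -> 0.0, correct tree wins by the margin
print(risk([1.0, 1.5], [0.0, 0.3], [0]))            # -> 0.8, incorrect tree scores too high
```

Minimizing this quantity (plus the regularizer) pushes correct trees above all incorrect ones by at least their margin.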
Training (3)

• The label of a node is predicted by a softmax over its feature vector
• The margin is not differentiable
  • Therefore only a subgradient is computed
• The gradient of the tree score is obtained by back-propagation
• The gradient of the label prediction is also obtained by back-propagation

[Diagram: c₁ and c₂ are composed via W_recur into h₁₂; W_score maps h₁₂ to the merge score and W_label to the label prediction.]
Language Parsing
• Language parsing is similar to scene parsing
• Differences
  • The input is a natural-language sentence
  • Adjacency is strictly left and right
  • Class labels are syntactic classes
    • Word level
    • Phrase level
    • Clause level
  • Each sentence has only one correct tree
Experiments Overview
• Image
  • Scene segmentation and annotation
  • Scene classification
  • Nearest-neighbor scene subtrees
• Language
  • Supervised language parsing
  • Nearest-neighbor phrases
Scene Segmentation and Annotation

• Dataset
  • Stanford Background Dataset
• Task
  • Segment and label foreground and different types of background, pixelwise
• Result
  • 78.1% pixelwise accuracy
  • 0.6% above the previous state of the art
Scene Classification
• Dataset
  • Stanford Background Dataset
• Task
  • Three classes: city, countryside, sea-side
• Method
  • Feature: average of all node features / top-node feature only
  • Classifier: linear SVM
• Result
  • 88.1% accuracy with the average feature
    • 4.1% above Gist, the state-of-the-art feature
  • 71.0% accuracy with the top feature
• Discussion
  • The learned RNN feature better captures the semantic content of a scene
  • The top feature loses some lower-level information
Nearest Neighbor Scene Subtrees

• Dataset
  • Stanford Background Dataset
• Task
  • Retrieve similar segments from all images
  • A subtree whose nodes all have the same label corresponds to a segment
• Method
  • Feature: top-node feature of the subtree
  • Metric: Euclidean distance
• Result
  • Similar segments are retrieved
• Discussion
  • The RNN feature captures segment-level characteristics
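The retrieval step can be sketched as a plain Euclidean nearest-neighbor search; the database and query below are random stand-ins for learned top-node features:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4  # illustrative feature dimension

# Hypothetical database: one top-node feature per stored segment.
db = rng.standard_normal((100, n))
query = rng.standard_normal(n)

# Euclidean distance from the query to every stored feature.
dists = np.linalg.norm(db - query, axis=1)
nearest = int(np.argmin(dists))  # index of the most similar segment
```

The same procedure, applied to sentence-level top-node features, drives the nearest-neighbor phrase experiment later in the talk.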
Supervised Language Parsing
• Dataset
  • Penn Treebank, Wall Street Journal section
• Task
  • Generate a parse tree with labeled nodes
• Result
  • Unlabeled bracketing F-measure of 90.29%, comparable to the 91.63% of the Berkeley Parser
Nearest Neighbor Phrases
• Dataset
  • Penn Treebank, Wall Street Journal section
• Task
  • Retrieve the nearest neighbor of a given sentence
• Method
  • Feature: top-node feature
  • Metric: Euclidean distance
• Result
  • Similar sentences are retrieved
Discussion
• Understanding the semantic structure of data is essential for applications like fine-grained search or captioning
• The Recursive NN predicts the tree structure along with node labels in an elegant way
• The Recursive NN can be combined with a CNN
• If we can jointly learn the Recursive NN with …