
Top-down Tree Long Short-Term Memory Networks

Xingxing Zhang, Liang Lu, Mirella Lapata

School of Informatics, University of Edinburgh

12th June, 2016

Zhang et al., 2016 Tree LSTM 12th June, 2016 1 / 18

Sequential Language Models

P(S = w_1, w_2, ..., w_n) = ∏_{i=1}^{n} P(w_i | w_{1:i−1})    (1)
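The chain-rule factorization above can be sketched in a few lines; the conditional model here is a hypothetical uniform stand-in for an LSTM's softmax over the vocabulary:

```python
import math

def sentence_log_prob(tokens, cond_prob):
    # Chain rule: log P(S) = sum_i log P(w_i | w_1 ... w_{i-1})
    return sum(math.log(cond_prob(w, tokens[:i]))
               for i, w in enumerate(tokens))

# Hypothetical conditional model: uniform over a 4-word vocabulary,
# ignoring the history (a real LM would condition on it).
vocab = ["the", "dog", "barks", "</s>"]
uniform = lambda w, history: 1.0 / len(vocab)

print(sentence_log_prob(["the", "dog", "barks", "</s>"], uniform))
# 4 * log(1/4) ≈ -5.5452
```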

State of the Art

Based on the Long Short-Term Memory network language model (Hochreiter and Schmidhuber, 1997; Sundermeyer et al., 2012)

Billion Word Benchmark results reported in Jozefowicz et al. (2016):

Model              PPL
KN5                67.6
LSTM               30.6
LSTM+CNN INPUTS    30.0


Will tree structures help LMs?

Probably yes

LMs based on Constituency Parsing (Chelba and Jelinek, 2000; Roark, 2001; Charniak, 2001)

LMs based on Dependency Parsing (Shen et al., 2008; Zhang, 2009; Sennrich, 2015)


LSTMs + Dependency Trees = TreeLSTMs


Why?

Sentence length N vs. tree height ≈ log(N)

How?

Top-down generation

Breadth-first search

Reminiscent of Eisner (1996)
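The breadth-first generation order can be sketched as follows; the head-to-dependents map below is a plausible parse of the running example sentence, not one taken from the slides:

```python
from collections import deque

def bfs_order(tree, root):
    # Breadth-first traversal of a dependency tree: the order in which
    # the top-down model generates words (root first, then its
    # dependents, level by level).
    order, queue = [], deque([root])
    while queue:
        head = queue.popleft()
        order.append(head)
        queue.extend(tree.get(head, []))
    return order

# Hypothetical parse of "The luxury auto manufacturer last year
# sold 1,214 cars in the U.S." (heads -> dependents, left to right).
tree = {
    "sold": ["manufacturer", "year", "cars", "in"],
    "manufacturer": ["The", "luxury", "auto"],
    "year": ["last"],
    "cars": ["1,214"],
    "in": ["U.S."],
    "U.S.": ["the"],
}
print(bfs_order(tree, "sold"))
# ['sold', 'manufacturer', 'year', 'cars', 'in', 'The', 'luxury',
#  'auto', 'last', '1,214', 'U.S.', 'the']
```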


Generation Process (Unlabeled Trees)

The luxury auto manufacturer last year sold 1,214 cars in the U.S.


Tree LSTM

P(S) = ∏_{i=1}^{n} P(w_i | w_{1:i−1})    (2)

P(S | T) = ∏_{w ∈ BFS(T) \ root} P(w | D(w))    (3)

D(w) is the Dependency Path of w .

D(w) is a generated sub-tree.

Works on projective and unlabeled dependency trees.
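A minimal sketch of the factorization in Eq. (3), with a toy parent map and a hypothetical uniform stand-in for P(w | D(w)):

```python
import math

def dependency_path(w, parent):
    # D(w): ancestors of w from the root down to w's head --
    # the already-generated context the model conditions on.
    path, node = [], parent[w]
    while node is not None:
        path.append(node)
        node = parent.get(node)
    return list(reversed(path))

def tree_log_prob(bfs_words, parent, cond_prob):
    # log P(S | T) = sum over w in BFS(T) \ {root} of log P(w | D(w))
    return sum(math.log(cond_prob(w, dependency_path(w, parent)))
               for w in bfs_words[1:])  # bfs_words[0] is the root

# Toy projective, unlabeled tree: sold -> cars -> 1,214
parent = {"sold": None, "cars": "sold", "1,214": "cars"}
uniform = lambda w, d: 1.0 / 3  # hypothetical uniform conditional
print(tree_log_prob(["sold", "cars", "1,214"], parent, uniform))
# 2 * log(1/3) ≈ -2.1972
```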


One Limitation of Tree LSTM


Left Dependent Tree LSTM


Experiments


MSR Sentence Completion Challenge

Training set: 49 million words (around 2 million sentences)

Development set: 4,000 sentences

Test set: 1,040 completion questions


Dependency Parsing Reranking

Rerank 2nd Order MSTParser (McDonald and Pereira, 2006)

We train TreeLSTM and LdTreeLSTM as language models.

We only use words as input features; POS tags, dependency labels, and composition features are not used.
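Reranking reduces to rescoring the parser's k-best list with the language model; a minimal sketch, with hypothetical scores and an interpolation weight that would be tuned on development data:

```python
def rerank(candidates, alpha=1.0):
    # Combine the base parser's model score with the TreeLSTM
    # language-model score and return the highest-scoring tree.
    # `alpha` weights the LM term (a placeholder value here).
    return max(candidates,
               key=lambda c: c["parser_score"] + alpha * c["lm_score"])

# Hypothetical k-best list: the LM flips the parser's 1-best.
kbest = [
    {"tree": "parse_A", "parser_score": -10.0, "lm_score": -42.0},
    {"tree": "parse_B", "parser_score": -10.5, "lm_score": -39.0},
]
print(rerank(kbest)["tree"])  # parse_B wins: -49.5 > -52.0
```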


Dependency Parsing Reranking

NN: Chen and Manning (2014); S-LSTM: Dyer et al. (2015)


Tree Generation

Four binary classifiers:


Add Left?

Add Right?

Add Next Left?

Add Next Right?

Features: hidden states and word embeddings

Classifier      Accuracy
Add-Left        94.3
Add-Right       92.6
Add-Nx-Left     93.4
Add-Nx-Right    96.0
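One node's expansion under the four classifiers can be sketched like this; the `oracle` stub is a hypothetical stand-in for the trained classifiers and the word predictor, scripted here to produce one plausible expansion:

```python
def expand_node(head, oracle):
    # Generate the dependents of one node with four binary decisions.
    # `oracle(question, head)` answers each classifier query and, for
    # "word", emits the next dependent word.
    left, right = [], []
    if oracle("add-left", head):             # first left dependent?
        left.append(oracle("word", head))
        while oracle("add-nx-left", head):   # another left dependent?
            left.append(oracle("word", head))
    if oracle("add-right", head):            # first right dependent?
        right.append(oracle("word", head))
        while oracle("add-nx-right", head):  # another right dependent?
            right.append(oracle("word", head))
    return left, right

# Scripted oracle for one expansion step (hypothetical answers).
script = {
    "add-left": [True],
    "add-nx-left": [True, False],
    "add-right": [True],
    "add-nx-right": [False],
    "word": ["manufacturer", "year", "cars"],
}
oracle = lambda q, head: script[q].pop(0)
print(expand_node("sold", oracle))
# (['manufacturer', 'year'], ['cars'])
```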


Conclusions

Syntax can help language modeling.

Predicting tree structures with Neural Networks is possible.

Next Steps:

Sequence-to-Tree Models

Tree-to-Tree Models

Code available: https://github.com/XingxingZhang/td-treelstm

Thanks & Questions?
