
Statistical Language Models Based on RNN

Liu Rong

CSLT of Tsinghua University

2014-10-9

Outline

• Language Model

• Neural Network Based Language Model

• Standard NNLM

• RNN

• RNN with LSTM

• Introduction

• Toolkit

• References

N-gram-Introduction

• N-Gram

• A language model is usually formulated as a probability distribution p(s) over strings s that attempts to reflect how frequently a string s occurs as a sentence:

$p(s) = p(w_1, w_2, \cdots, w_k) = p(w_1)\,p(w_2 \mid w_1)\cdots p(w_k \mid w_1, w_2, \cdots, w_{k-1})$

• n-gram

• example (3-gram)

eg: "This is an example"

$p(\text{This is an example}) \approx p(\text{This} \mid \text{<s>})\, p(\text{is} \mid \text{<s> This})\, p(\text{an} \mid \text{<s> This is})\, p(\text{example} \mid \text{This is an})\, p(\text{</s>} \mid \text{is an example})$

=> to calculate $p(\text{example} \mid \text{This is an})$:

N-gram: $p(\text{example} \mid \text{This is an}) = \dfrac{C(\text{This is an example})}{C(\text{This is an})}$
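As a small illustration, here is a Python sketch (not from the slides) that gathers the counts from a hypothetical toy corpus and evaluates the estimate above; the corpus and all names are made up for the example.

```python
from collections import Counter

# Toy corpus; <s> and </s> mark sentence boundaries (illustrative data only).
corpus = [
    "<s> This is an example </s>".split(),
    "<s> This is a test </s>".split(),
]

# Collect 4-word and 3-word window counts so we can estimate
# p(example | This is an) = C(This is an example) / C(This is an).
counts4 = Counter()
counts3 = Counter()
for sent in corpus:
    for i in range(len(sent) - 3):
        counts4[tuple(sent[i:i + 4])] += 1
    for i in range(len(sent) - 2):
        counts3[tuple(sent[i:i + 3])] += 1

history = ("This", "is", "an")
word = "example"
p = counts4[history + (word,)] / counts3[history]   # maximum-likelihood count ratio
print(p)  # 1.0 on this toy corpus
```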

NNLM-Introduction

A Neural Probabilistic Language Model (Bengio et al., NIPS 2000 and JMLR 2003)

• Motivation:

– The n-gram LM does not take into account contexts farther than 2 words.

– The n-gram LM does not take into account the "similarity" between words.

• Idea:

– A word $w$ is associated with a distributed feature vector (a real-valued vector in $\mathbb{R}^n$, where $n$ is much smaller than the size of the vocabulary).

– Express the joint probability function $f$ of a word sequence in terms of the feature vectors of the words in the sequence.

– Learn simultaneously the word feature vectors and the parameters of $f$.
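As a rough illustration of this idea (a minimal sketch, not Bengio et al.'s exact architecture, which also includes direct input-to-output connections), here is a numpy forward pass of a feed-forward NNLM; all sizes and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

V, n, h, context = 10000, 50, 100, 3   # vocabulary, feature size, hidden size, context length (illustrative)

C = rng.normal(0, 0.01, (V, n))        # word feature vectors, learned jointly with the model
H = rng.normal(0, 0.01, (h, context * n))
U = rng.normal(0, 0.01, (V, h))

def nplm_probs(context_ids):
    """p(w | context) for every word w, given the indices of the previous words."""
    x = C[context_ids].reshape(-1)      # concatenate the context word feature vectors
    a = np.tanh(H @ x)                  # hidden layer
    z = U @ a                           # one score per vocabulary word
    e = np.exp(z - z.max())
    return e / e.sum()                  # softmax over the vocabulary

p = nplm_probs([12, 7, 345])            # arbitrary context word ids
print(p.shape, p.sum())                 # (10000,) and approximately 1.0
```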

NNLM-Introduction

[1] Y. Bengio and R. Ducharme. A neural probabilistic language model. In Neural Information Processing Systems, volume 13, pages 932-938, 2001.

NNLM-Toolkit

• CSLM Toolkit http://www-lium.univ-lemans.fr/cslm/

Holger Schwenk. CSLM - A Modular Open-Source Continuous Space Language Modeling Toolkit. In Interspeech, August 2013.

• NPLM Toolkit http://nlg.isi.edu/software/nplm/

Ashish Vaswani, Yinggong Zhao, Victoria Fossum, and David Chiang. Decoding with Large-Scale Neural Language Models Improves Translation. In Proceedings of EMNLP, 2013.

RNNLM-description

Forward:

$s(t) = f\big(U\,w(t) + W\,s(t-1)\big)$   (1)

$y(t) = g\big(V\,s(t)\big)$   (2)

where $f(z) = \dfrac{1}{1+e^{-z}}$ and $g(z_m) = \dfrac{e^{z_m}}{\sum_k e^{z_k}}$.

$w(t)$ is the current word encoded as a 1-of-V vector, and $y(t)$ is the output probability distribution over the V words of the vocabulary.

BP:

$e_o(t) = d(t) - y(t)$   (3)

$V(t+1) = V(t) + s(t)\, e_o(t)^{T} \alpha$   (4)

$U(t+1) = U(t) + w(t)\, \big(e_o(t)^{T} V \odot s(t) \odot (1-s(t))\big)\, \alpha$   (5)

$W(t+1) = W(t) + s(t-1)\, \big(e_o(t)^{T} V \odot s(t) \odot (1-s(t))\big)\, \alpha$   (6)

where $d(t)$ is the desired (next) word encoded as a 1-of-V vector, $\alpha$ is the learning rate, and $\odot$ denotes element-wise multiplication.
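A minimal numpy sketch of equations (1)-(6); the matrix orientations, sizes, and learning rate are chosen for the code and are illustrative, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
V_size, H_size, alpha = 1000, 64, 0.1          # vocabulary size, hidden size, learning rate (illustrative)

U = rng.normal(0, 0.01, (H_size, V_size))      # input -> hidden
W = rng.normal(0, 0.01, (H_size, H_size))      # hidden -> hidden (recurrent)
V = rng.normal(0, 0.01, (V_size, H_size))      # hidden -> output

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def step(w_id, d_id, s_prev):
    """One training step: forward pass (1)-(2) plus the single-step BP updates (3)-(6)."""
    global U, W, V
    w = np.zeros(V_size); w[w_id] = 1.0        # 1-of-V input
    d = np.zeros(V_size); d[d_id] = 1.0        # 1-of-V target (next word)

    s = sigmoid(U @ w + W @ s_prev)            # (1)
    y = softmax(V @ s)                         # (2)

    e_o = d - y                                # (3) output error
    e_h = (V.T @ e_o) * s * (1.0 - s)          #     error propagated to the hidden layer
    V += alpha * np.outer(e_o, s)              # (4)
    U += alpha * np.outer(e_h, w)              # (5)
    W += alpha * np.outer(e_h, s_prev)         # (6)
    return s

s = np.zeros(H_size)
s = step(w_id=3, d_id=17, s_prev=s)            # predict word 17 after observing word 3
```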

RNNLM-Backpropagation Through Time


BPTT:

$W(t+1) = W(t) + \sum_{z=0}^{T} s(t-z-1)\, e_h(t-z)^{T} \alpha$   (7)

where $e_h(t-z-1) = e_h(t-z)^{T} W \odot s(t-z-1) \odot \big(1-s(t-z-1)\big)$

and $e_h(t) = e_o(t)^{T} V \odot s(t) \odot \big(1-s(t)\big)$.

1. The unfolding can be applied for as many time steps as training examples were already seen; however, the error gradients quickly vanish as they are backpropagated in time, so several steps of unfolding are sufficient.
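A sketch of the truncated-BPTT update of W from equation (7), using the same matrix orientation as the previous sketch; the list `states` (holding the stored hidden vectors, with `states[-1]` = s(t)) and the demo values are hypothetical.

```python
import numpy as np

def bptt_update_W(W, V, alpha, e_o, states, T):
    """Truncated BPTT update of the recurrent matrix W over T unfolding steps.

    states[-1] is s(t), states[-2] is s(t-1), ...; only the last T+2 states are needed.
    """
    s_t = states[-1]
    e_h = (V.T @ e_o) * s_t * (1.0 - s_t)           # e_h(t)
    dW = np.zeros_like(W)
    for z in range(T + 1):
        s_prev = states[-2 - z]                     # s(t - z - 1)
        dW += np.outer(e_h, s_prev)                 # accumulate s(t-z-1) e_h(t-z)^T
        # propagate the error one step further back in time
        e_h = (W.T @ e_h) * s_prev * (1.0 - s_prev) # e_h(t - z - 1)
    return W + alpha * dW

# Tiny demo with random values (illustrative sizes only).
H, Vsz, alpha, T = 8, 20, 0.1, 3
rng = np.random.default_rng(0)
W = rng.normal(0, 0.1, (H, H))
V = rng.normal(0, 0.1, (Vsz, H))
states = [1.0 / (1.0 + np.exp(-rng.normal(size=H))) for _ in range(T + 2)]
e_o = rng.normal(size=Vsz)
W = bptt_update_W(W, V, alpha, e_o, states, T)
```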

RNNLM-Classes

X. Liu, Y. Wang, X. Chen, et al. Efficient Lattice Rescoring Using Recurrent Neural Network Language Models.

1. By using simple classes, speedups of more than 100 times can be achieved on large data sets.

2. This loses a bit of model accuracy (usually 5-10% in perplexity).
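A sketch of the class-based output factorization behind this speedup, p(w | s(t)) = p(class(w) | s(t)) · p(w | class(w), s(t)): only the class layer and the words inside one class are evaluated, instead of the full vocabulary. The helper names (word2class, class_words, X, Y) are hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def class_factored_prob(s, w_id, word2class, class_words, X, Y):
    """p(w | s) = p(class(w) | s) * p(w | class(w), s).

    X: (num_classes, H) class output weights; Y: (V, H) word output weights.
    Cost is O(num_classes + |class(w)|) instead of O(V), hence the speedup.
    """
    c = word2class[w_id]
    members = class_words[c]                       # word ids belonging to this class
    p_class = softmax(X @ s)[c]                    # distribution over classes
    scores = Y[members] @ s                        # scores only for words in the class
    p_word_in_class = softmax(scores)[members.index(w_id)]
    return p_class * p_word_in_class

# Tiny demo: with all-zero weights both factors are uniform.
H = 4
s = np.ones(H)
word2class = {0: 0, 1: 0, 2: 1}
class_words = {0: [0, 1], 1: [2]}
X = np.zeros((2, H)); Y = np.zeros((3, H))
print(class_factored_prob(s, 1, word2class, class_words, X, Y))  # 0.5 * 0.5 = 0.25
```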

RNNLM-Joint Training With Maximum Entropy Model


Hold

RNNLM-LSTM

F. A. Gers. Long Short-Term Memory in Recurrent Neural Networks. PhD thesis, Switzerland, 2001. http://felixgers.de/papers/phd.pdf

RNNLM-LSTM


• Traditional LSTM


RNNLM-LSTM


• LSTM with Forget
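For reference, a minimal numpy sketch of one step of an LSTM cell with input, forget, and output gates (the standard formulation after Gers et al.); the weight names and the demo sizes are illustrative, not from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of an LSTM cell with a forget gate.

    p is a dict of input weights W_*, recurrent weights R_*, and biases b_* (illustrative names).
    """
    i = sigmoid(p["W_i"] @ x + p["R_i"] @ h_prev + p["b_i"])   # input gate
    f = sigmoid(p["W_f"] @ x + p["R_f"] @ h_prev + p["b_f"])   # forget gate
    o = sigmoid(p["W_o"] @ x + p["R_o"] @ h_prev + p["b_o"])   # output gate
    g = np.tanh(p["W_c"] @ x + p["R_c"] @ h_prev + p["b_c"])   # cell input
    c = f * c_prev + i * g                                      # cell state: forget old, add new
    h = o * np.tanh(c)                                          # hidden output
    return h, c

# Tiny demo with random weights (illustrative sizes only).
rng = np.random.default_rng(0)
n_in, n_h = 5, 3
p = {}
for g in ("i", "f", "o", "c"):
    p[f"W_{g}"] = rng.normal(0, 0.1, (n_h, n_in))
    p[f"R_{g}"] = rng.normal(0, 0.1, (n_h, n_h))
    p[f"b_{g}"] = np.zeros(n_h)
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_h), np.zeros(n_h), p)
```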

RNNLM-LSTM


• LSTM in Network

RNNLM-LSTM


• LSTM for LM

LSTM Neural Networks for Language Modeling. Martin Sundermeyer, Ralf Schlüter, and Hermann Ney.

RNNLM-tool


• CURRENNT: LSTM/RNN training, GPU & deep networks supported http://sourceforge.net/projects/currennt/

• RNNLM: RNN LM toolkit http://www.fit.vutbr.cz/~imikolov/rnnlm/

• RWTHLM: RNN/LSTM LM toolkit http://www-i6.informatik.rwth-aachen.de/web/Software/rwthlm.php

• RNN toolkit from Microsoft http://research.microsoft.com/en-us/projects/rnn/

LSTM Neural Networks for Language Modeling. Martin Sundermeyer, Ralf Schlüter, and Hermann Ney.

• Thanks

• Q & A
