Language Model Methods and Metrics Gary Luu Ryan Fortune.

Posted 21-Dec-2015

Language Model Methods and Metrics

Gary Luu, Ryan Fortune

Skip N-grams

• Interpolated with a bigram model
• Captures the influence of words further away without increasing dimensionality
• Learning curve
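Here is a minimal sketch of a skip-bigram interpolated with a bigram. It assumes a skip distance of one word (predicting w_i from w_{i-2}), unsmoothed maximum-likelihood counts, and a hypothetical fixed weight `lam`. The skip table is keyed on word pairs, just like the bigram table. That is how the farther word gains influence without the V^3 table a trigram would need.

```python
from collections import Counter

def train_counts(sentences):
    """Count bigrams (w_{i-1}, w_i) and skip-bigrams (w_{i-2}, w_i)."""
    bigram, skip = Counter(), Counter()
    ctx_bi, ctx_skip = Counter(), Counter()
    for sent in sentences:
        toks = ["<s>", "<s>"] + list(sent) + ["</s>"]
        for i in range(2, len(toks)):
            bigram[(toks[i - 1], toks[i])] += 1
            ctx_bi[toks[i - 1]] += 1
            # Skip-bigram: ignore the word in between; still a pairwise table.
            skip[(toks[i - 2], toks[i])] += 1
            ctx_skip[toks[i - 2]] += 1
    return bigram, skip, ctx_bi, ctx_skip

def interp_prob(w, prev2, prev1, counts, lam=0.7):
    """lam * P_bigram(w | prev1) + (1 - lam) * P_skip(w | prev2), unsmoothed MLE."""
    bigram, skip, ctx_bi, ctx_skip = counts
    p_bi = bigram[(prev1, w)] / ctx_bi[prev1] if ctx_bi[prev1] else 0.0
    p_skip = skip[(prev2, w)] / ctx_skip[prev2] if ctx_skip[prev2] else 0.0
    return lam * p_bi + (1 - lam) * p_skip

counts = train_counts([["the", "cat", "sat"], ["the", "dog", "sat"]])
print(interp_prob("dog", "<s>", "the", counts))
```

In practice `lam` would be tuned on held-out data. A learning curve, as on the next slide, is then obtained by retraining on increasing amounts of text.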

Skip N-gram Learning Curve

Content Word Language Model

• Helps predict the next word from the most recent uncommon (content) word, to try to capture context

• Found a list of the 250 most common words
• Tried different sizes for the common-word list
• Interpolated with other language models, since this model alone wouldn’t maintain grammar
• P(w|C), where C is the most recent content word
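Here is a minimal sketch of estimating P(w|C) and interpolating it with a base model. It rests on several assumptions. C is the most recent word outside the common-word list. That list is taken as the top-n most frequent corpus words; 250 is the default here, though the source list may have come from elsewhere. The interpolation weight `mu` is hypothetical. The token `<s>` stands in for C until a content word has appeared.

```python
from collections import Counter

def common_words(sentences, n=250):
    """The n most frequent tokens; n is the list size that was varied."""
    freq = Counter(w for sent in sentences for w in sent)
    return {w for w, _ in freq.most_common(n)}

def train_content_model(sentences, common):
    """Estimate P(w | C), where C is the last non-common word seen so far."""
    pair, ctx = Counter(), Counter()
    for sent in sentences:
        last_content = "<s>"  # placeholder until a content word appears
        for w in sent:
            pair[(last_content, w)] += 1
            ctx[last_content] += 1
            if w not in common:
                last_content = w

    def prob(w, c):
        return pair[(c, w)] / ctx[c] if ctx[c] else 0.0

    return prob

def last_content_word(history, common):
    """Scan backwards for the most recent non-common word."""
    for w in reversed(history):
        if w not in common:
            return w
    return "<s>"

def interpolated(p_base, p_content, mu=0.8):
    # The content-word model ignores local word order, so a grammatical base
    # model (e.g. a bigram or trigram) keeps most of the weight.
    return mu * p_base + (1 - mu) * p_content
```

Skipping common words lets C reach back past function words such as "on the" to a word that carries topic information.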

Content Word Model

Bag Generation Metrics

• Bag generation (ordering an unordered bag of words into a sentence) is NP-hard
• Random-restart greedy hill-climbing
• Stability metric
  • Give the model the correct sentence: does it keep that sentence as an optimum?
  • Reported as the percentage of sentences that remain stable
• Reconstruction metric
  • Needs to be compared against a random (lucky-guess) baseline
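Here is a minimal sketch of random-restart greedy hill-climbing for bag generation, together with the stability and reconstruction checks. It relies on assumptions not stated on the slides:

- The move set is swapping two words.
- The scorer is a toy add-one-smoothed bigram log-probability; any sentence scorer could be substituted.
- The random baseline is the chance that a single uniform shuffle of the bag reproduces the reference sentence.

```python
import itertools
import math
import random
from collections import Counter

def bigram_scorer(corpus):
    """Toy add-one-smoothed bigram log-probability, used only to have a scorer."""
    big, uni, vocab = Counter(), Counter(), {"</s>"}
    for sent in corpus:
        toks = ["<s>"] + list(sent) + ["</s>"]
        vocab.update(sent)
        for a, b in zip(toks, toks[1:]):
            big[(a, b)] += 1
            uni[a] += 1
    V = len(vocab)

    def score(seq):
        toks = ["<s>"] + list(seq) + ["</s>"]
        return sum(math.log((big[(a, b)] + 1) / (uni[a] + V)) for a, b in zip(toks, toks[1:]))

    return score

def neighbors(seq):
    """Orderings reachable by swapping two different words."""
    for i, j in itertools.combinations(range(len(seq)), 2):
        if seq[i] != seq[j]:
            nxt = list(seq)
            nxt[i], nxt[j] = nxt[j], nxt[i]
            yield nxt

def hill_climb(seq, score):
    """Greedy: move to the best neighbor until no neighbor scores higher."""
    cur, cur_s = list(seq), score(seq)
    while True:
        best = max(neighbors(cur), key=score, default=None)
        if best is None or score(best) <= cur_s:
            return cur
        cur, cur_s = best, score(best)

def generate(bag, score, restarts=10, rng=random):
    """Random restarts: shuffle the bag, climb, keep the best local optimum."""
    best = None
    for _ in range(restarts):
        start = list(bag)
        rng.shuffle(start)
        cand = hill_climb(start, score)
        if best is None or score(cand) > score(best):
            best = cand
    return best

def is_stable(sentence, score):
    """Stability: the correct sentence has no strictly better neighbor."""
    s = score(sentence)
    return all(score(n) <= s for n in neighbors(sentence))

def stability_rate(sentences, score):
    return sum(is_stable(s, score) for s in sentences) / len(sentences)

def reconstruction_rate(sentences, score, restarts=10, rng=random):
    """Fraction of bags for which search recovers the exact sentence."""
    hits = sum(generate(s, score, restarts, rng) == list(s) for s in sentences)
    return hits / len(sentences)

def random_baseline(sentence):
    """Chance that one uniform shuffle of the bag reproduces the sentence."""
    orderings = math.factorial(len(sentence))
    for c in Counter(sentence).values():
        orderings //= math.factorial(c)
    return 1 / orderings
```

The two metrics measure different failures. A sentence that is not stable cannot be recovered by this search at all, because the model itself prefers a neighbor. A stable sentence can still be missed if no restart lands in its basin. That second failure is why reconstruction rates are only meaningful next to the chance baseline.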

Bag Generation Metrics

Clustering: IBMFullPredict

• Clustering overview
• Perplexity down to 107 with a million-sentence corpus

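• IBM full-predict model, where W_i is the cluster of w_i and W_{i-2}, W_{i-1} are the clusters of the two preceding words:

$$P_{\text{ibmfullpredict}}(w_i \mid w_{i-2} w_{i-1}) = \left[\lambda P(W_i \mid w_{i-2} w_{i-1}) + (1-\lambda) P(W_i \mid W_{i-2} W_{i-1})\right] \times \left[\mu P(w_i \mid w_{i-2} w_{i-1} W_i) + (1-\mu) P(w_i \mid W_{i-2} W_{i-1} W_i)\right]$$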
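Here is a minimal sketch of evaluating the IBM full-predict formula and converting per-word probabilities into perplexity. The four conditional estimators and the `cluster` mapping are hypothetical stand-ins, since the slides do not say how the clusters or the smoothed tables were built. `lam` and `mu` are placeholder weights that would normally be tuned on held-out data. For scale, a perplexity of 107 means the per-word probability has a geometric mean of about 1/107.

```python
import math
from typing import Callable, Dict, List, NamedTuple

class ClusterTables(NamedTuple):
    """Hypothetical smoothed estimators for the four terms of the formula."""
    cluster_given_words: Callable[[str, str, str], float]       # P(W_i | w_{i-2} w_{i-1})
    cluster_given_clusters: Callable[[str, str, str], float]    # P(W_i | W_{i-2} W_{i-1})
    word_given_words: Callable[[str, str, str, str], float]     # P(w_i | w_{i-2} w_{i-1} W_i)
    word_given_clusters: Callable[[str, str, str, str], float]  # P(w_i | W_{i-2} W_{i-1} W_i)

def ibm_full_predict(w, w2, w1, cluster: Dict[str, str], t: ClusterTables,
                     lam=0.5, mu=0.5):
    """P(w_i | w_{i-2} w_{i-1}) with w2 = w_{i-2} and w1 = w_{i-1}."""
    W, W2, W1 = cluster[w], cluster[w2], cluster[w1]
    # First factor: predict the cluster of the next word.
    p_cluster = lam * t.cluster_given_words(W, w2, w1) + (1 - lam) * t.cluster_given_clusters(W, W2, W1)
    # Second factor: predict the word itself, given its cluster.
    p_word = mu * t.word_given_words(w, w2, w1, W) + (1 - mu) * t.word_given_clusters(w, W2, W1, W)
    return p_cluster * p_word

def perplexity(word_probs: List[float]) -> float:
    """exp of the average negative log-probability per word."""
    return math.exp(-sum(math.log(p) for p in word_probs) / len(word_probs))
```

As a sanity check on the factorization, set lam = mu = 1 and let each word belong to exactly one cluster. Then the product P(W_i | w_{i-2} w_{i-1}) P(w_i | w_{i-2} w_{i-1} W_i) equals the plain trigram P(w_i | w_{i-2} w_{i-1}). The cluster-conditioned terms only matter through interpolation.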

Learning Curve for IBMFullPredict