Human Language Technology

January 2012 Spelling Models

Spelling Models

References

• Mays, E., Damerau, F. J. and Mercer, R. L. (1991). Context based spelling correction. Information Processing and Management 27(5): 517-522.

• Church, K. and Gale, W. (1991). Probability scoring for spelling correction. Statistics and Computing 1: 93-103.

• Brill, E. and Moore, R. (2000). An improved error model for noisy channel spelling correction. Proceedings of the ACL Conference.

Outline

• In this lecture we describe three different models of how spelling errors are produced.

• Single character
  – Equal probability
  – Differentiated probability

• Multiple character

Confusion Set

The confusion set of a word w includes w along with all words O in the dictionary D such that O can be derived from w by a single application of one of the four edit operations:
– Add a single letter.
– Delete a single letter.
– Replace one letter with another.
– Transpose two adjacent letters.
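The definition above can be turned into a short generate-and-filter routine: produce every string one edit away from w, then keep those that are in the dictionary. This is a minimal sketch assuming a lowercase English alphabet and a dictionary held as a Python set; the names `single_edits` and `confusion_set` are illustrative, not from the source.

```python
from typing import Set

ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # assumption: lowercase English letters


def single_edits(w: str) -> Set[str]:
    """Every string one add, delete, replace, or adjacent transposition away from w."""
    splits = [(w[:i], w[i:]) for i in range(len(w) + 1)]
    adds = {left + c + right for left, right in splits for c in ALPHABET}
    deletes = {left + right[1:] for left, right in splits if right}
    replaces = {left + c + right[1:]
                for left, right in splits if right
                for c in ALPHABET if c != right[0]}
    transposes = {left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1}
    return adds | deletes | replaces | transposes


def confusion_set(w: str, dictionary: Set[str]) -> Set[str]:
    """w itself plus every dictionary word reachable from w by one edit."""
    return {w} | (single_edits(w) & dictionary)


D = {"the", "then", "than", "they", "he", "tie", "cat"}
print(sorted(confusion_set("the", D)))  # ['he', 'the', 'then', 'they', 'tie']
```

Note that "than" is excluded: reaching it from "the" takes two edits (replace e with a, add n).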

Error Model 1: Mays, Damerau et al. 1991

• Let C be the number of words in the confusion set of w.

• The error model, for every O in the confusion set of w, is:

$$P(O \mid w) = \begin{cases} \alpha & \text{if } O = w \\ \dfrac{1-\alpha}{C-1} & \text{otherwise} \end{cases}$$

• α is the prior probability of a given typed word being correct.

• Key Idea: the remaining probability mass is distributed evenly among all other words in the confusion set.
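The piecewise definition can be sketched directly. This assumes the confusion set of w (with w included, so C = its size) has already been computed; `model1_prob` is an illustrative name, and returning 0 for strings outside the confusion set is an assumption the slides do not state explicitly.

```python
from typing import Set


def model1_prob(observed: str, intended: str, confusion: Set[str], alpha: float) -> float:
    """P(O|w) under Error Model 1; `confusion` is the confusion set of w (C members, w included)."""
    if observed not in confusion:
        return 0.0  # the model only spreads mass over the confusion set
    if observed == intended:
        return alpha
    return (1.0 - alpha) / (len(confusion) - 1)


conf = {"the", "then", "they", "he", "tie"}  # C = 5
print(model1_prob("the", "the", conf, 0.99))   # alpha
print(model1_prob("then", "the", conf, 0.99))  # (1 - 0.99) / 4
```

Summed over the whole confusion set, the probabilities add up to α + (C − 1)(1 − α)/(C − 1) = 1.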

Error Model 2: Church & Gale 1991

• Church & Gale (1991) propose a more sophisticated error model based on the same confusion set (one edit operation away from w).

• Two improvements:
  1. Unequal weightings attached to different editing operations.
  2. Insertion and deletion probabilities are conditioned on context: the probability of inserting or deleting a character is conditioned on the letter appearing immediately to the left of that character.
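A minimal sketch of how both improvements might look in code: first identify which single edit separates w from O (and, for insertions and deletions, the letter to its left), then look the edit up in a table of counts. The count tables, the "#" word-start marker, and the choice of normalising counts (left letter for insertions, intended letter for substitutions, letter bigram for deletions and transpositions) are assumptions made in the style of confusion-matrix models; the slides do not give these details.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, str]  # (edit type, first letter, second letter)


def classify_edit(intended: str, observed: str) -> Optional[Key]:
    """Find the single edit turning `intended` into `observed`.
    Insertions and deletions are keyed on the letter to their left ('#' marks word start)."""
    w, o = "#" + intended, "#" + observed
    n = min(len(w), len(o))
    i = next((k for k in range(n) if w[k] != o[k]), n)  # first mismatch (always >= 1)
    if len(o) == len(w) + 1 and w[i:] == o[i + 1:]:
        return ("ins", w[i - 1], o[i])        # o[i] inserted after w[i-1]
    if len(w) == len(o) + 1 and w[i + 1:] == o[i:]:
        return ("del", w[i - 1], w[i])        # w[i] dropped after w[i-1]
    if len(w) == len(o) and i < len(w):
        if w[i + 1:] == o[i + 1:]:
            return ("sub", w[i], o[i])        # w[i] typed as o[i]
        if (w[i + 1:i + 2] == o[i:i + 1] and w[i:i + 1] == o[i + 1:i + 2]
                and w[i + 2:] == o[i + 2:]):
            return ("trans", w[i], w[i + 1])  # w[i] and w[i+1] swapped
    return None  # identical, or more than one edit apart


def church_gale_prob(intended: str, observed: str,
                     edit_counts: Dict[Key, float], char_counts: Dict[str, float]) -> float:
    """P(observed | intended) from hypothetical edit and character counts."""
    key = classify_edit(intended, observed)
    if key is None:
        return 0.0
    kind, x, y = key
    # ins: normalise by count of left letter x; sub: by count of intended letter x;
    # del and trans: by count of the bigram xy.
    denom = char_counts.get(x if kind in ("ins", "sub") else x + y, 0.0)
    return edit_counts.get(key, 0.0) / denom if denom else 0.0


print(classify_edit("the", "he"))  # ('del', '#', 't')
```

Because each edit type has its own table entry, substitutions, insertions, deletions and transpositions no longer share one uniform probability as in Model 1.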

Obtaining Error Probabilities

• The error probabilities are derived by first assuming all edits are equiprobable.

• As a training corpus they use the space-delimited strings found in a large collection of text that (a) do not appear in their dictionary and (b) are no more than one edit away from a word that does appear in the dictionary.

• They iteratively run the spell checker over the training corpus to find corrections, then use these corrections to update the edit probabilities.
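The iterative procedure can be read as an EM-style loop: start with equiprobable edits, let each training typo distribute one unit of credit over the edits implied by its candidate corrections (in proportion to the current edit probabilities), then renormalise. This is a sketch under that reading only; the slides do not say whether credit is fractional or winner-take-all, and word priors are left out for brevity. Edit labels such as "x" and "y" are hypothetical placeholders.

```python
from collections import defaultdict
from typing import Dict, List


def reestimate(candidates: List[List[str]], iterations: int = 10) -> Dict[str, float]:
    """candidates[k]: the edit implied by each candidate correction of training typo k."""
    edits = sorted({e for cands in candidates for e in cands})
    prob = {e: 1.0 / len(edits) for e in edits}  # start: all edits equiprobable
    for _ in range(iterations):
        counts: Dict[str, float] = defaultdict(float)
        for cands in candidates:
            total = sum(prob[e] for e in cands)
            for e in cands:
                counts[e] += prob[e] / total  # split this typo's credit across its corrections
        norm = sum(counts.values())
        prob = {e: counts[e] / norm for e in edits}
    return prob


# Two typos are explained only by edit "x"; a third could be "x" or "y".
print(reestimate([["x"], ["x"], ["x", "y"]]))
```

Unambiguous typos pull probability towards the edits they imply, which in turn resolves the ambiguous ones in the next pass.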

Error Model 3: Brill and Moore (2000)

• Let Σ be an alphabet.

• The model allows all operations of the form α → β, where α, β ∈ Σ*.

• P(α → β) is the probability that a user who intends to type the string α types β instead.

• N.B. The model considers substitutions of arbitrary substrings, not just single characters.

Model 3: Brill and Moore (2000)

• The model also tries to account for the fact that, in general, positional information is a powerful conditioning feature, e.g. p(entler|antler) < p(reluctent|reluctant).

• i.e. the probability of an edit is partially conditioned on the position in the string at which the edit occurs.

• artifact/artefact; correspondance/correspondence
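One way to make an edit probability position-dependent is to key the substitution table on a coarse position class as well as on α and β. The sketch below assumes three hypothetical classes (start, middle, end of word) and uses invented toy probabilities chosen only to reproduce the inequality in the antler/reluctant example; they are not estimates from any data.

```python
from typing import Dict, Tuple

START, MIDDLE, END = "start", "middle", "end"  # hypothetical position classes


def edit_position(word: str, begin: int, end: int) -> str:
    """Classify where the edited substring word[begin:end] sits within the word."""
    if begin == 0:
        return START
    if end == len(word):
        return END
    return MIDDLE


# Toy table keyed by (alpha, beta, position); values are invented for illustration.
SUB_PROB: Dict[Tuple[str, str, str], float] = {
    ("a", "e", START): 0.001,
    ("a", "e", MIDDLE): 0.02,
    ("a", "e", END): 0.005,
}


def p_sub(alpha: str, beta: str, word: str, begin: int) -> float:
    """P(alpha -> beta | position) for alpha occurring at word[begin:]."""
    pos = edit_position(word, begin, begin + len(alpha))
    return SUB_PROB.get((alpha, beta, pos), 0.0)


print(p_sub("a", "e", "antler", 0) < p_sub("a", "e", "reluctant", 6))  # True with these toy values
```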

Three Stage Model

• Person picks a word: physical

• Person picks a partition of the characters within the word: ph y s i c al

• Person types each partition, perhaps erroneously: f i s i k le

• For this partition, p(fisikle|physical) = p(f|ph) * p(i|y) * p(s|s) * p(i|i) * p(k|c) * p(le|al)
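For a fixed partition, the probability of the typed string is just the product of per-segment substitution probabilities. A minimal sketch, assuming a lookup table keyed by (intended segment, typed segment); the numbers below are invented toy values, not estimates.

```python
from math import prod
from typing import Dict, List, Tuple


def partition_prob(segments: List[Tuple[str, str]],
                   table: Dict[Tuple[str, str], float]) -> float:
    """Product of P(typed | intended) over aligned (intended, typed) segments."""
    return prod(table.get((r, t), 0.0) for r, t in segments)


# Toy values, invented for illustration only.
table = {("ph", "f"): 0.1, ("y", "i"): 0.05, ("s", "s"): 0.95,
         ("i", "i"): 0.95, ("c", "k"): 0.02, ("al", "le"): 0.01}
segments = [("ph", "f"), ("y", "i"), ("s", "s"), ("i", "i"), ("c", "k"), ("al", "le")]
print(partition_prob(segments, table))  # 0.1 * 0.05 * 0.95 * 0.95 * 0.02 * 0.01
```

A segment pair missing from the table contributes probability 0, so the whole partition is ruled out.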

Formal Presentation

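• Let Part(w) be the set of all possible ways to partition string w into contiguous substrings.

• For a particular R in Part(w) containing j contiguous segments, let R_i be the i-th segment, and let T range over the partitions of s into the same number of segments, with T_i its i-th segment. Then

$$P(s \mid w) = \sum_{R \in \mathrm{Part}(w)} P(R \mid w) \sum_{\substack{T \in \mathrm{Part}(s) \\ |T| = |R|}} \prod_{i=1}^{|R|} P(T_i \mid R_i)$$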
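For short strings the double sum can be evaluated by brute force, enumerating every partition of w and every partition of s with the same number of segments. This sketch assumes non-empty segments, a toy table of P(T_i|R_i) keyed by (R_i, T_i), and, purely as a placeholder, a uniform P(R|w) over the 2^(|w|−1) partitions of w; the slides do not specify P(R|w). It is exponential in the string length and meant only to make the formula concrete.

```python
from itertools import combinations
from math import prod
from typing import Dict, Iterator, List, Tuple


def partitions(s: str, j: int) -> Iterator[List[str]]:
    """All ways to cut s into j non-empty contiguous segments."""
    for cuts in combinations(range(1, len(s)), j - 1):
        bounds = (0,) + cuts + (len(s),)
        yield [s[a:b] for a, b in zip(bounds, bounds[1:])]


def p_typed(s: str, w: str, table: Dict[Tuple[str, str], float]) -> float:
    """P(s|w) = sum_R P(R|w) sum_{T, |T|=|R|} prod_i P(T_i|R_i)."""
    n = len(w)
    p_r = 1.0 / 2 ** (n - 1)  # ASSUMPTION: P(R|w) uniform over all partitions of w
    total = 0.0
    for j in range(1, n + 1):
        for r in partitions(w, j):
            for t in partitions(s, j):  # only partitions of s with |T| = |R|
                total += p_r * prod(table.get((ri, ti), 0.0) for ri, ti in zip(r, t))
    return total


toy = {("a", "a"): 0.9, ("b", "b"): 0.9, ("ab", "ab"): 0.5}  # invented values
print(p_typed("ab", "ab", toy))  # 0.5 * 0.5 + 0.5 * 0.81
```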

Simplification

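• By considering only the best partitioning of s and w, this simplifies to

$$P(s \mid w) \approx \max_{\substack{R \in \mathrm{Part}(w),\ T \in \mathrm{Part}(s) \\ |T| = |R|}} P(R \mid w) \prod_{i=1}^{|R|} P(T_i \mid R_i)$$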
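Taking the maximum instead of the sum allows a dynamic program over prefix positions rather than full enumeration: the best score for aligning w[i:] with s[k:] is the best first segment pair times the best score for the remainder. This sketch assumes non-empty segments, segment lengths bounded by the longest entries in a toy table keyed by (R_i, T_i), and treats P(R|w) as constant, omitting it; those are simplifying assumptions, not details from the slides.

```python
from functools import lru_cache
from typing import Dict, Tuple


def best_partition_prob(s: str, w: str, table: Dict[Tuple[str, str], float]) -> float:
    """max over R, T with |T| = |R| of prod_i P(T_i | R_i) (P(R|w) omitted)."""
    max_r = max((len(r) for r, _ in table), default=0)
    max_t = max((len(t) for _, t in table), default=0)

    @lru_cache(maxsize=None)
    def best(i: int, k: int) -> float:
        # Best product for aligning the suffix w[i:] with the suffix s[k:].
        if i == len(w) and k == len(s):
            return 1.0
        result = 0.0
        for a in range(1, max_r + 1):
            for b in range(1, max_t + 1):
                if i + a > len(w) or k + b > len(s):
                    continue
                p = table.get((w[i:i + a], s[k:k + b]), 0.0)
                if p > 0.0:
                    result = max(result, p * best(i + a, k + b))
        return result

    return best(0, 0)


toy = {("a", "a"): 0.9, ("b", "b"): 0.9, ("ab", "ab"): 0.5}  # invented values
print(best_partition_prob("ab", "ab", toy))  # max(0.9 * 0.9, 0.5)
```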

Training the Model

• To train the model, we need a set of (s, w) word pairs.

• Begin by aligning the letters in each pair (s_i, w_i) using minimum edit distance (MED).

• For instance, the training pair (akgsual, actual) could be aligned as:

      w:  a  c  ε  t  u  a  l
      s:  a  k  g  s  u  a  l
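The MED alignment step can be sketched with the standard edit-distance table plus a traceback, using unit costs for substitution, insertion and deletion (an assumption; the slides do not give costs). When several alignments are equally cheap, the tie-breaking order in the traceback decides which one is returned, so the result may differ from the alignment shown above while having the same cost. `med_align` is an illustrative name.

```python
from typing import List, Tuple


def med_align(w: str, s: str) -> List[Tuple[str, str]]:
    """Align intended word w with typed string s by minimum edit distance (unit costs).
    Returns (w_char, s_char) pairs; '' marks a gap, i.e. the ε side of an edit."""
    n, m = len(w), len(s)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for k in range(m + 1):
        d[0][k] = k
    for i in range(1, n + 1):
        for k in range(1, m + 1):
            cost = 0 if w[i - 1] == s[k - 1] else 1
            d[i][k] = min(d[i - 1][k - 1] + cost, d[i - 1][k] + 1, d[i][k - 1] + 1)
    # Trace back one optimal path; ties prefer match/substitution, then deletion, then insertion.
    pairs: List[Tuple[str, str]] = []
    i, k = n, m
    while i > 0 or k > 0:
        if i > 0 and k > 0 and d[i][k] == d[i - 1][k - 1] + (w[i - 1] != s[k - 1]):
            pairs.append((w[i - 1], s[k - 1]))
            i, k = i - 1, k - 1
        elif i > 0 and d[i][k] == d[i - 1][k] + 1:
            pairs.append((w[i - 1], ""))
            i -= 1
        else:
            pairs.append(("", s[k - 1]))
            k -= 1
    return pairs[::-1]


print(med_align("actual", "akgsual"))
```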

Training the Model

• This corresponds to the sequence of editing operations

  a → a, c → k, ε → g, t → s, u → u, a → a, l → l

• To allow for richer contextual information, each nonmatch substitution is expanded to incorporate up to N additional adjacent edits.

• For example, for the first nonmatch edit c → k in the example above, with N = 2, we would generate the following substitutions:

Training the Model

      w:  a  c  ε  t  u  a  l
      s:  a  k  g  s  u  a  l

  c → k, ac → ak, c → kg, ac → akg, ct → kgs

• We would do similarly for the other nonmatch edits, and give each of these substitutions a fractional count.
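The expansion amounts to sliding windows over the aligned edit sequence: for each nonmatch edit, take every window that contains it and extends at most N edits in total to the left and right, then concatenate each side. The sketch reproduces the five substitutions listed above for c → k with N = 2. How the fractional counts are assigned is not specified here; splitting one unit of count evenly among a nonmatch edit's expansions is an assumption for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (intended segment, typed segment); "" stands for ε


def expand_nonmatch(pairs: List[Pair], i: int, n: int) -> Set[Pair]:
    """Substitutions built from the non-match edit pairs[i] plus up to n adjacent edits."""
    out: Set[Pair] = set()
    for left in range(n + 1):
        for right in range(n + 1 - left):
            lo, hi = i - left, i + right + 1
            if lo < 0 or hi > len(pairs):
                continue  # window would run off the word
            window = pairs[lo:hi]
            out.add(("".join(a for a, _ in window), "".join(b for _, b in window)))
    return out


def fractional_counts(pairs: List[Pair], n: int) -> Dict[Pair, float]:
    """Each non-match edit contributes one unit of count, split evenly (assumption)."""
    counts: Dict[Pair, float] = defaultdict(float)
    for i, (a, b) in enumerate(pairs):
        if a != b:
            subs = expand_nonmatch(pairs, i, n)
            for sub in subs:
                counts[sub] += 1.0 / len(subs)
    return counts


alignment = [("a", "a"), ("c", "k"), ("", "g"), ("t", "s"), ("u", "u"), ("a", "a"), ("l", "l")]
print(sorted(expand_nonmatch(alignment, 1, 2)))
# [('ac', 'ak'), ('ac', 'akg'), ('c', 'k'), ('c', 'kg'), ('ct', 'kgs')]
```

The same substring pair can arise from several nonmatch edits (ct → kgs is generated from c → k, ε → g and t → s), so its counts accumulate.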

Training the Model

• We can then calculate the probability of each substitution α → β as count(α → β)/count(α).

• count(α → β) is simply the sum of the counts derived from our training data as explained above.

• Estimating count(α) is harder, since we are training not from a text corpus but from a set of (s, w) tuples (without an associated corpus).

Training the Model

• From a large collection of representative text, count the number of occurrences of α.

• Adjust the count based on an estimate of the rate at which people make typing errors.
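The slides leave the adjustment open. One plausible reading, sketched here as an assumption: if the training pairs are the mistyped fraction of everything the users typed, then n_pairs / error_rate estimates the total number of words typed, and the per-word frequency of α in a representative corpus scales up to count(α). All function names and the example numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def count_occurrences(word: str, alpha: str) -> int:
    """Overlapping occurrences of alpha inside one word."""
    return sum(1 for i in range(len(word) - len(alpha) + 1) if word.startswith(alpha, i))


def estimate_count_alpha(alpha: str, corpus_words: List[str],
                         n_pairs: int, error_rate: float) -> float:
    """count(alpha) scaled to the population that produced the training pairs (assumption)."""
    per_word = sum(count_occurrences(w, alpha) for w in corpus_words) / len(corpus_words)
    words_typed = n_pairs / error_rate  # if a fraction error_rate of typed words are errors
    return per_word * words_typed


def substitution_prob(alpha: str, beta: str, sub_counts: Dict[Tuple[str, str], float],
                      corpus_words: List[str], n_pairs: int, error_rate: float) -> float:
    """P(alpha -> beta) = count(alpha -> beta) / count(alpha)."""
    count_alpha = estimate_count_alpha(alpha, corpus_words, n_pairs, error_rate)
    return sub_counts.get((alpha, beta), 0.0) / count_alpha if count_alpha else 0.0


words = ["actual", "act", "cat"]  # stand-in for a large representative corpus
print(substitution_prob("c", "k", {("c", "k"): 4.0}, words, n_pairs=10, error_rate=0.05))
```

Here "c" occurs once per corpus word and 10 / 0.05 = 200 words are assumed typed, so count(c) ≈ 200 and P(c → k) ≈ 4/200.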
