Natural Language Processing · robotics.cs.tamu.edu/dshell/cs420/nlp.pdf · 2019-12-01
Natural Language Processing
Introduction
Regular Expression
  Notations
  Examples
Text Classification
  Naive Bayes
  Example
Language Modelling
  Unigram, Bigram and N-gram
Natural Language Processing
Nov 19, 2019
Who wrote the Federalist Papers?
1787-8: anonymous essays try to convince New York to ratify the U.S. Constitution. Authors: Jay, Madison, Hamilton.
Authorship of 12 of the letters is in dispute.
1963: solved by Mosteller and Wallace using Bayesian methods.
By the end of this lecture we will see how to do that.
What makes it hard?
Formal languages are unambiguous.
Natural languages are ambiguous:
“He saw her duck.”
“Time flies like an arrow. Fruit flies like a banana.”
By the end of the class
By the end of the class we will see how to do:
1 Text Classification. E.g. Spam detection, Authorship identification.
2 Spell Correction. E.g. Auto-correct.
3 Word suggestion.
Regular Expressions
A formal language for specifying text strings.
Notations
• Disjunctions []:
  Pattern          Matches
  [Ww]oodchuck     woodchuck, Woodchuck
  [0123456789]     Any single digit
• Disjunctions |:
  Pattern          Matches
  abc|def          Find ‘abc’ or ‘def’.
  a|b|ab           Find ‘a’ or ‘b’ or ‘ab’. Example: ‘abc’
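The two kinds of disjunction can be tried out directly. The sketch below assumes Python's standard `re` module (the slides do not name a regex engine); the sample strings are made up for illustration.

```python
import re

# Character-class disjunction: either case of the first letter.
woodchucks = re.findall(r"[Ww]oodchuck", "Woodchuck or woodchuck?")

# A class listing every digit matches one digit at a time.
digits = re.findall(r"[0123456789]", "room 420")

# Pipe disjunction between whole alternatives.
words = re.findall(r"abc|def", "abcdef")

# Python tries alternatives left to right, so on 'abc' the 'a' branch wins
# at position 0 and the 'ab' branch is never used.
pieces = re.findall(r"a|b|ab", "abc")

print(woodchucks, digits, words, pieces)
```

In this engine the order of alternatives matters: writing the pattern as `ab|a|b` would return `ab` for the same input.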
Notations
• Ranges:
  Pattern   Matches
  [A-Z]     An uppercase letter.
  [a-z]     A lowercase letter.
  [0-9]     A single digit.
• Negation ^. (Note: the caret means negation only when it is first in [])
  Pattern   Matches
  [^A-Z]    Not an uppercase letter
  [^Ss]     Neither ‘S’ nor ‘s’
  [^e^]     Neither ‘e’ nor ‘^’
  a^b       Search for the pattern ‘a^b’
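A short sketch of ranges and negated classes, again assuming Python's `re` module and invented sample strings. It covers only the bracketed patterns from the tables above.

```python
import re

text = "CSCE 420 is Sam's class"

uppercase = re.findall(r"[A-Z]", text)  # every uppercase letter
digits = re.findall(r"[0-9]", text)     # every digit

# A leading caret negates the class: anything except 'S' or 's'.
not_s = re.findall(r"[^Ss]", "Sass")

# A caret that is not first in the class is an ordinary character,
# so this class excludes both 'e' and '^'.
not_e_or_caret = re.findall(r"[^e^]", "e^x")

print(uppercase, digits, not_s, not_e_or_caret)
```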
Notations (? * . + ^ $)
  ?   0 or 1 of previous character
  *   0 or more of previous character
  +   1 or more of previous character
  .   Any character
  ^   Start anchor
  $   End anchor
  \   Escape character
Examples
• Pattern: ^[A-Z]
  Which of them are matches? “Class”, “cSCE”, “420”.
  Answer: “Class”.
• Pattern: ^[^A-Z]
  Which of them are matches? “Class”, “cSCE”, “420”.
  Answer: “cSCE”, “420”.
• Pattern: .$
  Which of them are matches? “end”, “end?”, “end!”, “end.”.
  Answer: “end”, “end?”, “end!”, “end.”.
• Pattern: \.$
  Which of them are matches? “end”, “end?”, “end!”, “end.”.
  Answer: “end.”.
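The four anchor exercises can be checked mechanically. This is a minimal sketch assuming Python's `re.search`; the helper name `matching` is hypothetical and not part of the lecture.

```python
import re

def matching(pattern, candidates):
    """Return the candidates in which the pattern is found somewhere."""
    return [s for s in candidates if re.search(pattern, s)]

words = ["Class", "cSCE", "420"]
endings = ["end", "end?", "end!", "end."]

starts_upper = matching(r"^[A-Z]", words)       # first character is uppercase
starts_not_upper = matching(r"^[^A-Z]", words)  # first character is anything else
any_last_char = matching(r".$", endings)        # unescaped dot: any final character
literal_period = matching(r"\.$", endings)      # escaped dot: a real period at the end

print(starts_upper, starts_not_upper, any_last_char, literal_period)
```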
Examples
• Pattern: colou?r
  Which of them are matches? “color”, “colour”, “colouur”.
  Answer: “color”, “colour”.
• Pattern: colou+r
  Which of them are matches? “color”, “colour”, “colouur”.
  Answer: “colour”, “colouur”.
• Pattern: colou*r
  Which of them are matches? “color”, “colour”, “colouur”.
  Answer: “color”, “colour”, “colouur”.
• Pattern: colou.r
  Which of them are matches? “color”, “colour”, “colouur”.
  Answer: “colouur”.
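The quantifier exercises can be reproduced the same way. The sketch assumes Python's `re.fullmatch`, so each pattern must account for the whole candidate string; `full_matches` is a hypothetical helper name.

```python
import re

candidates = ["color", "colour", "colouur"]

def full_matches(pattern):
    """Return the candidates that the pattern matches in their entirety."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

optional_u = full_matches(r"colou?r")     # zero or one 'u'
one_or_more_u = full_matches(r"colou+r")  # at least one 'u'
any_number_u = full_matches(r"colou*r")   # zero or more 'u'
one_extra_char = full_matches(r"colou.r") # 'u' plus exactly one more character

print(optional_u, one_or_more_u, any_number_u, one_extra_char)
```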
Example
We need to find instances of “the” in a text.
the                         ×  misses ‘The’
[Tt]he                      ×  also matches inside ‘Theology’
[^A-Za-z][Tt]he[^A-Za-z]
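The three attempts can be compared on one sentence. This sketch assumes Python's `re` module and a made-up sample text. The last line uses the word-boundary token `\b`, which is not on the slide and is shown only as one possible alternative.

```python
import re

text = "The other one read the theology book."

# Attempt 1: misses the capitalized 'The' and fires inside longer words.
plain = re.findall(r"the", text)

# Attempt 2: finds 'The' as well, but still fires inside 'other' and 'theology'.
either_case = re.findall(r"[Tt]he", text)

# Attempt 3: require a non-letter on both sides of the word.
delimited = re.findall(r"[^A-Za-z][Tt]he[^A-Za-z]", text)

# Alternative (an assumption, not from the slides): zero-width word boundaries.
word_boundary = re.findall(r"\b[Tt]he\b", text)

print(len(plain), len(either_case), delimited, word_boundary)
```

Because the third pattern consumes a character on each side, it cannot match a "the" at the very start or end of the text, which is why the leading "The" is absent from `delimited` here.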
Text classification
Assigning subject categories, topics, or genres.
Spam detection.
Authorship identification.
Age/gender identification.
Language Identification.
· · ·
Text Classification
Inputs:
  A document $d$.
  A fixed set of classes $C = \{c_1, c_2, \dots, c_n\}$.
Output:
  A predicted class $c \in C$.
Naive Bayes
Relies on a simple representation of the document: bag of words.
For a document $d$ and a class $c$, Bayes' rule gives:
$$P(c \mid d) = \frac{P(d \mid c)\,P(c)}{P(d)}$$
Naive Bayes Classifier
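$$c_{MAP} = \arg\max_{c \in C} P(c \mid d)$$
MAP: maximum a posteriori (the most likely class).
Applying Bayes' rule:
$$c_{MAP} = \arg\max_{c \in C} \frac{P(d \mid c)\,P(c)}{P(d)}$$
Since $P(d)$ is the same for every class, it does not affect the argmax and can be dropped:
$$c_{MAP} = \arg\max_{c \in C} P(d \mid c)\,P(c)$$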
Let’s say that the document is represented by $n$ features $x_1, x_2, \dots, x_n$:
$$c_{MAP} = \arg\max_{c \in C} P(x_1, x_2, \dots, x_n \mid c)\,P(c)$$
Assumptions
Bag of words: the position of words does not matter.
Conditional independence: the feature probabilities $P(x_i \mid c)$ are independent given the class $c$.
$$P(x_1, x_2, \dots, x_n \mid c) = \prod_{i=1}^{n} P(x_i \mid c)$$
Bag of words representation
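A minimal sketch of what a bag of words looks like in code, assuming whitespace tokenization and lowercasing (both are simplifying assumptions, not something the slides specify). The function name `bag_of_words` is hypothetical.

```python
from collections import Counter

def bag_of_words(text):
    """Lowercase, split on whitespace, and count each token; word order is discarded."""
    return Counter(text.lower().split())

first = bag_of_words("the dog chased the cat")
second = bag_of_words("the cat chased the dog")

# Same words, different order: the two bags are identical.
print(first == second, first["the"])
```

The two sentences mean different things but map to the same bag, which is exactly the information the bag-of-words assumption gives up.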
Naive Bayes: Learning
What do we need?
A training set of $m$ hand-labeled documents $(d_1, c_1), \dots, (d_m, c_m)$.
Let $N_D$ be the number of documents, and $N_{c_j}$ be the number of documents present in class $c_j$.
Let $V_{c_j}$ be the set of all words in the documents of class $c_j$.
Now we find the maximum likelihood estimates:
$$\hat{P}(c_j) = \frac{N_{c_j}}{N_D}$$
$$\hat{P}(w_i \mid c_j) = \frac{\mathrm{count}(w_i, c_j)}{\sum_{w \in V_{c_j}} \mathrm{count}(w, c_j)}$$
Now we can classify a document $d$ by:
$$c_d = \arg\max_{c_j \in C} \hat{P}(c_j) \prod_{w_i \in d} \hat{P}(w_i \mid c_j)$$
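The estimates and the decision rule above can be sketched in a few lines. This is an illustrative implementation under stated assumptions: whitespace tokenization, plain floating-point products (no log space), and a made-up spam/ham training set. The names `train` and `classify` are hypothetical.

```python
from collections import Counter, defaultdict

def train(labeled_docs):
    """Estimate class priors and per-class word likelihoods by relative frequency."""
    doc_counts = Counter(label for _, label in labeled_docs)
    word_counts = defaultdict(Counter)
    for text, label in labeled_docs:
        word_counts[label].update(text.split())
    n_docs = len(labeled_docs)
    priors = {c: doc_counts[c] / n_docs for c in doc_counts}
    likelihoods = {}
    for c, counts in word_counts.items():
        total = sum(counts.values())
        likelihoods[c] = {w: k / total for w, k in counts.items()}
    return priors, likelihoods

def classify(text, priors, likelihoods):
    """Return the class maximizing prior times the product of word likelihoods."""
    def score(c):
        p = priors[c]
        for w in text.split():
            p *= likelihoods[c].get(w, 0.0)  # a word unseen in class c contributes 0
        return p
    return max(priors, key=score)

docs = [  # hypothetical toy data, not from the lecture
    ("win money now", "spam"),
    ("win prize money", "spam"),
    ("meeting at noon", "ham"),
]
priors, likelihoods = train(docs)
label = classify("win money", priors, likelihoods)
print(priors, label)
```

For longer documents the product of many small probabilities underflows, so practical implementations usually sum log-probabilities instead; that refinement is left out here to stay close to the formula.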
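What if we come across an unknown word in the document $d$?
Let $w_u$ be the unknown word. Then $\hat{P}(w_u \mid c_j) = 0$ for all $c_j$, so every class gets a score of zero and the argmax cannot distinguish between them.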
Laplace smoothing
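Let $V$ be the set of all words in the training documents, i.e., $V = \bigcup_{c_j} V_{c_j}$.
Add one word for the unknown word in the vocabulary.
$$\hat{P}(w_i \mid c_j) = \frac{\mathrm{count}(w_i, c_j) + 1}{\sum_{w \in V_{c_j}} \mathrm{count}(w, c_j) + |V| + 1}$$
So, for all unknown words, we have:
$$\hat{P}(w_u \mid c_j) = \frac{1}{\sum_{w \in V_{c_j}} \mathrm{count}(w, c_j) + |V| + 1}$$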
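The smoothed estimate can be written as a small function. The sketch below follows the formula above, including the extra `+ 1` in the denominator for the unknown-word slot; the counts and vocabulary size are invented for illustration, and `smoothed_likelihood` is a hypothetical name. Exact fractions are used so the arithmetic is easy to check by hand.

```python
from collections import Counter
from fractions import Fraction

def smoothed_likelihood(word, class_counts, vocab_size):
    """Add-one estimate with one extra vocabulary slot reserved for unknown words."""
    total = sum(class_counts.values())
    return Fraction(class_counts[word] + 1, total + vocab_size + 1)

# Hypothetical counts for one class; assume the overall vocabulary V has 4 word types.
counts = Counter({"win": 2, "money": 2, "now": 1})
vocab_size = 4

p_seen = smoothed_likelihood("win", counts, vocab_size)         # (2 + 1) / (5 + 4 + 1)
p_unknown = smoothed_likelihood("lottery", counts, vocab_size)  # (0 + 1) / (5 + 4 + 1)

print(p_seen, p_unknown)
```

An unknown word now gets a small nonzero probability instead of zeroing out the whole product.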
Example
Training set:
  #   Text                     Class
  1   Carla Betty Carla        a
  2   Carla Carla Suzanne      a
  3   Carla Matt               a
  4   Taylor Jessica Carla     b

$\hat{P}(a) = 3/4$
$\hat{P}(b) = 1/4$
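With Laplace smoothing, class a contains 8 word tokens (5 of them Carla) and the vocabulary has |V| = 6 words:
$\hat{P}(\text{Carla} \mid a) = (5 + 1)/(8 + 6 + 1) = 6/15$
$\hat{P}(\text{Taylor} \mid a) = (0 + 1)/(8 + 6 + 1) = 1/15$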
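Document d5: Carla Carla Carla Taylor Jessica
$\hat{P}(a) = 3/4$ and $\hat{P}(b) = 1/4$
$\hat{P}(\text{Carla} \mid a) = 6/15$, $\hat{P}(\text{Carla} \mid b) = 2/10$
$\hat{P}(\text{Taylor} \mid a) = 1/15$, $\hat{P}(\text{Taylor} \mid b) = 2/10$
$\hat{P}(\text{Jessica} \mid a) = 1/15$, $\hat{P}(\text{Jessica} \mid b) = 2/10$

$P(a \mid d_5) \propto 3/4 \times (6/15)^3 \times 1/15 \times 1/15 \approx 0.0002$
$P(b \mid d_5) \propto 1/4 \times (2/10)^3 \times 2/10 \times 2/10 \approx 0.00008$
The score for class a is larger, so d5 is classified as a.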
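The numbers in this example can be checked with a short Python script. This is a sketch that applies the formula from the Laplace smoothing slide to the training set above; the function and variable names (`train`, `score`, and so on) are illustrative and not part of the lecture.

```python
from collections import Counter
from fractions import Fraction

training = [
    ("Carla Betty Carla", "a"),
    ("Carla Carla Suzanne", "a"),
    ("Carla Matt", "a"),
    ("Taylor Jessica Carla", "b"),
]

def train(docs):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    doc_counts = Counter()
    word_counts = {}
    for text, label in docs:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return doc_counts, word_counts, vocab

def score(words, label, doc_counts, word_counts, vocab):
    """Unnormalized posterior: prior times add-one smoothed word likelihoods."""
    result = Fraction(doc_counts[label], sum(doc_counts.values()))
    denom = sum(word_counts[label].values()) + len(vocab) + 1
    for w in words:
        result *= Fraction(word_counts[label][w] + 1, denom)
    return result

doc_counts, word_counts, vocab = train(training)
d5 = "Carla Carla Carla Taylor Jessica".split()
scores = {c: score(d5, c, doc_counts, word_counts, vocab) for c in doc_counts}
for c, s in scores.items():
    print(c, s, float(s))
print("predicted class:", max(scores, key=scores.get))
```

The exact fractions are 2/9375 for class a and 1/12500 for class b, which agree with the rounded values 0.0002 and 0.00008.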
Naive Bayes
Naive Bayes is not so naive!!
Robust to Irrelevant Features.
Optimal if the independence assumptions hold.
A good, dependable baseline for text classification, although there exist other classifiers that give better accuracy.
Federalist Papers
Discussion: Federalist Papers. E.g., what training set do we need?
Language Modelling
Goal: Assign probability to a sentence.
Why?
Machine Translation. P(high winds tonight) > P(large winds tonight)
Spell Correction. P(about fifteen minutes from) > P(about fifteen minuets from)
Speech Recognition. P(I saw a van) > P(eyes awe of an)
Goal: compute the probability of a sentence or sequence of words: $P(W) = P(w_1, w_2, \dots, w_n)$
Related task: probability of an upcoming word: $P(w_i \mid w_1, w_2, \dots, w_{i-1})$
Chain rule:
$$P(x_1, x_2, \dots, x_n) = P(x_1) P(x_2 \mid x_1) P(x_3 \mid x_1, x_2) \cdots P(x_n \mid x_1, x_2, \dots, x_{n-1}) = \prod_i P(x_i \mid x_1, x_2, \dots, x_{i-1})$$
Example
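$$P(\text{its water is so transparent that}) = P(\text{its}) \times P(\text{water} \mid \text{its}) \times P(\text{is} \mid \text{its, water}) \times P(\text{so} \mid \text{its, water, is}) \times P(\text{transparent} \mid \text{its, water, is, so}) \times P(\text{that} \mid \text{its, water, is, so, transparent})$$
Can we count?
$$P(\text{that} \mid \text{its, water, is, so, transparent}) = \frac{P(\text{its, water, is, so, transparent, that})}{P(\text{its, water, is, so, transparent})}$$
No. There are too many possible word sequences to estimate each one from counts.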
Markov Assumption
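Condition only on the k words preceding $w_i$:
$$P(w_i \mid w_1, w_2, \dots, w_{i-1}) \approx P(w_i \mid w_{i-k}, \dots, w_{i-1})$$
For example, with k = 1:
$$P(\text{that} \mid \text{its, water, is, so, transparent}) \approx P(\text{that} \mid \text{transparent})$$
or, with k = 2:
$$P(\text{that} \mid \text{its, water, is, so, transparent}) \approx P(\text{that} \mid \text{so, transparent})$$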
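A small Python sketch of estimating such a truncated conditional probability by counting. The toy corpus and the helper name `cond_prob` are assumptions for illustration; the estimate is the number of times the context is followed by the word, divided by the number of times the context is followed by any word.

```python
def cond_prob(word, history, tokens, k):
    """Estimate P(word | last k words of history) from counts in `tokens`.

    Returns None when the truncated context never occurs in the corpus
    (followed by at least one word), since the ratio is then undefined.
    """
    context = tuple(history[-k:]) if k > 0 else ()
    n = len(context)
    context_count = 0
    joint_count = 0
    # Slide a window over every position that has a following word.
    for i in range(len(tokens) - n):
        if tuple(tokens[i:i + n]) == context:
            context_count += 1
            if tokens[i + n] == word:
                joint_count += 1
    if context_count == 0:
        return None
    return joint_count / context_count

# Hypothetical toy corpus, not from the lecture.
tokens = "the water is so transparent that the fish is so transparent too".split()
history = "its water is so transparent".split()

print(cond_prob("that", history, tokens, k=1))  # context: ('transparent',)
print(cond_prob("that", history, tokens, k=2))  # context: ('so', 'transparent')
print(cond_prob("that", history, tokens, k=5))  # full history, never seen
```

With the full five-word history, the context never appears in the toy corpus, so no estimate is available, while the shorter contexts do occur. This is the sparsity problem that the Markov assumption works around.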
Unigram, Bigram and N-gram
Unigram model:
$$P(w_1, w_2, \dots, w_{n-1}, w_n) \approx \prod_i P(w_i)$$
Bigram model:
$$P(w_i \mid w_1, w_2, \dots, w_{i-1}) \approx P(w_i \mid w_{i-1})$$
N-gram model: extension to trigram, 4-gram, 5-gram, etc.
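To make the two models concrete, here is a minimal Python sketch that scores a sentence with maximum-likelihood unigram and bigram estimates from a toy corpus. The corpus, the `<s>` start marker, and the function names are assumptions for illustration. No smoothing is applied, so any unseen word or word pair gives probability zero.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical training sentences, not from the lecture.
corpus = [
    "i saw a van",
    "i saw a cat",
    "a cat saw a van",
]

START = "<s>"  # assumed start-of-sentence marker for the bigram model
sentences = [s.split() for s in corpus]

unigram_counts = Counter(w for s in sentences for w in s)
total_tokens = sum(unigram_counts.values())

bigram_counts = Counter()
context_counts = Counter()
for s in sentences:
    padded = [START] + s
    for prev, cur in zip(padded, padded[1:]):
        bigram_counts[(prev, cur)] += 1
        context_counts[prev] += 1

def unigram_prob(sentence):
    """P(w_1..w_n) approximated as the product of P(w_i)."""
    p = Fraction(1)
    for w in sentence.split():
        p *= Fraction(unigram_counts[w], total_tokens)
    return p

def bigram_prob(sentence):
    """P(w_1..w_n) approximated as the product of P(w_i | w_{i-1})."""
    p = Fraction(1)
    padded = [START] + sentence.split()
    for prev, cur in zip(padded, padded[1:]):
        if context_counts[prev] == 0:
            return Fraction(0)  # context never seen in training
        p *= Fraction(bigram_counts[(prev, cur)], context_counts[prev])
    return p

for s in ["i saw a van", "van a saw i"]:
    print(s, "| unigram:", unigram_prob(s), "| bigram:", bigram_prob(s))
```

The unigram model assigns both word orders the same probability because it ignores context, whereas the bigram model distinguishes them.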
Discussion
1. Spell correction
2. Word suggestion.