NLP Training – Session 1
Dr. Alexandra M. Liguori
Incubio – The Big Data Academy
Barcelona, March 11, 2015
Dr. Alexandra M. Liguori NLP Training – Session 1
Outline
1 Introduction
2 Natural Language Processing
3 Linguistic Ambiguities
4 Typical NLP tasks
5 POS-tagging
6 What next? Topics for the next sessions
Introduction
1 Name?
2 Background and current activity?
3 Interest in this NLP training?
4 What do you expect from this training?
Introduction: Intelligent machines?
Video: https://www.youtube.com/watch?v=dSIKBliboIo
(Stanley Kubrick and Arthur C. Clarke, screenplay of 2001: A Space Odyssey)
Dave Bowman: Open the pod bay doors, HAL.
HAL: I’m sorry Dave, I’m afraid I can’t do that.
Introduction: Intelligent machines?
Knowledge of language needed to take part in this dialogue:
1 Phonetics and phonology
2 Morphology → producing the contractions I’m and can’t
3 Syntax → cf. Open the pod bay doors, HAL. vs. HAL, the pod bay door is open. vs. HAL, is the pod bay door open?
4 Lexical semantics → meaning of the component words
5 Compositional semantics → knowledge of how components combine to form larger meanings
6 Pragmatics → cf. I’m sorry ..., I’m afraid I can’t vs. No, I won’t open the door. vs. No.
7 Discourse conventions → engaging in structured conversation, e.g. using the reference "that" in I’m sorry Dave, I’m afraid I can’t do that
Natural Language Processing
NLP: techniques that process written human language as language.
Applications:
word counting
automatic hyphenation
automated question answering
named entity recognition (NER)
information/content extraction
semantic analysis
sentiment analysis
machine translation
An ideal NLP team is very interdisciplinary, including:
Language experts (linguists)
Maths experts (mathematicians, physicists, statisticians)
Programmers (computer scientists)
NLP: Maths & Computer Science
NLP: Six categories of linguistic knowledge
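1 Phonetics and phonology ↔ red - read - read (one spelling, two pronunciations); coche - cotxe ("car" in Spanish and in Catalan)
2 Morphology ↔ he walks - we walk; chico - chica - chicos - chicas (boy - girl - boys - girls)
3 Syntax ↔ She ate a mammoth breakfast - She eating a mammoth breakfast (ungrammatical)
4 Semantics ↔ book (verb) - book (noun); Catalan nou (new) - nou (nine) - nou (nut)
5 Pragmatics ↔ préstame tu coche (lend me your car) - puedes prestarme tu coche? (can you lend me your car?) - podrías prestarme tu coche, por favor? (could you lend me your car, please?)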
6 Discourse
Gracie: Oh yeah... And then Mr. and Mrs. Jones were having matrimonial trouble, and my brother was hired to watch Mrs. Jones.
George: Well, I imagine she was a very attractive woman.
Gracie: She was, and my brother watched her day and night for six months.
George: Well, what happened?
Gracie: She finally got a divorce.
George: Mrs. Jones?
Gracie: No, my brother’s wife.
Jordi se fue al restaurante de Xavi para comer pescado. Este estaba fresco y le gustó.
(Jordi went to Xavi's restaurant to eat fish. It was fresh and he liked it. The reader has to resolve "este" to the fish and "le" to Jordi.)
NLP: Ambiguities and Solutions
Linguistic Ambiguities
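Morphological ambiguity
Deja la comida que sobre sobre la mesa de la cocina, dijo llevando el sobre en la mano.
("Leave any leftover food on the kitchen table", he said, carrying the envelope in his hand. The form "sobre" appears as a verb, a preposition and a noun.)
Syntactic ambiguity
María vio a un niño con un telescopio en la ventana.
(María saw a boy with a telescope at the window. Who has the telescope, and who is at the window?)
Vamos a escuchar la sinfonía encargada por la marquesa, que fue ejecutada el diez de abril de 1792.
(We are going to listen to the symphony commissioned by the marchioness, which was "ejecutada" on April 10, 1792. The relative clause can attach to the symphony or to the marchioness.)
Semantic ambiguity
Luís dejó el periódico en el banco.
(Luís left the newspaper at the bank / on the bench.)
Vamos a escuchar la sinfonía encargada por la marquesa, que fue ejecutada el diez de abril de 1792.
("Ejecutada" can mean "performed" or "executed".)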
Typical NLP tasks: Basic and simpler tasks
Tokenization → RegEx
Sentence splitting → RegEx
POS-tagging → POS-tagging algorithms and tag sets
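A minimal sketch of RegEx-based tokenization and sentence splitting, using Python's standard re module. The two patterns and the function names are illustrative assumptions, not part of the training material; real tokenizers handle many more cases.

```python
import re

# A token is a word (optionally with internal apostrophes, as in I'm or can't)
# or a single punctuation character. Illustrative pattern, not a standard.
TOKEN_RE = re.compile(r"\w+(?:['’]\w+)*|[^\w\s]")

# A sentence boundary is whitespace that follows ., ! or ? and precedes
# an uppercase letter or an opening Spanish ¿ / ¡.
SENTENCE_RE = re.compile(r"(?<=[.!?])\s+(?=[A-ZÁÉÍÓÚÑ¿¡])")


def tokenize(text):
    """Return the list of word and punctuation tokens in text."""
    return TOKEN_RE.findall(text)


def split_sentences(text):
    """Split text into sentences at the boundaries matched by SENTENCE_RE."""
    return [s for s in SENTENCE_RE.split(text.strip()) if s]


print(tokenize("I'm sorry Dave, I'm afraid I can't do that."))
print(split_sentences("Open the pod bay doors, HAL. I'm sorry Dave."))
```

Such a simple rule also splits after abbreviations: "Mr. Dobbs left." comes out as two pieces, which is one reason why sentence splitting is usually combined with abbreviation lists or trained models.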
Typical NLP tasks: Complex tasks
Lemmatization or stemming → implementations of the Porter stemmer (e.g. in Java), Stanford NLP tool, GATE, ...
Syntactic parsing → Earley algorithm, CYK algorithm, GHR algorithm, Stanford Parser (Java implementation of a probabilistic algorithm)
Topic extraction, NER, semantic analysis, ... → ad hoc tools, e.g. dictionaries, ontologies, Frames, GATE, NLTK, ...
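To give an idea of what a parsing algorithm such as CYK does, here is a minimal recognizer sketch in plain Python. The toy grammar (in Chomsky normal form) and the lexicon are invented for illustration and only cover the syntax example used earlier in this session; they are assumptions, not a real grammar of English.

```python
from itertools import product

# Toy grammar in Chomsky normal form (hypothetical, for illustration only):
# each pair of right-hand-side symbols maps to the symbols it can form.
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("DT", "NN"): {"NP"},
    ("JJ", "NN"): {"NN"},
    ("VBD", "NP"): {"VP"},
}
# Toy lexicon: the categories each known word can have.
LEXICON = {
    "she": {"NP"},
    "ate": {"VBD"},
    "a": {"DT"},
    "mammoth": {"JJ"},
    "breakfast": {"NN"},
}


def cyk_recognize(words, start="S"):
    """Return True if the toy grammar derives the word sequence."""
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the symbols that derive words[i..j] (inclusive).
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        table[i][i] = set(LEXICON.get(word.lower(), ()))
    for span in range(2, n + 1):           # length of the span
        for i in range(n - span + 1):      # start of the span
            j = i + span - 1               # end of the span
            for k in range(i, j):          # split point
                for left, right in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= BINARY_RULES.get((left, right), set())
    return start in table[0][n - 1]


print(cyk_recognize("She ate a mammoth breakfast".split()))
print(cyk_recognize("She eating a mammoth breakfast".split()))
```

The recognizer accepts the first sentence and rejects the second, because no rule of the toy grammar can combine "eating" with its neighbours.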
POS-tagging
Example with Penn Treebank POS-tags (English):
A/DT woman/NN came/VBD from/IN the/DT back/NN of/IN the/DT store/NN ./.
She/PP appeared/VBD to/TO be/VB sleepy/JJ and/CC quite/RB a/DT bit/NN younger/JJR than/IN Mr./NNP Dobbs/NNP and/CC to/TO be/VB wearing/VBG too/RB much/RB makeup/NN ./.
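Text annotated in this word/TAG format is easy to read into a program. A minimal sketch in Python follows; the function name parse_tagged is an illustrative assumption.

```python
from collections import Counter


def parse_tagged(text):
    """Turn 'word/TAG word/TAG ...' into a list of (word, tag) pairs."""
    pairs = []
    for item in text.split():
        # Split on the last slash, so that the token "./." yields (".", ".").
        word, _, tag = item.rpartition("/")
        pairs.append((word, tag))
    return pairs


sentence = "A/DT woman/NN came/VBD from/IN the/DT back/NN of/IN the/DT store/NN ./."
pairs = parse_tagged(sentence)
print(pairs[:3])
print(Counter(tag for _, tag in pairs).most_common(2))
```

Counting the tags of a whole corpus in this way is the first step of the stochastic methods discussed later in the session.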
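Example of ambiguity (Spanish "aventura" as a noun and as a verb):
1 La/DT aventura/NN de/IN ayer/RB ha/MD sido/VBN muy/RB divertida/JJ !/!
(Yesterday's adventure was great fun!)
2 El/DT explorador/NN se/PP aventura/VBZ en/IN la/DT jungla/NN oscura/JJ ./.
(The explorer ventures into the dark jungle.)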
Three main tagging algorithms or methods:
1 rule-based tagging, e.g. ENGTWOL
2 stochastic tagging, e.g. HMM tagger
3 transformation-based tagging, e.g. Brill tagger
Rule-based POS-tagging
Large database of hand-written disambiguation rules, e.g. (YES = tag sequence allowed, NO = tag sequence ruled out):
DT + NN → YES
DT + VB → NO
PP + NN → NO
PP + VB → YES
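The four rules above can be sketched as a small Python program. The mini-lexicon, the fallback tag and the function name are assumptions made for illustration; a real rule-based tagger such as ENGTWOL relies on a far larger lexicon and rule set. The sketch uses the coarse tag VB from the rules rather than VBZ.

```python
# Tag bigrams from the rules above: True = allowed, False = ruled out.
RULES = {
    ("DT", "NN"): True,
    ("DT", "VB"): False,
    ("PP", "NN"): False,
    ("PP", "VB"): True,
}

# Hypothetical mini-lexicon: every word with its candidate tags.
LEXICON = {
    "la": ["DT"],
    "el": ["DT"],
    "se": ["PP"],
    "explorador": ["NN"],
    "aventura": ["NN", "VB"],  # the ambiguous word
}


def rule_based_tag(words):
    """Pick, for each word, the first candidate tag not ruled out by RULES."""
    tags = []
    prev = None
    for word in words:
        candidates = LEXICON.get(word.lower(), ["NN"])  # assumed default tag
        # Tag pairs without an explicit rule are treated as allowed.
        allowed = [t for t in candidates if RULES.get((prev, t), True)]
        tag = (allowed or candidates)[0]
        tags.append(tag)
        prev = tag
    return tags


print(rule_based_tag(["La", "aventura"]))
print(rule_based_tag(["El", "explorador", "se", "aventura"]))
```

After the determiner "La" the rule DT + VB → NO leaves only the noun reading, while after the pronoun "se" the rule PP + NN → NO leaves only the verb reading.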
Stochastic POS-tagging
A training corpus is used to compute the probability that a given word has a given tag in a given context, e.g.:
La/DT aventura/NN de/IN → 98%
La/DT aventura/VBZ de/IN → 2%
El/DT explorador/NN se/PP aventura/VBZ → 97%
El/DT explorador/NN se/PP aventura/NN → 3%
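How such probabilities can be estimated by counting is sketched below in Python. The four-sentence corpus is hand-made for illustration, so the resulting numbers are not the percentages shown above, and the context is reduced to the previous tag only. A full HMM tagger additionally combines tag-transition and word-emission probabilities over the whole sentence.

```python
from collections import Counter, defaultdict

# Tiny hand-made training corpus (hypothetical; real taggers use large corpora).
CORPUS = [
    [("La", "DT"), ("aventura", "NN"), ("de", "IN"), ("ayer", "RB")],
    [("El", "DT"), ("explorador", "NN"), ("se", "PP"), ("aventura", "VBZ")],
    [("Una", "DT"), ("aventura", "NN"), ("divertida", "JJ")],
    [("Ella", "PP"), ("se", "PP"), ("aventura", "VBZ"), ("sola", "JJ")],
]


def train(corpus):
    """Count how often each word receives each tag after each previous tag."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        prev = "<s>"  # marker for the start of a sentence
        for word, tag in sentence:
            counts[(prev, word.lower())][tag] += 1
            prev = tag
    return counts


def tag_probabilities(counts, prev_tag, word):
    """Relative frequency of each tag for word, given the previous tag."""
    seen = counts.get((prev_tag, word.lower()))
    if not seen:
        return {}  # context never observed in the training corpus
    total = sum(seen.values())
    return {tag: n / total for tag, n in seen.items()}


counts = train(CORPUS)
print(tag_probabilities(counts, "DT", "aventura"))
print(tag_probabilities(counts, "PP", "aventura"))
```

The empty result for unseen contexts shows why real stochastic taggers need large corpora and some form of smoothing.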
Transformation-based POS-tagging
Rules automatically induced from data using Machine Learning techniques, e.g.:
1 prob(aventura=NN) = 65%, prob(aventura=VBZ) = 35% → system would always take aventura = NN
2 apply Machine Learning techniques and learn the conditional probabilities:
La/DT aventura/NN de/IN → 98%
El/DT explorador/NN se/PP aventura/VBZ → 97%
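A minimal sketch of the two steps in Python, under the assumption that the learned knowledge takes the form of context-dependent correction rules, as in a Brill-style tagger: first every word gets its most frequent tag, then a rule "change tag A to tag B after tag C" is learned from the remaining errors. The three-sentence corpus, the single rule template and all function names are illustrative assumptions.

```python
from collections import Counter, defaultdict

# Tiny hand-made tagged corpus (hypothetical, for illustration only).
CORPUS = [
    [("la", "DT"), ("aventura", "NN"), ("de", "IN"), ("ayer", "RB")],
    [("una", "DT"), ("aventura", "NN"), ("divertida", "JJ")],
    [("el", "DT"), ("explorador", "NN"), ("se", "PP"), ("aventura", "VBZ")],
]


def most_frequent_tags(corpus):
    """Step 1: map every word to its most frequent tag in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def baseline(words, lexicon):
    """Tag each word with its most frequent tag (assumed default: NN)."""
    return [lexicon.get(w, "NN") for w in words]


def apply_rule(tags, rule):
    """Apply one rule (old, new, prev): change old to new after prev."""
    old, new, prev = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == old and tags[i - 1] == prev:
            out[i] = new
    return out


def errors(corpus, lexicon, rule=None):
    """Count tagging errors of the baseline, optionally corrected by rule."""
    total = 0
    for sentence in corpus:
        gold = [t for _, t in sentence]
        tags = baseline([w for w, _ in sentence], lexicon)
        if rule is not None:
            tags = apply_rule(tags, rule)
        total += sum(g != t for g, t in zip(gold, tags))
    return total


def learn_best_rule(corpus, lexicon):
    """Step 2: propose a rule for every baseline error, keep the best one."""
    candidates = set()
    for sentence in corpus:
        gold = [t for _, t in sentence]
        guess = baseline([w for w, _ in sentence], lexicon)
        for i in range(1, len(gold)):
            if guess[i] != gold[i]:
                candidates.add((guess[i], gold[i], guess[i - 1]))
    if not candidates:
        return None
    return min(sorted(candidates), key=lambda r: errors(corpus, lexicon, r))


lexicon = most_frequent_tags(CORPUS)
rule = learn_best_rule(CORPUS, lexicon)
print(rule)
print(apply_rule(baseline(["el", "explorador", "se", "aventura"], lexicon), rule))
```

On this toy corpus the baseline always tags "aventura" as NN, and the learned rule corrects it to VBZ when the previous tag is PP.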
POS-tagging
POS-tagging tools for Spanish:
Petra Tag (C++)
OpenNLP POS-Tagging Engine (Apache Stanbol)
POS-tagger with NLTK (Python)
FreeLing (TALP Research Center at UPC)
create your own POS-tagger in Java and plug it into GATE
What next?
Topics for the next sessions:
1 Semantic analysis
2 Question answering
3 Reference resolution
4 Named Entity Recognition (NER)
5 Keyword / topic / information extraction