NLP_session-1

Transcript of NLP_session-1

Page 1: NLP_session-1

NLP Training – Session 1

Dr. Alexandra M. Liguori

Incubio – The Big Data Academy

Barcelona, March 11, 2015

Dr. Alexandra M. Liguori NLP Training – Session 1

Page 2: NLP_session-1

Outline

1 Introduction
2 Natural Language Processing
3 Linguistic Ambiguities
4 Typical NLP tasks
5 POS-tagging
6 What next? Topics for the next sessions

Page 3: NLP_session-1

Introduction

1 Name?
2 Background and current activity?
3 Interest in this NLP training?
4 What do you expect from this training?

Page 4: NLP_session-1

Introduction: Intelligent machines?

Video: https://www.youtube.com/watch?v=dSIKBliboIo

(Stanley Kubrick and Arthur C. Clarke, screenplay of 2001: A Space Odyssey)

Page 5: NLP_session-1

Introduction: Intelligent machines?

Dave Bowman: Open the pod bay doors, HAL.

HAL: I’m sorry Dave, I’m afraid I can’t do that.

(Stanley Kubrick and Arthur C. Clarke, screenplay of 2001: A Space Odyssey)

Page 6: NLP_session-1

Introduction: Intelligent machines?

1 Phonetics and phonology
2 Morphology → produce contractions I’m and can’t
3 Syntax → cf. Open the pod bay doors, HAL.
  vs. HAL, the pod bay door is open.
  vs. HAL, is the pod bay door open?
4 Lexical semantics → meaning of component words
5 Compositional semantics → knowledge of how components combine to form larger meanings
6 Pragmatics → cf. I’m sorry ..., I’m afraid I can’t
  vs. No, I won’t open the door.
  vs. No.
7 Discourse conventions → engaging in structured conversation, using reference: "that" in I’m sorry Dave, I’m afraid I can’t do that


Page 14: NLP_session-1

Natural Language Processing

NLP: techniques that process written human language as language.

Applications:
word counting
automatic hyphenation
automated question answering
named entity extraction (NER)
information/content extraction
semantic analysis
sentiment analysis
machine translation


Page 18: NLP_session-1

Natural Language Processing

An ideal NLP team is very interdisciplinary, including:
Language experts (linguists)
Maths experts (mathematicians, physicists, statisticians)
Programmers (computer scientists)

Page 19: NLP_session-1

NLP: Maths & Computer Science


Page 20: NLP_session-1

NLP: Six categories of linguistic knowledge

1 Phonetics and phonology ↔ red - read - read ; coche - cotxe (Spanish and Catalan for "car")
2 Morphology ↔ he walks - we walk ; chico - chica - chicos - chicas (boy - girl - boys - girls)
3 Syntax ↔ She ate a mammoth breakfast - She eating a mammoth breakfast
4 Semantics ↔ book (verb) - book (noun) ; nou (new) - nou (nine) - nou (nut)
5 Pragmatics ↔ préstame tu coche (lend me your car) - puedes prestarme tu coche? (can you lend me your car?) - podrías prestarme tu coche, por favor? (could you lend me your car, please?)


Page 26: NLP_session-1

NLP: Six categories of linguistic knowledge

Discourse

Gracie: Oh yeah... And then Mr. and Mrs. Jones were having matrimonial trouble, and my brother was hired to watch Mrs. Jones.
George: Well, I imagine she was a very attractive woman.
Gracie: She was, and my brother watched her day and night for six months.
George: Well, what happened?
Gracie: She finally got a divorce.
George: Mrs. Jones?
Gracie: No, my brother’s wife.

Jordi se fue al restaurante de Xavi para comer pescado. Este estaba fresco y le gustó.
(Jordi went to Xavi's restaurant to eat fish. It was fresh and he liked it.)


Page 29: NLP_session-1

NLP: Ambiguities and Solutions


Page 33: NLP_session-1

Linguistic Ambiguities

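Morphological ambiguity

Deja la comida que sobre sobre la mesa de la cocina, dijo llevando el sobre en la mano.
(Leave the leftover food on the kitchen table, he said, carrying the envelope in his hand. Here "sobre" is in turn a verb form, a preposition and a noun.)

Syntactic ambiguity

María vio a un niño con un telescopio en la ventana.
(María saw a boy with a telescope in the window.)
Vamos a escuchar la sinfonía encargada por la marquesa, que fue ejecutada el diez de abril de 1792.
(We are going to listen to the symphony commissioned by the marchioness, which was performed, or who was executed, on April 10, 1792.)

Semantic ambiguity

Luís dejó el periódico en el banco.
(Luís left the newspaper on the bench, or in the bank.)
Vamos a escuchar la sinfonía encargada por la marquesa, que fue ejecutada el diez de abril de 1792.
("ejecutada" can mean "performed" or "executed".)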


Page 37: NLP_session-1

Typical NLP tasks: Basic and simpler tasks

Tokenization → RegEx

Sentence splitting → RegEx

POS-tagging → POS-tagging algorithms and tag sets
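
The regular-expression approach to tokenization and sentence splitting can be sketched as follows. The two patterns and the function names are assumptions made for this illustration, not the ones used in the training; production tokenizers handle many more cases.

```python
import re

# Hypothetical patterns, chosen for illustration only.
# A token is a run of word characters (optionally with apostrophe-joined
# parts such as I’m or can’t) or a single punctuation mark.
TOKEN_RE = re.compile(r"\w+(?:[’']\w+)*|[^\w\s]")
# A sentence boundary is whitespace preceded by ., ! or ? and followed by
# an uppercase letter or an opening Spanish punctuation mark.
SENTENCE_END_RE = re.compile(r"(?<=[.!?])\s+(?=[A-ZÁÉÍÓÚÑ¿¡])")


def tokenize(text):
    """Return the list of tokens found in text."""
    return TOKEN_RE.findall(text)


def split_sentences(text):
    """Split text into sentences at the naive boundary pattern."""
    return [s for s in SENTENCE_END_RE.split(text.strip()) if s]


print(tokenize("I’m sorry Dave, I’m afraid I can’t do that."))
print(split_sentences("A woman came from the back of the store. She appeared to be sleepy."))
```

A pattern this simple also splits after abbreviations such as "Mr.", which is one reason sentence splitting is harder than it looks.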


Page 44: NLP_session-1

Typical NLP tasks: Complex tasks

Lemmatization or Stemming → Implementations of Porter Stemmer (e.g. in Java), Stanford NLP tool, GATE, ...

Syntactic parsing → Earley algorithm, CYK algorithm, GHR algorithm, Stanford Parser (Java implementation of probabilistic algorithm)

Topic extraction, NER, Semantic analysis, ... → Ad hoc tools, e.g. dictionaries, ontologies, Frames, GATE, NLTK, ...
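
As an illustration of one of the parsing algorithms named above, here is a minimal sketch of a CYK recognizer. The toy grammar (in Chomsky normal form) and the lexicon are hypothetical and cover only the syntax example used earlier in the session; the code answers whether a sentence is grammatical under that grammar and does not build a parse tree.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form: (left, right) -> parents.
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("DT", "NN"): {"NP"},
    ("JJ", "NN"): {"NN"},
    ("VBD", "NP"): {"VP"},
}
# Hypothetical lexicon: word -> possible pre-terminal symbols.
LEXICON = {
    "she": {"NP"},
    "ate": {"VBD"},
    "eating": {"VBG"},
    "a": {"DT"},
    "mammoth": {"JJ"},
    "breakfast": {"NN"},
}


def cyk_recognize(words, start="S"):
    """Return True if the toy grammar derives the word sequence."""
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the symbols that derive words[i..j] (inclusive).
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        table[i][i] = set(LEXICON.get(word.lower(), ()))
    for span in range(2, n + 1):          # length of the span
        for i in range(n - span + 1):     # start of the span
            j = i + span - 1              # end of the span
            for k in range(i, j):         # split point
                for left, right in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= BINARY_RULES.get((left, right), set())
    return start in table[0][n - 1]


print(cyk_recognize("She ate a mammoth breakfast".split()))
print(cyk_recognize("She eating a mammoth breakfast".split()))
```

The first sentence is accepted and the second is rejected because the toy grammar has no rule that combines a VBG with a noun phrase.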


Page 51: NLP_session-1

POS-tagging

Example with Penn Treebank POS-tags (English):

A/DT woman/NN came/VBD from/IN the/DT back/NN of/IN the/DT store/NN ./.

She/PP appeared/VBD to/TO be/VB sleepy/JJ and/CC quite/RB a/DT bit/NN younger/JJR than/IN Mr./NNP Dobbs/NNP and/CC to/TO be/VB wearing/VBG too/RB much/RB makeup/NN ./.
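
Text annotated in this word/TAG notation is easy to read back into a program. The sketch below is a minimal illustration; the function names are made up for this example, and it assumes every token contains at least one slash.

```python
from collections import Counter


def parse_tagged(text):
    """Split whitespace-separated 'word/TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # Split on the last slash so that a token such as './.' still works.
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


def tag_counts(pairs):
    """Count how often each tag occurs."""
    return Counter(tag for _, tag in pairs)


sentence = "A/DT woman/NN came/VBD from/IN the/DT back/NN of/IN the/DT store/NN ./."
pairs = parse_tagged(sentence)
print(pairs)
print(tag_counts(pairs))
```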


Page 54: NLP_session-1

POS-tagging

Example of ambiguity:

1 La/DT aventura/NN de/IN ayer/RB ha/MD sido/VBN muy/RB divertida/JJ !/!
  (Yesterday's adventure was great fun!)

2 El/DT explorador/NN se/PP aventura/VBZ en/IN la/DT jungla/NN oscura/JJ ./.
  (The explorer ventures into the dark jungle.)


Page 57: NLP_session-1

POS-tagging

Three main tagging algorithms or methods:
1 rule-based tagging, e.g. ENGTWOL
2 stochastic tagging, e.g. HMM tagger
3 transformation-based tagging, e.g. Brill tagger

Page 58: NLP_session-1

Rule-based POS-tagging

Example of ambiguity:
1 La/DT aventura/NN de/IN ayer/RB ha/MD sido/VBN muy/RB divertida/JJ !/!
2 El/DT explorador/NN se/PP aventura/VBZ en/IN la/DT jungla/NN oscura/JJ ./.

Large database of hand-written disambiguation rules, e.g.:

DT + NN → YES
DT + VB → NO
PP + NN → NO
PP + VB → YES
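
The four rules above can be applied mechanically. The following is a minimal sketch, not a reimplementation of ENGTWOL: the mini-lexicon, the default tag for unknown words and the helper names are all assumptions made for this illustration.

```python
# Hypothetical mini-lexicon: each word maps to its candidate tags.
LEXICON = {
    "la": ["DT"],
    "el": ["DT"],
    "explorador": ["NN"],
    "se": ["PP"],
    "aventura": ["NN", "VBZ"],  # the ambiguous word from the example
}

# The four hand-written rules: (previous tag, tag class) -> allowed?
RULES = {
    ("DT", "NN"): True,
    ("DT", "VB"): False,
    ("PP", "NN"): False,
    ("PP", "VB"): True,
}


def tag_class(tag):
    """Collapse VBZ, VBD, ... into the coarse class VB used by the rules."""
    return "VB" if tag.startswith("VB") else tag


def rule_based_tag(words):
    """Tag each word, discarding candidates that a rule forbids."""
    tagged = []
    prev_tag = None
    for word in words:
        # Assumption: unknown words default to NN.
        candidates = LEXICON.get(word.lower(), ["NN"])
        # Tag pairs not covered by any rule are treated as allowed.
        allowed = [t for t in candidates
                   if RULES.get((prev_tag, tag_class(t)), True)]
        tag = (allowed or candidates)[0]
        tagged.append((word, tag))
        prev_tag = tag
    return tagged


print(rule_based_tag(["La", "aventura"]))
print(rule_based_tag(["El", "explorador", "se", "aventura"]))
```

After a determiner only the noun reading of "aventura" survives, and after the pronoun "se" only the verb reading does.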


Page 62: NLP_session-1

Stochastic POS-tagging

Example of ambiguity:
1 La/DT aventura/NN de/IN ayer/RB ha/MD sido/VBN muy/RB divertida/JJ !/!
2 El/DT explorador/NN se/PP aventura/VBZ en/IN la/DT jungla/NN oscura/JJ ./.

Training corpus to compute probability of given word having given tag in given context, e.g.:

La/DT aventura/NN de/IN → 98%

La/DT aventura/VBZ de/IN → 2%

El/DT explorador/NN se/PP aventura/VBZ → 97%

El/DT explorador/NN se/PP aventura/NN → 3%
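
Probabilities of this kind can be estimated by counting in a tagged corpus. The sketch below uses a simple relative-frequency estimate of P(tag | previous tag, word), which is a simplification and not a full HMM tagger. The four-sentence training corpus is invented for this illustration, so it yields probabilities of 100% rather than the 98% and 97% shown above.

```python
from collections import Counter, defaultdict

# Tiny hypothetical training corpus; real taggers are trained on
# thousands of hand-annotated sentences.
TRAINING = [
    [("la", "DT"), ("aventura", "NN"), ("de", "IN"), ("ayer", "RB")],
    [("una", "DT"), ("aventura", "NN"), ("divertida", "JJ")],
    [("el", "DT"), ("explorador", "NN"), ("se", "PP"), ("aventura", "VBZ")],
    [("ella", "PP"), ("se", "PP"), ("aventura", "VBZ")],
]


def train(sentences):
    """Count how often each tag follows a given (previous tag, word) context."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        prev_tag = "<s>"  # marker for the start of a sentence
        for word, tag in sentence:
            counts[(prev_tag, word)][tag] += 1
            prev_tag = tag
    return counts


def tag_probabilities(counts, prev_tag, word):
    """Relative-frequency estimate of P(tag | previous tag, word)."""
    seen = counts.get((prev_tag, word), Counter())
    total = sum(seen.values())
    return {tag: n / total for tag, n in seen.items()} if total else {}


def most_likely_tag(counts, prev_tag, word):
    """Return the most probable tag, or None for an unseen context."""
    probs = tag_probabilities(counts, prev_tag, word)
    return max(probs, key=probs.get) if probs else None


counts = train(TRAINING)
print(tag_probabilities(counts, "DT", "aventura"))
print(most_likely_tag(counts, "PP", "aventura"))
```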


Page 66: NLP_session-1

Transformation-based POS-tagging

Example of ambiguity:

La/DT aventura/NN de/IN ayer/RB ha/MD sido/VBN muy/RB divertida/JJ !/!
El/DT explorador/NN se/PP aventura/VBZ en/IN la/DT jungla/NN oscura/JJ ./.

Rules automatically induced from data using Machine Learning techniques, e.g.:

1 prob(aventura=NN) = 65%, prob(aventura=VBZ) = 35% → system would always take aventura = NN

2 apply Machine Learning techniques and learn the conditional probabilities:

3 La/DT aventura/NN de/IN → 98%
  El/DT explorador/NN se/PP aventura/VBZ → 97%
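
In a Brill-style tagger the learned knowledge takes the form of correction rules rather than probabilities: every word first receives its most frequent tag, and rules of the form "change tag A to B when the previous tag is Z" then repair the mistakes. The sketch below illustrates that idea with a single rule template and a three-sentence gold corpus, both of which are assumptions made for this example.

```python
from collections import Counter, defaultdict

# Tiny hypothetical gold-standard corpus.
GOLD = [
    [("la", "DT"), ("aventura", "NN"), ("de", "IN"), ("ayer", "RB")],
    [("una", "DT"), ("aventura", "NN"), ("divertida", "JJ")],
    [("el", "DT"), ("explorador", "NN"), ("se", "PP"), ("aventura", "VBZ")],
]


def most_frequent_tags(corpus):
    """Baseline: map each word to its most frequent tag."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def apply_rule(tags, rule):
    """Change from_tag to to_tag wherever the previous tag is prev_tag."""
    from_tag, to_tag, prev_tag = rule
    out = list(tags)
    for i in range(1, len(tags)):
        # Context is read from the tags as they were before this rule.
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out


def errors(predicted, corpus):
    """Number of tags that differ from the gold standard."""
    return sum(p != gold
               for tags, sent in zip(predicted, corpus)
               for p, (_, gold) in zip(tags, sent))


def learn_rules(corpus, max_rules=5):
    """Greedily pick the rules that remove the most remaining errors."""
    baseline = most_frequent_tags(corpus)
    current = [[baseline[w] for w, _ in sent] for sent in corpus]
    rules = []
    for _ in range(max_rules):
        # Candidate rules are read off the remaining errors.
        candidates = sorted({
            (tags[i], sent[i][1], tags[i - 1])
            for tags, sent in zip(current, corpus)
            for i in range(1, len(sent))
            if tags[i] != sent[i][1]
        })
        if not candidates:
            break
        best = min(candidates,
                   key=lambda r: errors([apply_rule(t, r) for t in current], corpus))
        updated = [apply_rule(t, best) for t in current]
        if errors(updated, corpus) >= errors(current, corpus):
            break
        current = updated
        rules.append(best)
    return baseline, rules


def tag(words, baseline, rules):
    """Tag with the baseline, then apply the learned rules in order."""
    tags = [baseline.get(w.lower(), "NN") for w in words]  # assumption: unknown words are NN
    for rule in rules:
        tags = apply_rule(tags, rule)
    return list(zip(words, tags))


baseline, rules = learn_rules(GOLD)
print(rules)
print(tag(["El", "explorador", "se", "aventura"], baseline, rules))
```

On this corpus the baseline tags every "aventura" as NN, and the single learned rule changes NN to VBZ after a PP, which corrects the second example sentence.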


Page 72: NLP_session-1

POS-tagging

POS-tagging tools for Spanish:

Petra Tag (C++)
OpenNLP POS-Tagging Engine (Apache Stanbol)
POS-tagger with NLTK (Python)
FreeLing (TALP Research Center at UPC)
create your own POS-tagger in Java and plug it into GATE

Page 73: NLP_session-1

What next?

Topics for the next sessions:
1 Semantic analysis
2 Question answering
3 Reference resolution
4 Named Entity Recognition (NER)
5 Keyword / topic / information extraction