Transcript of Hinrich Schütze 2015-11-04 - uni-muenchen.de › ~hs › teach › 15w › pmclii › pdf ›...
-
Representation Learning for Domain Adaptation
Hinrich Schütze
Center for Information and Language Processing, University of Munich
2015-11-04
-
Overview
1 “Traditional” computational linguistics representations
2 Count vector representations
3 Deep learning representations
4 Task 1: Part-of-speech (POS) tagging
5 Task 2: Morphological (MORPH) tagging
6 Task 3: Sentiment analysis
7 Task 4: Semantic similarity between words
8 Conclusion
-
Representations created by (computational) linguists
Generative lexicon (Pustejovsky) entry for “build”:
-
Representations created by (computational) linguists
Lexicon entry for “obeshchat’” in Tolkovo-kombinatornyj Slovar’ Sovremennogo Russkogo Jazyka:
-
Representations created by (computational) linguists
Morphological paradigm of French verb “faire”:
-
“Traditional” representations in computational linguistics
Motivated by linguistic theory
Many successes in practical applications
So why would we need any other representation in computational linguistics?
-
Why learned representations: Problems with LING reps
Coverage
Domain dependence
Noise / need for robustness
Manual creation of representations for rich semantics / world knowledge: unsolved problem
-
Problems for traditional CL: Coverage
Natural languages are productive: new words and meanings are created all the time.
Example: “unfriend”
“A new study from a University of Colorado Denver grad student attempts to uncover what types of people we are most likely to unfriend.”
Traditional CL: New words are not covered.
Representation learning: Representations for new words can be automatically learned.
-
Problems for traditional CL: Domain dependence
The language in many NLP applications has domain-specific properties.
Example: Patents
An apparatus for winding fence material comprising a leading edge portion, wherein said apparatus is comprised of a first shaft . . .
The word “said” is used as a demonstrative here.
Traditional CL: Domain-dependent usage not covered.
Representation learning: Representations for domain-dependent usage can be automatically learned.
-
Problems for traditional CL: Noise / need for robustness
Tweet: “water” = “what are”
Amazon review: “since i was young i always dreamed of going to walt disney world, but no that i live in florida i go there every chance i get!but the days i cant go i just play this game,its like being on the rides themselves!not to easy for you to beat in a day”
-
Problems for traditional CL: Rich semantics / world knowledge
spider
Taxonomy: spiders are animals, similar to insects
Attributes of spiders: venomous, fuzzy, small, fast-moving
Typical actions of spiders: bite, prey, weave, burrow
-
Problems for traditional CL: Rich semantics / world knowledge
Taxonomy: spiders are animals, similar to insects
Attributes of spiders: venomous, fuzzy, small, fast-moving
Typical actions of spiders: bite, prey, weave, burrow
Meaning is very heterogeneous: abstract, concrete, sensory, core semantics, world knowledge
Traditional CL: Difficult to represent all this in a computationally useful way
Representation learning: deal well with heterogeneity of meaning
-
Problems for traditional CL: Zipf
The long tail of language use
The adjective “hard”: die hard, hard by, hard and fast, hard copy, hard back, hard core, hard disk, hard drive, hard drugs, hard earned, hard hit, hard rock, hard going, hard nosed, hard of hearing, hard put, hard to get, hard way
Not just memorization: “hard and fast” → “fast and hard”, “hard back” → “hard bound”, “hard rock” → “hard punk”, “truly knuckles-scraping-against-asphalt hard”
-
Why learned representations: Problems with LING reps
Coverage
Domain dependence
Noise / need for robustness
Manual creation of representations for rich semantics / world knowledge: unsolved problem
-
Types of representations used in NLP
NONE: No representation, except for word index
LING: Representations based on linguistic resources
COUNT: Count vectors
UNSU: Representations learned by unsupervised learning
PREDICT: Representations learned by supervised learning: embeddings / predict vectors
Next: COUNT vector models
-
Count vector models
Dimensionality is vocabulary V (or large subset thereof)
Value of dimension i of distributional representation of word v: (weighted) cooccurrence count of v and w_i
-
Count vector model: The counts
Count the cooccurrence of two words in a large corpus
E.g., cooccurrence = cooccurrence within k = 10 words
Example counts from Wikipedia:
cooc.(rich,silver) = 186    cooc.(poor,silver) = 34
cooc.(rich,disease) = 17    cooc.(poor,disease) = 162
cooc.(rich,society) = 143   cooc.(poor,society) = 228
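A minimal sketch of this counting scheme in Python (the window size k = 10 is from the slide; the toy corpus, the tokenization, and the symmetric unweighted counting are illustrative assumptions):

```python
from collections import defaultdict

def cooccurrence_counts(tokens, k=10):
    """Count, for each ordered word pair, how often the two words
    occur within k tokens of each other."""
    counts = defaultdict(int)
    for i, w in enumerate(tokens):
        # look back at the previous k tokens; each pair is counted once
        for c in tokens[max(0, i - k):i]:
            counts[(w, c)] += 1
            counts[(c, w)] += 1
    return counts

tokens = "the rich man bought silver and the poor man fought disease".split()
counts = cooccurrence_counts(tokens, k=10)
print(counts[("rich", "silver")])  # 1 in this toy corpus
```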
-
Count vector model: Vectors
[Plot: each word is a point whose coordinates are its cooccurrence counts with “rich” and “poor”; “silver”, “disease”, and “society” are plotted at:]
cooc.(poor,silver)=34, cooc.(rich,silver)=186,
cooc.(poor,disease)=162, cooc.(rich,disease)=17,
cooc.(poor,society)=228, cooc.(rich,society)=143
-
Count vector model: Similarity
[The same plot, now also showing “gold”, which lies close to “silver”.]
The similarity between two words is the cosine of the angle between them.
Small angle: gold and silver are similar.
Large angle: gold and disease are not similar.
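A sketch of that similarity computation on the two-dimensional count vectors above (pure Python; the component order (poor, rich) is an arbitrary choice, the counts are the ones from the slide):

```python
import math

def cosine(u, v):
    """Cosine of the angle between two count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# components: (cooc. with poor, cooc. with rich), from the slide
silver = (34, 186)
disease = (162, 17)
society = (228, 143)

print(cosine(silver, disease))  # ~0.28: large angle, low similarity
print(cosine(silver, society))  # ~0.67: smaller angle, higher similarity
```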
-
Types of representations used in NLP
NONE: No representation, except for word index
LING: Representations based on linguistic resources
COUNT: Count vectors
UNSU: Representations learned by unsupervised learning
PREDICT: Representations learned by supervised learning: embeddings / predict vectors
Next: PREDICT: predict vectors in deep learning
-
Terminology
A distributed representation
is simply a vector representation, i.e., a point in a high-dimensional real-valued space. Implicit in the concept of distributed representation is that similarity/distance is interpretable. E.g., representing a 1000x1000 binary pixel image as a one-million-dimensional vector is not distributed since similarity/distance does not have an intuitive interpretation.
Embeddings/predict vectors, count vectors, representations learned by unsupervised learning: these are all distributed representations. (Linguistic resources usually do not provide distributed representations.)
-
Terminology (cont.)
A distributional representation
can be defined (i) as a representation based on distributional information, i.e., on the distribution of words in contexts in a large corpus, or (ii) as a synonym of distributed representation.
Count vectors are distributional representations according to definition (i). Embeddings/predict vectors and representations learned by unsupervised learning may or may not be viewed as distributional representations according to definition (ii) since the link to the distribution of words in contexts is more indirect in this case.
-
Deep learning as a gestalt
Automatic learning of features (as opposed to hand-designed features)
nonlinear
Representation learning: embeddings or predict vectors
“deep” = “multi-layer architectures”
-
Geoff Hinton on Automatic Feature Learning
Adding a layer of hand-coded features . . . makes them much more powerful but the hard bit is designing the features. We need to automate the loop of designing features for a particular task and seeing how well they work.
-
Opposing view: Feature design still is important
Solving any complex task requires domain expertise.
Domain expertise can be used in various ways: definition of task to be learned, collection and composition of training data, architecture of machine learning system, design of representation, design of features
It’s unclear why domain expertise should be used for some of these, but not for feature design.
-
Deep learning: Nonlinearity
SVMs (perhaps the main competitor of neural networks) are more efficient and have a better understood theory than neural networks.
But they are linear.
The only “knob” you can turn is the kernel that gives the learner access to similarity in a complex representation space.
Guess: Real life is complicated and often nonlinear.
Neural networks offer more flexibility in learning complex decision boundaries.
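A toy illustration of that flexibility (the weights are hand-picked for the example rather than learned): a single tanh hidden layer separates XOR, which no linear classifier can.

```python
import numpy as np

def tiny_mlp(x):
    """One tanh hidden layer, then a linear readout."""
    W1 = np.array([[ 20.0,  20.0],    # unit 1: fires if at least one input is on
                   [-20.0, -20.0]])   # unit 2: fires if not both inputs are on
    b1 = np.array([-10.0, 30.0])
    h = np.tanh(W1 @ x + b1)
    return h.sum() - 1.0              # positive exactly for the XOR inputs

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, tiny_mlp(np.array(x, dtype=float)) > 0)  # False True True False
```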
-
Deep learning: Representation learning + Architectures
Representation learning
Supervised learning to train embeddings or predict vectors for words
Architectures: deep = multilayer
Use trained predict vectors/embeddings in a deep, multilayer neural network architecture
-
Embeddings = predict vectors (Schwenk & Koehn 2008)
Language modeling task: predict the next word w_j from the n − 1 preceding words w_{j−n+1}, w_{j−n+2}, . . . , w_{j−1}
Input representation of words: one-hot vectors
Output/target: multinomial classification, V classes, where V is the size of the vocabulary
Embedding layer for learning predict vectors. There is only one embedding per word, independent of position.
Complex nonlinear decision surfaces can be learned due to the hidden layer.
Embeddings/predict vectors are learned by backpropagating the prediction error.
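A minimal numpy sketch of the forward pass of such a network (all sizes and the random initialization are illustrative assumptions; training would backpropagate the cross-entropy error into W2, W1, and the used rows of E):

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, h, n = 10_000, 100, 200, 4   # vocab size, embedding dim, hidden dim, n-gram order

E = rng.normal(scale=0.1, size=(V, d))             # one embedding per word, shared across positions
W1 = rng.normal(scale=0.1, size=((n - 1) * d, h))  # hidden layer weights
W2 = rng.normal(scale=0.1, size=(h, V))            # output layer: V-way classification

def predict_next(context_ids):
    """Forward pass: n-1 word indices -> distribution over the next word."""
    # indexing E is equivalent to multiplying the one-hot input vectors by E
    x = E[context_ids].reshape(-1)   # look up and concatenate the context embeddings
    hidden = np.tanh(x @ W1)         # nonlinearity: complex decision surfaces
    logits = hidden @ W2
    p = np.exp(logits - logits.max())
    return p / p.sum()               # softmax over the V classes

p = predict_next([17, 42, 7])        # three hypothetical word indices
print(p.shape, p.sum())              # (10000,) 1.0
```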
-
Embeddings/predict vectors: Comments
Low dimensionality, can be used efficiently for a wide range of NLP tasks
Supervised training, can in theory learn arbitrarily complex phenomena
Rare events as well as frequent events
Complex contextual dependencies
Word order is respected.
Very cautious independence assumptions compared to count vectors (high-order Markov assumption)
Many different approaches to learning embeddings / predict vectors
-
t-SNE visualization of a small subset of embeddings / predict vectors
[Scatter plot of embeddings projected to 2D; the labels form three clusters. Soccer clubs: (FC Barcelona), (Man Utd), (Arsenal FC), (InterMilan FC), (Schalke), (AC Milan). Places: (Reading UK), (Reading PA), Barcelona, England, London, (Berlin), (London), Washington, (Los Angeles), (LA), (Rome), (Paris), (NY), (WA), (Chicago), village, town, city, island, park, Bayern. Language/learning words: (Reading VERB), (Learning), vocabulary, poetry, composing, semantics, translating, terminology, writing.]
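A sketch of how such a plot is typically produced (scikit-learn's TSNE is one common implementation; the word list and random stand-in vectors are placeholders for the learned embeddings):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
words = ["gold", "silver", "disease", "society", "village", "town", "city"]
X = rng.normal(size=(len(words), 50))  # stand-in for 50-dim learned embeddings

# project to 2D; perplexity must be smaller than the number of points
coords = TSNE(n_components=2, perplexity=3, random_state=0).fit_transform(X)

plt.scatter(coords[:, 0], coords[:, 1], s=10)
for (x, y), w in zip(coords, words):
    plt.annotate(w, (x, y), fontsize=8)
plt.show()
```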
-
Deep learning: Deep architectures
Lookup table: this is where the learned embeddings are retrieved and fed into the network
Example of complex learning architecture: convolution, max over time, hidden layer
Meaning of the logical operator “or” ≠ embedding of “or”
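A toy numpy sketch of the convolution and max-over-time steps named on the slide (filter count, window width, and dimensions are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
d, width, n_filters, sent_len = 50, 3, 20, 7

X = rng.normal(size=(sent_len, d))           # looked-up embeddings for one sentence
F = rng.normal(size=(n_filters, width * d))  # filters over windows of `width` words

# convolution: apply every filter to every window of consecutive words
windows = np.stack([X[i:i + width].reshape(-1)
                    for i in range(sent_len - width + 1)])
conv = windows @ F.T                         # shape (n_windows, n_filters)

# max over time: each filter keeps its strongest response anywhere in the sentence
features = conv.max(axis=0)                  # fixed-size vector, fed to the hidden layer
print(features.shape)                        # (20,)
```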
-
Vision: Deep network, automatic features
-
Domain knowledge built in, end-to-end
-
Key to remember: these are PREDICT vectors
-
Key to remember: these are COUNT vectors
[The count-vector plot again: gold, silver, disease, and society plotted by their cooccurrence counts with “rich” and “poor”.]
-
Count vectors vs Embeddings/Predict Vectors
                            COUNT          PREDICT
dimensionality              high           low to medium
learning regime             unsupervised   supervised
complex linguistic context  hard to model  easier to model
rare event coverage         poor           good
independence assumptions    strong         weak
                            simple         complex
                            efficient      long training times
                            elegant        messy
-
Vision: Deep network, automatic features
For natural language: what corresponds to pixels? what corresponds to edges? what corresponds to object parts? what corresponds to object models?
-
Deep learning as a gestalt
Automatic learning of features (as opposed to hand-designed features)
nonlinear
Representation learning: embeddings or predict vectors
“deep” = “multi-layer architectures”
-
Deep learning: Why now?
Moore’s law
Big data: Several orders of magnitude more than in 80s / 90s
Better understanding of how to train very complex networks: initialization, regularization, much expanded bag of tricks
Canonical machine learning stuck? – Great strides, but not recently
Diverse knowledge about the domain can be integrated into a neural network architecture in a very flexible way – but it still can be trained end-to-end.
-
LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics
Types of representations used in NLP
NONE: No representation, except for word index (typical approach to supervised training in NLP is to have no initial representation of a word)
LING: Representations based on linguistic resources
COUNT: Count vectors
UNSU: Representations learned by unsupervised learning: SVD, LSI, PLSI, NMF, Hellinger PCA, mSDA, (Brown) clustering (a count-vector/SVD sketch follows this slide)
PREDICT: Representations learned by supervised learning: embeddings / predict vectors
Next: Which representation is best for NLP?
Schütze: Representation learning for domain adaptation 43 / 97
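To make COUNT and UNSU concrete, here is a minimal sketch (toy corpus, NumPy only; an illustration, not code from the talk): COUNT vectors are rows of a word-word co-occurrence matrix, and an UNSU representation can be derived from that matrix by truncated SVD.

```python
import numpy as np

# Toy corpus: COUNT vectors as word-word co-occurrence counts,
# UNSU vectors via truncated SVD of the count matrix.
corpus = [["time", "flies", "like", "an", "arrow"],
          ["fruit", "flies", "like", "a", "banana"]]
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

C = np.zeros((len(vocab), len(vocab)))  # COUNT: co-occurrence matrix
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - 2), min(len(sent), i + 3)):  # +-2 window
            if j != i:
                C[idx[w], idx[sent[j]]] += 1

U, s, Vt = np.linalg.svd(C, full_matrices=False)  # UNSU: SVD of the counts
k = 3
unsu = U[:, :k] * s[:k]  # k-dimensional learned representations
print(unsu.shape)        # (8, 3): one dense vector per vocabulary word
```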
-
LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics
Which representation is best for domain adaptation?
Task 1: Part-of-speech (POS) tagging (very low complexity task)
Task 2: Morphological (MORPH) tagging (low complexity task)
Task 3: Sentiment (medium complexity task)
Task 4: Semantic similarity (high complexity task)
Schütze: Representation learning for domain adaptation 44 / 97
-
LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics
Problem setting: Domain adaptation
Schütze: Representation learning for domain adaptation 45 / 97
-
LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics
Problem setting: Domain adaptation
Schütze: Representation learning for domain adaptation 46 / 97
-
LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics
Problem setting: Domain adaptation
Schütze: Representation learning for domain adaptation 47 / 97
-
LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics
Outline
1 “Traditional” computational linguistics representations
2 Count vector representations
3 Deep learning representations
4 Task 1: Part-of-speech (POS) tagging
5 Task 2: Morphological (MORPH) tagging
6 Task 3: Sentiment analysis
7 Task 4: Semantic similarity between words
8 Conclusion
Schnabel & Schütze: POS tagging 48 / 97
-
LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics
This section based on: Schnabel & Schütze. FLORS: Fast and Simple Domain Adaptation for Part-of-Speech Tagging. In Transactions of the Association for Computational Linguistics (TACL), 2:15–26, 2014
Schnabel & Schütze: POS tagging 49 / 97
-
LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics
Task: Part-of-speech (POS) tagging
Disambiguate part-of-speech (syntactic category) in context
Example:
time NN
flies VBZ
like IN
an DT
arrow NN
“flies” can be a form of the verb “to fly” or the plural of thenoun “fly”.
It is correctly disambiguated here.
Schnabel & Schütze: POS tagging 50 / 97
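For illustration (not part of the talk), the example can be run through an off-the-shelf tagger; NLTK's default perceptron tagger is assumed here, and its output may differ from the gold tags above.

```python
import nltk

# One-time model download for NLTK's default tagger:
# nltk.download("averaged_perceptron_tagger")
print(nltk.pos_tag(["time", "flies", "like", "an", "arrow"]))
# prints a list of (word, tag) pairs
```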
-
LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics
Representation for POS tagging
Formalize problem as classification of a 5-word context (using linear SVM; a toy sketch follows this slide)
Feature representation used for 5-word context:
suffix, shape
COUNT
UNSU: Brown clusters
PREDICT: Collobert & Weston
Question: Which representation works best for POS tagging: COUNT, UNSU or PREDICT?
Schnabel & Schütze: POS tagging 51 / 97
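A toy sketch of this window-classification setup, with random vectors standing in for the real distributional, suffix and shape features; an assumption-laden illustration, not the FLORS implementation.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_tokens, dim = 200, 20
word_vecs = rng.random((n_tokens, dim))   # stand-in per-word feature vectors
tags = rng.integers(0, 5, size=n_tokens)  # five toy POS tags

def window_features(vecs, i, pad):
    # Concatenate the vectors of positions i-2 .. i+2, padding at the edges.
    return np.concatenate([vecs[j] if 0 <= j < len(vecs) else pad
                           for j in range(i - 2, i + 3)])

pad = np.zeros(dim)
X = np.stack([window_features(word_vecs, i, pad) for i in range(n_tokens)])
clf = LinearSVC(C=1.0, max_iter=5000).fit(X, tags)  # linear SVM over the window
print(clf.predict(X[:3]))
```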
-
LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics
Representation for POS tagging
Schnabel & Schütze: POS tagging 52 / 97
-
LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics
POS tagging: Results
          newsgroups       reviews          weblogs
          ALL     OOV      ALL     OOV      ALL     OOV
COUNT     90.86   66.42    92.95   75.29    94.71   83.64
UNSU      90.34∗  62.41∗   92.23∗  71.47∗   94.45   81.76
PREDICT   90.57   64.57    92.54∗  72.48∗   94.51   80.58∗

          answers          emails
          ALL     OOV      ALL     OOV
COUNT     90.30   62.15    89.44   62.61
UNSU      89.71∗  56.28∗   89.02∗  63.20
PREDICT   90.23   60.99    89.44   63.13
Schnabel & Schütze: POS tagging 56 / 97
-
LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics
POS tagging: Results
          newsgroups       reviews          weblogs
          ALL     OOV      ALL     OOV      ALL     OOV
COUNT     90.86   66.42    92.95   75.29    94.71   83.64
PREDICT   90.57   64.57    92.54∗  72.48∗   94.51   80.58∗
UNSU      90.34∗  62.41∗   92.23∗  71.47∗   94.45   81.76

          answers          emails           INDOMAIN
          ALL     OOV      ALL     OOV      ALL     OOV
COUNT     90.30   62.15    89.44   62.61    96.59   90.37
PREDICT   90.23   60.99    89.44   63.13    96.72   90.48
UNSU      89.71∗  56.28∗   89.02∗  63.20    96.48∗  87.50
Schnabel & Schütze: POS tagging 57 / 97
-
LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics
Best representation for POS tagging
NONE: No representation
LING: Representations based on linguistic resources
COUNT: Count vectors
UNSU: Rep’s learned by unsupervised learning
PREDICT: Predict vectors
And the winner is: COUNT
Schnabel & Schütze: POS tagging 58 / 97
-
LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics
POS tagging: Why is COUNT best?
NONE: No representation
LING: Representations based on linguistic resources
COUNT: Count vectors
UNSU: Rep’s learned by unsupervised learning
PREDICT: Predict vectors
COUNT is better than NONE because representation learning (doing some adaptation vs no adaptation) works in this case.
Why is COUNT better than UNSU and PREDICT?
Hypothesis: POS tagging is a very simple problem, so you don’t need a complex representation learning formalism.
Schnabel & Schütze: POS tagging 59 / 97
-
LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics
Outline
1 “Traditional” computational linguistics representations
2 Count vector representations
3 Deep learning representations
4 Task 1: Part-of-speech (POS) tagging
5 Task 2: Morphological (MORPH) tagging
6 Task 3: Sentiment analysis
7 Task 4: Semantic similarity between words
8 Conclusion
Müller, Schmid, Schütze (in progress): MORPH tagging 60 / 97
-
LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics
This section based on: Müller, Schmid & Schütze. Domain Adaptation for Morphological Tagging. In progress.
Müller, Schmid, Schütze (in progress): MORPH tagging 61 / 97
-
LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics
Task: Morphological (MORPH) tagging
Disambiguate both part-of-speech and morphological features
Example:
Ein ART case=nom|number=sg|gender=neut
Klettergebiet NN case=nom|number=sg|gender=neut
macht VVFIN number=sg|person=3|tense=pres|mood=ind
Geschichte NN case=acc|number=sg|gender=fem
Part-of-speech disambiguation: ART, NN, VVFIN
Morphological disambiguation: case=nom, number=sg, tense=pres, mood=ind, etc.
Müller, Schmid, Schütze (in progress): MORPH tagging 62 / 97
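The morphological tag is just a bundle of attribute=value features; a small hypothetical helper makes that structure explicit:

```python
def parse_morph(tag):
    # "case=nom|number=sg|gender=neut" -> {"case": "nom", "number": "sg", ...}
    return dict(field.split("=") for field in tag.split("|"))

print(parse_morph("number=sg|person=3|tense=pres|mood=ind"))
# {'number': 'sg', 'person': '3', 'tense': 'pres', 'mood': 'ind'}
```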
-
LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics
Representation for MORPH tagging
Formalize problem as sequence classification (using higher-order CRF: MarMoT; a simplified CRF sketch follows this slide)
Feature representation used for each token:
NONE (word index), suffix, shape
UNSU: SVD, Brown clusters
PREDICT: polyglot (Al-Rfou et al.)
LING: finite-state morphology (manually created linguistic resource)
Question: Which representation works best for MORPH tagging: NONE, LING, UNSU or PREDICT?
Müller, Schmid, Schütze (in progress): MORPH tagging 63 / 97
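A simplified sketch of the feature-based sequence model, assuming the sklearn-crfsuite package; note it trains a first-order CRF, not MarMoT's higher-order pruned CRF, and the features below only gesture at the slide's NONE/suffix/shape set.

```python
import sklearn_crfsuite  # pip install sklearn-crfsuite

def token_features(sent, i):
    w = sent[i]
    return {
        "word": w.lower(),  # NONE-style word identity
        "suffix3": w[-3:],  # suffix feature
        "shape": "".join("X" if c.isupper() else "x" if c.islower()
                         else "d" if c.isdigit() else c for c in w),
    }

train_sents = [["Ein", "Klettergebiet", "macht", "Geschichte"]]
train_tags = [["ART", "NN", "VVFIN", "NN"]]
X = [[token_features(s, i) for i in range(len(s))] for s in train_sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, train_tags)
print(crf.predict(X))
```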
-
LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics
MORPH tagging: In domain results
     SVMTool  Morfette  MarMoT
     NONE     NONE      NONE   UNSU1  UNSU2  PREDICT  LING
cs   91.06    91.48     93.86  94.15  94.16  94.13    94.52
hu   94.72    95.47     96.14  96.45  96.47  96.46    96.84
Müller, Schmid, Schütze (in progress): MORPH tagging 68 / 97
-
LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics
MORPH tagging: Results
     SVMTool  Morfette  MarMoT
     NONE     NONE      NONE   UNSU1  UNSU2  PREDICT  LING
cs   75.28    76.04     78.01  78.44  78.51  78.42    78.88
hu   88.44    89.18     89.77  90.52  90.41  90.88    91.24
Müller, Schmid, Schütze (in progress): MORPH tagging 69 / 97
-
LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics
Best representation for morphology DA
NONE: No representation
LING: Representations based on linguistic resources
COUNT: Count vectors
UNSU: Rep’s learned by unsupervised learning
PREDICT: Predict vectors
And the winner is: LING
Müller, Schmid, Schütze (in progress): MORPH tagging 70 / 97
-
LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics
MORPH tagging: Why is LING best?
NONE: No representation
LING: Representations based on linguistic resources
COUNT: Count vectors
UNSU: Rep’s learned by unsupervised learning
PREDICT: Predict vectors
Hypothesis: Learning morphological paradigms is actually a pretty hard problem. So the representation learning algorithms failed?
Müller, Schmid, Schütze (in progress): MORPH tagging 71 / 97
-
LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics
Discussion
Morphology is more Zipfian.
This is a difference between English (morphologically poor)and Czech / Hungarian (morphologically rich).
Something like gender is difficult to infer from count vectors.
Müller, Schmid, Schütze (in progress): MORPH tagging 72 / 97
-
LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics
Outline
1 “Traditional” computational linguistics representations
2 Count vector representations
3 Deep learning representations
4 Task 1: Part-of-speech (POS) tagging
5 Task 2: Morphological (MORPH) tagging
6 Task 3: Sentiment analysis
7 Task 4: Semantic similarity between words
8 Conclusion
Chen et al.: Sentiment 73 / 97
-
LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics
This section based on: Chen, Xu, Weinberger, Sha. Marginalized denoising autoencoders for domain adaptation. ICML 2012
Chen et al.: Sentiment 74 / 97
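For orientation, a minimal NumPy sketch of one mSDA layer along the lines of the paper: because feature corruption is marginalized out, the denoising weights have a closed-form solution (the ridge term reg is an added assumption for numerical stability).

```python
import numpy as np

def msda_layer(X, p, reg=1e-5):
    """One marginalized denoising autoencoder layer (after Chen et al. 2012).
    X: d x n matrix (features x examples), p: feature corruption probability."""
    d, n = X.shape
    Xb = np.vstack([X, np.ones((1, n))])       # append a bias row
    q = np.full(d + 1, 1.0 - p)
    q[-1] = 1.0                                # the bias is never corrupted
    S = Xb @ Xb.T                              # scatter matrix
    Q = S * np.outer(q, q)                     # E[corrupted scatter], off-diagonal
    np.fill_diagonal(Q, q * np.diag(S))        # diagonal: S_ii * q_i
    P = S[:d, :] * q                           # E[cross scatter], columns scaled by q
    W = P @ np.linalg.inv(Q + reg * np.eye(d + 1))  # closed-form denoising weights
    return np.tanh(W @ Xb)                     # nonlinearity between stacked layers

# Example: a 2-layer stack on random "document" vectors.
X = np.random.default_rng(0).random((50, 100))
h1 = msda_layer(X, p=0.5)
h2 = msda_layer(h1, p=0.5)
print(h1.shape, h2.shape)  # (50, 100) (50, 100)
```

Stacking several such layers (feeding h back in as the next X) and combining their outputs with the original features yields the representation the paper feeds to a standard classifier.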
-
LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics
Task: Sentiment analysis
For a review (of a book, a camera, a washing machine, etc.): determine if the review has positive polarity or negative polarity.
Chen et al.: Sentiment 75 / 97
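As a baseline sketch with made-up mini-data (COUNT-style bag-of-words features and a linear classifier; not the paper's mSDA pipeline):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

reviews = ["great camera, sharp pictures", "the book is dull and useless",
           "excellent value for the money", "broke after two days"]
labels = ["positive", "negative", "positive", "negative"]

vec = CountVectorizer()                    # bag-of-words (COUNT-style) features
X = vec.fit_transform(reviews)
clf = LogisticRegression().fit(X, labels)
print(clf.predict(vec.transform(["sharp pictures, excellent value"])))
```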
-
LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics
Example of a review
I photograph almost 45 years and now is photography as my job and I am as a member of The Royal Photographic Society in England. I had bought the photographic books as new one and in the secondhand bookstore. I have at this time maybe 2 meters long a queue of these photographic books in english, german and czech language. I know, what is important information for photographer and what is the value the information in the proper time. ... I summarize the impression from this book: I can very hard recommend this book not only for beginner but so for advanced photographer with very strong interest ybout close-up photography.
categories: positive / neutral / negative
classification decision: positive
Chen et al.: Sentiment 76 / 97
-
LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics