Transcript of: Hinrich Schütze 2015-11-04 - uni-muenchen.de › ~hs › teach › 15w › pmclii › pdf ›...

  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    Representation Learning for Domain Adaptation

    Hinrich Schütze

    Center for Information and Language Processing, University of Munich

    2015-11-04

    Schütze: Representation learning for domain adaptation 1 / 97

    Overview

    1 “Traditional” computational linguistics representations

    2 Count vector representations

    3 Deep learning representations

    4 Task 1: Part-of-speech (POS) tagging

    5 Task 2: Morphological (MORPH) tagging

    6 Task 3: Sentiment analysis

    7 Task 4: Semantic similarity between words

    8 Conclusion


    Representations created by (computational) linguists

    Generative lexicon (Pustejovsky) entry for “build”:


    Representations created by (computational) linguists

    Lexicon entry for “obeshchat’” in Tolkovo-kombinatornyj Slovar’ Sovremennogo Russkogo Jazyka:


    Representations created by (computational) linguists

    Morphological paradigm of French verb “faire”:



    “Traditional” representations in computational linguistics

    Motivated by linguistic theory

    Many successes in practical applications

    So why would we need any other representation in computational linguistics?


    Why learned representations: Problems with LING reps

    Coverage

    Domain dependence

    Noise / need for robustness

    Manual creation of representations for rich semantics / world knowledge: unsolved problem



    Problems for traditional CL: Coverage

    Natural languages are productive: new words and meanings are created all the time.

    Example: “unfriend”

    “A new study from a University of Colorado Denver grad student attempts to uncover what types of people we are most likely to unfriend.”

    Traditional CL: New words are not covered.

    Representation learning: Representations for new words can be automatically learned.



    Problems for traditional CL: Domain dependence

    The language in many NLP applications has domain-specific properties.

    Example: Patents

    An apparatus for winding fence material comprising a leading edge portion, wherein said apparatus is comprised of a first shaft . . .

    The word “said” is used as a demonstrative here.

    Traditional CL: Domain-dependent usage not covered.

    Representation learning: Representations for domain-dependent usage can be automatically learned.


    Problems for traditional CL: Noise / need for robustness

    Tweet: “water” = “what are”

    Amazon review: “since i was young i always dreamed of going to walt disney world, but no that i live in florida i go there every chance i get!but the days i cant go i just play this game, its like being on the rides themselves!not to easy for you to beat in a day”


    Problems for traditional CL: Rich semantics / world knowledge

    [Concept map centered on “spider”:]

    Taxonomy: spiders are animals, similar to insects

    Attributes of spiders: venomous, fuzzy, small, fast-moving

    Typical actions of spiders: bite, prey, weave, burrow

    Meaning is very heterogeneous: abstract, concrete, sensory, core semantics, world knowledge

    Traditional CL: Difficult to represent all this in a computationally useful way

    Representation learning: deal well with heterogeneity of meaning


    Problems for traditional CL: Zipf

    The long tail of language use

    The adjective “hard”: die hard, hard by, hard and fast, hard copy, hard back, hard core, hard disk, hard drive, hard drugs, hard earned, hard hit, hard rock, hard going, hard nosed, hard of hearing, hard put, hard to get, hard way

    Not just memorization: “hard and fast” → “fast and hard”, “hard back” → “hard bound”, “hard rock” → “hard punk”, “truly knuckles-scraping-against-asphalt hard”


    Why learned representations: Problems with LING reps

    Coverage

    Domain dependence

    Noise / need for robustness

    Manual creation of representations for rich semantics / world knowledge: unsolved problem



    Types of representations used in NLP

    NONE: No representation, except for word index

    LING: Representations based on linguistic resources

    COUNT: Count vectors

    UNSU: Representations learned by unsupervised learning

    PREDICT: Representations learned by supervised learning: embeddings / predict vectors

    Next: COUNT vector models



    Count vector models

    Dimensionality is vocabulary V (or large subset thereof)

    Value of dimension i of the distributional representation of word v: (weighted) cooccurrence count of v and w_i

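As a concrete sketch of this definition (the vocabulary is restricted to three illustrative context words, and the cooccurrence table is assumed precomputed, with numbers taken from the Wikipedia example on the following slides):

```python
# Count vector: dimension i of the representation of word v is the
# cooccurrence count of v with the i-th context word w_i.
context_words = ["silver", "disease", "society"]  # stand-in for the full vocabulary V

# assumed precomputed cooccurrence table (counts from the slides)
cooc = {
    ("rich", "silver"): 186, ("poor", "silver"): 34,
    ("rich", "disease"): 17, ("poor", "disease"): 162,
    ("rich", "society"): 143, ("poor", "society"): 228,
}

def count_vector(v, context_words, cooc):
    """Count vector of word v: one dimension per context word."""
    return [cooc.get((v, w), 0) for w in context_words]

print(count_vector("rich", context_words, cooc))  # [186, 17, 143]
```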


    Count vector model: The counts

    Count the cooccurrence of two words in a large corpus

    E.g., cooccurrence = cooccurrence within k = 10 words

    Example counts from Wikipedia

    cooc.(rich,silver) = 186
    cooc.(poor,silver) = 34
    cooc.(rich,disease) = 17
    cooc.(poor,disease) = 162
    cooc.(rich,society) = 143
    cooc.(poor,society) = 228


    Count vector model: Vectors

    [Plot: the count vectors of “silver”, “disease”, and “society” in the two-dimensional space with dimensions “poor” and “rich”]

    cooc.(poor,silver) = 34, cooc.(rich,silver) = 186
    cooc.(poor,disease) = 162, cooc.(rich,disease) = 17
    cooc.(poor,society) = 228, cooc.(rich,society) = 143


    Count vector model: Similarity

    [Plot: the vectors of “gold”, “silver”, “disease”, and “society” in the “poor”/“rich” plane]

    The similarity between two words is the cosine of the angle between them.

    Small angle: gold and silver are similar.

    Large angle: gold and disease are not similar.

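With the counts from the previous slides, cosine similarity can be computed directly (a sketch; the slides do not give coordinates for “gold”, so the comparison below uses the three words whose counts are listed):

```python
import math

def cosine(u, v):
    """Cosine of the angle between two count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

# (poor, rich) cooccurrence counts from the slides
silver = (34, 186)
disease = (162, 17)
society = (228, 143)

print(round(cosine(silver, society), 2))  # 0.68: smaller angle
print(round(cosine(silver, disease), 2))  # 0.28: large angle, not similar
```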

    Types of representations used in NLP

    NONE: No representation, except for word index

    LING: Representations based on linguistic resources

    COUNT: Count vectors

    UNSU: Representations learned by unsupervised learning

    PREDICT: Representations learned by supervised learning: embeddings / predict vectors

    Next: PREDICT: predict vectors in deep learning



    Terminology

    A distributed representation

    is simply a vector representation, i.e., a point in a high-dimensionalreal-valued space. Implicit in the concept of distributedrepresentation is that similarity/distance is interpretable. E.g.,representing a 1000x1000 binary pixel image as a one milliondimensional vector is not distributed since similarity/distance doesnot have an intuitive interpretation.

    Embeddings/predict vectors, count vectors, representations learned by unsupervised learning: these are all distributed representations. (Linguistic resources usually do not provide distributed representations.)

    Schütze: Representation learning for domain adaptation 23 / 97
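    The claim that similarity/distance is interpretable can be made concrete with cosine similarity. A minimal sketch with invented toy vectors (illustrative values, not real embeddings):

    ```python
    import numpy as np

    def cosine(u, v):
        # Cosine similarity: the usual interpretable measure for distributed representations
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    # Toy 3-dimensional distributed representations (invented for illustration)
    gold = np.array([8.0, 1.0, 0.5])
    silver = np.array([7.5, 1.2, 0.4])
    disease = np.array([0.3, 6.0, 5.5])

    # Nearby points = similar words, distant points = dissimilar words
    assert cosine(gold, silver) > cosine(gold, disease)
    ```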


  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    Terminology (cont.)

    A distributional representation

    can be defined (i) as a representation based on distributional information, i.e., on the distribution of words in contexts in a large corpus, or (ii) as a synonym of distributed representation.

    Count vectors are distributional representations according to definition (i). Embeddings/predict vectors and representations learned by unsupervised learning may or may not be viewed as distributional representations according to definition (ii), since the link to the distribution of words in contexts is more indirect in this case.

    Schütze: Representation learning for domain adaptation 24 / 97

  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    Outline

    1 “Traditional” computational linguistics representations

    2 Count vector representations

    3 Deep learning representations

    4 Task 1: Part-of-speech (POS) tagging

    5 Task 2: Morphological (MORPH) tagging

    6 Task 3: Sentiment analysis

    7 Task 4: Semantic similarity between words

    8 Conclusion

    Schütze: Representation learning for domain adaptation 25 / 97


  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    Deep learning as a gestalt

    Automatic learning of features (as opposed to hand-designed features)

    nonlinear

    Representation learning: embeddings or predict vectors

    “deep” = “multi-layer architectures”

    Schütze: Representation learning for domain adaptation 26 / 97

  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    Geoff Hinton on Automatic Feature Learning

    Adding a layer of hand-coded features . . . makes them much more powerful but the hard bit is designing the features. We need to automate the loop of designing features for a particular task and seeing how well they work.

    Schütze: Representation learning for domain adaptation 27 / 97

  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    Opposing view: Feature design still is important

    Solving any complex task requires domain expertise.

    Domain expertise can be used in various ways: definition of task to be learned, collection and composition of training data, architecture of machine learning system, design of representation, design of features

    It’s unclear why domain expertise should be used for some of these, but not for feature design.

    Schütze: Representation learning for domain adaptation 28 / 97


  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    Deep learning: Nonlinearity

    SVMs (perhaps the main competitor of neural networks) are more efficient and have a better understood theory than neural networks.

    But they are linear.

    The only “knob” you can turn is the kernel that gives the learner access to similarity in a complex representation space.

    Guess: Real life is complicated and often nonlinear.

    Neural networks offer more flexibility in learning complex decision boundaries.

    Schütze: Representation learning for domain adaptation 29 / 97


  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    Deep learning: Representation learning + Architectures

    Representation learning

    Supervised learning to train embeddings or predict-vectors for words

    Architectures: deep = multilayer

    Use trained predict-vectors/embeddings in a deep, multilayer neural network architecture

    Schütze: Representation learning for domain adaptation 30 / 97

  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    Embeddings = predict vectors (Schwenk & Koehn 2008)

    Language modeling task: predict the next word wj from the n − 1 preceding words wj−n+1, wj−n+2, . . . , wj−1

    Input representation of words: one-hot vectors

    Output/target: multinomial classification, V classes, where V is the size of the vocabulary

    Embedding layer for learning predict vectors. There is only one embedding per word, independent of position.

    Complex nonlinear decision surfaces can be learned due to the hidden layer.

    Embeddings/predict vectors are learned by backpropagating the prediction error.

    Schütze: Representation learning for domain adaptation 31 / 97
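    The pipeline described on this slide (one-hot input, shared embedding layer, hidden layer, softmax over V classes, prediction error backpropagated into the embeddings) can be sketched in a few lines of numpy. Sizes and data are toy values; this is an illustrative reconstruction, not the authors' implementation:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    V, d, h, n = 10, 4, 8, 3                  # vocab size, embedding dim, hidden dim, n-gram order
    E = rng.normal(0, 0.1, (V, d))            # embedding table: one row per word, shared across positions
    W1 = rng.normal(0, 0.1, ((n - 1) * d, h))
    W2 = rng.normal(0, 0.1, (h, V))

    def forward(context):
        """context: n-1 word ids; returns (input vector, hidden layer, softmax over V words)."""
        x = E[context].reshape(-1)            # looking up row i of E == multiplying a one-hot vector by E
        hid = np.tanh(x @ W1)                 # hidden layer: enables nonlinear decision surfaces
        logits = hid @ W2
        p = np.exp(logits - logits.max())     # multinomial classification over V classes
        return x, hid, p / p.sum()

    def loss(context, target):
        return -np.log(forward(context)[2][target])

    def train_step(context, target, lr=0.5):
        """Backpropagate the prediction error, including into the shared embedding rows."""
        global W1, W2
        x, hid, p = forward(context)
        dlogits = p.copy()
        dlogits[target] -= 1.0                # gradient of softmax + cross-entropy
        dtanh = (W2 @ dlogits) * (1 - hid ** 2)
        dx = W1 @ dtanh                       # gradient w.r.t. the concatenated embeddings
        W2 = W2 - lr * np.outer(hid, dlogits)
        W1 = W1 - lr * np.outer(x, dtanh)
        for i, w in enumerate(context):       # the embeddings are learned from the prediction error
            E[w] -= lr * dx[i * d:(i + 1) * d]

    before = loss([1, 2], 3)
    for _ in range(50):
        train_step([1, 2], 3)                 # toy training: repeatedly predict word 3 after words 1, 2
    assert loss([1, 2], 3) < before
    ```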


  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    Embeddings/predict vectors: Comments

    Low dimensionality, can be used efficiently for a wide range of NLP tasks

    Supervised training, can in theory learn arbitrarily complex phenomena

    Rare events as well as frequent events

    Complex contextual dependencies

    Word order is respected.

    Very cautious independence assumptions compared to count vectors (high-order Markov assumption)

    Many different approaches to learning embeddings/predict vectors

    Schütze: Representation learning for domain adaptation 32 / 97

  • [Figure: t-SNE visualization of a small subset of embeddings/predict vectors. Clusters: football clubs (FC Barcelona, Man Utd, Arsenal FC, Inter Milan FC, Schalke, AC Milan, Bayern); cities and places (Barcelona, England, London, Berlin, Washington, Los Angeles, LA, Rome, Paris, NY, WA, Chicago, Reading UK, Reading PA, village, town, city, island, park); reading/learning-related words (Reading VERB, Learning, vocabulary, poetry, composing, semantics, translating, terminology, writing)]
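    Maps like the one above are typically produced with off-the-shelf t-SNE. A hedged sketch, with random vectors standing in for real embeddings and scikit-learn assumed to be available:

    ```python
    import numpy as np
    from sklearn.manifold import TSNE

    rng = np.random.default_rng(0)
    # Stand-ins for embeddings: two loose clusters (think "clubs" vs. "cities")
    emb = np.vstack([rng.normal(0, 0.1, (5, 50)) + 1.0,
                     rng.normal(0, 0.1, (5, 50)) - 1.0])

    # Project the 50-dimensional vectors to 2D for plotting; perplexity must be < #points
    coords = TSNE(n_components=2, perplexity=3, init="random",
                  random_state=0).fit_transform(emb)
    assert coords.shape == (10, 2)
    ```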


  • Deep learning: Deep architectures

    Lookup table: this is where the learned embeddings are retrieved and fed into the network

    Example of complex learning architecture: convolution, max over time, hidden layer

    Meaning of the logical operator “or” ≠ embedding of “or”
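    The "convolution, max over time" step mentioned above can be sketched as follows (toy sizes, plain numpy, the filter application written out naively for clarity):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    T, d, f, w = 7, 4, 5, 3      # sentence length, embedding dim, number of filters, filter width

    X = rng.normal(size=(T, d))       # embeddings for one sentence, as retrieved from the lookup table
    F = rng.normal(size=(f, w * d))   # each row is one convolution filter

    # Convolution over time: every filter sees every window of w consecutive word embeddings
    windows = np.stack([X[t:t + w].reshape(-1) for t in range(T - w + 1)])  # (T-w+1, w*d)
    conv = windows @ F.T                                                    # (T-w+1, f)

    # Max over time: one value per filter, a fixed-size sentence vector for any sentence length
    pooled = conv.max(axis=0)
    assert pooled.shape == (f,)
    ```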

  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    Vision: Deep network, automatic features

    Schütze: Representation learning for domain adaptation 35 / 97

  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    Domain knowledge built in, end-to-end

    Schütze: Representation learning for domain adaptation 36 / 97

  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    Key to remember: these are PREDICT vectors

    Schütze: Representation learning for domain adaptation 37 / 97

  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    Key to remember: these are COUNT vectors

    [Figure: words as points in a two-dimensional count-vector space; axes = co-occurrence counts with “rich” and “poor”; points plotted for gold, silver, disease, society]

    Schütze: Representation learning for domain adaptation 38 / 97
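    Count vectors like the ones plotted above come directly from co-occurrence counting. A toy sketch with an invented mini-corpus and an arbitrary window size:

    ```python
    from collections import Counter
    import numpy as np

    corpus = ("gold and silver are rich metals . silver is a rich metal . "
              "disease keeps the poor poor . the poor suffer from disease .").split()

    dims = ["rich", "poor"]   # the two context words used as axes

    def count_vector(word, window=4):
        # Count how often each axis word occurs within +-window tokens of `word`
        counts = Counter()
        for i, w in enumerate(corpus):
            if w == word:
                counts.update(corpus[max(0, i - window): i + window + 1])
        return np.array([counts[dim] for dim in dims], dtype=float)

    # gold and silver point in the same direction; disease points toward "poor"
    assert count_vector("silver")[0] > count_vector("silver")[1]
    assert count_vector("disease")[1] > count_vector("disease")[0]
    ```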


  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    Count vectors vs Embeddings/Predict Vectors

                               COUNT           PREDICT
    dimensionality             high            low to medium
    learning regime            unsupervised    supervised
    complex linguistic context hard to model   easier to model
    rare event coverage        poor            good
    independence assumptions   strong          weak
                               simple          complex
                               efficient       long training times
                               elegant         messy

    Schütze: Representation learning for domain adaptation 39 / 97

  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    Vision: Deep network, automatic features

    For natural language: what corresponds to pixels? what corresponds to edges? what corresponds to object parts? what corresponds to object models?

    Schütze: Representation learning for domain adaptation 40 / 97

  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    Deep learning as a gestalt

    Automatic learning of features (as opposed to hand-designed features)

    nonlinear

    Representation learning: embeddings or predict vectors

    “deep” = “multi-layer architectures”

    Schütze: Representation learning for domain adaptation 41 / 97


  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    Deep learning: Why now?

    Moore’s law

    Big data: Several orders of magnitude more than in 80s / 90s

    Better understanding of how to train very complex networks: initialization, regularization, much expanded bag of tricks

    Canonical machine learning stuck? – Great strides, but not recently

    Diverse knowledge about the domain can be integrated into a neural network architecture in a very flexible way – but it still can be trained end-to-end.

    Schütze: Representation learning for domain adaptation 42 / 97


  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    Types of representations used in NLP

    NONE: No representation, except for word index (typical approach to supervised training in NLP is to have no initial representation of a word)

    LING: Representations based on linguistic resources

    COUNT: Count vectors

    UNSU: Representations learned by unsupervised learning: SVD, LSI, PLSI, NMF, Hellinger PCA, MSDA, (Brown) clustering

    PREDICT: Representations learned by supervised learning:embeddings / predict vectors

    Next: Which representation is best for NLP?

    Schütze: Representation learning for domain adaptation 43 / 97
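    For the UNSU category, the classic recipe is SVD over a count matrix. A minimal sketch with invented toy counts:

    ```python
    import numpy as np

    # Toy word-by-context count matrix (rows: gold, silver, disease; columns: context words)
    M = np.array([[8.0, 1.0, 0.5],
                  [7.5, 1.2, 0.4],
                  [0.3, 6.0, 5.5]])

    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    k = 2
    reduced = U[:, :k] * s[:k]   # k-dimensional word representations, learned without supervision

    assert reduced.shape == (3, 2)
    assert np.allclose((U * s) @ Vt, M)   # the full SVD reconstructs M exactly
    ```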


  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    Which representation is best for domain adaptation?

    Task 1: Part-of-speech (POS) tagging(very low complexity task)

    Task 2: Morphological (MORPH) tagging(low complexity task)

    Task 3: Sentiment(medium complexity task)

    Task 4: Semantic similarity(high complexity task)

    Schütze: Representation learning for domain adaptation 44 / 97

  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    Problem setting: Domain adaptation

    [Slides 45–47 illustrate the setting with figures not reproduced in the transcript.]

    Schütze: Representation learning for domain adaptation 45–47 / 97

  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    Outline

    1 “Traditional” computational linguistics representations

    2 Count vector representations

    3 Deep learning representations

    4 Task 1: Part-of-speech (POS) tagging

    5 Task 2: Morphological (MORPH) tagging

    6 Task 3: Sentiment analysis

    7 Task 4: Semantic similarity between words

    8 Conclusion

    Schnabel & Schütze: POS tagging 48 / 97

  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    This section based on: Schnabel & Schütze. FLORS: Fast and Simple Domain Adaptation for Part-of-Speech Tagging. Transactions of the Association for Computational Linguistics (TACL), 2:15–26, 2014

    Schnabel & Schütze: POS tagging 49 / 97

  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    Task: Part-of-speech (POS) tagging

    Disambiguate part-of-speech (syntactic category) in context

    Example:
      time   NN
      flies  VBZ
      like   IN
      an     DT
      arrow  NN

    “flies” can be a form of the verb “to fly” or the plural of the noun “fly”.

    It is correctly disambiguated here.

    Schnabel & Schütze: POS tagging 50 / 97

  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    Representation for POS tagging

    Formalize problem as classification of a 5-word context (using linear SVM)

    Feature representation used for 5-word context:

      suffix, shape
      COUNT
      UNSU: Brown clusters
      PREDICT: Collobert & Weston

    Question: Which representation works best for POS tagging: COUNT, UNSU or PREDICT?

    Schnabel & Schütze: POS tagging 51 / 97
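The setup above can be sketched end to end. The templates below mimic the 5-word window plus suffix and shape features; a multiclass perceptron stands in for the linear SVM (an assumption made to keep the example self-contained; this is not the FLORS implementation):

```python
from collections import defaultdict

def window_features(words, i, pad="<PAD>"):
    """Binary features for the 5-word window centred on position i."""
    padded = [pad, pad] + words + [pad, pad]
    feats = [f"w[{k}]={padded[i + 2 + k]}" for k in range(-2, 3)]
    feats.append(f"suffix={words[i][-2:]}")                          # suffix feature
    feats.append(f"shape={'X' if words[i][0].isupper() else 'x'}")   # crude shape feature
    return feats

class Perceptron:
    """Multiclass perceptron; a stand-in for the linear SVM of the slides."""
    def __init__(self, labels):
        self.labels = labels
        self.w = defaultdict(float)   # (label, feature) -> weight

    def score(self, label, feats):
        return sum(self.w[(label, f)] for f in feats)

    def predict(self, feats):
        return max(self.labels, key=lambda y: self.score(y, feats))

    def train(self, data, epochs=50):
        for _ in range(epochs):
            for feats, gold in data:
                guess = self.predict(feats)
                if guess != gold:
                    for f in feats:
                        self.w[(gold, f)] += 1.0
                        self.w[(guess, f)] -= 1.0

sentences = [
    (["time", "flies", "like", "an", "arrow"], ["NN", "VBZ", "IN", "DT", "NN"]),
    (["fruit", "flies", "like", "a", "banana"], ["NN", "NNS", "VB", "DT", "NN"]),
]
data = [(window_features(ws, i), ts[i])
        for ws, ts in sentences for i in range(len(ws))]
tagger = Perceptron(labels=sorted({t for _, ts in sentences for t in ts}))
tagger.train(data)
print(tagger.predict(window_features(["time", "flies", "like", "an", "arrow"], 1)))
```

On the two toy sentences the surrounding words are what disambiguate the two readings of “flies”, which is exactly the role the window features play.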

  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    Representation for POS tagging

    Schnabel & Schütze: POS tagging 52 / 97

  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    POS tagging: Results

                newsgroups        reviews           weblogs
                ALL     OOV       ALL     OOV       ALL     OOV
    COUNT       90.86   66.42     92.95   75.29     94.71   83.64
    UNSU        90.34∗  62.41∗    92.23∗  71.47∗    94.45   81.76
    PREDICT     90.57   64.57     92.54∗  72.48∗    94.51   80.58∗

                answers           emails
                ALL     OOV       ALL     OOV
    COUNT       90.30   62.15     89.44   62.61
    UNSU        89.71∗  56.28∗    89.02∗  63.20
    PREDICT     90.23   60.99     89.44   63.13

    Schnabel & Schütze: POS tagging 56 / 97

  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    POS tagging: Results

                newsgroups        reviews           weblogs
                ALL     OOV       ALL     OOV       ALL     OOV
    COUNT       90.86   66.42     92.95   75.29     94.71   83.64
    PREDICT     90.57   64.57     92.54∗  72.48∗    94.51   80.58∗
    UNSU        90.34∗  62.41∗    92.23∗  71.47∗    94.45   81.76

                answers           emails            INDOMAIN
                ALL     OOV       ALL     OOV       ALL     OOV
    COUNT       90.30   62.15     89.44   62.61     96.59   90.37
    PREDICT     90.23   60.99     89.44   63.13     96.72   90.48
    UNSU        89.71∗  56.28∗    89.02∗  63.20     96.48∗  87.50

    Schnabel & Schütze: POS tagging 57 / 97

  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    Best representation for POS tagging

    NONE: No representation

    LING: Representations based on linguistic resources

    COUNT: Count vectors

    UNSU: Rep’s learned by unsupervised learning

    PREDICT: Predict vectors

    And the winner is: COUNT

    Schnabel & Schütze: POS tagging 58 / 97

  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    POS tagging: Why is COUNT best?

    NONE: No representation

    LING: Representations based on linguistic resources

    COUNT: Count vectors

    UNSU: Rep’s learned by unsupervised learning

    PREDICT: Predict vectors

    COUNT is better than NONE because representation learning (doing some adaptation vs. no adaptation) works in this case.

    Why is COUNT better than UNSU and PREDICT?

    Hypothesis: POS tagging is a very simple problem, so you don’t need a complex representation learning formalism.

    Schnabel & Schütze: POS tagging 59 / 97

  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    Outline

    1 “Traditional” computational linguistics representations

    2 Count vector representations

    3 Deep learning representations

    4 Task 1: Part-of-speech (POS) tagging

    5 Task 2: Morphological (MORPH) tagging

    6 Task 3: Sentiment analysis

    7 Task 4: Semantic similarity between words

    8 Conclusion

    Müller, Schmid, Schütze (in progress): MORPH tagging 60 / 97

  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    This section based on: Müller, Schmid & Schütze. Domain Adaptation for Morphological Tagging. In progress.

    Müller, Schmid, Schütze (in progress): MORPH tagging 61 / 97

  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    Task: Morphological (MORPH) tagging

    Disambiguate both part-of-speech and morphological features

    Example:
      Ein            ART    case=nom|number=sg|gender=neut
      Klettergebiet  NN     case=nom|number=sg|gender=neut
      macht          VVFIN  number=sg|person=3|tense=pres|mood=ind
      Geschichte     NN     case=acc|number=sg|gender=fem

    Part-of-speech disambiguation: ART, NN, VVFIN

    Morphological disambiguation: case=nom, number=sg, tense=pres, mood=ind, etc.

    Müller, Schmid, Schütze (in progress): MORPH tagging 62 / 97

  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    Representation for MORPH tagging

    Formalize problem as sequence classification (using higher-order CRF: MarMoT)

    Feature representation used for each token:

      NONE (word index), suffix, shape
      UNSU: SVD, Brown clusters
      PREDICT: polyglot (Al-Rfou et al.)
      LING: finite-state morphology (manually created linguistic resource)

    Question: Which representation works best for MORPH tagging: NONE, LING, UNSU or PREDICT?

    Müller, Schmid, Schütze (in progress): MORPH tagging 63 / 97
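The baseline (NONE-style) per-token features listed above (word index, suffix, shape) can be sketched as follows; the suffix lengths and the shape alphabet are assumptions for illustration, not MarMoT's actual feature templates:

```python
def token_features(word, vocab):
    """Baseline (NONE-style) features: word index, suffixes, shape."""
    # Word identity as an index into a known vocabulary (-1 for OOV words).
    feats = {"index": vocab.get(word, -1)}
    # Suffixes up to length 3 often capture case/number/gender endings.
    for n in (1, 2, 3):
        feats[f"suffix{n}"] = word[-n:]
    # Shape: collapse characters into X (upper), x (lower), d (digit).
    feats["shape"] = "".join(
        "X" if c.isupper() else "d" if c.isdigit() else "x" for c in word
    )
    return feats

vocab = {"Klettergebiet": 0, "macht": 1}
print(token_features("Klettergebiet", vocab))
```

For Czech and Hungarian, such suffix features are exactly where much of the morphological signal lives, which is why even the NONE baseline is nontrivial.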

  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    MORPH tagging: In domain results

         SVMTool  Morfette  MarMoT
         NONE     NONE      NONE   UNSU1  UNSU2  PREDICT  LING
    cs   91.06    91.48     93.86  94.15  94.16  94.13    94.52
    hu   94.72    95.47     96.14  96.45  96.47  96.46    96.84

    Müller, Schmid, Schütze (in progress): MORPH tagging 68 / 97

  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    MORPH tagging: Results

                    SVMTool  Morfette  MarMoT
                    NONE     NONE      NONE   UNSU1  UNSU2  PREDICT  LING
    Czech (cs)      75.28    76.04     78.01  78.44  78.51  78.42    78.88
    Hungarian (hu)  88.44    89.18     89.77  90.52  90.41  90.88    91.24

    Müller, Schmid, Schütze (in progress): MORPH tagging 69 / 97

  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    Best representation for morphology DA

    NONE: No representation

    LING: Representations based on linguistic resources

    COUNT: Count vectors

    UNSU: Rep’s learned by unsupervised learning

    PREDICT: Predict vectors

    And the winner is: LING

    Müller, Schmid, Schütze (in progress): MORPH tagging 70 / 97

  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    MORPH tagging: Why is LING best?

    NONE: No representation

    LING: Representations based on linguistic resources

    COUNT: Count vectors

    UNSU: Rep’s learned by unsupervised learning

    PREDICT: Predict vectors

    Hypothesis: Learning morphological paradigms is actually a pretty hard problem. So the representation learning algorithms failed?

    Müller, Schmid, Schütze (in progress): MORPH tagging 71 / 97

  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    Discussion

    Morphology is more Zipfian.

    This is a difference between English (morphologically poor) and Czech / Hungarian (morphologically rich).

    Something like gender is difficult to infer from count vectors.

    Müller, Schmid, Schütze (in progress): MORPH tagging 72 / 97

  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    Outline

    1 “Traditional” computational linguistics representations

    2 Count vector representations

    3 Deep learning representations

    4 Task 1: Part-of-speech (POS) tagging

    5 Task 2: Morphological (MORPH) tagging

    6 Task 3: Sentiment analysis

    7 Task 4: Semantic similarity between words

    8 Conclusion

    Chen et al.: Sentiment 73 / 97

  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    This section based on: Chen, Xu, Weinberger, Sha. Marginalized denoising autoencoders for domain adaptation. ICML 2012

    Chen et al.: Sentiment 74 / 97
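The mSDA of Chen et al. learns a denoising reconstruction in closed form by marginalizing over infinitely many random feature corruptions. The sketch below follows the spirit of the single-layer recipe (no bias dimension, toy random data); it is an illustration, not the reference implementation:

```python
import numpy as np

def mda_layer(X, p):
    """One marginalized denoising autoencoder layer (no bias term).

    X: d x n data matrix; p: per-feature corruption probability.
    Solves for the reconstruction W minimizing the expected squared
    loss over all corruptions, in closed form: W = P Q^{-1}.
    """
    d = X.shape[0]
    q = np.full(d, 1.0 - p)                 # survival probability per feature
    S = X @ X.T                             # scatter matrix
    Q = S * np.outer(q, q)                  # E[x_corrupt x_corrupt^T], off-diagonal
    np.fill_diagonal(Q, q * np.diag(S))     # a feature always co-survives with itself
    P = S * q[np.newaxis, :]                # E[x x_corrupt^T]
    W = np.linalg.solve(Q.T, P.T).T         # W = P Q^{-1}
    return np.tanh(W @ X)                   # nonlinearity applied after the solve

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 20))
H = mda_layer(X, p=0.5)
print(H.shape)
```

A useful sanity check: with corruption probability p = 0 the optimal reconstruction is the identity, so the layer reduces to tanh(X). Stacking such layers (feeding each output into the next) gives the deep variant used in the sentiment experiments.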

  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    Task: Sentiment analysis

    For a review (of a book, a camera, a washing machine etc.): determine if the review has positive polarity or negative polarity.

    Chen et al.: Sentiment 75 / 97

  • LING reps Count vectors Deep learning POS tagging MORPH tagging Sentiment Semantics

    Example of a review

    I photograph almost 45 years and now is photography as my job and I am as a member of The Royal Photographic Society in England. I had bought the photographic books as new one and in the secondhand bookstore. I have at this time maybe 2 meters long a queue of these photographic books in english, german and czech language. I know, what is important information for photographer and what is the value the information in the proper time. . . . I summarize the impression from this book: I can very hard recommend this book not only for beginner but so for advanced photographer with very strong interest ybout close-up photography.

    categories: positive / neutral / negative

    classification decision: positive

    Chen et al.: Sentiment 76 / 97
