An introduction to compositional models in distributional semantics
www.insight-centre.org
An Introduction to Compositional Models in Distributional Semantics
André Freitas
Supervisor: Edward Curry
Reading Group
Friday (22/11/2013)
Based on: Baroni et al. (2012)
Frege in Space: A Program for Compositional Distributional Semantics
The Paper
• Comprehensive (107 pages) introduction and overview of compositional distributional models.
Semantics for a Complex World
• Most semantic models have dealt with particular types of constructions, and have been carried out under very simplifying assumptions, in true lab conditions.
• If these idealizations are removed it is not clear at all that modern semantics can give a full account of all but the simplest sentences.
Sahlgren, 2013
Goal behind Compositional Distributional Models
• Principled and effective semantic models for coping with real world semantic conditions.
• Focus on semantic approximation.
• Applications
  – Semantic search.
  – Approximate semantic inference.
  – Paraphrase detection.
  – Semantic anomaly detection.
  – ...
Paraphrase Detection
• I find it rather odd that people are already trying to tie the Commission's hands in relation to the proposal for a directive, while at the same calling on it to present a Green Paper on the current situation with regard to optional and supplementary health insurance schemes.
• I find it a little strange to now obliging the Commission to a motion for a resolution and to ask him at the same time to draw up a Green Paper on the current state of voluntary insurance and supplementary sickness insurance.
=?
Solving the Problem: The Data-driven Way
• Distributional
  – Use vast corpora to extract the meaning of content words.
  – Provide a principled representation of distributional meaning.
• Compositional
  – These representations should be objects that compose together to form more complex meanings.
  – Content words should be able to combine with grammatical roles, in ways that account for the importance of structure in sentence meaning.
Distributional Semantics
• “Words occurring in similar (linguistic) contexts are semantically similar.”
• Practical way to automatically harvest word “meanings” on a large scale.
• meaning = linguistic context.
• The linguistic context can then be used as a surrogate for the word's semantic representation.
Vector Space Model
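[Figure: the words “child”, “husband” and “spouse” drawn as vectors in a space whose axes are the contexts c1, c2, ..., cn. Each coordinate (e.g. 0.7, 0.5) is a function of the number of times the word occurs in the corresponding context.]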
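As a rough illustration of how such vectors can be built, the sketch below counts co-occurrences in a tiny made-up corpus. The corpus, the choice of one sentence as one context window, and the use of raw counts as the weighting function are all assumptions for illustration only.

```python
from collections import Counter

# Toy corpus (hypothetical): each sentence is treated as one context window.
corpus = [
    "the child plays in the garden",
    "her husband works in the city",
    "his spouse works in the city",
]

def cooccurrence_vector(target, sentences, contexts):
    """Count how often each context word shares a sentence with `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        if target in tokens:
            counts.update(t for t in tokens if t != target)
    # Raw counts are the simplest possible "function" of the frequencies.
    return [counts[c] for c in contexts]

contexts = ["plays", "garden", "works", "city"]  # the axes c1..cn
vectors = {w: cooccurrence_vector(w, corpus, contexts)
           for w in ("child", "husband", "spouse")}
print(vectors)
```

In this toy setting “husband” and “spouse” end up with identical vectors because they occur in the same contexts, while “child” does not.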
Semantic Similarity/Relatedness
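[Figure: the same vector space (axes c1, c2, ..., cn; words “child”, “husband” and “spouse”), with the angle θ between two word vectors indicating their similarity/relatedness.]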
Similarity
• Distributional vectors allow a precise quantification of similarity.
• Measured by the distance between the corresponding vectors in the vector space (for example, via the angle θ between them).
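One common way to quantify this, assumed here, is the cosine of the angle between two vectors. The three vectors below are invented for illustration.

```python
import math

def cosine(u, v):
    """Cosine of the angle between vectors u and v (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical distributional vectors over three context dimensions.
child = [0.7, 0.5, 0.1]
husband = [0.1, 0.6, 0.8]
spouse = [0.2, 0.5, 0.9]

print(cosine(husband, spouse))  # close to 1: similar contexts
print(cosine(child, spouse))    # noticeably lower
```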
Semantic Approximation (Video)
Compositional Model
Compositional Semantics
• Can we extend DS to account for the meaning of phrases and sentences?
Compositionality
• The meaning of a complex expression is a function of the meaning of its constituent parts.
Example: “carnivorous plants digest slowly”
Compositionality Principles
• Words whose meaning is directly determined by their distributional behaviour (e.g., nouns such as “dogs”).
• Words that act as functions transforming the distributional profile of other words (e.g., verbs and adjectives such as “old”).
Compositionality Principles
• Take the syntactic structure to constitute the backbone guiding the assembly of the semantic representations of phrases.
• A correspondence between syntactic categories and distributional objects.
Mixture-based Models
• Mitchell and Lapata (2010)
• Proposed two broad classes of composition models.
  – Additive.
  – Multiplicative.
Additive Model
• Limitations of the additive model:
  – The input vectors contribute to the composed expression in the same way.
  – Linguistic intuition would suggest that the composition operation is asymmetric (the head of the phrase should have greater weight).
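A minimal sketch of the additive model, p = u + v, and of the weighted variant p = αu + βv that Mitchell and Lapata also consider. The vectors and the weights 0.25 / 0.75 are made up for illustration.

```python
def add(u, v):
    """Plain additive composition: p = u + v."""
    return [a + b for a, b in zip(u, v)]

def weighted_add(u, v, alpha, beta):
    """Weighted additive composition: p = alpha * u + beta * v."""
    return [alpha * a + beta * b for a, b in zip(u, v)]

# Hypothetical vectors for the adjective "old" and the head noun "dog".
old = [1.0, 4.0, 0.0]
dog = [3.0, 0.0, 2.0]

print(add(old, dog))                       # [4.0, 4.0, 2.0]
print(add(old, dog) == add(dog, old))      # True: word order is lost
print(weighted_add(old, dog, 0.25, 0.75))  # [2.5, 1.0, 1.5]
```

Giving the head noun the larger weight is one simple way to make the operation asymmetric, although the weights still apply uniformly to every component.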
Multiplicative Model
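In Mitchell and Lapata's multiplicative model, two vectors are composed by component-wise multiplication, p_i = u_i · v_i. A minimal sketch with invented vectors:

```python
def multiply(u, v):
    """Multiplicative composition: component-wise product of u and v."""
    return [a * b for a, b in zip(u, v)]

# Hypothetical vectors for the adjective "old" and the noun "dog".
old = [1.0, 4.0, 0.0]
dog = [3.0, 0.0, 2.0]

print(multiply(old, dog))  # [3.0, 0.0, 0.0]
```

Only components that are non-zero in both inputs survive, so the product acts like an intersection of the two distributional profiles. Like addition, it is symmetric in its arguments.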
Analysis
• Multiplicative models perform quite well in the task of predicting human similarity judgments about adjective-noun, noun-noun, verb-noun and noun-verb phrases.
Criticism of Mixture Models
• Some words have an intrinsic functional behaviour:
“lice on dogs”, “lice and dogs”
• Lack of recursion.
• To address these limitations, function-based models were introduced.
Mixture vs Function
Distributional Functions
• Composition as function application.
• Nouns are still represented as vectors.
• Adjectives, verbs, determiners, prepositions, conjunctions and so forth are all modelled by distributional functions.
(ON(dogs))(lice)
AND(lice, dogs)
Distributional functions as linear transformations
• Distributional functions are linear transformations on semantic vector/tensor spaces.
• Matrix: first-order, one-argument distributional functions.
• Used to represent adjectives and intransitive verbs.
Example: Adjective + Noun
• Adjective = a function from nouns to nouns.
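As a sketch of this idea, an adjective can be stored as a matrix and applied to a noun vector by matrix-vector multiplication, giving a new vector in the same noun space. The three dimensions and all numbers below are hypothetical.

```python
import numpy as np

# Hypothetical 3-dimensional noun space with components (runs, barks, sleeps).
dog = np.array([4.0, 3.0, 1.0])

# Hypothetical matrix for OLD: dampens "runs" and boosts "sleeps".
OLD = np.array([
    [0.5, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 2.0],
])

old_dog = OLD @ dog  # function application = matrix-vector product
print(old_dog)       # [2. 3. 2.]
```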
Measuring similarity of tensors
• Two matrices (or tensors) are similar when they have a similar weight distribution, i.e., they perform similar input-to-output component mappings.
• For example, DECREPIT and OLD might both dampen the “runs” component of a noun.
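One simple way to compare weight distributions, assumed here, is to flatten each matrix into a vector and take the cosine. The matrices are invented; YOUNG is a hypothetical contrast that boosts the “runs” component instead of dampening it.

```python
import numpy as np

def matrix_cosine(A, B):
    """Cosine similarity between two matrices viewed as flat vectors."""
    a, b = A.ravel(), B.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical adjective matrices over the components (runs, barks, sleeps).
OLD = np.diag([0.5, 1.0, 1.0])        # dampens "runs"
DECREPIT = np.diag([0.25, 1.0, 1.0])  # dampens "runs" even more
YOUNG = np.diag([3.0, 1.0, 0.2])      # boosts "runs"

print(matrix_cosine(OLD, DECREPIT))  # high: similar mappings
print(matrix_cosine(OLD, YOUNG))     # lower: different mappings
```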
Inducing distributional functions from corpus data
• Distributional functions are induced from input-to-output transformation examples.
• Regression techniques commonly used in machine learning.
[Figure: example for the adjective “old”]
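The sketch below shows one way such an induction could look for the adjective “old”, using ordinary least squares as the regression technique (an assumption; other regression methods fit the same scheme). The training data are simulated rather than taken from a corpus.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_examples = 4, 50

# Simulated data: in practice both sides would be corpus-derived vectors.
TRUE_OLD = rng.normal(size=(dim, dim))      # unknown in reality; used only to simulate
nouns = rng.normal(size=(n_examples, dim))  # inputs: one noun vector per row
phrases = nouns @ TRUE_OLD.T                # outputs: vectors for "old <noun>"

# Least squares: find the matrix OLD that minimises ||nouns @ OLD.T - phrases||.
solution, *_ = np.linalg.lstsq(nouns, phrases, rcond=None)
OLD = solution.T

print(np.allclose(OLD, TRUE_OLD))  # True for this noise-free toy data
```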
Socher, 2012
• Recursive neural network (RNN) model that learns compositional vector representations for phrases and sentences.
• State-of-the-art performance on three different experiments, including sentiment analysis and cause-effect semantic relations.
Main Challenges
• Challenge I: Lack of sufficient examples of the inputs and outputs of distributional functions.
  – Possible solution: extend the training sets by exploiting similarities between linguistic expressions to ‘share’ training examples across distributional functions.
• Challenge II: Computational power and space.
  – Grefenstette et al., 2013.
  – If nouns live in 300-dimensional spaces, a transitive verb is a (300 × 300) × 300 tensor, that is, it contains 27 million components.
  – Relative pronoun: a (300 × 300) × (300 × 300) tensor, which contains 8.1 billion components.
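The component counts above follow directly from the tensor shapes. The memory estimate at the end assumes 4 bytes per component (32-bit floats), which is an assumption and not a figure from the slides.

```python
n = 300  # dimensionality of the noun space

transitive_verb = (n * n) * n         # (300 x 300) x 300 tensor
relative_pronoun = (n * n) * (n * n)  # (300 x 300) x (300 x 300) tensor

print(f"{transitive_verb:,}")   # 27,000,000
print(f"{relative_pronoun:,}")  # 8,100,000,000

# Rough storage cost for one relative pronoun, assuming 4-byte floats.
print(relative_pronoun * 4 / 1e9, "GB")  # 32.4 GB
```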
Categorial Grammar
• Provides the syntax-semantics interface.
• Tight connection between syntax and semantics.
• Motivated by the principle of compositionality.
• View that syntactic constituents should generally combine as functions or according to a function-argument relationship.
Categorial Grammar
• Apply inference rules.
• The string is a sentence: ((the (bad boy)) (made (that mess)))
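A minimal sketch of how the bracketing above can be checked with the two application rules of a basic categorial grammar. The lexicon is an assumed, textbook-style category assignment, and the helpers `fwd`, `bwd`, `combine` and `category` are hypothetical names introduced for this illustration.

```python
def fwd(result, arg):
    """Category that yields `result` after taking `arg` on its right."""
    return ("fwd", result, arg)

def bwd(arg, result):
    """Category that yields `result` after taking `arg` on its left."""
    return ("bwd", result, arg)

# Assumed lexicon: N = noun, NP = noun phrase, S = sentence.
LEXICON = {
    "the": fwd("NP", "N"),
    "that": fwd("NP", "N"),
    "bad": fwd("N", "N"),
    "boy": "N",
    "mess": "N",
    "made": fwd(bwd("NP", "S"), "NP"),  # transitive verb
}

def combine(left, right):
    """Forward or backward application; None if neither rule applies."""
    if isinstance(left, tuple) and left[0] == "fwd" and left[2] == right:
        return left[1]
    if isinstance(right, tuple) and right[0] == "bwd" and right[2] == left:
        return right[1]
    return None

def category(tree):
    """Category of a binary-bracketed tree given as nested tuples of words."""
    if isinstance(tree, str):
        return LEXICON[tree]
    left, right = (category(t) for t in tree)
    return combine(left, right)

tree = (("the", ("bad", "boy")), ("made", ("that", "mess")))
print(category(tree))  # S
```

Each application step in this derivation is also the point where a distributional function would be applied to its argument.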
Local compositions
BARK × dogs
(BARK: matrix; dogs: vector)
Local compositions
(CHASE × cats) × dogs
(CHASE: 3rd-order tensor; cats: vector; dogs: vector)
Intermediate step: (CHASE × cats)
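These local compositions can be sketched as tensor contractions. The toy dimensionality, the random values, and the convention that the tensor's last index consumes the object are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3  # toy dimensionality of the noun space

dogs = rng.normal(size=n)
cats = rng.normal(size=n)
BARK = rng.normal(size=(n, n))      # intransitive verb: matrix
CHASE = rng.normal(size=(n, n, n))  # transitive verb: 3rd-order tensor

# "dogs bark": matrix x vector -> vector
dogs_bark = BARK @ dogs

# "chase cats": 3rd-order tensor x vector -> matrix
chase_cats = np.einsum("ijk,k->ij", CHASE, cats)

# "dogs chase cats": (CHASE x cats) x dogs -> vector
dogs_chase_cats = chase_cats @ dogs

print(dogs_bark.shape, chase_cats.shape, dogs_chase_cats.shape)  # (3,) (3, 3) (3,)
```

Both sentences end up as vectors in the same space, so they can be compared with the same similarity measures used for single words.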
Syntax-Semantics interface for an English fragment
Other Compositional Models
• Coecke et al. (2010): Category theory and Lambek calculus.
• Grefenstette et al. (2013): Simulating Logical Calculi with Tensors.
• Novacek et al., ISWC (2011); Freitas et al., ICSC (2011): Semantic Web & Distributional Semantics.
Conclusion
• Distributional semantics brings a promising approach for building computational models that work in the real world.
• Semantic approximation as a built-in construct.
• Compositionality is still an open problem, but classical (formal) works have been leveraged and adapted to DSMs.
• Exciting time to be around!