Information Inference
Mimicking human text-based reasoning
P.D. Bruza & D. Song
Information Ecology Project
Distributed Systems Technology Centre
Penguin Books U.K
Why Linus chose a penguin
Surfing the Himalayas
Introductory remarks
Information inference is a common and real phenomenon.
It can be modelled by symbolic inference, but this isn’t satisfying.
The inferences are often latent associations triggered by seeing a word (or words) in the context of other words, so inference is not deductive, but about producing implicit associations appropriate to the context.
We need to look at the problem from a cognitive perspective…
Since last time…
(Philosophical) positioning of the work is clearer
Some encouraging experimental results using information inference to derive query models
Some initial ideas about how information inference fits into an abductive logic for text-based knowledge discovery
Dretske’s Information Content
To a person with prior knowledge K, r being F carries the information that s is G if and only if the conditional probability of s being G given r is F is 1 (and less than 1 given K alone)
We can say that s being G is inferred (informationally) from r being F and K
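The definition can be checked mechanically on a toy model. The sketch below assumes, purely for illustration, that K is encoded as a finite list of equally likely hypothetical "worlds" compatible with prior knowledge (Dretske's account prescribes no such encoding). The names `cond_prob`, `carries_information` and the world dictionaries are hypothetical.

```python
from fractions import Fraction
from typing import Callable, Dict, List

World = Dict[str, object]
Pred = Callable[[World], bool]


def cond_prob(worlds: List[World], event: Pred, given: Pred) -> Fraction:
    """Pr(event | given), treating every world compatible with K as equally likely."""
    base = [w for w in worlds if given(w)]
    if not base:
        raise ValueError("conditioning event has probability zero")
    return Fraction(sum(1 for w in base if event(w)), len(base))


def carries_information(worlds: List[World], signal: Pred, content: Pred) -> bool:
    """Dretske's test: Pr(content | signal, K) == 1 and Pr(content | K) < 1.

    `worlds` is assumed to contain only worlds compatible with prior knowledge K.
    """
    return (cond_prob(worlds, content, signal) == 1
            and cond_prob(worlds, content, lambda w: True) < 1)


# Hypothetical worlds compatible with K: who "Linus" refers to, and whether
# "penguin" co-occurs with "Linus" in the text.
worlds = [
    {"referent": "Torvalds", "penguin": True},
    {"referent": "Torvalds", "penguin": False},
    {"referent": "Peanuts", "penguin": True},   # e.g. a Peanuts strip featuring a penguin
    {"referent": "Peanuts", "penguin": False},
]


def is_torvalds(w: World) -> bool:
    return w["referent"] == "Torvalds"


def with_penguin(w: World) -> bool:
    return bool(w["penguin"])


print(carries_information(worlds, with_penguin, is_torvalds))  # False: Pr = 1/2
```

Dropping the Peanuts-with-penguin world makes the test succeed, which mirrors the point made about Dretske's standard: it is met only when every alternative has been ruled out.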
T= “Why Linus chose a penguin”
K = {Linus Torvalds invented Linux, The Linux logo is a penguin, Linus is a cartoon character in “Peanuts”}
$\Pr(\text{“Linus” being “Linus Torvalds”} \mid K) < 1$
$\Pr(\text{“Linus” is “Linus Torvalds”} \mid K, \text{“Linus” is with “penguin” in } T) < 1$
So Dretske’s definition does not permit the inference that “Linus” is “Linus Torvalds”, though a human being may proceed under this “hasty” judgment.
Dretske’s information content “sets too high a standard” (Barwise & Seligman)
Inferential information content (Barwise & Seligman)
To a person with prior knowledge K, r being F carries the information that s is G, if the person could legitimately infer that s is G from r being F together with K (but could not from K alone)
T= “Why Linus chose a penguin”
K = {Linus Torvalds invented Linux, The Linux logo is a penguin, Linus is a cartoon character in “Peanuts”}
“Linus” being “Linus Torvalds” can’t be legitimately inferred from K alone
“Linus” being with “penguin” in T, together with K, carries the information that “Linus” is “Linus Torvalds”
Barwise & Seligman (con’t)
“… by relativizing information flow to human inference, this definition makes room for different standards in what sorts of inferences the person is able and willing to make”
Remarks:
– Psychologistic stance taken
– Onerous from an engineering standpoint: “different standards” implies “nonmonotonicity”. Consider “Linux Online: Why Linus chose a penguin” (willing) vs. “Why Linus chose a penguin” (not willing)
Consequences of psychologism
Representations of information need not be propositional
Semantics is not a model-theoretic issue, but a cognitive one: the “meanings” stored and manipulated by the system should accord with what we have in our heads.
Gärdenfors’ cognitive model
symbolic: propositional representation
conceptual: geometric representation
associationist (sub-conceptual): connectionist representation
Conceptual spaces: the property “red”
[Figure: the property red(x) as a region in a space with dimensions hue, chromaticity and brightness]
Properties and concepts are dimensional (geometric) objects. Dimensions may be integral: a value cannot be assigned in one dimension without also assigning a value in another.
Barwise & Seligman’s real valued state spaces
[Figure: an observation function maps an object to its state red = < hue: 445, chrom: 6.0, brightness: 7.0 >]
Gärdenfors’ cognitive model: how we realize it
symbolic: propositional representation
conceptual: geometric representation
associationist (sub-conceptual): connectionist representation
Realized via: HAL, LSA, keywords
Geometric representations of words via Hyperspace Analogue to Language (HAL)
reagan = < administration: 0.45, bill: 0.05, budget: 0.07, house: 0.06, president: 0.83, reagan: 0.21, trade: 0.05, veto: 0.06, … >
This example demonstrates how a word is represented as a weighted vector whose dimensions comprise other words.
The weights represent the strengths of association between “reagan” and other words seen in the same context(s).
How HAL vectors are constructed
…….Kemp urges Reagan to oppose stock tax…..
Slide a window of width n across the corpus
Per word: compute the weight of association with other words within the window; the weight is inversely proportional to distance
HAL space: each word in the corpus is represented by a multi-dimensional vector, a weighted sum of the contexts the word appeared in. (Burgess et al. refer to it as a “high dimensional context space”, or a “high dimensional semantic space”.)
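The window construction can be sketched in a few lines. The weight n − d + 1 for a word at distance d is an assumption (it is the linear ramp used in Burgess et al.'s HAL; the slide only says that weight falls with distance), as are the lower-cased tokens and the window width of 5. Preceding and following contexts are summed into one vector per word, matching the “pre- and post- vectors are added” remark. `build_hal` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List


def build_hal(tokens: List[str], n: int = 5) -> Dict[str, Dict[str, float]]:
    """Build HAL vectors by sliding a window of width n over a token stream.

    A word at distance d (1 <= d <= n) gets weight n - d + 1, so closer words
    count more (assumed Burgess-style linear ramp). Preceding and following
    contexts are summed into one vector per word.
    """
    hal: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for d in range(1, n + 1):
            j = i + d
            if j >= len(tokens):
                break
            weight = n - d + 1
            other = tokens[j]
            hal[word][other] += weight   # `other` follows `word`
            hal[other][word] += weight   # `word` precedes `other`
    return {w: dict(dims) for w, dims in hal.items()}


tokens = "kemp urges reagan to oppose stock tax".split()
hal = build_hal(tokens, n=5)
print(hal["reagan"])
# {'kemp': 4.0, 'urges': 5.0, 'to': 5.0, 'oppose': 4.0, 'stock': 3.0, 'tax': 2.0}
```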
Remarks about HAL
A HAL space is easy to construct
Cognitive compatibility with human information processing
– “word representations learned by HAL account for a variety of semantic phenomena” (Burgess et al)
– Therefore a good candidate for representing “meanings” in accord with our psychologistic stance
A HAL space is a real-valued state space, thus opening the door to driving information inference according to Barwise & Seligman’s definition
– A HAL vector represents a word’s “state” in the context of the text corpus it was derived from
Differences with Burgess et al.
We (often) normalize the weights
Pre- and post- vectors are added into a single vector
HAL vectors derived from small text corpora (e.g., Reuters-21578) seem to be OK
HAL vectors are “summed” representations, similar in spirit to “prototypical concepts” (which are averaged representations)
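Normalization is a one-line operation. The slide does not say which normalization is used; the sketch below assumes Euclidean (unit-length) normalization, which is one common choice. `normalize` is a hypothetical name.

```python
import math
from typing import Dict


def normalize(vec: Dict[str, float]) -> Dict[str, float]:
    """Scale a weighted vector to unit Euclidean length (assumed normalization)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    if norm == 0:
        return dict(vec)  # an all-zero vector cannot be normalized
    return {dim: w / norm for dim, w in vec.items()}


print(normalize({"president": 3.0, "administration": 4.0}))
# {'president': 0.6, 'administration': 0.8}
```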
Reagan traces
President Reagan was ignorant about much of the Iran arms scandal
Reagan says U.S. to offer missile treaty
REAGAN SEEKS MORE AID FOR CENTRAL AMERICA
Kemp urges Reagan to oppose stock tax
Prototypical concepts
Prototypical “Reagan” = average of vectors from traces
president: 3.23, administration: 1.82, trade: 0.40, budget: 0.37, veto: 0.34, bill: 0.31, congress: 0.31, tax: 0.29, …
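Averaging trace vectors into a prototype can be sketched as follows. Treating a dimension missing from a trace as weight 0 is an assumption, and the per-trace weights below are invented for illustration (they are not the vectors behind the figures above). `prototype` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List

Vector = Dict[str, float]


def prototype(traces: List[Vector]) -> Vector:
    """Average a list of sparse vectors; a missing dimension counts as weight 0."""
    if not traces:
        raise ValueError("need at least one trace vector")
    totals: Dict[str, float] = defaultdict(float)
    for vec in traces:
        for dim, w in vec.items():
            totals[dim] += w
    return {dim: w / len(traces) for dim, w in totals.items()}


# Hypothetical per-trace vectors for two of the "Reagan" headlines.
traces = [
    {"president": 4.0, "arms": 2.0, "scandal": 2.0},
    {"president": 2.0, "missile": 4.0, "treaty": 2.0},
]
print(prototype(traces))
# {'president': 3.0, 'arms': 1.0, 'scandal': 1.0, 'missile': 2.0, 'treaty': 1.0}
```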
Concept combination: “Pink Elephant”
Elephant = < , , …… >
Heuristic concept combination: “Star wars”
star = <trek: 0.2, episode: 0.05, soviet: 0.3, bush: 0.4, missile: 0.25>
wars = <soviet: 0.1, missile:0.2, iran: 0.33, iraq: 0.28, gulf: 0.4>
starwars = < trek: 0.3, episode: 0.15, soviet: 0.6, bush: 0.53, missile: 0.65, iran: 0.2, iraq: 0.18, gulf: 0.25>
Observation: “star” dominates “wars”
How should dimensions be weighted according to context? Weights are affected by how one concept appears in the light of another concept: intersecting dimensions are emphasized, and weights are adjusted according to the degree of dominance. (NB moving prototypical concepts in the HAL space is a cleaner way of dealing with context.)
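The two ingredients named above (dominance weighting and emphasis on intersecting dimensions) can be sketched as follows. The parameters `alpha` and `beta` (dominance weights), `boost` (emphasis on shared dimensions) and their defaults are hypothetical, and the sketch does not reproduce the exact numbers in the starwars example; it only shows the shape of such a heuristic.

```python
from typing import Dict

Vector = Dict[str, float]


def combine(dominant: Vector, subordinate: Vector,
            alpha: float = 2.0, beta: float = 1.0, boost: float = 1.5) -> Vector:
    """Heuristically combine two concept vectors (hypothetical parameters).

    Dimensions of the dominant concept are scaled by alpha, those of the
    subordinate concept by beta (alpha > beta), and dimensions the two
    concepts share are multiplied by boost.
    """
    if alpha <= beta:
        raise ValueError("alpha must exceed beta so the dominant concept prevails")
    shared = dominant.keys() & subordinate.keys()
    combined: Vector = {}
    for dim in dominant.keys() | subordinate.keys():
        weight = alpha * dominant.get(dim, 0.0) + beta * subordinate.get(dim, 0.0)
        if dim in shared:
            weight *= boost  # emphasize intersecting dimensions
        combined[dim] = weight
    return combined


star = {"trek": 0.2, "episode": 0.05, "soviet": 0.3, "bush": 0.4, "missile": 0.25}
wars = {"soviet": 0.1, "missile": 0.2, "iran": 0.33, "iraq": 0.28, "gulf": 0.4}
starwars = combine(star, wars)
for dim, w in sorted(starwars.items(), key=lambda kv: -kv[1]):
    print(f"{dim}: {w:.2f}")
```

With these settings a star-only dimension such as bush ends up weighted twice as heavily as a wars-only dimension such as gulf, although both start at 0.4; that asymmetry is one way of letting “star” dominate.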
Theoretical background: Information inference via HAL-based information flow computations
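Barwise & Seligman: state-based “information flow” (symbolic level)
$\text{live\_on} \vdash \text{light}$ iff, for every state $s$, $s \models \text{live\_on}$ implies $s \models \text{light}$
HAL-based “information flow” (conceptual level)
$i_1, \ldots, i_n \vdash j$ iff $\text{degree}(\oplus c_i \rhd c_j) > \lambda$, where $\oplus c_i$ combines the concepts $c_{i_1}, \ldots, c_{i_n}$ and $\lambda$ is a threshold
e.g. $\text{reagan}, \text{iran} \vdash \text{scandal}$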
Degree of inclusion (flow) computation
$$\text{degree}(c_i \rhd c_j) = \frac{\sum_{p_l \in (QP(c_i) \cap QP(c_j))} w_{c_i p_l}}{\sum_{p_k \in QP(c_i)} w_{c_i p_k}}$$
Consider the “quality properties” of the source concept $c_i$: its dimensions with above-mean weight. (Intuition: how much of the salient aspects of the source are contained in the target $c_j$.)
Compute the ratio of the source weight on quality properties shared with the target to the total source weight on its quality properties.
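Under the reading that QP(c_i) are the above-mean dimensions of the source and QP(c_j) the target dimensions with nonzero weight (the latter is an assumption), the degree computation can be sketched as below. Accepting a flow when the degree exceeds a threshold (`lam`) is likewise an assumption; `quality_properties`, `degree` and `flows` are hypothetical names, and for multi-term queries the source would be the combined query vector.

```python
from typing import Dict, Set

Vector = Dict[str, float]


def quality_properties(vec: Vector) -> Set[str]:
    """Dimensions of a concept whose weight is above the concept's mean weight."""
    if not vec:
        return set()
    mean = sum(vec.values()) / len(vec)
    return {dim for dim, w in vec.items() if w > mean}


def degree(source: Vector, target: Vector) -> float:
    """Share of the source's quality-property weight that falls on
    dimensions also present (nonzero) in the target."""
    qp_source = quality_properties(source)
    qp_target = {dim for dim, w in target.items() if w > 0}  # assumption
    denominator = sum(source[dim] for dim in qp_source)
    if denominator == 0:
        return 0.0
    return sum(source[dim] for dim in qp_source & qp_target) / denominator


def flows(source: Vector, target: Vector, lam: float) -> bool:
    """Accept the information flow when the degree exceeds the threshold lam."""
    return degree(source, target) > lam


# Hypothetical vectors: mean source weight is 2.0, so QP(source) = {A, F}.
source = {"A": 3.0, "F": 3.0, "K": 2.0, "B": 1.0, "C": 1.0}
target = {"A": 1.0, "K": 1.0, "Q": 1.0}
print(degree(source, target))  # 0.5
```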
Visualizing degree of inclusion between HAL vectors
source: A B C D F G K L M
target: A . F . K . . Q
Many of the above-average “quality properties” of the source concept are present in the target, so the degree of inclusion will be high.
Information Inference in practice: deriving query models
Construct HAL vectors for all vocabulary terms from the document collection
Given a query such as “space program”, compute the information flows from it and use these to expand the query, e.g.
$\text{space program} \vdash \text{nasa}$
(a query expansion term derived via information flow computation)
(We used the top 80 information flows for expansion without feedback, 65 with feedback)
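The expansion step can be sketched as ranking every vocabulary term by the degree of inclusion from the query vector into that term's HAL vector and keeping the top k. The sketch assumes the query has already been combined into one vector and that candidate terms exclude the query terms; `expand_query` and all vectors are hypothetical. In the reported experiments k would be 80 (no feedback) or 65 (with feedback).

```python
from typing import Dict, List

Vector = Dict[str, float]


def degree(source: Vector, target: Vector) -> float:
    """Weight share of the source's above-mean dimensions present in the target."""
    if not source:
        return 0.0
    mean = sum(source.values()) / len(source)
    qp = {dim for dim, w in source.items() if w > mean}
    denominator = sum(source[dim] for dim in qp)
    if denominator == 0:
        return 0.0
    shared = sum(source[dim] for dim in qp if target.get(dim, 0.0) > 0)
    return shared / denominator


def expand_query(query_vec: Vector, hal: Dict[str, Vector],
                 query_terms: List[str], k: int) -> List[str]:
    """Return the k terms receiving the strongest information flow from the query."""
    scored = [(degree(query_vec, vec), term)
              for term, vec in hal.items() if term not in query_terms]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best first, ties alphabetical
    return [term for _, term in scored[:k]]


# Hypothetical vectors: query_vec stands in for the combined "space program" vector.
query_vec = {"nasa": 3.0, "shuttle": 3.0, "launch": 1.0, "budget": 1.0}
hal = {
    "mars": {"nasa": 1.0, "shuttle": 0.5},
    "moon": {"nasa": 1.0},
    "stock": {"tax": 1.0},
    "space": {"nasa": 1.0, "shuttle": 1.0},
}
print(expand_query(query_vec, hal, ["space", "program"], k=2))  # ['mars', 'moon']
```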
The experiments
Associated Press 88/89 collections
TREC topics 1–50, 101–150, 151–200 (titles only)
Models for comparison: Baseline, Composition, Relevance Model, Markov chain model
Baseline Model
BM25 term weighting (terms were stemmed)
Replication of Lafferty & Zhai’s baseline (SIGIR 2001)
Dot product matching function
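For reference, one standard form of Okapi BM25 and a dot-product matching function can be sketched as follows. The slide does not give the exact BM25 variant or its parameters; k1 = 1.2 and b = 0.75 are common defaults assumed here and may differ from the replicated baseline. Function names are hypothetical.

```python
import math
from typing import Dict


def bm25_weight(tf: int, df: int, n_docs: int, doc_len: int, avg_doc_len: float,
                k1: float = 1.2, b: float = 0.75) -> float:
    """BM25 weight of a term in a document (assumed Robertson/Sparck Jones idf).

    Note: this idf turns negative for terms occurring in more than half the documents.
    """
    idf = math.log((n_docs - df + 0.5) / (df + 0.5))
    norm = tf + k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / norm


def dot_product(query: Dict[str, float], doc: Dict[str, float]) -> float:
    """Match a weighted query against a weighted document representation."""
    return sum(w * doc.get(term, 0.0) for term, w in query.items())


print(round(bm25_weight(tf=1, df=1, n_docs=3, doc_len=10, avg_doc_len=10.0), 4))  # 0.5108
```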
Composition model
Combine the HAL vectors of individual query terms by recursively applying the concept combination heuristic; query terms ranked according to idf (dominance ranking)
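The composition step can be sketched as a left fold over the query terms ranked by idf, so that the accumulated combination always plays the dominant role. The `combine` function, its parameters, and the example vectors and idf values are hypothetical stand-ins for the concept combination heuristic, not the original implementation.

```python
from functools import reduce
from typing import Dict, List

Vector = Dict[str, float]


def combine(dominant: Vector, subordinate: Vector,
            alpha: float = 2.0, beta: float = 1.0, boost: float = 1.5) -> Vector:
    """Hypothetical heuristic: favor the dominant concept, boost shared dimensions."""
    shared = dominant.keys() & subordinate.keys()
    return {
        dim: (alpha * dominant.get(dim, 0.0) + beta * subordinate.get(dim, 0.0))
        * (boost if dim in shared else 1.0)
        for dim in dominant.keys() | subordinate.keys()
    }


def compose_query(terms: List[str], hal: Dict[str, Vector], idf: Dict[str, float]) -> Vector:
    """Fold query-term vectors together, highest-idf (most dominant) term first.

    Raises TypeError on an empty query, since there is nothing to fold.
    """
    ranked = sorted(terms, key=lambda t: idf[t], reverse=True)
    # reduce keeps the accumulated combination as the dominant argument
    return reduce(combine, (hal[t] for t in ranked))


hal = {"space": {"nasa": 1.0}, "program": {"nasa": 1.0, "tv": 1.0}}
idf = {"space": 3.0, "program": 1.0}
print(compose_query(["program", "space"], hal, idf))
```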
starwars = < trek: 0.3, episode: 0.15, soviet: 0.6, bush: 0.53, missile: 0.65, iran: 0.2, iraq: 0.18, gulf: 0.25>
Results
| | Baseline Model | Composition Model | Info flow Model |
|---|---|---|---|
| AvgPr | 0.182 | 0.197 (+8%) | 0.247 (+35%) |
| InitPr | 0.476 | 0.520 (+10%) | 0.544 (+14%) |
| Recall | 1667/3301 | 1996/3301 (+15%) | 2269/3301 (+35%) |
The effect of information inference
26% of the 35% improvement in precision of the HAL-based information flow model is due to information inference
For example, for the query “space program”, the information flow model infers query expansion terms such as “Reagan”, “satellites”, “scientists”, “pentagon”, “mars” and “moon”.
These are real inferences with respect to “space program”, as these terms do not appear as dimensions in the HAL vector of the concept combination of “space” and “program”.
Comparison with probabilistic query language models
MC: Markov chain model (Lafferty & Zhai, SIGIR 2001)
| Topics | Collection | MC | IM | MCwP | IMwP |
|---|---|---|---|---|---|
| 1–50 | AP89 | 0.201 | 0.247 | 0.232 | 0.258 |
Scores are average precision
Comparison with probabilistic query language models (con’t)
RM: Relevance model (Lavrenko & Croft, SIGIR 2001)
| Topics | Collection | IM | IMwP | RM |
|---|---|---|---|---|
| 101–150 | AP | 0.265 | 0.301 | 0.261 |
| 151–200 | AP | 0.298 | 0.344 | 0.319 |
Scores are average precision
Text-based scientific discovery
[Diagram: the fish oil and Raynaud literatures (A and C) connected through intermediate concepts B1 blood viscosity, B2 platelet aggregation, B3 vascular reactivity]
Weeber et al., “Using Concepts in Literature-Based Discovery”, JASIST 52(7):548-557
“.., he made the connection between these literatures and formulated the hypothesis that fish oil may be used for treating Raynaud’s disease..”
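The A-B-C pattern in the diagram (two literatures that never mention each other but share intermediate B concepts) can be sketched as a simple link-counting search. This is the classic co-occurrence formulation of literature-based discovery, not the information-flow method of these slides; the link data and the name `abc_candidates` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def abc_candidates(a: str, links: Dict[str, Set[str]]) -> List[Tuple[str, int]]:
    """Rank C terms reachable from A through intermediate B terms.

    A candidate C must not be A itself, one of A's B terms, or directly
    linked to A (otherwise the connection is already known).
    """
    b_terms = links.get(a, set())
    bridges: Counter = Counter()
    for b in b_terms:
        for c in links.get(b, set()):
            if c != a and c not in b_terms and a not in links.get(c, set()):
                bridges[c] += 1
    return bridges.most_common()


# Hypothetical co-occurrence links (symmetric), loosely following the diagram.
links = {
    "raynaud": {"blood viscosity", "platelet aggregation", "vascular reactivity"},
    "blood viscosity": {"raynaud", "fish oil"},
    "platelet aggregation": {"raynaud", "fish oil"},
    "vascular reactivity": {"raynaud", "fish oil", "smoking"},
    "fish oil": {"blood viscosity", "platelet aggregation", "vascular reactivity"},
    "smoking": {"vascular reactivity"},
}
print(abc_candidates("raynaud", links))  # [('fish oil', 3), ('smoking', 1)]
```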
Logic of Abduction (Gabbay & Woods)
Abductive logic
– Logic of discovery: HAL-based info flow?
– Logic of justification: hypothesis testing?
Raw material for abduction? Information flows from “Raynaud”
Raynaud: 1.0, myocardial: 0.56, coronary: 0.54, renal: 0.52, ventricular: 0.52, …, oil: 0.23, …, fish: 0.20, …
Some promise, but the lack of a representation of integral dimensions is a problem
Index expressions
“Beneficial effects of fish oil on blood viscosity”
[Diagram: index expression tree over the terms beneficial, effects, fish, oil, blood, viscosity, linked by the connectors “of” and “on”]
Power index expressions for representing integral dimensions
[Diagram: fragment of the power index expression: fish oil, effects, blood viscosity, effects of fish oil, effects on blood viscosity]
Information flows are single terms; power index expressions determine how they may be combined into higher-order syntactic structures
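How a power index expression could be enumerated is sketched below, under an assumption made here for illustration: a subexpression is any connected fragment obtained by pruning subtrees below some node of the index expression tree. The `Node` encoding (a connector of "" marks a pre-modifier such as “fish” in “fish oil”) and the tree for the example title are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import List, Optional, Set, Tuple


@dataclass
class Node:
    term: str
    # (connector, child); connector "" marks a pre-modifier such as "fish" in "fish oil"
    children: List[Tuple[str, "Node"]] = field(default_factory=list)


def render(term: str, kept: List[Tuple[str, str]]) -> str:
    """Place pre-modifiers before the head and connector phrases after it."""
    pre = [text for conn, text in kept if conn == ""]
    post = [f"{conn} {text}" for conn, text in kept if conn != ""]
    return " ".join(pre + [term] + post)


def prunings(node: Node) -> List[str]:
    """All subexpressions rooted at `node`: keep the node, and for each child
    either drop its whole subtree or keep one of the child's own prunings."""
    options: List[List[Optional[Tuple[str, str]]]] = [
        [None] + [(conn, text) for text in prunings(child)]
        for conn, child in node.children
    ]
    results = []
    for choice in product(*options):
        kept = [c for c in choice if c is not None]
        results.append(render(node.term, kept))
    return results


def power_index_expression(root: Node) -> Set[str]:
    """Union of the prunings rooted at every node of the index expression."""
    found: Set[str] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        found.update(prunings(node))
        stack.extend(child for _, child in node.children)
    return found


expr = Node("effects", [
    ("", Node("beneficial")),
    ("of", Node("oil", [("", Node("fish"))])),
    ("on", Node("viscosity", [("", Node("blood"))])),
])
pie = power_index_expression(expr)
print(len(pie))  # 25
```

The enumeration yields, among others, the fragments shown in the diagram: “fish oil”, “effects of fish oil” and “effects on blood viscosity”.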
Initial results from using information flow computations as a logic of discovery
27 ventricular (0.52) infarction (0.46)
27 thromboplastin (0.17)
27 pulmonary (0.51) arteries (0.25)
27 placental (0.19) protein (0.42)
27 monoamine (0.17) oxidase (0.18)
27 lupus (0.37) nephritis (0.17)
27 instruments (0.17)
27 coagulant (0.21)
27 blood (0.63) coagulation (0.29)
26 umbilical (0.24) vein (0.32)
25 fish (0.20)
23 viscosity (0.21)
23 cigarette (0.26) smokers (0.22)
4 fish (0.20) oil (0.23)
Summary
(Barwise & Seligman) and Gärdenfors have a very similar stance with respect to the “human stance” (Gabbay and Woods also)… psychologism is alive…
An integration of a primitive approximation of a conceptual space with an information inference mechanism driven by information flow computations
An initial attempt towards realizing Gärdenfors’ conceptual spaces
– A HAL space is only a primitive approximation
– We are looking at Voronoi tessellations
A tiny contribution to Barwise & Seligman’s call for a “distinctively different model of human reasoning”
(We are looking beyond IR)