What is coming…
Today: Probabilistic models; improving classical models
  (Latent Semantic Indexing, relevance feedback: Chapter 5)
Monday Feb 5: Chapter 5 continued
Wednesday Feb 7: Web search engines (Chapter 13 & Google paper)
Announcement: Free Food Event
Where & When: GWC 487, Tuesday Feb 6th, 12:15-1:30
What: Pizza, soft drinks
Catch: A pitch on going to graduate school at ASU CSE… Meet the admissions committee, some graduate students, and some faculty members
Silver lining: Did we mention pizza? Also not as boring as time-sharing condo presentations. Make sure to come!!
Problems with the Vector Model
No semantic basis!
  Keywords are plotted as axes, but are they really independent? Are they orthogonal?
No support for Boolean queries
  How do you ask for papers that don't contain a keyword?
Probabilistic Model
Objective: to capture the IR problem using a probabilistic framework
Given a user query, there is an ideal answer set
Querying is the specification of the properties of this ideal answer set (clustering)
But what are these properties?
  Guess at the beginning what they could be (i.e., guess an initial description of the ideal answer set)
  Improve by iteration
Probabilistic Model
An initial set of documents is retrieved somehow
The user inspects these docs looking for the relevant ones (in truth, only the top 10-20 need to be inspected)
The IR system uses this information to refine its description of the ideal answer set
By repeating this process, it is expected that the description of the ideal answer set will improve
Keep in mind that the description of the ideal answer set must be guessed at the very beginning
The description of the ideal answer set is modeled in probabilistic terms
Probabilistic Ranking Principle
Given a user query q and a document dj, estimate the probability that the user will find the document dj interesting (i.e., relevant).
The model assumes that this probability of relevance depends only on the query and document representations.
The ideal answer set is referred to as R and should maximize the probability of relevance; documents in the set R are predicted to be relevant.
But how do we compute these probabilities? What is the sample space?
The Ranking
Probabilistic ranking is computed as:
  sim(q,dj) = P(dj relevant-to q) / P(dj non-relevant-to q)
This is the odds of the document dj being relevant; taking the odds minimizes the probability of an erroneous judgement.
Definitions:
  wij ∈ {0,1}
  P(R | vec(dj)) : probability that the given doc is relevant
  P(¬R | vec(dj)) : probability that the given doc is not relevant
The Ranking
sim(dj,q) = P(R | vec(dj)) / P(¬R | vec(dj))

          = [P(vec(dj) | R) * P(R)] / [P(vec(dj) | ¬R) * P(¬R)]    (by Bayes' rule)

          ~ P(vec(dj) | R) / P(vec(dj) | ¬R)

P(vec(dj) | R) : probability of randomly selecting the document dj from the set R of relevant documents

P(R) and P(¬R) are the same for all docs, so they can be dropped from the ranking.
The Ranking
sim(dj,q) ~ P(vec(dj) | R) / P(vec(dj) | ¬R)

where vec(dj) is of the form (k1, k2, k3, ..., kt).

Using the pairwise independence assumption among keywords:

sim(dj,q) ~ [∏ P(ki | R)] * [∏ P(¬ki | R)]
            -----------------------------------
            [∏ P(ki | ¬R)] * [∏ P(¬ki | ¬R)]

where, in numerator and denominator alike, the first product runs over the keywords present in dj and the second over the keywords NOT present in dj.

P(ki | R) : probability that the index term ki is present in a document randomly selected from the set R of relevant documents
The Ranking

Taking the log of the previous expression:

sim(dj,q) ~ log ( [∏ P(ki | R)] * [∏ P(¬ki | R)] / ( [∏ P(ki | ¬R)] * [∏ P(¬ki | ¬R)] ) )

          ~ K * Σi [ log ( P(ki | R) / P(¬ki | R) ) + log ( P(¬ki | ¬R) / P(ki | ¬R) ) ]

          ~ Σi wiq * wij * ( log ( P(ki | R) / P(¬ki | R) ) + log ( P(¬ki | ¬R) / P(ki | ¬R) ) )

where P(¬ki | R) = 1 - P(ki | R) and P(¬ki | ¬R) = 1 - P(ki | ¬R)
The Initial Ranking

sim(dj,q) ~ Σi wiq * wij * ( log ( P(ki | R) / P(¬ki | R) ) + log ( P(¬ki | ¬R) / P(ki | ¬R) ) )

How do we obtain the probabilities P(ki | R) and P(ki | ¬R)? Estimates based on assumptions:
  P(ki | R) = 0.5
  P(ki | ¬R) = ni / N, where ni is the number of docs that contain ki and N is the total number of docs
Use this initial guess to retrieve an initial ranking, then improve upon this initial ranking.
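As a rough sketch, the initial ranking can be computed directly from these estimates. The toy corpus, query, and document names below are illustrative assumptions, not data from the slides; binary weights wij, wiq ∈ {0,1} are implicit in the set representation.

```python
import math

# Sketch of the initial probabilistic ranking, using the estimates
# P(ki | R) = 0.5 and P(ki | ~R) = ni / N from the slide above.
# Toy corpus and query, for illustration only.
docs = {
    "d1": {"information", "retrieval", "system"},
    "d2": {"probabilistic", "model", "ranking"},
    "d3": {"web", "search", "engine"},
}
query = {"probabilistic", "model"}

N = len(docs)
n = {}  # ni: number of docs containing the index term ki
for terms in docs.values():
    for t in terms:
        n[t] = n.get(t, 0) + 1

def sim(doc_terms):
    """Sum of term weights over the query terms present in the doc."""
    score = 0.0
    for ki in query & doc_terms:
        p_R = 0.5           # initial guess: P(ki | R)
        p_nR = n[ki] / N    # initial guess: P(ki | ~R) = ni / N
        score += (math.log(p_R / (1 - p_R))
                  + math.log((1 - p_nR) / p_nR))
    return score

ranking = sorted(docs, key=lambda d: sim(docs[d]), reverse=True)
print(ranking)
```

In a feedback loop, these initial estimates would then be refined from the documents the user marks as relevant.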
Pluses and Minuses
Advantages:
  Docs are ranked in decreasing order of their probability of relevance
Disadvantages:
  Need to guess the initial estimates for P(ki | R)
  The method does not take the tf and idf factors into account
Alternative Probabilistic Models
Probability theory: semantically clear, but computationally clumsy
Why Bayesian networks? A clear formalism to combine evidence; they modularize the world (dependencies)
Bayesian network models for IR:
  Inference Network (Turtle & Croft, 1991)
  Belief Network (Ribeiro-Neto & Muntz, 1996)
Bayesian Networks
Definition: Bayesian networks are directed acyclic graphs (DAGs) in which the nodes represent random variables, the arcs portray causal relationships between these variables, and the strengths of these causal influences are expressed by conditional probabilities.
Bayesian Networks
[Figure: root nodes y1, y2, ..., yt, each with an arc to the child node x]

yi : parent nodes (in this case, root nodes)
x : child node
The yi cause x; Y is the set of parents of x
The influence of Y on x can be quantified by any function F(x,Y) such that Σx F(x,Y) = 1 and 0 ≤ F(x,Y) ≤ 1
For example, F(x,Y) = P(x|Y)
Bayesian Networks
Given the dependencies declared in a Bayesian network, the expression for the joint probability can be computed as a product of local conditional probabilities, for example:

P(x1, x2, x3, x4, x5) = P(x1) P(x2 | x1) P(x3 | x1) P(x4 | x2, x3) P(x5 | x3)

P(x1) : prior probability of the root node

[Figure: DAG with root x1, children x2 and x3; x4 has parents x2 and x3; x5 has parent x3]
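A minimal sketch of this factorization, assuming binary variables; the numbers in the conditional tables are made up for illustration, not from the slides.

```python
from itertools import product

# Illustrative conditional tables for the network x1 -> {x2, x3},
# {x2, x3} -> x4, x3 -> x5. Each entry is P(node = True | parents).
P_x1 = 0.6
P_x2 = {True: 0.7, False: 0.2}                       # P(x2=T | x1)
P_x3 = {True: 0.4, False: 0.9}                       # P(x3=T | x1)
P_x4 = {(True, True): 0.9, (True, False): 0.5,
        (False, True): 0.6, (False, False): 0.1}     # P(x4=T | x2, x3)
P_x5 = {True: 0.3, False: 0.8}                       # P(x5=T | x3)

def bern(p_true, value):
    """P(node = value) when P(node = True) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(x1, x2, x3, x4, x5):
    # P(x1,...,x5) = P(x1) P(x2|x1) P(x3|x1) P(x4|x2,x3) P(x5|x3)
    return (bern(P_x1, x1) * bern(P_x2[x1], x2) * bern(P_x3[x1], x3)
            * bern(P_x4[(x2, x3)], x4) * bern(P_x5[x3], x5))

# Sanity check: the factored joint sums to 1 over all 2^5 assignments.
total = sum(joint(*v) for v in product([True, False], repeat=5))
```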
Bayesian Networks
In a Bayesian network each variable x is conditionally independent of all its non-descendants, given its parents.

For example:

P(x4, x5 | x2, x3) = P(x4 | x2, x3) P(x5 | x3)

[Figure: the DAG with root x1, children x2 and x3; x4 with parents x2 and x3; x5 with parent x3]
An Example Bayes Net
Typically, networks written in the causal direction wind up being most compact (they need the least number of probabilities to be specified)
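This compactness can be made concrete by counting parameters for the five-node network above, as a sketch assuming binary variables:

```python
# One independent probability per assignment of each node's parents
# (binary variables): 2^(number of parents) entries per node.
parent_count = {"x1": 0, "x2": 1, "x3": 1, "x4": 2, "x5": 1}
factored = sum(2 ** p for p in parent_count.values())  # 1+2+2+4+2 = 11
full_joint = 2 ** len(parent_count) - 1                # 31 for the full table
print(factored, full_joint)  # 11 31
```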
Two Models
[Figure: Inference network model: the document node dj points to the index-term nodes k1, k2, ..., ki, ..., kt, which feed the query nodes q1 and q2 through AND/OR operators, combining into q and the information-need node I]

[Figure: Belief network model: the index-term nodes k1, k2, ..., ki, ..., kt, ..., ku are parents of both the query node q and the document nodes d1, ..., dj, ..., dn]
Comparison
The Inference Network Model is the first and best known
  Used in the Inquery system
The Belief Network adopts a set-theoretic view
  a clearly defined sample space
  a separation between the query and document portions of the network
  able to reproduce any ranking produced by the Inference Network, while the converse is not true (for example: the ranking of the standard vector model)
Belief Network Model
Like the Inference Network Model:
  Epistemological view of the IR problem
  Random variables associated with documents, index terms and queries
Contrary to the Inference Network Model:
  Clearly defined sample space
  Set-theoretic view
  Different network topology
Belief Network Model
The Probability Space
Define:
  K = {k1, k2, ..., kt} : the sample space (a concept space)
  u ⊆ K : a subset of K (a concept)
  ki : an index term (an elementary concept)
  k = (k1, k2, ..., kt) : a vector associated with each u, such that gi(k) = 1 ⟺ ki ∈ u
  ki : a binary random variable associated with the index term ki (ki = 1 ⟺ gi(k) = 1 ⟺ ki ∈ u)
Belief Network Model
A Set-Theoretic View
Define:
  a document dj and a query q as concepts in K
  a generic concept c in K
  a probability distribution P over K, as
    P(c) = Σu P(c|u) P(u)
    P(u) = (1/2)^t
P(c) is the degree of coverage of the space K by c
Belief Network Model
Assumption
P(dj|q) is adopted as the rank of the document dj with respect to the query q. It reflects the degree of coverage provided to the concept dj by the concept q.
Belief Network Model
The rank of dj
P(dj|q) = P(dj ∧ q) / P(q)
        ~ P(dj ∧ q)
        ~ Σu P(dj ∧ q | u) P(u)
        ~ Σu P(dj | u) P(q | u) P(u)
        ~ Σk P(dj | k) P(q | k) P(k)

[Figure: the belief network: index-term nodes k1, k2, ..., ki, ..., kt, ..., ku are parents of both the query node q and the document nodes d1, ..., dj, ..., dn]
Belief Network Model
For the vector model
Define a vector ki given by
  ki = k | (gi(k) = 1 ∧ ∀j≠i gj(k) = 0)
(in the state ki, only the node ki is active and all the others are inactive)
Belief Network Model
For the vector model
Define:
  P(q | k) = (wi,q / |q|)  if k = ki ∧ gi(q) = 1
             0             if k ≠ ki ∨ gi(q) = 0
  P(¬q | k) = 1 - P(q | k)
(wi,q / |q|) is a normalized version of the weight of the index term ki in the query q
Belief Network Model
For the vector model
Define
  P(dj | k) = (wi,j / |dj|)  if k = ki ∧ gi(dj) = 1
              0              if k ≠ ki ∨ gi(dj) = 0
  P(¬dj | k) = 1 - P(dj | k)
(wi,j / |dj|) is a normalized version of the weight of the index term ki in the document dj
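With these definitions, the belief network rank Σk P(dj|k) P(q|k) P(k) reduces to a vector-model-style score. A hedged sketch with illustrative tf-idf-style weights; the documents and numbers are assumptions, not from the slides.

```python
import math

# Belief network ranking restricted to the states k = ki in which
# exactly one index term is active: sum_i P(dj|ki) P(q|ki), with the
# constant factor P(ki) = (1/2)^t dropped.
w_q = {"probabilistic": 0.8, "model": 0.6}               # w_iq weights
w_d = {
    "d1": {"model": 1.0},
    "d2": {"probabilistic": 0.7, "model": 0.7, "ranking": 0.2},
}

def norm(w):
    return math.sqrt(sum(v * v for v in w.values()))

def rank(doc):
    wd, score = w_d[doc], 0.0
    for ki in set(w_q) & set(wd):     # only states active in both q and dj
        p_q_k = w_q[ki] / norm(w_q)   # P(q | ki) = w_iq / |q|
        p_d_k = wd[ki] / norm(wd)     # P(dj | ki) = w_ij / |dj|
        score += p_d_k * p_q_k
    return score

scores = {d: rank(d) for d in w_d}
```

The resulting score is the cosine of the angle between the normalized query and document weight vectors, which is how the belief network reproduces the standard vector model ranking.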
Bayesian Network Models
Computational costs:
  Inference Network Model: one document node is instantiated at a time, so the cost is linear in the number of documents
  Belief Network Model: only the states that activate each query term need to be considered
The networks do not impose additional costs because they contain no cycles.