
What is coming…

Today:
• Probabilistic models
• Improving classical models
  – Latent Semantic Indexing
  – Relevance feedback (Chapter 5)

Monday Feb 5: Chapter 5 continued

Wednesday Feb 7: Web search engines (Chapter 13 & Google paper)

Announcement: Free Food Event

Where & When: GWC 487, Tuesday Feb 6th, 12:15-1:30
What: Pizza, soft drinks
Catch: A pitch on going to graduate school at ASU CSE… Meet the admissions committee, some graduate students, some faculty members
Silver lining: Did we mention pizza? Also not as boring as time-share condo presentations. Make sure to come!!

Problems with the Vector Model

No semantic basis!
• Keywords are plotted as axes, but are they really independent? Are they orthogonal?

No support for Boolean queries
• How do you ask for papers that don't contain a keyword?

Probabilistic Model

Objective: capture the IR problem in a probabilistic framework.
• Given a user query, there is an ideal answer set; querying is the specification of the properties of this ideal answer set (clustering).
• But what are these properties?
  – Guess at the beginning what they could be (i.e., guess an initial description of the ideal answer set).
  – Improve by iteration.

Probabilistic Model

• An initial set of documents is retrieved somehow.
• The user inspects these docs looking for the relevant ones (in truth, only the top 10-20 need to be inspected).
• The IR system uses this information to refine the description of the ideal answer set.
• By repeating this process, the description of the ideal answer set is expected to improve.
• Always keep in mind the need to guess, at the very beginning, the description of the ideal answer set.
• The description of the ideal answer set is modeled in probabilistic terms.

Probabilistic Ranking Principle

• Given a user query q and a document d_j, estimate the probability that the user will find d_j interesting (i.e., relevant).
• The model assumes that this probability of relevance depends only on the query and document representations.
• The ideal answer set is referred to as R and should maximize the probability of relevance; documents in R are predicted to be relevant.
• But how do we compute these probabilities? What is the sample space?

The Ranking

Probabilistic ranking is computed as:

  sim(d_j, q) = P(d_j relevant-to q) / P(d_j non-relevant-to q)

This is the odds of the document d_j being relevant; taking the odds minimizes the probability of an erroneous judgement.

Definitions:
• w_ij ∈ {0, 1} (index term weights are binary)
• P(R | vec(d_j)): probability that the given doc is relevant
• P(¬R | vec(d_j)): probability that the given doc is not relevant

The Ranking

  sim(d_j, q) = P(R | vec(d_j)) / P(¬R | vec(d_j))
              = [P(vec(d_j) | R) × P(R)] / [P(vec(d_j) | ¬R) × P(¬R)]    (Bayes' rule)
              ~ P(vec(d_j) | R) / P(vec(d_j) | ¬R)    (P(R) and P(¬R) are the same for all docs)

P(vec(d_j) | R): probability of randomly selecting the document d_j from the set R of relevant documents.

The Ranking

  sim(d_j, q) ~ P(vec(d_j) | R) / P(vec(d_j) | ¬R)

where vec(d_j) is of the form (k_1, k_2, ..., k_t). Using the pairwise independence assumption among keywords:

  sim(d_j, q) ~ ( [Π P(k_i | R)] × [Π P(¬k_i | R)] ) / ( [Π P(k_i | ¬R)] × [Π P(¬k_i | ¬R)] )

where, in each pair, the first product runs over keywords that are present in d_j and the second over keywords that are NOT present in d_j.

P(k_i | R): probability that the index term k_i is present in a document randomly selected from the set R of relevant documents.

The Ranking

Taking logarithms and dropping factors (K) that are the same for all documents:

  sim(d_j, q) ~ log { ( [Π P(k_i | R)] × [Π P(¬k_i | R)] ) / ( [Π P(k_i | ¬R)] × [Π P(¬k_i | ¬R)] ) }

              ~ K + Σ_i [ log ( P(k_i | R) / P(¬k_i | R) ) + log ( P(¬k_i | ¬R) / P(k_i | ¬R) ) ]

              ~ Σ_i w_iq × w_ij × ( log [ P(k_i | R) / (1 − P(k_i | R)) ] + log [ (1 − P(k_i | ¬R)) / P(k_i | ¬R) ] )

where P(¬k_i | R) = 1 − P(k_i | R) and P(¬k_i | ¬R) = 1 − P(k_i | ¬R).

The Initial Ranking

  sim(d_j, q) ~ Σ_i w_iq × w_ij × ( log [ P(k_i | R) / (1 − P(k_i | R)) ] + log [ (1 − P(k_i | ¬R)) / P(k_i | ¬R) ] )

How do we get the probabilities P(k_i | R) and P(k_i | ¬R)? Estimates based on assumptions:
• P(k_i | R) = 0.5
• P(k_i | ¬R) = n_i / N, where n_i is the number of docs that contain k_i and N is the total number of docs

Use this initial guess to retrieve an initial ranking, then improve upon it.
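As a concrete illustration, here is a small Python sketch of this initial ranking over a made-up corpus, using the two estimates above (P(k_i | R) = 0.5 and P(k_i | ¬R) = n_i / N) with binary weights w_iq = w_ij = 1.

```python
import math

docs = {
    "d1": {"probabilistic", "models", "retrieval"},
    "d2": {"latent", "semantic", "indexing"},
    "d3": {"probabilistic", "retrieval", "feedback"},
    "d4": {"web", "search", "engines"},
    "d5": {"vector", "space", "models"},
}
N = len(docs)

# ni: number of docs containing each index term
ni = {}
for terms in docs.values():
    for t in terms:
        ni[t] = ni.get(t, 0) + 1

def initial_sim(q, d):
    score = 0.0
    for t in q & d:                      # binary weights: wiq = wij = 1
        p_r = 0.5                        # initial estimate: P(ki | R) = 0.5
        p_nr = ni[t] / N                 # initial estimate: P(ki | ~R) = ni / N
        score += (math.log(p_r / (1 - p_r))        # = 0 under the initial guess
                  + math.log((1 - p_nr) / p_nr))   # an idf-like factor
    return score

q = {"probabilistic", "retrieval"}
for d in sorted(docs, key=lambda d: initial_sim(q, docs[d]), reverse=True):
    print(d, round(initial_sim(q, docs[d]), 3))    # d1 and d3 rank first
```

Note that the first log term vanishes under the initial guess, so the initial ranking is driven entirely by the idf-like factor log((N − n_i) / n_i).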

Pluses and Minuses

Advantages:
• Docs are ranked in decreasing order of their probability of relevance

Disadvantages:
• Need to guess the initial estimates for P(k_i | R)
• The method does not take the tf and idf factors into account

Alternative Probabilistic Models

Probability theory: semantically clear, but computationally clumsy.

Why Bayesian networks?
• A clear formalism to combine evidence
• Modularize the world (dependencies)

Bayesian network models for IR:
• Inference Network (Turtle & Croft, 1991)
• Belief Network (Ribeiro-Neto & Muntz, 1996)

Bayesian Networks

Definition: Bayesian networks are directed acyclic graphs (DAGs) in which the nodes represent random variables, the arcs portray causal relationships between these variables, and the strengths of these causal influences are expressed by conditional probabilities.

Bayesian Networks

• y_i: parent nodes (in this case, root nodes); x: child node
• The y_i cause x; Y is the set of parents of x
• The influence of Y on x can be quantified by any function F(x, Y) such that Σ_x F(x, Y) = 1 and 0 ≤ F(x, Y) ≤ 1
• For example, F(x, Y) = P(x | Y)

[Figure: root nodes y_1, y_2, ..., y_t, each with an arc into the child node x]

Bayesian Networks

Given the dependencies declared in a Bayesian network, the expression for the joint probability can be computed as a product of local conditional probabilities. For example:

  P(x_1, x_2, x_3, x_4, x_5) = P(x_1) P(x_2 | x_1) P(x_3 | x_1) P(x_4 | x_2, x_3) P(x_5 | x_3)

P(x_1): prior probability of the root node.

[Figure: network with root x_1; arcs x_1 → x_2, x_1 → x_3, {x_2, x_3} → x_4, x_3 → x_5]
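A short Python sketch of this factorization for the five-node network above, with binary variables; the numbers in the conditional tables are made up for illustration.

```python
import itertools

P_x1 = 0.6                                   # P(x1 = 1)
P_x2 = {1: 0.7, 0: 0.2}                      # P(x2 = 1 | x1)
P_x3 = {1: 0.4, 0: 0.5}                      # P(x3 = 1 | x1)
P_x4 = {(1, 1): 0.9, (1, 0): 0.6,
        (0, 1): 0.5, (0, 0): 0.1}            # P(x4 = 1 | x2, x3)
P_x5 = {1: 0.8, 0: 0.3}                      # P(x5 = 1 | x3)

def bern(p, v):
    # P(var = v) for a binary variable with P(var = 1) = p.
    return p if v == 1 else 1 - p

def joint(x1, x2, x3, x4, x5):
    # Joint probability as the product of local conditional probabilities.
    return (bern(P_x1, x1) * bern(P_x2[x1], x2) * bern(P_x3[x1], x3)
            * bern(P_x4[(x2, x3)], x4) * bern(P_x5[x3], x5))

# Sanity check: the joint sums to 1 over all 2^5 assignments.
total = sum(joint(*xs) for xs in itertools.product([0, 1], repeat=5))
print(round(total, 10))                      # 1.0
```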

Bayesian Networks

In a Bayesian network, each variable x is conditionally independent of all its non-descendants, given its parents. For example:

  P(x_4, x_5 | x_2, x_3) = P(x_4 | x_2, x_3) P(x_5 | x_3)

[Figure: the same five-node network as above]
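This follows directly from the factorization on the previous slide; a short derivation, summing the joint over x_1:

```latex
\begin{aligned}
P(x_4, x_5 \mid x_2, x_3)
  &= \frac{P(x_2, x_3, x_4, x_5)}{P(x_2, x_3)} \\
  &= \frac{\sum_{x_1} P(x_1)\,P(x_2 \mid x_1)\,P(x_3 \mid x_1)\,
           P(x_4 \mid x_2, x_3)\,P(x_5 \mid x_3)}{P(x_2, x_3)} \\
  &= P(x_4 \mid x_2, x_3)\,P(x_5 \mid x_3)\,
     \frac{\sum_{x_1} P(x_1)\,P(x_2 \mid x_1)\,P(x_3 \mid x_1)}{P(x_2, x_3)} \\
  &= P(x_4 \mid x_2, x_3)\,P(x_5 \mid x_3),
\end{aligned}
```

since the remaining sum is exactly P(x_2, x_3).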

An Example Bayes Net

Typically, networks written in the causal direction wind up being the most compact, i.e., they need the least number of probabilities to be specified.

Two Models

[Figure: two network topologies side by side. Left, the "Inference Network model": a document node d_j, index term nodes k_1, k_2, ..., k_i, ..., k_t, and query nodes q_1, q_2 combined through AND/OR operators into the information need I. Right, the "Belief network model": a query node q on one side, index term nodes k_1, k_2, ..., k_i, ..., k_t, ..., k_u in the middle, and document nodes d_1, ..., d_j, ..., d_n on the other side.]

Comparison

The Inference Network model came first and is the better known; it is used in the Inquery system.

The Belief Network adopts a set-theoretic view:
• a clearly defined sample space
• a separation between the query and document portions of the network
• it is able to reproduce any ranking produced by the Inference Network, while the converse is not true (for example, the ranking of the standard vector model)

Belief Network Model

Like the Inference Network model:
• an epistemological view of the IR problem
• random variables associated with documents, index terms and queries

Unlike the Inference Network model:
• a clearly defined sample space
• a set-theoretic view
• a different network topology

Belief Network Model

The probability space. Define:
• K = {k_1, k_2, ..., k_t}: the sample space (a concept space)
• u ⊆ K: a subset of K (a concept)
• k_i: an index term (an elementary concept)
• k = (k_1, k_2, ..., k_t): a vector associated with each u, such that g_i(k) = 1 ⟺ k_i ∈ u
• k_i: a binary random variable associated with the index term k_i (k_i = 1 ⟺ g_i(k) = 1 ⟺ k_i ∈ u)

Belief Network Model

A set-theoretic view. Define:
• a document d_j and a query q as concepts in K
• a generic concept c in K
• a probability distribution P over K, as

  P(c) = Σ_u P(c | u) P(u),   P(u) = (1/2)^t

P(c) is the degree of coverage of the space K by c.
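A small Python sketch of this probability space for t = 3. The model does not pin down P(c | u) here, so as an illustrative assumption we take P(c | u) = 1 when the concept c is contained in u and 0 otherwise; P(u) = (1/2)^t is uniform over all subsets of K.

```python
from itertools import chain, combinations

K = {"k1", "k2", "k3"}
t = len(K)

def subsets(s):
    # All 2^t subsets of s, as frozensets.
    s = list(s)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def P(c):
    # P(c) = sum_u P(c|u) P(u), with the indicator P(c|u) = [c is a subset of u].
    return sum((1 if c <= u else 0) * (0.5 ** t) for u in subsets(K))

print(P(frozenset({"k1"})))        # 0.5  -> half of the states cover {k1}
print(P(frozenset({"k1", "k2"}))) # 0.25 -> the degree of coverage shrinks
```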

Belief Network Model

Network topology:

[Figure: the query q on the query side; index term nodes k_1, k_2, ..., k_i, ..., k_t, ..., k_u in the middle; document nodes d_1, ..., d_j, ..., d_n on the document side]

Belief Network Model

Assumption: P(d_j | q) is adopted as the rank of the document d_j with respect to the query q. It reflects the degree of coverage provided to the concept d_j by the concept q.

Belief Network Model

The rank of d_j:

  P(d_j | q) = P(d_j ∧ q) / P(q)
             ~ P(d_j ∧ q)                        (P(q) is the same for all docs)
             ~ Σ_u P(d_j ∧ q | u) P(u)
             ~ Σ_u P(d_j | u) P(q | u) P(u)      (d_j and q are conditionally independent given u)
             ~ Σ_k P(d_j | k) P(q | k) P(k)

Belief Network Model

For the vector model, define a state k_i given by

  k_i = the state k such that g_i(k) = 1 and g_j(k) = 0 for all j ≠ i

i.e., in the state k_i only the node k_i is active and all the others are inactive.

Belief Network Model

For the vector model, define

  P(q | k) = w_iq / |q|   if k = k_i and g_i(q) = 1
  P(q | k) = 0            if k ≠ k_i or g_i(q) = 0
  P(¬q | k) = 1 − P(q | k)

(w_iq / |q|) is a normalized version of the weight of the index term k_i in the query q.

Belief Network Model

For the vector model, define

  P(d_j | k) = w_ij / |d_j|   if k = k_i and g_i(d_j) = 1
  P(d_j | k) = 0              if k ≠ k_i or g_i(d_j) = 0
  P(¬d_j | k) = 1 − P(d_j | k)

(w_ij / |d_j|) is a normalized version of the weight of the index term k_i in the document d_j.
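Putting the last three slides together, here is a small Python sketch of the rank P(d_j | q) ~ Σ_k P(d_j | k) P(q | k) P(k) under these definitions. The weights are made up, and |q| and |d_j| are assumed to be Euclidean norms. Only the single-term states k_i contribute, so the rank reduces to the vector model's cosine up to the constant P(k_i) = (1/2)^t.

```python
import math

t = 4                                          # |K|, number of index terms
w_q  = [0.0, 1.0, 1.0, 0.0]                    # wiq: made-up query weights
w_dj = [0.5, 2.0, 0.0, 1.0]                    # wij: made-up document weights

norm_q  = math.sqrt(sum(w * w for w in w_q))   # |q|, assumed Euclidean
norm_dj = math.sqrt(sum(w * w for w in w_dj))  # |dj|, assumed Euclidean

# Sum over the states ki: P(dj|ki) * P(q|ki) * P(ki), with P(ki) = (1/2)^t.
rank = sum((w_dj[i] / norm_dj) * (w_q[i] / norm_q) * (0.5 ** t)
           for i in range(t))

# The standard vector-model cosine, for comparison.
cosine = sum(w_q[i] * w_dj[i] for i in range(t)) / (norm_q * norm_dj)
print(round(rank, 6), round(cosine * 0.5 ** t, 6))   # identical values
```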

Bayesian Network Models

Computational costs:
• Inference Network model: one document node at a time, so the cost is linear in the number of documents
• Belief Network: only the states that activate each query term are considered
• The networks impose no additional costs because they contain no cycles

Bayesian Network Models

Impact: the major strength is the combination of distinct evidential sources to support the rank of a given document.