The Economics of Interactive Information Retrieval
Leif Azzopardi
http://www.dcs.gla.ac.uk/~leif
Cost: Interaction
Benefit: Relevant Information
Interactive and Iterative Search
A simplified, abstracted representation: the User, driven by an Information Need, poses Queries to the System; the System returns Documents; the User extracts Relevant Information.
Observational & Empirical: ASK, Berry Picking
Theoretical & Formal: Information Foraging Theory, IS&R Framework
Pirolli (1999)
A Major Research Challenge: Theoretical & Formal
Interactive Information Retrieval needs formal models to:
• describe, explain and predict the interaction of users with systems,
• provide a basis on which to reason about interaction,
• understand the relationships between interaction, performance and cost,
• help guide the design, development and research of information systems, and
• derive laws and principles of interaction.
Belkin (2008), Jarvelin (2011)
How do users behave?
• User queries tend to be short (only 2-3 terms).
• Web searchers typically only examine the first page of results.
• Users will often pose a series of short queries.
• Users rarely provide explicit relevance feedback.
• Users adapt to degraded systems by issuing more queries.
• Patent searchers usually express longer and more complex queries.
• Patent searchers typically examine 100-200 documents per query (using a Boolean system).
Why do users behave like this?
User queries tend to be short, but longer queries tend to be more effective!
So why do users pose short queries?
[Figure: Total Performance and Marginal Performance vs. Query Length (no. of terms, 0-30); performance ranges 0-0.5.]
Exponentially diminishing returns kick in after 2 query terms.
Around 2-3 terms is where the user gets the most bang for their buck.
Azzopardi (2009)
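The shape of these two curves can be illustrated with a toy saturating-performance model. The exponential form and its parameters below are assumptions for illustration only, not the fitted curve from Azzopardi (2009):

```python
import math

def performance(length, p_max=0.5, rate=1.0):
    """Toy total-performance curve: saturates as query length grows."""
    return p_max * (1 - math.exp(-rate * length))

# Marginal performance: the extra performance gained by one more term.
marginal = [performance(l) - performance(l - 1) for l in range(1, 8)]

# Each additional term adds exponentially less than the one before it,
# so most of the benefit comes from the first 2-3 terms.
assert all(m1 > m2 for m1, m2 in zip(marginal, marginal[1:]))
```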
How can we use microeconomics to model the search process?
Microeconomics
Consumer Theory
Production Theory
Utility Maximization
Cost Minimization
Production Theory (a.k.a. Theory of Firms)
The Firm utilizes Inputs (Capital, Labor) to produce Output (Widgets); Technology constrains what can be produced.
Varian (1987)
Production Functions
[Figure: Capital vs. Labor, with isoquants Quantity 1, Quantity 2, Quantity 3; the production function bounds the production set beneath it.]
Production Function: Quantity = F(Capital, Labor)
Technology constrains the production set.
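A standard worked example from production theory (not specific to these slides) is the Cobb-Douglas production function, which reappears later in the deck:

```latex
F(K, L) = A \, K^{\alpha} L^{\beta}
```

Here $A$ captures the efficiency of the technology, and $\alpha, \beta$ are the output elasticities of capital and labor; the isoquants in the figure are the level sets $F(K, L) = \text{constant}$.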
Applying Production Theory to Interactive Information Retrieval
Search as Production
The user (the firm) utilizes Inputs (Queries, Assessments) to produce Output (Relevance Gain); the Search Engine Technology constrains what can be produced.
Search Production Function
[Figure: No. of Queries (Q) vs. No. of Assessments per Query (A), with isoquants Gain = 10, Gain = 20, Gain = 30.]
Gain = F(Q, A)
The function represents how well a system could be used, i.e. the minimum input required to achieve that level of gain.
Few queries, lots of assessments? Lots of queries, few assessments? Or some other way?
What strategies can the user employ when interacting with the search system to achieve their end goal?
What is the most cost-efficient way for a user to interact with an IR system?
Modeling Caveats of an economic model of the search process: it is abstracted, simplified and representative.
Gain = F(Q, A)
What does the model tell us about search & interaction?
Search Scenario
• Task: Find news articles about ….
• Goal: To find a number of relevant documents and reach the desired level of Cumulative Gain.
• Output: Total Cumulative Gain (G) across the session
• Inputs:
– Y No. of Queries, and
– X No. of Assessments per Query
• Collections:
– TREC News Collections (AP, LA, Aquaint)
– Each topic had about 30 or more relevant documents
• Simulation: built using C++ and the Lemur IR toolkit
Simulating User Interaction
• Simulated user: issues Y queries of length 3, generated from the relevant set (selecting the best query first/next), and assesses X documents per query.
• Retrieval models: Probabilistic, Vector Space, Boolean.
• Data: TREC Aquaint topics; TREC documents marked relevant.
• Record X & Y for each level of gain.
The simulation assumes the user has perfect information, in order to find out how well the system could be used.
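The original simulation was built in C++ with the Lemur toolkit; the sketch below is only a hypothetical reconstruction of the recording loop, with hard-coded toy ranked lists standing in for a real retrieval model:

```python
def session_gain(num_queries, depth, ranked_lists, relevant):
    """Normalised cumulative gain after issuing num_queries queries
    and assessing the top `depth` documents of each."""
    found = set()
    for ranking in ranked_lists[:num_queries]:
        found.update(d for d in ranking[:depth] if d in relevant)
    return len(found) / len(relevant)

# Toy data: 3 queries over a topic with 10 relevant documents (ids 0-9).
relevant = set(range(10))
ranked_lists = [[0, 1, 2, 99], [3, 4, 98, 5], [6, 7, 8, 9]]

# Record the gain reached for each (Y queries, X assessments/query) pair.
records = {(q, a): session_gain(q, a, ranked_lists, relevant)
           for q in range(1, 4) for a in range(1, 5)}
```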
Search Production Curves: Same Retrieval Model, Different Gain
[Figure: No. of Assessments per Query (0-300) vs. No. of Queries (0-20) on the TREC Aquaint Collection; isoquants for BM25 at NCG = 0.2 and NCG = 0.4.]
8 Q & 15 A/Q gets NCG = 0.4; 4 Q & 40 A/Q gets NCG = 0.4.
7.7 Q & 5 A/Q gets NCG = 0.2; 3.6 Q & 15 A/Q gets NCG = 0.2.
To double the gain requires more than double the no. of assessments.
Search Production Curves: Different Retrieval Models, Same Gain
[Figure: No. of Assessments per Query (0-300) vs. No. of Queries (0-20) on the TREC Aquaint Collection; isoquants at NCG = 0.4 for BM25, BOOL and TFIDF.]
No input combinations with depth less than this are technically feasible!
BM25 provides more strategies (i.e. input combinations) than BOOL or TFIDF.
User adaptation: BM25: 5 Q @ 25 A/Q; BOOL: 10 Q @ 25 A/Q. More queries on the degraded system.
For the same gain, BOOL and TFIDF require a lot more interaction.
Search Production Function: Cobb-Douglas Production Function
Example values on Aquaint when NCG = 0.6:

Model | K    | α    | Goodness of Fit
BM25  | 5.39 | 0.58 | 0.995
BOOL  | 3.47 | 0.58 | 0.992
TFIDF | 1.69 | 0.50 | 0.997

Q: no. of queries issued; A: no. of assessments per query; α: mixing parameter determined by the technology; K: efficiency of the technology used.
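Read with the labels above, the fitted function is plausibly of the standard Cobb-Douglas form. The exact split of the exponents between Q and A is an assumption here, since the table reports a single α:

```latex
g(Q, A) = K \, Q^{\alpha} A^{1 - \alpha}
```

with $K$ the efficiency of the technology and $\alpha$ the mixing parameter between querying and assessing.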
Using the Cobb-Douglas Search Function
We can differentiate the function to find the rates of change of the input variables:
• Marginal Product of Querying: the change in gain over the change in querying, i.e. how much more gain we get if we pose extra queries.
• Marginal Product of Assessing: the change in gain over the change in assessing, i.e. how much more gain we get if we assess extra documents.
• Technical Rate of Substitution.
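Assuming the Cobb-Douglas form $g(Q, A) = K Q^{\alpha} A^{1-\alpha}$, the three quantities work out as:

```latex
\begin{aligned}
MP_Q &= \frac{\partial g}{\partial Q} = \alpha K Q^{\alpha-1} A^{1-\alpha} = \frac{\alpha\, g}{Q} \\
MP_A &= \frac{\partial g}{\partial A} = (1-\alpha) K Q^{\alpha} A^{-\alpha} = \frac{(1-\alpha)\, g}{A} \\
\mathrm{TRS} &= -\frac{MP_Q}{MP_A} = -\frac{\alpha}{1-\alpha}\cdot\frac{A}{Q}
\end{aligned}
```

The TRS is the slope of the isoquant: how many extra assessments per query substitute for one query while holding gain fixed.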
TRS of Assessments for Queries
[Figure: BM25 NCG = 0.4 isoquant (No. of Assessments per Query, 0-300, vs. No. of Queries, 0-20), with TRS values 0.4, 1.2, 2.5, 4.2 and 8.3 marked along the curve.]
How many more assessments per query are needed, if one less query was posed?
EXAMPLE: If 5 queries are submitted, instead of 6, then 24.2 docs/query need to be assessed, instead of 20 docs/query.
6 Q @ 20 A/Q = 120 A; 5 Q @ 24.2 A/Q = 121 A.
At this point, if you gave up one query you'd need to assess 1.2 extra docs/query.
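The arithmetic in the worked example can be checked directly; the near-equal totals show that, at this point on the curve, trading a query for extra assessments leaves overall assessment effort almost unchanged:

```python
# Two strategies from the slide that reach the same gain (NCG = 0.4).
total_before = 6 * 20.0   # 6 queries at 20 assessments each
total_after = 5 * 24.2    # 5 queries at 24.2 assessments each

# Assessments per query traded for the query given up.
extra_per_query = 24.2 - 20.0

assert total_before == 120.0
assert abs(total_after - 121.0) < 1e-9
assert abs(extra_per_query - 4.2) < 1e-9
```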
What about the cost of interaction?
User Search Cost Function: a linear cost function, with:
• Q: no. of queries issued
• A: no. of assessments per query
• β: the relative cost of a query to an assessment
• Q·A: the total no. of documents assessed
What is the relative cost of a query? Using cognitive costs of querying and assessing taken from Gwizdka (2010):
• The average cost of querying was 2628 ms
• The average cost of assessing was 2266 ms
• So β was set to 2628/2266 = 1.1598
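Assuming the linear form Cost(Q, A) = βQ + Q·A implied by the term labels above (cost measured in units of one assessment), the cost function can be sketched as:

```python
# beta from the slide: relative cognitive cost of a query to an assessment,
# derived from Gwizdka (2010)'s timings.
BETA = 1.1598

def cost(num_queries, assessments_per_query, beta=BETA):
    """Linear user search cost, in units of one assessment:
    beta * Q (querying) + Q * A (assessing)."""
    return beta * num_queries + num_queries * assessments_per_query

# A query costs only slightly more than an assessment, so the two
# strategies from the TRS example cost almost the same overall.
assert abs(cost(6, 20) - cost(5, 24.2)) < 0.5
```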
Cost Efficient Strategies
[Figure: BM25 at 0.4 and 0.6 gains; No. of Assessments per Query (0-30) vs. No. of Queries (0-50) and Cost (130-380), with the minimum cost marked on each curve.]
On BM25, to increase gain, pose more queries but examine the same no. of docs per query.
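Why the cost-minimising depth does not move with the gain target can be sketched numerically. Assuming a Cobb-Douglas gain function with the BM25 fit from the earlier table and the linear cost with β = 1.1598 (both forms and values are assumptions carried over from earlier slides), the cost along each isoquant factors into a gain-dependent constant times a depth-dependent term, so every gain level shares the same best depth:

```python
K, ALPHA, BETA = 5.39, 0.58, 1.1598  # assumed BM25 fit and cost ratio

def queries_needed(gain, depth):
    """Q such that gain = K * Q**ALPHA * depth**(1 - ALPHA)."""
    return (gain / (K * depth ** (1 - ALPHA))) ** (1 / ALPHA)

def best_depth(gain, depths):
    """Grid-search the depth minimising cost = Q * (BETA + depth)."""
    return min(depths, key=lambda a: queries_needed(gain, a) * (BETA + a))

depths = [d / 10 for d in range(5, 301)]  # 0.5 .. 30.0 assessments/query

# Same cost-minimising depth regardless of the gain target.
assert best_depth(10, depths) == best_depth(20, depths)
```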
Cost Efficient Strategies
[Figure: BOOL at 0.4 and 0.6 gains; No. of Assessments per Query vs. No. of Queries and Cost, with the minimum cost marked on each curve.]
On Boolean, to increase gain, issue about the same no. of queries, but examine more docs per query.
Contrasting Systems
[Figure: side-by-side cost curves, BM25 0.4 and 0.6 gains vs. BOOL 0.4 and 0.6 gains; No. of Assessments per Query vs. No. of Queries and Cost.]
BM25 is less costly to use than BOOL.
On BM25, issue more queries but examine fewer docs per query.
A Hypothetical Experiment
What happens if querying costs go down? More queries issued; a decrease in assessments per query.
What happens if querying costs go up? An increase in assessments per query; a decrease in queries issued.
Changing the Relative Query Cost
[Figure: Cost vs. No. of Assessments per Query as β increases.]
As β increases, the relative cost of querying goes up: it is cheaper to assess more documents per query and consequently to query less!
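This direction of effect can be sketched numerically. Assuming a Cobb-Douglas gain function with the BM25 fit from the earlier table and the linear cost βQ + Q·A (both assumptions), a grid search shows the cost-minimising strategy shifting toward deeper assessment per query as β rises:

```python
K, ALPHA = 5.39, 0.58  # assumed BM25 Cobb-Douglas fit from the earlier slide

def best_depth(gain, beta, depths):
    """Depth minimising cost = Q * (beta + depth) on a fixed-gain isoquant."""
    def queries(a):
        return (gain / (K * a ** (1 - ALPHA))) ** (1 / ALPHA)
    return min(depths, key=lambda a: queries(a) * (beta + a))

depths = [d / 10 for d in range(5, 501)]  # 0.5 .. 50.0 assessments/query

# As the relative cost of a query grows, it pays to assess more per query.
assert best_depth(20, 1.0, depths) < best_depth(20, 4.0, depths)
```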
Implications for Design
Knowing how benefit, interaction and cost relate can help guide how we design systems:
• We can theorize about how changes to the system will affect the user's interaction. Is this desirable? Do we want the user to query more? Or for them to assess more?
• We can categorize the type of user. Is this a savvy, rational user? Or is this a user behaving irrationally?
• We can scrutinize the introduction of new features. Are they going to be of any use? Are they worth it for the user? I.e. how much more performance must they provide, or how little must they cost?
Future Directions
• Validate the theory by conducting observational & empirical research. Do the predictions about user behavior hold?
• Incorporate other inputs into the model: Find Similar, Relevance Feedback, Browsing, Query Length, Query Type, etc.
• Develop more accurate cost functions: obtain better estimates of costs.
• Model other search tasks.
Selected References
• Varian, H., Intermediate Microeconomics, 1987.
• Varian, H., Economics and Search, ACM SIGIR Forum, 1999.
• Pirolli, P., Information Foraging Theory, 1999.
• Belkin, N., Some(what) grand challenges of Interactive Information Retrieval, ACM SIGIR Forum, 2008.
• Azzopardi, L., Query Side Evaluation, ACM SIGIR 2009. http://dl.acm.org/citation.cfm?doid=1571941.1572037
• Azzopardi, L., The Economics of Interactive Information Retrieval, ACM SIGIR 2011. http://dl.acm.org/citation.cfm?doid=2009916.2009923
• Jarvelin, K., IR Research: Systems, Interaction, Evaluation and Theories, ACM SIGIR Forum, 2011.
Example: Search Production Function
[Figure: Interaction X vs. Interaction Y, with isoquants of G = F(X, Y).]
Example application for web search
[Figure: Length of Query (L) vs. No. of Assessments (A), with isoquants P@10 = 0.1, 0.2, 0.3.]
P@10 = F(L, A)