The Economics of Interactive Information Retrieval
Leif Azzopardi
http://www.dcs.gla.ac.uk/~leif
Cost: Interaction
Benefit: Relevant Information
Interactive and Iterative Search
A simplified, abstracted representation: the User, driven by an Information Need, poses Queries to the System; the System returns Documents; the User extracts Relevant Information.
Observational & Empirical: ASK, Berry Picking
Theoretical & Formal: Information Foraging Theory, IS&R Framework
Pirolli (1999)
A Major Research Challenge: Theoretical & Formal
Interactive Information Retrieval needs formal models to:
• describe, explain and predict the interaction of users with systems,
• provide a basis on which to reason about interaction,
• understand the relationships between interaction, performance and cost,
• help guide the design, development and research of information systems, and
• derive laws and principles of interaction.
Belkin (2008), Jarvelin (2011)
How do users behave?
• User queries tend to be short (only 2-3 terms).
• Web searchers typically only examine the first page of results.
• Users will often pose a series of short queries.
• Users rarely provide explicit relevance feedback.
• Users adapt to degraded systems by issuing more queries.
• Patent searchers usually express longer and more complex queries.
• Patent searchers typically examine 100-200 documents per query (using a Boolean system).
Why do users behave like this?
User queries tend to be short, but longer queries tend to be more effective!
So why do users pose short queries?
[Figure: Total Performance and Marginal Performance vs. Query Length (no. of terms, 0-30); performance ranges 0-0.5.]
Exponentially diminishing returns kick in after 2 query terms.
Around 2-3 terms is where the user gets the most bang for their buck.
Azzopardi (2009)
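The shape of these two curves can be illustrated with a toy saturating-performance model. The exponential form and its parameters below are assumptions for illustration only, not the fitted curve from Azzopardi (2009):

```python
import math

def performance(length, p_max=0.5, rate=1.0):
    """Toy total-performance curve: saturates as query length grows."""
    return p_max * (1 - math.exp(-rate * length))

# Marginal performance: the extra performance gained by one more term.
marginal = [performance(l) - performance(l - 1) for l in range(1, 8)]

# Each additional term adds exponentially less than the one before it,
# so most of the benefit comes from the first 2-3 terms.
assert all(m1 > m2 for m1, m2 in zip(marginal, marginal[1:]))
```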
How can we use microeconomics to model the search process?
Microeconomics
Consumer Theory
Production Theory
Utility Maximization
Cost Minimization
Production Theory (a.k.a. Theory of Firms)
The Firm utilizes Inputs (Capital, Labor) to produce Output (Widgets); Technology constrains what can be produced.
Varian (1987)
Production Functions
[Figure: Capital vs. Labor, with isoquants Quantity 1, Quantity 2, Quantity 3; the production function bounds the production set beneath it.]
Production Function: Quantity = F(Capital, Labor)
Technology constrains the production set.
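A standard worked example from production theory (not specific to these slides) is the Cobb-Douglas production function, which reappears later in the deck:

```latex
F(K, L) = A \, K^{\alpha} L^{\beta}
```

Here $A$ captures the efficiency of the technology, and $\alpha, \beta$ are the output elasticities of capital and labor; the isoquants in the figure are the level sets $F(K, L) = \text{constant}$.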
Applying Production Theory to Interactive Information Retrieval
Search as Production
The user (the firm) utilizes Inputs (Queries, Assessments) to produce Output (Relevance Gain); the Search Engine Technology constrains what can be produced.
Search Production Function
[Figure: No. of Queries (Q) vs. No. of Assessments per Query (A), with isoquants Gain = 10, Gain = 20, Gain = 30.]
Gain = F(Q, A)
The function represents how well a system could be used, i.e. the minimum input required to achieve that level of gain.
Few queries, lots of assessments? Lots of queries, few assessments? Or some other way?
What strategies can the user employ when interacting with the search system to achieve their end goal?
What is the most cost-efficient way for a user to interact with an IR system?
Modeling Caveats of an economic model of the search process: it is abstracted, simplified and representative.
Gain = F(Q, A)
What does the model tell us about search & interaction?
Search Scenario
• Task: Find news articles about ….
• Goal: To find a number of relevant documents and reach the desired level of Cumulative Gain.
• Output: Total Cumulative Gain (G) across the session
• Inputs:
– Y No. of Queries, and
– X No. of Assessments per Query
• Collections:
– TREC News Collections (AP, LA, Aquaint)
– Each topic had about 30 or more relevant documents
• Simulation: built using C++ and the Lemur IR toolkit
Simulating User Interaction
• Simulated user: issues Y queries of length 3, generated from the relevant set (selecting the best query first/next), and assesses X documents per query.
• Retrieval models: Probabilistic, Vector Space, Boolean.
• Data: TREC Aquaint topics; TREC documents marked relevant.
• Record X & Y for each level of gain.
The simulation assumes the user has perfect information, in order to find out how well the system could be used.
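The original simulation was built in C++ with the Lemur toolkit; the sketch below is only a hypothetical reconstruction of the recording loop, with hard-coded toy ranked lists standing in for a real retrieval model:

```python
def session_gain(num_queries, depth, ranked_lists, relevant):
    """Normalised cumulative gain after issuing num_queries queries
    and assessing the top `depth` documents of each."""
    found = set()
    for ranking in ranked_lists[:num_queries]:
        found.update(d for d in ranking[:depth] if d in relevant)
    return len(found) / len(relevant)

# Toy data: 3 queries over a topic with 10 relevant documents (ids 0-9).
relevant = set(range(10))
ranked_lists = [[0, 1, 2, 99], [3, 4, 98, 5], [6, 7, 8, 9]]

# Record the gain reached for each (Y queries, X assessments/query) pair.
records = {(q, a): session_gain(q, a, ranked_lists, relevant)
           for q in range(1, 4) for a in range(1, 5)}
```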
Search Production Curves: Same Retrieval Model, Different Gain
[Figure: No. of Assessments per Query (0-300) vs. No. of Queries (0-20) on the TREC Aquaint Collection; isoquants for BM25 at NCG = 0.2 and NCG = 0.4.]
8 Q & 15 A/Q gets NCG = 0.4; 4 Q & 40 A/Q gets NCG = 0.4.
7.7 Q & 5 A/Q gets NCG = 0.2; 3.6 Q & 15 A/Q gets NCG = 0.2.
To double the gain requires more than double the no. of assessments.
Search Production Curves: Different Retrieval Models, Same Gain
[Figure: No. of Assessments per Query (0-300) vs. No. of Queries (0-20) on the TREC Aquaint Collection; isoquants at NCG = 0.4 for BM25, BOOL and TFIDF.]
No input combinations with depth less than this are technically feasible!
BM25 provides more strategies (i.e. input combinations) than BOOL or TFIDF.
User adaptation: BM25: 5 Q @ 25 A/Q; BOOL: 10 Q @ 25 A/Q. More queries on the degraded system.
For the same gain, BOOL and TFIDF require a lot more interaction.
Search Production Function: Cobb-Douglas Production Function
Example values on Aquaint when NCG = 0.6:

Model | K    | α    | Goodness of Fit
BM25  | 5.39 | 0.58 | 0.995
BOOL  | 3.47 | 0.58 | 0.992
TFIDF | 1.69 | 0.50 | 0.997

Q: no. of queries issued; A: no. of assessments per query; α: mixing parameter determined by the technology; K: efficiency of the technology used.
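Read with the labels above, the fitted function is plausibly of the standard Cobb-Douglas form. The exact split of the exponents between Q and A is an assumption here, since the table reports a single α:

```latex
g(Q, A) = K \, Q^{\alpha} A^{1 - \alpha}
```

with $K$ the efficiency of the technology and $\alpha$ the mixing parameter between querying and assessing.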
Using the Cobb-Douglas Search Function
We can differentiate the function to find the rates of change of the input variables:
• Marginal Product of Querying: the change in gain over the change in querying, i.e. how much more gain we get if we pose extra queries.
• Marginal Product of Assessing: the change in gain over the change in assessing, i.e. how much more gain we get if we assess extra documents.
• Technical Rate of Substitution.
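Assuming the Cobb-Douglas form $g(Q, A) = K Q^{\alpha} A^{1-\alpha}$, the three quantities work out as:

```latex
\begin{aligned}
MP_Q &= \frac{\partial g}{\partial Q} = \alpha K Q^{\alpha-1} A^{1-\alpha} = \frac{\alpha\, g}{Q} \\
MP_A &= \frac{\partial g}{\partial A} = (1-\alpha) K Q^{\alpha} A^{-\alpha} = \frac{(1-\alpha)\, g}{A} \\
\mathrm{TRS} &= -\frac{MP_Q}{MP_A} = -\frac{\alpha}{1-\alpha}\cdot\frac{A}{Q}
\end{aligned}
```

The TRS is the slope of the isoquant: how many extra assessments per query substitute for one query while holding gain fixed.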
TRS of Assessments for Queries
[Figure: BM25 NCG = 0.4 isoquant (No. of Assessments per Query, 0-300, vs. No. of Queries, 0-20), with TRS values 0.4, 1.2, 2.5, 4.2 and 8.3 marked along the curve.]
How many more assessments per query are needed, if one less query was posed?
EXAMPLE: If 5 queries are submitted, instead of 6, then 24.2 docs/query need to be assessed, instead of 20 docs/query.
6 Q @ 20 A/Q = 120 A; 5 Q @ 24.2 A/Q = 121 A.
At this point, if you gave up one query you'd need to assess 1.2 extra docs/query.
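The arithmetic in the worked example can be checked directly; the near-equal totals show that, at this point on the curve, trading a query for extra assessments leaves overall assessment effort almost unchanged:

```python
# Two strategies from the slide that reach the same gain (NCG = 0.4).
total_before = 6 * 20.0   # 6 queries at 20 assessments each
total_after = 5 * 24.2    # 5 queries at 24.2 assessments each

# Assessments per query traded for the query given up.
extra_per_query = 24.2 - 20.0

assert total_before == 120.0
assert abs(total_after - 121.0) < 1e-9
assert abs(extra_per_query - 4.2) < 1e-9
```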
What about the cost of interaction?
User Search Cost Function: a linear cost function, with:
• Q: no. of queries issued
• A: no. of assessments per query
• β: the relative cost of a query to an assessment
• Q·A: the total no. of documents assessed
What is the relative cost of a query? Using cognitive costs of querying and assessing taken from Gwizdka (2010):
• The average cost of querying was 2628 ms
• The average cost of assessing was 2266 ms
• So β was set to 2628/2266 = 1.1598
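Assuming the linear form Cost(Q, A) = βQ + Q·A implied by the term labels above (cost measured in units of one assessment), the cost function can be sketched as:

```python
# beta from the slide: relative cognitive cost of a query to an assessment,
# derived from Gwizdka (2010)'s timings.
BETA = 1.1598

def cost(num_queries, assessments_per_query, beta=BETA):
    """Linear user search cost, in units of one assessment:
    beta * Q (querying) + Q * A (assessing)."""
    return beta * num_queries + num_queries * assessments_per_query

# A query costs only slightly more than an assessment, so the two
# strategies from the TRS example cost almost the same overall.
assert abs(cost(6, 20) - cost(5, 24.2)) < 0.5
```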
Cost Efficient Strategies
[Figure: BM25 at 0.4 and 0.6 gains; No. of Assessments per Query (0-30) vs. No. of Queries (0-50) and Cost (130-380), with the minimum cost marked on each curve.]
On BM25, to increase gain, pose more queries but examine the same no. of docs per query.
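Why the cost-minimising depth does not move with the gain target can be sketched numerically. Assuming a Cobb-Douglas gain function with the BM25 fit from the earlier table and the linear cost with β = 1.1598 (both forms and values are assumptions carried over from earlier slides), the cost along each isoquant factors into a gain-dependent constant times a depth-dependent term, so every gain level shares the same best depth:

```python
K, ALPHA, BETA = 5.39, 0.58, 1.1598  # assumed BM25 fit and cost ratio

def queries_needed(gain, depth):
    """Q such that gain = K * Q**ALPHA * depth**(1 - ALPHA)."""
    return (gain / (K * depth ** (1 - ALPHA))) ** (1 / ALPHA)

def best_depth(gain, depths):
    """Grid-search the depth minimising cost = Q * (BETA + depth)."""
    return min(depths, key=lambda a: queries_needed(gain, a) * (BETA + a))

depths = [d / 10 for d in range(5, 301)]  # 0.5 .. 30.0 assessments/query

# Same cost-minimising depth regardless of the gain target.
assert best_depth(10, depths) == best_depth(20, depths)
```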
Cost Efficient Strategies
[Figure: BOOL at 0.4 and 0.6 gains; No. of Assessments per Query vs. No. of Queries and Cost, with the minimum cost marked on each curve.]
On Boolean, to increase gain, issue about the same no. of queries, but examine more docs per query.
Contrasting Systems
[Figure: side-by-side cost curves, BM25 0.4 and 0.6 gains vs. BOOL 0.4 and 0.6 gains; No. of Assessments per Query vs. No. of Queries and Cost.]
BM25 is less costly to use than BOOL.
On BM25, issue more queries but examine fewer docs per query.
A Hypothetical Experiment
What happens if querying costs go down? More queries issued; a decrease in assessments per query.
What happens if querying costs go up? An increase in assessments per query; a decrease in queries issued.
Changing the Relative Query Cost
[Figure: Cost vs. No. of Assessments per Query as β increases.]
As β increases, the relative cost of querying goes up: it is cheaper to assess more documents per query and consequently to query less!
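This direction of effect can be sketched numerically. Assuming a Cobb-Douglas gain function with the BM25 fit from the earlier table and the linear cost βQ + Q·A (both assumptions), a grid search shows the cost-minimising strategy shifting toward deeper assessment per query as β rises:

```python
K, ALPHA = 5.39, 0.58  # assumed BM25 Cobb-Douglas fit from the earlier slide

def best_depth(gain, beta, depths):
    """Depth minimising cost = Q * (beta + depth) on a fixed-gain isoquant."""
    def queries(a):
        return (gain / (K * a ** (1 - ALPHA))) ** (1 / ALPHA)
    return min(depths, key=lambda a: queries(a) * (beta + a))

depths = [d / 10 for d in range(5, 501)]  # 0.5 .. 50.0 assessments/query

# As the relative cost of a query grows, it pays to assess more per query.
assert best_depth(20, 1.0, depths) < best_depth(20, 4.0, depths)
```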
Implications for Design
Knowing how benefit, interaction and cost relate can help guide how we design systems:
• We can theorize about how changes to the system will affect the user's interaction. Is this desirable? Do we want the user to query more? Or for them to assess more?
• We can categorize the type of user. Is this a savvy, rational user? Or is this a user behaving irrationally?
• We can scrutinize the introduction of new features. Are they going to be of any use? Are they worth it for the user? I.e. how much more performance must they provide, or how little must they cost?
Future Directions
• Validate the theory by conducting observational & empirical research. Do the predictions about user behavior hold?
• Incorporate other inputs into the model: Find Similar, Relevance Feedback, Browsing, Query Length, Query Type, etc.
• Develop more accurate cost functions: obtain better estimates of costs.
• Model other search tasks.
Selected References
• Varian, H., Intermediate Microeconomics, 1987.
• Varian, H., Economics and Search, ACM SIGIR Forum, 1999.
• Pirolli, P., Information Foraging Theory, 1999.
• Belkin, N., Some(what) grand challenges of Interactive Information Retrieval, ACM SIGIR Forum, 2008.
• Azzopardi, L., Query Side Evaluation, ACM SIGIR 2009. http://dl.acm.org/citation.cfm?doid=1571941.1572037
• Azzopardi, L., The Economics of Interactive Information Retrieval, ACM SIGIR 2011. http://dl.acm.org/citation.cfm?doid=2009916.2009923
• Jarvelin, K., IR Research: Systems, Interaction, Evaluation and Theories, ACM SIGIR Forum, 2011.
Example: Search Production Function
[Figure: Interaction X vs. Interaction Y, with isoquants of G = F(X, Y).]
Example application for web search
[Figure: Length of Query (L) vs. No. of Assessments (A), with isoquants P@10 = 0.1, 0.2, 0.3.]
P@10 = F(L, A)