Transcript of “Information Retrieval I” (users.cecs.anu.edu.au/~ssanner/MLSS2010/Hawking1.pdf)
Information Retrieval I
David Hawking
30 Sep 2010
Machine Learning Summer School, ANU
Session Outline
I Ranking documents in response to a query
I Measuring the quality of such rankings
I Case Study: Tuning 40 parameters at the London School of Economics
I Coffee Break
I Web Search Engineering
I Field Work: how do Web search engines really work?
I Stretch Break
I Discussion: Other IR problems for machine learning
I Historical context
Start a Machine Learning Run
to discuss later
Information Retrieval
[Diagram: an Information Need is expressed as a Query to the IRS; the IRS matches it against the Documents and returns Results]

I Ranked retrieval → ranked list of results
Measuring/comparing the quality of rankings
Precision - Recall curves
[Figure: precision (y-axis) against recall (x-axis), both 0 to 1, with curves labelled “good”, “normal” and “bad”]

I Average precision approximates the area under this curve; mean average precision (MAP) averages it over queries.
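The area-under-the-curve interpretation above can be made concrete: average precision for one query is the mean of precision@k over the ranks k at which relevant documents appear. A minimal sketch (document identifiers are illustrative):

```python
def average_precision(ranking, relevant):
    """Average precision for one query: the mean of precision@k at each
    rank k where a relevant document appears -- a discrete approximation
    to the area under the precision-recall curve."""
    hits = 0
    precisions = []
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

# MAP is simply the mean of average_precision over a query set.
ap = average_precision(["d3", "d1", "d7", "d2"], {"d3", "d2"})
```

Here the relevant documents sit at ranks 1 and 4, so AP = (1/1 + 2/4) / 2 = 0.75.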
Normalised Discounted Cumulative Gain
Rank:            1  2  3  4  5  ...                                20
Perfect System:  5  5  4  4  3  2  1  1  1  1  1  1  1  1  1  1  1  1  -  -
Real System A:   1  2  3  4  5  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -
Real System B:   -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  5  4  3  2  1

(Relevance judged on a 5-point scale)

DCG[r] = G[1] if r = 1; otherwise DCG[r] = DCG[r−1] + G[r] / log_b(r)
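The recurrence above unrolls to a simple sum, and NDCG is that sum divided by the DCG of a perfect ranking. A sketch with b = 2 (a common choice; the slide leaves the base as a parameter):

```python
import math

def dcg(gains, b=2):
    """DCG via the slide's recurrence: the gain at rank 1 counts at
    full value; the gain at rank r > 1 is discounted by log_b(r)."""
    total = 0.0
    for r, g in enumerate(gains, start=1):
        total += g if r == 1 else g / math.log(r, b)
    return total

def ndcg(gains, ideal_gains, b=2):
    """Normalise by the DCG of a perfect ranking, giving a value in [0, 1]."""
    return dcg(gains, b) / dcg(ideal_gains, b)

# The slide's example, with "-" treated as gain 0.
perfect = [5, 5, 4, 4, 3, 2] + [1] * 12 + [0, 0]
system_a = [1, 2, 3, 4, 5] + [0] * 15
score = ndcg(system_a, perfect)
```

The rank-1 special case is forced by the recurrence: log_b(1) = 0, so the first gain cannot be discounted.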
But where do the utility judgments come from?

(We’ll return to this later on.)
Probability Ranking Principle
Maron & Kuhns, JACM, 1960: “... technique called ‘Probabilistic Indexing’, allows a computing machine, given a request for information, to derive a number (called the ‘relevance number’) for each document, which is a measure of the probability that the document will satisfy the given request. The result of the search is an ordered list of those documents which satisfy the request ranked according to their probable relevance.”

I Cooper (1977) produced a counter-example, based on sub-classes of users with different criteria submitting the same query ⇒ need to model diversity.
Modern Ranking Functions
RSV = α0·D0 + ... + αn·Dn + β0·S0 + ... + βn·Sn   (1)

I Machine-learned combination of:
  I dynamic scores – probability of relevance given doc and query text
  I static priors, independent of the query
Dynamic factors
Key Concepts
I Term — Basic unit of indexing: e.g. a word, a word-stem, a phrase. Could be any discrete feature, not necessarily derived from text.
I Term Coordination.
I tf — Term frequency.
I N — Number of documents in the collection.
I V — Vocabulary: the set of distinct terms in the collection.
I ni — Number of documents with i-th term present.
I idf — Inverse document frequency. Sparck Jones, J Doc, 1972: ⌈log2 N⌉ − ⌈log2 ni⌉ + 1.
I Relevance — Often modelled as a dichotomous variable: Rel / R̄el (relevant / not relevant).
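The Sparck Jones idf above is directly computable; a tiny sketch showing how rare terms get larger weights than common ones (collection sizes here are made up for illustration):

```python
import math

def idf_sparck_jones(N, n_i):
    """Sparck Jones (1972) inverse document frequency:
    ceil(log2 N) - ceil(log2 n_i) + 1.
    Rare terms (small n_i) receive large weights; a term present in
    nearly every document receives close to the minimum weight."""
    return math.ceil(math.log2(N)) - math.ceil(math.log2(n_i)) + 1

# In a 1024-document collection, a term in 2 documents far outweighs
# a term in 512 documents.
rare, common = idf_sparck_jones(1024, 2), idf_sparck_jones(1024, 512)
```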
Probabilistic Retrieval
(From the Robertson and Zaragoza tutorial, SIGIR 2007.) Starting with the probability ranking principle:

P(Rel|d,q) ∝_q P(Rel|d,q) / P(R̄el|d,q)                      transform to odds        (2)
           ∝_q P(d|Rel,q) / P(d|R̄el,q)                      Bayes rule               (3)
           ≈ ∏_V P(tf_i|Rel,q) / P(tf_i|R̄el,q)              assume independence      (4)
           ≈ ∏_{t∈q} P(tf_i|Rel,q) / P(tf_i|R̄el,q)          restrict to query terms  (5)
           ∝_q ∑_{t∈q} log [ P(tf_i|Rel,q) / P(tf_i|R̄el,q) ]  so we can add weights  (6)
Okapi BM25 (Robertson et al, 1994)
w_t = tf_d · log((N − n + 0.5) / (n + 0.5)) / (k1 · ((1 − b) + b · dl/avdl) + tf_d)   (7)

S_d = ∑_{t∈q} w_t   (8)

I S_d is not a probability but should be rank-equivalent to it.
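Equations (7) and (8) translate almost line for line into code. A minimal sketch, using the commonly quoted defaults k1 = 1.2 and b = 0.75 (the 1994 paper leaves both tunable):

```python
import math

def bm25_score(query_terms, doc_tf, doc_len, avg_doc_len, N, df,
               k1=1.2, b=0.75):
    """Okapi BM25 score S_d = sum over query terms of w_t (eqs 7-8).
    doc_tf maps term -> tf in this document; df maps term -> n, the
    number of documents containing the term."""
    score = 0.0
    for t in query_terms:
        tf = doc_tf.get(t, 0)
        if tf == 0:
            continue
        idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5))      # log((N-n+0.5)/(n+0.5))
        norm = k1 * ((1 - b) + b * doc_len / avg_doc_len)      # length normalisation
        score += tf * idf / (norm + tf)                        # w_t, eq (7)
    return score
```

Note how the tf in the denominator gives the saturating behaviour discussed on the next slide.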
Term saturation
[Figure: tf/(tf + k) (y-axis, 0 to 1) against term frequency (x-axis, 0 to 20) for k = 0.5, k = 2 and k = 100]

tf / (tf + k)   (9)

I Modelling saturation is important.
Length normalisation

Need for normalisation of tf_i depends upon why some documents are longer than others. Make it tunable:

tf′_i = tf_i / B   (10)

B = (1 − b) + b · (dl / avdl)   (11)

[Figure: tf/((1 − b) + b · ratio) against the ratio of document length to average length (log scale, 0.01 to 100), for tf = 10 with b = 0, 0.1, 0.5, 0.9 and, with k = 2, b = 1.0]
BM25F - Extension to fields
I Weight term frequencies prior to non-linear combination in BM25.
I Robertson, Zaragoza & Taylor, CIKM 2004
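The point of BM25F is the ordering of operations: normalise and weight the per-field term frequencies first, sum them into one pseudo-frequency, and only then apply the shared BM25 saturation. A sketch of that idea (field names, weights and lengths below are illustrative, not from the talk):

```python
def bm25f_tf(field_tfs, field_weights, field_lens, avg_field_lens, b=0.75):
    """BM25F-style pseudo term frequency: length-normalise the tf in
    each field, weight it, and sum. Saturation is applied afterwards,
    once, rather than separately per field."""
    tf_bar = 0.0
    for f, tf in field_tfs.items():
        B = (1 - b) + b * field_lens[f] / avg_field_lens[f]
        tf_bar += field_weights[f] * tf / B
    return tf_bar

def saturate(tf_bar, k1=1.2):
    """Shared BM25-style saturation of the combined frequency."""
    return tf_bar / (k1 + tf_bar)

# A title match is weighted far above a body match.
tf_bar = bm25f_tf({"title": 1, "body": 4, "anchor": 2},
                  {"title": 5.0, "body": 1.0, "anchor": 3.0},
                  {"title": 8, "body": 900, "anchor": 40},
                  {"title": 10, "body": 1000, "anchor": 50})
contribution = saturate(tf_bar)
```

Saturating per field and then summing would let one long, spammy field dominate; sharing the saturation is the design point of the CIKM 2004 paper.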
Other Retrieval Models
I Vector Space
I Language Models
I Divergence from Randomness (parameter free!)
Using an inverted file to generate dynamic scores
[Diagram: index structure linking a Term Dictionary, postings lists, and a Document Table]

Postings (uncompressed): (2,3) (7,1) (11,2) (17,1) (22,6)

Term Dictionary (term, count, → postings):

  aaaaa    1
  oboe     5
  oblong   3
  zzzzz    2

Document Table:

  DocID    Length   QIE     Snippet    Acc.s (Score)
  doc001   5327     0.735   Arist...   0.145
  doc002   2106     0.6                0.212
  doc003   4108     0.33               0.707
  doc004   2999     0.1                0.009
  doc005   101      0.2                0.031
  doc006   27111    0.7                0.100
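The score accumulators in the Document Table above drive a standard term-at-a-time scoring loop over the postings: each query term's postings list is walked, and per-term weights are added into the accumulator of each document seen. A sketch (the scoring callback and the toy index are illustrative; QIE scores and snippets are omitted):

```python
def term_at_a_time(query_terms, index, doc_lengths, score_term):
    """Term-at-a-time scoring with one accumulator per document.
    `index` maps term -> postings list of (doc_id, tf) pairs;
    `score_term(term, tf, doc_len)` computes a per-term weight
    (e.g. a BM25 w_t)."""
    accumulators = {}
    for t in query_terms:
        for doc_id, tf in index.get(t, []):
            w = score_term(t, tf, doc_lengths[doc_id])
            accumulators[doc_id] = accumulators.get(doc_id, 0.0) + w
    # Rank documents by accumulated score, highest first.
    return sorted(accumulators.items(), key=lambda kv: -kv[1])

index = {"oblong": [(2, 3), (7, 1), (11, 2)]}
lengths = {2: 5327, 7: 2106, 11: 4108}
ranking = term_at_a_time(["oblong"], index, lengths,
                         lambda t, tf, dl: tf / (tf + 1.2))
```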
External Textual Evidence
Anchor text
Microsoft
Micro$oft
click here
Buy Windows today
Microsoft
l’empire satanique
Microsoft
Microsoft homepage
the biggest software co.
Can you guess the URL?
?
Important target pages have many incoming links, each with its own brief description of the target.

Appropriately weighted, these annotations can be used to index the page they target.

The text highlighted in your browser to indicate a link you can click on is called anchor text.
Click-associated queries
Search the XYZ Intranet
patents Search
Patent Leather Shoes
XYZ announces a stylish line of patent leather shoes.
xyz.com/clothing/patent_shoes.htm

Exciting new XYZ footwear patent
We have recently been granted a patent on ...
xyz.com/media/060318.htm

XYZ Intellectual Property Policy
Official XYZ policy on patents, copyright etc.
xyz.com/ip_policy.htm

Memo to all staff - 13 Dec 2002
Rhubarb rhubarb rhubarb ...
xyz.com/archive/2002_420.htm

"patents" → xyz.com/ip_policy.htm

"patents" (234)  "ip" (42)  "ip policy" (5201)  "intellectual property" (39)

When a searcher enters a query and clicks on a document, we can associate that query with the document. Associated queries can be weighted by click frequency and used in indexing and retrieval.
Folksonomy tags
application-form apply-for-leavebreak holidays leaveleave-form rec-leave
rec-leave-form useful vacation
What’s this resource about?
?
Important resources receive many tags. The frequency of a tag -- indicated by the type size in a "tag cloud" display -- can be used as an indexing weight.

A collaborative bookmarking tool can be used to tag a document, image or other resource with an annotation which is shared with other users.

I See e.g. Dubinko et al, WWW 2006
Collecting tags
Should these external texts be treated as document fields?
Static factors

Adapted from Richardson, Prakash and Brill, WWW 2006

I Incoming hyperlinks
  I e.g. raw count, PageRank, Kleinberg Hub/Authority
I Searcher behaviour data
  I e.g. frequency of visits to the page (from toolbars, or proxy logs); frequency of clicks when this page appears in search results; average dwell time on the page
I Query-independent use of anchor text
  I amount of referring anchor text; size of anchor text vocabulary
I Page properties
  I e.g. word count; frequency of most common term
I URL properties (Kraaij & Westerveld, SIGIR 2002)
  I e.g. length, depth of URL; type (root, subroot, page, dynamic)
I Domain properties
  I e.g. average outlink count for pages in this domain
I Spam rating
  I e.g. presence of AdSense ads!
I Adult content score
PageRank
[Figure: a 15-node link graph with four labelled nodes A, B, C and D]

Initial PR value for all 15 nodes: 1/15. After convergence, which of A, B, C, D has highest PR?
I random surfer
I start with equal probability for all bookmarked pages
I follow outgoing links with equal probability
I teleport to a bookmark with probability d
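The random-surfer bullets above amount to a power iteration. A sketch, with the simplifying assumption (not made in the talk, which teleports to bookmarked pages) that every page is a teleport target:

```python
def pagerank(out_links, d=0.15, iterations=50):
    """Random-surfer PageRank: with probability d teleport to a random
    page, otherwise follow an outgoing link chosen uniformly at random.
    `out_links` maps node -> list of nodes it links to."""
    nodes = list(out_links)
    n = len(nodes)
    pr = {u: 1.0 / n for u in nodes}          # equal initial probability
    for _ in range(iterations):
        nxt = {u: d / n for u in nodes}       # teleportation mass
        for u in nodes:
            targets = out_links[u] or nodes   # dangling node: jump anywhere
            share = (1 - d) * pr[u] / len(targets)
            for v in targets:
                nxt[v] += share
        pr = nxt
    return pr

# Three pages all pointing at one hub give the hub the highest rank.
pr = pagerank({"A": ["D"], "B": ["D"], "C": ["D"], "D": ["A"]})
```

Each iteration conserves total probability (teleport mass d plus (1 − d) of the redistributed rank), so the values remain a distribution.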
Query independent evidence in the Australian Government
There’s no query!
Machine-Learning the Overall Ranking Function
RSV = α0·D0 + ... + αn·Dn + β0·S0 + ... + βn·Sn   (12)

I We need to be able to compute ranking quality for gazillions of combinations of the αs and βs.
I Ranking quality is highly dependent upon the query, so at each point we need to run very large numbers of queries and measure the quality of results.
Thoughts on a loss function
(Except for nerds like me, people don’t actually search for the fun of it. They do it in order to complete a task.)

I What we really want to optimise:
  I the proportion of search-facilitated tasks that people complete
  I how satisfactorily they complete them
  I how fast they complete them
I That’s very difficult. How can we do it?
User Studies
I Bring large numbers of human subjects into a laboratory and ask them to do search tasks.
I Measure their task performances.
I But:
  I expensive
  I not a real task – do the subjects do it properly?
  I huge sources of variance to be controlled
    I individual differences
    I order effects
    I interactions
I Results are set-level – not reusable
In-Situ Studies

I Ask representative human subjects to use a two-panel search tool instead of their normal search engine.
I Controls for many of the problems
I Still not re-usable
I Explicit or implicit judgments.
I Results are still set-level – not reusable
Observing natural user behaviour
I Via search engine or browser logs
I Where do people click?
  I trust bias
  I interpreting no-click
  I increased frequency of clicks before and after page boundaries and “the fold”
I Can get preference judgments:
  I if someone skips over Dn and clicks on Dn+1, we have evidence that they prefer Dn+1 to Dn
I That could be input into a machine learning system.
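The skip-and-click evidence above yields pairwise training data directly. A sketch of one common generalisation (a clicked document is preferred over every unclicked document ranked above it, in the style of Joachims' click analysis; the talk states only the adjacent-pair case):

```python
def skip_above_preferences(ranking, clicked):
    """Derive preference pairs from clicks: a clicked document is
    preferred over every unclicked document that was ranked above it.
    Returns (preferred, over) pairs usable as pairwise training data."""
    prefs = []
    for i, doc in enumerate(ranking):
        if doc in clicked:
            for skipped in ranking[:i]:
                if skipped not in clicked:
                    prefs.append((doc, skipped))
    return prefs

# The searcher skipped d1 and d2 and clicked d3.
pairs = skip_above_preferences(["d1", "d2", "d3", "d4"], {"d3"})
```

Such pairs feed directly into a pairwise learning-to-rank objective over the αs and βs of equation (12).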
Manipulating Rankings
I Reordering results
I Interleaving results
I Inserting results
I Observe behavioural differences
I Flights and Buckets
I GYB do lots of this.
TREC
I Cranfield? TREC? Huh?
Cranfield / TREC Style Judging
I Employ judges to assign relevance / utility scores to all documents (or to a large pool of documents which might possibly be relevant to the query).
I TREC pools – union of top 100 docs for participating systems
I Results in re-usable test sets, modulo:
  I completeness
  I judging errors and disagreements
I TREC studies of stability of rankings across strict/lenient judging
I GYB have large budgets for this.
I Bing: 5-point scale, gains are 2^n
Issues with TREC style test sets
I Of what population are the TREC topics a representative sample?
I No penalty for duplicates – they’re very common
I No reward for diversification
  I Solution: es.csiro.au/C-TEST
I Interpretations
I Differential utilities
I Equivalence sets
C-TEST Example
C-TEST Example Outfile
C-TEST: Tools for
I Creating testfiles
  I from a spreadsheet
  I from TREC topics
  I by searching and browsing
  I by sampling a query log and judging
I Computing measures and significance testing of differences
LSE Case Study
Sources of testfiles at LSE
I A-Z Sitemap (>500 entries)
  I biased toward anchortext
I Keymatches file (>500 entries)
  I pessimistic
I Click data (>250 queries with >t clicks)
  I biased toward clicks - can achieve 100% success
I Random sample of workload, post-judged
I Popular/Critical queries (134 manually judged)
I Optimising for searchers or for publishers
Tuning problem
I Approximately 40 parameters: some continuous, some binary, some integer
I Not much idea about the shape of the function.
I Pretty sure that there are multiple points of inflection.
I Some combinations make no sense
I Obviously brute-force grid search is impossible
I Even so, millions of query executions are needed.
Dimension at a time tuning
[Figure: objective surface over dim1 and dim2, with successive one-dimensional searches labelled 1, 2 and 3]
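Dimension-at-a-time tuning is coordinate ascent: fix all parameters, sweep one dimension over candidate values, keep the best, and move on. A sketch under illustrative assumptions (the objective, grids and parameter names below are toys, not the LSE setup):

```python
def dimension_at_a_time(objective, start, grids, sweeps=3):
    """Coordinate-ascent tuning: hold all parameters fixed, sweep one
    dimension over a grid, keep the best value, move to the next
    dimension, and repeat. Far cheaper than a full grid search, but it
    can stall at a point optimal along each axis without being a
    global optimum."""
    params = dict(start)
    for _ in range(sweeps):
        for dim, candidates in grids.items():
            params[dim] = max(candidates,
                              key=lambda v: objective({**params, dim: v}))
    return params

def obj(p):
    # Toy unimodal objective peaking at k1 = 1.2, b = 0.75.
    return -((p["k1"] - 1.2) ** 2 + (p["b"] - 0.75) ** 2)

tuned = dimension_at_a_time(obj, {"k1": 0.0, "b": 0.0},
                            {"k1": [0.0, 0.6, 1.2, 2.0],
                             "b": [0.0, 0.25, 0.5, 0.75, 1.0]})
```

With ~40 real parameters, each `objective` call means re-running a whole query workload, which is where the millions of query executions come from.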
Where have we got with our tuning run?
LSE Tuning results (failure rates)
I Out-of-the-box: 24.63%
I As configured: 22.39%
I After tuning (DAAT mode): 8.21%
On the flipside of coffee ...
I Web Search Engineering
I Field Work: how do Web search engines really work?
I Stretch Break
I Discussion: Other IR problems for machine learning
I Historical context