Reverted Indexing for Expansion and Feedback
Reverted Indexing for Feedback and Expansion
Jeremy Pickens, Matthew Cooper,
Gene Golovchinsky
Jeremy Pickens, Catalyst Repository Systems
Query-Document Duality has long history
• Using queries to label documents
• Queries and documents as bipartite graph
– Used for random walks
– Used for partitioning
• Reverse Querying
Motivation – Three R’s
Retrievability
Reuse (Algorithmic)
Recall-Oriented Tasks
Our Key Contribution
We treat query result sets as unstructured text “documents” -- and index them
Outline
• Reverted Documents
• Reverted Indexing
• Experimental Setup
• Results
– Effectiveness
– Efficiency
• Related Work
• Future Extensions
Reverted Document
• ID (Basis Query)
– Query Expression
– Ranking Algorithm
• Body
– Results (docid)
– Results (score)
Basis Query (Reverted Document ID)
Query Expression                   Ranking Algorithm
giraffe                            BM25
cheetah                            BM25
gazelle                            BM25
gazelle                            Language Model
gazelle                            PL2 (Divergence from Randomness)
gazelle                            Y
gazelle                            B
gazelle                            G
fast cheetah                       BM25
cheetah AND NOT gazelle            Boolean
Latitude+Longitude of Zanzibar     Euclidean distance
Reverted Document Body
• Results (docid): Canonical URL and/or docid
• Results (score), one of:
1. Probability of Relevance
2. Cosine similarity
3. KL Divergence
4. Raw Rank
5. 1 or 0 (Boolean)
rank docid score shift-scale Ahn&Moffat
1 #415 0.82 10.0 10
2 #32 0.73 8.92 9
3 #63 0.62 7.57 8
4 #7 0.49 5.95 6
5 #56 0.35 4.24 4
6 #12 0.14 1.72 2
7 #108 0.12 1.36 1
8 #115 0.09 1.09 1
9 #42 0.08 1.0 1
10 #85 0.08 1.0 1
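The two right-hand columns of the table can be approximated with a short sketch. It assumes that "shift-scale" is a linear map of the scores onto the range 1 to 10 and that the integer column is the shift-scaled value rounded to the nearest integer; both are readings of the numbers on the slide, not a statement of the authors' exact procedure. The function names are hypothetical. Because the scores shown are already rounded to two decimals, the recomputed shift-scale values differ slightly from the slide, but the integer weights come out the same.

```python
def shift_scale(scores, low=1.0, high=10.0):
    """Linearly map scores so that the minimum becomes `low` and the maximum `high`.

    Assumption: this is what the "shift-scale" column on the slide does.
    """
    s_min, s_max = min(scores), max(scores)
    if s_max == s_min:
        return [high for _ in scores]
    factor = (high - low) / (s_max - s_min)
    return [low + (s - s_min) * factor for s in scores]


def quantize(values):
    """Round each value to the nearest integer (assumed integer weighting)."""
    return [int(v + 0.5) for v in values]


# Retrieval scores from the table, in rank order.
scores = [0.82, 0.73, 0.62, 0.49, 0.35, 0.14, 0.12, 0.09, 0.08, 0.08]
scaled = shift_scale(scores)
weights = quantize(scaled)
print(weights)
```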
Result Set → Document Body
docid Ahn&Moffat
#415 10
#32 9
#63 8
#7 6
#56 4
#12 2
#108 1
#115 1
#42 1
#85 1
<text>415 415 415 415 415 415 415 415 415 415 32 32 32 32 32 32 32 32 32 63 63 63 63 63 63 63 63 7 7 7 7 7 7 56 56 56 56 12 12 108 115 42 85</text>
Reverted Document
<document>
<docid>[gazelle : BM25]</docid>
<text>415 415 415 415 415 415 415 415 415 415 32 32 32 32 32 32 32 32 32 63 63 63 63 63 63 63 63 7 7 7 7 7 7 56 56 56 56 12 12 108 115 42 85</text>
</document>
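A minimal sketch of how a weighted result set could be turned into the reverted document shown above: each docid is repeated as many times as its integer weight, and the basis query (query expression plus ranking algorithm) becomes the document ID. The function names are hypothetical; the tags and the ID format are copied from the slide.

```python
def reverted_body(weighted_results):
    """Repeat each docid `weight` times, preserving rank order."""
    tokens = []
    for docid, weight in weighted_results:
        tokens.extend([str(docid)] * weight)
    return " ".join(tokens)


def reverted_document(query_expression, ranking_algorithm, weighted_results):
    """Wrap the body in the markup used on the slide."""
    doc_id = f"[{query_expression} : {ranking_algorithm}]"
    body = reverted_body(weighted_results)
    return f"<document><docid>{doc_id}</docid><text>{body}</text></document>"


# (docid, integer weight) pairs from the table above.
results = [(415, 10), (32, 9), (63, 8), (7, 6), (56, 4),
           (12, 2), (108, 1), (115, 1), (42, 1), (85, 1)]
doc = reverted_document("gazelle", "BM25", results)
print(doc)
```

Encoding the score as token repetition means an ordinary text indexer will record it as a term frequency without any changes to the indexer itself.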
Reverted Indexing
1. Choose a set of basis queries
2. For each basis query:
   1. Execute the query, producing results up to cutoff depth k
   2. Use the results to create a “reverted document”
   3. Add the reverted document to the index
How basis queries are chosen (in these experiments): all singleton terms (unigrams) with df ≥ 2. The ranking algorithm for all basis queries is PL2.
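The steps above can be sketched on a toy collection. This is an illustration under stated assumptions, not the experimental setup: the ranking function here is raw term frequency as a stand-in for PL2, the three documents are invented, and all function names are hypothetical. Only the choice of basis queries (unigrams with df ≥ 2) and the cutoff depth k come from the slide.

```python
from collections import Counter, defaultdict


def build_standard_index(docs):
    """Standard inverted index: term -> {docid: term frequency}."""
    index = defaultdict(dict)
    for docid, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][docid] = tf
    return index


def run_basis_query(index, term, k):
    """Rank documents for a one-term query and keep the top k.

    Stand-in ranking: raw term frequency, ties broken by docid (NOT PL2).
    """
    postings = index.get(term, {})
    ranked = sorted(postings.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


def build_reverted_index(docs, k, min_df=2):
    """Map each basis query to its reverted document {docid: score}."""
    standard = build_standard_index(docs)
    # Step 1: basis queries are all unigrams with df >= min_df.
    basis_queries = [t for t, postings in standard.items() if len(postings) >= min_df]
    reverted = {}
    for term in basis_queries:
        # Step 2.1: execute the basis query to depth k.
        results = run_basis_query(standard, term, k)
        # Steps 2.2 and 2.3: the result set becomes a reverted document.
        reverted[term] = dict(results)
    return reverted


# Hypothetical three-document collection.
docs = {
    1: "cheetah chases gazelle",
    2: "gazelle gazelle grazing",
    3: "giraffe grazing",
}
reverted = build_reverted_index(docs, k=2)
print(reverted)
```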
[Figures: Standard Index; Reverted Index]
Reverted Index Statistics
• Term Frequency: Retrieval Score of docid
• Document Length: Sum of Retrieval Scores of all docids retrieved by a Basis Query
• Document Frequency: Number of Basis Queries that docid was retrieved by
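The three statistics above can be computed directly from a set of reverted documents. The sketch below assumes each reverted document is stored as a mapping from docid to retrieval score; the data structure, function name, and the two toy result sets are hypothetical.

```python
from collections import Counter


def reverted_statistics(reverted):
    """Compute the reverted-index analogues of tf, document length, and df.

    `reverted` maps basis query -> {docid: retrieval score}.
    """
    # Term frequency: the retrieval score of a docid under a basis query.
    term_frequency = {
        (query, docid): score
        for query, results in reverted.items()
        for docid, score in results.items()
    }
    # Document length: sum of scores of all docids retrieved by a basis query.
    document_length = {query: sum(results.values()) for query, results in reverted.items()}
    # Document frequency: number of basis queries that retrieved a docid.
    document_frequency = Counter(
        docid for results in reverted.values() for docid in results
    )
    return term_frequency, document_length, document_frequency


# Hypothetical reverted documents with integer scores.
reverted = {
    "gazelle": {415: 10, 32: 9, 63: 8},
    "cheetah": {415: 4, 7: 6},
}
tf, length, df = reverted_statistics(reverted)
print(tf[("gazelle", 415)], length["gazelle"], df[415])
```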
Experiment: Relevance Feedback
1. Run initial query using PL2 (Terrier platform), e.g. [poaching wildlife preserves]
2. Judge top k documents for relevance
3. Select and weight query expansion terms, using one of:
– KL Divergence
– Bo1
– PL2 retrieval on the Reverted Index
4. Expand using top 500 terms (strongest baseline @ 500)
5. Run expanded query using PL2
6. Evaluate
Reverted Index → Expansion
1. Original query = [poaching wildlife preserves]
2. Reverted query = [#415 #56 #42 #85]
3. Expanded query = [poaching^2.0 wildlife^1.24 preserves^1.0 poachers^0.57 tsavo^0.56 leakey^0.41 tusks^0.39 …]
term original retrieved weight
poaching 1 1.0 2.0
poachers 0 0.57 0.57
tsavo 0 0.56 0.56
leakey 0 0.41 0.41
tusks 0 0.39 0.39
elephants 0 0.34 0.34
wildlife 1 0.24 1.24
kws 0 0.2 0.2
… … … …
preserves 1 0 1.0
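The weight column in the table is consistent with adding the original-query indicator to the score each term receives when the reverted query is run against the reverted index. A minimal sketch under that assumption follows; the function name is hypothetical, and the retrieved scores are copied from the rows shown (the rows elided by "…" are left out).

```python
def expansion_weights(original_terms, retrieved):
    """Weight = 1 for each original query term, plus the term's retrieved score."""
    weights = {term: 1.0 for term in original_terms}
    for term, score in retrieved.items():
        weights[term] = weights.get(term, 0.0) + score
    return weights


original = ["poaching", "wildlife", "preserves"]
# Scores of basis queries (terms) retrieved by the reverted query.
retrieved = {
    "poaching": 1.0, "poachers": 0.57, "tsavo": 0.56, "leakey": 0.41,
    "tusks": 0.39, "elephants": 0.34, "wildlife": 0.24, "kws": 0.2,
}
weights = expansion_weights(original, retrieved)

# Order terms by descending weight, as in the expanded query on the slide.
ranked = sorted(weights, key=lambda term: -weights[term])
print([(term, round(weights[term], 2)) for term in ranked])
```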
[Charts: MAP and % change; Residual MAP and % change]
Efficiency
• Two components to query expansion
– Selection and Weighting
– Execution of Expanded Query
[Charts: Avg Selection Time; Avg Execution Time]
Why would execution be faster?
Bo1                       Reverted_PL2
Term              Score   Term              Score
leakey            0.88    poaching          1.00
poaching          0.74    poachers          0.56
wildlife          0.73    tsavo             0.56
kenya             0.52    leakey            0.41
ivory             0.47    tusks             0.39
elephants         0.46    elephants         0.34
elephant          0.32    wildlife          0.24
deer              0.30    kws               0.20
poachers          0.28    kez               0.17
conservation      0.27    ivory             0.14
species           0.23    jealousies        0.14
tusks             0.19    elephant          0.14
african           0.19    conservationists  0.09
namibia           0.19    kenya             0.09
animals           0.17    fiefdom           0.08
africa            0.15    safaris           0.04
zimbabwe          0.15    conservationist   0.03
tsavo             0.14    egos              0.01
kenyan            0.13    kierie            0.00
conservationists  0.12    aphrodisiacs      0.00
Bo1                       Reverted_PL2
Term              DF      Term              DF
africa            20390   wildlife          2891
african           10636   kenya             1163
conservation      4298    ivory             1014
animals           3928    elephant          743
species           3479    elephants         356
wildlife          2891    poaching          331
kenya             1163    conservationists  293
ivory             1014    egos              269
zimbabwe          966     kez               173
deer              748     fiefdom           129
elephant          743     conservationist   125
namibia           483     poachers          117
kenyan            436     safaris           57
elephants         356     jealousies        56
poaching          331     tusks             42
conservationists  293     leakey            22
poachers          117     tsavo             12
tusks             42      aphrodisiacs      12
leakey            22      kws               9
tsavo             12      kierie            2
Average DF        2617    Average DF        391
Bo1                       Reverted_PL2
Term              DF      Term              DF
los               46748   transportation    15262
angeles           45147   freeway           3506
metro             39849   tunnel            2643
safety            22569   disasters         1822
fire              21257   subway            805
foot              13120   extinguished      452
traffic           12410   rtd               227
feet              12034   caved             193
hollywood         7677    shoring           158
heat              6004    roper             147
rail              5747    timbers           98
downtown          5390    shored            97
engineers         4308    pilgrimages       73
freeway           3506    asphyxiation      71
disasters         1822    smolder           29
firefighters      1489    busway            22
subway            805     grouting          21
rtd               227     smoldered         19
timbers           98      lutgen            10
busway            22      droped            2
Average DF        12511   Average DF        1283
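The "Average DF" rows can be checked with a few lines of arithmetic. The sketch below recomputes them for the poaching example from the DF values listed in its table. One plausible reading of the comparison, offered here as an assumption rather than a claim from the slides, is that lower-DF expansion terms have shorter postings lists, so the expanded query touches fewer postings when executed.

```python
# DF values copied from the poaching DF table (top 20 expansion terms per method).
bo1_df = {
    "africa": 20390, "african": 10636, "conservation": 4298, "animals": 3928,
    "species": 3479, "wildlife": 2891, "kenya": 1163, "ivory": 1014,
    "zimbabwe": 966, "deer": 748, "elephant": 743, "namibia": 483,
    "kenyan": 436, "elephants": 356, "poaching": 331, "conservationists": 293,
    "poachers": 117, "tusks": 42, "leakey": 22, "tsavo": 12,
}
reverted_pl2_df = {
    "wildlife": 2891, "kenya": 1163, "ivory": 1014, "elephant": 743,
    "elephants": 356, "poaching": 331, "conservationists": 293, "egos": 269,
    "kez": 173, "fiefdom": 129, "conservationist": 125, "poachers": 117,
    "safaris": 57, "jealousies": 56, "tusks": 42, "leakey": 22,
    "tsavo": 12, "aphrodisiacs": 12, "kws": 9, "kierie": 2,
}


def average_df(df_by_term):
    """Mean document frequency over a set of expansion terms."""
    return sum(df_by_term.values()) / len(df_by_term)


# Total postings touched is the sum of DFs (one posting per term-document pair).
print(round(average_df(bo1_df)), sum(bo1_df.values()))
print(round(average_df(reverted_pl2_df)), sum(reverted_pl2_df.values()))
```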
Related Work
Inspiration:
“Retrievability: An Evaluation Measure for Higher Order Information Access Tasks”, Azzopardi and Vinay, CIKM 2008
Azzopardi & Vinay take a document-centric approach, examining whether documents (n)ever appear among the top k results for any query
Related Work
Query-Document Duality has long history
– S. E. Robertson. “Query-Document Symmetry and Dual Models.” Journal of Documentation, 50(3), 1994
– B. Billerbeck, F. Scholer, H. E. Williams, and J. Zobel. “Query Expansion Using Associated Queries.” CIKM '03
– N. Craswell and M. Szummer. “Random Walks on the Query-Click Graph.” SIGIR 2007
– Reverse Querying / alerting (various)
Future Extensions
Basis queries
– Query expression may be arbitrarily complex
– Ranking function may be arbitrarily complex (remember: ranking function is a part of the basis query)
Reverted queries
– Best Match: [#415 #56 #42 #85]
– Boolean: (#415 AND #56) OR (#42 AND #85)
– Other query operators:
[SYNONYM(#415 #56) #42 #85]
[ORDERED(#415 #56) #42 #85]
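As an illustration of the Boolean form, the reverted query (#415 AND #56) OR (#42 AND #85) can be evaluated with set operations over reverted postings, where each docid maps to the set of basis queries that retrieved it. The postings below are invented for the example, and the function name is hypothetical.

```python
def boolean_reverted_query(postings):
    """Evaluate (#415 AND #56) OR (#42 AND #85) over reverted postings.

    `postings` maps docid -> set of basis queries that retrieved that docid.
    """
    def basis_queries_for(docid):
        return postings.get(docid, set())

    left = basis_queries_for(415) & basis_queries_for(56)
    right = basis_queries_for(42) & basis_queries_for(85)
    return left | right


# Hypothetical reverted postings.
postings = {
    415: {"poaching", "tsavo", "wildlife"},
    56: {"poaching", "tusks"},
    42: {"leakey", "tsavo"},
    85: {"leakey", "kws"},
}
matches = boolean_reverted_query(postings)
print(sorted(matches))
```

In this toy data, only basis queries that retrieved both documents of a pair are returned, which is stricter than the best-match form.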
Motivation – Three R’s
Retrievability
Reuse (Algorithmic)
Recall-Oriented Tasks
Fin
Questions?