Beyond Keywords: Finding Information More Accurately and...
Beyond Keywords: Finding Information More Accurately and Easily Using Natural Language
Matt Lease <[email protected]>
Brown Laboratory for Linguistic Information Processing (BLLIP), Brown University
Center for Intelligent Information Retrieval (CIIR), University of Massachusetts Amherst

What is the state of recognizing handwriting in today's computer systems?
Only 2 relevant results!
1st relevant result: rank 5

Searching off the Desktop
Longer and more natural queries emerge in spoken settings [Du and Crestani’06]

Verbosity and Complexity
▶ Complex information requires complex description
Information theory [Shannon’51]
Human discourse implicitly respects this [Grice’67]
▶ Simple searches easily expressed in keywords
navigation: “alaska airlines”
information: “american revolution”
▶ Verbosity naturally increases with complexity
More specific information needs [Phan et al.’07]
Iterative reformulation [Lau and Horvitz’99]
Keywords?

Outline of Talk
▶ Natural language queries: what, where & why?
▶ Term-based models for NL queries
Problem: query complexity → query ambiguity
▶ Regression Rank [Lease, Allan, and Croft, ECIR’09]
Learning framework independent of retrieval model
▶ Extensions
Modeling term relationships [Lease, SIGIR’09]
Relevance feedback: explicit and pseudo [Lease, TREC’08]

Term-Based Retrieval
$$\mathrm{Relevance}(Q, D) = \sum_{w \in V} \mathrm{weight}_{QD}(w)$$
With a document prior:
$$\mathrm{Relevance}(Q, D) = \sum_{w \in V} \mathrm{weight}_{QD}(w) + \mathrm{Prior}(D)$$
Standard approaches
▶ Vector-similarity [Salton et al.’60s, Singhal et al.’96]
▶ Document-likelihood (DL) [Sparck Jones et al.’00]
▶ Query-likelihood (QL) [Ponte and Croft’98]
KL-divergence variant [Lafferty and Zhai’01]
Roughly same features and accuracy [Fang et al.’04]
DL $\stackrel{\mathrm{rank}}{=}$ QL under equal parameterization [Lease, SIGIR’09]

KL-Divergence Ranking
▶ Estimate a unigram $\Theta^D$ underlying each document
Length- & order-independent representation of topicality
Smoothing assigns non-zero probability to unseen terms
▶ Estimate similar unigram $\Theta^Q$ underlying the query
Default: maximum-likelihood (ML) estimation
▶ Rank documents by minimal $\mathrm{KL}(\Theta^Q \,\|\, \Theta^D)$
$$-\mathrm{KL}(\Theta^Q \,\|\, \Theta^D) = \Theta^Q \cdot \log \Theta^D + C_Q$$
▶ Key Idea: $\mathrm{weight}_{QD}(\cdot)$ decomposed into $\Theta^Q$ & $\Theta^D$
$\Theta^D$ fixed for all queries (Dirichlet smoothing)
$\Theta^Q$ expresses importance of terms for a given query
Example: D = “duck duck goose”
ML estimate: $\theta^D_{duck} = \frac{2}{3}$, $\theta^D_{goose} = \frac{1}{3}$
Smoothed: $\theta^D_{duck} < \frac{2}{3}$, $\theta^D_{goose} < \frac{1}{3}$; $(\forall w)\ \theta^D_w > 0$
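
The "duck duck goose" example can be reproduced with a minimal sketch of Dirichlet smoothing and the rank-equivalent KL score. The collection probabilities, the value of `mu`, and the function names are illustrative assumptions, not values from the talk.

```python
import math
from collections import Counter


def dirichlet_unigram(doc_tokens, collection_probs, mu=2000.0):
    """Return a function w -> P(w | D) under Dirichlet smoothing."""
    counts = Counter(doc_tokens)
    length = len(doc_tokens)

    def prob(w):
        # Document counts are mixed with mu pseudo-counts from the collection model.
        return (counts[w] + mu * collection_probs[w]) / (length + mu)

    return prob


def kl_score(theta_q, theta_d):
    """Rank-equivalent to -KL(theta_q || theta_d): sum_w theta_q[w] * log theta_d(w)."""
    return sum(p * math.log(theta_d(w)) for w, p in theta_q.items() if p > 0)


# Hypothetical collection model; "swan" never occurs in the document.
collection_probs = {"duck": 0.2, "goose": 0.1, "swan": 0.7}
doc = ["duck", "duck", "goose"]
theta_d = dirichlet_unigram(doc, collection_probs, mu=1.0)

for term in ("duck", "goose", "swan"):
    print(term, round(theta_d(term), 3))
```

With these assumed numbers the smoothed estimates for "duck" and "goose" fall below their ML values of 2/3 and 1/3, while the unseen term "swan" receives non-zero probability.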

Verbosity vs. Retrieval Accuracy
TREC Topic 838
Title: “urban suburban coyotes” <urban suburban coyot>
Description: “How have humans responded and how should they respond to the appearance of coyotes in urban and suburban areas?”
<human respond respond appear coyot urban suburban area>
Average Precision example: AP = (1/1 + 2/2 + 3/5) / 3 (relevant documents at ranks 1, 2, and 5 of 5)
Natural Language?
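
The Average Precision arithmetic above can be checked with a short sketch. The function name and the 0/1 list encoding of the ranking are assumptions made for illustration.

```python
def average_precision(relevance, num_relevant=None):
    """Average precision for a ranked list of 0/1 relevance judgments.

    num_relevant defaults to the number of relevant documents in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    total = num_relevant if num_relevant is not None else hits
    return precision_sum / total if total else 0.0


# Relevant documents at ranks 1, 2, and 5, as in the slide's example.
ap = average_precision([1, 1, 0, 0, 1])
print(ap)
```

Mean Average Precision is then the mean of this value over all queries.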

Verbosity vs. Retrieval Accuracy (2)
RIA Workshop [Buckley and Harman’04]
▶ 10-40 hours error analysis per query, 45 Description queries
▶ Models failed to emphasize the right terms for ≈ 2/3 of queries

| Document Collection | Type | # Documents | # Queries |
|---|---|---|---|
| Robust04 | Newswire | 528,155 | 250 |
| W10g | Web | 1,692,096 | 100 |
| GOV2 | Web | 25,205,179 | 150 |

Mean Average Precision (MAP): per-query AP averaged across queries

Problem: Query Ambiguity
ML assumes all query tokens equally important to $\Theta^Q$!
▶ The core information is often obscured
▶ Details distract rather than inform
<human respond respond appear coyot urban suburban area>
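
To make the problem concrete, here is a small sketch of the ML query model for the stemmed description query above. The function name is a hypothetical choice; the estimate itself is just relative token frequency.

```python
from collections import Counter


def ml_query_model(stemmed_terms):
    """Maximum-likelihood unigram: each token contributes equal probability mass."""
    counts = Counter(stemmed_terms)
    total = len(stemmed_terms)
    return {term: count / total for term, count in counts.items()}


query = "human respond respond appear coyot urban suburban area".split()
theta_q = ml_query_model(query)

for term, weight in sorted(theta_q.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {weight:.3f}")
```

Because "respond" occurs twice, ML gives it twice the weight of "coyot", even though "coyot" carries the core of the information need.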

Example: Better Estimate $\Theta^Q$
More important terms should be assigned greater weight in $\Theta^Q$
How to estimate $\Theta^Q$?

Supervised Learning of $\Theta^Q$
▶ Training data: document relevance
Known relevance: documents manually assessed
Inferred relevance: query log “click-through” data
▶ Potential benefits
Data-driven: let examples guide estimation
Lifetime learning: continually improve with more data
Expressiveness: keep terms, replace estimation
▶ Challenge: sparsity
One parameter per vocabulary term [cf. Mei et al.’07]
Existing Learning To Rank methods don’t address this

Regression Rank [Lease et al.’09]
▶ Idea: Predict $\Theta^Q$ using fewer parameters
Find features correlated with $\Theta^Q$ (term importance)
Predict $\Theta^Q$ from these features

| Query | Capitalized? | Is noun? | Log(DF) | $\Theta^Q$ |
|---|---|---|---|---|
| respond | 0 | 0 | 4.13 | 0.03 |
| coyot | 0 | 1 | 3.48 | 0.3 |
| urban | 0 | 0 | 3.83 | 0.11 |
| suburban | 0 | 0 | 3.73 | 0.16 |
| Dallas | 1 | 1 | 3.23 | 0.40 |

$$\lambda_1 \cdot \text{Capitalized?} + \lambda_2 \cdot \text{Is noun?} + \lambda_3 \cdot \text{Log(DF)} = \Theta^Q$$

Estimation, Feature Extraction, Regression
[Figure: Regression Rank pipeline. Training: (1) estimate “gold” $\hat{\Theta}^{Q_1}, \hat{\Theta}^{Q_2}, \hat{\Theta}^{Q_3}$ for the training examples; (2) feature extraction, $F = \{f_1, f_2, f_3\}$; (3) regression training yields feature weights $\Lambda = \{\lambda_1, \lambda_2, \lambda_3\}$. Run-time: for input query $n$, feature extraction followed by regression prediction $\Lambda \cdot F = \hat{\Theta}^{Q_n}$ (predicted $\Theta^Q$).]
Estimation: given relevant/non-relevant documents, find strong $\Theta^Q$
Explicit relevance feedback with massive feedback
Feature Extraction: define features correlated with term importance
Regression: predict $\Theta^Q$ given features

Regression Rank: Estimation
▶ Goal: optimize $\Theta^Q$ for rank-based metric (e.g. AP)
Challenge: non-differentiable, non-convex
Simpler metrics to optimize, but diverge from goal
▶ Grid search (sampling) [cf. Metzler and Croft’05]
Embarrassingly parallel
Exponential # samples
$$\mathbb{E}_{AP}[\Theta^Q] = \frac{1}{Z} \sum_s AP(\Theta^{Q_s})\, \Theta^{Q_s}, \qquad Z = \sum_s AP(\Theta^{Q_s})$$
$$\arg\max_{\Theta^{Q_s}} AP(\Theta^{Q_s})$$
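
Estimation Example
Query: [human suburban urban]

| Sub-query | ML $\Theta^Q$ | AP($\Theta^Q$) |
|---|---|---|
| Q1: human | [1, 0, 0] | 0.3859 |
| Q2: suburban | [0, 1, 0] | 0.2992 |
| Q3: urban | [0, 0, 1] | 0.4897 |

$$\mathbb{E}_{AP}[\Theta^Q] = \frac{1}{Z} \sum_s AP(\Theta^{Q_s})\, \Theta^{Q_s}, \qquad Z = 0.3859 + 0.2992 + 0.4897 = 1.175$$
$$= \frac{0.3859}{1.175}\, \Theta^{Q_1} + \frac{0.2992}{1.175}\, \Theta^{Q_2} + \frac{0.4897}{1.175}\, \Theta^{Q_3} = [0.3285,\ 0.2547,\ 0.4168]$$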
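
The worked example can be recomputed with a few lines of code. This is a sketch of the AP-weighted expectation only; the function name and the list-of-pairs input format are assumptions, and the AP values are copied from the example rather than measured.

```python
def expected_query_model(samples):
    """AP-weighted average of sampled query models.

    samples: list of (theta_q_vector, ap) pairs over a shared term order.
    """
    z = sum(ap for _, ap in samples)  # normalizer Z
    dim = len(samples[0][0])
    return [sum(ap * theta[i] for theta, ap in samples) / z for i in range(dim)]


# Term order: [human, suburban, urban]; AP values from the example.
samples = [
    ([1, 0, 0], 0.3859),
    ([0, 1, 0], 0.2992),
    ([0, 0, 1], 0.4897),
]

expected = expected_query_model(samples)
best = max(samples, key=lambda s: s[1])[0]  # the argmax alternative

print([round(x, 4) for x in expected])
print(best)
```

The expectation spreads weight over all three terms, whereas the argmax alternative keeps only the single best sample (here the sub-query "urban").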

Regression Rank: Features
▶ Features
Traditional IR statistics: e.g. term frequency, document frequency
▶ source: document collection & large external corpora
Position: integer index of term in query
Lexical context: preceding/following terms and punctuation
Syntactic part-of-speech: e.g. is term a noun / verb / other?
▶ Feature normalization: set mean=0 & standard deviation=1
▶ Feature selection: prune features occurring <12 times
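
The normalization and selection steps might look like the following sketch. It assumes the population standard deviation and a simple count dictionary for feature occurrences; the feature names and counts are made up for illustration, and only the Log(DF) values come from the talk's example table.

```python
from statistics import mean, pstdev


def zscore(values):
    """Rescale a feature column to mean 0 and standard deviation 1."""
    m = mean(values)
    s = pstdev(values)  # population standard deviation (an assumption)
    if s == 0:
        return [0.0] * len(values)
    return [(v - m) / s for v in values]


def prune_rare(feature_counts, min_count=12):
    """Keep only features that occur at least min_count times."""
    return {name for name, count in feature_counts.items() if count >= min_count}


log_df = [4.13, 3.48, 3.83, 3.73, 3.23]
normalized = zscore(log_df)

# Hypothetical occurrence counts for three features.
kept = prune_rare({"is_noun": 40, "prev_word=the": 12, "prev_word=coyot": 3})

print([round(v, 3) for v in normalized])
print(sorted(kept))
```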

Regression Rank: Regression
▶ Ridge regression (L2 regularization of least-squares)
Consistently better than ML, Lasso (L1), and others
Metric divergence (squared-loss vs. AP)

| Query | Capitalized? | Is noun? | Log(DF) | $\Theta^Q$ |
|---|---|---|---|---|
| respond | 0 | 0 | 4.13 | 0.03 |
| coyot | 0 | 1 | 3.48 | 0.3 |
| urban | 0 | 0 | 3.83 | 0.11 |
| suburban | 0 | 0 | 3.73 | 0.16 |
| Dallas | 1 | 1 | 3.23 | 0.40 |

$$\lambda_1 \cdot \text{Capitalized?} + \lambda_2 \cdot \text{Is noun?} + \lambda_3 \cdot \text{Log(DF)} = \Theta^Q$$
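
As a rough illustration of the regression step, the sketch below fits ridge weights to the five example rows using the closed-form solution $(X^\top X + \alpha I)^{-1} X^\top y$. The regularization strength `alpha`, the absence of an intercept and of feature normalization, and all function names are simplifying assumptions; this is not the talk's actual training setup.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ridge_fit(features, targets, alpha=1.0):
    """Return weights minimizing squared loss plus an L2 penalty (scaled by 1/2)."""
    dim = len(features[0])
    xtx = [[sum(row[i] * row[j] for row in features) + (alpha if i == j else 0.0)
            for j in range(dim)] for i in range(dim)]
    xty = [sum(row[i] * t for row, t in zip(features, targets)) for i in range(dim)]
    return solve(xtx, xty)


def predict(weights, row):
    return sum(w * f for w, f in zip(weights, row))


# Rows: [Capitalized?, Is noun?, Log(DF)] for respond, coyot, urban, suburban, Dallas.
features = [[0, 0, 4.13], [0, 1, 3.48], [0, 0, 3.83], [0, 0, 3.73], [1, 1, 3.23]]
targets = [0.03, 0.30, 0.11, 0.16, 0.40]

weights = ridge_fit(features, targets, alpha=0.1)
predictions = [predict(weights, row) for row in features]
print([round(w, 3) for w in weights])
print([round(p, 3) for p in predictions])
```

Predicted weights from such a model are not guaranteed to be non-negative or to sum to one, so a real system would need some post-processing before using them as $\Theta^Q$.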

Regression Rank: Strengths
▶ Learning framework is independent of retrieval model
e.g. Predict weights for term-interactions rather than terms
Similar to Probabilistic Indexing [Fuhr and Buckley’91]
▶ Can learn context-dependent term weights
Model richer context than just query length
▶ Together: query-specific LTR [Geng et al.’08]
e.g. Dynamically-weighted mixture model

Key Concepts [Bendersky and Croft’08]
▶ Annotate “key” NP for each query, train a classifier
▶ Weight NPs by classifier confidence, and mix with ML $\Theta^Q$

| Document Collection | Type | # Documents | # Queries |
|---|---|---|---|
| Robust04 | Newswire | 528,155 | 250 |
| W10g | Web | 1,692,096 | 100 |
| GOV2 | Web | 25,205,179 | 150 |

Regression Rank: Results

| Collection | Type | # Documents | # Queries | # Dev Queries |
|---|---|---|---|---|
| Robust04 | Newswire | 528,155 | 250 | 150 |
| W10g | Web | 1,692,096 | 100 | - |
| GOV2 | Web | 25,205,179 | 150 | - |

BLIND
5-fold cross-validation
▶ Fully predicts all parameters (no mixing/tying)
▶ Can optimize model accuracy for any metric
▶ Lifetime learning from query log

Example: Predicted $\Theta^Q$
TREC Topic 838
How have humans responded and how should they respond to the appearance of coyotes in urban and suburban areas?
<human respond respond appear coyot urban suburban area>
$$\mathbb{E}_{AP}[\Theta^Q] = \frac{1}{Z} \sum_s AP(\Theta^{Q_s})\, \Theta^{Q_s}$$

Room for Further Improvement
▶ Expectation $\mathbb{E}_{AP}[\Theta^Q]$ restricted to query vocabulary
Expand vocabulary: feedback documents
Model more than terms: e.g. term-interactions

Sequential Dependency Model
▶ [Metzler and Croft’05]
Simple, efficient, & consistently beats unigram
▶ Consecutive query terms are scored 3 ways
Individual occurrence: unigram
Co-occurrence: adjacency (ordered) & proximity
▶ Example
What research is ongoing for new fuel sources?
Document = “fuel source fuel source”
[Figure: unigram, adjacency, and proximity matches marked in the document]
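
The three match types can be illustrated by counting them in the example document. This is a simplified sketch: the counting scheme, the window size of 8, and the function names are assumptions for illustration and do not reproduce the scoring of any particular retrieval engine.

```python
def unigram_count(term, doc):
    """Individual occurrences of a single term."""
    return doc.count(term)


def adjacency_count(first, second, doc):
    """Ordered occurrences where `second` immediately follows `first`."""
    return sum(1 for i in range(len(doc) - 1)
               if doc[i] == first and doc[i + 1] == second)


def proximity_count(first, second, doc, window=8):
    """Unordered co-occurrences: position pairs (i, j), i < j, holding both
    terms in either order with j - i < window."""
    count = 0
    for i, left in enumerate(doc):
        for j in range(i + 1, min(i + window, len(doc))):
            if {left, doc[j]} == {first, second}:
                count += 1
    return count


doc = "fuel source fuel source".split()

print(unigram_count("fuel", doc), unigram_count("source", doc))
print(adjacency_count("fuel", "source", doc))
print(proximity_count("fuel", "source", doc))
```

Proximity matches include the adjacency matches plus pairs such as "source ... fuel" that appear out of order or with intervening words.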

Better Estimation of SD Unigram
▶ Estimate SD Unigram by Regression Rank
Adjacency and Proximity still use ML
Consistent improvement [Lease, SIGIR’09]

Dependency Importance Varies too
What research is ongoing for new fuel sources?
<research ongoing new fuel sources>
{research,ongoing} {ongoing,new} {new,fuel} {fuel,sources}
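
The candidate dependencies listed above are simply the pairs of consecutive query terms, which a one-line sketch can enumerate. The function name is a hypothetical choice.

```python
def sequential_dependencies(terms):
    """Pair each query term with its immediate successor."""
    return list(zip(terms, terms[1:]))


pairs = sequential_dependencies("research ongoing new fuel sources".split())
print(pairs)
```

A query of n terms yields n - 1 such pairs, and only some of them (here arguably "fuel sources") describe a meaningful phrase.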

Filtering Spurious Dependencies
Oracle Experiment [Lease, SIGIR’09]
Rank dependencies by expected weight
Successively add them in rank order
▶ 3% better MAP using single best dependency

Next: Estimate Dependency Weights
▶ Apply current features like TF/IDF
▶ Add new term relationship features
Syntax, collocations, named-entities, etc.

Relevance Feedback (Explicit & Pseudo)
▶ Idea: Better estimate $\Theta^Q$ using related documents
Particularly valuable for finding other related terms
▶ Explicit: Given examples of relevant documents
Compute average $\Theta^D$, mix with query $\Theta^Q$
▶ Pseudo: Blind expansion
Score documents with $\Theta^Q$
Compute expected $\Theta^D$, mix with query $\Theta^Q$
▶ How can we apply supervised learning here?
[Rocchio’71, Lavrenko and Croft’01, Lafferty and Zhai’01]
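
Both feedback variants reduce to averaging document models and interpolating with the query model, as the sketch below shows. The interpolation weight `alpha`, the toy document models, and the normalized retrieval scores used as weights in the pseudo case are all hypothetical values chosen for illustration.

```python
def average_model(doc_models, weights=None):
    """Weighted average of document unigram models (uniform weights by default)."""
    if weights is None:
        weights = [1.0] * len(doc_models)
    z = sum(weights)
    avg = {}
    for model, weight in zip(doc_models, weights):
        for term, p in model.items():
            avg[term] = avg.get(term, 0.0) + weight * p / z
    return avg


def mix(theta_q, theta_f, alpha=0.5):
    """Linear interpolation of the query model with a feedback model."""
    vocab = set(theta_q) | set(theta_f)
    return {w: (1 - alpha) * theta_q.get(w, 0.0) + alpha * theta_f.get(w, 0.0)
            for w in vocab}


theta_q = {"coyot": 0.5, "urban": 0.5}
docs = [{"coyot": 0.5, "wildlife": 0.5}, {"urban": 0.5, "city": 0.5}]

# Explicit feedback: documents known to be relevant, averaged uniformly.
explicit = mix(theta_q, average_model(docs))

# Pseudo feedback: top-ranked documents weighted by (hypothetical) retrieval scores.
pseudo = mix(theta_q, average_model(docs, weights=[0.75, 0.25]))

print(sorted(explicit.items()))
print(sorted(pseudo.items()))
```

In both cases the mixed model gains terms ("wildlife", "city") that were absent from the original query.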

Preliminaries: TREC’08 RF Track
▶ Varied feedback: none (ad-hoc) to many documents
▶ Approach: RF + PRF + Sequential Term Dependencies
▶ Best results in track [Lease’08] (GOV2)

Step 1: Supervised $\Theta^Q$ + PRF
Without PRF vs. With PRF
▶ Are supervision and PRF complementary?
▶ Yes, and dependencies too! [Lease, SIGIR’09]

Outlook: Supervised RF/PRF
▶ [Cao et al.’08]
Standard PRF: only 17% of terms help, 26-37% hurt
Classify terms as good/bad, weight by confidence
Some details of approach can be improved
▶ Future work: apply Regression Rank
Feedback document(s) just more verbosity
Apply better learning, more features

Summary
▶ Natural language queries: what, where & why?
▶ Term-based models for NL queries
Problem: query complexity → query ambiguity
▶ Regression Rank [Lease, Allan, and Croft, ECIR’09]
Learning framework independent of retrieval model
▶ Extensions
Modeling term relationships [Lease, SIGIR’09]
Relevance feedback: explicit and pseudo [Lease, TREC’08]

Brown Laboratory for Linguistic Information Processing (BLLIP), Brown University
http://bllip.cs.brown.edu
Center for Intelligent Information Retrieval (CIIR), University of Massachusetts Amherst
http://ciir.cs.umass.edu
Support for this work comes from the National Science Foundation Partnerships for International Research and Education (PIRE)