A Field Relevance Model


1

A Field Relevance Model for Structured Document Retrieval

JIN YOUNG KIM @ ECIR 2012

Three Themes
• The Concept of Field Relevance

• Using Field Relevance for Retrieval

• The Estimation of Field Relevance

2

[Figure: Relevance + Field Weighting → Field Relevance]

THE FIELD RELEVANCE

3

IR: The Quest for Relevance
• The Role of Relevance
• Core component of retrieval models
• Basis of (pseudo) relevance feedback

• Retrieval Models Based on Relevance
• Binary Independence Model (BM25) [Robertson76]

• Relevance-based Language Model [Lavrenko01]

4

Notation: vocabulary V = (w1 w2 ... wm); term-relevance model P(w|R)

Structured Document Retrieval
• Documents have multiple fields: emails, products (entities), and so on
• Retrieval models exploit the structure; field weighting is common

5

[Figure: each query term q1 … qm is scored against document fields f1 … fn; the field-level scores are combined with fixed field weights w1 … wn (summed over fields), and the per-term results are then multiplied.]

Relevance for Structured Document Retrieval
• Term-level relevance: which term is important for the user’s information need?
• Field-level relevance: which field is important for the user’s information need?

6

Field-level relevance: P(F|R) over fields F = (F1 F2 … Fn)
Term-level relevance: P(w|R) over vocabulary V = (w1 w2 ... wm)

Defining the Field Relevance

[Figure: each query term q1 … qi … qm has its own field-relevance distribution P(F|qi,R) over the fields F1 … Fn.]

Field Relevance: the distribution of per-term relevance over document fields

Per-term field relevance: P(F|w,R)
Query: m words, Q = (q1 q2 ... qm)
Collection: n fields per document, F = (F1 F2 … Fn)

8


Why P(F|w,R) instead of P(F|R)?

Query: ‘james registration’

• Different fields are relevant for different query terms:
• ‘james’ is relevant when it occurs in <to>
• ‘registration’ is relevant when it occurs in <subject>

More Evidence for the Field Relevance
• Field operator / advanced search interface
• Users’ search terms are found in multiple fields

9

Understanding Re-finding Behavior in Naturalistic Email Interaction Logs. Elsweiler, D., Harvey, M., Hacker, M. [SIGIR'11]

Evaluating Search in Personal Social Media Collections. Lee, C.-J., Croft, W.B., Kim, J. [WSDM'12]

THE FIELD RELEVANCE MODEL

10

Retrieval over Structured Documents
• Field-based Retrieval Models
• Score each field against each query term
• Combine field-level scores using field weights

11

Fixed field weights wj can be too restrictive
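As a minimal sketch of this fixed-weight scheme (function names, dictionary layout, and the smoothing floor are my own assumptions, not from the slides), the score sums each field's language-model probability with the same weight wj for every query term, then multiplies across terms:

```python
import math

def mflm_score(query_terms, doc_field_lms, field_weights, floor=1e-9):
    """Fixed-weight field scoring (MFLM-style sketch).

    doc_field_lms: {field: {term: P(term | that field of the document)}}
    field_weights: {field: w_j}, the same for every query term.
    Returns sum over terms of log(sum_j w_j * P(q_i | F_j)).
    """
    log_score = 0.0
    for q in query_terms:
        # Weighted sum over fields for this term (the "sum" step) ...
        p = sum(w * doc_field_lms[f].get(q, floor)
                for f, w in field_weights.items())
        # ... then multiply across query terms ("multiply", in log space).
        log_score += math.log(p)
    return log_score
```

With the email example, any single fixed weighting must trade off ‘james’ (best matched in <to>) against ‘registration’ (best matched in <subject>), which is exactly the restriction noted above.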

• Field Relevance Model

• Comparison with Mixture of Field Language Model

12

Using the Field Relevance for Retrieval

Per-term Field Weight / Per-term Field Score

[Figure: the same scoring diagram as before, but each query term qi now carries its own field weights P(F1|qi) … P(Fn|qi) in place of the fixed w1 … wn; the weighted per-term field scores are summed over fields, then multiplied across query terms.]
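The per-term weighting can be sketched by replacing the fixed wj with P(Fj|qi); the function name, dictionary layout, and smoothing floor below are my own assumptions, not from the slides:

```python
import math

def frm_score(query_terms, doc_field_lms, field_relevance, floor=1e-9):
    """Per-term field weighting (field relevance model sketch).

    Same functional form as fixed-weight field scoring, except the weight
    for field F_j now depends on the query term: P(F_j | q_i).
    field_relevance: {term: {field: P(field | term)}}
    """
    log_score = 0.0
    for q in query_terms:
        weights = field_relevance[q]  # per-term field weights P(F_j | q_i)
        p = sum(weights.get(f, 0.0) * lm.get(q, floor)
                for f, lm in doc_field_lms.items())
        log_score += math.log(max(p, floor))
    return log_score
```

Here ‘james’ can put nearly all of its weight on <to> while ‘registration’ puts its weight on <subject>, which a single fixed weight vector cannot do.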

Structured Document Retrieval: PRM-S

• Probabilistic Retrieval Model for Semi-structured Data
• Estimate the mapping between query terms and document fields
• Use the mapping probability as per-term field weights

[Kim, Xue, Croft 09]

Estimation is based on limited sources.
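A hedged sketch of this mapping estimate (names and the uniform-prior default are mine, not from the slides): P(Fj|q) is obtained by normalizing P(q|Fj)·P(Fj) over fields, with P(q|Fj) read from collection-level field language models:

```python
def prms_field_weights(term, collection_field_lms, field_priors=None):
    """PRM-S-style mapping probability (sketch):
    P(F_j | q) ∝ P(q | F_j) * P(F_j).

    collection_field_lms: {field: {term: P(term | field, collection)}}
    field_priors: optional {field: P(F_j)}; uniform if omitted.
    """
    fields = list(collection_field_lms)
    priors = field_priors or {f: 1.0 / len(fields) for f in fields}
    # Unnormalized posterior over fields for this query term.
    unnorm = {f: collection_field_lms[f].get(term, 1e-9) * priors[f]
              for f in fields}
    z = sum(unnorm.values())
    return {f: v / z for f, v in unnorm.items()}
```

Because this estimate uses only collection statistics, it reflects where a term usually occurs, not where it is relevant for this query, which is the limitation noted above.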

• Field Relevance Model

• Comparison with the PRM-S
• FRM has the same functional form as PRM-S
• FRM differs in how per-term field weights are estimated

14

Using the Field Relevance for Retrieval

[Formula: per-term field weight × per-term field score, as before; only the estimation of the per-term field weight changes.]

ESTIMATING FIELD RELEVANCE

15

Estimating Field Relevance: in a Nutshell
• If the user provides feedback: a relevant document provides sufficient information
• If no feedback is available: combine field-level term statistics from multiple sources

16

[Figure: the field-level term distributions (content, title, from/to) of the collection, combined with those of the top-k retrieved documents, approximate the field-level term distribution of the relevant documents: Collection + Top-k Docs ≅ Relevant Docs]

Estimating Field Relevance using Feedback

• Assume a user who marked DR as relevant
• Estimate field relevance from the field-level term distribution of DR
• We can personalize the results accordingly: rank higher the documents with a similar field-level term distribution

Field relevance for DR:
- <to> is relevant for ‘james’
- <content> is relevant for ‘registration’
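The feedback-based estimate can be sketched as normalizing each query term's probability across the field-level language models of the marked document DR (function name and smoothing floor are my assumptions):

```python
def field_relevance_from_feedback(term, relevant_doc_field_lms, floor=1e-9):
    """Estimate P(F_j | q, R) from a document D_R the user marked relevant:
    normalize the term's probability across D_R's field-level LMs.

    relevant_doc_field_lms: {field: {term: P(term | field of D_R)}}
    """
    unnorm = {f: lm.get(term, floor)
              for f, lm in relevant_doc_field_lms.items()}
    z = sum(unnorm.values())
    return {f: v / z for f, v in unnorm.items()}
```

A term concentrated in one field of DR then steers ranking toward documents with a similar field-level distribution.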

Estimating Field Relevance without Feedback

The unigram feature is the same as in PRM-S

Similar to MFLM and BM25F

Pseudo-relevance Feedback

• Method
• Linear combination of multiple sources
• Weights estimated using training queries

• Features
• Field-level term distribution of the collection (unigram and bigram LM)
• Field-level term distribution of the top-k documents (unigram and bigram LM)
• A priori importance of each field (wj), estimated using held-out training queries
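The linear combination above can be sketched as follows; the source names and mixing weights stand in for the features and trained weights described on the slide, and are assumptions of this sketch:

```python
def combine_field_relevance(term, source_dists, lambdas):
    """Field relevance without feedback (sketch): linearly combine
    per-source field distributions (e.g. collection unigram/bigram LMs,
    top-k unigram/bigram LMs, per-field prior) with learned weights.

    source_dists: {source_name: {field: P(field | term) under that source}}
    lambdas: {source_name: non-negative mixing weight}
    """
    fields = set()
    for dist in source_dists.values():
        fields.update(dist)
    combined = {f: sum(lambdas[s] * dist.get(f, 0.0)
                       for s, dist in source_dists.items())
                for f in fields}
    z = sum(combined.values()) or 1.0   # renormalize to a distribution
    return {f: v / z for f, v in combined.items()}
```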

EXPERIMENTS

19

Experimental Setup
• Collections: TREC Emails, IMDB Movies, Monster Resumes

• Distribution of the Most Relevant Field

20

Collection   #Documents   #Queries   #RelDocs/Query
TREC         198,394      125        1
IMDB         437,281      50         2
Monster      1,034,795    60         15

Query Examples (Indri)
• Oracle Estimates of Field Relevance

21

[Figures: example queries and oracle field-relevance estimates for the TREC, IMDB, and Monster collections]

Retrieval Methods Compared
• Baselines
• DQL / BM25F
• MFLM: field weights fixed regardless of terms
• PRM-S: per-term weights estimated using the collection

• Field Relevance Models
• FRM-C: estimated using the combination of sources
• FRM-O: estimated using relevant documents (oracle)

22

Differs only in terms of the field weighting!

          DQL     BM25F   MFLM    PRM-S   FRM-C   FRM-O
TREC      54.2%   59.7%   60.1%   62.4%   66.8%   79.4%
IMDB      40.8%   52.4%   61.2%   63.7%   65.7%   70.4%
Monster   42.9%   27.9%   46.0%   54.2%   55.8%   71.6%

Retrieval Effectiveness

(Metric: Mean Reciprocal Rank)

Fixed field weights: DQL, BM25F, MFLM. Per-term field weights: PRM-S, FRM-C, FRM-O.

[Chart: MRR for each method on the TREC, IMDB, and Monster collections]

• Aggregated KL-Divergence from Oracle Estimates

• Aggregated Cosine Similarity with Oracle Estimates

Quality of Field Relevance Estimation

[Charts: aggregated KL divergence (lower is better) and cosine similarity (higher is better) against the oracle estimates, for MFLM, PRM-S, and FRM-C on the TREC, Monster, and IMDB collections]
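The two estimation-quality measures can be computed per query term as follows; this is a straightforward sketch, and the slides do not specify the exact aggregation over terms or queries:

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) between two field-relevance distributions
    over the same set of fields (lower = closer to the oracle)."""
    return sum(pv * math.log(pv / max(q.get(f, 0.0), eps))
               for f, pv in p.items() if pv > 0)

def cosine_similarity(p, q):
    """Cosine similarity between the two distributions viewed as vectors
    (higher = closer to the oracle)."""
    fields = set(p) | set(q)
    dot = sum(p.get(f, 0.0) * q.get(f, 0.0) for f in fields)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0
```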

• Features Revisited
• Field-level term distribution of the collection (PRM-S)
• Field-level term distribution of the top-k documents
• A priori relevance of each field (prior)

• Results for TREC Collection

Feature Ablation Results25

Feature Set   All     -rug/rbg   -cbg/rbg   -cbg/cug   -prior
MAP           0.668   0.662      0.651      0.648      0.644
%Reduction    0%      -0.9%      -2.5%      -3.0%      -3.6%

                Unigram   Bigram
Collection LM   cug       cbg
Top-k Docs LM   rug       rbg

CONCLUSIONS

26

Summary
• Field relevance as a generalization of field weighting
• Relevance modeling for structured document retrieval

• Field relevance model for structured document retrieval
• Uses field relevance to combine per-field LM scores

• Estimating field relevance from relevant documents
• Provides a natural way to incorporate relevance feedback

• Estimating field relevance by combining sources
• Improved performance over MFLM and PRM-S

27

Ongoing Work
• Large-scale batch evaluation on a book collection: test collections built using OpenLibrary.org query logs
• Evaluation of relevance feedback on FRM: does relevance feedback improve subsequent results?
• Integrating term relevance and field relevance: further improvement is expected when combined

28

[Figure: integrating Field Relevance and Term Relevance]

I’m on the job market!
• Structured Document Retrieval
• A Probabilistic Retrieval Model for Semi-structured Data [ECIR09]

• A Field Relevance Model for Structured Document Retrieval [ECIR12]

• Personal Search
• Retrieval Experiments using Pseudo-Desktop Collections [CIKM09]

• Ranking using Multiple Document Types in Desktop Search [SIGIR10]

• Evaluating an Associative Browsing Model for Personal Info. [CIKM11]

• Evaluating Search in Personal Social Media Collections [WSDM12]

• Web Search
• An Analysis of Instability for Web Search Results [ECIR10]

• Characterizing Web Content, User Interests, and Search Behavior by Reading Level and Topic [WSDM12]

29

More at @jin4ir, or cs.umass.edu/~jykim

OPTIONAL SLIDES

30

Optimality of Field Relevance Estimation
• This results in the optimal field weighting
• Scores DR as highly as possible against other documents
• Under the language modeling framework for IR

31

[Formula: per-term field weight × per-term field score]

Proof in the extended version.

Features based on Field-level Term Dists.
• Summary

• Estimation

32

                Unigram LM (= PRM-S)   Bigram LM
Collection LM   cug                    cbg
Top-k Docs LM   rug                    rbg