Language Modeling Information Retrieval - IIT Bombaynlp-ai/java ppt/java_lect_35.pdf · 2 Outline...
Transcript of Language Modeling Information Retrieval - IIT Bombaynlp-ai/java ppt/java_lect_35.pdf · 2 Outline...
2
Outline of the Seminar
• What is Language Modeling?
• Language Modeling in NLP• Speech Recognition, Statistical MT, Classification, IR …
• The Language Modeling approach to IR– the conceptual model– smoothing– semantic smoothing
• IR as Statistical Machine Translation
3
Language Modeling
• A language model is a probabilistic mechanism for generating text
• Language models estimate the probability distribution of various natural language phenomena– sentences, utterances, queries …
4
Language Modeling Techniques• N-grams
• Class-based N-grams• Probabilistic CFGs
• Decision Trees
Claude Shannon was probably the first language modeler“How well can we predict or compress natural language text through simple n-gram models?”
6
Applications
• Speech Recognition• Statistical Machine Translation• Classification• Spell Checking
Play the role of either the prior or the likelihood
7
Language Models inSpeech Recognition
Noisy ChannelSource WordNoisyWord
Decoder
)()|(maxarg
)|(maxargˆ
wpwOp
Owpw
Vocabularyw
Vocabularyw
�
�
w
)(wp (source model)
Ow
8
Language Models in Statistical Machine Translation
Noisy Channel
Original sentence e Noisy sentence f
Decoder
)e()e|f(maxarg
)f|e(maxarge
e
e
pp
p
E
E
�
�
)e(p )e|f(p (translation model)(source model)
Source e
9
Language Models inText Classification
)c()c|d(maxarg
)d|c(maxargc
c
c
pp
p
C
C
�
�
a language model for each class
Bayesian Classifier
11
The Conceptual Model of IR
• The user has an information need ,
• From this need he generates an ideal document fragment d,
• He selects a set of key terms from d,, and generates a query q from this set
12
Analysis of Australia’s defeatto England in the Ashes. Reason
for defeat: overconfidence?
Eng bt. Aus: AnalysisAustralia’s defeat to England in the Ashes islargely due to their overconfidence. They don’tseem to have figured out that they are facing a
far superior team …
The Ashes ReviewBrash batting by Australia has lost them
a test match. Their opposition is not Zimbabwe, and not many of the Aussies
seem to remember that! …
Information need ,
Ideal document d,
A candidate document
australia ashes defeat analysis
query
d
q
13
Document generation model
Document-queryTranslation model
Retrieval Engineq
q
{d}
d,,
retrieved documentsranked by:
informationneed
Ideal documentfragment
query
query
),q|d( 8p
8
14
)|q(
)|d(),d|q(),q|d( 8
888p
ppp
Using Bayes’ law,
For the purpose of ranking,
)|d(),d|q()d(q 88 ppp
)d()d|q()d(q ppp (user independent)
The Language Modeling Approach
15
)d()d|q()d(q ppp
query-likelihood: comes from the language model
document prior(query-independent)
E.g., PageRank
Language Modeling for IR:Using document language models to assign likelihood scores to queries(Ponte and Croft, 1998)
16
Analysis of Australia’s defeatto England in the Ashes. Reason
for defeat: overconfidence?
Eng bt. Aus: AnalysisAustralia’s defeat to England in the Ashes islargely due to their overconfidence. They don’tseem to have figured out that they are facing a
far superior team …
The Ashes ReviewBrash batting by Australia has lost them
a test match. Their opposition is not Zimbabwe, and not many of the Aussies
seem to remember that! …
Information need ,
Ideal document d,
A candidate document
australia ashes defeat analysis
query
?),d|q( 8p
q
d
17
The Language Model
• For each document din the collection C, for each term w, estimate:
• To find the query likelihood, assume query terms to be independent:
)d|w(p
� m
iiqpp
1)d|()d|q(
18
The Need for Smoothing
• To eliminate “zero” probabilities– Correct the error in the MLE due to the
problem of data sparsity
australia ashes defeat analysis
The Ashes ReviewBrash batting by Australia has lost them
a test match. Their opposition is not Zimbabwe, and not many of the Aussies
seem to remember that! …
p(defeat|d) = 0
p(q|d) = 0
d
q
19
Smoothing
• Jelinek-Mercer Method: Linear interpolation of the maximum likelihood model with the collection model
• (Zhai & Lafferty, 2003) compares Jelinek-Mercer, Dirichlet and the absolute discounting methods for Language Models applied to IR.
)|w()d|w()1()d|w( &ppp OOO ��
20
Analysis of Australia’s defeatto England in the Ashes. Reason
for defeat: overconfidence?
Eng bt. Aus: AnalysisAustralia’s defeat to England in the Ashes islargely due to their overconfidence. They don’tseem to have figured out that they are facing a
far superior team …
The Ashes ReviewBrash batting by Australia has lost them
a test match. Their opposition is not Zimbabwe, and not many of the Aussies
seem to remember that! …
Information need , Ideal document d,
A candidate document
australia ashes defeat overconfidence
query
q
d
The Need for Semantic Smoothing
21
Eng bt. Aus: AnalysisAustralia’s defeat to England in the Ashes islargely due to their overconfidence. They don’tseem to have figured out that they are facing a
far superior team …
australia ashes defeat overconfidence
Document-queryTranslation model
Ideal document d,
query
Document-Query Translation Model
?)d|q( p
Must be sophisticated enough to handle
polysemy and synonymy
22
Semantic Smoothing
�¦
m
1i w
i )d|w()w|()d|q( pqtp
)d|(analysisp
Estimate translation probabilities t(wq|wd) formapping a document term wd to a query term wq
)d|(review)|( reviewpanalysisp
(Berger and Lafferty, 1999)(Lafferty and Zhai, 2001)
23
Compare withThe Statistical MT model
Noisy Channel
Original sentence e Noisy sentence f
Decoder
)e()e|f(maxarg
)f|e(maxarge
e
e
pp
p
E
E
�
�
)e(p )e|f(p (translation model)(source model)
Source e
Language model Semantic Smoothing
24
Language Modeling vis-à-visthe ‘traditional probabilistic model’
?)d,q|( rp
),q|d(
),q|d(
rp
rp log
The Probability Ranking Principle(Robertson, 1977)
The Robertson-Sparck Jones Model(Sparck Jones et al., 2000)
Language Modeling approach: � Where’s the relevance?� Is the ranking optimal?
(Lafferty & Zhai, 2002)(Laverenko and Croft, 2001)
25
Discussion: Advantages of the Language Modeling Approach
• Conditioning on d provides a larger ‘foothold’ for estimation
• p(d): an explicit notion of the importance of a document
• Document normalization is not an issue
“When designing a statistical model … the most natural route is to apply a generative model which builds up the output step-by-step. The source channel perspective suggests a different approach: turn the search problem around to predict the input. In speech recognition, NLP, and MT, researchers have time and again found that predicting what is known from competing hypothesis can be easier than directly predicting all of the hypothesis”
26
Discussion: Disadvantage of the Language Modeling Approach
• The Robertson-Sparck Jones model can directly use relevance judgements to improve the estimation of p(Ai |q,r)
This is not possible in the LanguageModeling approach
27
References
• (Robertson, 1977) The probability ranking principle in IR
• (Sparck Jones et al., 2000) A probabilistic model of information retrieval: development and comparative experiments.
• (Ponte and Croft, 1998) A language modeling approach to information retrieval
• (Zhai and Lafferty, 2001) A study of smoothing methods for language models applied to ad hoc information retrieval
28
References …
• (Berger and Lafferty, 1999) Information retrieval as statistical translation
• (Lafferty and Zhai, 2001) Risk minimization and language modeling in information retrieval
• (Lavrenko and Croft, 2001) Relevance-based language models
• (Lafferty and Zhai, 2001) Probabilistic IR models based on document and query generation