Dependence Language Model for Information Retrieval
Jianfeng Gao, Jian-Yun Nie, Guangyuan Wu, Guihong Cao. Dependence Language Model for Information Retrieval. SIGIR 2004.
Reference
• Structure and Performance of a Dependency Language Model. Ciprian Chelba, David Engle, et al. Eurospeech 1997.
• Parsing English with a Link Grammar. Daniel D. K. Sleator and Davy Temperley. Technical Report CMU-CS-91-196, 1991.
Why use the independence assumption?
• The independence assumption is widely adopted in probabilistic retrieval theory.
• Why?
– It makes retrieval models simpler.
– It makes retrieval operations tractable.
• The shortcoming of the independence assumption:
– The assumption does not hold in textual data.
Recent ideas on the dependence assumption
• Bigram
– Some language modeling approaches try to incorporate word dependence by using bigrams.
– Shortcomings:
• Some word dependencies exist not only between adjacent words but also at longer distances.
• Some adjacent words are not actually related.
– The bigram language model showed only marginally better effectiveness than the unigram model.
• Bi-term
– The bi-term language model is similar to the bigram model except that the order constraint on terms is relaxed.
– “information retrieval” and “retrieval of information” will be assigned the same probability of generating the query.
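The bigram/bi-term distinction above can be sketched with toy counts. This is an illustrative sketch, not code from the paper; all tokens and counts are hypothetical:

```python
from collections import Counter

def bigram_counts(tokens):
    # Ordered adjacent pairs: ("information", "retrieval") is a
    # different event from ("retrieval", "information").
    return Counter(zip(tokens, tokens[1:]))

def biterm_counts(tokens):
    # Unordered adjacent pairs: the order constraint is relaxed, so
    # "information retrieval" and "retrieval of information" (after
    # stop-word removal) contribute to the same biterm.
    return Counter(frozenset(p) for p in zip(tokens, tokens[1:]))

doc_a = ["information", "retrieval"]
doc_b = ["retrieval", "information"]  # "retrieval of information" minus the stop word

bg = bigram_counts(doc_a) + bigram_counts(doc_b)  # two distinct bigrams
bt = biterm_counts(doc_a) + biterm_counts(doc_b)  # one biterm, count 2
```

The biterm pooling is what lets both phrasings assign the same generation probability to the query.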
Structure and performance of a dependency language model
Introduction
• This paper presents a maximum entropy language model that incorporates both syntax and semantics via a dependency grammar.
• Dependency grammar: expresses the relations between words by a directed graph, which can exploit the predictive power of words that lie outside bigram or trigram range.
Introduction
• Why we use N-grams
– Assume a sentence S = w_0, w_1, …, w_n. Its probability factors as
  P(S) = P(w_0) P(w_1|w_0) … P(w_n|w_0 … w_{n-1})
– If we want to record P(w_n|w_0 … w_{n-1}) exactly, we need to store
  Σ_i V^i (V − 1)
  independent parameters, where V is the vocabulary size.
• The drawback of N-grams
– N-grams blindly discard relevant words that lie N or more positions in the past.
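The parameter blow-up of a full-history model can be checked numerically. A minimal sketch (the function name is ours, not from the slides):

```python
def ngram_param_count(V, n):
    # To specify P(w_i | w_0 ... w_{i-1}) exactly for i = 0 .. n,
    # each of the V**i possible histories needs V - 1 independent
    # probabilities (the V-th is fixed by normalization), giving
    # sum_i V**i * (V - 1) parameters in total.
    return sum(V**i * (V - 1) for i in range(n + 1))

# The sum telescopes to V**(n+1) - 1, which is why a full-history
# model is intractable and N-grams truncate the context.
full = ngram_param_count(10, 2)
```

Even a toy vocabulary of 10 words over 3 positions already needs 999 parameters; real vocabularies make the full model hopeless.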
Structure of the model
Structure of the model
• Develop an expression for the joint probability P(S, K), where K is the linkage of the sentence.
• Then we get
  P(S) = Σ_K P(S, K)
• Assume that the sum is dominated by a single term; then
  P(S) ≈ P(S, K*), where K* = argmax_K P(S, K)
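The single-term approximation can be sanity-checked with made-up numbers. A toy sketch with hypothetical joint probabilities for three linkages:

```python
# Toy numerical check of the single-term (Viterbi-style) approximation
# P(S) = sum_K P(S, K) ≈ P(S, K*) with K* = argmax_K P(S, K).
joint = {"K1": 0.90, "K2": 0.06, "K3": 0.04}  # hypothetical P(S, K) values

p_exact = sum(joint.values())        # the full sum over linkages
k_star = max(joint, key=joint.get)   # the dominating linkage K*
p_approx = joint[k_star]             # the single-term approximation
```

When one linkage carries most of the mass, as here, p_approx is close to p_exact, which is what justifies the approximation.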
A dependency language model of IR
• Given a query Q = q_1 … q_m, we want to rank documents by P(Q|D).
– Previous work:
• Assume independence between query terms:
  P(Q|D) = ∏_{i=1…m} P(q_i|D)
– New work:
• Assume that term dependencies in a query form a linkage L:
  P(Q|D) = Σ_L P(Q, L|D) = Σ_L P(L|D) P(Q|L, D)
A dependency language model of IR
• Assume that the sum over all possible linkages L is dominated by a single term:
  P(Q|D) = Σ_L P(Q, L|D) ≈ P(L|D) P(Q|L, D), where L = argmax_L P(L|D)
• Assume that each term q_j is dependent on exactly one related query term q_i generated previously, with q_h the head term of the linkage.
A dependency language model of IR
Starting from the head term q_h and generating each term from its parent in L:
  P(Q|L, D) = P(q_h|D) ∏_{(i,j)∈L} P(q_j|q_i, L, D)
            = P(q_h|D) ∏_{(i,j)∈L} P(q_i, q_j|L, D) / P(q_i|L, D)
            = P(q_h|D) ∏_{(i,j)∈L} P(q_j|L, D) · [P(q_i, q_j|L, D) / (P(q_i|L, D) P(q_j|L, D))]
A dependency language model of IR
• Assume
– The generation of a single term is independent of L:
  P(q_j|L, D) = P(q_j|D)
• Then
  P(Q|L, D) = P(q_h|D) ∏_{(i,j)∈L} P(q_j|D) · [P(q_i, q_j|L, D) / (P(q_i|D) P(q_j|D))]
            = ∏_{i=1…m} P(q_i|D) ∏_{(i,j)∈L} P(q_i, q_j|L, D) / (P(q_i|D) P(q_j|D))
• By this assumption, we would have arrived at the same result starting from any term, so L can be represented as an undirected graph.
A dependency language model of IR
Taking the log of
  P(Q|D) = P(L|D) P(Q|L, D), where L = argmax_L P(L|D)
gives the ranking formula
  log P(Q|D) = log P(L|D) + Σ_{i=1…m} log P(q_i|D) + Σ_{(i,j)∈L} MI(q_i, q_j|L, D)
where
  MI(q_i, q_j|L, D) = log [P(q_i, q_j|L, D) / (P(q_i|L, D) P(q_j|L, D))]
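The ranking formula is just three additive pieces once the components are estimated. A minimal scoring sketch, assuming precomputed probabilities and MI values (all names and numbers below are hypothetical, not from the paper):

```python
import math

def dependence_score(query_terms, linkage, p_term, mi, log_p_linkage):
    # log P(Q|D) = log P(L|D) + sum_i log P(q_i|D)
    #            + sum_{(i,j) in L} MI(q_i, q_j | L, D)
    # p_term maps each term to P(q_i|D); mi maps each link to its
    # (precomputed) mutual-information score.
    return (log_p_linkage
            + sum(math.log(p_term[q]) for q in query_terms)
            + sum(mi[link] for link in linkage))

# Hypothetical numbers for a three-term query with two links.
s = dependence_score(
    ["stock", "market", "crash"],
    [("stock", "market"), ("market", "crash")],
    {"stock": 0.01, "market": 0.02, "crash": 0.005},
    {("stock", "market"): 1.2, ("market", "crash"): 0.8},
    log_p_linkage=-1.0)
```

Note how the unigram part is exactly the classical query-likelihood score; the linkage and MI terms are the dependence model's additive correction.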
Parameter Estimation
• Estimating P(L|D)
– Assume that the links are independent:
  P(L|D) = ∏_{l∈L} P(l|D)
– Then count the relative frequency of a link l between q_i and q_j, given that they appear in the same sentence in the training data:
  RF(R|q_i, q_j) = C(q_i, q_j, R) / C(q_i, q_j)
  where C(q_i, q_j, R) counts the sentences in which q_i and q_j are linked, and C(q_i, q_j) counts the sentences in which they co-occur.
– Normalizing this score over the links of the query gives
  P(l|Q) = RF(R|q_i, q_j) / Σ_l RF(R|q_i, q_j)
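The relative-frequency estimate is a single division over training-set counts. A minimal sketch with hypothetical counts:

```python
def relative_link_frequency(c_linked, c_cooccur):
    # RF(R | q_i, q_j) = C(q_i, q_j, R) / C(q_i, q_j): of all training
    # sentences where q_i and q_j co-occur, the fraction in which the
    # parser actually links them.
    return c_linked / c_cooccur if c_cooccur else 0.0

# Hypothetical counts: linked in 30 of the 120 co-occurring sentences.
rf = relative_link_frequency(c_linked=30, c_cooccur=120)
```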
Parameter Estimation
• With the assumption P(L|Q) ≈ P(L|D) and P(l|Q) ∝ RF(R|q_i, q_j):
  L* = argmax_L P(L|Q) = argmax_L ∏_{(i,j)∈L} RF(R|q_i, q_j)
  P(L|D) ≈ P(L|Q, D) = ∏_{l∈L} P(l|D) ∝ ∏_{(i,j)∈L} RF(R|q_i, q_j)
• RF is smoothed by interpolating the document and collection estimates:
  RF(R|q_i, q_j) = (1 − λ) RF_D(R|q_i, q_j) + λ RF_C(R|q_i, q_j)
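Since a linkage connects all query terms without cycles, the search for L* = argmax_L ∏ RF(R|q_i, q_j) can be sketched as a maximum spanning tree over log-RF edge weights. This MST reading, and all counts below, are our own illustrative assumptions rather than code from the paper:

```python
import math

def best_linkage(terms, rf):
    # Greedy (Prim-style) maximum spanning tree over the query terms,
    # maximizing sum of log RF(R|q_i, q_j), i.e. the product of RFs.
    # rf maps unordered term pairs to their RF scores.
    weight = lambda a, b: math.log(max(rf.get(frozenset((a, b)), 1e-9), 1e-9))
    in_tree = {terms[0]}
    links = []
    while len(in_tree) < len(terms):
        a, b = max(((u, v) for u in in_tree for v in terms if v not in in_tree),
                   key=lambda e: weight(*e))
        links.append((a, b))
        in_tree.add(b)
    return links

rf = {frozenset(("stock", "market")): 0.6,
      frozenset(("market", "crash")): 0.5,
      frozenset(("stock", "crash")): 0.1}
L = best_linkage(["stock", "market", "crash"], rf)
```

The weakly related pair (stock, crash) is left out of the linkage in favor of the two stronger links.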
Parameter Estimation
• Estimating P(q_i|D)
– The document language model is smoothed with a Dirichlet prior:
  P'(q_i|D) = (1 − α_D) P(q_i|D) + α_D P(q_i|C)
            = (1 − α_D) C_D(q_i) / Σ_{q_i} C_D(q_i) + α_D C_C(q_i) / Σ_{q_i} C_C(q_i)
  where C_D and C_C are term counts in the document and the collection, and the constant discount α_D is given by the Dirichlet prior.
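The smoothed unigram estimate is a two-term interpolation. A minimal sketch, assuming the standard Dirichlet form of the discount (the default μ below is a conventional value of ours, not one stated in the slides):

```python
def smoothed_term_prob(c_d, len_d, c_c, len_c, mu=2000):
    # P'(q_i|D) = (1 - alpha_d) * P(q_i|D) + alpha_d * P(q_i|C).
    # With a Dirichlet prior the discount alpha_d = mu / (len_d + mu)
    # is a constant per document.
    alpha_d = mu / (len_d + mu)
    p_d = c_d / len_d if len_d else 0.0
    p_c = c_c / len_c
    return (1 - alpha_d) * p_d + alpha_d * p_c

# Term seen 5 times in a 1000-word document and 100 times in a
# 1,000,000-word collection (hypothetical counts).
p = smoothed_term_prob(5, 1000, 100, 10**6, mu=1000)
```

Smoothing keeps P'(q_i|D) nonzero for query terms absent from the document, which the pure maximum-likelihood estimate cannot do.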
Parameter Estimation
• Estimating MI(q_i, q_j|L, D):
  MI(q_i, q_j|L, D) = log [P(q_i, q_j|L, D) / (P(q_i|L, D) P(q_j|L, D))]
                    ≈ log [C_D(q_i, q_j, R) · N / (C_D(q_i, *, R) · C_D(*, q_j, R))]
  where N = C_D(*, *, R) is the total number of links, and C_D(q_i, *, R) and C_D(*, q_j, R) count the links with q_i on the left and q_j on the right, respectively.
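The count-based MI estimate reduces to one logarithm. A minimal sketch with hypothetical link counts:

```python
import math

def link_mutual_information(c_ij, c_i_star, c_star_j, n):
    # MI(q_i, q_j | L, D) ≈ log[ C_D(q_i, q_j, R) * N
    #                            / (C_D(q_i, *, R) * C_D(*, q_j, R)) ],
    # where N = C_D(*, *, R) is the total number of links. Each
    # probability is replaced by its relative-frequency estimate.
    return math.log((c_ij * n) / (c_i_star * c_star_j))

# Hypothetical counts: 50 links between the pair, 200 and 100 links
# involving each term, 10,000 links overall.
mi = link_mutual_information(c_ij=50, c_i_star=200, c_star_j=100, n=10000)
```

A positive value means the pair is linked more often than independence would predict, so the link raises the document's score.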
Experimental Setting
• Documents were stemmed and stop words were removed.
• Queries are TREC topics 202 to 250, on TREC disks 2 and 3.
The flow of the experiment
1. Given a query, find its candidate linkages.
2. From the training data, count the frequencies for RF_D(R|q_i, q_j) and RF_C(R|q_i, q_j), combine them into RF(R|q_i, q_j), and find the best linkage L* by maximizing ∏_l P(l|Q); this also gives P(L|D).
3. Count C_C(q_i) and C_D(q_i) to get P(q_i|D).
4. Count C_D(q_i, *, R), C_D(*, q_j, R), C_D(q_i, q_j, R), and C_D(*, *, R) to get MI(q_i, q_j|L, D).
5. Combine the three scores and rank the documents.
Result - BM & UG
• BM: binary independence retrieval model
• UG: unigram language model approach
• UG achieves performance similar to, or worse than, that of BM.
Result - DM
• DM: dependency model
• The improvement of DM over UG is statistically significant.
Result - BG
• BG: bigram language model
• BG is slightly worse than DM in five out of six TREC collections but substantially outperforms UG in all collections.
Result - BT1 & BT2
• BT: bi-term language model
  P_BT1(q_i|q_{i−1}, D) = ½ [P_BG(q_i|q_{i−1}, D) + P_BG(q_{i−1}|q_i, D)]
  P_BT2(q_i|q_{i−1}, D) = [C_D(q_{i−1}, q_i) + C_D(q_i, q_{i−1})] / (2 · min{C_D(q_{i−1}), C_D(q_i)})
Conclusion
• This paper introduces the linkage of a query as a hidden variable.
• Each term is generated in turn, depending on other related terms according to the linkage.
– This approach covers several language model approaches as special cases.
• The experimental results show that the proposed model substantially outperforms the unigram, bigram, and classical probabilistic retrieval models.