Query Rewriting Using Monolingual Statistical Machine Translation Stefan Riezler Yi Liu Google 2010...
![Page 1: Query Rewriting Using Monolingual Statistical Machine Translation Stefan Riezler Yi Liu Google 2010 Association for Computational Linguistics.](https://reader036.fdocuments.us/reader036/viewer/2022082710/56649e3f5503460f94b2f525/html5/thumbnails/1.jpg)
Query Rewriting Using Monolingual Statistical Machine Translation
Stefan Riezler, Yi Liu
Google
2010 Association for Computational Linguistics
![Page 2: Query Rewriting Using Monolingual Statistical Machine Translation Stefan Riezler Yi Liu Google 2010 Association for Computational Linguistics.](https://reader036.fdocuments.us/reader036/viewer/2022082710/56649e3f5503460f94b2f525/html5/thumbnails/2.jpg)
Introduction
• Create a system that learns to generate query rewrites from a large amount of user query logs.
• Use query expansion in Web search to evaluate the rewritten queries.
• For a given set of randomly selected queries, n-best rewrites are produced.
• From the changes introduced by the rewrites, expansion terms are extracted and added as alternate forms.
![Page 3: Query Rewriting Using Monolingual Statistical Machine Translation Stefan Riezler Yi Liu Google 2010 Association for Computational Linguistics.](https://reader036.fdocuments.us/reader036/viewer/2022082710/56649e3f5503460f94b2f525/html5/thumbnails/3.jpg)
Example
• For a query like "herbs for chronic constipation", the query terms are combined with the AND operator, and expansion terms are added with the OR operator. For this query, "remedies", "medicine", or "supplement" are appropriate expansion terms for "herbs", but "spices" is not.
• For "herbs for mexican cooking", only "spices" is a good alternative.
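The AND/OR combination described in this example can be sketched in a few lines of Python. This is only an illustration: the function name `expand_query`, the dropping of the stopword "for", and the parenthesized output syntax are assumptions, not the query syntax of any particular search engine.

```python
def expand_query(terms, expansions):
    """Join query terms with AND; attach expansion terms to a term with OR.

    `expansions` maps a query term to its alternate forms (hypothetical input).
    """
    clauses = []
    for term in terms:
        alternates = expansions.get(term, [])
        if alternates:
            clauses.append("(" + " OR ".join([term] + alternates) + ")")
        else:
            clauses.append(term)
    return " AND ".join(clauses)


# The stopword "for" is left out of the term lists (an assumption of this sketch).
medical = expand_query(
    ["herbs", "chronic", "constipation"],
    {"herbs": ["remedies", "medicine", "supplement"]},
)
cooking = expand_query(["herbs", "mexican", "cooking"], {"herbs": ["spices"]})
print(medical)
print(cooking)
```

The same term "herbs" receives different alternates in the two queries, which is the context dependence the example is meant to show.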
![Page 4: Query Rewriting Using Monolingual Statistical Machine Translation Stefan Riezler Yi Liu Google 2010 Association for Computational Linguistics.](https://reader036.fdocuments.us/reader036/viewer/2022082710/56649e3f5503460f94b2f525/html5/thumbnails/4.jpg)
Goal
• Use the translation model and language model to expand query terms in context.
• The translation model proposes expansion candidates.
• The query language model performs a selection in the context of the surrounding query terms.
• SMT is readily applicable to this task: apply it to large parallel data with queries on the source side and snippets of clicked search results on the target side.
• Snippets introduce noise since they are not complete sentences.
• TREC Data.
![Page 5: Query Rewriting Using Monolingual Statistical Machine Translation Stefan Riezler Yi Liu Google 2010 Association for Computational Linguistics.](https://reader036.fdocuments.us/reader036/viewer/2022082710/56649e3f5503460f94b2f525/html5/thumbnails/5.jpg)
Review: Query Expansion by Q-D Term Correlation
• A session links query terms with a document:
• Aggregating clicks over sessions reflects the preferences of multiple users (a probability distribution of document words given query words, estimated from counts over clicked documents D across sessions):
• This formula considers the query as a cohesive unit:
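A rough sketch of the correlation model reviewed on this slide, under stated assumptions. It takes the word-level correlation to be P(w_d | w_q) = Σ_D P(w_d | D) · P(D | w_q) and the query-level score to be ln Π_{w_q ∈ Q} (P(w_d | w_q) + 1). Both forms, the toy sessions, and the use of plain relative term frequency for P(w_d | D) are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Toy click-through sessions (hypothetical data): (query terms, clicked doc id).
sessions = [
    (["herbs", "constipation"], "d1"),
    (["herbs", "cooking"], "d2"),
    (["herbs", "constipation"], "d1"),
]
# Hypothetical document contents.
docs = {
    "d1": ["remedies", "for", "constipation", "remedies"],
    "d2": ["spices", "for", "cooking"],
}


def p_doc_given_qword(sessions):
    """P(D | w_q): share of clicks on D among sessions whose query contains w_q."""
    clicks = defaultdict(Counter)
    for query, doc_id in sessions:
        for w_q in set(query):
            clicks[w_q][doc_id] += 1
    return {
        w_q: {d: c / sum(cnt.values()) for d, c in cnt.items()}
        for w_q, cnt in clicks.items()
    }


def p_dword_given_doc(docs):
    """P(w_d | D): relative term frequency of w_d in D (a simplifying assumption)."""
    tf = {d: Counter(words) for d, words in docs.items()}
    return {
        d: {w: c / sum(cnt.values()) for w, c in cnt.items()}
        for d, cnt in tf.items()
    }


def p_dword_given_qword(w_d, w_q, p_d_q, p_w_d):
    """P(w_d | w_q) = sum over clicked docs D of P(w_d | D) * P(D | w_q)."""
    return sum(
        p_doc * p_w_d[d].get(w_d, 0.0) for d, p_doc in p_d_q.get(w_q, {}).items()
    )


def cohesion_weight(w_d, query, p_d_q, p_w_d):
    """Score w_d against the whole query: ln prod_{w_q} (P(w_d | w_q) + 1)."""
    return sum(
        math.log(p_dword_given_qword(w_d, w_q, p_d_q, p_w_d) + 1.0) for w_q in query
    )


p_d_q = p_doc_given_qword(sessions)
p_w_d = p_dword_given_doc(docs)
query = ["herbs", "constipation"]
for cand in ("remedies", "spices"):
    print(cand, round(cohesion_weight(cand, query, p_d_q, p_w_d), 3))
```

On this toy data "remedies" scores higher than "spices" for the query, because every query word contributes to the score rather than "herbs" alone.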
![Page 6: Query Rewriting Using Monolingual Statistical Machine Translation Stefan Riezler Yi Liu Google 2010 Association for Computational Linguistics.](https://reader036.fdocuments.us/reader036/viewer/2022082710/56649e3f5503460f94b2f525/html5/thumbnails/6.jpg)
Review: Machine Translation 1/2
• Linear Model for SMT:
• Find the English string e that is a translation of the foreign string f using a linear combination of feature functions h_m(e,f) and weights lambda_m:
• Word Alignment:
• The translation model and the alignment model for source language string f and target string e are related via a hidden variable describing an alignment mapping from source position j to target position a_j:
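In the standard notation for log-linear SMT and word alignment, the two relationships described on this slide can be written as follows. The feature count M and the source length J are symbols introduced here for illustration; the rest follows the slide's own e, f, h_m, lambda, and a_j.

```latex
% Linear model: choose the target string with the highest weighted feature score.
\hat{e} = \arg\max_{e} \sum_{m=1}^{M} \lambda_m \, h_m(e, f)

% Word alignment: the translation model marginalizes over hidden alignments
% a = a_1 \dots a_J, where a_j is the target position aligned to source position j.
P(f \mid e) = \sum_{a} P(f, a \mid e)
```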
![Page 7: Query Rewriting Using Monolingual Statistical Machine Translation Stefan Riezler Yi Liu Google 2010 Association for Computational Linguistics.](https://reader036.fdocuments.us/reader036/viewer/2022082710/56649e3f5503460f94b2f525/html5/thumbnails/7.jpg)
Review: Machine Translation 2/2
• "Sentence-aligned" parallel training data are prepared by pairing user queries with snippets of clicked search results for the respective queries.
• Phrase Extraction:
• Maximum-likelihood estimation on sentence-aligned strings:
• Alignment with highest probability:
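A sketch of how these two steps are usually written in alignment-based SMT, assuming a training set of S sentence-aligned pairs (f_s, e_s) and model parameters θ. Both symbols are introduced here for illustration.

```latex
% Maximum-likelihood estimation over the sentence-aligned pairs,
% summing over the hidden alignments a of each pair.
\hat{\theta} = \arg\max_{\theta} \prod_{s=1}^{S} \sum_{a} p_{\theta}(f_s, a \mid e_s)

% Highest-probability (Viterbi) alignment for a pair (f, e) under the trained model.
\hat{a} = \arg\max_{a} p_{\hat{\theta}}(f, a \mid e)
```

Phrase pairs are then typically read off the highest-probability alignments of the query-snippet pairs.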
![Page 8: Query Rewriting Using Monolingual Statistical Machine Translation Stefan Riezler Yi Liu Google 2010 Association for Computational Linguistics.](https://reader036.fdocuments.us/reader036/viewer/2022082710/56649e3f5503460f94b2f525/html5/thumbnails/8.jpg)
Language Model
• n-gram language modeling, with smoothing for sparse data problems.
• The ultimate task is to pick appropriate phrase translations in the context of the original query for query expansion.
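A minimal sketch of selection in context, assuming a trigram model with add-one smoothing trained on a tiny hypothetical query log. The candidate table, its probabilities, and the scoring rule (log translation probability plus log language-model probability of the rewritten query) are illustrative assumptions, not the system's actual features or weights.

```python
import math
from collections import Counter

# Hypothetical query log used to train the toy language model.
query_log = [
    "remedies for chronic constipation",
    "remedies for constipation",
    "medicine for chronic pain",
    "spices for mexican cooking",
    "spices for cooking",
]


def train_trigram_lm(queries):
    """Collect trigram counts, bigram-history counts, and the vocabulary size."""
    trigrams, histories, vocab = Counter(), Counter(), {"</s>"}
    for q in queries:
        words = q.split()
        vocab.update(words)
        tokens = ["<s>", "<s>"] + words + ["</s>"]
        for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
            trigrams[(a, b, c)] += 1
            histories[(a, b)] += 1
    return trigrams, histories, len(vocab)


def log_prob(query, lm):
    """Add-one smoothed trigram log-probability of a query."""
    trigrams, histories, v = lm
    tokens = ["<s>", "<s>"] + query.split() + ["</s>"]
    return sum(
        math.log((trigrams[(a, b, c)] + 1) / (histories[(a, b)] + v))
        for a, b, c in zip(tokens, tokens[1:], tokens[2:])
    )


def select_expansion(query, term, candidates, lm):
    """Pick the candidate whose substitution for `term` scores best in context."""
    def score(item):
        cand, tm_prob = item
        rewrite = " ".join(cand if w == term else w for w in query.split())
        return math.log(tm_prob) + log_prob(rewrite, lm)
    return max(candidates.items(), key=score)[0]


lm = train_trigram_lm(query_log)
# Hypothetical translation-model candidates for "herbs", with equal probability
# so that the language model alone decides.
candidates = {"remedies": 0.5, "spices": 0.5}
print(select_expansion("herbs for chronic constipation", "herbs", candidates, lm))
print(select_expansion("herbs for mexican cooking", "herbs", candidates, lm))
```

A trigram is needed in this toy setting: a bigram model would not see past the intervening "for" and could not tell the two contexts apart.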
![Page 9: Query Rewriting Using Monolingual Statistical Machine Translation Stefan Riezler Yi Liu Google 2010 Association for Computational Linguistics.](https://reader036.fdocuments.us/reader036/viewer/2022082710/56649e3f5503460f94b2f525/html5/thumbnails/9.jpg)
Data
• Training data for the translation model and the correlation-based model consist of pairs of queries and snippets for clicked results taken from query logs.
• 3 billion query-snippet pairs, from which a phrase table of 700 million query-snippet phrase translations is extracted.
• Trigram language model trained on English queries in user logs.
• N-gram cutoff at a minimum frequency of 4.
• Queries had an average length of 2.6 words.
• Snippets had an average length of 8.3 words.
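The frequency cutoff can be illustrated with a short sketch. It assumes the cutoff means that trigrams seen fewer than 4 times are discarded before the model is estimated. The function name `ngram_counts` and the sample queries are hypothetical.

```python
from collections import Counter


def ngram_counts(queries, n=3, min_freq=4):
    """Count n-grams over queries and keep those seen at least `min_freq` times."""
    counts = Counter()
    for q in queries:
        tokens = ["<s>"] * (n - 1) + q.split() + ["</s>"]
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {gram: c for gram, c in counts.items() if c >= min_freq}


# Hypothetical log: one query seen 4 times, another seen 3 times.
sample_log = ["cheap flights to paris"] * 4 + ["cheap flights to rome"] * 3
kept = ngram_counts(sample_log)
for gram, count in sorted(kept.items()):
    print(" ".join(gram), count)
```

Here the trigrams ending in "paris" survive the cutoff while those containing "rome" are dropped, which trades coverage of rare queries for a smaller model.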
![Page 10: Query Rewriting Using Monolingual Statistical Machine Translation Stefan Riezler Yi Liu Google 2010 Association for Computational Linguistics.](https://reader036.fdocuments.us/reader036/viewer/2022082710/56649e3f5503460f94b2f525/html5/thumbnails/10.jpg)
Query Expansion
• Use Google, the SMT-based system, the correlation-based system, and the correlation-based system using the language model as a filter.
• Expansion terms:
• 150,000 randomly extracted queries of 3 or more words are rewritten by each of the systems.
• For each system, expansion terms are extracted from the 5-best rewrites and stored in a table that maps source phrases to target phrases in the context of the full query.
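One way to picture the extraction step is to diff each rewrite against the original query and record the substituted phrases. The sketch below uses Python's `difflib.SequenceMatcher` as a stand-in for the phrase alignments an SMT decoder would provide. The function name `extract_expansions` and the sample rewrites are hypothetical.

```python
from difflib import SequenceMatcher


def extract_expansions(query, rewrites):
    """Map (source phrase, full query) to the target phrases that replaced it."""
    src = query.split()
    table = {}
    for rewrite in rewrites:
        tgt = rewrite.split()
        matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag != "replace":
                continue  # keep substitutions only; skip equal spans, insertions, deletions
            key = (" ".join(src[i1:i2]), query)
            phrase = " ".join(tgt[j1:j2])
            targets = table.setdefault(key, [])
            if phrase not in targets:
                targets.append(phrase)
    return table


query = "herbs for chronic constipation"
# Hypothetical n-best rewrites for the query.
rewrites = [
    "remedies for chronic constipation",
    "herbs for chronic constipation",
    "medicine for chronic constipation",
    "herbs for constipation",
]
table = extract_expansions(query, rewrites)
print(table)
```

Keying the table on the full query as well as the source phrase keeps an expansion such as "herbs" to "remedies" from being reused in a query where it does not fit.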
![Page 11: Query Rewriting Using Monolingual Statistical Machine Translation Stefan Riezler Yi Liu Google 2010 Association for Computational Linguistics.](https://reader036.fdocuments.us/reader036/viewer/2022082710/56649e3f5503460f94b2f525/html5/thumbnails/11.jpg)
Evaluation 1/2
• 3 independent raters, presented with queries and 10-best search results from two systems.
• 7-point Likert scale.
![Page 12: Query Rewriting Using Monolingual Statistical Machine Translation Stefan Riezler Yi Liu Google 2010 Association for Computational Linguistics.](https://reader036.fdocuments.us/reader036/viewer/2022082710/56649e3f5503460f94b2f525/html5/thumbnails/12.jpg)
Evaluation 2/2
![Page 13: Query Rewriting Using Monolingual Statistical Machine Translation Stefan Riezler Yi Liu Google 2010 Association for Computational Linguistics.](https://reader036.fdocuments.us/reader036/viewer/2022082710/56649e3f5503460f94b2f525/html5/thumbnails/13.jpg)
Conclusion
• The SMT model is flexible enough to capture the peculiarities of query-snippet translation.
• Hope to apply SMT to query suggestions.