Time-dependent Similarity Measure of Queries Using Historical Click-through Data
Qiankun Zhao*, Steven C. H. Hoi*, Tie-Yan Liu, et al.
Presented by: Tie-Yan Liu
* This work was done when Zhao and Hoi were interns at Microsoft Research Asia
Outline
• Background
• Observations and motivation
• Our approach
• Empirical study
• Future work
Background
• A dilemma for Web search engines
  – Very short queries (~2.5 terms on average)
  – Inconsistency of term usage
    • The Web is not well-organized
    • Users express queries with their own vocabulary
Background (cont’d)
• Solution: query expansion
  – Document-term-based expansion (KDD ’00, SIGIR ’05)
    • A query can be expanded with the top keywords from the top-k relevant documents
  – Query-term-based expansion (WWW ’02, CIKM ’04)
    • A query can be expanded with similar queries (queries are similar if they lead to similar pages; pages are similar if they are visited via similar queries)
  – Click-through data have been used for query expansion in much previous work
Background (cont’d)
• Click-through data
  – Log data about the interactions between users and Web search engines
• Typical click-through data representation
  – (figure omitted: example click-through records over time, in months)
Observation 1
• Accuracy of query similarity
  – Time-dependent: calculated only from the click-through data within that time interval
  – Incremental: calculated from all the click-through data before that time point
Observation 2
• Event-driven and dynamic character of query similarity
  – The keyword “firework” and related pages become more popular one week before the event and reach their peak on July 4th
  – “firework + injuries” and “firework + picture” show a slight delay in the number of times they are issued and visited
  – “firework + market” and “firework + show” become popular and reach their peaks a few days before July 4th
Motivations
• Exploit click-through data for the semantic similarity of queries by incorporating temporal information
• Combine explicit content similarity and implicit semantic similarity
Our Approach
Time-Dependent Concepts
• Calendar schema and pattern
• Example
  – Calendar schema: <day, month, year>
  – Calendar pattern: <15, *, *>
  – <15, 1, 2002> is contained in the pattern <15, *, *>
Time-Dependent Concepts
• Click-through subgroup
• Example
  – Based on the schema <day, week> and the patterns <1,*>, <2,*>, …, <7,*>, we can partition the data into 7 groups, which correspond to Sun, Mon, Tue, …, Sat
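A minimal sketch of this day-of-week partitioning, assuming click-through records are (date, query, url) triples; the record layout and names are illustrative, not the paper's data format.

```python
from collections import defaultdict
from datetime import date

# Partition click-through records into the 7 subgroups induced by the
# schema <day, week> and the patterns <1,*>, ..., <7,*>.
def partition_by_weekday(records):
    """records: iterable of (date, query, url) -> {weekday: [(query, url), ...]}."""
    groups = defaultdict(list)
    for ts, query, url in records:
        groups[ts.weekday()].append((query, url))  # 0 = Monday .. 6 = Sunday
    return groups

records = [
    (date(2005, 7, 3), "firework", "example.com/a"),   # a Sunday
    (date(2005, 7, 4), "firework", "example.com/b"),   # a Monday (July 4th)
]
groups = partition_by_weekday(records)
```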
Similarity Measure
• For efficiency and simplicity, we measure query similarity in a given time slot based only on the click-through data of that slot
  – Vector representation of queries with respect to clicked documents
  – The weight w_i is defined by Page Frequency (PF) and Inverted Query Frequency (IQF)
Similarity Measure
• Query similarity measures
  – Cosine function
  – Marginalized kernel
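As an illustration, here is one TF-IDF-style reading of a PF-IQF weighted cosine similarity: PF is taken as the fraction of a query's clicks that land on a page, and IQF as the log of the total number of queries over the number of queries that clicked that page. Both formulas are assumptions made for this sketch and may differ from the paper's exact definitions.

```python
import math

def pf_iqf_vectors(clicks):
    """clicks: {query: {page: click_count}} -> {query: {page: weight}}."""
    n_queries = len(clicks)
    query_freq = {}  # number of distinct queries whose users clicked each page
    for pages in clicks.values():
        for page in pages:
            query_freq[page] = query_freq.get(page, 0) + 1
    vectors = {}
    for query, pages in clicks.items():
        total = sum(pages.values())
        vectors[query] = {
            # PF (click fraction) times IQF (log inverse query frequency)
            page: (count / total) * math.log(n_queries / query_freq[page])
            for page, count in pages.items()
        }
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * v.get(page, 0.0) for page, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

clicks = {"q1": {"a": 2, "b": 1}, "q2": {"a": 1, "c": 3}, "q3": {"d": 5}}
vectors = pf_iqf_vectors(clicks)
sim = cosine(vectors["q1"], vectors["q2"])  # shared page "a" -> positive similarity
```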
• By introducing query clusters, one can model query similarity in a more semantic way
Time-Dependent Similarity Measure
Empirical Evaluation
• Dataset
  – Click-through log of a commercial search engine
    • June 16, 2005 to July 17, 2005
    • Total size of 22 GB
    • Only queries from the US
  – Calendar schema and patterns
    • <hour, day, month>; <1, *, *>, <2, *, *>, …
    • Divide the data into 24 subgroups
    • Average subgroup size: 59,400,000 query-page pairs
Empirical Examples
• kids + toy, map + route
  – (figures omitted: time-dependent daily similarity vs. incremented daily similarity)
Empirical Examples
• weather + forecast, fox + news
  – (figures omitted: time-dependent daily similarity vs. incremented daily similarity)
Quality Evaluation
• Experimental settings
  – Partition the 32-day dataset into two parts
    • First part for model construction
    • Second part for model evaluation
  – Accuracy is defined in terms of the percentage difference between the actual similarity and the model-based prediction
  – 1,000 representative query pairs, each with similarity larger than 0.3 over the entire dataset
    • Half of them are top queries of the month
    • Half are manually selected as related to real-world events such as “hurricane”
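One possible reading of this accuracy metric, sketched under the assumption that it is one minus the mean relative difference between actual and predicted similarity over the test pairs; the slide does not give the exact normalization.

```python
# Hypothetical accuracy metric: 1 minus the mean relative difference
# between actual and model-predicted similarity values.
def accuracy(actual, predicted):
    """actual, predicted: parallel lists of similarity values in (0, 1]."""
    errs = [abs(a - p) / a for a, p in zip(actual, predicted) if a > 0]
    return 1.0 - sum(errs) / len(errs)

acc = accuracy([0.5, 0.4], [0.45, 0.44])  # mean relative error 0.1 -> accuracy 0.9
```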
Experimental Results
Experimental Results
Here “distance” is the time difference between the first test record and the last model-construction record. For example, when the distance is 1 and the training-data size is 10, we summarize all the accuracy values obtained by training on days i to 10+i and testing on day 10+1+i.
Experimental Results
Conclusion
• Presented a preliminary study of the dynamic nature of query similarity using click-through data
• Observed and verified with real data that query similarity is dynamic and event-driven
• Proposed a time-dependent similarity model
• For future work, we will investigate an adaptive way to determine the most suitable time granularity for two given queries
Thanks!
tyliu@microsoft.com
http://research.microsoft.com/users/tyliu