Clustering Personalized Web Search Results

Clustering Personalized Web Search Results Xuehua Shen and Hong Cheng


Page 1: Clustering Personalized Web Search Results

Clustering Personalized Web Search Results

Xuehua Shen and Hong Cheng

Page 2: Clustering Personalized Web Search Results

Introduction

• Search engine’s objectives
  – Rank most relevant search results at top
    • Effectiveness
    • PageRank / HITS
  – Group and present different categories of search results
    • Global view
    • Clustering

Page 3: Clustering Personalized Web Search Results

Clustering Personalized Search Results

• Study the clustering problem in the UCAIR framework
• Personalized search ranks or reranks the search results based on implicit user feedback
• This raises interesting problems
  – Efficient and effective clustering/presentation
  – Dynamically updating the clustering results based on personalization

Page 4: Clustering Personalized Web Search Results

Goal

• Effective
  – Cluster user search results into meaningful groups
  – Present in a clear format
  – Provide users with main themes of search results
• Efficient
  – Implement efficient clustering algorithms
• Dynamic
  – Dynamically maintain the clustering results based on personalized ranking and reranking

Page 5: Clustering Personalized Web Search Results

Progress

• Implemented two clustering algorithms
  – K-Medoids
  – Hierarchical clustering
• Presentation
  – Replace Google ads with clustering results
  – Present ranked results together with clustering results
  – Two presentation strategies
    • Most centrally located document in each cluster
    • Most frequent terms in each cluster
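
As a rough illustration of the second presentation strategy (most frequent terms in each cluster), the sketch below counts terms across the snippets of one cluster. It is not the authors' code: the function name `top_terms`, the tiny stopword list, and the sample snippets are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical, deliberately tiny stopword list for the example.
STOPWORDS = frozenset({"the", "a", "of", "and", "in", "to"})

def top_terms(cluster_docs, k=3, stopwords=STOPWORDS):
    """Return the k most frequent non-stopword terms across a cluster's documents."""
    counts = Counter()
    for text in cluster_docs:
        counts.update(t for t in text.lower().split() if t not in stopwords)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]

# Made-up snippets standing in for the search results of one cluster.
cluster = [
    "jaguar car dealer",
    "jaguar car price",
    "used jaguar car review",
]
print(top_terms(cluster, k=2))
```

Raw frequency is the simplest possible labelling signal; a weighting scheme such as TF-IDF could be swapped in at the counting step.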

Page 6: Clustering Personalized Web Search Results

Partial Results

• K-Medoids
  – Select the most centrally located documents as cluster centers
  – Present these central (medoid) documents as each cluster’s representative
  – Efficiency is not good
    • Other processing time: 490 + 100 + 1562 = 2152 ms
    • Cluster search results time: 2844 ms
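
To make the K-Medoids idea concrete, here is a minimal PAM-style sketch that works on a precomputed distance matrix. It is an illustration under assumptions, not the implementation timed above: the initialisation (the first k points), the greedy acceptance of any improving swap, and the 1-D toy data are all choices made for this example. It also recomputes the full cost for every candidate swap, so it is slower per iteration than an optimised version would be.

```python
from itertools import product

def total_cost(dist, medoids):
    """Sum of each point's distance to its nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))

def k_medoids(dist, k, max_iter=100):
    """PAM-style swap search over a precomputed distance matrix (sketch)."""
    n = len(dist)
    medoids = list(range(k))  # assumption: naive start from the first k points
    best = total_cost(dist, medoids)
    for _ in range(max_iter):
        improved = False
        # Try replacing each medoid position with each non-medoid point.
        for pos, cand in product(range(k), range(n)):
            if cand in medoids:
                continue
            trial = medoids[:pos] + [cand] + medoids[pos + 1:]
            cost = total_cost(dist, trial)
            if cost < best:  # accept any swap that lowers the total cost
                medoids, best, improved = trial, cost, True
        if not improved:  # converged: no swap helps any more
            break
    # Label every point with the index of its nearest medoid.
    labels = [min(medoids, key=lambda m: dist[i][m]) for i in range(n)]
    return medoids, labels

# Toy 1-D "documents"; real use would plug in a document distance instead.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
medoids, labels = k_medoids(dist, k=2)
print(sorted(medoids), labels)
```

Each returned medoid is an actual data point, which is what allows a real document to be shown as the cluster's representative.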

Page 7: Clustering Personalized Web Search Results

Partial Results (II)

• Hierarchical clustering
  – Merge similar documents in a pair-wise manner
  – Use weighted average term vectors to represent cluster centers
  – Present centroid term vectors as virtual documents (output Top-K terms)
  – Efficiency better than K-Medoids
    • Other processing time: 200 + 110 + 831 = 1141 ms
    • Cluster search results time: 661 ms
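
The pair-wise merging with weighted-average centroids can be sketched as follows. This is a simplified illustration, not the code measured above: cosine similarity, raw term counts as vector weights, size-weighted averaging, and the helper names (`cosine`, `merge`, `agglomerate`, `top_k_terms`) are assumptions made for the example.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse term vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def merge(c1, c2):
    """Merge two (centroid, members) clusters using a size-weighted average."""
    (v1, m1), (v2, m2) = c1, c2
    n1, n2 = len(m1), len(m2)
    centroid = {t: (n1 * v1.get(t, 0.0) + n2 * v2.get(t, 0.0)) / (n1 + n2)
                for t in set(v1) | set(v2)}
    return centroid, m1 + m2

def agglomerate(docs, k):
    """Repeatedly merge the most similar pair until k clusters remain."""
    clusters = [(dict(Counter(d.split())), [i]) for i, d in enumerate(docs)]
    while len(clusters) > k:
        # Quadratic scan over all cluster pairs for the most similar one.
        a, b = max(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda p: cosine(clusters[p[0]][0], clusters[p[1]][0]),
        )
        merged = merge(clusters[a], clusters[b])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)] + [merged]
    return clusters

def top_k_terms(centroid, k):
    """Top-K terms of a centroid, i.e. the 'virtual document' shown to the user."""
    return [t for t, _ in sorted(centroid.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]

# Made-up snippets for an ambiguous query.
docs = ["jaguar car price", "jaguar car dealer", "jaguar cat habitat", "jaguar cat jungle"]
clusters = agglomerate(docs, k=2)
for centroid, members in clusters:
    print(members, top_k_terms(centroid, 2))
```

Because the centroid is an averaged vector rather than a real document, the only natural way to present it is through its highest-weighted terms.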

Page 8: Clustering Personalized Web Search Results

Efficiency Analysis

• K-Medoids
  – O(k(n-k)^2) for each iteration, where n is # of documents, k is # of clusters
  – Need multiple iterations for convergence
• Hierarchical clustering
  – O(n^2) for each iteration
  – Need n-k iterations
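
To get a feel for these bounds, the sketch below simply plugs numbers into them. The values of n, k, and the K-Medoids iteration count are assumptions chosen for illustration; the slides do not report them. The results are operation-count estimates that ignore constant factors, so they are not predictions of running time.

```python
def k_medoids_cost(n, k, iterations):
    """Estimated operations: k * (n - k)^2 per iteration, times the iteration count."""
    return iterations * k * (n - k) ** 2

def hierarchical_cost(n, k):
    """Estimated operations: n^2 per merge step, times n - k merge steps."""
    return (n - k) * n ** 2

# Hypothetical workload: 100 results, 5 clusters, 10 K-Medoids iterations.
n, k, iterations = 100, 5, 10
print("K-Medoids:   ", k_medoids_cost(n, k, iterations))
print("Hierarchical:", hierarchical_cost(n, k))
```

Which bound comes out smaller depends heavily on how many iterations K-Medoids needs to converge, and constant factors can dominate at the result-set sizes a search page deals with.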

Page 9: Clustering Personalized Web Search Results

Lessons Learned

• Clustering takes longer as more search results accumulate (each time we click “Next”)
• Top-K frequent terms in each cluster sometimes do not make sense
  – Combine additional information besides term frequency
• We re-cluster each time the search results are reranked
  – Incremental update of clustering results is desired!
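
One possible shape for such an incremental update, sketched under assumptions rather than taken from the slides: when a new result arrives, assign it to the most similar existing centroid and update that centroid as a running mean, or open a new cluster if nothing is similar enough. The `threshold` value, the dict layout of a cluster, and the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def add_document(clusters, vec, threshold=0.2):
    """Place one new document without re-clustering; return its cluster index.

    clusters: list of {"centroid": {term: weight}, "size": int}, updated in place.
    """
    sims = [cosine(c["centroid"], vec) for c in clusters]
    if not sims or max(sims) < threshold:
        # Nothing is similar enough: start a new cluster for this document.
        clusters.append({"centroid": dict(vec), "size": 1})
        return len(clusters) - 1
    idx = sims.index(max(sims))
    c, n = clusters[idx], clusters[idx]["size"]
    # Running mean: the old centroid counts n times, the new document once.
    c["centroid"] = {t: (n * c["centroid"].get(t, 0.0) + vec.get(t, 0.0)) / (n + 1)
                     for t in set(c["centroid"]) | set(vec)}
    c["size"] = n + 1
    return idx

# Two existing clusters, then two newly arrived results.
clusters = [
    {"centroid": {"car": 1.0, "jaguar": 1.0}, "size": 2},
    {"centroid": {"cat": 1.0, "jaguar": 1.0}, "size": 2},
]
print(add_document(clusters, {"car": 1.0, "price": 1.0}))
print(add_document(clusters, {"hotel": 1.0}))
```

Each insertion costs one similarity computation per existing cluster, which avoids a full re-run of the clustering. The trade-off is that early assignments are never revisited, so quality may drift and an occasional full re-cluster might still be needed.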

Page 10: Clustering Personalized Web Search Results

Remaining

• Implementation
  – KMeans
  – MMR
  – Frequent word sets
• Effective presentation study
  – Based on user feedback
  – Literature survey
• Dynamic maintenance of clustering based on search result ranking and reranking
  – Drill down in a particular cluster
  – Update overall clustering organization

Page 11: Clustering Personalized Web Search Results

Feedback

• Which way to present clustering results is more meaningful?
  – Based on central documents
  – Based on term vectors
  – More options?
• Any other clustering algorithms to achieve effectiveness and efficiency?
• Any other presentation strategy besides “rank list + cluster center”?
