Clustering Personalized Web Search Results
Xuehua Shen and Hong Cheng
Introduction
• Search engine’s objectives
  – Rank the most relevant search results at the top
    • Effectiveness
    • PageRank / HITS
  – Group and present different categories of search results
    • Global view
    • Clustering
Clustering Personalized Search Results
• Study the clustering problem in the UCAIR framework
• Personalized search ranks or reranks the search results based on the user’s implicit feedback
• This raises interesting problems
  – Efficient and effective clustering and presentation
  – Dynamically updating the clustering results based on personalization
Goal
• Effective
  – Cluster user search results into meaningful groups
  – Present them in a clear format
  – Provide users with the main themes of the search results
• Efficient
  – Implement efficient clustering algorithms
• Dynamic
  – Dynamically maintain the clustering results based on personalized ranking and reranking
Progress
• Implemented two clustering algorithms
  – K-Medoids
  – Hierarchical clustering
• Presentation
  – Replace Google ads with clustering results
  – Present ranked results together with clustering results
  – Two presentation strategies
    • Most centrally located document in each cluster
    • Most frequent terms in each cluster
Partial Results
• K-Medoids
  – Select the most centrally located document in each cluster as the cluster center
  – Present the center (medoid) document as each cluster’s representative
  – Efficiency is poor
    • Other processing time: 490 + 100 + 1562 = 2152 ms
    • Time to cluster search results: 2844 ms
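The K-Medoids step described above can be sketched roughly as follows. This is a minimal illustration, not the UCAIR implementation: it assumes documents are plain strings turned into bag-of-words term counts, uses cosine distance, seeds the medoids with the top-ranked results, and uses the simple alternating assign/update variant rather than a full swap search. The names `cosine_distance`, `k_medoids`, and `sample_docs` are hypothetical.

```python
import math
from collections import Counter


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two sparse term-count vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def k_medoids(docs, k, max_iter=20):
    """Return (medoid indices, cluster label for each document)."""
    vectors = [Counter(d.lower().split()) for d in docs]
    n = len(vectors)
    # Precompute all pairwise distances once.
    dist = [[cosine_distance(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]
    medoids = list(range(k))  # assumption: seed with the k top-ranked results
    labels = [0] * n
    for _ in range(max_iter):
        # Assignment step: each document joins its nearest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]])
                  for i in range(n)]
        # Update step: the new medoid is the member with the smallest
        # total distance to the other members (the most central document).
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster empties
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break  # converged
        medoids = new_medoids
    return medoids, labels


sample_docs = [
    "jaguar car dealer price",
    "jaguar cat habitat rainforest",
    "jaguar car engine price review",
    "jaguar cat rainforest prey",
]
medoids, labels = k_medoids(sample_docs, k=2)
```

Each returned medoid index points at a real search result, which is what makes the "present the central document" strategy possible.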
Partial Results (II)
• Hierarchical clustering
  – Merge similar documents in a pair-wise manner
  – Use weighted average term vectors to represent the cluster center
  – Present each centroid term vector as a virtual document (output Top-K terms)
  – Efficiency better than K-Medoids
    • Other processing time: 200 + 110 + 831 = 1141 ms
    • Time to cluster search results: 661 ms
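A rough sketch of the pair-wise merging with weighted-average centroids might look like the following. It assumes bag-of-words term counts, cosine similarity between centroids, and a stop condition of k remaining clusters; the function names `agglomerate` and `top_terms` and the sample data are hypothetical, not taken from the project.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def agglomerate(docs, k):
    """Merge the most similar pair of clusters until k clusters remain.

    Returns a list of (member indices, centroid term vector) pairs.
    """
    clusters = [([i], dict(Counter(d.lower().split())))
                for i, d in enumerate(docs)]
    while len(clusters) > k:
        # Quadratic scan for the most similar pair of centroids.
        i, j = max(
            ((a, b) for a in range(len(clusters))
             for b in range(a + 1, len(clusters))),
            key=lambda p: cosine(clusters[p[0]][1], clusters[p[1]][1]),
        )
        (mi, vi), (mj, vj) = clusters[i], clusters[j]
        total = len(mi) + len(mj)
        # New centroid: average of the two centroids weighted by cluster size.
        merged = {
            t: (len(mi) * vi.get(t, 0.0) + len(mj) * vj.get(t, 0.0)) / total
            for t in set(vi) | set(vj)
        }
        del clusters[j]  # j > i, so index i is still valid
        clusters[i] = (mi + mj, merged)
    return clusters


def top_terms(centroid, top_k=3):
    """Top-K terms of a centroid, by weight and then alphabetically."""
    ranked = sorted(centroid.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:top_k]]


sample_docs = [
    "jaguar car dealer price",
    "jaguar cat habitat rainforest",
    "jaguar car engine price review",
    "jaguar cat rainforest prey",
]
result = agglomerate(sample_docs, k=2)
summaries = {tuple(sorted(m)): top_terms(c) for m, c in result}
```

Because the centroid is a term vector rather than an actual document, the cluster has to be shown as a "virtual document" made of its highest-weighted terms.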
Efficiency Analysis
• K-Medoids
  – $O(k(n-k)^2)$ for each iteration, where $n$ is the number of documents and $k$ is the number of clusters
  – Needs multiple iterations to converge
• Hierarchical clustering
  – $O(n^2)$ for each iteration
  – Needs $n-k$ iterations
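Multiplying the per-iteration costs by the number of iterations gives the overall bounds. In the sketch below, $T$ is a symbol introduced here for the (data-dependent) number of K-Medoids iterations; it does not appear in the original slides.

```latex
\begin{align*}
\text{K-Medoids:} \quad & T \cdot O\big(k(n-k)^2\big) = O\big(T\,k\,(n-k)^2\big) \\
\text{Hierarchical:} \quad & (n-k) \cdot O(n^2) = O\big(n^2(n-k)\big) \subseteq O(n^3)
\end{align*}
```

These are asymptotic bounds that hide constant factors, so they need not predict the measured times on small result sets.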
Lessons Learned
• Clustering takes longer as more search results accumulate (each time the user clicks “Next”)
• The Top-K frequent terms in a cluster sometimes do not make sense
  – Combine additional information besides term frequency
• Currently the results are re-clustered every time the search results are reranked
  – Incremental update of clustering results is desired!
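One possible shape for such an incremental update is sketched below. It is only an illustration of the idea, not something the project has implemented: each newly arriving result is attached to the most similar existing centroid, which is then updated as a running average, and a new cluster is opened when nothing is similar enough. The function `add_result`, the cluster dictionary layout, and the `threshold` value are all assumptions.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def add_result(clusters, text, threshold=0.5):
    """Add one result to `clusters` in place and return its cluster index.

    `clusters` is a list of {"size": int, "centroid": dict} entries.
    """
    vec = Counter(text.lower().split())
    best, best_sim = None, threshold
    for idx, cluster in enumerate(clusters):
        sim = cosine(vec, cluster["centroid"])
        if sim >= best_sim:
            best, best_sim = idx, sim
    if best is None:
        # Nothing is similar enough: open a new cluster for this result.
        clusters.append({"size": 1, "centroid": dict(vec)})
        return len(clusters) - 1
    cluster = clusters[best]
    n = cluster["size"]
    old = cluster["centroid"]
    # Running average: fold the new vector into the existing centroid.
    cluster["centroid"] = {
        t: (n * old.get(t, 0.0) + vec.get(t, 0)) / (n + 1)
        for t in set(old) | set(vec)
    }
    cluster["size"] = n + 1
    return best


clusters = []
first = add_result(clusters, "jaguar car price")
second = add_result(clusters, "jaguar cat rainforest")
third = add_result(clusters, "jaguar car dealer")
```

Each new result costs one pass over the existing centroids instead of a full re-clustering, at the price of clusters that depend on arrival order.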
Remaining
• Implementation
  – KMeans
  – MMR
  – Frequent word sets
• Effective presentation study
  – Based on user feedback
  – Literature survey
• Dynamic maintenance of clustering based on search result ranking and reranking
  – Drill down into a particular cluster
  – Update the overall clustering organization
Feedback
• Which way of presenting clustering results is more meaningful?
  – Based on central documents
  – Based on term vectors
  – More options?
• Are there other clustering algorithms that achieve both effectiveness and efficiency?
• Are there other presentation strategies besides “rank list + cluster center”?