A Statistical Comparison of Tag and Query Logs
Mark J. Carman, Robert Gwadera, Fabio Crestani, and Mark Baillie
SIGIR 2009
June 4, 2010
Hyunwoo Kim
Contents
– Introduction
– Building a Dataset
– Are the Distributions Similar?
– Investigating Website Content
– Conclusion
Introduction
tags
Introduction
Questions
1. Are queries and tags similar across URLs?
2. Can tag data be used to approximate user queries to a search engine?
3. Can query logs be used to suggest new tags for a particular webpage?
4. For what types of websites is the correlation between the term distributions for queries and tags the highest?
5. Which of the distributions, tags or queries, is most closely related to the content of the clicked websites?
Building a Dataset
AOL query log
– Sizable
– Recent (2006)
– English queries
– Available to academic researchers
– 657,426 users
– A period of 3 months, from March to May 2006
Delicious tags
– Collaborative tagging system
Final dataset: 4145 complete URLs
– Google query, stemming, pruning
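The comparisons on the following slides work on per-URL term distributions. A minimal sketch of how such a distribution could be built, assuming whitespace tokenization and a simple minimum-count pruning rule; the function name `term_distribution`, the `min_count` parameter, and the sample queries are hypothetical, and stemming is left out:

```python
from collections import Counter


def term_distribution(texts, min_count=1):
    """Return relative term frequencies for one URL.

    texts: strings (queries or tag assignments) associated with the URL.
    min_count: terms seen fewer times than this are pruned (assumed rule).
    """
    # Lowercase and split on whitespace; stemming is deliberately omitted here.
    counts = Counter(term for text in texts for term in text.lower().split())
    kept = {term: c for term, c in counts.items() if c >= min_count}
    total = sum(kept.values())
    if total == 0:
        return {}
    return {term: c / total for term, c in kept.items()}


# Hypothetical queries that led to a click on one URL.
queries = ["new york times", "ny times", "times crossword"]
dist = term_distribution(queries)
```

The same function can be applied to the tags of a URL, which gives two distributions over terms that can then be compared.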
Are the Distributions Similar?
Example URL: http://www.nytimes.com (figure not preserved; surviving label: tags)
Are the Distributions Similar?
Kullback-Leibler divergence
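The formula on this slide did not survive in the transcript. As a rough sketch, the standard definition D(P||Q) = sum over terms w of P(w) log(P(w) / Q(w)) can be computed as below. The additive smoothing constant `eps` is an assumption made here so that terms missing from one distribution do not cause a division by zero; it is not necessarily the smoothing used in the paper.

```python
import math


def kl_divergence(p, q, eps=1e-9):
    """Kullback-Leibler divergence D(p || q) in bits.

    p, q: dicts mapping term -> probability.
    eps: additive smoothing constant (an assumption of this sketch).
    """
    vocab = set(p) | set(q)

    def smooth(dist):
        # Add eps to every term in the joint vocabulary, then renormalize.
        total = sum(dist.get(w, 0.0) for w in vocab) + eps * len(vocab)
        return {w: (dist.get(w, 0.0) + eps) / total for w in vocab}

    ps, qs = smooth(p), smooth(q)
    return sum(ps[w] * math.log2(ps[w] / qs[w]) for w in vocab)


# Hypothetical term distributions for one URL.
query_dist = {"news": 0.5, "times": 0.5}
tag_dist = {"news": 0.9, "times": 0.1}
forward = kl_divergence(query_dist, tag_dist)
backward = kl_divergence(tag_dist, query_dist)
```

The two directions generally give different values, which is why a symmetric measure is also useful.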
Are the Distributions Similar?
Jensen-Shannon divergence
– Symmetric measure
Overlap coefficient
Vq: query logs
Vr: tags
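Neither formula survived in the transcript, so the following is only a sketch under stated assumptions. It uses the standard Jensen-Shannon divergence (the average KL divergence of each distribution to their midpoint) and the common overlap coefficient |A ∩ B| / min(|A|, |B|) over two vocabularies; the paper's exact definitions may differ. All function names and sample data are illustrative.

```python
import math


def _kl_to_mixture(p, m):
    # Terms with zero probability in p contribute nothing to the sum.
    return sum(pw * math.log2(pw / m[w]) for w, pw in p.items() if pw > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits; symmetric and bounded by 1."""
    vocab = set(p) | set(q)
    # Midpoint distribution; positive wherever p or q is positive.
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}
    return 0.5 * _kl_to_mixture(p, m) + 0.5 * _kl_to_mixture(q, m)


def overlap_coefficient(vocab_a, vocab_b):
    """Shared terms divided by the size of the smaller vocabulary."""
    a, b = set(vocab_a), set(vocab_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Hypothetical query and tag distributions for one URL.
query_dist = {"news": 0.5, "times": 0.5}
tag_dist = {"news": 0.9, "times": 0.1}
jsd = js_divergence(query_dist, tag_dist)
overlap = overlap_coefficient(query_dist, tag_dist)
```

Because the midpoint distribution is nonzero for every term that appears in either input, this version needs no smoothing.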
Are the Distributions Similar?
Open Directory Project
Investigating Website Content
Conclusion
Similarity between query terms and tags
– Vocabularies contain a large amount of overlap
– Term frequency distributions are correlated
– Similarity is not dependent on the topic area
Queries are more similar to content than tags are
Queries and tags are more similar to one another than to content
Future work
– Models for automatically removing noise from the tag and query logs
– Techniques for predicting useful tags from query distributions
– Techniques for the effective use of tag data to improve different forms of Web search
Thank you