Predicting Content Change on the Web
Kira Radinsky, Technion, Israel
Paul Bennett, Microsoft Research
[Figure: Bing site, 2009, 2010, 2011]
[Figure: Personal site, 2009, 2010, 2011]
Unified Approach for Content Change Prediction
1D setting: use observations of change only
2D setting: use observations of change and content from the page itself only
3D setting: use change and content from the page and related pages
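The three settings can be read as progressively richer feature sets. The sketch below is only an illustration of that idea, not the authors' implementation: the function names, the two change features (change frequency and last observation), the bag-of-words content features, and the averaging over related pages are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

# Hypothetical page record: (change history, content snapshots).
# A history entry is 1 if the page changed at that observation, else 0.
Page = Tuple[Sequence[int], Sequence[str]]


def change_features(history: Sequence[int]) -> List[float]:
    """Features from change observations only: change frequency and last observation."""
    n = len(history)
    if n == 0:
        return [0.0, 0.0]
    return [sum(history) / n, float(history[-1])]


def content_features(snapshots: Sequence[str], vocab: Sequence[str]) -> List[float]:
    """Term counts of the latest snapshot over a fixed (assumed) vocabulary."""
    tokens = snapshots[-1].lower().split() if snapshots else []
    return [float(tokens.count(term)) for term in vocab]


def build_features(setting: str, page: Page, vocab: Sequence[str],
                   related: Sequence[Page] = ()) -> List[float]:
    """Assemble a feature vector for the 1D, 2D, or 3D setting."""
    if setting not in ("1D", "2D", "3D"):
        raise ValueError(f"unknown setting: {setting}")
    history, snapshots = page
    feats = change_features(history)                 # 1D: change only
    if setting in ("2D", "3D"):
        feats += content_features(snapshots, vocab)  # 2D: plus own content
    if setting == "3D":
        # 3D: plus change and content of related pages, averaged column-wise.
        per_page = [change_features(h) + content_features(s, vocab)
                    for h, s in related]
        if per_page:
            feats += [sum(col) / len(per_page) for col in zip(*per_page)]
        else:
            feats += [0.0] * (2 + len(vocab))
    return feats
```

Any classifier could then be trained on these vectors; the point of the sketch is only that each setting adds a source of information to the one before it.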
Results – what information to use?
Content improves over page change frequency alone.
Related pages improve over content and change frequency.
Results – how to combine the information?
Having different views of the change leads to the best results.
Results – how to choose the related pages?
Best indicators of page change are the correlations in content similarity over time.
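One possible reading of "correlations in content similarity over time" is sketched below: for each page, track how similar consecutive snapshots are, then pick as related pages those whose similarity series correlates most strongly with the target's. This is an assumed interpretation for illustration only; the Jaccard measure, the Pearson correlation, and the names `self_similarity_series` and `rank_related` are not taken from the slides.

```python
import math
from typing import Dict, List, Sequence


def jaccard(a: str, b: str) -> float:
    """Word-set Jaccard similarity between two snapshots."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def self_similarity_series(snapshots: Sequence[str]) -> List[float]:
    """Similarity of each snapshot to the next one (low value = large change)."""
    return [jaccard(p, q) for p, q in zip(snapshots, snapshots[1:])]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when it is undefined."""
    n = min(len(x), len(y))
    if n < 2:
        return 0.0
    x, y = x[:n], y[:n]
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def rank_related(target: Sequence[str],
                 candidates: Dict[str, Sequence[str]], k: int = 2) -> List[str]:
    """Return the k candidate pages whose change pattern best tracks the target's."""
    t = self_similarity_series(target)
    scored = [(pearson(t, self_similarity_series(s)), name)
              for name, s in candidates.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]
```

For example, a candidate that changes at the same observation times as the target gets a correlation near 1 and is ranked first, while one that changes at the opposite times gets a negative score.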
How Can it Improve Crawling?
Conclusions
• Page content is useful for identifying page change
• Related pages' content also helps in deciding which pages will change
• The combination of the data is important, and can be efficiently distributed
• Applications
– Improved incremental crawling strategy
– Prediction of a new hyperlink to a previously unknown (i.e., non-indexed) web page
– Personalized new-content RSS
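As a minimal sketch of how change predictions might feed an incremental crawler, assume some model has already produced a change probability per URL and that the crawler can fetch only a fixed number of pages per cycle. The function name, the `budget` parameter, and the example probabilities are hypothetical; the slides do not specify the crawling policy.

```python
import heapq
from typing import Dict, List


def select_pages_to_crawl(change_prob: Dict[str, float], budget: int) -> List[str]:
    """Pick the `budget` URLs with the highest predicted change probability."""
    if budget <= 0:
        return []
    return heapq.nlargest(budget, change_prob, key=change_prob.get)


# Hypothetical predicted probabilities that each page changed since the last crawl.
predicted = {"a.example/news": 0.9, "b.example/about": 0.1, "c.example/blog": 0.5}
to_crawl = select_pages_to_crawl(predicted, budget=2)
```

A greedy top-k policy like this spends the crawl budget on the pages most likely to be stale; a production crawler would also have to weigh page importance and politeness constraints.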