Lecture 4
![Page 1: Lecture 4](https://reader034.fdocuments.us/reader034/viewer/2022051821/56816530550346895dd7b573/html5/thumbnails/1.jpg)
Lecture 4
CS492 Special Topics in Computer Science: Distributed Algorithms and Systems
![Page 2: Lecture 4](https://reader034.fdocuments.us/reader034/viewer/2022051821/56816530550346895dd7b573/html5/thumbnails/2.jpg)
“The PageRank Citation Ranking: Bringing Order to the Web”
L. Page, S. Brin, R. Motwani, T. Winograd, 1998
Fall 2008 CS492
![Page 3: Lecture 4](https://reader034.fdocuments.us/reader034/viewer/2022051821/56816530550346895dd7b573/html5/thumbnails/3.jpg)
Origin of “Google”
- Googol: 10^100
- Motivation
  - Human-maintained indices such as Yahoo!
  - Explosive growth of the Web
[Figure: growth in hostnames and active sites. Source: http://news.netcraft.com]
![Page 4: Lecture 4](https://reader034.fdocuments.us/reader034/viewer/2022051821/56816530550346895dd7b573/html5/thumbnails/4.jpg)
Design Goals of Google
- Improved search quality
  - In 1997, only 1 of the top 4 search engines found itself (returned its own page for a query on its own name)
  - High precision in finding relevant documents was necessary
- Academic search engine research
  - Search engine technology went commercial: a black art
  - To build systems that a good number of people could use
  - To build an architecture to support novel research on large-scale Web data
![Page 5: Lecture 4](https://reader034.fdocuments.us/reader034/viewer/2022051821/56816530550346895dd7b573/html5/thumbnails/5.jpg)
Weakness of Existing Approaches
- Calculate similarities
  - Based on a flat, vector-space model of each page
  - Prone to cheating (Web spamming or search engine persuasion)
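A minimal sketch of why a flat vector-space score is easy to game, assuming plain term-count vectors and cosine similarity. The function name `cosine_similarity` and the three sample strings are made up for illustration; real engines of the period used more elaborate weighting.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two term-count vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical query and pages; the second page just repeats the query terms.
query = "cheap flights"
honest_page = "we compare cheap flights and hotels for your trip"
stuffed_page = "cheap flights cheap flights cheap flights cheap flights"

honest_score = cosine_similarity(query, honest_page)
stuffed_score = cosine_similarity(query, stuffed_page)
```

The keyword-stuffed page scores higher than the honest one because the score looks only at the page's own text, which the page author fully controls.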
![Page 6: Lecture 4](https://reader034.fdocuments.us/reader034/viewer/2022051821/56816530550346895dd7b573/html5/thumbnails/6.jpg)
Basic Idea of PageRank
- Exploit the topological structure of hypertextual systems
![Page 7: Lecture 4](https://reader034.fdocuments.us/reader034/viewer/2022051821/56816530550346895dd7b573/html5/thumbnails/7.jpg)
Simple Example
[Figure: graph of three pages A, B, and C, labeled with the values 0.2, 0.4, and 0.4]
![Page 8: Lecture 4](https://reader034.fdocuments.us/reader034/viewer/2022051821/56816530550346895dd7b573/html5/thumbnails/8.jpg)
Related Work
- Academic citation analysis
- Similarities
  - Graph structure: paper = node, web page = node; citation = link, URL = link
  - “Node” authority is independent of “node” content
- Differences
  - Uniform unit of information (paper) versus great variability in quality, usage, citations, and length
  - Equal link weight vs. variable importance
  - A backlink from Yahoo! vs. from a friend
![Page 9: Lecture 4](https://reader034.fdocuments.us/reader034/viewer/2022051821/56816530550346895dd7b573/html5/thumbnails/9.jpg)
Which Page Should Be Ranked Higher?
[Figure: two pages, A and B, with the label “John Doe”]
![Page 10: Lecture 4](https://reader034.fdocuments.us/reader034/viewer/2022051821/56816530550346895dd7b573/html5/thumbnails/10.jpg)
Simple Expression
$$R(u) = c \sum_{v \in B_u} \frac{R(v)}{N_v}$$
- $R(u)$: page rank of $u$
- $B_u$: set of pages pointing at $u$
- $N_v$: out-degree of $v$
Question: role of $c$?
Answer: it keeps the total rank of all web pages constant.
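A minimal sketch of the simple expression, assuming the notation of the PageRank paper (rank R, backlink set B_u, out-degree N_v, normalization factor c). The function name `simple_pagerank` and the three-page graph are illustrative assumptions, not taken from the slides.

```python
def simple_pagerank(links, iterations=50):
    """Iterate R(u) = c * sum(R(v) / N_v for v in B_u) on a small graph.

    links maps each page to the list of pages it points at. Every page
    is assumed to have at least one out-link (no dangling pages).
    """
    pages = sorted(links)
    rank = {page: 1.0 / len(pages) for page in pages}  # uniform start
    for _ in range(iterations):
        new_rank = {page: 0.0 for page in pages}
        for v, targets in links.items():
            share = rank[v] / len(targets)  # R(v) / N_v
            for u in targets:
                new_rank[u] += share
        # c rescales the ranks so that their total stays equal to 1.
        c = 1.0 / sum(new_rank.values())
        rank = {page: c * value for page, value in new_rank.items()}
    return rank


# Hypothetical graph: A -> B, A -> C, B -> C, C -> A
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = simple_pagerank(graph)
```

On this graph the ranks settle at 0.4 for A, 0.2 for B, and 0.4 for C: A splits its rank between B and C, and C passes everything it collects back to A.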
![Page 11: Lecture 4](https://reader034.fdocuments.us/reader034/viewer/2022051821/56816530550346895dd7b573/html5/thumbnails/11.jpg)
Dangling Links
- Pages without outgoing pointers
  - Example: pages not yet downloaded
- Do not affect the calculation much
  - Remove them, calculate ranks, and add them back
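The "remove them" step can be sketched as follows. This is an illustration under stated assumptions: the function name `strip_dangling` is hypothetical, every page is assumed to appear as a key in `links`, and removal is repeated because dropping one dangling page can leave another page without out-links.

```python
def strip_dangling(links):
    """Repeatedly drop pages that have no out-links.

    links maps each page to the list of pages it points at.
    Returns (core_links, removed_pages).
    """
    core = {page: list(targets) for page, targets in links.items()}
    removed = []
    while True:
        dangling = [page for page, targets in core.items() if not targets]
        if not dangling:
            return core, removed
        removed.extend(dangling)
        gone = set(dangling)
        for page in dangling:
            del core[page]
        # Drop links that pointed at the pages just removed.
        for page in core:
            core[page] = [t for t in core[page] if t not in gone]


# Hypothetical crawl: D has not been downloaded yet, so it has no out-links.
crawl = {"A": ["B", "C"], "B": ["C", "D"], "C": ["A"], "D": []}
core, removed = strip_dangling(crawl)
```

Ranks would then be computed on `core` alone. How the removed pages get their ranks back is not spelled out on the slide; one option is a final pass that gives each removed page its share from the pages that link to it.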
![Page 12: Lecture 4](https://reader034.fdocuments.us/reader034/viewer/2022051821/56816530550346895dd7b573/html5/thumbnails/12.jpg)
Loop
[Figure: pages A, B, and C linked in a loop]
Question: ranks of A, B, and C?
Answer: infinite! (rank sink)
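A small sketch of a rank sink, assuming the un-damped update in which each page splits its rank evenly over its out-links. The extra page S that feeds the loop and the function name `simple_step` are hypothetical and added only to show rank flowing in and never coming back out.

```python
def simple_step(rank, links):
    """One pass of R(u) = sum(R(v) / N_v for v in B_u), with no damping."""
    new_rank = {page: 0.0 for page in links}
    for v, targets in links.items():
        for u in targets:
            new_rank[u] += rank[v] / len(targets)
    return new_rank


# Hypothetical graph: S feeds a loop A -> B -> C -> A that has no way out.
links = {"S": ["A"], "A": ["B"], "B": ["C"], "C": ["A"]}
rank = {page: 0.25 for page in links}
for _ in range(3):
    rank = simple_step(rank, links)

loop_total = rank["A"] + rank["B"] + rank["C"]
```

After the first pass S is left with rank 0, and the loop holds all the rank in the system. The rank then only circulates among A, B, and C.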
![Page 13: Lecture 4](https://reader034.fdocuments.us/reader034/viewer/2022051821/56816530550346895dd7b573/html5/thumbnails/13.jpg)
Basic Algorithm
$$R(u) = (1 - d) + d \sum_{v \in B_u} \frac{R(v)}{N_v}$$
- $R(u)$: page rank of $u$
- $B_u$: set of pages pointing at $u$
- $N_v$: out-degree of $v$
- $d$: damping factor
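A minimal sketch of the damped update, assuming the form given in [BP98], where each page gets a base rank of 1 - d plus a d-weighted share from its backlinks. The value d = 0.85 is the setting [BP98] mentions as typical. The function name `pagerank` and the sample graph are illustrative assumptions.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate R(u) = (1 - d) + d * sum(R(v) / N_v for v in B_u).

    links maps each page to the list of pages it points at; every page
    is assumed to have at least one out-link. d is the damping factor.
    """
    rank = {page: 1.0 for page in links}
    for _ in range(iterations):
        new_rank = {page: 1.0 - d for page in links}
        for v, targets in links.items():
            for u in targets:
                new_rank[u] += d * rank[v] / len(targets)
        rank = new_rank
    return rank


# Hypothetical graph: S feeds a loop A -> B -> C -> A.
links = {"S": ["A"], "A": ["B"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
```

With damping, S keeps a rank of 1 - d even though nothing links to it, so the loop no longer absorbs everything. In this form the ranks sum to the number of pages rather than to 1.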
![Page 14: Lecture 4](https://reader034.fdocuments.us/reader034/viewer/2022051821/56816530550346895dd7b573/html5/thumbnails/14.jpg)
Matrix Representation
[Equation lost in extraction: matrix form of the rank equation and its accompanying definitions]
Question: Where to start?
![Page 15: Lecture 4](https://reader034.fdocuments.us/reader034/viewer/2022051821/56816530550346895dd7b573/html5/thumbnails/15.jpg)
Iterative Algorithm
[Equation lost in extraction: iterative update rule and its accompanying definitions]
Question: Will it converge?
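One way to explore the convergence question is to iterate until successive rank vectors stop changing. The sketch below assumes the damped update from [BP98] and an L1 stopping rule. The function name `iterate_until_stable`, the tolerance, and the sample graph are illustrative choices.

```python
def iterate_until_stable(links, d=0.85, tol=1e-8, max_iter=1000):
    """Repeat the damped update until the L1 change drops below tol.

    Returns (rank, iterations_used). links maps each page to the pages
    it points at; every page is assumed to have at least one out-link.
    """
    rank = {page: 1.0 for page in links}
    for iteration in range(1, max_iter + 1):
        new_rank = {page: 1.0 - d for page in links}
        for v, targets in links.items():
            for u in targets:
                new_rank[u] += d * rank[v] / len(targets)
        change = sum(abs(new_rank[page] - rank[page]) for page in links)
        rank = new_rank
        if change < tol:
            return rank, iteration
    return rank, max_iter


# Hypothetical graph: A -> B, A -> C, B -> C, C -> A
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
rank, used = iterate_until_stable(links)
```

When every page has an out-link, each pass shrinks the L1 distance to the fixed point by at least a factor of d, so any d < 1 makes the loop terminate well before `max_iter`.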
![Page 16: Lecture 4](https://reader034.fdocuments.us/reader034/viewer/2022051821/56816530550346895dd7b573/html5/thumbnails/16.jpg)
Example
[LM04]
![Page 17: Lecture 4](https://reader034.fdocuments.us/reader034/viewer/2022051821/56816530550346895dd7b573/html5/thumbnails/17.jpg)
Turn the Problem into a Markov Process
[LM04]
![Page 18: Lecture 4](https://reader034.fdocuments.us/reader034/viewer/2022051821/56816530550346895dd7b573/html5/thumbnails/18.jpg)
Evenly Split Rank of Dangling Links
[LM04]
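A sketch of the even split, assuming a row-per-page transition matrix in which a dangling page's all-zero row is replaced by 1/n in every column. The function name `make_stochastic` and the four-page graph are hypothetical.

```python
def make_stochastic(links, pages):
    """Build a row-stochastic matrix from a link structure.

    Row i holds the probabilities of moving from pages[i] to each page.
    A dangling page (no out-links) gets 1/n in every column, so its
    rank is split evenly over all n pages.
    """
    n = len(pages)
    index = {page: i for i, page in enumerate(pages)}
    matrix = []
    for page in pages:
        targets = links.get(page, [])
        if targets:
            row = [0.0] * n
            for target in targets:
                row[index[target]] += 1.0 / len(targets)
        else:
            row = [1.0 / n] * n
        matrix.append(row)
    return matrix


pages = ["A", "B", "C", "D"]
links = {"A": ["B", "C"], "B": ["C", "D"], "C": ["A"], "D": []}  # D dangles
S = make_stochastic(links, pages)
```

Every row of `S` now sums to 1, which is what lets the matrix be read as the transition matrix of a Markov chain.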
![Page 19: Lecture 4](https://reader034.fdocuments.us/reader034/viewer/2022051821/56816530550346895dd7b573/html5/thumbnails/19.jpg)
Final Solution
- Eigenvector of P = steady-state rank
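A minimal sketch of finding that eigenvector by power iteration, assuming P is row-stochastic so that the steady state is the row vector pi with pi P = pi. The function name `stationary_vector` and the 3x3 matrix are illustrative. A library eigen-solver would work equally well.

```python
def stationary_vector(P, iterations=200):
    """Approximate the row vector pi with pi P = pi by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n  # uniform start vector
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


# Hypothetical transition matrix for pages A, B, C (each row sums to 1).
P = [
    [0.0, 0.5, 0.5],  # A moves to B or C
    [0.0, 0.0, 1.0],  # B moves to C
    [1.0, 0.0, 0.0],  # C moves to A
]
pi = stationary_vector(P)

# Check the eigenvector property: multiplying by P once more changes nothing.
pi_next = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]
```

For this matrix the result approaches (0.4, 0.2, 0.4), and `pi_next` agrees with `pi` up to rounding.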
![Page 20: Lecture 4](https://reader034.fdocuments.us/reader034/viewer/2022051821/56816530550346895dd7b573/html5/thumbnails/20.jpg)
Spam Rank
[BGS05]
![Page 21: Lecture 4](https://reader034.fdocuments.us/reader034/viewer/2022051821/56816530550346895dd7b573/html5/thumbnails/21.jpg)
Questions
- Where to start?
  - Find a nondegenerate start vector
- What if two pages point to each other and to no one else, and a third page points to one of them?
  - Role of the damping factor: it guarantees no rank sink
![Page 22: Lecture 4](https://reader034.fdocuments.us/reader034/viewer/2022051821/56816530550346895dd7b573/html5/thumbnails/22.jpg)
References
[BP98] Sergey Brin, Lawrence Page, “The anatomy of a large-scale hypertextual Web search engine,” Computer Networks and ISDN Systems, Vol. 30, 1998.
[BGS05] Monica Bianchini, Marco Gori, Franco Scarselli, “Inside PageRank,” ACM Transactions on Internet Technology, Vol. 5, No. 1, Feb. 2005.
[LM04] Amy N. Langville, Carl Meyer, “Deeper inside PageRank,” Internet Mathematics, Vol. 1, No. 3, 2004.
[K99] Jon Kleinberg, “Authoritative sources in a hyperlinked environment,” Journal of the ACM, Vol. 46, No. 5, 1999.