PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web...
Transcript of PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web...
![Page 1: PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks • Assign each](https://reader034.fdocuments.us/reader034/viewer/2022051904/5ff5b097d9616f437e2bed25/html5/thumbnails/1.jpg)
Edith Law
PageRanklecture 12 (October 9, 2008)
![Page 2: PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks • Assign each](https://reader034.fdocuments.us/reader034/viewer/2022051904/5ff5b097d9616f437e2bed25/html5/thumbnails/2.jpg)
Page RankWhat’s the big deal?
![Page 3: PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks • Assign each](https://reader034.fdocuments.us/reader034/viewer/2022051904/5ff5b097d9616f437e2bed25/html5/thumbnails/3.jpg)
Life before PageRank
![Page 4: PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks • Assign each](https://reader034.fdocuments.us/reader034/viewer/2022051904/5ff5b097d9616f437e2bed25/html5/thumbnails/4.jpg)
Life after PageRank
![Page 5: PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks • Assign each](https://reader034.fdocuments.us/reader034/viewer/2022051904/5ff5b097d9616f437e2bed25/html5/thumbnails/5.jpg)
Evolution of the web
Centralization
![Page 6: PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks • Assign each](https://reader034.fdocuments.us/reader034/viewer/2022051904/5ff5b097d9616f437e2bed25/html5/thumbnails/6.jpg)
Idea #1: Centralization
![Page 7: PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks • Assign each](https://reader034.fdocuments.us/reader034/viewer/2022051904/5ff5b097d9616f437e2bed25/html5/thumbnails/7.jpg)
Idea #1: Centralization
![Page 8: PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks • Assign each](https://reader034.fdocuments.us/reader034/viewer/2022051904/5ff5b097d9616f437e2bed25/html5/thumbnails/8.jpg)
Veronica(1992)
Jughead(1993)
Archie(1990)
![Page 9: PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks • Assign each](https://reader034.fdocuments.us/reader034/viewer/2022051904/5ff5b097d9616f437e2bed25/html5/thumbnails/9.jpg)
Evolution of the web
Centralization
Relevancy
![Page 10: PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks • Assign each](https://reader034.fdocuments.us/reader034/viewer/2022051904/5ff5b097d9616f437e2bed25/html5/thumbnails/10.jpg)
Idea #2: Relevancy
2. More sophisticated indexing methods
Filename Description Content
1. Web directories
Given a query, how do we know what to retrieve?
![Page 11: PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks • Assign each](https://reader034.fdocuments.us/reader034/viewer/2022051904/5ff5b097d9616f437e2bed25/html5/thumbnails/11.jpg)
The index size war
![Page 12: PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks • Assign each](https://reader034.fdocuments.us/reader034/viewer/2022051904/5ff5b097d9616f437e2bed25/html5/thumbnails/12.jpg)
Evolution of the web
Centralization
Relevancy
Ranking
![Page 13: PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks • Assign each](https://reader034.fdocuments.us/reader034/viewer/2022051904/5ff5b097d9616f437e2bed25/html5/thumbnails/13.jpg)
Idea #3: Ranking
![Page 14: PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks • Assign each](https://reader034.fdocuments.us/reader034/viewer/2022051904/5ff5b097d9616f437e2bed25/html5/thumbnails/14.jpg)
Page RankHow it works ...
![Page 15: PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks • Assign each](https://reader034.fdocuments.us/reader034/viewer/2022051904/5ff5b097d9616f437e2bed25/html5/thumbnails/15.jpg)
Main idea
A page is important if it is pointed to by other important pages
![Page 16: PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks • Assign each](https://reader034.fdocuments.us/reader034/viewer/2022051904/5ff5b097d9616f437e2bed25/html5/thumbnails/16.jpg)
Importance
/ l(Pj) (t+1) (t)
r (Pi) = ∑ r (Pj) j∈E(i)
C
B
0.2 0.8
0.6
A ...
9,999
![Page 17: PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks • Assign each](https://reader034.fdocuments.us/reader034/viewer/2022051904/5ff5b097d9616f437e2bed25/html5/thumbnails/17.jpg)
The Algorithm
Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks
• Assign each node an initial page rank
• repeat until convergence
calculate the page rank of each node (using the equation in the previous slide)
![Page 18: PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks • Assign each](https://reader034.fdocuments.us/reader034/viewer/2022051904/5ff5b097d9616f437e2bed25/html5/thumbnails/18.jpg)
Example
2
5
3
1
4
Iteration 0 Iteration 1 Iteration 2 Page Rank
P1 1/5 1/20 1/40 5
P2 1/5 5/20 3/40 4
P3 1/5 1/10 5/40 3
P4 1/5 5/20 15/40 2
P5 1/5 7/20 16/40 1
r1(P5)=1/5 + 1/5×1/4 + 1/5 × 1/2 = 7/20
![Page 19: PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks • Assign each](https://reader034.fdocuments.us/reader034/viewer/2022051904/5ff5b097d9616f437e2bed25/html5/thumbnails/19.jpg)
Matrix representation
1/20
5/20
1/10
5/20
7/20
r(t+1)
0 0 1/4 0 0
1 0 1/4 0 0
0 0 0 1/2 0
0 0 1/4 0 1
0 1 1/4 1/2 0
H
=
= r(t)
1/5
1/5
1/5
1/5
1/5
![Page 20: PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks • Assign each](https://reader034.fdocuments.us/reader034/viewer/2022051904/5ff5b097d9616f437e2bed25/html5/thumbnails/20.jpg)
Three Questions
• Does this converge?
• Does it converge to what we want?
• Are the results reasonable?
r(t+1) = H r(t)
Also known as the power method
![Page 21: PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks • Assign each](https://reader034.fdocuments.us/reader034/viewer/2022051904/5ff5b097d9616f437e2bed25/html5/thumbnails/21.jpg)
Does it converge?
Iteration 0 Iteration 1 Iteration 2 Iteration 3
P1 1 0 1 0
P2 0 1 0 1
21
![Page 22: PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks • Assign each](https://reader034.fdocuments.us/reader034/viewer/2022051904/5ff5b097d9616f437e2bed25/html5/thumbnails/22.jpg)
Iteration 0 Iteration 1 Iteration 2 Iteration 3
P1 1 0 0 0
P2 0 1 0 0
21
Does it converge to what we want?
![Page 23: PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks • Assign each](https://reader034.fdocuments.us/reader034/viewer/2022051904/5ff5b097d9616f437e2bed25/html5/thumbnails/23.jpg)
Does it converge to what we want?
2
5
3
1
4
xDangling
Node0 0 1/4 0 0
1 0 1/4 0 0
0 0 0 1/2 0
0 0 1/4 0 1
0 0 1/4 1/2 0
Page ranks to converge to 0.
![Page 24: PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks • Assign each](https://reader034.fdocuments.us/reader034/viewer/2022051904/5ff5b097d9616f437e2bed25/html5/thumbnails/24.jpg)
Looks a lot like ...
Markov Chains
Set of states X
Transition matrix P where Pij = P(Xt=j | Xt-1=i)
π specifying the probability of being at each state x ∈ X
Goal is to find π such that π = P π
r(t+1) = H r(t)
![Page 25: PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks • Assign each](https://reader034.fdocuments.us/reader034/viewer/2022051904/5ff5b097d9616f437e2bed25/html5/thumbnails/25.jpg)
Why is this analogy useful?
There exists a theory about Markov chains that says that for any start vector, the power method applied to a Markov transition matrix P will converge to a unique positive stationary vector as long as P is stochastic, irreducible and aperiodic.
![Page 26: PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks • Assign each](https://reader034.fdocuments.us/reader034/viewer/2022051904/5ff5b097d9616f437e2bed25/html5/thumbnails/26.jpg)
Make H stochastic
S = H + a(1/n eT)
2
5
3
1
4
0 1/5 1/4 0 0
1 1/5 1/4 0 0
0 1/5 0 1/2 0
0 1/5 1/4 0 1
0 1/5 1/4 1/2 0
![Page 27: PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks • Assign each](https://reader034.fdocuments.us/reader034/viewer/2022051904/5ff5b097d9616f437e2bed25/html5/thumbnails/27.jpg)
Make H aperiodic
A chain is periodic if there exists k > 1 such that the interval between two visits to some state s is always a multiple of k.
1
2 5
3 4
![Page 28: PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks • Assign each](https://reader034.fdocuments.us/reader034/viewer/2022051904/5ff5b097d9616f437e2bed25/html5/thumbnails/28.jpg)
Make H irreducibleFrom any state, there is a non-zero probability of going from one state to another.
1
2 3
4 5
![Page 29: PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks • Assign each](https://reader034.fdocuments.us/reader034/viewer/2022051904/5ff5b097d9616f437e2bed25/html5/thumbnails/29.jpg)
The Google Matrix
G = αS + (1-α) 1/n eeT
2
5
3
1
4
The Random Surfer Model: for each page, time spent ∝ importance.
![Page 30: PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks • Assign each](https://reader034.fdocuments.us/reader034/viewer/2022051904/5ff5b097d9616f437e2bed25/html5/thumbnails/30.jpg)
G = αS + (1-α) 1/n eeT
G is stochastic, aperiodic and irreducible.
r(t+1) = G r(t)
G is dense but computable using the sparse matrix H.
G = αS + (1-α) 1/n eeT
= α(H + 1/naeT) + (1-α) 1/n eeT
= αH + (αa + (1-α)e) 1/n eT
Are the results reasonable?
![Page 31: PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks • Assign each](https://reader034.fdocuments.us/reader034/viewer/2022051904/5ff5b097d9616f437e2bed25/html5/thumbnails/31.jpg)
Page RankThe problems
![Page 32: PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks • Assign each](https://reader034.fdocuments.us/reader034/viewer/2022051904/5ff5b097d9616f437e2bed25/html5/thumbnails/32.jpg)
The Rich Gets Richer
(Cho et al, 04)
![Page 33: PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks • Assign each](https://reader034.fdocuments.us/reader034/viewer/2022051904/5ff5b097d9616f437e2bed25/html5/thumbnails/33.jpg)
Google Bombs
![Page 34: PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks • Assign each](https://reader034.fdocuments.us/reader034/viewer/2022051904/5ff5b097d9616f437e2bed25/html5/thumbnails/34.jpg)
Google Bombs
![Page 35: PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks • Assign each](https://reader034.fdocuments.us/reader034/viewer/2022051904/5ff5b097d9616f437e2bed25/html5/thumbnails/35.jpg)
Google Bombs
![Page 36: PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks • Assign each](https://reader034.fdocuments.us/reader034/viewer/2022051904/5ff5b097d9616f437e2bed25/html5/thumbnails/36.jpg)
Link Farms
... ...
![Page 37: PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks • Assign each](https://reader034.fdocuments.us/reader034/viewer/2022051904/5ff5b097d9616f437e2bed25/html5/thumbnails/37.jpg)
Link Farms
(Wu and Davison, 05)
![Page 38: PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks • Assign each](https://reader034.fdocuments.us/reader034/viewer/2022051904/5ff5b097d9616f437e2bed25/html5/thumbnails/38.jpg)
Take-home
Ranking is important.
Relationship between links and the importance of pages.
Why PageRank converges (to the right answer).
How link-based ranking methods can be manipulated.
![Page 39: PageRank - Carnegie Mellon School of Computer Scienceelaw/pagerank.pdfThe Algorithm Given a web graph with n nodes, where the nodes are pages and edges are hyperlinks • Assign each](https://reader034.fdocuments.us/reader034/viewer/2022051904/5ff5b097d9616f437e2bed25/html5/thumbnails/39.jpg)
g2gttyl