Google PageRank
-
Upload
beat-signer -
Category
Technology
-
view
9.377 -
download
1
description
Transcript of Google PageRank
December 9, 2008
Google PageRank Prof. Beat Signer
Department of Computer Science
Vrije Universiteit Brussel
http://www.beatsigner.com
December 9, 2008 Beat Signer, [email protected]
Overview
History of PageRank
PageRank algorithm
Examples
Implications for website development
2
December 9, 2008 Beat Signer, [email protected]
History of PageRank
Developed as part of an academic
project at Stanford University
research platform to aid under-
standing of large-scale web data
and enable researches to easily
experiment with new search technologies
Larry Page and Sergey Brin worked on the project
about a new kind of search engine (1995-1998) which
finally led to a functional prototype called Google
3
Larry Page Sergey Brin
December 9, 2008 Beat Signer, [email protected]
Web Search Until 1998
Find all documents using a query term
use information retrieval (IR) solutions
ranking based on "on-page factors"
problem: poor quality of search results (order)
Page and Brin proposed to compute the
absolute qualtity of a page (PageRank)
based on the number and quality of pages
linking to a page (votes)
4
December 9, 2008 Beat Signer, [email protected]
PageRank
A page has a high PageRank R if
there are many pages linking to it
or, if there are some pages with a high PageRank
linking to it
Total score = IR score x PageRank
5
P1
R1
P2
R2
P3
R3
P4
R4
P5
R5
P6
R6
P7
R7
P8
R8
December 9, 2008 Beat Signer, [email protected]
PageRank Algorithm
where
Bi is the set of pages
that link to page Pi
Lj is the number of
outgoing links for page Pj
6
ij BP j
j
iL
PRPR
)()( P1 P2
P3
P1
1
P2
1
P3
1
P1
1.5
P2
1.5
P3
0.75
P1
1.5
P2
1.5
P3
0.75
December 9, 2008 Beat Signer, [email protected]
Matrix Representation
Let us define a hyperlink
matrix H
7
P1 P2
P3
otherwise0
if1 ijj
ij
BPLH
0210
001
1210
H iPRRand
HRR
R is an eigenvector of H
with eigenvalue 1
December 9, 2008 Beat Signer, [email protected]
Matrix Representation ...
We can use the power method to find R
8
tt HRR 1
0210
001
1210
HFor our example
this results in or 122R 2.04.04.0
December 9, 2008 Beat Signer, [email protected]
Dangling Pages
Problem with pages
that have no outbound
links (P2)
9
P1 P2
01
00H and 00R
210
210C
211
210CHSand
C
C
December 9, 2008 Beat Signer, [email protected]
Strongly Connected Pages (Graph)
Add new transition
probabilities between
all pages
with probability d we follow
the hyperlink structure S
with probability 1-d we
choose a random page
10
P1 P2
P3 P4
P5
S1G dn
d 1
1 GRR
1-d
1-d 1-d
December 9, 2008 Beat Signer, [email protected]
Examples ...
12
A1
0.13
A2
0.185
A3
0.185
B1
0.13
B2
0.185
B3
0.185
5.0AP 5.0BP
S1G dn
d 1
1
December 9, 2008 Beat Signer, [email protected]
Examples ...
13
A1
0.10
A2
0.14
A3
0.14
B1
0.22
B2
0.20
B3
0.20
38.0AP 62.0BP
S1G dn
d 1
1
December 9, 2008 Beat Signer, [email protected]
Examples ...
14
A1
0.3
A2
0.23
A3
0.18
B1
0.10
B2
0.095
B3
0.095
71.0AP 29.0BP
S1G dn
d 1
1
December 9, 2008 Beat Signer, [email protected]
Examples ...
15
A1
0.35
A2
0.24
A3
0.18
B1
0.09
B2
0.07
B3
0.07
77.0AP 23.0BP
S1G dn
d 1
1
December 9, 2008 Beat Signer, [email protected]
Examples ...
16
A1
0.33
A2
0.17
A3
0.175
B1
0.08
B2
0.06
B3
0.06
80.0AP
20.0BPA4
0.125
S1G dn
d 1
1
December 9, 2008 Beat Signer, [email protected]
Implications for Website Development
First make sure that your page gets indexed
on-page factors
Think about your site's internal link structure
create many internal links for important pages
be "careful" about where to put outgoing links
Increase the number of pages
Ensure that webpages are addressed consistently
http://www.vub.ac.be http://www.vub.ac.be/index.php
Make sure that you get links from good websites
17
December 9, 2008 Beat Signer, [email protected]
Search Engine Optimisations (SEO)
Internet marketing has become a big business
white hat and black hat optimisations
Bad ranking or removal from index can cost a
company a lot of money
e.g. supplemental index ("Google hell")
19
December 9, 2008 Beat Signer, [email protected]
Black Hat Optimisations (Don'ts)
Link farms
Spamdexing in guestbooks, Wikipedia etc.
"solution": <a rel="nofollow" href="…">…</a>
Doorway pages (cloaking)
e.g. BMW Germany and Ricoh Germany banned in
February 2006
Selling/buying links
...
20
December 9, 2008 Beat Signer, [email protected]
On-Page Factors (Speculative)
It is assumed that there are over 200 on-page
and off-page factors
Positive factors
keyword in title tag
keyword in URL
keyword in domain name
quality of HTML code
page freshness (occasional changes)
…
December 9, 2008 Beat Signer, [email protected]
On-Page Factors (Speculative) …
Negative factors
links to "bad neighbourhood"
over optimisation penalty (keyword stuffing)
text with same colour as background (hidden content)
automatic redirects via the refresh meta tag
any copyright violations
…
December 9, 2008 Beat Signer, [email protected]
Off-Page Factors (Speculative)
Positive factors
high PageRank
anchor text of inbound links
links from authority sites (Hilltop algorithm)
listed in DMOZ (ODP) and Yahoo directories
site age (stability)
domain expiration date
…
December 9, 2008 Beat Signer, [email protected]
Off-Page Factors (Speculative) …
Negative factors
link buying (fast increasing number of inbound links)
link farms
cloaking (different pages for spider and user)
limited (temporal) availability of site
links from bad neighbourhood?
competitor attack (e.g. duplicate content)?
…
December 9, 2008 Beat Signer, [email protected]
Tools
Google toolbar
PageRank information not frequently updated
Google webmaster tools
meta description issues
title tag issues
non-indexable content issues
number and URLs of indexed pages
number and URLs of inbound/outbound links
...
25
December 9, 2008 Beat Signer, [email protected]
Questions
Is PageRank fair?
What about Google's power and influence?
26
December 9, 2008 Beat Signer, [email protected]
Conclusions
PageRank algorithm
absolute quality of a page based on incoming links
random surfer model
computed as eigenvector of Google matrix G
Implications for website development and SEO
PageRank is just one (important) factor
27
December 9, 2008 Beat Signer, [email protected]
References
The PageRank Citation Ranking: Bringing Order
to the Web, L. Page, S. Brin, R. Motwani and
T. Winograd, January 1998
The Anatomy of a Large-Scale Hypertextual
Web Search Engine, S. Brin and L. Page,
Computer Networks and ISDN Systems, 30(1-7),
April 1998
December 9, 2008 Beat Signer, [email protected]
References …
PageRank Uncovered, C. Ridings and
M. Shishigin, September 2002
PageRank Calculator,
http://www.webworkshop.net/pagerank_
calculator.php
29