Google PageRank

29
December 9, 2008 Google PageRank Prof. Beat Signer Department of Computer Science Vrije Universiteit Brussel http://www.beatsigner.com <[email protected]>

description

Explanation of Google's PageRank algorithm with some examples followed by a discussion of related Search engine optimisation (SEO) issues.

Transcript of Google PageRank

Page 1: Google PageRank

December 9, 2008

Google PageRank Prof. Beat Signer

Department of Computer Science

Vrije Universiteit Brussel

http://www.beatsigner.com

<[email protected]>

Page 2: Google PageRank

December 9, 2008 Beat Signer, [email protected]

Overview

History of PageRank

PageRank algorithm

Examples

Implications for website development

2

Page 3: Google PageRank

December 9, 2008 Beat Signer, [email protected]

History of PageRank

Developed as part of an academic

project at Stanford University

research platform to aid under-

standing of large-scale web data

and enable researches to easily

experiment with new search technologies

Larry Page and Sergey Brin worked on the project

about a new kind of search engine (1995-1998) which

finally led to a functional prototype called Google

3

Larry Page Sergey Brin

Page 4: Google PageRank

December 9, 2008 Beat Signer, [email protected]

Web Search Until 1998

Find all documents using a query term

use information retrieval (IR) solutions

ranking based on "on-page factors"

problem: poor quality of search results (order)

Page and Brin proposed to compute the

absolute qualtity of a page (PageRank)

based on the number and quality of pages

linking to a page (votes)

4

Page 5: Google PageRank

December 9, 2008 Beat Signer, [email protected]

PageRank

A page has a high PageRank R if

there are many pages linking to it

or, if there are some pages with a high PageRank

linking to it

Total score = IR score x PageRank

5

P1

R1

P2

R2

P3

R3

P4

R4

P5

R5

P6

R6

P7

R7

P8

R8

Page 6: Google PageRank

December 9, 2008 Beat Signer, [email protected]

PageRank Algorithm

where

Bi is the set of pages

that link to page Pi

Lj is the number of

outgoing links for page Pj

6

ij BP j

j

iL

PRPR

)()( P1 P2

P3

P1

1

P2

1

P3

1

P1

1.5

P2

1.5

P3

0.75

P1

1.5

P2

1.5

P3

0.75

Page 7: Google PageRank

December 9, 2008 Beat Signer, [email protected]

Matrix Representation

Let us define a hyperlink

matrix H

7

P1 P2

P3

otherwise0

if1 ijj

ij

BPLH

0210

001

1210

H iPRRand

HRR

R is an eigenvector of H

with eigenvalue 1

Page 8: Google PageRank

December 9, 2008 Beat Signer, [email protected]

Matrix Representation ...

We can use the power method to find R

8

tt HRR 1

0210

001

1210

HFor our example

this results in or 122R 2.04.04.0

Page 9: Google PageRank

December 9, 2008 Beat Signer, [email protected]

Dangling Pages

Problem with pages

that have no outbound

links (P2)

9

P1 P2

01

00H and 00R

210

210C

211

210CHSand

C

C

Page 10: Google PageRank

December 9, 2008 Beat Signer, [email protected]

Strongly Connected Pages (Graph)

Add new transition

probabilities between

all pages

with probability d we follow

the hyperlink structure S

with probability 1-d we

choose a random page

10

P1 P2

P3 P4

P5

S1G dn

d 1

1 GRR

1-d

1-d 1-d

Page 11: Google PageRank

December 9, 2008 Beat Signer, [email protected]

Examples

11

S1G dn

d 1

1

A1

0.26

A2

0.37

A3

0.37

Page 12: Google PageRank

December 9, 2008 Beat Signer, [email protected]

Examples ...

12

A1

0.13

A2

0.185

A3

0.185

B1

0.13

B2

0.185

B3

0.185

5.0AP 5.0BP

S1G dn

d 1

1

Page 13: Google PageRank

December 9, 2008 Beat Signer, [email protected]

Examples ...

13

A1

0.10

A2

0.14

A3

0.14

B1

0.22

B2

0.20

B3

0.20

38.0AP 62.0BP

S1G dn

d 1

1

Page 14: Google PageRank

December 9, 2008 Beat Signer, [email protected]

Examples ...

14

A1

0.3

A2

0.23

A3

0.18

B1

0.10

B2

0.095

B3

0.095

71.0AP 29.0BP

S1G dn

d 1

1

Page 15: Google PageRank

December 9, 2008 Beat Signer, [email protected]

Examples ...

15

A1

0.35

A2

0.24

A3

0.18

B1

0.09

B2

0.07

B3

0.07

77.0AP 23.0BP

S1G dn

d 1

1

Page 16: Google PageRank

December 9, 2008 Beat Signer, [email protected]

Examples ...

16

A1

0.33

A2

0.17

A3

0.175

B1

0.08

B2

0.06

B3

0.06

80.0AP

20.0BPA4

0.125

S1G dn

d 1

1

Page 17: Google PageRank

December 9, 2008 Beat Signer, [email protected]

Implications for Website Development

First make sure that your page gets indexed

on-page factors

Think about your site's internal link structure

create many internal links for important pages

be "careful" about where to put outgoing links

Increase the number of pages

Ensure that webpages are addressed consistently

http://www.vub.ac.be http://www.vub.ac.be/index.php

Make sure that you get links from good websites

17

Page 18: Google PageRank

December 9, 2008 Beat Signer, [email protected]

Consistent Addressing of Webpages

18

Page 19: Google PageRank

December 9, 2008 Beat Signer, [email protected]

Search Engine Optimisations (SEO)

Internet marketing has become a big business

white hat and black hat optimisations

Bad ranking or removal from index can cost a

company a lot of money

e.g. supplemental index ("Google hell")

19

Page 20: Google PageRank

December 9, 2008 Beat Signer, [email protected]

Black Hat Optimisations (Don'ts)

Link farms

Spamdexing in guestbooks, Wikipedia etc.

"solution": <a rel="nofollow" href="…">…</a>

Doorway pages (cloaking)

e.g. BMW Germany and Ricoh Germany banned in

February 2006

Selling/buying links

...

20

Page 21: Google PageRank

December 9, 2008 Beat Signer, [email protected]

On-Page Factors (Speculative)

It is assumed that there are over 200 on-page

and off-page factors

Positive factors

keyword in title tag

keyword in URL

keyword in domain name

quality of HTML code

page freshness (occasional changes)

Page 22: Google PageRank

December 9, 2008 Beat Signer, [email protected]

On-Page Factors (Speculative) …

Negative factors

links to "bad neighbourhood"

over optimisation penalty (keyword stuffing)

text with same colour as background (hidden content)

automatic redirects via the refresh meta tag

any copyright violations

Page 23: Google PageRank

December 9, 2008 Beat Signer, [email protected]

Off-Page Factors (Speculative)

Positive factors

high PageRank

anchor text of inbound links

links from authority sites (Hilltop algorithm)

listed in DMOZ (ODP) and Yahoo directories

site age (stability)

domain expiration date

Page 24: Google PageRank

December 9, 2008 Beat Signer, [email protected]

Off-Page Factors (Speculative) …

Negative factors

link buying (fast increasing number of inbound links)

link farms

cloaking (different pages for spider and user)

limited (temporal) availability of site

links from bad neighbourhood?

competitor attack (e.g. duplicate content)?

Page 25: Google PageRank

December 9, 2008 Beat Signer, [email protected]

Tools

Google toolbar

PageRank information not frequently updated

Google webmaster tools

meta description issues

title tag issues

non-indexable content issues

number and URLs of indexed pages

number and URLs of inbound/outbound links

...

25

Page 26: Google PageRank

December 9, 2008 Beat Signer, [email protected]

Questions

Is PageRank fair?

What about Google's power and influence?

26

Page 27: Google PageRank

December 9, 2008 Beat Signer, [email protected]

Conclusions

PageRank algorithm

absolute quality of a page based on incoming links

random surfer model

computed as eigenvector of Google matrix G

Implications for website development and SEO

PageRank is just one (important) factor

27

Page 28: Google PageRank

December 9, 2008 Beat Signer, [email protected]

References

The PageRank Citation Ranking: Bringing Order

to the Web, L. Page, S. Brin, R. Motwani and

T. Winograd, January 1998

The Anatomy of a Large-Scale Hypertextual

Web Search Engine, S. Brin and L. Page,

Computer Networks and ISDN Systems, 30(1-7),

April 1998

Page 29: Google PageRank

December 9, 2008 Beat Signer, [email protected]

References …

PageRank Uncovered, C. Ridings and

M. Shishigin, September 2002

PageRank Calculator,

http://www.webworkshop.net/pagerank_

calculator.php

29