Link Analysis on the Web An Example: Broad-topic Queries Xin.

Post on 13-Jan-2016


Link Analysis on the Web

An Example: Broad-topic Queries

Xin Xin

Problem

• Specific queries: “Does Netscape support the JDK 1.1 code-signing API?”

• Broad-topic queries: “Find information about the Java programming language.”

• Authority is important in broad-topic queries

Web query: “java”

1. http://java.sun.com

2. http://sunsite.unc.edu/javafaq/javafaq.html

3. …

Why use link analysis rather than content information alone?

Query: “Harvard”

• Harvard homepage: “Harvard” occurs 4 times

• Another page introducing Harvard: “Harvard” occurs 8 times

Query: “search engines”

• Yahoo! homepage: “search engines” occurs 0 times

• Another page introducing search engines: “search engines” occurs 4 times

Graph Presentation

G=(V,E)

V: pages

E: links between pages (each link is an out-link of its source page and an in-link of its target page)

Adjacency matrix

[Figure: a directed graph on four pages (nodes 1 to 4) and its 4×4 adjacency matrix with rows and columns p1 to p4; five entries are 1.]
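To make the adjacency-matrix representation concrete, here is a minimal Python sketch. The edge list is an assumption for illustration: it has five links, like the matrix on the slide, and matches the per-page sums in Simple Example 2, but the exact positions of the 1s in the original figure are not recoverable. The helper names are hypothetical.

```python
# Assumed edge list (source page, target page) for a four-page graph.
# These five links are illustrative; the slide's figure is not recoverable.
EDGES = [(1, 2), (2, 1), (2, 4), (3, 1), (4, 1)]


def adjacency_matrix(n, edges):
    """Return an n x n matrix A with A[i][j] = 1 if page i+1 links to page j+1."""
    a = [[0] * n for _ in range(n)]
    for src, dst in edges:
        a[src - 1][dst - 1] = 1
    return a


def out_degrees(a):
    """Row sums: number of out-links of each page."""
    return [sum(row) for row in a]


def in_degrees(a):
    """Column sums: number of in-links of each page."""
    return [sum(col) for col in zip(*a)]


A = adjacency_matrix(4, EDGES)
for row in A:
    print(row)
```

Row i of the matrix lists the out-links of page i and column j lists the in-links of page j, so in-degree and out-degree are simply column and row sums.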

Given a query, how can we find the most authoritative pages from this link information?

Overview

Web

Query: “java”

1. http://java.sun.com

2. http://sunsite.unc.edu/javafaq/javafaq.html

3. …

[Figure: example graph with nodes 1 to 4]

1. Sub-graph construction

2. Hubs and authorities computation

Step 1: Sub-graph Construction

• Challenge: build a sub-graph that is

– Small in size

– Rich in relevant pages

– Likely to contain most of the strongest authorities
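The slides do not show how the sub-graph is built. One well-known way to aim for these three properties, used in Kleinberg's HITS algorithm, is to start from a small root set of text-search results and expand it by one link in each direction. The sketch below assumes that approach; `build_subgraph`, the link tables, and the page names are all hypothetical.

```python
def build_subgraph(root_set, out_links, in_links, max_in_per_page=50):
    """Expand a root set by one link step and return (pages, edges).

    root_set:  pages returned by a text search (assumed given).
    out_links: dict mapping a page to the pages it links to.
    in_links:  dict mapping a page to the pages that link to it.
    max_in_per_page caps how many in-linking pages are added per root page,
    which keeps the sub-graph small even for very popular pages.
    """
    base = set(root_set)
    for page in root_set:
        base.update(out_links.get(page, ()))
        base.update(sorted(in_links.get(page, ()))[:max_in_per_page])
    # Keep only links whose two endpoints are both inside the sub-graph.
    edges = {
        (src, dst)
        for src in base
        for dst in out_links.get(src, ())
        if dst in base and dst != src
    }
    return base, edges


# Toy link tables with hypothetical page names.
out_links = {
    "root1": ["auth1"],
    "hub1": ["root1", "auth1"],
    "hub2": ["root1"],
    "far": ["hub1"],
}
in_links = {
    "root1": ["hub1", "hub2"],
    "auth1": ["root1", "hub1"],
    "hub1": ["far"],
}

pages, edges = build_subgraph(["root1"], out_links, in_links)
```

In this toy run the page `far` stays out of the sub-graph because it is two links away from the root set.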

Step 2: Hubs and Authorities

• Basic idea: rank pages by in-degree

• Problem:

Step 2: Hubs and Authorities: An Iterative Algorithm

Simple Example 1

[Figure: graph with nodes 1 to 4, each labeled with a pair (x, y)]

(x, y): x = hub score, y = authority score

All four nodes: (1/4, 1/4)

Simple Example 2

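[Figure: graph with nodes 1 to 4]

Initial (hub, authority) scores, all four nodes: (1/4, 1/4)

Hub update:

1: 1/4

2: 1/4 + 1/4

3: 1/4

4: 1/4

Authority update:

1: 1/4 + 1/4 + 1/4

2: 1/4

3: 0

4: 1/4

After normalization (squared scores sum to 1):

Hub: 1: √(1/7), 2: √(4/7), 3: √(1/7), 4: √(1/7)

Authority: 1: √(9/11), 2: √(1/11), 3: 0, 4: √(1/11)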
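The arithmetic in this example can be checked with a short Python sketch of one update round. It rests on two assumptions: the edge list is inferred from the sums above (it is the only five-link graph without self-links that produces them), and both scores are computed from the previous round's values and then scaled so that their squares sum to 1. The function names are hypothetical.

```python
import math

# Assumed links (source, target), inferred from the sums in the example.
EDGES = [(1, 2), (2, 1), (2, 4), (3, 1), (4, 1)]
PAGES = [1, 2, 3, 4]


def hits_step(pages, edges, hub, auth):
    """One update round, using the previous round's scores for both updates.

    New hub score of p: sum of authority scores of the pages p links to.
    New authority score of p: sum of hub scores of the pages linking to p.
    """
    new_hub = {p: sum(auth[d] for s, d in edges if s == p) for p in pages}
    new_auth = {p: sum(hub[s] for s, d in edges if d == p) for p in pages}
    return new_hub, new_auth


def normalize(scores):
    """Scale scores so that the sum of their squares is 1."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    return {p: v / norm for p, v in scores.items()}


hub = {p: 0.25 for p in PAGES}
auth = {p: 0.25 for p in PAGES}

raw_hub, raw_auth = hits_step(PAGES, EDGES, hub, auth)
hub, auth = normalize(raw_hub), normalize(raw_auth)

print("hub:", hub)
print("authority:", auth)
```

Repeating `hits_step` and `normalize` in a loop until the scores stop changing gives the iterative algorithm; page 1 ends up as the strongest authority and page 2 as the strongest hub after this first round.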

PageRank