Lecture 5: Network centrality Slides are modified from Lada Adamic.


Page 1: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Lecture 5:

Network centrality

Slides are modified from Lada Adamic

Page 2: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Measures and Metrics

Knowing the structure of a network, we can calculate various useful quantities or measures that capture particular features of the network topology. Most such measures originate in social network analysis.

So far: degree distribution, average path length, density

Centrality: degree, eigenvector, Katz, PageRank, hubs, closeness, betweenness, …

Several other graph metrics: clustering coefficient, assortativity, modularity, …

2

Page 3: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Characterizing networks: Who is most central?

?

?

?

3

Page 4: Lecture 5: Network centrality Slides are modified from Lada Adamic.

network centrality

Which nodes are most ‘central’?

Definition of ‘central’ varies by context/purpose

Local measure: degree

Relative to rest of network: closeness, betweenness, eigenvector (Bonacich power centrality), Katz, PageRank, …

How evenly is centrality distributed among nodes? Centralization, hubs and authorities, …

4

Page 5: Lecture 5: Network centrality Slides are modified from Lada Adamic.

centrality: who’s important based on their network position

In each of the following networks, X has higher centrality than Y according to a particular measure:

indegree, outdegree, betweenness, closeness

5

Page 6: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Outline

Degree centrality Centralization

Betweenness centrality Closeness centrality

Eigenvector centrality Bonacich power centrality

Katz centrality PageRank Hubs and Authorities

6

Page 7: Lecture 5: Network centrality Slides are modified from Lada Adamic.

He who has many friends is most important.

degree centrality (undirected)

When is the number of connections the best centrality measure?
o people who will do favors for you
o people you can talk to (influence set, information access, …)
o influence of an article in terms of citations (using in-degree)

7

Page 8: Lecture 5: Network centrality Slides are modified from Lada Adamic.

degree: normalized degree centrality

divide by the max. possible, i.e. (N-1)
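
As a minimal sketch of the normalization above (the function name and the small star graph are illustrative assumptions, not part of the slides), each node's degree is divided by N-1:

```python
def normalized_degree(edges, nodes):
    """Degree of each node divided by the maximum possible degree, N - 1."""
    degree = {v: 0 for v in nodes}
    for u, v in edges:  # undirected edge: counts once for each endpoint
        degree[u] += 1
        degree[v] += 1
    n = len(nodes)
    return {v: degree[v] / (n - 1) for v in nodes}


# Hypothetical example: a star with centre "A" and three leaves.
star_edges = [("A", "B"), ("A", "C"), ("A", "D")]
scores = normalized_degree(star_edges, ["A", "B", "C", "D"])
print(scores)
```

The centre of a star is connected to every other node, so its normalized degree reaches the maximum of 1, while each leaf gets 1/(N-1).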

8

Page 9: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Prestige in directed social networks

when ‘prestige’ may be the right word
o admiration
o influence
o gift-giving
o trust

directionality especially important in instances where ties may not be reciprocated (e.g. dining partners choice network)

when ‘prestige’ may not be the right word
o gives advice to (can reverse direction)
o gives orders to (- ” -)
o lends money to (- ” -)
o dislikes
o distrusts

9

Page 10: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Extensions of undirected degree centrality - prestige

degree centrality
indegree centrality
o a paper that is cited by many others has high prestige
o a person nominated by many others for a reward has high prestige

10

Page 11: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Freeman’s general formula for centralization: (can use other metrics, e.g. gini coefficient or standard deviation)

$$C_D = \frac{\sum_{i=1}^{g} \left[ C_D(n^*) - C_D(i) \right]}{(N-1)(N-2)}$$

centralization: how equal are the nodes?

How much variation is there in the centrality scores among the nodes?

maximum value in the network

11

Page 12: Lecture 5: Network centrality Slides are modified from Lada Adamic.

degree centralization examples

CD = 0.167

CD = 0.167    CD = 1.0
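
The slide's example graphs are not preserved in this transcript. As a hedged sketch, the code below applies Freeman's formula with degree as the centrality score to two assumed 5-node graphs, a star and a path, which happen to give 1.0 and 0.167; whether these are the graphs drawn on the slide is an assumption.

```python
def degree_centralization(edges, nodes):
    """Freeman degree centralization: sum of (max degree - degree) / ((N-1)(N-2))."""
    degree = {v: 0 for v in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(nodes)
    max_degree = max(degree.values())  # C_D(n*), the maximum value in the network
    return sum(max_degree - d for d in degree.values()) / ((n - 1) * (n - 2))


nodes = ["A", "B", "C", "D", "E"]
star = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E")]   # assumed example graph
path = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]   # assumed example graph

star_value = degree_centralization(star, nodes)
path_value = degree_centralization(path, nodes)
print(star_value, round(path_value, 3))
```

The star is the most centralized arrangement possible (one node holds all the ties), while in the path only the two end nodes fall short of the maximum degree.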

12

Page 13: Lecture 5: Network centrality Slides are modified from Lada Adamic.

degree centralization examples

example financial trading networks

high centralization: one node trading with many others

low centralization: trades are more evenly distributed

13

Page 14: Lecture 5: Network centrality Slides are modified from Lada Adamic.

when degree isn’t everything

In what ways does degree fail to capture centrality in the following graphs?

o ability to broker between groups
o likelihood that information originating anywhere in the network reaches you…

14

Page 15: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Outline

Degree centrality Centralization

Betweenness centrality Closeness centrality

Eigenvector centrality Bonacich power centrality

Katz centrality PageRank Hubs and Authorities

15

Page 16: Lecture 5: Network centrality Slides are modified from Lada Adamic.

betweenness: another centrality measure

intuition: how many pairs of individuals would have to go through you in order to reach one another in the minimum number of hops?

who has higher betweenness, X or Y?

XY

16

Page 17: Lecture 5: Network centrality Slides are modified from Lada Adamic.

betweenness on toy networks

non-normalized version:

A B C D E

o A lies between no two other vertices
o B lies between A and 3 other vertices: C, D, and E
o C lies between 4 pairs of vertices (A,D),(A,E),(B,D),(B,E)

note that there are no alternate paths for these pairs to take, so C gets full credit

17

Page 18: Lecture 5: Network centrality Slides are modified from Lada Adamic.

betweenness centrality: definition

$$C_B(i) = \sum_{j<k} \frac{g_{jk}(i)}{g_{jk}}$$

(betweenness of vertex i = paths between j and k that pass through i, divided by all paths between j and k)

Where g_jk = the number of geodesics connecting j and k, and

g_jk(i) = the number of those geodesics that actor i is on.

Usually normalized by:

$$C_B'(i) = \frac{C_B(i)}{(n-1)(n-2)/2}$$

(n-1)(n-2)/2 is the number of pairs of vertices excluding the vertex itself; for a directed graph, use (N-1)*(N-2).

Page 19: Lecture 5: Network centrality Slides are modified from Lada Adamic.

betweenness on toy networks

non-normalized version:

19

Page 20: Lecture 5: Network centrality Slides are modified from Lada Adamic.

betweenness on toy networks

non-normalized version:

20

broker

Page 21: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Nodes are sized by degree, and colored by betweenness.

example

Can you spot nodes with high betweenness but relatively low degree?

What about high degree but relatively low betweenness?

21

Page 22: Lecture 5: Network centrality Slides are modified from Lada Adamic.

betweenness on toy networks

non-normalized version:

A B

C

E

D

why do C and D each have betweenness 1?

They are both on shortest paths for pairs (A,E), and (B,E), and so must share credit:

½+½ = 1

Can you figure out why B has betweenness 3.5 while E has betweenness 0.5?
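
To check values like these by hand, a brute-force sketch can count geodesics with breadth-first search and give each intermediate vertex its share of every pair. The edge list for the second toy network (A-B, B-C, B-D, C-E, D-E) is an assumption inferred from the betweenness values quoted on the slide, since the drawing itself is not preserved here.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Return (distance, number of shortest paths) from source to each reachable node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:            # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:   # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness(edges):
    """Non-normalized geodesic betweenness for an undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    info = {v: bfs_counts(adj, v) for v in adj}
    score = {v: 0.0 for v in adj}
    for j, k in combinations(sorted(adj), 2):
        dist_j, sigma_j = info[j]
        if k not in dist_j:
            continue                     # j and k are not connected
        for i in adj:
            if i in (j, k) or i not in dist_j:
                continue
            dist_i, sigma_i = info[i]
            # i lies on a j-k geodesic iff d(j,i) + d(i,k) == d(j,k)
            if k in dist_i and dist_j[i] + dist_i[k] == dist_j[k]:
                score[i] += sigma_j[i] * sigma_i[k] / sigma_j[k]
    return score


line = betweenness([("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")])
# Assumed edge list for the network where C and D share credit:
diamond = betweenness([("A", "B"), ("B", "C"), ("B", "D"), ("C", "E"), ("D", "E")])
print(line)
print(diamond)
```

In the assumed second network, B collects full credit for the pairs (A,C), (A,D) and (A,E), plus half of the pair (C,D), which can also be routed through E.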

22

Page 23: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Alternative betweenness computations

Slight variations in geodesic path computations
o inclusion of self in the computations

Flow betweenness
o Based on the idea of maximum flow
o edge-independent path selection affects the results
o May not include geodesic paths

Random-walk betweenness
o Based on the idea of random walks
o Usually yields ranking similar to geodesic betweenness

Many other alternative definitions exist based on diffusion, transmission or flow along network edges

23

Page 24: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Extending betweenness centrality to directed networks

We now consider the fraction of all directed paths between any two vertices that pass through a node

Only modification: when normalizing, we have (N-1)*(N-2) instead of (N-1)*(N-2)/2, because we have twice as many ordered pairs as unordered pairs

$$C_B(i) = \sum_{j,k} \frac{g_{jk}(i)}{g_{jk}}$$

(betweenness of vertex i = paths between j and k that pass through i, divided by all paths between j and k)

$$C_B'(i) = \frac{C_B(i)}{(N-1)(N-2)}$$

24

Page 25: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Directed geodesics

A node does not necessarily lie on a geodesic from j to k if it lies on a geodesic from k to j

k

j

25

Page 26: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Outline

Degree centrality Centralization

Betweenness centrality Closeness centrality

Eigenvector centrality Bonacich power centrality

Katz centrality PageRank Hubs and Authorities

26

Page 27: Lecture 5: Network centrality Slides are modified from Lada Adamic.

closeness: another centrality measure

What if it’s not so important to have many direct friends?

Or be “between” others

But one still wants to be in the “middle” of things, not too far from the center

27

Page 28: Lecture 5: Network centrality Slides are modified from Lada Adamic.

closeness centrality: definition

Closeness is based on the length of the average shortest path between a vertex and all vertices in the graph; it depends on inverse distance to other vertices.

Closeness Centrality:

$$C_c(i) = \left[ \sum_{j=1}^{N} d(i,j) \right]^{-1}$$

Normalized Closeness Centrality:

$$C_C'(i) = C_C(i) \cdot (N-1)$$

Page 29: Lecture 5: Network centrality Slides are modified from Lada Adamic.

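closeness centrality: toy example

A B C D E

$$C_c'(A) = \left[ \frac{\sum_{j=1}^{N} d(A,j)}{N-1} \right]^{-1} = \left[ \frac{1+2+3+4}{4} \right]^{-1} = \left[ \frac{10}{4} \right]^{-1} = 0.4$$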
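
The toy calculation can be reproduced with a short breadth-first search. This is a sketch that assumes a connected, undirected graph; the function name and adjacency-list layout are illustrative choices.

```python
from collections import deque


def normalized_closeness(adj, source):
    """(N - 1) divided by the sum of shortest-path distances from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    # Assumes every node is reachable from source (connected graph).
    return (len(adj) - 1) / sum(dist.values())


# The line network A - B - C - D - E from the toy example.
line = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
closeness_a = normalized_closeness(line, "A")
closeness_c = normalized_closeness(line, "C")
print(closeness_a, closeness_c)
```

The end node A gets 4/10 = 0.4, while the middle node C is closer to everyone (distances 2, 1, 1, 2) and gets 4/6.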

29

Page 30: Lecture 5: Network centrality Slides are modified from Lada Adamic.

closeness centrality: more toy examples

30

Page 31: Lecture 5: Network centrality Slides are modified from Lada Adamic.

degree (number of connections) denoted by size

closeness (length of shortest path to all others) denoted by color

how closely do degree and betweenness correspond to closeness?

31

Page 32: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Closeness centrality

Values tend to span a rather small dynamic range
o typical distance increases logarithmically with network size

In a typical network the closeness centrality C might span a factor of five or less
o It is difficult to distinguish between central and less central vertices
o a small change in network might considerably affect the centrality order

Alternative computations exist but they have their own problems

32

Page 33: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Influence range

The influence range of i is the set of vertices who are reachable from the node i

33

Page 34: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Extensions of undirected closeness centrality

closeness centrality usually implies
o all paths should lead to you, or
o paths should lead from you to everywhere else

usually consider only vertices from which the node i in question can be reached

34

Page 35: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Outline

Degree centrality Centralization

Betweenness centrality Closeness centrality

Eigenvector centrality Bonacich power centrality

Katz centrality PageRank Hubs and Authorities

Applications to Information Retrieval LexRank

35

Page 36: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Eigenvalues and eigenvectors have their origins in physics, in particular in problems where motion is involved, although their uses extend from solutions to stress and strain problems to differential equations and quantum mechanics.

Eigenvectors are vectors that point in directions where there is no rotation. Eigenvalues are the change in length of the eigenvector from the original length.

The basic equation in eigenvalue problems is:

$$A x = \lambda x$$

Eigenvalues and eigenvectors

Slides from Fred K. Duennebier

Page 37: Lecture 5: Network centrality Slides are modified from Lada Adamic.

In words, this deceptively simple equation says that for the square matrix A, there is a vector x such that the product Ax equals a SCALAR, λ, multiplied by x. The multiplication of vector x by a scalar constant is the same as stretching or shrinking the coordinates by a constant value.

The vector x is called an eigenvector and the scalar λ is called an eigenvalue.

$$A x = \lambda x$$

Eigenvalues and eigenvectors

Page 38: Lecture 5: Network centrality Slides are modified from Lada Adamic.

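$$A x = \lambda x \qquad \text{(E.01)}$$

Do all matrices have real eigenvalues?

No. The matrix must be square, and the eigenvalues are the values of λ for which the determinant of A − λI equals zero. This is easy to show:

$$A x - \lambda x = 0 \;\Rightarrow\; (A - \lambda I)\, x = 0 \qquad \text{(E.02)}$$

For a nonzero x, this can only be true if

$$\det(A - \lambda I) = |A - \lambda I| = 0 \qquad \text{(E.03)}$$

Are eigenvectors unique?

No, if x is an eigenvector, then βx is also an eigenvector and λ is still its eigenvalue:

$$A(\beta x) = \beta A x = \beta \lambda x = \lambda (\beta x) \qquad \text{(E.04)}$$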

Page 39: Lecture 5: Network centrality Slides are modified from Lada Adamic.

How do you calculate eigenvectors and eigenvalues?

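Expand equation (E.03): det(A − λI) = |A − λI| = 0 for a 2x2 matrix:

$$A - \lambda I = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} - \lambda \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} a_{11}-\lambda & a_{12} \\ a_{21} & a_{22}-\lambda \end{bmatrix}$$

$$\det(A - \lambda I) = (a_{11}-\lambda)(a_{22}-\lambda) - a_{12}a_{21} = 0$$

$$0 = a_{11}a_{22} - a_{12}a_{21} - (a_{11}+a_{22})\lambda + \lambda^2 \qquad \text{(E.05)}$$

For a 2-dimensional problem such as this, the equation above is a simple quadratic equation with two solutions for λ. In fact, there is generally one eigenvalue for each dimension, but some may be zero, and some complex.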

Page 40: Lecture 5: Network centrality Slides are modified from Lada Adamic.

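The solution to E.05 is:

$$0 = a_{11}a_{22} - a_{12}a_{21} - (a_{11}+a_{22})\lambda + \lambda^2 \qquad \text{(E.06)}$$

$$\lambda = \frac{(a_{11}+a_{22}) \pm \sqrt{(a_{11}+a_{22})^2 - 4\,(a_{11}a_{22} - a_{12}a_{21})}}{2} \qquad \text{(E.07)}$$

This “characteristic equation” does not involve x, and the resulting values of λ can be used to solve for x.

Consider the following example:

$$A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}$$

Eqn. E.07 is not needed here because a11a22 − a12a21 = 0, so we use E.06 directly: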

Page 41: Lecture 5: Network centrality Slides are modified from Lada Adamic.

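$$0 = a_{11}a_{22} - a_{12}a_{21} - (a_{11}+a_{22})\lambda + \lambda^2$$

$$0 = 1 \cdot 4 - 2 \cdot 2 - (1+4)\lambda + \lambda^2$$

$$(1+4)\lambda = \lambda^2$$

We see that one solution to this equation is λ = 0, and dividing both sides of the above equation by λ yields λ = 5.

Thus we have our two eigenvalues, and the eigenvectors for the first eigenvalue, λ = 0, are:

$$A x = \lambda x, \qquad (A - \lambda I)\, x = 0$$

$$\left( \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix} - \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \right) \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1x + 2y \\ 2x + 4y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

These equations are multiples of x = −2y, so the smallest whole number values that fit are x = 2, y = −1.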

Page 42: Lecture 5: Network centrality Slides are modified from Lada Adamic.

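For the other eigenvalue, λ = 5:

$$\left( \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix} - \begin{bmatrix} 5 & 0 \\ 0 & 5 \end{bmatrix} \right) \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} -4 & 2 \\ 2 & -1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} -4x + 2y \\ 2x - 1y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

−4x + 2y = 0, and 2x − y = 0, so x = 1, y = 2.

This example is rather special; A⁻¹ does not exist, the two rows of A are dependent and thus one of the eigenvalues is zero. (Zero is a legitimate eigenvalue!)

EXAMPLE: A more common case is A = [1.05 .05 ; .05 1] used in the strain exercise. Find the eigenvectors and eigenvalues for this A, and then calculate [V,D] = eig(A).

The procedure is:

1) Compute the determinant of A − λI

2) Find the roots of the polynomial given by |A − λI| = 0

3) Solve the system of equations (A − λI)x = 0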
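
For a 2x2 matrix the procedure above reduces to the quadratic formula. The sketch below (helper names are illustrative, and complex roots are deliberately not handled) applies it to the worked example and to the strain-exercise matrix, then checks the two eigenvectors found by hand.

```python
import math


def eigenvalues_2x2(a11, a12, a21, a22):
    """Roots of lambda^2 - (a11 + a22) * lambda + (a11*a22 - a12*a21) = 0."""
    trace = a11 + a22
    det = a11 * a22 - a12 * a21
    disc = trace ** 2 - 4 * det
    if disc < 0:
        raise ValueError("eigenvalues are complex for this matrix")
    root = math.sqrt(disc)
    return (trace - root) / 2, (trace + root) / 2


def mat_vec(m, v):
    """Multiply a 2x2 matrix (list of rows) by a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


A = [[1, 2], [2, 4]]
lam_small, lam_large = eigenvalues_2x2(1, 2, 2, 4)
print(lam_small, lam_large)

print(mat_vec(A, [2, -1]))   # A x for the eigenvector found for lambda = 0
print(mat_vec(A, [1, 2]))    # A x for the eigenvector found for lambda = 5

strain = eigenvalues_2x2(1.05, 0.05, 0.05, 1.0)
print(strain)
```

Multiplying A by [2, −1] gives the zero vector and multiplying by [1, 2] gives five times [1, 2], matching the hand calculation. For larger matrices one would normally call a numerical routine such as the `eig` function mentioned above rather than expand the determinant.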

Page 43: Lecture 5: Network centrality Slides are modified from Lada Adamic.

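What good are such things?

Consider the matrix:

$$A = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix}$$

What is A^100?

We can get A^100 by multiplying matrices many many times:

$$A^2 = \begin{bmatrix} .70 & .45 \\ .30 & .55 \end{bmatrix}, \qquad A^3 = \begin{bmatrix} .65 & .525 \\ .35 & .475 \end{bmatrix}, \qquad A^{100} = \begin{bmatrix} .600 & .600 \\ .400 & .400 \end{bmatrix}$$

Or we could find the eigenvalues of A and obtain A^100 very quickly using eigenvalues.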

Page 44: Lecture 5: Network centrality Slides are modified from Lada Adamic.

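For now, just note that there are two eigenvectors for A:

$$x_1 = \begin{bmatrix} .6 \\ .4 \end{bmatrix} \quad \text{and} \quad A x_1 = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix} \begin{bmatrix} .6 \\ .4 \end{bmatrix} = \begin{bmatrix} .6 \\ .4 \end{bmatrix} = x_1 \qquad (\lambda_1 = 1)$$

$$x_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix} \quad \text{and} \quad A x_2 = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} .5 \\ -.5 \end{bmatrix} = 0.5\, x_2 \qquad (\lambda_2 = 0.5)$$

The eigenvectors are x1 = [.6 ; .4] and x2 = [1 ; -1], and

the eigenvalues are λ1 = 1 and λ2 = 0.5.

Note that, if we multiply x1 by A, we get x1.

If we multiply x1 by A again, we STILL get x1.

Thus x1 doesn’t change as we multiply it by A^n.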

Page 45: Lecture 5: Network centrality Slides are modified from Lada Adamic.

What about x2?

When we multiply x2 by A, we get x2/2,

and if we multiply x2 by A^2, we get x2/4.

This number gets very small fast.

Note that when A is squared the eigenvectors stay the same, but the eigenvalues are squared!

Back to our original problem; we note that for A^100,

the eigenvectors will be the same,

the eigenvalues are λ1 = 1 and λ2 = (0.5)^100, which is effectively zero.

Each eigenvector is multiplied by its eigenvalue whenever A is applied.
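
Both routes to A^100 can be compared in a few lines. This sketch uses plain Python lists; the closed form in `power_via_eigen` comes from writing the unit vectors in terms of the two eigenvectors, [1, 0] = x1 + 0.4·x2 and [0, 1] = x1 − 0.6·x2, a decomposition worked out here for illustration rather than taken from the slides.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def power_by_multiplying(a, n):
    """A^n by repeated multiplication, starting from the identity."""
    size = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = matmul(result, a)
    return result


def power_via_eigen(n):
    """A^n for A = [[.8, .3], [.2, .7]] using eigenvalues 1 and 0.5.

    Column 1 is A^n [1, 0] = x1 + 0.4 * 0.5**n * x2,
    column 2 is A^n [0, 1] = x1 - 0.6 * 0.5**n * x2,
    with x1 = [.6, .4] and x2 = [1, -1].
    """
    r = 0.5 ** n
    return [[0.6 + 0.4 * r, 0.6 - 0.6 * r],
            [0.4 - 0.4 * r, 0.4 + 0.6 * r]]


A = [[0.8, 0.3], [0.2, 0.7]]
A100_slow = power_by_multiplying(A, 100)
A100_fast = power_via_eigen(100)
print(A100_slow)
print(A100_fast)
```

The x2 component is multiplied by 0.5 at every step and vanishes, so every column of A^100 is essentially x1 = [.6, .4].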

Page 46: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Outline

Degree centrality Centralization

Betweenness centrality Closeness centrality

Eigenvector centrality Bonacich power centrality

Katz centrality PageRank Hubs and Authorities

Applications to Information Retrieval LexRank

46

Page 47: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Eigenvector Centrality

47

Page 48: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Eigenvector Centrality

48

Page 49: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Eigenvector Centrality

Can be calculated for directed graphs as well
o We need to decide between incoming or outgoing edges

[Figure: directed network with vertices A, B, C, D, E]

o A has no incoming edges, hence a centrality of 0
o B has only an incoming edge from A, hence its centrality is also 0

Only vertices that are in a strongly connected component of two or more vertices or the out-component of such a component have non-zero centrality

Page 50: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Katz centrality

50

Page 51: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Katz Centrality:

The magnitude of β reflects the radius of power
o Small values of β weight local structure
o Larger values weight global structure

If β > 0, ego has higher centrality when tied to people who are central

If β < 0, then ego has higher centrality when tied to people who are not central

With β = 0, you get degree centrality
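
The defining formula is not preserved on this slide. As a hedged sketch, the code below assumes the commonly cited Bonacich form c_i = Σ_j A_ij (α + β c_j), solved by simple fixed-point iteration; the 5-node path graph, the choice α = 1, and the iteration count are all illustrative assumptions. The iteration only converges when |β| is smaller than 1 over the largest eigenvalue of the adjacency matrix.

```python
def bonacich_power(adj, beta, alpha=1.0, iterations=200):
    """Iterate c_i = sum over neighbours j of (alpha + beta * c_j).

    Assumed form of the measure; converges only for small enough |beta|.
    """
    c = {v: 0.0 for v in adj}
    for _ in range(iterations):
        c = {v: sum(alpha + beta * c[w] for w in adj[v]) for v in adj}
    return c


# Hypothetical example: the path A - B - C - D - E.
path = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}

positive = bonacich_power(path, beta=0.25)
negative = bonacich_power(path, beta=-0.25)
zero = bonacich_power(path, beta=0.0)
print(positive)
print(negative)
print(zero)
```

On this assumed path graph, a positive β rewards the middle node C for having well-connected neighbours, whereas a negative β penalizes it for exactly that reason, so C ends up below B and D. With β = 0 the scores reduce to the degrees.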

51

Page 52: Lecture 5: Network centrality Slides are modified from Lada Adamic.

β = .25

Katz Centrality: examples

β = -.25

Why does the middle node have lower centrality than its neighbors when β is negative?

52

Page 53: Lecture 5: Network centrality Slides are modified from Lada Adamic.

PageRank: bringing order to the web

It’s in the links:
o links to URLs can be interpreted as endorsements or recommendations
o the more links a URL receives, the more likely it is to be a good/entertaining/provocative/authoritative/interesting information source
o but not all link sources are created equal
  - a link from a respected information source
  - a link from a page created by a spammer

Many webpages scattered across the web

an important page, e.g. slashdot

if a web page is slashdotted, it gains attention

Page 54: Lecture 5: Network centrality Slides are modified from Lada Adamic.

PageRank

54

Page 55: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Ranking pages by tracking a drunk

A random walker following edges in a network for a very long time will spend a proportion of time at each node which can be used as a measure of importance

Page 56: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Trapping a drunk

Problem with pure random walk metric: Drunk can be “trapped” and end up going in circles

Page 57: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Ingenuity of the PageRank algorithm

Allow drunk to teleport with some probability
o e.g. random websurfer follows links for a while, but with some probability teleports to a “random” page (bookmarked page or uses a search engine to start anew)

Page 58: Lecture 5: Network centrality Slides are modified from Lada Adamic.

PageRank algorithm

where p1,p2,...,pN are the pages under consideration,

M(pi) is the set of pages that link to pi,

L(pj) is the number of outbound links on page pj, and

N is the total number of pages.

d is the damping factor: the probability that the surfer follows a link (d = 0.85 for Google); with probability 1 − d the surfer jumps to a random page.
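
The formula itself is missing from this slide. The sketch below assumes the widely used form PR(p_i) = (1 − d)/N + d · Σ_{p_j ∈ M(p_i)} PR(p_j)/L(p_j), which matches the symbols defined above, and computes it by repeated updating. The four-page link structure and the uniform handling of pages without outbound links are illustrative assumptions.

```python
def pagerank(links, d=0.85, iterations=100):
    """Power iteration for PR(p) = (1 - d)/N + d * sum(PR(q)/L(q) for q linking to p)."""
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - d) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = d * rank[p] / len(targets)   # PR(p) / L(p), damped
                for q in targets:
                    new_rank[q] += share
            else:
                # Assumption: a page with no outbound links spreads its rank evenly.
                for q in pages:
                    new_rank[q] += d * rank[p] / n
        rank = new_rank
    return rank


# Hypothetical web of four pages: page -> pages it links to.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
rank = pagerank(links)
print(rank)
```

Page D has no inbound links, so its score is just the teleportation share (1 − d)/N, while C, which collects links from three pages, ranks highest.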

Page 59: Lecture 5: Network centrality Slides are modified from Lada Adamic.

GUESS PageRank demo

Exercise: PageRank

What happens to the relative PageRank scores of the nodes as you increase the teleportation probability?

Can you construct a network such that a node with low indegree has the highest PageRank?

http://projects.si.umich.edu/netlearn/GUESS/pagerank.html

Page 60: Lecture 5: Network centrality Slides are modified from Lada Adamic.

example: probable location of random walker after 1 step

[Figure: network of 8 nodes (1–8); bar charts of PageRank (0 to 1) per node at t=0 and t=1]

20% teleportation probability

Page 61: Lecture 5: Network centrality Slides are modified from Lada Adamic.

example: location probability after 10 steps

[Figure: network of 8 nodes (1–8); bar charts of PageRank (0 to 1) per node at t=0, t=1, and t=10]

Page 62: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Matrix-based Centrality measures

                        with constant term    without constant term
Divide by out-degree    PageRank              Degree centrality
No division             Katz centrality       Eigenvector centrality

Page 63: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Outline

Degree centrality Centralization

Betweenness centrality Closeness centrality

Eigenvector centrality Bonacich power centrality

Katz centrality PageRank Hubs and Authorities

Applications to Information Retrieval LexRank

63

Page 64: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Hubs and Authorities

In directed networks, vertices that point to important resources should also get a high centrality
o e.g. review articles, web indexes

recursive definition:

o hubs are nodes that link to good authorities

o authorities are nodes that are linked to by good hubs

Page 65: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Hyperlink-Induced Topic Search

HITS algorithm

start with a set of pages matching a query

expand the set by following forward and back links

take transition matrix E, where the (i,j)th entry E_ij = 1/n_i if i links to j, and n_i is the number of links from i

then one can compute the authority scores a, and hub scores h through an iterative approach:

$$a' = E^T h, \qquad h' = E a$$
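
The iteration can be sketched in plain Python as follows. The rescaling of the scores after each round and the three-hub example are assumptions added for illustration; the slide only specifies the two update rules, and this sketch uses the freshly updated authority scores when updating the hubs.

```python
def hits(links, iterations=50):
    """Iterate a' = E^T h and h' = E a, with E_ij = 1/n_i when i links to j."""
    nodes = sorted(set(links) | {q for targets in links.values() for q in targets})
    # Row i of E: each page i links to gets weight 1 / (number of links from i).
    E = {i: {j: 1.0 / len(links[i]) for j in links[i]} for i in nodes if links.get(i)}
    auth = {v: 1.0 for v in nodes}
    hub = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        new_auth = {v: 0.0 for v in nodes}
        for i, row in E.items():
            for j, weight in row.items():
                new_auth[j] += weight * hub[i]        # a' = E^T h
        new_hub = {v: 0.0 for v in nodes}
        for i, row in E.items():
            for j, weight in row.items():
                new_hub[i] += weight * new_auth[j]    # h' = E a
        # Assumption: rescale so scores sum to 1 and stay bounded.
        auth_total = sum(new_auth.values()) or 1.0
        hub_total = sum(new_hub.values()) or 1.0
        auth = {v: s / auth_total for v, s in new_auth.items()}
        hub = {v: s / hub_total for v, s in new_hub.items()}
    return auth, hub


# Hypothetical example: three hub pages pointing at three resources.
links = {"h1": ["a1", "a2"], "h2": ["a1"], "h3": ["a1", "a2", "a3"]}
auth, hub = hits(links)
print(auth)
print(hub)
```

Here a1 is linked to by every hub and receives the highest authority score; pages that link to nothing get a hub score of zero, and pages that nobody links to get an authority score of zero.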

Page 66: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Outline

Degree centrality Centralization

Betweenness centrality Closeness centrality

Eigenvector centrality Bonacich power centrality

Katz centrality PageRank Hubs and Authorities

Applications to Information Retrieval LexRank

66

Page 67: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Applications to Information Retrieval

Can we use the notion of centrality to pick the best summary sentence?

Can we use the subgraph of query results to infer something about the query?

Can we use a graph of word translations to expand dictionaries? disambiguate word meanings?

How might one use the HITS algorithm for document summarization? Consider a bipartite graph of sentences and words

Page 68: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Centrality in summarization

Extractive summarization
o pick k sentences that are most representative of a collection of n sentences

Motivation: capture the most central words in a document or cluster

Centroid score [Radev & al. 2000, 2004a]

Alternative methods for computing centrality?

Page 69: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Sample multidocument cluster

1 (d1s1) Iraqi Vice President Taha Yassin Ramadan announced today, Sunday, that Iraq refuses to back down from its decision to stop cooperating with disarmament inspectors before its demands are met.

2 (d2s1) Iraqi Vice president Taha Yassin Ramadan announced today, Thursday, that Iraq rejects cooperating with the United Nations except on the issue of lifting the blockade imposed upon it since the year 1990.

3 (d2s2) Ramadan told reporters in Baghdad that "Iraq cannot deal positively with whoever represents the Security Council unless there was a clear stance on the issue of lifting the blockade off of it.

4 (d2s3) Baghdad had decided late last October to completely cease cooperating with the inspectors of the United Nations Special Commission (UNSCOM), in charge of disarming Iraq's weapons, and whose work became very limited since the fifth of August, and announced it will not resume its cooperation with the Commission even if it were subjected to a military operation.

5 (d3s1) The Russian Foreign Minister, Igor Ivanov, warned today, Wednesday against using force against Iraq, which will destroy, according to him, seven years of difficult diplomatic work and will complicate the regional situation in the area.

6 (d3s2) Ivanov contended that carrying out air strikes against Iraq, who refuses to cooperate with the United Nations inspectors, ``will end the tremendous work achieved by the international group during the past seven years and will complicate the situation in the region.''

7 (d3s3) Nevertheless, Ivanov stressed that Baghdad must resume working with the Special Commission in charge of disarming the Iraqi weapons of mass destruction (UNSCOM).

8 (d4s1) The Special Representative of the United Nations Secretary-General in Baghdad, Prakash Shah, announced today, Wednesday, after meeting with the Iraqi Deputy Prime Minister Tariq Aziz, that Iraq refuses to back down from its decision to cut off cooperation with the disarmament inspectors.

9 (d5s1) British Prime Minister Tony Blair said today, Sunday, that the crisis between the international community and Iraq ``did not end'' and that Britain is still ``ready, prepared, and able to strike Iraq.''

10 (d5s2) In a gathering with the press held at the Prime Minister's office, Blair contended that the crisis with Iraq ``will not end until Iraq has absolutely and unconditionally respected its commitments'' towards the United Nations.

11 (d5s3) A spokesman for Tony Blair had indicated that the British Prime Minister gave permission to British Air Force Tornado planes stationed in Kuwait to join the aerial bombardment against Iraq.

(DUC cluster d1003t)

Page 70: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Cosine between sentences

Let s1 and s2 be two sentences.

Let x and y be their representations in an n-dimensional vector space

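The cosine between them is then computed based on the inner product of the two:

$$\cos(x, y) = \frac{\sum_{i=1}^{n} x_i y_i}{\|x\| \, \|y\|}$$

The cosine ranges from 0 to 1 (for vectors with non-negative components, such as term weights).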
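
A minimal sketch of the computation, assuming each sentence is represented by raw word counts (a simplification chosen here; the slides do not say which term weighting is used, and the two short sentences are made up for illustration):

```python
import math
from collections import Counter


def bag_of_words(sentence):
    """Represent a sentence as word counts (assumed representation)."""
    return Counter(sentence.lower().split())


def cosine(x, y):
    """Inner product of x and y divided by the product of their lengths."""
    dot = sum(x[term] * y[term] for term in x if term in y)
    norm_x = math.sqrt(sum(value * value for value in x.values()))
    norm_y = math.sqrt(sum(value * value for value in y.values()))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)


s1 = bag_of_words("Iraq refuses cooperation")
s2 = bag_of_words("Iraq refuses inspectors")
s3 = bag_of_words("Blair warns Britain")
print(cosine(s1, s2), cosine(s1, s3), cosine(s1, s1))
```

Two sentences sharing two of their three words score 2/3, sentences with no words in common score 0, and a sentence compared with itself scores 1.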

Page 71: Lecture 5: Network centrality Slides are modified from Lada Adamic.

LexRank (Cosine centrality)

1 2 3 4 5 6 7 8 9 10 11

1 1.00 0.45 0.02 0.17 0.03 0.22 0.03 0.28 0.06 0.06 0.00

2 0.45 1.00 0.16 0.27 0.03 0.19 0.03 0.21 0.03 0.15 0.00

3 0.02 0.16 1.00 0.03 0.00 0.01 0.03 0.04 0.00 0.01 0.00

4 0.17 0.27 0.03 1.00 0.01 0.16 0.28 0.17 0.00 0.09 0.01

5 0.03 0.03 0.00 0.01 1.00 0.29 0.05 0.15 0.20 0.04 0.18

6 0.22 0.19 0.01 0.16 0.29 1.00 0.05 0.29 0.04 0.20 0.03

7 0.03 0.03 0.03 0.28 0.05 0.05 1.00 0.06 0.00 0.00 0.01

8 0.28 0.21 0.04 0.17 0.15 0.29 0.06 1.00 0.25 0.20 0.17

9 0.06 0.03 0.00 0.00 0.20 0.04 0.00 0.25 1.00 0.26 0.38

10 0.06 0.15 0.01 0.09 0.04 0.20 0.00 0.20 0.26 1.00 0.12

11 0.00 0.00 0.00 0.01 0.18 0.03 0.01 0.17 0.38 0.12 1.00

Page 72: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Lexical centrality (t=0.3)

[Figure: graph of the sentences d1s1, d2s1, d2s2, d2s3, d3s1, d3s2, d3s3, d4s1, d5s1, d5s2, d5s3]

Page 73: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Lexical centrality (t=0.2)

[Figure: graph of the sentences d1s1, d2s1, d2s2, d2s3, d3s1, d3s2, d3s3, d4s1, d5s1, d5s2, d5s3]

Page 74: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Lexical centrality (t=0.1)

[Figure: graph of the sentences d1s1, d2s1, d2s2, d2s3, d3s1, d3s2, d3s3, d4s1, d5s1, d5s2, d5s3]

Sentences vote for the most central sentence…

d4s1

d3s2

d2s1
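
The thresholded graphs can be rebuilt from the cosine table given earlier. This sketch assumes that two sentences are connected when their cosine is strictly greater than the threshold t (the slides do not say whether the comparison is strict) and uses degree as the vote count.

```python
LABELS = ["d1s1", "d2s1", "d2s2", "d2s3", "d3s1", "d3s2",
          "d3s3", "d4s1", "d5s1", "d5s2", "d5s3"]

# Cosine similarities between sentences 1..11, copied from the LexRank table.
COSINE = [
    [1.00, 0.45, 0.02, 0.17, 0.03, 0.22, 0.03, 0.28, 0.06, 0.06, 0.00],
    [0.45, 1.00, 0.16, 0.27, 0.03, 0.19, 0.03, 0.21, 0.03, 0.15, 0.00],
    [0.02, 0.16, 1.00, 0.03, 0.00, 0.01, 0.03, 0.04, 0.00, 0.01, 0.00],
    [0.17, 0.27, 0.03, 1.00, 0.01, 0.16, 0.28, 0.17, 0.00, 0.09, 0.01],
    [0.03, 0.03, 0.00, 0.01, 1.00, 0.29, 0.05, 0.15, 0.20, 0.04, 0.18],
    [0.22, 0.19, 0.01, 0.16, 0.29, 1.00, 0.05, 0.29, 0.04, 0.20, 0.03],
    [0.03, 0.03, 0.03, 0.28, 0.05, 0.05, 1.00, 0.06, 0.00, 0.00, 0.01],
    [0.28, 0.21, 0.04, 0.17, 0.15, 0.29, 0.06, 1.00, 0.25, 0.20, 0.17],
    [0.06, 0.03, 0.00, 0.00, 0.20, 0.04, 0.00, 0.25, 1.00, 0.26, 0.38],
    [0.06, 0.15, 0.01, 0.09, 0.04, 0.20, 0.00, 0.20, 0.26, 1.00, 0.12],
    [0.00, 0.00, 0.00, 0.01, 0.18, 0.03, 0.01, 0.17, 0.38, 0.12, 1.00],
]


def degree_at_threshold(matrix, labels, t):
    """Number of other sentences whose cosine with each sentence exceeds t."""
    degree = {}
    for i, row in enumerate(matrix):
        degree[labels[i]] = sum(
            1 for j, value in enumerate(row) if j != i and value > t
        )
    return degree


for t in (0.3, 0.2, 0.1):
    degree = degree_at_threshold(COSINE, LABELS, t)
    most_central = max(degree, key=degree.get)
    print(t, most_central, degree)
```

At t = 0.1 sentence d4s1 is connected to eight of the other ten sentences, more than any other sentence. At t = 0.3 only two pairs remain connected, so the graph says little about centrality.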

Page 75: Lecture 5: Network centrality Slides are modified from Lada Adamic.

LexRank

$$p(T_i) = \frac{d}{N} + (1-d) \left[ \frac{p(T_1)\,E(T_1,T_i)}{c(T_1)} + \dots + \frac{p(T_n)\,E(T_n,T_i)}{c(T_n)} \right]$$

T1…Tn are pages that link to A,

c(Ti) is the outdegree of page Ti, and

N is the total number of pages.

d is the “damping factor”, or the probability that we “jump” to a far-away node during the random walk. It accounts for disconnected components or periodic graphs.

When d = 1, we have a strict uniform distribution. When d = 0, the method is not guaranteed to converge to a unique solution.

Typical value for d is in the range [0.1, 0.2] (Brin and Page, 1998).

Güneş Erkan and Dragomir R. Radev, LexRank: Graph-based Lexical Centrality as Salience in Text Summarization

Page 76: Lecture 5: Network centrality Slides are modified from Lada Adamic.

lab: Lexrank demo

http://tangra.si.umich.edu/demos/lexrank/

how does the summary change as you:

o increase the cosine similarity threshold for an edge (how similar two sentences have to be)?

o increase the salience threshold (minimum degree of a node)?

Page 77: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Menczer, Filippo (2004) The evolution of document networks.

Content similarity distributions for web pages (DMOZ) and scientific articles (PNAS)

Page 78: Lecture 5: Network centrality Slides are modified from Lada Adamic.

what is that good for?

How could you take advantage of the fact that pages that are similar in content tend to link to one another?

Page 79: Lecture 5: Network centrality Slides are modified from Lada Adamic.

What can networks of query results tell us about the query?

Jure Leskovec, Susan Dumais: Web Projections: Learning from Contextual Subgraphs of the Web

If query results are highly interlinked, is this a narrow or broad query?

How could you use query connection graphs to predict whether a query will be reformulated?

Page 80: Lecture 5: Network centrality Slides are modified from Lada Adamic.

How can bipartite citation graphs be used to find related articles?

co-citation: both A and B are cited by many other papers (C, D, E …)

[Figure: papers C, D, E each citing both A and B]

bibliographic coupling: both A and B cite many of the same articles (F, G, H …)

[Figure: A and B each citing F, G, H]
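
Both measures are simple counts over a citation list. In this sketch the dictionary layout (paper → papers it cites) and the function names are illustrative assumptions; the paper labels follow the slide.

```python
def cocitation(cites, a, b):
    """Number of papers that cite both a and b."""
    return sum(1 for refs in cites.values() if a in refs and b in refs)


def bibliographic_coupling(cites, a, b):
    """Number of articles cited by both a and b."""
    return len(set(cites.get(a, ())) & set(cites.get(b, ())))


# paper -> list of papers it cites
cites = {
    "C": ["A", "B"],
    "D": ["A", "B"],
    "E": ["A", "B"],
    "A": ["F", "G", "H"],
    "B": ["F", "G", "H"],
}
print(cocitation(cites, "A", "B"))              # C, D and E cite both
print(bibliographic_coupling(cites, "A", "B"))  # both cite F, G and H
```

A high count on either measure suggests that A and B are related, even when neither cites the other directly.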

Page 81: Lecture 5: Network centrality Slides are modified from Lada Adamic.

which of these pairs is more proximate according to cycle free effective conductance: the probability that you reach the other node before cycling back on yourself, while doing a random walk….

Page 82: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Proximity as cycle free effective conductance

Measuring and Extracting Proximity in Networks by Yehuda Koren, Stephen C. North, Chris Volinsky, KDD 2006

demo: http://public.research.att.com/~volinsky/cgi-bin/prox/prox.pl

Page 83: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Source: undetermined

Using network algorithms (specifically proximity) to improve movie recommendations can pay off

Page 84: Lecture 5: Network centrality Slides are modified from Lada Adamic.

final IR application: machine translation

not all pairwise translations are available
o e.g. between rare languages

in some applications, e.g. image search, a word may have multiple meanings
o “spring” is an example in English

But in other languages, the word may be unambiguous.

automated translation could be the key

Page 85: Lecture 5: Network centrality Slides are modified from Lada Adamic.

final IR application: machine translation

[Figure: translation graph linking words across languages, with numeric labels 1–4: spring (English), printemps (French), primavera (Spanish), ربيع (Arabic), koanga (Maori), udaherri (Basque), vzmet (Slovenian), пружина (Russian), ressort (French), veer (Dutch), рысора (Belarusian), …]

if we combine all known word pairs, can we construct additional dictionaries between rare languages?

source: Reiter et al., ‘Lexical Translation with Application to Image Search on the Web ’

Page 86: Lecture 5: Network centrality Slides are modified from Lada Adamic.

Automatic translation & network structure

Two words more likely to have same meaning if there are multiple indirect paths of length 2 through other languages

[Figure: translation graph with numeric labels 1–3: spring (English), printemps (French), primavera (Spanish), ربيع (Arabic), koanga (Maori), udaherri (Basque), пружина (Russian), …]

Page 87: Lecture 5: Network centrality Slides are modified from Lada Adamic.

summary

the web can be studied as a network

this is useful for retrieving relevant content

network concepts can be used in other IR tasks
o summarization
o query prediction
o machine translation