Random Walks on Graphs - Part I

Pawan Goyal

CSE, IITKGP

September 09, 2015

Pawan Goyal (IIT Kharagpur) Random Walks for Social Networks September 09, 2015

Social Networks: underlying data

The underlying data is naturally a graph

Papers linked by citation

Authors linked by co-authorship

Bipartite graph of customers and products

Web-graph: who links to whom

Friendship networks: who knows whom

Follower-followee network

Social Networks: what we are looking for

Rank nodes for a particular query

Top k websites for a query

Top k book recommendations from Amazon

Top k Friend suggestions to X when he joins Facebook

Search vs. Recommendation

The problem definition is slightly different if the query is also one of the nodes of the network.

Why Random Walks?

A wide variety of interesting real-world applications can be framed as ranking entities in a graph

A graph-theoretic measure can rank nodes as well as capture similarity: for example, two entities are similar if there are many short paths between them.

Random walks have proven to be a simple but powerful mathematical tool for extracting this information.

What is a Random Walk?

Given a graph and a starting point (node), we select a neighbor of it at random, and move to this neighbor

Then we select a neighbor of this node and move to it, and so on

The (random) sequence of nodes selected this way is a random walk on the graph

In the steady state, each node has a long-term visit rate.
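
The walk described above can be simulated directly. The following is a minimal sketch, assuming a small hypothetical undirected graph stored as an adjacency list (the node names and the helper `random_walk` are illustrative, not from the slides). It records the walk and estimates each node's long-term visit rate as the fraction of time spent there.

```python
import random
from collections import Counter

# Hypothetical undirected toy graph as an adjacency list (not from the slides).
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}

def random_walk(graph, start, steps, rng):
    """Return the list of nodes visited by a simple random walk."""
    walk = [start]
    node = start
    for _ in range(steps):
        node = rng.choice(graph[node])  # pick a neighbor uniformly at random
        walk.append(node)
    return walk

rng = random.Random(0)  # fixed seed so the run is reproducible
walk = random_walk(graph, "a", 100_000, rng)

# Empirical long-term visit rate: fraction of the walk spent at each node.
counts = Counter(walk)
visit_rate = {node: counts[node] / len(walk) for node in graph}
```

On a connected undirected graph such as this one, the estimated visit rates should come out roughly proportional to node degree, so node `c` is visited most often and node `d` least.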

Adjacency and Transition Matrix

n×n Adjacency matrix A

A(i, j): weight on edge from node i to node j

If the graph is undirected, A(i, j) = A(j, i), i.e. A is symmetric

n×n Transition matrix P

P is row stochastic

P(i, j) = probability of stepping to node j from node i = $\frac{A(i,j)}{\sum_{k} A(i,k)}$
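
As a small illustration of the definition above, the sketch below row-normalizes an adjacency matrix to obtain the transition matrix. The 4-node graph and the function name `transition_matrix` are assumptions made for this example only.

```python
def transition_matrix(A):
    """Row-normalize a weighted adjacency matrix A (a list of lists)."""
    P = []
    for row in A:
        total = sum(row)
        if total == 0:
            # A node without outgoing edges has no well-defined row in P.
            raise ValueError("node with no outgoing edges: row cannot be normalized")
        P.append([w / total for w in row])
    return P

# Hypothetical undirected graph with edges 0-1, 0-2, 1-2, 2-3.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
P = transition_matrix(A)
```

Each row of `P` sums to 1 (row stochastic). Note that `A` is symmetric here but `P` is not, because the rows are divided by different degrees.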

Adjacency and Transition Matrix: Example

What is a random walk?

Probability Distributions

$x_t(i)$: probability that the surfer is at node $i$ at time $t$

$x_{t+1}(i) = \sum_j (\text{probability of being at node } j \text{ at time } t) \cdot \Pr(j \to i) = \sum_j x_t(j)\, P(j, i)$

$x_{t+1} = x_t P = x_{t-1} P P = \dots = x_0 P^{t+1}$

What if the surfer keeps walking for a long time?

Stationary Distribution

When the surfer keeps walking for a long time

When the distribution does not change anymore, i.e. $x_{T+1} = x_T$

For well-behaved graphs, this does not depend on the start distribution
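
The update rule can be checked numerically. The sketch below is one possible illustration, assuming the transition matrix of a hypothetical 4-node graph (edges 0-1, 0-2, 1-2, 2-3). It applies the update 200 times from two different start distributions and keeps both results for comparison.

```python
def step(x, P):
    """One step of the walk: x_{t+1}(i) = sum_j x_t(j) * P(j, i)."""
    n = len(P)
    return [sum(x[j] * P[j][i] for j in range(n)) for i in range(n)]

def evolve(x0, P, steps):
    """Apply the one-step update `steps` times, i.e. compute x0 * P^steps."""
    x = list(x0)
    for _ in range(steps):
        x = step(x, P)
    return x

# Transition matrix of a hypothetical graph with edges 0-1, 0-2, 1-2, 2-3.
P = [
    [0.0, 0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [1 / 3, 1 / 3, 0.0, 1 / 3],
    [0.0, 0.0, 1.0, 0.0],
]

from_node_0 = evolve([1.0, 0.0, 0.0, 0.0], P, 200)  # surfer starts at node 0
from_node_3 = evolve([0.0, 0.0, 0.0, 1.0], P, 200)  # surfer starts at node 3
```

For this graph, which contains a triangle and is connected, the two results agree to within floating-point error, consistent with the claim that the start distribution does not matter on well-behaved graphs.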

What is a stationary distribution?

The stationary distribution at a node is related to the amount of time a random walker spends visiting that node

The probability distribution evolves as $x_{t+1} = x_t P$

For the stationary distribution (say $v_0$), we have $v_0 = v_0 P$

So $v_0$ is a left eigenvector of the transition matrix $P$, with eigenvalue 1

Does a stationary distribution always exist? Is it unique?

Yes, if the graph is “well-behaved”
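
One way to make the eigenvector condition concrete is to solve $v_0 = v_0 P$ together with the normalization $\sum_i v_0(i) = 1$ as a linear system. The sketch below does this with exact fractions and plain Gauss-Jordan elimination. The function name and the example matrix are assumptions for illustration, and the approach presumes the chain is irreducible so that the solution is unique.

```python
from fractions import Fraction

def stationary_distribution(P):
    """Solve v = vP with sum(v) = 1 by Gauss-Jordan elimination.

    Assumes the chain is irreducible, so the solution is unique.
    """
    n = len(P)
    # Row i encodes sum_j P[j][i] * v[j] - v[i] = 0 (augmented column last).
    M = [[Fraction(P[j][i]) - (1 if i == j else 0) for j in range(n)] + [Fraction(0)]
         for i in range(n)]
    # One of those equations is redundant; replace it by sum(v) = 1.
    M[n - 1] = [Fraction(1)] * n + [Fraction(1)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [entry / M[col][col] for entry in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

F = Fraction
# Transition matrix of a hypothetical graph with edges 0-1, 0-2, 1-2, 2-3.
P = [
    [F(0), F(1, 2), F(1, 2), F(0)],
    [F(1, 2), F(0), F(1, 2), F(0)],
    [F(1, 3), F(1, 3), F(0), F(1, 3)],
    [F(0), F(0), F(1), F(0)],
]
v0 = stationary_distribution(P)
```

For this undirected example the result is proportional to the node degrees (2, 2, 3, 1), which is a known property of random walks on connected undirected graphs.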

Well behaved graphs

Irreducible

There is a path from every node to every other node.

Well behaved graphs

Aperiodic

The GCD of all cycle lengths is 1. This GCD is also called the period.
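
Both conditions can be tested mechanically on a small graph. The sketch below is a rough illustration, assuming graphs given as adjacency lists (the function names and example graphs are hypothetical). Irreducibility is checked by breadth-first search from every node, and the period is computed with a standard trick: for an irreducible graph, the GCD of all cycle lengths equals the GCD of dist(u) + 1 - dist(v) over all edges u -> v, where dist is the BFS distance from any fixed node.

```python
from collections import deque
from math import gcd

def is_irreducible(adj):
    """True if every node can reach every other node."""
    def reachable(start):
        seen = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        return seen
    return all(len(reachable(s)) == len(adj) for s in adj)

def period(adj):
    """GCD of all cycle lengths, assuming the graph is irreducible."""
    start = next(iter(adj))
    dist = {start: 0}
    queue = deque([start])
    g = 0
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
            # Tree edges contribute 0; every other edge contributes a cycle-length offset.
            g = gcd(g, dist[u] + 1 - dist[v])
    return g

# Hypothetical examples (undirected edges are listed in both directions).
triangle_with_tail = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
dead_end = {0: [1], 1: []}
```

Here `triangle_with_tail` is irreducible with period 1, `square` is irreducible but has period 2 (every cycle has even length), and `dead_end` is not irreducible because node 1 cannot reach node 0.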

Why does the start distribution not matter?

Perron-Frobenius Theorem: Statement

Let $A = (a_{ij})$ be an $n \times n$ positive matrix: $a_{ij} > 0$ for all $1 \le i, j \le n$. Then

There is a positive real number $r$ such that $r$ is an eigenvalue of $A$ and any other eigenvalue is strictly smaller than $r$ in absolute value.

Markov Chain: irreducible and aperiodic

For any matrix $A$ with eigenvalue $\sigma$, $|\sigma| \le \max_i \sum_j |A_{ij}|$.

Since $P$ is row stochastic, the largest eigenvalue of the transition matrix will be equal to 1 and all other eigenvalues will be strictly less than 1 in absolute value

Let the eigenvalues of $P$ be $\{\sigma_i \mid i = 1 : n\}$ in non-increasing order of $|\sigma_i|$

$\sigma_1 = 1 > |\sigma_2| \ge |\sigma_3| \ge \dots \ge |\sigma_n|$

Perron Frobenius Theorem: Implications

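$v_0 = v_0 P$ (unique for a well-behaved graph)

Let $x$ be an arbitrary initial distribution, written in terms of the left eigenvectors $u_i$ of $P$ (with $u_i P = \sigma_i u_i$, assuming these eigenvectors form a basis):

$x = \sum_{i=1}^{n} a_i u_i$

$xP = \sum_{i=1}^{n} a_i (u_i P) = \sum_{i=1}^{n} a_i (\sigma_i u_i)$

Similarly, $xP^k = \sum_{i=1}^{n} a_i (\sigma_i^k u_i)$

$xPPP \dots P = xP^k$ tends to $v_0$ as $k$ goes to infinity.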

Perron Frobenius Theorem: Implications

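$xP^k = \sigma_1^k \left\{ a_1 u_1 + a_2 \left(\frac{\sigma_2}{\sigma_1}\right)^k u_2 + \dots + a_n \left(\frac{\sigma_n}{\sigma_1}\right)^k u_n \right\}$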

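$u_1 = v_0$, thus $xP^k$ approaches $v_0$ as $k$ goes to infinity, with the error decaying exponentially, on the order of $(\sigma_2/\sigma_1)^k$.

Show that $a_1 = 1$

$\mathbf{1}_{n \times 1}$ is the right eigenvector of $P$ with eigenvalue 1, since $P$ is stochastic, i.e. $P \mathbf{1}_{n \times 1} = \mathbf{1}_{n \times 1}$

Hence, $u_i \mathbf{1}_{n \times 1} = 1$ for $i = 1$, 0 otherwise (relation between left and right eigenvectors)

Now, $1 = x \mathbf{1}_{n \times 1} = a_1 u_1 \mathbf{1}_{n \times 1} = a_1$ (Why?)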
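
The exponential convergence can be observed numerically. The sketch below is an illustration under stated assumptions: it uses the transition matrix of a hypothetical 4-node graph (edges 0-1, 0-2, 1-2, 2-3) whose stationary distribution is known, tracks the error between $xP^k$ and $v_0$, and computes the ratio of successive errors.

```python
def step(x, P):
    """One step of the walk: multiply the row vector x by P."""
    n = len(P)
    return [sum(x[j] * P[j][i] for j in range(n)) for i in range(n)]

# Transition matrix of a hypothetical graph with edges 0-1, 0-2, 1-2, 2-3.
P = [
    [0.0, 0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [1 / 3, 1 / 3, 0.0, 1 / 3],
    [0.0, 0.0, 1.0, 0.0],
]
v0 = [0.25, 0.25, 0.375, 0.125]  # satisfies v0 = v0 P for this particular P

x = [1.0, 0.0, 0.0, 0.0]  # arbitrary start: all mass on node 0
errors = []
for k in range(41):
    # Largest entrywise deviation of x P^k from the stationary distribution.
    errors.append(max(abs(a - b) for a, b in zip(x, v0)))
    x = step(x, P)

# Ratio of successive errors; per the expansion above it should settle
# near |sigma_2 / sigma_1| = |sigma_2|.
ratios = [errors[k + 1] / errors[k] for k in range(40)]
```

For this toy matrix the later ratios settle near 0.73, which matches the second-largest eigenvalue modulus obtained by hand from its characteristic polynomial.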

Important Parameters of a random walk

Access time or hitting time ($h_{ij}$)

$h_{ij}$ is the expected number of steps before node $j$ is first visited, starting from node $i$

$h_{ij} = 1 + \sum_k p_{ik} h_{kj}$ if $i \ne j$, 0 otherwise

Commute time ($c_{ij}$)

$c_{ij} = h_{ij} + h_{ji}$

Cover Time Cov(G)

Cov(s,G): expected number of steps it takes a walk that starts at s to visit all vertices

Cov(G): maximum over s of Cov(s,G)

Cov+(G): Cover and return to start
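
For a fixed target node $j$, the hitting-time recurrence is a linear system in the unknowns $h_{ij}$, so it can be solved directly. The sketch below is one way to do that with exact fractions. The helper names and the example graph (edges 0-1, 0-2, 1-2, 2-3) are assumptions for illustration.

```python
from fractions import Fraction

def solve(M):
    """Gauss-Jordan elimination on an augmented matrix of Fractions."""
    n = len(M)
    M = [row[:] for row in M]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [entry / M[col][col] for entry in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

def hitting_times_to(P, target):
    """Return [h_{0,target}, ..., h_{n-1,target}] from the recurrence."""
    n = len(P)
    M = []
    for i in range(n):
        if i == target:
            # h_{target,target} = 0
            M.append([Fraction(int(k == i)) for k in range(n)] + [Fraction(0)])
        else:
            # h_i - sum_k P[i][k] * h_k = 1
            M.append([Fraction(int(k == i)) - Fraction(P[i][k]) for k in range(n)]
                     + [Fraction(1)])
    return solve(M)

def commute_time(P, i, j):
    """c_ij = h_ij + h_ji."""
    return hitting_times_to(P, j)[i] + hitting_times_to(P, i)[j]

F = Fraction
P = [
    [F(0), F(1, 2), F(1, 2), F(0)],
    [F(1, 2), F(0), F(1, 2), F(0)],
    [F(1, 3), F(1, 3), F(0), F(1, 3)],
    [F(0), F(0), F(1), F(0)],
]
h_to_3 = hitting_times_to(P, 3)
c_23 = commute_time(P, 2, 3)
```

Hitting times are not symmetric in general: in this example, reaching the leaf node 3 from node 2 takes 7 steps in expectation, while the way back takes exactly 1, giving a commute time of 8.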

Mixing Rate

How fast the random walk converges to its limiting distribution

For some graphs the walk mixes very quickly, in $O(\log n)$ steps

Mixing rate depends on the spectral gap: $1 - \sigma_2$, where $\sigma_2$ is the second-highest eigenvalue

The smaller the value of $\sigma_2$, the larger the spectral gap and the faster the mixing
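
A rough way to see the effect of the spectral gap is to count how many steps a walk needs before its distribution is close to the limiting one. The sketch below compares two hypothetical 9-node graphs, a complete graph and a cycle. The closeness measure (total variation distance), the threshold `eps`, and the function names are choices made for this illustration.

```python
def step(x, P):
    """One step of the walk: multiply the row vector x by P."""
    n = len(P)
    return [sum(x[j] * P[j][i] for j in range(n)) for i in range(n)]

def steps_to_mix(P, stationary, start=0, eps=1e-3, max_steps=100_000):
    """Steps until the total variation distance to `stationary` drops below eps."""
    x = [0.0] * len(P)
    x[start] = 1.0
    for t in range(max_steps + 1):
        tv = 0.5 * sum(abs(a - b) for a, b in zip(x, stationary))
        if tv < eps:
            return t
        x = step(x, P)
    raise RuntimeError("walk did not mix within max_steps")

n = 9  # odd, so the cycle below is aperiodic
# Complete graph: from any node, move to each of the other n - 1 nodes equally.
complete = [[0.0 if i == j else 1.0 / (n - 1) for j in range(n)] for i in range(n)]
# Cycle: move to the left or right neighbor with probability 1/2 each.
cycle = [[0.5 if (i - j) % n in (1, n - 1) else 0.0 for j in range(n)] for i in range(n)]
uniform = [1.0 / n] * n  # both graphs are regular, so the limit is uniform

steps_complete = steps_to_mix(complete, uniform)
steps_cycle = steps_to_mix(cycle, uniform)
```

The complete graph has a large spectral gap (its non-trivial eigenvalues are all $-1/(n-1)$) and mixes within a handful of steps, while the odd cycle, whose second-largest eigenvalue modulus is $\cos(\pi/n)$, needs far more.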

PageRank (Page and Brin, 1998)

Basic Intuition

A webpage is important if other important pages point to it

PageRank is a “vote” by all other webpages about the importance of a page.

$v(i) = \sum_{j \to i} \frac{v(j)}{\deg_{\text{out}}(j)}$

v is the stationary distribution of the Markov chain

“The $25,000,000,000 Eigenvector: The Linear Algebra Behind Google”

PageRank Example (Source: Wikipedia)

Irreducibility and Aperiodicity

Dead ends

The web is full of dead ends.

Random walk can get stuck in dead ends.

If there are dead ends, long-term visit rate is not well defined.

Irreducibility and Aperiodicity

How to guarantee this for a web graph? – Teleporting

At any time-step the random surfer

jumps (teleports) to any node, chosen uniformly at random, with probability $c$

jumps to its direct neighbors with total probability $1 - c$

$\tilde{P} = (1 - c)P + cU$

$U_{ij} = \frac{1}{n}$ for all $i, j$
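
The teleporting construction is a one-line matrix operation once $P$ is available. The sketch below builds it for a small hypothetical web graph. Two things here are assumptions not stated in the slides: dead-end pages are given a uniform row before teleporting is added (one common convention), and the teleport probability is set to 0.15 purely for illustration.

```python
def row_normalize(A):
    """Row-stochastic P from adjacency matrix A.

    Dead-end rows (no out-links) are replaced by a uniform row; this is an
    assumed convention, not something the slides specify.
    """
    n = len(A)
    P = []
    for row in A:
        total = sum(row)
        if total == 0:
            P.append([1.0 / n] * n)
        else:
            P.append([w / total for w in row])
    return P

def add_teleport(P, c):
    """Return (1 - c) * P + c * U, where U[i][j] = 1/n for all i, j."""
    n = len(P)
    return [[(1 - c) * P[i][j] + c / n for j in range(n)] for i in range(n)]

# Hypothetical web graph: 0 -> 1, 1 -> 2, 2 -> 0, 2 -> 3; page 3 is a dead end.
A = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]
c = 0.15  # illustrative teleport probability
P_tilde = add_teleport(row_normalize(A), c)
```

Every entry of `P_tilde` is strictly positive, so the surfer can move from any page to any other page in a single step, which makes the chain irreducible and aperiodic.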

Computing PageRank: The Power Method

Start with any distribution $x_0$, e.g. uniform distribution

Algorithm: multiply $x_0$ by increasing powers of $P$ until convergence

After one step, $x_1 = x_0 P$, after $k$ steps $x_k = x_0 P^k$

Regardless of where we start, we eventually reach the steady state $v_0$
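
The power method above can be sketched in a few lines. The example below is a minimal illustration, not a production implementation: the web graph, the teleport probability 0.15, the convergence tolerance, and the function name `power_method` are all assumptions made for this example.

```python
def power_method(P, x0=None, tol=1e-12, max_iter=10_000):
    """Repeatedly apply x <- xP until the L1 change falls below tol."""
    n = len(P)
    x = list(x0) if x0 is not None else [1.0 / n] * n  # default: uniform start
    for iteration in range(1, max_iter + 1):
        nxt = [sum(x[j] * P[j][i] for j in range(n)) for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt, iteration
        x = nxt
    raise RuntimeError("power method did not converge")

# Hypothetical web graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0, 3 -> 2.
P = [
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
c = 0.15  # illustrative teleport probability
n = len(P)
# Transition matrix with teleporting: (1 - c) * P + c * U, with U[i][j] = 1/n.
P_tilde = [[(1 - c) * P[i][j] + c / n for j in range(n)] for i in range(n)]

pagerank, iterations = power_method(P_tilde)
pagerank_other_start, _ = power_method(P_tilde, x0=[0.0, 0.0, 0.0, 1.0])
```

In this example page 2 receives links from three pages and ends up with the highest score, while page 3, which nobody links to, is reached only by teleporting and gets the lowest score. Starting from a different distribution gives the same result up to the tolerance.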

PageRank: Example

(Example figures from “Introduction to Information Retrieval” slides)

Using PageRank for Web Search

Preprocessing

Given graph of links, build matrix P

Apply teleportation

From modified matrix, compute v

$v_i$ is the PageRank of page i.

Query processing

Retrieve pages satisfying the query

Rank them by their PageRank

Return reranked list to the user
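
The query-processing half of this pipeline might look like the sketch below. It assumes the PageRank scores were already computed offline; the toy corpus, the scores, the function name `search`, and the "all query terms must appear" matching rule are hypothetical and only stand in for a real retrieval engine.

```python
def search(query, documents, pagerank):
    """Return ids of pages containing every query term, highest PageRank first."""
    terms = query.lower().split()
    hits = [
        page for page, text in documents.items()
        if all(t in text.lower().split() for t in terms)
    ]
    return sorted(hits, key=lambda page: pagerank[page], reverse=True)


# Hypothetical toy corpus and precomputed PageRank scores (not from the slides).
documents = {
    "a": "random walks on graphs",
    "b": "graphs and networks",
    "c": "cooking with random spices",
}
pagerank = {"a": 0.2, "b": 0.5, "c": 0.3}

print(search("graphs", documents, pagerank))   # ['b', 'a']
print(search("random", documents, pagerank))   # ['c', 'a']
```

Note that the ranking here is query-independent: the query only decides which pages are retrieved, and PageRank alone decides their order.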

Personalized PageRank

We are looking for the vector v such that

$v = (1 - c)\,vP + c\,r$

r is a distribution over web pages

If r is the uniform distribution, we get PageRank

What happens if r is non-uniform? → Personalization

Personalized PageRank

The only difference is that we use a non-uniform teleportation distribution, i.e. at any time step, the surfer teleports to a chosen set of webpages.

In other words, we are looking for the vector v such that

$v = (1 - c)\,vP + c\,r$

r is a non-uniform preference vector specific to a user.

v gives “personalized views” of the web.
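
As a rough illustration, the fixed point of v = (1 − c) vP + c r can be approximated by iterating the equation directly. The sketch below assumes a row-stochastic P; the function name, c = 0.15, the 4-page graph and the preference vectors are hypothetical.

```python
def personalized_pagerank(P, r, c=0.15, tol=1e-12, max_iter=10_000):
    """Iterate v <- (1 - c) v P + c r until the L1 change drops below tol."""
    n = len(P)
    v = list(r)
    for _ in range(max_iter):
        v_next = [
            (1 - c) * sum(v[i] * P[i][j] for i in range(n)) + c * r[j]
            for j in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(v, v_next)) < tol:
            return v_next
        v = v_next
    return v


# Hypothetical 4-page graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0, 3 -> 2.
P = [
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]

uniform = [0.25, 0.25, 0.25, 0.25]          # recovers ordinary PageRank
prefers_page_3 = [0.0, 0.0, 0.0, 1.0]       # a user who always teleports to page 3

v_global = personalized_pagerank(P, uniform)
v_user = personalized_pagerank(P, prefers_page_3)

print([round(x, 4) for x in v_global])
print([round(x, 4) for x in v_user])
```

Page 3 has no incoming links, so its score comes entirely from teleportation: c/4 under the uniform r and c under the user-specific r. The personalized vector therefore ranks it noticeably higher.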

Topic Sensitive PageRank

Divide the webpages into k broad categories

For each category, compute the biased personalized PageRank vector by teleporting uniformly to websites under that category.

At query time, the probability of the query belonging to each of the above categories is computed

The final PageRank vector is computed as a linear combination of the biased PageRank vectors computed offline
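
The offline and query-time stages above can be sketched as follows. The graph, the two categories, the query class probabilities, and all function names are hypothetical; in practice the class probabilities would come from some text classifier, which is not modeled here.

```python
def biased_pagerank(P, r, c=0.15, iters=200):
    """Run v <- (1 - c) v P + c r for a fixed number of iterations."""
    n = len(P)
    v = list(r)
    for _ in range(iters):
        v = [(1 - c) * sum(v[i] * P[i][j] for i in range(n)) + c * r[j]
             for j in range(n)]
    return v


def topic_teleport(pages_in_topic, n):
    """Uniform teleportation distribution over the pages of one category."""
    members = set(pages_in_topic)
    return [1 / len(members) if j in members else 0.0 for j in range(n)]


# Hypothetical 4-page graph and two hypothetical categories.
P = [
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
topics = {"sports": [0, 1], "science": [2, 3]}

# Offline: one biased PageRank vector per category.
topic_vectors = {
    name: biased_pagerank(P, topic_teleport(pages, len(P)))
    for name, pages in topics.items()
}

# Query time: assumed class probabilities for the query.
query_topic_probs = {"sports": 0.8, "science": 0.2}

final_scores = [
    sum(query_topic_probs[t] * topic_vectors[t][j] for t in topics)
    for j in range(len(P))
]
print([round(s, 4) for s in final_scores])
```

Because the PageRank equation is linear in r, this weighted combination equals the personalized PageRank for the mixed teleportation distribution, which is why the expensive per-category vectors can be computed once offline.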

HITS - Hyperlink-Induced Topic Search

There are two different types of web pages for searching a broad topic: the authorities and the hubs

Authorities: pages which are good sources of information about a given topic

Hubs: pages which provide pointers to many authorities

Works on a subgraph, which can consist of the top k search results for the given query from a standard text-based engine

Hubs and Authorities

Given this subgraph, the idea is to assign two numbers to a node: a hub score and an authority score

A node is a good hub if it points to many good authorities, whereas a node is a good authority if many good hubs point to it.

$a(i) \leftarrow \sum_{j \in I(i)} h(j)$

$h(i) \leftarrow \sum_{j \in O(i)} a(j)$

Here I(i) is the set of nodes pointing to i and O(i) is the set of nodes i points to.
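
The two update rules can be iterated directly on adjacency lists, as in the sketch below. The rescaling to unit length after each round is an added assumption that keeps the scores bounded (the update rules above do not show it), and the function name and the small example graph are hypothetical. The graph is assumed to contain at least one edge.

```python
import math


def hits(out_links, iterations=50):
    """Iterate the hub/authority updates on a graph given as {node: [targets]}."""
    nodes = sorted(set(out_links) | {t for ts in out_links.values() for t in ts})
    # I(i): nodes pointing to i, derived from the out-link lists O(i).
    in_links = {i: [] for i in nodes}
    for j, targets in out_links.items():
        for i in targets:
            in_links[i].append(j)

    hub = {i: 1.0 for i in nodes}
    auth = {i: 1.0 for i in nodes}
    for _ in range(iterations):
        # a(i) <- sum of h(j) over j in I(i)
        auth = {i: sum(hub[j] for j in in_links[i]) for i in nodes}
        # h(i) <- sum of a(j) over j in O(i)
        hub = {i: sum(auth[j] for j in out_links.get(i, [])) for i in nodes}
        # Rescale to unit Euclidean length (assumed normalization step).
        a_norm = math.sqrt(sum(x * x for x in auth.values()))
        h_norm = math.sqrt(sum(x * x for x in hub.values()))
        auth = {i: x / a_norm for i, x in auth.items()}
        hub = {i: x / h_norm for i, x in hub.items()}
    return hub, auth


# Hypothetical graph: pages 1 and 2 act as hubs, pages 3, 4, 5 as authorities.
graph = {1: [3, 4], 2: [3, 4, 5]}
hub, auth = hits(graph)

print({i: round(s, 3) for i, s in hub.items()})
print({i: round(s, 3) for i, s in auth.items()})
```

In this toy graph, page 2 ends up as the better hub because it points to every authority, and pages 3 and 4 outrank page 5 because both hubs point to them.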

Hubs and Authorities

$a = A^T h, \quad h = A a$

$h = A A^T h, \quad a = A^T A a$

h converges to the principal eigenvector of $A A^T$ and a converges to the principal eigenvector of $A^T A$

$(A A^T)(i, j) = \sum_k A(i, k) A(j, k)$: number of nodes both i and j point to, bibliographic coupling

$(A^T A)(i, j) = \sum_k A(k, i) A(k, j)$: number of nodes which point to both i and j, co-citation matrix
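
To make the two counting interpretations concrete, the sketch below builds both products for a small adjacency matrix using plain Python lists. The 4-node graph and the helper names `transpose` and `matmul` are assumptions for illustration; A[i][j] = 1 is taken to mean that node i links to node j.

```python
def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(X, Y):
    """Return the matrix product X Y for matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


# Hypothetical 4-node graph: A[i][j] = 1 if node i links to node j.
# Edges: 0 -> 2, 0 -> 3, 1 -> 2, 1 -> 3, 2 -> 3.
A = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
At = transpose(A)

coupling = matmul(A, At)     # A A^T: entry (i, j) counts nodes both i and j point to
cocitation = matmul(At, A)   # A^T A: entry (i, j) counts nodes pointing to both i and j

print(coupling[0][1])    # nodes 0 and 1 both point to 2 and 3 -> 2
print(cocitation[2][3])  # nodes 2 and 3 are both pointed to by 0 and 1 -> 2
```

The diagonal entries have a simple reading as well: the i-th diagonal entry of A A^T is the out-degree of i, and that of A^T A is the in-degree of i.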

HITS: Example

(Example figures from “Introduction to Information Retrieval” slides)

Tightly Knit Communities (TKC) Effect

Consider a collection C which contains the following two communities:

a community y, with a small number of hubs and authorities, in which every hub points to most of the authorities,

and a much larger community z, in which each hub points to a smaller part of the authorities.

Since there are many z-authoritative sites, the hubs do not link to all of them, whereas the smaller y community is densely interconnected.

The topic covered by z is the dominant topic of the collection, and is probably of wider interest on the World Wide Web. The TKC effect occurs when the sites of y are ranked higher than those of z.
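
A constructed toy example can make this effect visible. In the sketch below, community y is a complete 3-by-3 hub/authority block, while community z has 8 hubs that each link to only 2 of 4 authorities. Both communities, the node names, and the function `hits_authority` are hypothetical, and the HITS iteration uses an assumed unit-length rescaling after each round.

```python
import math


def hits_authority(edges, nodes, iterations=100):
    """Return normalized HITS authority scores for a list of (hub, authority) edges."""
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub scores of pages linking in.
        auth = {v: 0.0 for v in nodes}
        for h, a in edges:
            auth[a] += hub[h]
        # Hub update: sum the authority scores of pages linked to.
        hub = {v: 0.0 for v in nodes}
        for h, a in edges:
            hub[h] += auth[a]
        a_norm = math.sqrt(sum(x * x for x in auth.values()))
        h_norm = math.sqrt(sum(x * x for x in hub.values()))
        auth = {v: x / a_norm for v, x in auth.items()}
        hub = {v: x / h_norm for v, x in hub.items()}
    return auth


# Community y: 3 hubs, 3 authorities, every hub links to every authority.
y_edges = [(f"yh{i}", f"ya{j}") for i in range(3) for j in range(3)]
# Community z: 8 hubs, 4 authorities, each hub links to only 2 of them.
z_edges = [(f"zh{i}", f"za{(i + d) % 4}") for i in range(8) for d in (0, 1)]

edges = y_edges + z_edges
nodes = sorted({v for e in edges for v in e})
auth = hits_authority(edges, nodes)
in_degree = {v: sum(1 for _, a in edges if a == v) for v in nodes}

for v in ("ya0", "za0"):
    print(v, "in-degree:", in_degree[v], "HITS authority:", round(auth[v], 6))
```

Each z authority has more incoming links (4) than each y authority (3), yet the HITS authority weight concentrates almost entirely on the densely connected y community, whose score grows by a factor of 9 per round against 8 for z.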

TKC Effect

HITS ranking is vulnerable to tightly knit communities, a weakness known as the TKC effect, and will sometimes rank the sites of a TKC in unjustifiably high positions

It has been shown that SALSA is less vulnerable to the TKC effect than HITS.

SALSA: The Stochastic Approach for Link-Structure Analysis

Consider a bipartite graph G, whose two parts correspond to hubs and authorities

Two separate random walks: Hub walk and Authority walk

SALSA

Two distinct random walks

Each walk only visits nodes from one of the two sides of the graph

Traverses paths consisting of two G-edges in each step

Hub matrix H:

$h_{ij} = \sum_{\{k \mid (i_h, k_a),\, (j_h, k_a) \in G\}} \frac{1}{\deg(i_h)} \cdot \frac{1}{\deg(k_a)}$

Authority matrix A:

$a_{ij} = \sum_{\{k \mid (k_h, i_a),\, (k_h, j_a) \in G\}} \frac{1}{\deg(i_a)} \cdot \frac{1}{\deg(k_h)}$

$a_{ij} > 0$ implies that a certain page k links to both pages i and j, thus j is reachable from i by two steps: retracting along k → i and following k → j
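
The two transition matrices can be computed straight from their summation formulas, as in the sketch below. The edge list, the variable names, and the name `A_auth` (used to keep the authority matrix apart from an adjacency matrix) are hypothetical. The degree of a hub node is taken to be its out-degree and the degree of an authority node its in-degree.

```python
from collections import defaultdict

# Hypothetical link structure as (hub page, authority page) pairs.
edges = [(0, 2), (0, 3), (1, 3), (1, 4), (2, 4)]

out_links = defaultdict(set)   # authorities reachable from each hub
in_links = defaultdict(set)    # hubs pointing to each authority
for h, a in edges:
    out_links[h].add(a)
    in_links[a].add(h)


def hub_entry(i, j):
    """h_ij: follow a link from hub i to some k, then retract along a link from hub j to k."""
    common = out_links[i] & out_links[j]
    return sum(1 / len(out_links[i]) * 1 / len(in_links[k]) for k in common)


def authority_entry(i, j):
    """a_ij: retract from authority i to some hub k, then follow a link from k to j."""
    common = in_links[i] & in_links[j]
    return sum(1 / len(in_links[i]) * 1 / len(out_links[k]) for k in common)


hubs = sorted(out_links)
authorities = sorted(in_links)
H = [[hub_entry(i, j) for j in hubs] for i in hubs]
A_auth = [[authority_entry(i, j) for j in authorities] for i in authorities]

for row in H:
    print([round(x, 3) for x in row], "row sum =", round(sum(row), 3))
```

Each row sums to 1, so both matrices describe proper random walks: one that stays on the hub side and one that stays on the authority side.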

SALSA Vectors

A: adjacency matrix of the directed graph defined by its link structure

$A_r$: row-normalized version, obtained by dividing each nonzero entry of A by the sum of entries in its row

$A_c$: column-normalized version, obtained by dividing each nonzero entry of A by the sum of entries in its column

It can be shown that H consists of the nonzero rows and columns of $A_r A_c^T$

Similarly, the authority matrix consists of the nonzero rows and columns of $A_c^T A_r$
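
The matrix formulation can be checked on a small example with plain Python lists. The adjacency matrix below, the helper names, and the name `A_auth` for the authority matrix are assumptions for illustration only.

```python
def row_normalize(M):
    """Divide each nonzero entry by the sum of its row (all-zero rows stay zero)."""
    out = []
    for row in M:
        s = sum(row)
        out.append([x / s if s else 0.0 for x in row])
    return out


def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(X, Y):
    """Return the matrix product X Y for matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def drop_zero_rows_and_columns(M):
    """Keep the indices whose row is nonzero; for these products the
    nonzero columns are the same indices."""
    keep = [i for i, row in enumerate(M) if any(row)]
    return [[M[i][j] for j in keep] for i in keep]


# Hypothetical adjacency matrix: edges 0 -> 2, 0 -> 3, 1 -> 3, 1 -> 4, 2 -> 4.
A = [
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
A_r = row_normalize(A)
A_c = transpose(row_normalize(transpose(A)))   # column-normalized version

H = drop_zero_rows_and_columns(matmul(A_r, transpose(A_c)))        # A_r A_c^T
A_auth = drop_zero_rows_and_columns(matmul(transpose(A_c), A_r))   # A_c^T A_r

for row in H:
    print([round(x, 3) for x in row])
```

Nodes 3 and 4 have no outgoing links, so their rows and columns vanish from the hub matrix; likewise nodes 0 and 1 have no incoming links and vanish from the authority matrix. Both reduced matrices are 3 by 3 with rows summing to 1.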
