Measuring and Extracting Proximity in Networks
description
Transcript of Measuring and Extracting Proximity in Networks
![Page 1: Measuring and Extracting Proximity in Networks](https://reader035.fdocuments.us/reader035/viewer/2022062411/56816837550346895dddf968/html5/thumbnails/1.jpg)
Measuring and Extracting Proximity in Networks
By - Yehuda Koren, Stephen C.North and Chris Volinsky
- Rahul Sehgal
![Page 2: Measuring and Extracting Proximity in Networks](https://reader035.fdocuments.us/reader035/viewer/2022062411/56816837550346895dddf968/html5/thumbnails/2.jpg)
Introduction Network Information hidden in a network What is Proximity? Why do we need proximity in a network? Methods for measuring and extracting proximities in a network. CFEC ( Cycle free effective conductance) and how to compute
cycle-free escape probability Extracting proximity through proximity graphs obtained by CFEC Working with large networks Experiments Conclusion Questions
![Page 3: Measuring and Extracting Proximity in Networks](https://reader035.fdocuments.us/reader035/viewer/2022062411/56816837550346895dddf968/html5/thumbnails/3.jpg)
Network
Collection of edges or links
Collection of nodes These edges and links
help in deciding the proximity in a given network. node 3
node1
node 2
node 5Node 6
node 4
![Page 4: Measuring and Extracting Proximity in Networks](https://reader035.fdocuments.us/reader035/viewer/2022062411/56816837550346895dddf968/html5/thumbnails/4.jpg)
Hidden information in a network C
D E
A F B
![Page 5: Measuring and Extracting Proximity in Networks](https://reader035.fdocuments.us/reader035/viewer/2022062411/56816837550346895dddf968/html5/thumbnails/5.jpg)
Hidden information in a network If two people speak on
the phone to many common friends, the probability is high that they will talk to each other in the future, or perhaps that they already communicate through some other medium as email.
C
D E
A F B
![Page 6: Measuring and Extracting Proximity in Networks](https://reader035.fdocuments.us/reader035/viewer/2022062411/56816837550346895dddf968/html5/thumbnails/6.jpg)
Hidden information in a network If two nodes are connected
to a common node then it might be possible that they have a strong relationship or they don’t have any relationship at all.
for example: in a telephone network where many people call to service center but it might be possible that two people who are calling they don’t know each other at all.
C
D E
A F B
![Page 7: Measuring and Extracting Proximity in Networks](https://reader035.fdocuments.us/reader035/viewer/2022062411/56816837550346895dddf968/html5/thumbnails/7.jpg)
What is proximity?
Proximity is a method of measuring distance or closeness between different objects.
It is a method of finding hidden relationship between the objects.
It is a method of finding similarities and helps in clustering objects or nodes.
![Page 8: Measuring and Extracting Proximity in Networks](https://reader035.fdocuments.us/reader035/viewer/2022062411/56816837550346895dddf968/html5/thumbnails/8.jpg)
Need of proximity
It measures potential information exchange between two non-linked objects through intermediaries
It can measure the extent to which two nodes belong to each other.
It helps in knowing the likelihood that a link will exist in future.
In social network settings proximity helps to predict or track the propagation of an idea, product or disease .
It helps in discovering unexpected communities in a network.
![Page 9: Measuring and Extracting Proximity in Networks](https://reader035.fdocuments.us/reader035/viewer/2022062411/56816837550346895dddf968/html5/thumbnails/9.jpg)
Various Methods
Graph-theoretic distance: It is length of shortest path connecting two nodes
measured either by number of hops or sum of weight of edges
Limitations Proximity decays as nodes become farther apart. Information may be lost due to friction or noise at a
particular node. This method doesn’t assume that proximity can exist via
multiple paths.
![Page 10: Measuring and Extracting Proximity in Networks](https://reader035.fdocuments.us/reader035/viewer/2022062411/56816837550346895dddf968/html5/thumbnails/10.jpg)
Various Methods contd… Network Flow (maximal
network flow) Limited capacity is
assigned to each edge, depending upon the weight of that edge and then compute the maximal number of units that can be simultaneously delivered from node s to node t.
It prefers high weight edges and captures the premise that an increasing number of alternative paths increases the proximity.
For example we consider
the adjacent figure.
A B
a1
b1
A B
a1
![Page 11: Measuring and Extracting Proximity in Networks](https://reader035.fdocuments.us/reader035/viewer/2022062411/56816837550346895dddf968/html5/thumbnails/11.jpg)
Various Methods contd…
Network Flow (maximal network flow)
Limitations: It disregards the length of the path. It also follow that the maximal s-t flow (that is graph flow
from node s to node t) in a graph equals the minimal s-t cut – that is the minimal edge capacity we need to remove to disconnect s from t, therefore we cannot implement it in a robust system.
![Page 12: Measuring and Extracting Proximity in Networks](https://reader035.fdocuments.us/reader035/viewer/2022062411/56816837550346895dddf968/html5/thumbnails/12.jpg)
Various Methods contd…
Effective Conductance (EC): Modeling of networks as an electric circuit by treating the edges
as resistors whose conductance is the given edge weight. In this method we keep the starting node (s) as 1 and the end
node (t) as 0. and then we solve the linear equations for getting voltage and current on each edge.
It accounts for both path length and number of alternative paths . It avoids dependence on single shortest path and bottlenecks It has a monotonicity property which states that in an electrical
resistor network, increasing the conductance of any resistor or increasing the number of resistors increase conductance between any two nodes.
![Page 13: Measuring and Extracting Proximity in Networks](https://reader035.fdocuments.us/reader035/viewer/2022062411/56816837550346895dddf968/html5/thumbnails/13.jpg)
Various Methods contd…
Limitations of Effective Conductance- Monotonicity has its limitations consider following example
s1
t1 a1
a1 s t
same EC
![Page 14: Measuring and Extracting Proximity in Networks](https://reader035.fdocuments.us/reader035/viewer/2022062411/56816837550346895dddf968/html5/thumbnails/14.jpg)
Various Methods contd…
Sink augmented effective conductance
Each node in a network is connected to a universal sink which is at voltage zero.
This universal sink competes with the node t. Universal sink “tax” each node that absorbs a portion of out
going current. Consequently it forces all the node to have degree greater than 1. So, our restriction for degree one node doesn’t exist any more.
![Page 15: Measuring and Extracting Proximity in Networks](https://reader035.fdocuments.us/reader035/viewer/2022062411/56816837550346895dddf968/html5/thumbnails/15.jpg)
Various Methods contd…
Limitations of sink augmented effective conductance
Its required to know how much current will flow through each node . Understanding how such parameter will influence the proximity is a difficult task.
It destroys the concept of monotonicity . It means whenever a new node is added to the network it has a direct link to the sink but not to t. It strengthens sink and it compete with node t and thus the proximity between s and t is lost.
So, we look for a solution that can overcome the limitations of above methods…..
![Page 16: Measuring and Extracting Proximity in Networks](https://reader035.fdocuments.us/reader035/viewer/2022062411/56816837550346895dddf968/html5/thumbnails/16.jpg)
Random Walk
Definitions: Random Walk is a transition from one state to another
without depending on the previous state. The transition could be to the same state also.
s ta1a3
a2 a4
![Page 17: Measuring and Extracting Proximity in Networks](https://reader035.fdocuments.us/reader035/viewer/2022062411/56816837550346895dddf968/html5/thumbnails/17.jpg)
Random Walk
Random walk in network proximity is the infinite number of attempts that is made to reach from starting node s to end node t. It might be possible that when we traverse this path we might return back to the starting node s.
s t
![Page 18: Measuring and Extracting Proximity in Networks](https://reader035.fdocuments.us/reader035/viewer/2022062411/56816837550346895dddf968/html5/thumbnails/18.jpg)
Random Walk
Explanation In network it might be possible that during random walk we
might go back to s without going to t. As we can see in the diagram it might be possible that we
return back to s via path s–a1-a2-a3-s without going to t. We are having a cyclic path which is leading us back to
starting point s.
continued =>
![Page 19: Measuring and Extracting Proximity in Networks](https://reader035.fdocuments.us/reader035/viewer/2022062411/56816837550346895dddf968/html5/thumbnails/19.jpg)
Random Walk
Diagram
s t a1
a4
a2 a5
a3
![Page 20: Measuring and Extracting Proximity in Networks](https://reader035.fdocuments.us/reader035/viewer/2022062411/56816837550346895dddf968/html5/thumbnails/20.jpg)
What is our goal?
We have to improve the effective conductance measure and avoid any cyclic path…..
![Page 21: Measuring and Extracting Proximity in Networks](https://reader035.fdocuments.us/reader035/viewer/2022062411/56816837550346895dddf968/html5/thumbnails/21.jpg)
CFEC( Cycle Free Effective Conductance) It considers random walk interpretation DEFINITION: The cycle-free escape probability from s
to t is the probability that a random walk originating at s will reach t without visiting any node more than once.
In random walk, a probability of transition from node i to node j is
= probability of transition from node i to node j = weight of edge from i to j = degree of node i.
![Page 22: Measuring and Extracting Proximity in Networks](https://reader035.fdocuments.us/reader035/viewer/2022062411/56816837550346895dddf968/html5/thumbnails/22.jpg)
CFEC( Cycle Free Effective Conductance) The probability that a random walk starting
at v1 for path P= v1-v2-…-vr, will follow this path is given by:
![Page 23: Measuring and Extracting Proximity in Networks](https://reader035.fdocuments.us/reader035/viewer/2022062411/56816837550346895dddf968/html5/thumbnails/23.jpg)
Features of CFEC: We have following equalities:
Multiplying by the degree:
R = set of simple paths from s to t, simple path are those that never visit the same node twice
CFEC discourages long paths as the probability of following the path decays exponentially with its length. It is given by
It supports proximity measure for multiple paths. Degree-1 nodes dilute the significance of path from s to t. So we cut the
main graph into sub-graph. And we preserve the original degree of the nodes.
![Page 24: Measuring and Extracting Proximity in Networks](https://reader035.fdocuments.us/reader035/viewer/2022062411/56816837550346895dddf968/html5/thumbnails/24.jpg)
Explanation of CFEC features
s tc1 c2
CFEC discourages long paths:
ck
Proximity decreases
![Page 25: Measuring and Extracting Proximity in Networks](https://reader035.fdocuments.us/reader035/viewer/2022062411/56816837550346895dddf968/html5/thumbnails/25.jpg)
Computing Cycle Free Escape Probability Restrict the sum in following equation to K most probable simple
path.
Edge weights are transformed into edge lengths, establishing 1-1 correspondence between path probability and path length.
Path probability is given as
![Page 26: Measuring and Extracting Proximity in Networks](https://reader035.fdocuments.us/reader035/viewer/2022062411/56816837550346895dddf968/html5/thumbnails/26.jpg)
Extracting Proximity Through Proximity Graphs : Extract a subgraph that maximizes the ratio:
= subgraph = constant, Find subset of R, which required for above problem. Set of simple paths R is sorted in ascending order of
weights of paths.
![Page 27: Measuring and Extracting Proximity in Networks](https://reader035.fdocuments.us/reader035/viewer/2022062411/56816837550346895dddf968/html5/thumbnails/27.jpg)
Extracting Proximity Through Proximity Graphs : Optimizing the given formula using branch
and bound path merging algorithm:
![Page 28: Measuring and Extracting Proximity in Networks](https://reader035.fdocuments.us/reader035/viewer/2022062411/56816837550346895dddf968/html5/thumbnails/28.jpg)
Branch and Bound path merging algorithm
![Page 29: Measuring and Extracting Proximity in Networks](https://reader035.fdocuments.us/reader035/viewer/2022062411/56816837550346895dddf968/html5/thumbnails/29.jpg)
Working with large networks
Growing candidate graphs via {s, t} neighborhoods
Producing the candidate graph is to find a sub graph containing the highest weight paths originating at either s or t.
Now, the problem becomes , find a sub graph containing shortest paths originating at either s or t. Our objective is to expand the neighborhoods of s and t. This is done by Dijkstra’s algorithm for computing shortest path on graphs with non-negative edge lengths.
![Page 30: Measuring and Extracting Proximity in Networks](https://reader035.fdocuments.us/reader035/viewer/2022062411/56816837550346895dddf968/html5/thumbnails/30.jpg)
Working with large networks contd…. Determining neighborhood size
we determine paths which are not longer than L – log(€). Path lengths greater than is are not useful. Here L is the length of shortest s-t path.
In practice, (L – log(€))/2 – neighborhoods might be too large.
![Page 31: Measuring and Extracting Proximity in Networks](https://reader035.fdocuments.us/reader035/viewer/2022062411/56816837550346895dddf968/html5/thumbnails/31.jpg)
Working with large networks contd…. Pruning the neighborhood:
We prune the neighbor in such a way that dist(s,i)+dist(t,i) > β , where s is our starting node and t is terminating node.
Then, we exclude from the neighborhood any i for which dist(s,i)+dist(i,t) > L – log(€).
![Page 32: Measuring and Extracting Proximity in Networks](https://reader035.fdocuments.us/reader035/viewer/2022062411/56816837550346895dddf968/html5/thumbnails/32.jpg)
Experiments:
Online movie database IMDB.
Co-authorship graph.
Telecommunication graph.
![Page 33: Measuring and Extracting Proximity in Networks](https://reader035.fdocuments.us/reader035/viewer/2022062411/56816837550346895dddf968/html5/thumbnails/33.jpg)
Experiments:
![Page 34: Measuring and Extracting Proximity in Networks](https://reader035.fdocuments.us/reader035/viewer/2022062411/56816837550346895dddf968/html5/thumbnails/34.jpg)
Telecommunication Graph
Extract the candidate graph by growing neighborhoods from the two nodes of interest.
2000 random pairs of telephone numbers to calculate the CFEC value.
For 1808(90%) of these pairs paths between them were found. Others do not have any known path between them, this could be due to, these number were not in frequent use.
We have values for as 1,5,10 and 50.
![Page 35: Measuring and Extracting Proximity in Networks](https://reader035.fdocuments.us/reader035/viewer/2022062411/56816837550346895dddf968/html5/thumbnails/35.jpg)
Telecommunication Graph (contd…)
![Page 36: Measuring and Extracting Proximity in Networks](https://reader035.fdocuments.us/reader035/viewer/2022062411/56816837550346895dddf968/html5/thumbnails/36.jpg)
Conclusion: CFEC proximity allows us to readily compute
proximity graphs, which are small portions of the network that are aimed at capturing a related proximity value.
An analyst studying proximity in a graph has to focus on the most relevant part of the graph.
It is extension of connection graph which is capable of presenting compact relationship between objects of a network.
We can deduce relationship between more than two endpoints, the flexibility to handle edge direction, and the fact that they are obtained by solving an intuitively tunable optimization problem.
![Page 37: Measuring and Extracting Proximity in Networks](https://reader035.fdocuments.us/reader035/viewer/2022062411/56816837550346895dddf968/html5/thumbnails/37.jpg)
QUESTIONS??????
![Page 38: Measuring and Extracting Proximity in Networks](https://reader035.fdocuments.us/reader035/viewer/2022062411/56816837550346895dddf968/html5/thumbnails/38.jpg)
References:
![Page 39: Measuring and Extracting Proximity in Networks](https://reader035.fdocuments.us/reader035/viewer/2022062411/56816837550346895dddf968/html5/thumbnails/39.jpg)
References: