CIS 4930/6930 – Recent Advances in Bioinformatics Spring 2014


Page 1: CIS 4930/6930 – Recent Advances in Bioinformatics Spring 2014


CIS 4930/6930 – Recent Advances in Bioinformatics

Spring 2014

Network models

Tamer Kahveci

Page 2: CIS 4930/6930 – Recent Advances in Bioinformatics Spring 2014

Graphs

• Useful for describing networks.
• G = (V, E) with
  – V = set of nodes
  – E = set of edges
• Topological models
  – Directed/Undirected
  – Weighted/Unweighted
  – Deterministic/Probabilistic (G = (V, E, P))
• Concepts
  – Degree (indegree/outdegree), path

Page 3: CIS 4930/6930 – Recent Advances in Bioinformatics Spring 2014

Topological properties

• Degree distribution, P(k), of G = (V, E)
  – Deg(k) = number of nodes in G with degree k.
  – P(k) = Deg(k)/|V| = probability that a random node in G has degree k.

(Figure: H. pylori network; Todor et al., TCBB 10:4, 2013.)
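The definition of P(k) can be sketched in a few lines of Python. This is a minimal illustration, assuming an undirected simple graph given as a node list and an edge list; the function name `degree_distribution` and the four-node example are hypothetical and not taken from the slides.

```python
from collections import Counter

def degree_distribution(nodes, edges):
    """Return {k: P(k)} for an undirected simple graph."""
    degree = {v: 0 for v in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    deg_count = Counter(degree.values())  # Deg(k): number of nodes with degree k
    return {k: count / len(nodes) for k, count in deg_count.items()}

# Hypothetical example: node degrees are a=3, b=2, c=2, d=1.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
print(degree_distribution(nodes, edges))
```

Here half of the nodes have degree 2, so P(2) = 0.5, while P(1) = P(3) = 0.25.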

Page 4: CIS 4930/6930 – Recent Advances in Bioinformatics Spring 2014

Topological properties

• Neighbors of node v: N(v) = set of nodes adjacent to v.
• Clustering coefficient of node v, C_v, shows the connectivity of N(v):
  C_v = (# edges among N(v)) / (max # edges possible among N(v))
• The denominator differs slightly for directed vs. undirected graphs: with k = |N(v)|, the maximum is k(k-1)/2 if undirected and k(k-1) if directed.
• C(k) = average clustering coefficient of all nodes with k edges.
• Network's clustering coefficient = average clustering coefficient of all nodes in G = (∑_v C_v) / |V|

(Figure: example node with C_v = 2/6.)
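A short sketch of the clustering coefficient for the undirected case follows. It assumes the graph is stored as a dictionary mapping each node to the set of its neighbors, and it reports 0 for nodes with fewer than two neighbors (a convention chosen here, since the ratio is undefined in that case). The function names and the example graph are hypothetical.

```python
from itertools import combinations

def clustering_coefficient(adj, v):
    """C_v for an undirected graph; adj maps node -> set of neighbors."""
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention: undefined ratio reported as 0
    # Count edges whose two endpoints are both neighbors of v.
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)  # undirected maximum: k(k-1)/2

def network_clustering(adj):
    """Average of C_v over all nodes."""
    return sum(clustering_coefficient(adj, v) for v in adj) / len(adj)

# Hypothetical example: v has 4 neighbors with 2 edges among them (a-b, b-c).
adj = {
    "v": {"a", "b", "c", "d"},
    "a": {"v", "b"},
    "b": {"v", "a", "c"},
    "c": {"v", "b"},
    "d": {"v"},
}
print(clustering_coefficient(adj, "v"))  # 2 of 6 possible edges
```

For a directed graph the same counting idea applies, but the denominator becomes k(k-1).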

Page 5: CIS 4930/6930 – Recent Advances in Bioinformatics Spring 2014

Centrality of a node

• Centrality of a node v in graph G = (V, E) indicates the relative importance of v in G with respect to the rest of the nodes in G. Let's denote it by f(v | G), or simply f(v).
• Many centrality measures exist:
  – Degree centrality
    • How popular am I?
    • fDeg(v) = Deg(v), where Deg(v) here is the degree of node v
  – Closeness centrality
  – Betweenness centrality

Page 6: CIS 4930/6930 – Recent Advances in Bioinformatics Spring 2014

Closeness Centrality

• How close am I to everyone else?
• Given G = (V, E)
• Dist(u, v) = shortest path length from u to v in G
• fClose(u) = ∑_{v in V} Dist(u, v); a smaller sum means u is closer to the rest of the network.
• Alternative (for disconnected networks)
  – fClose(u) = ∑_{v in V-{u}} 1/Dist(u, v); here a larger sum means u is closer.
  – 1/inf = 0
• How do I find shortest paths?
  – Floyd-Warshall algorithm
  – Johnson's algorithm
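Both closeness variants can be sketched for the unweighted, undirected case. Note that this sketch swaps in breadth-first search for the shortest-path step instead of Floyd-Warshall or Johnson's algorithm, which is enough when every edge has length 1. The adjacency-dictionary format and all function names are assumptions made for illustration.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def closeness_sum(adj, u):
    """Sum of Dist(u, v) over all v; infinite if some node is unreachable."""
    dist = bfs_distances(adj, u)
    if len(dist) < len(adj):
        return float("inf")
    return sum(dist.values())

def closeness_harmonic(adj, u):
    """Sum of 1/Dist(u, v) over v != u; unreachable nodes contribute 0."""
    dist = bfs_distances(adj, u)
    return sum(1 / d for v, d in dist.items() if v != u)

# Hypothetical example: path a-b-c plus an isolated node d.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": set()}
print(closeness_harmonic(adj, "a"), closeness_harmonic(adj, "b"))
```

In the example the sum-of-distances variant is infinite for every node because d is unreachable, which is the situation the reciprocal variant is meant to handle.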

Page 7: CIS 4930/6930 – Recent Advances in Bioinformatics Spring 2014

Betweenness Centrality

• How many pairs of nodes use me on the cheapest route to communicate?
• gst = number of shortest paths between s and t.
• gst(v) = number of shortest paths between s and t that contain v.
• fBetween(v) = (∑_{s,t} gst(v)/gst) / (number of s,t pairs in V-{v}), where the sum runs over pairs s, t in V-{v}.
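The betweenness formula can be evaluated directly, if not efficiently, for a small unweighted, undirected graph. The sketch below counts shortest paths with breadth-first search and uses the fact that v lies on a shortest s-t path exactly when Dist(s, v) + Dist(v, t) = Dist(s, t). It treats (s, t) as unordered pairs and skips disconnected pairs; both choices, like the function names, are assumptions rather than something the slides specify.

```python
from collections import deque
from itertools import combinations

def count_shortest_paths(adj, s):
    """BFS from s: returns (dist, sigma), sigma[w] = number of shortest s-w paths."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]  # every shortest path to u extends to w
    return dist, sigma

def betweenness(adj, v):
    """Average of gst(v)/gst over unordered pairs s, t in V-{v}."""
    info = {s: count_shortest_paths(adj, s) for s in adj}
    dist_v, sigma_v = info[v]
    others = [u for u in adj if u != v]
    total, pairs = 0.0, 0
    for s, t in combinations(others, 2):
        pairs += 1
        dist_s, sigma_s = info[s]
        if t not in dist_s or v not in dist_s:
            continue  # s and t (or s and v) are disconnected
        if dist_s[v] + dist_v[t] == dist_s[t]:
            total += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return total / pairs if pairs else 0.0

# Hypothetical example: a star with center c; every leaf pair routes through c.
star = {"c": {"x", "y", "z"}, "x": {"c"}, "y": {"c"}, "z": {"c"}}
print(betweenness(star, "c"), betweenness(star, "x"))
```

This brute-force version runs one BFS per node and then scans all pairs, which is fine for teaching-sized graphs but not for large networks.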

Page 8: CIS 4930/6930 – Recent Advances in Bioinformatics Spring 2014

Floyd-Warshall: shortest path

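Given G = (V, E, w), number the nodes 1, …, n.

Recurrence, where Distance(i, j, k) is the length of the shortest path from i to j whose intermediate nodes all come from V' = {1, 2, …, k}:

  Distance(i, j, 0) = w(i, j)
  Distance(i, j, k+1) = min{Distance(i, j, k), Distance(i, k+1, k) + Distance(k+1, j, k)}

Pseudocode, with d[i,j] initialized to w(i, j) (infinity if there is no edge, 0 if i = j):

  for k = 1 to n do                        // use node k on path
    for i = 1 to n do                      // origin i
      for j = 1 to n do                    // destination j
        if (d[i,k] + d[k,j] < d[i,j]) {
          d[i,j] = d[i,k] + d[k,j]         // shorter path length
          visit[i,j] = k                   // new path goes through k
        }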
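The Floyd-Warshall pseudocode translates almost line for line into Python. The sketch below assumes nodes are numbered 0 to n-1 (rather than 1 to n), that the weighted edges arrive as a dictionary keyed by (i, j), and that the graph has no negative cycles; the function name and the input format are illustrative choices.

```python
import math

def floyd_warshall(n, weights):
    """All-pairs shortest path lengths for a directed weighted graph.

    weights: {(i, j): w} for nodes 0..n-1. Returns (d, via), where via[i][j]
    is the last intermediate node that improved the i -> j path (or None).
    """
    d = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    via = [[None] * n for _ in range(n)]
    for (i, j), w in weights.items():
        d[i][j] = min(d[i][j], w)
    for k in range(n):              # allow node k as an intermediate node
        for i in range(n):          # origin i
            for j in range(n):      # destination j
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]  # shorter path length
                    via[i][j] = k                # new path goes through k
    return d, via

# Hypothetical example: the direct edge 0 -> 2 (cost 7) loses to 0 -> 1 -> 2 (cost 5).
d, via = floyd_warshall(4, {(0, 1): 4, (1, 2): 1, (0, 2): 7, (2, 3): 2})
print(d[0][2], via[0][2])
```

The three nested loops give O(n^3) time and the matrix gives O(n^2) space, independent of the number of edges.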

Page 9: CIS 4930/6930 – Recent Advances in Bioinformatics Spring 2014


Key network models

• Erdos-Renyi

• Small world

• Scale free

Page 10: CIS 4930/6930 – Recent Advances in Bioinformatics Spring 2014

Erdos-Renyi

• Edges are distributed uniformly at random.
• Construction
  – Given two parameters (n = # of nodes, p = probability that an edge exists)
  – For every pair of nodes (u, v)
    • Create an edge (u, v) with probability p.
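The Erdos-Renyi construction is short enough to write out in full. This is a minimal sketch for the undirected case, assuming nodes are numbered 0 to n-1; the function name and the optional `seed` parameter (added for reproducibility) are not part of the slides.

```python
import random
from itertools import combinations

def erdos_renyi(n, p, seed=None):
    """Return the edge list of a random undirected graph on nodes 0..n-1."""
    rng = random.Random(seed)
    # Visit every unordered pair once and keep it with probability p.
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]

edges = erdos_renyi(10, 0.3, seed=42)
print(len(edges), "edges out of", 10 * 9 // 2, "possible pairs")
```

With n nodes there are n(n-1)/2 pairs, so the expected number of edges is p·n(n-1)/2.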

Page 11: CIS 4930/6930 – Recent Advances in Bioinformatics Spring 2014

Small World (Watts-Strogatz)

• Everyone tends to be close to each other.
• As the number of nodes (N) in the network grows, the distance between two random nodes grows with the logarithm of N.
• Construction
  – Given three parameters:
    • N = # of nodes
    • K = average degree
    • p = rewiring probability
  – Construct a ring lattice
    • Connect the ith node to nodes {i-1, i-2, …, i-K/2} and {i+1, i+2, …, i+K/2} with an edge (indices taken modulo N)
  – For each node u
    • For each edge (u, v)
      – Randomly pick a node v' in V-{u}
      – Replace (u, v) with (u, v') with probability p
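The ring-lattice and rewiring steps can be sketched as follows. The sketch assumes an even average degree k smaller than n, stores the graph as a dictionary of neighbor sets, and additionally avoids creating duplicate edges when picking the new endpoint; that last restriction is an assumption added here, since the slide only requires the new endpoint to differ from u.

```python
import random

def watts_strogatz(n, k, p, seed=None):
    """Small-world graph on nodes 0..n-1; assumes k is even and k < n."""
    rng = random.Random(seed)
    adj = {u: set() for u in range(n)}
    # Ring lattice: connect each node to its k/2 nearest neighbors on each side.
    for u in range(n):
        for j in range(1, k // 2 + 1):
            v = (u + j) % n
            adj[u].add(v)
            adj[v].add(u)
    # Rewiring: each lattice edge (u, u+j) is replaced with probability p.
    for u in range(n):
        for j in range(1, k // 2 + 1):
            v = (u + j) % n
            if v not in adj[u] or rng.random() >= p:
                continue
            # Assumed restriction: the new endpoint is not u and not already a neighbor.
            candidates = [w for w in range(n) if w != u and w not in adj[u]]
            if not candidates:
                continue
            new_v = rng.choice(candidates)
            adj[u].discard(v)
            adj[v].discard(u)
            adj[u].add(new_v)
            adj[new_v].add(u)
    return adj

graph = watts_strogatz(20, 4, 0.1, seed=0)
print(sum(len(nbrs) for nbrs in graph.values()) // 2, "edges")
```

Rewiring moves edges without adding or removing any, so the edge count stays at n·k/2 and the average degree stays at k.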

Page 12: CIS 4930/6930 – Recent Advances in Bioinformatics Spring 2014

Scale-Free

• A lot of poor work for a few super rich.
• The probability that a node has degree k drops as a power of k (a power law, not an exponential):
  – P(k) ~ k^(-γ)
• Construction (preferential attachment, or "rich get richer")
  – Given two parameters (n = # of nodes, k = average degree)
  – Build a small network (e.g. two nodes and one edge)
  – Repeat
    • Insert a new node v
    • Insert k edges from v to existing nodes. Existing node u gets an edge with probability pu = Deg(u) / ∑i Deg(i)
  – Until we have n nodes
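A minimal sketch of preferential attachment follows, starting from the two-node seed mentioned above. Two details are assumptions made here to keep the graph simple: a new node never connects twice to the same existing node (targets are redrawn until distinct, which only approximates the stated probabilities), and it adds fewer than k edges while fewer than k nodes exist. The function name and return format are illustrative.

```python
import random

def preferential_attachment(n, k, seed=None):
    """Grow a graph to n nodes; each new node attaches to up to k existing nodes."""
    rng = random.Random(seed)
    edges = [(0, 1)]            # seed network: two nodes, one edge
    degree = {0: 1, 1: 1}
    while len(degree) < n:
        v = len(degree)         # label of the new node
        nodes = list(degree)
        weights = [degree[u] for u in nodes]  # proportional to Deg(u)
        want = min(k, len(nodes))             # assumed cap while the graph is small
        targets = set()
        while len(targets) < want:
            targets.add(rng.choices(nodes, weights=weights)[0])
        degree[v] = 0
        for u in targets:
            edges.append((u, v))
            degree[u] += 1
            degree[v] += 1
    return edges, degree

edges, degree = preferential_attachment(200, 2, seed=0)
print(max(degree.values()), min(degree.values()))
```

Running it on a few hundred nodes typically shows a handful of early nodes with degrees far above the rest, the "few super rich" of the slide.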

Page 13: CIS 4930/6930 – Recent Advances in Bioinformatics Spring 2014

Hierarchical

• Similar to fractals
• Scale-free networks with high clustering.
• Construction
  – Create an initial network (seed) with t peripheral nodes
  – Create t copies of this network and connect each of them to the central node.

(Figure: fractal-like hierarchical network.)

Page 14: CIS 4930/6930 – Recent Advances in Bioinformatics Spring 2014

Probabilistic

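• G = (V, E, P)
• P: E -> (0, 1]
• Example (figure): nodes a, b, c and two edges with existence probabilities 0.3 and 0.6. The graph has four possible deterministic instances:
  – neither edge present: (1-0.6) x (1-0.3) = 0.28
  – only the 0.3 edge present: 0.3 x (1-0.6) = 0.12
  – only the 0.6 edge present: (1-0.3) x 0.6 = 0.42
  – both edges present: 0.3 x 0.6 = 0.18
• The probabilities sum to one: 0.28 + 0.12 + 0.42 + 0.18 = 1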
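The four probabilities 0.28, 0.12, 0.42 and 0.18 can be reproduced by enumerating every deterministic instance of the probabilistic graph, assuming edges exist independently. In the sketch below the edge endpoints (a-b with probability 0.3, a-c with probability 0.6) are an assumption, since the transcript preserves only the node labels and the two probabilities; the function name is also illustrative.

```python
from itertools import product

def possible_worlds(edge_probs):
    """Enumerate (edge set, probability) for every deterministic instance.

    edge_probs: {edge: probability that the edge exists}, edges independent.
    """
    edges = list(edge_probs)
    worlds = []
    for present in product([False, True], repeat=len(edges)):
        prob = 1.0
        for e, keep in zip(edges, present):
            prob *= edge_probs[e] if keep else 1 - edge_probs[e]
        kept = frozenset(e for e, keep in zip(edges, present) if keep)
        worlds.append((kept, prob))
    return worlds

# Assumed endpoints for the two edges of the slide's example.
worlds = possible_worlds({("a", "b"): 0.3, ("a", "c"): 0.6})
for kept, prob in worlds:
    print(sorted(kept), round(prob, 2))
```

A graph with |E| uncertain edges has 2^|E| instances, so exhaustive enumeration like this is only practical for very small examples.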