Large-scale network analysis -...

101
Large-scale network analysis G´ abor Cs´ ardi [email protected] Department of Biophysics, KFKI Research Institute for Nuclear and Particle Physics of the Hungarian Academy of Sciences, Budapest, Hungary Currently at Department of Medical Genetics, University of Lausanne, Lausanne, Switzerland

Transcript of Large-scale network analysis -...

Page 1: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Large-scale network analysis

Gabor [email protected]

Department of Biophysics, KFKI Research Institute for Nuclear and Particle Physics of the

Hungarian Academy of Sciences, Budapest, Hungary

Currently at

Department of Medical Genetics,

University of Lausanne, Lausanne, Switzerland

Page 2: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Outline

1. The igraph R package

2. What can you do with large graphs?

3. Some unique igraph features

4. Rapid prototyping

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 2

Page 3: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

The igraph software package

• R package, Python extension and C library.

Page 4: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

The igraph software package

• R package, Python extension and C library.

• Under active development.

Page 5: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

The igraph software package

• R package, Python extension and C library.

• Under active development.

• Free for academic and commercial use (GPL). “Standing on the shoulder of giants.”

Page 6: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

The igraph software package

• R package, Python extension and C library.

• Under active development.

• Free for academic and commercial use (GPL). “Standing on the shoulder of giants.”

• State of the art data structures and algorithms, works well with large graphs.

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 3

Page 7: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

How LARGE?

• Well, it depends what you want to calculate.

Page 8: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

How LARGE?

• Well, it depends what you want to calculate.

• Just to create and manipulate it, it is enough if it fits into the memory.

Page 9: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

How LARGE?

• Well, it depends what you want to calculate.

• Just to create and manipulate it, it is enough if it fits into the memory.

• How do I know that it fits into the memory?

Page 10: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

How LARGE?

• Well, it depends what you want to calculate.

• Just to create and manipulate it, it is enough if it fits into the memory.

• How do I know that it fits into the memory?

• igraph (typically) needs 32 bytes per edge and 16 bytes per vertex.

Page 11: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

How LARGE?

• Well, it depends what you want to calculate.

• Just to create and manipulate it, it is enough if it fits into the memory.

• How do I know that it fits into the memory?

• igraph (typically) needs 32 bytes per edge and 16 bytes per vertex.

• A graph with one million vertices and ten million edges needs about 320 Mbytes.

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 4

Page 12: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Installation

1 install.packages("igraph")

It is really that simple. Isn’t it?

Page 13: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Installation

1 install.packages("igraph")

It is really that simple. Isn’t it?

You might also need

1 install.packages("digest")

2 install.packages("rgl")

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 5

Page 14: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

How to follow this “lecture”?

1. Go to http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html and copy &paste everything into your R session. You can skip any example if you wish to.

Page 15: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

How to follow this “lecture”?

1. Go to http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html and copy &paste everything into your R session. You can skip any example if you wish to.

2. You type in everything I type in.

Page 16: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

How to follow this “lecture”?

1. Go to http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html and copy &paste everything into your R session. You can skip any example if you wish to.

2. You type in everything I type in.

3. Sit back and watch. You can download the slides/code anyway.

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 6

Page 17: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

The igraph data model

• Binary relation (=edges) between elements of a set (=vertices).

Page 18: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

The igraph data model

• Binary relation (=edges) between elements of a set (=vertices).

• If the pairs are unordered, then the graph is undirected:

vertices = {A,B,C,D,E}edges = ({A,B}, {A,C}, {B,C}, {C,E}).

A

B

C

E

D

Page 19: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

The igraph data model

• Binary relation (=edges) between elements of a set (=vertices).

• If the pairs are unordered, then the graph is undirected:

vertices = {A,B,C,D,E}edges = ({A,B}, {A,C}, {B,C}, {C,E}).

A

B

C

E

D

• Otherwise it is directed:

vertices = {A,B,C,D,E}edges = ((A,B), (A,C), (B,C), (C,E)).

A

B

C

E

D

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 7

Page 20: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Vertex and edge ids

• Vertices are always numbered from 0.• Numbering is continual, form 0 to n−1.

Page 21: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Vertex and edge ids

• Vertices are always numbered from 0.• Numbering is continual, form 0 to n−1.

• We have to translate vertex names toids:

V = {A,B,C,D,E}E = ((A,B), (A,C), (B,C), (C,E)).

A = 0, B = 1, C = 2, D = 3, E = 4.

Page 22: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Vertex and edge ids

• Vertices are always numbered from 0.• Numbering is continual, form 0 to n−1.

• We have to translate vertex names toids:

V = {A,B,C,D,E}E = ((A,B), (A,C), (B,C), (C,E)).

A = 0, B = 1, C = 2, D = 3, E = 4.

1 library(igraph)

2 g <- graph( c(0,1, 0,2, 1,2, 2,4), n=5 )

3 g

4 g2 <- graph( c(0,1, 0,2, 1,2, 2,4),

5 n=5, dir=FALSE )

6 g2

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 8

Page 23: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Working with igraph graphs

1 ## How to decide what kind of object a variable refers to

2 class(g2)

3 class(1)

4 class("foobar")

5

6 ## Is this object an igraph graph?

7 is.igraph(g)

8 is.igraph(1:10)

9

10 ## Summary, number of vertices, edges

11 summary(g)

12 vcount(g)

13 ecount(g)

14

15 ## Is the graph directed?

16 is.directed(g)

17 is.directed(g2)

18 is.directed(1:10)

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 9

Page 24: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Working with igraph graphs

1 ## Convert from directed to undirected

2 as.undirected(g)

3

4 ## And back

5 as.directed(as.undirected(g))

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 10

Page 25: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

The igraph data model, multiple edges

• igraph can handle multi-graphs:

V = {A,B,C,D,E}E = ((AB), (AB), (AC), (BC), (CE)).

1 g <- graph( c(0,1,0,1, 0,2, 1,2, 3,4), n=5 )

2 g

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 11

Page 26: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

The igraph data model, loop edges

• igraph can handle loop-edges:

V = {A,B,C,D,E}E = ((AA), (AB), (AC), (BC), (CE)).

1 g <- graph( c(0,0,0,1, 0,2, 1,2, 3,4), n=5 )

2 g

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 12

Page 27: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

The igraph data model, what cannot be represented

• “Mixed” graphs, with undirected and directed edges.

• Hypergraphs.

• No direct support for bipartite (two-mode) graphs.

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 13

Page 28: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Naming vertices

1 g <- graph.ring(10)

2 V(g)$name <- letters[1:10]

3 V(g)$name

4 g

5 print(g, v=T)

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 14

Page 29: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Creating graphs, the formula interface

1 # A simple undirected graph

2 g <- graph.formula( Alice-Bob-Cecil-Alice,

3 Daniel-Cecil-Eugene, Cecil-Gordon )

Page 30: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Creating graphs, the formula interface

1 # A simple undirected graph

2 g <- graph.formula( Alice-Bob-Cecil-Alice,

3 Daniel-Cecil-Eugene, Cecil-Gordon )

1 # Another undirected graph, ":" notation

2 g2 <- graph.formula( Alice-Bob:Cecil:Daniel,

3 Cecil:Daniel-Eugene:Gordon )

Page 31: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Creating graphs, the formula interface

1 # A simple undirected graph

2 g <- graph.formula( Alice-Bob-Cecil-Alice,

3 Daniel-Cecil-Eugene, Cecil-Gordon )

1 # Another undirected graph, ":" notation

2 g2 <- graph.formula( Alice-Bob:Cecil:Daniel,

3 Cecil:Daniel-Eugene:Gordon )

1 # A directed graph

2 g3 <- graph.formula( Alice +-+ Bob --+ Cecil

3 +-- Daniel, Eugene --+ Gordon:Helen )

Page 32: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Creating graphs, the formula interface

1 # A simple undirected graph

2 g <- graph.formula( Alice-Bob-Cecil-Alice,

3 Daniel-Cecil-Eugene, Cecil-Gordon )

1 # Another undirected graph, ":" notation

2 g2 <- graph.formula( Alice-Bob:Cecil:Daniel,

3 Cecil:Daniel-Eugene:Gordon )

1 # A directed graph

2 g3 <- graph.formula( Alice +-+ Bob --+ Cecil

3 +-- Daniel, Eugene --+ Gordon:Helen )

1 # A graph with isolate vertices

2 g4 <- graph.formula( Alice -- Bob -- Daniel,

3 Cecil:Gordon, Helen )

Page 33: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Creating graphs, the formula interface

1 # A simple undirected graph

2 g <- graph.formula( Alice-Bob-Cecil-Alice,

3 Daniel-Cecil-Eugene, Cecil-Gordon )

1 # Another undirected graph, ":" notation

2 g2 <- graph.formula( Alice-Bob:Cecil:Daniel,

3 Cecil:Daniel-Eugene:Gordon )

1 # A directed graph

2 g3 <- graph.formula( Alice +-+ Bob --+ Cecil

3 +-- Daniel, Eugene --+ Gordon:Helen )

1 # A graph with isolate vertices

2 g4 <- graph.formula( Alice -- Bob -- Daniel,

3 Cecil:Gordon, Helen )

1 # "Arrows" can be arbitrarily long

2 g5 <- graph.formula( Alice +---------+ Bob )

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 15

Page 34: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Creating graphs, from edge lists and adjacency matrices

1 ## From edge lists

2 el <- cbind( c(0, 0, 1, 2),

3 c(1, 2, 2, 4) )

4 g <- graph.edgelist(el)

5 g

6

7 ## Symbolic edge lists

8 el <- cbind( c("Alice", "Alice", "Bob", "Cecil"),

9 c("Bob", "Cecil", "Cecil", "Ed") )

10 g <- graph.edgelist(el)

11 g

12 summary(g)

13

14 ## Adjacency matrices

15 A <- matrix(sample(0:1, 100, rep=TRUE), 10, 10)

16 g <- graph.adjacency(A)

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 16

Page 35: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Creating graphs, from data frames

1 source("http://cneurocvs.rmki.kfki.hu/igraph/plus.R")

2 vertices <- read.csv("judicial.csv")

3 edges <- read.table("allcites.txt")

4 jg <- graph.data.frame(edges, vertices=vertices, dir=TRUE)

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 17

Page 36: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Visualizing graphs

• plot Uses traditional R graphics,non-interactive, 2d. Publicationquality plots in all formats Rsupports.

Page 37: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Visualizing graphs

• plot Uses traditional R graphics,non-interactive, 2d. Publicationquality plots in all formats Rsupports.

1 g <- barabasi.game(100, m=1)

2 g <- simplify(g)

3 igraph.par("plot.layout",

4 layout.fruchterman.reingold)

5 plot(g, vertex.size=3, vertex.label=NA,

6 edge.arrow.size=0.6)

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 18

Page 38: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Visualizing graphs

• tkplot Uses Tcl/Tk via thetcltk package, interactive, 2d.

Page 39: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Visualizing graphs

• tkplot Uses Tcl/Tk via thetcltk package, interactive, 2d.

1 id <- tkplot(g, vertex.size=3, vertex.label=NA,

2 edge.arrow.size=0.6)

3 coords <- tkplot.getcoords(id)

Page 40: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Visualizing graphs

• tkplot Uses Tcl/Tk via thetcltk package, interactive, 2d.

1 id <- tkplot(g, vertex.size=3, vertex.label=NA,

2 edge.arrow.size=0.6)

3 coords <- tkplot.getcoords(id)

• rglplot Needs the rgl package.

Page 41: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Visualizing graphs

• tkplot Uses Tcl/Tk via thetcltk package, interactive, 2d.

1 id <- tkplot(g, vertex.size=3, vertex.label=NA,

2 edge.arrow.size=0.6)

3 coords <- tkplot.getcoords(id)

• rglplot Needs the rgl package.

1 rglplot(g, vertex.size=3, vertex.label=NA)

2

3 coords <- layout.kamada.kawai(g, dim=3)

4 rglplot(g, vertex.size=3, vertex.label=NA,

5 layout=coords)

Page 42: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Visualizing graphs

• tkplot Uses Tcl/Tk via thetcltk package, interactive, 2d.

1 id <- tkplot(g, vertex.size=3, vertex.label=NA,

2 edge.arrow.size=0.6)

3 coords <- tkplot.getcoords(id)

• rglplot Needs the rgl package.

1 rglplot(g, vertex.size=3, vertex.label=NA)

2

3 coords <- layout.kamada.kawai(g, dim=3)

4 rglplot(g, vertex.size=3, vertex.label=NA,

5 layout=coords)

• (Almost) identical interfaces.

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 19

Page 43: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Graph/vertex/edge attributes

• Assigning/querying attributes:set.graph.attribute, get.graph.attribute,

set.vertex.attribute, get.vertex.attribute,

set.edge.attribute, get.edge.attribute,

list.graph.attributes,

list.vertex.attributes,

list.edge.attributes.

Page 44: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Graph/vertex/edge attributes

• Assigning/querying attributes:set.graph.attribute, get.graph.attribute,

set.vertex.attribute, get.vertex.attribute,

set.edge.attribute, get.edge.attribute,

list.graph.attributes,

list.vertex.attributes,

list.edge.attributes.

• V(g) and E(g).

Page 45: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Graph/vertex/edge attributes

• Assigning/querying attributes:set.graph.attribute, get.graph.attribute,

set.vertex.attribute, get.vertex.attribute,

set.edge.attribute, get.edge.attribute,

list.graph.attributes,

list.vertex.attributes,

list.edge.attributes.

• V(g) and E(g).

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 20

Page 46: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Graph/vertex/edge attributes

1 ## Load the jurisdiction network

2 load("judicial.Rdata.gz")

3

4 ## If we don’t have it then create it again

5 if (!exists("jg")) {

6 source("http://cneurocvs.rmki.kfki.hu/igraph/plus.R")

7 vertices <- read.csv("http://cneurocvs.rmki.kfki.hu/igraph/judicial.csv")

8 edges <- read.table("http://cneurocvs.rmki.kfki.hu/igraph/allcites.txt")

9 jg <- graph.data.frame(edges, vertices=vertices, dir=TRUE)

10 }

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 21

Page 47: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Graph/vertex/edge attributes

1 ## What do we have?

2 summary(jg)

3 V(jg)$year

4 V(jg)$parties

5

6 ## Select vertices based on attributes

7 V(jg) [ year >= 1990 ]

8 V(jg) [ overruled!=0 ]

9

10 ## Group network measures based on attributes

11 deg.per.year <- tapply(degree(jg, mode="out"),

12 V(jg)$year, mean)

13

14 ## Plot it

15 plot( names(deg.per.year), deg.per.year )

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 22

Page 48: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Smart indexing

• Easy access of attributes:

1 g <- erdos.renyi.game(100, 1/100)

2 V(g)$color <- sample( c("red", "black"),

3 vcount(g), rep=TRUE)

4 E(g)$color <- "grey"

5 red <- V(g)[ color == "red" ]

6 bl <- V(g)[ color == "black" ]

7 E(g)[ red %--% red ]$color <- "red"

8 E(g)[ bl %--% bl ]$color <- "black"

9 plot(g, vertex.size=5, layout=

10 layout.fruchterman.reingold)

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 23

Page 49: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Centrality, the network

1 g <- graph.formula( Andre----Beverly:Diane:Fernando:Carol,

2 Beverly--Andre:Diane:Garth:Ed,

3 Carol----Andre:Diane:Fernando,

4 Diane----Andre:Carol:Fernando:Garth:Ed:Beverly,

5 Ed-------Beverly:Diane:Garth,

6 Fernando-Carol:Andre:Diane:Garth:Heather,

7 Garth----Ed:Beverly:Diane:Fernando:Heather,

8 Heather--Fernando:Garth:Ike,

9 Ike------Heather:Jane,

10 Jane-----Ike )

11 g <- simplify(g)

12 coords <- c(5,5,119,256,119,256,120,340,478,

13 622,116,330,231,116,5,330,451,231,231,231)

14 coords <- matrix(coords, nc=2)

15 V(g)$label <- V(g)$name

16 g$layout <- coords # $

17 plot(g, asp=FALSE, vertex.label.color="blue", vertex.label.cex=1.5,

18 vertex.label.font=2, vertex.size=20, vertex.color="white",

19 vertex.frame.color="white", edge.color="black")

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 24

Page 50: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Centrality, the network

Andre

Beverly

Diane

Fernando

Carol

Garth

Ed

Heather Ike Jane

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 25

Page 51: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Centrality, degree

Andre

Beverly

Diane

Fernando

Carol

Garth

Ed

Heather Ike Jane

Number of adjacent edges.

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 26

Page 52: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Centrality, degree

Andre4

Beverly4

Diane6

Fernando5

Carol3

Garth5

Ed3

Heather3

Ike2

Jane1

Number of adjacent edges.

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 27

Page 53: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Centrality, closeness

Andre

Beverly

Diane

Fernando

Carol

Garth

Ed

Heather Ike Jane

Reciproc of the average distance to other vertices.

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 28

Page 54: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Centrality, closeness

Andre0.53

Beverly0.53

Diane0.6

Fernando0.64

Carol0.5

Garth0.64

Ed0.5

Heather0.6

Ike0.43

Jane0.31

Reciproc of the average distance to other vertices.

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 29

Page 55: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Centrality, betweenness

Andre

Beverly

Diane

Fernando

Carol

Garth

Ed

Heather Ike Jane

Number of shortest paths going through a vertex.

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 30

Page 56: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Centrality, betweenness

Andre0.83

Beverly0.83

Diane3.67

Fernando8.33

Carol0

Garth8.33

Ed0

Heather14

Ike8

Jane0

Number of shortest paths going through a vertex.

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 31

Page 57: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Centrality, eigenvector centrality

Andre

Beverly

Diane

Fernando

Carol

Garth

Ed

Heather Ike Jane

Number of adjacent edges, weighted by their “goodness”.

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 32

Page 58: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Centrality, eigenvector centrality

Andre0.73

Beverly0.73

Diane1

Fernando0.83

Carol0.59

Garth0.83

Ed0.59

Heather0.41

Ike0.1

Jane0.02

Number of adjacent edges, weighted by their “goodness”.

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 33

Page 59: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Centrality, Burt’s constraint

Andre

Beverly

Diane

Fernando

Carol

Garth

Ed

Heather Ike Jane

Benefit of brokering between other actors.

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 34

Page 60: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Centrality, Burt’s constraint

Andre0.54

Beverly0.54

Diane0.47

Fernando0.45

Carol0.66

Garth0.45

Ed0.66

Heather0.43

Ike0.5

Jane1

Benefit of brokering between other actors.

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 35

Page 61: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Centrality, transitivity

Andre

Beverly

Diane

Fernando

Carol

Garth

Ed

Heather Ike Jane

Are my friends also friends of each other?

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 36

Page 62: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Centrality, transitivity

Andre0.67

Beverly0.67

Diane0.53

Fernando0.5

Carol1

Garth0.5

Ed1

Heather0.33

Ike0

JaneNaN

Are my friends also friends of each other?

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 37

Page 63: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Sensitivity of centrality measures

Andre

Beverly

Diane

Fernando

Carol

Garth

Ed

Heather Ike Jane

Remove two random edges, add two random edges.

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 38

Page 64: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Sensitivity of centrality measures

1 cl <- numeric()

2 for (i in 1:100) {

3 g2 <- delete.edges(g, sample(ecount(g), 2)-1)

4 g2 <- g2 %u%

5 random.graph.game(vcount(g), 2, type="gnm")

6 cl <- cbind(cl, betweenness(g2))

7 }

8 cl <- as.data.frame(t(cl))

9 colnames(cl) <- V(g)$name # $

10 boxplot(cl, main="Sensitivity of betweenness")

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 39

Page 65: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Sensitivity of centrality measures

Andre Beverly Diane Fernando Carol Garth Ed Heather Ike Jane

05

10

15

20

Sensitivity of betweenness

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 40

Page 66: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Working with a (moderately) large graph

1 library(igraph)

2

3 ## Load the jurisdiction network

4 load("judicial.Rdata.gz")

5

6 ## If we don’t have it then create it again

7 if (!exists("jg")) {

8 source("http://cneurocvs.rmki.kfki.hu/igraph/plus.R")

9 vertices <- read.csv("http://cneurocvs.rmki.kfki.hu/igraph/judicial.csv")

10 edges <- read.table("http://cneurocvs.rmki.kfki.hu/igraph/allcites.txt")

11 jg <- graph.data.frame(edges, vertices=vertices, dir=TRUE)

12 }

13

14 ## Basic data

15 summary(jg)

16

17 ## Is it a simple graph?

18 is.simple(jg)

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 41

Page 67: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Working with a (moderately) large graph

1 ## Is it connected?

2 is.connected(jg)

3

4 ## How many components?

5 no.clusters(jg)

6

7 ## How big are these?

8 table(clusters(jg)$csize) #$

9

10 ## In-degree distribution

11 plot(degree.distribution(jg, mode="in"), log="xy")

12

13 ## Out-degree distribution

14 plot(degree.distribution(jg, mode="out"), log="xy")

15

16 ## Largest in- and out-degree, total degree

17 max(degree(jg, mode="in"))

18 max(degree(jg, mode="out"))

19 max(degree(jg, mode="all"))

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 42

Page 68: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Working with a (moderately) large graph

1 ## Density

2 graph.density(jg)

3

4 ## Transitivity

5 transitivity(jg)

6

7 ## Transitivity of a random graph of the same size

8 g <- erdos.renyi.game(vcount(jg), ecount(jg), type="gnm")

9 transitivity(g)

10

11 ## Dyad census

12 dyad.census(jg)

13

14 ## Triad census

15 triad.census(jg)

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 43

Page 69: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Working with a (moderately) large graph

1 ## Authority and Hub scores

2 authority.score(jg)$vector

3 cor(authority.score(jg)$vector, V(jg)$auth)

4

5 hub.score(jg)$vector

6 cor(hub.score(jg)$vector, V(jg)$hub)

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 44

Page 70: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Big table of “what can be run”

Fast (millions) creating graphs (most of the time) • structural modification(add/delete edges/vertices) • subgraph • simplify •graph.decompose • degree • clusters • graph.density • is.simple,is.loop, is.multiple • articulation points and biconnected components• ARPACK stuff: page.rank, hub.score, authority.score, evcent• transitivity • Burt’s constraint • dyad & triad census, graphmotifs • k-cores • MST • reciprocity • modularity • closeness and(edge) betweenness estimation • shortest paths from one source• generating Gn,p and Gn,m graphs • generating PA graphs withvarious PA exponents • topological sort

Slow (10000) closeness • diameter • betweenness • all-pairs shortest paths,average path length • most layout generators •

Very slow (100) cliques • cohesive blocks • edge/vertex connectivity • maximumflows and minimum cuts • bonpow • alpha centrality • (sub)graphisomorphism

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 45

Page 71: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Why is igraph sooooo slow?

cliques and independent vertex sets Hard problem

cohesive blocks Semi-hard problem

edge/vertex connectivity, maximum flowsand minimum cuts

Semi-hard problem

Bonacich’s power centrality, alphacentrality

Poor implementation

(sub)graph isomorphism Hard problem

anything else contact us if you want to speed it up

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 46

Page 72: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Connection to other network/graph software

• sna and network R packages. Currently throught adjacency matrices. Usenamespaces!

Page 73: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Connection to other network/graph software

• sna and network R packages. Currently throught adjacency matrices. Usenamespaces!

• Pajek. .net file format is supported.

Page 74: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Connection to other network/graph software

• sna and network R packages. Currently throught adjacency matrices. Usenamespaces!

• Pajek. .net file format is supported.

1 g <- read.graph("http://vlado.fmf.uni-lj.si/pub/networks/data/",

2 format="pajek")

Page 75: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Connection to other network/graph software

• sna and network R packages. Currently throught adjacency matrices. Usenamespaces!

• Pajek. .net file format is supported.

1 g <- read.graph("http://vlado.fmf.uni-lj.si/pub/networks/data/",

2 format="pajek")

• Visone. Use GraphML format.

Page 76: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Connection to other network/graph software

• sna and network R packages. Currently throught adjacency matrices. Usenamespaces!

• Pajek. .net file format is supported.

1 g <- read.graph("http://vlado.fmf.uni-lj.si/pub/networks/data/",

2 format="pajek")

• Visone. Use GraphML format.

• Cytoscape. Use GML format.

Page 77: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Connection to other network/graph software

• sna and network R packages. Currently throught adjacency matrices. Usenamespaces!

• Pajek. .net file format is supported.

1 g <- read.graph("http://vlado.fmf.uni-lj.si/pub/networks/data/",

2 format="pajek")

• Visone. Use GraphML format.

• Cytoscape. Use GML format.

• GraphViz. igraph can write .dot files.

Page 78: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Connection to other network/graph software

• sna and network R packages. Currently throught adjacency matrices. Usenamespaces!

• Pajek. .net file format is supported.

1 g <- read.graph("http://vlado.fmf.uni-lj.si/pub/networks/data/",

2 format="pajek")

• Visone. Use GraphML format.

• Cytoscape. Use GML format.

• GraphViz. igraph can write .dot files.

• In general. The GraphML and GML file formats are fully supported, many programscan read/write these.

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 47

Page 79: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Community structure detection

(Modularity and community structure in networks, by Mark Newman, 2006)

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 48

Page 80: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Modularity score

• How to define what is modular?Many proposed definitions, here isa popular one:

Q =1

2|E|∑vw

[Avw−pvw]δ(cv, cw).

Page 81: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Modularity score

• How to define what is modular?Many proposed definitions, here isa popular one:

Q =1

2|E|∑vw

[Avw−pvw]δ(cv, cw).

• Random graph null model:

pvw = p =1

|V |(|V | − 1)

Page 82: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Modularity score

• How to define what is modular?Many proposed definitions, here isa popular one:

Q =1

2|E|∑vw

[Avw−pvw]δ(cv, cw).

• Random graph null model:

pvw = p =1

|V |(|V | − 1)

• Degree sequence based null model:

pvw =kvkw

2|E|

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 49

Page 83: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

“Fast-greedy” algorithm

A Clauset, MEJ Newman, C Moore: Finding community structure in very large networks,http://www.arxiv.org/abs/cond-mat/0408187

01

23

45

6

24

25

23

27

28

31

9

8

2

13

26

29

30

22

20

15

14

18

32

33

16

5●

6●

4

10

11

12

17

21

1

3

7

0

19

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 50

Page 84: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

“Spinglass” algorithm

J. Reichardt and S. Bornholdt: Statistical Mechanics of Community Detection, Phys. Rev.E, 74, 016110 (2006), http://arxiv.org/abs/cond-mat/0603718

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 51

Page 85: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Cohesive blocks

• ‘Structural Cohesion and Embeddedness: a Hierarchical Concept of Social Groups’ byJ.Moody and D.White, Americal Sociological Review, 68, 103–127, 2003

Page 86: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Cohesive blocks

• ‘Structural Cohesion and Embeddedness: a Hierarchical Concept of Social Groups’ byJ.Moody and D.White, Americal Sociological Review, 68, 103–127, 2003

• Definition 1: A collectivity is structurally cohesive to the extent that the socialrelations of its members hold it together.

Page 87: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Cohesive blocks

• ‘Structural Cohesion and Embeddedness: a Hierarchical Concept of Social Groups’ byJ.Moody and D.White, Americal Sociological Review, 68, 103–127, 2003

• Definition 1: A collectivity is structurally cohesive to the extent that the socialrelations of its members hold it together.

• Definition 2: A group is structurally cohesive to the extent that multiple independentrelational paths among all pairs of members hold it together.

Page 88: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Cohesive blocks

• ‘Structural Cohesion and Embeddedness: a Hierarchical Concept of Social Groups’ byJ.Moody and D.White, Americal Sociological Review, 68, 103–127, 2003

• Definition 1: A collectivity is structurally cohesive to the extent that the socialrelations of its members hold it together.

• Definition 2: A group is structurally cohesive to the extent that multiple independentrelational paths among all pairs of members hold it together.

• Vertex-independent paths and vertex connectivity.

Page 89: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Cohesive blocks

• ‘Structural Cohesion and Embeddedness: a Hierarchical Concept of Social Groups’ byJ.Moody and D.White, Americal Sociological Review, 68, 103–127, 2003

• Definition 1: A collectivity is structurally cohesive to the extent that the socialrelations of its members hold it together.

• Definition 2: A group is structurally cohesive to the extent that multiple independentrelational paths among all pairs of members hold it together.

• Vertex-independent paths and vertex connectivity.

• Vertex connectivity and network flows.

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 52

Page 90: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Cohesive blocks

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 53

Page 91: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Cohesive blocks

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 54

Page 92: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Cohesive blocks

1 cb <- graph( c(1,2,1,3,1,4,1,5,1,6,

2 2,3,2,4,2,5,2,7,

3 3,4,3,6,3,7,

4 4,5,4,6,4,7,

5 5,6,5,7,5,21,

6 6,7,

7 7,8,7,11,7,14,7,19,

8 8,9,8,11,8,14,

9 9,10,

10 10,12,10,13,

11 11,12,11,14,

12 12,16, 13,16, 14,15, 15,16,

13 17,18,17,19,17,20,

14 18,20,18,21,

15 19,20,19,22,19,23,

16 20,21, 21,22,21,23,

17 22,23)-1, dir=FALSE)

18

19 V(cb)$label <- seq(vcount(cb)) # $

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 55

Page 93: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Cohesive blocks

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 56

Page 94: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Cohesive blocks

1 blocks <- cohesive.blocks(cb)

2 blocks

3

4 summary(blocks)

5 blocks$blocks

6 lapply(blocks$blocks, "+", 1)

7 blocks$block.cohesion #$

8 plot(blocks, layout=layout.kamada.kawai,

9 vertex.label.cex=2, vertex.size=15,

10 vertex.label.color="black")

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 57

Page 95: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Rapid prototyping

Weighted transitivity

c(i) =A3

ii

(A1A)ii

Page 96: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Rapid prototyping

Weighted transitivity

c(i) =A3

ii

(A1A)ii

cw(i) =W3

ii

(WWmaxW)ii

Page 97: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Rapid prototyping

Weighted transitivity

c(i) =A3

ii

(A1A)ii

cw(i) =W3

ii

(WWmaxW)ii

1 wtrans <- function(g) {

2 W <- get.adjacency(g, attr="weight")

3 WM <- matrix(max(W), nrow(W), ncol(W))

4 diag(WM) <- 0

5 diag( W %*% W %*% W ) /

6 diag( W %*% WM %*% W)

7 }

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 58

Page 98: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Rapid prototyping

Clique percolation, Palla et al.,Nature 435 814–818, 2005

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 59

Page 99: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Rapid prototyping, clique percolation

1 clique.community <- function(graph, k) {

2 clq <- cliques(graph, min=k, max=k)

3 edges <- c()

4 for (i in seq(along=clq)) {

5 for (j in seq(along=clq)) {

6 if ( length(unique(c(clq[[i]],

7 clq[[j]]))) == k+1 ) {

8 edges <- c(edges, c(i,j)-1)

9 }

10 }

11 }

12 clq.graph <- simplify(graph(edges))

13 V(clq.graph)$name <-

14 seq(length=vcount(clq.graph))

15 comps <- decompose.graph(clq.graph)

16

17 lapply(comps, function(x) {

18 unique(unlist(clq[ V(x)$name ]))

19 })

20 }

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 60

Page 100: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

Acknowledgement

Tamas Nepusz

Peter McMahan, the BLISS, Walktrap and Spinglass projects

All the people who contributed code, sent bug reports, suggestions

The R project

Hungarian Academy of Sciences

The OSS community in general

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 61

Page 101: Large-scale network analysis - cneuro.rmki.kfki.hucneurocvs.rmki.kfki.hu/igraph/igraph-final.pdfLarge-scale network analysis G abor Cs ardi csardi@rmki.kfki.hu Department of Biophysics,

More information

http://igraph.sf.net

Please send your comments, questions, feature requests, code (!) to the igraph-help

mailing list. (See Community on the homepage.)

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 62