Lecture 3: Mathematics of Networks

61
Lecture 3: Mathematics of Networks CS 765: Complex Networks Slides are modified from Networks: Theory and Application by Lada Adamic

description

Lecture 3: Mathematics of Networks. CS 765: Complex Networks. Slides are modified from Networks: Theory and Application by Lada Adamic. What are networks?. Networks are collections of points joined by lines. “ Network ” ≡ “ Graph ”. node. edge. Network elements: edges. - PowerPoint PPT Presentation

Transcript of Lecture 3: Mathematics of Networks

Page 1: Lecture 3: Mathematics of Networks

Lecture 3:

Mathematics of Networks

CS 765: Complex Networks

Slides are modified from Networks: Theory and Application by Lada Adamic

Page 2: Lecture 3: Mathematics of Networks

What are networks?

Networks are collections of points joined by lines.

“Network” ≡ “Graph”

points lines Domain

vertices edges, arcs math

nodes links computer science

sites bonds physics

actors ties, relations sociology

node

edge

2

Page 3: Lecture 3: Mathematics of Networks

Network elements: edges

Directed (also called arcs) A -> B (EBA)

A likes B, A gave a gift to B, A is B’s child

Undirected A <-> B or A – B

A and B like each other A and B are siblings A and B are co-authors

Edge attributes weight (e.g. frequency of communication) ranking (best friend, second best friend…) type (friend, relative, co-worker) properties depending on the structure of the rest of the graph: e.g.

betweenness Multiedge: multiple edges between two pair of nodes Self-edge: from a node to itself

3

Page 4: Lecture 3: Mathematics of Networks

Directed networks

2

1

1

2

1

2

1

2

1

2

21

1

2

1

2

1

2

12

1

2

1

2

1

2

1

21

2 1

2

1

2

12 1

2

1

2

12

1

2

12

1

2

1 2

12

Ada

Cora

Louise

Jean

Helen

Martha

Alice

Robin

Marion

Maxine

Lena

Hazel Hilda

Frances

Eva

RuthEdna

Adele

Jane

Anna

Mary

Betty

Ella

Ellen

Laura

Irene

girls’ school dormitory dining-table partners (Moreno, The sociometry reader, 1960)

first and second choices shown

4

Page 5: Lecture 3: Mathematics of Networks

Edge weights can have positive or negative values

One gene activates/ inhibits another

One person trusting/ distrusting another

Research challenge: How does one

‘propagate’ negative feelings in a social network?

Is my enemy’s enemy my friend?

Transcription regulatory network in baker’s yeast

5

Page 6: Lecture 3: Mathematics of Networks

Adjacency matrices

Representing edges (who is adjacent to whom) as a matrix Aij = 1 if node i has an edge to node j

= 0 if node i does not have an edge to j

Aii = 0 unless the network has self-loops

If self-loop, Aii=1

Aij = Aji if the network is undirected,or if i and j share a reciprocated edge

ij

i

ij

1

2

3

4

Example:

5

0 0 0 0 0

0 0 1 1 0

0 1 0 1 0

0 0 0 0 1

1 1 0 0 0

A =

6

Page 7: Lecture 3: Mathematics of Networks

Adjacency lists

Edge list 2 3 2 4 3 2 3 4 4 5 5 2 5 1

Adjacency list is easier to work with if network is

large sparse

quickly retrieve all neighbors for a node 1: 2: 3 4 3: 2 4 4: 5 5: 1 2

1

2

3

45

7

Page 8: Lecture 3: Mathematics of Networks

Nodes

Node network properties from immediate connections

indegreehow many directed edges (arcs) are incident on a node

outdegreehow many directed edges (arcs) originate at a node

degree (in or out)number of edges incident on a node

outdegree=2

indegree=3

degree=5

8

Page 9: Lecture 3: Mathematics of Networks

HyperGraphs

Edges join more than two nodes at a time (hyperEdge)

Affliation networks

Examples Families Subnetworks

Can be transformed to a bipartite network

9

C D

A B

C D

A B

Page 10: Lecture 3: Mathematics of Networks

Bipartite (two-mode) networks

edges occur only between two groups of nodes, not within those groups

for example, we may have individuals and events directors and boards of directors customers and the items they purchase metabolites and the reactions they participate in

Page 11: Lecture 3: Mathematics of Networks

in matrix notation

Bij = 1 if node i from the first group

links to node j from the second group = 0 otherwise

B is usually not a square matrix! for example: we have n customers and m products

i

j

1 0 0 0

1 0 0 0

1 1 0 0

1 1 1 1

0 0 0 1

B =

Page 12: Lecture 3: Mathematics of Networks

going from a bipartite to a one-mode graph

One mode projection two nodes from the first group

are connected if they link to the same node in the second group

naturally high occurrence of cliques

some loss of information Can use weighted edges to

preserve group occurrences

Two-mode networkgroup 1

group 2

Page 13: Lecture 3: Mathematics of Networks

Collapsing to a one-mode network

i and j are linked if they both link to k Pij = k Bik Bjk

P’ = B BT

the transpose of a matrix swaps Bxy and Byx

if B is an nxm matrix, BT is an mxn matrix

i

k=1

j

k=2

B = BT =

1 0 0 0

1 0 0 0

1 1 0 0

1 1 1 1

0 0 0 1

1 1 1 1 0

0 0 1 1 0

0 0 0 1 0

0 0 0 1 1

Page 14: Lecture 3: Mathematics of Networks

Matrix multiplication

general formula for matrix multiplication Zij= k Xik Ykj

let Z = P’, X = B, Y = BT

1 0 0 0

1 0 0 0

1 1 0 0

1 1 1 1

0 0 0 1

P’ =

1 1 1 1 0

0 0 1 1 0

0 0 0 1 0

0 0 0 1 1

=

1 1 1 1 0

1 1 1 1 0

1 1 2 2 0

1 1 2 4 1

0 0 0 1 1

1 1

1 2

11 1 1 1 1

1

0

0

= 1*1+1*1 + 1*0 + 1*0= 2

Page 15: Lecture 3: Mathematics of Networks

Collapsing a two-mode network to a one mode-network

Assume the nodes in group 1 are people and the nodes in group 2 are movies

P’ is symmetric The diagonal entries of P’ give the number of movies

each person has seen The off-diagonal elements of P’ give the number of

movies that both people have seen

P’ =

1 1 1 1 0

1 1 1 1 0

1 1 2 2 0

1 1 2 4 1

0 0 0 1 1

1 1

1 2

1

Page 16: Lecture 3: Mathematics of Networks

Trees

Trees are undirected graphs that contain no cycles

For n nodes, number of edges m = n-1 Any node can be dedicated as the root

Page 17: Lecture 3: Mathematics of Networks

examples of trees

In nature trees river networks arteries (or veins, but not both)

Man made sewer system

Computer science binary search trees decision trees (AI)

Network analysis minimum spanning trees

from one node – how to reach all other nodes most quickly may not be unique, because shortest paths are not always unique depends on weight of edges

Page 18: Lecture 3: Mathematics of Networks

Planar graphs

A graph is planar if it can be drawn on a plane without any edges crossing

Page 19: Lecture 3: Mathematics of Networks

Cliques and complete graphs

Kn is the complete graph (clique) with K vertices each vertex is connected to every other vertex there are n*(n-1)/2 undirected edges

K5 K8K3

Page 20: Lecture 3: Mathematics of Networks

Kuratowski’s theorem

Every non-planar network contains at least one subgraph that is an expansion of K5 or K3,3.

K5 K3,3

Expansion: Addition of new node in the middle of edges.

Research challenge: Degree of planarity?

20

Page 21: Lecture 3: Mathematics of Networks

#s of planar graphs of different sizes

1:1

2:2

3:4

4:11

Every planar graph

has a straight line

embedding

Page 22: Lecture 3: Mathematics of Networks

Edge contractions defined

A finite graph G is planar if and only if it has no subgraph that is homeomorphic or edge-contractible to the complete graph in five vertices (K5) or the complete bipartite graph K3, 3. (Kuratowski's Theorem)

Page 23: Lecture 3: Mathematics of Networks

Peterson graph

Example of using edge contractions to show a graph is not planar

Page 24: Lecture 3: Mathematics of Networks

Bi-cliques (cliques in bipartite graphs)

Km,n is the complete bipartite graph with m and n vertices of the two different types

K3,3 maps to the utility graph Is there a way to connect three utilities, e.g. gas, water, electricity to

three houses without having any of the pipes cross?

K3,3

Utility graph

Page 25: Lecture 3: Mathematics of Networks

Node degree

Outdegree =0 0 0 0 0

0 0 1 1 0

0 1 0 1 0

0 0 0 0 1

1 1 0 0 0

A =

n

jijA

1

example: outdegree for node 3 is 2, which we obtain by summing the number of non-zero entries in the 3rd row

Indegree =0 0 0 0 0

0 0 1 1 0

0 1 0 1 0

0 0 0 0 1

1 1 0 0 0

A =

n

iijA

1

example: the indegree for node 3 is 1, which we obtain by summing the number of non-zero entries in the 3rd column

n

iiA

13

n

jjA

13

1

2

3

45

25

Page 26: Lecture 3: Mathematics of Networks

Degree sequence and Degree distribution

Degree sequence: An ordered list of the (in,out) degree of each node

In-degree sequence: [2, 2, 2, 1, 1, 1, 1, 0]

Out-degree sequence: [2, 2, 2, 2, 1, 1, 1, 0]

(undirected) degree sequence: [3, 3, 3, 2, 2, 1, 1, 1]

Degree distribution: A frequency count of the occurrence of each degree

In-degree distribution:[(2,3) (1,4) (0,1)]

Out-degree distribution:[(2,4) (1,3) (0,1)]

(undirected) distribution:[(3,3) (2,2) (1,3)]

0 1 20

1

2

3

4

5

indegree

fre

qu

en

cy

26

Page 27: Lecture 3: Mathematics of Networks

Structural Metrics: Degree distribution

27

What if it is directed ?

Page 28: Lecture 3: Mathematics of Networks

Characterizing networks:How dense are they?

Page 29: Lecture 3: Mathematics of Networks

network metrics: graph density

Of the connections that may exist between n nodes directed graph

emax = n*(n-1)

undirected graph

emax = n*(n-1)/2

What fraction are present? density = e/ emax

For example, out of 12 possible connections,

this graph has 7, giving it a density of 7/12 = 0.583

29

Page 30: Lecture 3: Mathematics of Networks

Graph density

30

Would this measure be useful for comparing networks of different sizes (different numbers of nodes)?

As n → ∞, a graph whose density reaches 0 is a sparse graph

a constant is a dense graph

Page 31: Lecture 3: Mathematics of Networks

Characterizing networks:How far apart are things?

31

Page 32: Lecture 3: Mathematics of Networks

Network metrics: paths

A path is any sequence of vertices such that every consecutive pair of vertices in the sequence is connected by an edge in the network. For directed: traversed in the correct direction for the edges.

path can visit itself (vertex or edge) more than once Self-avoiding paths do not intersect themselves.

Path length r is the number of edges on the path Called hops

32

Page 33: Lecture 3: Mathematics of Networks

Network metrics: paths

33

Page 34: Lecture 3: Mathematics of Networks

Network metrics: shortest paths

A

B

C

DE

1

2

2

3

3

34

3

Page 35: Lecture 3: Mathematics of Networks

Structural metrics: Average path length

35

1 ≤ L ≤ D ≤ N-1

Page 36: Lecture 3: Mathematics of Networks

Eulerian Path

Euler’s Seven Bridges of Königsberg one of the first problems in graph theory

Is there a route that crosses each bridge only once and returns to the starting point?

Source: http://en.wikipedia.org/wiki/Seven_Bridges_of_Königsberg

Image 1 – GNU v1.2: Bogdan, Wikipedia; http://commons.wikimedia.org/wiki/Commons:GNU_Free_Documentation_License

Image 2 – GNU v1.2: Booyabazooka, Wikipedia; http://commons.wikimedia.org/wiki/Commons:GNU_Free_Documentation_License

Image 3 – GNU v1.2: Riojajar, Wikipedia; http://commons.wikimedia.org/wiki/Commons:GNU_Free_Documentation_License

Page 37: Lecture 3: Mathematics of Networks

Eulerian and Hamiltonian paths

Hamiltonian path is self avoiding

If starting point and end point are the same: only possible if no nodes have an odd degree as each path must visit and leave each shore

If don’t need to return to starting pointcan have 0 or 2 nodes with an odd degree

Eulerian path: traverse each

edge exactly once

Hamiltonian path: visit

each vertex exactly once

Page 38: Lecture 3: Mathematics of Networks

Characterizing networks:Is everything connected?

38

Page 39: Lecture 3: Mathematics of Networks

Network metrics: components

If there is a path from every vertex in a network to every other, the network is connected otherwise, it is disconnected

Component: A subset of vertices such that there exist at least one path from each member of the subset to others and there does not exist another vertex in the network which is connected to any vertex in the subset Maximal subset

A singeleton vertex that is not connected to any other forms a size one component

Every vertex belongs to exactly one component

39

Page 40: Lecture 3: Mathematics of Networks

network metrics: size of giant component

if the largest component encompasses a significant fraction of the graph, it is called the giant component

40

Page 41: Lecture 3: Mathematics of Networks

components in directed networks

A

B

C

DE

FG

H

Weakly connected componentsA B C D E

G H F

41

Strongly connected components Each node within the component can be reached from every other node in the component

by following directed links

Strongly connected componentsB C D E

A

G H

F

Weakly connected components: every node can be reached from every other node by following links in either direction

A

B

C

DE

FG

H

Page 42: Lecture 3: Mathematics of Networks

components in directed networks

Every strongly connected component of more than one vertex has at least one cycle

Out-component: set of all vertices that are reachable via directed paths starting at a specific vertex v

Out-components of all members of a strongly connected component are identical

In-component: set of all vertices from which there is a direct path to a vertex v In-components of all members of a strongly connected

component are identical

42

A

B

C

DE

FG

H

Page 43: Lecture 3: Mathematics of Networks

bowtie model of the web

The Web is a directed graph: webpages link to other webpages

The connected components tell us what set of pages can be reached from any other just by surfing no ‘jumping’ around by typing in a URL or using a search engine

Broder et al. 1999 – crawl of over 200 million pages and 1.5 billion links. SCC – 27.5% IN and OUT – 21.5% Tendrils and tubes – 21.5% Disconnected – 8%

43

Page 44: Lecture 3: Mathematics of Networks

Network Analysis

What is a network? a bunch of nodes and edges

How do you characterize it? with some basic network metrics

How did network analysis get started? it was the mathematicians

How do you analyze networks today? with pajek or other software

Page 45: Lecture 3: Mathematics of Networks

overview of network analysis tools

Pajeknetwork analysis and visualization,menu driven, suitable for large networks

platforms: Windows (on linux via Wine) download

Netlogoagent based modelingrecently added network modeling capabilities

platforms: any (Java)download

 GUESSnetwork analysis and visualization,extensible, script-driven (jython)

platforms: any (Java)download

Other software tools that we will not be using but that you may find useful: visualization and analysis: UCInet - user friendly social network visualization and analysis software (suitable smaller networks)iGraph - if you are familiar with R, you can use iGraph as a module to analyze or create large networks, or you can directly use the C functions Jung - comprehensive Java library of network analysis, creation and visualization routinesGraph package for Matlab (untested?) - if Matlab is the environment you are most comfortable in, here are some basic routines SIENA - for p* models and longitudinal analysis SNA package for R - all sorts of analysis + heavy duty stats to boot NetworkX - python based free package for analysis of large graphsInfoVis Cyberinfrastructure - large agglomeration of network analysis tools/routines, partly menu driven visualization only:GraphViz - open source network visualization software (can handle large/specialized networks)TouchGraph - need to quickly create an interactive visualization for the web? yEd - free, graph visualization and editing software specialized:fast community finding algorithmmotif profilesCLAIR library - NLP and IR library (Perl Based) includes network analysis routines

finally: INSNA long list of SNA packages

Page 46: Lecture 3: Mathematics of Networks

tools we’ll use

Pajek: extensive menu-driven functionality, including many, many network metrics and manipulations but… not extensible

Guess: extensible, scriptable tool of exploratory data analysis, but more limited selection of built-in methods compared to Pajek

NetLogo: general agent based simulation platform with excellent network modeling support many of the demos in this course were built with NetLogo

iGraph: libraries can be accessed through R or python. Routines scale to millions of nodes.

Page 47: Lecture 3: Mathematics of Networks

other tools: visualization tool: gephi

http://gephi.org primarily for visualization, has some nice touches

http://player.vimeo.com/video/9726202

Page 48: Lecture 3: Mathematics of Networks

visualization tool: GraphViz

Takes descriptions of graphs in simple text languages Outputs images in useful formats Options for shapes and colors Standalone or use as a library

dot: hierarchical or layered drawings of directed graphs, by avoiding edge crossings and reducing edge length

neato (Kamada-Kawai) and fdp (Fruchterman-Reinhold with heuristics to handle larger graphs)

twopi – radial layout circo – circular layout

http://www.graphviz.org

Page 49: Lecture 3: Mathematics of Networks

GraphViz: dot language

digraph G { ranksep=4

nodesep=0.1

size="8,11"

ARCH531_20061 [label="ARCH531",style=bold,color=yellow,style=filled]

ARCH531_20071 [label="ARCH531",gstyle=bold,color=yellow,style=filled]

BIT512_20071 [label="BIT512",gstyle=bold,color=yellow,style=filled]

BIT513_20071 [label="BIT513",gstyle=bold,color=yellow,style=filled]

BIT646_20064 [label="BIT646",gstyle=bold,color=yellow,style=filled]

BIT648_20064 [label="BIT648",gstyle=bold,color=yellow,style=filled]

DESCI502_20071 [label="DESCI502",gstyle=bold,color=yellow,style=filled]

ECON500_20064 [label="ECON500",gstyle=bold,color=yellow,style=filled]

SI791_20064->SI549_20064[weight=2,color=slategray,style="setlinewidth(4)"]SI791_20064->SI596_20071[weight=5,color=slategray,style=bold,style="setlinewidth(10)"]SI791_20064->SI616_20071[weight=2,color=slategray,style=bold,style="setlinewidth(4)"]SI791_20064->SI702_20071[weight=2,color=slategray,style=bold,style="setlinewidth(4)"]SI791_20064->SI719_20071[weight=2,color=slategray,style=bold,style="setlinewidth(4)"]

Page 50: Lecture 3: Mathematics of Networks

Dot (GraphViz)

Page 51: Lecture 3: Mathematics of Networks

Lada’s school of information course recommender (GraphViz)

ARCH531BIT545 BIT645BIT750IOE491 MO501 SI512SI514SI543SI551 SI554SI557 SI575SI605SI622SI650SI654 SI663SI684SI688SI884

COMM810EECS492IOE536 MKT501SI504 SI539SI553 SI599 SI625SI627 SI628SI644SI647 SI649SI653SI658 SI668SI681 SI682SI689 SI699SI702

ELI321

SI622 SI690 RACKHAM998 SI512

SI539

SI607

SI540

SI543 SI605 SI702SI615

SI625

SI654

SI658

SI670

SI682

SI688 SI689

SI702SI791

Page 52: Lecture 3: Mathematics of Networks

MHS663 RACKHAM575SI502 SI515 SI581SI596SI615SI616 SI620 SI621SI626SI643SI646SI655 SI690 SI692SI696SI702 SI792

COMM810 EDCURINS575EDUC601 ENGLISH516 HISTORY698MHS663 SI540SI575 SI579SI596SI624 SI629SI637SI665SI666 SI690 SI791SI901

SI501

SI502SI503

SI504

SI515 SI557

SI575 SI580

SI581 SI632 SI655SI692

SI596

SI626SI643 SI596SI601 SI620

SI624

SI792

SI640SI647

SI674 SI663

SI665SI667SI690

Lada’s school of information course recommender (GraphViz)

Page 53: Lecture 3: Mathematics of Networks

Neato (Graphviz)

Page 54: Lecture 3: Mathematics of Networks

Other visualization tools: Walrus

developed at CAIDA available under the GNU GPL. “…best suited to visualizing moderately sized graphs that

are nearly trees. A graph with a few hundred thousand nodes and only a slightly greater number of links is likely to be comfortable to work with.”

Java-based

Implemented Features rendering at a guaranteed frame rate regardless of graph size coloring nodes and links with a fixed color, or by RGB values

stored in attributes labeling nodes picking nodes to examine attribute values displaying a subset of nodes or links based on a user-supplied

boolean attribute interactive pruning of the graph to temporarily reduce clutter

and occlusion zooming in and out

Source: CAIDA, http://www.caida.org/tools/visualization/walrus/

Page 55: Lecture 3: Mathematics of Networks

visualization tools: YEd - JavaTM Graph Editorhttp://www.yworks.com/en/products_yed_about.htm

(good primarily for layouts)

Page 56: Lecture 3: Mathematics of Networks

yEd and 26,000 nodes(takes a few seconds)

Page 57: Lecture 3: Mathematics of Networks

visualization tools: Prefuse

user interface toolkit for interactive information visualization built in Java using Java2D graphics library data structures and algorithms pipeline architecture featuring reusable, composable modules animation and rendering support architectural techniques for scalability

requires knowledge of Java programming website: http://prefuse.sourceforge.net

Page 58: Lecture 3: Mathematics of Networks

Simple prefuse visualizations

Source: Prefuse, https://github.com/prefuse

Page 59: Lecture 3: Mathematics of Networks

Examples of prefuse applications: flow maps

A flow map of migration from California from 1995-2000, generated automatically by our system using edge routing but no layout adjustment.

http://graphics.stanford.edu/papers/flow_map_layout/

Page 60: Lecture 3: Mathematics of Networks

Examples of prefuse applications: vizster http://jheer.org/vizster

Page 61: Lecture 3: Mathematics of Networks

Outline

Network metrics can help us characterize networks

This has is roots in graph theory

Today there are many network analysis tools to choose from though most of them are in beta!