Post on 31-Jan-2021
Dynamic Graphs Compression and Dirichlet-multinomial Entropy
Wojciech Szpankowski
Center for Science of Information, Purdue University
Joint work with P. Jacquet, A. Magner, K. Turowski
ITA, San Diego, 2019
Wojciech Szpankowski Dynamic Graphs Compression and Dirichlet-multinomial Entropy
Talk outline
1 Introduction: dynamic networks and structural information
2 Duplication graph models
Full duplication model: entropy
Compression algorithms: Arithmetic Encoding
3 Entropy of Dirichlet-multinomial distribution
Dynamic networks
Main idea
Model complex systems as a set of dynamic (time-evolving) interacting entities in which the spatial structure and patterns of interactions change over time.
Some challenges:
infer underlying dynamic processes and their parameters governing network evolution from sparsely sampled system state,
infer spatio-temporal properties under the assumption of a given underlying process, e.g. arrival sequences, clustering coefficient, degree distribution,
determine the minimum number of bits needed to describe dynamic networks.
Structural information in random graphs
For random graphs generated according to a known model the main structural quantities are:
Automorphism set Aut(G) – set of permutations of vertices under which the graph preserves the edge-vertex connectivity.
Feasible permutations set Γ(G) – set of permutations σ of V(G) such that Pr(Gn = σ(G)) > 0.
Admissible set Adm(G) – set of positive-probability graphs which can be obtained from G by applying σ ∈ Γ(G).
Note: |Adm(G)| = |Γ(G)| / |Aut(G)|.
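For very small graphs these sets can be enumerated directly. The sketch below is a brute-force illustration and not part of the talk: it assumes a model in which every vertex permutation is feasible (as in G(n, p)), so that |Γ(G)| = n!, and the helper names `relabel`, `automorphisms` and `admissible_graphs` are hypothetical.

```python
from itertools import permutations
from math import factorial

def relabel(edges, perm):
    """Apply the vertex permutation perm to an undirected edge list."""
    return frozenset(frozenset((perm[u], perm[v])) for u, v in edges)

def automorphisms(n, edges):
    """All permutations of range(n) that map the edge set onto itself."""
    target = frozenset(frozenset(e) for e in edges)
    return [p for p in permutations(range(n)) if relabel(edges, p) == target]

def admissible_graphs(n, edges):
    """Distinct labeled graphs reachable from G.

    Assumption: every permutation is feasible, i.e. |Gamma(G)| = n!.
    """
    return {relabel(edges, p) for p in permutations(range(n))}

def check_note(n, edges):
    """True if |Adm(G)| * |Aut(G)| = n! for this graph."""
    return len(admissible_graphs(n, edges)) * len(automorphisms(n, edges)) == factorial(n)
```

The enumeration costs n! relabelings, so it is only meant for graphs on a handful of vertices.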
Labeled vs. unlabeled graphs
There are two flavours of duplication graph models:
1 unlabeled graphs S(Gn) – the vertices contain no additional information,
2 labeled graphs Gn – the vertices contain information, e.g. about their time of arrival.
Figure: Example unlabeled graph.
Figure: Example labeled graph (vertices labeled 1 through 9).
Compression of graphs and structures
Theorem (Structural entropy for a broad class of graph models)
If all graphs with a given structure are equiprobable, then
$$H(G_n) - H(S(G_n)) = E[\log |\Gamma(G_n)|] - E[\log |\mathrm{Aut}(G_n)|].$$
Proof.
The theorem follows directly from two simple facts:
$$H(G_n) = H(G_n, S(G_n)) = H(S(G_n)) + H(G_n \mid S(G_n)),$$
$$\Pr(G_n = G \mid S(G_n)) = \frac{1}{|\mathrm{Adm}(G)|} = \frac{|\mathrm{Aut}(G)|}{|\Gamma(G)|},$$
where $H(G_n) = -\sum_{G} \Pr(G_n = G) \log \Pr(G_n = G)$.
Previous results
Erdős–Rényi model G(n, p)
Graph on n vertices; each pair of nodes receives an edge independently with probability p.
$$H(G) = \binom{n}{2} h(p), \qquad H(S(G)) = \binom{n}{2} h(p) - n \log n + O(n),$$
where $h(p) = -p \log p - (1-p) \log(1-p)$ is the binary entropy function.
Note:
|Γ(G)| = n!,
for $\frac{\ln n}{n} \ll p \ll 1 - \frac{\ln n}{n}$ it holds that |Aut(G)| = 1 whp.
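The relation between labeled and structural entropy can be checked exactly on a tiny instance. The sketch below enumerates all labeled graphs on n vertices under G(n, p), groups them by structure with a brute-force canonical form, and returns H(G), H(S(G)) and E[log |Aut(G)|] in bits. It is an illustration only (exponential in n), and the function names are hypothetical.

```python
from itertools import combinations, permutations
from math import factorial, log2

def canonical(n, edges):
    """Smallest sorted edge tuple over all relabelings: a brute-force structure ID."""
    best = None
    for perm in permutations(range(n)):
        img = tuple(sorted(tuple(sorted((perm[u], perm[v]))) for u, v in edges))
        if best is None or img < best:
            best = img
    return best

def count_automorphisms(n, edges):
    """Number of permutations that map the edge set onto itself."""
    edge_set = {frozenset(e) for e in edges}
    return sum(
        1
        for perm in permutations(range(n))
        if {frozenset((perm[u], perm[v])) for u, v in edges} == edge_set
    )

def er_entropies(n, p):
    """Return (H(G), H(S(G)), E[log2 |Aut(G)|]) for G(n, p) with 0 < p < 1."""
    pairs = list(combinations(range(n), 2))
    h_labeled, e_log_aut, structure_prob = 0.0, 0.0, {}
    for mask in range(1 << len(pairs)):
        edges = [pairs[i] for i in range(len(pairs)) if (mask >> i) & 1]
        prob = p ** len(edges) * (1 - p) ** (len(pairs) - len(edges))
        h_labeled -= prob * log2(prob)
        e_log_aut += prob * log2(count_automorphisms(n, edges))
        key = canonical(n, edges)
        structure_prob[key] = structure_prob.get(key, 0.0) + prob
    h_structure = -sum(q * log2(q) for q in structure_prob.values())
    return h_labeled, h_structure, e_log_aut
```

For n = 4 the difference H(G) − H(S(G)) should match log 4! − E[log |Aut(G)|] up to floating-point error, which is the structural entropy theorem with |Γ(G)| = n!.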
Previous results
Preferential Attachment model PA(n, m)
Start from a single vertex v1 with m self-loops.
At each step t = 2, 3, . . . a new vertex vt joins and makes m independent connections to the existing nodes, with probability
$$\Pr[v_t \text{ connects to } v_k \mid G_{t-1}] = \frac{\deg_{t-1}(v_k)}{2m(t-1)}.$$
H(G) = mn log n + m(log(2m − 1) − log(m!) − A(m)) n + o(n) for an explicitly known constant A(m),
if m ≥ 3, then H(S(G)) = (m − 1) n log n + O(n log log n).
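A minimal simulation sketch of PA(n, m) as described above. Each of the m connections of vt is drawn independently from the degree distribution of G_{t−1}, implemented by sampling from a list in which vertex k appears deg(k) times. The function names and the edge-list representation are assumptions made for illustration.

```python
import random
from collections import Counter

def preferential_attachment(n, m, seed=None):
    """Return the edge list of a PA(n, m) sample on vertices 1..n."""
    rng = random.Random(seed)
    edges = [(1, 1)] * m           # v1 starts with m self-loops
    endpoints = [1] * (2 * m)      # vertex k appears deg(k) times
    for t in range(2, n + 1):
        # All m targets are drawn from the degrees in G_{t-1}.
        targets = [rng.choice(endpoints) for _ in range(m)]
        for k in targets:
            edges.append((t, k))
            endpoints.extend((t, k))
    return edges

def degrees(edges):
    """Vertex degrees; a self-loop contributes 2 to its vertex."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg
```

Before step t the `endpoints` list has length 2m(t − 1), which is exactly the normalizing constant in the connection probability.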
Duplication models
Basic duplication (vertex-copying) model DD(n, p, G0)
1 Start from an arbitrary (fixed) G0 on n0 vertices.
2 In each step t = 1, . . . , n:
  1 add a new vertex vt to the graph,
  2 pick a vertex u uniformly at random from all previous vertices,
  3 attach vt to all vertices connected to u (independently, with probability p).
There are many variants of this model, e.g. connecting vt to u, adding edges from vt to other vertices of G, or removing some edges.
We first consider the boundary case p = 1.
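The steps above can be sketched as a short simulation. This is an illustrative implementation under stated assumptions: G0 is passed as an adjacency dictionary without self-loops, its vertex names are sortable strings that do not clash with the generated names `v1`, `v2`, ..., and the returned `copied_from` map (which vertex each new vertex duplicated) is a hypothetical convenience, not something the talk defines.

```python
import random

def duplication_graph(g0_adj, n, p, seed=None):
    """Grow DD(n, p, G0); return (adjacency dict, copied_from dict)."""
    rng = random.Random(seed)
    adj = {v: set(nbrs) for v, nbrs in g0_adj.items()}  # do not mutate the input
    vertices = list(adj)
    copied_from = {}
    for t in range(1, n + 1):
        new = f"v{t}"
        u = rng.choice(vertices)                  # uniform over all previous vertices
        # Keep each neighbor of u independently with probability p.
        kept = {w for w in sorted(adj[u]) if rng.random() < p}
        adj[new] = kept
        for w in kept:
            adj[w].add(new)
        vertices.append(new)
        copied_from[new] = u
    return adj, copied_from
```

In the boundary case p = 1 every new vertex receives exactly the neighborhood of the vertex it copies, and the two neighborhoods stay identical as the graph keeps growing.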
Example
Figure: Example graph growth in the full duplication model (initial vertices u1, . . . , u6; added vertices v1, v2, v3).
Basic notions
Ancestor
The ancestor of v ∈ V(Gn) is the vertex u ∈ V(G0) from which v was ultimately copied.
Orbit
An orbit is the set of vertices in Gn with the same ancestor.
Figure: Example graph generated from the full duplication model, with each vertex's ancestor in parentheses: u1 (u1), u2 (u2), u3 (u3), u4 (u4), u5 (u5), u6 (u6), v1 (u2), v2 (u1), v3 (u2).
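Ancestors and orbit sizes follow mechanically from the copy history. The sketch below assumes a hypothetical history (v1 copied u2, v2 copied u1, v3 copied v1) chosen only because it reproduces the ancestors shown in the figure; the actual history behind the figure is not stated in the slides.

```python
from collections import Counter

def ancestors(g0_vertices, copied_from):
    """Map every vertex to its ancestor in G0.

    copied_from must list new vertices in arrival order (dicts keep
    insertion order), so each source already has a known ancestor.
    """
    anc = {u: u for u in g0_vertices}
    for v, src in copied_from.items():
        anc[v] = anc[src]
    return anc

def orbit_sizes(g0_vertices, copied_from):
    """Number of vertices sharing each ancestor."""
    return Counter(ancestors(g0_vertices, copied_from).values())

# Hypothetical copy history consistent with the figure's ancestors.
g0 = ["u1", "u2", "u3", "u4", "u5", "u6"]
history = {"v1": "u2", "v2": "u1", "v3": "v1"}
sizes = orbit_sizes(g0, history)
```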
Ball-and-urn model
Pólya urn model
Let each of n0 urns contain one ball: C_{i,0} = 1 for i = 1, . . . , n0.
In each step, pick an urn with probability proportional to the number of balls in it and add one ball to it.
The distribution of the added balls (C_{i,n} − 1)_{i=1}^{n0} is known in the literature as the Dirichlet-multinomial distribution DM(n, (1, . . . , 1)).
Its marginal is the well-known beta-binomial distribution BBin(n, 1, n0 − 1):
$$\Pr(C_{i,n} = k+1) = (n_0 - 1) \binom{n}{k} \frac{\Gamma(k+1)\,\Gamma(n-k+n_0-1)}{\Gamma(n+n_0)},$$
where Γ(a) is the Euler gamma function.
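The marginal formula can be checked against the urn dynamics with exact rational arithmetic. The sketch below steps the urn n times while tracking only the first urn, then evaluates the closed form using Γ(a) = (a − 1)! at integer arguments. It assumes n0 ≥ 2; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

def urn_marginal_exact(n, n0):
    """Exact distribution of C_{1,n}: maps c to Pr(C_{1,n} = c)."""
    dist = {1: Fraction(1)}
    for t in range(n):
        total = n0 + t                       # balls in all urns before this step
        nxt = {}
        for c, pr in dist.items():
            # Ball goes to urn 1 with probability c / total, elsewhere otherwise.
            nxt[c + 1] = nxt.get(c + 1, 0) + pr * Fraction(c, total)
            nxt[c] = nxt.get(c, 0) + pr * Fraction(total - c, total)
        dist = nxt
    return dist

def urn_marginal_formula(n, n0, k):
    """Closed form for Pr(C_{i,n} = k + 1), valid for n0 >= 2."""
    numerator = (n0 - 1) * comb(n, k) * factorial(k) * factorial(n - k + n0 - 2)
    return Fraction(numerator, factorial(n + n0 - 1))
```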
Graph structure and ball-and-urn model
C_{i,n} is the number of vertices in Gn that have ui as their ancestor, and
$$H(S(G_n)) = H\big((C_{i,n})_{i=1}^{n_0}\big) = \sum_{i=1}^{n_0} E[\log C_{i,n}] = n_0 E[\log C_{1,n}].$$
(a) Graph: u1 (u1), u2 (u2), u3 (u3), u4 (u4), u5 (u5), u6 (u6), v1 (u2), v2 (u1), v3 (u2)
(b) Representation: C1,n = 2, C2,n = 3, C3,n = 1, C4,n = 1, C5,n = 1, C6,n = 1
Automorphisms and ball-and-urn model
Automorphisms: We can always swap vertices within one orbit, never between orbits, therefore:
$$E[\log |\mathrm{Aut}(G_n)|] = \sum_{i=1}^{n_0} E[\log C_{i,n}!] = n_0 E[\log C_{1,n}!].$$
Feasible permutations Γ(G): We can always swap vertices if (i) both are in G0 or (ii) both are not in G0, and (iii) start Gn0 by selecting one node in each orbit. Thus:
$$E[\log |\Gamma(G_n)|] = \log n_0! + \log n! + n_0 E[\log C_{1,n}].$$
Solution
Finally, we get
$$E[\log |\mathrm{Aut}(G_n)|] = n \log n - n H_{n_0} \log e + \frac{3 n_0}{2} \log n + O(1),$$
$$E[\log |\Gamma(G_n)|] = n \log n - n \log e + \frac{2 n_0 + 1}{2} \log n + O(1),$$
$$H(S(G_n)) = (n_0 - 1) \log n + O(1),$$
$$H(G_n) = n (H_{n_0} - 1) \log e + \frac{n_0 - 1}{2} \log n + O(1).$$
Note: H(Gn) = Θ(n), H(S(Gn)) = Θ(log n) – highly unusual among well-known random graph models!
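The Θ(log n) growth of the structural entropy can be made concrete. The sketch below relies on two inputs: the talk's identification of H(S(Gn)) with the entropy of the orbit-size vector, and the standard fact that a Pólya urn started with one ball per urn is uniform over all count vectors, of which there are C(n + n0 − 1, n0 − 1). The helper names are hypothetical, and the small exact enumeration is included only to confirm the uniformity claim.

```python
from fractions import Fraction
from math import comb, factorial, log2

def urn_joint_distribution(n, n0):
    """Exact joint law of (C_{1,n}, ..., C_{n0,n}); feasible only for tiny n, n0."""
    dist = {(1,) * n0: Fraction(1)}
    for t in range(n):
        nxt = {}
        for counts, pr in dist.items():
            for i, c in enumerate(counts):
                new = counts[:i] + (c + 1,) + counts[i + 1:]
                nxt[new] = nxt.get(new, 0) + pr * Fraction(c, n0 + t)
        dist = nxt
    return dist

def structural_entropy_exact(n, n0):
    """Entropy in bits of the uniform law over all orbit-size vectors."""
    return log2(comb(n + n0 - 1, n0 - 1))

def structural_entropy_leading(n, n0):
    """(n0 - 1) log2 n plus the constant -log2((n0 - 1)!)."""
    return (n0 - 1) * log2(n) - log2(factorial(n0 - 1))
```

Under these assumptions the O(1) term in H(S(Gn)) converges to −log (n0 − 1)!.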
Graph Compression: arithmetic encoding.
Arithmetic coding
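The slide itself carries no text, so the following is only a generic sketch of arithmetic coding and not the algorithm from the talk. It assumes the data to encode is the sequence of urn (orbit) indices chosen at each step, uses the adaptive Pólya urn probabilities as the coding model, and works with exact rationals to avoid precision issues. The decoder is assumed to know the number of urns and the sequence length; all function names are hypothetical.

```python
from fractions import Fraction
from math import ceil

def encode(symbols, n_urns):
    """Encode a sequence of urn indices into a bit string."""
    counts = [1] * n_urns
    low, width = Fraction(0), Fraction(1)
    for s in symbols:
        total = sum(counts)
        low += width * Fraction(sum(counts[:s]), total)   # skip lower symbols
        width *= Fraction(counts[s], total)               # shrink to this symbol
        counts[s] += 1
    k = 0
    while Fraction(1, 2 ** k) > width:                    # smallest k with 2^-k <= width
        k += 1
    code = ceil(low * 2 ** k)                             # dyadic point inside [low, low + width)
    return format(code, "b").zfill(k)

def decode(bits, n_urns, length):
    """Invert encode() given the number of urns and the sequence length."""
    value = Fraction(int(bits, 2), 2 ** len(bits))
    counts = [1] * n_urns
    low, width = Fraction(0), Fraction(1)
    out = []
    for _ in range(length):
        total = sum(counts)
        cum = 0
        for s, c in enumerate(counts):
            lo = low + width * Fraction(cum, total)
            hi = lo + width * Fraction(c, total)
            if lo <= value < hi:
                break
            cum += c
        out.append(s)
        low, width = lo, hi - lo
        counts[s] += 1
    return out
```

The code length is the smallest k with 2^−k at most the sequence probability, so it stays within one bit of the ideal −log Pr(sequence). This simple variant is not prefix-free, which is why the decoder needs the length.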
Dirichlet-multinomial distribution
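The Dirichlet-multinomial distribution is a multinomial distribution whose parameters are distributed according to a Dirichlet distribution.
Definition (Probability mass function for DM(n, ᾱ))
Let X̄ ∼ DM(n, ᾱ) for ᾱ = (α1, . . . , αm), where αi > 0 for i = 1, . . . , m. Then for a value x̄ = (x1, . . . , xm), xi ∈ {0, 1, . . . , n}, such that $\sum_{k=1}^m x_k = n$, it holds that:
$$\begin{aligned}
\Pr(\bar X = \bar x) &= \int_{[0,1]^m} \Pr(\bar p) \Pr(\bar X = \bar x \mid \bar p)\, d\bar p \\
&= \int_{[0,1]^m} \frac{\Gamma(\alpha_0)}{\prod_{k=1}^m \Gamma(\alpha_k)} \prod_{k=1}^m p_k^{\alpha_k - 1} \binom{n}{x_1 \ldots x_m} \prod_{k=1}^m p_k^{x_k}\, d\bar p \\
&= \frac{\Gamma(n+1)\,\Gamma(\alpha_0)}{\Gamma(n+\alpha_0)} \prod_{k=1}^m \frac{\Gamma(x_k+\alpha_k)}{\Gamma(x_k+1)\,\Gamma(\alpha_k)} = \frac{n B(n, \alpha_0)}{\prod_{k : x_k > 0} x_k B(x_k, \alpha_k)},
\end{aligned}$$
where Γ(x) is the Euler gamma function, B(a, b) is the beta function, and $\alpha_0 = \sum_{k=1}^m \alpha_k$.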
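Both closed forms of the probability mass function are easy to evaluate in log space. The sketch below uses only `math.lgamma`; the function names and the brute-force `compositions` helper are hypothetical and serve to check that the two forms agree and that the probabilities sum to one.

```python
from math import exp, lgamma, log

def log_beta(a, b):
    """log B(a, b) via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def dm_log_pmf(x, alpha):
    """Gamma-function form of log Pr(X = x) for X ~ DM(sum(x), alpha)."""
    n, a0 = sum(x), sum(alpha)
    out = lgamma(n + 1) + lgamma(a0) - lgamma(n + a0)
    for xk, ak in zip(x, alpha):
        out += lgamma(xk + ak) - lgamma(xk + 1) - lgamma(ak)
    return out

def dm_log_pmf_beta_form(x, alpha):
    """Beta-function form; requires n = sum(x) >= 1."""
    n, a0 = sum(x), sum(alpha)
    out = log(n) + log_beta(n, a0)
    for xk, ak in zip(x, alpha):
        if xk > 0:                      # the product runs over k with x_k > 0 only
            out -= log(xk) + log_beta(xk, ak)
    return out

def compositions(n, m):
    """All m-tuples of non-negative integers summing to n."""
    if m == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, m - 1):
            yield (first,) + rest
```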
Beta-binomial distribution
Definition (Probability mass function for BBin(n, α, β))
Let X ∼ BBin(n, α, β) for α, β > 0. Then for a value x ∈ {0, 1, . . . , n} it holds that:
$$\Pr(X = x) = \int_0^1 \pi(\alpha, \beta, p) \Pr(X = x \mid p)\, dp = \int_0^1 \frac{p^{\alpha-1}(1-p)^{\beta-1}}{B(\alpha,\beta)} \binom{n}{x} p^x (1-p)^{n-x}\, dp.$$
The beta-binomial distribution is both the special case of the Dirichlet-multinomial distribution for m = 2 and a marginal distribution of DM(n, ᾱ).
Its entropy was analyzed by Cheraghchi, 2017 (but only as a non-asymptotic expression).
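The defining integral evaluates to the standard closed form C(n, x) B(x + α, n − x + β) / B(α, β), which is not written on the slide but follows from the definition of the beta function. The sketch below compares that closed form with a plain midpoint-rule evaluation of the integral. It assumes α, β ≥ 1 so the integrand has no endpoint singularities; the function names and the step count are arbitrary choices.

```python
from math import comb, exp, lgamma

def log_beta(a, b):
    """log B(a, b) via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def bbin_pmf(x, n, a, b):
    """Closed form: C(n, x) * B(x + a, n - x + b) / B(a, b)."""
    return comb(n, x) * exp(log_beta(x + a, n - x + b) - log_beta(a, b))

def bbin_pmf_by_integration(x, n, a, b, steps=20000):
    """Midpoint-rule evaluation of the defining integral (assumes a, b >= 1)."""
    h = 1.0 / steps
    total = 0.0
    for j in range(steps):
        p = (j + 0.5) * h
        total += p ** (a - 1 + x) * (1 - p) ** (b - 1 + n - x)
    return comb(n, x) * total * h / exp(log_beta(a, b))
```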
Entropy of Dirichlet-multinomial distribution
The entropy of the Dirichlet-multinomial distribution can be expressed as
$$\begin{aligned}
H(\bar X) = -E \log \Pr(\bar X) ={}& \log\Gamma(n+\alpha_0) - \log\Gamma(n+1) - \log\Gamma(\alpha_0) \\
&+ \sum_{k=1}^m \log\Gamma(\alpha_k) + \sum_{k=1}^m E \log\Gamma(X_k+1) - \sum_{k=1}^m E \log\Gamma(X_k+\alpha_k),
\end{aligned}$$
where Xk ∼ BBin(n, αk, α0 − αk).
Thus we only need to compute E log Γ(X + t) for some t and X ∼ BBin(n, α1, α2).
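The reduction to beta-binomial expectations can be checked numerically. The sketch below computes the entropy in bits in two ways: by summing over all outcomes of DM(n, ᾱ), and through the expression above using only the marginals Xk ∼ BBin(n, αk, α0 − αk). Function names are hypothetical, base-2 logarithms are an assumption, and the brute-force sum is practical only for small m.

```python
from math import exp, lgamma, log

LN2 = log(2.0)

def dm_log_pmf(x, alpha):
    """Natural log of the DM(n, alpha) probability of x, with n = sum(x)."""
    n, a0 = sum(x), sum(alpha)
    out = lgamma(n + 1) + lgamma(a0) - lgamma(n + a0)
    for xk, ak in zip(x, alpha):
        out += lgamma(xk + ak) - lgamma(xk + 1) - lgamma(ak)
    return out

def compositions(n, m):
    """All m-tuples of non-negative integers summing to n."""
    if m == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, m - 1):
            yield (first,) + rest

def entropy_brute_force(n, alpha):
    """H(X) in bits by direct summation over all outcomes."""
    h = 0.0
    for x in compositions(n, len(alpha)):
        lp = dm_log_pmf(x, alpha)
        h -= exp(lp) * lp
    return h / LN2

def bbin_log_pmf(x, n, a, b):
    """Natural log of the BBin(n, a, b) probability of x."""
    return (lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
            + lgamma(x + a) + lgamma(n - x + b) - lgamma(n + a + b)
            + lgamma(a + b) - lgamma(a) - lgamma(b))

def entropy_via_marginals(n, alpha):
    """H(X) in bits from the expression with beta-binomial expectations."""
    a0 = sum(alpha)
    h = lgamma(n + a0) - lgamma(n + 1) - lgamma(a0)
    for ak in alpha:
        h += lgamma(ak)
        for x in range(n + 1):
            w = exp(bbin_log_pmf(x, n, ak, a0 - ak))
            h += w * (lgamma(x + 1) - lgamma(x + ak))
    return h / LN2
```

The two functions should agree up to floating-point error, and for ᾱ = (3, 4, 5) they can be compared with the exact values listed in the talk's table.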
Main Result: Asymptotic Formula for Entropy
Theorem
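If X̄ ∼ DM(n, ᾱ), then
$$\begin{aligned}
H(\bar X) ={}& (m-1)\log n - \log\Gamma(\alpha_0) + \sum_{k=1}^m \log\Gamma(\alpha_k) - \log e \sum_{k=1}^m (\alpha_k - 1)\big(\psi(\alpha_k) - \psi(\alpha_0)\big) \\
&+ \sum_{s=1}^{\lceil \min\{\alpha_i\} \rceil - 1} e_s n^{-s} + O\!\left(\frac{\mathrm{polylog}(n)}{n^{\min\{\alpha_i\}}}\right),
\end{aligned}$$
where ψ is the digamma function and the coefficients e_s are explicitly computable.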
Example
Let ᾱ = (3, 4, 5). Then, from the main result we have
$$H(\bar X) = 2\log n + \left(\frac{95593}{9240}\log e + 5 + 2\log 3 - \log(11!)\right) + 12\, n^{-1}\log e - \frac{1477}{18}\, n^{-2}\log e + O\!\left(\frac{\mathrm{polylog}(n)}{n^3}\right).$$
Table: Exact values and approximations for the entropy
n        exact value    approximation   absolute error
100      11.29480883    11.29392204     8.8 · 10^−4
500      15.81065166    15.81064409     7.5 · 10^−6
1000     17.79368785    17.79368690     9.5 · 10^−7
5000     22.42380687    22.42380686     7.7 · 10^−9
10000    24.42207918    24.42207918     9.6 · 10^−10
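The approximation column can be reproduced in a few lines. The sketch below assumes base-2 logarithms and takes the digamma term with the sign that matches the table, that is, minus log e times the sum of (αk − 1)(ψ(αk) − ψ(α0)). For integer parameters it uses the identity ψ(a) − ψ(b) = H_{a−1} − H_{b−1} with exact fractions. The 1/n and 1/n² coefficients are copied from the example rather than derived, and the function names are hypothetical.

```python
from fractions import Fraction
from math import e, lgamma, log, log2

LOG2E = log2(e)

def harmonic(k):
    """k-th harmonic number as an exact fraction (H_0 = 0)."""
    return sum((Fraction(1, j) for j in range(1, k + 1)), Fraction(0))

def digamma_sum(alpha):
    """sum_k (alpha_k - 1)(psi(alpha_k) - psi(alpha_0)) for positive integers alpha_k."""
    a0 = sum(alpha)
    return sum(
        ((a - 1) * (harmonic(a - 1) - harmonic(a0 - 1)) for a in alpha),
        Fraction(0),
    )

def leading_terms(n, alpha):
    """(m - 1) log2 n plus the constant term, for integer alpha."""
    a0 = sum(alpha)
    const = (sum(lgamma(a) for a in alpha) - lgamma(a0)) / log(2)
    const -= LOG2E * float(digamma_sum(alpha))
    return (len(alpha) - 1) * log2(n) + const

def example_approximation(n):
    """Example for alpha = (3, 4, 5): leading terms plus the 1/n and 1/n^2 corrections."""
    return leading_terms(n, (3, 4, 5)) + LOG2E * (12 / n - 1477 / 18 / n ** 2)
```

Calling `example_approximation` with the table's values of n gives numbers to compare against the approximation column.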
Roadmap
Taylor’s theorem
Central moments estimation + Stirling approximation
Euler integrals
Kummer solutions