Elchanan Mossel (MIT) - University of California, Santa Cruzabel/ATD2017/session5/Mossel.pdfElchanan...


Corruption and Anomaly Detection in Networks

Elchanan Mossel (MIT)

Supported by:
1. NSF CCF-1665252
2. ONR N00014-17-1-2598
3. NSF DMS-1737944 (this new one; thank you!)


Intrusion Detection in Networks

Figure: "Cisco Network-Based Intrusion Detection–Functionalities and Configuration" from cisco.com


Institutional Corruption Networks

Figure: From drawingbynumbers.org, based on the report "The Effect of Drug Trafficking and Corruption on Democratic Institutions in Mexico"


The PMC Model

- Directed graph G = (V, E) of agents.
- V = T ∪ B = Truthful ∪ Corrupt/Bad.
- Truthful nodes report the status (T/B) of their neighbors accurately.
- Corrupt nodes report the status (T/B) of their neighbors adversarially.
- Model of:
  - Diagnosable digital systems (Preparata, Metze, Chien 1967 ...).
  - Byzantine computing (e.g. Lamport et al. 1982).
  - Intrusion detection (e.g. Mukherjee et al. 1994).
  - Corruption in social networks.
- Thousands of papers!
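To make the reporting model concrete, here is a minimal Python sketch that generates PMC-style reports on a directed graph and checks whether a hypothesised corrupt set is consistent with them. The adversary is an assumption made only for illustration: corrupt nodes here answer uniformly at random, whereas the model allows any adversarial strategy. The names `pmc_reports` and `is_consistent` are hypothetical.

```python
import random
from typing import Dict, Set, Tuple

Graph = Dict[int, Set[int]]           # u -> set of nodes that u reports on
Reports = Dict[Tuple[int, int], str]  # (reporter u, target v) -> "T" or "B"


def pmc_reports(G: Graph, bad: Set[int], rng: random.Random) -> Reports:
    """Truthful nodes report the true status of each out-neighbour;
    corrupt nodes (hypothetical adversary) report uniformly at random."""
    reports: Reports = {}
    for u, targets in G.items():
        for v in targets:
            truth = "B" if v in bad else "T"
            reports[(u, v)] = rng.choice("TB") if u in bad else truth
    return reports


def is_consistent(reports: Reports, hypothesised_bad: Set[int]) -> bool:
    """A hypothesis is consistent if every report issued by a node it
    declares truthful matches the statuses it assigns."""
    for (u, v), r in reports.items():
        if u not in hypothesised_bad:
            expected = "B" if v in hypothesised_bad else "T"
            if r != expected:
                return False
    return True


if __name__ == "__main__":
    cycle = {u: {(u + 1) % 4} for u in range(4)}  # 0 -> 1 -> 2 -> 3 -> 0
    reports = pmc_reports(cycle, bad={1}, rng=random.Random(0))
    print(is_consistent(reports, {1}))         # True: the real corrupt set
    print(is_consistent(reports, set()))       # False: node 0 reports 1 as "B"
    print(is_consistent(reports, set(cycle)))  # True: "everyone is corrupt"
```

The last line shows why some majority assumption on T is unavoidable: declaring every node corrupt is always consistent with the reports.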


Previous Results

- How can we identify all corrupt nodes?
- PMC67: if |B| = t, we need min_v in-deg(v) ≥ t (every node must be tested by at least t others).
- No results on bounded-degree graphs!
- Corruption detection in bounded-degree graphs?


Our Results

Theorem (Alon-Mossel-Pemantle-15)

In a δ-good expander with |T| > |B|, one can find T′ ⊂ T and B′ ⊂ B with |T′ ∪ B′| > (1 − δ)n.

- |T| > |B| is necessary.
- If all neighbors of v are in B, v cannot be diagnosed.
- The running time is exponential.
- The running time is linear if |T| > (0.5 + 2δ)n.
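As an illustration of why exhaustive search can diagnose nodes when |T| > |B| (this is not the Alon-Mossel-Pemantle algorithm, whose details are not given here), one can enumerate every candidate corrupt set of size below n/2 that is consistent with the reports. The true B is among the candidates, so any node labelled the same way by every consistent candidate is correctly diagnosed. The enumeration is exponential in n, so this Python sketch is only usable on tiny graphs; `brute_force_diagnose` is a hypothetical name, and the "corrupt nodes always lie" adversary in the usage example is an assumption.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Reports = Dict[Tuple[int, int], str]  # (reporter u, target v) -> "T" or "B"


def is_consistent(reports: Reports, bad: Set[int]) -> bool:
    """Every report from a node declared truthful must match the hypothesis."""
    return all(r == ("B" if v in bad else "T")
               for (u, v), r in reports.items() if u not in bad)


def brute_force_diagnose(nodes: List[int], reports: Reports) -> Tuple[Set[int], Set[int]]:
    """Return (T', B'): nodes certified truthful / corrupt, assuming |B| < n/2."""
    n = len(nodes)
    candidates = [set(c)
                  for k in range((n - 1) // 2 + 1)  # every size k with k < n/2
                  for c in combinations(nodes, k)
                  if is_consistent(reports, set(c))]
    if not candidates:  # cannot happen when the assumption |B| < n/2 holds
        return set(), set()
    t_prime = set(nodes) - set().union(*candidates)  # truthful in every candidate
    b_prime = set.intersection(*candidates)          # corrupt in every candidate
    return t_prime, b_prime


if __name__ == "__main__":
    nodes, bad = list(range(5)), {0, 1}
    reports: Reports = {}
    for u in nodes:  # complete directed graph: everyone tests everyone
        for v in nodes:
            if u != v:
                truth = "B" if v in bad else "T"
                flipped = "T" if truth == "B" else "B"
                reports[(u, v)] = flipped if u in bad else truth  # corrupt nodes lie
    print(brute_force_diagnose(nodes, reports))  # ({2, 3, 4}, {0, 1})
```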


On Expansion

- Def: G is δ-good if
  - |N+(U) \ U| > |U| for all U with |U| ≤ 4δn;
  - (A × B) ∩ E ≠ ∅ for all disjoint A, B with |A| ≥ δn and |B| ≥ n/4.
- Expansion is needed: if there exists U with |U| ≤ εn such that all connected components of V \ U have size ≤ εn, then it is impossible to find even one element of T, even if |T| = (1 − 2ε)n.
- Weak expansion suffices if T is large enough: suppose that for all disjoint sets A1, A2 with |A1| ≥ δn and |A2| ≥ (1 − 3δ)n there is an edge between A1 and A2. Then if |T| ≥ (1 − δ)n, one can find T′ ⊂ T with |T′| ≥ (1 − 2δ)n.
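For intuition on the definition, both δ-good conditions can be checked exhaustively on very small directed graphs. This Python sketch assumes that U ranges over non-empty sets, that N+(U) is the set of out-neighbours of U, and that edges in A × B run from A to B. Because any larger disjoint pair contains a pair of the minimal sizes, it checks the second condition only for |A| = ⌈δn⌉ and |B| = ⌈n/4⌉. The running time is exponential, and `is_delta_good` is a hypothetical name.

```python
import math
from itertools import combinations
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # u -> out-neighbours N+(u)


def out_nbhd(G: Graph, U: Set[int]) -> Set[int]:
    """N+(U): union of the out-neighbourhoods of the nodes in U."""
    return set().union(*(G[u] for u in U))


def is_delta_good(G: Graph, delta: float) -> bool:
    V = list(G)
    n = len(V)
    # Condition 1: every non-empty U with |U| <= 4*delta*n expands.
    for k in range(1, math.floor(4 * delta * n) + 1):
        for U in combinations(V, k):
            Us = set(U)
            if len(out_nbhd(G, Us) - Us) <= len(Us):
                return False
    # Condition 2: an edge from A to B for all disjoint minimal-size A, B.
    a, b = math.ceil(delta * n), math.ceil(n / 4)
    for A in combinations(V, a):
        rest = [v for v in V if v not in A]
        for B in combinations(rest, b):
            Bs = set(B)
            if not any(G[u] & Bs for u in A):
                return False
    return True


if __name__ == "__main__":
    n = 8
    complete = {u: {v for v in range(n) if v != u} for u in range(n)}
    cycle = {u: {(u + 1) % n} for u in range(n)}
    print(is_delta_good(complete, 1 / 16))  # True
    print(is_delta_good(cycle, 1 / 16))     # False: a single node does not expand
```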


Future Work on the PMC Model

- Suppose the graph G is given: what is the largest set of corrupt nodes we can identify? Can we identify it efficiently?
- Related to expansion / small-set expansion.
- Suppose a constraint graph G on n vertices is given. What is the best G′ ⊂ G with m edges in terms of corruption detection?
- What if truthful nodes make (one-sided / two-sided) mistakes?
- Applications to real data?
- Example: applications to real news / fake news sites?


Future Work: Non-backtracking walks for anomaly detection and clustering

- From past experience: for sparse matrices, the non-backtracking spectrum works better than the standard spectrum.
- Talk to me!
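The non-backtracking (Hashimoto) matrix B is indexed by directed edges: B[(u→v), (v→w)] = 1 when w ≠ u, and 0 otherwise. The following is a minimal pure-Python sketch, assuming an undirected simple graph given as adjacency sets. It builds B sparsely and estimates its leading eigenvalue by power iteration, which assumes that eigenvalue is strictly dominant in modulus. The function names are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]  # undirected simple graph: v -> neighbours
Edge = Tuple[int, int]       # directed edge u -> v


def non_backtracking(G: Graph) -> Tuple[List[Edge], Dict[Edge, List[Edge]]]:
    """Return the directed edges and, for each edge (u, v), the edges (v, w)
    with w != u, i.e. the non-zero entries of its row of B."""
    edges = [(u, v) for u in G for v in G[u]]
    succ = {(u, v): [(v, w) for w in G[v] if w != u] for (u, v) in edges}
    return edges, succ


def leading_eigenvalue(G: Graph, iters: int = 200) -> float:
    """Power iteration on B, started from the all-ones vector."""
    edges, succ = non_backtracking(G)
    x = {e: 1.0 for e in edges}
    lam = 0.0
    for _ in range(iters):
        y = {e: sum(x[f] for f in succ[e]) for e in edges}  # y = B x
        norm_y = math.sqrt(sum(val * val for val in y.values()))
        if norm_y == 0.0:  # B is nilpotent, e.g. on a tree
            return 0.0
        lam = norm_y / math.sqrt(sum(val * val for val in x.values()))
        x = {e: val / norm_y for e, val in y.items()}
    return lam


if __name__ == "__main__":
    k4 = {u: {v for v in range(4) if v != u} for u in range(4)}
    print(leading_eigenvalue(k4))  # approximately 2.0, i.e. d - 1 for d = 3
```

On a d-regular graph every row of B sums to d − 1, so the all-ones vector is an eigenvector with eigenvalue d − 1, which gives a quick sanity check. On a tree every non-backtracking walk eventually dies out, so B is nilpotent.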


Thank you!


BP and a New Type of Random Matrix

- Thm: If d(1 − 2ε)² > 1, then detection is possible.
- Conj (Krzakala, Moore, M., Neeman, Sly, Zdeborová, Zhang 13): If A is the adjacency matrix, then w.h.p. the second eigenvector of

  $$N = \begin{pmatrix} 0 & D - I \\ -I & A \end{pmatrix}, \qquad D = \operatorname{diag}(d_{v_1}, \dots, d_{v_n}),$$

  is correlated with the partition, and the second eigenvalue is d(1 − 2ε) + o_n(1).
- No orthogonal structure! N is neither symmetric nor normal, and the singular vectors of N are useless.
- KMMNSZZ derived N by linearizing belief propagation and applying a number-theory identity of Hashimoto (89).
- Note: the conjectured linear-algebra algorithm is deterministic.
- The conjecture was established by Bordenave-Lelarge-Massoulie 15.
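The 2n × 2n matrix N can be formed directly from A and D. The pure-Python sketch below (hypothetical names) builds N with blocks [[0, D − I], [−I, A]] and estimates the modulus of its leading eigenvalue by power iteration. Power iteration only reaches the top eigenvalue; for the second eigenpair that the conjecture is about, one would use a general non-symmetric eigensolver such as `numpy.linalg.eig`. A Schur-complement computation gives det(λI − N) = det(λ²I − λA + D − I), the factor appearing in the Ihara–Bass formula for the non-backtracking matrix. On a d-regular graph, the eigenvalue d of A therefore yields λ ∈ {1, d − 1}, and d − 1 is the leading eigenvalue of N.

```python
import math
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # undirected simple graph on nodes 0..n-1


def build_N(G: Graph) -> List[List[float]]:
    """Return N = [[0, D - I], [-I, A]] as a dense 2n x 2n list of rows."""
    n = len(G)
    N = [[0.0] * (2 * n) for _ in range(2 * n)]
    for v in range(n):
        N[v][n + v] = len(G[v]) - 1.0  # top-right block: D - I
        N[n + v][v] = -1.0             # bottom-left block: -I
        for w in G[v]:
            N[n + v][n + w] = 1.0      # bottom-right block: A
    return N


def leading_modulus(N: List[List[float]], iters: int = 500) -> float:
    """Power iteration; assumes a unique eigenvalue of largest modulus."""
    x = [1.0] * len(N)
    lam = 0.0
    for _ in range(iters):
        y = [sum(a * b for a, b in zip(row, x)) for row in N]  # y = N x
        norm_y = math.sqrt(sum(t * t for t in y))
        if norm_y == 0.0:
            return 0.0
        lam = norm_y / math.sqrt(sum(t * t for t in x))
        x = [t / norm_y for t in y]
    return lam


if __name__ == "__main__":
    k4 = {u: {v for v in range(4) if v != u} for u in range(4)}
    print(leading_modulus(build_N(k4)))  # approximately 2.0 = d - 1
```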


The Eigenvalues of N

d = 3, d(1 − 2ε) = 2, √d = 1.732...

Figure: eigenvalues of N (horizontal axis from −1 to 3, vertical axis from −1.5 to 1.5), with λ2 marked.
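Assuming 0 ≤ ε < 1/2, so that 1 − 2ε > 0, the detection condition d(1 − 2ε)² > 1 can be rewritten as a comparison between d(1 − 2ε), the conjectured second eigenvalue of N, and √d. This is presumably why the figure lists both values. With the figure's parameters d = 3 and d(1 − 2ε) = 2 (so ε = 1/6):

```latex
d(1-2\varepsilon)^2 > 1
\iff d^2(1-2\varepsilon)^2 > d
\iff d(1-2\varepsilon) > \sqrt{d},
\qquad\text{e.g. } d = 3,\ \varepsilon = \tfrac{1}{6}:\quad
3\left(\tfrac{2}{3}\right)^2 = \tfrac{4}{3} > 1
\iff 2 > \sqrt{3} \approx 1.732.
```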


The spectrum on real networks

Figure: spectra on six real networks, one panel per network (axis ticks omitted):
- Football: q = 10
- Polblogs: q = 2, overlap 0.8533
- Adjnoun: q = 2, overlap 0.6250
- Dolphins: q = 2, overlap 0.7419
- Polbooks: q = 2, overlap 0.7571
- Karate: q = 2, overlap 1
