Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized...
Transcript of Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized...
![Page 1: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/1.jpg)
Detecting Path Anomalies in Sequential Data on Networks
Tim LaRocktlarock.github.io
[email protected] collaboration with
Tina Eliassi-RadGiona CasiraghiVahan Nanumyan
1
Ingo Scholtes Frank Schweitzer
![Page 2: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/2.jpg)
This TalkMotivation: Understanding mechanisms behind sequential data on networks
Today:Motivate the study of path anomalies
Introduce de Bruijn graph representation of sequential data
Develop tractable null model to measure deviation of path data from expectation
Validate null model in synthetic data + compare with naïve baseline method
Application of methodology to a real system
2
![Page 3: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/3.jpg)
Intuitive Example: Passenger Flight Data
3
![Page 4: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/4.jpg)
4
Itinerary 1
Itinerary 2
…
Itinerary 3
…
…
…
BOS JFK LAX
ATL JFK BTV
BOS JFK BTV
Intuitive Example: Passenger Flight Data
![Page 5: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/5.jpg)
5
x 10
x 1
x 0
Itinerary 1
Itinerary 2
…
Itinerary 3
BOS JFK LAX
ATL JFK BTV
BOS JFK BTV
Intuitive Example: Passenger Flight Data
![Page 6: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/6.jpg)
6
x 10
x 1
x 0
Itinerary 1
Itinerary 2
…
Itinerary 3
BOS JFK LAX
ATL JFK BTV
BOS JFK BTV
Intuitive Example: Passenger Flight Data
Underrepresented?
![Page 7: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/7.jpg)
7
x 10
x 1
x 0
Itinerary 1
Itinerary 2
…
Itinerary 3
BOS JFK LAX
ATL JFK BTV
BOS JFK BTV
Intuitive Example: Passenger Flight Data
Overrepresented?
![Page 8: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/8.jpg)
is significantly overrepresented?
8
Research Question
is significantly underepresented?
x 10
x 1
x 0
Itinerary 1
Itinerary 2
Itinerary 3
BOS JFK LAX
ATL JFK BTV
JFK BTVBOS
Given this pathway dataset, can we determine whether…
![Page 9: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/9.jpg)
is significantly overrepresented?
9
Research Question
is significantly underepresented?
x 10
x 1
x 0
Itinerary 1
Itinerary 2
Itinerary 3
BOS JFK LAX
ATL JFK BTV
JFK BTVBOS
Given this pathway dataset, can we determine whether…
In other words: Which paths are anomalous?
![Page 10: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/10.jpg)
Problem: Path anomaly detection
10
For a given pathway dataset S, graph G, and integer k, identify paths of length k through G whose observed frequencies in S deviate significantly from random expectation in a (k-1)-order model of paths through G.
![Page 11: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/11.jpg)
Problem: Path anomaly detection
11
For a given pathway dataset S, graph G, and integer k, identify paths of length k through G whose observed frequencies in S deviate significantly from random expectation in a (k-1)-order model of paths through G.
When k=2, this corresponds to comparing a random walk with a single step of memory to a memoryless (Markovian) random walk on G.
![Page 12: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/12.jpg)
Toy Example
12
Three Goals:
1. Introduce de Bruijn graphs as representations of sequential data
2. Show how path anomalies emerge in a simple setting
3. Show how path anomalies can be detected through a random walk simulation approach
(Spoiler: Simulation approach is infeasible for real world datasets!)
![Page 13: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/13.jpg)
Toy Example
13
Itinerary 1
Itinerary 2
…
Itinerary 3
…
…
…
BOS JFK LAX
ATL JFK BTV
BOS JFK BTV
![Page 14: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/14.jpg)
Toy Example
14
Pathway 1
Pathway 2
…
Pathway 3
…
…
…
XA C
B DX
XA C
![Page 15: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/15.jpg)
Toy Example: Data
15
XA C
B DX
XA C
…
…
…
p1<latexit sha1_base64="32piOB4m0s8TFvrrE//lPH6QV48=">AAAB7HicbVA9TwJBEJ3DL8Qv1NJmI5hYkTsstCTaWGLiAQlcyN4ywIa9vcvungm58BtsLDTG1h9k579xgSsUfMkkL+/NZGZemAiujet+O4WNza3tneJuaW//4PCofHzS0nGqGPosFrHqhFSj4BJ9w43ATqKQRqHAdji5m/vtJ1Sax/LRTBMMIjqSfMgZNVbyq0nfq/bLFbfmLkDWiZeTCuRo9stfvUHM0gilYYJq3fXcxAQZVYYzgbNSL9WYUDahI+xaKmmEOsgWx87IhVUGZBgrW9KQhfp7IqOR1tMotJ0RNWO96s3F/7xuaoY3QcZlkhqUbLlomApiYjL/nAy4QmbE1BLKFLe3EjamijJj8ynZELzVl9dJq17zrmr1h3qlcZvHUYQzOIdL8OAaGnAPTfCBAYdneIU3RzovzrvzsWwtOPnMKfyB8/kDuLCN9g==</latexit>
p2<latexit sha1_base64="7XYqazwj7X+GHe48AUv8nkePnnc=">AAAB7HicbVBNT8JAEJ3iF+IX6tHLRjDxRNp60CPRi0dMLJBAQ7bLFjZsd5vdrQlp+A1ePGiMV3+QN/+NC/Sg4EsmeXlvJjPzopQzbVz32yltbG5t75R3K3v7B4dH1eOTtpaZIjQgkkvVjbCmnAkaGGY47aaK4iTitBNN7uZ+54kqzaR4NNOUhgkeCRYzgo2Vgno68OuDas1tuAugdeIVpAYFWoPqV38oSZZQYQjHWvc8NzVhjpVhhNNZpZ9pmmIywSPas1TghOowXxw7QxdWGaJYKlvCoIX6eyLHidbTJLKdCTZjverNxf+8XmbimzBnIs0MFWS5KM44MhLNP0dDpigxfGoJJorZWxEZY4WJsflUbAje6svrpO03vKuG/+DXmrdFHGU4g3O4BA+uoQn30IIACDB4hld4c4Tz4rw7H8vWklPMnMIfOJ8/ujWN9w==</latexit>
p3<latexit sha1_base64="Z6AU2fag2QeM9tg6J4H+2085QRQ=">AAAB7HicbVA9TwJBEJ3DL8Qv1NJmI5hYkTsotCTaWGLiIQlcyN4ywIa9vcvungm58BtsLDTG1h9k579xgSsUfMkkL+/NZGZemAiujet+O4WNza3tneJuaW//4PCofHzS1nGqGPosFrHqhFSj4BJ9w43ATqKQRqHAx3ByO/cfn1BpHssHM00wiOhI8iFn1FjJryb9RrVfrrg1dwGyTrycVCBHq1/+6g1ilkYoDRNU667nJibIqDKcCZyVeqnGhLIJHWHXUkkj1EG2OHZGLqwyIMNY2ZKGLNTfExmNtJ5Goe2MqBnrVW8u/ud1UzO8DjIuk9SgZMtFw1QQE5P552TAFTIjppZQpri9lbAxVZQZm0/JhuCtvrxO2vWa16jV7+uV5k0eRxHO4BwuwYMraMIdtMAHBhye4RXeHOm8OO/Ox7K14OQzp/AHzucPu7qN+A==</latexit>
S = {<latexit sha1_base64="jR9Gx72XqiPCM3LHdS3SOjkze5A=">AAAB73icbVA9TwJBEJ3DL8Qv1NJmI5hYkTsstDEh2lhiFCQBQvaWPdiwt3fuzpmQC3/CxkJjbP07dv4bF7hCwZdM8vLeTGbm+bEUBl3328mtrK6tb+Q3C1vbO7t7xf2DpokSzXiDRTLSLZ8aLoXiDRQoeSvWnIa+5A/+6HrqPzxxbUSk7nEc825IB0oEglG0Uqt8Ry5JJy33iiW34s5AlomXkRJkqPeKX51+xJKQK2SSGtP23Bi7KdUomOSTQicxPKZsRAe8bamiITfddHbvhJxYpU+CSNtSSGbq74mUhsaMQ992hhSHZtGbiv957QSDi24qVJwgV2y+KEgkwYhMnyd9oTlDObaEMi3srYQNqaYMbUQFG4K3+PIyaVYr3lmlelst1a6yOPJwBMdwCh6cQw1uoA4NYCDhGV7hzXl0Xpx352PemnOymUP4A+fzByf6jrs=</latexit>
pN<latexit sha1_base64="66lAMOV8OdCg9bnkdFLAB9TDMvI=">AAAB6nicbVBNS8NAEJ34WetX1aOXxSJ4KkkV9Fj04kkq2g9oQ9lsJ+3SzSbsboQS+hO8eFDEq7/Im//GbZuDtj4YeLw3w8y8IBFcG9f9dlZW19Y3Ngtbxe2d3b390sFhU8epYthgsYhVO6AaBZfYMNwIbCcKaRQIbAWjm6nfekKleSwfzThBP6IDyUPOqLHSQ9K765XKbsWdgSwTLydlyFHvlb66/ZilEUrDBNW647mJ8TOqDGcCJ8VuqjGhbEQH2LFU0gi1n81OnZBTq/RJGCtb0pCZ+nsio5HW4yiwnRE1Q73oTcX/vE5qwis/4zJJDUo2XxSmgpiYTP8mfa6QGTG2hDLF7a2EDamizNh0ijYEb/HlZdKsVrzzSvX+oly7zuMowDGcwBl4cAk1uIU6NIDBAJ7hFd4c4bw4787HvHXFyWeO4A+czx8swo25</latexit>
…
![Page 16: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/16.jpg)
Toy Example: Data
16
Total number of paths N = |S| = 235
<latexit sha1_base64="4z0D7GzwoVnotHiFLJOXG6+EZnY=">AAACDnicbVDLSgMxFM34rPU16tJNsC24KjMtohuh6MaVVOwL2qFk0kwbmkmGJCOUab/Ajb/ixoUibl27829M21lo64GEwzn3cu89fsSo0o7zba2srq1vbGa2sts7u3v79sFhQ4lYYlLHggnZ8pEijHJS11Qz0ookQaHPSNMfXk/95gORigpe06OIeCHqcxpQjLSRunahJjRikMehTyQUAYyQHiiYv4WXcHw/Nn+pfJbv2jmn6MwAl4mbkhxIUe3aX52ewHFIuMYMKdV2nUh7CZKaYkYm2U6sSITwEPVJ21COQqK8ZHbOBBaM0oOBkOZxDWfq744EhUqNQt9UhtNtF72p+J/XjnVw4SWUR7EmHM8HBTGDWsBpNrBHJcGajQxBWFKzK8QDJBHWJsGsCcFdPHmZNEpFt1ws3ZVylas0jgw4BifgFLjgHFTADaiCOsDgETyDV/BmPVkv1rv1MS9dsdKeI/AH1ucPKe6Zlg==</latexit>
…
p1<latexit sha1_base64="32piOB4m0s8TFvrrE//lPH6QV48=">AAAB7HicbVA9TwJBEJ3DL8Qv1NJmI5hYkTsstCTaWGLiAQlcyN4ywIa9vcvungm58BtsLDTG1h9k579xgSsUfMkkL+/NZGZemAiujet+O4WNza3tneJuaW//4PCofHzS0nGqGPosFrHqhFSj4BJ9w43ATqKQRqHAdji5m/vtJ1Sax/LRTBMMIjqSfMgZNVbyq0nfq/bLFbfmLkDWiZeTCuRo9stfvUHM0gilYYJq3fXcxAQZVYYzgbNSL9WYUDahI+xaKmmEOsgWx87IhVUGZBgrW9KQhfp7IqOR1tMotJ0RNWO96s3F/7xuaoY3QcZlkhqUbLlomApiYjL/nAy4QmbE1BLKFLe3EjamijJj8ynZELzVl9dJq17zrmr1h3qlcZvHUYQzOIdL8OAaGnAPTfCBAYdneIU3RzovzrvzsWwtOPnMKfyB8/kDuLCN9g==</latexit>
p2<latexit sha1_base64="7XYqazwj7X+GHe48AUv8nkePnnc=">AAAB7HicbVBNT8JAEJ3iF+IX6tHLRjDxRNp60CPRi0dMLJBAQ7bLFjZsd5vdrQlp+A1ePGiMV3+QN/+NC/Sg4EsmeXlvJjPzopQzbVz32yltbG5t75R3K3v7B4dH1eOTtpaZIjQgkkvVjbCmnAkaGGY47aaK4iTitBNN7uZ+54kqzaR4NNOUhgkeCRYzgo2Vgno68OuDas1tuAugdeIVpAYFWoPqV38oSZZQYQjHWvc8NzVhjpVhhNNZpZ9pmmIywSPas1TghOowXxw7QxdWGaJYKlvCoIX6eyLHidbTJLKdCTZjverNxf+8XmbimzBnIs0MFWS5KM44MhLNP0dDpigxfGoJJorZWxEZY4WJsflUbAje6svrpO03vKuG/+DXmrdFHGU4g3O4BA+uoQn30IIACDB4hld4c4Tz4rw7H8vWklPMnMIfOJ8/ujWN9w==</latexit>
p3<latexit sha1_base64="Z6AU2fag2QeM9tg6J4H+2085QRQ=">AAAB7HicbVA9TwJBEJ3DL8Qv1NJmI5hYkTsotCTaWGLiIQlcyN4ywIa9vcvungm58BtsLDTG1h9k579xgSsUfMkkL+/NZGZemAiujet+O4WNza3tneJuaW//4PCofHzS1nGqGPosFrHqhFSj4BJ9w43ATqKQRqHAx3ByO/cfn1BpHssHM00wiOhI8iFn1FjJryb9RrVfrrg1dwGyTrycVCBHq1/+6g1ilkYoDRNU667nJibIqDKcCZyVeqnGhLIJHWHXUkkj1EG2OHZGLqwyIMNY2ZKGLNTfExmNtJ5Goe2MqBnrVW8u/ud1UzO8DjIuk9SgZMtFw1QQE5P552TAFTIjppZQpri9lbAxVZQZm0/JhuCtvrxO2vWa16jV7+uV5k0eRxHO4BwuwYMraMIdtMAHBhye4RXeHOm8OO/Ox7K14OQzp/AHzucPu7qN+A==</latexit>
S = {<latexit sha1_base64="jR9Gx72XqiPCM3LHdS3SOjkze5A=">AAAB73icbVA9TwJBEJ3DL8Qv1NJmI5hYkTsstDEh2lhiFCQBQvaWPdiwt3fuzpmQC3/CxkJjbP07dv4bF7hCwZdM8vLeTGbm+bEUBl3328mtrK6tb+Q3C1vbO7t7xf2DpokSzXiDRTLSLZ8aLoXiDRQoeSvWnIa+5A/+6HrqPzxxbUSk7nEc825IB0oEglG0Uqt8Ry5JJy33iiW34s5AlomXkRJkqPeKX51+xJKQK2SSGtP23Bi7KdUomOSTQicxPKZsRAe8bamiITfddHbvhJxYpU+CSNtSSGbq74mUhsaMQ992hhSHZtGbiv957QSDi24qVJwgV2y+KEgkwYhMnyd9oTlDObaEMi3srYQNqaYMbUQFG4K3+PIyaVYr3lmlelst1a6yOPJwBMdwCh6cQw1uoA4NYCDhGV7hzXl0Xpx352PemnOymUP4A+fzByf6jrs=</latexit>
pN<latexit sha1_base64="66lAMOV8OdCg9bnkdFLAB9TDMvI=">AAAB6nicbVBNS8NAEJ34WetX1aOXxSJ4KkkV9Fj04kkq2g9oQ9lsJ+3SzSbsboQS+hO8eFDEq7/Im//GbZuDtj4YeLw3w8y8IBFcG9f9dlZW19Y3Ngtbxe2d3b390sFhU8epYthgsYhVO6AaBZfYMNwIbCcKaRQIbAWjm6nfekKleSwfzThBP6IDyUPOqLHSQ9K765XKbsWdgSwTLydlyFHvlb66/ZilEUrDBNW647mJ8TOqDGcCJ8VuqjGhbEQH2LFU0gi1n81OnZBTq/RJGCtb0pCZ+nsio5HW4yiwnRE1Q73oTcX/vE5qwis/4zJJDUo2XxSmgpiYTP8mfa6QGTG2hDLF7a2EDamizNh0ijYEb/HlZdKsVrzzSvX+oly7zuMowDGcwBl4cAk1uIU6NIDBAJ7hFd4c4bw4787HvHXFyWeO4A+czx8swo25</latexit>
XA C
B DX
XA C
…
…
…
![Page 17: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/17.jpg)
Toy Example: Data to (first-order) graph
17
edge counts
30
XA
205
XB
135
CX
100
DX
XA C
B DX
XA C
…
…
…
p1<latexit sha1_base64="32piOB4m0s8TFvrrE//lPH6QV48=">AAAB7HicbVA9TwJBEJ3DL8Qv1NJmI5hYkTsstCTaWGLiAQlcyN4ywIa9vcvungm58BtsLDTG1h9k579xgSsUfMkkL+/NZGZemAiujet+O4WNza3tneJuaW//4PCofHzS0nGqGPosFrHqhFSj4BJ9w43ATqKQRqHAdji5m/vtJ1Sax/LRTBMMIjqSfMgZNVbyq0nfq/bLFbfmLkDWiZeTCuRo9stfvUHM0gilYYJq3fXcxAQZVYYzgbNSL9WYUDahI+xaKmmEOsgWx87IhVUGZBgrW9KQhfp7IqOR1tMotJ0RNWO96s3F/7xuaoY3QcZlkhqUbLlomApiYjL/nAy4QmbE1BLKFLe3EjamijJj8ynZELzVl9dJq17zrmr1h3qlcZvHUYQzOIdL8OAaGnAPTfCBAYdneIU3RzovzrvzsWwtOPnMKfyB8/kDuLCN9g==</latexit>
p2<latexit sha1_base64="7XYqazwj7X+GHe48AUv8nkePnnc=">AAAB7HicbVBNT8JAEJ3iF+IX6tHLRjDxRNp60CPRi0dMLJBAQ7bLFjZsd5vdrQlp+A1ePGiMV3+QN/+NC/Sg4EsmeXlvJjPzopQzbVz32yltbG5t75R3K3v7B4dH1eOTtpaZIjQgkkvVjbCmnAkaGGY47aaK4iTitBNN7uZ+54kqzaR4NNOUhgkeCRYzgo2Vgno68OuDas1tuAugdeIVpAYFWoPqV38oSZZQYQjHWvc8NzVhjpVhhNNZpZ9pmmIywSPas1TghOowXxw7QxdWGaJYKlvCoIX6eyLHidbTJLKdCTZjverNxf+8XmbimzBnIs0MFWS5KM44MhLNP0dDpigxfGoJJorZWxEZY4WJsflUbAje6svrpO03vKuG/+DXmrdFHGU4g3O4BA+uoQn30IIACDB4hld4c4Tz4rw7H8vWklPMnMIfOJ8/ujWN9w==</latexit>
p3<latexit sha1_base64="Z6AU2fag2QeM9tg6J4H+2085QRQ=">AAAB7HicbVA9TwJBEJ3DL8Qv1NJmI5hYkTsotCTaWGLiIQlcyN4ywIa9vcvungm58BtsLDTG1h9k579xgSsUfMkkL+/NZGZemAiujet+O4WNza3tneJuaW//4PCofHzS1nGqGPosFrHqhFSj4BJ9w43ATqKQRqHAx3ByO/cfn1BpHssHM00wiOhI8iFn1FjJryb9RrVfrrg1dwGyTrycVCBHq1/+6g1ilkYoDRNU667nJibIqDKcCZyVeqnGhLIJHWHXUkkj1EG2OHZGLqwyIMNY2ZKGLNTfExmNtJ5Goe2MqBnrVW8u/ud1UzO8DjIuk9SgZMtFw1QQE5P552TAFTIjppZQpri9lbAxVZQZm0/JhuCtvrxO2vWa16jV7+uV5k0eRxHO4BwuwYMraMIdtMAHBhye4RXeHOm8OO/Ox7K14OQzp/AHzucPu7qN+A==</latexit>
S = {<latexit sha1_base64="jR9Gx72XqiPCM3LHdS3SOjkze5A=">AAAB73icbVA9TwJBEJ3DL8Qv1NJmI5hYkTsstDEh2lhiFCQBQvaWPdiwt3fuzpmQC3/CxkJjbP07dv4bF7hCwZdM8vLeTGbm+bEUBl3328mtrK6tb+Q3C1vbO7t7xf2DpokSzXiDRTLSLZ8aLoXiDRQoeSvWnIa+5A/+6HrqPzxxbUSk7nEc825IB0oEglG0Uqt8Ry5JJy33iiW34s5AlomXkRJkqPeKX51+xJKQK2SSGtP23Bi7KdUomOSTQicxPKZsRAe8bamiITfddHbvhJxYpU+CSNtSSGbq74mUhsaMQ992hhSHZtGbiv957QSDi24qVJwgV2y+KEgkwYhMnyd9oTlDObaEMi3srYQNqaYMbUQFG4K3+PIyaVYr3lmlelst1a6yOPJwBMdwCh6cQw1uoA4NYCDhGV7hzXl0Xpx352PemnOymUP4A+fzByf6jrs=</latexit>
pN<latexit sha1_base64="66lAMOV8OdCg9bnkdFLAB9TDMvI=">AAAB6nicbVBNS8NAEJ34WetX1aOXxSJ4KkkV9Fj04kkq2g9oQ9lsJ+3SzSbsboQS+hO8eFDEq7/Im//GbZuDtj4YeLw3w8y8IBFcG9f9dlZW19Y3Ngtbxe2d3b390sFhU8epYthgsYhVO6AaBZfYMNwIbCcKaRQIbAWjm6nfekKleSwfzThBP6IDyUPOqLHSQ9K765XKbsWdgSwTLydlyFHvlb66/ZilEUrDBNW647mJ8TOqDGcCJ8VuqjGhbEQH2LFU0gi1n81OnZBTq/RJGCtb0pCZ+nsio5HW4yiwnRE1Q73oTcX/vE5qwis/4zJJDUo2XxSmgpiYTP8mfa6QGTG2hDLF7a2EDamizNh0ijYEb/HlZdKsVrzzSvX+oly7zuMowDGcwBl4cAk1uIU6NIDBAJ7hFd4c4bw4787HvHXFyWeO4A+czx8swo25</latexit>
…
Total number of paths N = |S| = 235
<latexit sha1_base64="4z0D7GzwoVnotHiFLJOXG6+EZnY=">AAACDnicbVDLSgMxFM34rPU16tJNsC24KjMtohuh6MaVVOwL2qFk0kwbmkmGJCOUab/Ajb/ixoUibl27829M21lo64GEwzn3cu89fsSo0o7zba2srq1vbGa2sts7u3v79sFhQ4lYYlLHggnZ8pEijHJS11Qz0ookQaHPSNMfXk/95gORigpe06OIeCHqcxpQjLSRunahJjRikMehTyQUAYyQHiiYv4WXcHw/Nn+pfJbv2jmn6MwAl4mbkhxIUe3aX52ewHFIuMYMKdV2nUh7CZKaYkYm2U6sSITwEPVJ21COQqK8ZHbOBBaM0oOBkOZxDWfq744EhUqNQt9UhtNtF72p+J/XjnVw4SWUR7EmHM8HBTGDWsBpNrBHJcGajQxBWFKzK8QDJBHWJsGsCcFdPHmZNEpFt1ws3ZVylas0jgw4BifgFLjgHFTADaiCOsDgETyDV/BmPVkv1rv1MS9dsdKeI/AH1ucPKe6Zlg==</latexit>
![Page 18: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/18.jpg)
Toy Example: Data to (first-order) graph
18
edge counts
30
XA
205
XB
135
CX
100
DX A
BX
C
D
XA C
B DX
XA C
…
…
…
p1<latexit sha1_base64="32piOB4m0s8TFvrrE//lPH6QV48=">AAAB7HicbVA9TwJBEJ3DL8Qv1NJmI5hYkTsstCTaWGLiAQlcyN4ywIa9vcvungm58BtsLDTG1h9k579xgSsUfMkkL+/NZGZemAiujet+O4WNza3tneJuaW//4PCofHzS0nGqGPosFrHqhFSj4BJ9w43ATqKQRqHAdji5m/vtJ1Sax/LRTBMMIjqSfMgZNVbyq0nfq/bLFbfmLkDWiZeTCuRo9stfvUHM0gilYYJq3fXcxAQZVYYzgbNSL9WYUDahI+xaKmmEOsgWx87IhVUGZBgrW9KQhfp7IqOR1tMotJ0RNWO96s3F/7xuaoY3QcZlkhqUbLlomApiYjL/nAy4QmbE1BLKFLe3EjamijJj8ynZELzVl9dJq17zrmr1h3qlcZvHUYQzOIdL8OAaGnAPTfCBAYdneIU3RzovzrvzsWwtOPnMKfyB8/kDuLCN9g==</latexit>
p2<latexit sha1_base64="7XYqazwj7X+GHe48AUv8nkePnnc=">AAAB7HicbVBNT8JAEJ3iF+IX6tHLRjDxRNp60CPRi0dMLJBAQ7bLFjZsd5vdrQlp+A1ePGiMV3+QN/+NC/Sg4EsmeXlvJjPzopQzbVz32yltbG5t75R3K3v7B4dH1eOTtpaZIjQgkkvVjbCmnAkaGGY47aaK4iTitBNN7uZ+54kqzaR4NNOUhgkeCRYzgo2Vgno68OuDas1tuAugdeIVpAYFWoPqV38oSZZQYQjHWvc8NzVhjpVhhNNZpZ9pmmIywSPas1TghOowXxw7QxdWGaJYKlvCoIX6eyLHidbTJLKdCTZjverNxf+8XmbimzBnIs0MFWS5KM44MhLNP0dDpigxfGoJJorZWxEZY4WJsflUbAje6svrpO03vKuG/+DXmrdFHGU4g3O4BA+uoQn30IIACDB4hld4c4Tz4rw7H8vWklPMnMIfOJ8/ujWN9w==</latexit>
p3<latexit sha1_base64="Z6AU2fag2QeM9tg6J4H+2085QRQ=">AAAB7HicbVA9TwJBEJ3DL8Qv1NJmI5hYkTsotCTaWGLiIQlcyN4ywIa9vcvungm58BtsLDTG1h9k579xgSsUfMkkL+/NZGZemAiujet+O4WNza3tneJuaW//4PCofHzS1nGqGPosFrHqhFSj4BJ9w43ATqKQRqHAx3ByO/cfn1BpHssHM00wiOhI8iFn1FjJryb9RrVfrrg1dwGyTrycVCBHq1/+6g1ilkYoDRNU667nJibIqDKcCZyVeqnGhLIJHWHXUkkj1EG2OHZGLqwyIMNY2ZKGLNTfExmNtJ5Goe2MqBnrVW8u/ud1UzO8DjIuk9SgZMtFw1QQE5P552TAFTIjppZQpri9lbAxVZQZm0/JhuCtvrxO2vWa16jV7+uV5k0eRxHO4BwuwYMraMIdtMAHBhye4RXeHOm8OO/Ox7K14OQzp/AHzucPu7qN+A==</latexit>
S = {<latexit sha1_base64="jR9Gx72XqiPCM3LHdS3SOjkze5A=">AAAB73icbVA9TwJBEJ3DL8Qv1NJmI5hYkTsstDEh2lhiFCQBQvaWPdiwt3fuzpmQC3/CxkJjbP07dv4bF7hCwZdM8vLeTGbm+bEUBl3328mtrK6tb+Q3C1vbO7t7xf2DpokSzXiDRTLSLZ8aLoXiDRQoeSvWnIa+5A/+6HrqPzxxbUSk7nEc825IB0oEglG0Uqt8Ry5JJy33iiW34s5AlomXkRJkqPeKX51+xJKQK2SSGtP23Bi7KdUomOSTQicxPKZsRAe8bamiITfddHbvhJxYpU+CSNtSSGbq74mUhsaMQ992hhSHZtGbiv957QSDi24qVJwgV2y+KEgkwYhMnyd9oTlDObaEMi3srYQNqaYMbUQFG4K3+PIyaVYr3lmlelst1a6yOPJwBMdwCh6cQw1uoA4NYCDhGV7hzXl0Xpx352PemnOymUP4A+fzByf6jrs=</latexit>
pN<latexit sha1_base64="66lAMOV8OdCg9bnkdFLAB9TDMvI=">AAAB6nicbVBNS8NAEJ34WetX1aOXxSJ4KkkV9Fj04kkq2g9oQ9lsJ+3SzSbsboQS+hO8eFDEq7/Im//GbZuDtj4YeLw3w8y8IBFcG9f9dlZW19Y3Ngtbxe2d3b390sFhU8epYthgsYhVO6AaBZfYMNwIbCcKaRQIbAWjm6nfekKleSwfzThBP6IDyUPOqLHSQ9K765XKbsWdgSwTLydlyFHvlb66/ZilEUrDBNW647mJ8TOqDGcCJ8VuqjGhbEQH2LFU0gi1n81OnZBTq/RJGCtb0pCZ+nsio5HW4yiwnRE1Q73oTcX/vE5qwis/4zJJDUo2XxSmgpiYTP8mfa6QGTG2hDLF7a2EDamizNh0ijYEb/HlZdKsVrzzSvX+oly7zuMowDGcwBl4cAk1uIU6NIDBAJ7hFd4c4bw4787HvHXFyWeO4A+czx8swo25</latexit>
…
Total number of paths N = |S| = 235
<latexit sha1_base64="4z0D7GzwoVnotHiFLJOXG6+EZnY=">AAACDnicbVDLSgMxFM34rPU16tJNsC24KjMtohuh6MaVVOwL2qFk0kwbmkmGJCOUab/Ajb/ixoUibl27829M21lo64GEwzn3cu89fsSo0o7zba2srq1vbGa2sts7u3v79sFhQ4lYYlLHggnZ8pEijHJS11Qz0ookQaHPSNMfXk/95gORigpe06OIeCHqcxpQjLSRunahJjRikMehTyQUAYyQHiiYv4WXcHw/Nn+pfJbv2jmn6MwAl4mbkhxIUe3aX52ewHFIuMYMKdV2nUh7CZKaYkYm2U6sSITwEPVJ21COQqK8ZHbOBBaM0oOBkOZxDWfq744EhUqNQt9UhtNtF72p+J/XjnVw4SWUR7EmHM8HBTGDWsBpNrBHJcGajQxBWFKzK8QDJBHWJsGsCcFdPHmZNEpFt1ws3ZVylas0jgw4BifgFLjgHFTADaiCOsDgETyDV/BmPVkv1rv1MS9dsdKeI/AH1ucPKe6Zlg==</latexit>
![Page 19: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/19.jpg)
Toy Example: Data to 2nd order de Bruijn graph
19
Path counts
30
XA X B
105
CX
100
DXC BA D
XA C
B DX
XA C
…
…
…
p1<latexit sha1_base64="32piOB4m0s8TFvrrE//lPH6QV48=">AAAB7HicbVA9TwJBEJ3DL8Qv1NJmI5hYkTsstCTaWGLiAQlcyN4ywIa9vcvungm58BtsLDTG1h9k579xgSsUfMkkL+/NZGZemAiujet+O4WNza3tneJuaW//4PCofHzS0nGqGPosFrHqhFSj4BJ9w43ATqKQRqHAdji5m/vtJ1Sax/LRTBMMIjqSfMgZNVbyq0nfq/bLFbfmLkDWiZeTCuRo9stfvUHM0gilYYJq3fXcxAQZVYYzgbNSL9WYUDahI+xaKmmEOsgWx87IhVUGZBgrW9KQhfp7IqOR1tMotJ0RNWO96s3F/7xuaoY3QcZlkhqUbLlomApiYjL/nAy4QmbE1BLKFLe3EjamijJj8ynZELzVl9dJq17zrmr1h3qlcZvHUYQzOIdL8OAaGnAPTfCBAYdneIU3RzovzrvzsWwtOPnMKfyB8/kDuLCN9g==</latexit>
p2<latexit sha1_base64="7XYqazwj7X+GHe48AUv8nkePnnc=">AAAB7HicbVBNT8JAEJ3iF+IX6tHLRjDxRNp60CPRi0dMLJBAQ7bLFjZsd5vdrQlp+A1ePGiMV3+QN/+NC/Sg4EsmeXlvJjPzopQzbVz32yltbG5t75R3K3v7B4dH1eOTtpaZIjQgkkvVjbCmnAkaGGY47aaK4iTitBNN7uZ+54kqzaR4NNOUhgkeCRYzgo2Vgno68OuDas1tuAugdeIVpAYFWoPqV38oSZZQYQjHWvc8NzVhjpVhhNNZpZ9pmmIywSPas1TghOowXxw7QxdWGaJYKlvCoIX6eyLHidbTJLKdCTZjverNxf+8XmbimzBnIs0MFWS5KM44MhLNP0dDpigxfGoJJorZWxEZY4WJsflUbAje6svrpO03vKuG/+DXmrdFHGU4g3O4BA+uoQn30IIACDB4hld4c4Tz4rw7H8vWklPMnMIfOJ8/ujWN9w==</latexit>
p3<latexit sha1_base64="Z6AU2fag2QeM9tg6J4H+2085QRQ=">AAAB7HicbVA9TwJBEJ3DL8Qv1NJmI5hYkTsotCTaWGLiIQlcyN4ywIa9vcvungm58BtsLDTG1h9k579xgSsUfMkkL+/NZGZemAiujet+O4WNza3tneJuaW//4PCofHzS1nGqGPosFrHqhFSj4BJ9w43ATqKQRqHAx3ByO/cfn1BpHssHM00wiOhI8iFn1FjJryb9RrVfrrg1dwGyTrycVCBHq1/+6g1ilkYoDRNU667nJibIqDKcCZyVeqnGhLIJHWHXUkkj1EG2OHZGLqwyIMNY2ZKGLNTfExmNtJ5Goe2MqBnrVW8u/ud1UzO8DjIuk9SgZMtFw1QQE5P552TAFTIjppZQpri9lbAxVZQZm0/JhuCtvrxO2vWa16jV7+uV5k0eRxHO4BwuwYMraMIdtMAHBhye4RXeHOm8OO/Ox7K14OQzp/AHzucPu7qN+A==</latexit>
S = {<latexit sha1_base64="jR9Gx72XqiPCM3LHdS3SOjkze5A=">AAAB73icbVA9TwJBEJ3DL8Qv1NJmI5hYkTsstDEh2lhiFCQBQvaWPdiwt3fuzpmQC3/CxkJjbP07dv4bF7hCwZdM8vLeTGbm+bEUBl3328mtrK6tb+Q3C1vbO7t7xf2DpokSzXiDRTLSLZ8aLoXiDRQoeSvWnIa+5A/+6HrqPzxxbUSk7nEc825IB0oEglG0Uqt8Ry5JJy33iiW34s5AlomXkRJkqPeKX51+xJKQK2SSGtP23Bi7KdUomOSTQicxPKZsRAe8bamiITfddHbvhJxYpU+CSNtSSGbq74mUhsaMQ992hhSHZtGbiv957QSDi24qVJwgV2y+KEgkwYhMnyd9oTlDObaEMi3srYQNqaYMbUQFG4K3+PIyaVYr3lmlelst1a6yOPJwBMdwCh6cQw1uoA4NYCDhGV7hzXl0Xpx352PemnOymUP4A+fzByf6jrs=</latexit>
pN<latexit sha1_base64="66lAMOV8OdCg9bnkdFLAB9TDMvI=">AAAB6nicbVBNS8NAEJ34WetX1aOXxSJ4KkkV9Fj04kkq2g9oQ9lsJ+3SzSbsboQS+hO8eFDEq7/Im//GbZuDtj4YeLw3w8y8IBFcG9f9dlZW19Y3Ngtbxe2d3b390sFhU8epYthgsYhVO6AaBZfYMNwIbCcKaRQIbAWjm6nfekKleSwfzThBP6IDyUPOqLHSQ9K765XKbsWdgSwTLydlyFHvlb66/ZilEUrDBNW647mJ8TOqDGcCJ8VuqjGhbEQH2LFU0gi1n81OnZBTq/RJGCtb0pCZ+nsio5HW4yiwnRE1Q73oTcX/vE5qwis/4zJJDUo2XxSmgpiYTP8mfa6QGTG2hDLF7a2EDamizNh0ijYEb/HlZdKsVrzzSvX+oly7zuMowDGcwBl4cAk1uIU6NIDBAJ7hFd4c4bw4787HvHXFyWeO4A+czx8swo25</latexit>
…
Total number of paths N = |S| = 235
<latexit sha1_base64="4z0D7GzwoVnotHiFLJOXG6+EZnY=">AAACDnicbVDLSgMxFM34rPU16tJNsC24KjMtohuh6MaVVOwL2qFk0kwbmkmGJCOUab/Ajb/ixoUibl27829M21lo64GEwzn3cu89fsSo0o7zba2srq1vbGa2sts7u3v79sFhQ4lYYlLHggnZ8pEijHJS11Qz0ookQaHPSNMfXk/95gORigpe06OIeCHqcxpQjLSRunahJjRikMehTyQUAYyQHiiYv4WXcHw/Nn+pfJbv2jmn6MwAl4mbkhxIUe3aX52ewHFIuMYMKdV2nUh7CZKaYkYm2U6sSITwEPVJ21COQqK8ZHbOBBaM0oOBkOZxDWfq744EhUqNQt9UhtNtF72p+J/XjnVw4SWUR7EmHM8HBTGDWsBpNrBHJcGajQxBWFKzK8QDJBHWJsGsCcFdPHmZNEpFt1ws3ZVylas0jgw4BifgFLjgHFTADaiCOsDgETyDV/BmPVkv1rv1MS9dsdKeI/AH1ucPKe6Zlg==</latexit>
![Page 20: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/20.jpg)
Toy Example: Data to 2nd order de Bruijn graph
20
XA C
B DX
XA C
…
…
…
p1<latexit sha1_base64="32piOB4m0s8TFvrrE//lPH6QV48=">AAAB7HicbVA9TwJBEJ3DL8Qv1NJmI5hYkTsstCTaWGLiAQlcyN4ywIa9vcvungm58BtsLDTG1h9k579xgSsUfMkkL+/NZGZemAiujet+O4WNza3tneJuaW//4PCofHzS0nGqGPosFrHqhFSj4BJ9w43ATqKQRqHAdji5m/vtJ1Sax/LRTBMMIjqSfMgZNVbyq0nfq/bLFbfmLkDWiZeTCuRo9stfvUHM0gilYYJq3fXcxAQZVYYzgbNSL9WYUDahI+xaKmmEOsgWx87IhVUGZBgrW9KQhfp7IqOR1tMotJ0RNWO96s3F/7xuaoY3QcZlkhqUbLlomApiYjL/nAy4QmbE1BLKFLe3EjamijJj8ynZELzVl9dJq17zrmr1h3qlcZvHUYQzOIdL8OAaGnAPTfCBAYdneIU3RzovzrvzsWwtOPnMKfyB8/kDuLCN9g==</latexit>
p2<latexit sha1_base64="7XYqazwj7X+GHe48AUv8nkePnnc=">AAAB7HicbVBNT8JAEJ3iF+IX6tHLRjDxRNp60CPRi0dMLJBAQ7bLFjZsd5vdrQlp+A1ePGiMV3+QN/+NC/Sg4EsmeXlvJjPzopQzbVz32yltbG5t75R3K3v7B4dH1eOTtpaZIjQgkkvVjbCmnAkaGGY47aaK4iTitBNN7uZ+54kqzaR4NNOUhgkeCRYzgo2Vgno68OuDas1tuAugdeIVpAYFWoPqV38oSZZQYQjHWvc8NzVhjpVhhNNZpZ9pmmIywSPas1TghOowXxw7QxdWGaJYKlvCoIX6eyLHidbTJLKdCTZjverNxf+8XmbimzBnIs0MFWS5KM44MhLNP0dDpigxfGoJJorZWxEZY4WJsflUbAje6svrpO03vKuG/+DXmrdFHGU4g3O4BA+uoQn30IIACDB4hld4c4Tz4rw7H8vWklPMnMIfOJ8/ujWN9w==</latexit>
p3<latexit sha1_base64="Z6AU2fag2QeM9tg6J4H+2085QRQ=">AAAB7HicbVA9TwJBEJ3DL8Qv1NJmI5hYkTsotCTaWGLiIQlcyN4ywIa9vcvungm58BtsLDTG1h9k579xgSsUfMkkL+/NZGZemAiujet+O4WNza3tneJuaW//4PCofHzS1nGqGPosFrHqhFSj4BJ9w43ATqKQRqHAx3ByO/cfn1BpHssHM00wiOhI8iFn1FjJryb9RrVfrrg1dwGyTrycVCBHq1/+6g1ilkYoDRNU667nJibIqDKcCZyVeqnGhLIJHWHXUkkj1EG2OHZGLqwyIMNY2ZKGLNTfExmNtJ5Goe2MqBnrVW8u/ud1UzO8DjIuk9SgZMtFw1QQE5P552TAFTIjppZQpri9lbAxVZQZm0/JhuCtvrxO2vWa16jV7+uV5k0eRxHO4BwuwYMraMIdtMAHBhye4RXeHOm8OO/Ox7K14OQzp/AHzucPu7qN+A==</latexit>
S = {<latexit sha1_base64="jR9Gx72XqiPCM3LHdS3SOjkze5A=">AAAB73icbVA9TwJBEJ3DL8Qv1NJmI5hYkTsstDEh2lhiFCQBQvaWPdiwt3fuzpmQC3/CxkJjbP07dv4bF7hCwZdM8vLeTGbm+bEUBl3328mtrK6tb+Q3C1vbO7t7xf2DpokSzXiDRTLSLZ8aLoXiDRQoeSvWnIa+5A/+6HrqPzxxbUSk7nEc825IB0oEglG0Uqt8Ry5JJy33iiW34s5AlomXkRJkqPeKX51+xJKQK2SSGtP23Bi7KdUomOSTQicxPKZsRAe8bamiITfddHbvhJxYpU+CSNtSSGbq74mUhsaMQ992hhSHZtGbiv957QSDi24qVJwgV2y+KEgkwYhMnyd9oTlDObaEMi3srYQNqaYMbUQFG4K3+PIyaVYr3lmlelst1a6yOPJwBMdwCh6cQw1uoA4NYCDhGV7hzXl0Xpx352PemnOymUP4A+fzByf6jrs=</latexit>
pN<latexit sha1_base64="66lAMOV8OdCg9bnkdFLAB9TDMvI=">AAAB6nicbVBNS8NAEJ34WetX1aOXxSJ4KkkV9Fj04kkq2g9oQ9lsJ+3SzSbsboQS+hO8eFDEq7/Im//GbZuDtj4YeLw3w8y8IBFcG9f9dlZW19Y3Ngtbxe2d3b390sFhU8epYthgsYhVO6AaBZfYMNwIbCcKaRQIbAWjm6nfekKleSwfzThBP6IDyUPOqLHSQ9K765XKbsWdgSwTLydlyFHvlb66/ZilEUrDBNW647mJ8TOqDGcCJ8VuqjGhbEQH2LFU0gi1n81OnZBTq/RJGCtb0pCZ+nsio5HW4yiwnRE1Q73oTcX/vE5qwis/4zJJDUo2XxSmgpiYTP8mfa6QGTG2hDLF7a2EDamizNh0ijYEb/HlZdKsVrzzSvX+oly7zuMowDGcwBl4cAk1uIU6NIDBAJ7hFd4c4bw4787HvHXFyWeO4A+czx8swo25</latexit>
…
Total number of paths N = |S| = 235
<latexit sha1_base64="4z0D7GzwoVnotHiFLJOXG6+EZnY=">AAACDnicbVDLSgMxFM34rPU16tJNsC24KjMtohuh6MaVVOwL2qFk0kwbmkmGJCOUab/Ajb/ixoUibl27829M21lo64GEwzn3cu89fsSo0o7zba2srq1vbGa2sts7u3v79sFhQ4lYYlLHggnZ8pEijHJS11Qz0ookQaHPSNMfXk/95gORigpe06OIeCHqcxpQjLSRunahJjRikMehTyQUAYyQHiiYv4WXcHw/Nn+pfJbv2jmn6MwAl4mbkhxIUe3aX52ewHFIuMYMKdV2nUh7CZKaYkYm2U6sSITwEPVJ21COQqK8ZHbOBBaM0oOBkOZxDWfq744EhUqNQt9UhtNtF72p+J/XjnVw4SWUR7EmHM8HBTGDWsBpNrBHJcGajQxBWFKzK8QDJBHWJsGsCcFdPHmZNEpFt1ws3ZVylas0jgw4BifgFLjgHFTADaiCOsDgETyDV/BmPVkv1rv1MS9dsdKeI/AH1ucPKe6Zlg==</latexit>
30
Path counts
XA X B
105
CX
100
DXC BA D
XB
XA
DX
C30
X
100
![Page 21: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/21.jpg)
21
Toy Example: Path Anomalies via Simulations
For a given pathway dataset S, graph G, and integer k, identify paths of length k through G whose observed frequencies in S deviate significantly from random expectation in a (k-1)-order model of paths through G.
![Page 22: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/22.jpg)
Toy Example: Path Anomalies via Simulations
22
Simulate many random walk datasetsCompute expected frequency of each pathwayand subtract from observed value
A
BX
C
D XB
XA
DX
C30 - E[ ]
XXA C
100 - E[ ]
For a given pathway dataset S, graph G, and integer k, identify paths of length k through G whose observed frequencies in S deviate significantly from random expectation in a (k-1)-order model of paths through G.
![Page 23: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/23.jpg)
Toy Example: Path Anomalies via Simulations
23
Path counts
30
XA X
105
B CX
100
DXC BA D +12.57
+11.69
-12.57
-11.69
XA C
XA D
DXB
B CX
Deviation from Randomized Paths
XB
XA
DX
C30 - E[ ]
XXA C
100 - E[ ]
![Page 24: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/24.jpg)
Challenges
24
![Page 25: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/25.jpg)
Path Anomaly Detection: Challenges
Detecting path anomalies via simulations à computationally intensive
Result is expected value, no concrete notion of significance
Alternative: detect path anomalies analytically by developing a tractable null model
25
![Page 26: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/26.jpg)
Null Model: Challenges
26
Traditional null models (e.g. configuration model) cannot be applied directly
![Page 27: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/27.jpg)
Null Model: Challenges
27
Traditional null models (e.g. configuration model) cannot be applied directly
Edges between higher-order nodes can not be randomized by stub-matching
XBXA DCCX✅ 🚫XA
![Page 28: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/28.jpg)
Null Model: Challenges
28
Traditional null models (e.g. configuration model) cannot be applied directly
Edges between higher-order nodes can not be randomized by stub-matching
Need to randomize edge weight distribution in de Bruijn graph models, since connectivity structure is fixed by 1st-order topology
XBXA DCCX✅ 🚫XA
![Page 29: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/29.jpg)
HYPA: Efficient Detection of Path Anomalies
29
![Page 30: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/30.jpg)
Generalized Hypergeometric Ensemble
30Casiraghi et al. (2016, 2018)
![Page 31: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/31.jpg)
Generalized Hypergeometric Ensemble
31
Generalization of the configuration model to weighted, directed networks.
![Page 32: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/32.jpg)
Generalized Hypergeometric Ensemble
32
Generalization of the configuration model to weighted, directed networks.
Fixes the expected weight of every node, rather than the exact degree sequence.
![Page 33: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/33.jpg)
Generalized Hypergeometric Ensemble
33
Generalization of the configuration model to weighted, directed networks.
Fixes the expected weight of every node, rather than the exact degree sequence.
Urn Problem Intuition:
○ Each pair of nodes that can possibly connect is assigned a color
DXXA
DXXB
XB CX
X CX
Urn
A
![Page 34: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/34.jpg)
Generalized Hypergeometric Ensemble
34
Generalization of the configuration model to weighted, directed networks.
Fixes the expected weight of every node, rather than the exact degree sequence.
Urn Problem Intuition:
○ Each pair of nodes that can possibly connect is assigned a color
○ Add balls, where Kij = kouti kinj<latexit sha1_base64="m9/JBMqpKEyarcEuT1mV5zfrvcQ=">AAACFHicbVC7SgNBFJ31GeNr1dJmMAiCEHajoI0QtBFsIpgHJOsyO5lNJpl9MHNXDMt+hI2/YmOhiK2FnX/j5FHExAMDh3Pu5c45Xiy4Asv6MRYWl5ZXVnNr+fWNza1tc2e3pqJEUlalkYhkwyOKCR6yKnAQrBFLRgJPsLrXvxr69QcmFY/COxjEzAlIJ+Q+pwS05JrHN27Ke9lF/z5tAXsEGaRRAlnmcjwl8VArPdcsWEVrBDxP7AkpoAkqrvndakc0CVgIVBClmrYVg5MSCZwKluVbiWIxoX3SYU1NQxIw5aSjUBk+1Eob+5HULwQ8Uqc3UhIoNQg8PRkQ6KpZbyj+5zUT8M8dHSlOgIV0fMhPBIYIDxvCbS4ZBTHQhFDJ9V8x7RJJKOge87oEezbyPKmVivZJsXR7WihfTurIoX10gI6Qjc5QGV2jCqoiip7QC3pD78az8Wp8GJ/j0QVjsrOH/sD4+gWFdaBh</latexit>
DXXA
DXXB
XB CX
X CX
Urn
A 30x135
205x100
205x135
30x100
![Page 35: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/35.jpg)
Generalized Hypergeometric Ensemble
35
Generalization of the configuration model to weighted, directed networks.
Fixes the expected weight of every node, rather than the exact degree sequence.
Urn Problem Intuition:
○ Each pair of nodes that can possibly connect is assigned a color
○ Add balls, where
○ Draw m edges to sample a network from the urn, where
Kij = kouti kinj<latexit sha1_base64="m9/JBMqpKEyarcEuT1mV5zfrvcQ=">AAACFHicbVC7SgNBFJ31GeNr1dJmMAiCEHajoI0QtBFsIpgHJOsyO5lNJpl9MHNXDMt+hI2/YmOhiK2FnX/j5FHExAMDh3Pu5c45Xiy4Asv6MRYWl5ZXVnNr+fWNza1tc2e3pqJEUlalkYhkwyOKCR6yKnAQrBFLRgJPsLrXvxr69QcmFY/COxjEzAlIJ+Q+pwS05JrHN27Ke9lF/z5tAXsEGaRRAlnmcjwl8VArPdcsWEVrBDxP7AkpoAkqrvndakc0CVgIVBClmrYVg5MSCZwKluVbiWIxoX3SYU1NQxIw5aSjUBk+1Eob+5HULwQ8Uqc3UhIoNQg8PRkQ6KpZbyj+5zUT8M8dHSlOgIV0fMhPBIYIDxvCbS4ZBTHQhFDJ9V8x7RJJKOge87oEezbyPKmVivZJsXR7WihfTurIoX10gI6Qjc5QGV2jCqoiip7QC3pD78az8Wp8GJ/j0QVjsrOH/sD4+gWFdaBh</latexit>
DXXA
DXXB
XB CX
X CX
Urn
A 30x135
205x100
205x135
30x100
![Page 36: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/36.jpg)
Hypergeometric Ensemble
36
Pr(Xvw = f(v, w)) /✓
Kvw
f(v, w)
◆✓Pij Kij �Kvw
m� f(v, w)
◆
<latexit sha1_base64="NCkO6i3fdAOka+krxJwECKzqz/A=">AAACTXicbVFLSwMxGMzWd31VPXoJFqEFLbsq6EUQvQheKlgtdMuSzWY1No8lybaUZf+gF8Gb/8KLB0XE9KH4GggZZuYjySRMGNXGdR+dwsTk1PTM7FxxfmFxabm0snqpZaowaWDJpGqGSBNGBWkYahhpJoogHjJyFXZOBv5VlyhNpbgw/YS0OboWNKYYGSsFpaiuKs0g6/ZyeAjjSnerV61CP1EyMRL6UUiF5NnZMJBnAx/2qvmX4euUBxm9zeHZaNuGn1lu+Wc+KJXdmjsE/Eu8MSmDMepB6cGPJE45EQYzpHXLcxPTzpAyFDOSF/1UkwThDromLUsF4kS3s2EbOdy0SgRjqewSBg7V7xMZ4lr3eWiTHJkb/dsbiP95rdTEB+2MiiQ1RODRQXHKoG1qUC2MqCLYsL4lCCtq7wrxDVIIG/sBRVuC9/vJf8nlTs3bre2c75WPjsd1zIJ1sAEqwAP74AicgjpoAAzuwBN4Aa/OvfPsvDnvo2jBGc+sgR8ozHwAAFyzMw==</latexit>
Kij = kouti kinj<latexit sha1_base64="m9/JBMqpKEyarcEuT1mV5zfrvcQ=">AAACFHicbVC7SgNBFJ31GeNr1dJmMAiCEHajoI0QtBFsIpgHJOsyO5lNJpl9MHNXDMt+hI2/YmOhiK2FnX/j5FHExAMDh3Pu5c45Xiy4Asv6MRYWl5ZXVnNr+fWNza1tc2e3pqJEUlalkYhkwyOKCR6yKnAQrBFLRgJPsLrXvxr69QcmFY/COxjEzAlIJ+Q+pwS05JrHN27Ke9lF/z5tAXsEGaRRAlnmcjwl8VArPdcsWEVrBDxP7AkpoAkqrvndakc0CVgIVBClmrYVg5MSCZwKluVbiWIxoX3SYU1NQxIw5aSjUBk+1Eob+5HULwQ8Uqc3UhIoNQg8PRkQ6KpZbyj+5zUT8M8dHSlOgIV0fMhPBIYIDxvCbS4ZBTHQhFDJ9V8x7RJJKOge87oEezbyPKmVivZJsXR7WihfTurIoX10gI6Qjc5QGV2jCqoiip7QC3pD78az8Wp8GJ/j0QVjsrOH/sD4+gWFdaBh</latexit>
![Page 37: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/37.jpg)
Hypergeometric Ensemble
37
{Pr(Xvw = f(v, w)) /
✓Kvw
f(v, w)
◆✓Pij Kij �Kvw
m� f(v, w)
◆
<latexit sha1_base64="NCkO6i3fdAOka+krxJwECKzqz/A=">AAACTXicbVFLSwMxGMzWd31VPXoJFqEFLbsq6EUQvQheKlgtdMuSzWY1No8lybaUZf+gF8Gb/8KLB0XE9KH4GggZZuYjySRMGNXGdR+dwsTk1PTM7FxxfmFxabm0snqpZaowaWDJpGqGSBNGBWkYahhpJoogHjJyFXZOBv5VlyhNpbgw/YS0OboWNKYYGSsFpaiuKs0g6/ZyeAjjSnerV61CP1EyMRL6UUiF5NnZMJBnAx/2qvmX4euUBxm9zeHZaNuGn1lu+Wc+KJXdmjsE/Eu8MSmDMepB6cGPJE45EQYzpHXLcxPTzpAyFDOSF/1UkwThDromLUsF4kS3s2EbOdy0SgRjqewSBg7V7xMZ4lr3eWiTHJkb/dsbiP95rdTEB+2MiiQ1RODRQXHKoG1qUC2MqCLYsL4lCCtq7wrxDVIIG/sBRVuC9/vJf8nlTs3bre2c75WPjsd1zIJ1sAEqwAP74AicgjpoAAzuwBN4Aa/OvfPsvDnvo2jBGc+sgR8ozHwAAFyzMw==</latexit>
Kij = kouti kinj<latexit sha1_base64="m9/JBMqpKEyarcEuT1mV5zfrvcQ=">AAACFHicbVC7SgNBFJ31GeNr1dJmMAiCEHajoI0QtBFsIpgHJOsyO5lNJpl9MHNXDMt+hI2/YmOhiK2FnX/j5FHExAMDh3Pu5c45Xiy4Asv6MRYWl5ZXVnNr+fWNza1tc2e3pqJEUlalkYhkwyOKCR6yKnAQrBFLRgJPsLrXvxr69QcmFY/COxjEzAlIJ+Q+pwS05JrHN27Ke9lF/z5tAXsEGaRRAlnmcjwl8VArPdcsWEVrBDxP7AkpoAkqrvndakc0CVgIVBClmrYVg5MSCZwKluVbiWIxoX3SYU1NQxIw5aSjUBk+1Eob+5HULwQ8Uqc3UhIoNQg8PRkQ6KpZbyj+5zUT8M8dHSlOgIV0fMhPBIYIDxvCbS4ZBTHQhFDJ9V8x7RJJKOge87oEezbyPKmVivZJsXR7WihfTurIoX10gI6Qjc5QGV2jCqoiip7QC3pD78az8Wp8GJ/j0QVjsrOH/sD4+gWFdaBh</latexit>
Probability of observing frequency f(v,w) given the entire weighted network structure.
![Page 38: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/38.jpg)
Hypergeometric Ensemble
38
{
Pr(Xvw = f(v, w)) /✓
Kvw
f(v, w)
◆✓Pij Kij �Kvw
m� f(v, w)
◆
<latexit sha1_base64="NCkO6i3fdAOka+krxJwECKzqz/A=">AAACTXicbVFLSwMxGMzWd31VPXoJFqEFLbsq6EUQvQheKlgtdMuSzWY1No8lybaUZf+gF8Gb/8KLB0XE9KH4GggZZuYjySRMGNXGdR+dwsTk1PTM7FxxfmFxabm0snqpZaowaWDJpGqGSBNGBWkYahhpJoogHjJyFXZOBv5VlyhNpbgw/YS0OboWNKYYGSsFpaiuKs0g6/ZyeAjjSnerV61CP1EyMRL6UUiF5NnZMJBnAx/2qvmX4euUBxm9zeHZaNuGn1lu+Wc+KJXdmjsE/Eu8MSmDMepB6cGPJE45EQYzpHXLcxPTzpAyFDOSF/1UkwThDromLUsF4kS3s2EbOdy0SgRjqewSBg7V7xMZ4lr3eWiTHJkb/dsbiP95rdTEB+2MiiQ1RODRQXHKoG1qUC2MqCLYsL4lCCtq7wrxDVIIG/sBRVuC9/vJf8nlTs3bre2c75WPjsd1zIJ1sAEqwAP74AicgjpoAAzuwBN4Aa/OvfPsvDnvo2jBGc+sgR8ozHwAAFyzMw==</latexit>
Kij = kouti kinj<latexit sha1_base64="m9/JBMqpKEyarcEuT1mV5zfrvcQ=">AAACFHicbVC7SgNBFJ31GeNr1dJmMAiCEHajoI0QtBFsIpgHJOsyO5lNJpl9MHNXDMt+hI2/YmOhiK2FnX/j5FHExAMDh3Pu5c45Xiy4Asv6MRYWl5ZXVnNr+fWNza1tc2e3pqJEUlalkYhkwyOKCR6yKnAQrBFLRgJPsLrXvxr69QcmFY/COxjEzAlIJ+Q+pwS05JrHN27Ke9lF/z5tAXsEGaRRAlnmcjwl8VArPdcsWEVrBDxP7AkpoAkqrvndakc0CVgIVBClmrYVg5MSCZwKluVbiWIxoX3SYU1NQxIw5aSjUBk+1Eob+5HULwQ8Uqc3UhIoNQg8PRkQ6KpZbyj+5zUT8M8dHSlOgIV0fMhPBIYIDxvCbS4ZBTHQhFDJ9V8x7RJJKOge87oEezbyPKmVivZJsXR7WihfTurIoX10gI6Qjc5QGV2jCqoiip7QC3pD78az8Wp8GJ/j0QVjsrOH/sD4+gWFdaBh</latexit>
Number of ways to pick f(v,w) multiedges from Kvw possible.
![Page 39: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/39.jpg)
Hypergeometric Ensemble
39
{
Kij = kouti kinj<latexit sha1_base64="m9/JBMqpKEyarcEuT1mV5zfrvcQ=">AAACFHicbVC7SgNBFJ31GeNr1dJmMAiCEHajoI0QtBFsIpgHJOsyO5lNJpl9MHNXDMt+hI2/YmOhiK2FnX/j5FHExAMDh3Pu5c45Xiy4Asv6MRYWl5ZXVnNr+fWNza1tc2e3pqJEUlalkYhkwyOKCR6yKnAQrBFLRgJPsLrXvxr69QcmFY/COxjEzAlIJ+Q+pwS05JrHN27Ke9lF/z5tAXsEGaRRAlnmcjwl8VArPdcsWEVrBDxP7AkpoAkqrvndakc0CVgIVBClmrYVg5MSCZwKluVbiWIxoX3SYU1NQxIw5aSjUBk+1Eob+5HULwQ8Uqc3UhIoNQg8PRkQ6KpZbyj+5zUT8M8dHSlOgIV0fMhPBIYIDxvCbS4ZBTHQhFDJ9V8x7RJJKOge87oEezbyPKmVivZJsXR7WihfTurIoX10gI6Qjc5QGV2jCqoiip7QC3pD78az8Wp8GJ/j0QVjsrOH/sD4+gWFdaBh</latexit>
Pr(Xvw = f(v, w)) /✓
Kvw
f(v, w)
◆✓Pij Kij �Kvw
m� f(v, w)
◆
<latexit sha1_base64="NCkO6i3fdAOka+krxJwECKzqz/A=">AAACTXicbVFLSwMxGMzWd31VPXoJFqEFLbsq6EUQvQheKlgtdMuSzWY1No8lybaUZf+gF8Gb/8KLB0XE9KH4GggZZuYjySRMGNXGdR+dwsTk1PTM7FxxfmFxabm0snqpZaowaWDJpGqGSBNGBWkYahhpJoogHjJyFXZOBv5VlyhNpbgw/YS0OboWNKYYGSsFpaiuKs0g6/ZyeAjjSnerV61CP1EyMRL6UUiF5NnZMJBnAx/2qvmX4euUBxm9zeHZaNuGn1lu+Wc+KJXdmjsE/Eu8MSmDMepB6cGPJE45EQYzpHXLcxPTzpAyFDOSF/1UkwThDromLUsF4kS3s2EbOdy0SgRjqewSBg7V7xMZ4lr3eWiTHJkb/dsbiP95rdTEB+2MiiQ1RODRQXHKoG1qUC2MqCLYsL4lCCtq7wrxDVIIG/sBRVuC9/vJf8nlTs3bre2c75WPjsd1zIJ1sAEqwAP74AicgjpoAAzuwBN4Aa/OvfPsvDnvo2jBGc+sgR8ozHwAAFyzMw==</latexit>
Number of ways to pick everything else.
![Page 40: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/40.jpg)
Putting it all together: HYPA scores
40
![Page 41: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/41.jpg)
Putting it all together: HYPA scores
41
If “close” to 0, then the pathway is underrepresented.
If “close” to 1, then pathway is overrepresented.
![Page 42: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/42.jpg)
Putting it all together: HYPA scores
42
XB
XA
DX
CX0.99
0.97XB
XA
DX
C30 X
100
If “close” to 0, then the pathway is underrepresented.
If “close” to 1, then pathway is overrepresented.
+12.57
+11.69
-12.57
-11.69
XA C
XA D
DXB
B CX
Deviation from Randomized Paths
Observed Data Simulation Result HYPA Scores
![Page 43: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/43.jpg)
Validation
43
![Page 44: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/44.jpg)
Noise via Path Randomization
44
![Page 45: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/45.jpg)
Synthetic Anomalies: Setup
45
![Page 46: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/46.jpg)
Synthetic Anomalies: Setup
46
A
BX
C
D XB
XA
DX
CX
Start with an arbitrary first order topology, then construct the kth-order de Bruijn graph
k = 2
![Page 47: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/47.jpg)
Synthetic Anomalies: Setup
47
A
BX
C
D XB
XA
DX
CX
Randomly choose some edges to label over-represented
k = 2
![Page 48: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/48.jpg)
Synthetic Anomalies: Setup
48
A
BX
C
D XB
XA
DX
CX30
100
Assign heterogeneous weights based on label
k = 2
![Page 49: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/49.jpg)
Synthetic Anomalies: Setup
49
A
BX
C
D XB
XA
DX
CX
Generate paths via random walks on this model, then evaluate ability of HYPA to detect injected anomalies (binary classifier).
30
100
k = 2
![Page 50: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/50.jpg)
Synthetic Anomalies: ROC Example
50
![Page 51: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/51.jpg)
Synthetic Anomalies: ROC Example
51
Injected Anomalies of Length 2
![Page 52: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/52.jpg)
Synthetic Anomalies: AUC Results
52
![Page 53: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/53.jpg)
Application to Flight Data
53
![Page 54: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/54.jpg)
Airlines
54
5% sample of all US domestic flights in 2018
Data from: http://www.transtats.bts.gov/Tables.asp?DB_ID=125
![Page 55: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/55.jpg)
AirlinesHypotheses:
1. Return flights should be over-represented, since people most often travel round trip.
55
![Page 56: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/56.jpg)
Airlines: Return trips are over-represented
56
Fraction of over-represented return/non-return flights for various discrimination thresholds.
![Page 57: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/57.jpg)
Airlines: Return trips are over-represented
57
Fraction of over-represented return/non-return flights for various discrimination thresholds.
![Page 58: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/58.jpg)
AirlinesHypotheses:
1. Return flights should be over-represented, since people most often travel round trip.
2. Over-represented non-return flights are due to regional/national hubs, since people need to fly from small airports → regional hub → large airport.
58
![Page 59: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/59.jpg)
Airlines: Trip Balance
59
![Page 60: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/60.jpg)
AirlinesHypotheses:
1. Return flights should be over-represented, since people most often travel round trip.
2. Over-represented non-return flights are due to regional/national hubs, since people need to fly from small airports → regional hub → large airport.
3. “Efficient” paths are more likely to be over-represented.
60
![Page 61: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/61.jpg)
Airlines: Efficiency
61
![Page 62: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/62.jpg)
References
tlarock.github.iohttps://arxiv.org/abs/1905.10580
Thanks!
62
Scholtes, Ingo. "When is a network a network?: Multi-order graphical model selection in pathways and temporal networks." Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2017.
Casiraghi et al. "Generalized hypergeometric ensembles: Statistical hypothesis testing in complex networks." arXiv:1607.02441 (2016).
Casiraghi & Nanumyan. “Generalised hypergeometric ensembles of random graphs: the configuration model as an urn problem.” arXiv:1810.06495 (2018)
R. TransStat. Origin and destination survey database. http://www.transtats.bts.gov/Tables.asp?DB_ID=125, 2018.
![Page 63: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/63.jpg)
63
https://arxiv.org/abs/1905.10580
![Page 64: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/64.jpg)
Definition: kth-order de Bruijn Graph
64
For a given graph G = (V,E) and positive integer k we define a k-th order
De Bruijn graph of paths in G as a graph Gk= (V k, Ek
), where (i) each node
~v := ~v0v1 . . . vk�1 2 V kis a path of length k � 1 in G, and (ii) (~v, ~w) 2 Ek
i↵
vi+1 = wi for i = 0, . . . , k � 2.
<latexit sha1_base64="nPo7/5HIQzxKitn0S+xtps/GiRw=">AAADPnicbVJNb9NAELXNVylfLRy5jEiQbOFGdjiAkCJVhVKOReqXVKfRZj1OFtu71u46URX5l3HhN3DjyIUDCHHlyK6TStAyl32aN/PezO6Oq4IpHUVfXO/a9Rs3b63dXr9z9979BxubD4+UqCXFQyoKIU/GRGHBOB5qpgs8qSSSclzg8Th/bfnjGUrFBD/Q5xUOSzLhLGOUaJMabboHb4UEAhM2Qw4TSaopJP7ewD8Kd4MkAMJTqIRi2vDAuMYJSujmXZgjpJgZV9Oc+HkSbOkpCJka+g3CjqzZhws9kUFF9FSZfqttVZW1bMnu3llu3M7yEHbP8qAbwnyKEsFnASChU+AiRdOWzJAuZs2rwRKMotkoTopUaAWz0SLfipsGEmNglIwBswbW1JoXyCfarmWqLLeaImyX85kxSvwL/bA9503Qiu0uxbLMVBgX9ixuYABzgxqTz8zNJT4bRCEsJwkh3+onQW99tNGJelEbcBXEK9BxVrE/2vicpILWJXJNC6LUaRxVerggUjNaYLOe1AorQnMywVMDOSlRDRft8zfw1GTSdppMcA1t9u+OBSmVOi/HprK073CZs8n/cae1zl4OF4xXtUZOl0ZZXYAWYP8SpEwi1cW5AYRK80co0CmRhGrz4+wlxJdXvgqO+r34ea//vt/Z3lldx5rz2Hni+E7svHC2nXfOvnPoUPej+9X97v7wPnnfvJ/er2Wp5656Hjn/hPf7D9a6/RE=</latexit>
![Page 65: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/65.jpg)
Pseudocode
65
![Page 66: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/66.jpg)
Naïve Baseline ComparisonFrequency-Based Anomaly Detection (FBAD)
Compute mean, µ, and standard deviation, σ, of kth order edge weights
Given scaling factor α, label edges as- Overrepresented if frequency is larger than µ + σα - Underrepresented if frequency is smaller than µ - σα
66
![Page 67: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/67.jpg)
Synthetic Anomalies
67
![Page 68: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/68.jpg)
Real Data
68
![Page 69: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/69.jpg)
Exploring Motifs
69
![Page 70: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/70.jpg)
Case Study: London TubeData:
- Origin → destination statistics between London Tube stations- (origin, destination, #observations)
- Shortest paths between stations- Assume people follow shortest paths
70
![Page 71: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/71.jpg)
London TubeHypothesis:
- People typically use public transportation to travel large geographic distances- Overrepresented pathways should cover larger distances
Test:
- Measure distance between every station- For 2nd order transitions A-B-C, compute distance between nodes A and C- Analyze distributions of distance in over vs. under represented transitions
- Expect to see distribution shifted towards higher values for over-represented transitions
71
![Page 72: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/72.jpg)
London Tube
72
![Page 73: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/73.jpg)
London Tube
73
Median distance between source and destination nodes in under/over represented transitions.
![Page 74: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/74.jpg)
Constructing Ground TruthConstruct ground truth based on the method discussed earlier:
○ Randomize path data using k-1st order random walks
○ Compute kth-order path statistics
○ Repeat m times, noting the frequency of each path
○ Estimate multinomial distribution and its CDF from these statistics
○ If CDF(path) > threshold, label over-represented
74
![Page 75: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/75.jpg)
Tube Data - Ground Truth
75
![Page 76: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/76.jpg)
Computational complexity
76
O(N + |V |2�k1)
<latexit sha1_base64="ab5tZyX/wq/L2KgZL0VZSB19Lxg=">AAACAHicbVDLSsNAFJ3UV62vqAsXbgaLUBFKUgVdFt240gr2AW0aJpNJO3QyCTMToaTd+CtuXCji1s9w5984bbPQ1gMDh3Pu4c49XsyoVJb1beSWlldW1/LrhY3Nre0dc3evIaNEYFLHEYtEy0OSMMpJXVHFSCsWBIUeI01vcD3xm49ESBrxBzWMiROiHqcBxUhpyTUP7kq38BSOGqNupcN0zkeu3R2cuGbRKltTwEViZ6QIMtRc86vjRzgJCVeYISnbthUrJ0VCUczIuNBJJIkRHqAeaWvKUUikk04PGMNjrfgwiIR+XMGp+juRolDKYejpyRCpvpz3JuJ/XjtRwaWTUh4ninA8WxQkDKoITtqAPhUEKzbUBGFB9V8h7iOBsNKdFXQJ9vzJi6RRKdtn5cr9ebF6ldWRB4fgCJSADS5AFdyAGqgDDMbgGbyCN+PJeDHejY/ZaM7IMvvgD4zPHzTNlOI=</latexit>
![Page 77: Detecting Path Anomalies in Sequential Data on Networks · Casiraghi et al.(2016, 2018) Generalized Hypergeometric Ensemble 31 Generalization of the configuration model to weighted,](https://reader033.fdocuments.us/reader033/viewer/2022051815/603fda6e77dd524d5103c4ae/html5/thumbnails/77.jpg)
Scalability
77