Graphs / Networks -...
Transcript of Graphs / Networks -...
![Page 1: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/1.jpg)
Graphs / Networks Centrality measures, algorithms, interactive applications
CSE 6242/ CX 4242
Duen Horng (Polo) ChauGeorgia Tech
Partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John Stasko, Christos Faloutsos, Le Song
![Page 2: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/2.jpg)
Centrality = “Importance”
![Page 3: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/3.jpg)
Why Node Centrality?What can we do if we can rank all the nodes in a graph (e.g., Facebook, LinkedIn, Twitter)?
• Find celebrities or influential people in a social network (Twitter)
• Find “gatekeepers” who connect communities (headhunters love to find them on LinkedIn)
• What else?
3
![Page 4: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/4.jpg)
More generallyHelps graph analysis, visualization, understanding, e.g.,
• Let us rank nodes, group or study them by centrality• Only show subgraph formed by the top 100 nodes,
out of the millions in the full graph• Similar to google search results (ranked, and
they only show you 10 per page)• Most graph analysis packages already have centrality
algorithms implemented. Use them!Can also compute edge centrality. Here we focus on node centrality.
4
![Page 5: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/5.jpg)
Degree Centrality (easiest)Degree = number of neighbors
• For directed graphs
• In degree = No. of incoming edges
• Out degree = No. of outgoing edges
• For undirected graphs, only degree is defined.
• Algorithms?
• Sequential scan through edge list
• What about for a graph stored in SQLite?5
1
2
3
4
![Page 6: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/6.jpg)
Computing Degrees using SQLRecall simplest way to store a graph in SQLite:edges(source_id, target_id)
1. If slow, first create index for each column2. Use group by statement to find in degreesselect count(*) from edges group by source_id;
6
![Page 7: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/7.jpg)
High betweenness = “gatekeeper”
Betweenness of a node v= = how often a node serves as the “bridge” that connects two other nodes.
Betweenness Centrality
7
Number of shortest paths between s and t that goes through v
Number of shortest paths between s and t
Betweenness is very well studied. http://en.wikipedia.org/wiki/Centrality#Betweenness_centrality
![Page 8: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/8.jpg)
(Local) Clustering CoefficientA node’s clustering coefficient is a measure of how close the node’s neighbors are from forming a clique.
• 1 = neighbors form a clique• 0 = No edges among neighbors
(Assuming undirected graph)“Local” means it’s for a node; can also compute a graph’s “global” coefficient
8Image source: http://en.wikipedia.org/wiki/Clustering_coefficient
![Page 9: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/9.jpg)
Requires triangle countingReal social networks have a lot of triangles
• Friends of friends are friends Triangles are expensive to compute
(neighborhood intersections; several approx. algos)
Can we do that quickly?
Computing Clustering Coefficients...
9
Algorithm details: Faster Clustering Coefficient Using Vertex Covershttp://www.cc.gatech.edu/~ogreen3/_docs/2013VertexCoverClusteringCoefficients.pdf
![Page 10: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/10.jpg)
But: triangles are expensive to compute (3-way join; several approx. algos)
Q: Can we do that quickly? A: Yes!
#triangles = 1/6 Sum ( λi3 )
(and, because of skewness, we only need the top few eigenvalues!
Super Fast Triangle Counting[Tsourakakis ICDM 2008]
details
10
![Page 11: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/11.jpg)
Power Law in Eigenvalues of Adjacency Matrix
Eigen exponent = slope = -0.48Eigenvalue
Rank of decreasing eigenvalue
11
![Page 12: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/12.jpg)
1000x+ speed-up, >90% accuracy12
![Page 13: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/13.jpg)
More Centrality Measures…• Degree
• Betweenness
• Closeness, by computing
• Shortest paths
• “Proximity” (usually via random walks) — used successfully in a lot of applications
• Eigenvector
• …13
![Page 14: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/14.jpg)
PageRank (Google)
Brin, Sergey and Lawrence Page (1998). Anatomy of a Large-Scale Hypertextual Web Search Engine. 7th Intl World Wide Web Conf.
Larry Page Sergey Brin
![Page 15: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/15.jpg)
Given a directed graph, find its most interesting/central node
PageRank: Problem
A node is important,if it is connected with important nodes(recursive, but OK!)
![Page 16: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/16.jpg)
Given a directed graph, find its most interesting/central node
Proposed solution: use random walk; spot most “popular” node (-> steady state probability (ssp))
PageRank: Solution
A node has high ssp,if it is connected with high ssp nodes(recursive, but OK!)
“state” = webpage
![Page 17: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/17.jpg)
Let B be the transition matrix: transposed, column-normalized
(Simplified) PageRank
=
To From B
1 2 3
45
![Page 18: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/18.jpg)
B p = p
=
B p = p
(Simplified) PageRank
1 2 3
45
![Page 19: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/19.jpg)
• B p = 1 * p • Thus, p is the eigenvector that
corresponds to the highest eigenvalue (=1, since the matrix is column-normalized)
• Why does such a p exist? –p exists if B is nxn, nonnegative, irreducible
[Perron–Frobenius theorem]
(Simplified) PageRank
![Page 20: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/20.jpg)
• In short: imagine a particle randomly moving along the edges
• Compute its steady-state probability (ssp)
Full version of algorithm: with occasional random jumps
Why? To make the matrix irreducible
(Simplified) PageRank
![Page 21: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/21.jpg)
• With probability 1-c, fly-out to a random node
• Then, we havep = c B p + (1-c)/n 1 => p = (1-c)/n [I - c B] -1 1
Full Algorithm
1 2 3
45
![Page 22: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/22.jpg)
B p
How to compute PageRank for huge matrix?
Use the power iteration methodhttp://en.wikipedia.org/wiki/Power_iteration
Can initialize this vector to any non-zero vector, e.g., all “1”s
1 2 3
45p’
+ 1/n
p = c B p + (1-c)/n 1
= c (1-c)
![Page 23: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/23.jpg)
23
http://www.cs.duke.edu/csed/principles/pagerank/
![Page 24: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/24.jpg)
PageRank for graphs (generally)You can compute PageRank for any graphsShould be in your algorithm “toolbox”
• Better than simple centrality measure (e.g., degree)
• Fast to compute for large graphs (O(E))But can be “misled” (Google Bomb)
• How?
24
![Page 25: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/25.jpg)
Personalized PageRankMake one small variation of PageRank
• Intuition: not all pages are equal, some more relevant to a person’s specific needs
• How?
25
![Page 26: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/26.jpg)
• With probability 1-c, fly-out to a random node some preferred nodes
• Then, we havep = c B p + (1-c)/n 1 => p = (1-c)/n [I - c B] -1 1
“Personalizing” PageRank
![Page 27: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/27.jpg)
Why learn Personalized PageRank?
Can be used for recommendation, e.g.,• If I like this webpage, what would I also be
interested?• If I like this product, what other products I also like?
(in a user-product bipartite graph)• Also helps with visualizing large graphs
• Instead of visualizing every single nodes, visualize the most important ones
Again, very flexible. Can be run on any graph.
27
![Page 28: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/28.jpg)
Interactive Application
![Page 29: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/29.jpg)
Building an interactive application
Will show you an example application (Apolo) that uses a “diffusion-based” algorithm to perform recommendation on a large graph
• Personalized PageRank (= Random Walk with Restart)
• Belief Propagation (powerful inference algorithm, for fraud detection, image segmentation, error-correcting codes, etc.)
• “Spreading activation” or “degree of interest” in Human-Computer Interaction (HCI)
• Guilt-by-association techniques
29
![Page 30: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/30.jpg)
Why diffusion-based algorithms are widely used? • Intuitive to interpret
uses “network effect”, homophily, etc.• Easy to implement
Math is relatively simple• Fast
run time linear to #edges, or better• Probabilistic meaning
30
Building an interactive application
![Page 31: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/31.jpg)
Human-In-The-Loop Graph Mining
Apolo: Machine Learning + VisualizationCHI 2011
31
Apolo: Making Sense of Large Network Data by Combining Rich User Interaction and Machine Learning
![Page 32: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/32.jpg)
Finding More Relevant Nodes
HCIPaper
Data MiningPaper
Citation network
32
![Page 33: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/33.jpg)
Finding More Relevant Nodes
HCIPaper
Data MiningPaper
Citation network
32
![Page 34: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/34.jpg)
Finding More Relevant Nodes
Apolo uses guilt-by-association(Belief Propagation, similar to personalized PageRank)
HCIPaper
Data MiningPaper
Citation network
32
![Page 35: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/35.jpg)
Demo: Mapping the Sensemaking Literature
33
Nodes: 80k papers from Google Scholar (node size: #citation) Edges: 150k citations
![Page 36: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/36.jpg)
![Page 37: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/37.jpg)
![Page 38: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/38.jpg)
Key Ideas (Recap)Specify exemplarsFind other relevant nodes (BP)
35
![Page 39: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/39.jpg)
Apolo’s Contributions
Apolo User
It was like having a partnership with the machine.
Human + Machine
Personalized Landscape
1
236
![Page 40: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/40.jpg)
Apolo 2009
37
![Page 41: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/41.jpg)
Apolo 2010
38
![Page 42: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/42.jpg)
Apolo 2011 22,000 lines of code. Java 1.6. Swing.Uses SQLite3 to store graph on disk
39
![Page 43: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/43.jpg)
User StudyUsed citation networkTask: Find related papers for 2 sections in a survey paper on user interface• Model-based generation of UI• Rapid prototyping tools
40
![Page 44: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/44.jpg)
Between subjects designParticipants: grad student or research staff
41
![Page 45: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/45.jpg)
41
![Page 46: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/46.jpg)
41
![Page 47: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/47.jpg)
Higher is better.Apolo wins.
* Statistically significant, by two-tailed t test, p <0.05
Judges’ Scores
0
8
16
Model-based
*Prototyping *Average
Apolo Scholar
Score
42
![Page 48: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/48.jpg)
Apolo: RecapA mixed-initiative approach for exploring and creating personalized landscape for large network data
Apolo = ML + Visualization + Interaction
43
![Page 49: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/49.jpg)
FeldsparFinding Information by Association.
CHI 2008 Polo Chau, Brad Myers, Andrew Faulring
44Paper: http://www.cs.cmu.edu/~dchau/feldspar/feldspar-chi08.pdfYouTube: http://www.youtube.com/watch?v=Q0TIV8F_o_E&feature=youtu.be&list=ULQ0TIV8F_o_E
![Page 50: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/50.jpg)
What to Do When Search Fails: Finding Information by Association Polo Chau, Brad Myers, Andrew Faulring 45
Feldspar
![Page 51: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/51.jpg)
What to Do When Search Fails: Finding Information by Association Polo Chau, Brad Myers, Andrew Faulring 45
Feldspar
A system that helps people find things on their computers when typical search or browsing tools don’t work
![Page 52: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/52.jpg)
What to Do When Search Fails: Finding Information by Association Polo Chau, Brad Myers, Andrew Faulring 45
Feldspar
A system that helps people find things on their computers when typical search or browsing tools don’t work
An example scenario…
![Page 53: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/53.jpg)
What to Do When Search Fails: Finding Information by Association Polo Chau, Brad Myers, Andrew Faulring 46
“Find the webpage mentioned in the email from the person I met at an event“
![Page 54: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/54.jpg)
What to Do When Search Fails: Finding Information by Association Polo Chau, Brad Myers, Andrew Faulring 46
If I can’t remember the specifics, such as any text in the webpage, email, etc. à Can’t search
“Find the webpage mentioned in the email from the person I met at an event“
![Page 55: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/55.jpg)
What to Do When Search Fails: Finding Information by Association Polo Chau, Brad Myers, Andrew Faulring 46
If I can’t remember the specifics, such as any text in the webpage, email, etc. à Can’t search
If I haven’t bookmarked the webpage à Can’t browse
“Find the webpage mentioned in the email from the person I met at an event“
![Page 56: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/56.jpg)
What to Do When Search Fails: Finding Information by Association Polo Chau, Brad Myers, Andrew Faulring 47
“Find the webpage mentioned in the email from the person I met at an event“
![Page 57: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/57.jpg)
What to Do When Search Fails: Finding Information by Association Polo Chau, Brad Myers, Andrew Faulring 47
But I can describe the webpage with a chain of associations.
“Find the webpage mentioned in the email from the person I met at an event“
![Page 58: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/58.jpg)
What to Do When Search Fails: Finding Information by Association Polo Chau, Brad Myers, Andrew Faulring 47
But I can describe the webpage with a chain of associations.
webpage – email – person – event
“Find the webpage mentioned in the email from the person I met at an event“
![Page 59: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/59.jpg)
What to Do When Search Fails: Finding Information by Association Polo Chau, Brad Myers, Andrew Faulring 47
But I can describe the webpage with a chain of associations.
webpage – email – person – event
The psychology literature has shown that people often remember things exactly like this.
“Find the webpage mentioned in the email from the person I met at an event“
![Page 60: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/60.jpg)
What to Do When Search Fails: Finding Information by Association Polo Chau, Brad Myers, Andrew Faulring 48
Natural question: Can I find things by associations?
![Page 61: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/61.jpg)
What to Do When Search Fails: Finding Information by Association Polo Chau, Brad Myers, Andrew Faulring 48
Natural question: Can I find things by associations?
Can I find the webpage by specifying its associated information (email, person, and event)?
![Page 62: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/62.jpg)
What to Do When Search Fails: Finding Information by Association Polo Chau, Brad Myers, Andrew Faulring 48
Natural question: Can I find things by associations?
Can I find the webpage by specifying its associated information (email, person, and event)?
We created Feldspar, which supports this associative retrieval of information.
![Page 63: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/63.jpg)
What to Do When Search Fails: Finding Information by Association Polo Chau, Brad Myers, Andrew Faulring 49
Feldspar stands for….
![Page 64: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/64.jpg)
What to Do When Search Fails: Finding Information by Association Polo Chau, Brad Myers, Andrew Faulring 49
FELD S PA R
Feldspar stands for….
Finding Elements byLeveraging Diverse Sources of PertinentAssociative Recollection
![Page 65: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/65.jpg)
What to Do When Search Fails: Finding Information by Association Polo Chau, Brad Myers, Andrew Faulring 50
Implementation: Overview
Create a graph database to store the associations among items on the computer
Develop an algorithm that processes the query and returns results
![Page 66: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/66.jpg)
What to Do When Search Fails: Finding Information by Association Polo Chau, Brad Myers, Andrew Faulring 51
Creating an Association Database (a graph)
Install Google Desktop and let it index all the items on the computer
![Page 67: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/67.jpg)
What to Do When Search Fails: Finding Information by Association Polo Chau, Brad Myers, Andrew Faulring 51
Creating an Association Database (a graph)
Focus on 7 types
Install Google Desktop and let it index all the items on the computer
![Page 68: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/68.jpg)
What to Do When Search Fails: Finding Information by Association Polo Chau, Brad Myers, Andrew Faulring 51
Creating an Association Database (a graph)
Focus on 7 types
Install Google Desktop and let it index all the items on the computer
Identify associations and build our database, which is a directed graph
![Page 69: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/69.jpg)
What to Do When Search Fails: Finding Information by Association Polo Chau, Brad Myers, Andrew Faulring 51
Creating an Association Database (a graph)
Focus on 7 types
Install Google Desktop and let it index all the items on the computer
Identify associations and build our database, which is a directed graph
![Page 70: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/70.jpg)
What to Do When Search Fails: Finding Information by Association Polo Chau, Brad Myers, Andrew Faulring 51
Creating an Association Database (a graph)
Focus on 7 types
Install Google Desktop and let it index all the items on the computer
Identify associations and build our database, which is a directed graph
![Page 71: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/71.jpg)
Practitioners’ guide to building (interactive) applications
Think about scalability early• e.g., picking a scalable algorithm early on
When building interactive applications, use iterative design approach (as in Apolo)
• Why? It’s hard to get it right the first time• Create prototype, evaluate, modify prototype,
evaluate, ...• Quick evaluation helps you identify important
fixes early (can save you a lot of time)52
![Page 72: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/72.jpg)
How to do iterative design?What kinds of prototypes?
• Paper prototype, lo-fi prototype, high-fi prototypeWhat kinds of evaluation? Important to involve REAL users as early as possible
• Recruit your friends to try your tools• Lab study (controlled, as in Apolo) • Longitudinal study (usage over months)• Deploy it and see the world’s reaction!
• To learn more:• CS 6750 Human-Computer Interaction• CS 6455 User Interface Design and Evaluation
53
Practitioners’ guide to building (interactive) applications
![Page 73: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/73.jpg)
If you want to know more about people…
54
http://amzn.com/0321767535
![Page 74: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/74.jpg)
Polonium: Web-Scale Malware DetectionSDM 2011
Polonium: Tera-Scale Graph Mining and Inference for Malware Detection
![Page 75: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/75.jpg)
Signature-based detection1.Collect malware2.Generate signatures 3.Distribute to users4.Scan computers for matches
What about “zero-day” malware?No samples à No signatures à No detectionHow to detect them early?
Typical Malware Detection Method
56
![Page 76: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/76.jpg)
Reputation-Based DetectionComputes reputation score for each application
e.g., MSWord.exe
Poor reputation = Malware
57
![Page 77: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/77.jpg)
58
PatentedI led initial design and development
Serving 120 million usersAnswered trillions of queries
TextPolonium
![Page 78: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/78.jpg)
58
PatentedI led initial design and development
Serving 120 million usersAnswered trillions of queries
Propagation of leverage of network influence unearths malware
TextPolonium
![Page 79: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/79.jpg)
Polonium works with 60 Terabyte Data
50 million machines anonymously reported their executable files
900 million unique files(Identified by their cryptographic hash values)
Goal: label malware and good files
59
![Page 80: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/80.jpg)
Why A Hard Problem?
Existing Research Polonium
Small dataset Huge dataset (60 terabytes)
Detects specific malware (e.g., worm, trojans)
Detects all types(needs a general method)
Many false alarms (>10%) Strict (<1%)
60
![Page 81: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/81.jpg)
Polonium: Problem Definition
GivenUndirected machine-file bipartite graph
37 billion edges , 1 billion nodes (machines, files)Some file labels from Symantec (good or bad)
FindLabels for all unknown files
61
![Page 82: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/82.jpg)
Symantec has a ground truth database of known-good and known-bad files
Where to Get Good and Bad Labels?
e.g., set known-good file’s prior to 0.962
![Page 83: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/83.jpg)
How to Gauge Machine Reputation?
Computed using Symantec’s proprietary formula; a value between 0 and 1
Derived from anonymous aspects of machine’s usage and behavior
63
![Page 84: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/84.jpg)
64
How to propagate known information to the unknown?
![Page 85: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/85.jpg)
Key Idea: Guilt-by-AssociationGOOD files likely appear on GOOD machinesBAD files likely appear on BAD machinesAlso known as Homophily
Machine
Good Bad
FileGood 0.9 0.1
Bad 0.1 0.9
65
![Page 86: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/86.jpg)
Adapts Belief Propagation (BP)A powerful inference algorithm
Used in image processing, computer vision, error-correcting codes, etc.
66
How to propagate known information to the unknown?
![Page 87: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/87.jpg)
A B C
2 31 4
Propagating Reputation
0.9 0.1
0.6 0.45 0.35
0.5 0.5
Machines
Files
Example
Machine
Good Bad
FileGood 0.9 0.1
Bad 0.1 0.9
67
![Page 88: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/88.jpg)
A B C
2 31 4
Propagating Reputation
0.9 0.1
0.6 0.45 0.35
0.5 0.5
Machines
Files
Example
Machine
Good Bad
FileGood 0.9 0.1
Bad 0.1 0.9
67
![Page 89: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/89.jpg)
A B C
2 31 4
Propagating Reputation
0.9 0.1
0.6 0.45 0.35
0.5 0.5
Machines
Files
Example
Machine
Good Bad
FileGood 0.9 0.1
Bad 0.1 0.9
2 31 40.92 0.060.58 0.38
67
![Page 90: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/90.jpg)
A B C
2 31 4
Propagating Reputation
0.9 0.1
0.6 0.45 0.35
0.5 0.5
Machines
Files
Example
Machine
Good Bad
FileGood 0.9 0.1
Bad 0.1 0.9
2 31 40.92 0.060.58 0.38
67
![Page 91: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/91.jpg)
A B C
2 31 4
Propagating Reputation
0.9 0.1
0.6 0.45 0.35
0.5 0.5
Machines
Files
Example
Machine
Good Bad
FileGood 0.9 0.1
Bad 0.1 0.9
A B C0.87 0.10.81
2 31 40.92 0.060.58 0.38
67
![Page 92: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/92.jpg)
Two Equations in Belief Propagation
68
Details
![Page 93: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/93.jpg)
Computing Node Belief (Reputation)
69
Details
![Page 94: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/94.jpg)
Computing Node Belief (Reputation)
Belief Prior belief Neighbors’ opinions
69
Details
![Page 95: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/95.jpg)
Computing Node Belief (Reputation)
Belief Prior belief Neighbors’ opinions
A B C
2 31 40.5
69
Details
![Page 96: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/96.jpg)
Creating Message for Neighbor
70
Details
![Page 97: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/97.jpg)
Creating Message for Neighbor
Edge potential BeliefOpinion for neighbor
Good BadGood 0.9 0.1Bad 0.1 0.9
70
Details
![Page 98: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/98.jpg)
Creating Message for Neighbor
Edge potential BeliefOpinion for neighbor
Good BadGood 0.9 0.1Bad 0.1 0.9
A B C
2 31 4
70
Details
![Page 99: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/99.jpg)
EvaluationUsing millions of ground truth files,10-fold cross validation
85% True Positive Rate 1% False Alarms
Ideal
True Positive Rate% of bad correctly labeled
False Positive Rate (False Alarms)% of good labeled as bad 71
![Page 100: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/100.jpg)
EvaluationUsing millions of ground truth files,10-fold cross validation
85% True Positive Rate 1% False Alarms
Ideal
True Positive Rate% of bad correctly labeled
False Positive Rate (False Alarms)% of good labeled as bad
Boosted existing methods by10 absolute % point
71
![Page 101: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/101.jpg)
Multi-Iteration Results
72
1234567
True Positive Rate% of bad correctly labeled
False Positive Rate (False Alarm)% of good labeled as bad
![Page 102: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/102.jpg)
Scalability Running Time Per Iteration
Linux16-core Opteron 256GB RAM
3 hours, 37 billion edges
73
![Page 103: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/103.jpg)
Scalability How Did I Scale Up BP?
74
Details
1.Early termination (after 6 iterations) à Faster
2.Keep edges on disk à Saves 200GB of RAM
3.Computes half of the messages à Twice as fast
![Page 104: Graphs / Networks - Visualizationpoloclub.gatech.edu/cse6242/2015fall/slides/CSE6242-6-GraphsAlgo… · Graphs / Networks Centrality measures, algorithms, interactive applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fb8df4459fae70276161565/html5/thumbnails/104.jpg)
Number of machines
Scale-up
Further Scale Up Belief PropagationUse Hadoop if graph doesn’t fit in memory [ICDE’11]Speed scales up linearly with number of machines
Yahoo! M45 cluster480 machines1.5 PB storage3.5TB machine
75