Graph Analytics


Transcript of Graph Analytics

Page 1: Graph Analytics

Definition Data Structures Applications Problems

Graph Pattern Matching Partitioning Distribution

AKKA

Graph Analytics

Page 2: Graph Analytics

Definition: A "graph" is a collection of • "vertices" (or "nodes") and • "edges" that connect pairs of vertices.

A graph may be undirected, meaning that there is no distinction between the two vertices associated with each edge, or its edges may be directed from one vertex to another.

A quick review of graphs and graph theory

[Figure: two example graphs on vertices 1, 2, 3]

Page 3: Graph Analytics

The link structure of a website can be represented by a directed graph: the vertices are the web pages available at the website, and a directed edge from page A to page B exists if and only if A contains a link to B.

[Figure: mathematical PageRanks for a simple network, expressed as percentages. (Google uses a logarithmic scale.) Page C has a higher PageRank than Page E, even though there are fewer links to C; the one link to C comes from an important page and hence is of high value.]
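The PageRank idea mentioned above can be sketched with a plain power iteration. This is a minimal illustration, not Google's implementation: the object name `PageRankSketch`, the damping factor 0.85, the iteration count, and the four-page example network are all assumptions made for this sketch.

```scala
// Hypothetical sketch: power-iteration PageRank on a small directed graph.
object PageRankSketch {
  // links(p) lists the pages that p links to; every page must appear as a key.
  def pageRank(links: Map[String, Seq[String]],
               damping: Double = 0.85,
               iterations: Int = 50): Map[String, Double] = {
    val pages = links.keys.toSeq
    val n = pages.size
    var rank: Map[String, Double] = pages.map(p => p -> (1.0 / n)).toMap
    for (_ <- 1 to iterations) {
      // Rank held by pages without outgoing links is spread over all pages.
      val dangling = pages.filter(p => links(p).isEmpty).map(p => rank(p)).sum
      rank = pages.map { p =>
        // Each page q linking to p passes on an equal share of its own rank.
        val incoming = pages.filter(q => links(q).contains(p))
          .map(q => rank(q) / links(q).size).sum
        p -> ((1 - damping) / n + damping * (incoming + dangling / n))
      }.toMap
    }
    rank
  }
}
```

For example, with `Map("A" -> Seq("B"), "B" -> Seq("C"), "C" -> Seq("A"), "D" -> Seq("A"))`, page D has no incoming links and ends up with the lowest rank, while the ranks of all pages still sum to 1.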

Graph theory is also used to study molecules in chemistry and physics. In condensed matter physics, the three-dimensional structure of complicated simulated atomic structures can be studied quantitatively through graph-theoretic properties of the atoms' topology.

Other areas: image processing, crime detection, ontology, …

Applications

Page 4: Graph Analytics

The data structure used depends on both the graph structure and the algorithm used for manipulating the graph. Theoretically one can distinguish between list and matrix structures, but in concrete applications the best structure is often a combination of both. List structures are often preferred for sparse graphs as they have smaller memory requirements. Matrix structures, on the other hand, provide faster access for some applications but can consume huge amounts of memory.

Graph-theoretic data structures

Page 5: Graph Analytics

List structures: incidence list

The edges are represented by an array containing pairs (tuples if directed) of vertices (that the edge connects) and possibly weight and other data. Vertices connected by an edge are said to be adjacent.

((a,b),(c,d),…)

Adjacency list: Much like the incidence list, each vertex has a list of the vertices it is adjacent to. This causes redundancy in an undirected graph: for example, if vertices A and B are adjacent, A's adjacency list contains B, while B's list contains A. Adjacency queries are faster, at the cost of extra storage space.

a adjacent to b, c
b adjacent to a, c
c adjacent to a, b
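As a small illustration of the two list structures, the sketch below turns an incidence (edge) list into an adjacency list. The object and function names are hypothetical, and the triangle on a, b, c is taken from the table above.

```scala
// Hypothetical sketch: from an incidence (edge) list to an adjacency list.
object AdjacencySketch {
  // Incidence list of an undirected triangle on a, b, c.
  val edges: Seq[(String, String)] = Seq(("a", "b"), ("a", "c"), ("b", "c"))

  // Each undirected edge (u, v) is stored twice: v in u's list and u in v's list.
  def adjacencyList(edges: Seq[(String, String)]): Map[String, Set[String]] =
    edges
      .flatMap { case (u, v) => Seq(u -> v, v -> u) }
      .groupBy(_._1)
      .map { case (u, pairs) => u -> pairs.map(_._2).toSet }
}
```

The double storage is exactly the redundancy the slide mentions: an adjacency query becomes a set lookup, at the cost of holding every undirected edge twice.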

Page 6: Graph Analytics

Matrix structures: incidence matrix

The graph is represented by a matrix of size |V| (number of vertices) by |E| (number of edges), where the entry [vertex, edge] contains the edge's endpoint data (simplest case: 1 = incident, 0 = not incident).

[Figure: example graph with vertices 1, 2, 3, 4 and edges e1, e2, e3, e4]

Adjacency matrix: This is an n by n matrix A, where n is the number of vertices in the graph. If there is an edge from a vertex x to a vertex y, then the element a(x, y) is 1 (or, in general, the number of xy edges); otherwise it is 0. In computing, this matrix makes it easy to find subgraphs and to reverse a directed graph.
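The remark about reversing a directed graph can be made concrete: reversing every edge corresponds to transposing the adjacency matrix. The following is a minimal sketch with hypothetical names, assuming vertices are numbered from 0 to n - 1.

```scala
// Hypothetical sketch: adjacency matrix of a directed graph and its reversal.
object AdjacencyMatrixSketch {
  // Entry (x, y) counts the edges from x to y; vertices are 0 until n.
  def adjacencyMatrix(n: Int, edges: Seq[(Int, Int)]): Vector[Vector[Int]] = {
    val counts = edges.groupBy(identity).map { case (e, es) => e -> es.size }
    Vector.tabulate(n, n)((x, y) => counts.getOrElse((x, y), 0))
  }

  // Reversing every edge of the graph is the same as transposing the matrix.
  def reverse(a: Vector[Vector[Int]]): Vector[Vector[Int]] = a.transpose
}
```

For the edges 0 -> 1 and 1 -> 2, the reversed matrix equals the adjacency matrix built from the edges 1 -> 0 and 2 -> 1.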

Page 7: Graph Analytics

Matrix

Distance matrix: A symmetric n by n matrix D, where n is the number of vertices in the graph. The element d(x, y) is the length of a shortest path between x and y; if there is no such path, d(x, y) = infinity. It can be derived from powers of A.

      a    b    c    d    e    f
a     0  184  222  177  216  231
b   184    0   45  123  128  200
c   222   45    0  129  121  203
d   177  123  129    0   46   83
e   216  128  121   46    0   83
f   231  200  203   83   83    0
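One common way to compute a distance matrix from edge lengths is the Floyd-Warshall algorithm; the slide itself only says the matrix can be derived from powers of A, so this is a substitute technique shown for illustration. The names below are hypothetical, and missing edges are assumed to be encoded as positive infinity.

```scala
// Hypothetical sketch: all-pairs shortest path lengths with Floyd-Warshall.
object DistanceMatrixSketch {
  val Inf: Double = Double.PositiveInfinity

  // weights(x)(y) is the length of the edge from x to y, or Inf if there is none.
  def distanceMatrix(weights: Vector[Vector[Double]]): Vector[Vector[Double]] = {
    val n = weights.size
    val d = Array.tabulate(n, n)((x, y) => if (x == y) 0.0 else weights(x)(y))
    // Allow each vertex k in turn as an intermediate stop on the way from x to y.
    for (k <- 0 until n; x <- 0 until n; y <- 0 until n)
      if (d(x)(k) + d(k)(y) < d(x)(y)) d(x)(y) = d(x)(k) + d(k)(y)
    d.map(_.toVector).toVector
  }
}
```

Pairs of vertices with no connecting path keep the value infinity, matching the convention in the definition above.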

Page 8: Graph Analytics

Matrix

Laplacian matrix (also called "Kirchhoff matrix" or "admittance matrix"): This is defined as D − A, where D is the diagonal degree matrix and A is the adjacency matrix. It explicitly contains both adjacency information and degree information. (However, there are other, similar matrices that are also called "Laplacian matrices" of a graph.)
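A short sketch of the definition D − A, assuming an undirected simple graph (no self-loops) given by its adjacency matrix; the object name is hypothetical.

```scala
// Hypothetical sketch: Laplacian L = D - A of an undirected simple graph.
object LaplacianSketch {
  def laplacian(a: Vector[Vector[Int]]): Vector[Vector[Int]] =
    Vector.tabulate(a.size, a.size) { (x, y) =>
      // Diagonal: degree of x (row sum of A); off-diagonal: minus the adjacency entry.
      if (x == y) a(x).sum else -a(x)(y)
    }
}
```

For the path graph 0 - 1 - 2 this gives the rows (1, -1, 0), (-1, 2, -1), (0, -1, 1); every row sums to zero, which is a quick sanity check for any Laplacian built this way.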

Page 9: Graph Analytics

1) Enumeration
2) Subgraphs, induced subgraphs, and minors
3) Graph coloring
4) Route problems
5) Network flow
6) Visibility graph problems
7) Covering problems
8) Graph classes

Problems in graph theory

Page 10: Graph Analytics

Enumeration describes a class of combinatorial enumeration problems in which one must count undirected or directed graphs of certain types, typically as a function of the number of vertices of the graph.

Application: Enumeration of molecules has been studied for over a century and continues to be an active area of research. The typical approach to enumerating chemical structures has been based on constructive assembly.

[Figure: the list of all free trees on 2, 3, and 4 labeled vertices: 1 tree with 2 vertices, 3 trees with 3 vertices, and 16 trees with 4 vertices.]

1. Enumeration

Page 11: Graph Analytics

2.1. Subgraphs: A common problem, called the subgraph isomorphism problem, is finding a fixed graph as a subgraph in a given graph. The subgraph isomorphism problem is a computational task in which two graphs G and Q are given as input, and one must determine whether G contains a subgraph that is isomorphic to Q. Subgraph isomorphism is a generalization of both the maximum clique problem and the problem of testing whether a graph contains a Hamiltonian cycle, and is therefore NP-complete.

Clique problem: Finding the largest complete subgraph is called the clique problem. The term "clique" and the problem of algorithmically listing cliques both come from the social sciences, where complete subgraphs are used to model social cliques, groups of people who all know each other. In computer science, the clique problem refers to any of the problems related to finding particular complete subgraphs ("cliques") in a graph, i.e., sets of elements where each pair of elements is connected.
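To make the clique definition concrete, here is a brute-force sketch that checks every vertex subset. It runs in exponential time and is only meant for tiny graphs; the names and the adjacency-map representation are assumptions of this sketch.

```scala
// Hypothetical sketch: exhaustive search for a largest clique.
object CliqueSketch {
  type Graph = Map[Int, Set[Int]] // vertex -> neighbours (undirected)

  // A vertex set is a clique if every pair of distinct vertices is adjacent.
  def isClique(g: Graph, vs: Set[Int]): Boolean =
    vs.forall(u => vs.forall(v => u == v || g(u).contains(v)))

  // Tries all subsets of the vertex set and keeps a largest clique.
  def maxClique(g: Graph): Set[Int] =
    g.keySet.subsets().filter(isClique(g, _)).maxBy(_.size)
}
```

On a triangle 1, 2, 3 with an extra vertex 4 attached to 3, the largest clique is {1, 2, 3}.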

2. Subgraphs, induced subgraphs, and minors

Page 12: Graph Analytics

2.2 Induced subgraphs: Some important graph properties are hereditary with respect to induced subgraphs, which means that a graph has a property if and only if all induced subgraphs also have it. Finding maximal induced subgraphs of a certain kind is also often NP-complete. Finding the largest edgeless induced subgraph, or independent set, is called the independent set problem.

An independent set or stable set is a set of vertices in a graph, no two of which are adjacent. That is, it is a set I of vertices such that for every two vertices in I, there is no edge connecting the two. The size of an independent set is the number of vertices it contains.

The graph of the cube has six different maximal independent sets, shown as the red vertices.
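The cube example can be checked by brute force. The sketch below encodes the cube's vertices as 3-bit numbers (an encoding chosen for this illustration) and enumerates all maximal independent sets; all names are hypothetical.

```scala
// Hypothetical sketch: enumerating the maximal independent sets of the cube graph.
object IndependentSetSketch {
  // Cube graph: vertices 0..7 are 3-bit strings; neighbours differ in exactly one bit.
  val cube: Map[Int, Set[Int]] =
    (0 until 8).map(v => v -> Set(v ^ 1, v ^ 2, v ^ 4)).toMap

  // Independent: no vertex of s has a neighbour inside s.
  def isIndependent(g: Map[Int, Set[Int]], s: Set[Int]): Boolean =
    s.forall(u => (g(u) & s).isEmpty)

  // Maximal: independent, and every vertex outside s has a neighbour inside s.
  def isMaximalIndependent(g: Map[Int, Set[Int]], s: Set[Int]): Boolean =
    isIndependent(g, s) && (g.keySet -- s).forall(v => (g(v) & s).nonEmpty)

  def maximalIndependentSets(g: Map[Int, Set[Int]]): List[Set[Int]] =
    g.keySet.subsets().filter(isMaximalIndependent(g, _)).toList
}
```

Running `maximalIndependentSets(cube)` yields six sets: the two colour classes of size 4 and the four pairs of opposite corners.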

2. Subgraphs, induced subgraphs, and minors

Page 13: Graph Analytics

2.3. Minors: The minor containment problem is to find a fixed graph as a minor of a given graph. A minor or subcontraction of a graph is any graph obtained by taking a subgraph and contracting some (or no) edges. Many graph properties are hereditary for minors, which means that a graph has a property if and only if all minors have it too.

A graph is planar if and only if it contains as a minor neither the complete bipartite graph K3,3 (see the three-cottage problem) nor the complete graph K5. A planar graph can be drawn in such a way that no edges cross each other. Such a drawing is called a plane graph or planar embedding of the graph.

Three-cottage problem (also: water, gas, and electricity, or the three utilities problem): Suppose there are three cottages on a plane and each needs to be connected to the gas, water, and electric companies. Using a third dimension or sending any of the connections through another company or cottage is disallowed. Is there a way to make all nine connections without any of the lines crossing each other?

[Figures: the utility graph K3,3; K3,3 drawn with only one crossing.]

2. Subgraphs, induced subgraphs, and minors

Page 14: Graph Analytics

Many problems have to do with various ways of coloring graphs, for example:

1) The four-color theorem: In mathematics, the four color theorem, or the four color map theorem, states that, given any separation of a plane into contiguous regions, producing a figure called a map, no more than four colors are required to color the regions of the map so that no two adjacent regions have the same color.

2) The strong perfect graph theorem: In graph theory, a perfect graph is a graph in which the chromatic number of every induced subgraph equals the size of the largest clique of that subgraph. Perfect graphs are the same as the Berge graphs, graphs that have no odd-length induced cycle or induced complement of an odd cycle.

The chromatic polynomial counts the number of ways a graph can be colored using no more than a given number of colors.

3. Graph coloring

The Paley graph of order 9, colored with three colors and showing a clique of three vertices. In this graph and each of its induced subgraphs the chromatic number equals the clique number, so it is a perfect graph.
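The statement that the chromatic polynomial counts colorings can be illustrated by evaluating it by brute force: count all proper colorings with at most k colors. This sketch assumes a simple graph on vertices 0 until n; the names are hypothetical and the approach is exponential, so it is only suitable for very small graphs.

```scala
// Hypothetical sketch: counting proper vertex colourings by backtracking.
object ColoringSketch {
  // Counts colourings of vertices 0 until n with colours 0 until k such that
  // the two endpoints of every edge receive different colours.
  def countColorings(n: Int, edges: Seq[(Int, Int)], k: Int): Int = {
    def extend(assigned: Vector[Int]): Int =
      if (assigned.size == n) 1
      else {
        val v = assigned.size // next vertex to colour
        (0 until k).filter { c =>
          edges.forall { case (a, b) =>
            val other = if (a == v) b else if (b == v) a else -1
            // Only already coloured neighbours (index below v) constrain colour c.
            other < 0 || other >= v || assigned(other) != c
          }
        }.map(c => extend(assigned :+ c)).sum
      }
    extend(Vector.empty)
  }
}
```

For a triangle, the chromatic polynomial is k(k - 1)(k - 2), so the function returns 6 for k = 3 and 0 for k = 2.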

Page 15: Graph Analytics

3) The total coloring conjecture (unsolved): In graph theory, total coloring is a type of coloring on the vertices and edges of a graph. When used without any qualification, a total coloring is always assumed to be proper in the sense that no adjacent vertices, no adjacent edges, and no edge and its endvertices are assigned the same color. The total chromatic number χ″(G) of a graph G is the least number of colors needed in any total coloring of G.

4) The Erdős–Faber–Lovász conjecture (unsolved)
5) The list coloring conjecture (unsolved)
6) The Hadwiger conjecture (graph theory) (unsolved)

3. Graph coloring

Page 16: Graph Analytics

4.1. Hamiltonian path and cycle problems: The Hamiltonian path problem and the Hamiltonian cycle problem are the problems of determining whether a Hamiltonian path or a Hamiltonian cycle exists in a given graph. A Hamiltonian path is a path in an undirected graph that visits each vertex exactly once. A Hamiltonian cycle (or Hamiltonian circuit) is a Hamiltonian path that is a cycle.

4.2. Minimum spanning tree: In a connected undirected graph, a spanning tree is a subgraph that is a tree and connects all the vertices together. A minimum spanning tree is a spanning tree whose total edge weight is smallest.

4.3. Route inspection problem: The route inspection problem is to find a shortest closed path or circuit that visits every edge of a (connected) undirected graph.

4. Route problems

Page 17: Graph Analytics

4.4. Seven Bridges of Königsberg: The problem was to find a walk through the city that would cross each bridge once and only once.

4.5. Shortest path problem: The shortest path problem is the problem of finding a path between two vertices (or nodes) in a graph such that the sum of the weights of its constituent edges is minimized.

4.6. Steiner tree: A problem in combinatorial optimization, which may be formulated in a number of settings, with the common part being that it is required to find the shortest interconnect for a given set of objects.

4.7. Three-cottage problem

4.8. Traveling salesman problem: Given a list of cities and their pairwise distances, the task is to find the shortest possible route that visits each city exactly once and returns to the origin city.
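For the shortest path problem (4.5), a standard approach for non-negative edge weights is Dijkstra's algorithm. The slides do not name an algorithm, so this is offered as one possible sketch; the names, the integer weights, and the adjacency-map representation are assumptions.

```scala
import scala.collection.mutable

// Hypothetical sketch: Dijkstra's algorithm for non-negative integer weights.
object ShortestPathSketch {
  // graph(u) lists (neighbour, weight) pairs for the edges leaving u.
  def dijkstra(graph: Map[String, Seq[(String, Int)]], source: String): Map[String, Int] = {
    val dist = mutable.Map(source -> 0)
    // PriorityQueue is a max-heap, so distances are negated to pop the closest vertex first.
    val closestFirst: Ordering[(Int, String)] = Ordering.by((e: (Int, String)) => -e._1)
    val queue = mutable.PriorityQueue.empty[(Int, String)](closestFirst)
    queue.enqueue((0, source))
    while (queue.nonEmpty) {
      val (d, u) = queue.dequeue()
      if (d == dist(u)) { // skip stale entries that were superseded by a shorter path
        for ((v, w) <- graph.getOrElse(u, Seq.empty)) {
          if (d + w < dist.getOrElse(v, Int.MaxValue)) {
            dist(v) = d + w
            queue.enqueue((d + w, v))
          }
        }
      }
    }
    dist.toMap // only vertices reachable from the source appear in the result
  }
}
```

With edges a -> b (7), a -> c (2), c -> b (3) and b -> d (1), the distances from a are a = 0, c = 2, b = 5, d = 6: the detour through c beats the direct edge to b.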

4. Route problems

Page 18: Graph Analytics

Graph pattern matching is often defined in terms of subgraph isomorphism, an NP-complete problem. To lower its complexity, various extensions of graph simulation have been considered instead. Given a pattern graph Q and a data graph G, the task is to find all subgraphs of G that match Q.

Graph pattern matching

[Figure: input images, detected features, one-shot matching (26 true), progressive matching (159 true)]

Page 19: Graph Analytics

1. Isomorphism: In graph theory, an isomorphism of graphs G and Q is a bijection f: V(G) → V(Q) between the vertex sets of G and Q such that any two vertices u and v of G are adjacent in G if and only if ƒ(u) and ƒ(v) are adjacent in Q.

A bijection (or bijective function or one-to-one correspondence) is a function giving an exact pairing of the elements of two sets.

Graph pattern matching

[Figure: graph G and graph Q] An isomorphism between G and Q:

f(a) = 1, f(b) = 6, f(c) = 8, f(d) = 3, f(g) = 5, f(h) = 2, f(i) = 4, f(j) = 7
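Checking whether a given mapping is an isomorphism is straightforward, even though finding one is hard in general. The sketch below verifies the two parts of the definition (bijection, adjacency preserved in both directions). The names are hypothetical, and since the edges of the slide's graphs G and Q are not recoverable from the transcript, the usage example uses two small path graphs instead.

```scala
// Hypothetical sketch: verifying that a vertex mapping f is a graph isomorphism.
object IsomorphismSketch {
  type Edges = Set[(String, String)] // undirected edges, each stored in one orientation

  def adjacent(e: Edges, x: String, y: String): Boolean =
    e.contains((x, y)) || e.contains((y, x))

  def isIsomorphism(vG: Set[String], eG: Edges,
                    vQ: Set[String], eQ: Edges,
                    f: Map[String, String]): Boolean = {
    // f is defined exactly on V(G), hits all of V(Q), and the sets have equal size.
    val bijective = f.keySet == vG && f.values.toSet == vQ && vG.size == vQ.size
    // u, v adjacent in G if and only if f(u), f(v) adjacent in Q.
    bijective && vG.forall(u => vG.forall(v =>
      adjacent(eG, u, v) == adjacent(eQ, f(u), f(v))))
  }
}
```

For the paths a - b - c and 1 - 2 - 3, the mapping a to 1, b to 2, c to 3 is an isomorphism, while a to 2, b to 1, c to 3 is not, because b and c are adjacent but their images 1 and 3 are not.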

Page 20: Graph Analytics

As observed, subgraph isomorphism is often too restrictive to catch sensible matches, as it requires matches to have exactly the same topology as the pattern graph. This hinders its applicability in emerging applications such as social networks and crime detection.

Simple simulation: G matches Q via graph simulation, denoted by Q ≺ G, if there exists a binary relation S ⊆ VQ × V, where VQ and V are the sets of nodes in Q and G, respectively, such that
1. for each (u, v) ∈ S, u and v have the same label;
2. for each node u in Q, there exists v in G such that
a) (u, v) ∈ S,
b) for each edge (u, u’) in Q, there exists an edge (v, v’) in G such that (u’, v’) ∈ S (same children).
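The definition can be turned into a simple fixpoint computation: start with all label-compatible pairs and repeatedly drop pairs that violate the child condition. The sketch below is a naive version for illustration, not the optimized algorithm from the literature. The `Graph` case class, the vertex numbers, and the edges of the example graphs are assumptions; only the labels TE, ST and Book come from the slides.

```scala
// Hypothetical sketch: maximum simple simulation by naive fixpoint refinement.
object SimulationSketch {
  // A labelled directed graph: vertex -> label, and vertex -> set of children.
  case class Graph(label: Map[Int, String], post: Map[Int, Set[Int]]) {
    def vertices: Set[Int] = label.keySet
    def children(v: Int): Set[Int] = post.getOrElse(v, Set.empty)
  }

  // Returns u -> {matching data nodes}. If some pattern node ends up with an
  // empty set, the data graph does not match the pattern.
  def simpleSimulation(q: Graph, g: Graph): Map[Int, Set[Int]] = {
    // Condition 1: candidates must carry the same label.
    var sim: Map[Int, Set[Int]] =
      q.vertices.map(u => u -> g.vertices.filter(v => g.label(v) == q.label(u))).toMap
    var changed = true
    while (changed) {
      changed = false
      for (u <- q.vertices; u2 <- q.children(u)) {
        // Condition 2b: keep v only if some child of v still matches the child u2 of u.
        val kept = sim(u).filter(v => (g.children(v) & sim(u2)).nonEmpty)
        if (kept.size < sim(u).size) {
          sim = sim.updated(u, kept)
          changed = true
        }
      }
    }
    sim
  }
}
```

In a pattern TE -> ST -> Book, an ST node in the data graph that has no Book child is removed from the match set during refinement, while ST nodes with a Book child are kept.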

Graph Simulation

[Figure: pattern graph Q with labels TE, ST, Book and data graph G with nodes 1 to 5. Simple simulation match result: TE: 1; ST: 2, 3; Book: 4, 5.]

Page 21: Graph Analytics

Dual simulation: G matches Q via dual simulation, denoted by Q ≺D G, if
1. Q ≺ G with a binary match relation S ⊆ Vq × V, and
2. for each pair (u, v) ∈ S and each edge (u2, u) in Eq, there exists an edge (v2, v) in E with (u2, v2) ∈ S (same children and same parents).

Graph Simulation

[Figure: pattern graph Q with labels TE, ST, Book and data graph G with nodes 1 to 5. Dual simulation match result: TE: 1; ST: 2, 3; Book: 5.]

Page 22: Graph Analytics

More examples of simple and dual simulation

[Figure: pattern graph Q with labels A and B, and data graph G with nodes 0 to 9 labeled A, B, D, K. Match result shown for simple simulation and for dual simulation: A: 0, 8, 9; B: 1, 2, 3, 4, 5.]

Page 23: Graph Analytics

More examples of simple and dual simulation

[Figure: pattern graph Q with labels A, B, C, D, E, F and data graph G with nodes 1 to 11. Simple simulation match result: A: 1; B: 2; C: 3; D: 4, 5; E: 7, 8, 11; F: 6, 9, 10.]

Page 24: Graph Analytics

Strong simulation: Strong simulation is defined by enforcing two conditions on simulation: duality and locality.

Balls. For a node v in a graph G and a non-negative integer r, the ball with center v and radius r is a subgraph of G, denoted by Ĝ[v, r], such that
1. for all nodes v’ in Ĝ[v, r], the shortest distance dist(v, v’) ≤ r;
2. it has exactly the edges that appear in G over the same node set.

G matches Q via strong simulation, denoted by Q ≺DL G, if there exist a node v in G and a connected subgraph Gs of G such that
3. Q ≺D Gs, with the maximum match relation S;
4. Gs is exactly the match graph w.r.t. S;
5. Gs is contained in the ball Ĝ[v, dQ], where dQ is the diameter of Q.
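The ball Ĝ[v, r] from the definition above can be extracted with a breadth-first search that stops after r levels. This sketch assumes distances are measured on the undirected view of the graph (edge direction ignored), which is an assumption of this illustration; the names are hypothetical.

```scala
// Hypothetical sketch: extracting the ball with a given center and radius.
object BallSketch {
  // neighbours: undirected view of the graph, vertex -> adjacent vertices.
  // Returns the node set of the ball: all nodes within distance `radius` of `center`.
  def ball(neighbours: Map[Int, Set[Int]], center: Int, radius: Int): Set[Int] = {
    var visited = Set(center)
    var frontier = Set(center)
    for (_ <- 1 to radius) {
      // Expand one BFS level, keeping only nodes not seen before.
      frontier = frontier.flatMap(v => neighbours.getOrElse(v, Set.empty[Int])) -- visited
      visited ++= frontier
    }
    visited
  }

  // The ball has exactly the edges of G whose two endpoints lie in its node set.
  def inducedEdges(edges: Set[(Int, Int)], nodes: Set[Int]): Set[(Int, Int)] =
    edges.filter { case (a, b) => nodes(a) && nodes(b) }
}
```

On the path 1 - 2 - 3 - 4 - 5, the ball with center 3 and radius 1 has the nodes {2, 3, 4} and the edges (2, 3) and (3, 4).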

Graph Simulation

[Figure: pattern graph Q with nodes labeled P and data graph G with nodes 1 to 4, all labeled P, together with the balls around the nodes of G. Match result: P: 1, 2, 3, 4.]

Page 25: Graph Analytics

More examples of simple and dual simulation

[Figure: pattern graph Q with labels A and B, data graph G with nodes 0 to 6 labeled A and B, and the balls extracted around the nodes of G.]

Page 26: Graph Analytics

Example 1: The Bio has to be recommended by:
a) an HR person;
b) an SE, i.e., the Bio has experience working with SEs; the SE is also recommended by an HR person;
c) a data mining specialist (DM), as data mining techniques are required for the job; there is an artificial intelligence expert (AI) who recommends the DM and is recommended by a DM.

[Figure: pattern graph Q with nodes HR, SE, Bio, DM, AI, and data graph G with nodes HR1, HR2, SE1, SE2, Bio1, Bio2, Bio3, Bio4, DM1, DM2, …, DMk, AI1, AI2, …, AIk.]

Graph Simulation

Page 27: Graph Analytics

We next present optimization techniques for algorithm Match, by means of:
• Query minimization
• Dual simulation filtering
• Connectivity pruning

1. Query minimization: We say that two pattern graphs Q and Q’ are equivalent, denoted by Q ≡ Q’, if they return the same result on any data graph. A pattern graph Q is minimum if it has the least size |Q| (the number of nodes and the number of edges) among all equivalent pattern graphs.

Optimization Techniques

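[Figure: a pattern graph with nodes R, A, B1, B2, C1, C2, D1, D2 and an equivalent minimized pattern graph with nodes R, A, B1, C1, D1.]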

Page 28: Graph Analytics

2. Dual simulation filtering. Our second optimization technique aims to avoid redundant checking of balls in the data graph. Most algorithms for graph simulation recursively refine the match relation by identifying and removing false matches. So, we compute the match relation of dual simulation first, and then project the match relation on each ball to compute strong simulation. This both reduces the initial match set sim(v) for each node v in Q and reduces the number of balls. Indeed, if a node v in G does not match any node in Q, then there is no need to consider the ball centered at v.

The removal process on a ball only needs to deal with its border nodes and their affected nodes.

Optimization Techniques

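[Figure: pattern graph Q with nodes P and P’, and data graph G with nodes P1, P2, P3, P4; after filtering, P1, P3 and P4 remain.]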

Page 29: Graph Analytics

3. Connectivity pruning. In a ball, only the connected component containing the ball center v needs to be considered. Hence, those nodes not reachable from v can be pruned early.

Optimization Techniques

[Figure: pattern graph Q with nodes A1, B1, A2, B2, and data graph G with nodes A1, B1, C, A2, B2.]

Page 30: Graph Analytics

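import scala.collection.mutable.{HashMap, Set}

// Dual simulation by fixpoint refinement. Graph is assumed to provide
// vertices, label(v), post(v) (children) and pre(v) (parents).
def hhk (g: Graph, q: Graph): Unit = {
  val sim = HashMap[Int, Set[Int]]()
  // Initially sim(u) holds every vertex of g with the same label as u.
  q.vertices.foreach ( u => {
    var lis = Set[Int]()
    g.vertices.filter( w => g.label(w) == q.label(u)).foreach ( wp => lis += wp )
    sim += u -> lis
  })
  var flag = true
  while (flag) {
    flag = false
    // Child condition: w needs a child that matches each child v of u.
    // toList takes a snapshot so that sim(u) can be modified while looping.
    for (u <- q.vertices; w <- sim(u).toList; v <- q.post(u) if (g.post(w) & sim(v)).isEmpty ) {
      sim(u) -= w
      flag = true
    } //for
    // Parent condition: w needs a parent that matches each parent v of u.
    for (u <- q.vertices; w <- sim(u).toList; v <- q.pre(u) if (g.pre(w) & sim(v)).isEmpty ) {
      sim(u) -= w
      flag = true
    } //for
  } //while
}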

Page 31: Graph Analytics

For all v ∈ G:
    if post(v) = ∅ then sim(v) := { u ∈ Q | <<u>> = <<v>> }
    else sim(v) := { u ∈ Q | <<u>> = <<v>> and post(u) ≠ ∅ }
    remove(v) := pre(G) – pre(sim(v))
While there is v ∈ G with remove(v) ≠ ∅:
    for all u ∈ pre(v):
        for all w ∈ remove(v):
            if w ∈ sim(u):
                sim(u) := sim(u) – {w}
                for all w’ ∈ pre(w):
                    if post(w’) ∩ sim(u) = ∅ then remove(u) := remove(u) ∪ {w’}
    remove(v) := ∅

[Figure: graph H with nodes A1, B1, C1, D1, C2, C3, D2, and graph G with nodes A, B, C, D]

Worked step for v = D:
sim(D) = {D1, D2}
remove(v) := pre(G) – pre(sim(v))
remove(D) = {A1, B1, C1, D1, C2, C3} – {C2, C3, A1, B1} = {C1, D1}
for u ∈ pre(D) = {C, A}:
    for w ∈ remove(D) = {C1, D1}:
        if w ∈ sim(C) = {C1, C2, C3, A1} then sim(C) = {C1, C2, C3} – {C1}
        for all w’ ∈ pre(w) = {A1}: if post(A1) ∩ sim(C) = {C2, C3} = ∅ (false)

Page 32: Graph Analytics

Homework: Pattern Q looks for papers on social networks (SN) cited by papers on databases (DB), which in turn cite papers on graph theory (Graph). Find the pattern graph and, for the given data graph G, all match graphs under isomorphism, simple simulation, dual simulation, and strong simulation.

Graph Simulation

[Figure: data graph G with nodes DB1, DB2, DB3, SN1, SN2, SN3, SN4, Graph1, Graph2]

Page 33: Graph Analytics

The balance constraint:
◦ Balance computational load such that each processor has the same execution time
◦ Balance storage such that each processor has the same storage demands

Minimum edge cut:
◦ Minimize communication volume between subdomains, along the edges of the mesh
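Both goals can be measured for a given assignment of vertices to partitions. The sketch below computes the edge cut and the partition sizes; the names are hypothetical, and vertex count is used only as a simple stand-in for storage or computational load.

```scala
// Hypothetical sketch: measuring edge cut and balance of a partitioning.
object PartitionSketch {
  // part(v) is the index of the partition that vertex v is assigned to.
  // An edge is cut if its two endpoints lie in different partitions.
  def edgeCut(edges: Seq[(Int, Int)], part: Map[Int, Int]): Int =
    edges.count { case (u, v) => part(u) != part(v) }

  // Number of vertices per partition, a simple proxy for the balance constraint.
  def partSizes(part: Map[Int, Int]): Map[Int, Int] =
    part.groupBy(_._2).map { case (p, vs) => p -> vs.size }
}
```

For the 4-cycle 0 - 1 - 2 - 3 - 0 split into {0, 1} and {2, 3}, the edge cut is 2 and both partitions hold two vertices.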

Goals of Partitioning

[Figure: example 3-way partition with edge-cut = 9 (a 5-cut and a 4-cut)]

Page 34: Graph Analytics

We now define the graph pattern matching problem in a distributed setting. Given a pattern graph Q and a fragmented graph F = (F1, . . ., Fk) of data graph G, in which each fragment Fi = (G[Vi], Bi) (i ∈ [1, k]) is placed at a separate machine Si, the distributed graph pattern matching problem is to find the maximum match in G for Q, via graph simulation.

F1 = (G[V1], {BPM1, BSA1}), V1 = {PM1, BA1}, BPM1 = {BA1 : 2}, BSA1 = {SD1 : 2},
F2 = (G[V2], ∅), V2 = {SA1, ST1},
F3 = (G[V3], {BPM2}), V3 = {PM2, BA2, UD1}, BPM2 = {SA2 : 4} and BSA2 = {SDh : 5},
F4 = (G[V4], {BSA2}), V4 = {SA2},
F5 = (G[V5], ∅), V5 = {SD1, ST1, . . . , SDh, STh}

Distributed Graph Pattern Matching

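[Figure: pattern graph Q with nodes PM, SA, BA, UD, SD, ST, and data graph G with nodes PM1, SA1, BA1, SD1, PM2, BA2, UD1, SA2, SD1, ST1, …, SDn, STn, split into fragments F1 to F5.]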

Page 35: Graph Analytics

Partial match. A binary relation R ⊆ Vq × Vi is said to be a partial match if
(1) for each (u, v) ∈ R, u and v have the same label;
(2) for each edge (u, u’) in Eq, either
◦ (a) there exists a node v’ ∈ Bv in Bi having the same label as u’, if v is a boundary node, or
◦ (b) there exists an edge (v, v’) in G[Vi] such that (u’, v’) ∈ R.

Pair (SA, SA1) is in the maximum partial match PM1 in fragment F1 for Q. However, it does not belong to the maximum match M in G for Q. Consider pattern graph Q1 and data graph G1, and the partial match results.
(1) For node SA1, its only child SD1 is located in fragment F2. The partial match for SD1 is empty. Hence, a false match decision is sent back to machine S1, and this further helps determine that (SA, SA1) is a false match.
(2) For node SA2, its only child SDn is located in fragment F5. The subgraph F5 contains no boundary nodes, and SDn belongs to F5. Hence, a true match decision is sent back to machine S4, and this further helps determine that (SA, SA2) is a true match.
After these are done, fragment F3 is the only part of G that needs to be further evaluated. To check the matches in F3, we simply ship fragment F4 to machine S3.

Distributed Graph Pattern Matching

[Figure: the same pattern graph Q and fragmented data graph G (fragments F1 to F5) as on the previous slide.]

Page 36: Graph Analytics

For each vertex with a matched label, create the ball, with d = 4 (L = 2).

[Figure: pattern graph Q with labels D, E, F, data graph G with nodes 1 to 12 labeled A to F, and the balls created around the nodes of G whose labels match D, E, or F.]

Page 37: Graph Analytics

Introducing Akka

Page 38: Graph Analytics

• Correct, highly scalable systems.
• Fault-tolerant systems that self-heal.
• Truly scalable systems.
… using state-of-the-art tools.

The Problem

Page 39: Graph Analytics

Simpler
• Concurrency
• Scalability
• Fault tolerance

With a single unified
• Programming model
• Runtime
• Service

Vision

Page 40: Graph Analytics

Scale up & out

Page 41: Graph Analytics

Finance
• Stock trend analysis and simulation.
• Event-driven messaging systems.

Betting and gaming
• Massive multiplayer online gaming.
• High-throughput and transactional betting.

Telecom
• Streaming media network gateways.

Simulation
• 3D simulation engine.

E-commerce
• Social media community sites.

Where is Akka used?

Page 42: Graph Analytics

In computer science, the Actor model is a mathematical model of concurrent computation that treats "actors" as the universal primitives of concurrent digital computation: in response to a message that it receives, an actor can make local decisions, create more actors, send more messages, and determine how to respond to the next message received.

What is “Actor Model”

Page 43: Graph Analytics

Akka is a toolkit and runtime for building highly concurrent, distributed, and fault-tolerant event-driven applications on the JVM.

[Diagram: parallelism, concurrency, event-driven; an Actor encapsulates behavior and state]

Page 44: Graph Analytics

Life cycle of an Actor

Page 45: Graph Analytics

import akka.actor.Actor

// The message type understood by the counter
case object Tick

class Counter extends Actor {
  var counter = 0
  def receive = {
    case Tick =>
      counter += 1
      println(counter)
  }
}

Actors

Page 46: Graph Analytics

val counter = actorOf[Counter]

counter is an ActorRef

Create Actors

Page 47: Graph Analytics

counter ! Tick

Send !

Page 48: Graph Analytics

val future = actor !!! Message
future.await
val result = future.result

Send !!!

Page 49: Graph Analytics

class SomeActor extends Actor {
  def receive = {
    case User(name) => self.reply("Hi" + name)
  }
}

Reply

Page 50: Graph Analytics

self become {
  case NewMessage => …….
}

Hot Swap

Page 51: Graph Analytics

There are four different types of message dispatchers:
1. Thread-based
2. Event-based
3. Priority event-based
4. Work-stealing

Message Dispatcher

Page 52: Graph Analytics

The ‘ThreadBasedDispatcher’ binds a dedicated OS thread to each specific Actor. The messages are posted to a ‘LinkedBlockingQueue’ which feeds the messages to the dispatcher one by one. A ‘ThreadBasedDispatcher’ cannot be shared between actors. This dispatcher has worse performance and scalability than the event-based dispatcher but works well for creating “daemon” Actors that consume messages at a low frequency and are allowed to go off and do their own thing for a longer period of time. Another advantage of this dispatcher is that Actors do not block threads for each other.

ThreadBasedDispatcher

Page 53: Graph Analytics

The ‘ExecutorBasedEventDrivenDispatcher’ binds a set of Actors to a thread pool backed up by a ‘BlockingQueue’. This dispatcher is highly configurable and supports a fluent configuration API to configure the ‘BlockingQueue’ (type of queue, max items etc.) as well as the thread pool.

The event-driven dispatchers must be shared between multiple Actors. One best practice is to give each top-level Actor (e.g. the Actors you define in the declarative supervisor config) its own dispatcher, and to reuse that dispatcher for each new Actor that the top-level Actor creates. You can also share a dispatcher between multiple top-level Actors.

ExecutorBasedEventDrivenDispatcher

Page 54: Graph Analytics

import akka.actor.Actor
import akka.dispatch.Dispatchers

class MyActor extends Actor {
  self.dispatcher = Dispatchers.newExecutorBasedEventDrivenDispatcher(name)
    .withNewThreadPoolWithLinkedBlockingQueueWithCapacity(100)
    .setCorePoolSize(16)
    .setMaxPoolSize(128)
    .setKeepAliveTimeInMillis(60000)
    .build
  ...
}

Page 55: Graph Analytics

It’s useful to be able to specify the priority order of messages; this is done by using the PriorityExecutorBasedEventDrivenDispatcher.

PriorityExecutorBasedEventDrivenDispatcher

Page 56: Graph Analytics

import akka.dispatch._
import akka.actor._

// Create a new PriorityGenerator, lower prio means more important
val gen = PriorityGenerator {
  case 'highpriority => 0   // 'highpriority messages should be treated first if possible
  case 'lowpriority  => 100 // 'lowpriority messages should be treated last if possible
  case otherwise     => 50  // We default to 50
}

// We create a new Actor that just prints out what it processes
val a = Actor.actorOf(
  new Actor {
    def receive = {
      case x => println(x)
    }
  })

// We create a new Priority dispatcher and seed it with the priority generator
a.dispatcher = new PriorityExecutorBasedEventDrivenDispatcher("foo", gen)
a.start // Start the Actor

Page 57: Graph Analytics

The ‘ExecutorBasedEventDrivenWorkStealingDispatcher’ is a variation of the ‘ExecutorBasedEventDrivenDispatcher’ in which Actors of the same type can be set up to share this dispatcher. During execution, the different actors will steal messages from other actors if they have fewer messages to process. This can be a great way to improve throughput at the cost of slightly higher latency.

ExecutorBasedEventDrivenWorkStealingDispatcher

Page 58: Graph Analytics
Page 59: Graph Analytics
Page 60: Graph Analytics

Scratch Data

Static Data
• Supplied at boot time.
• Supplied by other components.

Dynamic Data
• Data possible to recompute.
• Data from other sources.

Classification of State

Page 61: Graph Analytics

Fault Tolerant (Onion Layered)

Page 62: Graph Analytics

Error Kernel Pattern

Page 63: Graph Analytics

Akka is an implementation of the Actor Model for both Java and Scala.

Actor encapsulates mutable state with the guarantee of one message at a time.

Akka in Summary

Page 64: Graph Analytics

Assign each child the label matched ball

Union All Matches

Page 65: Graph Analytics

Graph theory: http://en.wikipedia.org/wiki/Graph_theory
Capturing Topology in Graph Pattern Matching: http://vldb.org/pvldb/vol5/p310_shuaima_vldb2012.pdf
Making a Move of Graphs via Probabilistic Voting: http://cv.snu.ac.kr/publication/conf/2012/ProgGM_CVPR2012.pdf
GPS: A Graph Processing System: http://ilpubs.stanford.edu:8090/1039/5/full_paper.pdf
Distributed Graph Pattern Matching: http://www2012.wwwconference.org/proceedings/proceedings/p949.pdf
Pregel: A System for Large-Scale Graph Processing: new-chinese-chess-engine.googlecode.com/svn-history/r21/trunk/search_engine/doc/arch/arch/pregel_paper.pdf
Akka 2.0: Scaling Up & Out With Actors: https://www.youtube.com/watch?v=3jbqTxstlC4&feature=relmfu
Apache Giraph: distributed graph processing in the cloud: https://www.youtube.com/watch?feature=endscreen&v=BmRaejKGeDM&NR=1
MapReduce Used on Large Data Sets: https://www.youtube.com/watch?v=N8FHXgPJEfQ

References

Page 66: Graph Analytics

The Actor model adopts the philosophy that everything is an actor. This is similar to the everything is an object philosophy used by some object-oriented programming languages, but differs in that object-oriented software is typically executed sequentially, while the Actor model is inherently concurrent.

With the Actor Model, instead of manually creating threads or event loops, you create an object that has state and associated logic. This logic is only ever called by one thread at a time, and the object communicates with the outside world through messages.

An actor is a computational entity that, in response to a message it receives, can concurrently:
• send a finite number of messages to other actors;
• create a finite number of new actors;
• designate the behavior to be used for the next message it receives.

Recipients of messages are identified by address, sometimes called "mailing address". Thus an actor can only communicate with actors whose addresses it has. It can obtain those from a message it receives, or if the address is for an actor it has itself created.

Actor model

Page 67: Graph Analytics

Create

case object Tick

class Counter extends Actor {
  var counter = 0
  def receive = {
    case Tick =>
      counter += 1
      println(counter)
  }
}

Create an instance of the Counter actor in the system and get a reference handle back:

val counter = system.actorOf(Props[Counter], name = "count")

Or, if we are inside a parent actor and want to create a child:

val counter = context.actorOf(….)

To stop:

system.stop(counter)

Actors in AKKA

Props: defines how to create an actor

name: the name of the actor in the hierarchy

Page 68: Graph Analytics

Send message

counter ! Tick
(sends the Tick message to the counter actor: the message is put in its mailbox)

Reply

class SomeActor extends Actor {
  def receive = {
    case User(name) => sender ! ("Hi" + name)
  }
}

To change behaviour

context become {
  case NewMessage => …..
}

Actors in AKKA

Page 69: Graph Analytics

Failure strategy

class MySupervisor extends Actor {
  def supervisorStrategy = OneForOneStrategy({
    case _: ActorKilledException => Stop
    case _: ArithmeticException  => Resume
    case _: Exception            => Restart
  }, maxNrOfRetries = None, withinTimeRange = None)

  def receive = {
    case NewUser(name) => ….. = context.actorOf[User](name)
  }
}

Actors in AKKA

Page 70: Graph Analytics

Remoting: remote actors have a different kind of ActorRef

akka {
  actor {
    provider = akka.remote.RemoteActorRefProvider
    deployment {
      counter {
        remote = akka://mysystem@hostname:255
      }
    }
  }
}

Actors in AKKA