© 2007, Roman Schmidt, Distributed Information Systems Laboratory. Evergrow workshop, Jerusalem, Israel, February 19, 2007
Efficient implementation of BP in P2P networks
Roman Schmidt
School of Computer and Communication Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL)
Evergrow Loopy Belief Propagation Algorithm and Applications Workshop, Jerusalem, Israel, February 19-21, 2007
Motivation
• Users share (correlated) data in P2P systems
– currently mainly for retrieval
– but correlations hold hidden knowledge
• Profit by correlations for new services
– Distributed Knowledge Base (e.g., for software bugs)
– Structure/cluster data (e.g., for better search results)
– Recommendation system (e.g., for data annotation)
– etc.
• Distributed Inference System on top of a P2P system
• Current focus and contribution
– Message reduction
Outline
• Motivation
• Basic Concepts
– Belief Propagation
– The P-Grid Overlay
• P2P Belief Propagation
– Inference Architecture
– The Relaxation Algorithm
• Evaluation
• Conclusions
Belief Propagation
• Inference based on Bayesian networks
– models dependencies between variables
• Iterative message-passing algorithm
– compute marginal probabilities (“beliefs”)
– provably efficient on trees, works for arbitrary networks
Example network: OS1 -> App1 <- Driver1

OS1        Installed = True   Installed = False
           0.2                0.8

Driver1    Installed = True   Installed = False
           0.2                0.8

App1       OS1   Driver1   Runs   Error
           T     T         0.9    0.1
           T     F         0.4    0.6
           F     T         0.0    1.0
           F     F         0.0    1.0
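As a worked example of what a "belief" is, the marginal probability that App1 runs can be computed by brute-force enumeration over the two parent variables. This is only a sketch: it assumes the flattened table is the conditional table of App1 given OS1 and Driver1, and the names `P_OS1`, `P_DRIVER1`, `P_APP1_RUNS` and `marginal_runs` are made up for illustration.

```python
from itertools import product

# Prior probabilities that each component is installed (values from the slide).
P_OS1 = {True: 0.2, False: 0.8}
P_DRIVER1 = {True: 0.2, False: 0.8}

# P(App1 = Runs | OS1, Driver1), assuming the table belongs to App1.
P_APP1_RUNS = {
    (True, True): 0.9,
    (True, False): 0.4,
    (False, True): 0.0,
    (False, False): 0.0,
}

def marginal_runs():
    """Sum the joint probability over all parent configurations."""
    return sum(
        P_OS1[os1] * P_DRIVER1[drv] * P_APP1_RUNS[(os1, drv)]
        for os1, drv in product([True, False], repeat=2)
    )
```

Under these assumptions the result is 0.2 · 0.2 · 0.9 + 0.2 · 0.8 · 0.4 = 0.1. Belief Propagation computes the same kind of marginal with local messages instead of a global enumeration.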
The BP message-passing algorithm
• Sends messages across edges
– 2 messages per edge and iteration
– if all messages from previous iteration were received
• Beliefs are updated per iteration
– algorithm terminates if beliefs stabilize
• Messages are vectors
– length corresponds to the number of node states
• Computational complexity grows exponentially with the number of states of nodes
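The message-passing scheme above can be sketched as follows. This is a minimal illustration, not the implementation behind the slides: it assumes a pairwise model (potentials on single nodes and on edges) rather than a general Bayesian network, and the function name `loopy_bp` and its parameters are hypothetical.

```python
def loopy_bp(unary, pairwise, max_iters=50, tol=1e-9):
    """Synchronous sum-product BP on a pairwise model (illustrative sketch).

    unary:    dict node -> list of potentials, one per state
    pairwise: dict (u, v) -> matrix psi[state_of_u][state_of_v]
    Returns (beliefs, number_of_iterations).
    """
    def psi(u, v, xu, xv):
        if (u, v) in pairwise:
            return pairwise[(u, v)][xu][xv]
        return pairwise[(v, u)][xv][xu]

    def normalize(vec):
        total = sum(vec)
        return [x / total for x in vec]

    neighbors = {n: [] for n in unary}
    msgs = {}
    for u, v in pairwise:
        neighbors[u].append(v)
        neighbors[v].append(u)
        # Two messages per edge; each vector has one entry per receiver state.
        msgs[(u, v)] = normalize([1.0] * len(unary[v]))
        msgs[(v, u)] = normalize([1.0] * len(unary[u]))

    def beliefs():
        result = {}
        for n in unary:
            b = list(unary[n])
            for m in neighbors[n]:
                b = [bi * mi for bi, mi in zip(b, msgs[(m, n)])]
            result[n] = normalize(b)
        return result

    old = beliefs()
    for iteration in range(1, max_iters + 1):
        new_msgs = {}
        for (u, v) in msgs:
            out = []
            for xv in range(len(unary[v])):
                total = 0.0
                for xu in range(len(unary[u])):
                    term = unary[u][xu] * psi(u, v, xu, xv)
                    # Use only messages received in the previous iteration.
                    for w in neighbors[u]:
                        if w != v:
                            term *= msgs[(w, u)][xu]
                    total += term
                out.append(total)
            new_msgs[(u, v)] = normalize(out)
        msgs.update(new_msgs)
        new = beliefs()
        delta = max(abs(a - b) for n in unary for a, b in zip(new[n], old[n]))
        old = new
        if delta < tol:  # beliefs have stabilized
            break
    return old, iteration
```

For a two-node chain with `unary = {"A": [0.2, 0.8], "B": [1.0, 1.0]}` and `pairwise = {("A", "B"): [[0.9, 0.1], [0.0, 1.0]]}`, the belief at B converges to the exact marginal [0.18, 0.82], as expected on a tree.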
The P-Grid Overlay
• Peers are organized in a binary trie structure
– one node for every common prefix
– trie is only virtual (exists only via routing tables)
– all nodes remain at the leaf-level (no hierarchy)
• Multiple peers per key space partition
• Multiple routing entries (random choice)
– per routing table level
• Logarithmic search complexity
– even for skewed data distributions
P-Grid routing
Virtual trie: the root splits into 0* and 1*; 0* splits into 00* and 01*; 1* splits into 10* and 11*.

Peer   Stores data with key prefix   Routing table
A      00                            1* : C, D    01* : B
F      00                            1* : E       01* : B
B      01                            1* : C, D    00* : F
C      10                            0* : A, B    11* : E
D      10                            0* : A, F    11* : E
E      11                            0* : B, F    10* : D

Example shown in the figure: query for ‘100’
• Keys resolved by longest prefix matching
– Ensures logarithmic search cost for skewed trees
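Prefix routing over the example routing tables can be sketched as follows. The peer paths and routing entries are taken from the slide; the function `route`, its `choose` parameter, and the assumption that a key is at least as long as every peer path are illustrative only.

```python
# Peer paths and routing tables as listed on the slide.
PEERS = {
    "A": {"path": "00", "refs": {"1": ["C", "D"], "01": ["B"]}},
    "F": {"path": "00", "refs": {"1": ["E"], "01": ["B"]}},
    "B": {"path": "01", "refs": {"1": ["C", "D"], "00": ["F"]}},
    "C": {"path": "10", "refs": {"0": ["A", "B"], "11": ["E"]}},
    "D": {"path": "10", "refs": {"0": ["A", "F"], "11": ["E"]}},
    "E": {"path": "11", "refs": {"0": ["B", "F"], "10": ["D"]}},
}

def route(start, key, choose=lambda refs: refs[0]):
    """Return the list of peers visited while resolving `key` from `start`.

    Assumes `key` is at least as long as every peer path.
    """
    hops = [start]
    current = start
    while not key.startswith(PEERS[current]["path"]):
        path = PEERS[current]["path"]
        # Find the first bit where the key leaves this peer's path.
        level = 0
        while path[level] == key[level]:
            level += 1
        # The routing entry at that level covers the key's prefix.
        current = choose(PEERS[current]["refs"][key[:level + 1]])
        hops.append(current)
    return hops
```

With the default `choose`, `route("A", "100")` visits A then C, and `route("F", "100")` visits F, E, D. Passing `random.choice` as `choose` would mimic the random selection among multiple routing entries. Each hop extends the matched prefix by at least one bit, which is what bounds the number of hops by the trie depth.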
The Distributed Inference System
• P-Grid
– Bug reports, metadata, tags, etc.
– Bayesian network
    • Variables (spread over P-Grid nodes)
    • Dependencies between variables
    • Distributed learning
• Belief Propagation
– Distributed inference
    • Message-passing algorithm
• Identified problem
– high message cost
Spring Relaxation
• Bayesian network as spring network
– find minimum energy configuration (relax springs)
– energy is proportional to the distance between P-Grid nodes
– variables at the same node require no energy
– energy optimum: all variables at one node (conflicts with load balancing)
• Decentralized algorithm
– nodes try to relax their springs
– move correlated variables close to each other
– optimally, at the same node (no physical message)
– considering load distribution
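The energy of a placement can be illustrated with a small sketch. The slides do not define the distance between P-Grid nodes, so this example assumes the simplest possible one (0 for the same peer, 1 otherwise, i.e. one physical message per separated spring); `placement_energy` and its parameters are hypothetical names.

```python
def placement_energy(edges, placement, distance=None):
    """Total spring energy of a variable placement (illustrative sketch).

    edges:     iterable of (variable, variable) dependencies
    placement: dict variable -> peer hosting it
    distance:  function (peer, peer) -> cost; assumed 0/1 by default
    """
    if distance is None:
        # Assumption: a spring costs nothing when both ends share a peer.
        distance = lambda p, q: 0 if p == q else 1
    return sum(distance(placement[u], placement[v]) for u, v in edges)
```

For example, with springs a-h, a-t and f-o and a placement that puts a and f on peer A, h on B, t on E and o on C, the energy is 3. Moving h to peer A removes one physical message and lowers the energy to 2, at the price of a higher load on A.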
Spring relations in P-Grid
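Virtual trie: the root splits into 0* and 1*; 0* splits into 00* and 01*; 1* splits into 10* and 11*.

Peer   Routing table              Spring relations (variable -> correlated variables)
A      1* : C, D    01* : B       a -> h, t    f -> o, r    …
F      1* : E       01* : B       a -> h, t    f -> o, r    …
B      1* : C, D    00* : F       h -> a, m    m -> h, u    …
C      0* : A, B    11* : E       o -> f       r -> f, t    …
D      0* : A, F    11* : E       o -> f       r -> f, t    …
E      0* : B, F    10* : D       t -> a, r    u -> m       …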
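One way to read the spring relations above is to count, for each local variable, how many of its springs lead into each routing table level. The sketch below assumes that this count is what "tension at a level" means, numbers levels from 1, and uses only the relations visible in the figure; `LOCATION`, `SPRINGS` and `tensions` are hypothetical names.

```python
from collections import Counter

# Key prefix under which each variable is stored (from the figure).
LOCATION = {"a": "00", "f": "00", "h": "01", "m": "01",
            "o": "10", "r": "10", "t": "11", "u": "11"}

# Spring relations visible in the figure.
SPRINGS = {"a": ["h", "t"], "f": ["o", "r"], "h": ["a", "m"], "m": ["h", "u"],
           "o": ["f"], "r": ["f", "t"], "t": ["a", "r"], "u": ["m"]}

def routing_level(own_path, other_path):
    """Level (1-based) at which other_path leaves own_path, or None if equal."""
    for index, (x, y) in enumerate(zip(own_path, other_path)):
        if x != y:
            return index + 1
    return None

def tensions(var):
    """Number of springs of `var` per routing level (assumed tension measure)."""
    counts = Counter()
    for other in SPRINGS[var]:
        level = routing_level(LOCATION[var], LOCATION[other])
        if level is not None:  # springs within the same partition need no message
            counts[level] += 1
    return dict(counts)
```

Under these assumptions variable f has tension only at level 1 (both o and r live under 1*), while a is pulled towards level 1 (t) and level 2 (h), so f would be a cheap candidate to move and a a multi-directional one.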
The Relaxation Algorithm (relax variables)
currentLoad = length(localVars);
overload = currentLoad - avgLoad / 2;
IF (overload <= 0)
    return;
ENDIF
unidirVars = variables having a tension only at one level;
WHILE ((overload > 0) AND (length(unidirVars) > 0))
    move variable to a peer from the level with the tension;
    removeFirst(unidirVars);
    overload = overload - 1;
ENDWHILE
…
The Relaxation Algorithm (balance load)
…
multidirVars = vars having tensions at multiple levels;
WHILE ((currentLoad > avgLoad) AND (length(multidirVars) > 0))
    FOR i = routingTable.levels TO 1
        IF (level i is underpopulated)
            cand = vars having a tension at level i;
            FOR j = 1 TO length(cand)
                IF (cand(j).tension(i) >= max(cand(j).tension))
                    move variable to a peer from level i;
                    remove(multidirVars, cand(j));
                    currentLoad = currentLoad - 1;
                    IF (currentLoad <= avgLoad)
                        break;
                    ENDIF;
                ENDIF;
            ENDFOR;
        ENDIF;
    ENDFOR;
ENDWHILE
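The two phases of the pseudocode can be combined into one runnable sketch for a single peer. It is an interpretation, not the original Matlab code: whatever the slides elide between the phases is not reproduced, the overload expression is copied as written, the decrement of `current_load` in phase 1 and the no-progress guard in phase 2 are assumptions added here, and all names (`relax_step`, `underpopulated`, `num_levels`) are hypothetical.

```python
def relax_step(local_vars, tensions, avg_load, underpopulated, num_levels):
    """One relaxation pass at a peer; returns planned (variable, level) moves.

    tensions:       dict variable -> dict level -> tension
    underpopulated: set of routing levels considered underpopulated
    """
    moves = []
    current_load = len(local_vars)
    overload = current_load - avg_load / 2  # expression as written on the slide
    if overload <= 0:
        return moves

    # Phase 1: move variables that are pulled towards exactly one level.
    unidir = [v for v in local_vars if len(tensions[v]) == 1]
    while overload > 0 and unidir:
        var = unidir.pop(0)
        moves.append((var, next(iter(tensions[var]))))
        overload -= 1
        current_load -= 1  # assumption: a moved variable no longer counts as load

    # Phase 2: shed load via variables with tensions at several levels.
    multidir = [v for v in local_vars if len(tensions[v]) > 1]
    while current_load > avg_load and multidir:
        moved = False
        for level in range(num_levels, 0, -1):
            if level not in underpopulated:
                continue
            candidates = [v for v in multidir if level in tensions[v]]
            for var in candidates:
                # Only move along the level where the variable's pull is strongest.
                if tensions[var][level] >= max(tensions[var].values()):
                    moves.append((var, level))
                    multidir.remove(var)
                    current_load -= 1
                    moved = True
                    if current_load <= avg_load:
                        break
        if not moved:
            break  # guard added for this sketch to avoid looping without progress
    return moves
```

Returning the planned moves instead of performing them keeps the sketch free of networking code; a caller would send one message per move to a peer chosen from the given routing level.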
Algorithm execution
• Executed at each node
– Iteratively
– Independently (evaluated simultaneously)
• Termination
– Max. number of iterations
– No free or multi-directional variables to move
– No tension reduction in last two iterations
• Effort
– Variable movements require only 1 message
– Trade-off against message reduction
– Dynamic variables require “remote” updates
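The three termination criteria listed above could be checked as in the following sketch. How tension is measured per iteration is not specified on the slides, so `tension_history` is assumed to hold one total-tension value per completed iteration; the function and parameter names are hypothetical.

```python
def should_terminate(iteration, max_iterations, movable_count, tension_history):
    """Return True if any of the three termination criteria holds."""
    if iteration >= max_iterations:
        return True  # maximum number of iterations reached
    if movable_count == 0:
        return True  # no free or multi-directional variables left to move
    if len(tension_history) >= 3:
        t_before, t_prev, t_last = tension_history[-3:]
        if t_prev >= t_before and t_last >= t_prev:
            return True  # neither of the last two iterations reduced tension
    return False
```

Each peer would evaluate this check locally after every iteration, which matches the independent, per-node execution described above.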
Evaluation
• Matlab implementation
• Diverse Bayesian networks
– random, binary trees, scale-free
– up to 2048 Bayesian nodes
– up to 512 P-Grid nodes
• 10 repetitions
• 2 main evaluation criteria
– message reduction
– load balance
Random network
1024 nodes, average node degree 4
Binary tree network
1023 nodes
Scale-free network
1024 nodes, average node degree 4
Message reduction (random)
[Figure: four panels (P-Grid nodes / Bayesian nodes / average node degree): 64 / 2048 / 4, 128 / 2048 / 4, 256 / 2048 / 4, 512 / 2048 / 4]
Message reduction (binary tree)
[Figure: four panels (P-Grid nodes / Bayesian nodes): 64 / 2047, 128 / 2047, 256 / 2047, 512 / 2047]
Message reduction (scale-free)
[Figure: four panels (P-Grid nodes / Bayesian nodes): 64 / 2048, 128 / 2048, 256 / 2048, 512 / 2048]
Load balancing (random)
[Figure: four panels (P-Grid nodes / Bayesian nodes / average node degree): 64 / 2048 / 4, 128 / 2048 / 4, 256 / 2048 / 4, 512 / 2048 / 4]
Load balancing (binary tree)
[Figure: four panels (P-Grid nodes / Bayesian nodes): 64 / 2047, 128 / 2047, 256 / 2047, 512 / 2047]
Load balancing (scale-free)
[Figure: four panels (P-Grid nodes / Bayesian nodes), labeled as in the source: 64 / 2048, 128 / 2048, 128 / 2048, 512 / 2048]
Number of iterations
• Until the relaxation algorithm terminates
• Scale-free network (128 nodes, 1024 vars)
• 100 runs
Reduction effort (random)
[Figure: four panels (P-Grid nodes / Bayesian nodes / average node degree): 64 / 2048 / 4, 128 / 2048 / 4, 256 / 2048 / 4, 512 / 2048 / 4]
Reduction effort (binary tree)
[Figure: four panels (P-Grid nodes / Bayesian nodes): 64 / 2047, 128 / 2047, 256 / 2047, 512 / 2047]
Reduction effort (scale-free)
[Figure: four panels (P-Grid nodes / Bayesian nodes), labeled as in the source: 64 / 2048, 128 / 2048, 128 / 2048, 512 / 2048]
Conclusions
• Decentralized relaxation algorithm
– Reduces message cost for Belief Propagation
– Considers load balance
• Several scenarios (Distributed Knowledge Base)
• First evaluation looks promising
• Intermediate steps are still missing
– Learning of Bayesian network