
Page 1

UNIVERSITY OF JYVÄSKYLÄ

Building NeuroSearch – Intelligent Evolutionary Search Algorithm For Peer-to-Peer Environment
Master’s Thesis by Joni Töyrylä, 3.9.2004

Mikko Vapa, researcher student
InBCT 3.2 Cheese Factory / P2P Communication

Agora Center

http://tisu.it.jyu.fi/cheesefactory

Page 2

Contents
• Resource Discovery Problem
• Related Work
• Peer-to-Peer Network
• Neural Networks
• Evolutionary Computing
• NeuroSearch
• Research Environment
• Research Cases
– Fitness
– Population
– Inputs
– Resources
– Queriers
– Brain Size
• Summary and Future

Page 3

Resource Discovery Problem

• In the peer-to-peer (P2P) resource discovery problem, a node decides, based on local knowledge only, which of its neighbors (if any) are the best targets for forwarding a query to find the needed resource

• A good solution locates the predetermined number of resources using a minimal number of packets
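As an illustrative sketch only (the function name and the degree-ranking heuristic below are assumptions, not the thesis’s algorithm), the local decision can be framed as a function from locally known neighbor information to a set of forwarding targets:

```python
def choose_forward_targets(neighbor_degrees, max_targets=2):
    """Hypothetical local decision: using only locally known neighbor
    degrees, pick the neighbors most likely to reach the queried
    resource. Returning an empty list means "do not forward"."""
    # Rank neighbors by degree: in a power-law network, high-degree
    # nodes are more likely to hold or reach resources.
    ranked = sorted(neighbor_degrees, key=neighbor_degrees.get, reverse=True)
    return ranked[:max_targets]

# A node knowing its neighbors' connectivity forwards to the two
# best-connected ones.
print(choose_forward_targets({"A": 7, "B": 2, "C": 5}))  # → ['A', 'C']
```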

Page 4

NeuroSearch

• The NeuroSearch resource discovery algorithm uses neural networks and evolution to adapt its behavior to a given environment:
– a neural network decides whether or not to pass the query further down each link
– evolution breeds and finds the best neural network in a large class of local search algorithms

[Diagram: an incoming Query arrives at a node, which decides per link whether to forward the query to each Neighbor Node.]
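A minimal sketch of this per-link decision, assuming a small two-layer feed-forward network with tanh hidden units and a sigmoid output thresholded at 0.5 (the layer sizes and weight values below are illustrative, not the trained network):

```python
import math

def forward_decision(weights, inputs):
    """Map per-link inputs to a yes/no forwarding decision with a
    tiny feed-forward network (illustrative sizes and weights)."""
    w1, w2 = weights  # hidden-layer rows and output-layer weights
    # Hidden layer: one tanh neuron per weight row.
    hidden = [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in w1]
    # Output layer: sigmoid squashed to (0, 1), thresholded at 0.5.
    out = 1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(w2, hidden))))
    return out > 0.5  # forward the query down this link?

# Illustrative weights and inputs (e.g. bias, hops, neighbor count).
w1 = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
w2 = [1.0, 0.5]
inputs = [1.0, 2.0, 3.0]
print(forward_decision((w1, w2), inputs))  # → True
```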

Page 5

NeuroSearch’s Inputs
• The internal structure of the NeuroSearch algorithm

• Multiple layers enable the algorithm to express non-linear behavior

• With enough neurons the algorithm can universally approximate any decision function

Page 6

NeuroSearch’s Inputs

• Bias is always 1 and provides a means for a neuron to produce non-zero output with zero inputs
• Hops is the number of links the message has travelled so far
• Neighbors (also known as currentNeighbors or MyNeighbors) is the number of neighbor nodes this node has
• Target’s neighbors (also known as toNeighbors) is the number of neighbor nodes the message’s target has
• Neighbor rank (also known as NeighborsOrder) relates the target’s neighbor count to the current node’s other neighbors
• Sent is a flag telling whether this message has already been forwarded to the target node by this node
• Received (also known as currentVisited) is a flag describing whether the current node has received this message earlier
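The seven inputs above can be collected into the network’s input vector; the function name and the ordering here are assumptions for illustration:

```python
def build_inputs(hops, my_neighbors, to_neighbors, neighbor_rank, sent, received):
    """Assemble the seven slide inputs into the vector fed to the
    neural network (ordering is an assumption)."""
    return [
        1.0,                        # Bias: constant 1
        float(hops),                # Hops: links travelled so far
        float(my_neighbors),        # Neighbors: this node's neighbor count
        float(to_neighbors),        # Target's neighbors: target's neighbor count
        float(neighbor_rank),       # Neighbor rank: target's degree vs. my other neighbors
        1.0 if sent else 0.0,       # Sent: already forwarded to the target?
        1.0 if received else 0.0,   # Received: seen this message before?
    ]

print(build_inputs(hops=2, my_neighbors=5, to_neighbors=8,
                   neighbor_rank=1, sent=False, received=True))
# → [1.0, 2.0, 5.0, 8.0, 1.0, 0.0, 1.0]
```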

Page 7

NeuroSearch’s Training Program

• The neural network weights define how the neural network behaves, so they must be adjusted to the right values

• This is done using an iterative optimization process based on evolution and Gaussian mutation

[Diagram: the training loop — define the network conditions and the quality requirements for the algorithm; create candidate algorithms randomly; select the best ones for the next generation; breed a new population; iterate thousands of generations; finally select the best algorithm for these conditions.]
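The loop above can be sketched as a simple evolutionary optimizer with truncation selection and Gaussian mutation; apart from the population size of 24 reported later in the deck, all parameters below are illustrative assumptions:

```python
import random

def evolve(fitness, dim, pop_size=24, generations=200, sigma=0.1, seed=1):
    """Sketch of the training loop: random initial population,
    truncation selection of the best half, Gaussian mutation to
    breed a new population (illustrative parameters)."""
    rng = random.Random(seed)
    # Create candidate algorithms (weight vectors) randomly.
    pop = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        # Select the best ones for the next generation...
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        # ...and breed a new population by Gaussian mutation.
        pop = parents + [
            [w + rng.gauss(0, sigma) for w in rng.choice(parents)]
            for _ in range(pop_size - len(parents))
        ]
    # Finally select the best individual for these conditions.
    return max(pop, key=fitness)

# Toy fitness: maximize -sum(w^2); the optimum is the zero vector.
best = evolve(lambda w: -sum(x * x for x in w), dim=3)
print(sum(x * x for x in best))  # a small value close to 0
```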

Page 8

Research Environment

• The peer-to-peer network being tested contained:
– 100 power-law distributed P2P nodes with 394 links and 788 resources
– Resources were distributed based on the number of connections a node has, meaning that high-connectivity nodes were more likely to answer queries
– The topology was static, so nodes were not disappearing or moving
– The querier and the queried resource were selected randomly, and 10 different queries were used in each generation (this was found to be enough to determine the overall performance of the neural network)

• Requirements for the fitness function were:
– The algorithm should locate half of the available resources for every query (each obtained resource increased fitness by 50 points)
– The algorithm should use as small a number of packets as possible (each used packet decreased fitness by 1 point)
– The algorithm should always stop (the stop limit for the number of packets was set to 300)

Page 9

Research Environment

Page 10

Research Cases – Fitness

• The fitness value determines how good a neural network is compared to others

• Even the smallest and simplest neural networks manage to reach a fitness value over 10000

• The fitness value for a poor NeuroSearch is calculated as follows:

Fitness = 50 * replies – packets = 50 * 239 – 1290 = 10660

Note: Because of a bug, the Steiner tree does not locate half of the replies and thus gets a lower fitness than HDS
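The fitness function on this slide is straightforward to express in code; the reward and cost constants come directly from the slide:

```python
def fitness(replies, packets, reply_reward=50, packet_cost=1):
    """Fitness as defined on the slide: each located resource adds
    50 points, each packet used subtracts 1 point."""
    return reply_reward * replies - packet_cost * packets

# The slide's worked example for a poor NeuroSearch run:
print(fitness(replies=239, packets=1290))  # → 10660
```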

Page 11

Research Cases – Random Weights
• 10 million new neural networks were randomly generated
• It seems that fitness values over 16000 cannot be obtained purely by guessing, and therefore an optimization method is needed

Page 12

Research Cases – Inputs

• Different inputs were tested individually and together to get a feeling for which inputs are important

Using Hops we can, for example, design rules: ”I have travelled 4 hops, I will not send further”

Page 13

”Target node contains 10 neighbors, I will send further”

”Target node contains the largest number of neighbors compared to all my neighbors, I will not send further”

Page 14

”I have received this query earlier, I will not send further”

”I have 7 neighbors, I will send further”

Page 15

The results indicate that using only one piece of topological information is more efficient than combining it with other topological information (the explanation for this behavior is still unclear)

Page 16

The results also indicate that using only one piece of query-related information is more efficient than combining it with other query-related information (the explanation for this behavior is likewise unclear)

Page 17

Research Cases – Resources
• The needed percentage of resources was varied and the results compared to other local search algorithms (Highest Degree Search and Breadth-First Search) and to near-optimal search trees (Steiner)

Note: The Breadth-First Search curve needs to be halved, because its percentage was calculated relative to half of the resources and not all available resources

Page 18

Research Cases – Queriers

• The effect of lowering the number of queriers per generation used to calculate the fitness value of a neural network was examined

• It was found that the number of queriers can be dropped from 50 to 10 while still getting reliable fitness values, which speeds up the optimization process significantly

Page 19

Research Cases – Brain Size

• The number of neurons on the first and second layers was varied
• It was found that many different kinds of NeuroSearch algorithms exist

Page 20

Research Cases – Brain Size

• Optimization of larger neural networks also takes more time

Page 21

Research Cases – Brain Size

• There also exists an interesting breadth-first search vs. depth-first search dilemma, where:
– smaller networks obtain the best fitness values with a breadth-first search strategy,
– medium-sized networks obtain the best fitness values with a depth-first search strategy, and
– large networks obtain the best fitness values with a breadth-first search strategy
• Overall, it seems that the best fitness, 18091.0, can be obtained with a breadth-first strategy using 5 hops and a neuron size of 25:10 (25 neurons on the first hidden layer and 10 on the second hidden layer)

Page 22

25:10 had the greatest fitness value

Would more generations than 100,000 increase the fitness when the 1st hidden layer contains more than 25 neurons?

20:10 had the greatest average hops value

What happens if the number of neurons on the 2nd hidden layer is increased? Will the average number of hops decrease?

Page 23

Summary and Future

• The main findings of the thesis were that:
– A population size of 24 and a query amount of 10 are sufficient
– An optimization algorithm needs to be used, because randomly guessing neural network weights does not give good results
– Individual inputs give better results than combinations of two inputs (however, the best fitnesses can be obtained by using all 7 inputs)
– By choosing a specific set of inputs, NeuroSearch may imitate any existing search algorithm, or it may behave as a combination of any of those
– The optimal algorithm (Steiner) has an efficiency of 99%, whereas the best known local search algorithm (HDS) achieves 33% and NeuroSearch 25%
– The breadth-first search vs. depth-first search dilemma exists, but no good explanation can be given yet

Page 24

Summary and Future

• In addition to the problems shown thus far, the following future work on NeuroSearch is suggested:
– More inputs should be designed so that they provide useful information, e.g., the number of received replies, the inputs used by the Highest Degree Search algorithm, and inputs that define how many forwarding decisions have already been made in the current decision round and how many are still left
– A probability-based output instead of a threshold function could also be tested
– The neural network architecture and the size of the population could be dynamically adjusted during evolution to find an optimal structure more easily