
UNIVERSITY OF JYVÄSKYLÄ

Yevgeniy Ivanchenko, University of Jyväskylä

2004


OBJECTIVES (I)

• Since nothing is known about the decision mechanism of NeuroSearch, we need to look inside the algorithm to understand its behavior.

• Since nothing is known about the behavior of the NeuroSearch algorithm in a dynamic environment, we need to study its behavior under conditions that approximate real-life situations.



OBJECTIVES (II)

• To understand the behavior of NeuroSearch, data analysis techniques were used. The Self-Organizing Map (SOM) is a well-known tool for data mining tasks.

• A set of rules was obtained from the analysis of NeuroSearch. The rules were tested in a static environment. The question that arises here is: can an algorithm that exploits properties of the static environment be used in a dynamic scenario?



OBJECTIVES (III)

• If we know the inner structure of NeuroSearch's decision mechanism, we can tell how each input contributes to a particular decision of the algorithm. This can be used, for example, to remove unnecessary input information.

• This can also help evaluate the complexity and robustness of the algorithm.



SOM (I)

• SOM is a neural network model that maps a high-dimensional space onto a low-dimensional space (usually two-dimensional).

• After applying the SOM algorithm, similar vectors from the input space are located near each other in the output space. This helps investigate the properties of the obtained clusters and, consequently, the causes that produced these clusters on the output map.



SOM (II)

[Figure: SOM grid of neurons with neighborhoods of size R1 and R2]

• A SOM is usually organized as either a hexagonal or a rectangular grid of neurons. In the figure, R1 and R2 denote different neighborhood sizes.

• During training, the size of the neighborhood is gradually decreased to provide a more accurate adjustment of the neurons' weights.
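The grid layouts and the shrinking neighborhood can be sketched in plain Python. This is a minimal illustration, assuming Euclidean distance between grid positions and a linear decay of the radius; the slides do not specify the decay schedule, and the function names (`grid_positions`, `neighborhood_radius`, `neighbors_within`) are hypothetical.

```python
import math

def grid_positions(rows, cols, hexagonal=False):
    """Return (x, y) grid coordinates of SOM neurons in row-major order."""
    positions = []
    for r in range(rows):
        for c in range(cols):
            # On a hexagonal grid, odd rows are shifted by half a unit and rows are
            # sqrt(3)/2 apart, so each inner neuron has 6 equidistant neighbors.
            x = c + (0.5 if hexagonal and r % 2 == 1 else 0.0)
            y = r * (math.sqrt(3) / 2 if hexagonal else 1.0)
            positions.append((x, y))
    return positions

def neighborhood_radius(t, t_max, r_start, r_end=1.0):
    """Assumed linear schedule: shrink the radius from r_start to r_end over t_max steps."""
    frac = min(t / t_max, 1.0)
    return r_start + (r_end - r_start) * frac

def neighbors_within(positions, center, radius):
    """Indices of neurons whose grid distance to neuron `center` is <= radius."""
    cx, cy = positions[center]
    return [i for i, (x, y) in enumerate(positions)
            if math.hypot(x - cx, y - cy) <= radius + 1e-9]  # tolerance for float error
```

On a 5×5 rectangular grid, the central neuron (index 12) has 13 neurons within radius 2 (itself included) but only 5 within radius 1; on a hexagonal grid, radius 1 covers the neuron and its 6 neighbors.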



SOM (III)

[Figure: neurons around the BMU moving toward the input vector]

• In the figure, one can see that the neurons 'covered' by the neighborhood kernel function move closer to the input vector.

• The Best Matching Unit (BMU) is the neuron closest to the current input vector.

• The weights of the neurons are updated according to the kernel function and their distance to the BMU.
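The BMU search and the kernel-weighted update can be sketched as one online training step. This is a minimal sketch, assuming a Gaussian neighborhood kernel h = exp(-d²/(2r²)) over the grid distance d and a fixed learning rate; the slides only say that a kernel function is used, and the names `find_bmu` and `som_update` are hypothetical.

```python
import math

def find_bmu(weights, x):
    """Index of the Best Matching Unit: the neuron whose weight vector is closest to x."""
    return min(range(len(weights)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))

def som_update(weights, positions, x, learning_rate, radius):
    """One online SOM step with an (assumed) Gaussian neighborhood kernel.

    Every neuron moves toward x by learning_rate * h, where h = 1 at the BMU
    and decays with the neuron's grid distance to the BMU. Updates weights in place.
    """
    bmu = find_bmu(weights, x)
    bx, by = positions[bmu]
    for i, (px, py) in enumerate(positions):
        d2 = (px - bx) ** 2 + (py - by) ** 2
        h = math.exp(-d2 / (2 * radius ** 2))
        weights[i] = [w + learning_rate * h * (v - w) for w, v in zip(weights[i], x)]
    return bmu
```

In a full training run, this step would be repeated over all input vectors while both `learning_rate` and `radius` are decreased, in line with the shrinking neighborhood described on the SOM (II) slide.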



DATA ANALYSIS (I)

• NeuroSearch can be considered the main part of the information model of the system. To build this model, a black-box method was used: we model the external behavior of the system without knowing the causes of its particular behavior.

• To investigate the decision mechanism of NeuroSearch, input-output pairs were analyzed using SOM.



DATA ANALYSIS (II)

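• To perform the analysis, we used component planes and the U-matrix with the 'hit' distribution on it. A component plane visualizes the values of one vector component over the output map; together, the component planes cover all components. The U-matrix is one of the possible ways to visualize the output map: it shows the distances between the weight vectors of neighboring neurons, so large values mark cluster borders. The 'hits' on the U-matrix correspond to the decisions of NeuroSearch.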

• This approach allows us to investigate not only the contribution of each component to a particular decision but also the correlations between components.
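The U-matrix and the hit distribution can be sketched for a rectangular map. This is a simplified variant, assuming one U-value per neuron equal to the mean distance to its 4-connected grid neighbors (full U-matrix implementations also insert extra cells between neurons); `u_matrix` and `hits_by_decision` are hypothetical names, and the decision labels stand in for NeuroSearch outputs.

```python
import math
from collections import Counter

def _dist(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def u_matrix(weights, rows, cols):
    """Simplified U-matrix for a rectangular SOM grid (weights in row-major order).

    Each neuron gets the mean distance between its weight vector and those of
    its 4-connected grid neighbors; high values mark cluster borders.
    """
    umat = []
    for r in range(rows):
        row = []
        for c in range(cols):
            i = r * cols + c
            nbrs = [(r + dr) * cols + (c + dc)
                    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if 0 <= r + dr < rows and 0 <= c + dc < cols]
            total = sum(_dist(weights[i], weights[j]) for j in nbrs)
            row.append(total / max(len(nbrs), 1))  # guard for a 1x1 map
        umat.append(row)
    return umat

def hits_by_decision(weights, samples, decisions):
    """For each neuron, count how many samples of each decision have it as their BMU."""
    hits = {}
    for x, d in zip(samples, decisions):
        bmu = min(range(len(weights)),
                  key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))
        hits.setdefault(bmu, Counter())[d] += 1
    return hits
```

Overlaying the per-neuron decision counts on the U-values shows which clusters of the map correspond to which decision.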



DATA ANALYSIS (III)

[Figure: U-matrix with hits (left) and fragment of the component planes From and toUnsearchedNeighbors (right)]

• The figure shows the U-matrix (left side) and a fragment of the component planes (right side).

• It is easy to see that the variable From is responsible for stopping further forwarding of the queries: where From is 1, the queries are not forwarded.

• Other variables take different values in the area where From is 1; for example, toUnsearchedNeighbors varies across this area.


DATA ANALYSIS (IV)

• After the analysis, it was found that 4 variables (From, toVisited, Sent and currentVisited) are responsible for stopping further forwarding of the queries.

• Variables toUnsearchedNeighbors and Neighbors are correlated.

• Variables packetsNow and Hops are highly correlated.

• Variables fromNeighborAmount, packetsNow and Hops are somewhat correlated.

• NeuroSearch mostly does not forward the queries if Neighbors or toUnsearchedNeighbors is small.
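Correlated variables show up as similar-looking component planes. The slides judge this visually; one way to quantify it, assumed here rather than taken from the study, is the Pearson correlation between two component planes of the trained map. `component_plane` and `pearson` are hypothetical helpers, and each neuron's weights are assumed to be stored as one list of component values.

```python
import math

def component_plane(weights, k):
    """Component plane k: the k-th weight of every neuron, in map order."""
    return [w[k] for w in weights]

def pearson(a, b):
    """Pearson correlation of two equally long sequences (neither may be constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)
```

A value close to +1 or -1 for, say, the packetsNow and Hops planes would support the visual observation that these variables are highly correlated.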



DATA ANALYSIS (V)

• Further investigation of the algorithm is based on Hops, because only this variable reflects the state of the algorithm at a particular time. In other words, by analyzing intervals of this variable we can follow the queries along their path.

• The maximum length of a query's path is 7, so there are 7 different cases to analyze.

• The data for each case contain only samples with the value of Hops currently under investigation. All samples in which at least one of the variables From, Sent, currentVisited or toVisited equals 1 were also removed, because the behavior of the algorithm in these areas is already known.
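The per-case data preparation can be sketched as a filter over the recorded samples. This is a minimal sketch, assuming each sample is a dictionary keyed by the variable names used in the slides; `split_by_hops` is a hypothetical name.

```python
from collections import defaultdict

# Variables whose value 1 already explains the decision (forwarding stops).
STOP_FLAGS = ("From", "Sent", "currentVisited", "toVisited")

def split_by_hops(samples):
    """Group samples into one case per Hops value, dropping already-explained samples.

    Each sample is assumed to be a dict keyed by the NeuroSearch input names.
    """
    cases = defaultdict(list)
    for s in samples:
        if any(s[flag] == 1 for flag in STOP_FLAGS):
            continue  # behavior in this area is already known from the analysis
        cases[s["Hops"]].append(s)
    return dict(cases)
```

With a maximum path length of 7, at most seven keys are expected, one per case.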



DATA ANALYSIS (VI)

• After investigating the algorithm for the different values of Hops, we produced a Rule Based Algorithm (RBA). RBA is based on rules extracted from the analysis of the U-matrix and the corresponding component planes.

• The general strategy of the algorithm is quite simple: a decision is mostly based on the interplay between the Hops, Neighbors/toUnsearchedNeighbors and NeighborsOrder values. At the beginning, the algorithm sends the queries to the most connected nodes. As the number of hops of the query increases, NeuroSearch gradually starts to forward the queries to less connected nodes.
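The general strategy can be illustrated with a rule-based decision function. This is a hypothetical sketch, not the RBA from the study: the thresholds `min_neighbors` and `base_rank`, the linear widening of accepted ranks with Hops, and the reading of NeighborsOrder as a connectivity rank (0 = most connected) are all assumptions; only the variable names and the qualitative rules come from the slides.

```python
# Flags that, per the analysis, stop forwarding when any of them equals 1.
STOP_FLAGS = ("From", "toVisited", "Sent", "currentVisited")

def rba_forward(s, min_neighbors=2, base_rank=1):
    """Hypothetical RBA-style decision: forward the query to this neighbor or not.

    `min_neighbors` and `base_rank` are illustrative placeholders, not the
    extracted thresholds. NeighborsOrder is assumed to be the neighbor's rank
    by connectivity (0 = most connected).
    """
    # Rule 1: any stop flag set -> do not forward.
    if any(s[flag] == 1 for flag in STOP_FLAGS):
        return False
    # Rule 2: mostly no forwarding when Neighbors or toUnsearchedNeighbors is small.
    if min(s["Neighbors"], s["toUnsearchedNeighbors"]) < min_neighbors:
        return False
    # Rule 3: early on only the best-connected neighbors qualify; each extra hop
    # admits one more rank, so less connected nodes are reached later.
    return s["NeighborsOrder"] < base_rank + s["Hops"]
```

Keeping each rule as a separate, ordered check mirrors how the rules were read off the maps: the stopping variables first, then the neighborhood size, then the Hops-dependent preference for well-connected nodes.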



DATA ANALYSIS (VII)

Algorithm      Packets   Replies
BFS-2          3000      619
BFS-3          12464     1325
NeuroSearch    4703      979
RBA            4904      963

The table shows the efficiency of four algorithms. NeuroSearch and RBA have almost the same level of performance. This means that RBA reproduces the behavior of NeuroSearch, and we can say that SOM is well suited for analyzing NeuroSearch. Both algorithms perform better than BFS-2 and BFS-3: they find more replies than BFS-2 and use far fewer packets than BFS-3.

Comparison between algorithms
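One way to summarize the comparison is the number of replies obtained per packet sent. This is an illustrative metric chosen here, not one used in the slides; the numbers are copied from the comparison table, and `replies_per_packet` is a hypothetical helper.

```python
# algorithm: (packets, replies), copied from the comparison table
RESULTS = {
    "BFS-2": (3000, 619),
    "BFS-3": (12464, 1325),
    "NeuroSearch": (4703, 979),
    "RBA": (4904, 963),
}

def replies_per_packet(results):
    """Replies obtained per packet sent, for each algorithm."""
    return {name: replies / packets for name, (packets, replies) in results.items()}

for name, ratio in replies_per_packet(RESULTS).items():
    print(f"{name:12s} {ratio:.3f}")
```

By this single ratio, NeuroSearch (about 0.208) and BFS-2 (about 0.206) are nearly equal, RBA is slightly lower (about 0.196) and BFS-3 is far lower (about 0.106). The advantage of NeuroSearch and RBA therefore lies in combining more replies than BFS-2 with far fewer packets than BFS-3.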



DYNAMIC ENVIRONMENT (I)

• Since RBA is based on the decision mechanism of NeuroSearch, it is possible to evaluate the behavior of NeuroSearch in a dynamic environment using RBA.

• A P2P extension for NS-2 was built as the simulation environment.

• The environment introduces fairly strong dynamic changes. Two different classes of probabilities define the dynamic changes in the network. The first class is defined randomly before the simulation starts. The second is defined by the formulas:

[Formulas: probability of leaving and probability of joining, each defined in terms of node connectivity]



DYNAMIC ENVIRONMENT (II)

To make a qualitative evaluation of performance, RBA was compared to BFS2 and BFS3 in static and dynamic environments. The number of replies and the number of packets used in the static environment are shown in the figures:



DYNAMIC ENVIRONMENT (III)

• Analyzing the behavior of the algorithms in the static environment, one can see that RBA mostly locates more resources than BFS2 and significantly fewer than BFS3.

• In general, RBA uses more packets than BFS2 and significantly fewer than BFS3.

• This result is acceptable, because RBA is based on NeuroSearch's decision mechanism, which is trained to locate only half of the available resources.

• At some points, RBA locates more resources than the BFS3 algorithm while using fewer packets. This means that if a resource is not common in the network, RBA, and consequently NeuroSearch, can still find enough instances of it.



DYNAMIC ENVIRONMENT (IV)

The number of replies and the number of packets used in the dynamic environment are shown in the figures:

The figures show that the performance of the algorithms did not suffer much in the dynamic environment.



DYNAMIC ENVIRONMENT (V)

Algorithm   Packets (static)   Packets (dynamic)   Replies (static)   Replies (dynamic)
BFS2        3000               2515                619                528
BFS3        12464              10040               1325               1245
RBA         4904               4865                963                900

The total number of located resources and used packets in the static and dynamic environments are shown in the table:

The algorithms can still find enough resources in the dynamic environment. There are two possible explanations for why all investigated algorithms found slightly fewer resources:
1) Some offline nodes could contain the queried resources.
2) Some offline nodes could lie on possible paths of the query.



DYNAMIC ENVIRONMENT (VI)

• The algorithms used fewer packets in the dynamic environment than in the static environment.

• The BFS strategy is very sensitive to the size of the network: BFS-based algorithms used significantly fewer packets in the dynamic environment, where the network was smaller throughout the simulation.

• RBA used approximately the same number of packets in both environments. Therefore, we can say that RBA is not strongly sensitive to the size of the network.
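The sensitivity argument can be checked against the table by computing the relative change from the static to the dynamic environment. This is a small illustrative calculation that uses only the numbers in the table; `pct_change` is a hypothetical helper.

```python
# algorithm: (packets static, packets dynamic, replies static, replies dynamic)
TABLE = {
    "BFS2": (3000, 2515, 619, 528),
    "BFS3": (12464, 10040, 1325, 1245),
    "RBA": (4904, 4865, 963, 900),
}

def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before

for name, (ps, pd, rs, rd) in TABLE.items():
    print(f"{name}: packets {pct_change(ps, pd):+.1f}%, replies {pct_change(rs, rd):+.1f}%")
```

This gives roughly -16.2% (BFS2), -19.4% (BFS3) and -0.8% (RBA) for packets, and -14.7%, -6.0% and -6.5% for replies, which is consistent with BFS-based packet counts shrinking with the network while RBA's stay nearly constant.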



FUTURE WORK

• Developing a supervised approach to train NeuroSearch.

• Developing a modification of the algorithm for ad hoc wireless P2P networks.

• Paying more detailed attention to the inner structure of the algorithm, using knowledge discovery methods.

• Investigating and utilizing properties of other P2P algorithms to determine whether these properties should be added to NeuroSearch.



Thank you!
