Testing neural networks using the complete round robin method


Transcript of Testing neural networks using the complete round robin method

Page 1: Testing neural networks using the complete round robin method

Testing neural networks using the complete round robin method

Boris Kovalerchuk, Clayton Todd, Dan Henderson

Dept. of Computer Science, Central Washington University, Ellensburg, WA, 98926-7520

[email protected] [email protected] [email protected]

Page 2: Testing neural networks using the complete round robin method

Problem

Common practice: Patterns discovered by learning methods such as neural networks (NN) are tested on the data before they are used for their intended purpose.

Reliability of testing: The conclusion about the reliability of discovered patterns depends on the testing methods used.

Problem: A testing method may not test critical situations.

Page 3: Testing neural networks using the complete round robin method

Result: New testing method

Tests a much wider set of situations than the traditional round robin method

Speeds up the required computations

Page 4: Testing neural networks using the complete round robin method

Novelty and Implementation

Novelty: A mathematical mechanism based on the theory of monotone Boolean functions and multithreaded parallel processing.

Implementation: The method has been implemented for backpropagation neural networks and successfully tested on 1024 neural networks using SP500 data.

Page 5: Testing neural networks using the complete round robin method

1. Approach and Method

Common approach:

1. Select subsets of the data set D: a subset Tr for training and a subset Tv for validating the discovered patterns.

2. Repeat step 1 several times for different subsets.

3. Compare: if the results are similar to each other, then a discovered regularity can be called reliable for data D.

Page 6: Testing neural networks using the complete round robin method

Known Methods [Dietterich, 1997]

Random selection of subsets: bootstrap aggregation (bagging)

Selection of disjoint subsets: cross-validated committees

Selection of subsets according to a probability distribution: boosting

Page 7: Testing neural networks using the complete round robin method

Problems of sub-sampling

Different regularities are found for different sub-samples of Tr.

Rejecting and accepting regularities depends heavily on the specific splitting of Tr.

Example: non-stationary financial time series with bear and bull market trends over different time intervals.

Page 8: Testing neural networks using the complete round robin method

Splitting sensitive regularities for non-stationary data

[Diagram] The training sample Tr is split into parts A and B (so A' = B and B' = A), with C as independent data. Regularity 1 (bear market) is found on A; regularity 2 (bull market) is found on B. Reg1 does not work on A' = B, and Reg2 does not work on B' = A.

Page 9: Testing neural networks using the complete round robin method

Round robin method

Eliminates arbitrary splitting by examining several groups of subsets of Tr.

The complete round robin method examines all groups of subsets of Tr.

Drawback: there are 2^n possible subsets, where n is the number of groups of objects in the data set. Learning 2^n neural networks is a computational challenge.
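To make the scale concrete, the following minimal C++ sketch (ours, not the paper's code) enumerates the 2^n training subsets as bitmasks; names such as n_groups are illustrative.

```cpp
// Minimal sketch (not from the paper): enumerating the 2^n training
// subsets of n data groups as bitmasks. Bit k of `mask` marks whether
// group k is included in the training set.
#include <cstdint>
#include <iostream>

int main() {
    const int n_groups = 10;                        // e.g., ten years of SP500 data
    const std::uint32_t n_subsets = 1u << n_groups; // 2^10 = 1024 subsets

    for (std::uint32_t mask = 0; mask < n_subsets; ++mask) {
        // Train and test one neural network on the groups selected by `mask`.
    }
    std::cout << n_subsets << " subsets enumerated\n";
    return 0;
}
```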

Page 10: Testing neural networks using the complete round robin method

Implementation

The complete round robin method is applicable to both attribute-based and relational data mining methods.

The method is illustrated for neural networks.

Page 11: Testing neural networks using the complete round robin method

Data

Let M be a learning method and D be a data set of N objects represented by m attributes: D = {di}, i = 1,…,N, where di = (di1, di2,…, dim). Method M is applied to data D for knowledge discovery.

Page 12: Testing neural networks using the complete round robin method

Grouping data

Example: a stock time series. The first 250 data objects (days) belong to 1980, the next 250 objects (days) belong to 1981, and so on.

Similarly, half-years, quarters, and other time intervals can be used.

Any of these subsets can be used as training data. A minimal grouping sketch is given below.
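The grouping step itself is simple; this illustrative C++ fragment (all names ours) follows the slide's assumption of 250 trading days per year.

```cpp
// Sketch (names illustrative, not from the paper): grouping a daily time
// series into year-long groups of ~250 trading days each.
#include <algorithm>
#include <vector>

std::vector<std::vector<double>> group_by_year(const std::vector<double>& days) {
    const std::size_t days_per_year = 250;          // one trading year
    std::vector<std::vector<double>> groups;
    for (std::size_t start = 0; start < days.size(); start += days_per_year) {
        const std::size_t end = std::min(start + days_per_year, days.size());
        groups.emplace_back(days.begin() + start, days.begin() + end);
    }
    return groups;                                  // group 0 = 1980, group 1 = 1981, ...
}
```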

Page 13: Testing neural networks using the complete round robin method

Testing data

Example 1 (general data): Training -- the odd groups #1, #3, #5, #7, and #9; Testing -- #2, #4, #6, #8, and #10.

Example 2 (time series): Training -- #1, #2, #3, #4, and #5; Testing -- #6, #7, #8, #9, and #10.

A third, completely independent set of later data C is also used for testing.

Page 14: Testing neural networks using the complete round robin method

Hypothesis of monotonicity

Let D1 and D2 be training data sets, and Perform1 and Perform2 the performance indicators of models learned from D1 and D2.

Perform is a binary index: 1 stands for appropriate performance and 0 for inappropriate performance.

The hypothesis of monotonicity (HM): If D1 ⊇ D2 then Perform1 ≥ Perform2. (1)

Page 15: Testing neural networks using the complete round robin method

Assumption

If data set D1 covers data set D2, then the performance of method M on D1 should be better than or equal to the performance of M on D2.

Extra data bring more useful information than noise for knowledge discovery.

Page 16: Testing neural networks using the complete round robin method

Experimental testing of the hypothesis

The hypothesis P(M,Di) ≥ P(M,Dj) for performance is not always true.

1024 subsets of the ten years were generated, and a neural network was trained for each.

683 pairs of subsets with Di ⊇ Dj were found. Surprisingly, in this experiment monotonicity was observed for all of these 683 combinations of years.

Page 17: Testing neural networks using the complete round robin method

Discussion

The experiment provides strong evidence for using monotonicity along with the complete round robin method.

The incomplete round robin method assumes some kind of independence of the training subsets used.

The discovered monotonicity shows that such independence should at least be tested.

Page 18: Testing neural networks using the complete round robin method

Formal notation

T(di) is the actual target value for di, and J(di) is the target value forecast delivered by the discovered model J, i.e., the trained neural network in our case.

The error Er for data set D = {di}, i = 1,…,N, is the normalized error over all its components di:

$$\mathrm{Er} = \frac{\sum_{i=1}^{N}\bigl(T(d_i) - J(d_i)\bigr)^2}{\sum_{i=1}^{N} T(d_i)^2}$$

Page 19: Testing neural networks using the complete round robin method

Performance

Performance is measured against an error tolerance (threshold) Q0 for the error Er:

$$\mathrm{Perform} = \begin{cases} 1, & \text{if } \mathrm{Er} \le Q_0 \\ 0, & \text{if } \mathrm{Er} > Q_0 \end{cases}$$
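Taken together, the two definitions amount to a few lines of code. The sketch below is ours, assumes the reconstructed denominator (sum of squared targets), and uses illustrative function names.

```cpp
// Minimal sketch of the two preceding definitions (names are ours).
#include <vector>

// Er = sum_i (T(d_i) - J(d_i))^2 / sum_i T(d_i)^2
double normalized_error(const std::vector<double>& target,     // T(d_i)
                        const std::vector<double>& forecast) { // J(d_i)
    double num = 0.0, den = 0.0;
    for (std::size_t i = 0; i < target.size(); ++i) {
        const double diff = target[i] - forecast[i];
        num += diff * diff;
        den += target[i] * target[i];
    }
    return num / den;
}

// Perform = 1 if Er <= Q0, else 0
int perform(double er, double q0) { return er <= q0 ? 1 : 0; }
```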

Page 20: Testing neural networks using the complete round robin method

Binary hypothesis of monotonicity

Combinations of years are coded as binary vectors vi = (vi1, vi2, …, vi10), such as 0000011111, with 10 components ranging from (0000000000) to (1111111111): in total 2^10 = 1024 data subsets.

Hypothesis of monotonicity: If vi ≥ vj then Performi ≥ Performj. (2)

Here vi ≥ vj ⟺ vik ≥ vjk for all k = 1,…,10.
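With year subsets packed into bitmasks, the componentwise ≥ relation of (2) has a one-line test; this sketch and its names are ours.

```cpp
// Sketch of the componentwise >= relation (2) on 10-bit vectors coded as
// bitmasks: vi >= vj exactly when every year in vj is also in vi.
#include <cstdint>

bool dominates(std::uint32_t vi, std::uint32_t vj) {
    return (vi & vj) == vj;            // vj's bits are a subset of vi's bits
}

// Example: dominates(0b0000011111, 0b0000001111) is true, while
// 0b0000011111 and 0b1111100000 are incomparable in either direction.
```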

Page 21: Testing neural networks using the complete round robin method

Monotone Boolean Functions

Not every vi and vj are comparable with each other by the ≥ relation.

Perform serves as a quality indicator Q: Q(M, D, Q0) = 1 ⟺ Perform = 1, (3) where Q0 is some performance limit.

Monotonicity (2) rewritten: If vi ≥ vj then Q(M, Di, Q0) ≥ Q(M, Dj, Q0). (4)

Q(M, D, Q0) is a monotone Boolean function of D [Hansel, 1966; Kovalerchuk et al., 1996].

Page 22: Testing neural networks using the complete round robin method

Use of Monotonicity

Given a method M and data sets D1 and D2 with D2 ⊆ D1, monotonicity means: if method M does not perform well on the data D1, then it will not perform well on the smaller data D2 either.

Under this assumption, we do not need to test method M on D2. A sketch of this pruning is given below.
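A hedged sketch of the pruning: a Perform value of 0 propagates to all subsets and a value of 1 to all supersets, so those masks never need training. The cache `known` and the helper `propagate` are our illustrative names, not from the paper.

```cpp
// Sketch: propagate a decided Perform value through the subset lattice so
// that implied subsets/supersets are never trained again.
#include <cstdint>
#include <map>

std::map<std::uint32_t, int> known;   // subset mask -> already-decided Perform

void propagate(std::uint32_t mask, int perform, int n_groups) {
    for (std::uint32_t m = 0; m < (1u << n_groups); ++m) {
        if (perform == 0 && (mask & m) == m)    known.emplace(m, 0); // m ⊆ mask
        if (perform == 1 && (m & mask) == mask) known.emplace(m, 1); // m ⊇ mask
    }
}
```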

Page 23: Testing neural networks using the complete round robin method

Experiment

On SP500 data, monotonicity allowed running method M about 250 times instead of the complete 1024 times (see the tables below).

Page 24: Testing neural networks using the complete round robin method

Experiments with SP500 and Neural Networks

A backpropagation neural network for predicting SP500 [Rao, Rao, 1993] achieved a 0.6% prediction error on the SP500 test data (50 weeks) with 200 weeks (about four years) of training data.

Is this result reliable? Will it be sustained for wider training and testing data?

Page 25: Testing neural networks using the complete round robin method

Data for testing reliability

The same data were used:

Training data -- all trading weeks from 1980 to 1989, and

Independent testing data -- all trading weeks from 1990 to 1992.

Page 26: Testing neural networks using the complete round robin method

Complete round robin

Generated all 1024 subsets of the ten training years (1980-1989).

Computed the 1024 corresponding backpropagation neural networks and their Perform values.

Used a 3% error tolerance threshold (Q0 = 0.03), higher than the 0.6% used for the smaller data set in [Rao, Rao, 1993].

Page 27: Testing neural networks using the complete round robin method

Performance

Table 1. Performance of 1024 neural networks

Training  Testing  Number of neural networks  % of neural networks
0         0        289                        28.25
0         1        24                         2.35
1         0        24                         2.35
1         1        686                        67.05

A satisfactory performance is coded as 1 and non-satisfactory performance is coded as 0.

Page 28: Testing neural networks using the complete round robin method

Analysis of performance

Consistent performance (training and testing): 67.05% + 28.25% = 95.3% with 3% error tolerance.

Sound performance on training data => sound performance on testing data (67.05% of networks).

Unsound performance on training data => unsound performance on testing data (28.25% of networks).

A random choice of training data from the ten-year SP500 data will not produce a regularity in 32.95% of cases, although regularities useful for forecasting do exist.

Page 29: Testing neural networks using the complete round robin method

Specific Analysis

Table 2 shows 9 nested data subsets of the possible 1024 subsets.

The nested sequence begins with a single year (1986). This single year's data set is too small to train a neural network to a 3% error tolerance.

Page 30: Testing neural networks using the complete round robin method

Specific analysis

Table 2. Backpropagation neural network performance for different data subsets (9 nested data subsets of the possible 1024 subsets).

Binary code     Training years                       Performance
(years 80-89)                                        Training  Testing 90-92
0000001000      86                                   0         0
0000001001      86, 89                               0         0
0000001011      86, 88, 89                           0         0
0000011011      85, 86, 88, 89                       1         1
0000111011      84, 85, 86, 88, 89                   1         1
0001111011      83, 84, 85, 86, 88, 89               1         1
0011111011      82, 83, 84, 85, 86, 88, 89           1         1
0111111011      81, 82, 83, 84, 85, 86, 88, 89       1         1
1111111011      80, 81, 82, 83, 84, 85, 86, 88, 89   1         1

Page 31: Testing neural networks using the complete round robin method

Nested sets

A single year (1986) is too small to train a neural network to a 3% error tolerance.

The years 1986, 1988, and 1989 together produce the same negative result.

With 1986, 1988, 1989, and 1985, the error moves below the 3% threshold.

Five or more years of data also satisfy the error criterion. The monotonicity hypothesis is confirmed.

Page 32: Testing neural networks using the complete round robin method

Analysis

Among all other 1023 combinations of years:

Only a few combinations of four years satisfy the 3% error tolerance.

Practically all five-year combinations satisfy the 3% threshold.

All combinations of more than five years satisfy this threshold.

Page 33: Testing neural networks using the complete round robin method

Analysis

Four-year training data sets produce marginally reliable forecasts.

Example: 1980, 1981, 1986, and 1987, corresponding to the binary vector (1100001100), do not satisfy the 3% error tolerance.

Page 34: Testing neural networks using the complete round robin method

Further analyses

Three levels of error tolerance Q0 were considered: 3.5%, 3.0%, and 2.0%.

The number of neural networks with sound performance goes down from 81.82% to 11.24% as the error tolerance moves from 3.5% to 2%.

Networks with a 3.5% error tolerance are much more reliable than networks with a 2.0% error tolerance.

Page 35: Testing neural networks using the complete round robin method

Three levels of error tolerance

Table 3. Overall performance of 1024 neural networks with different error tolerance

Error tolerance  Training  Testing  Number of neural networks  % of neural networks
3.5%             0         0        167                        16.32
                 0         1        3                          0.293
                 1         0        6                          0.586
                 1         1        837                        81.81
3.0%             0         0        289                        28.25
                 0         1        24                         2.35
                 1         0        24                         2.35
                 1         1        686                        67.05
2.0%             0         0        845                        82.60
                 0         1        22                         2.15
                 1         0        31                         3.03
                 1         1        115                        11.24

Page 36: Testing neural networks using the complete round robin method

Performance

[Bar chart: % of neural networks for each performance pair <training, testing> -- <0,0>, <1,1>, <0,1>, <1,0> -- at error tolerances 3.5%, 3.0%, and 2.0%.]

Page 37: Testing neural networks using the complete round robin method

Standard random choice method

A random choice of training data at the 2% error tolerance will more often reject the training data as insufficient.

This standard approach does not even let us know how unreliable the result is without running all 1024 subsets in the complete round robin method.

Page 38: Testing neural networks using the complete round robin method

Reliability of the 0.6% error tolerance

Rao and Rao [1993] reported the 0.6% error for 200 weeks (about four years) drawn from the 10-year training data.

This result is unreliable (Table 3).

Page 39: Testing neural networks using the complete round robin method

Underlying mechanism

The number of computations depends on the sequence of testing data subsets Di.

To optimize the sequence of testing, Hansel's lemma [Hansel, 1966; Kovalerchuk et al., 1996] from the theory of monotone Boolean functions is applied to so-called Hansel chains of binary vectors. A generation sketch is given below.
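The slides do not show the chain construction itself; the sketch below uses the classic symmetric-chain decomposition that underlies Hansel chains, with vectors stored as bitmasks. Treat it as an illustration, not the paper's implementation.

```cpp
// Sketch of Hansel chain generation over {0,1}^n (vectors as bitmasks).
// From each chain for n-1 bits, grow one chain extended with the new bit
// set to 0 plus a final 1-extension of its top, and one shorter chain
// whose vectors all have the new bit set to 1.
#include <cstdint>
#include <vector>

using Chain = std::vector<std::uint32_t>;

std::vector<Chain> hansel_chains(int n) {
    std::vector<Chain> chains = {{0u, 1u}};               // the single chain for n = 1
    for (int bit = 1; bit < n; ++bit) {
        std::vector<Chain> next;
        for (const Chain& c : chains) {
            Chain grown = c;                              // new bit = 0 everywhere
            grown.push_back(c.back() | (1u << bit));      // top vector with new bit = 1
            next.push_back(std::move(grown));
            if (c.size() > 1) {                           // new bit = 1, drop the top
                Chain lifted;
                for (std::size_t i = 0; i + 1 < c.size(); ++i)
                    lifted.push_back(c[i] | (1u << bit));
                next.push_back(std::move(lifted));
            }
        }
        chains = std::move(next);
    }
    return chains;  // for n = 10, the chains partition all 1024 vectors
}
```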

Page 40: Testing neural networks using the complete round robin method

Computational challenge of the complete round robin method

Monotonicity and multithreading significantly speed up computing.

Using monotonicity with 1023 threads decreased the average runtime about 3.5 times, from 15-20 minutes to 4-6 minutes, for training 1023 neural networks in the case of mixed 1s and 0s as output.

Page 41: Testing neural networks using the complete round robin method

Error tolerance and computing time

Different error tolerance values can change the output and the runtime, e.g., the extreme cases with all 1s or all 0s as outputs.

File preparation is needed to train the 1023 neural networks. The largest share of time for file preparation (41.5%) is taken by files using five years in the data subset (Table 5).

Page 42: Testing neural networks using the complete round robin method

Runtime

Table 4. Runtime for different error tolerance settings (1023 neural networks, 1 processor)

Method                                                   No threads    1023 threads
Round robin with monotonicity, mixed 1s and 0s as output 15-20 min.    4-6 min.
Round robin with monotonicity, all 1s as output          10 min.       3.5 min.

Page 43: Testing neural networks using the complete round robin method

Time distribution

Table 5. Time for backpropagation and file preparation

Set of years   % of time in file preparation   % of time in backpropagation
0000011111     41.5%                           58.5%
0000000001     36.4%                           63.6%
1111111111     17.8%                           82.2%

Page 44: Testing neural networks using the complete round robin method

General logic of software

The exhaustive option: generate all 1024 subsets using a file preparation program and compute backpropagation for all subsets.

The optimized option: generate Hansel chains, store them, and compute backpropagation only for the specific subsets dictated by the chains.

Page 45: Testing neural networks using the complete round robin method

Implemented optimized option

1. For a given binary vector, the corresponding training data are produced.

2. Backpropagation is computed, generating the Perform value.

3. The Perform value is used along with the stored Hansel chains to decide which binary vector (i.e., subset of data) will be used next for learning neural networks. A sketch of this loop is given below.
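Steps 1-3 can be combined into a driver loop like the following sketch; train_and_score() stands in for the file preparation plus backpropagation step and, like the other names, is assumed rather than taken from the paper. `known` and `propagate` are from the earlier pruning sketch.

```cpp
// Hedged sketch of the optimized option: walk the stored Hansel chains and
// train a network only when monotonicity has not already fixed Perform.
#include <cstdint>
#include <map>
#include <vector>

using Chain = std::vector<std::uint32_t>;

int train_and_score(std::uint32_t mask);                 // assumed: returns Perform
extern std::map<std::uint32_t, int> known;               // cache from earlier sketch
void propagate(std::uint32_t mask, int perform, int n_groups);

void run_optimized(const std::vector<Chain>& chains, int n_groups) {
    for (const Chain& chain : chains) {
        for (std::uint32_t mask : chain) {               // vectors in increasing order
            const auto it = known.find(mask);
            const int p = (it != known.end())
                              ? it->second               // value forced by monotonicity
                              : train_and_score(mask);   // must actually be learned
            propagate(mask, p, n_groups);                // extend to sub/supersets
        }
    }
}
```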

Page 46: Testing neural networks using the complete round robin method

System Implementation

The exhaustive set of Hansel chains is generated first and stored for the desired case.

All 1024 subsets of data are created using a file preparation program.

Based on the sequence dictated by the Hansel Chains, we produce the corresponding training data and compute backpropagation to generate the Perform value. The next vector to be used is based on the stored Hansel Chains and the Perform value.

Page 47: Testing neural networks using the complete round robin method

User Interface and Implementation

Built to take advantage of Windows NT threading.

Built with Borland C++ Builder 4.0.

Parameters for the neural networks and sub-systems are set up through the interface.

Various options, such as monotonicity, can be turned on or off through the interface.

Page 48: Testing neural networks using the complete round robin method

Multithreaded Implementation

Independent sub-processes: learning of an individual neural network or a group of neural networks.

Each sub-process is implemented as an individual thread.

Threads can run in parallel on several processors to further speed up computations. A minimal sketch is given below.
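A portable approximation of this design using std::thread (the original used Windows NT threads through Borland C++ Builder) might look like the following; train_and_score() is the same assumed helper as above.

```cpp
// Minimal portable sketch of the threading idea: one thread per network.
#include <cstdint>
#include <thread>
#include <vector>

int train_and_score(std::uint32_t mask);                 // assumed helper

void train_all_parallel(const std::vector<std::uint32_t>& masks,
                        std::vector<int>& results) {
    results.assign(masks.size(), 0);
    std::vector<std::thread> workers;
    workers.reserve(masks.size());
    for (std::size_t i = 0; i < masks.size(); ++i)       // spawn one worker per mask
        workers.emplace_back([&, i] { results[i] = train_and_score(masks[i]); });
    for (std::thread& w : workers) w.join();             // wait for all workers
}
```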

Page 49: Testing neural networks using the complete round robin method

Threads

There are two different types of threads: worker threads and communication threads.

Communication threads send and receive the data.

Worker threads are started by the servers to perform the calculations on the data.

Page 50: Testing neural networks using the complete round robin method

Client/Server Implementation

Decreases the workload per computer: the client does the data formatting, the server performs all the data manipulation.

A client connects to multiple servers.

Each server performs work on the data and returns the results to the client.

The client formats the returned results and visualizes them.

Page 51: Testing neural networks using the complete round robin method

Client

Loads vectors into a linked list.
Attaches a thread to each node.
Connects each thread to an active server.
Threads send the server an activation packet.
Waits for the server to return results.
Formats results for output.
Sends more work to the server.

Page 52: Testing neural networks using the complete round robin method

Server

Obtains a message from the client that the client is connected.
Starts work on the data given to it by the client.
Returns the finished packet to the client.
Loops.

Page 53: Testing neural networks using the complete round robin method

System Diagram

[Diagram: the Interface feeds a Thread Monitor, whose threads communicate over network communication streams with Server 1, Server 2, …, Server 250.]