1
Using support vector machine with a hybrid feature selection method
to the stock trend prediction
Ming-Chi Lee, Expert Systems with Applications, 2009
Presenter: Yu Hsiang Huang, Date: 2012-05-17
2
Outline
• Introduction
• Feature selection
• Research design
• Experimental results and analysis
• Conclusion
3
Introduction
• Stock market
  – A highly nonlinear dynamic system
• Applications of AI
  – Expert systems, fuzzy systems, neural networks
  – Back-propagation neural network (BPNN)
    • Predictive power is better than that of the other methods
    • Requires a large amount of training data to estimate the distribution of the input patterns
    • Prone to over-fitting
    • Fully depends on the researcher's experience and knowledge to preprocess the data
      – relevant input variables, hidden layer size, learning rate, momentum, etc.
4
Introduction
• In this paper
  – Support vector machine (SVM)
    • Captures the geometric characteristics of the feature space without deriving network weights from the training data
    • Extracts the optimal solution even with a small training set
    • Reaches a global optimum rather than a local one
    • No over-fitting
    • Classification performance is influenced by the dimension (number of feature variables)
  – Feature selection
    • Addresses the dimensionality-reduction problem by determining the subset of available features that is most essential for classification
    • Hybrid feature selection (F_SSFS): filter method + wrapper method
    • F_SSFS: F-score + supported sequential forward search
    • Optimal parameter search
  – Compare the performance of BPNN and SVM
5
SVM-based model with F_SSFS (flowchart)
Data → Original feature variables
→ Hybrid feature selection:
  – Filter part: feature pruning using the F-score → pre-selected features
  – Wrapper part: SSFS algorithm finds the best feature variables
→ Best feature variables
→ SVM: training, testing, and evaluating the classification accuracy
6
Feature selection
• Filter method:
  – No feedback from the classifier
  – Estimates the classification performance by indirect assessments
    • Distance: reflects how well the classes separate from each other
7
Feature selection
• F-score and supported sequential forward search (F_SSFS)
  – F-score
    • Plays the role of the filter
    • Pre-selected features are the "informative" ones
    • Given training vectors x_k, k = 1, 2, ..., m, and the numbers n+ and n− of positive and negative instances, the F-score of the i-th feature is

      F(i) = [ (x̄i(+) − x̄i)² + (x̄i(−) − x̄i)² ]
             / [ (1/(n+ − 1)) Σk (xk,i(+) − x̄i(+))² + (1/(n− − 1)) Σk (xk,i(−) − x̄i(−))² ]

      where x̄i, x̄i(+), x̄i(−) are the averages of the i-th feature over the whole, positive, and negative data sets
    • The numerator measures the discrimination between the positive and negative sets
    • The denominator measures the spread within each of the two sets
    • The larger the F-score, the more discriminative the feature is likely to be
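The F-score definition above can be sketched in NumPy (an illustrative helper, not the paper's code; the name `f_score` and the array layout are assumptions):

```python
import numpy as np

def f_score(X, y):
    """Per-feature F-score: between-class separation over within-class spread.
    X: (n_samples, n_features); y: labels in {+1, -1}."""
    Xp, Xn = X[y == 1], X[y == -1]
    mean_all = X.mean(axis=0)
    mean_p, mean_n = Xp.mean(axis=0), Xn.mean(axis=0)
    # numerator: discrimination between the positive and negative sets
    num = (mean_p - mean_all) ** 2 + (mean_n - mean_all) ** 2
    # denominator: spread within each of the two sets (unbiased variance)
    den = Xp.var(axis=0, ddof=1) + Xn.var(axis=0, ddof=1)
    return num / den
```

A feature whose class-conditional means differ strongly relative to its within-class variance receives a large score, matching the slide's reading of numerator and denominator.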
8
Feature selection
• F-score and supported sequential forward search (F_SSFS)
  – F-score (filter flow)
    Original feature variables → Calculate F-scores → Sort F-scores → Select the top-K F-score features → K pre-selected features
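The filter flow above (calculate → sort → select top K) can be sketched as one small NumPy helper (illustrative; the function name and API are assumptions, and the F-score is inlined so the sketch is self-contained):

```python
import numpy as np

def select_top_k(X, y, k):
    """Filter part: rank features by F-score and return the
    indices of the K highest-scoring ones."""
    Xp, Xn = X[y == 1], X[y == -1]
    m = X.mean(axis=0)
    # F-score per feature, as defined on the previous slide
    num = (Xp.mean(axis=0) - m) ** 2 + (Xn.mean(axis=0) - m) ** 2
    den = Xp.var(axis=0, ddof=1) + Xn.var(axis=0, ddof=1)
    scores = num / den
    # sort descending and keep the top-K feature indices
    return np.argsort(scores)[::-1][:k]
```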
9
SVM-based model with F_SSFS (flowchart, repeated as a roadmap)
Data → Original feature variables
→ Hybrid feature selection:
  – Filter part: feature pruning using the F-score → pre-selected features
  – Wrapper part: SSFS algorithm finds the best feature variables
→ Best feature variables
→ SVM: training, testing, and evaluating the classification accuracy
10
Feature selection
• Wrapper method:
  – Classifier-dependent
    • Evaluates the "goodness" of the selected feature subset directly, with feedback from the classifier
    • Should intuitively yield better performance
  – Has limited applicability
    • Due to the high computational complexity involved
11
Feature selection
• F-score and supported sequential forward search (F_SSFS)
  – Supported sequential forward search (SSFS)
    • Plays the role of the wrapper
    • A variation of the sequential forward search (SFS) algorithm, specially tailored to SVM to expedite the feature-searching process
    • Support vectors: training samples other than the support vectors contribute nothing to determining the decision boundary
    • Dynamically maintains an active subset as the candidate support vectors
    • Trains the SVM on this reduced subset rather than the entire training set, at a lower computational cost
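A minimal sketch of the wrapper search, assuming scikit-learn (whose `SVC` wraps LIBSVM): this implements plain SFS with 5-fold cross-validation accuracy as the objective. The paper's SSFS additionally maintains an active support-vector subset to cut training cost, which is omitted here for brevity:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def sequential_forward_search(X, y, max_features, eps=1e-3):
    """Greedy forward selection driven by SVM cross-validation accuracy.
    Stops when accuracy no longer improves significantly (by eps) or
    when max_features have been selected."""
    selected, best_acc = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        # try adding each remaining feature; keep the best performer
        accs = [cross_val_score(SVC(kernel="rbf"), X[:, selected + [j]], y,
                                cv=5).mean() for j in remaining]
        if max(accs) - best_acc < eps and selected:
            break  # termination: no significant accuracy improvement
        best_j = remaining[int(np.argmax(accs))]
        best_acc = max(accs)
        selected.append(best_j)
        remaining.remove(best_j)
    return selected, best_acc
```

Training one SVM per candidate feature per iteration is exactly the cost the support-vector trick in SSFS is meant to reduce.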
12
Feature selection
• F-score and supported sequential forward search (F_SSFS)
  – Supported sequential forward search (SSFS): training data layout

          f1   f2   f3   f4  ...  fk-2 fk-1  fk  | label
    r1    ...  ...  ...  ...  ...  ...  ...  ... |  +
    r2    ...  ...  ...  ...  ...  ...  ...  ... |  -
    ...   ...  ...  ...  ...  ...  ...  ...  ... |  -
    rN    ...  ...  ...  ...  ...  ...  ...  ... |  +
13
Feature selection
• F-score and supported sequential forward search (F_SSFS)
  – Supported sequential forward search (SSFS)
    • Iterates: iteration 1, ..., iteration n+1, ...
    • Termination conditions:
      1. No significant improvement of the cross-validation accuracy is found
      2. The desired number of features has been obtained
14
Feature selection
• F-score and supported sequential forward search (F_SSFS)
  – F_SSFS
    • Uses the F-score measure to pre-select the best feature subsets
    • Uses the SSFS algorithm to select the final best feature subset
    • Reduces the number of features that have to be tested through SVM training
    • Reduces the unnecessary computation the wrapper method would spend testing "non-informative" features
15
Research design
• Data collection and preprocessing
  – Prediction target: the direction of change in the daily NASDAQ index
  – Index futures lead the spot index
  – 30 technical indices form the whole feature set: 20 futures contracts, 9 spot indexes, and the 1-day-lagged NASDAQ index
  – Labels "1" and "-1" denote whether the next day's index is higher or lower than today's
  – From Nov 8, 2001 to Nov 8, 2007, with 1065 observations per feature
  – The original data are scaled into the range (0, 1)

          f1   f2   f3  ...  f28  f29  f30  | label
    1     ...  ...  ...  ...  ...  ...  ... |  1
    2     ...  ...  ...  ...  ...  ...  ... | -1
    ...   ...  ...  ...  ...  ...  ...  ... | -1
    1065  ...  ...  ...  ...  ...  ...  ... |  1
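The preprocessing above can be sketched as follows (illustrative names such as `make_dataset`; min-max scaling is an assumption for how the data reach the unit range):

```python
import numpy as np

def make_dataset(close, features):
    """Label each day +1 if the next day's index closes higher than
    today's, else -1, and min-max scale every feature column into
    the unit range."""
    close = np.asarray(close, dtype=float)
    X = np.asarray(features, dtype=float)[:-1]  # drop last day: no next-day label
    y = np.where(close[1:] > close[:-1], 1, -1)
    # min-max scaling per feature column
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo), y
```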
16
Research design
• SVM-based model with F_SSFS
  – Filter part
    • Calculate the F-score for every feature and rank the features without involving the classifier
    • Sort the F-scores and select the K (threshold) highest-scoring features to construct the feature subset
  – Wrapper part
    • For each candidate feature, run 5-fold cross-validation and compute the average accuracy over the 5 folds
    • Determine the feature to add to the best feature subset using the objective function M (the average cross-validation accuracy)
    • Repeat until no significant increase of the cross-validation accuracy is found or the desired number of features has been obtained
17
Research design
• Modeling for the support vector machine
  – Model selection and parameter search
    • Radial basis function (RBF) kernel
    • Kernel parameter γ and penalty parameter C
    • Grid search on (C, γ) using 5-fold cross-validation
      – Prevents the over-fitting problem
      – The computational time to find good parameters is less than that of other methods
      – Grid search is easily parallelized because each (C, γ) trial is independent
      – Try exponentially growing sequences of C and γ (powers of 2)
    • The final performance of the classifier is evaluated by the mean cost over the v fold subsets
    • The LIBSVM software is used to conduct the SVM experiments
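The grid search above can be sketched with scikit-learn's `GridSearchCV` (the paper used LIBSVM directly; the exponent ranges shown are the ones suggested in the LIBSVM practical guide, not taken from this slide):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Exponentially spaced (C, gamma) grid, 5-fold cross-validation.
# n_jobs=-1 exploits the independence of each (C, gamma) trial.
param_grid = {
    "C": 2.0 ** np.arange(-5, 16, 2),      # C = 2^-5, 2^-3, ..., 2^15
    "gamma": 2.0 ** np.arange(-15, 4, 2),  # gamma = 2^-15, 2^-13, ..., 2^3
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, n_jobs=-1)
# search.fit(X_train, y_train); then read search.best_params_
```

After fitting, `search.best_params_` holds the optimal (C, γ) and `search.best_score_` the corresponding cross-validation rate.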
18
SVM-based model with F_SSFS (flowchart, repeated as a roadmap)
Data → Original feature variables
→ Hybrid feature selection:
  – Filter part: feature pruning using the F-score → pre-selected K features
  – Wrapper part: SSFS algorithm finds the best feature variables
→ Best feature variables
→ SVM: training, testing, and evaluating the classification accuracy
19
Experimental results and analysis
• Experimental result of F_SSFS
  – The threshold K determines how many features to keep after filtering
    • If K equals the number of original features, the filter part does not contribute at all
    • If K equals 1, the wrapper method is unnecessary
20
Experimental results and analysis
• Experimental result of F_SSFS – filter part
  – Varying the threshold K:
    • As K increases, the prediction accuracy increases, but so does the selection time
    • The performance and complexity of the algorithm can be balanced by tuning K
  – K = 22 is chosen; after the wrapper part, 17 feature variables turned out to have the best performance
21
Experimental results and analysis
• Experimental result of F_SSFS – wrapper part
  – With K = 22, after the wrapper part 17 features are left, with an average accuracy rate of 81.7%
22
Experimental results and analysis
• Result of SVM model selection
  – RBF kernel
    • Penalty parameter C, kernel parameter γ
    • Grid search using 5-fold cross-validation
  – The optimal (C, γ) achieves a cross-validation rate of 87.1%
23
Experimental results and analysis
• Experimental result of SVM
• Experimental result of BPNN
24
Experimental results and analysis
• Experimental result of feature selection
  – A key deficiency of neural-network models for stock trend prediction
    • Difficulty in selecting the discriminative features and explaining the rationale for the stock trend prediction
  – Relative importance of each feature
25
Experimental results and analysis
• Conclusion
  – Stock trend prediction with a support vector machine and a hybrid feature selection method (F_SSFS)
  – Reduces the high computational cost and the risk of over-fitting
  – Future work: investigate how to find the optimal SVM parameter values for the best prediction performance
  – Future work: study the generalization of SVM with respect to the appropriate training set size, and give a guideline for measuring generalization performance