
Hindawi Publishing Corporation
Journal of Renewable Energy
Volume 2013, Article ID 952613, 9 pages
http://dx.doi.org/10.1155/2013/952613

Research Article
The Effectiveness of Feature Selection Method in Solar Power Prediction

Md Rahat Hossain, Amanullah Maung Than Oo, and A. B. M. Shawkat Ali

Power Engineering Research Group (PERG), Central Queensland University, Rockhampton, QLD 4701, Australia

Correspondence should be addressed to Md Rahat Hossain; mhossain@cqu.edu.au

Received 13 December 2012; Accepted 11 July 2013

Academic Editor: Czarena Crofcheck

Copyright © 2013 Md Rahat Hossain et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This paper empirically shows that applying selected feature subsets to machine learning techniques significantly improves the accuracy of solar power prediction. Experiments are performed using five well-known wrapper feature selection methods to obtain the solar power prediction accuracy of machine learning techniques with selected feature subsets. For all the experiments, the machine learning techniques least median square (LMS), multilayer perceptron (MLP), and support vector machine (SVM) are used. Afterwards, these results are compared with the solar power prediction accuracy of the same machine learning techniques (i.e., LMS, MLP, and SVM) without applying feature selection methods (WAFS). Experiments are carried out using reliable, real-life historical meteorological data. The comparison clearly shows that LMS, MLP, and SVM provide better prediction accuracy (i.e., reduced MAE and MASE) with selected feature subsets than without them. The experimental results of this paper support the firm conclusion that devoting more attention and effort to the feature subset selection aspect (e.g., the effect of selected feature subsets on prediction accuracy, which is investigated in this paper) can significantly improve the accuracy of solar power prediction.

1. Introduction

Feature selection can be considered one of the main preprocessing steps of machine learning [1]. Feature selection is different from feature extraction (or feature transformation), which creates new features by combining the original features [2]. The advantages of feature selection are manifold. First, feature selection significantly reduces the running time of a learning procedure by eliminating irrelevant and redundant features. Second, without the interference of irrelevant, redundant, and noisy features, learning algorithms can concentrate on the most essential features of the data and build simpler but more precise data models. Third, feature selection helps build a simpler and more general model and gives better insight into the underlying concept of the task [3–5]. The feature selection aspect is significant because, with the same training data, an individual regression algorithm may perform better with different feature subsets. The success of machine learning on a particular task is affected by many factors. Among those factors, first and foremost is the representation and quality of the instance data [6]. The training stage becomes problematic in the presence of noisy, irrelevant, and redundant data. Sometimes real-life data contain a great deal of information of which very little is useful for the desired purpose. Therefore, it is not necessary to include every piece of information from the raw data source for modelling.

Usually, features are differentiated [7] as follows:

(1) relevant: this class of features has a strong impact on the output;

(2) irrelevant: opposite to relevant features, irrelevant features do not have any influence on the output;

(3) redundant: a redundancy occurs when a feature captures the functionality of another.

All algorithms that perform feature selection consist of two common aspects. One is the search method, which is a selection algorithm that generates candidate feature subsets


Figure 1: Key sequences of feature selection (original data set → subset generation → subset evaluation → stopping criterion → result validation → extract the best subset).

and attempts to reach the most advantageous ones. The other aspect, called the evaluator, is an evaluation algorithm that makes a decision about the goodness of the proposed feature subset and finally returns an assessment of the merit of the search method [8]. On the other hand, lacking an appropriate stopping condition, the feature selection procedure could run exhaustively or endlessly through the raw dataset. It may be discontinued when inserting or deleting any attribute no longer produces a better subset, or when a subset is produced that provides the maximum benefit according to some evaluation function. A feature selector may stop manipulating features when the merit of the current feature subset stops improving or, conversely, does not degrade. A typical feature selection procedure, shown in Figure 1, consists of four fundamental steps: (A) subset generation, (B) subset evaluation, (C) stopping criterion, and (D) result validation [9]. The procedure begins with subset generation, which uses a particular search strategy to generate candidate feature subsets. Each candidate subset is then evaluated according to a particular evaluation criterion and compared with the previous best one; if it is better, it replaces the previous best. The process of subset generation and evaluation is repeated until a given stopping condition is satisfied. Finally, the selected best feature subset is validated with test data. Figure 1 graphically demonstrates these steps of the feature selection process.
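As a concrete illustration of this loop, the following Python sketch wires the four steps together. It is not the tool-specific search/evaluator implementation used by the authors; the callbacks `generate`, `evaluate`, and `validate` and the `max_rounds` cap are hypothetical placeholders for a search strategy, a subset evaluator, a validation step, and a stopping criterion.

```python
def feature_selection_loop(generate, evaluate, validate, max_rounds=100):
    """Generic generate -> evaluate -> stop -> validate loop of Figure 1."""
    best_subset, best_score = None, float("-inf")
    for round_no, subset in enumerate(generate(), start=1):
        score = evaluate(subset)                 # subset evaluation
        if score > best_score:                   # better than the previous best?
            best_subset, best_score = subset, score
        if round_no >= max_rounds:               # stopping criterion
            break
    validate(best_subset)                        # result validation on test data
    return best_subset, best_score
```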

Based on evaluation functions and calculations, feature selection methods find the best feature subset among different candidates. Usually, feature selection methods are classified into two general groups (i.e., filter and wrapper) [10]. Inductive algorithms are used by wrapper methods as the evaluation function, whereas filter methods are independent of the inductive algorithm. Wrapper methods work by wrapping the feature selection around the induction algorithm to be used, and to accomplish this, wrapper methods use cross validation. With the same training data, an individual regression algorithm may perform better with different feature subsets [11]. Widely used wrapper selection methods are briefly discussed in the following section. Section 3 deals with real-life data collection and analysis of the dataset. Sections 4 and 5 show the experimental results using machine learning techniques and selected feature subsets; a comparison of the results and their graphical presentation are given in those two sections. Section 6 assesses the obtained prediction results with paired t-tests. The results

from the experiments demonstrate that LMS, MLP, and SVM supplied with selected feature subsets provide better prediction accuracy (i.e., reduced MAE and MASE) than when they are without the selected feature subsets. Concluding remarks are provided in the final section of this paper.

2. Wrapper Methods of Feature Selection

In the field of machine learning, feature subset selection, also called attribute subset selection, variable selection, or variable subset selection, is an important method that helps to select an appropriate subset of significant features for model development. In advanced machine-learning-based applications (e.g., solar power prediction), feature selection methods have become a clear necessity. Generally, feature selection methods are categorised into three different classes [3, 12]: filter selection methods select feature subsets as a preprocessing step, independently of the selected predictor; wrapper selection methods exploit the predictive power of the machine learning technique to select suitable feature subsets; and, finally, embedded selection methods usually select feature subsets in the course of training. Wrapper methods were used for the experiments in this paper. Therefore, widely used and accepted wrapper methods are presented in the following paragraphs.

The wrapper methods use the performance (e.g., regression, classification, or prediction accuracy) of an induction algorithm for feature subset evaluation. Figure 2 shows the idea behind wrapper approaches [3]. For each generated feature subset S, wrappers evaluate its goodness by applying the induction algorithm to the dataset using the features in subset S. Wrappers can find feature subsets with high accuracy because the features are matched well with the learning algorithms.

The simplest method among all the wrapper selection algorithms is forward selection (FS). This method starts the procedure without any feature in the feature subset and follows a greedy approach, sequentially adding features until no possible single-feature addition results in a higher valuation of the induction function. Backward elimination (BE) begins with the complete feature set and gradually removes features as long as the valuation does not degrade. A description of forward selection (FS) and backward selection (BS) can be found in [13], where the authors showed that wrapper selection methods are better than using no selection.
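A minimal sketch of the forward selection loop just described, assuming a hypothetical `evaluate` callback that trains the induction algorithm on a candidate subset and returns its (cross-validated) merit; backward elimination is the mirror image, starting from the full set and removing one feature at a time.

```python
def forward_selection(all_features, evaluate):
    selected, best_score = [], float("-inf")
    while True:
        remaining = [f for f in all_features if f not in selected]
        if not remaining:
            break
        # Score every possible single-feature addition and keep the best one.
        score, feature = max((evaluate(selected + [f]), f) for f in remaining)
        if score <= best_score:      # no single addition improves the valuation
            break
        selected.append(feature)
        best_score = score
    return selected, best_score
```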

Starting with an empty set of features, the best first search (BFS) produces every possible single-feature extension [3].


Figure 2: The wrapper approach for feature selection [3] (training data → subset generation → subset evaluation, in which the induction algorithm estimates the accuracy of each feature subset/hypothesis → final subset → result validation with the induction algorithm on test data).

BFS exploits the greedy hill climbing approach in conjunction with backtracking to search the space of feature subsets. BFS has the flexibility to start with an empty subset and search in the forward direction; alternatively, it can start with the full set of attributes and search in the backward direction, or it can start at any point and move in either direction. An extension of BFS is linear forward selection (LFS). A limited number of attributes, k, is taken into consideration by LFS; this method either selects the top k attributes by initial ordering or carries out a ranking [14, 15].

Subset size forward selection (SSFS) is an extension of LFS. SSFS carries out an internal cross validation; an LFS is executed on every fold to find the best possible subset size [15, 16]. The ranker search ranks attributes through their individual evaluations and is used in combination with attribute evaluators [16].

Genetic search (GS) performs a search using the simple genetic algorithm described in Goldberg's study [17]. Genetic algorithms are randomized search techniques based on the principles of natural selection [17]. They use a population of competing solutions, evolved towards an optimal solution. For feature selection, a solution is a fixed-length binary sequence representing a feature subset; the value of each position, typically 1 or 0, represents the presence or absence of a particular feature, respectively. The algorithm proceeds in an iterative manner, where each successive generation is produced by applying genetic operators to the members of the current generation. Nonetheless, GAs naturally involve a large number of evaluations or iterations to reach an optimal solution. Besides all these conventional methods, we have experimentally verified an unconventional approach. In this method, we calculated the correlation coefficient of each of the competing attributes (except the target attribute) with respect to the target attribute of the used dataset. For this purpose we used Pearson's correlation coefficient formula, which is described in the next section. After this attribute-wise calculation, we selected as the feature subset only those attributes whose correlation coefficient values are positive; the attributes having negative correlation coefficients are ignored in this case. We named this method positive correlation coefficient selection (PCCS) [18].
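A sketch of the PCCS idea in pandas, assuming the candidate attributes and the target live in one DataFrame; the column name in the example call is illustrative, not the exact header of the dataset used in the experiments.

```python
import pandas as pd

def pccs(df: pd.DataFrame, target: str) -> list:
    """Keep only attributes whose Pearson correlation with the target is positive."""
    corr = df.corr(numeric_only=True)[target].drop(target)  # Pearson by default
    return corr[corr > 0].index.tolist()

# Example (hypothetical column name):
# selected = pccs(weather_df, target="avg_solar_radiation")
```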

3. Real Life Data Collection and Analysis

One of the key requirements for successfully performing the experiments of this paper is to collect recent, reliable, accurate, and long-term historical weather data for the particular location. However, finding accurate multiyear data near the experiment site has always proved challenging because such data are not readily available due to the cost and difficulty of measurement [19]. There are only a few sites in Australia providing research-quality solar radiation data, so these data are generally not available. Rockhampton, a subtropical town in northern Australia, was chosen for the experiments of this research. The selected station is "Rockhampton Aero", with latitude −23.38 and longitude 150.48. According to the Renewable Energy Certificates (RECs) zones within Australia, Rockhampton falls within the most important zone [20]. The recent data were collected from the Commonwealth Scientific and Industrial Research Organization (CSIRO), Australia.

Data were also collected from the Australian Bureau of Meteorology (BOM), the National Aeronautics and Space Administration (NASA), and the National Oceanic and Atmospheric Administration (NOAA). All of this solar radiation information was estimated from cloud cover and humidity at airports. Free data are available from the National Renewable Energy Laboratory (NREL) and NASA; these are excellent for multiyear averages but perform poorly for hourly and daily measurements. After analyzing the raw data collected from the different sources, the data provided by CSIRO were finally selected. The data used in this paper are based on hourly global solar irradiance ground measurements, which are a significant aspect of the dataset. These data were gathered over a period of five years, from 2006 to 2010.

The attributes in the used dataset are average air temperature, average wind speed, current wind direction, average relative humidity, total rainfall, wind speed, wind direction, maximum peak wind gust, current evaporation, average absolute barometer, and average solar radiation. Table 1 presents the statistical properties of the raw data.
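The per-attribute summary reported in Table 1 can be reproduced from the raw hourly records along the following lines; the CSV file name is a hypothetical stand-in for the CSIRO data file actually used.

```python
import pandas as pd

raw = pd.read_csv("rockhampton_hourly_2006_2010.csv")      # hypothetical file name
table1 = raw.describe().T[["min", "max", "mean", "std"]]    # Min, Max, Mean, Std dev
print(table1.round(2))
```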

4. Applying Feature Selection Techniques on the Dataset

Research works related to solar radiation prediction typically select the input features or attributes arbitrarily. Unlike this conventional approach, this research experimented with the maximum number of features and found the best possible combination of features for the individual learning models of the hybrid model. To perform the experiments for selecting significant feature subsets for each individual machine learning technique, the traditional BFS, LFS, SSFS, ranker search, and GS selection methods, together with our own PCCS method, are used. To


Table 1: Statistical description of the raw data set.

                                 Min      Max      Mean     Std. dev.
Avg. air temp. (DegC)           −5.8     40.1     20.47      6.99
Avg. wind speed (Km/h)           0       27.1      6.99      4.78
Current wind dir. (Deg)          0      359      158.91    103.66
Avg. relative humidity (%)       0      100       55.11     24.26
Total rainfall (mm)              0       30.4      0.07      0.69
Wind speed (Km/h)                0       24.83     5.77      4.38
Wind direction (Deg)             0      360      169.91    109.84
Max. peak wind gust (Km/h)       0      106       20.45     11.33
Current evaporation (mm)        −1.36     1.36     0.31      0.28
Avg. abs. barometer (hPa)      921     1020      966.59     12.09
Solar radiation (W/m²)           1     1660      300.75    325.17

carry out the experiments, three machine learning algorithms, namely, least median square [21], multilayer perceptron [22], and support vector machine [23], are used.

Evaluating the degree of fitness, that is, how well a regression model fits a dataset, is usually done with the correlation coefficient. Assuming the actual values are \(a_1, a_2, \ldots, a_n\) and the predicted values are \(p_1, p_2, \ldots, p_n\), the correlation coefficient is given by
\[
R = \frac{S_{PA}}{\sqrt{S_P\,S_A}}, \tag{1}
\]
where
\[
S_{PA} = \frac{\sum_i (p_i - \bar{p})(a_i - \bar{a})}{n-1}, \qquad
S_P = \frac{\sum_i (p_i - \bar{p})^2}{n-1}, \qquad
S_A = \frac{\sum_i (a_i - \bar{a})^2}{n-1}. \tag{2}
\]
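Equations (1)-(2) translate directly into code; the following sketch is for illustration only and is not the evaluation code used in the experiments.

```python
import numpy as np

def correlation_coefficient(p, a):
    """R between predicted values p and actual values a, per (1)-(2)."""
    p, a = np.asarray(p, dtype=float), np.asarray(a, dtype=float)
    n = len(p)
    spa = np.sum((p - p.mean()) * (a - a.mean())) / (n - 1)
    sp = np.sum((p - p.mean()) ** 2) / (n - 1)
    sa = np.sum((a - a.mean()) ** 2) / (n - 1)
    return spa / np.sqrt(sp * sa)
```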

To find the correlation coefficient of the model, the full training set is partitioned into ten mutually exclusive, equally sized subsets. The performance of a subset depends on the accuracy of predicting the test values. For every individual algorithm, this cross-validation method was run ten times, and finally the average value over the 10 cross validations was calculated. In \(k\)-cv, a dataset \(S_n\) is uniformly partitioned into \(k\) folds of similar size, \(P = \{P_1, \ldots, P_k\}\). For the sake of clarity, and without loss of generality, it is supposed that \(n\) is a multiple of \(k\). Let \(T_i = S_n \setminus P_i\) be the complement dataset of \(P_i\). Then the algorithm \(A(\cdot)\) induces a classifier from \(T_i\), \(\psi_i = A(T_i)\), and estimates its prediction error with \(P_i\). The \(k\)-cv prediction error estimator of \(\psi = A(S_n)\) is defined as follows [24]:
\[
\xi_k(S_n, P) = \frac{1}{n} \sum_{i=1}^{k} \sum_{(x,c) \in P_i} \mathbf{1}\bigl(c \neq \psi_i(x)\bigr), \tag{3}
\]
where \(\mathbf{1}(\cdot)\) equals 1 if its argument holds and zero otherwise, that is, it counts the cases in which the prediction \(\psi_i(x)\) differs from the true value \(c\). So the \(k\)-cv error estimator is the average of the errors made by the classifiers \(\psi_i\) on their respective divisions \(P_i\).
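The estimator in (3) can be sketched as follows; `train` and `per_case_error` are hypothetical callbacks standing in for the induction algorithm A(·) and a per-case loss (0/1 for classification, or an absolute error for regression).

```python
import numpy as np

def k_cv_error(X, y, train, per_case_error, k=10, seed=0):
    """Average held-out error over k folds, as in (3)."""
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    total = 0.0
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = train(X[train_idx], y[train_idx])              # psi_i = A(T_i)
        total += np.sum(per_case_error(model, X[test_idx], y[test_idx]))
    return total / n
```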

As in Zheng and Kusiak [24], the mean absolute error (MAE) and mean absolute percent error (MAPE) are used to measure the prediction performance; we have also used these evaluation metrics for our experiments. They are defined as
\[
\mathrm{MAE} = \frac{\sum_{i=1}^{n} |p_i - a_i|}{n}, \qquad
\mathrm{MAPE} = \frac{\sum |\mathrm{PE}|}{n}, \tag{4}
\]
where \(\mathrm{PE} = (E/a) \times 100\), \(E = (a - p)\), \(a\) = actual values, \(p\) = predicted values, and \(n\) = number of occurrences.
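Written out to mirror (4), with p the predicted and a the actual values (a sketch, not library code):

```python
import numpy as np

def mae(p, a):
    p, a = np.asarray(p, float), np.asarray(a, float)
    return np.mean(np.abs(p - a))

def mape(p, a):
    p, a = np.asarray(p, float), np.asarray(a, float)
    pe = (a - p) / a * 100          # percent error PE; undefined when a == 0
    return np.mean(np.abs(pe))
```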

The error of the experimental results was also analyzed using the mean absolute scaled error (MASE) [25]. MASE is scale free, less sensitive to outliers, easier to interpret than other measures, and less variable on small samples. MASE is suitable for intermittent demand series as it never produces infinite or undefined results. The prediction with the smallest MASE is counted as the most accurate among all the alternatives [25]. Equation (5) states the formula for MASE:
\[
\mathrm{MASE} = \frac{\mathrm{MAE}}{\dfrac{1}{n-1}\displaystyle\sum_{i=2}^{n} |a_i - a_{i-1}|}, \tag{5}
\]
where
\[
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |p_i - a_i|. \tag{6}
\]
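Likewise, (5)-(6) reduce to a few lines: the forecast MAE is scaled by the in-sample MAE of the naive one-step-ahead forecast of the actual series.

```python
import numpy as np

def mase(p, a):
    p, a = np.asarray(p, float), np.asarray(a, float)
    mae_forecast = np.mean(np.abs(p - a))
    naive_scale = np.mean(np.abs(np.diff(a)))   # (1/(n-1)) * sum_i |a_i - a_{i-1}|
    return mae_forecast / naive_scale
```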

4.1. Prediction of the Machine Learning Techniques Using the Selected Feature Subsets. Various feature subsets were generated or selected using the different wrapper feature selection methods. Afterwards, six-hours-ahead solar radiation prediction by the selected machine learning techniques, namely, LMS, MLP, and SVM, was performed. In this case, the selected feature subsets were supplied to the individual machine learning techniques. The intention of this experiment was to observe whether this initiative produces any improvement in the error reduction of those selected machine learning techniques. For these experiments, any tuning of the particular algorithms to a specific dataset was avoided; for all the experiments, the default values of the learning parameters were used. The following tables report the CC, MAE, MAPE, and MASE of six-hours-ahead prediction for each machine learning technique supplied with different feature subsets. For all the experiments, "W" is used to indicate that a particular machine learning technique supplied with the selected feature subsets statistically outperforms the one without applying feature selection (WAFS) methods.
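The comparison carried out in this section can be summarised by the following sketch. scikit-learn's SVR and MLPRegressor stand in for the SVM and MLP implementations actually used by the authors (scikit-learn offers no least-median-square regressor), models keep their default parameters, the scoring follows the MAE metric defined above, and the feature names in the example calls are illustrative.

```python
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

def compare_subsets(model, X, y, selected_columns):
    """10-fold cross-validated MAE with all attributes (WAFS) vs. a selected subset."""
    def cv_mae(data):
        return -cross_val_score(model, data, y, cv=10,
                                scoring="neg_mean_absolute_error").mean()
    return {"WAFS": cv_mae(X), "selected subset": cv_mae(X[selected_columns])}

# Examples (hypothetical feature names; X is a pandas DataFrame of attributes):
# compare_subsets(SVR(), X, y, ["avg_air_temp", "avg_relative_humidity", "wind_speed"])
# compare_subsets(MLPRegressor(), X, y, ["avg_air_temp", "avg_relative_humidity"])
```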

Tables 2 and 3 present the obtained CC and MAE when applying the LMS, MLP, and SVM machine learning techniques for six-hours-ahead prediction on the used dataset before and after the feature selection process.

In Tables 4 and 5, the MAPE and MASE are shown before and after the feature selection processes are applied to the LMS, MLP, and SVM machine learning techniques for the same purpose.


Table 2: Achieved CC after applying various wrapper selection methods on LMS, MLP, and SVM.

       WAFS   BFS    LFS    SSFS   Ranker    GS     PCCS
LMS    0.95   0.96   0.96   0.96   0.96      0.96   0.97 W
MLP    0.98   0.97   0.97   0.98   0.99 W    0.97   0.98
SVM    0.96   0.96   0.96   0.96   0.97 W    0.96   0.96

Table 3: Achieved MAE after applying various wrapper selection methods on LMS, MLP, and SVM.

       WAFS     BFS      LFS      SSFS     Ranker      GS       PCCS
LMS     77.19    76.81    74.49    74.12    74.93      87.53    73.37 W
MLP     91.02   168.34   222.73   119.11    84.31 W   288.83   110.57
SVM    126.88   123.46   129.51   123.42   122.12 W   125.59   124.52

Table 4: Achieved MAPE after applying various wrapper selection methods on LMS, MLP, and SVM.

       WAFS    BFS     LFS     SSFS    Ranker     GS      PCCS
LMS    17.65   19.53   17.08   17.04   16.93      17.49   16.82 W
MLP    20.17   50.53   41.83   23.46   17.87 W    32.83   21.50
SVM    21.72   21.53   22.35   21.35   20.88 W    21.35   21.65

Table 5: Achieved MASE after applying various wrapper selection methods on LMS, MLP, and SVM.

       WAFS   BFS    LFS    SSFS   Ranker    GS     PCCS
LMS    0.63   0.71   0.61   0.60   0.61      0.62   0.59 W
MLP    0.74   2.35   1.81   0.97   0.68 W    1.37   0.90
SVM    1.03   1.02   1.05   1.00   0.99 W    1.00   1.01

The experimental results show that PCCS is, on the whole, the superior feature selection method for the LMS algorithm, considering all the instances. It is noticeable that all the feature selection methods contributed to improving the CC of the LMS algorithm. However, in the case of MAE, all the selection algorithms except GS improve the results for LMS, and for both MAPE and MASE, BFS is the only selection method that does not improve the results for LMS. The results also show that the ranker search is, to some extent, the superior feature selection method for the MLP algorithm: all the feature selection methods yield a nearly identical CC for MLP, but in the case of MAE, MAPE, and MASE, the ranker search is the only selection method that improves the results. Finally, the obtained results illustrate that the ranker search is again, to some extent, the superior feature selection method for SVM. All the feature selection methods yield nearly equal CC for SVM; however, in the case of MAE, MAPE, and MASE, LFS is the only method that is unable to improve the results for SVM.

5. Prediction Results before versus after Applying the Feature Selection Techniques

In Table 6, the prediction errors (MAE and MASE) of the individual machine learning techniques are compared before and after supplying the selected feature subsets to them. The comparative results show that the errors are reduced in all instances after supplying the selected feature subsets. The terms MAE BEFORE and MASE BEFORE denote the results without selected feature subsets for MAE and MASE, respectively, whereas MAE AFTER and MASE AFTER denote the results with selected feature subsets for MAE and MASE, respectively.

Figure 3(a) graphically presents the prediction accuracy comparison of LMS, MLP, and SVM before and after applying the selected feature subsets in terms of MAE, and Figure 3(b) presents the same comparison in terms of MASE.

Figure 4(a) graphically illustrates the effect of applying and not applying (WAFS) the various feature selection methods on MLP in terms of CC, and Figure 4(b) illustrates the same for LMS in terms of MAE. In both cases, the results are improved after applying the selected feature subsets.

Figure 5 graphically illustrates the effect of applying and not applying (WAFS) the feature selection methods on SVM in terms of MAPE. The graphical representation clearly shows that the result is improved in terms of MAPE after applying the selected feature subsets.

6. Performance Analysis through Statistical Tests

A statistical test provides a mechanism for making quantitative decisions about a process or processes. The intention is to determine whether there is enough evidence to "reject"


Table 6: Error measurements of the top three decisive regression algorithms' prediction accuracy with feature selection.

       MAE before   MAE after   MASE before   MASE after   Rank
LMS     77.19        73.37       0.63          0.59          1
MLP     91.02        84.31       0.74          0.68          2
SVM    126.88       122.11       1.03          0.99          3

Figure 3: MAE (a) and MASE (b) comparison of LMS, MLP, and SVM before and after the feature selection process.

Figure 4: Effects of applying and without applying various feature selection algorithms on MLP (a, correlation coefficient) and LMS (b, mean absolute error).

Figure 5: Effects of applying and without applying various feature selection algorithms on SVM (mean absolute percent error).


Table 7: Paired samples statistics.

                                         Mean     N   Std. deviation   Std. error mean
Pair 1   Actual data                    639.17    6      177.41            72.43
         LMS prediction                 712.54    6      107.25            43.78
Pair 2   Actual data                    639.17    6      177.41            72.43
         MLP prediction                 584.77    6      263.11           107.41
Pair 3   Actual data                    639.17    6      177.41            72.43
         SVM prediction (RBF kernel)    504.13    6      209.35            85.47
Pair 4   Actual data                    639.17    6      177.41            72.43
         SVM prediction (poly kernel)   517.05    6      183.19            74.79

Table 8: Paired samples correlations.

                                                      N   Correlation   Sig.
Pair 1   Actual data & LMS prediction                 6      0.967      0.002
Pair 2   Actual data & MLP prediction                 6      0.985      0.000
Pair 3   Actual data & SVM prediction (RBF kernel)    6      0.961      0.002
Pair 4   Actual data & SVM prediction (poly kernel)   6      0.962      0.002

an inference or hypothesis about the process. For a single group, the means of two variables are usually compared with the paired-samples t-test. It considers the differences between the values of the two variables on a case-by-case basis and examines whether the average differs from 0. It produces several kinds of output: descriptive statistics for the test variables, the correlation between the test variables, descriptive statistics for the paired differences, the t-test itself, and a 95% confidence interval. The paired-samples t-tests are carried out in order to determine whether any significant difference exists between the actual and the predicted results achieved by the three machine learning techniques selected for the ensemble method. The t-test is executed with the SPSS package, PASW Statistics 20 [26]. In this paper, the paired t-test is employed to verify the performance of the machine learning techniques used in the experiments. Here the null hypothesis and the alternative hypothesis are termed H_0 and H_a, respectively, where H_0 means there is no significant difference between the actual and predicted mean values and H_a means there is a significant difference between the actual and predicted mean values.

In Tables 7, 8, and 9, the paired samples statistics, the paired samples correlations, and the paired samples tests for the actual and predicted values of LMS, MLP, and SVM are presented, respectively. Observing Table 9, we find that for pair 1, t(5) = −2.28, P > 0.0001; for pair 2, t(5) = 1.43, P > 0.0001; for pair 3, t(5) = 5.28, P > 0.0001; and for pair 4, t(5) = 5.99, P > 0.0001. Given the means of the actual and predicted values of each pair and the direction of the t values, it can be concluded that there was no statistically significant difference between actual and predicted values for all the cases. Therefore, the test failed to reject the null hypothesis H_0.
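For reference, the same paired-samples t-test can be reproduced outside SPSS, for example with SciPy; `actual` and `predicted` stand for the six paired observations of one pair from Table 7 (a sketch only, not the SPSS procedure used here).

```python
from scipy import stats

def paired_t_test(actual, predicted, alpha=0.05):
    t, p = stats.ttest_rel(actual, predicted)   # H0: the paired means do not differ
    return t, p, ("reject H0" if p < alpha else "fail to reject H0")
```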

7. Conclusions

Feature selection is a fundamental issue in both regression and classification problems, especially for datasets with a very high volume of data. Applying feature selection methods to machine learning techniques may significantly increase performance in terms of accuracy. In this paper, various feature selection methods have been briefly described. In particular, the wrapper feature selection methods were found to be better, which is also justified by the results obtained from the experiments performed in this paper. From these experiments, it is found that for LMS the MAE before and after applying selected feature subsets is 77.19 and 73.37, respectively, and the MASE 0.63 and 0.59, respectively. In the case of MLP, the MAE before and after applying selected feature subsets is 91.02 and 84.31, respectively, and the MASE 0.74 and 0.68, respectively. For SVM, the MAE before and after applying selected feature subsets is 126.88 and 122.11, respectively, and the MASE 1.03 and 0.99, respectively. The comparison clearly shows that LMS, MLP, and SVM provide better prediction accuracy (i.e., reduced MAE and MASE) with selected feature


Table 9: Paired samples test (paired differences; 95% confidence interval of the difference).

                                            Mean    Std. dev.  Std. err.    Lower      Upper       t     df   Sig. (2-tailed)
Pair 1   Actual data − LMS prediction      −73.37    78.62      32.09     −155.88       9.14    −2.28    5       0.071
Pair 2   Actual data − MLP prediction       54.39    93.47      38.16      −43.69     152.48     1.43    5       0.213
Pair 3   Actual data − SVM (RBF kernel)    135.04    62.63      25.57       69.31     200.76     5.28    5       0.003
Pair 4   Actual data − SVM (poly kernel)   122.11    49.95      20.39       69.69     174.53     5.99    5       0.002

subsets than without selected feature subsets. The experimental results of this paper support the firm conclusion that devoting more attention and effort to the feature subset selection aspect (e.g., the effect of selected feature subsets on prediction accuracy, which is investigated in this paper) can significantly contribute to improving the accuracy of solar power prediction.

It is worth mentioning that, for these experiments, the machine learning techniques were applied with their default learning parameter settings. In the near future, new experiments will be performed with the intention of achieving better prediction accuracy from the selected machine learning techniques by applying both optimized (tuned) learning parameter settings and selected feature subsets.

Nomenclature

a_i: Actual values
H_a: Alternative hypothesis
C: Class attribute, C ∈ {1, ..., r}
T_i = S_n \ P_i: Complement dataset of P_i
cv: Cross validation
P_i: Divisions
k-cv: Error estimator
A(·): Induction algorithm
M: Mean
H_0: Null hypothesis
k: Number of folds
N: Number of instances
P: Set of n points
S: Standard deviation
S_n: Training set
P: Two-tailed significance value
r_i: Unknown errors or residuals
x: Unlabeled instance

Greek Symbols

ψ_i: Classifiers
ξ_k: Randomized error estimator
α: Significance level = 0.05

Abbreviations

BOM: Bureau of Meteorology
BE: Backward elimination
BS: Backward selection
BFS: Best first search
CSIRO: Commonwealth Scientific and Industrial Research Organization
CC: Correlation coefficient
Deg: Degree
DegC: Degree Celsius
FCBF: Fast correlation-based feature selection
FS: Forward selection
GA: Genetic algorithm
GS: Genetic search
hPa: Hectopascal
Km/h: Kilometers per hour
LFS: Linear forward selection
LMS: Least median square
MAE: Mean absolute error
MASE: Mean absolute scaled error
MAPE: Mean absolute percent error
MLP: Multilayer perceptron
PASW: Predictive analytics software
RECs: Renewable energy certificates
mm: Millimeter
NASA: National Aeronautics and Space Administration
NOAA: National Oceanic and Atmospheric Administration
NREL: National Renewable Energy Laboratory
PCCS: Positive correlation coefficient selection
SPSS: Statistical Package for the Social Sciences
SSFS: Subset size forward selection
SVM: Support vector machine
WAFS: Without applying feature selection
W/m²: Watts per square meter

References

[1] L. Yu and H. Liu, "Feature selection for high-dimensional data: a fast correlation based filter solution," in Proceedings of the 20th International Conference on Machine Learning, pp. 856–863, August 2003.

[2] F. Tan, Improving feature selection techniques for machine learning [Ph.D. thesis], Computer Science Dissertations, Paper 27, 2007, http://digitalarchive.gsu.edu/cs_diss/27.

[3] R. Kohavi and G. H. John, "Wrappers for feature subset selection," Artificial Intelligence, vol. 97, no. 1-2, pp. 273–324, 1997.

[4] D. Koller and M. Sahami, "Toward optimal feature selection," in Proceedings of the 13th International Conference on Machine Learning, pp. 284–292, 1996.

[5] M. Dash and H. Liu, "Feature selection for classification," Intelligent Data Analysis, vol. 1, no. 3, pp. 131–156, 1997.

[6] T. Mitchell, Machine Learning, McGraw Hill, 1997.

[7] D. A. Bell and H. Wang, "A formalism for relevance and its application in feature subset selection," Machine Learning, vol. 41, no. 2, pp. 175–195, 2000.

[8] M. Karagiannopoulos, D. Anyfantis, S. B. Kotsiantis, and P. E. Pintelas, "Feature selection for regression problems," in Proceedings of the 8th Hellenic European Research on Computer Mathematics & Its Applications (HERCMA '07), pp. 20–22, 2007.

[9] H. Liu and L. Yu, "Toward integrating feature selection algorithms for classification and clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 4, pp. 491–502, 2005.

[10] P. Langley, "Selection of relevant features in machine learning," in Proceedings of the AAAI Fall Symposium on Relevance, pp. 1–5, 1994.

[11] A. L. Blum and P. Langley, "Selection of relevant features and examples in machine learning," Artificial Intelligence, vol. 97, no. 1-2, pp. 245–271, 1997.

[12] I. Guyon and A. Elisseeff, "An introduction to variable and feature selection," Journal of Machine Learning Research, vol. 3, pp. 1157–1182, 2003.

[13] R. Caruana and D. Freitag, "Greedy attribute selection," in Proceedings of the 11th International Conference on Machine Learning, San Francisco, Calif, USA, 1994.

[14] M. Gutlein, E. Frank, M. Hall, and A. Karwath, "Large-scale attribute selection using wrappers," in Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining (CIDM '09), pp. 332–339, April 2009.

[15] M. Gutlein, Large scale attribute selection using wrappers [Diploma of Computer Science thesis], Albert-Ludwigs-Universitat Freiburg, 2006.

[16] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and H. Witten, "The WEKA data mining software: an update," ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, pp. 10–18, 2009.

[17] E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, 1989.

[18] M. R. Hossain, A. M. T. Oo, and A. B. M. S. Ali, "The combined effect of applying feature selection and parameter optimization on machine learning techniques for solar power prediction," American Journal of Energy Research, vol. 1, no. 1, pp. 7–16, 2013.

[19] S. Coppolino, "A new correlation between clearness index and relative sunshine," Renewable Energy, vol. 4, no. 4, pp. 417–423, 1994.

[20] http://www.solarguys.com.au.

[21] P. J. Rousseeuw, "Least median of squares regression," Journal of the American Statistical Association, vol. 79, no. 388, pp. 871–880, 1984.

[22] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice Hall, 1999.

[23] S. K. Shevade, S. S. Keerthi, C. Bhattacharyya, and K. R. K. Murthy, "Improvements to the SMO algorithm for SVM regression," IEEE Transactions on Neural Networks, vol. 11, no. 5, pp. 1188–1193, 2000.

[24] H. Zheng and A. Kusiak, "Prediction of wind farm power ramp rates: a data-mining approach," Journal of Solar Energy Engineering, vol. 131, no. 3, Article ID 031011, 8 pages, 2009.

[25] R. J. Hyndman and A. B. Koehler, "Another look at measures of forecast accuracy," International Journal of Forecasting, vol. 22, no. 4, pp. 679–688, 2006.

[26] IBM SPSS Statistics for Windows, Version 20.0, IBM Corporation, New York, NY, USA, 2011.


Page 2: Research Article The Effectiveness of Feature Selection ...downloads.hindawi.com/journals/jre/2013/952613.pdf · ort towards the feature subset selection aspect (e.g., selected feature

2 Journal of Renewable Energy

Original Subsetdata set

Goodness of subset

Subset generation

Subset evaluation

Result validation

Extract the best subset

Stopping criterion

No Yes

Figure 1 Key sequences of feature selection

and attempts to reach the most advantageous ones Anotheraspect is called evaluator which is basically an evaluationalgorithm to make a decision about the goodness of theplanned feature subset and finally returns the assessmentabout righteousness of the search method [8] On the otherhand lacking an appropriate stopping condition the featureselection procedure could run exhaustively or everlastinglyall theway throughout the rawdataset Itmay be discontinuedwhenever any attribute is inserted or deleted but ultimatelynot producing a better subset or whenever a subset isproduced which provides the maximum benefits accordingto some assessing functions A feature selector may stopmanipulating features when the merit of a current featuresubset stops improving or conversely does not degrade Ausual feature selection procedure revealed in Figure 1 consistsof four fundamental steps (A) subset generation (B) subsetevaluation (C) stopping criterion and (D) result validation[9]The procedure begins with subset generation that utilizesa definite search strategy to generate candidate feature sub-sets Then each candidate subset is evaluated according to adefinite evaluation condition and comparedwith the previousbest one If it is better it replaces the previous best Theprocess of subset generation and evaluation is repetitive untila given stopping condition is satisfied Finally the selectedbest feature subset is validated by some test data Figure 1graphically demonstrates the previous mentioned steps andprocedures of feature selection process

Based on some evaluation functions and calculationsfeature selection methods find out the best feature fromdifferent candidate subsets Usually feature selection meth-ods are classified into two general groups (ie filter andwrapper) [10] Inductive algorithms are used by wrappermethods as the evaluation function whereas filter methodsare independent of the inductive algorithm Wrapper meth-ods work along wrapping the feature selection in conjunctionwith the induction algorithm to be used and to accomplishthis wrapper methods use cross validation With the sametraining data it may happen that individual regression algo-rithm can perform better with different feature subsets [11]Widely used wrapper selection methods are briefly discussedin the following section Section 3 deals with real life datacollection and analysis of the dataset Sections 4 and 5 showthe experimental results using machine learning techniquesand selected feature subsets Comparison of the results andgraphical presentation of those results are presented in thosetwo sections Section 6 demonstrates the performance ofthose obtained prediction results by paired 119905-testsThe results

from the experiments demonstrate that LMS MLP and SVMsupplied with selected feature subsets provide better predic-tion accuracy (ie reducedMAE andMASE) than when theyare without the selected feature subsets Concluding remarksare provided in final section of this paper

2 Wrapper Methods of Feature Selection

In the field of machine learning the feature subset selectionthat is also named as attribute subset selection variable selec-tion or variable subset selection is an important method thathelps to select an appropriate subset of significant featuresfor model development In machine learning based advancedand sophisticated applications (eg solar power prediction)feature selection methods have become an obvious needGenerally feature selection methods are categorised intothree different classes [3 12] filter selection methods selectfeature subsets as a preprocessing act autonomously ofthe selected predictor wrapper selection methods exploitthe predictive power of the machine learning technique toselect suitable feature subsets and to conclude embeddedselection methods usually select feature subsets in the courseof training Wrapper methods were used for the experimentsin this paper Therefore in the subsequent section widelyused and accepted wrapper methods are placed

The wrapper methods use the performance (eg regres-sion classification or prediction accuracy) of an inductionalgorithm for feature subset evaluation Figure 2 shows theideas behind wrapper approaches [3] For each generatedfeature subset 119878 wrappers evaluate its goodness by applyingthe induction algorithm to the dataset using features in subset119878 Wrappers can find feature subsets with high accuracybecause the features match well with the learning algorithms

The easiest method among all the wrapper selectionalgorithms is the forward selection (FS) This method startsthe procedurewithout having any feature in the feature subsetand follows a greedy approach so that it can sequentiallyadd features until no possible single feature addition resultsin a higher valuation of the induction function Backwardelimination (BE) begins with the complete feature set andgradually removes features as long as the valuation doesnot degrade Description about forward selection (FS) andbackward selection (BS) can be found in [13] where theauthors proved that wrapper selection methods are betterthan methods having no selection

Starting with an empty set of features the best first search(BFS) produces every possible individual feature extension

Journal of Renewable Energy 3

Training data

Feature subset Estimated accuracy

Final subset

Test data

Subset generation

Induction algorithm

Feature Hypothesissubset

Subset evaluation

Induction algorithm

Result validation

Figure 2 The wrapper approach for feature selection [3]

[3] BFS exploits the greedy hill climbing approach in con-junction with backtracking to search the space of featuresubsets BFS has all the flexibility to start with an empty subsetand search in forward direction Alternatively it can starthaving full set of attributes and search in backward directionor it can start randomly from any point and move towardsany direction Extension of the BFS is the linear forwardselection (LFS) A limited number of attributes 119896 are takeninto consideration by LFS This method either selects the top119896 attributes by initial ordering or it can carry put a ranking[14 15]

Subset size forward selection (SSFS) is the extension ofLFS SSFS carries out an internal cross validation An LFS isexecuted on every fold to find out the best possible subsetsize [15 16] Through the individual evaluations attributesare ranked by the ranker search It uses this search incombination with attribute evaluators [16]

GA performs a search using the simple genetic algorithmdescribed in Goldbergrsquos study [17] Genetic algorithms arerandom search techniques based on the principles of naturalselection [17] They utilize a population of competing solu-tions evolved to an optimal solution For feature selectiona solution is a fixed length binary sequence representinga feature subset The value of each positionmdashtypically 1 or0mdashin the sequence represents the presence or absence ofa particular feature respectively The algorithm proceedsin an iterative manner where each successive generation isproduced by applying genetic operators to the members ofthe current generation Nonetheless GAs naturally involves ahuge quantity of evaluations or iterations to achieve optimalsolution Other than all these conventional methods we haveexperimentally verified an unconventional approach In thismethod we calculated the correlation coefficient for each(except the target attribute) of the competing attributes withrespect to the target attribute of the used dataset For thispurpose we used Pearsonrsquos correlation coefficient formulawhich is described in the next section After the attribute wisecalculation we selected those attributes whose correlationcoefficient values are positive only as feature subset Theattributes having negative correlation coefficient are ignored

for this case We named this method positive correlationcoefficient selection (PCCS) [18]

3 Real Life Data Collection and Analysis

One of the key conditions to successfully perform the experi-ments of this paper is to collect recent reliable accurate andlong-term historical weather data of the particular locationHowever finding accurate multiyear data near the experi-ment site has always proved to be challenging because thesedata are not readily available due to the cost and difficultyin measurement [19] There are only few sites in Australiaproviding research-quality data of solar radiation so thesedata are generally not available Rockhampton a subtropicaltown in North Australia was chosen for the experiments ofthis research The selected station is ldquoRockhampton Aerordquohaving latitude of minus2338 and longitude of 15048 Accordingto the Renewable Energy Certificates (RECs) zones withinAustralia Rockhampton is identified within the most impor-tant zone [20] The recent data were collected from Com-monwealth Scientific and Industrial Research Organization(CSIRO) Australia

Data were also collected from the Australian Bureau ofMeteorology (BOM) the National Aeronautics and SpaceAdministration (NASA) and the National Oceanic andAtmospheric Administration (NOAA) All of this mission-critical information on solar radiation was being estimatedfrom cloud cover and humidity at airports Free data are avail-able from National Renewable Energy Laboratory (NREL)and NASA These are excellent for multiyear averages butperform poorly for hourly and daily measurements Afteranalyzing the raw data collected from different sources thedata provided by CSIRO were finally selected The data usedin this paper are based on hourly global solar irradianceground measurements which are a significant aspect of thedataset These data were gathered for a period of five yearsfrom 2006 to 2010

The attributes in the used dataset are average air temper-ature average wind speed current wind direction averagerelative humidity total rainfall wind speed wind directionmaximum peak wind gust current evaporation averageabsolute barometer and average solar radiation Table 1 rep-resents the statistical properties of the raw data

4 Applying Feature Selection Techniques onthe Dataset

All the research works related to solar radiation predictionselect the input features or attributes randomly Unlikethe conventional way this research experimented with themaximumnumber of features and found out the best possiblecombination of features for the individual learning models ofthe hybrid model To perform the experiments for selectingsignificant feature subsets for individual machine learningtechnique the traditional BFS LFS SSFS ranker search GSand our very own PCCS selection methods are used To

4 Journal of Renewable Energy

Table 1 Statistical description of the raw data set

Min Max Mean Std devAvg air temp (DegC) minus58 401 2047 699Avg wind speed (Kmh) 0 271 699 478Current wind dir (Deg) 0 359 15891 10366Avg relative humidity () 0 100 5511 2426Total rainfall (mm) 0 304 007 069Wind speed (Kmh) 0 2483 577 438Wind direction (Deg) 0 360 16991 10984Max peak wind gust (Kmh) 0 106 2045 1133Current evaporation (mm) minus136 136 031 028Avg abs barometer (hPa) 921 1020 96659 1209Solar radiation (Wm2) 1 1660 30075 32517

carry out experiments three algorithms for machine learn-ing technique namely least median square [21] multilayerperceptrons [22] and support vector machine [23] are used

Evaluating the degree of fitness that is how well aregressionmodel fits to a dataset is usually obtained by corre-lation coefficient Assuming the actual values as 119886

1 1198862 119886

119899

and the predicted values as 1199011 1199012 119901

119899 the correlation

coefficiency is known by the equation

119877 =119878119875119860

radic119878119875119878119860

(1)

where

119878119875119860

=sum119894(119901119894minus 119901) (119886

119894minus 119886)

119899 minus 1

119878119875=

sum119894(119901119894minus 119901)2

119899 minus 1 119878119860=

sum119894(119886119894minus 119886)2

119899 minus 1

(2)

To find out the correlation coefficient of the model the fulltraining set is partitioned into ten mutually exclusive andsame-sized subsets The performance of the subset dependson the accuracy of predicting test values For every individualalgorithm this cross validation method was run over tentimes and finally the average value for 10 cross validationswas calculated In 119896-cv a dataset 119878

119899is uniformly partitioned

into 119896 folds of similar size 119875 = 1198751 119875119896 For the sake of

clarity and without loss of generality it is supposed that 119899is multiple of 119896 Let 119879

119894= 119878119899119875119894be the complement dataset

of 119875119894 Then the algorithm 119860(sdot) induces a classifier from 119879

119894

120595119894= 119860(119879

119894) and estimates its prediction error with119875

119894The 119896-cv

prediction error estimator of 120595 = 119860(119878119899) is defined as follows

[24]

120585119896(119878119899 119875) =

1

119899

119896

sum

119894=1

sum

(119909119888)isin119875119894

1 (119888 120595119894(119909)) (3)

where 1(119894 119895) = 1 if and only if 119894 = 119895 and equal to zerootherwise So the 119896-cv error estimator is the average of theerrors made by the classifiers 120595

119894in their respective divisions

119875119894

According to Zheng and Kusiak in [24] the mean abso-lute error (MAE) and mean absolute percent error (MAPE)are used to measure the prediction performance we havealso used these evaluation metrics for our experiments Thedefinitions are expressed as

MAE =sum119899

119894=1

1003816100381610038161003816119901119894minus 119886119894

1003816100381610038161003816

119899

MAPE =(sum |PE|)

119899

(4)

where PE = (119864119886) lowast 100 119864 = (119886 minus 119901) 119886 = actual values119901 = predicted values and 119899 = number of occurrences

The error of the experimental results was also analyzed using the mean absolute scaled error (MASE) [25]. MASE is scale free, less sensitive to outliers, easier to interpret than other measures, and less variable for small samples. MASE is also suitable for intermittent demand series, as it never produces infinite or undefined results. The prediction with the smallest MASE is counted as the most accurate among all the alternatives [25]. Equation (5) states the formula to calculate MASE:

\[
\mathrm{MASE} = \frac{\mathrm{MAE}}{\dfrac{1}{n-1} \sum_{i=2}^{n} \left| a_i - a_{i-1} \right|}, \tag{5}
\]

where

\[
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| p_i - a_i \right|. \tag{6}
\]
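The three error measures can be computed in a few lines; the sketch below simply restates equations (4)-(6) (array names are arbitrary and not taken from the original experiments).

```python
import numpy as np

def mae(a, p):
    """Mean absolute error, equations (4) and (6)."""
    return np.mean(np.abs(p - a))

def mape(a, p):
    """Mean absolute percent error, equation (4), with PE = 100 * (a - p) / a."""
    return np.mean(np.abs(100.0 * (a - p) / a))

def mase(a, p):
    """Mean absolute scaled error, equation (5): MAE scaled by the in-sample
    MAE of the naive (previous-value) forecast."""
    scale = np.mean(np.abs(np.diff(a)))  # (1/(n-1)) * sum |a_i - a_{i-1}|
    return mae(a, p) / scale
```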

4.1. Prediction of the Machine Learning Techniques Using the Selected Feature Subsets. Various feature subsets were generated or selected using the different wrapper feature selection methods. Afterwards, six-hours-ahead solar radiation prediction by the selected machine learning techniques, namely, LMS, MLP, and SVM, was performed. In this case, the selected feature subsets were supplied to the individual machine learning techniques. The intention of this experiment was to observe whether this initiative produces any improvement in the error reduction of those selected machine learning techniques. For these experiments, any tuning of the particular algorithms to the dataset was avoided; the default values of the learning parameters were used throughout. The following tables report the CC, MAE, MAPE, and MASE of six-hours-ahead prediction for each machine learning technique supplied with the different feature subsets. In all the experiments, "W" is used to indicate that a particular machine learning technique supplied with the selected feature subsets statistically outperforms the one without applying feature selection (WAFS) methods.
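A hedged sketch of this experimental set-up is given below. The paper's experiments used the BFS, LFS, SSFS, ranker search, GS, and PCCS methods with default learning parameters; the scikit-learn estimators, the SequentialFeatureSelector wrapper, and the six-hour target shift shown here are stand-ins chosen only to illustrate the overall procedure, not the tools actually used.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_predict

HORIZON = 6  # six-hours-ahead prediction

def run_experiment(X, y, estimator):
    """Wrapper feature selection followed by six-hours-ahead prediction.
    X: hourly weather attributes (numpy array), y: solar radiation series."""
    # shift the target so that row t predicts radiation at t + HORIZON
    X, y = X[:-HORIZON], y[HORIZON:]

    # wrapper search: candidate subsets are scored by the estimator itself
    selector = SequentialFeatureSelector(
        estimator, direction="forward",
        scoring="neg_mean_absolute_error", cv=10)
    X_sel = selector.fit_transform(X, y)

    # 10-fold cross-validated predictions with and without feature selection
    p_before = cross_val_predict(estimator, X, y, cv=10)
    p_after = cross_val_predict(estimator, X_sel, y, cv=10)
    return (np.mean(np.abs(p_before - y)),  # MAE without feature selection
            np.mean(np.abs(p_after - y)))   # MAE with the selected subset

# Hypothetical usage with default learning parameters, as in the paper:
# for est in (MLPRegressor(), SVR()):
#     print(run_experiment(X, y, est))
```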

Tables 2 and 3 present the obtained CC and MAE for applying the LMS, MLP, and SVM machine learning techniques for six-hours-in-advance prediction on the used dataset, before and after the feature selection process.

In Tables 4 and 5, the MAPE and MASE are shown before and after the feature selection processes are applied to the LMS, MLP, and SVM machine learning techniques for the same purpose.


Table 2: Achieved CC after applying various wrapper selection methods on LMS, MLP, and SVM.

       WAFS   BFS    LFS    SSFS   Ranker     GS     PCCS
LMS    0.95   0.96   0.96   0.96   0.96       0.96   0.97 (W)
MLP    0.98   0.97   0.97   0.98   0.99 (W)   0.97   0.98
SVM    0.96   0.96   0.96   0.96   0.97 (W)   0.96   0.96

Table 3: Achieved MAE after applying various wrapper selection methods on LMS, MLP, and SVM.

       WAFS     BFS      LFS      SSFS     Ranker       GS       PCCS
LMS    77.19    76.81    74.49    74.12    74.93        87.53    73.37 (W)
MLP    91.02    168.34   222.73   119.11   84.31 (W)    288.83   110.57
SVM    126.88   123.46   129.51   123.42   122.12 (W)   125.59   124.52

Table 4: Achieved MAPE after applying various wrapper selection methods on LMS, MLP, and SVM.

       WAFS    BFS     LFS     SSFS    Ranker      GS      PCCS
LMS    17.65   19.53   17.08   17.04   16.93       17.49   16.82 (W)
MLP    20.17   50.53   41.83   23.46   17.87 (W)   32.83   21.50
SVM    21.72   21.53   22.35   21.35   20.88 (W)   21.35   21.65

Table 5: Achieved MASE after applying various wrapper selection methods on LMS, MLP, and SVM.

       WAFS   BFS    LFS    SSFS   Ranker     GS     PCCS
LMS    0.63   0.71   0.61   0.60   0.61       0.62   0.59 (W)
MLP    0.74   2.35   1.81   0.97   0.68 (W)   1.37   0.90
SVM    1.03   1.02   1.05   1.00   0.99 (W)   1.00   1.01

The experimental results show that PCCS is a somewhat superior feature selection method for the LMS algorithm, considering all the instances. It is noticeable that all the feature selection methods contributed to improving the CC of the LMS algorithm. However, in the case of MAE, all the selection algorithms except GS improve the results for LMS, and in the cases of both MAPE and MASE, BFS is the only selection method which does not improve the results for LMS. The results also indicate that the ranker search is to some extent a superior feature selection method for the MLP algorithm. All the feature selection methods present a nearly identical CC for the MLP algorithm, but in the cases of MAE, MAPE, and MASE, the ranker search is the only selection method which improves the results. Finally, the obtained results illustrate that the ranker search is again to some extent a superior feature selection method for SVM. All the feature selection methods present either a nearly identical or an equal CC for SVM; however, in the cases of MAE, MAPE, and MASE, LFS is the only one which is unable to improve the results for SVM.

5. Prediction Results Before versus after Applying the Feature Selection Techniques

In Table 6, the prediction errors (MAE and MASE) of the individual machine learning techniques are compared before and after supplying the selected feature subsets to them. The comparative results show that the errors are reduced in all the instances after supplying the selected feature subsets. The terms MAE BEFORE and MASE BEFORE represent the MAE and MASE results without the selected feature subsets, whereas the terms MAE AFTER and MASE AFTER represent the MAE and MASE results with the selected feature subsets.

Figure 3(a) graphically demonstrates the prediction accuracy comparison of LMS, MLP, and SVM before and after applying the selected feature subsets in terms of MAE; Figure 3(b) shows the same comparison in terms of MASE.

Figure 4(a) graphically illustrates the significance of applying and of not applying (WAFS) the various feature selection methods on MLP in terms of CC, and Figure 4(b) does so for LMS in terms of MAE. In both cases the results are improved after applying the selected feature subsets.

Figure 5 graphically illustrates the significance of applying and of not applying (WAFS) the feature selection methods on SVM in terms of MAPE. The graphical representation clearly shows that the result in terms of MAPE is improved after applying the selected feature subsets.

6. Performance Analysis through Statistical Tests

A statistical test provides a mechanism for making quantitative decisions about a process or processes. The intention is to determine whether there is enough evidence to "reject" an inference or hypothesis about the process.


Table 6: Error measurements of the top three decisive regression algorithms' prediction accuracy with feature selection.

       MAE before   MAE after   MASE before   MASE after   Rank
LMS    77.19        73.37       0.63          0.59         1
MLP    91.02        84.31       0.74          0.68         2
SVM    126.88       122.11      1.03          0.99         3

Figure 3: MAE (a) and MASE (b) comparison of LMS, MLP, and SVM before and after the feature selection process.

Figure 4: Effects of applying and without applying (WAFS) various feature selection algorithms on MLP in terms of correlation coefficient (a) and on LMS in terms of mean absolute error (b).

Figure 5: Effects of applying and without applying (WAFS) various feature selection algorithms on SVM in terms of mean absolute percent error.


Table 7: Paired samples statistics.

                                             Mean     N   Std deviation   Std error mean
Pair 1   Actual data                         639.17   6   177.41          72.43
         LMS prediction                      712.54   6   107.25          43.78
Pair 2   Actual data                         639.17   6   177.41          72.43
         MLP prediction                      584.77   6   263.11          107.41
Pair 3   Actual data                         639.17   6   177.41          72.43
         SVM prediction with RBF kernel      504.13   6   209.35          85.47
Pair 4   Actual data                         639.17   6   177.41          72.43
         SVM prediction with poly kernel     517.05   6   183.19          74.79

Table 8: Paired samples correlations.

                                                          N   Correlation   Sig.
Pair 1   Actual data & LMS prediction                     6   0.967         0.002
Pair 2   Actual data & MLP prediction                     6   0.985         0.000
Pair 3   Actual data & SVM prediction with RBF kernel     6   0.961         0.002
Pair 4   Actual data & SVM prediction with poly kernel    6   0.962         0.002

For a single group, the means of two variables are usually compared with the paired-samples $t$-test. It considers the differences between the values of the two variables on a case-by-case basis and examines whether the average differs from 0. It produces several types of output, such as descriptive statistics for the test variables, the correlation between the test variables, descriptive statistics for the paired differences, the $t$-test itself, and a 95% confidence interval. The paired-samples $t$-tests are carried out in order to determine whether any significant difference exists between the actual results and the predicted results achieved by the three machine learning techniques selected for the ensemble method. The $t$-test is executed with the SPSS package, PASW Statistics 20 [26]. In this paper, the paired $t$-test is employed to verify the performance of the machine learning techniques used in the experiments. Here the null hypothesis and the alternative hypothesis are denoted $H_0$ and $H_a$, respectively, where $H_0$ means there is no significant difference between the actual and predicted mean values and $H_a$ means there is a significant difference between the actual and predicted mean values.

In Tables 7, 8, and 9, the paired samples statistics, the paired samples correlations, and the paired samples tests for the actual and predicted values of LMS, MLP, and SVM are presented, respectively. Observing Table 9, we find that for pair 1, $t(5) = -2.28$; for pair 2, $t(5) = 1.43$; for pair 3, $t(5) = 5.28$; and for pair 4, $t(5) = 5.99$, with $P > 0.0001$ in every case. Considering the means of the actual and predicted values of each pair and the direction of the $t$ values, it is concluded that there was no statistically significant difference between the actual and predicted values in all the cases. Therefore, the test failed to reject the null hypothesis $H_0$.
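The same paired comparison can be reproduced with any statistics package. The sketch below uses scipy's paired-samples t-test purely as an illustration; the six actual and predicted values are not listed in the paper, so no data are shown and the function is a generic stand-in rather than the SPSS procedure actually used.

```python
import numpy as np
from scipy import stats

def paired_t_test(actual: np.ndarray, predicted: np.ndarray, alpha: float = 0.05):
    """Paired-samples t-test of H0: no significant difference between the
    means of the actual and predicted values (two-tailed)."""
    t_stat, p_value = stats.ttest_rel(actual, predicted)
    df = len(actual) - 1
    reject_h0 = p_value < alpha
    return t_stat, df, p_value, reject_h0

# Hypothetical usage (the six paired observations are not published):
# t, df, p, reject = paired_t_test(actual, lms_predictions)
```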

7. Conclusions

Feature selection is a fundamental issue in both regression and classification problems, especially for datasets with a very high volume of data. Applying feature selection methods to machine learning techniques may significantly contribute to increased performance in terms of accuracy. In this paper, various feature selection methods have been briefly described. In particular, the wrapper feature selection methods are found to perform better, which is also justified by the results obtained from the experiments performed in this paper. From these experiments it is found that for LMS the MAE before and after applying the selected feature subsets is 77.19 and 73.37, respectively, and the MASE is 0.63 and 0.59, respectively. In the case of MLP, the MAE before and after applying the selected feature subsets is 91.02 and 84.31, respectively, and the MASE is 0.74 and 0.68, respectively. For SVM, the MAE before and after applying the selected feature subsets is 126.88 and 122.11, respectively, and the MASE is 1.03 and 0.99, respectively. The comparison between the results clearly shows that LMS, MLP, and SVM provide better prediction accuracy (i.e., reduced MAE and MASE) with the selected feature subsets than without them.


Table 9: Paired samples test.

                                           Paired differences                                        t       df   Sig. (2-tailed)
                                           Mean      Std dev   Std error   95% CI lower   95% CI upper
Pair 1   Actual data - LMS prediction      -73.37    78.62     32.09       -155.88        9.14        -2.28   5    0.071
Pair 2   Actual data - MLP prediction       54.39    93.47     38.16        -43.69        152.48       1.43   5    0.213
Pair 3   Actual data - SVM prediction      135.04    62.63     25.57         69.31        200.76       5.28   5    0.003
         with RBF kernel
Pair 4   Actual data - SVM prediction      122.11    49.95     20.39         69.69        174.53       5.99   5    0.002
         with poly kernel

The experimental results of this paper support the concrete verdict that giving more attention and effort to the feature subset selection aspect (e.g., the effect of selected feature subsets on prediction accuracy, which is investigated in this paper) can significantly contribute to improving the accuracy of solar power prediction.

It is worth mentioning that for these experiments the machine learning techniques were applied with the default learning parameter settings. In the near future, new experiments will be performed with the intention of achieving better prediction accuracy of the selected machine learning techniques by applying both optimized or tuned learning parameter settings and selected feature subsets.

Nomenclature

a_i:               Actual values
H_a:               Alternative hypothesis
C:                 Class attribute, C ∈ {1, ..., r}
T_i = S_n \ P_i:   Complement dataset of P_i
cv:                Cross validation
P_i:               Divisions (folds)
k-cv:              Error estimator
A(·):              Induction algorithm
M:                 Mean
H_0:               Null hypothesis
k:                 Number of folds
N:                 Number of instances
P:                 Set of n points
S:                 Standard deviation
S_n:               Training set
P:                 Two-tailed significance value
r_i:               Unknown errors or residuals
x:                 Unlabeled instance

Greek Symbols

ψ_i:   Classifiers
ξ_k:   Randomized error estimator
α:     Significance level = 0.05

Abbreviations

BOM:     Bureau of Meteorology
BE:      Backward elimination
BS:      Backward selection
BFS:     Best first search
CSIRO:   Commonwealth Scientific and Industrial Research Organization
CC:      Correlation coefficient
Deg:     Degree
DegC:    Degree Celsius
FCBF:    Fast correlation-based feature selection
FS:      Forward selection
GA:      Genetic algorithm
GS:      Genetic search
hPa:     Hectopascals
Km/h:    Kilometer per hour
LFS:     Linear forward selection
LMS:     Least median square
MAE:     Mean absolute error
MASE:    Mean absolute scaled error
MAPE:    Mean absolute percent error
MLP:     Multilayer perceptron
PASW:    Predictive analytic software
RECs:    Renewable energy certificates
mm:      Millimeter
NASA:    National Aeronautics & Space Admin.
NOAA:    National Oceanic & Atmospheric Admin.
NREL:    National Renewable Energy Laboratory
PCCS:    Positive correlation coefficient selection
SPSS:    Statistical Package for Social Sciences
SSFS:    Subset size forward selection
SVM:     Support vector machine
WAFS:    Without applying feature selection
W/m2:    Watt per meter square

References

[1] L. Yu and H. Liu, "Feature selection for high-dimensional data: a fast correlation based filter solution," in Proceedings of the 20th International Conference on Machine Learning, pp. 856-863, August 2003.

[2] F. Tan, Improving feature selection techniques for machine learning [Ph.D. thesis], Computer Science Dissertations, Paper 27, 2007, http://digitalarchive.gsu.edu/cs_diss/27.

[3] R. Kohavi and G. H. John, "Wrappers for feature subset selection," Artificial Intelligence, vol. 97, no. 1-2, pp. 273-324, 1997.

[4] D. Koller and M. Sahami, "Toward optimal feature selection," in Proceedings of the 13th International Conference on Machine Learning, pp. 284-292, 1996.

[5] M. Dash and H. Liu, "Feature selection for classification," Intelligent Data Analysis, vol. 1, no. 3, pp. 131-156, 1997.

[6] T. Mitchell, Machine Learning, McGraw Hill, 1997.

[7] D. A. Bell and H. Wang, "A formalism for relevance and its application in feature subset selection," Machine Learning, vol. 41, no. 2, pp. 175-195, 2000.

[8] M. Karagiannopoulos, D. Anyfantis, S. B. Kotsiantis, and P. E. Pintelas, "Feature selection for regression problems," in Proceedings of the 8th Hellenic European Research on Computer Mathematics & Its Applications (HERCMA '07), pp. 20-22, 2007.

[9] H. Liu and L. Yu, "Toward integrating feature selection algorithms for classification and clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 4, pp. 491-502, 2005.

[10] P. Langley, "Selection of relevant features in machine learning," in Proceedings of the AAAI Fall Symposium on Relevance, pp. 1-5, 1994.

[11] A. L. Blum and P. Langley, "Selection of relevant features and examples in machine learning," Artificial Intelligence, vol. 97, no. 1-2, pp. 245-271, 1997.

[12] I. Guyon and A. Elisseeff, "An introduction to variable and feature selection," Journal of Machine Learning Research, vol. 3, pp. 1157-1182, 2003.

[13] R. Caruana and D. Freitag, "Greedy attribute selection," in Proceedings of the 11th International Conference on Machine Learning, San Francisco, Calif, USA, 1994.

[14] M. Gutlein, E. Frank, M. Hall, and A. Karwath, "Large-scale attribute selection using wrappers," in Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining (CIDM '09), pp. 332-339, April 2009.

[15] M. Gutlein, Large scale attribute selection using wrappers [Diploma of Computer Science thesis], Albert-Ludwigs-Universitat Freiburg, 2006.

[16] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and H. Witten, "The WEKA data mining software: an update," ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, pp. 10-18, 2009.

[17] E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, 1989.

[18] M. R. Hossain, A. M. T. Oo, and A. B. M. S. Ali, "The combined effect of applying feature selection and parameter optimization on machine learning techniques for solar power prediction," American Journal of Energy Research, vol. 1, no. 1, pp. 7-16, 2013.

[19] S. Coppolino, "A new correlation between clearness index and relative sunshine," Renewable Energy, vol. 4, no. 4, pp. 417-423, 1994.

[20] http://www.solarguys.com.au.

[21] P. J. Rousseeuw, "Least median of squares regression," Journal of the American Statistical Association, vol. 79, no. 388, pp. 871-880, 1984.

[22] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice Hall, 1999.

[23] S. K. Shevade, S. S. Keerthi, C. Bhattacharyya, and K. R. K. Murthy, "Improvements to the SMO algorithm for SVM regression," IEEE Transactions on Neural Networks, vol. 11, no. 5, pp. 1188-1193, 2000.

[24] H. Zheng and A. Kusiak, "Prediction of wind farm power ramp rates: a data-mining approach," Journal of Solar Energy Engineering, vol. 131, no. 3, Article ID 031011, 8 pages, 2009.

[25] R. J. Hyndman and A. B. Koehler, "Another look at measures of forecast accuracy," International Journal of Forecasting, vol. 22, no. 4, pp. 679-688, 2006.

[26] IBM SPSS Statistics for Windows, Version 20.0, IBM Corporation, New York, NY, USA, 2011.


Page 3: Research Article The Effectiveness of Feature Selection ...downloads.hindawi.com/journals/jre/2013/952613.pdf · ort towards the feature subset selection aspect (e.g., selected feature

Journal of Renewable Energy 3

Training data

Feature subset Estimated accuracy

Final subset

Test data

Subset generation

Induction algorithm

Feature Hypothesissubset

Subset evaluation

Induction algorithm

Result validation

Figure 2 The wrapper approach for feature selection [3]

[3] BFS exploits the greedy hill climbing approach in con-junction with backtracking to search the space of featuresubsets BFS has all the flexibility to start with an empty subsetand search in forward direction Alternatively it can starthaving full set of attributes and search in backward directionor it can start randomly from any point and move towardsany direction Extension of the BFS is the linear forwardselection (LFS) A limited number of attributes 119896 are takeninto consideration by LFS This method either selects the top119896 attributes by initial ordering or it can carry put a ranking[14 15]

Subset size forward selection (SSFS) is the extension ofLFS SSFS carries out an internal cross validation An LFS isexecuted on every fold to find out the best possible subsetsize [15 16] Through the individual evaluations attributesare ranked by the ranker search It uses this search incombination with attribute evaluators [16]

GA performs a search using the simple genetic algorithmdescribed in Goldbergrsquos study [17] Genetic algorithms arerandom search techniques based on the principles of naturalselection [17] They utilize a population of competing solu-tions evolved to an optimal solution For feature selectiona solution is a fixed length binary sequence representinga feature subset The value of each positionmdashtypically 1 or0mdashin the sequence represents the presence or absence ofa particular feature respectively The algorithm proceedsin an iterative manner where each successive generation isproduced by applying genetic operators to the members ofthe current generation Nonetheless GAs naturally involves ahuge quantity of evaluations or iterations to achieve optimalsolution Other than all these conventional methods we haveexperimentally verified an unconventional approach In thismethod we calculated the correlation coefficient for each(except the target attribute) of the competing attributes withrespect to the target attribute of the used dataset For thispurpose we used Pearsonrsquos correlation coefficient formulawhich is described in the next section After the attribute wisecalculation we selected those attributes whose correlationcoefficient values are positive only as feature subset Theattributes having negative correlation coefficient are ignored

for this case We named this method positive correlationcoefficient selection (PCCS) [18]

3 Real Life Data Collection and Analysis

One of the key conditions to successfully perform the experi-ments of this paper is to collect recent reliable accurate andlong-term historical weather data of the particular locationHowever finding accurate multiyear data near the experi-ment site has always proved to be challenging because thesedata are not readily available due to the cost and difficultyin measurement [19] There are only few sites in Australiaproviding research-quality data of solar radiation so thesedata are generally not available Rockhampton a subtropicaltown in North Australia was chosen for the experiments ofthis research The selected station is ldquoRockhampton Aerordquohaving latitude of minus2338 and longitude of 15048 Accordingto the Renewable Energy Certificates (RECs) zones withinAustralia Rockhampton is identified within the most impor-tant zone [20] The recent data were collected from Com-monwealth Scientific and Industrial Research Organization(CSIRO) Australia

Data were also collected from the Australian Bureau ofMeteorology (BOM) the National Aeronautics and SpaceAdministration (NASA) and the National Oceanic andAtmospheric Administration (NOAA) All of this mission-critical information on solar radiation was being estimatedfrom cloud cover and humidity at airports Free data are avail-able from National Renewable Energy Laboratory (NREL)and NASA These are excellent for multiyear averages butperform poorly for hourly and daily measurements Afteranalyzing the raw data collected from different sources thedata provided by CSIRO were finally selected The data usedin this paper are based on hourly global solar irradianceground measurements which are a significant aspect of thedataset These data were gathered for a period of five yearsfrom 2006 to 2010

The attributes in the used dataset are average air temper-ature average wind speed current wind direction averagerelative humidity total rainfall wind speed wind directionmaximum peak wind gust current evaporation averageabsolute barometer and average solar radiation Table 1 rep-resents the statistical properties of the raw data

4 Applying Feature Selection Techniques onthe Dataset

All the research works related to solar radiation predictionselect the input features or attributes randomly Unlikethe conventional way this research experimented with themaximumnumber of features and found out the best possiblecombination of features for the individual learning models ofthe hybrid model To perform the experiments for selectingsignificant feature subsets for individual machine learningtechnique the traditional BFS LFS SSFS ranker search GSand our very own PCCS selection methods are used To

4 Journal of Renewable Energy

Table 1 Statistical description of the raw data set

Min Max Mean Std devAvg air temp (DegC) minus58 401 2047 699Avg wind speed (Kmh) 0 271 699 478Current wind dir (Deg) 0 359 15891 10366Avg relative humidity () 0 100 5511 2426Total rainfall (mm) 0 304 007 069Wind speed (Kmh) 0 2483 577 438Wind direction (Deg) 0 360 16991 10984Max peak wind gust (Kmh) 0 106 2045 1133Current evaporation (mm) minus136 136 031 028Avg abs barometer (hPa) 921 1020 96659 1209Solar radiation (Wm2) 1 1660 30075 32517

carry out experiments three algorithms for machine learn-ing technique namely least median square [21] multilayerperceptrons [22] and support vector machine [23] are used

Evaluating the degree of fitness that is how well aregressionmodel fits to a dataset is usually obtained by corre-lation coefficient Assuming the actual values as 119886

1 1198862 119886

119899

and the predicted values as 1199011 1199012 119901

119899 the correlation

coefficiency is known by the equation

119877 =119878119875119860

radic119878119875119878119860

(1)

where

119878119875119860

=sum119894(119901119894minus 119901) (119886

119894minus 119886)

119899 minus 1

119878119875=

sum119894(119901119894minus 119901)2

119899 minus 1 119878119860=

sum119894(119886119894minus 119886)2

119899 minus 1

(2)

To find out the correlation coefficient of the model the fulltraining set is partitioned into ten mutually exclusive andsame-sized subsets The performance of the subset dependson the accuracy of predicting test values For every individualalgorithm this cross validation method was run over tentimes and finally the average value for 10 cross validationswas calculated In 119896-cv a dataset 119878

119899is uniformly partitioned

into 119896 folds of similar size 119875 = 1198751 119875119896 For the sake of

clarity and without loss of generality it is supposed that 119899is multiple of 119896 Let 119879

119894= 119878119899119875119894be the complement dataset

of 119875119894 Then the algorithm 119860(sdot) induces a classifier from 119879

119894

120595119894= 119860(119879

119894) and estimates its prediction error with119875

119894The 119896-cv

prediction error estimator of 120595 = 119860(119878119899) is defined as follows

[24]

120585119896(119878119899 119875) =

1

119899

119896

sum

119894=1

sum

(119909119888)isin119875119894

1 (119888 120595119894(119909)) (3)

where 1(119894 119895) = 1 if and only if 119894 = 119895 and equal to zerootherwise So the 119896-cv error estimator is the average of theerrors made by the classifiers 120595

119894in their respective divisions

119875119894

According to Zheng and Kusiak in [24] the mean abso-lute error (MAE) and mean absolute percent error (MAPE)are used to measure the prediction performance we havealso used these evaluation metrics for our experiments Thedefinitions are expressed as

MAE =sum119899

119894=1

1003816100381610038161003816119901119894minus 119886119894

1003816100381610038161003816

119899

MAPE =(sum |PE|)

119899

(4)

where PE = (119864119886) lowast 100 119864 = (119886 minus 119901) 119886 = actual values119901 = predicted values and 119899 = number of occurrences

Error of the experimental results was also analyzedaccording to mean absolute scaled error (MASE) [25] MASEis scale free less sensitive to outlier its interpretation is veryeasy in comparison to other methods and less variable tosmall samples MASE is suitable for uncertain demand seriesas it never produces infinite or undefined results It indicatesthat the predictionwith the smallestMASEwould be countedthe most accurate among all other alternatives [25] Equation(5) states the formula to calculate MASE as

MASE =MAE

(1119899 minus 1)sum119899

119894=2

1003816100381610038161003816119886119894minus 119886119894minus1

1003816100381610038161003816

(5)

where

MAE =1

119899

119899

sum

119894=1

1003816100381610038161003816119901119894minus 119886119894

1003816100381610038161003816 (6)

41 Prediction of the Machine Learning Techniques Usingthe Selected Feature Subsets Various feature subsets weregenerated or selected using different wrapper feature selec-tion methods Afterwards six-hours-ahead solar radiationprediction by the selected machine learning techniquesnamely LMS MLP and SVM was performed For thisinstance the selected feature subsets were supplied to theindividual machine learning techniquesThe intention of thisexperiment was to observe whether this initiative producesany improvement in the error reduction of those selectedmachine learning techniques or not For these experimentsany tuning of the particular algorithms to a definite datasetwas avoided For all the experiments default values oflearning parameters were used In general in the followingtables one can see the CC MAE MAPE and MASE of six-hours-ahead prediction for each machine learning techniquesupplied with different feature subsets For all the experi-ments ldquoWrdquo is used to indicate that a particular machinelearning technique supplied with the selected feature subsetsstatistically outplays the one without applying feature selec-tion (WAFS) methods

Tables 2 and 3 represent the obtained CC and MAE forapplying LMS MLP and SVM machine learning techniquefor six hours in advance prediction on the used dataset beforeand after feature selection process

In Tables 4 and 5 theMAPE andMASE are shown beforeand after feature selection processes are applied to LMSMLPand SVMmachine learning technique for the same purpose

Journal of Renewable Energy 5

Table 2 Achieved CC after applying various wrapper selection methods on LMS MLP and SVM

WAFS BFS LFS SSFS Ranker GS PCCSLMS 095 096 096 096 096 096 097 WMLP 098 097 097 098 099 W 097 098SVM 096 096 096 096 097 W 096 096

Table 3 Achieved MAE after applying various wrapper selection methods on LMS MLP and SVM

WAFS BFS LFS SSFS Ranker GS PCCSLMS 7719 7681 7449 7412 7493 8753 7337 WMLP 9102 16834 22273 11911 8431 W 28883 11057SVM 12688 12346 12951 12342 12212 W 12559 12452

Table 4 Achieved MAPE after applying various wrapper selection methods on LMS MLP and SVM

WAFS BFS LFS SSFS Ranker GS PCCSLMS 1765 1953 1708 1704 1693 1749 1682 WMLP 2017 5053 4183 2346 1787 W 3283 2150SVM 2172 2153 2235 2135 2088 W 2135 2165

Table 5 Achieved MASE after applying various wrapper selection methods on LMS MLP and SVM

WAFS BFS LFS SSFS Ranker GS PCCSLMS 063 071 061 060 061 062 059 WMLP 074 235 181 097 068 W 137 090SVM 103 102 105 100 099 W 100 101

The results from the experimental results show that thePCCS is somewhat a superior feature selection method forLMS algorithm considering all the instances It is noticeablethat all the feature selection methods contributed to improvethe CC of LMS algorithm However in the case of MAE allthe selection algorithms except the GS improve the resultsfor LMS In both the case of MAPE and MASE BFS is theonly selectionmethod which does not improve the results forLMS It is found from those results that the ranker searchis to some extents superior feature selection method forMLP algorithm It is noticeable that all the feature selectionmethods present a nearly close CC for MLP algorithm butin the case of MAE MAPE and MASE ranker search is theonly selectionmethodwhich improves the results Finally theobtained results illustrate that again the ranker search is tosome extent a superior feature selectionmethod for SVM It isalso noticeable that all the feature selection methods presenteither nearly close or equal CC for SVMHowever in the caseof MAE MAPE and MASE LFS is the only one which isunable to improve the results for SVM

5 Prediction Results Before versus afterApplying the Feature Selection Techniques

In Table 6 the prediction errors (MAE and MASE) of theindividual machine learning techniques are compared on thebasis of before supplying selected feature subsets and aftersupplying selected feature subsets on them The comparativeresults show that errors are reduced for all the instances aftersupplying selected feature subsets The terms MAE BEFORE

and MASE BEFORE represent the results for having noselected feature subsets for MAE and MASE respectivelywhereas the terms MAE AFTER and MASE AFTER repre-sent the results having selected feature subsets for MAE andMASE respectively

Figure 3(a) graphically demonstrates the prediction accu-racy comparison of LMS MLP and SVM before andafter applying selected feature subsets in terms of MAEFigure 3(b) graphically demonstrates the prediction accuracycomparison of LMSMLP and SVMbefore and after applyingselected feature subsets in terms of MASE

Figure 4(a) graphically illustrates the significance ofapplying and without applying (WAFS) various feature selec-tionmethods on LMS in terms of CC Figure 4(b) graphicallyillustrates the significance of applying and without applying(WAFS) feature selection methods onMLP in terms of MAEIn both the cases results are improved after applying selectedfeature subsets

Figure 5 graphically illustrates the significance of apply-ing and without applying (WAFS) feature selection methodson SVM in terms of MAPE The graphical representationclearly shows that the result is improved in terms of MAPEafter applying selected feature subsets

6 Performance Analysis throughStatistical Tests

A statistical test provides a mechanism for making quanti-tative decisions about a process or processes The intentionis to determine whether there is enough evidence to ldquorejectrdquo

6 Journal of Renewable Energy

Table 6 Error measurements of the top most three decisive regression algorithmsrsquo prediction accuracy with feature selection

MAE before MAE after MASE before MASE after RankLMS 7719 7337 063 059 1MLP 9102 8431 074 068 2SVM 12688 12211 103 099 3

020406080

100120140

LMS MLP SVM

77199102

12688

73378431

12211

MAE beforeMAE after

MA

E

(a)

0

02

04

06

08

1

12

LMS MLP SVM

063074

103

059068

099

MASE beforeMASE after

MA

SE

(b)

Figure 3 MAE (a) and MASE (b) comparison of LMS MLP and SVM before and after feature selection process

098

097 097

098

099

097

098

0960

0965

0970

0975

0980

0985

0990

0995

WAFS BFS LFS SSFS Ranker GS PCCS

Cor

relat

ion

coeffi

cien

t

Various selection algorithms

(a)

7719 76817449 7412 7493

8753

7337

65

70

75

80

85

90

WAFS BFS LFS SSFS Ranker GS PCCS

Mea

n ab

solu

te er

ror

Various selection algorithms

(b)

Figure 4 Effects of applying and without applying various feature selection algorithms on MLP (a) and LMS (b)

21722153

2235

2135

2088

21352165

20

205

21

215

22

225

WAFS BFS LFS SSFS Ranker GS PCCS

Mea

n ab

solu

te p

erce

nt er

ror

Various selection algorithms

Figure 5 Effects of applying and without applying various feature selection algorithms on SVM

Journal of Renewable Energy 7

Table 7 Paired samples statistics

Mean 119873 Std deviation Std error meanPair 1

Actual data 63917 6 17741 7243LMS prediction 71254 6 10725 4378

Pair 2Actual data 63917 6 17741 7243MLP prediction 58477 6 26311 10741

Pair 3Actual data 63917 6 17741 7243SVM prediction with RBF kernel 50413 6 20935 8547

Pair 4Actual data 63917 6 17741 7243SVM prediction with poly kernel 51705 6 18319 7479

Table 8 Paired samples correlations

119873 Correlation SigPair 1

Actual data amp LMS prediction 6 0967 0002Pair 2

Actual data amp MLP prediction 6 0985 0000Pair 3

Actual data amp SVM prediction with RBF kernel 6 0961 0002Pair 4

Actual data amp SVM prediction with poly kernel 6 0962 0002

an inference or hypothesis about the process For a singlegroup themeans of two variables are usually compared by thepaired-samples 119905-test It considers the differences between thevalues of two variables in a case-by-case basis and examineswhether the average varies from 0 or not It carries outvarious types of outputs such as statistical description forthe test variables the correlation among the test variablesmeaningful statistics for the paired differences the 119905-testitself and a 95 confidence interval The paired-samples 119905-tests are carried out in order to justify whether any significantdifference exists between the actual and predicted resultsachieved by the selected three machine learning techniquesfor ensemble method The 119905-test is executed with the SPSSpackagemdashPASW Statistics 20 [26] In this paper the paired119905-test is employed to verify the performance of the machinelearning techniques used in the experiments Here the nullhypothesis and the alternative hypothesis are termed as 119867

0

and 119867119886 respectively where 119867

0means there is no significant

difference between the actual and predicted mean values119867119886

means there is significant difference between the actual andpredicted mean values

In Tables 7 8 and 9 the paired samples statistics thepaired sample correlations and paired samples tests for theactual and predicted values of LMS MLP and SVM arerepresented respectively Observing Table 9 we find that forpair 1 119905(5) = minus228 119875 gt 00001 for pair 2 119905(5) = 143119875 gt 00001 for pair 3 119905(5) = 528 119875 gt 00001 and for pair 4119905(5) = 599 119875 gt 00001 Due to the means of the actual and

predicted values of each pair and the direction of the 119905 valuesit can be concluded that there was no statistically significantdifference between actual and predicted values for all thecases Therefore the test failed to reject the null hypothesis1198670

7 Conclusions

Feature selection is a fundamental issue in both the regressionand classification problems especially for the dataset having avery high volume of data Applying feature selectionmethodson machine learning techniques may significantly contributeto increase performance in terms of accuracy In this papervarious methods of feature selection methods have beenbriefly described In particular the wrapper feature selectionmethods are found better which is also justified by the resultsobtained from the experiments performed in this paperFrom the experiments performed in this paper it is foundthat for LMS the MAE before and after applying selectedfeature subsets is 7719 and 7337 respectively and MASE063 and 059 respectively In the case of MLP the MAEbefore and after applying selected feature subsets is 9102 and8431 respectively andMASE 074 and 068 respectively ForSVM the MAE before and after applying selected featuresubsets is 12688 and 12211 respectively and MASE 103 and099 respectivelyThe comparison between the results clearlyshows that LMS MLP and SVM provide better predictionaccuracy (ie reducedMAE andMASE)with selected feature

8 Journal of Renewable Energy

Table 9 Paired samples test

Paired differences

119905 df Sig (2-tailed)Mean Std deviation Std error mean

95 confidence interval ofthe difference

Lower UpperPair 1

Actual data - LMSprediction minus7337 7862 3209 minus15588 914 minus228 5 0071

Pair 2Actual data - MLPprediction 5439 9347 3816 minus4369 15248 143 5 0213

Pair 3Actual data - SVMprediction withRBF kernel

13504 6263 2557 6931 20076 528 5 0003

Pair 4Actual data - SVMprediction withpoly kernel

12211 4995 2039 6969 17453 599 5 0002

subsets than without selected feature subsets Experimentalresults of this paper facilitate to make a concrete verdictthat providing more attention and effort towards the featuresubset selection aspect (eg selected feature subsets onprediction accuracy which is investigated in this paper) cansignificantly contribute to improve the accuracy of solarpower prediction

It is mentionable that for these experiments the machinelearning techniques were applied with the default learningparameter settings In the near future the new experimentswill be performed with the intention to achieve better pre-diction accuracy of the selected machine learning techniquesby applying both the optimized or tuned learning parametersettings and selected feature subsets on them

Nomenclature

119886119894 Actual values

119867119886 Alternative hypothesis

119862 Class attribute 119862 isin 1 119903

119879119894= 119878119899119875119894 Complement dataset of 119875

119894

cv Cross validation119875119894 Divisions

119896-cv Error estimator119860(sdot) Induction algorithmM Mean1198670 Null hypothesis

119896 Number of folds119873 Number of instances119875 Set of 119899 points119878 Standard deviation119878119899 Training set

119875 Two-tailed significance value119903119894 Unknown errors or residuals

119909 Unlabeled instance

Greek SymbolsΨ119894 Classifiers

120585119896 Randomized error estimator

120572 Significance level = 005

Abbreviations

BOM Bureau of meteorologyBE Backward eliminationBS Backward selectionBFS Best first searchCSIRO Commonwealth Scientific and Industrial

Research OrganizationCC Correlation coefficientDeg DegreeDegC Degree CelsiusFCBF Fast correlation-based feature selectionFS Forward selectionGA Genetic algorithmGS Genetic searchhPa HectopascalsKmh Kilometer per hourLFS Linear forward selectionLMS Least median squareMAE Mean absolute errorMASE Mean absolute scaled errorMAPE Mean absolute percent errorMLP Multilayer perceptronPASW Predictive analytic softwareRECs Renewable energy certificatesmm MillimeterNASA National Aeronautics amp Space AdminNOAA National Oceanic amp Atmospheric AdminNREL National Renewable Energy Laboratory

Journal of Renewable Energy 9

PCCS Positive correlation coefficient selectionSPSS Statistical Package for Social SciencesSSFS Subset size forward selectionSVM Support vector machineWAFS Without applying feature selectionWm2 Watt per meter square

References

[1] L Yu and H Liu ldquoFeature selection for high-dimensional dataa fast correlation based filter solutionrdquo in Proceedings of the20th International Conference on Machine Learning pp 856ndash863 August 2003

[2] F Tan Improving feature selection techniques for machine learn-ing [PhD thesis] 2007 Computer Science Dissertations Paper27 httpdigitalarchivegsueducs diss27

[3] R Kohavi and G H John ldquoWrappers for feature subsetselectionrdquo Artificial Intelligence vol 97 no 1-2 pp 273ndash3241997

[4] D Koller and M Sahami ldquoToward optimal feature selectionrdquoin Proceedings of the 13th International Conference on MachineLearning pp 284ndash292 1996

[5] M Dash and H Liu ldquoFeature selection for classificationrdquoIntelligent Data Analysis vol 1 no 3 pp 131ndash156 1997

[6] T MitchellMachine Learning McGraw Hill 1997[7] D A Bell and H Wang ldquoA formalism for relevance and its

application in feature subset selectionrdquo Machine Learning vol41 no 2 pp 175ndash195 2000

[8] M Karagiannopoulos D Anyfantis S B Kotsiantis and P EPintelas ldquoFeature selection for regression problemsrdquo in Pro-ceedings of the 8th Hellenic European Research on ComputerMathematics amp Its Applications (HERCMA rsquo07) pp 20ndash22 2007

[9] H Liu and L Yu ldquoToward integrating feature selection algo-rithms for classification and clusteringrdquo IEEE Transactions onKnowledge and Data Engineering vol 17 no 4 pp 491ndash5022005

[10] P Langley ldquoSelection of relevant features in machine learningrdquoinProceedings of the AAAI Fall SymposiumonRelevance pp 1ndash51994

[11] A L Blum and P Langley ldquoSelection of relevant features andexamples inmachine learningrdquoArtificial Intelligence vol 97 no1-2 pp 245ndash271 1997

[12] I Guyon and A Elisseeff ldquoAn introduction to variable andfeature selectionrdquo Journal of Machine Learning Research vol 3pp 1157ndash1182 2003

[13] R Caruana and D Freitag ldquoGreedy attribute selectionrdquo inProceedings of the 11th International Conference on MachineLearning San Francisco Calif USA 1994

[14] M Gutlein E Frank M Hall and A Karwath ldquoLarge-scaleattribute selection using wrappersrdquo in Proceedings of the IEEESymposium on Computational Intelligence and Data Mining(CIDM rsquo09) pp 332ndash339 April 2009

[15] M Gutlein Large scale attribute selection using wrappers[Diploma of Computer Science thesis] Albert-Ludwigs-Univer-sitat-Freiburg 2006

[16] M Hall E Frank G Holmes B Pfahringer P Reutemann andHWitten ldquoTheWEKA data mining software an updaterdquoACMSIGKDD Explorations Newsletter vol 11 no 1 pp 10ndash18 2009

[17] E Goldberg Genetic Algorithms in Search Optimization andMachine Learning Addison-Wesley 1989

[18] M R Hossain A M T Oo and A B M S Ali ldquoThe combinedeffect of applying feature selection and parameter optimizationon machine learning techniques for solar Power predictionrdquoAmerican Journal of Energy Research vol 1 no 1 pp 7ndash16 2013

[19] S Coppolino ldquoA new correlation between clearness index andrelative sunshinerdquo Renewable Energy vol 4 no 4 pp 417ndash4231994

[20] httpwwwsolarguyscomau[21] P J Rousseeuw ldquoLeast median of squares regressionrdquo Journal

of the American Statistical Association vol 79 no 388 pp 871ndash880 1984

[22] S Haykin Neural Networks A Comprehensive FoundationPrentice Hall 1999

[23] S K Shevade S S Keerthi C Bhattacharyya and K R KMurthy ldquoImprovements to the SMO algorithm for SVM regres-sionrdquo IEEE Transactions on Neural Networks vol 11 no 5 pp1188ndash1193 2000

[24] H Zheng and A Kusiak ldquoPrediction of wind farm power ramprates a data-mining approachrdquo Journal of Solar Energy Engi-neering vol 131 no 3 Article ID 031011 8 pages 2009

[25] R J Hyndman and A B Koehler ldquoAnother look at measures offorecast accuracyrdquo International Journal of Forecasting vol 22no 4 pp 679ndash688 2006

[26] IBM SPSS Statistics for Windows-Version 200 IBM Corpora-tion New York NY USA 2011

TribologyAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

FuelsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal ofPetroleum Engineering

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Power ElectronicsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

CombustionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Renewable Energy

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

StructuresJournal of

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

EnergyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

International Journal ofPhotoenergy

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Nuclear InstallationsScience and Technology of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Solar EnergyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Wind EnergyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Nuclear EnergyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

High Energy PhysicsAdvances in

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Page 4: Research Article The Effectiveness of Feature Selection ...downloads.hindawi.com/journals/jre/2013/952613.pdf · ort towards the feature subset selection aspect (e.g., selected feature

4 Journal of Renewable Energy

Table 1 Statistical description of the raw data set

Min Max Mean Std devAvg air temp (DegC) minus58 401 2047 699Avg wind speed (Kmh) 0 271 699 478Current wind dir (Deg) 0 359 15891 10366Avg relative humidity () 0 100 5511 2426Total rainfall (mm) 0 304 007 069Wind speed (Kmh) 0 2483 577 438Wind direction (Deg) 0 360 16991 10984Max peak wind gust (Kmh) 0 106 2045 1133Current evaporation (mm) minus136 136 031 028Avg abs barometer (hPa) 921 1020 96659 1209Solar radiation (Wm2) 1 1660 30075 32517

carry out experiments three algorithms for machine learn-ing technique namely least median square [21] multilayerperceptrons [22] and support vector machine [23] are used

Evaluating the degree of fitness that is how well aregressionmodel fits to a dataset is usually obtained by corre-lation coefficient Assuming the actual values as 119886

1 1198862 119886

119899

and the predicted values as 1199011 1199012 119901

119899 the correlation

coefficiency is known by the equation

119877 =119878119875119860

radic119878119875119878119860

(1)

where

119878119875119860

=sum119894(119901119894minus 119901) (119886

119894minus 119886)

119899 minus 1

119878119875=

sum119894(119901119894minus 119901)2

119899 minus 1 119878119860=

sum119894(119886119894minus 119886)2

119899 minus 1

(2)

To find out the correlation coefficient of the model the fulltraining set is partitioned into ten mutually exclusive andsame-sized subsets The performance of the subset dependson the accuracy of predicting test values For every individualalgorithm this cross validation method was run over tentimes and finally the average value for 10 cross validationswas calculated In 119896-cv a dataset 119878

119899is uniformly partitioned

into 119896 folds of similar size 119875 = 1198751 119875119896 For the sake of

clarity and without loss of generality it is supposed that 119899is multiple of 119896 Let 119879

119894= 119878119899119875119894be the complement dataset

of 119875119894 Then the algorithm 119860(sdot) induces a classifier from 119879

119894

120595119894= 119860(119879

119894) and estimates its prediction error with119875

119894The 119896-cv

prediction error estimator of 120595 = 119860(119878119899) is defined as follows

[24]

120585119896(119878119899 119875) =

1

119899

119896

sum

119894=1

sum

(119909119888)isin119875119894

1 (119888 120595119894(119909)) (3)

where 1(119894 119895) = 1 if and only if 119894 = 119895 and equal to zerootherwise So the 119896-cv error estimator is the average of theerrors made by the classifiers 120595

119894in their respective divisions

119875119894

According to Zheng and Kusiak in [24] the mean abso-lute error (MAE) and mean absolute percent error (MAPE)are used to measure the prediction performance we havealso used these evaluation metrics for our experiments Thedefinitions are expressed as

MAE =sum119899

119894=1

1003816100381610038161003816119901119894minus 119886119894

1003816100381610038161003816

119899

MAPE =(sum |PE|)

119899

(4)

where PE = (119864119886) lowast 100 119864 = (119886 minus 119901) 119886 = actual values119901 = predicted values and 119899 = number of occurrences

Error of the experimental results was also analyzedaccording to mean absolute scaled error (MASE) [25] MASEis scale free less sensitive to outlier its interpretation is veryeasy in comparison to other methods and less variable tosmall samples MASE is suitable for uncertain demand seriesas it never produces infinite or undefined results It indicatesthat the predictionwith the smallestMASEwould be countedthe most accurate among all other alternatives [25] Equation(5) states the formula to calculate MASE as

MASE =MAE

(1119899 minus 1)sum119899

119894=2

1003816100381610038161003816119886119894minus 119886119894minus1

1003816100381610038161003816

(5)

where

MAE =1

119899

119899

sum

119894=1

1003816100381610038161003816119901119894minus 119886119894

1003816100381610038161003816 (6)

41 Prediction of the Machine Learning Techniques Usingthe Selected Feature Subsets Various feature subsets weregenerated or selected using different wrapper feature selec-tion methods Afterwards six-hours-ahead solar radiationprediction by the selected machine learning techniquesnamely LMS MLP and SVM was performed For thisinstance the selected feature subsets were supplied to theindividual machine learning techniquesThe intention of thisexperiment was to observe whether this initiative producesany improvement in the error reduction of those selectedmachine learning techniques or not For these experimentsany tuning of the particular algorithms to a definite datasetwas avoided For all the experiments default values oflearning parameters were used In general in the followingtables one can see the CC MAE MAPE and MASE of six-hours-ahead prediction for each machine learning techniquesupplied with different feature subsets For all the experi-ments ldquoWrdquo is used to indicate that a particular machinelearning technique supplied with the selected feature subsetsstatistically outplays the one without applying feature selec-tion (WAFS) methods

Tables 2 and 3 represent the obtained CC and MAE forapplying LMS MLP and SVM machine learning techniquefor six hours in advance prediction on the used dataset beforeand after feature selection process

In Tables 4 and 5 theMAPE andMASE are shown beforeand after feature selection processes are applied to LMSMLPand SVMmachine learning technique for the same purpose

Journal of Renewable Energy 5

Table 2 Achieved CC after applying various wrapper selection methods on LMS MLP and SVM

WAFS BFS LFS SSFS Ranker GS PCCSLMS 095 096 096 096 096 096 097 WMLP 098 097 097 098 099 W 097 098SVM 096 096 096 096 097 W 096 096

Table 3 Achieved MAE after applying various wrapper selection methods on LMS MLP and SVM

WAFS BFS LFS SSFS Ranker GS PCCSLMS 7719 7681 7449 7412 7493 8753 7337 WMLP 9102 16834 22273 11911 8431 W 28883 11057SVM 12688 12346 12951 12342 12212 W 12559 12452

Table 4 Achieved MAPE after applying various wrapper selection methods on LMS MLP and SVM

WAFS BFS LFS SSFS Ranker GS PCCSLMS 1765 1953 1708 1704 1693 1749 1682 WMLP 2017 5053 4183 2346 1787 W 3283 2150SVM 2172 2153 2235 2135 2088 W 2135 2165

Table 5 Achieved MASE after applying various wrapper selection methods on LMS MLP and SVM

WAFS BFS LFS SSFS Ranker GS PCCSLMS 063 071 061 060 061 062 059 WMLP 074 235 181 097 068 W 137 090SVM 103 102 105 100 099 W 100 101

The results from the experimental results show that thePCCS is somewhat a superior feature selection method forLMS algorithm considering all the instances It is noticeablethat all the feature selection methods contributed to improvethe CC of LMS algorithm However in the case of MAE allthe selection algorithms except the GS improve the resultsfor LMS In both the case of MAPE and MASE BFS is theonly selectionmethod which does not improve the results forLMS It is found from those results that the ranker searchis to some extents superior feature selection method forMLP algorithm It is noticeable that all the feature selectionmethods present a nearly close CC for MLP algorithm butin the case of MAE MAPE and MASE ranker search is theonly selectionmethodwhich improves the results Finally theobtained results illustrate that again the ranker search is tosome extent a superior feature selectionmethod for SVM It isalso noticeable that all the feature selection methods presenteither nearly close or equal CC for SVMHowever in the caseof MAE MAPE and MASE LFS is the only one which isunable to improve the results for SVM

5 Prediction Results Before versus afterApplying the Feature Selection Techniques

In Table 6, the prediction errors (MAE and MASE) of the individual machine learning techniques are compared before and after supplying the selected feature subsets. The comparative results show that the errors are reduced in all instances after supplying the selected feature subsets. The terms MAE BEFORE and MASE BEFORE denote the results obtained without selected feature subsets, whereas MAE AFTER and MASE AFTER denote the results obtained with selected feature subsets.
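For reference, the two error measures compared in Table 6 can be written as follows (with the MASE denominator taken to be the in-sample one-step naive forecast, following Hyndman and Koehler [25]); here a_i are the actual values, as in the Nomenclature, and p_i denotes the predicted values over n test points (p_i is introduced only for this illustration):

\[
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert a_i - p_i\rvert,
\qquad
\mathrm{MASE} = \frac{\tfrac{1}{n}\sum_{i=1}^{n}\lvert a_i - p_i\rvert}
{\tfrac{1}{n-1}\sum_{i=2}^{n}\lvert a_i - a_{i-1}\rvert}.
\]

On this scale, a MASE below 1 means the model outperforms the naive forecast, which is the case for LMS and MLP in Table 6.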

Figure 3(a) graphically demonstrates the prediction accuracy comparison of LMS, MLP, and SVM before and after applying the selected feature subsets in terms of MAE; Figure 3(b) shows the same comparison in terms of MASE.

Figure 4(a) graphically illustrates the significance of applying and not applying (WAFS) the various feature selection methods on MLP in terms of CC, and Figure 4(b) does the same for LMS in terms of MAE. In both cases, the results are improved after applying the selected feature subsets.

Figure 5 graphically illustrates the significance of applying and not applying (WAFS) the feature selection methods on SVM in terms of MAPE. The graphical representation clearly shows that the result improves in terms of MAPE after applying the selected feature subsets.

6. Performance Analysis through Statistical Tests

A statistical test provides a mechanism for making quantitative decisions about a process or processes. The intention is to determine whether there is enough evidence to "reject" an inference or hypothesis about the process.


Table 6: Error measurements of the top three decisive regression algorithms' prediction accuracy with feature selection.

       MAE before    MAE after    MASE before    MASE after    Rank
LMS    77.19         73.37        0.63           0.59          1
MLP    91.02         84.31        0.74           0.68          2
SVM    126.88        122.11       1.03           0.99          3


Figure 3: MAE (a) and MASE (b) comparison of LMS, MLP, and SVM before and after the feature selection process.


Figure 4: Effects of applying and not applying various feature selection algorithms on MLP (a) and LMS (b).


Figure 5: Effects of applying and not applying various feature selection algorithms on SVM.


Table 7: Paired samples statistics.

                                            Mean      N    Std. deviation    Std. error mean
Pair 1   Actual data                        639.17    6    177.41            72.43
         LMS prediction                     712.54    6    107.25            43.78
Pair 2   Actual data                        639.17    6    177.41            72.43
         MLP prediction                     584.77    6    263.11            107.41
Pair 3   Actual data                        639.17    6    177.41            72.43
         SVM prediction with RBF kernel     504.13    6    209.35            85.47
Pair 4   Actual data                        639.17    6    177.41            72.43
         SVM prediction with poly kernel    517.05    6    183.19            74.79

Table 8: Paired samples correlations.

                                                           N    Correlation    Sig.
Pair 1   Actual data & LMS prediction                      6    0.967          0.002
Pair 2   Actual data & MLP prediction                      6    0.985          0.000
Pair 3   Actual data & SVM prediction with RBF kernel      6    0.961          0.002
Pair 4   Actual data & SVM prediction with poly kernel     6    0.962          0.002

For a single group, the means of two variables are usually compared by the paired-samples t-test. It considers the differences between the values of the two variables on a case-by-case basis and examines whether the average differs from 0. It produces several outputs: descriptive statistics for the test variables, the correlation between the test variables, descriptive statistics for the paired differences, the t-test itself, and a 95% confidence interval. The paired-samples t-tests are carried out in order to establish whether any significant difference exists between the actual results and the predicted results achieved by the three machine learning techniques selected for the ensemble method. The t-test is executed with the SPSS package, PASW Statistics 20 [26]. In this paper, the paired t-test is employed to verify the performance of the machine learning techniques used in the experiments. Here, the null hypothesis and the alternative hypothesis are denoted H_0 and H_a, respectively, where H_0 means there is no significant difference between the actual and predicted mean values and H_a means there is a significant difference between the actual and predicted mean values.
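As a rough illustration of this procedure (not the SPSS workflow actually used), a paired-samples t-test on six actual versus predicted values can be run in Python with SciPy; the numbers below are made up solely to show the call.

import numpy as np
from scipy import stats

# Hypothetical actual and predicted values for n = 6 cases
# (placeholders, not the study's data).
actual    = np.array([520.0, 610.0, 705.0, 760.0, 680.0, 560.0])
predicted = np.array([570.0, 660.0, 770.0, 820.0, 750.0, 705.0])

# Paired-samples t-test on the case-by-case differences.
t_stat, p_value = stats.ttest_rel(actual, predicted)
print(f"t({len(actual) - 1}) = {t_stat:.2f}, two-tailed p = {p_value:.3f}")

# With alpha = 0.05, a p-value of 0.05 or more means the test fails to
# reject H0 (no significant difference between the two means).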

Tables 7, 8, and 9 present the paired samples statistics, the paired samples correlations, and the paired samples tests for the actual and predicted values of LMS, MLP, and SVM, respectively. Observing Table 9, we find that for pair 1, t(5) = -2.28, P > 0.0001; for pair 2, t(5) = 1.43, P > 0.0001; for pair 3, t(5) = 5.28, P > 0.0001; and for pair 4, t(5) = 5.99, P > 0.0001. Given the means of the actual and predicted values of each pair and the direction of the t values, it can be concluded that there was no statistically significant difference between the actual and predicted values in any of the cases. Therefore, the test failed to reject the null hypothesis H_0.
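As a quick arithmetic check of Table 9, the t statistic for each pair is the mean paired difference divided by its standard error; for pair 1, for example,

\[
t = \frac{\bar{d}}{s_d/\sqrt{N}} = \frac{-73.37}{78.62/\sqrt{6}} \approx \frac{-73.37}{32.09} \approx -2.29,
\]

which agrees, up to rounding, with the reported t(5) = -2.28.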

7. Conclusions

Feature selection is a fundamental issue in both regression and classification problems, especially for datasets with a very high volume of data. Applying feature selection methods to machine learning techniques may contribute significantly to increased performance in terms of accuracy. In this paper, various feature selection methods have been briefly described; in particular, the wrapper feature selection methods are found to be better, which is also justified by the results obtained from the experiments performed in this paper. From these experiments, it is found that for LMS the MAE before and after applying the selected feature subsets is 77.19 and 73.37, respectively, and the MASE 0.63 and 0.59, respectively. In the case of MLP, the MAE before and after applying the selected feature subsets is 91.02 and 84.31, respectively, and the MASE 0.74 and 0.68, respectively. For SVM, the MAE before and after applying the selected feature subsets is 126.88 and 122.11, respectively, and the MASE 1.03 and 0.99, respectively. The comparison between the results clearly shows that LMS, MLP, and SVM provide better prediction accuracy (i.e., reduced MAE and MASE) with selected feature subsets than without them.


Table 9: Paired samples test (paired differences; CI denotes the 95% confidence interval of the difference).

                                             Mean      Std. deviation   Std. error mean   CI lower   CI upper   t        df   Sig. (2-tailed)
Pair 1: Actual data - LMS prediction         -73.37    78.62            32.09             -155.88    9.14       -2.28    5    0.071
Pair 2: Actual data - MLP prediction         54.39     93.47            38.16             -43.69     152.48     1.43     5    0.213
Pair 3: Actual data - SVM (RBF kernel)       135.04    62.63            25.57             69.31      200.76     5.28     5    0.003
Pair 4: Actual data - SVM (poly kernel)      122.11    49.95            20.39             69.69      174.53     5.99     5    0.002

Experimental results of this paper support the concrete verdict that giving more attention and effort to the feature subset selection aspect (e.g., the effect of selected feature subsets on prediction accuracy, which is investigated in this paper) can contribute significantly to improving the accuracy of solar power prediction.

It is worth mentioning that, for these experiments, the machine learning techniques were applied with the default learning parameter settings. In the near future, new experiments will be performed with the intention of achieving better prediction accuracy for the selected machine learning techniques by applying both optimized or tuned learning parameter settings and selected feature subsets.

Nomenclature

a_i: Actual values
H_a: Alternative hypothesis
C: Class attribute, C ∈ {1, ..., r}
T_i = S_n \ P_i: Complement dataset of P_i
cv: Cross validation
P_i: Divisions
k-cv: Error estimator
A(·): Induction algorithm
M: Mean
H_0: Null hypothesis
k: Number of folds
N: Number of instances
P: Set of n points
S: Standard deviation
S_n: Training set
P: Two-tailed significance value
r_i: Unknown errors or residuals
x: Unlabeled instance

Greek Symbols

Ψ_i: Classifiers
ξ_k: Randomized error estimator
α: Significance level (= 0.05)

Abbreviations

BOM: Bureau of Meteorology
BE: Backward elimination
BS: Backward selection
BFS: Best first search
CSIRO: Commonwealth Scientific and Industrial Research Organization
CC: Correlation coefficient
Deg: Degree
DegC: Degree Celsius
FCBF: Fast correlation-based feature selection
FS: Forward selection
GA: Genetic algorithm
GS: Genetic search
hPa: Hectopascals
Km/h: Kilometer per hour
LFS: Linear forward selection
LMS: Least median square
MAE: Mean absolute error
MASE: Mean absolute scaled error
MAPE: Mean absolute percent error
MLP: Multilayer perceptron
PASW: Predictive analytic software
RECs: Renewable energy certificates
mm: Millimeter
NASA: National Aeronautics & Space Admin.
NOAA: National Oceanic & Atmospheric Admin.
NREL: National Renewable Energy Laboratory


PCCS: Positive correlation coefficient selection
SPSS: Statistical Package for Social Sciences
SSFS: Subset size forward selection
SVM: Support vector machine
WAFS: Without applying feature selection
W/m2: Watt per meter square

References

[1] L. Yu and H. Liu, "Feature selection for high-dimensional data: a fast correlation based filter solution," in Proceedings of the 20th International Conference on Machine Learning, pp. 856-863, August 2003.

[2] F. Tan, Improving feature selection techniques for machine learning [Ph.D. thesis], Computer Science Dissertations, Paper 27, 2007, http://digitalarchive.gsu.edu/cs_diss/27.

[3] R. Kohavi and G. H. John, "Wrappers for feature subset selection," Artificial Intelligence, vol. 97, no. 1-2, pp. 273-324, 1997.

[4] D. Koller and M. Sahami, "Toward optimal feature selection," in Proceedings of the 13th International Conference on Machine Learning, pp. 284-292, 1996.

[5] M. Dash and H. Liu, "Feature selection for classification," Intelligent Data Analysis, vol. 1, no. 3, pp. 131-156, 1997.

[6] T. Mitchell, Machine Learning, McGraw Hill, 1997.

[7] D. A. Bell and H. Wang, "A formalism for relevance and its application in feature subset selection," Machine Learning, vol. 41, no. 2, pp. 175-195, 2000.

[8] M. Karagiannopoulos, D. Anyfantis, S. B. Kotsiantis, and P. E. Pintelas, "Feature selection for regression problems," in Proceedings of the 8th Hellenic European Research on Computer Mathematics & Its Applications (HERCMA '07), pp. 20-22, 2007.

[9] H. Liu and L. Yu, "Toward integrating feature selection algorithms for classification and clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 4, pp. 491-502, 2005.

[10] P. Langley, "Selection of relevant features in machine learning," in Proceedings of the AAAI Fall Symposium on Relevance, pp. 1-5, 1994.

[11] A. L. Blum and P. Langley, "Selection of relevant features and examples in machine learning," Artificial Intelligence, vol. 97, no. 1-2, pp. 245-271, 1997.

[12] I. Guyon and A. Elisseeff, "An introduction to variable and feature selection," Journal of Machine Learning Research, vol. 3, pp. 1157-1182, 2003.

[13] R. Caruana and D. Freitag, "Greedy attribute selection," in Proceedings of the 11th International Conference on Machine Learning, San Francisco, Calif, USA, 1994.

[14] M. Gutlein, E. Frank, M. Hall, and A. Karwath, "Large-scale attribute selection using wrappers," in Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining (CIDM '09), pp. 332-339, April 2009.

[15] M. Gutlein, Large scale attribute selection using wrappers [Diploma of Computer Science thesis], Albert-Ludwigs-Universitat Freiburg, 2006.

[16] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and H. Witten, "The WEKA data mining software: an update," ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, pp. 10-18, 2009.

[17] E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, 1989.

[18] M. R. Hossain, A. M. T. Oo, and A. B. M. S. Ali, "The combined effect of applying feature selection and parameter optimization on machine learning techniques for solar power prediction," American Journal of Energy Research, vol. 1, no. 1, pp. 7-16, 2013.

[19] S. Coppolino, "A new correlation between clearness index and relative sunshine," Renewable Energy, vol. 4, no. 4, pp. 417-423, 1994.

[20] http://www.solarguys.com.au/

[21] P. J. Rousseeuw, "Least median of squares regression," Journal of the American Statistical Association, vol. 79, no. 388, pp. 871-880, 1984.

[22] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice Hall, 1999.

[23] S. K. Shevade, S. S. Keerthi, C. Bhattacharyya, and K. R. K. Murthy, "Improvements to the SMO algorithm for SVM regression," IEEE Transactions on Neural Networks, vol. 11, no. 5, pp. 1188-1193, 2000.

[24] H. Zheng and A. Kusiak, "Prediction of wind farm power ramp rates: a data-mining approach," Journal of Solar Energy Engineering, vol. 131, no. 3, Article ID 031011, 8 pages, 2009.

[25] R. J. Hyndman and A. B. Koehler, "Another look at measures of forecast accuracy," International Journal of Forecasting, vol. 22, no. 4, pp. 679-688, 2006.

[26] IBM SPSS Statistics for Windows, Version 20.0, IBM Corporation, New York, NY, USA, 2011.

TribologyAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

FuelsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal ofPetroleum Engineering

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Power ElectronicsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

CombustionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Renewable Energy

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

StructuresJournal of

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

EnergyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

International Journal ofPhotoenergy

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Nuclear InstallationsScience and Technology of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Solar EnergyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Wind EnergyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Nuclear EnergyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

High Energy PhysicsAdvances in

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Page 5: Research Article The Effectiveness of Feature Selection ...downloads.hindawi.com/journals/jre/2013/952613.pdf · ort towards the feature subset selection aspect (e.g., selected feature

Journal of Renewable Energy 5

Table 2 Achieved CC after applying various wrapper selection methods on LMS MLP and SVM

WAFS BFS LFS SSFS Ranker GS PCCSLMS 095 096 096 096 096 096 097 WMLP 098 097 097 098 099 W 097 098SVM 096 096 096 096 097 W 096 096

Table 3 Achieved MAE after applying various wrapper selection methods on LMS MLP and SVM

WAFS BFS LFS SSFS Ranker GS PCCSLMS 7719 7681 7449 7412 7493 8753 7337 WMLP 9102 16834 22273 11911 8431 W 28883 11057SVM 12688 12346 12951 12342 12212 W 12559 12452

Table 4 Achieved MAPE after applying various wrapper selection methods on LMS MLP and SVM

WAFS BFS LFS SSFS Ranker GS PCCSLMS 1765 1953 1708 1704 1693 1749 1682 WMLP 2017 5053 4183 2346 1787 W 3283 2150SVM 2172 2153 2235 2135 2088 W 2135 2165

Table 5 Achieved MASE after applying various wrapper selection methods on LMS MLP and SVM

WAFS BFS LFS SSFS Ranker GS PCCSLMS 063 071 061 060 061 062 059 WMLP 074 235 181 097 068 W 137 090SVM 103 102 105 100 099 W 100 101

The results from the experimental results show that thePCCS is somewhat a superior feature selection method forLMS algorithm considering all the instances It is noticeablethat all the feature selection methods contributed to improvethe CC of LMS algorithm However in the case of MAE allthe selection algorithms except the GS improve the resultsfor LMS In both the case of MAPE and MASE BFS is theonly selectionmethod which does not improve the results forLMS It is found from those results that the ranker searchis to some extents superior feature selection method forMLP algorithm It is noticeable that all the feature selectionmethods present a nearly close CC for MLP algorithm butin the case of MAE MAPE and MASE ranker search is theonly selectionmethodwhich improves the results Finally theobtained results illustrate that again the ranker search is tosome extent a superior feature selectionmethod for SVM It isalso noticeable that all the feature selection methods presenteither nearly close or equal CC for SVMHowever in the caseof MAE MAPE and MASE LFS is the only one which isunable to improve the results for SVM

5 Prediction Results Before versus afterApplying the Feature Selection Techniques

In Table 6 the prediction errors (MAE and MASE) of theindividual machine learning techniques are compared on thebasis of before supplying selected feature subsets and aftersupplying selected feature subsets on them The comparativeresults show that errors are reduced for all the instances aftersupplying selected feature subsets The terms MAE BEFORE

and MASE BEFORE represent the results for having noselected feature subsets for MAE and MASE respectivelywhereas the terms MAE AFTER and MASE AFTER repre-sent the results having selected feature subsets for MAE andMASE respectively

Figure 3(a) graphically demonstrates the prediction accu-racy comparison of LMS MLP and SVM before andafter applying selected feature subsets in terms of MAEFigure 3(b) graphically demonstrates the prediction accuracycomparison of LMSMLP and SVMbefore and after applyingselected feature subsets in terms of MASE

Figure 4(a) graphically illustrates the significance ofapplying and without applying (WAFS) various feature selec-tionmethods on LMS in terms of CC Figure 4(b) graphicallyillustrates the significance of applying and without applying(WAFS) feature selection methods onMLP in terms of MAEIn both the cases results are improved after applying selectedfeature subsets

Figure 5 graphically illustrates the significance of apply-ing and without applying (WAFS) feature selection methodson SVM in terms of MAPE The graphical representationclearly shows that the result is improved in terms of MAPEafter applying selected feature subsets

6 Performance Analysis throughStatistical Tests

A statistical test provides a mechanism for making quanti-tative decisions about a process or processes The intentionis to determine whether there is enough evidence to ldquorejectrdquo

6 Journal of Renewable Energy

Table 6 Error measurements of the top most three decisive regression algorithmsrsquo prediction accuracy with feature selection

MAE before MAE after MASE before MASE after RankLMS 7719 7337 063 059 1MLP 9102 8431 074 068 2SVM 12688 12211 103 099 3

020406080

100120140

LMS MLP SVM

77199102

12688

73378431

12211

MAE beforeMAE after

MA

E

(a)

0

02

04

06

08

1

12

LMS MLP SVM

063074

103

059068

099

MASE beforeMASE after

MA

SE

(b)

Figure 3 MAE (a) and MASE (b) comparison of LMS MLP and SVM before and after feature selection process

098

097 097

098

099

097

098

0960

0965

0970

0975

0980

0985

0990

0995

WAFS BFS LFS SSFS Ranker GS PCCS

Cor

relat

ion

coeffi

cien

t

Various selection algorithms

(a)

7719 76817449 7412 7493

8753

7337

65

70

75

80

85

90

WAFS BFS LFS SSFS Ranker GS PCCS

Mea

n ab

solu

te er

ror

Various selection algorithms

(b)

Figure 4 Effects of applying and without applying various feature selection algorithms on MLP (a) and LMS (b)

21722153

2235

2135

2088

21352165

20

205

21

215

22

225

WAFS BFS LFS SSFS Ranker GS PCCS

Mea

n ab

solu

te p

erce

nt er

ror

Various selection algorithms

Figure 5 Effects of applying and without applying various feature selection algorithms on SVM

Journal of Renewable Energy 7

Table 7 Paired samples statistics

Mean 119873 Std deviation Std error meanPair 1

Actual data 63917 6 17741 7243LMS prediction 71254 6 10725 4378

Pair 2Actual data 63917 6 17741 7243MLP prediction 58477 6 26311 10741

Pair 3Actual data 63917 6 17741 7243SVM prediction with RBF kernel 50413 6 20935 8547

Pair 4Actual data 63917 6 17741 7243SVM prediction with poly kernel 51705 6 18319 7479

Table 8 Paired samples correlations

119873 Correlation SigPair 1

Actual data amp LMS prediction 6 0967 0002Pair 2

Actual data amp MLP prediction 6 0985 0000Pair 3

Actual data amp SVM prediction with RBF kernel 6 0961 0002Pair 4

Actual data amp SVM prediction with poly kernel 6 0962 0002

an inference or hypothesis about the process For a singlegroup themeans of two variables are usually compared by thepaired-samples 119905-test It considers the differences between thevalues of two variables in a case-by-case basis and examineswhether the average varies from 0 or not It carries outvarious types of outputs such as statistical description forthe test variables the correlation among the test variablesmeaningful statistics for the paired differences the 119905-testitself and a 95 confidence interval The paired-samples 119905-tests are carried out in order to justify whether any significantdifference exists between the actual and predicted resultsachieved by the selected three machine learning techniquesfor ensemble method The 119905-test is executed with the SPSSpackagemdashPASW Statistics 20 [26] In this paper the paired119905-test is employed to verify the performance of the machinelearning techniques used in the experiments Here the nullhypothesis and the alternative hypothesis are termed as 119867

0

and 119867119886 respectively where 119867

0means there is no significant

difference between the actual and predicted mean values119867119886

means there is significant difference between the actual andpredicted mean values

In Tables 7 8 and 9 the paired samples statistics thepaired sample correlations and paired samples tests for theactual and predicted values of LMS MLP and SVM arerepresented respectively Observing Table 9 we find that forpair 1 119905(5) = minus228 119875 gt 00001 for pair 2 119905(5) = 143119875 gt 00001 for pair 3 119905(5) = 528 119875 gt 00001 and for pair 4119905(5) = 599 119875 gt 00001 Due to the means of the actual and

predicted values of each pair and the direction of the 119905 valuesit can be concluded that there was no statistically significantdifference between actual and predicted values for all thecases Therefore the test failed to reject the null hypothesis1198670

7 Conclusions

Feature selection is a fundamental issue in both the regressionand classification problems especially for the dataset having avery high volume of data Applying feature selectionmethodson machine learning techniques may significantly contributeto increase performance in terms of accuracy In this papervarious methods of feature selection methods have beenbriefly described In particular the wrapper feature selectionmethods are found better which is also justified by the resultsobtained from the experiments performed in this paperFrom the experiments performed in this paper it is foundthat for LMS the MAE before and after applying selectedfeature subsets is 7719 and 7337 respectively and MASE063 and 059 respectively In the case of MLP the MAEbefore and after applying selected feature subsets is 9102 and8431 respectively andMASE 074 and 068 respectively ForSVM the MAE before and after applying selected featuresubsets is 12688 and 12211 respectively and MASE 103 and099 respectivelyThe comparison between the results clearlyshows that LMS MLP and SVM provide better predictionaccuracy (ie reducedMAE andMASE)with selected feature

8 Journal of Renewable Energy

Table 9 Paired samples test

Paired differences

119905 df Sig (2-tailed)Mean Std deviation Std error mean

95 confidence interval ofthe difference

Lower UpperPair 1

Actual data - LMSprediction minus7337 7862 3209 minus15588 914 minus228 5 0071

Pair 2Actual data - MLPprediction 5439 9347 3816 minus4369 15248 143 5 0213

Pair 3Actual data - SVMprediction withRBF kernel

13504 6263 2557 6931 20076 528 5 0003

Pair 4Actual data - SVMprediction withpoly kernel

12211 4995 2039 6969 17453 599 5 0002

subsets than without selected feature subsets Experimentalresults of this paper facilitate to make a concrete verdictthat providing more attention and effort towards the featuresubset selection aspect (eg selected feature subsets onprediction accuracy which is investigated in this paper) cansignificantly contribute to improve the accuracy of solarpower prediction

It is mentionable that for these experiments the machinelearning techniques were applied with the default learningparameter settings In the near future the new experimentswill be performed with the intention to achieve better pre-diction accuracy of the selected machine learning techniquesby applying both the optimized or tuned learning parametersettings and selected feature subsets on them

Nomenclature

119886119894 Actual values

119867119886 Alternative hypothesis

119862 Class attribute 119862 isin 1 119903

119879119894= 119878119899119875119894 Complement dataset of 119875

119894

cv Cross validation119875119894 Divisions

119896-cv Error estimator119860(sdot) Induction algorithmM Mean1198670 Null hypothesis

119896 Number of folds119873 Number of instances119875 Set of 119899 points119878 Standard deviation119878119899 Training set

119875 Two-tailed significance value119903119894 Unknown errors or residuals

119909 Unlabeled instance

Greek SymbolsΨ119894 Classifiers

120585119896 Randomized error estimator

120572 Significance level = 005

Abbreviations

BOM Bureau of meteorologyBE Backward eliminationBS Backward selectionBFS Best first searchCSIRO Commonwealth Scientific and Industrial

Research OrganizationCC Correlation coefficientDeg DegreeDegC Degree CelsiusFCBF Fast correlation-based feature selectionFS Forward selectionGA Genetic algorithmGS Genetic searchhPa HectopascalsKmh Kilometer per hourLFS Linear forward selectionLMS Least median squareMAE Mean absolute errorMASE Mean absolute scaled errorMAPE Mean absolute percent errorMLP Multilayer perceptronPASW Predictive analytic softwareRECs Renewable energy certificatesmm MillimeterNASA National Aeronautics amp Space AdminNOAA National Oceanic amp Atmospheric AdminNREL National Renewable Energy Laboratory

Journal of Renewable Energy 9

PCCS Positive correlation coefficient selectionSPSS Statistical Package for Social SciencesSSFS Subset size forward selectionSVM Support vector machineWAFS Without applying feature selectionWm2 Watt per meter square

References

[1] L Yu and H Liu ldquoFeature selection for high-dimensional dataa fast correlation based filter solutionrdquo in Proceedings of the20th International Conference on Machine Learning pp 856ndash863 August 2003

[2] F Tan Improving feature selection techniques for machine learn-ing [PhD thesis] 2007 Computer Science Dissertations Paper27 httpdigitalarchivegsueducs diss27

[3] R Kohavi and G H John ldquoWrappers for feature subsetselectionrdquo Artificial Intelligence vol 97 no 1-2 pp 273ndash3241997

[4] D Koller and M Sahami ldquoToward optimal feature selectionrdquoin Proceedings of the 13th International Conference on MachineLearning pp 284ndash292 1996

[5] M Dash and H Liu ldquoFeature selection for classificationrdquoIntelligent Data Analysis vol 1 no 3 pp 131ndash156 1997

[6] T MitchellMachine Learning McGraw Hill 1997[7] D A Bell and H Wang ldquoA formalism for relevance and its

application in feature subset selectionrdquo Machine Learning vol41 no 2 pp 175ndash195 2000

[8] M Karagiannopoulos D Anyfantis S B Kotsiantis and P EPintelas ldquoFeature selection for regression problemsrdquo in Pro-ceedings of the 8th Hellenic European Research on ComputerMathematics amp Its Applications (HERCMA rsquo07) pp 20ndash22 2007

[9] H Liu and L Yu ldquoToward integrating feature selection algo-rithms for classification and clusteringrdquo IEEE Transactions onKnowledge and Data Engineering vol 17 no 4 pp 491ndash5022005

[10] P Langley ldquoSelection of relevant features in machine learningrdquoinProceedings of the AAAI Fall SymposiumonRelevance pp 1ndash51994

[11] A L Blum and P Langley ldquoSelection of relevant features andexamples inmachine learningrdquoArtificial Intelligence vol 97 no1-2 pp 245ndash271 1997

[12] I Guyon and A Elisseeff ldquoAn introduction to variable andfeature selectionrdquo Journal of Machine Learning Research vol 3pp 1157ndash1182 2003

[13] R Caruana and D Freitag ldquoGreedy attribute selectionrdquo inProceedings of the 11th International Conference on MachineLearning San Francisco Calif USA 1994

[14] M Gutlein E Frank M Hall and A Karwath ldquoLarge-scaleattribute selection using wrappersrdquo in Proceedings of the IEEESymposium on Computational Intelligence and Data Mining(CIDM rsquo09) pp 332ndash339 April 2009

[15] M Gutlein Large scale attribute selection using wrappers[Diploma of Computer Science thesis] Albert-Ludwigs-Univer-sitat-Freiburg 2006

[16] M Hall E Frank G Holmes B Pfahringer P Reutemann andHWitten ldquoTheWEKA data mining software an updaterdquoACMSIGKDD Explorations Newsletter vol 11 no 1 pp 10ndash18 2009

[17] E Goldberg Genetic Algorithms in Search Optimization andMachine Learning Addison-Wesley 1989

[18] M R Hossain A M T Oo and A B M S Ali ldquoThe combinedeffect of applying feature selection and parameter optimizationon machine learning techniques for solar Power predictionrdquoAmerican Journal of Energy Research vol 1 no 1 pp 7ndash16 2013

[19] S Coppolino ldquoA new correlation between clearness index andrelative sunshinerdquo Renewable Energy vol 4 no 4 pp 417ndash4231994

[20] httpwwwsolarguyscomau[21] P J Rousseeuw ldquoLeast median of squares regressionrdquo Journal

of the American Statistical Association vol 79 no 388 pp 871ndash880 1984

[22] S Haykin Neural Networks A Comprehensive FoundationPrentice Hall 1999

[23] S K Shevade S S Keerthi C Bhattacharyya and K R KMurthy ldquoImprovements to the SMO algorithm for SVM regres-sionrdquo IEEE Transactions on Neural Networks vol 11 no 5 pp1188ndash1193 2000

[24] H Zheng and A Kusiak ldquoPrediction of wind farm power ramprates a data-mining approachrdquo Journal of Solar Energy Engi-neering vol 131 no 3 Article ID 031011 8 pages 2009

[25] R J Hyndman and A B Koehler ldquoAnother look at measures offorecast accuracyrdquo International Journal of Forecasting vol 22no 4 pp 679ndash688 2006

[26] IBM SPSS Statistics for Windows-Version 200 IBM Corpora-tion New York NY USA 2011

TribologyAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

FuelsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal ofPetroleum Engineering

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Power ElectronicsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

CombustionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Renewable Energy

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

StructuresJournal of

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

EnergyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

International Journal ofPhotoenergy

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Nuclear InstallationsScience and Technology of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Solar EnergyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Wind EnergyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Nuclear EnergyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

High Energy PhysicsAdvances in

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Page 6: Research Article The Effectiveness of Feature Selection ...downloads.hindawi.com/journals/jre/2013/952613.pdf · ort towards the feature subset selection aspect (e.g., selected feature

6 Journal of Renewable Energy

Table 6 Error measurements of the top most three decisive regression algorithmsrsquo prediction accuracy with feature selection

MAE before MAE after MASE before MASE after RankLMS 7719 7337 063 059 1MLP 9102 8431 074 068 2SVM 12688 12211 103 099 3

020406080

100120140

LMS MLP SVM

77199102

12688

73378431

12211

MAE beforeMAE after

MA

E

(a)

0

02

04

06

08

1

12

LMS MLP SVM

063074

103

059068

099

MASE beforeMASE after

MA

SE

(b)

Figure 3 MAE (a) and MASE (b) comparison of LMS MLP and SVM before and after feature selection process

098

097 097

098

099

097

098

0960

0965

0970

0975

0980

0985

0990

0995

WAFS BFS LFS SSFS Ranker GS PCCS

Cor

relat

ion

coeffi

cien

t

Various selection algorithms

(a)

7719 76817449 7412 7493

8753

7337

65

70

75

80

85

90

WAFS BFS LFS SSFS Ranker GS PCCS

Mea

n ab

solu

te er

ror

Various selection algorithms

(b)

Figure 4 Effects of applying and without applying various feature selection algorithms on MLP (a) and LMS (b)

21722153

2235

2135

2088

21352165

20

205

21

215

22

225

WAFS BFS LFS SSFS Ranker GS PCCS

Mea

n ab

solu

te p

erce

nt er

ror

Various selection algorithms

Figure 5 Effects of applying and without applying various feature selection algorithms on SVM

Journal of Renewable Energy 7

Table 7 Paired samples statistics

Mean 119873 Std deviation Std error meanPair 1

Actual data 63917 6 17741 7243LMS prediction 71254 6 10725 4378

Pair 2Actual data 63917 6 17741 7243MLP prediction 58477 6 26311 10741

Pair 3Actual data 63917 6 17741 7243SVM prediction with RBF kernel 50413 6 20935 8547

Pair 4Actual data 63917 6 17741 7243SVM prediction with poly kernel 51705 6 18319 7479

Table 8 Paired samples correlations

119873 Correlation SigPair 1

Actual data amp LMS prediction 6 0967 0002Pair 2

Actual data amp MLP prediction 6 0985 0000Pair 3

Actual data amp SVM prediction with RBF kernel 6 0961 0002Pair 4

Actual data amp SVM prediction with poly kernel 6 0962 0002

an inference or hypothesis about the process For a singlegroup themeans of two variables are usually compared by thepaired-samples 119905-test It considers the differences between thevalues of two variables in a case-by-case basis and examineswhether the average varies from 0 or not It carries outvarious types of outputs such as statistical description forthe test variables the correlation among the test variablesmeaningful statistics for the paired differences the 119905-testitself and a 95 confidence interval The paired-samples 119905-tests are carried out in order to justify whether any significantdifference exists between the actual and predicted resultsachieved by the selected three machine learning techniquesfor ensemble method The 119905-test is executed with the SPSSpackagemdashPASW Statistics 20 [26] In this paper the paired119905-test is employed to verify the performance of the machinelearning techniques used in the experiments Here the nullhypothesis and the alternative hypothesis are termed as 119867

0

and 119867119886 respectively where 119867

0means there is no significant

difference between the actual and predicted mean values119867119886

means there is significant difference between the actual andpredicted mean values

In Tables 7 8 and 9 the paired samples statistics thepaired sample correlations and paired samples tests for theactual and predicted values of LMS MLP and SVM arerepresented respectively Observing Table 9 we find that forpair 1 119905(5) = minus228 119875 gt 00001 for pair 2 119905(5) = 143119875 gt 00001 for pair 3 119905(5) = 528 119875 gt 00001 and for pair 4119905(5) = 599 119875 gt 00001 Due to the means of the actual and

predicted values of each pair and the direction of the 119905 valuesit can be concluded that there was no statistically significantdifference between actual and predicted values for all thecases Therefore the test failed to reject the null hypothesis1198670

7 Conclusions

Feature selection is a fundamental issue in both the regressionand classification problems especially for the dataset having avery high volume of data Applying feature selectionmethodson machine learning techniques may significantly contributeto increase performance in terms of accuracy In this papervarious methods of feature selection methods have beenbriefly described In particular the wrapper feature selectionmethods are found better which is also justified by the resultsobtained from the experiments performed in this paperFrom the experiments performed in this paper it is foundthat for LMS the MAE before and after applying selectedfeature subsets is 7719 and 7337 respectively and MASE063 and 059 respectively In the case of MLP the MAEbefore and after applying selected feature subsets is 9102 and8431 respectively andMASE 074 and 068 respectively ForSVM the MAE before and after applying selected featuresubsets is 12688 and 12211 respectively and MASE 103 and099 respectivelyThe comparison between the results clearlyshows that LMS MLP and SVM provide better predictionaccuracy (ie reducedMAE andMASE)with selected feature

8 Journal of Renewable Energy

Table 9 Paired samples test

Paired differences

119905 df Sig (2-tailed)Mean Std deviation Std error mean

95 confidence interval ofthe difference

Lower UpperPair 1

Actual data - LMSprediction minus7337 7862 3209 minus15588 914 minus228 5 0071

Pair 2Actual data - MLPprediction 5439 9347 3816 minus4369 15248 143 5 0213

Pair 3Actual data - SVMprediction withRBF kernel

13504 6263 2557 6931 20076 528 5 0003

Pair 4Actual data - SVMprediction withpoly kernel

12211 4995 2039 6969 17453 599 5 0002

subsets than without selected feature subsets Experimentalresults of this paper facilitate to make a concrete verdictthat providing more attention and effort towards the featuresubset selection aspect (eg selected feature subsets onprediction accuracy which is investigated in this paper) cansignificantly contribute to improve the accuracy of solarpower prediction

It is mentionable that for these experiments the machinelearning techniques were applied with the default learningparameter settings In the near future the new experimentswill be performed with the intention to achieve better pre-diction accuracy of the selected machine learning techniquesby applying both the optimized or tuned learning parametersettings and selected feature subsets on them

Nomenclature

119886119894 Actual values

119867119886 Alternative hypothesis

119862 Class attribute 119862 isin 1 119903

119879119894= 119878119899119875119894 Complement dataset of 119875

119894

cv Cross validation119875119894 Divisions

119896-cv Error estimator119860(sdot) Induction algorithmM Mean1198670 Null hypothesis

119896 Number of folds119873 Number of instances119875 Set of 119899 points119878 Standard deviation119878119899 Training set

119875 Two-tailed significance value119903119894 Unknown errors or residuals

119909 Unlabeled instance

Greek SymbolsΨ119894 Classifiers

120585119896 Randomized error estimator

120572 Significance level = 005

Abbreviations

BOM Bureau of meteorologyBE Backward eliminationBS Backward selectionBFS Best first searchCSIRO Commonwealth Scientific and Industrial

Research OrganizationCC Correlation coefficientDeg DegreeDegC Degree CelsiusFCBF Fast correlation-based feature selectionFS Forward selectionGA Genetic algorithmGS Genetic searchhPa HectopascalsKmh Kilometer per hourLFS Linear forward selectionLMS Least median squareMAE Mean absolute errorMASE Mean absolute scaled errorMAPE Mean absolute percent errorMLP Multilayer perceptronPASW Predictive analytic softwareRECs Renewable energy certificatesmm MillimeterNASA National Aeronautics amp Space AdminNOAA National Oceanic amp Atmospheric AdminNREL National Renewable Energy Laboratory

Journal of Renewable Energy 9

PCCS Positive correlation coefficient selectionSPSS Statistical Package for Social SciencesSSFS Subset size forward selectionSVM Support vector machineWAFS Without applying feature selectionWm2 Watt per meter square

References

[1] L Yu and H Liu ldquoFeature selection for high-dimensional dataa fast correlation based filter solutionrdquo in Proceedings of the20th International Conference on Machine Learning pp 856ndash863 August 2003

[2] F Tan Improving feature selection techniques for machine learn-ing [PhD thesis] 2007 Computer Science Dissertations Paper27 httpdigitalarchivegsueducs diss27

[3] R Kohavi and G H John ldquoWrappers for feature subsetselectionrdquo Artificial Intelligence vol 97 no 1-2 pp 273ndash3241997

[4] D Koller and M Sahami ldquoToward optimal feature selectionrdquoin Proceedings of the 13th International Conference on MachineLearning pp 284ndash292 1996

[5] M Dash and H Liu ldquoFeature selection for classificationrdquoIntelligent Data Analysis vol 1 no 3 pp 131ndash156 1997

[6] T MitchellMachine Learning McGraw Hill 1997[7] D A Bell and H Wang ldquoA formalism for relevance and its

application in feature subset selectionrdquo Machine Learning vol41 no 2 pp 175ndash195 2000

[8] M Karagiannopoulos D Anyfantis S B Kotsiantis and P EPintelas ldquoFeature selection for regression problemsrdquo in Pro-ceedings of the 8th Hellenic European Research on ComputerMathematics amp Its Applications (HERCMA rsquo07) pp 20ndash22 2007

[9] H Liu and L Yu ldquoToward integrating feature selection algo-rithms for classification and clusteringrdquo IEEE Transactions onKnowledge and Data Engineering vol 17 no 4 pp 491ndash5022005

[10] P Langley ldquoSelection of relevant features in machine learningrdquoinProceedings of the AAAI Fall SymposiumonRelevance pp 1ndash51994

[11] A L Blum and P Langley ldquoSelection of relevant features andexamples inmachine learningrdquoArtificial Intelligence vol 97 no1-2 pp 245ndash271 1997

[12] I Guyon and A Elisseeff ldquoAn introduction to variable andfeature selectionrdquo Journal of Machine Learning Research vol 3pp 1157ndash1182 2003

[13] R Caruana and D Freitag ldquoGreedy attribute selectionrdquo inProceedings of the 11th International Conference on MachineLearning San Francisco Calif USA 1994

[14] M Gutlein E Frank M Hall and A Karwath ldquoLarge-scaleattribute selection using wrappersrdquo in Proceedings of the IEEESymposium on Computational Intelligence and Data Mining(CIDM rsquo09) pp 332ndash339 April 2009

[15] M Gutlein Large scale attribute selection using wrappers[Diploma of Computer Science thesis] Albert-Ludwigs-Univer-sitat-Freiburg 2006

[16] M Hall E Frank G Holmes B Pfahringer P Reutemann andHWitten ldquoTheWEKA data mining software an updaterdquoACMSIGKDD Explorations Newsletter vol 11 no 1 pp 10ndash18 2009

[17] E Goldberg Genetic Algorithms in Search Optimization andMachine Learning Addison-Wesley 1989

[18] M R Hossain A M T Oo and A B M S Ali ldquoThe combinedeffect of applying feature selection and parameter optimizationon machine learning techniques for solar Power predictionrdquoAmerican Journal of Energy Research vol 1 no 1 pp 7ndash16 2013

[19] S Coppolino ldquoA new correlation between clearness index andrelative sunshinerdquo Renewable Energy vol 4 no 4 pp 417ndash4231994

[20] httpwwwsolarguyscomau[21] P J Rousseeuw ldquoLeast median of squares regressionrdquo Journal

of the American Statistical Association vol 79 no 388 pp 871ndash880 1984

[22] S Haykin Neural Networks A Comprehensive FoundationPrentice Hall 1999

[23] S K Shevade S S Keerthi C Bhattacharyya and K R KMurthy ldquoImprovements to the SMO algorithm for SVM regres-sionrdquo IEEE Transactions on Neural Networks vol 11 no 5 pp1188ndash1193 2000

[24] H Zheng and A Kusiak ldquoPrediction of wind farm power ramprates a data-mining approachrdquo Journal of Solar Energy Engi-neering vol 131 no 3 Article ID 031011 8 pages 2009

[25] R J Hyndman and A B Koehler ldquoAnother look at measures offorecast accuracyrdquo International Journal of Forecasting vol 22no 4 pp 679ndash688 2006

[26] IBM SPSS Statistics for Windows-Version 200 IBM Corpora-tion New York NY USA 2011

TribologyAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

FuelsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal ofPetroleum Engineering

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Power ElectronicsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

CombustionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Renewable Energy

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

StructuresJournal of

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

EnergyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

International Journal ofPhotoenergy

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Nuclear InstallationsScience and Technology of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Solar EnergyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Wind EnergyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Nuclear EnergyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

High Energy PhysicsAdvances in

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Page 7: Research Article The Effectiveness of Feature Selection ...downloads.hindawi.com/journals/jre/2013/952613.pdf · ort towards the feature subset selection aspect (e.g., selected feature

Journal of Renewable Energy 7

Table 7 Paired samples statistics

Mean 119873 Std deviation Std error meanPair 1

Actual data 63917 6 17741 7243LMS prediction 71254 6 10725 4378

Pair 2Actual data 63917 6 17741 7243MLP prediction 58477 6 26311 10741

Pair 3Actual data 63917 6 17741 7243SVM prediction with RBF kernel 50413 6 20935 8547

Pair 4Actual data 63917 6 17741 7243SVM prediction with poly kernel 51705 6 18319 7479

Table 8 Paired samples correlations

119873 Correlation SigPair 1

Actual data amp LMS prediction 6 0967 0002Pair 2

Actual data amp MLP prediction 6 0985 0000Pair 3

Actual data amp SVM prediction with RBF kernel 6 0961 0002Pair 4

Actual data amp SVM prediction with poly kernel 6 0962 0002

an inference or hypothesis about the process For a singlegroup themeans of two variables are usually compared by thepaired-samples 119905-test It considers the differences between thevalues of two variables in a case-by-case basis and examineswhether the average varies from 0 or not It carries outvarious types of outputs such as statistical description forthe test variables the correlation among the test variablesmeaningful statistics for the paired differences the 119905-testitself and a 95 confidence interval The paired-samples 119905-tests are carried out in order to justify whether any significantdifference exists between the actual and predicted resultsachieved by the selected three machine learning techniquesfor ensemble method The 119905-test is executed with the SPSSpackagemdashPASW Statistics 20 [26] In this paper the paired119905-test is employed to verify the performance of the machinelearning techniques used in the experiments Here the nullhypothesis and the alternative hypothesis are termed as 119867

0

and 119867119886 respectively where 119867

0means there is no significant

difference between the actual and predicted mean values119867119886

means there is significant difference between the actual andpredicted mean values

In Tables 7 8 and 9 the paired samples statistics thepaired sample correlations and paired samples tests for theactual and predicted values of LMS MLP and SVM arerepresented respectively Observing Table 9 we find that forpair 1 119905(5) = minus228 119875 gt 00001 for pair 2 119905(5) = 143119875 gt 00001 for pair 3 119905(5) = 528 119875 gt 00001 and for pair 4119905(5) = 599 119875 gt 00001 Due to the means of the actual and

predicted values of each pair and the direction of the 119905 valuesit can be concluded that there was no statistically significantdifference between actual and predicted values for all thecases Therefore the test failed to reject the null hypothesis1198670

7 Conclusions

Feature selection is a fundamental issue in both the regressionand classification problems especially for the dataset having avery high volume of data Applying feature selectionmethodson machine learning techniques may significantly contributeto increase performance in terms of accuracy In this papervarious methods of feature selection methods have beenbriefly described In particular the wrapper feature selectionmethods are found better which is also justified by the resultsobtained from the experiments performed in this paperFrom the experiments performed in this paper it is foundthat for LMS the MAE before and after applying selectedfeature subsets is 7719 and 7337 respectively and MASE063 and 059 respectively In the case of MLP the MAEbefore and after applying selected feature subsets is 9102 and8431 respectively andMASE 074 and 068 respectively ForSVM the MAE before and after applying selected featuresubsets is 12688 and 12211 respectively and MASE 103 and099 respectivelyThe comparison between the results clearlyshows that LMS MLP and SVM provide better predictionaccuracy (ie reducedMAE andMASE)with selected feature

8 Journal of Renewable Energy

Table 9 Paired samples test

Paired differences

119905 df Sig (2-tailed)Mean Std deviation Std error mean

95 confidence interval ofthe difference

Lower UpperPair 1

Actual data - LMSprediction minus7337 7862 3209 minus15588 914 minus228 5 0071

Pair 2Actual data - MLPprediction 5439 9347 3816 minus4369 15248 143 5 0213

Pair 3Actual data - SVMprediction withRBF kernel

13504 6263 2557 6931 20076 528 5 0003

Pair 4Actual data - SVMprediction withpoly kernel

12211 4995 2039 6969 17453 599 5 0002

subsets than without selected feature subsets Experimentalresults of this paper facilitate to make a concrete verdictthat providing more attention and effort towards the featuresubset selection aspect (eg selected feature subsets onprediction accuracy which is investigated in this paper) cansignificantly contribute to improve the accuracy of solarpower prediction

It is mentionable that for these experiments the machinelearning techniques were applied with the default learningparameter settings In the near future the new experimentswill be performed with the intention to achieve better pre-diction accuracy of the selected machine learning techniquesby applying both the optimized or tuned learning parametersettings and selected feature subsets on them

Nomenclature

119886119894 Actual values

119867119886 Alternative hypothesis

119862 Class attribute 119862 isin 1 119903

119879119894= 119878119899119875119894 Complement dataset of 119875

119894

cv Cross validation119875119894 Divisions

119896-cv Error estimator119860(sdot) Induction algorithmM Mean1198670 Null hypothesis



Table 9: Paired samples test (paired differences with 95% confidence interval of the difference).

Pair | Mean | Std. deviation | Std. error mean | 95% CI lower | 95% CI upper | t | df | Sig. (2-tailed)
Pair 1: Actual data - LMS prediction | -73.37 | 78.62 | 32.09 | -155.88 | 9.14 | -2.28 | 5 | 0.071
Pair 2: Actual data - MLP prediction | 54.39 | 93.47 | 38.16 | -43.69 | 152.48 | 1.43 | 5 | 0.213
Pair 3: Actual data - SVM prediction with RBF kernel | 135.04 | 62.63 | 25.57 | 69.31 | 200.76 | 5.28 | 5 | 0.003
Pair 4: Actual data - SVM prediction with poly kernel | 122.11 | 49.95 | 20.39 | 69.69 | 174.53 | 5.99 | 5 | 0.002

subsets than without selected feature subsets. The experimental results of this paper support a firm conclusion that devoting more attention and effort to the feature subset selection aspect (e.g., the effect of selected feature subsets on prediction accuracy, which is investigated in this paper) can contribute significantly to improving the accuracy of solar power prediction.

It is worth noting that, in these experiments, the machine learning techniques were applied with their default learning parameter settings. In future work, new experiments will be performed with the aim of achieving better prediction accuracy by applying both optimized (tuned) learning parameter settings and selected feature subsets to the selected machine learning techniques.
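For readers who wish to reproduce a significance check of the kind reported in Table 9, the following is a minimal illustrative sketch in Python (not the authors' code): a two-tailed paired-samples t-test at alpha = 0.05 computed with SciPy. The six actual/predicted values are hypothetical placeholders, not the study's data.

import numpy as np
from scipy import stats

actual = np.array([23.1, 19.8, 21.5, 24.0, 18.7, 22.3])      # hypothetical actual values
predicted = np.array([22.4, 20.5, 20.1, 23.2, 19.9, 21.0])   # hypothetical predictions

diff = actual - predicted
t_stat, p_value = stats.ttest_rel(actual, predicted)          # two-tailed paired t-test

alpha = 0.05                                                  # significance level used in the paper
n = len(diff)
se = diff.std(ddof=1) / np.sqrt(n)                            # standard error of the mean difference
ci_low, ci_high = stats.t.interval(1 - alpha, n - 1, loc=diff.mean(), scale=se)

print(f"mean difference = {diff.mean():.2f}, t = {t_stat:.2f}, df = {n - 1}, "
      f"p (2-tailed) = {p_value:.3f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
print("reject H0" if p_value < alpha else "fail to reject H0")

A p-value below 0.05, as for the SVM pairs in Table 9, leads to rejection of the null hypothesis that the mean difference between actual and predicted values is zero.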

Nomenclature

a_i: Actual values
H_a: Alternative hypothesis
C: Class attribute, C ∈ {1, ..., r}
T_i = S_n \ P_i: Complement dataset of P_i
cv: Cross validation
P_i: Divisions
k-cv: Error estimator
A(·): Induction algorithm
M: Mean
H_0: Null hypothesis
k: Number of folds
N: Number of instances
P: Set of n points
S: Standard deviation
S_n: Training set
P: Two-tailed significance value
r_i: Unknown errors or residuals
x: Unlabeled instance

Greek Symbols

Ψ_i: Classifiers
ξ_k: Randomized error estimator
α: Significance level = 0.05

Abbreviations

BOM: Bureau of Meteorology
BE: Backward elimination
BS: Backward selection
BFS: Best first search
CSIRO: Commonwealth Scientific and Industrial Research Organization
CC: Correlation coefficient
Deg: Degree
DegC: Degree Celsius
FCBF: Fast correlation-based feature selection
FS: Forward selection
GA: Genetic algorithm
GS: Genetic search
hPa: Hectopascals
Kmh: Kilometers per hour
LFS: Linear forward selection
LMS: Least median square
MAE: Mean absolute error
MASE: Mean absolute scaled error
MAPE: Mean absolute percent error
MLP: Multilayer perceptron
PASW: Predictive analytics software
RECs: Renewable energy certificates
mm: Millimeter
NASA: National Aeronautics & Space Administration
NOAA: National Oceanic & Atmospheric Administration
NREL: National Renewable Energy Laboratory
PCCS: Positive correlation coefficient selection
SPSS: Statistical Package for the Social Sciences
SSFS: Subset size forward selection
SVM: Support vector machine
WAFS: Without applying feature selection
W/m2: Watts per square meter

References

[1] L. Yu and H. Liu, "Feature selection for high-dimensional data: a fast correlation-based filter solution," in Proceedings of the 20th International Conference on Machine Learning, pp. 856–863, August 2003.
[2] F. Tan, Improving Feature Selection Techniques for Machine Learning [Ph.D. thesis], Computer Science Dissertations, Paper 27, 2007, http://digitalarchive.gsu.edu/cs_diss/27.
[3] R. Kohavi and G. H. John, "Wrappers for feature subset selection," Artificial Intelligence, vol. 97, no. 1-2, pp. 273–324, 1997.
[4] D. Koller and M. Sahami, "Toward optimal feature selection," in Proceedings of the 13th International Conference on Machine Learning, pp. 284–292, 1996.
[5] M. Dash and H. Liu, "Feature selection for classification," Intelligent Data Analysis, vol. 1, no. 3, pp. 131–156, 1997.
[6] T. Mitchell, Machine Learning, McGraw-Hill, 1997.
[7] D. A. Bell and H. Wang, "A formalism for relevance and its application in feature subset selection," Machine Learning, vol. 41, no. 2, pp. 175–195, 2000.
[8] M. Karagiannopoulos, D. Anyfantis, S. B. Kotsiantis, and P. E. Pintelas, "Feature selection for regression problems," in Proceedings of the 8th Hellenic European Research on Computer Mathematics & Its Applications (HERCMA '07), pp. 20–22, 2007.
[9] H. Liu and L. Yu, "Toward integrating feature selection algorithms for classification and clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 4, pp. 491–502, 2005.
[10] P. Langley, "Selection of relevant features in machine learning," in Proceedings of the AAAI Fall Symposium on Relevance, pp. 1–5, 1994.
[11] A. L. Blum and P. Langley, "Selection of relevant features and examples in machine learning," Artificial Intelligence, vol. 97, no. 1-2, pp. 245–271, 1997.
[12] I. Guyon and A. Elisseeff, "An introduction to variable and feature selection," Journal of Machine Learning Research, vol. 3, pp. 1157–1182, 2003.
[13] R. Caruana and D. Freitag, "Greedy attribute selection," in Proceedings of the 11th International Conference on Machine Learning, San Francisco, Calif, USA, 1994.
[14] M. Gutlein, E. Frank, M. Hall, and A. Karwath, "Large-scale attribute selection using wrappers," in Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining (CIDM '09), pp. 332–339, April 2009.
[15] M. Gutlein, Large Scale Attribute Selection Using Wrappers [Diploma of Computer Science thesis], Albert-Ludwigs-Universitat Freiburg, 2006.
[16] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, "The WEKA data mining software: an update," ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, pp. 10–18, 2009.
[17] D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, 1989.
[18] M. R. Hossain, A. M. T. Oo, and A. B. M. S. Ali, "The combined effect of applying feature selection and parameter optimization on machine learning techniques for solar power prediction," American Journal of Energy Research, vol. 1, no. 1, pp. 7–16, 2013.
[19] S. Coppolino, "A new correlation between clearness index and relative sunshine," Renewable Energy, vol. 4, no. 4, pp. 417–423, 1994.
[20] http://www.solarguys.com.au/.
[21] P. J. Rousseeuw, "Least median of squares regression," Journal of the American Statistical Association, vol. 79, no. 388, pp. 871–880, 1984.
[22] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice Hall, 1999.
[23] S. K. Shevade, S. S. Keerthi, C. Bhattacharyya, and K. R. K. Murthy, "Improvements to the SMO algorithm for SVM regression," IEEE Transactions on Neural Networks, vol. 11, no. 5, pp. 1188–1193, 2000.
[24] H. Zheng and A. Kusiak, "Prediction of wind farm power ramp rates: a data-mining approach," Journal of Solar Energy Engineering, vol. 131, no. 3, Article ID 031011, 8 pages, 2009.
[25] R. J. Hyndman and A. B. Koehler, "Another look at measures of forecast accuracy," International Journal of Forecasting, vol. 22, no. 4, pp. 679–688, 2006.
[26] IBM SPSS Statistics for Windows, Version 20.0, IBM Corporation, New York, NY, USA, 2011.
