Research Article A Hybrid Sales Forecasting Scheme by...

Research ArticleA Hybrid Sales Forecasting Scheme by Combining IndependentComponent Analysis with K-Means Clustering and SupportVector Regression

Chi-Jie Lu1 and Chi-Chang Chang2

1 Department of Industrial Management Chien Hsin University of Science and Technology Taoyuan County 32097 Taiwan2 School of Medical Informatics Chung Shan Medical University Information Technology OfficeChung Shan Medical University Hospital Taichung City 40201 Taiwan

Correspondence should be addressed to Chi-Chang Chang changintwgmailcom

Received 22 April 2014 Accepted 5 June 2014 Published 17 June 2014

Academic Editor Kuo-Nan Huang

Copyright copy 2014 C-J Lu and C-C Chang This is an open access article distributed under the Creative Commons AttributionLicense which permits unrestricted use distribution and reproduction in any medium provided the original work is properlycited

Sales forecasting plays an important role in operating a business since it can be used to determine the required inventory level tomeet consumer demand and avoid the problem of underoverstocking Improving the accuracy of sales forecasting has becomean important issue of operating a business This study proposes a hybrid sales forecasting scheme by combining independentcomponent analysis (ICA) with K-means clustering and support vector regression (SVR) The proposed scheme first uses the ICAto extract hidden information from the observed sales data The extracted features are then applied to K-means algorithm forclustering the sales data into several disjoined clusters Finally the SVR forecasting models are applied to each group to generatefinal forecasting results Experimental results from information technology (IT) product agent sales data reveal that the proposedsales forecasting scheme outperforms the three comparisonmodels and hence provides an efficient alternative for sales forecasting

1 Introduction

Sales forecasting is one of the most important tasks in manycompanies since it can be used to determine the requiredinventory level to meet consumer demand and avoid theproblem of underoverstocking In addition sales forecast-ing can have implications for corporate financial planningmarketing client management and other areas of businessTherefore improving the accuracy of sales forecasting hasbecome an important issue of operating a business

Recently many researches have been conducted on thestudy of sales forecasting in various industries such asclothing [1 2] fashion [3ndash6] book [7] electronics [8 9]and automotive industry [10] However there is little researchcentered on sales forecasting for the information technology(IT) industry Lu and Wang [11] developed a hybrid salesforecasting model for computer dealer Lu [12] proposed atwo-stage sales forecasting model by integrating multivariateadaptive regression splines and support vector regression

(SVR) for computer products Lu et al [13] employed elevenforecasting methods for predicting sales of computer whole-salers With technological advancements and rapid changesin consumer demand computer products are characterizedby product variety rapid specification changes and rapidprice declines These factors have made sales forecasting inthe IT industry especially in IT product agents an importantbut difficult task This paper focuses on sales forecasting forIT product agents in light of the important role they play inthe IT industry by distributing IT products to retailers andindustry customers

In order to improve the forecasting performance manystudies have proposed different kinds of forecasting modelsbased clustering algorithms [14ndash16] For a given input datathe forecasting model first uses a clustering algorithm topartition the whole input space into several disjoint regionsThen for each partition the forecasting method is con-structed to produce the output Since the partitioned regionshave more uniformstationary data structure than that of

Hindawi Publishing Corporatione Scientific World JournalVolume 2014 Article ID 624017 8 pageshttpdxdoiorg1011552014624017

2 The Scientific World Journal

the whole input space it will become easier for constructingeffective forecasting model

However the existing clustering-based forecasting mod-els usually directly use the observed original values ofprediction variables for data clustering [14ndash16] But someunderlying factors may not be directly observed from theoriginal data Therefore if the underlyinginteresting infor-mation that cannot be observed directly from the observedoriginal data can be revealed through transforming theinput space into the feature space with a suitable featureextraction method the performance of the clustering-basedforecasting model can be improved by using the features asinputs to producemore effective clustering resultsThereforeindependent component analysis (ICA) a novel statisticalsignal processing technique is used in this research

ICAmodel was proposed to find the latent source signalsfrom observed mixture signal without knowing any priorknowledge of the mixing mechanisms [17] It is aimed atextracting the hidden information from the observed datawhere no relevant data mixture mechanisms are availableICA has been reported to have the capability of extractingthe distinguishability information from the time series data[11 17 18] Back and Weigend [19] used ICA to exact thefeatures of the daily returns of the 28 largest Japanese stocksThe results showed that the dominant ICs can reveal moreunderlying structure and information of the stock pricesthan principal component analysis Kiviluoto and Oja [20]employed ICA to find the fundamental factors affecting thecash flow of 40 stores in the same retail chain They foundthat the cash flow of the retail stores was mainly affected byholidays seasons and competitorsrsquo strategies

In this study a hybrid sales forecasting scheme bycombining ICA with K-means clustering and support vectorregression (SVR) is proposed In the proposed scheme firstthe ICA model is applied to the input data to estimatefeatures Then the features are used as inputs of K-meansclustering algorithm to group the input data into severaldisjoined clusters As K-means is one of the most usedclustering algorithms [21] it is utilized in this study In thefinal step of the proposed scheme for each cluster the SVRforecasting model is constructed and the final forecastingresults can be obtained This study considers the SVR as thepredictor due to its great potential and superior performancein practical applications [18 22] SVR based on statisticallearning theory is a novel neural network algorithm andhas been receiving increasing attention for solving nonlinearregression estimation problems [14 15 18] This is largelydue to the structure risk minimization principles in SVRwhich has greater generalization ability and is superior tothe empirical risk minimization principle as adopted bytraditional neural networks SVR has been successfully andwidely used in various forecasting problems such as electricload forecasting [23ndash25] wind speed [26] traffic flow [27 28]and financial time series forecasting [29ndash35]

There are only very few articles utilizing ICA and SVRin constructing clustering-based sales forecasting model Luand Wang [11] combined ICA growing hierarchical self-organizingmaps (GHSOM) and SVR to develop a clustering-based sales forecastingmodel In their approach the principal

component analysis (PCA) was used as the preprocessingmethod of ICA for dimension reduction However usingPCA to reduce the dimension of the original data beforeICAmay lose some useful information for feature extractionMoreover the GHSOM is a complex and time consumingmodel Since the proposed method does not use PCA fordimension reduction and utilizes the simple fast and well-known K-means method for data clustering it is believedthat the proposed clustering-based sales forecasting approachdiffers from the method proposed by Lu and Wang [11]and hence provides an ideal alternative in conducting salesforecasting for computer wholesalers

The rest of this paper is organized as follows Section 2gives a brief introduction about independent componentanalysis and support vector regression The proposed cluste-ring-based sales forecasting model is thoroughly describedin Section 3 Section 4 presents the experimental results froma computer wholesaler sales data The paper is concluded inSection 5

2 Methodology

21 Independent Component Analysis Let 119883 = [1199091 1199092

119909119898]119879 be a matrix of size119898times 119899119898 le 119899 consisting of observed

mixture signals 119909119894of size 1 times 119899 119894 = 1 2 119898 In the basic

ICA model the matrix119883 can be modeled as [17]

119883 = 119860119878 =

119898

sum

119894=1

119886119894119904119894 (1)

where 119886119894is the 119894th column of the 119898 times 119898 unknown mixing

matrix 119860 119904119894is the 119894th row of the 119898 times 119899 source matrix 119878

The vectors 119904119894are latent source signals that cannot be directly

observed from the observed mixture signals 119909119894 The ICA

model aims at finding an119898times119898 demixingmatrix119882 such that119884 = [119910

119894] = 119882119883 where 119910

119894is the 119894th row of the matrix 119884 119894 =

1 2 119898The vectors 119910119894must be as statistically independent

as possible and are called independent components (ICs)When demixing matrix119882 is the inverse of mixing matrix 119860that is119882 = 119860

minus1 ICs (119910119894) can be used to estimate the latent

source signals 119904119894 Several existing algorithms can be used to

perform ICA modelingThe ICA modeling is formulated as an optimization

problem by setting up the measure of the independence ofICs as an objective function and using some optimizationtechniques for solving the demixing matrix 119882 Severalexisting algorithms can be used to perform ICA modeling[17] In general the ICs are obtained by using the demixingmatrix 119882 to multiply the matrix 119883 that is 119884 = 119882119883 Thedemixing matrix can be determined using an unsupervisedlearning algorithm with the objective of maximizing thestatistical independence of ICs The ICs with non-Gaussiandistributions imply the statistical independence and the non-Gaussianity of the ICs can bemeasured by the negentropy [17]

119869 (y) = 119867 (ygauss) minus 119867 (y) (2)

where ygauss is a Gaussian random vector having the samecovariance matrix as y 119867 is the entropy of a random vectory with density 119901(y) defined as119867(y) = minus int119901(y) log119901(y)119889y

The Scientific World Journal 3

The negentropy is always nonnegative and is zero if andonly if y has a Gaussian distribution Since the problemin using negentropy is computationally very difficult anapproximation of negentropy is proposed as follows [17]

119869 (119910) asymp [119864 119866 (119910) minus 119864 119866 (V)]2 (3)

where V is a Gaussian variable of zeromean and unit varianceand 119910 is a random variable with zero mean and unit variance119866 is a nonquadratic function and is given by 119866(119910) =

exp(minus11991022) in this studyThe FastICA algorithm proposed byHyvarinen and Oja [17] is adopted in this paper to solve forthe demixing matrix

22 Support Vector Regression Support vector regression(SVR) can be expressed as the following equation

119891 (119909) = (119911 sdot 120601 (119909)) + 119887 (4)

where 119911 is weight vector 119887 is bias and120601(119909) is a kernel functionwhich use a nonlinear function to transform the nonlinearinput to be linear mode in a high dimension feature space

Traditional regression gets the coefficients through mini-mizing the square error which can be considered as empiricalrisk based on loss function Vapnik [22] introduced so-called120576-insensitivity loss function to SVR It can be expressed asfollows

119871120576(119891 (119909) minus 119902) =

1003816100381610038161003816119891 (119909) minus 1199021003816100381610038161003816 minus 120576 if 1003816100381610038161003816119891 (119909) minus 119902

1003816100381610038161003816 ge 120576

0 otherwise(5)

where 119902 is the target output 120576 defined the region of 120576-insensitivity when the predicted value falls into the bandarea the loss is zero Contrarily if the predicted value fallsout the band area the loss is equal to the difference betweenthe predicted value and the margin

Considering empirical risk and structure risk syn-chronously the SVR model can be constructed to minimizethe following programming

Min 1

2119911119879119911 + 119862sum

119894

(120585119894+ 120585lowast

119894)

Subject to 119902119894minus 119911119879119909119894minus 119887 le 120576 + 120585

119894

119911119879119909119894+ 119887 minus 119902

119894le 120576 + 120585

lowast

119894

120585119894 120585lowast

119894ge 0

(6)

where 119894 = 1 2 119899 is the number of training data (120585119894+ 120585lowast

119894)

is the empirical risk (12)119911119879119911 is the structure risk preventingoverlearning and lack of applied universality 119862 is modifyingcoefficient representing the trade-off between empirical riskand structure risk Equation (6) is a quadratic programmingproblem After selecting proper modifying coefficient (119862)width of band area (120576) and kernel function (119870) the optimumof each parameter can be resolved though Lagrange functionThegeneral formof the SVR-based regression function can bewritten as follows [22]

119891 (119909 z) = 119891 (119909 120572 120572lowast) =119873

sum

119894=1

(120572119894minus 120572lowast

119894)119870 (119909 119909

119894) + 119887 (7)

where 120572119895and 120572lowast

119895are Lagrangian multipliers and satisfy the

equality 120572119895120572lowast

119895= 0 119870(119909

119894 1199091015840

119894) is the kernel function Any

function that meets Mercerrsquos condition can be used as thekernel function

Although several choices for the kernel function areavailable the most widely used kernel function is theradial basis function (RBF) defined as [36] 119870(119909

119894 119909119895) =

exp(minus119909119894minus 119909119895221205902) where 120590 denotes the width of the RBF

Thus the RBF with parameter 120590 = 02 is applied in this studyas kernel function

SVR performance is mainly affected by the setting ofparameters119862 and 120576 [36]There are no general rules governingthe choice of 119862 and 120576 The grid search proposed by Linet al [37] is a common and straightforward method usingexponentially growing sequences of 119862 and 120576 to identify goodparameters (eg 119862 = 2minus15 2minus13 2minus11 215) The parameterset of119862 and 120576which generate the minimum forecastingmeansquare error (MSE) is considered as the best parameter set Inthis study the grid search is used in each cluster to determinethe best parameter set for training an optimal SVR forecastingmodel

3 The Proposed Sales Forecasting Scheme

In this study a hybrid sales forecasting scheme by com-bining ICA K-means and SVR is proposed The proposedmodel contains three stages In the first stage the ICAmodel is applied to the input data to estimate independentcomponents (ICs) demixing matrix and mixing matrixfrom input data The ICs can be used to represent hiddeninformationfeatures of the input data After obtaining ICsthe aim of this stage is to find the relationship between ICsand observed input data using mixing matrix In the secondstage theK-means clustering algorithmgroups the input datainto several disjoined clusters using the mixing matrix Eachcluster contains similar objects In the third stage for eachcluster the SVR forecastingmodel is constructed and the finalforecasting results can be obtained

The detailed procedure of the proposed scheme is asfollows

(1) Consider 119872 observed company sales data 119909119894 119894 =

1 2 119872 each of size 1 times 119905 And then combine 119909119894

as an input data matrix 119883 119883 = [1199091 1199092 119909

119872]119879 of

size119872times 119905(2) Compute the demixing matrix 119882 = [119908

1 1199082

119908119872]119879 of size 119872 times 119872 and independent component

matrix 119884 = [1199101 1199102 119910

119872]119879 of size119872times 119905 using ICA

method to the matrix 119883 that is 119884 = 119882119883 The rowvectors 119910

119894are ICs

(3) Asmentioned in Section 2 the ICs (119910119894) can be used to

estimate the latent source signals (or hidden features)119904119894which are the rows of the unknown source matrix 119878

of size119872times119905Thus according to (1) the observed datamatrix 119883 can be obtained by using mixing matrix 119860119860 = 119882

minus1 to multiply the independent componentmatrix 119884 that is119883 = 119860119878 = 119882

minus1119884 = 119860119884


Table 1 Basic statistics of the data in each sector

Group name Number of companies Mean Standard deviation Max MinManufacturing 12 41323 32101 1020132 232Service 10 29101 22345 772345 121Finance 8 76225 119132 1864655 041lowastUnit thousand NT dollars

(4) For each row vector 119909119894 it can be obtained by using the

corresponding row vector of matrix 119860 119886119894 to multiply

the matrix 119884 that is 119909119894= 119886119894119884 Thus the relationship

between each observed company sales data 119909119894and the

ICs can represented as

119909119894= 11988611989411199101+ 11988611989421199102+ sdot sdot sdot + 119886

119894119872119910119872 119894 = 1 2 119872 (8)

(5) From (5) it can be found that each vector 119886119894can

be used to represent the effects of different ICs oninformation (or features) contained in the corre-sponding sales data 119909

119894 Thus the mixing matrix 119860

119860 = [1198861 1198862 119886

119872]119879 can be used for clustering since

any two similar sales data 119909119894and 119909

119895 119894 = 119895 will have

similar patterns in the vectors 119886119894

(6) Use the K-means clustering algorithm to the mixingmatrix 119860 for clustering the observed sales data (iethe companies) into several disjointed clusters

(7) Multiple SVRmodels that best fit each cluster are con-structed by finding the optimal learning parametersof SVRs Every constructed SVR model is the mostadequate one for a particular cluster

(8) For each company find the cluster which it belongs toand use the SVR model corresponding to the clusterto generate final forecasting results

4 Experimental Results

In order to evaluate the performance of the proposed hybridsales forecasting scheme using ICA K-means and SVRalgorithms (called ICA-k-SVRmodel) themonthly sales dataof an IT product agent in Taiwan are used in this studyThe monthly sales data of 30 companies from July 2003to February 2010 are collected For each company thereare a total of 80 data points in the dataset The dataset isdivided into training and testing datasets The first 56 datapoints (70 of the total sample points) are used as thetraining samples while the remaining 24 data points (30of the total sample points) are employed as the holdoutand used as the testing sample for measuring out-of-sampleforecasting ability In the dataset the agent classifies the 30companies into three groups Manufacturing Service andFinance sectors according the business type of each companyNote that the assigned group of each company is viewed as itsoriginal group Table 1 shows the basic statistics of the data ofeach group From Table 1 it can be seen that the sales data inthe Finance sector is themost dynamic since it has the highestaverage standard deviation and maximum sales amount aswell as the lowest minimum sales amount

Table 2 Comparison of the K-means clustering results and theoriginal grouping results

Company Original group K-means group1 Manufacturing 22 Manufacturing 13 Manufacturing 34 Manufacturing 15 Manufacturing 16 Manufacturing 27 Manufacturing 28 Manufacturing 19 Manufacturing 310 Manufacturing 311 Manufacturing 212 Manufacturing 113 Service 214 Service 215 Service 316 Service 117 Service 118 Service 119 Service 220 Service 221 Service 222 Service 323 Finance 124 Finance 125 Finance 126 Finance 227 Finance 328 Finance 229 Finance 330 Finance 1

The prediction results of the proposed sales forecast-ing scheme are compared to that of three other forecast-ing schemes that is single-SVR K-means-SVR and ICA-GHSOM-SVR schemes In the single-SVR scheme the SVRmodels are directly applied to each original group for buildingsales forecasting model That is in this study three SVRsales forecasting models are individually constructed for thethree groups In the K-means-SVR scheme the K-meansalgorithm is first used to cluster the 30 firms according tothe patterns of the original sales data and then the SVR isutilized to build sales forecasting models for each group Inconstructing the ICA-GHSOM-SVR model the parameter


Table 3 The forecasting results of each group by using the single-SVR model

Group name Number of companies RMSE MAD MAPEManufacturing 12 42628 20384 327Service 10 29376 9607 311Finance 8 148869 49999 472

Table 4 The forecasting results of each group by using the K-means-SVR model


Table 5 The forecasting results of each group by using the ICA-GHSOM-SVR model


setting and modeling process is the same as that described inLu and Wang [11] For more detailed information about theICA-GHSOM-SVR model please refer to Lu and Wang [11]

In this study all of the four forecasting schemes areused for one-step-ahead forecasting of monthly sales dataIn building the SVR forecasting model the LIBSVM packageproposed by Chang and Lin [38] is adapted in this study Theoriginal datasets are first scaled into the range of [minus10 10]when using the LIBSVMpackageThree forecasting variablesincluding the previous one monthrsquos sales amount previoustwo monthsrsquo sales amount and previous three monthsrsquo salesamount are used for SVR modeling

The prediction performance is evaluated using the fol-lowing performance measures namely the root mean squareerror (RMSE) mean absolute difference (MAD) and meanabsolute percentage error (MAPE)The smaller are the valuesof RMSE MAD and MAPE the closer are the predictedresults to that of the actual value The definitions of thesecriteria are as follows

RMSE = radicsum119899

119894=1(119879119894minus 119875119894)2

119899

MAD =sum119899

119894=1

1003816100381610038161003816119879119894 minus 1198751198941003816100381610038161003816

2

119899

MAPE =sum119899

119894=1

1003816100381610038161003816(119879119894 minus 119875119894) 1198791198941003816100381610038161003816

119899

(9)

where 119879119894and 119875

119894represent the actual and predicted value at

week 119894 respectively 119899 is the total number of data pointsThis study used three clustering-based models for sales

forecasting The K-means algorithm in the K-means-SVRmodel directly uses the original values of the data pointsas input variables In the ICA-GHSOM-SVR model andproposed ICA-k-SVR model the GHSOM and K-meansalgorithms employ the elements of mixing matrix as inputs

respectively Note that the clusters of companies generated bythe K-means or GHSOM algorithms of the three clustering-based models are the temporary clusters which are usedto build more effective forecasting models For comparingthe forecast results of the single-SVR model each companyclustered by the K-means algorithms of the three clustering-based models is reallocated to its original cluster Afterreassigning all classified companies to their original groupthe prediction errors of the three clustering-basedmodels arethen recalculated based on the original grouping results Forexample as seen in Table 2 the K-means algorithm clustersboth company 1 and company 13 into Cluster 2 Basedon the relationship depicted in Table 2 company 1 andcompany 13 can be reassigned to the Manufacturing sectorand the Service sector respectively

Table 3 shows the forecasting results of each groupdefined by the IT agent applying the individual SVR modelto each group Note that the best parameter sets of the SVRmodels for the Manufacturing Service and Finance sectorsare (119862 = 2

3 120576 = 2minus3) (119862 = 2

minus1 120576 = 2minus7) and (119862 = 2

1120576 = 2

minus1) respectively Table 3 shows that the single SVRmodel performs well in the Service sector but generates ahigher forecasting error in the Finance sector

Tables 4 5 and 6 show the forecasting results of eachgroup defined by the IT product agent by applying K-means-SVR model ICA-GHSOM-SVR model and the proposedICA-k-SVR sales forecasting scheme respectively The bestparameter sets of theK-means-SVR ICA-GHSOM-SVR andICA-k-SVR models for the three sectors are summarized inTable 7

In order to compare the performance of the fourmethodsTable 8 summarizes the forecasting results of ManufacturingService and Finance using the four models It can beobserved from Table 8 that the proposed sales forecastingscheme produces the best forecasting results and outperformsthe other three methods in all groups Thus it indicates


Table 6 The forecasting results of each group by using the proposed sales forecasting model


Table 7 Summary of best parameter sets of the K-means-SVR ICA-GHSOM-SVR and ICA-k-SVR for the three sectors

Group name Models Best parameter sets

K-means-SVRManufacturing (119862 = 2minus1 120576 = 2minus1)

Service (119862 = 2minus5 120576 = 2minus3)Finance (119862 = 25 120576 = 2minus5)

ICA-GHSOM-SVRManufacturing (119862 = 23 120576 = 21)

Service (119862 = 23 120576 = 23)Finance (119862 = 2minus1 120576 = 2minus1)

ICA-k-SVRManufacturing (119862 = 2minus7 120576 = 2minus3)

Service (119862 = 2minus9 120576 = 2minus3)Finance (119862 = 2minus5 120576 = 2minus5)

Table 8 Summary of forecast results by the four models

Group name Models MAD RMSE MAPE

Manufacturing

single-SVR 42628 20384 327K-means-SVR 35028 13221 189

ICA-GHSOM-SVR 27831 10590 129ICA-k-SVR 25212 9290 096

Service



Finance



that the proposed sales forecasting scheme provides a betterforecasting result than the three comparison models in termsof prediction error

Moreover it also can be observed from Table 8 that K-means algorithm is successfully applied for grouping compa-nies using sales data since the forecasting performance of K-means-SVR and the proposed forecasting schemes are betterthan the single-SVR model As the ICA-GHSOM-SVR andthe proposed model can generate better forecasting resultsthe ICA model is an effective feature extraction model andcan improve the performance of the forecastingmodel in thisstudy

5 Conclusions

This paper proposed a hybrid sales forecasting scheme usingICA K-means and SVR for IT product agent The proposedscheme first uses the ICA to extract hiddenunderlyinginformation (ie features) from the observed sales data

The extracted features are then applied to K-means algo-rithm for clustering the sales data into several disjoinedclusters Finally for each cluster the SVR forecasting modelis constructed and final forecasting results are obtainedThe monthly sales data collected from an IT product agentare used in this study for evaluating the performance ofthe proposed method Experimental results showed thatthe proposed sales forecasting scheme produces the bestforecasting results and outperforms the three comparisonmethods According to the experiments it can be con-cluded that the K-means algorithm is a promising tool forgrouping companies using sales data and ICA model isan effective feature extraction model The proposed hybridsales forecasting scheme is an effective alternative for salesforecasting

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper


Acknowledgment

This work is partially supported by the National ScienceCouncil of Taiwan Grant no NSC 100-2628-E-231-001

References

[1] S Thomassey and M Happiette ldquoA neural clustering andclassification system for sales forecasting of new apparel itemsrdquoApplied Soft Computing Journal vol 7 no 4 pp 1177ndash1187 2007

[2] S Thomassey ldquoSales forecasts in clothing industry the keysuccess factor of the supply chain managementrdquo InternationalJournal of Production Economics vol 128 no 2 pp 470ndash4832010

[3] Z-L Sun T-M Choi K-F Au and Y Yu ldquoSales forecastingusing extreme learning machine with applications in fashionretailingrdquo Decision Support Systems vol 46 no 1 pp 411ndash4192008

[4] W K Wong and Z X Guo ldquoA hybrid intelligent modelfor medium-term sales forecasting in fashion retail supplychains using extreme learning machine and harmony searchalgorithmrdquo International Journal of Production Economics vol128 no 2 pp 614ndash624 2010

[5] T-M Choi C-L Hui N Liu S-F Ng and Y Yu ldquoFast fashionsales forecasting with limited data and timerdquo Decision SupportSystems vol 59 pp 84ndash92 2014

[6] M Xia and W K Wong ldquoA seasonal discrete grey forecastingmodel for fashion retailingrdquo Knowledge-Based Systems vol 57pp 119ndash126 2014

[7] P-C Chang C-Y Lai and K R Lai ldquoA hybrid system by evolv-ing case-based reasoning with genetic algorithm in wholesalersreturning book forecastingrdquo Decision Support Systems vol 42no 3 pp 1715ndash1729 2006

[8] P-C Chang C-H Liu and Y-W Wang ldquoA hybrid model byclustering and evolving fuzzy rules for sales decision supportsin printed circuit board industryrdquoDecision Support Systems vol42 no 3 pp 1254ndash1269 2006

[9] E Hadavandi H Shavandi and A Ghanbari ldquoAn improvedsales forecasting approach by the integration of genetic fuzzysystems and data clustering case study of printed circuit boardrdquoExpert Systems with Applications vol 38 no 8 pp 9392ndash93992011

[10] A Sa-Ngasoongsong S T S Bukkapatnam J Kim P S Iyerand R P Suresh ldquoMulti-step sales forecasting in automotiveindustry based on structural relationship identificationrdquo Inter-national Journal of Production Economics vol 140 no 2 pp875ndash887 2012

[11] C-J Lu and Y-W Wang ldquoCombining independent componentanalysis and growing hierarchical self-organizing maps withsupport vector regression in product demand forecastingrdquoInternational Journal of Production Economics vol 128 no 2pp 603ndash613 2010

[12] C J Lu ldquoSales forecasting of computer products based onvariable selection scheme and support vector regressionrdquo Neu-rocomputing vol 128 pp 491ndash499 2014

[13] C-J Lu T-S Lee and C-M Lian ldquoSales forecasting forcomputer wholesalers a comparison of multivariate adaptiveregression splines and artificial neural networksrdquo DecisionSupport Systems vol 54 no 1 pp 584ndash596 2012

[14] F E H Tay and L J Cao ldquoImproved financial time seriesforecasting by combining support vector machines with self-organizing feature maprdquo Intelligent Data Analysis vol 5 no 4pp 339ndash354 2001

[15] L Cao ldquoSupport vector machines experts for time seriesforecastingrdquo Neurocomputing vol 51 pp 321ndash339 2003

[16] R K Lai C-Y Fan W-H Huang and P-C Chang ldquoEvolvingand clustering fuzzy decision tree for financial time series dataforecastingrdquo Expert Systems with Applications vol 36 no 2 pp3761ndash3773 2009

[17] A Hyvarinen and E Oja ldquoIndependent component analysisalgorithms and applicationsrdquo Neural Networks vol 13 no 4-5pp 411ndash430 2000

[18] C-J Lu T-S Lee and C-C Chiu ldquoFinancial time seriesforecasting using independent component analysis and supportvector regressionrdquo Decision Support Systems vol 47 no 2 pp115ndash125 2009

[19] A Back and A Weigend ldquoDiscovering structure in financeusing independent component analysisrdquo in Proceedings of the5th International Conference on Neural Networks in CapitalMarket pp 15ndash17 1997

[20] K Kiviluoto and E Oja ldquoIndependent component analysisfor parallel financial time seriesrdquo in Proceeding of the 5thInternational Conference on Neural Information vol 2 pp 895ndash898 Tokyo Japan 1998

[21] A K Jain ldquoData clustering 50 years beyond K-meansrdquo PatternRecognition Letters vol 31 no 8 pp 651ndash666 2010

[22] VN VapnikTheNature of Statistical LearningTheory SpringerNew York NY USA 2000

[23] E E Elattar J Goulermas andQHWu ldquoElectric load forecast-ing based on locally weighted support vector regressionrdquo IEEETransactions on Systems Man and Cybernetics C Applicationsand Reviews vol 40 no 4 pp 438ndash447 2010

[24] W-C Hong ldquoHybrid evolutionary algorithms in a SVR-basedelectric load forecastingmodelrdquo International Journal of Electri-cal Power and Energy Systems vol 31 no 7-8 pp 409ndash417 2009

[25] Z Hu Y Bao and T Xiong ldquoElectricity load forecastingusing support vector regression with memetic algorithmsrdquoTheScientific World Journal vol 2013 Article ID 292575 10 pages2013

[26] S Tatinati and K C Veluvolu ldquoA hybrid approach for short-term forecasting of wind speedrdquo The Scientific World Journalvol 2013 Article ID 548370 8 pages 2013

[27] M-W Li W-C Hong and H-G Kang ldquoUrban traffic flowforecasting using Gauss-SVR with cat mapping cloud modeland PSO hybrid algorithmrdquo Neurocomputing vol 99 pp 230ndash240 2013

[28] W-CHong ldquoApplication of seasonal SVRwith chaotic immunealgorithm in traffic flow forecastingrdquo Neural Computing andApplications vol 21 no 3 pp 583ndash593 2012

[29] R Alwee S M Hj Shamsuddin and R Sallehuddin ldquoHybridsupport vector regression and autoregressive integratedmovingaverage models improved by particle swarm optimization forproperty crime rates forecasting with economic indicatorsrdquoTheScientific World Journal vol 2013 Article ID 951475 11 pages2013

[30] T Xiong Y Bao and Z Hu ldquoMultiple-output support vectorregression with a firefly algorithm for interval-valued stockprice index forecastingrdquo Knowledge-Based Systems vol 55 pp87ndash100 2014


[31] T Xiong Y Bao and Z Hu ldquoBeyond one-step-ahead fore-casting evaluation of alternative multi-step-ahead forecastingmodels for crude oil pricesrdquo Energy Economics vol 40 pp 405ndash415 2013

[32] L-J Kao C-C Chiu C-J Lu and J-L Yang ldquoIntegration ofnonlinear independent component analysis and support vectorregression for stock price forecastingrdquoNeurocomputing vol 99pp 534ndash542 2013

[33] C-J Lu ldquoHybridizing nonlinear independent component anal-ysis and support vector regression with particle swarm opti-mization for stock index forecastingrdquo Neural Computing andApplications vol 23 no 7-8 pp 2417ndash2427 2013

[34] W Dai Y E Shao and C-J Lu ldquoIncorporating feature selectionmethod into support vector regression for stock index forecast-ingrdquoNeural Computing and Applications vol 23 no 6 pp 1551ndash1561 2013

[35] L-J Kao C-C Chiu C-J Lu and C-H Chang ldquoA hybridapproach by integrating wavelet-based feature extraction withMARS and SVR for stock index forecastingrdquo Decision SupportSystems vol 54 no 3 pp 1228ndash1244 2013

[36] V Cherkassky and Y Ma ldquoPractical selection of SVM parame-ters and noise estimation for SVM regressionrdquoNeural Networksvol 17 no 1 pp 113ndash126 2004

[37] C J Lin C W Hsu and C C Chang ldquoA practical guide tosupport vector classificationrdquo Tech Rep Department of Com-puter Science and Information Engineering National TaiwanUniversity 2003

[38] C C Chang andC J Lin ldquoLIBSVM a library for support vectormachinesrdquo 2001 httpwwwcsientuedutwsimcjlinlibsvm

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of


Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of


Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of


Mathematical PhysicsAdvances in

Complex AnalysisJournal of


OptimizationJournal of


CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of


Operations ResearchAdvances in

Journal of


Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences


The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014


Algebra

Discrete Dynamics in Nature and Society



Decision SciencesAdvances in

Discrete MathematicsJournal of


Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of


the whole input space it will become easier for constructingeffective forecasting model

However the existing clustering-based forecasting mod-els usually directly use the observed original values ofprediction variables for data clustering [14ndash16] But someunderlying factors may not be directly observed from theoriginal data Therefore if the underlyinginteresting infor-mation that cannot be observed directly from the observedoriginal data can be revealed through transforming theinput space into the feature space with a suitable featureextraction method the performance of the clustering-basedforecasting model can be improved by using the features asinputs to producemore effective clustering resultsThereforeindependent component analysis (ICA) a novel statisticalsignal processing technique is used in this research

ICAmodel was proposed to find the latent source signalsfrom observed mixture signal without knowing any priorknowledge of the mixing mechanisms [17] It is aimed atextracting the hidden information from the observed datawhere no relevant data mixture mechanisms are availableICA has been reported to have the capability of extractingthe distinguishability information from the time series data[11 17 18] Back and Weigend [19] used ICA to exact thefeatures of the daily returns of the 28 largest Japanese stocksThe results showed that the dominant ICs can reveal moreunderlying structure and information of the stock pricesthan principal component analysis Kiviluoto and Oja [20]employed ICA to find the fundamental factors affecting thecash flow of 40 stores in the same retail chain They foundthat the cash flow of the retail stores was mainly affected byholidays seasons and competitorsrsquo strategies

In this study a hybrid sales forecasting scheme bycombining ICA with K-means clustering and support vectorregression (SVR) is proposed In the proposed scheme firstthe ICA model is applied to the input data to estimatefeatures Then the features are used as inputs of K-meansclustering algorithm to group the input data into severaldisjoined clusters As K-means is one of the most usedclustering algorithms [21] it is utilized in this study In thefinal step of the proposed scheme for each cluster the SVRforecasting model is constructed and the final forecastingresults can be obtained This study considers the SVR as thepredictor due to its great potential and superior performancein practical applications [18 22] SVR based on statisticallearning theory is a novel neural network algorithm andhas been receiving increasing attention for solving nonlinearregression estimation problems [14 15 18] This is largelydue to the structure risk minimization principles in SVRwhich has greater generalization ability and is superior tothe empirical risk minimization principle as adopted bytraditional neural networks SVR has been successfully andwidely used in various forecasting problems such as electricload forecasting [23ndash25] wind speed [26] traffic flow [27 28]and financial time series forecasting [29ndash35]

There are only very few articles utilizing ICA and SVRin constructing clustering-based sales forecasting model Luand Wang [11] combined ICA growing hierarchical self-organizingmaps (GHSOM) and SVR to develop a clustering-based sales forecastingmodel In their approach the principal

component analysis (PCA) was used as the preprocessingmethod of ICA for dimension reduction However usingPCA to reduce the dimension of the original data beforeICAmay lose some useful information for feature extractionMoreover the GHSOM is a complex and time consumingmodel Since the proposed method does not use PCA fordimension reduction and utilizes the simple fast and well-known K-means method for data clustering it is believedthat the proposed clustering-based sales forecasting approachdiffers from the method proposed by Lu and Wang [11]and hence provides an ideal alternative in conducting salesforecasting for computer wholesalers

The rest of this paper is organized as follows Section 2gives a brief introduction about independent componentanalysis and support vector regression The proposed cluste-ring-based sales forecasting model is thoroughly describedin Section 3 Section 4 presents the experimental results froma computer wholesaler sales data The paper is concluded inSection 5

2 Methodology

21 Independent Component Analysis Let 119883 = [1199091 1199092

119909119898]119879 be a matrix of size119898times 119899119898 le 119899 consisting of observed

mixture signals 119909119894of size 1 times 119899 119894 = 1 2 119898 In the basic

ICA model the matrix119883 can be modeled as [17]

119883 = 119860119878 =

119898

sum

119894=1

119886119894119904119894 (1)

where 119886119894is the 119894th column of the 119898 times 119898 unknown mixing

matrix 119860 119904119894is the 119894th row of the 119898 times 119899 source matrix 119878

The vectors 119904119894are latent source signals that cannot be directly

observed from the observed mixture signals 119909119894 The ICA

model aims at finding an119898times119898 demixingmatrix119882 such that119884 = [119910

119894] = 119882119883 where 119910

119894is the 119894th row of the matrix 119884 119894 =

1 2 119898The vectors 119910119894must be as statistically independent

as possible and are called independent components (ICs)When demixing matrix119882 is the inverse of mixing matrix 119860that is119882 = 119860

minus1 ICs (119910119894) can be used to estimate the latent

source signals 119904119894 Several existing algorithms can be used to

perform ICA modelingThe ICA modeling is formulated as an optimization

problem by setting up the measure of the independence ofICs as an objective function and using some optimizationtechniques for solving the demixing matrix 119882 Severalexisting algorithms can be used to perform ICA modeling[17] In general the ICs are obtained by using the demixingmatrix 119882 to multiply the matrix 119883 that is 119884 = 119882119883 Thedemixing matrix can be determined using an unsupervisedlearning algorithm with the objective of maximizing thestatistical independence of ICs The ICs with non-Gaussiandistributions imply the statistical independence and the non-Gaussianity of the ICs can bemeasured by the negentropy [17]

119869 (y) = 119867 (ygauss) minus 119867 (y) (2)

where ygauss is a Gaussian random vector having the samecovariance matrix as y 119867 is the entropy of a random vectory with density 119901(y) defined as119867(y) = minus int119901(y) log119901(y)119889y



119869 (119910) asymp [119864 119866 (119910) minus 119864 119866 (V)]2 (3)




119891 (119909) = (119911 sdot 120601 (119909)) + 119887 (4)



119871120576(119891 (119909) minus 119902) =


1003816100381610038161003816 ge 120576

0 otherwise(5)



Min 1

2119911119879119911 + 119862sum

119894

(120585119894+ 120585lowast

119894)


119894

119911119879119909119894+ 119887 minus 119902

119894le 120576 + 120585

lowast

119894

120585119894 120585lowast

119894ge 0

(6)


119894)


119891 (119909 z) = 119891 (119909 120572 120572lowast) =119873

sum

119894=1

(120572119894minus 120572lowast

119894)119870 (119909 119909

119894) + 119887 (7)




119895= 0 119870(119909

119894 1199091015840




119894 119909119895) =










119872]119879 of


1 1199082


matrix 119884 = [1199101 1199102 119910



119894are ICs





minus1119884 = 119860119884









119909119894= 11988611989411199101+ 11988611989421199102+ sdot sdot sdot + 119886

119894119872119910119872 119894 = 1 2 119872 (8)




119860 = [1198861 1198862 119886



119895 119894 = 119895 will have





















119894=1(119879119894minus 119875119894)2

119899

MAD =sum119899

119894=1

1003816100381610038161003816119879119894 minus 1198751198941003816100381610038161003816

2

119899

MAPE =sum119899

119894=1

1003816100381610038161003816(119879119894 minus 119875119894) 1198791198941003816100381610038161003816

119899

(9)

where 119879119894and 119875






3 120576 = 2minus3) (119862 = 2

minus1 120576 = 2minus7) and (119862 = 2

1120576 = 2

















Manufacturing



Service



Finance





5 Conclusions






Acknowledgment


References















































Volume 2014




Journal of











Journal of


Function Spaces






Algebra











119869 (119910) asymp [119864 119866 (119910) minus 119864 119866 (V)]2 (3)




119891 (119909) = (119911 sdot 120601 (119909)) + 119887 (4)



119871120576(119891 (119909) minus 119902) =


1003816100381610038161003816 ge 120576

0 otherwise(5)



Min 1

2119911119879119911 + 119862sum

119894

(120585119894+ 120585lowast

119894)


119894

119911119879119909119894+ 119887 minus 119902

119894le 120576 + 120585

lowast

119894

120585119894 120585lowast

119894ge 0

(6)


119894)


119891 (119909 z) = 119891 (119909 120572 120572lowast) =119873

sum

119894=1

(120572119894minus 120572lowast

119894)119870 (119909 119909

119894) + 119887 (7)




119895= 0 119870(119909

119894 1199091015840




119894 119909119895) =










119872]119879 of


1 1199082


matrix 119884 = [1199101 1199102 119910



119894are ICs





minus1119884 = 119860119884









119909119894= 11988611989411199101+ 11988611989421199102+ sdot sdot sdot + 119886

119894119872119910119872 119894 = 1 2 119872 (8)




119860 = [1198861 1198862 119886



119895 119894 = 119895 will have





















119894=1(119879119894minus 119875119894)2

119899

MAD =sum119899

119894=1

1003816100381610038161003816119879119894 minus 1198751198941003816100381610038161003816

2

119899

MAPE =sum119899

119894=1

1003816100381610038161003816(119879119894 minus 119875119894) 1198791198941003816100381610038161003816

119899

(9)

where 119879119894and 119875






3 120576 = 2minus3) (119862 = 2

minus1 120576 = 2minus7) and (119862 = 2

1120576 = 2

















Manufacturing



Service



Finance





5 Conclusions






Acknowledgment


References















































Volume 2014




Journal of











Journal of


Function Spaces






Algebra

















119909119894= 11988611989411199101+ 11988611989421199102+ sdot sdot sdot + 119886

119894119872119910119872 119894 = 1 2 119872 (8)




119860 = [1198861 1198862 119886



119895 119894 = 119895 will have





















119894=1(119879119894minus 119875119894)2

119899

MAD =sum119899

119894=1

1003816100381610038161003816119879119894 minus 1198751198941003816100381610038161003816

2

119899

MAPE =sum119899

119894=1

1003816100381610038161003816(119879119894 minus 119875119894) 1198791198941003816100381610038161003816

119899

(9)

where 119879119894and 119875






3 120576 = 2minus3) (119862 = 2

minus1 120576 = 2minus7) and (119862 = 2

1120576 = 2

















Manufacturing



Service



Finance





5 Conclusions






Acknowledgment


References















































Volume 2014




Journal of











Journal of


Function Spaces






Algebra




















119894=1(119879119894minus 119875119894)2

119899

MAD =sum119899

119894=1

1003816100381610038161003816119879119894 minus 1198751198941003816100381610038161003816

2

119899

MAPE =sum119899

119894=1

1003816100381610038161003816(119879119894 minus 119875119894) 1198791198941003816100381610038161003816

119899

(9)

where 119879119894and 119875






3 120576 = 2minus3) (119862 = 2

minus1 120576 = 2minus7) and (119862 = 2

1120576 = 2

















Manufacturing



Service



Finance





5 Conclusions






Acknowledgment


References















































Volume 2014




Journal of











Journal of


Function Spaces






Algebra






















Manufacturing



Service



Finance





5 Conclusions






Acknowledgment


References















































Volume 2014




Journal of











Journal of


Function Spaces






Algebra










Acknowledgment


References















































Volume 2014




Journal of











Journal of


Function Spaces






Algebra

























Volume 2014




Journal of











Journal of


Function Spaces






Algebra









Research Article A Hybrid Sales Forecasting Scheme by...

Documents

Transcript of Research Article A Hybrid Sales Forecasting Scheme by...