Research Article

An Improved Demand Forecasting Model Using Deep Learning Approach and Proposed Decision Integration Strategy for Supply Chain

Zeynep Hilal Kilimci,1 A. Okay Akyuz,1,2 Mitat Uysal,1 Selim Akyokus,3 M. Ozan Uysal,1 Berna Atak Bulbul,2 and Mehmet Ali Ekmis2

1 Department of Computer Engineering, Dogus University, Istanbul, Turkey
2 OBASE Research & Development Center, Istanbul, Turkey
3 Department of Computer Engineering, Istanbul Medipol University, Istanbul, Turkey

Correspondence should be addressed to Berna Atak Bulbul; [email protected]

Received 1 February 2019; Accepted 5 March 2019; Published 26 March 2019

Guest Editor: Thiago C. Silva

Hindawi Complexity, Volume 2019, Article ID 9067367, 15 pages. https://doi.org/10.1155/2019/9067367

Copyright © 2019 Zeynep Hilal Kilimci et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Demand forecasting is one of the main issues of supply chains. It aims to optimize stocks, reduce costs, and increase sales, profit, and customer loyalty. For this purpose, historical data can be analyzed to improve demand forecasting by using various methods, including machine learning techniques, time series analysis, and deep learning models. In this work, an intelligent demand forecasting system is developed. The improved model is based on the analysis and interpretation of historical data using different forecasting methods, which include time series analysis techniques, the support vector regression algorithm, and deep learning models. To the best of our knowledge, this is the first study to blend deep learning methodology, the support vector regression algorithm, and different time series analysis models through a novel decision integration strategy for demand forecasting. The other novelty of this work is the adaptation of the boosting ensemble strategy to a demand forecasting system by implementing a novel decision integration model. The developed system is applied and tested on real-life data obtained from SOK Market in Turkey, a fast-growing company with 6700 stores, 1500 products, and 23 distribution centers. A wide range of comparative and extensive experiments demonstrate that the proposed demand forecasting system achieves noteworthy results compared to state-of-the-art studies. Unlike those studies, the inclusion of support vector regression, a deep learning model, and a novel integration strategy in the proposed forecasting system ensures a significant accuracy improvement.

1. Introduction

Since competition among retailers is increasing day by day, companies are focusing more on predictive analytics techniques in order to decrease their costs and increase their productivity and profit. Excessive stocks (overstock) and out-of-stock situations (stockouts) are very serious problems for retailers. Excessive stock levels can cause revenue loss because company capital is bound to the stock surplus. Excess inventory can also lead to increased storage, labor, and insurance costs, as well as quality reduction and degradation depending on the type of the product. Out-of-stock products can result in lost sales and reduced customer satisfaction and store loyalty. If customers cannot find the products they are looking for on the shelves, they might shift to a competitor or buy substitute items. Especially in the middle- and low-level segments, maintaining customer loyalty is quite difficult for retailers [1]. Sales and customer loss is a critical problem for retailers. Considering the competition and financial constraints in the retail industry, an accurate demand forecasting and inventory control system is crucial for the management of effective operations. Supply chain operations are cost oriented, and retailers need to optimize their stocks to carry less financial risk. It seems that the retail industry will face more


competition in the future. Therefore, the usage of technological tools and predictive methods is becoming more popular and necessary for retailers [2]. Retailers in different sectors are looking for automated demand forecasting and replenishment solutions that use big data and predictive analytics technologies [3]. There has been an extensive set of methods and research in the area of demand forecasting. Traditional forecasting methods are based on time-series forecasting approaches. These approaches predict future demand based on historical time series data, which is a sequence of data points measured at successive intervals in time. These methods use a limited amount of historical time-series data related to demand. In the last two decades, data mining and machine learning models have drawn more attention and have been successfully applied to time series forecasting. Machine learning forecasting methods can use a large amount of data and features related to demand and predict future demand and patterns using different learning algorithms. Among many machine learning methods, deep learning (DL) methods have become very popular and have recently been applied to many fields such as image and speech recognition, natural language processing, and machine translation. DL methods have produced better predictions and results than traditional machine learning algorithms in many studies. Ensemble learning (EL) is another methodology to boost system performance. An ensemble system is composed of two parts: ensemble generation and ensemble integration [4]. In the ensemble generation part, a diverse set of base prediction models is generated by using different methods or samples. In the integration part, the predictions of all models are combined by using an integration strategy.

In this study, we propose a novel model to improve the demand forecasting process, which is one of the main issues of supply chains. For this purpose, nine different time series methods, the support vector regression algorithm (SVR), and a DL-based demand forecasting model are constructed. To obtain the final decision of these models in the proposed system, the nine time series methods, the SVR algorithm, and the DL model are blended by a new integration strategy reminiscent of the boosting ensemble strategy. The other novelty of this work is the adaptation of the boosting strategy to a demand forecasting model. In this way, the final decision of the proposed system relies on the best algorithms of the week, which gain more weight. This makes our forecasts more reliable with respect to trend changes and seasonality behaviors. The proposed system is implemented and tested on SOK Market real-life data. Experiment results indicate that the proposed system presents noticeable accuracy enhancements for the demand forecasting process when compared to single prediction models. In Table 3, the enhancements obtained by the novel integration method compared to the single best predictor model are presented. To the best of our knowledge, this is the first study to consolidate deep learning methodology, different time series analysis models, and a novel integration strategy for the demand forecasting process. The rest of this paper is organized as follows. Section 2 gives a summary of related work on demand forecasting, time series methods, the DL approach, and ensemble learning

methodologies. Section 3 describes the proposed framework. Sections 4, 5, and 6 present the experiment setup, experimental results, and conclusions, respectively.

2. Related Work

This section gives a summary of research on demand forecasting, EL, and DL. There are many application areas of automatic demand forecasting methodologies in the literature. Energy load demand, transportation, tourism, stock market, and retail forecasting are some of the important application areas of automatic demand forecasting. Traditional approaches for demand forecasting use time series methods. Time series methods include the naïve method, the average method, exponential smoothing, Holt's linear trend method, the exponential trend method, damped trend methods, the Holt-Winters seasonal method, moving averages, and ARMA (Autoregressive Moving Average) and ARIMA (Autoregressive Integrated Moving Average) models [5]. Exponential smoothing methods can have different forms depending on the usage of trend and seasonal components and additive, multiplicative, and damped calculations. Pegels presented the different possible exponential smoothing methods in graphical form [6]. The types of exponential smoothing methods were further extended by Gardner [7] to include additive and multiplicative damped trend methods. ARMA and ARIMA (also called the Box–Jenkins method, named after the statisticians George Box and Gwilym Jenkins) are the most common methods applied to find the best fit of a model to the historical values of a time series [8].
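As an illustration, simple exponential smoothing can be sketched in a few lines (a minimal sketch with made-up weekly demand values, not any implementation from the paper):

```python
def ses_forecast(series, alpha=0.3):
    """Simple exponential smoothing: the next forecast is a weighted blend of
    the latest observation and the previous forecast."""
    forecast = series[0]                   # initialize with the first observation
    for y in series[1:]:
        forecast = alpha * y + (1 - alpha) * forecast
    return forecast

weekly_demand = [100, 102, 98, 110, 105]   # hypothetical weekly sales
print(round(ses_forecast(weekly_demand), 2))  # → 103.51
```

A small alpha weights history heavily (smooth, slow to react); a large alpha tracks recent observations closely.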

Intermittent demand forecasting methods try to detect intermittent demand patterns, which are characterized by zero or varied demands in different periods. Intermittent demand patterns occur in areas like fashion retail, automotive spare parts, and manufacturing. Modelling intermittent demand is a challenging task because of its different variations. One of the influential methods for intermittent demand forecasting was proposed by Croston [9]. Croston's method uses a decomposition approach with separate exponentially smoothed estimates of the demand size and of the interval between demand incidences. Its superior performance over the single exponential smoothing (SES) method was demonstrated by Willemain [10]. To address some limitations of Croston's method, additional studies were performed by Syntetos and Boylan [11, 12] and Teunter, Syntetos, and Babai [13]. Some applications use multiple time series that can be organized hierarchically and combined using bottom-up and top-down approaches at different levels, in groups based on product types, geography, or other features. A hierarchical forecasting framework that provides better forecasts than either a top-down or a bottom-up approach alone is proposed by Hyndman et al. [14].
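Croston's decomposition can be sketched as follows (a simplified illustration on a made-up intermittent demand series, not the authors' code):

```python
def croston(demand, alpha=0.1):
    """Croston's method: smooth the nonzero demand sizes and the intervals
    between them separately; the forecast is size / interval per period."""
    size = interval = None
    periods_since_demand = 0
    for d in demand:
        periods_since_demand += 1
        if d > 0:
            if size is None:           # initialize on the first nonzero demand
                size, interval = float(d), float(periods_since_demand)
            else:
                size = alpha * d + (1 - alpha) * size
                interval = alpha * periods_since_demand + (1 - alpha) * interval
            periods_since_demand = 0
    return size / interval             # expected demand per period

print(round(croston([0, 3, 0, 0, 6, 0, 2]), 3))  # → 1.517
```

Because the two estimates are only updated when demand occurs, long runs of zeros do not drag the size estimate toward zero, which is the point of the decomposition.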

In the supply chain context, since there is a high number of time series methods, automatic model selection becomes very crucial [15]. In aggregate selection, a single source of forecasts is chosen for all the time series. All combinations of trend and cyclical effects in additive and multiplicative form should also be considered. Petropoulos et al. in 2014 analyzed, via regression analysis, the main determinants of forecasting


accuracy. Li et al. introduced a revised mean absolute scaled error (RMASE) in their study as a new accuracy measure for intermittent demand forecasting, which is a relative error measure and scale-independent [16]. In addition to time series methods, artificial intelligence approaches are becoming popular with the growth of big data technologies. An initial attempt was made in the study of Garcia [17].

In recent years, EL has also become popular and is used by researchers in many research areas. In the study of Song and Dai [18], a novel double deep Extreme Learning Machine (ELM) ensemble system was proposed, focusing on the problem of time series forecasting. In the study by Araque et al., a DL-based sentiment classifier was developed [19]. This classifier serves as a baseline when compared to subsequent results, proposing two ensemble techniques which aggregate the baseline classifier with other surface classifiers in sentiment analysis. Tong et al. proposed a new software defect prediction approach consisting of two phases: the deep learning phase and the two-stage ensemble (TSE) phase [20]. Qiu presented an ensemble method [21] combining the empirical mode decomposition (EMD) algorithm and a DL approach, focusing on the electric load demand forecasting problem and comparing different algorithms with their ensemble strategy. Qi et al. present the combination of the Ex-Adaboost learning strategy and DL research based on the support vector machine (SVM), and then propose a new Deep Support Vector Machine (DeepSVM) [22].

In classification problems, the performance of learning algorithms mostly depends on the nature of the data representation [23]. DL was first proposed in 2006 in the study of Geoff Hinton, who reported a significant advance in feature extraction [24]. Since then, DL researchers have opened up many new application areas in different fields [25]. Deep Belief Networks (DBN), based on Restricted Boltzmann Machines (RBMs), are a representative DL algorithm [24] in which there are connections between the layers but none among the units within each layer [24]. First, a DBN is trained on the data in an unsupervised way, learning as many of the common features of the inputs as it can. The DBN can then be optimized in a supervised way, and the resulting model can be used for classification or other pattern recognition tasks. Convolutional Neural Networks (CNN) are another instance of DL [22] and of multilayer neural networks. In a CNN, each layer contains several two-dimensional planes composed of many neurons. The main advantage of a CNN is that the weights in each convolutional layer are shared; in other words, the neurons in each two-dimensional plane use the same filter. As a result, the number of tunable parameters, and hence the computational complexity, is reduced [22, 25]. The Auto Encoder (AE) can also be used to train deep architectures in a greedy layer-wise manner [22]. In neural network (NN) terms, the output itself is meant to reproduce the input data, and different representations of the original data can be obtained by adjusting the weights of each layer. An AE is composed of an encoder and a decoder and is a NN for reconstructing the input data. Other types of DL methods can be found in [24, 26, 27].

Our work differs from the above-mentioned literature in that it is the very first attempt to employ different time series methods, the SVR algorithm, and a DL approach together for the demand forecasting process. Unlike the literature studies, a novel final decision integration strategy is proposed to boost the performance of the demand forecasting system. The details of the proposed study can be found in Section 3.

3. Proposed Framework

This section gives a summary of the base forecasting techniques, namely the time series and regression methods, the support vector regression model, the feature reduction approach, the deep learning methodology, and a new final decision integration strategy.

3.1. Time Series and Regression Methods. In our proposed system, nine different time series algorithms, including moving average (MA), exponential smoothing, Holt-Winters, and ARIMA [28] methods, and three different regression models are employed. In time series forecasting models, the classical approach is to collect historical data, analyze the underlying features of these data, and utilize the model to predict the future [28]. Table 1 shows the algorithm definitions and parameters used in the proposed system. These algorithms are commonly used forecasting algorithms in the time series demand forecasting domain.

3.2. Support Vector Regression. Support Vector Machines (SVM) is a powerful classification technique based on supervised learning theory, developed and presented by Vladimir Vapnik [29]. The background work for SVM goes back to the early studies of Vapnik and Alexei Chervonenkis on statistical learning theory in the 1960s. Although the training time of even the fastest SVMs can be quite slow, they are highly accurate, and their ability to model complex and nonlinear decision boundaries is very powerful. They are much less prone to overfitting than other methods. The support vectors can also provide a very compact description of the learned model.

We also use the SVR algorithm in our proposed system, which is the regression implementation of SVM for continuous-variable prediction problems. SVR is a regression method that preserves all the main properties of SVM (such as the maximal margin). The main idea of SVR is the computation of a linear regression function in a high-dimensional feature space, into which the input data are mapped by means of a nonlinear function. SVR has been applied in a variety of areas, especially time series and financial prediction problems; handwritten digit recognition, speaker identification, and object recognition are some of them, and its training involves convex quadratic programming with different choices of loss functions [4]. In this study, SVR is used to forecast sales demands by using the input variables explained in Table 1.
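The loss function that distinguishes SVR from ordinary least-squares regression can be illustrated directly (a minimal sketch; the tube width eps = 0.5 and the data are arbitrary choices for the example):

```python
def eps_insensitive_loss(actual, predicted, eps=0.5):
    """SVR's epsilon-insensitive loss: errors inside the eps-tube cost nothing;
    only the part of an error that exceeds the tube is penalized."""
    return sum(max(0.0, abs(a - p) - eps) for a, p in zip(actual, predicted))

# The first and third predictions fall inside the tube and contribute zero loss.
print(round(eps_insensitive_loss([1.0, 2.0, 3.0], [1.2, 2.8, 3.0]), 3))  # → 0.3
```

Ignoring small errors is what produces the sparse set of support vectors: only points outside the tube influence the fitted function.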


Table 1: Time series algorithms used in demand forecasting.

Regression Model 1 (R1).
Formulation: y_t = β_0 + β_1 X_{1t} + β_2 X_{2t} + β_3 X_{3t} + β_4 X_{4t} + ε_t
Variables: variability in weekly customer demand is explained by special days and holidays (X1), discount dates (X2), days when the product is closed for sale (X3), and exceptional sales that cannot be explained by these three activities (X4).

Regression Model 2 (R2).
Formulation: y_t = β_0 + β_1 X_{1t} + β_2 X_{2t} + β_3 X_{3t} + Σ_{j=1}^{52} γ_j W_{jt} + ε_t
Variables: R1 extended with weekly partial-regressive terms (W_{jt}, j = 1, ..., 52).

Regression Model 3 (R3).
Formulation: y_t = β_0 + β_1 X_{1t} + β_2 X_{2t} + β_3 X_{3t} + Σ_{j=1}^{12} α_j M_{jt} + Σ_{j=1}^{4} γ_j W_{jt} + ε_t
Variables: R1 extended with dummy regression terms for each month of the year (M_{jt}, j = 1, ..., 12) and for the weeks of the month (W_{jt}, j = 1, ..., 4).

SVR (Support Vector Regression).
Formulation: y_t = SVR(input variables)
Variables: the SVR prediction forecasts y_t using the inputs X1, X2, X3 of Regression Model 1, the months of the year (M_{jt}, j = 1, ..., 12), and the weeks of the month (W_{jt}, j = 1, ..., 4).

Exponential Smoothing Model.
Formulation: y_t = α y_{t-1} + (1 - α) ŷ_{t-1} + ε_t
Variables: exponential smoothing is generally used for products with no trend or seasonality.

Holt Trend Method.
Formulation:
  y_t = L_t + T_t
  L_t = α y_t + (1 - α) L_{t-1}
  T_t = β (L_t - L_{t-1}) + (1 - β) T_{t-1}
Variables: the additive Holt trend model is evaluated for products with level (L) and trend (T).

Holt-Winters Seasonal Model.
Formulation:
  y_t = L_t + T_t + S_{t+p+1}
  L_t = α y_t + (1 - α) (L_{t-1} + T_{t-1})
  T_t = β (L_t - L_{t-1}) + (1 - β) T_{t-1}
  S_{t+p+1} = γ (D_{t+1} / L_{t+1}) + (1 - γ) S_{t+1}
Variables: the additive Holt-Winters seasonal model is evaluated for products with level (L), seasonality (S), and trend (T).

Two-Level Model 1.
Formulation:
  y1_t = α y_{t-1} + (1 - α) ŷ_{t-1} + ε_t
  y2_t = β_0 + β_1 X_{1t} + β_2 X_{2t} + β_3 X_{3t} + β_4 X_{4t} + ε_t
  y_t = y1_t - y2_t + ε_t
Description: the R1 model is first applied to the time series. The demand values estimated by this model are then subtracted from the customer demand data, and the residual values are fitted with the Exponential Smoothing Model. The estimates of these two models are combined to obtain the product demand estimate.

Two-Level Model 2.
Formulation:
  y1_t = L_t + T_t
  L_t = α y1_t + (1 - α) L_{t-1}
  T_t = β (L_t - L_{t-1}) + (1 - β) T_{t-1}
  y2_t = β_0 + β_1 X_{1t} + β_2 X_{2t} + β_3 X_{3t} + β_4 X_{4t} + ε_t
  y_t = y1_t - y2_t + ε_t
Description: the R1 model is first applied to the time series. The demand values estimated by this model are then subtracted from the customer demand data, and the residual values are fitted with the Holt Trend Model. The estimates of these two models are combined to obtain the product demand estimate.

Two-Level Model 3.
Formulation:
  y1_t = L_t + T_t + S_{t+p+1}
  L_t = α y_t + (1 - α) (L_{t-1} + T_{t-1})
  T_t = β (L_t - L_{t-1}) + (1 - β) T_{t-1}
  S_{t+p+1} = γ (D_{t+1} / L_{t+1}) + (1 - γ) S_{t+1}
  y2_t = β_0 + β_1 X_{1t} + β_2 X_{2t} + β_3 X_{3t} + β_4 X_{4t} + ε_t
  y_t = y1_t - y2_t + ε_t
Description: the R1 model is first applied to the time series. The demand values estimated by this model are then subtracted from the customer demand data, and the residual values are fitted with the Holt-Winters Seasonal Model. The estimates of these two models are combined to obtain the product demand estimate.
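As an illustration of the Holt trend recursion used in Table 1, the level/trend updates can be sketched as follows (a minimal sketch on made-up data; it uses the common (L + T) smoothing term and a naive two-point initialization, which the paper does not specify):

```python
def holt_trend_forecast(series, alpha=0.5, beta=0.3):
    """Additive Holt trend: track a level L and a trend T;
    the one-step-ahead forecast is L + T."""
    level, trend = series[0], series[1] - series[0]   # simple initialization
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + trend

# A perfectly linear series is tracked exactly: the next value is 18.
print(round(holt_trend_forecast([10.0, 12.0, 14.0, 16.0]), 2))  # → 18.0
```

Alpha controls how fast the level follows the data; beta controls how fast the trend estimate adapts.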

3.3. Deep Learning. A machine learning approach can analyze the features, relationships, and complex interactions of a problem from the samples of a dataset and learn a model which can then be used for demand forecasting. Deep learning (DL) is a machine learning technique that applies deep neural network architectures to solve various complex problems. DL has become a very popular research topic among researchers and has been shown to provide impressive results in image processing, computer vision, natural language processing, bioinformatics, and many other fields [25, 26].

In principle, DL is an implementation of artificial neural networks, which mimic the natural human brain [30]. However, a deep neural network is more powerful and capable of analyzing and composing more complex features and relationships than a traditional neural network. DL requires high computing power and large amounts of data for training. Recent improvements in GPUs (Graphical Processing Units) and parallel architectures have enabled the computing power required by deep neural networks. DL uses successive layers of neurons, where each layer extracts more complex and abstract features from the output of the previous layers.


Thus, a DL model can automatically perform feature extraction by itself, without any preprocessing step. Visual object recognition, speech recognition, and genomics are some of the fields where DL has been applied successfully [26].

A multilayer feedforward artificial neural network (MLFANN) is employed as the deep learning algorithm in this study. In a feedforward neural network, information moves from the input nodes to the hidden nodes and, finally, to the output nodes through the successive layers of the network, without any feedback. The MLFANN is trained with stochastic gradient descent using backpropagation. Gradient descent updates the weights so as to minimize the squared error between the network output values and the target output values; each weight is adjusted according to its contribution to the error. For the deep learning part, the H2O library [31], an open source big data artificial intelligence platform, is used.
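The weight-update rule described above can be illustrated on a single linear neuron (a toy sketch of one stochastic-gradient step with an arbitrary learning rate, not the H2O implementation):

```python
def sgd_step(w, b, x, y, lr=0.1):
    """One stochastic gradient descent step for a single linear neuron with
    squared error E = 0.5 * (w*x + b - y)**2.
    The gradients are dE/dw = (pred - y) * x and dE/db = (pred - y)."""
    pred = w * x + b
    err = pred - y
    return w - lr * err * x, b - lr * err

w, b = 0.0, 0.0
for _ in range(100):                 # repeatedly fit the single point (x=1, y=1)
    w, b = sgd_step(w, b, x=1.0, y=1.0)
print(round(w + b, 3))               # the prediction at x=1 converges to 1.0
```

In the full network the same idea is applied layer by layer: backpropagation computes each weight's contribution to the error, and the update subtracts the learning rate times that gradient.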

3.4. Feature Extraction. In this work, there are some issues caused by the huge amount of data. When we tried to use all of the features of all products in each store, with 155 features and 875 million records, the deep learning algorithm took days to finish its job due to limited computational power. Therefore, reducing the number of features and the computational time required by each algorithm was necessary. To overcome this problem and to accelerate the clustering and modelling phases, PCA (Principal Component Analysis) is used as the feature extraction algorithm. PCA is a commonly used dimension reduction algorithm that retains the most significant features. PCA transforms each instance of the given dataset from the d-dimensional space to a k-dimensional subspace, in which the newly generated set of k dimensions are called the Principal Components (PC) [32]. Each principal component is directed toward the maximum variance, excluding the variance accounted for in all its preceding components. As a result, the first component covers the maximum variance, and so on. In brief, the Principal Components are represented as

PC_i = a_1 X_1 + a_2 X_2 + ... + a_d X_d,  (1)

where PC_i is the i-th principal component, X_j is the j-th original feature, and a_j is the numerical coefficient for feature X_j.
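Equation (1) is simply a dot product between an instance and a component's loading vector (a sketch with illustrative numbers only; real loadings come from the eigenvectors of the data's covariance matrix):

```python
def pc_score(instance, loadings):
    """Projection of one instance onto one principal component (Eq. (1)):
    PC_i = a_1*X_1 + a_2*X_2 + ... for that component's coefficients a_j."""
    return sum(a * x for a, x in zip(loadings, instance))

print(pc_score([1.0, 2.0, 3.0], [0.5, 0.5, 0.0]))  # → 1.5
```

Keeping only the first k components amounts to computing k such scores per instance, which is how the 155 original features are compressed before modelling.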

3.5. Proposed Integration Strategy. The decision integration approach combines the strengths of different algorithms into a single collaborative method. With this approach, we aim to improve the success of the proposed forecasting system by combining different algorithms. Since each algorithm can be more sensitive, or can have weaknesses, under different conditions, combining the decisions of the individual models provides more effective and powerful results for decision-making processes.

There are two main EL approaches in the literature, called homogeneous and heterogeneous EL. If different types of classifiers are used as the base algorithms, such a system is called a heterogeneous ensemble; otherwise, it is a homogeneous ensemble. In this study, we concentrate on heterogeneous ensembles. An ensemble system is composed of two parts: ensemble generation and ensemble integration [33–36]. In the ensemble generation part, a diverse set of models is generated using different base classifiers. Nine different time series and regression methods, the support vector regression model, and the deep learning algorithm are employed as base classifiers in this study. There are many integration methods that combine the decisions of base classifiers to obtain a final decision [37–40]. For the integration step, a new decision integration strategy for the demand forecasting model is proposed in this work. We draw inspiration from the boosting ensemble model [4, 41] to construct our proposed decision integration strategy. The essential concept behind the boosting methodology is that, in prediction or classification problems, the final decision can be calculated as a weighted combination of the individual results.

To integrate the predictions of the individual forecasting algorithms, we use two integration strategies in the proposed approach. The first integration strategy selects the best performing forecasting method among the others and uses that method to predict the demand of a product in a store for the next week. The second integration strategy chooses the best performing forecasting methods of the current week and calculates the prediction by combining the weighted predictions of the winners. In our decision integration strategy, the final decision is determined by regarding the contribution of all algorithms, as in democratic systems. Our approach considers the decisions of the forecasting algorithms that perform better over the previous 4 weeks of this year and over the previous year's counterparts of the current and adjacent weeks. As algorithms produce better forecasts over time, their contribution weights increase accordingly, and vice versa. In the first decision integration strategy, we do not consider the contributions of all algorithms in a fully democratic manner; we only look at the contributions of the best algorithms of the related week. In other words, the final decision is obtained by integrating the decisions of only the best methods (selected according to their historical decisions) for each store and product. Note that the best algorithms of a week can change for each store-product couple, since every algorithm behaves differently for different products and locations.
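The first strategy amounts to an arg-min over historical errors per store-product pair, which can be sketched as follows (the algorithm names and MAPE values are hypothetical):

```python
def pick_winner(mape_history):
    """Strategy 1: for one store-product pair, select the algorithm whose
    recent weekly MAPEs are lowest on average; only its forecast is used."""
    return min(mape_history,
               key=lambda alg: sum(mape_history[alg]) / len(mape_history[alg]))

history = {                       # hypothetical MAPEs over the look-back weeks
    "ARIMA": [0.20, 0.25],
    "SVR":   [0.15, 0.18],
    "DL":    [0.14, 0.16],
}
print(pick_winner(history))       # → DL
```

Run separately for every store-product pair, this naturally lets different algorithms win in different stores and for different products, as the text describes.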

The second decision integration strategy is based on the weighted average of the mean absolute percentage error (MAPE) and the mean absolute deviation (MAD) [42]. These are two popular evaluation metrics to assess the performance of forecasting models. The equations for these forecast accuracy measures are as follows.

MAPE (Mean Absolute Percentage Error):

\[ \mathrm{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \frac{\left| F_t - A_t \right|}{A_t} \quad (2) \]

MAD (Mean Absolute Deviation):

\[ \mathrm{MAD} = \frac{1}{n} \sum_{t=1}^{n} \left| F_t - A_t \right| \quad (3) \]

6 Complexity

where

(i) F_t is the forecasted (estimated) value for period t,
(ii) A_t is the actual value for period t,
(iii) n is the number of periods.

The accuracy of a model can be simply computed as Accuracy = 1 − MAPE. Smaller values of the above two measures indicate a better performing forecasting model [43].
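The accuracy measures of Equations (2) and (3) can be sketched directly in a few lines of Python; the forecast and actual series below are illustrative toy values, not from the paper's dataset.

```python
# Forecast accuracy measures from Equations (2) and (3).
# F: forecasts, A: actuals, over n periods.

def mape(F, A):
    """Mean Absolute Percentage Error: (1/n) * sum(|F_t - A_t| / A_t)."""
    n = len(F)
    return sum(abs(f - a) / a for f, a in zip(F, A)) / n

def mad(F, A):
    """Mean Absolute Deviation: (1/n) * sum(|F_t - A_t|)."""
    n = len(F)
    return sum(abs(f - a) for f, a in zip(F, A)) / n

def accuracy(F, A):
    """Accuracy = 1 - MAPE, as used in the text."""
    return 1.0 - mape(F, A)

forecasts = [90, 110, 100]
actuals = [100, 100, 100]
print(mape(forecasts, actuals))      # (0.1 + 0.1 + 0.0) / 3 ≈ 0.0667
print(mad(forecasts, actuals))       # (10 + 10 + 0) / 3 ≈ 6.667
print(accuracy(forecasts, actuals))  # ≈ 0.9333
```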

Replenishment-oriented demand forecasting in the retail industry requires some modifications of general EL strategies regarding retail-specific trends and dynamics. For this purpose, the proposed second integration strategy computes, in Equation (4), a weighted average of the MAPEs of each of the previous 4 weeks, together with the MAPEs of the previous year's same week and its adjacent week. This makes our forecasts more reliable with respect to trend changes and seasonality behaviors. The MAPE of each week is calculated by the formula defined in Equation (2) above. The proposed model automatically changes the weekly weights of each member algorithm according to its average MAPE for each week at the store and product level. Based on the resulting MAPE weights, the best algorithms of the week gain more weight in the final decision.

The average MAPE of each algorithm for a product at each store is calculated with Equation (4). Suppose that the MAPEs of an algorithm are 0.2, 0.3, 0.1, and 0.1 for the previous 4 weeks of the current week, and 0.1 and 0.2 for the previous year's same week and its preceding week. Assume that the weekly coefficients are 25%, 20%, 10%, 5%, 30%, and 10%, respectively. Then the average MAPE of the algorithm for the current week is 0.2 ∗ 0.25 + 0.3 ∗ 0.20 + 0.1 ∗ 0.10 + 0.1 ∗ 0.05 + 0.1 ∗ 0.30 + 0.2 ∗ 0.10 = 0.175 according to Equation (4).

\[ M_{avg} = \sum_{k=1}^{n} C_k M_k \quad (4) \]

In Equation (4), the sum of the coefficients should be 1, that is, \(\sum_{k=1}^{n} C_k = 1\), and \(M_k\) denotes the MAPE of the related week, where

(i) M1 is the MAPE of the previous week,
(ii) M2 is the MAPE of 2 weeks before the related week,
(iii) M3 is the MAPE of 3 weeks before the related week,
(iv) M4 is the MAPE of 4 weeks before the related week,
(v) M5 is the MAPE of the previous year's same week,
(vi) M6 is the MAPE of the previous year's previous week.
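Equation (4) can be sketched as a simple dot product of the six weekly MAPEs with their coefficients. The MAPE values and coefficients below follow the worked example in the text (note that, with the stated MAPEs M1..M6, the weighted average works out to 0.175).

```python
# Weighted average MAPE of one algorithm (Equation (4)):
# M_avg = sum_k C_k * M_k, with the constraint sum_k C_k = 1.

def average_mape(mapes, coefficients):
    assert abs(sum(coefficients) - 1.0) < 1e-9, "coefficients must sum to 1"
    return sum(c * m for c, m in zip(coefficients, mapes))

# MAPEs M1..M6 and weekly coefficients from the worked example:
# previous 4 weeks, previous year's same week, previous year's previous week.
mapes = [0.2, 0.3, 0.1, 0.1, 0.1, 0.2]
coeffs = [0.25, 0.20, 0.10, 0.05, 0.30, 0.10]
print(average_mape(mapes, coeffs))  # ≈ 0.175
```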

The proposed decision integration system computes a MAPE for each algorithm for the coming forecasting week at the store and product level. In addition, the effects of special days are taken into consideration; Christmas, Valentine's Day, Mother's Day, the start of Ramadan, and other religious days are some examples of special days. System users can define special days manually, considering the current year's calendar. The trends of special days are then computed automatically using the previous year's trends. This enables the consideration of seasonality and special events, which results in more accurate forecasting values; seasonality and other effects can be taken into consideration as well. Following the MAPE calculation of the algorithms, new weights (W_i, where i = 1, ..., n) are assigned to each of them according to their weighted averages for the current week.

The next step is the determination of the best algorithms of the week for each store and product couple. We take only the top 30% of the best performing methods into consideration as the best algorithms. After many empirical observations, this ratio gave the best results on our dataset. Obviously, these parameters are very specific to the dataset characteristics and also depend on which algorithms are included in the integration strategy. Scaling is the last preprocessing step for the calculation of the final decision forecast. Suppose that we have n algorithms (A1, A2, ..., An) and k of them are the best models of the week for a store and product couple, where k ≤ n. The weight of each winner is scaled according to its weighted average in Equation (5).

\[ W'_t = \frac{W_t}{\sum_{j=1}^{k} W_j}, \quad t \in \{1, \ldots, k\} \quad (5) \]
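The selection of the week's winners can be sketched as follows: rank the algorithms by their weighted-average MAPE (lower is better) and keep the top 30%. The algorithm names and MAPE values below are purely illustrative assumptions.

```python
# Selecting the week's best algorithms: keep the top 30% of methods with
# the lowest weighted-average MAPE. The 30% cut-off is the empirically
# chosen ratio from the text; the names and values are illustrative.
import math

def best_algorithms(avg_mapes, ratio=0.3):
    """avg_mapes: dict mapping algorithm name -> weighted average MAPE."""
    k = max(1, math.ceil(len(avg_mapes) * ratio))
    ranked = sorted(avg_mapes, key=avg_mapes.get)  # lower MAPE is better
    return ranked[:k]

avg = {"ARIMA": 0.31, "Holt": 0.24, "Croston": 0.40, "SVR": 0.18,
       "DeepLearning": 0.16, "MovingAvg": 0.35, "ExpSmooth": 0.27,
       "Naive": 0.45, "SeasonalNaive": 0.38, "Theta": 0.29, "Winters": 0.33}
print(best_algorithms(avg))  # ['DeepLearning', 'SVR', 'Holt', 'ExpSmooth']
```

With 11 candidate methods, the 30% cut keeps ceil(3.3) = 4 winners, whose weights are then scaled by Equation (5).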

Assume that there are 3 best algorithms and that their weights are W_1 = 30%, W_2 = 18%, and W_3 = 12%, respectively. Their scaled new weights become

W'_1 = 30/60 = 50%,
W'_2 = 18/60 = 30%,
W'_3 = 12/60 = 20%,  (6)

respectively, according to Equation (5). Scaling makes our calculation more comprehensible.

After scaling the weight of each algorithm with Equation (5), the system gives its final decision according to the new weights, considering the performance of each algorithm, with Equation (7).

\[ F_{avg} = \sum_{j=1}^{k} F_j W'_j \quad (7) \]

In Equation (7), the main constraint is \(\sum_{j=1}^{k} W'_j = 1\), where k is the number of champion algorithms and F_j is the forecast of the related algorithm. Suppose that the best performing algorithms are A1, A2, and A3, and that for the next week algorithm A1 forecasts a sales quantity of 20, A2 forecasts 10, and A3 forecasts 5. Let us assume that their scaled weights are 50%, 30%, and 20%, respectively. Then the weighted forecast according to Equation (7) is as follows:

\[ F_{avg} = 20 \cdot 0.50 + 10 \cdot 0.30 + 5 \cdot 0.20 = 14 \quad (8) \]
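Equations (5) and (7) together can be sketched in a few lines, reproducing the worked example above: weights 30, 18, and 12 scale to 0.50, 0.30, and 0.20, and forecasts 20, 10, and 5 combine to 14.

```python
# Scaling the winners' weights (Equation (5)) and combining their
# forecasts (Equation (7)).

def scale_weights(weights):
    total = sum(weights)
    return [w / total for w in weights]  # W'_t = W_t / sum_j W_j

def combine_forecasts(forecasts, weights):
    scaled = scale_weights(weights)
    return sum(f * w for f, w in zip(forecasts, scaled))  # F_avg = sum F_j W'_j

print(scale_weights([30, 18, 12]))                   # [0.5, 0.3, 0.2]
print(combine_forecasts([20, 10, 5], [30, 18, 12]))  # 14.0, as in Eq. (8)
```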

Every algorithm has voting rights in proportion to its weight. Finally, if a member algorithm does not appear in the list of the best algorithms for a product and store couple for a specific period, it is automatically put on a blacklist, so that it will not be used anymore at that product and store level


Figure 1: Flowchart of the proposed system. (Data Collection → Data Cleaning → Data Enrichment from Big Data → Finding Frequent Items using Apriori → Replenishment Data Mart on Hadoop/Spark → Dimensionality Reduction → Clustering → Deep Learning Model / Time Series Models / Support Vector Regression Model → Proposed Integration Strategy → Final Decision.)

once it is so marked. This enables faster computation times, since the system disregards poor performing algorithms. The algorithm of the proposed demand forecasting system and the flowchart of the proposed system are given in Algorithm 1 and Figure 1, respectively.

4 Experiment Setup

We use an enhanced dataset that includes SOK Market's real-life sales and stock data with enriched features. The dataset consists of 106 weeks of sales data covering 7,888 distinct products across different stores of SOK Market. In each store, around 1,500 products are actively sold, while the rest

rarely become active. Forecasting is performed each week to predict the demand of the following week. A three-fold cross-validation methodology is applied on the test data, and the most accurate model is then selected for the forecast of the coming week; this is a common approach for demand forecasting in retail [44]. Furthermore, outside weather information, such as daily temperature and other weather conditions, is joined to the model as new variables. Since it is known that, in most circumstances, there is a certain correlation between shopping trends and weather conditions in retail [33], we enriched the dataset with weather information. The dataset also includes mostly sales-related features filled from relational legacy data sources.


Given: n is the number of stores, m is the number of products, t is the number of algorithms in the system, and A_t is the algorithm with index t. s_{mn} is the matrix which holds the set of best performing algorithms for each store and product, B_{mn} is the matrix which contains the blacklist of algorithms for each store and product, and F_{ij} is the matrix which stores the final decision of each forecast, where i is the number of stores and j is the number of products.

for i = 1, ..., n
  for j = 1, ..., m
    for k = 1, ..., t
      if A_k is in list B_ij then continue
      else run A_k; calculate algorithm weight W_k
      end if
    end for
    for k = 1, ..., t
      choose the best performing algorithms and place them in s_ij
    end for
    for z = 1, ..., |s_ij|
      do scaling for A_z
    end for
    calculate the proposed integration strategy and store the result in F_ij
  end for
end for
return all F_nm

Algorithm 1 The algorithm of the proposed demand forecasting system
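A minimal pure-Python sketch of Algorithm 1 is given below. The forecasting algorithms, the blacklist contents, and the `run_algorithm`/`weight_of` callables are placeholders standing in for the real models and the Equation (4) weighting; only the control flow mirrors the pseudocode.

```python
# Sketch of Algorithm 1: per store and product, skip blacklisted
# algorithms, weight the rest, keep the top `ratio` share as winners,
# then scale and combine their forecasts (Equations (5) and (7)).
import math

def integrate(stores, products, algorithms, blacklist, run_algorithm,
              weight_of, ratio=0.3):
    """Return final forecasts F[(store, product)]."""
    F = {}
    for i in stores:
        for j in products:
            results, weights = {}, {}
            for name, algo in algorithms.items():
                if name in blacklist.get((i, j), set()):
                    continue  # blacklisted algorithms are skipped entirely
                results[name] = run_algorithm(algo, i, j)
                weights[name] = weight_of(name, i, j)  # weight from Eq. (4)
            k = max(1, math.ceil(len(weights) * ratio))
            best = sorted(weights, key=weights.get, reverse=True)[:k]
            total = sum(weights[b] for b in best)
            # scale the winners' weights and combine their forecasts
            F[(i, j)] = sum(results[b] * weights[b] / total for b in best)
    return F

# Toy usage: three constant "algorithms" whose forecast is just a number.
algos = {"A1": 20, "A2": 10, "A3": 5}
w = {"A1": 0.5, "A2": 0.3, "A3": 0.2}
F = integrate([1], ["p"], algos, {}, lambda a, i, j: a,
              lambda name, i, j: w[name])
print(F[(1, "p")])  # top 30% of 3 algorithms = 1 winner (A1): 20.0
```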

A deep learning approach is used to estimate customer demand for each product at each store of the SOK Market retail chain. There is also a very apparent similarity, in terms of sales trends, between products that are mostly in the same basket. For this purpose, the Apriori algorithm [4] is utilized to find the correlated products of each product. Apriori is a fast way to find the most associated products that appear in the same basket. The sales-related features of the most associated product are added to the dataset as well. This takes advantage of the relationships among products with similar sales trends. Adding the associated product of a product and its features enables the DL algorithm to learn better within its different hidden layers. In summary, extensive experiments show that enriching the training dataset with weather data and with the sales data of the most associated products made our estimates more powerful with the DL algorithm.
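As a simplified stand-in for this Apriori step, the sketch below finds, for each product, the product it most frequently co-occurs with in the same basket by counting item pairs. A real Apriori implementation also prunes candidates by a minimum support threshold; the baskets here are toy data.

```python
# For each product, find its "most associated" partner: the product that
# appears with it in the largest number of baskets (pair counting only;
# support-based pruning of full Apriori is omitted for brevity).
from collections import Counter
from itertools import combinations

def most_associated(baskets):
    pair_counts = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            pair_counts[(a, b)] += 1
    best = {}
    for (a, b), count in pair_counts.items():
        for x, y in ((a, b), (b, a)):
            if count > best.get(x, (None, 0))[1]:
                best[x] = (y, count)
    return {p: partner for p, (partner, _) in best.items()}

baskets = [["bread", "milk"], ["bread", "milk", "eggs"], ["bread", "eggs"]]
print(most_associated(baskets))
```

In the paper's pipeline, the partner found this way supplies the `Sales Quantity Adj Week[0-8]` style features that are appended to each product's training record.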

The dataset includes weekly data of each product at each store for the latest 8 weeks and the sales quantities of the last 4 weeks in the same season. Historically, 2 years of data are included for each of 5,500 stores and 1,500 products. The data size is about 875 million records with 155 features. These features include sales quantity, customer return quantity, product quantities received from distribution centers, sales amount, discount amount, receipt counts of a product, counts of customers who bought a specific product, sales quantity for each day of the week (Monday, Tuesday, etc.), maximum and minimum stock level of each day, average stock level for each week, sales quantity for each hour of the day, and also the sales quantities of the last 4 weeks of the same season. Since there is a relationship between products that are generally in the same basket, we prepared the same features defined above for

the most associated products as well. A detailed explanation of the dataset is presented in Table 2.

In Table 2, fields given with square brackets denote an array over weeks (e.g., Sales Quantity Week 0 means the sales quantity of the current week, Sales Quantity Week 1 the sales quantity of one week before, and so on). According to the experimental results, taking weekly data for the latest 8 weeks and for 4 weeks of the same season in the previous year captures trends and changes in seasonality more acceptably.

For the DL implementation part, the H2O library [31] is used as an open-source big data artificial intelligence platform. H2O is a powerful machine learning library that makes it possible to implement DL algorithms on Hadoop/Spark big data environments. It brings together highly advanced machine learning algorithms and truly scalable in-memory processing for Spark, on one or many nodes, via its Sparkling Water version. With the in-memory processing capability of Spark technologies, it provides a faster parallel platform to exploit big data for business requirements. In this study, we use H2O version 3.14.0.2 on a 64-bit, 16-core machine with 32 GB of memory to observe the power of parallelism during DL modelling while increasing the number of neurons.

A multilayer feedforward artificial neural network (MLFANN) is employed as the deep learning algorithm. The MLFANN is trained with stochastic gradient descent using backpropagation: with gradient descent, each weight is adjusted according to its contribution to the error. H2O also has advanced features like momentum training, an adaptive learning rate, L1 or L2 regularization, rate annealing,


Table 2: Dataset explanation

Name | Description | Example
Yearweek | Related year/week; weeks run from Monday to Sunday | 201801
Store | Store number | 1234
Product | Product identification number | 1
Product Adjacent | Product associated with the product according to the Apriori algorithm; the most frequent product in the same basket with a specific product | 2
Stock In Quantity Week[0-8] | Stock increase quantity of the product in the related week, e.g., stock transfer quantity from distribution center to store | 50
Return Quantity Week[0-8] | Stock return quantity from customers in a specific week and store | 20
Sales Quantity Week[0-8] | Weekly sales quantity of the related product at a specific store | 120
Sales Amount Week[0-8] | Total sales amount of the product on the customer receipt | 2500
Discount Amount Week[0-8] | Discount amount of the product, if any | 500
Customer Count Week[0-8] | Number of customers who bought this product in a specific week | 30
Receipt Count Week[0-8] | Distinct receipt count for the related product | 20
Sales Quantity Time[9-22] | Hourly sales quantity of the related product from 9 a.m. to 10 p.m. | 5
Last4weeks Day[1-7] | Total sales for each weekday of the last 4 weeks (total sales of Mondays, Tuesdays, etc.) | 10
Last8weeks Day[1-7] | Total sales for each weekday of the last 8 weeks (total sales of Mondays, Tuesdays, etc.) | 10
Max Stock Week[0-8] | Maximum stock quantity of the related week | 12
Min Stock Week[0-8] | Minimum stock quantity of the related week | 2
Avg Stock Week[0-8] | Average stock quantity of the related week | 5
Sales Quantity Adj Week[0-8] | Sales quantity of the most associated product | 14
Temperature Weekday[1-7] | Daily temperature for each weekday (Monday, Tuesday, etc.) | 22
Weekly Avg Temperature[0-8] | Average temperature of the related week | 23
Weather Condition Weekday[1-7] | Nominal variable: rainy, snowy, sunny, cloudy, etc. | Rainy
Sales Quantity Next Week | Target variable of our prediction problem | 25

dropout, and grid search. A Gaussian distribution is applied because a continuous variable is used as the response variable. H2O performs very well in our environment when 3 hidden layers with 10 nodes each and a total of 300 epochs are set as parameters.

After the dimension reduction step, K-means is employed as the clustering algorithm. K-means is a greedy and fast clustering algorithm [45] that tries to partition samples into k clusters such that each sample is near its cluster center. K is selected as 20 because, after several trials and empirical observations, this gave the most even distribution on the dataset, dividing the products of each store into 20 different clusters. After that, the deep learning algorithm is applied, yielding 20 different DL models for each store. Demand forecasting for each product is then performed, on a store basis, using its cluster's model. The main reason for using clustering is the time and computing cost of the DL algorithm: instead of building a model per product, 20 models are generated for each store at the product level. In our trials with a sample dataset, the bias difference is less than 2%, so this speed-up is very beneficial and a good option for researchers. The forecasting result of the DL model is transferred into our forecasting system, which makes its final decision by considering the decisions of 11 forecasting methods, including the DL model.
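The clustering step can be sketched with a compact k-means on a single feature. The one-dimensional "average weekly sales" feature, k = 2, and the deterministic initialization below are toy assumptions for illustration (the paper clusters full feature vectors with k = 20, and standard k-means uses random initialization).

```python
# Compact 1-D k-means sketch: group products into k clusters so that a
# separate forecasting model can be trained per cluster.

def kmeans_1d(points, k, iters=20):
    # Naive deterministic initialization: the k smallest distinct values.
    centers = sorted(set(points))[:k]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: abs(p - centers[c]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep empty ones).
        centers = [sum(cl) / len(cl) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return centers, clusters

sales = [1, 2, 2, 3, 40, 42, 45]          # slow movers vs. fast movers
centers, clusters = kmeans_1d(sales, k=2)
print([round(c, 1) for c in centers])     # [2.0, 42.3]
```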

Forecasting solutions should be scalable: they should process large amounts of data and extract a model from the data. Big data environments using Spark technologies give us the opportunity to implement algorithms with scalability, sharing the tasks of machine learning algorithms among the nodes of parallel architectures. Considering the huge number of samples and the large number of features, even the computational power of a parallel architecture is not enough in some cases. For this reason, dimension reduction is needed for big data applications. In this study, PCA is employed as a feature extraction step.

5 Experimental Results


SOK Market sells 21 different groups of items. We assess the performance of the proposed forecasting system on a group basis, as shown in Table 3. Table 3 shows the MAPE (Mean Absolute Percentage Error) of integration strategies (S1) and (S2)


Table 3: Demand forecasting improvements per product group

Product Groups | Method I without Proposed Integration Strategy (S1) MAPE | Method II with Proposed Integration Strategy (S2) MAPE | Proposed Integration Strategy with Deep Learning (SD) MAPE | Percentage Success Rate P1 = (S1 − S2)/S1 | Percentage Success Rate P2 = (S1 − SD)/S1 | Improvement D = P2 − P1
Baby Products | 0.5157 | 0.3081 | 0.2927 | 40.26% | 43.25% | 2.99%
Bakery Products | 0.3482 | 0.2059 | 0.1966 | 40.85% | 43.52% | 2.67%
Beverage | 0.3714 | 0.2316 | 0.2207 | 37.64% | 40.58% | 2.94%
Biscuit-Chocolate | 0.3358 | 0.2077 | 0.1977 | 38.14% | 41.13% | 2.99%
Breakfast Products | 0.4430 | 0.2770 | 0.2661 | 37.65% | 40.11% | 2.45%
Canned-Paste-Sauces | 0.3836 | 0.2309 | 0.2198 | 39.80% | 42.69% | 2.89%
Cheese | 0.3953 | 0.2457 | 0.2345 | 37.84% | 40.68% | 2.84%
Cleaning Products | 0.4560 | 0.2791 | 0.2650 | 38.79% | 41.89% | 3.10%
Cosmetics Products | 0.5397 | 0.3266 | 0.3148 | 39.49% | 41.67% | 2.19%
Deli Meats | 0.4242 | 0.2602 | 0.2488 | 38.65% | 41.36% | 2.70%
Edible Oils | 0.4060 | 0.2299 | 0.2215 | 43.36% | 45.45% | 2.09%
Household Goods | 0.5713 | 0.3656 | 0.3535 | 36.01% | 38.13% | 2.12%
Ice Cream-Frozen | 0.5012 | 0.3255 | 0.3106 | 35.05% | 38.03% | 2.98%
Legumes-Pasta-Soup | 0.3850 | 0.2397 | 0.2269 | 37.74% | 41.07% | 3.33%
Nuts-Chips | 0.3316 | 0.2049 | 0.1966 | 38.20% | 40.71% | 2.51%
Poultry-Eggs | 0.4219 | 0.2527 | 0.2403 | 40.11% | 43.04% | 2.94%
Ready Meals | 0.4613 | 0.2610 | 0.2520 | 43.42% | 45.36% | 1.94%
Red Meat | 0.2514 | 0.1616 | 0.1532 | 35.71% | 39.06% | 3.35%
Tea-Coffee Products | 0.4347 | 0.2650 | 0.2535 | 39.04% | 41.68% | 2.64%
Textile Products | 0.5418 | 0.3048 | 0.2907 | 43.74% | 46.34% | 2.60%
Tobacco Products | 0.3791 | 0.2378 | 0.2290 | 37.29% | 39.61% | 2.32%
Average | 0.4238 | 0.2582 | 0.2469 | 38.99% | 41.68% | 2.69%


Figure 2: MAPE distribution according to product groups with the S1 integration strategy. (Box plots of MAPE, on a 0.00–1.00 scale, for each of the 21 product groups.)

in columns 2 and 3. The average percentage forecasting errors of strategies (S1) and (S2) are 0.42 and 0.26, respectively. As can be seen from Table 3, the percentage success rate of (S2) with respect to (S1) is presented in column 5. The current integration strategy (S2) provides a 38.99% improvement over the first integration strategy (S1) on average. In some product groups, like Textile Products, the error rate is reduced from 0.54 to 0.30, a 43.74% enhancement in MAPE.

With this work, a further improvement is obtained by utilizing the DL model compared to the forecasting strategy based on just 10 algorithms. Column 4 of Table 3 demonstrates the MAPEs of the 21 different groups of products after the addition of the DL model. After the inclusion of the DL model in the proposed system, we observed around a 2% to 3.4% increase in demand forecast accuracy in some product groups. The error rates of some product groups, like "baby products", "cosmetics products", and "household goods", are usually higher than the others, since these product groups do not have continuous consumer demand. For example, the error rate for the "household goods" group is 0.35, while the error rate for the "red meat" group is 0.15. The error rates for food products are usually lower, since customers buy these products regularly for their daily consumption. However, the error rates are higher in baby, textile, and household goods: customers buy products from these groups selectively, depending on whether they like the product or not. In addition, the prices of goods in these groups, and the promotion effects, are usually high relative to the prices of goods in other groups.

Moreover, when the percentage success rate column is analyzed, it is observed that for Method I the top benefitting product groups achieve over a 40% success rate: Baby Products, Bakery Products, Edible Oils, Poultry-Eggs, Ready Meals, and Textile Products. The common point of these groups is that each of them is consumed regularly by a specific customer segment. For instance, the baby products group is chosen by families with kids; Edible Oils, Poultry-Eggs, and Bakery Products are preferred by frequently and continuously shopping customer segments; and Ready Meals are mostly preferred by bachelor, single, and working consumers. Furthermore, the inclusion of DL in the forecasting system shows that some other consumption groups (for example, Cleaning Products, Red Meat, and Legumes-Pasta-Soup) exhibit better performance than the others, with an additional improvement of over 3%.

Figure 2 shows the box plots of the mean absolute percentage errors (MAPE) for each group of products after application of the (S1) integration strategy. The box plots enable us to analyze the distributional characteristics of the forecasting errors for the product groups. As can be seen from the figure, the medians differ across product groups; thus the forecasting errors are usually dissimilar for different product groups. The interquartile range box represents the middle 50% of scores in the data. The interquartile range boxes are usually very tall, which means that there is a considerable spread of prediction errors within the products of a given group.

Figure 3 presents the box plots of the MAPEs of the proposed forecasting system applying the proposed integration approach (S2), which combines the predictions of 10 different forecasting algorithms.

Figure 4 shows the results after adding DL to our system. The interquartile range boxes are narrower compared to the ones in Figures 2 and 3, which means that the spread of the forecasting errors is smaller than in Figures 2 and 3. Finally, the integration of DL into the proposed forecasting system generates results that are more accurate.


Figure 3: MAPE distribution according to product groups with the S2 integration strategy. (Box plots of MAPE, on a 0.00–1.00 scale, for each of the 21 product groups.)

Figure 4: MAPE distribution according to product groups after the addition of the deep learning algorithm. (Box plots of MAPE, on a 0.00–1.00 scale, for each of the 21 product groups.)

For example, for the (S1) integration strategy, the MAPE distribution of the Baby Products group in Figure 2 is observed between 0% and 23% for the 1st quartile of the data, between 23% and 48% for the 2nd quartile, and between 48% and 75% for the 3rd and 4th quartiles. After applying the integration strategy, the MAPE distribution for the Baby Products group becomes 0–16% for the 1st quartile, 16–30% for the 2nd quartile, and 30–50% for the rest. According to the boundary differences, this gives around an 8% improvement for the first quartile, an 8% to 18% enhancement for the second quartile, and an 18% to 25% advancement for the rest of the data. The median of the MAPE distribution is 48%; after applying the proposed integration strategy, it is observed as 27%, which means a 21% improvement. Approximately a further 1–3% enhancement is observed for each quartile of the data with the inclusion of the deep learning strategy in Figure 4.

For Cleaning Products, improvements similar to the others are observed with the integration strategy. The MAPE distribution of the 1st quartile of the data is observed between 0% and 19%, for the 2nd quartile it is between 19% and 40%, and for the rest of the data it is observed between 40% and


Figure 5: Accuracy comparison among integration strategies (Actual, S1, S2, SD) for one sample product group, from July 2017 to July 2018. (y-axis: total sales quantity, 0–9000.)

73% in Figure 2. After performing the proposed integration strategy for Cleaning Products, the MAPE distribution becomes 0–10% for the 1st quartile of the data, 10–17% for the 2nd quartile, and 27–43% for the 3rd and 4th quartiles together. The improvements at the boundaries are as follows: 9% for the 1st quartile, from 9% to 13% for the 2nd quartile, and from 13% to 20% for the rest of the data. After implementation of the integration strategy, the advancement of the system is observed as 2% for the 1st and 2nd quartiles and 3% for the 3rd and 4th quartiles, respectively. Briefly, the improvement of the integration strategy is remarkable for each product group, and the consolidation of DL into the proposed integration strategy additionally advances the results by almost 2% to 3%.

In Figure 5, an accuracy comparison among the integration strategies over a 1-year period is presented for one of the best performing sample groups. In particular, Christmas and other seasonality effects can be seen very clearly in Figure 5, and the integration strategy with DL (SD) performs with the best accuracy.

It is hard to compare the performance of our proposed system with other studies because of the lack of works with a similar combination of a deep learning approach, the proposed decision integration strategies, and different learning algorithms for the demand forecasting process. Another difficulty is the absence of real-life datasets similar to the SOK dataset on which to compare state-of-the-art studies. Although the results of the proposed system are given in this study, we also report the results of a number of research works on the demand forecasting domain here. A greedy aggregation-decomposition method has solved a real-world intermittent demand forecasting problem for a fashion retailer in Singapore [46]; the authors report a 5.9% MAPE success rate on a small dataset. The authors of [47] compare different approaches, such as a statistical model, the Winters model, and a radial basis function neural network (RBFNN), with SVM for the demand forecasting process. They conclude that the SVM algorithm surpasses the others, with around a 7.7% enhancement in average MAPE results. A novel demand forecasting model called SHEnSVM (Selective and Heterogeneous Ensemble of Support Vector Machines) is proposed in [48]. In that model, individual SVMs are trained on different samples generated by a bootstrap algorithm, after which a genetic algorithm is employed to retrieve the best individual combination schema. They report a 10% advancement with the SVM algorithm and a 6.4% MAPE improvement, employing only beer data with 3 different brands in their experiments. Tugba Efendigil et al. propose a novel forecasting mechanism modeled by artificial intelligence approaches in [49]. They compare both artificial neural networks and adaptive network-based fuzzy inference system techniques to manage fuzzy demand with incomplete information, reaching around 18% MAPE for some products in their experiments.

6 Conclusion

In the retail industry, demand forecasting is one of the main problems of supply chains in optimizing stocks, reducing costs, and increasing sales, profit, and customer loyalty. To address this issue, there are several methods, such as time series analysis and machine learning approaches, to analyze and learn complex interactions and patterns from historical data.

In this study, a novel attempt is made to integrate 11 different forecasting models, which include time series algorithms, a support vector regression model, and a deep learning method, for the demand forecasting process. Moreover, a novel decision integration strategy is developed by drawing inspiration from an ensemble methodology, namely boosting. The proposed approach considers the performance of each model over time and combines the weighted predictions of the best performing models for the demand forecasting process. The

14 Complexity

proposed forecasting system is tested on real-life data obtained from the SOK Market retail chain. It is observed that the inclusion of learning algorithms other than time series models, together with the novel integration strategy, advances the performance of the demand forecasting system. To the best of our knowledge, this is the very first attempt to consolidate a deep learning methodology, the SVR algorithm, and different time series methods for demand forecasting systems. Furthermore, the other novelty of this work is the adaptation of the boosting ensemble strategy to the demand forecasting model: the final decision of the proposed system is based on the best algorithms of the week, which gain more weight. This makes our forecasts more reliable with respect to trend changes and seasonality behaviors. Moreover, the proposed approach performs very well integrated with the deep learning algorithm on a Spark big data environment; the dimension reduction and clustering steps help to decrease the time consumed, with less computing power, during the deep learning modelling phase. Although a review of some similar studies is presented in Section 5, it is, as expected, very difficult to compare the results of other studies with ours because of the use of different datasets and methods. In this study, we compare the results of three models. Model one, which selects the best performing forecasting method depending on its success in the previous period, yields 42.4% MAPE on average. The second model, with the novel integration strategy, results in 25.8% MAPE on average. The last model, the novel integration strategy enhanced with the deep learning approach, provides 24.7% MAPE on average. As a result, the inclusion of the deep learning approach in the novel integration strategy reduces the average prediction error for the demand forecasting process in the supply chain.

As future work, we plan to enrich the set of features by gathering data from other sources, like economic studies, shopping trends, social media, social events, and location-based demographic data of stores; the contributions of new varieties of data sources to deep learning can then be observed. A further study can be done to determine the hyperparameters of the deep learning algorithm. In addition, we also plan to use other deep learning techniques, such as convolutional neural networks, recurrent neural networks, and deep neural networks, as learning algorithms. Furthermore, another objective is to use heuristic methods, MBO (Migrating Birds Optimization), and other related algorithms [50] to optimize some of the coefficients/weights that were determined empirically by trial and error, like taking 30 percent of the best performing methods in our current system.

Data Availability

The dataset consists of private customer data, so our agreement with the SOK organization does not allow sharing the data.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This work is supported by OBASE Research & Development Center.

References

[1] P. J. McGoldrick and E. Andre, "Consumer misbehaviour: promiscuity or loyalty in grocery shopping," Journal of Retailing and Consumer Services, vol. 4, no. 2, pp. 73–81, 1997.

[2] D. Grewal, A. L. Roggeveen, and J. Nordfalt, "The future of retailing," Journal of Retailing, vol. 93, no. 1, pp. 1–6, 2017.

[3] E. T. Bradlow, M. Gangwar, P. Kopalle, and S. Voleti, "The role of big data and predictive analytics in retailing," Journal of Retailing, vol. 93, no. 1, pp. 79–95, 2017.

[4] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, USA, 2013.

[5] R. Hyndman and G. Athanasopoulos, Forecasting: Principles and Practice, OTexts, Melbourne, Australia, 2018, http://otexts.org/fpp2.

[6] C. C. Pegels, "Exponential forecasting: some new variations," Management Science, vol. 12, pp. 311–315, 1969.

[7] E. S. Gardner, "Exponential smoothing: the state of the art," Journal of Forecasting, vol. 4, no. 1, pp. 1–28, 1985.

[8] D. Gujarati, Basic Econometrics, McGraw-Hill, New York, NY, USA, 2003.

[9] J. D. Croston, "Forecasting and stock control for intermittent demands," Operational Research Quarterly, vol. 23, no. 3, pp. 289–303, 1972.

[10] T. R. Willemain, C. N. Smart, J. H. Shockor, and P. A. DeSautels, "Forecasting intermittent demand in manufacturing: a comparative evaluation of Croston's method," International Journal of Forecasting, vol. 10, no. 4, pp. 529–538, 1994.

[11] A. Syntetos, Forecasting of Intermittent Demand, Brunel University, 2001.

[12] A. A. Syntetos and J. E. Boylan, "The accuracy of intermittent demand estimates," International Journal of Forecasting, vol. 21, no. 2, pp. 303–314, 2005.

[13] R. H. Teunter, A. A. Syntetos, and M. Z. Babai, "Intermittent demand: linking forecasting to inventory obsolescence," European Journal of Operational Research, vol. 214, no. 3, pp. 606–615, 2011.

[14] R. J. Hyndman, R. A. Ahmed, G. Athanasopoulos, and H. L. Shang, "Optimal combination forecasts for hierarchical time series," Computational Statistics & Data Analysis, vol. 55, no. 9, pp. 2579–2589, 2011.

[15] R. Fildes and F. Petropoulos, "Simple versus complex selection rules for forecasting many time series," Journal of Business Research, vol. 68, no. 8, pp. 1692–1701, 2015.

[16] C. Li and A. Lim, "A greedy aggregation-decomposition method for intermittent demand forecasting in fashion retailing," European Journal of Operational Research, vol. 269, no. 3, pp. 860–869, 2018.

[17] F. Turrado García, L. J. García Villalba, and J. Portela, "Intelligent system for time series classification using support vector machines applied to supply-chain," Expert Systems with Applications, vol. 39, no. 12, pp. 10590–10599, 2012.

[18] G. Song and Q. Dai, "A novel double deep ELMs ensemble system for time series forecasting," Knowledge-Based Systems, vol. 134, pp. 31–49, 2017.


[19] O. Araque, I. Corcuera-Platas, J. F. Sanchez-Rada, and C. A. Iglesias, "Enhancing deep learning sentiment analysis with ensemble techniques in social applications," Expert Systems with Applications, vol. 77, pp. 236–246, 2017.

[20] H. Tong, B. Liu, and S. Wang, "Software defect prediction using stacked denoising autoencoders and two-stage ensemble learning," Information and Software Technology, vol. 96, pp. 94–111, 2018.

[21] X. Qiu, Y. Ren, P. N. Suganthan, and G. A. J. Amaratunga, "Empirical mode decomposition based ensemble deep learning for load demand time series forecasting," Applied Soft Computing, vol. 54, pp. 246–255, 2017.

[22] Z. Qi, B. Wang, Y. Tian, and P. Zhang, "When ensemble learning meets deep learning: a new deep support vector machine for classification," Knowledge-Based Systems, vol. 107, pp. 54–60, 2016.

[23] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: a review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, 2013.

[24] G. E. Hinton, S. Osindero, and Y. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.

[25] Y. Bengio, "Practical recommendations for gradient-based training of deep architectures," in Neural Networks: Tricks of the Trade, vol. 7700 of Lecture Notes in Computer Science, pp. 437–478, Springer, Berlin, Germany, 2nd edition, 2012.

[26] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.

[27] S. Kim, Z. Yu, R. M. Kil, and M. Lee, "Deep learning of support vector machines with class probability output networks," Neural Networks, vol. 64, pp. 19–28, 2015.

[28] P. J. Brockwell and R. A. Davis, Time Series: Theory and Methods, Springer Series in Statistics, New York, NY, USA, 1989.

[29] V. N. Vapnik, Statistical Learning Theory, Wiley-Interscience, New York, NY, USA, 1998.

[30] S. Haykin, Neural Networks and Learning Machines, Macmillan Publishers Limited, New Jersey, NJ, USA, 2009.

[31] A. Candel, V. Parmar, E. LeDell, and A. Arora, Deep Learning with H2O, United States of America, 2018.

[32] K. Keerthi Vasan and B. Surendiran, "Dimensionality reduction using principal component analysis for network intrusion detection," Perspectives in Science, vol. 8, pp. 510–512, 2016.

[33] L. Rokach, "Ensemble-based classifiers," Artificial Intelligence Review, vol. 33, no. 1-2, pp. 1–39, 2010.

[34] R. Polikar, "Ensemble based systems in decision making," IEEE Circuits and Systems Magazine, vol. 6, no. 3, pp. 21–45, 2006.

[35] S. Reid, A Review of Heterogeneous Ensemble Methods, Department of Computer Science, University of Colorado at Boulder, 2007.

[36] D. Gopika and B. Azhagusundari, "An analysis on ensemble methods in classification tasks," International Journal of Advanced Research in Computer and Communication Engineering, vol. 3, no. 7, pp. 7423–7427, 2014.

[37] Y. Ren, L. Zhang, and P. N. Suganthan, "Ensemble classification and regression: recent developments, applications and future directions," IEEE Computational Intelligence Magazine, vol. 11, no. 1, pp. 41–53, 2016.

[38] U. G. Mangai, S. Samanta, S. Das, and P. R. Chowdhury, "A survey of decision fusion and feature fusion strategies for pattern classification," IETE Technical Review, vol. 27, no. 4, pp. 293–307, 2010.

[39] M. Wozniak, M. Grana, and E. Corchado, "A survey of multiple classifier systems as hybrid systems," Information Fusion, vol. 16, no. 1, pp. 3–17, 2014.

[40] G. Tsoumakas, L. Angelis, and I. Vlahavas, "Selective fusion of heterogeneous classifiers," Intelligent Data Analysis, vol. 9, no. 6, pp. 511–525, 2005.

[41] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," Journal of Computer and System Sciences, vol. 55, no. 1, part 2, pp. 119–139, 1997.

[42] S. Chopra and P. Meindl, Supply Chain Management: Strategy, Planning, and Operation, Prentice Hall, 2004.

[43] G. Kushwaha, "Operational performance through supply chain management practices," International Journal of Business and Social Science, vol. 217, pp. 65–77, 2012.

[44] G. E. Box, G. M. Jenkins, G. Reinsel, and G. M. Ljung, Time Series Analysis: Forecasting and Control, Wiley Series in Probability and Statistics, John Wiley & Sons, Hoboken, NJ, USA, 5th edition, 2015.

[45] W.-L. Zhao, C.-H. Deng, and C.-W. Ngo, "k-means: a revisit," Neurocomputing, vol. 291, pp. 195–206, 2018.

[46] C. Li and A. Lim, "A greedy aggregation-decomposition method for intermittent demand forecasting in fashion retailing," European Journal of Operational Research, vol. 269, no. 3, pp. 860–869, 2018.

[47] L. Yue, Y. Yafeng, G. Junjun, and T. Chongli, "Demand forecasting by using support vector machine," in Proceedings of the Third International Conference on Natural Computation (ICNC 2007), pp. 272–276, Haikou, China, August 2007.

[48] L. Yue, L. Zhenjiang, Y. Yafeng, T. Zaixia, G. Junjun, and Z. Bofeng, "Selective and heterogeneous SVM ensemble for demand forecasting," in Proceedings of the 2010 IEEE 10th International Conference on Computer and Information Technology (CIT), pp. 1519–1524, Bradford, UK, June 2010.

[49] T. Efendigil, S. Onut, and C. Kahraman, "A decision support system for demand forecasting with artificial neural networks and neuro-fuzzy models: a comparative analysis," Expert Systems with Applications, vol. 36, no. 3, pp. 6697–6707, 2009.

[50] E. Duman, M. Uysal, and A. F. Alkaya, "Migrating birds optimization: a new metaheuristic approach and its performance on quadratic assignment problem," Information Sciences, vol. 217, pp. 65–77, 2012.


... competition in the future. Therefore, the usage of technological tools and predictive methods is becoming more popular and necessary for retailers [2]. Retailers in different sectors are looking for automated demand forecasting and replenishment solutions that use big data and predictive analytics technologies [3]. An extensive set of methods and research has been developed in the area of demand forecasting. Traditional forecasting methods are based on time-series forecasting approaches. These approaches predict future demand based on historical time series data, which is a sequence of data points measured at successive intervals in time. These methods use a limited number of historical time-series data related to demand. In the last two decades, data mining and machine learning models have drawn more attention and have been successfully applied to time series forecasting. Machine learning forecasting methods can use a large amount of data and features related to demand and predict future demand and patterns using different learning algorithms. Among many machine learning methods, deep learning (DL) methods have become very popular and have recently been applied to many fields such as image and speech recognition, natural language processing, and machine translation. In many studies, DL methods have produced better predictions and results than traditional machine learning algorithms. Ensemble learning (EL) is another methodology to boost system performance. An ensemble system is composed of two parts: ensemble generation and ensemble integration [4]. In the ensemble generation part, a diverse set of base prediction models is generated by using different methods or samples. In the integration part, the predictions of all models are combined by using an integration strategy.

In this study, we propose a novel model to improve the demand forecasting process, which is one of the main issues of supply chains. For this purpose, a demand forecasting model based on nine different time series methods, the support vector regression algorithm (SVR), and the DL approach is constructed. To obtain the final decision of these models for the proposed system, the nine time series methods, the SVR algorithm, and the DL model are blended by a new integration strategy that is reminiscent of the boosting ensemble strategy. The other novelty of this work is the adaptation of the boosting strategy to the demand forecasting model. In this way, the final decision of the proposed system is based on the best algorithms of the week, which gain more weight. This makes our forecasts more robust to trend changes and seasonality behaviors. The proposed system is implemented and tested on SOK Market real-life data. Experiment results indicate that the proposed system presents noticeable accuracy enhancements for the demand forecasting process when compared to single prediction models. In Table 3, the enhancements obtained by the novel integration method compared to the single best predictor model are presented. To the best of our knowledge, this is the first study to consolidate the deep learning methodology, different time series analysis models, and a novel integration strategy for the demand forecasting process. The rest of this paper is organized as follows. Section 2 gives a summary of related work about demand forecasting, time series methods, the DL approach, and ensemble learning

methodologies. Section 3 describes the proposed framework. Sections 4, 5, and 6 present the experiment setup, experimental results, and conclusions, respectively.

2. Related Work

This section gives a summary of research about demand forecasting, EL, and DL. There are many application areas of automatic demand forecasting methodologies in the literature. Energy load demand, transportation, tourism, stock market, and retail forecasting are some of the important application areas. Traditional approaches for demand forecasting use time series methods. Time series methods include the naive method, the average method, exponential smoothing, Holt's linear trend method, the exponential trend method, damped trend methods, the Holt-Winters seasonal method, moving averages, and the ARMA (Autoregressive Moving Average) and ARIMA (Autoregressive Integrated Moving Average) models [5]. Exponential smoothing methods can have different forms depending on the usage of trend and seasonal components and on additive, multiplicative, and damped calculations. Pegels presented the different possible exponential smoothing methods in graphical form [6]. The types of exponential smoothing methods were further extended by Gardner [7] to include additive and multiplicative damped trend methods. ARMA and ARIMA (also called the Box-Jenkins method, named after the statisticians George Box and Gwilym Jenkins) are the most common methods applied to find the best fit of a model to historical values of a time series [8].

Intermittent demand forecasting methods try to detect intermittent demand patterns, which are characterized by zero or varied demands at different periods. Intermittent demand patterns occur in areas like fashion retail, automotive spare parts, and manufacturing. Modelling intermittent demand is a challenging task because of its different variations. One of the influential methods for intermittent demand forecasting was proposed by Croston [9]. Croston's method uses a decomposition approach with separate exponentially smoothed estimates of the demand size and the interval between demand incidences. Its superior performance over the single exponential smoothing (SES) method was demonstrated by Willemain [10]. To address some limitations of Croston's method, additional studies were performed by Syntetos and Boylan [11, 12] and Teunter, Syntetos, and Babai [13]. Some applications use multiple time series that can be organized hierarchically and combined using bottom-up and top-down approaches at different levels, in groups based on product types, geography, or other features. A hierarchical forecasting framework proposed by Hyndman et al. [14] provides better forecasts than either a purely top-down or a purely bottom-up approach.
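Croston's decomposition can be sketched in a few lines of Python; the smoothing constant and the initialisation below are illustrative choices, not the paper's calibration:

```python
def croston(demand, alpha=0.1):
    """Croston-style intermittent-demand forecast (illustrative sketch).

    Two exponentially smoothed estimates are kept: the nonzero demand
    size z and the interval p between nonzero demands; the per-period
    forecast is z / p.
    """
    z = p = None          # smoothed demand size and inter-demand interval
    q = 1                 # periods elapsed since the last nonzero demand
    for d in demand:
        if d > 0:
            if z is None:                     # initialise on first demand
                z, p = float(d), float(q)
            else:
                z = alpha * d + (1 - alpha) * z
                p = alpha * q + (1 - alpha) * p
            q = 1
        else:
            q += 1
    return 0.0 if z is None else z / p
```

For a series such as [0, 0, 4, 0, 0, 4, 0, 0, 4], the forecast settles at 4/3 per period: the typical demand size spread over its average interval.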

In the supply chain context, since there is a high number of time series methods, automatic model selection becomes very crucial [15]. In aggregate selection, a single forecasting method is chosen for all the time series. Also, all combinations of trend and cyclical effects in additive and multiplicative form should be considered. Petropoulos et al., in 2014, analyzed via regression analysis the main determinants of forecasting


accuracy. Li et al. introduced a revised mean absolute scaled error (RMASE) in their study as a new accuracy measure for intermittent demand forecasting, which is a relative and scale-independent error measure [16]. In addition to time series methods, artificial intelligence approaches are becoming popular with the growth of big data technologies. An initial attempt was made in the study of García [17].

In recent years, EL has also become popular and is used by researchers in many research areas. In the study of Song and Dai [18], a novel double deep Extreme Learning Machine (ELM) ensemble system focusing on the problem of time series forecasting is proposed. In the study by Araque et al., a DL-based sentiment classifier is developed [19]. This classifier serves as a baseline for two proposed ensemble techniques that aggregate the baseline classifier with other surface classifiers in sentiment analysis. Tong et al. proposed a new software defect prediction approach including two phases: the deep learning phase and the two-stage ensemble (TSE) phase [20]. Qiu presented an ensemble method [21] combining the empirical mode decomposition (EMD) algorithm and a DL approach, focusing on the electric load demand forecasting problem and comparing different algorithms with their ensemble strategy. Qi et al. combined the Ex-Adaboost learning strategy with DL research based on the support vector machine (SVM) and proposed a new Deep Support Vector Machine (DeepSVM) [22].

In classification problems, the performance of learning algorithms mostly depends on the nature of the data representation [23]. DL was first proposed in 2006 in the study of Geoffrey Hinton, which reported a significant advance in feature extraction [24]. Since then, DL researchers have opened many new application areas in different fields [25]. Deep Belief Networks (DBN), based on Restricted Boltzmann Machines (RBMs), are another representative DL algorithm [24], in which there are connections between the layers but none among units within each layer [24]. First, a DBN is trained on the data in an unsupervised way, learning the common features of the inputs as much as possible. Then, the DBN can be fine-tuned in a supervised way. With a DBN, the corresponding model can be constructed for classification or other pattern recognition tasks. The Convolutional Neural Network (CNN) is another instance of DL [22] and a multiple-layer neural network. In a CNN, each layer contains several two-dimensional planes, which are composed of many neurons. The principal advantage of the CNN is that the weights in each convolutional layer are shared; in other words, the neurons use the same filter in each two-dimensional plane. As a result, fewer tunable parameters reduce computational complexity [22, 25]. The Auto Encoder (AE) is also used to train deep architectures in a greedy layer-wise manner [22]. In neural network (NN) systems, it is supposed that the output itself can be thought of as the input data. It is possible to obtain different representations of the original data by adjusting the weights of each layer. The model is composed of an encoder and a decoder; an AE is a NN for reconstructing the input data. Other types of DL methods can be found in [24, 26, 27].

Our work differs from the above-mentioned literature studies in that it is the very first attempt to employ different time series methods, the SVR algorithm, and a DL approach together for the demand forecasting process. Unlike the literature studies, a novel final decision integration strategy is proposed to boost the performance of the demand forecasting system. The details of the proposed study can be found in Section 3.

3. Proposed Framework

This section gives a summary of the base forecasting techniques, including time series and regression methods, the support vector regression model, the feature reduction approach, the deep learning methodology, and a new final decision integration strategy.

3.1. Time Series and Regression Methods. In our proposed system, nine different time series algorithms, including moving average (MA), exponential smoothing, Holt-Winters, and ARIMA [28] methods, and three different regression models are employed. In time series forecasting models, the classical approach is to collect historical data, analyze the underlying features of these data, and utilize the model to predict the future [28]. Table 1 shows the algorithm definitions and parameters used in the proposed system. These algorithms are commonly used forecasting algorithms in the time series demand forecasting domain.

3.2. Support Vector Regression. Support Vector Machines (SVM) is a powerful classification technique based on supervised learning theory, developed and presented by Vladimir Vapnik [29]. The background work for SVM goes back to the early studies of Vapnik and Alexei Chervonenkis on statistical learning theory in the 1960s. Although the training time of even the fastest SVMs can be quite slow, they are highly accurate, and their ability to model complex, nonlinear decision boundaries is very powerful. They show much less proneness to overfitting than other methods. The support vectors also provide a very compact description of the learned model.

We also use the SVR algorithm in our proposed system, which is the regression implementation of SVM for continuous-variable prediction problems. SVR preserves all the main properties of SVM (such as the maximal margin). The main idea of SVR is the computation of a linear regression function in a high-dimensional feature space, into which the input data are mapped by means of a nonlinear function. Its training involves convex quadratic programming and different choices of loss functions. SVR has been applied in a wide variety of areas, especially time series and financial prediction problems; handwritten digit recognition, speaker identification, and object recognition are some others [4]. In this study, SVR is used to forecast sales demands by using the input variables explained in Table 1.
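The loss function at the core of SVR can be sketched in one line; the tolerance value below is an illustrative choice. Deviations that fall inside the epsilon-tube around the regression function cost nothing, and outside it the cost grows linearly:

```python
def epsilon_insensitive_loss(y_true, y_pred, eps=0.5):
    """SVR's epsilon-insensitive loss (sketch): deviations inside the
    eps-tube are ignored; outside it the penalty grows linearly."""
    return sum(max(0.0, abs(t - p) - eps) for t, p in zip(y_true, y_pred))
```

This tolerance to small errors is what makes the fitted function depend only on the support vectors, the points lying on or outside the tube.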


Table 1: Time series algorithms used in demand forecasting.

Regression Model 1 (R1)
Formulation: y_t = β0 + β1·X1_t + β2·X2_t + β3·X3_t + β4·X4_t + ε_t
Variables: variability in weekly customer demand is explained by special days and holidays (X1), discount dates (X2), days when the product is closed for sale (X3), and exceptional sales that cannot be explained by these three activities (X4).

Regression Model 2 (R2)
Formulation: y_t = β0 + β1·X1_t + β2·X2_t + β3·X3_t + Σ_{j=1..52} γ_j·W_jt + ε_t
Variables: R1 is extended by adding weekly dummy regression terms (W_jt, j = 1, ..., 52).

Regression Model 3 (R3)
Formulation: y_t = β0 + β1·X1_t + β2·X2_t + β3·X3_t + Σ_{j=1..12} α_j·M_jt + Σ_{j=1..4} γ_j·W_jt + ε_t
Variables: R3 adds dummy regression terms to the R1 model for each month of the year (M_jt, j = 1, ..., 12) and for the weeks of the month (W_jt, j = 1, ..., 4).

SVR (Support Vector Regression)
Formulation: y_t = SVR(input variables)
Variables: the SVR prediction forecasts y_t by using the inputs X1, X2, X3 of Regression Model 1, each month of the year (M_jt, j = 1, ..., 12), and the weeks of the month (W_jt, j = 1, ..., 4).

Exponential Smoothing Model
Formulation: y_t = α·y_{t-1} + (1-α)·ŷ_{t-1} + ε_t
Variables: exponential smoothing is generally used for products where there is no trend or seasonality in the model.

Holt Trend Method
Formulation: ŷ_t = L_t + T_t, with
L_t = α·y_t + (1-α)·(L_{t-1} + T_{t-1})
T_t = β·(L_t - L_{t-1}) + (1-β)·T_{t-1}
Variables: the additive Holt trend model is evaluated for products with level (L) and trend (T).

Holt-Winters Seasonal Model
Formulation: ŷ_t = L_t + T_t + S_{t+p+1}, with
L_t = α·y_t + (1-α)·(L_{t-1} + T_{t-1})
T_t = β·(L_t - L_{t-1}) + (1-β)·T_{t-1}
S_{t+p+1} = γ·(D_{t+1}/L_{t+1}) + (1-γ)·S_{t+1}
Variables: additive Holt-Winters seasonal models are evaluated for products with level (L), seasonality (S), and trend (T).

Two-Level Model 1
Formulation: y1_t = α·y_{t-1} + (1-α)·ŷ_{t-1} + ε_t; y2_t = β0 + β1·X1_t + β2·X2_t + β3·X3_t + β4·X4_t + ε_t; y_t = y1_t - y2_t + ε_t
Variables: the R1 model is first applied to the time series; the demand values estimated by this model are subtracted from the customer demand data, and the residuals are fitted with the exponential smoothing model. The estimates of the two models are combined to obtain the product demand estimate.

Two-Level Model 2
Formulation: y1_t = L_t + T_t (Holt trend equations as above); y2_t = β0 + β1·X1_t + β2·X2_t + β3·X3_t + β4·X4_t + ε_t; y_t = y1_t - y2_t + ε_t
Variables: as in Two-Level Model 1, but the residuals are fitted with the Holt trend model.

Two-Level Model 3
Formulation: y1_t = L_t + T_t + S_{t+p+1} (Holt-Winters seasonal equations as above); y2_t = β0 + β1·X1_t + β2·X2_t + β3·X3_t + β4·X4_t + ε_t; y_t = y1_t - y2_t + ε_t
Variables: as in Two-Level Model 1, but the residuals are fitted with the Holt-Winters seasonal model.
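As a concrete illustration of the additive Holt trend recursion in Table 1, a minimal pure-Python sketch; the smoothing parameters and the initialisation of level and trend are illustrative choices, not the values fitted in the study:

```python
def holt_trend_forecast(y, alpha=0.5, beta=0.3):
    """One-step-ahead forecast with the additive Holt trend recursion:
        L_t = alpha*y_t + (1-alpha)*(L_{t-1} + T_{t-1})
        T_t = beta*(L_t - L_{t-1}) + (1-beta)*T_{t-1}
        forecast = L_t + T_t
    """
    level, trend = y[0], y[1] - y[0]   # simple illustrative initialisation
    for obs in y[1:]:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + trend
```

On a perfectly linear series the recursion tracks the line exactly, so the one-step-ahead forecast extrapolates the trend without error.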

3.3. Deep Learning. A machine learning approach can analyze features, relationships, and complex interactions among the features of a problem from samples of a dataset and learn a model which can be used for demand forecasting. Deep learning (DL) is a machine learning technique that applies deep neural network architectures to solve various complex problems. DL has become a very popular research topic among researchers and has been shown to provide impressive results in image processing, computer vision, natural language processing, bioinformatics, and many other fields [25, 26].

In principle, DL is an implementation of artificial neural networks, which mimic the natural human brain [30]. However, a deep neural network is more powerful and capable of analyzing and composing more complex features and relationships than a traditional neural network. DL requires high computing power and large amounts of data for training. The recent improvements in GPUs (Graphical Processing Units) and parallel architectures have enabled the computing power required by deep neural networks. DL uses successive layers of neurons, where each layer extracts more complex and abstract features from the output of the previous layers.


Thus, a DL model can automatically perform feature extraction by itself, without any preprocessing step. Visual object recognition, speech recognition, and genomics are some of the fields where DL has been applied successfully [26].

A multilayer feedforward artificial neural network (MLFANN) is employed as the deep learning algorithm in this study. In a feedforward neural network, information moves from the input nodes to the hidden nodes and finally to the output nodes through the successive layers of the network, without any feedback. The MLFANN is trained with stochastic gradient descent using backpropagation. Gradient descent updates the weights so as to minimize the squared error between the network output values and the target output values; each weight is adjusted according to its contribution to the error. For the deep learning part, the H2O library [31] is used as an open source big data artificial intelligence platform.
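The weight update described above can be sketched for the simplest case, a single linear output neuron under squared error; the function name and the learning rate are illustrative, and the MLFANN applies this same rule at every layer via backpropagation:

```python
def sgd_step(weights, inputs, target, lr=0.1):
    """One stochastic-gradient-descent update for a linear neuron.

    For E = 0.5 * (output - target)**2 the gradient with respect to
    weight w_i is (output - target) * x_i, so each weight moves against
    its own contribution to the error.
    """
    output = sum(w * x for w, x in zip(weights, inputs))
    error = output - target
    return [w - lr * error * x for w, x in zip(weights, inputs)]
```

Starting from zero weights with inputs (1, 2) and target 1, one step with lr = 0.1 moves the weights to (0.1, 0.2), in the direction that reduces the squared error.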

3.4. Feature Extraction. In this work, there are some issues because of the huge amount of data. When we tried to use all of the features of all products in each store, with 155 features and 875 million records, the deep learning algorithm took days to finish its job due to limited computational power. For this reason, reducing the number of features and the computational time required by each algorithm is necessary. In order to overcome this problem and to accelerate the clustering and modelling phases, PCA (Principal Component Analysis) is used as a feature extraction algorithm. PCA is a commonly used dimension reduction algorithm that retains the most significant features. PCA transforms each instance of the given dataset from a d-dimensional space to a k-dimensional subspace, in which the newly generated set of k dimensions are called the Principal Components (PC) [32]. Each principal component is directed toward maximum variance, excluding the variance accounted for in all its preceding components. As a result, the first component covers the maximum variance, and so on. In brief, the principal components are represented as

PC_i = a_1·X_1 + a_2·X_2 + ... + a_d·X_d (1)

where PC_i is the i-th principal component, X_j is the j-th original feature, and a_j is the numerical coefficient of feature X_j.
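The projection in (1) can be illustrated with a minimal two-dimensional sketch; `pca_2d` is a hypothetical helper, not part of the system, and it exploits the closed-form eigendecomposition of a 2x2 covariance matrix so that no linear-algebra library is needed:

```python
import math

def pca_2d(points):
    """Project 2-D points onto their first principal component (sketch)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix, in closed form
    tr, det = sxx + syy, sxx * syy - sxy * sxy
    disc = math.sqrt(max(tr * tr / 4 - det, 0.0))
    l1, l2 = tr / 2 + disc, tr / 2 - disc          # l1 >= l2
    # First principal axis: eigenvector of the largest eigenvalue
    if abs(sxy) > 1e-12:
        v = (l1 - syy, sxy)
    else:
        v = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(*v)
    a1, a2 = v[0] / norm, v[1] / norm
    # PC1 score per instance: PC1 = a1*X1 + a2*X2, cf. Eq. (1)
    scores = [a1 * x + a2 * y for x, y in centered]
    explained = l1 / (l1 + l2)                      # variance ratio of PC1
    return (a1, a2), scores, explained
```

For points lying on the line y = 2x, the first component recovers the direction (1, 2) and explains all of the variance, so a single dimension suffices, which is exactly the reduction PCA provides at scale.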

3.5. Proposed Integration Strategy. The decision integration approach combines the strengths of different algorithms into a single collaborative method. With the help of this approach, it is aimed to improve the success of the proposed forecasting system by combining different algorithms. Since each algorithm can be more sensitive or can have some weaknesses under different conditions, collecting the decision of each model provides more effective and powerful results for decision making processes.

There are two main EL approaches in the literature, called homogeneous and heterogeneous EL. If different types of classifiers are used as base algorithms, such a system is called a heterogeneous ensemble; otherwise, it is a homogeneous ensemble. In this study, we concentrate on heterogeneous ensembles. An ensemble system is composed of two parts:

ensemble generation and ensemble integration [33ndash36] Inensemble generation part a diverse set of models are gen-erated using different base classifiers Nine different timeseries and regression methods support vector regressionmodel and deep learning algorithm are employed as baseclassifiers in this study There are many integration methodsthat combine decisions of base classifiers to obtain a finaldecision [37ndash40] For integration step a new decision inte-gration strategy is proposed for demand forecasting modelin this work We draw inspiration from boosting ensemblemodel [4 41] to construct our proposed decision integrationstrategy The essential concept behind boosting methodol-ogy is that in prediction or classification problems finaldecisions can be calculated as weighted combination of theresults

In order to integrate the prediction of each forecasting algorithm, we used two integration strategies for the proposed approach. The first integration strategy selects the best performing forecasting method among the others and uses that method to predict the demand of a product in a store for the next week. The second integration strategy chooses the best performing forecasting methods of the current week and calculates the prediction by combining the weighted predictions of the winners. In our decision integration strategy, the final decision is determined by regarding the contribution of all algorithms, as in democratic systems. Our approach considers the decisions of the forecasting algorithms that perform better over the previous 4 weeks of this year and over last year's counterparts of the current and previous weeks. As algorithms get better at forecasting over time, their contribution weights increase accordingly, and vice versa. In the first decision integration strategy, we do not consider the contributions of all algorithms in a fully democratic manner; we only look at the contributions of the best algorithms of the related week. In other words, the final decision is obtained by integrating the decisions of only the best algorithms (selected according to their historical decisions) for each store and product. On the other hand, the best algorithms of a week can change for each store-product couple, since every algorithm behaves differently for different products and locations.

The second decision integration strategy is based on the weighted average of the mean absolute percentage error (MAPE) and the mean absolute deviation (MAD) [42]. These are two popular evaluation metrics to assess the performance of forecasting models. The equations for the calculation of these forecast accuracy measures are as follows:

MAPE (Mean Absolute Percentage Error):

MAPE = (1/n) Σ_{t=1}^{n} |F_t − A_t| / A_t   (2)

MAD (Mean Absolute Deviation):

MAD = (1/n) Σ_{t=1}^{n} |F_t − A_t|   (3)

6 Complexity

where

(i) F_t is the forecast (estimated) value for period t;
(ii) A_t is the actual value for period t;
(iii) n is the number of periods.

The accuracy of a model can simply be computed as Accuracy = 1 − MAPE. If the values of the above two measures are small, the forecasting model performs better [43].
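The two measures in Equations (2) and (3), and the derived accuracy, can be computed directly; the sketch below uses hypothetical actual and forecast series.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, Equation (2)."""
    n = len(actual)
    return sum(abs(f - a) / a for f, a in zip(forecast, actual)) / n

def mad(actual, forecast):
    """Mean Absolute Deviation, Equation (3)."""
    n = len(actual)
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / n

def accuracy(actual, forecast):
    """Accuracy = 1 - MAPE."""
    return 1.0 - mape(actual, forecast)

# hypothetical weekly sales: actual vs. forecast
actual = [100.0, 80.0, 120.0, 50.0]
forecast = [110.0, 76.0, 120.0, 40.0]
```

For these values, MAPE is 0.0875 and MAD is 6.0, so the accuracy is 0.9125.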

Replenishment-oriented demand forecasting in the retail industry needs some modifications to general EL strategies regarding retail-specific trends and dynamics. For this purpose, the proposed second integration strategy computes, in Equation (4), the weighted average of the MAPE over each of the previous 4 weeks, together with the MAPE of the same week of the previous year and of the week before it, to capture the previous year's trend. This makes our forecasts more reliable with respect to trend changes and seasonality behaviors. The MAPE of each week is calculated by the formula defined in Equation (2) above. The proposed model automatically changes the weekly weights of each member algorithm according to its average MAPE for each week at the store and product level. Based on the resulting MAPE weights, the best algorithms of the week gain more weight in the final decision.

The average MAPE of each algorithm for a product at each store is calculated with Equation (4). Suppose that the MAPEs of an algorithm are 0.2, 0.3, 0.1, and 0.1 for the previous 4 weeks of the current week, and 0.1 and 0.2 for the previous year's same week and prior week. Assume that the weekly coefficients are 25%, 20%, 10%, 5%, 30%, and 10%, respectively. Then the average MAPE of the algorithm for the current week is 0.2 * 0.25 + 0.3 * 0.20 + 0.1 * 0.10 + 0.1 * 0.05 + 0.1 * 0.30 + 0.2 * 0.10 = 0.175 according to Equation (4).

M_avg = Σ_{k=1}^{n} C_k M_k   (4)

In Equation (4), the sum of the coefficients must be 1, that is, Σ_{k=1}^{n} C_k = 1, and M_k denotes the MAPE of the related week, where

(i) M_1 is the MAPE of the previous week;
(ii) M_2 is the MAPE of 2 weeks before the related week;
(iii) M_3 is the MAPE of 3 weeks before the related week;
(iv) M_4 is the MAPE of 4 weeks before the related week;
(v) M_5 is the MAPE of the previous year's same week;
(vi) M_6 is the MAPE of the previous year's previous week.
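Equation (4) is a plain weighted average; the sketch below uses the six weekly coefficients from the example in the text with hypothetical MAPE values.

```python
def weighted_mape(mapes, coeffs):
    """Average MAPE of one algorithm, Equation (4): M_avg = sum(C_k * M_k).
    mapes holds [M1..M6]: previous 4 weeks, then last year's same week,
    then last year's prior week. The coefficients must sum to 1."""
    assert abs(sum(coeffs) - 1.0) < 1e-9
    return sum(c * m for c, m in zip(coeffs, mapes))

# coefficients as in the worked example in the text; the MAPE values are hypothetical
coeffs = [0.25, 0.20, 0.10, 0.05, 0.30, 0.10]
mapes = [0.4, 0.2, 0.3, 0.1, 0.2, 0.3]
m_avg = weighted_mape(mapes, coeffs)
```

Here M_avg = 0.265, so this hypothetical algorithm would enter the current week with an average MAPE of 26.5%.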

The proposed decision integration system computes a MAPE for each algorithm for the coming forecasting week at the store and product level. In addition, the effects of special days are taken into consideration. Christmas, Valentine's Day, Mother's Day, the start of Ramadan, and other religious days are examples of special days. System users can define special days manually by considering the current year's calendar. Thus, the trends of special days are computed automatically by using the previous year's trends. This enables the consideration of seasonality and special events, which results in the evaluation of more accurate forecasting values. Meanwhile, seasonality and other effects can be taken into consideration as well. Following the MAPE calculation of the algorithms, new weights (W_i, where i = 1, ..., n) are assigned to each of them according to their weighted average for the current week.

The next step is the identification of the best algorithms of the week for each store and product couple. We only take the best performing 30% of the methods into consideration as the best algorithms. After many empirical observations, this ratio gave the best results on our dataset. Obviously, these parameters are very specific to the dataset characteristics and also depend on which algorithms are included in the integration strategy. Scaling is the last preprocessing step for the calculation of the final forecast. Suppose that we have n algorithms (A_1, A_2, ..., A_n) and k of them are the best models of the week for a store and product couple, where k ≤ n. The weight of each winner is scaled according to its weighted average in Equation (5):

W′_t = W_t / Σ_{j=1}^{k} W_j,   t ∈ {1, ..., k}   (5)

Assume that there are 3 best algorithms and that their weights are W_1 = 30%, W_2 = 18%, and W_3 = 12%, respectively. Their scaled new weights become

W′_1 = 30/60 = 50%,
W′_2 = 18/60 = 30%,
W′_3 = 12/60 = 20%,   (6)

respectively, according to Equation (5). Scaling makes the calculation more comprehensible.

After scaling the weight of each algorithm, the system gives its ultimate decision according to the new weights, combining the performance-weighted forecasts of the algorithms with Equation (7):

F_avg = Σ_{j=1}^{k} F_j W′_j   (7)

In Equation (7), the main constraint is Σ_{j=1}^{k} W′_j = 1, where k is the number of champion algorithms and F_j is the forecast of the related algorithm. Suppose that the best performing algorithms are A_1, A_2, and A_3, that algorithm A_1 forecasts the sales quantity for the next week as 20, A_2 as 10, and A_3 as 5, and that their scaled weights are 50%, 30%, and 20%, respectively. Then the weighted forecast according to Equation (7) is

F_avg = 20 * 0.50 + 10 * 0.30 + 5 * 0.20 = 14   (8)
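Equations (5) and (7) together amount to normalizing the winners' weights and taking a weighted sum of their forecasts; the sketch below reproduces the numbers of the worked example (weights 30%, 18%, 12%; forecasts 20, 10, 5).

```python
def scale_weights(weights):
    """Equation (5): scale the winners' weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def final_forecast(forecasts, weights):
    """Equation (7): weighted combination of the best algorithms' forecasts."""
    scaled = scale_weights(weights)
    return sum(f * w for f, w in zip(forecasts, scaled))

# numbers from the worked example: 30%, 18%, 12% scale to 50%, 30%, 20%
weights = [0.30, 0.18, 0.12]
forecasts = [20.0, 10.0, 5.0]
```

With these inputs, `final_forecast(forecasts, weights)` reproduces the value 14 of Equation (8).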

Every algorithm has a voting right proportional to its weight. Finally, if a member algorithm does not appear in the list of the best algorithms for a product and store couple over a specific period, it is automatically put on a blacklist, so that it will no longer be used at that product and store level.


Figure 1: Flowchart of the proposed system (Data Collection → Data Cleaning → Data Enrichment from Big Data → Finding Frequent Items using Apriori → Replenishment Data Mart on Hadoop/Spark → Dimensionality Reduction → Clustering → Deep Learning Model / Time Series Models / Support Vector Regression Model → Proposed Integration Strategy → Final Decision).

This enables faster computation, since the system disregards poorly performing algorithms. The algorithm of the proposed demand forecasting system and the flowchart of the proposed system are given in Algorithm 1 and Figure 1, respectively.

4. Experiment Setup

We use an enhanced dataset that includes SOK Market's real-life sales and stock data with enriched features. The dataset consists of 106 weeks of sales data covering 7,888 distinct products across the stores of SOK Market. In each store, around 1,500 products are actively sold, while the rest rarely become active. Forecasting is performed each week to predict the demand of the following week. Three-fold cross-validation is applied on the test data, and the most accurate model is then selected for the forecast of the coming week; this is a common approach for demand forecasting in retail [44]. Furthermore, outside weather information, such as daily temperature and other weather conditions, is joined to the model as new variables. Since it is known that, in most circumstances, there is a certain correlation between shopping trends and weather conditions in retail [33], we enriched the dataset with weather information. The dataset also includes mostly sales-related features filled from relational legacy data sources.


Given: n is the number of stores, m is the number of products, t is the number of algorithms in the system, and A_k is the algorithm with index k. S_ij is the matrix which holds the set of best performing algorithms for each store i and product j, B_ij is the matrix which contains the blacklist of algorithms for each store and product, and F_ij is the matrix which stores the final forecast for each store and product.

for i = 1..n
  for j = 1..m
    for k = 1..t
      if A_k is in blacklist B_ij then continue
      else run A_k and calculate its weight W_k
      end if
    end for
    for k = 1..t
      choose the best performing algorithms and place them in S_ij
    end for
    for each z in S_ij
      scale the weight of A_z
    end for
    calculate the proposed integration strategy and store the result in F_ij
  end for
end for
return all F

Algorithm 1: The algorithm of the proposed demand forecasting system.
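The per-store, per-product body of Algorithm 1 can be sketched as follows. The algorithm names, the metric values, and the simple (1 − average MAPE) weighting are hypothetical stand-ins for the paper's weekly weighted-MAPE scheme; only the overall flow (blacklist, winner selection at a 30% ratio, scaling, weighted combination) mirrors the listing above.

```python
def integrate_one(algorithms, blacklist, winner_ratio=0.3):
    """Integration step for a single store-product couple (cf. Algorithm 1):
    skip blacklisted algorithms, weight the rest by (1 - average MAPE),
    keep the best winner_ratio of them, scale the winners' weights to sum
    to 1 (Equation (5)), and combine their forecasts (Equation (7))."""
    candidates = []
    for name, (forecast, avg_mape) in algorithms.items():
        if name in blacklist:
            continue  # blacklisted algorithms are not run at all
        candidates.append((1.0 - avg_mape, forecast, name))
    candidates.sort(reverse=True)  # highest weight first
    k = max(1, round(len(candidates) * winner_ratio))
    winners = candidates[:k]
    total = sum(w for w, _, _ in winners)
    return sum((w / total) * f for w, f, _ in winners)

# hypothetical member algorithms: name -> (next-week forecast, average MAPE)
algorithms = {
    "TS-1": (20.0, 0.20),
    "SVR": (10.0, 0.30),
    "DL": (5.0, 0.25),
    "TS-2": (30.0, 0.60),
}
final = integrate_one(algorithms, blacklist={"TS-2"})
```

With three eligible candidates and a 30% winner ratio, only the single best algorithm survives here, so the final forecast is its prediction.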

A deep learning approach is used to estimate customer demand for each product at each store of the SOK Market retail chain. There is also a very apparent similarity, in terms of sales trends, between products which mostly appear in the same basket. For this purpose, the Apriori algorithm [4] is utilized to find the correlated products of each product. Apriori is a fast way to find the most associated products, i.e., the products most frequently found in the same basket. The sales-related features of the most associated product are added to the dataset as well. This takes advantage of the relationships among products with similar sales trends. Adding the associated product of a product and its features enables the DL algorithm to learn better within its different hidden layers. In summary, extensive experiments showed that enriching the training dataset with weather data and the sales data of the most associated products made our estimates more powerful when using the DL algorithm.
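The role of Apriori here, finding each product's most associated basket partner, can be illustrated with a minimal pair-counting sketch over hypothetical baskets; a full Apriori implementation additionally generates and prunes longer candidate itemsets level by level.

```python
from itertools import combinations
from collections import Counter

def most_associated(baskets, min_support=2):
    """For each product, find the product it most often shares a basket with,
    considering only pairs that reach the minimum support (Apriori-style pruning)."""
    pair_counts = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            pair_counts[(a, b)] += 1
    best = {}
    for (a, b), count in pair_counts.items():
        if count < min_support:
            continue  # infrequent pairs are pruned
        for item, partner in ((a, b), (b, a)):
            if count > best.get(item, (None, 0))[1]:
                best[item] = (partner, count)
    return {item: partner for item, (partner, _) in best.items()}

# hypothetical baskets: bread and milk co-occur most often
baskets = [["bread", "milk"], ["bread", "milk", "eggs"],
           ["milk", "eggs"], ["bread", "milk"]]
```

Each product's partner found this way supplies the "Adj" features (e.g., Sales Quantity Adj Week) added to the training dataset.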

The dataset includes weekly data of each product at each store for the latest 8 weeks, as well as sales quantities of the last 4 weeks in the same season. Historically, 2 years of data are included for each of the 5,500 stores and 1,500 products. The data size is about 875 million records with 155 features. These features include sales quantity, customer return quantity, product quantities received from distribution centers, sales amount, discount amount, receipt counts of a product, counts of customers who bought a specific product, sales quantity for each day of the week (Monday, Tuesday, etc.), maximum and minimum stock level of each day, average stock level for each week, sales quantity for each hour of the day, and also sales quantities of the last 4 weeks of the same season. Since there is a relationship between products which are generally in the same basket, we prepared the same features defined above for the most associated products as well. A detailed explanation of the dataset is presented in Table 2.

In Table 2, fields given with square brackets are arrays indexed by week (e.g., Sales Quantity Week[0] is the sales quantity of the current week, Sales Quantity Week[1] is the sales quantity of one week before, and so on). According to the experimental results, taking weekly data for the latest 8 weeks and for 4 weeks of the same season in the previous year captures the trends of seasonal changes well.

For the DL implementation, the H2O library [31] is used as an open source big data artificial intelligence platform. H2O is a powerful machine learning library that allows DL algorithms to be implemented on Hadoop/Spark big data environments. It combines highly advanced machine learning algorithms with truly scalable in-memory processing for Spark, on one or many nodes, via its Sparkling Water version. With the in-memory processing capability of Spark technologies, it provides a faster parallel platform for utilizing big data to maximum business benefit. In this study, we use H2O version 3.14.0.2 on a 64-bit machine with 16 cores and 32 GB of memory to exploit the power of parallelism during DL modelling while increasing the number of neurons.

A multilayer feedforward artificial neural network (MLFANN) is employed as the deep learning algorithm. The MLFANN is trained with stochastic gradient descent using backpropagation; with gradient descent, each weight is adjusted according to its contribution to the error. H2O also has advanced features such as momentum training, adaptive learning rate, L1 or L2 regularization, rate annealing,


Table 2: Dataset explanation.

Name | Description | Example
Yearweek | Related year/week; weeks start on Monday and end on Sunday | 201801
Store | Store number | 1234
Product | Product identification number | 1
Product Adjacent | Product associated with the product according to the Apriori algorithm; the most frequent product in the same basket with a specific product | 2
Stock In Quantity Week[0-8] | Stock increase quantity of the product in the related week, e.g., stock transfer quantity from distribution center to store | 50
Return Quantity Week[0-8] | Stock return quantity from customers at a specific week and store | 20
Sales Quantity Week[0-8] | Weekly sales quantity of the related product at a specific store | 120
Sales Amount Week[0-8] | Total sales amount of the product at the customer receipt | 2500
Discount Amount Week[0-8] | Discount amount of the product, if any | 500
Customer Count Week[0-8] | Number of customers who bought this product in a specific week | 30
Receipt Count Week[0-8] | Distinct receipt count for the related product | 20
Sales Quantity Time[9-22] | Hourly sales quantity of the related product from 9 am to 10 pm | 5
Last4weeks Day[1-7] | Total sales of each weekday over the last 4 weeks (total sales of Mondays, Tuesdays, etc.) | 10
Last8weeks Day[1-7] | Total sales of each weekday over the last 8 weeks (total sales of Mondays, Tuesdays, etc.) | 10
Max Stock Week[0-8] | Maximum stock quantity of the related week | 12
Min Stock Week[0-8] | Minimum stock quantity of the related week | 2
Avg Stock Week[0-8] | Average stock quantity of the related week | 5
Sales Quantity Adj Week[0-8] | Sales quantity of the most associated product | 14
Temperature Weekday[1-7] | Daily temperature of the weekdays (Monday, Tuesday, etc.) | 22
Weekly Avg Temperature[0-8] | Average weather temperature of the related week | 23
Weather Condition Weekday[1-7] | Nominal variable: rainy, snowy, sunny, cloudy, etc. | Rainy
Sales Quantity Next Week | Target variable of our prediction problem | 25

dropout, and grid search. A Gaussian distribution is applied because a continuous variable is used as the response variable. H2O performs very well in our environment when 3 hidden layers with 10 nodes each and 300 epochs in total are set as parameters.

After the dimension reduction step, K-means is employed as a clustering algorithm. K-means is a greedy and fast clustering algorithm [45] that tries to partition samples into k clusters in which each sample is assigned to the nearest cluster center. K is selected as 20 because, after several trials and empirical observations, this value gave the most even distribution and divided our dataset into 20 different clusters for each store. After that, the deep learning algorithm is applied, and 20 different DL models are obtained for each store. Demand forecasting for each product is then performed by using its cluster's model on a store basis. The main reason for using clustering is the lack of time and computing power for the DL algorithm: instead of modelling each product of each store separately, 20 models are generated per store at the product level. In our trials with a sample dataset, the bias difference is less than 2%, so this speed-up is very beneficial and a good option for researchers. The forecasting result of the DL model is transferred into our forecasting system, which makes its final decision by considering the decisions of 11 forecasting methods, including the DL model.
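The clustering step can be illustrated with a minimal K-means sketch on one-dimensional data; the production system clusters 155-feature (PCA-reduced) records with k = 20, and the two well-separated groups below are hypothetical.

```python
import random

def kmeans_1d(values, k, iters=50, seed=7):
    """Plain k-means on scalar values: assign each point to the nearest
    center, then move each center to the mean of its assigned points."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            idx = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        # empty clusters keep their previous center
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# two well-separated groups of weekly sales levels; k = 2 recovers them
values = [1.0, 1.2, 0.8, 10.0, 10.5, 9.5]
centers, clusters = kmeans_1d(values, k=2)
```

One forecasting model per cluster (rather than per product) is what reduces the number of DL models to k per store.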

Forecasting solutions should be scalable, process large amounts of data, and extract a model from the data. Big data environments using Spark technologies give us the opportunity to implement algorithms with scalability, sharing the tasks of machine learning algorithms among the nodes of parallel architectures. Considering the huge number of samples and the large number of features, even the computational power of a parallel architecture is not enough in some cases. For this reason, dimension reduction is needed for big data applications; in this study, PCA is employed as a feature extraction step.

5. Experimental Results


SOK Market sells 21 different groups of items. We assess the performance of the proposed forecasting system on a group basis, as shown in Table 3. Table 3 shows the MAPE (Mean Absolute Percentage Error) of integration strategies (S1) and (S2)


Table 3: Demand forecasting improvements per product group.

Product Groups | Method I without Proposed Integration Strategy (S1) MAPE | Method II with Proposed Integration Strategy (S2) MAPE | Proposed Integration Strategy with Deep Learning (SD) MAPE | Percentage Success Rate P1 = (S1 − S2)/S1 (%) | Percentage Success Rate P2 = (S1 − SD)/S1 (%) | Improvement D = P2 − P1 (%)
Baby Products | 0.5157 | 0.3081 | 0.2927 | 40.26 | 43.25 | 2.99
Bakery Products | 0.3482 | 0.2059 | 0.1966 | 40.85 | 43.52 | 2.67
Beverage | 0.3714 | 0.2316 | 0.2207 | 37.64 | 40.58 | 2.94
Biscuit-Chocolate | 0.3358 | 0.2077 | 0.1977 | 38.14 | 41.13 | 2.99
Breakfast Products | 0.4443 | 0.2770 | 0.2661 | 37.65 | 40.11 | 2.45
Canned-Paste-Sauces | 0.3836 | 0.2309 | 0.2198 | 39.80 | 42.69 | 2.89
Cheese | 0.3953 | 0.2457 | 0.2345 | 37.84 | 40.68 | 2.84
Cleaning Products | 0.4560 | 0.2791 | 0.2650 | 38.79 | 41.89 | 3.10
Cosmetics Products | 0.5397 | 0.3266 | 0.3148 | 39.49 | 41.67 | 2.19
Deli Meats | 0.4242 | 0.2602 | 0.2488 | 38.65 | 41.36 | 2.70
Edible Oils | 0.4060 | 0.2299 | 0.2215 | 43.36 | 45.45 | 2.09
Household Goods | 0.5713 | 0.3656 | 0.3535 | 36.01 | 38.13 | 2.12
Ice Cream-Frozen | 0.5012 | 0.3255 | 0.3106 | 35.05 | 38.03 | 2.98
Legumes-Pasta-Soup | 0.3850 | 0.2397 | 0.2269 | 37.74 | 41.07 | 3.33
Nuts-Chips | 0.3316 | 0.2049 | 0.1966 | 38.20 | 40.71 | 2.51
Poultry-Eggs | 0.4219 | 0.2527 | 0.2403 | 40.11 | 43.04 | 2.94
Ready Meals | 0.4613 | 0.2610 | 0.2520 | 43.42 | 45.36 | 1.94
Red Meat | 0.2514 | 0.1616 | 0.1532 | 35.71 | 39.06 | 3.35
Tea-Coffee Products | 0.4347 | 0.2650 | 0.2535 | 39.04 | 41.68 | 2.64
Textile Products | 0.5418 | 0.3048 | 0.2907 | 43.74 | 46.34 | 2.60
Tobacco Products | 0.3791 | 0.2378 | 0.2290 | 37.29 | 39.61 | 2.32
Average | 0.4238 | 0.2582 | 0.2469 | 38.99 | 41.68 | 2.69


Figure 2: MAPE distribution according to product groups with the S1 integration strategy (box plots of MAPE, from 0.00 to 1.00, for each of the 21 product groups).

in columns 2 and 3. The average percentage forecasting errors of strategies (S1) and (S2) are 0.42 and 0.26, respectively. As can be seen from Table 3, the percentage success rate of (S2) with respect to (S1) is given in column 5. The second integration strategy (S2) provides a 38.99% improvement over the first integration strategy (S1) on average. In some product groups, like Textile Products, the error rate is reduced from 0.54 to 0.30, a 43.74% enhancement in MAPE.

With this work, a further improvement is obtained by utilizing the DL model compared to the forecasting strategy based on just 10 algorithms. Column 4 of Table 3 shows the MAPEs of the 21 different groups of products after the addition of the DL model. After the inclusion of the DL model into the proposed system, we observed around a 2% to 3.4% increase in demand forecast accuracy across the product groups. The error rates of some product groups, like "baby products," "cosmetics products," and "household goods," are usually higher than the others, since these product groups do not have continuous consumer demand. For example, the error rate for the "household goods" group is 0.35, while the error rate for the "red meat" group is 0.15. The error rates for food products are usually lower, since customers buy these products regularly for their daily consumption. However, the error rates are higher for baby, textile, and household goods: customers buy products from these groups selectively, depending on whether they like the product or not. In addition, the prices of goods in these groups, and the promotion effects, are usually high relative to the prices of goods in other groups.

Moreover, when the percentage success rate column is analyzed, it is observed that the product groups benefitting most achieve over a 40% success rate relative to Method I: Baby Products, Bakery Products, Edible Oils, Poultry-Eggs, Ready Meals, and Textile Products. The common point of these groups is that each of them is consumed regularly by a specific customer segment. For instance, the baby products group is chosen by families who have kids; Edible Oils, Poultry-Eggs, and Bakery Products are preferred by frequently and continuously shopping customer segments; and Ready Meals are mostly preferred by bachelor, single, and working consumers. Furthermore, the inclusion of DL into the forecasting system shows that some other consumption groups (for example, Cleaning Products, Red Meat, and Legumes-Pasta-Soup) exhibit better performance than the others, with an additional improvement of over 3%.

Figure 2 shows the box plots of the mean absolute percentage errors (MAPE) for each group of products after application of the (S1) integration strategy. The box plots enable us to analyze the distributional characteristics of the forecasting errors for the product groups. As can be seen from the figure, the medians differ between product groups; thus, the forecasting errors are usually dissimilar for different product groups. The interquartile range box represents the middle 50% of the scores in the data. The interquartile range boxes are usually very tall, which means that there is a wide spread of prediction errors within the products of a given group.

Figure 3 presents the box plots of the MAPEs of the forecasting system that applies the proposed integration approach (S2), which combines the predictions of 10 different forecasting algorithms.

Figure 4 shows the results after adding DL to our system. The interquartile range boxes are narrower compared to the ones in Figures 2 and 3, which means that the spread of the forecasting errors is smaller than in Figures 2 and 3. In short, the integration of DL into the proposed forecasting system generates more accurate results.


Figure 3: MAPE distribution according to product groups with the S2 integration strategy (box plots of MAPE, from 0.00 to 1.00, for each of the 21 product groups).

Figure 4: MAPE distribution according to product groups after the addition of the deep learning algorithm (box plots of MAPE, from 0.00 to 1.00, for each of the 21 product groups).

For example, for the (S1) integration strategy in the Baby Products group, the MAPE distribution in Figure 2 is observed between 0% and 23% for the 1st quartile of the data, between 23% and 48% for the 2nd quartile, and between 48% and 75% for the 3rd and 4th quartiles. After applying the integration strategy, the MAPE distribution for the Baby Products group becomes 0–16% for the 1st quartile, 16–30% for the 2nd quartile, and 30–50% for the rest. Judging by the boundary differences, this gives around an 8% improvement for the first quartile, from 8% to 18% enhancement for the second quartile, and from 18% to 25% advancement for the rest of the data. The median of the MAPE distribution is 48%; after applying the proposed integration strategy, it is observed as 27%, which means a 21% improvement. Approximately 1–3% additional enhancement is observed for each quartile of the data with the inclusion of the deep learning strategy in Figure 4.

For Cleaning Products, improvements similar to the other groups are observed for the integration strategy. The MAPE distribution of the 1st quartile of the data is observed between 0% and 19%, for the 2nd quartile between 19% and 40%, and for the rest of the data between 40% and


Figure 5: Accuracy comparison among integration strategies for one sample product group (weekly total sales quantity from July 2017 to July 2018; series: Actual, S1, S2, and SD).

73% in Figure 2. After performing the proposed integration strategy for Cleaning Products, the MAPE distribution becomes 0–10% for the 1st quartile of the data, 10–17% for the 2nd quartile, and 27–43% for the 3rd and 4th quartiles together. The improvements at the boundaries are observed as follows: 9% for the 1st quartile, from 9% to 13% for the 2nd quartile, and from 13% to 20% for the rest of the data. After the inclusion of deep learning, a further advancement of 2% is observed for the 1st and 2nd quartiles and of 3% for the 3rd and 4th quartiles, respectively. Briefly, the improvement of the integration strategy is remarkable for each product group, and the consolidation of DL into the proposed integration strategy additionally advances the results by almost 2% to 3%.

In Figure 5, the accuracy comparison among the integration strategies for one of the best performing sample groups is presented over a 1-year period. In particular, Christmas and other seasonality effects can be seen very clearly in Figure 5, and the integration strategy with DL (SD) performs with the best accuracy.

It is hard to compare the performance of our proposed system with other studies because of the lack of works with a similar combination of a deep learning approach, the proposed decision integration strategies, and different learning algorithms for the demand forecasting process. Another difficulty is the scarcity of real-life datasets similar to the SOK dataset on which to compare state-of-the-art studies. Although only the results of the proposed system are given in this study, we also report here the results of a number of research works in the demand forecasting domain. A greedy aggregation-decomposition method has solved a real-world intermittent demand forecasting problem for a fashion retailer in Singapore [46]; the authors report a 59% MAPE success rate on a small dataset. The authors of [47] compare different approaches, such as a statistical model, the Winters model, and a radial basis function neural network (RBFNN), with SVM for the demand forecasting process. They conclude that the SVM algorithm surpasses the others, with around a 77% enhancement in average MAPE results. A novel demand forecasting model called SHEnSVM (Selective and Heterogeneous Ensemble of Support Vector Machines) is proposed in [48]. In this model, individual SVMs are trained on different samples generated by a bootstrap algorithm, after which a genetic algorithm is employed to retrieve the best combination schema of individuals. They report a 10% advancement with the SVM algorithm and a 64% MAPE improvement, but they only employ beer data with 3 different brands in their experiments. Efendigil et al. propose a novel forecasting mechanism modeled with artificial intelligence approaches in [49]. They compare both artificial neural networks and adaptive network-based fuzzy inference system techniques to manage fuzzy demand with incomplete information, and they reach around 18% MAPE rates for some products in their experiments.

6. Conclusion

In the retail industry, demand forecasting is one of the main problems of supply chains when trying to optimize stocks, reduce costs, and increase sales, profit, and customer loyalty. To address this issue, there are several methods, such as time series analysis and machine learning approaches, to analyze and learn complex interactions and patterns from historical data.

In this study, a novel attempt is made to integrate 11 different forecasting models, including time series algorithms, a support vector regression model, and a deep learning method, for the demand forecasting process. Moreover, a novel decision integration strategy is developed by drawing inspiration from the boosting ensemble methodology. The proposed approach considers the performance of each model over time and combines the weighted predictions of the best performing models for the demand forecasting process. The


proposed forecasting system is tested on real-life data obtained from the SOK Market retail chain. It is observed that the inclusion of different learning algorithms in addition to the time series models, together with the novel integration strategy, advanced the performance of the demand forecasting system. To the best of our knowledge, this is the very first attempt to consolidate a deep learning methodology, the SVR algorithm, and different time series methods for demand forecasting systems. Furthermore, the other novelty of this work is the adaptation of the boosting ensemble strategy to the demand forecasting model. In this way, the final decision of the proposed system is based on the best algorithms of the week, which gain more weight. This makes our forecasts more reliable with respect to trend changes and seasonality behaviors. Moreover, the proposed approach integrates very well with the deep learning algorithm on a Spark big data environment; the dimension reduction process and clustering methods help to decrease the time consumed and the computing power required during the deep learning modelling phase. Although a review of some similar studies is presented in Section 5, it is, as expected, very difficult to compare the results of other studies with ours because of the use of different datasets and methods. In this study, we compare the results of three models: model one selects the best performing forecasting method depending on its success in the previous period, with 42.4% MAPE on average; the second model, with the novel integration strategy, results in 25.8% MAPE on average; and the last model, with the novel integration strategy enhanced by the deep learning approach, provides 24.7% MAPE on average. As a result, the inclusion of the deep learning approach into the novel integration strategy reduces the average prediction error for the demand forecasting process in the supply chain.

As future work, we plan to enrich the set of features by gathering data from other sources like economic studies, shopping trends, social media, social events, and location-based demographic data of stores, so that the contribution of new varieties of data sources to deep learning can be observed. A further study can be carried out to determine the hyperparameters of the deep learning algorithm. In addition, we also plan to use other deep learning techniques, such as convolutional neural networks, recurrent neural networks, and deep neural networks, as learning algorithms. Furthermore, another objective of ours is to use heuristic methods, MBO (Migrating Birds Optimization) and other related algorithms [50], to optimize some of the coefficients/weights that were determined empirically by trial and error, like taking 30 percent of the best performing methods in our current system.

Data Availability

This dataset is private customer data, so the agreement with the organization SOK does not allow sharing the data.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This work was supported by the OBASE Research & Development Center.

References

[1] P. J. McGoldrick and E. Andre, "Consumer misbehaviour: promiscuity or loyalty in grocery shopping," Journal of Retailing and Consumer Services, vol. 4, no. 2, pp. 73–81, 1997.

[2] D. Grewal, A. L. Roggeveen, and J. Nordfalt, "The future of retailing," Journal of Retailing, vol. 93, no. 1, pp. 1–6, 2017.

[3] E. T. Bradlow, M. Gangwar, P. Kopalle, and S. Voleti, "The role of big data and predictive analytics in retailing," Journal of Retailing, vol. 93, no. 1, pp. 79–95, 2017.

[4] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, USA, 2013.

[5] R. Hyndman and G. Athanasopoulos, Forecasting: Principles and Practice, OTexts, Melbourne, Australia, 2018, http://otexts.org/fpp2.

[6] C. C. Pegels, "Exponential forecasting: some new variations," Management Science, vol. 12, pp. 311–315, 1969.

[7] E. S. Gardner, "Exponential smoothing: the state of the art," Journal of Forecasting, vol. 4, no. 1, pp. 1–28, 1985.

[8] D. Gujarati, Basic Econometrics, McGraw-Hill, New York, NY, USA, 2003.

[9] J. D. Croston, "Forecasting and stock control for intermittent demands," Operational Research Quarterly, vol. 23, no. 3, pp. 289–303, 1972.

[10] T. R. Willemain, C. N. Smart, J. H. Shockor, and P. A. DeSautels, "Forecasting intermittent demand in manufacturing: a comparative evaluation of Croston's method," International Journal of Forecasting, vol. 10, no. 4, pp. 529–538, 1994.

[11] A. Syntetos, Forecasting of Intermittent Demand, Brunel University, 2001.

[12] A. A. Syntetos and J. E. Boylan, "The accuracy of intermittent demand estimates," International Journal of Forecasting, vol. 21, no. 2, pp. 303–314, 2005.

[13] R. H. Teunter, A. A. Syntetos, and M. Z. Babai, "Intermittent demand: linking forecasting to inventory obsolescence," European Journal of Operational Research, vol. 214, no. 3, pp. 606–615, 2011.

[14] R. J. Hyndman, R. A. Ahmed, G. Athanasopoulos, and H. L. Shang, "Optimal combination forecasts for hierarchical time series," Computational Statistics & Data Analysis, vol. 55, no. 9, pp. 2579–2589, 2011.

[15] R. Fildes and F. Petropoulos, "Simple versus complex selection rules for forecasting many time series," Journal of Business Research, vol. 68, no. 8, pp. 1692–1701, 2015.

[16] C. Li and A. Lim, "A greedy aggregation-decomposition method for intermittent demand forecasting in fashion retailing," European Journal of Operational Research, vol. 269, no. 3, pp. 860–869, 2018.

[17] F. Turrado García, L. J. García Villalba, and J. Portela, "Intelligent system for time series classification using support vector machines applied to supply-chain," Expert Systems with Applications, vol. 39, no. 12, pp. 10590–10599, 2012.

[18] G. Song and Q. Dai, "A novel double deep ELMs ensemble system for time series forecasting," Knowledge-Based Systems, vol. 134, pp. 31–49, 2017.

[19] O. Araque, I. Corcuera-Platas, J. F. Sánchez-Rada, and C. A. Iglesias, "Enhancing deep learning sentiment analysis with ensemble techniques in social applications," Expert Systems with Applications, vol. 77, pp. 236–246, 2017.

[20] H. Tong, B. Liu, and S. Wang, "Software defect prediction using stacked denoising autoencoders and two-stage ensemble learning," Information and Software Technology, vol. 96, pp. 94–111, 2018.

[21] X. Qiu, Y. Ren, P. N. Suganthan, and G. A. J. Amaratunga, "Empirical mode decomposition based ensemble deep learning for load demand time series forecasting," Applied Soft Computing, vol. 54, pp. 246–255, 2017.

[22] Z. Qi, B. Wang, Y. Tian, and P. Zhang, "When ensemble learning meets deep learning: a new deep support vector machine for classification," Knowledge-Based Systems, vol. 107, pp. 54–60, 2016.

[23] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: a review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, 2013.

[24] G. E. Hinton, S. Osindero, and Y. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.

[25] Y. Bengio, "Practical recommendations for gradient-based training of deep architectures," in Neural Networks: Tricks of the Trade, vol. 7700 of Lecture Notes in Computer Science, pp. 437–478, Springer, Berlin, Germany, 2nd edition, 2012.

[26] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, pp. 436–444, 2015.

[27] S. Kim, Z. Yu, R. M. Kil, and M. Lee, "Deep learning of support vector machines with class probability output networks," Neural Networks, vol. 64, pp. 19–28, 2015.

[28] P. J. Brockwell and R. A. Davis, Time Series: Theory and Methods, Springer Series in Statistics, New York, NY, USA, 1989.

[29] V. N. Vapnik, Statistical Learning Theory, Wiley-Interscience, New York, NY, USA, 1998.

[30] S. Haykin, Neural Networks and Learning Machines, Macmillan Publishers Limited, New Jersey, NJ, USA, 2009.

[31] A. Candel, V. Parmar, E. LeDell, and A. Arora, Deep Learning with H2O, United States of America, 2018.

[32] K. Keerthi Vasan and B. Surendiran, "Dimensionality reduction using principal component analysis for network intrusion detection," Perspectives in Science, vol. 8, pp. 510–512, 2016.

[33] L. Rokach, "Ensemble-based classifiers," Artificial Intelligence Review, vol. 33, no. 1-2, pp. 1–39, 2010.

[34] R. Polikar, "Ensemble based systems in decision making," IEEE Circuits and Systems Magazine, vol. 6, no. 3, pp. 21–45, 2006.

[35] S. Reid, A Review of Heterogeneous Ensemble Methods, Department of Computer Science, University of Colorado at Boulder, 2007.

[36] D. Gopika and B. Azhagusundari, "An analysis on ensemble methods in classification tasks," International Journal of Advanced Research in Computer and Communication Engineering, vol. 3, no. 7, pp. 7423–7427, 2014.

[37] Y. Ren, L. Zhang, and P. N. Suganthan, "Ensemble classification and regression: recent developments, applications and future directions," IEEE Computational Intelligence Magazine, vol. 11, no. 1, pp. 41–53, 2016.

[38] U. G. Mangai, S. Samanta, S. Das, and P. R. Chowdhury, "A survey of decision fusion and feature fusion strategies for pattern classification," IETE Technical Review, vol. 27, no. 4, pp. 293–307, 2010.

[39] M. Wozniak, M. Grana, and E. Corchado, "A survey of multiple classifier systems as hybrid systems," Information Fusion, vol. 16, no. 1, pp. 3–17, 2014.

[40] G. Tsoumakas, L. Angelis, and I. Vlahavas, "Selective fusion of heterogeneous classifiers," Intelligent Data Analysis, vol. 9, no. 6, pp. 511–525, 2005.

[41] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," Journal of Computer and System Sciences, vol. 55, no. 1, part 2, pp. 119–139, 1997.

[42] S. Chopra and P. Meindl, Supply Chain Management: Strategy, Planning, and Operation, Prentice Hall, 2004.

[43] G. Kushwaha, "Operational performance through supply chain management practices," International Journal of Business and Social Science, vol. 217, pp. 65–77, 2012.

[44] G. E. Box, G. M. Jenkins, G. Reinsel, and G. M. Ljung, Time Series Analysis: Forecasting and Control, Wiley Series in Probability and Statistics, John Wiley & Sons, Hoboken, NJ, USA, 5th edition, 2015.

[45] W.-L. Zhao, C.-H. Deng, and C.-W. Ngo, "k-means: a revisit," Neurocomputing, vol. 291, pp. 195–206, 2018.

[46] C. Li and A. Lim, "A greedy aggregation-decomposition method for intermittent demand forecasting in fashion retailing," European Journal of Operational Research, vol. 269, no. 3, pp. 860–869, 2018.

[47] L. Yue, Y. Yafeng, G. Junjun, and T. Chongli, "Demand forecasting by using support vector machine," in Proceedings of the Third International Conference on Natural Computation (ICNC 2007), pp. 272–276, Haikou, China, August 2007.

[48] L. Yue, L. Zhenjiang, Y. Yafeng, T. Zaixia, G. Junjun, and Z. Bofeng, "Selective and heterogeneous SVM ensemble for demand forecasting," in Proceedings of the 2010 IEEE 10th International Conference on Computer and Information Technology (CIT), pp. 1519–1524, Bradford, UK, June 2010.

[49] T. Efendigil, S. Onut, and C. Kahraman, "A decision support system for demand forecasting with artificial neural networks and neuro-fuzzy models: a comparative analysis," Expert Systems with Applications, vol. 36, no. 3, pp. 6697–6707, 2009.

[50] E. Duman, M. Uysal, and A. F. Alkaya, "Migrating birds optimization: a new metaheuristic approach and its performance on quadratic assignment problem," Information Sciences, vol. 217, pp. 65–77, 2012.


accuracy. Li et al. introduced a revised mean absolute scaled error (RMASE) in their study as a new accuracy measure for intermittent demand forecasting, which is a relative error measure and scale-independent [16]. In addition to time series methods, artificial intelligence approaches are becoming popular with the growth of big data technologies. An initial attempt was made in the study of Garcia [17].

In recent years, EL has also become popular and is used by researchers in many research areas. In the study of Song and Dai [18], a novel double deep Extreme Learning Machine (ELM) ensemble system is proposed, focusing on the problem of time series forecasting. In the study by Araque et al., a DL-based sentiment classifier is developed [19]. This classifier serves as a baseline when compared to subsequent results, proposing two ensemble techniques which aggregate the baseline classifier with other surface classifiers in sentiment analysis. Tong et al. proposed a new software defect prediction approach including two phases: the deep learning phase and the two-stage ensemble (TSE) phase [20]. Qiu presented an ensemble method [21] composed of the empirical mode decomposition (EMD) algorithm and a DL approach together; he focused on the electric load demand forecasting problem, comparing different algorithms with their ensemble strategy. Qi et al. present the combination of the Ex-Adaboost learning strategy and DL research based on the support vector machine (SVM), and then propose a new Deep Support Vector Machine (DeepSVM) [22].

In classification problems, the performance of learning algorithms mostly depends on the nature of the data representation [23]. DL was first proposed in 2006 in the study of Geoff Hinton, who reported a significant invention in feature extraction [24]. Since then, DL researchers have generated many new application areas in different fields [25]. Deep Belief Networks (DBN), based on Restricted Boltzmann Machines (RBMs), are another representative algorithm of DL [24], where there are connections between the layers but none among units within each layer [24]. At first, a DBN is trained on the data in an unsupervised way, learning the common features of the inputs as much as it can. Then the DBN can be optimized in a supervised way. With a DBN, the corresponding model can be constructed for classification or other pattern recognition tasks. Convolutional Neural Networks (CNN) are another instance of DL [22] and are multiple-layer neural networks. In a CNN, each layer contains several two-dimensional planes, and those are composed of many neurons. The main advantage of a CNN is that the weights in each convolutional layer are shared; in other words, the neurons use the same filter in each two-dimensional plane. As a result, fewer tunable parameters reduce computational complexity [22, 25]. The Auto Encoder (AE) is also used to train deep architectures in a greedy layer-wise manner [22]. In neural network (NN) systems, it is supposed that the output itself can be thought of as the input data, and it is possible to obtain different representations of the original data by adjusting the weight of each layer. An AE is composed of an encoder and a decoder and is a NN for reconstructing the input data. Other types of DL methods can be found in [24, 26, 27].

Our work differs from the abovementioned literature studies in that this is the very first attempt to employ different time series methods, the SVR algorithm, and a DL approach for the demand forecasting process. Unlike the literature studies, a novel final decision integration strategy is proposed to boost the performance of the demand forecasting system. The details of the proposed study can be found in Section 3.

3. Proposed Framework

This section gives a summary of the base forecasting techniques, including time series and regression methods, the support vector regression model, the feature reduction approach, the deep learning methodology, and a new final decision integration strategy.

3.1. Time Series and Regression Methods. In our proposed system, nine different time series algorithms, including moving average (MA), exponential smoothing, Holt-Winters, and ARIMA [28] methods, and three different regression models are employed. In time series forecasting models, the classical approach is to collect historical data, analyze the underlying features of these data, and utilize the model to predict the future [28]. Table 1 shows the algorithm definitions and parameters that are used in the proposed system. These algorithms are commonly used forecasting algorithms in the time series demand forecasting domain.
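As a minimal sketch (not the paper's exact implementation), two of the baseline forecasters above, a k-week moving average and simple exponential smoothing, can be written in a few lines; the demand numbers are illustrative only.

```python
# Baseline time series forecasters: k-week moving average and simple
# exponential smoothing. Series values are hypothetical weekly sales.

def moving_average_forecast(series, k):
    """Forecast the next value as the mean of the last k observations."""
    window = series[-k:]
    return sum(window) / len(window)

def exponential_smoothing_forecast(series, alpha):
    """Smoothed value s_t = alpha * y_t + (1 - alpha) * s_{t-1}."""
    smoothed = series[0]            # initialize with the first observation
    for y in series[1:]:
        smoothed = alpha * y + (1 - alpha) * smoothed
    return smoothed

weekly_demand = [12, 15, 14, 16, 18, 17, 19, 21]
print(moving_average_forecast(weekly_demand, 4))               # 18.75
print(exponential_smoothing_forecast(weekly_demand, alpha=0.3))
```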

3.2. Support Vector Regression. Support Vector Machines (SVM) are a powerful classification technique based on supervised learning theory, developed and presented by Vladimir Vapnik [29]. The background work for SVM goes back to the early studies of Vapnik and Alexei Chervonenkis on statistical learning theory in the 1960s. Although the training time of even the fastest SVMs can be quite slow, they are highly accurate, and their ability to model complex and nonlinear decision boundaries is really powerful. They show much less proneness to overfitting than other methods. The support vectors can also provide a very compact description of the learned model.

We also use the SVR algorithm in our proposed system, which is the regression implementation of SVM for continuous variable prediction problems. SVR preserves all the main properties of SVM (maximal margin) while being used as a regression method. The main idea of SVR is the computation of a linear regression function in a high dimensional feature space, into which the input data are mapped by means of a nonlinear function. SVR has been applied in a variety of areas, especially time series and financial prediction problems; handwritten digit recognition, speaker identification, object recognition, convex quadratic programming, and choices of loss functions are some of them [4]. In this study, SVR is used to forecast sales demands by using the input variables explained in Table 1.
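A hedged sketch of fitting an SVR demand forecaster follows; scikit-learn's `SVR` is used as a stand-in for the paper's implementation, and the binary flag features (special day, discount, closed for sale) plus the synthetic demand series are illustrative assumptions, not the SOK data.

```python
# Fit an RBF-kernel SVR on synthetic weekly demand driven by three binary
# flags, then predict the 4 held-out weeks. Feature meanings mirror the
# X1/X2/X3 inputs described in Table 1, but the data here is made up.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
weeks = 104
X = np.column_stack([
    rng.integers(0, 2, weeks),   # X1: special day / holiday flag
    rng.integers(0, 2, weeks),   # X2: discount date flag
    rng.integers(0, 2, weeks),   # X3: product closed for sale
])
y = 100 + 20 * X[:, 1] - 30 * X[:, 2] + rng.normal(0, 5, weeks)  # synthetic demand

model = SVR(kernel="rbf", C=10.0, epsilon=0.5).fit(X[:-4], y[:-4])
predictions = model.predict(X[-4:])   # forecasts for the 4 held-out weeks
print(predictions.shape)              # (4,)
```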


Table 1: Time series algorithms used in demand forecasting.

Regression Model 1 (R1)
Formulation: y_t = β_0 + β_1 X1_t + β_2 X2_t + β_3 X3_t + β_4 X4_t + ε_t
Definition: variability in weekly customer demand is explained by special days and holidays (X1), discount dates (X2), days when the product is closed for sale (X3), and exceptional sales that cannot be explained by these three activities (X4).

Regression Model 2 (R2)
Formulation: y_t = β_0 + β_1 X1_t + β_2 X2_t + β_3 X3_t + Σ_{j=1..52} γ_j W_jt + ε_t
Definition: R1 is extended by adding weekly partial-regressive terms (W_jt, j = 1, ..., 52).

Regression Model 3 (R3)
Formulation: y_t = β_0 + β_1 X1_t + β_2 X2_t + β_3 X3_t + Σ_{j=1..12} α_j M_jt + Σ_{j=1..4} γ_j W_jt + ε_t
Definition: R3 adds dummy (shadow) regression terms to R1 for each month of the year (M_jt, j = 1, ..., 12) and for the weeks of the month (W_jt, j = 1, ..., 4).

SVR (Support Vector Regression)
Formulation: y_t = SVR(input variables)
Definition: the SVR prediction forecasts y_t by using the inputs X1, X2, and X3 of Regression Model 1, each month of the year (M_jt, j = 1, ..., 12), and the weeks of the month (W_jt, j = 1, ..., 4).

Exponential Smoothing Model
Formulation: y_t = α y_{t-1} + (1 - α) ŷ_{t-1} + ε_t
Definition: exponential smoothing is generally used for products where there is no trend or seasonality in the model.

Holt Trend Method
Formulation: y_t = L_t + T_t, with L_t = α y_t + (1 - α)(L_{t-1} + T_{t-1}) and T_t = β (L_t - L_{t-1}) + (1 - β) T_{t-1}
Definition: the additive Holt trend model is evaluated for products with level (L) and trend (T).

Holt-Winters Seasonal Model
Formulation: y_t = L_t + T_t + S_{t+p+1}, with L_t = α y_t + (1 - α)(L_{t-1} + T_{t-1}), T_t = β (L_t - L_{t-1}) + (1 - β) T_{t-1}, and S_{t+p+1} = γ (D_{t+1} / L_{t+1}) + (1 - γ) S_{t+1}
Definition: the additive Holt-Winters seasonal model is evaluated for products with level (L), seasonality (S), and trend (T).

Two-Level Model 1
Formulation: y1_t = α y_{t-1} + (1 - α) ŷ_{t-1} + ε_t; y2_t = β_0 + β_1 X1_t + β_2 X2_t + β_3 X3_t + β_4 X4_t + ε_t; y_t = y1_t - y2_t + ε_t
Definition: the R1 model is first applied to the time series; the demand values estimated by this model are then subtracted from the customer demand data, and the residual values are fitted with the Exponential Smoothing Model. The estimates of the two models are combined to calculate and evaluate the product demand estimate.

Two-Level Model 2
Formulation: y1_t = L_t + T_t, with L_t = α y1_t + (1 - α)(L_{t-1} + T_{t-1}) and T_t = β (L_t - L_{t-1}) + (1 - β) T_{t-1}; y2_t = β_0 + β_1 X1_t + β_2 X2_t + β_3 X3_t + β_4 X4_t + ε_t; y_t = y1_t - y2_t + ε_t
Definition: as in Two-Level Model 1, but the residual values are fitted with the Holt Trend Model.

Two-Level Model 3
Formulation: y1_t = L_t + T_t + S_{t+p+1}, with the level, trend, and seasonal recursions of the Holt-Winters Seasonal Model; y2_t = β_0 + β_1 X1_t + β_2 X2_t + β_3 X3_t + β_4 X4_t + ε_t; y_t = y1_t - y2_t + ε_t
Definition: as in Two-Level Model 1, but the residual values are fitted with the Holt-Winters Seasonal Model.
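The Holt trend recursions in Table 1 can be checked with a direct implementation; the level/trend initialization below is one common choice and an assumption, not necessarily the paper's exact setup.

```python
# Holt trend method: a level L_t and a trend T_t are updated each step and
# their sum gives the one-step-ahead forecast.

def holt_trend_forecast(series, alpha, beta):
    level = series[0]
    trend = series[1] - series[0]                 # simple slope initialization
    for y in series[2:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)         # L_t update
        trend = beta * (level - prev_level) + (1 - beta) * trend  # T_t update
    return level + trend                          # one-step-ahead forecast

series = [10, 12, 14, 16, 18, 20]                 # a perfectly linear series
print(round(holt_trend_forecast(series, alpha=0.5, beta=0.5), 2))  # 22.71
```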

3.3. Deep Learning. A machine learning approach can analyze features, relationships, and complex interactions among the features of a problem from the samples of a dataset and learn a model which can be used for demand forecasting. Deep learning (DL) is a machine learning technique that applies deep neural network architectures to solve various complex problems. DL has become a very popular research topic among researchers and has been shown to provide impressive results in image processing, computer vision, natural language processing, bioinformatics, and many other fields [25, 26].

In principle, DL is an implementation of artificial neural networks, which mimic the natural human brain [30]. However, a deep neural network is more powerful and capable of analyzing and composing more complex features and relationships than a traditional neural network. DL requires high computing power and large amounts of data for training. The recent improvements in GPUs (Graphical Processing Units) and parallel architectures enabled the necessary computing power required by deep neural networks. DL uses successive layers of neurons, where each layer extracts more complex and abstract features from the output of the previous layers.


Thus, a DL model can automatically perform feature extraction by itself without any preprocessing step. Visual object recognition, speech recognition, and genomics are some of the fields where DL is applied successfully [26].

A multilayer feedforward artificial neural network (MLFANN) is employed as the deep learning algorithm in this study. In a feedforward neural network, information moves from the input nodes to the hidden nodes and, in the end, to the output nodes, through successive layers of the network without any feedback. The MLFANN is trained with stochastic gradient descent using backpropagation. It uses the gradient descent algorithm to update the weights, aiming to minimize the squared error between the network output values and the target output values. Using gradient descent, each weight is adjusted according to its contribution to the error. For the deep learning part, the H2O library [31] is used as an open source big data artificial intelligence platform.
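A hedged sketch of such a multilayer feedforward network trained with SGD and backpropagation is shown below; scikit-learn's `MLPRegressor` stands in for the H2O deep learning model actually used in the paper, and the synthetic features and layer sizes are illustrative assumptions.

```python
# Train a two-hidden-layer feedforward regressor with stochastic gradient
# descent on a synthetic regression target (a linear function plus noise).
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 8))                         # 8 illustrative features
y = 3.0 * X[:, 0] - X[:, 1] + rng.normal(0, 0.1, 500)

net = MLPRegressor(hidden_layer_sizes=(32, 16),       # two hidden layers
                   solver="sgd",                      # stochastic gradient descent
                   learning_rate_init=0.01,
                   max_iter=2000, random_state=0)
net.fit(X, y)
print(net.score(X, y))                                # R^2 on the training data
```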

3.4. Feature Extraction. In this work, there are some issues because of the huge amount of data. When we tried to use all of the features of all products in each store, with 155 features and 8.75 million records, the deep learning algorithm took days to finish its job due to limited computational power. For this reason, reducing the number of features and the computational time required by each algorithm is necessary. In order to overcome this problem and to accelerate the clustering and modelling phases, PCA (Principal Component Analysis) is used as the feature extraction algorithm. PCA is a commonly used dimension reduction algorithm that presents the most significant features. PCA transforms each instance of the given dataset from a d-dimensional space to a k-dimensional subspace, in which the newly generated set of k dimensions is called the Principal Components (PC) [32]. Each principal component is directed to a maximum variance, excluding the variance accounted for in all its preceding components. As a result, the first component covers the maximum variance, and so on. In brief, principal components are represented as

PC_i = a_1 X_1 + a_2 X_2 + ... + a_d X_d    (1)

where PC_i is the i-th principal component, X_j is the j-th original feature, and a_j is the numerical coefficient for feature X_j.
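Equation (1) in practice can be sketched with scikit-learn's PCA, keeping just enough principal components to explain 95% of the variance; the matrix sizes and the 95% threshold here are illustrative assumptions, not the paper's data.

```python
# Project a feature matrix onto its leading principal components. A pair of
# strongly correlated columns is planted so the data compresses well.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 20))
X[:, 1] = 2.0 * X[:, 0] + rng.normal(0, 0.01, 1000)   # correlated column

pca = PCA(n_components=0.95)          # smallest k reaching 95% explained variance
Z = pca.fit_transform(X)
print(Z.shape[1] < X.shape[1])        # True: fewer features survive the projection
```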

3.5. Proposed Integration Strategy. The decision integration approach follows the philosophy of combining the strengths of different algorithms into a single collaborative method. With the help of this approach, it is aimed to improve the success of the proposed forecasting system by combining different algorithms. Since each algorithm can be more sensitive, or can have some weaknesses, under different conditions, collecting the decision of each model provides more effective and powerful results for decision making processes.

There are two main EL approaches in the literature, called homogeneous and heterogeneous EL. If different types of classifiers are used as base algorithms, then such a system is called a heterogeneous ensemble; otherwise, it is a homogeneous ensemble. In this study, we concentrate on heterogeneous ensembles. An ensemble system is composed of two parts:

ensemble generation and ensemble integration [33–36]. In the ensemble generation part, a diverse set of models is generated using different base classifiers. Nine different time series and regression methods, the support vector regression model, and the deep learning algorithm are employed as base classifiers in this study. There are many integration methods that combine the decisions of base classifiers to obtain a final decision [37–40]. For the integration step, a new decision integration strategy is proposed for the demand forecasting model in this work. We draw inspiration from the boosting ensemble model [4, 41] to construct our proposed decision integration strategy. The essential concept behind the boosting methodology is that, in prediction or classification problems, final decisions can be calculated as a weighted combination of the results.

In order to integrate the prediction of each forecasting algorithm, we use two integration strategies in the proposed approach. The first integration strategy selects the best performing forecasting method among the others and uses that method to predict the demand of a product in a store for the next week. The second integration strategy chooses the best performing forecasting methods of the current week and calculates the prediction by combining the weighted predictions of the winners. In our decision integration strategy, the final decision is determined by regarding the contribution of all algorithms, as in democratic systems. Our approach considers the decisions of the forecasting algorithms that perform better over the previous 4 weeks of this year and, from last year, the weeks corresponding to the current week and the previous week. As algorithms get better at forecasting over time, their contribution weights increase accordingly, or vice versa. In the first decision integration strategy, we do not consider the contributions of all algorithms in a fully democratic manner; we only look at the contributions of the best algorithms of the related week. In other words, the final decision is maintained by integrating the decisions of only the best ones (selected according to their historical decisions) for each store and product. On the other hand, the best algorithms of a week can change for each store and product couple, since every algorithm behaves differently for different products and locations.
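The two strategies above can be sketched compactly. The algorithm names, error values, and the simple (1 - MAPE) accuracy weighting below are illustrative stand-ins for the Equation (4) weighting the actual system uses.

```python
# Strategy 1 keeps only the single best forecaster of the week; strategy 2
# combines the weighted forecasts of the best performers.

recent_mape = {"ARIMA": 0.20, "Holt-Winters": 0.12, "SVR": 0.15, "DeepLearning": 0.10}
forecasts   = {"ARIMA": 18.0, "Holt-Winters": 22.0, "SVR": 20.0, "DeepLearning": 21.0}

# Strategy 1: pick the single method with the lowest recent MAPE.
best = min(recent_mape, key=recent_mape.get)
strategy1 = forecasts[best]

# Strategy 2: keep the best performers and weight each by its accuracy.
winners = sorted(recent_mape, key=recent_mape.get)[:2]
weights = {a: 1.0 - recent_mape[a] for a in winners}      # accuracy = 1 - MAPE
total = sum(weights.values())
strategy2 = sum(forecasts[a] * weights[a] / total for a in winners)

print(best, strategy1, round(strategy2, 2))
```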

The second decision integration strategy is based on the weighted average of the mean absolute percentage error (MAPE) and the mean absolute deviation (MAD) [42]. These are two popular evaluation metrics used to assess the performance of forecasting models. The equations for the calculation of these forecast accuracy measures are as follows:

MAPE (Mean Absolute Percentage Error):

MAPE = (1/n) Σ_{t=1}^{n} |F_t - A_t| / A_t    (2)

MAD (Mean Absolute Deviation):

MAD = (1/n) Σ_{t=1}^{n} |F_t - A_t|    (3)


where

(i) F_t is the expected or estimated value for period t,
(ii) A_t is the actual value for period t,
(iii) n is the number of periods.

The accuracy of a model can be simply computed as Accuracy = 1 - MAPE. If the values of the above two measures are small, it means that the forecasting model performs better [43].
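Equations (2) and (3) translate directly into code; the forecast and actual values below are illustrative.

```python
# MAPE and MAD between forecasts F_t and actuals A_t, plus the
# Accuracy = 1 - MAPE convenience measure noted in the text.

def mape(forecast, actual):
    """Mean absolute percentage error, Equation (2)."""
    return sum(abs(f - a) / a for f, a in zip(forecast, actual)) / len(actual)

def mad(forecast, actual):
    """Mean absolute deviation, Equation (3)."""
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / len(actual)

F = [11, 9, 12, 10]     # forecasts
A = [10, 10, 10, 10]    # actual demand
print(mape(F, A))       # (0.1 + 0.1 + 0.2 + 0.0) / 4 = 0.1
print(mad(F, A))        # (1 + 1 + 2 + 0) / 4 = 1.0
print(1 - mape(F, A))   # accuracy = 0.9
```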

Replenishment for demand forecasting in the retail industry needs some modifications to general EL strategies, regarding retail-specific trends and dynamics. For this purpose, the proposed second integration strategy takes the weighted average of the MAPE of each of the previous 4 weeks and, additionally, the MAPE of the previous year's same week and the trend of the previous year's following week, as in Equation (4). This makes our forecasts more reliable with respect to trend changes and seasonality behaviors. The MAPE of each week is calculated by the formula defined in Equation (2) above. The proposed model automatically changes the weekly weights of each member algorithm by way of their average MAPE for each week at the store and product level. Based on the resulting MAPE weights, the best algorithms of the week gain more weight in the final decision.

The average MAPE of each algorithm for a product at each store is calculated with Equation (4). Suppose that the MAPEs of an algorithm are 0.3, 0.3, 0.1, and 0.1 for the previous 4 weeks of the current week, and 0.1 and 0.2 for the previous year's same week and prior week. Assume that the weekly coefficients are 25%, 20%, 10%, 5%, 30%, and 10%, respectively. Then the average MAPE of the algorithm for the current week will be 0.3 * 0.25 + 0.3 * 0.2 + 0.1 * 0.1 + 0.1 * 0.05 + 0.1 * 0.3 + 0.2 * 0.1 = 0.2, according to Equation (4).

M_avg = Σ_{k=1}^{n} C_k M_k    (4)

In Equation (4), the sum of the coefficients should be 1, that is, Σ_{k=1}^{n} C_k = 1, and M_k denotes the MAPE of the related week, where

(i) M1 is the MAPE of the previous week,
(ii) M2 is the MAPE of 2 weeks before the related week,
(iii) M3 is the MAPE of 3 weeks before the related week,
(iv) M4 is the MAPE of 4 weeks before the related week,
(v) M5 is the MAPE of the previous year's same week,
(vi) M6 is the MAPE of the previous year's previous week.
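Equation (4) on the worked example above can be verified in a few lines: the six MAPE values (previous 4 weeks, then last year's same and prior week) are combined with the weekly coefficients C_k, which must sum to 1.

```python
# Weighted average MAPE of one algorithm for the current week, Equation (4).

mapes        = [0.3, 0.3, 0.1, 0.1, 0.1, 0.2]        # M1 .. M6
coefficients = [0.25, 0.20, 0.10, 0.05, 0.30, 0.10]  # C1 .. C6
assert abs(sum(coefficients) - 1.0) < 1e-12          # coefficients sum to 1

m_avg = sum(c * m for c, m in zip(coefficients, mapes))
print(round(m_avg, 3))   # 0.2
```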

The proposed decision integration system computes a MAPE for each algorithm for the coming forecasting week at the store and product level. In addition, the effects of special days are taken into consideration. Christmas, Valentine's Day, Mother's Day, the start of Ramadan, and other religious days are some examples of special days. System users, considering the current year's calendar, can define special days manually. Thus, the trends of special days are computed automatically by using the previous year's trends. This enables the consideration of seasonality and special events, which results in the evaluation of more accurate forecasting values. Meanwhile, seasonality and other effects can be taken into consideration as well. Following the MAPE calculation of the algorithms, new weights (W_i, where i = 1, ..., n) are assigned to each of them according to their weighted averages for the current week.

The next step is the determination of the best algorithms of the week for each store and product couple. We take only the best performing 30% of the methods into consideration as the best algorithms. After many empirical observations, this ratio gave the best results on our dataset. It is obvious that these parameters are very specific to the dataset characteristics and also depend on which algorithms are included in the integration strategy. Scaling is the last preprocessing step for the calculation of the final decision forecast. Suppose that we have n algorithms (A_1, A_2, ..., A_n) and k of them are the best models of the week for a store and product couple, where k ≤ n. The weight of each winner is scaled according to its weighted average in Equation (5).

W'_t = W_t / \sum_{j=1}^{k} W_j,    t ∈ {1, ..., k}    (5)

Assume that there are 3 best algorithms and their weights are W_1 = 30, W_2 = 18, and W_3 = 12, respectively. Their scaled new weights become

W'_1 = 30/60 = 50%
W'_2 = 18/60 = 30%
W'_3 = 12/60 = 20%    (6)

according to Equation (5). Scaling makes our calculation more comprehensible.

After scaling the weight of each algorithm, the system gives its ultimate decision according to the new weights, considering the performance of each algorithm, with Equation (7).

F_{avg} = \sum_{j=1}^{k} F_j W'_j    (7)

In Equation (7), the main constraint is \sum_{j=1}^{k} W'_j = 1, where k is the number of champion algorithms and F_j is the forecast of the related algorithm. Suppose that the best performing algorithms are A1, A2, and A3; algorithm A1 forecasts the sales quantity as 20, A2 says it will be 10 for the next week, and the A3 forecast is 5. Let us assume that their scaled weights are 50%, 30%, and 20%, respectively. Then the weighted forecast according to Equation (7) is as follows:

F_{avg} = 20 × 0.50 + 10 × 0.30 + 5 × 0.20 = 14    (8)
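The top-30% selection, the scaling of Equation (5), and the combination of Equation (7) can be sketched end to end; the algorithm names and the non-champion forecasts below are illustrative:

```python
def integrate(forecasts, weights, ratio=0.3):
    """Select the top `ratio` share of algorithms by weight (the week's
    champions), scale their weights to sum to 1 (Eq. (5)), and combine
    their forecasts into the final decision (Eq. (7))."""
    k = max(1, round(len(weights) * ratio))
    champions = sorted(weights, key=weights.get, reverse=True)[:k]
    total = sum(weights[a] for a in champions)
    return sum(forecasts[a] * weights[a] / total for a in champions)

# 10 member algorithms; A1-A3 carry the weights 30, 18, 12 from the example
weights = {"A1": 30, "A2": 18, "A3": 12, "A4": 9, "A5": 8,
           "A6": 7, "A7": 6, "A8": 5, "A9": 3, "A10": 2}
forecasts = {a: q for a, q in zip(weights, [20, 10, 5, 8, 8, 8, 8, 8, 8, 8])}
print(integrate(forecasts, weights))  # 20*0.5 + 10*0.3 + 5*0.2 = 14.0
```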

Every algorithm has the right to vote according to its weight. Finally, if a member algorithm does not appear in the list of the best algorithms for a product and store couple for a specific period, it is automatically put into a blacklist so that it will no longer be used at that product and store level
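One way to sketch the blacklist bookkeeping; the paper does not state the length of the "specific period", so the `patience` threshold below is an assumed parameter:

```python
def update_blacklist(history, winners, algorithms, patience=8):
    """Track consecutive weeks each algorithm missed the winners list for a
    store/product couple; blacklist it once the streak reaches `patience`."""
    blacklist = set()
    for algo in algorithms:
        history[algo] = 0 if algo in winners else history.get(algo, 0) + 1
        if history[algo] >= patience:
            blacklist.add(algo)
    return blacklist

# A3 has already missed 7 weeks; one more miss puts it on the blacklist
history = {"A3": 7}
print(update_blacklist(history, winners={"A1", "A2"}, algorithms=["A1", "A2", "A3"]))
# {'A3'}
```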

Complexity 7

[Flowchart omitted. Pipeline: Data Collection → Data Cleaning → Data Enrichment from Big Data → Finding Frequent Items using Apriori → Replenishment Data Mart on Hadoop/Spark → Dimensionality Reduction → Clustering → Deep Learning Model / Time Series Models / Support Vector Regression Model → Proposed Integration Strategy → Final Decision.]

Figure 1: Flowchart of the proposed system

This enables faster computation, since the system disregards poorly performing algorithms. The algorithm of the proposed demand forecasting system and the flowchart of the proposed system are given in Algorithm 1 and Figure 1, respectively.

4 Experiment Setup

We use an enhanced dataset which includes SOK Market's real-life sales and stock data with enriched features. The dataset consists of 106 weeks of sales data covering 7888 distinct products across the stores of SOK Market. In each store, around 1500 products are actively sold, while the rest rarely become active. Forecasting is performed each week to predict the demand of the following week. A three-fold cross-validation methodology is applied on the test data, and the most accurate model is then selected for the forecast of the coming week. This is a common approach for demand forecasting in retail [44]. Furthermore, outside weather information, such as daily temperature and other weather conditions, is joined to the model as new variables. Since it is known that, in most circumstances, there is a certain correlation between shopping trends and weather conditions in retail [33], we enriched the dataset with weather information. The dataset also includes mostly sales-related features filled from relational legacy data sources.
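The weather enrichment amounts to a key join on store and week; the field names below (`avg_temp`, `condition`) are illustrative, not the actual schema of the Hadoop/Spark data mart:

```python
# Hypothetical miniature of the sales records and the weather lookup table
sales = [
    {"store": 1, "yearweek": 201801, "sales_qty": 120},
    {"store": 1, "yearweek": 201802, "sales_qty": 95},
]
weather = {
    (1, 201801): {"avg_temp": 23, "condition": "Rainy"},
    (1, 201802): {"avg_temp": 19, "condition": "Sunny"},
}

# Left join: each sales row picks up the weather features for its store/week
enriched = [{**row, **weather.get((row["store"], row["yearweek"]), {})}
            for row in sales]
print(enriched[0])
# {'store': 1, 'yearweek': 201801, 'sales_qty': 120, 'avg_temp': 23, 'condition': 'Rainy'}
```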


Given: n is the number of stores, m is the number of products, t is the number of algorithms in the system, and A_t is an algorithm with index t. S_{mn} is the matrix which holds the best performing algorithms for each store and product, B_{mn} is the matrix which contains the set of blacklisted algorithms for each store and product, and F_{ij} is the matrix which stores the final decision of each forecast, where i is the number of stores and j is the number of products.

for i = 1..n
    for j = 1..m
        for k = 1..t
            if A_k is in blacklist B_ij then
                continue
            else
                run A_k
                calculate algorithm weight W_k
            end if
        end for
        for k = 1..t
            choose the best performing algorithms and place them in S_ij
        end for
        for z = 1..|S_ij|
            do scaling for A_z
        end for
        calculate the proposed integration strategy and store it in F_ij
    end for
end for
return all F_nm

Algorithm 1: The algorithm of the proposed demand forecasting system

A deep learning approach is used to estimate customer demand for each product at each store of the SOK Market retail chain. There is also a very apparent similarity, in terms of sales trends, between products which frequently appear in the same basket. For this purpose, the Apriori algorithm [4] is utilized to find the correlated products of each product. Apriori is a fast way to find the most associated products that appear in the same basket. The most associated product's sales-related features are added to the dataset as well. This takes advantage of relationships among products with similar sales trends. Adding the associated product of a product and its features enables the DL algorithm to learn better within different hidden layers. In summary, extensive experiments showed that enriching the training dataset with weather data and the sales data of the most associated products made our estimates more powerful when using the DL algorithm.
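A sketch of the association step, assuming simple set-valued baskets. A full Apriori implementation prunes by support across progressively larger itemsets; this first pass only counts co-occurring pairs, which is enough to pick the single most associated product:

```python
from collections import Counter

def most_associated(baskets, product, min_support=2):
    """Count co-occurrences of `product` with every other item across the
    baskets and return its most frequent partner (an Apriori-style pair pass)."""
    pairs = Counter()
    for basket in baskets:
        if product in basket:
            for other in set(basket) - {product}:
                pairs[other] += 1
    frequent = {p: c for p, c in pairs.items() if c >= min_support}
    return max(frequent, key=frequent.get) if frequent else None

baskets = [{"pasta", "sauce", "cheese"}, {"pasta", "sauce"}, {"pasta", "bread"}]
print(most_associated(baskets, "pasta"))  # sauce
```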

The dataset includes weekly based data of each product at each store for the latest 8 weeks and the sales quantities of the last 4 weeks in the same season. Historically, 2 years of data are included for each of 5500 stores and 1500 products. The data size is about 875 million records with 155 features. These features include sales quantity, customer return quantity, product quantities received from distribution centers, sales amount, discount amount, receipt counts of a product, counts of customers who bought a specific product, sales quantity for each day of the week (Monday, Tuesday, etc.), maximum and minimum stock level of each day, average stock level for each week, sales quantity for each hour of the day, and also the sales quantities of the last 4 weeks of the same season. Since there is a relationship between products which are generally in the same basket, we prepared the same features defined above for the most associated products as well. A detailed explanation of the dataset is presented in Table 2.

In Table 2, fields given with square brackets are arrays indexed by week (e.g., Sales Quantity Week[0] is the sales quantity of the current week, Sales Quantity Week[1] is the sales quantity of one week before, and so on). According to the experimental results, taking weekly data for the latest 8 weeks and for 4 weeks of the same season in the previous year captures trend and seasonality changes more acceptably.

For the DL implementation, the H2O library [31] is used as an open-source big data artificial intelligence platform. H2O is a powerful machine learning library that gives us the opportunity to implement DL algorithms on Hadoop/Spark big data environments. It combines highly advanced machine learning algorithms with truly scalable in-memory processing for Spark on one or many nodes via its Sparkling Water version. With the in-memory processing capability of Spark technologies, it provides a faster parallel platform to utilize big data for business requirements. In this study, we use H2O version 3.14.0.2 on a 64-bit, 16-core machine with 32 GB of memory to observe the power of parallelism during DL modelling while increasing the number of neurons.

A multilayer feedforward artificial neural network (MLFANN) is employed as the deep learning algorithm. The MLFANN is trained with stochastic gradient descent using backpropagation. With gradient descent, each weight is adjusted according to its contribution to the error. H2O also has advanced features like momentum training, an adaptive learning rate, L1 or L2 regularization, rate annealing,


Table 2: Dataset explanation

| Name | Description | Example |
|---|---|---|
| Yearweek | Related year/week; weeks start from Monday to Sunday | 201801 |
| Store | Store number | 1234 |
| Product | Product identification number | 1 |
| Product Adjactive | Associated product according to the Apriori algorithm: the most frequent product in the same basket with a specific product | 2 |
| Stock In Quantity Week[0-8] | Stock increase quantity of the product in the related week, e.g., stock transfer quantity from distribution center to store | 50 |
| Return Quantity Week[0-8] | Stock return quantity from customers at a specific week and store | 20 |
| Sales Quantity Week[0-8] | Weekly sales quantity of the related product at a specific store | 120 |
| Sales Amount Week[0-8] | Total sales amount of the product on the customer receipt | 2500 |
| Discount Amount Week[0-8] | Discount amount of the product, if any | 500 |
| Customer Count Week[0-8] | How many customers bought this product in a specific week | 30 |
| Receipt Count Week[0-8] | Distinct receipt count for the related product | 20 |
| Sales Quantity Time[9-22] | Hourly sales quantity of the related product from 9 am to 10 pm | 5 |
| Last4weeks Day[1-7] | Total sales for each weekday of the last 4 weeks (total sales of Mondays, Tuesdays, etc.) | 10 |
| Last8weeks Day[1-7] | Total sales for each weekday of the last 8 weeks (total sales of Mondays, Tuesdays, etc.) | 10 |
| Max Stock Week[0-8] | Maximum stock quantity of the related week | 12 |
| Min Stock Week[0-8] | Minimum stock quantity of the related week | 2 |
| Avg Stock Week[0-8] | Average stock quantity of the related week | 5 |
| Sales Quantity Adj Week[0-8] | Sales quantity of the most associated product | 14 |
| Temperature Weekday[1-7] | Daily temperature for each weekday (Monday, Tuesday, etc.) | 22 |
| Weekly Avg Temperature[0-8] | Average weather temperature of the related week | 23 |
| Weather Condition Weekday[1-7] | Nominal variable: rainy, snowy, sunny, cloudy, etc. | Rainy |
| Sales Quantity Next Week | Target variable of our classification problem | 25 |

dropout, and grid search. A Gaussian distribution is applied because the response variable is continuous. H2O performs very well in our environment when 3 hidden layers with 10 nodes each and a total of 300 epochs are set as parameters.
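The production model is H2O's MLFANN with 3 hidden layers of 10 nodes trained for 300 epochs; as a dependency-free illustration of the same training principle (stochastic gradient descent with backpropagation), a toy one-hidden-layer network on a synthetic regression might look like this:

```python
import math
import random

random.seed(0)

# Toy regression: learn y = 2x with one tanh hidden layer trained by SGD.
H, lr = 4, 0.05
w1 = [random.uniform(-0.5, 0.5) for _ in range(H)]  # input -> hidden weights
b1 = [0.0] * H
w2 = [random.uniform(-0.5, 0.5) for _ in range(H)]  # hidden -> output weights
b2 = 0.0

data = [(x / 10, 2 * x / 10) for x in range(-10, 11)]

def forward(x):
    h = [math.tanh(w1[i] * x + b1[i]) for i in range(H)]
    return h, sum(w2[i] * h[i] for i in range(H)) + b2

def epoch_loss():
    return sum((forward(x)[1] - y) ** 2 for x, y in data) / len(data)

loss_before = epoch_loss()
for _ in range(300):                      # mirrors the 300-epoch setting
    random.shuffle(data)
    for x, y in data:
        h, out = forward(x)
        err = out - y                     # gradient of squared error (up to 2x)
        for i in range(H):
            grad_h = err * w2[i] * (1 - h[i] ** 2)  # backprop through tanh
            w2[i] -= lr * err * h[i]
            w1[i] -= lr * grad_h * x
            b1[i] -= lr * grad_h
        b2 -= lr * err

print(loss_before > epoch_loss())  # True: SGD reduced the training loss
```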

After the dimension reduction step, K-means is employed as the clustering algorithm. K-means is a greedy and fast clustering algorithm [45] that tries to partition samples into k clusters in which each sample is near its cluster center. K is selected as 20 because, after several trials and empirical observations, this gives the most even distribution on the dataset, dividing it into 20 different clusters for each store. After that, the deep learning algorithm is applied, and 20 different DL models are obtained for each store. Demand forecasting for each product is then performed by using its cluster's model on a store basis. The main reason for using clustering is the time and computing cost of the DL algorithm. Instead of building a model for every product at every store, 20 models are generated per store at product level. In our trials with a sample dataset, the bias difference is less than 2%, so this speed-up is very beneficial and a good option for researchers. The forecasting result of the DL model is transferred into our forecasting system, which makes its final decision by considering the decisions of 11 forecasting methods, including the DL model.
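A minimal sketch of the clustering idea, assuming products are clustered on a single sales feature; the paper clusters on the full feature set with k = 20, while this toy uses k = 2 to separate slow sellers from fast sellers before fitting one model per cluster:

```python
def kmeans_1d(values, k=2, iters=20):
    """Naive 1-D k-means: returns (centers, assignment index per value)."""
    centers = sorted(values)[:: max(1, len(values) // k)][:k]
    assign = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest center, then recompute centers
        assign = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        groups = [[] for _ in range(k)]
        for v, a in zip(values, assign):
            groups[a].append(v)
        centers = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
    return centers, assign

# Average weekly sales per product: slow sellers vs. fast sellers
sales = [2, 3, 2, 40, 45, 38]
centers, assign = kmeans_1d(sales, k=2)
print(assign)  # e.g. [0, 0, 0, 1, 1, 1] -- one DL model is then fit per cluster
```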

Forecasting solutions should be scalable, process large amounts of data, and extract a model from the data. Big data environments using Spark technologies give us the opportunity to implement algorithms with scalability, sharing the tasks of machine learning algorithms among the nodes of parallel architectures. Considering the huge number of samples and the large number of features, even the computational power of a parallel architecture is not enough in some cases. For this reason, dimension reduction is needed for big data applications. In this study, PCA is employed as a feature extraction step.
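As an illustration of what PCA extracts, a dependency-free sketch for two features (the paper applies PCA to the full 155-feature set, where the eigen decomposition is done numerically rather than in closed form):

```python
import math

def first_pc(points):
    """Leading principal component of 2-D data via the closed-form
    eigenvector of the 2x2 covariance matrix."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Largest eigenvalue of [[cxx, cxy], [cxy, cyy]]
    lam = (cxx + cyy) / 2 + math.sqrt(((cxx - cyy) / 2) ** 2 + cxy ** 2)
    v = (lam - cyy, cxy) if abs(cxy) > 1e-12 else (1.0, 0.0)
    norm = math.hypot(*v)
    return (v[0] / norm, v[1] / norm)

# Points spread along y = x: the first PC is close to (0.707, 0.707)
print(first_pc([(0, 0), (1, 1), (2, 2), (3, 3.1)]))
```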

5 Experimental Results


SOK Market sells 21 different groups of items. We assess the performance of the proposed forecasting system on a group basis, as shown in Table 3. Table 3 shows the MAPE (Mean Absolute Percentage Error) of integration strategies (S1) and (S2)


Table 3: Demand forecasting improvements per product group

| Product Groups | Method I without Proposed Integration Strategy (S1) MAPE | Method II with Proposed Integration Strategy (S2) MAPE | Proposed Integration Strategy with Deep Learning (SD) MAPE | Percentage Success Rate P1 = (S1 - S2)/S1 (%) | Percentage Success Rate P2 = (S1 - SD)/S1 (%) | Improvement D = P2 - P1 (%) |
|---|---|---|---|---|---|---|
| Baby Products | 0.5157 | 0.3081 | 0.2927 | 40.26 | 43.25 | 2.99 |
| Bakery Products | 0.3482 | 0.2059 | 0.1966 | 40.85 | 43.52 | 2.67 |
| Beverage | 0.3714 | 0.2316 | 0.2207 | 37.64 | 40.58 | 2.94 |
| Biscuit-Chocolate | 0.3358 | 0.2077 | 0.1977 | 38.14 | 41.13 | 2.99 |
| Breakfast Products | 0.4443 | 0.2770 | 0.2661 | 37.65 | 40.11 | 2.45 |
| Canned-Paste-Sauces | 0.3836 | 0.2309 | 0.2198 | 39.80 | 42.69 | 2.89 |
| Cheese | 0.3953 | 0.2457 | 0.2345 | 37.84 | 40.68 | 2.84 |
| Cleaning Products | 0.4560 | 0.2791 | 0.2650 | 38.79 | 41.89 | 3.10 |
| Cosmetics Products | 0.5397 | 0.3266 | 0.3148 | 39.49 | 41.67 | 2.19 |
| Deli Meats | 0.4242 | 0.2602 | 0.2488 | 38.65 | 41.36 | 2.70 |
| Edible Oils | 0.4060 | 0.2299 | 0.2215 | 43.36 | 45.45 | 2.09 |
| Household Goods | 0.5713 | 0.3656 | 0.3535 | 36.01 | 38.13 | 2.12 |
| Ice Cream-Frozen | 0.5012 | 0.3255 | 0.3106 | 35.05 | 38.03 | 2.98 |
| Legumes-Pasta-Soup | 0.3850 | 0.2397 | 0.2269 | 37.74 | 41.07 | 3.33 |
| Nuts-Chips | 0.3316 | 0.2049 | 0.1966 | 38.20 | 40.71 | 2.51 |
| Poultry-Eggs | 0.4219 | 0.2527 | 0.2403 | 40.11 | 43.04 | 2.94 |
| Ready Meals | 0.4613 | 0.2610 | 0.2520 | 43.42 | 45.36 | 1.94 |
| Red Meat | 0.2514 | 0.1616 | 0.1532 | 35.71 | 39.06 | 3.35 |
| Tea-Coffee Products | 0.4347 | 0.2650 | 0.2535 | 39.04 | 41.68 | 2.64 |
| Textile Products | 0.5418 | 0.3048 | 0.2907 | 43.74 | 46.34 | 2.60 |
| Tobacco Products | 0.3791 | 0.2378 | 0.2290 | 37.29 | 39.61 | 2.32 |
| Average | 0.4238 | 0.2582 | 0.2469 | 38.99 | 41.68 | 2.69 |

[Box plot omitted. Axes: MAPE (0.00-1.00) versus the 21 product groups.]

Figure 2: MAPE distribution according to product groups with S1 integration strategy

in columns 2 and 3. The average percentage forecasting errors of strategies (S1) and (S2) are 0.42 and 0.26, respectively. As can be seen from Table 3, the percentage success rate of (S2) with respect to (S1) is presented in column 5. The current integration strategy (S2) provides a 38.99% improvement over the first integration strategy (S1) on average. In some product groups, like Textile Products, the error rate is reduced from 0.54 to 0.30, a 43.74% enhancement in MAPE.
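The success-rate columns of Table 3 can be reproduced directly from the MAPE columns; a quick check for one group (the published table appears to use unrounded MAPEs, so the last digit can differ by about 0.01):

```python
# Baby Products MAPEs from Table 3
s1, s2, sd = 0.5157, 0.3081, 0.2927
p1 = (s1 - s2) / s1 * 100  # improvement of S2 over S1
p2 = (s1 - sd) / s1 * 100  # improvement of S_D over S1
print(round(p1, 2), round(p2, 2), round(p2 - p1, 2))  # ~40.26 ~43.24 ~2.99
```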

With this work, a further improvement is obtained by utilizing the DL model compared to the forecasting strategy based on just 10 algorithms. Column 4 of Table 3 demonstrates the MAPEs of the 21 different groups of products after the addition of the DL model. After the inclusion of the DL model into the proposed system, we observed around a 2% to 3.4% increase in demand forecast accuracy in some product groups. The error rates of some product groups, like "baby products", "cosmetic products", and "household goods", are usually higher than the others, since these product groups do not have continuous consumer demand. For example, the error rate for the "household goods" group is 0.35, while the error rate for the "red meat" group is 0.15. The error rates for food products are usually lower, since customers buy these products regularly for their daily consumption. However, the error rates are higher for baby, textile, and household goods. Customers buy products from these groups selectively, depending on whether they like the product or not. In addition, the prices of goods in these groups, and the promotion effects, are usually high relative to the prices of goods in other groups.

Moreover, when the percentage success rate column is analyzed, it is observed that the top benefitting product groups achieve an over 40% success rate: Baby Products, Bakery Products, Edible Oils, Poultry-Eggs, Ready Meals, and Textile Products. The common point of these groups is that each of them is consumed regularly by a specific customer segment. For instance, the baby products group is chosen by families who have kids; Edible Oils, Poultry-Eggs, and Bakery Products are preferred by frequently and continuously shopping customer segments; and Ready Meals are preferred mostly by bachelor, single, and working consumers. Furthermore, the inclusion of DL into the forecasting system indicates that some other consumption groups (for example, Cleaning Products, Red Meat, and Legumes-Pasta-Soup) exhibit better performance than the others, with an additional improvement of over 3%.

Figure 2 shows the box plots of the mean absolute percentage errors (MAPE) for each group of products after application of the (S1) integration strategy. The box plots enable us to analyze the distributional characteristics of forecasting errors for the product groups. As can be seen from the figure, the medians differ across product groups. Thus, the forecasting errors are usually dissimilar for different product groups. The interquartile range box represents the middle 50% of the scores in the data. The interquartile range boxes are usually very tall, which means that there is a wide spread of prediction errors within the products of a given group.

Figure 3 presents the box plots of the MAPEs of the proposed forecasting system applying the proposed integration approach (S2), which combines the predictions of 10 different forecasting algorithms.

Figure 4 shows the results after adding DL to our system. The interquartile range boxes are narrower when compared to the ones in Figures 2 and 3, which means that the spread of the forecasting errors is smaller than in Figures 2 and 3. Finally, the integration of DL into the proposed forecasting system generates more accurate results.

[Box plot omitted. Axes: MAPE (0.00-1.00) versus the 21 product groups.]

Figure 3: MAPE distribution according to product groups with S2 integration strategy

[Box plot omitted. Axes: MAPE (0.00-1.00) versus the 21 product groups.]

Figure 4: MAPE distribution according to product groups after the deep learning algorithm

For example, for the (S1) integration strategy in the Baby Products group, the MAPE distribution in Figure 2 is observed between 0% and 23% for the 1st quartile of the data, between 23% and 48% for the 2nd quartile, and between 48% and 75% for the 3rd and 4th quartiles. After applying the integration strategy, the MAPE distribution for the Baby Products group becomes 0%-16% for the 1st quartile, 16%-30% for the 2nd quartile, and 30%-50% for the rest. This gives around an 8% improvement for the first quartile, an 8% to 18% enhancement for the second quartile, and an 18% to 25% advancement for the rest of the data, according to the boundary differences. The median of the MAPE distribution is 48%; after applying the proposed integration strategy, it is observed as 27%, which means a 21% improvement. Approximately a 1%-3% additional enhancement is observed for each quartile of the data with the inclusion of the deep learning strategy in Figure 4.

For Cleaning Products, improvements similar to the other groups are observed with the integration strategy. The MAPE distribution of the 1st quartile of the data is observed between 0% and 19%, for the 2nd quartile between 19% and 40%, and for the rest of the data between 40% and


[Line chart omitted. Axes: sum of sale quantity (0-9000) versus month (July 2017 to July 2018); series: Actual, S1, S2, SD.]

Figure 5: Accuracy comparison among integration strategies for one sample product group

73% in Figure 2. After performing the proposed integration strategy for Cleaning Products, the MAPE distribution becomes 0%-10% for the 1st quartile of the data, 10%-17% for the 2nd quartile, and 27%-43% for the 3rd and 4th quartiles together. The improvements are observed as follows: 9% for the 1st quartile, from 9% to 13% for the 2nd quartile, and from 13% to 20% for the rest of the data at the boundaries. After implementation of the integration strategy, the additional advancement of the system with deep learning is observed as 2% for the 1st and 2nd quartiles and 3% for the 3rd and 4th quartiles, respectively. Briefly, the integration strategy improvement is remarkable for each product group, and consolidating DL into the proposed integration strategy additionally advances the results by almost 2% to 3%.

In Figure 5, the accuracy comparison among integration strategies for one of the best performing sample groups is presented over a 1-year period. In particular, Christmas and other seasonal effects can be seen very clearly in Figure 5, and the integration strategy with DL (SD) performs with the best accuracy.

It is hard to compare the performance of our proposed system with other studies because of the lack of works with a similar combination of a deep learning approach, the proposed decision integration strategies, and different learning algorithms for the demand forecasting process. Another difficulty is the lack of real-life datasets similar to the SOK dataset on which to compare state-of-the-art studies. Although the results of the proposed system are given in this study, we also report here the results of a number of research works in the demand forecasting domain. A greedy aggregation-decomposition method has solved a real-world intermittent demand forecasting problem for a fashion retailer in Singapore [46]; the authors report a 5.9% MAPE success rate on a small dataset. Different approaches, such as a statistical model, the Winters model, and a radial basis function neural network (RBFNN), are compared with SVM for the demand forecasting process in [47]. The authors conclude that the SVM algorithm surpasses the others, with around a 7.7% enhancement in average MAPE. A novel demand forecasting model called SHEnSVM (Selective and Heterogeneous Ensemble of Support Vector Machines) is proposed in [48]. In this model, individual SVMs are trained on different samples generated by a bootstrap algorithm, after which a genetic algorithm is employed to retrieve the best combination schema of individuals. They report a 10% advancement with the SVM algorithm and a 6.4% MAPE improvement, employing only beer data with 3 different brands in their experiments. Tugba Efendigil et al. propose a novel forecasting mechanism modeled by artificial intelligence approaches in [49]. They compare both artificial neural networks and adaptive network-based fuzzy inference system techniques to manage fuzzy demand with incomplete information, reaching MAPE rates of around 18% for some products in their experiments.

6 Conclusion

In the retail industry, demand forecasting is one of the main problems of supply chains in optimizing stocks, reducing costs, and increasing sales, profit, and customer loyalty. To address this issue, there are several methods, such as time series analysis and machine learning approaches, to analyze and learn complex interactions and patterns from historical data.

In this study, there is a novel attempt to integrate 11 different forecasting models, including time series algorithms, a support vector regression model, and a deep learning method, for the demand forecasting process. Moreover, a novel decision integration strategy is developed by drawing inspiration from the boosting ensemble methodology. The proposed approach considers the performance of each model over time and combines the weighted predictions of the best performing models for the demand forecasting process. The


proposed forecasting system is tested on real-life data obtained from the SOK Market retail chain. It is observed that the inclusion of learning algorithms beyond time series models, together with a novel integration strategy, advanced the performance of the demand forecasting system. To the best of our knowledge, this is the very first attempt to consolidate a deep learning methodology, the SVR algorithm, and different time series methods for demand forecasting systems. Furthermore, the other novelty of this work is the adaptation of the boosting ensemble strategy to the demand forecasting model. In this way, the final decision of the proposed system is based on the best algorithms of the week, which gain more weight. This makes our forecasts more reliable with respect to trend changes and seasonality behaviors. Moreover, the proposed approach performs very well integrated with the deep learning algorithm on a Spark big data environment. The dimension reduction process and clustering methods help to decrease time consumption and required computing power during the deep learning modelling phase. Although a review of some similar studies is presented in Section 5, as expected, it is very difficult to compare the results of other studies with ours because of the use of different datasets and methods. In this study, we compare the results of three models: the first model selects the best performing forecasting method depending on its success in the previous period, with 42.4% MAPE on average; the second model, with the novel integration strategy, results in 25.8% MAPE on average; and the last model, the novel integration strategy enhanced with the deep learning approach, provides 24.7% MAPE on average. As a result, the inclusion of the deep learning approach in the novel integration strategy reduces the average prediction error for the demand forecasting process in the supply chain.

As future work, we plan to enrich the set of features by gathering data from other sources, like economic studies, shopping trends, social media, social events, and location-based demographic data of stores, and to observe the contribution of new varieties of data sources to deep learning. A further study can be done to determine the hyperparameters of the deep learning algorithm. In addition, we also plan to use other deep learning techniques, such as convolutional neural networks, recurrent neural networks, and deep neural networks, as learning algorithms. Furthermore, another objective is to use heuristic methods, MBO (Migrating Birds Optimization), and other related algorithms [50] to optimize some of the coefficients and weights which were determined empirically by trial and error, like taking the best performing 30 percent of methods in our current system.

Data Availability

The dataset is private customer data, so the agreement with the organization (SOK) does not allow sharing the data.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This work is supported by the OBASE Research & Development Center.

References

[1] P J McGoldrick and E Andre ldquoConsumer misbehaviourpromiscuity or loyalty in grocery shoppingrdquo Journal of Retailingand Consumer Services vol 4 no 2 pp 73ndash81 1997

[2] D Grewal A L Roggeveen and J Nordfalt ldquoThe Future ofRetailingrdquo Journal of Retailing vol 93 no 1 pp 1ndash6 2017

[3] E T Bradlow M Gangwar P Kopalle and S Voleti ldquothe roleof big data and predictive analytics in retailingrdquo Journal ofRetailing vol 93 no 1 pp 79ndash95 2017

[4] J Han M Kamber and J Pei Data Mining Concepts andTechniques Morgan Kaufmann Publishers USA 2013

[5] R Hyndman and G Athanasopoulos Forecasting Principlesand Practice OTexts Melbourne Australia 2018 httpotextsorgfpp2

[6] C C Pegels ldquoExponential forecasting some new variationsrdquoManagement Science vol 12 pp 311ndash315 1969

[7] E S Gardner ldquoExponential smoothing the state of the artrdquoJournal of Forecasting vol 4 no 1 pp 1ndash28 1985

[8] D Gujarati Basic Econometrics Mcgraw-Hill New York NYUSA 2003

[9] J D Croston ldquoForecasting and stock control for intermittentdemandsrdquo Operational Research Quarterly vol 23 no 3 pp289ndash303 1972

[10] T RWillemain C N Smart J H Shockor and P A DeSautelsldquoForecasting intermittent demand inmanufacturing a compar-ative evaluation of Crostonrsquos methodrdquo International Journal ofForecasting vol 10 no 4 pp 529ndash538 1994

[11] A Syntetos Forecasting of intermittent demand Brunel Univer-sity 2001

[12] A A Syntetos and J E Boylan ldquoThe accuracy of intermittentdemand estimatesrdquo International Journal of Forecasting vol 21no 2 pp 303ndash314 2005

[13] R H Teunter A A Syntetos and M Z Babai ldquoIntermittentdemand linking forecasting to inventory obsolescencerdquo Euro-pean Journal of Operational Research vol 214 no 3 pp 606ndash615 2011

[14] R J Hyndman R A Ahmed G Athanasopoulos and H LShang ldquoOptimal combination forecasts for hierarchical timeseriesrdquo Computational Statistics amp Data Analysis vol 55 no 9pp 2579ndash2589 2011

[15] R Fildes and F Petropoulos ldquoSimple versus complex selectionrules for forecasting many time seriesrdquo Journal of BusinessResearch vol 68 no 8 pp 1692ndash1701 2015

[16] C Li and A Lim ldquoA greedy aggregationndashdecompositionmethod for intermittent demand forecasting in fashion retail-ingrdquo European Journal of Operational Research vol 269 no 3pp 860ndash869 2018

[17] F Turrado Garcıa L J Garcıa Villalba and J Portela ldquoIntelli-gent system for time series classification using support vectormachines applied to supply-chainrdquo Expert Systems with Appli-cations vol 39 no 12 pp 10590ndash10599 2012

[18] G Song and Q Dai ldquoA novel double deep ELMs ensemblesystem for time series forecastingrdquo Knowledge-Based Systemsvol 134 pp 31ndash49 2017

Complexity 15

[19] O. Araque, I. Corcuera-Platas, J. F. Sánchez-Rada, and C. A. Iglesias, "Enhancing deep learning sentiment analysis with ensemble techniques in social applications," Expert Systems with Applications, vol. 77, pp. 236–246, 2017.

[20] H. Tong, B. Liu, and S. Wang, "Software defect prediction using stacked denoising autoencoders and two-stage ensemble learning," Information and Software Technology, vol. 96, pp. 94–111, 2018.

[21] X. Qiu, Y. Ren, P. N. Suganthan, and G. A. J. Amaratunga, "Empirical mode decomposition based ensemble deep learning for load demand time series forecasting," Applied Soft Computing, vol. 54, pp. 246–255, 2017.

[22] Z. Qi, B. Wang, Y. Tian, and P. Zhang, "When ensemble learning meets deep learning: a new deep support vector machine for classification," Knowledge-Based Systems, vol. 107, pp. 54–60, 2016.

[23] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: a review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, 2013.

[24] G. E. Hinton, S. Osindero, and Y. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.

[25] Y. Bengio, "Practical recommendations for gradient-based training of deep architectures," in Neural Networks: Tricks of the Trade, vol. 7700 of Lecture Notes in Computer Science, pp. 437–478, Springer, Berlin, Germany, 2nd edition, 2012.

[26] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.

[27] S. Kim, Z. Yu, R. M. Kil, and M. Lee, "Deep learning of support vector machines with class probability output networks," Neural Networks, vol. 64, pp. 19–28, 2015.

[28] P. J. Brockwell and R. A. Davis, Time Series: Theory and Methods, Springer Series in Statistics, Springer, New York, NY, USA, 1989.

[29] V. N. Vapnik, Statistical Learning Theory, Wiley-Interscience, New York, NY, USA, 1998.

[30] S. Haykin, Neural Networks and Learning Machines, Macmillan Publishers Limited, New Jersey, USA, 2009.

[31] A. Candel, V. Parmar, E. LeDell, and A. Arora, Deep Learning with H2O, United States of America, 2018.

[32] K. Keerthi Vasan and B. Surendiran, "Dimensionality reduction using principal component analysis for network intrusion detection," Perspectives in Science, vol. 8, pp. 510–512, 2016.

[33] L. Rokach, "Ensemble-based classifiers," Artificial Intelligence Review, vol. 33, no. 1-2, pp. 1–39, 2010.

[34] R. Polikar, "Ensemble based systems in decision making," IEEE Circuits and Systems Magazine, vol. 6, no. 3, pp. 21–45, 2006.

[35] S. Reid, A Review of Heterogeneous Ensemble Methods, Department of Computer Science, University of Colorado at Boulder, 2007.

[36] D. Gopika and B. Azhagusundari, "An analysis on ensemble methods in classification tasks," International Journal of Advanced Research in Computer and Communication Engineering, vol. 3, no. 7, pp. 7423–7427, 2014.

[37] Y. Ren, L. Zhang, and P. N. Suganthan, "Ensemble classification and regression: recent developments, applications and future directions," IEEE Computational Intelligence Magazine, vol. 11, no. 1, pp. 41–53, 2016.

[38] U. G. Mangai, S. Samanta, S. Das, and P. R. Chowdhury, "A survey of decision fusion and feature fusion strategies for pattern classification," IETE Technical Review, vol. 27, no. 4, pp. 293–307, 2010.

[39] M. Wozniak, M. Grana, and E. Corchado, "A survey of multiple classifier systems as hybrid systems," Information Fusion, vol. 16, no. 1, pp. 3–17, 2014.

[40] G. Tsoumakas, L. Angelis, and I. Vlahavas, "Selective fusion of heterogeneous classifiers," Intelligent Data Analysis, vol. 9, no. 6, pp. 511–525, 2005.

[41] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," Journal of Computer and System Sciences, vol. 55, no. 1, part 2, pp. 119–139, 1997.

[42] S. Chopra and P. Meindl, Supply Chain Management: Strategy, Planning, and Operation, Prentice Hall, 2004.

[43] G. Kushwaha, "Operational performance through supply chain management practices," International Journal of Business and Social Science, vol. 217, pp. 65–77, 2012.

[44] G. E. Box, G. M. Jenkins, G. Reinsel, and G. M. Ljung, Time Series Analysis: Forecasting and Control, Wiley Series in Probability and Statistics, John Wiley & Sons, Hoboken, NJ, USA, 5th edition, 2015.

[45] W.-L. Zhao, C.-H. Deng, and C.-W. Ngo, "k-means: a revisit," Neurocomputing, vol. 291, pp. 195–206, 2018.

[46] C. Li and A. Lim, "A greedy aggregation-decomposition method for intermittent demand forecasting in fashion retailing," European Journal of Operational Research, vol. 269, no. 3, pp. 860–869, 2018.

[47] L. Yue, Y. Yafeng, G. Junjun, and T. Chongli, "Demand forecasting by using support vector machine," in Proceedings of the Third International Conference on Natural Computation (ICNC 2007), pp. 272–276, Haikou, China, August 2007.

[48] L. Yue, L. Zhenjiang, Y. Yafeng, T. Zaixia, G. Junjun, and Z. Bofeng, "Selective and heterogeneous SVM ensemble for demand forecasting," in Proceedings of the 2010 IEEE 10th International Conference on Computer and Information Technology (CIT), pp. 1519–1524, Bradford, UK, June 2010.

[49] T. Efendigil, S. Onut, and C. Kahraman, "A decision support system for demand forecasting with artificial neural networks and neuro-fuzzy models: a comparative analysis," Expert Systems with Applications, vol. 36, no. 3, pp. 6697–6707, 2009.

[50] E. Duman, M. Uysal, and A. F. Alkaya, "Migrating birds optimization: a new metaheuristic approach and its performance on quadratic assignment problem," Information Sciences, vol. 217, pp. 65–77, 2012.


Table 1: Time series algorithms used in demand forecasting.

Regression Model 1 (R1)
Formulation: y_t = β0 + β1 X1,t + β2 X2,t + β3 X3,t + β4 X4,t + ε_t
Explains variability in weekly customer demand through special days and holidays (X1), discount dates (X2), days when the product is closed for sale (X3), and exceptional sales not explained by these three activities (X4).

Regression Model 2 (R2)
Formulation: y_t = β0 + β1 X1,t + β2 X2,t + β3 X3,t + Σ_{j=1}^{52} γ_j W_j,t + ε_t
R1 extended with weekly partial-regressive (dummy) terms (W_j,t, j = 1, ..., 52).

Regression Model 3 (R3)
Formulation: y_t = β0 + β1 X1,t + β2 X2,t + β3 X3,t + Σ_{j=1}^{12} α_j M_j,t + Σ_{j=1}^{4} γ_j W_j,t + ε_t
R1 extended with dummy regression terms for each month of the year (M_j,t, j = 1, ..., 12) and each week of the month (W_j,t, j = 1, ..., 4).

SVR (Support Vector Regression)
Formulation: y_t = SVR(input variables)
SVR forecasts y_t from the inputs X1, X2, and X3 of Regression Model 1, the monthly terms M_j,t (j = 1, ..., 12), and the weekly terms W_j,t (j = 1, ..., 4).

Exponential Smoothing Model
Formulation: y_t = α y_{t-1} + (1 - α) ŷ_{t-1} + ε_t, where ŷ denotes the smoothed estimate
Exponential smoothing is generally used for products with no trend or seasonality.

Holt Trend Method
Formulation: y_t = L_t + T_t, with L_t = α y_t + (1 - α) L_{t-1} and T_t = β (L_t - L_{t-1}) + (1 - β) T_{t-1}
The additive Holt trend model is evaluated for products with level (L) and trend (T).

Holt-Winters Seasonal Model
Formulation: y_t = L_t + T_t + S_{t+p+1}, with L_t = α y_t + (1 - α)(L_{t-1} + T_{t-1}), T_t = β (L_t - L_{t-1}) + (1 - β) T_{t-1}, and S_{t+p+1} = γ (D_{t+1} / L_{t+1}) + (1 - γ) S_{t+1}
The additive Holt-Winters seasonal model is evaluated for products with level (L), trend (T), and seasonality (S).

Two-Level Model 1
Formulation: y1_t = α y_{t-1} + (1 - α) ŷ_{t-1} + ε_t; y2_t = β0 + β1 X1,t + β2 X2,t + β3 X3,t + β4 X4,t + ε_t; y_t = y1_t - y2_t + ε_t
The R1 model is first fitted to the time series; the demand values estimated by this model are subtracted from the observed customer demand, and the residuals are fitted with the exponential smoothing model. The two estimates are then combined to obtain the product demand forecast.

Two-Level Model 2
Formulation: y1_t = L_t + T_t, with L_t = α y1_t + (1 - α) L_{t-1} and T_t = β (L_t - L_{t-1}) + (1 - β) T_{t-1}; y2_t = β0 + β1 X1,t + β2 X2,t + β3 X3,t + β4 X4,t + ε_t; y_t = y1_t - y2_t + ε_t
The R1 model is first fitted to the time series; the residuals are modeled with the Holt trend model, and the two estimates are combined to obtain the product demand forecast.

Two-Level Model 3
Formulation: y1_t = L_t + T_t + S_{t+p+1}, with L_t = α y_t + (1 - α)(L_{t-1} + T_{t-1}), T_t = β (L_t - L_{t-1}) + (1 - β) T_{t-1}, and S_{t+p+1} = γ (D_{t+1} / L_{t+1}) + (1 - γ) S_{t+1}; y2_t = β0 + β1 X1,t + β2 X2,t + β3 X3,t + β4 X4,t + ε_t; y_t = y1_t - y2_t + ε_t
The R1 model is first fitted to the time series; the residuals are modeled with the Holt-Winters seasonal model, and the two estimates are combined to obtain the product demand forecast.

3.3. Deep Learning. Machine learning approaches can analyze the features, relationships, and complex interactions among the features of a problem from samples of a dataset and learn a model that can be used for demand forecasting. Deep learning (DL) is a machine learning technique that applies deep neural network architectures to solve various complex problems. DL has become a very popular research topic and has been shown to provide impressive results in image processing, computer vision, natural language processing, bioinformatics, and many other fields [25, 26].

In principle, DL is an implementation of artificial neural networks, which mimic the natural human brain [30]. However, a deep neural network is more powerful and capable of analyzing and composing more complex features and relationships than a traditional neural network. DL requires high computing power and large amounts of data for training. Recent improvements in GPUs (Graphics Processing Units) and parallel architectures have provided the computing power required by deep neural networks. DL uses successive layers of neurons, where each layer extracts more complex and abstract features from the output of the previous layers.


Thus, a DL model can perform feature extraction automatically, without any separate preprocessing step. Visual object recognition, speech recognition, and genomics are some of the fields where DL has been applied successfully [26].

A multilayer feedforward artificial neural network (MLFANN) is employed as the deep learning algorithm in this study. In a feedforward neural network, information moves from the input nodes to the hidden nodes and finally to the output nodes in successive layers of the network, without any feedback. The MLFANN is trained with stochastic gradient descent using backpropagation: gradient descent updates the weights to minimize the squared error between the network outputs and the target outputs, and each weight is adjusted according to its contribution to the error. For the deep learning part, the H2O library [31] is used as an open source big data artificial intelligence platform.
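The weight update described above can be illustrated with a minimal sketch: a single linear neuron trained by stochastic gradient descent on squared error. This is an illustration of the update rule only, not the paper's H2O network; the data and learning rate are made-up assumptions.

```python
# One SGD step: each weight moves against the gradient of the squared
# error it contributed to (output error times its input).
def sgd_step(weights, inputs, target, lr=0.1):
    output = sum(w * x for w, x in zip(weights, inputs))
    error = output - target                      # derivative of error w.r.t. output (up to a factor of 2)
    return [w - lr * error * x for w, x in zip(weights, inputs)]

# Fit y = 2*x1 + 1*x2 from three toy samples.
samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], 1.0), ([1.0, 1.0], 3.0)]
w = [0.0, 0.0]
for _ in range(200):                             # 200 passes over the data
    for x, y in samples:
        w = sgd_step(w, x, y)
print([round(v, 2) for v in w])                  # converges near [2.0, 1.0]
```

A deep network repeats exactly this adjustment for every weight in every layer, with backpropagation supplying each weight's error contribution.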

3.4. Feature Extraction. In this work, some issues arise because of the huge amount of data. When we tried to use all of the features of all products in each store, with 155 features and 875 million records, the deep learning algorithm took days to finish due to limited computational power. Therefore, reducing the number of features and the computational time required by each algorithm is necessary. To overcome this problem and to accelerate the clustering and modelling phases, PCA (Principal Component Analysis) is used as the feature extraction algorithm. PCA is a commonly used dimension reduction algorithm that retains the most significant features. PCA transforms each instance of the given dataset from a d-dimensional space to a k-dimensional subspace; the newly generated k dimensions are called the principal components (PCs) [32]. Each principal component is directed toward the maximum variance excluding the variance accounted for in all its preceding components; as a result, the first component covers the maximum variance, the second the next largest, and so on. In brief, the principal components are represented as

PC_i = a_1 X_1 + a_2 X_2 + ... + a_d X_d   (1)

where PC_i is the i-th principal component, X_j is the j-th original feature, and a_j is the numerical coefficient for feature X_j.
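The projection in Equation (1) can be sketched directly via the eigendecomposition of the covariance matrix; the data here are synthetic (3 features reduced to 2 components), not the paper's 155-feature dataset.

```python
# PCA sketch: center the data, take the top-k eigenvectors of the
# covariance matrix, and project. The first component then captures
# the largest variance, the second the next largest, and so on.
import numpy as np

def pca_transform(X, k):
    Xc = X - X.mean(axis=0)                  # center each feature
    cov = np.cov(Xc, rowvar=False)           # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    top = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    return Xc @ top                          # n x k matrix of principal components

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 2] = X[:, 0] * 3.0                      # a redundant (perfectly correlated) feature
Z = pca_transform(X, 2)
print(Z.shape)                               # (100, 2)
```

The redundant third feature adds no new variance, so two components retain essentially all the information, which is the effect the paper relies on to shrink 155 features.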

3.5. Proposed Integration Strategy. The decision integration approach combines the strengths of different algorithms into a single collaborative method. With this approach, we aim to improve the success of the proposed forecasting system by combining different algorithms. Since each algorithm can be more sensitive, or can have weaknesses, under different conditions, combining the decisions of all models provides more effective and powerful results for decision-making processes.

There are two main EL approaches in the literature, called homogeneous and heterogeneous EL. If different types of classifiers are used as base algorithms, such a system is called a heterogeneous ensemble; otherwise, it is a homogeneous ensemble. In this study, we concentrate on heterogeneous ensembles. An ensemble system is composed of two parts: ensemble generation and ensemble integration [33-36]. In the ensemble generation part, a diverse set of models is generated using different base classifiers. Nine different time series and regression methods, a support vector regression model, and a deep learning algorithm are employed as base classifiers in this study. There are many integration methods that combine the decisions of base classifiers to obtain a final decision [37-40]. For the integration step, a new decision integration strategy is proposed for the demand forecasting model in this work. We draw inspiration from the boosting ensemble model [4, 41] to construct our proposed decision integration strategy. The essential concept behind the boosting methodology is that, in prediction or classification problems, final decisions can be calculated as a weighted combination of the results.

To integrate the predictions of the individual forecasting algorithms, we use two integration strategies. The first integration strategy selects the best performing forecasting method among the others and uses that method to predict the demand of a product in a store for the next week. The second integration strategy chooses the best performing forecasting methods of the current week and calculates the prediction by combining the weighted predictions of the winners. In our decision integration strategy, the final decision is determined by regarding the contribution of all algorithms, as in democratic systems. Our approach considers the decisions of the forecasting algorithms that perform better over the previous 4 weeks of the current year and, from the previous year, the current week and the week before it. As algorithms produce better forecasts over time, their contribution weights increase accordingly, and vice versa. In the first decision integration strategy, we do not consider the contributions of all algorithms in a fully democratic manner; we only look at the contributions of the best algorithms of the related week. In other words, the final decision is obtained by integrating the decisions of only the best algorithms (selected according to their historical performance) for each store and product. Note that the best algorithms of a week can change for each store and product couple, since every algorithm behaves differently for different products and locations.
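The first strategy (winner-takes-all per store and product) can be sketched as follows; the method names, error histories, and the use of a plain mean over tracked weeks are illustrative assumptions, not the paper's exact weighting.

```python
# First integration strategy, sketched: for a given store/product couple,
# pick the single method whose tracked MAPE history is lowest on average
# and use its forecast for next week.
def pick_best_method(recent_mape):
    """Return the method name with the lowest mean MAPE over tracked weeks."""
    return min(recent_mape,
               key=lambda m: sum(recent_mape[m]) / len(recent_mape[m]))

recent_mape = {                      # hypothetical per-method MAPE histories
    "holt_winters":  [0.21, 0.18, 0.25, 0.20],
    "svr":           [0.15, 0.14, 0.16, 0.17],
    "deep_learning": [0.12, 0.11, 0.14, 0.13],
}
print(pick_best_method(recent_mape))  # deep_learning
```

The second strategy, described next, replaces this single winner with a weighted vote over several winners.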

The second decision integration strategy is based on the weighted average of the mean absolute percentage error (MAPE) and the mean absolute deviation (MAD) [42]. These are two popular evaluation metrics used to assess the performance of forecasting models. These forecast accuracy measures are calculated as follows:

MAPE (Mean Absolute Percentage Error):

MAPE = (1/n) Σ_{t=1}^{n} |F_t - A_t| / A_t   (2)

MAD (Mean Absolute Deviation):

MAD = (1/n) Σ_{t=1}^{n} |F_t - A_t|   (3)


where

(i) F_t is the forecasted (estimated) value for period t,
(ii) A_t is the actual value for period t,
(iii) n is the number of periods.

The accuracy of a model can be computed simply as Accuracy = 1 - MAPE. Small values of the above two measures mean that the forecasting model performs better [43].
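Equations (2) and (3), together with the accuracy definition, can be written as a short sketch; the forecast and actual values are made up.

```python
# MAPE (Eq. 2) and MAD (Eq. 3) over forecasts F_t and actuals A_t,
# with Accuracy = 1 - MAPE as defined in the text.
def mape(forecast, actual):
    return sum(abs(f - a) / a for f, a in zip(forecast, actual)) / len(actual)

def mad(forecast, actual):
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / len(actual)

F, A = [12.0, 9.0, 11.0], [10.0, 10.0, 10.0]
print(mape(F, A))        # 0.1333...  (errors of 20%, 10%, 10%, averaged)
print(mad(F, A))         # 1.333...   (errors of 2, 1, 1 units, averaged)
print(1 - mape(F, A))    # accuracy, about 0.867
```

Note that MAPE is scale-free while MAD is in sales units, which is why the strategy tracks MAPE when comparing algorithms across products.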

Replenishment-oriented demand forecasting in the retail industry requires some modifications to general EL strategies to account for retail-specific trends and dynamics. For this purpose, the proposed second integration strategy, given in Equation (4), takes the weighted average of the MAPE over each of the previous 4 weeks and, additionally, the MAPE of the previous year's same week and of the week before it, capturing the previous year's trend. This makes our forecasts more reliable with respect to trend changes and seasonal behavior. The MAPE of each week is calculated with the formula defined in Equation (2) above. The proposed model automatically changes the weekly weights of each member algorithm according to its average MAPE for each week at store and product level. Based on the resulting MAPE weights, the best algorithms of the week gain more weight in the final decision.

The average MAPE of each algorithm for a product at each store is calculated with Equation (4). Suppose that the MAPEs of an algorithm are 0.3, 0.3, 0.1, and 0.1 for the previous 4 weeks and 0.1 and 0.2 for the previous year's same week and the week before it. Assume that the weekly coefficients are 25%, 20%, 10%, 5%, 30%, and 10%, respectively. Then the average MAPE of the algorithm for the current week is 0.3 × 0.25 + 0.3 × 0.20 + 0.1 × 0.10 + 0.1 × 0.05 + 0.1 × 0.30 + 0.2 × 0.10 = 0.2, according to Equation (4).

M_avg = Σ_{k=1}^{n} C_k M_k   (4)

In Equation (4), the sum of the coefficients should be 1, that is, Σ_{k=1}^{n} C_k = 1, and M_k is the MAPE of the related week, where

(i) M1 is the MAPE of the previous week,
(ii) M2 is the MAPE of 2 weeks before the related week,
(iii) M3 is the MAPE of 3 weeks before the related week,
(iv) M4 is the MAPE of 4 weeks before the related week,
(v) M5 is the MAPE of the previous year's same week,
(vi) M6 is the MAPE of the previous year's previous week.
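Equation (4) and its worked arithmetic can be sketched as follows, using the section's week coefficients (which sum to 1) and the MAPE values from its worked computation.

```python
# Eq. (4): weighted average MAPE of one algorithm, combining the previous
# 4 weeks with last year's same week and the week before it.
def weighted_mape(mapes, coeffs):
    assert abs(sum(coeffs) - 1.0) < 1e-9   # coefficients must sum to 1
    return sum(c * m for c, m in zip(coeffs, mapes))

mapes  = [0.3, 0.3, 0.1, 0.1, 0.1, 0.2]    # M1..M6 from the worked example
coeffs = [0.25, 0.20, 0.10, 0.05, 0.30, 0.10]
print(weighted_mape(mapes, coeffs))        # ≈ 0.2
```

Giving last year's same week the largest coefficient (30%) is what lets the average react to seasonality rather than only to the latest weeks.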

The proposed decision integration system computes a MAPE for each algorithm for the coming forecasting week at store and product level. In addition, the effects of special days are taken into consideration: Christmas, Valentine's Day, Mother's Day, the start of Ramadan, and other religious days are some examples of special days. System users can define special days manually by considering the current year's calendar; the trends of special days are then computed automatically from the previous year's trends. This enables the consideration of seasonality and special events, which results in more accurate forecast values. Meanwhile, seasonality and other effects can be taken into consideration as well. Following the MAPE calculation of the algorithms, new weights (W_i, where i = 1, ..., n) are assigned to each of them according to their weighted averages for the current week.

The next step is the determination of the best algorithms of the week for each store and product couple. We take only the best performing 30% of the methods into consideration as the best algorithms; after many empirical observations, this ratio gives the best results on our dataset. Obviously, these parameters are very specific to the dataset characteristics and also depend on which algorithms are included in the integration strategy. Scaling is the last preprocessing step before the calculation of the final forecast. Suppose that we have n algorithms (A1, A2, ..., An) and that k of them, where k ≤ n, are the best models of the week for a store and product couple. The weight of each winner is scaled according to its weighted average, as in Equation (5).

W'_t = W_t / Σ_{j=1}^{k} W_j,   t ∈ {1, ..., k}   (5)

Assume that there are 3 best algorithms and that their weights are W1 = 30%, W2 = 18%, and W3 = 12%. According to Equation (5), their scaled new weights become

W'1 = 30/60 = 50%,
W'2 = 18/60 = 30%,
W'3 = 12/60 = 20%.   (6)

Scaling makes our calculation more comprehensible.

After scaling the weight of each algorithm, the system gives its final decision according to the new weights, considering the performance of each algorithm, as in Equation (7).

F_avg = Σ_{j=1}^{k} F_j W'_j   (7)

In Equation (7), the main constraint is Σ_{j=1}^{k} W'_j = 1, where k is the number of champion algorithms and F_j is the forecast of the j-th winning algorithm. Suppose that the best performing algorithms are A1, A2, and A3, and that A1 forecasts the sales quantity for the next week as 20, A2 as 10, and A3 as 5. Assume that their scaled weights are 50%, 30%, and 20%, respectively. Then the weighted forecast, according to Equation (7), is as follows:

F_avg = 20 × 0.50 + 10 × 0.30 + 5 × 0.20 = 14   (8)
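Equations (5) through (8) can be sketched together on the section's numbers: the three winners' weights (30%, 18%, 12%) are rescaled to sum to 1, and the final forecast is the weighted sum of their forecasts (20, 10, and 5 units).

```python
# Eq. (5): rescale the winners' weights so they sum to 1.
def scale(weights):
    total = sum(weights)
    return [w / total for w in weights]

# Eq. (7): final forecast as the weighted sum of the winners' forecasts.
def integrate(forecasts, weights):
    return sum(f * w for f, w in zip(forecasts, scale(weights)))

weights = [0.30, 0.18, 0.12]
print(scale(weights))                     # ≈ [0.5, 0.3, 0.2], as in Eq. (6)
print(integrate([20, 10, 5], weights))    # ≈ 14.0, as in Eq. (8)
```

Because the scaled weights always sum to 1, the integrated forecast stays inside the range spanned by the winners' individual forecasts.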

Every algorithm has the right to vote according to its weight. Finally, if a member algorithm does not appear in the list of the best algorithms for a product and store couple over a specific period, it is automatically put on a blacklist, so that it is no longer used at that product and store level


Figure 1: Flowchart of the proposed system. The flow proceeds from data collection, data cleaning, and data enrichment from big data, through finding frequent items using Apriori and a replenishment data mart on Hadoop/Spark, to dimensionality reduction and clustering; the deep learning model, time series models, and support vector regression model then feed the proposed integration strategy, which produces the final decision.

once it is marked. This enables faster computation, since the system disregards poorly performing algorithms. The algorithm of the proposed demand forecasting system and the flowchart of the proposed system are given in Algorithm 1 and Figure 1, respectively.

4 Experiment Setup

We use an enhanced dataset that includes SOK Market's real-life sales and stock data with enriched features. The dataset consists of 106 weeks of sales data covering 7888 distinct products across the stores of SOK Market. In each store, around 1500 products are actively sold, while the rest rarely become active. Forecasting is performed each week to predict the demand of the following week. Three-fold cross validation is applied on the test data, and the most accurate model is then chosen for the forecast of the coming week; this is a common approach to demand forecasting in retail [44]. Furthermore, outside weather information, such as daily temperature and other weather conditions, is joined to the model as new variables: since it is known that, in most circumstances, there is a certain correlation between shopping trends and weather conditions in retail [33], we enriched the dataset with weather information. The dataset also includes mostly sales-related features filled from relational legacy data sources.


Given: n is the number of stores, m is the number of products, t is the number of algorithms in the system, and A_k is an algorithm with index k; s_ij is the matrix which holds the set of best performing algorithms for each store i and product j; B_ij is the matrix which contains the blacklist of algorithms for each store and product; and F_ij is the matrix which stores the final forecast for each store and product.

for i = 1..n
    for j = 1..m
        for k = 1..t
            if A_k is in blacklist B_ij then
                continue
            else
                run A_k
                calculate algorithm weight W_k
            end if
        end for
        choose the best performing algorithms and store them in s_ij
        for each A_z in s_ij
            scale the weight of A_z
        end for
        calculate the proposed integration strategy and store the result in F_ij
    end for
end for
return all F_nm

Algorithm 1: The algorithm of the proposed demand forecasting system.
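Algorithm 1 can be sketched for a single store and product couple as follows. The algorithm names, error values, forecasts, top-k choice, and the simple (1 - MAPE) weighting are illustrative assumptions; the paper's actual weighting uses the coefficient scheme of Equation (4).

```python
# Sketch of Algorithm 1's inner loop for one store/product pair:
# skip blacklisted algorithms, weight the rest by recent accuracy,
# keep the best performers, rescale their weights, integrate forecasts.
def final_forecast(algorithms, blacklist, top_k=2):
    candidates = {n: v for n, v in algorithms.items() if n not in blacklist}
    # Lower recent MAPE -> higher weight (a stand-in for Eq. (4)).
    weighted = {n: (1.0 - err, fc) for n, (err, fc) in candidates.items()}
    best = sorted(weighted.items(), key=lambda kv: kv[1][0], reverse=True)[:top_k]
    total = sum(w for _, (w, _) in best)         # rescale as in Eq. (5)
    return sum((w / total) * fc for _, (w, fc) in best)   # Eq. (7)

algorithms = {            # name: (recent MAPE, next-week forecast) - made up
    "holt":  (0.30, 12.0),
    "svr":   (0.20, 10.0),
    "dl":    (0.10, 8.0),
    "naive": (0.50, 20.0),
}
print(final_forecast(algorithms, blacklist={"naive"}))
```

Running this over every store i and product j fills the F_ij matrix of the algorithm listing.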

The deep learning approach is used to estimate customer demand for each product at each store of the SOK Market retail chain. There is also a very apparent similarity, in terms of sales trends, between products that frequently appear in the same basket. For this purpose, the Apriori algorithm [4] is utilized to find the correlated products of each product; Apriori is a fast way to find the products most frequently associated in the same basket. The sales-related features of the most associated product are added to the dataset as well. This takes advantage of the relationships among products with similar sales trends: adding the associated product of a product and its features enables the DL algorithm to learn better within its hidden layers. In summary, extensive experiments show that enriching the training dataset with weather data and with the sales data of the most associated products made our DL-based estimates more powerful.
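The "most associated product" lookup amounts to the pair-counting core of Apriori; this sketch implements that co-occurrence counting directly (not the full Apriori candidate-generation procedure), and the baskets are made up.

```python
# For each product, count how often every other product shares a basket
# with it, then pick the most frequent co-occurring product.
from collections import Counter

def most_associated(baskets):
    co = {}
    for basket in baskets:
        items = set(basket)
        for a in items:
            for b in items - {a}:
                co.setdefault(a, Counter())[b] += 1
    return {a: counts.most_common(1)[0][0] for a, counts in co.items()}

baskets = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["milk", "eggs"],
    ["bread", "butter"],
]
print(most_associated(baskets)["bread"])   # milk (2 shared baskets)
```

The resulting partner product is what the dataset's Product Adjacent and Sales Quantity Adj Week fields record for each product.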

The dataset includes weekly data of each product at each store for the latest 8 weeks, together with the sales quantities of the last 4 weeks of the same season. Historically, 2 years of data are included for each of the 5500 stores and 1500 products. The data size is about 875 million records with 155 features. These features include sales quantity, customer return quantity, product quantities received from distribution centers, sales amount, discount amount, receipt counts of a product, counts of customers who bought a specific product, sales quantity for each day of the week (Monday, Tuesday, etc.), maximum and minimum stock level of each day, average stock level for each week, sales quantity for each hour of the day, and the sales quantities of the last 4 weeks of the same season. Since there is a relationship between products that are generally in the same basket, we prepared the same features defined above for the most associated products as well. A detailed explanation of the dataset is presented in Table 2.

In Table 2, fields given with square brackets denote an array over weeks (e.g., Sales Quantity Week[0] is the sales quantity of the current week, Sales Quantity Week[1] is the sales quantity of one week before, and so on). According to the experimental results, taking weekly data for the latest 8 weeks and for 4 weeks of the same season in the previous year captures the changes in seasonality acceptably well.

For the DL implementation, the H2O library [31] is used as an open source big data artificial intelligence platform. H2O is a powerful machine learning library that makes it possible to implement DL algorithms on Hadoop/Spark big data environments. It combines highly advanced machine learning algorithms with truly scalable in-memory processing for Spark, on one or many nodes, via its Sparkling Water distribution. With the in-memory processing capability of Spark technologies, it provides a faster parallel platform for utilizing big data to maximum benefit. In this study, we use H2O version 3.14.02 on a 64-bit, 16-core machine with 32 GB of memory to observe the power of parallelism during DL modelling while increasing the number of neurons.

A multilayer feedforward artificial neural network (MLFANN) is employed as the deep learning algorithm. The MLFANN is trained with stochastic gradient descent using backpropagation; with gradient descent, each weight is adjusted according to its contribution to the error. H2O also offers advanced features such as momentum training, adaptive learning rates, L1 or L2 regularization, rate annealing,


Table 2: Dataset explanation (Name: Description. Example).

Yearweek: related year/week; weeks run from Monday to Sunday. Example: 201801
Store: store number. Example: 1234
Product: product identification number. Example: 1
Product Adjacent: the product most associated with this product according to the Apriori algorithm, i.e., the product most frequently in the same basket. Example: 2
Stock In Quantity Week[0-8]: stock increase quantity of the product in the related week, e.g., stock transferred from the distribution center to the store. Example: 50
Return Quantity Week[0-8]: stock quantity returned by customers at a specific week and store. Example: 20
Sales Quantity Week[0-8]: weekly sales quantity of the related product at a specific store. Example: 120
Sales Amount Week[0-8]: total sales amount of the product on the customer receipt. Example: 2500
Discount Amount Week[0-8]: discount amount of the product, if any. Example: 500
Customer Count Week[0-8]: how many customers bought this product in a specific week. Example: 30
Receipt Count Week[0-8]: distinct receipt count for the related product. Example: 20
Sales Quantity Time[9-22]: hourly sales quantity of the related product from 09:00 to 22:00. Example: 5
Last4weeks Day[1-7]: total sales on each weekday over the last 4 weeks (total sales of Mondays, Tuesdays, etc.). Example: 10
Last8weeks Day[1-7]: total sales on each weekday over the last 8 weeks (total sales of Mondays, Tuesdays, etc.). Example: 10
Max Stock Week[0-8]: maximum stock quantity of the related week. Example: 12
Min Stock Week[0-8]: minimum stock quantity of the related week. Example: 2
Avg Stock Week[0-8]: average stock quantity of the related week. Example: 5
Sales Quantity Adj Week[0-8]: sales quantity of the most associated product. Example: 14
Temperature Weekday[1-7]: daily temperature on each weekday (Monday, Tuesday, etc.). Example: 22
Weekly Avg Temperature[0-8]: average temperature of the related week. Example: 23
Weather Condition Weekday[1-7]: nominal variable (rainy, snowy, sunny, cloudy, etc.). Example: Rainy
Sales Quantity Next Week: target variable of our forecasting problem. Example: 25

H2O also supports dropout and grid search. A Gaussian distribution is applied because a continuous variable is used as the response variable. H2O performs very well in our environment when 3 hidden layers, with 10 nodes in each, and a total of 300 epochs are set as parameters.
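The network shape described above can be sketched in plain NumPy. This is an illustrative toy, not the authors' H2O configuration: the three hidden layers of 10 nodes and 300 epochs follow the text, while the data, activations, and learning rate are assumptions.

```python
import numpy as np

# Toy MLFANN: 3 hidden layers of 10 tanh units, linear output, trained by
# stochastic gradient descent with backpropagation on a made-up regression task.
rng = np.random.default_rng(0)
sizes = [4, 10, 10, 10, 1]                        # input, 3 hidden layers, output
params = [(rng.normal(0, 0.5, (a, b)), np.zeros(b))
          for a, b in zip(sizes[:-1], sizes[1:])]

def forward(x):
    acts = [x]
    for i, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        acts.append(np.tanh(z) if i < len(params) - 1 else z)   # linear output
    return acts

def sgd_step(x, y, lr=0.01):
    acts = forward(x)
    delta = acts[-1] - y                          # d(squared error)/d(output)
    for i in reversed(range(len(params))):
        W, b = params[i]
        gW, gb = np.outer(acts[i], delta), delta  # each weight's error contribution
        delta = (W @ delta) * (1 - acts[i] ** 2)  # backpropagate through tanh
        params[i] = (W - lr * gW, b - lr * gb)

X = rng.uniform(-1, 1, (200, 4))
Y = X.sum(axis=1, keepdims=True)                  # made-up target

def mse():
    preds = np.array([forward(x)[-1] for x in X])
    return float(np.mean((preds - Y) ** 2))

before = mse()
for _ in range(300):                              # 300 epochs, as in the text
    for x, y in zip(X, Y):
        sgd_step(x, y)
after = mse()
```

The per-weight update mirrors the sentence above: each weight moves against its own contribution to the squared error.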

After the dimension reduction step, K-means is employed as the clustering algorithm. K-means is a greedy and fast clustering algorithm [45] that tries to partition samples into k clusters in which each sample is near its cluster center. K is set to 20 because, after several trials and empirical observations, this gave the most even distribution on the dataset, dividing it into 20 different clusters for each store. After that, the deep learning algorithm is applied, and 20 different DL models are obtained for each store. Demand forecasting for each product is then performed by using its cluster's model on a store basis. The main reason for using clustering is the lack of time and computing power for the DL algorithm: instead of building a model for each product, 20 models are generated per store at the product level. In our trials with a sample dataset, the bias difference is less than 2%, so this speed-up is very beneficial and a good option for researchers. The forecasting result of the DL model is transferred into our forecasting system, which makes its final decision by considering the decisions of 11 forecasting methods, including the DL model.
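A minimal version of this clustering step might look as follows. This is a self-contained NumPy sketch, not the authors' implementation; the product feature vectors are made up, while k = 20 follows the text.

```python
import numpy as np

rng = np.random.default_rng(1)

def kmeans(X, k, iters=50):
    """Plain k-means: alternate nearest-center assignment and center updates."""
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)          # nearest cluster per sample
        for j in range(k):
            members = X[labels == j]
            if len(members):                   # keep old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels, centers

# Made-up product feature vectors for one store; in the paper, one DL model
# is then trained per cluster instead of one per product.
X = rng.normal(size=(500, 8))
labels, centers = kmeans(X, k=20)
```

Each of the 20 labels then indexes the DL model used to forecast the products assigned to that cluster.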

Forecasting solutions should be scalable, process large amounts of data, and extract a model from the data. Big data environments using Spark technologies give us the opportunity to implement algorithms with scalability by sharing the tasks of machine learning algorithms among the nodes of parallel architectures. Considering the huge number of samples and the large number of features, even the computational power of a parallel architecture is not enough in some cases. For this reason, dimension reduction is needed for big data applications. In this study, PCA is employed as a feature extraction step.

5. Experimental Results


SOK Market sells 21 different groups of items. We assess the performance of the proposed forecasting system on a group basis, as shown in Table 3. Table 3 shows the MAPE (mean absolute percentage error) of integration strategies (S1) and (S2)


Table 3: Demand forecasting improvements per product group.

| Product Groups | Method I without Proposed Integration Strategy (S1) MAPE | Method II with Proposed Integration Strategy (S2) MAPE | Proposed Integration Strategy with Deep Learning (SD) MAPE | Percentage Success Rate (%) P1 = (S1 - S2)/S1 | Percentage Success Rate (%) P2 = (S1 - SD)/S1 | Improvement D = P2 - P1 |
|---|---|---|---|---|---|---|
| Baby Products | 0.5157 | 0.3081 | 0.2927 | 40.26 | 43.25 | 2.99 |
| Bakery Products | 0.3482 | 0.2059 | 0.1966 | 40.85 | 43.52 | 2.67 |
| Beverage | 0.3714 | 0.2316 | 0.2207 | 37.64 | 40.58 | 2.94 |
| Biscuit-Chocolate | 0.3358 | 0.2077 | 0.1977 | 38.14 | 41.13 | 2.99 |
| Breakfast Products | 0.4443 | 0.2770 | 0.2661 | 37.65 | 40.11 | 2.45 |
| Canned-Paste-Sauces | 0.3836 | 0.2309 | 0.2198 | 39.80 | 42.69 | 2.89 |
| Cheese | 0.3953 | 0.2457 | 0.2345 | 37.84 | 40.68 | 2.84 |
| Cleaning Products | 0.4560 | 0.2791 | 0.2650 | 38.79 | 41.89 | 3.10 |
| Cosmetics Products | 0.5397 | 0.3266 | 0.3148 | 39.49 | 41.67 | 2.19 |
| Deli Meats | 0.4242 | 0.2602 | 0.2488 | 38.65 | 41.36 | 2.70 |
| Edible Oils | 0.4060 | 0.2299 | 0.2215 | 43.36 | 45.45 | 2.09 |
| Household Goods | 0.5713 | 0.3656 | 0.3535 | 36.01 | 38.13 | 2.12 |
| Ice Cream-Frozen | 0.5012 | 0.3255 | 0.3106 | 35.05 | 38.03 | 2.98 |
| Legumes-Pasta-Soup | 0.3850 | 0.2397 | 0.2269 | 37.74 | 41.07 | 3.33 |
| Nuts-Chips | 0.3316 | 0.2049 | 0.1966 | 38.20 | 40.71 | 2.51 |
| Poultry-Eggs | 0.4219 | 0.2527 | 0.2403 | 40.11 | 43.04 | 2.94 |
| Ready Meals | 0.4613 | 0.2610 | 0.2520 | 43.42 | 45.36 | 1.94 |
| Red Meat | 0.2514 | 0.1616 | 0.1532 | 35.71 | 39.06 | 3.35 |
| Tea-Coffee Products | 0.4347 | 0.2650 | 0.2535 | 39.04 | 41.68 | 2.64 |
| Textile Products | 0.5418 | 0.3048 | 0.2907 | 43.74 | 46.34 | 2.60 |
| Tobacco Products | 0.3791 | 0.2378 | 0.2290 | 37.29 | 39.61 | 2.32 |
| Average | 0.4238 | 0.2582 | 0.2469 | 38.99 | 41.68 | 2.69 |
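The success-rate columns of Table 3 can be reproduced from its MAPE columns; for instance, for the Baby Products row (tiny differences from the printed values come from the rounding of the published MAPEs):

```python
# Baby Products row of Table 3: MAPEs of strategies S1, S2, and SD.
s1, s2, sd = 0.5157, 0.3081, 0.2927

p1 = (s1 - s2) / s1 * 100   # percentage success rate of S2 over S1
p2 = (s1 - sd) / s1 * 100   # percentage success rate of SD over S1
d = p2 - p1                 # additional improvement from deep learning

print(round(p1, 2), round(p2, 2), round(d, 2))  # → 40.26 43.24 2.99
```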


[Box plots; y-axis: MAPE (0.00 to 1.00); x-axis: the 21 product groups.]

Figure 2: MAPE distribution according to product groups with the S1 integration strategy.

in columns 2 and 3. The average percentage forecasting errors of strategies (S1) and (S2) are 0.42 and 0.26, respectively. As can be seen from Table 3, the percentage success rate of (S2) with respect to (S1) is given in column 5. The current integration strategy (S2) provides a 38.99% improvement over the first integration strategy (S1) on average. In some product groups, like Textile Products, the error rate is reduced from 0.54 to 0.30, a 43.74% enhancement in MAPE.

With this work, a further improvement is obtained by utilizing the DL model compared to using just the 10 algorithms of the forecasting strategy. Column 4 of Table 3 shows the MAPEs of the 21 product groups after the addition of the DL model. After the inclusion of the DL model into the proposed system, we observed around a 2% to 3.4% increase in demand forecast accuracy in some product groups. The error rates of some product groups, like "baby products", "cosmetic products", and "household goods", are usually higher than the others, since these product groups do not have continuous consumer demand. For example, the error rate for the "household goods" group is 0.35, while the error rate for the "red meat" group is 0.15. The error rates for food products are usually lower, since customers buy these products regularly as part of their daily consumption. However, the error rates are higher for baby, textile, and household goods: customers buy products from these groups selectively, depending on whether they like the product. In addition, the prices of goods in these groups, and the promotion effects, are usually high relative to the prices of goods in other groups.

Moreover, when the percentage success rate column is analyzed, it is observed that the top benefitting product groups for Method I achieve over a 40% success rate: Baby Products, Bakery Products, Edible Oils, Poultry-Eggs, Ready Meals, and Textile Products. The common point of these groups is that each is consumed regularly by a specific customer segment. For instance, the baby products group is chosen by families with kids; Edible Oils, Poultry-Eggs, and Bakery Products are preferred by frequently and continuously shopping customer segments; and Ready Meals are mostly preferred by bachelor, single, and working consumers. Furthermore, the inclusion of DL into the forecasting system shows that some other consumption groups (for example, Cleaning Products, Red Meat, and Legumes-Pasta-Soup) exhibit better performance than the others, with an additional improvement of over 3%.

Figure 2 shows the box plots of the mean absolute percentage errors (MAPE) for each group of products after application of the (S1) integration strategy. The box plots enable us to analyze the distributional characteristics of forecasting errors for product groups. As can be seen from the figure, the medians differ between product groups; thus, the forecasting errors are usually dissimilar for different product groups. The interquartile range box represents the middle 50% of the scores in the data. The interquartile range boxes are usually very tall, which means that there is a considerable spread of prediction errors within the products of a given group.
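The quartile boundaries read off these box plots correspond to simple percentiles of the per-product MAPE values; with made-up numbers:

```python
import numpy as np

# Hypothetical per-product MAPE values within one product group.
mape = np.array([0.10, 0.18, 0.23, 0.27, 0.35, 0.48, 0.52, 0.60, 0.75])

q1, median, q3 = np.percentile(mape, [25, 50, 75])
iqr = q3 - q1        # height of the box: the middle 50% of the scores
```

A tall box (large `iqr`) is exactly the "quite a number of different prediction errors within a group" observation made above.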

Figure 3 presents the box plots of the MAPEs of the proposed forecasting system applying the proposed integration approach (S2), which combines the predictions of 10 different forecasting algorithms.

Figure 4 shows the results after adding DL to our system. The interquartile range boxes are narrower compared to the ones in Figures 2 and 3, which means that the spread of the forecasting errors is smaller than in Figures 2 and 3. Finally, the integration of DL into the proposed forecasting system generates more accurate results.


[Box plots; y-axis: MAPE (0.00 to 1.00); x-axis: the 21 product groups.]

Figure 3: MAPE distribution according to product groups with the S2 integration strategy.

[Box plots; y-axis: MAPE (0.00 to 1.00); x-axis: the 21 product groups.]

Figure 4: MAPE distribution according to product groups after the deep learning algorithm.

For example, for the (S1) integration strategy in the Baby Products group, the MAPE distribution in Figure 2 is observed between 0% and 23% for the 1st quartile of the data, between 23% and 48% for the 2nd quartile, and between 48% and 75% for the 3rd and 4th quartiles. After applying the integration strategy, the MAPE distribution for the Baby Products group becomes 0-16% for the 1st quartile, 16-30% for the 2nd quartile, and 30-50% for the rest. This gives around an 8% improvement for the first quartile, an 8% to 18% enhancement for the second quartile, and an 18% to 25% advancement for the rest of the data, according to the boundary differences. The median of the MAPE distribution is 48%; after applying the proposed integration strategy, it is observed as 27%, which means a 21% improvement. Approximately a 1-3% enhancement is observed for each quartile of the data with the inclusion of the deep learning strategy in Figure 4.

For Cleaning Products, improvements similar to those of the other groups are observed for the integration strategy. In Figure 2, the MAPE distribution of the 1st quartile of the data is observed between 0% and 19%, for the 2nd quartile between 19% and 40%, and for the rest of the data between 40% and 73%. After performing the proposed integration strategy for Cleaning Products, the MAPE distribution becomes 0-10% for the 1st quartile, 10-17% for the 2nd quartile, and 27-43% for the 3rd and 4th quartiles together. The improvements at the boundaries are as follows: 9% for the 1st quartile, 9% to 13% for the 2nd quartile, and 13% to 20% for the rest of the data. After implementation of the integration strategy, the system advancement is observed as 2% for the 1st and 2nd quartiles and 3% for the 3rd and 4th quartiles, respectively. Briefly, the integration strategy improvement is remarkable for each product group, and consolidating DL into the proposed integration strategy additionally advances the results by almost 2% to 3%.

[Line chart; y-axis: sum of sales quantity (0 to 9000); x-axis: July 2017 to July 2018; series: Actual, S1, S2, SD.]

Figure 5: Accuracy comparison among integration strategies for one sample product group.

In Figure 5, the accuracy comparison among the integration strategies for one of the best performing sample groups is presented over a 1-year period. In particular, Christmas and other seasonality effects can be seen very clearly in Figure 5, and the integration strategy with DL (SD) performs with the best accuracy.

It is hard to compare the performance of our proposed system with other studies because of the lack of works with a similar combination of a deep learning approach, the proposed decision integration strategies, and different learning algorithms for the demand forecasting process. Another difficulty is the lack of real-life datasets similar to the SOK dataset for comparison with state-of-the-art studies. Although the results of the proposed system are given in this study, we also report here the results of a number of research works on the demand forecasting domain. A greedy aggregation-decomposition method solved a real-world intermittent demand forecasting problem for a fashion retailer in Singapore [46]; the authors report a 59% MAPE success rate with a small dataset. The authors of [47] compare different approaches, such as a statistical model, the Winters model, and a radial basis function neural network (RBFNN), with SVM for the demand forecasting process. They conclude that the success of the SVM algorithm surpasses the others, with around a 77% enhancement at the average MAPE level. A novel demand forecasting model called SHEnSVM (Selective and Heterogeneous Ensemble of Support Vector Machines) is proposed in [48]. In that model, individual SVMs are trained on different samples generated by a bootstrap algorithm, after which a genetic algorithm is employed to retrieve the best combination schema of individuals. They report a 10% advancement with the SVM algorithm and a 64% MAPE improvement, although they only employ beer data with 3 different brands in their experiments. Tugba Efendigil et al. propose a novel forecasting mechanism modeled with artificial intelligence approaches in [49]. They compare both artificial neural networks and adaptive network-based fuzzy inference system techniques to manage fuzzy demand with incomplete information, reaching around 18% MAPE for some products in their experiments.

6. Conclusion

In the retail industry, demand forecasting is one of the main problems of supply chains: it aims to optimize stocks, reduce costs, and increase sales, profit, and customer loyalty. To overcome this issue, there are several methods, such as time series analysis and machine learning approaches, to analyze and learn complex interactions and patterns from historical data.

In this study, there is a novel attempt to integrate 11 different forecasting models, which include time series algorithms, a support vector regression model, and a deep learning method, for the demand forecasting process. Moreover, a novel decision integration strategy is developed by drawing inspiration from the ensemble methodology, namely boosting. The proposed approach considers the performance of each model over time and combines the weighted predictions of the best performing models for the demand forecasting process.


The proposed forecasting system is tested on real-life data obtained from the SOK Market retail chain. It is observed that the inclusion of different learning algorithms besides the time series models, together with a novel integration strategy, advanced the performance of the demand forecasting system. To the best of our knowledge, this is the very first attempt to consolidate a deep learning methodology, the SVR algorithm, and different time series methods for demand forecasting systems. Furthermore, the other novelty of this work is the adaptation of the boosting ensemble strategy to the demand forecasting model. In this way, the final decision of the proposed system is based on the best algorithms of the week, which gain more weight. This makes our forecasts more reliable with respect to trend changes and seasonality behaviors. Moreover, the proposed approach performs very well when integrated with the deep learning algorithm on a Spark big data environment. The dimension reduction process and clustering methods help to decrease time consumption, requiring less computing power during the deep learning modelling phase. Although a review of some similar studies is presented in Section 5, as expected, it is very difficult to compare the results of other studies with ours because of the use of different datasets and methods. In this study, we compare the results of three models. Model one selects the best performing forecasting method depending on its success in the previous period, with 42.4% MAPE on average. The second model, with the novel integration strategy, results in 25.8% MAPE on average. The last model, with the novel integration strategy enhanced with the deep learning approach, provides 24.7% MAPE on average. As a result, the inclusion of the deep learning approach into the novel integration strategy reduces the average prediction error for the demand forecasting process in the supply chain.

As future work, we plan to enrich the set of features by gathering data from other sources, like economic studies, shopping trends, social media, social events, and location-based demographic data of stores, so that the contribution of new varieties of data sources to deep learning can be observed. A further study can be done to determine the hyperparameters of the deep learning algorithm. In addition, we also plan to use other deep learning techniques, such as convolutional neural networks, recurrent neural networks, and deep neural networks, as learning algorithms. Furthermore, another objective is to use heuristic methods, MBO (Migrating Birds Optimization) and other related algorithms [50], to optimize some of the coefficients/weights that were determined empirically by trial and error, like taking the best-performing 30 percent of the methods in our current system.

Data Availability

This dataset is private customer data, so the agreement with the organization SOK does not allow sharing the data.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This work is supported by the OBASE Research & Development Center.

References

[1] P. J. McGoldrick and E. Andre, "Consumer misbehaviour: promiscuity or loyalty in grocery shopping," Journal of Retailing and Consumer Services, vol. 4, no. 2, pp. 73-81, 1997.

[2] D. Grewal, A. L. Roggeveen, and J. Nordfalt, "The future of retailing," Journal of Retailing, vol. 93, no. 1, pp. 1-6, 2017.

[3] E. T. Bradlow, M. Gangwar, P. Kopalle, and S. Voleti, "The role of big data and predictive analytics in retailing," Journal of Retailing, vol. 93, no. 1, pp. 79-95, 2017.

[4] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, USA, 2013.

[5] R. Hyndman and G. Athanasopoulos, Forecasting: Principles and Practice, OTexts, Melbourne, Australia, 2018, http://otexts.org/fpp2.

[6] C. C. Pegels, "Exponential forecasting: some new variations," Management Science, vol. 12, pp. 311-315, 1969.

[7] E. S. Gardner, "Exponential smoothing: the state of the art," Journal of Forecasting, vol. 4, no. 1, pp. 1-28, 1985.

[8] D. Gujarati, Basic Econometrics, McGraw-Hill, New York, NY, USA, 2003.

[9] J. D. Croston, "Forecasting and stock control for intermittent demands," Operational Research Quarterly, vol. 23, no. 3, pp. 289-303, 1972.

[10] T. R. Willemain, C. N. Smart, J. H. Shockor, and P. A. DeSautels, "Forecasting intermittent demand in manufacturing: a comparative evaluation of Croston's method," International Journal of Forecasting, vol. 10, no. 4, pp. 529-538, 1994.

[11] A. Syntetos, Forecasting of Intermittent Demand, Brunel University, 2001.

[12] A. A. Syntetos and J. E. Boylan, "The accuracy of intermittent demand estimates," International Journal of Forecasting, vol. 21, no. 2, pp. 303-314, 2005.

[13] R. H. Teunter, A. A. Syntetos, and M. Z. Babai, "Intermittent demand: linking forecasting to inventory obsolescence," European Journal of Operational Research, vol. 214, no. 3, pp. 606-615, 2011.

[14] R. J. Hyndman, R. A. Ahmed, G. Athanasopoulos, and H. L. Shang, "Optimal combination forecasts for hierarchical time series," Computational Statistics & Data Analysis, vol. 55, no. 9, pp. 2579-2589, 2011.

[15] R. Fildes and F. Petropoulos, "Simple versus complex selection rules for forecasting many time series," Journal of Business Research, vol. 68, no. 8, pp. 1692-1701, 2015.

[16] C. Li and A. Lim, "A greedy aggregation-decomposition method for intermittent demand forecasting in fashion retailing," European Journal of Operational Research, vol. 269, no. 3, pp. 860-869, 2018.

[17] F. Turrado Garcia, L. J. Garcia Villalba, and J. Portela, "Intelligent system for time series classification using support vector machines applied to supply-chain," Expert Systems with Applications, vol. 39, no. 12, pp. 10590-10599, 2012.

[18] G. Song and Q. Dai, "A novel double deep ELMs ensemble system for time series forecasting," Knowledge-Based Systems, vol. 134, pp. 31-49, 2017.

[19] O. Araque, I. Corcuera-Platas, J. F. Sanchez-Rada, and C. A. Iglesias, "Enhancing deep learning sentiment analysis with ensemble techniques in social applications," Expert Systems with Applications, vol. 77, pp. 236-246, 2017.

[20] H. Tong, B. Liu, and S. Wang, "Software defect prediction using stacked denoising autoencoders and two-stage ensemble learning," Information and Software Technology, vol. 96, pp. 94-111, 2018.

[21] X. Qiu, Y. Ren, P. N. Suganthan, and G. A. J. Amaratunga, "Empirical mode decomposition based ensemble deep learning for load demand time series forecasting," Applied Soft Computing, vol. 54, pp. 246-255, 2017.

[22] Z. Qi, B. Wang, Y. Tian, and P. Zhang, "When ensemble learning meets deep learning: a new deep support vector machine for classification," Knowledge-Based Systems, vol. 107, pp. 54-60, 2016.

[23] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: a review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798-1828, 2013.

[24] G. E. Hinton, S. Osindero, and Y. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527-1554, 2006.

[25] Y. Bengio, "Practical recommendations for gradient-based training of deep architectures," in Neural Networks: Tricks of the Trade, vol. 7700 of Lecture Notes in Computer Science, pp. 437-478, Springer, Berlin, Germany, 2nd edition, 2012.

[26] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, pp. 436-444, 2015.

[27] S. Kim, Z. Yu, R. M. Kil, and M. Lee, "Deep learning of support vector machines with class probability output networks," Neural Networks, vol. 64, pp. 19-28, 2015.

[28] P. J. Brockwell and R. A. Davis, Time Series: Theory and Methods, Springer Series in Statistics, New York, NY, USA, 1989.

[29] V. N. Vapnik, Statistical Learning Theory, Wiley-Interscience, New York, NY, USA, 1998.

[30] S. Haykin, Neural Networks and Learning Machines, Macmillan Publishers Limited, New Jersey, NJ, USA, 2009.

[31] A. Candel, V. Parmar, E. LeDell, and A. Arora, Deep Learning with H2O, United States of America, 2018.

[32] K. Keerthi Vasan and B. Surendiran, "Dimensionality reduction using principal component analysis for network intrusion detection," Perspectives in Science, vol. 8, pp. 510-512, 2016.

[33] L. Rokach, "Ensemble-based classifiers," Artificial Intelligence Review, vol. 33, no. 1-2, pp. 1-39, 2010.

[34] R. Polikar, "Ensemble based systems in decision making," IEEE Circuits and Systems Magazine, vol. 6, no. 3, pp. 21-45, 2006.

[35] S. Reid, A Review of Heterogeneous Ensemble Methods, Department of Computer Science, University of Colorado at Boulder, 2007.

[36] D. Gopika and B. Azhagusundari, "An analysis on ensemble methods in classification tasks," International Journal of Advanced Research in Computer and Communication Engineering, vol. 3, no. 7, pp. 7423-7427, 2014.

[37] Y. Ren, L. Zhang, and P. N. Suganthan, "Ensemble classification and regression: recent developments, applications and future directions," IEEE Computational Intelligence Magazine, vol. 11, no. 1, pp. 41-53, 2016.

[38] U. G. Mangai, S. Samanta, S. Das, and P. R. Chowdhury, "A survey of decision fusion and feature fusion strategies for pattern classification," IETE Technical Review, vol. 27, no. 4, pp. 293-307, 2010.

[39] M. Wozniak, M. Grana, and E. Corchado, "A survey of multiple classifier systems as hybrid systems," Information Fusion, vol. 16, no. 1, pp. 3-17, 2014.

[40] G. Tsoumakas, L. Angelis, and I. Vlahavas, "Selective fusion of heterogeneous classifiers," Intelligent Data Analysis, vol. 9, no. 6, pp. 511-525, 2005.

[41] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," Journal of Computer and System Sciences, vol. 55, no. 1, part 2, pp. 119-139, 1997.

[42] S. Chopra and P. Meindl, Supply Chain Management: Strategy, Planning, and Operation, Prentice Hall, 2004.

[43] G. Kushwaha, "Operational performance through supply chain management practices," International Journal of Business and Social Science, vol. 217, pp. 65-77, 2012.

[44] G. E. Box, G. M. Jenkins, G. Reinsel, and G. M. Ljung, Time Series Analysis: Forecasting and Control, Wiley Series in Probability and Statistics, John Wiley & Sons, Hoboken, NJ, USA, 5th edition, 2015.

[45] W.-L. Zhao, C.-H. Deng, and C.-W. Ngo, "k-means: a revisit," Neurocomputing, vol. 291, pp. 195-206, 2018.

[46] C. Li and A. Lim, "A greedy aggregation-decomposition method for intermittent demand forecasting in fashion retailing," European Journal of Operational Research, vol. 269, no. 3, pp. 860-869, 2018.

[47] L. Yue, Y. Yafeng, G. Junjun, and T. Chongli, "Demand forecasting by using support vector machine," in Proceedings of the Third International Conference on Natural Computation (ICNC 2007), pp. 272-276, Haikou, China, August 2007.

[48] L. Yue, L. Zhenjiang, Y. Yafeng, T. Zaixia, G. Junjun, and Z. Bofeng, "Selective and heterogeneous SVM ensemble for demand forecasting," in Proceedings of the 2010 IEEE 10th International Conference on Computer and Information Technology (CIT), pp. 1519-1524, Bradford, UK, June 2010.

[49] T. Efendigil, S. Onut, and C. Kahraman, "A decision support system for demand forecasting with artificial neural networks and neuro-fuzzy models: a comparative analysis," Expert Systems with Applications, vol. 36, no. 3, pp. 6697-6707, 2009.

[50] E. Duman, M. Uysal, and A. F. Alkaya, "Migrating birds optimization: a new metaheuristic approach and its performance on quadratic assignment problem," Information Sciences, vol. 217, pp. 65-77, 2012.



Thus, a DL model can automatically perform feature extraction by itself, without any preprocessing step. Visual object recognition, speech recognition, and genomics are some of the fields where DL is applied successfully [26].

A multilayer feedforward artificial neural network (MLFANN) is employed as the deep learning algorithm in this study. In a feedforward neural network, information moves from the input nodes to the hidden nodes and, at the end, to the output nodes, through the successive layers of the network without any feedback. MLFANN is trained with stochastic gradient descent using backpropagation. It uses the gradient descent algorithm to update the weights, aiming to minimize the squared error between the network output values and the target output values. Using gradient descent, each weight is adjusted according to its contribution to the error. For the deep learning part, the H2O library [31] is used as an open source big data artificial intelligence platform.

3.4. Feature Extraction. In this work, there are some issues because of the huge amount of data. When we tried to use all of the features of all products in each store, with 155 features and 875 million records, the deep learning algorithm took days to finish its job due to limited computational power. For this purpose, reducing the number of features and the computational time is required for each algorithm. In order to overcome this problem and to accelerate the clustering and modelling phases, PCA (principal component analysis) is used as the feature extraction algorithm. PCA is a commonly used dimension reduction algorithm that retains the most significant features. PCA transforms each instance of the given dataset from a d-dimensional space to a k-dimensional subspace, in which the newly generated set of k dimensions are called the principal components (PC) [32]. Each principal component is directed toward maximum variance, excluding the variance accounted for in all its preceding components. As a result, the first component covers the maximum variance, and so on. In brief, the principal components are represented as

PC_i = a_1 X_1 + a_2 X_2 + ...  (1)

where PC_i is the i-th principal component, X_j is the j-th original feature, and a_j is the numerical coefficient for feature X_j.
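Equation (1) can be illustrated with a small NumPy sketch. This is illustrative only: the study runs PCA on Spark, and the data here is made up.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 1] = 2 * X[:, 0] + 0.01 * rng.normal(size=100)   # a redundant feature

Xc = X - X.mean(axis=0)                    # center the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt                            # row i holds the coefficients a_j of PC_i
explained = S ** 2 / (len(X) - 1)          # variance captured by each component

k = 3                                      # keep the top-k principal components
X_reduced = Xc @ components[:k].T          # each column is PC_i = sum_j a_j * X_j
```

Because singular values come out in decreasing order, the first component covers the maximum variance, matching the description above.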

3.5. Proposed Integration Strategy. The decision integration approach follows the philosophy of combining the strengths of different algorithms into a single collaborative method. With the help of this approach, we aim to improve the success of the proposed forecasting system by combining different algorithms. Since each algorithm can be more sensitive, or can have some weaknesses, under different conditions, collecting the decision of each model provides more effective and powerful results for decision making processes.

There are two main EL approaches in the literature, called homogeneous and heterogeneous EL. If different types of classifiers are used as base algorithms, such a system is called a heterogeneous ensemble; otherwise, it is a homogeneous ensemble. In this study, we concentrate on heterogeneous ensembles. An ensemble system is composed of two parts: ensemble generation and ensemble integration [33-36]. In the ensemble generation part, a diverse set of models is generated using different base classifiers. Nine different time series and regression methods, a support vector regression model, and a deep learning algorithm are employed as base classifiers in this study. There are many integration methods that combine the decisions of base classifiers to obtain a final decision [37-40]. For the integration step, a new decision integration strategy is proposed for the demand forecasting model in this work. We draw inspiration from the boosting ensemble model [4, 41] to construct our proposed decision integration strategy. The essential concept behind the boosting methodology is that, in prediction or classification problems, final decisions can be calculated as a weighted combination of the results.

In order to integrate the prediction of each forecasting algorithm, we used two integration strategies for the proposed approach. The first integration strategy selects the best performing forecasting method among the others and uses that method to predict the demand of a product in a store for the next week. The second integration strategy chooses the best performing forecasting methods of the current week and calculates the prediction by combining the weighted predictions of the winners. In our decision integration strategy, the final decision is determined by regarding the contribution of all algorithms, like democratic systems. Our approach considers the decisions of the forecasting algorithms that perform better over the previous 4 weeks of this year and over last year's counterparts of the current and previous weeks. As algorithms produce better forecasts over time, their contribution weights increase accordingly, and vice versa. In the first decision integration strategy, we do not consider the contribution of all algorithms in a fully democratic manner; we only look at the contributions of the best algorithms of the related week. In other words, the final decision is obtained by integrating the decisions of only the best ones (selected according to their historical decisions) for each store and product. On the other hand, the best algorithms of a week can change for each store-product couple, since every algorithm behaves differently for different products and locations.

The second decision integration strategy is based on the weighted average of the mean absolute percentage error (MAPE) and the mean absolute deviation (MAD) [42]. These are two popular evaluation metrics to assess the performance of forecasting models. The equations for calculating these forecast accuracy measures are as follows:

MAPE (Mean Absolute Percentage Error):

MAPE = (1/n) * sum_{t=1}^{n} |F_t - A_t| / A_t (2)

MAD (Mean Absolute Deviation):

MAD = (1/n) * sum_{t=1}^{n} |F_t - A_t| (3)

6 Complexity

where

(i) F_t is the expected or estimated value for period t,
(ii) A_t is the actual value for period t,
(iii) n is the number of periods.

The accuracy of a model can be simply computed as Accuracy = 1 - MAPE. If the values of the above two measures are small, it means that the forecasting model performs better [43].
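Equations (2) and (3) translate directly into code. A minimal sketch, with illustrative actual/forecast series:

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, Equation (2)."""
    return sum(abs(f - a) / a for a, f in zip(actual, forecast)) / len(actual)

def mad(actual, forecast):
    """Mean Absolute Deviation, Equation (3)."""
    return sum(abs(f - a) for a, f in zip(actual, forecast)) / len(actual)

actual, forecast = [10, 20, 40], [12, 18, 40]
err = mape(actual, forecast)          # (0.2 + 0.1 + 0.0) / 3 = 0.1
accuracy = 1 - err                    # Accuracy = 1 - MAPE = 0.9
```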

Replenishment-oriented demand forecasting in the retail industry needs some modifications to general EL strategies regarding retail-specific trends and dynamics. For this purpose, the proposed second integration strategy computes a weighted average of the MAPEs of each of the previous 4 weeks and, additionally, the MAPEs of the previous year's same week and the week adjacent to it, as in Equation (4). This makes our forecasts more reliable with respect to trend changes and seasonality behaviors. The MAPE of each week is calculated with the formula defined in Equation (2) above. The proposed model automatically changes the weekly weights of each member algorithm according to their average MAPE for each week at the store and product level. Based on the resulting MAPE weights, the best algorithms of the week gain more weight in the final decision.

The average MAPE of each algorithm for a product at each store is calculated with Equation (4). Suppose that the MAPEs of an algorithm are 0.3, 0.3, 0.1, and 0.1 for the previous 4 weeks of the current week and 0.1 and 0.2 for the previous year's same week and its prior week. Assume that the weekly coefficients are 25%, 20%, 10%, 5%, 30%, and 10%, respectively. Then the average MAPE of the algorithm for the current week will be 0.3 * 0.25 + 0.3 * 0.2 + 0.1 * 0.1 + 0.1 * 0.05 + 0.1 * 0.3 + 0.2 * 0.1 = 0.2 according to Equation (4).

M_avg = sum_{k=1}^{n} C_k M_k (4)

In Equation (4), the sum of the coefficients should be 1, that is, sum_{k=1}^{n} C_k = 1, and M_k is the MAPE of the related week, where

(i) M_1 is the MAPE of the previous week,
(ii) M_2 is the MAPE of 2 weeks before the related week,
(iii) M_3 is the MAPE of 3 weeks before the related week,
(iv) M_4 is the MAPE of 4 weeks before the related week,
(v) M_5 is the MAPE of the previous year's same week,
(vi) M_6 is the MAPE of the previous year's previous week.
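Equation (4) is a plain weighted sum. The sketch below uses the weekly coefficients quoted in the text (25%, 20%, 10%, 5%, 30%, 10%); the six MAPE values M_1..M_6 are illustrative:

```python
def average_mape(mapes, coeffs):
    """Weighted-average MAPE of one algorithm for the current week,
    Equation (4); the coefficients C_k must sum to 1."""
    assert abs(sum(coeffs) - 1.0) < 1e-9
    return sum(c * m for c, m in zip(coeffs, mapes))

# M1..M4: previous 4 weeks; M5, M6: previous year's same and prior week
coeffs = [0.25, 0.20, 0.10, 0.05, 0.30, 0.10]
m_avg = average_mape([0.3, 0.3, 0.1, 0.1, 0.1, 0.2], coeffs)  # 0.2
```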

The proposed decision integration system computes a MAPE for each algorithm for the coming forecasting week at the store and product level. In addition, the effects of special days are taken into consideration. Christmas, Valentine's Day, Mother's Day, the start of Ramadan, and other religious days can be given as examples of special days. System users can define special days manually by considering the current year's calendar. Thus, trends of special days are computed automatically by using the previous year's trends. This enables the consideration of seasonality and special events, which results in the evaluation of more accurate forecasting values. Meanwhile, seasonality and other effects can be taken into consideration as well. Following the MAPE calculation of the algorithms, new weights (W_i, where i = 1, ..., n) are assigned to each of them according to their weighted average for the current week.

The next step is the selection of the best algorithms of the week for each store and product couple. We only take the best performing 30% of the methods into consideration as the best algorithms. After many empirical observations, this ratio gave the best results on our dataset. Obviously, these parameters are very specific to the dataset characteristics and also depend on which algorithms are included in the integration strategy. Scaling is the last preprocessing step for the calculation of the final decision forecast. Suppose that we have n algorithms (A_1, A_2, ..., A_n) and k of them are the best models of the week for a store and product couple, where k <= n. The weight of each winner is scaled according to its weighted average as in Equation (5):

W'_t = W_t / sum_{j=1}^{k} W_j, t in {1, ..., k} (5)

Assume that there are 3 best algorithms and their weights are W_1 = 30%, W_2 = 18%, and W_3 = 12%, respectively. Their scaled new weights are going to be

W'_1 = 0.30/0.60 = 50%,
W'_2 = 0.18/0.60 = 30%, (6)
W'_3 = 0.12/0.60 = 20%,

respectively, according to Equation (5). Scaling makes our calculation more comprehensible. After scaling the weight of each algorithm, the system gives its ultimate decision according to the new weights, considering the performance of each algorithm, with Equation (7):

F_avg = sum_{j=1}^{k} F_j W'_j (7)

In Equation (7), the main constraint is sum_{j=1}^{k} W'_j = 1, k is the number of champion algorithms, and F_j is the forecast of the related algorithm. Suppose that the best performing algorithms are A_1, A_2, and A_3, and that for the next week algorithm A_1 forecasts the sales quantity as 20, A_2 says it will be 10, and A_3's forecast is 5. Let us assume that their scaled weights are 50%, 30%, and 20%, respectively. Then the weighted forecast according to Equation (7) is as follows:

F_avg = 20 * 0.50 + 10 * 0.30 + 5 * 0.20 = 14 (8)
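Equations (5) and (7) can be combined into one small routine; the numbers below reproduce the worked example:

```python
def final_forecast(forecasts, weights):
    """Scale the winners' weights so they sum to 1 (Equation (5)) and
    combine their forecasts into the final decision (Equation (7))."""
    total = sum(weights)
    scaled = [w / total for w in weights]
    return sum(f * w for f, w in zip(forecasts, scaled)), scaled

# Three champion algorithms with weights 30%, 18%, 12% and forecasts 20, 10, 5
value, scaled = final_forecast([20, 10, 5], [0.30, 0.18, 0.12])
# scaled -> [0.50, 0.30, 0.20], as in Equation (6); value -> 14.0, as in Equation (8)
```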

Every algorithm has a voting right proportional to its weight. Finally, if a member algorithm does not appear in the list of the best algorithms for a product and store couple for a specific period, it is automatically put into a blacklist, so that it will not be used anymore at that product and store level


[Figure 1 depicts the processing pipeline: Data Collection -> Data Cleaning -> Data Enrichment from Big Data -> Finding Frequent Items using Apriori -> Replenishment Data Mart on Hadoop/Spark -> Dimensionality Reduction -> Clustering -> (Deep Learning Model | Time Series Models | Support Vector Regression Model) -> Proposed Integration Strategy -> Final Decision.]

Figure 1: Flowchart of the proposed system.

as it is marked. This enables faster computation, since the system disregards poorly performing algorithms. The algorithm of the proposed demand forecasting system and the flowchart of the proposed system are given in Algorithm 1 and Figure 1, respectively.

4. Experiment Setup

We use an enhanced dataset that includes SOK Market's real-life sales and stock data with enriched features. The dataset consists of 106 weeks of sales data covering 7,888 distinct products for different stores of SOK Market. In each store, around 1,500 products are actively sold, while the rest rarely become active. The forecasting is performed every week to predict the demand of the following week. A three-fold cross validation methodology is applied on the test data, and the most accurate model is then selected for the forecast of the coming week. This is a common approach for demand forecasting in retail [44]. Furthermore, outside weather information, such as daily temperature and other weather conditions, is joined to the model as new variables. Since it is known that, in most circumstances, there is a certain correlation between shopping trends and weather conditions in retail [33], we enriched the dataset with weather information. The dataset also includes mostly sales-related features filled from relational legacy data sources.


Given: n is the number of stores, m is the number of products, t is the number of algorithms in the system, and A_t is an algorithm with index t. s_mn is the matrix that holds the best performing algorithms for each store and product, B_mn is the matrix that contains the blacklist of algorithms for each store and product, and F_ij is the matrix that stores the final decision of each forecast, where i is the number of stores and j is the number of products.

for i = 1 to n
  for j = 1 to m
    for k = 1 to t
      if A_k is in blacklist B_ij then continue
      else run A_k and calculate algorithm weight W_k
      end if
    end for
    for k = 1 to t
      choose the best performing algorithms and place them in s_ij
    end for
    for z = 1 to |s_ij|
      do scaling for A_z
    end for
    calculate the proposed integration strategy and store the result in F_ij
  end for
end for
return all F_nm

Algorithm 1: The algorithm of the proposed demand forecasting system.
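The inner loops of Algorithm 1 can be sketched for a single store/product couple as follows. The blacklist filter, the top-30% selection, and the weight scaling follow the text; the inverse-error (1 - MAPE) weighting is a simplifying assumption standing in for the full Equation (4) weights, and the algorithm names and values are invented:

```python
def integrate_week(past_mape, forecasts, blacklist=frozenset(), top_ratio=0.3):
    """Drop blacklisted algorithms, keep the best-performing ~30% by
    historical MAPE, scale the winners' weights to sum to 1
    (Equation (5)), and combine their forecasts (Equation (7))."""
    live = {a: m for a, m in past_mape.items() if a not in blacklist}
    k = max(1, round(top_ratio * len(live)))
    winners = sorted(live, key=live.get)[:k]     # lowest MAPE first
    raw = {a: 1.0 - live[a] for a in winners}    # accuracy = 1 - MAPE
    total = sum(raw.values())
    return sum(forecasts[a] * raw[a] / total for a in winners)

past = {"arima": 0.20, "holt": 0.10, "svr": 0.40, "dl": 0.35}
fc = {"arima": 10.0, "holt": 20.0, "svr": 5.0, "dl": 12.0}
demand = integrate_week(past, fc, blacklist={"svr"})  # only "holt" wins -> 20.0
```

With a larger `top_ratio`, several winners share the decision in proportion to their scaled weights, which is the "democratic" behavior described in Section 3.5.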

The deep learning approach is used to estimate customer demand for each product at each store of the SOK Market retail chain. There is also a very apparent similarity, in terms of sales trends, between products which are mostly in the same basket. For this purpose, the Apriori algorithm [4] is utilized to find the correlated products of each product. Apriori is a fast way to find the most associated products that appear in the same basket. The sales-related features of the most associated product are added to the dataset as well. This takes advantage of the relationships among products with similar sales trends. Adding a product's associated product and its features enables the DL algorithm to learn better within its hidden layers. In summary, it is observed in extensive experiments that enriching the training dataset with weather data and the sales data of the most associated products made our estimates more powerful with the DL algorithm.
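The "most associated product" lookup can be sketched with a single pairwise counting pass. A full Apriori implementation generates and prunes candidate itemsets level by level; the basket data below is illustrative:

```python
from collections import Counter

def most_associated(baskets, product):
    """Return the product that most frequently shares a basket with
    the given product (one-pass co-occurrence counting)."""
    counts = Counter()
    for basket in baskets:
        items = set(basket)
        if product in items:
            for other in items - {product}:
                counts[other] += 1
    return counts.most_common(1)[0][0] if counts else None

baskets = [["milk", "bread"], ["milk", "bread", "eggs"], ["bread", "eggs"]]
assoc = most_associated(baskets, "milk")   # "bread" co-occurs twice, "eggs" once
```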

The dataset includes weekly data of each product at each store for the latest 8 weeks and the sales quantities of the last 4 weeks in the same season. Two years of historical data are included for 5,500 stores and 1,500 products; the data size is about 875 million records with 155 features. These features include sales quantity, customer return quantity, product quantities received from distribution centers, sales amount, discount amount, receipt counts of a product, counts of customers who bought a specific product, sales quantity for each day of the week (Monday, Tuesday, etc.), maximum and minimum stock level of each day, average stock level for each week, sales quantity for each hour of the day, and also the sales quantities of the last 4 weeks of the same season. Since there is a relationship between products which are generally in the same basket, we prepared the same features defined above for the most associated products as well. A detailed explanation of the dataset is presented in Table 2.

In Table 2, fields given with square brackets are arrays with one entry per week (e.g., Sales Quantity Week 0 means the sales quantity of the current week, Sales Quantity Week 1 is the sales quantity of one week before, and so on). According to the experimental results, taking weekly data for the latest 8 weeks and for 4 weeks of the same season in the previous year captures trends and changes in seasonality more acceptably.

For the DL implementation part, the H2O library [31] is used as an open source big data artificial intelligence platform. H2O is a powerful machine learning library that gives us the opportunity to implement DL algorithms on Hadoop/Spark big data environments. It combines highly advanced machine learning algorithms with truly scalable in-memory processing for Spark big data environments on one or many nodes via its Sparkling Water distribution. With the in-memory processing capability of Spark technologies, it provides a faster parallel platform to utilize big data for business requirements with maximum benefit. In this study, we use H2O version 3.14.0.2 on a 64-bit, 16-core, 32 GB memory machine to observe the power of parallelism during DL modelling while increasing the number of neurons.

A multilayer feedforward artificial neural network (MLFANN) is employed as the deep learning algorithm. The MLFANN is trained with stochastic gradient descent using backpropagation. With gradient descent, each weight is adjusted according to its contribution to the error. H2O also has advanced features like momentum training, adaptive learning rate, L1 or L2 regularization, rate annealing,


Table 2: Dataset explanation.

Name | Description | Example
Yearweek | Related year-week; weeks start on Monday and end on Sunday | 201801
Store | Store number | 1234
Product | Product identification number | 1
Product Adjactive | Product associated with the product according to the Apriori algorithm; the most frequent product in the same basket with a specific product | 2
Stock In Quantity Week[0-8] | Stock increase quantity of the product in the related week, e.g., stock transfer quantity from the distribution center to the store | 50
Return Quantity Week[0-8] | Stock return quantity from customers in a specific week and store | 20
Sales Quantity Week[0-8] | Weekly sales quantity of the related product at a specific store | 120
Sales Amount Week[0-8] | Total sales amount of the product on the customer receipt | 2500
Discount Amount Week[0-8] | Discount amount of the product, if there is any | 500
Customer Count Week[0-8] | How many customers bought this product in a specific week | 30
Receipt Count Week[0-8] | Distinct receipt count for the related product | 20
Sales Quantity Time[9-22] | Hourly sales quantity of the related product from 9 am to 10 pm | 5
Last4weeks Day[1-7] | Total sales of each weekday of the last 4 weeks (total sales of Mondays, Tuesdays, etc.) | 10
Last8weeks Day[1-7] | Total sales of each weekday of the last 8 weeks (total sales of Mondays, Tuesdays, etc.) | 10
Max Stock Week[0-8] | Maximum stock quantity of the related week | 12
Min Stock Week[0-8] | Minimum stock quantity of the related week | 2
Avg Stock Week[0-8] | Average stock quantity of the related week | 5
Sales Quantity Adj Week[0-8] | Sales quantity of the most associated product | 14
Temperature Weekday[1-7] | Daily temperature of weekdays (Monday, Tuesday, etc.) | 22
Weekly Avg Temperature[0-8] | Average weather temperature of the related week | 23
Weather Condition Weekday[1-7] | Nominal variable: rainy, snowy, sunny, cloudy, etc. | Rainy
Sales Quantity Next Week | Target variable of our forecasting problem | 25

dropout, and grid search. A Gaussian distribution is applied because a continuous variable is used as the response variable. H2O performs very well in our environment when three hidden layers with 10 nodes each and 300 epochs in total are set as parameters.
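This setup can be expressed as a configuration sketch with H2O's Python API. The file path is hypothetical, the target column name follows Table 2, and exact argument names may differ between H2O releases:

```python
import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

h2o.init()                                             # connect to a local or Spark-backed cluster
frame = h2o.import_file("replenishment_datamart.csv")  # hypothetical export of the data mart

model = H2ODeepLearningEstimator(
    hidden=[10, 10, 10],        # three hidden layers, 10 nodes each
    epochs=300,                 # 300 epochs in total
    distribution="gaussian",    # continuous response variable
)
model.train(y="Sales_Quantity_Next_Week",  # target variable (see Table 2)
            training_frame=frame)
```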

After the dimension reduction step, K-means is employed as the clustering algorithm. K-means is a greedy and fast clustering algorithm [45] that tries to partition samples into k clusters in which each sample is near its cluster center. K is selected as 20 because, after several trials and empirical observations, this value reached the most even distribution and divided our dataset into 20 different clusters for each store. After that, the deep learning algorithm is applied, and 20 different DL models are obtained for each store. Then demand forecasting for each product is performed by using its cluster's model on a store basis. The main reason for using clustering is the time and computing limitations of the DL algorithm. Instead of building a model for each product, 20 models are generated for each store at the product level. In our trials with a sample dataset, the bias difference is less than 2%, so this speed-up is very beneficial and a good option for researchers. The forecasting result of the DL model is transferred into our forecasting system, which makes its final decision by considering the decisions of 11 forecasting methods, including the DL model.
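The clustering step can be sketched with plain Lloyd's algorithm. For readability this sketch uses a single feature; the actual system clusters the PCA-reduced feature vectors into K = 20 clusters per store, and the sample points here are invented:

```python
import random

def kmeans_1d(points, k, iters=25, seed=7):
    """Lloyd's algorithm on one feature: assign each point to its
    nearest center, then move each center to its cluster mean."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:  # assignment step
            clusters[min(range(k), key=lambda c: abs(p - centers[c]))].append(p)
        centers = [sum(c) / len(c) if c else centers[j]  # update step
                   for j, c in enumerate(clusters)]
    return sorted(centers)

# Two well-separated blobs converge to centers near 1.0 and 10.0
centers = kmeans_1d([0.9, 1.0, 1.1, 9.9, 10.0, 10.1], k=2)
```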


5. Experimental Results

Forecasting solutions should be scalable, process large amounts of data, and extract a model from the data. Big data environments using Spark technologies give us the opportunity to implement algorithms scalably, sharing the tasks of machine learning algorithms among the nodes of parallel architectures. Considering the huge number of samples and the large number of features, even the computational power of a parallel architecture is not enough in some cases. For this reason, dimension reduction is needed for big data applications. In this study, PCA is employed as a feature extraction step.

SOK Market sells 21 different groups of items. We assess the performance of the proposed forecasting system on a group basis, as shown in Table 3. Table 3 shows the MAPE (Mean Absolute Percentage Error) of integration strategies (S1) and (S2)


Table 3: Demand forecasting improvements per product group.

Product Groups | Method I without Proposed Integration Strategy (S1) MAPE | Method II with Proposed Integration Strategy (S2) MAPE | Proposed Integration Strategy with Deep Learning (SD) MAPE | Percentage Success Rate P1 = (S1 - S2)/S1 (%) | Percentage Success Rate P2 = (S1 - SD)/S1 (%) | Improvement D = P2 - P1 (%)
Baby Products | 0.5157 | 0.3081 | 0.2927 | 40.26 | 43.25 | 2.99
Bakery Products | 0.3482 | 0.2059 | 0.1966 | 40.85 | 43.52 | 2.67
Beverage | 0.3714 | 0.2316 | 0.2207 | 37.64 | 40.58 | 2.94
Biscuit-Chocolate | 0.3358 | 0.2077 | 0.1977 | 38.14 | 41.13 | 2.99
Breakfast Products | 0.4443 | 0.2770 | 0.2661 | 37.65 | 40.11 | 2.45
Canned-Paste-Sauces | 0.3836 | 0.2309 | 0.2198 | 39.80 | 42.69 | 2.89
Cheese | 0.3953 | 0.2457 | 0.2345 | 37.84 | 40.68 | 2.84
Cleaning Products | 0.4560 | 0.2791 | 0.2650 | 38.79 | 41.89 | 3.10
Cosmetics Products | 0.5397 | 0.3266 | 0.3148 | 39.49 | 41.67 | 2.19
Deli Meats | 0.4242 | 0.2602 | 0.2488 | 38.65 | 41.36 | 2.70
Edible Oils | 0.4060 | 0.2299 | 0.2215 | 43.36 | 45.45 | 2.09
Household Goods | 0.5713 | 0.3656 | 0.3535 | 36.01 | 38.13 | 2.12
Ice Cream-Frozen | 0.5012 | 0.3255 | 0.3106 | 35.05 | 38.03 | 2.98
Legumes-Pasta-Soup | 0.3850 | 0.2397 | 0.2269 | 37.74 | 41.07 | 3.33
Nuts-Chips | 0.3316 | 0.2049 | 0.1966 | 38.20 | 40.71 | 2.51
Poultry-Eggs | 0.4219 | 0.2527 | 0.2403 | 40.11 | 43.04 | 2.94
Ready Meals | 0.4613 | 0.2610 | 0.2520 | 43.42 | 45.36 | 1.94
Red Meat | 0.2514 | 0.1616 | 0.1532 | 35.71 | 39.06 | 3.35
Tea-Coffee Products | 0.4347 | 0.2650 | 0.2535 | 39.04 | 41.68 | 2.64
Textile Products | 0.5418 | 0.3048 | 0.2907 | 43.74 | 46.34 | 2.60
Tobacco Products | 0.3791 | 0.2378 | 0.2290 | 37.29 | 39.61 | 2.32
Average | 0.4238 | 0.2582 | 0.2469 | 38.99 | 41.68 | 2.69


[Box plots of MAPE (y-axis, 0.00-1.00) for each of the 21 product groups, from Baby Products to Tobacco Products.]

Figure 2: MAPE distribution according to product groups with the S1 integration strategy.

in columns 2 and 3. The average percentage forecasting errors of strategies (S1) and (S2) are 0.42 and 0.26, respectively. As can be seen from Table 3, the percentage success rate of (S2) with respect to (S1) is represented in column 5. The proposed integration strategy (S2) provides a 38.99% improvement over the first integration strategy (S1) on average. In some product groups, like Textile Products, the error rate is reduced from 0.54 to 0.30, a 43.74% enhancement in MAPE.

With this work, a further improvement is obtained by utilizing the DL model compared to the forecasting strategy based on just 10 algorithms. Column 4 of Table 3 demonstrates the MAPEs of the 21 different groups of products after the addition of the DL model. After the inclusion of the DL model into the proposed system, we observed around a 2% to 3.4% increase in demand forecast accuracies across product groups. The error rates of some product groups, like "baby products", "cosmetics products", and "household goods", are usually higher than the others, since these product groups do not have continuous consumer demand. For example, the error rate for the "household goods" group is 0.35, while the error rate for the "red meat" group is 0.15. The error rates in food products are usually lower, since customers buy these products regularly for their daily consumption. However, the error rates are higher in baby, textile, and household goods: customers buy products from these groups selectively, depending on whether they like the product or not. In addition, the prices of goods in these groups and the promotion effects are usually high relative to the prices of goods in other groups.

Moreover, when the column of percentage success rates is analyzed, it is observed that the top benefitting product groups achieve over a 40% success rate: Baby Products, Bakery Products, Edible Oils, Poultry-Eggs, Ready Meals, and Textile Products. The common point of these groups is that each of them is consumed regularly by a specific customer segment. For instance, the baby products group is chosen by families who have kids; Edible Oils, Poultry-Eggs, and Bakery Products are preferred by frequently and continuously shopping customer segments; and Ready Meals are mostly preferred by bachelor, single, and working consumers. Furthermore, the inclusion of DL into the forecasting system indicates that some other consumption groups (for example, Cleaning Products, Red Meat, and Legumes-Pasta-Soup) exhibit better performance than the others, with an additional improvement of over 3%.

Figure 2 shows the box plots of the mean absolute percentage errors (MAPE) for each group of products after application of the (S1) integration strategy. The box plots enable us to analyze the distributional characteristics of forecasting errors for the product groups. As can be seen from the figure, the medians differ across product groups; thus, the forecasting errors are usually dissimilar for different product groups. The interquartile range box represents the middle 50% of the scores in the data. The interquartile range boxes are usually very tall, which means that there is quite a number of different prediction errors within the products of a given group.

Figure 3 presents the box plots of the MAPEs of the proposed forecasting system with the proposed integration approach (S2), which combines the predictions of 10 different forecasting algorithms.

Figure 4 shows the results after adding DL to our system. The interquartile range boxes are narrower compared to the ones in Figures 2 and 3, which means that the spread of forecasting errors is smaller than in Figures 2 and 3. Finally, the integration of DL into the proposed forecasting system generates results that are more accurate.


[Box plots of MAPE (y-axis, 0.00-1.00) for each of the 21 product groups, from Baby Products to Tobacco Products.]

Figure 3: MAPE distribution according to product groups with the S2 integration strategy.

[Box plots of MAPE (y-axis, 0.00-1.00) for each of the 21 product groups, from Baby Products to Tobacco Products.]

Figure 4: MAPE distribution according to product groups after the addition of the deep learning algorithm.

For example, for the (S1) integration strategy in the Baby Products group, the MAPE distribution for the 1st quartile of the data is observed between 0% and 23%, for the 2nd quartile between 23% and 48%, and for the 3rd and 4th quartiles between 48% and 75% in Figure 2. After applying the integration strategy, the MAPE distribution for the Baby Products group becomes 0-16% for the 1st quartile, 16-30% for the 2nd quartile, and 30-50% for the rest. According to the boundary differences, this gives around an 8% improvement for the first quartile, from an 8% to an 18% enhancement for the second quartile, and from an 18% to a 25% advancement for the rest of the data. The median of the MAPE distribution is 48%; after applying the proposed integration strategy, it is observed as 27%, which means a 21% improvement. Approximately a 1-3% enhancement is observed for each quartile of the data with the inclusion of the deep learning strategy in Figure 4.

For Cleaning Products, improvements similar to those of the other groups are observed with the integration strategy. The MAPE distribution of the 1st quartile of the data is observed between 0% and 19%, for the 2nd quartile between 19% and 40%, and for the rest of the data between 40% and


[Line chart of total sales quantity (y-axis, 0-9,000) by month from July 2017 to July 2018, comparing Actual sales with the S1, S2, and SD forecasts.]

Figure 5: Accuracy comparison among integration strategies for one sample product group.

73% in Figure 2. After performing the proposed integration strategy for Cleaning Products, the MAPE distribution becomes 0-10% for the 1st quartile of the data, 10-17% for the 2nd quartile, and 27-43% for the 3rd and 4th quartiles together. The improvements at the boundaries are observed as follows: 9% for the 1st quartile, from 9% to 13% for the 2nd quartile, and from 13% to 20% for the rest of the data. After implementation of the integration strategy, the system advancement is observed as 2% for the 1st and 2nd quartiles and 3% for the 3rd and 4th quartiles, respectively. Briefly, the integration strategy improvement is remarkable for each product group, and the consolidation of DL into the proposed integration strategy additionally advances the results by almost 2% to 3%.

In Figure 5, the accuracy comparison among integration strategies for one of the best performing sample groups is presented over a 1-year period. In particular, Christmas and other seasonality effects can be seen very clearly in Figure 5, and the integration strategy with DL (SD) performs with the best accuracy.

It is hard to compare the performance of our proposed system with other studies because of the lack of works with similar combinations of a deep learning approach, the proposed decision integration strategies, and different learning algorithms for the demand forecasting process. Another difficulty is the lack of real-life datasets similar to the SOK dataset on which to compare state-of-the-art studies. Although the results of the proposed system are given in this study, we also report the results of a number of research works on the demand forecasting domain here. A greedy aggregation-decomposition method has solved a real-world intermittent demand forecasting problem for a fashion retailer in Singapore [46]; the authors report a 59 MAPE success rate on a small dataset. The authors of [47] compare different approaches, such as a statistical model, the Winters model, and a radial basis function neural network (RBFNN), with SVM for the demand forecasting process. They conclude that the success of the SVM algorithm surpasses the others, with an enhancement of around 77 at the average MAPE level. A novel demand forecasting model called SHEnSVM (Selective and Heterogeneous Ensemble of Support Vector Machines) is proposed in [48]. In that model, individual SVMs are trained on different samples generated by a bootstrap algorithm, and a genetic algorithm is then employed to retrieve the best combination schema. They report a 10 advancement with the SVM algorithm and a 64 MAPE improvement, employing only beer data with 3 different brands in their experiments. Tugba Efendigil et al. propose a novel forecasting mechanism modeled by artificial intelligence approaches in [49]. They compared artificial neural networks and adaptive network-based fuzzy inference system techniques to manage fuzzy demand with incomplete information, and they reached MAPE rates of around 18 for some products during their experiments.

6. Conclusion

In the retail industry, demand forecasting is one of the main problems of supply chains for optimizing stocks, reducing costs, and increasing sales, profit, and customer loyalty. To address this issue, there are several methods, such as time series analysis and machine learning approaches, for analyzing and learning complex interactions and patterns from historical data.

In this study, a novel attempt is made to integrate 11 different forecasting models, which include time series algorithms, a support vector regression model, and a deep learning method, for the demand forecasting process. Moreover, a novel decision integration strategy is developed by drawing inspiration from the ensemble methodology, namely boosting. The proposed approach considers the performance of each model over time and combines the weighted predictions of the best performing models for the demand forecasting process. The


proposed forecasting system is tested on real-life data obtained from the SOK Market retail chain. It is observed that the inclusion of different learning algorithms beyond time series models, together with a novel integration strategy, advanced the performance of the demand forecasting system. To the best of our knowledge, this is the very first attempt to consolidate a deep learning methodology, the SVR algorithm, and different time series methods for demand forecasting systems. Furthermore, the other novelty of this work is the adaptation of the boosting ensemble strategy to the demand forecasting model. In this way, the final decision of the proposed system is based on the best algorithms of the week, which gain more weight. This makes our forecasts more reliable with respect to trend changes and seasonality behaviors. Moreover, the proposed approach performs very well integrated with a deep learning algorithm on a Spark big data environment. The dimension reduction process and clustering methods help to decrease the time consumed and the computing power needed during the deep learning modelling phase. Although a review of some similar studies is presented in Section 5, as expected, it is very difficult to compare the results of other studies with ours because of the use of different datasets and methods. In this study, we compare the results of three models: model one selects the best performing forecasting method depending on its success in the previous period, with 42.4% MAPE on average; the second model, with the novel integration strategy, results in 25.8% MAPE on average; and the last model, with the novel integration strategy enhanced by the deep learning approach, provides 24.7% MAPE on average. As a result, the inclusion of the deep learning approach into the novel integration strategy reduces the average prediction error for the demand forecasting process in the supply chain.

As future work, we plan to enrich the set of features by gathering data from other sources such as economic studies, shopping trends, social media, social events, and location-based demographic data of stores. The contribution of these new varieties of data sources to deep learning can then be observed. A further study can be carried out to determine the hyperparameters of the deep learning algorithm. In addition, we also plan to use other deep learning techniques, such as convolutional neural networks, recurrent neural networks, and deep neural networks, as learning algorithms. Furthermore, another objective is to use heuristic methods, MBO (Migrating Birds Optimization) and other related algorithms [50], to optimize some of the coefficients/weights that were determined empirically by trial and error, such as taking the best performing 30 percent of methods in our current system.

Data Availability

This dataset is private customer data, so the agreement with the organization (SOK) does not allow sharing the data.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This work is supported by the OBASE Research & Development Center.

References

[1] P. J. McGoldrick and E. Andre, "Consumer misbehaviour: promiscuity or loyalty in grocery shopping," Journal of Retailing and Consumer Services, vol. 4, no. 2, pp. 73-81, 1997.
[2] D. Grewal, A. L. Roggeveen, and J. Nordfalt, "The future of retailing," Journal of Retailing, vol. 93, no. 1, pp. 1-6, 2017.
[3] E. T. Bradlow, M. Gangwar, P. Kopalle, and S. Voleti, "The role of big data and predictive analytics in retailing," Journal of Retailing, vol. 93, no. 1, pp. 79-95, 2017.
[4] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, USA, 2013.
[5] R. Hyndman and G. Athanasopoulos, Forecasting: Principles and Practice, OTexts, Melbourne, Australia, 2018, http://otexts.org/fpp2.
[6] C. C. Pegels, "Exponential forecasting: some new variations," Management Science, vol. 12, pp. 311-315, 1969.
[7] E. S. Gardner, "Exponential smoothing: the state of the art," Journal of Forecasting, vol. 4, no. 1, pp. 1-28, 1985.
[8] D. Gujarati, Basic Econometrics, McGraw-Hill, New York, NY, USA, 2003.
[9] J. D. Croston, "Forecasting and stock control for intermittent demands," Operational Research Quarterly, vol. 23, no. 3, pp. 289-303, 1972.
[10] T. R. Willemain, C. N. Smart, J. H. Shockor, and P. A. DeSautels, "Forecasting intermittent demand in manufacturing: a comparative evaluation of Croston's method," International Journal of Forecasting, vol. 10, no. 4, pp. 529-538, 1994.
[11] A. Syntetos, Forecasting of Intermittent Demand, Brunel University, 2001.
[12] A. A. Syntetos and J. E. Boylan, "The accuracy of intermittent demand estimates," International Journal of Forecasting, vol. 21, no. 2, pp. 303-314, 2005.
[13] R. H. Teunter, A. A. Syntetos, and M. Z. Babai, "Intermittent demand: linking forecasting to inventory obsolescence," European Journal of Operational Research, vol. 214, no. 3, pp. 606-615, 2011.
[14] R. J. Hyndman, R. A. Ahmed, G. Athanasopoulos, and H. L. Shang, "Optimal combination forecasts for hierarchical time series," Computational Statistics & Data Analysis, vol. 55, no. 9, pp. 2579-2589, 2011.
[15] R. Fildes and F. Petropoulos, "Simple versus complex selection rules for forecasting many time series," Journal of Business Research, vol. 68, no. 8, pp. 1692-1701, 2015.
[16] C. Li and A. Lim, "A greedy aggregation-decomposition method for intermittent demand forecasting in fashion retailing," European Journal of Operational Research, vol. 269, no. 3, pp. 860-869, 2018.
[17] F. Turrado Garcia, L. J. Garcia Villalba, and J. Portela, "Intelligent system for time series classification using support vector machines applied to supply-chain," Expert Systems with Applications, vol. 39, no. 12, pp. 10590-10599, 2012.
[18] G. Song and Q. Dai, "A novel double deep ELMs ensemble system for time series forecasting," Knowledge-Based Systems, vol. 134, pp. 31-49, 2017.
[19] O. Araque, I. Corcuera-Platas, J. F. Sanchez-Rada, and C. A. Iglesias, "Enhancing deep learning sentiment analysis with ensemble techniques in social applications," Expert Systems with Applications, vol. 77, pp. 236-246, 2017.
[20] H. Tong, B. Liu, and S. Wang, "Software defect prediction using stacked denoising autoencoders and two-stage ensemble learning," Information and Software Technology, vol. 96, pp. 94-111, 2018.
[21] X. Qiu, Y. Ren, P. N. Suganthan, and G. A. J. Amaratunga, "Empirical mode decomposition based ensemble deep learning for load demand time series forecasting," Applied Soft Computing, vol. 54, pp. 246-255, 2017.
[22] Z. Qi, B. Wang, Y. Tian, and P. Zhang, "When ensemble learning meets deep learning: a new deep support vector machine for classification," Knowledge-Based Systems, vol. 107, pp. 54-60, 2016.
[23] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: a review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798-1828, 2013.
[24] G. E. Hinton, S. Osindero, and Y. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527-1554, 2006.
[25] Y. Bengio, "Practical recommendations for gradient-based training of deep architectures," in Neural Networks: Tricks of the Trade, vol. 7700 of Lecture Notes in Computer Science, pp. 437-478, Springer, Berlin, Germany, 2nd edition, 2012.
[26] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, pp. 436-444, 2015.
[27] S. Kim, Z. Yu, R. M. Kil, and M. Lee, "Deep learning of support vector machines with class probability output networks," Neural Networks, vol. 64, pp. 19-28, 2015.
[28] P. J. Brockwell and R. A. Davis, Time Series: Theory and Methods, Springer Series in Statistics, New York, NY, USA, 1989.
[29] V. N. Vapnik, Statistical Learning Theory, Wiley-Interscience, New York, NY, USA, 1998.
[30] S. Haykin, Neural Networks and Learning Machines, Macmillan Publishers Limited, New Jersey, NJ, USA, 2009.
[31] A. Candel, V. Parmar, E. LeDell, and A. Arora, Deep Learning with H2O, United States of America, 2018.
[32] K. Keerthi Vasan and B. Surendiran, "Dimensionality reduction using principal component analysis for network intrusion detection," Perspectives in Science, vol. 8, pp. 510-512, 2016.
[33] L. Rokach, "Ensemble-based classifiers," Artificial Intelligence Review, vol. 33, no. 1-2, pp. 1-39, 2010.
[34] R. Polikar, "Ensemble based systems in decision making," IEEE Circuits and Systems Magazine, vol. 6, no. 3, pp. 21-45, 2006.
[35] S. Reid, A Review of Heterogeneous Ensemble Methods, Department of Computer Science, University of Colorado at Boulder, 2007.
[36] D. Gopika and B. Azhagusundari, "An analysis on ensemble methods in classification tasks," International Journal of Advanced Research in Computer and Communication Engineering, vol. 3, no. 7, pp. 7423-7427, 2014.
[37] Y. Ren, L. Zhang, and P. N. Suganthan, "Ensemble classification and regression: recent developments, applications and future directions," IEEE Computational Intelligence Magazine, vol. 11, no. 1, pp. 41-53, 2016.
[38] U. G. Mangai, S. Samanta, S. Das, and P. R. Chowdhury, "A survey of decision fusion and feature fusion strategies for pattern classification," IETE Technical Review, vol. 27, no. 4, pp. 293-307, 2010.
[39] M. Wozniak, M. Grana, and E. Corchado, "A survey of multiple classifier systems as hybrid systems," Information Fusion, vol. 16, no. 1, pp. 3-17, 2014.
[40] G. Tsoumakas, L. Angelis, and I. Vlahavas, "Selective fusion of heterogeneous classifiers," Intelligent Data Analysis, vol. 9, no. 6, pp. 511-525, 2005.
[41] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," Journal of Computer and System Sciences, vol. 55, no. 1, part 2, pp. 119-139, 1997.
[42] S. Chopra and P. Meindl, Supply Chain Management: Strategy, Planning and Operation, Prentice Hall, 2004.
[43] G. Kushwaha, "Operational performance through supply chain management practices," International Journal of Business and Social Science, vol. 217, pp. 65-77, 2012.
[44] G. E. Box, G. M. Jenkins, G. Reinsel, and G. M. Ljung, Time Series Analysis: Forecasting and Control, Wiley Series in Probability and Statistics, John Wiley & Sons, Hoboken, NJ, USA, 5th edition, 2015.
[45] W.-L. Zhao, C.-H. Deng, and C.-W. Ngo, "k-means: a revisit," Neurocomputing, vol. 291, pp. 195-206, 2018.
[46] C. Li and A. Lim, "A greedy aggregation-decomposition method for intermittent demand forecasting in fashion retailing," European Journal of Operational Research, vol. 269, no. 3, pp. 860-869, 2018.
[47] L. Yue, Y. Yafeng, G. Junjun, and T. Chongli, "Demand forecasting by using support vector machine," in Proceedings of the Third International Conference on Natural Computation (ICNC 2007), pp. 272-276, Haikou, China, August 2007.
[48] L. Yue, L. Zhenjiang, Y. Yafeng, T. Zaixia, G. Junjun, and Z. Bofeng, "Selective and heterogeneous SVM ensemble for demand forecasting," in Proceedings of the 2010 IEEE 10th International Conference on Computer and Information Technology (CIT), pp. 1519-1524, Bradford, UK, June 2010.
[49] T. Efendigil, S. Onut, and C. Kahraman, "A decision support system for demand forecasting with artificial neural networks and neuro-fuzzy models: a comparative analysis," Expert Systems with Applications, vol. 36, no. 3, pp. 6697-6707, 2009.
[50] E. Duman, M. Uysal, and A. F. Alkaya, "Migrating birds optimization: a new metaheuristic approach and its performance on quadratic assignment problem," Information Sciences, vol. 217, pp. 65-77, 2012.


where

(i) F_t is the expected or estimated value for period t,
(ii) A_t is the actual value for period t,
(iii) n is the number of periods.

Accuracy of a model can simply be computed as Accuracy = 1 - MAPE. If the values of the above two measures are small, the forecasting model performs better [43].
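The MAPE and accuracy measures above can be sketched as follows; the function names and sample values are illustrative, not taken from the paper's implementation.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error over n periods: mean(|A_t - F_t| / A_t)."""
    assert len(actual) == len(forecast)
    return sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual)

def accuracy(actual, forecast):
    """Accuracy = 1 - MAPE, as defined in the text."""
    return 1.0 - mape(actual, forecast)

# Example: actual weekly sales vs. forecasts for 3 periods
print(mape([100, 200, 50], [110, 180, 50]))  # (0.10 + 0.10 + 0.00) / 3
```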

Replenishment-oriented demand forecasting in the retail industry requires some modifications to general EL strategies, regarding retail-specific trends and dynamics. For this purpose, the proposed second integration strategy calculates a weighted average of the MAPE over each of the previous 4 weeks and, additionally, the MAPE of the same week and the preceding week of the previous year, as in Equation (4). This makes our forecasts more reliable with respect to trend changes and seasonality behaviors. The MAPE of each week is calculated by the formula defined in Equation (2) above. The proposed model automatically changes the weekly weights of each member algorithm according to their average MAPE for each week at store and product level. Based on the resulting MAPE weights, the best algorithms of the week gain more weight in the final decision.

The average MAPE of each algorithm for a product at each store is calculated with Equation (4). Suppose that the MAPEs of an algorithm are 0.2, 0.3, 0.1, and 0.1 for the previous 4 weeks of the current week, and 0.1 and 0.2 for the same week and the preceding week of the previous year. Assume that the weekly coefficients are 25%, 20%, 10%, 5%, 30%, and 10%, respectively. Then the average MAPE of the algorithm for the current week will be 0.2 * 0.25 + 0.3 * 0.20 + 0.1 * 0.10 + 0.1 * 0.05 + 0.1 * 0.30 + 0.2 * 0.10 = 0.175 according to Equation (4).

M_avg = Σ_{k=1}^{n} C_k M_k    (4)

In Equation (4), the sum of the coefficients should be 1, that is, Σ_{k=1}^{n} C_k = 1, and M_k denotes the MAPE of the related week, where

(i) M1 is the MAPE of the previous week,
(ii) M2 is the MAPE of 2 weeks before the related week,
(iii) M3 is the MAPE of 3 weeks before the related week,
(iv) M4 is the MAPE of 4 weeks before the related week,
(v) M5 is the MAPE of the previous year's same week,
(vi) M6 is the MAPE of the previous year's previous week.
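Equation (4) can be sketched as a short function, using the coefficient and MAPE values from the worked example above (the function and variable names are our own):

```python
def weighted_mape(mapes, coefficients):
    """Equation (4): M_avg = sum(C_k * M_k), where the coefficients sum to 1."""
    assert abs(sum(coefficients) - 1.0) < 1e-9
    return sum(c * m for c, m in zip(coefficients, mapes))

# M1..M4: previous 4 weeks; M5: same week last year; M6: the week before that
mapes = [0.2, 0.3, 0.1, 0.1, 0.1, 0.2]
coefficients = [0.25, 0.20, 0.10, 0.05, 0.30, 0.10]
print(weighted_mape(mapes, coefficients))  # 0.175
```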

The proposed decision integration system computes a MAPE for each algorithm for the coming forecasting week, at store and product level. In addition, the effects of special days are taken into consideration. Christmas, Valentine's Day, Mother's Day, the start of Ramadan, and other religious days are examples of special days. System users can define special days manually by considering the current year's calendar. Thus, the trends of special days are computed automatically by using the previous year's trends. This enables the consideration of seasonality and special events, which results in the evaluation of more accurate forecasting values. Meanwhile, seasonality and other effects can be taken into consideration as well. Following the MAPE calculation of the algorithms, new weights (W_i, where i = 1..n) are assigned to each of them according to their weighted average for the current week.

The next step is the determination of the best algorithms of the week for each store and product couple. We take only the best performing 30% of the methods into consideration as the best algorithms. After many empirical observations, this ratio gives the best results for our dataset. Obviously, these parameters are very specific to the dataset characteristics and also depend on which algorithms are included in the integration strategy. Scaling is the last preprocessing step before the calculation of the final decision forecast. Suppose that we have n algorithms (A1, A2, ..., An) and k of them are the best models of the week for a store and product couple, where k <= n. The weight of each winner is scaled according to its weighted average, as in Equation (5).

W'_t = W_t / Σ_{j=1}^{k} W_j,    t ∈ {1, ..., k}    (5)

Assume that there are 3 best algorithms and their weights are W1 = 30%, W2 = 18%, and W3 = 12%, respectively. Their scaled new weights are going to be

W'_1 = 30/60 = 50%,
W'_2 = 18/60 = 30%,
W'_3 = 12/60 = 20%,    (6)

respectively, according to Equation (5). Scaling makes the calculation more comprehensible. After scaling the weight of each algorithm, the system gives its ultimate decision according to the new weights, considering the performance of each algorithm, as in Equation (7).

F_avg = Σ_{j=1}^{k} F_j W'_j    (7)

In Equation (7), the main constraint is Σ_{j=1}^{k} W'_j = 1, where k is the number of champion algorithms and F_j is the forecast of the related algorithm. Suppose that the best performing algorithms are A1, A2, and A3; for the next week, algorithm A1 forecasts the sales quantity as 20, A2 as 10, and A3 as 5. Let us assume that their scaled weights are 50%, 30%, and 20%, respectively. Then the weighted forecast, according to Equation (7), is

F_avg = 20 * 0.50 + 10 * 0.30 + 5 * 0.20 = 14    (8)
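Equations (5) through (8) can be sketched together in a few lines; the function names are our own, and the numbers reproduce the worked example above.

```python
def scale_weights(weights):
    """Equation (5): normalize the winning algorithms' weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def final_forecast(forecasts, weights):
    """Equation (7): weighted combination of the best algorithms' forecasts."""
    scaled = scale_weights(weights)
    return sum(f * w for f, w in zip(forecasts, scaled))

# Worked example from Equations (6) and (8)
print(scale_weights([0.30, 0.18, 0.12]))                 # about [0.5, 0.3, 0.2]
print(final_forecast([20, 10, 5], [0.30, 0.18, 0.12]))   # 14.0
```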

Every algorithm has a voting right according to its weight. Finally, if a member algorithm does not appear in the list of the best algorithms for a product and store couple for a specific period, it is automatically put into a blacklist, so that it will not be used anymore at that product and store level

[Figure 1 shows the flow of the proposed system: Data Collection -> Data Cleaning -> Data Enrichment from Big Data -> Finding Frequent Items using Apriori -> Replenishment Data Mart on Hadoop/Spark -> Dimensionality Reduction -> Clustering -> Deep Learning Model / Time Series Models / Support Vector Regression Model -> Proposed Integration Strategy -> Final Decision.]

Figure 1: Flowchart of the proposed system.

as it is marked. This enables faster computation, in which the system disregards poor performing algorithms. The algorithm of the proposed demand forecasting system and the flowchart of the proposed system are given in Algorithm 1 and Figure 1, respectively.

4. Experiment Setup

We use an enhanced dataset which includes SOK Market's real-life sales and stock data with enriched features. The dataset consists of 106 weeks of sales data covering 7888 distinct products for different stores of SOK Market. In each store, around 1500 products are actively sold, while the rest rarely become active. The forecasting is performed each week to predict the demand of the following week. Three-fold cross validation is applied on the test data, and the most accurate model is then selected for the forecast of the coming week. This is a common approach for demand forecasting in retail [44]. Furthermore, outside weather information, such as daily temperature and other weather conditions, is joined to the model as new variables. Since it is known that, in most circumstances, there is a certain correlation between shopping trends and weather conditions in retail [33], we enriched the dataset with weather information. The dataset also includes mostly sales-related features filled from relational legacy data sources.


Given: n is the number of stores, m is the number of products, t is the number of algorithms in the system, and A_t is an algorithm with index t. s_mn is the matrix which includes the number of best performing algorithms for each store and product, B_mn is the matrix which contains the set of blacklisted algorithms for each store and product, and F_ij is the matrix which stores the final decision of each forecast, where i is the number of stores and j is the number of products.

for i = 1..n
  for j = 1..m
    for k = 1..t
      if A_k is in list B_ij then continue
      else run A_k and calculate algorithm weight W_k
      end if
    end for
    for k = 1..t
      choose the best performing algorithms and locate them in s_ij
    end for
    for z = 1..s_ij
      do scaling for A_z
    end for
    calculate the proposed integration strategy and store the result in F_ij
  end for
end for
return all F_nm

Algorithm 1: The algorithm of the proposed demand forecasting system.
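The inner loop of Algorithm 1 for one store and product cell can be sketched in Python as below. The forecaster interface, the error-based weighting, and the toy data are illustrative assumptions, not the authors' implementation; the 30% selection ratio follows the text.

```python
import math

def integrate_forecasts(forecasters, history, actual_last_week, blacklist, top_ratio=0.3):
    """One store/product cell of Algorithm 1: skip blacklisted algorithms,
    weight the rest by last week's error, keep the top 30%, scale, combine."""
    errors = {}
    for name, predict in forecasters.items():
        if name in blacklist:          # blacklisted algorithms are not run at all
            continue
        errors[name] = abs(actual_last_week - predict(history)) / actual_last_week
    weights = {n: 1.0 - e for n, e in errors.items()}   # lower error -> higher weight
    k = max(1, math.ceil(top_ratio * len(weights)))     # best performing 30%
    best = sorted(weights, key=weights.get, reverse=True)[:k]
    total = sum(weights[n] for n in best)               # scaling, as in Equation (5)
    return sum(forecasters[n](history) * weights[n] / total for n in best)

# Toy forecasters (illustrative): naive, 4-week mean, and a deliberately bad one
forecasters = {
    "naive": lambda h: h[-1],
    "mean4": lambda h: sum(h[-4:]) / 4,
    "bad":   lambda h: 0.0,
}
print(integrate_forecasts(forecasters, [10, 12, 11, 13], 12, blacklist={"bad"}))
```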

Deep learning approach is used to estimate customerdemand for each product at each store of SOK Marketretail chain There is also very apparent similarity betweenproducts which are mostly in the same basket in terms ofsales trends For this purpose Apriori algorithm [4] is utilizedto find correlated products of each product Apriori algorithmis a fast way to find out most associated products which areat the same basketThemost associated productrsquos sales relatedfeatures are added to dataset as wellThis takes the advantageof relationships among products with similar sales trendsAddition of associated product of a product and its featuresenables DL algorithm to learn better within different hiddenlayers In summary it is observed in extensive experimentsthat enhancements of training dataset with weather data andmost associated products sales data made our estimates morepowerful with the usage of DL algorithm
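The idea of attaching the most associated product can be illustrated with a simple basket co-occurrence count, a minimal stand-in for full Apriori frequent-itemset mining; the product names and baskets are made up.

```python
from collections import Counter
from itertools import combinations

def most_associated(baskets):
    """For each product, find the product it co-occurs with most often."""
    pair_counts = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            pair_counts[(a, b)] += 1
    best = {}
    for (a, b), n in pair_counts.items():
        for x, y in ((a, b), (b, a)):
            if n > best.get(x, (None, 0))[1]:
                best[x] = (y, n)
    return {p: q for p, (q, _) in best.items()}

baskets = [["bread", "milk"], ["bread", "milk", "eggs"], ["milk", "eggs"]]
print(most_associated(baskets))
```

In a full pipeline, the Apriori support threshold would prune rare pairs before this lookup; here every pair is counted for brevity.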

The dataset includes weekly data of each product at each store for the latest 8 weeks, plus the sales quantities of the last 4 weeks of the same season. Historically, 2 years of data are included for each of the 5500 stores and 1500 products. The data size is about 875 million records with 155 features. These features include sales quantity, customer return quantity, product quantities received from distribution centers, sales amount, discount amount, receipt counts of a product, counts of customers who bought a specific product, sales quantity for each day of the week (Monday, Tuesday, etc.), maximum and minimum stock level of each day, average stock level for each week, sales quantity for each hour of the day, and also the sales quantities of the last 4 weeks of the same season. Since there is a relationship between products which are generally in the same basket, we prepared the same features defined above for the most associated products as well. A detailed explanation of the dataset is presented in Table 2.

In Table 2, fields given with square brackets denote an array over weeks (e.g., Sales Quantity Week[0] means the sales quantity of the current week, Sales Quantity Week[1] is the sales quantity of one week before, and so on). According to the experimental results, taking weekly data for the latest 8 weeks and for 4 weeks of the same season in the previous year captures acceptable trends of changes in seasonality.

For the DL implementation part, the H2O library [31] is used as an open source big data artificial intelligence platform. H2O is a powerful machine learning library that gives us the opportunity to implement DL algorithms on Hadoop/Spark big data environments. It combines the power of highly advanced machine learning algorithms with the benefit of truly scalable in-memory processing on the Spark big data environment, on one or many nodes, via its Sparkling Water version. With the help of the in-memory processing capability of Spark technologies, it provides a faster parallel platform to utilize big data for business requirements with maximum benefit. In this study, we use H2O version 3.14.0.2 on a 64-bit, 16-core, 32 GB memory machine to observe the power of parallelism during DL modelling while increasing the number of neurons.

A multilayer feedforward artificial neural network (MLFANN) is employed as the deep learning algorithm. The MLFANN is trained with stochastic gradient descent using backpropagation. In gradient descent, each weight is adjusted according to its contribution to the error. H2O also has advanced features like momentum training, adaptive learning rate, L1 or L2 regularization, rate annealing,


Table 2: Dataset explanation.

Name | Description | Example
Year-week | Related year-week; weeks run from Monday to Sunday | 201801
Store | Store number | 1234
Product | Product identification number | 1
Product Adjacent | Product most associated with the product according to the Apriori algorithm, i.e., the most frequent product in the same basket with a specific product | 2
Stock In Quantity Week[0-8] | Stock increase quantity of the product in the related week, e.g., stock transfer quantity from distribution center to store | 50
Return Quantity Week[0-8] | Stock return quantity from customers in a specific week and store | 20
Sales Quantity Week[0-8] | Weekly sales quantity of the related product at a specific store | 120
Sales Amount Week[0-8] | Total sales amount of the product on the customer receipt | 2500
Discount Amount Week[0-8] | Discount amount of the product, if any | 500
Customer Count Week[0-8] | How many customers bought this product in a specific week | 30
Receipt Count Week[0-8] | Distinct receipt count for the related product | 20
Sales Quantity Time[9-22] | Hourly sales quantity of the related product from 9 am to 10 pm | 5
Last4weeks Day[1-7] | Total sales on each weekday over the last 4 weeks (total sales of Mondays, Tuesdays, etc.) | 10
Last8weeks Day[1-7] | Total sales on each weekday over the last 8 weeks (total sales of Mondays, Tuesdays, etc.) | 10
Max Stock Week[0-8] | Maximum stock quantity of the related week | 12
Min Stock Week[0-8] | Minimum stock quantity of the related week | 2
Avg Stock Week[0-8] | Average stock quantity of the related week | 5
Sales Quantity Adj Week[0-8] | Sales quantity of the most associated product | 14
Temperature Weekday[1-7] | Daily temperature on weekdays (Monday, Tuesday, etc.) | 22
Weekly Avg Temperature[0-8] | Average weather temperature of the related week | 23
Weather Condition Weekday[1-7] | Nominal variable: rainy, snowy, sunny, cloudy, etc. | Rainy
Sales Quantity Next Week | Target variable of the prediction problem | 25

dropout, and grid search. A Gaussian distribution is applied because a continuous variable is used as the response variable. H2O performs very well in our environment when 3 hidden layers with 10 nodes each and a total of 300 epochs are set as parameters.

After the dimension reduction step, K-means is employed as the clustering algorithm. K-means is a greedy and fast clustering algorithm [45] that tries to partition samples into k clusters such that each sample is near its cluster center. K is selected as 20 because, after several trials and empirical observations, this yields the most even distribution on the dataset, dividing it into 20 different clusters for each store. After that, the deep learning algorithm is applied and 20 different DL models are obtained for each store. Demand forecasting for each product is then performed using its cluster's model on a store basis. The main reason for using clustering is the time and computing cost of the DL algorithm. Instead of building a model for each product, 20 models are generated for each store at product level. In our trials with sample datasets, the bias difference is less than 2%, so this speed-up is very beneficial and a good option for researchers. The forecasting result of the DL model is transferred into our forecasting system, which makes its final decision by considering the decisions of 11 forecasting methods, including the DL model.
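The clustering step can be sketched with a plain Lloyd's-algorithm K-means; this is a minimal stand-in for the K-means used in the paper, with 2-D toy points and k = 2 for readability (the paper uses k = 20 on the reduced feature vectors).

```python
import random

def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's algorithm: split one store's products into k groups so
    that one DL model per cluster replaces a model per product."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        centroids = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    labels = [min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
              for p in points]
    return centroids, labels

# Illustrative 2-D feature vectors for four products; two clear groups
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
centroids, labels = kmeans(points, 2)
print(labels)  # the two low points share one cluster, the two high points the other
```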

Forecasting solutions should be scalable, process large amounts of data, and extract a model from the data. Big data environments using Spark technologies give us the opportunity to implement algorithms with scalability, sharing the tasks of machine learning algorithms among the nodes of parallel architectures. Considering the huge number of samples and the large number of features, even the computational power of a parallel architecture is not enough in some cases. For this reason, dimension reduction is needed for big data applications. In this study, PCA is employed as a feature extraction step.
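A minimal PCA feature-extraction step can be sketched with NumPy via the SVD; the reduced dimensionality of 20 is illustrative, since the paper does not state the exact number of retained components.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components via SVD."""
    Xc = X - X.mean(axis=0)                 # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T         # scores in the reduced space

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 155))             # 155 features, as in the dataset
Z = pca_reduce(X, 20)
print(Z.shape)  # (100, 20)
```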

5. Experimental Results


SOK Market sells 21 different groups of items. We assess the performance of the proposed forecasting system on a group basis, as shown in Table 3. Table 3 shows the MAPE (Mean Absolute Percentage Error) of the integration strategies (S1) and (S2)

Table 3: Demand forecasting improvements per product group.

Product Group | S1 MAPE (Method I, without proposed integration strategy) | S2 MAPE (Method II, with proposed integration strategy) | S_D MAPE (proposed integration strategy with deep learning) | P1 = (S1 - S2)/S1 (%) | P2 = (S1 - S_D)/S1 (%) | Improvement D = P2 - P1 (%)
Baby Products | 0.5157 | 0.3081 | 0.2927 | 40.26 | 43.25 | 2.99
Bakery Products | 0.3482 | 0.2059 | 0.1966 | 40.85 | 43.52 | 2.67
Beverage | 0.3714 | 0.2316 | 0.2207 | 37.64 | 40.58 | 2.94
Biscuit-Chocolate | 0.3358 | 0.2077 | 0.1977 | 38.14 | 41.13 | 2.99
Breakfast Products | 0.4443 | 0.2770 | 0.2661 | 37.65 | 40.11 | 2.45
Canned-Paste-Sauces | 0.3836 | 0.2309 | 0.2198 | 39.80 | 42.69 | 2.89
Cheese | 0.3953 | 0.2457 | 0.2345 | 37.84 | 40.68 | 2.84
Cleaning Products | 0.4560 | 0.2791 | 0.2650 | 38.79 | 41.89 | 3.10
Cosmetics Products | 0.5397 | 0.3266 | 0.3148 | 39.49 | 41.67 | 2.19
Deli Meats | 0.4242 | 0.2602 | 0.2488 | 38.65 | 41.36 | 2.70
Edible Oils | 0.4060 | 0.2299 | 0.2215 | 43.36 | 45.45 | 2.09
Household Goods | 0.5713 | 0.3656 | 0.3535 | 36.01 | 38.13 | 2.12
Ice Cream-Frozen | 0.5012 | 0.3255 | 0.3106 | 35.05 | 38.03 | 2.98
Legumes-Pasta-Soup | 0.3850 | 0.2397 | 0.2269 | 37.74 | 41.07 | 3.33
Nuts-Chips | 0.3316 | 0.2049 | 0.1966 | 38.20 | 40.71 | 2.51
Poultry-Eggs | 0.4219 | 0.2527 | 0.2403 | 40.11 | 43.04 | 2.94
Ready Meals | 0.4613 | 0.2610 | 0.2520 | 43.42 | 45.36 | 1.94
Red Meat | 0.2514 | 0.1616 | 0.1532 | 35.71 | 39.06 | 3.35
Tea-Coffee Products | 0.4347 | 0.2650 | 0.2535 | 39.04 | 41.68 | 2.64
Textile Products | 0.5418 | 0.3048 | 0.2907 | 43.74 | 46.34 | 2.60
Tobacco Products | 0.3791 | 0.2378 | 0.2290 | 37.29 | 39.61 | 2.32
Average | 0.4238 | 0.2582 | 0.2469 | 38.99 | 41.68 | 2.69

Figure 2: MAPE distribution according to product groups with the S1 integration strategy. [Box plots of MAPE, on a 0.00 to 1.00 scale, for each of the 21 product groups.]

in columns 2 and 3. The average percentage forecasting errors of strategies (S1) and (S2) are 0.42 and 0.26, respectively. As can be seen from Table 3, the percentage success rate of (S2) with respect to (S1) is presented in column 5. The current integration strategy (S2) provides a 38.99% improvement over the first integration strategy (S1) on average. In some product groups, like Textile Products, the error rate is reduced from 0.54 to 0.30, a 43.74% enhancement in MAPE.

With this work, a further improvement is obtained by utilizing the DL model, compared with the usage of just the 10 algorithms on which the forecasting strategy is based. Column 4 of Table 3 demonstrates the MAPEs of the 21 different groups of products after the addition of the DL model. After the inclusion of the DL model in the proposed system, we observed around a 2% to 3.4% increase in demand forecast accuracy in some of the product groups. The error rates of some product groups, like "baby products", "cosmetic products", and "household goods", are usually higher than the others, since these product groups do not have continuous consumer demand. For example, the error rate for the "household goods" group is 0.35, while the error rate for the "red meat" group is 0.15. The error rates for food products are usually lower, since customers buy these products regularly according to their daily consumption. However, the error rates are higher for baby, textile, and household goods. Customers buy products from these groups selectively, depending on whether they like the product or not. In addition, the prices of goods in these groups and the promotion effects are usually high relative to those in other groups.

Moreover, when the percentage success rate column is analyzed, it is observed that the top benefitting product groups for Method I achieve over a 40% success rate: Baby Products, Bakery Products, Edible Oils, Poultry-Eggs, Ready Meals, and Textile Products. The common point of these groups is that each of them is consumed regularly by a specific customer segment. For instance, the baby products group is chosen by families with kids; Edible Oils, Poultry-Eggs, and Bakery Products are preferred by frequently and continuously shopping customer segments; and Ready Meals are preferred mostly by bachelor, single, and working consumers. Furthermore, the inclusion of DL in the forecasting system indicates that some other consuming groups (for example, Cleaning Products, Red Meat, and Legumes-Pasta-Soup) exhibit better performance than the others, with an additional improvement of over 3%.

Figure 2 shows the box plots of mean absolute percentage errors (MAPE) for each group of products after application of the (S1) integration strategy. The box plots enable us to analyze the distributional characteristics of forecasting errors for product groups. As can be seen from the figure, the medians differ across product groups; thus the forecasting errors are usually dissimilar for different product groups. The interquartile-range box represents the middle 50% of scores in the data. The lengths of the interquartile-range boxes are usually very long, which means that there are quite a number of different prediction errors within the products of a given group.
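The quartile boundaries read off these box plots can be reproduced from per-product error values; a minimal numpy sketch with made-up MAPE values (the real per-product errors are not published):

```python
import numpy as np

# Hypothetical per-product MAPE values for one product group (illustrative only).
mapes = np.array([0.12, 0.21, 0.25, 0.33, 0.41, 0.48, 0.55, 0.62, 0.75])

# Quartile boundaries that define the box plot: Q1, median (Q2), Q3.
q1, median, q3 = np.percentile(mapes, [25, 50, 75])
iqr = q3 - q1  # length of the interquartile-range box (middle 50% of scores)

print(f"Q1={q1:.2f} median={median:.2f} Q3={q3:.2f} IQR={iqr:.2f}")
```

A long IQR box, as observed in Figures 2 and 3, corresponds to a large spread of prediction errors within the group.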

Figure 3 presents the box plots of the MAPEs of the proposed forecasting system applying the proposed integration approach (S2), which combines the predictions of 10 different forecasting algorithms.

Figure 4 shows the results after adding DL to our system. The lengths of the interquartile-range boxes are narrower when compared to the ones in Figures 2 and 3, which means that the spread of forecasting errors is smaller than in Figures 2 and 3. Finally, the integration of DL into the proposed forecasting system generates results that are more accurate.

12 Complexity

[Box plots of MAPE (y-axis, 0.00-1.00) across the 21 product groups (x-axis).]

Figure 3: MAPE distribution according to product groups with S2 integration strategy.

[Box plots of MAPE (y-axis, 0.00-1.00) across the 21 product groups (x-axis).]

Figure 4: MAPE distribution according to product groups after adding deep learning algorithms.

For example, for the (S1) integration strategy in the Baby Products product group, the MAPE distribution for the 1st quartile of the data is observed between 0% and 23%, for the 2nd quartile between 23% and 48%, and for the 3rd and 4th between 48% and 75% in Figure 2. After applying the integration strategy for the Baby Products product group, the MAPE distribution becomes 0%-16% for the 1st quartile, 16%-30% for the 2nd quartile of the data, and 30%-50% for the rest. According to the boundary differences, this gives around an 8% improvement for the first quartile, from 8% to 18% enhancement for the second quartile, and from 18% to 25% advancement for the rest of the data. The median of the MAPE distribution is analyzed as 48%; after applying the proposed integration strategy, it is observed as 27%, which means a 21% improvement. Approximately 1%-3% enhancement is observed for each quartile of the data with the inclusion of the deep learning strategy in Figure 4.

For Cleaning Products, improvements similar to those of the other groups are observed with the integration strategy. The MAPE distribution of the 1st quartile of the data is observed between 0% and 19%, for the 2nd quartile between 19% and 40%, and for the rest of the data between 40% and


[Line chart of weekly total sales quantity (0-9000) against date, July 2017 through July 2018, for the Actual, S1, S2, and SD series.]

Figure 5: Accuracy comparison among integration strategies for one sample product group.

73% in Figure 2. After performing the proposed integration strategy for Cleaning Products, the MAPE distribution becomes 0%-10% for the 1st quartile of the data, 10%-17% for the 2nd quartile, and 27%-43% for the 3rd and 4th quartiles together. Improvements are observed as follows: 9% for the 1st quartile, from 9% to 13% for the 2nd quartile, and from 13% to 20% for the rest of the data at the boundaries. After implementation of the integration strategy, the system advancement is observed as 2% for the 1st and 2nd quartiles and 3% for the 3rd and 4th quartiles, respectively. Briefly, the integration strategy improvement is remarkable for each product group, and consolidating DL into the proposed integration strategy additionally advances results by almost 2% to 3%.

In Figure 5, the accuracy comparison among integration strategies for one of the best performing sample groups is presented over a 1-year period. In particular, Christmas and other seasonality effects can be seen very clearly in Figure 5, and the integration strategy with DL (SD) performs with the best accuracy.

It is hard to compare the performance of our proposed system with other studies because of the lack of works with a similar combination of deep learning approach, the proposed decision integration strategies, and different learning algorithms for the demand forecasting process. Another difficulty is the scarcity of real-life datasets similar to the SOK dataset on which to compare state-of-the-art studies. Although the results of the proposed system are given in this study, we also report here the results of a number of research works on the demand forecasting domain. A greedy aggregation-decomposition method solved a real-world intermittent demand forecasting problem for a fashion retailer in Singapore [46]; the authors report a 59% MAPE success rate with a small dataset. The authors of [47] compare different approaches, such as a statistical model, the Winters model, and a radial basis function neural network (RBFNN), with SVM for the demand forecasting process. They conclude the study with the fact that the success of the SVM algorithm surpasses the others, with around a 77% enhancement at the average MAPE level. A novel demand forecasting model called SHEnSVM (Selective and Heterogeneous Ensemble of Support Vector Machines) is proposed in [48]. In this model, individual SVMs are trained on different samples generated by a bootstrap algorithm; after that, a genetic algorithm is employed to retrieve the best individual combination schema. They report a 10% advancement with the SVM algorithm and a 64% MAPE improvement, although they only employ beer data with 3 different brands in their experiments. Tugba Efendigil et al. propose a novel forecasting mechanism modeled by artificial intelligence approaches in [49]. They compare both artificial neural networks and adaptive network-based fuzzy inference system techniques to manage fuzzy demand with incomplete information, and they reached around 18% MAPE rates for some products during their experiments.

6. Conclusion

In the retail industry, demand forecasting is one of the main problems of supply chains: optimizing stocks, reducing costs, and increasing sales, profit, and customer loyalty. To overcome this issue, there are several methods, such as time series analysis and machine learning approaches, to analyze and learn complex interactions and patterns from historical data.

In this study, a novel attempt is made to integrate 11 different forecasting models that include time series algorithms, a support vector regression model, and a deep learning method for the demand forecasting process. Moreover, a novel decision integration strategy is developed by drawing inspiration from the ensemble methodology, namely boosting. The proposed approach considers the performance of each model over time and combines the weighted predictions of the best performing models for the demand forecasting process. The


proposed forecasting system is tested and evaluated on real-life data obtained from the SOK Market retail chain. It is observed that the inclusion of learning algorithms other than time series models, together with a novel integration strategy, advanced the performance of the demand forecasting system. To the best of our knowledge, this is the very first attempt to consolidate deep learning methodology, the SVR algorithm, and different time series methods for demand forecasting systems. Furthermore, the other novelty of this work is the adaptation of the boosting ensemble strategy to the demand forecasting model. In this way, the final decision of the proposed system is based on the best algorithms of the week, which gain more weight. This makes our forecasts more reliable with respect to trend changes and seasonality behaviors. Moreover, the proposed approach performs very well integrated with the deep learning algorithm on the Spark big data environment. The dimension reduction process and clustering methods help to decrease time consumption and required computing power during the deep learning modelling phase. Although a review of some similar studies is presented in Section 5, as expected it is very difficult to compare the results of other studies with ours because of the use of different datasets and methods. In this study, we compare the results of three models: model one selects the best performing forecasting method depending on its success in the previous period, with 42.4% MAPE on average; the second model, with the novel integration strategy, results in 25.8% MAPE on average; and the last model, the novel integration strategy enhanced with the deep learning approach, provides 24.7% MAPE on average. As a result, the inclusion of the deep learning approach in the novel integration strategy reduces the average prediction error for the demand forecasting process in the supply chain.

As future work, we plan to enrich the set of features by gathering data from other sources like economic studies, shopping trends, social media, social events, and location-based demographic data of stores; the contributions of new varieties of data sources to deep learning can then be observed. One more study can be done to determine the hyperparameters for the deep learning algorithm. In addition, we also plan to use other deep learning techniques, such as convolutional neural networks, recurrent neural networks, and deep neural networks, as learning algorithms. Furthermore, another objective of ours is to use heuristic methods, MBO (Migrating Birds Optimization) and other related algorithms [50], to optimize some of the coefficients/weights which were determined empirically by trial and error, like taking 30 percent of the best performing methods in our current system.

Data Availability

This dataset is private customer data, so the agreement with the organization (SOK) does not allow sharing the data.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This work is supported by the OBASE Research & Development Center.

References

[1] P. J. McGoldrick and E. Andre, "Consumer misbehaviour: promiscuity or loyalty in grocery shopping," Journal of Retailing and Consumer Services, vol. 4, no. 2, pp. 73-81, 1997.

[2] D. Grewal, A. L. Roggeveen, and J. Nordfalt, "The future of retailing," Journal of Retailing, vol. 93, no. 1, pp. 1-6, 2017.

[3] E. T. Bradlow, M. Gangwar, P. Kopalle, and S. Voleti, "The role of big data and predictive analytics in retailing," Journal of Retailing, vol. 93, no. 1, pp. 79-95, 2017.

[4] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, USA, 2013.

[5] R. Hyndman and G. Athanasopoulos, Forecasting: Principles and Practice, OTexts, Melbourne, Australia, 2018, http://otexts.org/fpp2.

[6] C. C. Pegels, "Exponential forecasting: some new variations," Management Science, vol. 12, pp. 311-315, 1969.

[7] E. S. Gardner, "Exponential smoothing: the state of the art," Journal of Forecasting, vol. 4, no. 1, pp. 1-28, 1985.

[8] D. Gujarati, Basic Econometrics, McGraw-Hill, New York, NY, USA, 2003.

[9] J. D. Croston, "Forecasting and stock control for intermittent demands," Operational Research Quarterly, vol. 23, no. 3, pp. 289-303, 1972.

[10] T. R. Willemain, C. N. Smart, J. H. Shockor, and P. A. DeSautels, "Forecasting intermittent demand in manufacturing: a comparative evaluation of Croston's method," International Journal of Forecasting, vol. 10, no. 4, pp. 529-538, 1994.

[11] A. Syntetos, Forecasting of Intermittent Demand, Brunel University, 2001.

[12] A. A. Syntetos and J. E. Boylan, "The accuracy of intermittent demand estimates," International Journal of Forecasting, vol. 21, no. 2, pp. 303-314, 2005.

[13] R. H. Teunter, A. A. Syntetos, and M. Z. Babai, "Intermittent demand: linking forecasting to inventory obsolescence," European Journal of Operational Research, vol. 214, no. 3, pp. 606-615, 2011.

[14] R. J. Hyndman, R. A. Ahmed, G. Athanasopoulos, and H. L. Shang, "Optimal combination forecasts for hierarchical time series," Computational Statistics & Data Analysis, vol. 55, no. 9, pp. 2579-2589, 2011.

[15] R. Fildes and F. Petropoulos, "Simple versus complex selection rules for forecasting many time series," Journal of Business Research, vol. 68, no. 8, pp. 1692-1701, 2015.

[16] C. Li and A. Lim, "A greedy aggregation-decomposition method for intermittent demand forecasting in fashion retailing," European Journal of Operational Research, vol. 269, no. 3, pp. 860-869, 2018.

[17] F. Turrado García, L. J. García Villalba, and J. Portela, "Intelligent system for time series classification using support vector machines applied to supply-chain," Expert Systems with Applications, vol. 39, no. 12, pp. 10590-10599, 2012.

[18] G. Song and Q. Dai, "A novel double deep ELMs ensemble system for time series forecasting," Knowledge-Based Systems, vol. 134, pp. 31-49, 2017.

[19] O. Araque, I. Corcuera-Platas, J. F. Sánchez-Rada, and C. A. Iglesias, "Enhancing deep learning sentiment analysis with ensemble techniques in social applications," Expert Systems with Applications, vol. 77, pp. 236-246, 2017.

[20] H. Tong, B. Liu, and S. Wang, "Software defect prediction using stacked denoising autoencoders and two-stage ensemble learning," Information and Software Technology, vol. 96, pp. 94-111, 2018.

[21] X. Qiu, Y. Ren, P. N. Suganthan, and G. A. J. Amaratunga, "Empirical mode decomposition based ensemble deep learning for load demand time series forecasting," Applied Soft Computing, vol. 54, pp. 246-255, 2017.

[22] Z. Qi, B. Wang, Y. Tian, and P. Zhang, "When ensemble learning meets deep learning: a new deep support vector machine for classification," Knowledge-Based Systems, vol. 107, pp. 54-60, 2016.

[23] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: a review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798-1828, 2013.

[24] G. E. Hinton, S. Osindero, and Y. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527-1554, 2006.

[25] Y. Bengio, "Practical recommendations for gradient-based training of deep architectures," in Neural Networks: Tricks of the Trade, vol. 7700 of Lecture Notes in Computer Science, pp. 437-478, Springer, Berlin, Germany, 2nd edition, 2012.

[26] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, pp. 436-444, 2015.

[27] S. Kim, Z. Yu, R. M. Kil, and M. Lee, "Deep learning of support vector machines with class probability output networks," Neural Networks, vol. 64, pp. 19-28, 2015.

[28] P. J. Brockwell and R. A. Davis, Time Series: Theory and Methods, Springer Series in Statistics, New York, NY, USA, 1989.

[29] V. N. Vapnik, Statistical Learning Theory, Wiley-Interscience, New York, NY, USA, 1998.

[30] S. Haykin, Neural Networks and Learning Machines, Macmillan Publishers Limited, New Jersey, NJ, USA, 2009.

[31] A. Candel, V. Parmar, E. LeDell, and A. Arora, Deep Learning with H2O, United States of America, 2018.

[32] K. Keerthi Vasan and B. Surendiran, "Dimensionality reduction using principal component analysis for network intrusion detection," Perspectives in Science, vol. 8, pp. 510-512, 2016.

[33] L. Rokach, "Ensemble-based classifiers," Artificial Intelligence Review, vol. 33, no. 1-2, pp. 1-39, 2010.

[34] R. Polikar, "Ensemble based systems in decision making," IEEE Circuits and Systems Magazine, vol. 6, no. 3, pp. 21-45, 2006.

[35] S. Reid, A Review of Heterogeneous Ensemble Methods, Department of Computer Science, University of Colorado at Boulder, 2007.

[36] D. Gopika and B. Azhagusundari, "An analysis on ensemble methods in classification tasks," International Journal of Advanced Research in Computer and Communication Engineering, vol. 3, no. 7, pp. 7423-7427, 2014.

[37] Y. Ren, L. Zhang, and P. N. Suganthan, "Ensemble classification and regression: recent developments, applications and future directions," IEEE Computational Intelligence Magazine, vol. 11, no. 1, pp. 41-53, 2016.

[38] U. G. Mangai, S. Samanta, S. Das, and P. R. Chowdhury, "A survey of decision fusion and feature fusion strategies for pattern classification," IETE Technical Review, vol. 27, no. 4, pp. 293-307, 2010.

[39] M. Wozniak, M. Grana, and E. Corchado, "A survey of multiple classifier systems as hybrid systems," Information Fusion, vol. 16, no. 1, pp. 3-17, 2014.

[40] G. Tsoumakas, L. Angelis, and I. Vlahavas, "Selective fusion of heterogeneous classifiers," Intelligent Data Analysis, vol. 9, no. 6, pp. 511-525, 2005.

[41] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," Journal of Computer and System Sciences, vol. 55, no. 1, part 2, pp. 119-139, 1997.

[42] S. Chopra and P. Meindl, Supply Chain Management: Strategy, Planning and Operation, Prentice Hall, 2004.

[43] G. Kushwaha, "Operational performance through supply chain management practices," International Journal of Business and Social Science, vol. 217, pp. 65-77, 2012.

[44] G. E. Box, G. M. Jenkins, G. Reinsel, and G. M. Ljung, Time Series Analysis: Forecasting and Control, Wiley Series in Probability and Statistics, John Wiley & Sons, Hoboken, NJ, USA, 5th edition, 2015.

[45] W.-L. Zhao, C.-H. Deng, and C.-W. Ngo, "k-means: a revisit," Neurocomputing, vol. 291, pp. 195-206, 2018.

[46] C. Li and A. Lim, "A greedy aggregation-decomposition method for intermittent demand forecasting in fashion retailing," European Journal of Operational Research, vol. 269, no. 3, pp. 860-869, 2018.

[47] L. Yue, Y. Yafeng, G. Junjun, and T. Chongli, "Demand forecasting by using support vector machine," in Proceedings of the Third International Conference on Natural Computation (ICNC 2007), pp. 272-276, Haikou, China, August 2007.

[48] L. Yue, L. Zhenjiang, Y. Yafeng, T. Zaixia, G. Junjun, and Z. Bofeng, "Selective and heterogeneous SVM ensemble for demand forecasting," in Proceedings of the 2010 IEEE 10th International Conference on Computer and Information Technology (CIT), pp. 1519-1524, Bradford, UK, June 2010.

[49] T. Efendigil, S. Onut, and C. Kahraman, "A decision support system for demand forecasting with artificial neural networks and neuro-fuzzy models: a comparative analysis," Expert Systems with Applications, vol. 36, no. 3, pp. 6697-6707, 2009.

[50] E. Duman, M. Uysal, and A. F. Alkaya, "Migrating birds optimization: a new metaheuristic approach and its performance on quadratic assignment problem," Information Sciences, vol. 217, pp. 65-77, 2012.



[Flowchart: Data Collection → Data Cleaning → Data Enrichment from Big Data → Finding Frequent Items using Apriori → Replenishment Data Mart on Hadoop/Spark → Dimensionality Reduction → Clustering → {Deep Learning Model, Time Series Models, Support Vector Regression Model} → Proposed Integration Strategy → Final Decision.]

Figure 1: Flowchart of the proposed system.

as it is marked. This enables faster computation time, since the system disregards poor performing algorithms. The algorithm of the proposed demand forecasting system and the flowchart of the proposed system are given in Algorithm 1 and Figure 1, respectively.

4. Experiment Setup

We use an enhanced dataset which includes SOK Market's real-life sales and stock data with enriched features. The dataset consists of 106 weeks of sales data that include 7,888 distinct products for different stores of SOK Market. In each store, around 1,500 products are actively sold, while the rest rarely become active. The forecasting is performed each week to predict demand for the following week. A three-fold cross-validation methodology is applied on the test data, and the most accurate model is then selected for the coming week's forecast; this is a common approach to demand forecasting in retail [44]. Furthermore, outside weather information, such as daily temperature and other weather conditions, is joined to the model as new variables. Since it is known that there is a certain correlation between shopping trends and weather conditions in retail in most circumstances [33], we enriched the dataset with the weather information. The dataset also includes mostly sales-related features filled from relational legacy data sources.
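The weekly model-selection step above can be sketched as follows; the two candidate forecasters and the validation loop are simplified stand-ins for the paper's actual model pool and cross-validation setup, not its implementation:

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error over nonzero actuals."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.mean(np.abs((actual - forecast) / actual)))

# Two toy forecasters standing in for the candidate models.
def naive_last(history):
    return history[-1]                 # repeat last week's sales

def moving_avg(history):
    return np.mean(history[-4:])       # 4-week moving average

def pick_best(history, models, folds=3):
    """Score each model on the last `folds` held-out weeks and return
    the name of the model with the lowest average MAPE."""
    scores = {}
    for name, model in models.items():
        errs = [mape([history[-k]], [model(history[:-k])])
                for k in range(1, folds + 1)]
        scores[name] = np.mean(errs)
    return min(scores, key=scores.get)

sales = [100, 110, 105, 120, 118, 119, 121, 120]   # toy weekly history
best = pick_best(sales, {"naive": naive_last, "ma4": moving_avg})
print(best, "is used for next week's forecast")
```

The chosen model is then used only for the coming week; the selection is repeated every week as new actuals arrive.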


Given: n is the number of stores, m is the number of products, t is the number of algorithms in the system, and A_t is the algorithm with index t. s_mn is the matrix which holds the number of best performing algorithms for each store and product, B_mn is the matrix which contains the blacklist of algorithms for each store and product, and F_ij is the matrix which stores the final decision of each forecast, where i indexes stores and j indexes products.

for i = 1..n
    for j = 1..m
        for k = 1..t
            if A_k is in blacklist B_ij then
                continue
            else
                run A_k and calculate algorithm weight W_k
            end if
        end for
        for k = 1..t
            choose the best performing algorithms and place them in s_ij
        end for
        for z = 1..s_ij
            scale the forecast of A_z
        end for
        calculate the proposed integration strategy and store the result in F_ij
    end for
end for
return all F_nm

Algorithm 1: The algorithm of the proposed demand forecasting system.
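The integration step of Algorithm 1 can be sketched compactly, assuming inverse-error weighting and the 30% best-performer cut mentioned in the conclusion; the method names, error values, and exact weighting formula here are illustrative, not the paper's:

```python
import numpy as np

def integrate(forecasts, past_errors, keep_frac=0.30):
    """Boosting-inspired decision integration (sketch).
    forecasts:   {method name: forecast for next week}
    past_errors: {method name: recent MAPE of that method}
    Keeps the best `keep_frac` of methods (lowest error) and averages their
    forecasts, weighting each by inverse error so better methods gain weight."""
    ranked = sorted(past_errors, key=past_errors.get)          # best first
    kept = ranked[:max(1, int(len(ranked) * keep_frac))]
    weights = np.array([1.0 / past_errors[m] for m in kept])
    weights /= weights.sum()                                   # normalize
    return float(np.dot(weights, [forecasts[m] for m in kept]))

forecasts = {"ARIMA": 100, "Holt": 90, "SVR": 110, "DL": 104, "Croston": 130}
past_errors = {"ARIMA": 0.20, "Holt": 0.35, "SVR": 0.25, "DL": 0.15, "Croston": 0.50}
print(integrate(forecasts, past_errors))
```

With these toy numbers, only the single best method (DL) survives the 30% cut of five methods, so the integrated forecast equals its prediction; with a larger pool, several methods would contribute with unequal weights.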

The deep learning approach is used to estimate customer demand for each product at each store of the SOK Market retail chain. There is also a very apparent similarity, in terms of sales trends, between products which are mostly in the same basket. For this purpose, the Apriori algorithm [4] is utilized to find the correlated products of each product. The Apriori algorithm is a fast way to find the most associated products, which occur in the same basket. The most associated product's sales-related features are added to the dataset as well. This takes advantage of relationships among products with similar sales trends. Adding a product's associated product and its features enables the DL algorithm to learn better within different hidden layers. In summary, it is observed in extensive experiments that enhancing the training dataset with weather data and most-associated-product sales data made our estimates more powerful with the usage of the DL algorithm.
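Since only the single most associated product is needed per product, the Apriori step can be approximated by a plain pair count over baskets; this sketch (with invented baskets) omits Apriori's minimum-support pruning:

```python
from collections import defaultdict, Counter
from itertools import permutations

def most_associated(baskets):
    """For each product, find the product it most often shares a receipt with."""
    co = defaultdict(Counter)
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            co[a][b] += 1                       # count co-occurrences per pair
    return {a: counts.most_common(1)[0][0] for a, counts in co.items()}

# Toy receipts (product names are invented for illustration).
baskets = [
    {"diapers", "baby_food", "milk"},
    {"diapers", "baby_food"},
    {"diapers", "baby_food", "bread"},
    {"milk", "bread"},
    {"diapers", "milk"},
]
assoc = most_associated(baskets)
print(assoc["diapers"])  # the product most often bought together with diapers
```

The resulting partner product's features (the `Sales Quantity Adj` style columns of Table 2) are then appended to each product's feature row.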

The dataset includes weekly based data of each product at each store for the latest 8 weeks, plus the sales quantities of the last 4 weeks in the same season. Historically, 2 years of data are included for each of 5,500 stores and 1,500 products. The data size is about 875 million records with 155 features. These features include sales quantity, customer return quantity, product quantities received from distribution centers, sales amount, discount amount, receipt counts of a product, counts of customers who bought a specific product, sales quantity for each day of the week (Monday, Tuesday, etc.), maximum and minimum stock level of each day, average stock level for each week, sales quantity for each hour of the day, and also sales quantities of the last 4 weeks of the same season. Since there is a relationship between products which are generally in the same basket, we prepared the same features defined above for the most associated products as well. A detailed explanation of the dataset is presented in Table 2.

In Table 2, fields given with square brackets are arrays over weeks (e.g., Sales Quantity Week 0 means the sales quantity of the current week, Sales Quantity Week 1 is the sales quantity one week before, and so on). Regarding the experimental results, taking weekly data for the latest 8 weeks plus 4 weeks of the same season in the previous year captures more acceptable trends of changes in seasonality.
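The Week[0-8] style lag arrays of Table 2 can be derived from a weekly sales series as in this sketch; the function name and row layout are illustrative, not the paper's actual feature pipeline:

```python
def lag_features(weekly_sales, lags=9):
    """Turn a weekly series into rows of [week_0, week_1, ..., week_8] lags,
    i.e. the current week plus the 8 preceding weeks, paired with next
    week's sales as the target (Sales Quantity Next Week)."""
    rows = []
    for i in range(lags - 1, len(weekly_sales) - 1):
        lagged = [weekly_sales[i - k] for k in range(lags)]  # week 0 .. week 8
        target = weekly_sales[i + 1]
        rows.append((lagged, target))
    return rows

sales = list(range(10, 22))          # 12 weeks of toy sales: 10, 11, ..., 21
rows = lag_features(sales)
print(rows[0])
```

Each emitted row corresponds to one (store, product, week) training example; the same transform would be repeated for stock, discount, and associated-product columns.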

For the DL implementation part, the H2O library [31] is used as an open source big data artificial intelligence platform. H2O is a powerful machine learning library and gives us the opportunity to implement DL algorithms on Hadoop/Spark big data environments. It puts together the power of highly advanced machine learning algorithms and provides the benefit of truly scalable in-memory processing for the Spark big data environment on one or many nodes via its Sparkling Water version. With the help of the in-memory processing capability of Spark technologies, it provides a faster parallel platform to utilize big data for business requirements and gain maximum benefit. In this study, we use H2O version 3.14.0.2 on a 64-bit, 16-core, 32GB-memory machine to observe the power of parallelism during DL modelling while increasing the number of neurons.

A multilayer feedforward artificial neural network (MLFANN) is employed as the deep learning algorithm. The MLFANN is trained with stochastic gradient descent using backpropagation: with gradient descent, each weight is adjusted according to its contribution to the error. H2O also has advanced features like momentum training, adaptive learning rate, L1 or L2 regularization, rate annealing,


Table 2: Dataset explanation.

Yearweek: Related year/week; weeks run from Monday to Sunday. Example: 201801
Store: Store number. Example: 1234
Product: Product identification number. Example: 1
Product Adjactive: Product associated with the product according to the Apriori algorithm, i.e., the product most frequently in the same basket with a specific product. Example: 2
Stock In Quantity Week[0-8]: Stock increase quantity of the product in the related week, e.g., stock transfer quantity from distribution center to store. Example: 50
Return Quantity Week[0-8]: Stock return quantity from customers in a specific week and store. Example: 20
Sales Quantity Week[0-8]: Weekly sales quantity of the related product at a specific store. Example: 120
Sales Amount Week[0-8]: Total sales amount of the product on the customer receipt. Example: 2500
Discount Amount Week[0-8]: Discount amount of the product, if any. Example: 500
Customer Count Week[0-8]: How many customers bought this product in a specific week. Example: 30
Receipt Count Week[0-8]: Distinct receipt count for the related product. Example: 20
Sales Quantity Time[9-22]: Hourly sales quantity of the related product from 9 a.m. to 10 p.m. Example: 5
Last4weeks Day[1-7]: Total sales for each weekday over the last 4 weeks (total sales of Mondays, Tuesdays, etc.). Example: 10
Last8weeks Day[1-7]: Total sales for each weekday over the last 8 weeks (total sales of Mondays, Tuesdays, etc.). Example: 10
Max Stock Week[0-8]: Maximum stock quantity of the related week. Example: 12
Min Stock Week[0-8]: Minimum stock quantity of the related week. Example: 2
Avg Stock Week[0-8]: Average stock quantity of the related week. Example: 5
Sales Quantity Adj Week[0-8]: Sales quantity of the most associated product. Example: 14
Temperature Weekday[1-7]: Daily temperature for each weekday (Monday, Tuesday, etc.). Example: 22
Weekly Avg Temperature[0-8]: Average weather temperature of the related week. Example: 23
Weather Condition Weekday[1-7]: Nominal variable (rainy, snowy, sunny, cloudy, etc.). Example: Rainy
Sales Quantity Next Week: Target variable of the forecasting problem. Example: 25

dropout, and grid search. A Gaussian distribution is applied because a continuous variable is used as the response variable. H2O performs very well in our environment when 3 hidden layers, with 10 nodes in each of them, and 300 epochs in total are set as parameters.
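The training choices named above can be illustrated generically; this numpy sketch of one SGD step with momentum, L2 regularization, and rate annealing is a textbook illustration, not H2O's internal implementation, and all hyperparameter values are arbitrary:

```python
import numpy as np

def sgd_step(w, grad, velocity, step, lr0=0.1, momentum=0.9, l2=1e-4, anneal=1e-3):
    """One SGD update with momentum, L2 regularization, and rate annealing.
    Each weight moves against its contribution to the error (the gradient)."""
    lr = lr0 / (1.0 + anneal * step)            # rate annealing: decay with steps
    grad = grad + l2 * w                        # L2 penalty pulls weights toward 0
    velocity = momentum * velocity - lr * grad  # momentum smooths the updates
    return w + velocity, velocity

w = np.array([0.5, -0.3])
v = np.zeros_like(w)
w, v = sgd_step(w, grad=np.array([0.2, -0.1]), velocity=v, step=0)
print(w)
```

In the full network, this update is applied to every weight of the 3x10 hidden-layer architecture at each backpropagation pass.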

After implementing the dimension reduction step, K-means is employed as the clustering algorithm. K-means is a greedy and fast clustering algorithm [45] that tries to partition samples into k clusters in which each sample is near its cluster center. K is selected as 20 because, after several trials and empirical observations, the most even distribution on the dataset is reached; our dataset is thus divided into 20 different clusters for each store. After that, the deep learning algorithm is applied, and 20 different DL models are obtained for each store. Demand forecasting for each product is then performed by using its cluster's model on a store basis. The main reason for using clustering is the time and computing cost of the DL algorithm: instead of building a model for each product, 20 models are generated for each store at the product level. In our trials with a sample dataset, the bias difference is less than 2%, so this speed-up is very beneficial and a good option for researchers. The forecasting result of the DL model is transferred into our forecasting system, which makes its final decision by considering the decisions of 11 forecasting methods including the DL model.
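The clustering step (K-means with k = 20 per store in the paper) can be sketched with Lloyd's algorithm on toy feature vectors; k and the data here are illustrative:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's algorithm: assign points to the nearest center,
    then recompute each center as the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]   # seed with data points
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(0) for j in range(k)])
    return labels, centers

# Toy product feature vectors forming two obvious groups.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
labels, _ = kmeans(X, k=2)
print(labels)
```

One DL model is then trained per cluster instead of per product, which is where the reported speed-up comes from.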

Forecasting solutions should be scalable, process large amounts of data, and extract a model from the data. Big data environments using Spark technologies give us the opportunity to implement algorithms with scalability, sharing the tasks of machine learning algorithms among the nodes of parallel architectures. Considering the huge number of samples and the large number of features, even the computational power of a parallel architecture is not enough in some cases. For this reason, dimension reduction is needed for big data applications. In this study, PCA is employed as the feature extraction step.
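The PCA feature-extraction step can be sketched with numpy's SVD; the 155-feature width matches the dataset description, while the sample count and the number of retained components are illustrative:

```python
import numpy as np

def pca(X, n_components):
    """Project centered data onto the top principal components."""
    Xc = X - X.mean(axis=0)                          # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T                  # reduced representation

X = np.random.default_rng(1).normal(size=(100, 155))  # 155 features, as in the dataset
Z = pca(X, n_components=20)
print(Z.shape)
```

The reduced matrix replaces the raw 155-column feature block before clustering and DL training, cutting both memory and compute on the Spark nodes.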

5. Experimental Results


SOK Market sells 21 different groups of items. We assess the performance of the proposed forecasting system on a group basis, as shown in Table 3, which reports the MAPE (Mean Absolute Percentage Error) of integration strategies (S1) and (S2).


Table 3: Demand forecasting improvements per product group. S1: MAPE of Method I without the proposed integration strategy; S2: MAPE of Method II with the proposed integration strategy; SD: MAPE of the proposed integration strategy with deep learning; percentage success rates P1 = (S1 - S2)/S1 and P2 = (S1 - SD)/S1; improvement D = P2 - P1.

Product Groups         S1      S2      SD      P1      P2      D
Baby Products          0.5157  0.3081  0.2927  40.26   43.25   2.99
Bakery Products        0.3482  0.2059  0.1966  40.85   43.52   2.67
Beverage               0.3714  0.2316  0.2207  37.64   40.58   2.94
Biscuit-Chocolate      0.3358  0.2077  0.1977  38.14   41.13   2.99
Breakfast Products     0.4443  0.2770  0.2661  37.65   40.11   2.45
Canned-Paste-Sauces    0.3836  0.2309  0.2198  39.80   42.69   2.89
Cheese                 0.3953  0.2457  0.2345  37.84   40.68   2.84
Cleaning Products      0.4560  0.2791  0.2650  38.79   41.89   3.10
Cosmetics Products     0.5397  0.3266  0.3148  39.49   41.67   2.19
Deli Meats             0.4242  0.2602  0.2488  38.65   41.36   2.70
Edible Oils            0.4060  0.2299  0.2215  43.36   45.45   2.09
Household Goods        0.5713  0.3656  0.3535  36.01   38.13   2.12
Ice Cream-Frozen       0.5012  0.3255  0.3106  35.05   38.03   2.98
Legumes-Pasta-Soup     0.3850  0.2397  0.2269  37.74   41.07   3.33
Nuts-Chips             0.3316  0.2049  0.1966  38.20   40.71   2.51
Poultry-Eggs           0.4219  0.2527  0.2403  40.11   43.04   2.94
Ready Meals            0.4613  0.2610  0.2520  43.42   45.36   1.94
Red Meat               0.2514  0.1616  0.1532  35.71   39.06   3.35
Tea-Coffee Products    0.4347  0.2650  0.2535  39.04   41.68   2.64
Textile Products       0.5418  0.3048  0.2907  43.74   46.34   2.60
Tobacco Products       0.3791  0.2378  0.2290  37.29   39.61   2.32
Average                0.4238  0.2582  0.2469  38.99   41.68   2.69



Figure 2: MAPE distribution according to product groups with the S1 integration strategy.

in columns 2 and 3. The average percentage forecasting errors of strategies (S1) and (S2) are 0.42 and 0.26, respectively. As can be seen from Table 3, the percentage success rate of (S2) with respect to (S1) is presented in column 5. The current integration strategy (S2) provides a 38.99% improvement over the first integration strategy (S1) on average. In some product groups, such as Textile Products, the error rate is reduced from 0.54 to 0.30, a 43.74% enhancement in MAPE.
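For reference, MAPE, the error metric reported throughout Table 3, can be computed as follows; the demand figures are made up for illustration:

```python
# Mean Absolute Percentage Error (MAPE) on made-up demand figures.
def mape(actual, forecast):
    return sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual)

actual = [100.0, 120.0, 80.0, 90.0]
method_one = [140.0, 90.0, 110.0, 60.0]   # hypothetical Method I forecasts
integrated = [110.0, 110.0, 90.0, 85.0]   # hypothetical integrated forecasts

# The integrated forecasts should score a lower (better) MAPE.
assert mape(actual, integrated) < mape(actual, method_one)
```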

With this work, a further improvement is obtained by utilizing the DL model compared to using only the 10 algorithms of the base forecasting strategy. Column 4 of Table 3 presents the MAPEs of the 21 different product groups after the addition of the DL model. After including the DL model in the proposed system, we observed around a 2% to 3.4% increase in demand forecast accuracy in some product groups. The error rates of some product groups, such as "baby products", "cosmetics products", and "household goods", are usually higher than the others, since these product groups do not have continuous consumer demand. For example, the error rate for the "household goods" group is 0.35, while the error rate for the "red meat" group is 0.15. The error rates for food products are usually lower, since customers buy these products regularly as part of their daily consumption. However, the error rates are higher for baby, textile, and household goods: customers buy products from these groups selectively, depending on whether they like the product or not. In addition, the prices of goods in these groups, and the promotion effects, are usually high relative to the prices of goods in other groups.

Moreover, when the percentage success rate column is analyzed, it is observed that the top benefiting product groups for Method I achieve over a 40% success rate: Baby Products, Bakery Products, Edible Oils, Poultry-Eggs, Ready Meals, and Textile Products. The common point of these groups is that each of them is consumed regularly by a specific customer segment. For instance, the baby products group is chosen by families with kids; Edible Oils, Poultry-Eggs, and Bakery Products are preferred by frequently and continuously shopping customer segments; and Ready Meals are mostly preferred by bachelor, single, and working consumers. Furthermore, the inclusion of DL in the forecasting system shows that some other consumption groups (for example, Cleaning Products, Red Meat, and Legumes-Pasta-Soup) exhibit better performance than the others, with an additional improvement of over 3%.

Figure 2 shows the box plots of mean absolute percentage errors (MAPE) for each group of products after application of the (S1) integration strategy. The box plots enable us to analyze the distributional characteristics of forecasting errors for product groups. As can be seen from the figure, the medians differ across product groups; thus, the forecasting errors are usually dissimilar for different product groups. The interquartile range box represents the middle 50% of scores in the data. The interquartile range boxes are usually very tall, which means that there is a considerable spread of prediction errors within the products of a given group.
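The quartile boundaries behind such a box plot can be computed directly with the standard library; the MAPE values below are illustrative, not taken from the study:

```python
import statistics

# Hypothetical per-product MAPEs within one product group; the box plots in
# Figures 2-4 summarize distributions of exactly this kind.
group_mapes = [0.12, 0.18, 0.23, 0.27, 0.31, 0.38, 0.44, 0.52, 0.61, 0.75]

q1, median, q3 = statistics.quantiles(group_mapes, n=4)  # quartile cut points
iqr = q3 - q1  # the "box": the range covering the middle 50% of products

assert q1 < median < q3 and iqr > 0
```

A taller box (larger IQR) corresponds to the wide spread of per-product errors described above.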

Figure 3 presents the box plots of the MAPEs of the proposed forecasting system with the proposed integration approach (S2), which combines the predictions of 10 different forecasting algorithms.

Figure 4 shows the results after adding DL to our system. The interquartile range boxes are narrower compared to the ones in Figures 2 and 3, which means that the spread of forecasting errors is smaller than in Figures 2 and 3. In short, the integration of DL into the proposed forecasting system generates more accurate results.



Figure 3: MAPE distribution according to product groups with the S2 integration strategy.


Figure 4: MAPE distribution according to product groups after adding the deep learning algorithm.

For example, for the (S1) integration strategy in the Baby Products group, the MAPE distribution in Figure 2 is observed between 0% and 23% for the 1st quartile of the data, between 23% and 48% for the 2nd quartile, and between 48% and 75% for the 3rd and 4th quartiles. After applying the integration strategy, the MAPE distribution for the Baby Products group becomes 0-16% for the 1st quartile, 16-30% for the 2nd quartile, and 30-50% for the rest. Judging by the boundary differences, this gives around an 8% improvement for the first quartile, an 8% to 18% enhancement for the second quartile, and an 18% to 25% advancement for the rest of the data. The median of the MAPE distribution is 48%; after applying the proposed integration strategy, it is observed as 27%, a 21% improvement. An approximately 1-3% enhancement is observed for each quartile of the data with the inclusion of the deep learning strategy in Figure 4.

For Cleaning Products, improvements similar to those of the other groups are observed with the integration strategy. The MAPE distribution of the 1st quartile of the data is observed between 0% and 19%, for the 2nd quartile between 19% and 40%, and for the rest of the data between 40% and


[Figure 5 plots the weekly total sale quantity (0 to 9,000) against date, from July 2017 to July 2018, for four series: Actual, S1, S2, and SD.]

Figure 5: Accuracy comparison among integration strategies for one sample product group.

73% in Figure 2. After applying the proposed integration strategy for Cleaning Products, the MAPE distribution becomes 0-10% for the 1st quartile of the data, 10-17% for the 2nd quartile, and 27-43% for the 3rd and 4th quartiles together. Improvements at the boundaries are observed as follows: 9% for the 1st quartile, from 9% to 13% for the 2nd quartile, and from 13% to 20% for the rest of the data. After implementation of the integration strategy, the system advancement is observed as 2% for the 1st and 2nd quartiles and 3% for the 3rd and 4th quartiles, respectively. Briefly, the improvement from the integration strategy is remarkable for each product group, and consolidating DL into the proposed integration strategy additionally advances the results by almost 2% to 3%.

In Figure 5, the accuracy comparison among integration strategies for one of the best performing sample groups is presented over a 1-year period. In particular, Christmas and other seasonality effects can be seen very clearly in Figure 5, and the integration strategy with DL (SD) performs with the best accuracy.

It is hard to compare the performance of our proposed system with other studies because of the lack of works with a similar combination of a deep learning approach, the proposed decision integration strategies, and different learning algorithms for the demand forecasting process. Another difficulty is the lack of real-life datasets similar to the SOK dataset with which to compare state-of-the-art studies. Although the results of the proposed system are given in this study, we also report the results of a number of research works on the demand forecasting domain. A greedy aggregation-decomposition method has solved a real-world intermittent demand forecasting problem for a fashion retailer in Singapore [46]; the authors report a 59% MAPE success rate with a small dataset. The authors of [47] compare different approaches, such as a statistical model, the Winters model, and a radial basis function neural network (RBFNN), with SVM for the demand forecasting process. They conclude that the success of the SVM algorithm surpasses the others, with around a 77% enhancement in average MAPE results. A novel demand forecasting model called SHEnSVM (Selective and Heterogeneous Ensemble of Support Vector Machines) is proposed in [48]. In the proposed model, individual SVMs are trained on different samples generated by a bootstrap algorithm; a genetic algorithm is then employed to retrieve the best individual combination schema. They report a 10% advancement with the SVM algorithm and a 64% MAPE improvement, employing only beer data with 3 different brands in their experiments. Tugba Efendigil et al. propose a novel forecasting mechanism modeled by artificial intelligence approaches in [49]. They compared both artificial neural networks and adaptive network-based fuzzy inference system techniques to manage fuzzy demand with incomplete information, reaching around 18% MAPE for some products during their experiments.

6. Conclusion

In the retail industry, demand forecasting is one of the main problems of supply chains in optimizing stocks, reducing costs, and increasing sales, profit, and customer loyalty. To address this issue, there are several methods, such as time series analysis and machine learning approaches, to analyze and learn complex interactions and patterns from historical data.

In this study, a novel attempt is made to integrate 11 different forecasting models, including time series algorithms, a support vector regression model, and a deep learning method, for the demand forecasting process. Moreover, a novel decision integration strategy is developed by drawing inspiration from the boosting ensemble methodology. The proposed approach considers the performance of each model over time and combines the weighted predictions of the best performing models for the demand forecasting process.

The proposed forecasting system is tested on real-life data obtained from the SOK Market retail chain. It is observed that the inclusion of learning algorithms beyond time series models, together with the novel integration strategy, advances the performance of the demand forecasting system. To the best of our knowledge, this is the very first attempt to consolidate a deep learning methodology, the SVR algorithm, and different time series methods for demand forecasting systems. Furthermore, the other novelty of this work is the adaptation of the boosting ensemble strategy to the demand forecasting model. In this way, the final decision of the proposed system is based on the best algorithms of the week, which gain more weight. This makes our forecasts more robust with respect to trend changes and seasonality behaviors. Moreover, the proposed approach integrates very well with the deep learning algorithm on the Spark big data environment. The dimension reduction and clustering steps help to decrease the time consumed, with less computing power, during the deep learning modelling phase. Although a review of some similar studies is presented in Section 5, it is, as expected, very difficult to compare the results of other studies with ours because of the use of different datasets and methods. In this study, we compare the results of three models. Model one selects the best performing forecasting method depending on its success in the previous period, with 42.4% MAPE on average. The second model, with the novel integration strategy, results in 25.8% MAPE on average. The last model, with the novel integration strategy enhanced with the deep learning approach, provides 24.7% MAPE on average. As a result, the inclusion of the deep learning approach in the novel integration strategy reduces the average prediction error for the demand forecasting process in the supply chain.
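The boosting-inspired weighting idea summarized above can be sketched as follows; the inverse-error weights and the 30% selection cut are simplifying assumptions standing in for the paper's empirically tuned coefficients:

```python
# Sketch of the boosting-inspired integration: rank methods by recent error,
# keep the top fraction, and blend their forecasts with inverse-error weights.
def integrate(forecasts, recent_errors, top_fraction=0.3):
    ranked = sorted(forecasts, key=lambda name: recent_errors[name])
    keep = ranked[:max(1, int(len(ranked) * top_fraction))]
    weights = {name: 1.0 / recent_errors[name] for name in keep}
    total = sum(weights.values())
    return sum(forecasts[name] * w / total for name, w in weights.items())

# Hypothetical weekly forecasts and recent MAPEs for four of the methods.
forecasts = {"ARIMA": 95.0, "SVR": 110.0, "DL": 102.0, "Holt": 130.0}
recent_errors = {"ARIMA": 0.25, "SVR": 0.18, "DL": 0.12, "Holt": 0.40}

combined = integrate(forecasts, recent_errors)
assert abs(combined - 102.0) < 1e-9  # only the single best method survives the 30% cut
```

Because the weights are recomputed from recent performance, a method that deteriorates during a trend change or seasonal shift automatically loses influence the following week.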

As future work, we plan to enrich the set of features by gathering data from other sources, such as economic studies, shopping trends, social media, social events, and location-based demographic data of stores, and to observe the contribution of these new data sources to deep learning. A further study can be done to determine the hyperparameters of the deep learning algorithm. In addition, we plan to use other deep learning techniques, such as convolutional neural networks, recurrent neural networks, and deep neural networks, as learning algorithms. Furthermore, another objective is to use heuristic methods, MBO (Migrating Birds Optimization) and related algorithms [50], to optimize some of the coefficients/weights which were determined empirically by trial and error, such as taking the best performing 30 percent of methods in our current system.

Data Availability

This dataset consists of private customer data, and the agreement with the SOK organization does not allow sharing the data.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This work is supported by the OBASE Research & Development Center.

References

[1] P. J. McGoldrick and E. Andre, "Consumer misbehaviour: promiscuity or loyalty in grocery shopping," Journal of Retailing and Consumer Services, vol. 4, no. 2, pp. 73-81, 1997.
[2] D. Grewal, A. L. Roggeveen, and J. Nordfalt, "The future of retailing," Journal of Retailing, vol. 93, no. 1, pp. 1-6, 2017.
[3] E. T. Bradlow, M. Gangwar, P. Kopalle, and S. Voleti, "The role of big data and predictive analytics in retailing," Journal of Retailing, vol. 93, no. 1, pp. 79-95, 2017.
[4] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, USA, 2013.
[5] R. Hyndman and G. Athanasopoulos, Forecasting: Principles and Practice, OTexts, Melbourne, Australia, 2018, http://otexts.org/fpp2.
[6] C. C. Pegels, "Exponential forecasting: some new variations," Management Science, vol. 12, pp. 311-315, 1969.
[7] E. S. Gardner, "Exponential smoothing: the state of the art," Journal of Forecasting, vol. 4, no. 1, pp. 1-28, 1985.
[8] D. Gujarati, Basic Econometrics, McGraw-Hill, New York, NY, USA, 2003.
[9] J. D. Croston, "Forecasting and stock control for intermittent demands," Operational Research Quarterly, vol. 23, no. 3, pp. 289-303, 1972.
[10] T. R. Willemain, C. N. Smart, J. H. Shockor, and P. A. DeSautels, "Forecasting intermittent demand in manufacturing: a comparative evaluation of Croston's method," International Journal of Forecasting, vol. 10, no. 4, pp. 529-538, 1994.
[11] A. Syntetos, Forecasting of Intermittent Demand, Brunel University, 2001.
[12] A. A. Syntetos and J. E. Boylan, "The accuracy of intermittent demand estimates," International Journal of Forecasting, vol. 21, no. 2, pp. 303-314, 2005.
[13] R. H. Teunter, A. A. Syntetos, and M. Z. Babai, "Intermittent demand: linking forecasting to inventory obsolescence," European Journal of Operational Research, vol. 214, no. 3, pp. 606-615, 2011.
[14] R. J. Hyndman, R. A. Ahmed, G. Athanasopoulos, and H. L. Shang, "Optimal combination forecasts for hierarchical time series," Computational Statistics & Data Analysis, vol. 55, no. 9, pp. 2579-2589, 2011.
[15] R. Fildes and F. Petropoulos, "Simple versus complex selection rules for forecasting many time series," Journal of Business Research, vol. 68, no. 8, pp. 1692-1701, 2015.
[16] C. Li and A. Lim, "A greedy aggregation-decomposition method for intermittent demand forecasting in fashion retailing," European Journal of Operational Research, vol. 269, no. 3, pp. 860-869, 2018.
[17] F. Turrado García, L. J. García Villalba, and J. Portela, "Intelligent system for time series classification using support vector machines applied to supply-chain," Expert Systems with Applications, vol. 39, no. 12, pp. 10590-10599, 2012.
[18] G. Song and Q. Dai, "A novel double deep ELMs ensemble system for time series forecasting," Knowledge-Based Systems, vol. 134, pp. 31-49, 2017.
[19] O. Araque, I. Corcuera-Platas, J. F. Sánchez-Rada, and C. A. Iglesias, "Enhancing deep learning sentiment analysis with ensemble techniques in social applications," Expert Systems with Applications, vol. 77, pp. 236-246, 2017.
[20] H. Tong, B. Liu, and S. Wang, "Software defect prediction using stacked denoising autoencoders and two-stage ensemble learning," Information and Software Technology, vol. 96, pp. 94-111, 2018.
[21] X. Qiu, Y. Ren, P. N. Suganthan, and G. A. J. Amaratunga, "Empirical mode decomposition based ensemble deep learning for load demand time series forecasting," Applied Soft Computing, vol. 54, pp. 246-255, 2017.
[22] Z. Qi, B. Wang, Y. Tian, and P. Zhang, "When ensemble learning meets deep learning: a new deep support vector machine for classification," Knowledge-Based Systems, vol. 107, pp. 54-60, 2016.
[23] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: a review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798-1828, 2013.
[24] G. E. Hinton, S. Osindero, and Y. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527-1554, 2006.
[25] Y. Bengio, "Practical recommendations for gradient-based training of deep architectures," in Neural Networks: Tricks of the Trade, vol. 7700 of Lecture Notes in Computer Science, pp. 437-478, Springer, Berlin, Germany, 2nd edition, 2012.
[26] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436-444, 2015.
[27] S. Kim, Z. Yu, R. M. Kil, and M. Lee, "Deep learning of support vector machines with class probability output networks," Neural Networks, vol. 64, pp. 19-28, 2015.
[28] P. J. Brockwell and R. A. Davis, Time Series: Theory and Methods, Springer Series in Statistics, New York, NY, USA, 1989.
[29] V. N. Vapnik, Statistical Learning Theory, Wiley-Interscience, New York, NY, USA, 1998.
[30] S. Haykin, Neural Networks and Learning Machines, Macmillan Publishers Limited, New Jersey, NJ, USA, 2009.
[31] A. Candel, V. Parmar, E. LeDell, and A. Arora, Deep Learning with H2O, United States of America, 2018.
[32] K. Keerthi Vasan and B. Surendiran, "Dimensionality reduction using principal component analysis for network intrusion detection," Perspectives in Science, vol. 8, pp. 510-512, 2016.
[33] L. Rokach, "Ensemble-based classifiers," Artificial Intelligence Review, vol. 33, no. 1-2, pp. 1-39, 2010.
[34] R. Polikar, "Ensemble based systems in decision making," IEEE Circuits and Systems Magazine, vol. 6, no. 3, pp. 21-45, 2006.
[35] S. Reid, A Review of Heterogeneous Ensemble Methods, Department of Computer Science, University of Colorado at Boulder, 2007.
[36] D. Gopika and B. Azhagusundari, "An analysis on ensemble methods in classification tasks," International Journal of Advanced Research in Computer and Communication Engineering, vol. 3, no. 7, pp. 7423-7427, 2014.
[37] Y. Ren, L. Zhang, and P. N. Suganthan, "Ensemble classification and regression: recent developments, applications and future directions," IEEE Computational Intelligence Magazine, vol. 11, no. 1, pp. 41-53, 2016.
[38] U. G. Mangai, S. Samanta, S. Das, and P. R. Chowdhury, "A survey of decision fusion and feature fusion strategies for pattern classification," IETE Technical Review, vol. 27, no. 4, pp. 293-307, 2010.
[39] M. Wozniak, M. Grana, and E. Corchado, "A survey of multiple classifier systems as hybrid systems," Information Fusion, vol. 16, no. 1, pp. 3-17, 2014.
[40] G. Tsoumakas, L. Angelis, and I. Vlahavas, "Selective fusion of heterogeneous classifiers," Intelligent Data Analysis, vol. 9, no. 6, pp. 511-525, 2005.
[41] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," Journal of Computer and System Sciences, vol. 55, no. 1, part 2, pp. 119-139, 1997.
[42] S. Chopra and P. Meindl, Supply Chain Management: Strategy, Planning, and Operation, Prentice Hall, 2004.
[43] G. Kushwaha, "Operational performance through supply chain management practices," International Journal of Business and Social Science, vol. 217, pp. 65-77, 2012.
[44] G. E. Box, G. M. Jenkins, G. Reinsel, and G. M. Ljung, Time Series Analysis: Forecasting and Control, Wiley Series in Probability and Statistics, John Wiley & Sons, Hoboken, NJ, USA, 5th edition, 2015.
[45] W.-L. Zhao, C.-H. Deng, and C.-W. Ngo, "k-means: a revisit," Neurocomputing, vol. 291, pp. 195-206, 2018.
[46] C. Li and A. Lim, "A greedy aggregation-decomposition method for intermittent demand forecasting in fashion retailing," European Journal of Operational Research, vol. 269, no. 3, pp. 860-869, 2018.
[47] L. Yue, Y. Yafeng, G. Junjun, and T. Chongli, "Demand forecasting by using support vector machine," in Proceedings of the Third International Conference on Natural Computation (ICNC 2007), pp. 272-276, Haikou, China, August 2007.
[48] L. Yue, L. Zhenjiang, Y. Yafeng, T. Zaixia, G. Junjun, and Z. Bofeng, "Selective and heterogeneous SVM ensemble for demand forecasting," in Proceedings of the 2010 IEEE 10th International Conference on Computer and Information Technology (CIT), pp. 1519-1524, Bradford, UK, June 2010.
[49] T. Efendigil, S. Onut, and C. Kahraman, "A decision support system for demand forecasting with artificial neural networks and neuro-fuzzy models: a comparative analysis," Expert Systems with Applications, vol. 36, no. 3, pp. 6697-6707, 2009.
[50] E. Duman, M. Uysal, and A. F. Alkaya, "Migrating birds optimization: a new metaheuristic approach and its performance on quadratic assignment problem," Information Sciences, vol. 217, pp. 65-77, 2012.



Given that n is the number of stores, m is the number of products, t is the number of algorithms in the system, and A_t is an algorithm with index t: s_mn is the matrix which holds the best performing algorithms for each store and product, B_mn is the matrix which contains the blacklist of algorithms for each store and product, and F_ij is the matrix which stores the final decision of each forecast, where i indexes stores and j indexes products.

for i = 1..n
    for j = 1..m
        for k = 1..t
            if A_k is in list B_ij then
                continue
            else
                run A_k
                calculate algorithm weight W_k
            end if
        end for
        for k = 1..t
            choose the best performing algorithms and place them in s_ij
        end for
        for z = 1..|s_ij|
            do scaling for A_z
        end for
        calculate the proposed integration strategy and store it in F_ij
    end for
end for
return all F_nm

Algorithm 1: The algorithm of the proposed demand forecasting system.
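A hedged, runnable rendering of Algorithm 1's control flow might look like this in Python; the stub forecasting method, the inverse-error weighting, and the error values are invented for illustration:

```python
# Runnable sketch of Algorithm 1: for every store/product pair, skip
# blacklisted algorithms, weight the rest by recent accuracy, keep the best
# performers, and combine their forecasts into the final decision.
def forecast_all(stores, products, algorithms, blacklist, run, top_fraction=0.3):
    F = {}
    for i in stores:                       # for i = 1..n
        for j in products:                 # for j = 1..m
            preds, weights = {}, {}
            for name in algorithms:        # for k = 1..t
                if name in blacklist.get((i, j), set()):
                    continue               # A_k is in B_ij
                pred, err = run(name, i, j)
                preds[name] = pred
                weights[name] = 1.0 / err  # assumed inverse-error weight W_k
            best = sorted(preds, key=lambda n: -weights[n])
            best = best[:max(1, int(len(best) * top_fraction))]
            total = sum(weights[n] for n in best)
            F[(i, j)] = sum(preds[n] * weights[n] / total for n in best)
    return F

# Invented stub: every algorithm returns a fixed forecast and recent error.
def run(name, store, product):
    return {"A1": (100.0, 0.2), "A2": (120.0, 0.1)}[name]

F = forecast_all([0], [0, 1], ["A1", "A2"], {(0, 0): {"A1"}}, run,
                 top_fraction=1.0)
assert F[(0, 0)] == 120.0                     # A1 blacklisted, only A2 runs
assert abs(F[(0, 1)] - 1700.0 / 15.0) < 1e-9  # inverse-error weighted blend
```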

A deep learning approach is used to estimate customer demand for each product at each store of the SOK Market retail chain. There is also a very apparent similarity, in terms of sales trends, between products which are mostly in the same basket. For this purpose, the Apriori algorithm [4] is utilized to find the correlated products of each product. Apriori is a fast way to find the most associated products that appear in the same basket. The sales-related features of the most associated product are added to the dataset as well. This takes advantage of relationships among products with similar sales trends. Adding a product's associated product, with its features, enables the DL algorithm to learn better within its hidden layers. In summary, extensive experiments show that enhancing the training dataset with weather data and the sales data of the most associated products makes our estimates more powerful when the DL algorithm is used.
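The basket-association idea can be illustrated with a simple pairwise co-occurrence count; this is a simplification of Apriori, which generalizes the same counting to larger itemsets with support-based pruning, and the baskets below are toy data:

```python
from collections import Counter
from itertools import combinations

# Toy receipts; each inner list is one basket of product IDs.
baskets = [
    [1, 2, 3],
    [1, 2],
    [2, 3],
    [1, 2, 4],
]

# Count co-occurring product pairs across all baskets.
pair_counts = Counter()
for basket in baskets:
    for a, b in combinations(sorted(set(basket)), 2):
        pair_counts[(a, b)] += 1

def most_associated(product):
    """Return the product that shares the most baskets with `product`."""
    partners = {p if q == product else q: c
                for (p, q), c in pair_counts.items() if product in (p, q)}
    return max(partners, key=partners.get)

assert most_associated(1) == 2  # product 2 shares the most baskets with product 1
```

In the study, the features of this most associated product are then appended to each product's own feature vector.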

The dataset includes weekly data for each product at each store for the latest 8 weeks, plus the sales quantities of the last 4 weeks in the same season. Historically, 2 years of data are included for each of the 5,500 stores and 1,500 products. The data size is about 875 million records with 155 features. These features include sales quantity, customer return quantity, product quantities received from distribution centers, sales amount, discount amount, receipt counts of a product, counts of customers who bought a specific product, sales quantity for each day of the week (Monday, Tuesday, etc.), maximum and minimum stock level of each day, average stock level for each week, sales quantity for each hour of the day, and the sales quantities of the last 4 weeks of the same season. Since there is a relationship between products which are generally in the same basket, we prepared the same features defined above for the most associated products as well. A detailed explanation of the dataset is presented in Table 2.

In Table 2, fields given with square brackets denote an array indexed by week (e.g., Sales Quantity Week 0 means the sales quantity of the current week, Sales Quantity Week 1 is the sales quantity one week before, and so on). According to our experimental results, taking weekly data for the latest 8 weeks plus 4 weeks of the same season in the previous year captures trends and seasonality changes acceptably well.

For the DL implementation, the H2O library [31] is used as an open source big data artificial intelligence platform. H2O is a powerful machine learning library that makes it possible to implement DL algorithms on Hadoop and Spark big data environments. It combines highly advanced machine learning algorithms with truly scalable in-memory processing for Spark, on one or many nodes, via its Sparkling Water version. With the in-memory processing capability of Spark technologies, it provides a faster parallel platform for getting maximum benefit from big data. In this study, we use H2O version 3.14.0.2 on a 64-bit, 16-core machine with 32 GB of memory to observe the power of parallelism during DL modelling while increasing the number of neurons.

A multilayer feedforward artificial neural network (MLFANN) is employed as the deep learning algorithm. The MLFANN is trained with stochastic gradient descent using backpropagation: each weight is adjusted according to its contribution to the error. H2O also offers advanced features like momentum training, an adaptive learning rate, L1 or L2 regularization, rate annealing,


Table 2: Dataset explanation.

Name | Description | Example
Yearweek | Related year/week; weeks run from Monday to Sunday | 201801
Store | Store number | 1234
Product | Product identification number | 1
Product Adjactive | Product associated with the product according to the Apriori algorithm; the most frequent product in the same basket as a specific product | 2
Stock In Quantity Week[0-8] | Stock increase quantity of the product in the related week, e.g., stock transferred from the distribution center to the store | 50
Return Quantity Week[0-8] | Stock return quantity from customers in a specific week and store | 20
Sales Quantity Week[0-8] | Weekly sales quantity of the related product at a specific store | 120
Sales Amount Week[0-8] | Total sales amount of the product on the customer receipt | 2500
Discount Amount Week[0-8] | Discount amount of the product, if any | 500
Customer Count Week[0-8] | How many customers bought this product in a specific week | 30
Receipt Count Week[0-8] | Distinct receipt count for the related product | 20
Sales Quantity Time[9-22] | Hourly sales quantity of the related product from 9 a.m. to 10 p.m. | 5
Last4weeks Day[1-7] | Total sales for each weekday over the last 4 weeks (total sales of Mondays, Tuesdays, etc.) | 10
Last8weeks Day[1-7] | Total sales for each weekday over the last 8 weeks (total sales of Mondays, Tuesdays, etc.) | 10
Max Stock Week[0-8] | Maximum stock quantity of the related week | 12
Min Stock Week[0-8] | Minimum stock quantity of the related week | 2
Avg Stock Week[0-8] | Average stock quantity of the related week | 5
Sales Quantity Adj Week[0-8] | Sales quantity of the most associated product | 14
Temperature Weekday[1-7] | Daily temperature of the weekdays (Monday, Tuesday, etc.) | 22
Weekly Avg Temperature[0-8] | Average weather temperature of the related week | 23
Weather Condition Weekday[1-7] | Nominal variable: rainy, snowy, sunny, cloudy, etc. | Rainy
Sales Quantity Next Week | Target variable of our prediction problem | 25

dropout, and grid search. A Gaussian distribution is applied because a continuous variable is used as the response variable. H2O performs very well in our environment when 3 hidden layers with 10 nodes each and a total of 300 epochs are set as parameters.
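Since reproducing the H2O setup requires a running H2O cluster, the reported topology can be mirrored for illustration with scikit-learn's MLPRegressor as a stand-in; only the layer sizes, the SGD training, and the epoch count are taken from the paper, while the data is synthetic:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(7)
X = rng.normal(size=(300, 20))                       # stand-in feature matrix
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=300)  # synthetic target

# Mirror the reported configuration: stochastic gradient descent with
# backpropagation, 3 hidden layers of 10 nodes, and 300 training epochs.
model = MLPRegressor(hidden_layer_sizes=(10, 10, 10), solver="sgd",
                     max_iter=300, random_state=0)
model.fit(X, y)

assert model.n_layers_ == 5  # input layer + 3 hidden layers + output layer
```

With H2O's own Python API, the equivalent configuration would be passed as `hidden=[10, 10, 10]` and `epochs=300` to its deep learning estimator.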

After implementing the dimension reduction step, K-means is employed as the clustering algorithm. K-means is a greedy and fast clustering algorithm [45] that tries to partition samples into k clusters in which each sample is near its cluster center. K is selected as 20 because, after several trials and empirical observations, this value produced the most even distribution on the dataset and divided our dataset into 20 different clusters for each store. After that, the deep learning algorithm is applied, and 20 different DL models are obtained for each store. Then, demand forecasting for each product is performed by using its cluster's model on a store basis. The main reason for using clustering is the time and computing limitations of the DL algorithm: instead of building a model for each product in each store, 20 models are generated per store at the product level. In our trials with a sample dataset, the bias difference is less than 2%, so this speed-up is very beneficial and a good option for researchers. The forecasting result of the DL model is transferred into our forecasting system, which makes its final decision by considering the decisions of 11 forecasting methods, including the DL model.
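The per-store pipeline described above (cluster the products, train one model per cluster, and forecast each product with its cluster's model) can be sketched as follows. This is an illustrative reconstruction with synthetic data: a plain linear regressor stands in for the DL model, and only k = 20 is taken from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 6))                        # product-week feature vectors, one store
y = X[:, 0] * 3 + rng.normal(scale=0.1, size=400)    # demand target

# Partition the store's products into k clusters (the paper uses k = 20).
k = 20
clusters = KMeans(n_clusters=k, n_init=10, random_state=1).fit_predict(X)

# One model per cluster instead of one per product: a large speed-up at the
# cost of a small bias (reported below 2% in the paper's trials).
models = {c: LinearRegression().fit(X[clusters == c], y[clusters == c])
          for c in np.unique(clusters)}

# Forecast a product by using its cluster's model.
forecast = models[clusters[0]].predict(X[:1])
print(float(forecast[0]))
```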

Forecasting solutions should be scalable: they must process large amounts of data and extract a model from the data. Big data environments using Spark technologies give us the opportunity to implement algorithms with scalability, sharing the tasks of machine learning algorithms among the nodes of parallel architectures. Considering the huge number of samples and the large number of features, even the computational power of a parallel architecture is not enough in some cases. For this reason, dimension reduction is needed for big data applications. In this study, PCA is employed as a feature extraction step.
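A minimal sketch of PCA as a feature extraction step is shown below with scikit-learn for brevity (on a Spark cluster, the equivalent step would use Spark MLlib's PCA transformer); the data and the 95% variance threshold are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 50))   # wide feature matrix, as produced from Table 2

# Keep enough principal components to explain 95% of the variance,
# shrinking the input that the downstream learners must process.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape[1], "components retained out of", X.shape[1])
```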

5 Experimental Results


SOK Market sells 21 different groups of items. We assess the performance of the proposed forecasting system on a group basis, as shown in Table 3. Table 3 shows the MAPE (Mean Absolute Percentage Error) of integration strategies (S1) and (S2)

10 Complexity

Table 3: Demand forecasting improvements per product group. S1 = Method I without the proposed integration strategy; S2 = Method II with the proposed integration strategy; SD = proposed integration strategy with deep learning. The percentage success rates are P1 = (S1 - S2)/S1 and P2 = (S1 - SD)/S1, and the improvement is D = P2 - P1.

Product Groups | S1 MAPE | S2 MAPE | SD MAPE | P1 (%) | P2 (%) | D (%)
Baby Products | 0.5157 | 0.3081 | 0.2927 | 40.26 | 43.25 | 2.99
Bakery Products | 0.3482 | 0.2059 | 0.1966 | 40.85 | 43.52 | 2.67
Beverage | 0.3714 | 0.2316 | 0.2207 | 37.64 | 40.58 | 2.94
Biscuit-Chocolate | 0.3358 | 0.2077 | 0.1977 | 38.14 | 41.13 | 2.99
Breakfast Products | 0.4430 | 0.2770 | 0.2661 | 37.65 | 40.11 | 2.45
Canned-Paste-Sauces | 0.3836 | 0.2309 | 0.2198 | 39.80 | 42.69 | 2.89
Cheese | 0.3953 | 0.2457 | 0.2345 | 37.84 | 40.68 | 2.84
Cleaning Products | 0.4560 | 0.2791 | 0.2650 | 38.79 | 41.89 | 3.10
Cosmetics Products | 0.5397 | 0.3266 | 0.3148 | 39.49 | 41.67 | 2.19
Deli Meats | 0.4242 | 0.2602 | 0.2488 | 38.65 | 41.36 | 2.70
Edible Oils | 0.4060 | 0.2299 | 0.2215 | 43.36 | 45.45 | 2.09
Household Goods | 0.5713 | 0.3656 | 0.3535 | 36.01 | 38.13 | 2.12
Ice Cream-Frozen | 0.5012 | 0.3255 | 0.3106 | 35.05 | 38.03 | 2.98
Legumes-Pasta-Soup | 0.3850 | 0.2397 | 0.2269 | 37.74 | 41.07 | 3.33
Nuts-Chips | 0.3316 | 0.2049 | 0.1966 | 38.20 | 40.71 | 2.51
Poultry-Eggs | 0.4219 | 0.2527 | 0.2403 | 40.11 | 43.04 | 2.94
Ready Meals | 0.4613 | 0.2610 | 0.2520 | 43.42 | 45.36 | 1.94
Red Meat | 0.2514 | 0.1616 | 0.1532 | 35.71 | 39.06 | 3.35
Tea-Coffee Products | 0.4347 | 0.2650 | 0.2535 | 39.04 | 41.68 | 2.64
Textile Products | 0.5418 | 0.3048 | 0.2907 | 43.74 | 46.34 | 2.60
Tobacco Products | 0.3791 | 0.2378 | 0.2290 | 37.29 | 39.61 | 2.32
Average | 0.4238 | 0.2582 | 0.2469 | 38.99 | 41.68 | 2.69
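The success-rate columns of Table 3 follow directly from the MAPE columns: the denominators that reproduce the printed rates are the Method I errors (S1). A small sketch, checked against the Baby Products row (last-digit deviations from the table come from rounding of the published inputs):

```python
def success_rates(s1, s2, sd):
    """P1, P2 (percent) and improvement D = P2 - P1, as in Table 3."""
    p1 = 100 * (s1 - s2) / s1
    p2 = 100 * (s1 - sd) / s1
    return p1, p2, p2 - p1

# Baby Products row: S1 = 0.5157, S2 = 0.3081, SD = 0.2927.
p1, p2, d = success_rates(0.5157, 0.3081, 0.2927)
print(round(p1, 2), round(p2, 2), round(d, 2))  # approximately 40.26, 43.24, 2.99
```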


Figure 2: MAPE distribution according to product groups with S1 integration strategy. (Box plots; y-axis: MAPE from 0.00 to 1.00; x-axis: the 21 product groups of Table 3.)

in columns 2 and 3. The average percentage forecasting errors of strategies (S1) and (S2) are 0.42 and 0.26, respectively. As can be seen from Table 3, the percentage success rate of (S2) with respect to (S1) is presented in column 5. The current integration strategy (S2) provides a 38.99% improvement over the first integration strategy (S1) on average. In some product groups, such as Textile Products, the error rate is reduced from 0.54 to 0.31, a 43.74% enhancement in MAPE.
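For reference, MAPE, the error measure reported throughout Table 3, averages the absolute percentage deviation of forecasts from actuals; a small generic implementation, not tied to the paper's code:

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error over paired observations (zero actuals skipped)."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    return sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)

# Forecasts off by 10% and 20% give a MAPE of 0.15, i.e., 15%.
print(mape([100, 50], [90, 60]))  # 0.15
```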

With this work, a further improvement is obtained by utilizing the DL model compared to the usage of just the 10 algorithms on which the forecasting strategy is based. Column 4 of Table 3 demonstrates the MAPEs of the 21 different groups of products after the addition of the DL model. After the inclusion of the DL model into the proposed system, we observed around a 2% to 3.4% increase in demand forecast accuracy in some product groups. The error rates of some product groups, like "baby products", "cosmetic products", and "household goods", are usually higher than the others, since these product groups do not have continuous demand by consumers. For example, the error rate for the "household goods" group is 0.35, while the error rate for the "red meat" group is 0.15. The error rates in food products are usually lower, since customers regularly buy these products for their daily consumption. However, the error rates are higher in baby, textile, and household goods: customers buy products from these groups selectively, depending on whether they like the product or not. In addition, the prices of goods in these groups and the promotion effects are usually high relative to the prices of goods in other groups.

Moreover, when the column of percentage success rate is analyzed, it is observed that the top benefitting product groups for Method I accomplish over a 40% success rate on Baby Products, Bakery Products, Edible Oils, Poultry-Eggs, Ready Meals, and Textile Products. The common point of these groups is that each one of them is consumed by a specific customer segment regularly. For instance, the baby products group is chosen by families who have kids; Edible Oils, Poultry-Eggs, and Bakery Products are preferred by frequently and continuously shopping customer segments; and Ready Meals are preferred mostly by bachelor, single, and working consumers. Furthermore, the inclusion of DL into the forecasting system indicates that some other consuming groups (for example, Cleaning Products, Red Meat, and Legumes-Pasta-Soup) exhibit better performance than the others, with an additional improvement of over 3%.

Figure 2 shows the box plots of mean absolute percentage errors (MAPE) for each group of products after application of the (S1) integration strategy. The box plots enable us to analyze the distributional characteristics of forecasting errors for product groups. As can be seen from the figure, the medians differ for each product group; thus, the forecasting errors are usually dissimilar for different product groups. The interquartile range box represents the middle 50% of scores in the data. The interquartile range boxes are usually very tall, which means that there are quite a number of different prediction errors within the products of a given group.
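The box-plot quantities discussed here (median, quartiles, and the interquartile range that sets the box height) can be computed directly; a sketch over a hypothetical vector of per-product MAPEs:

```python
import numpy as np

# Hypothetical per-product MAPE values within one product group.
errors = np.array([0.12, 0.18, 0.23, 0.31, 0.35, 0.44, 0.52, 0.75])

q1, median, q3 = np.percentile(errors, [25, 50, 75])
iqr = q3 - q1  # height of the box: spread of the middle 50% of errors
print(round(median, 3), round(iqr, 3))
```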

Figure 3 presents the box plots of the MAPEs of the proposed forecasting system that applies the proposed integration approach (S2), which combines the predictions of 10 different forecasting algorithms.

Figure 4 shows the results after adding DL to our system. The interquartile range boxes are narrower compared to the ones in Figures 2 and 3, which means that the spread of the forecasting errors is smaller than in Figures 2 and 3. Finally, the integration of DL into the proposed forecasting system generates results that are more accurate.


Figure 3: MAPE distribution according to product groups with S2 integration strategy. (Box plots; y-axis: MAPE from 0.00 to 1.00; x-axis: the 21 product groups of Table 3.)

Figure 4: MAPE distribution according to product groups after deep learning algorithms. (Box plots; y-axis: MAPE from 0.00 to 1.00; x-axis: the 21 product groups of Table 3.)

For example, for the (S1) integration strategy in the Baby Products product group, the MAPE distribution in Figure 2 is observed between 0% and 23% for the 1st quartile of the data, between 23% and 48% for the 2nd quartile, and between 48% and 75% for the 3rd and 4th quartiles. After applying the integration strategy for the Baby Products product group, the MAPE distribution becomes 0-16% for the 1st quartile, 16-30% for the 2nd quartile of the data, and 30-50% for the rest. According to the boundary differences, this gives around an 8% improvement for the first quartile, from 8% to 18% enhancement for the second quartile, and from 18% to 25% advancement for the rest of the data. The median of the MAPE distribution is analyzed as 48%; after applying the proposed integration strategy, it is observed as 27%, which means a 21% improvement. Approximately 1-3% enhancement is observed for each quartile of the data with the inclusion of the deep learning strategy in Figure 4.

For Cleaning Products, improvements similar to those of the other groups are observed for the integration strategy. The MAPE distribution of the 1st quartile of the data is observed between 0% and 19%, for the 2nd quartile it is between 19% and 40%, and for the rest of the data it is observed between 40% and


Figure 5: Accuracy comparison among integration strategies for one sample product group. (Weekly sum of sale quantity, 0 to 9000, from July 2017 to July 2018; series: Actual, S1, S2, SD.)

73% in Figure 2. After performing the proposed integration strategy for Cleaning Products, the MAPE distribution becomes 0-10% for the 1st quartile of the data, 10-17% for the 2nd quartile, and 27-43% for the 3rd and 4th quartiles together. Improvements at the boundaries are observed as follows: 9% for the 1st quartile, from 9% to 13% for the 2nd quartile, and from 13% to 20% for the rest of the data. After implementation of the integration strategy, the system advancement is observed as 2% for the 1st and 2nd quartiles and 3% for the 3rd and 4th quartiles, respectively. Briefly, the integration strategy improvement is remarkable for each product group, and consolidating DL into the proposed integration strategy additionally advances the results by almost 2% to 3%.

In Figure 5, the accuracy comparison among integration strategies is presented for one of the best performing sample groups during a 1-year period. In particular, Christmas and other seasonality effects can be seen very clearly in Figure 5, and the integration strategy with DL (SD) performs with the best accuracy.

It is hard to compare the performance of our proposed system with other studies because of the lack of works with a similar combination of a deep learning approach, the proposed decision integration strategies, and different learning algorithms for the demand forecasting process. Another difficulty is the deficiency of real-life datasets similar to the SOK dataset for comparison with state-of-the-art studies. Although the results of the proposed system are given in this study, we also report here the results of a number of research works on the demand forecasting domain. A greedy aggregation-decomposition method has solved a real-world intermittent demand forecasting problem for a fashion retailer in Singapore [46]; the authors report a 59% MAPE success rate with a small dataset. The authors of [47] compare different approaches, such as a statistical model, the Winters model, and a radial basis function neural network (RBFNN), with SVM for the demand forecasting process. They conclude that the success of the SVM algorithm surpasses the others, with around a 77% enhancement at the average MAPE level. A novel demand forecasting model called SHEnSVM (Selective and Heterogeneous Ensemble of Support Vector Machines) is proposed in [48]. In that model, individual SVMs are trained on different samples generated by a bootstrap algorithm, and a genetic algorithm is then employed to retrieve the best individual combination schema. They report a 10% advancement with the SVM algorithm and a 64% MAPE improvement, employing only beer data with 3 different brands in their experiments. Tugba Efendigil et al. propose a novel forecasting mechanism modeled by artificial intelligence approaches in [49]. They compare both artificial neural networks and adaptive network-based fuzzy inference system techniques to manage fuzzy demand with incomplete information, and they reach around 18% MAPE rates for some products during their experiments.

6 Conclusion

In the retail industry, demand forecasting is one of the main problems of supply chains in optimizing stocks, reducing costs, and increasing sales, profit, and customer loyalty. To overcome this issue, there are several methods, such as time series analysis and machine learning approaches, to analyze and learn complex interactions and patterns from historical data.

In this study, there is a novel attempt to integrate 11 different forecasting models that include time series algorithms, a support vector regression model, and a deep learning method for the demand forecasting process. Moreover, a novel decision integration strategy is developed by drawing inspiration from the ensemble methodology, namely boosting. The proposed approach considers the performance of each model over time and combines the weighted predictions of the best performing models for the demand forecasting process. The


proposed forecasting system is tested on real-life data obtained from the SOK Market retail chain. It is observed that the inclusion of different learning algorithms besides time series models and a novel integration strategy advanced the performance of the demand forecasting system. To the best of our knowledge, this is the very first attempt to consolidate deep learning methodology, the SVR algorithm, and different time series methods for demand forecasting systems. Furthermore, the other novelty of this work is the adaptation of the boosting ensemble strategy to the demand forecasting model. In this way, the final decision of the proposed system is based on the best algorithms of the week, which gain more weight. This makes our forecasts more reliable with respect to trend changes and seasonality behaviors. Moreover, the proposed approach performs very well when integrated with the deep learning algorithm on the Spark big data environment. The dimension reduction process and clustering methods help to decrease time consumption with less computing power during the deep learning modelling phase. Although a review of some similar studies is presented in Section 5, it is, as expected, very difficult to compare the results of other studies with ours because of the use of different datasets and methods. In this study, we compare the results of three models: model one selects the best performing forecasting method depending on its success in the previous period, with 42.4% MAPE on average; the second model, with the novel integration strategy, results in 25.8% MAPE on average; and the last model, the novel integration strategy enhanced with the deep learning approach, provides 24.7% MAPE on average. As a result, the inclusion of the deep learning approach into the novel integration strategy reduces the average prediction error for the demand forecasting process in the supply chain.
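The boosting-inspired decision integration summarized above can be sketched as follows: weights are derived from each method's error on the previous period, only the best performing share of methods contributes (30 percent in the current system), and their forecasts are combined by weighted average. This is an illustrative reconstruction, not the authors' exact formulation; the method names and numbers are hypothetical:

```python
def integrate_forecasts(forecasts, recent_mape, top_share=0.30):
    """Weighted combination of the best performing methods.

    forecasts:   {method_name: forecast for next week}
    recent_mape: {method_name: MAPE on the previous period}
    """
    # Keep the best performing 30% of methods (lowest recent error).
    ranked = sorted(recent_mape, key=recent_mape.get)
    kept = ranked[:max(1, round(len(ranked) * top_share))]
    # Lower recent error -> higher weight (inverse-error weighting).
    weights = {m: 1.0 / recent_mape[m] for m in kept}
    total = sum(weights.values())
    return sum(weights[m] * forecasts[m] for m in kept) / total

# Hypothetical forecasts and last-period errors for four of the methods.
forecasts = {"ARIMA": 120.0, "Holt-Winters": 131.0, "SVR": 124.0, "DL": 126.0}
recent_mape = {"ARIMA": 0.41, "Holt-Winters": 0.38, "SVR": 0.27, "DL": 0.25}
print(integrate_forecasts(forecasts, recent_mape))
```

Because the weights are recomputed every week, a method that starts performing better under a trend change or a seasonal shift automatically gains influence on the final decision.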

As future work, we plan to enrich the set of features by gathering data from other sources, like economic studies, shopping trends, social media, social events, and location-based demographic data of stores; the contributions of new varieties of data sources to deep learning can then be observed. Another study can be done to determine the hyperparameters for the deep learning algorithm. In addition, we also plan to use other deep learning techniques, such as convolutional neural networks, recurrent neural networks, and deep neural networks, as learning algorithms. Furthermore, another objective is to use heuristic methods, MBO (Migrating Birds Optimization) and other related algorithms [50], to optimize some of the coefficients/weights that were determined empirically by trial and error, like taking 30 percent of the best performing methods in our current system.

Data Availability

This dataset is private customer data, so the agreement with the organization (SOK) does not allow sharing the data.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This work is supported by OBASE Research & Development Center.

References

[1] P. J. McGoldrick and E. Andre, "Consumer misbehaviour: promiscuity or loyalty in grocery shopping," Journal of Retailing and Consumer Services, vol. 4, no. 2, pp. 73-81, 1997.
[2] D. Grewal, A. L. Roggeveen, and J. Nordfalt, "The future of retailing," Journal of Retailing, vol. 93, no. 1, pp. 1-6, 2017.
[3] E. T. Bradlow, M. Gangwar, P. Kopalle, and S. Voleti, "The role of big data and predictive analytics in retailing," Journal of Retailing, vol. 93, no. 1, pp. 79-95, 2017.
[4] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, USA, 2013.
[5] R. Hyndman and G. Athanasopoulos, Forecasting: Principles and Practice, OTexts, Melbourne, Australia, 2018, http://otexts.org/fpp2.
[6] C. C. Pegels, "Exponential forecasting: some new variations," Management Science, vol. 12, pp. 311-315, 1969.
[7] E. S. Gardner, "Exponential smoothing: the state of the art," Journal of Forecasting, vol. 4, no. 1, pp. 1-28, 1985.
[8] D. Gujarati, Basic Econometrics, McGraw-Hill, New York, NY, USA, 2003.
[9] J. D. Croston, "Forecasting and stock control for intermittent demands," Operational Research Quarterly, vol. 23, no. 3, pp. 289-303, 1972.
[10] T. R. Willemain, C. N. Smart, J. H. Shockor, and P. A. DeSautels, "Forecasting intermittent demand in manufacturing: a comparative evaluation of Croston's method," International Journal of Forecasting, vol. 10, no. 4, pp. 529-538, 1994.
[11] A. Syntetos, Forecasting of intermittent demand, Brunel University, 2001.
[12] A. A. Syntetos and J. E. Boylan, "The accuracy of intermittent demand estimates," International Journal of Forecasting, vol. 21, no. 2, pp. 303-314, 2005.
[13] R. H. Teunter, A. A. Syntetos, and M. Z. Babai, "Intermittent demand: linking forecasting to inventory obsolescence," European Journal of Operational Research, vol. 214, no. 3, pp. 606-615, 2011.
[14] R. J. Hyndman, R. A. Ahmed, G. Athanasopoulos, and H. L. Shang, "Optimal combination forecasts for hierarchical time series," Computational Statistics & Data Analysis, vol. 55, no. 9, pp. 2579-2589, 2011.
[15] R. Fildes and F. Petropoulos, "Simple versus complex selection rules for forecasting many time series," Journal of Business Research, vol. 68, no. 8, pp. 1692-1701, 2015.
[16] C. Li and A. Lim, "A greedy aggregation-decomposition method for intermittent demand forecasting in fashion retailing," European Journal of Operational Research, vol. 269, no. 3, pp. 860-869, 2018.
[17] F. Turrado Garcia, L. J. Garcia Villalba, and J. Portela, "Intelligent system for time series classification using support vector machines applied to supply-chain," Expert Systems with Applications, vol. 39, no. 12, pp. 10590-10599, 2012.
[18] G. Song and Q. Dai, "A novel double deep ELMs ensemble system for time series forecasting," Knowledge-Based Systems, vol. 134, pp. 31-49, 2017.
[19] O. Araque, I. Corcuera-Platas, J. F. Sanchez-Rada, and C. A. Iglesias, "Enhancing deep learning sentiment analysis with ensemble techniques in social applications," Expert Systems with Applications, vol. 77, pp. 236-246, 2017.
[20] H. Tong, B. Liu, and S. Wang, "Software defect prediction using stacked denoising autoencoders and two-stage ensemble learning," Information and Software Technology, vol. 96, pp. 94-111, 2018.
[21] X. Qiu, Y. Ren, P. N. Suganthan, and G. A. J. Amaratunga, "Empirical mode decomposition based ensemble deep learning for load demand time series forecasting," Applied Soft Computing, vol. 54, pp. 246-255, 2017.
[22] Z. Qi, B. Wang, Y. Tian, and P. Zhang, "When ensemble learning meets deep learning: a new deep support vector machine for classification," Knowledge-Based Systems, vol. 107, pp. 54-60, 2016.
[23] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: a review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798-1828, 2013.
[24] G. E. Hinton, S. Osindero, and Y. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527-1554, 2006.
[25] Y. Bengio, "Practical recommendations for gradient-based training of deep architectures," in Neural Networks: Tricks of the Trade, vol. 7700 of Lecture Notes in Computer Science, pp. 437-478, Springer, Berlin, Germany, 2nd edition, 2012.
[26] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436-444, 2015.
[27] S. Kim, Z. Yu, R. M. Kil, and M. Lee, "Deep learning of support vector machines with class probability output networks," Neural Networks, vol. 64, pp. 19-28, 2015.
[28] P. J. Brockwell and R. A. Davis, Time Series: Theory and Methods, Springer Series in Statistics, New York, NY, USA, 1989.
[29] V. N. Vapnik, Statistical Learning Theory, Wiley-Interscience, New York, NY, USA, 1998.
[30] S. Haykin, Neural Networks and Learning Machines, Macmillan Publishers Limited, New Jersey, NJ, USA, 2009.
[31] A. Candel, V. Parmar, E. LeDell, and A. Arora, Deep Learning with H2O, United States of America, 2018.
[32] K. Keerthi Vasan and B. Surendiran, "Dimensionality reduction using principal component analysis for network intrusion detection," Perspectives in Science, vol. 8, pp. 510-512, 2016.
[33] L. Rokach, "Ensemble-based classifiers," Artificial Intelligence Review, vol. 33, no. 1-2, pp. 1-39, 2010.
[34] R. Polikar, "Ensemble based systems in decision making," IEEE Circuits and Systems Magazine, vol. 6, no. 3, pp. 21-45, 2006.
[35] S. Reid, A review of heterogeneous ensemble methods, Department of Computer Science, University of Colorado at Boulder, 2007.
[36] D. Gopika and B. Azhagusundari, "An analysis on ensemble methods in classification tasks," International Journal of Advanced Research in Computer and Communication Engineering, vol. 3, no. 7, pp. 7423-7427, 2014.
[37] Y. Ren, L. Zhang, and P. N. Suganthan, "Ensemble classification and regression: recent developments, applications and future directions," IEEE Computational Intelligence Magazine, vol. 11, no. 1, pp. 41-53, 2016.
[38] U. G. Mangai, S. Samanta, S. Das, and P. R. Chowdhury, "A survey of decision fusion and feature fusion strategies for pattern classification," IETE Technical Review, vol. 27, no. 4, pp. 293-307, 2010.
[39] M. Wozniak, M. Grana, and E. Corchado, "A survey of multiple classifier systems as hybrid systems," Information Fusion, vol. 16, no. 1, pp. 3-17, 2014.
[40] G. Tsoumakas, L. Angelis, and I. Vlahavas, "Selective fusion of heterogeneous classifiers," Intelligent Data Analysis, vol. 9, no. 6, pp. 511-525, 2005.
[41] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," Journal of Computer and System Sciences, vol. 55, no. 1, part 2, pp. 119-139, 1997.
[42] S. Chopra and P. Meindl, Supply Chain Management: Strategy, Planning and Operation, Prentice Hall, 2004.
[43] G. Kushwaha, "Operational performance through supply chain management practices," International Journal of Business and Social Science, vol. 217, pp. 65-77, 2012.
[44] G. E. Box, G. M. Jenkins, G. Reinsel, and G. M. Ljung, Time Series Analysis: Forecasting and Control, Wiley Series in Probability and Statistics, John Wiley & Sons, Hoboken, NJ, USA, 5th edition, 2015.
[45] W.-L. Zhao, C.-H. Deng, and C.-W. Ngo, "k-means: a revisit," Neurocomputing, vol. 291, pp. 195-206, 2018.
[46] C. Li and A. Lim, "A greedy aggregation-decomposition method for intermittent demand forecasting in fashion retailing," European Journal of Operational Research, vol. 269, no. 3, pp. 860-869, 2018.
[47] L. Yue, Y. Yafeng, G. Junjun, and T. Chongli, "Demand forecasting by using support vector machine," in Proceedings of the Third International Conference on Natural Computation (ICNC 2007), pp. 272-276, Haikou, China, August 2007.
[48] L. Yue, L. Zhenjiang, Y. Yafeng, T. Zaixia, G. Junjun, and Z. Bofeng, "Selective and heterogeneous SVM ensemble for demand forecasting," in Proceedings of the 2010 IEEE 10th International Conference on Computer and Information Technology (CIT), pp. 1519-1524, Bradford, UK, June 2010.
[49] T. Efendigil, S. Onut, and C. Kahraman, "A decision support system for demand forecasting with artificial neural networks and neuro-fuzzy models: a comparative analysis," Expert Systems with Applications, vol. 36, no. 3, pp. 6697-6707, 2009.
[50] E. Duman, M. Uysal, and A. F. Alkaya, "Migrating birds optimization: a new metaheuristic approach and its performance on quadratic assignment problem," Information Sciences, vol. 217, pp. 65-77, 2012.

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 9: An Improved Demand Forecasting Model Using …downloads.hindawi.com/journals/complexity/2019/9067367.pdfAn Improved Demand Forecasting Model Using Deep Learning Approach and Proposed

Complexity 9

Table 2 Dataset explanation

Name Description ExampleYearweek Related yearweek weeks are starting fromMonday to Sunday 201801Store Store number 1234Product Product Identification Number 1

Product Adjactive Associated product with the product according to apriori algorithm Most frequentproduct at the same basket with a specific product 2

Stock In Quantity Week[0-8] Stock increase quantity of the product in related week ex stock transfer quantityfrom distribution center to store 50

Return Quantity Week[0-8] Stock return quantity from customers at a specific week and store 20Sales Quantity Week[0-8] Weekly sales quantity of related product at a specific store 120Sales Amount Week[0-8] Total sales amount of the product at the customer receipt 2500Discount Amount Week[0-8] Discount amount of the product if there is any 500Customer Count Week[0-8] How many customers bought this product at a specific week 30Receipt Count Week[0-8] Distinct receipt count for related product 20Sales Quantity Time[9-22] Hourly sales quantity of related product from 9 am to 22 pm 5Last4weeks Day[1-7] Total sales of each weekday of last 4 weeks Total sales of Mondays Tuesdays etc 10Last8weeks Day[1-7] Total sales of each weekday of last 8 weeks Total sales of Mondays Tuesdays etc 10Max Stock Week[0-8] Maximum stock quantity of related week 12Min Stock Week[0-8] Minimum stock quantity of related week 2Avg Stock Week[0-8] Average stock quantity of related week 5Sales Quantity Adj Week[0-8] Sales quantity of most associated product 14Temperature Weekday[1-7] Daily temperature of weekdays Monday Tuesday etc 22Weekly Avg Temperature[0-8] Average weather temperature of related week 23Weather Condition Weekday[1-7] Nominal variable rainy snowy sunny cloudy etc RainySales Quantity Next Week Target variable of our classification problem 25

dropout and grid search Gaussian distribution is appliedbecause of the usage of continuous variable as responsevariable H2O performs very well in our environment when3 level hidden layers 10 nodes each of them and totally 300epochs are set as parameters

After implementing dimension reduction step K-meansis employed as a clustering algorithm K-means is a greedyand fast clustering algorithm [45] that tries to partitionsamples into k clusters in which each sample is near to thecluster center K is selected as 20 because after several trialsand empirical observations the most even distribution ondataset is reached and divided our dataset into 20 differ-ent clusters for each store After that part deep learningalgorithm is applied and obtained 20 different DL modelsfor each store Then demand forecasting for each productis performed by using its clusterrsquos model on a store basisThe main reason of using clustering is time and computinglack for DL algorithm Instead of making modelling for eachstore 20 models are generated for each store in product levelOur trials with sample dataset bias difference are less than2 so this speed-up is very beneficial and a good optionfor researchers The forecasting result of the DL model istransferred into our forecasting system that makes its finaldecision by considering decisions of 11 forecasting methodsincluding DL model

Forecasting solutions should be scalable process largeamounts of data and extract a model from data Big data

environments using Spark technologies give us opportunityfor implementing algorithms with scalability that shares tasksof machine learning algorithms among nodes of parallelarchitectures Considering huge amounts of samples andlarge number of features even the computational power ofparallel architecture is not enough in some of cases Forthis reason dimension reduction is needed for big dataapplications In this study PCA is employed as a featureextraction step

5 Experimental Results


SOK Market sells 21 different groups of items. We assess the performance of the proposed forecasting system on a group basis, as shown in Table 3. Table 3 shows the MAPE (Mean Absolute Percentage Error) of integration strategies (S1) and (S2)

10 Complexity

Table 3: Demand forecasting improvements per product group.

| Product Group | Method I, without Proposed Integration Strategy (S1) MAPE | Method II, with Proposed Integration Strategy (S2) MAPE | Proposed Integration Strategy with Deep Learning (SD) MAPE | Percentage Success Rate P1 = (S1 − S2)/S1 | Percentage Success Rate P2 = (S1 − SD)/S1 | Improvement D = P2 − P1 |
|---|---|---|---|---|---|---|
| Baby Products | 0.5157 | 0.3081 | 0.2927 | 40.26 | 43.25 | 2.99 |
| Bakery Products | 0.3482 | 0.2059 | 0.1966 | 40.85 | 43.52 | 2.67 |
| Beverage | 0.3714 | 0.2316 | 0.2207 | 37.64 | 40.58 | 2.94 |
| Biscuit-Chocolate | 0.3358 | 0.2077 | 0.1977 | 38.14 | 41.13 | 2.99 |
| Breakfast Products | 0.4443 | 0.2770 | 0.2661 | 37.65 | 40.11 | 2.45 |
| Canned-Paste-Sauces | 0.3836 | 0.2309 | 0.2198 | 39.80 | 42.69 | 2.89 |
| Cheese | 0.3953 | 0.2457 | 0.2345 | 37.84 | 40.68 | 2.84 |
| Cleaning Products | 0.4560 | 0.2791 | 0.2650 | 38.79 | 41.89 | 3.10 |
| Cosmetics Products | 0.5397 | 0.3266 | 0.3148 | 39.49 | 41.67 | 2.19 |
| Deli Meats | 0.4242 | 0.2602 | 0.2488 | 38.65 | 41.36 | 2.70 |
| Edible Oils | 0.4060 | 0.2299 | 0.2215 | 43.36 | 45.45 | 2.09 |
| Household Goods | 0.5713 | 0.3656 | 0.3535 | 36.01 | 38.13 | 2.12 |
| Ice Cream-Frozen | 0.5012 | 0.3255 | 0.3106 | 35.05 | 38.03 | 2.98 |
| Legumes-Pasta-Soup | 0.3850 | 0.2397 | 0.2269 | 37.74 | 41.07 | 3.33 |
| Nuts-Chips | 0.3316 | 0.2049 | 0.1966 | 38.20 | 40.71 | 2.51 |
| Poultry-Eggs | 0.4219 | 0.2527 | 0.2403 | 40.11 | 43.04 | 2.94 |
| Ready Meals | 0.4613 | 0.2610 | 0.2520 | 43.42 | 45.36 | 1.94 |
| Red Meat | 0.2514 | 0.1616 | 0.1532 | 35.71 | 39.06 | 3.35 |
| Tea-Coffee Products | 0.4347 | 0.2650 | 0.2535 | 39.04 | 41.68 | 2.64 |
| Textile Products | 0.5418 | 0.3048 | 0.2907 | 43.74 | 46.34 | 2.60 |
| Tobacco Products | 0.3791 | 0.2378 | 0.2290 | 37.29 | 39.61 | 2.32 |
| Average | 0.4238 | 0.2582 | 0.2469 | 38.99 | 41.68 | 2.69 |


Figure 2: MAPE distribution according to product groups with S1 integration strategy. [Box plots of MAPE (y-axis, 0.00 to 1.00) for each of the 21 product groups listed in Table 3.]

in columns 2 and 3. The average percentage forecasting errors of strategies (S1) and (S2) are 0.42 and 0.26, respectively. As can be seen from Table 3, the percentage success rate of (S2) relative to (S1) is presented in column 5. The current integration strategy (S2) provides a 38.99% improvement over the first integration strategy (S1) on average. In some product groups, such as Textile Products, the error rate is reduced from 0.54 to 0.31, a 43.74% enhancement in MAPE.
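The error and success-rate columns of Table 3 follow directly from these definitions; the following small sketch reproduces the Baby Products row (the printed values can differ from the table in the last digit because the published MAPEs are themselves rounded).

```python
def mape(actual, forecast):
    """Mean absolute percentage error over paired observations."""
    return sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual)

def success_rate(s_base, s_new):
    """Percentage improvement of s_new over s_base, as in Table 3."""
    return 100 * (s_base - s_new) / s_base

# Baby Products row from Table 3: S1 = 0.5157, S2 = 0.3081, SD = 0.2927
p1 = success_rate(0.5157, 0.3081)
p2 = success_rate(0.5157, 0.2927)
print(round(p1, 2), round(p2, 2), round(p2 - p1, 2))  # 40.26 43.24 2.99
```

The improvement column D is simply the extra success rate gained by adding the DL model on top of the integration strategy, D = P2 − P1.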

With this work, a further improvement is obtained by utilizing the DL model compared to the usage of just the 10 algorithms on which the forecasting strategy is based. Column 4 of Table 3 demonstrates the MAPEs of the 21 different groups of products after the addition of the DL model. After the inclusion of the DL model into the proposed system, we observed around a 2% to 3.4% increase in demand forecast accuracies in some of the product groups. The error rates of some product groups, like "baby products", "cosmetics products", and "household goods", are usually higher than the others, since these product groups do not have continuous demand from consumers. For example, the error rate for the "household goods" group is 0.35, while the error rate for the "red meat" group is 0.15. The error rates in food products are usually lower, since customers buy these products regularly as part of their daily consumption. However, the error rates are higher in baby, textile, and household goods: customers buy products from these groups selectively, depending on whether they like the product or not. In addition, the prices of goods in these groups and the promotion effects are usually high relative to those of goods in other groups.

Moreover, when the column of percentage success rate is analyzed, it is observed that the top-benefitting product groups for Method I accomplish over a 40% success rate on Baby Products, Bakery Products, Edible Oils, Poultry-Eggs, Ready Meals, and Textile Products. The common point of these groups is that each one of them is consumed by a specific customer segment regularly. For instance, the baby products group is chosen by families who have kids; Edible Oils, Poultry-Eggs, and Bakery Products are preferred by frequently and continuously shopping customer segments; and Ready Meals are preferred mostly by bachelor, single, and working consumers. Furthermore, the inclusion of DL into the forecasting system indicates that some other consuming groups (for example, Cleaning Products, Red Meat, and Legumes-Pasta-Soup) exhibit better performance than the others, with an additional improvement of over 3%.

Figure 2 shows the box plots of the mean absolute percentage errors (MAPE) for each group of products after application of the (S1) integration strategy. The box plots enable us to analyze the distributional characteristics of forecasting errors for the product groups. As can be seen from the figure, the medians differ across product groups; thus, the forecasting errors are usually dissimilar for different product groups. The interquartile range box represents the middle 50% of scores in the data. The interquartile range boxes are usually very tall, which means that there is a considerable spread of prediction errors within the products of a given group.
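The box statistics described above are just percentiles of the per-product MAPEs within a group. A small sketch with hypothetical error values:

```python
import numpy as np

# Hypothetical per-product MAPEs within one product group.
errors = np.array([0.05, 0.12, 0.18, 0.21, 0.27, 0.33, 0.41, 0.48, 0.55, 0.62])

q1, median, q3 = np.percentile(errors, [25, 50, 75])
iqr = q3 - q1   # the box height: spread of the middle 50% of per-product errors
print(round(median, 3), round(iqr, 3))  # 0.3 0.275
```

A tall box (large IQR) is exactly the "very tall interquartile range boxes" observed in Figure 2: prediction quality varies widely between products of the same group.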

Figure 3 presents the box plots of the MAPEs of the proposed forecasting system applying the proposed integration approach (S2), which combines the predictions of 10 different forecasting algorithms.

Figure 4 shows the results after adding DL to our system. The interquartile range boxes are narrower when compared to the ones in Figures 2 and 3, which means that the forecasting errors are less dispersed than in Figures 2 and 3. Finally, the integration of DL into the proposed forecasting system generates more accurate results.


Figure 3: MAPE distribution according to product groups with S2 integration strategy. [Box plots of MAPE (y-axis, 0.00 to 1.00) for each of the 21 product groups.]

Figure 4: MAPE distribution according to product groups after deep learning algorithms. [Box plots of MAPE (y-axis, 0.00 to 1.00) for each of the 21 product groups.]

For example, for the (S1) integration strategy in the Baby Products group, the MAPE distribution in Figure 2 is observed between 0% and 23% for the 1st quartile of the data, between 23% and 48% for the 2nd quartile, and between 48% and 75% for the 3rd and 4th quartiles. After applying the integration strategy for the Baby Products group, the MAPE distribution becomes 0-16% for the 1st quartile, 16-30% for the 2nd quartile of the data, and 30-50% for the rest. This gives around an 8% improvement for the first quartile, an 8% to 18% enhancement for the second quartile, and an 18% to 25% advancement for the rest of the data, according to the boundary differences. The median of the MAPE distribution is 48%; after applying the proposed integration strategy, it is observed as 27%, which means a 21% improvement. Approximately a 1-3% enhancement is observed for each quartile of the data with the inclusion of the deep learning strategy in Figure 4.

For Cleaning Products, similar improvements to the other groups are observed with the integration strategy. In Figure 2, the MAPE distribution of the 1st quartile of the data is observed between 0% and 19%, the 2nd quartile between 19% and 40%, and the rest of the data between 40% and 73%. After performing the proposed integration strategy for Cleaning Products, the MAPE distribution becomes 0-10% for the 1st quartile of the data, 10-17% for the 2nd quartile, and 27-43% for the 3rd and 4th quartiles together. The improvements at the boundaries are 9% for the 1st quartile, 9% to 13% for the 2nd quartile, and 13% to 20% for the rest of the data. After implementation of the integration strategy, the system advancement is observed as 2% for the 1st and 2nd quartiles and 3% for the 3rd and 4th quartiles, respectively. Briefly, the integration strategy improvement is remarkable for each product group, and consolidating DL into the proposed integration strategy additionally advances the results by almost 2% to 3%.

Figure 5: Accuracy comparison among integration strategies for one sample product group. [Line chart of weekly sales quantity (0 to 9000) from July 2017 to July 2018, with series Actual, S1, S2, and SD.]

In Figure 5, the accuracy comparison among integration strategies for one of the best-performing sample groups is presented over a 1-year period. In particular, Christmas and other seasonality effects can be seen very clearly in Figure 5, and the integration strategy with DL (SD) performs with the best accuracy.

It is hard to compare the performance of our proposed system with other studies because of the lack of works with similar combinations of a deep learning approach, the proposed decision integration strategies, and different learning algorithms for the demand forecasting process. Another difficulty is the lack of real-life datasets similar to the SOK dataset on which to compare state-of-the-art studies. Although the results of the proposed system are given in this study, we also report here the results of a number of research works on the demand forecasting domain. A greedy aggregation-decomposition method has solved a real-world intermittent demand forecasting problem for a fashion retailer in Singapore [46]; the authors report a 59% MAPE success rate on a small dataset. The authors of [47] compare different approaches, such as a statistical model, the Winters model, and a radial basis function neural network (RBFNN), with SVM for the demand forecasting process. They conclude that the SVM algorithm surpasses the others, with around a 77% enhancement at the average MAPE results level. A novel demand forecasting model called SHEnSVM (Selective and Heterogeneous Ensemble of Support Vector Machines) is proposed in [48]. In this model, the individual SVMs are trained on different samples generated by a bootstrap algorithm; a genetic algorithm is then employed to retrieve the best individual combination schema. They report a 10% advancement with the SVM algorithm and a 64% MAPE improvement, employing only beer data with 3 different brands in their experiments. Tugba Efendigil et al. propose a novel forecasting mechanism modeled by artificial intelligence approaches in [49]. They compared both artificial neural networks and adaptive network-based fuzzy inference system techniques to manage fuzzy demand with incomplete information, reaching around 18% MAPE rates for some products during their experiments.

6 Conclusion

In the retail industry, demand forecasting is one of the main problems of supply chains in optimizing stocks, reducing costs, and increasing sales, profit, and customer loyalty. To overcome this issue, there are several methods, such as time series analysis and machine learning approaches, to analyze and learn complex interactions and patterns from historical data.

In this study, there is a novel attempt to integrate 11 different forecasting models, which include time series algorithms, a support vector regression model, and a deep learning method, for the demand forecasting process. Moreover, a novel decision integration strategy is developed by drawing inspiration from the ensemble methodology, namely boosting. The proposed approach considers the performance of each model over time and combines the weighted predictions of the best-performing models for the demand forecasting process. The


proposed forecasting system is tested on real-life data obtained from the SOK Market retail chain. It is observed that the inclusion of different learning algorithms beyond time series models, together with a novel integration strategy, advanced the performance of the demand forecasting system. To the best of our knowledge, this is the very first attempt to consolidate a deep learning methodology, the SVR algorithm, and different time series methods for demand forecasting systems. Furthermore, the other novelty of this work is the adaptation of the boosting ensemble strategy to the demand forecasting model. In this way, the final decision of the proposed system is based on the best algorithms of the week, which gain more weight. This makes our forecasts more robust with respect to trend changes and seasonality behaviors. Moreover, the proposed approach performs very well when integrated with the deep learning algorithm on a Spark big data environment. The dimension reduction and clustering methods help to decrease the time and computing power consumed during the deep learning modelling phase. Although a review of some similar studies is presented in Section 5, as expected, it is very difficult to compare the results of other studies with ours because of the use of different datasets and methods. In this study, we compare the results of three models: model one selects the best-performing forecasting method depending on its success in the previous period, with 42.4% MAPE on average; the second model, with the novel integration strategy, results in 25.8% MAPE on average; and the last model, the novel integration strategy enhanced with the deep learning approach, provides 24.7% MAPE on average. As a result, the inclusion of the deep learning approach into the novel integration strategy reduces the average prediction error for the demand forecasting process in the supply chain.
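A minimal sketch of the boosting-inspired integration idea described above: rank the candidate models by their error over the previous period, keep the best-performing fraction (the paper empirically takes 30 percent), and weight their forecasts by inverse error. The function and the example values are hypothetical, not the authors' exact weighting scheme.

```python
def integrate_forecasts(forecasts, recent_errors, top_fraction=0.3):
    """Combine model forecasts, boosting-style: keep only the recently
    best-performing models and weight them by inverse error.

    forecasts     -- {model_name: forecast for the next week}
    recent_errors -- {model_name: MAPE over the previous period}
    """
    ranked = sorted(recent_errors, key=recent_errors.get)           # best first
    keep = ranked[:max(1, round(top_fraction * len(ranked)))]       # top 30%
    weights = {m: 1.0 / recent_errors[m] for m in keep}
    total = sum(weights.values())
    return sum(forecasts[m] * weights[m] / total for m in keep)

# Hypothetical weekly forecasts and previous-period errors for four models.
forecasts = {"arima": 120.0, "svr": 100.0, "dl": 110.0, "holt": 140.0}
errors    = {"arima": 0.40, "svr": 0.20, "dl": 0.25, "holt": 0.55}
print(integrate_forecasts(forecasts, errors))  # 100.0
```

Because the weights are recomputed every period, a model that tracks a trend change or seasonal shift well immediately gains influence in the final decision, which is the robustness property claimed above.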

As future work, we plan to enrich the set of features by gathering data from other sources, like economic studies, shopping trends, social media, social events, and location-based demographic data of stores, and to observe the contributions of these new data sources to deep learning. A further study can be carried out to determine the hyperparameters of the deep learning algorithm. In addition, we also plan to use other deep learning techniques, such as convolutional neural networks, recurrent neural networks, and deep neural networks, as learning algorithms. Furthermore, another objective is to use heuristic methods, MBO (Migrating Birds Optimization) and other related algorithms [50], to optimize some of the coefficients/weights that were determined empirically by trial and error, like taking 30 percent of the best-performing methods in our current system.

Data Availability

This dataset is private customer data, so the agreement with the organization (SOK) does not allow sharing the data.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This work is supported by the OBASE Research & Development Center.

References

[1] P. J. McGoldrick and E. Andre, "Consumer misbehaviour: promiscuity or loyalty in grocery shopping," Journal of Retailing and Consumer Services, vol. 4, no. 2, pp. 73–81, 1997.

[2] D. Grewal, A. L. Roggeveen, and J. Nordfalt, "The future of retailing," Journal of Retailing, vol. 93, no. 1, pp. 1–6, 2017.

[3] E. T. Bradlow, M. Gangwar, P. Kopalle, and S. Voleti, "The role of big data and predictive analytics in retailing," Journal of Retailing, vol. 93, no. 1, pp. 79–95, 2017.

[4] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, USA, 2013.

[5] R. Hyndman and G. Athanasopoulos, Forecasting: Principles and Practice, OTexts, Melbourne, Australia, 2018, http://otexts.org/fpp2.

[6] C. C. Pegels, "Exponential forecasting: some new variations," Management Science, vol. 12, pp. 311–315, 1969.

[7] E. S. Gardner, "Exponential smoothing: the state of the art," Journal of Forecasting, vol. 4, no. 1, pp. 1–28, 1985.

[8] D. Gujarati, Basic Econometrics, McGraw-Hill, New York, NY, USA, 2003.

[9] J. D. Croston, "Forecasting and stock control for intermittent demands," Operational Research Quarterly, vol. 23, no. 3, pp. 289–303, 1972.

[10] T. R. Willemain, C. N. Smart, J. H. Shockor, and P. A. DeSautels, "Forecasting intermittent demand in manufacturing: a comparative evaluation of Croston's method," International Journal of Forecasting, vol. 10, no. 4, pp. 529–538, 1994.

[11] A. Syntetos, Forecasting of Intermittent Demand, Brunel University, 2001.

[12] A. A. Syntetos and J. E. Boylan, "The accuracy of intermittent demand estimates," International Journal of Forecasting, vol. 21, no. 2, pp. 303–314, 2005.

[13] R. H. Teunter, A. A. Syntetos, and M. Z. Babai, "Intermittent demand: linking forecasting to inventory obsolescence," European Journal of Operational Research, vol. 214, no. 3, pp. 606–615, 2011.

[14] R. J. Hyndman, R. A. Ahmed, G. Athanasopoulos, and H. L. Shang, "Optimal combination forecasts for hierarchical time series," Computational Statistics & Data Analysis, vol. 55, no. 9, pp. 2579–2589, 2011.

[15] R. Fildes and F. Petropoulos, "Simple versus complex selection rules for forecasting many time series," Journal of Business Research, vol. 68, no. 8, pp. 1692–1701, 2015.

[16] C. Li and A. Lim, "A greedy aggregation-decomposition method for intermittent demand forecasting in fashion retailing," European Journal of Operational Research, vol. 269, no. 3, pp. 860–869, 2018.

[17] F. Turrado Garcia, L. J. Garcia Villalba, and J. Portela, "Intelligent system for time series classification using support vector machines applied to supply-chain," Expert Systems with Applications, vol. 39, no. 12, pp. 10590–10599, 2012.

[18] G. Song and Q. Dai, "A novel double deep ELMs ensemble system for time series forecasting," Knowledge-Based Systems, vol. 134, pp. 31–49, 2017.

[19] O. Araque, I. Corcuera-Platas, J. F. Sanchez-Rada, and C. A. Iglesias, "Enhancing deep learning sentiment analysis with ensemble techniques in social applications," Expert Systems with Applications, vol. 77, pp. 236–246, 2017.

[20] H. Tong, B. Liu, and S. Wang, "Software defect prediction using stacked denoising autoencoders and two-stage ensemble learning," Information and Software Technology, vol. 96, pp. 94–111, 2018.

[21] X. Qiu, Y. Ren, P. N. Suganthan, and G. A. J. Amaratunga, "Empirical mode decomposition based ensemble deep learning for load demand time series forecasting," Applied Soft Computing, vol. 54, pp. 246–255, 2017.

[22] Z. Qi, B. Wang, Y. Tian, and P. Zhang, "When ensemble learning meets deep learning: a new deep support vector machine for classification," Knowledge-Based Systems, vol. 107, pp. 54–60, 2016.

[23] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: a review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, 2013.

[24] G. E. Hinton, S. Osindero, and Y. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.

[25] Y. Bengio, "Practical recommendations for gradient-based training of deep architectures," in Neural Networks: Tricks of the Trade, vol. 7700 of Lecture Notes in Computer Science, pp. 437–478, Springer, Berlin, Germany, 2nd edition, 2012.

[26] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, pp. 436–444, 2015.

[27] S. Kim, Z. Yu, R. M. Kil, and M. Lee, "Deep learning of support vector machines with class probability output networks," Neural Networks, vol. 64, pp. 19–28, 2015.

[28] P. J. Brockwell and R. A. Davis, Time Series: Theory and Methods, Springer Series in Statistics, New York, NY, USA, 1989.

[29] V. N. Vapnik, Statistical Learning Theory, Wiley-Interscience, New York, NY, USA, 1998.

[30] S. Haykin, Neural Networks and Learning Machines, Macmillan Publishers Limited, New Jersey, NJ, USA, 2009.

[31] A. Candel, V. Parmar, E. LeDell, and A. Arora, Deep Learning with H2O, United States of America, 2018.

[32] K. Keerthi Vasan and B. Surendiran, "Dimensionality reduction using principal component analysis for network intrusion detection," Perspectives in Science, vol. 8, pp. 510–512, 2016.

[33] L. Rokach, "Ensemble-based classifiers," Artificial Intelligence Review, vol. 33, no. 1-2, pp. 1–39, 2010.

[34] R. Polikar, "Ensemble based systems in decision making," IEEE Circuits and Systems Magazine, vol. 6, no. 3, pp. 21–45, 2006.

[35] S. Reid, A Review of Heterogeneous Ensemble Methods, Department of Computer Science, University of Colorado at Boulder, 2007.

[36] D. Gopika and B. Azhagusundari, "An analysis on ensemble methods in classification tasks," International Journal of Advanced Research in Computer and Communication Engineering, vol. 3, no. 7, pp. 7423–7427, 2014.

[37] Y. Ren, L. Zhang, and P. N. Suganthan, "Ensemble classification and regression: recent developments, applications and future directions," IEEE Computational Intelligence Magazine, vol. 11, no. 1, pp. 41–53, 2016.

[38] U. G. Mangai, S. Samanta, S. Das, and P. R. Chowdhury, "A survey of decision fusion and feature fusion strategies for pattern classification," IETE Technical Review, vol. 27, no. 4, pp. 293–307, 2010.

[39] M. Wozniak, M. Grana, and E. Corchado, "A survey of multiple classifier systems as hybrid systems," Information Fusion, vol. 16, no. 1, pp. 3–17, 2014.

[40] G. Tsoumakas, L. Angelis, and I. Vlahavas, "Selective fusion of heterogeneous classifiers," Intelligent Data Analysis, vol. 9, no. 6, pp. 511–525, 2005.

[41] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," Journal of Computer and System Sciences, vol. 55, no. 1, part 2, pp. 119–139, 1997.

[42] S. Chopra and P. Meindl, Supply Chain Management: Strategy, Planning, and Operation, Prentice Hall, 2004.

[43] G. Kushwaha, "Operational performance through supply chain management practices," International Journal of Business and Social Science, vol. 217, pp. 65–77, 2012.

[44] G. E. Box, G. M. Jenkins, G. Reinsel, and G. M. Ljung, Time Series Analysis: Forecasting and Control, Wiley Series in Probability and Statistics, John Wiley & Sons, Hoboken, NJ, USA, 5th edition, 2015.

[45] W.-L. Zhao, C.-H. Deng, and C.-W. Ngo, "k-means: a revisit," Neurocomputing, vol. 291, pp. 195–206, 2018.

[46] C. Li and A. Lim, "A greedy aggregation-decomposition method for intermittent demand forecasting in fashion retailing," European Journal of Operational Research, vol. 269, no. 3, pp. 860–869, 2018.

[47] L. Yue, Y. Yafeng, G. Junjun, and T. Chongli, "Demand forecasting by using support vector machine," in Proceedings of the Third International Conference on Natural Computation (ICNC 2007), pp. 272–276, Haikou, China, August 2007.

[48] L. Yue, L. Zhenjiang, Y. Yafeng, T. Zaixia, G. Junjun, and Z. Bofeng, "Selective and heterogeneous SVM ensemble for demand forecasting," in Proceedings of the 2010 IEEE 10th International Conference on Computer and Information Technology (CIT), pp. 1519–1524, Bradford, UK, June 2010.

[49] T. Efendigil, S. Onut, and C. Kahraman, "A decision support system for demand forecasting with artificial neural networks and neuro-fuzzy models: a comparative analysis," Expert Systems with Applications, vol. 36, no. 3, pp. 6697–6707, 2009.

[50] E. Duman, M. Uysal, and A. F. Alkaya, "Migrating birds optimization: a new metaheuristic approach and its performance on quadratic assignment problem," Information Sciences, vol. 217, pp. 65–77, 2012.


Page 10: An Improved Demand Forecasting Model Using …downloads.hindawi.com/journals/complexity/2019/9067367.pdfAn Improved Demand Forecasting Model Using Deep Learning Approach and Proposed

10 Complexity

Table3Dem

andforecasting

improvem

entsperp

rodu

ctgrou

p

Prod

uct

Group

s

Metho

dI

with

outP

ropo

sed

IntegrationStrategy

(119878 1)

MAPE

Metho

dII

with

Prop

osed

IntegrationStrategy

(119878 2)

MAPE

Prop

osed

Integration

Strategy

with

Deep

Learning

(119878 119863)

MAPE

Percentage

Success

Rate

119875 1=(119878 1minus

119878 2)119878 2

Percentage

Success

Rate

119875 2=(119878 1minus

119878 119863)119878 119863

Improvem

ent

119863=119875 2minus

119875 1Ba

byProd

ucts

05157

03081

02927

4026

4325

299

Bakery

Prod

ucts

03482

02059

01966

4085

4352

267

Beverage

03714

02316

02207

3764

4058

294

Biscuit-C

hocolate

03358

02077

01977

3814

4113

299

BreakfastP

rodu

cts

044

4302770

02661

3765

4011

245

Canned-Paste

-Sauces

03836

02309

02198

3980

4269

289

Cheese

03953

02457

02345

3784

4068

284

Cleaning

Prod

ucts

04560

02791

02650

3879

4189

310

CosmeticsP

rodu

cts

05397

03266

03148

3949

4167

219

DeliM

eats

04242

02602

02488

3865

4136

270

EdibleOils

040

6002299

02215

4336

4545

209

Hou

seho

ldGoo

ds05713

03656

03535

3601

3813

212

IceC

ream

-Frozen

05012

03255

03106

3505

3803

298

Legu

mes-Pasta-Sou

p03850

02397

02269

3774

4107

333

Nuts-Ch

ips

03316

02049

01966

3820

4071

251

PoultryEg

gs04219

02527

02403

4011

4304

294

ReadyMeals

04613

02610

02520

4342

4536

194

RedMeat

02514

01616

01532

3571

3906

335

Tea-Coff

eeProd

ucts

04347

02650

02535

3904

4168

264

Textile

Prod

ucts

05418

03048

02907

4374

46

34

260

To

baccoProd

ucts

03791

02378

02290

3729

3961

232

Average

04238

02582

02469

3899

4168

269

Complexity 11

100

075

050

025

000

MA

PE

PRODUCT GROUPS

Baby

Pro

duct

s

Bake

ry P

rodu

cts

Beve

rage

Bisc

uitndash

Choc

olat

e

Brea

kfas

t Pro

duct

s

Cann

edndashP

aste

ndashSau

ces

Chee

se

Clea

ning

Pro

duct

s

Cos

met

ics P

rodu

cts

Deli

Mea

ts

Edib

le O

ils

Hou

seho

ld G

oods

Ice C

ream

ndashFro

zen

Legu

mes

ndashPas

tandashS

oup

Nut

sndashCh

ips

Poul

tryndash

Eggs

Red

Mea

t

Read

y M

eals

Teandash

Coff

ee P

rodu

cts

Text

ile P

rodu

cts

Toba

cco

Prod

ucts

Figure 2 MAPE distribution according to product groups with S1 integration strategy

in columns 2 and 3The average percentage forecasting errorsof strategies (S1) and (S2) are 042 and 026 respectively As itcan be seen fromTable 3 the percentage success rate of (S2) inaccordance with (S1) is represented in column 5 The currentintegration strategy (S2) provides 3899 improvement overthe first integration strategy (S1) average In some productgroups like Textile Products the error rate is reduced from054 to 031 with 4374 enhancement in MAPE

With this work a further improvement is obtainedby utilizing DL model compared to the usage of just 10algorithms which are based on forecasting strategy Column4 of Table 3 demonstrates MAPEs of 21 different groups ofproducts after the addition of DL model After the inclusionof DL model into the proposed system we observed around2 to 34 increase in demand forecast accuracies in someof product groups The error rates of some product groupslike ldquobaby productsrdquo ldquocosmetic productsrdquo and ldquohouseholdgoodsrdquo are usually higher than others since these productgroups do not have continuous demand by consumers Forexample the error rate for ldquohousehold goodsrdquo group is 035while the error rate for ldquored meatrdquo group is 015 The errorrates in food products are usually lower since customersconsider their daily consumption regularly to buy theseproducts However the error rates are higher in baby textileand household goods Customers buy products from thesegroups depending on whether they like the product or notselectively In addition the prices of goods in these groupsand promotion effects are usually high relative to the pricesof goods in other groups

Moreover it is observed that the top benefitted productgroups for Method I accomplishes over 40 success rateon Baby Products Bakery Products Edible Oils PoultryEggs Ready Meals and Textile Products when the column

of percentage success rate is analyzed The common pointof these groups is that each one of them is being consumedby a specific customer segment regularly For instance babyproducts group is being chosen by families who have kidsEdible Oils Poultry Eggs and Bakery Products are beingpreferred by frequently and continuously shopping customersegments and Ready Meals are being preferred by mostlybachelor single and working consumers etc Furthermorethe inclusion of DL into the forecasting system indicatesthat some other consuming groups (for example CleaningProducts RedMeats and Legumes Pasta Soup) exhibit betterperformance than the others with additional over 3

Figure 2 shows the box plots of mean percentage errors(MAPE) for each group of products after application of(S1) integration strategy The box plots enable us to analyzedistributional characteristics of forecasting errors for productgroups As it can be seen from the figure there are differentmedians for each product group Thus the forecasting errorsare usually dissimilar for different products groups Theinterquartile range box represents the middle 50 of scoresin data The lengths of interquartile range boxes are usuallyvery tall This means that there are quite number of differentprediction errors within products of a given group

Figure 3 presents the box plots of MAPEs of the proposed forecasting system that applies the proposed integration approach (S2), which combines the predictions of 10 different forecasting algorithms.

Figure 4 shows the results after adding DL to our system. The lengths of the interquartile range boxes are narrower when compared to the ones in Figures 2 and 3, which means that the spread of forecasting errors is smaller than in Figures 2 and 3. Finally, the integration of DL into the proposed forecasting system generates results that are more accurate.

12 Complexity


Figure 3: MAPE distribution according to product groups with the S2 integration strategy.


Figure 4: MAPE distribution according to product groups after adding the deep learning algorithm.

For example, for the (S1) integration strategy in the Baby Products product group, the MAPE distribution for the 1st quartile of the data is observed between 0% and 23%, for the 2nd quartile between 23% and 48%, and for the 3rd and 4th between 48% and 75% in Figure 2. After applying the integration strategy for the Baby Products product group, the MAPE distribution becomes 0%-16% for the 1st quartile, 16%-30% for the 2nd quartile of the data, and 30%-50% for the rest. This gives around an 8% improvement for the first quartile, an 8% to 18% enhancement for the second quartile, and an 18% to 25% advancement for the rest of the data, according to the boundary differences. The median of the MAPE distribution is analyzed as 48%; after applying the proposed integration strategy, it is observed as 27%, which means a 21% improvement. Approximately a 1%-3% enhancement is observed for each quartile of the data with the inclusion of the deep learning strategy in Figure 4.

For Cleaning Products, improvements similar to those of the other groups are observed for the integration strategy. The MAPE distribution of the 1st quartile of the data is observed between 0% and 19%, for the 2nd quartile it is between 19% and 40%, and for the rest of the data it is observed between 40% and 73% in Figure 2. After performing the proposed integration strategy for Cleaning Products, the MAPE distribution becomes 0%-10% for the 1st quartile of the data, 10%-17% for the 2nd quartile, and 27%-43% for the 3rd and 4th quartiles together. Improvements are observed as follows: 9% for the 1st quartile, from 9% to 13% for the 2nd quartile, and from 13% to 20% for the rest of the data at the boundaries. After implementation of the integration strategy, the system advancement is observed as 2% for the 1st and 2nd quartiles and 3% for the 3rd and 4th quartiles, respectively. Briefly, the integration strategy improvement is remarkable for each product group, and the consolidation of DL into the proposed integration strategy additionally advances results by almost 2% to 3%.

Figure 5: Accuracy comparison among integration strategies for one sample product group (monthly sales quantity, July 2017 to July 2018, for the Actual, S1, S2, and SD series).

In Figure 5, the accuracy comparison among the integration strategies for one of the best performing sample groups is presented over a 1-year period. In particular, Christmas and other seasonality effects can be seen very clearly in Figure 5, and the integration strategy with DL (SD) performs with the best accuracy.

It is hard to compare the performance of our proposed system with other studies because of the lack of works with similar combinations of a deep learning approach, the proposed decision integration strategies, and different learning algorithms for the demand forecasting process. Another difficulty is the deficiency of real-life datasets similar to the SOK dataset with which to compare against state-of-the-art studies. Although the results of the proposed system are given in this study, we also report here the results of a number of research works on the demand forecasting domain. A greedy aggregation-decomposition method has solved a real-world intermittent demand forecasting problem for a fashion retailer in Singapore [46]; they report a 59% MAPE success rate with a small dataset. The authors of [47] compare different approaches, such as a statistical model, the Winters model, and a radial basis function neural network (RBFNN), with SVM for the demand forecasting process. As a result, they conclude that the success of the SVM algorithm surpasses the others, with around a 77% enhancement at the average MAPE level. A novel demand forecasting model called SHEnSVM (Selective and Heterogeneous Ensemble of Support Vector Machines) is proposed in [48]. In the proposed model, individual SVMs are trained on different samples generated by a bootstrap algorithm; after that, a genetic algorithm is employed for retrieving the best individual combination schema. They report a 10% advancement with the SVM algorithm and a 64% MAPE improvement, though they only employ beer data with 3 different brands in their experiments. Efendigil et al. propose a novel forecasting mechanism modeled by artificial intelligence approaches in [49]. They compared both artificial neural networks and adaptive network-based fuzzy inference system techniques to manage fuzzy demand with incomplete information, reaching around 18% MAPE rates for some products during their experiments.

6. Conclusion

In the retail industry, demand forecasting is one of the main problems of supply chains: to optimize stocks, reduce costs, and increase sales, profit, and customer loyalty. To overcome this issue, there are several methods, such as time series analysis and machine learning approaches, to analyze and learn complex interactions and patterns from historical data.

In this study, there is a novel attempt to integrate 11 different forecasting models, including time series algorithms, a support vector regression model, and a deep learning method, for the demand forecasting process. Moreover, a novel decision integration strategy is developed by drawing inspiration from the ensemble methodology, namely, boosting. The proposed approach considers the performance of each model over time and combines the weighted predictions of the best performing models for the demand forecasting process.

The proposed forecasting system is tested on real-life data obtained from the SOK Market retail chain. It is observed that the inclusion of learning algorithms beyond time series models, together with a novel integration strategy, advanced the performance of the demand forecasting system. To the best of our knowledge, this is the very first attempt to consolidate a deep learning methodology, the SVR algorithm, and different time series methods for demand forecasting systems. Furthermore, the other novelty of this work is the adaptation of the boosting ensemble strategy to the demand forecasting model. In this way, the final decision of the proposed system is based on the best algorithms of the week, which gain more weight. This makes our forecasts more reliable with respect to trend changes and seasonality behaviors. Moreover, the proposed approach performs very well when integrated with the deep learning algorithm on the Spark big data environment. The dimension reduction process and clustering methods help to decrease time consumption, with less computing power, during the deep learning modelling phase. Although a review of some similar studies is presented in Section 5, it is, as expected, very difficult to compare the results of other studies with ours because of the use of different datasets and methods. In this study, we compare the results of three models, where model one selects the best performing forecasting method depending on its success in the previous period, with 42.4% MAPE on average. The second model, with the novel integration strategy, results in 25.8% MAPE on average. The last model, with the novel integration strategy enhanced with the deep learning approach, provides 24.7% MAPE on average. As a result, the inclusion of the deep learning approach into the novel integration strategy reduces the average prediction error for the demand forecasting process in the supply chain.
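The boosting-inspired integration strategy summarized above can be sketched as follows. This is an illustrative reconstruction under assumptions, not the authors' implementation: the inverse-error weighting and the model names are ours, and the 30% keep ratio is taken from the trial-and-error figure mentioned in this paper.

```python
def integrate_forecasts(last_week_errors, predictions, keep_ratio=0.3):
    """Combine predictions of the best recently performing models.

    last_week_errors: {model_name: MAPE on the previous period}
    predictions:      {model_name: forecast for the current period}
    keep_ratio:       fraction of models retained (best performers first)
    """
    # Rank models by last period's error, ascending (best first)
    ranked = sorted(last_week_errors, key=last_week_errors.get)
    keep = ranked[: max(1, int(len(ranked) * keep_ratio))]
    # Weight each retained model by the inverse of its recent error,
    # so better models contribute more to the final decision
    weights = {m: 1.0 / last_week_errors[m] for m in keep}
    total = sum(weights.values())
    return sum(weights[m] * predictions[m] for m in keep) / total

# Hypothetical previous-period MAPEs for 10 forecasting models
errors = {"arima": 0.30, "holt_winters": 0.25, "svr": 0.20, "croston": 0.45,
          "ses": 0.35, "dl": 0.18, "ma": 0.40, "tsb": 0.50, "naive": 0.55,
          "drift": 0.60}
preds = {m: p for m, p in zip(errors, [100, 104, 98, 120, 110, 96, 115,
                                       125, 130, 135])}
print(round(integrate_forecasts(errors, preds), 1))
```

With these illustrative numbers, only the three best models (dl, svr, holt_winters) survive the cut, and the combined forecast leans toward the lowest-error model; re-evaluating `last_week_errors` each week is what lets the ensemble track trend changes and seasonality.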

As future work, we plan to enrich the set of features by gathering data from other sources, like economic studies, shopping trends, social media, social events, and location-based demographic data of stores; the contributions of new varieties of data sources to deep learning can then be observed. A further study can be done to determine the hyperparameters for the deep learning algorithm. In addition, we also plan to use other deep learning techniques, such as convolutional neural networks, recurrent neural networks, and deep neural networks, as learning algorithms. Furthermore, another objective of ours is to use heuristic methods, namely MBO (Migrating Birds Optimization) and other related algorithms [50], to optimize some of the coefficients/weights that were determined empirically by trial and error, like taking 30 percent of the best performing methods in our current system.

Data Availability

This dataset is private customer data, so the agreement with the organization SOK does not allow sharing the data.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This work is supported by the OBASE Research & Development Center.

References

[1] P. J. McGoldrick and E. Andre, "Consumer misbehaviour: promiscuity or loyalty in grocery shopping," Journal of Retailing and Consumer Services, vol. 4, no. 2, pp. 73-81, 1997.

[2] D. Grewal, A. L. Roggeveen, and J. Nordfalt, "The future of retailing," Journal of Retailing, vol. 93, no. 1, pp. 1-6, 2017.

[3] E. T. Bradlow, M. Gangwar, P. Kopalle, and S. Voleti, "The role of big data and predictive analytics in retailing," Journal of Retailing, vol. 93, no. 1, pp. 79-95, 2017.

[4] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, USA, 2013.

[5] R. Hyndman and G. Athanasopoulos, Forecasting: Principles and Practice, OTexts, Melbourne, Australia, 2018, https://otexts.org/fpp2.

[6] C. C. Pegels, "Exponential forecasting: some new variations," Management Science, vol. 12, pp. 311-315, 1969.

[7] E. S. Gardner, "Exponential smoothing: the state of the art," Journal of Forecasting, vol. 4, no. 1, pp. 1-28, 1985.

[8] D. Gujarati, Basic Econometrics, McGraw-Hill, New York, NY, USA, 2003.

[9] J. D. Croston, "Forecasting and stock control for intermittent demands," Operational Research Quarterly, vol. 23, no. 3, pp. 289-303, 1972.

[10] T. R. Willemain, C. N. Smart, J. H. Shockor, and P. A. DeSautels, "Forecasting intermittent demand in manufacturing: a comparative evaluation of Croston's method," International Journal of Forecasting, vol. 10, no. 4, pp. 529-538, 1994.

[11] A. Syntetos, Forecasting of Intermittent Demand, Brunel University, 2001.

[12] A. A. Syntetos and J. E. Boylan, "The accuracy of intermittent demand estimates," International Journal of Forecasting, vol. 21, no. 2, pp. 303-314, 2005.

[13] R. H. Teunter, A. A. Syntetos, and M. Z. Babai, "Intermittent demand: linking forecasting to inventory obsolescence," European Journal of Operational Research, vol. 214, no. 3, pp. 606-615, 2011.

[14] R. J. Hyndman, R. A. Ahmed, G. Athanasopoulos, and H. L. Shang, "Optimal combination forecasts for hierarchical time series," Computational Statistics & Data Analysis, vol. 55, no. 9, pp. 2579-2589, 2011.

[15] R. Fildes and F. Petropoulos, "Simple versus complex selection rules for forecasting many time series," Journal of Business Research, vol. 68, no. 8, pp. 1692-1701, 2015.

[16] C. Li and A. Lim, "A greedy aggregation-decomposition method for intermittent demand forecasting in fashion retailing," European Journal of Operational Research, vol. 269, no. 3, pp. 860-869, 2018.

[17] F. Turrado García, L. J. García Villalba, and J. Portela, "Intelligent system for time series classification using support vector machines applied to supply-chain," Expert Systems with Applications, vol. 39, no. 12, pp. 10590-10599, 2012.

[18] G. Song and Q. Dai, "A novel double deep ELMs ensemble system for time series forecasting," Knowledge-Based Systems, vol. 134, pp. 31-49, 2017.

[19] O. Araque, I. Corcuera-Platas, J. F. Sánchez-Rada, and C. A. Iglesias, "Enhancing deep learning sentiment analysis with ensemble techniques in social applications," Expert Systems with Applications, vol. 77, pp. 236-246, 2017.

[20] H. Tong, B. Liu, and S. Wang, "Software defect prediction using stacked denoising autoencoders and two-stage ensemble learning," Information and Software Technology, vol. 96, pp. 94-111, 2018.

[21] X. Qiu, Y. Ren, P. N. Suganthan, and G. A. J. Amaratunga, "Empirical mode decomposition based ensemble deep learning for load demand time series forecasting," Applied Soft Computing, vol. 54, pp. 246-255, 2017.

[22] Z. Qi, B. Wang, Y. Tian, and P. Zhang, "When ensemble learning meets deep learning: a new deep support vector machine for classification," Knowledge-Based Systems, vol. 107, pp. 54-60, 2016.

[23] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: a review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798-1828, 2013.

[24] G. E. Hinton, S. Osindero, and Y. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527-1554, 2006.

[25] Y. Bengio, "Practical recommendations for gradient-based training of deep architectures," in Neural Networks: Tricks of the Trade, vol. 7700 of Lecture Notes in Computer Science, pp. 437-478, Springer, Berlin, Germany, 2nd edition, 2012.

[26] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, pp. 436-444, 2015.

[27] S. Kim, Z. Yu, R. M. Kil, and M. Lee, "Deep learning of support vector machines with class probability output networks," Neural Networks, vol. 64, pp. 19-28, 2015.

[28] P. J. Brockwell and R. A. Davis, Time Series: Theory and Methods, Springer Series in Statistics, New York, NY, USA, 1989.

[29] V. N. Vapnik, Statistical Learning Theory, Wiley-Interscience, New York, NY, USA, 1998.

[30] S. Haykin, Neural Networks and Learning Machines, Macmillan Publishers Limited, New Jersey, NJ, USA, 2009.

[31] A. Candel, V. Parmar, E. LeDell, and A. Arora, Deep Learning with H2O, United States of America, 2018.

[32] K. Keerthi Vasan and B. Surendiran, "Dimensionality reduction using principal component analysis for network intrusion detection," Perspectives in Science, vol. 8, pp. 510-512, 2016.

[33] L. Rokach, "Ensemble-based classifiers," Artificial Intelligence Review, vol. 33, no. 1-2, pp. 1-39, 2010.

[34] R. Polikar, "Ensemble based systems in decision making," IEEE Circuits and Systems Magazine, vol. 6, no. 3, pp. 21-45, 2006.

[35] S. Reid, A Review of Heterogeneous Ensemble Methods, Department of Computer Science, University of Colorado at Boulder, 2007.

[36] D. Gopika and B. Azhagusundari, "An analysis on ensemble methods in classification tasks," International Journal of Advanced Research in Computer and Communication Engineering, vol. 3, no. 7, pp. 7423-7427, 2014.

[37] Y. Ren, L. Zhang, and P. N. Suganthan, "Ensemble classification and regression: recent developments, applications and future directions," IEEE Computational Intelligence Magazine, vol. 11, no. 1, pp. 41-53, 2016.

[38] U. G. Mangai, S. Samanta, S. Das, and P. R. Chowdhury, "A survey of decision fusion and feature fusion strategies for pattern classification," IETE Technical Review, vol. 27, no. 4, pp. 293-307, 2010.

[39] M. Wozniak, M. Grana, and E. Corchado, "A survey of multiple classifier systems as hybrid systems," Information Fusion, vol. 16, no. 1, pp. 3-17, 2014.

[40] G. Tsoumakas, L. Angelis, and I. Vlahavas, "Selective fusion of heterogeneous classifiers," Intelligent Data Analysis, vol. 9, no. 6, pp. 511-525, 2005.

[41] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," Journal of Computer and System Sciences, vol. 55, no. 1, part 2, pp. 119-139, 1997.

[42] S. Chopra and P. Meindl, Supply Chain Management: Strategy, Planning and Operation, Prentice Hall, 2004.

[43] G. Kushwaha, "Operational performance through supply chain management practices," International Journal of Business and Social Science, vol. 217, pp. 65-77, 2012.

[44] G. E. Box, G. M. Jenkins, G. Reinsel, and G. M. Ljung, Time Series Analysis: Forecasting and Control, Wiley Series in Probability and Statistics, John Wiley & Sons, Hoboken, NJ, USA, 5th edition, 2015.

[45] W.-L. Zhao, C.-H. Deng, and C.-W. Ngo, "k-means: a revisit," Neurocomputing, vol. 291, pp. 195-206, 2018.

[46] C. Li and A. Lim, "A greedy aggregation-decomposition method for intermittent demand forecasting in fashion retailing," European Journal of Operational Research, vol. 269, no. 3, pp. 860-869, 2018.

[47] L. Yue, Y. Yafeng, G. Junjun, and T. Chongli, "Demand forecasting by using support vector machine," in Proceedings of the Third International Conference on Natural Computation (ICNC 2007), pp. 272-276, Haikou, China, August 2007.

[48] L. Yue, L. Zhenjiang, Y. Yafeng, T. Zaixia, G. Junjun, and Z. Bofeng, "Selective and heterogeneous SVM ensemble for demand forecasting," in Proceedings of the 2010 IEEE 10th International Conference on Computer and Information Technology (CIT), pp. 1519-1524, Bradford, UK, June 2010.

[49] T. Efendigil, S. Onut, and C. Kahraman, "A decision support system for demand forecasting with artificial neural networks and neuro-fuzzy models: a comparative analysis," Expert Systems with Applications, vol. 36, no. 3, pp. 6697-6707, 2009.

[50] E. Duman, M. Uysal, and A. F. Alkaya, "Migrating birds optimization: a new metaheuristic approach and its performance on quadratic assignment problem," Information Sciences, vol. 217, pp. 65-77, 2012.



Figure 2: MAPE distribution according to product groups with the S1 integration strategy.

in columns 2 and 3. The average percentage forecasting errors of strategies (S1) and (S2) are 0.42 and 0.26, respectively. As can be seen from Table 3, the percentage success rate of (S2) relative to (S1) is presented in column 5. The current integration strategy (S2) provides a 38.99% improvement over the first integration strategy (S1) on average. In some product groups, like Textile Products, the error rate is reduced from 0.54 to 0.31, a 43.74% enhancement in MAPE.


[8] D Gujarati Basic Econometrics Mcgraw-Hill New York NYUSA 2003

[9] J D Croston ldquoForecasting and stock control for intermittentdemandsrdquo Operational Research Quarterly vol 23 no 3 pp289ndash303 1972

[10] T RWillemain C N Smart J H Shockor and P A DeSautelsldquoForecasting intermittent demand inmanufacturing a compar-ative evaluation of Crostonrsquos methodrdquo International Journal ofForecasting vol 10 no 4 pp 529ndash538 1994

[11] A Syntetos Forecasting of intermittent demand Brunel Univer-sity 2001

[12] A A Syntetos and J E Boylan ldquoThe accuracy of intermittentdemand estimatesrdquo International Journal of Forecasting vol 21no 2 pp 303ndash314 2005

[13] R H Teunter A A Syntetos and M Z Babai ldquoIntermittentdemand linking forecasting to inventory obsolescencerdquo Euro-pean Journal of Operational Research vol 214 no 3 pp 606ndash615 2011

[14] R J Hyndman R A Ahmed G Athanasopoulos and H LShang ldquoOptimal combination forecasts for hierarchical timeseriesrdquo Computational Statistics amp Data Analysis vol 55 no 9pp 2579ndash2589 2011

[15] R Fildes and F Petropoulos ldquoSimple versus complex selectionrules for forecasting many time seriesrdquo Journal of BusinessResearch vol 68 no 8 pp 1692ndash1701 2015

[16] C Li and A Lim ldquoA greedy aggregationndashdecompositionmethod for intermittent demand forecasting in fashion retail-ingrdquo European Journal of Operational Research vol 269 no 3pp 860ndash869 2018

[17] F Turrado Garcıa L J Garcıa Villalba and J Portela ldquoIntelli-gent system for time series classification using support vectormachines applied to supply-chainrdquo Expert Systems with Appli-cations vol 39 no 12 pp 10590ndash10599 2012

[18] G Song and Q Dai ldquoA novel double deep ELMs ensemblesystem for time series forecastingrdquo Knowledge-Based Systemsvol 134 pp 31ndash49 2017

Complexity 15

[19] O Araque I Corcuera-Platas J F Sanchez-Rada and C AIglesias ldquoEnhancing deep learning sentiment analysis withensemble techniques in social applicationsrdquoExpert SystemswithApplications vol 77 pp 236ndash246 2017

[20] H Tong B Liu and S Wang ldquoSoftware defect predictionusing stacked denoising autoencoders and twostage ensemblelearningrdquo Information and Soware Technology vol 96 pp 94ndash111 2018

[21] X Qiu Y Ren P N Suganthan and G A J AmaratungaldquoEmpirical mode decomposition based ensemble deep learningfor load demand time series forecastingrdquo Applied So Comput-ing vol 54 pp 246ndash255 2017

[22] Z Qi BWang Y Tian and P Zhang ldquoWhen ensemble learningmeets deep learning a new deep support vector machine forclassificationrdquo Knowledge-Based Systems vol 107 pp 54ndash602016

[23] Y Bengio A Courville and P Vincent ldquoRepresentation learn-ing a review and new perspectivesrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 35 no 8 pp1798ndash1828 2013

[24] G E Hinton S Osindero and Y Teh ldquoA fast learning algorithmfor deep belief netsrdquoNeural Computation vol 18 no 7 pp 1527ndash1554 2006

[25] Y Bengio ldquoPractical recommendations for gradient-basedtraining of deep architecturesrdquo inNeural Networks Tricks of theTrade vol 7700 of Lecture Notes in Computer Science pp 437ndash478 Springer Berlin Germany 2nd edition 2012

[26] Y LeCun Y Bengio and G Hinton ldquoDeep learning ReviewrdquoInternational Journal of Business and Social Science vol 3 2015

[27] S Kim Z Yu R M Kil and M Lee ldquoDeep learning of supportvectormachineswith class probability output networksrdquoNeuralNetworks vol 64 pp 19ndash28 2015

[28] P J Brockwell and R A Davis Time Series eory AndMethods Springer Series in Statistics NewYork NY USA 1989

[29] V N Vapnik Statistical Learning eory Wiley- InterscienceNew York NY USA 1998

[30] S Haykin Neural Networks and LearningMachines MacmillanPublishers Limited New Jersey NJ USA 2009

[31] A Candel V Parmar E LeDell and A Arora Deep Learningwith H2O United States of America 2018

[32] K Keerthi Vasan and B Surendiran ldquoDimensionality reductionusing principal component analysis for network intrusiondetectionrdquo Perspectives in Science vol 8 pp 510ndash512 2016

[33] L Rokach ldquoEnsemble-based classifiersrdquo Artificial IntelligenceReview vol 33 no 1-2 pp 1ndash39 2010

[34] R Polikar ldquoEnsemble based systems in decision makingrdquo IEEECircuits and Systems Magazine vol 6 no 3 pp 21ndash45 2006

[35] Reid SamA review of heterogeneous ensemblemethods Depart-ment of Computer Science University of Colorado at Boulder2007

[36] D Gopika and B Azhagusundari ldquoAn analysis on ensem-ble methods in classification tasksrdquo International Journal ofAdvanced Research in Computer and Communication Engineer-ing vol 3 no 7 pp 7423ndash7427 2014

[37] Y Ren L Zhang and P N Suganthan ldquoEnsemble classificationand regression-recent developments applications and futuredirectionsrdquo IEEE Computational Intelligence Magazine vol 11no 1 pp 41ndash53 2016

[38] U G Mangai S Samanta S Das and P R ChowdhuryldquoA survey of decision fusion and feature fusion strategies for

pattern classificationrdquo IETE Technical Review vol 27 no 4 pp293ndash307 2010

[39] MWozniak M Grana and E Corchado ldquoA survey of multipleclassifier systems as hybrid systemsrdquo Information Fusion vol 16no 1 pp 3ndash17 2014

[40] G Tsoumakas L Angelis and I Vlahavas ldquoSelective fusion ofheterogeneous classifiersrdquo Intelligent Data Analysis vol 9 no 6pp 511ndash525 2005

[41] Y Freund and R E Schapire ldquoA decision-theoretic generaliza-tion of on-line learning and an application to boostingrdquo Journalof Computer and System Sciences vol 55 no 1 part 2 pp 119ndash139 1997

[42] S Chopra and P Meindl Supply Chain Management StrategyPlanning and Operation Prentice Hall 2004

[43] G Kushwaha ldquoOperational performance through supply chainmanagement practicesrdquo International Journal of Business andSocial Science vol 217 pp 65ndash77 2012

[44] G E Box G M Jenkins G Reinsel and G M LjungTime Series Analysis Forecasting and Control Wiley Series inProbability and Statistics John Wiley amp Sons Hoboken NJUSA 5th edition 2015

[45] W-L Zhao C-H Deng and C-W Ngo ldquok-means a revisitrdquoNeurocomputing vol 291 pp 195ndash206 2018

[46] C Li andA Lim ldquoA greedy aggregation-decompositionmethodfor intermittent demand forecasting in fashion retailingrdquo Euro-pean Journal of Operational Research vol 269 no 3 pp 860ndash869 2018

[47] L Yue Y Yafeng G Junjun and T Chongli ldquoDemand forecast-ing by using support vectormachinerdquo inProceedings of theirdInternational Conference on Natural Computation (ICNC 2007)pp 272ndash276 Haikou China August 2007

[48] L Yue L Zhenjiang Y Yafeng T Zaixia G Junjun andZ Bofeng ldquoSelective and heterogeneous SVM ensemble fordemand forecastingrdquo in Proceedings of the 2010 IEEE 10th Inter-national Conference on Computer and Information Technology(CIT) pp 1519ndash1524 Bradford UK June 2010

[49] T Efendigil S Onut and C Kahraman ldquoA decision supportsystem for demand forecasting with artificial neural networksand neuro-fuzzy models a comparative analysisrdquo Expert Sys-tems with Applications vol 36 no 3 pp 6697ndash6707 2009

[50] E Duman M Uysal and A F Alkaya ldquoMigrating birds opti-mization a newmetaheuristic approach and its performance onquadratic assignment problemrdquo Information Sciences vol 217pp 65ndash77 2012

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 12: An Improved Demand Forecasting Model Using …downloads.hindawi.com/journals/complexity/2019/9067367.pdfAn Improved Demand Forecasting Model Using Deep Learning Approach and Proposed

[Figure 3: MAPE distribution according to product groups with the S2 integration strategy (y-axis: MAPE, 0.00–1.00; x-axis: product groups, Baby Products through Tobacco Products).]

[Figure 4: MAPE distribution according to product groups after deep learning algorithms (same axes and product groups as Figure 3).]

For example, for the (S1) integration strategy in the Baby Products group, the MAPE distribution for the 1st quartile of the data is observed between 0% and 23%, for the 2nd quartile between 23% and 48%, and for the 3rd and 4th quartiles between 48% and 75% in Figure 2. After applying the integration strategy for the Baby Products group, the MAPE distribution becomes 0–16% for the 1st quartile, 16–30% for the 2nd quartile, and 30–50% for the rest. According to the boundary differences, this gives around an 8% improvement for the first quartile, an 8% to 18% enhancement for the second quartile, and an 18% to 25% advancement for the rest of the data. The median of the MAPE distribution is analyzed as 48%; after applying the proposed integration strategy, it is observed as 27%, which means a 21% improvement. Approximately 1–3% further enhancement is observed for each quartile of the data with the inclusion of the deep learning strategy in Figure 4.
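The MAPE quartile analysis above can be made concrete with a short sketch. This is illustrative only: the `mape` helper and the sample values are assumptions (not the SOK data), and the quartile boundaries are computed with NumPy's default linear interpolation.

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error in percent (assumes no zero actuals)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))

# Illustrative per-item MAPE values for one product group (not the SOK data).
group_mapes = np.array([12.0, 21.0, 25.0, 31.0, 44.0, 52.0, 61.0, 70.0])

# Quartile boundaries of the MAPE distribution, as discussed in the text.
q1, median, q3 = np.percentile(group_mapes, [25, 50, 75])
print(f"Q1 boundary: {q1:.1f}%, median: {median:.1f}%, Q3 boundary: {q3:.1f}%")
```

Comparing such quartile boundaries before and after an integration strategy is the per-quartile comparison the paragraph above performs.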

For Cleaning Products, improvements similar to those of the other groups are observed for the integration strategy. The MAPE distribution of the 1st quartile of the data is observed between 0% and 19%, for the 2nd quartile between 19% and 40%, and for the rest of the data between 40% and 73% in Figure 2. After performing the proposed integration strategy for Cleaning Products, the MAPE distribution becomes 0–10% for the 1st quartile, 10–17% for the 2nd quartile, and 27–43% for the 3rd and 4th quartiles together. Improvements are observed as follows: 9% for the 1st quartile, 9% to 13% for the 2nd quartile, and 13% to 20% for the rest of the data at the boundaries. After the implementation of the integration strategy, the system advancement is observed as 2% for the 1st and 2nd quartiles and 3% for the 3rd and 4th quartiles, respectively. Briefly, the integration strategy improvement is remarkable for each product group, and the consolidation of DL into the proposed integration strategy additionally advances results by almost 2% to 3%.

[Figure 5: Accuracy comparison among integration strategies for one sample product group (y-axis: sum of sale quantity, 0–9,000; x-axis: July 2017 through July 2018; series: Actual, S1, S2, SD).]

In Figure 5, the accuracy comparison among integration strategies for one of the best-performing sample groups is presented over a 1-year period. In particular, Christmas and other seasonality effects can be seen very clearly in Figure 5, and the integration strategy with DL (SD) performs with the best accuracy.

It is hard to compare the performance of our proposed system with other studies because of the lack of works with a similar combination of a deep learning approach, the proposed decision integration strategies, and different learning algorithms for the demand forecasting process. Another difficulty is the lack of real-life datasets similar to the SOK dataset on which to compare state-of-the-art studies. Although the results of the proposed system are given in this study, we also report the results of a number of research works on the demand forecasting domain here. A greedy aggregation-decomposition method has solved a real-world intermittent demand forecasting problem for a fashion retailer in Singapore [46]; the authors report a 59% MAPE with a small dataset. The authors of [47] compare different approaches, such as a statistical model, the Winters model, and a radial basis function neural network (RBFNN), with SVM for the demand forecasting process; they conclude that the SVM algorithm surpasses the others with around a 77% enhancement in average MAPE results. A novel demand forecasting model called SHEnSVM (Selective and Heterogeneous Ensemble of Support Vector Machines) is proposed in [48]; individual SVMs are trained on different samples generated by a bootstrap algorithm, after which a genetic algorithm retrieves the best combination schema of individuals. They report a 10% advancement with the SVM algorithm and a 64% MAPE improvement, employing only beer data with 3 different brands in their experiments. Efendigil et al. propose a novel forecasting mechanism modeled by artificial intelligence approaches in [49]; they compare artificial neural network and adaptive network-based fuzzy inference system techniques to manage fuzzy demand with incomplete information, reaching around 18% MAPE for some products in their experiments.

6. Conclusion

In the retail industry, demand forecasting is one of the main problems of supply chains: solving it helps optimize stocks, reduce costs, and increase sales, profit, and customer loyalty. To overcome this issue, there are several methods, such as time series analysis and machine learning approaches, that analyze and learn complex interactions and patterns from historical data.

In this study, there is a novel attempt to integrate 11 different forecasting models, comprising time series algorithms, the support vector regression model, and a deep learning method, for the demand forecasting process. Moreover, a novel decision integration strategy is developed by drawing inspiration from the boosting ensemble methodology. The proposed approach considers the performance of each model over time and combines the weighted predictions of the best-performing models for the demand forecasting process. The proposed forecasting system is tested on real-life data obtained from the SOK Market retail chain. It is observed that the inclusion of learning algorithms beyond time series models, together with the novel integration strategy, advances the performance of the demand forecasting system. To the best of our knowledge, this is the very first attempt to consolidate deep learning methodology, the SVR algorithm, and different time series methods for demand forecasting systems. Furthermore, the other novelty of this work is the adaptation of the boosting ensemble strategy to the demand forecasting model. In this way, the final decision of the proposed system is based on the best algorithms of the week, which gain more weight. This makes our forecasts more reliable with respect to trend changes and seasonality behaviors. Moreover, the proposed approach integrates very well with the deep learning algorithm on the Spark big data environment; dimension reduction and clustering methods help decrease time consumption and computing power during the deep learning modelling phase. Although a review of similar studies is presented in Section 5, it is, as expected, very difficult to compare the results of other studies with ours because of the use of different datasets and methods. In this study, we compare the results of three models: the first, which selects the best-performing forecasting method depending on its success in the previous period, yields 42.4% MAPE on average; the second, with the novel integration strategy, results in 25.8% MAPE on average; and the last, with the novel integration strategy enhanced with the deep learning approach, provides 24.7% MAPE on average. As a result, the inclusion of the deep learning approach in the novel integration strategy reduces the average prediction error for the demand forecasting process in the supply chain.
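As a minimal sketch of the boosting-inspired integration described above: each candidate model is scored by its recent MAPE, and the forecasts of the best-performing models are combined with error-based weights. The model names, the inverse-MAPE weighting, and keeping the top three models are illustrative assumptions, not the paper's empirically tuned coefficients.

```python
from typing import Dict

def integrate_forecasts(forecasts: Dict[str, float],
                        recent_mape: Dict[str, float],
                        top_k: int = 3) -> float:
    """Combine the top_k models with the lowest recent MAPE, weighting each
    forecast by inverse error (illustrative rule, not the paper's exact one)."""
    best = sorted(recent_mape, key=recent_mape.get)[:top_k]
    weights = {m: 1.0 / recent_mape[m] for m in best}
    total = sum(weights.values())
    return sum(forecasts[m] * weights[m] / total for m in best)

# Hypothetical weekly forecasts and last period's MAPE per model.
preds = {"ARIMA": 120.0, "Holt-Winters": 130.0, "SVR": 110.0, "DL": 100.0}
errors = {"ARIMA": 0.40, "Holt-Winters": 0.35, "SVR": 0.25, "DL": 0.20}
print(integrate_forecasts(preds, errors))
```

Because the weights are recomputed every period, the best algorithms of the week automatically gain more influence, which is the behavior the conclusion attributes to the proposed strategy.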

As future work, we plan to enrich the feature set by gathering data from other sources such as economic studies, shopping trends, social media, social events, and location-based demographic data of stores; the contribution of these new data sources to the deep learning models can then be observed. A further study could determine the hyperparameters of the deep learning algorithm. In addition, we also plan to use other deep learning techniques, such as convolutional neural networks, recurrent neural networks, and deep neural networks, as learning algorithms. Furthermore, another objective is to use heuristic methods, MBO (Migrating Birds Optimization) and related algorithms [50], to optimize some of the coefficients/weights that were determined empirically by trial and error, such as taking 30 percent of the best-performing methods in our current system.
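The dimension reduction and clustering steps credited above with reducing training time can be sketched with a NumPy-only stand-in for PCA and k-means. The feature matrix, component count, and cluster count below are invented for illustration; the paper's actual pipeline runs on H2O/Spark.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X onto its top principal components (SVD-based PCA)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns a cluster label per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Invented feature matrix: 100 store/product rows with 20 raw features.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 20))
Z = pca_reduce(X, n_components=5)  # fewer inputs per deep model
clusters = kmeans(Z, k=4)          # train one model per group of similar rows
print(Z.shape, np.unique(clusters).size)
```

Training one deep model per cluster of the reduced data, instead of one per raw series, is the kind of saving in computing power the conclusion refers to.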

Data Availability

This dataset is private customer data, so the agreement with the organization (SOK) does not allow sharing the data.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This work is supported by the OBASE Research & Development Center.

References

[1] P. J. McGoldrick and E. Andre, "Consumer misbehaviour: promiscuity or loyalty in grocery shopping," Journal of Retailing and Consumer Services, vol. 4, no. 2, pp. 73–81, 1997.

[2] D. Grewal, A. L. Roggeveen, and J. Nordfalt, "The future of retailing," Journal of Retailing, vol. 93, no. 1, pp. 1–6, 2017.

[3] E. T. Bradlow, M. Gangwar, P. Kopalle, and S. Voleti, "The role of big data and predictive analytics in retailing," Journal of Retailing, vol. 93, no. 1, pp. 79–95, 2017.

[4] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, USA, 2013.

[5] R. Hyndman and G. Athanasopoulos, Forecasting: Principles and Practice, OTexts, Melbourne, Australia, 2018, http://otexts.org/fpp2.

[6] C. C. Pegels, "Exponential forecasting: some new variations," Management Science, vol. 12, pp. 311–315, 1969.

[7] E. S. Gardner, "Exponential smoothing: the state of the art," Journal of Forecasting, vol. 4, no. 1, pp. 1–28, 1985.

[8] D. Gujarati, Basic Econometrics, McGraw-Hill, New York, NY, USA, 2003.

[9] J. D. Croston, "Forecasting and stock control for intermittent demands," Operational Research Quarterly, vol. 23, no. 3, pp. 289–303, 1972.

[10] T. R. Willemain, C. N. Smart, J. H. Shockor, and P. A. DeSautels, "Forecasting intermittent demand in manufacturing: a comparative evaluation of Croston's method," International Journal of Forecasting, vol. 10, no. 4, pp. 529–538, 1994.

[11] A. Syntetos, Forecasting of Intermittent Demand, Brunel University, 2001.

[12] A. A. Syntetos and J. E. Boylan, "The accuracy of intermittent demand estimates," International Journal of Forecasting, vol. 21, no. 2, pp. 303–314, 2005.

[13] R. H. Teunter, A. A. Syntetos, and M. Z. Babai, "Intermittent demand: linking forecasting to inventory obsolescence," European Journal of Operational Research, vol. 214, no. 3, pp. 606–615, 2011.

[14] R. J. Hyndman, R. A. Ahmed, G. Athanasopoulos, and H. L. Shang, "Optimal combination forecasts for hierarchical time series," Computational Statistics & Data Analysis, vol. 55, no. 9, pp. 2579–2589, 2011.

[15] R. Fildes and F. Petropoulos, "Simple versus complex selection rules for forecasting many time series," Journal of Business Research, vol. 68, no. 8, pp. 1692–1701, 2015.

[16] C. Li and A. Lim, "A greedy aggregation-decomposition method for intermittent demand forecasting in fashion retailing," European Journal of Operational Research, vol. 269, no. 3, pp. 860–869, 2018.

[17] F. Turrado García, L. J. García Villalba, and J. Portela, "Intelligent system for time series classification using support vector machines applied to supply-chain," Expert Systems with Applications, vol. 39, no. 12, pp. 10590–10599, 2012.

[18] G. Song and Q. Dai, "A novel double deep ELMs ensemble system for time series forecasting," Knowledge-Based Systems, vol. 134, pp. 31–49, 2017.

[19] O. Araque, I. Corcuera-Platas, J. F. Sánchez-Rada, and C. A. Iglesias, "Enhancing deep learning sentiment analysis with ensemble techniques in social applications," Expert Systems with Applications, vol. 77, pp. 236–246, 2017.

[20] H. Tong, B. Liu, and S. Wang, "Software defect prediction using stacked denoising autoencoders and two-stage ensemble learning," Information and Software Technology, vol. 96, pp. 94–111, 2018.

[21] X. Qiu, Y. Ren, P. N. Suganthan, and G. A. J. Amaratunga, "Empirical mode decomposition based ensemble deep learning for load demand time series forecasting," Applied Soft Computing, vol. 54, pp. 246–255, 2017.

[22] Z. Qi, B. Wang, Y. Tian, and P. Zhang, "When ensemble learning meets deep learning: a new deep support vector machine for classification," Knowledge-Based Systems, vol. 107, pp. 54–60, 2016.

[23] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: a review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, 2013.

[24] G. E. Hinton, S. Osindero, and Y. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.

[25] Y. Bengio, "Practical recommendations for gradient-based training of deep architectures," in Neural Networks: Tricks of the Trade, vol. 7700 of Lecture Notes in Computer Science, pp. 437–478, Springer, Berlin, Germany, 2nd edition, 2012.

[26] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, pp. 436–444, 2015.

[27] S. Kim, Z. Yu, R. M. Kil, and M. Lee, "Deep learning of support vector machines with class probability output networks," Neural Networks, vol. 64, pp. 19–28, 2015.

[28] P. J. Brockwell and R. A. Davis, Time Series: Theory and Methods, Springer Series in Statistics, New York, NY, USA, 1989.

[29] V. N. Vapnik, Statistical Learning Theory, Wiley-Interscience, New York, NY, USA, 1998.

[30] S. Haykin, Neural Networks and Learning Machines, Macmillan Publishers Limited, New Jersey, NJ, USA, 2009.

[31] A. Candel, V. Parmar, E. LeDell, and A. Arora, Deep Learning with H2O, United States of America, 2018.

[32] K. Keerthi Vasan and B. Surendiran, "Dimensionality reduction using principal component analysis for network intrusion detection," Perspectives in Science, vol. 8, pp. 510–512, 2016.

[33] L. Rokach, "Ensemble-based classifiers," Artificial Intelligence Review, vol. 33, no. 1-2, pp. 1–39, 2010.

[34] R. Polikar, "Ensemble based systems in decision making," IEEE Circuits and Systems Magazine, vol. 6, no. 3, pp. 21–45, 2006.

[35] S. Reid, A Review of Heterogeneous Ensemble Methods, Department of Computer Science, University of Colorado at Boulder, 2007.

[36] D. Gopika and B. Azhagusundari, "An analysis on ensemble methods in classification tasks," International Journal of Advanced Research in Computer and Communication Engineering, vol. 3, no. 7, pp. 7423–7427, 2014.

[37] Y. Ren, L. Zhang, and P. N. Suganthan, "Ensemble classification and regression: recent developments, applications and future directions," IEEE Computational Intelligence Magazine, vol. 11, no. 1, pp. 41–53, 2016.

[38] U. G. Mangai, S. Samanta, S. Das, and P. R. Chowdhury, "A survey of decision fusion and feature fusion strategies for pattern classification," IETE Technical Review, vol. 27, no. 4, pp. 293–307, 2010.

[39] M. Wozniak, M. Grana, and E. Corchado, "A survey of multiple classifier systems as hybrid systems," Information Fusion, vol. 16, no. 1, pp. 3–17, 2014.

[40] G. Tsoumakas, L. Angelis, and I. Vlahavas, "Selective fusion of heterogeneous classifiers," Intelligent Data Analysis, vol. 9, no. 6, pp. 511–525, 2005.

[41] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," Journal of Computer and System Sciences, vol. 55, no. 1, part 2, pp. 119–139, 1997.

[42] S. Chopra and P. Meindl, Supply Chain Management: Strategy, Planning, and Operation, Prentice Hall, 2004.

[43] G. Kushwaha, "Operational performance through supply chain management practices," International Journal of Business and Social Science, vol. 217, pp. 65–77, 2012.

[44] G. E. Box, G. M. Jenkins, G. Reinsel, and G. M. Ljung, Time Series Analysis: Forecasting and Control, Wiley Series in Probability and Statistics, John Wiley & Sons, Hoboken, NJ, USA, 5th edition, 2015.

[45] W.-L. Zhao, C.-H. Deng, and C.-W. Ngo, "k-means: a revisit," Neurocomputing, vol. 291, pp. 195–206, 2018.

[46] C. Li and A. Lim, "A greedy aggregation-decomposition method for intermittent demand forecasting in fashion retailing," European Journal of Operational Research, vol. 269, no. 3, pp. 860–869, 2018.

[47] L. Yue, Y. Yafeng, G. Junjun, and T. Chongli, "Demand forecasting by using support vector machine," in Proceedings of the Third International Conference on Natural Computation (ICNC 2007), pp. 272–276, Haikou, China, August 2007.

[48] L. Yue, L. Zhenjiang, Y. Yafeng, T. Zaixia, G. Junjun, and Z. Bofeng, "Selective and heterogeneous SVM ensemble for demand forecasting," in Proceedings of the 2010 IEEE 10th International Conference on Computer and Information Technology (CIT), pp. 1519–1524, Bradford, UK, June 2010.

[49] T. Efendigil, S. Onut, and C. Kahraman, "A decision support system for demand forecasting with artificial neural networks and neuro-fuzzy models: a comparative analysis," Expert Systems with Applications, vol. 36, no. 3, pp. 6697–6707, 2009.

[50] E. Duman, M. Uysal, and A. F. Alkaya, "Migrating birds optimization: a new metaheuristic approach and its performance on quadratic assignment problem," Information Sciences, vol. 217, pp. 65–77, 2012.

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 13: An Improved Demand Forecasting Model Using …downloads.hindawi.com/journals/complexity/2019/9067367.pdfAn Improved Demand Forecasting Model Using Deep Learning Approach and Proposed

Complexity 13

SUM

(SA

LE Q

UAN

TITY

)

DATE

9000

8000

7000

6000

5000

4000

3000

2000

1000

0

July August Sep Oct Nov Dec Jan Feb March April May June July

ActualS1

S2SD

2017 2017 2017 2017 2017 2017 2018 2018 2018 2018 2018 2018 2018

Figure 5 Accuracy comparison among integration strategies for one sample product group

73 in Figure 2 After performing the proposed integrationstrategy for Cleaning Products MAPE distribution becomes0ndash10 for the 1st quartile of data 10ndash17 for the 2ndquartile and 27ndash43 for the 3rd and 4th quartiles togetherImprovements are observed as follows 9 for the 1st quartilefrom 9 to 13 for the 2nd quartile and from 13 to 20for the rest of the data at boundaries After implementationof the integration strategy for the 1st and 2nd quartiles thesystem advancement is observed as 2 and for the 3rdand 4th quartiles it is 3 respectively Briefly integrationstrategy improvement is remarkable for each product groupand consolidation of DL to the proposed integration strategyadvances results between almost 2 and 3 additionally

In Figure 5, the accuracy comparison among integration strategies for one of the best performing sample groups is presented over a 1-year period. In particular, Christmas and other seasonality effects can be seen very clearly in Figure 5, and the integration strategy with DL (SD) performs with the best accuracy.

It is hard to compare the performance of our proposed system with other studies because of the lack of works with similar combinations of a deep learning approach, the proposed decision integration strategies, and different learning algorithms for the demand forecasting process. Another difficulty is the scarcity of real-life datasets similar to the SOK dataset against which state-of-the-art studies could be compared. Although the results of the proposed system are given in this study, we also report here the results of a number of research works on the demand forecasting domain. A greedy aggregation-decomposition method has solved a real-world intermittent demand forecasting problem for a fashion retailer in Singapore [46]; the authors report a 59% MAPE success rate with a small dataset. The authors of [47] compare different approaches, such as a statistical model, the Winters model, and a radial basis function neural network (RBFNN), with SVM for the demand forecasting process. They conclude that the success of the SVM algorithm surpasses the others, with around 77% enhancement at the average MAPE level. A novel demand forecasting model called SHEnSVM (Selective and Heterogeneous Ensemble of Support Vector Machines) is proposed in [48]. In that model, individual SVMs are trained on different samples generated by the bootstrap algorithm; a genetic algorithm is then employed to retrieve the best combination schema over the individuals. They report a 10% advancement with the SVM algorithm and a 64% MAPE improvement, though they employ only beer data with 3 different brands in their experiments. Tugba Efendigil et al. propose a novel forecasting mechanism modeled by artificial intelligence approaches in [49]. They compared artificial neural network and adaptive network-based fuzzy inference system techniques to manage fuzzy demand with incomplete information, reaching around 18% MAPE rates for some products during their experiments.
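The bootstrap-resampling idea behind SHEnSVM [48] can be sketched as follows. This is a minimal illustration under our own assumptions: an ordinary least-squares learner stands in for the SVM members so the example stays self-contained, and all names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_linear(X, y):
    # Least-squares base learner standing in for an SVM regressor.
    Xb = np.c_[X, np.ones(len(X))]              # add a bias column
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def predict_linear(w, X):
    return np.c_[X, np.ones(len(X))] @ w

def bootstrap_ensemble(X, y, n_models=10):
    """Train each ensemble member on a bootstrap resample of the data."""
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))   # sample with replacement
        models.append(fit_linear(X[idx], y[idx]))
    return models

def ensemble_predict(models, X):
    # Simple average of member predictions; [48] instead selects a
    # combination schema with a genetic algorithm.
    return np.mean([predict_linear(w, X) for w in models], axis=0)
```

The selective, genetic-algorithm-driven combination step of [48] is omitted; only the bootstrap diversity mechanism is shown.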

6. Conclusion

In the retail industry, demand forecasting is one of the main problems of supply chains, where the aim is to optimize stocks, reduce costs, and increase sales, profit, and customer loyalty. To overcome this issue, there are several methods, such as time series analysis and machine learning approaches, to analyze and learn complex interactions and patterns from historical data.

In this study, there is a novel attempt to integrate 11 different forecasting models, including time series algorithms, a support vector regression model, and a deep learning method, for the demand forecasting process. Moreover, a novel decision integration strategy is developed by drawing inspiration from the ensemble methodology, namely, boosting. The proposed approach considers the performance of each model over time and combines the weighted predictions of the best performing models for the demand forecasting process. The
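A minimal sketch of such a performance-weighted integration follows. It assumes inverse-MAPE weighting over the best recent performers; the paper's exact weighting scheme and evaluation window are simplified here, and all function and parameter names are illustrative.

```python
import numpy as np

def integrate_forecasts(forecasts, recent_mape, top_fraction=0.3):
    """Boosting-inspired integration: keep the recently best-performing
    models and weight their forecasts inversely to their recent error.

    forecasts:   dict, model name -> forecast for the next period
    recent_mape: dict, model name -> MAPE over the evaluation window
    """
    ranked = sorted(recent_mape, key=recent_mape.get)      # best model first
    n_keep = max(1, round(top_fraction * len(ranked)))
    best = ranked[:n_keep]
    inv = np.array([1.0 / recent_mape[m] for m in best])   # lower MAPE -> larger weight
    weights = inv / inv.sum()
    return float(sum(w * forecasts[m] for w, m in zip(weights, best)))
```

The default `top_fraction=0.3` mirrors the "30 percent of the best performing methods" figure mentioned in the conclusion.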


proposed forecasting system is tested on real-life data obtained from the SOK Market retail chain. It is observed that the inclusion of different learning algorithms besides time series models, together with a novel integration strategy, advanced the performance of the demand forecasting system. To the best of our knowledge, this is the very first attempt to consolidate deep learning methodology, the SVR algorithm, and different time series methods for demand forecasting systems. Furthermore, the other novelty of this work is the adaptation of the boosting ensemble strategy to the demand forecasting model. In this way, the final decision of the proposed system is based on the best algorithms of the week, which gain more weight. This makes our forecasts more reliable with respect to trend changes and seasonality behaviors. Moreover, the proposed approach performs very well when integrated with the deep learning algorithm on the Spark big data environment. The dimension reduction process and clustering methods help to decrease time consumption, requiring less computing power during the deep learning modelling phase. Although a review of some similar studies is presented in Section 5, it is, as expected, very difficult to compare the results of other studies with ours because of the use of different datasets and methods. In this study, we compare the results of three models, where model one selects the best performing forecasting method depending on its success in the previous period, yielding 42.4% MAPE on average. The second model, with the novel integration strategy, results in 25.8% MAPE on average. The last model, with the novel integration strategy enhanced with the deep learning approach, provides 24.7% MAPE on average. As a result, the inclusion of the deep learning approach in the novel integration strategy reduces the average prediction error for the demand forecasting process in the supply chain.
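The dimension reduction step mentioned above can be illustrated with a plain PCA projection computed via SVD. This is a generic sketch under our own assumptions, not the paper's exact pipeline (which also involves clustering the stores/products).

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project feature matrix X onto its top principal components,
    reducing the input width before model training."""
    Xc = X - X.mean(axis=0)                      # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # component scores
```

Columns of the result are ordered by explained variance, so keeping the first few components retains most of the signal at a fraction of the width.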

As future work, we plan to enrich the set of features by gathering data from other sources, such as economic studies, shopping trends, social media, social events, and location-based demographic data of stores; the contributions of these new varieties of data sources to deep learning can then be observed. A further study can be carried out to determine the hyperparameters of the deep learning algorithm. In addition, we also plan to use other deep learning techniques, such as convolutional neural networks, recurrent neural networks, and deep neural networks, as learning algorithms. Furthermore, another objective of ours is to use heuristic methods, MBO (Migrating Birds Optimization) and other related algorithms [50], to optimize some of the coefficients/weights that were determined empirically by trial-and-error, such as taking 30 percent of the best performing methods in our current system.
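The kind of coefficient tuning described above can be sketched with a simple random search standing in for a metaheuristic such as MBO; this is our own toy stand-in, not the MBO algorithm of [50], and the validation-error curve below is synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_search(objective, low, high, n_trials=200):
    """Minimal stand-in for a metaheuristic optimizer: sample candidate
    coefficients uniformly and keep the best-scoring one."""
    best_x, best_val = None, np.inf
    for _ in range(n_trials):
        x = rng.uniform(low, high)
        val = objective(x)
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val

# Synthetic validation-error curve with its minimum near 0.3,
# the hand-tuned "top 30 percent" fraction from the current system.
objective = lambda frac: (frac - 0.3) ** 2 + 25.0
frac, err = random_search(objective, 0.05, 1.0)
```

In the real system, `objective` would be the average MAPE of the integrated forecast on a validation window, evaluated for each candidate fraction or weight vector.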

Data Availability

This dataset is private customer data, so the agreement with the organization SOK does not allow sharing the data.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This work is supported by OBASE Research & Development Center.

References

[1] P. J. McGoldrick and E. Andre, “Consumer misbehaviour: promiscuity or loyalty in grocery shopping,” Journal of Retailing and Consumer Services, vol. 4, no. 2, pp. 73–81, 1997.

[2] D. Grewal, A. L. Roggeveen, and J. Nordfält, “The future of retailing,” Journal of Retailing, vol. 93, no. 1, pp. 1–6, 2017.

[3] E. T. Bradlow, M. Gangwar, P. Kopalle, and S. Voleti, “The role of big data and predictive analytics in retailing,” Journal of Retailing, vol. 93, no. 1, pp. 79–95, 2017.

[4] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, USA, 2013.

[5] R. Hyndman and G. Athanasopoulos, Forecasting: Principles and Practice, OTexts, Melbourne, Australia, 2018, http://otexts.org/fpp2.

[6] C. C. Pegels, “Exponential forecasting: some new variations,” Management Science, vol. 12, pp. 311–315, 1969.

[7] E. S. Gardner, “Exponential smoothing: the state of the art,” Journal of Forecasting, vol. 4, no. 1, pp. 1–28, 1985.

[8] D. Gujarati, Basic Econometrics, McGraw-Hill, New York, NY, USA, 2003.

[9] J. D. Croston, “Forecasting and stock control for intermittent demands,” Operational Research Quarterly, vol. 23, no. 3, pp. 289–303, 1972.

[10] T. R. Willemain, C. N. Smart, J. H. Shockor, and P. A. DeSautels, “Forecasting intermittent demand in manufacturing: a comparative evaluation of Croston's method,” International Journal of Forecasting, vol. 10, no. 4, pp. 529–538, 1994.

[11] A. Syntetos, Forecasting of Intermittent Demand, Brunel University, 2001.

[12] A. A. Syntetos and J. E. Boylan, “The accuracy of intermittent demand estimates,” International Journal of Forecasting, vol. 21, no. 2, pp. 303–314, 2005.

[13] R. H. Teunter, A. A. Syntetos, and M. Z. Babai, “Intermittent demand: linking forecasting to inventory obsolescence,” European Journal of Operational Research, vol. 214, no. 3, pp. 606–615, 2011.

[14] R. J. Hyndman, R. A. Ahmed, G. Athanasopoulos, and H. L. Shang, “Optimal combination forecasts for hierarchical time series,” Computational Statistics & Data Analysis, vol. 55, no. 9, pp. 2579–2589, 2011.

[15] R. Fildes and F. Petropoulos, “Simple versus complex selection rules for forecasting many time series,” Journal of Business Research, vol. 68, no. 8, pp. 1692–1701, 2015.

[16] C. Li and A. Lim, “A greedy aggregation-decomposition method for intermittent demand forecasting in fashion retailing,” European Journal of Operational Research, vol. 269, no. 3, pp. 860–869, 2018.

[17] F. Turrado García, L. J. García Villalba, and J. Portela, “Intelligent system for time series classification using support vector machines applied to supply-chain,” Expert Systems with Applications, vol. 39, no. 12, pp. 10590–10599, 2012.

[18] G. Song and Q. Dai, “A novel double deep ELMs ensemble system for time series forecasting,” Knowledge-Based Systems, vol. 134, pp. 31–49, 2017.

[19] O. Araque, I. Corcuera-Platas, J. F. Sánchez-Rada, and C. A. Iglesias, “Enhancing deep learning sentiment analysis with ensemble techniques in social applications,” Expert Systems with Applications, vol. 77, pp. 236–246, 2017.

[20] H. Tong, B. Liu, and S. Wang, “Software defect prediction using stacked denoising autoencoders and two-stage ensemble learning,” Information and Software Technology, vol. 96, pp. 94–111, 2018.

[21] X. Qiu, Y. Ren, P. N. Suganthan, and G. A. J. Amaratunga, “Empirical mode decomposition based ensemble deep learning for load demand time series forecasting,” Applied Soft Computing, vol. 54, pp. 246–255, 2017.

[22] Z. Qi, B. Wang, Y. Tian, and P. Zhang, “When ensemble learning meets deep learning: a new deep support vector machine for classification,” Knowledge-Based Systems, vol. 107, pp. 54–60, 2016.

[23] Y. Bengio, A. Courville, and P. Vincent, “Representation learning: a review and new perspectives,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, 2013.

[24] G. E. Hinton, S. Osindero, and Y. Teh, “A fast learning algorithm for deep belief nets,” Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.

[25] Y. Bengio, “Practical recommendations for gradient-based training of deep architectures,” in Neural Networks: Tricks of the Trade, vol. 7700 of Lecture Notes in Computer Science, pp. 437–478, Springer, Berlin, Germany, 2nd edition, 2012.

[26] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.

[27] S. Kim, Z. Yu, R. M. Kil, and M. Lee, “Deep learning of support vector machines with class probability output networks,” Neural Networks, vol. 64, pp. 19–28, 2015.

[28] P. J. Brockwell and R. A. Davis, Time Series: Theory and Methods, Springer Series in Statistics, New York, NY, USA, 1989.

[29] V. N. Vapnik, Statistical Learning Theory, Wiley-Interscience, New York, NY, USA, 1998.

[30] S. Haykin, Neural Networks and Learning Machines, Macmillan Publishers Limited, New Jersey, NJ, USA, 2009.

[31] A. Candel, V. Parmar, E. LeDell, and A. Arora, Deep Learning with H2O, United States of America, 2018.

[32] K. Keerthi Vasan and B. Surendiran, “Dimensionality reduction using principal component analysis for network intrusion detection,” Perspectives in Science, vol. 8, pp. 510–512, 2016.

[33] L. Rokach, “Ensemble-based classifiers,” Artificial Intelligence Review, vol. 33, no. 1-2, pp. 1–39, 2010.

[34] R. Polikar, “Ensemble based systems in decision making,” IEEE Circuits and Systems Magazine, vol. 6, no. 3, pp. 21–45, 2006.

[35] S. Reid, A Review of Heterogeneous Ensemble Methods, Department of Computer Science, University of Colorado at Boulder, 2007.

[36] D. Gopika and B. Azhagusundari, “An analysis on ensemble methods in classification tasks,” International Journal of Advanced Research in Computer and Communication Engineering, vol. 3, no. 7, pp. 7423–7427, 2014.

[37] Y. Ren, L. Zhang, and P. N. Suganthan, “Ensemble classification and regression: recent developments, applications and future directions,” IEEE Computational Intelligence Magazine, vol. 11, no. 1, pp. 41–53, 2016.

[38] U. G. Mangai, S. Samanta, S. Das, and P. R. Chowdhury, “A survey of decision fusion and feature fusion strategies for pattern classification,” IETE Technical Review, vol. 27, no. 4, pp. 293–307, 2010.

[39] M. Wozniak, M. Grana, and E. Corchado, “A survey of multiple classifier systems as hybrid systems,” Information Fusion, vol. 16, no. 1, pp. 3–17, 2014.

[40] G. Tsoumakas, L. Angelis, and I. Vlahavas, “Selective fusion of heterogeneous classifiers,” Intelligent Data Analysis, vol. 9, no. 6, pp. 511–525, 2005.

[41] Y. Freund and R. E. Schapire, “A decision-theoretic generalization of on-line learning and an application to boosting,” Journal of Computer and System Sciences, vol. 55, no. 1, part 2, pp. 119–139, 1997.

[42] S. Chopra and P. Meindl, Supply Chain Management: Strategy, Planning, and Operation, Prentice Hall, 2004.

[43] G. Kushwaha, “Operational performance through supply chain management practices,” International Journal of Business and Social Science, vol. 217, pp. 65–77, 2012.

[44] G. E. Box, G. M. Jenkins, G. Reinsel, and G. M. Ljung, Time Series Analysis: Forecasting and Control, Wiley Series in Probability and Statistics, John Wiley & Sons, Hoboken, NJ, USA, 5th edition, 2015.

[45] W.-L. Zhao, C.-H. Deng, and C.-W. Ngo, “k-means: a revisit,” Neurocomputing, vol. 291, pp. 195–206, 2018.

[46] C. Li and A. Lim, “A greedy aggregation-decomposition method for intermittent demand forecasting in fashion retailing,” European Journal of Operational Research, vol. 269, no. 3, pp. 860–869, 2018.

[47] L. Yue, Y. Yafeng, G. Junjun, and T. Chongli, “Demand forecasting by using support vector machine,” in Proceedings of the Third International Conference on Natural Computation (ICNC 2007), pp. 272–276, Haikou, China, August 2007.

[48] L. Yue, L. Zhenjiang, Y. Yafeng, T. Zaixia, G. Junjun, and Z. Bofeng, “Selective and heterogeneous SVM ensemble for demand forecasting,” in Proceedings of the 2010 IEEE 10th International Conference on Computer and Information Technology (CIT), pp. 1519–1524, Bradford, UK, June 2010.

[49] T. Efendigil, S. Önüt, and C. Kahraman, “A decision support system for demand forecasting with artificial neural networks and neuro-fuzzy models: a comparative analysis,” Expert Systems with Applications, vol. 36, no. 3, pp. 6697–6707, 2009.

[50] E. Duman, M. Uysal, and A. F. Alkaya, “Migrating birds optimization: a new metaheuristic approach and its performance on quadratic assignment problem,” Information Sciences, vol. 217, pp. 65–77, 2012.

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 14: An Improved Demand Forecasting Model Using …downloads.hindawi.com/journals/complexity/2019/9067367.pdfAn Improved Demand Forecasting Model Using Deep Learning Approach and Proposed

14 Complexity

proposed forecasting system is tested and carried out real lifedata obtained from SOK Market retail chain It is observedthat the inclusion of different learning algorithms except timeseries models and a novel integration strategy advanced theperformance of demand forecasting system To the best ofour knowledge this is the very first attempt to consolidatedeep learning methodology SVR algorithm and differenttime series methods for demand forecasting systems Fur-thermore the other novelty of this work is the adaptation ofboosting ensemble strategy to the demand forecasting modelIn this way the final decision of the proposed system isbased on the best algorithms of the week by gaining moreweight This makes our forecasts more reliable with respectto the trend changes and seasonality behaviors Moreoverthe proposed approach performs very well integrating withdeep learning algorithm on Spark big data environmentDimension reduction process and clustering methods helpto decrease time consuming with less computing powerduring deep learning modelling phase Although a reviewof some of similar studies is presented in Section 5 asit is expected it is very difficult to compare the resultsof other studies with ours because of the use of differentdatasets and methods In this study we compare results ofthree models where model one selects the best performingforecasting method depending on its success on previousperiod with 424 MAPE on average The second modelwith the novel integration strategy results in 258 MAPEon average The last model with the novel integration strat-egy enhanced with deep learning approach provides 247MAPE on average As a result the inclusion of deep learningapproach into the novel integration strategy reduces averageprediction error for demand forecasting process in supplychain

As a future work we plan to enrich the set of featuresby gathering data from other sources like economic studiesshopping trends social media social events and locationbased demographic data of stores New variety of datasources contributions to deep learning can be observedOne more study can be done to determine the hyperpa-rameters for deep learning algorithm In addition we alsoplan to use other deep learning techniques such as con-volutional neural networks recurrent neural networks anddeep neural networks as learning algorithms Furthermoreour another objective is to use heuristic methods MBO(Migrating Birds Optimization) and other related algorithms[50] to optimize some of coefficientsweights which weredetermined empirically by trial-and-error like taking 30percent of the best performing methods in our currentsystem

Data Availability

This dataset is private customer data so the agreement withthe organization SOK does not allow sharing data

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this article

Acknowledgments

This work is supported by OBASE Research amp DevelopmentCenter

References

[1] P J McGoldrick and E Andre ldquoConsumer misbehaviourpromiscuity or loyalty in grocery shoppingrdquo Journal of Retailingand Consumer Services vol 4 no 2 pp 73ndash81 1997

[2] D Grewal A L Roggeveen and J Nordfalt ldquoThe Future ofRetailingrdquo Journal of Retailing vol 93 no 1 pp 1ndash6 2017

[3] E T Bradlow M Gangwar P Kopalle and S Voleti ldquothe roleof big data and predictive analytics in retailingrdquo Journal ofRetailing vol 93 no 1 pp 79ndash95 2017

[4] J Han M Kamber and J Pei Data Mining Concepts andTechniques Morgan Kaufmann Publishers USA 2013

[5] R Hyndman and G Athanasopoulos Forecasting Principlesand Practice OTexts Melbourne Australia 2018 httpotextsorgfpp2

[6] C C Pegels ldquoExponential forecasting some new variationsrdquoManagement Science vol 12 pp 311ndash315 1969

[7] E S Gardner ldquoExponential smoothing the state of the artrdquoJournal of Forecasting vol 4 no 1 pp 1ndash28 1985

[8] D Gujarati Basic Econometrics Mcgraw-Hill New York NYUSA 2003

[9] J D Croston ldquoForecasting and stock control for intermittentdemandsrdquo Operational Research Quarterly vol 23 no 3 pp289ndash303 1972

[10] T RWillemain C N Smart J H Shockor and P A DeSautelsldquoForecasting intermittent demand inmanufacturing a compar-ative evaluation of Crostonrsquos methodrdquo International Journal ofForecasting vol 10 no 4 pp 529ndash538 1994

[11] A Syntetos Forecasting of intermittent demand Brunel Univer-sity 2001

[12] A A Syntetos and J E Boylan ldquoThe accuracy of intermittentdemand estimatesrdquo International Journal of Forecasting vol 21no 2 pp 303ndash314 2005

[13] R H Teunter A A Syntetos and M Z Babai ldquoIntermittentdemand linking forecasting to inventory obsolescencerdquo Euro-pean Journal of Operational Research vol 214 no 3 pp 606ndash615 2011

[14] R J Hyndman R A Ahmed G Athanasopoulos and H LShang ldquoOptimal combination forecasts for hierarchical timeseriesrdquo Computational Statistics amp Data Analysis vol 55 no 9pp 2579ndash2589 2011

[15] R Fildes and F Petropoulos ldquoSimple versus complex selectionrules for forecasting many time seriesrdquo Journal of BusinessResearch vol 68 no 8 pp 1692ndash1701 2015

[16] C Li and A Lim ldquoA greedy aggregationndashdecompositionmethod for intermittent demand forecasting in fashion retail-ingrdquo European Journal of Operational Research vol 269 no 3pp 860ndash869 2018

[17] F Turrado Garcıa L J Garcıa Villalba and J Portela ldquoIntelli-gent system for time series classification using support vectormachines applied to supply-chainrdquo Expert Systems with Appli-cations vol 39 no 12 pp 10590ndash10599 2012

[18] G Song and Q Dai ldquoA novel double deep ELMs ensemblesystem for time series forecastingrdquo Knowledge-Based Systemsvol 134 pp 31ndash49 2017

Complexity 15

[19] O Araque I Corcuera-Platas J F Sanchez-Rada and C AIglesias ldquoEnhancing deep learning sentiment analysis withensemble techniques in social applicationsrdquoExpert SystemswithApplications vol 77 pp 236ndash246 2017

[20] H Tong B Liu and S Wang ldquoSoftware defect predictionusing stacked denoising autoencoders and twostage ensemblelearningrdquo Information and Soware Technology vol 96 pp 94ndash111 2018

[21] X Qiu Y Ren P N Suganthan and G A J AmaratungaldquoEmpirical mode decomposition based ensemble deep learningfor load demand time series forecastingrdquo Applied So Comput-ing vol 54 pp 246ndash255 2017

[22] Z Qi BWang Y Tian and P Zhang ldquoWhen ensemble learningmeets deep learning a new deep support vector machine forclassificationrdquo Knowledge-Based Systems vol 107 pp 54ndash602016

[23] Y Bengio A Courville and P Vincent ldquoRepresentation learn-ing a review and new perspectivesrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 35 no 8 pp1798ndash1828 2013

[24] G E Hinton S Osindero and Y Teh ldquoA fast learning algorithmfor deep belief netsrdquoNeural Computation vol 18 no 7 pp 1527ndash1554 2006

[25] Y Bengio ldquoPractical recommendations for gradient-basedtraining of deep architecturesrdquo inNeural Networks Tricks of theTrade vol 7700 of Lecture Notes in Computer Science pp 437ndash478 Springer Berlin Germany 2nd edition 2012

[26] Y LeCun Y Bengio and G Hinton ldquoDeep learning ReviewrdquoInternational Journal of Business and Social Science vol 3 2015

[27] S Kim Z Yu R M Kil and M Lee ldquoDeep learning of supportvectormachineswith class probability output networksrdquoNeuralNetworks vol 64 pp 19ndash28 2015

[28] P J Brockwell and R A Davis Time Series eory AndMethods Springer Series in Statistics NewYork NY USA 1989

[29] V N Vapnik Statistical Learning eory Wiley- InterscienceNew York NY USA 1998

[30] S Haykin Neural Networks and LearningMachines MacmillanPublishers Limited New Jersey NJ USA 2009

[31] A Candel V Parmar E LeDell and A Arora Deep Learningwith H2O United States of America 2018

[32] K Keerthi Vasan and B Surendiran ldquoDimensionality reductionusing principal component analysis for network intrusiondetectionrdquo Perspectives in Science vol 8 pp 510ndash512 2016

[33] L Rokach ldquoEnsemble-based classifiersrdquo Artificial IntelligenceReview vol 33 no 1-2 pp 1ndash39 2010

[34] R Polikar ldquoEnsemble based systems in decision makingrdquo IEEECircuits and Systems Magazine vol 6 no 3 pp 21ndash45 2006

[35] Reid SamA review of heterogeneous ensemblemethods Depart-ment of Computer Science University of Colorado at Boulder2007

[36] D Gopika and B Azhagusundari ldquoAn analysis on ensem-ble methods in classification tasksrdquo International Journal ofAdvanced Research in Computer and Communication Engineer-ing vol 3 no 7 pp 7423ndash7427 2014

[37] Y Ren L Zhang and P N Suganthan ldquoEnsemble classificationand regression-recent developments applications and futuredirectionsrdquo IEEE Computational Intelligence Magazine vol 11no 1 pp 41ndash53 2016

[38] U G Mangai S Samanta S Das and P R ChowdhuryldquoA survey of decision fusion and feature fusion strategies for

pattern classificationrdquo IETE Technical Review vol 27 no 4 pp293ndash307 2010

[39] MWozniak M Grana and E Corchado ldquoA survey of multipleclassifier systems as hybrid systemsrdquo Information Fusion vol 16no 1 pp 3ndash17 2014

[40] G Tsoumakas L Angelis and I Vlahavas ldquoSelective fusion ofheterogeneous classifiersrdquo Intelligent Data Analysis vol 9 no 6pp 511ndash525 2005

[41] Y Freund and R E Schapire ldquoA decision-theoretic generaliza-tion of on-line learning and an application to boostingrdquo Journalof Computer and System Sciences vol 55 no 1 part 2 pp 119ndash139 1997

[42] S Chopra and P Meindl Supply Chain Management StrategyPlanning and Operation Prentice Hall 2004

[43] G Kushwaha ldquoOperational performance through supply chainmanagement practicesrdquo International Journal of Business andSocial Science vol 217 pp 65ndash77 2012

[44] G E Box G M Jenkins G Reinsel and G M LjungTime Series Analysis Forecasting and Control Wiley Series inProbability and Statistics John Wiley amp Sons Hoboken NJUSA 5th edition 2015

[45] W-L Zhao C-H Deng and C-W Ngo ldquok-means a revisitrdquoNeurocomputing vol 291 pp 195ndash206 2018

[46] C Li andA Lim ldquoA greedy aggregation-decompositionmethodfor intermittent demand forecasting in fashion retailingrdquo Euro-pean Journal of Operational Research vol 269 no 3 pp 860ndash869 2018

[47] L Yue Y Yafeng G Junjun and T Chongli ldquoDemand forecast-ing by using support vectormachinerdquo inProceedings of theirdInternational Conference on Natural Computation (ICNC 2007)pp 272ndash276 Haikou China August 2007

[48] L Yue L Zhenjiang Y Yafeng T Zaixia G Junjun andZ Bofeng ldquoSelective and heterogeneous SVM ensemble fordemand forecastingrdquo in Proceedings of the 2010 IEEE 10th Inter-national Conference on Computer and Information Technology(CIT) pp 1519ndash1524 Bradford UK June 2010

[49] T Efendigil S Onut and C Kahraman ldquoA decision supportsystem for demand forecasting with artificial neural networksand neuro-fuzzy models a comparative analysisrdquo Expert Sys-tems with Applications vol 36 no 3 pp 6697ndash6707 2009

[50] E Duman M Uysal and A F Alkaya ldquoMigrating birds opti-mization a newmetaheuristic approach and its performance onquadratic assignment problemrdquo Information Sciences vol 217pp 65ndash77 2012

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 15: An Improved Demand Forecasting Model Using …downloads.hindawi.com/journals/complexity/2019/9067367.pdfAn Improved Demand Forecasting Model Using Deep Learning Approach and Proposed

Complexity 15

[19] O Araque I Corcuera-Platas J F Sanchez-Rada and C AIglesias ldquoEnhancing deep learning sentiment analysis withensemble techniques in social applicationsrdquoExpert SystemswithApplications vol 77 pp 236ndash246 2017

[20] H Tong B Liu and S Wang ldquoSoftware defect predictionusing stacked denoising autoencoders and twostage ensemblelearningrdquo Information and Soware Technology vol 96 pp 94ndash111 2018

[21] X Qiu Y Ren P N Suganthan and G A J AmaratungaldquoEmpirical mode decomposition based ensemble deep learningfor load demand time series forecastingrdquo Applied So Comput-ing vol 54 pp 246ndash255 2017

[22] Z Qi BWang Y Tian and P Zhang ldquoWhen ensemble learningmeets deep learning a new deep support vector machine forclassificationrdquo Knowledge-Based Systems vol 107 pp 54ndash602016

[23] Y Bengio A Courville and P Vincent ldquoRepresentation learn-ing a review and new perspectivesrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 35 no 8 pp1798ndash1828 2013

[24] G E Hinton S Osindero and Y Teh ldquoA fast learning algorithmfor deep belief netsrdquoNeural Computation vol 18 no 7 pp 1527ndash1554 2006

[25] Y Bengio ldquoPractical recommendations for gradient-basedtraining of deep architecturesrdquo inNeural Networks Tricks of theTrade vol 7700 of Lecture Notes in Computer Science pp 437ndash478 Springer Berlin Germany 2nd edition 2012

[26] Y LeCun Y Bengio and G Hinton ldquoDeep learning ReviewrdquoInternational Journal of Business and Social Science vol 3 2015

[27] S. Kim, Z. Yu, R. M. Kil, and M. Lee, "Deep learning of support vector machines with class probability output networks," Neural Networks, vol. 64, pp. 19–28, 2015.

[28] P. J. Brockwell and R. A. Davis, Time Series: Theory and Methods, Springer Series in Statistics, New York, NY, USA, 1989.

[29] V. N. Vapnik, Statistical Learning Theory, Wiley-Interscience, New York, NY, USA, 1998.

[30] S. Haykin, Neural Networks and Learning Machines, Macmillan Publishers Limited, New Jersey, NJ, USA, 2009.

[31] A. Candel, V. Parmar, E. LeDell, and A. Arora, Deep Learning with H2O, United States of America, 2018.

[32] K. Keerthi Vasan and B. Surendiran, "Dimensionality reduction using principal component analysis for network intrusion detection," Perspectives in Science, vol. 8, pp. 510–512, 2016.

[33] L. Rokach, "Ensemble-based classifiers," Artificial Intelligence Review, vol. 33, no. 1-2, pp. 1–39, 2010.

[34] R. Polikar, "Ensemble based systems in decision making," IEEE Circuits and Systems Magazine, vol. 6, no. 3, pp. 21–45, 2006.

[35] S. Reid, A Review of Heterogeneous Ensemble Methods, Department of Computer Science, University of Colorado at Boulder, 2007.

[36] D. Gopika and B. Azhagusundari, "An analysis on ensemble methods in classification tasks," International Journal of Advanced Research in Computer and Communication Engineering, vol. 3, no. 7, pp. 7423–7427, 2014.

[37] Y. Ren, L. Zhang, and P. N. Suganthan, "Ensemble classification and regression: recent developments, applications and future directions," IEEE Computational Intelligence Magazine, vol. 11, no. 1, pp. 41–53, 2016.

[38] U. G. Mangai, S. Samanta, S. Das, and P. R. Chowdhury, "A survey of decision fusion and feature fusion strategies for pattern classification," IETE Technical Review, vol. 27, no. 4, pp. 293–307, 2010.

[39] M. Wozniak, M. Grana, and E. Corchado, "A survey of multiple classifier systems as hybrid systems," Information Fusion, vol. 16, no. 1, pp. 3–17, 2014.

[40] G. Tsoumakas, L. Angelis, and I. Vlahavas, "Selective fusion of heterogeneous classifiers," Intelligent Data Analysis, vol. 9, no. 6, pp. 511–525, 2005.

[41] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," Journal of Computer and System Sciences, vol. 55, no. 1, part 2, pp. 119–139, 1997.

[42] S. Chopra and P. Meindl, Supply Chain Management: Strategy, Planning, and Operation, Prentice Hall, 2004.

[43] G. Kushwaha, "Operational performance through supply chain management practices," International Journal of Business and Social Science, vol. 217, pp. 65–77, 2012.

[44] G. E. Box, G. M. Jenkins, G. Reinsel, and G. M. Ljung, Time Series Analysis: Forecasting and Control, Wiley Series in Probability and Statistics, John Wiley & Sons, Hoboken, NJ, USA, 5th edition, 2015.

[45] W.-L. Zhao, C.-H. Deng, and C.-W. Ngo, "k-means: a revisit," Neurocomputing, vol. 291, pp. 195–206, 2018.

[46] C. Li and A. Lim, "A greedy aggregation-decomposition method for intermittent demand forecasting in fashion retailing," European Journal of Operational Research, vol. 269, no. 3, pp. 860–869, 2018.

[47] L. Yue, Y. Yafeng, G. Junjun, and T. Chongli, "Demand forecasting by using support vector machine," in Proceedings of the Third International Conference on Natural Computation (ICNC 2007), pp. 272–276, Haikou, China, August 2007.

[48] L. Yue, L. Zhenjiang, Y. Yafeng, T. Zaixia, G. Junjun, and Z. Bofeng, "Selective and heterogeneous SVM ensemble for demand forecasting," in Proceedings of the 2010 IEEE 10th International Conference on Computer and Information Technology (CIT), pp. 1519–1524, Bradford, UK, June 2010.

[49] T. Efendigil, S. Onut, and C. Kahraman, "A decision support system for demand forecasting with artificial neural networks and neuro-fuzzy models: a comparative analysis," Expert Systems with Applications, vol. 36, no. 3, pp. 6697–6707, 2009.

[50] E. Duman, M. Uysal, and A. F. Alkaya, "Migrating birds optimization: a new metaheuristic approach and its performance on quadratic assignment problem," Information Sciences, vol. 217, pp. 65–77, 2012.


An Improved Demand Forecasting Model Using Deep Learning Approach and Proposed Decision Integration Strategy for Supply Chain. downloads.hindawi.com/journals/complexity/2019/9067367.pdf
