Demand Forecasting of Outbound Logistics Using Machine Learning


Master of Science in Computer Science, February 2018

Demand Forecasting of Outbound Logistics Using Machine Learning

Ashik Talupula

Faculty of Computing, Blekinge Institute of Technology, 371 79 Karlskrona, Sweden


This thesis is submitted to the Faculty of Computing at Blekinge Institute of Technology in partial fulfilment of the requirements for the degree of Master of Science in Computer Science. The thesis is equivalent to 20 weeks of full-time studies.

The authors declare that they are the sole authors of this thesis and that they have not used any sources other than those listed in the bibliography and identified as references. They further declare that they have not submitted this thesis at any other institution to obtain a degree.

Contact Information:
Author: Ashik Talupula
E-mail: [email protected]

University advisor:
Dr. Hüseyin Kusetogullari
Department of Computer Science

Faculty of Computing
Blekinge Institute of Technology
SE-371 79 Karlskrona, Sweden
Internet: www.bth.se
Phone: +46 455 38 50 00
Fax: +46 455 38 50 57


Abstract

Background. Long-term volume forecasting is important for logistics service providers for planning their capacity and taking strategic decisions. At present, demand is estimated using traditional averaging techniques or practitioners' own experience, which often contains some error. This study focuses on filling these gaps by using machine learning approaches. The sample data set is provided by the organization, a leading manufacturer of trucks, buses and construction equipment, with customers in more than 190 markets and production facilities in 18 countries.

Objectives. This study investigates a suitable machine learning algorithm for forecasting the demand of outbound distributed products and then evaluates the performance of the selected algorithms through an experiment, to articulate the possibility of using long-term forecasting in transportation.

Methods. Primarily, a literature review was conducted to find a suitable machine learning algorithm; then, based on the results of the literature review, an experiment was performed to evaluate the performance of the selected algorithms.

Results. The selected CNN, ANN and LSTM models all perform quite well, but based on the type and amount of historical data the models were given to learn from, they show only slight differences in forecasting performance. Comparisons are made with different measures selected from the literature review.

Conclusions. This study examines the efficacy of using Convolutional Neural Networks (CNN) for demand forecasting of outbound distributed products at country level. The methodology applies convolutions on historical loads. The output from the convolutional operation is supplied to fully connected layers together with other relevant data. The presented methodology was implemented on an organization data set of outbound distributed products per month.
Results obtained from the CNN were compared to results obtained by Long Short-Term Memory sequence-to-sequence (LSTM S2S) and Artificial Neural Networks (ANN) for the same dataset. Experimental results showed that the CNN outperformed the LSTM while producing results comparable to the ANN. Further testing is needed to compare the performances of different deep learning architectures in outbound forecasting.

Keywords: demand forecasting, time series, outbound logistics, machine learning.


Acknowledgments

First of all, I would like to thank my university supervisor, Dr. Hüseyin Kusetogullari. He was always available when I ran into a trouble spot or had a question about my research or writing. He always permitted this paper to be my own work, but steered me in the right direction whenever he thought I required it. I would also like to thank my supervisor at Volvo, Teja Yerneni, for supporting me not only with the thesis but also in motivating me and collaborating with the team at Volvo. Finally, I must express my deep appreciation to my parents and to my friends for offering me unfailing support and continuous encouragement throughout my years of study and through the process of researching and writing this thesis. Without them, this achievement would not have been feasible. Thank you.


Contents

Abstract i

Acknowledgments ii

1 Introduction 1
  1.1 Problem Statement 2
    1.1.1 Aim 3
    1.1.2 Objectives 3
    1.1.3 Research Questions 3

2 Related Work 4
  2.1 Time series forecasting 6

3 Preliminaries 7
  3.1 Forecasting 7
  3.2 Time series 7
    3.2.1 Univariate 7
    3.2.2 Multivariate 7
    3.2.3 Components of time series 8
  3.3 Time series forecasting as a supervised problem 9
    3.3.1 Supervised learning 9
    3.3.2 Sliding window approach for time series data 9
  3.4 Artificial Neural Networks 9
    3.4.1 Activation Functions 10
    3.4.2 Recurrent Neural Networks 12
    3.4.3 LSTM 13
    3.4.4 CNN 14
  3.5 ARIMA 14
  3.6 SVR 15
  3.7 Multiple parallel input and multi-step output 16

4 Method 18
  4.1 Data gathering 19
  4.2 Data pre-processing 19
  4.3 Data set 20
  4.4 Experiment setup 20
  4.5 Performance metrics 21
  4.6 Walk forward validation 21

5 Results 23
  5.1 Learning curve 23
  5.2 Forecasts 24
  5.3 Forecasting performance 24
  5.4 Validity threats 25

6 Analysis and Discussion 26
  6.1 Implementation 26
  6.2 Discussion 27

7 Conclusions and Future Work 28

References 29

A Supplemental Information 32


List of Figures

1.1 Outbound process 2
2.1 Time series 6
3.1 Univariate time series 7
3.2 Multivariate time series 8
3.3 Time series decomposition 8
3.4 Time series data 9
3.5 Supervised problem 9
3.6 Single layer perceptron 10
3.7 Multi layer perceptron 10
3.8 Sigmoid 11
3.9 Tan-h 11
3.10 ReLU 12
3.11 Recurrent and feed forward networks structure 12
3.12 LSTM Architecture 13
3.13 Support vector regressor 16
3.14 Multivariate time series 16
3.15 Transformation of input and output from the above series 17
4.1 Data set 20
4.2 Walk forward validation 22
5.1 LSTM training graph 23
5.2 CNN training graph 23
5.3 Actual vs forecast using CNN 24
5.4 Actual vs forecast using CNN 24
5.5 Models performances 25
A.1 Distribution of residuals 32
A.2 Actual vs forecast using LSTM 32
A.3 Decomposition of time series 33
A.4 Forecast using LSTM 33
A.5 Forecast using LSTM 34


Chapter 1
Introduction

A supply chain consists of all activities involved in moving goods from raw materials to the consumer [35]. Sales and Order Planning (SOP) is responsible for planning and agreeing volumes from all business units for the upcoming months, and then communicates those volumes to operation plants and production logistics to plan supply chain activities [33]. Logistics is the process of distributing goods from the point of origin to the point of consumption to meet consumer requirements. Inbound logistics refers to the transport, storage and delivery of goods coming into a business, and outbound logistics refers to the same for goods going out of a business [34]. The process starts when a customer places an order with the sales department; the order is then processed by the sales department and assigned to a production plant. The sales office provides the customer with a customer delivery date (CDD). A CDD is provided if, and only if, goods are transported directly to the customer location; it is specified as an Available at Terminal Date (ATD) and an Indicated Customer Delivery Date (I-CDD) if the goods will pass through a terminal and are noted as a transfer. An ATD shows when the order ought to be at the terminal, prepared to load onto the following transport unit. The business volume of logistics has grown sustainably with the advancement of the economy and improved offline and online technology; thus, efficient logistics demand prediction is needed to manage these processes in an organized manner [18].

Forecasting is the process of predicting the future based on past or current data. It plays an important role in sales and operations planning for taking strategic and planning decisions. Forecasted values are just projections; we do not get the exact value, we only try to reduce the error with the help of forecasting tools and more sophisticated models. One can forecast sales using different techniques such as ARIMA [22], SVM [20], ANN [23][37], LSTM [13] and CNN [15][30], given previous sales records and accurate demand details.


Figure 1.1: Outbound process

Forecasting outbound distributed products lowers the cost of warehousing and transportation by optimizing the logistics process through consolidation, capacity planning and collaboration with a third-party logistics provider. The purpose of this thesis is to forecast the outbound distributed products of a manufacturing company that uses third-party logistics (3PL) services for distributing its products via air, water and road transportation. Third-party services include handling logistics such as warehousing, packaging, fulfillment and distribution.

1.1 Problem Statement

Most logistics service providers face several challenges in managing warehousing and distribution of products, such as capacity planning and freight volume. So, there is a need to study the outbound processes of a manufacturing company to develop a proper plan for overcoming these challenges. Transportation is the major part of logistics, where securing capacity with carriers is the most pressing issue for logistics services, especially international logistics. The risk of a capacity shortage with carrier providers can be minimized by an early request for space, which can be achieved through reliable long-term volume forecasting (LTVF). This also helps in planning and handling higher transportation capacity demand that cannot otherwise be handled by the carrier providers. Carrier providers could increase their service


capacity upon request to meet demand, provided an early demand estimate is given.

1.1.1 Aim

The main aim of the thesis is to investigate a suitable machine learning model that can translate SOP (sales and operations planning) information into forecasting information for the outbound distributed product processes.

1.1.2 Objectives

• Identifying an appropriate machine learning model for forecasting outbound logistics.

• Evaluating the efficiency of the selected machine learning algorithm.

1.1.3 Research Questions

• RQ 1: What are the available state-of-the-art methods used in forecasting?

The motivation for this research question is to find a suitable forecasting method that can identify underlying patterns over a period of time.

• RQ 2: Which machine learning model would perform better at forecasting on time series data?

Motivation: the motivation for this research question is to evaluate different time series forecasting models on outbound logistics and select the appropriate one based on performance.


Chapter 2
Related Work

Related work for this research covers demand forecasting for supply chains and logistics in general.

Investigations of demand forecasting and its connection to supply chain networks began far earlier. In 1960, Winters presented exponential smoothing strategies for forecasting sales for the purpose of optimizing production planning. In the most recent decade, several papers were proposed to deal with this issue. Gilbert (2005) [14] stated a multistage inventory network model built on ARIMA; he also discussed the causes of the bullwhip effect and the demand variations in inventory and orders. Besides, Liang (2006) [31] proposed a solution for estimating the ordering capacity for the period t+1 of a multi-echelon supply chain, where every entity was permitted to use a diverse inventory structure.

Aburto and Weber (2007) [1] presented a hybrid intelligent system, a combination of neural networks and ARIMA (autoregressive integrated moving average) models, for forecasting demand. In 2008, Carbonneau [5] stated the use of advanced non-linear machine learning algorithms in the context of the extended supply chain.

Garcia et al. (2012) [11] used support vector machines to solve the issues faced by distribution and the discovery of new models. Kandananond (2012) [24] stated that in forecasting consumer product demand, support vector machines outperformed ANNs (artificial neural networks); in the following year, the same author [24] mentioned that SVM surpasses the ARIMA method of forecasting.

Manas Gaur, Shruthi Goel and Eshaan Jain (2015) [12] used K-Nearest Neighbor and Bayesian networks for forecasting demand in the supply chain. The aim of their study was to find the more suitable one by comparing both algorithms; adaptive boosting was also used in conjunction with the different algorithms to improve model performance. The results of their experiment show that Bayesian networks, with or without adaptive boosting, surpass K-Nearest Neighbor (KNN). Also, KNN with two nearest neighbors gave promising results.

Wen-Jing Yuan and Ze-Yi Jin [38] proposed a combination of a grey model and a stacked autoencoder (SAE) for forecasting logistics demand, taking advantage of the uniqueness of the logistics demand forecasting problem. The original data is processed through multiple grey models and the output of the grey model is given as input to the SAE model; to get the final predicted value, an extreme learning machine (ELM) is applied for exact prediction at the top and the SAE for feature extraction at the bottom. The proposed model shows more accurate results than the ordinary grey network model when applied in empirical research on the logistics demand of a Brazilian


company.

Yan Zhao and Shengchang Wang [40] proposed two forecasting models, support

vector machines (SVM) and least squares support vector machines (LS-SVM). To identify the better forecasting model, they evaluated the efficiency of both models considering the complexity and nonlinearity of highway freight volume. Based on their calculations, the LS-SVM-based forecasting model is the more efficient for forecasting freight volume.

Pei-you Chen and Lu Liu [7] proposed the PSO-SVR algorithm, a combination of support vector regression (SVR) and particle swarm optimization (PSO), to forecast the demand for coal transportation. They selected railway freight turnover volume, amount of coal consumption and some other factors, chose railway freight volumes from 1995-2011 as learning samples, and used the radial basis function (RBF) as the kernel of the prediction model to establish the influence factors by combining both models. Results show that the selected algorithm is superior to back-propagation (BP) neural networks in forecast accuracy and error.

Real Carbonneau [5] studied the application of advanced machine learning algorithms such as neural networks, recurrent neural networks and SVR to forecast a falsified demand data set of the supply chain, comparing them with traditional methods such as linear regression and moving averages. Results were positive for the RNN and SVR. Two different data sets were used in the experiment: one collected from a simulated supply chain and the other from actual Canadian foundries orders.

Jingyi Du [9] stated that LSTM networks have gained great attention in deep learning, especially for time series. An LSTM network was used to predict the Apple stock price, using multiple-feature and single-feature input variables to verify the prediction on stock time series. Results were positive when multiple features were used as input.

The most widely used machine learning approach is the Artificial Neural Network. However, Hu and Zhang (2008) [17] explained the drawbacks of using ANNs, such as optimizing the cost function and uncontrolled convergence. LSTM, support vector regressors and random forest regressors have been used to present accurate demand forecasts while overcoming the drawbacks of traditional methods and ANNs.

Kasun Amarasinghe and Daniel L. Marino [2] talked about forecasting energy load demand using deep neural networks. Their paper investigates the effectiveness of using Convolutional Neural Networks (CNN) for performing energy load forecasting at the individual building level. The presented methodology uses convolutions on historical loads. The output from the convolutional operation is fed to fully connected layers together with other pertinent information. The methodology was implemented on a benchmark data set of electricity consumption for a single residential customer. Results obtained from the CNN were compared against results obtained by Long Short-Term Memory sequence-to-sequence (LSTM S2S), Factored Conditional Restricted Boltzmann Machines (FCRBM), "shallow" Artificial Neural Networks (ANN) and Support Vector Machines (SVM) for the same dataset. Experimental results showed that the CNN outperformed SVM while producing results comparable to the ANN and the deep learning methodologies.


2.1 Time series forecasting

A time series is a set of data points taken at specified times at equal intervals [6]. Time series analysis has a single variable, which we need to predict with respect to time. Time series data are often encountered when predicting stock prices, sales of a retail store, electricity demand, airline passengers or weather. Consider an observed time series t_1, t_2, t_3, ..., t_n for which we want to forecast the future value t_{n+h}, where h is the forecast horizon. The forecast of t_{n+h} made h steps ahead at time t_n is denoted t̂_n(h); the hat symbol distinguishes forecasted values from observed ones. A forecasting method is a technique for computing forecasts from present and past observations. A forecasting model is selected based on the given series of data. Forecasting methods and models are not the same and should not be used as equivalent terms. Judgmental forecasts, univariate methods and multivariate methods are the three types of forecasting methods [6].

Figure 2.1: Time series

The data set provided for this study follows a non-linear pattern. According to the literature study conducted, many papers state that ANN, LSTM and CNN are best suited to capturing the patterns in non-linear data, so the deep learning techniques LSTM, CNN and ANN (Artificial Neural Networks) are adopted for this study.


Chapter 3
Preliminaries

3.1 Forecasting

Forecasting is determining what is going to happen in the future by analyzing past and current patterns in the data; it helps business people plan for the uncertainty of what might and might not occur. Forecasting approaches are classified into two types:

1. Quantitative: this type of forecasting takes historical data, time series or correlation information and projects it into the future to estimate what we think is going to happen.

2. Qualitative: these are opinions taken from experts, decision makers and customers.

3.2 Time series

As discussed in Section 2.1, a time series is a sequence of observations s ∈ R, usually ordered in time. Time series are further classified into univariate and multivariate depending on the number of dependent variables recorded with respect to time.

3.2.1 Univariate

Univariate time series data have only a single variable recorded sequentially over equal intervals of time. The table below is a univariate time series stating the monthly sales of a product.

Figure 3.1: Univariate time series

3.2.2 Multivariate

Multivariate time series data have more than one time-dependent variable (multiple time series with dependent data handled simultaneously). This type of time series data is particularly challenging in the context of machine learning.


Figure 3.2: multivariate time series

3.2.3 Components of time series

The several factors which affect the values of observations in a time series are said to be the components of the time series. These are decomposed into four categories:

• Trend: in time series analysis, a trend is a movement to relatively higher or lower values over a long period of time. When the data exhibits a general upward direction (higher highs and higher lows) it is called an upward trend; a general downward direction (lower highs and lower lows) is called a downward trend. When there is no trend, it is called a horizontal trend.

• Seasonality: time series data that exhibits a repeating pattern at a fixed interval of time within a one-year period is said to show seasonality. It is a common pattern seen across many time series.

• Cyclic pattern: this exists when the data exhibits rises and falls that are not of a fixed period.

• Irregular fluctuations: these are the leftover residuals after removing trend and cyclic variations from a data set, and may or may not be random. These fluctuations are unpredictable and erratic in nature.

Figure 3.3: time series decomposition
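The additive decomposition pictured above can be sketched on a synthetic monthly series, where y_t = trend + seasonality + irregular. All numbers below are illustrative, not from the thesis data set; the trend is estimated with a simple 12-month moving average:

```python
import numpy as np

# Synthetic monthly series following the additive model y_t = trend + seasonal + irregular
rng = np.random.default_rng(0)
t = np.arange(48)
trend = 100 + 0.5 * t                         # slow upward trend
seasonal = 10 * np.sin(2 * np.pi * t / 12)    # repeating 12-month pattern
irregular = rng.normal(0, 1, size=t.size)     # unpredictable residual
y = trend + seasonal + irregular

# Estimate the trend with a 12-month moving average (smooths out seasonality)
kernel = np.ones(12) / 12
trend_est = np.convolve(y, kernel, mode="valid")   # length 48 - 12 + 1 = 37

# Subtracting the estimated trend exposes the seasonal + irregular components
detrended = y[6:6 + trend_est.size] - trend_est
```

The detrended values oscillate around zero with the 12-month seasonal shape, which is essentially what a decomposition plot such as Figure 3.3 displays panel by panel.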


3.3 Time series forecasting as a supervised problem

Most time series forecasting problems can be framed as supervised learning problems. Standard linear and nonlinear machine learning algorithms can then be applied by transforming the time series data into a supervised learning problem.

3.3.1 Supervised learning

In this type of learning, the machine learns under guidance: you have input variables represented as (X) and an output variable (Y), and the algorithms are used to learn the mapping between them.

3.3.2 Sliding window approach for time series data

Using prior time steps to predict the next time step is called the sliding window method; in statistics and time series analysis, the prior steps are also called lags. A time series can be shaped into a supervised problem by restructuring the data set so that previous time steps become the input variables (X) and the next time steps become the output variable (y). Suppose we have the time series in Table 1; we can transform it into a supervised learning problem by using the values of previous time steps to predict the values of the next time steps.

Figure 3.4: Time series data Figure 3.5: supervised problem
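The restructuring just described can be written as a short helper; this is a minimal sketch, and the window lengths and sales figures are arbitrary examples:

```python
import numpy as np

def sliding_window(series, n_in, n_out=1):
    """Turn a univariate series into supervised (X, y) pairs using lagged windows."""
    X, y = [], []
    for i in range(len(series) - n_in - n_out + 1):
        X.append(series[i:i + n_in])                  # previous n_in steps -> input
        y.append(series[i + n_in:i + n_in + n_out])   # next n_out steps -> output
    return np.array(X), np.array(y)

# Example: monthly sales series
sales = np.array([112, 118, 132, 129, 121, 135])
X, y = sliding_window(sales, n_in=3)
# X[0] = [112 118 132] is paired with y[0] = [129], and so on
```

Any standard regression algorithm can now be trained on (X, y), which is exactly the transformation illustrated by Figures 3.4 and 3.5.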

3.4 Artificial Neural Networks

Artificial neural networks (ANNs) are computing systems inspired by biological neural networks. Also simply called neural networks, they perform tasks by learning from examples without being programmed with a set of instructions. An ANN consists of a set of connected neurons organized in layers; the neurons send signals to each other through weighted connections. The network architecture is composed of input, output and hidden layers. The input layer holds the initial input vector of the data, which is processed further by subsequent layers of the network. Hidden layers are the layers between the input and output layers, whose neurons take in a set of weighted inputs and give an output through an activation function. The output layer gives the required outputs [39].


• Perceptron: the basic building block of a neural network, a linear classifier used for binary prediction. This type of network works only for linearly structured data.

Figure 3.6: single layer perceptron

• Multi-layer neural network:

It has a more advanced network architecture than the perceptron. Multi-layer networks are used to solve complex regression and classification tasks. Recurrent neural networks and convolutional neural networks are examples of multi-layer architectures [21].

Figure 3.7: Multi layer perceptron

3.4.1 Activation Functions

Calculations performed in a neuron are of two types: aggregations and activations. Aggregations are just the weighted sum, whereas activation functions define the output of the neuron for a given set of input data. The activation functions differ between architectures. ReLU, sigmoid and tan-h are the most widely used non-linear activation functions.


• Sigmoid activation function: these are mostly used in binary classification problems. The sigmoid is a kind of logistic function which maps the given inputs to probability outputs between 0 and 1 [25].

Figure 3.8: Sigmoid

sigmoid(x) = e^x / (1 + e^x)   (3.1)

• Tan-h activation function: the tan-h activation function is an alternative to the logistic sigmoid. It also follows a sigmoidal shape, but its output values are bounded in the range -1 to 1; highly negative inputs to the tan-h function map to negative outputs [25].

Figure 3.9: Tan-h

tanh(x) = 2 / (1 + e^{-2x}) - 1   (3.2)

• ReLU activation function: ReLU stands for rectified linear unit. It is widely used in convolutional networks (CNNs). All negative input values are mapped to 0, while positive values pass through unchanged [32].

Page 19: Demand Forecasting Of Outbound Logistics Using Machine ...

Chapter3. Preliminaries 12

Figure 3.10: ReLU

ReLU(x) = max(0, x)   (3.3)
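The three activation functions above can be written directly in NumPy; this is a plain sketch of equations (3.1)-(3.3) (note that in practice numerically stabilized variants are preferred for large inputs):

```python
import numpy as np

def sigmoid(x):
    """Equation (3.1): maps any real input into (0, 1)."""
    return np.exp(x) / (1.0 + np.exp(x))

def tanh(x):
    """Equation (3.2): sigmoidal shape bounded in (-1, 1)."""
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0

def relu(x):
    """Equation (3.3): negative inputs become 0, positive inputs pass through."""
    return np.maximum(0.0, x)
```

For example, sigmoid(0) = 0.5, tanh agrees with NumPy's built-in np.tanh, and relu zeroes out any negative argument.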

3.4.2 Recurrent Neural Networks

Recurrent neural networks (RNNs) work on the principle of saving the output of a layer and feeding it back to the input in order to predict the output of the layer. RNNs are mostly used for sequential data. The formulation of an RNN is obtained by abstracting the general concepts and common properties of feed-forward neural networks. These networks are widely used in speech recognition, sentiment classification and time series prediction [16].

Figure 3.11: Recurrent and feed-forward network structures

Consider an input sequence x = (x_1, ..., x_T). A standard recurrent neural network computes the hidden vector sequence h = (h_1, ..., h_T) and the output vector sequence y = (y_1, ..., y_T) by iterating the following equations from t = 1 to T:

h_t = H(W_xh x_t + W_hh h_{t-1} + b_h)   (3.4)

y_t = W_hy h_t + b_y   (3.5)

where W denotes the weight matrices (W_xh is the input-hidden weight matrix), H is the hidden layer activation function, and b_h denotes the hidden bias vector.
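Iterating equations (3.4) and (3.5) gives the RNN forward pass; the sketch below uses tanh as the hidden function H, and all dimensions and weights are illustrative random values:

```python
import numpy as np

def rnn_forward(x_seq, W_xh, W_hh, W_hy, b_h, b_y):
    """Iterate h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h) and y_t = W_hy h_t + b_y."""
    h = np.zeros(W_hh.shape[0])          # h_0 initialized to zeros
    outputs = []
    for x_t in x_seq:                    # t = 1 .. T
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        outputs.append(W_hy @ h + b_y)
    return np.array(outputs)

rng = np.random.default_rng(1)
T, n_in, n_hid, n_out = 5, 3, 4, 1
x_seq = rng.normal(size=(T, n_in))
y_seq = rnn_forward(x_seq,
                    W_xh=rng.normal(size=(n_hid, n_in)),
                    W_hh=rng.normal(size=(n_hid, n_hid)),
                    W_hy=rng.normal(size=(n_out, n_hid)),
                    b_h=np.zeros(n_hid), b_y=np.zeros(n_out))
# y_seq holds one output vector per time step
```

The key difference from a feed-forward pass is that h is carried from step to step, which is exactly the feedback loop in Figure 3.11.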


3.4.3 LSTM

LSTMs (long short-term memory networks) are an evolved version of the recurrent neural network. During backpropagation, recurrent neural networks suffer from the vanishing gradient problem. Gradients are the values used to update the weights of a neural network; the vanishing gradient problem occurs when a gradient shrinks as it is backpropagated through time. If a gradient value becomes extremely small, it does not contribute much to learning, so in recurrent neural networks the layers that get a small gradient update, mainly the early layers, stop learning. Because these layers are not learning, the network can forget what it has seen in longer sequences, i.e. it has only short-term memory. LSTMs were created as a solution to this short-term memory: they have internal mechanisms called gates that regulate the flow of information. These gates learn which data in a sequence is important to keep or throw away, and by doing so the network learns to use the relevant information to make predictions. LSTMs are mostly used in speech recognition, text generation and time series [19].

Figure 3.12: LSTM Architecture

A common LSTM architecture is composed of a cell state and three regulators, usually called gates. The cell state acts as a highway that transfers relative information all the way down the sequence chain; think of it as the memory of the network, because the cell state can carry information from earlier time steps all the way to the last time step, thus reducing the effects of short-term memory. The gates are just different neural networks that decide what is allowed on the cell state; during training they learn what information is relevant to keep or forget. The gates contain sigmoid activations, which squish values between 0 and 1 rather than between -1 and 1. The forget gate decides what information should be kept or thrown away: information from the previous hidden state and from the current input is passed through the sigmoid function, producing values between 0 and 1, where closer to 0 means forget and closer to 1 means keep. To update the cell state we have the input gate, and the output gate sends the aggregated values through an activation function.


i_t = σ(W_xi x_t + W_hi h_{t−1} + W_ci c_{t−1} + b_i)   (3.6)

f_t = σ(W_xf x_t + W_hf h_{t−1} + W_cf c_{t−1} + b_f)   (3.7)

c_t = f_t c_{t−1} + i_t R(W_xc x_t + W_hc h_{t−1} + b_c)   (3.8)

o_t = σ(W_xo x_t + W_ho h_{t−1} + W_co c_t + b_o)   (3.9)

h_t = o_t R(c_t)   (3.10)

where σ is the logistic sigmoid function, f is the forget gate, i is the input gate, o is the output gate, c is the cell activation vector, and R is the ReLU activation function.
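A minimal NumPy sketch of one LSTM time step following Eqs. (3.6)–(3.10) as written in this chapter, i.e. with peephole connections (W_ci, W_cf, W_co) and ReLU as the cell activation R. The toy dimensions, random weights and variable names are illustrative only, not the trained network used in the experiment.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step per Eqs. (3.6)-(3.10): peephole connections
    and ReLU as the cell activation function R."""
    i = sigmoid(p["Wxi"] @ x_t + p["Whi"] @ h_prev + p["Wci"] * c_prev + p["bi"])
    f = sigmoid(p["Wxf"] @ x_t + p["Whf"] @ h_prev + p["Wcf"] * c_prev + p["bf"])
    c = f * c_prev + i * relu(p["Wxc"] @ x_t + p["Whc"] @ h_prev + p["bc"])
    o = sigmoid(p["Wxo"] @ x_t + p["Who"] @ h_prev + p["Wco"] * c + p["bo"])
    h = o * relu(c)
    return h, c

# Toy dimensions: 3 input features, 4 hidden units, random weights.
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
p = {}
for g in "ifco":
    p[f"Wx{g}"] = rng.standard_normal((n_hid, n_in)) * 0.1
    p[f"Wh{g}"] = rng.standard_normal((n_hid, n_hid)) * 0.1
    p[f"b{g}"] = np.zeros(n_hid)
for g in "ifo":
    p[f"Wc{g}"] = rng.standard_normal(n_hid) * 0.1  # diagonal peepholes

h, c = np.zeros(n_hid), np.zeros(n_hid)
for t in range(5):                      # run a short toy sequence
    h, c = lstm_step(rng.standard_normal(n_in), h, c, p)
print(h.shape, c.shape)
```

Note that because the gates are sigmoids in (0, 1) and R is ReLU, the hidden state h is always non-negative in this formulation.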

3.4.4 CNN

CNNs are a special type of neural network used primarily for processing data witha grid topology[30]. For example, images can be viewed as 2D grids and time seriesdata such as energy consumption data, can be viewed as 1D grids. CNNs were usedeffectively for computer vision activities such as the classification of images [29],[15].Inat least one of the layers in the network, CNNs use a specific linear operation calledconvolution. Convolution is described as a two-function procedure based on valuedarguments [15]. The convolution operation is denoted with an asterisk

S = (X ∗ W)   (3.11)

where X denotes the input function and W denotes the weighting function. Within CNNs the weighting function is called a "kernel", and the output of the convolution operation, S, is often referred to as a "feature map".

Usually, the convolution operation is applied to inputs that are multidimensional arrays. In addition, the kernel is also a multidimensional weight array that changes as the algorithm learns through the iterations. Therefore, with multidimensional inputs and kernels, the convolution procedure is applied over more than one dimension. The two-dimensional convolution operation can be expressed as:

S(i, j) = (I ∗ K)(i, j) = Σ_l Σ_m I(l, m) K(i + l, j + m)   (3.12)

where I represents a two-dimensional input, K represents a two-dimensional kernel, and S is the resulting feature map after the convolution.
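A small sketch of the "valid" two-dimensional operation of Eq. (3.12) in NumPy (written as cross-correlation, with the kernel slid over the input; the input values and kernel are toy examples):

```python
import numpy as np

def conv2d(I, K):
    """Valid 2D cross-correlation: each output S(i, j) is the sum of
    the elementwise product of K with the patch of I at offset (i, j)."""
    kh, kw = K.shape
    ih, iw = I.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(I[i:i + kh, j:j + kw] * K)
    return out

I = np.arange(16.0).reshape(4, 4)        # toy 4x4 input grid
K = np.array([[1.0, 0.0], [0.0, -1.0]])  # toy 2x2 kernel
S = conv2d(I, K)
print(S.shape)   # a 4x4 input and a 2x2 kernel give a 3x3 feature map
```

With this kernel each feature-map entry is I(i, j) − I(i+1, j+1), so on the `arange` input every entry equals −5; the point is only to show how the feature-map shape follows from the input and kernel shapes.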

3.5 ARIMA

ARIMA is a widely used model for forecasting time series, proposed in the 1970s by Box and Jenkins [4] on the basis of the autoregressive (AR), moving average (MA) and autoregressive moving average (ARMA) models. ARIMA stands for autoregressive integrated moving average: it is the combination of both AR and MA bound together with


integration. AR captures the correlation between previous time periods and the current one, and MA refers to the residual error, a linear combination of error terms whose values occurred at various timestamps in the past. ARIMA is widely used globally, for example in forecasting inflation rates, unemployment rates, product demand, mortgage interest rates, and silver or gold prices. The ARIMA model has three parameters p, d and q: p refers to the number of autoregressive lags, q stands for the order of the moving average, and d is the order of differencing needed to make the data stationary. To choose the value of p we use the PACF (partial autocorrelation) plot, and to choose the value of q we use the ACF (autocorrelation) plot [28]. When the series becomes stationary after differencing, the ARIMA formula is stated as follows:

Y_t = φ_1 Y_{t−1} + φ_2 Y_{t−2} + ... + φ_p Y_{t−p} + e_t − θ_1 e_{t−1} − θ_2 e_{t−2} − ... − θ_q e_{t−q}   (3.13)

where p is the order of the autoregressive model, q is the order of the moving average model, e is the white-noise sequence, φ and θ are the model parameters, and Y_t is the value of an observation at time t [8].
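The recursion of Eq. (3.13) can be sketched as a one-step-ahead forecast in plain Python, taking the future shock e_t as 0 (its expected value). The coefficient values and series below are illustrative toys; in practice a library such as statsmodels would fit φ and θ from the data.

```python
def arma_forecast(y, e, phi, theta):
    """One-step forecast from Eq. (3.13):
    Y_t = phi_1*Y_{t-1} + ... + phi_p*Y_{t-p}
          + e_t - theta_1*e_{t-1} - ... - theta_q*e_{t-q},
    with the unknown future shock e_t set to 0."""
    p, q = len(phi), len(theta)
    ar = sum(phi[i] * y[-(i + 1)] for i in range(p))   # AR part
    ma = -sum(theta[j] * e[-(j + 1)] for j in range(q))  # MA part
    return ar + ma

y = [10.0, 12.0, 11.0, 13.0]      # observed series, most recent last
e = [0.5, -0.3, 0.2]              # past residuals, most recent last
forecast = arma_forecast(y, e, phi=[0.6, 0.2], theta=[0.4])
print(forecast)   # 0.6*13 + 0.2*11 - 0.4*0.2 = 9.92
```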

3.6 SVR

SVR (support vector regression) is mainly used for continuous data and supports both linear and nonlinear regression. It solves a quadratic programming problem by mapping the features to a high-dimensional feature space and building a hyperplane in that space as the decision function of the original space. Support vector regression is the application of SVM to regression.

Consider a training data set D = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, y_i ∈ R, mapped to the high-dimensional feature space by the nonlinear mapping β(x), and establish a regression function:

f(x, ω) = ω · β(x) + b   (3.14)

SVR is not used to separate samples of different classes as far apart as possible; rather, it keeps the samples within a reasonable deviation range, i.e., the loss is measured only when the deviation is larger than the tolerance ε.


Figure 3.13: support vector regressor

In the figure above, no loss is calculated when the small black dots fall within the allowable deviation range; for the circles, whose distance from f(x) exceeds the permissible deviation ε, the distance between f(x) and the value y is counted towards the total loss. SVR can be formulated as:

min_{ω,b}  (1/2) ‖ω‖² + C Σ_{i=1}^{n} l_ε(f(x_i) − y_i)   (3.15)
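The ε-insensitive loss l_ε appearing in Eq. (3.15) can be written in a few lines of NumPy: zero inside the tolerance tube, and linear in the excess deviation outside it. The residuals below are toy values.

```python
import numpy as np

def eps_insensitive_loss(residuals, eps):
    """l_eps from Eq. (3.15): zero when |residual| <= eps,
    otherwise |residual| - eps."""
    return np.maximum(np.abs(residuals) - eps, 0.0)

r = np.array([0.05, -0.08, 0.30, -0.50])   # toy residuals f(x_i) - y_i
loss = eps_insensitive_loss(r, eps=0.1)
print(loss)   # the first two fall inside the tube and cost nothing
```

In scikit-learn's `sklearn.svm.SVR` this tolerance is exposed as the `epsilon` parameter and the trade-off weight as `C`.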

3.7 Multiple parallel input and multi-step output

Predicting multiple time steps into the future is called multi-step forecasting. Parallel time series require the prediction of multiple time steps of each time series. The strategy is illustrated with an example below [36][3].

Figure 3.14: multivariate time series


In the multivariate time series pictured above, we use the last two time steps from each of the five time series as input to the model and predict the next time steps of each of the five time series as output. The corresponding first sample of the data set would be:

Figure 3.15: Transformation of input and output from the above series
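The transformation above can be sketched as a sliding-window function that turns a (time, series) array into supervised (X, y) samples, in the multiple-parallel-input, multi-step-output sense. The array values are toy numbers; the function name is illustrative, not from the thesis code.

```python
import numpy as np

def split_sequences(data, n_in, n_out):
    """Each X is n_in consecutive time steps of all parallel series;
    each y is the following n_out time steps of all series."""
    X, y = [], []
    for i in range(len(data) - n_in - n_out + 1):
        X.append(data[i:i + n_in])
        y.append(data[i + n_in:i + n_in + n_out])
    return np.array(X), np.array(y)

# Five parallel series over nine time steps (toy values).
series = np.arange(45).reshape(9, 5)
X, y = split_sequences(series, n_in=2, n_out=1)
print(X.shape, y.shape)   # (samples, time steps, series) for X
```

The first sample pairs rows 0–1 of all five series as input with row 2 as output, matching the figure.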


Chapter 4: Method

This study is focused on two research methods addressing two research questions.

A literature search was initiated to find an answer to RQ1. The reason for selecting a literature review as the research method was to study different forecasting methods and identify a suitable forecasting method that performs well on time series data. This research is favoured by the availability of past data, so we can assume that patterns that appeared in the past will continue in the future; with this in mind, the scope of the literature search was reduced to time series forecasting techniques. A related study was conducted to better understand how these forecasting techniques work and their importance in forecasting demand.

Articles were searched using the search strings "machine learning", "forecasting", "time series forecasting" and "demand forecasting in logistics", and the selected articles were then refined using:

Inclusion criteria

• Articles written in English and published between the years 2005 and 2019.

• Articles using forecasting and machine learning approaches in the supply chain, especially logistics.

• Articles which are published in journals, books and magazines.

Exclusion criteria

• Articles that are not in the fields of computer science or supply chain, or that are not written in English.

An experiment was performed to answer RQ2 and to evaluate the performance of the identified machine learning algorithms for demand forecasting. The models were trained using a sliding window and a multi-input multi-output strategy. The organization is interested in identifying the demand for the coming twelve months (long term) in order to plan their transportation capacity, so forecasting for a rolling year was performed with the eight and a half years of past data available: six and a half years of data were used for training and two years of data were used for testing and validation.

Dependent Variable: Root mean squared error


Independent Variables: ANN,LSTM and CNN

Based on the literature study, RMSE is a widely used performance metric compared to other regression metrics. Because the errors are squared before being averaged, significant errors are assigned a relatively high weight by the RMSE, which makes it more useful when large errors are especially undesirable. RMSE does not necessarily increase with the variance of the errors; rather, it increases with the variance of the frequency distribution of error magnitudes [10]. Therefore, in this study RMSE is adopted to measure the performance of the models.

The proposed model forecasts the demand of outbound distributed products. The selected models are among the most recent and well-known forecasting techniques.

The research methodology is broken down into the following phases:

4.1 Data gathering

Collecting data from an organization is not easy: it consists of several business units with many stakeholders involved. Historical data on distributed products is stored in one database, while the most recent year of data is stored in an operational data store. I managed to pull the past fifteen years of historical data on distributed trucks. The initial data collected directly from the databases contains more than 300 columns, each representing some information related to the distributed products. As not all of these columns are relevant to the problem, the data was refined by keeping only the required features. The final data set contains the distributed products' start date, end date, end location, monthly sold products per end location, the total demand for that particular month, and body builders. Sometimes products are not delivered directly to the end location but pass through other countries for body building; these are called body builders.

4.2 Data pre-processing

The raw data set contains many duplicate records, and the delivery dates are unorganized. I removed all the duplicates at the start of the data analysis, then grouped the data by month, as the organization wanted a monthly forecast, and also by ordering country and destination country. Ordering countries are where the actual orders came from, and destination countries are where the final products are delivered. The data set contains many missing values and null values: missing values were filled with the average value and null values were dropped. Of the fifteen years of data collected, only eight years show some correlation in the data; the rest has many missing values, many change-overs, and is not continuous. Therefore only eight years of past data, plus two months from 2019, were considered for the research.
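The pre-processing steps described above (deduplication, mean imputation, monthly grouping by ordering and destination country) can be sketched with pandas. The table, column names and values below are hypothetical, not the organization's real schema.

```python
import numpy as np
import pandas as pd

# Hypothetical raw deliveries table (illustrative column names only).
raw = pd.DataFrame({
    "delivery_date": pd.to_datetime(
        ["2018-01-03", "2018-01-03", "2018-01-20", "2018-02-11", "2018-02-25"]),
    "ordering_country": ["Sweden"] * 5,
    "destination_country": ["Germany"] * 5,
    "volume": [10.0, 10.0, np.nan, 7.0, 5.0],
})

clean = raw.drop_duplicates().copy()                 # remove exact duplicate rows
clean["volume"] = clean["volume"].fillna(clean["volume"].mean())  # mean imputation
monthly = (clean
           .groupby([clean["delivery_date"].dt.to_period("M"),
                     "ordering_country", "destination_country"])["volume"]
           .sum())                                   # monthly totals per route
print(monthly)
```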


4.3 Data set

As discussed in earlier sections, the data set was transformed into a time series supervised learning problem, where each month is set as an index and the remaining variables of distributed products are columns, with the total monthly demand as a feature. The final data set has 98 months of data with 14 variables and 1 feature, where these variables are not dependent on each other except for time and total demand. 75 percent of the data is used for training, 10 percent for validation and 15 percent for testing. As discussed in the preliminaries, a multiple parallel input and multi-step output strategy was used for training.

Figure 4.1: Data set

The values in the data set are not real values, as per organization rules the data set cannot be shared externally. In fig. 4.1 the data is recorded month-wise; in the second column the prefix (Sweden) is the ordering country and the suffix (Germany) is the destination country, and the values are monthly product deliveries.
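Because order matters in time series, the 75/10/15 split above must be chronological rather than shuffled. A minimal sketch of such a split over the 98-month index (the function name is illustrative):

```python
import numpy as np

def time_split(n, train=0.75, val=0.10):
    """Chronological index split: the earliest 75% for training,
    the next 10% for validation, the remaining 15% for testing."""
    i_train = int(n * train)
    i_val = i_train + int(n * val)
    return np.arange(0, i_train), np.arange(i_train, i_val), np.arange(i_val, n)

tr, va, te = time_split(98)      # the final data set has 98 months
print(len(tr), len(va), len(te))
```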

4.4 Experiment setup

The experiment focuses on evaluating the selected machine learning models by comparing their performance using the selected metrics. Models were trained using the multiple parallel input and multi-step output strategy, with controlled guarding during the training process. The batch size is the same for all models, and overfitting and underfitting were handled using callbacks, early stopping and checkpoint functions. Model performance was precisely monitored during training.

The experiment was carried out on a DSVM (Data Science Virtual Machine), a Microsoft Azure virtual machine image that comes preinstalled, configured and tested with the various tools required for data science and data analysis.

The experiment used Keras, an open-source library written in Python. It runs on top of TensorFlow, which handles the numerical operations and provides a large set of machine learning building blocks. MinMaxScaler from scikit-learn is used for normalizing


the values between 0 and 1. Pandas is used for data analysis and NumPy for mathematical operations. Matplotlib and seaborn are used for plotting graphs; the Python Anaconda environment and Jupyter notebooks were used. After scaling, the data is given as input in the form of samples, time steps and features, as explained in the previous sections. Forecasted values are rescaled using the inverse scaling function from scikit-learn.

4.5 Performance metrics

There are many evaluation metrics, but this study considered mean squared error, mean absolute error and root mean squared error.

Mean absolute error: the mean of the absolute differences between the actual and predicted values. It does not take the direction of the error (positive or negative) into account, as all the differences are made positive.

MAE = (1/n) Σ_{i=1}^{n} |Y_i − Ŷ_i|   (4.1)

Root mean squared error: this measures the deviation from the actual values. The differences between individual predictions and actual values are squared, summed, divided by the number of samples, and finally square-rooted. The lower the RMSE value, the better the prediction.

RMSE = √( (1/n) Σ_{i=1}^{n} (Y_predicted − Y_actual)² )   (4.2)

Mean squared error: the average of the squared differences between observed and predicted values.

MSE = (1/n) Σ_{i=1}^{n} (observed_i − predicted_i)²   (4.3)
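The three metrics of Eqs. (4.1)–(4.3) can be implemented directly in NumPy; the actual and predicted vectors below are toy values used only to exercise the formulas.

```python
import numpy as np

def mae(y, yhat):
    return np.mean(np.abs(y - yhat))    # Eq. (4.1)

def mse(y, yhat):
    return np.mean((y - yhat) ** 2)     # Eq. (4.3)

def rmse(y, yhat):
    return np.sqrt(mse(y, yhat))        # Eq. (4.2)

y = np.array([100.0, 110.0, 90.0])      # toy actual volumes
yhat = np.array([98.0, 115.0, 92.0])    # toy forecasts
print(mae(y, yhat), mse(y, yhat), rmse(y, yhat))
```

Note how the one large error (5) dominates MSE and RMSE far more than MAE, which is the property that motivates using RMSE in this study.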

4.6 Walk-forward validation

Walk-forward validation is used to determine the best parameters. It optimizes within the sample data over a time window of the series, while the remainder of the data is reserved for out-of-sample testing; a small portion of the reserved data following the sample data is tested and the results recorded. The in-sample time window is then shifted forward by the length of the out-of-sample test and the process is repeated. At the end, all of the recorded results are used to assess the strategy, obtain suitable model parameters, and run those finalized parameters on another segment of the data.

This study adopted this validation technique: first the model is trained on the first month and validated on the next; then the first and second months are trained on and tested with the third month, and so on until the final phase. The model was fitted only on the training period and its performance assessed on the validation period, before rerunning the model on the entire series.
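The expanding-window procedure above can be sketched as a generic loop that refits a model on all data seen so far and scores the one-step-ahead forecast. The "model" here is a deliberately trivial stand-in (the mean of the history); the function names and series values are illustrative.

```python
import numpy as np

def walk_forward_rmse(series, fit, predict, start=3):
    """Expanding-window walk-forward validation: train on steps
    [0..t), forecast step t, record the squared error, slide forward."""
    errors = []
    for t in range(start, len(series)):
        model = fit(series[:t])            # refit on all history so far
        err = predict(model, 1) - series[t]
        errors.append(err ** 2)
    return float(np.sqrt(np.mean(errors)))

# Trivial toy strategy: the "model" is just the mean of the history.
fit = lambda hist: float(np.mean(hist))
predict = lambda model, steps: model
series = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 14.0])
score = walk_forward_rmse(series, fit, predict)
print(score)
```

In the thesis the `fit`/`predict` pair would be the neural network training and forecasting steps rather than the mean.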


Figure 4.2: Walk forward validation


Chapter 5: Results

The experiment was closely supervised, and the performances of both the LSTM and CNN models were calculated using the MSE, RMSE and MAE errors. The learning graphs, the main aim of the experiment, the forecasted values and the training process are recorded and explained in the following phases of the experiment.

5.1 Learning curve

The main objective during training is to minimize the loss between the actual output and the predicted output on the given training data. Training starts with arbitrarily set weights, and the weights are then updated incrementally as we move closer and closer to the minimum of the loss; the size of the steps taken towards that minimum depends on the learning rate. After testing and tuning the parameters, a learning rate of 0.001 was set to obtain the optimum loss, using the Adam optimizer, a variant of stochastic gradient descent (SGD). Models were trained on the history as mentioned in the sections above, and learning curves were used to check whether the selected algorithms are working correctly, whether they face a bias or variance problem, and how well they learn. The early stopping and model checkpoint functions were imported from Keras to monitor the learning, making sure the training proceeds smoothly without overfitting or underfitting.
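The early-stopping behaviour relied on above can be illustrated in pure Python: stop when the validation loss has not improved for `patience` epochs and keep the best epoch seen, which is essentially what Keras's `EarlyStopping` (with `restore_best_weights`) does. The loss values below are invented for illustration.

```python
def early_stop_epoch(val_losses, patience=3):
    """Return (best_epoch, best_loss), scanning losses epoch by epoch
    and stopping once `patience` epochs pass without improvement."""
    best, best_epoch, waited = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0   # improvement: reset
        else:
            waited += 1
            if waited >= patience:                      # no progress: stop
                break
    return best_epoch, best

losses = [0.9, 0.7, 0.6, 0.62, 0.61, 0.63, 0.64]
print(early_stop_epoch(losses))
```

Here training halts after three non-improving epochs, and the weights from epoch 2 (loss 0.6) would be restored.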

Figure 5.1: LSTM training graph Figure 5.2: CNN training graph


5.2 Forecasts

Figure 5.3: Actual vs forecast using CNN

Figure 5.4: Actual vs forecast using CNN

The graphs plot forecasts vs. actual values, with time on the x-axis and demand volumes on the y-axis, showing the history for 82 months (about 7 years) and the forecasted demand for 12 months (one year). As described earlier, the organization has a total of 121 distribution locations, and picturing all of them is not necessary to understand the behaviour of the models, so I selected two different destination locations where the volume is highly distributed.

5.3 Forecasting performance

In order to provide a benchmark, the volume forecasting process was also carried out using a standard feed-forward "shallow" ANN and an LSTM. Historical volume data for the 12 previous time steps, the same data sent to the CNN, were used as inputs to the ANN and LSTM. All three algorithms were implemented with cross


validation. The forecasting performance of the three algorithms was measured using RMSE; the workings of the performance metrics are discussed in the preliminaries section.

Figure 5.5: Models performances

From the figure above we can see that CNN outperforms both LSTM and ANN; in turn, ANN produces better results than LSTM.

5.4 Validity threats

Internal validity refers to how well a research study has been performed [27]. Most of the threats to validity recognized during the study were either eliminated or restricted through careful selection of the models, the long-term forecasting strategy and the performance measures, but it is not certain that all threats to validity were eliminated. All the models faced the same internal validity threats during training. Missing variables in the data set that have a strong dependency, and that ultimately affect the models' performance, are an external validity threat. Identifying these variables would be one way to control this threat, but unfortunately not all of them can be identified, which makes it a valid external threat. The size and granularity of the data set can also be considered an external validity threat, because weekly or daily data would provide more samples than data grouped by month.


Chapter 6: Analysis and Discussion

Based on the literature review, LSTM (long short-term memory), convolutional neural networks, artificial neural networks and ARIMA are the most widely used models for forecasting time series. ARIMA is better for forecasting short-term periods and is suitable for univariate time series, whereas this problem concerns forecasting multivariate data over a long horizon. At the initial stages of the experiment I tried ARIMA, vector autoregression with exogenous variables and some other machine learning techniques, but they were not able to produce optimum forecasts; the ANN, LSTM and CNN techniques, in contrast, produced promising results.

6.1 Implementation

As mentioned, the CNN-based demand forecasting algorithm was implemented on 98 months of distributed-products data. The presented methodology was implemented to perform a forecast for the rolling year using a multi-input and multi-output strategy; therefore, 12 time steps were fed as input into the convolution layers of the CNN. As the inputs were time series data, they needed to be in the form of a 1D grid, i.e. restructured as a 1-dimensional input; accordingly, the kernels used in the convolution layers are defined as 1-dimensional kernels. In the implemented CNN, each of the convolution layers was designed to have the three phases mentioned above: 1) a convolution phase, 2) a non-linear transformation, and 3) a max pooling phase. As mentioned, the convolution phase for the three layers was performed with 1D kernels. The rectified linear unit (ReLU) function was used for the non-linear transformation in all the convolution layers, and max pooling was performed as the pooling phase for all of them. The output produced by the convolution layers is forwarded to the fully connected (hidden) layers. In this experiment, one hidden layer with 40 neurons was used, with ReLU as its activation function. Since there were 12 time series outputs, the output layer contained 12 neurons for the 12 time steps of output, as explained in the multi-input and multi-output strategy; the output layer used a linear activation function. Various CNN architectures were tested with distinct convolution layers, pooling filter sizes and kernel sizes. Training was done using the ADAM [26] algorithm as the gradient-based optimizer, and the same training and testing data were used while tuning the network with different hidden layers, pool sizes and neuron counts. The outbound demand forecasting process was also performed using a standard feed-forward ANN and an LSTM to benchmark the results obtained from the convolutional neural network.
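The three-phase convolution layer described above (convolution, ReLU, max pooling over a 1D grid) can be sketched as a single NumPy forward pass. The input series, kernel values and pool size are toy choices; in the experiment these weights are learned by Keras rather than fixed.

```python
import numpy as np

def conv1d(x, k):
    """Valid 1D convolution (cross-correlation) over a time series."""
    n = len(x) - len(k) + 1
    return np.array([np.dot(x[i:i + len(k)], k) for i in range(n)])

def max_pool(x, size=2):
    """Non-overlapping max pooling, dropping any trailing remainder."""
    trim = len(x) - len(x) % size
    return x[:trim].reshape(-1, size).max(axis=1)

# One convolution layer's three phases: convolution, ReLU, max pooling.
x = np.array([3.0, 5.0, 4.0, 6.0, 7.0, 5.0, 8.0, 6.0, 9.0, 7.0, 10.0, 8.0])
kernel = np.array([0.5, -0.5])                     # toy 1D kernel
feature_map = np.maximum(conv1d(x, kernel), 0.0)   # ReLU non-linearity
pooled = max_pool(feature_map, 2)
print(feature_map.shape, pooled.shape)
```

The pooled output would then be flattened and passed to the 40-neuron fully connected layer and the 12-neuron linear output layer.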


6.2 Discussion

• RQ 1: What are the available state-of-the-art methods used in forecasting?

Answer: Based on the results obtained from the literature study, three machine learning models, long short-term memory (LSTM), artificial neural network (ANN) and convolutional neural network (CNN), were chosen for forecasting the demand of outbound distributed trucks.

• RQ 2: Which machine learning model performs better forecasting on time series data?

Answer: The convolutional neural network is the most suitable machine learning algorithm for forecasting the demand of outbound products. In this experiment CNN achieved a performance of 0.694 compared with the other two algorithms, LSTM with 0.654 and ANN with 0.678. After performing the out-of-sample test its performance increased to 0.743, which is quite promising. The models' performance is discussed in section 5.3.


Chapter 7: Conclusions and Future Work

This paper focused on long-term forecasting of the demand for outbound distributed products. No relevant influencing factors other than total demand were identified, so a multiple regression approach was not considered. Most of the literature on time series forecasting suggests that neural networks and deep neural networks are better suited to long-term forecasting than traditional moving-average methods. Models were selected based on their feasibility and applicability, and the forecasting performance was compared with two selected nonlinear time series benchmarking methods: LSTM and artificial neural networks. It was found that CNN, together with the data pre-processing measures, exhibits the better performance in out-of-sample testing. As forecasting precision is very influential for planning capacity and reducing costs for logistics companies, CNN is considered the best candidate prediction approach for this case.

The experiment conducted in this research used monthly data. Weekly demand data might produce better results, and this research did not consider any external factors; future research could use weekly data and consider factors influencing demand to obtain more accurate results.


References

[1] Luis Aburto and Richard Weber. Improved supply chain management based onhybrid demand forecasts. Applied Soft Computing, 7(1):136–144, 2007.

[2] Kasun Amarasinghe, Daniel L Marino, and Milos Manic. Deep neural networksfor energy load forecasting. In 2017 IEEE 26th International Symposium onIndustrial Electronics (ISIE), pages 1483–1488. IEEE, 2017.

[3] Gianluca Bontempi. Long term time series prediction with multi-input multi-output local learning. Proc. 2nd ESTSP, pages 145–154, 2008.

[4] George EP Box, Gwilym M Jenkins, Gregory C Reinsel, and Greta M Ljung.Time series analysis: forecasting and control. John Wiley & Sons, 2015.

[5] Real Carbonneau, Kevin Laframboise, and Rustam Vahidov. Application ofmachine learning techniques for supply chain demand forecasting. EuropeanJournal of Operational Research, 184(3):1140–1154, 2008.

[6] Chris Chatfield. Time-series forecasting. Chapman and Hall/CRC, 2000.

[7] Pei-you Chen and Lu Liu. Study on coal logistics demand forecast based onpso-svr. In 2013 10th international conference on service systems and servicemanagement, pages 130–133. IEEE, 2013.

[8] Paulo Cortez, Miguel Rocha, and José Neves. Evolving time series forecastingarma models. Journal of Heuristics, 10(4):415–429, 2004.

[9] Jingyi Du, Qingli Liu, Kang Chen, and Jiacheng Wang. Forecasting stock pricesin two ways based on lstm neural network. In 2019 IEEE 3rd Information Tech-nology, Networking, Electronic and Automation Control Conference (ITNEC),pages 1083–1086. IEEE, 2019.

[10] Gabriel Fernandez. Deep Learning Approaches for Network Intrusion Detection.PhD thesis, The University of Texas at San Antonio, 2019.

[11] Fernando Turrado García, Luis Javier García Villalba, and Javier Portela. Intel-ligent system for time series classification using support vector machines appliedto supply-chain. Expert Systems with Applications, 39(12):10590–10599, 2012.

[12] Manas Gaur, Shruti Goel, and Eshaan Jain. Comparison between nearest neigh-bours and bayesian network for demand forecasting in supply chain manage-ment. In 2015 2nd International Conference on Computing for SustainableGlobal Development (INDIACom), pages 1433–1436. IEEE, 2015.


[13] Felix A Gers, Douglas Eck, and Jürgen Schmidhuber. Applying lstm to time se-ries predictable through time-window approaches. In Neural Nets WIRN Vietri-01, pages 193–200. Springer, 2002.

[14] Kenneth Gilbert. An arima supply chain model. Management Science,51(2):305–310, 2005.

[15] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning. MIT press,2016.

[16] Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton. Speech recognitionwith deep recurrent neural networks. In 2013 IEEE international conference onacoustics, speech and signal processing, pages 6645–6649. IEEE, 2013.

[17] Hu Guosheng and Zhang Guohong. Comparison on neural networks and supportvector machines in suppliers’ selection. Journal of Systems Engineering andElectronics, 19(2):316–320, 2008.

[18] John E Hanke, Arthur G Reitsch, and Dean W Wichern. Business forecasting,volume 9. Prentice Hall Upper Saddle River, NJ, 2001.

[19] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neuralcomputation, 9(8):1735–1780, 1997.

[20] Wei-Chiang Hong, Yucheng Dong, Li-Yueh Chen, and Shih-Yung Wei. Svr withhybrid chaotic genetic algorithms for tourism demand forecasting. Applied SoftComputing, 11(2):1881–1890, 2011.

[21] Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforwardnetworks are universal approximators. Neural networks, 2(5):359–366, 1989.

[22] Rob J Hyndman and George Athanasopoulos. Seasonal arima models. Forecast-ing: principles and practice, 2015.

[23] Joarder Kamruzzaman and Ruhul A Sarker. Forecasting of currency exchangerates using ann: A case study. In International Conference on Neural Networksand Signal Processing, 2003. Proceedings of the 2003, volume 1, pages 793–797.IEEE, 2003.

[24] Karin Kandananond. Consumer product demand forecasting based on artifi-cial neural network and support vector machine. World Academy of Science,Engineering and Technology, 63:372–375, 2012.

[25] Bekir Karlik and A Vehbi Olgac. Performance analysis of various activationfunctions in generalized mlp architectures of neural networks. InternationalJournal of Artificial Intelligence and Expert Systems, 1(4):111–122, 2011.

[26] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimiza-tion. arXiv preprint arXiv:1412.6980, 2014.

[27] Barbara Ann Kitchenham, David Budgen, and Pearl Brereton. Evidence-basedsoftware engineering and systematic reviews, volume 4. CRC press, 2015.


[28] Yann-Aël Le Borgne, Silvia Santini, and Gianluca Bontempi. Adaptive modelselection for time series prediction in wireless sensor networks. Signal Processing,87(12):3010–3020, 2007.

[29] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.

[30] Yann LeCun, Yoshua Bengio, et al. Convolutional networks for images,speech, and time series. The handbook of brain theory and neural networks,3361(10):1995, 1995.

[31] Wen-Yau Liang and Chun-Che Huang. Agent-based demand forecast in multi-echelon supply chain. Decision support systems, 42(1):390–407, 2006.

[32] Andrew L Maas, Awni Y Hannun, and Andrew Y Ng. Rectifier nonlinearitiesimprove neural network acoustic models. In Proc. icml, volume 30, page 3, 2013.

[33] Jan Olhager, Martin Rudberg, and Joakim Wikner. Long-term capacity man-agement: Linking the perspectives from manufacturing strategy and sales andoperations planning. International Journal of Production Economics, 69(2):215–225, 2001.

[34] Kwame Owusu Kwateng, John Frimpong Manso, and Richard Osei-Mensah.Outbound logistics management in manufacturing companies in ghana. Reviewof Business & Finance Studies, 5(1):83–92, 2014.

[35] Gordon Stewart. Supply chain performance benchmarking study reveals keys tosupply chain excellence. Logistics Information Management, 8(2):38–44, 1995.

[36] Souhaib Ben Taieb, Antti Sorjamaa, and Gianluca Bontempi. Multiple-outputmodeling for multi-step-ahead time series forecasting. Neurocomputing, 73(10-12):1950–1957, 2010.

[37] Sangeeta Vhatkar and Jessica Dias. Oral-care goods sales forecasting usingartificial neural network model. Procedia Computer Science, 79:238–243, 2016.

[38] Wen-Jing Yuan, Jian-Hua Chen, Jing-Jing Cao, and Ze-Yi Jin. Forecast oflogistics demand based on grey deep neural network model. In 2018 InternationalConference on Machine Learning and Cybernetics (ICMLC), volume 1, pages251–256. IEEE, 2018.

[39] G Peter Zhang and Min Qi. Neural network forecasting for seasonal and trendtime series. European journal of operational research, 160(2):501–514, 2005.

[40] Xinfeng Zhang, Shengchang Wang, and Yan Zhao. Application of support vectormachine and least squares vector machine to freight volume forecast. In 2011International Conference on Remote Sensing, Environment and TransportationEngineering, pages 104–107. IEEE, 2011.


Appendix A: Supplemental Information

Figure A.1: Distribution of residuals

Figure A.2: Actual vs forecast using LSTM


Figure A.3: Decomposition of Time series

Figure A.4: forecast using LSTM


Figure A.5: forecast using LSTM
