Maximum Length Weighted Nearest Neighbor Approach …ikop0282/ijcnn2015t.pdf · Maximum Length...

Maximum Length Weighted Nearest NeighborApproach for Electricity Load Forecasting

Tommaso Colombo, Irena Koprinska, and Massimo Panella

Abstract— In this paper we present a new approach for timeseries forecasting, called Maximum Length Weighted NearestNeighbor (MLWNN), which combines prediction based on se-quence similarity with optimization techniques. MLWNN predictsthe 24 hourly electricity loads for the next day, from a timesequence of previously electricity loads up to the current day. Weevaluate MLWNN using electricity load data for two years, forthree countries (Australia, Portugal and Spain), and compare itsperformance with three state-of-the-art methods (weighted near-est neighbor, pattern sequence-based forecasting and iterativeneural network) and with two baselines. The results show thatMLWNN is a promising approach for one day ahead electricityload forecasting.

I. INTRODUCTION

This paper proposes a new machine learning approach forforecasting the hourly electricity loads for the next day. Givena time series of electricity loads measured every hour up to agiven day, we aim at forecasting the 24 hourly electricity loadsfor the next day. This task is categorized as ‘short-term loadforecasting’. It is needed for the planning and operation ofpower systems, to ensure reliable and cost-effective electricitysupply. It is also fundamental in deregulated electricity mar-kets, where electricity prices are determined through a biddingprocess, to support the market participants in their transactions.

Predicting the load for the next day accurately is a challeng-ing task as the electricity time series has a number of nestedcycles (daily, weekly, seasonal and yearly), and also showsrandom fluctuations depending on the household electricityusage, weather changes, large industrial units with irregularhours of operation and other variations.

More generally, the dynamics of the electricity loads in-fluences the behavior of the energy prices. This behavior iscomplex and has shown large unexpected volatility in the lastdecade. In this context, a tool providing accurate forecastsof electricity loads and prices, is very useful. Non-linearregression models, such as Neural Networks (NNs), havebeen successfully used for forecasting financial time series,electricity prices and other commodity prices, and were shownto be able to capture important characteristics such as fat tails,volatility, persistence and leverage effects [1]–[16].

Most of the existing approaches for short-term load fore-casting consider one step ahead prediction, i.e. at time hthe task is to predict the load for time h+ 1. In this paper

Tommaso Colombo and Massimo Panella are with the Department ofInformation Engineering, Electronics and Telecommunications (DIET) of theUniversity of Rome “La Sapienza”, Via Eudossiana 18, 00184 Rome, Italy.Email: [email protected]. Irena Koprinska is with the School ofInformation Technology, University of Sydney, Sydney, NSW 2006, Australia.Email: [email protected].

we consider predicting all 24 hourly values for the next daysimultaneously. This task was previously considered by theWeighted Nearest Neighbor (WNN) approach [17], which is astate-of-the-art approach. Suppose that Xd is a 24-dimensionalvector consisting of the hourly loads for a day d. To predictthe loads Xd+1 for the next day d+ 1, WNN first finds the Knearest neighbors of the previous day Xd; then, the predictionfor the new day is a weighted linear combination of the loadfor the days following the nearest neighbors, where the weightsare determined by the distance of the neighbors to Xd.

In this paper we propose a new approach for predictingthe hourly electricity load for the next day, called MaximumLength Weighted Nearest Neighbor (MLWNN), which extendsthe state-of-the-art WNN approach in several ways. Our con-tribution can be stated as follows:

1) Our proposed MLWNN approach can be seen as ageneralization of WNN. While WNN finds the mostsimilar sequence of previous days (when determining thenearest neighbors), MLWNN finds the longest sequenceof hourly loads with a similarity higher than a giventhreshold. The best value of this threshold is determinedby an optimization procedure. As a consequence, ML-WNN is applicable to any forecasting horizon, as itdirectly operates on a sequence of hourly loads insteadof a sequence of days, and it does not require thetime series to be organized into 24 dimensional vectorscorresponding to each day.

2) We conduct an evaluation of MLWNN using electricitydata for two years, for Australia, Portugal and Spain. Wecompare the forecasting accuracy of MLWNN on thethree datasets with WNN and two other advanced fore-casting approaches: Iterative Neural Network (INN) [18]and Pattern Sequence-Based Forecasting (PSF) [19], andalso with two baselines.

The paper is organized as follows. In Section II we providean overview of the most important and recent research worksin the field of short-term load forecasting. The proposedMLWNN approach is presented in Section III. Section IVpresents the experimental setup and Section V shows anddiscusses the results. Finally, our conclusions are drawn inSections VI.

II. RELATED WORKS

There are two main groups of approaches for short-termload forecasting: statistical and computational intelligence.Prominent examples of the first group are exponential smooth-ing, Autoregressive Integrated Moving Average (ARIMA) and

Linear Regression (LR), and notable examples of the secondgroup are NNs and support vector regression.

A. Statistical Approaches

Taylor et al. [20] considered the prediction of the hourlyelectricity load for Rio de Janeiro, from 1 to 24 hours ahead.They compared four methods: ARIMA, double seasonal Holt-Winters exponential smoothing, backpropagation NN and aPCA-based LR. They found that the most accurate methodwas exponential smoothing which was also the fastest method.In [21] Taylor and McSharry compared exponential smoothingand PCA-based LR with two new methods (a different formu-lation of exponential smoothing and a periodic autoregression)using hourly data for Italy, Norway and Sweden. The resultsagain showed that the double seasonal Holt-Winters exponen-tial smoothing was the most accurate method.

Soares and Medeiros [22] proposed a novel forecastingmodel with two components: deterministic for trends, sea-sonality and special days, and stochastic that uses linearautocorrelation. A different model was built for each hour ofthe day. An evaluation using Brazilian hourly data showed thatthe proposed approach obtained promising results, outperform-ing ARIMA and other methods. A semi-parametric additiveregression method was proposed in [22] and used to forecasthalf-hourly electricity loads one day ahead for Australiandata. A separate model was built for each half hour, usingprevious electricity loads, calendar and temperature variables.The forecasting method was evaluated offline on historical dataand and also in real time on site, showing very good results.

Fan and Hyndman [23] proposed a semi-parametric additiveregression methodology that was used to forecast the half-hourly electricity loads one day ahead for the states of Victoriaand South Australia in Australia. A separate model was builtfor each half hour, using the previous lagged electricity loads,calendar and temperature variables. The forecasting modelshowed excellent performance on both historical data andwhen applied in real time on site.

B. Computational Intelligence Approaches

NN-based approaches are probably the most popular ap-proaches for load forecasting due to their ability to learnthe time series from examples and to capture non-linearrelationships between the predictor variables and the targetvariable [24]–[26].

Most of the proposed computational intelligence approachesconsidered the task of one step ahead prediction (e.g. 1 hourahead); below we review approached that predict all 24 hourlyvalues for the next day. Apart from WNN [17] which wasalready mentioned, there are two other notable approaches thatpredict the 24 hourly values for the next day: PSF and INN.

PSF [19] is a generalization of WNN, which combinesclustering with sequence matching. It first groups all vectorsXd from the training data into K clusters and labels them withthe cluster number. Then it extracts a sequence of consecutivedays, from day d backwards, and matches the cluster labelsof this sequence against the training data to find a set ESd of

sequences that are the same. It then follows a nearest neighborapproach similarly to WNN, which finds the following dayfor each element of ESd and averages the 24 hourly loadsof these following days, in order to produce the final 24hourly predictions for day d+ 1. The results showed that bothWNN and PSF are very competitive approaches outperformingARIMA, NNs and other methods.

INN [18] is an iterative prediction method. At time h itmakes a prediction for time h+ 1; this prediction is added tothe available data and used to make a prediction for time h+ 2and so on for all 24 points from the forecasting horizon. It usesa mutual information feature selector and a neural networkforecasting algorithm. The results showed that it was able toprovide accurate predictions, outperforming the non-iterativemethods WNN and PSF.

Wavelet-based approaches predicting the load for the nextday have also been proposed. Reis and Alves da Silva [27]considered the task of 1-24 hours ahead prediction of hourlyNorth American data. They used multilevel wavelet to decom-pose the electricity load into several components that werepredicted separately by NNs trained with the backpropagationalgorithm. Chen et al. [28] also considered the task of pre-dicting the electricity load 1-day ahead from previous hourlyloads using wavelet transformation and backpropagation NNs.They selected a day that is similar to the day to be forecastedin terms of weekly index and weather, decomposed the loadfor this day into two wavelet components and then trained aseparate NN for each component. Non-wavelet features suchas temperature, humidity, cloud cover and precipitation werealso used as inputs to the NNs. An evaluation using four yearsdata for the state of New England was conducted, showing amean absolute percentage error of 1.24-2.22%.

III. THE PROPOSED APPROACH TO SHORT-TERMELECTRICITY LOAD FORECASTING

We consider a ’one day ahead’ prediction problem: giventhe hourly loads recorded in the past up to day d, the goal is toforecast the 24 hourly loads corresponding to day d+ 1. Moreprecisely, given a time series S(n), n > 0, of hourly loads thatare known up to hour h, we want to determine the hourlyloads S(h+ 1), . . . , S(h+ 24), which means considering aforecasting horizon of 24 time steps. This formulation is a gen-eralization of the standard ’one day ahead’ prediction problem,as it does not require a reorganization of the hourly time seriesdata into 24-dimensional vectors. Thus, our proposed approachis general and can be applied to any forecasting horizon f ≥ 1,to predict the samples S(h+ 1), . . . , S(h+ f).

We firstly introduce the main parameters of the method, thathave to be specified in advance:• forecasting horizon, an integer value f > 0;• number of nearest neighbor patterns (subsequences), an

integer value K ≥ 1;• upper bound of the dimension of the subsequences associ-

ated with the nearest neighbors, an integer value W ≥ 1;• similarity threshold under which two subsequences are

not considered as sufficiently similar with respect to a

suitable defined similarity measure, a real-valued param-eter θ > 0.

The proposed MLWNN algorithm is based on the followingsteps, applied at any time instant h using the available data:

1) Let w be a counter variable initialized to 1.2) Store a dataset of observations in a matrix

Yw ∈ R(h−f−w+1)×w, where the ith row yw,i ofYw, i = 1 . . . (h− f − w + 1), is a vector of w hourlyloads:

yw,i = [S(i) S(i+ 1) . . . S(i+ w − 1)] . (1)

We note that the most recent data sample stored in Yw

is S(h− f), and that it is located in the last row andthe rightmost column.

3) Let zw be the vector containing the most recent knownw samples of hourly loads:

zw = [S(h− w + 1) S(h− w + 2) . . . S(h)] . (2)

4) Using the Chebyshev distance, find the K nearest neigh-bors for the vector zw among the rows of Yw. TheChebyshev distance D(zw,yw,i) between zw and yw,i

is defined as:

‖zw − yw,i‖∞ = maxj=1...w

∣∣zw(j)− yw,i(j)∣∣. (3)

5) Let Dmax be the maximum Chebyshev distance amongthe found K nearest neighbors and let zmax be themaximum absolute value of the elements in zw. IfDmax

zmax> θ then go to step (8).

6) Store wbest ← w as a new value for w and also thereference vector q0 ← zw. Store the Nearest Set (NS) ofvectors {q1,q2, . . . ,qK} using the K nearest neighborsvectors found, where q1 is the nearest vector of q0 andqK is the furthest nearest vector.

7) If w < min{h− f −K + 1, W} then increasew + 1← w and go back to step (2)1.

8) Extract the f following samples for each vector in NSand compute a weighted average to produce the predic-tion of S(h+ 1), . . . , S(h+ f), as explained below.

At the end of this procedure, we will obtain the set of nearestneighbors NS for q0; each of them can be associated with thevector containing the f following load values. For example, ifqn = ywbest,i , 1 ≤ n ≤ K, as in (1), then:

q(f)n = [S(i+ w) S(i+ w + 1) . . . S(i+ w + f − 1)] . (4)

The loads to be predicted are q(f)0 :

q(f)0 = [S(h+ 1) S(h+ 2) . . . S(h+ f)] . (5)

The weighted average, which determines q(f)0 , is therefore

based on the following formula:

q(f)0 =

1∑K

n=1 αn

K∑

n=1

αnq(f)n , (6)

1The number of rows in Yw must be greater than or equal to thenumber K of nearest neighbors, i.e. h− f − w + 1 ≥ K, and hencew ≤ h− f −K + 1. Also, since w ≥ 1, we must have a sufficient numberof observations so that h ≥ f +K.

where the weights αn are obtained as follows:

αn =‖qK ,q0‖∞ − ‖qn,q0‖∞‖qK ,q0‖∞ − ‖q1,q0‖∞

. (7)

Regarding the novelty of the proposed MLWNN approach,there are three main differences with respect to the originalWNN method. Firstly, the length of similar vectors we arelooking for in the previous data, in order to generate NS, isiteratively and automatically increased. This is based on theassumption that the longer the similar sequence is, the betterit represents the local behavior of the time series.

Also, the subsequence is increased based on a similaritythreshold whose optimal value is determined using an opti-mization technique as described in the next section. MLWNNuses the Chebyshev distance while WNN uses the Euclideandistance. The Chebyshev distance is the largest of the element-by-element distances between the two vectors, and it hasbeen chosen to find similarity between sequences by takingunder control any divergence on a single element-by-elementdifference. MLWNN stops increasing the subsequence whenthe Chebyshev distance between the compared vectors is abovethe similarity threshold (expressed as a percentage similarity).We note that if the Chebyshev distance is below the threshold,this means that at least one element-by-element distance isbelow this threshold.

Finally, WNN uses a window of previous days, whileMLWNN does not require this, which means that a larger partof the available data (potentially all available data) can be usedto make the prediction. The underlying assumption is that ifwe use more data, we may be able to find better candidates.

The parameter w in (1) can be considered as the em-bedding dimension of the time series, that is the numberof previous loads that will feed the regression model forestimating the next value to be predicted [19], [29]–[31]. Thisvalue can be found by other means, e.g. fractal dimensionsand statistical methods, often assuming a chaotic behavior ofthe observed time series [32]. MLWNN algorithm aims at amore general heuristic for prediction, which tries to overcomethe dependence of classical embedding approaches from theperformance of estimation of the optimal dimension.

It is also important to note that this approach provides thebasis for further extensions, that can utilize more complexregression models and also neural and fuzzy NNs. In addition,after NS has been determined, the predicted values in (5) canbe obtained through a nonlinear inference system that replacesequations (6) and (7).

We note again the high flexibility of our proposed MLWNNalgorithm, especially with respect to the forecasting horizonf . For example, MLWNN can be applied for one hour aheadprediction using the value f = 1. In the following sections,we report the results for f = 24, for the sake of comparisonwith WNN, PSW and INN approaches.

IV. EXPERIMENTAL SETUP

We use electricity load data collected from Australia, Por-tugal and Spain for two years: from 1 January 2010 to 31 De-cember 2011. The data are sampled every hour, thus the total

number of samples in each dataset is 2 ∗ 365 ∗ 24 = 17520.All datasets are publicly available: the Australian data is forthe State of New South Wales (NSW), it is provided by theAustralian Energy Market Operator (AEMO) and availablefrom http://www.aemo.com.au. The Portuguese andSpanish data are provided by the Spanish Electricity PriceMarket Operator (OMEL) and available from http://www.omelholding.es.

The electricity load has three main cycles: daily, weekly andyearly. Figs. 1-3 show the hourly electricity load for the threecountries for one month, June 2011. From these plots we canclearly see the daily and weekly cycles, which are correlatedwith the human, industrial and commercial activities. Duringthe day, the load has a minimum at 4am, a first peak at 9-10am, stays relatively stable until the end of the working dayand then it reaches a second peak at 6-7 pm. As expected,the load during the weekend is lower than the load duringthe weekdays. We can also see that the load for Portugaland Spain is more variable than the load for Australia, whichmight be due to greater weather and temperature fluctuations,and also to higher activity variations, especially given that theAustralian data is for one region only (the state of NSW).

Figure 3.1: Australian dataset, month of June 2011

Figure 3.2: Australian dataset, month of December 2011

ahead prediction.Weather data available on the Australian Government Bureau of Meteorologywere daily maximum temperature, minimum temperature, rainfall and solarexposure throughout the New South Wales territory.To aggregate those thousands of di↵erently located data, a population-basedweighted average was computed and then a normalization was carried on tomake them comparable to electricity load samples.Obviously not all cities in New South Wales were considered, but the mostimportant only. The weighted average, for each city c in the set C, as follows:Given the daily weather sample xd,c and the population Pc living in that city,

Xd =xd,c ⇤ PcP

c2CPc

(3.1)

This procedure was repeated for each of the weather variables.Taking in consideration the di↵erent sampling rate, hourly for the target

32

Fig. 1. Australian hourly loads (June 2011).

Figure 3.3: Portuguese dataset, month of June 2011

Figure 3.4: Portuguese dataset, month of December 2011

(although also input) variable and daily for the daily variables, data wereaggregated in a vector with 28 elements: the first 24 representing the hourlyloads of day d, whereas elements 25, 26, 27 and 28 representing the four dailymeasures of the weather variables.Afterwards, data were standardized in the interval [0,1] through the followingformula:given a set Y, for each y in Y,

y =max Y � y

max Y �min Y + 10�12(3.2)

Doing so, we prevent the load samples (order of 104) to ”obscure” weathersamples.

33

Fig. 2. Portuguese hourly loads (June 2011).

Figure 3.5: Spanish dataset, month of June 2011

Figure 3.6: Spanish dataset, month of December 2011

34

Fig. 3. Spanish hourly loads (June 2011).

There is also a yearly pattern of the electricity load, e.g.

the load for 2010 is very similar to the load for 2011. Thismotivates using the data for 2010 as training set to buildprediction models and then using these models to predict theload for 2011 (test set data). Although the three datasets havesimilar cycles, the range of values is different. It is highest forthe Spanish data (from 10000 MW to 40000 MW), followedby the Australian data (from 5000 MW to 14000 MW) andlowest for the Portuguese data (from 1000 MW to 8000 MW).

The objective function to be minimized was the MeanAbsolute Percentage Error (MAPE), which is defined as:

MAPE =100

N

N∑

n=1

∣∣∣∣∣Sn − Sn

Sn

∣∣∣∣∣ , (8)

where Sn is the estimated value of the time series at timen and N is the total number of samples in the test set(when evaluating performance) or validation set (when tuningparameters).

As previously explained, given that f was set to 24, ML-WNN requires three other parameters to be pre-specified orestimated from data: θ, W and K, for which a suitable upperbound was found to be 50. The training phase on the 2010 datawas three-fold, due to computational time constraints that weset (i.e., a prediction must be completed in a hour, before anew sample is available and the next one is to be predicted).

Firstly, two suitable initial values for W and K weredetermined: W0 and K0. Then, a near-optimal value for θwas found through a global optimization technique. This wasa ‘nested’ modification of a full-search global optimizationalgorithm: every iteration, m values of θ were drawn at randomin a given interval; then, the MAPE was computed m times inorder to select the value θ∗ minimizing it; finally, the intervalfor the next iteration is restricted and centered on the θ∗ value.The output of such an algorithm, although not guaranteeingto find the global optimum, is a ‘funnel’ leading quickly toa local minimum. If the objective function is convex, then itleads to the global minimum.

Successively, the determination of local optima for Wand K was addressed: the values of these parameters wereiteratively increased until a local minimum was found. Thisprocedure was applied to both parameters, once at a time. Foreach tuning, the performance was evaluated by measuring theMAPE on a validation set, which consisted of the most recent25% training data samples. The optimal parameters determinedfor the three datasets are summarized in Table I.

TABLE IOPTIMAL MLWNN PARAMETERS FOR EACH DATASET

Parameter Australian Portuguese Spanishθ 0.09 0.22 0.19W 25 40 40K 8 7 9

We can see that the value of the parameter θ is considerablydifferent for each dataset. It is lowest for the least volatile timeseries (Australia) and highest for the most volatile time series(Portugal). This result reflects the variability of the three time

TABLE IIMAPE RESULTS FOR MLWNN

Hour Australian Portuguese Spanish1 1.15± 1.01 8.08± 7.30 4.29± 3.442 1.22± 1.11 9.06± 7.88 5.39± 4.543 1.31± 1.11 9.94± 8.00 6.57± 5.774 1.36± 1.15 10.32± 8.42 6.98± 6.395 1.70± 1.48 10.22± 8.50 7.31± 6.776 2.74± 2.79 10.44± 8.60 7.17± 7.067 3.88± 4.40 11.02± 9.17 6.85± 7.618 4.00± 4.67 12.59± 10.83 8.63± 9.269 3.26± 3.84 14.56± 13.62 10.47± 12.45

10 3.02± 3.28 17.60± 16.73 9.27± 10.8511 3.22± 3.24 17.28± 17.69 7.75± 8.9512 3.54± 3.45 16.07± 16.88 7.15± 8.1913 3.84± 3.82 14.99± 15.60 6.97± 7.7214 4.13± 4.29 14.16± 14.74 6.81± 7.6715 4.37± 4.66 14.97± 15.13 6.39± 6.9516 4.38± 4.71 16.05± 16.10 6.66± 7.3017 4.45± 4.70 16.60± 16.95 7.26± 7.9618 4.23± 4.37 16.98± 17.69 7.14± 7.9219 3.73± 3.96 16.12± 16.52 6.72± 7.1520 3.59± 3.75 14.20± 14.36 6.47± 6.8521 3.43± 3.67 13.20± 13.03 5.73± 5.6122 3.03± 3.25 13.00± 12.47 5.30± 5.0323 2.52± 2.67 12.42± 11.81 4.89± 4.4024 2.40± 2.53 12.27± 11.51 4.53± 4.09

Avg. 3.10± 3.63 13.42± 13.53 6.78± 7.49

series for which θ is the similarity threshold expressed as apercentage: a lower value (more strict constraint) is suitable fora less volatile time series, while a higher value (more relaxedconstraint) is needed when dealing with more volatile timeseries. The values of W and especially the values K are similarfor the three datasets (between 7 and 9).

V. PERFORMANCE EVALUATION

Once the training was completed and the parameters weretuned, MLWNN was evaluated on the test data (the 2011 data)in terms of MAPE. The evaluation procedure consisted of 365one day ahead predictions, starting from the first hour of eachday. The MAPE for each predicted hour, averaged over thewhole test set, and the related standard deviations are reportedin Table II. The overall average error and standard deviationwere also computed and are shown in the last row.

The hourly MAPE results for the three datasets are shownin Fig. 4. Although the graphs are similar, we can see thatthe peaks in the MAPE error are not aligned: the higher theaverage error, the more shifted the peak is towards the hoursin the middle of the day.

Figure 6.1: Hourly MAPE on the three datasets

low, but the forecasting accuracy has been proven to be quite stable, with amaximum error very likely to be under 7-8%.To give an idea of how good was such a performance, we compare our resultswith the results obtained by di↵erent methods on the same dataset and somebaselines as well.Please note that Bpday is a simple predictor baseline which forecasts next

day simply copying the previous day; similarly, Bpweek computes a predic-

tion copying the same day of the previous week, whereas Bmean calculatesa simple mean of the previous samples available.

MLWNN INN WNN PSF Bpday Bpweek Bmean

MAE 277.35 304.89 307.46 352.03 420.46 471.20 719.44MAPE 3.10 3.36 3.40 3.96 4.82 5.20 8.08

To provide the reader with a visual idea of the forecasting accuracy, somefigures are shown in the next page.Predicted values are represented in red, while the actual samples are repre-sented in blue.In the first figure an entire month is shown, in particular the month of De-cember, which is one of the most di�cult months to predict: this is mainlybecause during summer peaks are frequent and di�cult to forecast. Theweekly pattern is simple to recognize and the error is very low for the vastmajority of the forecasts.The second figure extracts the last week of December 2011 and clearly out-lines how the errors are extremely higher during peak hours rather thanduring the first and last hours of each day. Below, the third and last figureshows the forecast and the actual sample, in a particular day of December2011.

51

Fig. 4. Hourly MAPE of the three datasets.

For the sake of comparison with other approaches, we also

TABLE IIIMAE RESULTS FOR MLWNN

Hour Australian Portuguese Spanish1 91.72 280.33 870.442 90.66 287.06 952.113 90.76 296.59 1054.124 91.12 296.01 1057.305 114.08 289.54 1081.346 193.96 289.34 1064.947 300.73 298.03 1049.728 339.66 348.77 1465.059 295.65 425.24 1862.34

10 281.11 596.72 1797.9411 302.37 634.97 1610.6512 331.63 615.43 1525.5413 359.83 604.23 1549.0514 385.03 556.92 1517.4815 405.79 575.39 1406.9116 412.07 620.54 1437.6617 424.35 626.45 1529.0818 411.64 619.39 1517.8919 367.44 591.44 1456.6520 348.55 549.40 1412.6521 323.72 531.82 1281.7422 274.80 524.46 1197.0723 221.13 502.39 1087.2624 198.63 478.23 933.97

Avg. 277.35 476.61 1321.62

report the Mean Average Error (MAE), which is defined as:

MAE =1

N

N∑

n=1

|Sn − Sn|. (9)

The MAE obtained for each hour, averaged over the 2011 testset, is reported in Table III.

Below we discuss the performance of MLWNN on eachdataset in more details. For comparison, we also present theresults of the three state-of-the-art approaches mentioned inSect. II: INN, WNN and PSF. As MLWNN is a generalizationof WNN, it is important that we compare the two methods. Theother two methods, PSF and INN, were chosen for comparisonsince PSF is an extension of WNN, and INN was shownto outperform WNN and PSF in [18]. In addition, we alsocompare the performance of WNN with two baselines: Bpday

and Bpweek. The daily baseline Bpday simply predicts the loadsfrom the previous day. The weekly baseline Bpweek predictsthe loads from the same day of the previous week.

A. Australian Data

Table IV presents the comparison results for the Australiandataset. We can see that MLWNN is the most accurateapproach, followed by INN, WNN, PSF and the two baselines.We also used the t-test to evaluate if the differences inaccuracy are statistically significant. The results showed thatthe accuracy of MLWNN is significantly higher (+) than all ofthe other methods used for comparison at a confidence levelof 99%.

Figs. 5, 6 and 7 show the predicted and the actual loadsfor a month, week and day, respectively, in December 2011.The predicted values are relatively close to the actual values,especially taking into consideration that December is one of

TABLE IVPERFORMANCE COMPARISON FOR AUSTRALIAN DATASET

Error/test MLWNN INN WNN PSF Bpday Bpweek

MAPE 3.10 3.36 3.40 3.96 4.82 5.20MAE 277.4 304.9 307.5 352.0 420.5 471.2t-test + + + + +

the months with most volatile electricity load in Australia, andhence, difficult to predict.

Figure 6.2: Predicted (red) vs actual Australian loads for December 2011

Figure 6.3: Predicted (red) vs actual Australian loads, last week of December2011

The procedure applied was to start the forecast at hour 1 of each day, thusforecasting hours 1,2, ..., 24 and then shift to the next day. This is not thebest approach, mainly for two reasons: first of all, in a real application, itis in general not true that forecasts are made starting from the first hour ofthe day; secondly, and most important, an online update of the forecasts (i.e.a continuous process, forecasting hours 1-24 first, then 2-25, 3-26, ..., 24-47,etc.) would lead to better performances in terms of prediction accuracy and,then, risk management.

52

Fig. 5. NSW: predicted (blue) vs. actual load (red), December 2011.Figure 6.2: Predicted (red) vs actual Australian loads for December 2011

Figure 6.3: Predicted (red) vs actual Australian loads, last week of December2011

The procedure applied was to start the forecast at hour 1 of each day, thusforecasting hours 1,2, ..., 24 and then shift to the next day. This is not thebest approach, mainly for two reasons: first of all, in a real application, itis in general not true that forecasts are made starting from the first hour ofthe day; secondly, and most important, an online update of the forecasts (i.e.a continuous process, forecasting hours 1-24 first, then 2-25, 3-26, ..., 24-47,etc.) would lead to better performances in terms of prediction accuracy and,then, risk management.

52

Fig. 6. NSW: predicted (blue) vs. actual load (red), a week in December2011.

Figure 6.4: Predicted (red) vs actual Australian loads for a day in December2011

hours 1 2 3 4 5 6 7 8MAPE-365 1.16 1.22 1.30 1.35 1.69 2.72 3.85 3.97

std-365 1.01 1.11 1.10 1.15 1.48 2.79 4.40 4.67MAPE-8760 1.56 1.86 2.13 2.32 2.48 2.60 2.71 2.82

std-8760 1.44 1.93 2.31 2.57 2.76 2.91 3.05 3.18hours 9 10 11 12 13 14 15 16

MAPE-365 3.23 3.00 3.22 3.54 3.86 4.15 4.39 4.40std-365 3.84 3.27 3.24 3.45 3.82 4.29 4.66 4.71

MAPE-8760 2.92 3.00 3.08 3.17 3.26 3.33 3.39 3.43std-8760 3.31 3.41 3.50 3.62 3.75 3.85 3.91 3.92hours 17 18 19 20 21 22 23 24

MAPE-365 4.47 4.25 3.74 3.60 3.44 3.04 2.53 2.42std-365 4.70 4.36 3.95 3.95 3.66 3.25 2.67 2.52

MAPE-8760 3.43 3.50 3.54 3.59 3.64 3.68 3.73 3.79std-8760 3.94 3.98 4.03 4.07 4.10 4.11 4.12 4.14

Therefore, the results yield until now are only partly representative of thepossible applications. In fact, the prediction accuracy and its standard de-viation is computed for only one of the 24 starting points. Furthermore, itis implied that the accuracy, for example, of a one hour ahead prediction istested only on the first hour of the day and, similarly, the accuracy of a 7hours ahead prediction is tested only on the 7th hour of the day.Based on these considerations, we now provide results for such an approach,with the aim of providing more reliable results on each forecasting horizon(1 to 24 hours) and a stronger basis to our conclutions.Evidently, while following the first approach ends up in 365 24 hours aheadpredictions (the whole test-set), with this approach 8760 (24*365) 24 hoursahead predictions need to be computed to evaluate performance on the test-

53

Fig. 7. NSW: predicted (blue) vs. actual load (red), a day in December 2011.

B. Portuguese Data

Table V shows the accuracy on the Portuguese dataset.We can see that MLWNN is more accurate than WNN.Overall, MLWNN is the second best performing approachafter INN; WNN is third, followed by the daily baselineand PSF, and finally by the weekly baseline. The t-test forstatistical significance showed that the accuracy of MLWNNis significantly lower (-) than INN and significantly higher (+)than all other methods at a confidence level of 99%.

Figs. 8, 9 and 10 show the predicted and actual load for thesame month, week and day as for the Australian data. Note thatthe y-axes are different for the two datasets. The Portuguesedataset is the most difficult to predict. As Fig. 10 shows, the

TABLE VPERFORMANCE COMPARISON FOR PORTUGUESE DATASET


MAPE 13.42 11.70 14.95 16.18 16.06 19.12MAE 476.6 426.4 538.9 589.8 579.6 695.4t-test - + + + +

daily pattern for the Portuguese data is more complex and thismay be reason for the lower accuracy.

C. Spanish Data

Table VI presents the comparison results for the Spanishdataset. MLWNN is the third most accurate approach, afterINN and WNN. However, the accuracy of WNN is onlyslightly higher than the accuracy of MLWNN and this dif-ference is not statistically significant. More specifically, thet-test for statistical significance of the accuracy differencesat a confidence level of 99%, showing that the accuracy ofMLWNN is statistically higher (+) than PSF and the twobaselines, statistically lower (-) than INN and not statisticallydifferent (=) than the accuracy of WNN.

Below some figures are reported to let the reader visualize the di↵erence inperformances compared to the previous dataset.

Figure 6.10: Predicted (red) vs actual Portuguese loads for December 2011

The first graphic plots prediction (in red) against actual samples (in blue)for the month of December 2011, while the second one provides a closer lookon a specific week of the same month.Even if in the majority of the cases the prediction is very close to the ac-tual behavior of the timeseries, some peaks in the error can be recognized.Those peaks are generally present during the central hours of the day, whenhuman and industrial activities generate unpredictable spikes for the stan-dard model. Spikes (or, more in general, outliers) prediction will be furtherinvestigated in future work.The weekly pattern is hard to be found in these data, which are much moreirregular than the Australian ones.The last picture represents the forecasting of a day in December.

It is clear how predicting Portuguese electricity loads is a much more toughtask than predicting Australian ones.

As explained before, predicting every day starting from the first houris not fully representative of how the performance, so in the next table thehourly Mean Average Percentage Error and its standard deviation is reportedin a comparison between the two tests: one always starting from hour 1 ofthe day and then shifting to the next day, the other making a one day aheadprediction starting from each of the 24 hours of each day.In Figure 6.13 hourly MAPE for the two tests are compared. It is interestingto note that here, di↵erently than on Australian data, the MAPE of the first

57

Fig. 8. Portugal: predicted (blue) vs. actual load (red), December 2011.

Figure 6.11: Predicted (red) vs actual Portuguese loads, last week of Decem-ber 2011

Figure 6.12: Predicted (red) vs actual Portuguese loads for a day in December2011

hour in the case of starting points not fixed to be hour 1 is smaller thanin the standard approach. This leads to the logical conclusion that, whilein New South Wales forecasting the first hour is simpler than forecasting ageneral hour of the day, the same can not be said about Portuguese data.

58

Fig. 9. Portugal: predicted (blue) vs. actual load (red), a week in December2011.

TABLE VIPERFORMANCE COMPARISON FOR SPANISH DATASET


MAPE 6.78 5.79 6.03 8.87 9.47 7.45MAE 1321.6 1134.6 1179.9 1711.4 1888.0 1460.1t-test - = + + +

The Spanish data is more difficult to predict than theAustralian data but easier than the Portuguese data. Figs. 11,12 and 13 show the actual loads compared to the predicted

Figure 6.11: Predicted (red) vs actual Portuguese loads, last week of Decem-ber 2011

Figure 6.12: Predicted (red) vs actual Portuguese loads for a day in December2011

hour in the case of starting points not fixed to be hour 1 is smaller thanin the standard approach. This leads to the logical conclusion that, whilein New South Wales forecasting the first hour is simpler than forecasting ageneral hour of the day, the same can not be said about Portuguese data.

58

Fig. 10. Portugal: predicted (blue) vs. actual load (red), a day in December2011.

ones, for a month, a week and a day, in December 2011,respectively.


MAE 1321.62 1134.63 1179.89 1711.38 1888.02 1460.07 2671.69MAPE 6.78 5.79 6.03 8.87 9.47 7.45 14.32

Figures from 6.18 to 6.20 report the predicted values compared to the actualones for December. At first sight the results seem to be inaccurate, but itmust be specified that December is one of the most di�cult months to bepredicted due to the presence of outliers (Christmas holidays).

Figure 6.18: Predicted (red) vs actual Spanish loads for December

Figure 6.19: Predicted (red) vs actual Spanish loads, last week of December

Observing Figure 6.20, the error is higher in the central hours of the day,while the prediction is very accurate for the first and last hours of the day:

62

Fig. 11. Spain: predicted (blue) vs. actual load (red), December 2011.


MAE 1321.62 1134.63 1179.89 1711.38 1888.02 1460.07 2671.69MAPE 6.78 5.79 6.03 8.87 9.47 7.45 14.32

Figures from 6.18 to 6.20 report the predicted values compared to the actualones for December. At first sight the results seem to be inaccurate, but itmust be specified that December is one of the most di�cult months to bepredicted due to the presence of outliers (Christmas holidays).

Figure 6.18: Predicted (red) vs actual Spanish loads for December

Figure 6.19: Predicted (red) vs actual Spanish loads, last week of December

Observing Figure 6.20, the error is higher in the central hours of the day,while the prediction is very accurate for the first and last hours of the day:

62

Fig. 12. Spain: predicted (blue) vs. actual load (red), a week in December2011.

this is consistent with human activity which deeply influence the behavior ofthe timeseries.

Figure 6.20: Predicted (red) vs actual Spanish loads for a day in December

The following table compares hourly MAPE and standard deviation whenthe starting point is fixed and when it is not.

hours 1 2 3 4 5 6 7 8MAPE-365 4.29 5.39 6.56 6.98 7.31 7.16 6.84 8.63

std-365 3.43 4.53 5.76 6.39 6.77 7.05 7.60 7.60MAPE-8760 3.68 4.22 4.65 4.96 5.19 5.38 5.54 5.70

std-8760 3.58 4.44 5.18 5.71 6.04 6.30 6.49 6.67hours 9 10 11 12 13 14 15 16

MAPE-365 10.47 9.26 7.74 7.15 6.97 6.80 6.38 6.65std-365 12.45 10.84 8.94 8.19 7.72 7.67 6.94 7.30

MAPE-8760 5.87 6.05 6.23 6.42 6.58 6.73 6.85 6.96std-8760 6.86 7.02 7.22 7.41 7.59 7.72 7.79 7.86

hours 17 18 19 20 21 22 23 24MAPE-365 7.25 7.13 6.72 6.47 5.72 5.30 4.89 4.53

std-365 7.96 7.92 7.14 6.84 5.61 5.02 4.40 4.09MAPE-8760 7.05 7.12 7.18 7.20 7.24 7.27 7.31 7.39

std-8760 7.92 8.00 8.06 8.09 8.09 8.09 8.07 8.07

Both the table and the figure below clearly show the di↵erent behavior ofthe forecasting error: in the first case, MAPE has peaks in the central hoursof the day, while in the second it increases with the forecasting horizon, asexpected.

63

Fig. 13. Spain: predicted (blue) vs. actual load (red), a day in December2011.

In summary, we can conclude that MLWNN is able to obtainvery promising results. It outperformed WNN (the algorithmit extends) on the Australian and Portuguese data, and wasonly slightly less accurate than WNN on the Spanish data.It also outperformed all the baselines and the state-of-the-art approach PSF on all datasets, and produced comparable

results to INN. On the Australian data, MLWNN was thebest performing algorithm. In terms of computational cost,some optimization efforts might be carried out in futureresearch works, even considering the framework of big dataanalysis [33] or exhaustive search by quantum computation[34]. Nevertheless, the most ‘strict’ constraint, represented bythe computational time that must be less than one hour incase of one hour ahead prediction, is widely respected by thealgorithm in almost all the operative situations.

VI. CONCLUSION

In this paper, we presented MLWNN, a new approach fortime series forecasting that is applicable for any forecastinghorizon. It combines nearest neighbor sequence similarityprediction with optimization techniques. To make a predictionfor a future value or a set of values, MLWNN finds similarprevious sequences of values and combines them to produce afinal prediction. In particular, it finds the longest sequencesof values with a similarity higher than a given threshold,where the best value of the threshold is determined by an opti-mization procedure. In contrast to other advanced approaches,MLWNN directly operates on a single sequence and does notrequire the time series to be organized into vectors with aparticular length (e.g. 24 dimensional vectors for hourly dailyelectricity loads).

MLWNN was applied and evaluated for predicting the 24hourly electricity loads for the next day, using data fromthree different countries. A comparison with several advancedforecasting methods and baselines showed that MLWNN isa promising approach for one day ahead electricity loadforecasting. It outperformed WNN, the approach it extends,on two of the datasets and obtained a similar performance onthe third one.

In future works, we plan to investigate about the sensitivityof the proposed approach on some parameter settings and todevelop further extensions of MLWNN by using advancedregression models based on NNs and fuzzy NNs.

REFERENCES

[1] T. Mills, Time Series Techniques for Economists. Cambridge, UK:Cambridge University Press, 1990.

[2] A. Pankratz, Forecasting with Dynamic Regression Models. New York,NY, USA: John Wiley and Sons, 1991.

[3] D. Percival and A. Walden, Spectral Analysis for Physical Applications.Cambridge, UK: Cambridge University Press, 1993.

[4] G. Box, G. Jenkins, and G. Reinsel, Time Series Analysis: Forecastingand Control, third edition. Englewood Cliffs, NJ, USA: Prentice-Hall,1994.

[5] R. Donaldson and M. Kamstra, “An artificial neural network-GARCHmodel for international stock return volatility,” Journal of EmpiricalFinance, vol. 4, pp. 17–46, 1997.

[6] F. Miranda and N. Burgess, “Modelling market volatilities: the neuralnetwork perspective,” European Journal of Finance, vol. 3, pp. 137–157,1997.

[7] K. Kyoung-jae, “Financial time series forecasting using support vectormachines,” Neurocomputing, vol. 55, no. 1-2, pp. 307 – 319, 2003.

[8] W. Xie, L. Yu, L. Xu, and S. Wang, “A new method for crude oilprice forecasting based on support vector machines,” in Lecture notesin computer science, V. A. et al., Ed. Heidelberg, Germany: Springer,2006, pp. 444–451.

[9] I. Haidar, S. Kulkarni, and H. Pan, “Forecasting model for crude oilprices based on artificial neural networks,” in Proc. of InternationalConference on Intelligent Sensors, Sensor Networks and InformationProcessing (ISSNIP 2008), Sydney, NSW, Australia, 2008, pp. 103–108.

[10] M. Panella, F. Barcellona, and R.L. D’Ecclesia, “Subband predictionof energy commodity prices,” in Proc. of the IEEE Int. Workshopon Signal Processing Advances in Wireless Communications (SPAWC2012). Cesme, Turkey: IEEE, 2012.

[11] S. Kulkarni and I. Haidar, “Forecasting model for crude oil price usingartificial neural networks and commodity futures prices,” InternationalJournal of Computer Science and Information Security (IJCSIS), vol. 2,no. 1, pp. 81–88, 2009.

[12] P. Brockwell and R. Davis, Time Series: Theory and Methods, 2nd ed.New York, NY, USA: Springer-Verlag, 2009.

[13] M. Panella, F. Barcellona, and R.L. D’Ecclesia, “Forecasting energycommodity prices using neural networks,” Advances in Decision Sci-ences, vol. 2012, 2012.

[14] J. Matzenberger, “Neuronal network based modelling of demand andcompeting use of forestry commodities for material and energy use,”Energy Procedia, vol. 40, no. 0, pp. 156 – 164, 2013.

[15] A. Sato, L. Pichl, and T. Kaizoji, Using Neural Networks for Forecastingof Commodity Time Series Trends, ser. Lecture Notes in ComputerScience, A. Madaan, S. Kikuchi, and S. Bhalla, Eds. Springer BerlinHeidelberg, 2013, vol. 7813.

[16] M. Panella, L. Liparulo, F. Barcellona, and R.L. D’Ecclesia, “A studyon crude oil prices modeled by neurofuzzy networks,” in Proc. of 2013IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2013),Hyderabad, India, 2013.

[17] A. Troncoso, J. M. Riquelme, J. C. Riquelme, J. L. Martinez, andA. Gomez, “Electricity market price forecasting based on weightednearest neighbor techniques,” IEEE Transactions on Power Systems,vol. 22, pp. 1294–1301, 2007.

[18] M. Rana, I. Koprinska, and A. Troncoso, “Forecasting hourly electricityload profile using neural networks,” in Neural Networks (IJCNN), 2014International Joint Conference on, July 2014, pp. 824–831.

[19] F. Martinez-Alvarez, A. Troncoso, J. C. Riquelme, and J. S. Aguilar-Ruiz, “Energy time series forecasting based on pattern sequence similar-ity,” IEEE Transactions on Knowledge and Data Engineering, vol. 23,pp. 1230–1243, 2011.

[20] J. W. Taylor, L. M. d. Menezes, and P. E. McSharry, “A comparisonof univariate methods for forecasting electricity demand up to a dayahead,” International Journal of Forecasting, vol. 22, pp. 1–16, 2006.

[21] J. W. Taylor and P. E. McSharry, “Short-term load forecasting methods:an evaluation based on european data,” IEEE Transactions on PowerSystems, vol. 22, pp. 2213–2219, 2007.

[22] L. J. Soares and M. C. Medeiros, “Modeling and forecasting short-termelectricity load: a comparison of methods with an application to braziliandata,” International Journal of Forecasting, vol. 24, pp. 630–644, 2008.

[23] S. Fan and R. J. Hyndman, “Short-term load forecasting based on asemi-parametric additive model,” EEE Transactions on Power Systems,vol. 27, pp. 134–141, 2012.

[24] S. Haykin, Neural Networks, a Comprehensive Foundation, 2nd Edition.Englewood Cliffs, NJ, USA: Prentice-Hall, 1999.

[25] S. Theodoridis and K. Koutroumbas, Pattern Recognition, Fourth Edi-tion. Burlington, MA: Academic Press, Elsevier Inc., 2009.

[26] M. Panella, L. Liparulo, and A. Proietti, “A higher-order fuzzy neuralnetwork for modeling financial time series,” in Proc. of 2014 Interna-tional Joint Conference on Neural Networks (IJCNN 2014), July 2014,pp. 3066–3073.

[27] A. Reis and A. A. da Silva, “Feature extraction via multiresolutionanalysis for short-term load forecasting,” IEEE Transactions on PowerSystems, vol. 20, pp. 189–198, 2005.

[28] Y. Chen, P. Luh, C. Guan, Y. Zhao, L. Miche, M. Coolbeth, P. Friedland,and S. Rourke, “Short-term load forecasting: similar day-based waveletneural network,” IEEE Transactions on Power Systems, vol. 25, pp. 322–330, 2010.

[29] S. Haykin and J. Principe, “Making sense of a complex world [chaoticevents modeling],” IEEE Signal Process. Mag., vol. 15, no. 3, pp. 66–81,1998.

[30] B. Widrow and R. Winter, “Neural nets for adaptive filtering andadaptive pattern recognition,” Computer, vol. 12, no. 3, pp. 25–39, 1988.

[31] M. Panella, “Advances in biological time series prediction by neuralnetworks,” Biomedical Signal Processing and Control, vol. 6, no. 2, pp.112–120, 2011.

[32] H. Abarbanel, Analysis of Observed Chaotic Data. New York, USA:Springer-Verlag, Inc., 1996.

[33] S. Scardapane, D. Wang, M. Panella, and A. Uncini, “DistributedLearning with Random Vector Functional-Link Networks,” InformationSciences, vol. 301, pp. 271–284, 2015.

[34] M. Panella and G. Martinelli, “Neurofuzzy networks with nonlinearquantum learning,” IEEE Trans. Fuzzy Syst., vol. 17, no. 3, pp. 698–710, 2009.

Maximum Length Weighted Nearest Neighbor Approach …ikop0282/ijcnn2015t.pdf · Maximum Length...

Documents

Transcript of Maximum Length Weighted Nearest Neighbor Approach …ikop0282/ijcnn2015t.pdf · Maximum Length...