
Household Electricity Demand Forecasting - Benchmarking State-of-the-Art Methods

Andreas Veit, Christoph Goebel, Rohit Tidke, Christoph Doblander and Hans-Arno Jacobsen

Department of Computer Science, Technische Universität München

[email protected], [email protected], [email protected], [email protected], [email protected]

ABSTRACT

The increasing use of renewable energy sources with variable output, such as solar photovoltaic and wind power generation, calls for Smart Grids that effectively manage flexible loads and energy storage. The ability to forecast consumption at different locations in distribution systems will be a key capability of Smart Grids. The goal of this paper is to benchmark state-of-the-art methods for forecasting electricity demand on the household level across different granularities and time scales in an explorative way, thereby revealing potential shortcomings and finding promising directions for future research in this area. We apply a number of forecasting methods, including ARIMA, neural networks, and exponential smoothing, using several strategies for training data selection, in particular day type and sliding window based strategies. We consider forecasting horizons ranging between 15 minutes and 24 hours. Our evaluation is based on two data sets containing the power usage of individual appliances at second time granularity collected over the course of several months. The results indicate that forecasting accuracy varies significantly depending on the choice of forecasting method/strategy and the parameter configuration. Measured by the Mean Absolute Percentage Error (MAPE), the considered state-of-the-art forecasting methods rarely beat corresponding persistence forecasts. Overall, we observed MAPEs in the range between 5% and >100%. The average MAPE for the first data set was ~30%, while it was ~85% for the other data set. These results leave considerable room for improvement. Based on the identified trends and experiences from our experiments, we contribute a detailed discussion of promising future research.

1. INTRODUCTION

According to the US Department of Energy, the creation of a sustainable and energy-efficient society is one of the greatest challenges of this century, as traditional non-renewable sources of energy are depleting and adverse effects of carbon emissions are being felt [27]. Two key issues in creating a sustainable and energy-efficient society are reducing peak energy demands and increasing the penetration of renewable energy sources. The authors of [10] outline a computer science research agenda to help achieve this goal. To achieve a reliable operation of the electricity distribution system, supply and load have to be balanced within a tight tolerance in real time. Load forecasting has therefore been a major issue in power systems operations [19]. Today, with increasing decentralized generation of electricity, there is a need to control smaller zones of the electric grid. Smart Grids enable micromanagement of those zones. In [4], the authors describe how accurate load forecasts can greatly enhance the micro-balancing capabilities of smart grids if they are utilized for control operations and decisions like dispatch, unit commitment, fuel allocation and off-line network analysis. Thus, the prediction of energy consumption is a vital factor towards successful energy management.

Load forecasts can be performed on different voltage levels in the grid. Forecasts can be performed on the transmission level, the distribution level and even the individual household and device level, because with the introduction of smart meters, the load can now be measured on the household level. Even more granular forecasts can be performed on the appliance level with installations of energy sensors or energy consumption disaggregation [31]. Recently, there have been many studies on the disaggregation of electricity consumption of households into individual appliances [5]. However, the short-term forecasting of individual household consumption has not been evaluated to a satisfactory extent.

Considering the importance of short-term load forecasting in demand and supply balancing, we conduct experiments in order to compare state-of-the-art forecasting methods. The growing public availability of electricity consumption data gives the opportunity to analyze and benchmark possible forecasting methods and strategies. In our experiments we use Autoregressive Integrated Moving Average (ARIMA), exponential smoothing and neural networks for univariate time series. In addition, we apply three different forecasting strategies: a sliding window approach, a day type approach and a hierarchical day type approach. The analysis of these methods on multiple data sets can give an indication of the optimal parameterization and usage. Therefore, we used two electricity consumption data sets: one collected by researchers at the Technische Universität München and one from the Massachusetts Institute of Technology. For the comparison of the different methods and strategies we used different granularities of consumption data, i.e., sampling intervals from 15 up to 60 minutes, and varied the time horizons for the forecasts from very short-term forecasts of 15 minutes up to forecasts of 24 hours. In order to compare the results of the different methods and strategies and the influence of the granularity and forecast horizon, we measured the accuracy of the forecasts with the Mean Absolute Percentage Error (MAPE). Overall, we observed MAPEs in the range between 5% and >100%, with an average MAPE of ~30% for the first data set and ~85% for the second data set. Looking at the performance of the algorithms and strategies, we see that most of the algorithms benefit from splitting the data into training sets of particular day types and that predictions based on disaggregated data from individual appliances lead to better results. Generally, we show that without further refinement of advanced methods such as ARIMA and neural networks, the persistence forecasts are hard to beat in short-term forecasts. Especially in households with demand profiles that remain constant for many hours during a typical day, advanced forecasting methods provide little value if they are not embedded into a framework that adapts their use to individual household attributes. Therefore, we also provide an exploration of promising directions for future research. These experimental results and the exploration of future research directions are the primary contributions of this paper.

arXiv:1404.0200v1 [cs.LG] 1 Apr 2014

This paper is organized as follows: In Section 2, we review related literature and identify the research gap. In Section 3, we describe the electricity consumption data we use in our experiments and explain all performed transformations. Subsequently, in Section 4, we describe our experimental setup and the forecasting methods and strategies used in our experiments. Section 5 presents the results of our experiments and Section 6 discusses our findings and explores directions for future research.

2. RELATED WORK

Demand side management and demand response receive increasing attention from research and industry. The research work published so far includes a variety of directions from direct load control or targeted customer interaction to indirect incentive-based control (see [20] for an overview). In order to help balance demand and supply, demand side management programs require accurate predictions of consumer demand.

The approaches for demand side management focus on different levels of the power system. On the grid operator level, studies focus for example on the minimization of power flow fluctuations [26] or the integration of renewable energy [29]. The distribution grid operator uses consumption forecasts to balance grids with a high penetration of decentralized generation of renewable energy (e.g., [16], [9]). Other studies look at the level of groups of consumers with a focus on game theoretic frameworks [21] or virtual price signals [28]. Most work on demand response, however, focuses on the level of end consumers. Recent research has studied the use of variable price signals for individual customers. These dynamic tariffs penalize consumption during certain periods of time with increased electricity prices, so that customers can respond by adjusting their consumption (e.g., [2], [12]). However, [24] points out that demand side management with variable price signals can cause instabilities through load synchronization. To avoid uncontrolled behavior, accurate consumption forecasts can help utilities to select customers that are most suitable for a demand response program.

First studies have analyzed the potential and first prototypes of consumption forecasts for individual households (e.g., [31], [30]). However, most work on household consumption focuses on disaggregation of electricity consumption. Examples include [18], [17], [13], [1] and [15]. The authors of [5] give an overview of the state-of-the-art in this area. In this paper, we benchmark state-of-the-art forecasting models for household consumption and also evaluate how the disaggregation of consumption data influences the prediction of household consumption.

3. ELECTRICITY CONSUMPTION DATA

The data collected by smart meters or smart home infrastructures include differing sets of attributes. The most common metrics are wattage readings or accumulated energy at discrete time steps. While some consumption data sets are univariate time series consisting only of the overall electricity consumption reading from a household, other data sets consist of multivariate data including, for example, readings from a system of sensors distributed over a household. In our experiments we use data sets from the second category.

3.1 Data Sets

We use two different data sets for our experiments. To perform the same experiments using both data sets, a transformation of the data was necessary. These transformations are explained in the following section.

3.1.1 The TUM Home Experiment Data Set

In the TUM Home Experiment, a single household in Germany, in the state of Bavaria, was equipped with a distributed network of Pikkerton sensors measuring power in Watt, on/off status and energy in kWh from several appliances. The measured appliances include lights in the kitchen, antechamber and living room, the fridge, washing machine, office and entertainment devices. The data used for this experiment was collected from February 4th 2013 to October 31st 2013. The lower graph of Figure 1 shows the demand profile from February 21st to March 5th 2013. From the figure it can be seen that the demand is flat for long time intervals, with occasional peaks, especially in the evenings. In particular, 70% of all power readings lie between 25 and 30 Watt. Figure 2 shows the empirical cumulative distribution function of the power readings. The graph shows a very steep increase in cumulative frequency at around 25 to 30 Watt. In the following, we will refer to this data set as the TUM data set.

Figure 1: Demand profiles from the data sets. (Upper panel: REDD; lower panel: TUM. x-axis: hours, 0 to 312; y-axis: power.)

3.1.2 The Reference Energy Disaggregation Data Set

The Reference Energy Disaggregation Data Set (REDD) is a public data set for energy disaggregation research [18]. The REDD data set is provided by the Massachusetts Institute of Technology and contains power consumption measurements of 6 US households recorded for 18 days between April 2011 and June 2011. The data set contains high frequency and low frequency readings and includes readings from the main electrical circuits as well as readings from individual appliances such as lights, microwave and refrigerator. For the experiments presented in this paper we use the low frequency readings of the individual appliances, which are sampled at intervals of 3 seconds. The upper graph of Figure 1 shows the demand profile for house 1 from April 19th to May 1st 2011. From the figure it can be seen that, in contrast to the TUM data set, the REDD aggregate demand has more frequent and higher fluctuations. As a result, the cumulative distribution of the power readings shown in Figure 2 has a flatter slope. We will refer to this data set as the REDD data set.

Figure 2: Cumulative distribution of power. (Panels: REDD and TUM. x-axis: power, 0 to 500; y-axis: empirical cumulative distribution function, 0 to 1.)

3.2 Data Transformation

The data sets used in our experiment come in different formats. To achieve comparable results, they need to be transformed to obtain uniformity and to allow the generation of data sets at the required granularities for the experiment. The transformation can be divided into three steps:

Step 1: The data sets are transformed into a common format. Since the readings are at different frequencies, we convert the time indicators into UNIX timestamps and the granularity to one minute.

Step 2: Statistical time series forecasting relies on the assumption that time series are equally spaced. In [7], the author explains that most research has been conducted on equally spaced time series. In [8], he explains that in the case of unequally spaced time series, interpolation methods should be used to transform unequally spaced intervals into equally spaced intervals. Usually, linear interpolation is performed for this transformation. After interpolating the gaps in the time series, standard models for equally spaced intervals can be used. In the data sets used for our experiments several breaks in the time series exist, due to meters or sensors not providing measurements. These breaks cannot all be interpolated, because the interpolation of long intervals can have a significant influence on the statistical forecasting model. As the main seasonality in electricity consumption data is one day, interpolating breaks of several hours would distort the forecasting models. Determining the optimal length of interpolation intervals is itself an optimization problem. In this paper, we interpolate intervals up to a length of two hours using linear interpolation.

Step 3: The different strategies we use in our experiments require different formats for their data. First, we use a sliding window strategy where we select training data windows of specific lengths to predict future load. For this strategy a continuous time series is necessary. Therefore, we select the longest period without breaks longer than 2 hours. We also evaluate day type strategies, where the forecasting models are trained using data from similar days of the week. For these strategies we create a cross-sectional data set divided by the days of the week. We join each day of the week, e.g., Mondays of consecutive weeks, into one data set.
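The gap handling described in Step 2 can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes the series has already been resampled to one-minute steps, that missing readings are marked with `None`, and that "up to two hours" corresponds to at most 120 consecutive missing samples. The function name `interpolate_short_gaps` and the constant `MAX_GAP_MINUTES` are hypothetical.

```python
from typing import List, Optional, Sequence

# Assumption: one sample per minute, so two hours equals 120 samples.
MAX_GAP_MINUTES = 120


def interpolate_short_gaps(
    values: Sequence[Optional[float]], max_gap: int = MAX_GAP_MINUTES
) -> List[Optional[float]]:
    """Linearly fill runs of None that are at most max_gap samples long.

    Longer runs, and runs touching either end of the series, stay None so
    that the caller can treat them as breaks in the time series.
    """
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i  # first missing index of this gap
        while i < n and filled[i] is None:
            i += 1
        end = i  # first index after the gap
        gap_len = end - start
        if start == 0 or end == n or gap_len > max_gap:
            continue  # no neighbour on one side, or gap too long
        left, right = filled[start - 1], filled[end]
        step = (right - left) / (gap_len + 1)
        for k in range(gap_len):
            filled[start + k] = left + step * (k + 1)
    return filled
```

Segments that remain `None` after this step mark the breaks at which the continuous series for the sliding window strategy would have to be cut.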

In the following, we explain the transformations performed on each data set.

3.2.1 Transformation of TUM Data Set

The TUM data set as introduced above is a multivariate data set containing measurements from several appliances in the experiment house. Table 1 shows an extract from the raw data set. In order to be consistent with the REDD data set, the time stamps are converted to the UNIX format and the granularity to one-minute intervals. Since the readings from all appliances have been stored in one big data set, we extract the power readings, split the data set into individual channels for the different appliances and subsequently interpolate gaps of up to two hours. Then, for the hierarchical strategy, we create a separate data set for each appliance and for each day of the week. At several times, data is missing for some appliances but available for others. Such incomplete data could disrupt forecasts, because the forecasting model would assume the appliance to be switched off although it is running. To obtain a consistent data set, we only consider periods in which data is available from all appliances. Afterwards, we aggregate the different appliance channels for the day type and the sliding window strategies. To get a continuous time series for the sliding window strategy, we only use the data from the longest consistent period, which runs from Feb 20th 2013 09:13:00 GMT to Apr 5th 2013 05:44:00 GMT.

3.2.2 Transformation of REDD Data Set

The REDD data set also contains measurements from several appliances. They are already divided into separate channels for the individual appliances. To be consistent with the TUM data set, the time stamps and the granularity have been converted to the common UNIX format. Although the data set contains readings from six different houses, we only used the data from house no. 1, as it contains a long enough period of measurements. For the other houses we can perform neither the day type nor the hierarchical forecasting strategies, as they contain only two to three days of data for each day of the week. For the data from house 1 we then interpolate gaps of up to two hours and aggregate the different appliance channels for the day type and the sliding window strategies. To get a continuous time series for the sliding window strategy, we choose the longest consistent period, i.e., the data from Apr 18th 2011 22:00:00 GMT to May 2nd 2011 21:59:00 GMT.

4. EXPERIMENTS

In this section we introduce the different forecasting methods and strategies we use in our experiment as well as their specific parameterizations.

4.1 Forecasting Methods

First, as a benchmark for the other forecasting methods we include the persistence method, where all forecasts are equal to the last observation. We will refer to this method as PERSIST. For short forecasting horizons and high granularities of consumption data, persistence forecasts are known to be hard to beat.
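As a minimal sketch of the persistence benchmark, the function below repeats the last observed value for every step of the horizon. The function name and the conversion from horizon to number of steps are illustrative assumptions, not taken from the paper.

```python
from typing import List, Sequence


def persistence_forecast(history: Sequence[float], steps: int) -> List[float]:
    """Return `steps` forecasts, all equal to the last observation."""
    if not history:
        raise ValueError("history must contain at least one observation")
    return [history[-1]] * steps


# Example: a 60-minute horizon at 15-minute granularity needs 4 point forecasts.
horizon_minutes, granularity_minutes = 60, 15
steps = horizon_minutes // granularity_minutes
forecast = persistence_forecast([30.0, 28.0, 95.0], steps)
```

Because the method has no parameters to fit, any household whose demand stays flat for hours gives it a very low error, which is the behaviour the benchmark is meant to expose.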

Furthermore, we use the Autoregressive Integrated Moving Average (ARIMA) model. The model is denoted as ARIMA(p, d, q)(P, D, Q), where the non-seasonal components are defined in the first parentheses and the seasonal components of the model are defined in the second parentheses. The parameters (p, P) denote the number of lagged variables, i.e., the number of last observations used for autoregression in the non-seasonal and seasonal components. The parameters (d, D) denote the degree of differencing that is necessary to make the time series stationary. Lastly, the parameters (q, Q) denote the order of the moving average, i.e., the number of last observations it covers. To find the optimal parameters, we use the auto.arima() function provided by the R forecast package. This function selects the best ARIMA model by minimizing the Akaike information criterion with a correction for finite sample sizes (AICc). The algorithm to determine the model parameters is described in [14].

Table 1: An extract from the TUM data set.

timestamp            value    property  unit     appliance
2013-10-01 20:21:33  1.538    WORK      kWh      light-livingroom
2013-10-01 20:21:33  ON       POW       boolean  light-livingroom
2013-10-01 20:21:33  501875   FREQ      Hz       light-livingroom
2013-10-01 20:21:33  231      VRMS      RMS      plug-office
2013-10-01 20:21:33  30       LOAD      Watt     washingmachine
2013-10-01 20:21:34  0        LOAD      Watt     washingmachine
2013-10-01 20:21:34  55       IRMS      RMS      washingmachine
2013-10-01 20:21:34  0.636    WORK      kWh      washingmachine
2013-10-01 20:21:34  ON       POW       boolean  washingmachine
2013-10-01 20:21:34  49.7500  FREQ      Hz       washingmachine
2013-10-01 20:21:34  231      VRMS      RMS      washingmachine

Third, we use an exponential smoothing state space model (BATS) with Box-Cox transformation, ARMA errors as well as trend and seasonal components. The model is denoted as BATS(ω, φ, p, q, m_1, m_2, ..., m_t), where ω is the Box-Cox parameter, φ the damping parameter, (p, q) the ARMA parameters, and (m_1, m_2, ..., m_t) the seasonal periods. We also apply the TBATS model, which uses trigonometric functions for the seasonal decomposition. It is denoted as TBATS(ω, φ, p, q, {m_1, k_1}, {m_2, k_2}, ..., {m_t, k_t}), where the parameter k_i represents the number of harmonics required by the i-th seasonal component. The BATS and TBATS approaches are explained in detail in [6]. We used the implementations of the bats() and tbats() functions provided by the R forecast package.

Lastly, we also use feed-forward neural networks with a single hidden layer and lagged inputs for forecasting univariate time series. The model is denoted as NNAR(p, P, k)_m, where p is the number of non-seasonal lags, P is the number of seasonal lags, k is the number of nodes in the hidden layer and m is the seasonal period. The model is analogous to an ARIMA(p, 0, 0)(P, 0, 0) model, but with nonlinear functions. We used the implementation of the nnetar() function provided by the R forecast package. In this function the network is trained for one-step forecasts. For longer horizons, forecasts are computed recursively.
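The recursive scheme for multi-step forecasts can be illustrated with a small sketch. It is not the nnetar() implementation: the one-step model is passed in as an arbitrary callable, and the helper `mean_of_lags` is a hypothetical stand-in for a trained network. The point is only that each forecast is fed back as an input for the next step.

```python
from typing import Callable, List, Sequence


def recursive_forecast(
    one_step: Callable[[Sequence[float]], float],
    history: Sequence[float],
    lags: int,
    steps: int,
) -> List[float]:
    """Produce `steps` forecasts from a model that only predicts one step ahead.

    `one_step` receives the most recent `lags` values (oldest first) and
    returns the next value. After each step the forecast is appended to the
    input window and the oldest value is dropped.
    """
    if lags < 1 or len(history) < lags:
        raise ValueError("need at least `lags` observations and lags >= 1")
    window = list(history[-lags:])
    forecasts: List[float] = []
    for _ in range(steps):
        next_value = one_step(window)
        forecasts.append(next_value)
        window = window[1:] + [next_value]
    return forecasts


def mean_of_lags(window: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained one-step model."""
    return sum(window) / len(window)
```

Since every step consumes earlier forecasts instead of observations, errors can accumulate as the horizon grows.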

4.2 Forecasting Strategies

In our experiment we use three different strategies to sample the training and test data. In the following we introduce the individual strategies:

4.2.1 Sliding Window Strategy

First, we used the sliding window strategy, where the data set is divided into smaller windows of a defined length. Each window of training data has a corresponding test window for cross validation to measure the accuracy of the prediction. The approach is illustrated in Figure 3a. The forecasting model is fitted to the training window and tested against the test window. The main reason for using the sliding window approach is that the available data sets are of different length. Using standardized window lengths allows comparing the results from different data sets. After a prediction model has been trained and tested, the window moves forward on the data set. The distance the window is moved is called the sliding length. In our experiment we use sliding windows with a sliding length of 24 hours.
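A minimal sketch of how such training and test windows could be generated is shown below. All lengths are expressed in samples, and the function name and parameter names are illustrative assumptions.

```python
from typing import Iterator, List, Sequence, Tuple


def sliding_windows(
    series: Sequence[float], train_len: int, test_len: int, slide_len: int
) -> Iterator[Tuple[List[float], List[float]]]:
    """Yield (training window, test window) pairs over a continuous series.

    The test window directly follows its training window. After each pair,
    both windows move forward by `slide_len` samples.
    """
    start = 0
    while start + train_len + test_len <= len(series):
        train_end = start + train_len
        train = list(series[start:train_end])
        test = list(series[train_end:train_end + test_len])
        yield train, test
        start += slide_len
```

For example, at 15-minute granularity a 3-day training window is 288 samples, a 24-hour test horizon is 96 samples, and the 24-hour sliding length is also 96 samples.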

4.2.2 Day Type Strategy

Second, we use a day type strategy. While the sliding window approach considers the data to be a continuous time series, the day type approach uses cross-sectional data. The strategy is to join each day of the week of consecutive weeks into separate data sets. The approach is illustrated in Figure 3b. The training data set and the test data set are then sampled from the individual data sets. An example of such an approach is to join the Mondays of consecutive weeks.
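The day type split can be sketched as below, assuming readings arrive as (UNIX timestamp, power) pairs in chronological order and that days are delimited in UTC. Both assumptions, and the function name `split_by_weekday`, are illustrative and not taken from the paper.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Tuple


def split_by_weekday(readings: Iterable[Tuple[int, float]]) -> Dict[int, List[float]]:
    """Join the readings of each weekday across consecutive weeks.

    Returns a mapping from weekday index (Monday = 0, ..., Sunday = 6) to the
    concatenated power readings of all days of that type, in date order.
    """
    by_day = defaultdict(list)
    for timestamp, power in readings:
        day = datetime.fromtimestamp(timestamp, tz=timezone.utc).date()
        by_day[day].append(power)

    day_types: Dict[int, List[float]] = defaultdict(list)
    for day in sorted(by_day):
        day_types[day.weekday()].extend(by_day[day])
    return dict(day_types)
```

Each resulting list, for instance the one for Mondays, would then serve as the series from which training and test sets are sampled.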

4.2.3 Hierarchical Day Type Strategy

Third, we use a hierarchical day type strategy. A hierarchical time series is a collection of several time series that are linked together in a hierarchical structure. Hierarchical forecasting methods allow the forecasts at each level to be summed up in order to provide a forecast for the level above. Existing approaches to hierarchical time series include top-down, bottom-up and middle-out approaches. In the top-down approach the aggregated series is forecasted and then disaggregated based on historical proportions. The possible ways to compute these proportions are explained in [27]. The bottom-up approach first forecasts all the individual channels on the bottom level and then aggregates the forecasts to create the aggregated forecast. The middle-out approach combines both approaches. It starts at an intermediate level and uses aggregation for the higher levels and disaggregation for the lower levels. We apply a bottom-up approach, i.e., we use the individual appliance channels to create forecasts for the individual appliances. Similar to the day type approach, we join each day of the week of consecutive weeks into separate data sets for each appliance. The approach is illustrated in Figure 3c. Finally, we aggregate the individual forecasts into a forecast of the entire household and test it against the test window.

Figure 3: Different forecasting strategies. (a) Sliding Window Strategy: a training window on the data set, followed by a test set covering the horizon. (b) Day Type Strategy: the data set is split into per-weekday data sets (e.g., all Mondays, all Tuesdays) that form a history of similar days. (c) Hierarchical Day Type Strategy: each per-weekday data set is further split into individual appliances (e.g., fridge, oven, lights).
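The aggregation step of the bottom-up approach amounts to summing the per-appliance forecasts point by point. The sketch below assumes each appliance forecast is a list of equal length; the appliance names and values are made up for illustration.

```python
from typing import Dict, List, Sequence


def bottom_up_forecast(appliance_forecasts: Dict[str, Sequence[float]]) -> List[float]:
    """Sum per-appliance forecasts into one household-level forecast."""
    lengths = {len(forecast) for forecast in appliance_forecasts.values()}
    if len(lengths) != 1:
        raise ValueError("all appliance forecasts must cover the same horizon")
    return [sum(step) for step in zip(*appliance_forecasts.values())]


# Hypothetical forecasts in Watt for two consecutive intervals.
household = bottom_up_forecast(
    {"fridge": [80.0, 0.0], "oven": [0.0, 1500.0], "lights": [40.0, 40.0]}
)
```

The household-level result is what gets compared against the test window, so errors of individual appliance models may partly cancel or add up in the sum.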

4.3 Granularities

While both data sets used in this experiment contain data at a granularity of 1-3 seconds, other data sets and meters offer measurements of different granularities. Therefore, we want to understand the effect of different measurement granularities on the performance of the different forecasting methods. In particular, we transformed the available data into granularities of 15, 30 and 60 minute intervals. The power reading of those intervals is defined as the mean power of the readings in the respective interval.
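The mean-based change of granularity can be sketched as follows, assuming an equally spaced input series and an integer ratio between the target and source interval lengths. Dropping an incomplete trailing interval is a choice made for this illustration, not something the paper specifies.

```python
from typing import List, Sequence


def downsample_mean(values: Sequence[float], factor: int) -> List[float]:
    """Average every `factor` consecutive readings into one coarser reading.

    A trailing group with fewer than `factor` readings is dropped.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    usable = len(values) - len(values) % factor
    return [sum(values[i:i + factor]) / factor for i in range(0, usable, factor)]
```

With one-minute input data, factors of 15, 30 and 60 would yield the three granularities used in the experiment.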

4.4 Training Window Sizes

Another parameter for demand forecasting is the window length used for training the models. The different data sets and forecasting methods limit the length of training sets. For example, for the day type strategy only training sets of 3 days can be used, since the REDD data set contains only four days for each day of the week. The ARIMA method of the R forecast package cannot handle models with seasonal periods of more than 350 data points, and the NNET method requires at least two seasonal period cycles to train the neural network. Considering these restrictions, we use a training window size of 3 days for the day type and the hierarchical day type approach and vary the training window length for the sliding window approach between 3, 5 and 7 days.

4.5 Forecasting Horizons

The forecasting horizon is the number of point forecasts the particular algorithm predicts into the future. In the context of this experiment the horizon is given by the number of minutes the load is predicted into the future. The focus of this work lies on short-term forecasts. Hence, the range of the prediction lies between 15 minutes and 24 hours. Note that the granularity of the forecast cannot be finer than the granularity of the training data. For instance, with a training data set of 15-minute intervals the earliest prediction will be 15 minutes into the future and all further predictions will be in intervals of 15 minutes.

4.6 Model Quality Measure

We require a statistical quality measure that can compare the different forecasting methods and strategies. The present experiment uses the Mean Absolute Percentage Error (MAPE) as the standard accuracy measure. The reason for this choice is that MAPE is a relative measure and can therefore be used to compare the performance on different data sets. MAPE is defined as the mean of the absolute ratio between the residual and the actual value, expressed in percent:

$$\mathrm{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{x_t - \hat{x}_t}{x_t} \right|$$

where $x_t$ is the actual value and $\hat{x}_t$ is the forecast value. For example, with an actual load of 100 Watt and a corresponding forecasted load of 150 Watt, the MAPE would be 50%, because the difference between actual and predicted load is 50% of the actual load.
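The MAPE computation can be sketched in a few lines of Python. The function name and the error handling are assumptions of this sketch. In particular, it rejects actual values of zero, for which the ratio is undefined.

```python
from typing import Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean Absolute Percentage Error in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    if any(x == 0 for x in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = [abs((x - f) / x) for x, f in zip(actual, forecast)]
    return 100.0 * sum(ratios) / len(ratios)


print(mape([100.0], [150.0]))  # 50.0
```

Because the error is divided by the actual value, the same absolute error weighs more heavily when the actual load is small, which is worth keeping in mind for households with long periods of low consumption.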

4.7 Experimental Setup

The purpose of our experiment is to gain insights into how the different parameters influence the different forecasting methods and strategies. This information helps to choose the most appropriate method. The following gives a summary of the different parameters and their values, as we used them in our experiment.

granularity ∈ {15, 30, 60} minutes
method ∈ {ARIMA, BATS, NNET, PERSIST, TBATS}
strategy ∈ {daytype, hierarchical, slidingwindow}
horizon ∈ {15, 30, 60, 180, 360, 720, 1440} minutes
windowsize ∈ {3, 5, 7} days
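The parameter space can be enumerated as a Cartesian product that is then filtered by the restrictions stated in the text: the horizon cannot be shorter than the granularity, only the sliding window strategy varies the window size, and the hierarchical strategy was run with ARIMA only. The sketch below is a hypothetical illustration of that enumeration, not the experiment driver. It counts parameter combinations only, not individual forecasts.

```python
from itertools import product

GRANULARITIES = [15, 30, 60]                       # minutes
METHODS = ["ARIMA", "BATS", "NNET", "PERSIST", "TBATS"]
STRATEGIES = ["daytype", "hierarchical", "slidingwindow"]
HORIZONS = [15, 30, 60, 180, 360, 720, 1440]       # minutes
WINDOW_SIZES = [3, 5, 7]                           # days


def is_valid(granularity, method, strategy, horizon, window):
    """Apply the restrictions described in the text to one combination."""
    if horizon < granularity:                      # no forecast possible
        return False
    if strategy != "slidingwindow" and window != 3:
        return False                               # day type variants use 3 days
    if strategy == "hierarchical" and method != "ARIMA":
        return False                               # only ARIMA was run hierarchically
    return True


configs = [c for c in product(GRANULARITIES, METHODS, STRATEGIES,
                              HORIZONS, WINDOW_SIZES) if is_valid(*c)]
print(len(configs))  # 378
```

Of the 945 raw combinations, 378 remain under these assumptions: 270 for the sliding window, 90 for the day type and 18 for the hierarchical strategy.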

5. EXPERIMENTAL RESULTS

In this section we present the influence of the defined parameters on the accuracy of the forecasting methods and strategies. For the evaluation we performed a total of 16038 different forecasts.

Result 1: For certain households, increasing training window sizes significantly improves forecast accuracy.


[Figure 4: boxplots of the MAPE (y-axis, 0 to 150) over window sizes of 3, 5 and 7 days (x-axis), in panels by method (ARIMA, BATS, NNET, PERSIST, TBATS) and data set (REDD, TUM).]

Figure 4: MAPE for varying window sizes.

Figure 4 shows boxplots of the distribution of the MAPE for the sliding window strategy and the different window lengths. The results are split by data set as well as by forecasting method. For each window length, one boxplot shows the median as well as the 25% and 75% quantiles of the MAPE. In addition, the graph shows the mean as dots and a linear trend line over the increasing window sizes. From these results we gain three key insights: (1) Increasing window sizes reduce the forecasting error on the REDD data set significantly, F(2, 2399) = 10.209, p < 0.0001. However, the forecasting error on the TUM data set does not change with increased window sizes, F(2, 11380) = 0.1563, p > 0.05. A possible explanation is that in the TUM data set the consumption of every day is very similar and shows a constant pattern. Thus, an additional day of training data does not provide the models with important new information. On the REDD data set, however, with its fluctuations in the demand profile, the results of the ARIMA, NNET and TBATS methods can improve with the additional information. (2) The forecasting error on the TUM data set is almost always lower than on the REDD data set. This could be due to the fact that the demand profile of the TUM data set has long and frequent periods of constant consumption, which are easier to predict. The REDD data set, on the other hand, contains more fluctuations. (3) On the TUM data set, the persistence forecast is more precise than all other forecasting methods. This is also due to the long periods with constant consumption.
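The effect of long constant periods on the persistence forecast can be illustrated with a small sketch. It assumes the common definition of persistence, in which the most recent observation is repeated over the whole horizon. The load values are made up and the function names are hypothetical.

```python
from typing import List, Sequence


def persistence_forecast(history: Sequence[float], steps: int) -> List[float]:
    """Repeat the most recent observation for every step of the horizon."""
    if not history:
        raise ValueError("history must contain at least one observation")
    return [history[-1]] * steps


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean Absolute Percentage Error in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((x - f) / x) for x, f in zip(actual, forecast)) / len(actual)


# Made-up load profiles in Watt: one flat, one fluctuating.
flat_history, flat_future = [40.0, 40.0, 40.0], [40.0, 40.0, 50.0]
spiky_history, spiky_future = [40.0, 900.0, 60.0], [800.0, 50.0, 400.0]

print(mape(flat_future, persistence_forecast(flat_history, 3)))    # roughly 6.7
print(mape(spiky_future, persistence_forecast(spiky_history, 3)))  # roughly 65.8
```

On the flat profile the repeated value is wrong only when the load finally changes, whereas on the fluctuating profile almost every step is missed by a wide margin.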

Result 2: Longer forecasting horizons lead to increasing errors. Coarser granularities (longer sampling intervals) reduce the error.

Figure 5 shows heatmaps of the MAPE from the sliding window and day type strategies for different granularities and forecasting horizons. The results from the hierarchical strategy are not included, because only the ARIMA method was applied with the hierarchical strategy. The results are split by data set as well as by forecasting method. The values in the lower right corner of the respective tables are missing,

[Figure 5: heatmaps of the MAPE (color scale 0 to 200) by sampling granularity in minutes (x-axis: 15, 30, 60) and forecasting horizon in minutes (y-axis: 15, 30, 60, 180, 360, 720, 1440), in panels by method (ARIMA, BATS, NNET, PERSIST, TBATS) and data set (REDD, TUM). Panels: (a) Sliding Window Strategy, (b) Day Type Strategy.]

Figure 5: MAPE for varying horizons and granularities.


Table 2: Distribution of power on weekdays and weekends.

        TUM data set          REDD data set
        weekday   weekend     weekday   weekend
mean    44.0      55.3        259.5     389.1
sd      61.1      86.6        375.7     661.0

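[Figure 6: bar chart of the mean MAPE on weekdays (WD) and weekends (WE) by method and data set, with the values labeled in the figure:]

        ARIMA         BATS          NNET          PERSIST       TBATS
        WD     WE     WD     WE     WD     WE     WD     WE     WD     WE
REDD    69.9   85     60.7   54.2   134.6  85.2   66.1   49.8   60.8   54.6
TUM     32.3   77.5   16.1   20.7   10.6   17.7   7.2    14.9   11.9   19

Figure 6: Mean MAPE for weekdays and weekends.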

because in these cases no forecast is possible, as the forecasting horizon is shorter than the data set granularity. From these results we gain four key insights: (1) With the exception of neural networks on the REDD data set, all forecasting methods achieve better results on both data sets with the day type strategy. (2) Again, except for the neural network with the day type strategy on the REDD data set, longer forecasting horizons lead to larger errors. However, the exponential smoothing methods BATS and TBATS are more robust against increasing horizons than the other methods. (3) Especially on the REDD data set, coarser granularities (longer sampling intervals) lead to smaller errors. This can be explained by the reduction of the variance due to averaging the power readings over longer time intervals. With less variation in the demand profile, the forecasting methods can make more precise predictions. As the demand profile of the TUM data set is constant over long periods, a coarser granularity does not decrease the error there. (4) While on the REDD data set the persistence method has a high precision for short horizons and fine granularities, the exponential smoothing methods BATS and TBATS and the neural network outperform the persistence method for granularities of 30 and 60 minutes. As mentioned above, the persistence forecast is difficult to beat on the TUM data set, but the exponential smoothing methods BATS and TBATS get close, especially for longer forecasting horizons.

Result 3: For some households, the prediction on weekdays can reach a higher precision than on weekends.

Figure 6 shows the mean MAPE from the day type strategy for weekdays and weekends. The results are split by data set as well as by forecasting method. From these results we can see that on the TUM data set all methods perform better on weekdays. However, on the REDD data set only the ARIMA model performs better on weekdays than on weekends, while all other methods perform better on weekends. Table 2 shows the mean power consumption and the standard deviation of the power on weekdays and weekends

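[Figure 7: bar chart of the mean MAPE of the ARIMA method by strategy and data set, with the values labeled in the figure:]

        day type   hierarchical day type   sliding window
REDD    76.3       78                      93.9
TUM     43.3       28.7                    42.7

Figure 7: Mean MAPE for different strategies.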

for both data sets. For both data sets, the mean consumption as well as the standard deviation are lower on weekdays than on weekends. On the REDD data set, however, no forecasting method except ARIMA seems able to turn the reduced deviation on weekdays into improved precision.

Result 4: Splitting the data set into day type windows can improve the forecast precision. In addition, splitting the data set into distinct channels for individual appliances can also improve forecast precision.

Figure 5 shows that for almost every method a division of the data into day type windows improves the forecast precision compared to simple sliding windows. In addition, Figure 7 shows a comparison of the mean MAPE of all three strategies for the ARIMA method. While the ARIMA method does not provide the best results in general, the figure shows that using a hierarchical strategy can greatly improve the performance on the TUM data set. This is a surprising result, as the prediction of aggregated loads generally tends to result in a higher precision.

6. DISCUSSION

Forecasting electricity consumption at different locations in electric distribution grids on short time scales is a crucial ingredient of systems that will enable higher renewable penetration without sacrificing the security of electricity supply. The goal of this paper is to evaluate the performance of state-of-the-art forecasting methods based on actual data. Overall, we observed that most of the algorithms benefit from larger training sets and from splitting the data into training sets of particular day types. In addition, we observed that predictions based on disaggregated data from individual appliances lead to better results. Generally, our analysis has revealed that if the forecasting methods are applied without individual tuning, they are able to beat the accuracy of persistence forecasting only in rare cases. Furthermore, the achievable accuracy is surprisingly low, with an average MAPE ranging between 5 and 50% for one of the considered data sets, and between 30 and 150% for the other, more variable, demand profile. Our work thus motivates more research investigating how accuracy can be increased.

R is only one out of many statistical packages offering state-of-the-art forecasting methods. As mentioned above, its data processing capabilities are limited. Other well-known packages that could be used include WEKA time series forecasting [11] as well as several Python modules, e.g., statsmodels [25] and Scikit-learn [23]. Using Python modules allows for fitting models to more data points compared to R and could therefore yield better results.

Furthermore, the introduction of further features could provide additional information that lets prediction algorithms react faster when a change in consumption occurs. For example, when a device is switched on or off, it takes some time until the average wattage of the time interval accounts for the change. The TUM data set contains additional sensors which are not yet considered in our experiments. We expect an increased forecast precision when information from occupancy, temperature and brightness sensors is included.

We also expect error reduction when the consumption patterns of the appliances themselves are considered. This is supported by the results for the hierarchical strategy (cf. Figure 7). Thermal devices like fridges, freezers, boilers and heat pumps have a very predictable consumption pattern. Other devices like washing machines, dishwashers and laundry dryers have a known consumption pattern once switched on. When looking at individual appliances, another direction worth investigating would be event detection. Instead of predicting solely based on continuous wattage readings, it could be beneficial to detect concrete events (e.g., on/off) and derive a future consumption pattern from them. A sequence of events could be used to train a Markov model [22] that predicts future events, which could in turn be used for the consumption forecast. We think that this could reduce the prediction error, especially for short-term forecasts.
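One possible shape of such an event-based model is a first-order Markov chain over appliance states. The sketch below is only an illustration of the idea under that assumption, not an evaluated method; the state labels, the event sequence and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def fit_markov(events: List[str]) -> Dict[str, Dict[str, float]]:
    """Estimate first-order transition probabilities from an event sequence."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for current, following in zip(events, events[1:]):
        counts[current][following] += 1
    return {state: {nxt: n / sum(c.values()) for nxt, n in c.items()}
            for state, c in counts.items()}


def most_likely_next(model: Dict[str, Dict[str, float]], state: str) -> str:
    """Return the most probable successor of `state` under the fitted model."""
    return max(model[state], key=model[state].get)


# Hypothetical appliance state observed once per interval.
observed = ["off", "off", "on", "on", "on", "off", "off", "on", "on", "off"]
model = fit_markov(observed)
print(model["on"])                    # {'on': 0.6, 'off': 0.4}
print(most_likely_next(model, "on"))  # on
```

A predicted state sequence would still have to be mapped to a wattage profile, for example the known consumption pattern of the appliance once it is switched on.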

In addition, it is important to investigate strategies for handling missing sensor data. In our experiments we only considered consistent data sets. However, in a real-world setting load forecasts need to be performed even in situations with missing data. Future work should investigate how to handle temporary sensor outages, which could disturb the prediction algorithms.

Our results show a large difference between the forecasting accuracy of the same methods applied to two different data sets. It is unclear how common the characteristics of these data sets are. The data necessary for carrying out more representative studies is currently missing, although more data sets are being published [3]. Since we tested a wide range of combinations of methods, strategies, sampling granularities and forecasting horizons, our experimental results give good insight into how the different methods and strategies perform in various settings, despite the large difference in forecast accuracy.

In summary, this study should be considered an exploration of promising directions for future research rather than a source of final results on the viability of local electricity demand forecasting.

7. CONCLUSIONS

We have evaluated a wide range of state-of-the-art methods and strategies for short-term forecasting of household electricity consumption, which is a key capability in many smart grid applications. Although our current data base is limited, we were able to gain useful insights into their performance at different levels of granularity and forecasting horizon length. We showed that without further refinement of advanced methods such as ARIMA and neural networks, the persistence forecasts are hard to beat in most situations. Especially in households with demand profiles that remain constant for many hours during a typical day, advanced forecasting methods provide little value if they are not embedded into a framework that adapts their use to individual household attributes. Future work will focus on the design of such frameworks and evaluate them based on representative data.

8. REFERENCES

[1] S. Akshay Uttama Nambi, T. G. Papaioannou, D. Chakraborty, and K. Aberer. Sustainable energy consumption monitoring in residential settings. In Computer Communications Workshops (INFOCOM WKSHPS), 2013 IEEE Conference on, pages 1–6. IEEE, 2013.

[2] A. Alberini and M. Filippini. Response of residential electricity demand to price: The effect of measurement error. Energy Economics, 33(5):889–895, 2011.

[3] S. Barker, A. Mishra, D. Irwin, E. Cecchet, P. Shenoy, and J. Albrecht. Smart*: An open data set and tools for enabling research in sustainable homes. In Proceedings of the 2012 Workshop on Data Mining Applications in Sustainability (SustKDD 2012), 2012.

[4] D. Bunn and E. Farmer. Review of short-term forecasting methods in the electric power industry. Comparative models for electrical load forecasting, pages 13–30, 1985.

[5] K. Carrie Armel, A. Gupta, G. Shrimali, and A. Albert. Is disaggregation the holy grail of energy efficiency? The case of electricity. Energy Policy, 2012.

[6] A. M. De Livera, R. J. Hyndman, and R. D. Snyder. Forecasting time series with complex seasonal patterns using exponential smoothing. Journal of the American Statistical Association, 106(496):1513–1527, 2011.

[7] A. Eckner. Algorithms for Unevenly-Spaced Time Series: Moving Averages and Other Rolling Operators. Technical report, Working Paper, 2012.

[8] A. Eckner. A framework for the analysis of unevenly-spaced time series data. Technical report, Working Paper, 2012.

[9] C. Goebel and D. Callaway. Using ICT-Controlled Plug-in Electric Vehicles to Supply Grid Regulation in California at Different Renewable Integration Levels. IEEE Transactions on Smart Grid, 4(2):729–740, 2013.

[10] C. Goebel, H.-A. Jacobsen, V. Razo, C. Doblander, J. Rivera, J. Ilg, C. Flath, H. Schmeck, C. Weinhardt, D. Pathmaperuma, H.-J. Appelrath, M. Sonnenschein, S. Lehnhoff, O. Kramer, T. Staake, E. Fleisch, D. Neumann, J. Struker, K. Erek, R. Zarnekow, H. Ziekow, and J. Lassig. Energy Informatics. Business & Information Systems Engineering, pages 1–7, 2013.

[11] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten. The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter, 11(1):10–18, 2009.

[12] K. Herter, P. McAuliffe, and A. Rosenfeld. An exploratory analysis of California residential customer response to critical peak pricing of electricity. Energy, 32(1):25–34, 2007.

[13] S. Humeau, T. K. Wijaya, M. Vasirani, and K. Aberer. Electricity load forecasting for residential customers: Exploiting aggregation and correlation between households. In Sustainable Internet and ICT for Sustainability (SustainIT), 2013, pages 1–6. IEEE, 2013.

[14] R. J. Hyndman, Y. Khandakar, et al. Automatic time series for forecasting: The forecast package for R. 2007.

[15] W. Kleiminger, C. Beckel, T. Staake, and S. Santini. Occupancy Detection from Electricity Consumption Data. In Proceedings of the 5th ACM Workshop on Embedded Systems For Energy-Efficient Buildings, pages 1–8. ACM, 2013.

[16] K. Kok. Dynamic pricing as control mechanism. In Power and Energy Society General Meeting, 2011 IEEE, pages 1–8. IEEE, 2011.

[17] J. Z. Kolter and J. Ferreira. A large-scale study on predicting and contextualizing building energy usage. In Proceedings of the 25th AAAI Conference on Artificial Intelligence, 2011.

[18] J. Z. Kolter and M. J. Johnson. REDD: A public data set for energy disaggregation research. In Proceedings of the SustKDD Workshop on Data Mining Applications in Sustainability, pages 1–6, 2011.

[19] K. Lee, Y. T. Cha, and J. Park. Short-term load forecasting using an artificial neural network. IEEE Transactions on Power Systems, 7(1):124–132, 1992.

[20] J. Medina, N. Muller, and I. Roytelman. Demand response and distribution grid operations: Opportunities and challenges. IEEE Transactions on Smart Grid, 1(2):193–198, 2010.

[21] A. Mohsenian-Rad, V. Wong, J. Jatskevich, R. Schober, and A. Leon-Garcia. Autonomous demand-side management based on game-theoretic energy consumption scheduling for the future smart grid. IEEE Transactions on Smart Grid, 1(3):320–331, 2010.

[22] V. Muthusamy, H. Liu, and H.-A. J. Jacobsen. Predictive Publish/Subscribe Matching. In Proceedings of the Fourth ACM International Conference on Distributed Event-Based Systems, DEBS ’10, pages 14–25, New York, NY, USA, 2010. ACM.

[23] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[24] S. D. Ramchurn, P. Vytelingum, A. Rogers, and N. Jennings. Agent-based control for decentralised demand side management in the smart grid. In The 10th International Conference on Autonomous Agents and Multiagent Systems-Volume 1, pages 5–12, 2011.

[25] J. Seabold and J. Perktold. Statsmodels: Econometric and Statistical Modeling with Python. In Proceedings of the 9th Python in Science Conference, 2010.

[26] K. Tanaka, K. Uchida, K. Ogimi, T. Goya, A. Yona, T. Senjyu, T. Funabashi, and C. Kim. Optimal operation by controllable loads based on smart grid topology considering insolation forecasted error. IEEE Transactions on Smart Grid, 2(3):438–444, 2011.

[27] US Department of Energy. Grid 2030: A National Vision For Electricity’s Second 100 Years, 2003.

[28] A. Veit, Y. Xu, R. Zheng, N. Chakraborty, and K. Sycara. Multiagent coordination for energy consumption scheduling in consumer cooperatives. In Proceedings of 27th AAAI Conference on Artificial Intelligence, pages 1362–1368, July 2013.

[29] C. Wu, H. Mohsenian-Rad, and J. Huang. Wind power integration via aggregator-consumer coordination: A game theoretic approach. In Innovative Smart Grid Technologies (ISGT), 2012 IEEE PES, pages 1–6. IEEE, 2012.

[30] H. Ziekow, C. Doblander, C. Goebel, and H.-A. Jacobsen. Forecasting household electricity demand with complex event processing: insights from a prototypical solution. In Proceedings of the Industrial Track of the 13th ACM/IFIP/USENIX International Middleware Conference, page 2. ACM, 2013.

[31] H. Ziekow, C. Goebel, J. Struker, and H.-A. Jacobsen. The potential of smart home sensors in forecasting household electricity demand. In 2013 IEEE International Conference on Smart Grid Communications (SmartGridComm), pages 229–234. IEEE, 2013.
