Jonas Buyl - Power Demand Prediction of Vehicles on a Non-fixed Route
Jonas Buyl
Power demand prediction of a vehicle on a non-fixed route
Promoters: prof. dr. ir. Benjamin Schrauwen, dr. ir. David Verstraeten
Supervisors: Pieter Buteneers, Tim Waegeman
Master's thesis submitted to obtain the academic degree of Master of Science in Engineering: Computer Science
Department of Electronics and Information Systems
Chair: prof. dr. ir. Jan Van Campenhout
Faculty of Engineering and Architecture
Academic year 2011-2012
Power demand prediction of a vehicle on a non-fixed route
Jonas Buyl
Supervisors: prof. dr. ir. Benjamin Schrauwen, dr. ir. David Verstraeten, ir. Pieter Buteneers, ir. Tim Waegeman.
Abstract: In this article several approaches are presented to predict the future power demand and speed of a car, as well as the chance that it stops within the next 200 m. We introduce a time series prediction model based on Reservoir Computing, a novel technique for training recurrent neural networks. The model is improved by using information from previous trips and by post-processing the predicted output window. Furthermore, we present an RC-based classifier to predict the chance that a car stops within the next 200 m. This classifier is used to split the model into one trained on data where the car keeps driving and one trained on intervals where the car stops.
Index Terms: Reservoir Computing, vehicle behavior prediction, stop prediction, road graph, electric vehicles, time series prediction
I. INTRODUCTION
Electric vehicles (EVs) are increasingly commercially viable, but sales figures remain fairly disappointing, often because of the high price. The battery has been the main way of storing energy in EVs because of its large power-to-weight ratio, but batteries are not as capable as capacitors of handling peaks in power demand. New research proposes to use supercapacitors, capacitors with an energy density much greater than that of regular capacitors, in electric vehicles to replace batteries.
The ChargeCar project [1] suggests combining the advantages of both, making it possible to use cheaper batteries together with supercapacitors to reduce the manufacturing costs of EVs. The capacitor is used as a buffer to handle the high spikes in power demand. This extends battery life-time, increases efficiency in cold weather and can even extend the range of the EV.
To direct the energy flows between battery, capacitor, and engine, a controller is needed. In this article we introduce several approaches to predict vehicle behavior and upcoming stops. These predictions can then be used to improve an intelligent controller.
We make use of Reservoir Computing (RC), a novel way of training recurrent neural networks [3]. Instead of training all internal weights, only the output weights are trained. The weights of the input and the internal connections are generated randomly and remain constant.
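The RC recipe described above can be sketched in a few lines of NumPy. This is an illustrative toy, not the thesis implementation: the task, weight scalings and ridge parameter are assumptions; only the structure (fixed random input and recurrent weights, trained linear readout) follows the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Reservoir size follows the 150-neuron reservoirs mentioned later in the text.
n_in, n_res = 1, 150
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))    # input weights: random, never trained
W_res = rng.uniform(-0.5, 0.5, (n_res, n_res))  # recurrent weights: random, never trained
W_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_res)))  # keep spectral radius below 1

def run_reservoir(u):
    """Drive the reservoir with an input sequence u (T x n_in); collect the states."""
    x = np.zeros(n_res)
    states = []
    for u_t in u:
        x = np.tanh(W_in @ u_t + W_res @ x)
        states.append(x)
    return np.array(states)

def train_readout(states, targets, ridge=1e-6):
    """Ridge regression on the reservoir states: the only trained weights."""
    A = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(A, states.T @ targets)

# Toy task: one-step-ahead prediction of a sine wave, with a 50-step washout.
u = np.sin(0.1 * np.arange(300))[:, None]
S = run_reservoir(u)
W_out = train_readout(S[50:-1], u[51:, 0])
rmse = np.sqrt(np.mean((S[50:-1] @ W_out - u[51:, 0]) ** 2))
```

Because the recurrent part is fixed, training reduces to one linear solve, which is what makes RC cheap compared to backpropagation through time.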
First, we modify a GPS map generation algorithm presented by L. Cao and J. Krumm [2] to keep information about the car's current power demand, speed, acceleration, etc. This information is then used as extra input for time series prediction of the power demand, speed and acceleration profiles using Reservoir Computing [4]. Additionally, we observed that the weight of the predicted output relative to the information from previous trips decreases as the range of the prediction increases. Therefore the predicted output windows are post-processed using a simple linear model. Furthermore, we introduce an RC-based classifier model to predict whether the car stops within 200 m. The classifier is then used to select a separate time series prediction model for situations where the car stops within 200 m.
II. VEHICLE BEHAVIOR PREDICTION
A. Pre-processing
After building the road graph data structure defined by Cao et al. [2], the complete dataset is mapped onto the road segments. To better align the trip data, each trip was interpolated every meter, converting the data to a distance scale.
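The per-meter interpolation can be sketched as follows. The helper and its toy trip data are hypothetical; the thesis only specifies the idea of resampling trip signals onto a uniform distance grid.

```python
import numpy as np

def resample_to_distance(dist_m, values, step=1.0):
    """Resample a trip signal onto a uniform distance grid (one sample per
    `step` meters). dist_m is the cumulative distance along the trip (must be
    increasing); values is the signal sampled at those distances, e.g. speed
    or power demand."""
    grid = np.arange(dist_m[0], dist_m[-1], step)
    return grid, np.interp(grid, dist_m, values)

# Toy trip: GPS fixes at irregular distances, speed in m/s (invented numbers).
dist = np.array([0.0, 3.2, 7.9, 15.4, 20.0])
speed = np.array([0.0, 2.0, 5.0, 8.0, 9.0])
grid, speed_1m = resample_to_distance(dist, speed)
```

After this step, two trips over the same road segment line up sample by sample, which is what makes averaging information across previous trips meaningful.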
B. Single RC time series prediction model (RCLA)
The speed, acceleration and power profiles are used as input in separate systems with a reservoir of 150 neurons each. The neuron output weights are trained using ridge regression [4]. Each predicted value is fed back in an output feedback loop to recursively predict the rest of the sequence. The reservoir state outputs at each step t are extended with the averages of the information from previous trips at that step, to predict the next output value y(t).
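The recursive loop with output feedback and previous-trip averages might look like the sketch below. The readout here is untrained and all sizes and scalings are assumptions; the point is only the structure: each prediction is fed back as input, and the state is extended with the trip average before the readout is applied.

```python
import numpy as np

rng = np.random.default_rng(1)
n_res = 150
W_in = rng.uniform(-0.5, 0.5, n_res)            # feedback/input weights, fixed
W_res = rng.uniform(-0.5, 0.5, (n_res, n_res))  # recurrent weights, fixed
W_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_res)))

def predict_window(W_out, y0, trip_avg):
    """Recursively predict len(trip_avg) steps ahead: each output is fed back
    as the next input, and the state is extended with the previous-trip
    average at the same position before applying the readout."""
    x, y, out = np.zeros(n_res), y0, []
    for avg_t in trip_avg:
        x = np.tanh(W_in * y + W_res @ x)        # output feedback drives the reservoir
        z = np.concatenate([x, [avg_t, 1.0]])    # state + trip average + bias term
        y = float(z @ W_out)                     # next predicted value
        out.append(y)
    return np.array(out)

# Untrained toy readout, only to show the recursion and the shapes involved.
W_out = rng.standard_normal(n_res + 2) * 0.01
window = predict_window(W_out, y0=5.0, trip_avg=np.full(200, 6.0))
```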
C. Output window post-processing (OWPP)
During training, the time series model only learns to predict one step ahead. We observed that the influence of the predicted output relative to the information from previous trips decreases as the range of the prediction increases. The output window is therefore post-processed by applying linear regression at each time step t individually, combining the predicted values with the average values at point t.
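The per-step post-processing can be sketched as fitting one small linear model per window position. This is illustrative: the exact feature set and fitting procedure in the thesis may differ, and the intercept term is an assumption.

```python
import numpy as np

def fit_owpp(pred_windows, avg_windows, target_windows):
    """For each step t of the output window, fit a small linear model that
    blends the model's prediction with the previous-trip average at t.
    All arguments have shape (n_windows, window_len)."""
    n, L = pred_windows.shape
    coefs = []
    for t in range(L):
        # Features at step t: prediction, trip average, intercept.
        X = np.column_stack([pred_windows[:, t], avg_windows[:, t], np.ones(n)])
        w, *_ = np.linalg.lstsq(X, target_windows[:, t], rcond=None)
        coefs.append(w)
    return np.array(coefs)  # shape (window_len, 3)

def apply_owpp(coefs, pred_window, avg_window):
    """Post-process one predicted window with the per-step coefficients."""
    X = np.column_stack([pred_window, avg_window, np.ones(len(pred_window))])
    return np.sum(X * coefs, axis=1)

# Toy data where the target is a known blend of prediction and trip average.
rng = np.random.default_rng(2)
P, A = rng.standard_normal((50, 10)), rng.standard_normal((50, 10))
T = 0.3 * P + 0.7 * A
coefs = fit_owpp(P, A, T)
corrected = apply_owpp(coefs, P[0], A[0])
```

Because the coefficients are fitted per step, the blend can shift weight from the model's prediction to the trip average as the prediction range grows, which is exactly the effect described above.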
D. Stop prediction
A reservoir was used with a logistic regression readout to classify a sample t as a point where the car stops within 200 m. First, the current power demand, acceleration and speed at t are used as input to the reservoir. Secondly, the average acceleration of previous trips at t + 20 is used, excluding trips where the car does not stop within 200 m of t + 20. Lastly, the average chance to stop within 200 m of t + 20 is used. For the evaluation of this classifier, the area under the ROC curve (AUC) is maximized. The true positive rate and false
positive rate are calculated for every threshold that can separate the classes, when thresholding the reservoir readout output in [0, 1]. A maximum average AUC of 0.955 was found, and 94.5% of the samples were correctly classified, tested on a dataset in which 10% of the samples are actual stops. From Figure 1 we can see that the predicted chance to stop is usually high at points where the car stops. Around places where the car brakes but doesn't stop, the output can be high as well. This could be interpreted as an error, but the output may still be useful for some applications.
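The threshold sweep and AUC computation described above can be sketched as follows. This is a simplified version that places a cut after every sample rather than at every distinct threshold; the scores and labels are invented.

```python
import numpy as np

def roc_auc(scores, labels):
    """Sweep the thresholds: sort by descending score, accumulate true and
    false positives at every cut, and integrate the ROC curve with the
    trapezoid rule."""
    order = np.argsort(-scores, kind="stable")
    labels = labels[order]
    tp = np.cumsum(labels)          # true positives when cutting after each sample
    fp = np.cumsum(1 - labels)      # false positives at the same cuts
    tpr = np.concatenate([[0.0], tp / tp[-1]])
    fpr = np.concatenate([[0.0], fp / fp[-1]])
    auc = float(np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2))
    return auc, fpr, tpr

# Toy readout outputs in [0, 1]: three actual stops among ten samples.
labels = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
scores = np.array([0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1, 0.1, 0.0])
auc, fpr, tpr = roc_auc(scores, labels)
```

For these toy scores the AUC equals 20/21 (about 0.952): of the 21 positive-negative pairs, only one is ranked the wrong way around.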
Fig. 1. An example of the output of the RC stop prediction model. The green areas are the target areas where the car stops within 200 m. At the bottom, the speed profile of the trip is given for comparison with the actual car behavior. The chosen threshold is shown as a grey dashed line.
E. Split model time series prediction (RCSP)
The RCLA model was split by training and optimizing one model on a dataset of intervals where the car stops, and another model on the remaining intervals. The stop classifier is then used to determine which model should be used for the time series prediction. Finally, the OWPP filter was split as well and applied to the RCSP model to further improve the results.
III. EVALUATION
The proposed RC-based models were compared with a number of linear methods. The best performing linear methods made use of a time delay window (TDW): a weighted average of the previous values, trained using linear regression. A second model extends the TDW model by including a weighted average of the information from previous trips (TDWAtdw).
Trips of one driver were used from the dataset supplied by the ChargeCar project. A random subset of 2,230,500 samples was chosen and divided into 9566 intervals to predict. Of these data, 25% was used for training, another 25% for validation, and the remaining 50% was used to compare the models. The results of the RC models are averages taken over 10 reservoir instances.
The results of all discussed models are given in Table I. The first RC-based model, RCLA, does not yield much better results than the linear methods. However, after output window post-processing the root mean squared error (RMSE) decreases significantly. The RCSP model predicts the speed better than the other models, and when extended with the OWPP filter, it outperforms every other tested model on the power demand, acceleration and speed profiles. In Figure 2 the absolute deviation is given over the predicted distance. The OWPP improves the results towards the end of the predictions, whereas the RCSP model improves them at the start.
RMSE (STD)   Power (W)     Speed (m/s)    Acceleration (m/s²)
TDW          8766          1.481          0.3136
TDWAtdw      8401          1.502          0.3111
RCLA         8416 (4.47)   1.423 (0.018)  0.3130 (0.0005)
RCLA/OWPP    8386 (3.67)   1.311 (0.012)  0.3081 (0.0002)
RCSP         8367 (25.62)  1.304 (0.017)  0.3023 (0.0005)
RCSP/OWPP    8257 (12.48)  1.257 (0.016)  0.2992 (0.0006)
TABLE I. Average RMSE error rates (and standard deviation).
Fig. 2. Average absolute deviation over the predicted distance.
IV. CONCLUSION
It is possible to use data from previous trips and Reservoir
Computing to predict the future power demand, speed and
acceleration profile. Using a classifier to predict if a stop is
imminent significantly improves the results. Post-processing
the predicted output interval further boosts the performance.
The average absolute deviation of the predicted speed 200 m ahead is 6 km/h.
Both the predicted profiles and the stop predictor could be
used for an intelligent vehicle energy management controller.
REFERENCES
[1] H. Benjamin Brown, Illah Nourbakhsh, Christopher Bartley, Jennifer Cross, Paul S. Dille, Joshua Schapiro, and Alexander Styler. ChargeCar community conversions: Practical, custom electric vehicles now! Number CMU-RI-TR-, March 2012.
[2] Lili Cao and John Krumm. From GPS traces to a routable road map. In Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, GIS '09, pages 3–12, New York, NY, USA, 2009. ACM.
[3] David Verstraeten, Benjamin Schrauwen, Michiel D'Haene, and Dirk Stroobandt. An experimental unification of reservoir computing methods. Neural Networks, 20(3):391–403, April 2007.
[4] Francis wyffels, Benjamin Schrauwen, and Dirk Stroobandt. Stable output feedback in reservoir computing using ridge regression. In V. Kurkova, R. Neruda, and J. Koutnik, editors, Proceedings of the 18th International Conference on Artificial Neural Networks, pages 808–817, Prague, September 2008. Springer.
Power demand prediction of a vehicle on a non-fixed route
Jonas Buyl
Promoters: prof. dr. ir. Benjamin Schrauwen, dr. ir. David Verstraeten
Supervisors: ir. Pieter Buteneers, ir. Tim Waegeman.
Abstract: In this article we present several ways to predict the power demand and speed of a vehicle, as well as the chance to stop within 200 m. We introduce a time series prediction model based on Reservoir Computing, a novel technique for training recurrent neural networks. The model was further improved with information from previous trips and with post-processing of the prediction window. Furthermore, we use an RC classification model that predicts the chance to stop, in order to split the earlier model by training one model on driving data and another on intervals where the car stops. The stop predictor then determines which model should be used to predict the next interval.
Keywords: Reservoir Computing, vehicle behavior prediction, stop prediction, road graph, electric vehicles, time series prediction
I. INTRODUCTION
Electric vehicles (EVs) are increasingly commercially attractive, but sales figures remain disappointing, often because of the high cost of the battery. The battery is the most commonly used way of storing energy in EVs, but batteries are not as good as capacitors at absorbing large power peaks. New research proposes to use supercapacitors, capacitors with a much greater energy density, in EVs instead of batteries.
The ChargeCar project [1] proposes to use the advantages of both, so that cheaper batteries and supercapacitors can be used. The capacitor is then used as a buffer against high peaks in power demand. This improves battery life-time and efficiency in cold weather, and can even extend the range of the EV.
To direct the energy flows between battery, capacitor and engine, a controller is needed. In this article we introduce several ways to predict vehicle behavior and upcoming stops. These predictions can then be used to improve an intelligent controller.
We make use of Reservoir Computing (RC), a fairly new technique for training recurrent neural networks [3]. Instead of training all internal weights, only the output weights are trained. The remaining connections are generated randomly and stay constant.
First, we adapt an automatic GPS map generation algorithm [2] to keep information about the car (such as the current power demand, speed, etc.). This information can then be used to improve the models. Moreover, we observed that the weight of the predicted value relative to the information from previous trips decreases with the predicted distance. The predicted window is therefore post-processed with a simple linear model. Furthermore, we introduce an RC classification model to predict whether the car stops within 200 m. This is then used to split the time series prediction models according to whether the car stops or not.
II. VEHICLE BEHAVIOR PREDICTION
A. Pre-processing
After building the road graph defined by Cao et al. [2], the complete dataset was mapped onto the road segments. To better align the data, the trips were interpolated every meter so that the data is put on a distance scale.
B. Time series prediction with a single RC model (RCLA)
The speed, acceleration and power are used as input to separate reservoirs of 150 neurons each. The output weights are trained with ridge regression [4]. Each predicted value is fed back to recursively predict the rest of the sequence. The reservoir output at each step t is extended with the information of previous trips at that step to predict the next output y(t).
C. Output window post-processing (OWPP)
The time series models are only trained to predict the next step. We noticed that the influence of the predicted output relative to the information of previous trips decreases as the prediction distance grows. The output window was therefore post-processed by applying linear regression at each step t separately, combining the predicted value with the average value at point t.
D. Stop prediction
A reservoir was combined with a logistic regression readout to classify a point t as a point where the car stops within 200 meters, with the following input: the current power demand, the speed, the acceleration, the average acceleration of previous trips at point t + 20 and, lastly, the average chance to stop within 200 meters of point t + 20. For the evaluation of this model, the area under the ROC curve (AUC) was maximized. The true positive and false positive rates are calculated
for every threshold that can separate the classes in the readout output between [0, 1]. We found a maximum average AUC of 0.955. After minimizing the misclassification rate, a threshold was found that classifies 94.5% of the points correctly. From Figure 1 we can see that the predicted chance to stop is usually high where the car stops. Around places where the car brakes but does not stop, the output is sometimes high as well. This counts as an error, but such output may still be useful for other applications.
Figure 1. An example of the output of the RC stop prediction model. The green areas are the target areas where the car stops within 200 m. At the bottom, the speed profile of the trip is given for comparison with the actual behavior. The chosen threshold is shown as a grey dashed line.
E. Split time series prediction (RCSP)
The RCLA model was split by separately optimizing and training one model on a dataset with intervals where the car stops, and another model on the other intervals. The stop predictor is then used to determine which model's prediction should be used. Finally, the OWPP filter is also trained separately and applied to this model to improve the results further.
III. EVALUATION
The RC-based models were compared with several linear methods. The best models make use of a time delay window (TDW): a weighted average of the previous values, trained with linear regression. A second model extends the TDW model with a weighted average of the information from the road graph (TDWAtdw).
The trips of one driver from the ChargeCar project were used. A random subset of 2,230,500 samples was chosen and divided into 9566 intervals to predict. Of these data, 25% was used for training, another 25% for validation, and the remaining 50% was used to compare the models. The results of the RC models are averages over 10 reservoirs.
The results of all models can be found in Table I. The first RC-based model, RCLA, offers no big improvement over the linear methods, but the OWPP filter can strongly reduce the root mean squared error (RMSE). The RCSP model predicts the speed better than the other models, and extended with an OWPP filter it performs better than every other model. In Figure 2 the absolute deviation is given over the predicted distance. The OWPP improves the result at the end of the predictions, while the RCSP model improves the results at the beginning.
RMSE (STD)   Power (W)     Speed (m/s)    Acceleration (m/s²)
TDW          8766          1.481          0.3136
TDWAtdw      8401          1.502          0.3111
RCLA         8416 (4.47)   1.423 (0.018)  0.3130 (0.0005)
RCLA/OWPP    8386 (3.67)   1.311 (0.012)  0.3081 (0.0002)
RCSP         8367 (25.62)  1.304 (0.017)  0.3023 (0.0005)
RCSP/OWPP    8257 (12.48)  1.257 (0.016)  0.2992 (0.0006)
Table I. Average RMSE (and standard deviation).
Figure 2. Average absolute deviation over the predicted distance.
IV. CONCLUSION
It is possible to improve the prediction of the speed, power demand and acceleration by using information from previous trips and Reservoir Computing. With a stop predictor this model can be improved further. Post-processing the output window boosts the performance even more. The average absolute deviation of the speed prediction at meter 200 is 6 km/h. The predicted profiles and the stop predictor can moreover be used together in an intelligent controller to direct the energy flows in an EV.
REFERENCES
[1] H. Benjamin Brown, Illah Nourbakhsh, Christopher Bartley, Jennifer Cross, Paul S. Dille, Joshua Schapiro, and Alexander Styler. ChargeCar community conversions: Practical, custom electric vehicles now! Number CMU-RI-TR-, March 2012.
[2] Lili Cao and John Krumm. From GPS traces to a routable road map. In Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, GIS '09, pages 3–12, New York, NY, USA, 2009. ACM.
[3] David Verstraeten, Benjamin Schrauwen, Michiel D'Haene, and Dirk Stroobandt. An experimental unification of reservoir computing methods. Neural Networks, 20(3):391–403, April 2007.
[4] Francis wyffels, Benjamin Schrauwen, and Dirk Stroobandt. Stable output feedback in reservoir computing using ridge regression. In V. Kurkova, R. Neruda, and J. Koutnik, editors, Proceedings of the 18th International Conference on Artificial Neural Networks, pages 808–817, Prague, September 2008. Springer.
Permission for consultation - Copyright
The author gives permission to make this master dissertation available for consultation
and to copy parts of this master dissertation for personal use. In the case of any other
use, the limitations of the copyright have to be respected, in particular with regard to the
obligation to state expressly the source when quoting results from this master dissertation.
Jonas Buyl June 10, 2012
Acknowledgments
First, I would like to thank my promoters prof. dr. ir. Benjamin Schrauwen and dr. ir. David Verstraeten for their advice and for making this research possible. I would also like to thank my supervisor Pieter Buteneers for his guidance and his patience in letting me work and discover at my own pace.
On a personal level I owe much gratitude to my friends and family for their support. Especially towards Sara I'm very grateful for her understanding and patience. Lastly, I thank my parents for giving me the means to study.
Power demand prediction of a vehicle on a non-fixed route
by
Jonas Buyl
Thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Engineering: Computer Science
Academic year: 2011-2012
Universiteit Gent
Faculty of Engineering
Promoters: prof. dr. ir. Benjamin Schrauwen, dr. ir. David Verstraeten
Supervisor: ir. Pieter Buteneers
Summary
In this thesis several approaches are presented to predict the future power demand and
speed of a car, as well as other upcoming events that affect this demand. First, a road
graph data structure for automatic GPS map generation is adapted to capture local vehicle
behavior information.
The average local vehicle behavior is then used as extra information for the time series
prediction of the power demand, speed and acceleration using Reservoir Computing, which
is a novel technique for training recurrent neural networks. The predicted output window
is post-processed using a simple linear technique.
Thirdly, another system is presented that uses the current acceleration profile as well as the information in the road graph to predict the chance that the car is going to stop within the next 200 m.
Finally, two separate time series prediction models are trained: one for when the car stops within the next 200 m, and one for when it does not. For each prediction, the model to use is then determined by the stop prediction model.
Keywords: reservoir computing, vehicle behavior prediction, road graph, electric
vehicles
Contents
1 Introduction 2
1.1 A battery-capacitor hybrid setup . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.1 Battery vs. supercapacitor . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.2 Controller . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Problem statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Content and structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 Reservoir Computing 6
2.1 Introduction to neural networks . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 The Reservoir Computing approach . . . . . . . . . . . . . . . . . . . . . . 7
2.3 Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3.1 Linear regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3.2 Logistic regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4 Regularization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.5 Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.5.1 Input scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.5.2 Spectral radius . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.5.3 Leak rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.5.4 Bias scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.5.5 Reservoir size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.6 Time series prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.6.1 Time series prediction using Reservoir Computing . . . . . . . . . . 16
2.7 Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3 Data analysis 18
3.1 A road graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.2 Extracting useful information . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.3 Error measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.3.1 Defining a prediction distance . . . . . . . . . . . . . . . . . . . . . . 22
3.3.2 Root Mean Square Error (RMSE) . . . . . . . . . . . . . . . . . . . 24
3.3.3 Kurtosis Difference . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.3.4 Receiver Operating Characteristic (ROC) . . . . . . . . . . . . . . . 25
4 Time series prediction of vehicle power, speed and acceleration 28
4.1 Evaluation methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.2 Baseline models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.2.1 System setups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.2.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.3 A prediction system with Reservoir Computing . . . . . . . . . . . . . . . . 36
4.3.1 System setups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.3.2 Parameter optimization . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.3.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.4 Output window post-processing . . . . . . . . . . . . . . . . . . . . . . . . . 45
5 Stop prediction 47
5.1 Predicting the chance to stop using RC . . . . . . . . . . . . . . . . . . . . 48
5.2 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
6 Splitting the system for stopping and driving behavior 55
6.1 The model setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.2 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
6.2.1 RCSP Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
7 Conclusion 59
A Extra tables 61
A.1 Kurtosis difference results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
A.2 Model parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
B Extra figures 66
B.1 Stop prediction model examples . . . . . . . . . . . . . . . . . . . . . . . 66
B.2 Time series prediction examples . . . . . . . . . . . . . . . . . . . . . . . 68
Chapter 1
Introduction
Electric cars or vehicles (EVs) are increasingly commercially viable but seem to be held back by a number of problems. Not only are there a lot of myths around EVs, but some real issues remain as well.
One of the myths, for example, is the issue of battery life expectancy, which has largely been solved. Nissan even announced an eight-year warranty on the batteries of its electric model, the LEAF (figure 1.1). People usually drive a car for longer than 8 years, so some money will be spent on battery replacements, but these maintenance costs are lower than those of the more frequent repairs needed in a regular gas-powered car [9].
The LEAF is not as successful as anticipated, however, because the price is still too high for people to switch. Often a third of the total price is spent solely on the expensive lithium batteries. Prices are expected to drop through mass production, and governments all over the world give substantial incentives. A radical change may nevertheless be necessary to make it interesting for consumers to buy an EV for shorter distances while keeping a regular car for long distances. One idea is to mitigate the disadvantages of cheaper batteries (such as a short life expectancy) in another, cheaper way, so that the price can be brought down.
1.1 A battery-capacitor hybrid setup
The ChargeCar project [1] is committed to finding new ways to bring down the costs for
EVs. One of their ideas is to exploit the advantages of both batteries and capacitors.
1.1.1 Battery vs. supercapacitor
The battery has been the main way of storing energy in EVs because of its large power-to-weight ratio. This simply means the battery allows the car to go further without adding a lot of
Figure 1.1: The Nissan LEAF full electric vehicle
weight. The downside, however, is that batteries are generally less efficient when coping with large spikes in power demand. Not only do fast charges and discharges decrease battery life expectancy, they also decrease the capacity of batteries. This is especially true for lead-acid batteries, due to the chemical and structural changes in the interface under high load, which increase resistance and therefore decrease capacity. Lithium-based batteries, on the other hand, have been shown to lose capacity because of the higher temperatures caused by high power loads [11].
Capacitors, like batteries, are electrical components used to store energy. They consist of two metal plates separated by a thin insulating layer. Electrons can be transferred from one plate to the other, charging and discharging the capacitor. One advantage over batteries is that they show little degradation even after several hundreds of thousands of charge cycles. They are especially better than batteries at handling large peaks in power demand. The energy density of capacitors is a lot lower than that of batteries, however.
Supercapacitors¹, on the other hand, have a much greater energy density than capacitors [24]. The amount of energy that can be stored in a capacitor increases with the surface area of the metal plates. In supercapacitors, the plates are coated with a carbon layer, etched to produce many holes that extend through the material, much like a sponge. This increases the interior surface area by many orders of magnitude, greatly increasing the energy density (> 100,000 times).
¹ Also referred to as electric double-layer capacitor (EDLC), ultracapacitor, etc.
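The area dependence described above can be made concrete with the standard parallel-plate relations (an idealized model; real supercapacitor electrodes are porous rather than flat):

```latex
C = \frac{\varepsilon A}{d}, \qquad
E = \tfrac{1}{2} C V^2 = \frac{\varepsilon A V^2}{2 d}
```

Here C is the capacitance, ε the permittivity of the separating layer, A the plate area, d the plate separation, V the voltage and E the stored energy. Both C and E grow linearly with A, so multiplying the effective surface area by a large factor through the sponge-like carbon coating multiplies the storable energy by roughly the same factor.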
1.1.2 Controller
The solution presented by ChargeCar is to exploit the advantages of both a battery pack and a supercapacitor. The capacitor is used for high spikes in power demand, and to save energy generated while braking (through regenerative braking). When the capacitor is empty, the battery is used to supply power. When power is generated while the capacitor is already full, the battery is charged. The supercapacitor effectively works as a buffer between engine and battery, relieving the battery. Using both systems together then allows car manufacturers to use cheaper, more cost-effective components. Furthermore, capacitors function well at temperatures as low as −40 °C, where batteries are at their worst.
To direct the energy flows between battery, capacitor and engine, a controller is needed, as shown in Figure 1.2.
Figure 1.2: A controller guides power flows between battery, capacitor and engine
When accelerating, as much power as possible should be drained from the capacitor to meet the high energy demand of acceleration. When braking, all energy generated by regenerative braking should be saved in the capacitor. After accelerating and settling at a constant speed, the capacitor will be nearly empty, which means the battery will need to be used. Ideally this is the only time the battery is used.
Now consider the following situation: the car is approaching an intersection where it only sometimes stops. In this situation it could be useful to make sure the capacitor is completely filled in case the car does slow down. Another example: the car is driving steadily at 70 km/h but wants to overtake another car. The capacitor should have some energy left to handle the short power burst. It is therefore desirable to slowly transfer some energy from the battery to the capacitor when the capacitor is almost empty, so that any possible peak can be handled.
Finding an optimal controller, then, is a complex problem. To minimize battery usage, it could be beneficial to predict vehicle behavior and upcoming driving environments. An intelligent controller could then use these predictions to optimize capacitor usage.
1.2 Problem statement
In this thesis we investigate to what extent it is possible to predict the future power demand and speed of a vehicle, as well as other upcoming events that affect this demand. These
predictions could then be useful for the controller described above. Previous research on
this subject includes the prediction of power demand for hybrid vehicles on a fixed route
by Bartholomaeus et al.[4] and Johannesson et al.[18], but these make use of a fixed route
where the data set is perfectly aligned, and they predict vehicle behavior assuming the
vehicle drives along the same route. Reality, however, is much more complex. To truly investigate the possibilities of prediction in real-life situations we do not make the assumptions made by Bartholomaeus and Johannesson, which makes a direct comparison of results less relevant. In this work predictions are made without assuming the vehicle is on a fixed route. The models presented here can easily be adapted to work under real-life circumstances where vehicles drive on a non-fixed route and data is collected while driving.
1.3 Content and structure
To do this, we first gather information from previous trips in a single data structure in Chapter 3. We then try several approaches using Reservoir Computing and other machine learning techniques, explained in Chapter 2. One approach is to use these techniques for time series prediction of power demand and the factors it depends on (e.g. speed), which we discuss in Chapter 4. In Chapter 5 we calculate the chance of stopping within a short distance. Finally, in Chapter 6, we use the stop predictor from Chapter 5 and determine whether it can improve the prediction models presented in Chapter 4.
Chapter 2
Reservoir Computing
When problems become so complex that they can't be solved efficiently by ordinary algorithms, a near-optimal solution can sometimes be found more efficiently using machine learning techniques. This usually comes down to a model that is trained to capture the underlying characteristics of data. These can then be used to predict the output for new input data.
2.1 Introduction to neural networks
A neural network is a model based on the biological structure of the brain and consists
of several interconnected neurons. Each neuron has input and output connections that
connect it to the rest of the network. The output is calculated by taking a weighted sum
of those input connections, usually transformed by a non-linear activation function (for
the rest of this work the hyperbolic tangent tanh is used).
When there are no recurrent connections or cycles in the network, the network is called a feed-forward neural network (FFNN). If there are cycles in the network, it is called a recurrent neural network (RNN). A neural network is trained by adjusting the weights according to the error between target and predicted output. If the output depends on long chains of neurons, the adjustments can become so small that they effectively vanish. In RNNs the output depends on arbitrarily long chains of neurons, which makes this type of network very hard to train. Algorithms like back-propagation-through-time [30] are able to solve the problem, but the algorithm is very complex and takes a long time to compute.
2.2 The Reservoir Computing approach
Reservoir Computing (RC) is a fairly new approach to training recurrent neural networks [28]. It is a unifying term for several similar methods discovered independently, the most important ones being Echo State Networks [16] and Liquid State Machines [21]. The idea is to never train the network itself, but to only train the weights from each neuron to the output: a readout function. All other weights, such as the input connections and internal connections, are fixed and initialized randomly, but can be scaled and tuned (see section 2.5 on reservoir parameters).
To understand the dynamics of reservoirs, consider the following analogy, which is usually given to explain Liquid State Machines. It does not capture the whole picture of Reservoir Computing, but it gives an idea of what happens to the state of the reservoir when affected by an external input. Imagine the hidden layers of the reservoir network as a real reservoir of liquid. We would like a warning system that warns us when someone throws a large object into the liquid (the input). A single throw will generate ripples in the reservoir, converting that input into a spatio-temporal pattern along the surface of the liquid. To detect this pattern we place floating sensors in the reservoir, which are naturally coupled through the liquid. The state of the reservoir can then be read from the sensor values at a specific point in time.
This analogy makes it clear that certain parameters can heavily influence the reservoir dynamics: the number of sensors, the size of the thrown object (the input), the way the connecting surface behaves when an object falls into the water, and so on. They are discussed further in section 2.5, in the context of Reservoir Computing specifically.
In general, reservoirs are used to give a high-dimensional dynamic representation of the input, called the state of the reservoir. Because the neurons are interconnected, the reservoir possesses a memory, which depends largely on the scaling of the internal connections. Extra memory can also be introduced for every neuron individually by retaining a part of the previous neuron output value. To work properly, the reservoir needs to satisfy the Echo State Property [16, 15]: the reservoir needs to wash out any information from its initial conditions. In practice, a reservoir network consists of the following:
Figure 2.1: A schematic representation of a typical reservoir network. Solid arrow lines are not
trained. Dashed arrow lines are trained connections.
u[k]  the reservoir input vector at time step k
x[k]  the reservoir state at time step k
y[k]  the output vector at time step k
A schematic representation is given in Figure 2.1.
The reservoir state x[k], which retains a fraction (1 − λ) of the previous state x[k−1] at each time step k, is given by:
x[k] = (1 − λ) x[k−1] + λ f(Wr x[k−1] + Wi u[k] + Wb)
The weights of the internal connections Wr are initialized with random values from the normal distribution, but scaled so that the largest absolute eigenvalue of the random matrix equals a given parameter value: the spectral radius (see subsection 2.5.2). The input weights Wi are initialized randomly as well, but are rescaled by the input scaling parameter (subsection 2.5.1). A bias with scaling Wb is sometimes added to the input (see subsection 2.5.4).
The output is calculated by:
y[k] = Wor x[k] + Woi u[k] + Wob
The output weights Wor (reservoir to output), Woi (input to output) and Wob (output bias) need to be trained. They are the dashed connection arrows in Figure 2.1.
2.3 Training
The main advantage of training recurrent neural networks using Reservoir Computing is that only the weights Wor from the reservoir neurons to the output need to be trained. An additional linear connection straight from input to output with weights Woi is sometimes added, as well as a constant value (or bias) with weight Wob.
This training approach not only reduces the time required for training, but also allows a wider variety of methods to train the output weights. For this work, two different training methods are used:
2.3.1 Linear regression
The first step in training the weights using linear regression is to let the reservoir run over all the samples and keep the reservoir state at every time step k in a matrix A. Suppose we want to train the weights W using simple linear regression; we then want to find the least-squares solution minimizing the difference between the desired output y and the predicted output ŷ:
Wopt = argminW ||A W − y||²
There exists a closed-form solution:
Wopt = (AᵀA)⁻¹ Aᵀ y
Although A is large (nsamples × nneurons), it is still possible to calculate the output weights relatively fast compared to other RNN training techniques such as BPTT.
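On a small synthetic example (a random stand-in for the state matrix A and a noiseless target), the closed-form solution recovers the generating weights exactly; the sizes below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the collected state matrix A (n_samples x n_neurons).
A = rng.standard_normal((200, 10))
w_true = rng.standard_normal(10)
y = A @ w_true                          # noiseless target

# Closed-form least squares: W_opt = (A^T A)^{-1} A^T y,
# computed via a linear solve rather than an explicit inverse.
W_opt = np.linalg.solve(A.T @ A, A.T @ y)
```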
2.3.2 Logistic regression
Logistic regression is a classification method that models the probability of an input sample x belonging to a certain class. In contrast to other probabilistic models, logistic regression uses a discriminative approach, which classifies the inputs directly with the following probability:
p(C1|x) = f(x) = σ(wᵀx + w0) = 1 / (1 + exp(−wᵀx − w0))
and p(C2|x) = 1 − p(C1|x) when solving a binary classification problem.
Figure 2.2: An example of classification using the logistic function. The shaded area is the overlap area between decision spaces. The red line is the suggested hard threshold.
These distributions are visualized in Figure 2.2. When the distributions are not linearly separable¹, there is an overlap region, which means a hard threshold needs to be defined. This is often the point at which both probability functions intersect, but other thresholds can be chosen if the misclassification costs differ between the two classes.
The weights of the logistic regression model are found by minimizing the cross-entropy function:
E(w) = −ln p(t|w) = −Σₙ₌₁ᴺ [tn ln yn + (1 − tn) ln(1 − yn)]
where tn ∈ {0, 1} is 1 if the input sample belongs to class C1, and yn is the predicted output of the model. Note that correctly classified samples that lie far from the decision boundary are not penalized. In ridge regression, by contrast, samples that are correctly classified but lie far from the target output are still penalized. This is further illustrated in Figure 2.3.
The minimization has no closed-form solution, but it is a convex problem², so we can find it through gradient descent³. There exists a gradient-descent approach
¹ Two sets of points in two dimensions are linearly separable if they can be separated by a single straight line.
² A convex problem is a problem that has a unique minimum.
³ Gradient descent is an optimization algorithm that finds the minimum error by taking steps along the negative of the gradient (or derivative) of the error function.
Figure 2.3: The error measure E(z) for the mean squared error of target and model output (used, e.g., in classification using linear regression) and the cross-entropy function used in logistic regression. In z = y·t, y is the model output and t the target output of the model. For t = 1, a model output y = 2 is penalized more by the mean squared error than a model output of y = 1, although both are classified correctly. This is not true for the cross-entropy function, which may therefore be more suitable for classification.
based on the Newton–Raphson iterative optimization scheme, called iteratively reweighted least squares (IRLS) [23]. At each iteration the weights are updated by subtracting the derivative of the error function divided by its second derivative. The derivation and specifics of this algorithm are not important for this work, but the basic steps of each iteration τ are:
y = σ(Xw(τ))
Rnn = yn(1 − yn)
z = Xw(τ) − R⁻¹(y − t)
w(τ+1) = (XᵀRX)⁻¹ XᵀRz
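A minimal IRLS implementation of the four update steps above might look as follows; the toy one-feature dataset, the noise level and the iteration count are illustrative only.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def irls(X, t, n_iter=15):
    """Iteratively reweighted least squares for logistic regression.
    X: (n, d) inputs (last column all ones for the bias), t: 0/1 labels."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        y = sigmoid(X @ w)                  # y = sigma(X w)
        r = y * (1 - y)                     # diagonal of R
        z = X @ w - (y - t) / r             # z = X w - R^{-1}(y - t)
        # w = (X^T R X)^{-1} X^T R z
        w = np.linalg.solve(X.T @ (r[:, None] * X), X.T @ (r * z))
    return w

rng = np.random.default_rng(1)
x1 = rng.standard_normal(100)
X = np.c_[x1, np.ones(100)]                            # one feature + bias column
t = (x1 + 0.5 * rng.standard_normal(100) > 0).astype(float)  # noisy labels
w = irls(X, t)
```

The noise on the labels keeps the toy data non-separable, so the Newton updates converge to a finite weight vector.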
2.4 Regularization
When training a complex system, the model can become overfitted to the training samples. This means that the model will perform well on the training set, but not on new test data, because it is trained on examples that are not representative of the full range of possibilities. When the model is then tested on a sample it has not seen in the training set, it won't know what to do with it. For example [27], suppose we want to train a model to predict the Fibonacci sequence [1, 1, 2, 3, 5, 8, ...], and we give it the examples
[1, 1]; the model will then be trained to always output 1. The training examples, as in the Fibonacci sequence, are instances of the underlying characteristics, affected by a small deviation or noise. The underlying characteristic of the Fibonacci sequence is known: the n-th Fibonacci number can be calculated by rounding ((1 + √5)/2)ⁿ / √5. The deviation between the training examples and the underlying characteristic therefore always lies between −1/2 and 1/2. If the model is too complex, this deviation is trained as well. One way to smooth out this noise is to constrain the weight size, which makes the model less sensitive to noise and slight deviations. However, if this constraint is too strict, the model is simplified too much to learn the underlying characteristics; a trade-off needs to be made. Using Tikhonov regularization [25], or ridge regression, the trade-off can be tuned with a single regularization parameter λ [32].
To find the least-squares solution, the regularized weights W are now found by:
Wopt = argminW ||A W − y||² + λ ||W||²
in which λ is the regularization parameter. This minimization problem has a closed-form solution as well; the weights can be calculated as follows:
Wopt = (AᵀA + λI)⁻¹ Aᵀ y
When λ is large, the size of the squared weights increases the cost considerably. Setting λ too high, however, will increase the distance between the optimal solution and the regularized solution (referred to as underfitting). To optimize λ, the same model is trained repeatedly, each time with a different λ. The performance of each λ is then evaluated on new samples that are not part of the training set.
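The effect of λ can be sketched directly from the closed-form ridge solution: a larger λ shrinks the weights. The matrix sizes and noise level below are arbitrary toy choices.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
y = A[:, 0] + 0.1 * rng.standard_normal(50)   # noisy toy target

def ridge(A, y, lam):
    """Closed-form ridge solution: W = (A^T A + lam I)^{-1} A^T y."""
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)

w_weak = ridge(A, y, lam=1e-6)    # barely regularized
w_strong = ridge(A, y, lam=1e3)   # heavily regularized: much smaller weights
```

In practice λ would be chosen by evaluating each candidate on a validation set, as described above.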
For training reservoir networks in particular, it is important to note that the weights depend on the random initialization, and that a regularization parameter needs to be optimized for every reservoir specifically. More regularization is needed as reservoir size increases, because complexity increases: in the extreme case there is a reservoir node for every training sample, mapping each sample exactly to the output. On the other hand, if the reservoir size is extremely small, no regularization is needed because the model is not as complex.
For logistic regression, proper regularization is often necessary as well. The optimized
regularization parameter can be added to the IRLS algorithm easily by modifying the
weight update as follows:
w(τ+1) = (XᵀRX + λI)⁻¹ XᵀRz
2.5 Parameters
As mentioned before, a number of parameters need to be determined to control the dynamics of the reservoir. The results of a model trained using Reservoir Computing depend on the careful fine-tuning of these parameters. When using regularization in the readout function, the regularization parameter should be optimized separately for every reservoir parameter setting. Each model is trained using several different regularization parameters and tested on a validation set. The model with the optimal regularization parameter is then evaluated again on a separate test set to be sure of the general performance of the model. We therefore need to divide the dataset into three parts. This could be a problem, especially when using a limited amount of data, because we could accidentally choose a poor set of samples, which can lead to misleading results.
The best solution to counter this problem is cross-validation. The dataset is divided into K subsets. Each subset is then used exactly once as a test set, with the others as the training set. Afterwards, the results are averaged, making sure the result is valid for the complete dataset. When an extra validation set is needed as well, the subsets used for training are divided again into a smaller training subset and a validation set. For example, suppose the dataset consists of 4 samples; the resulting cross-validation scheme is shown in Table 2.1.
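The nested scheme for 4 samples can be enumerated directly: each sample serves once as the test set, and each remaining sample once as the validation set.

```python
samples = [1, 2, 3, 4]

folds = []
for test in samples:
    rest = [s for s in samples if s != test]
    for val in rest:                        # each remaining sample validates once
        train = [s for s in rest if s != val]
        folds.append((train, val, test))

# 12 (train, validation, test) splits in total, matching Table 2.1.
```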
2.5.1 Input scaling
Input scaling determines the scaling of the random input weights of the reservoir. These weights determine how strongly the neurons are excited by new input values. For very low input values, the nonlinear neuron activation functions are barely activated, resulting in an almost linear system. Very high input values, however, saturate the activation function, resulting in almost a binary step function. In other words: the input scaling determines the degree of nonlinearity in the system.
2.5.2 Spectral radius
The spectral radius of a reservoir is the largest absolute eigenvalue of the weight matrix
of the internal connections between the neurons in the reservoir. It therefore defines the
factor by which the previous states are multiplied in the reservoir state update (section 2.2).

Training set   Validation set   Test set
1, 2           3                4
1, 3           2                4
2, 3           1                4
1, 2           4                3
1, 4           2                3
2, 4           1                3
1, 3           4                2
1, 4           3                2
3, 4           1                2
2, 3           4                1
2, 4           3                1
3, 4           2                1

Table 2.1: An example cross-validation scheme where a dataset of 4 samples is divided into a training set, a validation set, and a test set [27].

If we choose a spectral radius < 1, the input values will eventually fade out, ensuring
stability and the echo state property. With a spectral radius > 1, the reservoir can become
unstable if the reservoir is near linear.
The internal connections between the neurons add memory to the reservoir. The spectral radius and the scaling of the internal connection weights therefore influence the time scale of the reservoir. For input that evolves slowly, or that has long-range temporal interactions, the spectral radius is usually chosen close to 1, or even higher if the reservoir is nonlinear enough.
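Scaling a random weight matrix to a chosen spectral radius is a one-liner; the matrix size and the target radius of 0.95 below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)
W_r = rng.standard_normal((100, 100))   # random internal weights

# Rescale so the largest absolute eigenvalue equals the target radius.
target = 0.95
W_r *= target / np.max(np.abs(np.linalg.eigvals(W_r)))

spectral_radius = np.max(np.abs(np.linalg.eigvals(W_r)))
```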
2.5.3 Leak rate
The leak rate of each neuron in the reservoir controls the retention rate of the previous neuron output. It influences the memory of the reservoir directly. This also means it reduces the influence of new state updates, and therefore makes the reservoir adapt more slowly to new situations. A trade-off thus needs to be made between the influence of long-term dynamics and the influence of new input.
2.5.4 Bias scaling
A constant 1 may be added to the input, multiplied by the bias scaling parameter. This shifts the working point on the sigmoid activation function tanh of the neuron. The steepness of the sigmoid is largest around the origin. Shifting the working point upward or downward
therefore makes the reservoir less dynamic. An illustration of the influence of the bias on the activation function is shown in Figure 2.4.
Figure 2.4: Illustration of the effect of bias scaling. Using 0 as the working point for the input broadens the spectrum of the neuron activation (red line). When the bias is shifted, the neuron exhibits less dynamic behavior (green line).
2.5.5 Reservoir size
The reservoir size is the number of neurons in the network. Increasing the reservoir size usually improves the result, assuming sufficient regularization. It is therefore not really an optimizable parameter, as the reservoir size is normally determined by the computational power available.
2.6 Time series prediction
Predicting time series, which is essentially predicting the future, has been the focus of much research throughout history. The basic idea is to first observe a training sequence and then try to complete the sequence over a number of steps into the future, also referred to as the number of freerun steps in RC literature.
To be able to predict from the history alone, there needs to be a pattern in the sequence. That pattern can either be periodical (like 1, 2, 1, 2, 1, 2, ...) or contain a certain trend (such as the Fibonacci sequence: 0, 1, 1, 2, 3, 5, 8, ...).
2.6.1 Time series prediction using Reservoir Computing
Reservoir Computing has already been successfully used for the signal generation task [17].
The model is trained to always output the next step ahead, driven by the true signal values (teacher forcing). After training these signal dynamics, the first unknown signal value is predicted according to the learned signal transitions. The predicted value is then used as the input value to predict the following signal value, and so on. This continues until the required number of predicted steps is reached.
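The freerun loop itself is independent of the model; the sketch below uses a trivial stand-in for the trained one-step predictor just to show the warm-up / feedback structure.

```python
def freerun(history, n_steps, step):
    """Warm up on the observed history (teacher forcing), then feed each
    prediction back as the next input for n_steps freerun steps."""
    u = history[0]
    for observed in history[1:]:
        step(u)                  # drive the model with the true past values
        u = observed
    preds = []
    for _ in range(n_steps):
        u = step(u)              # feed the prediction back as the next input
        preds.append(u)
    return preds

# Stand-in predictor: a signal that halves at every step.
out = freerun([8.0], 3, step=lambda u: 0.5 * u)   # [4.0, 2.0, 1.0]
```

With a trained reservoir, `step` would run one state update and return the readout value.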
It is very important to let the reservoir neurons warm up sufficiently before starting prediction, as the model obviously can't complete a sequence if it doesn't know the first part of the sequence. Warming up a reservoir is done simply by letting the reservoir run over the history of the signal and feeding in the observed values. The evolution of neuron activations when running over an example power demand profile can be seen in Figure 2.5. In this case, the neuron output values are initialized at a value close to 0 and take about 100 state updates before patterns begin to appear in the reservoir.
Figure 2.5: An example of the output values of the first 10 neurons over the first 200 state updates when running the reservoir over a power demand profile of a car
One of the shortcomings of time series prediction with Reservoir Computing is that it often only acts on short-term time scales, although this can be partly remedied by retaining a part of the previous state. However, RC has already been applied successfully to long-term financial time series prediction in [31] and [26]. Financial time series can often be decomposed into periodical patterns, a trend and a remainder. These signals can then be predicted separately and recombined after prediction. In this work we predict the time series of vehicle behavior such as the power demand course, the speed profile
and acceleration. After initial experiments, however, it became clear that these are not as straightforward to predict as the series in this previous work, and first results were poor. Our approach to this problem is explained in Chapter 4.
2.7 Classification
Reservoir Computing can also be successfully applied to many temporal classification problems such as speech recognition [27] and the detection of epileptic seizures [7]. In the first example, the samples are isolated and need to be classified as a whole. In the second, the input is a signal where the class needs to be determined at every time step. Every classification then depends on the current input values and the previous input samples (thanks to the memory properties of reservoir networks). The advantage of classification using RC is that the reservoir not only maps the input values to a high-dimensional feature space, but also memorizes input values in the reservoir for a certain time, so that temporal patterns can be detected as well. The reservoir states can then be classified using any classification technique. The classification problem handled in this thesis is explained and addressed in Chapter 5.
Chapter 3
Data analysis
The data is supplied by ChargeCar.org [1]. It consists of the GPS coordinates and elevation of each point, sampled at a one-second interval. The dataset contains data from multiple drivers, but the GPS points are not gathered in the same neighborhood, so they can't be combined. We'll focus on the available data from one driver alone, as it already allows for extensive research. We will be using the trips from a driver who drove around south San Francisco. In total about 6915 km is covered in about 159 hours (Figure 3.1). The driver frequently drives along the same road segments. It could therefore be interesting to keep data from previous trips over the same road. Future predictions can then be based on the previous trips when passing through the neighborhood.
Figure 3.1: ChargeCar GPS data
3.1 A road graph
To access data from previous trips easily, a suitable data structure is needed, along with an efficient algorithm to build it. The greatest difficulty is detecting when a car is driving on a road where it has already been, and quickly finding the related GPS points. We follow the approach described by L. Cao and J. Krumm to build a graph of directed road segments [8]. They present a fully automatic method for creating a routable road map from GPS traces of everyday drivers. The algorithm described by Cao and Krumm performs well at ensuring road connectivity and differentiating GPS traces from close parallel roads.
First, to increase efficiency, a separate dataset is made by generally retaining only one point every 30 meters. When the direction change over the last three points is greater than 10 degrees, the GPS points are retained every 10 meters instead. This increases accuracy when the car is making a turn. Some of these points will be close together and should be merged, making sure the connectivity in the graph is not lost. To build the data structure we start with an empty graph. Each trip is then processed sequentially. For each node in a trip, the graph is searched to decide whether the node should be merged with an existing graph node.
Intuitively, a new node n should be merged with node v from the graph if they're on the same road segment. Let e be a road segment that connects v and another node v′ in the road graph. Then n should be merged with v if the distance from e to n is small enough, the trip goes in the same direction as e, and n is closer to v than to v′.
An illustration of the process can be seen in Figure 3.2. The first trip becomes the initial
graph. In the second trip, the 2nd, 3rd, 4th and 5th nodes are merged with the existing
nodes. The 1st, 6th and 7th nodes are copied to the graph. The road segments from the
second trip are connected from the new nodes to the existing nodes to ensure connectivity.
No nodes from trip 3 satisfy the merge conditions so they are simply copied to the graph.
The algorithm without optimization is very inefficient, because every GPS point of the trip needs to be compared to every node in the graph. The time required to add a trip increases dramatically as the number of nodes in the graph increases. However, it is clear that the current GPS point will never need to merge with nodes far away. For this purpose, the nodes are kept in a 2-d tree. Using a 2-dimensional distance tree, all nodes within range can be looked up in O(log N) time [19], where N is the number of nodes in the road graph. Storing the road graph nodes in a 2-d tree therefore significantly reduces the time required to add a trip to the road graph.
Figure 3.2: A simple example of the merge algorithm. The circles represent the retained GPS points. The arrowed lines represent the connecting road segments along with the driving direction.
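A minimal 2-d tree with a range query illustrates this lookup; it is a didactic sketch, not the implementation used in the thesis, and the coordinates are made up.

```python
import math

def build(points, depth=0):
    """Build a 2-d tree: alternate splitting on x and y at each depth."""
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build(points[:mid], depth + 1),
            build(points[mid + 1:], depth + 1))

def within_range(node, q, r, out):
    """Collect every stored point within distance r of query point q."""
    if node is None:
        return
    p, axis, left, right = node
    if math.dist(p, q) <= r:
        out.append(p)
    d = q[axis] - p[axis]
    # Only descend into a half-plane if the query ball can reach it.
    if d <= r:
        within_range(left, q, r, out)
    if d >= -r:
        within_range(right, q, r, out)

pts = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.5, 0.2)]
tree = build(pts)
near = []
within_range(tree, (0.0, 0.0), 1.0, near)   # points within 1.0 of the origin
```

The pruning on `d` is what gives the O(log N) behavior on balanced trees: whole subtrees outside the query radius are never visited.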
3.2 Extracting useful information
Although we had no access to measured data such as speed and power usage, a lot of information can be calculated from the GPS data. Speed, acceleration and power demand are calculated for every sample, using the power model described by ChargeCar.
After building the graph, the complete dataset is used again and mapped onto the road segments between the road graph nodes. This way, the road graph can be built efficiently without losing information about vehicle behavior between those points.
The dataset consists of samples at 1-second intervals. This means that the time spent on a road segment of 30 meters between two nodes can vary considerably, because it depends on the speed of the vehicle over the road segment. In order to correctly calculate and align the average behavior over the road segment, while properly retaining any peaks in the profiles, interpolation is needed. The distance traveled over the road segment is almost constant for every trip. The trips are therefore interpolated every meter, converting them from a sample every second to a sample every meter. Over the rest of this work, the time series described are converted from a time scale to a distance scale. The effect of this interpolation is illustrated in Figure 3.3.
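The resampling from a one-second grid to a one-meter grid can be done with simple linear interpolation over cumulative distance; the short speed profile below is invented for illustration.

```python
import numpy as np

# One-second samples: speed (m/s) and the cumulative distance (m) they imply.
speed_t = np.array([0.0, 5.0, 10.0, 10.0, 5.0])
dist_t = np.concatenate(([0.0], np.cumsum(speed_t[1:])))   # 0, 5, 15, 25, 30

# Resample the speed profile onto a one-meter distance grid.
dist_m = np.arange(0.0, dist_t[-1] + 1.0)
speed_m = np.interp(dist_m, dist_t, speed_t)
```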
Figure 3.3: The speed profile on a time scale (left) is interpolated to a distance scale (right)
Through testing we also noticed that moving vehicles exhibit very different behavior from stopping vehicles. Suppose a vehicle stops 50% of the times it passes through an intersection where the speed limit is 50 km/h. The average speed over the road segment would then be 25 km/h. However, the vehicle rarely actually drives at this speed: it usually either stops and continues slowly over the intersection, or it doesn't need to stop and keeps driving at 50 km/h. We therefore separate the captured information into a slow set and a fast set on every road segment.
When passing through a road segment, the current vehicle behavior is added to the slow set if:
- The car stops on the current road segment, i.e. the car drives slower than 2 m/s = 7.2 km/h at any point over the road segment.
- Or, the car stops in a road segment within 100 m before or after the current position, and the car's average speed over the segment is more than 2 m/s slower than the total average speed of the fast set.
If neither condition is satisfied, the information about the car's behavior over the segment is added to the fast set.
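The two membership tests can be written down directly; the function below is a hypothetical sketch using the thresholds quoted above (a 2 m/s stop speed and a 2 m/s margin below the fast-set average), with simplified inputs.

```python
STOP_SPEED = 2.0   # m/s, i.e. 7.2 km/h

def is_slow(segment_speeds, segment_avg, stops_within_100m, fast_set_avg):
    """True if this pass over the segment belongs in the slow set.
    stops_within_100m: whether the car stops in a segment within 100 m
    before or after the current position."""
    if min(segment_speeds) < STOP_SPEED:                         # condition 1
        return True
    if stops_within_100m and segment_avg < fast_set_avg - 2.0:   # condition 2
        return True
    return False
```

For example, a pass with a minimum speed of 1 m/s lands in the slow set regardless of nearby stops, while a steady pass near the fast-set average stays in the fast set.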
The average profiles over a path in the road graph can now be collected by concatenating the average profiles captured in each road segment. An example of the average speed profiles is given in Figure 3.4. Small jumps in the average profiles couldn't be avoided, because the number of trips driving over the different road segments in the chosen path varies.
Figure 3.4: An example of the average speed profiles over a trip in the road graph
3.3 Error measures
To evaluate the techniques used, some sort of evaluation method is needed. The ultimate goal is to minimize battery usage, but the controller using these predictions remains hypothetical. It is therefore impossible to define a single number that captures the goodness of a model. It is still possible, however, to reason about the usefulness of the proposed models using the following error measures.
3.3.1 Defining a prediction distance
If we want to evaluate the predictive capabilities of each model, we should first specify on what scale prediction is required. Of course, this can be very different for each application. The initial purpose of this work, however, is the improvement of a battery/capacitor controller, so we will focus on this example.
The ideal situation would be to allow the capacitor to empty completely while driving at a constant speed, so that it is ready to store all the energy contained in the moving car.
The theoretical maximum prediction distance is then the distance traveled to empty the
capacitor. This can be calculated using the specifications of the vehicle used to capture
the GPS data.
In the ChargeCar data a Honda Civic is used. The power required for this vehicle driving at a constant speed can be calculated as follows, with the constants and units described in Table 3.1:

P = (A/2) · Cd · D · v^3 + Cr · m · g · v
Symbol | Description                  | Value
A      | Frontal area                 | 1.988 m^2
Cd     | Drag coefficient             | 0.31
D      | Air density                  | 1.29 kg/m^3
Cr     | Roll resistance coefficient  | 0.015
m      | Car mass                     | 1200 kg
g      | Gravitational acceleration   | 9.81 m/s^2
W      | Capacitor energy capacity    | 190080 J
v      | Vehicle velocity             | m/s
P      | Power                        | W

Table 3.1: Constants and units needed to calculate the prediction distance
The supercapacitor used in the ChargeCar test car is the Maxwell BMOD0165. The
maximum stored energy in this capacitor is 52.8 Wh or 190080 Joule. The time needed
to use the 190080 J while driving at constant velocity v is then:
t = 190080 J / ((A/2) · Cd · D · v^3 + Cr · m · g · v) = 190080 J / (0.3975 · v^3 + 176.58 · v)

The distance covered as a function of that time and velocity is then:

d = v · 190080 J / (0.3975 · v^3 + 176.58 · v)
Figure 3.5: Maximum prediction distance vs. vehicle velocity
Figure 3.5 shows that the required distance decreases as speed increases. At 1 km/h the maximum distance is 1076 meters. As the speed approaches 0, the distance approaches +∞. From this result we would conclude that we need to predict the coming kilometer. However, the car will usually be driving at a minimum of 50 km/h. The distance required at this speed is 750 m.
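Using the constants from Table 3.1, the maximum prediction distance at a given constant speed can be computed directly. This is a small sketch; the function name is ours.

```python
def prediction_distance(v_mps, energy_j=190080.0):
    """Distance (m) covered while draining the capacitor at constant speed.
    0.3975 = (A/2)*Cd*D and 176.58 = Cr*m*g, per Table 3.1."""
    power_w = 0.3975 * v_mps ** 3 + 176.58 * v_mps
    return v_mps * energy_j / power_w

print(prediction_distance(1 / 3.6))   # about 1076 m at 1 km/h
print(prediction_distance(50 / 3.6))  # about 750 m at 50 km/h
```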
3.3.2 Root Mean Square Error (RMSE)

The root mean square error is a frequently used measure of the deviation between a model's predictions and the actual observed values. It allows us to compare two signals (e.g. the speed profiles) and aggregate the point-wise differences (or residuals) between them into a single number to evaluate the models used. For vectors x (observed values) and x̂ (predicted values) the RMSE is calculated as follows:
RMSE(x, x̂) = sqrt(MSE(x, x̂)) = sqrt( (1/n) · Σ_{i=1..n} (x_i − x̂_i)^2 )
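In code, the RMSE between two profiles is a one-liner (sketch using NumPy):

```python
import numpy as np

def rmse(observed, predicted):
    """Root mean square error between two equally long signals."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))

rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])  # sqrt(4/3) ≈ 1.155
```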
3.3.3 Kurtosis Difference
Kurtosis is a measure of peakedness, in machine learning usually used as a measure of non-Gaussianity [3]. (Gaussianity is the similarity of a distribution to the normal, or Gaussian, distribution.) Through experiments we noticed that a model sometimes converges to a weighted average of the history of the current trip. This might be the best result according to the RMSE and MAD error measures above, but it will not be as useful for a controller
because a controller needs a predictor that can correctly predict energy spikes or other
events that have a large influence on energy usage. It could then be useful to compare the
peakedness of the prediction profile with the peakedness of the observed profile. We could
then get a better idea of the similarity of both signals.
The peakedness (or kurtosis) of a vector x with mean x̄ is calculated by:

Kurt(x) = [ (1/n) · Σ_{i=1..n} (x_i − x̄)^4 ] / [ (1/n) · Σ_{i=1..n} (x_i − x̄)^2 ]^2 − 3
To compare the peakedness, the kurtosis difference error measure is presented. No previous work was found on this measure, at least in the context of machine learning. The kurtosis difference is merely used in an attempt to quantify the models in a secondary way. The kurtosis difference of the model output ŷ and the target output y is calculated as Kurt(ŷ) − Kurt(y).
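Both quantities are straightforward to compute; the sketch below uses the sample-moment form of the formula above (function names are ours):

```python
import numpy as np

def kurtosis(x):
    """Excess kurtosis: fourth central moment over the squared second
    central moment, minus 3 (so a Gaussian scores approximately 0)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.mean(d ** 4) / np.mean(d ** 2) ** 2 - 3.0)

def kurtosis_difference(y_pred, y_target):
    """Secondary error measure: Kurt(prediction) - Kurt(target)."""
    return kurtosis(y_pred) - kurtosis(y_target)
```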
3.3.4 Receiver Operating Characteristic (ROC)
The Receiver Operating Characteristic (ROC) is a classification error measure developed in WWII by radar engineers and has since been used in a large number of areas. In recent years it has also gained a lot of interest in the field of machine learning and pattern recognition [13].
The output of classifier models is usually continuous, but it is often hard to evaluate the performance of these models because a hard threshold usually needs to be set to classify the output. The ROC curve is able to visualize the trade-off between the hit rate (or true positive rate, TPR) and the rate of false positives (FPR) in binary classification. The TPR is equivalent to the proportion of actual positives that are correctly identified. The FPR, on the other hand, is the proportion of negatives that are wrongly identified as positive.
The curve is calculated by iterating over every possible hard threshold that can classify the output of the model. Classifiers appearing on the lower left-hand side of an ROC curve can be thought of as strict or conservative (A in Figure 3.6). They only make positive classifications with strong evidence. Classifiers appearing on the right-hand side of the ROC curve are less selective and result in a lot of false positives (B). Intuitively, the best model then stretches as far as possible to the top left-hand side (C). Random classification models result in points along a straight line from the bottom left to the top right of the graph (D). Classifiers under this line (E) can be thought of as worse than random, but if we invert
Figure 3.6: A basic ROC graph showing five discrete classifiers
the classifier and switch the target classes, the classifier's result is inverted, which makes it perform better than the random classifier again. Consequently, models that extract no knowledge from the data will roughly follow the straight line of the random classifier.
The ROC measure also provides a way of evaluating model performance without setting a threshold, by calculating the total surface area under the curve, known as the Area Under Curve (AUC). We will mainly be using this measure to compare the performance of models for binary classification.
A single threshold can be selected along the ROC curve to know the exact percentage of correctly classified samples for a selected trade-off between false and true positive rates. Often the threshold is chosen where the false and true positive rates are equal. If the classifier is used for a specific purpose, the cost of a false positive is sometimes higher than the cost of a false negative, or vice versa. The threshold can then be optimized for that application specifically. Two example ROC curves are shown in Figure 3.7, along with the line at which the two error rates are equal.
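The threshold sweep and the AUC can be sketched as follows. This is a minimal version for 0/1 labels; tied scores are not handled specially, and the function names are ours.

```python
import numpy as np

def roc_curve(scores, labels):
    """Sweep a threshold over descending classifier scores and return the
    false and true positive rates at every cut (labels are 0/1)."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    labels = np.asarray(labels)[order]
    tpr = np.concatenate(([0.0], np.cumsum(labels) / labels.sum()))
    fpr = np.concatenate(([0.0], np.cumsum(1 - labels) / (1 - labels).sum()))
    return fpr, tpr

def auc(fpr, tpr):
    """Area Under Curve by trapezoidal integration of the ROC points."""
    return float(sum((fpr[i + 1] - fpr[i]) * (tpr[i + 1] + tpr[i]) / 2.0
                     for i in range(len(fpr) - 1)))
```

A classifier that ranks every positive above every negative reaches an AUC of 1.0; a random one stays near 0.5.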
Figure 3.7: Example of the ROC curves of two models, including the Equal Error Rate line, where the True Positive Rate is equal to the False Positive Rate.
Chapter 4
Time series prediction of vehicle power, speed and acceleration
To predict the future power demand profile of a car using RC techniques, a different approach than for the prediction of financial time series is needed, because there are typically fewer periodic factors in driving a car. Stopping and accelerating often happen in a similar way, but these stops come at near-random intervals if there is no pre-existing information about the car's environment.
Vehicle power demand depends on many physical factors and can be decomposed in other ways: elevation differences, acceleration, speed, etc. Elevation is not expected to change over several trips, so we can read it directly from the road graph. Predicting the vehicle acceleration and speed, however, is far more complex. In this chapter, we present and evaluate several time series prediction models for the vehicle's power demand, acceleration and speed profiles. An example of these profiles can be seen in Figure 4.1. Note that from here on, the profiles are evaluated on a distance scale instead of a time scale, as explained in the previous chapter.
4.1 Evaluation methodology
The theoretical prediction distance calculated in section 3.3.1 was 750 m. However, the memory capacity of reservoirs is limited, and after predicting a number of steps, the influence of the real observed samples on the reservoir diminishes [14]. In the setup presented by Jaeger, the input was forgotten after about 400 time steps. When noise was added, the memory reduced to around 200 time steps. Experiments showed that trying to predict any further than 200 m with the models presented here yielded misleading results that were difficult to explain. We therefore decided to limit the evaluation to predicting the
Figure 4.1: An example power demand, speed, acceleration and elevation profile
first 200m.
As mentioned in Chapter 3, in total about 6915 km is covered. The dataset was converted to a distance scale, which means it now contains about 6,915,000 samples. To train and test the models in this chapter, all trips in the dataset were divided into 200 m intervals. The information of previous trips over each interval was extracted from the road graph and merged with the intervals.
For time series prediction, especially using RC, a warm-up period is needed (see section 2.6.1). We couldn't use the reservoir states of the previous intervals because those states contain the predicted signal over the interval, not the real signal values. A poor prediction would therefore affect the next prediction. Moreover, the previously predicted interval is not always a part of the current trip.
A warm-up period is therefore added before every prediction interval. This should be long enough to forget the previous states, but short enough to limit memory requirements and the computation time required to run over all the warm-up samples. Because the maximum possible prediction distance seemed to be around 200 m, we expect that 300 m is enough to largely forget the state of the previously predicted interval.
Since the input weights and the internal weights of a reservoir are chosen randomly, one model could happen to be trained with a particularly good reservoir while another was trained with a very weak one. This could lead to misleading results. Therefore, each experiment involving RC is done over 10 different reservoir instances. When plotting the resulting errors, the standard deviation of the results of a model is shown as well, using error bars. In text, the standard deviation is given in parentheses along with the average results.
To include all intervals, the reservoirs would now need to run 10 times over 17,287,500 samples. Additionally, we are predicting the power demand as well as the speed and acceleration profiles. Too much time would be required to finish all experiments in a reasonable time frame. Therefore, a random subset of 2,230,500 samples (containing 9566 prediction intervals) was chosen and fixed for the remaining experiments.
Figure 4.2: Example of splitting the profiles in prediction intervals

Because of the restrictions above, cross-validation was unfortunately not an option. Instead, the models were trained on the first 25% of the trips. The next 25% was used as a validation set to optimize the model parameters. The performance of the resulting models was then evaluated on the remaining 50% of the dataset. The test set was chosen large enough to ensure the models are evaluated on enough data to draw general conclusions.
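The split over whole trips, rather than over individual samples, could be sketched as follows (a hypothetical helper; the thesis does not specify the exact code):

```python
def split_trips(trips):
    """Chronological 25% / 25% / 50% split into train, validation and test,
    performed over whole trips so no trip straddles two sets."""
    n = len(trips)
    train, valid = trips[: n // 4], trips[n // 4 : n // 2]
    test = trips[n // 2 :]
    return train, valid, test
```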
4.2 Baseline models
4.2.1 System setups
First, some simple and linear models are presented and evaluated. For each model we
define an abbreviation in the paragraph title to be able to refer to them more clearly
afterwards.
Last value as prediction (LV)
The last observed value is used as the prediction for the next predicted values: ŷ(t) = y(t − 1). As speed is usually quite constant, this should already provide a good estimate. The longest distances in the dataset are often on highways, where speed hardly changes.
Averages from previous trips as prediction (SA/FA)
The averages of the previous trips over the path of the current trip are used directly as
prediction. The performance of the slow average profiles (SA) and the fast average profiles
(FA) are evaluated separately.
Offset averages as prediction (OA)
The predictions of the previous model don't make any use of the current trip, however. At least the first few predicted values should be close to the last known values. To solve this, we first calculate the difference between the current trip and the averages at the last observed sample of the interval. This offset is then added to the averages over the rest of the prediction.
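The OA baseline amounts to shifting the average profile so it matches the trip's last observed sample (a sketch; the names are ours):

```python
import numpy as np

def offset_average_prediction(avg_profile, last_observed, avg_at_last):
    """Shift the historical average profile by the gap between the current
    trip's last observed value and the average at that same position."""
    return np.asarray(avg_profile, dtype=float) + (last_observed - avg_at_last)

offset_average_prediction([10.0, 11.0, 12.0], last_observed=9.0, avg_at_last=10.0)
# → array([ 9., 10., 11.])
```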
Weighted time delay window (TDW)
A weighted average is taken of the previous values. This allows the model to incorporate the recent history of the current trip. The weights of every point in the recent history window y(t − n_window), ..., y(t − 1) are trained using ridge regression to predict one step ahead: y(t). The predicted value ŷ(t) is fed back and used as part of the current trip history: y(t − n_window + 1), ..., y(t − 1), ŷ(t), to predict y(t + 1).
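A minimal version of this model — a ridge-trained one-step predictor whose output is fed back into its own delay window — might look like this. The function names and the explicit normal-equation solve are our choices, not the thesis implementation.

```python
import numpy as np

def train_tdw(series, window, ridge=1e-6):
    """Fit weights w (plus bias) so that the last `window` values predict
    the next one, via the regularized normal equations."""
    X = np.array([series[t - window:t] for t in range(window, len(series))])
    X = np.hstack([X, np.ones((len(X), 1))])  # bias column
    y = np.asarray(series[window:], dtype=float)
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ y)

def predict_tdw(w, history, n_steps):
    """Predict n_steps ahead, feeding every prediction back into the window."""
    buf = list(history[-(len(w) - 1):])
    out = []
    for _ in range(n_steps):
        y_hat = float(np.dot(w[:-1], buf) + w[-1])
        out.append(y_hat)
        buf = buf[1:] + [y_hat]  # slide the delay window forward
    return out
```

Trained on a constant speed signal, the iterated predictions stay at that constant, illustrating the feedback loop.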
Of course, we need to know how many previous values need to be incorporated: the
size of the time delay window needs to be determined. As shown in Figure 4.3, the opti-
mal window size is 2 for the speed profile. For the acceleration profile, a window of size 3
is chosen. Lastly, for the power profile, a very large window size is preferred. The error
function flattens out around window size 200. This value is therefore chosen as the window
size.
Figure 4.3: RMSE error values vs. time window size in the TDW model
Weighted time delay window with averages (TDWAt/TDWAtdw)
Figure 4.4: The weighted time delay model (TDW) used for each profile (with added averages at step t (TDWAt) in dashed rectangle)

The TDW model above can be extended with input of the average profiles of previous trips to investigate the influence of the information in the road graph. The predicted value at each step is combined with the average value at that step (TDWAt). Additionally, this model is extended again by also including a weighted average of the history of the average profiles (TDWAtdw).
Initially all information from the road graph was included: the slow and fast averages
of power demand, speed and acceleration as well as the average chance to stop over the
current road segment and the elevation difference between two successive samples.
Some of the information in the road graph is not useful. To detect the contributing averages, feature selection is done on the input dimensions using Least Angle Regression (LAR). With this method, the weights of input dimensions that hardly contribute towards improving the solution can drop to 0, which is not possible using linear (or ridge) regression. For the specifics of this algorithm we refer to the article by Efron et al. [12]. This method was chosen over other similar stepwise methods such as forward feature selection because it is just as fast as forward feature selection but generally performs better [12]. No additional experiments were done in this thesis to confirm this, however.
For every profile, the last observed value was selected. For the power demand profile,
the fast average power value was selected as well. For the speed profile almost all averages
remained non-zero, except for the slow average speed. Lastly for the acceleration profile,
both the fast average acceleration and the average chance to stop were chosen.
Training the TDWAt model weights using LAR does not perform as well as training
the weights using ridge regression i