Jonas Buyl - Power Demand Prediction of Vehicles on a Non-fixed Route
Jonas Buyl
Power demand prediction of a vehicle on a non-fixed route
Promoters: prof. dr. ir. Benjamin Schrauwen, dr. ir. David Verstraeten
Supervisors: Pieter Buteneers, Tim Waegeman
Master's thesis submitted to obtain the academic degree of Master of Science in Engineering: Computer Science
Department of Electronics and Information Systems
Chair: prof. dr. ir. Jan Van Campenhout
Faculty of Engineering and Architecture
Academic year 2011-2012
Power demand prediction of a vehicle on a non-fixed route
Jonas Buyl
Supervisors: prof. dr. ir. Benjamin Schrauwen, dr. ir. David Verstraeten, ir. Pieter Buteneers, ir. Tim Waegeman.
Abstract: In this article several approaches are presented to predict the future power demand and speed of a car, as well as the chance that it stops within the next 200 m. We introduce a time series prediction model based on Reservoir Computing, a novel technique for training recurrent neural networks. The model is improved by using information from previous trips and by post-processing the predicted output window. Furthermore, we present an RC-based classifier to predict the chance that a car stops within the next 200 m. This classifier is used to split the model into one trained on data where the car keeps driving and one trained on intervals where the car stops.
Index Terms: Reservoir Computing, vehicle behavior prediction, stop prediction, road graph, electric vehicles, time series prediction
I. INTRODUCTION
Electric vehicles (EVs) are increasingly commercially viable, but sales figures remain fairly disappointing, often because of the high price. The battery has been the main way of storing energy in EVs because of its large power-to-weight ratio, but batteries are not as capable as capacitors of handling peaks in power demand. New research proposes to use supercapacitors, capacitors with an energy density much greater than that of regular capacitors, in electric vehicles to replace batteries.
The ChargeCar project [1] suggests combining the advantages of both, making it possible to use cheaper batteries together with supercapacitors to reduce the manufacturing costs of EVs. The capacitor is used as a buffer to handle the high spikes in power demand. This extends battery life-time, increases efficiency in cold weather and can even extend the range of the EV.
To direct the energy flows between battery, capacitor, and engine, a controller is needed. In this article we introduce several approaches to predict vehicle behavior and upcoming stops. These predictions can then be used to improve an intelligent controller.
We make use of Reservoir Computing (RC), a novel way of training recurrent neural networks [3]. Instead of training all internal weights, only the output weights are trained. The weights of the input and the internal connections are generated randomly and remain constant.
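The RC recipe described above can be sketched in a few lines of NumPy. This is an illustrative toy, not the thesis implementation: the task, weight scalings and ridge parameter are assumptions; only the structure (fixed random input and recurrent weights, trained linear readout) follows the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Reservoir size follows the 150-neuron reservoirs mentioned later in the text.
n_in, n_res = 1, 150
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))    # input weights: random, never trained
W_res = rng.uniform(-0.5, 0.5, (n_res, n_res))  # recurrent weights: random, never trained
W_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_res)))  # keep spectral radius below 1

def run_reservoir(u):
    """Drive the reservoir with an input sequence u (T x n_in); collect the states."""
    x = np.zeros(n_res)
    states = []
    for u_t in u:
        x = np.tanh(W_in @ u_t + W_res @ x)
        states.append(x)
    return np.array(states)

def train_readout(states, targets, ridge=1e-6):
    """Ridge regression on the reservoir states: the only trained weights."""
    A = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(A, states.T @ targets)

# Toy task: one-step-ahead prediction of a sine wave, with a 50-step washout.
u = np.sin(0.1 * np.arange(300))[:, None]
S = run_reservoir(u)
W_out = train_readout(S[50:-1], u[51:, 0])
rmse = np.sqrt(np.mean((S[50:-1] @ W_out - u[51:, 0]) ** 2))
```

Because the recurrent part is fixed, training reduces to one linear solve, which is what makes RC cheap compared to backpropagation through time.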
First, we modify a GPS map generation algorithm presented by L. Cao and J. Krumm [2] to keep information about the car's current power demand, speed, acceleration, etc. This information is then used as extra input for time series prediction of the power demand, speed and acceleration profiles using Reservoir Computing [4]. Additionally, we observed that the weight of the predicted output relative to the information from previous trips decreases as the range of the prediction increases. Therefore the predicted output windows are post-processed using a simple linear model. Furthermore, we introduce an RC-based classifier model to predict whether the car stops within 200 m. The classifier is then used to select a separate time series prediction model for situations where the car stops within 200 m.
II. VEHICLE BEHAVIOR PREDICTION
A. Pre-processing
After building the road graph data structure defined by Cao et al. [2], the complete dataset is mapped onto the road segments. To better align the trip data, each trip was interpolated every meter, converting the data to a distance scale.
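The per-meter interpolation can be sketched as follows. The helper and its toy trip data are hypothetical; the thesis only specifies the idea of resampling trip signals onto a uniform distance grid.

```python
import numpy as np

def resample_to_distance(dist_m, values, step=1.0):
    """Resample a trip signal onto a uniform distance grid (one sample per
    `step` meters). dist_m is the cumulative distance along the trip (must be
    increasing); values is the signal sampled at those distances, e.g. speed
    or power demand."""
    grid = np.arange(dist_m[0], dist_m[-1], step)
    return grid, np.interp(grid, dist_m, values)

# Toy trip: GPS fixes at irregular distances, speed in m/s (invented numbers).
dist = np.array([0.0, 3.2, 7.9, 15.4, 20.0])
speed = np.array([0.0, 2.0, 5.0, 8.0, 9.0])
grid, speed_1m = resample_to_distance(dist, speed)
```

After this step, two trips over the same road segment line up sample by sample, which is what makes averaging information across previous trips meaningful.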
B. Single RC time series prediction model (RCLA)
The speed, acceleration and power profiles are used as input in separate systems with a reservoir of 150 neurons each. The neuron output weights are trained using ridge regression [4]. Each predicted value is fed back in an output feedback loop to recursively predict the rest of the sequence. The reservoir state outputs at each step t are extended with the averages of the information from previous trips at that step, to predict the next output value y(t).
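The recursive loop with output feedback and previous-trip averages might look like the sketch below. The readout here is untrained and all sizes and scalings are assumptions; the point is only the structure: each prediction is fed back as input, and the state is extended with the trip average before the readout is applied.

```python
import numpy as np

rng = np.random.default_rng(1)
n_res = 150
W_in = rng.uniform(-0.5, 0.5, n_res)            # feedback/input weights, fixed
W_res = rng.uniform(-0.5, 0.5, (n_res, n_res))  # recurrent weights, fixed
W_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_res)))

def predict_window(W_out, y0, trip_avg):
    """Recursively predict len(trip_avg) steps ahead: each output is fed back
    as the next input, and the state is extended with the previous-trip
    average at the same position before applying the readout."""
    x, y, out = np.zeros(n_res), y0, []
    for avg_t in trip_avg:
        x = np.tanh(W_in * y + W_res @ x)        # output feedback drives the reservoir
        z = np.concatenate([x, [avg_t, 1.0]])    # state + trip average + bias term
        y = float(z @ W_out)                     # next predicted value
        out.append(y)
    return np.array(out)

# Untrained toy readout, only to show the recursion and the shapes involved.
W_out = rng.standard_normal(n_res + 2) * 0.01
window = predict_window(W_out, y0=5.0, trip_avg=np.full(200, 6.0))
```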
C. Output window post-processing (OWPP)
During training, the time series model only learns to predict one step ahead. We observed that the influence of the predicted output relative to the information from previous trips decreases as the range of the prediction increases. The output window is therefore post-processed by applying linear regression at each time step t individually, combining the predicted values with the average values at point t.
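The per-step post-processing can be sketched as fitting one small linear model per window position. This is illustrative: the exact feature set and fitting procedure in the thesis may differ, and the intercept term is an assumption.

```python
import numpy as np

def fit_owpp(pred_windows, avg_windows, target_windows):
    """For each step t of the output window, fit a small linear model that
    blends the model's prediction with the previous-trip average at t.
    All arguments have shape (n_windows, window_len)."""
    n, L = pred_windows.shape
    coefs = []
    for t in range(L):
        # Features at step t: prediction, trip average, intercept.
        X = np.column_stack([pred_windows[:, t], avg_windows[:, t], np.ones(n)])
        w, *_ = np.linalg.lstsq(X, target_windows[:, t], rcond=None)
        coefs.append(w)
    return np.array(coefs)  # shape (window_len, 3)

def apply_owpp(coefs, pred_window, avg_window):
    """Post-process one predicted window with the per-step coefficients."""
    X = np.column_stack([pred_window, avg_window, np.ones(len(pred_window))])
    return np.sum(X * coefs, axis=1)

# Toy data where the target is a known blend of prediction and trip average.
rng = np.random.default_rng(2)
P, A = rng.standard_normal((50, 10)), rng.standard_normal((50, 10))
T = 0.3 * P + 0.7 * A
coefs = fit_owpp(P, A, T)
corrected = apply_owpp(coefs, P[0], A[0])
```

Because the coefficients are fitted per step, the blend can shift weight from the model's prediction to the trip average as the prediction range grows, which is exactly the effect described above.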
D. Stop prediction
A reservoir was used with a logistic regression readout to classify a sample t as a point where the car stops within 200 m. First, the current power demand, acceleration and speed at t are used as input to the reservoir. Secondly, the average acceleration of previous trips at t + 20 is used, excluding trips where the car does not stop within 200 m of t + 20. Lastly, the average chance to stop within 200 m of t + 20 is used. For the evaluation of this classifier, the area under the ROC curve (AUC) is maximized. The true positive rate and false
positive rate are calculated for every threshold that can separate the classes, when thresholding the reservoir readout output in [0, 1]. A maximum average AUC of 0.955 was found, and 94.5% of the samples were correctly classified, tested on a dataset in which 10% of the samples are actual stops. From Figure 1 we can see that the predicted chance to stop is usually high at points where the car stops. Around places where the car brakes but doesn't stop, the output can be high as well. This could be interpreted as an error, but the output may still be useful for some applications.
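The threshold sweep and AUC computation described above can be sketched as follows. This is a simplified version that places a cut after every sample rather than at every distinct threshold; the scores and labels are invented.

```python
import numpy as np

def roc_auc(scores, labels):
    """Sweep the thresholds: sort by descending score, accumulate true and
    false positives at every cut, and integrate the ROC curve with the
    trapezoid rule."""
    order = np.argsort(-scores, kind="stable")
    labels = labels[order]
    tp = np.cumsum(labels)          # true positives when cutting after each sample
    fp = np.cumsum(1 - labels)      # false positives at the same cuts
    tpr = np.concatenate([[0.0], tp / tp[-1]])
    fpr = np.concatenate([[0.0], fp / fp[-1]])
    auc = float(np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2))
    return auc, fpr, tpr

# Toy readout outputs in [0, 1]: three actual stops among ten samples.
labels = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
scores = np.array([0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1, 0.1, 0.0])
auc, fpr, tpr = roc_auc(scores, labels)
```

For these toy scores the AUC equals 20/21 (about 0.952): of the 21 positive-negative pairs, only one is ranked the wrong way around.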
Fig. 1. An example of the output of the RC stop prediction model. The green areas are the target areas where the car stops within 200 m. At the bottom, the speed profile of the trip is given for comparison with the actual car behavior. The chosen threshold is shown as a grey dashed line.
E. Split model time series prediction (RCSP)
The RCLA model was split by training and optimizing one model on a dataset of intervals where the car stops, and another model on the remaining intervals. The stop classifier is then used to determine which model should be used for the time series prediction. Finally, the OWPP filter was split as well and applied to the RCSP model to further improve the results.
III. EVALUATION
The proposed RC-based models were compared with a number of linear methods. The best performing linear methods made use of a time delay window (TDW): a weighted average of the previous values, trained using linear regression. A second model extends the TDW model by including a weighted average of the information from previous trips (TDWAtdw).
Trips of one driver were used from the dataset supplied by the ChargeCar project. A random subset of 2,230,500 samples was chosen and divided into 9566 intervals to predict. Of these data, 25% was used for training, another 25% for validation, and the remaining 50% was used to compare the models. The results of the RC models are averages taken over 10 reservoir instances.
The results of all discussed models are given in Table I. The first RC-based model, RCLA, does not yield much better results than the linear methods. However, after output window post-processing the root mean squared error (RMSE) decreases significantly. The RCSP model predicts the speed better than the other models, and when extended with the OWPP filter, it outperforms every other tested model on the power demand, acceleration and speed profiles. In Figure 2 the absolute deviation is given over the predicted distance. The OWPP improves the results towards the end of the predictions, whereas the RCSP model improves them at the start.
RMSE (STD)   Power (W)     Speed (m/s)    Acceleration (m/s²)
TDW          8766          1.481          0.3136
TDWAtdw      8401          1.502          0.3111
RCLA         8416 (4.47)   1.423 (0.018)  0.3130 (0.0005)
RCLA/OWPP    8386 (3.67)   1.311 (0.012)  0.3081 (0.0002)
RCSP         8367 (25.62)  1.304 (0.017)  0.3023 (0.0005)
RCSP/OWPP    8257 (12.48)  1.257 (0.016)  0.2992 (0.0006)
TABLE I. Average RMSE error rates (and standard deviation).
Fig. 2. Average absolute deviation over the predicted distance.
IV. CONCLUSION
It is possible to use data from previous trips and Reservoir
Computing to predict the future power demand, speed and
acceleration profile. Using a classifier to predict if a stop is
imminent significantly improves the results. Post-processing
the predicted output interval further boosts the performance.
The average absolute deviation of the predicted speed 200 m ahead is 6 km/h.
Both the predicted profiles and the stop predictor could be
used for an intelligent vehicle energy management controller.
REFERENCES
[1] H. Benjamin Brown, Illah Nourbakhsh, Christopher Bartley, Jennifer Cross, Paul S. Dille, Joshua Schapiro, and Alexander Styler. ChargeCar community conversions: Practical, custom electric vehicles now! Number CMU-RI-TR-, March 2012.
[2] Lili Cao and John Krumm. From GPS traces to a routable road map. In Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, GIS '09, pages 3–12, New York, NY, USA, 2009. ACM.
[3] David Verstraeten, Benjamin Schrauwen, Michiel D'Haene, and Dirk Stroobandt. An experimental unification of reservoir computing methods. Neural Networks, 20(3):391–403, April 2007.
[4] Francis wyffels, Benjamin Schrauwen, and Dirk Stroobandt. Stable output feedback in reservoir computing using ridge regression. In V. Kurkova, R. Neruda, and J. Koutnik, editors, Proceedings of the 18th International Conference on Artificial Neural Networks, pages 808–817, Prague, September 2008. Springer.
Power demand prediction of a vehicle on a non-fixed route
Jonas Buyl
Promoters: prof. dr. ir. Benjamin Schrauwen, dr. ir. David Verstraeten
Supervisors: ir. Pieter Buteneers, ir. Tim Waegeman.
Abstract: In this article we present several ways to predict the power demand and speed of a vehicle, as well as the chance to stop within 200 m. We introduce a time series prediction model based on Reservoir Computing, a novel technique for training recurrent neural networks. The model was further improved with information from previous trips and with post-processing of the prediction window. Furthermore, we use an RC classification model that predicts the chance to stop, in order to split the earlier model by training one model on driving data and another on intervals where the car stops. The stop predictor then determines which model should be used to predict the next interval.
Keywords: Reservoir Computing, vehicle behavior prediction, stop prediction, road graph, electric vehicles, time series prediction
I. INTRODUCTION
Electric vehicles (EVs) are increasingly commercially attractive, but sales figures remain disappointing, often because of the high cost of the battery. The battery is the most commonly used way of storing energy in EVs, but batteries are not as good as capacitors at absorbing large power peaks. New research proposes to use supercapacitors, capacitors with a much greater energy density, in EVs instead of batteries.
The ChargeCar project [1] proposes to use the advantages of both, so that cheaper batteries and supercapacitors can be used. The capacitor is then used as a buffer against high peaks in power demand. This improves battery life-time and efficiency in cold weather, and can even extend the range of the EV.
To direct the energy flows between battery, capacitor and engine, a controller is needed. In this article we introduce several ways to predict vehicle behavior and upcoming stops. These predictions can then be used to improve an intelligent controller.
We make use of Reservoir Computing (RC), a fairly new technique for training recurrent neural networks [3]. Instead of training all internal weights, only the output weights are trained. The remaining connections are generated randomly and stay constant.
First, we adapt an automatic GPS map generation algorithm [2] to keep information about the car (such as the current power demand, speed, etc.). This information can then be used to improve the models. Moreover, we observed that the weight of the predicted value relative to the information from previous trips decreases with the predicted distance. The predicted window is therefore post-processed with a simple linear model. Furthermore, we introduce an RC classification model to predict whether the car stops within 200 m. This is then used to split the time series prediction models according to whether the car stops or not.
II. VEHICLE BEHAVIOR PREDICTION
A. Pre-processing
After building the road graph defined by Cao et al. [2], the complete dataset was mapped onto the road segments. To better align the data, the trips were interpolated every meter so that the data is put on a distance scale.
B. Time series prediction with a single RC model (RCLA)
The speed, acceleration and power are used as input to separate reservoirs of 150 neurons each. The output weights are trained with ridge regression [4]. Each predicted value is fed back to recursively predict the rest of the sequence. The reservoir output at each step t is extended with the information of previous trips at that step to predict the next output y(t).
C. Output window post-processing (OWPP)
The time series models are only trained to predict the next step. We noticed that the influence of the predicted output relative to the information of previous trips decreases as the prediction distance grows. The output window was therefore post-processed by applying linear regression at each step t separately, combining the predicted value with the average value at point t.
D. Stop prediction
A reservoir was combined with a logistic regression readout to classify a point t as a point where the car stops within 200 meters, with the following input: the current power demand, the speed, the acceleration, the average acceleration of previous trips at point t + 20 and, lastly, the average chance to stop within 200 meters of point t + 20. For the evaluation of this model, the area under the ROC curve (AUC) was maximized. The true positive and false positive rates are calculated
for every threshold that can separate the classes in the readout output between [0, 1]. We found a maximum average AUC of 0.955. After minimizing the misclassification rate, a threshold was found that classifies 94.5% of the points correctly. From Figure 1 we can see that the predicted chance to stop is usually high where the car stops. Around places where the car brakes but does not stop, the output is sometimes high as well. This counts as an error, but such output may still be useful for other applications.
Figure 1. An example of the output of the RC stop prediction model. The green areas are the target areas where the car stops within 200 m. At the bottom, the speed profile of the trip is given for comparison with the actual behavior. The chosen threshold is shown as a grey dashed line.
E. Split time series prediction (RCSP)
The RCLA model was split by separately optimizing and training one model on a dataset with intervals where the car stops, and another model on the other intervals. The stop predictor is then used to determine which model's prediction should be used. Finally, the OWPP filter is also trained separately and applied to this model to improve the results further.
III. EVALUATION
The RC-based models were compared with several linear methods. The best models make use of a time delay window (TDW): a weighted average of the previous values, trained with linear regression. A second model extends the TDW model with a weighted average of the information from the road graph (TDWAtdw).
The trips of one driver from the ChargeCar project were used. A random subset of 2,230,500 samples was chosen and divided into 9566 intervals to predict. Of these data, 25% was used for training, another 25% for validation, and the remaining 50% was used to compare the models. The results of the RC models are averages over 10 reservoirs.
The results of all models can be found in Table I. The first RC-based model, RCLA, offers no big improvement over the linear methods, but the OWPP filter can strongly reduce the root mean squared error (RMSE). The RCSP model predicts the speed better than the other models, and extended with an OWPP filter it performs better than every other model. In Figure 2 the absolute deviation is given over the predicted distance. The OWPP improves the result at the end of the predictions, while the RCSP model improves the results at the beginning.
RMSE (STD)   Power (W)     Speed (m/s)    Acceleration (m/s²)
TDW          8766          1.481          0.3136
TDWAtdw      8401          1.502          0.3111
RCLA         8416 (4.47)   1.423 (0.018)  0.3130 (0.0005)
RCLA/OWPP    8386 (3.67)   1.311 (0.012)  0.3081 (0.0002)
RCSP         8367 (25.62)  1.304 (0.017)  0.3023 (0.0005)
RCSP/OWPP    8257 (12.48)  1.257 (0.016)  0.2992 (0.0006)
Table I. Average RMSE (and standard deviation).
Figure 2. Average absolute deviation over the predicted distance.
IV. CONCLUSION
It is possible to improve the prediction of the speed, power demand and acceleration by using information from previous trips and Reservoir Computing. With a stop predictor this model can be improved further. Post-processing the output window boosts the performance even more. The average absolute deviation of the speed prediction at meter 200 is 6 km/h. The predicted profiles and the stop predictor can moreover be used together in an intelligent controller to direct the energy flows in an EV.
REFERENCES
[1] H. Benjamin Brown, Illah Nourbakhsh, Christopher Bartley, Jennifer Cross, Paul S. Dille, Joshua Schapiro, and Alexander Styler. ChargeCar community conversions: Practical, custom electric vehicles now! Number CMU-RI-TR-, March 2012.
[2] Lili Cao and John Krumm. From GPS traces to a routable road map. In Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, GIS '09, pages 3–12, New York, NY, USA, 2009. ACM.
[3] David Verstraeten, Benjamin Schrauwen, Michiel D'Haene, and Dirk Stroobandt. An experimental unification of reservoir computing methods. Neural Networks, 20(3):391–403, April 2007.
[4] Francis wyffels, Benjamin Schrauwen, and Dirk Stroobandt. Stable output feedback in reservoir computing using ridge regression. In V. Kurkova, R. Neruda, and J. Koutnik, editors, Proceedings of the 18th International Conference on Artificial Neural Networks, pages 808–817, Prague, September 2008. Springer.
Permission for consultation - Copyright
The author gives permission to make this master dissertation available for consultation
and to copy parts of this master dissertation for personal use. In the case of any other
use, the limitations of the copyright have to be respected, in particular with regard to the
obligation to state expressly the source when quoting results from this master dissertation.
Jonas Buyl June 10, 2012
Acknowledgments
First, I would like to thank my promoters prof. dr. ir. Benjamin Schrauwen and dr. ir. David Verstraeten for their advice and for making this research possible. I would also like to thank my supervisor Pieter Buteneers for his guidance and his patience in letting me work and discover at my own pace.
On a personal level I owe much gratitude to my friends and family for their support. Especially towards Sara I'm very grateful for her understanding and patience. Lastly, I thank my parents for giving me the means to study.
Power demand prediction of a vehicle on a non-fixed route
by
Jonas Buyl
Thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Engineering: Computer Science
Academic year: 2011-2012
Universiteit Gent
Faculty of Engineering
Promoters: prof. dr. ir. Benjamin Schrauwen, dr. ir. David Verstraeten
Supervisor: ir. Pieter Buteneers
Summary
In this thesis several approaches are presented to predict the future power demand and
speed of a car, as well as other upcoming events that affect this demand. First, a road
graph data structure for automatic GPS map generation is adapted to capture local vehicle
behavior information.
The average local vehicle behavior is then used as extra information for the time series
prediction of the power demand, speed and acceleration using Reservoir Computing, which
is a novel technique for training recurrent neural networks. The predicted output window
is post-processed using a simple linear technique.
Thirdly, another system is presented that uses the current acceleration profile as well as the information in the road graph to predict the chance that the car is going to stop within the next 200 m.
Finally, two separate time series prediction models are trained: one for when the car stops within the next 200 m, and one for when it does not. For each prediction, the model to use is then determined by the stop prediction model.
Keywords: reservoir computing, vehicle behavior prediction, road graph, electric
vehicles
Contents
1 Introduction 2
1.1 A battery-capacitor hybrid setup . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.1 Battery vs. supercapacitor . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.2 Controller . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Problem statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Content and structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 Reservoir Computing 6
2.1 Introduction to neural networks . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 The Reservoir Computing approach . . . . . . . . . . . . . . . . . . . . . . 7
2.3 Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3.1 Linear regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3.2 Logistic regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4 Regularization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.5 Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.5.1 Input scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.5.2 Spectral radius . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.5.3 Leak rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.5.4 Bias scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.5.5 Reservoir size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.6 Time series prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.6.1 Time series prediction using Reservoir Computing . . . . . . . . . . 16
2.7 Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3 Data analysis 18
3.1 A road graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.2 Extracting useful information . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.3 Error measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.3.1 Defining a prediction distance . . . . . . . . . . . . . . . . . . . . . . 22
3.3.2 Root Mean Square Error (RMSE) . . . . . . . . . . . . . . . . . . . 24
3.3.3 Kurtosis Difference . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.3.4 Receiver Operating Characteristic (ROC) . . . . . . . . . . . . . . . 25
4 Time series prediction of vehicle power, speed and acceleration 28
4.1 Evaluation methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.2 Baseline models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.2.1 System setups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.2.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.3 A prediction system with Reservoir Computing . . . . . . . . . . . . . . . . 36
4.3.1 System setups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.3.2 Parameter optimization . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.3.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.4 Output window post-processing . . . . . . . . . . . . . . . . . . . . . . . . . 45
5 Stop prediction 47
5.1 Predicting the chance to stop using RC . . . . . . . . . . . . . . . . . . . . 48
5.2 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
6 Splitting the system for stopping and driving behavior 55
6.1 The model setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.2 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
6.2.1 RCSP Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
7 Conclusion 59
A Extra tables 61
A.1 Kurtosis difference results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
A.2 Model parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
B Extra figures 66
B.1 Stop prediction model examples . . . . . . . . . . . . . . . . . . . . . . . 66
B.2 Time series prediction examples . . . . . . . . . . . . . . . . . . . . . . . 68
Chapter 1
Introduction
Electric cars or vehicles (EVs) are increasingly commercially viable but seem to be held back by a number of problems. Not only are there a lot of myths around EVs, but some real issues remain as well.
One of the myths, for example, is the issue of battery life expectancy, which has largely been solved. Nissan even announced an eight-year warranty on the batteries of its electric model, the LEAF (figure 1.1). People usually drive a car for longer than 8 years, so some money will be spent on battery replacements, but these maintenance costs are lower than those of the more frequent repairs needed in a regular gas-powered car [9].
The LEAF is not as successful as anticipated, however, because the price is still too high for people to switch. Often a third of the total price is spent solely on the expensive lithium batteries. Prices are expected to drop through mass production, and governments all over the world give substantial incentives. A radical change may nevertheless be necessary to make it interesting for consumers to buy an EV for shorter distances while keeping a regular car for long distances. One idea is to mitigate the disadvantages of cheaper batteries (such as a short life expectancy) in another, cheaper way, so that the price can be brought down.
1.1 A battery-capacitor hybrid setup
The ChargeCar project [1] is committed to finding new ways to bring down the costs for
EVs. One of their ideas is to exploit the advantages of both batteries and capacitors.
1.1.1 Battery vs. supercapacitor
The battery has been the main way of storing energy in EVs because of its large power-to-weight ratio. This simply means the battery allows the car to go further without adding a lot of
Figure 1.1: The Nissan LEAF full electric vehicle
weight. The downside, however, is that batteries are generally less efficient when coping with large spikes in power demand. Not only do fast charges and discharges decrease battery life expectancy, they also decrease the capacity of batteries. This is especially true for lead-acid batteries, due to the chemical and structural changes in the interface under high load, which increase resistance and therefore decrease capacity. Lithium-based batteries, on the other hand, have been shown to lose capacity because of the higher temperatures caused by high power loads [11].
Capacitors, like batteries, are electrical components used to store energy. They consist of two metal plates separated by a thin insulating layer. Electrons can be transferred from one plate to the other, charging and discharging the capacitor. One advantage over batteries is that they show little degradation even after several hundreds of thousands of charge cycles. They are especially better than batteries at handling large peaks in power demand. The energy density of capacitors is a lot lower than that of batteries, however.
Supercapacitors¹, on the other hand, have a much greater energy density than capacitors [24]. The amount of energy that can be stored in a capacitor increases with the surface area of the metal plates. In supercapacitors, the plates are coated with a carbon layer, etched to produce many holes that extend through the material, much like a sponge. This increases the interior surface area by many orders of magnitude, greatly increasing the energy density (> 100,000 times).
¹ Also referred to as electric double-layer capacitor (EDLC), ultracapacitor, etc.
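The area dependence described above can be made concrete with the standard parallel-plate relations (an idealized model; real supercapacitor electrodes are porous rather than flat):

```latex
C = \frac{\varepsilon A}{d}, \qquad
E = \tfrac{1}{2} C V^2 = \frac{\varepsilon A V^2}{2 d}
```

Here C is the capacitance, ε the permittivity of the separating layer, A the plate area, d the plate separation, V the voltage and E the stored energy. Both C and E grow linearly with A, so multiplying the effective surface area by a large factor through the sponge-like carbon coating multiplies the storable energy by roughly the same factor.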
1.1.2 Controller
The solution presented by ChargeCar is to exploit the advantages of both a battery pack and a supercapacitor. The capacitor is used for high spikes in power demand, and to save energy generated while braking (through regenerative braking). When the capacitor is empty, the battery is used to supply power. When power is generated while the capacitor is already full, the battery is charged. The supercapacitor effectively works as a buffer between engine and battery, relieving the battery. Using both systems together then allows car manufacturers to use cheaper, more cost-effective components. Furthermore, capacitors function well at temperatures as low as −40 °C, where batteries are at their worst.
To direct the energy flows between battery, capacitor and engine, a controller is needed, as shown in Figure 1.2.
Figure 1.2: A controller guides power flows between battery, capacitor and engine
When accelerating, as much power as possible should be drained from the capacitor to meet the high energy demand of acceleration. When braking, all energy generated by regenerative braking should be saved in the capacitor. After accelerating and settling at a constant speed, the capacitor will be nearly empty, which means the battery will need to be used. Ideally this is the only time the battery is used.
Now consider the following situation: the car is approaching an intersection where it only sometimes stops. In this situation it could be useful to make sure the capacitor is completely filled in case the car does slow down. Another example: the car is driving steadily at 70 km/h but wants to overtake another car. The capacitor should have some energy left to handle the short power burst. It is therefore desirable to slowly transfer some energy from the battery to the capacitor when the capacitor is almost empty, so that any possible peak can be handled.
Finding an optimal controller, then, is a complex problem. To minimize battery usage, it could be beneficial to predict vehicle behavior and upcoming driving environments. An intelligent controller could then use these predictions to optimize capacitor usage.
1.2 Problem statement
In this thesis we investigate to what extent it is possible to predict the future power demand and speed of a vehicle, as well as other upcoming events that affect this demand. These
predictions could then be useful for the controller described above. Previous research on
this subject includes the prediction of power demand for hybrid vehicles on a fixed route
by Bartholomaeus et al.[4] and Johannesson et al.[18], but these make use of a fixed route
where the data set is perfectly aligned, and they predict vehicle behavior assuming the
vehicle drives along the same route. Reality, however, is much more complex. To truly investigate the possibilities of prediction in real-life situations we do not make the assumptions made by Bartholomaeus and Johannesson, which makes a direct comparison of results less relevant. In this work predictions are made without assuming the vehicle is on a fixed route. The models presented here can easily be adapted to work under real-life circumstances where vehicles drive on a non-fixed route and data is collected while driving.
1.3 Content and structure
To do this, we first gather information from previous trips in a single data structure in Chapter 3. We then try several approaches using Reservoir Computing and other machine learning techniques, explained in Chapter 2. One approach is to use these techniques for time series prediction of power demand and the factors it depends on (e.g. speed), which we discuss in Chapter 4. In Chapter 5 we calculate the chance of stopping within a short distance. Finally, in Chapter 6, we use the stop predictor from Chapter 5 and determine whether it can improve the prediction models presented in Chapter 4.
Chapter 2
Reservoir Computing
When problems become so complex that they can't be solved efficiently by ordinary algorithms, a near-optimal solution can sometimes be found more efficiently using machine learning techniques. This usually comes down to a model that is trained to capture the underlying characteristics of data. These can then be used to predict the output for new input data.
2.1 Introduction to neural networks
A neural network is a model based on the biological structure of the brain and consists
of several interconnected neurons. Each neuron has input and output connections that
connect it to the rest of the network. The output is calculated by taking a weighted sum
of those input connections, usually transformed by a non-linear activation function (for
the rest of this work the hyperbolic tangent tanh is used).
When there are no recurrent connections or cycles in the network, the network is called a feed-forward neural network (FFNN). If there are cycles in the network, it is called a recurrent neural network (RNN). A neural network is trained by adjusting the weights according to the error between target and predicted output. If the output depends on long chains of neurons, the adjustments can become so small that they effectively vanish. In RNNs the output depends on arbitrarily long chains of neurons, which makes this type of network very hard to train. Algorithms like back-propagation-through-time [30] are able to solve the problem, but the algorithm is very complex and takes a long time to compute.
2.2 The Reservoir Computing approach
Reservoir Computing (RC) is a fairly new approach to training recurrent neural networks [28]. It is a unifying term for several similar methods discovered independently, the most important ones being Echo State Networks [16] and Liquid State Machines [21]. The idea is to never train the network itself, but to only train the weights from each neuron to the output: a readout function. All other weights, such as the input connections and internal connections, are fixed and initialized randomly, but can be scaled and tuned (see section 2.5 on reservoir parameters).
To understand the dynamics of reservoirs, consider the following analogy, which is usually given to explain Liquid State Machines. It does not capture the whole picture of Reservoir Computing, but it gives an idea of what happens to the state of the reservoir when affected by an external input. Imagine the hidden layers of the reservoir network as a real reservoir of liquid. We would like a warning system that warns us when someone throws a large object into the liquid (the input). A single throw will generate ripples in the reservoir, converting that input into a spatio-temporal pattern along the surface of the liquid. To detect this pattern we place floating sensors in the reservoir, which are naturally coupled through the liquid. The state of the reservoir can then be read from the sensor values at a specific point in time.
This analogy makes it clear that certain parameters can heavily influence the reservoir dynamics: the number of sensors, the size of the thrown object (the input), the way the connecting surface behaves when an object falls into the water, and so on. They are discussed further in section 2.5, in the context of Reservoir Computing specifically.
In general, reservoirs are used to give a high-dimensional dynamic representation of the input, called the state of the reservoir. Because the neurons are interconnected, the reservoir possesses a memory, which depends largely on the scaling of the internal connections. Extra memory can also be introduced for every neuron individually by retaining a part of the previous neuron output value. To work properly, the reservoir needs to satisfy the Echo State Property [16, 15]: the reservoir needs to wash out any information from its initial conditions. In practice, a reservoir network consists of the following:
Figure 2.1: A schematic representation of a typical reservoir network. Solid arrow lines are not
trained. Dashed arrow lines are trained connections.
u[k]  the reservoir input vector at time step k
x[k]  the reservoir state at time step k
y[k]  the output vector at time step k
A schematic representation is given in Figure 2.1.
The reservoir state x[k], which retains a fraction (1 − λ) of the previous state x[k−1] at each time step k, is given by:
x[k] = (1 − λ) x[k−1] + λ f(Wr x[k−1] + Wi u[k] + Wb)
The weights of the internal connections Wr are initialized with random values from the normal distribution, but scaled so that the largest absolute eigenvalue of the random matrix equals a given parameter value: the spectral radius (see subsection 2.5.2). The input weights Wi are initialized randomly as well, but are rescaled by the input scaling parameter (subsection 2.5.1). A bias with scaling Wb is sometimes added to the input (see subsection 2.5.4).
The output is calculated by:
y[k] = Wor x[k] + Woi u[k] + Wob
The output weights Wor (reservoir to output), Woi (input to output) and Wob (output bias) need to be trained. They are the dashed connection arrows in Figure 2.1.
2.3 Training
The main advantage of training recurrent neural networks using Reservoir Computing is that only the weights Wor from the reservoir neurons to the output need to be trained. An additional linear connection straight from input to output with weights Woi is sometimes added, as well as a constant value (or bias) with weight Wob.
This training approach not only reduces the time required for training, but also allows a wider variety of methods to train the output weights. For this work, two different training methods are used:
2.3.1 Linear regression
The first step in training the weights using linear regression is to let the reservoir run over all the samples and keep the reservoir state at every time step k in a matrix A. Suppose we want to train the weights W using simple linear regression; we then want to find the least-squares solution minimizing the difference between the desired output y and the predicted output ŷ:
Wopt = argminW ||A W − y||²
There exists a closed-form solution:
Wopt = (AᵀA)⁻¹ Aᵀ y
Although A is large (nsamples × nneurons), it is still possible to calculate the output weights relatively fast compared to other RNN training techniques such as BPTT.
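On a small synthetic example (a random stand-in for the state matrix A and a noiseless target), the closed-form solution recovers the generating weights exactly; the sizes below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the collected state matrix A (n_samples x n_neurons).
A = rng.standard_normal((200, 10))
w_true = rng.standard_normal(10)
y = A @ w_true                          # noiseless target

# Closed-form least squares: W_opt = (A^T A)^{-1} A^T y,
# computed via a linear solve rather than an explicit inverse.
W_opt = np.linalg.solve(A.T @ A, A.T @ y)
```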
2.3.2 Logistic regression
Logistic regression is a classification method that models the probability of an input sample x belonging to a certain class. In contrast to other probabilistic models, logistic regression uses a discriminative approach, which classifies the inputs directly with the following probability:
p(C1|x) = f(x) = σ(wᵀx + w0) = 1 / (1 + exp(−wᵀx − w0))
and p(C2|x) = 1 − p(C1|x) when solving a binary classification problem.
Figure 2.2: An example of classification using the logistic function. The shaded area is the overlap area between decision spaces. The red line is the suggested hard threshold.
These distributions are visualized in Figure 2.2. When the distributions are not linearly separable¹, there is an overlap region, which means a hard threshold needs to be defined. This is often the point at which both probability functions intersect, but other thresholds can be chosen if the misclassification costs differ between the two classes.
The weights of the logistic regression model are found by minimizing the cross-entropy function:
E(w) = −ln p(t|w) = −Σₙ₌₁ᴺ [tn ln yn + (1 − tn) ln(1 − yn)]
where tn ∈ {0, 1} is 1 if the input sample belongs to class C1, and yn is the predicted output of the model. Note that correctly classified samples that lie far from the decision boundary are not penalized. In ridge regression, by contrast, samples that are correctly classified but lie far from the target output are still penalized. This is further illustrated in Figure 2.3.
The minimization has no closed-form solution, but it is a convex problem², so we can find it through gradient descent³. There exists a gradient-descent approach
¹ Two sets of points in two dimensions are linearly separable if they can be separated by a single straight line.
² A convex problem is a problem that has a unique minimum.
³ Gradient descent is an optimization algorithm that finds the minimum error by taking steps along the negative of the gradient (or derivative) of the error function.
Figure 2.3: The error measure E(z) for the mean squared error of target and model output (used, e.g., in classification using linear regression) and the cross-entropy function used in logistic regression. In z = y·t, y is the model output and t the target output of the model. For t = 1, a model output y = 2 is penalized more by the mean squared error than a model output of y = 1, although both are classified correctly. This is not true for the cross-entropy function, which may therefore be more suitable for classification.
based on the Newton–Raphson iterative optimization scheme, called iteratively reweighted least squares (IRLS) [23]. At each iteration the weights are updated by subtracting the derivative of the error function divided by its second derivative. The derivation and specifics of this algorithm are not important for this work, but the basic steps of each iteration τ are:
y = σ(Xw(τ))
Rnn = yn(1 − yn)
z = Xw(τ) − R⁻¹(y − t)
w(τ+1) = (XᵀRX)⁻¹ XᵀRz
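A minimal IRLS implementation of the four update steps above might look as follows; the toy one-feature dataset, the noise level and the iteration count are illustrative only.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def irls(X, t, n_iter=15):
    """Iteratively reweighted least squares for logistic regression.
    X: (n, d) inputs (last column all ones for the bias), t: 0/1 labels."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        y = sigmoid(X @ w)                  # y = sigma(X w)
        r = y * (1 - y)                     # diagonal of R
        z = X @ w - (y - t) / r             # z = X w - R^{-1}(y - t)
        # w = (X^T R X)^{-1} X^T R z
        w = np.linalg.solve(X.T @ (r[:, None] * X), X.T @ (r * z))
    return w

rng = np.random.default_rng(1)
x1 = rng.standard_normal(100)
X = np.c_[x1, np.ones(100)]                            # one feature + bias column
t = (x1 + 0.5 * rng.standard_normal(100) > 0).astype(float)  # noisy labels
w = irls(X, t)
```

The noise on the labels keeps the toy data non-separable, so the Newton updates converge to a finite weight vector.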
2.4 Regularization
When training a complex system, the model can become overfitted to the training samples. This means that the model will perform well on the training set, but not on new test data, because it is trained on examples that are not representative of the full range of possibilities. When the model is then tested on a sample it has not seen in the training set, it won't know what to do with it. For example [27], suppose we want to train a model to predict the Fibonacci sequence [1, 1, 2, 3, 5, 8, ...], and we give it the examples
[1, 1]; the model will then be trained to always output 1. The training examples, as in the Fibonacci sequence, are instances of the underlying characteristics, affected by a small deviation or noise. The underlying characteristic of the Fibonacci sequence is known: the n-th Fibonacci number can be calculated by rounding ((1 + √5)/2)ⁿ / √5. The deviation between the training examples and the underlying characteristic therefore always lies between −1/2 and 1/2. If the model is too complex, this deviation is trained as well. One way to smooth out this noise is to constrain the weight size, which makes the model less sensitive to noise and slight deviations. However, if this constraint is too strict, the model is simplified too much to learn the underlying characteristics; a trade-off needs to be made. Using Tikhonov regularization [25], or ridge regression, the trade-off can be tuned with a single regularization parameter λ [32].
To find the least-squares solution, the regularized weights W are now found by:
Wopt = argminW ||A W − y||² + λ ||W||²
in which λ is the regularization parameter. This minimization problem has a closed-form solution as well; the weights can be calculated as follows:
Wopt = (AᵀA + λI)⁻¹ Aᵀ y
When λ is large, the size of the squared weights increases the cost considerably. Setting λ too high, however, will increase the distance between the optimal solution and the regularized solution (referred to as underfitting). To optimize λ, the same model is trained repeatedly, each time with a different λ. The performance of each λ is then evaluated on new samples that are not part of the training set.
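The effect of λ can be sketched directly from the closed-form ridge solution: a larger λ shrinks the weights. The matrix sizes and noise level below are arbitrary toy choices.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
y = A[:, 0] + 0.1 * rng.standard_normal(50)   # noisy toy target

def ridge(A, y, lam):
    """Closed-form ridge solution: W = (A^T A + lam I)^{-1} A^T y."""
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)

w_weak = ridge(A, y, lam=1e-6)    # barely regularized
w_strong = ridge(A, y, lam=1e3)   # heavily regularized: much smaller weights
```

In practice λ would be chosen by evaluating each candidate on a validation set, as described above.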
For training reservoir networks in particular, it is important to note that the weights depend on the random initialization, and that a regularization parameter needs to be optimized for every reservoir specifically. More regularization is needed as reservoir size increases, because complexity increases: in the extreme case there is a reservoir node for every training sample, mapping each sample exactly to the output. On the other hand, if the reservoir size is extremely small, no regularization is needed because the model is not as complex.
For logistic regression, proper regularization is often necessary as well. The optimized
regularization parameter can be added to the IRLS algorithm easily by modifying the
weight update as follows:
w(τ+1) = (XᵀRX + λI)⁻¹ XᵀRz
2.5 Parameters
As mentioned before, a number of parameters need to be determined to control the dynamics of the reservoir. The results of a model trained using Reservoir Computing depend on the careful fine-tuning of these parameters. When using regularization in the readout function, the regularization parameter should be optimized separately for every reservoir parameter setting. Each model is trained using several different regularization parameters and tested on a validation set. The model with the optimal regularization parameter is then evaluated again on a separate test set to be sure of the general performance of the model. We therefore need to divide the dataset into three parts. This could be a problem, especially when using a limited amount of data, because we could accidentally choose a poor set of samples, which can lead to misleading results.
The best solution to counter this problem is cross-validation. The dataset is divided into K subsets. Each subset is then used exactly once as a test set, with the others as the training set. Afterwards, the results are averaged, making sure the result is valid for the complete dataset. When an extra validation set is needed as well, the subsets used for training are divided again into a smaller training subset and a validation set. For example, suppose the dataset consists of 4 samples; the resulting cross-validation scheme is shown in Table 2.1.
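The nested scheme for 4 samples can be enumerated directly: each sample serves once as the test set, and each remaining sample once as the validation set.

```python
samples = [1, 2, 3, 4]

folds = []
for test in samples:
    rest = [s for s in samples if s != test]
    for val in rest:                        # each remaining sample validates once
        train = [s for s in rest if s != val]
        folds.append((train, val, test))

# 12 (train, validation, test) splits in total, matching Table 2.1.
```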
2.5.1 Input scaling
Input scaling determines the scaling of the random input weights of the reservoir. These weights determine how strongly the neurons are excited by new input values. For very low input values, the nonlinear neuron activation functions are barely activated, resulting in an almost linear system. Very high input values, however, saturate the activation function, resulting in almost a binary step function. In other words: the input scaling determines the degree of nonlinearity in the system.
2.5.2 Spectral radius
The spectral radius of a reservoir is the largest absolute eigenvalue of the weight matrix
of the internal connections between the neurons in the reservoir. It therefore defines the
factor by which the previous states are multiplied in the reservoir state update (section 2.2).

Training set   Validation set   Test set
1, 2           3                4
1, 3           2                4
2, 3           1                4
1, 2           4                3
1, 4           2                3
2, 4           1                3
1, 3           4                2
1, 4           3                2
3, 4           1                2
2, 3           4                1
2, 4           3                1
3, 4           2                1

Table 2.1: An example cross-validation scheme where a dataset of 4 samples is divided into a training set, a validation set, and a test set [27].

If we choose a spectral radius < 1, the input values will eventually fade out, ensuring
stability and the echo state property. With a spectral radius > 1, the reservoir can become
unstable if the reservoir is near linear.
The internal connections between the neurons add memory to the reservoir. The spectral radius and the scaling of the internal connection weights therefore influence the time scale of the reservoir. For input that evolves slowly, or that has long-range temporal interactions, the spectral radius is usually chosen close to 1, or even higher if the reservoir is nonlinear enough.
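Scaling a random weight matrix to a chosen spectral radius is a one-liner; the matrix size and the target radius of 0.95 below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)
W_r = rng.standard_normal((100, 100))   # random internal weights

# Rescale so the largest absolute eigenvalue equals the target radius.
target = 0.95
W_r *= target / np.max(np.abs(np.linalg.eigvals(W_r)))

spectral_radius = np.max(np.abs(np.linalg.eigvals(W_r)))
```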
2.5.3 Leak rate
The leak rate of each neuron in the reservoir controls the retention rate of the previous neuron output. It influences the memory of the reservoir directly. This also means it reduces the influence of new state updates, and therefore makes the reservoir adapt more slowly to new situations. A trade-off thus needs to be made between the influence of long-term dynamics and the influence of new input.
2.5.4 Bias scaling
A constant 1 may be added to the input, multiplied by the bias scaling parameter. This shifts the working point on the sigmoid activation function tanh of the neuron. The steepness of the sigmoid is largest around the origin. Shifting the working point upward or downward
therefore makes the reservoir less dynamic. An illustration of the influence of the bias on the activation function is shown in Figure 2.4.
Figure 2.4: Illustration of the effect of bias scaling. Using 0 as the working point for the input broadens the spectrum of the neuron activation (red line). When the bias is shifted, the neuron exhibits less dynamic behavior (green line).
2.5.5 Reservoir size
The reservoir size is the number of neurons in the network. Increasing the reservoir size usually improves the result, assuming sufficient regularization. It is therefore not really an optimizable parameter, as the reservoir size is normally determined by the computational power available.
2.6 Time series prediction
Predicting time series, which is essentially predicting the future, has been the focus of much research throughout history. The basic idea is to first observe a training sequence and then try to complete the sequence over a number of steps into the future, also referred to as the number of freerun steps in RC literature.
To be able to predict from the history alone, there needs to be a pattern in the sequence. That pattern can either be periodical (like 1, 2, 1, 2, 1, 2, ...) or contain a certain trend (such as the Fibonacci sequence: 0, 1, 1, 2, 3, 5, 8, ...).
2.6.1 Time series prediction using Reservoir Computing
Reservoir Computing has already been successfully used for the signal generation task [17].
The model is trained to always output the next step ahead, driven by the true signal values (teacher forcing). After training these signal dynamics, the first unknown signal value is predicted according to the learned signal transitions. The predicted value is then used as the input value to predict the following signal value, and so on. This continues until the required number of predicted steps is reached.
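The freerun loop itself is independent of the model; the sketch below uses a trivial stand-in for the trained one-step predictor just to show the warm-up / feedback structure.

```python
def freerun(history, n_steps, step):
    """Warm up on the observed history (teacher forcing), then feed each
    prediction back as the next input for n_steps freerun steps."""
    u = history[0]
    for observed in history[1:]:
        step(u)                  # drive the model with the true past values
        u = observed
    preds = []
    for _ in range(n_steps):
        u = step(u)              # feed the prediction back as the next input
        preds.append(u)
    return preds

# Stand-in predictor: a signal that halves at every step.
out = freerun([8.0], 3, step=lambda u: 0.5 * u)   # [4.0, 2.0, 1.0]
```

With a trained reservoir, `step` would run one state update and return the readout value.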
It is very important to let the reservoir neurons warm up sufficiently before starting prediction, as the model obviously can't complete a sequence if it doesn't know the first part of the sequence. Warming up a reservoir is done simply by letting the reservoir run over the history of the signal and feeding in the observed values. The evolution of neuron activations when running over an example power demand profile can be seen in Figure 2.5. In this case, the neuron output values are initialized at a value close to 0 and take about 100 state updates before patterns begin to appear in the reservoir.
Figure 2.5: An example of the output values of the first 10 neurons over the first 200 state updates when running the reservoir over a power demand profile of a car
One of the shortcomings of time series prediction with Reservoir Computing is that it often only acts on short-term time scales, although this can be partly remedied by retaining a part of the previous state. However, RC has already been applied successfully to long-term financial time series prediction in [31] and [26]. Financial time series can often be decomposed into periodical patterns, a trend and a remainder. These signals can then be predicted separately and recombined after prediction. In this work we predict the time series of vehicle behavior such as the power demand course, the speed profile
and acceleration. After initial experiments, however, it became clear that these are not as straightforward to predict as the series in this previous work, and first results were poor. Our approach to this problem is explained in Chapter 4.
2.7 Classification
Reservoir Computing can also be successfully applied to many temporal classification problems such as speech recognition [27] and the detection of epileptic seizures [7]. In the first example, the samples are isolated and need to be classified as a whole. In the second, the input is a signal where the class needs to be determined at every time step. Every classification then depends on the current input values and the previous input samples (thanks to the memory properties of reservoir networks). The advantage of classification using RC is that the reservoir not only maps the input values to a high-dimensional feature space, but also memorizes input values in the reservoir for a certain time, so that temporal patterns can be detected as well. The reservoir states can then be classified using any classification technique. The classification problem handled in this thesis is explained and addressed in Chapter 5.
Chapter 3
Data analysis
The data is supplied by ChargeCar.org [1]. It consists of the GPS coordinates and elevation of each point, sampled at a one-second interval. The dataset contains data from multiple drivers, but the GPS points are not gathered in the same neighborhood, so they can't be combined. We'll focus on the available data from one driver alone, as it already allows for extensive research. We will be using the trips from a driver who drove around south San Francisco. In total about 6915 km is covered in about 159 hours (Figure 3.1). The driver frequently drives along the same road segments. It could therefore be interesting to keep data from previous trips over the same road. Future predictions can then be based on the previous trips when passing through the neighborhood.
Figure 3.1: ChargeCar GPS data
3.1 A road graph
To access data from previous trips easily, a suitable data structure is needed, along with an efficient algorithm to build it. The greatest difficulty is detecting when a car is driving on a road where it has already been, and quickly finding the related GPS points. We follow the approach described by L. Cao and J. Krumm to build a graph of directed road segments [8]. They present a fully automatic method for creating a routable road map from GPS traces of everyday drivers. The algorithm described by Cao and Krumm performs well at ensuring road connectivity and differentiating GPS traces from close parallel roads.
First, to increase efficiency, a separate dataset is made by generally retaining only one point every 30 meters. When the direction change over the last three points is greater than 10 degrees, the GPS points are retained every 10 meters instead. This increases accuracy when the car is making a turn. Some of these points will be close together and should be merged, making sure the connectivity in the graph is not lost. To build the data structure we start with an empty graph. Each trip is then processed sequentially. For each node in a trip, the graph is searched to decide whether the node should be merged with an existing graph node.
Intuitively, a new node n should be merged with node v from the graph if they're on the same road segment. Let e be a road segment that connects v and another node v′ in the road graph. Then n should be merged with v if the distance from e to n is small enough, the trip goes in the same direction as e, and n is closer to v than to v′.
An illustration of the process can be seen in Figure 3.2. The first trip becomes the initial
graph. In the second trip, the 2nd, 3rd, 4th and 5th nodes are merged with the existing
nodes. The 1st, 6th and 7th nodes are copied to the graph. The road segments from the
second trip are connected from the new nodes to the existing nodes to ensure connectivity.
No nodes from trip 3 satisfy the merge conditions so they are simply copied to the graph.
The algorithm without optimization is very inefficient, because every GPS point of the trip needs to be compared to every node in the graph. The time required to add a trip increases dramatically as the number of nodes in the graph increases. However, it is clear that the current GPS point will never need to merge with nodes far away. For this purpose, the nodes are kept in a 2-d tree. Using a 2-dimensional distance tree, all nodes within range can be looked up in O(log N) time [19], where N is the number of nodes in the road graph. Storing the road graph nodes in a 2-d tree therefore significantly reduces the time required to add a trip to the road graph.
Figure 3.2: A simple example of the merge algorithm. The circles represent the retained GPS points. The arrowed lines represent the connecting road segments along with the driving direction.
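A minimal 2-d tree with a range query illustrates this lookup; it is a didactic sketch, not the implementation used in the thesis, and the coordinates are made up.

```python
import math

def build(points, depth=0):
    """Build a 2-d tree: alternate splitting on x and y at each depth."""
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build(points[:mid], depth + 1),
            build(points[mid + 1:], depth + 1))

def within_range(node, q, r, out):
    """Collect every stored point within distance r of query point q."""
    if node is None:
        return
    p, axis, left, right = node
    if math.dist(p, q) <= r:
        out.append(p)
    d = q[axis] - p[axis]
    # Only descend into a half-plane if the query ball can reach it.
    if d <= r:
        within_range(left, q, r, out)
    if d >= -r:
        within_range(right, q, r, out)

pts = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.5, 0.2)]
tree = build(pts)
near = []
within_range(tree, (0.0, 0.0), 1.0, near)   # points within 1.0 of the origin
```

The pruning on `d` is what gives the O(log N) behavior on balanced trees: whole subtrees outside the query radius are never visited.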
3.2 Extracting useful information
Although we had no access to measured data such as speed and power usage, a lot of information can be calculated from the GPS data. Speed, acceleration and power demand are calculated for every sample, using the power model described by ChargeCar.
After building the graph, the complete dataset is used again and mapped onto the road segments between the road graph nodes. This way, the road graph can be built efficiently without losing information about vehicle behavior between those points.
The dataset consists of samples at 1-second intervals. This means that the time spent on a road segment of 30 meters between two nodes can vary considerably, because it depends on the speed of the vehicle over the road segment. In order to correctly calculate and align the average behavior over the road segment, while properly retaining any peaks in the profiles, interpolation is needed. The distance traveled over the road segment is almost constant for every trip. The trips are therefore interpolated every meter, converting them from a sample every second to a sample every meter. Over the rest of this work, the time series described are converted from a time scale to a distance scale. The effect of this interpolation is illustrated in Figure 3.3.
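The resampling from a one-second grid to a one-meter grid can be done with simple linear interpolation over cumulative distance; the short speed profile below is invented for illustration.

```python
import numpy as np

# One-second samples: speed (m/s) and the cumulative distance (m) they imply.
speed_t = np.array([0.0, 5.0, 10.0, 10.0, 5.0])
dist_t = np.concatenate(([0.0], np.cumsum(speed_t[1:])))   # 0, 5, 15, 25, 30

# Resample the speed profile onto a one-meter distance grid.
dist_m = np.arange(0.0, dist_t[-1] + 1.0)
speed_m = np.interp(dist_m, dist_t, speed_t)
```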
Figure 3.3: The speed profile on a time scale (left) is interpolated to a distance scale (right)
Through testing we also noticed that moving vehicles exhibit very different behavior from stopping vehicles. Suppose a vehicle stops 50% of the times it passes through an intersection where the speed limit is 50 km/h. The average speed over the road segment would then be 25 km/h. However, the vehicle rarely actually drives at this speed: it usually either stops and continues slowly over the intersection, or it doesn't need to stop and keeps driving at 50 km/h. We therefore separate the captured information into a slow set and a fast set on every road segment.
When passing through a road segment, the current vehicle behavior is added to the slow set if:
- The car stops on the current road segment, i.e. the car drives slower than 2 m/s = 7.2 km/h at any point over the road segment.
- Or, the car stops in a road segment within 100 m before or after the current position, and the car's average speed over the segment is more than 2 m/s slower than the total average speed of the fast set.
If neither condition is satisfied, the information about the car's behavior over the segment is added to the fast set.
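The two membership tests can be written down directly; the function below is a hypothetical sketch using the thresholds quoted above (a 2 m/s stop speed and a 2 m/s margin below the fast-set average), with simplified inputs.

```python
STOP_SPEED = 2.0   # m/s, i.e. 7.2 km/h

def is_slow(segment_speeds, segment_avg, stops_within_100m, fast_set_avg):
    """True if this pass over the segment belongs in the slow set.
    stops_within_100m: whether the car stops in a segment within 100 m
    before or after the current position."""
    if min(segment_speeds) < STOP_SPEED:                         # condition 1
        return True
    if stops_within_100m and segment_avg < fast_set_avg - 2.0:   # condition 2
        return True
    return False
```

For example, a pass with a minimum speed of 1 m/s lands in the slow set regardless of nearby stops, while a steady pass near the fast-set average stays in the fast set.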
The average profiles over a path in the road graph can now be collected by concatenating the average profiles captured in each road segment. An example of the average speed profiles is given in Figure 3.4. Small jumps in the average profiles couldn't be avoided, because the number of trips driving over the different road segments in the chosen path varies.
Figure 3.4: An example of the average speed profiles over a trip in the road graph
3.3 Error measures
To evaluate the techniques used, some sort of evaluation method is needed. The ultimate goal is to minimize battery usage, but the controller using these predictions remains hypothetical. It is therefore impossible to define a single number that captures the goodness of a model. It is still possible, however, to reason about the usefulness of the proposed models using the following error measures.
3.3.1 Defining a prediction distance
If we want to evaluate the predictive capabilities of each model, we should first specify on what scale prediction is required. Of course, this can be very different for each application. The initial purpose of this work, however, is the improvement of a battery/capacitor controller, so we will focus on this example.
The ideal situation would be to allow the capacitor to empty completely while driving at a constant speed, so that it is ready to store all the energy contained in the moving car.
The theoretical maximum prediction distance is then the distance traveled to empty the
capacitor. This can be calculated using the specifications of the vehicle used to capture
the GPS data.
In the ChargeCar data a Honda Civic is used. The power required for this vehicle driving at a constant speed can be calculated as follows, with the constants and units described in Table 3.1:

P = (A/2) · Cd · D · v^3 + Cr · m · g · v
Symbol | Description                  | Value
A      | Frontal area                 | 1.988 m^2
Cd     | Drag coefficient             | 0.31
D      | Air density                  | 1.29 kg/m^3
Cr     | Roll resistance coefficient  | 0.015
m      | Car mass                     | 1200 kg
g      | Gravitational acceleration   | 9.81 m/s^2
W      | Capacitor energy capacity    | 190080 J
v      | Vehicle velocity             | m/s
P      | Power                        | W

Table 3.1: Constants and units needed to calculate the prediction distance
The supercapacitor used in the ChargeCar test car is the Maxwell BMOD0165. The
maximum stored energy in this capacitor is 52.8 Wh or 190080 Joule. The time needed
to use the 190080 J while driving at constant velocity v is then:
t = 190080 J / ((A/2) · Cd · D · v^3 + Cr · m · g · v) = 190080 J / (0.3975 · v^3 + 176.58 · v)

The distance covered as a function of that time and velocity is then:

d = v · 190080 J / (0.3975 · v^3 + 176.58 · v)
Figure 3.5: Maximum prediction distance vs. vehicle velocity
Figure 3.5 shows that the required distance decreases as speed increases. At 1 km/h the maximum distance is 1076 meters. As the speed approaches 0, the distance approaches +∞. From this result we would conclude that we need to predict the coming kilometer. However, the car will usually be driving at a minimum of 50 km/h. The distance required at this speed is 750 m.
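Using the constants from Table 3.1, the maximum prediction distance at a given constant speed can be computed directly. This is a small sketch; the function name is ours.

```python
def prediction_distance(v_mps, energy_j=190080.0):
    """Distance (m) covered while draining the capacitor at constant speed.
    0.3975 = (A/2)*Cd*D and 176.58 = Cr*m*g, per Table 3.1."""
    power_w = 0.3975 * v_mps ** 3 + 176.58 * v_mps
    return v_mps * energy_j / power_w

print(prediction_distance(1 / 3.6))   # about 1076 m at 1 km/h
print(prediction_distance(50 / 3.6))  # about 750 m at 50 km/h
```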
3.3.2 Root Mean Square Error (RMSE)

The root mean square error is a frequently used measure of the deviation between a model's predictions and the actual observed values. It allows us to compare two signals (e.g. the speed profiles) and aggregate the point-wise differences (or residuals) between them into a single number to evaluate the models used. For vectors x (observed values) and x̂ (predicted values) the RMSE is calculated as follows:
RMSE(x, x̂) = sqrt(MSE(x, x̂)) = sqrt( (1/n) · Σ_{i=1..n} (x_i − x̂_i)^2 )
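In code, the RMSE between two profiles is a one-liner (sketch using NumPy):

```python
import numpy as np

def rmse(observed, predicted):
    """Root mean square error between two equally long signals."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))

rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])  # sqrt(4/3) ≈ 1.155
```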
3.3.3 Kurtosis Difference
Kurtosis is a measure of peakedness, in machine learning usually used as a measure of non-Gaussianity [3]. (Gaussianity is the similarity of a distribution to the normal, or Gaussian, distribution.) Through experiments we noticed that a model sometimes converges to a weighted average of the history of the current trip. This might be the best result according to the RMSE and MAD error measures above, but it will not be as useful for a controller
because a controller needs a predictor that can correctly predict energy spikes or other
events that have a large influence on energy usage. It could then be useful to compare the
peakedness of the prediction profile with the peakedness of the observed profile. We could
then get a better idea of the similarity of both signals.
The peakedness (or kurtosis) of a vector x with mean x̄ is calculated by:

Kurt(x) = [ (1/n) · Σ_{i=1..n} (x_i − x̄)^4 ] / [ (1/n) · Σ_{i=1..n} (x_i − x̄)^2 ]^2 − 3
To compare the peakedness, the kurtosis difference error measure is presented. No previous work was found on this measure, at least in the context of machine learning. The kurtosis difference is merely used in an attempt to quantify the models in a secondary way. The kurtosis difference of the model output ŷ and the target output y is calculated as Kurt(ŷ) − Kurt(y).
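Both quantities are straightforward to compute; the sketch below uses the sample-moment form of the formula above (function names are ours):

```python
import numpy as np

def kurtosis(x):
    """Excess kurtosis: fourth central moment over the squared second
    central moment, minus 3 (so a Gaussian scores approximately 0)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.mean(d ** 4) / np.mean(d ** 2) ** 2 - 3.0)

def kurtosis_difference(y_pred, y_target):
    """Secondary error measure: Kurt(prediction) - Kurt(target)."""
    return kurtosis(y_pred) - kurtosis(y_target)
```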
3.3.4 Receiver Operating Characteristic (ROC)
The Receiver Operating Characteristic (ROC) is a classification error measure developed in WWII by radar engineers and has since been used in a large number of areas. In recent years it has also gained a lot of interest in the field of machine learning and pattern recognition [13].
The output of classifier models is usually continuous, but it is often hard to evaluate the performance of these models because a hard threshold usually needs to be set to classify the output. The ROC curve is able to visualize the trade-off between the hit rate (or true positive rate, TPR) and the rate of false positives (FPR) in binary classification. The TPR is equivalent to the proportion of actual positives that are correctly identified. The FPR, on the other hand, is the proportion of negatives that are wrongly identified as positive.
The curve is calculated by iterating over every possible hard threshold that can classify the output of the model. Classifiers appearing on the lower left-hand side of an ROC curve can be thought of as strict or conservative (A in Figure 3.6). They only make positive classifications with strong evidence. Classifiers appearing on the right-hand side of the ROC curve are less selective and result in a lot of false positives (B). Intuitively, the best model then stretches as far as possible to the top left-hand side (C). Random classification models result in points along a straight line from the bottom left to the top right of the graph (D). Classifiers under this line (E) can be thought of as worse than random, but if we invert
Figure 3.6: A basic ROC graph showing five discrete classifiers
the classifier and switch the target classes, the classifier's result is inverted, which makes it perform better than the random classifier again. Consequently, models that extract no knowledge from the data will roughly follow the straight line of the random classifier.
The ROC measure also provides a way of evaluating model performance without setting a threshold, by calculating the total surface area under the curve, known as the Area Under Curve (AUC). We will mainly be using this measure to compare the performance of models for binary classification.
A single threshold can be selected along the ROC curve to know the exact percentage of correctly classified samples for a selected trade-off between false and true positive rates. Often the threshold is chosen where the false and true positive rates are equal. If the classifier is used for a specific purpose, the cost of a false positive is sometimes higher than the cost of a false negative, or vice versa. The threshold can then be optimized for that application specifically. Two example ROC curves are shown in Figure 3.7, along with the line at which the two error rates are equal.
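The threshold sweep and the AUC can be sketched as follows. This is a minimal version for 0/1 labels; tied scores are not handled specially, and the function names are ours.

```python
import numpy as np

def roc_curve(scores, labels):
    """Sweep a threshold over descending classifier scores and return the
    false and true positive rates at every cut (labels are 0/1)."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    labels = np.asarray(labels)[order]
    tpr = np.concatenate(([0.0], np.cumsum(labels) / labels.sum()))
    fpr = np.concatenate(([0.0], np.cumsum(1 - labels) / (1 - labels).sum()))
    return fpr, tpr

def auc(fpr, tpr):
    """Area Under Curve by trapezoidal integration of the ROC points."""
    return float(sum((fpr[i + 1] - fpr[i]) * (tpr[i + 1] + tpr[i]) / 2.0
                     for i in range(len(fpr) - 1)))
```

A classifier that ranks every positive above every negative reaches an AUC of 1.0; a random one stays near 0.5.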
Figure 3.7: Example of the ROC curves of two models, including the Equal Error Rate line, where the True Positive Rate is equal to the False Positive Rate.
Chapter 4
Time series prediction of vehicle power, speed and acceleration
To predict the future power demand profile of a car using RC techniques, a different approach than for the prediction of financial time series is needed, because there are typically fewer periodic factors in driving a car. Stopping and accelerating often happen in a similar way, but these stops come at near-random intervals if there is no pre-existing information about the car's environment.
Vehicle power demand depends on many physical factors and can be decomposed in other ways: elevation differences, acceleration, speed, etc. Elevation is not expected to change over several trips, so we can read it directly from the road graph. Predicting the vehicle acceleration and speed, however, is far more complex. In this chapter, we present and evaluate several time series prediction models for the vehicle's power demand, acceleration and speed profiles. An example of these profiles can be seen in Figure 4.1. Note that from here on, the profiles are evaluated on a distance scale instead of a time scale, as explained in the previous chapter.
4.1 Evaluation methodology
The theoretical prediction distance calculated in section 3.3.1 was 750 m. However, the memory capacity of reservoirs is limited, and after predicting a number of steps, the influence of the real observed samples on the reservoir diminishes [14]. In the setup presented by Jaeger, the input was forgotten after about 400 time steps. When noise was added, the memory reduced to around 200 time steps. Experiments showed that trying to predict any further than 200 m with the models presented here yielded misleading results that were difficult to explain. We therefore decided to limit the evaluation to predicting the
Figure 4.1: An example power demand, speed, acceleration and elevation profile
first 200m.
As mentioned in Chapter 3, in total about 6915 km is covered. The dataset was converted to a distance scale, which means it now contains about 6,915,000 samples. To train and test the models in this chapter, all trips in the dataset were divided into 200 m intervals. The information of previous trips over each interval was extracted from the road graph and merged with the intervals.
For time series prediction, especially using RC, a warm-up period is needed (see section 2.6.1). We couldn't use the reservoir states of the previous intervals because those states contain the predicted signal over the interval, not the real signal values. A poor prediction would therefore affect the next prediction. Moreover, the previously predicted interval is not always a part of the current trip.
A warm-up period is therefore added before every prediction interval. This should be long enough to forget the previous states, but short enough to limit memory requirements and the computation time required to run over all the warm-up samples. Because the maximum possible prediction distance seemed to be around 200 m, we expect that 300 m is enough to largely forget the state of the previously predicted interval.
Since the input weights and the internal weights of a reservoir are chosen randomly, one model could happen to be trained with a particularly good reservoir while another was trained with a very weak one. This could lead to misleading results. Therefore, each experiment involving RC is done over 10 different reservoir instances. When plotting the resulting errors, the standard deviation of the results of a model is shown as well, using error bars. In text, the standard deviation is given in parentheses along with the average results.
To include all intervals, the reservoirs would now need to run 10 times over 17,287,500 samples. Additionally, we are predicting the power demand as well as the speed and acceleration profiles. Too much time would be required to finish all experiments in a reasonable time frame. Therefore, a random subset of 2,230,500 samples (containing 9566 prediction intervals) was chosen and fixed for the remaining experiments.
Figure 4.2: Example of splitting the profiles in prediction intervals

Because of the restrictions above, cross-validation was unfortunately not an option. Instead, the models were trained on the first 25% of the trips. The next 25% was used as a validation set to optimize the model parameters. The performance of the resulting models was then evaluated on the remaining 50% of the dataset. The test set was chosen large enough to ensure the models are evaluated on enough data to draw general conclusions.
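The split over whole trips, rather than over individual samples, could be sketched as follows (a hypothetical helper; the thesis does not specify the exact code):

```python
def split_trips(trips):
    """Chronological 25% / 25% / 50% split into train, validation and test,
    performed over whole trips so no trip straddles two sets."""
    n = len(trips)
    train, valid = trips[: n // 4], trips[n // 4 : n // 2]
    test = trips[n // 2 :]
    return train, valid, test
```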
4.2 Baseline models
4.2.1 System setups
First, some simple and linear models are presented and evaluated. For each model we
define an abbreviation in the paragraph title to be able to refer to them more clearly
afterwards.
Last value as prediction (LV)
The last observed value is used as the prediction for the next predicted values: ŷ(t) = y(t − 1). As speed is usually quite constant, this should already provide a good estimate. The longest distances in the dataset are often on highways, where speed hardly changes.
Averages from previous trips as prediction (SA/FA)
The averages of the previous trips over the path of the current trip are used directly as
prediction. The performance of the slow average profiles (SA) and the fast average profiles
(FA) are evaluated separately.
Offset averages as prediction (OA)
The predictions of the previous model don't make any use of the current trip, however. At least the first few predicted values should be close to the last known values. To solve this, we first calculate the difference between the current trip and the averages at the last observed sample of the interval. This offset is then added to the averages over the rest of the prediction.
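The OA baseline amounts to shifting the average profile so it matches the trip's last observed sample (a sketch; the names are ours):

```python
import numpy as np

def offset_average_prediction(avg_profile, last_observed, avg_at_last):
    """Shift the historical average profile by the gap between the current
    trip's last observed value and the average at that same position."""
    return np.asarray(avg_profile, dtype=float) + (last_observed - avg_at_last)

offset_average_prediction([10.0, 11.0, 12.0], last_observed=9.0, avg_at_last=10.0)
# → array([ 9., 10., 11.])
```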
Weighted time delay window (TDW)
A weighted average is taken of the previous values. This allows the model to incorporate the recent history of the current trip. The weights of every point in the recent history window y(t − n_window), ..., y(t − 1) are trained using ridge regression to predict one step ahead: y(t). The predicted value ŷ(t) is fed back and used as part of the current trip history: y(t − n_window + 1), ..., y(t − 1), ŷ(t), to predict y(t + 1).
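A minimal version of this model — a ridge-trained one-step predictor whose output is fed back into its own delay window — might look like this. The function names and the explicit normal-equation solve are our choices, not the thesis implementation.

```python
import numpy as np

def train_tdw(series, window, ridge=1e-6):
    """Fit weights w (plus bias) so that the last `window` values predict
    the next one, via the regularized normal equations."""
    X = np.array([series[t - window:t] for t in range(window, len(series))])
    X = np.hstack([X, np.ones((len(X), 1))])  # bias column
    y = np.asarray(series[window:], dtype=float)
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ y)

def predict_tdw(w, history, n_steps):
    """Predict n_steps ahead, feeding every prediction back into the window."""
    buf = list(history[-(len(w) - 1):])
    out = []
    for _ in range(n_steps):
        y_hat = float(np.dot(w[:-1], buf) + w[-1])
        out.append(y_hat)
        buf = buf[1:] + [y_hat]  # slide the delay window forward
    return out
```

Trained on a constant speed signal, the iterated predictions stay at that constant, illustrating the feedback loop.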
Of course, we need to know how many previous values need to be incorporated: the
size of the time delay window needs to be determined. As shown in Figure 4.3, the opti-
mal window size is 2 for the speed profile. For the acceleration profile, a window of size 3
is chosen. Lastly, for the power profile, a very large window size is preferred. The error
function flattens out around window size 200. This value is therefore chosen as the window
size.
Figure 4.3: RMSE error values vs. time window size in the TDW model
Weighted time delay window with averages (TDWAt/TDWAtdw)
Figure 4.4: The weighted time delay model (TDW) used for each profile (with added averages at step t (TDWAt) in dashed rectangle)

The TDW model above can be extended with input of the average profiles of previous trips to investigate the influence of the information in the road graph. The predicted value at each step is combined with the average value at that step (TDWAt). Additionally, this model is extended again by also including a weighted average of the history of the average profiles (TDWAtdw).
Initially all information from the road graph was included: the slow and fast averages
of power demand, speed and acceleration as well as the average chance to stop over the
current road segment and the elevation difference between two successive samples.
Some of the information in the road graph is not useful. To detect the contributing averages, feature selection is done on the input dimensions using Least Angle Regression (LAR). With this method, the weights of input dimensions that hardly contribute towards improving the solution can drop to 0, which is not possible using linear (or ridge) regression. For the specifics of this algorithm we refer to the article by Efron et al. [12]. This method was chosen over other similar stepwise methods such as forward feature selection because it is just as fast as forward feature selection but generally performs better [12]. No additional experiments were done in this thesis to confirm this, however.
For every profile, the last observed value was selected. For the power demand profile,
the fast average power value was selected as well. For the speed profile almost all averages
remained non-zero, except for the slow average speed. Lastly for the acceleration profile,
both the fast average acceleration and the average chance to stop were chosen.
Training the TDWAt model weights using LAR does not perform as well as training
the weights using ridge regression i