Jonas Buyl - Power Demand Prediction of Vehicles on a Non-fixed Route


Jonas Buyl

Power demand prediction of a vehicle on a non-fixed route

Master's thesis submitted to obtain the academic degree of Master of Engineering: Computer Science (Master in de ingenieurswetenschappen: computerwetenschappen)

Academic year 2011-2012

Faculteit Ingenieurswetenschappen en Architectuur (Faculty of Engineering and Architecture)
Vakgroep Elektronica en Informatiesystemen (Department of Electronics and Information Systems)
Chair: prof. dr. ir. Jan Van Campenhout

Promoters: prof. dr. ir. Benjamin Schrauwen, dr. ir. David Verstraeten
Supervisors: Pieter Buteneers, Tim Waegeman



Power demand prediction of a vehicle on a non-fixed route

Jonas Buyl

Supervisors: prof. dr. ir. Benjamin Schrauwen, dr. ir. David Verstraeten, ir. Pieter Buteneers, ir. Tim Waegeman

Abstract: In this article several approaches are presented to predict the future power demand and speed of a car, as well as the chance that it stops within 200 m. We introduce a time series prediction model based on Reservoir Computing, a novel technique for training recurrent neural networks. The model is improved by using information from previous trips and by post-processing the predicted output window. Furthermore, we present an RC-based classifier to predict the chance that the car stops within the next 200 m. This classifier is used to split the model into one model trained on intervals where the car keeps driving and one trained on intervals where it stops.

Index Terms: Reservoir Computing, vehicle behavior prediction, stop prediction, road graph, electric vehicles, time series prediction

    I. INTRODUCTION

ELECTRIC vehicles (EVs) are increasingly commercially viable, but sales figures remain fairly disappointing, often because of the high price. The battery has been the main way of storing energy in EVs because of its large power-to-weight ratio, but batteries are not as capable as capacitors of handling peaks in power demand. New research proposes to use supercapacitors (capacitors with an energy density much greater than that of conventional capacitors) in electric vehicles to replace batteries.

The ChargeCar project [1] proposes to combine the advantages of both, so that cheaper batteries and supercapacitors can be used to reduce the manufacturing cost of EVs. The capacitor is used as a buffer to handle the high spikes in power demand. This extends battery lifetime, increases efficiency in cold weather, and can even extend the range of the EV.

To direct the energy flows between battery, capacitor, and engine, a controller is needed. In this article we introduce several approaches to predict vehicle behavior and upcoming stops. These predictions can then be used to improve an intelligent controller.

We make use of Reservoir Computing (RC), a novel way of training recurrent neural networks [3]. Instead of training all internal weights, only the output weights are trained. The weights of the input and the internal connections are generated randomly and remain constant.

First, we modify a GPS map generation algorithm presented by L. Cao and J. Krumm [2] to keep information about the car's current power demand, speed, acceleration, etc. This information is then used as extra input for time series prediction of the power demand, speed and acceleration profiles using Reservoir Computing [4]. Additionally, we observed that the weight of the predicted output relative to the information from previous trips decreases as the range of the prediction increases. Therefore the predicted output windows are post-processed using a simple linear model. Furthermore, we introduce an RC-based classifier to predict whether the car stops within 200 m. The classifier is then used to separate the time series prediction model for situations where the car stops within 200 m.

II. VEHICLE BEHAVIOR PREDICTION

A. Pre-processing

After building the road graph data structure defined by Cao et al. [2], the complete dataset is mapped onto the road segments. To better align the trip data, each trip was interpolated every meter, converting the data to a distance scale.
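Resampling a trip onto a fixed 1 m distance grid comes down to simple linear interpolation. A minimal sketch follows; the distances and speeds are made-up values, not the thesis data:

```python
import numpy as np

# Hypothetical trip samples: cumulative distance (m) and speed (m/s),
# logged at uneven spacing on a time base.
dist = np.array([0.0, 3.2, 7.9, 15.4, 22.1, 30.0])
speed = np.array([0.0, 2.1, 4.0, 6.5, 7.2, 8.0])

# Resample onto a 1 m grid so different trips over the same road align.
grid = np.arange(0.0, dist[-1] + 1.0, 1.0)   # 0, 1, ..., 30 m
speed_1m = np.interp(grid, dist, speed)
```

The same resampling would be applied to the power and acceleration profiles, so that all signals share the distance axis.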

    B. Single RC time series prediction model (RCLA)

The speed, acceleration and power profiles are used as input in separate systems, each with a reservoir of 150 neurons. The neuron output weights are trained using ridge regression. Each predicted value is sent back in an output feedback loop to recursively predict the rest of the sequence. The reservoir state outputs at each step t are extended with the averages of the information of previous trips at that step, to predict the next output value y(t).
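The setup above can be sketched as follows. The reservoir size of 150 matches the text; the input scaling, spectral radius, regularization strength and toy signals are illustrative assumptions, not the thesis's tuned values:

```python
import numpy as np

rng = np.random.default_rng(0)
N_RES, N_STEPS = 150, 1000

# Random, fixed input and recurrent weights: only the readout is trained.
w_in = rng.uniform(-0.1, 0.1, size=N_RES)
w_res = rng.normal(size=(N_RES, N_RES))
w_res *= 0.9 / np.abs(np.linalg.eigvals(w_res)).max()  # spectral radius 0.9

def run_reservoir(signal):
    """Drive the reservoir with a 1-D signal; return the state sequence."""
    states = np.zeros((len(signal), N_RES))
    x = np.zeros(N_RES)
    for t, u in enumerate(signal):
        x = np.tanh(w_in * u + w_res @ x)
        states[t] = x
    return states

# Toy speed profile and per-position averages from previous trips (assumed).
speed = np.sin(np.linspace(0, 20, N_STEPS)) + 0.1 * rng.normal(size=N_STEPS)
trip_avg = np.sin(np.linspace(0, 20, N_STEPS))

# Extended state: reservoir activations plus the trip average at each step.
X = np.hstack([run_reservoir(speed), trip_avg[:, None]])
y = np.roll(speed, -1)     # target: next value (last sample wraps; ignored here)

# Ridge regression readout: w = (X'X + lambda*I)^-1 X'y
lam = 1e-4
w_out = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
pred = X @ w_out
```

At run time the readout's prediction would replace the measured input (the output feedback loop) to roll the sequence forward recursively.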

    C. Output window post-processing (OWPP)

The training process of the time series model consists of only predicting the next step ahead. We observed that the influence of the predicted output relative to the information from previous trips decreases as the range of the prediction increases. The output window is therefore post-processed by applying linear regression at each time step t individually, combining the predicted values with the average values at point t.
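A sketch of this per-step blending follows. The horizon length, noise levels and all data are synthetic assumptions; the point is only that a separate small linear model is fitted at each horizon step, so the trip average can receive more weight where the raw prediction has degraded:

```python
import numpy as np

rng = np.random.default_rng(1)
H, n_windows = 200, 500    # horizon length and training windows, assumed

# Synthetic training data: ground truth, trip averages, and raw model
# predictions whose error grows with the horizon step h.
truth = rng.normal(size=(n_windows, H))
trip_avg = truth + rng.normal(scale=0.5, size=(n_windows, H))
raw_pred = truth + rng.normal(size=(n_windows, H)) * np.linspace(0.1, 2.0, H)

# For every horizon step h, fit y(h) ~ a(h)*raw(h) + b(h)*avg(h) + c(h).
coefs = []
for h in range(H):
    A = np.column_stack([raw_pred[:, h], trip_avg[:, h], np.ones(n_windows)])
    coefs.append(np.linalg.lstsq(A, truth[:, h], rcond=None)[0])
coefs = np.array(coefs)    # shape (H, 3)

def postprocess(pred_window, avg_window):
    """Blend one predicted window with the trip averages, step by step."""
    A = np.column_stack([pred_window, avg_window, np.ones(len(pred_window))])
    return np.sum(A * coefs, axis=1)
```

On this synthetic data the learned weight on the trip average grows with the horizon step, mirroring the observation in the text.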

    D. Stop prediction

A reservoir was used with a logistic regression readout to classify a sample t as a point where the car stops within 200 m. First, the current power demand, acceleration and speed at t are used as input to the reservoir. Second, the average acceleration of previous trips at t + 20 is used, excluding trips where the car does not stop within 200 m of t + 20. Lastly, the average chance to stop within 200 m of t + 20 is used.

For the evaluation of this classifier, the area under the ROC curve (AUC) is maximized. The true positive rate and false positive rate are calculated for every threshold that can separate the classes, when classifying the output of the reservoir readout between [0, 1]. A maximum average AUC of 0.955 was found and 94.5% of the samples were correctly classified, tested on a dataset in which 10% of the samples are actual stops. From Figure 1, we can see that the predicted chance to stop is usually high at points where the car stops. Around places where the car brakes but doesn't stop, the output can be high as well. This could be interpreted as an error, but this output may still be useful for some applications.
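The threshold sweep behind the ROC curve can be written in a few lines. The scores below are illustrative draws (about 10% positives, as in the test set), not the thesis's readout outputs:

```python
import numpy as np

def roc_auc(scores, labels):
    """Sweep every threshold over the scores; return FPR, TPR and the AUC."""
    order = np.argsort(-scores)      # thresholds visited in descending order
    labels = labels[order]
    tpr = np.concatenate(([0.0], np.cumsum(labels) / labels.sum()))
    fpr = np.concatenate(([0.0], np.cumsum(1 - labels) / (1 - labels).sum()))
    return fpr, tpr, np.trapz(tpr, fpr)   # area under the curve

rng = np.random.default_rng(0)
# Toy readout scores: ~10% positives (stops), scored higher on average.
labels = (rng.random(2000) < 0.10).astype(int)
scores = rng.normal(loc=2.0 * labels)

fpr, tpr, auc = roc_auc(scores, labels)
```

Picking the operating threshold then amounts to choosing the point on this curve that optimizes the criterion of interest, such as the misclassification rate.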

Fig. 1. An example of the output of the RC stop prediction model. The green areas are the target areas where the car stops within 200 m. At the bottom, the speed profile of the trip is given for comparison with the actual car behavior. The chosen threshold is shown as a grey dashed line.

E. Split model time series prediction (RCSP)

The RCLA model was separated by training and optimizing one model on a dataset of intervals where the car stops, and another model on the remaining intervals. The stop classifier is then used to determine which model should be used for the time series prediction. Finally, the OWPP filter was separated as well and applied to the RCSP model to further improve results.
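The routing step reduces to a threshold on the classifier output. A minimal sketch, in which the two specialised models, the 0.5 threshold and the toy profiles are all assumptions for illustration:

```python
import numpy as np

def predict_interval(features, stop_prob, stop_model, drive_model, threshold=0.5):
    """Pick the specialised predictor based on the stop classifier's output."""
    model = stop_model if stop_prob >= threshold else drive_model
    return model(features)

# Toy stand-ins for the two specialised models.
stop_model = lambda speed: np.maximum(speed - np.arange(len(speed)), 0.0)  # decelerate to 0
drive_model = lambda speed: speed                                          # hold the profile

window = predict_interval(np.full(5, 3.0), stop_prob=0.8,
                          stop_model=stop_model, drive_model=drive_model)
```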

    III. EVALUATION

The proposed RC-based models were compared with a number of linear methods. The best performing methods made use of a time delay window (TDW): a weighted average of the previous values, trained using linear regression. A second model extends the TDW model by including a weighted average of the information of previous trips (TDWAtdw).
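The TDW baseline can be sketched as an ordinary least-squares fit over lagged values. The window length and the toy series are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
W = 10   # time delay window length, an illustrative choice

# Toy series standing in for e.g. a speed profile.
series = np.sin(np.linspace(0, 30, 2000)) + 0.05 * rng.normal(size=2000)

# Each row holds the W previous values; the target is the next value.
X = np.array([series[j:j + W] for j in range(len(series) - W)])
y = series[W:]

# Train the weighted average (plus bias) with ordinary linear regression.
A = np.column_stack([X, np.ones(len(X))])
w = np.linalg.lstsq(A, y, rcond=None)[0]
pred = A @ w
```

The TDWAtdw variant would simply append the trip-average features as extra columns of the design matrix before fitting.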

Trips of one driver were used from the dataset supplied by the ChargeCar project. A random subset of 2,230,500 samples was chosen and divided into 9566 intervals to predict. Of these data, 25% was used for training, another 25% for validation, and the remaining 50% was used to compare the models. The results of the RC models are averages taken over 10 reservoir instances.

The results of all discussed models are given in Table I. The first RC-based model, RCLA, does not yield much better results than the linear methods. However, after output window post-processing the root mean squared error (RMSE) decreases significantly. The RCSP model predicts the speed better than the other models, and when extended with the OWPP filter, it outperforms every other tested model in predicting the power demand, acceleration and speed profiles. In Figure 2 the absolute deviation is given over the predicted distance. The OWPP improves the result towards the end of predictions, whereas the RCSP model improves the result at the start of predictions.

RMSE (STD)    Power (W)      Speed (m/s)     Acceleration (m/s²)
TDW           8766           1.481           0.3136
TDWAtdw       8401           1.502           0.3111
RCLA          8416 (4.47)    1.423 (0.018)   0.3130 (0.0005)
RCLA/OWPP     8386 (3.67)    1.311 (0.012)   0.3081 (0.0002)
RCSP          8367 (25.62)   1.304 (0.017)   0.3023 (0.0005)
RCSP/OWPP     8257 (12.48)   1.257 (0.016)   0.2992 (0.0006)

TABLE I. Average RMSE error rates (and standard deviation).

    Fig. 2. Average absolute deviation over the predicted distance.

    IV. CONCLUSION

It is possible to use data from previous trips and Reservoir Computing to predict the future power demand, speed and acceleration profiles. Using a classifier to predict whether a stop is imminent significantly improves the results. Post-processing the predicted output interval further boosts the performance. The average absolute deviation of the predicted speed 200 m ahead is 6 km/h.

Both the predicted profiles and the stop predictor could be used for an intelligent vehicle energy management controller.

    REFERENCES

[1] H. Benjamin Brown, Illah Nourbakhsh, Christopher Bartley, Jennifer Cross, Paul S. Dille, Joshua Schapiro, and Alexander Styler. ChargeCar community conversions: Practical, custom electric vehicles now! Number CMU-RI-TR-, March 2012.

[2] Lili Cao and John Krumm. From GPS traces to a routable road map. In Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, GIS '09, pages 3-12, New York, NY, USA, 2009. ACM.

[3] David Verstraeten, Benjamin Schrauwen, Michiel D'Haene, and Dirk Stroobandt. An experimental unification of reservoir computing methods. Neural Networks, 20(3):391-403, April 2007.

[4] Francis wyffels, Benjamin Schrauwen, and Dirk Stroobandt. Stable output feedback in reservoir computing using ridge regression. In V. Kurkova, R. Neruda, and J. Koutnik, editors, Proceedings of the 18th International Conference on Artificial Neural Networks, pages 808-817, Prague, September 2008. Springer.


Power demand prediction of a vehicle on a non-fixed route

Jonas Buyl

Promoters: prof. dr. ir. Benjamin Schrauwen, dr. ir. David Verstraeten
Supervisors: ir. Pieter Buteneers, ir. Tim Waegeman

Summary: In this article we present several ways to predict the power consumption and speed of a vehicle, as well as the chance that it stops within 200 m. We introduce a time series prediction model based on Reservoir Computing, a new technique for training recurrent neural networks. The model was further improved with information from previous trips and with post-processing of the prediction window. Furthermore, we use an RC classification model that predicts the chance of stopping to split the earlier model, by training one model on driving data and another on intervals where the car stops. The stop predictor then determines which model should be used to predict the next interval.

Keywords: Reservoir Computing, vehicle behavior prediction, stop prediction, road graph, electric vehicles, time series prediction

    I. INTRODUCTIE

    ELEKTRISCHE voertuigen (EV) zijn meer en meer com-

    mercieel aantrekkelijker maar de verkoopscijfers blijven

    tegenvallen, dikwijls o.w.v. de hoge kostprijs van de batterij.

    De batterij is de meest gebruikte manier om energie op te slaan

    in EVen, maar ze zijn niet zo goed in staat om grote pieken

    in het vermogen op te vangen zoals condensatoren. Nieuwonderzoek stelt voor om supercondensatoren condensatoren

    met een veel grotere energiedensiteit te gebruiken in EVen

    i.p.v. batterijen.

The ChargeCar project [1] proposes to combine the advantages of both, so that cheaper batteries and supercapacitors can be used. The capacitor is then used as a buffer against high peaks in power demand. This improves battery lifetime and efficiency in cold weather, and can even extend the range of the EV.

To direct the energy flows between battery, capacitor and engine, a controller is needed. In this article we introduce several ways to predict the behavior of a vehicle and upcoming stops. These predictions can then be used to improve an intelligent controller.

We make use of Reservoir Computing (RC), a fairly new technique for training recurrent neural networks [3]. Instead of training all internal weights, only the output weights are trained. The remaining connections stay constant and are generated randomly.

First, we adapt an automatic algorithm for generating GPS maps [2] to keep information about the car (such as the current power demand, speed, etc.). This information can then be used to improve the models. Moreover, we observed that the weight of the predicted value relative to the information from previous trips decreases with the predicted distance. Therefore the predicted window is post-processed with a simple linear model. Furthermore, we introduce an RC classification model to predict whether the car stops within 200 m. This model is then used to split the time series prediction models according to whether the car stops or not.

II. VEHICLE BEHAVIOR PREDICTION

A. Pre-processing

After building the road graph defined by Cao et al. [2], the complete dataset was mapped onto the road segments. To better align the data, the trips were interpolated every meter so that the data is on a distance scale.

B. Single RC time series prediction model (RCLA)

The speed, acceleration and power are used as input in separate reservoirs of 150 neurons each. The output weights are trained with ridge regression [4]. Each predicted value is fed back to recursively predict the rest of the sequence. The reservoir output at each step t is extended with the information of previous trips at that step to predict the next output y(t).

C. Output window post-processing (OWPP)

The time series models are only trained to predict the next step. We noticed that the influence of the predicted output relative to the information from previous trips decreases as the prediction distance grows. The output window was therefore post-processed by applying linear regression at each step t individually, combining the predicted value with the average value at point t.

D. Stop prediction

A reservoir was combined with a logistic regression readout to classify a point t as a point where the car stops within 200 m, with the following inputs: the current power demand, the speed, the acceleration, the average acceleration of previous trips at the point t + 20, and lastly the average chance to stop within 200 m of the point t + 20. For the evaluation of this model, the area under the ROC curve (AUC) was maximized. The true positive and false positive rates are computed for every threshold that can separate the classes in the output of the reservoir readout, which lies between [0, 1]. We found a maximum average AUC of 0.955. After minimizing the fraction of misclassifications, a threshold was found that classifies 94.5% of the points correctly. From Figure 1 we can see that the predicted chance to stop is usually high where the car stops. Around places where the car brakes but does not stop, the output is sometimes high as well. This results in an error, but such output can still be useful for other applications.

Figure 1. An example of the output of the RC stop prediction model. The green areas are the target areas where the car stops within 200 m. At the bottom, the speed profile is given for comparison with the actual behavior. The chosen threshold is shown as the grey dashed line.

E. Split model time series prediction (RCSP)

The RCLA model was separated by training and optimizing one model on a dataset of intervals where the car stops, and another model on the other intervals. The stop predictor is then used to determine which model's prediction should be used. Finally, the OWPP filter was also trained separately and applied to this model to improve the results further.

III. EVALUATION

The RC-based models were compared with several linear methods. The best models make use of a time delay window (TDW): a weighted average of the previous values, trained with linear regression. A second model extends the TDW model with a weighted average of the information from the road graph (TDWAtdw).

The trips of one driver from the ChargeCar project were used. A random subset of 2,230,500 samples was chosen and divided into 9566 intervals to predict. Of these data, 25% was used for training, another 25% for validation, and the remaining 50% was used to compare the models. The results of the RC models are averages over 10 reservoirs.

The results of all models can be found in Table I. The first RC-based model, RCLA, offers no large improvement over the linear methods, but the OWPP filter does decrease the root mean squared error (RMSE) considerably. The RCSP model predicts the speed better than the other models, and extended with an OWPP filter it performs better than every other model. In Figure 2 the absolute deviation is given over the predicted distance. The OWPP improves the result towards the end of the predictions, whereas the RCSP model improves the results at the start.

RMSE (STD)    Power (W)      Speed (m/s)     Acceleration (m/s²)
TDW           8766           1.481           0.3136
TDWAtdw       8401           1.502           0.3111
RCLA          8416 (4.47)    1.423 (0.018)   0.3130 (0.0005)
RCLA/OWPP     8386 (3.67)    1.311 (0.012)   0.3081 (0.0002)
RCSP          8367 (25.62)   1.304 (0.017)   0.3023 (0.0005)
RCSP/OWPP     8257 (12.48)   1.257 (0.016)   0.2992 (0.0006)

Table I. Average RMSE (and standard deviation).

Figure 2. Average absolute deviation over the predicted distance.

IV. CONCLUSION

It is possible to improve the prediction of the speed, power demand and acceleration by using information from previous trips and Reservoir Computing. With a stop predictor this model can be improved further, and post-processing the output window boosts the performance even more. The average absolute deviation of the prediction at meter 200 is 6 km/h. Moreover, the predicted profiles and the stop predictor can be used together in an intelligent controller that directs the energy in an EV.



Permission for use - Copyright

The author gives permission to make this master dissertation available for consultation and to copy parts of this master dissertation for personal use. In the case of any other use, the limitations of the copyright have to be respected, in particular with regard to the obligation to state expressly the source when quoting results from this master dissertation.

Jonas Buyl, June 10, 2012


    Acknowledgments

First, I would like to thank my promoters prof. dr. ir. Benjamin Schrauwen and dr. ir. David Verstraeten for their advice and for making this research possible. I would also like to thank my supervisor Pieter Buteneers for his guidance and patience in letting me work and discover at my own pace.

On a personal level I owe much gratitude to my friends and family for their support. I am especially grateful to Sara for her understanding and patience. Lastly, I thank my parents for giving me the means to study.


    Power demand prediction of a vehicle on a non-fixed route

    by

    Jonas Buyl

Thesis submitted in partial fulfillment of a Master's Degree in Engineering: Computer Science

    Academic year: 2011-2012

    Universiteit Gent

    Faculty of Engineering

Promoters: prof. dr. ir. Benjamin Schrauwen, dr. ir. David Verstraeten
Supervisor: ir. Pieter Buteneers

    Summary

In this thesis several approaches are presented to predict the future power demand and speed of a car, as well as other upcoming events that affect this demand. First, a road graph data structure for automatic GPS map generation is adapted to capture local vehicle behavior information.

The average local vehicle behavior is then used as extra information for the time series prediction of the power demand, speed and acceleration using Reservoir Computing, a novel technique for training recurrent neural networks. The predicted output window is post-processed using a simple linear technique.

Thirdly, another system is presented that uses the current acceleration profile as well as the information in the road graph to predict the chance that the car is going to stop within the next 200 m.

Finally, two separate time series prediction models are trained: one for when the car stops within the next 200 m, and one for when it does not. For each prediction, the model used is then determined by the stop prediction model.

    Keywords: reservoir computing, vehicle behavior prediction, road graph, electric

    vehicles


    Contents

    1 Introduction 2

    1.1 A battery-capacitor hybrid setup . . . . . . . . . . . . . . . . . . . . . . . . 2

    1.1.1 Battery vs. supercapacitor . . . . . . . . . . . . . . . . . . . . . . . 2

    1.1.2 Controller . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

    1.2 Problem statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

    1.3 Content and structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

    2 Reservoir Computing 6

    2.1 Introduction to neural networks . . . . . . . . . . . . . . . . . . . . . . . . . 6

    2.2 The Reservoir Computing approach . . . . . . . . . . . . . . . . . . . . . . 7

    2.3 Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

    2.3.1 Linear regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

    2.3.2 Logistic regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

    2.4 Regularization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

    2.5 Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

    2.5.1 Input scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

    2.5.2 Spectral radius . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

    2.5.3 Leak rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

    2.5.4 Bias scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

    2.5.5 Reservoir size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

    2.6 Time series prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.6.1 Time series prediction using Reservoir Computing . . . . . . . . . 16

2.7 Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

    3 Data analysis 18

    3.1 A road graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

    3.2 Extracting useful information . . . . . . . . . . . . . . . . . . . . . . . . . . 20

    3.3 Error measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22


    3.3.1 Defining a prediction distance . . . . . . . . . . . . . . . . . . . . . . 22

    3.3.2 Root Mean Square Error (RMSE) . . . . . . . . . . . . . . . . . . . 24

    3.3.3 Kurtosis Difference . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

    3.3.4 Receiver Operating Characteristic (ROC) . . . . . . . . . . . . . . . 25

    4 Time series prediction of vehicle power, speed and acceleration 28

    4.1 Evaluation methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

    4.2 Baseline models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

    4.2.1 System setups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

    4.2.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

    4.3 A prediction system with Reservoir Computing . . . . . . . . . . . . . . . . 36

    4.3.1 System setups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

4.3.2 Parameter optimization . . . . . . . . . . . . . . . . . . . . . . . . . 38

4.3.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

    4.4 Output window post-processing . . . . . . . . . . . . . . . . . . . . . . . . . 45

    5 Stop prediction 47

    5.1 Predicting the chance to stop using RC . . . . . . . . . . . . . . . . . . . . 48

    5.2 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

6 Splitting the system for stopping and driving behavior 55

    6.1 The model setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

    6.2 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

    6.2.1 RCSP Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

    7 Conclusion 59

    A Extra tables 61

    A.1 Kurtosis difference results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

    A.2 Model parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

    B Extra figures 66

B.1 Stop prediction model examples . . . . . . . . . . . . . . . . . . . . . . . . 66

B.2 Time series prediction examples . . . . . . . . . . . . . . . . . . . . . . . . 68


    Chapter 1

    Introduction

Electric cars or vehicles (EVs) are increasingly commercially viable, but seem to be held back by a number of problems. Not only are there a lot of myths around EVs, but some real issues remain as well.

One of the myths, for example, concerns battery life expectancy, an issue which has been largely solved. Nissan even announced an eight-year warranty on the batteries of its electric model, the LEAF (Figure 1.1). People usually drive a car for longer than eight years, so some money will be spent on battery replacements, but these maintenance costs are lower than the more frequent repairs needed in a regular gas-powered car [9].

The LEAF is not as successful as anticipated, however, because the price is still too high for people to switch. Often a third of the total price is spent solely on the expensive lithium batteries. Prices are expected to drop through mass production, and governments all over the world give substantial incentives. A radical change may be necessary, however, to make it interesting for consumers to buy an EV for shorter distances while keeping a regular car for long distances. One idea is to mitigate the disadvantages of cheaper batteries (such as a short life expectancy) in another, cheaper way, so the price can be brought down.

    1.1 A battery-capacitor hybrid setup

The ChargeCar project [1] is committed to finding new ways to bring down the costs of EVs. One of their ideas is to exploit the advantages of both batteries and capacitors.

    1.1.1 Battery vs. supercapacitor

The battery has been the main way of storing energy in EVs because of its large energy-to-weight ratio. This simply means it allows the car to go further without adding a lot of


weight. The downside, however, is that batteries are generally less efficient when coping with large spikes in power demand. Not only do fast charges and discharges decrease battery life expectancy, they also decrease the capacity of batteries. This is especially true for lead-acid batteries, due to the chemical and structural changes in the interface under high load. These increase resistance and therefore decrease capacity. Lithium-based batteries, on the other hand, have been shown to lose capacity because of the higher temperatures caused by high power loads [11].

Figure 1.1: The Nissan LEAF full electric vehicle

Capacitors, like batteries, are electrical components used to store energy. They consist of two metal plates separated by a thin insulation layer. Electrons can then be transferred from one plate to another, charging and discharging the capacitor. One advantage over batteries is that they show little degradation even after several hundreds of thousands of charge cycles. They are especially more proficient than batteries in handling large power demand peaks. The energy density of capacitors is a lot lower than that of batteries, however.

Supercapacitors¹, on the other hand, have a much greater energy density than capacitors [24]. The amount of energy that can be stored in a capacitor increases with the surface area of the metal plates. In supercapacitors, the plates are coated with a carbon layer, etched to produce many holes that extend through the material, much like a sponge. This increases the interior surface area by many orders of magnitude, greatly increasing the energy density (> 100,000 times).

¹ Also referred to as an electric double-layer capacitor (EDLC), ultracapacitor, etc.


    1.1.2 Controller

    The solution presented by ChargeCar is to exploit the advantages of both a battery pack

    and a supercapacitor. The capacitor is used for high spikes in power demand, and to save

    energy generated while braking (through regenerative braking). When the capacitor is

    empty, the battery is used to supply power. When generating power while the capacitor

    is already full, the battery is charged. The supercapacitor effectively works as a buffer

    between engine and battery, relieving the battery. Using both systems together then

    allows car manufacturers to use cheaper, more cost-effective components. Furthermore,

capacitors function well in temperatures as low as -40 °C, when batteries are at their worst.

Figure 1.2: A controller guides power flows between battery, capacitor and engine

To direct the energy flows between battery, capacitor and engine, a controller is needed as shown in Figure 1.2. When accelerating, as much power as possible should be drained from

    the capacitor to handle the high energy demand for accelerating. When braking all energy

    generated by regenerative braking should be saved in the capacitor. After accelerating and

    keeping a constant speed, the capacitor will be nearly empty, which means the battery

    will need to be used. Ideally this is the only time the battery is used.

    Now consider the following situation: the car is approaching an intersection but only stops

    there sometimes. It could be useful in this situation to make sure the capacitor is com-

    pletely filled in case the car does slow down. Another example: the car is driving steadily

at 70 km/h but wants to overtake another car. The capacitor should have some energy left to handle the short power burst, which means it's desirable to transfer some energy slowly from the battery to the capacitor, when the capacitor is almost empty, to handle any possible peak.

Finding an optimal controller, then, is a complex problem. To minimize battery usage it

    could be beneficial to try to predict vehicle behavior and upcoming driving environments.

    An intelligent controller could then use these predictions to optimize capacitor usage.


    1.2 Problem statement

In this thesis we will investigate to what extent it is possible to predict the future power

    demand and speed, as well as other upcoming events that affect this demand. These

    predictions could then be useful for the controller described above. Previous research on

    this subject includes the prediction of power demand for hybrid vehicles on a fixed route

    by Bartholomaeus et al.[4] and Johannesson et al.[18], but these make use of a fixed route

    where the data set is perfectly aligned, and they predict vehicle behavior assuming the

    vehicle drives along the same route. The reality however, is much more complex. To

    truly investigate the possibilities of prediction in real-life situations we do not make the

    assumptions made by Bartholomaeus and Johannesson, making it less relevant to compare

    results. In this work predictions are made without assuming the vehicle is on a fixed route.

The models presented here can be easily adapted to work under real-life circumstances where vehicles are driving on a non-fixed route, and data is collected while driving.

    1.3 Content and structure

To do this we first gather information from previous trips into a single data structure in

    Chapter 3. We then try several approaches using Reservoir Computing and other machine

    learning techniques explained in Chapter 2. One approach is to use these techniques for

    time series prediction of power demand and other factors that it depends on (e.g. speed)

    which we discuss in Chapter 4. In Chapter 5 we try to calculate the chance of stopping

    within a short distance. Finally in Chapter 6, we use the stop predictor from Chapter 5

    and determine if they can improve the prediction models presented in Chapter 4.


    Chapter 2

    Reservoir Computing

When problems become so complex that they can't be solved efficiently by ordinary algorithms, a near-optimal solution can sometimes be found more efficiently using machine learning

    techniques. This usually comes down to a model that is trained to capture underlying

    characteristics of data. These can then be used to predict the output of new input data.

    2.1 Introduction to neural networks

    A neural network is a model based on the biological structure of the brain and consists

    of several interconnected neurons. Each neuron has input and output connections that

    connect it to the rest of the network. The output is calculated by taking a weighted sum

    of those input connections, usually transformed by a non-linear activation function (for

    the rest of this work the hyperbolic tangent tanh is used).

    When there are no recurrent connections or cycles in the network, the network is called

    a feed-forward neural network (FFNN). If there are cycles in the network it is called a

    recurrent neural network (RNN). A neural network is trained by adjusting the weights

    according to the error rate between target and predicted output. If the output depends on

large chains of neurons, the adjustments can become so small that they can't be calculated anymore. In RNNs the output depends on arbitrarily long chains of neurons, which makes this type of neural network very hard to train. Algorithms like back-propagation-through-time [30] are able to solve the problem, but the algorithm is very complex and takes a long time to compute.


    2.2 The Reservoir Computing approach

    Reservoir Computing (RC) is a fairly new approach to training recurrent neural networks

[28]. It's a unifying term for several similar methods discovered independently, the most important ones being Echo State Networks [16] and Liquid State Machines [21]. The idea is to never train the network itself, but to train only the weight of each neuron to the

    output: a readout function. All other weights, such as the input connections and internal

    connections, are fixed and initialized randomly, but can be scaled and tuned (see section

    2.5 on reservoir parameters).

    To understand the dynamics of reservoirs consider the following analogy, which is usu-

ally given to explain Liquid State Machines. It does not capture the whole picture of Reservoir Computing, but it gives an idea of what happens to the state of the reservoir when affected by an external input. Imagine the hidden layers of the reservoir network

as a real reservoir or liquid. We would like a warning system that warns us when someone throws a large object in the liquid (the input). A single throw will generate ripples

    in the reservoir, converting that input to a spatio-temporal pattern along the surface of

    the reservoir. To detect this pattern we place floating sensors in the reservoir which are

    evidently connected through the liquid. The state of the reservoir can then be read from

    the sensor values at a specific point in time.

    This analogy makes it clear that certain parameters can heavily influence the reservoir

    dynamics: the number of sensors, the size of the thrown object (the input), the way the

connecting surface behaves when an object falls in the water, etc. They are further discussed in section 2.5 in the context of Reservoir Computing specifically.

    In general, reservoirs are used to give a high-dimensional dynamic representation of the

    input, called the state of the reservoir. Because they are interconnected, they possess a

    memory which depends largely on the scaling of the internal connections. Extra memory

    can also be introduced for every neuron individually by retaining a part of the previous

neuron output value. To work properly, the reservoir needs to satisfy the Echo State Property [16, 15]: the reservoir needs to wash out any information from initial conditions.

    In practice, a reservoir network consists of the following:


    Figure 2.1: A schematic representation of a typical reservoir network. Solid arrow lines are not

    trained. Dashed arrow lines are trained connections.

u[k]: the reservoir input vector at time step k
    x[k]: the reservoir state at time step k
    y[k]: the output vector at time step k

    A schematic representation is given in Figure 2.1.

    The state of the reservoir x[k], which retains a fraction (1 − λ) of the previous state x[k−1] at each time step k (with λ the leak rate), is given by:

    x[k] = (1 − λ) x[k−1] + λ f(Wr x[k−1] + Wi u[k] + Wb)

    The weights of the internal connections Wr are initialized with random values from the

    normal distribution, but scaled so that the largest absolute eigenvalue of the random ma-

    trix is equal to a given parameter value: the spectral radius (see subsection 2.5.2). The

    input weights Wi are initialized randomly as well, but are rescaled by the input scaling

    parameter (subsection 2.5.1). A bias is sometimes added to the input with scaling Wb

    (see subsection 2.5.4).

The output is calculated by:

    y[k] = Wor x[k] + Woi u[k] + Wob

    The output weights Wor (reservoir to output), Woi (input to output) and Wob (output bias) need to be trained. They are the dashed connection arrows in Figure 2.1.
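These update and readout equations can be sketched in a few lines of NumPy, following the common leaky-integrator formulation. This is an illustrative setup, not the configuration used in this work: the reservoir size, leak rate, spectral radius, scaling values and the toy sine input are all placeholder choices.

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_res = 1, 100        # input dimension and reservoir size (placeholders)
leak = 0.3                  # leak rate: fraction of the state refreshed each step
rho = 0.9                   # desired spectral radius of the internal weights
input_scaling, bias_scaling = 0.5, 0.2

# Fixed random weights: internal (W_r), input (W_i) and bias (W_b) connections.
W_r = rng.standard_normal((n_res, n_res))
W_r *= rho / np.max(np.abs(np.linalg.eigvals(W_r)))  # rescale to spectral radius rho
W_i = input_scaling * rng.standard_normal((n_res, n_in))
W_b = bias_scaling * rng.standard_normal(n_res)

def update(x, u):
    """One state update: x[k] = (1 - leak) x[k-1] + leak tanh(W_r x + W_i u + W_b)."""
    return (1 - leak) * x + leak * np.tanh(W_r @ x + W_i @ u + W_b)

# Drive the reservoir with a toy input signal and collect the states;
# the rows of `states` are the x[k] that the readout is trained on.
u_seq = np.sin(np.linspace(0, 8 * np.pi, 200)).reshape(-1, 1)
x = np.zeros(n_res)
states = []
for u in u_seq:
    x = update(x, u)
    states.append(x)
states = np.array(states)   # shape (200, 100)

# The readout y[k] = W_or x[k] + W_oi u[k] + W_ob is the only trained part.
```

Only the readout weights in the last step are trained; everything above stays fixed after initialization.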


    2.3 Training

The main advantage of training recurrent neural networks using Reservoir Computing is that only the weights Wor of the reservoir neurons to the output need to be trained. An additional linear connection straight from input to output with weight Woi is sometimes added, as well as a constant value (or bias) with weight Wob.

    This training approach not only reduces the time required for training but also allows

    a wider variety of training methods to train the output weights. For this work, two dif-

    ferent training methods are used:

    2.3.1 Linear regression

The first step in training the weights using linear regression consists of letting the reservoir run over all the samples and keeping the reservoir state at every time step k in a matrix A. Suppose we want to train the weights W using simple linear regression; then we want to find the least squares solution for the desired output y and the predicted output ŷ:

    Wopt = argmin_W ‖A W − y‖²

    There exists a closed-form solution:

    Wopt = (A^T A)^(−1) A^T y

    Although A is large (n_samples × n_neurons), it's still possible to calculate the output weights relatively fast when compared to other RNN training techniques such as BPTT.

    2.3.2 Logistic regression

Logistic regression is a classification method that models the probability of an input sample x belonging to a certain class. In contrast to other probabilistic models, logistic regression uses a discriminative approach which classifies the inputs directly with the following probability:

    p(C1|x) = f(x) = σ(w^T x) = 1 / (1 + exp(−w^T x − w0))

    and p(C2|x) = 1 − p(C1|x) when solving a binary classification problem.


Figure 2.2: An example of classification using the logistic function. The shaded area is the overlap area between decision spaces. The red line is the suggested hard threshold.

These distributions are visualized in Figure 2.2. When distributions are not linearly separable¹ there's an overlap area, which means a hard threshold needs to be defined. This is often the point at which both probability functions intersect, but other thresholds can be chosen if misclassification costs are different for the two classes.

The weights of the logistic regression model are found by minimizing the cross-entropy function:

    E(w) = −ln p(t|w) = −Σ_{n=1}^{N} [ tn ln yn + (1 − tn) ln(1 − yn) ]

    where tn ∈ {0, 1} is 1 if the input sample belongs to class C1, and yn is the predicted output of the model. Note that correctly classified samples that lie far from the decision line do not get penalized. In ridge regression, however, samples that are correctly classified but lie far from the target output do get penalized. This is further illustrated in Figure 2.3.

The solution of the minimization does not have a closed form, but it is a convex problem² so we can find it through gradient descent³.

    ¹ Two sets of points in two dimensions are linearly separable if they can be separated by a single straight line.
    ² A convex problem is a problem that has a unique minimum.
    ³ Gradient descent is an optimization algorithm that finds the minimum error by taking steps along the negative of the gradient (or derivative) of the error function.

    There exists a gradient descent approach


based on the Newton-Raphson iterative optimization scheme called iteratively reweighted least squares (IRLS) [23]. The weights are updated each iteration by subtracting the derivative of the error function divided by the second derivative. The derivation and specifics of this algorithm are not important for this work, but the basic steps in each iteration consist of the following:

    y = σ((w^(τ))^T x)
    Rnn = yn (1 − yn)
    z = X w^(τ) − R^(−1) (y − t)
    w^(τ+1) = (X^T R X)^(−1) X^T R z

    Figure 2.3: The error measure E(z) for the mean squared error of target and model output (used e.g. in classification using linear regression) and the cross-entropy function used in logistic regression. In z = y − t, y is the model output and t the target output of the model. For t = 1, a model output y = 2 is penalized more by the mean squared error than a model output of y = 1, although both are classified correctly. This is not true for the cross-entropy function, which could therefore be more suitable for classification.

    2.4 Regularization

    When training a complex system, the model can become overfitted to the training samples.

    This means that the model will perform well on the training set, but not on new test

    data, because it is trained on examples that are not representative for the full range of

    possibilities. When the model is then tested on a sample it has not seen before in the

    training set, it wont know what to do with it. For example[27], suppose we want to train

    a model to predict the Fibonacci sequence [1, 1, 2, 3, 5, 8,...], and we give it the examples


[1, 1], the model will be trained to output 1. The training examples, like in the Fibonacci sequence, are examples of the underlying characteristics, affected by a small deviation or noise. The underlying characteristic for the Fibonacci sequence is known: the n-th Fibonacci number can be calculated by rounding φⁿ/√5, with φ = (1 + √5)/2. The deviation between the training examples and the underlying characteristic therefore always lies between −1/2 and 1/2. If the model is too complex, the deviation is trained as well. One of the ways to smooth this noise is by constraining weight size. This makes the model less sensitive to noise and slight deviations. However, if this constraint is too strict, the model is simplified too much to learn the underlying characteristics. A trade-off needs to be made. Using Tikhonov regularization [25] or ridge regression, the trade-off can be tuned with a single regularization parameter [32].

To find the least squares solution, the regularized weights W are now found by:

    Wopt = argmin_W ‖A W − y‖² + λ‖W‖²

    in which λ is the regularization parameter. This minimization problem has a closed-form solution as well; the weights can be calculated as follows:

    Wopt = (A^T A + λI)^(−1) A^T y
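As a sketch, this closed-form solution can be computed directly with NumPy. The state matrix A and targets y below are synthetic stand-ins for collected reservoir states, and setting the regularization parameter to 0 recovers the ordinary least squares solution of section 2.3.1.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins: a state matrix A (n_samples x n_neurons) and targets y.
A = rng.standard_normal((500, 50))
w_true = rng.standard_normal(50)
y = A @ w_true + 0.1 * rng.standard_normal(500)

def ridge(A, y, lam):
    """Closed-form Tikhonov-regularized least squares:
    W_opt = (A^T A + lam * I)^(-1) A^T y; lam = 0 gives ordinary least squares."""
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)

w_ols = ridge(A, y, 0.0)     # unregularized solution
w_reg = ridge(A, y, 10.0)    # regularized: weights are shrunk toward zero
```

In practice the regularization parameter is swept over several values and evaluated on validation data rather than fixed in advance.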

When λ is large, the size of the squared weights will increase the cost a lot. Setting λ too high, however, will increase the distance between the optimal solution and the regularized solution (referred to as underfitting). To optimize λ, the same model is trained each time with a different λ. The performance of each λ is then evaluated on new samples that are not part of the training set.

For training reservoir networks in particular, it's important to note that the weights depend on the random initialization, and that a regularization parameter needs to be optimized for every reservoir specifically. More regularization is needed as reservoir size increases, because complexity increases: in the extreme case there is a reservoir node for every training sample, mapping the sample exactly to the output. On the other hand, if reservoir size is extremely low, no regularization is needed because the model is not as complex.

    For logistic regression, proper regularization is often necessary as well. The optimized

    regularization parameter can be added to the IRLS algorithm easily by modifying the


weight update as follows:

    w^(τ+1) = (X^T R X + λI)^(−1) X^T R z
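A minimal NumPy sketch of this regularized IRLS update on synthetic two-class data; the regularization value, the iteration count and the small clamp on R (to avoid division by zero for confidently classified samples) are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic binary classification data; a constant column absorbs the bias w0.
X = rng.standard_normal((200, 2))
t = (X[:, 0] + X[:, 1] > 0).astype(float)
X = np.hstack([X, np.ones((200, 1))])

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def irls(X, t, lam=0.1, n_iter=20):
    """Regularized IRLS: w <- (X^T R X + lam * I)^(-1) X^T R z."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        y = sigmoid(X @ w)
        r = np.maximum(y * (1 - y), 1e-12)   # diagonal of R, clamped for stability
        z = X @ w - (y - t) / r
        XtR = (X * r[:, None]).T             # X^T R without forming R explicitly
        w = np.linalg.solve(XtR @ X + lam * np.eye(X.shape[1]), XtR @ z)
    return w

w = irls(X, t)
accuracy = np.mean((sigmoid(X @ w) > 0.5) == t)
```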

    2.5 Parameters

    As mentioned before, there are a number of parameters that need to be determined to

    control the dynamics of the reservoir. The results of a model trained using Reservoir Com-

    puting depend on the careful fine-tuning of these parameters. When using regularization

    in the readout function, the regularization parameter should be optimized separately for

    every reservoir parameter. Each model is trained using several different regularization

    parameters and tested on a validation set. The model with the optimal regularization

    parameter is then evaluated again on a separate test set to be sure of the general perfor-

mance of the model. We therefore need to divide the dataset into three parts. This could be

    a problem, especially when using a limited amount of data because we could accidentally

    choose a poor set of samples which can lead to misleading results.

    The best solution to counter this problem is cross-validation. The dataset is divided

    in K subsets. Each subset is then used exactly once as a test set and the others as a train-

    ing set. After this, the result is averaged, making sure the result is valid for the complete

    dataset. When an extra validation set is needed as well, the subsets used for training are

    divided again in a smaller training subset and a validation set. For example, suppose the

    dataset consists of 4 samples then the cross-validation scheme is shown in Table 2.1.

    2.5.1 Input scaling

    Input scaling determines the scaling of the random input weights to the reservoir. They

determine how much the neurons are excited by new input values. For very low input values the nonlinear neuron activation functions are barely activated, resulting in an almost linear system. Very high input values, however, will saturate the activation function, resulting in almost a binary step function. In other words: the input scaling determines the degree of nonlinearity in the system.

    2.5.2 Spectral radius

    The spectral radius of a reservoir is the largest absolute eigenvalue of the weight matrix

    of the internal connections between the neurons in the reservoir. It therefore defines the

factor by which the previous states are multiplied in the reservoir state update (section 2.2). If we choose a spectral radius < 1, the input values will eventually fade out, ensuring stability and the echo state property. With a spectral radius > 1, the reservoir can become unstable if the reservoir is near linear.

    Training set   Validation set   Test set
    1, 2           3                4
    1, 3           2                4
    2, 3           1                4
    1, 2           4                3
    1, 4           2                3
    2, 4           1                3
    1, 3           4                2
    1, 4           3                2
    3, 4           1                2
    2, 3           4                1
    2, 4           3                1
    3, 4           2                1

    Table 2.1: An example cross-validation scheme where a dataset of 4 samples is divided in a training set, a validation set, and a test set [27].
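The nested scheme of Table 2.1 can be generated programmatically. A minimal sketch (the sample labels are arbitrary):

```python
def nested_cv_splits(samples):
    """Enumerate nested cross-validation splits: each sample serves once as the
    test set; for each such choice, each remaining sample serves once as the
    validation set and the rest form the training set."""
    splits = []
    for test in samples:
        rest = [s for s in samples if s != test]
        for val in rest:
            train = [s for s in rest if s != val]
            splits.append((train, val, test))
    return splits

splits = nested_cv_splits([1, 2, 3, 4])
# 4 test choices x 3 validation choices = the 12 rows of Table 2.1.
```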

The internal connections between the neurons add memory to the reservoir. The spectral radius and the scaling of the internal connection weights therefore influence the time scale of the reservoir. For input that evolves slowly, or that has long-range temporal interactions, the spectral radius is usually chosen close to 1, or even higher if the reservoir is nonlinear enough.

    2.5.3 Leak rate

    The leak rate of each neuron in the reservoir controls the retainment rate of the previous

    neuron output. It influences the memory of the reservoir directly. This also means it

    affects the influence of new state updates and therefore also makes the reservoir adapt

    more slowly to new situations. Therefore, a trade-off between the influence of long-term

    dynamics and the influence of new input needs to be made.

    2.5.4 Bias scaling

A constant 1 may be added to the input, multiplied by the bias scaling parameter. This shifts the working point on the sigmoid activation function tanh of the neuron. The steepness of the sigmoid is largest around the origin. Shifting the working point upward or downward therefore makes the reservoir less dynamic. An illustration of the influence of the bias on the activation function is shown in Figure 2.4.

Figure 2.4: Illustration of the effect of bias scaling. Using 0 as the working point for the input broadens the spectrum of the neuron activation (red line). When shifting the bias, the neuron exhibits less dynamic behavior (green line).

    2.5.5 Reservoir size

The reservoir size is the number of neurons in the network. Increasing the reservoir size usually improves the result, assuming sufficient regularization. It's therefore not really an optimizable parameter, as the reservoir size is normally determined according to the computational power available.

    2.6 Time series prediction

    Predicting time series, as it is basically predicting the future, has been the focus of much

research throughout history. The basic idea is to first observe a training sequence and then try to complete the sequence over a number of steps in the future, also referred to as

    the number of freerun steps in RC literature.

To be able to predict by taking into account only the history, there needs to be a pattern in the sequence. That pattern can either be periodical (like 1, 2, 1, 2, 1, 2, . . .) or

    contain a certain trend (such as the Fibonacci sequence: 0, 1, 1, 2, 3, 5, 8, . . .).


    2.6.1 Time series prediction using Reservoir Computing

    Reservoir Computing has already been successfully used for the signal generation task [17].

The model is trained to always output the next step ahead. After training these signal dynamics, the first unknown signal value is predicted according to the learned signal transitions (teacher forcing). The predicted value is then used as input to predict the following signal value, and so on. This continues until the required number of predicted steps is reached.
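The freerun loop can be illustrated with a deliberately simple stand-in for the reservoir: a two-tap linear one-step predictor fitted on a sine wave. The point is the feedback structure, not the model; in the RC case the `step` function would be the reservoir update followed by the trained readout, and the warm-up would consist of running the reservoir over the observed history.

```python
import numpy as np

# Stand-in one-step-ahead model: y[k+1] ~ w1*y[k] + w2*y[k-1], fitted on a sine.
signal = np.sin(0.1 * np.arange(400))
X = np.column_stack([signal[1:-1], signal[:-2]])  # (y[k], y[k-1]) pairs
w = np.linalg.lstsq(X, signal[2:], rcond=None)[0]

def step(prev, prev2):
    """Predict the next value from the last two (stands in for reservoir + readout)."""
    return w[0] * prev + w[1] * prev2

# Warm-up / teacher forcing: feed the observed history; for this toy model the
# whole "state" is just the last two observed values.
history = signal[:300]
prev2, prev = history[-2], history[-1]

# Freerun: feed each prediction back as the next input.
preds = []
for _ in range(100):
    nxt = step(prev, prev2)
    preds.append(nxt)
    prev2, prev = prev, nxt
preds = np.array(preds)  # closely tracks the true continuation signal[300:400]
```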

It's very important to let the reservoir neurons warm up sufficiently before starting prediction, as the model obviously can't complete a sequence if it doesn't know the first part of the sequence. Warming up a reservoir is done simply by letting the reservoir run over the history of the signal and feeding back the observed values. The evolution of neuron activations when running over an example power demand profile can be seen in Figure 2.5. In this case, we can see that the neuron output values are initialized at a value close to 0 and take about 100 state updates before patterns begin to appear in the reservoir.

Figure 2.5: An example of the output values of the first 10 neurons over the first 200 state updates when running the reservoir over a power demand profile of a car

One of the shortcomings of time series prediction with Reservoir Computing is that it can often only act on short-term time scales, although this can be partly solved by retaining a part of the previous state. However, RC has already been applied successfully to long-term financial time series prediction in [31] and [26]. Financial time series can often be decomposed into periodical patterns, a trend and a remainder. These signals can then be predicted separately and combined again after prediction. In this work we predict the time series of vehicle behavior, such as the power demand course, the speed profile


and acceleration. After initial experiments, however, it became clear that these are not as straightforward to predict as the series attempted in previous work, and initial results were poor. Our approach to this problem is explained in Chapter 4.

    2.7 Classification

    Reservoir Computing can also be successfully applied to many temporal classification

    problems such as speech recognition [27] and the detection of epileptic seizures [7]. In

    the first example, the samples are isolated and need to be classified as a whole. In the

    second, the input is a signal where the class needs to be defined at every time step. Every

    classification then depends on the current input values and the previous input samples

(thanks to the memory properties of reservoir networks). The advantage of classification using RC is that the reservoir not only maps the input values to a high-dimensional feature space, but also memorizes input values in the reservoir for a certain time, in order to detect temporal patterns as well. The reservoir states can then be classified using any classification technique. The classification problem handled in this thesis is explained and addressed in Chapter 5.


    Chapter 3

    Data analysis

    The data is supplied by ChargeCar.org [1]. It consists of the GPS coordinates and elevation

    of each point sampled on a one-second interval. The dataset contains data from multiple

drivers, but the GPS points are not gathered in the same neighborhood, so they can't be combined. We'll focus on the available data from one driver alone, as it already allows for extensive research. We will be using the trips from a driver who drove around in south

    San Francisco. In total about 6915 km is covered in about 159 hours (Figure 3.1). The

    driver frequently drives along the same road segments. It could therefore be interesting to

    keep data from previous trips over the same road. Future predictions can then be based

    on the previous trips when passing in the neighborhood.

    Figure 3.1: ChargeCar GPS data


    3.1 A road graph

    To access data from previous trips easily, a suitable data structure is needed along with an

    efficient algorithm to build it. The greatest difficulty is detecting when a car is driving on a

road where it has already been, and quickly finding the related GPS points. We follow the

    approach described by L. Cao and J. Krumm to build a graph of directed road segments

    [8]. They present a new and fully automatic method for creating a routable road map from

    GPS traces of everyday drivers. The algorithm described by Cao and Krumm performs

    well for ensuring road connectivity and differentiating GPS traces from close parallel roads.

    First, to increase efficiency, a separate dataset is made by generally retaining only one

point every 30 meters. When the direction change over the last three points is greater than 10 degrees, the GPS points are retained every 10 meters. This increases accuracy when the car is making a turn. Some of these points will be close and should be merged, making

    sure the connectivity in the graph is not lost. To build the data structure we start with an

    empty graph. Each trip is then processed sequentially. For each node in a trip the graph

    is searched to decide whether it should be merged with an existing graph node.

Intuitively, a new node n should be merged with node v from the graph if they're on

the same road segment. Let e be a road segment that connects v and another node v′

from the road graph. Then n should be merged with v if the distance from e to n is small

enough, the trip goes in the same direction as e, and n is closer to v than to v′.
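The merge rule above can be sketched as a small predicate. This is a hedged sketch, not the exact formulation of Cao and Krumm: the distance and angle thresholds (`MAX_DIST_M`, `MAX_ANGLE_DEG`) are hypothetical placeholders, and planar coordinates in metres are assumed.

```python
import math

# Hypothetical thresholds; the actual values in Cao & Krumm's method differ.
MAX_DIST_M = 15.0      # max distance from segment e to the new node n
MAX_ANGLE_DEG = 45.0   # max heading difference between the trip and segment e

def point_segment_distance(p, a, b):
    """Distance from point p to segment a-b (planar coordinates, metres)."""
    ax, ay = a; bx, by = b; px, py = p
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    cx, cy = ax + t * dx, ay + t * dy
    return math.hypot(px - cx, py - cy)

def heading(a, b):
    """Heading of the direction a -> b, in degrees."""
    return math.degrees(math.atan2(b[1] - a[1], b[0] - a[0]))

def should_merge(n, n_prev, v, v_next):
    """n merges with v if n is close to segment e = (v, v_next), the trip
    direction at n matches e, and n lies closer to v than to v_next."""
    if point_segment_distance(n, v, v_next) > MAX_DIST_M:
        return False
    angle = abs(heading(n_prev, n) - heading(v, v_next)) % 360.0
    angle = min(angle, 360.0 - angle)
    if angle > MAX_ANGLE_DEG:
        return False
    return math.hypot(n[0] - v[0], n[1] - v[1]) < math.hypot(n[0] - v_next[0], n[1] - v_next[1])
```

A node 1 m off a segment, heading the same way and nearer to v than to v′, merges; a node driving the opposite direction does not.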

    An illustration of the process can be seen in Figure 3.2. The first trip becomes the initial

    graph. In the second trip, the 2nd, 3rd, 4th and 5th nodes are merged with the existing

    nodes. The 1st, 6th and 7th node are copied to the graph. The road segments from the

    second trip are connected from the new nodes to the existing nodes to ensure connectivity.

    No nodes from trip 3 satisfy the merge conditions so they are simply copied to the graph.

    The algorithm without optimization is very inefficient because every GPS point of the

    trip needs to be compared to every node in the graph. The time required to add a trip

increases dramatically as the number of nodes in the graph increases. However, it's clear

that the current GPS point will never need to merge with nodes far away. For this purpose,

the nodes are kept in a 2-d tree. Using a 2-dimensional distance tree, all nodes within

range can be looked up in O(log N) time [19] (where N is the number of nodes in the road


Figure 3.2: A simple example of the merge algorithm. The circles represent the retained GPS points. The arrowed lines represent the connecting road segments along with the driving direction.

graph). Storing the road graph nodes in a 2-d tree therefore significantly reduces the time required to add a trip to the road graph.
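The pruning idea behind the 2-d tree lookup can be illustrated with a minimal static kd-tree supporting a radius query. This is an illustrative sketch only (in practice a library implementation such as SciPy's `cKDTree` would be used); planar coordinates are assumed.

```python
import math

class KDNode:
    __slots__ = ("point", "left", "right", "axis")
    def __init__(self, point, left, right, axis):
        self.point, self.left, self.right, self.axis = point, left, right, axis

def build_kdtree(points, depth=0):
    """Build a static 2-d tree by splitting on alternating axes."""
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return KDNode(points[mid],
                  build_kdtree(points[:mid], depth + 1),
                  build_kdtree(points[mid + 1:], depth + 1),
                  axis)

def range_query(node, center, radius, found=None):
    """Collect all points within `radius` of `center`, pruning subtrees
    whose splitting plane lies farther away than `radius`."""
    if found is None:
        found = []
    if node is None:
        return found
    if math.dist(node.point, center) <= radius:
        found.append(node.point)
    diff = center[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    range_query(near, center, radius, found)
    if abs(diff) <= radius:                 # only cross the plane when needed
        range_query(far, center, radius, found)
    return found
```

Because most subtrees are pruned, the expected lookup cost is logarithmic in the number of stored nodes, which is exactly what makes the merge search tractable.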

    3.2 Extracting useful information

Although we had no access to real data such as speed and power usage, a lot of information

can be calculated from the GPS data. Speed, acceleration, and power demand are

calculated for every sample using the power model described by ChargeCar.

After building the graph, the complete dataset is used again, and mapped onto the road segments between the road graph nodes. This way, the road graph can be built efficiently

    without losing information about vehicle behavior between those points.

The dataset consists of samples at 1-second intervals. This means that the time spent

over a road segment of 30 meters between two nodes can vary widely, because it

    depends on the speed of the vehicle over the road segment. In order to correctly calculate


    and align the average behavior over the road segment, while properly retaining any peaks

in the profiles, interpolation is needed. The distance traveled over the road segment is

almost constant for every trip. The trips are therefore interpolated every meter, converting

them from a sample every second to a sample every meter. In the rest of this work, the time series described are converted from a time scale to a distance scale. The effect of

    this interpolation is illustrated in Figure 3.3.
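The time-to-distance conversion can be sketched as a linear resampling step. This is a minimal sketch, assuming the cumulative distance of each 1-second sample is already known; real GPS traces would additionally need cleaning.

```python
def resample_to_distance(dist_m, values, step_m=1.0):
    """Linearly interpolate `values` (sampled at cumulative distances
    `dist_m`, one sample per second) onto a regular 1 m distance grid."""
    out = []
    d = dist_m[0]
    i = 0
    while d <= dist_m[-1]:
        # advance to the time-sample pair that brackets distance d
        while dist_m[i + 1] < d:
            i += 1
        span = dist_m[i + 1] - dist_m[i]
        frac = 0.0 if span == 0 else (d - dist_m[i]) / span
        out.append(values[i] + frac * (values[i + 1] - values[i]))
        d += step_m
    return out
```

A slow segment (little distance per second) thus contributes few distance samples, while a fast one is stretched out, which is what aligns trips driven at different speeds.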

Through testing we also noticed that vehicles that keep driving exhibit very different behavior from

Figure 3.3: The speed profile on a time scale (left profile) is interpolated to a distance scale (right profile)

vehicles that stop. Suppose a vehicle stops 50% of the times it passes through an intersection where the speed limit is 50 km/h. The average speed over the road segment would

then be 25 km/h. However, it rarely actually drives at this speed: usually it either stops

and continues slowly over the intersection, or it doesn't need to stop and keeps driving at

50 km/h. We therefore separate the captured information into a slow dataset and a fast dataset

    on every road segment.

    When passing through a road segment, the current vehicle behavior is added to the slow

    set if:

• The car stops on the current road segment, i.e. the car drives slower than 2 m/s (= 7.2 km/h) at any point over the road segment.

• Or, the car stops in a road segment within 100 m before or after the current position, and the car's average speed over the segment is more than 2 m/s slower than the total average speed of the fast set.


If neither condition is satisfied, the information about the car's behavior over the segment

    is added to the fast data.
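The two conditions can be captured in a small classification function. A sketch only: the argument names, and the way the 100 m neighbourhood test is passed in (`stop_nearby`), are our own assumptions, not the thesis implementation.

```python
def is_slow_pass(segment_speeds, stop_nearby, segment_avg_speed,
                 fast_avg_speed, stop_threshold=2.0):
    """Classify one pass over a road segment as 'slow' (stopping behaviour).

    segment_speeds    -- speed samples (m/s) over the current segment
    stop_nearby       -- True if the car stops within 100 m before/after
    segment_avg_speed -- mean speed of this pass over the segment
    fast_avg_speed    -- mean speed of the segment's fast set so far
    """
    # Condition 1: the car effectively stops on this segment.
    if min(segment_speeds) < stop_threshold:
        return True
    # Condition 2: a stop nearby, and this pass is clearly slower than usual.
    if stop_nearby and segment_avg_speed < fast_avg_speed - stop_threshold:
        return True
    return False
```

Every pass failing both tests goes to the fast set, so each segment accumulates two separate behaviour profiles.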

The average profiles over a path in the road graph can now be collected by concatenating the average profiles captured in each road segment. An example of the average speed

profiles is given in Figure 3.4. Small jumps in the average profiles couldn't be avoided,

because the number of trips that drive over the different road segments in the chosen

path varies.

    Figure 3.4: An example of the average speed profiles over a trip in the road graph

    3.3 Error measures

To evaluate the techniques used, some sort of evaluation method is obviously needed.

The ultimate goal is to minimize battery usage, but the controller using these predictions

remains hypothetical. It's therefore impossible to define a single number that captures the

goodness of a model. It is still possible, however, to reason about the usefulness of the

    proposed models using the following error measures.

    3.3.1 Defining a prediction distance

    If we want to evaluate the predictive capabilities of each model we should first specify on

what scale prediction is required. Of course, this can be very different for each application. The initial purpose of this work, however, is the improvement of a battery/capacitor

controller, so we will focus on this example.

The ideal situation would be to allow the capacitor to empty completely while driving

at a constant speed, so it is ready to store all the energy contained in the moving car.

    The theoretical maximum prediction distance is then the distance traveled to empty the


    capacitor. This can be calculated using the specifications of the vehicle used to capture

    the GPS data.

In the ChargeCar data a Honda Civic is used. The power required for this vehicle driving at a constant speed can be calculated as follows, with the constants and units described

in Table 3.1.

    P = (A/2) · Cd · D · v³ + Cr · m · g · v

    Symbol | Description                  | Value
    -------+------------------------------+------------
    A      | Frontal area                 | 1.988 m²
    Cd     | Drag coefficient             | 0.31
    D      | Air density                  | 1.29 kg/m³
    Cr     | Roll resistance coefficient  | 0.015
    m      | Car mass                     | 1200 kg
    g      | Gravitational acceleration   | 9.81 m/s²
    W      | Capacitor energy capacity    | 190080 J
    v      | Vehicle velocity             | m/s
    P      | Power                        | W

    Table 3.1: Constants and units needed to calculate the prediction distance

    The supercapacitor used in the ChargeCar test car is the Maxwell BMOD0165. The

    maximum stored energy in this capacitor is 52.8 Wh or 190080 Joule. The time needed

to use the 190080 J while driving at constant velocity v is then:

    t = 190080 J / ((A/2) · Cd · D · v³ + Cr · m · g · v)
      = 190080 J / (0.3975 v³ + 176.58 v)

The distance covered as a function of that time and velocity is then:

    d = v · t = (190080 J · v) / (0.3975 v³ + 176.58 v)
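With the constants from Table 3.1, the power and distance formulas translate directly into code. The function names are our own; the constants are those of the ChargeCar Honda Civic.

```python
A, CD, D_AIR = 1.988, 0.31, 1.29   # frontal area (m^2), drag coeff., air density (kg/m^3)
CR, M, G = 0.015, 1200.0, 9.81     # roll resistance coeff., mass (kg), gravity (m/s^2)
W_CAP = 190080.0                   # capacitor energy capacity (J)

def constant_speed_power(v):
    """Power (W) needed to keep the vehicle at constant speed v (m/s)."""
    return (A / 2.0) * CD * D_AIR * v ** 3 + CR * M * G * v

def max_prediction_distance(v):
    """Distance (m) covered while draining the full capacitor at constant v."""
    return v * W_CAP / constant_speed_power(v)
```

Evaluating this at 50 km/h and 1 km/h reproduces the 750 m and 1076 m figures quoted below.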


    Figure 3.5: Maximum prediction distance vs. vehicle velocity

    Figure 3.5 shows that the required distance decreases as speed increases. At 1 km/h

the maximum distance is 1076 meters; as the speed approaches 0, the time needed to empty the capacitor approaches +∞.

From this result we would say that we need to predict the coming kilometer. However, the

car will usually be driving at a minimum of 50 km/h. The distance required at this speed is

    750 m.

3.3.2 Root Mean Square Error (RMSE)

The root mean square error is a frequently used error measure of the deviation between a

    predicted model and the actual observed values. It allows us to compare two signals (e.g.

    the speed profiles) and aggregate the point-wise differences (or residuals) between them

into a single number to evaluate the models used. For vectors x (observed values) and x̂

(predicted values) the RMSE is calculated as follows:

    RMSE(x, x̂) = √MSE(x, x̂) = √( (1/n) · Σᵢ₌₁ⁿ (xᵢ − x̂ᵢ)² )
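The RMSE is straightforward to implement for two equal-length profiles:

```python
import math

def rmse(observed, predicted):
    """Root mean square error between an observed and a predicted profile."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
```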

    3.3.3 Kurtosis Difference

    Kurtosis is a measure of peakedness, in Machine Learning usually used as a measure of

non-Gaussianity, i.e. the dissimilarity of a distribution to the normal (or Gaussian) distribution [3]. Through experiments we noticed that a model sometimes converges to a

weighted average of the history of the current trip. This might be the best result according

to the RMSE and MAD error measures above, but it will not be as useful for a controller


    because a controller needs a predictor that can correctly predict energy spikes or other

    events that have a large influence on energy usage. It could then be useful to compare the

    peakedness of the prediction profile with the peakedness of the observed profile. We could

    then get a better idea of the similarity of both signals.

The peakedness (or kurtosis) of a vector x with mean x̄ is calculated by:

    Kurt(x) = [ (1/n) · Σᵢ₌₁ⁿ (xᵢ − x̄)⁴ ] / [ (1/n) · Σᵢ₌₁ⁿ (xᵢ − x̄)² ]² − 3

To compare the peakedness, the kurtosis difference error measure is presented. No previous

work was found on this measure, at least in the context of machine learning. The

kurtosis difference is merely used in an attempt to quantify the models in a secondary

way. The kurtosis difference of the model output ŷ and the target output y is calculated by: Kurt(ŷ) − Kurt(y).
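Both the kurtosis and the kurtosis difference follow directly from the formulas above (a sketch; the function names are our own):

```python
def kurtosis(x):
    """Excess kurtosis: fourth central moment over squared variance, minus 3."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((xi - mean) ** 2 for xi in x) / n
    m4 = sum((xi - mean) ** 4 for xi in x) / n
    return m4 / m2 ** 2 - 3.0

def kurtosis_difference(predicted, observed):
    """Positive when the prediction is more peaked than the observation."""
    return kurtosis(predicted) - kurtosis(observed)
```

A two-valued symmetric signal such as [1, −1, 1, −1] has excess kurtosis −2, the flattest possible distribution under this definition.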

    3.3.4 Receiver Operating Characteristic (ROC)

    The Receiver Operating Characteristic (ROC) is a classification error measure developed

in WWII by radar engineers and has since been used in a large number of areas. In recent

years it has also gained a lot of interest in the field of machine learning and pattern recognition [13].

The output of classifier models is usually continuous, but it's often hard to evaluate the

performance of these models because usually a hard threshold needs to be set to classify the

    output. The ROC curve is able to visualize the trade-off between the hit rate (or true

    positive rate or TPR) and the rate of false positives (or FPR) in binary classification. The

TPR is equivalent to the proportion of actual positives that are correctly identified.

The FPR, on the other hand, is the proportion of negatives that are wrongly identified

    as positive.

    The curve is calculated by iterating over every possible hard threshold that can classify

the output of the model. Classifiers appearing on the lower left-hand side of an ROC curve can be thought of as strict or conservative (A in Figure 3.6). They only make positive classifications

with strong evidence. Classifiers appearing on the right-hand side of the ROC

curve are less selective but result in a lot of false positives (B). Intuitively, the best model

then stretches as far as possible to the top left-hand side (C). Random classification models

result in points along a straight line from the bottom left to the top right of the graph (D).

Classifiers under this line (E) can be thought of as worse than random, but if we'd invert


    Figure 3.6: A basic ROC graph showing five discrete classifiers

the classifier and switch the target classes, the classifier's result is inverted, which makes

it perform better than the random classifier again. Consequently, the models that extract

    no knowledge from the data will roughly follow the straight line from the random classifier.

    The ROC measure also provides a way of evaluating model performance without set-

ting a threshold, by calculating the total surface area under the curve, known as the Area Under

Curve (AUC). We'll mainly be using this measure for comparing the performance of models

    for binary classification.

A single threshold can be selected along the ROC curve to know the exact percentage

of correctly classified samples for a selected trade-off between false and true positive

rates. Often the threshold is chosen where the false and true positive rates are equal. If

the classifier is used for a specific purpose, the cost of a false positive is sometimes higher

than the cost of a false negative, or vice versa. The threshold can then be optimized in

relation to that application specifically. Two example ROC curves are shown in Figure 3.7, along with the line at which the two error rates are equal.
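The threshold sweep and the AUC can be sketched as follows. The AUC is computed here via the equivalent pairwise-ranking formulation, P(score of a positive > score of a negative, ties counting half), rather than by integrating the curve; both give the same number. Both classes are assumed present.

```python
def roc_curve(scores, labels):
    """Sweep every distinct threshold and return (FPR, TPR) points, sorted."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for thr in sorted(set(scores)) + [float("inf")]:
        tp = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 0)
        points.append((fp / neg, tp / pos))
    return sorted(points)

def auc(scores, labels):
    """AUC as the probability a positive outranks a negative (ties = 1/2)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

A perfectly separating classifier reaches AUC 1.0; a classifier that misranks one of four positive/negative pairs scores 0.75.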


Figure 3.7: Example of the ROC curves of two models, including the Equal Error Rate line, where the True Positive Rate is equal to the False Positive Rate.


    Chapter 4

Time series prediction of vehicle power, speed and acceleration

To predict the future power demand profile of a car using RC techniques, a different approach

than for the prediction of financial time series is needed, because there are typically fewer

periodic factors in driving a car. Stopping and accelerating often happen in a similar way,

but these stops come at near-random intervals if there is no pre-existing information about

the car's environment.

    Vehicle power demand depends on many physical factors and can be decomposed in other

ways: elevation differences, acceleration, speed, etc. Elevation is not expected to change

over several trips, so we can read it directly from the road graph. Predicting the vehicle acceleration and speed, however, is far more complex. In this chapter, we present and evaluate

several time series prediction models for the vehicle's power demand, acceleration

    and speed profile. An example of these profiles can be seen in Figure 4.1. Note that from

    here on, the profiles are evaluated on a distance scale instead of a time scale, as explained

    in the previous chapter.

    4.1 Evaluation methodology

The theoretical prediction distance calculated in section 3.3.1 was 750 m. However, the memory capabilities of reservoirs are limited, and after predicting a number of steps, the

influence of the real observed samples on the reservoir diminishes [14]. In the setup presented

by Jaeger, the input was forgotten after about 400 time steps. When adding noise,

    the memory reduced to around 200 time steps. Experiments showed that trying to predict

any further than 200 m with the models presented here yielded misleading results which

    were difficult to explain. We therefore decided to limit the evaluation to predicting the


    Figure 4.1: An example power demand, speed, acceleration and elevation profile

    first 200m.

    As mentioned in Chapter 3, in total about 6915 km is covered. The dataset was con-

    verted to a distance scale which means the dataset now contains about 6,915,000 samples.

To train and test the models in this chapter, all trips in the dataset were divided into 200 m

    intervals. The information of previous trips over each interval was extracted from the road

    graph and merged with the intervals.

    For time series prediction, especially using RC, a warm-up period is needed (see section

2.6.1). We couldn't use the reservoir states of the previous intervals because the reservoir

    states contain the predicted signal over the interval, not the real signal values. A poor

    prediction would therefore have an effect on the next prediction. Moreover, the previously

    predicted interval is not always a part of the current trip.

    A warm-up period is therefore added before every prediction interval. This should be

long enough to forget the previous states, but short enough to limit memory requirements

and the computation time required to run over all the warm-up samples. Because

the maximum possible prediction distance seemed to be around 200 m, we expect that

300 m is enough to largely forget the state of the previously predicted interval.


    Since the input weights and the internal weights of a reservoir are chosen randomly, a

    particularly good reservoir could be generated by training one model while another model

    was trained with a very weak reservoir. This could lead to misleading results. Therefore

each experiment involving RC is done over 10 different reservoir instances. When plotting the resulting errors, the standard deviation of the results of a model is given as well, using

    error bars. In text, the standard deviation is given in parentheses along with the average

    results.

    To include all intervals, the reservoirs would now need to run 10 times over 17,287,500

samples. Additionally, we're predicting the power demand as well as the speed and acceleration

profile. Too much time would be required to finish all the experiments in a reasonable

    time frame. Therefore, a random subset of 2,230,500 samples (containing 9566 prediction

    intervals) was chosen and fixed for the remaining experiments.

Because of the restrictions above, cross-validation was unfortunately not an option.

Figure 4.2: Example of splitting the profiles in prediction intervals

Instead, the models were trained on the first 25% of the trips. The next 25% was used as a

    validation set to optimize the model parameters. The performance of the resulting models

was then evaluated on the remaining 50% of the dataset. This test set was deliberately

made large, to ensure that the models are evaluated on enough data to draw general

conclusions.
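The chronological split can be sketched in a few lines (assuming `trips` is an ordered list of trips):

```python
def split_trips(trips):
    """Chronological split used in this chapter: the first 25% of trips for
    training, the next 25% for validation, the remaining 50% for testing."""
    n = len(trips)
    a, b = n // 4, n // 2
    return trips[:a], trips[a:b], trips[b:]
```

Splitting by trip rather than by sample keeps whole prediction intervals inside one partition, so no interval leaks between training and testing.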


    4.2 Baseline models

    4.2.1 System setups

    First, some simple and linear models are presented and evaluated. For each model we

    define an abbreviation in the paragraph title to be able to refer to them more clearly

    afterwards.

    Last value as prediction (LV)

The last observed value is used as the prediction for the next predicted values: ŷ(t) = y(t − 1).

    As speed is usually quite constant this should already provide a good estimate. The longest

    distances in the dataset are often over highways where speed hardly changes.

    Averages from previous trips as prediction (SA/FA)

    The averages of the previous trips over the path of the current trip are used directly as

    prediction. The performance of the slow average profiles (SA) and the fast average profiles

    (FA) are evaluated separately.

    Offset averages as prediction (OA)

The predictions of the previous model don't make any use of the current trip, however.

At least the first few predicted values should be close to the last known values. To solve

this, we first calculate the difference between the current trip and the averages at the last observed sample of the interval. This offset is then added to the averages over the rest of

    the prediction.
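The OA baseline can be sketched as follows, assuming `avg_profile[0]` is the average value at the position of the last observed sample (the argument names are our own):

```python
def offset_average_prediction(avg_profile, last_observed):
    """Shift the historical average profile so that it starts at the last
    observed value of the current trip (the OA baseline)."""
    offset = last_observed - avg_profile[0]
    return [a + offset for a in avg_profile]
```

The shape of the prediction is still entirely the historical average; only its level is anchored to the current trip.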

    Weighted time delay window (TDW)

    A weighted average is taken of the previous values. This allows the model to incorporate

    the recent history of the current trip. The weights of every point in the recent history

window y(t − n_ws), ..., y(t − 1), where n_ws is the window size, are trained using ridge regression to predict one

step ahead: ŷ(t). The predicted value ŷ(t) is fed back and used as part of the current trip

history: y(t − n_ws + 1), ..., y(t − 1), ŷ(t) to predict ŷ(t + 1).

    Of course, we need to know how many previous values need to be incorporated: the

    size of the time delay window needs to be determined. As shown in Figure 4.3, the opti-

    mal window size is 2 for the speed profile. For the acceleration profile, a window of size 3

    is chosen. Lastly, for the power profile, a very large window size is preferred. The error


    function flattens out around window size 200. This value is therefore chosen as the window

    size.

    Figure 4.3: RMSE error values vs. time window size in the TDW model
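The TDW model is essentially an autoregressive model whose window weights are fit with ridge regression and then applied in free-run (feedback) mode. The sketch below solves the ridge normal equations with plain Gaussian elimination; the function names and the explicit bias term are our own choices, not necessarily those of the thesis implementation.

```python
def ridge_ar_fit(series, window, lam=1e-6):
    """Fit weights (plus a bias) so that y(t) ~ w . [y(t-window) .. y(t-1), 1],
    by solving the ridge normal equations (X^T X + lam*I) w = X^T y."""
    X = [series[t - window:t] + [1.0] for t in range(window, len(series))]
    y = series[window:]
    d = window + 1
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(d)]
    for col in range(d):                      # forward elimination w/ pivoting
        piv = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, d):
            f = A[r][col] / A[col][col]
            for c in range(col, d):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    w = [0.0] * d
    for r in range(d - 1, -1, -1):            # back substitution
        w[r] = (b[r] - sum(A[r][c] * w[c] for c in range(r + 1, d))) / A[r][r]
    return w

def ridge_ar_predict(history, w, steps):
    """Free-run prediction: each predicted value is fed back as input."""
    window = len(w) - 1
    hist = list(history[-window:])
    out = []
    for _ in range(steps):
        nxt = sum(wk * hk for wk, hk in zip(w[:-1], hist)) + w[-1]
        out.append(nxt)
        hist = hist[1:] + [nxt]
    return out
```

On data generated by an exact second-order recurrence, the fit recovers the recurrence weights, so the free-run prediction matches the true continuation closely.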

    Weighted time delay window with averages (TDWAt/TDWAtdw)

Figure 4.4: The weighted time delay model (TDW) used for each profile (with added averages at step t (TDWAt) in dashed rectangle)

    The TDW model above can be extended with input of the average profiles of previous

    trips to investigate the influence of the information in the road graph. The predicted value

at each step is combined with the average value at that step (TDWAt). Additionally, this model is extended again by also including a weighted average of the history of the average

    profiles (TDWAtdw).

    Initially all information from the road graph was included: the slow and fast averages

    of power demand, speed and acceleration as well as the average chance to stop over the

    current road segment and the elevation difference between two successive samples.


    Some of the information in the road graph is not useful. To detect the contributing

    averages, a feature selection is done on the input dimensions using Least Angle Regression

(LAR). With this method, the weights of the input dimensions that hardly contribute towards improving the solution can drop to 0, which is not possible using linear (or ridge)

regression. For the specifics of this algorithm we refer to the article by Efron et al. [12].

This method was chosen over other similar stepwise methods, such as forward feature selection,

because it's just as fast as forward feature selection but generally performs better [12].

No additional experiments were done in this thesis to confirm this, however.
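LARS itself is more involved (see Efron et al. [12]; in practice a library implementation such as scikit-learn's would be used). As a simpler stand-in, the greedy forward-selection idea it is compared against can be sketched as follows; this is explicitly not LARS, only the cruder stepwise baseline.

```python
def forward_select(X, y, k):
    """Greedy forward selection: repeatedly pick the feature most correlated
    with the current residual, fit its univariate coefficient, and subtract
    its contribution. A crude stand-in for LARS, not LARS itself."""
    d = len(X[0])
    residual = list(y)
    coef = [0.0] * d
    chosen = []
    for _ in range(k):
        best, best_score = None, 0.0
        for j in range(d):
            if j in chosen:
                continue
            xj = [row[j] for row in X]
            num = sum(a * b for a, b in zip(xj, residual))
            den = sum(a * a for a in xj)
            score = abs(num) / (den ** 0.5 + 1e-12)
            if score > best_score:
                best, best_score = j, score
        if best is None:
            break
        xj = [row[best] for row in X]
        beta = sum(a * b for a, b in zip(xj, residual)) / sum(a * a for a in xj)
        coef[best] = beta   # note: earlier coefficients are not refit
        chosen.append(best)
        residual = [r - beta * a for r, a in zip(residual, xj)]
    return chosen, coef
```

When one feature explains the target exactly, it is selected first with the correct coefficient; uninformative features stay at 0, which is the pruning behaviour the text relies on.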

    For every profile, the last observed value was selected. For the power demand profile,

    the fast average power value was selected as well. For the speed profile almost all averages

    remained non-zero, except for the slow average speed. Lastly for the acceleration profile,

    both the fast average acceleration and the average chance to stop were chosen.

    Training the TDWAt model weights using LAR does not perform as well as training

    the weights using ridge regression i