Corrected ISA Paper


8/8/2019

    INTRODUCTION

    Fuel cells convert chemical energy to electric energy. Recent global economic and political conditions

    raise the desirability of PEM fuel cells as an alternative residential power source due to their reliance on

readily available fuels such as natural gas, propane or bottled hydrogen. Unfortunately, commercial viability remains elusive, due mainly to prohibitively short system life. While the industry is hard at

    work characterizing and mitigating the causes of system failure, comparatively little attention has been

paid to the possibility of extending system life through more efficient operation. Predictive models of residential power usage are the key element of the control systems that will drive efficient operation. To

    maximize efficiency these models must adapt to seasonality and the changing habits of users.

PEM fuel cells create electric power using a three-stage process. In the first stage, fuel is converted to pure hydrogen by the reformer. Auto-Thermal Reforming is a commonly used process in which the fuel

    is broken down into hydrogen and carbon monoxide through the introduction of steam in the presence of

    a catalyst.

    FIGURE 1 FUEL CELL BLOCK DIAGRAM

    The power-producing element of the fuel cell system is the stack. The stack consists of a number of

    individual fuel cells stacked up in series. Each cell contains a membrane electrode assembly (MEA)

between two conducting plates. The MEA consists of a fuel electrode (anode) and an oxidant electrode (cathode) separated by an ion-conducting membrane. When hydrogen gas is introduced into the system

    on the anode side, the catalyst surface of the membrane splits hydrogen gas molecules into protons and

    electrons. The protons pass through the membrane to react with oxygen on the cathode side (forming

water). The electrons, which cannot pass through the membrane, must travel around it, thus creating the source of DC electricity. The power inverter converts the DC power to AC. Additionally, the fuel cell

has a cooling system and a water management system. Fuel cells do not handle power transitions instantaneously, so batteries are used to handle surges.

    INTELLIGENT CONTROL APPROACH

    In the absence of usage profiles, the fuel cell must operate continuously and the batteries must be kept at

    a fairly high state of charge to provide power quality comparable to the grid. To avoid brownouts a


    minimum amount of power must be generated at all times. The excess energy produced can provide

heat if necessary, and the power can be returned to the grid in some locations. Clearly, however, this

    prolonged operation adds unnecessary run hours to the stack and accelerates end of life.

    Reliable information about homeowner power usage habits would provide the opportunity to idle the

system or even shut it down in some cases. Additionally, the batteries could be held at a lower state of charge and recharged at times compatible with low-power operation. All of this would result in reduced

    system run hours, less wear and prolonged system life. This is the driving force behind the development

    of predictive models of residential power use for intelligent, adaptive control systems.

    The general approach is to collect power usage information for a period of time on board the fuel cell,

    and use the on-board model to predict a usage profile for the next day. Periods of low power use would

be identified and, if sufficiently long, the system could be shut down or idled. Periods of high usage could be anticipated by increasing the battery state of charge.

    PREDICTIVE MODELING OF RESIDENTIAL POWER USAGE

    LOAD FORECASTING

    Load forecasting is a method used by electric power providers to predict how much electricity a

    specified group will demand for a given time. The current state-of-the-art in load forecasting studies the

activities of both the residential and commercial communities, as well as weather factors, on which to base predictions. The factors most interesting to forecasters, in addition to the actual load, are time of day, day of the week (weekend versus midweek), holiday/non-holiday, ambient temperature, dew point,

wind speed, and cloud cover, among other weather-related variables. In the stationary fuel cell application, however, it is necessary to build the model based only on the activity of the single home that the system is providing energy for. Moreover, the physical location of the system will not always be outside the home, where it could provide ambient temperature or other weather-related information. Therefore, in this application, any possible model could only have previous load values and calendar time as inputs.

    RESIDENTIAL POWER USAGE AS A STOCHASTIC PROCESS

The manner in which electric power is consumed in a home is unquestionably a random process. Even small homes have refrigerator compressors, furnaces or air conditioners that may require power at any time during the day, depending on weather, the number of current occupants in the home and a number of other factors. Figure 2 shows the power usage of a home over the course of a week, and it is clear that while there is randomness in the process, both the mean and, to a lesser extent, the variance change in time (the process is not homogeneous).


    FIG. 2 RESIDENTIAL ELECTRIC POWER USAGE PROFILE

    Initial attempts to model this data were based on the assumption that residential power usage is a

Markov process that is continuous in time but discrete in state. The load data was discretized by placing it in bins. Matrix quantities and load transition probabilities can be calculated as follows:

P(x_j, t_j | x_i, t_i) = e^(Λt)   (1)

Where P = the matrix of transition probabilities from load state i to load state j in time t = t_j − t_i, and Λ = the matrix of transition probability rates for all i and j.

    This model was sufficient for system engineering analysis and design, as it was fairly straightforward to

    apply Monte Carlo techniques and create simulated profiles. However, to estimate the required matrices

for a useful number of load increments, computation became difficult even for one home. More

    difficult was the evaluation of the matrix exponential (matrix power series).
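As an illustration of the Monte Carlo technique on the discrete-state model (a minimal Python sketch, not the original implementation), a binned transition matrix can be sampled step by step to generate a simulated load profile. The matrix values below are invented for illustration; in practice they would be estimated from binned historical load data.

```python
import numpy as np

# Hypothetical 3-bin load transition matrix (rows sum to 1); in practice
# these probabilities would be estimated from binned historical load data.
P = np.array([
    [0.80, 0.15, 0.05],   # low    -> low / medium / high
    [0.20, 0.60, 0.20],   # medium -> low / medium / high
    [0.05, 0.25, 0.70],   # high   -> low / medium / high
])

def simulate_profile(P, start_state, n_steps, rng):
    """Monte Carlo simulation of a discrete-state load profile."""
    states = [start_state]
    for _ in range(n_steps):
        # Draw the next load bin from the current row of the matrix.
        states.append(int(rng.choice(len(P), p=P[states[-1]])))
    return states

rng = np.random.default_rng(0)
profile = simulate_profile(P, start_state=0, n_steps=96, rng=rng)
```

With 96 steps at 15-minute resolution, one call produces a simulated day of binned load states.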

A more careful look at residential power usage suggests that it is more accurately modeled as a continuous-time, continuous-state Markov random walk. In other words, it is fair to say that for all practical purposes load usage is a continuous variable, even in a small home, given the variable load requirements of blowers, compressors and pumps, among other things. Fortunately, continuous-time, continuous-state Markov processes are governed by the laws of Geometric Brownian Motion (GBM), fairly well characterized in the literature in a large number of stock market applications.

GBM may be described by the following stochastic differential equation, with dL(t) representing the change in residential power usage in time:

dL(t) = μ·L(t)·dt + σ·L(t)·dZ   (2)

Where L(t) = the power load at time t
μ = drift
σ = volatility
dZ = the Wiener increment = N(0,1)·√dt
N(0,1) = the standard normal distribution


Dividing through by L(t) and applying Itô's Lemma [Ito, 1944] with F(L(t), t) = ln(L(t)):

dF = (μ − σ²/2)·dt + σ·dZ   (3)

This stochastic differential equation has an explicit solution that lends itself well to Monte Carlo simulation:

L(t) = L(0)·exp[(μ − σ²/2)·t + σ·N(0,1)·√t]   (4)

With historical data, the literature [1] recommends a simple method for estimating the parameters ν = (μ − σ²/2) and σ, calculated from the data:

νᵢ(t) = [ln(L(tᵢ)) − ln(L(tᵢ₋₁))] / Δt   (5)

Where νᵢ(t) = the instantaneous drift

The parameter ν = μ − σ²/2 is estimated as the mean of the νᵢ(t), and σ is just the standard deviation of all the νᵢ(t). Later, GBM will be applied to the analysis of residential usage to demonstrate a method for modeling the volatility and drift parameters as a function of time (inputs).
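A minimal sketch of this estimation and simulation procedure (in Python, with synthetic data standing in for measured load): the instantaneous drifts of equation (5) give ν and σ, which then drive the explicit solution of equation (4) one increment at a time.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for measured 15-minute load data (kW).
load = np.exp(np.cumsum(rng.normal(0.0, 0.05, 200))) + 0.5

dt = 0.25  # hours between samples

# Equation (5): instantaneous drift from successive log-loads.
nu_i = np.diff(np.log(load)) / dt
nu = nu_i.mean()            # estimate of nu = (mu - sigma^2/2)
sigma = nu_i.std(ddof=1)    # volatility estimate, per the text

def gbm_profile(L0, nu, sigma, dt, n_steps, rng):
    """Equation (4) applied one time increment at a time."""
    increments = nu * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_steps)
    return L0 * np.exp(np.cumsum(increments))

sim = gbm_profile(load[-1], nu, sigma, dt, n_steps=96, rng=rng)
```

Each call to gbm_profile produces one Monte Carlo realization; repeated calls give an ensemble of simulated profiles.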

    RESIDENTIAL POWER USAGE AND ARTIFICIAL NEURAL NETWORKS

It was found through a background literature search that the most common method of electric power load forecasting is by means of artificial neural networks (ANN). ANNs have some major advantages over other forecasting tools: they can model with high accuracy a data set that is nonlinear and interactive by learning the general patterns associating the input(s) with the expected output(s). ANNs are usually composed of three layers: an input layer, a middle or hidden layer (although there could be several), and an output layer. Each input element to the model is connected to each neuron contained in the hidden layer. In turn, the hidden layer is then connected to the output neuron(s). This type of network, in which all elements flow in one direction from inputs to outputs, is called a feedforward network. It is through these interconnections that ANNs can model nonlinear functions with high accuracy.

The output of the neural network is a matrix equation. Lying at each connection between neurons is a weight, or value, that assesses the strength of the input relative to the output. The product of the weights and inputs is then passed through an activation function, whose output serves as the input to the hidden


    layer neurons. These values are then combined and passed through another activation function at the

    output neuron to produce the final output value.
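As a concrete sketch of this forward pass (Python with NumPy; the weights here are random placeholders rather than trained values, and the linear output combination is an assumption for illustration):

```python
import numpy as np

def logsig(x):
    """Log-sigmoid activation, squashing values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, W1, b1, W2, b2):
    """One pass through a single-hidden-layer feedforward network:
    weighted inputs through the hidden activation, then combined
    at the output neuron."""
    hidden = logsig(W1 @ x + b1)
    return (W2 @ hidden + b2).item()

rng = np.random.default_rng(2)
d, h = 4, 7   # four inputs, seven hidden nodes
W1 = rng.normal(size=(h, d)); b1 = rng.normal(size=h)
W2 = rng.normal(size=(1, h)); b2 = rng.normal(size=1)
y = forward(rng.normal(size=d), W1, b1, W2, b2)
```

The matrix products W1 @ x and W2 @ hidden are the "matrix equation" described above.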

    FIG. 4 BASIC ANN ARCHITECTURE

    NETWORK TRAINING

    In order to determine the weights at each neuron, the ANN needs to be trained. Training is the process

    by which the weights and biases are optimized to minimize the overall error of the network. To train the

network, the training set, which is composed of input values paired with their respective target values, is passed through the network. There are certain precautions to be aware of prior to training, and they are

    primarily related to proper generalization; the concept behind training is for the network to learn the

general relationship between the input and output values. Therefore, a large data set that is representative of the sample space needs to be used. For example, the ideal load data used in training would represent all load usage characteristics related to the home. If too small a data set that is not representative of the entire sample space has been used during training, the network will not learn the general pattern. It will then perform poorly during simulation or use on board the system.

    Prior to training, the weight and bias values at all nodes are randomly initialized. The inputs are passed

through the network to produce an output, which is then compared to a target value. Depending upon the error between the output and the target, the network weights and biases are adjusted. This process

    continues until the weights and biases produce a minimum performance error.
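A toy version of this loop (a plain gradient-descent sketch in Python, standing in for the toolbox training algorithms discussed later) shows the forward pass, error calculation, and weight adjustment repeating until the error is reduced:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy training set: input values paired with their targets.
X = rng.uniform(-1, 1, size=(200, 1))
t = np.sin(2 * X)

# Randomly initialized weights and biases for a 1-5-1 network.
W1 = rng.normal(0.0, 0.5, (5, 1)); b1 = np.zeros((5, 1))
W2 = rng.normal(0.0, 0.5, (1, 5)); b2 = np.zeros((1, 1))

# Error of the untrained network, for comparison after training.
mse0 = float(((W2 @ np.tanh(W1 @ X.T + b1) + b2 - t.T) ** 2).mean())

lr = 0.1
for epoch in range(500):
    # Forward pass over the whole training set.
    A1 = np.tanh(W1 @ X.T + b1)      # hidden-layer activations
    Y = W2 @ A1 + b2                 # network outputs
    E = Y - t.T                      # error vs. targets
    # Backpropagate the mean squared error and adjust weights/biases.
    dW2 = E @ A1.T / len(X); db2 = E.mean(axis=1, keepdims=True)
    dZ1 = (W2.T @ E) * (1.0 - A1 ** 2)
    dW1 = dZ1 @ X / len(X); db1 = dZ1.mean(axis=1, keepdims=True)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

mse = float((E ** 2).mean())  # training error after the final epoch
```

The fixed epoch count here is for illustration; the production algorithms stop on an error-minimum criterion instead.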

    PREDICTION INTERVALS FOR NEURAL NETWORKS

    Point predictions with neural networks are subject to the same type of uncertainty questions as

    regression or any other modeling tool. It is therefore desirable to characterize the uncertainty of the

    prediction with some type of prediction interval. The width of the interval would be an integral part of

the intelligent control algorithm. Unfortunately, unlike regression, standard methods for prediction interval estimation are not readily available for neural networks and are still the subject of debate. An

    added complication in our application is the fact that residential power usage is a stochastic process and

    both the mean value and the variance change in time.

The total variability of neural network predictions, like all model predictions, can be thought of as having a model uncertainty component Sm² and a noise component Sν²(x). The general approach is to estimate the model uncertainty by characterizing the change in network performance with respect to changes in the network weights. The noise component can be estimated using a separate network that models the variance as a function of the inputs.


    Bishop [10] has proposed an estimate of the model uncertainty that makes use of the Jacobian and

Hessian matrices, calculated as part of the backpropagation training algorithms used in this work. The Jacobian, J, is the matrix of the first derivatives of the network errors with respect to the weights and

    biases. The Hessian, H, is the matrix of second derivatives. The inverse of the Hessian is regarded as an

    unbiased estimate of the variance/covariance matrix with respect to network weights and biases.

The performance gradient is first estimated:

g = Jᵀ·E   (6)

Where g = the gradient of the error function
E = the network error (the difference between the actual and predicted load values)

With equation 6 the model uncertainty, Sm², can now be estimated:

Sm² = gᵀ·H⁻¹·g   (7)

From a separate neural network (or additional layer) with an exponential activation function, the noise component Sν²(x) is estimated as a function of the input vector. Hwang and Ding [8] suggest the following prediction interval:

ŷ(x_{n+1}) ± t_{1−α/2, n−k} · √(1 + Sν²(x_{n+1}) + Sm²)   (8)

Where ŷ = the predicted response to the input set x_{n+1}
t = the Student's t distribution
n = the number of training points
d = the number of input variables
k = the total number of estimated weights and biases
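A sketch of equations (6)-(8) in Python, with random placeholder values for the Jacobian and error vector, a Gauss-Newton approximation H ≈ JᵀJ for the Hessian, and a fixed large-sample t critical value (all assumptions for illustration):

```python
import numpy as np

def prediction_interval(y_hat, J, E, S_noise2, t_crit=1.96):
    """Equations (6)-(8): model uncertainty from the Jacobian of network
    errors, combined with the input-dependent noise estimate S_noise2.
    t_crit stands in for the Student-t critical value at n - k degrees
    of freedom (about 1.96 for large samples)."""
    g = J.T @ E                                  # equation (6)
    H = J.T @ J                                  # Gauss-Newton Hessian approximation
    Sm2 = (g.T @ np.linalg.inv(H) @ g).item()    # equation (7)
    half_width = t_crit * np.sqrt(1.0 + S_noise2 + Sm2)   # equation (8)
    return y_hat - half_width, y_hat + half_width

rng = np.random.default_rng(4)
J = rng.normal(size=(50, 6))            # 50 training points, 6 weights/biases
E = rng.normal(0.0, 0.1, size=(50, 1))  # placeholder network errors
lo, hi = prediction_interval(2.5, J, E, S_noise2=0.04)
```

In the actual workflow J and E come out of the backpropagation training run, and S_noise2 from the separate variance network.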

    RESIDENTIAL ELECTRIC POWER LOAD DATA AND NETWORK INPUTS

The data were instantaneous power usage from multiple homes taken at 15-minute intervals over a

    calendar year. The data were fairly high quality but did contain some zero values that generally

occurred in sequences of 3 to 20. Banks and Carson [4] recommend a Monte Carlo approach for handling sequential missing data in a time series. The missing sequences were filled using sequences of normal random numbers based on the mean and standard deviation of the previous five values.
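The fill procedure can be sketched as follows (Python; the function name and the clamping of negative draws to zero are our additions, not part of the original procedure):

```python
import numpy as np

def fill_missing_runs(load, window=5, rng=None):
    """Replace zero (missing) readings with normal random draws whose
    mean and standard deviation come from the previous `window` values."""
    if rng is None:
        rng = np.random.default_rng()
    load = load.astype(float).copy()
    for i in range(window, len(load)):
        if load[i] == 0.0:
            recent = load[i - window:i]
            # Clamp at zero: a power reading cannot be negative.
            load[i] = max(rng.normal(recent.mean(), recent.std()), 0.0)
    return load

rng = np.random.default_rng(5)
series = rng.uniform(0.5, 2.0, 100)   # synthetic 15-minute readings (kW)
series[40:46] = 0.0                   # a run of dropped readings
filled = fill_missing_runs(series, rng=rng)
```

Because the window slides forward, later values in a missing run are drawn partly from earlier filled values, which keeps the filled sequence consistent with the local mean and spread.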

The inputs to the network in this application are a combination of calendar times and previous load values. As part of data preprocessing, the correlation matrix was calculated and only the input values correlated less than 0.5 with one another were retained.
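A sketch of this correlation screen (Python; the input names and the greedy keep-or-drop rule are illustrative assumptions):

```python
import numpy as np

def select_inputs(X, names, threshold=0.5):
    """Keep each candidate input only if its absolute correlation with
    every already-kept input is below the threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]

rng = np.random.default_rng(6)
hour = rng.uniform(0, 24, 500)                      # calendar-time input
prev_load = rng.normal(1.0, 0.3, 500)               # previous load value
prev_load2 = prev_load + rng.normal(0, 0.01, 500)   # nearly duplicates prev_load
X = np.column_stack([hour, prev_load, prev_load2])
kept = select_inputs(X, ["hour", "load_t-1", "load_t-2"])
```

The nearly duplicated load input is dropped, leaving only inputs that carry distinct information into the network.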


    TABLE 1 ROBUST OPTIMIZATION EXPERIMENT

    NETWORK OPTIMIZATION RESULTS

    The results graphed in figure 5 below show the effects of the number of nodes, epochs, training periods,

and the type of training algorithm used on the signal-to-noise ratio and mean R² value. The most robust network architecture will be that which maximizes the signal-to-noise ratio for all control factors.

    Clearly, the number of nodes required for the most robust network would be either seven or eleven.

This supports the idea that a small number of neurons in the hidden layer is important for generalization of the function. Too many neurons can result in overfitting, and this explains why the network with 19 nodes had the lowest signal-to-noise ratio of the simulation set. When overfitting occurs, the network has memorized the relationship between the input and output of the training set instead of learning it. A network that has good performance during training but poor performance during simulation has memorized and modeled the noise components of the training set. When inputs are passed through such a network, the results will possess a high amount of error. Therefore, the precautions to take prior to training are to verify that the data set is large enough as well as representative of behavior. Conversely, if there aren't enough neurons, the network is not flexible enough to generalize the function. This may be why eleven nodes slightly outperformed seven nodes.

Likewise, too many epochs allowed the network to begin memorizing the relationship between input and output, rather than only learning it. The training period that went back nine weeks performed the best, since there was more data for the network to learn from. This longer data period contained more examples of the time-load relationships and patterns that would reappear in the simulation set.


    FIG. 5 SNR AND MEAN R2 FOR NETWORK ARCHITECTURE OPTIMIZATION

The training algorithms tested also had some effect on the signal-to-noise ratio. The trainbfg algorithm slightly outperformed the trainlm algorithm [9]. Both algorithms are iterative, meaning they continue training until the error function reaches a minimum. If the error begins to increase, training ends. One problem with this approach is that if the algorithm has reached a local minimum, the error would have to increase in order to continue training to find the global minimum. The Levenberg-Marquardt (trainlm) and quasi-Newton (trainbfg) algorithms do not allow this, so the performance would be caught at a local minimum. The same applies to saddle points, or very flat areas on the error surface. This is why the performance of the training algorithms is subject to the initial choice of weights and biases [10]. Performance also depends, however, on the speed with which an algorithm converges. Since the L-M algorithm moves faster (takes steeper steps) towards convergence, it would be more likely to get stuck at a saddle point or a local minimum than the quasi-Newton algorithm. This may account for the improved performance of trainbfg.

The log-sigmoid activation function outperformed the hyperbolic tangent function, if only slightly. One reason may be that the log-sigmoid activation function does not allow a negative output value. No load value should be negative, but since the load values used were in kilowatts, there could be times when the house is drawing low power on the order of watts. By restricting the output to only positive values, negative network outputs do not have to be corrected by the values of the weights and biases; instead they are suppressed by the activation functions. This also applies to the linear and positive linear activation functions: the final output value does not have to be corrected by the weights and biases prior to being passed through the activation functions.


    CONFIRMATION EXPERIMENTS

It was concluded from the network optimization experiment that a three-layer feedforward network should be used, with seven nodes at the hidden layer, a logsig activation function at the hidden layer, and a poslin transfer function at the output neuron. The data set used for training should go back nine weeks, and the network training algorithm should be the quasi-Newton backpropagation algorithm. Finally, there should be 150 epochs of training.


    FIG. 7 ANN RESULTS FOR ONE HOUR AHEAD PREDICTION (SIMULATION)

    Figure 7 displays the results from load predictions for one day from a home in the west during the month

of May, and another home from the southeast in the summer. Prediction intervals were calculated as

described above. Table 2 displays the training and simulation R² values for four homes.

TABLE 2 NETWORK RESULTS: R² VALUE OF ACTUAL VS PREDICTED

                 Southeast home   West home   Southeast home   West home
                 (Winter)         (Spring)    (Summer)         (Fall)
Training R²      0.64             0.94        0.78             0.796
Simulation R²    0.78             0.94        0.749            0.96

The optimized network architecture has very good prediction accuracy over a range of homes and seasons.

STOCHASTIC MODELING OF RESIDENTIAL LOAD DATA

    It has been demonstrated that residential power usage can be modeled and predicted with optimized

    neural networks to a surprising level of accuracy and reproducibility. This fact suggests that although

the data appear noisy at first glance, the apparent randomness is probably due to multiple repeating and interacting patterns in the data that are not recognizable to the human eye. To illustrate this further, a GBM simulation of the load data is examined.

    Figure 8 below compares a GBM simulated profile with actual load data for a day. The simulation was

    conducted using equation 4. The drift and volatility parameters are constants, estimated from the

previous day's data.

    FIG. 8 LOAD PROFILE WITH STATIC GBM SIMULATION


It should be no surprise that this static GBM simulation does a poor job of predicting residential load usage. By definition the drift and volatility parameters are constants. Even with accurate estimates, the model is not capable of accommodating anything more than linear changes in drift and volatility. The question is raised, however, as to what extent the prediction accuracy of a GBM simulation could be improved if both the drift and volatility parameters could be modeled as functions of time. This approach has been characterized by [1] and [2], using empirical, autoregressive methods to model the drift and volatility.

    With neural network modeling tools readily available it seems prudent to attempt to use them for drift

    and volatility modeling and assess the prediction accuracy of this dynamic GBM simulation in

comparison to the results outlined above. The parameter ν was calculated for the load data (equation 5) of a particular home, as well as a five-point moving drift and volatility.

Volatilities are variances and as such are χ² distributed. The actual distribution of the volatilities was sufficiently close to exponential that a log transformation rendered it nearly normal. The same feedforward backpropagation type of training was used to model both the moving drift and the moving log of the volatilities. Equation 4 was then used with the drift and volatility parameters replaced by the networks. The simulation R values for the modeled drift and volatility, when compared to actual data, were 0.98 and 0.74 respectively.
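The moving-parameter calculation can be sketched as follows (Python, with synthetic data standing in for the home's load; the five-point window and the log transform of the volatility follow the text):

```python
import numpy as np

def moving_drift_volatility(load, dt, window=5):
    """Five-point moving estimates of the instantaneous drift of
    equation (5) and of the volatility; the log of the volatility is
    returned, since the log transform renders it nearly normal."""
    nu = np.diff(np.log(load)) / dt
    drift = np.array([nu[i - window:i].mean() for i in range(window, len(nu))])
    vol = np.array([nu[i - window:i].std(ddof=1) for i in range(window, len(nu))])
    return drift, np.log(vol)

rng = np.random.default_rng(7)
# Synthetic stand-in for two days of 15-minute load readings (kW).
load = np.exp(np.cumsum(rng.normal(0.0, 0.05, 200))) + 0.5
drift, log_vol = moving_drift_volatility(load, dt=0.25)
```

The two returned series are the targets the drift and volatility networks are trained against.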

    FIG. 9 LOAD PROFILE WITH DYNAMIC GBM SIMULATION

In figure 9 it is evident that patterns begin to emerge in the predictions (the R value of these predictions is about 0.5, for reference). The scale and magnitudes of the predictions are not yet close to the actual data; however, the accuracy is considerably improved over the static drift and volatility predictions above. Though more effort could be undertaken to optimize these predictions, it hardly seems necessary given the prediction accuracy and generalization of the neural network models of the raw data.


CONCLUSIONS

It was demonstrated in this paper that residential power usage profiles can be accurately modeled using both neural networks and Geometric Brownian Motion with dynamic models for the drift and volatility parameters. This alone is a significant step toward predictive, adaptive control systems that may one day be used to extend fuel cell system life. The neural network approach was far superior to the GBM, even when neural networks themselves were used to model the drift and volatility parameters as functions of time.

An effective method of optimizing network architecture parameters was demonstrated, based on

    Taguchi robust design methodologies. The results of the analysis were predictable and grounded in

    neural network theory.

Going forward, it will be our task to identify methods for determining optimal starting weights for the network to avoid convergence to a local minimum. Adaptive training will also be studied, as well as alternative network types (radial basis, sequential, Elman).


REFERENCES

1. Hull, J. & White, A., "The pricing of options on assets with stochastic volatilities," Journal of Finance, 42(2), 1987, pp. 281-300.
2. Levy, G., "An introduction to GARCH models in finance," Financial Engineering News, 2001, 22.
3. Ingleby, M. & Onyango, S., "Robust estimation of historical volatility by a Hough transform," Master's Thesis, University of Huddersfield, 2002.
4. Banks, J. & Carson, J., "Random Variate Generation," Discrete-Event System Simulation, Prentice Hall, Englewood Cliffs, New Jersey, 1984.
5. Taguchi, G., Chowdhury, S. & Taguchi, S., "Robust Engineering Process Formula," Robust Engineering, McGraw-Hill, New York, New York, 2000.
6. Klebaner, F., "Brownian Motion Calculus," Introduction to Stochastic Calculus with Applications, Imperial College Press, London, England, 2001.
7. Papadopoulos, G., Edwards, P.J. & Murray, A.F., "Confidence Estimation for Neural Networks: A Practical Comparison," Department of Electronics and Electrical Engineering, University of Edinburgh.
8. Hwang, G. & Ding, A.A., "Prediction Intervals for Artificial Neural Networks," Journal of the American Statistical Association, 92(438), 1997, pp. 748-757.
9. Demuth, H. & Beale, M., "Backpropagation," Neural Network Toolbox for Use with MATLAB, The MathWorks, 2001.
10. Bishop, C.M., "Parameter Optimization Algorithms," Neural Networks for Pattern Recognition, Clarendon Press, Oxford, 1995.
11. Hagan, M.T., Demuth, H.B. & Beale, M.H., "Performance Optimization," Neural Network Design, PWS Publishing Company, Boston, MA, 1996.
12. Lee, K.Y., Cha, Y.T. & Park, J.H., "Short-term Load Forecasting Using an Artificial Neural Network," IEEE Transactions on Power Systems, 7(1), 1992, pp. 124-132.