Reinforcement-Learning-Based Optimal Control of Hybrid Energy … · 2019-01-24 · novel RL-OPT...

Reinforcement-Learning-Based Optimal Control of Hybrid Energy Storage System in Hybrid AC/DC Microgrids

Jiajun Duan, Member, IEEE, Zhehan Yi, Member, IEEE, Di Shi, Senior Member, IEEE, and Zhiwei Wang, Senior Member, IEEE

Abstract—In this paper, a reinforcement-learning-based online optimal (RL-OPT) control method is proposed for the hybrid energy storage system (HESS) in AC/DC microgrids involving photovoltaic (PV) systems and diesel generators (DG). Due to the low system inertia, conventional unregulated charging and discharging (C&D) of energy storage in microgrids may introduce disturbances that degrade power quality and system performance, especially in fast C&D situations. Secondary and tertiary control levels can optimize the state of charge (SOC) reference of the HESS periodically; however, they lack direct controllability over the transient performance. Additionally, the unknown and time-varying system parameters greatly limit the performance of conventional model-based controllers. In this study, optimal control theory is used to optimize the C&D profile and to suppress the disturbances caused by integrating the HESS. Neural networks (NN) are devised to learn the nonlinear dynamics of the HESS from input/output measurements, and to learn the optimal control input for the bidirectional-converter-interfaced HESS using the estimated system dynamics. Because the proposed RL-OPT method is fully decentralized and requires only local measurements, the plug & play capability of the HESS can be easily realized. Both islanded and grid-tied modes are considered. Extensive simulations and experiments are conducted to evaluate the effectiveness of the proposed method.

Index Terms—HESS, hybrid AC/DC microgrid, optimal control, reinforcement learning, neural network.

I. INTRODUCTION

The development of microgrid technologies featuring renewable distributed energy resources (DERs) has brought new opportunities as well as challenges to conventional distribution systems [1]–[3]. Meanwhile, HESS (e.g., combinations of batteries and ultra-capacitors) are deployed to compensate for the intermittency of renewable DERs and to participate in real-time demand-supply balancing, which also helps to defer the extraordinary cost of upgrading conventional power grids [4], [5]. This opens a completely new path to alter the traditional operation pattern of power systems, which creates significant benefits and convenience for both power suppliers and customers [6]–[8]. Currently, HESS consisting of lithium-ion batteries (LIB) and ultra-capacitors (UC) have been widely incorporated in microgrids [9], [10]. However, the discrepancies in energy and power densities result in disparate dynamic inertias between LIB and UC. Therefore, proper real-time control of HESS with good transient performance becomes a challenging problem.

This work was supported by the SGCC Science and Technology Program under project Hybrid Energy Storage Management Platform for Integrated Energy System.

J. Duan, Z. Yi, D. Shi and Z. Wang are with GEIRI North America, San Jose, CA, 95134 USA, e-mails: {jiajun.duan, zhehan.yi, di.shi, zhiwei.wang}@geirina.net.

In a classic microgrid with a hierarchical control structure, the SOC of an energy storage unit (ESU) can be decided by the centralized tertiary control and realized by distributed secondary/primary control [11]–[13]. In the charging process, the ESU works as a load bank, while in the discharging process, it performs as a DER. For the rest of the time, the ESU should be isolated from the system to prevent continuously repeated C&D caused by the self-discharging effect [14]. Nevertheless, due to the low inertia in microgrids, initiating and switching the C&D processes of an ESU may lead to nuisance disturbances in the system, especially in fast C&D scenarios [15], [16]. The unexpected disturbance significantly degrades the power quality and might damage sensitive loads, such as data centers. It may even falsely trigger protection schemes in the worst scenarios [13]. This defeats the original purpose of implementing HESS, which is to increase the system stability and reduce the disturbances. Therefore, the desired control policy should provide a smooth C&D solution for HESS in a decentralized manner with plug & play capability [17].

Conventionally, constant current (CC) and proportional-integral (PI)-based control are two of the most popular methods in industrial applications [14], [18]–[20]. Both control methods have their own advantages and disadvantages. For example, CC control has been widely used in low-voltage electronic devices such as cellphones and laptops due to its implementation simplicity [18], [19]. However, starting and terminating the CC controller introduces significant voltage disturbances, which are harmful to microgrids with HESS. From this perspective, PI-based control methods can slightly improve the transient performance, since large disturbances are introduced only during the initializing periods of C&D. However, PI-based methods generally require excessive parameter tuning effort and rely on knowledge of the system dynamics. Once system parameters change or deviate from the original set point, e.g., due to aging or heating, the performance of PI-based controllers degrades significantly [20]. Additionally, in practice, the outer voltage loop of a double-loop PI controller is usually simplified to a proportional controller to avoid the over-C&D problem, which also limits the performance of PI-based controllers [14].

Beyond the above benchmark control methods, several advanced control algorithms have been developed to solve different problems of HESS [21]–[25]. However, most of them target the design of optimal power dispatch in upper-level controllers with a relatively long time step [21], [22], [26]. These are not applicable to optimizing the transient C&D profile of HESS. To the best of the authors' knowledge, only a few methods focus on improving the real-time control performance of HESS. In [23], an adaptive control method is proposed to provide power smoothing control, which calculates the cut-off frequency adaptively to realize autonomous control of multiple ESUs. A coordinated control is presented in [24] for hybrid AC/DC microgrids involving ESS and pulsed-power loads. In [25], a hierarchical control method with two layers (i.e., energy management and converter control layers) is proposed for UC energy storage in urban railways. These works promote the development of HESS applications and speed up the technology maturity. However, a common issue is that they are all designed based on PI algorithms, which introduce considerable disturbances to the system during C&D processes. Although advanced algorithms can be developed to find optimal PI gains, they have limited capability to optimize the entire C&D profile [25]. The reliance on knowledge of the system dynamics also makes these methods inapplicable in most practical scenarios. The model-free control concept has been applied in other applications such as twin rotor aerodynamic systems and reverse osmosis desalination plants [27], [28], and can also be introduced to improve the overall performance of HESS, especially in microgrid applications.

Fig. 1. Diagram of the considered hybrid AC/DC microgrid. [Diagram labels: DC loads, PV, HESS, LIB, DC/DC converters, filter, DC bus, AC/DC and DC/AC converters, CB, AC bus, AC loads, DG, transformer, utility grid.]

As an attempt to address the aforementioned issues, a novel RL-OPT control method is developed to realize smooth C&D control of HESS. A hybrid AC/DC microgrid involving PV, UC, LIB and DG is considered during the controller design process. First, the dynamic model of the modified bidirectional-power-converter (BPC)-interfaced HESS is derived. Considering that the internal impedance of each ESU is unknown, one NN is developed to estimate the system dynamics online. Then, another NN is applied to calculate the optimal control input for the HESS through online learning based on the estimated system dynamics. The proposed control scheme considers both grid-tied and islanded modes of the microgrid. In grid-tied mode, the main grid is considered as an infinite source which maintains the bus voltage and reactive power at the point of common coupling (PCC) through the voltage source converter (VSC), while in islanded mode, DGs are deployed to maintain the bus voltage at the PCC. In either scenario, PV works in maximum power point tracking (MPPT) mode to maximize the renewable DER utilization. The effectiveness of the proposed RL-OPT method is thoroughly tested through both software simulations and hardware-in-the-loop (HIL) experiments. The major contributions of this work can be summarized as follows:

• The optimal control problem of HESS is formulated using the RL method to reduce the disturbances caused by C&D of various energy storage devices.

• The proposed model-free method adapts to different system dynamics based on the input/output data without the need for system parameter information.

• A novel bidirectional converter topology is designed to avoid redundancy of the C&D circuit as well as the self-discharging problem.

• Extensive case studies of both software simulations and HIL experiments have been conducted to test the effectiveness of the proposed control method.

The rest of the paper is organized as follows. In Section II, the problem formulation for the studied system is introduced. Then, in Section III, the optimal control method for a known nonlinear system with perturbation is formulated. Section IV introduces the proposed online RL-OPT control algorithm under unknown system dynamics and the corresponding implementation process. The case studies of the proposed control algorithm in software simulation are performed in Section V, and the HIL experiment results with the corresponding analysis are illustrated in Section VI. Section VII concludes the paper and suggests potential future work.

II. PROBLEM FORMULATION

As presented in Fig. 1, the studied microgrid consists of both AC and DC buses, which are interconnected through a bidirectional DC/AC voltage source converter (VSC). The PV array (in MPPT mode) and a HESS involving both UC and LIB are connected to the DC bus. Switching between grid-tied and islanded operation modes of the microgrid is realized by operating the circuit breaker (CB).

The corresponding switch-level C&D circuit with the detailed ESU model is presented in Fig. 2. In the modified BPC, switch S3 is added to avoid the self-leakage problem during the initiating period of the system. Switches S1 and S2 are used to control the BPC in buck or boost mode to charge or discharge the ESU (i.e., UC or LIB) in the HESS, respectively.

Fig. 2. Switch-level UC interface circuit.

Basically, the dynamics of a standard BPC-interfaced UC model can be represented as

$$
\begin{aligned}
\dot V_c &= -\frac{V_c}{C_{uc}R_{pc}} + \frac{I_c}{C_{uc}} \\
U_c &= V_c + R_{sc}I_c \\
\dot I_c &= \frac{1}{L_f}\left(U_c - R_f I_c - \alpha V_t\right)
\end{aligned}
\tag{1}
$$

in which

$$
\alpha =
\begin{cases}
\dfrac{1}{1-D}, & \forall\ S_1 \text{ is off} \\[4pt]
\dfrac{1}{D}, & \forall\ S_2 \text{ is off}
\end{cases}
\tag{2}
$$

and $D$ is the duty ratio of PWM. $R_f$, $L_f$ and $C_f$ are the filter resistance, inductance and capacitance, respectively. $R_{pc}$ and $R_{sc}$ are the internal parallel and series resistances of the UC, respectively. Similarly, $R_{pb}$ and $R_{sb}$ are the internal parallel and series resistances of the LIB, respectively. $V_c$ and $U_c$ are the internal and external voltages of the ESU, respectively. $V_t$ is the terminal bus voltage. Then, the dynamics in Eqn. (1) can be further written as

$$
\dot U_c = \left(\frac{R_{sc}}{L_f} - \frac{1}{C_{uc}R_{pc}}\right)U_c - \frac{R_{sc}V_t}{L_f}\,\alpha + \left(\frac{R_{pc}+R_{sc}}{C_{uc}R_{pc}} - \frac{R_{sc}R_f}{L_f}\right)I_c
\tag{3}
$$

By defining the tracking error as $e(t) = U_c(t) - U_c^*$ with $U_c^*$ being the SOC reference, the error dynamics of (1) can be represented in a more condensed way as

$$
\dot e(t) = f(e(t)) + g(t)u(t) + D(t), \quad e(0) = e_0
\tag{4}
$$

where $f(e) = (R_{sc}/L_f - 1/(C_{uc}R_{pc}))U_c$ and $g(t) = R_{sc}V_t/L_f$ are the unknown nonlinear system dynamics. $u(t) = \alpha$ is the control input. $D(t) = [(R_{pc}+R_{sc})/(C_{uc}R_{pc}) - R_{sc}R_f/L_f]I_c$ is the perturbation term of the system with $D(0) = 0$ and is bounded as $\|D(t)\| \le d_{\max}$. It should be noted that the error dynamics of the LIB can be similarly represented in the form of Eqn. (4), and the derivation is omitted here.
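The step from Eqn. (1) to Eqn. (3) can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, the UC and filter values are taken from Table II, the operating point is arbitrary, and the resistance in the last term of Eqn. (3) is read as $R_{pc}$ (an assumption that follows from differentiating $U_c = V_c + R_{sc}I_c$). It compares the chain-rule derivative of $U_c$ obtained from Eqn. (1) with the closed form of Eqn. (3).

```python
# UC and filter parameters (values as listed in Table II)
C_UC, R_PC, R_SC = 5.7, 6e3, 0.1   # F, ohm, ohm
R_F, L_F = 0.1, 4e-3               # ohm, H


def alpha_from_duty(duty, s1_off):
    """Eqn. (2): alpha = 1/(1 - D) when S1 is off, 1/D when S2 is off."""
    return 1.0 / (1.0 - duty) if s1_off else 1.0 / duty


def du_c_chain_rule(v_c, i_c, alpha, v_t):
    """d(U_c)/dt = d(V_c)/dt + R_sc * d(I_c)/dt, using Eqn. (1) directly."""
    u_c = v_c + R_SC * i_c
    dv_c = -v_c / (C_UC * R_PC) + i_c / C_UC
    di_c = (u_c - R_F * i_c - alpha * v_t) / L_F
    return dv_c + R_SC * di_c


def du_c_eq3(v_c, i_c, alpha, v_t):
    """Closed form of Eqn. (3), with R_pc assumed in the I_c coefficient."""
    u_c = v_c + R_SC * i_c
    return ((R_SC / L_F - 1.0 / (C_UC * R_PC)) * u_c
            - R_SC * v_t / L_F * alpha
            + ((R_PC + R_SC) / (C_UC * R_PC) - R_SC * R_F / L_F) * i_c)


if __name__ == "__main__":
    a = alpha_from_duty(0.9, s1_off=False)  # arbitrary duty ratio
    print(du_c_chain_rule(5.0, 2.0, a, 48.0), du_c_eq3(5.0, 2.0, a, 48.0))
```

Both functions should return the same value up to floating-point rounding for any operating point, which is what makes the condensed form in Eqn. (4) a relabeling of Eqn. (3) rather than a new model.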

Obviously, the performance of conventional PI-based control methods on the nonlinear uncertain system in Eqn. (4) would be very limited, especially when the system dynamics are unknown. Trial-and-error tuning is also impractical and not reliable enough to guarantee the system performance. To minimize the disturbances caused by C&D of HESS, the optimal control problem is formulated with respect to $e$ and $u$ in the next section.

III. OPTIMAL CONTROL DESIGN FOR UNCERTAIN NONLINEAR SYSTEM

In this section, the optimal control policy is derived for the known uncertain nonlinear system in Eqn. (4). Then, the RL method is developed in the next section to solve this optimal control problem under unknown system dynamics. First, considering a nominal nonlinear system without the uncertainty $D$, i.e., $\dot e(t) = f(e(t)) + g(t)u(t)$, the infinite-horizon integral cost function can be designed as

$$
J(e_0, u) = \int_0^\infty r(e, u)\,dt
\tag{5}
$$

where $r(e, u) = Q(e) + u^T R u$ with $R$ being a symmetric positive definite matrix and $Q(e) = e^T P e$ being a positive definite function of $e$.
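To make the cost in Eqn. (5) concrete, the following sketch evaluates it for a scalar toy case. Everything here is an illustrative assumption rather than the paper's setup: the error is taken to decay as $e(t) = e_0 e^{-at}$ under a proportional input $u = -ke$, and $P$, $R$ are scalars. In that case the integral has the closed form $(P + Rk^2)e_0^2/(2a)$, which the trapezoidal approximation should reproduce.

```python
import math


def quadratic_cost(e0, decay, gain, P, R, horizon=10.0, dt=1e-3):
    """Trapezoidal approximation of J = int_0^horizon (P e^2 + R u^2) dt
    for the assumed trajectory e(t) = e0 * exp(-decay * t), u = -gain * e."""
    steps = int(round(horizon / dt))
    total = 0.0
    prev = (P + R * gain ** 2) * e0 ** 2  # integrand at t = 0
    for k in range(1, steps + 1):
        e = e0 * math.exp(-decay * k * dt)
        cur = P * e * e + R * (gain * e) ** 2
        total += 0.5 * (prev + cur) * dt
        prev = cur
    return total


def closed_form_cost(e0, decay, gain, P, R):
    """Exact infinite-horizon value for the same assumed trajectory."""
    return (P + R * gain ** 2) * e0 ** 2 / (2.0 * decay)


if __name__ == "__main__":
    print(quadratic_cost(1.0, 2.0, 1.0, 1.0, 0.5))
    print(closed_form_cost(1.0, 2.0, 1.0, 1.0, 0.5))
```

The example shows how $R$ trades control effort against tracking error: a larger gain shortens the decay in a real closed loop but is penalized through the $u^T R u$ term.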

Based on Theorem 1 in [29], there exists a control law $u(e)$ that can guarantee the asymptotic stability of the closed-loop nonlinear system in Eqn. (4) when the preconditions in Eqn. (6) are satisfied with respect to a positive definite continuously differentiable function $V(e)$, a bounded function $\Gamma(e)$, and a feedback control law $u(e)$:

$$
\begin{aligned}
V_{\partial e}^T D(t) &\le \Gamma(e) \\
V_{\partial e}^T[f(e) + g(t)u] + \Gamma(e) + Q(e) + u^T R u &= 0
\end{aligned}
\tag{6}
$$

where $V_{\partial e}$ is the partial derivative of the cost function $V(e)$ with respect to $e$. Then, the cost function in Eqn. (5) satisfies

$$
\sup_{D(t)\in\mathcal{M}} J(e_0, u) \le J_d(e_0, u) = V(e_0)
\tag{7}
$$

where "sup" denotes the supremum operator, which finds the minimal cost $J_d(e_0, u)$ greater than or equal to $J(e_0, u)$ for any perturbation $D(t) \in \mathcal{M}$, $\mathcal{M} = \{D(t) \mid D(t) \in \mathbb{R}, \|D(t)\| \le d_{\max}\}$. $J_d(e_0, u)$ is the modified cost function for the nonlinear system with uncertainty, which can be designed as

$$
J_d(e_0, u) = \int_0^\infty [r(e, u) + \Gamma(e)]\,dt
\tag{8}
$$

Then, Eqn. (8) can be further written as

$$
\begin{aligned}
J_d(e_0, u) = V(e_0) &= \int_0^T [r(e, u) + \Gamma(e)]\,dt + \int_T^\infty [r(e, u) + \Gamma(e)]\,dt \\
&= \int_0^T [r(e, u) + \Gamma(e)]\,dt + V(e)
\end{aligned}
\tag{9}
$$

Since $V(e)$ is continuously differentiable, Eqn. (9) becomes

$$
\begin{aligned}
&\lim_{T\to 0}\frac{V(e_0) - V(e)}{T} = \lim_{T\to 0}\frac{1}{T}\int_0^T [r(e, u) + \Gamma(e)]\,dt \\
\Rightarrow\ &\dot V(e) = V_{\partial e}^T[f(e) + g(t)u + D] = -r(e, u) - \Gamma(e) \\
\Rightarrow\ &0 = V_{\partial e}^T[f(e) + g(t)u + D] + r(e, u) + \Gamma(e)
\end{aligned}
\tag{10}
$$

It can be observed that Eqn. (10) is an infinitesimal version of Eqn. (9). Based on Eqn. (10), the Hamiltonian of the optimal control problem can be defined as

$$
H(e, u, V_{\partial e}) = Q(e) + u^T R u + V_{\partial e}^T[f(e) + g(t)u + D] + \Gamma(e)
\tag{11}
$$

Correspondingly, the optimal cost function can be designed as

$$
V^*(e) = \min_{u\in\Omega}\int_0^T [r(e, u) + \Gamma(e)]\,dt
\tag{12}
$$

The objective of the optimal cost function in Eqn. (12) is to achieve the least tracking error using the least control effort while introducing the least disturbances. Then, Eqn. (12) can be obtained by solving the Hamilton-Jacobi-Bellman (HJB) equation

$$
\min_{u\in\Omega} H(e, u, V^*_{\partial e}) = 0
\tag{13}
$$

By taking the partial derivative of the HJB equation, i.e., $\partial H(e, u, V^*_{\partial e})/\partial u = 0$, the optimal control law $u^*$ can be derived as

$$
u^* = -\frac{1}{2}R^{-1}g(t)^T V^*_{\partial e}
\tag{14}
$$
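The stationarity condition behind Eqn. (14) can be illustrated on a scalar linear-quadratic special case. The sketch below is an assumption-laden toy, not the HESS model: it takes nominal dynamics $\dot e = ae + bu$ with no perturbation ($D = 0$, $\Gamma = 0$), $Q(e) = qe^2$, and a quadratic value function $V(e) = pe^2$, so that $V_{\partial e} = 2pe$ and the HJB equation reduces to the scalar Riccati equation $q + 2ap - b^2p^2/r = 0$. All function names are hypothetical.

```python
import math


def riccati_p(a, b, q, r):
    """Positive root of q + 2*a*p - b^2 * p^2 / r = 0 (scalar Riccati)."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


def optimal_control(e, a, b, q, r):
    """Eqn. (14) in the scalar case: u* = -(1/2) * R^-1 * g * V_e."""
    v_e = 2.0 * riccati_p(a, b, q, r) * e
    return -0.5 * b * v_e / r


def hamiltonian(e, u, a, b, q, r):
    """Nominal Hamiltonian Q(e) + u R u + V_e (a e + b u), i.e. D = Gamma = 0."""
    v_e = 2.0 * riccati_p(a, b, q, r) * e
    return q * e * e + r * u * u + v_e * (a * e + b * u)


if __name__ == "__main__":
    a, b, q, r, e = -2.0, 1.0, 1.0, 1.0, 0.5
    u_star = optimal_control(e, a, b, q, r)
    print(u_star, hamiltonian(e, u_star, a, b, q, r))
```

In this toy case the Hamiltonian vanishes at $u^*$ and is strictly larger for any other input, which mirrors Eqn. (13). The general nonlinear problem has no such closed form, which is the motivation for the NN approximation in Section IV.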

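Accordingly, the bounded function can be designed as

$$
\Gamma(e) = \frac{1}{4}V_{\partial e}^T V_{\partial e} + d_{\max}^2
\tag{15}
$$

It can be easily proven that $\Gamma(e)$ in Eqn. (15) satisfies the condition in Eqn. (6), i.e., $V_{\partial e}^T D(t) \le \Gamma(e)$. Substituting Eqn. (14) and (15) into Eqn. (13), the HJB equation in terms of $V^*_{\partial e}$ can be represented as

$$
0 = Q(e) + \frac{1}{4}V^{*T}_{\partial e}V^*_{\partial e} + d_{\max}^2 + V^{*T}_{\partial e}[f(e) + D] - \frac{1}{4}V^{*T}_{\partial e}\,g(t)R^{-1}g(t)^T\,V^*_{\partial e}
\tag{16}
$$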
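The bound claimed for $\Gamma(e)$ in Eqn. (15) can be seen from a completing-the-square argument. The following is a short sketch, assuming the Euclidean norm and $\|D(t)\| \le d_{\max}$ as in the definition of the perturbation set; it is one way to verify the inequality and not necessarily the argument the authors had in mind.

```latex
\begin{aligned}
0 &\le \left\| \tfrac{1}{2} V_{\partial e} - D \right\|^2
   = \tfrac{1}{4} V_{\partial e}^T V_{\partial e} - V_{\partial e}^T D + D^T D \\
\Rightarrow\quad V_{\partial e}^T D
  &\le \tfrac{1}{4} V_{\partial e}^T V_{\partial e} + \|D\|^2
   \le \tfrac{1}{4} V_{\partial e}^T V_{\partial e} + d_{\max}^2 = \Gamma(e)
\end{aligned}
```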

Theorem 1 (Optimal Control Policy $u^*$): Consider any nonlinear uncertain system presented in Eqn. (4) with the cost function defined in Eqn. (7) and the HJB equation defined in Eqn. (16). Provided any admissible control $u$, the cost function in Eqn. (7) is smaller than a guaranteed cost bound $J_b$ given as

$$
J_b = V^*(e_0) + \int_0^T (u - u^*)^T R (u - u^*)\,dt
\tag{17}
$$

If $u = u^*$, the cost $J_b$ is guaranteed to be minimized, i.e., $J_b = V^*(e_0)$.

Proof: According to Eqn. (10) and the definition of $V^*(e)$, the cost function in Eqn. (7) with respect to any arbitrary $u$ can be rewritten as

$$
J(e_0, u) = V^*(e_0) + \int_0^T [r(e, u) + \dot V^*(e)]\,dt
\tag{18}
$$

By Eqn. (10) and Eqn. (16), one can obtain that

$$
\begin{aligned}
r(e, u) + \dot V^*(e) &= Q(e) + u^T R u + V^{*T}_{\partial e}[f(e) + g(t)u + D] \\
&= u^T R u + V^{*T}_{\partial e}g(t)u + \frac{1}{4}V^{*T}_{\partial e}g(t)R^{-1}g(t)^T V^*_{\partial e} - \frac{1}{4}V^{*T}_{\partial e}V^*_{\partial e} - d_{\max}^2 \\
&\le u^T R u + V^{*T}_{\partial e}g(t)u + \frac{1}{4}V^{*T}_{\partial e}g(t)R^{-1}g(t)^T V^*_{\partial e}
\end{aligned}
\tag{19}
$$

Recalling Eqn. (14), Eqn. (19) can be completed into a square form with respect to $R^{-1}g(t)^T V^*_{\partial e}/2$ as

$$
r(e, u) + \dot V^*(e) \le (u - u^*)^T R (u - u^*)
\tag{20}
$$

which implies that Eqn. (16) holds. Thus, if $u = u^*$, the cost $J_b$ is guaranteed to be minimized, i.e., $J_b = V^*(e_0)$, and the corresponding optimal control input is $u^*$. Proof completed. ♦
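The square-completion step from Eqn. (19) to Eqn. (20) can be spot-checked numerically. The sketch below works in the scalar case with hypothetical function names and randomly drawn values; it compares $(u - u^*)R(u - u^*)$, with $u^*$ from Eqn. (14), against the expanded right-hand side of Eqn. (19).

```python
import random


def square_form(u, g, R, v_e):
    """(u - u*) R (u - u*) with u* = -(1/2) R^-1 g V_e (scalar case)."""
    u_star = -0.5 * g * v_e / R
    return R * (u - u_star) ** 2


def expanded_form(u, g, R, v_e):
    """u R u + V_e g u + (1/4) V_e g R^-1 g V_e, the last line of Eqn. (19)."""
    return R * u * u + v_e * g * u + 0.25 * v_e * g * g * v_e / R


def max_gap(trials=1000, seed=0):
    """Largest absolute difference between the two forms over random draws."""
    rng = random.Random(seed)
    gap = 0.0
    for _ in range(trials):
        u = rng.uniform(-5.0, 5.0)
        g = rng.uniform(-5.0, 5.0)
        v_e = rng.uniform(-5.0, 5.0)
        R = rng.uniform(0.1, 5.0)  # R must be positive
        gap = max(gap, abs(square_form(u, g, R, v_e) - expanded_form(u, g, R, v_e)))
    return gap


if __name__ == "__main__":
    print(max_gap())  # expected to be at floating-point rounding level
```

Because the two expressions are algebraically identical, the inequality in Eqn. (20) comes entirely from dropping the non-positive terms $-\frac{1}{4}V^{*T}_{\partial e}V^*_{\partial e} - d_{\max}^2$ in Eqn. (19).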

The above optimal control is derived based on known nonlinear system dynamics. However, it is very difficult to numerically solve the optimal control problem of a nonlinear system, especially when the system dynamics are unknown. In the next section, NNs are applied to solve the above-mentioned problem using the RL method.

IV. ONLINE RL-OPT CONTROL METHOD FOR HESS

Generally, the optimal control input $u^*$ is calculated based on the solution of the HJB Eqn. (16). However, in practice, the system uncertainty makes the nonlinear partial derivative function too complicated to be solved directly. Therefore, two NNs are developed in this section to realize adaptive online learning of the optimal control policy. First, an NN is designed to estimate the unknown system dynamics. Based on the estimated system dynamics, another NN is developed to solve for the optimal policy. The detailed control implementation method for the proposed controller is then elaborated.

A. System Dynamic Identifier Design

According to the universal approximation capability of NN [30], the system dynamics in Eqn. (4) can be represented by a single-layer NN as

$$
\begin{aligned}
\dot e(t) &= f(e(t)) + g(t)u(t) + D(t) \\
&= W_e^{*T}\sigma_1(e) + W_u^{*T}\sigma_1(u) + W_d^{*T}\mathbf{1} + \varepsilon_1 \\
&= W_1^{*T}\sigma_1(e, u) + \varepsilon_1
\end{aligned}
\tag{21}
$$

where $W_1^* = [W_e^*\ W_u^*\ W_d^*]^T \in \mathbb{R}^{N\times 1}$ are the ideal unknown weights of the NN identifier approximating the system dynamics $f(e)$, $g(t)$ and $D$. $\sigma_1(e, u) = [\sigma_1(e)\ \sigma_1(u)\ \mathbf{1}] \in \mathbb{R}^{N\times 1}$ is the activation function, where $\mathbf{1}$ represents the vector of ones. $N$ is the number of hidden-layer neurons. $\varepsilon_1$ is the NN reconstruction error. $W_1^*$ and $\varepsilon_1$ are assumed to be bounded as $\|W_1^*\| \le W_{1M}$ and $\|\varepsilon_1\| \le \varepsilon_{1M}$, respectively [31]. It should be mentioned that since $D$ is a perturbation term associated with the system dynamics rather than a random external disturbance, $D$ satisfies the conditions to be estimated by the NN. Then, the tracking error dynamics estimator can be designed as

$$
\dot{\hat e}(t) = \hat W_1^T\sigma_1(e, u) + k_1\tilde e(t)
\tag{22}
$$

where $\hat W_1 = [\hat W_e\ \hat W_u\ \hat W_d]^T \in \mathbb{R}^{N\times 1}$ are the estimated NN identifier weights and $k_1$ is a parameter selected to maintain the stability of the NN identifier. Defining $\tilde e(t) = e(t) - \hat e(t)$ as the estimation error of the tracking error, the dynamics of $\tilde e(t)$ can be represented as

$$
\begin{aligned}
\dot{\tilde e}(t) &= \dot e(t) - \dot{\hat e}(t) \\
&= W_1^{*T}\sigma_1(e, u) + \varepsilon_1 - \hat W_1^T\sigma_1(e, u) - k_1\tilde e(t) \\
&= \tilde W_1^T\sigma_1(e, u) + \varepsilon_1 - k_1\tilde e(t)
\end{aligned}
\tag{23}
$$

where the NN weight estimation error is defined as $\tilde W_1(t) = W_1^* - \hat W_1(t)$, and furthermore, $\dot{\tilde W}_1(t) = -\dot{\hat W}_1(t)$. To force the estimated NN identifier weight $\hat W_1(t)$ to converge to the target weight $W_1^*$, the updating law for $\hat W_1$ can be designed as

$$
\dot{\hat W}_1 = -k_2\hat W_1^T\sigma_1(e, u) + \sigma_1(e, u)\tilde e(t)
\tag{24}
$$

where $k_2$ is a positive tuning parameter of the NN identifier.

Theorem 2 (Boundedness of NN identifier): Consider the proposed NN identifier in Eqn. (22) with the updating law in Eqn. (24), and let the activation function $\sigma_1(e, u)$ satisfy the persistency of excitation (PE) condition [32]. Given the initial NN identifier weight $\hat W_1(0)$ residing in a compact set $\Omega$, there exists a positive tuning parameter $k_2$ such that the identification error $\tilde e(t)$ in Eqn. (23) and the NN identifier weight estimation error $\tilde W_1(t)$ are uniformly ultimately bounded (UUB).

Proof: Define the following Lyapunov candidate as

$$
L(\tilde W, \tilde e) = \frac{1}{2}\tilde W_1^2 + \frac{1}{2}\tilde e(t)^2
\tag{25}
$$

Then, taking the first derivative of Eqn. (25) and substituting Eqn. (23) and (24), one can obtain that

$$
\begin{aligned}
\dot L &= \tilde W_1^T\dot{\tilde W}_1 + \tilde e^T\dot{\tilde e} \\
&= -\tilde W_1^T(-k_2\hat W_1 + \sigma_1\tilde e) + \tilde e^T(\tilde W_1^T\sigma_1 + \varepsilon_1 - k_1\tilde e) \\
&= k_2\tilde W_1^T(W_1^* - \tilde W_1) - k_1\tilde e^T\tilde e + \tilde e^T\varepsilon_1 \\
&\le \frac{1}{2}k_2\tilde W_1^T\tilde W_1 + \frac{1}{2}k_2 W_1^{*T}W_1^* - k_2\tilde W_1^T\tilde W_1 - k_1\tilde e^T\tilde e + \tilde e^T\varepsilon_1 \\
&\le -\frac{1}{2}k_2\|\tilde W_1\|^2 - k_1\|\tilde e\|^2 + \varepsilon_b
\end{aligned}
\tag{26}
$$

where $\varepsilon_b = \tilde e^T\varepsilon_1 + \frac{1}{2}k_2 W_1^{*T}W_1^*$ is a bounded steady-state error. According to the Lyapunov synthesis [31], one can conclude that the identification error $\tilde e(t)$ and the NN identifier weight estimation error $\tilde W_1(t)$ are UUB. Proof completed. ♦
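A discrete-time sketch of the identifier may help to see how Eqn. (22) and the weight update interact. This is only an illustration under several assumptions that are not from the paper: forward-Euler integration with a 1 ms step, a toy scalar plant $\dot e = -2e + u + 0.1$ standing in for the unknown HESS error dynamics, a sinusoidal input as excitation, a hypothetical three-element basis for $\sigma_1$, and the damping form $-k_2\hat W_1 + \sigma_1\tilde e$ of the update as it is used in the Lyapunov derivation of Eqn. (26). The gains $k_1 = 2$ and $k_2 = 0.1$ follow Table II.

```python
import math


def features(e, u):
    """Hypothetical sigma_1(e, u): tanh of each signal plus a constant bias."""
    return [math.tanh(e), math.tanh(u), 1.0]


def run_identifier(k1=2.0, k2=0.1, dt=1e-3, t_end=20.0):
    """Simulate the plant, the estimator of Eqn. (22), and the weight update."""
    e, e_hat = 0.5, 0.5          # estimator starts at the measured error
    w_hat = [0.0, 0.0, 0.0]      # initial identifier weights
    history = []                 # identification error e_tilde over time
    for n in range(int(round(t_end / dt))):
        u = math.sin(n * dt)     # assumed excitation signal
        sigma = features(e, u)
        e_tilde = e - e_hat
        history.append(e_tilde)
        e_dot = -2.0 * e + u + 0.1  # toy plant, unknown to the identifier
        e_hat_dot = sum(w * s for w, s in zip(w_hat, sigma)) + k1 * e_tilde
        w_dot = [-k2 * w + s * e_tilde for w, s in zip(w_hat, sigma)]
        # forward-Euler step for plant, estimator, and weights
        e += dt * e_dot
        e_hat += dt * e_hat_dot
        w_hat = [w + dt * wd for w, wd in zip(w_hat, w_dot)]
    return history, w_hat


if __name__ == "__main__":
    hist, weights = run_identifier()
    print(max(abs(x) for x in hist), weights)
```

With this update form the cross terms between $\tilde e$ and the weights cancel in the energy $\frac{1}{2}\tilde e^2 + \frac{1}{2}\|\hat W_1\|^2$, so the identification error stays bounded for a bounded $\dot e$, in line with the UUB statement of Theorem 2. The sketch says nothing about how close the weights get to any ideal $W_1^*$.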

B. Adaptive RL-OPT Control Design

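Similarly, based on the universal approximation property of NN [30], the cost function $V^*(e)$ can be represented by a single-layer NN on the set $\Omega$ as

$$
V^*(e) = W_2^{*T}\sigma_2(e, u) + \varepsilon_2
\tag{27}
$$

where $W_2^{*T} \in \mathbb{R}^{1\times N}$ are the ideal unknown weights of the NN cost function estimator, $\sigma_2(e, u)$ is the activation function, $\varepsilon_2$ is the NN reconstruction error, and $W_2^*$ and $\varepsilon_2$ are assumed to be bounded as $\|W_2^*\| \le W_{2M}$ and $\|\varepsilon_2\| \le \varepsilon_{2M}$, respectively [30]. Thereafter, the cost function estimator can be designed as

$$
\hat V(e) = \hat W_2^T\sigma_2(e, u)
\tag{28}
$$

Accordingly, the estimated optimal control policy can be derived based on the two NNs in Eqn. (22) and (28) as

$$
\hat u = -\frac{1}{2}R^{-1}\hat W_u^T\,\nabla\sigma_2(e, u)^T\hat W_2
\tag{29}
$$

where $\nabla\sigma_2(e, u) = \partial\sigma_2(e, u)/\partial e$ is the partial derivative of $\sigma_2(e, u)$ with respect to $e$. Next, substituting Eqn. (22), (28) and (29) into Eqn. (11), the approximated Hamiltonian of the optimal control problem becomes

$$
\begin{aligned}
\hat H(e, \hat u, \hat V) = {} & Q(e) + \frac{1}{4}\hat W_2^T\nabla\sigma_2\hat W_u R^{-1}\hat W_u^T\nabla\sigma_2^T\hat W_2 \\
& + \nabla\sigma_2^T\hat W_2\,[\hat W_1^T(t)\sigma_1(e, u) + k_1\tilde e(t)] \\
& + \frac{1}{4}\hat W_2^T\nabla\sigma_2\nabla\sigma_2^T\hat W_2 + d_{\max}^2
\end{aligned}
\tag{30}
$$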

Because of the impact of system uncertainty and NN reconstruction error, the estimated Hamiltonian cannot hold exactly, i.e., $\hat H(e, \hat u, \hat V) \ne 0$. According to optimal control theory [31], the estimated cost function can converge close to the ideal target if the approximated Hamiltonian approaches the ideal Hamiltonian, i.e., $\hat H(e, \hat u, \hat V) \to H(e, u^*, V^*) = 0$. Inspired by this, the updating law for tuning the NN weight of the cost function estimator can be designed as

$$
\dot{\hat W}_2 = \frac{k_3}{2}\Theta(e, \hat u)\nabla\sigma_2\hat W_u R^{-1}\hat W_u^T J_{1\partial e} - \frac{k_4\,\omega\hat H}{(1 + \omega^T\omega)^2}
\tag{31}
$$

where $k_3$ and $k_4$ are designed control coefficients, $\omega = -[\nabla\sigma_2\hat W_u R^{-1}\hat W_u^T\nabla\sigma_2^T\hat W_2]/2$, and $\Theta(e, u)$ is an index operator given by

$$
\Theta(e, u) =
\begin{cases}
0, & \forall\ \dot J_1 = J_{1\partial e}^T\dot e < 0 \\
1, & \text{otherwise}
\end{cases}
\tag{32}
$$

where $J_1$ is an unbounded Lyapunov candidate and $J_{1\partial e}$ is its partial derivative with respect to $e$. Moreover, $J_{1\partial e}$ can be defined similarly to [29] as

$$
\|\dot e\| \le c_1\|e\| \equiv (c_2\|J_{1\partial e}\|)^{\frac{1}{4}}
\tag{33}
$$

where $c_1$ and $c_2$ are constants. Note that $\|J_{1\partial e}\|$ can be selected to satisfy the general bound, e.g., $J_1 = \frac{1}{5}(e^T e)^{\frac{5}{2}}$.

Theorem 3 (Convergence of the Optimal Control): Consider the nonlinear uncertain system in Eqn. (4) with the control law in Eqn. (29) and the NN weight updating laws in Eqn. (24) and (31). There exist tuning parameters $k_1$ to $k_4$ such that all signals in the closed-loop system, e.g., the tracking error $e$, the NN identifier weight error $\tilde W_1$ and the NN cost function estimator weight error $\tilde W_2$, are guaranteed to be UUB. Moreover, the calculated control input $\hat u$ is proved to approximately approach the optimal control input $u^*$.

Proof: Omitted here; refer to [29] and [33].
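The control law of Eqn. (29) is a simple algebraic map once the two weight estimates are available. The sketch below evaluates it for a scalar error under assumptions that are not from the paper: a hypothetical polynomial basis $\sigma_2(e) = [e^2, e^4]$ that depends on $e$ only, a scalar $\hat W_u$ treated as the identifier's estimate of the input gain, and a scalar $R$.

```python
def grad_sigma2(e):
    """Gradient of the hypothetical basis sigma_2(e) = [e^2, e^4] w.r.t. e."""
    return [2.0 * e, 4.0 * e ** 3]


def estimated_control(e, w2_hat, wu_hat, r):
    """Scalar form of Eqn. (29): u_hat = -(1/2) R^-1 W_u_hat (grad sigma_2)^T W_2_hat."""
    grad = grad_sigma2(e)
    v_e_hat = sum(g * w for g, w in zip(grad, w2_hat))  # estimated dV/de
    return -0.5 * wu_hat * v_e_hat / r


if __name__ == "__main__":
    # e = 0.5 gives grad = [1.0, 0.5]; with W_2_hat = [1.0, 0.5] the estimated
    # value gradient is 1.25, so u_hat = -0.5 * 2.0 * 1.25 / 1.0 = -1.25
    print(estimated_control(0.5, [1.0, 0.5], 2.0, 1.0))
```

The structure mirrors Eqn. (14): the critic weights $\hat W_2$ supply the value gradient and the identifier weights supply the input gain, so no explicit model parameters enter the control computation.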

C. Controller Implementation

The implementation process of the proposed RL-OPT controller is shown in Fig. 3. In addition, the overall procedure of the proposed control method is summarized in Table I. The RL-OPT controller takes the SOC measurement of the ESU and calculates the optimal control input for the BPC in a decentralized manner. Thus, the proposed method can be implemented with minimum communication effort, and the plug & play capability of HESS can be easily realized. In addition, the proposed RL-OPT does not need current measurement units, unlike the conventional PI-based control methods. Even so, the performance of the proposed control method is significantly improved, which will be verified in the later case studies.

Fig. 3. Schematic of the control implementation.

TABLE I
METHODOLOGY OF PROPOSED RL-OPT CONTROLLER

1) Initialize control policy $\hat u(0)$ and NN weights $\hat W_1(0)$, $\hat W_2(0)$
2) while $\tilde e > \varepsilon_1$,
     update $\hat W_1$ using Eqn. (24)
3) Calculate the estimated Hamiltonian $\hat H(e, \hat u, \hat V)$ using Eqn. (30)
4) while $\hat H(e, \hat u, \hat V) \ne 0$,
     update $\hat W_2$ using Eqn. (31)
5) Calculate the optimal control policy $\hat u$ using Eqn. (29)
6) end

V. SIMULATION CASE STUDIES

In this section, the proposed control method for HESS is implemented in a hybrid AC/DC microgrid with the same configuration as the one shown in Fig. 1. The detailed switch-level model is applied in the simulation using the Matlab/Simulink Simscape toolbox with a sampling frequency of 10 kHz. The MOSFET module is used for the bidirectional converter as shown in Fig. 2. The Tustin/Backward Euler method is selected for the discrete solver. Both grid-tied and islanded scenarios are tested under various C&D schemes. The system and controller parameters are given in Table II.

The performance of the proposed control method is compared to the conventional PI-based method presented in [14] for benchmarking. It should be mentioned that conventional PI-based control methods usually take a P & PI double-loop structure to avoid the over-C&D problem, which requires both voltage and current transducers. In contrast, the proposed control method requires only one voltage sensor and achieves much-improved performance, as presented in the following case studies.

TABLE II
SYSTEM PARAMETERS OF THE SIMULATION CASE STUDY

Parameter   Value     Parameter   Value
Cuc         5.7 F     Rpc         6 kΩ
Rsc         0.1 Ω     Cpb         4.7 F
Rpb         5 kΩ      Rsb         0.15 Ω
Uoc         5 V       Cf          20 µF
Rf          0.1 Ω     Lf          4 mH
k1          2         k2          0.1
k3          0.8       k4          1

Fig. 4. Discharging response of UC in islanded mode: (a) discharging current; (b) SOC.

A. Case I. Islanded Mode

In this case, the proposed RL-OPT control method is tested in an islanded microgrid. One DG is used to regulate the DC bus voltage at a constant 48 V. At time 1 s, the UC is discharged from 30% to 29% SOC, and at time 6 s, the LIB is charged from 30% to 31% SOC. The discharging current and SOC of the UC are shown in Fig. 4, and the charging current and SOC of the LIB are plotted in Fig. 5. Additionally, the response of the DC bus voltage is presented in Fig. 6.

As can be observed, the conventional PI-based control method produces large disturbances to the system (blue curves). The sharply changing C&D current and SOC lead to a large disruption of the DC bus, e.g., an overshoot of over 5 V (i.e., >10%). Such disturbances are harmful to the system, especially to sensitive loads and power electronic devices. In contrast, the proposed RL-OPT control method is able to optimize the entire C&D profile (red curves). It can be seen that the C&D currents of the ESU are greatly smoothed, which consequently reduces the voltage overshoot of the DC bus to less than 1 V (i.e., <2%). The responses of the activation weights W1 are shown in Fig. 7, where Fig. 7(a) is the weight of the UC controller and Fig. 7(b) is the weight of the LIB controller. As the weights converge to their desired targets, the optimal control is achieved.
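The percentages quoted above can be checked with a short calculation. This is a minimal sketch, assuming the overshoot is expressed relative to the 48 V bus voltage of this case; the reference value is not stated explicitly, and the function names are hypothetical.

```python
V_BUS_NOMINAL = 48.0  # V, DC bus voltage regulated by the DG in Case I


def overshoot_percent(v_surge: float, v_nominal: float = V_BUS_NOMINAL) -> float:
    """Express a voltage overshoot in volts as a percentage of the bus voltage."""
    return 100.0 * v_surge / v_nominal


def surge_for_percent(percent: float, v_nominal: float = V_BUS_NOMINAL) -> float:
    """Overshoot in volts that corresponds to a given percentage."""
    return percent / 100.0 * v_nominal


conventional_pct = overshoot_percent(5.0)  # 5 V overshoot, about 10.4 %
two_percent_volts = surge_for_percent(2.0)  # 2 % of 48 V is 0.96 V
```

Under this assumption, an overshoot of 5 V is about 10.4% of the bus voltage, and the 2% bound corresponds to roughly 0.96 V.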

B. Case II. Grid-Tied Mode

In Case II, the proposed RL-NN controller is tested in a grid-tied microgrid. The simulation setting is the same as that of Case I, except that the DC bus voltage is maintained by the external grid through a VSC that can be considered as an infinite source. Similarly, the C&D current and SOC of the UC and LIB are presented in Fig. 8 and Fig. 9, respectively. The response of the DC bus voltage is shown in Fig. 10. The responses of the activation weights W1 are shown in Fig. 11, where Fig. 11(a) is the weight of the UC controller and Fig. 11(b) is the weight of the LIB controller. As the weights converge to their desired targets, the optimal control

Fig. 5. Charging response of LIB in islanded mode: (a) charging current; (b) SOC.

Fig. 6. DC bus voltage response in islanded mode.

Fig. 7. Activation weights of NN in islanded mode: (a) W1 of UC; (b) W1 of LIB.

TABLE III
SIMULATION RESULTS COMPARISON

                      Islanded mode              Grid-tied mode
Method                Vsurge           Isurge    Vsurge           Isurge
Conventional method   5.16 V (10.7%)   20.05 A   1.35 V (2.81%)   20.05 A
Proposed RL method    0.85 V (1.78%)   4.74 A    0.82 V (1.70%)   4.74 A

Fig. 8. Discharging response of UC in grid-tied mode: (a) discharging current; (b) SOC.

is achieved. As can be seen, the C&D profile as well as the activation weights of the UC and LIB in grid-tied mode are almost the same as in islanded mode. However, the disturbance on the DC bus voltage is significantly reduced because the main grid can provide more stable voltage support than the DG. Since the proposed control design has already achieved the optimization in islanded mode, little further improvement of its own response can be observed in grid-tied mode. Nevertheless, the performance of the proposed RL-NN controller is still better than that of the conventional PI-based controller in terms of C&D currents and unexpected disturbances. In addition, a detailed comparison between the conventional control method and the proposed method is given in Table III.

VI. EXPERIMENTAL CASE STUDIES

In this section, the developed RL-OPT controller is evaluated through HIL experiments. Hardware experiments differ considerably from software simulation, since many practical problems may appear, e.g., communication delay and measurement noise. Therefore, it is meaningful to prove the effectiveness of the developed controller in a physical system and to promote the maturity of the corresponding application. The topology of the tested system is shown in Fig. 12, and the laboratory setup of the HIL testbed is presented in Fig. 13. The major system parameters are given in Table IV. It should be mentioned that different component parameters but the same control parameters are used in the software simulation and the HIL experiments, in order to demonstrate the adaptivity of the proposed RL control method. Basically, a DC power

Fig. 9. Charging response of LIB in grid-tied mode: (a) charging current; (b) SOC.

Fig. 10. DC bus voltage response in grid-tied mode.

Fig. 11. Activation weights of NN in grid-tied mode: (a) W1 of UC; (b) W1 of LIB.

TABLE IV
SYSTEM PARAMETERS OF HIL CASE STUDIES

Parameter   Value     Parameter   Value
Cuc         6.0 F     Lf          2 mH
Cf          47 µF     Rload       20 Ω

Fig. 12. Topology configuration of tested system.

supply connected with a boost converter is applied to maintain the DC bus voltage at 20 V and to supply power to the normal loads. A Maxwell UC (BMOD0006-E160-B02) is connected to the DC bus via a DC/DC buck converter. The HIL platform dSPACE MicroLabBox (DS1202) is employed to interface the microgrid and the proposed RL-OPT controller, which is implemented in the host PC. Variables are measured by the ADC I/Os, and the resulting switching signals are sent by the DAC I/Os of the DS1202 in real time. For safety, the experiment is designed to charge the UC from 9 V to 11 V. It is noteworthy that, except for certain necessary system settings, the controller parameters are set exactly the same as in the simulation case studies without further tuning effort, which helps to evaluate the scalability of the proposed controller.
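To give a sense of scale for this charging test, the energy delivered to the UC can be estimated from the capacitor energy relation E = ½CV². The sketch below is a rough estimate only. It assumes an ideal capacitor with Cuc = 6.0 F from Table IV and neglects the internal resistances of the UC and the converter losses. The function name is hypothetical.

```python
def capacitor_energy_change(c_farad: float, v_start: float, v_end: float) -> float:
    """Energy in joules absorbed by an ideal capacitor charged from v_start to v_end."""
    return 0.5 * c_farad * (v_end ** 2 - v_start ** 2)


C_UC = 6.0  # F, from Table IV
delta_energy = capacitor_energy_change(C_UC, v_start=9.0, v_end=11.0)
# 0.5 * 6.0 * (121 - 81) = 120 J
```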

The experimental results are presented in Fig. 14, where the green line denotes the DC bus voltage, the blue line denotes the voltage of the UC, and the red line denotes the charging current. It can be observed from Fig. 14 that the entire charging process is smooth. Except for the normal harmonics introduced by the switching devices and surrounding electromagnetic interference, the charging process of the UC barely produces any disturbance to the common bus voltage. In addition, the system mismatch between software simulation and hardware experiment has been well handled by the RL-based method, which is a significant merit of the proposed method. In conclusion, the effectiveness of the proposed RL-OPT control method for HESS is demonstrated.

VII. CONCLUSION

Conventional control methods for HESS result in significant disturbances to microgrids during the C&D process. In addition, conventional control methods for HESS are usually designed

Fig. 13. Laboratory setup of the HIL testbed.

Fig. 14. HIL experiment results: DC bus voltage, UC voltage, and UC charging current.

based on known system parameters, which may be unavailable or may fluctuate in practice. In this work, a novel RL-OPT controller is developed to provide smooth C&D control for HESS in microgrids with unknown system parameters. First, the optimal control design for an uncertain nonlinear system is formulated. Then, one NN is designed to learn the system dynamics based on the input/output data. Next, another NN is developed to learn the optimal control input for the system through online RL. The effectiveness of the proposed method is evaluated through extensive software simulations and HIL experiments. In the future, other types of HESS, e.g., those including flywheels and flow batteries, can be considered. The upper-level energy management system can also be incorporated into the control scheme design.

REFERENCES

[1] J. Duan, C. Wang, H. Xu, and W. Liu, “Distributed control of inverter-interfaced microgrids based on consensus algorithm with improved transient performance,” IEEE Trans. on Smart Grid, pp. 1–1, 2018.

[2] Z. Yi and A. H. Etemadi, “Line-to-line fault detection for photovoltaic arrays based on multiresolution signal decomposition and two-stage support vector machine,” IEEE Trans. on Ind. Electron., vol. 64, no. 11, pp. 8546–8556, Nov 2017.

[3] D. Shi, X. Chen, Z. Wang, X. Zhang, Z. Yu, X. Wang, and D. Bian, “A distributed cooperative control framework for synchronized reconnection of a multi-bus microgrid,” IEEE Trans. on Smart Grid, pp. 1–1, 2018.

[4] B. P. Roberts and C. Sandberg, “The role of energy storage in development of smart grids,” Proc. IEEE, vol. 99, no. 6, pp. 1139–1144, June 2011.

[5] Y. Chen, Y. Wang, D. Kirschen, and B. Zhang, “Model-free renewable scenario generation using generative adversarial networks,” IEEE Trans. on Power Syst., vol. 33, no. 3, pp. 3265–3275, May 2018.

[6] Y. Xiang, L. Wang, and N. Liu, “A robustness-oriented power grid operation strategy considering attacks,” IEEE Trans. on Smart Grid, vol. 9, no. 5, pp. 4248–4261, Sept 2018.

[7] C. A. Silva-Monroy and J. Watson, “Integrating energy storage devices into market management systems,” Proc. IEEE, vol. 102, no. 7, pp. 1084–1093, July 2014.

[8] M. Farhadi and O. Mohammed, “Energy storage technologies for high-power applications,” IEEE Trans. on Ind. Appl., vol. 52, no. 3, pp. 1953–1961, May 2016.

[9] B. G. Carkhuff, P. A. Demirev, and R. Srinivasan, “Impedance-based battery management system for safety monitoring of lithium-ion batteries,” IEEE Trans. on Ind. Electron., vol. 65, no. 8, pp. 6497–6504, Aug 2018.

[10] E. Manla, G. Mandic, and A. Nasiri, “Development of an electrical model for lithium-ion ultracapacitors,” IEEE J. of Emerg. and Sel. Topics in Power Electron., vol. 3, no. 2, pp. 395–404, June 2015.

[11] Z. Yi, W. Dong, and A. H. Etemadi, “A unified control and power management scheme for PV-battery-based hybrid microgrids for both grid-connected and islanded modes,” IEEE Trans. on Smart Grid, vol. PP, no. 99, pp. 1–1, 2017.

[12] D. E. Olivares, A. Mehrizi-Sani, A. H. Etemadi, C. A. Cañizares, R. Iravani, M. Kazerani, A. H. Hajimiragha, O. Gomis-Bellmunt, M. Saeedifard, R. Palma-Behnke, G. A. Jiménez-Estévez, and N. D. Hatziargyriou, “Trends in microgrid control,” IEEE Trans. on Smart Grid, vol. 5, no. 4, pp. 1905–1919, July 2014.

[13] J. Duan, C. Wang, and H. Xu, “Distributed control of inverter-interfaced microgrids with bounded transient line currents,” IEEE Trans. on Ind. Informat., vol. 14, no. 5, pp. 2052–2061, May 2018.

[14] W. Im, C. Wang, L. Tan, and W. Liu, “Cooperative controls for pulsed power load accommodation in a shipboard power system,” IEEE Trans. on Power Syst., vol. 31, no. 6, pp. 5181–5189, Nov 2016.

[15] A. Ortega and F. Milano, “Generalized model of VSC-based energy storage systems for transient stability analysis,” IEEE Trans. on Power Syst., vol. 31, no. 5, pp. 3369–3380, Sept 2016.

[16] S. Negarestani, M. Fotuhi-Firuzabad, M. Rastegar, and A. Rajabi-Ghahnavieh, “Optimal sizing of storage system in a fast charging station for plug-in hybrid electric vehicles,” IEEE Trans. on Transport. Electrific., vol. 2, no. 4, pp. 443–453, Dec 2016.

[17] T. Kovaltchouk, A. Blavette, J. Aubry, H. B. Ahmed, and B. Multon, “Comparison between centralized and decentralized storage energy management for direct wave energy converter farm,” IEEE Trans. on Energy Convers., vol. 31, no. 3, pp. 1051–1058, Sept 2016.

[18] V. Vu, D. Tran, and W. Choi, “Implementation of the constant current and constant voltage charge of inductive power transfer systems with the double-sided LCC compensation topology for electric vehicle battery charge applications,” IEEE Trans. on Power Electron., vol. 33, no. 9, pp. 7398–7410, Sept 2018.

[19] X. Zheng, X. Liu, Y. He, and G. Zeng, “Active vehicle battery equalization scheme in the condition of constant-voltage/current charging and discharging,” IEEE Trans. on Veh. Technol., vol. 66, no. 5, pp. 3714–3723, May 2017.

[20] X. Feng, J. Hu, Y. Tao, H. Liu, and D. Liu, “Research of off-grid energy storage converter based on repetitive control and PI control,” in 2016 China International Conference on Electricity Distribution (CICED), Aug 2016, pp. 1–4.

[21] C. Lin, D. Deng, C. Kuo, and Y. Liang, “Optimal charging control of energy storage and electric vehicle of an individual in the internet of energy with energy trading,” IEEE Trans. on Ind. Informat., vol. 14, no. 6, pp. 2570–2578, June 2018.

[22] L. Wang, F. Bai, R. Yan, and T. K. Saha, “Real-time coordinated voltage control of PV inverters and energy storage for weak networks with high PV penetration,” IEEE Trans. on Power Syst., vol. 33, no. 3, pp. 3383–3395, May 2018.

[23] L. Meng, T. Dragicevic, and J. M. Guerrero, “Adaptive control design for autonomous operation of multiple energy storage systems in power smoothing applications,” IEEE Trans. on Ind. Electron., vol. 65, no. 8, pp. 6612–6624, Aug 2018.

[24] T. Ma, M. H. Cintuglu, and O. A. Mohammed, “Control of a hybrid AC/DC microgrid involving energy storage and pulsed loads,” IEEE Trans. on Ind. Appl., vol. 53, no. 1, pp. 567–575, Jan 2017.

[25] F. Zhu, Z. Yang, H. Xia, and F. Lin, “Hierarchical control and full-range dynamic performance optimization of the supercapacitor energy storage system in urban railway,” IEEE Trans. on Ind. Electron., vol. 65, no. 8, pp. 6646–6656, Aug 2018.

[26] W. Zhang, W. Liu, X. Wang, L. Liu, and F. Ferrese, “Distributed multiple agent system based online optimal reactive power control for smart grids,” IEEE Trans. on Smart Grid, vol. 5, no. 5, pp. 2421–2431, 2014.

[27] R.-C. Roman, R.-E. Precup, and R.-C. David, “Second order intelligent proportional-integral fuzzy control of twin rotor aerodynamic systems,” Procedia Computer Science, vol. 139, pp. 372–380, 2018.

[28] S. Vrkalovic, E.-C. Lunca, and I.-D. Borlea, “Model-free sliding mode and fuzzy controllers for reverse osmosis desalination plants,” Int. J. Artif. Intell., vol. 16, pp. 208–222, 2018.

[29] Y. Huang, “Optimal guaranteed cost control of uncertain non-linear systems using adaptive dynamic programming with concurrent learning,” IET Control Theory & Applications, vol. 12, no. 8, pp. 1025–1035, 2018.

[30] F. Lewis, S. Jagannathan, and A. Yesildirak, Neural network control of robot manipulators and non-linear systems. CRC Press, 1998.

[31] F. L. Lewis, D. Vrabie, and V. L. Syrmos, Optimal control. John Wiley & Sons, 2012.

[32] T. Dierks and S. Jagannathan, “Optimal control of affine nonlinear continuous-time systems,” in American Control Conference (ACC), 2010. IEEE, 2010, pp. 1568–1573.

[33] H. Xu and S. Jagannathan, “Stochastic optimal controller design for uncertain nonlinear networked control system via neuro dynamic programming,” IEEE Trans. on Neural Netw. Learn. Syst., vol. 24, no. 3, pp. 471–484, March 2013.