Benchmark of Machine Learning Algorithms on Capturing ...yweng2/papers/2019IET.pdf · Benchmark of...

IET Research Journals

Benchmark of Machine Learning Algorithmson Capturing Future Distribution NetworkAnomalies

ISSN 1751-8644doi: 0000000000www.ietdl.org

Mostafa Mohammadpourfard1, Yang Weng 2∗, Mohsen Tajdinian3

1Department of Electrical and Computer Engineering, Sahand University of Technology, Tabriz, Iran2School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, USA3School of Electrical and Computer Engineering, Shiraz University, Shiraz, Iran* E-mail: [email protected]

Abstract: The conventional distribution network is undergoing structural changes and becoming an active grid due to the advent ofsmart grid technologies encompassing distributed energy resources (DERs), aggregated demand response and electric vehicles(EVs). This establishes a need for state estimation (SE)-based tools and real-time monitoring of the distribution grid to correctlyapply active controls. Although such new tools may be vulnerable to cyber-attack, the cyber-security of distribution grid has notreceived enough attention. As smart distribution grid intensively relies on communication infrastructures, we assume in this paperthat an attacker can compromise the communication and successfully conduct attacks against crucial functions of distributionmanagement system, making the distribution system prone to instability boundaries for collapses. We formulate attack detectionproblem in the distribution grid as a statistical learning problem and demonstrate a comprehensive benchmark of statistical learningmethods on various IEEE distribution test systems. The proposed learning algorithms are tested using various attack scenarioswhich include distinct features of modern distribution grid such as integration of DERs and EVs. Furthermore, the interactionbetween the transmission and distribution systems and its effect on the attack detection problem is investigated. Simulation resultsshow attack detection is more challenging in the distribution grid.

1 Introduction

Due to the strong dependence of smart grid functions on informa-tion and communication technology (ICT), the number of possiblecyber-attacks has been increased [1, 2]. These attacks can reducethe reliability of smart grids and cause severe operational failuresand substantial financial loss [3–5]. Therefore, the cyber-security ofsmart grid has been highlighted and become a significant concernamong power researchers. Among identified cyber-attacks, the mostcritical one is the false data injection (FDI) which make transmis-sion system state estimation (SE) inaccurate [3]. Wrong estimatescan lead to wrong supervisory decision makings which can leadto catastrophic consequences such as blackouts [6]. The adver-sary orchestrates these attacks by altering readings of smart metersto inject arbitrary errors to state estimates without being detectedby bad data detection (BDD) methods [7]. Therefore, detection ofFDI attacks is essential for ensuring the overall reliability of smartgrids. To this end, several mitigations and detection methods havebeen proposed to protect the system operation and control againstFDI attacks. They include protection-based approaches [8, 9] anddetection-based approaches [6, 10].

While significant numbers of research works have been conductedon the cybersecurity of the transmission grid, focusing on the cyber-security of the smart distribution grid is in its early stage. Hence, byidentifying this important gap, this issue is set as the focus of thispaper. With more and more intelligent sensors and pervasive elec-tronic automation devices deployed into the smart distribution grid,the distribution system is subject to high risks of cyber-attacks likethe transmission grid. Conventional distribution networks were pas-sively designed to deliver energy efficiently and reliably from thetransmission grid to the end users. However, due to shifting towardsthe smart grid and with the advent of distributed generations (DGs)and electric vehicles (EVs), distribution grid is undergoing structuralchanges [11, 12] such as bi-directional power flow (flowing backfrom customers to the distribution grid), uncertainty and dynamic

variation of system load profiles, voltage stability problems, and dis-tribution system operating at stability boundaries are also troublingthe distribution grid operation.

To meet the changes in technology, reliable and real-time moni-toring of the distribution system is needed as it is critical to verifythe security state of the system which makes the role of distribu-tion system SE more significant [13, 14]. SE is well established inthe transmission grid but is currently under development for the dis-tribution grid due to lack of communication infrastructure, and thefact that most distribution systems were not monitored in the past[12, 13]. Recently, attempts have been made to develop SE meth-ods to estimate the state of the network, in terms of node voltages orbranch currents for distribution grid [15].

However, the integrity of state estimation is under mountingthreat, and they are vulnerable to cyber-attacks specifically false datainjection attacks [7]. Deng et al. [16] extend FDI attacks against stateestimation in transmission systems to distribution feeders. Theseattacks can cause severe damages to the distribution systems. Anadversary can, for example, launch a cyber-attack to manipulatemeasurements to disrupt a critical component of the grid such as acircuit breaker to cause a blackout[17]. Therefore, malicious attackdetection is the essential step for preventing or minimizing the dam-ages resulting from the FDI in the smart distribution grid. On theother side, attack detection problem in the distribution grid is dif-ferent from the transmission grid. This is because these networksdiffer from one another in many ways, such as high resistance toreactance (r/x) ratios, radial network topology, and increasing vari-able and less predictable load profiles due to complex interactions ofDERs and EVs, etc [14, 15]. Fig. 1 shows how these unique char-acteristics of the distribution grid make the attack detection morechallenging in this network. For example, the high r/x ratios andunbalanced loads separated by short distances lead to having differ-ent and broader valid measurement ranges. This means a change inthe measurement which should be detected as an attack in the trans-mission grid can be a normal one in the distribution grid.

In this paper, we focus on FDI attack detection problem in

IET Research Journals, pp. 1–11© The Institution of Engineering and Technology 2015 1

Fig.1: New features of the smart distribution grid

the smart distribution system using statistical learning techniques.Although a different number of methods have been proposed forFDI attack detection in the power systems, to the best of the author'sknowledge, this is the first paper that utilizes machine learning meth-ods to detect data integrity attacks in the distribution grid withconsidering its modern characteristics and for the first time proposesa benchmark of those methods. More specifically, this paper pro-vides insight into how to use machine learning algorithms to detectanomalies in the modern distribution network and their capabilitiesin this field. The contributions of this paper are as follow:1. A detailed analysis of supervised learning techniques in identify-

ing anomalies is conducted. Then, we propose ensemble methodsto model distribution grid data and predict FDI attacks andevaluate their performance.

2. Effects of FDI attacks on essential measurement vectors of thedistribution grid such as voltage magnitudes and line currents areanalyzed and compared.

3. The capability of statistical learning algorithms in dealing withthe challenging features of the modern distribution grid such asDERs and EVs is thoroughly analyzed and compared to eachother.

4. Integration of transmission and distribution networks and theinfluences of different possible events in the transmission gridsuch as contingencies on attack detection problem in the distribu-tion grid is investigated.The rest of the paper is as follows. Section II formulates the attack

detection problem as a statistical classification problem. Statisticallearning methods are described in Section III. In Section IV, wepresent our experimental results. Finally, we conclude the paper inSection V.

2 Problem Formulation

As more measurements become available in future distribution grid,SE will be used for distribution system monitoring as widely astransmission systems. Distribution system state estimation is doneto estimate the state of the network, in terms of node voltages orbranch currents for distribution grid [15]. For example, references[13, 15] have tried to adopt weighted least squares (WLS) for thedistribution systems. For an N -bus distribution system using an ACstate estimation, the system model to study the attack detection isdefined as:

z = h(x) + e, (1)

where z ∈ Rm×1 is the measurement vector, x ∈ Rn×1 is the vectorof the state variables [18]. h(·) : Rn×1 → Rm×1 is a nonlinear vec-tor function between measurements and distrbution system states.e ∈ Rm×1 is the measurement error vector. The estimated systemstate x are obtained by minimizing the WLS criterion. The estima-tion process is followed by a BDD method based on 2-Norm ofmeasurement residual to detect the presence of bad measurements

‖r‖ = ‖z− h(x)‖ ≥ τ , where τ is a predefined detection thresh-old [18]. To mislead the power grid control algorithms, the attackerneeds to inject a nonzero attack vector a to the original measure-ment vector za = z + a without being detected by the BDD. Themalicious measurement za could bypass the BDD detector and notlead to a change in the residual value under the condition a =h(x + c)− h(x), where c = [c1, · · · , cn] is the maliciously injectederror on the system state [19]. After false data injection, distributionsystem state estimation will get an erroneous system state [16]:

xa = x + c. (2)

To perform FDI attacks, we make assumption that the attackercan somehow compromise the communication infrastructure of aregion and a subset of the sensor readings and will finally be ableto introduce arbitrary error into some critical measurements in thedistribution grid such as voltage magnitude |V | and line currentbetween bus i and bus j |Iij |. This means it is assumed that vectorc could be injected to voltage magnitude of buses [V1, · · · , Vn] andbranch currents [I1, · · · , Inbr] where nbr is the number of branches.It is noteworthy that bus 1 is a reference bus, so it is fixed as aconstant with a unit magnitude which causes the voltage magni-tude vector become [V2, · · · , Vn]. Therefore, the attack model canbe described as:

Va =

V2 + c2...

Vn + cn

; Ia =

I1 + c1...

Inbr + cnbr

. (3)

The attack detection problem is to find the corrupted vector. Givena set of samples S and labels Y (normal versus tampered), machinelearning algorithms try to learn the latent relationship between thesamples and label to produce a function S −→ Y for classifyingthe measurements in two groups, secure and attacked. Therefore, theattack detection problem using statistical learning algorithms can bedefined as a binary classification problem:

yi =

{0, if c = 0

1, if c 6= 1, (4)

where yi = 0 shows that there is no attack and yi = 1 means the ithmeasurement is corrupted.

It is assumed that for each targeted state, various injectionamounts, e.g., 90%, 110%, of the true value are simulated. Forexample, 90% means that the manipulated variable by the attackeris 10% smaller than the original value. This means vector c canhave the following values c = ±[0.1× V2, · · · 0.1×, Vn] and c =±[0.1× I1, · · · , 0.1× Inbr].

IET Research Journals, pp. 1–112 © The Institution of Engineering and Technology 2015

(a) 18-bus (b) Bayesian network

Fig. 2: Bayesian network for IEEE 18-bus: Line currents

3 Attack Identification Using Machine LearningMethods

3.1 Supervised Learning Algorithms

To classify the measurements into normal and attack, five learningalgorithms including Bayesian Network (BN) [20], Support VectorMachine (SVM) [21], K-Nearest Neighbor (KNN) [22], C4.5 deci-sion tree [23] and Multilayer Perceptron (MLP) [24] are employed.We choose BN since it is a powerful inference method and representsthe probabilistic graphical models (PGMs). PGMs are widely usedin power system area to model the probabilistic dependencies amongdifferent measurements [10, 11]. The assumption is that the attackwill lead to a different graphical model [10]. SVM is used since ithas a good generalization ability and has been shown to be effectivein discriminating normal samples and attack ones [25]. KNN is cho-sen due to its lazy learning efficiency. C4.5 is an extended versionof ID3 algorithm [23] and represents decision tree-based algorithms.MLP is a type of neural network which is commonly used in dif-ferent applications and is able to discriminate non-linearly separableclasses.

1) BN Classfication: A Bayesian network (BN) is a graphi-cal model (a directed acyclic graph (DAG)) that represents a jointprobability distribution over a set of variables [20]. BN provides amapping from a sample si to the posterior probability of belongingto the attack class yi = 1, P (yi = 1|si). The posterior probabilityof the presence of an attack in terms of prior probabilities and thereverse conditional probability can be obtained as follows:

P (yi = 1|si) =P (si|yi = 1)P (yi = 1)

P (si). (5)

To learn a BN, the structure of the DAG needs to be determined.The search space to build the structure includes all of the possiblestructures of DAGs based on the input variables. To enumerate allof the possible DAGs, reference [26] proposed a heuristic searchalgorithm called K2 which searches for the most probable Bayesiannetwork structure. Fig. 2b shows an example of built DAG for 18-busnetwork when the input variables are line currents. This test systemis shown in Fig. 2a. Nodes of this graph are corresponding to linecurrents. For example, i4-5 means the line current between bus 4 andbus 5. Class Label represents the label of the samples. Test systemsare discussed in detail in section IV.

2) SVM: Given a set of N training samples S = {(s1, y1), · · · ,(sN , yN )}, SVM seeks a hyperplane that separates the attackedand normal measurements [21]. To this end, SVM minimizes thefollowing cost function which tries to maximize the margin (lin-ear separation of data) and minimize the error (penalization of

misclassified samples):

minw,b,ξ

1

2wTw + C

N∑i=1

ξi,

s.t. yi(wTφ(si) + b) ≥ 1− ξi,

ξi ≥ 0, i = 1, 2, · · · , N,

(6)

where C is the penalty parameter, ξi is slack variable showing thenon-separability of data,w is the weight vector of the SVM network,b is the bias term of the SVM network.

SVM can manage the nonlinear relationships between class labelsand attributes using the kernel functionK(si, sj) ≡ φT (si) · φ(sj),where φ(s) is a nonlinear transformation which maps training sam-ple si into a higher dimensional space, where SVM recognizes thesolution. In this paper, the kernel radial basis function is used to sep-arate the solution sets which are not linearly separable and is definedas:

K(si, sj) = exp(−γ‖si − sj‖2), γ > 0, (7)

where γ adjusts the smoothing of the discriminant function.

3) KNN: KNN is a lazy method which does not construct any clas-sification model in advance. It labels a new observation accordingto the labels of predefined (k-nearest) number of training samplesclosest in distance to the new point [22]. The distance metric usedin this paper is Euclidean distance. Thus, first the set of KNNs ofthe new observation s′i is constructed and then the most frequentlyobserved class label in the set is defined through a simple majorityvote. Afterward, the s′i is labeled as the most frequent class label.

4) C4.5: The goal of a decision tree is extracting predictive infor-mation in the form of decision rules inferred from the trainingsamples S. Then, these rules are used to predict the value of a targetvariable of the new sample s′i. In this paper, the C4.5 algorithm isused to construct the decision tree. In this approach, at each node ofthe tree, an attribute is selected to split the samples based on infor-mation gain [23]. Assume that Fr(yi, S) is the frequency of samplesbelonging to class yi. The entropy (E) of S which is is a measure ofthe amount of uncertainty in dataset is computed as follows [22]:

E(S) = −k∑i=1

(Fr(yi, S)

|S| · log2(Fr(yi, S)

|S| )), (8)

where k is the number of distinct classes, |S| is the total number ofsamples in S. After computation of E(S), S is divided into n num-ber of outcomes regarding an attribute say x. Therefore,Ex(S) is theweighted sum of all the individual entropies of subsets of samples(si). Then, the final entropy is calculated as follows [22]:

Ex(S) = −n∑i=1

(|si||S|E(si)). (9)

Information gain (IG) is the measure of the difference in entropyfrom before to after the set S is partitioned on an attribute x and isequal to IG(x) = E(s)− Ex(S). IG(x) shows how much uncer-tainty in dataset is reduced after partitioning data on attribute x andis calculated for all attributes. The best attribute to split S is theattribute with the greatest IG. This means that attribute becomes theparent node of the tree. The child nodes are created in a similar wayuntil all the entries are classified to a single output class.

5) MLP: Given a set of N samples and the class label,{(s1, y1), · · · , (sN , yN )}, MLP can learn a nonlinear functionapproximator for classifying the measurements into normal andattacked [24]. MLP is a feed-forward artificial neural network con-sists of different interconnected layers: an input, an output layer andone or more hidden layers. Except the input nodes, other nodes areneurons which have usually a nonlinear sigmoid activation function.


One widely used activation function which ranges from 0 to 1 is asfollows:

φ(yj) = (1 + e−vi)−1, (10)

where vi is the weighted sum of the inputs of the ith node and yi isthe output of the node. For learning, MLP utilizes a technique calledbackpropagation which is a generalization of the least mean squaresalgorithm in the linear perceptron (LP). Unlike LP, MLP can managenonlinearly separable datasets.

3.2 Ensemble Learning Methods

To make good predictions, classification methods search a hypothe-sis space for finding a proper hypothesis. Finding a good hypothesismight be difficult. Ensemble techniques combine a collection ofhypotheses to build a (hopefully) better hypothesis [27]. Output ofensemble methods is a single hypothesis which it might not be withinthe hypothesis space of the models from which it is trained. Thismeans that ensemble techniques can have more flexibility regard-ing the represented functions. However, that flexibility might lead tooverfitting. A supervised learning hypothesis method is said to over-fit if it is more accurate in fitting the training data and less accuratein predicting the class label of the test data. In the following, ensem-ble learning techniques such as Adaptive Boosting and stacking areexplained.

1) Adaptive Boosting (AdaBoost): AdaBoost could be used toboost the performance of any classification algorithm [28]. Adaboostworks by building multiple learning models in a sequence. The firstmodel is built by fitting a classifier on the original dataset in theusual way. Then, the second model is built by fitting another copyof the classifier on the same dataset with focus on samples that weremisclassified in the first model. Then, the third copy of the classifieris trained in such a way that focuses on previous model’s errors.This means in each iteration, the weights of misclassified recordsare adjusted such that consequent classifiers focus more on difficultinstances which were incorrectly classified. A boosted classifier is inthe following form:

BT (si) =

T∑t=1

bt(si), (11)

where each bt is the base learner which takes the input sample si andreturns its class yi and T shows the number of iterations. For eachsample si, the base learner generates an output hypothesis h(si).At each iteration, a coefficient αt is attributed to the base learner insuch a way that the sum of training error of the resulting t-stage Mt

is minimized:

Mt =∑i=1

M [Bt−1(si) + αth(si)], (12)

where Bt−1(si) represents the boosted classifier which has beendeveloped up to precedent stage. M(B) is the error function, andbt(si) = αth(si) is the base learner which will be added to the final

classifier. At each iteration t, a weight wt equal to M(Bt−1(si)) isassigned to each training sample si which will be used to inform thetraining of the base learner. In this paper, each learning algorithmis combined with Adaboost technique with the number of iterations10 and 100. However, we observed that there is not much differencebetween them so that results of the 100 one are presented in thepaper.

2) Stacking: Stacking uses a meta-learner to combine the predic-tions of a collection of classifiers [29]. To this end, first differentlearning algorithms are trained on the original data. Then, predic-tions of those classifiers are fed into a meta-classifier to combinetheir output and make a final prediction. Logistic regression isusually used as the combiner.

3.3 Performance Evaluation

To validate the statistical learning algorithms, F-measure (FM)and detection rate (DR) are derived from the confusion matrix.F-measure is defined as follows [27]:

FM = (2× Pr ×RePr +Re

), (13)

where Pr is the precision, Re is the recall and they are computed asfollows:

Pr = (TP

TP + FP) Re = (

TP

TP + FN), (14)

where true positive (TP) is the number of attack samples correctlydetected, false positive (FP) is the number of incorrectly detectedattacks, true negative (TN) is the number of truly identified normalsamples, and false negative (FN) is the number of missed attacks.F-measure value 1 indicates the best performance. DR is the numberof attacks detected by the method divided by the total number ofattacks in the dataset.

4 Benchmark Results

In this section, the performance of the machine learning methodsare analyzed using different case studies. First, we launch attacks ineach system using the formulation mentioned in Section II and traineach learning algorithm in Case I. In this case, it is assumed thatthere is no DERs, EVs and also the effect of the transmission grid isignored. In the next case studies, we use the trained statistical mod-els to evaluate the robustness of the algorithms in managing uniquecharacteristics of the distribution grid and also integrated power sys-tem. It is noteworthy that a grid search [30] method is employed tosearch the parameters of the learning algorithms.

4.1 Data Preparation

To complete simulations, different IEEE systems such as 18-bus,33-bus, and 123-bus distribution systems are used. In each network,feeder bus is selected as the slack bus. The historical data have been

Fig. 3: IEEE distribution test networks


(a) 18-bus (b) 33-bus (c) 123-busFig. 4: Normalized 3-months power profile for a bus in each system

preprocessed by MATPOWER [31] and DIgSILENT [32]. To simu-late the power system behavior in a more realistic pattern, two typesof time-varying load models are considered: 1) time-varying resi-dential load and 2) time-varying commercial load. The load data isadapted from Open Energy Information (OpenEI) [33] and are com-bined in the test systems as shown in Fig. 2a and Fig. 3. The recordedload data in OpenEI is hourly real power consumption of commercialand residential building types. Therefore, since reactive powers arenot available, we simulate reactive power qi at bus i with randompower factor: qi(t) = pi(t)

√1− pfi(t)2/pfi(t), where pfi(t) ∼

Unif(0.85, 0.95). For example, Fig. 4 shows a dataset from OpenEIfor each utilized network.

To obtain measurements at time t, i.e., |Vi(t)| and |Iij(t)|, forsimulations, we run a power flow based on the power profile above.Therefore, time-series data are gained by repeatedly running thepower flow to generate hourly data over three months. Totally, T =2160 measurements are obtained as normal samples.

4.2 Case Study I - Detecting Attacks in the Networkswithout any DERs and etc.

In this case, we have assumed that the system works well exceptdays 62 to 66. This means the measurements of 5 days are com-pletely replaced by the attacked ones. Therefore, there are 120 attacksamples for each attacking scenarios with incremental/decrementalattack cost and overall 240 attack samples for each measurement. Tobuild machine learning models, we tested the learning algorithms onthe generated datasets. The models were built using 10-fold crossvalidation. Fig. 5 summarizes the test results for this case study. Themore detailed results are presented in the appendix. We observedthat most of the algorithms have good F-measure for all networks.However, for voltage magnitudes, SVM, MLP and stacking tech-nique have the F-measure 1 for all networks. For the branch currents,it is observed that the stacking approach provides better F-measureby decreasing false positive (FP) rate and increasing detection rate.Moreover, for this measurement, we observed that the normal sam-ples which are incorrectly labeled as attacked variables by methods,are increased. This is because the variance of changes in the branchcurrents is increased and it is difficult for algorithms to draw aboundary line between secure and attack measurements.

4.3 Case Study II - Testing the Trained Models to DetectAttacks after Integration of DERs.

Expansion of distribution grid by adopting DERs has attracted muchattention due to increased energy demand, economic and environ-mental benefits [11]. However, integration of DERs could causesignificant uncertainty and variability[34]. In this case, we have ana-lyzed the robustness of the built machine learning models in dealing

with the integration of DERs into the network through injecting falsedata after that integration. To this end, it is assumed that the sys-tem works normally until day 20. On the 21st day, solar panels areselected as the source of renewable energy to be added to networkswith a fixed penetration level (40% of the peak load). The hourlypower generation profile is obtained using PVWatts Calculator, anapplication developed by the National Renewable Energy Labora-tory (NREL) [35]. Power generation of a photovoltaic system inPVWatts, is estimated based on physical parameters and weather.The hourly data are computed based on the weather history of CasaGrande, AZ, USA.

We randomly choose a bus as the location of PV. The renew-able power generation is modeled as a negative load. Attacks areconducted on days 62 to 66. Fig. 6 summarizes the test results oftesting pre-built learning models. The more detailed results are pre-sented in the appendix. As one can see, the trained models cannotpredict samples correctly after integration of DERs. This is becauseadoption of DERs creates unforeseen dynamics leading to data dis-tribution change. Therefore, old observations are different than thenew ones. For voltage magnitudes, we observed that the trained algo-rithms are able to detect most of the attacks even after integration ofthe DERs. The maximum and minimum voltage limits considered tobe ±10% but data distribution change in the voltage magnitudes isnot too much. Moreover, considering all networks, SVM is the bestmethod to detect attacks for voltage magnitudes for this case study.In addition, we observed that the least F-measure belongs to the BNwhich shows a significant drop. This drop is because of a decrease inthe detection rate which means its trained model has classified mostof the attacks as normal samples.

For branch currents, we observed that the KNN is better than othermethods while its F-measure is decreased significantly. The mainreason for this reduction is a decrease of the detection rate whichmeans integration of DERs has led to having broader measurementsrange so that the trained model incorrectly labels the new attackedvariables as normal ones. Furthermore, while stacking method hasgood F-measure in the trained model, it provides the worst resultalong with the BN. The reason is overfitting of the trained model.Overall, in this case study, it is observed that BN which is based onconstructing a DAG is not robust against changes in the network.The KNN is superior to other methods since it does not build anymodel but memorizes the training dataset and all the work is done atprediction time.

4.4 Case Study III - Testing the Trained Models to DetectAttacks in the Networks with EVs.

Adoption of plug-in EVs can lead to a significant draw on the dis-tribution grid. EVs can connect to the grid to supply power duringpeak load times to increase the reliability of the grid (vehicle to grid)[36] or they can be connected to the grid to be charged (grid to thevehicle).

In this case, for evaluating the robustness of the statistical learn-ing methods in managing adoption of EVs, attacks are orchestratedin the systems after integration of EVs. To this end, we have assumedthat EVs are integrated into the networks after integration of DERsfrom day 31st and attacks are launched on days 62 to 66. In thissimulation, EVs are connected to the grid to charge their batteriesbetween 11 PM to 2 AM and are connected to the grid to charge grid

(a) Voltage magnitudes (b) Branch currentsFig. 5: F-measure values for Case I


(a) Voltage magnitudes (b) Branch currentsFig. 6: F-measure values for Case II

(a) Voltage magnitudes (b) Branch currentsFig. 7: F-measure values for Case III

between 8 AM to 10 AM. The required energy for EVs is consideredto be 10% of the average daily load [36]. Locations of the EVs areselected randomly. Fig. 7 presents the summary of test results forthis case.

We observed that SVM has better F-measure over the tested net-works in detecting attacks for voltage magnitudes. However, MLPalso yields good performance. BN is the worst method which showsthat BN is not robust to adoption of EVs and it is because of increasein FN rate.

For branch currents, it is observed that KNN has better F-measureover different networks comparing to other methods. Moreover, itis observed that integration of EVs leads further decrease in the F-measure of the methods as the system size increases, i.e., 33-bus and123-bus. This is while in the small networks like 18-bus and also8-bus, F-measure is less decreased comparing to the results of casestudy II. Integration of EVs could lead to natural jumps in the datathat inflate the variance in the absence of attacks which mislead thetrained models. Therefore, it leads to increase of FN rate specificallyin the bigger networks and that is the major reason for reduction inF-measure.

4.5 Case Study IV - Testing the Trained Models to DetectAttacks in Integrated Power System.

To date, most of the distribution networks are designed and usuallyanalyzed separately without considering the impact of the transmis-sion network [37]. This is while a real power system is an integrationof transmission and distribution networks as shown in the Fig. 8. Fortransmission network, the BB acts as a PQ bus so that it is modeledas an equivalent admittance in the admittance matrix. For distribu-tion network, the BB acts as the slack bus. So, it is modeled as anequivalent network.

Fig. 8: Integrated power system

Therefore, one motivation of this case study is analyzing theinteraction between the transmission and distribution systems. Morespecifically, the effects of topology changes in the transmission net-work are considered in this case study. To this end, it is assumed thatthe distribution test networks are attached to the IEEE 14-bus testsystem and the system works normally until day 59. At this time,two line outages (4 - 5 and 2- 5) occur as shown in Fig. 9 and thesystem works under those contingencies for 9 days and attacks areorchestrated on days 62 to 66. This means attacks in the distributiongrid are launched after contingencies in the transmission network.Fig. 10 represents the summary of test results for this case study.

Fig. 9: IEEE 14-bus test system

We observed that SVM provides better results over different net-works. We observed that detecting attacks on voltage magnitudesin 123-bus system is more challenging compared to other cases. Forbranch currents, it is observed that MLP and KNN have better perfor-mance. It is noteworthy that detecting attacks on this variable in thiscase study, is less challenging comparing to adoption of DERs andEVs. The least F-measure over different networks is for BN networkand stacking method. This means the BN is not robust to changes andit is not recommended to be used in the smart distribution grid. F-measure of the stacking method shows the trained model in the casestudy I is overfitted. The major reason for decrease in F-measure forthis variable is misclassifying the attacked measurements as normalones.

Fig. 11 presents a condensed view of the numerical results. It con-tains the algorithms which have yielded best results in each case


(a) Voltage magnitudes (b) Branch currentsFig. 10: F-measure values for Case IV

Fig. 11: Best results over different networks and case studiesstudy in F-measure. As it is clear, for voltage magnitude, stackingapproach and SVM yield the best results and for branch currents,KNN outperforms other methods.

5 Conclusion

Data integrity attacks can deteriorate the control performance of thedistribution grid and cause maximum damages to it and may lead toa serious financial loss. Therefore, in this paper, for the first time,the attack detection problem is formulated as a statistical learningproblem with considering the inherent and special characteristics ofdistribution grid such as dynamic variation of load profiles due tothe adoption of DERs and EVs. To conduct attacks, the hypothesisin this paper is that an attacker can compromise the communica-tion infrastructure of the distribution grid and launch attacks on twocritical variables: voltage magnitudes and branch currents. Further-more, we take a comprehensive analysis on identifying the properdetection method for each variable and demonstrate a comprehen-sive benchmark of supervised and ensemble learning algorithms. Weobserved that dynamic nature of the distribution grid is difficult tomodel because the learning algorithm has to mimic different behav-iors with heavy time-dependencies. Therefore, the trained machinelearning algorithms could not yield the primary performance andmisclassify new observations after integration of DERs, EVs, andcontingencies in the transmission grid. The comprehensive resultscan serve as an exclusive reference for future discussions to improvecyber-security of the distribution grid.

6 References1 J. Zhao, L. Mili, and M. Wang, “A Generalized False Data Injection Attacks Against

Power System Nonlinear State Estimator and Countermeasures,” IEEE Transactionson Power Systems, pp. 1-1, Jan 2018.

2 C.-C. Sun, A. Hahn, and C.-C. Liu, “Cyber security of a power grid: State-of-the-art,” International Journal of Electrical Power & Energy Systems, vol. 99, pp. 45-56,Jan 2018.

3 K. Hamedani, L. Liu, R. Atat, J. Wu, and Y. Yi, “Reservoir Computing Meets SmartGrids: Attack Detection Using Delayed Feedback Networks,” IEEE Transactions onIndustrial Informatics, vol. 14, no. 2, pp. 734-743, Feb 2018.

4 A. Anwar, and A. N. Mahmood, “Anomaly detection in electric network database ofsmart grid: Graph matching approach,” Electric Power Systems Research, vol. 133,pp. 51-62, Apr 2016.

5 S. Poudel, Z. Ni, and N. Malla, “Real-time cyber physical system testbed for powersystem security and control,” International Journal of Electrical Power & EnergySystems, vol. 90, pp. 124-133, Feb 2017.

6 M. Mohammadpourfard, A. Sami, and Y. Weng, “Identification of False DataInjection Attacks with Considering the Impact of Wind Generation and Topology

Reconfigurations,” IEEE Transactions on Sustainable Energy, vol. PP, no. 99, pp.1-1, Dec 2017.

7 M. Mohammadpourfard, A. Sami, and A. R. Seifi, “A statistical unsupervisedmethod against false data injection attacks: A visualization-based approach,” ExpertSyst Appl, vol. 84, pp. 242-261, Oct 2017.

8 T. T. Kim, and H. V. Poor, “Strategic Protection Against Data Injection Attacks onPower Grids,” IEEE Transactions on Smart Grid, vol. 2, no. 2, pp. 326-333, Apr2011.

9 Q. Yang, J. Yang, W. Yu, D. An, N. Zhang, and W. Zhao, “On False Data-InjectionAttacks against Power System State Estimation: Modeling and Countermeasures,”IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 3, pp. 717-729,Mar 2014.

10 H. Sedghi, and E. Jonckheere, “Statistical Structure Learning to Ensure DataIntegrity in Smart Grid,” IEEE Transactions on Smart Grid, vol. 6, no. 4, pp.1924-1933, Jul 2015.

11 Y. Weng, Y. Liao, and R. Rajagopal, “Distributed Energy Resources TopologyIdentification via Graphical Modeling,” IEEE Transactions on Power Systems, vol.32, no. 4, pp. 2682-2694, Jul 2017.

12 B. P. Hayes, and M. Prodanovic, “State Forecasting and Operational Planning forDistribution Network Energy Management Systems,” IEEE Transactions on SmartGrid, vol. 7, no. 2, pp. 1002-1011, Mar 2016.

13 A. Angioni, T. SchlÃusser, F. Ponci, and A. Monti, “Impact of Pseudo-Measurements From New Power Profiles on State Estimation in Low-Voltage Grids,”IEEE Transactions on Instrumentation and Measurement, vol. 65, no. 1, pp. 70-77,Jan 2016.

14 A. Primadianto, and C. N. Lu, “A Review on Distribution System State Estima-tion,” IEEE Trans. Power Syst, vol. 32, no. 5, pp. 3875-3883, 2017.

15 M. Pau, P. A. Pegoraro, and S. Sulis, “Efficient Branch-Current-Based DistributionSystem State Estimation Including Synchronized Measurements,” IEEE Transactionson Instrumentation and Measurement, vol. 62, no. 9, pp. 2419-2429, Sep 2013.

16 R. Deng, P. Zhuang, and H. Liang, âAIJFalse Data Injection Attacks Against StateEstimation in Power Distribution Systems,âAI IEEE Transactions on Smart Grid, pp.1-1, Mar 2018.

17 T. T. Tesfay, J. P. Hubaux, J. Y. L. Boudec, and P. Oechslin, “Cyber-secure com-munication architecture for active power distribution networks,” in Proceedings of the29th Annual ACM Symposium on Applied Computing, Gyeongju, Republic of Korea,2014, pp. 545-552.

18 A. Abur, and A. G. Exposito, “Power System State Estimation: Theory andImplementation,” CRC Press, Mar 2004.

19 J. Liang, L. Sankar, and O. Kosut, “Vulnerability Analysis and Consequences ofFalse Data Injection Attack on Power System State Estimation,” IEEE Trans. PowerSyst, vol. 31, no. 5, pp. 3864-3872, Sep 2016.

20 J. Pearl, “Probabilistic Reasoning in Intelligent Systems: Networks for PlausibleInference,” Elsevier, 2014.

21 C. Cortes, and V. Vapnik, “Support-Vector Networks,” Machine Learning, vol. 20,no. 3, pp. 273-297, Sep 1995.

22 M. Kantardzic, “Data Mining: Concepts, Models, Methods, and Algorithms,” JohnWiley & Sons, Oct 2011

23 R. Quinlan, “C4.5: Programs for Machine Learning,” Morgan Kaufmann Publish-ers, 1993.

24 J. A. Anderson, and J. Davis, “An introduction to neural networks,” MIT Press,1995.

25 M. Esmalifalak, L. Liu, N. Nguyen, R. Zheng, and Z. Han, “Detecting StealthyFalse Data Injection Using Machine Learning in Smart Grid,” IEEE Systems Journal,


vol. PP, no. 99, pp. 1-9, Aug 2014.26 G. F. Cooper, and E. Herskovits, “A Bayesian method for the induction of prob-

abilistic networks from data,” Machine Learning, vol. 9, no. 4, pp. 309-347, Oct1992.

27 S. Kulkarni, and G. Harman, “An Elementary Introduction to Statistical LearningTheory,” John Wiley & Sons, Aug 2011.

28 R. E. Schapire, and Y. Freund, “Boosting: Foundations and Algorithms,” MITPress, 2012.

29 D. H. Wolpert, “Stacked generalization,” Neural Networks, vol. 5, no. 2, pp. 241-259, 1992.

30 J. Bergstra, and Y. Bengio, “Random Search for Hyper-Parameter Optimization,”J Mach Learn Res, vol. 13, pp. 281-305, Feb 2012.

31 R. D. Zimmerman, C. E. Murillo-Snchez, and R. J. Thomas, “MATPOWER:Steady-state operations, planning, and analysis tools,” IEEE Transactions on PowerSystems, vol. 26, no. 1, pp. 12âAS19, Feb 2011.

32 DIgSILENT PowerFactory, “DIgSILENT PowerFactory 15 User Manual,” Aug2013.

33 Office of Energy Efficiency & Renewable Energy. “Commercial and ResidentialHourly Load Profiles,” 2017; Available: https://openei.org

34 M. Tajdinian, A. R. Seifi, and M. Allahbakhshi, “Calculating probability densityfunction of critical clearing time: Novel Formulation, implementation and applicationin probabilistic transient stability assessment,” International Journal of ElectricalPower & Energy Systems, vol. 103, pp. 622-633, Jun 2018.

35 A. P. Dobos, “PVWatts version 5 manual,” National Renewable Energy Labora-tory, Sep 2014.

36 S. Gao, K. T. Chau, C. Liu, D. Wu, and C. C. Chan, “Integrated Energy Manage-ment of Plug-in Electric Vehicles in Power Grid With Renewables,” IEEE Trans. Veh.Technol, vol. 63, no. 7, pp. 3019-3027, Sep 2014.

37 A. R. Abbasi, and A. R. Seifi, “A new coordinated approach to state estimationin integrated power systems,” International Journal of Electrical Power & EnergySystems, vol. 45, no. 1, pp. 152-158, 2013

7 Appendices

Tables 1 to 12 represent the detailed results of the machine learningmethods over different case studies. As one can see, detecting attackson branch currents are more challenging than voltage magnitudes.More specifically, we observed a significant reduction in F-measurevalues of the learning algorithms when the trained models are testedon datasets of the case studies II-IV. This is because the variance ofdata of branch currents is much more than the voltage magnitudes.Fig. 12a shows how voltage magnitude of a bus changes over threemonths. Fig. 12b shows variance change of a line current over threemonths.

(a) Voltage magnitude of a bus (b) Current magnitude of a line

Fig. 12: Measurements of a bus

As it is clear, the range of the line current magnitudes is muchmore than voltage magnitudes. Such a broader valid range makes anattack obvious on voltage magnitudes to be hard to distinguish onbranch currents.

Fig. 13a shows the voltage magnitude of a bus before and afterintegration of DERs. We observe that integration of DERs does notlead too much data distribution change so that the learning algo-rithms can yield good performance in detecting attacks on voltagemagnitudes in case study II. This is true also about other case studies.

(a) Change in voltage magnitudes

due to integration of DERs

(b) Change in current magnitudes

due to integration of DERs

Fig. 13: Effects of integration of DERs on measurements of a bus

Fig. 13b shows the effect of integration of DERs in case studyII on a line current. As it is clear, this integration shows a signifi-cant change in data distribution. Therefore, it is difficult for machinelearning methods to manage this change and detect attacks. Morespecifically, based on the detailed results for case study II in Tables2, 6, and 10, the learning algorithms are experiencing a significantreduction in F-measure because of reduction in detection rate. Thismeans integration of DERs has led to label the attacked measure-ments as normal ones and has increased the false negative rate.Furthermore, we observed more reduction in the 18-bus test systemfor this measurement compared to other networks. This is becausethe peak demand in this network is more than the other utilized net-works which shows the attack detection becomes more challengingas the system load increases.

As one can see in Tables 3, 7, 11, integration of EVs leads toincrease of false negative rate which leads to decrease in F-measurefor branch currents. This reduction is more in bigger networks, i.e.,33-bus and 123-bus. For voltage magnitudes, we observed that SVMoutperforms other methods. Experimental results show that stack-ing model is overfitted and cannot provide good results in test casestudies. Furthermore, results show that the boosting method does nothave a substantial effect on the base learners’ F-measure.

Tables 4, 8, and 12 present the results for case study IV. It is clearthat attack detection on voltage magnitudes is more challenging forlearning methods as the system size increase except SVM. Consid-ering different case studies, SVM can be a good option for detectingattacks on voltage magnitudes. For branch currents, KNN is morerobust against uncertainty comparing to other methods.

We observed that graphical models such as BN are not robustagainst changes and they might not be proper to be used in thedistribution grid. We observed that detection rate is not a goodperformance metric to evaluate the methods. This is because analgorithm has a higher detection rate in some scenarios but it alsohas a higher false positive rate. For example, in the Table 2, detectionrate of C4.5 for line current is 0.947 and it is the highest detectionrate while it has less F-measure. The reason is its high false positiverate 0.673 which means most of the normal samples are classified asattack samples. That is why this algorithm has higher detection rate.Therefore, F-measure metric is more reliable in this context.


Table 1 Summary of Test Results 18-bus: Case I.Algorithm F-Measure(%) Detection Rate(%) False Positive(%)Bayesian 0.858 1 0.253SVM 1 1 0KNN 1 1 0C4.5 0.997 0.996 0.001MLP 1 1 0

a) Voltage Magnitude

Algorithm F-Measure(%) Detection Rate(%) False Positive(%)Bayesian 0.932 1 0.12SVM 1 1 0KNN 1 1 0C4.5 0.997 0.996 0.001MLP 1 1 0

b) Voltage Magnitude: Boosting

Algorithm F-Measure(%) Detection Rate(%) False Positive(%)Stacking 1 1 0

c) Voltage Magnitude:Stacking

Algorithm F-Measure(%) Detection Rate(%) False Positive(%)Bayesian 0.929 0.989 0.118SVM 0.915 0.91 0.08KNN 0.911 0.943 0.115C4.5 0.887 0.938 0.154MLP 0.867 0.815 0.09

d) Line Current


e) Line Current: Boosting

Algorithm F-Measure(%) Detection Rate(%) False Positive(%)Stacking 0.955 0.965 0.053

f) Line Current: Stacking

Table 2 Summary of Test Results 18-bus: Case II.Algorithm F-Measure(%) Detection Rate(%) False Positive(%)Bayesian 0.637 0.436 0.18SVM 1 1 0KNN 1 1 0C4.5 0.961 0.942 0.024MLP 1 1 0


Algorithm F-Measure(%) Detection Rate(%) False Positive(%)Bayesian 0.752 0.632 0.148SVM 1 1 0KNN 1 1 0C4.5 0.961 0.942 0.02MLP 1 1 0





d) Line Current

Algorithm F-Measure(%) Detection Rate(%) False Positive(%)Bayesian 0.614 0.857 0.55SVM 0.635 0.661 0.388KNN 0.70 0.584 0.20C4.5 0.574 0.946 0.658MLP 0.579 0.253 0.0.07




Table 3 Summary of Test Results 18-bus: Case III.Algorithm F-Measure(%) Detection Rate(%) False Positive(%)Bayesian 0.624 0.40 0.168SVM 1 1 0KNN 1 1 0C4.5 0.961 0.946 0.028MLP 1 1 0


Algorithm F-Measure(%) Detection Rate(%) False Positive(%)Bayesian 0.761 0.638 0.137SVM 1 1 0KNN 1 1 0C4.5 0.961 0.942 0.02MLP 1 1 0





d) Line Current





Table 4 Summary of Test Results 18-bus: Case IV.Algorithm F-Measure(%) Detection Rate(%) False Positive(%)Bayesian 0.689 0.504 0.147SVM 1 1 0KNN 1 1 0C4.5 0.995 0.989 0MLP 1 1 0


Algorithm F-Measure(%) Detection Rate(%) False Positive(%)Bayesian 0.819 0.744 0.119SVM 1 1 0KNN 1 1 0C4.5 0.995 0.989 0MLP 1 1 0





d) Line Current





Table 5 Summary of Test Results 33-bus: Case I.Algorithm F-Measure(%) Detection Rate(%) False Positive(%)Bayesian 0.939 1.000 0.134SVM 1 1 0KNN 0.997 0.994 0C4.5 0.996 0.996 0.003MLP 1 1 0


Algorithm F-Measure(%) Detection Rate(%) False Positive(%)Bayesian 0.975 1 0.056SVM 1 1 0KNN 0.997 0.994 0C4.5 0.996 0.996 0.003MLP 1 1 0




Algorithm F-Measure(%) Detection Rate(%) False Positive(%)Bayesian 0.935 1 0.143SVM 0.912 0.959 0.144KNN 0.936 1 0.141C4.5 0.938 0.987 0.121MLP 0.955 0.998 0.098

d) Line Current

Algorithm F-Measure(%) Detection Rate(%) False Positive(%)Bayesian 0.961 0.998 0.084SVM 0.929 0.964 0.113KNN 0.936 1 0.141C4.5 0.938 0.987 0.120MLP 0.955 0.998 0.098





Table 6 Summary of Test Results 33-bus: Case II.Algorithm F-Measure(%) Detection Rate(%) False Positive(%)Bayesian 0.482 171 0.049SVM 1 1 0KNN 0.926 0.853 0C4.5 0.912 0.978 0.153MLP 1 1 0


Algorithm F-Measure(%) Detection Rate(%) False Positive(%)Bayesian 0.482 0.171 0.049SVM 1 1 0KNN 0.926 0.853 0C4.5 0.912 0.978 0.153MLP 1 1 0





d) Line Current





Table 7 Summary of Test Results 33: Case III.Algorithm F-Measure(%) Detection Rate(%) False Positive(%)Bayesian 0.492 0.183 0.051SVM 1 1 0KNN 0.921 0.844 0C4.5 0.909 0.974 0.155MLP 1 1 0







d) Line Current





Table 8 Summary of Test Results 33: Case IV.Algorithm F-Measure(%) Detection Rate(%) False Positive(%)Bayesian 0.270 0 0.047SVM 0.993 1 0.015KNN 0.726 0.523 0C4.5 1 1 0MLP 0.958 1 0.093


Algorithm F-Measure(%) Detection Rate(%) False Positive(%)Bayesian 0.270 0 0.047SVM 0.990 1 0.021KNN 0.726 0.523 0C4.5 1 1 0MLP 0.958 1 0.093


Algorithm F-Measure(%) Detection Rate(%) False Positive(%)Stacking 0.958 1 0.093



d) Line Current





Table 9 Summary of Test Results 123-bus: Case I.Algorithm F-Measure(%) Detection Rate(%) False Positive(%)Bayesian 0.979 1.000 0.068SVM 1 1 0KNN 0.998 0.997 0C4.5 0.996 0.999 0.010MLP 1 1 0






Algorithm F-Measure(%) Detection Rate(%) False Positive(%)Bayesian 0.983 1 0.056SVM 0.976 1 0.076KNN 0.976 1 0.078C4.5 0.974 0.999 0.079MLP 0.982 1 0.057

d) Line Current

Algorithm F-Measure(%) Detection Rate(%) False Positive(%)Bayesian 0.983 1 0.056SVM 0.912 0.998 0.266KNN 0.978 1 0.071C4.5 0.975 0.999 0.078MLP 0.982 1 0.057


Algorithm F-Measure(%) Detection Rate(%) False Positive(%)Stacking 0.983 1 0.056


Table 10 Summary of Test Results 123-bus: Case II.Algorithm F-Measure(%) Detection Rate(%) False Positive(%)Bayesian 0.173 0.019 0.010SVM 1 1 0KNN 0.857 0.785 0C4.5 0.925 0.998 0.228MLP 0.785 0.677 0


Algorithm F-Measure(%) Detection Rate(%) False Positive(%)Bayesian 0.173 0.019 0.010SVM 1 1 0KNN 0.857 0.785 0C4.5 0.925 0.998 0.228MLP 0.785 0.677 0


Algorithm F-Measure(%) Detection Rate(%) False Positive(%)Stacking 0.770 0.655 0



d) Line Current

Algorithm F-Measure(%) Detection Rate(%) False Positive(%)Bayesian 0.147 0 0SVM 0.805 0.890 0.373KNN 0.862 0.821 0.061C4.5 0.794 0.725 0.079MLP 0.803 0.876 0.354


Algorithm F-Measure(%) Detection Rate(%) False Positive(%)Stacking 0.147 0 0



Table 11 Summary of Test Results 123-bus: Case III.Algorithm F-Measure(%) Detection Rate(%) False Positive(%)Bayesian 0.158 0.008 0.006SVM 1 1 0KNN 0.839 0.757 0C4.5 0.921 0.992 0.228MLP 0.776 0.664 0


Algorithm F-Measure(%) Detection Rate(%) False Positive(%)Bayesian 0.158 0.008 0.006SVM 1 1 0KNN 0.839 0.757 0C4.5 0.921 0.992 0.228MLP 0.776 0.664 0


Algorithm F-Measure(%) Detection Rate(%) False Positive(%)Stacking 0.763 0.644 0



d) Line Current





Table 12 Summary of Test Results 123-bus: Case IV.Algorithm F-Measure(%) Detection Rate(%) False Positive(%)Bayesian 0.147 0 0.004SVM 0.977 1 0.074KNN 0.202 0.039 0C4.5 0.146 0 0.006MLP 0.381 0.205 0.089


Algorithm F-Measure(%) Detection Rate(%) False Positive(%)Bayesian 0.147 0 0.004SVM 0.977 1 0.074KNN 0.202 0.039 0C4.5 146 1 0.006MLP 0.381 0.205 0.089





d) Line Current






Benchmark of Machine Learning Algorithms on Capturing ...yweng2/papers/2019IET.pdf · Benchmark of...

Documents

Transcript of Benchmark of Machine Learning Algorithms on Capturing ...yweng2/papers/2019IET.pdf · Benchmark of...