Online Data Validation for Distribution Operations Against Cybertampering


550 IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. 29, NO. 2, MARCH 2014


Yonghe Guo, Student Member, IEEE, Chee-Wooi Ten, Senior Member, IEEE, and Panida Jirutitijaroen, Senior Member, IEEE

Abstract—In recent development of ubiquitous computing on home energy management systems, additional IP-based data collection points are utilized to acquire electricity usage directly from consumers. As cybertampering can disrupt the accuracy of billing information, a well-structured cyberdefense mechanism is required to validate the availability and integrity of metering data for a customer billing center. This paper proposes an online data validation framework to verify home energy meters (EMs) in a secondary network with real-time measurements from feeder remote terminal units (FRTUs) in primary network. A potential cybertampering is identified based on three levels: 1) feeders; 2) subsystems; and 3) customers. This framework attempts to identify malicious consumers and their locations based on historical consumption pattern and network operating conditions in real time. The proposed framework is demonstrated to detect tampering activities using a realistic 12.47-kV distribution network. Results show that the data validation framework can accurately identify subsystems with tampered meters. This framework allows the system operator to restrict the search area to a manageable number of meters in a subsystem.

Index Terms—Advanced metering infrastructure, distribution management system, information integrity and validation, trusted computing.

I. INTRODUCTION

UTILITIES around the world have been refining their business models to provide additional choices to consumers by deploying IP-based advanced metering infrastructure (AMI) [1]. Digital upgrade in distribution systems is envisioned to improve the efficiency of power delivery with the use of communication technologies [2]. Measurements from AMI can be validated through data exchange between the distribution dispatching center and the customer billing network [1], [3]. Attempts have been made to detect bad measurements from AMI real-time data using distribution system state estimation [4]–[6]. Most techniques are based on weighted least square state estimation (WLS-SE). Although WLS-SE has been shown to detect bad data in transmission systems [7], its application to distribution systems can be limited due to a large number of state variables and a small number of measurement redundancies.

Nontechnical losses caused by incorrect metering data can cost billions of dollars and have been a major concern to power utilities [8]–[10]. The major source of these losses is energy theft. Traditionally, energy theft includes directly connecting unregistered electrical appliances to the power grid [11], [12], using alternative neutral lines, tampering with meters/terminals, sabotaging control wires, using magnets to decelerate the spinning discs for recording the energy consumption, and tapping off of a neighbor/legal consumer [13].

Energy theft can be detected using supervised and unsupervised learning algorithms. Pattern recognition and data-mining approaches are applied in customer load characterization [14]–[16], consumer energy usage anomaly detection [17], and distributed intrusion detection system for distribution systems [18]. An extreme learning machine method is applied to identify energy consumption by evaluating irregular load behavior in [11]. A genetic algorithm–support vector machines (GA-SVM) based framework is proposed to detect electricity theft in [19], and SVM is also used in [8], [11], and [20]. All of the aforementioned techniques require long-term historical data collection and are oriented toward offline evaluation.

While new technologies of IP-based communication improve system operations, the reliability of information flow for each household has become a critical issue. Securing genuine customer data is an important issue, especially when energy consumption data at the electronic meter can be exploited and tampered with [21]. The feasibility of cybertampering on electronic meters is enumerated and identified [22], and such tampering can evolve into distributed form [23]. Compared with energy theft, cybertampering can result in large utility billing discrepancies. A domain-specific threat infecting the EMs can be initiated on a massive scale to alter hundreds of meter readings [24]. Cyberthreats are an imminent challenge to the upcoming and existing AMI systems due to lack of online data validation.

The contribution of this paper is to establish a framework to perform online data detection of cross-domain irregularity of measurements. Section II describes a data validation model and attackers’ intention for distribution system cybersecurity. Section III details the detection algorithms of data irregularity on three levels. Section IV provides case studies for simulation and validation. Section V presents the three-level validation results. Section VI concludes with a discussion and future work.

Manuscript received October 21, 2012; revised March 27, 2013; accepted September 17, 2013. Date of publication October 09, 2013; date of current version February 14, 2014. Paper no. TPWRS-01176-2012.
Y. Guo and C.-W. Ten are with the Electrical and Computer Engineering Department, Michigan Technological University, Houghton, MI 49931 USA.
P. Jirutitijaroen is with the Electrical and Computer Engineering Department, National University of Singapore, Singapore 117576.
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TPWRS.2013.2282931

II. ONLINE DATA VALIDATION MODEL

U.S. Government work not protected by U.S. copyright.

Fig. 1. Distribution feeder with IP-based EMs.

Cybertampering is an electronic alteration of the energy consumption readings of home energy meters away from their real values. The most common situation is that malicious customers scale down their recorded energy usage to lower their electricity bills. The difference between falsified meter readings and actual energy consumption is electronically modified by malicious users and is referred to in this paper as tampered energy usage. This work classifies potential cybertampering into three types of attack based on the attackers’ intentions.

• Type 1—Individual attack: This type is an independent attack that targets, and is initiated from, a single meter. The reading of the target meter is manipulated to under-report actual energy consumption. The impact of such an attack is therefore limited to an individual customer.

• Type 2—Collusive attack: This attack is a collaboration among different meters. An attacker attempts to lower the electricity reading of his or her own meter while fraudulently increasing others. The attacker would need to compromise a neighbor’s AMI system.

• Type 3—Massive tampering attack: This type represents a massive attack affecting multiple metering devices. Such an event may be caused by worms that can be coordinated by a single attacker to increase or decrease the energy usage for all affected customers.

Due to potential financial loss, Type-1 and Type-3 attacks would be of interest to most utilities and thus are the major concern of the proposed framework. The detection probability of a Type-1 attack depends on the magnitude of the tampered metering values. Tampered energy usage can be detected even when there are multiple malicious customers under one subsystem. A Type-3 attack can be distinguished by a significant discrepancy between the two data sources, FRTUs and EMs. It should be noted that the proposed framework may not accurately detect a Type-2 attack among customers under the same distribution transformer. However, accurate identification can be achieved through offline historical record analysis.

Locating tampered meters by exhaustive inspection of all meters is impractical for a large number of customers. The data validation framework aims to deduce the information irregularities and helps energy providers detect this anomaly. The proposed framework utilizes the existing data resources from feeder remote terminal units (FRTUs) and IP-based EMs. Fig. 1 illustrates power injection starting from the feeder head of a primary distribution feeder to end consumers. With the presence of FRTUs, the distribution feeder can be grouped into several subsystems, e.g., regions 1, 2, and 3 shown in Fig. 1. The customer billing center is assumed to share consumer energy information with the distribution dispatching center to establish a trusted credibility database.

Our framework is proposed with three-level evaluation to detect information anomaly: (A) feeders; (B) subsystems; and (C) customers. This is summarized in the flowchart of Fig. 2. Part (A) depicted in Fig. 2 is the evaluation to determine the availability and trustworthiness of the FRTUs. Based on the evaluation result, the framework segregates the feeders into multiple subsystems according to the trustworthy levels of FRTUs. Part (B) focuses on the subsystem-level detection. For each group, a statistical approach based on experience is utilized to detect the existence of tampered metering datasets based on statistical results from the distribution network analysis. Once an inconsistency within a subsystem is detected, the framework continues to the next part of the evaluation, individual customer identification. For each individual customer associated with the selected subsystem(s), the framework first collects the 24-h load profile data to further analyze these datasets. This is implemented using pattern recognition approaches. The techniques used in part (C) detection include a fuzzy c-means clustering based credibility score system in part (C1) and a support vector machine (SVM) in part (C2) for accurate identification.

Fig. 2. Flowchart of the proposed framework.

The following assumptions are made to facilitate data validation algorithms.
1) Most of the existing FRTUs and EMs use different communication networks, i.e., supervisory control and data acquisition (SCADA) network versus AMI network, separated by frame relay switches, and different communication protocols (e.g., DNP3 versus ANSI C12.18). There is no direct communication link between the FRTUs and EMs.

2) The SCADA system has been developed with security strategies to protect it from a cyberattack [25], while the system-wide validation for EMs remains in the early stage.

3) There is no switching and system reconfiguration in the network while performing data validation. Thus, network topology and parameters of both primary and secondary distribution networks are accurate in real time.

The above assumptions imply that FRTUs are considered to be trustworthy data sources to validate customer metering information. The proposed analysis focuses on steady-state operation of the distribution network. Network data such as line segments, distribution transformers, and capacitor banks are assumed to be regularly updated following the basic requirements of modern distribution management systems (DMSs). For incrementally updated devices, such as remote-controllable tie switches, regulators, and locally controlled capacitors, accurate information can be obtained from SCADA/DMS systems.

III. DETECTION ALGORITHMS OF DATA IRREGULARITY

Here, we present the data validation algorithms in Sections III-A and III-B. The detection of tampered energy consumption at the consumer level is discussed in Section III-C.

A. FRTU Verification and Subsystem Grouping

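The most common measurement units at primary distribution systems are substation RTUs and FRTUs that serve as the telemetry devices with remote-controllable switches. The idea here is to segregate large radial systems into several subsystems utilizing FRTU infrastructure with a number of consumer electronic meters (EMs) that can be evaluated for potential tampering events. The topology of FRTU deployment can be described using a connection matrix

$$\mathbf{C} = \begin{bmatrix} c_{11} & \cdots & c_{1j} & \cdots & c_{1n} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ c_{i1} & \cdots & c_{ij} & \cdots & c_{in} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ c_{n1} & \cdots & c_{nj} & \cdots & c_{nn} \end{bmatrix} \tag{1}$$

where $i$ and $j$ are the indices of FRTU and

$$c_{ij} = \begin{cases} 1, & \text{if } i \text{ is the upstream neighboring FRTU of } j \\ -1, & \text{if } i \text{ is the downstream neighboring FRTU of } j \\ 0, & \text{if } i = j \text{ or } i \text{ and } j \text{ are not neighboring FRTUs.} \end{cases} \tag{2}$$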

Several factors can cause FRTUs to malfunction, such as power supply loss, RTU disconnection, sensor fault, or communication failure. Thus, the availability and integrity of FRTUs need to be verified before forming a connection matrix. Such verification is achieved by analyzing the log information recorded by FRTUs. An FRTU is deemed untrustworthy if malfunction events are observed, e.g., losing communication with SCADA, frequent power supply warning, irregular data response, and unauthorized configuration change. An FRTU is deemed trustworthy if none of the aforementioned malfunction events are present. The subsystems are divided based on the trustworthy FRTU set. A diagonal trustworthy matrix is defined to represent the availability status of each FRTU as

$$\mathbf{T} = \operatorname{diag}(t_1, t_2, \ldots, t_n) \tag{3}$$

where $t_i$ is a Boolean variable indicating whether the corresponding FRTU is trustworthy. Subsystem grouping is based on the trustworthy FRTU set. The new FRTU connection matrix $\mathbf{C}'$ can be obtained from the original connection matrix $\mathbf{C}$ and the trustworthy matrix $\mathbf{T}$. The algorithm is summarized in Algorithm 1. The algorithm first calculates a new intermediate matrix $\mathbf{M}$ and then checks whether there are any columns in $\mathbf{M}$ equal to 0. The existence of such zero column vectors indicates the presence of unreliable FRTUs, which must be excluded from the trustworthy FRTU set. The algorithm calls the FINDPARENT procedure to recursively find the new upstream neighboring FRTUs of the trustworthy FRTU set. A new connection matrix of trustworthy FRTUs is then updated. The FRTU groups for all subsystems can be directly obtained from the row vectors of the connection matrix that have entry 1. For each such row vector, the row index denotes the root FRTU of the subsystem, while the column indices of the entries equal to 1 in that row are the indices of the leaf FRTUs of the subsystem.

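Algorithm 1 Subsystem grouping
Require: $\mathbf{C}$, $\mathbf{T}$
Ensure: Updated connection matrix $\mathbf{C}'$
 1: compute the intermediate matrix $\mathbf{M}$ from $\mathbf{C}$ and $\mathbf{T}$
 2: Let $\mathbf{m}_{\cdot k}$ and $\mathbf{m}_{k \cdot}$ be the $k$-th column and row of $\mathbf{M}$, respectively
 3: for each column $\mathbf{m}_{\cdot k}$ in $\mathbf{M}$ do
 4:     if $\mathbf{m}_{\cdot k} = \mathbf{0}$ then
 5:         FINDPARENT([…], […])
 6:         […]
 7:     end if
 8: end for
 9: for each column in […] do
10:     if […] then
11:         […]
12:     end if
13: end for
14: Decompose […]
15: […]
16: procedure FINDPARENT([…], […], […])
17:     Find […]
18:     if […] then
19:         return […]
20:     else
21:         return FINDPARENT([…], […], […])
22:     end if
23: end procedure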
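The grouping idea described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' Algorithm 1: the feeder is represented by a hypothetical `parent` map (each FRTU to its upstream neighbor) rather than by the connection matrix, and the names `find_parent` and `group_subsystems` are illustrative. Untrustworthy FRTUs are skipped by walking upstream until a trustworthy one is found.

```python
def find_parent(node, parent, trusted):
    """Return the nearest trustworthy upstream FRTU of `node`, or None at the feeder head."""
    up = parent.get(node)
    if up is None or up in trusted:
        return up
    return find_parent(up, parent, trusted)  # skip an untrustworthy FRTU


def group_subsystems(parent, trusted):
    """Map each trustworthy root FRTU to the sorted list of its trustworthy leaf FRTUs."""
    groups = {}
    for node in sorted(trusted):
        root = find_parent(node, parent, trusted)
        if root is not None:
            groups.setdefault(root, []).append(node)
    return groups


# Hypothetical radial feeder: 1 -> 2 -> 3 -> 4 and 2 -> 5; FRTU 3 is untrustworthy.
parent = {1: None, 2: 1, 3: 2, 4: 3, 5: 2}
trusted = {1, 2, 4, 5}
subsystems = group_subsystems(parent, trusted)
print(subsystems)
```

In this sketch, FRTU 4 is reassigned to root FRTU 2 because its original upstream neighbor is excluded. FRTUs with no trustworthy downstream neighbor (here 4 and 5) head subsystems without leaf FRTUs and therefore do not appear as keys.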

B. Subsystem Identification

This is the second part in the proposed data validation framework. The root FRTU/RTU serves as a power injection point of the subsystem. The FRTU(s) located at the downstream subtrees are treated as the border switches, which are the leaf nodes of a spanning tree. Real-time power flow measurements of these FRTUs represent lumped loads of the other subsystems connected to them. This subsystem is then validated by the distribution power flow module to estimate its power losses.

The data frequencies of FRTUs and customer meters are not identical. The sampling rate of the SCADA network is significantly higher than that of consumer electronic meters; it typically polls real-time measurements every 3–10 s. The data transfer between distribution operations and the customer billing center is assumed to occur at 15-min intervals, which follows the cycle of consumer electronic meters. The average real power and reactive power data over 15 min for both FRTUs and EMs must first be calculated. With cybertampering of Types 1 and 3, one would expect the apparent power losses to be much higher than anticipated. With all sampling datasets from FRTUs (primary network) and EMs (secondary network), power losses for the given subsystem can be estimated by applying any distribution power flow method. The mismatch ratio $MR_i$ of subsystem $i$ is first defined as follows as a measure to detect inconsistency of metering data:

$$MR_i = \frac{\left| P_i^{\mathrm{FRTU}} - P_i^{\mathrm{loss}} - \sum_{k \in \Omega_i^{\mathrm{EM}}} P_{i,k}^{\mathrm{EM}} \right|}{P_i^{\mathrm{FRTU}}} \times 100\% \tag{4}$$

$$MR_i \le \varepsilon_i \tag{5}$$

$$P_i^{\mathrm{FRTU}} = \sum_{\phi \in \{a,b,c\}} \left( P_{i,\mathrm{root}}^{\phi} - \sum_{j \in \Omega_i^{\mathrm{FRTU}}} P_{i,j}^{\phi} \right) \tag{6}$$

where

$P_i^{\mathrm{FRTU}}$: average three-phase power consumption in the $i$th subsystem by FRTUs;

$P_i^{\mathrm{loss}}$: average three-phase power losses of the $i$th subsystem, calculated from energy meter readings using power flow analysis;

$P_{i,\mathrm{root}}^{\phi}$: average power measurement of phase $\phi$ by the root FRTU in the $i$th subsystem;

$P_{i,j}^{\phi}$: average power measurement of phase $\phi$ by the $j$th FRTU, which is one of the leaf nodes in the $i$th subsystem;

$P_{i,k}^{\mathrm{EM}}$: average three-phase power measurements of the $k$th home energy meter in the $i$th subsystem;

$\Omega_i^{\mathrm{FRTU}}$: set of FRTU leaf nodes in the $i$th subsystem;

$\Omega_i^{\mathrm{EM}}$: set of home energy meters belonging to the $i$th subsystem;

$\varepsilon_i$: predefined threshold, selected based on experience.
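The subsystem-level check can be sketched as follows. This is a minimal sketch under an assumption: the mismatch ratio is taken to be the absolute difference between the FRTU-measured subsystem power and the sum of estimated losses and meter readings, divided by the FRTU-measured power. The function names and all numeric values are hypothetical.

```python
from statistics import mean


def subsystem_power(root_phases, leaf_phases):
    """Net three-phase power entering the subsystem: root FRTU minus all leaf FRTUs."""
    return sum(root_phases) - sum(sum(leaf) for leaf in leaf_phases)


def mismatch_ratio(p_frtu, p_loss, meter_powers):
    """Share of FRTU-measured power not explained by meter readings plus losses (assumed form)."""
    return abs(p_frtu - p_loss - sum(meter_powers)) / p_frtu


# Hypothetical 15-min averages in kW (values are illustrative only).
root = [mean([40.0, 42.0]), 38.0, 41.0]   # per-phase root FRTU; phase a averaged from two samples
leaves = [[5.0, 5.0, 5.0]]                # one leaf FRTU, per phase
meters = [30.0, 35.0, 30.0]               # three-phase averages of three home EMs
p_frtu = subsystem_power(root, leaves)    # 120 - 15 = 105 kW
ratio = mismatch_ratio(p_frtu, 1.0, meters)
epsilon = 0.005                           # hypothetical threshold (0.5%)
print(f"MR = {ratio:.2%}, suspicious = {ratio > epsilon}")
```

In practice the SCADA samples arrive every few seconds, so each FRTU value would itself be a 15-min mean, as `mean` illustrates for one phase here.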

The above data validation for a subsystem is performed in 15-min cycles, which follow the cycle of the consumer electronic meters. If frequent violations of constraint (5) are observed for a subsystem over a specified period of time, especially during peak hours, then the subsystem can be inferred as malicious.

Using the predetermined threshold of a subsystem from historical data, whether an irregularity can be identified at the subsystem level is determined by: 1) total power consumption of the subsystem and 2) magnitude of tampered power. The magnitude of detectable tampered power consumption (DTP) for the $i$th subsystem is defined as

$$\mathrm{DTP}_i = \varepsilon_i \, P_i^{\mathrm{FRTU}} \tag{7}$$

Fig. 3. Magnitude of detectable tampered energy consumption for a subsystem.

This equation determines the boundary between detectable and nondetectable tampered values, which is depicted in Fig. 3. The two boundary lines separate detectable and nondetectable regions. The region between the two boundary lines is a detectable region, and the lower area is an undetectable region. This implies that, for a subsystem, if the value of the tampered home energy meters is within the detectable region, e.g., point A in Fig. 3, then it is likely to be identified. Similarly, if the value is outside of the detectable region, e.g., point B in Fig. 3, it is unlikely to be identified. It should be noted that the threshold values of each subsystem should be regularly adjusted to reflect changes in annual consumption data and possible network upgrades.
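The detectability boundary can be illustrated with a short sketch. Two assumptions are made here and are not taken from the source: the lower boundary is the subsystem threshold multiplied by the FRTU-measured subsystem power, and the upper boundary is the total subsystem consumption (a customer cannot hide more power than is consumed). Function names and values are hypothetical.

```python
def detectable_tampered_power(p_frtu, epsilon):
    """Assumed lower boundary: smallest tampered power that violates the threshold."""
    return epsilon * p_frtu


def is_detectable(tampered, p_frtu, epsilon):
    """True if the tampered power lies between the lower boundary and total consumption."""
    return detectable_tampered_power(p_frtu, epsilon) < tampered <= p_frtu


p_frtu = 200.0    # kW, hypothetical subsystem consumption
epsilon = 0.02    # hypothetical 2% threshold
print(detectable_tampered_power(p_frtu, epsilon))
print(is_detectable(10.0, p_frtu, epsilon), is_detectable(1.0, p_frtu, epsilon))
```

Under these assumptions, a larger subsystem needs a larger tampered magnitude before the mismatch becomes visible, which is why restricting the search to small subsystems helps.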

C. Potentially Tampered EM Identification

Here, we propose a pattern recognition method to determine the anomaly of energy consumption by identifying the behavior of historical patterns. Studies show that customer behaviors can be characterized by repeated patterns of consumption profile [17]. As such, an observed daily load profile that deviates from regular patterns may indicate a potential cybertampering. A credibility score associated with each consumer is evaluated based on the trend of their energy usage to determine whether their energy consumption is consistent with their historical profile.

The daily load profile of an individual customer contains 96 data points, i.e., four points per hour for 24 h, given that the energy meter is updated every 15-min cycle. These datasets for all consumers are preprocessed to eliminate the random noise by averaging the data points over time. Each original 15-min load curve vector $\mathbf{x} = [x_1, x_2, \ldots, x_N]$ is aggregated into a 2-h average vector $\mathbf{y} = [y_1, y_2, \ldots, y_M]$. Each data point in $\mathbf{y}$ is given as

$$y_k = \frac{1}{n} \sum_{l=(k-1)n+1}^{kn} x_l, \quad k = 1, \ldots, M \tag{8}$$

where $N = 96$ is the number of 24-h data points from the 15-min cycle, and $n$ is the number of 15-min data points in a 2-h interval, i.e., $n = 8$. The aggregation ratio is $N/M = 8$, and $M = 12$. The dimension of the original datasets can be considerably reduced to facilitate the credibility evaluation as well as to minimize noise. An example of load curve aggregation is shown in Fig. 4.

Fig. 4. Load curve aggregation for a home EM.

The 30-day historical load profiles of an individual customer are used as a reference for detecting potentially tampered meters. The time window of 30 days for evaluation is assumed to provide a relatively stable usage pattern. The dimension of these original vectors is first reduced by aggregation. These vectors are then clustered into groups by the fuzzy c-means (FCM) method. Each vector of consumer energy datasets belongs to a cluster with a membership value. For each vector, there are $c$ membership values. After FCM clustering, the centroids of the load curves are obtained. The membership value of each load vector is calculated by

$$u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \dfrac{\lVert \mathbf{y}_i - \mathbf{v}_j \rVert}{\lVert \mathbf{y}_i - \mathbf{v}_k \rVert} \right)^{2/(m-1)}} \tag{9}$$

where

$u_{ij}$: membership value of the aggregated load curve of the $i$th customer belonging to the $j$th centroid;

$\mathbf{v}_j$: $j$th centroid of the $i$th load curve;

$\mathbf{v}_k$: $k$th centroid of the $i$th load curve;

$\mathbf{y}_i$: aggregated load curve of the $i$th customer;

$m$: FCM fuzzification exponent ($m > 1$).

A credibility score of an aggregated load curve vector $\mathbf{y}_i$ is defined by its maximum membership value

$$CS_i = \max_{1 \le j \le c} u_{ij} \tag{10}$$
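The preprocessing and scoring steps above can be sketched in Python. This is a minimal sketch, assuming the standard fuzzy c-means membership formula with fuzzifier `m = 2` and centroids that have already been computed; the clustering itself is not run here, and the function names, the load profile, and the centroids are all hypothetical.

```python
import math


def aggregate(load_96, n=8):
    """Average each block of n consecutive 15-min points (n=8 gives 2-h means)."""
    return [sum(load_96[i:i + n]) / n for i in range(0, len(load_96), n)]


def memberships(y, centroids, m=2.0):
    """Standard FCM membership of vector y in each cluster (fuzzifier m > 1)."""
    dists = [math.dist(y, v) for v in centroids]
    if 0.0 in dists:  # y coincides with a centroid: full membership there
        return [1.0 if d == 0.0 else 0.0 for d in dists]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** power for dk in dists) for dj in dists]


def credibility_score(y, centroids):
    """Credibility score: the largest membership value of the load vector."""
    return max(memberships(y, centroids))


# Hypothetical day: 1 kW for the first 12 h, 3 kW for the rest (96 points).
profile = aggregate([1.0] * 48 + [3.0] * 48)          # 12 two-hour means
centroids = [[1.0] * 6 + [2.0] * 6, [1.0] * 6 + [5.0] * 6]
score = credibility_score(profile, centroids)
print(len(profile), round(score, 4))
```

A profile that sits close to one historical centroid receives a score near 1, while a profile roughly equidistant from all centroids receives a score near 1/c, which is what makes the maximum membership usable as a regularity measure.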

Different numbers of clusters will inevitably affect the calculation of the credibility score. Considering that daily load profiles could be highly dependent on the day of the week, the possible number of energy usage patterns is assumed to be an integer between 2 and 7. The following steps determine the number of clusters for each customer.
1) Let $c \in \{2, 3, \ldots, 7\}$ be the cluster number candidates. Set $c = 2, 3, \ldots, 7$ and conduct FCM on the reduced load profiles, respectively, to calculate the minimum credibility score for each number of clusters, $\min(\mathcal{S}_c)$. A minimum credibility score set can be obtained: $\{\min(\mathcal{S}_2), \ldots, \min(\mathcal{S}_7)\}$, where $\mathcal{S}_c$ denotes the set of the credibility scores of reduced load profiles clustered with $c$ clusters.

2) Determine the maximum value from the minimum credibility score set, i.e., $\max_c \min(\mathcal{S}_c)$. Set this value to be the credibility threshold $\tau$ of this customer and then select the cluster number corresponding to the maximum value to be $c^*$.

A credibility score implies the regularity of the corresponding load profile vector. The credibility threshold $\tau$ can be determined based on the minimum credibility score of untampered historical data. It is possible that in some scenarios the calculated credibility threshold could be very low due to non-tampering irregular load profiles or clustering errors. In that case, the credibility threshold can be improved by

$$\tau \leftarrow \max\{\tau, \delta\} \tag{11}$$

where $\delta$ is a constant specifying the lower bound of the credibility threshold.

An example of a subsystem is given in Fig. 5 to illustrate the concept of anomaly detection algorithms in Sections III-B and III-C. Fig. 5 also shows the energy meters, the root FRTU node, and the child FRTU nodes. Through power flow analysis of the subsystem, the power loss is 122 W. First, the inconsistency of the metering data from FRTUs and electronic meters is evaluated using (5) to validate the data integrity. The threshold $\varepsilon$ is set to 0.5% and the calculated mismatch ratio is 8.9%, which is more than the predefined acceptable mismatch ratio. This subsystem is therefore suspected of cybertampering. As each distribution transformer is connected to one customer, all three consumers are evaluated based on the historical trending to identify the credibility scores for each consumer.

Equations (12) and (13) are the three cluster centers (the three rows) of the consumer 1 load curve and the 2-h load average, respectively. The membership value of the load vector is calculated using (9). Equation (14) gives the credibility score for consumer 1, which is 0.9523. Similarly, the credibility scores for customers 2 and 3 are 0.9583 and 0.6322. The threshold of the credibility score is set to 0.8, and, as a result, consumer 3 is deemed malicious.
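The cluster-number selection and the final decision in the example can be sketched as follows. This is a minimal sketch under assumptions: credibility scores for each candidate cluster number are taken as already computed (the `history` values are hypothetical), and (11) is assumed to take the larger of the computed threshold and the lower bound. Only the three scores and the 0.8 threshold in the last line come from the example above.

```python
def select_cluster_number(scores_by_c):
    """Pick the cluster count whose minimum credibility score is largest."""
    best_c = max(scores_by_c, key=lambda c: min(scores_by_c[c]))
    return best_c, min(scores_by_c[best_c])


def bounded_threshold(threshold, lower_bound):
    """Apply the assumed lower bound of (11) to a credibility threshold."""
    return max(threshold, lower_bound)


def flag_customers(scores, threshold):
    """Return the customers whose credibility score falls below the threshold."""
    return sorted(k for k, s in scores.items() if s < threshold)


# Hypothetical historical scores for cluster counts 2..4 (FCM itself is not run here).
history = {2: [0.62, 0.90, 0.85], 3: [0.81, 0.93, 0.88], 4: [0.74, 0.95, 0.91]}
c_star, raw = select_cluster_number(history)
threshold = bounded_threshold(raw, lower_bound=0.8)

# Credibility scores and the 0.8 threshold from the three-consumer example.
flagged = flag_customers({1: 0.9523, 2: 0.9583, 3: 0.6322}, 0.8)
print(c_star, threshold, flagged)
```

Choosing the cluster count by the largest minimum score favors the clustering under which even the least typical historical day still fits some pattern well.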

D. Irregularity Detection Using Support Vector Machine

The trust credibility system is used to detect the load profileirregularity to recommend the potential range of suspects. In

Fig. 5. Case example with three consumers and two FRTUs.

practice, a load profile anomaly is not necessarily caused bycybertampering. A number of other factors can potentially leadto an inconsistent load profile:1) load profile vector clustering errors;2) vacation or holidays;3) newly added appliances to customer;4) other reasons.The existence of irregular load profiles can interfere with

the trust credibility system and cause high false positive errors.The supporter vector machine (SVM) with radial basis func-tion kernel is applied for further identification. For sake of com-pleteness, the SVM is briefed here. Readers can refer to [26]for detailed information. Given a set of training data for binaryclassification

(15)

then, if there exists a hyperplane

(16)

such that

(17)

then the training data set is defined as linearly separable. The SVM aims to form a maximal margin classifier for a linearly separable dataset by minimizing the norm of the hyperplane's weight vector, subject to (17).

Realistically, not all binary classification problems are linearly separable. However, those cases can be mapped using a kernel function to a space in which linear separation can be accomplished. Typical kernel functions include polynomials, the radial basis function (RBF), and the multilayer perceptron.

The SVM is used as a classifier which needs to be pre-trained for each individual customer by the following steps.
Step 1) For each suspicious customer, collect the historical load profile data.
Step 2) Pre-estimate the tampered load profile data for the customers.
Step 3) Use the SVM as a classifier and train it with the historical genuine data and the pre-estimated tampered data.
Step 4) Evaluate the current load profile using the pre-trained SVM for further identification.

The load profile of each customer deemed suspicious by the trust credibility system is further evaluated by the pre-trained SVM. If the profile is identified as merely irregular rather than tampered, the customer is removed from the suspicious group. This supervised training technique can effectively reduce the false positive errors.
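For reference, the linearly separable case and the RBF kernel summarized in this subsection are commonly written as below. This is the textbook form found in tutorials such as [26]. The symbols (feature vectors x_i, labels y_i, weight vector w, bias b, and kernel width gamma) are assumptions and may not match the notation used in (15)–(17).

```latex
\begin{align*}
  % Training set for binary classification
  &\{(\mathbf{x}_i, y_i)\}_{i=1}^{n}, \qquad \mathbf{x}_i \in \mathbb{R}^{d},\; y_i \in \{-1, +1\} \\
  % Separating hyperplane
  &\mathbf{w}^{\top}\mathbf{x} + b = 0 \\
  % Linear separability condition
  &y_i\,(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1, \qquad i = 1, \dots, n \\
  % Maximal-margin problem, solved subject to the condition above
  &\min_{\mathbf{w},\, b}\; \tfrac{1}{2}\,\|\mathbf{w}\|^{2} \\
  % Radial basis function kernel
  &K(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\gamma\,\|\mathbf{x} - \mathbf{x}'\|^{2}\right), \qquad \gamma > 0
\end{align*}
```

Minimizing the squared norm is equivalent to minimizing the norm itself, and the factor of one half is only a convenience for the derivative.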

E. False Positive and Negative Rates

In addition, an anomaly in the consumption pattern does not necessarily indicate the existence of cybertampering. There are other factors that can also contribute to energy usage anomalies. To characterize the performance of the trusted credibility system, the false positive and negative rates are introduced. The false positive rate is defined as

(18)

in terms of the number of detected customers with a load profile anomaly and the number of detected tampered customers. This rate is highly dependent on the local customers' energy usage behaviors and can vary from area to area. The false negative rate is defined as

(19)

which additionally involves the total number of tampered customers. A misidentification rate is also defined. These rates can be affected by the setting of the credibility threshold lower bound. A higher threshold usually leads to a lower false negative rate. The proper setting of the threshold needs to be determined heuristically, balancing the competing error rates.
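The two rates can be illustrated with a short Python sketch. The formulas below are one reading that is consistent with the quantities named for (18) and (19); they are an assumption, not a transcription of those equations. The function and variable names are hypothetical, and the counts are the group 1–5 figures reported in Section V-D.

```python
def false_positive_rate(n_flagged, n_true_detected):
    """Share of flagged customers that are not tampered (assumed form of (18))."""
    if n_flagged == 0:
        return 0.0
    return (n_flagged - n_true_detected) / n_flagged


def false_negative_rate(n_tampered, n_true_detected):
    """Share of tampered customers that were missed (assumed form of (19))."""
    if n_tampered == 0:
        return 0.0
    return (n_tampered - n_true_detected) / n_tampered


# Counts for customer groups 1-5 as reported in Section V-D.
tampered = [18, 15, 29, 33, 51]  # actual tampered meters
flagged = [25, 20, 32, 53, 81]   # meters suspected by the algorithm
detected = [16, 14, 23, 31, 43]  # tampered meters that were identified

for group, (t, f, d) in enumerate(zip(tampered, flagged, detected), start=1):
    fp = false_positive_rate(f, d)
    fn = false_negative_rate(t, d)
    print(f"group {group}: FP = {fp:.1%}, FN = {fn:.1%}")
```

Under this reading, a group with many flagged but few confirmed meters has a high false positive rate even when its false negative rate is low, which is the trade-off the threshold setting has to balance.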

IV. CASE STUDIES

The proposed methodology is simulated and validated on a test case using an actual distribution system. The methods for feeder and subsystem identification are implemented in C++. The identification of tampered electronic meters and the trusted credibility system are implemented in MATLAB, which interfaces with the C++ programming environment.

A. Test Case Setup

We apply the proposed framework to a test case modeled from a realistic 12.47-kV distribution network. The geographical map of the distribution system is shown in Fig. 6. This distribution system consists of two substations, 16 distribution feeders, 115 FRTUs or RTUs (19 of them are normally open switches), 1315 distribution transformers, and 5301 EMs. In this simulation, all FRTUs are assumed to be functional. Thus, the distribution system can be divided into 96 subsystems. Our dataset includes the meter readings of each customer for the latest 30 days.

B. Mismatch Ratio Threshold

The mismatch ratio threshold, discussed in Section III-B, is used for subsystem identification. Depending on the measurement errors and system losses, the threshold is determined from historical datasets. All EMs in the test system are assumed to have an accuracy of ±0.2%. This conforms with class 0.2 defined in ANSI C12.20 [27]. We can then assume that 99.7% of the measurement errors are within the range of ±0.2%. Thus, the normalized error of each EM can be modeled to follow a normal distribution with a mean of 0 and a standard deviation of 0.0667%. Since FRTUs are used for command and control, they are assumed to have better accuracy than the EMs. The measurement resolutions of EMs and FRTUs are assumed to be 0.024 kWh and 0.1 kW, respectively.

A Monte Carlo simulation is performed to study the distribution of the mismatch ratio. The statistical distribution is obtained from the calculation based on 500 samples of FRTU measurement and EM data using (4). An example for subsystem 70 is shown in Fig. 7. The mismatch ratio ranges between −0.7% and 0.7% for the 500 randomly generated samples. The calculated standard deviation of the 500 samples is 0.18%. The samples of the mismatch ratio can then be fit to a normal distribution with an expected value of 0 and a standard deviation of 0.18%. The maximum acceptable probability of false positive errors is set to 0.5%. The threshold for this particular subsystem can then be calculated to be 0.5%.

The threshold values of the mismatch ratio are determined based on the discussion above. We run a similar analysis for all subsystems. The resulting thresholds for all subsystems are shown in Fig. 8 and vary from subsystem to subsystem. All of these values are determined prior to subsystem identification.
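The arithmetic behind the subsystem 70 example can be checked with a short Python sketch. It treats the 99.7% band as a three-sigma interval and takes the threshold as a quantile of the fitted normal distribution. Whether the test is one-sided or two-sided is not stated here, so both are computed as an assumption; all variable names are hypothetical.

```python
from statistics import NormalDist

# Class 0.2 meter: 99.7% of errors within +/-0.2% is read as a 3-sigma band.
em_accuracy_pct = 0.2
em_sigma_pct = em_accuracy_pct / 3.0
print(f"EM error sigma = {em_sigma_pct:.4f}%")

# Subsystem 70: mismatch ratio fitted to N(0, 0.18%) from the Monte Carlo samples.
mismatch = NormalDist(mu=0.0, sigma=0.18)
max_false_positive = 0.005  # 0.5% acceptable false positive probability

# Assumption: the sidedness of the test is not given, so both quantiles are shown.
one_sided = mismatch.inv_cdf(1.0 - max_false_positive)
two_sided = mismatch.inv_cdf(1.0 - max_false_positive / 2.0)
print(f"one-sided threshold = {one_sided:.2f}%")
print(f"two-sided threshold = {two_sided:.2f}%")
```

Both quantiles round to 0.5% at one decimal place, which agrees with the threshold reported for this subsystem.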

V. SIMULATION RESULTS

Here, we illustrate the three-level data validation framework. Selected tampering scenarios are explained. The results of subsystem identification and individual credibility scores are shown afterward.

A. Selected Tampering Scenarios

Three different cybertampering scenarios are described in this study. In the first scenario, customers scale down the readings of their EMs. EMs 5297, 5298, 5299, 5300, and 5301, all associated with subsystem 70, are simulated in this way. The second scenario involves malicious consumers disrupting the power supply or cutting off the communication link between EMs and customer billing networks. The EMs report zero energy consumption or cannot respond with any real-time information to the customer billing center. EMs 5287 and 5288, both associated with subsystem 79, are set up for this case. In the third scenario, the consumers limit the EM readings at peak load hours. The energy consumption above the reading limit is hidden. This event is simulated on EM 5242, associated with subsystem 64. A detailed description of the test case setting is summarized in Table I. The thresholds for tampered subsystems 64, 70, and 79 are determined according to Section IV-B; for subsystem 70, the threshold is 0.5%.
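The three scenarios can be expressed as simple transformations of a load profile. The Python sketch below is only an illustration. The function names, the scaling factor, the reading limit, and the sample profile are all hypothetical, and the actual settings are those summarized in Table I.

```python
def scale_down(profile, factor):
    """Scenario 1: report only a fraction of each true reading."""
    return [reading * factor for reading in profile]


def zero_or_missing(profile, respond=True):
    """Scenario 2: the meter reports zero energy, or nothing at all (None)."""
    return [0.0 if respond else None for _ in profile]


def clip_peak(profile, limit):
    """Scenario 3: hide any consumption above a reading limit."""
    return [min(reading, limit) for reading in profile]


# Hypothetical 15-min readings in kWh for one evening hour.
true_profile = [0.4, 0.6, 1.2, 0.8]

print(scale_down(true_profile, 0.5))
print(zero_or_missing(true_profile))
print(zero_or_missing(true_profile, respond=False))
print(clip_peak(true_profile, 0.7))
```

In all three cases the reported energy is no larger than the true energy, so the sum of EM readings in the affected subsystem falls below what the FRTUs measure.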


Fig. 6. Distribution feeders with IP-based energy meters.

Fig. 7. Mismatch ratio distribution under normal condition for subsystem 70.

B. Results on Subsystem Identification

This simulation consists of tampered EMs that are associated with three different subsystems. A 15-minute time window (7:30 PM to 7:45 PM) is selected for the subsystem identification simulation and is shown in Fig. 8. The white bars represent the mismatch threshold for each subsystem, while the black bars represent the calculated mismatch ratio for each subsystem. If the mismatch ratio for a subsystem exceeds its threshold, the corresponding white bar is completely covered by the black bar. Results show that the mismatch ratios calculated for all subsystems other than No. 64, 70, and 79 are below the threshold. The numerical results for subsystems 64, 70, and 79 are shown in Table II. For these subsystems, the mismatch ratio in all time windows is above the threshold value. Thus, subsystems 64, 70, and 79 are concluded to be potentially tampered areas.

Fig. 8. Mismatch ratios for all subsystems: 7:30 PM to 7:45 PM.
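The subsystem-level check can be sketched in Python as follows. The mismatch ratio used here, the relative gap between FRTU-side energy and the sum of EM readings after subtracting estimated losses, is an assumed stand-in for (4), which is defined earlier in the paper. The function names and all energy values are hypothetical. Only the 0.5% threshold comes from Section IV-B.

```python
def mismatch_ratio(frtu_kwh, em_kwh, est_loss_kwh=0.0):
    """Relative gap between FRTU-side energy and summed EM readings (assumed form)."""
    return (frtu_kwh - sum(em_kwh) - est_loss_kwh) / frtu_kwh


def potentially_tampered(ratios, threshold):
    """Flag a subsystem only if every time window exceeds its threshold."""
    # abs() also catches over-reporting; this is an assumption of the sketch.
    return all(abs(ratio) > threshold for ratio in ratios)


# Hypothetical 15-min windows: (FRTU energy, EM readings, estimated loss), in kWh.
healthy = [(100.0, [40.0, 35.0, 23.9], 1.0), (90.0, [36.0, 31.0, 22.0], 0.9)]
tampered = [(100.0, [40.0, 35.0, 12.0], 1.0), (90.0, [36.0, 31.0, 11.0], 0.9)]

threshold = 0.005  # 0.5%, the value derived for subsystem 70 in Section IV-B

for name, windows in (("healthy", healthy), ("tampered", tampered)):
    ratios = [mismatch_ratio(f, ems, loss) for f, ems, loss in windows]
    print(name, potentially_tampered(ratios, threshold))
```

Requiring every window to exceed the threshold mirrors the observation that the flagged subsystems stay above their thresholds in all time windows.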

C. Results on Credibility Score Simulation

Following the subsystem identification, the range of potentially tampered EMs can now be verified using the trusted credibility system introduced in Section III-C. This procedure requires a period of observation of each consumer's energy pattern, typically 24 hours, and sufficient historical load profile data. In this simulation, all EMs associated with subsystems 64, 70, and 79 are evaluated to determine credibility scores for all consumers. The lower bound of the credibility threshold is set to 0.85. Results are shown in Table III. The credibility scores for tampered EMs No. 5242, 5297, 5298, 5299, 5300, 5301, 5287, and 5288 are 0.3313, 0.3741, 0.4773, 0.5379, 0.6042, 0.4333, 0.6283, and 0.6164, respectively, all below the credibility threshold. It can be concluded that these EMs are suspicious. There are 31 EMs associated with subsystem 64, 17 EMs associated with subsystem 70, and 103 EMs associated with subsystem 79. The number of potentially tampered EMs is reduced to 12 when the system is evaluated through the credibility score.

The proposed method can detect all tampered meters in all three scenarios. Even though some meters are inaccurately labeled as malicious, the proposed method can help utilities narrow down their search to a small number of EMs. Further investigation, such as on-site inspections, can then be conducted.

TABLE I TAMPERED EM SETUP

TABLE II SUBSYSTEM IDENTIFICATION

TABLE III POTENTIALLY TAMPERED EMS

TABLE IV CREDIBILITY SCORE PERFORMANCE SHOWN BY NUMBER OF EMS

Fig. 9. Influence of the credibility threshold value on false negative rates for groups 1–5.
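The narrowing of the search area can be tallied with a few lines of Python. The scores, EM counts, and the size of the suspect list are the figures quoted in this subsection. The variable names are hypothetical, and the sketch assumes that all eight tampered EMs are among the 12 suspects, which follows from their scores being below the threshold.

```python
CREDIBILITY_THRESHOLD = 0.85

# Credibility scores reported for the eight tampered EMs.
tampered_scores = [0.3313, 0.3741, 0.4773, 0.5379, 0.6042, 0.4333, 0.6283, 0.6164]

# EMs per flagged subsystem (64, 70, 79) and the size of the final suspect list.
ems_per_subsystem = {64: 31, 70: 17, 79: 103}
suspects_after_scoring = 12

all_caught = all(score < CREDIBILITY_THRESHOLD for score in tampered_scores)
candidates = sum(ems_per_subsystem.values())
false_alarms = suspects_after_scoring - len(tampered_scores)
reduction = 1.0 - suspects_after_scoring / candidates

print(f"all tampered EMs below threshold: {all_caught}")
print(f"candidates before scoring: {candidates}")
print(f"suspects that are not tampered: {false_alarms}")
print(f"search space reduction: {reduction:.1%}")
```

The credibility score therefore cuts the 151 candidate meters in the three subsystems to 12, of which four would be cleared by further investigation.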

D. Credibility Score Threshold

The selection of the credibility threshold lower bound can greatly influence the accuracy of identification. In order to evaluate the performance of the proposed trusted credibility system, we select five groups of customers from different geographic locations to conduct a detailed study. For each group, a random number of EMs are chosen as tampered ones. A wide range of cybertampering behaviors is simulated, such as scaling down each data point by a percentage, reducing or amplifying each data point by a fixed or random amount, or limiting the data points to some fixed value. Table IV shows some exemplary simulation results of the credibility score performance of each customer group for a fixed threshold value. For example, the actual numbers of tampered meters are 18, 15, 29, 33, and 51 for groups 1–5. The proposed algorithm suspects 25, 20, 32, 53, and 81 meters in groups 1–5, respectively, and identifies 16, 14, 23, 31, and 43 of the tampered meters, respectively. Figs. 9 and 10 show how the false negative and false positive rates vary with the threshold value for each group of customers. The false negative rates keep decreasing with a larger threshold value, while the false positive rates may increase after reaching a certain point. The threshold value for a particular customer group needs to be properly selected such that the false negative rate can be reduced as much as possible with an acceptable false positive rate. Given upper bound constraints on the false positive and negative rates, the optimal threshold value in this example is 0.95. Following the same principle, the credibility threshold lower bound values for all groups can be determined. Based on the simulation results, in most situations, the best threshold value is between 0.85 and 0.95.

Fig. 10. Influence of the credibility threshold value on false positive rates for groups 1–5.

Fig. 11. Comparison of false positive rates of all groups.
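The selection principle described in this subsection can be sketched in Python as a constrained search: among the threshold settings that satisfy both upper bounds, take the one with the lowest false negative rate. The function name, the sweep values, and the two bounds below are hypothetical illustration data, not results from Figs. 9 and 10.

```python
def pick_threshold(sweep, max_fp, max_fn):
    """Return the threshold with the lowest FN rate among settings meeting both bounds."""
    feasible = [(fn, fp, thr) for thr, fp, fn in sweep if fp <= max_fp and fn <= max_fn]
    if not feasible:
        return None  # no setting satisfies both constraints
    # Tuples compare by FN rate first, then FP rate, then threshold.
    return min(feasible)[2]


# Hypothetical sweep for one customer group: (threshold, FP rate, FN rate).
sweep = [
    (0.80, 0.20, 0.25),
    (0.85, 0.25, 0.15),
    (0.90, 0.30, 0.10),
    (0.95, 0.38, 0.05),
    (0.99, 0.60, 0.05),
]

print(pick_threshold(sweep, max_fp=0.40, max_fn=0.10))
```

Ties in the false negative rate are broken in favor of the lower false positive rate, which reflects the stated goal of reducing false negatives as far as possible while keeping false positives acceptable.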

E. Performance Evaluation of SVM

Although the trust credibility system with a high credibility score threshold can effectively identify the tampered customers, the analysis can cause a high false positive rate. To further improve the identification accuracy, the SVM is introduced. For each individual suspicious customer, the SVM is trained using historical genuine and pre-estimated tampered load profile data. Fig. 11 depicts the impact of using the SVM on false positive rates. The credibility threshold is set to 0.95. Compared with detection without the SVM evaluation, the SVM significantly reduces the false positive rates for groups 1–5. Fig. 12 shows the influence of the SVM on the false negative rates. In some situations, a slight increase of the rate can be seen due to classification errors; this is the case for groups 1, 2, 4, and 5.

Fig. 12. Comparison of false negative rates of all groups.

VI. CONCLUSION AND FUTURE WORK

The proposed method is designed to detect inconsistency among datasets from substation RTUs, FRTUs, and EMs, and involves three major steps. The assumptions of using this method include perfect accuracy with up-to-date topology and network parameters. The algorithm requires distribution power flow analysis, together with the measurement points from those data sources. Simulation results show that the proposed method provides a systematic way to identify cybertampering events with acceptable accuracy. The proposed application is applied on a radial network topology. Additional algorithmic enhancements are required to handle looped or meshed conditions of the distribution grid.

It should be noted that the proposed data validation framework shares a similar idea with hypothesis testing in traditional state estimation techniques. Traditional state estimation methods use all available measurements to estimate a set of state variables by solving an optimization problem, typically a weighted least squares problem, while the proposed method validates data consistency across two domain networks. The advantage of the proposed approach is that it requires no calculation on a large gain matrix and can directly use the power flow result from the DMS.

The hypothesis testing of a subsystem defined in constraint (5) is used to detect the presence of inconsistency between two measurement data sources. The adjustment of mismatch threshold values is determined by data measurement errors and the historical load profile of a subsystem. Each measurement device, whether an FRTU or a home EM, has errors that can be modeled using a normal distribution. As FRTUs are sampled every 3 s and home EMs every 15 min, the readings are normalized to the 15-min sampling interval. As a result, the estimated loss may not match the actual loss exactly. Measurement errors may exist even when there is no cybertampering event. The threshold defined here is a range of mismatch that is reasonably considered to be within the estimated error values.

At the distribution dispatching center, the database of the distribution network topology may not always be up to date with the actual system conditions. It is important for system engineers to consistently maintain the integrity of the distribution network database, including the information on conductors and incremental network topology modifications.
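As a small illustration of the sampling-rate normalization discussed in this section, the Python sketch below integrates 3-s FRTU power samples over one 15-min EM interval so that both sources can be compared in kWh. The function name and the constant feeder load are hypothetical, and rectangular integration is an assumption of the sketch.

```python
import math


def window_energy_kwh(power_kw_samples, sample_period_s=3.0):
    """Integrate equally spaced power samples (kW) into energy (kWh)."""
    hours_per_sample = sample_period_s / 3600.0
    return math.fsum(p * hours_per_sample for p in power_kw_samples)


samples_per_window = int(15 * 60 / 3)  # 300 FRTU samples per 15-min EM interval
constant_load = [120.0] * samples_per_window  # hypothetical feeder load in kW

print(samples_per_window, round(window_energy_kwh(constant_load), 6))
```

A constant 120-kW load over a quarter of an hour corresponds to 30 kWh, which is the quantity that would be compared against the sum of the 15-min EM readings.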

ACKNOWLEDGMENT

The authors would like to thank A. Ginter, Waterfall Security Solutions, for the useful discussions. The authors would also like to thank Consumers Energy for providing the load profile samples of the “smart” meters. The authors greatly appreciate the five anonymous reviewers for their effort and time to improve the quality of this work.

REFERENCES

[1] U.S. Dept. Energy, “Assessment of demand response and advanced metering,” Docket Ad06–2-000 [Online]. Available: http://www.ferc.gov/legal/staff-reports/demand-response.pdf

[2] C. W. Gellings, The Smart Grid: Enabling Energy Efficiency and Demand Response, 1st ed. Boca Raton, FL, USA: CRC, 2009.

[3] “Challenge and opportunity: Charting a new energy future. Energy future coalition,” Smart Grid Working Group (2003–06), Appendix A: Working Group Reports [Online]. Available: http://www.energyfuturecoalition.org/files/webmuploads/EFC_Report/EFCReport.pdf

[4] H. Wang and N. N. Schulz, “A revised branch current-based distribution system state estimation algorithm and meter placement impact,” IEEE Trans. Power Syst., vol. 19, no. 1, pp. 207–213, Feb. 2004.

[5] Z. J. Simendic, V. C. Strezoski, and G. S. Svenda, “In-field verification of the real-time distribution state estimation,” in Proc. CIRED 18th Int. Conf. Electricity Distrib., Turin, Italy, Jun. 6–9, 2005, pp. 1–4.

[6] M. Baran and T. E. McDermott, “Distribution system state estimation using AMI data,” in Proc. Power Syst. Conf. Expo., Seattle, WA, USA, Mar. 1–3, 2009, pp. 15–18.

[7] A. Abur and A. G. Exposito, Power System State Estimation: Theory and Implementation, 1st ed. Boca Raton, FL, USA: CRC, 2004.

[8] J. Nagi, K. S. Yap, S. K. Tiong, S. K. Ahmed, and M. Mohamad, “Non-technical loss detection for metered customers in power utility using support vector machines,” IEEE Trans. Power Del., vol. 25, no. 2, pp. 1162–1171, Apr. 2010.

[9] T. B. Smith, “Electricity theft: A comparative analysis,” Energy Policy, vol. 32, pp. 2067–2076, 2004.

[10] T. Winther, “Electricity theft as a relational issue: A comparative look at Zanzibar, Tanzania, and the Sunderban Islands, India,” Energy for Sustainable Develop., vol. 16, no. 1, pp. 111–119, Mar. 2012.

[11] A. H. Nizar, Z. Y. Dong, and Y. Wang, “Power utility non-technical loss analysis with extreme learning machine method,” IEEE Trans. Power Syst., vol. 23, no. 3, pp. 946–955, Aug. 2008.

[12] D. Suriyamongkol, “Non-technical losses in electrical power systems,” Master’s thesis, Dept. Electr. Eng. Comput. Sci., Ohio Univ., Athens, OH, USA, 2002.

[13] S. McLaughlin, D. Podkuiko, and P. McDaniel, “Energy theft in the advanced metering infrastructure,” in Proc. 4th Int. Workshop Crit. Inf. Infrastruct. Security, Bonn, Germany, Sep. 2009, pp. 176–187.

[14] G. Tsekouras, N. Hatziargyriou, and E. Dialynas, “Two-stage pattern recognition of load curves for classification of electricity customers,” IEEE Trans. Power Syst., vol. 22, no. 3, pp. 1120–1128, Jul. 2007.

[15] G. Chicco, R. Napoli, and F. Piglione, “Comparisons among clustering techniques for electricity customer classification,” IEEE Trans. Power Syst., vol. 21, no. 2, pp. 933–940, May 2006.

[16] V. Figueiredo, F. Rodrigues, Z. Vale, and J. B. Gouveia, “An electric energy consumer characterization framework based on data mining techniques,” IEEE Trans. Power Syst., vol. 20, no. 2, pp. 596–602, May 2005.

[17] Y. Zhang, W. Chen, and J. Black, “Anomaly detection in premise energy consumption data,” in Proc. IEEE Power Eng. Soc. General Meeting, Detroit, MI, USA, Jul. 2011, pp. 1–8.

[18] Y. Zhang, L. Wang, W. Sun, R. C. Green, and M. Alam, “Distributed intrusion detection system in a multi-layer network architecture of smart grids,” IEEE Trans. Smart Grid, vol. 2, no. 99, pp. 796–808, Jul. 2011.

[19] J. Nagi, K. S. Yap, S. K. Tiong, S. K. Ahmed, and A. M. Mohammad, “Detection of abnormalities and electricity theft using genetic support vector machines,” in Proc. IEEE Region 10 Conf. TENCON, Hyderabad, India, Jan. 2009, pp. 1–6.

[20] S. S. S. R. Depuru, L. Wang, and V. Devabhaktuni, “Support vector machine based data classification for detection of electricity theft,” in Proc. Power Syst. Conf. Expo. IEEE/PES Phoenix, Mar. 2011, pp. 1–8.

[21] P. McDaniel and S. McLaughlin, “Security and privacy challenges in the smart grid,” IEEE Security Privacy, vol. 7, pp. 75–77, 2009.

[22] Y.-H. Chang, P. Jirutitijaroen, and C.-W. Ten, “A simulation model of cyber threats for energy metering devices in a secondary distribution network,” in Proc. 5th Int. CRIS on Critical Infrastructures, Beijing, China, Sep. 2010, pp. 1–7.

[23] D. S. Yeung, S. Jin, and X. Wang, “Covariance-matrix modeling and detecting various flooding attacks,” IEEE Trans. Syst., Man, Cybern. A, vol. 37, no. 2, pp. 157–169, Feb. 2007.

[24] E. M. Hutchins, M. J. Cloppert, and R. M. Amin, “Intelligence-driven computer network defense informed by analysis of adversary campaigns and intrusion kill chains,” Mar. 13, 2013. [Online]. Available: http://www.lockheedmartin.com/content/dam/lockheed/data/corporate/documents/LM-White-Paper-Intel-Driven-Defense.pdf.

[25] Nat. Commun. Syst., “Supervisory control and data acquisition (SCADA) systems,” Tech. Inf. Bull. 04–1 [Online]. Available: http://www.ncs.gov/library/tech_bulletins/2004/tib_04–1.pdf

[26] C. J. C. Burges, “A tutorial on support vector machines for pattern recognition,” Data Mining Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998.

[27] American National Standard for electricity meters – 0.2 and 0.5 accuracy classes, ANSI C12.20–2010, Nat. Electrical Manufacturers Assoc., Aug. 31, 2010.

Yonghe Guo (S’11) received the B.S. and M.S. degrees in electronic engineering from Beijing Institute of Technology, Beijing, China, in 2006 and 2008, respectively. He is currently working toward the Ph.D. degree at the Department of Electrical and Computer Engineering, Michigan Technological University, Houghton, MI, USA.
His research interests include cybersecurity for power distribution systems and operations optimization.

Chee-Wooi Ten (SM’11) received the B.S.E.E. and M.S.E.E. degrees from Iowa State University, Ames, IA, USA, in 1999 and 2001, respectively, and the Ph.D. degree from University College Dublin (UCD), National University of Ireland, Dublin, Ireland, in 2009.
He was an Application Engineer with Siemens Energy Management and Information System (SEMIS) in Singapore from 2002 to 2006. He is currently an Assistant Professor with Michigan Technological University, Houghton, MI, USA. His primary research interests are modeling for critical cyberinfrastructures and SCADA automation applications.

Panida Jirutitijaroen (SM’12) received the B.Eng. degree (Hon.) from Chulalongkorn University, Bangkok, Thailand, in 2002, and the Ph.D. degree in electrical engineering from Texas A&M University, College Station, TX, USA, in 2007.
She is an Assistant Professor with the Department of Electrical and Computer Engineering, National University of Singapore. Her research interests are power system reliability and optimization.