
Anomaly Detection in Cyber Physical Systems

Maggie Cheng

Illinois Institute of Technology

IEEE Big Data Conference, December 11, 2018

Seattle, WA

Outline

Introduction

Outlier Detection

Sequential Change Point Detection

High Dimensional Data

Summary and Outlook


Anomaly Analysis

I Unusual and significant changes in the network or CPS

I Necessary to detect, and remove/mitigate

I Analyzing anomalies from data is a big data analytics problem:

- large amount of data

- high speed

- high dimensional

- heterogeneous types

- noisy

I Involves extracting and interpreting anomalous patterns from data

- difficult to define a baseline for normal pattern


Anomaly Analysis

The steps in diagnosing an anomaly

1. Detection: binary, or real-valued anomaly score

- The time at which the anomaly is observed

2. Identification: selecting the anomaly type from a set of possible candidate anomalies

- Zero-day attack?

3. Localization

- Which link/node, component?

4. Quantification of impact

- A measure of the importance of the anomaly

The scope of this tutorial

I Detection

I Cyber + physical anomalies


Types of Anomalies

Outlier

I Aberrant observations that are considerably different from the majority are outliers.

- "An outlier is an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism" (Hawkins, 1980)

I Indicator of problems

I With or w/o time dimension

Change Point on Time Series

I Abrupt change in time series

I Indicator of change of state in the underlying process


Types of Anomalies

Outlier

I Location outliers

I Scatter outliers

I Combination of both

Change Point

I Change in mean

I Change in variance

I Combination of both


Outlier vs. Change Point on Time Series

Change point may not be an outlier; Outlier may not be a change point


Outline

Introduction

Outlier Detection

Sequential Change Point Detection

High Dimensional Data

Summary and Outlook


Outlier Detection

Applications of Outlier Analysis

I Credit card fraud detection

I Intrusion detection in networks

I Bad data detection in power grid

- Observing abnormal values indicates measurement errors or errors in the generating process

I Many other applications ...

Evolution of Outlier Detection Techniques

I From univariate to multivariate (multi-dimensional)

I From one normal class to multiple normal classes (more than one generating mechanism underlying the data)


Outlier Detection

Detection Techniques

I Statistical outlier detection (Barnett and Lewis, 1994)

- Based on distribution

I Distance-based (Knorr and Ng, 1998)

- Based on notion of proximity

I Density-based (Breunig et al., 2000)

- Based on local outlier factor

Example I: Bad Data Detection in Power Systems

Weighted Least Square Based Method



WLS-Based Error Detection in Power Systems

I Linear WLS

$$\text{minimize } f = \|e\|^2 = e^T \cdot e = \sum_{i=1}^{m} w_i \Big( z_i - \sum_{j=1}^{n} a_{ij} x_j \Big)^2$$

I Non-Linear WLS

$$\text{minimize } f = \|e\|^2 = e^T \cdot e = \sum_{i=1}^{m} \frac{1}{\sigma_i^2} \, [z_i - h_i(x)]^2$$

x: state variables

z: measurements

h(x): non-linear measurement functions

zi − hi(x): residual of the ith measurement
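For reference, the two objectives above are usually minimized as sketched below. The symbols $A = [a_{ij}]$, $W$, $H$, and $R$ are not defined on the slides and are introduced here as assumptions: $W$ collects the weights $w_i$, $R$ collects the variances $\sigma_i^2$, and $H$ is the Jacobian of $h$. The linear case has a closed form when $A^T W A$ is invertible; the non-linear case is commonly solved by Gauss-Newton iteration.

```latex
% Linear WLS: closed-form minimizer (assumes A^T W A is invertible)
\hat{x} = (A^{T} W A)^{-1} A^{T} W z, \qquad W = \mathrm{diag}(w_1, \dots, w_m)

% Non-linear WLS: one Gauss-Newton step from the current iterate x^{(k)}
x^{(k+1)} = x^{(k)} + (H^{T} R^{-1} H)^{-1} H^{T} R^{-1} \, [\, z - h(x^{(k)}) \,],
\qquad H = \left. \frac{\partial h}{\partial x} \right|_{x^{(k)}}, \quad
R = \mathrm{diag}(\sigma_1^2, \dots, \sigma_m^2)
```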



Weighted Least Square Based Method: Using χ2-Test

I Let $z_i$ be the measurement and $\hat{z}_i$ the anticipated value for $z_i$.

I Assume the residuals $X_i = z_i - \hat{z}_i$ are Gaussian and independent.

I Therefore, $\chi^2 = \sum_{i=1}^{n} (X_i/\sigma_i)^2$ follows a $\chi^2$ distribution with $\nu = n$ degrees of freedom.

I Given a significance level α, if $\chi^2 > \chi^2_\alpha$, bad data is detected.
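A minimal sketch of the χ²-test above, using only the Python standard library. Because the standard library has no χ² quantile function, the critical value is approximated with the Wilson-Hilferty formula; this is a substitution made for the sketch, not the method on the slides. The function names and the sample residuals are hypothetical.

```python
from statistics import NormalDist


def chi2_critical_value(alpha: float, dof: int) -> float:
    """Approximate upper-alpha critical value of a chi-square distribution.

    Uses the Wilson-Hilferty cube-root normal approximation (an assumption
    of this sketch); an exact quantile function would be preferable.
    """
    z = NormalDist().inv_cdf(1.0 - alpha)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * c ** 0.5) ** 3


def bad_data_detected(residuals, sigmas, alpha=0.05):
    """Return (detected, statistic, threshold) for the chi-square test."""
    stat = sum((x / s) ** 2 for x, s in zip(residuals, sigmas))
    threshold = chi2_critical_value(alpha, len(residuals))
    return stat > threshold, stat, threshold


# Hypothetical residuals for n = 10 measurements with sigma_i = 0.1.
sigmas = [0.1] * 10
clean = [0.1, -0.1, 0.05, 0.1, -0.05, 0.0, 0.1, -0.1, 0.05, -0.1]
corrupted = clean[:-1] + [0.6]  # one gross measurement error

print(bad_data_detected(clean, sigmas))
print(bad_data_detected(corrupted, sigmas))
```

The test only says that some measurement is bad; it does not say which one, which is one motivation for the maximum standardized residual used later.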

Assumptions in WLS

I Errors in measurements are Gaussian and independent.

I System topology model is correct.


Example II: Line Outage Detection in Power Systems

Topology Error: Discrepancy Between Assumed Model and Actual Model

(h) Assumed model (i) Actual model

IEEE 9-Bus Test System



System Matrix Known

I Hypothesis testing concerning the mean vector of the residuals

I Use the asymptotic distribution theory of Maxima

System Matrix Unknown

I Perform change point detection on time series → anomaly detected

I Identify and locate specific line outage



A New Topology Error Detection Approach

x: state variables

z: measurements

h(x): non-linear measurement functions

$\hat{z}_i = h_i(x)$: the anticipated value for the ith measurement

$X_i = z_i - \hat{z}_i$: the residual of the ith measurement

I Without topology errors, it is expected that the residuals $X_i$ are normally distributed with zero means:

$$X_i \sim N(\mu_i, \sigma_i^2) \text{ and } \mu_i = 0, \ \forall i$$

I The problem of topology error detection is cast as a problem of testing the hypothesis of whether the mean vector of a stochastic process is zero.



Hypothesis Testing Approach

I Input: the residuals for the n redundant measurements: $X_i = z_i - \hat{z}_i$, $i = 1, \dots, n$.

I Hypotheses:

$$H_0: \mu_1 = \dots = \mu_n = 0$$

$$H_1: \exists i \text{ such that } \mu_i \ne 0$$

New Test Statistic

Define $M_n = \max_{i \le n} |X_i|/\sigma_i$

I $M_n$ is the maximum standardized residual

I Hypothesis testing based on $M_n$: Compute $M_n$ from data and compare $M_n$ with what is expected under the null hypothesis.


Hypothesis Testing

Hypothesis Testing Framework

Given a significance level α ∈ (0, 1), find a cutoff value $t_n$ such that under the null hypothesis $H_0$,

$$P(M_n > t_n) = \alpha$$

I If Mn > tn, reject H0

I How to find the cutoff value tn?


Cutoff Value tn

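Detection Threshold $t_n$: When $X_i$ Are Independent

From $M_n = \max_{i \le n} |X_i|/\sigma_i$ and $P(M_n > t_n) = \alpha$, we have

$$1 - \alpha = P(M_n \le t_n) = \prod_{i=1}^{n} P(|X_i|/\sigma_i \le t_n) = [1 - 2\Phi(-t_n)]^n$$

where $\Phi(x) = \int_{-\infty}^{x} \phi(u)\,du$, and $\phi(x) = (2\pi)^{-1/2} e^{-x^2/2}$.

Let $\Phi^{-1}$ be the inverse function of $\Phi$. Hence

$$t_n = -\Phi^{-1}\{[1 - (1 - \alpha)^{1/n}]/2\} \qquad (1)$$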
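Equation (1) can be evaluated directly with the inverse normal CDF from the Python standard library. The sketch below also wraps the test on $M_n$; the function names are hypothetical, and the independence assumption of this slide is taken as given.

```python
from statistics import NormalDist


def cutoff(alpha: float, n: int) -> float:
    """t_n = -Phi^{-1}([1 - (1 - alpha)^(1/n)] / 2), i.e. equation (1)."""
    p = (1.0 - (1.0 - alpha) ** (1.0 / n)) / 2.0
    return -NormalDist().inv_cdf(p)


def max_residual_test(residuals, sigmas, alpha):
    """Return (reject_H0, M_n, t_n) for the maximum standardized residual."""
    m_n = max(abs(x) / s for x, s in zip(residuals, sigmas))
    t_n = cutoff(alpha, len(residuals))
    return m_n > t_n, m_n, t_n


# Settings used later in the IEEE 118-bus example: alpha = 0.001, n = 183.
print(cutoff(0.001, 183))
```

For α = 0.001 and n = 183, the slides report $t_n = 4.546012$; with n = 1 the cutoff reduces to the usual two-sided normal critical value.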


Asymptotic Theory for Mn

When Xi are dependent

I Need to know the asymptotic distribution for Mn under dependence

Definition (Measure of Dependence)

I Let $X_1, X_2, \dots, X_n$ be a non-stationary Gaussian process; $r_{ij}$ be the correlation between $X_i$ and $X_j$, for 1 ≤ i, j ≤ n.

I Define a skeleton index set

S(δ) = {(i, j) : |rij | > δ, 1 ≤ i, j ≤ n}

I The cardinality of the set S(δ) is a measure of dependence.

I Large |S(δ)| implies strong overall dependence.


Asymptotic Theory for Mn

Theorem (Weak Dependence Condition (Wu et al., 2016))

If there exist λ ∈ (0, 1) and constants C, α > 0 such that

$$\max_{ij} |r_{ij}| < \lambda \qquad (2)$$

and for every δ ∈ (0, 1), the cardinality of the skeleton index set S(δ) satisfies

$$|S(\delta)| \le C n \delta^{-\alpha} \qquad (3)$$

then for every 0 < α < 1, we have

$$\lim_{n \to \infty} P(M_n \le t_n) = 1 - \alpha \qquad (4)$$

Remarks

1. Under conditions (2) and (3), the maximum $M_n$ has the same asymptotic distribution as the one obtained under independence.

2. When power grids are sparsely connected, the condition is easily satisfied.


Detecting a Topology Error in IEEE 118-Bus System

Setup

I 118 state variables, 301 measurements: 183 redundant measurements.

I Topology change: remove the transmission line between bus 37 and 38.

Procedure

I Obtain the residuals $(X_i)_{i=1}^{n}$ by using non-WLS state estimation.

I Estimate the standard error σ.

I Compute the standardized residual $X_i/\sigma_i$, i = 1, . . . , n.

I Compute the cutoff value $t_n$ with significance level α = 0.001 and n = 183.

I By hypothesis testing approach, if there is an index i such that $|X_i|/\sigma_i > t_n$, reject $H_0$, and report alarm.


Detecting a Topology Error in IEEE 118-Bus System

Results

I tn = 4.546012

I Identify the index set J = {j : |Xj |/σj > tn}

J = {112, 113, 114, 115, 116, 124, 125, 127, 129, 130, 154, 155, 162, 165}

Figure: (j) Residuals $X_i$; (k) Estimated standard errors $\sigma_i$; (l) Standardized residuals $X_i/\sigma_i$; each plotted against the measurement index.


Example III: PCA-Based Network Traffic Anomaly Detection

(Lakhina et al., 2004a; Huang et al., 2006)

I Outlier detection from time series

I Use Principal Component Analysis (PCA) to detect anomalies

I Detect volume anomaly in network traffic

- Use O-D flow information

- Each link has aggregated traffic from all O-D flows


Detection I

PCA Analysis on Network Data

I Form link data matrix $Y_{m \times n}$

- m: last m data points, n: number of links

I Perform PCA on Y

- Find the principal components:

1 The first principal component $v_1$:

$$v_1 = \arg\max_{\|v\|=1} \|Yv\|$$

$\|Yv\|^2$ is proportional to the variance of the data measured along v.



Detection II

2 The k-th principal component $v_k$ is:

$$v_k = \arg\max_{\|v\|=1} \Big\| \Big( Y - \sum_{i=1}^{k-1} Y v_i v_i^T \Big) v \Big\|$$

- The matrix $P = [v_1, v_2, \dots, v_k]$ is formed by the first k principal components, which capture the dominant variance in the data

I Separate normal and anomalous network-wide traffic

$$Y = \hat{Y} + \tilde{Y}$$

$$\hat{Y} = PP^T Y, \quad \tilde{Y} = (I - PP^T) Y$$

- $\tilde{Y}$: contains residual traffic.

- Volume anomaly will result in a large spike in $\tilde{Y}$
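A minimal, dependency-free sketch of the subspace idea with k = 1. It assumes each row of the link matrix is one time step, centers the columns first (a common preprocessing step that the slides do not state), finds the first principal component by power iteration, and reports the squared norm of each row's residual after projection. The function names and the toy two-link data are hypothetical.

```python
import math


def center_columns(Y):
    """Subtract each column (link) mean; assumed preprocessing."""
    m, n = len(Y), len(Y[0])
    means = [sum(row[j] for row in Y) / m for j in range(n)]
    return [[row[j] - means[j] for j in range(n)] for row in Y]


def first_principal_component(Y, iters=200):
    """Power iteration on Y^T Y: unit vector v maximizing ||Y v||.

    Assumes the starting vector is not orthogonal to the top component.
    """
    m, n = len(Y), len(Y[0])
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        Yv = [sum(row[j] * v[j] for j in range(n)) for row in Y]
        w = [sum(Y[i][j] * Yv[i] for i in range(m)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def residual_energy(Y, v):
    """Squared norm of (I - v v^T) y for every row y of Y."""
    energies = []
    for row in Y:
        proj = sum(r * c for r, c in zip(row, v))
        resid = [r - proj * c for r, c in zip(row, v)]
        energies.append(sum(x * x for x in resid))
    return energies


# Toy data: two links carrying the same traffic, with a volume spike on
# the second link at time step 10.
Y = [[float(t), float(t)] for t in range(20)]
Y[10] = [10.0, 16.0]

Yc = center_columns(Y)
v1 = first_principal_component(Yc)
spe = residual_energy(Yc, v1)
anomalous_row = max(range(len(spe)), key=spe.__getitem__)
print(anomalous_row)
```

In practice k is chosen so that the first k components capture the dominant variance, and the residual energy is compared against a threshold rather than simply taking the largest value.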


Outline

Introduction

Outlier Detection

Sequential Change Point Detection

High Dimensional Data

Summary and Outlook


Sequential Change Point Detection

To detect a change point in a time series $\{X_1, X_2, \dots, X_n\}$, it is assumed that the pre-change density is $f_0$, and if a change occurs at time ν, then the post-change density becomes $f_1$ beginning from moment ν + 1.

The hypotheses are then formulated as:

$H_0$: $\{X_1, X_2, \dots, X_n\} \sim f_0$

$H_1$: $\{X_1, X_2, \dots, X_\nu\} \sim f_0$, and $\{X_{\nu+1}, X_{\nu+2}, \dots, X_n\} \sim f_1$

The Change Point Detection Problem is to decide

(1) which hypothesis is true?

(2) if $H_1$ is true, ν = ?

The time instance ν at which the state of the process changes is referred to as the change point or time of change.



Algorithms

I Cumulative Sum Algorithm (CUSUM)

I Shiryaev-Roberts Procedure

I Sliding Window Algorithm

Performance Metrics

I False positive and false negative rates

- If a change occurred but the detection procedure failed to detect it: false negative (misdetection)

- If the detection time N < ν: false positive (false alarm)

I Detection delay

- If there is a true change, the time of change is ν, and the detection time is N, then the detection delay is τ = N − ν


CUSUM (Page, 1954)

Optimality

CUSUM is optimal in the sense of minimizing worst case detection delay.

Assumptions

I Observations $X_1, X_2, \dots, X_n$ are independent, iid pre-change and iid post-change

I Probability density functions: f0 before change; f1 after change

I Assume f0 and f1 are known

I The only thing unknown is ν, the time of change

Parameter

I Detection threshold h


CUSUM

Parametric CUSUM: Based on Maximum Likelihood Principle

Detection statistic:

$$W_n = \max(W_{n-1} + Z_n, 0) \text{ for } n \ge 1$$

where $W_0 = 0$ and $Z_n = \log L_n$.

$L_n$ is the likelihood ratio: $L_n = \frac{f_1(X_n \mid X_1^{n-1})}{f_0(X_n \mid X_1^{n-1})}$, or $L_n = \frac{f_1(X_n)}{f_0(X_n)}$ for i.i.d. observations.

The procedure declares a change as soon as the detection statistic $W_n$ exceeds a preset threshold h:

$$N = \min\{n \ge 1 : W_n \ge h\}$$
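The recursion above can be sketched as follows for the i.i.d. case. The choice of densities is an assumption made for illustration: $f_0$ and $f_1$ are taken to be Gaussian with a common standard deviation and means `mu0` and `mu1`, which gives a closed-form log-likelihood ratio. The function names and sample data are hypothetical.

```python
def gaussian_log_lr(x, mu0, mu1, sigma):
    """Z_n = log f1(x)/f0(x) for N(mu1, sigma^2) versus N(mu0, sigma^2)."""
    return (mu1 - mu0) / sigma ** 2 * (x - (mu0 + mu1) / 2.0)


def cusum(xs, mu0, mu1, sigma, h):
    """Return the stopping time N = min{n >= 1 : W_n >= h}, or None."""
    w = 0.0  # W_0 = 0
    for n, x in enumerate(xs, start=1):
        w = max(w + gaussian_log_lr(x, mu0, mu1, sigma), 0.0)
        if w >= h:
            return n
    return None


# Noise-free toy series: the mean shifts from 0 to 1 after sample 10.
xs = [0.0] * 10 + [1.0] * 10
alarm = cusum(xs, mu0=0.0, mu1=1.0, sigma=1.0, h=2.0)
print(alarm)
```

Raising h lowers the false alarm rate at the cost of a longer detection delay, which is the trade-off named under Performance Metrics.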


Shiryaev-Roberts Procedure

Roberts (Aug., 1966); Shiryayev (1963)

The Problem Setting

I Original Setting: "Quickest Detection of a Disorder in a Stationary Regime"

- The change is possibly taking place at a far horizon

I A randomized version for a general discrete time setting

I Applications: target detection and tracking, rapid detection of intrusions in communication networks, environmental monitoring

- Early detection of changes that may occur in a distant future



The Algorithm

1. Shiryaev-Roberts statistic: $R_n = \sum_{k=1}^{n} \frac{p(X_1, \dots, X_n \mid \nu = k)}{p(X_1, \dots, X_n \mid \nu = \infty)}$

2. From the independence assumption: $R_n = \sum_{k=1}^{n} \prod_{i=k}^{n} \frac{f_1(X_i)}{f_0(X_i)}$

3. $R_n$ can be computed recursively: $R_n = (1 + R_{n-1}) \frac{f_1(X_n)}{f_0(X_n)}$, for n ≥ 1; $R_0 = 0$

4. Stopping time: $N_{A_B} = \min\{n \ge 1 : R_n \ge A_B\}$

Parameter: $A_B$ is chosen such that $E_\infty N_{A_B} = B$

B is a preset value before surveillance begins.
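The sketch below illustrates the recursion in step 3 and checks it against the direct double sum in step 2. As an assumption for illustration, the likelihood ratio is that of a unit-variance Gaussian mean shift from 0 to 1. The threshold passed to `sr_stopping_time` is an arbitrary stand-in for $A_B$; calibrating it so that $E_\infty N_{A_B} = B$ is not shown. All names are hypothetical.

```python
import math


def likelihood_ratio(x, mu0=0.0, mu1=1.0, sigma=1.0):
    """f1(x)/f0(x) for an assumed Gaussian mean shift."""
    return math.exp((mu1 - mu0) / sigma ** 2 * (x - (mu0 + mu1) / 2.0))


def sr_recursive(xs):
    """R_n = (1 + R_{n-1}) * L_n with R_0 = 0; returns [R_1, ..., R_n]."""
    r, out = 0.0, []
    for x in xs:
        r = (1.0 + r) * likelihood_ratio(x)
        out.append(r)
    return out


def sr_direct(xs):
    """R_n as the sum over k of the product of L_i for i = k..n."""
    total = 0.0
    for k in range(len(xs)):
        prod = 1.0
        for x in xs[k:]:
            prod *= likelihood_ratio(x)
        total += prod
    return total


def sr_stopping_time(xs, threshold):
    """First n with R_n >= threshold, or None if never reached."""
    for n, r in enumerate(sr_recursive(xs), start=1):
        if r >= threshold:
            return n
    return None


sample = [0.1, -0.3, 0.2, 1.2, 0.9]
print(sr_recursive(sample)[-1], sr_direct(sample))  # the two should agree

xs = [0.0] * 10 + [1.0] * 10
print(sr_stopping_time(xs, threshold=20.0))
```

The recursive form needs constant memory per observation, which is what makes the procedure usable online.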



Detection Delay

Shiryaev-Roberts procedure is the best in terms of minimizing the expected detection delay (asymptotically).

Theorem

Shiryaev-Roberts procedure minimizes

$$\sum_{k=1}^{\infty} E_k (N - k)^+$$

over all stopping times N that satisfy $E_\infty(N) \ge B$.

CUSUM and S-R Procedure

I Based on ratio of likelihoods

- S-R procedure is a CUSUM-type of algorithm

I Difficult to apply when f1 and f0 are unknown


Sliding Window Algorithm

Preset Parameters

I Window size m (m ≪ N, the total number of data points)

I Significance level α (e.g., α = 0.05)

Sliding Window Algorithm (Cheng et al., 2016)

1. Set window offset d = 0.

2. Compute the sums $S_1 = \sum_{i=d+1}^{d+m} X_i$ and $S_2 = \sum_{i=d+m+1}^{d+2m} X_i$.

3. If $|S_2 - S_1| \ge z\sigma\sqrt{2m}$, declare a change point ν = d + m.

4. Else set d = d + 1 and go to step 2.

Remarks:

I z is the critical value that provides an area of α in the upper tail of the standard normal distribution.

I σ2 is the variance, updated as the window moves
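The four steps can be sketched as below. The slides say only that the variance is updated as the window moves; estimating it as the sample variance of the first window is an assumption of this sketch, as are the function name and the toy data. The threshold follows from the fact that, for independent samples with no change, $S_2 - S_1$ has variance $2m\sigma^2$.

```python
import math
from statistics import NormalDist, variance


def sliding_window_change_point(xs, m, alpha=0.05):
    """Return the declared change point nu = d + m, or None.

    xs[d:d+m] plays the role of X_{d+1..d+m}; xs[d+m:d+2m] that of
    X_{d+m+1..d+2m}. A first window with zero variance yields a zero
    threshold, so constant data should be handled separately.
    """
    z = NormalDist().inv_cdf(1.0 - alpha)  # upper-tail critical value
    d = 0
    while d + 2 * m <= len(xs):
        first = xs[d:d + m]
        second = xs[d + m:d + 2 * m]
        sigma = math.sqrt(variance(first))  # assumed variance estimate
        if abs(sum(second) - sum(first)) >= z * sigma * math.sqrt(2 * m):
            return d + m
        d += 1
    return None


# Toy series: alternating +/-0.5 around 0, then around 5 after sample 30.
xs = [0.5 if i % 2 == 0 else -0.5 for i in range(30)]
xs += [5.5 if i % 2 == 0 else 4.5 for i in range(30)]

nu = sliding_window_change_point(xs, m=6)
print(nu)
```

Because the alarm fires as soon as the second window contains enough post-change samples, the declared ν can precede the true change by up to m − 1 samples.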


Sliding Window Algorithm

Algorithm Properties

I Be able to detect a change in state without knowing the actual pre- and post-change densities

I Relate detection threshold to a tolerable false alarm rate: controlled trade-off

I Relate detection threshold to the dynamic characteristics of the data and not use a preset value

I Be able to detect abrupt changes as well as slow and subtle changes

I Avoid mistaking an isolated outlier as a change for a new state


Applications Using Sequential Change Point Detection

I DoS Attack Detection

- SYN flood attack

I Attack Detection in Wireless Networks

- Network layer

- MAC layer

- Physical layer

I Power Grid Anomaly Detection


DoS Attack Detection

A Common DoS Attack: SYN Flood Attack

I Attacker sends control packets to compromised nodes

I A large number of flooding sources send an excessive number of SYN requests to the victim

I The victim server returns a SYN/ACK packet to the client and waits for the ACK until timeout

I Flooding sources never return an ACK

I Exhaust the victim server's backlog queue → all connection requests dropped

Challenges: preset threshold

I Traffic patterns vary from site to site, from time to time

I Per-flow state information not known

I Normal traffic models hard to define



How to detect w/o prior knowledge of flow and traffic info?

Detection mechanism must be insensitive to site and traffic patterns.

I There is no normal traffic model or flow rate, but there is normal behavior

I Baseline: protocol behavior (TCP connection management)

- Normal: FINs match with SYN requests from clients

- Packet drop/retransmission cause small discrepancy

- Under SYN flood attack: Large difference between the number of SYNs and FINs received


SYN Flood Attack

I Attackers create a large number of "open" connections

I Change in network measurement: |SYN − FIN| shows abrupt increase



Detection Procedure

I Monitor the number of SYNs and FINs

- at egress router (near the flooding source)

- at ingress router (near the victim server)

I Generate time series on (SYNs–FINs)

I Perform sequential change point detection on time series

- Non-parametric version



Non-Parametric CUSUM for Change Point Detection (Wang et al., 2002)

I Tunable parameters: a, N

I Observations: S: number of SYNs; F: number of FINs

1. $D_n = S_n - F_n$

2. $R_n = \alpha R_{n-1} + (1 - \alpha) F_n$

3. $X_n = D_n / R_n$

4. Choose constant $a > E(X_n)$

5. Test statistic: $y_n = (y_{n-1} + (X_n - a))^+$, $y_0 = 0$

6. Detection: first n such that $y_n > N$

Remarks

I Algorithm very sensitive to N and a.

I Difficulty: determining N and a before monitoring begins.
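The six steps translate almost line for line into code. In the sketch below, the function name, the initialization of the running average $R_0$ with the first FIN count, and the toy counts are assumptions; the slides do not specify them. Every FIN count is assumed positive so that the normalization in step 3 is defined.

```python
def syn_flood_alarm(syns, fins, a, threshold, alpha=0.5):
    """Return the first period n with y_n > threshold, or None.

    syns[n-1], fins[n-1]: SYN and FIN counts in observation period n.
    a: offset chosen above the normal mean of X_n.
    threshold: the detection threshold called N on the slide.
    """
    r = float(fins[0])  # assumed initial value R_0
    y = 0.0             # y_0 = 0
    for n, (s, f) in enumerate(zip(syns, fins), start=1):
        d = s - f                          # step 1: D_n
        r = alpha * r + (1.0 - alpha) * f  # step 2: smoothed FIN level
        x = d / r                          # step 3: normalized difference
        y = max(y + (x - a), 0.0)          # step 5: one-sided CUSUM
        if y > threshold:                  # step 6
            return n
    return None


# Toy trace: balanced SYN/FIN counts, then a flood starting in period 11.
fins = [100] * 20
syns = [100] * 10 + [300] * 10
alarm = syn_flood_alarm(syns, fins, a=1.0, threshold=3.0)
print(alarm)
```

Normalizing by the smoothed FIN count is what makes the statistic insensitive to the absolute traffic volume of a site.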


Wormhole Attack in Wireless Ad Hoc Networks

Routing: A category of routing protocols use shortest path routing.

I Nodes exchange local information and relay to others

I Nodes collectively decide a route towards a destination

I Select the ”best route” based on hop count (shortest path routing)

Wormhole Attack

I Adversary controls two end points and a tunnel between them

I Attract traffic to go through the controlled wormhole tunnel by making a false route advertisement: a shorter path towards a destination


In-Band vs. Out-Band Wormhole Attack

In-Band Wormhole Attack

I Wormhole tunnel consists of other wireless nodes controlled by the adversary

I Re-routed packets go through these wireless nodes

Out-Band Wormhole Attack

I Wormhole tunnel is an external link

– A wired link

– A wireless link (e.g., a long-range directional link)

This talk: address in-band wormhole attack


Wormhole Attack Detection

Performance Degradation in an In-Band Wormhole Attack

I End-to-end delay increases

I Throughput decreases

I Packet delivery ratio drops (if the wormhole endpoints drop packets arbitrarily)

I and more . . .

Proposed Method

I Model the end-to-end delay of a flow as a time series

I Perform Change Point Detection on the time series to detect the change


Stationary Network — Setup

I An in-band wormhole tunnel is established between node 1 and node 2 at 50 seconds.

I Wormhole tunnel: 1-19-25-23-2, advertised as one hop 1-2

I Two flows, without other traffic in the background

I Flow 18 → 28:

– Before 50 seconds: use path 18-9-34-35-37-28

– After 50 seconds: use path 18-1 ... 2-28

I Flow 17 → 38:

– Before 50 seconds: use path 17-14-21-16-38

– After 50 seconds: use path 17-1 ... 2-38


Stationary Network— Result I

Figure: End-to-end delay (seconds) versus simulation time (seconds), one panel per flow: 18 → 28 and 17 → 38.


Stationary Network— Result II

I Three flows that changed routes: 9 → 24, 18 → 28, 17 → 38

I There are other flows in the background that stayed on the original routes

Figure: Packet size 256B, interval=0.01s, 0.025s. End-to-end delay (s) versus simulation time (s) for flows 9 → 24, 18 → 28, 17 → 38.

Figure: Packet size 256B, interval=0.02s, 0.01s. End-to-end delay (s) versus simulation time (s) for flows 9 → 24, 18 → 28, 17 → 38.


MAC-Layer Attack Detection in Wireless Networks

IEEE 802.11 MAC

I CSMA/CA

I RTS-CTS-DATA-ACK

Figure: (a) Computer Networking, Kurose & Ross; (b)


MAC Layer Misbehavior in IEEE 802.11 Networks

Sender Selfish Behavior

I Manipulation on carrier sense time

I Manipulation on back-off value during contention

Consequences

I Channel-capturing effect: other nodes have less chance to transmit

Receiver Selfish Behavior

I RTS dropping attack

Consequences

I Clear channel for itself

I Sender wastes resources retransmitting RTS


MAC Layer Misbehavior Detection

Other Flows Experience Performance Degradation

I End-to-end delay increases

I Throughput decreases

I Packet interval increases

Detection Method

I Monitor packets received

I Compute per-flow end-to-end delay (or throughput, packet interval) as a time series

I Use the sliding window change point detection method to detect the change on time series


Simulation Setup

I Case 1: Shorter DIFS attack

normal sender: DIFS>SIFS

attacker: switch to DIFS=SIFS starting at 50s

I Case 2: Shorter DIFS and smaller back-off window γ

normal sender: following binary exponential back-off, γ ∈ [32, 1024]

attacker: use fixed γ = 2

I Case 3: RTS dropping attack

normal receiver: respond CTS for every RTS request

attacker: RTS to CTS ratio 20:1


MAC Layer Misbehavior Detection Results

I Case 1: Five victim flows 19 → 1, 14 → 1, 12 → 1, 10 → 1, 6 → 1

I Case 2: Same as case 1

I Case 3: 20 → 2, 11 → 2, 8 → 2, 7 → 2, 5 → 2

I Node 2 is the attacker in all cases

Figure: network topology with nodes 1 to 20.


Result I: Case 1

Figure: (d) Delay (s); (e) Throughput (kbps); (f) Delay mean µD (s); (g) Throughput mean µT (kbps); each plotted over the interval 0 to 100.


Result II: Case 2

Figure: (h) Delay (s); (i) Throughput (kbps); (j) Delay mean µD (s); (k) Throughput mean µT (kbps); each plotted over the interval 0 to 100.


Result III: Case 3

Figure: (l) Throughput: Tx data rate (kbps); (m) Packet Interval: transmitted packet interval (s); (n) µT: Tx data rate mean (kbps); (o) µI: cumulative average of Tx packet interval (s); each plotted over the interval 0 to 100.


Jamming Attack Detection in Wireless Networks

Attacks

I All nodes exposed to open medium

I Jamming signals: using higher transmission power, do not have to follow MAC protocol

I Legitimate nodes suffer

- TDMA: collision, increased packet error rate and drop rate

- CSMA: collision, channel capturing

Detection Procedure

I Detect changes from network measurements (delay, throughput, error rate, packet delivery ratio, signal strength, IFS, etc.)

I Distinguish

- Jamming vs. weak signals from legitimate nodes

- Jamming vs. network congestion among legitimate nodes


Jamming Attack Detection in Wireless Networks

Detection Methods

I Use summary information in a time interval, compare against a preset detection threshold

- Not suitable for highly dynamic networks

I Use change point detection on time series

- Test statistic: delay, throughput, received packets IFS

e.g.: Throughput when jamming signal duration varies

Figure: Throughput over the interval 0 to 100 for jamming signal durations (p) 0.0005s, (q) 0.8s, (r) 1.5s.


Anomaly Detection in Power Grids

Types of Anomalies

I Line outage

- wild animals

- weather

- over-grown trees

- coupled with aging infrastructure + lack of maintenance

I Generator outage

I Transformer fault

I Human errors

I Cyber attacks


Error Detection in Power Grids

Methods

I Detection Based on State Estimation (WLS)

- Works for measurement errors (e.g., bad data detection)

- Line outage: topology change often causes conforming errors

I Other Detection Methods

- When the system matrix is known, e.g., (Wu et al., 2016)

- When the system matrix is unknown:

Real-time change point detection + anomaly identification


Real-Time Anomaly Detection in Power Grids

What Feature to Use?

Figure: 9-bus system diagram (generators G1 to G3, transformers T1 to T3, loads A to C, buses 1 to 9) and a plot of voltage phase angle (radians) over the interval 0 to 120 for bus 8, bus 9, and their difference (8-9).

Outline

Introduction

Outlier Detection

Sequential Change Point Detection

High Dimensional Data

Summary and Outlook

Outlier Detection for High-Dimensional Data

I Notion of proximity not straightforward

I High-dimensional sparse data: sparsity makes every data point an outlier

Outlier detection algorithms for high-dimensional data

I Naive brute force: exhaustive search (Slow!)

I Evolutionary algorithm (Aggarwal and Yu, 2001)

I Projection-based outlier detection (Huber, 1985)

- E.g., principal component analysis

Outlook

I Detecting outliers among missing data

I Fast method for detecting outliers among a mixture of categorical and continuous variables


Change Point Detection in High-Dimensional Data

A Common Challenge: Scalability

Algorithms that work for univariate or low-dimensional time series may not work for high-dimensional time series.

Methods

I Sum CUSUM statistic from each series (Mei, 2010)

I Sum the local likelihood ratio statistics, then form a CUSUM statistic (Tartakovsky and Veeravalli, 2008)
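The first method in the list, summing per-series CUSUM statistics, can be sketched as follows. This is an illustration of the idea rather than a reproduction of the cited scheme: every series is assumed to follow the same Gaussian mean-shift model with known parameters, and the function name, threshold, and toy data are hypothetical.

```python
def sum_cusum_alarm(streams, mu0, mu1, sigma, threshold):
    """Raise an alarm when the sum of per-stream CUSUM statistics
    reaches the threshold. Returns the 1-based alarm time or None.

    streams: list of equal-length lists, one per data stream.
    """
    w = [0.0] * len(streams)  # one local CUSUM statistic per stream
    for n in range(len(streams[0])):
        for j, stream in enumerate(streams):
            z = (mu1 - mu0) / sigma ** 2 * (stream[n] - (mu0 + mu1) / 2.0)
            w[j] = max(w[j] + z, 0.0)
        if sum(w) >= threshold:
            return n + 1
    return None


# Three noise-free streams whose mean shifts from 0 to 1 after sample 10.
streams = [[0.0] * 10 + [1.0] * 10 for _ in range(3)]
alarm = sum_cusum_alarm(streams, mu0=0.0, mu1=1.0, sigma=1.0, threshold=4.0)
print(alarm)
```

Each stream needs only its own scalar statistic, so the cost per time step grows linearly with the number of streams.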

Assumptions

I Non-Structural problems

I Post-change distributions are prescribed

I All series are affected by the change



I Non-Structural Problem

- No spatial model relating the signal to observations at various locations

- Other work: Chen et al. (Aug. 2010); Petrov et al. (2003); Levy-Leduc and Roueff (2009)

I Structural Problem

- Has a spatial structure relating the signal to observations at various locations

- Other work: Rabinowitz (1994); Shafie et al. (2003); Siegmund and Yakir (2008)



Additional Challenge I: with missing data

I Detecting changes from high dimensional time series with missing data (Xie et al., 2013)

- Use non-parametric submanifold model

- Extract univariate detection test statistics from high-dimensional data

Additional Challenge II: change affects only a small subset of time series

I M ≪ N, M is unknown, the subset is unknown

I Unknown and non-homogeneous amplitudes at different series

I Xie and Siegmund (2012) developed a mechanism to suppress noise from unaffected sensors

- First, compute a generalized likelihood ratio (GLR) for each series, and use it to suppress noise from non-affected sensors

- Then, sum the GLRs to compare to a detection threshold


Outline

Introduction

Outlier Detection

Sequential Change Point Detection

High Dimensional Data

Summary and Outlook

Summary and Outlook

Anomaly Detection in Cyber Physical Systems

I Inherently high-dimensional

I Heterogeneous data streams

I Both structural problems and non-structural problems exist

- Non-structural: some cyber attacks

- Structural: some nature-induced faults in physical systems

I Real-time requirement

I False positives, false negatives, detection delay

I Causal analysis + anomaly analysis for meaningful results


Thank You!

Polunchenko and Tartakovsky (2012); Lakhina et al. (2004b); Siegmund and Venkatraman (1995); Yamada et al. (2013); Lorden (1971); Shiryayev (1963); Roberts (Aug., 1966); Pollak (1985); Pollak and Tartakovsky (2009); Zhu and Abur (July 2007); Abur and Expósito (2004); Merrill and Schweppe (1971); Handschin et al. (1975); Mili et al. (May 1996); Rousseeuw and Leroy (1987); Kotiuga and Vidyasagar (April 1982); Falcao and Assis (Aug. 1988); Abur and Celik (Feb. 1991); Clements et al. (Feb. 1991); Singh and Alvarado (Aug. 1994); Wei et al. (May 1998); Mili et al. (Nov. 1999); Pajic and Clements (Nov. 2005); Aboytes and Cory (June 1975); Garcia et al. (Sept. 1979); Xiang et al. (July 1981); Mili et al. (Nov. 1984); Singh and Alvarado (Aug. 1995); Lourenco et al. (May 2004); Diao et al. (2009); Tate and Overbye (2008); Zhu and Abur (2010); He and Zhang (2010); Zhang and Kezunovic (2006); He and Zhang (2011); Barnett and Lewis (1994); Ramaswamy et al. (2000); Aggarwal and Yu (2001); Angiulli and Pizzuti (2002); Hu and Sung (2003); Hodge and Austin (2004); Zhu et al. (2005); Angiulli and Fassetti (2009); Chandola et al. (2009); Lakhina et al. (2004a); Huang et al. (2006)

References I

F. Aboytes and B. J. Cory. Identification of measurement and configuration errors in static estimation. In Proc. 9th Power Industry Computer Application Conference, New Orleans, pages 298–302, June 1975.

A. Abur and M. K. Celik. A fast algorithm for the weighted least absolute value state estimation. Power Systems, IEEE Transactions on, 6(2):1–8, Feb. 1991.

Ali Abur and Antonio Gómez Expósito. Power System State Estimation: Theory and Implementation. CRC, 2004.

Charu C. Aggarwal and Philip S. Yu. Outlier detection for high dimensional data. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, SIGMOD '01, pages 37–46, 2001.

Fabrizio Angiulli and Fabio Fassetti. Dolphin: An efficient algorithm for mining distance-based outliers in very large datasets. ACM Trans. Knowl. Discov. Data, 3(1):4:1–4:57, March 2009.

Fabrizio Angiulli and Clara Pizzuti. Fast outlier detection in high dimensional spaces. In Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery, PKDD '02, pages 15–26, 2002.

V. Barnett and T. Lewis. Outliers in Statistical Data. John Wiley and Sons, NY, 1994.

Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. LOF: Identifying density-based local outliers. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, SIGMOD '00, pages 93–104, New York, NY, USA, 2000. ACM. ISBN 1-58113-217-4. doi: 10.1145/342009.335388. URL http://doi.acm.org/10.1145/342009.335388.

Varun Chandola, Arindam Banerjee, and Vipin Kumar. Anomaly detection: A survey. ACM Comput. Surv., 41(3):15:1–15:58, 2009.

M. Chen, S. Gonzalez, A. Vasilakos, H. Cao, and V. C. M. Leung. Body area networks: a survey. Mobile Networks and Applications, Aug. 2010.

M. X. Cheng, Y. Ling, and W. B. Wu. In-band wormhole detection in wireless ad hoc networks using change point detection method. In 2016 IEEE International Conference on Communications (ICC), pages 1–6, May 2016. doi: 10.1109/ICC.2016.7510989.

K. A. Clements, P. W. Davis, and K. D. Frey. An interior point algorithm for weighted least absolute value state estimation. In IEEE Power Engineering Society Winter Meeting, Feb. 1991.

R. Diao, Kai Sun, V. Vittal, R.J. O'Keefe, M.R. Richardson, N. Bhatt, D. Stradford, and S.K. Sarawgi. Decision tree-based online voltage security assessment using PMU measurements. Power Systems, IEEE Transactions on, 24(2):832–839, May 2009. ISSN 0885-8950. doi: 10.1109/TPWRS.2009.2016528.


References II

D. Falcao and S. M. Assis. Linear-programming state estimation error analysis and gross error identification. Power Systems, IEEE Transactions on, 3(3):809–815, Aug. 1988.

A. Garcia, A. Monticelli, and P. Abreu. Fast decoupled state estimation and bad data processing. IEEE Trans. On Power Apparatus and Systems, 98(5):1645–1652, Sept. 1979.

E. Handschin, F.C. Schweppe, J. Kohlas, and A. Fiechter. Bad data analysis for power system state estimation. IEEE Trans. On Power Apparatus and Systems, PAS-94:329–337, 1975.

D.M. Hawkins. Identification of Outliers. Monographs on applied probability and statistics. Chapman and Hall, 1980. ISBN 9780412219009. URL https://books.google.com/books?id=fb0OAAAAQAAJ.

Miao He and Junshan Zhang. Fault detection and localization in smart grid: A probabilistic dependence graph approach. In Smart Grid Communications (SmartGridComm), 2010 First IEEE International Conference on, pages 43–48, Oct 2010. doi: 10.1109/SMARTGRID.2010.5622016.

Miao He and Junshan Zhang. A dependency graph approach for fault detection and localization towards secure smart grid. Smart Grid, IEEE Transactions on, 2(2):342–351, June 2011. ISSN 1949-3053. doi: 10.1109/TSG.2011.2129544.

Victoria Hodge and Jim Austin. A survey of outlier detection methodologies. Artif. Intell. Rev., 22(2):85–126, October 2004.

Tianming Hu and Sam Y. Sung. Detecting pattern-based outliers. Pattern Recogn. Lett., 24(16):3059–3068, December 2003.

Ling Huang, Michael I. Jordan, Anthony Joseph, Minos Garofalakis, and Nina Taft. In-network PCA and anomaly detection. In NIPS, pages 617–624. MIT Press, 2006.

Peter J. Huber. Projection pursuit. Ann. Statist., 13(2):435–475, 1985.

Edwin M. Knorr and Raymond T. Ng. Algorithms for mining distance-based outliers in large datasets. In Proceedings of the 24th International Conference on Very Large Data Bases, VLDB '98, pages 392–403, San Francisco, CA, USA, 1998. Morgan Kaufmann Publishers Inc. ISBN 1-55860-566-5. URL http://dl.acm.org/citation.cfm?id=645924.671334.

W. W. Kotiuga and M. Vidyasagar. Bad data rejection properties of weighted least absolute value techniques applied to static state estimation. IEEE Trans. On Power Apparatus and Systems, PAS-101:844–851, April 1982.

Anukool Lakhina, Mark Crovella, and Christophe Diot. Diagnosing network-wide traffic anomalies. In Proceedings of the 2004 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, SIGCOMM '04, pages 219–230, New York, NY, USA, 2004a. ACM. ISBN 1-58113-862-8. URL http://doi.acm.org/10.1145/1015467.1015492.


References III

Anukool Lakhina, Mark Crovella, and Christophe Diot. Diagnosing network-wide traffic anomalies. In Proceedings of the 2004 Conferenceon Applications, Technologies, Architectures, and Protocols for Computer Communications, SIGCOMM ’04, pages 219–230, 2004b.

C. Levy-Leduc and F. Roueff. Detection and localization of change-points in high-dimensional network traffic data. The Annals of Applied Statistics, 3(2):637–662, 2009.

G. Lorden. Procedures for reacting to a change in distribution. Ann. Math. Statist., 42(6):1897–1908, 1971.

E. M. Lourenco, A. S. Costa, and K. A. Clements. Bayesian-based hypothesis testing for topology error identification in generalized state estimation. Power Systems, IEEE Transactions on, 19(2):1206–1215, May 2004.

Y. Mei. Efficient scalable schemes for monitoring a large number of data streams. Biometrika, 97(2):419–433, 2010.

H. M. Merrill and F. C. Schweppe. Bad data suppression in power system static state estimation. IEEE Trans. on Power Apparatus and Systems, PAS-90:2718–2725, 1971.

L. Mili, M. G. Cheniae, N. S. Vichare, and P. J. Rousseeuw. Robustification of the least absolute value estimator by means of projection statistics. Power Systems, IEEE Transactions on, 11(1):216–225, Feb. 1996.

L. Mili, M. Cheniae, N. Vichare, and P. Rousseeuw. Robust state estimation based on projection statistics. Power Systems, IEEE Transactions on, 11(2):1118–1127, May 1996.

L. Mili, Th. Van Cutsem, and M. Ribbens-Pavella. Hypothesis testing identification: A new method for bad data analysis in power system state estimation. IEEE Trans. on Power Apparatus and Systems, 103(11):3239–3252, Nov. 1984.

L. Mili, G. Steeno, F. Dobraca, and D. French. A robust estimation method for topology error identification. Power Systems, IEEE Transactions on, 14(4):1469–1476, Nov. 1999.

S. Pajic and K. A. Clements. Robustification of the least absolute value estimator by means of projection statistics. Power Systems, IEEE Transactions on, 20(4):1683–1689, Nov. 2005.

A. Petrov, B. L. Rozovskii, and A. G. Tartakovsky. Efficient nonlinear filtering methods for detection of dim targets by passive systems. Multitarget-Multisensor Tracking: Applications and Advances, 2003.

Moshe Pollak. Optimal detection of a change in distribution. Ann. Statist., 13(1):206–227, 1985.

Moshe Pollak and Alexander G. Tartakovsky. Optimality properties of the Shiryaev-Roberts procedure. Statistica Sinica, 19(4):1729–1739, 2009.

References IV

A. Polunchenko and A. G. Tartakovsky. State-of-the-art in sequential change-point detection. Methodol. Comput. Appl. Probab., 14(3):649–684, 2012.

Daniel Rabinowitz. Detecting clusters in disease incidence. Lecture Notes-Monograph Series, 23, Change-point Problems:255–275, 1994.

Sridhar Ramaswamy, Rajeev Rastogi, and Kyuseok Shim. Efficient algorithms for mining outliers from large data sets. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, SIGMOD ’00, pages 427–438, 2000.

S. W. Roberts. A comparison of some control chart procedures. Technometrics, 8(3):411–430, Aug. 1966.

P. J. Rousseeuw and A. M. Leroy. Robust Regression and Outlier Detection. John Wiley, 1987.

K. Shafie, B. Sigal, D. Siegmund, and K. Worsley. Rotation space random fields with an application to fMRI data. Ann. Statist., 31:1732–1771, 2003.

A. N. Shiryayev. On optimal methods in earliest detection problems. Theory of Probab. and Appl., 8:26–51, 1963.

D. Siegmund and E. S. Venkatraman. Using the generalized likelihood ratio statistic for sequential detection of a change-point. Ann. Statist., 23(1):255–271, 1995.

D. O. Siegmund and B. Yakir. Detecting the emergence of a signal in a noisy image. Statistics and Its Interface, 1:3–12, 2008.

H. Singh and F. Alvarado. Weighted least absolute value state estimation using interior point methods. Power Systems, IEEE Transactions on, 9(3):1478–1484, Aug. 1994.

H. Singh and F. L. Alvarado. Network topology determination using least absolute value state estimation. Power Systems, IEEE Transactions on, 10(3):1159–1165, Aug. 1995.

A. G. Tartakovsky and V. V. Veeravalli. Asymptotically optimal quickest change detection in distributed sensor systems. Sequential Analysis, 27(4):441–475, 2008.

J.E. Tate and T.J. Overbye. Line outage detection using phasor angle measurements. Power Systems, IEEE Transactions on, 23(4):1644–1652, Nov 2008. ISSN 0885-8950. doi: 10.1109/TPWRS.2008.2004826.

Haining Wang, Danlu Zhang, and Kang G. Shin. Detecting SYN flooding attacks. In Proceedings. Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies, volume 3, pages 1530–1539, June 2002. doi: 10.1109/INFCOM.2002.1019404.

H. Wei, H. Sasaki, J. Kubokawa, and R. Yokoyama. An interior point method for power system weighted nonlinear l1 norm static state estimation. Power Systems, IEEE Transactions on, 13(2):617–623, May 1998.

References V

Wei Biao Wu, Maggie X. Cheng, and Bei Gou. A hypothesis testing approach for topology error detection in power grids. IEEE Internet of Things, 2016.

N. Xiang, S. Wang, and E. Yu. A new approach for detection and identification of multiple bad data in power system state estimation. IEEE Trans. on Power Apparatus and Systems, 101(2):454–462, Feb. 1982.

N. Xiang, S. Wang, and E. Yu. Estimation and identification of multiple bad data in power system state estimation. In Proc. 7th Power System Computation Conference, PSCC, Lausanne, July 1981.

Y. Xie and D. Siegmund. Sequential multi-sensor change-point detection. Ann. Statist., 2012.

Yao Xie, Jiaji Huang, and R. Willett. Change-point detection for high-dimensional time series with missing data. Selected Topics in Signal Processing, IEEE Journal of, 7(1):12–27, Feb. 2013.

Makoto Yamada, Akisato Kimura, Futoshi Naya, and Hiroshi Sawada. Change-point detection with feature selection in high-dimensional time-series data. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, IJCAI ’13, pages 1827–1833, 2013.

Nan Zhang and M. Kezunovic. Improving real-time fault analysis and validating relay operations to prevent or mitigate cascading blackouts. In Transmission and Distribution Conference and Exhibition, 2005/2006 IEEE PES, pages 847–852, May 2006. doi: 10.1109/TDC.2006.1668608.

Cui Zhu, Hiroyuki Kitagawa, and Christos Faloutsos. Example-based robust outlier detection in high dimensional datasets. In Proceedings of the Fifth IEEE International Conference on Data Mining, ICDM ’05, pages 829–832, 2005.

Jun Zhu and A. Abur. Improvements in network parameter error identification via synchronized phasors. Power Systems, IEEE Transactions on, 25(1):44–50, Feb. 2010. ISSN 0885-8950. doi: 10.1109/TPWRS.2009.2030274.

Jun Zhu and A. Abur. Bad data identification when using phasor measurements. In IEEE Power Tech Conference, pages 1676–1681, July 2007.
