Anomaly Detection in Cyber Physical Systems
Maggie Cheng
Illinois Institute of Technology
IEEE Big Data Conference, December 11, 2018
Seattle, WA
Outline
Introduction
Outlier Detection
Sequential Change Point Detection
High Dimensional Data
Summary and Outlook
Anomaly Analysis
I Unusual and significant changes in the network or CPS
I Necessary to detect, and remove/mitigate
I Analyzing anomalies from data is a big data analytics problem:
- large amount of data
- high speed
- high dimensional
- heterogeneous types
- noisy
I Involves extracting and interpreting anomalous patterns from data
- difficult to define a baseline for normal pattern
Anomaly Analysis
The steps in diagnosing an anomaly
1. Detection: binary, or real-valued anomaly score
- The time at which the anomaly is observed
2. Identification: selecting the anomaly type from a set of possible candidate anomalies
- Zero-day attack?
3. Localization
- Which link/node, component?
4. Quantification of impact
- A measure of the importance of the anomaly
The scope of this tutorial
I Detection
I Cyber + physical anomalies
Types of Anomalies
Outlier
I Aberrant observations that are considerably different from the majority are outliers.
- "An outlier is an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism" (Hawkins, 1980)
I Indicator of problems
I With or without a time dimension
Change Point on Time Series
I Abrupt change in time series
I Indicator of change of state in the underlying process
Types of Anomalies
Outlier
I Location outliers
I Scatter outliers
I Combination of both
Change Point
I Change in mean
I Change in variance
I Combination of both
Outlier vs. Change Point on Time Series
A change point may not be an outlier; an outlier may not be a change point.
Outlier Detection
Applications of Outlier Analysis
I Credit card fraud detection
I Intrusion detection in networks
I Bad data detection in power grid
- Observing abnormal values indicates measurement errors or errors in the generating process
I Many other applications ...
Evolution of Outlier Detection Techniques
I From univariate to multivariate (multi-dimensional)
I From one normal class to multiple normal classes (more than one generating mechanism underlying the data)
Outlier Detection
Detection Techniques
I Statistical outlier detection (Barnett and Lewis, 1994)
- Based on distribution
I Distance-based (Knorr and Ng, 1998)
- Based on notion of proximity
I Density-based (Breunig et al., 2000)
- Based on local outlier factor
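As a rough illustration of the distance-based notion of proximity, the sketch below flags a point as an outlier when at least a fraction p of the other points lie farther than a distance D from it. The function name `db_outliers`, the sample points, and the choice to count only the other points are assumptions made for this example, not details taken from the slides.

```python
import math

def db_outliers(points, p, dist):
    """Return indices of points for which at least a fraction p of the
    other points lie farther than `dist` away (distance-based outliers)."""
    outliers = []
    for i, a in enumerate(points):
        others = [b for j, b in enumerate(points) if j != i]
        far = sum(1 for b in others if math.dist(a, b) > dist)
        if others and far / len(others) >= p:
            outliers.append(i)
    return outliers

# Hypothetical data: four clustered points and one isolated point.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.1, 0.1), (5.0, 5.0)]
print(db_outliers(points, p=0.9, dist=1.0))
```

The brute-force loop is quadratic in the number of points, which is acceptable for a sketch but not for large datasets.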
Example I: Bad Data Detection in Power Systems
Weighted Least Square Based Method
Example I: Bad Data Detection in Power Systems
WLS-Based Error Detection in Power Systems
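I Linear WLS
$$\text{minimize } f = \|e\|^2 = e^T \cdot e = \sum_{i=1}^{m} w_i \Big( z_i - \sum_{j=1}^{n} a_{ij} x_j \Big)^2$$
I Non-Linear WLS
$$\text{minimize } f = \|e\|^2 = e^T \cdot e = \sum_{i=1}^{m} \frac{1}{\sigma_i^2} [z_i - h_i(x)]^2$$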
x: state variables
z: measurements
h(x): non-linear measurement functions
zi − hi(x): residual of the ith measurement
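For reference, the two objectives above have standard solution forms. The matrix names A = [a_ij], W, and H are notation introduced here for the sketch (they do not appear on the slides), and the derivation assumes the matrices being inverted are nonsingular.

```latex
% Linear WLS: closed form via the normal equations, with W = diag(w_1, ..., w_m)
\begin{aligned}
f(x) &= (z - Ax)^{T} W (z - Ax), \\
\nabla f(x) &= -2 A^{T} W (z - Ax) = 0
\;\Longrightarrow\;
\hat{x} = (A^{T} W A)^{-1} A^{T} W z .
\end{aligned}

% Non-linear WLS: Gauss-Newton iteration, with W = diag(1/\sigma_1^2, ..., 1/\sigma_m^2)
% and H = \partial h / \partial x evaluated at the current iterate x^{(k)}
x^{(k+1)} = x^{(k)} + (H^{T} W H)^{-1} H^{T} W \,[\, z - h(x^{(k)}) \,]
```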
Example I: Bad Data Detection in Power Systems
Weighted Least Square Based Method: Using χ2-Test
I Let $z_i$ be the measurement and $\hat{z}_i$ the anticipated value for $z_i$.
I Assume the residuals $X_i = z_i - \hat{z}_i$ are Gaussian and independent.
I Therefore, $\chi^2 = \sum_{i=1}^{n} (X_i/\sigma_i)^2$ follows a $\chi^2$ distribution with $\nu = n$ degrees of freedom.
I Given a significance level $\alpha$, if $\chi^2 > \chi^2_\alpha$ → bad data detected.
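The test above can be sketched in a few lines. The function names and the sample numbers are hypothetical; the critical value is passed in by the caller (here 7.815, the upper 5% point of a chi-square distribution with 3 degrees of freedom) rather than computed, since the Python standard library has no chi-square quantile function.

```python
def chi_square_statistic(measured, anticipated, sigmas):
    """Sum of squared standardized residuals (z_i - z_hat_i) / sigma_i."""
    return sum(((z - z_hat) / s) ** 2
               for z, z_hat, s in zip(measured, anticipated, sigmas))

def bad_data_detected(measured, anticipated, sigmas, critical_value):
    """Flag bad data when the statistic exceeds the chi-square critical value."""
    return chi_square_statistic(measured, anticipated, sigmas) > critical_value

CRITICAL_DF3_ALPHA_05 = 7.815  # chi-square, 3 degrees of freedom, alpha = 0.05

# Hypothetical measurements: the third reading in `bad` is grossly off.
anticipated = [1.0, 1.0, 1.0]
sigmas = [0.05, 0.05, 0.05]
clean = [1.02, 0.97, 1.01]
bad = [1.02, 0.97, 1.50]

print(bad_data_detected(clean, anticipated, sigmas, CRITICAL_DF3_ALPHA_05))
print(bad_data_detected(bad, anticipated, sigmas, CRITICAL_DF3_ALPHA_05))
```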
Assumptions in WLS
I Errors in measurements are Gaussian and independent.
I System topology model is correct.
Example II: Line Outage Detection in Power Systems
Topology Error: Discrepancy Between Assumed Model and Actual Model
Figure: IEEE 9-Bus Test System. (h) Assumed model, (i) Actual model
Example II: Line Outage Detection in Power Systems
System Matrix Known
I Hypothesis testing concerning the mean vector of the residuals
I Use the asymptotic distribution theory of Maxima
System Matrix Unknown
I Perform change point detection on time series → anomaly detected
I Identify and locate specific line outage
Example II: Line Outage Detection in Power Systems
A New Topology Error Detection Approach
x: state variables
z: measurements
h(x): non-linear measurement functions
$\hat{z}_i = h_i(\hat{x})$: the anticipated value for the ith measurement (evaluated at the state estimate $\hat{x}$)
$X_i = z_i - \hat{z}_i$: the residual of the ith measurement
I Without topology errors, it is expected that the residuals $X_i$ are normally distributed with zero means:
$$X_i \sim N(\mu_i, \sigma_i^2) \text{ and } \mu_i = 0, \ \forall i$$
I The problem of topology error detection is cast as a problem of testing the hypothesis of whether the mean vector of a stochastic process is zero.
Example II: Line Outage Detection in Power Systems
Hypothesis Testing Approach
I Input: the residuals for the n redundant measurements: $X_i = z_i - \hat{z}_i$, $i = 1, \ldots, n$.
I Hypotheses:
$H_0: \mu_1 = \ldots = \mu_n = 0$
$H_1: \exists i$ such that $\mu_i \neq 0$
New Test Statistic
Define $M_n = \max_{i \le n} |X_i|/\sigma_i$
I $M_n$ is the maximum standardized residual
I Hypothesis testing based on $M_n$: compute $M_n$ from data and compare $M_n$ with what is expected under the null hypothesis.
Hypothesis Testing
Hypothesis Testing Framework
Given a significance level $\alpha \in (0, 1)$, find a cutoff value $t_n$ such that under the null hypothesis $H_0$,
$$P(M_n > t_n) = \alpha$$
I If Mn > tn, reject H0
I How to find the cutoff value tn?
Cutoff Value tn
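Detection Threshold $t_n$: When $X_i$ Are Independent
From $M_n = \max_{i \le n} |X_i|/\sigma_i$ and $P(M_n > t_n) = \alpha$, we have
$$1 - \alpha = P(M_n \le t_n) = \prod_{i=1}^{n} P(|X_i|/\sigma_i \le t_n) = [1 - 2\Phi(-t_n)]^n$$
where $\Phi(x) = \int_{-\infty}^{x} \phi(u)\,du$ and $\phi(x) = (2\pi)^{-1/2} e^{-x^2/2}$.
Let $\Phi^{-1}$ be the inverse function of $\Phi$. Hence
$$t_n = -\Phi^{-1}\{[1 - (1 - \alpha)^{1/n}]/2\} \quad (1)$$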
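Equation (1) is straightforward to evaluate numerically. The sketch below uses the standard library's `statistics.NormalDist` for the inverse normal CDF; the function names are hypothetical, and the residuals passed to `flagged_indices` are assumed to be independent, as on the slide.

```python
import math
from statistics import NormalDist

def cutoff(n, alpha):
    """t_n = -Phi^{-1}( [1 - (1 - alpha)^(1/n)] / 2 ), equation (1)."""
    # expm1/log1p keep precision when alpha / n is tiny.
    p = -math.expm1(math.log1p(-alpha) / n) / 2.0
    return -NormalDist().inv_cdf(p)

def max_standardized_residual(residuals, sigmas):
    """M_n = max_i |X_i| / sigma_i."""
    return max(abs(x) / s for x, s in zip(residuals, sigmas))

def flagged_indices(residuals, sigmas, alpha):
    """0-based indices whose standardized residual exceeds t_n."""
    t_n = cutoff(len(residuals), alpha)
    return [i for i, (x, s) in enumerate(zip(residuals, sigmas))
            if abs(x) / s > t_n]

# With alpha = 0.001 and n = 183 this is close to the 4.546 reported
# for the IEEE 118-bus example in these slides.
print(cutoff(183, 0.001))
```

For n = 1 the cutoff reduces to the familiar two-sided normal critical value (about 1.96 at α = 0.05), which is a quick sanity check on the formula.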
Asymptotic Theory for Mn
When Xi are dependent
I Need to know the asymptotic distribution for Mn under dependence
Definition (Measure of Dependence)
I Let $X_1, X_2, \ldots, X_n$ be a non-stationary Gaussian process; let $r_{ij}$ be the correlation between $X_i$ and $X_j$, for $1 \le i, j \le n$.
I Define a skeleton index set
$$S(\delta) = \{(i, j) : |r_{ij}| > \delta, \ 1 \le i, j \le n\}$$
I The cardinality of the set S(δ) is a measure of dependence.
I Large |S(δ)| implies strong overall dependence.
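The dependence measure can be computed directly from a correlation matrix. This is a minimal sketch that follows the definition of S(δ) literally, so the diagonal pairs (i, i) are always counted because r_ii = 1; the function name and the 3×3 matrix are made up for illustration, and indices are 0-based in the code.

```python
def skeleton_set(corr, delta):
    """S(delta) = {(i, j) : |r_ij| > delta}, taken over all index pairs."""
    n = len(corr)
    return {(i, j) for i in range(n) for j in range(n)
            if abs(corr[i][j]) > delta}

# Hypothetical correlation matrix: only the first two residuals are
# strongly correlated.
corr = [
    [1.0, 0.6, 0.0],
    [0.6, 1.0, 0.1],
    [0.0, 0.1, 1.0],
]
print(len(skeleton_set(corr, 0.5)))   # diagonal plus the (0, 1) / (1, 0) pair
print(len(skeleton_set(corr, 0.05)))  # also picks up the weak 0.1 correlation
```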
Asymptotic Theory for Mn
Theorem (Weak Dependence Condition (Wu et al., 2016))
If there exist $\lambda \in (0, 1)$ and constants $C, \alpha > 0$ such that
$$\max_{i,j} |r_{ij}| < \lambda \quad (2)$$
and for every $\delta \in (0, 1)$, the cardinality of the skeleton index set $S(\delta)$ satisfies
$$|S(\delta)| \le C n \delta^{-\alpha} \quad (3)$$
then for every $0 < \alpha < 1$, we have
$$\lim_{n \to \infty} P(M_n \le t_n) = 1 - \alpha \quad (4)$$
Remarks
1. Under conditions (2) and (3), the maximum $M_n$ has the same asymptotic distribution as the one obtained under independence.
2. When power grids are sparsely connected, the condition is easily satisfied.
Detecting a Topology Error in IEEE 118-Bus System
Setup
I 118 state variables, 301 measurements: 183 redundant measurements.
I Topology change: remove the transmission line between bus 37 and bus 38.
Procedure
I Obtain the residuals $(X_i)_{i=1}^{n}$ by using non-WLS state estimation.
I Estimate the standard errors $\sigma_i$.
I Compute the standardized residual $X_i/\sigma_i$, $i = 1, \ldots, n$.
I Compute the cutoff value $t_n$ with significance level $\alpha = 0.001$ and $n = 183$.
I By the hypothesis testing approach, if there is an index i such that $|X_i|/\sigma_i > t_n$, reject $H_0$ and report an alarm.
Detecting a Topology Error in IEEE 118-Bus System
Results
I tn = 4.546012
I Identify the index set J = {j : |Xj |/σj > tn}
J = {112, 113, 114, 115, 116, 124, 125, 127, 129, 130, 154, 155, 162, 165}
Figure: (j) Residuals Xi, (k) Estimated standard errors σi, (l) Standardized residuals Xi/σi, each plotted against measurement index
Example III: PCA-Based Network Traffic Anomaly Detection
(Lakhina et al., 2004a; Huang et al., 2006)
I Outlier detection from time series
I Use Principal Component Analysis (PCA) to detect anomalies
I Detect volume anomaly in network traffic
- Use O-D flow information
- Each link has aggregated traffic from all O-D flows
Example III: PCA-Based Network Traffic Anomaly Detection I
PCA Analysis on Network Data
I Form link data matrix $Y_{m \times n}$
- m: last m data points, n: number of links
I Perform PCA on Y
- Find the principal components:
1 The first principal component $v_1$:
$$v_1 = \arg\max_{\|v\|=1} \|Yv\|$$
$\|Yv\|^2$ is proportional to the variance of the data measured along v.
Example III: PCA-Based Network Traffic Anomaly Detection II
2 The k-th principal component $v_k$ is:
$$v_k = \arg\max_{\|v\|=1} \Big\| \Big( Y - \sum_{i=1}^{k-1} Y v_i v_i^T \Big) v \Big\|$$
- The matrix $P = [v_1, v_2, \ldots, v_k]$ is formed by the first k principal components, which capture the dominant variance in the data
I Separate normal and anomalous network-wide traffic
$$Y = \hat{Y} + \tilde{Y}$$
$\hat{Y} = PP^T Y$, and $\tilde{Y} = (I - PP^T)Y$, with the projection applied to each measurement vector (a row of Y written as a column)
- $\tilde{Y}$: contains residual traffic.
- Volume anomaly will result in a large spike in $\tilde{Y}$
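The subspace separation can be illustrated with a deliberately small sketch that keeps only k = 1 principal component, found by power iteration, and then measures how much of a new measurement vector falls outside the normal subspace. The function names and the two-link traffic data are hypothetical, the data are assumed to be mean-centered already, and a real deployment would use a linear-algebra library and more than one component.

```python
import math

def first_principal_direction(rows, iters=100):
    """Power iteration for v1 = argmax_{||v||=1} ||Y v||.

    rows: mean-centered measurement vectors (the rows of Y).
    """
    n = len(rows[0])
    v = [1.0 / math.sqrt(n)] * n  # uniform start; a random start is more robust
    for _ in range(iters):
        yv = [sum(a * b for a, b in zip(row, v)) for row in rows]            # Y v
        w = [sum(row[j] * c for row, c in zip(rows, yv)) for j in range(n)]  # Y^T (Y v)
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break
        v = [c / norm for c in w]
    return v

def residual_norm(y, v):
    """Length of (I - v v^T) y: the part of y outside the normal subspace."""
    proj = sum(a * b for a, b in zip(y, v))
    return math.sqrt(sum((a - proj * b) ** 2 for a, b in zip(y, v)))

# Hypothetical two-link traffic, already centered: link 2 carries twice link 1.
Y = [(-2.0, -4.0), (-1.0, -2.0), (0.0, 0.0), (1.0, 2.0), (2.0, 4.0)]
v1 = first_principal_direction(Y)
print(residual_norm((1.0, 2.0), v1))  # follows the normal pattern
print(residual_norm((3.0, 0.0), v1))  # breaks the pattern
```

A measurement that follows the dominant traffic pattern leaves almost nothing in the residual, while one that breaks the pattern produces the kind of spike the slide describes.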
Sequential Change Point Detection
To detect a change point in a time series $\{X_1, X_2, \ldots, X_n\}$, it is assumed that the pre-change density is $f_0$, and if a change occurs at time $\nu$, then the post-change density becomes $f_1$ beginning from moment $\nu + 1$.
The hypotheses are then formulated as:
$H_0$: $\{X_1, X_2, \ldots, X_n\} \sim f_0$
$H_1$: $\{X_1, X_2, \ldots, X_\nu\} \sim f_0$, and $\{X_{\nu+1}, X_{\nu+2}, \ldots, X_n\} \sim f_1$
The Change Point Detection Problem is to decide
(1) which hypothesis is true?
(2) if $H_1$ is true, $\nu = ?$
The time instance $\nu$ at which the state of the process changes is referred to as the change point or time of change.
Sequential Change Point Detection
Algorithms
I Cumulative Sum Algorithm (CUSUM)
I Shiryaev-Roberts Procedure
I Sliding Window Algorithm
Performance Metrics
I False positive and false negative rates
- If a change occurred but the detection procedure failed to detect it: false negative (misdetection)
- If the detection time N < ν: false positive (false alarm)
I Detection delay
- If there is a true change at time ν and the detection time is N, then the detection delay is τ = N − ν
CUSUM (Page, 1954)
Optimality
CUSUM is optimal in the sense of minimizing the worst-case detection delay.
Assumptions
I Observations $X_1, X_2, \ldots, X_n$ are independent: i.i.d. pre-change and i.i.d. post-change
I Probability density functions: f0 before change; f1 after change
I Assume f0 and f1 are known
I The only thing unknown is ν, the time of change
Parameter
I Detection threshold h
CUSUM
Parametric CUSUM: Based on Maximum Likelihood Principle
Detection statistic:
$$W_n = \max(W_{n-1} + Z_n, 0) \quad \text{for } n \ge 1$$
where $W_0 = 0$ and $Z_n = \log L_n$.
$L_n$ is the likelihood ratio: $L_n = \frac{f_1(X_n \mid X_1^{n-1})}{f_0(X_n \mid X_1^{n-1})}$, or $L_n = \frac{f_1(X_n)}{f_0(X_n)}$ for i.i.d. observations.
The procedure declares a change as soon as the detection statistic $W_n$ exceeds a preset threshold h:
$$N = \min\{n \ge 1 : W_n \ge h\}$$
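To make the recursion concrete, here is a minimal sketch for one specific case that is assumed for the example only: i.i.d. Gaussian observations whose mean shifts from mu0 to mu1 with a known, common standard deviation. In that case the log-likelihood ratio is $Z_n = \frac{\mu_1 - \mu_0}{\sigma^2}\big(X_n - \frac{\mu_0 + \mu_1}{2}\big)$. The function names and the test series are hypothetical.

```python
def gaussian_llr(x, mu0, mu1, sigma):
    """log f1(x)/f0(x) for N(mu1, sigma^2) versus N(mu0, sigma^2)."""
    return (mu1 - mu0) / sigma ** 2 * (x - (mu0 + mu1) / 2.0)

def cusum(xs, mu0, mu1, sigma, h):
    """Return the stopping time N = min{n >= 1 : W_n >= h}, or None."""
    w = 0.0  # W_0 = 0
    for n, x in enumerate(xs, start=1):
        w = max(w + gaussian_llr(x, mu0, mu1, sigma), 0.0)
        if w >= h:
            return n
    return None

# Idealized series: ten pre-change samples at 0, then samples at 1.
xs = [0.0] * 10 + [1.0] * 10
print(cusum(xs, mu0=0.0, mu1=1.0, sigma=1.0, h=2.0))
```

In this noise-free series each post-change sample adds 0.5 to the statistic, so a threshold of 2 is reached four samples after the change; a larger h lowers the false alarm rate at the cost of a longer detection delay.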
Shiryaev-Roberts Procedure
Roberts (Aug., 1966); Shiryayev (1963)
The Problem Setting
I Original Setting: "Quickest Detection of a Disorder in a Stationary Regime"
- The change is possibly taking place at a far horizon
I A randomized version for a general discrete time setting
I Applications: target detection and tracking, rapid detection of intrusions in communication networks, environmental monitoring
- Early detection of changes that may occur in a distant future
Shiryaev-Roberts Procedure
The Algorithm
1. Shiryaev-Roberts statistic: $R_n = \sum_{k=1}^{n} \frac{p(X_1, \ldots, X_n \mid \nu = k)}{p(X_1, \ldots, X_n \mid \nu = \infty)}$
2. From the independence assumption: $R_n = \sum_{k=1}^{n} \prod_{i=k}^{n} \frac{f_1(X_i)}{f_0(X_i)}$
3. $R_n$ can be computed recursively: $R_n = (1 + R_{n-1}) \frac{f_1(X_n)}{f_0(X_n)}$ for $n \ge 1$; $R_0 = 0$
4. Stopping time: $N_{A_B} = \min\{n \ge 1 : R_n \ge A_B\}$
Parameter: $A_B$ is chosen such that $E_\infty N_{A_B} = B$
B is a preset value before surveillance begins.
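The recursion in step 3 needs only the previous statistic and the current likelihood ratio. The sketch below assumes, for illustration only, a Gaussian mean shift from 0 to 1 with unit variance, and takes the threshold as a direct argument instead of calibrating it from B; the function names are hypothetical.

```python
import math

def shiryaev_roberts(xs, likelihood_ratio, threshold):
    """Run R_n = (1 + R_{n-1}) * L_n with R_0 = 0; return the first n
    with R_n >= threshold, or None if the threshold is never reached."""
    r = 0.0
    for n, x in enumerate(xs, start=1):
        r = (1.0 + r) * likelihood_ratio(x)
        if r >= threshold:
            return n
    return None

def gaussian_lr(x, mu0=0.0, mu1=1.0, sigma=1.0):
    """f1(x)/f0(x) for N(mu1, sigma^2) versus N(mu0, sigma^2)."""
    return math.exp((mu1 - mu0) / sigma ** 2 * (x - (mu0 + mu1) / 2.0))

# Idealized series: ten pre-change samples at 0, then samples at 1.
xs = [0.0] * 10 + [1.0] * 10
print(shiryaev_roberts(xs, gaussian_lr, threshold=20.0))
```

Before the change the statistic settles near a small constant, and after the change it grows roughly geometrically, which is why a moderate threshold is crossed within a few samples.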
Shiryaev-Roberts Procedure
Detection Delay
Shiryaev-Roberts procedure is the best in terms of minimizing the expected detection delay (asymptotically).
Theorem
Shiryaev-Roberts procedure minimizes
$$\sum_{k=1}^{\infty} E_k (N - k)^+$$
over all stopping times N that satisfy $E_\infty(N) \ge B$.
CUSUM and S-R Procedure
I Based on ratio of likelihoods
- S-R procedure is a CUSUM-type of algorithm
I Difficult to apply when f1 and f0 are unknown
Sliding Window Algorithm
Preset Parameters
I Window size m (m ≪ N, where N is the total number of data points)
I Significance level α (e.g., α = 0.05)
Sliding Window Algorithm (Cheng et al., 2016)
1. Set window offset d = 0.
2. Compute the sums $S_1 = \sum_{i=d+1}^{d+m} X_i$ and $S_2 = \sum_{i=d+m+1}^{d+2m} X_i$.
3. If $|S_2 - S_1| \ge z\sigma\sqrt{2m}$, declare a change point $\nu = d + m$.
4. Else set d = d + 1, go to step 2.
Remarks:
I z is the critical value that provides an area of α in the upper tail of the standard normal distribution.
I $\sigma^2$ is the variance, updated as the window moves
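The four steps above can be sketched as follows. The slides do not say how σ² is estimated, so this example assumes the average of the two within-window variances; that choice, the function name, and the test series are all assumptions. The extra `diff > 0` guard only prevents a trivial alarm when both windows are constant and identical.

```python
import math
from statistics import NormalDist, pvariance

def sliding_window_change_point(x, m, alpha=0.05):
    """Slide two adjacent windows of size m over x and return d + m for
    the first offset d where |S2 - S1| >= z * sigma * sqrt(2m), else None.
    With 0-based indexing, x[d + m] is the first sample of the second window."""
    z = NormalDist().inv_cdf(1.0 - alpha)  # upper-tail critical value
    d = 0
    while d + 2 * m <= len(x):
        left, right = x[d:d + m], x[d + m:d + 2 * m]
        diff = abs(sum(right) - sum(left))
        # Assumption: sigma^2 is the mean of the two within-window variances.
        sigma = math.sqrt((pvariance(left) + pvariance(right)) / 2.0)
        if diff > 0 and diff >= z * sigma * math.sqrt(2 * m):
            return d + m
        d += 1
    return None

# Hypothetical series: small oscillation around 0, then around 5 from index 20.
series = [0.1, -0.1] * 10 + [5.1, 4.9] * 10
print(sliding_window_change_point(series, m=4))
```

Because the threshold is derived from the data in the current windows, no pre- or post-change density has to be specified in advance.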
Sliding Window Algorithm
Algorithm Properties
I Be able to detect a change in state without knowing the actual pre-and post- change densities
I Relate detection threshold to a tolerable false alarm rate —controlled trade-off
I Relate detection threshold to the dynamic characteristics of the dataand not use a preset value
I Be able to detect abrupt changes as well as slow and subtle changes
I Avoid mistaking an isolated outlier as a change for a new state
Applications Using Sequential Change Point Detection
I DoS Attack Detection
- SYN flood attack
I Attack Detection in Wireless Networks
- Network layer
- MAC layer
- Physical layer
I Power Grid Anomaly Detection
DoS Attack Detection
A Common DoS Attack: SYN Flood Attack
I Attacker sends control packets to compromised nodes
I A large number of flooding sources send an excessive number of SYN requests to the victim
I The victim server returns a SYN/ACK packet to the client and waits for the ACK until timeout
I Flooding sources never return an ACK
I Exhaust the victim server's backlog queue → all connection requests dropped
Challenges: preset threshold (✗)
I Traffic patterns vary from site to site, from time to time
I Per-flow state information not known
I Normal traffic models hard to define
DoS Attack Detection
How to detect w/o prior knowledge of flow and traffic info?
Detection mechanism must be insensitive to site and traffic patterns.
I There is no normal traffic model or flow rate, but there is normal behavior
I Baseline: protocol behavior (TCP connection management)
- Normal: FINs match with SYN requests from clients
- Packet drop/retransmission causes small discrepancy
- Under SYN flood attack: large difference between the number of SYNs and FINs received
SYN Flood Attack
I Attackers create a large number of "open" connections
I Change in network measurement: |SYN − FIN| shows an abrupt increase
DoS Attack Detection
Detection Procedure
I Monitor the number of SYNs and FINs
- at egress router (near the flooding source)
- at ingress router (near the victim server)
I Generate time series on (SYNs–FINs)
I Perform sequential change point detection on time series
- Non-parametric version
DoS Attack Detection
Non-Parametric CUSUM for Change Point Detection (Wang et al., 2002)
I Tunable parameters: a, N
I Observations: S: number of SYNs; F: number of FINs
1. $D_n = S_n - F_n$
2. $R_n = \alpha R_{n-1} + (1 - \alpha) F_n$
3. $X_n = D_n / R_n$
4. Choose constant $a > E(X_n)$
5. Test statistic: $y_n = (y_{n-1} + (X_n - a))^+$, $y_0 = 0$
6. Detection: first n such that $y_n > N$
Remarks
I Algorithm very sensitive to N and a.
I Difficulty: determining N and a before monitoring begins.
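Steps 1 to 6 translate almost line by line into code. In this sketch the function name, the smoothing factor of 0.9, the values of a and the threshold, and the synthetic counts are all made up for illustration, and R is initialized to the first FIN count, which the slides do not specify.

```python
def syn_flood_cusum(syn_counts, fin_counts, a, threshold, alpha=0.9):
    """Non-parametric CUSUM on normalized SYN-FIN differences.

    Returns the first period n (1-based) with y_n > threshold, else None.
    """
    r = None  # running average of FIN counts (R_n)
    y = 0.0   # test statistic y_n, with y_0 = 0
    for n, (s, f) in enumerate(zip(syn_counts, fin_counts), start=1):
        # Assumption: R is initialized to the first FIN count.
        r = float(f) if r is None else alpha * r + (1.0 - alpha) * f
        if r <= 0.0:
            continue  # no FIN traffic observed yet; skip normalization
        x = (s - f) / r              # X_n = D_n / R_n
        y = max(0.0, y + (x - a))    # y_n = (y_{n-1} + X_n - a)^+
        if y > threshold:
            return n
    return None

# Synthetic counts per observation period: 20 normal periods, then a flood.
syns = [100] * 20 + [300] * 10
fins = [98] * 20 + [100] * 10
print(syn_flood_cusum(syns, fins, a=0.35, threshold=3.0))
```

Normalizing by the FIN average makes X_n independent of the absolute traffic volume, which is what allows one pair of parameters to be reused across sites, although the remarks above note that the result is still sensitive to how a and N are chosen.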
Wormhole Attack in Wireless Ad Hoc Networks
Routing: A category of routing protocols uses shortest path routing.
I Nodes exchange local information and relay to others
I Nodes collectively decide a route towards a destination
I Select the "best route" based on hop count (shortest path routing)
Wormhole Attack
I Adversary controls two end points and a tunnel between them
I Attract traffic to go through the controlled wormhole tunnel by making a false route advertisement: a shorter path towards a destination
In-Band vs. Out-Band Wormhole Attack
In-Band Wormhole Attack
I Wormhole tunnel consists of other wireless nodes controlled by the adversary
I Re-routed packets go through these wireless nodes
Out-Band Wormhole Attack
I Wormhole tunnel is an external link
– A wired link
– A wireless link (e.g., a long-range directional link)
This talk: address in-band wormhole attack
Wormhole Attack Detection
Performance Degradation in an In-Band Wormhole Attack
I End-to-end delay increases
I Throughput decreases
I Packet delivery ratio drops (if the wormhole endpoints drop packets arbitrarily)
I and more . . .
Proposed Method
I Model the end-to-end delay of a flow as a time series
I Perform Change Point Detection on the time series to detect the change
Stationary Network — Setup
I An in-band wormhole tunnel is established between node 1 and node 2 at 50 seconds.
I Wormhole tunnel: 1-19-25-23-2 advertised as one hop 1-2
I Two flows, without other traffic in the background
I Flow 18→28:
– Before 50 seconds: use path 18-9-34-35-37-28
– After 50 seconds: use path 18-1...2-28
I Flow 17→38:
– Before 50 seconds: use path 17-14-21-16-38
– After 50 seconds: use path 17-1...2-38
Stationary Network— Result I
Figure: End-to-end delay (seconds) vs. simulation time (seconds), panels 18--28 and 17--38
Stationary Network— Result II
I Three flows that changed routes: 9→2, 18→28, 17→38
I There are other flows in the background that stayed on the original routes
Figure: End-to-end delay (s) vs. simulation time (s), panels 9--24, 18--28, 17--38. Packet size 256B, interval=0.01s, 0.025s
Figure: End-to-end delay (s) vs. simulation time (s), panels 9--24, 18--28, 17--38. Packet size 256B, interval=0.02s, 0.01s
MAC-Layer Attack Detection in Wireless Networks
IEEE 802.11 MAC
I CSMA/CA
I RTS-CTS-DATA-ACK
Figure: (a) Computer Networking, Kurose & Ross; (b)
MAC Layer Misbehavior in IEEE 802.11 Networks
Sender Selfish Behavior
I Manipulation on carrier sense time
I Manipulation on back-off value during contention
Consequences
I Channel-capturing effect: other nodes have less chance to transmit
Receiver Selfish Behavior
I RTS dropping attack
Consequences
I Clear channel for itself
I Sender wastes resources retransmitting RTS
MAC Layer Misbehavior Detection
Other Flows Experience Performance Degradation
I End-to-end delay increases
I Throughput decreases
I Packet interval increases
Detection Method
I Monitor packets received
I Compute per-flow end-to-end delay (or throughput, packet interval) as a time series
I Use the sliding window change point detection method to detect the change on time series
Simulation Setup
I Case 1: Shorter DIFS attack
normal sender: DIFS>SIFS
attacker: switch to DIFS=SIFS starting at 50s
I Case 2: Shorter DIFS and smaller back-off window γ
normal sender: following binary exponential back-off, γ ∈ [32, 1024]
attacker: use fixed γ = 2
I Case 3: RTS dropping attack
normal receiver: respond CTS for every RTS request
attacker: RTS to CTS ratio 20:1
MAC Layer Misbehavior Detection Results
I Case 1: Five victim flows 19→1, 14→1, 12→1, 10→1, 6→1
I Case 2: Same as case 1
I Case 3: 20→2, 11→2, 8→2, 7→2, 5→2
I Node 2 is the attacker in all cases
Figure: Network topology with nodes 1-20
Result I: Case 1
Figure: (d) Delay (s), (e) Throughput (kbps), (f) Delay mean µD (s), (g) Throughput mean µT (kbps), each vs. simulation time (seconds)
Result II: Case 2
Figure: (h) Delay (s), (i) Throughput (kbps), (j) Delay mean µD (s), (k) Throughput mean µT (kbps), each vs. simulation time (seconds)
Result III: Case 3
Figure: (l) Throughput: Tx data rate (kbps), (m) Packet interval: transmitted packet interval (s), (n) µT: Tx data rate mean (kbps), (o) µI: cumulative average of Tx packet interval (s), each vs. simulation time (seconds)
Jamming Attack Detection in Wireless Networks
Attacks
I All nodes exposed to open medium
I Jamming signals: using higher transmission power, do not have to follow MAC protocol
I Legitimate nodes suffer
- TDMA: collision, increased packet error rate and drop rate
- CSMA: collision, channel capturing
Detection Procedure
I Detect changes from network measurements (delay, throughput, error rate, packet delivery ratio, signal strength, IFS, etc.)
I Distinguish
- Jamming vs. weak signals from legitimate nodes
- Jamming vs. network congestion among legitimate nodes
Jamming Attack Detection in Wireless Networks
Detection Methods
I Use summary information in a time interval, compare against a preset detection threshold
- Not suitable for highly dynamic networks
I Use change point detection on time series
- Test statistic: delay, throughput, received packets IFS
e.g.: Throughput when jamming signal duration varies
Figure: Throughput for jamming signal durations (p) 0.0005s, (q) 0.8s, (r) 1.5s
Anomaly Detection in Power Grids
Types of Anomalies
I Line outage
- wild animals
- weather
- over-grown trees
- coupled with aging infrastructure + lack of maintenance
I Generator outage
I Transformer fault
I Human errors
I Cyber attacks
Error Detection in Power Grids
Methods
I Detection Based on State Estimation (WLS)
- Works for measurement errors (e.g., bad data detection)
- Line outage: topology change often causes conforming errors
I Other Detection Methods
- When the system matrix is known, e.g., (Wu et al., 2016)
- When the system matrix is unknown:
Real-time change point detection + anomaly identification
Real-Time Anomaly Detection in Power Grids
What Feature to Use?
Figure: 9-bus system diagram (generators G1-G3, transformers T1-T3, Loads A-C, buses 1-9) and a plot of voltage phase angle (radians) vs. simulation time (seconds) for Bus 8 angle, Bus 9 angle, and Difference (8-9)
Outlier Detection for High-Dimensional Data
I Notion of proximity not straightforward (✗)
I High-dimensional sparse data: sparsity makes every data point an outlier
Outlier detection algorithms for high-dimensional data
I Naïve brute force: exhaustive search (Slow!)
I Evolutionary algorithm (Aggarwal and Yu, 2001)
I Projection-based outlier detection (Huber, 1985)
- E.g., principal component analysis
Outlook
I Detecting outliers among missing data
I Fast method for detecting outliers among a mixture of categorical and continuous variables
Change Point Detection in High-Dimensional Data
A Common Challenge: Scalability
Algorithms that work for univariate or low-dimensional time series may not work for high-dimensional time series.
Methods
I Sum CUSUM statistic from each series (Mei, 2010)
I Sum the local likelihood ratio statistic, then form a CUSUM statistic (Tartakovsky and Veeravalli, 2008)
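The first method can be sketched by running one CUSUM recursion per series and raising an alarm when the sum of the local statistics crosses a threshold. The Gaussian mean-shift model, the function name, and the three synthetic streams are assumptions for this example only; it illustrates the summing idea and is not a reproduction of the cited scheme.

```python
def sum_cusum(streams, mu0, mu1, sigma, threshold):
    """Return the first time n (1-based) at which the sum of per-stream
    CUSUM statistics reaches the threshold, or None."""
    w = [0.0] * len(streams)  # one local CUSUM statistic per stream
    for n in range(len(streams[0])):
        for k, series in enumerate(streams):
            # Gaussian mean-shift log-likelihood ratio (assumed model).
            llr = (mu1 - mu0) / sigma ** 2 * (series[n] - (mu0 + mu1) / 2.0)
            w[k] = max(w[k] + llr, 0.0)
        if sum(w) >= threshold:
            return n + 1
    return None

# Three synthetic streams that all shift from 0 to 1 after sample 10.
streams = [[0.0] * 10 + [1.0] * 10 for _ in range(3)]
print(sum_cusum(streams, mu0=0.0, mu1=1.0, sigma=1.0, threshold=3.0))
```

With all three streams affected, the summed statistic grows three times as fast as a single-stream statistic, so the same threshold is reached sooner than it would be on any one series alone.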
Assumptions
I Non-Structural problems
I Post-change distributions are prescribed
I All series are affected by the change
Change Point Detection in High-Dimensional Data
I Non-Structural Problem
- No spatial model relating the signal to observations at various locations
- Other work: Chen et al. (Aug. 2010); Petrov et al. (2003); Levy-Leduc and Roueff (2009)
I Structural Problem
- Has a spatial structure relating the signal to observations at various locations
- Other work: Rabinowitz (1994); Shafie et al. (2003); Siegmund and Yakir (2008)
Change Point Detection in High-Dimensional Data
Additional Challenge I: with missing data
I Detecting changes from high dimensional time series with missing data (Xie et al., 2013)
- Use non-parametric submanifold model
- Extract univariate detection test statistics from high-dimensional data
Additional Challenge II: change affects only a small subset of time series
I M ≪ N, M is unknown, the subset is unknown
I Unknown and non-homogeneous amplitudes at different series
I Xie and Siegmund (2012) developed a mechanism to suppress noise from unaffected sensors
- First, compute a generalized likelihood ratio (GLR) for each series, use it to suppress noise from non-affected sensors
- Then, sum the GLRs to compare to a detection threshold
Summary and Outlook
Anomaly Detection in Cyber Physical Systems
I Inherently high-dimensional
I Heterogeneous data streams
I Both structural problems and non-structural problems exist
- Non-structural: some cyber attacks
- Structural: some nature-induced faults in physical systems
I Real-time requirement
I False positives, false negatives, detection delay
I Causal analysis + anomaly analysis for meaningful results
Thank You!
Polunchenko and Tartakovsky (2012); Lakhina et al. (2004b); Siegmund and Venkatraman (1995); Yamada et al. (2013); Lorden (1971); Shiryayev (1963); Roberts (Aug., 1966); Pollak (1985); Pollak and Tartakovsky (2009); Zhu and Abur (July 2007); Abur and Expósito (2004); Merrill and Schweppe (1971); Handschin et al. (1975); Mili et al. (May 1996); Rousseeuw and Leroy (1987); Kotiuga and Vidyasagar (April 1982); Falcao and Assis (Aug. 1988); Abur and Celik (Feb. 1991); Clements et al. (Feb. 1991); Singh and Alvarado (Aug. 1994); Wei et al. (May 1998); Mili et al. (Nov. 1999,F); Pajic and Clements (Nov. 2005); Aboytes and Cory (June 1975); Garcia et al. (Sept. 1979); Xiang et al. (July 1981,F); Mili et al. (Nov. 1984); Singh and Alvarado (Aug. 1995); Lourenco et al. (May 2004); Diao et al. (2009); Tate and Overbye (2008); Zhu and Abur (2010); He and Zhang (2010); Zhang and Kezunovic (2006); He and Zhang (2011); Barnett and Lewis (1994); Ramaswamy et al. (2000); Aggarwal and Yu (2001); Angiulli and Pizzuti (2002); Hu and Sung (2003); Hodge and Austin (2004); Zhu et al. (2005); Angiulli and Fassetti (2009); Chandola et al. (2009); Lakhina et al. (2004a); Huang et al. (2006)
References
F. Aboytes and B. J. Cory. Identification of measurement and configuration errors in static estimation. In Proc. 9th Power Industry Computer Application Conference, New Orleans, pages 298–302, June 1975.
A. Abur and M. K. Celik. A fast algorithm for the weighted least absolute value state estimation. Power Systems, IEEE Transactions on, 6(2):1–8, Feb. 1991.
Ali Abur and Antonio Gómez Expósito. Power System State Estimation: Theory and Implementation. CRC, 2004.
Charu C. Aggarwal and Philip S. Yu. Outlier detection for high dimensional data. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, SIGMOD '01, pages 37–46, 2001.
Fabrizio Angiulli and Fabio Fassetti. Dolphin: An efficient algorithm for mining distance-based outliers in very large datasets. ACM Trans. Knowl. Discov. Data, 3(1):4:1–4:57, March 2009.
Fabrizio Angiulli and Clara Pizzuti. Fast outlier detection in high dimensional spaces. In Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery, PKDD '02, pages 15–26, 2002.
V. Barnett and T. Lewis. Outliers in Statistical Data. John Wiley and Sons, NY, 1994.
Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. LOF: Identifying density-based local outliers. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, SIGMOD '00, pages 93–104, New York, NY, USA, 2000. ACM. ISBN 1-58113-217-4. doi: 10.1145/342009.335388. URL http://doi.acm.org/10.1145/342009.335388.
Varun Chandola, Arindam Banerjee, and Vipin Kumar. Anomaly detection: A survey. ACM Comput. Surv., 41(3):15:1–15:58, 2009.
M. Chen, S. Gonzalez, A. Vasilakos, H. Cao, and V. C. M. Leung. Body area networks: a survey. Mobile Networks and Applications, Aug. 2010.
M. X. Cheng, Y. Ling, and W. B. Wu. In-band wormhole detection in wireless ad hoc networks using change point detection method. In 2016 IEEE International Conference on Communications (ICC), pages 1–6, May 2016. doi: 10.1109/ICC.2016.7510989.
K. A. Clements, P. W. Davis, and K. D. Frey. An interior point algorithm for weighted least absolute value state estimation. In IEEE Power Engineering Society Winter Meeting, Feb. 1991.
R. Diao, Kai Sun, V. Vittal, R.J. O'Keefe, M.R. Richardson, N. Bhatt, D. Stradford, and S.K. Sarawgi. Decision tree-based online voltage security assessment using PMU measurements. Power Systems, IEEE Transactions on, 24(2):832–839, May 2009. ISSN 0885-8950. doi: 10.1109/TPWRS.2009.2016528.
References II
D. Falcao and S. M. Assis. Linear-programming state estimation error analysis and gross error identification. Power Systems, IEEETransactions on, 3(3):809–815, Aug. 1988.
A. Garcia, A. Monticelli, and P. Abreu. Fast decoupled state estimation and bad data processing. IEEE Trans. On Power Apparatus andSystems, 98(5):1645–1652, Sept. 1979.
E. Handschin, F.C. Schweppe, J. Kohlas, and A. Fiechter. Bad data analysis for power system state estimation. IEEE Trans. On PowerApparatus and Systems, PAS-94:329–337, 1975.
D.M. Hawkins. Identification of Outliers. Monographs on applied probability and statistics. Chapman and Hall, 1980. ISBN9780412219009. URL https://books.google.com/books?id=fb0OAAAAQAAJ.
Miao He and Junshan Zhang. Fault detection and localization in smart grid: A probabilistic dependence graph approach. In Smart GridCommunications (SmartGridComm), 2010 First IEEE International Conference on, pages 43–48, Oct 2010. doi:10.1109/SMARTGRID.2010.5622016.
Miao He and Junshan Zhang. A dependency graph approach for fault detection and localization towards secure smart grid. Smart Grid,IEEE Transactions on, 2(2):342–351, June 2011. ISSN 1949-3053. doi: 10.1109/TSG.2011.2129544.
Victoria Hodge and Jim Austin. A survey of outlier detection methodologies. Artif. Intell. Rev., 22(2):85–126, October 2004.
Tianming Hu and Sam Y. Sung. Detecting pattern-based outliers. Pattern Recogn. Lett., 24(16):3059–3068, December 2003.
Ling Huang, Michael I. Jordan, Anthony Joseph, Minos Garofalakis, and Nina Taft. In-network PCA and anomaly detection. In NIPS, pages 617–624. MIT Press, 2006.
Peter J. Huber. Projection pursuit. Ann. Statist., 13(2):435–475, 1985.
Edwin M. Knorr and Raymond T. Ng. Algorithms for mining distance-based outliers in large datasets. In Proceedings of the 24th International Conference on Very Large Data Bases, VLDB ’98, pages 392–403, San Francisco, CA, USA, 1998. Morgan Kaufmann Publishers Inc. ISBN 1-55860-566-5. URL http://dl.acm.org/citation.cfm?id=645924.671334.
W. W. Kotiuga and M. Vidyasagar. Bad data rejection properties of weighted least absolute value techniques applied to static state estimation. IEEE Trans. On Power Apparatus and Systems, PAS-101:844–851, April 1982.
Anukool Lakhina, Mark Crovella, and Christophe Diot. Diagnosing network-wide traffic anomalies. In Proceedings of the 2004 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, SIGCOMM ’04, pages 219–230, New York, NY, USA, 2004a. ACM. ISBN 1-58113-862-8. URL http://doi.acm.org/10.1145/1015467.1015492.
Anukool Lakhina, Mark Crovella, and Christophe Diot. Diagnosing network-wide traffic anomalies. In Proceedings of the 2004 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, SIGCOMM ’04, pages 219–230, 2004b.
C. Levy-Leduc and F. Roueff. Detection and localization of change-points in high-dimensional network traffic data. The Annals of Applied Statistics, 3(2):637–662, 2009.
G. Lorden. Procedures for reacting to a change in distribution. Ann. Math. Statist., 42(6):1897–1908, 1971.
E. M. Lourenco, A. S. Costa, and K. A. Clements. Bayesian-based hypothesis testing for topology error identification in generalized state estimation. Power Systems, IEEE Transactions on, 19(2):1206–1215, May 2004.
Y. Mei. Efficient scalable schemes for monitoring a large number of data streams. Biometrika, 97(2):419–433, 2010.
H.M. Merrill and F.C. Schweppe. Bad data suppression in power system static state estimation. IEEE Trans. On Power Apparatus and Systems, PAS-90:2718–2725, 1971.
L. Mili, M.G. Cheniae, N.S. Vichare, and P. J. Rousseeuw. Robustification of the least absolute value estimator by means of projection statistics. Power Systems, IEEE Transactions on, 11(1):216–225, Feb. 1996.
L. Mili, M. Cheniae, N. Vichare, and P. Rousseeuw. Robust state estimation based on projection statistics. Power Systems, IEEE Transactions on, 11(2):1118–1127, May 1996.
L. Mili, Th. Van Cutsem, and M. Ribbens-Pavella. Hypothesis testing identification: A new method for bad data analysis in power system state estimation. IEEE Trans. On Power Apparatus and Systems, 103(11):3239–3252, Nov. 1984.
L. Mili, G. Steeno, F. Dobraca, and D. French. A robust estimation method for topology error identification. Power Systems, IEEE Transactions on, 14(4):1469–1476, Nov. 1999.
S. Pajic and K. A. Clements. Robustification of the least absolute value estimator by means of projection statistics. Power Systems, IEEE Transactions on, 20(4):1683–1689, Nov. 2005.
A. Petrov, B. L. Rozovskii, and A. G. Tartakovsky. Efficient nonlinear filtering methods for detection of dim targets by passive systems. Multitarget-Multisensor Tracking: Applications and Advances, 2003.
Moshe Pollak. Optimal detection of a change in distribution. Ann. Statist., 13(1):206–227, 1985.
Moshe Pollak and Alexander G. Tartakovsky. Optimality properties of the Shiryaev-Roberts procedure. Statistica Sinica, 19(4):1729–1739, 2009.
A. Polunchenko and A. G. Tartakovsky. State-of-the-art in sequential change-point detection. Methodol. Comput. Appl. Probab., 14(3):649–684, 2012.
Daniel Rabinowitz. Detecting clusters in disease incidence. Lecture Notes-Monograph Series, 23, Change-point Problems: 255–275, 1994.
Sridhar Ramaswamy, Rajeev Rastogi, and Kyuseok Shim. Efficient algorithms for mining outliers from large data sets. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, SIGMOD ’00, pages 427–438, 2000.
S. W. Roberts. A comparison of some control chart procedures. Technometrics, 8(3):411–430, Aug. 1966.
P. J. Rousseeuw and A. M. Leroy. Robust Regression and Outlier Detection. John Wiley, 1987.
K. Shafie, B. Sigal, D. Siegmund, and K. Worsley. Rotation space random fields with an application to fMRI data. Ann. Statist., 31:1732–1771, 2003.
A. N. Shiryayev. On optimal methods in earliest detection problems. Theory of Probab. and Appl., 8:26–51, 1963.
D. Siegmund and E. S. Venkatraman. Using the generalized likelihood ratio statistic for sequential detection of a change-point. Ann. Statist., 23(1):255–271, 1995.
D. O. Siegmund and B. Yakir. Detecting the emergence of a signal in a noisy image. Statistics and Its Interface, 1:3–12, 2008.
H. Singh and F. Alvarado. Weighted least absolute value state estimation using interior point methods. Power Systems, IEEE Transactions on, 9(3):1478–1484, Aug. 1994.
H. Singh and F. L. Alvarado. Network topology determination using least absolute value state estimation. Power Systems, IEEE Transactions on, 10(3):1159–1165, Aug. 1995.
A. G. Tartakovsky and V. V. Veeravalli. Asymptotically optimal quickest change detection in distributed sensor systems. Sequential Analysis, 27(4):441–475, 2008.
J.E. Tate and T.J. Overbye. Line outage detection using phasor angle measurements. Power Systems, IEEE Transactions on, 23(4):1644–1652, Nov 2008. ISSN 0885-8950. doi: 10.1109/TPWRS.2008.2004826.
Haining Wang, Danlu Zhang, and Kang G. Shin. Detecting SYN flooding attacks. In Proceedings. Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies, volume 3, pages 1530–1539, June 2002. doi: 10.1109/INFCOM.2002.1019404.
H. Wei, H. Sasaki, J. Kubokawa, and R. Yokoyama. An interior point method for power system weighted nonlinear l1 norm static state estimation. Power Systems, IEEE Transactions on, 13(2):617–623, May 1998.
Wei Biao Wu, Maggie X. Cheng, and Bei Gou. A hypothesis testing approach for topology error detection in power grids. IEEE Internet of Things, 2016.
N. Xiang, S. Wang, and E. Yu. A new approach for detection and identification of multiple bad data in power system state estimation. IEEE Trans. On Power Apparatus and Systems, 101(2):454–462, Feb. 1982.
N. Xiang, S. Wang, and E. Yu. Estimation and identification of multiple bad data in power system state estimation. In Proc. 7th Power System Computation Conference, PSCC, Lausanne, July 1981.
Y. Xie and D. Siegmund. Sequential multi-sensor change-point detection. Ann. Statist., 2012.
Yao Xie, Jiaji Huang, and R. Willett. Change-point detection for high-dimensional time series with missing data. Selected Topics in Signal Processing, IEEE Journal of, 7(1):12–27, Feb 2013.
Makoto Yamada, Akisato Kimura, Futoshi Naya, and Hiroshi Sawada. Change-point detection with feature selection in high-dimensional time-series data. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, IJCAI ’13, pages 1827–1833, 2013.
Nan Zhang and M. Kezunovic. Improving real-time fault analysis and validating relay operations to prevent or mitigate cascading blackouts. In Transmission and Distribution Conference and Exhibition, 2005/2006 IEEE PES, pages 847–852, May 2006. doi: 10.1109/TDC.2006.1668608.
Cui Zhu, Hiroyuki Kitagawa, and Christos Faloutsos. Example-based robust outlier detection in high dimensional datasets. In Proceedings of the Fifth IEEE International Conference on Data Mining, ICDM ’05, pages 829–832, 2005.
Jun Zhu and A. Abur. Improvements in network parameter error identification via synchronized phasors. Power Systems, IEEE Transactions on, 25(1):44–50, Feb 2010. ISSN 0885-8950. doi: 10.1109/TPWRS.2009.2030274.
Jun Zhu and A. Abur. Bad data identification when using phasor measurements. In IEEE Power Tech Conference, pages 1676–1681, July 2007.