Statistical Decision Theory
Lionel Fillatre
ENST Bretagne, Computer Science Department

Outline
Part I: Anomaly detection in networks: state-of-the-art
Part II: Statistical testing: fundamentals
Part III: Statistical testing: sequential approaches
Part IV: Statistical tests: a case study
Part I: Anomaly detection in networks
1 Motivation
2 Network anomalies
3 Sources of network data
4 Anomaly detection methods
Part II: Statistical testing
5 Motivation
6 Test between two simple hypotheses
7 Test between two composite hypotheses
Part III: Sequential approaches
8 Motivation
9 Sequential probability ratio test
10 Change detection: known change
11 Change detection: unknown change
Part IV: A case study
12 DOS attack detection
13 Multichannel parametric CUSUM
14 Multichannel non-parametric CUSUM
15 Practical example
Part I
Anomaly detection in networks
Outline of Part I
1 Motivation
2 Network anomalies
3 Sources of network data
4 Anomaly detection methods
Motivation
Networks are complex systems: vast amounts of information need to be collected and processed.
It is desirable to detect network anomalies and performance bottlenecks to improve network management.
To detect anomalies, it is necessary:
to give a definition of network anomalies,
to choose the sources of network data relevant to detect anomalies,
to choose a method to detect anomalies.
Network anomalies
Definition: network anomalies typically refer to circumstances when network operations deviate from normal network behavior.
Classification: there are two kinds of anomalies:
Network failures: server failures, broadcast storms, transient congestions, ...
Security-related problems: denial of service (DOS), network intrusions, ...
For the purpose of anomaly detection, we must characterize normal traffic behavior.
Data from network probes
Network probes are specialized tools such as “ping” and “traceroute”.
These methods do not require the cooperation of the network service provider.
Performance metrics derived from such tools can provide only a coarse-grained view of the network.
Hence, the data obtained from probing mechanisms may be of limited value for anomaly detection.
Data from packet filtering
Packet flows are sampled by capturing the IP headers of a select set of packets at different points in the network.
For flow-based monitoring, a flow is identified by source-destination addresses and source-destination port numbers.
Data obtained from this method can be used to detect anomalous network flows.
However, the hardware requirements of this measurement method make it difficult to use in practice.
Data from routing protocols
The data collected can be used to build the network topology and provides link status updates.
Since routing updates occur at frequent intervals, any change in link utilization will be updated in near real time.
However, since routing updates must be kept small, only limited information pertaining to link statistics can be propagated through routing updates.
Network management protocols
Network management protocols provide information about network traffic statistics.
The information obtained can be used to characterize network behavior.
This source of data is obtained by using the Simple Network Management Protocol (SNMP):
This protocol provides a mechanism to communicate between the manager and hundreds of SNMP agents.
The SNMP server maintains a database of management variables called the Management Information Base (MIB) variables.
It is a widely deployed protocol and has been standardized for all different network devices.
Due to the fine-grained data available from SNMP, it is a good data source for network anomaly detection.
Hierarchical scheme of methods
Anomaly detection:
Rule-based approaches (deterministic or stochastic)
Finite state machines
Pattern matching
Signal processing approaches
Statistical testing approaches (non-sequential or sequential)
Rule-based approaches (1/2)
Early work in the area of fault or anomaly detection was based on expert systems.
An exhaustive database containing the rules of behavior of the faulty system is used to determine if a fault occurred.
Two kinds of rule selection are possible: deterministic or stochastic (belief networks for example).
Rule-based approaches (2/2)
These rule-based systems rely heavily on the expertise of the network manager and do not adapt well to the evolving network environment.
It is possible to improve such a system by adding a picture of previous fault scenarios, which leads to case-based reasoning systems.
These systems have a heavy dependence on past information, and the number of functions to be learned also increases with the number of faults studied.
Finite state machines
Anomaly or fault detection using finite state machines models alarm sequences that occur during and prior to fault events.
An alarm is modeled as a state of the finite state machine.
Finite state machines are built for a known network fault using history data.
Not all faults can be captured by a finite sequence of alarms of reasonable length.
Pattern matching
Online learning is used to build a traffic profile for a given network.
Traffic profiles are built using symptom-specific feature vectors such as link utilization.
When acquired data fail to fit the developed profiles within some confidence interval, an anomaly is declared.
The efficiency depends on the accuracy of the traffic profile generated. It is necessary to spend a considerable amount of time building traffic profiles (this method does not scale gracefully).
Signal processing techniques
Signal processing techniques have been used to model data flows.
The normal behavior of data flows is modeled by using several approaches: spectral analysis, time series analysis, wavelet decompositions, ...
Anomalies correspond to deviations in the normal behavior of the data flows.
Statistical testing (1/2)
Statistical testing has been used to detect both anomalies corresponding to network failures and network intrusions.
The statistical nature of the available information is used to define the normal behavior of the network (distribution of packet sizes, ...).
Non-sequential and sequential approaches can be used according to the network manager’s requirements.
Statistical testing (2/2)
Non-sequential approaches allow us to define optimal algorithms: minimization of false alarms and maximization of the probability of anomaly detection.
Sequential approaches are used to minimize the number of observations needed to detect an anomaly.
When data flows are modeled by using parametric models, the design of optimal algorithms is possible.
Non-parametric approaches are particularly studied because of the lack of parametric models. These approaches are often suboptimal.
Part II
Statistical testing: fundamentals
Outline of Part II
5 Motivation
6 Test between two simple hypotheses
7 Test between two composite hypotheses
Main objectives
Given some observations, the aim is to diagnose a system: detection and identification of an anomaly.
Observations are often noisy due to model errors and/or measurement errors.
For our purpose, the final aim consists of designing automatic systems that monitor a network and launch alarms when an anomaly appears.
Practical examples
To detect Denial of Service (DOS) attacks on a server.
To detect an abrupt change in the link utilizations on a network.
To identify the protocol associated with a flow of packets: HTTP, FTP, ...
Basic notations
Assume that we have two probability distributions P1, P2.
Let y1, ..., yn be an n-size sample of independent and identically distributed (i.i.d.) random variables generated by one of these distributions.
It is assumed that yi ∈ Ω for all i (for example Ω = R^m) and that Ω^n is the observation space.
Let us denote by Ei[yk] the expectation of yk when yk follows the distribution Pi, which is denoted yk ∼ Pi.
Assume that each distribution Pi has a probability density function (pdf) fi(y). All results can be applied to discrete random variables.
Basic definitions
Definition (simple hypothesis)
We call a simple hypothesis Hk any assumption concerning the distribution Pk that can be reduced to a single value in the space of probability distributions, which is denoted:
Hk = {y1, ..., yn ∼ Pk}, k = 1, 2.
Definition (statistical test)
We call a statistical test for testing between hypotheses H1 and H2 any measurable mapping g : Ω^n → {H1, H2}.
Basic definitions: an illustration
[Diagram: given the distributions P1, P2 and a criterion of optimality, a test g(·) is designed that maps each observation y1, ..., yn of the observation space Ω^n to a decision H1 or H2.]
Basic definitions
Definition (quality of a test)
The quality of a test is defined with the aid of a set of error probabilities:
αi = Pr(g(y1, ..., yn) ≠ Hi | Hi true) = Pri(g(y1, ..., yn) ≠ Hi),
where αi is the probability of rejecting hypothesis Hi when it is true.
Remark
α1 is called the probability of false alarm;
α2 is called the probability of miss.
Bayes test (1/2)
Assume that each hypothesis Hi has a known a priori probability qi such that q1 + q2 = 1.
Definition (weighted error probability)
For a test g, we define the weighted error probability α(g) by
α(g) = q1 α1 + q2 α2.
Definition (Bayes test)
The test g is said to be a Bayes test if it minimizes α(g) for given a priori probabilities qi.
Bayes test (2/2)
Definition (likelihood ratio)
The Likelihood Ratio (LR) between two pdfs f1 and f2 for the independent sequence of observations y1, ..., yn is
Λ(y1, ..., yn) = ∏_{i=1}^{n} f2(yi) / f1(yi).
Theorem (Bayes test)
The test g which minimizes α(g) is defined by
g(y1, ..., yn) = H1 if Λ(y1, ..., yn) < q1/q2,
g(y1, ..., yn) = H2 if Λ(y1, ..., yn) ≥ q1/q2.
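As a small numerical sketch of this decision rule (Python with NumPy; the function names and the Gaussian example are illustrative additions, not part of the course material):

```python
import numpy as np

def likelihood_ratio(y, f1, f2):
    """Likelihood ratio of the i.i.d. sample y between the pdfs f2 and f1."""
    return float(np.prod(f2(y) / f1(y)))

def bayes_test(y, f1, f2, q1, q2):
    """Bayes test: accept H2 iff the likelihood ratio reaches q1/q2."""
    return "H2" if likelihood_ratio(y, f1, f2) >= q1 / q2 else "H1"

# Illustrative example: N(0, 1) under H1 against N(2, 1) under H2.
phi = lambda x, theta: np.exp(-(x - theta) ** 2 / 2.0) / np.sqrt(2.0 * np.pi)
f1 = lambda y: phi(y, 0.0)
f2 = lambda y: phi(y, 2.0)

rng = np.random.default_rng(0)
y = rng.normal(2.0, 1.0, size=20)       # data actually generated under H2
print(bayes_test(y, f1, f2, 0.5, 0.5))  # accepts H2 with overwhelming probability
```

For long samples the product of ratios underflows; in practice one compares the log-likelihood ratio with log(q1/q2) instead.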
Most Powerful Test (1/2)
Definition
Let Kα be the class of tests with a bounded probability of false alarm:
Kα = {g : α1(g) ≤ α}.
Definition (most powerful test)
We say that a test g* ∈ Kα is the Most Powerful (MP) test in the class Kα if, for all g ∈ Kα,
α2(g*) ≤ α2(g),
or, equivalently,
β(g*) ≥ β(g),
where β(g) = 1 − α2(g) is the power of the test g.
Most Powerful Test (2/2)
Theorem (Neyman-Pearson lemma)
The MP test g* in Kα is given by
g*(y1, ..., yn) = H1 if Λ(y1, ..., yn) < λα,
g*(y1, ..., yn) = H2 if Λ(y1, ..., yn) ≥ λα,
where λα is chosen such that α1(g*) = α.
Remark
This lemma is fundamental from the theoretical point of view, but its interest is often limited from the practical point of view.
Location testing with Gaussian errors
Assume yi ∼ N(θ, 1).
The two hypotheses are H1 : θ = θ1 and H2 : θ = θ2 with 0 < θ1 < θ2.
The pdf of a Gaussian variable N(θ, 1) is φθ(x) = φ(x − θ) with
φ(x) = (1/√(2π)) exp(−x²/2).
Question
Find the Neyman-Pearson test.
Solution (1/2)
By subtracting θ1 from yi, we can suppose that θ1 = 0.
log Λ(y1, ..., yn) = θ2 (∑_{i=1}^{n} yi − n θ2/2).
The Neyman-Pearson test is given by
g*(y1, ..., yn) = H1 if (1/√n) ∑_{i=1}^{n} yi < λ′α,
g*(y1, ..., yn) = H2 if (1/√n) ∑_{i=1}^{n} yi ≥ λ′α,
with λ′α = λα/(θ2 √n) + θ2 √n/2, where λα is the threshold on log Λ.
Under H1, Λn = (1/√n) ∑_{i=1}^{n} yi ∼ N(0, 1) and λ′α = Φ⁻¹(1 − α), i.e. α1(g*) = Pr(Λn ≥ λ′α) = α, where Φ is the cumulative distribution function of the standardized Gaussian variable N(0, 1).
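This threshold rule is short to code; a sketch in Python (the function name is illustrative; the standard library's NormalDist supplies Φ⁻¹):

```python
import numpy as np
from statistics import NormalDist

def neyman_pearson_gauss(y, alpha):
    """Neyman-Pearson test for yi ~ N(theta, 1), H1: theta = 0 against a
    positive shift.  Accepts H2 when (1/sqrt(n)) * sum(yi) reaches the
    threshold Phi^{-1}(1 - alpha), so the false-alarm probability is alpha."""
    stat = float(np.sum(y)) / np.sqrt(len(y))
    lam = NormalDist().inv_cdf(1.0 - alpha)
    return ("H2" if stat >= lam else "H1"), stat, lam

decision, stat, lam = neyman_pearson_gauss(np.full(25, 1.0), 0.05)
print(decision)  # stat = 5.0 exceeds lam ~ 1.645, so "H2"
```

Note that the decision only needs the normalized sum and Φ⁻¹(1 − α), exactly as in the solution above.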
Solution (2/2): graphical illustration
[Figure: the pdfs φ(x) under H1 (θ1 = 0) and φ(x − 5) under H2 (θ2 = 5), with the threshold λ0.01 separating the two acceptance regions; the shaded areas represent the false-alarm probability α1(g*) and the miss probability α2(g*).]
Basic notations
Let y1, ..., yn be an n-size sample of i.i.d. random variables generated by a distribution Pθ parameterized by a vector θ ∈ Θ.
It is assumed that yi ∈ Ω for all i (for example Ω = R^m) and that Ω^n is the observation space.
Let us denote by Eθ[yk] the expectation of yk when yk follows the distribution Pθ, which is denoted yk ∼ Pθ.
Assume that each distribution Pθ has a probability density function (pdf) fθ(y). All results can be applied to discrete random variables.
Basic definitions
Definition (composite hypothesis)
Any nonsimple hypothesis is called a composite hypothesis.
Definition
Let us denote H1 : θ ∈ Θ1 and H2 : θ ∈ Θ2 with Θ1 ∩ Θ2 = ∅ and Θ1, Θ2 two specified subsets of Θ.
Definition (size of a test)
Let α1(g) be the size of a test, defined by:
α1(g) = sup_{θ∈Θ1} Pr(g(y1, ..., yn) ≠ H1 | H1 true) = sup_{θ∈Θ1} Prθ(g(y1, ..., yn) ≠ H1),
and let Kα be the class of tests with fixed size:
Kα = {g : α1(g) ≤ α}.
Uniformly most powerful test
Definition (power function of a test)
The power function of a test g is defined by:
βg(θ) = Prθ(g(y1, ..., yn) = H2), θ ∈ Θ2.
Definition (uniformly most powerful test)
A test g* ∈ Kα is said to be Uniformly Most Powerful (UMP) in the class Kα of tests with fixed size α1(g) = α if, for all other tests g ∈ Kα, we have:
∀θ ∈ Θ2, βg(θ) ≤ βg*(θ).
Graphical interpretation
[Figure: power functions β(θ) plotted against θ, with Θ1 to the left of a boundary value of θ and Θ2 to its right; over Θ2, the power curve of the UMP test lies above the curves of all other tests of size α.]
Location testing with Gaussian errors
Assume yi ∼ N(θ, 1).
The two hypotheses are H1 : θ = 0 and H2 : θ ≥ θ2 with θ2 > 0.
The pdf of a Gaussian variable N(θ, 1) is φθ(x) = φ(x − θ) with
φ(x) = (1/√(2π)) exp(−x²/2).
Question
Find the UMP test.
Solution
The Neyman-Pearson test between H1 : θ = 0 and H2(θ2) : θ = θ2 is given by
g*(y1, ..., yn) = H1 if (1/√n) ∑_{i=1}^{n} yi < λα,
g*(y1, ..., yn) = H2(θ2) if (1/√n) ∑_{i=1}^{n} yi ≥ λα.
Under H1, (1/√n) ∑_{i=1}^{n} yi ∼ N(0, 1) and λα = Φ⁻¹(1 − α), where Φ is the cumulative distribution function of the standardized Gaussian variable N(0, 1).
Since the decision function (1/√n) ∑_{i=1}^{n} yi and the threshold λα do not depend on θ2, the test g* is MP for all θ2 > 0 and, hence, it is a UMP test.
Generalized Likelihood Ratio test
Definition
We say that a test gGLR is a Generalized Likelihood Ratio (GLR) test for testing between H1 = {θ : θ ∈ Θ1} and H2 = {θ : θ ∈ Θ2} when
gGLR(y1, ..., yn) = H1 if ΛGLR(y1, ..., yn) < λα,
gGLR(y1, ..., yn) = H2 if ΛGLR(y1, ..., yn) ≥ λα,
with ΛGLR(y1, ..., yn) = sup_{θ2∈Θ2} ∏_{i=1}^{n} fθ2(yi) / sup_{θ1∈Θ1} ∏_{i=1}^{n} fθ1(yi).
Remark
The optimality of the GLR test is established in certain cases (exponential families when n → +∞, for example), but it is not necessarily optimal in all cases.
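The definition translates directly into code when the suprema are approximated numerically; a Python sketch (function names and grid-based maximization are illustrative assumptions, not the course's method):

```python
import numpy as np

def log_glr(y, logpdf, theta1_grid, theta2_grid):
    """Log generalized likelihood ratio:
    log Lambda_GLR = sup_{theta in Theta2} sum_i log f_theta(yi)
                   - sup_{theta in Theta1} sum_i log f_theta(yi),
    with the suprema approximated by maximizing over finite grids."""
    best = lambda grid: max(float(np.sum(logpdf(y, t))) for t in grid)
    return best(theta2_grid) - best(theta1_grid)

def glr_test(y, logpdf, theta1_grid, theta2_grid, log_lambda_alpha):
    """Accept H2 iff the log-GLR statistic reaches log(lambda_alpha)."""
    stat = log_glr(y, logpdf, theta1_grid, theta2_grid)
    return "H2" if stat >= log_lambda_alpha else "H1"

# Gaussian N(theta, 1) log-density, as in the location-testing examples.
logpdf = lambda y, t: -(y - t) ** 2 / 2.0 - 0.5 * np.log(2.0 * np.pi)
```

When the maximum-likelihood estimates over Θ1 and Θ2 are available in closed form (as in the Gaussian example that follows), the grids are unnecessary.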
Location testing with Gaussian errors
Assume yi ∼ N(θ, 1).
The two hypotheses are H1 : |θ| ≤ a and H2 : |θ| ≥ b with 0 < a < b.
The pdf of a Gaussian variable N(θ, 1) is φθ(x) = φ(x − θ) with
φ(x) = (1/√(2π)) exp(−x²/2).
Question
Find the GLR test.
Solution
(2/n) log ΛGLR(y1, ..., yn) = (2/n) log [ sup_{|θ2|≥b} ∏_{i=1}^{n} fθ2(yi) / sup_{|θ1|≤a} ∏_{i=1}^{n} fθ1(yi) ],
which leads to
(2/n) log ΛGLR(y1, ..., yn) = −(ȳ − b)² if |ȳ| ≤ a,
(2/n) log ΛGLR(y1, ..., yn) = −(ȳ − b)² + (ȳ − a)² if a ≤ |ȳ| ≤ b,
(2/n) log ΛGLR(y1, ..., yn) = (ȳ − a)² if |ȳ| ≥ b,
with ȳ = (1/n) ∑_{i=1}^{n} yi.
Since (2/n) log ΛGLR(y1, ..., yn) is an increasing function of |ȳ|, it follows that:
gGLR(y1, ..., yn) = H1 if ȳ² < λα,
gGLR(y1, ..., yn) = H2 if ȳ² ≥ λα.
When y1, ..., yn ∼ N(θ, 1), ȳ² ∼ χ²_n(‖θ‖²₂), which leads to λα = Ψ⁻¹_{n,a²}(1 − α), where Ψ_{n,a²} is the cumulative distribution function of a χ² variable with n degrees of freedom and non-centrality parameter a².
Part III
Sequential approaches
Outline of Part III
8 Motivation
9 Sequential probability ratio test
10 Change detection: known change
11 Change detection: unknown change
Motivation
In the previous part, we have shown that it is possible to minimize the error probabilities for a given sample size n.
New problem: for given error probabilities, try to minimize the sample size or, equivalently, to make the decision with as few observations as possible.
Sequential analysis is the theory of solving hypothesis testing problems when the sample size is not fixed a priori.
Basic definitions (1/2)
Definition (stopping time)
A random variable T is called a stopping time with respect to a process y1, ..., yn, ... if T takes only integer values and if, for every n ≥ 1, the event {T = n} is determined by (y1, ..., yn).
Example
The first time at which the process y1, ..., yn, ... visits a set A is a stopping time.
The last time at which the process y1, ..., yn, ... visits a set A is NOT a stopping time.
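The first-visit example is easy to make concrete; a small Python sketch (names illustrative): the decision to stop at time n only looks at y1, ..., yn, which is exactly the defining property of a stopping time.

```python
def first_visit_time(process, in_A):
    """First time the process visits the set A: a stopping time, because
    the event {T = n} depends only on y1, ..., yn (here even only on yn)."""
    for n, y in enumerate(process, start=1):
        if in_A(y):
            return n
    return None  # the process never visits A on this finite realization

# Example: first time a sequence exceeds 3.
print(first_visit_time([0.5, 1.2, 3.7, 0.1], lambda y: y > 3))  # 3
```

By contrast, the *last* visit cannot be computed without seeing the whole future of the process, which is why it is not a stopping time.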
Basic definitions (2/2)
Definition (sequential test)
A sequential test for testing between simple hypotheses H1 = {y1, ..., yn ∼ f1} and H2 = {y1, ..., yn ∼ f2} is defined to be a pair (g, T) where T is a stopping time and g(y1, ..., yn) is a decision function.
Definition (closed test)
We say that a sequential test (g, T) is closed if
P(T < +∞) = 1.
Remark
For a closed test (g, T), the mean number of observations necessary to decide between the two hypotheses is always finite: E1(T) < +∞ and E2(T) < +∞.
Sequential Probability Ratio Test (SPRT)
Definition (SPRT)
The test (g, T) is a Sequential Probability Ratio Test (SPRT) for testing between simple hypotheses H1 and H2 if we sequentially observe data y1, ..., yn and if, at time n, we make one of the following decisions:
accept H1 when Sn ≤ −a;
accept H2 when Sn ≥ b;
continue to observe and to test when −a < Sn < b,
where
Sn = ∑_{i=1}^{n} log( f2(yi) / f1(yi) )
and a, b are thresholds such that −∞ < −a < b < +∞.
Remark
The SPRT is closed.
Sequential location testing
Assume yi ∼ N(θ, 1).
The two hypotheses are H1 : θ = 0 and H2 : θ = 2.
The pdf of a Gaussian variable N(θ, 1) is φθ(x) = φ(x − θ) with
φ(x) = (1/√(2π)) exp(−x²/2).
Question
Find the SPRT.
Solution
Sn = ∑_{i=1}^{n} log( φ(yi − 2) / φ(yi) ) = 2 ∑_{i=1}^{n} (yi − 1).
[Figure: simulated data yi plotted against i.]
Solution
Sn = ∑_{i=1}^{n} log( φ(yi − 2) / φ(yi) ) = 2 ∑_{i=1}^{n} (yi − 1).
[Figure: simulated SPRT statistic Sn plotted against n, with the acceptance zone of H1 below −a and the acceptance zone of H2 above b.]
Optimality of the SPRT
Definition
Denote by Kα1,α2 the class of all (sequential and nonsequential) tests (g, T) such that
α1(g) ≤ α1, α2(g) ≤ α2,
E1(T) < +∞, E2(T) < +∞,
where Ei(T) is the mean number of observations under Hi.
Let (g*, T*) ∈ Kα1,α2 be an SPRT for testing between hypotheses H1 and H2.
Theorem
For every test (g, T) ∈ Kα1,α2, we have:
E1(T*) ≤ E1(T), E2(T*) ≤ E2(T).
Threshold selection: Wald’s identity
Theorem
The error probabilities of (g, T) verify:
log( α2(g) / (1 − α1(g)) ) ≤ min{0, −a},
log( (1 − α2(g)) / α1(g) ) ≥ max{0, b}.
Remark
The equalities hold for the SPRT when the excesses over the boundaries are small:
Pr1(ST = −a | H1 is accepted) ≃ Pr2(ST = b | H2 is accepted) ≃ 1.
The thresholds may be chosen by using the following approximations: a ≃ log((1 − α1)/α2), b ≃ log((1 − α2)/α1).
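A sketch of the SPRT with these Wald threshold approximations (Python; the names are illustrative, and `log_lr` is the per-observation log-likelihood ratio log f2(yi)/f1(yi)):

```python
import numpy as np

def sprt(stream, log_lr, alpha1, alpha2):
    """Wald's SPRT between H1 and H2 on an i.i.d. stream of observations.

    The thresholds use Wald's approximations, which neglect the excess over
    the boundaries: a ~ log((1 - alpha1)/alpha2), b ~ log((1 - alpha2)/alpha1).
    Returns the decision and the number of observations used."""
    a = np.log((1.0 - alpha1) / alpha2)
    b = np.log((1.0 - alpha2) / alpha1)
    s, n = 0.0, 0
    for n, y in enumerate(stream, start=1):
        s += log_lr(y)          # cumulated log-likelihood ratio S_n
        if s <= -a:
            return "H1", n      # accept H1
        if s >= b:
            return "H2", n      # accept H2
    return "continue", n        # stream exhausted before a decision

# Slide example, H1: theta = 0 vs H2: theta = 2 with unit variance:
# log f2(y)/f1(y) = 2*(y - 1).
print(sprt([2.0] * 50, lambda y: 2.0 * (y - 1.0), 0.01, 0.01))  # ('H2', 3)
```

With α1 = α2 = 0.01 the thresholds are a = b = log 99 ≈ 4.6, so three observations at y = 2 already cross the upper boundary.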
Motivation
The aim is to detect the occurrence of a change as soon as possible, with a fixed rate of false alarms before the unknown change time t0.
Let y1, y2, ... be a random sequence with pdf fθ(yk). Until the unknown time t0, the parameter is θ = θ1 and from t0 it becomes θ = θ2.
Let ta be the alarm time (stopping time) at which a detection occurs.
For estimating the efficiency of the detection, it is convenient to use the mean time between false alarms and the mean delay for detection.
Basic definitions (1/2)
It is assumed that the change time $t_0$ is non-random.
Definition (Mean time between false alarms)
The mean time between false alarms is defined as the expectation
$T = E_{\theta_1}(t_a),$
where $t_a$ is the alarm time.
Definition
Let $K_\gamma = \{t_a : T = E_{\theta_1}(t_a) \ge \gamma\}$ be the class of all sequential algorithms with a bounded mean time between false alarms.
Basic definitions (2/2)
Definition (Essential supremum)
Let $(y_i)_{i \in I}$ be a family of real-valued random variables bounded by another variable. We say that $y$ is an essential supremum for $(y_i)_{i \in I}$, denoted $y = \operatorname{ess\,sup}_I y_i$, if for every $z$:
$\forall i \in I,\ \Pr(y_i \le z) = 1 \iff \Pr(y \le z) = 1.$
Definition (Conditional mean delay)
The conditional mean delay for detection is defined as:
$E_{\theta_2}(t_a - t_0 + 1 \mid t_a \ge t_0, y_1, \ldots, y_{t_0-1}).$
Definition (Worst mean delay)
The worst mean delay for detection is defined as:
$\tau^*(t_a) = \sup_{t_0 \ge 1} \operatorname{ess\,sup}\, E_{\theta_2}(t_a - t_0 + 1 \mid t_a \ge t_0, y_1, \ldots, y_{t_0-1}).$
CUmulated SUM (CUSUM)
Definition (CUSUM)
The CUSUM algorithm $t_a$ is defined by:
$t_a = \min\{k \ge 1 : g_k \ge h\}$
where $g_k = S_k - m_k$,
$S_k = \sum_{i=1}^{k} s_i = \sum_{i=1}^{k} \log\frac{f_{\theta_2}(y_i)}{f_{\theta_1}(y_i)}, \quad m_k = \min_{1 \le j \le k} S_j,$
and h is the threshold.
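The direct form above can be sketched in a few lines of Python. The toy setting is an assumption for illustration: a unit-variance Gaussian mean change from 0 to 1, so the log-likelihood ratio is $s_i = y_i - 0.5$.

```python
def cusum_alarm(ys, llr, h):
    """Direct-form CUSUM: alarm at the first k with g_k = S_k - m_k >= h,
    where S_k is the cumulative log-likelihood ratio and m is the running
    minimum of S_0, ..., S_{k-1} (with S_0 = 0)."""
    S, m = 0.0, 0.0
    for k, y in enumerate(ys, start=1):
        S += llr(y)                  # S_k = S_{k-1} + s_k
        if S - m >= h:               # g_k = S_k - m_k
            return k
        m = min(m, S)
    return None

# Toy example: the mean jumps from 0 to 1 at t0 = 51, so s_i = y_i - 0.5
# drifts down before the change and up (by 0.5 per step) after it.
ys = [0.0] * 50 + [1.0] * 50
print(cusum_alarm(ys, lambda y: y - 0.5, h=5.0))  # → 60
```

With a deterministic sequence the detection delay is exactly $h / 0.5 = 10$ samples after the change.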
Intuitive derivation of the CUSUM
[Figure: left panel, the observations $y_k$ versus $k$; right panel, the cumulative sum $S_k$, its running minimum $m_k$, the threshold $h$, and the alarm time. Before the change $E_{\theta_1}(s_i) < 0$, so $S_k$ drifts downward; after the change $E_{\theta_2}(s_i) > 0$, so $S_k$ drifts upward until $S_k - m_k$ crosses $h$.]
CUmulated SUM (CUSUM)
Definition (CUSUM, recursive form)
The CUSUM algorithm $t_a$ can be rewritten:
$t_a = \min\{k \ge 1 : G_k \ge h\}$
where
$G_0 = 0, \quad G_k = \left[G_{k-1} + \log\frac{f_{\theta_2}(y_k)}{f_{\theta_1}(y_k)}\right]^+, \quad x^+ = \max(0, x).$
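A sketch of the recursive form, with the same assumed toy Gaussian log-likelihood ratio as before ($s_k = y_k - 0.5$); it produces the same alarm time as the direct form.

```python
def cusum_recursive(ys, llr, h):
    """Recursive CUSUM: G_0 = 0, G_k = max(0, G_{k-1} + s_k);
    alarm at the first k with G_k >= h."""
    G = 0.0
    for k, y in enumerate(ys, start=1):
        G = max(0.0, G + llr(y))
        if G >= h:
            return k
    return None

ys = [0.0] * 50 + [1.0] * 50                          # mean jumps at t0 = 51
print(cusum_recursive(ys, lambda y: y - 0.5, h=5.0))  # → 60
```

The recursive form needs O(1) memory per step, which is why it is the one used in practice.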
CUmulated SUM (CUSUM)
Definition (Kullback-Leibler distance)
The Kullback-Leibler distance between two probability densities $f_{\theta_1}$ and $f_{\theta_2}$ is defined as:
$I_{1,2} = \int \log\frac{f_{\theta_1}(y)}{f_{\theta_2}(y)}\, f_{\theta_1}(y)\, dy.$
This distance is always non-negative and is zero only when the two densities are equal.
Theorem (Lorden)
Let $n(\gamma) = \inf_{t_a \in K_\gamma} \tau^*(t_a)$. Then
$n(\gamma) = \frac{\log\gamma}{I_{2,1}}(1 + o(1))$
as $\gamma \to +\infty$, where $o(1)$ stands for a negligible term such that $o(1) \to 0$ as $\gamma \to +\infty$.
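To make $I_{1,2}$ concrete: for two unit-variance Gaussians $N(\theta_1, 1)$ and $N(\theta_2, 1)$ the distance has the closed form $(\theta_1 - \theta_2)^2 / 2$, which a numerical integration reproduces (a sketch, not from the slides; the integration range and step count are assumptions).

```python
import math

def kl_gauss_numeric(t1, t2, lo=-20.0, hi=20.0, n=200_000):
    """I_{1,2} = integral of log(f_t1(y)/f_t2(y)) f_t1(y) dy for
    N(t1, 1) versus N(t2, 1), approximated by a midpoint Riemann sum."""
    dy = (hi - lo) / n
    total = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * dy
        f1 = math.exp(-0.5 * (y - t1) ** 2) / math.sqrt(2 * math.pi)
        log_ratio = 0.5 * ((y - t2) ** 2 - (y - t1) ** 2)
        total += log_ratio * f1 * dy
    return total

# Closed form: (0 - 1)^2 / 2 = 0.5
print(round(kl_gauss_numeric(0.0, 1.0), 4))  # → 0.5
```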
CUmulated SUM (CUSUM)
Theorem (Lorden)
Let $t_a$ be a CUSUM algorithm designed so that $T = E_{\theta_1}(t_a) = \gamma$ with $\gamma > 0$. Then we have the following equality:
$\tau^*(t_a) = \frac{\log\gamma}{I_{2,1}}(1 + o(1))$
as $\gamma \to +\infty$.
Theorem (Optimality of the CUSUM)The CUSUM algorithm is asymptotically optimal in the classKγ .
Motivation
In practice, the distribution after the change is rarely known.
Let $y_1, y_2, \ldots$ be a random sequence with pdf $f_\theta(y_k)$. Until the unknown time $t_0$, the parameter is $\theta_1$; from $t_0$ on it becomes $\theta_2 \in \Theta_2$, where the set $\Theta_2$ is known.
Three main solutions:
Weighted likelihood ratio;
Invariant likelihood ratio;
Generalized likelihood ratio.
Method of weighting functions
It is assumed that $\theta_2$ follows an a priori distribution: $\theta_2 \sim p(\theta_2)$.
After the change, the observations $y_{t_0}, y_{t_0+1}, \ldots$ follow the mixture distribution:
$\bar f(y_k) = \int_{\Theta_2} f_{\theta_2}(y_k)\, p(\theta_2)\, d\theta_2, \quad k \ge t_0$
$\Rightarrow$ the hypothesis after the change becomes simple.
We can then apply the CUSUM algorithm.
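A minimal sketch of the weighting-function method under assumed values (a discrete two-point prior over post-change means, unit-variance Gaussians, pre-change mean 0; none of these numbers come from the slides): the mixture density $\bar f$ replaces $f_{\theta_2}$ in the CUSUM log-likelihood ratio.

```python
import math

def gauss_pdf(y, theta):
    """N(theta, 1) density."""
    return math.exp(-0.5 * (y - theta) ** 2) / math.sqrt(2 * math.pi)

def mixture_llr(y, prior, theta1=0.0):
    """s_k = log( f_bar(y) / f_theta1(y) ), with
    f_bar(y) = sum over theta2 of p(theta2) * f_theta2(y)."""
    f_bar = sum(p * gauss_pdf(y, t) for t, p in prior.items())
    return math.log(f_bar / gauss_pdf(y, theta1))

prior = {1.0: 0.5, 2.0: 0.5}   # assumed a priori distribution p(theta2)
print(round(mixture_llr(1.0, prior), 3))  # → 0.281
```

This increment can be fed directly into the recursive CUSUM of the previous section.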
Invariant principle
Certain problems are typically invariant with respect to a group of transformations.
The complexity of the hypotheses is then reduced by considering only the maximal invariant statistic.
An invariant statistic is a function of the observations such that:
the function is invariant with respect to the group of transformations;
all other invariant functions depend on this maximal invariant.
The simplified problem is then solved using classical tools.
Example
Notation: $N_p(\theta, I_p)$ denotes the $p$-dimensional Gaussian distribution with unit covariance matrix and mean $\theta \in \mathbb{R}^p$.
Problem: the observations $y_1, y_2, \ldots$ follow the distribution $N_p(0, I_p)$ before the change and the distribution $N_p(\theta, I_p)$ after the change, with $\|\theta\|_2^2 = \sum_{i=1}^{p} \theta_i^2 = c^2$, $c > 0$ known.
This problem is invariant with respect to the group of $p$-dimensional rotations. The invariant statistics are $\|y_1\|_2^2, \|y_2\|_2^2, \ldots$.
These "simplified" observations follow a central $\chi^2$ distribution with $p$ degrees of freedom before the change, and a non-central $\chi^2$ distribution with $p$ degrees of freedom and non-centrality parameter $c^2$ after the change.
GLR algorithm
It is based on the principle of the GLR test:
$t_a = \min\{k \ge 1 : g_k \ge h\}$ with
$g_k = \max_{1 \le j \le k}\, \sup_{\theta \in \Theta_2} \sum_{i=j}^{k} \log\frac{f_{\theta}(y_i)}{f_{\theta_1}(y_i)}.$
The optimality properties of this algorithm are not known, except in certain cases (exponential families, ...).
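For the Gaussian case with an unknown post-change mean (pre-change $N(0,1)$, post-change $N(\theta,1)$ with $\theta$ unknown; an assumed illustration, not a case treated on the slides), the inner supremum has the closed form $(\sum_{i=j}^{k} y_i)^2 / (2(k - j + 1))$, and the GLR rule can be sketched as:

```python
def glr_gaussian(ys, h):
    """GLR change detection for pre-change N(0, 1) and post-change
    N(theta, 1), theta unknown: for each window [j, k],
    sup over theta of the LLR sum = (y_j + ... + y_k)^2 / (2 (k - j + 1))."""
    prefix = [0.0]                       # prefix[j] = y_1 + ... + y_j
    for k, y in enumerate(ys, start=1):
        prefix.append(prefix[-1] + y)
        g = max((prefix[k] - prefix[j - 1]) ** 2 / (2.0 * (k - j + 1))
                for j in range(1, k + 1))
        if g >= h:
            return k
    return None

ys = [0.0] * 30 + [1.0] * 30             # mean jumps from 0 to 1 at t0 = 31
print(glr_gaussian(ys, h=5.0))           # → 40
```

Note the cost grows with $k$ (the maximization is over all candidate change points $j$), which is the usual practical drawback of the GLR compared with the recursive CUSUM.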
[Sidebar — Statistical Decision Theory, L. Fillatre: DOS attack detection (attack scheme, detection scheme, problem statement) · LR-CUSUM (definition, optimality) · NP-CUSUM (principle, definition) · Example (a Poisson example, comparison)]
Part IV
A case study
Outlines of Part IV
12 DOS attack detection
13 Multichannel parametric CUSUM
14 Multichannel non-parametric CUSUM
15 Practical example
Typical “SYN flooding” attack scheme
SYN flooding attacks exploit TCP's three-way handshake mechanism and its limitation in maintaining half-open connections.
[Figure: the client sends a SYN; the server answers with a SYN/ACK and keeps a half-open connection while waiting for the final ACK; in the attack the ACK never arrives, and the server holds the half-open connection until a timeout.]
Typical detection scheme
The aim is to detect Denial of Service (DOS) attacks: SYN flooding attacks, UDP packet storms, ...
A DOS attack is generally characterized by an increase in the number of packets of a particular size.
Principle of monitoring:
split packet sizes into a set of bins (or channels);
monitor these channels simultaneously;
detect a change in one of these channels.
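The binning step can be sketched as follows (the bin edges and packet sizes are purely illustrative assumptions, not values from the slides):

```python
def bin_counts(packet_sizes, edges):
    """Split packet sizes into N = len(edges) - 1 channels and count the
    packets falling in each channel; one such count vector y_k is
    produced per monitoring time slot."""
    counts = [0] * (len(edges) - 1)
    for s in packet_sizes:
        for i in range(len(edges) - 1):
            if edges[i] <= s < edges[i + 1]:
                counts[i] += 1
                break
    return counts

edges = [0, 64, 512, 1024, 1519]      # assumed channel boundaries (bytes)
sizes = [40, 40, 60, 600, 1500, 700]  # packet sizes seen in one time slot
print(bin_counts(sizes, edges))       # → [3, 0, 2, 1]
```

Each channel's count sequence is then monitored by its own change-detection statistic, as formalized on the next slides.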
Problem statement
Denote by $N$ the number of channels and by $y_k(i)$, $k \ge 1$, the number of packets measured in the $i$-th channel at time $k$.
Until the unknown time $t_0$, each random value $y_k(i)$ follows a distribution $P_{\theta_{0,i}}$; from $t_0$ on, the distribution of only one of the random variables, say the $i$-th channel, changes to $P_{\theta_i}$.
It is assumed that each distribution $P_{\theta_{0,i}}$ and $P_{\theta_i}$ admits a pdf, denoted $f_{\theta_{0,i}}$ and $f_{\theta_i}$ respectively.
LR-CUSUM
Definition (LR-CUSUM)
The multichannel parametric CUSUM algorithm $t_a$, simply called LR-CUSUM, is defined by:
$t_a = \min_{1 \le i \le N} t_a(i)$
where $t_a(i) = \min\{k \ge 1 : U_k(i) \ge h_i\}$,
$U_k(i) = \max_{1 \le j \le k} S_j^k(i), \quad S_j^k(i) = \sum_{\ell=j}^{k} s_\ell(i) = \sum_{\ell=j}^{k} \log\frac{f_{\theta_i}(y_\ell(i))}{f_{\theta_{0,i}}(y_\ell(i))},$
and $h_i$ is the threshold adapted to the $i$-th channel.
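A sketch of the LR-CUSUM, running one CUSUM statistic per channel in its equivalent recursive form. The per-channel log-likelihood ratio used in the example is an assumed toy Gaussian one (mean change from 0 to 1, unit variance), not taken from the slides.

```python
def lr_cusum_multichannel(observations, llrs, hs):
    """Multichannel LR-CUSUM: t_a = min over i of t_a(i), with each
    channel's statistic U_k(i) computed recursively as
    G_k(i) = [G_{k-1}(i) + s_k(i)]^+.
    Returns (alarm time, channel index) or None."""
    N = len(llrs)
    G = [0.0] * N
    for k, y in enumerate(observations, start=1):
        for i in range(N):
            G[i] = max(0.0, G[i] + llrs[i](y[i]))
            if G[i] >= hs[i]:
                return k, i
    return None

# Two channels; the mean in channel 1 jumps from 0 to 1 at k = 21.
obs = [(0.0, 0.0)] * 20 + [(0.0, 1.0)] * 20
llr = lambda y: y - 0.5                 # assumed unit-variance Gaussian LLR
print(lr_cusum_multichannel(obs, [llr, llr], [4.0, 4.0]))  # → (28, 1)
```

The returned channel index identifies where the change was detected, which matters for diagnosing which packet-size bin carries the attack.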
Criterion of optimality
Definition (False alarm rate)
The False Alarm Rate (FAR) is defined by:
$\mathrm{FAR}(t_a) = \frac{1}{E_{\theta_0}[t_a]}.$
Definition (Average detection delay)
When the hypothesis $H_{t_0,i}$ = "a change occurs at time $t_0$ in the $i$-th channel" is true, the speed of detection is measured by the conditional Average Detection Delay (ADD):
$\mathrm{ADD}_{t_0,i}(t_a) = E_{t_0,i}[t_a - t_0 + 1 \mid t_a \ge t_0], \quad t_0 \ge 1,\ i = 1, \ldots, N.$
Optimality of the LR-CUSUM
Assume $h_i = h$ for all $i = 1, \ldots, N$.
Denote $I_i = \int \log\frac{f_{\theta_i}(y)}{f_{\theta_{0,i}}(y)}\, f_{\theta_i}(y)\, dy.$
Theorem
Suppose $E_{\theta_i}\!\left[\log\frac{f_{\theta_i}(y_\ell(i))}{f_{\theta_{0,i}}(y_\ell(i))}\right]^2 < +\infty$ for all $i$. Then:
For all $t_0 \ge 1$ and $i = 1, \ldots, N$:
$\mathrm{ADD}_{t_0,i}(t_a) \sim \frac{h}{I_i}$ as $h \to +\infty$.
If $h = \log(N\gamma)$, then $\mathrm{FAR}(t_a) \le 1/\gamma$ and
$\inf_{\tau : \mathrm{FAR}(\tau) \le 1/\gamma}\, \sup_{t_0 \ge 1} \mathrm{ADD}_{t_0,i}(\tau) \sim \frac{\log\gamma}{I_i}$
as $\gamma \to +\infty$.
Non-parametric change detection
When the distributions $P_{\theta_i}$ are unknown, the likelihood ratios are also unknown.
The quantities $S_j^k(i)$ should then be replaced by appropriate score functions $V_j^k(i)$ such that $E_{\theta_0}[V_j^k(i)] < 0$ and $E_{\theta_i}[V_j^k(i)] > 0$.
Typical DOS attacks lead to abrupt changes in the mean value of the number of packets. Therefore, the decision function should be sensitive to changes in mean values.
Notations and definitions
Let $\mu_i = E_{\theta_0}[y_k(i)]$ and $\theta_i = E_{\theta_i}[y_k(i)]$ denote the pre-change and post-change mean values in the $i$-th channel, assuming $\mu_i < \theta_i$.
Definition (Score function)
The score functions $V_j^k(i)$ are defined by
$V_j^k(i) = \sum_{\ell=j}^{k} w_i \left(y_\ell(i) - \mu_i - c_{i,\ell}\right), \quad i = 1, \ldots, N,$
where $w_i > 0$, $c_{i,\ell} > 0$ are tuning parameters.
It is assumed that $c_{i,\ell} = c_i$ for all $\ell$. Denote $V_i(y_\ell(i)) = w_i (y_\ell(i) - \mu_i - c_i)$. We have:
$E_{\theta_0}[V_i(y_\ell(i))] = -w_i c_i < 0 \quad \text{and} \quad E_{\theta_i}[V_i(y_\ell(i))] = w_i (\theta_i - \mu_i - c_i) > 0$
for $c_i$ judiciously chosen ($0 < c_i < \theta_i - \mu_i$).
Definition of NP-CUSUM
Definition (NP-CUSUM)
The NP-CUSUM algorithm $t'_a$ is defined by:
$t'_a = \min\{k \ge 1 : \max_{1 \le i \le N} W_k(i) \ge h\}$
where
$W_k(i) = \max_{1 \le j \le k} V_j^k(i)$
and h is a threshold.
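A sketch of the NP-CUSUM with assumed toy parameters (none of the numbers come from the slides). Each $W_k(i)$ is run in the recursive form $[W_{k-1}(i) + V_i(y_k(i))]^+$, which agrees with $\max_{1 \le j \le k} V_j^k(i)$ whenever that maximum is positive, the only case that matters for crossing a threshold $h > 0$.

```python
def np_cusum(observations, mus, ws, cs, h):
    """Non-parametric multichannel CUSUM: per-channel score
    V_i(y) = w_i * (y - mu_i - c_i), with 0 < c_i < theta_i - mu_i so the
    score mean is negative pre-change and positive post-change."""
    N = len(mus)
    W = [0.0] * N
    for k, y in enumerate(observations, start=1):
        for i in range(N):
            W[i] = max(0.0, W[i] + ws[i] * (y[i] - mus[i] - cs[i]))
        if max(W) >= h:
            return k
    return None

# One channel: pre-change mean mu = 2, post-change mean theta = 5,
# tuning c = 1 (so 0 < c < theta - mu) and w = 1, all assumed values.
obs = [(2.0,)] * 20 + [(5.0,)] * 20
print(np_cusum(obs, mus=[2.0], ws=[1.0], cs=[1.0], h=8.0))  # → 24
```

Note that only the pre-change means $\mu_i$ and the tuning constants are needed; no likelihood is ever evaluated, which is the point of the non-parametric variant.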
A poisson example
Assume the number of packets in the $i$-th channel follows the Poisson distribution $P(\mu_i)$ in the pre-change mode and $P(\theta_i)$ after the change occurs in the $i$-th channel:
$\Pr(y_k(i) = m) = \frac{\mu_i^m}{m!} e^{-\mu_i}, \quad k < t_0,$
$\Pr(y_k(i) = m) = \frac{\theta_i^m}{m!} e^{-\theta_i}, \quad k \ge t_0.$
It is assumed that $\theta_i, \mu_i$ are known and $\theta_i > \mu_i$.
Question
Find the LR-CUSUM;
Show that the NP-CUSUM is asymptotically optimal when $c_i = \varepsilon_i \theta_i$, where the variables $\varepsilon_i$ need to be specified.
Comparison between the algorithms
The LR-CUSUM is based on the statistic
$s_\ell(i) = y_\ell(i) \log(\theta_i/\mu_i) - (\theta_i - \mu_i).$
The NP-CUSUM is based on the statistic
$V_i(y_\ell(i)) = w_i \left(y_\ell(i) - \mu_i - \varepsilon_i \theta_i\right).$
It is straightforward to verify that the NP-CUSUM coincides with the LR-CUSUM test if
$\varepsilon_i = \frac{Q_i - \log Q_i - 1}{Q_i \log Q_i}, \quad w_i = \log Q_i,$
with $Q_i = \theta_i/\mu_i$, which proves that the NP-CUSUM is asymptotically optimal.
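This identity can be checked numerically; the values $\theta_i = 6$, $\mu_i = 2$ below are arbitrary (any $\theta_i > \mu_i > 0$ works).

```python
import math

theta, mu = 6.0, 2.0                    # arbitrary post/pre-change means
Q = theta / mu
eps = (Q - math.log(Q) - 1.0) / (Q * math.log(Q))
w = math.log(Q)

# With c = eps * theta, the NP-CUSUM score w * (y - mu - eps * theta)
# equals the Poisson LR-CUSUM statistic y * log(theta/mu) - (theta - mu)
# for every observation y.
for y in range(0, 15):
    s_lr = y * math.log(theta / mu) - (theta - mu)
    v_np = w * (y - mu - eps * theta)
    assert abs(s_lr - v_np) < 1e-9
print("NP-CUSUM score coincides with the LR-CUSUM statistic")
```

The algebra behind the check: $w(\mu + \varepsilon\theta) = \mu \log Q + \mu(Q - \log Q - 1) = \mu Q - \mu = \theta - \mu$, so the two affine functions of $y$ have the same slope and intercept.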