A Martingale Framework for Concept Change Detection in Time-Varying Data Stream
Ho Shen-Shyang
Department of Computer Science, George Mason University
Preview:
● Problem: In a data streaming setting, data points are observed one by one. The concepts to be learned from the data stream may change infinitely often.
● How do we detect the changes efficiently?
● Other topics: concept drift, anomaly detection, ...
● Testing Exchangeability Online (Vovk et al., ICML 2003)
Outline:
● Background: Strangeness, Martingale, Exchangeability
● Martingale Framework - Two Tests
● Theoretical Justifications
● Additional Theoretical Results
● Experimental Results
Strangeness Measure (Saunders et al., IJCAI 1999)
● Support Vector Machine: value of the Lagrange multiplier or distance from the hyperplane (we use the SVM Lagrange multiplier, computed with an incremental SVM (Cauwenberghs and Poggio, NIPS 2000))
● K-nearest-neighbor rule: A/B, where
A – sum of the distances from a point to its k nearest points with the same label
B – sum of the distances from a point to its k nearest points with a different label
α: a score of how different a data point is from the rest.
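The k-nearest-neighbor strangeness above can be sketched in a few lines. This is a minimal illustration, not the implementation used in the talk: the function name `knn_strangeness`, the Euclidean distance, and the toy data are assumptions made for the example.

```python
import math

def knn_strangeness(point, label, data, labels, k=3):
    """Strangeness alpha = A / B for the k-nearest-neighbor rule.

    A: sum of distances to the k nearest points with the same label.
    B: sum of distances to the k nearest points with a different label.
    `data` is assumed not to contain `point` itself (hypothetical convention).
    """
    same = sorted(math.dist(point, x) for x, y in zip(data, labels) if y == label)
    diff = sorted(math.dist(point, x) for x, y in zip(data, labels) if y != label)
    a = sum(same[:k])
    b = sum(diff[:k])
    return a / b if b > 0 else math.inf

# Toy data: two well-separated clusters (illustrative values only).
data = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = [0, 0, 0, 1, 1, 1]

alpha_typical = knn_strangeness((0.5, 0.5), 0, data, labels, k=2)  # label agrees with neighbors
alpha_odd = knn_strangeness((0.5, 0.5), 1, data, labels, k=2)      # label disagrees with neighbors
print(alpha_typical, alpha_odd)
```

A point whose label agrees with its neighborhood gets a strangeness below 1, and a point whose label disagrees gets a value above 1.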
Testing Exchangeability: Definitions
Let $\{Z_i : 1 \le i < \infty\}$ be a sequence of random variables.
A finite sequence of random variables $Z_1, \dots, Z_n$ is exchangeable if the joint distribution $p(Z_1, \dots, Z_n)$ is invariant under any permutation of the indices of the random variables.
A martingale is a sequence of random variables $\{M_i : 0 \le i < \infty\}$ such that $M_n$ is a measurable function of $Z_1, \dots, Z_n$ for all $n = 0, 1, \dots$ ($M_0$ is a constant, say 1) and the conditional expectation of $M_{n+1}$ given $M_1, \dots, M_n$ is equal to $M_n$, i.e.
$$E(M_{n+1} \mid M_1, \dots, M_n) = M_n$$
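Testing Exchangeability (Vovk et al., ICML 2003)
$$p_n = V(Z \cup \{z_n\}, \theta_n) = \frac{\#\{i : \alpha_i > \alpha_n\} + \theta_n \, \#\{i : \alpha_i = \alpha_n\}}{n}$$
where $\alpha_i$ is the strangeness of $z_i$, $i = 1, \dots, n$, and $\theta_n$ is drawn uniformly from [0, 1].
$$M_n^{(\varepsilon)} = \prod_{i=1}^{n} \varepsilon \, p_i^{\varepsilon - 1}$$
where $\varepsilon \in [0, 1]$ (say 0.92) and $M_0^{(\varepsilon)} = 1$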
Performing Kolmogorov-Smirnov Test on the p-value distribution as data is observed one by one.
Skewed p-value distribution: small p-values inflate the martingale values
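A short sketch of how the p-values and the power martingale can be computed from a list of strangeness values. The helper names `p_value` and `power_martingale` are hypothetical, and the steadily increasing strangeness sequence is an artificial input chosen so that every new point is the strangest seen so far.

```python
def p_value(alphas, theta):
    """Randomized p-value of the newest strangeness value alphas[-1].

    Counts strangeness values strictly greater than the newest one, plus
    theta times the number of ties (the newest value ties with itself).
    """
    a_n = alphas[-1]
    greater = sum(1 for a in alphas if a > a_n)
    equal = sum(1 for a in alphas if a == a_n)
    return (greater + theta * equal) / len(alphas)

def power_martingale(p_values, eps=0.92):
    """Return M_1, ..., M_n, where M_0 = 1 and M_n = M_{n-1} * eps * p_n**(eps - 1)."""
    m = 1.0
    values = []
    for p in p_values:
        m *= eps * p ** (eps - 1.0)
        values.append(m)
    return values

# Artificial stream: each new point is stranger than all earlier ones.
alphas = [float(i) for i in range(1, 51)]
# theta = 1 for all n gives the deterministic variant, so p_n = 1/n here.
p_values = [p_value(alphas[:n], 1.0) for n in range(1, len(alphas) + 1)]
martingale = power_martingale(p_values)
print(p_values[:3], martingale[-1])
```

Because every p-value here is as small as it can be, the martingale ends well above its starting value of 1, which is the behavior the tests below rely on.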
Martingale Framework: Test for Change Detection
Consider the simple null hypothesis H0: “no concept change in the data stream”
against the alternative hypothesis H1: “concept change occurs in the data stream”
Martingale Framework: Test for Change Detection
Martingale Test 1 (MT1)
$$0 < M_n^{(\varepsilon)} < \lambda$$
where λ is a positive number. One rejects the null hypothesis when $M_n^{(\varepsilon)} \ge \lambda$.
Martingale Test 2 (MT2)
$$0 < |M_n^{(\varepsilon)} - M_{n-1}^{(\varepsilon)}| < t$$
where t is a positive number. One rejects the null hypothesis when $|M_n^{(\varepsilon)} - M_{n-1}^{(\varepsilon)}| \ge t$.
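Both tests can be run online over a stream of strangeness values. The sketch below is one possible arrangement, not the author's code: the function name `detect_changes`, the restart of the martingale after each detection, the threshold values, and the synthetic stream are all assumptions made for illustration.

```python
import random

def detect_changes(alphas, eps=0.92, lam=20.0, t=None, seed=0):
    """Return the indices at which H0 is rejected.

    Uses MT1 (M_n >= lam) by default, or MT2 (|M_n - M_{n-1}| >= t) when t is given.
    """
    rng = random.Random(seed)
    seen, m, detections = [], 1.0, []
    for n, a in enumerate(alphas):
        seen.append(a)
        theta = 1.0 - rng.random()  # theta in (0, 1], so the p-value is never 0
        greater = sum(1 for s in seen if s > a)
        equal = sum(1 for s in seen if s == a)
        p = (greater + theta * equal) / len(seen)
        m_prev, m = m, m * eps * p ** (eps - 1.0)
        reject = abs(m - m_prev) >= t if t is not None else m >= lam
        if reject:
            detections.append(n)
            seen, m = [], 1.0  # assumption: start afresh after each detection
    return detections

# Synthetic strangeness stream whose distribution shifts at index 300.
rng = random.Random(1)
stream = [rng.uniform(0, 1) for _ in range(300)] + [rng.uniform(2, 3) for _ in range(100)]

mt1 = detect_changes(stream, lam=20.0)
mt2 = detect_changes(stream, t=3.5)
print("MT1 detections:", mt1)
print("MT2 detections:", mt2)
```

After the shift, new strangeness values outrank the earlier ones, the p-values become small, and the martingale grows until one of the tests fires.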
Justification for Martingale Test 1: Doob's Maximal Inequality
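Assuming that $\{M_i : 0 \le i < \infty\}$ is a nonnegative martingale, Doob's maximal inequality states that for any $\lambda > 0$ and $0 \le n < \infty$,
$$\lambda \, P\left(\max_{k \le n} M_k \ge \lambda\right) \le E(M_n)$$
Hence, if $E(M_n) = E(M_0) = 1$, then
$$P\left(\max_{k \le n} M_k \ge \lambda\right) \le \frac{1}{\lambda}$$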
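The resulting false alarm bound of 1/λ can be checked empirically. The sketch below simulates the power martingale on independent uniform p-values, which is what an exchangeable stream produces, and counts how often it ever reaches λ. The function names, run counts, and parameter values are illustrative assumptions.

```python
import random

def max_exceeds(lam, n_steps, eps, rng):
    """Run one power martingale on uniform p-values; report whether it ever reaches lam."""
    m = 1.0
    for _ in range(n_steps):
        p = 1.0 - rng.random()  # uniform on (0, 1], avoids p == 0
        m *= eps * p ** (eps - 1.0)
        if m >= lam:
            return True
    return False

def false_alarm_rate(lam=20.0, n_steps=500, eps=0.92, runs=1000, seed=0):
    """Fraction of simulated no-change streams on which MT1 would fire."""
    rng = random.Random(seed)
    hits = sum(max_exceeds(lam, n_steps, eps, rng) for _ in range(runs))
    return hits / runs

rate = false_alarm_rate()
print(f"empirical false alarm rate: {rate:.3f}, Doob bound: {1 / 20.0:.3f}")
```

Under these assumptions the empirical rate should not exceed 1/λ beyond sampling noise, so λ can be chosen from a target false alarm rate.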
Justification for Martingale Test 2: Hoeffding-Azuma Inequality
Let $c_1, \dots, c_m$ be positive constants and let $Y_1, \dots, Y_m$ be a martingale difference sequence with $|Y_k| \le c_k$ for each k.
Then for any $t \ge 0$,
$$P\left(\left|\sum_{k=1}^{m} Y_k\right| \ge t\right) \le 2 \exp\left(-\frac{t^2}{2 \sum_{k=1}^{m} c_k^2}\right)$$
For the deterministic martingale ($\theta_n = 1$ for all n), the martingale difference at each n is bounded and attains its maximum when $p_n = 1/n$.
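The inequality can be turned into a threshold rule: fix a tolerated probability δ and solve for t. The sketch below uses the usual two-sided form of the bound; the function names and the example constants are assumptions, and the caller must supply valid bounds `c` on the martingale differences.

```python
import math

def hoeffding_azuma_bound(t, c):
    """Upper bound 2 * exp(-t^2 / (2 * sum(c_k^2))) on P(|Y_1 + ... + Y_m| >= t)."""
    total = sum(ck * ck for ck in c)
    return min(1.0, 2.0 * math.exp(-t * t / (2.0 * total)))

def threshold_for(delta, c):
    """Smallest t for which the bound above is at most delta (0 < delta < 1)."""
    total = sum(ck * ck for ck in c)
    return math.sqrt(2.0 * total * math.log(2.0 / delta))

c = [1.0]                      # a single difference bounded by 1 (illustrative)
t = threshold_for(0.05, c)
print(t, hoeffding_azuma_bound(t, c))
```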
Justification for Martingale Test 2:
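When m = 1, the Hoeffding-Azuma inequality becomes
$$P(|Y_1| \ge t) \le 2 \exp\left(-\frac{t^2}{2 c_1^2}\right)$$
Assuming that $M_{n-1}^{(\varepsilon)} = M_0^{(\varepsilon)} = 1$, and taking $Y_1 = M_n^{(\varepsilon)} - M_{n-1}^{(\varepsilon)}$ with $c_1 = \varepsilon n^{1-\varepsilon} - 1$ (the difference at $p_n = 1/n$),
$$P\left(|M_n^{(\varepsilon)} - M_{n-1}^{(\varepsilon)}| \ge t\right) \le 2 \exp\left(-\frac{t^2}{2 (\varepsilon n^{1-\varepsilon} - 1)^2}\right)$$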
Comparison:
Some Theoretical Results for Martingale Test 1 (Ho & Wechsler, UAI 2005)
● The martingale test based on Doob's inequality is an approximation of the sequential probability ratio test.
●
where α is the desired size (probability of a type I error) and β is the probability of a type II error
● The mean delay time from the true change point is:
where
Experiments
Precision = Number of Correct Detections / Number of Detections
Recall = Number of Correct Detections / Number of True Changes
Precision: probability that a detection is actually correct
Recall: probability that the system recognizes a true change
Delay time (for a detected change): the number of time units from a true change point to the detected change point, if any
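These measures can be computed from a list of true change points and a list of detections. The matching rule in this sketch is an assumption, since the slides do not spell one out: a detection counts as correct if it is the first detection at or after a true change point and before the next one. The function name `evaluate` and the sample numbers are hypothetical.

```python
def evaluate(true_changes, detections):
    """Return (precision, recall, delays) under the matching rule described above."""
    true_changes = sorted(true_changes)
    matched = {}  # true change point -> first detection at or after it
    for d in sorted(detections):
        prior = [c for c in true_changes if c <= d]
        if prior and prior[-1] not in matched:
            matched[prior[-1]] = d
    correct = len(matched)
    precision = correct / len(detections) if detections else 0.0
    recall = correct / len(true_changes) if true_changes else 0.0
    delays = [d - c for c, d in matched.items()]
    return precision, recall, delays

# Illustrative numbers: three true changes, four detections.
precision, recall, delays = evaluate([100, 200, 300], [50, 112, 130, 305])
print(precision, recall, delays)
```

In this example the detections at 50 and 130 are false alarms and the change at 200 is missed, so two of four detections are correct and two of three changes are found.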
Experimental Results: Synthetic Data Stream with noise (10-D Rotating Hyperplane) – Precision and Recall
Experimental Results: Synthetic Data Stream – Mean and Median Delay Time
Experimental Results: Numerical (WaveNorm & TwoNorm) and Categorical data streams (Nursery)
Experimental Results: Multi-class data streams (Modified USPS dataset)
Dataset: 10 classes, 256 dimensions, 7291 data points
Data stream: 3 classes.
Experimental Results: Multi-class data streams (Modified USPS dataset)
Conclusions:
● Our martingale approach is an efficient, one-pass incremental algorithm that
● does not require a sliding window on the data stream
● does not require monitoring the performance of a base classifier as data is streaming
● works well for high-dimensional, multi-class data streams
● is theoretically justified
Conclusions/Future (Current) Work:
● Previous works: Kifer et al. (VLDB 2004), Fan et al. (SDM 2004), Wald (1947), Page (1957), ...
● Extension to unlabeled and one-class data streams
● Applications: keyframe extraction, anomaly detection, adaptive classifier (Ho and Wechsler, IJCAI 2005)
● Comparison using different classifiers (i.e. different strangeness measures, also weak classifiers)
● Comparison with other change detection algorithms
● http://cs.gmu.edu/~sho/research/change_detection.html
Acknowledgement: Vladimir Vovk, Harry Wechsler.