Post on 23-Dec-2014
Real-time Ranking with Concept Drift Using Expert Advice
Hila Becker and Marta Arias
Center for Computational Learning Systems, Columbia University
Dynamic Ranking
• Continuous arrival of data over time
• Set of items to rank
• Dynamic features
• Adapt to changes
Given a list of electrical grid components, produce a ranking according to failure susceptibility
[Figure: grid components labeled + (failure) or − arriving over time]
Problem Setting
[Figure: at time steps 1, 2, 3, …, t−1 the labels (+/−) of the items are known; at time t the model M must predict the unknown labels (?)]
• Each item is a feature vector x = (x1, x2, …, xn)
• Each item has a label y
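The setting above can be sketched as a simple online loop: at each time step a ranking of the items must be committed before the labels are revealed, and only afterwards can the model be updated. This is a minimal illustrative sketch, not the authors' implementation; the function names and the (features, labels) stream shape are assumptions.

```python
def average_rank_loss(order, labels):
    """Normalized average rank (1-based) of the positive items in a ranking."""
    n = len(order)
    pos_ranks = [pos for pos, item in enumerate(order, start=1) if labels[item]]
    return sum(pos_ranks) / (len(pos_ranks) * n)

def online_loop(stream, rank, update):
    """stream yields (features, labels) per time step; the ranking is
    committed before the labels are seen, then the model adapts."""
    losses = []
    for features, labels in stream:
        order = rank(features)        # commit to a ranking of the items
        losses.append(average_rank_loss(order, labels))
        update(features, labels)      # adapt with the revealed labels
    return losses
```

A lower loss means the positive (e.g. failing) items were ranked nearer the top.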
Challenges
• Changes in the underlying distribution are hidden (concept drift): adapt the learning model to improve predictions
• Finite storage space: sample from the data; discard old or irrelevant information
Concept Drift
[Figure: the boundary between + and − examples shifts over time]
Ensemble Methods
[Figure: an ensemble of experts maintained over time]
Weighted Expert Ensembles
• Associate a weight with each expert: a measure of belief in the expert's performance
• Weights used in the final prediction: use only the best expert, or a weighted average of predictions
• Update the weights after every prediction
Weighted Majority Algorithm
[Figure: N experts e1, e2, e3, …, eN each vote 0 or 1 on the query; the ensemble computes the weighted vote w1·1 + w2·0 + w3·0 + … + wN·1 and predicts 1 if the normalized sum exceeds 0.5, otherwise 0]
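The vote-and-threshold scheme above, together with the weight update mentioned earlier, can be sketched as the standard Weighted Majority algorithm (Littlestone and Warmuth): mistaken experts have their weight multiplied by a penalty factor β ∈ (0, 1). The class name and β default are assumptions for illustration.

```python
class WeightedMajority:
    """Minimal sketch of the Weighted Majority algorithm."""

    def __init__(self, n_experts, beta=0.5):  # beta in (0,1): penalty factor
        self.w = [1.0] * n_experts
        self.beta = beta

    def predict(self, votes):
        # weighted fraction of experts voting 1, thresholded at one half
        score = sum(w for w, v in zip(self.w, votes) if v == 1) / sum(self.w)
        return 1 if score > 0.5 else 0

    def update(self, votes, truth):
        # multiply the weight of every mistaken expert by beta
        self.w = [w * (self.beta if v != truth else 1.0)
                  for w, v in zip(self.w, votes)]
```

After a few updates, experts that are frequently wrong contribute little to the vote.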
Modified Weighted Majority
• Different constraints for data streams: incorporate new data; static vs. dynamic set of experts
• Ranking algorithm:
• Loss function: 1 − normalized average rank of positive examples
• Combine predictions: weighted average rank
Online Ranking Algorithm
[Figure: B experts e1, e2, e3, …, eB with weights w1, w2, w3, …, wB each produce their own ranking of the feeders F1–F5; the ensemble combines them into a single predicted ranking, and new experts eB+1, eB+2 with weights wB+1, wB+2 are added as new data arrives]
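The combination step pictured above can be sketched as follows: each expert ranks the same set of items, and the ensemble orders items by their weighted average rank (lower is more failure-prone). The function name is an assumption; this is an illustrative sketch, not the authors' code.

```python
def combine_rankings(rankings, weights):
    """Combine expert rankings by weighted average rank.

    rankings: list of item lists, each a full ranking (best first).
    weights:  one non-negative weight per expert.
    Returns the items ordered by ascending weighted average rank.
    """
    total_w = sum(weights)
    weighted_rank = {}
    for ranking, w in zip(rankings, weights):
        for pos, item in enumerate(ranking, start=1):
            weighted_rank[item] = weighted_rank.get(item, 0.0) + w * pos
    return sorted(weighted_rank, key=lambda item: weighted_rank[item] / total_w)
```

An expert with a large weight pulls the combined ranking toward its own ordering.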
Performance – Summer 05
Performance – Winter 06
Contributions
• Additive weighted ensemble based on the Weighted Majority algorithm
• Algorithm adapted to ranking
• Experiments on a real-world data stream: outperforms traditional approaches
• Explores performance/complexity tradeoffs
Future Work
• Ensemble diversity control
• Exploit re-occurring contexts: use knowledge of cyclic patterns; revive old experts
• Change detection
• Statistical estimation of the predicting ensemble size
Ensemble Methods
• Static ensemble with online learners [Hulten ’01]
• Use batch learners as experts: can use many learning algorithms; loses interpretability
• Additive ensembles: train an expert at constant intervals [Street and Kim ’01]; train an expert when performance declines [Kolter ’05]
Ensemble Pruning
• Additive ensembles can grow infinitely large
• Criteria for removing experts:
• Age: retire the oldest model [Chu and Zaniolo ’04]
• Performance: worst in the ensemble, or below a minimal threshold [Stanley ’01]
• Instance-based pruning [Wang et al. ’03]
Dealing with a moving set of experts
• Introduce new parameters:
• B: “budget” (max number of models), set to 100
• p: new model’s weight percentile, in [0, 100]
• α: age penalty, in (0, 1]
• If there are too many models (more than B), drop the models with the poorest q-score, where qi = wi · α^agei
• I.e., α is the rate of exponential decay
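The pruning rule above can be sketched in a few lines: score every expert by q = w · α^age and keep only the B best, so an old expert survives only if its weight outweighs the exponential age penalty. The tuple layout and function name are assumptions for illustration.

```python
def prune(experts, B, alpha):
    """Keep at most B experts, ranked by q-score q_i = w_i * alpha**age_i.

    experts: list of (weight, age, model) tuples.
    alpha:   age penalty in (0, 1]; smaller alpha retires old models faster.
    """
    def q_score(expert):
        weight, age, _model = expert
        return weight * alpha ** age

    return sorted(experts, key=q_score, reverse=True)[:B]
```

With α = 1 the rule ignores age and prunes purely by weight.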
Performance Metric
Normalized average rank of failures:

score = 1 − (Σi rank(failurei)) / (#failures · #feeders)

Example: 8 feeders ranked, with outages at ranks 2, 3, and 5:

score = 1 − (2 + 3 + 5) / (3 · 8) = 14/24 ≈ 0.5833

(The pAUC of the same ranking is 17/24 ≈ 0.7.)
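The worked example above can be checked with a couple of lines, assuming ranks are the 1-based positions of the failed feeders in the committed ranking; the function name is an assumption.

```python
def normalized_avg_rank_score(failure_ranks, n_feeders):
    """1 - (sum of failure ranks) / (#failures * #feeders): higher is better,
    since failures ranked near the top have small ranks."""
    return 1 - sum(failure_ranks) / (len(failure_ranks) * n_feeders)

score = normalized_avg_rank_score([2, 3, 5], n_feeders=8)  # 1 - 10/24 = 14/24
```

A perfect ranking (failures at ranks 1, 2, 3) would score 1 − 6/24 = 0.75, the maximum for 3 failures among 8 feeders.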
Budget Variation
Data Streams
• Continuous arrival of data over time
• Real-world applications: consumer shopping patterns, weather prediction, electricity load forecasting
• Increased attention: companies collect data; traditional approaches do not apply