Time Series Prediction as a Problem of Missing Values
Application to ESTSP2007 and NN3 Competition Benchmarks
Antti Sorjamaa and Amaury Lendasse
Time Series Prediction and ChemoInformatics Group
Adaptive Informatics Research Centre
Helsinki University of Technology
Antti Sorjamaa - TSPCi - AIRC - HUT
Outline
Time Series Prediction vs. Missing Values
Global methodology
  - Self-Organizing Maps (SOM)
  - Empirical Orthogonal Functions (EOF)
Results
Missing Values
[Figure: example data matrix with missing entries marked "?"]
[Figure: a time series with known values up to time 50, followed by unknown future values marked "?", arranged as sliding windows of six consecutive time indices]
42 43 44 45 46 47
43 44 45 46 47 48
44 45 46 47 48 49
45 46 47 48 49 50
46 47 48 49 50 ?
47 48 49 50 ?  ?
48 49 50 ?  ?  ?
49 50 ?  ?  ?  ?
Time Series Prediction vs. Missing Values
Methods designed for finding Missing Values in temporally related databases
Time series is such a database
Unknown future can be considered as a set of missing values
Same methods can be applied
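
The idea of treating the unknown future as missing values can be sketched in a few lines. This is only an illustration, not the authors' code: the function name `build_regressor_matrix` and its parameters are hypothetical, and the toy series simply uses the time indices 42 to 50 as values.

```python
import numpy as np

def build_regressor_matrix(series, regressor_size, horizon):
    """Stack sliding windows of `series`; unknown future entries become NaN.

    Hypothetical helper for illustration only.
    """
    series = np.asarray(series, dtype=float)
    # Append `horizon` unknown future values, marked as NaN.
    extended = np.concatenate([series, np.full(horizon, np.nan)])
    n_rows = len(extended) - regressor_size + 1
    # Row i holds extended[i : i + regressor_size].
    idx = np.arange(n_rows)[:, None] + np.arange(regressor_size)[None, :]
    return extended[idx]

# Toy series whose values equal their time index (42..50).
X = build_regressor_matrix(np.arange(42, 51), regressor_size=6, horizon=4)
print(X)
```

Any method that fills the NaN entries of `X` then yields a prediction for the four future time steps.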
Global Methodology
Based on two methods
  - SOM
      Nonlinear projection / interpolation
      Topology preservation on a low-dimensional grid
  - EOF
      Linear projection
      Projection to high-dimensional output space
      Needs initialization
SOM
[Figure: SOM illustration with axes x1 and x2 and map units numbered 1 to 6]
SOM Interpolation
SOM learning is done with known data
Missing values are left out
Approach proposed by Cottrell and Letrémy (in Applied Stochastic Models and Data Analysis 2005)
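\[ \mathrm{BMU}(x_t) = \arg\min_{i \in I} \left\| x_t^{NM} - m_i^{NM} \right\| \]
\[ x_t = \begin{bmatrix} x_t^{NM} & x_t^{M} \end{bmatrix}^{T} \]
\[ x_t^{M} = m_{\mathrm{BMU}(x_t)}^{M} \]
(NM: known components of x_t, M: missing components, m_i: SOM prototype i)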
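
A minimal sketch of the interpolation step, assuming the SOM has already been trained: the `codebook` array (one prototype per row) and the function name `som_impute` are hypothetical, and no SOM training is shown. The best-matching unit is searched over the known components only, and its values are copied into the missing positions.

```python
import numpy as np

def som_impute(X, codebook):
    """Fill NaNs in each row of X from its best-matching unit (BMU).

    `codebook` is assumed to hold the prototypes of an already trained SOM.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    for t, x in enumerate(X):
        missing = np.isnan(x)
        if not missing.any():
            continue
        # Squared distance to every prototype over known components only.
        diff = codebook[:, ~missing] - x[~missing]
        bmu = np.argmin((diff ** 2).sum(axis=1))
        # Copy the BMU's values into the missing positions.
        filled[t, missing] = codebook[bmu, missing]
    return filled

# Toy prototypes and data, made up for the example.
codebook = np.array([[0.0, 0.0, 0.0],
                     [1.0, 1.0, 1.0],
                     [2.0, 4.0, 8.0]])
X = np.array([[0.9, np.nan, 1.2],
              [2.1, 3.8, np.nan]])
filled = som_impute(X, codebook)
print(filled)
```

Because the filled values come straight from a prototype, the result can only take as many distinct values per component as there are map units.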
EOF Projection
Based on Singular Value Decomposition (SVD)
\[ X = U D V^{*} = \sum_{k=1}^{K} \rho_k u_k v_k^{T} \]
Only q Singular Values and Vectors are used
  - q is smaller than K (the rank of X)
  - Larger values contain more signal than smaller
\[ \hat{X} = \sum_{k=1}^{q} \rho_k u_k v_k^{T} \]
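
The truncated reconstruction can be tried out with NumPy's `np.linalg.svd`. This is a sketch on made-up data (a rank-2 signal plus small noise); the function name `truncated_svd_reconstruction` is hypothetical.

```python
import numpy as np

def truncated_svd_reconstruction(X, q):
    """Rebuild X from its q largest singular values and vectors."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # np.linalg.svd returns the singular values sorted in descending order.
    return (U[:, :q] * s[:q]) @ Vt[:q, :]

rng = np.random.default_rng(0)
# Made-up data: rank-2 signal plus small noise.
signal = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 6))
X = signal + 0.01 * rng.normal(size=(20, 6))

X_hat = truncated_svd_reconstruction(X, q=2)
print(np.linalg.matrix_rank(X_hat))
print(np.abs(X_hat - signal).max())
```

With q equal to the full rank, the reconstruction returns the input up to rounding error.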
EOF Projection (2)
SVD cannot deal with missing values
  - Initialization is crucial!
Decomposition with SVD and reconstruction
  - q largest singular values and vectors are used in the reconstruction
  - Original data is not modified!
  - The selection of q using validation
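
The decomposition and reconstruction loop might look roughly as follows. Several details are assumptions made for the sketch and are not taken from the slides: the column-mean initialization, the stopping rule (`n_iter`, `tol`), and the function name `eof_impute`. The toy matrix has exact rank 1, so the true values are known.

```python
import numpy as np

def eof_impute(X, q, n_iter=200, tol=1e-8):
    """Iteratively fill NaNs in X with a rank-q SVD reconstruction."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initialization (an assumption here): column means of the known values.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        recon = (U[:, :q] * s[:q]) @ Vt[:q, :]
        # Only the missing entries are updated; known data is never modified.
        new_filled = np.where(missing, recon, X)
        change = np.abs(new_filled - filled).max()
        filled = new_filled
        if change < tol:  # stopping rule chosen for this sketch
            break
    return filled

a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
b = np.array([1.0, 2.0, 3.0, 4.0])
M = np.outer(a, b)  # exact rank-1 matrix, used as ground truth
X = M.copy()
X[0, 1] = X[2, 3] = X[4, 0] = np.nan

filled = eof_impute(X, q=1)
print(np.abs(filled - M).max())
```

In practice q would be chosen by validation, for example by hiding some known entries and comparing the filled values against them.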
EOF Projection (3)
[Figure: the example matrix with missing entries, shown at each stage of the iteration; known entries stay fixed and only the filled entries change]
1. Initialization (all missing entries filled with 5)
2. Round 1
3. Round 2
4. Round 3
...
n. Done!
Global Methodology (2)
[Diagram: Missing Data -> SOM -> EOF -> Data with filled values]
Diagram labels: SOM grid size, Number of EOF, EOF iteration
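
How the two stages connect can be sketched as below, under loudly stated assumptions: `codebook` stands in for an already trained SOM (no training is shown), the number of EOF rounds is fixed at `n_iter` instead of using a convergence test, and the name `som_then_eof` is hypothetical. The point of the sketch is the hand-off: the SOM output serves as the initialization that the EOF stage needs.

```python
import numpy as np

def som_then_eof(X, codebook, q, n_iter=100):
    """Fill NaNs in X: BMU-based initialization, then rank-q SVD refinement."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()
    # Stage 1 (SOM): copy missing components from the best-matching
    # prototype, found using the known components only.
    for t in np.flatnonzero(missing.any(axis=1)):
        known = ~missing[t]
        dist = ((codebook[:, known] - X[t, known]) ** 2).sum(axis=1)
        filled[t, missing[t]] = codebook[np.argmin(dist), missing[t]]
    # Stage 2 (EOF): refine only the missing entries with a truncated SVD.
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        recon = (U[:, :q] * s[:q]) @ Vt[:q, :]
        filled = np.where(missing, recon, X)
    return filled

a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
b = np.array([1.0, 2.0, 3.0, 4.0])
M = np.outer(a, b)  # exact rank-1 ground truth for the toy example
X = M.copy()
X[0, 1] = X[2, 3] = X[4, 0] = np.nan
# Two rough prototypes standing in for a trained SOM (made up).
codebook = np.array([1.5 * b, 4.0 * b])

filled = som_then_eof(X, codebook, q=1)
print(np.abs(filled - M).max())
```

The SOM stage gives a coarse, discrete fill; the EOF stage then moves those entries continuously toward values consistent with the low-rank structure.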
ESTSP2007 Competition Data
[Figure: Competition Data versus Time, split into Learning and Validation parts]
Results, Regressor size 11
[Figure: Validation MSE versus SOM Size / Number of EOF, with curves for EOF, SOM and SOM+EOF]
Results (2)
[Figure: Validation MSE versus SOM Size / Number of EOF, with curves for EOF, SOM and SOM+EOF; Validation MSE axis from 0 to 0.6]
Prediction
[Figure: Competition Data versus Time, for times 750 to 900]
NN3 Competition
Prediction of 111 time series
Single, automatic methodology for predicting all the series
Prediction of 18 values into the future for each series
All series are rather short, which makes the prediction tricky
Mean SMAPE over all series is evaluated in the competition
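
For reference, a small sketch of a mean SMAPE computation. It uses one common definition of SMAPE (absolute error divided by the mean of the absolute actual and forecast values, in percent); the exact variant used by the competition may differ, and the function names are hypothetical.

```python
import numpy as np

def smape(actual, forecast):
    """Symmetric MAPE in percent: mean of |y - f| / ((|y| + |f|) / 2).

    Assumes no time step has both actual and forecast equal to zero.
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    denom = (np.abs(actual) + np.abs(forecast)) / 2.0
    return 100.0 * float(np.mean(np.abs(actual - forecast) / denom))

def mean_smape(actuals, forecasts):
    """Average the per-series SMAPE over a collection of series."""
    return float(np.mean([smape(a, f) for a, f in zip(actuals, forecasts)]))

# Made-up numbers, only to show the call pattern.
score = smape([100.0, 200.0], [110.0, 180.0])
overall = mean_smape([[100.0, 200.0], [50.0, 50.0]],
                     [[110.0, 180.0], [50.0, 50.0]])
print(score, overall)
```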
NN3: Long Series
[Figure: two long NN3 series, each plotted over about 140 time steps; Validation MSE = 0.1559 and Validation MSE = 0.0076]
NN3: Short Series
[Figure: a short NN3 series plotted over about 60 time steps; Validation MSE = 0.3493]
NN3: Validation Errors
[Figure: three histograms of Number of Series versus Validation MSE, with Validation MSE ranging from 0 to 1.4]
Summary
Time Series Prediction can be viewed as a problem of Missing Values
The SOM+EOF methodology works well, better than either method alone
  - SOM projection is discrete
  - EOF needs sufficiently good initialization
The methods complement each other
Further Work
Improvements to the methodology
  - The selection of singular values and vectors
Convergence criterion
  - How to guarantee quick convergence?
Applying the methodology to data sets from other fields
  - Climatology, finance, process data
Questions?
http://www.cis.hut.fi/projects/tsp