April 3, 2003, Max Planck Institute for Biological Cybernetics: Functional Analytic Approach to Model Selection
April 3, 2003, Max Planck Institute for Biological Cybernetics
Functional Analytic Approach to Model Selection
Masashi Sugiyama ([email protected])
Department of Computer Science, Tokyo Institute of Technology, Tokyo, Japan
2
Regression Problem
From training examples {(x_i, y_i)}_{i=1}^n, where y_i = f(x_i) + ε_i (noise), obtain a learned function f̂ such that it is as close to the learning target function f as possible.
f: learning target function
f̂: learned function
{(x_i, y_i)}_{i=1}^n: training examples
3
Typical Method of Learning
Linear regression model: f̂(x) = Σ_{i=1}^p α_i φ_i(x)
Ridge regression: minimize over {α_i} the sum Σ_{i=1}^n (f̂(x_i) − y_i)² + λ Σ_{i=1}^p α_i²
{α_i}: parameters to be learned
{φ_i(x)}: basis functions
λ: ridge parameter
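As a concrete sketch of this learning method (the basis choice and data below are my own toy example, not from the talk), ridge regression for a linear model has the closed-form solution α̂ = (ΦᵀΦ + λI)⁻¹Φᵀy, where Φ is the design matrix with entries Φ_ij = φ_j(x_i):

```python
import numpy as np

def ridge_fit(Phi, y, lam):
    """Ridge regression: minimize ||Phi a - y||^2 + lam * ||a||^2.
    Closed form: a = (Phi^T Phi + lam I)^{-1} Phi^T y."""
    p = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(p), Phi.T @ y)

# Toy example: Gaussian basis functions on [0, 1] (an illustrative choice).
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 50)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(50)
centers = np.linspace(0, 1, 10)
Phi = np.exp(-(x[:, None] - centers[None, :]) ** 2 / 0.05)

alpha = ridge_fit(Phi, y, lam=1e-3)  # learned parameters
y_hat = Phi @ alpha                  # fitted values at the training points
```

The ridge parameter λ is exactly the kind of "model" choice the following slides are about.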
4
Model Selection
(Figure: target function vs. learned function for too simple, appropriate, and too complex models)
The choice of the model heavily affects the learned function.
("Model" refers to, e.g., the ridge parameter λ)
5
Ideal Model Selection
Determine the model such that a certain generalization error, a measure of the badness of the learned function f̂, is minimized.
6
Practical Model Selection
However, the generalization error cannot be directly calculated, since it includes the unknown learning target function f.
Determine the model such that an estimator of the generalization error is minimized.
We want to have an accurate estimator of the generalization error itself (not true for Bayesian model selection using evidence).
7
Two Approaches to Estimating Generalization Error (1)
Try to obtain unbiased estimators; this approach is interested in typical-case performance.
• C_P (Mallows, 1973)
• Cross-validation
• Akaike Information Criterion (Akaike, 1974), etc.
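A minimal sketch of one estimator from this family, k-fold cross-validation used to choose the ridge parameter (the polynomial basis and data are my own illustration):

```python
import numpy as np

def cv_error(Phi, y, lam, k=5):
    """k-fold cross-validation estimate of the prediction error of ridge regression."""
    n = len(y)
    idx = np.arange(n)
    errs = []
    for fold in np.array_split(idx, k):
        tr = np.setdiff1d(idx, fold)  # training indices for this fold
        p = Phi.shape[1]
        a = np.linalg.solve(Phi[tr].T @ Phi[tr] + lam * np.eye(p), Phi[tr].T @ y[tr])
        errs.append(np.mean((Phi[fold] @ a - y[fold]) ** 2))
    return np.mean(errs)

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 60)
y = x ** 3 - x + 0.05 * rng.standard_normal(60)
Phi = np.vander(x, 8)  # polynomial basis functions

lams = [1e-6, 1e-4, 1e-2, 1.0, 100.0]
best_lam = min(lams, key=lambda lam: cv_error(Phi, y, lam))
```

Cross-validation estimates the error from held-out data only; the criteria discussed later instead exploit the structure of the function space.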
8
Two Approaches to Estimating Generalization Error (2)
Try to obtain probabilistic upper bounds, which hold only with a certain probability; this approach is interested in worst-case performance.
• VC-bound (Vapnik & Chervonenkis, 1974)
• Span bound (Chapelle & Vapnik, 2000)
• Concentration bound (Bousquet & Elisseeff, 2001), etc.
9
Popular Choices of Generalization Measure
• Risk (expected loss)
• Kullback-Leibler divergence between the target density and the learned density
10
Concerns in Existing Methods
• The approximations used often require a large (infinite) number of training examples for justification (asymptotic approximation), so they do not work with small samples.
• The generalization measure is integrated over the distribution from which training examples are drawn, so they cannot be used for transduction (estimating the error at a point of interest).
11
Our Interests
We are interested in:
• Estimating the generalization error with accuracy guaranteed for small (finite) samples
• Estimating the transduction error (the error at a point of interest)
• Investigating the role of unlabeled samples (samples without output values)
12
Our Generalization Measure
We assume that the target function f and the learned function f̂ belong to a functional Hilbert space H, and measure the generalization error by the norm ‖f̂ − f‖² in the function space.
13
Generalization Measure in Functional Hilbert Space
A functional Hilbert space is specified by:
• a set of functions which span the space, and
• an inner product (and norm).
Given a set of functions, we can design the inner product (and therefore the generalization measure) as desired.
14
Examples of the Norm
• Weighted distance in the input domain, with a weight function over inputs
• Weighted distance in the Fourier domain, with a weight function over frequencies
• Sobolev norm, involving the k-th derivatives of the function
15
Interesting Features
• When the weight function matches the distribution of input points, we can use unlabeled samples for estimating the norm.
• For transductive inference (given a test point of interest), the weight can be concentrated at that point.
• For interpolation and extrapolation, the weight can be set as desired over the region of interest.
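The first point can be illustrated numerically (a sketch with my own toy functions): when the weight w(x) equals the input density p(x), the weighted squared norm ∫ w(x) g(x)² dx is the expectation E_{x∼p}[g(x)²], so input-only draws from p estimate it without any output values.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy "residual" function g(x) = f_hat(x) - f(x); chosen by hand for illustration.
g = lambda x: np.sin(3 * x) * np.exp(-x ** 2)

# Input density p(x): standard normal. With weight w = p, the weighted norm
# is E_p[g(X)^2], estimable from unlabeled samples alone.
unlabeled = rng.standard_normal(100_000)  # inputs only, no output values
mc_norm = np.mean(g(unlabeled) ** 2)

# Reference value by dense numerical integration of p(x) g(x)^2.
xs = np.linspace(-8, 8, 200_001)
p = np.exp(-xs ** 2 / 2) / np.sqrt(2 * np.pi)
quad_norm = np.trapz(p * g(xs) ** 2, xs)
```

In practice g itself is unknown, but the same Monte Carlo trick applies to the quantities SIC needs, which is how the unlabeled-samples simulation later in the talk works.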
16
Goal of My Talk
I suppose that you like the generalization measure defined in the functional Hilbert space. The goal of my talk is to give a method for estimating the generalization error ‖f̂ − f‖², the norm in the function space.
17
Function Spaces for Learning
For further discussion, we have to specify the class of function spaces, and we want the class to be as unrestrictive as possible.
A general function space such as L² is not suitable for learning problems, because the value of a function at a point is not specified in L²: two functions that differ only at a single point have different values at that point, but they are treated as the same function in L².
18
Reproducing Kernel Hilbert Space
A function space that is rather general, and in which the value of a function at a point is specified, is the reproducing kernel Hilbert space (RKHS).
An RKHS H has a reproducing kernel K(x, x′):
• For any fixed x′, K(x, x′) is a function of x in H.
• For any function f in H and any x′, ⟨f(·), K(·, x′)⟩ = f(x′), where ⟨·, ·⟩ is the inner product in the function space.
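The reproducing property can be checked numerically for a function in the span of kernel sections, since ⟨Σ_i c_i K(·, x_i), K(·, x′)⟩ = Σ_i c_i K(x_i, x′) = f(x′). A sketch (the Gaussian kernel and the coefficients are my example choices):

```python
import numpy as np

def K(a, b, width=1.0):
    """Gaussian reproducing kernel, evaluated on all pairs from a and b."""
    return np.exp(-np.subtract.outer(a, b) ** 2 / (2 * width ** 2))

def inner(c, xs, d, zs):
    """RKHS inner product of f = sum_i c_i K(., xs_i) and g = sum_j d_j K(., zs_j)."""
    return c @ K(xs, zs) @ d

xs = np.array([-1.0, 0.0, 0.5, 2.0])
c = np.array([0.3, -1.2, 0.7, 0.5])

def f(x):
    """f as a function: point evaluation of sum_i c_i K(., xs_i)."""
    return c @ K(xs, np.atleast_1d(x))[:, 0]

x_prime = 0.8
# Reproducing property: <f, K(., x')> equals the point value f(x').
lhs = inner(c, xs, np.array([1.0]), np.array([x_prime]))
# Point evaluation is bounded: |f(x')|^2 <= K(x', x') * ||f||^2 (Cauchy-Schwarz),
# which is exactly why point values are well defined in an RKHS, unlike in L^2.
norm_sq = inner(c, xs, c, xs)
```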
19
Formulation of Learning Problem
• A specified RKHS H is fixed.
• The noise has mean 0 and a fixed variance.
• Linear estimation: the learned function is obtained from the sample values by a linear operator (e.g., ridge regression for a linear model).
20
Sampling Operator
For any RKHS H, there exists a linear operator A from H to the sample value space such that applying A to f yields the vector of sample values (f(x_1), …, f(x_n)).
Indeed, A can be written as a Neumann-Schatten product combining the standard basis vectors e_i of the sample value space with the kernel sections K(·, x_i).
For vectors, the Neumann-Schatten product acts as (a ⊗ b)h = ⟨h, b⟩a.
21
Formulation
(Diagram: the learning target function in the RKHS is mapped by the sampling operator (always linear) into the sample value space; noise is added; the learning operator (assumed linear) maps the noisy samples back to the learned function. The generalization error is measured in the RKHS.)
22
Expected Generalization Error
We are interested in typical performance, so we estimate the expected generalization error over the noise.
We do not take the expectation over input points, so the criterion is data-dependent! We do not assume a particular distribution of the input points, which is advantageous in active learning!
23
Bias / Variance Decomposition
E_ε ‖f̂ − f‖² = ‖E_ε f̂ − f‖² (bias) + E_ε ‖f̂ − E_ε f̂‖² (variance),
where E_ε denotes the expectation over the noise.
The variance term can be computed from the noise variance and the learning operator together with its adjoint. We want to estimate the bias!
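For a linear estimator the decomposition can be verified numerically. A sketch in parameter space with my own toy setup: for a ridge estimator θ̂ = Ay, E‖θ̂ − θ‖² = ‖E θ̂ − θ‖² + E‖θ̂ − E θ̂‖², and the variance term equals σ² tr(A Aᵀ).

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, sigma, lam = 40, 6, 0.3, 0.5

Phi = rng.standard_normal((n, p))        # design matrix
theta_true = rng.standard_normal(p)      # true parameters
A = np.linalg.solve(Phi.T @ Phi + lam * np.eye(p), Phi.T)  # ridge learning matrix

# Monte Carlo over the noise only (the input points stay fixed, as on the slide).
trials = 20_000
est = np.empty((trials, p))
for t in range(trials):
    y = Phi @ theta_true + sigma * rng.standard_normal(n)
    est[t] = A @ y

mean_err = np.mean(np.sum((est - theta_true) ** 2, axis=1))
bias_sq = np.sum((A @ Phi @ theta_true - theta_true) ** 2)  # ||E theta_hat - theta||^2
variance = sigma ** 2 * np.trace(A @ A.T)                   # E||theta_hat - E theta_hat||^2
```

The variance piece is computable without knowing the target; only the bias piece needs to be estimated, which is what the next slides address.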
24
Tricks for Estimating Bias
Suppose we have a linear operator that gives an unbiased estimate ĝ of the learning target f, i.e., E_ε ĝ = f.
We use ĝ for estimating the bias of f̂.
Sugiyama & Ogawa (Neural Comp., 2001)
25
Unbiased Estimator of Bias
(Diagram in the RKHS: the bias ‖E_ε f̂ − f‖² is roughly estimated by the distance between the learned function f̂ and the unbiased estimate ĝ.)
26
Subspace Information Criterion (SIC)
SIC = (estimate of the bias) + (variance).
SIC is an unbiased estimator of the generalization error with finite samples: E_ε [SIC] = E_ε ‖f̂ − f‖².
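A finite-dimensional sketch of this construction (the parameter-space setup and names are mine, following the structure on this slide): when the design matrix has full column rank, the least-squares solution ĝ = Φ⁺y is an unbiased estimate of the target parameters, and the squared bias of a ridge estimator can be estimated from ‖θ̂ − ĝ‖² after subtracting that difference's own noise contribution.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, sigma, lam = 40, 6, 0.3, 0.5

Phi = rng.standard_normal((n, p))
theta = rng.standard_normal(p)
A = np.linalg.solve(Phi.T @ Phi + lam * np.eye(p), Phi.T)  # ridge learning matrix
Au = np.linalg.pinv(Phi)  # gives an unbiased estimate: E[Au y] = theta

def sic(y):
    """Sketch of SIC: (estimate of squared bias) + (variance), computable from data."""
    d = (A - Au) @ y  # rough bias estimate theta_hat - g_hat
    bias_est = d @ d - sigma ** 2 * np.trace((A - Au) @ (A - Au).T)
    variance = sigma ** 2 * np.trace(A @ A.T)
    return bias_est + variance

# Check unbiasedness by Monte Carlo: E[SIC] should match E||theta_hat - theta||^2.
trials = 20_000
sic_vals, true_errs = np.empty(trials), np.empty(trials)
for t in range(trials):
    y = Phi @ theta + sigma * rng.standard_normal(n)
    sic_vals[t] = sic(y)
    true_errs[t] = np.sum((A @ y - theta) ** 2)
```

Unlike the true error, every term in sic(y) is computable without knowing θ, which is the point of the criterion.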
27
Obtaining Unbiased Estimate
We need an operator that gives an unbiased estimate ĝ of the learning target f.
Such an operator exists if and only if the kernel functions at the sample points span the entire space H. When this is satisfied, it is given by the generalized inverse of the sampling operator.
Then we can enjoy all the features! (unlabeled samples, transductive inference, etc.)
28
Example of Using SIC: Standard Linear Regression
Learning target function: f(x) = Σ_{i=1}^p α*_i φ_i(x), where the coefficients {α*_i} are unknown.
Regression model: f̂(x) = Σ_{i=1}^p α̂_i φ_i(x), where the coefficients {α̂_i} are estimated linearly (e.g., by ridge regression).
29
Example (cont.)
Generalization measure: a weighted norm specified by a weight function.
If the design matrix has rank p (p: number of basis functions), then the best linear unbiased estimator (BLUE) always exists.
In this case, SIC provides an unbiased estimate of the above generalization error.
30
Applicability of SIC
However, the design matrix has rank p only if p ≤ n (p: number of basis functions, n: number of training examples).
Therefore the target function must be included in a rather small model, and the range of application of SIC is rather limited.
31
When Unbiased Estimate Does Not Exist
An unbiased estimate ĝ exists if and only if the kernel functions at the sample points span the whole space H.
When this condition is not fulfilled, let us restrict ourselves to finding a learning result function from a subspace S, not from the entire RKHS.
Sugiyama & Müller (JMLR, 2002)
32
Essential Generalization Error
(Diagram in the RKHS: the generalization error decomposes into an essential part within the subspace S and an irrelevant, constant part orthogonal to S.)
Essentially, we are estimating the projection of the target onto S: the target f is just replaced by its projection.
33
Unbiased Estimate of Projection
If a linear operator that gives an unbiased estimate of the projection of the learning target is available, then SIC is an unbiased estimator of the essential generalization error.
Such an operator exists if and only if the subspace S is included in the span of the kernel functions at the sample points, as is the case for, e.g., a kernel regression model.
34
Restriction
However, another restriction arises: if the generalization measure is designed as desired, we have to use the kernel function induced by the generalization measure.
35
Restriction (cont.)
On the other hand, if a desired kernel function is used, then we have to use the generalization measure induced by the kernel. For example, the generalization measure in a Gaussian RKHS heavily penalizes high-frequency components.
36
Summary of Usage of SIC
SIC essentially has two modes:
• For rather restricted linear regression, SIC has several interesting properties: unlabeled samples can be utilized for estimating the prediction error (expected test error), and any weighted error measure can be used, e.g., interpolation, extrapolation, transductive inference.
• For kernel regression, SIC can always be applied; however, the kernel-induced generalization measure must be employed.
37
Simulation (1): Setting
H: trigonometric polynomial RKHS
Span: trigonometric polynomials
Gen. measure: the norm in H
Learning target function: a sinc-like function in H
38
Simulation (1): Setting (cont.)
Training examples: sampled values of the target function plus noise
Ridge regression is used for learning.
Number of training examples:
Noise variance:
Number of trials:
39
Simulation (1-a): Using Unlabeled Samples
We estimate the prediction error using 1000 unlabeled samples.
40
Results: Unlabeled Samples
(Plot: error estimates vs. the ridge parameter. Values can be negative since some constants are ignored.)
41
Results: Unlabeled Samples (cont.)
(Plot: error estimates vs. the ridge parameter.)
42
Simulation (1-b): Transduction
We estimate the test error at a single test point.
43
Results: Transduction
(Plot: error estimates vs. the ridge parameter.)
44
Results: Transduction (cont.)
(Plot: error estimates vs. the ridge parameter.)
45
Simulation (2): Infinite-Dimensional RKHS
H: Gaussian RKHS
Learning target function: the sinc function
Training examples: noisy samples of the target
We estimate the generalization error.
46
Results: Gaussian RKHS
(Plot: error estimates vs. the ridge parameter.)
47
Results: Gaussian RKHS (cont.)
(Plot: error estimates vs. the ridge parameter.)
48
Simulation (3): DELVE Data Sets
H: Gaussian RKHS
We choose the ridge parameter by:
• SIC
• Leave-one-out cross-validation
• An empirical Bayesian method (marginal likelihood maximization) (Akaike, 1980)
Performance is compared by test error.
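For the leave-one-out competitor there is a well-known closed-form shortcut for ridge regression: the LOO residual equals the ordinary residual divided by (1 − H_ii), where H = Φ(ΦᵀΦ + λI)⁻¹Φᵀ is the hat matrix. A sketch verifying it against brute-force refitting (toy data mine):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, lam = 30, 5, 0.1
Phi = rng.standard_normal((n, p))
y = Phi @ rng.standard_normal(p) + 0.2 * rng.standard_normal(n)

# Closed-form LOO for ridge: residual_i / (1 - H_ii), one fit instead of n.
H = Phi @ np.linalg.solve(Phi.T @ Phi + lam * np.eye(p), Phi.T)  # hat matrix
loo_fast = np.mean(((y - H @ y) / (1 - np.diag(H))) ** 2)

# Brute-force LOO for verification: refit with each point held out.
errs = []
for i in range(n):
    m = np.ones(n, bool)
    m[i] = False
    a = np.linalg.solve(Phi[m].T @ Phi[m] + lam * np.eye(p), Phi[m].T @ y[m])
    errs.append((Phi[i] @ a - y[i]) ** 2)
loo_slow = np.mean(errs)
```

This identity follows from the Sherman-Morrison formula and is exact for a fixed λ, which makes LOO a cheap baseline in comparisons like this one.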
49
Normalized Test Errors
(Table: normalized test errors on the DELVE data sets. Red: best or comparable by a 95% t-test.)
50
Image Restoration
Restoration filter (e.g., Gaussian filter, regularization filter): given a degraded image, we would like to determine the filter's parameter values appropriately; too large or too small a parameter yields a poor restoration, and we want the appropriate value.
Sugiyama et al. (IEICE Trans., 2001); Sugiyama & Ogawa (Signal Processing, 2002)
51
Formulation
(Diagram: the original image in a Hilbert space is degraded by a degradation operator and corrupted by noise to give the observed image; a filter maps the observed image to the restored image in a Hilbert space.)
52
Results with Regularization Filter
(Figure: original images, degraded images, and images restored using SIC.)
53
Precipitation Estimation
Estimating future precipitation from past precipitation and weather radar data. Our method with SIC won the 1st prize in estimation accuracy in the IEICE Precipitation Estimation Contest 2001.
(Precipitation and weather radar data from the IEICE Precipitation Estimation Contest 2001)
Moro & Sugiyama (IEICE General Conf., 2001)
1st: TokyoTech, MSE = 0.71
2nd: KyuTech, MSE = 0.75
3rd: Chiba Univ, MSE = 0.93
4th: MSE = 1.18
54
References (Fundamentals of SIC)
Proposing the concept of SIC: Sugiyama, M. & Ogawa, H. Subspace information criterion for model selection. Neural Computation, vol.13, no.8, pp.1863-1889, 2001.
Performance evaluation of SIC: Sugiyama, M. & Ogawa, H. Theoretical and experimental evaluation of the subspace information criterion. Machine Learning, vol.48, no.1/2/3, pp.25-50, 2002.
55
References (SIC for Particular Learning Methods)
SIC for regularization learning: Sugiyama, M. & Ogawa, H. Optimal design of regularization term and regularization parameter by subspace information criterion. Neural Networks, vol.15, no.3, pp.349-361, 2002.
SIC for sparse regressors: Tsuda, K., Sugiyama, M., & Müller, K.-R. Subspace information criterion for non-quadratic regularizers --- Model selection for sparse regressors. IEEE Transactions on Neural Networks, vol.13, no.1, pp.70-80, 2002.
56
References (Applications of SIC)
Applying SIC to image restoration: Sugiyama, M., Imaizumi, D., & Ogawa, H. Subspace information criterion for image restoration --- Optimizing parameters in linear filters. IEICE Transactions on Information and Systems, vol.E84-D, no.9, pp.1249-1256, Sep. 2001.
Sugiyama, M. & Ogawa, H. A unified method for optimizing linear image restoration filters. Signal Processing, vol.82, no.11, pp.1773-1787, 2002.
Applying SIC to precipitation estimation: Moro, S. & Sugiyama, M. Estimation of precipitation from meteorological radar data. In Proceedings of the 2001 IEICE General Conference, SD-1-10, pp.264-265, Shiga, Japan, Mar. 26-29, 2001.
57
References (Extensions of SIC)
Extending the range of application of SIC: Sugiyama, M. & Müller, K.-R. The subspace information criterion for infinite dimensional hypothesis spaces. Journal of Machine Learning Research, vol.3 (Nov), pp.323-359, 2002.
Further improving SIC: Sugiyama, M. Improving precision of the subspace information criterion. IEICE Transactions on Fundamentals (to appear).
Sugiyama, M., Kawanabe, M., & Müller, K.-R. Trading variance reduction with unbiasedness --- The regularized subspace information criterion for robust model selection (submitted).
58
Conclusions
We formulated the regression problem from a functional analytic point of view.
Within this framework, we gave a generalization error estimator called the subspace information criterion (SIC).
Unbiasedness of SIC is guaranteed even with finite samples.
We did not take the expectation over training sample points, so SIC is more data-dependent.
59
Conclusions (cont.)
SIC essentially has two modes:
• For rather restrictive linear regression, SIC has several interesting properties: unlabeled samples can be utilized for estimating the prediction error, and any weighted error measure can be used, e.g., interpolation, extrapolation, transductive inference.
• For kernel regression, SIC can always be applied; however, the kernel-induced generalization measure must be employed.