Lecture 9
Inexact Theories
Syllabus
Lecture 01 Describing Inverse Problems
Lecture 02 Probability and Measurement Error, Part 1
Lecture 03 Probability and Measurement Error, Part 2
Lecture 04 The L2 Norm and Simple Least Squares
Lecture 05 A Priori Information and Weighted Least Squares
Lecture 06 Resolution and Generalized Inverses
Lecture 07 Backus-Gilbert Inverse and the Trade-off of Resolution and Variance
Lecture 08 The Principle of Maximum Likelihood
Lecture 09 Inexact Theories
Lecture 10 Nonuniqueness and Localized Averages
Lecture 11 Vector Spaces and Singular Value Decomposition
Lecture 12 Equality and Inequality Constraints
Lecture 13 L1, L∞ Norm Problems and Linear Programming
Lecture 14 Nonlinear Problems: Grid and Monte Carlo Searches
Lecture 15 Nonlinear Problems: Newton's Method
Lecture 16 Nonlinear Problems: Simulated Annealing and Bootstrap Confidence Intervals
Lecture 17 Factor Analysis
Lecture 18 Varimax Factors, Empirical Orthogonal Functions
Lecture 19 Backus-Gilbert Theory for Continuous Problems; Radon's Problem
Lecture 20 Linear Operators and Their Adjoints
Lecture 21 Fréchet Derivatives
Lecture 22 Exemplary Inverse Problems, incl. Filter Design
Lecture 23 Exemplary Inverse Problems, incl. Earthquake Location
Lecture 24 Exemplary Inverse Problems, incl. Vibrational Problems
Purpose of the Lecture

Discuss how an inexact theory can be represented

Solve the inexact, linear Gaussian inverse problem

Use maximization of relative entropy as a guiding principle for solving inverse problems

Introduce the F-test as a way to determine whether one solution is "better" than another
Part 1
How Inexact Theories can be Represented
How do we generalize the case of an exact theory to one that is inexact?
[Figure: the exact theory case. The theory d = g(m) is a curve in the (model m, datum d) plane; the observed datum dobs maps through the curve to the estimate mest and the prediction dpre.]
to make the theory inexact ... we must make the theory probabilistic, or fuzzy

[Figure: the same (model m, datum d) plane, but the curve d = g(m) is replaced by a fuzzy band of probability about it.]
[Figure: three panels in the (model m, datum d) plane: the a priori p.d.f. (centered on dobs), the probabilistic theory, and their combination.]
how do you combine two probability density functions, so that the information in them is combined?

desirable properties:
order shouldn't matter
combining something with the null distribution should leave it unchanged
the combination should be invariant under a change of variables
Answer: multiply them (and normalize)

[Figure, panels (D)-(F): the a priori p.d.f. pA and the theory p.d.f. pg in the (model m, datum d) plane, and their product, the total p.d.f. pT.]
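This multiplicative rule can be checked numerically. Below is a minimal sketch in Python (the lecture's own snippet is MATLAB; the grid and the Gaussians here are made up for illustration) that also verifies two of the desirable properties listed above:

```python
import numpy as np

# Two p.d.f.s for the model parameter m, tabulated on a common grid
# (made-up Gaussians, purely for illustration).
m = np.linspace(-10.0, 10.0, 2001)
dm = m[1] - m[0]

def gaussian(m, mean, sigma):
    return np.exp(-0.5 * ((m - mean) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

pA = gaussian(m, 1.0, 2.0)   # a priori p.d.f.
pg = gaussian(m, 3.0, 1.0)   # theory p.d.f.

def combine(p, q, dm):
    """Combine two tabulated p.d.f.s by multiplying and renormalizing."""
    r = p * q
    return r / (r.sum() * dm)

pT = combine(pA, pg, dm)

# desirable property 1: order shouldn't matter
assert np.allclose(pT, combine(pg, pA, dm))

# desirable property 2: combining with the null (uniform) distribution
# leaves the p.d.f. unchanged
pN = np.ones_like(m) / (m[-1] - m[0])
assert np.allclose(combine(pA, pN, dm), pA)
```

For these two Gaussians the combined p.d.f. is again Gaussian, with its peak pulled between the two input means, which is the behavior the figure panels illustrate.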
"solution to inverse problem" = the maximum likelihood point of pT (with the null p.d.f. pN ∝ constant); it simultaneously gives mest and dpre

pT(m,d): the probability that the estimated model parameters are near m and the predicted data are near d
pT(m): the probability that the estimated model parameters are near m, irrespective of the value of the predicted data

conceptual problem: pT(m,d) and pT(m) do not necessarily have maximum likelihood points at the same value of m
[Figure: left, pT(m,d) in the (model m, datum d) plane, with dobs, dpre, and the maximum likelihood point mest marked; right, the marginal p(m), whose maximum falls at a different point mest'.]
illustrates the problem in defining a definitive solution to an inverse problem

fortunately, if all the distributions are Gaussian, the two points are the same
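This Gaussian case can be checked numerically. A Python sketch with a made-up correlated Gaussian pT(m,d): the maximum likelihood point of the joint p.d.f. and of the marginal p(m) fall at the same m.

```python
import numpy as np

# A made-up 2-D Gaussian total p.d.f. pT(m, d) on a grid.
m = np.linspace(0.0, 5.0, 501)
d = np.linspace(0.0, 5.0, 501)
M, D = np.meshgrid(m, d, indexing="ij")

mean = np.array([2.0, 3.0])            # (m, d) mean, chosen arbitrarily
cov = np.array([[0.5, 0.3],
                [0.3, 0.5]])           # correlated, so the check is non-trivial
icov = np.linalg.inv(cov)
dx = np.stack([M - mean[0], D - mean[1]], axis=-1)
pT = np.exp(-0.5 * np.einsum("...i,ij,...j->...", dx, icov, dx))

# maximum likelihood point of the joint p.d.f. pT(m, d)
i, j = np.unravel_index(np.argmax(pT), pT.shape)
m_joint = m[i]

# maximum likelihood point of the marginal p(m)
p_marg = pT.sum(axis=1)
m_marg = m[np.argmax(p_marg)]

# for a Gaussian, the two coincide (both sit at the mean in m)
assert abs(m_joint - m_marg) < 1e-6
```

For a skewed or multimodal pT the two argmax values would in general differ, which is exactly the conceptual problem described above.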
Part 2
Solution of the inexact linear Gaussian inverse problem
Gaussian a priori information

pA(m) ∝ exp{ -½ (m - ⟨m⟩)^T [cov m]A^-1 (m - ⟨m⟩) }

a priori values ⟨m⟩ of the model parameters, and their uncertainty [cov m]A

Gaussian observations

pA(d) ∝ exp{ -½ (d - dobs)^T [cov d]A^-1 (d - dobs) }

observed data dobs, and their measurement error [cov d]A

Gaussian theory

pg(m,d) ∝ exp{ -½ (d - Gm)^T [cov g]^-1 (d - Gm) }

linear theory d = Gm, with uncertainty [cov g] in the theory
mathematical statement of the problem:
find the (m,d) that maximizes
pT(m,d) = pA(m) pA(d) pg(m,d)
and, along the way, work out the form of pT(m,d)

notational simplification:
group m and d into a single vector x = [d^T, m^T]^T
group [cov d]A and [cov m]A into a single matrix [cov x]
write d - Gm = 0 as Fx = 0 with F = [I, -G]
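A small Python sketch of this bookkeeping (G, m, and d are made-up values): stacking x = [d^T, m^T]^T and forming F = [I, -G] does reproduce d - Gm = 0.

```python
import numpy as np

# Made-up linear theory d = Gm with N = 3 data and M = 2 model parameters.
G = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
m = np.array([1.0, -1.0])
d = G @ m                        # data that satisfy the theory exactly

# group m and d into a single vector x = [d^T, m^T]^T
x = np.concatenate([d, m])

# write d - Gm = 0 as Fx = 0 with F = [I, -G]
F = np.hstack([np.eye(3), -G])

assert np.allclose(F @ x, 0.0)   # Fx = d - Gm = 0
```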
after much algebra, we find that pT(x) is a Gaussian distribution with mean

x* = ⟨x⟩ - [cov x] F^T ( F [cov x] F^T + [cov g] )^-1 F ⟨x⟩

and variance

[cov x]* = [cov x] - [cov x] F^T ( F [cov x] F^T + [cov g] )^-1 F [cov x]

where ⟨x⟩ = [dobs^T, ⟨m⟩^T]^T; this is the solution to the inverse problem
after pulling mest out of x*:

mest = ⟨m⟩ + [cov m]A G^T ( G [cov m]A G^T + [cov d]A + [cov g] )^-1 ( dobs - G⟨m⟩ )

reminiscent of the G^T (G G^T)^-1 minimum length solution

error in the theory adds to error in the data: the two covariances appear only as the sum [cov d]A + [cov g]

the solution depends on the values of the prior information only to the extent that the model resolution matrix is different from an identity matrix

and after algebraic manipulation, which also equals

mest = ⟨m⟩ + ( G^T ([cov d]A + [cov g])^-1 G + [cov m]A^-1 )^-1 G^T ([cov d]A + [cov g])^-1 ( dobs - G⟨m⟩ )

reminiscent of the (G^T G)^-1 G^T least squares solution
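The equality of the minimum-length-style and least-squares-style forms of mest is a standard matrix identity, and can be spot-checked numerically. A Python sketch with made-up random matrices (Cm, Cd, and Cg stand in for [cov m]A, [cov d]A, and [cov g]):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 5, 3
G = rng.standard_normal((N, M))
m_prior = rng.standard_normal(M)   # a priori model <m>
d_obs = rng.standard_normal(N)     # observed data

def spd(n):
    """A random symmetric positive definite matrix, for use as a covariance."""
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

Cm = spd(M)       # a priori model covariance [cov m]A
Cd = spd(N)       # data covariance [cov d]A
Cg = spd(N)       # theory covariance [cov g]
Cdg = Cd + Cg     # error in theory adds to error in data

r = d_obs - G @ m_prior

# minimum-length-style form
m1 = m_prior + Cm @ G.T @ np.linalg.solve(G @ Cm @ G.T + Cdg, r)

# least-squares-style form
iCdg = np.linalg.inv(Cdg)
m2 = m_prior + np.linalg.solve(G.T @ iCdg @ G + np.linalg.inv(Cm),
                               G.T @ iCdg @ r)

assert np.allclose(m1, m2)   # the two forms agree
```

The identity behind this is Cm G^T (G Cm G^T + C)^-1 = (G^T C^-1 G + Cm^-1)^-1 G^T C^-1, the same manipulation that shows the weighted least squares and weighted minimum length solutions to be equal.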
interesting aside:
the weighted least squares solution is equal to the weighted minimum length solution
what did we learn?
for linear Gaussian inverse problem
inexactness of theory
just adds to
inexactness of data
Part 3
Use maximization of relative entropy as a guiding principle for
solving inverse problems
from last lecture
assessing the information content in pA(m)

Do we know a little about m, or a lot about m?

Information Gain, S:

S = ∫ pA(m) ln[ pA(m) / pN(m) ] dm

-S is called the Relative Entropy
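The information gain can be evaluated numerically. A Python sketch with made-up Gaussians, a narrow pA and a wide null p.d.f. pN, checked against the closed form for two zero-mean Gaussians:

```python
import numpy as np

# Information gain S = ∫ pA(m) ln(pA(m)/pN(m)) dm, evaluated on a grid.
m = np.linspace(-30.0, 30.0, 100001)
dm = m[1] - m[0]

def gaussian(m, mu, sigma):
    return np.exp(-0.5 * ((m - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

sigA, sigN = 1.0, 10.0           # narrow pA, wide null distribution pN
pA = gaussian(m, 0.0, sigA)
pN = gaussian(m, 0.0, sigN)

S = np.sum(pA * np.log(pA / pN)) * dm

# closed form for two zero-mean Gaussians:
# S = ln(sigN/sigA) + sigA^2/(2 sigN^2) - 1/2
S_exact = np.log(sigN / sigA) + sigA**2 / (2 * sigN**2) - 0.5

assert abs(S - S_exact) < 1e-4
assert S > 0   # knowing a lot about m (narrow pA) means a large gain
```

The narrower pA is relative to pN, the larger S: knowing a lot about m corresponds to a large information gain, which is what panel (B) of the figure below plots.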
[Figure: (A) the a priori p.d.f. pA(m) and the null p.d.f. pN(m) plotted against m; (B) the information gain S(σA) as a function of the width σA of pA(m).]
Principle of Maximum Relative Entropy
or, if you prefer, the Principle of Minimum Information Gain

find the solution p.d.f. pT(m) that has the largest relative entropy (the smallest possible new information) as compared to the a priori p.d.f. pA(m)

subject to the constraints:
pT(m) is a properly normalized p.d.f.
the data are satisfied in the mean, or, equivalently, the expected value of the error is zero
After minimization using the Lagrange multiplier process, pT(m) is Gaussian with maximum likelihood point mest satisfying

mest = ⟨m⟩ + [cov m]A G^T ( G [cov m]A G^T )^-1 ( dobs - G⟨m⟩ )

which is just the weighted minimum length solution
What did we learn?
Only that the
Principle of Maximum Entropy
is yet another way of deriving
the inverse problem solutions
we are already familiar with
Part 4
The F-test as a way to determine whether one solution is "better" than another
Common Scenario: two different theories

solution mestA, with MA model parameters and prediction error EA
solution mestB, with MB model parameters and prediction error EB

Suppose EB < EA. Is B really better than A?

What if B has many more model parameters than A, MB >> MA? Is B fitting better any surprise?

Need to test against the Null Hypothesis:
the difference in error is due to random variation

suppose the error e has a Gaussian p.d.f., uncorrelated, with uniform variance σd^2

estimate the variance as σest^2 = E / (N - M), where ν = N - M is the number of degrees of freedom

we want to know the probability density function of the ratio of the two estimated variances

actually, we'll use the quantity

Fest = ( EA / νA ) / ( EB / νB )

which is the same, as long as the two theories being tested are applied to the same data
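A Python sketch of this quantity on made-up data (a straight line and a cubic fit to the same noisy samples; numpy's polyfit stands in here for whatever fitting routine is actually used):

```python
import numpy as np

def f_statistic(E_A, nu_A, E_B, nu_B):
    """Fest = (E_A / nu_A) / (E_B / nu_B), with nu = N - M degrees of freedom."""
    return (E_A / nu_A) / (E_B / nu_B)

# made-up example: the same noisy dataset fit with a line and with a cubic
rng = np.random.default_rng(1)
z = np.linspace(0.0, 1.0, 11)
d = np.sin(2 * np.pi * z) + 0.1 * rng.standard_normal(z.size)

def fit_error(z, d, degree):
    """Total squared prediction error E of a polynomial fit."""
    c = np.polyfit(z, d, degree)
    return np.sum((d - np.polyval(c, z)) ** 2)

N = z.size
E_lin, nu_lin = fit_error(z, d, 1), N - 2   # straight line: M = 2
E_cub, nu_cub = fit_error(z, d, 3), N - 4   # cubic: M = 4
Fest = f_statistic(E_lin, nu_lin, E_cub, nu_cub)

# the cubic fits this (sinusoidal) data much better, so Fest exceeds 1
assert Fest > 1.0
```

Whether an Fest like this is surprisingly large is then judged against the known p.d.f. of F, as below.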
[Figure: the p.d.f. of F, shown as p(F(N,2)), p(F(N,5)), p(F(N,25)), and p(F(N,50)), each for N ranging from 2 to 50.]

the p.d.f. of F is known, as is its mean and variance
example: the same dataset fit with a straight line and with a cubic polynomial

[Figure: (A) linear fit d(z) to the data di(zi), with N - M = 9 and E = 0.030; (B) cubic fit to the same data, with N - M = 7 and E = 0.006.]
Fest(7,9) = 4.1

the probability that F > Fest (the cubic fit seems better than the linear fit) by random chance alone, or that F < 1/Fest (the linear fit seems better than the cubic fit) by random chance alone:
in MATLAB:

P = 1 - (fcdf(Fobs,vA,vB)-fcdf(1/Fobs,vA,vB));

answer: 6%
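A Python analogue of the MATLAB line, using scipy.stats.f; here vA and vB are the degrees of freedom N - M of the two fits, set to the 7 and 9 of the lecture's example:

```python
# Two-sided F-test p-value, mirroring the MATLAB fcdf expression.
from scipy.stats import f

Fobs = 4.1          # observed F statistic from the lecture's example
vA, vB = 7, 9       # degrees of freedom N - M of the two fits

# probability that F > Fobs or F < 1/Fobs by random chance alone
P = 1 - (f.cdf(Fobs, vA, vB) - f.cdf(1 / Fobs, vA, vB))
```

With these inputs P comes out at roughly 0.06-0.07, consistent with the quoted answer of about 6%.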
The Null Hypothesis, that the difference is due to random variation, cannot be rejected with 95% confidence.