Correlation With Errors-In-Variables 3/28/2002 1
Correlation with Errors-In-Variables and an Application to Galaxies
William H. Jefferys
University of Texas at Austin, USA
A Problem in Correlation
• A graduate student at Maryland asked me to assist her on a problem involving galaxy data. She wanted to know if the data showed clear evidence of correlation, and if so, what the correlation was and how strong was the evidence for it.
The Data
Comments on Data
• At first glance the correlation seems obvious.
• But there is an unusual feature of her problem: she knew that the data were imperfect, and for each data point had an error bar in both x and y. Standard treatments of correlation do not address this situation.
Comments on Data
• The presence of the error bars contributes to uncertainty as to how big the correlation is and how well it is determined.
• The data are sparse, so we also need to be concerned about small-number statistics.
• The student also was concerned about how the lowest point affected any correlation. What would happen, she wondered, if it were not included in the sample? [She was afraid that the ellipticity of the particular galaxy was so low that it might not have been measured accurately and in fact that the galaxy might belong to a different class of galaxies]
Bayesian Analysis and Astronomy
• She had been trying to use our program GaussFit to analyze the data, but it is not designed for tasks such as measuring correlation.
• I, of course, suggested to her that a Bayesian approach might be appropriate
• Bayesian methods offer many advantages for astronomical research and have attracted much recent interest.
• Astronomy and Astrophysics Abstracts lists 169 articles with the keywords ‘Bayes’ or ‘Bayesian’ in the past 5 years, and the number is increasing rapidly (there were 53 in 2000 alone, up from 33 in 1999).
Advantages of Bayesian Methods
• Bayesian methods allow us to do things that would be difficult or impossible with standard (frequentist) statistical analysis.
• It is simple to incorporate prior physical or statistical information
• Interpretation of results is very natural
• Model comparison is easy and straightforward. (This problem is an example.)
• It is a systematic way of approaching statistical problems, rather than a collection of ad hoc techniques. Very complex problems (difficult or impossible to handle classically) are straightforwardly analyzed within a Bayesian framework.
Bayesian Model Selection/Averaging
• Given models Mi, each of which depends on a vector of parameters θ, and given data Y, Bayes' theorem tells us that

$$
p(\theta, M_i \mid Y) \propto p(Y \mid \theta, M_i)\, p(\theta \mid M_i)\, p(M_i)
$$

• The probabilities p(θ | Mi) and p(Mi) are the prior probabilities of the parameters given the model and of the model, respectively; p(Y | θ, Mi) is the likelihood function, and p(θ, Mi | Y) is the joint posterior probability distribution of the parameters and models, given the data.
• Note that some parameters may not appear in some models, and there is no requirement that the models be nested.
Strategy
• I do not see a simple frequentist approach to this student’s problem
• A reasonable Bayesian approach is fairly straightforward:
• Assume that the underlying "true" (but unknown) galaxy parameters ξi and ηi (corresponding to the observed xi and yi) are distributed as a bivariate normal distribution

$$
p(\xi_i, \eta_i \mid \rho, a, b, \sigma_\xi, \sigma_\eta) \propto \frac{1}{\sigma_\xi \sigma_\eta \sqrt{1-\rho^2}} \exp\!\left[-\frac{1}{2(1-\rho^2)}\left(\frac{(\xi_i-a)^2}{\sigma_\xi^2} + \frac{(\eta_i-b)^2}{\sigma_\eta^2} - 2\rho\,\frac{(\xi_i-a)(\eta_i-b)}{\sigma_\xi \sigma_\eta}\right)\right]
$$
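A minimal numerical sketch of this density (my own Python illustration, not part of the original talk; the function name and the omission of the 2π normalizing constant are my choices):

```python
import numpy as np

def log_bvn_prior(xi, eta, a, b, rho, sig_xi, sig_eta):
    """Log of the bivariate normal prior density on one latent pair
    (xi_i, eta_i), up to the additive constant -log(2*pi)."""
    u = (xi - a) / sig_xi          # standardized xi residual
    v = (eta - b) / sig_eta        # standardized eta residual
    quad = (u**2 + v**2 - 2.0 * rho * u * v) / (1.0 - rho**2)
    return -np.log(sig_xi * sig_eta) - 0.5 * np.log(1.0 - rho**2) - 0.5 * quad
```

With ρ = 0 this reduces to the sum of two independent normal log-densities, which makes a quick sanity check.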
Strategy
• Since we do not know ξi and ηi for each galaxy but instead only the observed values xi and yi, we introduced the ξi and ηi for each galaxy as latent variables. These are parameters to be estimated.
• Here a and b give the true center of the distribution; ρ is the true correlation coefficient, and σξ and ση are the true standard deviations. None of these quantities are known. They are also parameters which must be estimated.
Strategy
• [Since we are using a bivariate normal, the variance-covariance matrix is

$$
V = \begin{pmatrix} \sigma_\xi^2 & \rho\sigma_\xi\sigma_\eta \\ \rho\sigma_\xi\sigma_\eta & \sigma_\eta^2 \end{pmatrix}
$$

with inverse

$$
V^{-1} = \frac{1}{1-\rho^2}\begin{pmatrix} 1/\sigma_\xi^2 & -\rho/(\sigma_\xi\sigma_\eta) \\ -\rho/(\sigma_\xi\sigma_\eta) & 1/\sigma_\eta^2 \end{pmatrix}
$$
Strategy
• So the density is

$$
p(\varphi) \propto |V|^{-1/2} \exp\!\left[-\tfrac{1}{2}\,\varphi' V^{-1} \varphi\right]
$$

where φ = (ξ − a, η − b)′.]
Strategy
• This expression may be regarded as our prior on the latent variables ξi and ηi. It depends on the other parameters (a, b, ρ, σξ, ση). We can regard these as hyperparameters, which will in turn require their own priors.
• The joint prior on all the latent variables ξi and ηi can be written as a product:

$$
p(\Xi, \mathrm{H} \mid \rho, a, b, \sigma_\xi, \sigma_\eta) \propto \prod_i p(\xi_i, \eta_i \mid \rho, a, b, \sigma_\xi, \sigma_\eta)
$$

where Ξ and Η are vectors whose components are ξi and ηi respectively.
Strategy
• We know the distributions of the data xi and yi conditional on ξi and ηi. Their joint distribution is given by

$$
p(x_i, y_i \mid \xi_i, \eta_i, s_i, t_i) \propto \exp\!\left[-\frac{(x_i-\xi_i)^2}{2s_i^2}\right]\exp\!\left[-\frac{(y_i-\eta_i)^2}{2t_i^2}\right]
$$

• Here si and ti are the standard deviations of the data points, assumed known perfectly for this analysis (these are the basis of the error bars I showed earlier...).
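Summed over the points, this measurement model gives the log-likelihood directly. A small Python sketch (my own, with illustrative names, again dropping constant factors):

```python
import numpy as np

def log_likelihood(x, y, xi, eta, s, t):
    """Total log-likelihood of the observed (x_i, y_i) given the latent
    (xi_i, eta_i) and the known error bars (s_i, t_i), up to a constant.
    All arguments are NumPy arrays of the same length."""
    return -0.5 * np.sum((x - xi)**2 / s**2 + (y - eta)**2 / t**2)
```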
The Data
The "true" value is somewhere near this error ellipse
Strategy
• Now we can write down the likelihood, the joint probability of observing the data, conditional on the parameters (here only the latent parameters appear; the others enter implicitly through the prior on the latent parameters):

$$
p(X, Y \mid \Xi, \mathrm{H}, S, T) \propto \prod_i p(x_i, y_i \mid \xi_i, \eta_i, s_i, t_i)
$$

where X, Y, S, and T are vectors whose components are xi, yi, si, and ti, respectively.
Priors
• The next step is to assign priors for each of the parameters, including the latent variables. Lacking special information, I chose conventional priors for all but Ξ and Η. Thus, I assign
• Improper constant flat priors on a and b.
• Improper Jeffreys priors 1/σξ and 1/ση on σξ and ση.
• We have two models, one with correlation (M1) and one without (M0). I assign p(M1) = p(M0) = 1/2.
• We will compare M1 and M0 by computing their posterior probabilities. I chose the prior p(ρ | M1) on ρ to be flat and normalized on [−1, 1] and zero elsewhere; under M0 I chose a delta-function prior p(ρ | M0) = δ(ρ − 0).
• Priors on Ξ and Η were displayed earlier.
Posterior Distribution
• The posterior distribution is proportional to the prior times the likelihood, as Bayes instructs us
$$
p(\rho, a, b, \sigma_\xi, \sigma_\eta, \Xi, \mathrm{H}, M_k \mid X, Y, S, T) \propto \frac{p(\rho \mid M_k)\, p(M_k)}{\sigma_\xi \sigma_\eta} \times p(\Xi, \mathrm{H} \mid \rho, a, b, \sigma_\xi, \sigma_\eta) \times p(X, Y \mid \Xi, \mathrm{H}, S, T)
$$
Simulation Strategy
• We used simulation to generate a sample from the posterior distribution through a combination of Gibbs and Metropolis-Hastings samplers (“Metropolis-within-Gibbs”).
• The sample can be used to calculate quantities of interest:
» Compute the posterior mean and variance of the correlation coefficient ρ (calculate the sample mean and variance of ρ).
» Plot the posterior distribution of ρ (plot a histogram of ρ from the sample).
» Determine quantiles of the posterior distribution of ρ (use quantiles of the sample).
» Compute posterior probabilities of each model (calculate the frequency of each model in the sample).
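For illustration only, here is how those summaries would be read off a posterior sample in Python; the draws below are synthetic stand-ins, not the talk's actual MCMC output:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for the MCMC output: draws of rho, and a 0/1
# indicator recording which model each draw of the chain visited.
rho_draws = rng.normal(-0.8, 0.1, size=20000)
model_draws = rng.random(20000) < 0.95

post_mean = rho_draws.mean()                    # posterior mean of rho
post_sd = rho_draws.std(ddof=1)                 # posterior std. deviation
q_lo, q_med, q_hi = np.quantile(rho_draws, [0.05, 0.5, 0.95])  # quantiles
p_m1 = model_draws.mean()                       # posterior probability of M1
odds_m1 = p_m1 / (1.0 - p_m1)                   # posterior odds on M1
```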
Posterior Conditionals
• The conditional distribution on ξi and ηi looks like

$$
p(\xi_i, \eta_i \mid \rho, a, b, \sigma_\xi, \sigma_\eta, X, Y, S, T) \propto \exp\!\left[-\frac{1}{2}\left(\frac{(\xi_i-x_i)^2}{s_i^2} + \frac{(\eta_i-y_i)^2}{t_i^2}\right)\right] \times \exp\!\left[-\frac{1}{2(1-\rho^2)}\left(\frac{(\xi_i-a)^2}{\sigma_\xi^2} + \frac{(\eta_i-b)^2}{\sigma_\eta^2} - 2\rho\,\frac{(\xi_i-a)(\eta_i-b)}{\sigma_\xi\sigma_\eta}\right)\right]
$$

• By combining terms and completing the square we can sample ξi and ηi from a bivariate normal.
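Concretely, combining the two Gaussian factors means adding their precision (inverse-covariance) matrices; a sketch in Python (my own reconstruction, with illustrative names, not the talk's code):

```python
import numpy as np

def draw_latent(x, y, s, t, a, b, rho, sig_xi, sig_eta, rng):
    """One Gibbs draw of (xi_i, eta_i): combine the Gaussian measurement
    term with the bivariate normal prior by adding precision matrices."""
    # Prior precision (the inverse of V) and prior mean
    c = 1.0 / (1.0 - rho**2)
    prec_prior = c * np.array([[1.0 / sig_xi**2, -rho / (sig_xi * sig_eta)],
                               [-rho / (sig_xi * sig_eta), 1.0 / sig_eta**2]])
    m_prior = np.array([a, b])
    # Measurement precision; the observed point plays the role of a mean
    prec_meas = np.diag([1.0 / s**2, 1.0 / t**2])
    m_meas = np.array([x, y])
    # Completing the square: precisions add, the mean is precision-weighted
    prec_post = prec_prior + prec_meas
    cov_post = np.linalg.inv(prec_post)
    mean_post = cov_post @ (prec_prior @ m_prior + prec_meas @ m_meas)
    return rng.multivariate_normal(mean_post, cov_post)
```

When the error bars are tiny the draw sits essentially on the observed point; when they are huge it reverts to the prior, which is the expected behaviour for an errors-in-variables model.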
Posterior Conditionals
• Similarly, the posterior conditional on a, b is

$$
p(a, b \mid \rho, \sigma_\xi, \sigma_\eta, \Xi, \mathrm{H}) \propto \prod_i \exp\!\left[-\frac{1}{2(1-\rho^2)}\left(\frac{(\xi_i-a)^2}{\sigma_\xi^2} + \frac{(\eta_i-b)^2}{\sigma_\eta^2} - 2\rho\,\frac{(\xi_i-a)(\eta_i-b)}{\sigma_\xi\sigma_\eta}\right)\right]
$$

• Again, by completing the square we can sample a and b from a bivariate normal.
• Note that if this were not an EIV problem, we would have x's and y's instead of ξ's and η's.
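Completing the square here puts the conditional mean of (a, b) at the sample means of the latent values, with covariance V/N. A sketch (mine, assuming the flat priors on a and b given above):

```python
import numpy as np

def draw_ab(xi, eta, rho, sig_xi, sig_eta, rng):
    """One Gibbs draw of (a, b): with flat priors, the conditional is
    bivariate normal centred on the latent-variable means, covariance V/N."""
    n = len(xi)
    V = np.array([[sig_xi**2, rho * sig_xi * sig_eta],
                  [rho * sig_xi * sig_eta, sig_eta**2]])
    mean = np.array([np.mean(xi), np.mean(eta)])
    return rng.multivariate_normal(mean, V / n)
```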
Posterior Conditionals
• The posterior conditional on σξ and ση is

$$
p(\sigma_\xi, \sigma_\eta \mid \rho, a, b, \Xi, \mathrm{H}) \propto \sigma_\xi^{-(N+1)}\,\sigma_\eta^{-(N+1)} \times \prod_i \exp\!\left[-\frac{1}{2(1-\rho^2)}\left(\frac{(\xi_i-a)^2}{\sigma_\xi^2} + \frac{(\eta_i-b)^2}{\sigma_\eta^2} - 2\rho\,\frac{(\xi_i-a)(\eta_i-b)}{\sigma_\xi\sigma_\eta}\right)\right]
$$

• It wasn't obvious to me that we could sample this in a Gibbs step, but maybe there's a way to do it. I just used independent M-H steps with a uniform symmetric proposal distribution, tuning the step size for good mixing, and this worked fine.
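That random-walk M-H update can be sketched generically (my own Python illustration; `log_cond` stands for the log of the conditional above and is supplied by the caller):

```python
import numpy as np

def mh_step_sigma(sig, log_cond, step, rng):
    """One Metropolis-Hastings update of a positive scale parameter with
    a symmetric uniform proposal centered on the current value.
    `log_cond` evaluates the log of the posterior conditional."""
    prop = sig + rng.uniform(-step, step)
    if prop <= 0.0:
        return sig                      # outside the support: reject
    log_ratio = log_cond(prop) - log_cond(sig)
    if np.log(rng.random()) < log_ratio:
        return prop                     # accept the proposal
    return sig                          # reject: keep the current value
```

In the talk's sampler this would be applied independently to σξ and ση, with `step` tuned for good mixing.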
Posterior Conditionals
• We do a reversible-jump step on ρ and M simultaneously. Here the idea is to propose a model M* and at the same time a correlation ρ* in an M-H step, and either accept or reject according to the M-H ratio.
• Proposing from M0 to M0 or from M1 to M1 is basically simple: just an ordinary M-H step.
• If we are proposing between models, then things are a bit more complicated, because the dimensionalities of the parameter spaces differ between the two models.
Posterior Conditionals
• The posterior conditional of ρ under M1 is

$$
p(\rho \mid a, b, \sigma_\xi, \sigma_\eta, \Xi, \mathrm{H}, M_1) \propto \frac{1}{2} \times \frac{1}{(1-\rho^2)^{N/2}} \times \prod_i \exp\!\left[-\frac{1}{2(1-\rho^2)}\left(\frac{(\xi_i-a)^2}{\sigma_\xi^2} + \frac{(\eta_i-b)^2}{\sigma_\eta^2}\right)\right] \times \prod_i \exp\!\left[\frac{\rho}{1-\rho^2}\,\frac{(\xi_i-a)(\eta_i-b)}{\sigma_\xi\sigma_\eta}\right]
$$

• (The leading factor of 1/2 comes from the prior on ρ and is very important.)
Posterior Conditionals
• The posterior conditional of ρ under M0 is

$$
p(\rho \mid a, b, \sigma_\xi, \sigma_\eta, \Xi, \mathrm{H}, M_0) \propto \delta(\rho) \times \prod_i \exp\!\left[-\frac{1}{2}\left(\frac{(\xi_i-a)^2}{\sigma_\xi^2} + \frac{(\eta_i-b)^2}{\sigma_\eta^2}\right)\right]
$$

• The δ function guarantees that ρ = 0.
• Here, the proportionality factor is chosen to be consistent with the factor of 1/2 under M1. The factors come from the priors [p(ρ | M1) ~ U(−1, 1), which has an implicit factor 1/2, and p(ρ | M0) ~ δ(ρ)].
Posterior Conditionals
• The M-H ratio when jumping from (M, ρ) to (M*, ρ*) is therefore (where the q's are the proposals):

$$
\frac{p(\rho^* \mid M^*, \ldots)\, p(M^* \mid \ldots)\, q(\rho \mid M, \ldots)\, q(M \mid \ldots)}{p(\rho \mid M, \ldots)\, p(M \mid \ldots)\, q(\rho^* \mid M^*, \ldots)\, q(M^* \mid \ldots)}
$$

• We sampled using a beta proposal q(ρ | M1, …) with parameters tuned by experiment for efficiency and good mixing under the complex model, and with a proposal q(M1 | …) that also was chosen by experiment with an eye to getting an accurate estimate of the posterior odds on M1.
• The idea is that a beta proposal on ρ matches the actual conditional pretty well, so it will be accepted with high probability; the M-H ratio will be close to 1.
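A sketch of that joint (ρ, M) reversible-jump step in Python (my own reconstruction; the function names, the Beta-on-(−1, 1) stretch, and treating the δ-function atom at ρ = 0 by evaluating `log_target(0.0, 0)` are all illustrative assumptions, not the talk's code):

```python
import math
import numpy as np

def log_beta_pdf(u, p, q):
    """Log density of Beta(p, q) at u in (0, 1)."""
    return ((p - 1) * math.log(u) + (q - 1) * math.log(1 - u)
            + math.lgamma(p + q) - math.lgamma(p) - math.lgamma(q))

def rj_step(rho, model, log_target, q_m1, bp, bq, rng):
    """One reversible-jump M-H step on (rho, model). model is 0 or 1;
    under M1 rho is proposed from a Beta(bp, bq) stretched to (-1, 1),
    under M0 rho is fixed at 0. `log_target(rho, model)` is the log
    posterior conditional (the delta atom under M0 is its value at 0)."""
    model_star = 1 if rng.random() < q_m1 else 0
    if model_star == 1:
        u = rng.beta(bp, bq)
        rho_star = 2.0 * u - 1.0
        # proposal density on rho; the Jacobian of the stretch gives 1/2
        log_q_star = log_beta_pdf(u, bp, bq) - math.log(2.0)
    else:
        rho_star, log_q_star = 0.0, 0.0
    # reverse-move proposal density for the current state
    if model == 1:
        log_q_cur = log_beta_pdf((rho + 1.0) / 2.0, bp, bq) - math.log(2.0)
    else:
        log_q_cur = 0.0
    log_q_m = math.log(q_m1 if model == 1 else 1.0 - q_m1)
    log_q_m_star = math.log(q_m1 if model_star == 1 else 1.0 - q_m1)
    log_ratio = (log_target(rho_star, model_star) - log_target(rho, model)
                 + log_q_cur + log_q_m - log_q_star - log_q_m_star)
    if math.log(rng.random()) < log_ratio:
        return rho_star, model_star
    return rho, model
```

Because the proposal and target densities enter the ratio explicitly, the step is valid whichever way the dimension jumps; the beta parameters (bp, bq) play the role of the experimentally tuned proposal in the talk.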
Sampling Strategy for Our Problem
• To summarize:
• We sampled the a, b, ξj, ηj in Gibbs steps (a and b appear in the posterior conditionals as a bivariate normal distribution, as do the ξj, ηj).
• We sampled σξ, ση with M-H steps using symmetric uniform proposals centered on the current point, adjusting the maximum step for good mixing.
• We sampled ρ and M in a simultaneous reversible-jump M-H step, using a beta proposal on ρ with parameters tuned by experiment for efficiency and good mixing under the complex model, and with a proposal on M that also was chosen by experiment with an eye to getting an accurate estimate of the posterior odds on M.
Results
• For the data set including the circled point, we obtained
• Odds on model with correlation = 207 (assumes prior odds equal to 1)
• Median ρ = −0.81
• Mean ρ = −0.79 ± 0.10
Posterior distribution of ρ (Including all points)
Results
• For the data set without the circled point we obtained
• Odds on model with correlation = 9.9
• Median ρ = −0.70
• Mean ρ = −0.68 ± 0.16
Posterior distribution of ρ (Excluding 1 point)
Final Comments
• This problem combines several interesting features:
• Latent variables, introduced because this is an errors-in-variables problem
• Model selection, implemented through reversible-jump MCMC simulation
• A combination of Gibbs and Metropolis-Hastings steps to implement the sampler (“Metropolis-within-Gibbs”)
• It is a good example of how systematic application of basic Bayesian analysis can yield a satisfying solution of a problem that, when looked at frequentistically, seems almost intractable
Final Comments
• One final comment: If you look at the tail area in either of the two cases investigated, you will see that it is much less than the 1/200 or 1/10 odds ratio that we calculated for the odds of M0 against M1. This is an example of how tail areas in general are not reliable statistics for deciding whether a hypothesis should be selected.