Lecture 2 : Searching for correlationsastronomy.swin.edu.au/~cblake/StatsLecture2.pdf · 2012. 8....
Transcript of Lecture 2 : Searching for correlationsastronomy.swin.edu.au/~cblake/StatsLecture2.pdf · 2012. 8....
![Page 1: Lecture 2 : Searching for correlationsastronomy.swin.edu.au/~cblake/StatsLecture2.pdf · 2012. 8. 18. · •Lecture 1 : basic descriptive statistics •Lecture 2 : searching for](https://reader035.fdocuments.us/reader035/viewer/2022071000/5fbc5b741d2325410749d9da/html5/thumbnails/1.jpg)
Lecture 2 :Searching for correlations
![Page 2: Lecture 2 : Searching for correlationsastronomy.swin.edu.au/~cblake/StatsLecture2.pdf · 2012. 8. 18. · •Lecture 1 : basic descriptive statistics •Lecture 2 : searching for](https://reader035.fdocuments.us/reader035/viewer/2022071000/5fbc5b741d2325410749d9da/html5/thumbnails/2.jpg)
• Lecture 1 : basic descriptive statistics
• Lecture 2 : searching for correlations
• Lecture 3 : hypothesis testing and model-fitting
• Lecture 4 : Bayesian inference
The dark energy puzzleThese lectures
![Page 3: Lecture 2 : Searching for correlationsastronomy.swin.edu.au/~cblake/StatsLecture2.pdf · 2012. 8. 18. · •Lecture 1 : basic descriptive statistics •Lecture 2 : searching for](https://reader035.fdocuments.us/reader035/viewer/2022071000/5fbc5b741d2325410749d9da/html5/thumbnails/3.jpg)
• Correlation coefficient and its error
• How to quantify the significance of a correlation
• Bootstrap error estimates
• Non-parametric correlation tests
• Common pitfalls when searching for correlations
• Comparing two distributions
The dark energy puzzleLecture 2 : searching for correlations
![Page 4: Lecture 2 : Searching for correlationsastronomy.swin.edu.au/~cblake/StatsLecture2.pdf · 2012. 8. 18. · •Lecture 1 : basic descriptive statistics •Lecture 2 : searching for](https://reader035.fdocuments.us/reader035/viewer/2022071000/5fbc5b741d2325410749d9da/html5/thumbnails/4.jpg)
• Two variables are correlated if they share a statistical dependence / relationship
• For example, measurements of temperature at noon and 1pm every day are correlated, because they both lie consistently above the mean daily temperature
• Correlations between variables are important because they indicate some underlying physical relationship between those variables
The dark energy puzzleWhat is a correlation?
![Page 5: Lecture 2 : Searching for correlationsastronomy.swin.edu.au/~cblake/StatsLecture2.pdf · 2012. 8. 18. · •Lecture 1 : basic descriptive statistics •Lecture 2 : searching for](https://reader035.fdocuments.us/reader035/viewer/2022071000/5fbc5b741d2325410749d9da/html5/thumbnails/5.jpg)
The dark energy puzzleWhat is a correlation?
Uncorrelated variables !
![Page 6: Lecture 2 : Searching for correlationsastronomy.swin.edu.au/~cblake/StatsLecture2.pdf · 2012. 8. 18. · •Lecture 1 : basic descriptive statistics •Lecture 2 : searching for](https://reader035.fdocuments.us/reader035/viewer/2022071000/5fbc5b741d2325410749d9da/html5/thumbnails/6.jpg)
The dark energy puzzleWhat is a correlation?
Trick for generating correlated variables :x, z unit gaussian variablesy = x.r + z (1-r2)1/2
(x,y) have corr. coefficient r
![Page 7: Lecture 2 : Searching for correlationsastronomy.swin.edu.au/~cblake/StatsLecture2.pdf · 2012. 8. 18. · •Lecture 1 : basic descriptive statistics •Lecture 2 : searching for](https://reader035.fdocuments.us/reader035/viewer/2022071000/5fbc5b741d2325410749d9da/html5/thumbnails/7.jpg)
• Example in astronomy : black-hole / bulge relation
The dark energy puzzle
Blac
k ho
le m
ass
(sol
ar m
asse
s)
Bulge velocity dispersion (km/s)
What is a correlation?
![Page 8: Lecture 2 : Searching for correlationsastronomy.swin.edu.au/~cblake/StatsLecture2.pdf · 2012. 8. 18. · •Lecture 1 : basic descriptive statistics •Lecture 2 : searching for](https://reader035.fdocuments.us/reader035/viewer/2022071000/5fbc5b741d2325410749d9da/html5/thumbnails/8.jpg)
• Describes the strength of the correlation between (x, y)
• Means :
• Standard deviations :
• Definition of correlation coefficient :
The dark energy puzzleCorrelation coefficient
![Page 9: Lecture 2 : Searching for correlationsastronomy.swin.edu.au/~cblake/StatsLecture2.pdf · 2012. 8. 18. · •Lecture 1 : basic descriptive statistics •Lecture 2 : searching for](https://reader035.fdocuments.us/reader035/viewer/2022071000/5fbc5b741d2325410749d9da/html5/thumbnails/9.jpg)
• No correlation [P(x,y) separable into f(x) g(y)] :
• Complete correlation :
• Complete anti-correlation :
• Possible range is
The dark energy puzzleCorrelation coefficient
![Page 10: Lecture 2 : Searching for correlationsastronomy.swin.edu.au/~cblake/StatsLecture2.pdf · 2012. 8. 18. · •Lecture 1 : basic descriptive statistics •Lecture 2 : searching for](https://reader035.fdocuments.us/reader035/viewer/2022071000/5fbc5b741d2325410749d9da/html5/thumbnails/10.jpg)
The dark energy puzzleLies, damn lies and statistics
Why was this poor statistics? Correlation is not the same as causation. Other dietary or lifestyle habits
could be a third variable! [ ... more examples later ... ]
Example 3
12/08/12 11:05 PMBBC NEWS | Health | Eating pizza 'cuts cancer risk'
Page 1 of 2http://news.bbc.co.uk/2/hi/health/3086013.stm
Home News Sport Radio TV Weather Languages
Low graphics | Accessibility help
Before people startdialling the pizza takeaway,they should consider thatpizza can be high in saturatedfat, salt and calories
Nicola O'Connor, Cancer ResearchUK
Pizzas are covered with a potentiallyprotective tomato sauce
[an error occurred while processing this directive]
One-Minute World News
News services Your news when youwant it
News Front Page
Africa
Americas
Asia-Pacific
Europe
Middle East
South Asia
UK
Business
Health
Medical notes
Science &Environment
Technology
Entertainment
Also in the news
-----------------
Video and Audio
-----------------
Programmes
Have Your Say
In Pictures
Country Profiles
Special Reports
RELATED BBC SITES
SPORT
WEATHER
ON THIS DAY
EDITORS' BLOG
Last Updated: Tuesday, 22 July, 2003, 10:28 GMT 11:28 UK
E-mail this to a friend Printable version
Eating pizza 'cuts cancer risk'
Italian researchers say
eating pizza could protect
against cancer.
Researchers claim eating pizza
regularly reduced the risk of
developing oesophageal cancer
by 59%.
The risk of developing colon
cancer also fell by 26% and
mouth cancer by 34%, they
claimed.
The secret could be lycopene, an antioxidant chemical in
tomatoes, which is thought to offer some protection against
cancer, and which gives the fruit its traditional red colour.
But some experts cast doubt
on the idea that pizza
consumption was the
explanation for why some
people did not develop cancer.
They said other foods or
dietary habits could play a part.
Lifestyle
The researchers looked at 3,300 people who had developed
cancer of the mouth, oesophagus, throat or colon and 5,000
people who had not developed cancer.
They were asked about their eating habits, and how often
they ate pizza.
Those who ate pizza at least once a week had less chance of
developing cancer, they found.
Dr Silvano Gallus, of the Mario Negri Institute for
Pharmaceutical Research in Milan, who led the research: "We
knew that tomato sauce could offer protection against certain
tumours, but we did not expect pizza as a complete meal
also to offer such protective powers."
Nicola O'Connor, of Cancer Research UK, told BBC News
Online: "This study is interesting and the results should
probably be looked at in the context of what we already know
about the Mediterranean diet and it's association with a lower
risk of certain types of cancer.
"But before people start dialling the pizza takeaway, they
SEE ALSO:
Scientists design 'anti-cancer'tomato 20 Jun 02 | Health
GM tomatoes 'offer health boost' 30 Aug 01 | Science/Nature
Oral cancer attacked by tomatoes 21 Dec 00 | Health
RELATED INTERNET LINKS:
International Journal of Cancer
Cancer Research UK
The BBC is not responsible for thecontent of external internet sites
TOP HEALTH STORIES
Stem cell method put to the test
Hospitals 'eyeing private market'
Low vitamin D 'Parkinson's link'
| News feeds
![Page 11: Lecture 2 : Searching for correlationsastronomy.swin.edu.au/~cblake/StatsLecture2.pdf · 2012. 8. 18. · •Lecture 1 : basic descriptive statistics •Lecture 2 : searching for](https://reader035.fdocuments.us/reader035/viewer/2022071000/5fbc5b741d2325410749d9da/html5/thumbnails/11.jpg)
• We can estimate the Pearson product-moment correlation coefficient as :
• The possible range of values is
The dark energy puzzleEstimating the correlation coefficient
c.f.
![Page 12: Lecture 2 : Searching for correlationsastronomy.swin.edu.au/~cblake/StatsLecture2.pdf · 2012. 8. 18. · •Lecture 1 : basic descriptive statistics •Lecture 2 : searching for](https://reader035.fdocuments.us/reader035/viewer/2022071000/5fbc5b741d2325410749d9da/html5/thumbnails/12.jpg)
•If the correlation is statistically significant :
• 0 < |r| < 0.3 is a “weak correlation”
• 0.3 < |r| < 0.7 is a “moderate correlation”
• 0.7 < |r| < 1.0 is a “strong correlation”
The dark energy puzzleEstimating the correlation coefficient
![Page 13: Lecture 2 : Searching for correlationsastronomy.swin.edu.au/~cblake/StatsLecture2.pdf · 2012. 8. 18. · •Lecture 1 : basic descriptive statistics •Lecture 2 : searching for](https://reader035.fdocuments.us/reader035/viewer/2022071000/5fbc5b741d2325410749d9da/html5/thumbnails/13.jpg)
• Assumption : (x,y) are drawn from a bivariate Gaussian distribution about an underlying linear relation :
• If this model is true, then the uncertainty in the measured value of r is
The dark energy puzzleEstimating the correlation coefficient
![Page 14: Lecture 2 : Searching for correlationsastronomy.swin.edu.au/~cblake/StatsLecture2.pdf · 2012. 8. 18. · •Lecture 1 : basic descriptive statistics •Lecture 2 : searching for](https://reader035.fdocuments.us/reader035/viewer/2022071000/5fbc5b741d2325410749d9da/html5/thumbnails/14.jpg)
• Example 1 : who discovered the distance-redshift relation, Lemaitre (1927) or Hubble (1929)?
The dark energy puzzleEstimating the correlation coefficient
!
!
!"#!$%&&'(!)*'+,-(.!!/(012345(!167!!!!
8(6-95-:+,;!
!"#$%&'(&)*+,-.&/,0++*&+1&2+3456"6$+7"*&8&944*$:%&;"60:3"6$,<.&=7$#:><$6?&+1&60:&@$6A"6:><>"7%.&
B+0"77:<C5>D.&/+560&91>$,"(&
"#$%&'(%(&E7:&+1&60:&D>:"6:<6&%$<,+#:>$:<&+1&3+%:>7&6$3:<&$<&60"6&+1&60:&:F4"7%$7D&=7$#:><:.&&"*3+<6&$7#">$"C*?&"66>$C56:%&6+&
G5CC*:&HIJKJL(&@0"6&$<&7+6&A$%:*?&-7+A7&$<&60"6&60:&+>$D$7"*&6>:"6$<:&C?&':3"M6>:&&&HIJKNL&&,+76"$7:%&&"&>$,0&15<$+7&+1&&C+60&
60:+>?&"7%&+1&+C<:>#"6$+7(&&O0:&P>:7,0&4"4:>&A"<&&3:6$,5*+5<*?&,:7<+>:%&&A0:7&&4>$76:%&&$7&Q7D*$<0&R&&"**&%$<,5<<$+7<&+1&>"%$"*&
#:*+,$6$:<&"7%&%$<6"7,:<&&!"#$%&'(%)(*+%,-*.&%(/0-*-1"2%%$(&(*/-#"&-3#%3,%%456%7%%A:>:&+3$66:%(&P"<,$7"6$7D&$7<$D06<&">:&D*:"7:%&
1>+3&"&*:66:>&>:,:76*?&1+57%&$7&60:&':3"M6>:&">,0$#:<(&&97&"44:"*&$<&3"%:&1+>&"&':3"M6>:&O:*:<,+4:.&6+&0+7+5>&&60:&%$<,+#:>:>&+1&
60:&:F4"7%$7D&57$#:><:(&
!"#$%&'"()*+,-.(!($(&/"0'"&12$3((4$4"'5(&
O0:&6$6*:&+1&60:&+>$D$7"*&IJKN&4"4:>&$7%$,"6:<&6+&60:&>:"%:>&60"6&60:&,+76:76&A$**&C:&"&15<$+7&+1&C+60&
60:+>?&"7%&+1&+C<:>#"6$+7S&&
48#%57$#:><&0+3+DT7:&%:&3"<<:&,+7<6"76:&:6&%:&>"?+7&,>+$<<"76.&>:7%"76&,+346:&%:&*"&#$6:<<:&&>"%$"*:&
%:<&&7UC5*:5<:<&:F6>"R9"2"1&-:;(.6%<'-1'%-.%&*"#.2"&($%-#%&'(%=#92-.'%&3#9;(>%&';.?&&
4@%'3/39(#(3;.%;#-)(*.(%3,%13#.&"#&%/"..%"#$%-#1*(".-#9%*"$-;.%"113;#&-#9%1+>&60:&>"%$"*&#:*+,$6?&+1&
:F6>"R9"2"1&-1%#(A;2"(6&
':3"M6>:&<4:76&60:&?:"><&IJKVWX&"6&60:&G">#">%&2+**:D:&EC<:>#"6+>?(&G:&0"%&"7&:F,:**:76&1+57%"6$+7&$7&
+C<:>#"6$+7"*&"<6>+7+3?.&A>$6$7D&"C+56&6:>3<&<5,0&"<&60:&:11:,6$#:&6:34:>"65>:<&+1&<6"><.&6>$D+7+3:6>$,&
4">"**"F:<.&3+#$7DR,*5<6:>&4">"**"F:<.&"C<+*56:&C+*+3:6>$,&3"D7$65%:<.&%A">1&C>"7,0&<6"><.&D$"76&C>"7,0&
<6"><.&"7%&60:&*$-:(&&&
O+&<4:"-&+1&':3"M6>:&HIJKNL&"<&"&3+<6&>:3">-"C*:&"7%&"C<+*56:*?&C>$**$"76&60:+>:6$,"*&4"4:>&+7*?.&$<&"&
D>"#:&$7Y5<6$,:&6+&60:&&#:>?&&6$6*:(&&Z+6&+7*?&%+:<&':3"M6>:&60:+>:6$,"**?&%:>$#:&"&*$7:">&>:*"6$+7<0$4&
C:6A::7&60:&>"%$"*&#:*+,$6$:<&+1&D"*"F$:<&"7%&60:$>&%$<6"7,:<&$7&60:&"C+#:&4"4:>.&C56&0:&$<&:"D:>&6+&
%:6:>3$7:&60:&>"6:&"6&A0$,0&60:&=7$#:><:&:F4"7%<(&&&':3"M6>:&HIJKNL&,">:15**?&5<:<&60:&>"%$"*&#:*+,$6$:<&
+1&&VK&:F6>"D"*",6$,&7:C5*":&&6"C5*"6:%&$7&/6>[3C:>D&&HIJKXL.&"7%&0:&,+7#:>6<&&"44">:76&3"D7$65%:<&3&
$76+&%$<6"7,:&\*+D&>&]&^(K3&_&V(^V`&1+**+A$7D&G5CC*:&&HIJKaL(&O0:&",65"*&#"*5:&A0$,0&':3"M6>:&+C6"$7<&$7&
IJKN&1+>&60:&>"6:&+1&:F4"7<$+7&+1&60:&=7$#:><:&&$<&aKX&-3W<W;4,&b&XNX&-3W<W;4,&A$60&%$11:>:76&
A:$D06$7D&1",6+><&HP$D5>:&IL(&
arXiv:1106.3928 etc.
![Page 15: Lecture 2 : Searching for correlationsastronomy.swin.edu.au/~cblake/StatsLecture2.pdf · 2012. 8. 18. · •Lecture 1 : basic descriptive statistics •Lecture 2 : searching for](https://reader035.fdocuments.us/reader035/viewer/2022071000/5fbc5b741d2325410749d9da/html5/thumbnails/15.jpg)
• Example 1 : who discovered the distance-redshift relation, Lemaitre (1927) or Hubble (1929)?
The dark energy puzzleEstimating the correlation coefficient
![Page 16: Lecture 2 : Searching for correlationsastronomy.swin.edu.au/~cblake/StatsLecture2.pdf · 2012. 8. 18. · •Lecture 1 : basic descriptive statistics •Lecture 2 : searching for](https://reader035.fdocuments.us/reader035/viewer/2022071000/5fbc5b741d2325410749d9da/html5/thumbnails/16.jpg)
The dark energy puzzle
• Part (a) : For each dataset, find the Pearson product-moment correlation coefficient and its error
• Lemaitre : r = 0.38 +/- 0.15
• Hubble : r = 0.79 +/- 0.13
Estimating the correlation coefficient
![Page 17: Lecture 2 : Searching for correlationsastronomy.swin.edu.au/~cblake/StatsLecture2.pdf · 2012. 8. 18. · •Lecture 1 : basic descriptive statistics •Lecture 2 : searching for](https://reader035.fdocuments.us/reader035/viewer/2022071000/5fbc5b741d2325410749d9da/html5/thumbnails/17.jpg)
• What is the probability of obtaining the measured value of r if the true correlation is zero? (also depends on N)
• In order to determine whether the correlation is significant, calculate
• This obeys the Student’s t probability distribution with number of degrees of freedom
• Consult tables (2-tailed test) with these two numbers
The dark energy puzzleIs a correlation significant?
![Page 18: Lecture 2 : Searching for correlationsastronomy.swin.edu.au/~cblake/StatsLecture2.pdf · 2012. 8. 18. · •Lecture 1 : basic descriptive statistics •Lecture 2 : searching for](https://reader035.fdocuments.us/reader035/viewer/2022071000/5fbc5b741d2325410749d9da/html5/thumbnails/18.jpg)
The dark energy puzzleIs a correlation significant?
Distribution of twhen rho=0 Number of data
points is crucial !
![Page 19: Lecture 2 : Searching for correlationsastronomy.swin.edu.au/~cblake/StatsLecture2.pdf · 2012. 8. 18. · •Lecture 1 : basic descriptive statistics •Lecture 2 : searching for](https://reader035.fdocuments.us/reader035/viewer/2022071000/5fbc5b741d2325410749d9da/html5/thumbnails/19.jpg)
• Standard tables list the critical values that t must exceed, as a function of nu, for the hypothesis that the two variables are unrelated to be rejected at a particular level of statistical significance (e.g. 95%, 99%)
The dark energy puzzleIs a correlation significant?
“Two-tailed test” :correlation or anti-correlation
![Page 20: Lecture 2 : Searching for correlationsastronomy.swin.edu.au/~cblake/StatsLecture2.pdf · 2012. 8. 18. · •Lecture 1 : basic descriptive statistics •Lecture 2 : searching for](https://reader035.fdocuments.us/reader035/viewer/2022071000/5fbc5b741d2325410749d9da/html5/thumbnails/20.jpg)
The dark energy puzzle
• Part (b) : Determine the statistical significance of the correlation
• Lemaitre : t=2.60 , nu=40, prob=1.3e-2 (2.5 sigma)
• Hubble : t=6.03 , nu=22 , prob=4.5e-6 (4.6 sigma)
Is a correlation significant?
![Page 21: Lecture 2 : Searching for correlationsastronomy.swin.edu.au/~cblake/StatsLecture2.pdf · 2012. 8. 18. · •Lecture 1 : basic descriptive statistics •Lecture 2 : searching for](https://reader035.fdocuments.us/reader035/viewer/2022071000/5fbc5b741d2325410749d9da/html5/thumbnails/21.jpg)
The dark energy puzzleLinear regression line
• The regression line is the linear fit that minimizes the sum of the squares of the y-residuals
• With intercept [y = a + b x] :
• Without intercept [y = b x] :
![Page 22: Lecture 2 : Searching for correlationsastronomy.swin.edu.au/~cblake/StatsLecture2.pdf · 2012. 8. 18. · •Lecture 1 : basic descriptive statistics •Lecture 2 : searching for](https://reader035.fdocuments.us/reader035/viewer/2022071000/5fbc5b741d2325410749d9da/html5/thumbnails/22.jpg)
The dark energy puzzle
• Part (c) : Determine linear least-squares regression lines of the form V = HD and v = H’D + C
• Lemaitre : H=414.9 , H’=221.7 , C=316.5 Hubble : H=423.9 , H’=453.9 , C=-40.4
Linear regression line
![Page 23: Lecture 2 : Searching for correlationsastronomy.swin.edu.au/~cblake/StatsLecture2.pdf · 2012. 8. 18. · •Lecture 1 : basic descriptive statistics •Lecture 2 : searching for](https://reader035.fdocuments.us/reader035/viewer/2022071000/5fbc5b741d2325410749d9da/html5/thumbnails/23.jpg)
• Aside : Hubble and Lemaitre both found values of H0 ~ 420 km/s/Mpc with independent techniques ! How could they both be wrong? [Example of statistical bias]
• Today would probably indicate confirmation bias, but Hubble didn’t even cite Lemaitre’s result!
• Lemaitre : assumed galaxy apparent magnitude was standard candle - scuppered by Malmquist bias
• Hubble : used “brightest stars” as standard candles, but could not distinguish brightest star from HII region (systematic error bias due to aperture effect)
The dark energy puzzleLinear regression line
![Page 24: Lecture 2 : Searching for correlationsastronomy.swin.edu.au/~cblake/StatsLecture2.pdf · 2012. 8. 18. · •Lecture 1 : basic descriptive statistics •Lecture 2 : searching for](https://reader035.fdocuments.us/reader035/viewer/2022071000/5fbc5b741d2325410749d9da/html5/thumbnails/24.jpg)
The dark energy puzzleBootstrap errors and probabilities
• Bootstrap statistics allow us to determine parameter errors and probability distributions using just the data
• If we have N data points, repeatedly draw at random samples of N points (with replacement)
• Re-compute the parameter of interest for each bootstrap sample
• The distribution of the re-computed parameters estimates the uncertainty in the measurement from the original sample
![Page 25: Lecture 2 : Searching for correlationsastronomy.swin.edu.au/~cblake/StatsLecture2.pdf · 2012. 8. 18. · •Lecture 1 : basic descriptive statistics •Lecture 2 : searching for](https://reader035.fdocuments.us/reader035/viewer/2022071000/5fbc5b741d2325410749d9da/html5/thumbnails/25.jpg)
The dark energy puzzleBootstrap errors and probabilities
• Applied to our example : create 1000 bootstrap samples and measure the correlation coefficients
68% conf +/- 0.11
68% conf +/- 0.07
![Page 26: Lecture 2 : Searching for correlationsastronomy.swin.edu.au/~cblake/StatsLecture2.pdf · 2012. 8. 18. · •Lecture 1 : basic descriptive statistics •Lecture 2 : searching for](https://reader035.fdocuments.us/reader035/viewer/2022071000/5fbc5b741d2325410749d9da/html5/thumbnails/26.jpg)
The dark energy puzzleBootstrap errors and probabilities
• Applied to our example : create 1000 bootstrap samples and do a linear regression fit
68% conf +/- 50
68% conf +/- 42
![Page 27: Lecture 2 : Searching for correlationsastronomy.swin.edu.au/~cblake/StatsLecture2.pdf · 2012. 8. 18. · •Lecture 1 : basic descriptive statistics •Lecture 2 : searching for](https://reader035.fdocuments.us/reader035/viewer/2022071000/5fbc5b741d2325410749d9da/html5/thumbnails/27.jpg)
• If we do not want to assume (x,y) are drawn from a bivariate Gaussian we can use a non-parametric correlation test
• Let (Xi,Yi) be the rank of (xi,yi) in the overall order such that
• Find Spearman rank correlation coefficient
• Compare to standard tables with
The dark energy puzzleNon-parametric correlation coefficient
D [Mpc] Rank0.03 1.00.03 2.00.21 3.00.26 4.00.28 5.50.28 5.50.45 7.00.50 8.50.50 8.50.63 10.00.80 11.00.90 13.50.90 13.50.90 13.50.90 13.51.00 16.01.10 17.51.10 17.51.40 19.01.70 20.02.00 22.52.00 22.52.00 22.52.00 22.5
1
![Page 28: Lecture 2 : Searching for correlationsastronomy.swin.edu.au/~cblake/StatsLecture2.pdf · 2012. 8. 18. · •Lecture 1 : basic descriptive statistics •Lecture 2 : searching for](https://reader035.fdocuments.us/reader035/viewer/2022071000/5fbc5b741d2325410749d9da/html5/thumbnails/28.jpg)
The dark energy puzzleNon-parametric correlation coefficient
• Standard tables list the critical values that rs must exceed, as a function of nu, for the hypothesis that the two variables are unrelated to be rejected at a particular level of statistical significance (e.g. 95%, 99%)
![Page 29: Lecture 2 : Searching for correlationsastronomy.swin.edu.au/~cblake/StatsLecture2.pdf · 2012. 8. 18. · •Lecture 1 : basic descriptive statistics •Lecture 2 : searching for](https://reader035.fdocuments.us/reader035/viewer/2022071000/5fbc5b741d2325410749d9da/html5/thumbnails/29.jpg)
The dark energy puzzle
• Part (e) : Determine the Spearman rank cross-correlation coefficient and its statistical significance
• Lemaitre : rs=0.42 , nu=40 , prob=6.0e-3 (2.8 sigma)
• Hubble : rs=0.80 , nu=22 , prob=3.4e-6 (4.7 sigma)
• [Results very similar to Pearson product-moment correlation coefficient, but fewer assumptions]
Non-parametric correlation coefficient
![Page 30: Lecture 2 : Searching for correlationsastronomy.swin.edu.au/~cblake/StatsLecture2.pdf · 2012. 8. 18. · •Lecture 1 : basic descriptive statistics •Lecture 2 : searching for](https://reader035.fdocuments.us/reader035/viewer/2022071000/5fbc5b741d2325410749d9da/html5/thumbnails/30.jpg)
• Selection effects leading to spurious correlations, for example Malmquist bias
The dark energy puzzleIssues with correlations
Distance modulus (redshift)
log 1
0(ra
dio
lum
inos
ity)
Selection effectsin this region
Selection effectsin this region
![Page 31: Lecture 2 : Searching for correlationsastronomy.swin.edu.au/~cblake/StatsLecture2.pdf · 2012. 8. 18. · •Lecture 1 : basic descriptive statistics •Lecture 2 : searching for](https://reader035.fdocuments.us/reader035/viewer/2022071000/5fbc5b741d2325410749d9da/html5/thumbnails/31.jpg)
• Is the correlation driven by a small number of outliers, so is not robust?
The dark energy puzzleIssues with correlations
Mean, variance, correlation coefficientand regression line are all identical
“Rule ofthumb”
![Page 32: Lecture 2 : Searching for correlationsastronomy.swin.edu.au/~cblake/StatsLecture2.pdf · 2012. 8. 18. · •Lecture 1 : basic descriptive statistics •Lecture 2 : searching for](https://reader035.fdocuments.us/reader035/viewer/2022071000/5fbc5b741d2325410749d9da/html5/thumbnails/32.jpg)
• Correlation does not necessarily imply causation
• Sleeping in your shoes causes you to wake up with a headache! [third variable - you were probably having a few drinks the night before...]
• Ice cream sales cause drowning! [third variable - hotter weather means more people are at the beach...]
• Having grey hair causes cancer! [third variable - age...]
• Obesity causes global warming! [third variable - everyone is getting richer...]
The dark energy puzzleIssues with correlations
![Page 33: Lecture 2 : Searching for correlationsastronomy.swin.edu.au/~cblake/StatsLecture2.pdf · 2012. 8. 18. · •Lecture 1 : basic descriptive statistics •Lecture 2 : searching for](https://reader035.fdocuments.us/reader035/viewer/2022071000/5fbc5b741d2325410749d9da/html5/thumbnails/33.jpg)
The dark energy puzzleLies, damn lies and statistics
“We don’t accept the idea that there are harmful agents in
tobacco”[Phillip Morris,1964]
Why was this poor statistics? Cigarette companies were attempting to invoke a “third variable”. But
correlation does sometimes imply causation, if it can be demonstrated by independent lines of evidence
Example 4
![Page 34: Lecture 2 : Searching for correlationsastronomy.swin.edu.au/~cblake/StatsLecture2.pdf · 2012. 8. 18. · •Lecture 1 : basic descriptive statistics •Lecture 2 : searching for](https://reader035.fdocuments.us/reader035/viewer/2022071000/5fbc5b741d2325410749d9da/html5/thumbnails/34.jpg)
• Are two samples drawn from the same distribution?
• Example 2 : samples of flux densities measured at random positions [A] and galaxy positions [B]
The dark energy puzzleComparing two distributions
![Page 35: Lecture 2 : Searching for correlationsastronomy.swin.edu.au/~cblake/StatsLecture2.pdf · 2012. 8. 18. · •Lecture 1 : basic descriptive statistics •Lecture 2 : searching for](https://reader035.fdocuments.us/reader035/viewer/2022071000/5fbc5b741d2325410749d9da/html5/thumbnails/35.jpg)
• Part (a) : Are their means consistent?
• Calculate t statistic and no. of degrees of freedom :
• Compare to Student’s t distribution
• t = 0.31, nu = 661.5, prob of consistency = 0.76
• Small print : assumes (x,y) are normally-distributed populations
The dark energy puzzleComparing two distributions
![Page 36: Lecture 2 : Searching for correlationsastronomy.swin.edu.au/~cblake/StatsLecture2.pdf · 2012. 8. 18. · •Lecture 1 : basic descriptive statistics •Lecture 2 : searching for](https://reader035.fdocuments.us/reader035/viewer/2022071000/5fbc5b741d2325410749d9da/html5/thumbnails/36.jpg)
• Part (b) : Are the full distributions consistent?
• The Kolmogorov-Smirnov test considers the maximum value of the absolute difference between the cumulative probability distributions
The dark energy puzzleComparing two distributions
Max diff = D = 0.067
Prob(Q) = 0.427