Explore, analyze, present your data
Guillaume Calmettes
Disclaimer
I am not a statistician
Statistics are scary
(You at the beginning of the talk)
Statistics are not so scary
(You at the middle of the talk)
Statistics are cool
(You at the end of the talk)
We have to deal with them anyway, so we had better enjoy them!
Press the t-test button and you’ll be done!
Did you check the normality of your data first?
Why should you care about statistics?
http://www.nature.com/nature/authors/gta/2e_Statistical_checklist.pdf
Why should you care about statistics?
Advances in Physiology Education
“Explorations in Statistics” series (2008-present) (Douglas Curran-Everett)
Why should you care about statistics?
http://jp.physoc.org/cgi/collection/stats_reporting
The Journal of Physiology, Experimental Physiology, The British Journal of Pharmacology, Microcirculation, The British Journal of Nutrition
“Statistical Perspectives” series (2011-present) (Gordon Drummond)
Why should you care about statistics?
http://blogs.nature.com/methagora/2013/08/giving_statistics_the_attention_it_deserves.html
Significance, P values and t-tests – November 2013 Introduction to the concept of statistical significance and the one-sample t-test.
Error Bars – October 2013 The use of error bars to represent uncertainty and advice on how to interpret them.
Importance of being uncertain – September 2013 How samples are used to estimate population statistics and what this means in terms of uncertainty.
Why should you care about statistics?
“Nature research journals will introduce editorial measures to address the problem by improving the consistency and quality of reporting in life-sciences articles”
“We will examine statistics more closely and encourage authors to be transparent, for example by including their raw data”
“Journals […] fail to exert sufficient scrutiny over the results that they publish”
Look at your data
A picture is worth a thousand words
Location of deaths in the 1854 London Cholera Epidemic
John Snow (1813-1858)
Dataset #1      Dataset #2      Dataset #3      Dataset #4
 x      y        x      y        x      y        x      y
10    8.04      10    9.14      10    7.46       8    6.58
 8    6.95       8    8.14       8    6.77       8    5.76
13    7.58      13    8.74      13   12.74       8    7.71
 9    8.81       9    8.77       9    7.11       8    8.84
11    8.33      11    9.26      11    7.81       8    8.47
14    9.96      14    8.10      14    8.84       8    7.04
 6    7.24       6    6.13       6    6.08       8    5.25
 4    4.26       4    3.10       4    5.39      19   12.50
12   10.84      12    9.13      12    8.15       8    5.56
 7    4.82       7    7.26       7    6.42       8    7.91
 5    5.68       5    4.74       5    5.73       8    6.89
Why visualize your data?
Anscombe, F. J. (1973). "Graphs in Statistical Analysis". American Statistician 27 (1): 17–21
Anscombe’s quartet example
Why visualize your data?
Property in each case Value
Mean of x 9 (exact)
Variance of x 11 (exact)
Mean of y 7.5
Variance of y 4.122 or 4.127
Correlation of x and y 0.816
Linear regression line y = 3.00 + 0.500x
Anscombe’s quartet example
Anscombe, F. J. (1973). "Graphs in Statistical Analysis". American Statistician 27 (1): 17–21
Why visualize your data?
[Figure: scatter plots of Dataset #1, Dataset #2, Dataset #3 and Dataset #4]
Anscombe’s quartet example
Visualize your data in their raw form!
Aim for revelation rather than mere summary
A great graphic with raw data reveals unexpected patterns and invites us to make comparisons we might not have thought of beforehand.
If you are still not convinced …
Mean: 16 / Stdv: 5
[Figure from Daniel’s Journal Club paper: graph of donor engraftment (%), axis 0 to 80, after WBM secondary transplantation (16 weeks); groups labeled mH19 flDMR/+ and 6DMR/+; P < 0.05]
Avoid making bar graphs
Rockman H.A. (2012). "Great expectations". J Clin Invest 122 (4): 1133
“To maintain the highest level of trustworthiness of data, we are encouraging authors to display data in their raw form and not in a fashion that conceals their variance.
Presenting data as columns with error bars (dynamite plunger plots) conceals data. We recommend that individual data be presented as dot plots shown next to the average for the group with appropriate error bars (Figure 1).”
Avoid making bar graphs
Cumming, G. et al. (2007). "Error bars in experimental biology". J Cell Biol 177 (1): 7–11
Error bars
Different types, different meanings
• descriptive statistics (Range, SD)
• inferential statistics (SE, CI)
Often, they also imply a symmetrical distribution of the data.
[Figure: normal distribution curve; 95% of the area lies between µ − 2σ and µ + 2σ]
Avoid making bar graphs
95% of a normal distribution lies within two standard deviations (σ) of the mean (µ)
Mean and Standard deviation are only useful in the context of a “normal distribution”
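The "two standard deviations" rule can be checked numerically. This is a small sketch using Python's standard `statistics.NormalDist`; the helper name `coverage` is made up for the example.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def coverage(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

within_1sd = coverage(1.0)
within_2sd = coverage(2.0)
# Number of standard deviations that encloses exactly 95% of the distribution.
z_95 = std_normal.inv_cdf(0.975)

print(f"within 1 SD: {within_1sd:.4f}")
print(f"within 2 SD: {within_2sd:.4f}")
print(f"95% lies within {z_95:.2f} SD of the mean")
```

Two standard deviations enclose slightly more than 95% of a normal distribution; the exact 95% limits sit at about 1.96 standard deviations. None of this holds for a skewed distribution.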
Avoid making bar graphs
[Figure: a symmetrical distribution and a skewed distribution]
Data presentation to reveal the distribution of the data
• Display data in their raw form.
• A dot plot is a good start.
• “Dynamite plunger plots” conceal data.
• Check the pattern of distribution of the values.
• First set: Gaussian (or normal) distribution (symmetrically distributed)
• Second set: right skewed, lognormal (few large values)
“This type of distribution of values is quite common in biology (ex: plasma concentrations of immune or inflammatory mediators)”
“Plunger plots only: who would know that the values were skewed ... and that the common statistical tests would be inappropriate?”
Avoid making bar graphs
Bar graph Dynamite plunger
Don't tell me no one warned you before!
Summary
Looking for patterns and relationships
Providing a narrative for the reader
Summarize complex data structures
Help avoid erroneous conclusions based upon questionable or unexpected data
For others ...
But primarily for you ...
Why visualize your data?
Choose the right descriptor for your data
Averages can be misleading
Is the mean always a good descriptor?
http://www.globalhealthfacts.org/data/topic/map.aspx?ind=87
# of children per household in China (2012)
• mean: 1.35
• median: 1
more representative of the “typical” family (One child policy)
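A toy illustration of why the median can describe the "typical" case better than the mean. The household counts below are invented for this sketch; they are not the data behind the slide.

```python
from statistics import mean, median

# Hypothetical number of children in 20 households (made-up values):
# most households have one child, a few have none or several.
children = [1] * 13 + [0] * 2 + [2] * 3 + [3, 4]

mean_children = mean(children)
median_children = median(children)

print(f"mean:   {mean_children:.2f}")
print(f"median: {median_children}")
```

No household has 1.3 children; the median of 1 is the value an actual family is most likely to show.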
Any measure is wrong!
http://www.youtube.com/watch?v=JUxHebuXviM
Walter Lewin (MIT)
“Whenever you make a measurement, you must know the uncertainty otherwise it is meaningless”
183.3 cm / 185.7 cm
The same concept applies when you report your data!
Provide the uncertainty of your descriptor (hint: this is NOT the standard deviation)
Report the Confidence Interval of your descriptor
The Bootstrap: origin
Efron B. and Tibshirani R. (1991), Science, Jul 26;253(5018):390-5
Modern electronic computation has encouraged a host of new statistical methods that require fewer distributional assumptions than their predecessors and can be applied to more complicated statistical estimators. These methods allow [...] to explore and describe data and draw valid statistical inferences without the usual concerns for mathematical tractability.
Computing the bootstrap 95% CI
[Figure: the original sample A0 = (a1, a2, ..., an), with mean m0, is resampled with replacement to build pseudo-samples A1, A2, A3, A4, ...; each pseudo-sample gives its own mean mA1, mA2, mA3, mA4, ...]
Calmettes G. et al. (2012), “Making do with what we have: use your bootstrap”, J Physiol, 590(15):3403-3406
5.18 [4.91, 5.47]
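The resampling scheme for the bootstrap 95% CI can be sketched in a few lines. This is a minimal percentile bootstrap, not necessarily the exact procedure of the cited paper; the function name `bootstrap_ci`, the seed and the sample values are assumptions made for illustration.

```python
import random

def sample_mean(values):
    return sum(values) / len(values)

def bootstrap_ci(sample, stat=sample_mean, n_boot=10000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval of `stat` for one sample."""
    rng = random.Random(seed)
    n = len(sample)
    # Draw n_boot pseudo-samples of size n, with replacement, and keep their statistic.
    boot_stats = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_boot))
    alpha = 1.0 - level
    # With 10000 pseudo-samples these are the 250th and 9750th sorted values.
    lower = boot_stats[max(round(n_boot * alpha / 2) - 1, 0)]
    upper = boot_stats[max(round(n_boot * (1 - alpha / 2)) - 1, 0)]
    return stat(sample), lower, upper

# Hypothetical measurements (made-up values).
data = [4.1, 5.6, 5.0, 6.2, 4.8, 5.3, 5.9, 4.4, 5.1, 5.5]
estimate, ci_low, ci_high = bootstrap_ci(data)
print(f"{estimate:.2f} [{ci_low:.2f}, {ci_high:.2f}]")
```

Passing another function as `stat` (for example a median) gives the confidence interval of that descriptor with the same code.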
Analyze your data
Choose your statistical test wisely
http://www.nature.com/nature/authors/gta/#a5.6
Every paper that contains statistical testing should state [...] a justification for the use of that test (including, for example, a discussion of the normality of the data when the test is appropriate only for normal data), [...], whether the tests were one-tailed or two-tailed, and the actual P value for each test (not merely "significant" or "P < 0.05").
Authors Guidelines
The simple case (How to)
[Figure: distributions of the Male and Female groups]
mean/std 135.9 ± 19.0
mean/std 187.0 ± 19.8
difference/ci 51.2 [50.4, 51.9]
Distribution of the data?
visual inspection
• fit of the histogram
• QQ plot: the ith ordered point $A_{(i)}$ is plotted against the theoretical quantile of the distribution, $\Phi^{-1}\left(\frac{i - 3/8}{n + 1/4}\right)$
[Figure: example QQ plot labeled not “normal”]
test
• Shapiro-Wilk test
Null Hypothesis for the SW test: Data are normally distributed
Female p-value: 0.9195
Male p-value: 0.3866
Normally distributed
Statistical test?
t-test
Null Hypothesis for the t-test: Data belong to the same population
t-test p-value < 2.2e-16
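The QQ plot coordinates follow directly from the quantile formula on the QQ plot slide, Φ⁻¹((i − 3/8)/(n + 1/4)) for the ith ordered point. The sketch below only computes the points to plot; the function name `normal_qq_points` and the sample values are assumptions for illustration.

```python
from statistics import NormalDist

def normal_qq_points(sample):
    """Return (theoretical quantile, ordered value) pairs for a normal QQ plot."""
    n = len(sample)
    ordered = sorted(sample)
    std_normal = NormalDist()
    # Plotting position of the ith ordered point: (i - 3/8) / (n + 1/4), i = 1..n.
    theoretical = [std_normal.inv_cdf((i - 3 / 8) / (n + 1 / 4)) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

# Hypothetical sample of 11 values (made-up numbers).
sample = [131.0, 142.5, 118.3, 150.2, 136.8, 127.4, 160.1, 139.9, 122.7, 145.6, 134.2]
qq_points = normal_qq_points(sample)
for quantile, value in qq_points:
    print(f"{quantile:+.3f}  {value:.1f}")
```

If the data are close to normal, these points fall roughly on a straight line when plotted; systematic curvature suggests a skewed or heavy-tailed distribution.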
Usually it is not so simple
The “not so simple” case
[Figure: distributions of samples S1 and S2]
Shapiro-Wilk test:
S1 p-value: 7.4e-05
S2 p-value: 6.7e-06
What to do?
Non-parametric alternatives
For the t-test:
• Mann-Whitney U (independent samples)
• Wilcoxon signed-rank (dependent samples)
Choose a new statistical hero: Bootstrapman
Computing the bootstrap p-value
Are the two samples different?
Observed difference D0 = mA − mB = 0.44
If the two samples were from the same population, what would be the probability that the observed difference arose from chance alone?
[Figure: the samples A0 = (a1, ..., an) and B0 = (b1, ..., bn) are pooled; pseudo-samples A1 and B1 are drawn at random from the pool, with replacement, giving means mA1 and mB1 and a pseudo-difference D1 = mA1 − mB1 (e.g. D1 = −0.83, D2 = 0.84)]
Repeat 10000 times (D1 ... D10000)
How many pseudo-differences are greater than or equal to the observed difference D0?
9829 < D0, 171 > D0
p = 171/10000 = 0.0171 (one-tailed)
MW: p = 0.0169
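The pooling-and-resampling procedure for the bootstrap p-value can be sketched as follows. It is a minimal one-tailed version under the null hypothesis that both samples come from the same population; the function name `bootstrap_p_value`, the seed and the sample values are assumptions for illustration.

```python
import random

def sample_mean(values):
    return sum(values) / len(values)

def bootstrap_p_value(a, b, n_boot=10000, seed=0):
    """One-tailed bootstrap p-value for the difference of means mean(a) - mean(b)."""
    rng = random.Random(seed)
    observed = sample_mean(a) - sample_mean(b)
    pooled = list(a) + list(b)  # null hypothesis: a single common population
    count = 0
    for _ in range(n_boot):
        # Draw two pseudo-samples from the pool, with replacement.
        pseudo_a = rng.choices(pooled, k=len(a))
        pseudo_b = rng.choices(pooled, k=len(b))
        if sample_mean(pseudo_a) - sample_mean(pseudo_b) >= observed:
            count += 1
    return observed, count / n_boot

# Hypothetical, clearly separated samples (made-up values).
group_a = [10.1, 9.8, 10.4, 10.0, 9.9, 10.2]
group_b = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8]
d0, p_separated = bootstrap_p_value(group_a, group_b)

# Comparing a sample with itself: the observed difference is 0 and p is near 0.5.
_, p_same = bootstrap_p_value(group_a, group_a)
print(f"D0 = {d0:.2f}, p = {p_separated:.4f}; same sample: p = {p_same:.2f}")
```

For a two-tailed version, one common choice is to count pseudo-differences whose absolute value is at least the absolute observed difference.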
Summary
What do my data look like?
Distribution?
• visual inspection (hist. / QQ plot)
• normality test
What do I want to compare?
Right statistical test?
• parametric test
• non parametric test
• resampling statistics
The dark side of the p-value
Statistical significance
“The effect of the drug was statistically significant.”
so what?
Statistical significance (example)
“The percentage of neurons showing cue-related activity increased with training in the mutant mice (P<0.05) but not in the control mice (P>0.05).”
Tempting (but unsupported) conclusion: Training has a larger effect in the mutant mice than in the control mice!
Extreme scenario: training-induced activity barely reaches significance in mutant mice (e.g., P = 0.049) and barely fails to reach significance for control mice (e.g., P = 0.051)
[Figure: activity in control and mutant mice, without (−) and with (+) training; * marks the significant comparison]
Does not test whether training effect for mutant mice differs statistically from that for control mice.
When making a comparison between two effects, always report the statistical significance of their difference rather than the difference between significance levels.
Nieuwenhuis S. et al. (2011), “Erroneous analyses of interactions in neuroscience: a problem of significance”, Nat Neuroscience, 14(9):1105-1107
P-values do not convey information
Mean: 16 SD: 5
Mean: 20 SD: 5
Difference = 4
p-value = 0.1090, 0.0367 or 0.0009 for the same difference of 4
P-values do not convey information
Fact: Most applied scientists use p-values as a measure of evidence and of the size of the effect
- This topic has renewed importance with the advent of the massive multiple testing often seen in genomics studies
[Figure: “Manhattan plot”, −log10(P) from 0 to 8 on the y-axis, positions 1 to 20 on the x-axis]
- The probability of hypotheses depends on much more than just the p-value.
Ioannidis JP (2005) PLoS Med 2(8):e124
Report effect size and CIs instead
P-value is a function of the sample size
Hentschke, H. et al. (2011). "Computation of measures of effect size for neuroscience data sets". Eur J Neurosci. 34(12):1887–94
[Figure: amplitude (mV) of control (n=6777) and atropine (n=5272) recordings; scale bars 0.5 mV, 100 ms]
Measured Effect Size: difference = 0.018 mV
p = 10^-5
[Figure: P (t-test) and Hedges' g as a function of sample size (10^1 to 10^3), with the “significant” and “not significant” regions marked; effect of 0.018 mV]
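How the p-value shrinks with sample size for a fixed effect can be illustrated with the numbers from the earlier slide (difference of 4, SD of 5). The sketch below uses a large-sample normal (z) approximation rather than the t-test, so the values are only indicative; the function name and the chosen sample sizes are assumptions.

```python
from math import sqrt
from statistics import NormalDist

def approx_two_sided_p(difference, sd, n_per_group):
    """Two-sided p-value for a difference of two means, normal approximation.

    Assumes two groups of equal size and equal SD; this is a z-test,
    not the t-test, so it is slightly optimistic for small samples.
    """
    standard_error = sd * sqrt(2.0 / n_per_group)
    z = abs(difference) / standard_error
    return 2.0 * (1.0 - NormalDist().cdf(z))

# Same effect (difference = 4, SD = 5), increasing sample size per group.
p_by_n = {n: approx_two_sided_p(4.0, 5.0, n) for n in (10, 20, 50, 100)}
for n, p in p_by_n.items():
    print(f"n = {n:3d} per group -> p = {p:.2g}")
```

The effect size never changes, yet the p-value moves from "not significant" to "highly significant" purely because more data were collected.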
Bootstrap effect size and 95% CIs
[Figure: samples A = (a1, ..., an) and B = (b1, ..., bn) are each resampled with replacement 10000 times, giving means mA1 ... mA10000 and mB1 ... mB10000 and effect sizes E1 = mA1 − mB1, E2 = mA2 − mB2, ..., E10000 = mA10000 − mB10000; the 250th and 9750th sorted effect sizes bound the 95% CI]
Eff. size = 0.44
0.44 [0.042, 0.853]
Do the 95% confidence intervals of the observed effect size include zero (no difference)?
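The bootstrap effect size and its 95% CI can be sketched like this: each group is resampled on its own, and the percentile limits are read from the sorted effect sizes. The function name `bootstrap_effect_size`, the seed and the sample values are assumptions for illustration.

```python
import random

def sample_mean(values):
    return sum(values) / len(values)

def bootstrap_effect_size(a, b, n_boot=10000, seed=0):
    """Difference of means with a percentile bootstrap 95% confidence interval."""
    rng = random.Random(seed)
    # Each group is resampled separately, with replacement.
    effects = sorted(
        sample_mean(rng.choices(a, k=len(a))) - sample_mean(rng.choices(b, k=len(b)))
        for _ in range(n_boot)
    )
    # With 10000 resamples these are the 250th and 9750th sorted values.
    lower = effects[max(round(0.025 * n_boot) - 1, 0)]
    upper = effects[max(round(0.975 * n_boot) - 1, 0)]
    return sample_mean(a) - sample_mean(b), lower, upper

# Hypothetical samples (made-up values).
sample_a = [5.2, 6.1, 5.8, 6.4, 5.5, 6.0, 5.9, 6.3]
sample_b = [4.1, 4.8, 4.4, 5.0, 4.6, 4.3, 4.9, 4.5]
effect, ci_low, ci_high = bootstrap_effect_size(sample_a, sample_b)
print(f"{effect:.2f} [{ci_low:.2f}, {ci_high:.2f}]")
print("CI includes zero:", ci_low <= 0.0 <= ci_high)
```

Reporting the effect size with its interval tells the reader both how large the difference is and how precisely it is known, which a bare p-value does not.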
Statistical vs Biological significance
“Statistical significance suggests but does not imply biological significance.”
“The P value reported by tests is a probabilistic significance, not a biological one.”
Krzywinski M and Altman N (2013) "Points of significance: Significance, P values and t-tests". Nature Methods 10, 1041–1042
Statistical vs Biological significance
Statistical significance has a meaning in a specific context
Biological consequences?
• No change
• Small change
• Large change
Statistical vs Biological significance
“Good enough” solutions
Stomatogastric ganglion (AB, LP, PD, PY)
[Figure: LP 1 vs LP 2; conductances at +15 mV (µS/nF) for Kd, KCa and A-type; mRNA copy numbers for shab, BK-KC and shal]
Schulz D.J. et al. (2006) "Variable channel expression in identified single and electrically coupled neurons in different animals". Nat Neurosci. 9: 356–362
Statistical vs Biological significance
Madhvani R.V. et al. (2011) "Shaping a new Ca2+ conductance to suppress early afterdepolarizations in cardiac myocytes". J Physiol 589(Pt 24):6081-92
Statistical vs Biological significance
Breast cancer study
Difference in cancer returning between control vs low-fat diet groups.
Authors' conclusion: People with low-fat diets had a 25% lower chance of cancer returning
Actual return rates:
- control: 12.4%
- low-fat diet: 9.8%
Difference: 2.6%
2.6 / 9.8 = 26.5%
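The arithmetic behind the headline figure, spelled out. The sketch reproduces the slide's ratio (difference divided by the low-fat rate) and, as an addition not on the slide, the ratio relative to the control rate.

```python
# Return rates from the slide, in percent.
control_rate = 12.4
low_fat_rate = 9.8

# Absolute difference in percentage points.
absolute_difference = control_rate - low_fat_rate

# Relative difference as computed on the slide (difference / low-fat rate).
relative_to_low_fat = absolute_difference / low_fat_rate * 100

# Relative difference with the control group as the reference (not on the slide).
relative_to_control = absolute_difference / control_rate * 100

print(f"absolute difference: {absolute_difference:.1f} percentage points")
print(f"relative to low-fat: {relative_to_low_fat:.1f}%")
print(f"relative to control: {relative_to_control:.1f}%")
```

A relative change above 20% sounds large, while the absolute change is under 3 percentage points; both numbers describe the same data.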
Beware of false positives
Bennett C. et al. (2010) “Neural Correlates of Interspecies Perspective Taking in the Post-Mortem Atlantic Salmon: An Argument For Proper Multiple Comparisons Correction”. JSUR, 2010. 1(1):1-5
(from the authors)
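Why uncorrected multiple comparisons produce false positives can be shown with a short calculation. The sketch assumes independent tests, which real voxels or genes usually are not, and uses the Bonferroni correction as one simple example of a correction; the function names are made up for illustration.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive among n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the familywise rate at or below alpha."""
    return alpha / n_tests

for n_tests in (1, 20, 100, 1000):
    rate = familywise_error_rate(0.05, n_tests)
    threshold = bonferroni_threshold(0.05, n_tests)
    print(f"{n_tests:5d} tests: P(at least one false positive) = {rate:.3f}, "
          f"Bonferroni threshold = {threshold:.5f}")
```

With 20 tests at the 0.05 level, the chance of at least one false positive is already close to two in three.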
Present your data
Know your audience
Who? who is my audience? level of understanding? what do they already know?
Why? why am I presenting? what does my audience want to achieve?
What? what do I want my audience to know? which story will captivate the audience?
How? what medium will support the message the best? what format/layout will appeal to the audience?
Color blindness is a common disease
Males: one in 12 (8%) / Females: one in 200 (0.5%)
“Anyone who needs to be convinced that making scientific images more accessible is a worthwhile task [...]: if your next grant or manuscript submission contains color figures, what if some of your reviewers are color blind? Will they be able to appreciate your figures? Considering the competition for funding and for publication, can you afford the possibility of frustrating your audience? The solution is at hand.”
Clarke, M. (2007). "Making figures comprehensible for color-blind readers" Nature blog (http://blogs.nature.com/nautilus/2007/02/post_4.html)
Making figures for color blind people
Wong, B. (2011). "Points of view: Color blindness". Nature Methods 8, 441
Telling stories with data
http://vis.stanford.edu/files/2010-Narrative-InfoVis.pdf
“The Martini Glass Structure”
[Figure: START, GUIDED NARRATIVE, then EXPLORE]
Aesthetic minimalism
Suda B. (2010). "A practical guide to Designing with Data"
Common mistakes in data reporting
Welcome to the FOX “Dishonest Charts” gallery
Common mistakes in data reporting
E. Tufte’s “Lie Factor”
Make things appear to be “better” than they are by fiddling with the scales of things
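Tufte's Lie Factor is the size of the effect shown in the graphic divided by the size of the effect in the data; a value near 1 means the graphic is faithful. The numbers below are illustrative assumptions chosen for this sketch, not measurements taken from any chart in this talk.

```python
def relative_change(first, second):
    """Relative change from the first value to the second."""
    return (second - first) / first

def lie_factor(data_first, data_second, drawn_first, drawn_second):
    """Effect shown in the graphic divided by the effect present in the data."""
    return relative_change(drawn_first, drawn_second) / relative_change(data_first, data_second)

# Illustrative values: the data go from 18 to 27.5 (a 53% increase),
# but the drawn element grows from 0.6 to 5.3 units (a 783% increase).
factor = lie_factor(18.0, 27.5, 0.6, 5.3)
print(f"Lie Factor = {factor:.1f}")
```

A truncated or stretched axis changes the drawn sizes without changing the data, which is how the factor drifts away from 1.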
Common mistakes in data reporting
“We found that relative to WT mice, the luminal microbiota of Il10−/− mice exhibited a ~100-fold increase in E. coli (Fig. 1I)”
Arthur et al, (2012) Science 5;338(6103):120-3
Fig 1I
Common mistakes in data reporting
[Figure: chart with segments A, B, C, D and E; each segment is 20%]
Common mistakes in data reporting
[Figure: Percent Return on Investment (0 to 40) for Group A and Group B, year1 to year4, drawn with two different axis layouts]
Thank you!
“The important thing is not to stop questioning. Curiosity has its own reason for existing”
- Albert Einstein