1 Statistics Achim Tresch UoC / MPIPZ Cologne treschgroup.de/OmicsModule1415.html...
![Page 2: 1 Statistics Achim Tresch UoC / MPIPZ Cologne treschgroup.de/OmicsModule1415.html tresch@mpipz.mpg.de.](https://reader035.fdocuments.us/reader035/viewer/2022062300/56649cfe5503460f949ced9b/html5/thumbnails/2.jpg)
Two ways of dealing with uncertainty
Francis, Pope
Andrey Kolmogorov, Mathematician
Topics
I. Descriptive Statistics
II. Testing
III. Clustering
IV. Regression
What is „data“?
A collection of observations of a similar structure
• Cases (samples, observations)
• Endpoints (variables): Only one item per column! Meaningful variable names!
• Values: instances of a variable
• The sample / the sample population ⊆ population
Different Scales of a Variable
Categorical variables
Have only a finite number of instances: male/female; Mon/Tue/…/Sun
• Nominal data: Categorical variables without a given order. E.g. eye color [brown, blue, green, grey]. Special case: binary (= dichotomous) variables (yes/no, 0/1, …)
• Ordinal data: Instances are ordered in a natural way. E.g. tumor grade [I, II, III, IV], rank in a contest (1, 2, 3, …)
Continuous variables
Can take values in an interval of the real numbers. E.g. blood pressure [mmHg], costs [€]
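In R, the scales map onto different data types. The following is a minimal sketch with made-up values; all variable names are illustrative and not part of the original slides.

```r
# Nominal: categories without order (hypothetical example values)
eye_color <- factor(c("brown", "blue", "green", "brown", "grey"))

# Ordinal: categories with a natural order
tumor_grade <- factor(c("II", "I", "IV", "III", "II"),
                      levels = c("I", "II", "III", "IV"),
                      ordered = TRUE)

# Continuous: real-valued measurements, e.g. blood pressure in mmHg
blood_pressure <- c(118.5, 131.0, 124.2, 140.7, 109.9)

is_ordered_nominal <- is.ordered(eye_color)    # FALSE
is_ordered_ordinal <- is.ordered(tumor_grade)  # TRUE

# Order-based summaries such as max() are only defined for ordered factors
highest_grade <- max(tumor_grade)
```

Calling `max()` on the unordered factor `eye_color` raises an error, which is a useful safeguard against treating nominal data as ordinal.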
85% shinier hair!
I. Description
Problem:
It is often difficult to map a variable to an appropriate scale:
E.g. metabolic activity, evolutionary success, pain, social status, customer satisfaction, anger
-> Check whether your choice of scale is meaningful!
Description of a categorical variable: Tables
Example: Blood antigens (ABO), n = 188 samples

Value                  A     B     AB    0     Total
(absolute) frequency   83    20    10    75    188
relative frequency     44%   11%   5%    40%   100%

Always list absolute frequencies!
• Do not list relative frequencies in percent if the sample size is small (n < 20)
• Do not use decimal digits in percent numbers for n < 300
• Rule of thumb: Use ca. (log10 n) - 2 decimal digits
„Side effects were observed in 14.2857% of all cases“: Nonsense, we conclude that n = 7!
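The table can be reproduced in a few lines of R. This sketch uses the counts from the ABO example; reading the rule of thumb as `round(log10(n) - 2)`, floored at zero, is an assumption made here for illustration.

```r
# Absolute frequencies from the ABO example (n = 188)
abs_freq <- c(A = 83, B = 20, AB = 10, "0" = 75)
n <- sum(abs_freq)

# Relative frequencies in percent
rel_freq <- 100 * abs_freq / n

# Rule of thumb: about log10(n) - 2 decimal digits (never fewer than 0)
digits <- max(0, round(log10(n) - 2))
rel_freq_rounded <- round(rel_freq, digits)

print(rbind(absolute = abs_freq, percent = rel_freq_rounded))
```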
Description of a categorical variable: Barplot
[Figure: barplot of the blood groups A, B, AB, 0; axes: relative frequency (%) and absolute frequency.]
Description of continuous data: Histogram
[Figure: histogram; x-axis: value of the variable, y-axis: number of cases.]
[Figure: three histograms of the same data with 50 bins, 12 bins and 4 bins; x-axis: value of the variable, y-axis: number of cases.]
The size of the bins (= width of the bars) is a matter of choice and has to be determined sensibly!
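The effect of the bin width can be tried out directly. The sketch below uses simulated data (not the data behind the slide) and a hypothetical helper `edges()` that builds equally wide bins.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
x <- rnorm(500)                   # simulated data

# k equally wide bins covering the whole data range
edges <- function(k) seq(floor(min(x)), ceiling(max(x)), length.out = k + 1)

# plot = FALSE returns the bin counts without drawing
h50 <- hist(x, breaks = edges(50), plot = FALSE)   # many bins: noisy
h12 <- hist(x, breaks = edges(12), plot = FALSE)
h4  <- hist(x, breaks = edges(4),  plot = FALSE)   # few bins: shape is lost

sapply(list(h50, h12, h4), function(h) length(h$counts))
```

Dropping `plot = FALSE` draws the three histograms for a visual comparison.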
Description of continuous data: Density plot
[Figure: three density plots; x-axis: value of the variable, y-axis: relative frequency.]
Caution: Data will be smoothed automatically. This is very suggestive and blurs discontinuities in a distribution.
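How smoothing blurs a discontinuity can be made concrete with data that has sharp edges. The following sketch uses illustrative, evenly spread values on [0, 1] and R's default kernel density estimate; the data are an assumption, not taken from the slides.

```r
# Data on [0, 1]: the true density drops to zero sharply at 0 and at 1
x <- seq(0, 1, length.out = 200)

d  <- density(x)            # Gaussian kernel, default bandwidth
dx <- diff(d$x[1:2])        # spacing of the evaluation grid

# Estimated probability mass outside [0, 1], where no data can occur
mass_outside <- sum(d$y[d$x < 0 | d$x > 1]) * dx
mass_outside
```

A clearly positive value shows that the smoothed curve extends beyond the range of the data, so the sharp edges are no longer visible in the plot.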
Most important: The Gaussian (= normal) distribution
[Figure: Gaussian density curve, annotated with expectation value and standard deviation.]
C. F. Gauss (1777-1855)
Roughly speaking, continuous variables that are the (additive) result of a lot of other random variables follow a Gaussian distribution.
-> It is often sensible to assume a Gaussian distribution for continuous variables.
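The "additive result of many random variables" statement can be checked by simulation. This is a sketch under assumed settings (12 uniform summands, 10000 repetitions, arbitrary seed), not an example from the slides.

```r
set.seed(1)                                  # arbitrary seed
n_obs <- 10000

# Each observation is the sum of 12 independent uniform variables on [0, 1];
# such a sum has expectation 6 and standard deviation 1
s <- rowSums(matrix(runif(n_obs * 12), nrow = n_obs))

m          <- mean(s)
sdev       <- sd(s)
within_1sd <- mean(abs(s - 6) <= 1)          # close to 0.68 for a Gaussian
c(mean = m, sd = sdev, within_1sd = within_1sd)
```

`hist(s)` shows the familiar bell shape even though each summand is uniformly distributed.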
Measures of Location, Scale and Scatter
Mean: sum of all observations / number of samples
Ex.: observations: 2, 3, 7, 9, 14
sum: 2+3+7+9+14 = 35
# observations: 5
Mean: 35/5 = 7
Median: A number M such that 50% of all observations are less than or equal to M, and 50% are greater than or equal to M. (Q: What if #observations is even?)
[Figure: strip plot of the data d; the median splits the observations 50% / 50%.]
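The worked example can be verified in R. The second part shows the convention used by R's `median()` for an even number of observations; other conventions are possible.

```r
obs <- c(2, 3, 7, 9, 14)     # observations from the example
mean(obs)                    # 35 / 5 = 7
median(obs)                  # middle value of the sorted data: 7

# Even number of observations: median() averages the two middle values
obs_even <- c(2, 3, 7, 9)
median(obs_even)             # (3 + 7) / 2 = 5
```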
Description of Location, Scale and Scatter
[Figure: histogram of a distribution with mode, median and mean marked.]
Mode: A value for which the density of the variable reaches a local maximum. If there is only one such value, the distribution is called unimodal, otherwise multimodal (special case: bimodal).
The mode usually is an unstable description of a sample.
Distribution Shapes
Symmetric: Mean ≈ Median
Skewed to the right: Median << Mean
Skewed to the left: Mean << Median
The median should be preferred to the mean if
• the distribution is very asymmetric
• there are extreme outliers
The mean is more „precise“ than the median if the distribution is approximately normal.
Rule of thumb: If the skewness g of the distribution lies between -1 and +1, the distribution is approx. symmetric.
Right skew: skewness g > 0
Left skew: skewness g < 0
[Figure: strip plot of a skewed sample d.]
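Base R has no built-in skewness function, so the sketch below defines one. The moment-based formula is an assumption (the slides do not say which definition of g they use), and the data are made up.

```r
# Moment-based sample skewness (assumed definition of g)
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

symmetric  <- c(1, 2, 3, 4, 5)
right_skew <- c(1, 1, 2, 2, 3, 20)   # one extreme value on the right

g_sym   <- skewness(symmetric)       # 0 for symmetric data
g_right <- skewness(right_skew)      # positive: right skew

# The extreme value pulls the mean upwards, the median stays put
c(mean = mean(right_skew), median = median(right_skew))
```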
How would you describe this distribution?
„…it showed a giant boa swallowing an elephant. I painted the inside of the boa to make it visible to the adults. They always need explanations.“
Antoine de Saint-Exupéry, Le petit prince
Unexpected distributions have unexpected causes!
More measures of location
Quantile: A q-quantile Q (0≤q≤1) splits the data into a fraction of q points below or equal to Q and a fraction of 1-q points above or equal to Q.
[Figure: strip plots of the data d with the quantiles marked.]
Median = 50%-quantile (50% | 50%)
1st quartile = 25%-quantile
3rd quartile = 75%-quantile
0-quantile = minimum
1-quantile = maximum
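These special cases can be checked with R's `quantile()` function. The data are simulated under an arbitrary seed; `quantile()` interpolates between observations by default, which is one of several common conventions.

```r
set.seed(1)                 # arbitrary seed
d <- rnorm(101)             # simulated data

# 0-, 25%-, 50%-, 75%- and 1-quantile in one call
q <- quantile(d, probs = c(0, 0.25, 0.5, 0.75, 1), names = FALSE)
q

# Fractions of points below-or-equal and above-or-equal the 1st quartile
c(below = mean(d <= q[2]), above = mean(d >= q[2]))
```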
The five-point Summary and the Boxplot
[Figure: strip plot of the data d with the corresponding boxplot.]
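A five-point summary is available in R as `fivenum()`. The sketch below uses a small made-up sample; `boxplot.stats()` returns the numbers that `boxplot()` would draw, with whiskers limited to 1.5 times the box length by default.

```r
obs <- c(2, 3, 7, 9, 14)     # small illustrative sample

# Minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(obs)
five

# Numbers behind the boxplot; identical here because there are no outliers
bp <- boxplot.stats(obs)$stats
```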
Measures of Variation
How far do the observations scatter around their „center“ (= measure of location)?
[Figure: two samples around the same location measure, one with small variation and one with large variation.]
e.g.: location = median, variation = 3rd quartile – 1st quartile = interquartile range (IQR)
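Measures of Variation
[Figure: an observation $x_j$, the median $\tilde{x}$, and the distance $|x_j - \tilde{x}|$ between them.]
e.g.: location = median $\tilde{x}$, variation = mean deviation (MD) from $\tilde{x}$
$$\mathrm{MD} = \frac{1}{n}\sum_{j=1}^{n} |x_j - \tilde{x}|$$
e.g.: location = median $\tilde{x}$, variation = median absolute deviation (MAD) from $\tilde{x}$
$$\mathrm{MAD} = \mathrm{Median}\left(|x_j - \tilde{x}|,\ j = 1,\dots,n\right)$$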
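The median-based measures of variation can be computed as follows. This is a sketch on a small made-up sample; note that R's built-in `mad()` multiplies by 1.4826 unless `constant = 1` is given.

```r
x <- c(2, 3, 7, 9, 14)                 # small illustrative sample
x_med <- median(x)                     # location: 7

iqr_x <- IQR(x)                        # 3rd quartile - 1st quartile
md_x  <- mean(abs(x - x_med))          # mean deviation from the median
mad_x <- median(abs(x - x_med))        # median absolute deviation

# Built-in version without the default rescaling factor
mad_builtin <- mad(x, constant = 1)

c(IQR = iqr_x, MD = md_x, MAD = mad_x)
```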
Measures of Variation
e.g.: location = mean $\bar{x}$, variation = mean squared deviation from $\bar{x}$
$$\frac{1}{n}\sum_{j=1}^{n} (x_j - \bar{x})^2 = \text{variance}$$
Or: variation = square root of the variance = standard deviation (s, std.dev)
Numbers for Gaussian variables:
Mean ± s contains ~68% of the data
Mean ± 2s contains ~95% of the data
Mean ± 3s contains ~99.7% of the data
[Figure: Gaussian density with $\bar{x} - s$, $\bar{x}$ and $\bar{x} + s$ marked.]
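A short sketch of these quantities in R, on a made-up sample. One caveat worth knowing: the formula above divides by n, whereas R's `var()` and `sd()` divide by n - 1.

```r
x <- c(2, 3, 7, 9, 14)               # small illustrative sample

var_n <- mean((x - mean(x))^2)       # mean squared deviation: divide by n
var_r <- var(x)                      # R's var(): divide by n - 1
s     <- sqrt(var_n)                 # standard deviation

# Fraction of a Gaussian distribution within k standard deviations of the mean
coverage <- sapply(1:3, function(k) pnorm(k) - pnorm(-k))
round(coverage, 3)
```

For large n the difference between the two variance definitions becomes negligible.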
Histogram/Density Plot vs. Boxplot
Boxplot contains less information, but it is easier to interpret.
Multiple Boxplots
Sample: 2769 schoolchildren
Summary
Always report the sample size!
a) numerical
Median, Q1, Q3, Min., Max. (5-summary); for symmetric distr. alternatively: mean, standard deviation
b) graphical
Boxplots, histograms, density plots
c) textual
e.g. „Blood pressure was reduced by 12 mmHg (interquartile range: 8 to 18 mmHg = 10 mmHg), whereas the reduction in the placebo group was only 3 mmHg (IQR: –2 to 4 mmHg = 6 mmHg).“
Two categorical variables: Cross tables
Data

Person   Medication   Response
A        Verum        yes
B        Placebo      no
Cross Table
Rows: values of variable 1 (potential causes); columns: values of variable 2 (potential effects)

                      Response
                      yes   no
Medication  Verum     1     0
            Placebo   0     1

Each case is one count in the table
Data ≠ Cross Table
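Going from case-wise data to a cross table is a single call to `table()` in R. In this sketch, persons A and B are the ones shown above, while C and D are hypothetical additions so that the counts are not all 0 or 1.

```r
# Case-wise data: one row per person
dat <- data.frame(
  Person     = c("A", "B", "C", "D"),
  Medication = c("Verum", "Placebo", "Verum", "Placebo"),
  Response   = c("yes", "no", "yes", "yes")
)

# Each case contributes exactly one count to the cross table
ct <- table(Medication = dat$Medication, Response = dat$Response)
ct
```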
The most common question is: Are there differences between █ and █ ?
Two categorical variables: Cross tables
Cross Table: n = 80 cases
Each cell: absolute number (row percent, column percent)

                      Response
                      yes             no              Total
Medication  Verum     20 (50%, 67%)   20 (50%, 40%)   40 (50%)
            Placebo   10 (25%, 33%)   30 (75%, 60%)   40 (50%)
Total                 30 (37%)        50 (63%)        80 (100%)
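Row and column percentages can be derived from the counts with `prop.table()`. The sketch below enters the four counts of the n = 80 example by hand.

```r
# Counts from the n = 80 example
ct <- matrix(c(20, 20,
               10, 30),
             nrow = 2, byrow = TRUE,
             dimnames = list(Medication = c("Verum", "Placebo"),
                             Response   = c("yes", "no")))

row_pct <- round(100 * prop.table(ct, margin = 1))  # percentages within each row
col_pct <- round(100 * prop.table(ct, margin = 2))  # percentages within each column

addmargins(ct)                                      # adds the row and column totals
```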
What's bad about this table?
Cross tables: Independent vs. paired data

independent data
Person   Medication   Response
A        Verum        yes
B        Placebo      no

paired data
Person   Medic.: Verum   Medic.: Placebo
A        yes             yes
B        yes             no

Paired data: One object (or two closely related objects) serves for the measurement of two variables of the same kind.
Exercise: The influence of diet on body height is assessed in
1) a study with 100 randomly picked subjects.
2) a study with 50 identical twins that grew up separately.
Write down the cross tables. Which study is probably more informative?
Cross tables: Paired data
Cross Table (rows: values of variable 1; columns: values of variable 2)

                       Medic.: Placebo
                       yes   no
Medic.: Verum   yes    1     1
                no     0     0
A typical question is: Are the observations concordant or discordant? Is there a particularly large number in █ or █ ?
concordant observations: both measurements agree (yes/yes or no/no)
discordant observations: the measurements differ (yes/no or no/yes)
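Counting concordant and discordant pairs amounts to summing the diagonal and off-diagonal cells of the paired cross table. In this sketch, persons A and B are from the slide; C to F are hypothetical.

```r
# Paired data: each person is measured under both treatments
verum   <- c(A = "yes", B = "yes", C = "no", D = "yes", E = "no",  F = "yes")
placebo <- c(A = "yes", B = "no",  C = "no", D = "no",  E = "yes", F = "no")

# Fix the level order so that rows and columns line up
lv <- c("yes", "no")
ct <- table(Verum = factor(verum, lv), Placebo = factor(placebo, lv))

concordant <- sum(diag(ct))            # yes/yes and no/no
discordant <- sum(ct) - concordant     # yes/no and no/yes
ct
```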
Two continuous variables: Scatter Plots
Comparison of two global gene expression measurements
[Figure: two scatterplots of the same data, one on an absolute scale and one on a double logarithmic scale, each with the lines y = ¼ x, y = ½ x, y = 2x, y = 4x.]
Advantages of double log scale:
• Skewed distributions appear more evenly spread across the plot
• Loci of fixed expression folds are lines parallel to the main diagonal
Scatterplot vs. M-A-plot
[Figure: left, scatterplot of log(y) against log(x); right, M-A-plot, obtained by turning the scatterplot by 45°. M-A-plot axes: x-axis log(x * y)/2 = log(geometr. mean of x and y); y-axis log(y/x) = log(fold ratio of y and x).]
> x = log(exprs[,1])   # exprs: expression matrix, one column per channel
> y = log(exprs[,2])
> plot(x, y)           # scatterplot on the double log scale
> xMA = (x + y)/2      # A: log of the geometric mean of both channels
> yMA = y - x          # M: log fold ratio of channel 2 vs. channel 1
> plot(xMA, yMA)       # M-A-plot
Advantages of the MA-Plot:
• Lines of constant expression folds are parallel to the x-axis.
• Differences between channel 1 and channel 2 can easily be read off the plot. Intensity-dependent systematic errors can be detected.
There is a mistake in these plots (compare left and right plot)!
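Why a constant fold change becomes a horizontal line in the M-A-plot can be seen on synthetic numbers. The intensities below are hypothetical, and `log2` is used instead of the natural logarithm; the base only rescales both axes.

```r
# Hypothetical intensities: channel 2 is exactly twice channel 1
ch1 <- c(10, 50, 200, 1000, 5000)
ch2 <- 2 * ch1

x <- log2(ch1)
y <- log2(ch2)

A <- (x + y) / 2        # log of the geometric mean of both channels
M <- y - x              # log fold ratio

M                       # constant at log2(2) = 1 for every intensity
```

`plot(A, M)` would show all points on the horizontal line M = 1.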
M-A-plot
No visible bias (= systematic error)
Channel 2 differs from channel 1 by a constant factor
multiplicative bias
Correlation
Dependence of two continuous variables
Example: How to quantify such a relation between x and y?
Pearson correlation
The Pearson correlation coefficient r measures the degree of linear dependence of two variables
Properties:
• -1 ≤ r ≤ +1
• r = ± 1: perfect linear dependence
• the sign of r indicates the direction of the dependence
• r is symmetric, i.e., r_xy = r_yx
[Figure: three scatterplots with r = 1, r = 1 and r = -1.]
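The listed properties can be checked numerically. The sketch uses made-up measurements and the standard definition of r as covariance divided by the product of the standard deviations, which is what R's `cor()` computes.

```r
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)   # hypothetical measurements

# r = covariance / (sd(x) * sd(y))
r_manual <- cov(x, y) / (sd(x) * sd(y))
r_xy <- cor(x, y)
r_yx <- cor(y, x)                        # symmetry: same value

r_up   <- cor(x,  3 * x + 2)             # perfect increasing linear dependence
r_down <- cor(x, -3 * x + 2)             # perfect decreasing linear dependence
c(r_xy = r_xy, r_up = r_up, r_down = r_down)
```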
The smaller the absolute value of r, the weaker the linear dependence
Example: Relation between height and weight resp. arm length
[Figure: two scatterplots, r_xy = 0.38 and r_xy = 0.84.]
The closer the points scatter around a line, the larger the absolute value of r.
What is the value of r in these cases?
[Figure: three scatterplots with non-linear dependencies, each with r ≈ 0.]
Pearson correlation has difficulties in recognizing non-linear dependencies.
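A parabola is the classic case: y is a function of x, yet r is essentially zero. The data in this sketch are constructed for illustration.

```r
x <- seq(-3, 3, by = 0.1)
y <- x^2                       # y is completely determined by x, but not linearly

r <- cor(x, y)                 # essentially 0, up to rounding error
r
```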
Spearman correlation
Spearman correlation measures monotonic dependencies.
Idea: Calculate the Pearson correlation coefficient of the rank transformed data -> Spearman correlation s
[Figure: left, Pearson correlation: scatterplot of Y against X, r = 0.88; right, Spearman correlation: scatterplot of rank(Y) against rank(X), s = 0.95.]
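The idea translates directly into R: `cor()` with `method = "spearman"` gives the same value as the Pearson correlation of the ranks. The data below are constructed so that the dependence is monotonic but strongly non-linear.

```r
x <- c(0.5, 1, 1.5, 2, 3, 4, 5)
y <- exp(x)                                # monotonic, but far from linear

r <- cor(x, y)                             # Pearson: below 1
s <- cor(x, y, method = "spearman")        # Spearman: 1, the order is preserved
s_by_hand <- cor(rank(x), rank(y))         # Pearson correlation of the ranks

c(pearson = r, spearman = s, by_hand = s_by_hand)
```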
Pearson vs. Spearman correlation
[Figure: raw data.]
Pearson correlation

            NM_001767    NM_000734    NM_001049    NM_006205
NM_001767   1.00000000   0.94918522  -0.04559766   0.04341766
NM_000734   0.94918522   1.00000000  -0.02659545   0.01229839
NM_001049  -0.04559766  -0.02659545   1.00000000  -0.85043885
NM_006205   0.04341766   0.01229839  -0.85043885   1.00000000
[Figure: rank transformed data.]
Spearman correlation

            NM_001767    NM_000734    NM_001049    NM_006205
NM_001767   1.00000000   0.9529094   -0.10869080  -0.17821449
NM_000734   0.9529094    1.00000000  -0.11247013  -0.20515650
NM_001049  -0.10869080  -0.11247013   1.00000000   0.03386758
NM_006205  -0.17821449  -0.20515650   0.03386758   1.00000000
[Figure: raw data vs. rank transformed data.]
Conclusion: Spearman correlation is more robust against outliers. However, in case of linear dependence, it is less sensitive than Pearson correlation.
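The robustness claim can be illustrated with a single outlier. The data in this sketch are constructed: a perfect linear relation in which one value is replaced by an extreme one.

```r
x <- 1:20
y <- x                          # perfect linear dependence
r_clean <- cor(x, y)            # Pearson without outlier: 1

y[20] <- -100                   # one extreme outlier

r_out <- cor(x, y)                         # Pearson: sign flips to negative
s_out <- cor(x, y, method = "spearman")    # Spearman: still clearly positive

c(pearson = r_out, spearman = s_out)
```

One corrupted value out of twenty is enough to reverse the sign of the Pearson coefficient, while the rank-based coefficient only drops moderately.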
Q(uantile)-Q(uantile) Plots
Quantile-quantile plot (qq-plot): For the comparison of two distributions (of x and y), plot the quantiles of the x distribution against the corresponding quantiles of the y distribution.
[Figure: QQ-plot.]
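A qq-plot can be built by hand from matching quantiles. In this sketch the data are simulated, and y is a shifted and rescaled copy of x, so the two distributions have the same shape and the quantile pairs fall on a straight line.

```r
set.seed(1)                        # arbitrary seed
x <- rnorm(200)                    # simulated sample
y <- 2 * x + 3                     # same shape, different location and scale

p  <- seq(0.05, 0.95, by = 0.05)   # probabilities at which quantiles are compared
qx <- quantile(x, p, names = FALSE)
qy <- quantile(y, p, names = FALSE)

# Largest deviation of the quantile pairs from the line qy = 2 * qx + 3
max_dev <- max(abs(qy - (2 * qx + 3)))
max_dev
```

`plot(qx, qy)` draws the points; the built-in shortcut for two samples is `qqplot(x, y)`.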
Interpretation:
• Dissimilar distributions: qq-plot is not linear, in particular not in the center of the qq-line.
• Similar distributions except for the tails, the tails of the y distribution are “heavier”
• Similar distributions except for the tails, the tails of the x distribution are “heavier”