Post on 21-Dec-2015
Tests involving two samples – comparing variances, F distribution
• TOH (test of hypothesis): is $\bar{x}_A = \bar{x}_B$?
• Step 1 - F-test: is $s_A^2 = s_B^2$?
• Step 2 - t-test: use a different formula for (i) $s_A^2 = s_B^2$ and (ii) $s_A^2 \neq s_B^2$
• Goal: determine whether a given gene is expressed differently between patients and healthy subjects
• This involves comparing the means of the two samples
• To answer this question, one must first know whether the two samples have the same variance
• The method used to compare the variances of two samples is based on the F distribution
• Then a t-test is used to test whether the mean expression of the gene differs between patients and healthy subjects
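The two-step procedure outlined above can be sketched as a small decision helper. This is only an illustrative sketch: the function name `choose_t_test` and the labels `"pooled"` and `"unequal"` are assumptions made for this example, and the critical values are expected to come from a table or spreadsheet.

```python
def choose_t_test(f_stat, f_lower, f_upper):
    """Pick the t-test formula after a two-tailed F-test on the variances.

    f_lower and f_upper are the left- and right-tail critical values of the
    F distribution at the chosen significance level (looked up elsewhere).
    """
    if f_lower < f_stat < f_upper:
        # No evidence that the variances differ: equal-variance formula.
        return "pooled"
    # The variances differ: unequal-variance formula.
    return "unequal"


# Hypothetical illustration values, not taken from any data set.
print(choose_t_test(1.2, 0.2, 5.0))   # pooled
print(choose_t_test(6.5, 0.2, 5.0))   # unequal
```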
Tests involving two samples – comparing variances, F distribution
• The values measured in controls are: 10, 11, 11, 12, 15, 13, 12
• The values measured in patients are: 12, 13, 13, 15, 12, 18, 17, 16, 16, 12, 15, 10, 12
• Is the variance different between the controls and the patients at the 5% significance level?
• $H_0: s_A^2 = s_B^2$, $H_1: s_A^2 \neq s_B^2$
• We need a new test statistic: $F = s_A^2 / s_B^2$
• Two-tailed test
• Notation: A = controls, B = patients in the following calculation
• Controls (sample A): d.o.f. = 6, variance = 2.66
• Patients (sample B): d.o.f. = 12, variance = 5.74
• Consider the ratio F = 2.66/5.74 = 0.4634
• Significance level in each tail of the two-tailed test = 5%/2 = 2.5%
• Right-tail critical value: $F_{0.025}(6,12)$ = 3.7283 (from Excel)
• Left-tail critical value: $F_{0.975}(6,12)$ = 0.1864 (from Excel)
• F-distribution (right tail): http://mips.stanford.edu/public/classes/stats_data_analysis/234_99.html
Tests involving two samples – comparing variances, F distribution
• $F_{0.025}(6,12)$ = 3.7283
• $F_{0.975}(6,12)$ = 0.1864
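As a check on the numbers above, the sample variances and the F ratio can be recomputed from the raw measurements. This is a minimal sketch using only Python's standard library. The critical values are copied from the slides and are not computed here, and the variable names are chosen for this example.

```python
from statistics import variance

controls = [10, 11, 11, 12, 15, 13, 12]                          # sample A
patients = [12, 13, 13, 15, 12, 18, 17, 16, 16, 12, 15, 10, 12]  # sample B

var_a = variance(controls)   # sample variance (divides by n - 1)
var_b = variance(patients)
dof_a = len(controls) - 1
dof_b = len(patients) - 1
f_stat = var_a / var_b

# Critical values quoted in the slides (from Excel), not computed here.
F_UPPER = 3.7283   # right tail, F_0.025(6, 12)
F_LOWER = 0.1864   # left tail,  F_0.975(6, 12)

print(f"d.o.f. = ({dof_a}, {dof_b})")
print(f"s_A^2 = {var_a:.4f}, s_B^2 = {var_b:.4f}, F = {f_stat:.4f}")
if F_LOWER < f_stat < F_UPPER:
    print("F lies between the critical values: do not reject H0")
else:
    print("F lies outside the critical values: reject H0")
```

With unrounded variances the ratio comes out near 0.464 rather than 0.4634, because the slides divide the rounded values 2.66 and 5.74. The conclusion is unaffected.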
Tests involving two samples – comparing variances, F-distribution
• F-distribution tables usually cover 0.01, 0.025 and 0.05, but not 0.975
• Given a right-tail table (for example $F_{0.025}(6,12)$ = 3.7283), how do we find $F_{0.975}(6,12)$?
• The F distribution has a useful property: the left-tail critical value for an F with $\nu_1$ and $\nu_2$ d.o.f. is the reciprocal of the right-tail critical value for an F with the d.o.f. reversed:
$$F_{1-\alpha}(\nu_1, \nu_2) = \frac{1}{F_{\alpha}(\nu_2, \nu_1)}$$
• F[left tail(A,B)] = 1/F[right tail(B,A)]
• $F_{0.975}(6,12) = 1/F_{(1-0.975)}(12,6)$
• $F_{0.975}(6,12) = 1/F_{0.025}(12,6)$ = 1/5.3662 = 0.18635
• Back to our null hypothesis test: since 0.18635 < 0.4634 < 3.7283, the F-statistic lies between the two critical values, so we do not reject the null hypothesis. There is no evidence that the variance differs between controls and patients
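The reciprocal property above follows from the fact that the reciprocal of an F-distributed variable is again F-distributed with the degrees of freedom swapped. A short derivation, assuming $F_{\alpha}(\nu_1,\nu_2)$ denotes the upper-tail critical value (the convention these slides appear to use) and writing $X$ for the F-distributed variable:

```latex
\begin{aligned}
& X \sim F(\nu_1,\nu_2) \;\Rightarrow\; 1/X \sim F(\nu_2,\nu_1) \\
& P\big(X \le F_{1-\alpha}(\nu_1,\nu_2)\big) = \alpha \\
& \Rightarrow\; P\big(1/X \ge 1/F_{1-\alpha}(\nu_1,\nu_2)\big) = \alpha \\
& \Rightarrow\; 1/F_{1-\alpha}(\nu_1,\nu_2) = F_{\alpha}(\nu_2,\nu_1)
\end{aligned}
```

The last step uses the definition of the upper-tail critical value for the distribution $F(\nu_2,\nu_1)$.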
Tests involving two samples – comparing variances, F-distribution
• Now let us consider the inverse ratio, $F = s_B^2 / s_A^2$
• The two choices should lead to the same conclusion, since the conclusion should not depend on which variance is placed in the numerator and which in the denominator
• Controls (sample A): d.o.f. = 6, variance = 2.66
• Patients (sample B): d.o.f. = 12, variance = 5.74
• F = 5.74/2.66 = 2.1579
• Right-tail critical value: $F_{0.025}(12,6)$ = 5.3662 (from Excel)
• Left-tail critical value: $F_{0.975}(12,6)$ = 0.2682 (from Excel)
• Since 0.2682 < 2.1579 < 5.3662, the F-statistic lies between the two critical values, so we do not reject the null hypothesis. There is no evidence that the variance differs between controls and patients
REMARK
• The two F-tests are reciprocals of each other
• That is, 0.18635 < 0.4634 < 3.7283
• Taking reciprocals: 1/0.18635 > 1/0.4634 > 1/3.7283, i.e. 5.3662 > 2.1579 > 0.2682
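The remark that both orientations of the ratio give the same decision can be checked numerically. The sketch below uses the rounded variances and the critical values quoted in the slides. The helper name `in_acceptance_region` is an assumption made for this example.

```python
def in_acceptance_region(f_stat, lower, upper):
    """True when the F statistic lies strictly between the critical values."""
    return lower < f_stat < upper


var_a, var_b = 2.66, 5.74   # rounded variances from the slides

f_ab = var_a / var_b        # compared against F(6, 12)
f_ba = var_b / var_a        # compared against F(12, 6)

# Critical values as quoted in the slides.
decision_ab = in_acceptance_region(f_ab, 0.18635, 3.7283)
decision_ba = in_acceptance_region(f_ba, 0.2682, 5.3662)

print(round(f_ab, 4), decision_ab)    # 0.4634 True
print(round(f_ba, 4), decision_ba)    # 2.1579 True
print(abs(f_ab * f_ba - 1.0) < 1e-12) # the two ratios are reciprocals
```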
Tests involving two samples – comparing means
The expression levels of the gene AC002378 measured for the patients (P) and the controls (C) are given below:

geneID      P1     P2     P3     P4     P5     P6
AC002378    0.66   0.51   1.12   0.83   0.91   0.50

geneID      C1     C2     C3     C4     C5     C6
AC002378    0.41   0.57   -0.17  0.50   0.22   0.71

• F-test: $H_0: s_P^2 = s_C^2$, $H_1: s_P^2 \neq s_C^2$
• t-test: $H_0: \bar{X}_P = \bar{X}_C$, $H_1: \bar{X}_P \neq \bar{X}_C$
• Mean gene expression level of patients, $\bar{X}_P$ = 0.755
• Mean gene expression level of controls, $\bar{X}_C$ = 0.373
• $s_P^2$ = 0.059, $s_C^2$ = 0.097
• To test whether the two samples have the same variance, we perform the F-test at the 5% level
• F = 0.059/0.097 = 0.61, d.o.f. = (5, 5)
• $F_{0.025}(5,5)$ = 7.146, $F_{0.975}(5,5)$ = 0.1399
• F lies between 0.1399 and 7.146, so we do not reject the null hypothesis: the data are consistent with patients and controls having the same variance
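The summary statistics for the gene example can be reproduced from the table. This is a minimal standard-library sketch. The critical values are the ones quoted in the slides and are not computed here, and the variable names are chosen for this example.

```python
from statistics import mean, variance

patients = [0.66, 0.51, 1.12, 0.83, 0.91, 0.50]   # P1..P6
controls = [0.41, 0.57, -0.17, 0.50, 0.22, 0.71]  # C1..C6

mean_p, mean_c = mean(patients), mean(controls)
var_p, var_c = variance(patients), variance(controls)  # divide by n - 1
f_stat = var_p / var_c

# Critical values for F(5, 5) quoted in the slides.
F_LOWER, F_UPPER = 0.1399, 7.146

print(f"mean_P = {mean_p:.3f}, mean_C = {mean_c:.3f}")
print(f"s_P^2 = {var_p:.3f}, s_C^2 = {var_c:.3f}, F = {f_stat:.2f}")
print("same variance not rejected:", F_LOWER < f_stat < F_UPPER)
```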
Tests involving two samples – comparing means
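• t-statistic of two independent samples with equal variances
• The t-score is
$$t = \frac{(\bar{X}_P - \bar{X}_C) - (\mu_P - \mu_C)}{\sqrt{s_{pool}^2\left(\frac{1}{n_P} + \frac{1}{n_C}\right)}} = \frac{(0.755 - 0.373) - 0}{\sqrt{0.078\left(\frac{1}{6} + \frac{1}{6}\right)}} = 2.359$$
where
$$s_{pool}^2 = \frac{(n_P - 1)\,s_P^2 + (n_C - 1)\,s_C^2}{n_P + n_C - 2} = \frac{(6 - 1)\,0.059 + (6 - 1)\,0.097}{6 + 6 - 2} = 0.078$$
(The value 2.359 is obtained with unrounded means and variances. The rounded values shown give about 2.37.)
• The p-value, the probability of obtaining a t-score at least this extreme by chance when the null hypothesis is true, is 0.0400. This is smaller than the significance level of 0.05, so we reject the null hypothesis: the gene AC002378 is expressed differently between cancer patients and healthy subjects.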
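The pooled-variance t-score and its two-tailed p-value can be reproduced with the standard library alone. In this sketch the p-value is obtained by integrating the Student t density with Simpson's rule, which is an assumption of this example and not necessarily how the slides computed it. The helper names `t_pdf` and `two_tailed_p` are also made up for this example.

```python
import math
from statistics import mean, variance

patients = [0.66, 0.51, 1.12, 0.83, 0.91, 0.50]
controls = [0.41, 0.57, -0.17, 0.50, 0.22, 0.71]


def t_pdf(x, dof):
    """Density of Student's t distribution with dof degrees of freedom."""
    c = math.gamma((dof + 1) / 2) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2))
    return c * (1 + x * x / dof) ** (-(dof + 1) / 2)


def two_tailed_p(t_stat, dof, steps=2000):
    """Two-tailed p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t_stat)
    h = b / steps
    total = t_pdf(0.0, dof) + t_pdf(b, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    central = total * h / 3          # P(0 < T < |t|)
    return 1.0 - 2.0 * central       # probability mass in both tails


n_p, n_c = len(patients), len(controls)
dof = n_p + n_c - 2
s2_pool = ((n_p - 1) * variance(patients) + (n_c - 1) * variance(controls)) / dof
t_stat = (mean(patients) - mean(controls)) / math.sqrt(s2_pool * (1 / n_p + 1 / n_c))
p_value = two_tailed_p(t_stat, dof)

print(f"s_pool^2 = {s2_pool:.3f}, t = {t_stat:.3f}, d.o.f. = {dof}, p = {p_value:.4f}")
```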
Tests involving two samples – comparing means
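• t-statistic of two independent samples with unequal variances
• The modified t-score is
$$t = \frac{(\bar{X}_P - \bar{X}_C) - (\mu_P - \mu_C)}{\sqrt{\frac{s_P^2}{n_P} + \frac{s_C^2}{n_C}}}$$
• The degrees of freedom need to be adjusted as
$$\nu = \frac{\left(\frac{s_P^2}{n_P} + \frac{s_C^2}{n_C}\right)^2}{\frac{\left(s_P^2/n_P\right)^2}{n_P - 1} + \frac{\left(s_C^2/n_C\right)^2}{n_C - 1}}$$
• This value is generally not an integer and needs to be rounded down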
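For illustration, the unequal-variance formulas can be applied to the same AC002378 data, even though the F-test did not reject equal variances for that gene. This is a minimal sketch with variable names chosen for this example. The slides do not report these numbers, so none are quoted here.

```python
import math
from statistics import mean, variance

patients = [0.66, 0.51, 1.12, 0.83, 0.91, 0.50]
controls = [0.41, 0.57, -0.17, 0.50, 0.22, 0.71]

n_p, n_c = len(patients), len(controls)
se2_p = variance(patients) / n_p   # s_P^2 / n_P
se2_c = variance(controls) / n_c   # s_C^2 / n_C

# Modified t-score under H0 (mu_P - mu_C = 0).
t_stat = (mean(patients) - mean(controls)) / math.sqrt(se2_p + se2_c)

# Adjusted degrees of freedom, then rounded down as the slides prescribe.
dof = (se2_p + se2_c) ** 2 / (se2_p ** 2 / (n_p - 1) + se2_c ** 2 / (n_c - 1))
dof_rounded = math.floor(dof)

print(f"t = {t_stat:.3f}, adjusted d.o.f. = {dof:.2f}, rounded down = {dof_rounded}")
```

Because both samples have the same size here, the modified t-score coincides with the pooled one. Only the degrees of freedom change.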