Post on 14-Dec-2015
The Difference Between Two Population Means
Assumptions:
1. $X_1,\dots,X_m$ is a random sample from a population with mean $\mu_1$ and variance $\sigma_1^2$.
2. $Y_1,\dots,Y_n$ is a random sample from a population with mean $\mu_2$ and variance $\sigma_2^2$.
3. The X and Y samples are independent of one another.
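Expected Value and Standard Deviation of $\bar X - \bar Y$
The expected value of $\bar X - \bar Y$ is $\mu_1 - \mu_2$, so $\bar X - \bar Y$ is an unbiased estimator of $\mu_1 - \mu_2$.
The standard deviation is
$$\sigma_{\bar X - \bar Y} = \sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}$$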
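Test Procedures for Normal Populations With Known Variances
Null hypothesis: $H_0: \mu_1 - \mu_2 = \Delta_0$
Test statistic value:
$$z = \frac{\bar x - \bar y - \Delta_0}{\sqrt{\dfrac{\sigma_1^2}{m} + \dfrac{\sigma_2^2}{n}}}$$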
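As an illustration of the known-variance z test, the sketch below computes $z = (\bar x - \bar y - \Delta_0)/\sqrt{\sigma_1^2/m + \sigma_2^2/n}$ and a two-sided P-value. The function name `two_sample_z` and the input numbers are hypothetical, chosen only to show the arithmetic.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(xbar, ybar, sigma1, sigma2, m, n, delta0=0.0):
    """z statistic and two-sided P-value for H0: mu1 - mu2 = delta0.

    Assumes both population standard deviations are known.
    """
    se = sqrt(sigma1**2 / m + sigma2**2 / n)  # std. dev. of Xbar - Ybar
    z = (xbar - ybar - delta0) / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Hypothetical summary data, for illustration only.
z, p = two_sample_z(xbar=29.8, ybar=34.7, sigma1=4.0, sigma2=5.0, m=20, n=25)
print(f"z = {z:.3f}, two-sided P-value = {p:.5f}")
```

For a one-tailed alternative, the P-value would instead be the single tail area beyond z.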
Large-Sample Tests
The assumptions of normal population distributions and known values of $\sigma_1, \sigma_2$ are unnecessary. The Central Limit Theorem guarantees that $\bar X - \bar Y$ has approximately a normal distribution.
Use of the test statistic value
$$z = \frac{\bar x - \bar y - \Delta_0}{\sqrt{\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}}}$$
along with the previously stated rejection regions based on z critical values gives large-sample tests whose significance levels are approximately $\alpha$. These tests are appropriate when $m > 40$ and $n > 40$.
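Confidence Interval for $\mu_1 - \mu_2$
Provided m and n are large, a CI for $\mu_1 - \mu_2$ with a confidence level of $100(1-\alpha)\%$ is
$$\bar x - \bar y \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}$$
Confidence bounds can be found by replacing $z_{\alpha/2}$ by $z_\alpha$.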
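The large-sample interval $\bar x - \bar y \pm z_{\alpha/2}\sqrt{s_1^2/m + s_2^2/n}$ can be sketched in a few lines. The function name `large_sample_ci` and the summary numbers are hypothetical; the z critical value comes from Python's standard-library `NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_ci(xbar, ybar, s1, s2, m, n, conf=0.95):
    """Two-sided large-sample CI for mu1 - mu2 (intended for m, n > 40)."""
    alpha = 1 - conf
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    half_width = z_crit * sqrt(s1**2 / m + s2**2 / n)
    diff = xbar - ybar
    return diff - half_width, diff + half_width


# Hypothetical summary data, for illustration only.
lo, hi = large_sample_ci(xbar=10.0, ybar=8.0, s1=2.0, s2=3.0, m=50, n=60)
print(f"95% CI for mu1 - mu2: ({lo:.3f}, {hi:.3f})")
```

A one-sided bound would use `inv_cdf(1 - alpha)` in place of `inv_cdf(1 - alpha / 2)` and keep only the relevant endpoint.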
Assumptions
Both populations are normal, so that X1,…,Xm is a random sample from a normal distribution and so is Y1,…,Yn. The plausibility of these assumptions can be judged by constructing a normal probability plot of the xi’s and another of the yi’s.
t Distribution
When the population distributions are both normal, the standardized variable
$$T = \frac{\bar X - \bar Y - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{m} + \dfrac{S_2^2}{n}}}$$
has approximately a t distribution with df $\nu$ estimated from the data by
$$\nu = \frac{\left(\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}\right)^2}{\dfrac{(s_1^2/m)^2}{m-1} + \dfrac{(s_2^2/n)^2}{n-1}}$$
(round down to the nearest integer)
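The estimated degrees of freedom, $\nu = (s_1^2/m + s_2^2/n)^2 \big/ \left[(s_1^2/m)^2/(m-1) + (s_2^2/n)^2/(n-1)\right]$ rounded down, are tedious to compute by hand. A minimal sketch follows; the function name `estimated_df` and the inputs are hypothetical.

```python
from math import floor


def estimated_df(s1, s2, m, n):
    """Estimated df for the two-sample t procedures, rounded down."""
    a = s1**2 / m  # squared standard error contributed by the x sample
    b = s2**2 / n  # squared standard error contributed by the y sample
    nu = (a + b) ** 2 / (a**2 / (m - 1) + b**2 / (n - 1))
    return floor(nu)


# Hypothetical sample standard deviations and sizes, for illustration only.
print(estimated_df(s1=1.5, s2=4.0, m=12, n=8))
```

The result never exceeds $m + n - 2$, the df used by the pooled procedure.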
Two-Sample CI for $\mu_1 - \mu_2$
The two-sample CI for $\mu_1 - \mu_2$ with a confidence level of $100(1-\alpha)\%$ is
$$\bar x - \bar y \pm t_{\alpha/2,\nu}\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}$$
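The Two-Sample t Test
Alternative Hypothesis | Rejection Region for Approx. Level $\alpha$ Test
$H_a: \mu_1 - \mu_2 > \Delta_0$ | $t \ge t_{\alpha,\nu}$
$H_a: \mu_1 - \mu_2 < \Delta_0$ | $t \le -t_{\alpha,\nu}$
$H_a: \mu_1 - \mu_2 \ne \Delta_0$ | $t \ge t_{\alpha/2,\nu}$ or $t \le -t_{\alpha/2,\nu}$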
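The rejection regions for the two-sample t test can be sketched as a small decision function. This assumes the usual test statistic $t = (\bar x - \bar y - \Delta_0)/\sqrt{s_1^2/m + s_2^2/n}$ for $H_0: \mu_1 - \mu_2 = \Delta_0$, and it takes the critical value as an argument to be read from a t table. The names `two_sample_t` and `reject_h0` and the summary numbers are hypothetical.

```python
from math import sqrt


def two_sample_t(xbar, ybar, s1, s2, m, n, delta0=0.0):
    """Two-sample t statistic for H0: mu1 - mu2 = delta0."""
    return (xbar - ybar - delta0) / sqrt(s1**2 / m + s2**2 / n)


def reject_h0(t, crit, alternative):
    """Apply the rejection region.

    crit is t_{alpha,nu} for a one-tailed alternative and
    t_{alpha/2,nu} for the two-tailed alternative.
    """
    if alternative == "greater":
        return t >= crit
    if alternative == "less":
        return t <= -crit
    if alternative == "two-sided":
        return t >= crit or t <= -crit
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


# Hypothetical summary data; the estimated df for these inputs is 22.
t = two_sample_t(xbar=10.0, ybar=8.0, s1=2.0, s2=3.0, m=10, n=15)
# Table values for 22 df: t_{.05,22} = 1.717 and t_{.025,22} = 2.074.
print(t, reject_h0(t, 1.717, "greater"), reject_h0(t, 2.074, "two-sided"))
```

With these inputs the upper-tailed test rejects at level .05 while the two-tailed test does not, which shows how the choice of alternative changes the cutoff.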
Pooled t Procedures
Assume the two populations are normal and have equal variances. If $\sigma^2$ denotes the common variance, it can be estimated by combining information from the two samples. Standardizing $\bar X - \bar Y$ using the pooled estimator gives a t variable based on $m + n - 2$ df.
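The slide does not spell out the pooled estimator. The sketch below assumes the standard form $S_p^2 = \left[(m-1)S_1^2 + (n-1)S_2^2\right]/(m+n-2)$, with the statistic $(\bar x - \bar y - \Delta_0)\big/\left(s_p\sqrt{1/m + 1/n}\right)$ on $m + n - 2$ df. The function name `pooled_t` and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y, delta0=0.0):
    """Pooled t statistic, its df, and the pooled variance estimate.

    Assumes both populations are normal with a common variance.
    """
    m, n = len(x), len(y)
    # Weighted average of the two sample variances (weights = df).
    sp2 = ((m - 1) * variance(x) + (n - 1) * variance(y)) / (m + n - 2)
    t = (mean(x) - mean(y) - delta0) / sqrt(sp2 * (1 / m + 1 / n))
    return t, m + n - 2, sp2


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10, 12]
t, df, sp2 = pooled_t(x, y)
print(f"t = {t:.3f} on {df} df, pooled variance = {sp2:.3f}")
```

The pooled procedure should only be used when the equal-variance assumption is plausible; otherwise the two-sample t procedures with estimated df apply.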
Paired Data (Assumptions)
The data consist of n independently selected pairs $(X_1,Y_1),\dots,(X_n,Y_n)$, with $E(X_i) = \mu_1$ and $E(Y_i) = \mu_2$.
Let $D_1 = X_1 - Y_1, \dots, D_n = X_n - Y_n$. The $D_i$'s are assumed to be normally distributed with mean value $\mu_D$ and variance $\sigma_D^2$.
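The Paired t Test
Null hypothesis: $H_0: \mu_D = \Delta_0$
Test statistic value:
$$t = \frac{\bar d - \Delta_0}{s_D/\sqrt{n}}$$
where $\bar d$ and $s_D$ are the sample mean and standard deviation of the $d_i$'s.
Alternative Hypothesis | Rejection Region for Level $\alpha$ Test
$H_a: \mu_D > \Delta_0$ | $t \ge t_{\alpha,n-1}$
$H_a: \mu_D < \Delta_0$ | $t \le -t_{\alpha,n-1}$
$H_a: \mu_D \ne \Delta_0$ | $t \ge t_{\alpha/2,n-1}$ or $t \le -t_{\alpha/2,n-1}$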
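A minimal sketch of the paired t statistic $t = (\bar d - \Delta_0)/(s_D/\sqrt{n})$: form the differences, then treat them as a single sample. The function name `paired_t` and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y, delta0=0.0):
    """Paired t statistic and its df (n - 1) for H0: mu_D = delta0."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    d = [xi - yi for xi, yi in zip(x, y)]  # within-pair differences
    n = len(d)
    t = (mean(d) - delta0) / (stdev(d) / sqrt(n))
    return t, n - 1


# Hypothetical paired measurements, for illustration only.
x = [10, 12, 9, 14, 11]
y = [8, 11, 9, 11, 10]
t, df = paired_t(x, y)
print(f"t = {t:.3f} on {df} df")
```

The resulting t is compared with $t_{\alpha,n-1}$ or $t_{\alpha/2,n-1}$ from a t table, depending on the alternative.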
Confidence Interval for $\mu_D$
The paired t CI for $\mu_D$ is
$$\bar d \pm t_{\alpha/2,n-1}\cdot s_D/\sqrt{n}$$
Confidence bounds can be found by replacing $t_{\alpha/2}$ by $t_\alpha$.
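Paired Data and Two-Sample t
$$V(\bar X - \bar Y) = V(\bar D) = V\left(\frac{1}{n}\sum D_i\right) = \frac{V(D_i)}{n} = \frac{\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2}{n}$$
where $\rho$ is the correlation between $X_i$ and $Y_i$ within a pair.
Independence between X and Y: $\rho = 0$
Positive dependence: $\rho > 0$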
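To see how within-pair correlation affects precision, the sketch below evaluates $V(\bar X - \bar Y) = (\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2)/n$ for a few values of $\rho$. The function name and the parameter values are hypothetical.

```python
def var_mean_difference(sigma1, sigma2, rho, n):
    """Variance of Xbar - Ybar for n pairs with within-pair correlation rho."""
    return (sigma1**2 + sigma2**2 - 2 * rho * sigma1 * sigma2) / n


# Hypothetical population values, for illustration only.
for rho in (0.0, 0.5, 0.9):
    v = var_mean_difference(sigma1=2.0, sigma2=3.0, rho=rho, n=10)
    print(f"rho = {rho}: V(Xbar - Ybar) = {v:.3f}")
```

The variance shrinks as $\rho$ grows, which is the precision gain that pairing buys when the dependence is positive.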
Pros and Cons of Pairing
1. For great heterogeneity and large correlation within experimental units, the loss in degrees of freedom will be compensated for by an increased precision associated with pairing (use pairing).
2. If the units are relatively homogeneous and the correlation within pairs is not large, the gain in precision due to pairing will be outweighed by the decrease in degrees of freedom (use independent samples).
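Difference Between Population Proportions
Let $X \sim \text{Bin}(m, p_1)$ and $Y \sim \text{Bin}(n, p_2)$, with X and Y independent. Then, with $\hat p_1 = X/m$ and $\hat p_2 = Y/n$,
$$E(\hat p_1 - \hat p_2) = p_1 - p_2$$
so $\hat p_1 - \hat p_2$ is an unbiased estimator of $p_1 - p_2$, and
$$V(\hat p_1 - \hat p_2) = \frac{p_1 q_1}{m} + \frac{p_2 q_2}{n} \qquad (q_i = 1 - p_i)$$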
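Large-Sample Test for $p_1 - p_2$
Alternative Hypothesis | Rejection Region
$H_a: p_1 - p_2 > 0$ | $z \ge z_\alpha$
$H_a: p_1 - p_2 < 0$ | $z \le -z_\alpha$
$H_a: p_1 - p_2 \ne 0$ | $z \ge z_{\alpha/2}$ or $z \le -z_{\alpha/2}$
Valid provided $m\hat p_1$, $m\hat q_1$, $n\hat p_2$, and $n\hat q_2$ are all at least 10.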
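The slide lists the rejection regions but not the statistic itself. The sketch below assumes the usual pooled form for $H_0: p_1 - p_2 = 0$, namely $z = (\hat p_1 - \hat p_2)\big/\sqrt{\hat p\hat q(1/m + 1/n)}$ with $\hat p = (x + y)/(m + n)$. The function name `two_proportion_z` and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x, m, y, n):
    """Large-sample z statistic and two-sided P-value for H0: p1 - p2 = 0.

    x successes in m trials, y successes in n trials.
    """
    p1_hat, p2_hat = x / m, y / n
    p_hat = (x + y) / (m + n)  # pooled estimate of the common p under H0
    q_hat = 1 - p_hat
    z = (p1_hat - p2_hat) / sqrt(p_hat * q_hat * (1 / m + 1 / n))
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Hypothetical counts, for illustration only.
z, p = two_proportion_z(x=60, m=100, y=45, n=100)
print(f"z = {z:.3f}, two-sided P-value = {p:.4f}")
```

The counts here comfortably exceed 10 successes and 10 failures in each sample, so the normal approximation is reasonable.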
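General Expressions for $\beta(p_1, p_2)$
Alt. Hypothesis $H_a: p_1 - p_2 > 0$:
$$\beta(p_1,p_2) = \Phi\left[\frac{z_\alpha\sqrt{\bar p\,\bar q\,(1/m + 1/n)} - (p_1 - p_2)}{\sigma}\right]$$
Alt. Hypothesis $H_a: p_1 - p_2 < 0$:
$$\beta(p_1,p_2) = 1 - \Phi\left[\frac{-z_\alpha\sqrt{\bar p\,\bar q\,(1/m + 1/n)} - (p_1 - p_2)}{\sigma}\right]$$
Alt. Hypothesis $H_a: p_1 - p_2 \ne 0$:
$$\beta(p_1,p_2) = \Phi\left[\frac{z_{\alpha/2}\sqrt{\bar p\,\bar q\,(1/m + 1/n)} - (p_1 - p_2)}{\sigma}\right] - \Phi\left[\frac{-z_{\alpha/2}\sqrt{\bar p\,\bar q\,(1/m + 1/n)} - (p_1 - p_2)}{\sigma}\right]$$
where
$$\bar p = \frac{m p_1 + n p_2}{m + n}, \qquad \bar q = \frac{m q_1 + n q_2}{m + n}, \qquad \sigma = \sqrt{\frac{p_1 q_1}{m} + \frac{p_2 q_2}{n}}$$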
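As a numerical check on the type II error expression for the upper-tailed alternative $H_a: p_1 - p_2 > 0$, the sketch below evaluates $\beta(p_1,p_2) = \Phi\left[\left(z_\alpha\sqrt{\bar p\bar q(1/m+1/n)} - (p_1 - p_2)\right)/\sigma\right]$ with $\sigma = \sqrt{p_1q_1/m + p_2q_2/n}$. The function name and the parameter values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def beta_upper_tailed(p1, p2, m, n, alpha=0.05):
    """Approximate type II error probability for Ha: p1 - p2 > 0."""
    q1, q2 = 1 - p1, 1 - p2
    p_bar = (m * p1 + n * p2) / (m + n)
    q_bar = (m * q1 + n * q2) / (m + n)
    sigma = sqrt(p1 * q1 / m + p2 * q2 / n)  # sd of p1_hat - p2_hat
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    numerator = z_alpha * sqrt(p_bar * q_bar * (1 / m + 1 / n)) - (p1 - p2)
    return NormalDist().cdf(numerator / sigma)


# Hypothetical alternatives, for illustration only.
for p1 in (0.3, 0.4, 0.5):
    print(p1, round(beta_upper_tailed(p1, 0.3, m=100, n=100), 4))
```

When $p_1 = p_2$ the expression reduces to $1 - \alpha$, as it should, and $\beta$ falls as $p_1 - p_2$ grows.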
Sample Size
For the case m = n, the level $\alpha$ test has type II error probability $\beta$ at the alternative values $p_1, p_2$ with $p_1 - p_2 = d$ when
$$n = \frac{\left[z_\alpha\sqrt{(p_1 + p_2)(q_1 + q_2)/2} + z_\beta\sqrt{p_1 q_1 + p_2 q_2}\right]^2}{d^2}$$
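The sample-size formula $n = \left[z_\alpha\sqrt{(p_1+p_2)(q_1+q_2)/2} + z_\beta\sqrt{p_1q_1 + p_2q_2}\right]^2/d^2$ for a one-tailed test can be sketched as follows. Rounding up to the next integer is an assumption made here so that the target $\beta$ is not exceeded; the function name and inputs are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_equal_n(p1, p2, alpha=0.05, beta=0.10):
    """Common sample size n (= m) for a one-tailed level-alpha test."""
    q1, q2 = 1 - p1, 1 - p2
    d = p1 - p2
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(1 - beta)
    root_null = sqrt((p1 + p2) * (q1 + q2) / 2)
    root_alt = sqrt(p1 * q1 + p2 * q2)
    n = (z_alpha * root_null + z_beta * root_alt) ** 2 / d**2
    return ceil(n)  # assumption: round up to be conservative


# Hypothetical alternative values, for illustration only.
print(sample_size_equal_n(p1=0.5, p2=0.3))
```

Smaller differences $d$ demand much larger samples, since n scales with $1/d^2$.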
The F Distribution
The F probability distribution has parameters $\nu_1$ (number of numerator df) and $\nu_2$ (number of denominator df). If $X_1$ and $X_2$ are independent chi-squared rv's with $\nu_1$ and $\nu_2$ df, then the rv
$$F = \frac{X_1/\nu_1}{X_2/\nu_2}$$
has an F distribution.
The F Distribution Density Curve Property
$$F_{1-\alpha,\nu_1,\nu_2} = \frac{1}{F_{\alpha,\nu_2,\nu_1}}$$
[Figure: F density curve; the shaded area to the right of $F_{\alpha,\nu_1,\nu_2}$ equals $\alpha$.]
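Inferential Methods
Let $X_1,\dots,X_m$ and $Y_1,\dots,Y_n$ be random (independent) samples from normal distributions with variances $\sigma_1^2$ and $\sigma_2^2$, respectively. Let $S_1^2$ and $S_2^2$ denote the two sample variances. Then the rv
$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$$
has an F distribution with $\nu_1 = m - 1$ and $\nu_2 = n - 1$.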
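F Test for Equality of Variances
Alternative Hypothesis | Rejection Region
$H_a: \sigma_1^2 > \sigma_2^2$ | $f \ge F_{\alpha,m-1,n-1}$
$H_a: \sigma_1^2 < \sigma_2^2$ | $f \le F_{1-\alpha,m-1,n-1}$
$H_a: \sigma_1^2 \ne \sigma_2^2$ | $f \ge F_{\alpha/2,m-1,n-1}$ or $f \le F_{1-\alpha/2,m-1,n-1}$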
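A minimal sketch of the F test arithmetic, assuming the usual statistic $f = s_1^2/s_2^2$ for $H_0: \sigma_1^2 = \sigma_2^2$. Lower-tail critical values are obtained from upper-tail table values through $F_{1-\alpha,\nu_1,\nu_2} = 1/F_{\alpha,\nu_2,\nu_1}$. The function names and data are hypothetical, and the critical value is read from an F table rather than computed.

```python
from statistics import variance


def f_statistic(x, y):
    """Return f = s1^2 / s2^2 with numerator and denominator df."""
    return variance(x) / variance(y), len(x) - 1, len(y) - 1


def lower_critical(upper_crit_swapped_df):
    """F_{1-alpha, v1, v2} computed as 1 / F_{alpha, v2, v1}."""
    return 1.0 / upper_crit_swapped_df


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10, 12]
f, v1, v2 = f_statistic(x, y)

# For Ha: sigma1^2 < sigma2^2 at alpha = .05 we need F_{.95,4,5},
# which is 1 / F_{.05,5,4}; the table value F_{.05,5,4} is 6.26.
crit = lower_critical(6.26)
print(f"f = {f:.4f}, df = ({v1}, {v2}), reject H0: {f <= crit}")
```

Note the swap in degrees of freedom when looking up the table value: the lower critical value for $(\nu_1, \nu_2)$ uses the upper critical value for $(\nu_2, \nu_1)$.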