Lecture 4Statistic for quality
Transcript of Lecture 4Statistic for quality
-
8/20/2019 Lecture 4Statistic for quality
1/21
Mathematical
Biostatistics
Boot Camp:
Lecture 4,
Random
Vectors
Brian Caffo
Table of
contents
Random
vectors
Independence
Independentevents
Independentrandom variables
IID randomvariables
Correlation
Variance and
correlation
properties
Variances
properties of
sample means
The sample
variance
Some
Mathematical Biostatistics Boot Camp: Lecture 4
Vectors
Brian Caffo
Department of Biostatistics
Johns Hopkins Bloomberg School of Public Health
Johns Hopkins University
August 3, 2012
-
8/20/2019 Lecture 4Statistic for quality
2/21
Mathematical
Biostatistics
Boot Camp:
Lecture 4,
Random
Vectors
Brian Caffo
Table of
contents
Random
vectors
Independence
Independentevents
Independentrandom variables
IID randomvariables
Correlation
Variance and
correlation
properties
Variances
properties of
sample means
The sample
variance
Some
Table of co
1 Table of contents
2 Random vectors
3 IndependenceIndependent eventsIndependent random variablesIID random variables
4 Correlation
5 Variance and correlation properties
6 Variances properties of sample means
7 The sample variance
8 Some discussion
-
8/20/2019 Lecture 4Statistic for quality
3/21
Mathematical
Biostatistics
Boot Camp:
Lecture 4,
Random
Vectors
Brian Caffo
Table of
contents
Random
vectors
Independence
Independentevents
Independentrandom variables
IID randomvariables
Correlation
Variance and
correlation
properties
Variances
properties of
sample means
The sample
variance
Some
Random v
• Random vectors are simply random variables collected into a v• For example if X and Y are random variables (X , Y ) is a rand
• Joint density f (x , y ) satisfies f > 0 and
f (x , y )dxdy = 1
• For discrete random variables
f (x , y ) = 1
• In this lecture we focus on independent random variables whf (x , y ) = f (x )g (y )
-
8/20/2019 Lecture 4Statistic for quality
4/21
Mathematical
Biostatistics
Boot Camp:
Lecture 4,
Random
Vectors
Brian Caffo
Table of
contents
Random
vectors
Independence
Independentevents
Independentrandom variables
IID randomvariables
Correlation
Variance and
correlation
properties
Variances
properties of
sample means
The sample
variance
Some
Independent e
• Two events A and B are independent if
P (A ∩ B ) = P (A)P (B )
• Two random variables, X and Y are independent if for any tw
P ([X ∈ A] ∩ [Y ∈ B ]) = P (X ∈ A)P (Y ∈ B )• If A is independent of B then
• Ac is independent of B • A is independent of B c
• Ac is independent of B c
-
8/20/2019 Lecture 4Statistic for quality
5/21
Mathematical
Biostatistics
Boot Camp:
Lecture 4,
Random
Vectors
Brian Caffo
Table of
contents
Random
vectors
Independence
Independentevents
Independentrandom variables
IID randomvariables
Correlation
Variance and
correlation
properties
Variances
properties of
sample means
The sample
variance
Some
Ex
• What is the probability of getting two consecutive heads?• A = {Head on flip 1} P (A) = .5• B = {Head on flip 2} P (B ) = .5
• A
∩B =
{Head on flips 1 and 2
}• P (A ∩ B ) = P (A)P (B ) = .5 × .5 = .25
-
8/20/2019 Lecture 4Statistic for quality
6/21
Mathematical
Biostatistics
Boot Camp:
Lecture 4,
Random
Vectors
Brian Caffo
Table of
contents
Random
vectors
Independence
Independentevents
Independentrandom variables
IID randomvariables
Correlation
Variance and
correlation
properties
Variances
properties of
sample means
The sample
variance
Some
Ex
• Volume 309 of Science reports on a physician who was on triatestimony in a criminal trial
• Based on an estimated prevalence of sudden infant death synd8, 543, Dr Meadow testified that that the probability of a mot
children with SIDS was 18,5432
• The mother on trial was convicted of murder• What was Dr Meadow’s mistake(s)?
-
8/20/2019 Lecture 4Statistic for quality
7/21
Mathematical
Biostatistics
Boot Camp:
Lecture 4,
Random
Vectors
Brian Caffo
Table of
contents
Random
vectors
Independence
Independentevents
Independentrandom variables
IID randomvariables
Correlation
Variance and
correlation
properties
Variances
properties of
sample means
The sample
variance
Some
Example: cont
• For the purposes of this class, the principal mistake was to asprobabilities of having SIDs within a family are independent
• That is, P (A1 ∩ A2) is not necessarily equal to P (A1)P (A2)• Biological processes that have a believed genetic or familiar en
component, of course, tend to be dependent within families
• In addition, the estimated prevalence was obtained from an unon single cases; hence having no information about recurrencefamilies
-
8/20/2019 Lecture 4Statistic for quality
8/21
Mathematical
Biostatistics
Boot Camp:
Lecture 4,
Random
Vectors
Brian Caffo
Table of
contents
Random
vectors
Independence
Independentevents
Independentrandom variables
IID randomvariables
Correlation
Variance and
correlation
properties
Variances
properties of
sample means
The sample
variance
Some
Usefu
• We will use the following fact extensively in this class:
If a collection of random variables X 1, X 2, . . . , X n are independtheir joint distribution is the product of their individual densit
functions
That is, if f i is the density for random variable X i we have tha
f (x 1, . . . , x n) =n
i =1
f i (x i )
-
8/20/2019 Lecture 4Statistic for quality
9/21
Mathematical
Biostatistics
Boot Camp:
Lecture 4,
Random
Vectors
Brian Caffo
Table of
contents
Random
vectors
Independence
Independentevents
Independentrandom variables
IID randomvariables
Correlation
Variance and
correlation
properties
Variances
properties of
sample means
The sample
variance
Some
IID random var
• In the instance where f 1 = f 2 = . . . = f n we say that the X i arindependent and identically distributed
• iid random variables are the default model for random sample
• Many of the important theories of statistics are founded on as
variables are iid
-
8/20/2019 Lecture 4Statistic for quality
10/21
Mathematical
Biostatistics
Boot Camp:
Lecture 4,
Random
Vectors
Brian Caffo
Table of
contents
Random
vectors
Independence
Independentevents
Independentrandom variables
IID randomvariables
Correlation
Variance and
correlation
properties
Variances
properties of
sample means
The sample
variance
Some
Ex
• Suppose that we flip a biased coin with success probability p the join density of the collection of outcomes?
• These random variables are iid with densities p x i (1 − p )1−x i • Therefore
f (x 1, . . . , x n) =
ni =1
p x i (1 − p )1−x i = p x i (1 − p )n
-
8/20/2019 Lecture 4Statistic for quality
11/21
Mathematical
Biostatistics
Boot Camp:
Lecture 4,
Random
Vectors
Brian Caffo
Table of
contents
Random
vectors
Independence
Independentevents
Independentrandom variables
IID randomvariables
Correlation
Variance and
correlation
properties
Variances
properties of
sample means
The sample
variance
Some
Corre
• The covariance between two random variables X and Y is de
Cov(X , Y ) = E [(X − µx )(Y − µy )] = E [XY ] − E [X• The following are useful facts about covariance
1 Cov(X , Y ) = Cov(Y , X )2 Cov(X , Y ) can be negative or positive3 |Cov(X , Y )| ≤
Var(X )Var(y )
-
8/20/2019 Lecture 4Statistic for quality
12/21
Mathematical
Biostatistics
Boot Camp:
Lecture 4,
Random
Vectors
Brian Caffo
Table of
contents
Random
vectors
Independence
Independentevents
Independentrandom variables
IID randomvariables
Correlation
Variance and
correlation
properties
Variances
properties of
sample means
The sample
variance
Some
Corre
• The correlation between X and Y isCor(X , Y ) = Cov(X , Y )/
Var(X )Var(y )
1 −1 ≤ Cor(X , Y ) ≤ 12 Cor(X , Y ) =
±1 if and only if X = a + bY for some constant
3 Cor(X , Y ) is unitless4 X and Y are uncorrelated if Cor(X , Y ) = 05 X and Y are more positively correlated, the closer Cor(X , Y ) 6 X and Y are more negatively correlated, the closer Cor(X , Y )
-
8/20/2019 Lecture 4Statistic for quality
13/21
Mathematical
Biostatistics
Boot Camp:
Lecture 4,
Random
Vectors
Brian Caffo
Table of
contents
Random
vectors
Independence
Independentevents
Independentrandom variables
IID randomvariables
Correlation
Variance and
correlation
properties
Variances
properties of
sample means
The sample
variance
Some
Some useful r
• Let {X i }ni =1 be a collection of random variables• When the
{X i
} are uncorrelated
Var
ni =1
ai X i + b
=
ni =1
a2i Var(X i )
• Otherwise
Var n
i =1ai X i + b
=
ni =1
a2i Var(X i ) + 2n−1i =1
n j =i
ai a j Cov(X i , X
• If the X i are iid with variance σ2 then Var(X̄ ) = σ2/n and E [S
-
8/20/2019 Lecture 4Statistic for quality
14/21
Mathematical
Biostatistics
Boot Camp:
Lecture 4,
Random
Vectors
Brian Caffo
Table of
contents
Random
vectors
Independence
Independentevents
Independentrandom variables
IID randomvariables
Correlation
Variance and
correlation
properties
Variances
properties of
sample means
The sample
variance
Some
ExampleProve that Var(X + Y ) = Var(X ) + Var(Y ) + 2Cov(X , Y )
Var(X + Y )
= E [(X + Y )(X + Y )] − E [X + Y ]2
= E [X 2 + 2XY + Y 2] − (µx + µy )2
= E [X 2 + 2XY + Y 2] − µ2x − 2µx µy − µ2y
= (E [X 2] − µ2x ) + (E [Y 2] − µ2y ) + 2(E [XY ] − µx
= Var(X ) + Var(Y ) + 2Cov(X , Y )
-
8/20/2019 Lecture 4Statistic for quality
15/21
Mathematical
Biostatistics
Boot Camp:
Lecture 4,
Random
Vectors
Brian Caffo
Table of
contents
Random
vectors
Independence
Independentevents
Independentrandom variables
IID randomvariables
Correlation
Variance and
correlation
properties
Variances
properties of
sample means
The sample
variance
Some
• A commonly used subcase from these properties is that if a co
random variables {X i } are uncorrelated , then the variance of tsum of the variances
Var
ni =1
X i
=
ni =1
Var(X i )
• Therefore, it is sums of variances that tend to be useful, not sdeviations; that is, the standard deviation of the sum of bunchrandom variables is the square root of the sum of the varianceof the standard deviations
-
8/20/2019 Lecture 4Statistic for quality
16/21
Mathematical
Biostatistics
Boot Camp:
Lecture 4,
Random
Vectors
Brian Caffo
Table of
contents
Random
vectors
Independence
Independentevents
Independentrandom variables
IID randomvariables
Correlation
Variance and
correlation
properties
Variances
properties of
sample means
The sample
variance
Some
The sample Suppose X i are iid with variance σ
2
Var(X̄ ) = Var1n
n
i =1
X i =
1
n2Var
ni =1
X i
=
1
n2
ni =1
Var
(X i )
= 1
n2 × nσ2
= σ2
n
M h i l
-
8/20/2019 Lecture 4Statistic for quality
17/21
Mathematical
Biostatistics
Boot Camp:
Lecture 4,
Random
Vectors
Brian Caffo
Table of
contents
Random
vectors
Independence
Independentevents
Independentrandom variables
IID randomvariables
Correlation
Variance and
correlation
properties
Variances
properties of
sample means
The sample
variance
Some
Some com
• When X i are independent with a common variance Var(X̄ ) =• σ/√ n is called the standard error of the sample mean• The standard error of the sample mean is the standard deviat
distribution of the sample mean
• σ is the standard deviation of the distribution of a single obse
• Easy way to remember, the sample mean has to be less variabobservation, therefore its standard deviation is divided by a
√
M th ti l
-
8/20/2019 Lecture 4Statistic for quality
18/21
Mathematical
Biostatistics
Boot Camp:
Lecture 4,
Random
Vectors
Brian Caffo
Table of
contents
Random
vectors
Independence
Independentevents
Independentrandom variables
IID randomvariables
Correlation
Variance and
correlation
properties
Variances
properties of
sample means
The sample
variance
Some
The sample va
• The sample variance is defined as
S 2 =
ni =1(X i − X̄ )2
n − 1• The sample variance is an estimator of σ2• The numerator has a version that’s quicker for calculation
ni =1
(X i − X̄ )2 =n
i =1
X 2i − nX̄ 2
• The sample variance is (nearly) the mean of the squared deviamean
Mathematical
-
8/20/2019 Lecture 4Statistic for quality
19/21
Mathematical
Biostatistics
Boot Camp:
Lecture 4,
Random
Vectors
Brian Caffo
Table of
contents
Random
vectors
Independence
Independentevents
Independentrandom variables
IID randomvariables
Correlation
Variance and
correlation
properties
Variances
properties of
sample means
The sample
variance
Some
The sample variance is unb
E n
i =1 (X i −
¯X )
2 =
ni =1
E
X
2
i − nE ¯X 2
=n
i =1
Var(X i ) + µ
2− n Var(X̄
=
ni =1
σ
2
+ µ2− n σ2/n + µ2
= nσ2 + nµ2 − σ2 − nµ2
= (n − 1)σ2
Mathematical
-
8/20/2019 Lecture 4Statistic for quality
20/21
Mathematical
Biostatistics
Boot Camp:
Lecture 4,
Random
Vectors
Brian Caffo
Table of
contents
Random
vectors
Independence
Independentevents
Independentrandom variables
IID randomvariables
Correlation
Variance and
correlation
properties
Variances
properties of
sample means
The sample
variance
Some
Hoping to avoid some con
• Suppose X i are iid with mean µ and variance σ2• S 2 estimates σ2• The calculation of S 2 involves dividing by n − 1• S /√ n estimates σ/√ n the standard error of the mean• S /√ n is called the sample standard error (of the mean)
Mathematical
-
8/20/2019 Lecture 4Statistic for quality
21/21
Mathematical
Biostatistics
Boot Camp:
Lecture 4,
Random
Vectors
Brian Caffo
Table of
contents
Random
vectors
Independence
Independentevents
Independentrandom variables
IID randomvariables
Correlation
Variance and
correlation
properties
Variances
properties of
sample means
The sample
variance
Some
Ex
• In a study of 495 organo-lead workers, the following summariefor TBV in cm3
• mean = 1151.281• sum of squared observations = 662361978
• sample sd = (662361978 − 495 × 1151.2812)/494 = 112.62• estimated se of the mean = 112.6215/√ 495 = 5.062