Seminar 15 | Tuesday, October 18, 2007 | Aliaksei Smalianchuk.

Means and Variances

What happens to means and variances when data is manipulated?

Let’s check by manipulating data from the survey.

Data

Height in inches (HT)

Shoe size (Shoe)

Age (Age)

Additional Columns:

Height with a 1 inch heel (HeightPlus1)

Height in centimeters, approximated as 2.5 × height in inches (2.5TimesHeight)

Sum of height and shoe size (HeightPlusShoe)

Sum of height and age (HeightPlusAge)

Statistics

Variable          N     Mean      StDev
HT                444   66.928    3.938
Shoe              445   9.1056    1.9484
Age               444   20.371    2.912
HeightPlus1       444   67.928    3.938
2.5TimesHeight    444   167.32    9.84
HeightPlusShoe    444   76.035    5.693
HeightPlusAge     444   87.299    4.913

Observation 1

The mean of heights with a 1 inch heel is one inch larger than the mean of heights

Why?

If the same constant is added to every value, the mean increases by that constant.
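Written out for a general constant, the reasoning looks like this. The symbols x_i, c, and x̄ are notation introduced here for illustration; they do not appear on the slides.

```latex
\[
\frac{1}{n}\sum_{i=1}^{n}(x_i + c)
  = \frac{1}{n}\sum_{i=1}^{n} x_i + \frac{1}{n}\cdot nc
  = \bar{x} + c
\]
```

With c = 1 inch this gives 66.928 + 1 = 67.928, matching the HeightPlus1 row of the table.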

Observation 2

The standard deviation of heights with a 1 inch heel equals the standard deviation of heights

Why?

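Standard deviation measures spread about the mean. Adding a constant shifts every value and the mean by the same amount, so the deviations from the mean, and the shape of the distribution, do not change.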
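A small sketch of this check, using made-up heights rather than the survey data (the list `heights` is hypothetical):

```python
import statistics

# Hypothetical heights in inches (illustrative values, not the survey data).
heights = [62.0, 64.5, 66.0, 67.5, 70.0, 72.0]

# Add a 1 inch heel to every height.
heights_plus_1 = [h + 1 for h in heights]

sd_before = statistics.stdev(heights)
sd_after = statistics.stdev(heights_plus_1)

# Each deviation from the mean is unchanged, so the two values agree
# (up to floating-point rounding).
print(round(sd_before, 6), round(sd_after, 6))
```

The same comparison with `statistics.mean` would show the mean moving up by exactly 1.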

Observation 3

The standard deviation of heights in centimeters is 2.5 times the standard deviation of heights in inches

Why?

Multiplying every data value by a constant stretches the spread of the histogram by the same factor, so properties that depend on the spread (like standard deviation) are multiplied by that factor too.
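The scaling rule can be checked two ways: on a small made-up data set (the list `heights` is hypothetical), and against the HT and 2.5TimesHeight rows of the summary table. This is a sketch for illustration only.

```python
import statistics

# Hypothetical heights in inches (illustrative values, not the survey data).
heights = [62.0, 64.5, 66.0, 67.5, 70.0, 72.0]
scaled = [2.5 * h for h in heights]

# Both the mean and the standard deviation scale by the same factor.
mean_ratio = statistics.mean(scaled) / statistics.mean(heights)
sd_ratio = statistics.stdev(scaled) / statistics.stdev(heights)
print(mean_ratio, sd_ratio)

# The same check against the summary table from the slides.
table_mean_ht, table_sd_ht = 66.928, 3.938
print(2.5 * table_mean_ht)  # table reports 167.32 for 2.5TimesHeight
print(2.5 * table_sd_ht)    # about 9.845; table reports 9.84
```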

Observation 4

Mean of HeightPlusShoe = Mean of Height + Mean of Shoe

Observation 5

Mean of HeightPlusAge = Mean of Height + Mean of Age

Why?

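Since the terms of the sum can be regrouped: $\frac{1}{n}\sum_{i=1}^{n}(x_i + y_i) = \frac{1}{n}\sum_{i=1}^{n} x_i + \frac{1}{n}\sum_{i=1}^{n} y_i$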
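A quick numerical check that the mean of a sum equals the sum of the means, first on made-up paired data (the lists `heights` and `ages` are hypothetical) and then against the table values:

```python
import statistics

# Hypothetical paired observations (illustrative values, not the survey data).
heights = [62.0, 64.5, 66.0, 67.5, 70.0, 72.0]
ages = [19, 22, 20, 18, 21, 25]

sums = [h + a for h, a in zip(heights, ages)]

mean_of_sum = statistics.mean(sums)
sum_of_means = statistics.mean(heights) + statistics.mean(ages)
print(mean_of_sum, sum_of_means)  # equal up to rounding error

# Check against the summary table from the slides.
print(66.928 + 20.371)  # table reports 87.299 for HeightPlusAge
print(66.928 + 9.1056)  # about 76.034; table reports 76.035 for HeightPlusShoe
```

The small gap in the HeightPlusShoe row may come from the Shoe column having N = 445 while the other columns have N = 444, so its mean is taken over a slightly different set of respondents. That explanation is an assumption, not something the slides state.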

Variances

Variance = σ²

Variances apply to a probability distribution

Variance is a way to capture the degree of spread of a distribution

Variances

Variable          Variance
HT                15.50784
Shoe              3.796263
Age               8.479744
HeightPlusShoe    32.41025
HeightPlusAge     24.13757
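These variances are the squares of the standard deviations in the Statistics table. A short sketch that reproduces them (the dictionary names are introduced here for illustration):

```python
# Standard deviations as reported in the summary table.
stdevs = {
    "HT": 3.938,
    "Shoe": 1.9484,
    "Age": 2.912,
    "HeightPlusShoe": 5.693,
    "HeightPlusAge": 4.913,
}

# Variance is the square of the standard deviation.
variances = {name: sd ** 2 for name, sd in stdevs.items()}

for name, var in variances.items():
    print(f"{name:15s} {var:.6f}")
```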

Dependence

Are shoe sizes and heights dependent? Are age and height dependent? Let’s check using scatter plots

[Scatter plot: Height vs. Shoe Size]

[Scatter plot: Height vs. Age]

Back to variances

Variance of HeightPlusShoe is much greater than Var(Height) + Var(Shoe)

Variance of HeightPlusAge is very close to Var(Height) + Var(Age)

Variable          Variance
HT                15.50784
Shoe              3.796263
Age               8.479744
HeightPlusShoe    32.41025
HeightPlusAge     24.13757
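The two comparisons can be made concrete with the variances from the slides. This is only arithmetic on the reported values; the variable names are introduced here.

```python
# Variances as reported on the slides.
var_ht, var_shoe, var_age = 15.50784, 3.796263, 8.479744
var_ht_plus_shoe, var_ht_plus_age = 32.41025, 24.13757

# Sum of the separate variances versus the variance of the summed column.
sum_shoe = var_ht + var_shoe  # about 19.30
sum_age = var_ht + var_age    # about 23.99

print(f"Height + Shoe: sum of variances {sum_shoe:.2f}, "
      f"variance of sum {var_ht_plus_shoe:.2f}")
print(f"Height + Age:  sum of variances {sum_age:.2f}, "
      f"variance of sum {var_ht_plus_age:.2f}")

# Excess of the variance of the sum over the sum of the variances.
excess_shoe = var_ht_plus_shoe - sum_shoe  # about 13.11
excess_age = var_ht_plus_age - sum_age     # about 0.15
print(f"Excess: {excess_shoe:.2f} (shoe), {excess_age:.2f} (age)")
```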

Why?

Can you see a difference between the relationships (Height vs. Shoe Size) and (Height vs. Age)?

Dependence

Adding two positively dependent data sets produces extremes (small values are added to corresponding small values, and large values to corresponding large values)

This makes the variance much larger.
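One standard way to state this precisely uses the covariance of the two variables. The covariance term is not mentioned on the slides; it is brought in here as a general identity.

```latex
\[
\operatorname{Var}(X + Y)
  = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X, Y)
\]
```

When large values of X tend to pair with large values of Y, the covariance is positive and the variance of the sum exceeds the sum of the variances. When X and Y are independent, the covariance is zero and the two sides agree.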

Dependence

In the case of independent data sets, values do not necessarily correspond by relative value (large values can be added to small values)

This does not alter the spread of the distribution much
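A simulation sketch of the contrast between independent and dependent columns. All data here are simulated with Python's `random` module; the column names and the noise level 0.5 are arbitrary choices for illustration, not values from the survey.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable
n = 10_000

# x plays the role of height; both y columns are simulated, not survey data.
x = [random.gauss(0, 1) for _ in range(n)]
y_independent = [random.gauss(0, 1) for _ in range(n)]
# Dependent column: mostly determined by x, plus a little noise.
y_dependent = [xi + random.gauss(0, 0.5) for xi in x]


def compare(y):
    """Return (Var(x) + Var(y), Var(x + y)) for the simulated sample."""
    sum_of_vars = statistics.variance(x) + statistics.variance(y)
    var_of_sum = statistics.variance([xi + yi for xi, yi in zip(x, y)])
    return sum_of_vars, var_of_sum


print("independent:", compare(y_independent))
print("dependent:  ", compare(y_dependent))
```

For the independent column the two numbers should be close; for the dependent column the variance of the sum should be clearly larger than the sum of the variances.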

Variance of sample mean

Mean = (X1 + X2 + … + Xn)/n

For independent X1, X2, …, Xn:

Variance[(X1 + X2 + … + Xn)/n] = (Variance[X1] + Variance[X2] + … + Variance[Xn])/n²
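A sketch of the derivation, assuming the X_i are independent and share a common variance σ² (the common-variance assumption is added here to reach the familiar closed form):

```latex
\[
\operatorname{Var}\!\left[\frac{X_1 + X_2 + \dots + X_n}{n}\right]
  = \frac{1}{n^2}\operatorname{Var}[X_1 + X_2 + \dots + X_n]
  = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}[X_i]
  = \frac{n\sigma^2}{n^2}
  = \frac{\sigma^2}{n}
\]
```

The first step uses the scaling rule: multiplying by a constant multiplies the standard deviation by that constant, so it multiplies the variance by the constant squared. The second step is where independence is needed.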

Dependence?

Would this work for dependent values of X1, X2 … Xn ?

Would the variance produced by this formula be larger or smaller than actual?

Sampling without replacement

Would the variance formula hold true?

Why?

Dependence

When values are positively dependent, adding their individual variances gives a result smaller than the actual variance of the sum, because adding dependent data sets produces extremes and widens the spread

Sampling without replacement on smaller populations (n < 10) will produce dependence
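A small exact check of what sampling without replacement does to the variance of the sample mean. The population `[1, 2, 3, 4, 5]` and the sample size 2 are hypothetical, chosen so every possible sample can be listed.

```python
import itertools
import statistics

# Hypothetical tiny population and sample size (illustrative values only).
population = [1, 2, 3, 4, 5]
n = 2

sigma2 = statistics.pvariance(population)  # population variance

# Every equally likely ordered sample drawn with replacement.
means_with = [statistics.fmean(s)
              for s in itertools.product(population, repeat=n)]
# Every equally likely sample drawn without replacement.
means_without = [statistics.fmean(s)
                 for s in itertools.combinations(population, n)]

var_with = statistics.pvariance(means_with)
var_without = statistics.pvariance(means_without)

print("sigma^2 / n:         ", sigma2 / n)
print("with replacement:    ", var_with)
print("without replacement: ", var_without)
```

In this example the dependence is negative (a value that has been drawn cannot be drawn again), so the independent-case formula σ²/n overstates the variance of the sample mean. The without-replacement value is smaller by the factor (N − n)/(N − 1), which is 3/4 here.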

The End

Extra Credit (Dr. Pfenning)

Use Minitab Calculator to create column “Birthyear”

Plot Earned vs. Birthyear, note relationship

Create column “EarnedPlusBirthyear”

Find sds of Earned, Birthyear, EarnedPlusBirthyear, square to variances

Compare variances

Explain results