Statistics 100 Lecture Set 3


Page 1: Statistics 100 Lecture Set 3

Statistics 100 Lecture Set 3

Page 2: Statistics 100 Lecture Set 3

Lecture Set 3

• Chapters 3 and 4… please read

• Pay particular attention to pages 47-49… discussion on confidence statements

• Some suggested problems:

– Chapter 3: 3.3, 3.9, 3.11, 3.13, 3.15, 3.21, 3.25, 3.27

– Chapter 4: 4.3, 4.5, 4.7, 4.9, 4.15, 4.17

Page 3: Statistics 100 Lecture Set 3

Example

• A 2004 Gallup poll indicated that “a slight majority of Americans (51% to 45%) supported a constitutional amendment that would define marriage as being between a man and a woman only”

• The poll was based on a random sample of n = 2,527 adults

• Is 51% really evidence that the majority of Americans favour such an amendment?

Page 4: Statistics 100 Lecture Set 3

What do samples tell us?

• Parameters, Statistics, and Estimates

• A parameter is a population quantity that we want to know about

• A statistic is any quantity computed from a sample

• An estimate is a specific statistic that is used to guess (estimate) a specific parameter

Page 5: Statistics 100 Lecture Set 3

Example

• Parameter?

• Estimate?

• What is the true population parameter in this case?

Page 6: Statistics 100 Lecture Set 3

• Does Parameter = Statistic?

• If I took another sample, would the statistic from the first sample equal the statistic from the second sample?

• Why or why not?

Page 7: Statistics 100 Lecture Set 3

Variation

• Statistics are subject to sampling variability

• Different individuals give different responses

– This is variability

• Different samples of individuals represent different collections of responses

• Statistics computed on different samples vary
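The points above can be illustrated with a small simulation. This is a minimal sketch, not part of the original slides: the population proportion of 0.51 is a hypothetical value chosen for illustration, and the helper name `sample_proportion` is made up here. It draws five random samples of the same size and shows that the statistic changes from sample to sample.

```python
import random

def sample_proportion(true_p, n, rng):
    """Draw one random sample of size n and return the share answering 'yes'."""
    yes = sum(1 for _ in range(n) if rng.random() < true_p)
    return yes / n

rng = random.Random(100)   # fixed seed so the run is repeatable
TRUE_P = 0.51              # hypothetical population proportion (the parameter)
N = 2527                   # sample size used for every sample

# Five separate samples from the same population give five different statistics
estimates = [sample_proportion(TRUE_P, N, rng) for _ in range(5)]
for i, est in enumerate(estimates, start=1):
    print(f"sample {i}: estimate = {est:.3f}")
```

Each printed estimate is a statistic; the parameter (0.51 in this sketch) stays fixed while the estimates move around it.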

Page 8: Statistics 100 Lecture Set 3

Variation

• Advantages of random samples:

Page 9: Statistics 100 Lecture Set 3

Variation

• Bias of the statistic

• Variability of the statistic

Page 10: Statistics 100 Lecture Set 3

Variation

Page 11: Statistics 100 Lecture Set 3

Variation

• Variability in a statistic is unfortunate

• Variability in a statistic is unavoidable

• Statistical Science teaches us how the variability in a statistic behaves

• Statisticians learn how to interpret the variability in statistics and turn them

Page 12: Statistics 100 Lecture Set 3

Variation

• For example, a fundamental principle is:

A statistic becomes less variable as the sample size is increased

• Generally, variability in a statistic is related to
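The principle that a statistic becomes less variable as the sample size increases can be checked by simulation. The sketch below is an illustration under assumed values (a population proportion of 0.5, sample sizes of 100, 400 and 1600, and 300 repeated samples per size); the function name `spread_of_estimates` is hypothetical.

```python
import random
import statistics

def spread_of_estimates(true_p, n, repeats, rng):
    """Standard deviation of the sample proportion over many samples of size n."""
    estimates = []
    for _ in range(repeats):
        yes = sum(1 for _ in range(n) if rng.random() < true_p)
        estimates.append(yes / n)
    return statistics.stdev(estimates)

rng = random.Random(3)  # fixed seed so the run is repeatable

# Same population, three different sample sizes (all values are assumptions)
spreads = {n: spread_of_estimates(0.5, n, 300, rng) for n in (100, 400, 1600)}
for n, sd in spreads.items():
    print(f"n = {n:5d}: standard deviation of the estimates = {sd:.4f}")
```

In runs like this, quadrupling the sample size roughly halves the spread of the estimates.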

Page 13: Statistics 100 Lecture Set 3

Reducing bias and variability

• To reduce bias:

• To reduce variability:

Page 14: Statistics 100 Lecture Set 3

Reducing bias and variability

• Statistical Science teaches us how to sample and how to calculate statistics to avoid bias

– Estimate the average height in class using

• Tallest in sample

• Middle height in sample

• Average height in sample

• Which is best?
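One way to explore this question is to try all three rules on many random samples and compare them with the true class average. This is a sketch only: the class of 200 heights is simulated (mean 170 cm, standard deviation 10 cm), the sample size of 10 is arbitrary, and none of these numbers come from the lecture.

```python
import random
import statistics

rng = random.Random(14)  # fixed seed so the run is repeatable

# Hypothetical class: 200 heights in cm (simulated, not real data)
class_heights = [rng.gauss(170, 10) for _ in range(200)]
true_mean = statistics.mean(class_heights)

# The three candidate statistics from the slide
rules = {"tallest": max, "middle": statistics.median, "average": statistics.mean}
results = {name: [] for name in rules}

for _ in range(2000):
    sample = rng.sample(class_heights, 10)   # random sample of 10 students
    for name, rule in rules.items():
        results[name].append(rule(sample))

summary = {}
for name, values in results.items():
    bias = statistics.mean(values) - true_mean   # systematic over- or under-shoot
    spread = statistics.stdev(values)            # variability across samples
    summary[name] = (bias, spread)
    print(f"{name:8s} bias = {bias:6.2f} cm, spread = {spread:5.2f} cm")
```

In this sketch the tallest-in-sample rule overshoots the class average by a wide margin (bias), while the sample average lands close to it on average.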

Page 15: Statistics 100 Lecture Set 3

Reducing bias and variability

• Choosing a sample size is a compromise

– Reducing variability (large n)

– Cost (small n)

Page 16: Statistics 100 Lecture Set 3

Reducing bias and variability

• To summarize, we use Statistical Science to

– Design studies to have low bias and low variability

– Design statistics to have low bias and low variability

Page 17: Statistics 100 Lecture Set 3

Margin of error

• The estimate of the population quantity does not tell the whole story

• Need to report some measure of variability

• If there is no reported variability, question the validity

Page 18: Statistics 100 Lecture Set 3

Margin of error

• Common measure is the margin of error

Example:

55 % of Brazilians support instituting the death penalty, which does not exist in Brazil, according to the Datafolha.

Datafolha said it interviewed 5,700 people across Brazil on March 19-20, and the survey had a margin of error of 2 % points. (Associated Press: April 8, 2007)

Page 19: Statistics 100 Lecture Set 3

Margin of error

• Common measure is the margin of error

Page 20: Statistics 100 Lecture Set 3

Margin of error

• Situation:

– Statistic is a proportion (or a percentage)

– Expect answers to be generally “close” to .5 (50%)

• (Like between about .2 and .8 mostly, not much below .1)

• Then the Margin of Error = $1/\sqrt{n}$, where n is the sample size

Page 21: Statistics 100 Lecture Set 3

Margin of error

• Interpretation of Margin of Error (MoE):

– We expect a proportion estimated from a sample of this size to be within +/- 1 MoE of the population value in 95% of samples

• i.e. there is only a 5% chance that we will “miss” the parameter we are estimating by more than the MoE

• Look at example survey:

– n = 1009, so MoE = $1/\sqrt{1009}$ = 1/31.765 = 0.03148, or about 3.1%
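The n = 1009 calculation is consistent with the quick rule MoE = 1/√n. The sketch below reproduces that arithmetic and then checks the "95% of samples" interpretation by simulation. The true proportion of 0.5 and the 2000 repeated samples are assumptions made for illustration, and `quick_margin_of_error` is a hypothetical helper name.

```python
import math
import random

def quick_margin_of_error(n):
    """Quick margin of error for a proportion near 50%: 1 / sqrt(n)."""
    return 1 / math.sqrt(n)

n = 1009
moe = quick_margin_of_error(n)
print(f"n = {n}: MoE = {moe:.5f}")   # 0.03148, about 3.1%

# Check the interpretation: how often does a sample land within 1 MoE of the truth?
rng = random.Random(21)   # fixed seed so the run is repeatable
TRUE_P = 0.5              # assumed population proportion
REPEATS = 2000            # number of simulated surveys

hits = 0
for _ in range(REPEATS):
    yes = sum(1 for _ in range(n) if rng.random() < TRUE_P)
    if abs(yes / n - TRUE_P) <= moe:
        hits += 1

coverage = hits / REPEATS
print(f"share of samples within 1 MoE of the parameter: {coverage:.3f}")
```

The simulated share should come out close to 95%, which is what the confidence level in the interpretation refers to.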

Page 22: Statistics 100 Lecture Set 3

Margin of error

• One last thing: Which gives a less variable estimate of a parameter:

– A sample of 1000 people from Canada

– A sample of 1000 people from USA

– A sample of 1000 people from China
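A simulation can help with this question. The sketch below is only an illustration: the three population sizes are scaled-down placeholders rather than actual country populations, each population is assumed to be split 50/50 on the question, and the function name is hypothetical. It draws many samples of 1000 without replacement from each population and compares the spread of the sample proportion.

```python
import random
import statistics

def sd_of_sample_proportion(population_size, n, repeats, rng):
    """Spread of the sample proportion when half the population says 'yes'."""
    half = population_size // 2          # people numbered below `half` say 'yes'
    props = []
    for _ in range(repeats):
        chosen = rng.sample(range(population_size), n)   # sample without replacement
        props.append(sum(1 for person in chosen if person < half) / n)
    return statistics.stdev(props)

rng = random.Random(22)  # fixed seed so the run is repeatable

# Placeholder population sizes (assumptions, not real census figures)
spreads_by_size = {}
for size in (100_000, 1_000_000, 10_000_000):
    spreads_by_size[size] = sd_of_sample_proportion(size, 1000, 300, rng)
    print(f"population {size:>10,d}: spread of estimate = {spreads_by_size[size]:.4f}")
```

In this sketch the spread is about the same for all three populations: with a sample of 1000 from a much larger population, variability is driven by the sample size, not by the population size.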

Page 23: Statistics 100 Lecture Set 3

Confidence statements

• Confidence statement:

– Margin of error

– Confidence level:

Page 24: Statistics 100 Lecture Set 3

Example

• A 2004 Gallup poll indicated that “a slight majority of Americans (51% to 45%) supported a constitutional amendment that would define marriage as being between a man and a woman only”

• The poll was based on a random sample of n = 2,527 adults

• Is 51% really evidence that the majority of Americans favour such an amendment?

Page 25: Statistics 100 Lecture Set 3

Example

• What does this say about whether there is a majority of Americans who support the amendment?
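One way to approach this is to apply the quick margin of error rule from the earlier slides (MoE = 1/√n, as in the n = 1009 example) to the poll's n = 2,527. The sketch below assumes that the quick rule is appropriate here, which seems reasonable because 51% is close to 50%; the variable names are illustrative.

```python
import math

n = 2527          # sample size reported for the poll
estimate = 0.51   # sample proportion supporting the amendment

moe = 1 / math.sqrt(n)                      # quick margin of error
low, high = estimate - moe, estimate + moe  # range of plausible population values

print(f"margin of error: {moe:.3f}")
print(f"plausible range: {low:.3f} to {high:.3f}")

# A majority is only clearly indicated if the whole range sits above 50%
majority_is_settled = low > 0.50
print(f"whole range above 50%? {majority_is_settled}")
```

The margin of error is about 2 percentage points, so the range runs from roughly 49% to 53%. Because that range includes values below 50%, the poll on its own does not settle whether a majority supports the amendment.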

Page 26: Statistics 100 Lecture Set 3

Chapter 4: Sample surveys in the real world

• Sampling in the real world is tricky

• Many people choose not to participate

• Questions have to be designed to not influence the response

• Question takers have to be properly trained

• … and so on …

Page 27: Statistics 100 Lecture Set 3

• When you dial a random number, what are the possible things that can happen?

Page 28: Statistics 100 Lecture Set 3

• Just because you call a number to conduct a survey doesn’t mean that you automatically get answers to your questions.

– Is this a problem?

Page 29: Statistics 100 Lecture Set 3

• One can draw a random sample, compute a margin of error, but still have some problems

• What can go wrong?

• Sampling errors (bad sampling design and undercoverage)

• Random variation

• Nonsampling errors

Page 30: Statistics 100 Lecture Set 3

Undercoverage

• Occurs when some members of the population are inadequately represented in the sample (that is, under-represented)

Page 31: Statistics 100 Lecture Set 3

Undercoverage

• Example: USA Census and Hispanic population (McKay, 1993)

• Undercounts can occur when:

– the survey enumerator misses an entire household

– the person reporting for the household does not list all of the occupants of the household

• Other potential causes:

Page 32: Statistics 100 Lecture Set 3

Non-sampling errors

• What can go wrong that has nothing to do with the way the sample was selected?

• Errors that have nothing to do with which individuals are selected for the sample

• Includes

– Interviewer error (mis-recording an answer)

– Memory errors (Name everything you ate yesterday)

– Measurement errors (inexact height measurement)

– Response error (subject gives incorrect response)

– And…

Page 33: Statistics 100 Lecture Set 3

Non-sampling errors

• Non-response:

Page 34: Statistics 100 Lecture Set 3

Non-sampling errors

• Question problems:

Page 35: Statistics 100 Lecture Set 3

Brief moment of statistical relevance (Wording issues)

• Wednesday’s (March 1, 2006) Vancouver Province Headline:

• “61% of polled back another teacher strike”

• What can you conclude?

Page 36: Statistics 100 Lecture Set 3

Brief moment of statistical relevance

• Actual Question:

"In 2002, the government changed the teachers' contract, removing limits on class size in total and class-composition language regarding students with special needs. In order to end the teachers' strike last October (2005), the government promised to consider dealing with these issues through legislation rather than in contract."

"If the government does not put class-size limits and class-composition language in legislation, would you support or oppose teachers if they went on strike again?"

• What do you think about this?

Page 37: Statistics 100 Lecture Set 3

Brief moment of statistical relevance

• Alternate question posed by me:

"In October 2005, BC teachers went on strike. It is well-known that student learning outcomes are impacted by the number of continuous school days students spend in school."

"If the BC Teachers Federation votes to go on strike this spring (spring 2006), would you support or oppose teachers if they went on strike again?"

• What do you think about this?

Page 38: Statistics 100 Lecture Set 3

Non-Sampling Errors

• Survey construction can cause errors!

– Ordering of questions

• Sensitive questions placed at end

• Answer to one question can accidentally influence another

1. Did you vote for Stephen Harper?

2. Do you think Stephen Harper is doing a good job?

Page 39: Statistics 100 Lecture Set 3

How do we measure/account/adjust for these problems?

• Random Variation (Chapter 3)

– Unavoidable

– Minimized by good sample choice, good statistic choice

– Measured by margin of error

Page 40: Statistics 100 Lecture Set 3

Nonresponse:

• Adjust statistics using previously established traits on refusers

• Follow-up sampling to get responses from some refusers and learn about them

• Make sample match known population traits by weighting responses
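The last adjustment, weighting responses so the sample matches known population traits, can be sketched with a small example. All of the numbers below are hypothetical: a survey in which people aged 40 and over responded far more often than people under 40, while the population is assumed to be split evenly between the two groups.

```python
# Hypothetical survey results grouped by a trait whose population share is known
sample_counts = {"under_40": 300, "40_plus": 700}    # respondents per group
support_counts = {"under_40": 120, "40_plus": 420}   # 'yes' answers per group
population_share = {"under_40": 0.5, "40_plus": 0.5} # assumed known population shares

# Unweighted estimate: treats the lopsided sample as if it matched the population
n = sum(sample_counts.values())
unweighted = sum(support_counts.values()) / n

# Weighted estimate: each group's support rate counts in proportion to its
# share of the population rather than its share of the sample
weighted = sum(
    population_share[g] * support_counts[g] / sample_counts[g]
    for g in sample_counts
)

print(f"unweighted estimate: {unweighted:.2f}")
print(f"weighted estimate:   {weighted:.2f}")
```

Here the unweighted estimate is 54% and the weighted estimate is 50%, because the over-represented group was also the more supportive one. Weighting only corrects for the traits that are used; it cannot fix differences between responders and refusers within a group.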

Page 41: Statistics 100 Lecture Set 3

• We can try to avoid sampling errors that happen despite randomization

• Can break population into groups of likely similar responses

– M/F in height measurements

– Neighbourhood for relation to socio-economic status

– Race

• This is called stratification

• Groups are called strata (singular: stratum)

Page 42: Statistics 100 Lecture Set 3

• Stratified Random Sampling:

– Take separate random samples from each stratum

– Usually proportional to size of population in stratum

– If ¼ of population is in stratum, then ¼ of sample is taken from there

– Guarantees that sample matches population in relation to strata

Page 43: Statistics 100 Lecture Set 3

Example

• A university has 800 male and 1,200 female faculty members

• A researcher wants to sample opinions from this population in such a way as to give adequate attention to both males and females

• How would you take a stratified random sample of 100 people from this population?
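One possible answer uses proportional allocation, as described on the previous slide: each stratum contributes the same fraction of the sample as it does of the population. The sketch below assumes that approach; the faculty ID labels are made up for illustration.

```python
import random

strata = {"male": 800, "female": 1200}   # stratum sizes from the example
sample_size = 100
total = sum(strata.values())             # 2,000 faculty members

# Proportional allocation: stratum share of the population times sample size
allocation = {name: round(sample_size * size / total) for name, size in strata.items()}
print(allocation)   # {'male': 40, 'female': 60}

# Draw a separate simple random sample within each stratum
rng = random.Random(43)   # fixed seed so the run is repeatable
sample = {}
for name, size in strata.items():
    roster = [f"{name}_{i}" for i in range(1, size + 1)]   # hypothetical ID list
    sample[name] = rng.sample(roster, allocation[name])

print({name: len(chosen) for name, chosen in sample.items()})
```

Males make up 800/2000 = 40% of the faculty, so 40 of the 100 people come from the male stratum and 60 from the female stratum. With other stratum sizes the rounded allocations may not add up exactly to the sample size and would need a small adjustment.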