Statistics 100 Lecture Set 3

Lecture Set 3

• Chapters 3 and 4… please read

• Pay particular attention to pages 47-49 (discussion of confidence statements)

• Some suggested problems:

– Chapter 3: 3.3, 3.9, 3.11, 3.13, 3.15, 3.21, 3.25, 3.27

– Chapter 4: 4.3, 4.5, 4.7, 4.9, 4.15, 4.17

Example

• A 2004 Gallup poll indicated that “a slight majority of Americans (51% to 45%) supported a constitutional amendment that would define marriage as being between a man and a woman only”

• The poll was based on a random sample of n = 2,527 adults

• Is 51% really evidence that the majority of Americans favour such an amendment?

What do samples tell us?

• Parameters, Statistics, and Estimates

• A parameter is a population quantity that we want to know about

• A statistic is any quantity computed from a sample

• An estimate is a specific statistic that is used to guess (estimate) a specific parameter

Example

• Parameter?

• Estimate?

• What is the true population parameter in this case?

• Does Parameter = Statistic?

• If I took another sample, would the statistic from the first sample equal the statistic from the second sample?

• Why or why not?

Variation

• Statistics are subject to sampling variability

• Different individuals give different responses

– This is variability

• Different samples of individuals represent different collections of responses

• Statistics computed on different samples vary
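A small simulation can make this concrete. The sketch below is only an illustration: the population proportion (0.50), the sample size (1,000) and the seed are made-up assumptions, not values from the lecture. It draws five random samples from the same population and computes the same statistic on each.

```python
import random

def sample_proportion(true_p, n, rng):
    """Proportion of 'yes' answers in one simple random sample of size n."""
    yes = sum(1 for _ in range(n) if rng.random() < true_p)
    return yes / n

rng = random.Random(100)   # fixed seed so the run is repeatable
TRUE_P = 0.50              # hypothetical population proportion (the parameter)
N = 1000                   # hypothetical sample size

# Take five independent samples and compute the same statistic on each.
estimates = [sample_proportion(TRUE_P, N, rng) for _ in range(5)]
for i, p_hat in enumerate(estimates, start=1):
    print(f"sample {i}: p-hat = {p_hat:.3f}")
```

The parameter never changes between samples; only the individuals who happen to be selected do, so the printed statistics differ from one sample to the next.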

Variation

• Advantages of random samples:

Variation

• Bias of the statistic

• Variability of the statistic

Variation

• Variability in a statistic is unfortunate

• Variability in a statistic is unavoidable

• Statistical Science teaches us how the variability in a statistic behaves

• Statisticians learn how to interpret the variability in statistics and turn them

Variation

• For example, a fundamental principle is:

A statistic becomes less variable as the sample size is increased

• Generally, variability in a statistic is related to
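The principle that a statistic becomes less variable as the sample size increases can be checked by simulation. This is a rough sketch under assumed values (a population proportion of 0.5, 500 repeated samples per sample size, an arbitrary seed); the function name is hypothetical.

```python
import random
import statistics

def spread_of_estimates(true_p, n, repeats, rng):
    """Standard deviation of the sample proportion across repeated samples."""
    estimates = []
    for _ in range(repeats):
        yes = sum(1 for _ in range(n) if rng.random() < true_p)
        estimates.append(yes / n)
    return statistics.stdev(estimates)

rng = random.Random(3)
TRUE_P = 0.5      # hypothetical parameter
REPEATS = 500     # number of simulated samples per sample size

spreads = {n: spread_of_estimates(TRUE_P, n, REPEATS, rng)
           for n in (25, 100, 400, 1600)}
for n, sd in spreads.items():
    print(f"n = {n:5d}: spread of p-hat = {sd:.4f}")
```

Each fourfold increase in n should roughly halve the spread, so large gains in precision become expensive quickly.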

Reducing bias and variability

• To reduce bias:

• To reduce variability:


• Statistical Science teaches us how to sample and how to calculate statistics to avoid bias

– Estimate the average height in class using

• Tallest in sample

• Middle height in sample

• Average height in sample

• Which is best?
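One way to explore the question is to try all three statistics on many samples and compare their bias (how far off they are on average) and their spread. The class below is invented for illustration: 200 made-up heights in cm, samples of 10 students, 2,000 repeated samples.

```python
import random
import statistics

rng = random.Random(4)

# Hypothetical class of 200 heights in cm (made-up, roughly bell-shaped data).
class_heights = [rng.gauss(170, 10) for _ in range(200)]
true_mean = statistics.mean(class_heights)   # the parameter we want

candidates = {
    "tallest in sample": max,
    "middle height in sample": statistics.median,
    "average height in sample": statistics.mean,
}

results = {}
for name, stat in candidates.items():
    # Apply the same statistic to 2,000 random samples of 10 students.
    values = [stat(rng.sample(class_heights, 10)) for _ in range(2000)]
    bias = statistics.mean(values) - true_mean
    spread = statistics.stdev(values)
    results[name] = (bias, spread)
    print(f"{name:26s} bias = {bias:+6.2f} cm, spread = {spread:5.2f} cm")
```

In this made-up setting the tallest height should overshoot the class average by a wide margin (a biased statistic), while the middle and average heights should land close to it on average.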


• Choosing a sample size is a compromise

– Reducing variability (large n)

– Cost (small n)


• To summarize, we use Statistical Science to

– Design studies to have low bias and low variability

– Design statistics to have low bias and low variability

Margin of error

• The estimate of the population quantity does not tell the whole story

• Need to report some measure of variability

• If there is no reported variability, question the validity

Margin of error

• Common measure is the margin of error

Example:

55% of Brazilians support instituting the death penalty, which does not exist in Brazil, according to Datafolha.

Datafolha said it interviewed 5,700 people across Brazil on March 19-20, and the survey had a margin of error of 2 percentage points. (Associated Press: April 8, 2007)

Margin of error

• Situation:

– Statistic is a proportion (or a percentage)

– Expect answers to be generally “close” to .5 (50%)

• (Like between about .2 and .8 mostly, not much below .1)

• Then the Margin of Error = $1/\sqrt{n}$, where n is the sample size

Margin of error

• Interpretation of Margin of Error (MoE):

– We expect a proportion estimated from a sample of this size to be within +/- 1 MoE of the population value in 95% of samples

• i.e. there is only a 5% chance that we will “miss” the parameter we are estimating by more than the MoE

• Look at example survey:

– n = 1009, so MoE = $1/\sqrt{1009}$ = 1/31.765 = 0.03148, or about 3.1%
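The n = 1009 calculation, and the "95% of samples" interpretation, can be sketched in a few lines. This assumes the quick rule MoE = 1/√n implied by the 1/31.765 figure; the function name, the true proportion of 0.5, the number of repeats and the seed are all assumptions made for illustration.

```python
import math
import random

def quick_margin_of_error(n):
    """Quick 95% margin of error for a sample proportion: 1 / sqrt(n)."""
    return 1 / math.sqrt(n)

moe = quick_margin_of_error(1009)
print(f"n = 1009: MoE = {moe:.5f}")   # about 0.03148, roughly 3.1%

# Check the "95% of samples" interpretation by simulation.
rng = random.Random(5)
TRUE_P = 0.5      # hypothetical population proportion
N = 1009
REPEATS = 2000    # number of simulated surveys

hits = 0
for _ in range(REPEATS):
    p_hat = sum(1 for _ in range(N) if rng.random() < TRUE_P) / N
    if abs(p_hat - TRUE_P) <= moe:
        hits += 1
coverage = hits / REPEATS
print(f"share of samples within 1 MoE of the parameter: {coverage:.3f}")
```

The share printed at the end should come out near 0.95, which is what the confidence level describes: how often the method works across many samples, not whether any single sample is right.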

Margin of error

• One last thing: Which gives a less variable estimate of a parameter:

– A sample of 1000 people from Canada

– A sample of 1000 people from USA

– A sample of 1000 people from China
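A simulation is one way to explore this question. The sketch below draws samples of 1,000 without replacement from three populations of very different size. The population sizes are round numbers chosen only for illustration (not census figures), and the 50% "yes" share, repeat count and seed are assumptions.

```python
import random
import statistics

def spread_of_sample_proportion(pop_size, n, repeats, rng, true_p=0.5):
    """Spread of the sample proportion when n people are drawn without
    replacement from a population of pop_size people."""
    yes_count = int(pop_size * true_p)   # people 0..yes_count-1 answer "yes"
    estimates = []
    for _ in range(repeats):
        chosen = rng.sample(range(pop_size), n)
        estimates.append(sum(1 for person in chosen if person < yes_count) / n)
    return statistics.stdev(estimates)

rng = random.Random(6)
# Illustrative round-number population sizes (assumptions, not census data).
population_sizes = [30_000_000, 300_000_000, 1_300_000_000]

spreads = {size: spread_of_sample_proportion(size, 1000, 300, rng)
           for size in population_sizes}
for size, sd in spreads.items():
    print(f"population {size:>13,d}: spread of p-hat = {sd:.4f}")
```

The three spreads should come out close to one another: when the population is much larger than the sample, the variability of the estimate is driven by the sample size, not the population size.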

Confidence statements

• Confidence statement:

– Margin of error

– Confidence level:

Example

• A 2004 Gallup poll indicated that “a slight majority of Americans (51% to 45%) supported a constitutional amendment that would define marriage as being between a man and a woman only”

• The poll was based on a random sample of n = 2,527 adults

• Is 51% really evidence that the majority of Americans favour such an amendment?

Example

• What does this say about whether there is a majority of Americans who support the amendment?
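A quick calculation helps with this question. The sketch assumes the same quick rule used for the n = 1009 survey (MoE = 1/√n) and treats the poll as a simple random sample; it is an illustration, not the pollster's own calculation.

```python
import math

n = 2527        # sample size reported for the poll
p_hat = 0.51    # reported share supporting the amendment

moe = 1 / math.sqrt(n)                 # quick 95% margin of error
low, high = p_hat - moe, p_hat + moe

print(f"margin of error: {moe:.3f}")
print(f"95% confidence statement: between {low:.1%} and {high:.1%}")
print("interval includes 50%:", low <= 0.5 <= high)
```

With a margin of error of about 2 percentage points, the interval runs from roughly 49% to 53%. Because it includes 50%, the poll on its own does not settle whether a majority supports the amendment.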

Chapter 4: Sample surveys in the real world

• Sampling in the real world is tricky

• Many people choose not to participate

• Questions have to be designed to not influence the response

• Question takers have to be properly trained

• … and so on …

• When you dial a random number, what are the possible things that can happen?

• Just because you call a number to conduct a survey, doesn’t mean that you automatically get answers to your questions.

– Is this a problem?

• One can draw a random sample, compute a margin of error, but still have some problems

• What can go wrong?

• Sampling errors (bad sampling design and undercoverage)

• Random variation

• Non-sampling errors

Undercoverage

• Occurs when some members of the population are inadequately represented (under-represented) in the sample

Undercoverage

• Example: USA Census and Hispanic population (McKay, 1993)

• Undercounts can occur when:

– the survey enumerator misses an entire household

– the person reporting for the household does not list all of the occupants of the household

• Other potential causes:

Non-sampling errors

• What can go wrong that has nothing to do with the way the sample was selected?

• Errors that have nothing to do with which individuals are selected for the sample

• Includes

– Interviewer error (mis-recording an answer)

– Memory errors (Name everything you ate yesterday)

– Measurement errors (inexact height measurement)

– Response error (subject gives incorrect response)

– And…

Non-sampling errors

• Non-response:

Non-sampling errors

• Question problems:

Brief moment of statistical relevance (Wording issues)

• Wednesday’s (March 1, 2006) Vancouver Province Headline:

• “61% of polled back another teacher strike”

• What can you conclude?

Brief moment of statistical relevance

• Actual Question:

“In 2002, the government changed the teachers' contract, removing limits on class size in total and class-composition language regarding students with special needs. In order to end the teachers' strike last October (2005), the government promised to consider dealing with these issues through legislation rather than in contract.”

“If the government does not put class-size limits and class-composition language in legislation, would you support or oppose teachers if they went on strike again?”

• What do you think about this?

Brief moment of statistical relevance

• Alternate question posed by me:

“In October 2005, BC teachers went on strike. It is well-known that student learning outcomes are impacted by the number of continuous school days students spend in school.”

“If the BC Teachers Federation votes to go on strike this spring (spring 2006), would you support or oppose teachers if they went on strike again?”

• What do you think about this?

Non-Sampling Errors

• Survey construction can cause errors!

– Ordering of questions

• Sensitive questions placed at end

• Answer to one question can accidentally influence another

1. Did you vote for Stephen Harper?

2. Do you think Stephen Harper is doing a good job?

How do we measure, account for, and adjust for these problems?

• Random Variation (Chapter 3)

– Unavoidable

– Minimized by good sample choice, good statistic choice

– Measured by margin of error

• Nonresponse:

– Adjust statistics using previously established traits of refusers

– Follow-up sampling to get responses from some refusers and learn about them

– Make sample match known population traits by weighting responses
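Weighting responses to match known population traits can be sketched as follows. All of the data are made up: 100 hypothetical respondents, 70 of them women, drawn from a population assumed to be half women and half men. Each respondent's weight is the group's population share divided by its share of the sample.

```python
# Hypothetical respondents as (group, answered_yes) pairs. Made-up data.
responses = ([("F", True)] * 42 + [("F", False)] * 28
             + [("M", True)] * 9 + [("M", False)] * 21)

# Population shares assumed to be known from an outside source.
population_share = {"F": 0.5, "M": 0.5}

n = len(responses)
sample_share = {g: sum(1 for grp, _ in responses if grp == g) / n
                for g in population_share}

# Weight = population share / sample share of the respondent's group.
weight = {g: population_share[g] / sample_share[g] for g in population_share}

raw = sum(1 for _, yes in responses if yes) / n
weighted = (sum(weight[g] for g, yes in responses if yes)
            / sum(weight[g] for g, _ in responses))

print(f"unweighted estimate: {raw:.2f}")
print(f"weighted estimate:   {weighted:.2f}")
```

Here women are over-represented and more likely to answer yes, so the unweighted 51% drops to 45% once each group counts in proportion to its population share. Weighting only helps if the people who respond resemble the non-respondents in their own group.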

• We can try to avoid sampling errors that happen despite randomization

• Can break population into groups of likely similar responses

– M/F in height measurements

– Neighbourhood for relation to socio-economic status

– Race

• This is called stratification

• Groups are called strata (singular: stratum)

• Stratified Random Sampling:

– Take separate random samples from each stratum

– Usually proportional to size of population in stratum

– If ¼ of population is in stratum, then ¼ of sample is taken from there

– Guarantees that sample matches population in relation to strata

Example

• A university has 800 male and 1,200 female faculty members

• A researcher wants to sample opinions from this population in such a way as to give adequate attention to both males and females

• How would you take a stratified random sample of 100 people from this population?
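One possible answer uses proportional allocation: each stratum contributes the same fraction of the sample as it does of the population. The ID lists below are hypothetical stand-ins for a real faculty roster, and the seed is arbitrary.

```python
import random

rng = random.Random(8)

# Hypothetical ID lists standing in for the faculty roster.
strata = {
    "male": [f"M{i:04d}" for i in range(800)],
    "female": [f"F{i:04d}" for i in range(1200)],
}
total = sum(len(members) for members in strata.values())
sample_size = 100

sample = {}
for name, members in strata.items():
    # Proportional allocation: stratum share of population times sample size.
    k = round(sample_size * len(members) / total)
    # Separate simple random sample within the stratum.
    sample[name] = rng.sample(members, k)
    print(f"{name}: {k} sampled out of {len(members)}")
```

Males make up 800/2,000 = 40% of the population, so 40 of the 100 are drawn at random from the males and 60 from the females. With less convenient numbers the rounded allocations may not add up exactly to the target and would need a small adjustment.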
