213Sampling.pdf


Page 1: 213Sampling.pdf

213Sampling.pdf

When one is attempting to study the variable of a population, whether the variable is qualitative or quantitative, there are two methods of data collection which can be employed:

1. Conduct a Census.

2. Take a Sample.

Page 2: 213Sampling.pdf

Some Population characteristics:

1. µ = population average (mean)

2. p = population proportion

Some Sample characteristics:

1. X̄ = sample average

2. p̂ = sample proportion

Page 3: 213Sampling.pdf

Taking a Census:

A census is taken when every element (or individual, if the population consists of people) in the population of interest is inspected with regard to the population variable of interest.

Page 4: 213Sampling.pdf

Problems with a Census

1. Can each member of the population be accessed? --- Cause for bias.

2. Control of information gathering is very difficult for large populations. --- Cause for bias.

3. Very expensive and time-consuming.

Page 5: 213Sampling.pdf

Sample

Take only part of the population:

We hope that the characteristics of the sample reflect those of the population.

Page 6: 213Sampling.pdf

Suppose the mean (or average) number of courses taken by the 100 students in the sample was 4.2, or X̄ = 4.2.

What does this statistic represent?

Does this indicate that the average number of courses taken in the Fall 2005 semester for ALL undergraduate students is 4.2, or µ = 4.2?

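No, it does not. The sample mean of 4.2 describes only the 100 students in the sample; it is an estimate of µ, not µ itself.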

Page 7: 213Sampling.pdf

Taking a Random Sample

There are four ways to select a random sample:

1. Simple random sampling (SRS).

2. Stratified sampling.

3. Cluster sampling.

4. Systematic sampling (or 1-in-k sampling).

Page 8: 213Sampling.pdf

1. Simple Random Sampling - SRS

A simple random sample is taken when every conceivable subgroup of size n has the same “chance” of being selected as the sample.

Page 9: 213Sampling.pdf

In order to take a simple random sample, you require:

1. A sampling frame - a complete list of all the elements in the population of interest.

2. A method of numerically differentiating between the elements in the sampling frame. This is done by assigning each element in the sampling frame a unique number - a number that “belongs” to that element alone.

3. A random number generator - such as the table on pages 847 and 848 of your text.

Page 10: 213Sampling.pdf

Example

Consider a regular-sized lecture section of Statistics 213 (having 115 students, or N = 115) as the population of interest.

Do you have at least one part-time job? (Yes = 1, No = 0)

How many part-time jobs do you have? (Answer = 0, 1, 2, …)

Page 11: 213Sampling.pdf

1 and 2:

Below is a condensed version of the class list, or the sampling frame.

1. Student A

2. Student B

3. Student C

...

115. Student XYZ

Page 12: 213Sampling.pdf

3. Using the random number generator

A simple random sample of 5 will be taken.

The first three-digit number (row 1, columns 1 - 3) is “104”.

If the next three-digit number is not between 1 and 115, continue until you find a three-digit number that is. It must also differ from every number already selected.

The next four choices are: 094, 103, 071, 023.

The SRS will be the 23rd, 94th, 71st, 103rd and 104th student.
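The selection steps above can be sketched in code. This is a minimal illustration, assuming Python's standard `random` module stands in for the printed random number table; the function name `simple_random_sample` and the seed value are hypothetical, and the students it picks will differ from the five read off the table.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Return sample_size distinct labels drawn from 1..population_size.

    random.sample draws without replacement, so every subset of the
    requested size is equally likely and no label can repeat.
    """
    rng = random.Random(seed)  # a seed only makes the run repeatable
    labels = range(1, population_size + 1)  # the numbered sampling frame
    return sorted(rng.sample(labels, sample_size))


# N = 115 students in the sampling frame, n = 5 to be selected.
sample = simple_random_sample(115, 5, seed=213)
print(sample)
```

Drawing without replacement plays the same role as skipping a repeated number in the table, and restricting the labels to 1..115 plays the role of skipping numbers outside that range.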

Page 13: 213Sampling.pdf

Suppose of the five students selected, two have at least one part-time job.

The sample proportion, p̂, is then

p̂ = 2/5 = 0.40

Page 14: 213Sampling.pdf

Advantages of SRS

A simple random sample is the purest method of random selection.

The simple random sample criterion allows the selection of the sample to be done in a completely objective manner.

There are no issues with selection bias.

Page 15: 213Sampling.pdf

2. Stratified Sampling

A stratified sample is taken when the population of interest is subdivided into k different groups, or k strata. Once this is done, a simple random sample (SRS) is taken from each stratum. The simple random samples taken from each stratum are put together and constitute the random sample of size n.

How is a population stratified?

Page 16: 213Sampling.pdf

The population is stratified according to some other population variable:

1. geographic - stratify according to some ‘location’ variable of the underlying population: province, region (West, Central, East, Maritimes), quadrant (NW, NE, SW, SE), rural vs. urban, etc.

2. non-geographic - stratify according to some ‘non-location’ variable of the population: gender (male, female), income level/tax bracket (lower, middle, upper), age level (18 to under 30, 30 to under 40, 40 to under 55, 55 and up), education level, etc.
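A stratified sample, as defined above, is one SRS per stratum pooled into a single sample. The following is a minimal sketch of that idea; the function name `stratified_sample`, the rural/urban toy population, and the per-stratum sample sizes are all hypothetical and chosen only for illustration.

```python
import random


def stratified_sample(strata, sizes, seed=None):
    """Take an SRS of sizes[name] elements from each stratum and pool them."""
    rng = random.Random(seed)
    pooled = []
    for name, members in strata.items():
        # One simple random sample (without replacement) per stratum.
        pooled.extend(rng.sample(members, sizes[name]))
    return pooled


# Hypothetical population stratified by a 'location' variable.
strata = {
    "urban": [f"urban-{i}" for i in range(1, 61)],  # 60 elements
    "rural": [f"rural-{i}" for i in range(1, 41)],  # 40 elements
}
sizes = {"urban": 6, "rural": 4}  # n = 10 in total

stratified = stratified_sample(strata, sizes, seed=1)
print(stratified)
```

Because each stratum is sampled separately, every stratum is guaranteed to appear in the final sample, which a single SRS over the whole population does not guarantee.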

Page 17: 213Sampling.pdf

Example

Consider the population of Canadian voters, and the variable of interest is:

“whether or not a politician can be trusted”.

To measure this, suppose a random sample of 1000 Canadian voters is to be selected using stratified sampling, and the population will be stratified into 4 strata (k = 4).

Page 18: 213Sampling.pdf

Stratifying according to region of the country, we have

Stratum #1   Stratum #2   Stratum #3   Stratum #4
The West     Ontario      Quebec       Atlantic
nW = 250     nO = 250     nQ = 250     nM = 250
p̂W = 58%     p̂O = 52%     p̂Q = 61%     p̂A = 68%

Since the samples are “equally weighted”, the sample proportion is simply the average of the individual sample proportions:

p̂ = (0.58 + 0.52 + 0.61 + 0.68) / 4 = 0.5975

Page 19: 213Sampling.pdf

Proportionally Stratified Sampling

Stratum #1   Stratum #2   Stratum #3   Stratum #4
West         Ontario      Quebec       Atlantic
30.3 %       37.9 %       24.0 %       7.8 %
nW = 303     nO = 379     nQ = 240     nM = 78
p̂W = 58%     p̂O = 52%     p̂Q = 61%     p̂A = 68%

Page 20: 213Sampling.pdf

Stratum #1   Stratum #2   Stratum #3   Stratum #4
West         Ontario      Quebec       Atlantic
30.3 %       37.9 %       24.0 %       7.8 %
nW = 303     nO = 379     nQ = 240     nM = 78
p̂W = 58%     p̂O = 52%     p̂Q = 61%     p̂A = 68%

Because the stratified sample has been conducted proportionally, we then “weight” the individual percentages and calculate the weighted average:

p̂ = (303/1000)(0.58) + (379/1000)(0.52) + (240/1000)(0.61) + (78/1000)(0.68) = 0.5723
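The weighted average above can be checked with a few lines of arithmetic. This sketch uses only the stratum sample sizes and sample proportions given in the example; the variable names are hypothetical. It also recomputes the equally weighted average from the earlier 250-per-stratum design for comparison.

```python
# Stratum sample sizes and sample proportions from the example.
sizes = {"West": 303, "Ontario": 379, "Quebec": 240, "Atlantic": 78}
proportions = {"West": 0.58, "Ontario": 0.52, "Quebec": 0.61, "Atlantic": 0.68}

n = sum(sizes.values())  # total sample size, 1000

# Each stratum's proportion is weighted by its share of the sample.
weighted = sum(sizes[s] / n * proportions[s] for s in sizes)

# Equal weights: the plain average of the four stratum proportions.
unweighted = sum(proportions.values()) / len(proportions)

print(round(weighted, 4))    # 0.5723
print(round(unweighted, 4))  # 0.5975
```

The two results differ because Ontario, the largest stratum, has the lowest proportion and Atlantic, the smallest stratum, has the highest; weighting by stratum size pulls the overall estimate down.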

Page 21: 213Sampling.pdf

1. Approximately 58% of voters in the West and Territories believe politicians cannot be trusted.

2. Approximately 52% of voters in Ontario believe politicians cannot be trusted.

3. Approximately 61% of voters in Quebec believe politicians cannot be trusted.

4. Approximately 68% of voters in Atlantic Canada believe politicians cannot be trusted.

Page 22: 213Sampling.pdf

3. Cluster Sampling

Often one does not have the luxury of a large budget or time frame to complete a study on a large population. In such cases one can attempt to sample from the population using a method that seems to closely follow a stratified sample, but is much easier.

If one wishes to study the annual income of households in Calgary, clearly it would be difficult to have a complete list of all the households of Calgary.

Identify “clusters” of a population - those non-overlapping groups that elements in a population naturally fall within.

Page 23: 213Sampling.pdf

Once this is done, the researcher can either:

1. randomly select, using SRS, one cluster (or many clusters) and then inspect every element falling within the randomly selected cluster(s).

2. randomly select, using SRS, one cluster (or many clusters) and then take a simple random sample of elements from the randomly selected cluster(s).

Option #1 is deemed a single stage cluster sample. Option #2 is called a double (or multi) stage cluster sample.

Page 24: 213Sampling.pdf

The distinction is the following:

In a single stage cluster sample, random selection is only occurring once: the cluster (or clusters) are RANDOMLY selected.

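In a double (or multi) stage cluster sample, random selection is occurring twice (or more than twice): first the cluster (or clusters) are randomly selected, and then elements are randomly selected from within them.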
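The distinction between the two options can be sketched as follows. This is an illustration only: the function name `cluster_sample`, the `per_cluster` parameter, and the toy population of five communities with twenty households each are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, per_cluster=None, seed=None):
    """Select clusters by SRS, then inspect all or some of their elements.

    per_cluster=None  -> single stage: every element of each chosen cluster.
    per_cluster=m     -> double stage: an SRS of m elements per chosen cluster.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # stage 1: SRS of clusters
    selected = []
    for name in chosen:
        members = clusters[name]
        if per_cluster is None:
            selected.extend(members)  # no further random selection
        else:
            selected.extend(rng.sample(members, per_cluster))  # stage 2: SRS
    return chosen, selected


# Hypothetical clusters: 5 communities of 20 households each.
clusters = {
    f"community-{c}": [f"community-{c}/household-{h}" for h in range(1, 21)]
    for c in "ABCDE"
}

single_clusters, single = cluster_sample(clusters, n_clusters=2, seed=7)
double_clusters, double = cluster_sample(clusters, n_clusters=2, per_cluster=5, seed=7)
print(len(single), len(double))  # 40 10
```

Only a list of clusters is needed up front, plus a list of elements for the clusters actually chosen, which is why this design is easier when no complete sampling frame exists.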

Page 25: 213Sampling.pdf

The sampling error is simply the difference between the sample and the population. There will be a difference, albeit slight, between what is happening in the sample and what is really happening in the population. Both the sample mean (X̄) and the sample proportion (p̂) have a sampling error.

The sampling error of p̂ is approximated by the following:

Error = 1/√n

Clearly, the larger the sample size, the smaller the sampling error.

Page 26: 213Sampling.pdf

A random sample of 500 Calgarians indicated that 45% of Calgarians think that Highway #2 between Calgary and Edmonton should be a toll highway. In this example the population of interest consists of Calgarians.

The variable is an opinion about whether Highway #2 between Calgary and Edmonton should be a toll highway - a qualitative variable that can be measured using a nominal scale (why?).

The sample size, or n, is 500. The sample proportion, or p̂, is 0.45.

What is the error of this sample?

Page 27: 213Sampling.pdf

Error = 1/√n = 1/√500 = 0.0447 ≈ 0.045

We get an interval estimate: p̂ ± error

That is, from:

0.45 − 0.045 to 0.45 + 0.045

i.e. 0.405 to 0.495
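The error and interval for the toll highway example can be reproduced with a short calculation. This is a sketch of the 1/√n approximation used in these slides only; the function names `approximate_error` and `interval_estimate` are hypothetical, and the error is rounded to three decimals before the interval is formed, matching the hand calculation.

```python
import math


def approximate_error(n):
    """Approximate sampling error of a sample proportion: 1 / sqrt(n)."""
    return 1 / math.sqrt(n)


def interval_estimate(p_hat, n, digits=3):
    """Return (p_hat - error, p_hat + error) with the error rounded first."""
    error = round(approximate_error(n), digits)
    return round(p_hat - error, digits), round(p_hat + error, digits)


print(round(approximate_error(500), 4))  # 0.0447
print(interval_estimate(0.45, 500))      # (0.405, 0.495)
```

Since the error depends only on n, quadrupling the sample size to 2000 would halve the error to about 0.022.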

Page 28: 213Sampling.pdf

There are other errors in sampling that can occur, which are often called “biases”. Preventative measures can be taken to reduce, and in some cases eliminate, such sampling bias. These biases, or errors, can be classified into:

1. Errors of Non-Inclusion - errors due to a member of the population having no “chance” of appearing in the sample. (Nonrandom sampling)

2. Errors of Non-Observation - errors that arise due to problems with sampling. (Non-response)

3. Errors of Observation - errors related to the collecting of the data. Such errors can occur even when the sample is selected using random methods. (Incorrect answer, or measurement bias)

Page 29: 213Sampling.pdf

Summarizing the Data

Four types of graphical methods will be discussed. These four methods are used for displaying data on a population variable that is quantitative. These four graphical routines are:

1. Dotplot.

2. Stem-and-Leaf plot (or stemplot).

3. Histogram.

4. Boxplot.