13 Collecting Statistical Data

75
Excursions in Modern Mathematics, 7e: 13.1 - 1 Copyright © 2010 Pearson Education, Inc.

description

13 Collecting Statistical Data. 13.1The Population 13.2Sampling 13.3 Random Sampling 13.4Sampling: Terminology and Key Concepts 13.5The Capture-Recapture Method 13.6Clinical Studies. The Population. - PowerPoint PPT Presentation

Transcript of 13 Collecting Statistical Data

Page 1: 13 Collecting Statistical Data
Page 2: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 2Copyright © 2010 Pearson Education, Inc.

13 Collecting Statistical Data

13.1 The Population

13.2 Sampling

13.3 Random Sampling

13.4 Sampling: Terminology and Key Concepts

13.5 The Capture-Recapture Method

13.6 Clinical Studies

Page 3: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 3Copyright © 2010 Pearson Education, Inc.

Every statistical statement refers, directly or indirectly, to some group of individuals or objects. In statistical terminology, this collection of individuals or objects is called the population. The first question we should ask ourselves when trying to make sense of a statistical statement is, “what is the population to which the statement applies?”

The Population

Page 4: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 4Copyright © 2010 Pearson Education, Inc.

In an ideal world, the specific population to which a statistical statement applies is clearly identified within the story itself. In the real world, this rarely happens because the details are skipped (mostly to keep the story moving along but sometimes with an intent to confuse or deceive) or, alternatively, because two (or more) related populations are involved in the story.

The Population

Page 5: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 5Copyright © 2010 Pearson Education, Inc.

Given a specific population, an obviously relevant question is, “How many individuals or objects are there in that population?” This number is called the N-value of the population. (It is common practice in statistics to use capital N to denote population sizes.) It is important to keep in mind the distinction between the N-value– a number specifying the size of the population–and the population itself.

The N Value

Page 6: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 6Copyright © 2010 Pearson Education, Inc.

Over a period of many years, the United States Fish and Wildlife Service was able to keep a remarkably accurate tally of the number of bald eagle breeding pairs in the contiguous 48 states. (breeding pairs are used as a useful proxy for the health of the overall population.) A tremendous amount of effort has gone into collecting and verifying these N-values, which, for a wildlife population, are of remarkable accuracy.

Example 13.2 The Return of theBald Eagle: Part 2

Page 7: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 7Copyright © 2010 Pearson Education, Inc.

Andy has a coin jar full of quarters. He is hoping that there is enough money in the jar to pay for a new baseball glove. Dad says to go count them, and if there isn’t enough, he will lend Andy the difference. Andy dumps the quarters out of the jar, makes a careful tally, and comes up with a count of 116 quarters.

Example 13.3 N Is in the Eye of the Beholder

Page 8: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 8Copyright © 2010 Pearson Education, Inc.

What is the N-value here? The answer depends on how we define the population. Are we counting coins or money? To Dad, who will end up stuck with all the quarters, the total number of coins might be the most relevant issue. Thus, to Dad, N = 116. Andy, on the other hand, is concerned with how much money is in the jar. If he were to articulate his point of view in statistical language, he would say that N = 29(dollars).

Example 13.3 N Is in the Eye of the Beholder

Page 9: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 9Copyright © 2010 Pearson Education, Inc.

The word data is the plural of the Latin word datum, meaning “something given,” and in ordinary usage has a somewhat broader meaning than the one we will give it in this chapter. For our purposes we will use the word data as any type of information packaged in numerical form, and we will adhere to the standard convention that as a noun it can be used both in singular (“the data is…”), and plural (“the data are…”) forms.

Data

Page 10: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 10Copyright © 2010 Pearson Education, Inc.

The process of collecting data by going through every member of the population is called a census. The idea behind a census is simple enough, but in practice a census requires a great deal of “cooperation” from the population. For larger, more dynamic populations (wildlife, humans, etc.), accurate tallies are inherently difficult if not impossible, and in these cases the best we can hope for is a good estimate of the N-value.

Census

Page 11: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 11Copyright © 2010 Pearson Education, Inc.

The most notoriously difficult N-value question around is,”What is the N-value of the national population of the United States?” This is a question the United States Census tries to answer every 10 years–with very little success. The 2000 U.S.Census was the largest single peacetime undertaking of the federal government–it employed over 850,000 people and cost about $6.5 billion– and yet it missed counting between 3 and 4 million people.

Example 13.4 2000 Census Undercounts

Page 12: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 12Copyright © 2010 Pearson Education, Inc.

Given the critical importance of the U.S. Census and given the tremendous resources put behind the effort by the federal government, why is the head count so far off? How can the best intentions and tremendous resources of our government fail so miserably in an activity that on a smaller scale can be carried out by a child trying to buy a base- ball glove?

Example 13.4 2000 Census Undercounts

Page 13: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 13Copyright © 2010 Pearson Education, Inc.

Nowadays, the notion that if we put enough money and effort into it, all individuals living in the United States can be counted like coins in a jar is unrealistic. In 1790, when the first U.S. Census was carried out, the population was smaller and relatively homogeneous, as people tended to stay in one place, and, by and large, they felt comfortable in their dealings with the government. Under these conditions it might have been possible for census takers to count heads accurately.

Taking a Census

Page 14: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 14Copyright © 2010 Pearson Education, Inc.

Today’s conditions are completely different. People are constantly on the move. Many distrust the government. In large urban areas many people are homeless or don’t want to be counted. And then there is the apathy of many people who think of a census form as another piece of junk mail.

Taking a Census

Page 15: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 15Copyright © 2010 Pearson Education, Inc.

If the Census undercount were consistent among all segments of the population, the undercount problem could be solved easily. Unfortunately, the modern U.S. Census is plagued by what is known as a differential undercount. Ethnic minorities, migrant workers, and the urban poor populations have significantly larger undercount rates than the population at large, and the undercount rates vary significantly within these groups.

Taking a Census

Page 16: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 16Copyright © 2010 Pearson Education, Inc.

Using modern statistical techniques, it is possible to make adjustments to the raw Census figures that correct some of the inaccuracy caused by the differential undercount, but in 1999 the Supreme Court ruled in Department of Commerce et al. v. United States House of Representatives et al. that only the raw numbers, and not statistically adjusted numbers, can be used for the purposes of apportionment of Congressional seats among the states.

Taking a Census

Page 17: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 17Copyright © 2010 Pearson Education, Inc.

13 Collecting Statistical Data

13.1 The Population

13.2 Sampling

13.3 Random Sampling

13.4 Sampling: Terminology and Key Concepts

13.5 The Capture-Recapture Method

13.6 Clinical Studies

Page 18: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 18Copyright © 2010 Pearson Education, Inc.

The practical alternative to a census is to collect data only from some members of the population and use that data to draw conclusions and make inferences about the entire population. Statisticians call this approach a survey (or a poll when the data collection is done by asking questions). The subgroup chosen to provide the data is called the sample, and the act of selecting a sample is called sampling.

A Survey

Page 19: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 19Copyright © 2010 Pearson Education, Inc.

Ideally, every member of the population should have an opportunity to be chosen as part of the sample, but this is possible only if we have a mechanism to identify each and every member of the population. In many situations this is impossible. Say we want to conduct a public opinion poll before an election. The population for the poll consists of all voters in the upcoming election, but how can we identify who is and is not going to vote ahead of time? We know who the registered voters are, but among this group there are still many nonvoters.

A Survey

Page 20: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 20Copyright © 2010 Pearson Education, Inc.

The first important step in a survey is to distinguish the population for which the survey applies (the target population) and the actual subset of the population from which the sample will be drawn, called the sampling frame. The ideal scenario is when the sampling frame is the same as the target population–that would mean that every member of the target population is a candidate for the sample. When this is impossible (or impractical), an appropriate sampling frame must be chosen.

A Survey

Page 21: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 21Copyright © 2010 Pearson Education, Inc.

The basic philosophy behind sampling is simple and well understood–if we have a sample that is “representative” of the entire population, then whatever we want to know about a population can be found out by getting the information from the sample. If we are to draw reliable data from a sample, we must (a) find a sample that is representative of the population, and (b) determine how big the sample should be. These two issues go hand in hand, and we will discuss them next.

Sampling

Page 22: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 22Copyright © 2010 Pearson Education, Inc.

Sometimes a very small sample can be used to get reliable information about a population, no matter how large the population is. This is the case when the population is highly homogeneous.

The more heterogeneous a population gets, the more difficult it is to find a representative sample.

Sampling

Page 23: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 23Copyright © 2010 Pearson Education, Inc.

When the choice of the sample has a built-in tendency (whether intentional or not) to exclude a particular group or characteristic within the population, we say that a survey suffers from selection bias. It is obvious that selection bias must be avoided, but it is not always easy to detect it ahead of time. Even the most scrupulous attempts to eliminate selection bias can fall short.

CASE STUDY 2 THE 1936 LITERARY DIGEST POLL

Page 24: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 24Copyright © 2010 Pearson Education, Inc.

In a typical survey it is understood that not every individual is willing to respond to the survey request (and in a democracy we cannot force them to do so). Those individuals who do not respond to the survey request are called nonrespondents, and those who do are called respondents. The percentage of respondents out of the total sample is called the response rate.

CASE STUDY 2 THE 1936 LITERARY DIGEST POLL

Page 25: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 25Copyright © 2010 Pearson Education, Inc.

For the Literary Digest poll, out of a sample of 10 million people who were mailed a mock ballot only about 2.4 million mailed a ballot back, resulting in a 24% response rate. When the response rate to a survey is low, the survey is said to suffer from nonresponse bias.

CASE STUDY 2 THE 1936 LITERARY DIGEST POLL

Page 26: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 26Copyright © 2010 Pearson Education, Inc.

One of the significant problems with the Literary Digest poll was that the poll was conducted by mail. This approach is the most likely to magnify nonresponse bias, because people often consider a mailed questionnaire just another form of junk mail. Of course, given the size of their sample, the Literary Digest hardly had a choice. This illustrates another important point: Bigger is not better, and a big sample can be more of a liability than an asset.

CASE STUDY 2 THE 1936 LITERARY DIGEST POLL

Page 27: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 27Copyright © 2010 Pearson Education, Inc.

The Literary Digest story has two morals:

(1) You’ll do better with a well-chosen small sample than with a badly chosen large one, and

(2) watch out for selection bias and nonresponse bias.

CASE STUDY 2 THE 1936 LITERARY DIGEST POLL

Page 28: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 28Copyright © 2010 Pearson Education, Inc.

One commonly used short-cut in sampling is known as convenience sampling. In convenience sampling the selection of which individuals are in the sample is dictated by what is easiest or cheapest for the data collector, never mind trying to get a representative sample. A classic example of convenience sampling is when interviewers set up at a fixed location such as a mall or outside a supermarket and ask passersby to be part of a public opinion poll.

Convenience Sampling

Page 29: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 29Copyright © 2010 Pearson Education, Inc.

A different type of convenience sampling occurs when the sample is based on self-selection–the sample consists of those individuals who volunteer to be in it. Self-selection is the reason why many Area Code 800 polls are not to be trusted. Convenience sampling is not always bad–at times there is no other choice or the alternatives are so expensive that they have to be ruled out.

Convenience Sampling

Page 30: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 30Copyright © 2010 Pearson Education, Inc.

We should keep in mind, however, that data collected through convenience sampling are naturally tainted and should always be scrutinized (that’s why we always want to get to the details of how the data were collected). More often than not, convenience sampling gives us data that are too unreliable to be of any scientific value. With data, as with so many other things, you get what you pay for.

Convenience Sampling

Page 31: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 31Copyright © 2010 Pearson Education, Inc.

Quota sampling is a systematic effort to force the sample to be representative of a given population through the use of quotas–the sample should have so many women, so many men, so many blacks, so many whites, so many people living in urban areas, so many people living in rural areas, and so on. The proportions in each category in the sample should be the same as those in the population.

Quota Sampling

Page 32: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 32Copyright © 2010 Pearson Education, Inc.

If we can assume that every important characteristic of the population is taken into account when the quotas are set up, it is reasonable to expect that the sample will be representative of the population and produce reliable data.

Quota Sampling

Page 33: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 33Copyright © 2010 Pearson Education, Inc.

What’s wrong with quota sampling? After all, the basic idea behind it appears to be a good one: Force the sample to be a representative cross section of the population by having each important characteristic of the population proportionally represented in the sample.

CASE STUDY 3 THE 1948 PRESIDENTIAL ELECTION

Page 34: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 34Copyright © 2010 Pearson Education, Inc.

Where do we stop? No matter how careful we might be, we might miss some criterion that would affect the way people vote, and the sample could be deficient in this regard. An even more serious flaw in quota sampling is that, other than meeting the quotas, the interviewers are free to choose whom they interview. This opens the door to selection bias. Looking back over the history of quota sampling, we can see a clear tendency to overestimate the Republican vote.

CASE STUDY 3 THE 1948 PRESIDENTIAL ELECTION

Page 35: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 35Copyright © 2010 Pearson Education, Inc.

13 Collecting Statistical Data

13.1 The Population

13.2 Sampling

13.3 Random Sampling

13.4 Sampling: Terminology and Key Concepts

13.5 The Capture-Recapture Method

13.6 Clinical Studies

Page 36: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 36Copyright © 2010 Pearson Education, Inc.

The best alternative to human selection is to let the laws of chance determine the selection of a sample. Sampling methods that use randomness as part of their de- sign are known as random sampling methods, and any sample obtained through random sampling is called a random sample (or a probability sample).

Random Sampling

Page 37: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 37Copyright © 2010 Pearson Education, Inc.

The most basic form of random sampling is called simple random sampling. It is based on the same principle a lottery is. Any set of numbers of a given size has an equal chance of being chosen as any other set of numbers of that size. In theory, simple random sampling is easy to implement. We put the name of each individual in the population in “a hat,” mix the names well, and then draw as many names as we need for our sample. Of course “a hat” is just a metaphor.

Simple Random Sampling

Page 38: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 38Copyright © 2010 Pearson Education, Inc.

These days, the “hat” is a computer database containing a list of members of the population. A computer program then randomly selects the names. This is a fine idea for small, compact populations, but a hopeless one when it comes to national surveys and public opinion polls. For most public opinion polls–especially those done on a regular basis” the time and money needed to do this are simply not available.

Simple Random Sampling

Page 39: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 39Copyright © 2010 Pearson Education, Inc.

The alternative to simple random sampling used nowadays for national surveys and public opinion polls is a sampling method known as stratified sampling. The basic idea of stratified sampling is to break the sampling frame into categories called strata, and then (unlike quota sampling) randomly choose a sample from these strata. The chosen strata are then further divided into categories, called substrata, and a random sample is taken from these substrata.

Stratified Sampling

Page 40: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 40Copyright © 2010 Pearson Education, Inc.

The selected substrata are further subdivided, a random sample is taken from them, and so on. The process goes on for a predetermined number of steps (usually four or five).

Stratified Sampling

Page 41: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 41Copyright © 2010 Pearson Education, Inc.

In national public opinion polls the strata and substrata are defined by a combination of geographic and demographic criteria. For example, the nation is first divided into “size of community” strata (big cities, medium cities, small cities, villages, rural areas, etc.). The strata are then subdivided by geographical region (New England, Middle Atlantic, East Central, etc.).

This is the first layer of substrata.

CASE STUDY 4 NATIONAL PUBLIC OPINION POLLS

Page 42: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 42Copyright © 2010 Pearson Education, Inc.

The efficiency of stratified sampling compared with simple random sampling in terms of cost and time is clear. The members of the sample are clustered in well-defined and easily manageable areas, significantly reducing the cost of conducting interviews as well as the response time needed to collect the data. For a large, heterogeneous nation like the United States, stratified sampling has generally proved to be a reliable way to collect national data.

CASE STUDY 4 NATIONAL PUBLIC OPINION POLLS

Page 43: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 43Copyright © 2010 Pearson Education, Inc.

What about the size of the sample? Surprisingly, it does not have to be very large. Typically, a Gallup poll is based on samples consisting of approximately 1500 individuals, and roughly the same size sample can be used to poll the populations of a small city as the population of the United States. The size of the sample does not have to be proportional to the size of the population.

CASE STUDY 4 NATIONAL PUBLIC OPINION POLLS

Page 44: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 44Copyright © 2010 Pearson Education, Inc.

13 Collecting Statistical Data

13.1 The Population

13.2 Sampling

13.3 Random Sampling

13.4 Sampling: Terminology and Key Concepts

13.5 The Capture-Recapture Method

13.6 Clinical Studies

Page 45: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 45Copyright © 2010 Pearson Education, Inc.

As we now know, except for a census, the common way to collect statistical information about a population is by means of a survey. When the survey consists of asking people their opinion on some issue, we refer to it as a public opinion poll. In a survey, we use a subset of the population, called a sample, as the source of our information, and from this sample, we try to generalize and draw conclusions about the entire population.

Survey

Page 46: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 46Copyright © 2010 Pearson Education, Inc.

Statisticians use the term statistic to describe any kind of numerical information drawn from a sample. A statistic is always an estimate for some unknown measure, called a parameter, of the population. A parameter is the numerical information we would like to have. Calculating a parameter is difficult and often impossible, since the only way to get the exact value for a parameter is to use a census. If we use a sample, then we can get only an estimate for the parameter, and this estimate is called a statistic.

Statistic versus Parameter

Page 47: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 47Copyright © 2010 Pearson Education, Inc.

We will use the term sampling error to describe the difference between a parameter and a statistic used to estimate that parameter. In other words, the sampling error measures how much the data from a survey differs from the data that would have been obtained if a census had been used.

Sampling error can be attributed to two factors: chance error and sampling bias.

Sampling Error

Page 48: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 48Copyright © 2010 Pearson Education, Inc.

Chance error is the result of the basic fact that a sample, being just a sample, can only give us approximate information about the population. In fact, different samples are likely to produce different statistics for the same population, even when the samples are chosen in exactly the same way–a phenomenon known as sampling variability. While sampling variability, and thus chance error, are unavoidable, with careful selection of the sample and the right choice of sample size they can be kept to a minimum.

Chance Error

Page 49: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 49Copyright © 2010 Pearson Education, Inc.

Sample bias is the result of choosing a bad sample and is a much more serious problem than chance error. Even with the best intentions, getting a sample that is representative of the entire population can be very difficult and can be affected by many subtle factors. Sample bias is the result. As opposed to chance error, sample bias can be eliminated by using proper methods of sample selection.

Sampling Bias

Page 50: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 50Copyright © 2010 Pearson Education, Inc.

Last, we shall make a few comments about the size of the sample, typically denoted by the letter n (to contrast with N, the size of the population). The ratio n/N is called the sampling proportion. A sampling proportion of x% tells us that the size of the sample is intended to be x% of the population. Some sampling methods are conducive for choosing the sample so that a given sampling proportion is obtained.

Sampling Proportion

Page 51: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 51Copyright © 2010 Pearson Education, Inc.

Typically, modern public opinion polls use samples of n between 1000 and 1500 to get statistics that have a margin of error of less than 5%, be it for the population of a city, a region, or the entire country.

Sampling Proportion

Page 52: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 52Copyright © 2010 Pearson Education, Inc.

13 Collecting Statistical Data

13.1 The Population

13.2 Sampling

13.3 Random Sampling

13.4 Sampling: Terminology and Key Concepts

13.5 The Capture-Recapture Method

13.6 Clinical Studies

Page 53: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 53Copyright © 2010 Pearson Education, Inc.

We have already observed that finding the exact N-value of a large and elusive population can be extremely difficult and sometimes impossible. In many cases, a good estimate is all we really need, and such estimates are possible through sampling methods. The simplest sampling method for estimating the N-value of a population is called the capture-recapture method.

Finding the N-value

Page 54: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 54Copyright © 2010 Pearson Education, Inc.

■ Step 1. Capture (sample):Capture (choose) a sample of size n1, tag (mark, identify) the animals (objects, people), and release them back into the general population.

THE CAPTURE-RECAPTUREMETHOD

Page 55: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 55Copyright © 2010 Pearson Education, Inc.

■ Step 2. Recapture (resample):After a certain period of time, capture a new sample of size n2 and take an exact head count of the tagged individuals (i.e., those that were also in the first sample). Call this number k.

THE CAPTURE-RECAPTUREMETHOD

Page 56: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 56Copyright © 2010 Pearson Education, Inc.

■ Step 3. Estimate:The N-value of the population can be estimated to be approximately(n1• n2)/k.

THE CAPTURE-RECAPTUREMETHOD

Page 57: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 57Copyright © 2010 Pearson Education, Inc.

The capture-recapture method is based on the assumption that both the captured and recaptured samples are representative of the entire population. Under these assumptions, the proportion of tagged individuals in the recaptured sample is approximately equal to the proportion of the tagged individuals in the population. In other words, the ratio k/n2 is approximately equal to the ratio n1/N. From this we can solve for N and get N ≈ (n1• n2)/k.

Capture-Recapture Method

Page 58: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 58Copyright © 2010 Pearson Education, Inc.

13 Collecting Statistical Data

13.1 The Population

13.2 Sampling

13.3 Random Sampling

13.4 Sampling: Terminology and Key Concepts

13.5 The Capture-Recapture Method

13.6 Clinical Studies

Page 59: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 59Copyright © 2010 Pearson Education, Inc.

A survey typically deals with issues and questions that have direct and measurable answers. If the election were held today, would you vote for candidate X or candidate Y? How many people live in your household? How many catfish have tags? In these situations data collection involves some combination of observing, measuring, and recording but no active involvement or interference with the phenomenon being observed.

Survey

Page 60: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 60Copyright © 2010 Pearson Education, Inc.

A different type of data collection process is needed when we are trying to establish connections between a cause and an effect. Does taking a math class increase your chances of getting a good paying job? Does repeated exposure to secondhand smoke significantly increase your risk for developing lung cancer? Does a daily dose of aspirin reduce your chances of a heart attack? Do the benefits of hormone replacement therapy for women over 50 outweigh the risks?

Cause and Effect

Page 61: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 61Copyright © 2010 Pearson Education, Inc.

These kinds of cause-and-effect questions require observation over an extended period of time. Moreover, in these situations the data collection process requires the active involvement of the experimenter–in addition to observation, measurement, and recording, there is also treatment.

Cause and Effect

Page 62: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 62Copyright © 2010 Pearson Education, Inc.

When we want to know if a certain cause X produces a certain effect Y, we set up a study in which cause X is produced and its effects are observed. If the effect Y is observed, then it is possible that X was indeed the cause of Y. We have established an association between the cause X and the effect Y.The problem, however, is the nagging possibility that some other cause Z different from X produced the effect Y and that X had nothing to do with it.

Cause and Effect

Page 63: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 63Copyright © 2010 Pearson Education, Inc.

Just because we established an association, we have not established a cause-and-effect relation between the variables. Statisticians like to explain this by a simple saying: Association is not causation.

Cause and Effect

Page 64: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 64Copyright © 2010 Pearson Education, Inc.

One important type of study is called a clinical study or clinical trial. Generally, clinical studies are concerned with determining whether a single variable or treatment (usually a vaccine, a drug, therapy,etc.) can cause a certain effect (a disease, a symptom, a cure, etc.). The importance of such clinical studies is self-evident: Every new vaccine, drug, or treatment must prove itself by means of a clinical study before it is officially approved for public use.

Clinical Study

Page 65: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 65Copyright © 2010 Pearson Education, Inc.

Likewise, almost everything that is bad for us (cigarettes, caffeine, trans fats, etc.) gets its official certification of badness by means of a clinical study.Properly designing a clinical study can be both difficult and controversial, and as a result we are often bombarded with conflicting information produced by different studies examining the same cause-and-effect question. The basic principles guiding a clinical study, however, are pretty much established by statistical practice.

Clinical Study

Page 66: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 66Copyright © 2010 Pearson Education, Inc.

The first and most important issue in any clinical study is to isolate the cause (treatment, drug, vaccine, therapy, etc.) that is under investigation from all other possible contributing causes (called confounding variables) that could produce the same effect. Generally, this is best done by controlling the study.

Isolate the Cause

Page 67: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 67Copyright © 2010 Pearson Education, Inc.

In a controlled study, the subjects are divided into two different groups: the treatment group and the control group.The treatment group consists of those subjects receiving the actual treatment; the control group consists of subjects that are not receiving any treatment–they are there for comparison purposes only (that’s why the control group is sometimes also called the comparison group).

Controlled Study

Page 68: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 68Copyright © 2010 Pearson Education, Inc.

If a real cause-and-effect relationship exists between the treatment and the effect being studied, then the treatment group should show the effects of the treatment and the control group should not.To eliminate the many potential confounding variables that can bias its results, a well-designed controlled study should have control and treatment groups that are similar in every characteristic other than the fact that one group is being treated and the other one is not.

Controlled Study

Page 69: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 69Copyright © 2010 Pearson Education, Inc.

The most reliable way to get equally representative treatment and control groups is to use a randomized controlled study. In a randomized controlled study, the subjects are assigned to the treatment group or the control group randomly.

When the randomization part of a randomized controlled study is properly done, treatment and control groups can be assumed to be statistically similar.

Randomized Controlled Study

Page 70: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 70Copyright © 2010 Pearson Education, Inc.

But there is still one major difference between the two groups that can significantly affect the validity of the study–a critical confounding variable known as the placebo effect.The placebo effect follows from the generally accepted principle that just the idea that one is getting a treatment can produce positive results.

Placebo Effect

Page 71: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 71Copyright © 2010 Pearson Education, Inc.

Thus, when subjects in a study are getting a pill or a vaccine or some other kind of treatment, how can the researchers separate positive results that are consequences of the treatment itself from those that might be caused by the placebo effect?

When possible, the standard way to handle this problem is to give the control group a placebo.

Placebo

Page 72: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 72Copyright © 2010 Pearson Education, Inc.

A placebo is a make-believe form of treatment–a harmless pill, an injection of saline solution, or any other fake type of treatment intended to look like the real treatment. A controlled study in which the subjects in the control group are given a placebo is called a controlled placebo study.

Controlled Placebo Study

Page 73: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 73Copyright © 2010 Pearson Education, Inc.

By giving all subjects a seemingly equal treatment (the treatment group gets the real treatment and the control group gets a placebo that looks like the real treatment), we do not eliminate the placebo effect but rather control it–whatever its effect might be, it affects all subjects equally. It goes without saying that the use of placebos is pointless if the subject knows he or she is getting a placebo.

Controlled Placebo Study

Page 74: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 74Copyright © 2010 Pearson Education, Inc.

Thus, a second key element of a good controlled placebo study is that all subjects be kept in the dark as to whether they are being treated with a real treatment or a placebo. A study in which neither the members of the treatment group nor the members of the control group know to which of the two groups they belong is called a blind study.

Blind Study

Page 75: 13 Collecting Statistical Data

Excursions in Modern Mathematics, 7e: 13.1 - 75Copyright © 2010 Pearson Education, Inc.

To keep the interpretation of the results (which can often be ambiguous) totally objective, it is important that the scientists conducting the study and collecting the data also be in the dark when it comes to who got the treatment and who got the placebo. A controlled placebo study in which neither the subjects nor the scientists conducting the experiment know which subjects are in the treatment group and which are in the control group is called a double-blind study.

Double-Blind Study