Sampling Final Copy

  • 8/9/2019 Sampling Final Copy

    INTRODUCTION TO SAMPLING

    Researchers usually cannot make direct observations of every individual in the

    population they are studying. Instead, they collect data from a subset of

    individuals (a sample) and use those observations to make inferences about the entire population.

    Ideally, the sample corresponds to the larger population on the characteristic(s)

    of interest. In that case, the researcher's conclusions from the sample are

    probably applicable to the entire population.

    This type of correspondence between the sample and the larger population is

    most important when a researcher wants to know what proportion of the

    population has a certain characteristic, like a particular opinion or a demographic feature. Public opinion polls that try to describe the percentage of

    the population that plans to vote for a particular candidate, for example, require

    a sample that is highly representative of the population.

    Probability samples and convenience samples

    Two general approaches to sampling are used in research. With probability sampling, all elements (e.g., persons, households) in the population have some opportunity of being included in the sample, and the mathematical probability

    that any one of them will be selected can be calculated.

    With nonprobability sampling, in contrast, population elements are selected on the basis of their availability (e.g., because they volunteered) or because of the

    researcher's personal judgment that they are representative. The consequence is

    that an unknown portion of the population is excluded (e.g., those who did not

    volunteer). One of the most common types of nonprobability sample is called a

    convenience sample, not because such samples are necessarily easy to recruit, but because the researcher uses whatever individuals are available rather than

    selecting from the entire population.

    Because some members of the population have no chance of being sampled, the

    extent to which a convenience sample, regardless of its size, actually

    represents the entire population cannot be known.

    Recruiting a probability sample is not always a priority for researchers. A

    scientist can demonstrate that a particular trait occurs in a population by

    documenting a single instance.

    For example, the assertion that all lesbians are mentally ill can be refuted by documenting the existence of even one lesbian who is free from

    psychopathology.

    Another situation in which a probability sample is not necessary is when a

    researcher wishes to describe a particular group in an exploratory way.

    For example, interviewing 25 people with AIDS (PWAs) about their

    experiences with HIV could provide valuable insights about stress and coping,

    even though it would not yield data about the proportion of PWAs in the general

    population who share those experiences.

    Types of probability samples

    Many strategies can be used to create a probability sample. Each starts with a

    sampling frame, which can be thought of as a list of all elements in the

    population of interest (e.g., names of individuals, telephone numbers, house addresses, census tracts). The sampling frame operationally defines the target

    population from which the sample is drawn and to which the sample data will

    be generalized.

    Probably the most familiar type of probability sample is the simple random sample, for which all elements in the sampling frame have an equal chance of selection, and sampling is done in a single stage with each element selected

    independently (rather than, for example, in clusters).

    Somewhat more common than simple random samples are systematic samples, which are drawn by starting at a randomly selected element in the sampling

    frame and then taking every nth element (e.g., starting at a random location in a telephone book and then taking every 100th name).

    In yet another approach, cluster sampling, a researcher selects the sample in stages, first selecting groups of elements, or clusters (e.g., city blocks, census

    tracts, schools), and then selecting individual elements from each cluster (e.g.,

    randomly or by systematic sampling).
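
    The three designs described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sampling frame of 1,000 made-up names, the grouping into 20 "blocks" of 50, and the function names are all hypothetical, and the systematic sample picks its random start within the first interval, which is one common variant of the procedure.

```python
import random

def simple_random_sample(frame, k, rng):
    """Draw k distinct elements; every element has the same chance of selection."""
    return rng.sample(frame, k)

def systematic_sample(frame, step, rng):
    """Pick a random start within the first `step` elements, then take every step-th element."""
    start = rng.randrange(step)
    return frame[start::step]

def cluster_sample(clusters, n_clusters, per_cluster, rng):
    """Stage 1: choose clusters at random. Stage 2: choose elements at random inside each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        members = clusters[name]
        sample.extend(rng.sample(members, min(per_cluster, len(members))))
    return sample

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame and clusters (assumptions for illustration only)
frame = [f"person_{i:04d}" for i in range(1000)]
clusters = {f"block_{b:02d}": frame[b * 50:(b + 1) * 50] for b in range(20)}

srs = simple_random_sample(frame, 10, rng)
sys_sample = systematic_sample(frame, 100, rng)
clu = cluster_sample(clusters, 4, 5, rng)
print(len(srs), len(sys_sample), len(clu))
```

    In all three cases the selection probability of each element can be worked out from the design, which is what makes them probability samples.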

    An Example

    Suppose some researchers want to find out which of two mayoral candidates is

    favored by voters. Obtaining a probability sample would involve defining the

    target population (in this case, all eligible voters in the city) and using one of

    many available procedures for selecting a relatively small number (probably

    fewer than 1,000) of those people for interviewing. For example, the researchers

    might create a systematic sample by obtaining the voter registration roster,

    starting at a randomly selected name, and contacting every 500th person thereafter. Or, in a more sophisticated procedure, the researchers might use a

    computer to randomly select telephone numbers from all of those in use in the

    city, and then interview a registered voter at each telephone number. (This

    procedure would yield a sample that represents only those people who have a

    telephone.)

    Several procedures would also be available for recruiting a convenience sample,

    but none of them would include the entire population as potential respondents.

    For example, the researchers might ascertain the voting preferences of their own

    friends and acquaintances. Or they might interview shoppers at a local mall. Or

    they might publish two telephone numbers in the local newspaper and ask

    readers to call either number in order to "vote" for one of the candidates. The

    important feature of these methods is that they would systematically exclude

    some members of the population (respectively, eligible voters who do not know

    the researchers, do not go to the shopping mall, and do not read the newspaper).

    Consequently, their findings could not be generalized to the population of city

    voters.

    Nonprobability samples

    The difference between nonprobability and probability sampling is that

    nonprobability sampling does not involve random selection and probability

    sampling does. Does that mean that nonprobability samples aren't representative

    of the population? Not necessarily. But it does mean that nonprobability

    samples cannot depend upon the rationale of probability theory. At least with a

    probabilistic sample, we know the odds or probability that we have represented

    the population well. We are able to estimate confidence intervals for the

    statistic. With nonprobability samples, we may or may not represent the

    population well, and it will often be hard for us to know how well we've done

    so. In general, researchers prefer probabilistic or random sampling methods

    over nonprobabilistic ones, and consider them to be more accurate and rigorous.

    However, in applied social research there may be circumstances where it is not

    feasible, practical or theoretically sensible to do random sampling. Here, we

    consider a wide range of nonprobabilistic alternatives.

    We can divide nonprobability sampling methods into two broad types:

    accidental or purposive. Most sampling methods are purposive in nature

    because we usually approach the sampling problem with a specific plan in

    mind. The most important distinctions among these types of sampling methods

    are the ones between the different types of purposive sampling approaches.

    Types of nonprobability samples

    Accidental, Haphazard or Convenience Sampling

    One of the most common methods of sampling goes under the various titles

    listed here. I would include in this category the traditional "man on the street"

    (of course, now it's probably the "person on the street") interviews conducted frequently by television news programs to get a quick (although

    nonrepresentative) reading of public opinion. I would also argue that the typical

    use of college students in much psychological research is primarily a matter of

    convenience. (You don't really believe that psychologists use college students

    because they believe they're representative of the population at large, do you?).

    In clinical practice, we might use clients who are available to us as our sample.

    In many research contexts, we sample simply by asking for volunteers. Clearly,

    the problem with all of these types of samples is that we have no evidence that

    they are representative of the populations we're interested in generalizing to --

    and in many cases we would clearly suspect that they are not.

    Purposive Sampling

    In purposive sampling, we sample with a purpose in mind. We usually would have one or more specific predefined groups we are seeking. For instance, have

    you ever run into people in a mall or on the street who are carrying a clipboard

    and who are stopping various people and asking if they could interview them?

    Most likely they are conducting a purposive sample (and most likely they are

    engaged in market research). They might be looking for Caucasian females between 30 and 40 years old. They size up the people passing by and anyone who

    looks to be in that category they stop to ask if they will participate. One of the

    first things they're likely to do is verify that the respondent does in fact meet the

    criteria for being in the sample. Purposive sampling can be very useful for

    situations where you need to reach a targeted sample quickly and where

    sampling for proportionality is not the primary concern. With a purposive

    sample, you are likely to get the opinions of your target population, but you are

    also likely to overweight subgroups in your population that are more readily

    accessible.

    All of the methods that follow can be considered subcategories of purposive

    sampling methods. We might sample for specific groups or types of people as in

    modal instance, expert, or quota sampling. We might sample for diversity as in

    heterogeneity sampling. Or, we might capitalize on informal social networks to

    identify specific respondents who are hard to locate otherwise, as in snowball

    sampling. In all of these methods we know what we want -- we are sampling

    with a purpose.

    Modal Instance Sampling

    In statistics, the mode is the most frequently occurring value in a distribution. In sampling, when we do a modal instance sample, we are sampling the most

    frequent case, or the "typical" case. In a lot of informal public opinion polls, for

    instance, they interview a "typical" voter. There are a number of problems with

    this sampling approach. First, how do we know what the "typical" or "modal" case is? We could say that the modal voter is a person who is of average age,

    educational level, and income in the population. But, it's not clear that using the

    averages of these is the fairest (consider the skewed distribution of income, for

    instance). And, how do you know that those three variables -- age, education,

    income -- are the only or even the most relevant for classifying the typical

    voter? What if religion or ethnicity is an important discriminator? Clearly,

    modal instance sampling is only sensible for informal sampling contexts.
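
    The point about skewed income can be made concrete with a small sketch. The ten income figures below are invented for illustration (in thousands per year); they are not data from any study. With one very high earner in the list, the mean, median, and mode each describe a different "typical" person.

```python
import statistics

# Hypothetical annual incomes in thousands; the single value of 400 skews the distribution
incomes = [22, 25, 25, 28, 30, 31, 35, 40, 48, 400]

mean_income = statistics.mean(incomes)      # pulled upward by the one extreme value
median_income = statistics.median(incomes)  # middle of the sorted values
modal_income = statistics.mode(incomes)     # most frequently occurring value

print(mean_income, median_income, modal_income)
```

    Here the mean is more than twice the median, so a voter chosen to match the "average" income would resemble almost nobody in this made-up population.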

    Expert Sampling

    Expert sampling involves the assembling of a sample of persons with known or

    demonstrable experience and expertise in some area. Often, we convene such a

    sample under the auspices of a "panel of experts." There are actually two

    reasons you might do expert sampling. First, because it would be the best way

    to elicit the views of persons who have specific expertise. In this case, expert

    sampling is essentially just a specific subcase of purposive sampling. But the

    other reason you might use expert sampling is to provide evidence for the

    validity of another sampling approach you've chosen. For instance, let's say you

    do modal instance sampling and are concerned that the criteria you used for defining the modal instance are subject to criticism. You might convene an

    expert panel consisting of persons with acknowledged experience and insight

    into that field or topic and ask them to examine your modal definitions and

    comment on their appropriateness and validity. The advantage of doing this is

    that you aren't out on your own trying to defend your decisions -- you have

    some acknowledged experts to back you. The disadvantage is that even the

    experts can be, and often are, wrong.

    Quota Sampling

    In quota sampling, you select people nonrandomly according to some fixed

    quota. There are two types of quota sampling: proportional and nonproportional. In proportional quota sampling you want to represent the major characteristics of the population by sampling a proportional amount of each. For

    instance, if you know the population has 40% women and 60% men, and that

    you want a total sample size of 100, you will continue sampling until you get

    those percentages and then you will stop. So, if you've already got the 40

    women for your sample, but not the sixty men, you will continue to sample men

    but even if legitimate women respondents come along, you will not sample

    them because you have already "met your quota." The problem here (as in much

    purposive sampling) is that you have to decide the specific characteristics on

    which you will base the quota. Will it be by gender, age, education, race,

    religion, etc.?

    Nonproportional quota sampling is a bit less restrictive. In this method, you

    specify the minimum number of sampled units you want in each category. Here,

    you're not concerned with having numbers that match the proportions in the

    population. Instead, you simply want to have enough to assure that you will be

    able to talk about even small groups in the population. This method is the

    nonprobabilistic analogue of stratified random sampling in that it is typically

    used to assure that smaller groups are adequately represented in your sample.
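
    Proportional quota sampling, using the 40% women and 60% men example with a total of 100, can be sketched as follows. The stream of passers-by, the field names, and both function names are assumptions made for illustration. Respondents are accepted in the order they happen to arrive, which is exactly why the result is not a probability sample.

```python
import random

def proportional_quotas(shares, total):
    """Convert population shares (fractions) into per-group quotas.
    Rounding may make the quotas sum to slightly more or less than `total` in general."""
    return {group: round(share * total) for group, share in shares.items()}

def quota_sample(stream, quotas):
    """Accept respondents in arrival order until every group's quota is full."""
    counts = {group: 0 for group in quotas}
    accepted = []
    for person in stream:
        group = person["group"]
        if group in counts and counts[group] < quotas[group]:
            accepted.append(person)
            counts[group] += 1
        # Anyone whose group is already full is turned away ("met your quota")
        if counts == quotas:
            break
    return accepted, counts

rng = random.Random(0)
# Hypothetical stream of people who happen to walk past the interviewer
stream = [{"id": i, "group": rng.choice(["women", "men"])} for i in range(5000)]

quotas = proportional_quotas({"women": 0.40, "men": 0.60}, total=100)
accepted, counts = quota_sample(stream, quotas)
print(quotas, counts)
```

    For the nonproportional variant, the same loop would work with quotas set to the minimum number wanted in each category instead of population shares.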

    Heterogeneity Sampling

    We sample for heterogeneity when we want to include all opinions or views,

    and we aren't concerned about representing these views proportionately.

    Another term for this is sampling for diversity. In many brainstorming or nominal group processes (including concept mapping), we would use some

    form of heterogeneity sampling because our primary interest is in getting a broad

    spectrum of ideas, not identifying the "average" or "modal instance" ones. In

    effect, what we would like to be sampling is not people, but ideas. We imagine

    that there is a universe of all possible ideas relevant to some topic and that we

    want to sample this population, not the population of people who have the ideas. Clearly, in order to get all of the ideas, and especially the "outlier" or unusual

    ones, we have to include a broad and diverse range of participants.

    Heterogeneity sampling is, in this sense, almost the opposite of modal instance

    sampling.

    Snowball Sampling

    In snowball sampling, you begin by identifying someone who meets the criteria

    for inclusion in your study. You then ask them to recommend others they

    may know who also meet the criteria. Although this method would hardly lead

    to representative samples, there are times when it may be the best method

    available. Snowball sampling is especially useful when you are trying to reach

    populations that are inaccessible or hard to find. For instance, if you are

    studying the homeless, you are not likely to be able to find good lists of

    homeless people within a specific geographical area. However, if you go to that

    area and identify one or two, you may find that they know very well who the

    other homeless people in their vicinity are and how you can find them.
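
    The referral process can be sketched as a breadth-first walk over a network of "who can point me to whom". The referral network below is entirely hypothetical, as are the function and variable names. The sketch also shows one reason such samples are not representative: people with no path from the starting respondent can never be reached.

```python
from collections import deque

def snowball_sample(referrals, seeds, max_size):
    """Follow referral chains breadth-first from the seeds until max_size people are recruited."""
    recruited = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(recruited) < max_size:
        person = queue.popleft()
        recruited.append(person)
        # Ask this participant to name others who meet the study criteria
        for contact in referrals.get(person, []):
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return recruited

# Hypothetical referral network (an assumption for illustration)
referrals = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "E": ["F"],
    "G": ["H"],  # a separate circle of acquaintances that the seed never leads to
}

wave = snowball_sample(referrals, seeds=["A"], max_size=10)
print(wave)
```

    Starting from "A", the walk recruits six people and never finds "G" or "H", even though they meet the criteria.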

    Sample Size

    The sample size is simply the number of subjects in the sample. If there is only one

    sample, the letter "N" is used to designate the sample size. If samples are taken

    from each of "a" populations, then the small letter "n" is used to designate size

    of the sample from each population. When there are samples from more than

    one population, N is used to indicate the total number of subjects sampled and is

    equal to (a)(n). If the sample sizes from the various populations are different,

    then n1 would indicate the sample size from the first population, n2 from the

    second, etc. The total number of subjects sampled would still be indicated by N.

    When correlations are computed, the sample size (N) refers to the number of

    subjects and thus the number of pairs of scores rather than to the total number of

    scores.

    The symbol N also refers to the number of subjects in the formulas for testing

    differences between dependent means. Again, it is the number of subjects, not

    the number of scores.

    Sample Size Calculator Terms: Confidence Interval &

    Confidence Level

    The confidence interval is the plus-or-minus figure usually reported in

    newspaper or television opinion poll results. For example, if you use a

    confidence interval of 4 and 47% of your sample picks an answer, you

    can be "sure" that if you had asked the question of the entire relevant population,

    between 43% (47-4) and 51% (47+4) would have picked that answer.

    The confidence level tells you how sure you can be. It is expressed as a

    percentage and represents how often the true percentage of the population who

    would pick an answer lies within the confidence interval. The 95% confidence level means you can be 95% certain; the 99% confidence level means you can

    be 99% certain. Most researchers use the 95% confidence level.

    When you put the confidence level and the confidence interval together, you

    can say that you are 95% sure that the true percentage of the population is

    between 43% and 51%. The wider the confidence interval you are willing to

    accept, the more certain you can be that the whole population answers would be

    within that range.

    For example, if you asked a sample of 1000 people in a city which brand of cola

    they preferred, and 60% said Brand A, you can be very certain that between 40

    and 80% of all the people in the city actually do prefer that brand, but you

    cannot be so sure that between 59 and 61% of the people in the city prefer the

    brand.
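
    The cola example can be checked numerically. The sketch below assumes a simple random sample and uses the usual normal approximation for a proportion, in which the plus-or-minus figure is z times the square root of p(1 - p)/n. The text does not say which formula its calculator uses, so this formula and the function names are assumptions.

```python
import math
from statistics import NormalDist

def margin_of_error(p, n, confidence=0.95):
    """Plus-or-minus figure for a sample proportion p with sample size n."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return z * math.sqrt(p * (1 - p) / n)

def confidence_for_half_width(p, n, half_width):
    """Approximate confidence level attached to a band of p +/- half_width."""
    standard_error = math.sqrt(p * (1 - p) / n)
    return 2 * NormalDist().cdf(half_width / standard_error) - 1

p, n = 0.60, 1000  # 60% of 1000 respondents prefer Brand A
moe_95 = margin_of_error(p, n)
wide = confidence_for_half_width(p, n, 0.20)    # the 40% to 80% band
narrow = confidence_for_half_width(p, n, 0.01)  # the 59% to 61% band

print(round(100 * moe_95, 1), wide, narrow)
```

    Under these assumptions the 95% margin is about 3 percentage points, the 40% to 80% band is essentially certain, and the 59% to 61% band holds a little less than half the time.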

    Factors that Affect Confidence Intervals

    There are three factors that determine the size of the confidence interval for a

    given confidence level:

    sample size, percentage, and population size.

    Sample Size

    The larger your sample size, the more sure you can be that their answers truly

    reflect the population. This indicates that for a given confidence level, the larger

    your sample size, the smaller your confidence interval. However, the

    relationship is not linear (i.e., doubling the sample size does not halve the

    confidence interval).
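
    A short table makes the non-linear relationship visible. The sketch again assumes the normal approximation for a proportion with the worst-case percentage of 50% and z = 1.96 for the 95% level; these choices and the names are assumptions, not something the text specifies.

```python
import math

Z_95 = 1.96  # conventional z-value for a 95% confidence level

def margin_of_error(n, p=0.5, z=Z_95):
    """Plus-or-minus figure for a proportion p estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# Print the interval half-width, in percentage points, for growing sample sizes
for n in (250, 500, 1000, 2000, 4000):
    print(n, round(100 * margin_of_error(n), 2))

ratio_double = margin_of_error(1000) / margin_of_error(2000)
ratio_quadruple = margin_of_error(1000) / margin_of_error(4000)
```

    Because the interval shrinks with the square root of the sample size, doubling the sample narrows it by a factor of about 1.41, and it takes four times the sample to halve it.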

    Percentage

    Your accuracy also depends on the percentage of your sample that picks a

    particular answer. If 99% of your sample said "Yes" and 1% said "No," the

    chances of error are remote, irrespective of sample size. However, if the

    percentages are 51% and 49% the chances of error are much greater. It is easier

    to be sure of extreme answers than of middle-of-the-road ones.

    When determining the sample size needed for a given level of accuracy you

    must use the worst case percentage (50%). You should also use this percentage

    if you want to determine a general level of accuracy for a sample you already have. To determine the confidence interval for a specific answer your sample

    has given, you can use the percentage picking that answer and get a smaller

    interval.
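
    The effect of the percentage on the required sample size can be sketched by inverting the same normal-approximation formula: n = z² p(1 - p) / e², where e is the desired plus-or-minus figure as a fraction. The formula, the 95% level, and the function name are assumptions for illustration, and a large population is assumed.

```python
import math

def required_sample_size(margin, p=0.5, z=1.96):
    """Smallest n whose plus-or-minus figure is at most `margin` (a fraction, e.g. 0.04)."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

worst_case = required_sample_size(0.04)             # p = 0.5, the worst case
even_split = required_sample_size(0.04, p=0.51)     # a 51% / 49% split
lopsided = required_sample_size(0.04, p=0.99)       # a 99% / 1% split

print(worst_case, even_split, lopsided)
```

    For a plus-or-minus figure of 4 points, the worst case needs about 600 respondents, while a 99% to 1% split needs only a couple of dozen under this approximation, which is why planning with 50% is the safe choice.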

    Population Size

    How many people are there in the group your sample represents? This may be

    the number of people in a city you are studying, the number of people who buy

    new cars, etc. Often you may not know the exact population size. This is not a

    problem. The mathematics of probability proves the size of the population is

    irrelevant unless the size of the sample exceeds a few percent of the total

    population you are examining. This means that a sample of 500 people is

    equally useful in examining the opinions of a state of 15,000,000 as it would be for a

    city of 100,000. For this reason, The Survey System ignores the population size

    when it is "large" or unknown. Population size is only likely to be a factor when you work with a relatively small and known group of people (e.g., the members

    of an association).
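
    One standard way to see why population size rarely matters is the finite population correction, which multiplies the plus-or-minus figure by the square root of (N - n)/(N - 1). Whether the calculator described here applies exactly this correction is not stated in the text, so treat the sketch below as an assumption-laden illustration; the function name and defaults are hypothetical.

```python
import math

def margin_of_error(n, population=None, p=0.5, z=1.96):
    """Plus-or-minus figure for a proportion, optionally with the finite population correction."""
    moe = z * math.sqrt(p * (1 - p) / n)
    if population is not None:
        # The correction is close to 1 unless n is a sizeable share of the population
        moe *= math.sqrt((population - n) / (population - 1))
    return moe

state = margin_of_error(500, population=15_000_000)
city = margin_of_error(500, population=100_000)
association = margin_of_error(500, population=1_000)

for label, value in (("state", state), ("city", city), ("association", association)):
    print(label, round(100 * value, 2))
```

    A sample of 500 gives nearly the same interval for the state and the city, but a clearly narrower one for a 1,000-member association, where the sample is half the population.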

    The confidence interval calculations assume you have a genuine random sample

    of the relevant population. If your sample is not truly random, you cannot rely

    on the intervals. Non-random samples usually result from some flaw in the

    sampling procedure. An example of such a flaw is to only call people during the

    day and miss almost everyone who works. For most purposes, the non-working

    population cannot be assumed to accurately represent the entire (working and

    non-working) population.

    Sampling error

    In statistics, sampling error or estimation error is the error caused by

    observing a sample instead of the whole population.

    An estimate of a quantity of interest, such as an average or percentage, will

    generally be subject to sample-to-sample variation. These variations in the possible sample values of a statistic can theoretically be expressed as sampling

    errors, although in practice the exact sampling error is typically unknown.

    Sampling error also refers more broadly to this phenomenon of random

    sampling variation.

    The likely size of the sampling error can generally be controlled by taking a

    large enough random sample from the population, although the cost of doing

    this may be prohibitive; see sample size and statistical power for more detail. If

    the observations are collected from a random sample, statistical theory provides

    probabilistic estimates of the likely size of the sampling error for a particular

    statistic or estimator. These are often expressed in terms of its standard error.
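
    Sample-to-sample variation can be observed directly by simulation. The sketch below builds a hypothetical population of 100,000 people in which 47% hold some opinion, draws many simple random samples of 500, and compares the spread of the sample percentages with the textbook standard error of a proportion. The population, the number of repetitions, and the variable names are all assumptions for illustration.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 1 = holds the opinion, 0 = does not
population = [1] * 47_000 + [0] * 53_000
true_p = sum(population) / len(population)

n = 500
estimates = []
for _ in range(2000):
    sample = rng.sample(population, n)  # simple random sample without replacement
    estimates.append(sum(sample) / n)

# Sampling error of each estimate: how far it lands from the true value
errors = [estimate - true_p for estimate in estimates]

empirical_se = statistics.pstdev(estimates)            # spread actually observed
theoretical_se = math.sqrt(true_p * (1 - true_p) / n)  # standard error from theory

print(round(empirical_se, 4), round(theoretical_se, 4))
```

    In a real study only one sample is drawn, so its exact error is unknown; the simulation shows that the standard error still describes how large that error is likely to be.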

    Sampling error can be contrasted with non-sampling error. Non-sampling error

    is a catch-all term for the deviations from the true value that are not a function

    of the sample chosen, including various systematic errors and any random errors

    that are not due to sampling. Non-sampling errors are much harder to quantify

    than sampling error.
