Template for modules of the revised handbook


Theme: Survey errors under non–probability sampling

0 General information

0.1 Module code

Theme – Survey errors under non–probability sampling

0.2 Version history

Version | Date       | Description of changes | Author         | Institute
1.0     | 29-02-2012 | First version          | Andrzej Młodak | GUS (PL)
2.0     | 30-05-2012 | Second version         | Andrzej Młodak | GUS (PL)

0.3 Template version and print date

Template version used 1.0 p 3 d.d. 28-6-2011

Print date 8-5-2023 15:44


Contents

General section – Survey errors under non–probability sampling
1. Summary
2. General description
2.1. Main types of non–probability sampling methods and errors occurring in their use
2.2. Loss of accuracy due to bias
2.3. Balance between the gain in variance and the loss of precision due to bias
3. Design issues
4. Available software tools
5. Decision tree of methods
6. Glossary
7. Literature
A.1 Interconnections with other modules


General section – Survey errors under non–probability sampling1

1. Summary

This module is devoted to problems concerning the quality of surveys based on non–probability samples (also called convenience samples). Recall that (cf. Groves (1989), Fricker, Jr. (2006)) this notion implies that the probability of inclusion of every unit or respondent in the sample is unknown. Such samples are sometimes very useful in practice and, as the name implies, are often used because it is somehow convenient to do so. The reasons for their usefulness are various. Firstly, in classical probability samples, sampling probabilities are established by the researcher – arbitrarily or by special algorithms constructed by him. As Fricker, Jr. (2006) observed, non–probability samples occur when the probability that every unit or respondent was included in the sample cannot be determined. A distinguishing feature between probability and non–probability samples is who decides whether an individual is included in the sample. For probability samples, the surveyor chooses and applies the probability mechanism by which the sample is selected; individuals in the population have no control over this process.

Many books and papers are devoted to non–probability sampling designs. They list the pros and cons of the approach and introduce the concept of the superpopulation model. A wide spectrum of views on the usefulness of these methods can also be observed in this handbook. For instance, the author of the module “Some basic methods” in the chapter “Sample selection” argues that “because the selection of elements is non–random, non–probability sampling does not depend on the rationale of probability theory, thus it does not allow the estimation of sampling errors. These conditions give rise to exclusion bias, placing limits on how much information a sample can provide about the population. Information about the relationship between sample and population is limited, making it difficult to extrapolate from the sample to the population. Non–probability sampling includes accidental (haphazard, convenience) and purposive sampling. These methods are usually not applied in business statistics.”

1 I would like to express my gratitude to Mrs. Eva Elvers (Statistics Sweden), Mr. Paolo Righi (ISTAT, Italy) and Mr. Ioannis Nikolaidis (El–Stat, Greece) for very valuable comments and suggestions.


On the other hand, the author of the module “Design of sampling methods” in the same chapter, pointing out analogous problems, argues that efficient inference about the population from non–probability sampling is possible (also in business statistics), but it requires the characteristics of the population to follow a model or to be evenly or randomly distributed over the population. Statistics Canada (2003) provides a wider discussion of the advantages and drawbacks of these methods. The advantages are that they offer a quick and convenient way of drawing samples, are inexpensive, do not require a survey frame and can be useful for exploratory studies and survey development. The downside is that making inferences about the population requires strong assumptions about the representativeness of the sample. To be more precise, without special additional steps we cannot be sure whether the use of a non–probability sample warrants conclusions about the entire population. Moreover, it is impossible to determine the probability that a unit in the population is selected for the sample, so reliable estimates and estimates of sampling error cannot be computed.

In practice, the use of non–probability sampling methods generates samples which are usually concentrated in a relatively small area, so the cost of interviewers’ work is rather low. A common design is for the interviewer to subjectively decide who should be sampled (Biggeri and Falorsi (2006)). For example, Milligan et al. (2004) describe the non–probability sampling used in the EPI (Expanded Program of Immunization). This method involves choosing random clusters consisting of territorial areas containing the population to be studied, with probability proportional to the number of units in a given cluster. Next, in each cluster a team of interviewers locates the central point, samples a direction from this point to the border of the cluster and interviews all units located on the line segment connecting these two points. This approach is seriously burdened by arbitrary decisions or errors made by interviewers – a much more important problem than in the case of classical probability sampling. Even the modification known as the compact segment method (Turner et al. (1996)), where the choice of direction made by the interviewer is replaced by a special subdivision of a given cluster into segments – although better – is not entirely free from such errors. The method can also be applied in business statistics, for surveys of economic entities (especially smaller ones) spread across various territorial areas, so these problems are relevant in this case as well.
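A minimal sketch of the first stage of the scheme described above – selecting clusters with probability proportional to the number of units they contain – might look as follows. The cluster sizes and the cumulative-sum selection routine are illustrative assumptions, not taken from Milligan et al. (2004).

```python
import random

def pps_sample(cluster_sizes, n_clusters, rng=random.Random(42)):
    """Draw cluster indices with probability proportional to the number
    of units each cluster contains (with replacement), by placing a
    uniform draw on the cumulative-size scale."""
    total = sum(cluster_sizes)
    cum = []
    running = 0
    for size in cluster_sizes:
        running += size
        cum.append(running)  # cluster i covers the interval (cum[i-1], cum[i]]
    chosen = []
    for _ in range(n_clusters):
        u = rng.uniform(0, total)
        for idx, bound in enumerate(cum):
            if u <= bound:
                chosen.append(idx)
                break
    return chosen

# Example: 5 territorial clusters of unequal size; larger clusters
# are selected more often on average.
sizes = [100, 400, 50, 250, 200]
print(pps_sample(sizes, 3))
```

After this stage, interviewing along a line segment inside each selected cluster is where the arbitrary interviewer decisions (and hence the errors discussed above) enter.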

Statistics Canada (2003) observed that in business statistics non–probability sampling is often used by market researchers as an inexpensive and quick alternative to probability sampling; they should, however, keep in mind all the weaknesses of such an approach. This type of sampling can also be applied as an auxiliary tool to better assess the quality of traditional statistical business surveys (i.e. those based on probability sampling) – for example, in pilot surveys of a new field to be studied, in preliminary studies of attitudes, behaviours and inclinations to respond on the part of responding units and their contact persons, or in ex post analyses facilitating a better understanding of the results of the main survey.


As regards quality assessment, the subjective choice of respondents by an interviewer (most often they are selected from units that are easily accessible or belong to his/her close environment) leads to the exclusion of a large part of the population and, as a result, to a large bias of the results. Moreover (cf. Statistics Canada (1993)), it can falsely reduce the apparent variability of the population due to a tendency to select ‘typical’ units and eliminate extreme values. Therefore, the main challenge in this domain consists in the proper recognition and measurement of these errors, which can contribute to their minimization. In business statistics, the problem is mainly evident in short–term statistics, where (because samples are small) high sampling errors are avoided ‘by definition’. To obtain satisfactory results, it is usually assumed (sometimes without knowing whether the assumption is correct) that the characteristics of the population follow a model or are evenly distributed over the population. In particular, survey errors are linked with model failures. In other words, if the researcher hypothetically uses a superpopulation model corresponding to the real superpopulation model, a suitable non–probability sampling does not produce survey errors but only sampling errors (variance of the estimates).
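The variance-deflating effect of selecting ‘typical’ units mentioned above can be illustrated with a small numerical sketch; the population values and the ‘typicality’ rule below are invented for illustration only.

```python
import statistics

# Hypothetical skewed population of, say, turnover values.
population = [5, 7, 8, 9, 10, 10, 11, 12, 13, 15, 40, 80, 150]

# A 'convenience' interviewer keeps only units close to the median,
# eliminating the extreme values.
median = statistics.median(population)
typical_sample = [y for y in population if abs(y - median) <= 5]

print(statistics.mean(population), statistics.pstdev(population))
print(statistics.mean(typical_sample), statistics.pstdev(typical_sample))
```

Both the mean and, even more strongly, the dispersion of the ‘typical’ sample understate the population values: exactly the false reduction of apparent variability described above.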

On the other hand, non–probability sampling often requires much less time, effort and cost than probability sampling. Despite having less importance for statistical inference, responses obtained from non–probability sampling can be used to analyze various hypotheses, identify problems, define alternatives, collect additional data, etc. They are, however, affected by various errors. The main aspects of quality assessment described below are the bias associated with some specific types of these methods, the loss of accuracy due to this bias, the assessment of the Mean Square Error of estimators and variance properties, as well as the balance between the gain in variance and the loss of precision due to bias. Properties of the various types of non–probability sampling methods (quota sampling, cut–off sampling, volunteer sampling, judgement sampling, etc.) are described from this point of view later on.


We will describe the most important problems occurring in the main types of non–probability sampling, i.e. quota sampling, judgemental sampling and cut–off sampling. The main principles of these methods will, of course, be recalled, but their detailed description can be found in the chapter “Sample selection” of this handbook or in the papers by Statistics Canada (2003), Eurostat (2008), Renssen (1998), Dalen (2005), and Montaquila and Kalton (2011). It is worth noting that non–probability sampling is used in the model–based approach, in which the model drives the inference while random choices (such as random selection) are neglected. The process is well explained in the book by Valliant, Dorfman and Royall (2000), who suggest using random choices when the researcher knows that the model used does not fit the real model. The issue is quite complex; we will introduce the concept of modelling to describe the survey errors later on.

2. General description

2.1. Main types of non–probability sampling methods and errors occurring in their use

As noted in the previous section and in the module on the design of sampling methods, the non–probability approach is sometimes very useful in practice, because it often requires much less time and effort than other methods and is thus usually less costly, but it generally does not support statistical inference. However, responses from a convenience sample might be useful in developing research hypotheses, identifying issues, defining ranges of alternatives, or collecting other sorts of non–inferential data.

Non–probability sampling is commonly used when the samples are small and rapid estimates are wanted, such as in short–term statistics. So, when applying non–probability sampling, although we avoid high sampling errors (since the samples are small), bias appears. Non–probability sampling relies on a subjective method of selecting the sample, so some population elements cannot be selected at all: for this class of methods, the inclusion probabilities for a part of the population are equal to zero. As a result, a bias appears that affects the total survey error. Additionally, if we apply some types of these methods (such as balanced selection or quota sampling), we suppose that the sample distribution of the main survey characteristics will coincide with the relevant population distributions. Of course, these two distributions may not coincide, and therefore bias exists. This bias increases the total survey error.

Without a short review of the principles underlying such a methodology (especially the reasons for and types of the resulting errors) we cannot fully understand the problems and methods of quality assessment that take them into account. This subsection therefore contains a short overview of the basic features of particular non–probability sampling methods, together with the reasons for and sources of errors occurring when they are applied. Only those aspects which are interesting from the point of view of quality assessment and directly connected with these topics are discussed. The overview is followed by general remarks on how the scale of the errors can be assessed.


Quota sampling is perceived as the non–probability equivalent of stratified sampling. The strata are replaced with arbitrarily established population cells (called quota cells), and some – also arbitrarily chosen – units in each cell are surveyed (cf. Bergdahl et al. (2001)). Statistics Canada (2003) indicates that quota sampling is a means of satisfying sample size objectives for subpopulations. The quotas may be based on population proportions. For example, if there are 100 men and 100 women in the population and a sample of 20 is to be drawn, 10 men and 10 women may be interviewed. Thus, quota sampling is based on the inclusion of members from different subpopulations. Possible errors result from the fact that it is usually the interviewer who selects the units to be sampled. They can misjudge the current situation due to insufficient or inappropriate information, which cannot be verified. Hence, the chosen sample may be representative only with respect to one or a few features of the population, and the total bias of representativeness may be large. On the other hand, the problem of non–response can be observed. Of course, units that are unwilling to participate can simply be replaced by units that agree to reply, but this solution can seriously affect representativeness (the replacement units can differ from the original ones). Statistics Canada (2003) points out that market researchers often use relatively inexpensive and easy to manage quota sampling (particularly for telephone surveys) to survey individuals with particular socio–economic profiles. Rust (2004) suggests that to assess possible negative effect modifiers and confounders (and by the same token to use quota sampling efficiently), we should know the most relevant information in advance.
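The proportional quota allocation from the men/women example above can be expressed as a small routine. The largest-remainder rounding is an added assumption so that the quotas always sum exactly to the intended sample size.

```python
def quota_allocation(cell_counts, sample_size):
    """Allocate a total sample size across quota cells in proportion
    to their population counts, using largest-remainder rounding."""
    total = sum(cell_counts.values())
    exact = {c: sample_size * n / total for c, n in cell_counts.items()}
    alloc = {c: int(q) for c, q in exact.items()}
    # distribute the remaining units to the cells with largest remainders
    leftover = sample_size - sum(alloc.values())
    for c in sorted(exact, key=lambda c: exact[c] - alloc[c], reverse=True)[:leftover]:
        alloc[c] += 1
    return alloc

# The example from the text: 100 men and 100 women, sample of 20.
print(quota_allocation({"men": 100, "women": 100}, 20))  # {'men': 10, 'women': 10}
```

Note that the allocation only controls the *number* of units per cell; who ends up in each cell is still the interviewer's subjective choice, which is where the errors discussed above arise.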
In practice, there are two main ways of assessing the level of possible errors. The first is an analysis of the sampling frame and preliminary studies: we examine the structure of the units contained in the frame in terms of various aspects (including geographical location) and estimate the chance of including them in the sample; we can even conduct a relevant simulation study. On the other hand, if we have at our disposal data on the performance and results of previous survey rounds (if there have been any), we can use this information to help us recognize the main problems connected with the frame and with the behaviour and decisions of interviewers. We can then try to reduce them in current and future rounds.

Barnett (2011) correctly observes that the practical difficulties of conducting such sampling can lead to a lack of representativeness or randomness of the resulting samples. In his opinion, non–response from some selected units complicates the sampling scheme, and the use of conventional results for stratified sampling schemes (single– or multi–factor) may at best produce approximations to the actual, but often non–measurable, statistical properties of the quota sampling method in question. Bergdahl et al. (2001) suggest that one way of reducing these inconveniences may be the application of additional relevant stratifiers called controls (if they are available in the quota sample) and relevant poststratification within each cell. This solution can decrease the bias resulting from the weaknesses of the selection mechanism. The sample can, however, still be non–representative with respect to some other characteristics.
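A minimal sketch of the poststratification idea attributed above to Bergdahl et al. (2001) – reweighting the realized sample so that known control-group counts are respected – under invented data and group names:

```python
from collections import Counter

def poststratify(sample, pop_counts):
    """Assign each sampled unit the weight N_g / n_g of its control
    group g, so that weighted group shares match the population."""
    n_g = Counter(u["group"] for u in sample)
    return [{**u, "weight": pop_counts[u["group"]] / n_g[u["group"]]}
            for u in sample]

# Illustrative quota sample overrepresenting small firms.
sample = [{"group": "small", "y": 10}, {"group": "small", "y": 12},
          {"group": "small", "y": 11}, {"group": "large", "y": 100}]
pop_counts = {"small": 60, "large": 40}  # known control totals

weighted = poststratify(sample, pop_counts)
total = sum(u["weight"] * u["y"] for u in weighted)
print(total)  # weighted estimate of the population total of y
```

The reweighting removes the imbalance on the control variable, but – as the text notes – the sample can still be non–representative with respect to characteristics not used as controls.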


In haphazard sampling, the units are selected arbitrarily, without design. Statistics Canada (2003) compares it to the ‘man in the street’ interview, where the interviewer selects any person who happens to walk by. The bias and the possible errors result from the heterogeneity of the surveyed population: the structure and frequency of occurrence of particular groups (classes) of units is unknown, so such a selection can be either too homogeneous or too diversified. A good method of assessing the expected level of errors in such cases is observation of the place where a survey is to be conducted before actually doing it. In other words, we observe the units appearing in the ‘field of vision’ in several consecutive time periods before the survey. We can thus see whether the basic structure of the population is stable, what the differences between particular time moments are, and what factors affect them. Using these observations, we can estimate the expected systematic error of such a survey.

Purposive (judgement) sampling. As is well known, in this method the decision as to which units are to be sampled is made by an expert, who selects and forms what is considered to be a representative sample. Some analysts (e.g. Statistics Canada (2003)) argue that it is perhaps even more biased than haphazard sampling, since any preconceptions the researcher may have are reflected in the sample; large biases can be introduced if these preconceptions are inaccurate. In practice, the researcher may decide to draw the entire sample from one group of units which they regard as ‘representative’ (e.g. when the target population comprises all economic entities, the researcher can choose only those operating in a selected area perceived as a scaled–down version of the country). The inference then applies mainly to that area, but under some conditions (such as the above–mentioned ‘representativeness’ of the area) it can be generalized to the entire population. One method of quality assessment for such surveys is long–term experiments using various structures of units and observation of their change over time. For instance, most ‘trial’ elections conducted in previous years in Poland prior to official parliamentary or presidential elections, in several smaller towns regarded as small ‘representations’ of the country, demonstrated that such ‘model populations’ are very unstable over time. Hence, if one would like to adopt this approach in business statistics, the actual survey should be preceded by a preliminary ‘point’ survey on small groups of units.
If the frequency of such pre–surveys is sufficiently high, they should help to recognize all the regularities necessary for a proper estimate of the expected quality of the main survey. The analyses conducted by the authors of UNSTATS (2008) demonstrate that, although stratification criteria for probability sampling can also be subjective, the clearer sampling rule and the possibilities of (at least partially) optimizing the objectiveness make probability sampling much more efficient; the expected reduction of error can then be easily computed. Practical examples of these problems, in relation e.g. to the producer price index in some countries, are presented by Bergdahl et al. (2001).


Balanced sampling is used for some types of estimation (model–based or expansion). In this approach, quality assessment is mainly connected with the quality of the control variables with known population totals, used as auxiliary sources of information in sample selection (it is assumed that the sample mean of each control variable equals its population mean). This requires detailed knowledge about the sources of information on such auxiliary variables, the scope of their data gaps, the distribution of known values, and the precision of imputation (if it has been done). Moreover, their information value has to be verified (i.e. no pair of variables should carry the same information).
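The balance condition stated above (the sample mean of each control variable equals its population mean) can be checked directly. The control variables, data and tolerance below are illustrative assumptions, not part of any particular balanced-sampling algorithm.

```python
def is_balanced(sample, population, controls, tol=0.05):
    """Check whether, for every control variable, the sample mean lies
    within a relative tolerance of the population mean."""
    for x in controls:
        pop_mean = sum(u[x] for u in population) / len(population)
        smp_mean = sum(u[x] for u in sample) / len(sample)
        if abs(smp_mean - pop_mean) > tol * abs(pop_mean):
            return False
    return True

population = [{"employees": e, "turnover": t}
              for e, t in [(5, 50), (10, 100), (15, 150), (20, 200)]]
balanced = [population[0], population[3]]    # sample means equal pop means
unbalanced = [population[0], population[1]]  # sample means too low

print(is_balanced(balanced, population, ["employees", "turnover"]))    # True
print(is_balanced(unbalanced, population, ["employees", "turnover"]))  # False
```

A check of this kind only verifies balance on the chosen controls; it says nothing about variables outside the control set, which is precisely why the quality of the controls themselves matters.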

Cut–off sampling, an example of a probabilistic selection in which a part of the population has a selection probability equal to zero, is applied (as noted in the theme module of the chapter “Weighting and estimation” of this handbook) in business surveys, commonly for data collection targeting estimates used for calculating short–term indices (e.g. the turnover index) and for applying model–based estimators (ratio or regression estimators). Statistics Canada (2003) states that in cut–off sampling there is a deliberate exclusion of part of the target population from sample selection, because it would cost too much to sample the entire population and the bias caused by the cut–off is deemed negligible. As one can expect, the main task of quality assessment here is to determine the effect of excluding such a part of the population. That is, the following aspects should be taken into account: the distribution of the population in terms of the surveyed data (the sub–distributions both for the part of the population which is retained and for the part which is dropped), the financial and organizational cost (or gain) of such a reduction of the sampling frame, and the precision of estimates. Davies (2000) points out that the discussion of cut–off sampling starts under non–probability sampling and is continued here, emphasising the use of models to estimate for the part of the population that was cut off. In terms of the treatment of the excluded units, Bergdahl et al. (2001) propose either to ignore the cut–off units completely or to model them by ratios of the total of the target variable to the auxiliary variable, using relevant data collected in previous years. They show examples of surveys in business statistics and relevant simulation studies demonstrating the advantages and drawbacks of both options.
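The second treatment mentioned above – modelling the cut–off units through the ratio of the target-variable total to the auxiliary-variable total – can be sketched as follows; all numbers are invented, and the auxiliary variable x stands for, say, registered turnover known for every unit.

```python
def cutoff_estimate(surveyed, excluded_x):
    """Estimate the total of y as the observed total on the surveyed
    (above-cut-off) part plus a ratio-model prediction beta * x for
    every cut-off unit, where beta is the ratio of totals y/x
    computed on the surveyed part."""
    beta = sum(y for _, y in surveyed) / sum(x for x, _ in surveyed)
    observed_total = sum(y for _, y in surveyed)
    modelled_total = beta * sum(excluded_x)
    return observed_total + modelled_total

# (x = auxiliary variable; y = target variable)
surveyed = [(100, 110), (200, 190), (300, 330)]  # units above the cut-off
excluded_x = [10, 20, 15]                        # only x is known below it
print(cutoff_estimate(surveyed, excluded_x))
```

The alternative treatment (ignoring the cut–off units completely) corresponds to dropping the `modelled_total` term; the choice between the two is exactly the trade–off examined in the simulation studies cited above.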

Snowball sampling is often used when the desired sample characteristic is rare and it is extremely difficult or prohibitively expensive to locate respondents by other means. Fricker, Jr. (2006) correctly argues that, under these conditions, a random sample will be inefficient, because the resulting sample is unlikely to contain many (or perhaps even any) respondents with the desired characteristics. Snowball sampling may be realized through referrals from initial respondents, used to generate additional respondents. In this situation, quality assessment involves recognizing the cost of generating such respondents and the benefits of this solution (e.g. the gain in precision of estimation of population statistics). There are, however, situations where snowball sampling can be perceived as a special type of cut–off sampling. This happens when we exclude from the sample selection the part of the population in which the features of interest rarely occur, or which is not significant from the point of view of possible bias and gain in precision. The quality of such an option can be assessed not only by the decrease in costs and the precision of estimation (as in cut–off sampling) but also by the increase in the likelihood that the sample will not be representative (which is usually connected with its application).


The last (but not least) type of non–probability sampling is volunteer sampling, where the respondents are volunteers from specially chosen subpopulations (e.g. economic entities of a fixed size). Statistics Canada (2003) notes that this method can be subject to large selection biases, but it is sometimes necessary (e.g. in surveys of prosperity, where respondents are invited to express subjective opinions on their economic situation). In practice, the majority of possible respondents will usually be rather unwilling to respond, which can result in a large selection bias. Thus, quality assessment means recognizing the inclination of units to respond (and perhaps also to indicate other possible respondents), based on previous survey rounds.

2.2. Loss of accuracy due to bias

Särndal et al. (1992) pointed out that non–probability sampling (and the cut–off option in particular) produces biased estimates. Therefore, we should first pay attention to this aspect. A good starting point for this consideration is the paper by Benedetti et al. (2010). They consider the Hidiroglou estimator of a population parameter when the population is divided into three strata: a stratum U_C for which data are available from a census (or another exhaustive survey), a stratum U_S from which the units will be sampled for a sample survey, and a stratum U_E containing the units to be excluded. Here, the basic estimator of the population total of the variable y, of the form

t̂_y = Σ_{i ∈ U_C} y_i + Σ_{i ∈ s} w_i y_i,

where s is the sample drawn from U_S and the w_i are the sampling weights, is augmented by a model–based component that takes into account the part of the population which was excluded. Therefore, we assume that t_y = t_C + t_S + t_E (where t_C, t_S and t_E denote the totals of y over U_C, U_S and U_E, respectively). Using an auxiliary external variable x (e.g. computed on the basis of administrative data) for the excluded units, we can estimate the exclusion component by

δ̃ = Σ_{i ∈ U_E} x_i / (Σ_{i ∈ U_C} x_i + Σ_{i ∈ U_S} x_i).   (1)

Benedetti et al. (2010) note that, using these models, the complete estimator is given as

t̂_y = t_C + t̂_S + t̂_E = (1 + δ̃)(t_C + t̂_S), where t̂_E = δ̃(t_C + t̂_S).

They proved that (assuming that t̂_S is unbiased) the bias of this estimator can be assessed as

B(t̂_y) = (δ̃ − δ)(t_C + t_S),   (2)

where δ = t_E / (t_C + t_S) is the true value of the exclusion component.


Thus, the main source of the loss of accuracy due to bias in such a model is the absolute difference between the estimate δ̃ of the exclusion component and its true value δ.
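To make the estimator above concrete, the following minimal Python sketch computes t̂_y = (1 + δ̃)(t_C + t̂_S) on toy inputs. The function name, the argument layout and the weighted-sum form of t̂_S are illustrative assumptions, not code from Benedetti et al. (2010).

```python
def cutoff_estimate(y_C, y_s, weights, x_C, x_S, x_E):
    """Cut-off estimator t_y_hat = (1 + d_tilde) * (t_C + t_S_hat):
    a take-all total, a weighted sample estimate, and a model-based
    correction for the excluded stratum via the auxiliary variable x."""
    t_C = sum(y_C)                                       # census stratum total
    t_S_hat = sum(w * y for w, y in zip(weights, y_s))   # weighted estimate for U_S
    d_tilde = sum(x_E) / (sum(x_C) + sum(x_S))           # exclusion coefficient (1)
    return (1.0 + d_tilde) * (t_C + t_S_hat)
```

For instance, with a census total of 30, a weighted sample estimate of 10 and δ̃ = 0.25, the sketch returns 50.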

Benedetti et al. (2010) also discuss the problem of the optimal partitioning of the population into the three analyzed strata when there are many target variables (i.e. variables for which population statistics are to be estimated). To do this, they proposed an iterative simulated annealing (SA) algorithm based on labels assigned to the units according to the decreasing ‘temperature’ of the change of the optimal sample size, computed as a minimax of the values of this quantity treated as a function of the current partition for the target variables. They test their proposal using the example of a business survey, i.e. a monthly survey of red meat slaughtering performed by ISTAT (the Italian National Institute of Statistics) based on stratified sampling, with stratification by kind of slaughterhouse and geographical division, for a total of 5 strata. This division into strata is connected with the ownership sector of the companies (public or private) as well as their capacity and geographical location.
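The general idea of such a search can be sketched as follows. The label set, the starting partition, the cooling schedule and the cost function below are all illustrative stand-ins, not the actual criterion or algorithm of Benedetti et al. (2010).

```python
import math
import random

def anneal_partition(x, cost, T0=1.0, cooling=0.95, steps=2000, seed=1):
    """Toy simulated-annealing search over stratum labels
    (0 = take-all, 1 = sampled, 2 = excluded) for each unit.
    `cost(labels, x)` scores a partition, e.g. a sample-size proxy."""
    rng = random.Random(seed)
    labels = [1] * len(x)        # start with every unit in the sampling stratum
    cur = best = cost(labels, x)
    best_labels = labels[:]
    T = T0
    for _ in range(steps):
        i = rng.randrange(len(x))
        old = labels[i]
        labels[i] = rng.choice([l for l in (0, 1, 2) if l != old])
        new = cost(labels, x)
        # accept downhill moves always, uphill moves with a probability
        # that shrinks as the 'temperature' decreases
        if new <= cur or rng.random() < math.exp((cur - new) / max(T, 1e-12)):
            cur = new
            if cur < best:
                best, best_labels = cur, labels[:]
        else:
            labels[i] = old      # reject the move
        T *= cooling
    return best_labels, best
```

In the real application the cost would combine the optimal sample sizes of the target variables (in a minimax way); here any callable scoring a partition can be plugged in.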

Using the aforementioned results, we are able to model bias in other surveys based on non–probability sampling designs. For example, if snowball sampling is treated as a special case of cut–off sampling, these results can be applied directly. In haphazard or purposive sampling, U_E can be understood as the part of the population omitted by the interviewer or by the expert, respectively. In the case of snowball sampling with additionally generated respondents, U_E can be replaced by U_E′ and U_A (the population with the ‘bad’ units dropped and the subpopulation of the primarily excluded units from which the additional respondents are sampled, respectively). Hence the estimator can be rewritten as

t̂_y = t_C + t̂_S + t̂_A + t̂_E′ = (1 + δ̃′)(t_C + t̂_S + t̂_A), where t̂_E′ = δ̃′(t_C + t̂_S + t̂_A),

and (1) will be of the form

δ̃′ = Σ_{i ∈ U_E′} x_i / (Σ_{i ∈ U_C} x_i + Σ_{i ∈ U_S} x_i + Σ_{i ∈ U_A} x_i).   (3)
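The exclusion coefficient and its snowball modification can be computed directly from auxiliary totals, as in this short sketch; the function name and the argument order are assumptions for illustration only.

```python
def exclusion_coefficient(x_excluded, x_C, x_S, x_A=()):
    """Exclusion coefficient: auxiliary total of the still-excluded units
    divided by the auxiliary total of the covered part (census + sampled
    + additionally generated respondents, if any)."""
    return sum(x_excluded) / (sum(x_C) + sum(x_S) + sum(x_A))
```

With no additional respondents the function reproduces the basic coefficient; adding the auxiliary values of the generated respondents to the denominator yields the snowball variant.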

In the case of quota or volunteer sampling, the specific bias component – which has its source in the fact that the sample data cannot be extrapolated to the entire population – is much more difficult to model. In the case of quota sampling, we rely on classification variables X_1, X_2, …, X_m – which are used to divide the population into quota cells U_1, U_2, …, U_q from which the relevant subsamples s_1, s_2, …, s_q are drawn – and on an additional control variable x. Assuming now that S is the entire population, we can estimate the target variable statistics as t̂_y = δ̃ t̂_S, where

δ̃ = Σ_{j=1}^{q} Σ_{i ∈ U_j} x_i / Σ_{j=1}^{q} Σ_{i ∈ s_j} x_i.   (4)
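The quota-cell ratio can be sketched numerically as follows. The cell data structure and the reading of t̂_S as the subsample total of y are illustrative assumptions made for the sketch.

```python
def quota_ratio_estimate(cells):
    """Quota-cell ratio estimate t_y_hat = d_tilde * t_S_hat, where d_tilde is
    the ratio of the population auxiliary total to the subsample auxiliary
    total, summed over the quota cells.
    `cells` maps a cell id to (x_population_total, x_subsample, y_subsample)."""
    num = sum(x_pop for x_pop, _, _ in cells.values())       # population x total
    den = sum(sum(x_s) for _, x_s, _ in cells.values())      # subsample x total
    t_S_hat = sum(sum(y_s) for _, _, y_s in cells.values())  # subsample y total
    return (num / den) * t_S_hat
```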


In the case of volunteer sampling, modelling the bias can be very difficult. The only reasonable solution is to assess the average probability of choosing a ‘good’ next respondent using experience from previous survey rounds and treating it as δ̃. However (according to the suggestion expressed in the module “Design of sample selection”), also in the case of these methods, when compiling indices, some of the bias of short–term indices, changes over time or ratios between two periods is removed due to the subtraction of similar biases of the estimates of Y at different time points. Such removal of biases makes the comparison between potentially biased periodic surveys meaningful (Bell and Hillmer (1990), Kish (1994)).

As regards the choice of estimators like t̂_S, the module devoted to the design of sampling methods recommends model–based estimators (e.g. ratio estimators) for compiling indices in this context. Because of the small size of the samples for data collection, purposive sampling can effectively select the largest units, and cut–off stratified sampling may be preferred for randomly selecting elements above a certain size threshold (Särndal et al. (1992)). In principle, the bias of ratio or regression estimation depends, to a large extent, on the covariance between the target and auxiliary variables and vanishes for large sample sizes (see Rao (2000)). So, these are efficient tools.

2.3. Balance between the gain in variance and the loss of precision due to bias

The loss of precision due to bias can have an impact on the variance of estimation. Such connections can also be observed in the case of non–probability sampling. Both quality aspects are reflected in a complex error measure, such as the mean squared error (which is the sum of the variance and the squared bias). Knaub (2007) notes that cut–off sampling might be a good choice where the variance reduction more than offsets the introduction of a small bias. In his opinion, balanced sampling can also have lower bias, but in highly skewed establishment surveys, extraordinary non–sampling error in the responses of the smallest respondents, when they are required to respond too frequently, can make it impractical to use such data, thus forcing the use of a cut–off sample. His experiments (where the coefficient of heteroscedasticity for instances of imputation for non–response, as opposed to mass imputation for what is not in the sample, was systematically changed) demonstrate that for a balanced model–based sample and a cut–off model–based sample, ratio estimation has often produced useful results, just like the classical ratio estimator.
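The variance–bias balance can be explored with a small Monte-Carlo sketch. The population, the simple-random-sampling expansion estimator and all names below are illustrative assumptions, not Knaub's (2007) actual experiment.

```python
import random
import statistics

def simulate_mse(pop_y, pop_x, cutoff, n, reps=2000, seed=7):
    """Monte-Carlo MSE of a cut-off estimate: sample n units (SRS) from the
    part with x >= cutoff, model the cut-off part through the exclusion
    coefficient, and compare with the true total.  MSE = variance + bias^2."""
    rng = random.Random(seed)
    t_true = sum(pop_y)
    big = [i for i, x in enumerate(pop_x) if x >= cutoff]
    small = [i for i, x in enumerate(pop_x) if x < cutoff]
    d_tilde = sum(pop_x[i] for i in small) / sum(pop_x[i] for i in big)
    ests = []
    for _ in range(reps):
        s = rng.sample(big, min(n, len(big)))
        t_S_hat = len(big) / len(s) * sum(pop_y[i] for i in s)  # expansion under SRS
        ests.append((1.0 + d_tilde) * t_S_hat)
    bias = statistics.fmean(ests) - t_true
    var = statistics.pvariance(ests)
    return var + bias ** 2, bias, var
```

When y is proportional to x, the exclusion coefficient reproduces the cut-off part almost exactly, so the bias term nearly vanishes and the MSE is dominated by the variance.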

To formally visualize both aspects, we have to return to the basic models described by Benedetti et al. (2010). They have indicated that for cut–off sampling

V(t̂_y) = (1 + δ̃)² V(t̂_S).


The variance of t̂_y depends mainly on the variance of the estimator used to draw a sample from the stratum designed for a sample survey, taking into account the exclusion component. The bias is given by (2) and its square depends on the squared deviation of the estimate of the exclusion parameter from its true value. So, if the variance of t̂_S is sufficiently small (which can be achieved by traditional optimization methods), then a small value of the exclusion parameter can compensate for a larger bias, if it occurs. Using the Horvitz–Thompson estimator, one can easily determine the optimum size of the sample necessary to optimize the efficiency of the survey (cf. Särndal et al. (1992)).

In the case of haphazard or purposive sampling we have V(t̂_y) = (1 + δ̃′)² V(t̂_S + t̂_A), with the exclusion coefficient estimate δ̃′ defined as in (3). Hence, we should also minimize the variance of t̂_E′. For quota sampling V(t̂_y) = δ̃² V(t̂_S), with δ̃ defined as in (4). Here the balance between the gain in variance and the bias seems to be more easily obtainable.
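The role of the exclusion coefficient in the variance can be illustrated with a quick empirical check: multiplying an estimator by the constant (1 + δ̃) multiplies its variance by (1 + δ̃)². The distribution of t̂_S and the value of δ̃ below are arbitrary illustrations.

```python
import random
import statistics

# Empirical check of the scaling behind the cut-off variance: a small
# exclusion coefficient inflates the variance of t_S_hat only slightly.
rng = random.Random(3)
d_tilde = 0.25
draws = [rng.gauss(100.0, 10.0) for _ in range(10000)]  # simulated draws of t_S_hat
scaled = [(1.0 + d_tilde) * t for t in draws]
ratio = statistics.pvariance(scaled) / statistics.pvariance(draws)  # ~ (1.25)**2
```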

Statistics Canada (2003) correctly states that the gains in efficiency of estimates that rely on auxiliary data depend on how well the survey variables are correlated with the available auxiliary data. Therefore, in any case, the value of δ̃ depends on the proper choice of the auxiliary information. This requires good quality of the data and their strong correlation with the main subject of estimation.

Rust (2004) suggests that, to adjust for the measured effect, a modified post-stratification should be used. On the other hand, he argues that some of these problems can be minimized by analyzing whether the information can be obtained via a probability-based method.

3. Design issues


The elements of design indicated here are: the choice of the method of non–probability sampling (depending on the availability of experts, volunteers, interviewers, auxiliary data, etc.) and the estimation of bias and variance.

4. Available software tools

Typical tools, as in the case of classical estimation.

5. Decision tree of methods

Not applicable.

6. Glossary

Term | Definition | Source of definition (link) | Synonyms (optional)

Quota sampling

A type of non-probabilistic sampling where the researcher first identifies the strata and their proportions as they are represented in the population. Then, the choice of respondents is left up to the individual or individuals conducting the survey, where the interviewers are only required to fulfil specific quotas within each stratum.

Ronald D. Fricker, Jr. (2006)

Purposive sampling

A type of non-probabilistic sampling where the researcher selects the sample on the basis of his or her judgement.

Ronald D. Fricker, Jr. (2006), M. Bergdahl et al. (2001)

Judgement sampling, judgemental sampling.

Haphazard sampling

A type of non-probabilistic sampling where the units are selected in an arbitrary manner, without design. One can compare it with the ‘man in the street’ interview, where the interviewer selects any person who happens to walk by.

Statistics Canada (2003)

Balanced sampling

A type of non-probabilistic sampling used for some types of estimation (model-based or expansion). It is based on control variables with known population totals, used as an auxiliary source of information in sample selection.

Statistics Canada (2003)

Cut-off sampling

A type of non-probabilistic sampling (possibly combined with probabilistic selection) in which a part of the population is assigned a selection probability equal to zero, i.e. is excluded from the sample selection because the features of interest rarely occur there or because it is not significant from the point of view of possible bias and gain in precision.

Ronald D. Fricker, Jr. (2006), M. Bergdahl et al. (2001)

Volunteer sampling

A type of non-probabilistic sampling where the respondents are volunteers from a specially chosen subpopulation (e.g. economic entities of a fixed size).

Statistics Canada (2003)

Snowball sampling

A type of non-probabilistic sampling realized by referrals from initial respondents to generate additional respondents, or treated as a special type of cut–off sampling. It is used when the features of interest rarely occur in the part of the population that is excluded from sampling or when this part is not significant in

Ronald D. Fricker, Jr. (2006),


terms of possible bias and gain in precision.

Exclusion coefficient

A coefficient, estimated on the basis of auxiliary variables, assessing the impact of the subpopulation excluded from sampling on the bias and the variance of the estimates of totals.

Benedetti et al. (2010)


7. Literature

Barnett V. (2011), Guide to Statistical Information: Sampling for Surveys, The Higher Education Academy, Maths, Stats & OR Network, Royal Statistical Society Centre for Statistical Education, University of Plymouth, UK, http://www.mathstore.ac.uk/headocs/Sampling%20for%20Surveys.pdf

Bell, W. R. and Hillmer, S. C. (1990), The Time Series Approach to Estimation for Repeated Surveys, Survey Methodology, vol. 16, No. 2, pp. 195–212.

Benedetti R., Bee M., Espa G. (2010), A Framework for Cut–off Sampling in Business Survey Design, Journal of Official Statistics, vol. 26, pp. 651–671.

Bergdahl M., Black O., Bowater R., Chambers R., Davies P., Draper D., Elvers E., Full S., Holmes D., Lundqvist P., Lundström S., Nordberg L., Perry J., Pont M., Prestwood M., Richardson I., Skinner Ch., Smith P., Underwood C., Williams M. (2001), Model Quality Report in Business Statistics, General Editors: P. Davies, P. Smith, http://users.soe.ucsc.edu/~draper/bergdahl-etal-1999-v1.pdf


Biggeri L, Falorsi, P. D. (2006), A Probability Sample Strategy For Improving The Quality Of The Consumer Price Index Survey Using the Information of the Business Register, Working Paper No 12 Economic and Social Council, Economic Commission For Europe Statistical Commission Conference of European Statisticians Group of Experts on Consumer Price Indices, Eighth Meeting Geneva, 10–12 May 2006, Invited paper submitted by ISTAT, Italy, available at http://www.unece.org/fileadmin/DAM/stats/documents/ece/ces/ge.22/2006/mtg2/wp.12.e.pdf .

Dalen, J. (2005). Sampling Issues in Business Surveys. Pilot Project 1 of the European Community's Phare 2002 Multi Beneficiary Statistics Programme (Lot 1), document available at http://epp.eurostat.ec.europa.eu/portal/page/portal/quality/documents/QIS_PHARE2002_SAMPLING_ISSUES.pdf .

Davies P. (2000), Assessing the Quality of Business Statistics, Office for National Statistics, Proceedings from the Second International Conference on Establishment Surveys, June 17–21, 2000, Buffalo, New York, http://www.amstat.org/meetings/ices/2000/proceedings/S38.pdf

Eurostat (2000), Assessment of the Quality in Statistics, Doc. Eurostat/A4/Quality/00/General/Standard report, Statistical Office of European Communities, Luxembourg, http://www.unece.org/fileadmin/DAM/stats/documents/2000/11/metis/crp.3.e.pdf

Eurostat (2008). Survey sampling reference guidelines. Introduction to sample design and estimation techniques, http://epp.eurostat.ec.europa.eu/cache/ITY_OFFPUB/KS-RA-08-003/EN/KS-RA-08-003-EN.PDF

Fricker, R. D. Jr. (2006), Sampling Methods for Web and E-mail Surveys, Naval Postgraduate School, http://www.nps.navy.mil/orfacpag/resumePages/papers/frickerpa/Draft%20Internet%20Survey%20Sampling%20Chapter.pdf


Groves, R. M. (1989), Survey errors and survey costs, John Wiley & Sons, New York.

Kish, L. (1994), Multipopulation Survey Designs. International Statistical Review, 62, 167-186.

Knaub, J. R., Jr. (2007). Cutoff sampling and inference, InterStat, April 2007, document available at http://interstat.statjournals.net/YEAR/2007/articles/0704006.pdf .

Milligan P., Njie A., Bennett S. (2004), Comparison of two cluster sampling methods for health surveys in developing countries, International Journal of Epidemiology, vol. 33, pp. 469–476.

Statistics Canada (2003), Survey Methods and Practices, Catalogue No. 12-587-X, Ottawa, Canada, http://www.statcan.gc.ca/pub/12-587-x/12-587-x2003001-eng.pdf

Montaquila, J. M. and Kalton G. (2011), Sampling from Finite Populations. Westat 1600 Research Blvd., Rockville, MD 20850, U.S.A. Based on an article from Lovric, Miodrag (2011), International Encyclopedia of Statistical Science. Heidelberg: Springer Science + Business Media, LLC, http://statprob.com/encyclopedia/SamplingFromFinitePopulations.html .

Rao P. S. R. S. (2000), Sampling Methodologies with Applications, Chapman and Hall/CRC, Boca Raton, London, New York, Washington, U.S.A.

Renssen, R. H. (1998), A Course in Sampling Theory, Statistics Netherlands, available at http://www.cs.vu.nl/~stochgrp/aionetwerk/course.doc

Rust S. (2004), White Paper on Advantages and Limitations of Alternative Sampling Methods for the National Children’s Study, National Children’s Study Program Office, National Institute for Child Health and Human Development, Rockville, MD, U.S.A., available at http://www.nationalchildrensstudy.gov/about/organization/advisorycommittee/2004Jun/Pages/acmsamplingdesign-3.pdf

Särndal C.-E., Swensson B., Wretman J. (1992), Model Assisted Survey Sampling, Springer, New York, U.S.A.

Turner A. G., Magnani R. J., Shuaib M. (1996), A not quite as quick but much cleaner alternative to the expanded programme on immunization (EPI) cluster survey design, International Journal of Epidemiology, vol. 25, pp. 198–203.

UNSTATS (2008), Designing Household Survey Samples: Practical Guidelines, Studies in Methods, Series F No. 98, ST/ESA/STAT/SER.F/98, United Nations, Department of Economic and Social Affairs, Statistics Division, New York.

Valliant, R., Dorfman, A. H., Royall, R. M. (2000), Finite Population Sampling and Inference. A Prediction Approach, Wiley Series in Survey Methodology, John Wiley & Sons, Inc., New York, U.S.A.


Specific section – Theme: Survey errors under non–probability sampling

A.1 Interconnections with other modules

• Related themes described in other modules

1. Sample selection

2. Variance estimation

3. Estimation

• Methods explicitly referred to in this module

1. Use of administrative data

2. Design of sample selection

3. Sample selection

4. Ratio estimators

• Mathematical techniques explicitly referred to in this module

n/a

• GSBPM phases explicitly referred to in this module

GSBPM Phases 4.1, 5.2 – 5.6.

• Tools explicitly referred to in this module

n/a

• Process steps explicitly referred to in this module

n/a
