Types and Sources of Errors in Statistical Data

8/20/2019 Types and Sources of Errors in Statistical Data

http://slidepdf.com/reader/full/types-and-sources-of-errors-in-statistical-data 1/29

SADC Course in Statistics

Types and Sources of Errors in Statistical Data

Types of Errors

• In general, there are two types of errors:

a. non-sampling errors, and

b. sampling errors.

• It is important for a researcher to be aware of these errors, in particular non-sampling errors, so that they can be either minimised or eliminated from the data collected.

Non-sampling errors

– These are errors that arise during the course of all data collection activities.

– In summary, they have the following characteristics:

• they exist in both sample survey and census data.

• they are difficult to measure.

Sources of non-sampling errors

Non-sampling errors arise from:

• defects in the sampling frame.

• failure to identify the target population.

• non-response.

• responses given by respondents.

• data processing, and

• reporting, among others.

Defects in the sampling frame

• These result in coverage errors.

• Coverage errors occur when there is an omission, duplication or wrongful inclusion of units in the sampling frame.

• Omissions are referred to as "under coverage", while duplications and wrongful inclusions are called "over coverage".

• These errors are caused by sampling frames that are inaccurate, incomplete, inadequate or out of date, or that contain duplicates.

• Coverage errors may also occur in field operations, that is, when an enumerator misses several households or persons during the interviewing process.
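
The three kinds of frame defect can be checked mechanically when a reference list of the target population is available. The following is only a minimal sketch: the household identifiers are hypothetical, and in practice a complete list of the target population is rarely at hand, so such checks are usually made against partial sources.

```python
# Hypothetical unit identifiers; none of these come from the course material.
target_population = {"H01", "H02", "H03", "H04", "H05"}
sampling_frame = ["H01", "H02", "H02", "H04", "H09"]  # a list, so duplicates stay visible

frame_units = set(sampling_frame)

# Under coverage: units in the target population that the frame omits.
omissions = sorted(target_population - frame_units)

# Over coverage: units listed more than once, or units that do not belong.
duplicates = sorted({u for u in sampling_frame if sampling_frame.count(u) > 1})
wrongful_inclusions = sorted(frame_units - target_population)

print("omissions:", omissions)
print("duplicates:", duplicates)
print("wrongful inclusions:", wrongful_inclusions)
```

Keeping the frame as a list rather than a set is a deliberate choice here, since converting to a set first would silently hide the duplicates.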

Failure to Identify Target Population

• This occurs when the target population is not clearly defined because imprecise definitions or concepts are used, or when the survey population does not reflect the target population due to an inadequate sampling frame and poor coverage rules.

Response errors

• These result from data that have been requested, provided, received or recorded incorrectly.

• They may occur as a result of inefficiencies with the questionnaire, the interviewer, the respondent or the survey process.

a. Poor questionnaire design

• The content and wording of the questionnaire may be misleading, and the layout of the questionnaire may make it difficult to record responses accurately.

• As a rule, questions in a questionnaire should not be loaded, double-barrelled, misleading or ambiguous, and should be directly relevant to the objectives of the survey.

• It is essential to pilot test questionnaires to identify questionnaire flow and question wording problems, and to allow sufficient time for improvements to be made to the questionnaire.

Poor questionnaire design cont'd

• The questionnaire should then be re-tested to ensure changes made do not introduce other problems.

b. Interviewer bias

• An interviewer may influence the way a respondent answers survey questions.

• To prevent this, interviewers must be trained to remain neutral throughout the interviewing process and must pay close attention to the way they ask each question.

c. Respondent errors

• These arise through the respondent providing inaccurate or wrong information.

• They occur because of memory biases or respondents giving inaccurate or false information when they believe that they are protecting their personal interests or integrity.

• They can also arise from the way the respondent interprets the questionnaire and the wording of the answer that the respondent gives.

• Careful questionnaire design and effective questionnaire testing can overcome these problems to some extent.

d. Problems with the survey process

• Errors can also occur because of problems with the actual survey process, such as using proxy responses (that is, taking answers from someone other than the respondent) or lacking control over the survey procedure.

Non-Response 

• Non-response results when data are not collected from respondents.

• The proportion of these non-respondents in the sample is called the non-response rate.

• Non-response can be either total or partial.

• Total non-response, or unit non-response, can arise if:

– the respondent cannot be contacted (because the sampling frame is incomplete or out of date),

– the respondent is not at home,

– the respondent is unable to respond because of language difficulties or illness,

– the respondent refuses outright to answer any questions, or

– the dwelling unit is vacant.

• Other respondents may indicate that they simply do not have the time to complete the interview or survey form.
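
The non-response rate defined above is a simple proportion. As a minimal sketch, with a hypothetical helper name and made-up counts, it could be computed as follows:

```python
def non_response_rate(sample_size, respondents):
    """Proportion of sampled units from which no data was collected."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= respondents <= sample_size:
        raise ValueError("respondents must be between 0 and sample_size")
    return (sample_size - respondents) / sample_size


# Hypothetical survey: 500 households sampled, 430 interviewed.
rate = non_response_rate(500, 430)
print(f"non-response rate: {rate:.1%}")
```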

Non-response - cont'd

• When conducting surveys, it is important to document information on why a respondent has not responded.

• Partial non-response, or item non-response, can occur when a respondent replies to some but not all questions of the survey.

• This can arise due to memory problems, inadequate information or an inability to answer a particular question/section of the questionnaire.

• A respondent may refuse to answer if:

a. they find questions particularly sensitive, or if

b. they have been asked too many questions.
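
Item non-response can be summarised question by question, which helps to show which questions are sensitive or hard to answer. This is only an illustrative sketch: the question names and answers are hypothetical, and `None` is assumed to mark an unanswered item.

```python
# Hypothetical returned questionnaires; None marks a question left unanswered.
responses = [
    {"age": 34, "income": None, "household_size": 4},
    {"age": 51, "income": 1200, "household_size": None},
    {"age": 27, "income": None, "household_size": 2},
    {"age": 45, "income": 800, "household_size": 5},
]

questions = ["age", "income", "household_size"]

# Item non-response rate: share of returned questionnaires missing each item.
item_non_response = {
    q: sum(r[q] is None for r in responses) / len(responses) for q in questions
}

for q, share in item_non_response.items():
    print(f"{q}: {share:.0%}")
```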

Non-response - cont'd

• To reduce non-response, the following approaches can be used:

– taking care in questionnaire design through the use of simple questions.

– pilot testing of the questionnaire.

– explaining survey purposes and uses.

– assuring confidentiality of responses.

– public awareness activities, including discussions with key organisations and interest groups, news releases, media interviews and articles.

Processing errors

• These occur at various stages of data processing, such as data cleaning, data capture and editing.

• Data cleaning involves making preliminary checks before entering the data onto the processing system.

• Coder bias is usually a result of poor training or incomplete instructions, variability in coder performance and data entry errors.

Processing – cont'd

• Inadequate checking and quality management at this stage can introduce data loss (where data is not entered into the system) and data duplication (where the same data is entered into the system more than once), thus introducing errors in data.

• To minimise these errors, processing staff should be given adequate training, instructions and realistic workloads.
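
Data loss and data duplication can both be detected by reconciling what was entered against what was collected. The sketch below assumes each questionnaire carries a unique serial number; the serial numbers and variable names are hypothetical.

```python
from collections import Counter

# Hypothetical questionnaire serial numbers; illustrative only.
collected_forms = ["Q001", "Q002", "Q003", "Q004", "Q005"]
entered_records = ["Q001", "Q002", "Q002", "Q004"]

entry_counts = Counter(entered_records)

# Data loss: forms collected in the field but never entered into the system.
lost = [f for f in collected_forms if entry_counts[f] == 0]

# Data duplication: forms entered into the system more than once.
duplicated = [f for f, n in entry_counts.items() if n > 1]

print("not entered:", lost)
print("entered more than once:", duplicated)
```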

Time Period Bias

• This occurs when a survey is conducted during an unrepresentative time period.

• Survey timing is thus important, and failure to recognise this introduces errors in data.

Analysis and Estimation Errors

• Analysis errors include any errors that occur when the wrong analytical tools are used or when preliminary results are used instead of the final ones.

• Errors that occur during the publication of the data results are also considered analysis errors.

• Estimation errors occur when inappropriate or inaccurate weights are used in the estimation procedure, thus introducing errors to the data.

• They also occur when the wrong estimators are selected by the analyst.
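
The effect of weights on an estimate can be seen in a small example. This is a sketch under stated assumptions: the sample values are made up, and each weight is taken to be the inverse of the unit's selection probability, which is one common way of defining a design weight.

```python
# Hypothetical sample: (value, design weight). The weight is the number of
# population units each sampled unit represents (inverse selection probability).
sample = [
    (10.0, 10), (12.0, 10), (14.0, 10),  # urban units, sampled 1 in 10
    (4.0, 100),                          # rural unit, sampled 1 in 100
]


def weighted_mean(pairs):
    """Weighted mean of (value, weight) pairs."""
    total_weight = sum(w for _, w in pairs)
    return sum(v * w for v, w in pairs) / total_weight


# Ignoring the weights treats every sampled unit as equally representative.
unweighted = sum(v for v, _ in sample) / len(sample)
weighted = weighted_mean(sample)

print(f"unweighted mean: {unweighted:.2f}")
print(f"weighted mean:   {weighted:.2f}")
```

Here the unweighted mean over-represents the heavily sampled urban units, so dropping or mis-specifying the weights shifts the estimate even though no individual response is wrong.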

Reducing non-sampling errors

• Non-sampling errors can be minimised by adopting any of the following approaches:

– using an up-to-date and accurate sampling frame.

– careful selection of the time the survey is conducted.

– planning for follow-up of non-respondents.

– careful questionnaire design.

– providing thorough training and periodic retraining of interviewers and processing staff.

Reducing non-sampling errors cont'd

– designing good systems to capture errors that occur during the process of collecting data, sometimes called Data Quality Assurance Systems.

Sampling error

• Sampling error refers to the difference between the estimate derived from a sample survey and the true value that would result if a census of the whole population were taken under the same conditions.

• These are errors that arise because data has been collected from a part, rather than the whole, of the population.

• Because of the above, sampling errors are restricted to sample surveys only, unlike non-sampling errors, which can occur in both sample survey and census data.
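
The definition can be illustrated with a small simulation in which the "census" value is known. This is only a sketch: the population is generated artificially, and the sizes, mean and spread are arbitrary assumptions. In a real survey the true value is unknown, so the sampling error is estimated rather than observed.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 values.
population = [random.gauss(50, 10) for _ in range(10_000)]
true_mean = statistics.mean(population)  # what a census would give

# Simple random sample of 200 units, drawn without replacement.
sample = random.sample(population, 200)
estimate = statistics.mean(sample)

# Sampling error: estimate from the sample minus the census value.
sampling_error = estimate - true_mean
print(f"true mean {true_mean:.2f}, estimate {estimate:.2f}, error {sampling_error:+.2f}")
```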

Factors Affecting Sampling Error

Sampling error is affected by a number of factors, including:

a. sample size.

• In general, larger sample sizes decrease the sampling error; however, this decrease is not directly proportional.

• As a rough rule of thumb, you need to increase the sample size fourfold to halve the sampling error, but bear in mind that non-sampling errors are likely to increase with large samples.

b. the sampling fraction.

• This is of lesser influence, but as the sample size increases as a fraction of the population, the sampling error should decrease.

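The fourfold rule follows from the fact that, for the mean of a simple random sample, the standard error is proportional to one over the square root of the sample size. The sketch below assumes simple random sampling and a known population standard deviation; the function name and numbers are hypothetical, and the finite population correction sqrt(1 - n/N) is a commonly used form that shows the effect of the sampling fraction.

```python
import math


def standard_error(sigma, n, population_size=None):
    """Standard error of a sample mean under simple random sampling.

    If population_size is given, the finite population correction
    sqrt(1 - n/N) is applied to account for the sampling fraction n/N.
    """
    se = sigma / math.sqrt(n)
    if population_size is not None:
        se *= math.sqrt(1 - n / population_size)
    return se


sigma = 10.0  # assumed population standard deviation (hypothetical)

se_100 = standard_error(sigma, 100)
se_400 = standard_error(sigma, 400)  # four times the sample size

# Same n = 400, but drawn from a small population of 1,000 (fraction 0.4).
se_400_fpc = standard_error(sigma, 400, population_size=1000)

print(se_100, se_400, se_400_fpc)
```

With a small sampling fraction the correction is close to 1, which is why the sampling fraction matters less than the sample size itself.
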
Factors Affecting Sampling Error cont'd

c. the variability within the population.

• More variable populations give rise to larger errors, as the samples, or the estimates calculated from different samples, are more likely to have greater variation.

• The effect of variability within the population can be reduced by the use of stratification, which allows some of the variability in the population to be explained.

d. sample design.

• An efficient sampling design will help in reducing sampling error.
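
The gain from stratification can be sketched by comparing the approximate variance of the sample mean with and without strata. The figures are hypothetical, the allocation is assumed to be proportional to stratum size, and the finite population correction is ignored for simplicity.

```python
import statistics

# Hypothetical population split into two strata with very different levels.
strata = {
    "urban": [48, 50, 52, 49, 51, 50],
    "rural": [18, 20, 22, 19, 21, 20],
}
n = 4  # total sample size (assumed), allocated proportionally to stratum size

everyone = [x for values in strata.values() for x in values]
N = len(everyone)

# Simple random sampling: the variance of the mean is driven by the
# overall population variance, including the gap between the strata.
var_srs = statistics.pvariance(everyone) / n

# Proportional stratified sampling: only the within-stratum variance remains.
var_stratified = sum(
    (len(values) / N) * statistics.pvariance(values) for values in strata.values()
) / n

print(f"approx. variance, simple random sample: {var_srs:.2f}")
print(f"approx. variance, stratified sample:    {var_stratified:.2f}")
```

The difference between the two figures is the part of the population variability that the strata explain.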

Reducing sampling error

• If sampling principles are applied carefully within the constraints of available resources, sampling error can be kept to a minimum.

Sources

– http://www.nss.gov.au/nss/home.nsf/SurveyDesignDoc/+,+&/0+0F,+(&012&%330+1/(E?OpenDocument

– http://www.statcan.ca/english/edu/power/ch6/nonsampling/nonsampling.htm

– http://www.statcan.ca/english/edu/power/ch6/sampling/sampling.htm
