Data Quality Control


Transcript of Data Quality Control

Page 1: Data Quality Control

Data Quality Control

Page 2: Data Quality Control

Learning Objectives

To know the steps necessary for ensuring quality assurance and control of data at various stages of a study

To understand the difference between pilot testing and pre-testing

To understand the importance of designing data collection instruments

To understand how data can be managed using an audit trail and the various techniques that can be used to inspect your dataset after it has been entered

Page 3: Data Quality Control

Performance Objectives

Know the difference between quality assurance and quality control and ways to ensure them

Know the objectives of a pilot test and a pre-test

Understand how data collection instruments should be designed and coded

Be able to manage data using an audit trail

Be able to inspect datasets for errors and rectify them

Page 4: Data Quality Control

Data Quality Control

Quality Assurance: Activities to ensure quality of data before data collection

Quality Control: Monitoring and maintaining the quality of data during the conduct of the study

Data Management: Handling and processing of data throughout the study

Page 5: Data Quality Control

Steps in Quality Assurance

1. Specify the study hypothesis

2. Specify general design to test study hypothesis: develop an overall study protocol

3. Choose or prepare specific instruments

4. Develop procedures for data collection and processing: develop operation manuals

5. Train staff: certify staff

6. Using certified staff, pretest and pilot-study data collection and processing instruments and procedures

Page 6: Data Quality Control

Quality Assurance: Standardization of procedures

Why is standardization important?

– In order to achieve the highest possible level of uniformity and standardization of data collection procedures in the entire study population

Preparation of written manual of operations

– Detailed descriptions of exactly how the procedures specific to each data collection instrument are to be carried out (BP example)

– Q by Q’s (question by question) instructions for interviews

Page 7: Data Quality Control

Quality Assurance: Training of Staff

Aim to make each staff person thoroughly familiar with procedures under his/her responsibility

Training leads to certification of the staff member to perform a specific procedure

Page 8: Data Quality Control

Quality Assurance: Pretesting and Pilot testing

Pretesting

– Involves assessing specific procedures on a sample in order to detect major flaws

Pilot Testing

– Formal rehearsal of study procedures

– Attempts to reproduce the whole flow of operations in a sample as similar as possible to study participants

Page 9: Data Quality Control

Pretesting and Pilot testing results

Pretesting of questionnaire used to assess:

– flow of questions,

– presence of sensitive questions,

– appropriateness of categorization of variables,

– clarity of the q by q instructions to the interviewer

Pilot testing

– In addition to the above, flow of process

Page 10: Data Quality Control

Quality Assurance: Data Management

Designing data collection

– Layout, questions to ask, sequence of questions, phrasing of questions, response categories, skip patterns

– Collect and record “raw”, not processed information (e.g. age)

– Codebook: link between the questionnaire and the data entered in the computer

Page 11: Data Quality Control

Code book example

Variable  QNo  Meaning           Codes                                                            Format
Q1Id      Q1   Quest. No         1-750                                                            C 3
Q2Sex     Q2   Respondent’s sex  1 male; 2 female                                                 N 1.0
Q3Child   Q3   No of children    99 no response                                                   N 2.0
Q4Wt      Q4   Weight in kg      999 not recorded                                                 N 3.1
Q5roof    Q5   Roof type         1 RCC; 2 Cement sheet; 3 Tin sheet; 4 Thatched; Other (specify)  N 2.0
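A codebook like the one above can also be kept in machine-readable form so that entered records are checked against it. The following is only a minimal sketch: the dictionary layout, the field names (`qno`, `meaning`, `valid`, `missing`) and the upper limit of 30 children are assumptions made for illustration, not part of the original slides.

```python
# Hypothetical machine-readable version of part of the codebook example.
# The keys "qno", "meaning", "valid" and "missing" are illustrative names.
CODEBOOK = {
    "q1id": {"qno": "Q1", "meaning": "Quest. No",
             "valid": range(1, 751), "missing": set()},
    "q2sex": {"qno": "Q2", "meaning": "Respondent's sex",
              "valid": {1, 2}, "missing": set()},
    # The upper bound of 30 children is an assumed plausibility limit.
    "q3child": {"qno": "Q3", "meaning": "No of children",
                "valid": range(0, 31), "missing": {99}},
}


def check_record(record):
    """Return (variable, value) pairs that are not allowed by the codebook."""
    problems = []
    for name, spec in CODEBOOK.items():
        value = record.get(name)
        if value in spec["missing"]:
            continue  # a declared missing-value code is acceptable
        if value not in spec["valid"]:
            problems.append((name, value))
    return problems


ok_record = {"q1id": 12, "q2sex": 1, "q3child": 99}
bad_record = {"q1id": 751, "q2sex": 3, "q3child": 2}
```

Here `check_record(ok_record)` returns an empty list, while `check_record(bad_record)` reports the out-of-range questionnaire number and the undefined sex code.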

Page 12: Data Quality Control

Quality Assurance: Use of a Code book

Variable names

– Up to 8 characters a-z and 0-9, must start with a letter

– Combination of question number and description (e.g. q3age)

Meaning

– Short text describing the meaning of the variable

– SPSS software can incorporate this info as variable labels and display it in the output
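The variable naming rule can be checked automatically before data entry begins. This is a small sketch using Python's standard `re` module; accepting upper-case letters as well is an assumption, made because the codebook example uses names such as Q1Id.

```python
import re

# Up to 8 characters, letters and digits only, first character a letter.
# IGNORECASE is an assumption so that names like "Q1Id" also pass.
NAME_RULE = re.compile(r"[a-z][a-z0-9]{0,7}", re.IGNORECASE)


def is_valid_name(name):
    """True if the variable name follows the codebook naming rule."""
    return NAME_RULE.fullmatch(name) is not None
```

For example, `is_valid_name("q3age")` is true, while `"3age"` (starts with a digit) and `"q3agegroup"` (10 characters) are rejected.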

Page 13: Data Quality Control

Quality Assurance: Use of a Code book

Codes

– Try to use numerical codes

Predecide codes for no response, missing values

– Question could not be asked or not applicable (e.g. pregnancy outcome)

– Question was asked but respondent did not reply (e.g. salary)

– Respondent replied “don’t know”
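Keeping a separate code for each kind of missing value makes it possible to exclude all of them from calculations while still reporting why values are missing. The sketch below assumes the codes 97 and 98 for "not applicable" and "don't know"; only 99 for "no response" appears in the codebook example.

```python
# Assumed coding scheme: 97 and 98 are illustrative, 99 follows the codebook.
MISSING_CODES = {97: "not applicable", 98: "don't know", 99: "no response"}


def split_missing(values):
    """Separate valid values from missing codes and count each reason."""
    valid = [v for v in values if v not in MISSING_CODES]
    reasons = {}
    for v in values:
        if v in MISSING_CODES:
            label = MISSING_CODES[v]
            reasons[label] = reasons.get(label, 0) + 1
    return valid, reasons


def mean_of_valid(values):
    """Mean of the non-missing values, or None if nothing valid remains."""
    valid, _ = split_missing(values)
    return sum(valid) / len(valid) if valid else None
```

Without this step, a code such as 99 would be treated as a real number of children and distort every summary statistic.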

Page 14: Data Quality Control

Quality Control

Observation of procedures and performance of staff members for identification of obvious protocol deviations

Strategies include:

– Over-the-shoulder observation of staff

– Taping all interviews and reviewing a random sample

– Ongoing field supervision

– Field editing by interviewer as well as field supervisor

– Office editing which includes coding

– Log book maintenance

– Statistical assessment of trends over time in the performance of each observer/interviewer/technician
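One simple way to follow interviewer performance over time is to tabulate an indicator, such as the share of "no response" answers, per interviewer and period. This is only an illustrative sketch: the record layout (`interviewer`, `week`, `value`) and the choice of indicator are assumptions, and a real study would choose indicators suited to its instruments.

```python
from collections import defaultdict


def nonresponse_rate(records, missing_code=99):
    """Share of missing answers per (interviewer, week)."""
    counts = defaultdict(lambda: [0, 0])  # key -> [missing, total]
    for r in records:
        key = (r["interviewer"], r["week"])
        counts[key][1] += 1
        if r["value"] == missing_code:
            counts[key][0] += 1
    return {key: missing / total
            for key, (missing, total) in sorted(counts.items())}


example_records = [
    {"interviewer": "A", "week": 1, "value": 2},
    {"interviewer": "A", "week": 1, "value": 99},
    {"interviewer": "B", "week": 1, "value": 1},
]
```

A rate that rises for one interviewer but not the others would be a reason for closer supervision or retraining.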

Page 15: Data Quality Control

Data Management: Audit trail

Researcher should be able to trace each piece of information back to the original document:

– ID included in the original documents and in the dataset

– All corrections must be documented and explained

– All modifications to the dataset must be documented by command files

– Each analysis must be documented by a command file

Purpose of the audit trail is to

– protect yourself against mistakes, errors, waste of time and loss of information

– enable external audit (revision)

Page 16: Data Quality Control

Data Management: Handling of Data

Entering data

– Use a professional data entry program like EpiData

Preparations

– Complete codebook

– Examine questionnaires for obvious inconsistencies, skip patterns

Page 17: Data Quality Control

Data Management: Handling of Data

Error prevention:

– Set up a data entry form resembling your questionnaire

– Define valid values before entering data

– Double data entry by two different operators: compare the contents to get a list of discrepancies (EpiInfo), correct errors in both files and run a new comparison
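The comparison step in double data entry can be illustrated with a short script. This sketch is a stand-in for the comparison tool named on the slide, not a description of how EpiInfo works; the row format (a list of dictionaries with an `id` field) is an assumption.

```python
def compare_entries(first, second, id_field="id"):
    """List discrepancies between two independently entered files.

    Each file is a list of dicts. The result contains tuples of
    (record id, variable, value in first file, value in second file).
    """
    a = {row[id_field]: row for row in first}
    b = {row[id_field]: row for row in second}
    discrepancies = []
    for rec_id in sorted(set(a) | set(b)):
        if rec_id not in a or rec_id not in b:
            # Record was entered by only one operator.
            discrepancies.append(
                (rec_id, "(whole record)", rec_id in a, rec_id in b))
            continue
        for var in sorted(set(a[rec_id]) | set(b[rec_id])):
            v1, v2 = a[rec_id].get(var), b[rec_id].get(var)
            if v1 != v2:
                discrepancies.append((rec_id, var, v1, v2))
    return discrepancies


entry_1 = [{"id": 1, "q2sex": 1, "q3child": 2},
           {"id": 2, "q2sex": 2, "q3child": 0}]
entry_2 = [{"id": 1, "q2sex": 1, "q3child": 3},
           {"id": 2, "q2sex": 2, "q3child": 0}]
```

Each listed discrepancy is resolved by going back to the paper questionnaire. After both files are corrected, the comparison is run again until the list is empty.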

Page 18: Data Quality Control

First Inspection of data. Error Finding

Add variable and value labels to your data using a syntax command

Searching for errors

– Make printouts of the codebook from the data, an overview of variables, and simple frequency tables of appropriate variables

– Compare the codebook created with the original codebook and see if the label information is correct

– Inspect the generated summary/frequency tables for illegal or improbable minimum and maximum values of variables and inconsistencies (e.g. an age of 250 years, a pregnant male, a 23-year-old woman with a 19-year-old son)

Calculate the error rate

– Randomly select 10%, or at least 40, of your questionnaires and re-enter them into a new file
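The range and consistency checks, and the re-entry sample for the error rate, can be sketched as follows. Several details are assumptions rather than content of the slides: the plausible age range of 0 to 110, the variable names, the reading of "10% or at least 40" as the larger of the two numbers, and the definition of the error rate as discrepant fields divided by fields re-entered.

```python
import math
import random


def flag_record(rec):
    """Return a list of problems found in one record."""
    flags = []
    if not 0 <= rec["age"] <= 110:  # assumed plausible range
        flags.append("improbable age")
    if rec["sex"] == 1 and rec.get("pregnant") == 1:  # 1 = male in codebook
        flags.append("pregnant male")
    return flags


def reentry_sample(ids, seed=0):
    """Randomly pick 10% of questionnaires, but no fewer than 40."""
    ids = list(ids)
    k = min(len(ids), max(40, math.ceil(0.10 * len(ids))))
    return sorted(random.Random(seed).sample(ids, k))


def error_rate(discrepant_fields, fields_checked):
    """Assumed definition: share of re-entered fields that differ."""
    return discrepant_fields / fields_checked
```

With 750 questionnaires the sample contains 75 of them; with 100 questionnaires it contains 40. A fixed seed is used so that the selection itself can be reproduced and documented.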

Page 19: Data Quality Control

Correction of errors - Documentation

If errors are discovered

– Make corrections in a command file (SPSS syntax file); this will provide full documentation of changes made to the dataset

If errors are discovered when comparing files after double data entry

– You can make corrections directly in the data entered, provided you end this step with a comparison of the two files entered and corrected
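The idea of correcting through a command file instead of editing the data by hand can be shown in any scripting language. The slides refer to SPSS syntax files; the Python sketch below is an analogous illustration, and the correction format (`id`, `variable`, `old`, `new`, `reason`) and the example correction itself are hypothetical.

```python
import copy

# Hypothetical correction list; each entry documents what changed and why.
CORRECTIONS = [
    {"id": 17, "variable": "age", "old": 250, "new": 25,
     "reason": "typing error, checked against questionnaire"},
]


def apply_corrections(data, corrections):
    """Apply documented corrections to a copy of the data and log them."""
    cleaned = copy.deepcopy(data)  # the raw data stay untouched
    by_id = {row["id"]: row for row in cleaned}
    log = []
    for c in corrections:
        row = by_id[c["id"]]
        if row[c["variable"]] != c["old"]:
            raise ValueError(f"id {c['id']}: unexpected current value")
        row[c["variable"]] = c["new"]
        log.append(f"id {c['id']}: {c['variable']} "
                   f"{c['old']} -> {c['new']} ({c['reason']})")
    return cleaned, log


raw = [{"id": 17, "age": 250}, {"id": 18, "age": 40}]
cleaned, log = apply_corrections(raw, CORRECTIONS)
```

Because the script checks the old value before changing it, running it against the wrong version of the dataset fails loudly instead of silently overwriting data.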

Page 20: Data Quality Control

Correction of errors - Documentation

Split the process into distinct and well-defined steps and make sure that your documentation from one step to another is consistent

Archive

– Once you have a “clean” documented version of your primary data, save one copy in a safe place and do your work with another copy

Page 21: Data Quality Control

Analysis

Make sure you use the right data set

– It is recommended to create command files for analysis which start with the command reading the dataset

Late discovery of errors and inconsistencies

Page 22: Data Quality Control

Backing up vs Archiving

Backing up

– Everyday activity

– Purpose is to enable you to restore your data and documents in case of destruction or loss of data

– Not only datasets, but also command files modifying your data, written documents such as the protocol, log book and other documenting information

Archiving

– Takes place once or a few times during the life of the project

– Purpose is to preserve your data and documents for a more distant future, maybe even to allow other researchers access to the information
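A routine backup can be as simple as copying each file to a second location and verifying that the copy is identical. The sketch below uses only the Python standard library; the function names and the folder layout are illustrative assumptions, and a real project would normally also keep copies on separate media.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Checksum used to confirm that two files have identical content."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def back_up(source, backup_dir):
    """Copy one file into backup_dir and verify the copy."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / Path(source).name
    shutil.copy2(source, target)  # copy2 also keeps file timestamps
    if sha256_of(source) != sha256_of(target):
        raise IOError("backup copy differs from source")
    return target
```

The same function can be applied to datasets, command files and written documents such as the protocol and log book.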