Educational Research 101: How to Manage Your Data and Prepare for the Statistical Consultation...

Post on 01-Apr-2015

Educational Research 101: How to Manage Your Data and Prepare for the Statistical Consultation

Francis S. Nuthalapaty, MD
H. Lee Higdon III, PhD

2009 APGO Faculty Development Seminar

Case Study: The wrong way

• Statistician was consulted after the data had been collected.

• Study question was not clearly defined.

• Variables were not defined.

• Data Dictionary was not developed.

• Data were not cleaned/validated.

• Result: a statistician who is asked to perform a miracle!

Case Study: Lesson

Arrangements to consult with a statistician should be made before you start enrolling patients and collecting data! In fact, they should be made before protocol development, to prevent problems downstream.

Learning Objectives

1. Describe the continuum of data management

2. List data collection instruments / approaches

3. Understand how to create a data dictionary

4. Describe methods to validate data

5. Describe various data analytic tools

6. Describe how to decide on statistical approaches

Question

Where does data management fit into the research process?

The Research Process

1. Question
2. Literature search
3. Objective / Hypothesis
4. Study design
5. IRB
6. Study conduct
7. Data analysis
8. Dissemination of results

Data Management Pearl

“No study is better than the quality of its data”

- Friedman

“…get it right the first time” - Crerand

Steps in Data Management

• Definition

• Acquisition

• Data Entry

• Validation

• Analysis

Data Definitions

• Identifying your data

• Identifying your data types

• Naming your data variables

• Creating a data dictionary

Data Types

Types of Variables

• Qualitative (categorical): nominal, ordinal

• Quantitative: interval, ratio

Data Definition Exercise

Data Variable Names

• Make the name descriptive (easier to remember)

• Keep it short (fewer than 10 characters)

• Use lower case

• Avoid spaces – use “underscore”

• Use numbers to indicate sequences
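The mechanical naming rules above can be checked automatically before names are entered into the database. This is a minimal sketch; the function name naming_problems is hypothetical, and the "start with a letter" rule is an added assumption (many statistics packages require it). "Make the name descriptive" needs human judgment and is not checked.

```python
import re

# Hypothetical checker for the mechanical naming rules on the slide.
# "Make the name descriptive" needs human judgment and is not checked.
MAX_LEN = 9  # "fewer than 10 characters"


def naming_problems(name: str) -> list:
    """Return the naming rules that a proposed variable name breaks."""
    problems = []
    if len(name) > MAX_LEN:
        problems.append("longer than 9 characters")
    if name != name.lower():
        problems.append("contains upper-case letters")
    if " " in name:
        problems.append("contains spaces (use an underscore)")
    # Assumption: names start with a letter and contain only letters, digits,
    # underscores (and spaces, which are reported separately above).
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_ ]*", name):
        problems.append("must start with a letter; use only letters, digits, underscores")
    return problems


for candidate in ["ga_wks", "visit1", "visit2", "Gestational Age"]:
    print(candidate, naming_problems(candidate))
```

Sequenced names such as visit1 and visit2 pass; "Gestational Age" breaks the length, case, and spacing rules.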

Data Variable Formats

• Variable formats:

– Numeric

– String

Data Variable Values

• Possible responses for a variable

– Numeric format:

• 0 = no / 1 = yes

– String format:

• a = no / b = yes
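A code sheet like the one above is simply a lookup table between stored codes and response labels. The sketch below (hypothetical names NUMERIC_CODES, STRING_CODES, decode, encode) shows both formats and rejects any value that is not a defined response.

```python
# Hypothetical code sheets for one yes/no item, mirroring the slide.
NUMERIC_CODES = {0: "no", 1: "yes"}     # numeric format
STRING_CODES = {"a": "no", "b": "yes"}  # string format


def decode(code, codes):
    """Translate a stored code into its response label; reject undefined codes."""
    if code not in codes:
        raise ValueError(f"{code!r} is not a defined response value")
    return codes[code]


def encode(label, codes):
    """Translate a response label back into its stored code."""
    reverse = {meaning: code for code, meaning in codes.items()}
    return reverse[label]


print(decode(1, NUMERIC_CODES))    # yes
print(encode("no", STRING_CODES))  # a
```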

Note on Missing Values

• What about variables with no response?

– Leave it blank

– Assign a period “.”

– Assign a value (usually outside the expected response range)

– Avoid text
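When the data are read for analysis, each of these missing-value conventions has to be turned into a single "missing" marker. This sketch assumes a 0/1 item and assumes 9 as the out-of-range missing code; the function name parse_response is hypothetical. Free text is rejected rather than guessed at, which is why the slide advises avoiding it.

```python
# Assumption: 9 is the out-of-range code chosen for "no response" on a 0/1 item.
MISSING_CODE = 9


def parse_response(raw: str):
    """Convert one raw 0/1 cell to 0, 1, or None (missing).

    Blank cells and "." are treated as missing, as is the out-of-range
    missing code. Free text is rejected rather than guessed at.
    """
    cell = raw.strip()
    if cell in ("", "."):
        return None
    try:
        value = int(cell)
    except ValueError:
        raise ValueError(f"text in a numeric field: {raw!r}") from None
    if value == MISSING_CODE:
        return None
    if value not in (0, 1):
        raise ValueError(f"value outside the expected range: {value}")
    return value


print([parse_response(cell) for cell in ["1", "0", "", ".", "9"]])
# [1, 0, None, None, None]
```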

Data Naming Exercise

Data Dictionaries / Code Books

• Brings together all data elements:

– Data types / formats

– Variable names

– Expected response values (range)

– Comments

• Self-generated vs. computer generated

• “Rosetta Stone” for the database
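A self-generated data dictionary can be kept as structured records, so that the same entries can be printed as a code book and reused later for validation. This is a minimal sketch; the class Variable, the function codebook, and the three example variables and ranges are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Variable:
    """One entry (row) of a data dictionary / code book."""
    name: str                                    # variable name as stored
    label: str                                   # human-readable description
    fmt: str                                     # "numeric" or "string"
    codes: dict = field(default_factory=dict)    # allowed codes -> meaning
    valid_range: Optional[tuple] = None          # (min, max) for measurements
    comment: str = ""


# Hypothetical dictionary for a small study.
STUDY_DICTIONARY = [
    Variable("age", "Age at enrollment (years)", "numeric", valid_range=(18, 50)),
    Variable("smoker", "Current smoker", "numeric", codes={0: "no", 1: "yes"},
             comment="9 = missing"),
    Variable("parity", "Number of prior births", "numeric", valid_range=(0, 15)),
]


def codebook(dictionary):
    """Render the dictionary as plain-text lines, one per variable."""
    lines = []
    for v in dictionary:
        if v.codes:
            allowed = ", ".join(f"{k} = {m}" for k, m in v.codes.items())
        elif v.valid_range:
            allowed = f"{v.valid_range[0]} to {v.valid_range[1]}"
        else:
            allowed = "free"
        lines.append(f"{v.name:<9} {v.fmt:<8} {allowed:<16} {v.label}  {v.comment}".rstrip())
    return lines


print("\n".join(codebook(STUDY_DICTIONARY)))
```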

Data Dictionary Exercise

Data Acquisition

Pick the best method for the environment

Data Acquisition Methods

• Interviews

• Questionnaires

• Assessments

– MCQ examinations

– OSCE / OSAT

• Laboratory studies

Data Acquisition Environments

• Observational encounters

• Structured research encounters

• Self-report

Data Acquisition Problems

• Major types of data issues:

– Missing data

– Incorrect data

– Excess variability

Data Acquisition Problems

• Reasons for poor data quality:

– Researcher-dependent data:

• Insufficient time

• Inadequate training

• Lack of focus on study tasks

• Poor communication

• Protocol deviation

Data Acquisition Problems

• Reasons for poor data quality:

– Subject-dependent data:

• Inadequate instruction

• Poor comprehension

• Sensitive or stigmatized behaviors

Data Acquisition Options

• Paper forms

• Direct entry

• Computer assisted data acquisition

Data Acquisition: Paper Forms

• Advantages:

– Controlled distribution and return

– Comments

– Double data entry

• Disadvantages:

– Anonymity

– Manual quality checks

– Data entry time / errors
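Double data entry means two people key the same paper forms independently, and every disagreement is checked against the original form. A minimal sketch of the comparison step, assuming each keying is stored as record_id -> {variable: value} with record IDs of one type; the function name compare_entries is hypothetical.

```python
def compare_entries(first, second):
    """Compare two independent keyings of the same paper forms.

    Each argument maps record_id -> {variable: value}. Returns a list of
    (record_id, variable, first_value, second_value) discrepancies, to be
    resolved against the original paper form.
    """
    discrepancies = []
    for rec_id in sorted(set(first) | set(second)):
        a = first.get(rec_id, {})
        b = second.get(rec_id, {})
        for var in sorted(set(a) | set(b)):
            if a.get(var) != b.get(var):
                discrepancies.append((rec_id, var, a.get(var), b.get(var)))
    return discrepancies


# Hypothetical keyings: the second typist transposed the digits of an age.
entry_1 = {101: {"age": 34, "smoker": 0}, 102: {"age": 27, "smoker": 1}}
entry_2 = {101: {"age": 43, "smoker": 0}, 102: {"age": 27, "smoker": 1}}
print(compare_entries(entry_1, entry_2))  # [(101, 'age', 34, 43)]
```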

Data Acquisition: Direct Entry

• Options:

– MS Excel, MS Access

– Epi Info (free on the web)

– Direct entry into statistical software

• Pros / Cons:

– Pro: no data transcription

– Con: errors

Data Acquisition

• Computer assisted data acquisition:

– Automated data collection

– OCR forms

– Computer-based case report forms / questionnaires

– Computer-assisted self-interviews

– Mobile computing device diaries

Data Acquisition: CASI

• Special Focus: Health Behaviors

– Factors that may affect reporting:

• Sensitive or stigmatized behaviors

• Age discrepancy between participant and interviewer

• Lack of privacy

• Lack of comprehension of self-administered questionnaires

Data Acquisition: CASI

• Computer-assisted self-interview (CASI):

– Computer-based interview

– Can incorporate audio, video, and text

– Respondent listens to or reads questions on screen

– Submits answers through keypad or touch screen

Data Acquisition: CASI

• Benefits of CASI:

– Interview conducted in privacy

– Standardized interview

– Computer controlled branching

– Automated consistency and range checking

– Multilingual administration
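Two of these benefits, computer-controlled branching and automated range checking, are easy to see in a toy questionnaire. This is a sketch only: the three items, their ranges, and the names QUESTIONS and run_interview are hypothetical, and answer_fn stands in for whatever keypad or touch-screen input a real CASI system uses.

```python
# Hypothetical three-item questionnaire: each item has a valid range and a
# rule that picks the next item from the answer (skip logic).
QUESTIONS = {
    "smoker": {"text": "Do you currently smoke? (0 = no, 1 = yes)", "range": (0, 1),
               "next": lambda ans: "cigs_day" if ans == 1 else "age"},
    "cigs_day": {"text": "Cigarettes per day?", "range": (1, 80),
                 "next": lambda ans: "age"},
    "age": {"text": "Age in years?", "range": (18, 50),
            "next": lambda ans: None},
}


def run_interview(answer_fn, start="smoker"):
    """Administer the questionnaire; answer_fn(prompt) returns an int.

    Out-of-range answers are rejected and the item is asked again, so only
    values inside the expected range are stored.
    """
    responses = {}
    current = start
    while current is not None:
        item = QUESTIONS[current]
        low, high = item["range"]
        answer = answer_fn(item["text"])
        while not (low <= answer <= high):
            answer = answer_fn(f"Please enter a value from {low} to {high}. {item['text']}")
        responses[current] = answer
        current = item["next"](answer)
    return responses


# Scripted answers: a smoker first mistypes 200 cigarettes per day.
scripted = iter([1, 200, 10, 30])
print(run_interview(lambda prompt: next(scripted)))
```

A non-smoker never sees the cigarettes-per-day item, and the impossible value 200 is never stored.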

Data Validation

1. Is all of the data present?

2. Are the responses within the expected range?

3. Does the data make sense?

Data Validation

• Is all of the data present?

– Visually examine the data cells

– Frequencies
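Frequencies answer the completeness question when missing values are counted as their own category. A minimal sketch, assuming records are dictionaries and missing values are stored as None (or the key is absent); the names frequencies and missing_report are hypothetical.

```python
from collections import Counter


def frequencies(rows, variable):
    """Frequency table for one variable; missing (None or absent) is counted as None."""
    return Counter(row.get(variable) for row in rows)


def missing_report(rows, variables):
    """Number of missing values per variable."""
    return {var: sum(1 for row in rows if row.get(var) is None) for var in variables}


# Hypothetical records.
records = [
    {"id": 1, "smoker": 0, "age": 34},
    {"id": 2, "smoker": None, "age": 27},
    {"id": 3, "smoker": 1},
]
print(frequencies(records, "smoker"))
print(missing_report(records, ["smoker", "age"]))  # {'smoker': 1, 'age': 1}
```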

Data Validation

• Are the responses within the expected range?

– Frequencies

• Maximum / minimum values

– Descriptive statistics

• Means

• Standard deviations
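Minimum, maximum, mean, and standard deviation quickly expose keying errors, and an explicit range check then lists the records to look up. A sketch using Python's standard statistics module; the names describe and out_of_range, the 18 to 50 age range, and the records are hypothetical.

```python
import statistics


def describe(values):
    """n, minimum, maximum, mean, and sample SD of the non-missing values."""
    present = [v for v in values if v is not None]
    return {
        "n": len(present),
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "sd": statistics.stdev(present),  # requires at least two values
    }


def out_of_range(rows, variable, low, high):
    """(id, value) pairs whose value lies outside the expected range [low, high]."""
    return [(row["id"], row[variable]) for row in rows
            if row.get(variable) is not None and not (low <= row[variable] <= high)]


# Hypothetical records; 340 is a keying error for 34.
rows = [{"id": 1, "age": 34}, {"id": 2, "age": 27}, {"id": 3, "age": None},
        {"id": 4, "age": 29}, {"id": 5, "age": 340}]
print(describe([r["age"] for r in rows]))
print(out_of_range(rows, "age", 18, 50))  # [(5, 340)]
```

A maximum of 340 years, and a mean of 107.5, make the error obvious even before the range check runs.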

Data Validation

Once an outlier is found, refer back to the source document (e.g., the patient chart) to clarify the value.

Descriptive Statistics

Data Distribution (definitions by SPSS 16.0)

Scatterplots

Who is Represented in the Data?

• Sample test of proportions:

– Percentage by gender

– Percentage by ethnicity

• Sample test of means:

– Age

– BMI

• Do our data reflect the population at large or a subset?
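A one-sample test of a proportion asks whether, for example, the share of women in the sample differs from the share in the population. This sketch uses the standard normal-approximation z test; the function name one_sample_proportion_z, the counts, and the 50% population share are hypothetical.

```python
import math


def one_sample_proportion_z(successes: int, n: int, p0: float):
    """z statistic and two-sided p-value for H0: population proportion = p0.

    Normal approximation; reasonable only when n*p0 and n*(1 - p0) are both
    reasonably large (common rules of thumb use at least 5 to 10).
    """
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail area
    return z, p_value


# Hypothetical: 120 of 200 participants are women; assumed population share 50%.
z, p = one_sample_proportion_z(120, 200, 0.50)
print(f"z = {z:.2f}, p = {p:.4f}")  # z = 2.83, p = 0.0047
```

A small p-value here suggests the sample over-represents women relative to the assumed population, i.e., it may reflect a subset.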

Who is not?

• Compare data of the included and excluded individuals. Are they similar for:

– Age (continuous: Student’s t-test)

– BMI (continuous: Student’s t-test)

– Ethnicity (discrete/categorical: chi-square test)

– Gender (discrete/categorical: chi-square test)
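The two comparisons above can be computed by hand: a pooled-variance (Student's) t statistic for a continuous variable and a Pearson chi-square for a 2x2 table such as included/excluded by gender. This is a standard-library sketch with hypothetical function names and data; it returns the t statistic with its degrees of freedom (look up the p-value in a t table or a statistics package) and, because a 2x2 table has 1 degree of freedom, an exact chi-square p-value via math.erfc. No continuity correction is applied.

```python
import math
import statistics


def student_t(a, b):
    """Pooled-variance (Student's) two-sample t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) and p-value for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals, col_totals = (a + b, c + d), (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    p = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square with 1 df
    return stat, p


# Hypothetical ages of included vs. excluded individuals.
t, df = student_t([30, 32, 34], [24, 26, 28])
print(f"t = {t:.2f} on {df} df")  # t = 3.67 on 4 df

# Hypothetical counts: rows = included / excluded, columns = women / men.
stat, p = chi_square_2x2([[20, 30], [30, 20]])
print(f"chi-square = {stat:.1f}, p = {p:.4f}")  # chi-square = 4.0, p = 0.0455
```

In practice, scipy.stats.ttest_ind and scipy.stats.chi2_contingency give the same statistics; note that chi2_contingency applies Yates' continuity correction to 2x2 tables by default.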

Data Analysis

• Choose the right tool for the job

• Commonly used statistical tests:

– If the data are normally distributed (i.e., follow a bell-shaped curve), we use parametric statistical tests.

– If the data (1) are not “bell-shaped,” (2) come from small samples (a common rule of thumb is fewer than 30 per group), or (3) contain outliers, we use nonparametric statistical tests.

• The choice of statistical test is based on:

– Distribution of the sample data

– Sample size

– Number of groups

– Independence of the groups

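Comparison measurement | Normal distribution | # of groups | Statistical test
Mean (average) | Yes | 2 | Student’s t-test
Mean (average) | Yes | ≥3 | Analysis of variance (ANOVA)
Median | No | 2 | Wilcoxon rank-sum or Mann-Whitney U test
Median | No | ≥3 | Kruskal-Wallis test
Proportions | Yes | ≥2 | Chi-square test
Proportions | No | ≥2 | Fisher’s exact test

For proportions, “Yes” means the sample is large enough for the chi-square approximation (a common rule is an expected count of at least 5 in every cell); otherwise use Fisher’s exact test.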
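The comparison table can be read as a small decision procedure. This sketch encodes it for independent groups only (paired data need different tests); the function name choose_test and its parameters are hypothetical, and large_expected_counts stands in for the table's "normal distribution" column when proportions are compared.

```python
def choose_test(outcome: str, groups: int, normal: bool = True,
                large_expected_counts: bool = True) -> str:
    """Pick a test from the comparison table (independent groups only).

    outcome: "continuous" (compare means or medians) or "categorical"
    (compare proportions).
    """
    if groups < 2:
        raise ValueError("a comparison needs at least two groups")
    if outcome == "continuous":
        if normal:
            return "Student's t-test" if groups == 2 else "Analysis of variance (ANOVA)"
        return ("Wilcoxon rank-sum / Mann-Whitney U test" if groups == 2
                else "Kruskal-Wallis test")
    if outcome == "categorical":
        return "Chi-square test" if large_expected_counts else "Fisher's exact test"
    raise ValueError(f"unknown outcome type: {outcome!r}")


print(choose_test("continuous", 2))                                # Student's t-test
print(choose_test("continuous", 3, normal=False))                  # Kruskal-Wallis test
print(choose_test("categorical", 2, large_expected_counts=False))  # Fisher's exact test
```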

Data Analysis

• Univariate vs. multivariate:

– Multivariate methods are increasingly required in medical research because many of the relationships we study involve more than a simple one-to-one association.

• Multivariate methods allow us to:

– Examine many variables simultaneously

– Adjust for baseline differences between groups

– Adjust for potential “confounding” variables

– Obtain “adjusted” measures of effect

• Examples of multivariate methods (each explains or predicts a dependent, or outcome, variable from one or more independent variables):

– Linear regression: to predict the value of a numerical measurement (e.g., viral load)

– Logistic regression: to predict a dichotomous outcome (e.g., pregnant / not pregnant)

– Cox proportional hazards regression: to predict time to an event (e.g., survival time)
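What "adjusting for a confounder" does can be shown with linear regression on a tiny constructed data set. This sketch assumes NumPy is available and uses ordinary least squares via numpy.linalg.lstsq; the function name ols and all data are hypothetical. The treated group is older, and the outcome depends on age only, so the crude group effect is large while the age-adjusted group effect is essentially zero.

```python
import numpy as np


def ols(y, *predictors):
    """Ordinary least squares fit; returns [intercept, b1, b2, ...]."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef


# Hypothetical data: group 1 is older, and the outcome depends on age only.
age = np.array([20, 25, 30, 35, 40, 45], dtype=float)
group = np.array([0, 0, 0, 1, 1, 1], dtype=float)
outcome = 5 + 2 * age

crude = ols(outcome, group)          # group effect ignoring age
adjusted = ols(outcome, group, age)  # group effect adjusted for age
print(f"crude group effect: {crude[1]:.1f}")
print(f"adjusted group effect: {abs(adjusted[1]):.1f}, age effect: {adjusted[2]:.1f}")
```

Logistic and Cox models follow the same idea (several predictors in one model) but need a statistics package rather than plain least squares.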

Session content, including narrated MS PowerPoint slides, is available at:

http://www.obgynknowledgebank.net