
Managing Your Own Data

(…if you have to)

Kathryn A. Carson, Sc.M.

Senior Research Associate

Department of Epidemiology

Johns Hopkins Bloomberg School of Public Health

[email protected]


Overview

• Principles of Data System Design
• Data entry and management systems
  – Self managed systems
    • How to manage data in MS Excel
  – Programmer managed systems
    • How to manage data in MS Access
• Sample data sets
• How to prepare data for statistical analysis
• Confidentiality/security

7/20/2010 Introduction to Clinical Research 2


Principles of Data System Design

• Data Input/Entry
  – What resources are available
  – Amount of data
  – Set up time versus usage
  – Double data entry
• Data Validation
  – Data type (e.g., numeric, date, text)
  – Range checks
  – Missing data
  – Violation of protocol checks
  – Coding and spelling errors
  – Consistency checks
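
The validation checks listed above can be sketched in code. This is a minimal Python illustration, not a feature of any particular data entry system: the function name `validate_record`, the field names, the age limits, and the gender codes are all assumptions chosen for the example.

```python
from datetime import date

# Hypothetical limits for illustration; a real study would take these from its protocol.
AGE_RANGE = (18, 90)
GENDER_CODES = {0, 1}
EXPECTED_FIELDS = ("id", "age", "gender", "startdt", "fudt")


def validate_record(rec):
    """Return a list of problems found in one data entry record (a dict)."""
    problems = []

    # Missing data check: every expected field must be present and not None.
    for field in EXPECTED_FIELDS:
        if rec.get(field) is None:
            problems.append(f"{field}: missing")

    # Data type check, then range check, on age.
    age = rec.get("age")
    if age is not None:
        if not isinstance(age, int):
            problems.append("age: not an integer")
        elif not (AGE_RANGE[0] <= age <= AGE_RANGE[1]):
            problems.append("age: out of range")

    # Coding check: gender must be one of the allowed codes.
    gender = rec.get("gender")
    if gender is not None and gender not in GENDER_CODES:
        problems.append("gender: invalid code")

    # Consistency check: follow-up cannot come before the start date.
    start, fu = rec.get("startdt"), rec.get("fudt")
    if isinstance(start, date) and isinstance(fu, date) and fu < start:
        problems.append("fudt: earlier than startdt")

    return problems


good = {"id": 1, "age": 20, "gender": 0,
        "startdt": date(2005, 8, 4), "fudt": date(2008, 1, 1)}
bad = {"id": 2, "age": 133, "gender": 2,
       "startdt": date(2007, 1, 1), "fudt": None}

print(validate_record(good))  # []
print(validate_record(bad))   # ['fudt: missing', 'age: out of range', 'gender: invalid code']
```

Returning a list of problems, rather than stopping at the first one, lets the person entering data correct a whole form in one pass.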


Principles of Data System Design (2)

• Data Audit
  – Computer audits
  – Manual audits
• Data Edit
  – Single line through incorrect value on data form, write correct value, initial and date
  – Make same changes to database file
• Data maintenance
  – Single database file
• Data archive
  – Have plans to archive data after the end of the study
  – Data can be archived on a CD
  – Data need to be stored for at least five years
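
One possible form of computer audit, when forms have been entered twice (double data entry), is to compare the two passes field by field. The sketch below is only an illustration in Python; `compare_entries` and the dictionary layout (study ID mapped to a dict of field values) are assumptions, not a description of any specific audit tool.

```python
def compare_entries(first_pass, second_pass):
    """Compare two independent entries of the same forms, keyed by study ID.

    Returns a list of (study_id, field, first_value, second_value) tuples,
    one per disagreement, so each can be resolved against the paper form.
    """
    discrepancies = []
    for study_id in sorted(set(first_pass) | set(second_pass)):
        a = first_pass.get(study_id)
        b = second_pass.get(study_id)
        if a is None or b is None:
            # The whole record is missing from one of the two passes.
            discrepancies.append((study_id, "<record>", a, b))
            continue
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                discrepancies.append((study_id, field, a.get(field), b.get(field)))
    return discrepancies


first = {1: {"age": 20, "height": 70}, 2: {"age": 33, "height": 66}}
second = {1: {"age": 20, "height": 70}, 2: {"age": 38, "height": 66}}

print(compare_entries(first, second))  # [(2, 'age', 33, 38)]
```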


Principles of Data System Design (3)

• Identification of Study Participant
  – Do not use names, hospital history numbers or Social Security numbers
  – Patients should be identified with a unique study assigned ID number
  – Maintain a log linking the patient’s name and other personal information to the study ID
    • Kept separately under lock and key or encrypted and password protected
    • Only personnel who need access to information should have it
  – HIPAA guideline compliant – collect the least amount of protected health information (PHI) needed for the study
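
The separation between the linking log and the study data can be illustrated with a short Python sketch. The function `assign_study_ids`, the field names, and the placeholder participant names are hypothetical; the sketch only shows the split itself and does not cover how the log is locked away or encrypted.

```python
def assign_study_ids(participants, start=1):
    """Split identifying information from study data.

    `participants` is a list of dicts, each with a "name" key plus study fields.
    Returns (linking_log, study_rows). The linking log maps study ID to name
    and is meant to be stored separately and securely; the study rows carry
    only the study ID and the study fields.
    """
    linking_log = []
    study_rows = []
    for study_id, person in enumerate(participants, start=start):
        linking_log.append({"study_id": study_id, "name": person["name"]})
        row = {"study_id": study_id}
        row.update({k: v for k, v in person.items() if k != "name"})
        study_rows.append(row)
    return linking_log, study_rows


participants = [
    {"name": "Participant A", "age": 20, "gender": 0},
    {"name": "Participant B", "age": 33, "gender": 1},
]
linking_log, study_rows = assign_study_ids(participants)

print(study_rows)   # [{'study_id': 1, 'age': 20, 'gender': 0}, {'study_id': 2, 'age': 33, 'gender': 1}]
print(linking_log)  # [{'study_id': 1, 'name': 'Participant A'}, {'study_id': 2, 'name': 'Participant B'}]
```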


Data Management Computer Systems

• Self-managed computer systems:
  – Spreadsheets
    • Excel, Lotus
• Programmer managed computer systems:
  – Database management software
    • Access, dBase
  – Statistical software
    • SAS, SPSS, Stata
  – Web-based systems
    • Gsurvey, REDCap
  – Fax/Teleform systems


Self-managed Systems

• Advantages
  – Self managed
  – Convenient for small data sets
  – Descriptive statistics and graphics available
• Disadvantages
  – Data types are defined by first few entries
  – Not conducive to data validation
  – Cumbersome for very large data sets
  – Forms need to be designed separately
  – Repeated column names, or no column names at all, are allowed
  – Data codes are entered manually into a separate file
  – Unable to do consistency checks across forms


Creating an Excel Spreadsheet

• Unique variable names should be in the first row
• Data should be in column format
• Data in the same column should be of the same data type
• Some data validation features are available
• Data audit features are available for existing spreadsheets


Programmer-managed Systems

• Advantages
  – Friendlier data entry environment
  – Computerized data validation
  – Ability to perform consistency checks within and across tables
  – Ability to track editing changes
  – More manageable for large data sets
• Disadvantages
  – Require more up front planning and resources
  – Require database knowledge to develop a file
  – User training


Creating an Access Database

• Primary Key(s)
  – Must be unique and not missing
  – Indexes on this value
• Be careful of “Default value”
  – Default setting is zero for numeric data (pre 2007 versions)
• Use “Required” only when necessary
  – Will not allow field to be left blank
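
The same three ideas (a primary key, no default value, and a required field) can be tried out without Access. The sketch below swaps in SQLite through Python's built-in `sqlite3` module purely for illustration; it is not Access syntax, and the table name `baseline` and its columns are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE baseline (
        id     INTEGER NOT NULL PRIMARY KEY,  -- primary key: unique and not missing
        age    INTEGER,                       -- no default, so a blank stays NULL, not 0
        gender INTEGER NOT NULL               -- "Required": cannot be left blank
    )
    """
)
conn.execute("INSERT INTO baseline (id, age, gender) VALUES (1, 20, 0)")
conn.execute("INSERT INTO baseline (id, gender) VALUES (2, 1)")  # age left blank


def try_insert(sql):
    """Run an INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as err:
        return f"rejected: {err}"


# A repeated primary key and a blank required field are both refused.
duplicate_key = try_insert("INSERT INTO baseline (id, age, gender) VALUES (1, 45, 1)")
blank_required = try_insert("INSERT INTO baseline (id, age) VALUES (3, 52)")

# The blank age is stored as missing (None), not silently as zero.
age_of_2 = conn.execute("SELECT age FROM baseline WHERE id = 2").fetchone()[0]

print(duplicate_key)
print(blank_required)
print(age_of_2)  # None
```

A default of zero would make a skipped age indistinguishable from a real value of 0, which is the hazard the “Default value” bullet warns about.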


Sample Dataset 1: Binary/Continuous Data

ID  age  gender  height  treatment  disease
1   20   0       70      1          0
2   33   1       66      0          0
3   56   1       62      1          1
4   45   1       65      0          1
5   29   0       72      1          0
6   52   0       68      1          1
7   41   1       68      0          1


Sample Dataset 2: Survival Data

ID startdt eventdt fudt survival cens
1 08/04/05 01/01/08 28.9 1
2 01/01/07 01/08/09 24.2 0
3 01/07/04 03/24/04 2.5 1
4 02/23/01 08/01/01 5.3 0
5 12/20/06 07/20/09 31.0 0
6 07/16/00 01/14/03 30.2 0
7 04/13/02 03/21/04 23.4 1
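
A survival time like the one in this table can be derived from the start date and the later date on each row. The slide does not say how its values were computed, so the Python sketch below is an assumption: it divides elapsed days by an average month length of 365.25/12. That reproduces the first and third rows, but not every row to the last decimal. The function name `months_between` is hypothetical.

```python
from datetime import datetime

DAYS_PER_MONTH = 365.25 / 12  # assumed average month length (30.4375 days)


def months_between(start, end, fmt="%m/%d/%y"):
    """Elapsed time in months between two date strings, rounded to one decimal."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return round(delta.days / DAYS_PER_MONTH, 1)


print(months_between("08/04/05", "01/01/08"))  # 28.9
print(months_between("01/07/04", "03/24/04"))  # 2.5
```

Two-digit years are parsed here only because the sample table uses them; storing dates with four-digit years removes the guesswork about the century.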


Sample Dataset 3: Longitudinal Data (Vertical)

ID  Visit  Treatment  SBP
1   1      0          135
1   2      0          128
1   3      0          140
2   1      1          140
2   2      1          133
2   3      1          131


Sample Dataset 4: Longitudinal Data (Horizontal)

ID  Treatment  SBP1  SBP2  SBP3
1   0          135   128   140
2   1          140   133   131
3   1          150   151   143
4   1          132   135   126
5   0          136   129   135
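
The vertical and horizontal layouts hold the same information, and one can be converted to the other. The following Python sketch, using the first two participants from the vertical sample, shows one way to do it; `vertical_to_horizontal` is a hypothetical helper, and statistical packages provide their own reshape commands.

```python
def vertical_to_horizontal(rows):
    """Convert one-row-per-visit records to one-row-per-participant records.

    `rows` is a list of (ID, Visit, Treatment, SBP) tuples.
    Returns a list of dicts with keys ID, Treatment, SBP1, SBP2, ...
    """
    wide = {}
    for pid, visit, treatment, sbp in rows:
        # Create the participant's record on first sight, then add this visit.
        record = wide.setdefault(pid, {"ID": pid, "Treatment": treatment})
        record[f"SBP{visit}"] = sbp
    return [wide[pid] for pid in sorted(wide)]


vertical = [
    (1, 1, 0, 135), (1, 2, 0, 128), (1, 3, 0, 140),
    (2, 1, 1, 140), (2, 2, 1, 133), (2, 3, 1, 131),
]
horizontal = vertical_to_horizontal(vertical)

for row in horizontal:
    print(row)
# {'ID': 1, 'Treatment': 0, 'SBP1': 135, 'SBP2': 128, 'SBP3': 140}
# {'ID': 2, 'Treatment': 1, 'SBP1': 140, 'SBP2': 133, 'SBP3': 131}
```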


Sample Datasets

What not to do!

Real life examples


Preparing Files for Statistical Analysis

• Allow adequate time for data preparation
• The better the quality of the data, the less time spent analyzing
  – Know your data
    • Look at frequency distributions and scatterplots
  – Multiple checks for errors
  – Minimize missing data if at all possible
    • Be aware of amount of data missing and why
• Freeze the dataset
  – Copy to another file and date the file
  – Document any corrections made to file and also correct in original database and on forms
• Plan on recoding categorical variables so each group has a sufficient sample size
• Prepare a separate code sheet for data
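
Two of the steps above, looking at a frequency distribution and recoding sparse categories, can be sketched in a few lines of Python. The category labels, the threshold `min_count=2`, and the function names are assumptions for illustration; which groups to combine is an analysis decision, not something code can settle.

```python
from collections import Counter


def frequency_table(values):
    """Count each distinct value, including None for missing entries."""
    return Counter(values)


def collapse_sparse(values, min_count, other="other"):
    """Recode categories with fewer than `min_count` cases into one group.

    Missing entries (None) are left as missing rather than recoded.
    """
    counts = Counter(v for v in values if v is not None)
    return [
        v if v is None or counts[v] >= min_count else other
        for v in values
    ]


category = ["A", "B", "A", "C", None, "A", "B", "D"]

freq = frequency_table(category)
print(freq["A"], freq["B"], freq["C"], freq["D"], freq[None])  # 3 2 1 1 1

recoded = collapse_sparse(category, min_count=2)
print(recoded)  # ['A', 'B', 'A', 'other', None, 'A', 'B', 'other']
```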


Preparing Files for Statistical Analysis (2)

• General spreadsheet design
  – One line header row with a unique one word name for each variable
  – Do not mix data types within one column
  – Unique identifying number for each case
  – Only include raw, un-summarized data, i.e., no summary statistics or graphs in spreadsheet
  – Date format with four digit years
  – Avoid underlining, bold fonts, or italics
  – Do not leave blank rows or columns in between data
  – Do not use a row to label a group, use a grouping variable (column)
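
Some of these layout rules can be checked automatically once the spreadsheet has been saved as comma-delimited text. The Python sketch below covers only three of them (one-word unique names, no blank rows, no mixed types in a column); the function names and the sample text are assumptions, and the check is deliberately simple.

```python
import csv
import io
import re


def is_number(text):
    """True if the cell text parses as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def check_layout(csv_text):
    """Check CSV text against a few layout rules; return a list of problems."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    header, data = rows[0], rows[1:]
    problems = []

    # Header: one-word names, each used only once.
    for name in header:
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
            problems.append(f"header {name!r}: not a one-word name")
    if len(set(header)) != len(header):
        problems.append("header: duplicate variable names")

    # No blank rows in between data (line 1 is the header).
    for line_no, row in enumerate(data, start=2):
        if not any(cell.strip() for cell in row):
            problems.append(f"line {line_no}: blank row")

    # No column may mix numbers and text.
    for col, name in enumerate(header):
        kinds = set()
        for row in data:
            if col < len(row) and row[col].strip():
                kinds.add("number" if is_number(row[col]) else "text")
        if len(kinds) > 1:
            problems.append(f"column {name!r}: mixes numbers and text")

    return problems


good_csv = "ID,age,gender\n1,20,0\n2,33,1\n"
bad_csv = "ID,age at entry,age\n1,20,20\n\n2,NA,33\n"

print(check_layout(good_csv))  # []
for problem in check_layout(bad_csv):
    print(problem)
```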


Preparing Files for Statistical Analysis (3)

• Missing Data
  – Consider what software will be used for analysis
  – Use different codes to indicate reason missing
    • e.g., not applicable, unable to complete, or missing
  – If numeric field
    • The missing-data code must not be a valid data point
    • Do not use text, such as “NA”, “missing”, “*”
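
As a sketch of how numeric missing-data codes might be handled before analysis, the Python below uses the hypothetical codes -7, -8 and -9, chosen on the assumption that negative values can never be valid measurements for the variable. The codes and the name `split_missing` are illustrative only; the right codes depend on the variable and on the analysis software.

```python
# Hypothetical codes: values that can never be a valid measurement.
MISSING_CODES = {
    -7: "not applicable",
    -8: "unable to complete",
    -9: "missing",
}


def split_missing(values):
    """Separate real measurements from missing-data codes.

    Returns (clean, reasons). `clean` has None in place of each code, so the
    codes are never averaged in by mistake; `reasons` counts why data are missing.
    """
    clean = []
    reasons = {}
    for v in values:
        if v in MISSING_CODES:
            clean.append(None)
            reason = MISSING_CODES[v]
            reasons[reason] = reasons.get(reason, 0) + 1
        else:
            clean.append(v)
    return clean, reasons


sbp = [135, -9, 128, -8, 140, -9]
clean, reasons = split_missing(sbp)

print(clean)    # [135, None, 128, None, 140, None]
print(reasons)  # {'missing': 2, 'unable to complete': 1}
```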


Exporting Excel data into Stata

• Save the Excel file as comma delimited
  – In the Save As dialog box choose CSV (comma delimited) for Save As type
• In Stata
  – Go to drop down menu “File” – “Import” – “ASCII data created from a spreadsheet”
  – or use the command ‘insheet using “filepath.csv”, comma names’


Exporting Access Data into Stata

• Save Access file as a Comma Delimited File
  – Open the Access table
  – From the File Menu, select Export
  – In the pop-up dialog box click on “Save as file type” and select Text Files
  – Click “Save All” and in the Export Text Wizard select delimited and comma
• Follow instructions for importing a comma delimited file into Stata


Data Transfer Software

• Software is available to transfer data between applications
• Stat/Transfer and DBMS/COPY
  – Access, ASCII, dBase, Epi Info, Excel, JMP, Paradox, QuattroPro, SAS, S-Plus, SPSS, Stata, Statistica
  – Need to update as software updates


ICTR Resources

• Data management and statistical support available for ICTR (CRU) protocols
  – http://ictr.johnshopkins.edu/connection/
• Computer facilities with data management and statistical software
  – Located in Carnegie 446
  – http://www.hopkinsmedicine.org/gcrc/
  – Gsurvey, Teleform, scanning, sample size programs, statistical software, data transfer software


Data Security and Confidentiality

• Do not include unnecessary protected health information on research data files
  – No names, addresses, phone numbers, social security numbers
  – No medical record numbers
• Files that link the study ID to the PHI should not be maintained on removable storage drives or laptops
• E-mailing of data files should be limited
  – If PHI are on the file, then the files should be encrypted and password protected
  – Do not e-mail the password
• Use JShare or SharePoint to share or transfer data files


Summary

• Well designed systems minimize data errors and future problems
• Data management systems should be chosen based on resources and individual needs
  – Spreadsheets are appropriate for small and simple data sets
  – Databases provide more options for data management
• Add simple validations to check data entry
• Following guidelines for preparing files for statistical analysis will save time
• Data transfer software is available to transfer data between applications
• Limit PHI and keep data secure
