Data Management: Collecting, Protecting and Managing Data€¦ · Audit Trail Logging in REDCap...

25
1 Data Management: Collecting, Protecting and Managing Data Presented by Andre Hackman Johns Hopkins Bloomberg School of Public Health Department of Biostatistics July 17, 2014 Objectives Review of available database and data management tools and their capabilities Demonstrate examples of good data management Discuss strategies and good practice for data security and backup Discuss data conversion and transfer options Overview of Data Management Tools Types of data management tools to consider Type of tool Examples Spreadsheet Microsoft Excel Google Docs Spreadsheet Open Office Calc Desktop database Microsoft Access Filemaker Pro Specialized databases Epi Info, CSPro, EpiData Web database REDCap OpenClinica Mobile data collection Open Data Kit, CommCare, Magpi, iFormBuilder,

Transcript of Data Management: Collecting, Protecting and Managing Data€¦ · Audit Trail Logging in REDCap...

Page 1: Data Management: Collecting, Protecting and Managing Data€¦ · Audit Trail Logging in REDCap Mobile data collection • Mobile data collection – Apple (iPhone, iPad) and Android

1

Data Management: Collecting, Protecting and

Managing Data

Presented by

Andre HackmanJohns Hopkins Bloomberg School of Public Health

Department of Biostatistics

July 17, 2014

Objectives

• Review of available database and data management tools and their capabilities

• Demonstrate examples of good data management

• Discuss strategies and good practice for data security and backup

• Discuss data conversion and transfer options

Overview of Data Management Tools

Types of data management tools to consider

Type of tool Examples

Spreadsheet Microsoft ExcelGoogle Docs SpreadsheetOpen Office Calc

Desktop database Microsoft Access Filemaker Pro

Specialized databases Epi Info, CSPro, EpiData

Web database REDCapOpenClinica

Mobile data collection Open Data Kit, CommCare, Magpi,iFormBuilder,

Page 2: Data Management: Collecting, Protecting and Managing Data€¦ · Audit Trail Logging in REDCap Mobile data collection • Mobile data collection – Apple (iPhone, iPad) and Android

2

Choosing a data management tool…

Pick the tool that is appropriate for the data needs of the project

• What will be the challenges you face with each tool?

– Complexity and scale of the data being collected• Is it longitudinal data with many time points?

• What is the expected number of records?

Choosing a data management tool…

Pick the tool that is appropriate for the data needs of the project

• What will be the challenges you face with each tool?

– Complexity and scale of the data being collected

– Mixture of collection modes: mobile, kiosk, and computer entry.• How will data flow between them?

Choosing a data management tool…

Pick the tool that is appropriate for the data needs of the project

• What will be the challenges you face with each tool?

– Complexity and scale of the data being collected

– Mixture of collection modes: mobile, kiosk, and computer entry.

– Complexity of the data entry forms

Page 3: Data Management: Collecting, Protecting and Managing Data€¦ · Audit Trail Logging in REDCap Mobile data collection • Mobile data collection – Apple (iPhone, iPad) and Android

3

Choosing a data management tool…

Pick the tool that is appropriate for the data needs of the project

• What will be the challenges you face with each tool?

– Complexity and scale of the data being collected

– Mixture of collection modes: mobile, kiosk, and computer entry.

– Complexity of the data entry forms

– Multiple users access• Can you control and restrict access?

– Restrict data views by site

– Control whether users have full or read-only access

Choosing a data management tool…

Pick the tool that is appropriate for the data needs of the project

• What will be the challenges you face with each tool?

– Complexity and scale of the data being collected

– Mixture of collection modes: mobile, kiosk, and computer entry.

– Complexity of the data entry forms

– Multiple users access

– Merging/importing/exporting data capabilities

Choosing a data management tool…

Pick the tool that is appropriate for the data needs of the project

• What will be the challenges you face with each tool?

– Complexity and scale of the data being collected

– Mixture of collection modes: mobile, kiosk, and computer entry.

– Complexity of the data entry forms

– Multiple users access

– Merging/importing/exporting data capabilities

– Audit capabilities

Page 4: Data Management: Collecting, Protecting and Managing Data€¦ · Audit Trail Logging in REDCap Mobile data collection • Mobile data collection – Apple (iPhone, iPad) and Android

4

Choosing a data management tool…

Pick the tool that is appropriate for the data needs of the project• What will be the challenges you face with each tool?

– Complexity and scale of the data being collected– Mixture of collection modes such as mobile, kiosk, and computer entry.– Complexity of the data entry forms– Multiple users access– Merging/importing/exporting data capabilities– Audit capabilities

– Reporting capabilities during and after the data collection. Built-in or separate tool?

• During: – What data is collected/missing? – Running validation checks on data– Interim looks at the trends

• After: – Summaries– Analytic results of study

Choosing a data management tool…

Pick the tool that is appropriate for the data needs of the project

• What will be the challenges you face with each tool?

– Complexity and scale of the data being collected

– Mixture of collection modes such as mobile, kiosk, and computer entry.

– Complexity of the data entry forms

– Multiple users access

– Merging/importing/exporting data capabilities

– Audit capabilities

– Reporting capabilities during and after the data collection. Built-in or separate tool?

Spreadsheets

• “Lowest common denominator” • Format that all users can open and edit files• Often used by labs to output their results• Disadvantages

– Data can be corrupted by single column sorts or accidental keystrokes– Data entry interface is limited.– Sometimes program tries to “assist” the user with data– Not designed for multiple user entry (unless you are using a web-based

spreadsheet)• Helpful spreadsheet data management commands

– Transpose data (Paste Special, transpose)– Remove duplicates (Data, Remove Duplicates wizard)– Match merge data (VLOOKUP)– Concatenation/substrings (CONCATENATE, SUBSTRING)– Data Validation (Data, Data Validation)– Use “Forms” for data entry (Excel: Add button to Quick Access Toolbar)– Use “Table” to maintain integrity (Insert, Table)– Use Filter to quickly subset your data

Page 5: Data Management: Collecting, Protecting and Managing Data€¦ · Audit Trail Logging in REDCap Mobile data collection • Mobile data collection – Apple (iPhone, iPad) and Android

5

SpreadsheetsData management tips

Transpose data (Copy, Paste Special, Transpose)

From columns to rows

Concatenation

Create new fields by concatenating fields together

Data ValidationFor coded fields, create validation to ensure consistent data.

SpreadsheetsData management tips

“Forms” tool for data entry

Remove duplicates tool

To ensure unique records, use the Remove Duplicates option. Be careful if not checking entire records.

Some spreadsheets allow you to automatically create data entry form. Form is simple but often better than trying to enter data directly in spreadsheet

Desktop databases• Microsoft Access and FileMaker Pro commonly used

• Includes database, forms, reports, queries, custom programming

• Easy to initially set up using wizards

• Designed for simultaneous access from multiple users

• Disadvantages– By default, limited to local network access

– Built-in wizards can make it easy to start but tasks beyond wizards require training

– Scalability can be an issue if amount of data is large

Page 6: Data Management: Collecting, Protecting and Managing Data€¦ · Audit Trail Logging in REDCap Mobile data collection • Mobile data collection – Apple (iPhone, iPad) and Android

6

Desktop databasesMicrosoft Access

All of the database elements are in one file

Desktop databasesMicrosoft Access

Data entry forms

Queries

Desktop databasesMicrosoft Access

Page 7: Data Management: Collecting, Protecting and Managing Data€¦ · Audit Trail Logging in REDCap Mobile data collection • Mobile data collection – Apple (iPhone, iPad) and Android

7

Desktop databasesMicrosoft Access

Reports

Desktop Database: Microsoft Access

(“Access 2007: The Missing Manual, Matthew MacDonald, O'Reilly Media, Inc., 2006)

Specialized databases

• Epi Info, CSPro, Epidata are common

• Includes database, forms, reports, queries, custom programming, analysis tools

• Quickly set up database by building dictionary/forms

• Some include mapping capability to map your data.

• Some can be networked for multi-user access but aren’t truly multi-user.

• Disadvantages– Can require significant training time (CSPro for example)

– Tend to be Windows only applications

Page 8: Data Management: Collecting, Protecting and Managing Data€¦ · Audit Trail Logging in REDCap Mobile data collection • Mobile data collection – Apple (iPhone, iPad) and Android

8

Specialized databasesEpiData

Specialized databasesEpi Info – Form Design

Specialized databasesEpi Info – Mapping

Page 9: Data Management: Collecting, Protecting and Managing Data€¦ · Audit Trail Logging in REDCap Mobile data collection • Mobile data collection – Apple (iPhone, iPad) and Android

9

Overview of Data Management Tools

• The web database application– REDCap and OpenClinica are examples

– Include database, forms, reports, queries, user/site management, audit trails, backup and security

– Automatically build forms from codebooks

– Accessible from a web browser – no additional software needed

– Scale for very large studies and multiple sites

– Most have large user communities as a resource

– Disadvantages• Features that YOU want may not exist yet

• Some changes require approval by system admin

http://www.project-redcap.org/

Project Home View in REDCap

Page 10: Data Management: Collecting, Protecting and Managing Data€¦ · Audit Trail Logging in REDCap Mobile data collection • Mobile data collection – Apple (iPhone, iPad) and Android

10

Entering Data in REDCap

Exporting Data in REDCap

Exporting Data in REDCap

Page 11: Data Management: Collecting, Protecting and Managing Data€¦ · Audit Trail Logging in REDCap Mobile data collection • Mobile data collection – Apple (iPhone, iPad) and Android

11

Setting User Rights in REDCap

Audit Trail Logging in REDCap

Mobile data collection

• Mobile data collection– Apple (iPhone, iPad) and Android (phones and tablets) are main

options.

– Data is entered directly on a smart phone or tablet

– Data can be stored locally and transmitted manually OR it can be transferred in real time to a central database.

– Smart phones work well for simple question types

– Tablets are better suited for complex question types with many answers OR the ability to see previous answers

– Considerations:• If data is stored on the device, is it protected in case of theft, damage, etc.

• Do you need a mobile data connection to transfer data on-demand or will it be transferred via WiFi or USB?

• Battery life of the device.

• Where will the data ultimately be stored? Central server, website?

Page 12: Data Management: Collecting, Protecting and Managing Data€¦ · Audit Trail Logging in REDCap Mobile data collection • Mobile data collection – Apple (iPhone, iPad) and Android

12

Mobile data collection

• Options for mobile data collection– iFormbuilder

• https://www.iformbuilder.com/

– doForms• http://www.doforms.com/

– Magpi• https://www.magpi.com

– Formhub (for Android only)• https://www.formhub.org/

– Enketo• https://enketo.org/

– Open Data Kit (ODK) (for Android only)• This is what our Center currently uses for mobile data collection.

Open Data Kit (ODK)Sample Screens

Open Data Kit (ODK)Sample Screens

Page 13: Data Management: Collecting, Protecting and Managing Data€¦ · Audit Trail Logging in REDCap Mobile data collection • Mobile data collection – Apple (iPhone, iPad) and Android

13

Form & Database Design

• Recommendations for fields and codings on forms

– Make codes for missing and refused data fields.• A common convention is “9” for missing but only if it is not a valid

value

• Problematic if the 9’s get included in the analysis

• Sometimes need to distinguish between – “missing – not obtained”,

– “don’t know”,

– “refused to answer”

Form & Database Design

• Recommendations for fields and database design

– Make codes for missing and refused data fields.

– Use multiple identifiers on each form as a cross check

• You don’t need to save all of them each time. It is simply a cross-check.

Page 14: Data Management: Collecting, Protecting and Managing Data€¦ · Audit Trail Logging in REDCap Mobile data collection • Mobile data collection – Apple (iPhone, iPad) and Android

14

Form & Database Design

• Recommendations for fields and database design

– Make codes for missing and refused data fields.

– Consider requiring that all fields are entered unless they are conditional on another question.

– Use multiple identifiers on each form as a cross check.

– Code your responses whenever possible and be consistent in your code assignments. For example, use “0” for NO and “1” for YES.

• Consistency makes it easier for users of your data. They know what to expect!

Form & Database DesignNot the best format. Too many spelling errors:

1. What asthma medication was prescribed for the child to take for the last 7 days?

1a. __________________________

1b. __________________________

1c. __________________________

1d. __________________________

1e. __________________________

Better format with most answers coded:1. What asthma medication was prescribed for the child to

take for the last 7 days? Albuterol

Flovent

Prednisone

Other, describe: __________________________________

Form & Database DesignInconsistent coding for “Missing / Don’t know”

1. How often did you exercise this week?0 – Did not exercise at all

1 – 1 time

2 – 2 times

3 – 3 times

4 – 4 times

5 – 5 or more times

6 – Missing / Don’t know

2. How many hours did you sleep last night?1 – less than 4 hours

2 – 4 to 6 hours

3 - 6 to 8 hours

4 – Missing / Don’t know

In this example, I would recommend making the Missing / Don’t know a consistent value such as “9”.

Page 15: Data Management: Collecting, Protecting and Managing Data€¦ · Audit Trail Logging in REDCap Mobile data collection • Mobile data collection – Apple (iPhone, iPad) and Android

15

Form & Database Design

• Recommendations for fields and database design

– Make codes for missing and refused data fields.

– Consider requiring that all fields are entered unless they are conditional on another question.

– Use multiple identifiers on each form as a cross check.

– Code your responses whenever possible and be consistent in your code assignments.

– Do not try to store more than 1 piece of information in a field.

Form & Database DesignIf you store multiple pieces of data in a single field, it will be difficult to analyze.

ID  Assessment 

1001 grade 3, slightly elevated, round

1002 grade 2, normal, oval

ID  Grading Temperature Shape

1001 grade 3 slightly elevated round

1002 grade 2 normal oval

Better: Storing the data in separate fields will make it much easier to analyze.

Field Naming

• Names should be meaningful– Meaningful names reduce reliance on labels and codebooks– Example:

1. In the last week, has your child had a fever?Non-descriptive field name : Q1More Descriptive field name: child_fvr_lwk

– Exceptions with complex questions

• Names should be reasonable length– A general rule is 10-15 characters– Too many characters is cumbersome to use– Example:

Too long: child_fever_in_last_weekToo short: flwJust right: child_fvr_lwk

Page 16: Data Management: Collecting, Protecting and Managing Data€¦ · Audit Trail Logging in REDCap Mobile data collection • Mobile data collection – Apple (iPhone, iPad) and Android

16

Field Naming – More Meaningful Names

Field Naming – More Meaningful Names

Field names with less meaning… what are Q1, Q2, Q3?

Field Naming – More Meaningful Names

Field names with meaning in the name…

Page 17: Data Management: Collecting, Protecting and Managing Data€¦ · Audit Trail Logging in REDCap Mobile data collection • Mobile data collection – Apple (iPhone, iPad) and Android

17

Field Naming – More Meaningful Names

Data with “self-describing” field names is much easier to understand

Field Naming

• Use prefixes and suffixes when appropriate

– Prefix example: A prefix indicating the form name

bf_lastwk_fever

• where “bf_” indicates this field is from the Baseline Form.

– Suffix example: A suffix can indicate a time period for longitudinal data

lastwk_fever_12

• where “_12” indicates this field is collected at the 12 month visit

– The suffix will help with reshaping data from wide to long format– Prefix/suffix can also help with extracting subsets of fields.

Field Naming – Suffix Useful for Reshaping

WIDE can be reshaped to LONG

Page 18: Data Management: Collecting, Protecting and Managing Data€¦ · Audit Trail Logging in REDCap Mobile data collection • Mobile data collection – Apple (iPhone, iPad) and Android

18

Field Naming

• Avoid blanks in names: They will be problematic!– Example:

“first name” is better renamed as “firstname” or “first_name”

• Decide on the character convention:– all lower case– ALL UPPER CASE– CamelCase– use_underscores

• Be consistent in choices!– Consistent choices will make it much easier for those who are new to

your dataset to anticipate the structure of the data.

Codebooks

• The codebook describes the data fields on each form and their codings.

• This document serves as a map to anyone working with the data.

• Log at end of codebook is a good place to list changes that occur during the study related to that form.

• The codebook header should always have the date of change so that a user can tell if they have the current version.

• The Notes column iswhere field-specific change notes should be listed.

Codebook sample

Page 19: Data Management: Collecting, Protecting and Managing Data€¦ · Audit Trail Logging in REDCap Mobile data collection • Mobile data collection – Apple (iPhone, iPad) and Android

19

• The last page of the codebook shouldhave dated and signed notes.

• Anyone working with this dataset should be able to look at the codebook and clearly understand any issues or changes that occurred with the data.

Codebook sample

• Programs like REDCap use a defined template as a codebook.

• Because file gets read into REDCap, codebook changes and notes have to be managed separately.

Codebook sample – REDCap

Data Quality and Auditing

• Data quality strategies– Include validation checks during data entry

– Set fields as “required” when appropriate

– Double entry of records• The amount of double entry depends entirely on the study environment

• Some studies double enter 100% of all data

• Other studies enter a random, sub-sample

– Develop data programs to review records• Write programs that check for errors

• Audit trails should be created for changes– On paper, create a standard process of change and initialing

– If electronic entry, then application should record changes made.

Page 20: Data Management: Collecting, Protecting and Managing Data€¦ · Audit Trail Logging in REDCap Mobile data collection • Mobile data collection – Apple (iPhone, iPad) and Android

20

Data Security and Handling

• Data Security – Protect the subjects!

• They have consented to data collection.

– Protect yourself and your institution: • IRB requires that you follow best practices

• There can be large fines in the event of a data breach

– De-identify the data• When data is being stored or being prepared for distribution, identifiers should be

removed

• Datasets with identifiers should be stored in a secured environment with limited access – Not carried around on portable devices

• Identifiers are usually not needed for analysis

• See next slide for HIPAA list of 18 identifiers

HIPAA Protected Health Information (PHI): List of 18 Identifiers• 1. Names;

• 2. All geographical subdivisions smaller than a State, including street address, city, county, precinct, zip code, and their equivalent geocodes, except for the initial three digits of a zip code, if according to the current publicly available data from the Bureau of the Census: (1) The geographic unit formed by combining all zip codes with the same three initial digits contains more than 20,000 people; and (2) The initial three digits of a zip code for all such geographic units containing 20,000 or fewer people is changed to 000.

• 3. All elements of dates (except year) for dates directly related to an individual, including birth date, admission date, discharge date, date of death; and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older;

• 4. Phone numbers;

• 5. Fax numbers;

• 6. Electronic mail addresses;

• 7. Social Security numbers;

• 8. Medical record numbers;

• 9. Health plan beneficiary numbers;

• 10. Account numbers;

• 11. Certificate/license numbers;

• 12. Vehicle identifiers and serial numbers, including license plate numbers;

• 13. Device identifiers and serial numbers;

• 14. Web Universal Resource Locators (URLs);

• 15. Internet Protocol (IP) address numbers;

• 16. Biometric identifiers, including finger and voice prints;

• 17. Full face photographic images and any comparable images; and

• 18. Any other unique identifying number, characteristic, or code (note this does not mean the unique code assigned by the investigator to code the data)

Data Security and Handling

De-identifying your data

• HIPAA allows two ways to de-identify your data

1. Expert Determination• Apply statistical or scientific principles

• Very small risk that the anticipated recipient could identify the individual

2. Safe Harbor• Removal of the 18 types of identifiers

• No actual knowledge residual information can identify individual

Page 21: Data Management: Collecting, Protecting and Managing Data€¦ · Audit Trail Logging in REDCap Mobile data collection • Mobile data collection – Apple (iPhone, iPad) and Android

21

Data Security and Handling

Data Security and Handling

Data Security and Handling

• Encrypting your data– It is best practice to encrypt any identifiable or sensitive data

• Depends on the data management system – You may not have direct control of the storage device

• If the data is on a system or device that you control, you can encrypt the data using the operating system

• Many hard drives now have encryption built-in

– To send sensitive data via email• Use software such as 7-zip which can create strong encryption zip files.• Do not send the password for the encrypted file to the same email address!

– A better choice is a phone call, text, or sending the password to a separate email address

– Flash drives• For safely transporting data via flash drive, you should create an encrypted area.• When buying, check if the flash drive has encryption built in. • Otherwise, you can create an encrypted area on the drive using TrueCrypt or

similar software.

Page 22: Data Management: Collecting, Protecting and Managing Data€¦ · Audit Trail Logging in REDCap Mobile data collection • Mobile data collection – Apple (iPhone, iPad) and Android

22

Data Security and Handling

• Backing up your data– Create a regular backup schedule– Backup often and develop a plan for managing the backups– Test your backup to make sure that it works and that you know what to

expect. Periodically try to restore some files to test the backup– If you are responsible for your own backups, consider:

– Online backup using services like CrashPlan or Carbonite– External hard drive with encryption

– Flash drives should NOT be a primary backup method. When they fail, they are rarely recoverable.

• The differences between backup services and sync services is blurring

– Services like Dropbox will allow you to save versions and recover deleted files

– Backup services will not have storage quotas. Typically, unlimited backups for x number of computers or mobile devices.

Data Security and Handling

Privacy & Security Contacts at JHU– [email protected]

[email protected]

– IT Help Desk 410-735-4357

– Darren Lacey ([email protected])• Chief Information Security Officer

Data Conversion

• Most programs can export data to a common data format– Comma Separated Values (CSV) is a text format that can be used by

most programs.

– eXtensible Markup Language (XML) a text format that is structured with each data element surround by tags. Can become quite large because of the overhead of the tags.

– Microsoft Excel (.xls) or ODF Spreadsheet (.ods)

• Stat/Transfer – Very useful if you frequently convert data between programs

– Both a graphic interface and a command line interface

– Student pricing available from Stata through GradPlan

Page 23: Data Management: Collecting, Protecting and Managing Data€¦ · Audit Trail Logging in REDCap Mobile data collection • Mobile data collection – Apple (iPhone, iPad) and Android

23

Data Conversion

Data Conversion

• 1-2-3• Access (Windows version only)• ASCII - Delimited• ASCII- Fixed Format• dBASE and compatible formats• Data Documentation Initiative (DDI) Schemas• Epi Info• Excel• FoxPro• Gauss• HTML Tables (write only)• JMP• LIMDEP• Matlab• Mineset• Minitab• Mplus• NLOGIT• ODBC (Windows and Mac versions only)

• OpenDocument Spreadsheets• OSIRIS (read-only)• Paradox• Quattro Pro• R• RATS• SAS Data Files• SAS Value Labels• SAS CPORT (read-only)• SAS Transport Files• S-PLUS• SPSS Data Files• SPSS Portable• Stata• Statistica (Windows version only)• SYSTAT• Triple-S

Stat/Transfer Version 11 supports the following file formats.

Data Cleaning and Conversion

• OpenRefine (formerly GoogleRefine)From openrefine.org:

“OpenRefine is a powerful tool for working with messy data, cleaning it up, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase.”

– Learning curve to get proficient with tool.

– Runs locally on your computer.

Page 24: Data Management: Collecting, Protecting and Managing Data€¦ · Audit Trail Logging in REDCap Mobile data collection • Mobile data collection – Apple (iPhone, iPad) and Android

24

Data Transfer Methods

• Data transfer options– Email is not always an ideal method because of file size or security issues– An ideal solution will encrypt during transfer and storage– JHU offers JSHARE (my.jhu.edu)

• Web-based, 5GB of space• Can grant access to non-Hopkins users• WebDAV allows you to connect as a network folder

– Dropbox or SpiderOak• Web-based, some initial storage free• Allows sharing to collaborate and synchronization• Data is encrypted on Dropbox site BUT site administrators can access your data• Add a level of security by encrypting files BEFORE posting to Dropbox.• SpiderOak has encryption that you control. They cannot access your data.• JHU may be licensing Microsoft SkyDrive or Box but not yet finalized

– JHSPH offers SharePoint (my.jhsph.edu)• Allows for file sharing, discussions, announcements, calendars• Integrates closely with Microsoft Office• Can be challenging to use with external collaborators

Final Thoughts…

• Before you start collecting data, make sure that you test the “flow” of the data.

• Be sure that you don’t have to process too much to get data into final form.

• Once you start collecting data, don’t make radical changes unless absolutely necessary.

• When changes occur, think through the repurcussions. – What will be lost and gained?– How will it impact the final dataset?– Will it be reasonable to explain in a publication?

• Document events that come up during data collection– Collection issues (equipment, political, environmental)– Decisions to change elements such as forms, variables, timing

Final thoughts…After collection, when managing the data…• Organize your work so that it is reproducible and easy to understand when you return to

it.– Use a system for naming your programs and their outputs– Ideal is a “push-button” system that you can run from beginning to end with the raw data.– Break the programs into logical pieces/batches.

• Could be batches of data

• Never overwrite the data when you are making the changes. Save to a new dataset!– It is almost guaranteed that there will be some errors discovered during analysis.

• Review data whenever you make changes– If possible, confirm that the number of changes is what you expect. – Spot check some records and make sure that they match. Check the first, middle and last records.

• Document the changes that you make – Make comments in your programs so you and others will understand what was done to the data– Write the comments as if they are for someone unfamiliar with the study.

• Details will fade from everyone’s memory over time.• Transition will be easier when there is turnover in a team

– Make notes in your codebook describing changes or issues that must be communicated to anyone that might analyze the data

Page 25: Data Management: Collecting, Protecting and Managing Data€¦ · Audit Trail Logging in REDCap Mobile data collection • Mobile data collection – Apple (iPhone, iPad) and Android

25

Thank You!

Contact Information

• Andre HackmanResearch Associate, Biostatistics Center

Department of BiostatisticsJohns Hopkins Bloomberg School of Public Health615 North Wolfe Street, Room E3146Baltimore, MD 21205Phone: (410)614-5646Fax: (410)955-0958

email: [email protected]

Resources mentioned

• Online backup– Crashplan : www.crashplan.com– Carbonite : www.carbonite.com

• Encryption and Compression– TrueCrypt : www.truecrypt.org– 7-Zip : www.7-zip.org

• Data Conversion and Cleaning– Stat/Transfer : www.stattransfer.com

• Data Transfer– Dropbox : www.dropbox.com– Spider Oak: www.spideroak.com