ESA Ignite talk on quality control for data

Ensuring That Your Data are High Quality | Carly Strasser | @carlystrasser | California Digital Library | ESA 2014

Description

Talk on quality control and quality assurance for ecological data, presented as an Ignite talk at the ESA 2014 meeting in Sacramento, CA, on 12 August 2014.

Transcript of ESA Ignite talk on quality control for data

Page 1: ESA Ignite talk on quality control for data

Ensuring That Your Data are High Quality

Carly Strasser | @carlystrasser | California Digital Library

ESA 2014

Page 2

Quality assurance & control: Mechanisms for preventing errors from entering a data set

Page 3

Quality assurance & control:

Plan → Collect → Assure → Describe → Preserve → Discover → Integrate → Analyze

Page 4

Why?

Page 5

Why?

Page 6

From:

Page 7

4 Strategies: Prevent, Minimize, Detect, Handle

From Flickr by Elliott Teel

Page 8

Prevent Errors Before Collection

•  Define & enforce standards
    •  Formats
    •  Codes
    •  Measurement units
    •  Metadata

From Flickr by StacieBee

Page 9

Prevent Errors Before Collection

•  Define & enforce standards
    •  Formats
    •  Codes
    •  Measurement units
    •  Metadata
•  Assign responsibility for data quality

From Flickr by StacieBee
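A standard is easiest to enforce when it is written down in a form a script can check. The sketch below assumes a hypothetical project standard (ISO 8601 dates, a short list of four-letter species codes, lengths in centimetres); the codes, field names, and the `check_record` function are illustrative, not part of the talk.

```python
from datetime import datetime

# Hypothetical project standard; the codes and field names are illustrative assumptions.
STANDARD = {
    "date_format": "%Y-%m-%d",                  # ISO 8601 calendar dates
    "species_codes": {"QULO", "QUDO", "PISA"},  # allowed four-letter species codes
    "length_unit": "cm",                        # one unit for every length measurement
}


def check_record(record, standard=STANDARD):
    """Return a list of problems found in one record; an empty list means it passes."""
    problems = []
    try:
        datetime.strptime(str(record.get("date", "")), standard["date_format"])
    except ValueError:
        problems.append(f"bad date format: {record.get('date')!r}")
    if record.get("species") not in standard["species_codes"]:
        problems.append(f"unknown species code: {record.get('species')!r}")
    if record.get("length_unit") != standard["length_unit"]:
        problems.append(f"unexpected unit: {record.get('length_unit')!r}")
    return problems


print(check_record({"date": "2014-08-12", "species": "QULO", "length_unit": "cm"}))
# → []
print(len(check_record({"date": "12/08/2014", "species": "oak", "length_unit": "in"})))
# → 3
```

Running every new record through a check like this at entry time catches format, code, and unit mistakes before they spread into the data set.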

Page 10

Prevent Errors Before Collection

•  Allow “other” values
•  Comments & notes fields: allow handling of unexpected situations

From Flickr by Olga Nohra
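One way to pair a controlled vocabulary with an escape hatch is to accept “other” only when the notes field explains it. A minimal sketch, assuming a hypothetical `habitat` field and vocabulary:

```python
# Hypothetical controlled vocabulary; "other" is allowed only with an explanation.
HABITATS = {"grassland", "woodland", "riparian", "other"}


def check_habitat(record):
    """Return a problem description, or None if the habitat entry is acceptable."""
    habitat = record.get("habitat")
    if habitat not in HABITATS:
        return f"habitat {habitat!r} is not in the vocabulary"
    notes = (record.get("notes") or "").strip()  # treat missing or blank notes alike
    if habitat == "other" and not notes:
        return "habitat is 'other' but the notes field is empty"
    return None


print(check_habitat({"habitat": "other", "notes": "vernal pool edge"}))  # → None
```

Unexpected situations are still recorded, but they never arrive as unexplained free text in a coded field.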

Page 11

Minimize Errors During Collection

•  Eliminate manual data entry
•  Design data storage well
    •  Minimize repeat entry
    •  Use consistent terminology
    •  Atomize data

From Flickr by Butal Lee
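Atomizing means storing one value per field. A measurement typed as "12.5 cm", for instance, mixes a number and a unit; splitting them lets the number be range-checked and the unit standardized. A minimal sketch, assuming entries look like `<number> <unit>`; the function name is hypothetical.

```python
import re

# A number (optionally negative, optionally with decimals) followed by a unit word.
_MEASUREMENT = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*$")


def atomize_measurement(text):
    """Split a combined entry such as '12.5 cm' into (12.5, 'cm').

    Raises ValueError if the text does not look like '<number> <unit>'.
    """
    match = _MEASUREMENT.match(text)
    if match is None:
        raise ValueError(f"cannot parse measurement: {text!r}")
    return float(match.group(1)), match.group(2)


print(atomize_measurement("12.5 cm"))  # → (12.5, 'cm')
```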

Page 12

Minimize Errors: Use databases

You should invest time in learning databases if your data sets are large or complex.

Consider investing time in learning databases if your data are small and humble but
•  you ever intend to share your data, or
•  you are < 30 years old

From Mark Schildhauer
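The tools named in the talk (FileMaker Pro, Access, LibreOffice) are graphical. As a scriptable stand-in, the sketch below uses SQLite through Python's built-in `sqlite3` module to show how a schema can refuse bad entries: a lookup table restricts species codes and a `CHECK` constraint rejects non-positive lengths. Table and column names are hypothetical.

```python
import sqlite3


def make_database():
    """Create an in-memory database whose schema rejects unknown codes and bad lengths.

    Table and column names are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
    conn.executescript("""
        CREATE TABLE species (
            code TEXT PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE observations (
            id        INTEGER PRIMARY KEY,
            obs_date  TEXT NOT NULL,
            species   TEXT NOT NULL REFERENCES species(code),
            length_cm REAL CHECK (length_cm > 0)
        );
    """)
    conn.executemany("INSERT INTO species (code, name) VALUES (?, ?)",
                     [("QULO", "Quercus lobata"), ("QUDO", "Quercus douglasii")])
    return conn


conn = make_database()
insert = "INSERT INTO observations (obs_date, species, length_cm) VALUES (?, ?, ?)"
conn.execute(insert, ("2014-08-12", "QULO", 12.5))  # accepted
for bad_row in [("2014-08-12", "oak", 12.5), ("2014-08-12", "QULO", -1.0)]:
    try:
        conn.execute(insert, bad_row)
    except sqlite3.IntegrityError as err:
        print("rejected", bad_row, "->", err)
```

Both bad rows raise `sqlite3.IntegrityError`, so the mistake surfaces at entry time rather than during analysis.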

Page 13

Minimize Errors: Tools

Databases
•  FileMaker Pro (Mac)
•  Access (PC)
•  LibreOffice

Page 14

Minimize Errors: Tools

Databases
•  FileMaker Pro (Mac)
•  Access (PC)
•  LibreOffice

Spreadsheets
•  Google Forms
•  LibreOffice
•  Lists & data validation in Excel

Page 15

Detect Errors After Collection

Look for outliers

Goal is not to eliminate outliers but to identify potential data contamination

[Figure: example data plot]
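A simple numeric screen is the modified z-score, which uses the median and the median absolute deviation (MAD) so that one wild value cannot mask itself by inflating the mean and standard deviation. This is a swapped-in technique, not one named in the talk; the 3.5 cutoff is a common rule of thumb, and, in keeping with the slide, flagged values are candidates for checking, not for deletion.

```python
from statistics import median


def flag_outliers_mad(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds `threshold`.

    Modified z-score = 0.6745 * (x - median) / MAD, where MAD is the median
    absolute deviation from the median.
    """
    med = median(values)
    mad = median(abs(x - med) for x in values)
    if mad == 0:
        return []  # most values equal the median; this sketch flags nothing
    return [i for i, x in enumerate(values)
            if abs(0.6745 * (x - med) / mad) > threshold]


readings = [12.1, 11.8, 12.5, 12.0, 11.9, 120.0, 12.3]
print(flag_outliers_mad(readings))  # → [5]
```

The value at index 5 (120.0) might be a real extreme or a slipped decimal; the score only says it deserves a second look.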

Page 16

Detect Errors After Collection

Look for outliers

Goal is not to eliminate outliers but to identify potential data contamination

Strategies
•  Normal probability plots
•  Regression
•  Scatter plots
•  Maps

[Figure: example data plot]

Page 17

Handle Errors

•  Case-by-case decision
    •  Flag them?
    •  Remove them?
    •  Fix them?
•  Document all changes (readme.txt, scripts)
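Whatever the case-by-case decision, each change can be logged at the moment it is made. A minimal sketch, assuming rows are dictionaries with an `id` key; `handle_value` and the `qc_flag` column are hypothetical names.

```python
from datetime import date


def handle_value(row, column, action, reason, log, new_value=None):
    """Flag, remove, or fix one value in `row` and append a record to `log`.

    action = "flag":   keep the value, note the reason in row["qc_flag"]
    action = "remove": replace the value with None (treated as missing)
    action = "fix":    replace the value with `new_value`
    """
    old = row[column]
    if action == "flag":
        row["qc_flag"] = reason
    elif action == "remove":
        row[column] = None
    elif action == "fix":
        row[column] = new_value
    else:
        raise ValueError(f"unknown action: {action!r}")
    log.append({
        "date": date.today().isoformat(),
        "row_id": row.get("id"),
        "column": column,
        "action": action,
        "old": old,
        "new": row[column],
        "reason": reason,
    })


change_log = []
record = {"id": 7, "length_cm": 125.0}
handle_value(record, "length_cm", "fix", "decimal slip; field sheet reads 12.5",
             change_log, new_value=12.5)
print(record["length_cm"], change_log[0]["old"])  # → 12.5 125.0
```

Because the log is a list of dictionaries, it can be written out with `csv.DictWriter` and kept next to readme.txt.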

Page 18

Handle Errors

•  Case-by-case decision
    •  Flag them?
    •  Remove them?
    •  Fix them?
•  Document all changes (readme.txt, scripts)
•  Keep original data separate
•  Use scripts

[Figure: Raw data as .csv; R script for QAQC]
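The workflow on the slide (raw data as .csv, a script for QA/QC) is shown with R; the same idea can be sketched in Python. The script only reads the raw file and writes its results to a new file, so the original is never altered. The column names and the single length check are hypothetical placeholders for a project's own rules.

```python
import csv
from pathlib import Path


def run_qaqc(raw_path, clean_path, min_length=0.0):
    """Read the raw CSV (never opened for writing), flag suspect rows, and
    write the flagged copy to a separate file.

    Assumes hypothetical columns id, date, species, length_cm in the raw file.
    """
    raw_path, clean_path = Path(raw_path), Path(clean_path)
    if raw_path.resolve() == clean_path.resolve():
        raise ValueError("refusing to overwrite the raw data file")

    with raw_path.open(newline="") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        fieldnames = list(reader.fieldnames or [])
    if "qc_flag" not in fieldnames:
        fieldnames.append("qc_flag")

    for row in rows:
        try:
            ok = float(row["length_cm"]) > min_length
        except (TypeError, ValueError):  # blank or non-numeric entry
            ok = False
        row["qc_flag"] = "" if ok else "length_cm not a number above min_length"

    with clean_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Called as `run_qaqc("raw.csv", "clean.csv")`, the function leaves `raw.csv` untouched, so every flag can be reproduced by rerunning the script.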

Page 19

4 Strategies: Prevent, Minimize, Detect, Handle

From Flickr by Elliott Teel

Page 20

Website: carlystrasser.net

Email: [email protected]

Twitter: @carlystrasser

Slides: slideshare.net/carlystrasser