
September 27, 2012

THE FLOW OF DATA

The Flow of Data

• Data sources
• Data streams
• Databases
• Data repositories
• Data warehouses

Data Source

An entity that collects the data:

• Health care settings: hospital, clinic
• Diagnostic facilities: labs, mobile units
• Research laboratories
• Schools
• Workplaces
• Government agencies
• Surveillance systems

Data Stream

A constant flow of a specific type of data

• Death reports
• Laboratory diagnostic data
• Insurance claims
• Pharmaceutical sales
• Website searches
• Infection reports
• Surveillance data

Database

An organized collection of data

• Allows maintenance of complex information
• Organized in a way relevant to its purpose
• Allows quick selection of desired data (searchable)
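A minimal sketch of what "organized" and "searchable" can mean in practice, using Python's built-in sqlite3 module. The table name, column names, and records are made up for illustration and do not come from any real data source.

```python
import sqlite3

# Hypothetical table and columns, invented for this illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lab_results "
    "(patient_id TEXT, test_name TEXT, result TEXT, collected_on TEXT)"
)
conn.executemany(
    "INSERT INTO lab_results VALUES (?, ?, ?, ?)",
    [
        ("p001", "C. difficile toxin", "positive", "2012-09-01"),
        ("p002", "C. difficile toxin", "negative", "2012-09-03"),
        ("p003", "Influenza A", "positive", "2012-09-04"),
    ],
)

# Quick selection of desired data: only the positive C. difficile results.
positives = conn.execute(
    "SELECT patient_id, collected_on FROM lab_results "
    "WHERE test_name = ? AND result = ?",
    ("C. difficile toxin", "positive"),
).fetchall()
print(positives)
conn.close()
```

Because the data are organized into named columns, the selection is a one-line query rather than a manual search through records.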

Data Repository

A location to safely store and compile data from similar sources

Data Warehouse

A database for analysis of compiled data, for the purposes of storage and reporting

• Often has the purpose of enabling decision making

Databases and Health

• Person (or animal or population) – Place – Time
  • Concept from descriptive epidemiology
  • Characterizes health events
  • Helps understand why events happen
  • Who is at risk? Where? When?
  • Allows formation of hypotheses for research
  • Databases can capture what is happening to either an individual or a population in a certain place at a certain time
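As a rough illustration of the person, place, and time idea, the sketch below tallies a handful of made-up case records along each dimension. The records and the age grouping are assumptions for the example, not real surveillance data.

```python
from collections import Counter

# Made-up case records: (person group, place, time).
cases = [
    ("65+", "MA", 2003),
    ("65+", "MA", 2004),
    ("65+", "OH", 2004),
    ("under 65", "MA", 2004),
    ("65+", "MA", 2004),
]

# Where and when: count cases per (place, year).
by_place_time = Counter((place, year) for _, place, year in cases)

# Who: count cases per person group.
by_person = Counter(person for person, _, _ in cases)

print(by_place_time.most_common(1))
print(by_person)
```

Tallies like these are descriptive only, but they are the kind of summary from which a hypothesis (for example, "why MA in 2004?") can be formed.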

Example: C. difficile infection in the elderly

Centers for Medicare and Medicaid Services

Database contained data for over 1 million C. difficile cases from 1991-2004

Objectives and Hypotheses:

1. Does the age-related rate acceleration of C. difficile in the elderly vary geographically?
   H1: Varies similarly to the rate
2. Does livestock density influence age-related rate acceleration?
   H2: Increases with increasing livestock density

Geographic Distribution

US C. difficile age-related rate acceleration

[Map: C. difficile rate increase per year of age. Legend classes: 2.1%, 3.7 to 5%, 5.1 to 6.5%, 6.6 to 7.9%]
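The slides do not say how the "rate increase per year of age" was estimated. One plausible reading, assumed here purely for illustration, is a constant percentage growth in rate with each additional year of age, which can be estimated by fitting a straight line to log(rate) against age. The function name and the synthetic rates below are hypothetical.

```python
import math

def percent_increase_per_year(ages, rates):
    """Fit log(rate) = a + b * age by least squares.

    Returns (exp(b) - 1) * 100, the percent increase in rate per year of age
    under the assumed constant-growth model.
    """
    logs = [math.log(r) for r in rates]
    n = len(ages)
    mean_age = sum(ages) / n
    mean_log = sum(logs) / n
    sxy = sum((a - mean_age) * (l - mean_log) for a, l in zip(ages, logs))
    sxx = sum((a - mean_age) ** 2 for a in ages)
    slope = sxy / sxx
    return (math.exp(slope) - 1) * 100

# Synthetic rates constructed to grow by 5% per year of age from age 65.
ages = [65, 70, 75, 80, 85]
rates = [100 * 1.05 ** (a - 65) for a in ages]

growth = percent_increase_per_year(ages, rates)
print(round(growth, 2))
```

With real data the points would not lie exactly on the fitted line, so the estimate would carry uncertainty that this sketch does not report.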

Accumulated Data Over Time

US C. difficile rate 1991-2004

C. difficile rate acceleration and livestock density by state

Human population – place – time

200 Countries, 200 years, 4 minutes

Considerations for Data Use

• Timeliness
  • When were the data collected? Are they recent enough?
• Accessibility
  • Who has access to the data? How can access be gained?
• Comparability
  • Are the data in the database comparable for use together?
  • Data coming from different sources!
• Compatibility
  • Are the data in the database compatible? With data from other sources? With the research question?

Primary vs Secondary Data

Primary data
• Data that was collected for the analysis being performed
• Examples:
  • Use of laboratory data collected by a hospital to provide care for an individual
  • Treatment trial
  • Laboratory experiment

Primary vs Secondary Data

Secondary data
• Data collected for another purpose and now being used for a different analysis
• Examples:
  • Re-use of data for any purpose
  • Systematic review
  • Use of hospital records for a retrospective study

Uncertainty in the Primary Data

Consider in secondary use of the data!

Accuracy
• Degree to which a measurement reflects the true value (data predicts the true population mean)

Precision
• Degree to which repeated measurements obtain the same results (data is repeatable)

Bias
• Lacking neutrality or having a one-sided view

Accuracy vs Precision
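The distinction can be made concrete with a small numeric sketch. It assumes a known reference value and two hypothetical instruments with made-up readings: the mean error stands in for accuracy and the standard deviation of repeated readings stands in for precision.

```python
import statistics

true_value = 10.0  # assumed known reference value

# Made-up repeated readings from two hypothetical instruments.
instrument_a = [10.1, 9.9, 10.0, 10.2, 9.8]    # centered on the truth, more scatter
instrument_b = [12.0, 12.1, 11.9, 12.0, 12.0]  # tightly clustered, but off target

def summarize(readings, truth):
    """Return mean error (accuracy) and sample standard deviation (precision)."""
    mean = statistics.mean(readings)
    return {"mean_error": mean - truth, "spread": statistics.stdev(readings)}

a = summarize(instrument_a, true_value)
b = summarize(instrument_b, true_value)
print(a)
print(b)
```

Instrument B is the more precise of the two yet the less accurate, which is why repeatability alone says little about whether the data reflect the true value.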

Quality of Primary Data

• Cannot assume primary data is high quality
• In addition to being accurate and precise, also consider:
  • Relevance: is the data useful to your research question?
  • Timeliness: is the data available when needed?
  • Completeness: is there missing data?
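Completeness is the easiest of these to check mechanically. The sketch below, using made-up records and hypothetical field names, reports the fraction of records that have a value for each field.

```python
# Made-up records; None or an empty string marks a missing value.
records = [
    {"age": 71, "state": "MA", "onset_date": "2004-02-11"},
    {"age": None, "state": "OH", "onset_date": "2004-03-02"},
    {"age": None, "state": "", "onset_date": "2004-03-15"},
]

def completeness(rows, fields):
    """Return the fraction of rows with a non-missing value for each field."""
    return {
        f: sum(1 for r in rows if r.get(f) not in (None, "")) / len(rows)
        for f in fields
    }

report = completeness(records, ["age", "state", "onset_date"])
print(report)
```

A field that is mostly empty may be unusable for a secondary analysis even when the rest of the record is sound.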

Improving Data Quality

• Correcting (after entry): time consuming, possibly expensive
• Avoiding quality issues:
  • Avoid missing data
  • Avoid entry errors (typos, etc.)
  • Enter data into a database for use quickly
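One way to avoid quality issues rather than correct them later is to check each record at the moment of entry. The sketch below is an assumption about how such a check might look; the field names and the plausible age range are hypothetical choices for the example.

```python
from datetime import date

def validate_entry(record):
    """Return a list of problems found in one record; an empty list means OK."""
    problems = []

    age = record.get("age")
    if age is None:
        problems.append("age is missing")
    elif not 0 <= age <= 120:  # assumed plausible range for a human age
        problems.append("age out of range")

    try:
        # Raises ValueError for a missing or malformed date string.
        date.fromisoformat(record.get("collected_on") or "")
    except ValueError:
        problems.append("collected_on is not a valid ISO date")

    return problems

good = validate_entry({"age": 71, "collected_on": "2012-09-27"})
bad = validate_entry({"age": 710, "collected_on": "2012-13-01"})
print(good)
print(bad)
```

Rejecting the typo "710" at entry costs seconds, while finding it after the data have been compiled is the time-consuming correction step listed above.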

Secondary Use of Data

Why do it?

• New research question
• Analysis
• Public health investigation
• Marketing
• Population level monitoring of health
• Retrospective analysis
• Cost saving
• Proof of concept

Secondary Use of Data

Conservation Medicine Applications

• Not possible to measure individual level exposures in people or animals
  • Ethics
  • Cost
  • Not possible
• An exposure often shared by many in a population
• Exposure may be limited to a specific population
• Limited scale effects may be hard to study without population level data

Ethical Considerations in Secondary Data Use

• For humans: data derived from patients
  • Individual rights? Restrict use after anonymization?
• Domestic animals: pets, livestock
  • Owner or farmer rights?
• Wildlife and ecosystem
  • Public?
• Who owns data?
• Who has the right to access it?
• For what purpose can it be used?
• Data use and sharing agreements
• Public policy issues

Data Confidentiality

Example: MDPH Confidentiality Agreement

Field Trips! Thursday Oct 4th

Primary Data Sources

Visit to Angell Animal Medical Center

Time and location: Angell Animal Medical Center, 350 South Huntington Avenue, Jamaica Plain, MA 02130

1pm-2pm

Visit to the State Lab Institute Epidemiology Unit

Meet with Johanna Vostok, Lynda Glenn and Gillian Haney, Room 123, MDPH State Laboratory Institute, 305 South Street, Jamaica Plain, MA 02130

2:45-4pm

Assignment for Oct 3rd

5-10 minute group presentation

• Progress report on systematic review:
  • Research question
  • Literature review strategy (keywords, databases, etc.)
  • Retrieved article coding form
  • Selection criteria
  • Problems encountered
  • Solutions?
  • Collaboration needed?

Systematic Review Project

• Paper due October 18th
• Presentation on October 24th 9-12 at TIE
• Paper format (like a journal article):
  • Title
  • Abstract
  • Introduction/Background
  • Methods
  • Results
  • Discussion/Conclusions
  • References