Creating Syndrome Definitions Using RStudio Tim Hopper Data Scientist RTI International.

29
Creating Syndrome Definitions Using RStudio Tim Hopper Data Scientist RTI International

Transcript of Creating Syndrome Definitions Using RStudio Tim Hopper Data Scientist RTI International.

Page 1: Creating Syndrome Definitions Using RStudio Tim Hopper Data Scientist RTI International.

Creating Syndrome Definitions Using RStudio

Tim HopperData Scientist

RTI International

Page 3: Creating Syndrome Definitions Using RStudio Tim Hopper Data Scientist RTI International.

Setup

# Connect to TarrantCounty_FP database

# Credentials

USERNAME <- 'username'PASSWORD <- 'password'HOSTNAME <- 'data3.biosen.se'DBNAME <- 'TarrantCounty_FP'TABLE <- 'TC_Meaningful_Use_Base'

# Create database connectioncon <- dbConnect(dbDriver('MySQL'), user=USERNAME, password=PASSWORD,

host=HOSTNAME, dbname=DBNAME)

Page 4: Creating Syndrome Definitions Using RStudio Tim Hopper Data Scientist RTI International.

Example: Co-morbid Syndrome

We want to see the co-occurrence of influenza (influenza-like illness) and asthma.  Data source: Texas region 2/3 Location: Tarrant County Time: February 1–October 31, 2013

Page 5: Creating Syndrome Definitions Using RStudio Tim Hopper Data Scientist RTI International.
Page 6: Creating Syndrome Definitions Using RStudio Tim Hopper Data Scientist RTI International.
Page 7: Creating Syndrome Definitions Using RStudio Tim Hopper Data Scientist RTI International.

Query for Asthma

SELECT Facility_City, Facility_State, Diagnosis_Code, Diagnosis_Text, Chief_Complaint, Age, Gender, Visit_Date_Time, Row_Number

FROM TC_Meaningful_Use_Base WHERE Visit_Date_Time BETWEEN '2013-02-01 00:00:00' AND '2013-10-31 23:59:59'

AND (Diagnosis_Code LIKE '%493%')

Page 8: Creating Syndrome Definitions Using RStudio Tim Hopper Data Scientist RTI International.

Query for Influenza-Like Illness

SELECT Facility_City, Facility_State, Diagnosis_Code, Diagnosis_Text, Chief_Complaint, Age, Gender, Visit_Date_Time, Row_Number

FROM TC_Meaningful_Use_Base WHERE Visit_Date_Time BETWEEN '2013-02-01 00:00:00' AND '2013-10-31 23:59:59'

AND (Diagnosis_Code LIKE '%487%' OR Diagnosis_Code LIKE '%488%' OR Diagnosis_Code LIKE '%V04.8%'OR Diagnosis_Code LIKE '%V0481%' OR Diagnosis_Code LIKE '%V06.6%'OR Diagnosis_Code LIKE '%V066%')

Page 9: Creating Syndrome Definitions Using RStudio Tim Hopper Data Scientist RTI International.

Run Query and Process Data

# Run Querydf.asthma <- dbGetQuery(con, query.asthma)df.ili <- dbGetQuery(con, query.ili)

# Add column naming each as a syndromedf.asthma$Syndrome <- 'ASTHMA'df.ili$Syndrome <- 'ILI'

# Combine these two data sets into one data.framedf <- rbind(df.asthma, df.ili)

# Format dates and add date column (without time)df$Visit_Date_Time <- ymd_hms(df$Visit_Date_Time)df$Visit_Date <- as.Date(df$Visit_Date_Time)

Page 10: Creating Syndrome Definitions Using RStudio Tim Hopper Data Scientist RTI International.

Create Summary Data Set

events.per.day.split <- ddply(df, .(Visit_Date, Syndrome), summarize, Number_of_Visits=length(Visit_Date))

############################################# Visit_Date Syndrome Number_of_Visits# 1 2013-02-01 ASTHMA 49# 2 2013-02-01 ASTHMA & ILI 2# 3 2013-02-01 ILI 5# 4 2013-02-02 ASTHMA 60# 5 2013-02-02 ILI 21# 6 2013-02-03 ASTHMA 89############################################

Page 11: Creating Syndrome Definitions Using RStudio Tim Hopper Data Scientist RTI International.

Visits Per Day by Syndrome

ggplot(events.per.day.split) + aes(Visit_Date, Number_of_Visits, color=Syndrome) + geom_line()

Page 12: Creating Syndrome Definitions Using RStudio Tim Hopper Data Scientist RTI International.

Create Summary Data Set

events.per.day <- ddply(df, .(Visit_Date), summarize, Number_of_Visits=length(Visit_Date))

################################ Visit_Date Number_of_Visits# 1 2013-02-01 513# 2 2013-02-02 396# 3 2013-02-03 428# 4 2013-02-04 409# 5 2013-02-05 580# 6 2013-02-06 391###############################

Page 13: Creating Syndrome Definitions Using RStudio Tim Hopper Data Scientist RTI International.

Visits Per Day by Syndrome

ggplot(events.per.day) + aes(Visit_Date, Number_of_Visits) + geom_line()

Page 14: Creating Syndrome Definitions Using RStudio Tim Hopper Data Scientist RTI International.

Example: New Syndrome

We want to see create a new syndrome to identify visits during which the patient had cough AND dizziness AND headache.  Data source: Texas region 2/3 Location: Tarrant County Time: February 1–October 31, 2013

Page 15: Creating Syndrome Definitions Using RStudio Tim Hopper Data Scientist RTI International.

Query

SELECT Facility_City, Facility_State, Diagnosis_Code, Diagnosis_Text,

Chief_Complaint, Age, Gender, Visit_Date_Time, Row_Number

FROM TC_Meaningful_Use_Base WHERE Visit_Date_Time BETWEEN '2013-02-01 00:00:00'

AND '2013-10-31 23:59:59'AND (Diagnosis_Code LIKE '%786.2%' OR

Diagnosis_Code LIKE '%7862%') AND (Diagnosis_Code LIKE '%780.4%' OR

Diagnosis_Code LIKE '%7804%') AND (Diagnosis_Code LIKE '%784.0%' OR

Diagnosis_Code LIKE '%7840%');

Page 16: Creating Syndrome Definitions Using RStudio Tim Hopper Data Scientist RTI International.

Run Query

# Run Querydf.sick <- dbGetQuery(con, query)

# Fix dates using lubridatedf.sick$Visit_Date_Time <- ymd_hms(df.sick$Visit_Date_Time)

# Create a month columndf.sick$Month <- month(df.sick$Visit_Date, label=T)

Page 17: Creating Syndrome Definitions Using RStudio Tim Hopper Data Scientist RTI International.

Run Query

ggplot(df.sick) + aes(Month) + geom_histogram()

Page 18: Creating Syndrome Definitions Using RStudio Tim Hopper Data Scientist RTI International.

Create Line Listing

write.csv(df.sick, 'sick.csv', quote=F, row.names=F)

# sick.csv:## Row_Number,Facility_City,Facility_State,Diagnosis_Code,D...# 1374852,Houston,TX,473.9:780.4:300.00:786.2:784.0:305.1:...# 1536525,Houston,TX,486:786.2:780.4:784.0:794.00:789.00:7...# 2100347,Rowlett,TX,780.4:784.0:786.2,NA,SCREENING - HA -...# 2189305,Rowlett,TX,780.4:784.0:786.2:V76.12,NA,SCREENING...# 3108090,Rowlett,TX,780.4:784.0:786.2:V76.12,NA,SCREENING...# 5887191,Rowlett,TX,786.2:780.1:780.4:784.0,NA,786.2:SEP:...# 7968958,Houston,TX,493.90:780.4:780.60:784.0:786.2:787.0...# 9197758,Houston,TX,493.90:780.4:780.60:784.0:786.2:787.0...

Page 19: Creating Syndrome Definitions Using RStudio Tim Hopper Data Scientist RTI International.

Example: Refined Age Groups

We want to see motor vehicle traffic accidents involving young people. We recombine the ages to the following groups: 0–15, 16–20, 21–25, 26–30, and 31–35 years. Data source: Texas region 2/3 Location: Tarrant County Time: February 1–October 31, 2013

Page 20: Creating Syndrome Definitions Using RStudio Tim Hopper Data Scientist RTI International.
Page 21: Creating Syndrome Definitions Using RStudio Tim Hopper Data Scientist RTI International.
Page 22: Creating Syndrome Definitions Using RStudio Tim Hopper Data Scientist RTI International.

Query

SELECT Facility_City, Facility_State, Diagnosis_Code, Diagnosis_Text,

Chief_Complaint, Age, Gender, Visit_Date_Time, Row_Number

FROM TC_Meaningful_Use_Base WHERE Visit_Date_Time BETWEEN '2013-02-01 00:00:00'

AND '2013-10-31 23:59:59' AND (Diagnosis_Code LIKE '%E81_%')AND Age <= 35;

Page 23: Creating Syndrome Definitions Using RStudio Tim Hopper Data Scientist RTI International.

Run Query

# Run Querydf.auto <- dbGetQuery(con, query)

# Fix dates using lubridatedf.auto$Visit_Date_Time <- ymd_hms(df.auto$Visit_Date_Time)

# Create a date columndf.auto$Visit_Date <- as.Date(df.auto$Visit_Date_Time)

Page 24: Creating Syndrome Definitions Using RStudio Tim Hopper Data Scientist RTI International.

Bin Ages

# Drop all rows where age is greater than 35 years or is undefined

df.auto <- df.auto[!is.na(df.auto$Age),]df.auto <- df.auto[df.auto$Age <= 35,]

# Bin ages

df.auto$Age_binned <- cut(df.auto$Age, breaks=c(0, 15, 20, 25, 30, 35), include.lowest=T)

Page 25: Creating Syndrome Definitions Using RStudio Tim Hopper Data Scientist RTI International.

Histogram of Visits by Age Group

ggplot(df.auto) + aes(Age_binned) + geom_histogram()

Page 26: Creating Syndrome Definitions Using RStudio Tim Hopper Data Scientist RTI International.

Create Summary Data Set

df.auto.daily.counts <- ddply(df.auto, .(Visit_Date, Age_binned), summarize, count=length(Chief_Complaint))

################################# Visit_Date Age_binned count# 1 2013-02-01 [0,15] 3# 2 2013-02-01 (15,20] 25# 3 2013-02-01 (20,25] 16# 4 2013-02-01 (25,30] 16# 5 2013-02-02 [0,15] 13# 6 2013-02-02 (15,20] 6###############################

Page 27: Creating Syndrome Definitions Using RStudio Tim Hopper Data Scientist RTI International.

Visits per Week by Ageggplot(df.auto.daily.counts)

+ aes(x = Visit_Date, y = count, color=Age_binned) + geom_line(size=2, alpha=.7)

Page 28: Creating Syndrome Definitions Using RStudio Tim Hopper Data Scientist RTI International.

Visits per Week by Age

ggplot(df.auto.daily.counts) + aes(x = Visit_Date, y = count, color=Age_binned) + geom_smooth(size=3, alpha=.7)

Page 29: Creating Syndrome Definitions Using RStudio Tim Hopper Data Scientist RTI International.

Questions?