Identifying RA patients from the electronic medical records at Partners HealthCare


Page 1: Identifying RA patients from the electronic medical records at Partners HealthCare

Identifying RA patients from the electronic medical records at Partners HealthCare

Robert Plenge, M.D., Ph.D.

VA Hospital

July 20, 2010

HARVARD MEDICAL SCHOOL

Page 2: Identifying RA patients from the electronic medical records at Partners HealthCare

genotype

phenotype

clinical care

Page 3: Identifying RA patients from the electronic medical records at Partners HealthCare

genotype

phenotype

clinical care

bottleneck

Page 4: Identifying RA patients from the electronic medical records at Partners HealthCare

July 2010: >30 RA risk loci

Timeline: 1978, 1987, 2003, 2004, 2005, 2007, 2008, 2009, 2010 (Q2)

HLA-DR4

“shared epitope” hypothesis

PADI4, PTPN22, CTLA4

TNFAIP3, STAT4, TRAF1-C5, IL2-IL21

CD40, CCL21, CD244, IL2RB, TNFRSF14, PRKCQ, PIP4K2C, IL2RA, AFF3

REL, BLK, TAGAP, CD28, TRAF6, PTPRC, FCGR2A, PRDM1, CD2-CD58

IL6ST, SPRED2, 5q21, RBPJ, IRF5, CCR6, PXK

Latest GWAS in 25,000 case-control samples with replication in 20,000 additional samples

Together explain ~35% of the genetic burden of disease

Page 5: Identifying RA patients from the electronic medical records at Partners HealthCare

genotype

phenotype

clinical care

bottleneck

Page 6: Identifying RA patients from the electronic medical records at Partners HealthCare

Genetic predictors of response to anti-TNF therapy in RA

PTPRC/CD45 allele

n=1,283 patients

P=0.0001

Cui et al (2010) Arth & Rheum

Page 7: Identifying RA patients from the electronic medical records at Partners HealthCare

How can we collect DNA and detailed clinical data on >20,000 RA patients?

Page 8: Identifying RA patients from the electronic medical records at Partners HealthCare

What are the options for collecting clinical data and DNA for genetic studies?

Page 9: Identifying RA patients from the electronic medical records at Partners HealthCare

Options for clinical + DNA

design           Clinical data   DNA   Sample size   cost
clinical trial   +++             +++   +             $$$
registry         ++              +++   ++            $$
claims data      +               n/a   +++           $
EMR              ++              +++   +++           $

Page 10: Identifying RA patients from the electronic medical records at Partners HealthCare

Content of EMRs

• Narrative data = free-form written text
  – info about symptoms, medical history, medications, exam, impression/plan

• Codified data = structured format
  – age, demographics, and billing codes

EMRs are increasingly utilized!

Page 11: Identifying RA patients from the electronic medical records at Partners HealthCare

Gabriel (1994) Arthritis and Rheumatism

This is not a new idea…

Sens: 89%

PPV: 57%

Page 12: Identifying RA patients from the electronic medical records at Partners HealthCare

Gabriel (1994) Arthritis and Rheumatism

Conclusion: The sole reliance on such databases for the diagnosis of RA can result in substantial misdiagnosis.

…but EMR data are “dirty”

Page 13: Identifying RA patients from the electronic medical records at Partners HealthCare

Partners HealthCare: 4 million patients

Page 14: Identifying RA patients from the electronic medical records at Partners HealthCare

Partners HealthCare: linked by EMR

Page 15: Identifying RA patients from the electronic medical records at Partners HealthCare

Partners HealthCare: organized by i2b2

Page 16: Identifying RA patients from the electronic medical records at Partners HealthCare

4 million patients

31,171 patients

ICD9 RA and/or CCP checked (goal = high sensitivity)

3,585 RA patients

Classification algorithm (goal = high PPV)
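The first, high-sensitivity step of this funnel can be sketched as a simple filter. This is a minimal illustration only: the `PatientRecord` type, its field names, and the use of the ICD-9 714.x prefix to stand for "ICD9 RA" are assumptions made for the example, not details taken from the slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PatientRecord:
    """Hypothetical, simplified view of one patient's codified data."""
    patient_id: str
    icd9_codes: List[str] = field(default_factory=list)
    ccp_checked: bool = False  # True if an anti-CCP test was ever ordered


def passes_sensitive_screen(record: PatientRecord) -> bool:
    """Keep anyone with an RA-type ICD-9 code and/or a CCP test on file.

    Assumption: "ICD9 RA" is approximated here by the 714.x code family.
    """
    has_ra_code = any(code.startswith("714") for code in record.icd9_codes)
    return has_ra_code or record.ccp_checked


def screen(records: List[PatientRecord]) -> List[PatientRecord]:
    """Broad first pass; a stricter classifier is applied afterwards."""
    return [r for r in records if passes_sensitive_screen(r)]


records = [
    PatientRecord("p1", ["714.0"], False),   # RA code only
    PatientRecord("p2", ["250.00"], True),   # CCP checked only
    PatientRecord("p3", ["401.9"], False),   # neither
]
screened_ids = [r.patient_id for r in screen(records)]
```

The screen is deliberately permissive, so it is expected to let through many patients without RA; the classification algorithm in the second step is what raises the PPV.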

Page 17: Identifying RA patients from the electronic medical records at Partners HealthCare

Our library of RA phenotypes

• Natural language processing (NLP)
  – disease terms (e.g., RA, lupus)
  – medications (e.g., methotrexate)
  – autoantibodies (e.g., CCP, RF)
  – radiographic erosions

• Codified data
  – ICD9 disease codes
  – prescription medications
  – laboratory autoantibodies

Concept/term          Accuracy of concept
presence of erosion   88%
seropositive          96%
CCP positive          98.7%
RF positive           99.3%
etanercept            100%
methotrexate          100%

Qing Zeng

Guergana Savova

Page 18: Identifying RA patients from the electronic medical records at Partners HealthCare

Our library of RA phenotypes

• Natural language processing (NLP)
  – disease terms (e.g., RA, lupus)
  – medications (e.g., methotrexate)
  – autoantibodies (e.g., CCP, RF)
  – radiographic erosions

• Codified data
  – ICD9 disease codes
  – prescription medications
  – laboratory autoantibodies

Shawn Murphy

Page 19: Identifying RA patients from the electronic medical records at Partners HealthCare

‘Optimal’ algorithm to classify RA: NLP + codified data

Regression model with a penalty parameter (to avoid over-fitting)

Codified data NLP data

Tianxi Cai, Kat Liao
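To make "regression model with a penalty parameter" concrete, here is a minimal sketch of a penalized logistic regression fitted by gradient descent. The slide does not say which penalty was used; an L2 (ridge) penalty is assumed here purely for simplicity, and the two features (a count of RA billing codes and a count of NLP mentions of RA) and the toy data are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_penalized_logistic(X, y, penalty=0.1, lr=0.1, n_iter=2000):
    """Logistic regression with an L2 penalty on the weights (not the intercept).

    A larger `penalty` shrinks the weights toward zero, which limits over-fitting.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * p
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * (grad_w[j] / n + penalty * wj) for j, wj in enumerate(w)]
    return w, b


def predict_proba(w, b, x):
    """Predicted probability that a patient with features x has RA."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical toy data: [codified feature, NLP feature] per patient.
X = [[0, 0], [1, 0], [0, 1], [2, 1], [3, 2], [2, 3]]
y = [0, 0, 0, 1, 1, 1]

w_weak, b_weak = fit_penalized_logistic(X, y, penalty=0.0)
w_strong, b_strong = fit_penalized_logistic(X, y, penalty=1.0)
```

Comparing `w_weak` with `w_strong` shows the effect of the penalty: the penalized fit has smaller coefficients. In practice the probability cutoff would then be chosen to reach the desired PPV.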

Page 20: Identifying RA patients from the electronic medical records at Partners HealthCare

High PPV with adequate sensitivity

Model            PPV (SE)        Sensitivity (SE)
Codified + NLP   0.93 (0.02) ✪   0.63 (0.06)
NLP only         0.89 (0.02)     0.56 (0.05)
Codified only    0.88 (0.02)     0.51 (0.05)

✪ 392 out of 400 (98%) had definite or possible RA!
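For reference, PPV and sensitivity are both ratios taken from a chart-reviewed validation sample. The sketch below shows the two definitions; the counts are hypothetical and were chosen only so that the results land near the 0.93 and 0.63 reported for the combined model.

```python
def ppv(true_pos, false_pos):
    """Positive predictive value: share of algorithm-positive patients who truly have RA."""
    return true_pos / (true_pos + false_pos)


def sensitivity(true_pos, false_neg):
    """Sensitivity: share of true RA patients that the algorithm picks up."""
    return true_pos / (true_pos + false_neg)


# Hypothetical validation counts, not taken from the slides.
tp, fp, fn = 93, 7, 55
example_ppv = ppv(tp, fp)
example_sensitivity = sensitivity(tp, fn)
```

A high PPV means few false positives in the final cohort, which matters most when the cohort will be used for genetic studies; the price is lower sensitivity, so some true RA patients are left out.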

Page 21: Identifying RA patients from the electronic medical records at Partners HealthCare

This means more patients!

Model            PPV (SE)        Sensitivity (SE)
Codified + NLP   0.93 (0.02) ✪   0.63 (0.06)
NLP only         0.89 (0.02)     0.56 (0.05)
Codified only    0.88 (0.02)     0.51 (0.05)

~25% more subjects with the complete algorithm:

3,585 subjects (3,334 with true RA)

3,046 subjects (2,680 with true RA)
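The "true RA" figures follow from multiplying each cohort size by its PPV. The check below assumes that the 3,046-subject cohort corresponds to the codified-only model, since its PPV of 0.88 reproduces the 2,680 figure.

```python
# Cohort sizes and PPVs as quoted on the slide.
complete_n, complete_ppv = 3585, 0.93   # codified + NLP
codified_n, codified_ppv = 3046, 0.88   # assumed: codified only

# Expected number of true RA patients in each cohort.
true_complete = round(complete_n * complete_ppv)
true_codified = round(codified_n * codified_ppv)

# Relative gain in true RA patients from using the complete algorithm.
relative_gain = true_complete / true_codified - 1
```

The relative gain comes out at a little over 24%, which the slide rounds to ~25%.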

Page 22: Identifying RA patients from the electronic medical records at Partners HealthCare

Clinical features of patients

Liao et al. (2010) Arth. Care Research

Characteristics   i2b2 RA       CORRONA
Total number      3,585         7,971
Mean age (SD)     57.5 (17.5)   58.9 (13.4)
Female (%)        79.9          74.5
Anti-CCP (%)      63            N/A
RF (%)            74.4          72.1
Erosions (%)      59.2          59.7
MTX (%)           59.5          52.8
Anti-TNF (%)      32.6          22.6

CCP has an OR = 1.5 for predicting erosions

Page 23: Identifying RA patients from the electronic medical records at Partners HealthCare

4 million patients

31,171 patients

ICD9 RA and/or CCP checked (goal = high sensitivity)

3,585 RA patients

Classification algorithm (goal = high PPV)

Discarded blood for DNA

Page 24: Identifying RA patients from the electronic medical records at Partners HealthCare

Linking the Datamart-Crimson

NLP data

Codified data

Page 25: Identifying RA patients from the electronic medical records at Partners HealthCare

OR similar in EMR cohort

[Figure: odds ratio (y-axis, 0.7 to 2.5) for each SNP (x-axis, ordered by chromosome and position), comparing "GWAS" with "ACPA+ Current Study". Loci labelled: PTPN22; CD2, CD58; REL; SPRED2; AFF3; STAT4; CD28; CTLA4; PXK; RBPJ; C5orf13; ANKRD55, IL6ST; HLA; PRDM1; TNFAIP3 (×2); TAGAP; CCR6; IRF5; BLK; CCL21 (×2); TRAF1, C5; IL2RA (×2); PRKCQ; KIF5A, PIP4K2C; CD40; IL2RB]

1,500 multi-ethnic RA cases and 1,500 matched controls
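As a reminder of how a per-SNP odds ratio is obtained in a case-control comparison of this kind, the sketch below computes an allelic OR with a Woolf-type 95% confidence interval. The allele counts are hypothetical (1,500 cases and 1,500 controls give 3,000 alleles per group); they are not data from the study.

```python
import math


def odds_ratio(case_risk, case_other, control_risk, control_other):
    """Allelic odds ratio from a 2x2 table of allele counts."""
    return (case_risk * control_other) / (case_other * control_risk)


def or_confidence_interval(case_risk, case_other, control_risk, control_other, z=1.96):
    """Approximate confidence interval on the log-odds scale (Woolf method)."""
    log_or = math.log(odds_ratio(case_risk, case_other, control_risk, control_other))
    se = math.sqrt(1 / case_risk + 1 / case_other + 1 / control_risk + 1 / control_other)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical allele counts for one SNP.
counts = (600, 2400, 450, 2550)
snp_or = odds_ratio(*counts)
ci_low, ci_high = or_confidence_interval(*counts)
```

Plotting `snp_or` with its interval next to the published GWAS estimate for the same SNP gives the kind of side-by-side comparison shown on this slide.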

Page 26: Identifying RA patients from the electronic medical records at Partners HealthCare

Genetic risk score also similar
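The slide does not define the genetic risk score. One common form, assumed here, is a weighted sum in which each risk allele a patient carries contributes the natural log of that locus's odds ratio. The locus names below appear in the talk, but the odds ratios and the genotype are illustrative placeholders, not values from the study.

```python
import math


def genetic_risk_score(risk_allele_counts, odds_ratios):
    """Weighted score: sum over SNPs of (number of risk alleles) * ln(OR).

    risk_allele_counts: dict mapping SNP/locus name to 0, 1, or 2 risk alleles.
    odds_ratios: dict mapping the same names to per-allele odds ratios.
    """
    return sum(
        count * math.log(odds_ratios[snp])
        for snp, count in risk_allele_counts.items()
    )


# Illustrative per-allele odds ratios (placeholders only).
example_ors = {"PTPN22": 1.8, "STAT4": 1.2, "CD40": 1.1}

# Hypothetical genotype: number of risk alleles carried at each locus.
example_genotype = {"PTPN22": 2, "STAT4": 1, "CD40": 0}

example_score = genetic_risk_score(example_genotype, example_ors)
```

Comparing the distribution of such scores between cases and controls in the EMR cohort and in a traditional cohort is one way to check that the two behave similarly.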

Page 27: Identifying RA patients from the electronic medical records at Partners HealthCare

4 million patients

31,171 patients

ICD9 RA and/or CCP checked (goal = high sensitivity)

3,585 RA patients

Classification algorithm (goal = high PPV)

Clinical subsets

Discarded blood for DNA

Page 28: Identifying RA patients from the electronic medical records at Partners HealthCare

Response to therapy

Page 29: Identifying RA patients from the electronic medical records at Partners HealthCare

Non-responder to anti-TNF therapy

NLP + codified data, together with statistical modeling, to define treatment response

Page 30: Identifying RA patients from the electronic medical records at Partners HealthCare

Responder to anti-TNF therapy

NLP + codified data, together with statistical modeling, to define treatment response

Page 31: Identifying RA patients from the electronic medical records at Partners HealthCare

Responder to anti-TNF therapy

5-year NIH grant as part of the PharmacoGenomics Research Network (PGRN)

Page 32: Identifying RA patients from the electronic medical records at Partners HealthCare

Conclusions

Page 33: Identifying RA patients from the electronic medical records at Partners HealthCare

Options for clinical + DNA

design           Clinical data   DNA   Sample size   cost
clinical trial   +++             +++   +             $$$
registry         ++              +++   ++            $$
claims data      +               n/a   +++           $
EMR              ++              +++   +++           $

Conclusion: NLP + codified data, together with appropriate statistical modeling, can yield accurate clinical data.

Page 34: Identifying RA patients from the electronic medical records at Partners HealthCare

Options for clinical + DNA

design           Clinical data   DNA   Sample size   cost
clinical trial   +++             +++   +             $$$
registry         ++              +++   ++            $$
claims data      +               n/a   +++           $
EMR              ++              +++   +++           $

Conclusion: Genetic studies in our EMR cohort yield effect sizes similar to traditional cohorts.

Page 35: Identifying RA patients from the electronic medical records at Partners HealthCare

Options for clinical + DNA

design           Clinical data   DNA   Sample size   cost
clinical trial   +++             +++   +             $$$
registry         ++              +++   ++            $$
claims data      +               n/a   +++           $
EMR              ++              +++   +++           $

Conclusion: It should be possible to extend this same framework to classify response vs non-response to drugs used to treat RA.