Vanderbilt’s DNA Databank: BioVU. Personalized Medicine Integration of genomic information into...

Post on 24-Dec-2015


Vanderbilt’s DNA Databank: BioVU

Personalized Medicine

• Integration of genomic information into clinical decision making

• Personalized disease treatment and also preventative therapies

What is BioVU?

• The move towards personalized medicine requires very large sample sets for discovery and validation

• BioVU: biobank intended to support a broad view of biology and enable personalized medicine

• Contains de-identified DNA extracted from leftover blood after clinically-indicated testing of Vanderbilt patients who have not opted out

• Linked to Synthetic Derivative: de-identified EMR

• Current sample number: 135,765

o 120,705 adult samples

o 15,099 pediatric samples

Patient Communication Modules

[Figure: flow diagram. Labels: eligible; John Doe; one-way hash; A7CCF99DE65732….; scrubbed; Extract DNA]

The “synthetic derivative” (SD): can be updated

The Synthetic Derivative

• A Derivative of the EMR - information content reduced by ‘scrubbing’ identifiers

• Systematically shifted event dates

• Contains ~1.9 million records

o ~1 million with detailed longitudinal data

o averaging 100,000 bytes in size

o an average of 27 codes per record

• Records are updated over time and are current through 4/30/11
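The two de-identification steps named on these slides (a one-way hash in place of the patient identifier, and systematically shifted event dates) can be sketched as follows. This is a minimal illustration, not BioVU's implementation: the slides do not name the hash function, the key handling, or the shift range, so the keyed SHA-256 hash, `SECRET_KEY`, `MAX_SHIFT_DAYS`, and all function names are assumptions.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Assumption: a keyed hash; the slides only say "one-way hash".
SECRET_KEY = b"hypothetical-site-secret"
# Assumption: the slides do not state how far dates are shifted.
MAX_SHIFT_DAYS = 364


def research_id(identifier: str) -> str:
    """Map a patient identifier to a fixed research ID that is not reversible without the key."""
    digest = hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest().upper()


def date_shift_days(rid: str) -> int:
    """Derive one shift per record (1..MAX_SHIFT_DAYS days) from the research ID."""
    return int(rid[:8], 16) % MAX_SHIFT_DAYS + 1


def shift_event(rid: str, event: date) -> date:
    """Shift an event date back by the record's offset; intervals within a record are preserved."""
    return event - timedelta(days=date_shift_days(rid))
```

Because the same identifier always hashes to the same research ID, a record can be updated over time without storing the identifier, and applying a single shift per record keeps the time between events intact.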

Narratives, such as:

• Clinical Notes

• Discharge Summaries

• History and Physicals

• Problem Lists

• Surgical Reports

• Progress Notes

• Letters

Diagnostic Codes, Procedural Codes

Forms (intake, assessment)

Reports (pathology, ECGs, echocardiograms)

Clinical Communications

Lab Values and Vital Signs

Medication Orders

TraceMaster (ECGs)

Synthetic Derivative Data Types

Synthetic Derivative vs. BioVU

[Figure: scrubbed records keyed by hashed IDs (e.g., A7CDE6532….). Synthetic Derivative: ~1.9 million; BioVU: ~135,000]

Sample accrual

[Figure: sample accrual over time, Jul-07 through Jan-14 (y-axis 0 to 225,000). Series: Adult samples accrued; Pediatric samples accrued; Anticipated adult sample accrual; Anticipated pediatric samples]

Current accrual as of 2-13-2012: 135,765 samples, 15,099 pediatric

BioVU Demographics

[Figure: BioVU demographics. AGE: <1, 1 - 10, 11 - 20, 21 - 30, 31 - 40, 41 - 50, 51 - 60, 61 - 70, 71 - 75, >75. GENDER: Female, Male. RACE: White, African American, Asian, Hispanic, Others]

BioVU Sample Management

RTS SmaRTStore

Validation in BioVU

• Sample handling algorithms

o Gender match

o 1/384 gender mismatches

• Ancestry

o Characterize sample ancestry, assess usefulness of ‘race’ as defined in EMR

o Provide a panel of ancestry informative markers that define ancestry

o No significant difference between the concordance of self-report or observer-report with genetic ancestry

• Demonstration project – American Journal of Human Genetics, 2010

o Can known associations between genetic variants and common diseases be identified in the EMR?

The “demonstration project”

• Genotype “high-value” SNPs in the first 8,000 samples accrued.

o including SNPs associated by replicated genome-wide experiments with common diseases & traits

1. Atrial fibrillation

2. Crohn’s disease

3. Multiple Sclerosis

4. Rheumatoid arthritis

5. Type II Diabetes

• Develop Natural Language Processing methods to identify cases and controls

• Are genotype-phenotype relations replicated?
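Whether a known association replicates is usually judged from an odds ratio and its confidence interval, which is what the "First results" plot reports. The sketch below computes an allelic odds ratio with a Wald 95% interval from a 2x2 table of allele counts. The function name and the choice of an allelic test are assumptions made for illustration; the slides do not state which model the demonstration project used.

```python
import math


def allelic_odds_ratio(case_risk, case_other, ctrl_risk, ctrl_other):
    """Return (odds ratio, lower, upper) with a Wald 95% CI from allele counts.

    case_risk / case_other: risk-allele and other-allele counts in cases.
    ctrl_risk / ctrl_other: the same counts in controls.
    """
    counts = (case_risk, case_other, ctrl_risk, ctrl_other)
    if min(counts) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in counts))
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, lower, upper
```

An association is typically called replicated when the interval excludes 1.0 and the direction of effect matches the published one.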

First results

[Figure: forest plot of odds ratios (x-axis: Odds Ratio, ticks at 0.5, 1.0, 2.0, 5.0) for each marker, grouped by disease: Atrial fibrillation, Crohn's disease, Multiple sclerosis, Rheumatoid arthritis, Type 2 diabetes]

marker | gene / region
rs2200733 | Chr. 4q25
rs10033464 | Chr. 4q25
rs11805303 | IL23R
rs17234657 | Chr. 5
rs1000113 | Chr. 5
rs17221417 | NOD2
rs2542151 | PTPN22
rs3135388 | DRB1*1501
rs2104286 | IL2RA
rs6897932 | IL7RA
rs6457617 | Chr. 6
rs6679677 | RSBN1
rs2476601 | PTPN22
rs4506565 | TCF7L2
rs12255372 | TCF7L2
rs12243326 | TCF7L2
rs10811661 | CDKN2B
rs8050136 | FTO
rs5219 | KCNJ11
rs5215 | KCNJ11
rs4402960 | IGF2BP2

Types of projects

• Discovery or validation of genotype-phenotype relations for disease susceptibility or drug responses

• Discovery of new disease/susceptibility genes: resequence in patients (obesity, Cushing's, susceptibility to infection, insomnia, pre-term birth)

• Access samples without disease X, or “normals” of specified ancestry, or old normals

• Phenome-wide association study (PheWAS): in development

Data Use Agreement

Genotyping Data Accrual

[Figure: total GWAS subjects by quarter, Q2 2010 through Q4 2011 (y-axis 0 to 16,000). Total GWAS Subjects N=14,747]

[Figure: total genotyped subjects by quarter, Q4 2008 through Q4 2011 (y-axis 0 to 60,000). Series: Adult Samples; Ped Samples. Total Genotyped Subjects N=56,859]

Common Diagnoses in BioVU

Examples of ICD-9 codes for rare diseases

Example Rare Disease | Number in SD | Number in BioVU
Microcephalus | 1,070 | 85
Pica | 115 | 22
Septicemic Plague | 21 | 0
Pick’s Disease | 45 | 8
Acromegaly and Gigantism | 571 | 123
Ehlers-Danlos Syndrome | 285 | 34
Narcolepsy without Cataplexy | 438 | 76
Spina Bifida | 1,968 | 238
Stiff-Man Syndrome | 82 | 17
Tourette Syndrome | 667 | 34
Bell’s Palsy | 2,534 | 402
Bulimia Nervosa | 919 | 88
Cushing’s | 1,443 | 298
Peyronie’s Disease | 694 | 157
Wilson’s Disease | 140 | 49
Meningioma | 1,444 | 355
Wegener’s | 363 | 141

Not included in SD searches:

• Bone marrow transplant

• SCID

Flagged Compromised samples:

• Transfusion within 2 weeks of blood draw

• Leukemia

• Myeloma

• Lymphoma

• Pre-leukemic states

General algorithm for determining EMR phenotype

• Iteratively refine case definition through partial manual review until case definition yields PPV ≥ 95%

• For small case sizes (~100), hand curate cases but use automated case definitions for others

• For samples with inadequate counts of “Definite Cases”, manually review possible cases to determine true positives

• For controls, exclude all potentially overlapping syndromes and possible matches, iteratively refine such that NPV ≥ 98%

Definite Cases (algorithm-defined)

Possible Cases (require manual review)

Controls (algorithm-defined)

Excluded (algorithm-defined)
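The acceptance test in the algorithm above (PPV ≥ 95% for cases, NPV ≥ 98% for controls) can be checked from the outcome of a partial manual review. The sketch below is a minimal illustration; the function names and the representation of review results as lists of booleans are assumptions, and only the two thresholds come from the slide.

```python
PPV_TARGET = 0.95  # from the slide: case definition must yield PPV >= 95%
NPV_TARGET = 0.98  # from the slide: control definition must yield NPV >= 98%


def fraction_confirmed(review_results):
    """Share of manually reviewed records that the reviewer confirmed."""
    if not review_results:
        raise ValueError("at least one reviewed record is required")
    return sum(review_results) / len(review_results)


def definition_accepted(case_reviews, control_reviews):
    """Decide whether another refinement round is needed.

    case_reviews: True where review confirmed an algorithm-defined case (PPV).
    control_reviews: True where review confirmed an algorithm-defined control (NPV).
    """
    case_ppv = fraction_confirmed(case_reviews)
    control_npv = fraction_confirmed(control_reviews)
    return case_ppv >= PPV_TARGET and control_npv >= NPV_TARGET
```

If `definition_accepted` returns `False`, the case or control definition is tightened and a new sample of records is reviewed, which is the iterative loop the slide describes.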

The problem with ICD9 codes

• ICD9 codes give both false negatives and false positives

• False negatives:

o Outpatient billing limited to 4 diagnoses/visit

o Outpatient billing done by physicians (e.g., takes too long to find the unknown ICD9)

o Inpatient billing done by professional coders:

– omit codes that don’t pay well

– can only code problems actually explicitly mentioned in documentation

• False positives:

o Diagnoses evolve over time: physicians may initially bill for suspected diagnoses that later are determined to be incorrect

o Billing the wrong code (perhaps it is easier to find for a busier clinician)

o Physicians may bill for a different condition if it pays for a given treatment

o Example: Anti-TNF biologics (e.g., infliximab) originally not covered for psoriatic arthritis, so rheumatologists would code the patient as having rheumatoid arthritis

EMR Phenotyping

[Figure: Medications + Labs + ICD-9s (≥3 codes), together with Exclusions and Time Constraints, define the PHENOTYPE]
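The combination shown on this slide (medications, labs, and at least three ICD-9 codes, minus exclusions) can be expressed as a simple rule. The sketch below is illustrative only: the `Record` layout, the function name, and the example codes, drug, and lab threshold in the usage note are assumptions, not a BioVU case definition, and time constraints are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical, simplified view of one de-identified record."""
    icd9_codes: list = field(default_factory=list)  # one entry per coded encounter
    medications: set = field(default_factory=set)
    labs: dict = field(default_factory=dict)  # lab name -> most recent value


def is_definite_case(rec, target_codes, target_meds, lab_rule,
                     exclusion_codes, min_codes=3):
    """Require >= min_codes target ICD-9 codes, a target medication, and a
    qualifying lab, and reject any record carrying an exclusion code."""
    if any(code in exclusion_codes for code in rec.icd9_codes):
        return False
    n_codes = sum(1 for code in rec.icd9_codes if code in target_codes)
    has_med = bool(rec.medications & target_meds)
    has_lab = bool(lab_rule(rec.labs))
    return n_codes >= min_codes and has_med and has_lab
```

For example, a made-up type 2 diabetes rule might pass `target_codes={"250.00"}`, `target_meds={"metformin"}`, `exclusion_codes={"250.01"}`, and `lab_rule=lambda labs: labs.get("hba1c", 0) >= 6.5`. Requiring several independent sources of evidence is what offsets the false positives that single billing codes produce.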

Lessons from preliminary phenotype development

• Eliminating negated and uncertain terms:

– “I don’t think this is MS”, “uncertain if multiple sclerosis”

• Delineating section tag of the note

– “FAMILY MEDICAL HISTORY: Mother had multiple sclerosis.”

• Adding requirements for further signs of “severity of disease”

– For MS: an MRI with T2 enhancement, myelin basic protein or oligoclonal bands on lumbar puncture, etc.

– This could potentially miss patients with outside work-ups, however
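The first two lessons (drop negated or uncertain mentions, and ignore mentions inside sections such as family history) can be sketched with a few regular expressions. This is a deliberately crude illustration, not the Natural Language Processing method used for BioVU: the cue list, the section names, and the function name are all assumptions, and a production system would use a proper negation and section tagger.

```python
import re

# Assumed, non-exhaustive cue list for negation and uncertainty.
NEGATION_CUES = re.compile(
    r"\b(?:no|not|don['\u2019]t|doesn['\u2019]t|without|denies|uncertain|unlikely)\b",
    re.IGNORECASE,
)
# Assumed section tags whose content describes someone other than the patient.
EXCLUDED_SECTIONS = ("FAMILY MEDICAL HISTORY", "FAMILY HISTORY")
DISEASE_TERMS = re.compile(r"\b(?:[Mm]ultiple [Ss]clerosis|MS)\b")


def affirmed_mentions(note):
    """Return sentences that mention the disease, are not negated or uncertain,
    and do not sit under an excluded section tag."""
    kept = []
    for sentence in re.split(r"(?<=[.!?])\s+", note):
        if not DISEASE_TERMS.search(sentence):
            continue
        section = sentence.split(":", 1)[0].strip().upper() if ":" in sentence else ""
        if section in EXCLUDED_SECTIONS:
            continue
        if NEGATION_CUES.search(sentence):
            continue
        kept.append(sentence.strip())
    return kept
```

Run on the slide's own examples, the three quoted sentences are discarded while a plain statement such as "Patient has multiple sclerosis with new lesions." is kept.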

Other lessons (more difficult to correct)

• A number of incorrect ICD9 codes for RA and MS assigned to patients

• Evolving disease

– “Recently diagnosed with Susac’s syndrome - prior diagnosis of MS incorrect.” (Notes also included a thorough discussion of MS, ADEM, and Susac’s syndrome.)

• Difference between two doctors:

– Presurgical admission H&P includes “rheumatoid arthritis” in the past medical history

– Rheumatology clinic visits notes say the diagnosis is “dermatomyositis” - never mention RA

• Sometimes incorrect diagnoses are propagated through the record due to cutting-and-pasting / note reuse

ANALYSIS PLAN

1. Sample size estimation

2. Dependent/outcome variable

3. Independent variables (include SNPs, covariates, confounders)

a. Should have race, gender, age in all plans

4. Statistical method proposed

a. Type of model if appropriate

b. How SNPs will be coded

5. Power calculation

6. Population stratification plans

7. QC plans

a. Call rate, gender checks, HWE – these will be important to do on each dataset pulled to check for phenotype specific QC issues

PHENOTYPE PLAN

8. Trait of interest for study

9. Demographic constraints (e.g. gender, age, and/or ethnicity)

10. Cases and controls require outline of definition including:

• Inclusion criteria (e.g. ICD9 codes, keyword search, medications, laboratory results)

• Exclusion criteria (e.g. ICD9s, keywords, meds, labs, minimum data or follow up)

11. Validation plan for phenotype (e.g. manual review of all or some records)
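Three items from the analysis plan (how SNPs are coded, call rate, and HWE) lend themselves to a short worked sketch. The code below assumes additive coding and a chi-square goodness-of-fit test for Hardy-Weinberg equilibrium; both are common choices but are assumptions here, since the slide leaves the model and QC details to each investigator. Function names and the two-letter genotype format are also hypothetical.

```python
def additive_code(genotype, risk_allele):
    """Additive coding: number of risk-allele copies (0, 1, or 2); None if missing."""
    if genotype is None or len(genotype) != 2:
        return None
    return genotype.count(risk_allele)


def call_rate(genotypes):
    """Fraction of samples with a non-missing genotype call at one SNP."""
    if not genotypes:
        raise ValueError("no genotypes supplied")
    return sum(g is not None for g in genotypes) / len(genotypes)


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic (1 degree of freedom) comparing observed genotype
    counts with Hardy-Weinberg expectations."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p in (0.0, 1.0):
        raise ValueError("monomorphic SNP: HWE test is undefined")
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

A statistic above 3.84 corresponds to p < 0.05 at one degree of freedom. Running these checks on each extracted dataset, rather than once for the whole biobank, is the point of item 7a.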

VICTR Funding

[Figure: workflow built up over several slides. An investigator query against the scrubbed records (keyed by hashed IDs such as B699tre563msd…. and F5rt783mbncds….) selects cases and controls. Labeled steps: Data use agreement + IRB Approval; Manual Review; Sample retrieval; Genotyping, genotype-phenotype relations]

BioVU Genotyping Process

1. Investigator selects cases and controls from Synthetic Derivative

2. Investigator signals BioVU program to initiate sample selection

3. BioVU notifies DNA resources core that samples are ready for selection and picking

4. Samples are provided to appropriate lab and are genotyped

5. Investigator and BioVU program receive genotype data

6. Genotyped data analyzed by investigator

BioVU Requests

60 Total Requests, 43 Approvals

[Figure: bar chart of BioVU Requests and BioVU Approvals (y-axis 0 to 60), split into DNA Requests and Data Requests]

71

BioVU: New Directions

A well characterized cohort of individuals without specific diseases across all ages to be used as controls

Expansion of BioVU to capture and store plasma to enable candidate proteomic/biomarker research

Expanding BioVU genotyping to include mitochondrial SNP genotyping and copy number variants

Link pediatric DNA samples to maternal samples (mom-baby pairs resource)

Expansion of BioVU sequencing activities to include whole exome sequencing on targeted populations

FAQ “answers”

• SD access: “non-human subjects” IRB review (days)

• Current access costs: $4/sample

• Genotyping data: no charge

• Genotyping:

o Investigator-funded

– Consider VICTR as a funding source

o Genotyping/sequencing performed in VUMC Core Facilities

– Justification must be provided for outside genotyping, including quality control plans

o Genotype “redeposit” part of the data use agreement

Questions?

Contact: Erica Bowton PhD

BioVU Program Manager

erica.bowton@vanderbilt.edu

322-1975