Use of Thermostable Group II Intron Reverse Transcriptases (TGIRTs) for Single-Stranded DNA-seq of Cell-Free DNA in Human Plasma and Molecular Diagnostics

Douglas C. Wu and Alan M. Lambowitz
Institute of Cellular and Molecular Biology and Department of Molecular Biosciences, The University of Texas at Austin

[Workflow diagram (Streamlined Protocol). Steps: (1) Plasma DNA; (2) Dephosphorylation & denaturation; (3) Template-switching by TGIRT, followed by alkaline treatment and cDNA clean-up; (4) Adaptor ligation by thermostable 5’ AppDNA/RNA ligase, followed by cDNA clean-up; (5) PCR amplification. Labeled components: target DNA (+) and (-) strands, DNA nick, R2 RNA (3’-blocker), R2R DNA (3’-N), TGIRT, 5’-App/3’-blocker R1R adaptor, UMI, P5 and Barcode+P7 PCR primers. Other diagram labels: ~2 ng; 20 min; 20 min; 60 min.]

▪ 2-h workflow from purified DNA to library

▪ Simple protocol for adding unique molecular identifiers (UMIs) to exclude PCR duplicates

▪ Window protection score (WPS) analysis of long (120-180 nt) fragments exhibits periodicity expected for nucleosome packaging

▪ WPS analysis of shorter (35-80 nt) fragments, which result from DNA nicking by endogenous nucleases, footprints the binding sites of transcription factors such as CTCF

▪ Nucleosomal binding sites predicted by TGIRT-seq in plasma DNA from a healthy individual match those reported previously (2)

▪ TGIRTs have higher fidelity than conventional viral RTs in RNA-seq (6)

▪ TGIRT ssDNA-seq has a 1.5-fold higher mismatch rate than Nextera XT

▪ TGIRT ssDNA-seq is more prone to indels in mononucleotide runs ≥4 nt

▪ Coverage of E. coli K12 (MG1655) genomic DNA is comparable to that of Nextera XT (5)

▪ High coverage of GC-enriched regions reflects ligation bias
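The coverage comparison with Nextera XT is reported on the poster with a Gini coefficient for each method. As a rough illustration of how such an evenness value can be computed, the sketch below applies the standard sorted-value Gini formula to per-bin coverage. The function name `gini` and the toy inputs are assumptions for illustration and are not taken from the poster's analysis pipeline.

```python
def gini(coverage):
    """Gini coefficient of non-negative per-bin coverage values.

    0 means perfectly even coverage; values near 1 mean coverage is
    concentrated in a few bins. Hypothetical helper, not the authors' code.
    """
    values = sorted(coverage)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("coverage must contain at least one positive value")
    # Standard formula on ascending values x_1..x_n:
    # G = sum_i (2i - n - 1) * x_i / (n * sum_i x_i)
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(values, start=1))
    return weighted / (n * total)


# Toy examples: even coverage versus all reads piled into one bin.
even_bins = [5, 5, 5, 5]
skewed_bins = [0, 0, 0, 10]
print(gini(even_bins), gini(skewed_bins))
```

A lower value indicates more uniform coverage, so two libraries can be compared by computing the coefficient over the same set of genomic bins.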

Conclusion

References

Grant Support and Conflict-of-interest Statement

(B) TGIRT ssDNA-seq of Human Plasma DNA

▪ TGIRT-seq of cell-free plasma DNA from a healthy individual gives data similar to that obtained by conventional ssDNA-seq (2)

▪ Major peak at ~167 nt corresponds to DNA fragments protected in nucleosome cores

▪ 10.4-bp periodicity (gray dashed lines) reflects minor groove nicking of nucleosome-bound DNA by endogenous DNases

▪ Dinucleotide patterns at the ends of 167-nt DNA fragments are as expected for inter-nucleosome cleavage
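The fragment-size observations above (a major peak near 167 nt and smaller, regularly spaced peaks at shorter lengths) come from a histogram of fragment lengths. A minimal sketch of that kind of tabulation follows; the function names and the toy length list are assumptions for illustration, and real data would normally need smoothing before peaks are called.

```python
from collections import Counter


def length_distribution(fragment_lengths):
    """Return {length: percent of reads}, ordered by fragment length (nt)."""
    counts = Counter(fragment_lengths)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no fragments given")
    return {length: 100.0 * n / total for length, n in sorted(counts.items())}


def local_peaks(distribution):
    """Lengths whose percentage exceeds both immediate neighbours (no smoothing)."""
    peaks = []
    for length, pct in distribution.items():
        left = distribution.get(length - 1, 0.0)
        right = distribution.get(length + 1, 0.0)
        if pct > left and pct > right:
            peaks.append(length)
    return sorted(peaks)


# Toy data only: a dominant 167-nt peak plus two smaller peaks 10 nt apart.
toy_lengths = (
    [144] + [145] * 2 + [146]
    + [154] + [155] * 3 + [156]
    + [166] * 2 + [167] * 5 + [168] * 2
)
dist = length_distribution(toy_lengths)
print(max(dist, key=dist.get), local_peaks(dist))
```

The spacing between successive sub-nucleosomal peaks can then be compared with the ~10.4-nt helical periodicity noted above.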

Supported by NIH grants GM37949 and GM37951 and Welch Foundation Grant F-1607. Thermostable group II intron reverse transcriptase (TGIRT) enzymes and methods for their use are the subject of patents and patent applications that have been licensed by the University of Texas at Austin and East Tennessee State University to InGex, LLC. A.M.L. and the University of Texas are minority equity holders in InGex, LLC, and A.M.L. and other present and former Lambowitz laboratory members receive royalty payments from sales of TGIRT enzymes and licensing of intellectual property.

4. Uhlen et al. Science. 2015
5. basespace.illumina.com/projects/21071065
6. Mohr et al. RNA. 2013

▪ TGIRT-seq of fragmented E. coli genomic DNA versus simulation indicates that each DNA fragment has a unique UMI with negligible (<0.005%) recopying of DNA templates (not shown)
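Because each template molecule is expected to carry its own UMI, reads that share both mapping coordinates and UMI can be treated as PCR duplicates. The sketch below shows one simple way to collapse them. The record fields (`chrom`, `start`, `end`, `strand`, `umi`) and the function name are hypothetical, and the exact-match rule is an assumption: it does not correct sequencing errors within the UMI.

```python
def dedup_by_umi(reads):
    """Keep the first read seen for each (chrom, start, end, strand, UMI) key.

    `reads` is an iterable of dicts with the hypothetical keys used below.
    UMIs are compared by exact string match (no error correction).
    """
    seen = set()
    unique = []
    for read in reads:
        key = (read["chrom"], read["start"], read["end"], read["strand"], read["umi"])
        if key not in seen:
            seen.add(key)
            unique.append(read)
    return unique


# Toy input: the first two reads are PCR duplicates; the third maps to the
# same position but carries a different UMI, so it is a distinct molecule.
toy_reads = [
    {"chrom": "chr1", "start": 1000, "end": 1166, "strand": "+", "umi": "ACGTAC"},
    {"chrom": "chr1", "start": 1000, "end": 1166, "strand": "+", "umi": "ACGTAC"},
    {"chrom": "chr1", "start": 1000, "end": 1166, "strand": "+", "umi": "TTGCAA"},
]
print(len(dedup_by_umi(toy_reads)))
```

Without UMIs, the second and third toy reads could not be distinguished, which matters for cfDNA because nucleosome-protected fragments often share the same end points.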

Introduction

1. Fan et al. PNAS. 2008
2. Snyder et al. Cell. 2016
3. Sun et al. PNAS. 2015

▪ Tissue-of-origin of plasma DNA from a healthy individual deduced from analysis of nucleosome spacing signals downstream of transcription start sites and published RNA-seq data (4) for ssDNA-seq (2) and TGIRT-seq
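The nucleosome and transcription-factor analyses on this poster rely on the window protection score (WPS). A minimal sketch of the score follows, assuming the definition used in ref. 2: the number of fragments that completely span a window centered on a position, minus the number of fragments with an end point inside that window. The function name, the inclusive coordinate convention, and the default 120-bp window (with a smaller window, such as 16 bp, for short fragments) are assumptions for illustration rather than details stated on the poster.

```python
def window_protection_score(fragments, position, window=120):
    """WPS at `position` for fragments given as (start, end), inclusive.

    Assumed convention: a fragment covering the whole window counts as
    spanning (+1) even if one of its ends sits exactly on the window edge;
    any other fragment with an end point inside the window counts as -1.
    """
    half = window // 2
    win_start = position - half
    win_end = position + half
    spanning = 0
    ending = 0
    for start, end in fragments:
        if start <= win_start and end >= win_end:
            spanning += 1
        elif win_start <= start <= win_end or win_start <= end <= win_end:
            ending += 1
    return spanning - ending


# Toy fragments: two span the window around position 180, two end inside it.
toy_fragments = [(100, 266), (90, 300), (150, 200), (180, 400)]
print(window_protection_score(toy_fragments, position=180))
```

In practice the score would be computed at every position along a region, separately for long (120-180 nt) and short (35-80 nt) fragments, and the resulting profile examined for peaks and periodicity.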

(A) Streamlined Protocol

(D) TGIRT ssDNA-seq Metrics

• Cell-free (cf) DNA in human plasma consists largely of nucleosome-bound DNA fragments released by apoptosis of lymphoid and myeloid cells in blood (1,2)

• In a variety of disease states, plasma is enriched in DNA fragments released from dying cells in the affected tissues. These can be identified by tissue-specific differences in nucleosome positioning, transcription factor occupancy, and DNA methylation sites, thereby providing diagnostic information (2,3)

• Single-stranded DNA sequencing (ssDNA-seq) is more suitable for the analysis of highly fragmented, nicked DNA samples than are conventional dsDNA-seq methods

• The novel end-to-end template-switching activity of Thermostable Group II Intron Reverse Transcriptases (TGIRTs) facilitates ssDNA-seq by enabling direct attachment of DNA-seq adaptors to cDNA product strands without end repair, tailing, or ligation

• TGIRT ssDNA-seq libraries can be constructed from small amounts of starting material in ~2 h with fewer reagents and lower cost than other ssDNA-seq methods

• TGIRTs enable efficient ssDNA-seq that can be used for analysis of cfDNA in human plasma and other bodily fluids

• Identification of protein binding features of cfDNA provides information about the tissue-of-origin and has potential diagnostic applications

• TGIRT DNA-seq should also be applicable to ancient DNA, FFPE DNA, and bisulfite-treated DNA

(C) TGIRT ssDNA-seq Analysis of Nucleosome Positioning and Transcription Factor Occupancies in Human Plasma DNA

Figure panels (plots not reproduced here):

▪ Fragment length (nt) vs. percent reads for ssDNA-seq (ref. 2) and TGIRT-seq; major peak marked at 167 nt

▪ Positions relative to center of 167-nt fragments vs. normalized count of AA/AT/TA/TT and GG/GC/CG/CC dinucleotides for ssDNA-seq and TGIRT-seq

▪ Position relative to CTCF binding sites vs. scaled WPS for long (120-180 nt) and short (35-80 nt) fragments; ssDNA-seq (ref. 2) and TGIRT-seq

▪ Difference in distance between nucleosome centers (bp) [ssDNA-seq (ref. 2) vs. TGIRT-seq] vs. peak count

▪ Position relative to read ends vs. fraction of reads for A, C, G, and T in Read 1 and Read 2; Nextera XT and TGIRT-seq

▪ GC % vs. normalized coverage; Nextera XT (Gini: 0.276±0.00946) and TGIRT-seq (Gini: 0.263±0.0198)

▪ Level of coverage vs. % of genome; Nextera XT (R-sqrd: 0.912±0.00711), TGIRT-seq (R-sqrd: 0.899±0.0145), and WGS theoretical (Poisson)

▪ Genome sequence error rate: indel rate (mononucleotide runs < 4) and mismatch rate for Nextera XT and TGIRT-seq (axis scales x10^-5 and x10^-3)

▪ Mononucleotide run (nt) vs. average indel per read per mononucleotide run; Nextera XT and TGIRT-seq