Use of Thermostable Group II Intron Reverse Transcriptases (TGIRTs) for Single-Stranded DNA-seq of...



Page 1: Use of Thermostable Group II Intron Reverse Transcriptases (TGIRTs) for Single-Stranded DNA-seq of Cell-Free DNA in Human Plasma and Molecular Diagnostics

[Figure: protocol workflow diagram. Input: ~2 ng DNA. Timings shown: 20 min, 20 min, 60 min. Steps: (1) Plasma DNA; (2) Dephosphorylation & denaturation; (3) Template-switching by TGIRT, then alkaline treatment and cDNA clean-up; (4) Adaptor ligation by thermostable 5’ AppDNA/RNA ligase, then cDNA clean-up; (5) PCR amplification. Diagram labels: Target DNA (+), Target DNA (-), DNA nick, 5’ P, 3’ OH, R2 RNA (3’-Blocker), R2R DNA (3’-N), 5’-App/3’-Blocker R1R, UMI, R1, R2, P5, Barcode+P7.]

Use of Thermostable Group II Intron Reverse Transcriptases (TGIRTs) for Single-Stranded DNA-seq of Cell-Free DNA in Human Plasma and Molecular Diagnostics

Douglas C. Wu and Alan M. Lambowitz

Institute of Cellular and Molecular Biology and Department of Molecular Biosciences, The University of Texas at Austin

▪ 2-h workflow from purified DNA to library

▪ Simple protocol for adding unique molecular identifiers (UMIs) to exclude PCR duplicates

▪ Window protection score (WPS) analysis of long (120-180 nt) fragments exhibits periodicity expected for nucleosome packaging

▪ WPS analysis of shorter (35-80 nt) fragments resulting from DNA nicking by endogenous nucleases footprints binding sites for transcription factors, such as CTCF

▪ Nucleosome binding sites predicted by TGIRT-seq in plasma DNA from a healthy individual match those from a previous study (2)

▪ TGIRTs have higher fidelity than conventional viral RT in RNA-seq (6)

▪ TGIRT ssDNA-seq has a 1.5× higher mismatch rate than Nextera XT

▪ TGIRT ssDNA-seq is more prone to indels in mononucleotide runs ≥4 nt

▪ Coverage of E. coli K12 (MG1655) genomic DNA is comparable to Nextera XT (5)

▪ High coverage of GC-enriched regions reflects ligation bias
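Coverage evenness of the kind compared in the bullets above is commonly summarized with a Gini coefficient (the poster's GC-coverage plot reports one per library) and checked against a Poisson expectation. The poster does not state how its values were computed, so the following is only a minimal sketch, assuming the standard Gini formula applied to per-position read depths and a simple Poisson model for ideal whole-genome sequencing. The function names `gini` and `poisson_fraction` are hypothetical.

```python
import math


def gini(depths):
    """Gini coefficient of per-position depths: 0 = perfectly even coverage."""
    values = sorted(depths)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one position with non-zero depth")
    # Standard rank-weighted formula on ascending-sorted values (ranks start at 1).
    weighted = sum(rank * v for rank, v in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def poisson_fraction(k, mean_depth):
    """Expected fraction of the genome covered exactly k times under a Poisson model."""
    return math.exp(-mean_depth) * mean_depth ** k / math.factorial(k)


if __name__ == "__main__":
    print(gini([5, 5, 5, 5]))          # even coverage
    print(gini([0, 0, 0, 4]))          # all reads piled on one position
    print(poisson_fraction(3, 3.0))    # fraction of genome at depth 3 when mean depth is 3
```

A lower Gini value indicates more uniform coverage, which is how the two libraries' values can be compared.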

Conclusion

Reference

Grant Support and Conflict-of-interest Statement

(B) TGIRT ssDNA-seq of Human Plasma DNA

▪ TGIRT-seq of cell-free plasma DNA from a healthy individual gives data similar to that obtained by conventional ssDNA-seq (2)

▪ Major peak at ~167 nt corresponds to DNA fragments protected in nucleosome cores

▪ 10.4-bp periodicity (gray dashed lines) reflects minor groove nicking of nucleosome-bound DNA by endogenous DNases

▪ Dinucleotide patterns at the ends of 167-nt DNA fragments are as expected for inter-nucleosome cleavage
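The fragment-length profile described in these bullets (percent of reads at each length, with a major peak near 167 nt) can be tabulated from a list of fragment lengths. This is a minimal sketch, assuming the lengths have already been extracted from aligned read pairs; the function name `fragment_length_profile` is hypothetical and is not taken from the poster.

```python
from collections import Counter


def fragment_length_profile(lengths):
    """Return ({length: percent of reads}, most common length)."""
    counts = Counter(lengths)
    if not counts:
        raise ValueError("no fragment lengths supplied")
    total = sum(counts.values())
    # Percent of all fragments observed at each length, ordered by length.
    percent = {length: 100.0 * n / total for length, n in sorted(counts.items())}
    # The modal length marks the major peak of the distribution.
    mode = max(counts, key=counts.get)
    return percent, mode


if __name__ == "__main__":
    percent, mode = fragment_length_profile([167, 167, 166, 50])
    print(mode, percent[mode])
```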

Supported by NIH grants GM37949 and GM37951 and Welch Foundation Grant F-1607. Thermostable group II intron reverse transcriptase (TGIRT) enzymes and methods for their use are the subject of patents and patent applications that have been licensed by the University of Texas at Austin and East Tennessee State University to InGex, LLC. A.M.L. and the University of Texas are minority equity holders in InGex, LLC, and A.M.L. and other present and former Lambowitz laboratory members receive royalty payments from sales of TGIRT enzymes and licensing of intellectual property.

4. Uhlen et al. Science. 2015

5. basespace.illumina.com/projects/21071065

6. Mohr et al. RNA. 2013

▪ TGIRT-seq of fragmented E. coli genomic DNA versus simulation indicates that each DNA fragment has a unique UMI with negligible (<0.005%) recopying of DNA templates (not shown)
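UMIs make it possible to tell PCR duplicates apart from independent fragments that happen to share the same mapping coordinates. A minimal sketch of that idea follows, assuming each read has already been reduced to its mapping position, strand, and UMI sequence. The record fields and the function name `collapse_pcr_duplicates` are hypothetical, and exact UMI matching is an assumption (the poster does not describe its deduplication procedure).

```python
from collections import defaultdict


def collapse_pcr_duplicates(reads):
    """Keep one read per (chrom, start, end, strand, UMI) combination."""
    groups = defaultdict(list)
    for read in reads:
        key = (read["chrom"], read["start"], read["end"], read["strand"], read["umi"])
        groups[key].append(read)
    # Reads sharing coordinates AND UMI are treated as PCR copies of one fragment;
    # reads sharing coordinates but differing in UMI are kept as distinct fragments.
    return [members[0] for members in groups.values()]


if __name__ == "__main__":
    reads = [
        {"chrom": "chr1", "start": 100, "end": 266, "strand": "+", "umi": "ACGTAC"},
        {"chrom": "chr1", "start": 100, "end": 266, "strand": "+", "umi": "ACGTAC"},  # PCR duplicate
        {"chrom": "chr1", "start": 100, "end": 266, "strand": "+", "umi": "TTGCAA"},  # same position, new UMI
        {"chrom": "chr2", "start": 500, "end": 560, "strand": "-", "umi": "ACGTAC"},
    ]
    print(len(collapse_pcr_duplicates(reads)))
```

Production pipelines usually also tolerate sequencing errors within the UMI, which this sketch deliberately omits.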

Introduction

1. Fan et al. PNAS. 2008

2. Snyder et al. Cell. 2016

3. Sun et al. PNAS. 2015

▪ Tissue-of-origin of plasma DNA from a healthy individual deduced from analysis of nucleosome spacing signals downstream of transcription start sites and published RNA-seq data (4) for ssDNA-seq (2) and TGIRT-seq

(A) Streamlined Protocol

(D) TGIRT ssDNA-seq Metrics

• Cell-free (cf) DNA in human plasma consists largely of nucleosome-bound DNA fragments released by apoptosis of lymphoid and myeloid cells in blood (1,2)

• In a variety of disease states, plasma is enriched in DNA fragments released from dying cells in the affected tissues. These can be identified by tissue-specific differences in nucleosome positioning, transcription factor occupancy, and DNA methylation sites, thereby providing diagnostic information (2,3)

• Single-stranded DNA sequencing (ssDNA-seq) is more suitable for the analysis of highly fragmented, nicked DNA samples than are conventional dsDNA-seq methods

• The novel end-to-end template-switching activity of Thermostable Group II Intron Reverse Transcriptases (TGIRTs) facilitates ssDNA-seq by enabling direct attachment of DNA-seq adaptors to cDNA product strands without end repair, tailing, or ligation

• TGIRT ssDNA-seq libraries can be constructed from small amounts of starting material in ~2 h with fewer reagents and lower cost than other ssDNA-seq methods

• TGIRTs enable efficient ssDNA-seq that can be used for analysis of cfDNA in human plasma and other bodily fluids

• Identification of protein binding features of cfDNA provides information about the tissue-of-origin and has potential diagnostic applications

• TGIRT DNA-seq should also be applicable to ancient DNA, FFPE DNA, and bisulfite-treated DNA

(C) TGIRT ssDNA-seq Analysis of Nucleosome Positioning and Transcription Factor Occupancies in Human Plasma DNA
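The window protection score (WPS) used in this panel comes from ref. 2. As recalled from that study, the WPS at a genomic position is the number of fragments that completely span a window centered on the position minus the number of fragments with an endpoint inside the window. The sketch below is illustrative only: the function name is hypothetical, fragments are assumed to be given as inclusive (start, end) coordinates on one chromosome, and the default 120-bp window (suited to long fragments) is an assumption rather than a value stated on the poster. A fragment with both ends inside the window is counted once here, which is also an assumption.

```python
def window_protection_score(fragments, position, window=120):
    """WPS at `position`: fragments spanning the window minus fragments ending in it."""
    half = window // 2
    w_start, w_end = position - half, position + half
    spanning = 0
    ending = 0
    for start, end in fragments:
        if start <= w_start and end >= w_end:
            # Fragment covers the whole window: evidence of protection.
            spanning += 1
        elif w_start <= start <= w_end or w_start <= end <= w_end:
            # Fragment has at least one endpoint inside the window: evidence of cleavage.
            ending += 1
    return spanning - ending


if __name__ == "__main__":
    fragments = [(100, 266), (150, 316), (190, 250)]
    print(window_protection_score(fragments, position=200))
```

For the short (35-80 nt) fragment class, a much smaller window would be passed via the `window` argument; the appropriate size should be taken from ref. 2.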

[Figure: nucleotide frequencies at read ends. Panels: Read 1, Read 2; rows A, C, G, T. x-axis: Position relative to read ends; y-axis: Fraction of reads. Series: Nextera XT, TGIRT-seq.]

[Figure: GC bias. x-axis: GC %; y-axis: Normalized coverage. Series: Nextera XT (Gini: 0.276±0.00946), TGIRT-seq (Gini: 0.263±0.0198).]

[Figure: coverage distribution. x-axis: Level of coverage; y-axis: % of genome. Series: Nextera XT (R-sqrd: 0.912±0.00711), TGIRT-seq (R-sqrd: 0.899±0.0145), WGS theoretical (Poisson).]

[Figure: fragment length distribution, with the peak marked at 167 nt. x-axis: Fragment length (nt); y-axis: Percent reads. Series: ssDNA-seq, TGIRT-seq.]

[Figure: dinucleotide frequencies. Panels: ssDNA-seq, TGIRT-seq. x-axis: Positions relative to center of 167-nt fragments; y-axis: Normalized count. Series: AA/AT/TA/TT, GG/GC/CG/CC.]

[Figure 3, panels (a), (b), (c): window protection scores. Fragment classes: Long (120−180 nt), Short (35−80 nt). x-axis: Position relative to CTCF binding sites; y-axis: Scaled WPS. Series: ssDNA-seq (ref. 2), TGIRT-seq.]

[Figure: nucleosome position agreement. x-axis: Difference in distance between nucleosome centers (bp) [ssDNA-seq (ref. 2) vs TGIRT-seq]; y-axis: Peak count (shown at ×10^5 and ×10^4 scales).]

[Figure: error rates. Panels: Indel rate (mononucleotide runs < 4), Mismatch rate. y-axis: Genome sequence error rate (scales ×10^−5 and ×10^−3). Series: Nextera XT, TGIRT-seq.]

[Figure: indels in mononucleotide runs. x-axis: Mononucleotide run (nt); y-axis: Average indel per read per mononucleotide run. Series: Nextera XT, TGIRT-seq.]