
Unique Endogenous Controls for Extraction-Free Targeted RNA Sequencing Assays

Ihab Botros, Patrick Roche, and Debrah Thompson

HTG Molecular Diagnostics, Tucson, AZ

HTG Molecular Diagnostics, Inc. | 3430 E. Global Loop | Tucson, AZ 85706 | (877) 289-2615 | htgmolecular.com | Presented at AGBT 2016

For Research Use Only. Not for use in diagnostic procedures.

HTG EdgeSeq, HTG Edge and qNPA are trademarks of HTG Molecular Diagnostics, Inc. Any other trademarks or trade names used herein are the intellectual property of their respective owners.

HTG EdgeSeq system & chemistry overview

Abstract

Measurement of DNA elements and co-measurement of DNA and RNA are highly reproducible.

§ The scatterplot matrix plots counts obtained for DNA and DNA/RNA measurement in technical triplicates of FFPE tissue (lung adenocarcinoma sample, left; colon carcinoma sample, right). All samples were run at 2.4 mm² per replicate. Data were log2-transformed. Pairwise correlations and Pearson correlation coefficients are shown. Green dots show the RNA probes and black dots the DNA probes. Isolated RNA, cell line lysates, and other FFPE samples had similar results (not shown).
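The replicate comparison described above (log2 transformation followed by pairwise Pearson correlation) can be sketched roughly as follows. This is an illustrative sketch only: the counts are hypothetical, the pseudocount of 1 is an assumption made so that zero counts remain defined, and the function names are not part of any HTG software.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def log2_counts(counts, pseudocount=1):
    # Assumption: a pseudocount keeps log2 defined for probes with zero reads.
    return [math.log2(c + pseudocount) for c in counts]


def pairwise_replicate_correlation(replicates):
    """Return {(name_a, name_b): r} for every pair of replicates."""
    logged = {name: log2_counts(counts) for name, counts in replicates.items()}
    return {
        (a, b): pearson(logged[a], logged[b])
        for a, b in combinations(sorted(logged), 2)
    }


# Hypothetical per-probe counts; probes are in the same order in each replicate.
replicates = {
    "rep1": [120, 950, 4300, 15000, 61000],
    "rep2": [131, 900, 4100, 16200, 58000],
    "rep3": [115, 1010, 4450, 14800, 63500],
}
correlations = pairwise_replicate_correlation(replicates)
for pair, r in correlations.items():
    print(pair, round(r, 4))
```

Three replicates give three pairwise comparisons, which corresponds to the off-diagonal cells of a 3 x 3 scatterplot matrix.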

Reproducibility

DNA probes are specific for DNA: DNase I and RNase A treatments demonstrate specificity

§ (Top) DNase treatment. To demonstrate that the DNA probes specifically target DNA, we measured DNA probe signal in genomic DNA samples (human liver) with and without pre-treatment with DNase I. Fifty ng of gDNA was used per sample. Values are averages from triplicate samples; two higher-copy and four medium-copy probes are shown.

§ (Bottom) RNase treatment. RNase A treatment of the same gDNA samples demonstrates that RNase treatment does not adversely affect the DNA signal. Data were normalized to all counts before taking an average of triplicate samples (our gDNA preparation method is known to co-purify some RNA).
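The normalization described for the RNase comparison (scale each sample to its total counts, then average the triplicates) might look like the sketch below. All counts and probe names here are hypothetical, and the scaling constant is an assumption, since the poster does not state which scale was used.

```python
def normalize_to_total(sample_counts, scale=1000):
    """Scale each probe count by the sample's total counts.

    `scale` is an assumed constant; the poster does not specify one.
    """
    total = sum(sample_counts.values())
    return {probe: count * scale / total for probe, count in sample_counts.items()}


def mean_of_replicates(samples):
    """Average each probe across a list of per-sample count dicts."""
    probes = samples[0].keys()
    return {p: sum(s[p] for s in samples) / len(samples) for p in probes}


def percent_remaining(untreated, treated):
    """Treated signal as a percentage of untreated signal, per probe."""
    return {p: 100.0 * treated[p] / untreated[p] for p in untreated}


# Hypothetical triplicates (raw counts per probe).
untreated_reps = [
    {"probe_A": 100, "probe_B": 300, "other": 600},
    {"probe_A": 200, "probe_B": 600, "other": 1200},
    {"probe_A": 50, "probe_B": 150, "other": 300},
]
treated_reps = [
    {"probe_A": 110, "probe_B": 290, "other": 600},
    {"probe_A": 220, "probe_B": 580, "other": 1200},
    {"probe_A": 330, "probe_B": 870, "other": 1800},
]

untreated_mean = mean_of_replicates([normalize_to_total(s) for s in untreated_reps])
treated_mean = mean_of_replicates([normalize_to_total(s) for s in treated_reps])
remaining = percent_remaining(untreated_mean, treated_mean)
print(remaining)
```

Normalizing before averaging keeps a replicate with more total reads from dominating the mean. A probe that is specific for DNA would be expected to stay near 100% after RNase treatment and to drop sharply after DNase treatment.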

Sample Load Correlation (Titrations)

Summary and Future Plans

Summary

§ We generated HTG EdgeSeq nuclease protection probes to measure repetitive DNA. These probes were used to make an HTG EdgeSeq assay to co-measure DNA and RNA (DNA-RNAhk assay).

§ DNA probes within the DNA-RNAhk assay generate reproducible data on multiple sample types, including FFPE tissue.

§ Signal from both endogenous and exogenous DNA titrates as assay input is diluted, suggesting that DNA probe signal tracks the sample load and could be used as a measurement of sample input.

Future Experiments

§ Perform experiments to determine the relationship between DNA probe signal and the absolute cell number/cellularity of the sample, using a well-defined cell line.

§ Introduce some of the medium-copy probes (e.g. U1 or LTR3) into an established HTG EdgeSeq gene expression assay and compare the performance of the DNA probes as potential normalizers to our current normalization process.

§ Determine the relationship between reads and absolute copy number of target repeats.

Questions

§ What is a good tool for mapping repeat probes to the genome with high specificity?

Work on HTG EdgeSeq system & chemistry supported by NIH grants R44HG005949 and R43HG005949

Do DNA probes reflect the amount of sample in the reaction?

§ (TOP) Sample Setup: To begin asking how well our DNA probes measure sample load, we added an external spike-in of DNA into the FFPE lysate, and then serially diluted the mixture (see top figure). Samples were run in triplicate on the DNA-RNAhk assay. Samples were sequenced, and both the equivalence between dilutions and the titration were examined.

§ (MIDDLE) Signal from DNA probes and spike-ins behaves similarly. We first examined titration of the DNA probe signal in the sample and exogenous control separately. The R² values and shape of the data are similar for both (raw data shown), suggesting that our DNA probes are accurately reflecting the sample load. Since the exogenous control is separate from the sample, it can be used both as an external meter for signal intensity and to correct for experiment-level dilution bias (seen at the lower end of the dilution).

§ (BOTTOM) Equivalence of signal from a titration series: DNA probes reflect sample load. Scatterplot matrix showing correlation of signal (DNA and RNA) between dilution points of a given sample. Averages of triplicates are shown. Raw data were log2-transformed before pairwise correlation was performed. Pearson correlations are shown. RNA probes are the orange dots, exogenous controls are the green dots, and DNA probes are the black dots. DNA probes appear to faithfully reflect the change in input, which strongly suggests that they track the sample load.
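The titration analysis can be illustrated with a short sketch that fits total reads against input area and computes the sample-to-spike-in ratio at each dilution. The read totals below are hypothetical, and the use of an ordinary least-squares line for the R² is an assumption; the poster reports R² values but does not state how they were computed.

```python
def linear_fit_r2(x, y):
    """Ordinary least-squares line through (x, y); returns slope, intercept, R^2."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Input areas from the dilution series (mm^2); read totals are hypothetical.
area_mm2 = [4.8, 2.4, 1.2, 0.6]
dna_probe_reads = [1_200_000, 640_000, 330_000, 180_000]
spike_in_reads = [200_000, 110_000, 58_000, 33_000]

_, _, r2_dna = linear_fit_r2(area_mm2, dna_probe_reads)
_, _, r2_spike = linear_fit_r2(area_mm2, spike_in_reads)

# Because the spike-in is diluted together with the lysate, the ratio of
# sample signal to spike-in signal should stay roughly flat across dilutions.
ratios = [d / s for d, s in zip(dna_probe_reads, spike_in_reads)]

print(round(r2_dna, 4), round(r2_spike, 4))
print([round(r, 2) for r in ratios])
```

A ratio that drifts at the lowest inputs would be one way to see the dilution bias mentioned above.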

Specificity for DNA

HTG EdgeSeq chemistry is a coupling of nuclease protection and next generation sequencing (NGS) designed to generate targeted RNA sequencing libraries. Library preparation occurs in two steps: automated nuclease protection, performed on the HTG EdgeSeq processor, followed by PCR to add NGS adapters. No RNA extraction* of the sample is necessary, even when using fixed (FFPE) tissue samples.

A specific limitation of extraction-free RNA measurement is the lack of a relevant positive or normalization control. As RNA is not extracted, the precise amount of RNA in the assay is essentially unknown. A spike-in control can provide a “ruler”, but since its relation to the sample input is unknown, it is mainly useful as a process control, and cannot be used for sample normalization. A better approach is to use an inherent property of the sample to perform sample-level “normalization.” In this study, we explore the use of repetitive genomic DNA elements to serve as a “cell counter” proxy, adding DNA-specific probes to co-measure DNA species within HTG’s RNA assay.

We structured this study to answer three main questions. First, do probes to repetitive genomic elements provide reliable and reproducible measurements? Second, do these probes, or the protocol changes required for co-measurement of RNA and DNA, impact measurement of the RNA signal? Finally, how well do our measurements correlate to both the DNA present and the sample load added? Those answers directly impact how these controls can be used to normalize sample input. Results, specific probes, and methods will be presented.

In the future, we plan to explore whether this approach could also be used, possibly in conjunction with non-coding RNAs, to determine cell-type proportions within a clinical specimen, or to determine tumor percentage. Either would be highly useful in-line information.

*Most sample types such as FFPE, cell lines, PAXgene, plasma and serum

Extraction-Free Sample Size Normalization Challenge

Tuning possibilities: Which DNA probes to use?

Repetitive element                                             Estimated copy number (rounded)
HSAT1 (Human satellite 1; male-specific)                       50
U1 (U1 snRNA (spliceosomal RNA) gene)                          200
U5 (snRNA (spliceosomal RNA) gene / pseudogene)                100
HSAT6 (Human centromeric satellite 6)                          100
Acro1 (Human acromeric satellite 1)                            500
tRNA-Ser                                                       >10
HY4 (Y scRNA gene / pseudogene)                                150
LTR3 (long terminal repeat for HERVK3 endogenous retrovirus)   550
tRNA-Unk                                                       >10

Nine elements were chosen with a range of expected copy numbers. Two probes were designed for each element.

Two HTG EdgeSeq assays were built with these probes:

1. DNA probes only.

2. DNA elements and RNA housekeeping genes (“DNA-RNAhk”).

[Figure panel labels: Lung adenocarcinoma FFPE sample; Colon carcinoma FFPE sample.]

[Figure: Impact of DNase Treatment of Sample on DNA Probe Expression - High Copy. Bar chart; x-axis: Acro1, HSAT1_1; y-axis: Average Raw Reads (0 to 180000); series: gDNA-Untreated, gDNA-DnaseTreated; data labels as extracted: 124701, 162272, 36, 51.]

[Figure: Impact of DNase Treatment of Sample on DNA Probe Expression - Medium Copy. Bar chart; y-axis: Average Raw Reads (0 to 8000); series: gDNA-Untreated, gDNA-DnaseTreated; data labels as extracted: 5280, 2552, 6804, 932, 12, 8, 5, 17.]

[Figure: Impact of RNase Treatment of Sample on DNA Probe Expression - High Copy. Bar chart; x-axis: Acro1, HSAT1_1; y-axis: Average Normalized Reads (0 to 6000); series: gDNA-Untreated-r1, gDNA-RnaseTreated-r1; data labels as extracted: 4183, 5452, 4302, 5144.]

[Figure: Impact of RNase Treatment of Sample on DNA Probe Expression - Medium Copy. Bar chart; x-axis: U1_1, HY4_1, LTR3_1, tRNAser1; y-axis: Average Normalized Reads (0 to 250); series: gDNA-Untreated-r1, gDNA-RnaseTreated-r1; data labels as extracted: 177, 120, 228, 31, 208, 120, 215, 36.]

[Figure: Total counts, spike-in control. x-axis: LungSqu FFPE, size in mm² (4.8, 2.4, 1.2, 0.6); y-axis: Total raw reads (0 to 250000); R² = 0.90388.]

[Figure panel labels: Lung carcinoma FFPE sample; B-cell lymphoma FFPE sample.]

[Figure: Total counts, DNA probes. x-axis: LungSqu FFPE, size in mm² (4.8, 2.4, 1.2, 0.6); y-axis: Total raw reads (0 to 1400000); R² = 0.89029.]

[Figure: Normalized total counts (sample/spike-in). x-axis: LungSqu FFPE, size in mm² (4.8, 2.4, 1.2, 0.6); y-axis: Total normalized reads (0 to 7000); data labels as extracted: 6044.0, 5640.7, 5376.8, 5357.6.]

§ We measure FFPE samples by area (see above)

§ Area measurement makes it hard to determine how many cells are present in a sample

§ Spike-in controls have no obvious relationship to the sample size

§ Could an inherent property of the sample be used to count cells with some degree of accuracy?

§ Since no extraction is performed, the DNA is still present in the sample

§ Repetitive human DNA elements were chosen as potential cell enumerators

§ HTG EdgeSeq probes were designed to target sequences within such elements
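One way the "cell enumerator" idea could eventually be applied is to derive a per-sample scaling factor from the DNA probe signal and divide the RNA counts by it. The sketch below is purely hypothetical: the poster lists the comparison of DNA probes against the current normalization process as future work, so the geometric-mean approach, the reference sample, the gene names, and all counts are assumptions made for illustration.

```python
import math


def geometric_mean(values):
    # Assumes all values are positive; a zero count would need special handling.
    return math.exp(sum(math.log(v) for v in values) / len(values))


def dna_size_factor(dna_counts, reference_dna_counts):
    """Geometric-mean DNA probe signal relative to a reference sample."""
    probes = sorted(reference_dna_counts)
    sample_gm = geometric_mean([dna_counts[p] for p in probes])
    reference_gm = geometric_mean([reference_dna_counts[p] for p in probes])
    return sample_gm / reference_gm


def normalize_rna(rna_counts, size_factor):
    """Divide each RNA probe count by the DNA-derived size factor."""
    return {gene: count / size_factor for gene, count in rna_counts.items()}


# Hypothetical counts: the sample carries half the DNA signal of the reference.
reference_dna = {"U1_1": 5000, "LTR3_1": 7000}
sample_dna = {"U1_1": 2500, "LTR3_1": 3500}
sample_rna = {"GENE_A": 400, "GENE_B": 90}

factor = dna_size_factor(sample_dna, reference_dna)
normalized = normalize_rna(sample_rna, factor)
print(round(factor, 3), {g: round(c, 1) for g, c in normalized.items()})
```

Whether such a factor tracks actual cell number is exactly what the planned cell-line experiments would need to establish.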

Protection probes with wings hybridize to wingmen and target RNA

S1 nuclease digests unbound RNA and probes

Base hydrolysis eliminates target RNA

DNA counts within an RNA assay

HTG EdgeSeq chemistry is targeted RNA sequencing, in which the probes are counted during sequencing. We used the DNA-RNAhk assay to determine which DNA probes had a copy number that fit within the range of an HTG EdgeSeq expression assay (see left graph). The housekeeping genes within the test assay (shown in blue) have a fairly broad range of expression, but the DNA repeats (shown in orange) surpassed their range. Probe names are labeled.

A similar graph is shown at right for two additional samples at two dilutions each to demonstrate that the ranges are not adversely affected by sample type or amount.

The DNA probes span a fairly wide range, but several probes fall within the range of the housekeepers. This means there are several DNA probes that might work well within HTG EdgeSeq assays.
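The selection step described above, finding DNA probes whose signal falls inside the range spanned by the housekeeping genes, can be sketched as a simple filter. The counts and housekeeper names below are hypothetical and are not taken from the poster's graphs; only the DNA probe names come from the poster.

```python
def probes_within_range(dna_counts, housekeeper_counts):
    """Return DNA probes whose counts lie within the housekeeper min/max range."""
    low = min(housekeeper_counts.values())
    high = max(housekeeper_counts.values())
    return sorted(p for p, c in dna_counts.items() if low <= c <= high)


# Hypothetical counts for illustration only.
housekeepers = {"HK_1": 200, "HK_2": 3000, "HK_3": 12000}
dna_probes = {
    "Acro1": 100000,
    "HSAT1_1": 150000,
    "U1_1": 5000,
    "HY4_1": 2500,
    "LTR3_1": 7000,
    "tRNAser1": 900,
}

candidates = probes_within_range(dna_probes, housekeepers)
print(candidates)
```

With these made-up numbers the two high-copy probes exceed the housekeeper range and are excluded, which mirrors the pattern described in the text.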

[Figure: Dilution series: FFPE lysate with external spike-in. Diagram labels: FFPE lysate; External Spike-In (DNA). Panel titles: Signal from DNA probes and spike-in controls behaves similarly; Equivalent results from a titration series: DNA probes reflect sample load.]

[Figure: HTG EdgeSeq chemistry, Steps 1 to 5. Diagram labels: Protection Probe, Wing, Target RNA, Wingman, Tag 1, Tag 2, Adapter. Captions: PCR step adds adapters and tags; tagged library sequenced on any NGS system.]

[Figure: scatterplot matrix sample labels: LungAD 2.4 mm² rep1, rep2, rep3; Colon 2.4 mm² rep1, rep2, rep3; LungSqu 4.8, 2.4, 1.2, 0.6 mm²; DLBCL 4.8, 2.4, 1.2, 0.6 mm².]