Author's response to reviews

Title: Biomarker discovery: Quantification of microRNAs and other small non-coding RNAs using next generation sequencing

Authors: Juan Pablo Lopez, Alpha Diallo, Cristiana Cruceanu, Laura M. Fiori, Sylvie Laboissiere, Isabelle Guillet, Joelle Fontaine, Jiannis Ragoussis, Vladimir Benes, Gustavo Turecki, Carl Ernst

Version: 2 Date: 3 June 2015




Referee #1 (Reviewer’s report)

This manuscript provides a comprehensive design to evaluate the influence of the purification method, the miRNA-seq platform (HiSeq or MiSeq), RNA quality or degradation, and sequencing coverage on miRNA-seq. The authors found that RNA degradation and the choice of HiSeq or MiSeq did not dramatically affect miRNA-seq, whereas the purification method used in library preparation changed the results significantly. The study was performed rigorously and the findings are interesting. However, the manuscript needs more careful editing: the design is very complicated and would confuse the reader unless the manuscript is clear and smooth. In general, I would recommend publication if the authors can address the following concerns and prepare a more concise draft.

RESPONSE: We greatly appreciate the referee’s enthusiasm for our work.

1) The background section describes the function of miRNAs, lincRNAs, rRNAs, piRNAs and T-UCRs; however, please provide some explicit evidence that they could be used as biomarkers, or else shorten this comprehensive description.

RESPONSE:

We agree with the reviewer that this section could be more refined. To address the reviewer’s comments, we have improved the wording in these sections and now include explicit examples of miRNAs, lincRNAs, snoRNAs, piRNAs, and T-UCRs as biomarkers within each descriptive section.

In addition, DNA methylation has also been considered a strong biomarker for some complex diseases. It should be mentioned in the background.

RESPONSE:

We agree with the reviewer. We have now added a short section on DNA methylation to the introduction, as part of our overall description of some known biomarkers. Specifically, we write (paragraph 1):

“…DNA methylation is also a well-studied biomarker [8-10]. Though not a focus of the current report, methylated cytosine residues have been associated with several diseases, including cancer and neurological disorders [11].”

[8] Kandimalla R, van Tilborg AA, Zwarthoff EC. DNA methylation-based biomarkers in bladder cancer. Nature reviews Urology. 2013;10(6):327-35. doi:10.1038/nrurol.2013.89.

[9] Ordovas JM, Smith CE. Epigenetics and cardiovascular disease. Nature reviews Cardiology. 2010;7(9):510-9. doi:10.1038/nrcardio.2010.104.

[10] Warton K, Samimi G. Methylation of cell-free circulating DNA in the diagnosis of cancer. Frontiers in molecular biosciences. 2015;2:13. doi:10.3389/fmolb.2015.00013.

[11] Schubeler D. Function and information content of DNA methylation. Nature. 2015;517(7534):321-6. doi:10.1038/nature14192.


2) In the background section, a large amount of content is used to introduce miRNAs, piRNAs and so on. However, this information is not informative for the manuscript. Meanwhile, Table 1, the most important element, lacks sufficient description. Please change the style.

RESPONSE:

To address this comment, we have refined the wording of the introductory paragraph, and we now better highlight Table 1 with explicit descriptions in the manuscript.

3) Please make sure that the GEO accession ID for the dataset given in line 5 of page 11 is correct.

RESPONSE:

All raw data have been submitted to the NCBI Gene Expression Omnibus (GEO) database, and our submission forms are currently under review. The GEO accession code will be provided once it is available.

4) All the commas in the tables should be replaced with points throughout Tables 2, 4, 5 and 6.

RESPONSE:

As suggested by the reviewer, all the commas in the tables have been replaced by periods.

5) In the section “Bioinformatic output measures for small RNA sequencing quality control”, the phrases “was consistent with published results” and “was externally validated with **” are confusing. Please make these sentences clearer.

RESPONSE:

We have modified the paragraph under “Bioinformatic output measures for small RNA sequencing quality control” (underlined and italicized below) to make the sentence clearer.

It now reads:

There are several important parameters to test in order to establish a high-throughput biomarker discovery pipeline, including sample quality, library preparation method, input quantity, and sequencing coverage. However, prior to testing these parameters, we established a set of output measures that allow us to compare across methodologies and experimental conditions. These quality control measures are described in detail in Table 1. In addition, we tested and compared our bioinformatics pipeline both internally (with collaborators) and externally (against published data available online) before analyzing any of the libraries in this study. Our findings were consistent with published results [47, 36, 35, 48].

[47] Camps C, Saini HK, Mole DR, Choudhry H, Reczko M, Guerra-Assunção JA, Tian YM, Buffa FM, Harris AL, Hatzigeorgiou AG, Enright AJ, Ragoussis J. Integrated analysis of microRNA and mRNA expression and association with HIF binding reveals the complexity of microRNA expression regulation under hypoxia. Mol Cancer. 2014;13:28. doi:10.1186/1476-4598-13-28.

[36] Spornraft M, Kirchner B, Haase B, Benes V, Pfaffl MW, Riedmaier I. RNAs from plasma-enabling small RNA sequencing. PLoS One. 2014;9(9):e107259. doi:10.1371/journal.pone.0107259.


[35] Huang X, Yuan T, Tschannen M, Sun Z, Jacob H, Du M et al. Characterization of human plasma-derived exosomal RNAs by deep sequencing. BMC genomics. 2013;14:319. doi:10.1186/1471-2164-14-319.

[48] Van de Bunt M, Gaulton KJ, Parts L, Moran I, Johnson PR, Lindgren CM et al. The miRNA profile of human pancreatic islets and beta-cells and relationship to type 2 diabetes pathogenesis. PLoS One. 2013;8(1):e55272. doi:10.1371/journal.pone.0055272.

6) The authors should offer some interpretation of why PPS gives so many more raw reads than the other methods in Table 2.

RESPONSE:

We thank the reviewer for this comment. We have now added an interpretation of why PPS gives more reads than the other methods tested.

Under the “Library purification methods of small RNA sequencing” section (paragraph 5):

“PPS generated the highest number of both total reads and distinct miRNAs identified, as well as very high specificity to miRNAs. This can be attributed to several factors: 1) the libraries purified with PPS contained more than 50 times more product after purification than those purified with the Novex gel method, because PPS is an automated system that does not require extraction of the library products directly from the gel, a step that can lead to loss of library product; 2) the range of the automatically isolated bands can be optimized to a desired product size (we used 125-180nt), and owing to this size selection and specificity, PPS had the fewest reads removed for being shorter than 15nt or longer than 40nt; 3) PPS showed the lowest number of adapter-adapter ligated reads. However, because each PPS instrument limits a run to only 4 samples, we tested variation across instruments. We found a significant difference in the final number of miRNAs identified per machine, with 50 more miRNAs identified with PPS2. The PPS therefore showed limitations in terms of consistency, and while the protocol requires less hands-on time in the laboratory, it does not significantly increase throughput (only 4 samples per run) or cost. We believe this is a very good method for medium-sized projects.”

7) It would be ideal for the authors to provide a comprehensive comparison of the different RNA purification methods in a supplementary table.

RESPONSE:

This is an important comment, but one that has been thoroughly addressed in a recent paper (Pritchard et al., 2012). In the background section (paragraph 6), we had already written the following statement to address this issue: “Although we did not test blood collection procedures or RNA extraction methods, the source of RNA and extraction method can have a significant impact on the measured levels of ncRNAs. Pritchard et al. provide a comprehensive review on sample collection and processing for miRNA quantification [32].”

[32] Pritchard CC, Cheng HH, Tewari M. MicroRNA profiling: approaches and considerations. Nature reviews Genetics. 2012;13(5):358-69. doi:10.1038/nrg3198.


8) Please change all the points in “has.miR.485.5p” to dashes, as in “hsa-miR-485-5p”.

RESPONSE:

All the periods in the miRNA names have been replaced by dashes (e.g., “has.miR.486.5p” to “miR-486-5p”).
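The renaming described above is a simple character substitution. A minimal Python sketch, assuming the dot-separated labels contain no dots other than the separators being replaced; the function name `to_mirbase_name` and its optional species-prefix stripping (e.g. "hsa", the miRBase prefix for Homo sapiens) are illustrative choices, not part of the authors' pipeline:

```python
from typing import Optional


def to_mirbase_name(label: str, drop_species: Optional[str] = None) -> str:
    """Convert a dot-separated miRNA label to dash-separated notation.

    Hypothetical helper: "hsa.miR.485.5p" -> "hsa-miR-485-5p".
    If drop_species is given (e.g. "hsa"), that prefix is removed.
    """
    # Assumes every dot in the label is a separator that should become a dash.
    name = label.strip().replace(".", "-")
    # Optionally strip a species prefix such as "hsa-".
    if drop_species and name.startswith(drop_species + "-"):
        name = name[len(drop_species) + 1:]
    return name


print(to_mirbase_name("hsa.miR.485.5p"))                      # hsa-miR-485-5p
print(to_mirbase_name("hsa.miR.486.5p", drop_species="hsa"))  # miR-486-5p
```

Stripping the species prefix only when it is explicitly named avoids mangling names such as "let-7a-5p" that begin with a lowercase three-letter word.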

9) How RIN was measured should be introduced in the background section. In addition, please define it the first time it is used.

RESPONSE:

Measurement of RNA quality is an important component of any biomarker experiment, so we appreciate the reviewer’s attention to this. Since measurement of RIN is a methodological issue, we describe it carefully in the Methods section.

Under “Sample processing and RNA extractions” we state:

“…RNA yields and quality were assessed using both the Nanodrop 1000 (Thermo Scientific, USA) and the Agilent 2100 Bioanalyzer (Agilent Technologies, USA).”

Under “Effects of RNA integrity” we state:

“RNA integrity number (RIN) values represent the level of RNA degradation in the sample, where 10 and 0 are the highest and lowest quality scores, respectively.”


10) Please provide the rank correlation of shared miRNAs in Table 6. I expect the rank correlation between the miRNAs identified on the HiSeq2500 and the MiSeq to be very strong.

RESPONSE:

We have modified Table 6 to include rank correlations between the two sequencing platforms, as suggested.
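A rank correlation of shared miRNAs is usually Spearman's rho: the Pearson correlation of the ranks, with tied counts given their average rank. A minimal, dependency-free Python sketch, assuming each platform's results are available as a mapping from miRNA name to read count; the counts below are made up for illustration, and `scipy.stats.spearmanr` computes the same statistic if SciPy is available:

```python
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


def spearman_shared(counts_a: Dict[str, float], counts_b: Dict[str, float]) -> float:
    """Spearman's rho computed over miRNAs detected on both platforms."""
    shared = sorted(set(counts_a) & set(counts_b))
    if len(shared) < 2:
        raise ValueError("need at least two shared miRNAs")
    ranks_a = average_ranks([counts_a[m] for m in shared])
    ranks_b = average_ranks([counts_b[m] for m in shared])
    return pearson(ranks_a, ranks_b)


# Hypothetical counts for illustration only.
hiseq = {"miR-1": 900, "miR-2": 400, "miR-3": 120, "miR-4": 15}
miseq = {"miR-1": 80, "miR-2": 35, "miR-3": 14, "miR-4": 2, "miR-5": 1}
print(spearman_shared(hiseq, miseq))
```

Only miRNAs present in both mappings enter the calculation, so miR-5 above is ignored; because ranks rather than raw counts are compared, the large difference in depth between the two platforms does not by itself lower the correlation.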

11) Please provide a short, explicit final recommendation or highlight for readers in the summary section, stating the main findings of the study (for example, that RIN does not affect miRNA-seq).

RESPONSE:

We agree. We have modified our conclusions section to better highlight the findings of the study.

Conclusions

The goal of this study was to highlight some fundamental details of small ncRNA profiling and to provide the reader with general guidelines for the quantification, data processing and analysis of sncRNAs from clinical samples using NGS. Our results show that good quality sequencing libraries can be prepared from small amounts of total RNA and that varying degradation levels in the samples do not have a significant effect on the overall quantification of sncRNAs via NGS. In addition, we discuss the strengths and limitations of three commercially available library preparation methods, describe our bioinformatics pipeline, provide recommendations for sequencing depth and coverage, and describe in detail the expression and distribution of all sncRNAs in four human tissues: whole blood, brain, heart and liver. Ultimately, this study provides valuable information that will help researchers plan and execute future small RNA profiling studies, contributing to the understanding of sncRNAs as potential biomarkers and mediators of biological functions and disease.


12) Please provide the detailed numbers of miRNAs in Figure 6, not only the proportions.

RESPONSE:

Supplementary Tables 1-9 provide detailed information on the miRNAs, miRNA counts, averages, and distributions for all samples displayed in Figure 6. We have better highlighted the references to the supplementary information in the main text.

13) Please provide corresponding heatmaps based on the data in Figure 4 and Figure 7, respectively, either as supplementary material or in the main body.

RESPONSE:

We now include a heat map of the data shown in Figure 4 and Figure 7 in the supplement, as the newly generated Supplementary Figure 5.


Heat map plot – Expression of miRNAs in four human tissues


14) Which factors in RNA-seq would affect the “Surviving Reads” reported in Tables 2, 4, 5 and 6?

RESPONSE:

This is an important issue. In the Methods section, under “Sequencing data processing and analysis - Small RNA-Seq Pipeline”, we state:

“…we applied specific filtering based on defined cutoffs in order to obtain high quality data, which directly translates to our ‘surviving reads’. These filters included: 1) Phred quality (Q) score higher than 30, 2) reads between 15-40nt in length, 3) adapter detection based on a perfect 10-nt match, and 4) removal of reads without a detected adapter.”
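The four filters can be sketched as a per-read function. This is a minimal Python sketch under stated assumptions, not the authors' pipeline: it assumes FASTQ-style Phred+33 quality strings, applies the Q filter to the mean quality of the adapter-trimmed insert (following the clarification given in point 18), and uses an assumed 10-nt adapter seed, `ADAPTER_SEED`, taken to match the start of the 3' adapter commonly used with Illumina TruSeq small RNA kits; check it against your own kit documentation.

```python
from typing import Iterable, Iterator, Optional, Tuple

# ASSUMPTION: 10-nt seed of the 3' adapter; replace with your kit's adapter.
ADAPTER_SEED = "TGGAATTCTC"


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_read(seq: str, qual: str, seed: str = ADAPTER_SEED,
                min_len: int = 15, max_len: int = 40,
                min_mean_q: float = 30.0) -> Optional[str]:
    """Return the adapter-trimmed insert if the read survives, else None."""
    # Filters 3 and 4: require a perfect match to the adapter seed.
    pos = seq.find(seed)
    if pos == -1:
        return None
    insert, insert_qual = seq[:pos], qual[:pos]
    # Filter 2: keep inserts of 15-40 nt (this also discards adapter dimers).
    if not min_len <= len(insert) <= max_len:
        return None
    # Filter 1: mean quality of the insert must be at least Q30.
    if mean_phred(insert_qual) < min_mean_q:
        return None
    return insert


def surviving_reads(reads: Iterable[Tuple[str, str]]) -> Iterator[str]:
    """Yield inserts from (sequence, quality) pairs that pass all filters."""
    for seq, qual in reads:
        insert = filter_read(seq, qual)
        if insert is not None:
            yield insert


mir21 = "TAGCTTATCAGACTGATGTTGA"  # 22-nt miR-21-5p in DNA alphabet
reads = [
    (mir21 + ADAPTER_SEED, "I" * 32),   # passes (Q40 throughout)
    (mir21 + "ACGTACGTAC", "I" * 32),   # no adapter detected
    ("ACGT" + ADAPTER_SEED, "I" * 14),  # insert too short
    (mir21 + ADAPTER_SEED, "#" * 32),   # low quality (Q2)
]
print(list(surviving_reads(reads)))  # ['TAGCTTATCAGACTGATGTTGA']
```

The adapter search runs first because the length and quality filters are only meaningful once the adapter has been trimmed off the insert.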

In addition, Table 1 provides a definition of “Surviving Reads” (see below), as well as detailed explanations of all the other output measures and filters used in our small RNA sequencing pipeline.

“Surviving Reads: This metric shows the number of reads that pass all the quality and trimming filters previously described. A good quality library should have surviving rates between 50% and 100%, depending on the method used.”
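The surviving rate is the number of surviving reads divided by the number of raw reads. A small Python sketch of that ratio and of the 50% lower bound quoted above; because the acceptable range is said to depend on the method used, the threshold is a parameter here, and the function names and read counts are hypothetical:

```python
def surviving_rate(raw_reads: int, surviving: int) -> float:
    """Fraction of raw reads that pass all quality and trimming filters."""
    if raw_reads <= 0:
        raise ValueError("raw_reads must be positive")
    if not 0 <= surviving <= raw_reads:
        raise ValueError("surviving must lie between 0 and raw_reads")
    return surviving / raw_reads


def is_good_library(raw_reads: int, surviving: int, min_rate: float = 0.5) -> bool:
    """Apply a method-dependent lower bound (50% by default) to the rate."""
    return surviving_rate(raw_reads, surviving) >= min_rate


# Hypothetical read counts for illustration.
print(surviving_rate(2_000_000, 1_300_000))  # 0.65
print(is_good_library(2_000_000, 700_000))   # False
```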

15) In Table 3, specific values would be preferable to relative descriptions.

RESPONSE:

We have modified table 3 as requested by the reviewer.

Table 3. Library Preparation: Purification Methods

Method | Specificity | Throughput | Cost ($) | Study Size
Novex TBE PAGE gel | High (manually cut band; very specific) | Low (few libraries/day) | $$$$$ | Small (2-10 samples)
Pippin Prep automated gel system | Medium (automated band; less specific) | Low (4 libraries/run [2 hrs]) | $$$ | Medium (10-50 samples)
AMPure XP beads | Low (all products > 100nt) | High (24 libraries/2 hrs) | $ | Large (50 and up)

Table 3. Recommendations for small RNA sequencing library purification. Recommendations include: (1) Specificity: based on specificity to a particular small RNA population. (2) Throughput: based on the number of libraries that can be prepared per day and the efficiency of processing; this number depends on the number of people working and instruments available in the lab. (3) Cost: based on the price of reagents, hands-on laboratory time, and service fees charged by genome centers. (4) Study Size: based on the number of biological or technical replicates.
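The Study Size column of Table 3 can be read as a simple lookup. A hypothetical Python helper that encodes only that column; it ignores specificity, throughput and cost, which may override the choice in practice, and because the table's ranges share their boundaries (2-10, 10-50, 50 and up), treating 10 as small and 50 as medium is an assumption:

```python
def recommend_purification(n_samples: int) -> str:
    """Map study size to the purification method suggested in Table 3.

    ASSUMPTION: overlapping boundaries resolved as 10 -> small, 50 -> medium.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be positive")
    if n_samples <= 10:
        return "Novex TBE PAGE gel"
    if n_samples <= 50:
        return "Pippin Prep automated gel system"
    return "AMPure XP beads"


for n in (6, 24, 96):
    print(n, recommend_purification(n))
```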


16) Figure 1 should provide more information. For example, the purification method could be labelled in Figure 1B, and so on.

RESPONSE:

We agree. All of this information is included in the legend to ensure a clear and understandable figure. The legend states:

Figure 1. Illustration of study design and samples. Human biological samples (N=45) were included in the present study. (A) Peripheral blood from a single individual was split into 11 aliquots (technical replicates) to test three different small RNA library purification methods: Novex TBE PAGE gel (N=3), Pippin Prep automated gel system (PPS) (N=4), and AMPure XP beads (N=3); sample C1 (control: human brain) (N=1) and sample AC (control: no purification method) (N=1). (B) Peripheral blood from a single individual was split into 5 aliquots (technical replicates) to test optimal amounts of RNA input: 1µg, 0.5µg, 0.25µg, 0.1µg, and 0.05µg. All libraries were purified using the PPS system. (C) Peripheral blood samples from 15 healthy volunteers (biological replicates) were used to test the effects of RNA integrity. Samples were split into 5 groups (N=3) with average RIN values of 9, 7, 5.4, 2.2 and 0. All libraries were purified using AMPure XP beads. (D) Peripheral blood samples from 12 healthy volunteers (biological replicates) were used to test the effects of sequencing coverage. Samples were sequenced on both HiSeq2500 (N=12) and MiSeq (N=12) Illumina sequencers. All libraries were purified using AMPure XP beads. (E) Human whole-blood (N=4), brain (N=4), heart (N=4) and liver (N=4) tissues were used to test the expression and tissue specificity of small ncRNAs. All libraries were purified using AMPure XP beads.

17) There seems to be no difference between the AMPure and control groups in the data of Table 2. Why?

RESPONSE:

Control sample AC was not purified or size selected before sequencing, so that its results could be compared to those of the three methods tested. However, all libraries in the study (including control sample AC) were prepared using the Illumina TruSeq Small RNA protocol, with 12 cycles of PCR amplification after ligation of specific 3’ and 5’ adapters. This protocol is well suited to the investigation of small RNA species because it takes advantage of the structure of most small RNA molecules, ligating specific adapters to the 5'-phosphate and 3'-hydroxyl groups, which are molecular signatures of their biogenesis pathway.

This means that, in theory, if adapter ligation works well, the libraries do not need to be purified. The success of a purification method also depends on suppressing adapter dimer products to keep their representation at acceptable levels, ideally <2.5%. Adapter dimer products can easily be checked in our output measure table under the “adapter-adapter” feature.
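The adapter-dimer check can be sketched as a simple fraction of reads. A minimal Python sketch, assuming an adapter dimer shows up as a read that begins directly with the 3' adapter (an empty insert) and using an assumed 10-nt adapter seed, `ADAPTER_SEED`, that should be replaced with your kit's adapter; the function names and reads are hypothetical:

```python
from typing import Iterable

# ASSUMPTION: 10-nt seed of the 3' adapter; replace with your kit's adapter.
ADAPTER_SEED = "TGGAATTCTC"


def adapter_dimer_fraction(reads: Iterable[str], seed: str = ADAPTER_SEED) -> float:
    """Fraction of reads that start directly with the adapter (no insert)."""
    reads = list(reads)
    if not reads:
        raise ValueError("no reads supplied")
    dimers = sum(1 for r in reads if r.startswith(seed))
    return dimers / len(reads)


def dimers_acceptable(fraction: float, limit: float = 0.025) -> bool:
    """True if adapter dimers stay below the ~2.5% target."""
    return fraction < limit


mir21_read = "TAGCTTATCAGACTGATGTTGA" + ADAPTER_SEED
reads = [ADAPTER_SEED + "GGGTG"] + [mir21_read] * 49  # 1 dimer in 50 reads
frac = adapter_dimer_fraction(reads)
print(frac, dimers_acceptable(frac))  # 0.02 True
```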

The AC control results are similar to those for AMPure XP beads because this method does not include a very specific size selection, as opposed to Novex (145-160nt) or PPS (125-180nt). AMPure XP beads retain all products larger than 100nt, which ultimately translates to the lowest specificity (to a single small RNA population) compared with the other two methods, as highlighted in Table 3.
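The effect of these size windows can be illustrated by checking which library products each method would keep. A minimal Python sketch using the windows stated above; the product lengths are hypothetical, adapter and primer overhead is not modeled, and "larger than 100nt" is encoded as at least 101 nt:

```python
from typing import Dict, Iterable, Optional, Tuple

Window = Tuple[int, Optional[int]]  # (lower bound, upper bound or None)

# Size windows (nt) stated in the text; None means no upper bound.
WINDOWS: Dict[str, Window] = {
    "No purification (AC)": (0, None),
    "AMPure XP beads": (101, None),
    "Pippin Prep (PPS)": (125, 180),
    "Novex TBE PAGE gel": (145, 160),
}


def is_retained(length: int, window: Window) -> bool:
    low, high = window
    return length >= low and (high is None or length <= high)


def retained_fraction(lengths: Iterable[int], window: Window) -> float:
    lengths = list(lengths)
    return sum(is_retained(n, window) for n in lengths) / len(lengths)


# Hypothetical library product lengths (nt), for illustration only.
products = [120, 147, 157, 175, 250, 400]
for method, window in WINDOWS.items():
    print(f"{method}: {retained_fraction(products, window):.2f}")
```

With these made-up lengths, the open-ended AMPure window keeps every product, exactly like the unpurified AC control, while the PPS and Novex windows keep progressively narrower subsets, which mirrors the specificity ranking in Table 3.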

To better explain these results, we now include a statement to this effect in the main manuscript under “Library purification methods of small RNA sequencing”, paragraph 7.


18) In Table 1, what does “we removed reads with a quality score <30” mean? Will any read with any base scoring <30 be filtered out?

RESPONSE:

This refers to the mean read quality score. However, all the reads that pass our filters have bases with Q scores >30. We now clarify this in the Methods section and Table 1.

Under “Sequencing data processing and analysis - Small RNA-Seq Pipeline”:

“…we applied specific filtering based on defined cutoffs in order to obtain high quality data. One of these cutoffs or filters was quality score (Q). We removed all reads with a mean quality score < 30 after trimming bases with Q scores <30…”
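A Phred score Q corresponds to an estimated base-call error probability p = 10^(-Q/10), so Q30 means roughly one error in 1,000 bases. The rule quoted above can be sketched in Python as follows, assuming Phred+33 encoding and assuming that low-quality bases are trimmed from the 3' end only (the text does not say which end is trimmed); the function names are hypothetical:

```python
from typing import List, Optional


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode a quality string (Phred+33 assumed) into integer Q scores."""
    return [ord(c) - offset for c in qual]


def error_probability(q: float) -> float:
    """Base-call error probability implied by a Phred score."""
    return 10 ** (-q / 10)


def trim_then_filter(seq: str, qual: str, min_q: int = 30) -> Optional[str]:
    """Trim 3' bases with Q < min_q, then drop the read if its mean Q < min_q."""
    scores = phred_scores(qual)
    end = len(scores)
    # ASSUMPTION: trimming is applied from the 3' end only.
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    if end == 0:
        return None  # nothing left after trimming
    if sum(scores[:end]) / end < min_q:
        return None
    return seq[:end]


print(trim_then_filter("ACGTACGT", "IIIIII##"))  # ACGTAC
print(trim_then_filter("ACGTA", "I##I#"))        # None
```

In this sketch a read with an internal base below Q30 can still pass if its mean stays at or above 30 (quality "II#II" has a mean of 32.4); a per-base check would be needed to guarantee that every retained base exceeds Q30.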


Referee #2 (Reviewer’s report)

The purpose of the current study is to provide researchers with general guidelines for the quantification, data processing and analysis of miRNAs and other sncRNAs from various clinical samples using NGS. The authors present their recommendations for sequencing depth and coverage and provide information on the expression and distribution of all small ncRNAs in four human tissues: whole blood, brain, heart and liver. These findings are very important for researchers and clinicians choosing an appropriate method to examine small non-coding RNAs as potential biomarkers. Taken together, this manuscript appears to be well prepared. However, it requires some improvements before publication.

RESPONSE:

We appreciate the referee’s positive comments about the findings of our study.

1) The authors provide useful information for examining small non-coding RNAs as potential biomarkers. However, no disease-related biomarkers are defined in this study. I therefore suggest that the authors revise the title “Biomarker discovery”.

RESPONSE:

We appreciate the reviewer’s point; however, the main purpose of this study was to develop metrics and methodologies for biomarker quality control. In this sense, our aim was not to identify any one biomarker associated with disease, but rather to develop and assess methodologies. We therefore believe it is important to keep “Biomarker discovery” in the title. We hope the abstract makes it clear to the reader that this is a methodology paper rather than a biomarker discovery paper.

2) The circular chart in Figure 3 does not look good and should be modified.

RESPONSE:

In Figure 3 we show the expression and distribution of microRNAs in four different human tissues using pie charts. We chose pie charts because they are the most common statistical graphic for illustrating numerical proportions. We would like to highlight that Supplementary Tables 1-4 provide detailed information on all aspects of this figure and should be considered complementary to Figure 3.