
Journal of Biomolecular Techniques 20:128–134 © 2009 ABRF

Journal of Biomolecular Techniques, Volume 20, Issue 2, April 2009

R.W. Whetten et al. / PCR Template for Parallel Pyrosequencing

Polymerase Chain Reaction Preparation of Template for Massively Parallel Pyrosequencing

Ross W. Whetten¹, Sofía Valenzuela A¹,², and John Frampton¹

¹Department of Forestry & Environmental Resources, NC State University, Raleigh, NC; ²Centro de Biotecnología, Universidad de Concepción, Concepción, Chile

Massively parallel pyrosequencing of DNA fragments immobilized on beads has been applied to genome survey sequencing and transcriptome analysis of a variety of eukaryotic organisms, including laboratory model species, agricultural crops and livestock, and species of interest to population biologists and ecologists. Preparation of sufficient high-quality template for sequencing has been an obstacle to sequence analysis of nucleic acids from tissues or cell types available in limited quantities. We report that the use of a biotinylated primer for polymerase chain reaction amplification allows removal of excess primer and poly(A) tract fragments from the sequencing templates, providing much higher yields of useful sequence information from pyrosequencing of amplified templates. This advance allows deep sequencing analysis of nucleic acids isolated from very small tissue samples. Massively parallel pyrosequencing is particularly useful for preliminary investigations of species that have not yet been the subject of significant genomic research, as genomic survey sequences and catalogs of expressed genes provide a means of linking the biology of less intensively studied species to that of more intensively studied model organisms. We obtained over 220 Mb of transcript DNA sequences from Abies fraseri (Pursh) Poir., a conifer species native to the southern Appalachian Mountains of eastern North America. Comparison of the resulting assembled putative transcripts with similar data obtained by other sequencing methods from other conifers demonstrates the utility of the improved sequencing template preparation.

Key Words: DNA sequences, polymerase chain reaction, transcriptome analysis

Address correspondence to: Ross W. Whetten, Department of Forestry & Environmental Resources, NC State University, Raleigh, NC 27695 (phone: 919-515-7578; fax: 919-515-6193; email [email protected]).

INTRODUCTION

Massively parallel pyrosequencing was introduced in 2005 [1] and has since been widely applied. Applications to eukaryotic species have included survey sequencing of the nuclear genome [2,3], complete sequencing of organellar genomes [4,5], and transcriptome analysis [6–10]. This technology is particularly well-suited to transcriptome analysis because it provides a cost-effective method for comprehensive analysis of mRNA pools [8,10]. Preparation of high-quality cDNA using traditional approaches [11] requires significant amounts of polyA+ RNA, which limits applications of deep transcriptome analysis to tissues that are available in quantities sufficient to allow isolation of several micrograms of polyA+ RNA. A few research groups have explored different options for amplification of cDNA from small samples of RNA, but most reports have focused on transcriptome analysis using RNA purified from tissues available in relatively large quantities, often without amplification [6–8]. One report on transcriptome analysis of microdissected shoot apical meristems used a linear amplification system to amplify the RNA obtained from small tissue samples [9], while another group reported successful application of a polymerase chain reaction (PCR)-based method for cDNA library preparation [10].

The objective of this study was to adapt PCR-based approaches to cDNA preparation for use in preparing template for massively parallel pyrosequencing, and to compare the utility of DNA sequences obtained from parallel pyrosequencing of templates prepared by different methods. A collection of transcript sequences obtained by parallel pyrosequencing was also compared with similar collections from related species obtained by Sanger sequencing, as a measure of the quality of the data obtained by pyrosequencing.

MATERIALS AND METHODS

RNA Isolation

RNA was isolated from foliage (GL, NL samples) and from succulent stem tissue (GS, NS samples) of Eucalyptus globulus and Eucalyptus nitens, respectively, using a modification of the lysis buffer provided with the RNeasy Plant kit (Qiagen, Valencia, CA). Briefly, 0.5 mL of lysis buffer (4 M guanidine thiocyanate, 0.2 M sodium acetate pH 5.3, 25 mM EDTA, 2.5% polyvinylpyrrolidone, 1% beta-mercaptoethanol) was added to a 100-mg sample of plant tissue that had been frozen in liquid nitrogen and ground to a powder with a mortar and pestle. The samples were subjected to additional grinding as the mixture thawed, and then transferred to a microfuge tube. The solution was adjusted to a final concentration of 0.2% sodium N-lauroylsarcosine, and incubated at 65 °C for 10 min. The lysate was then carried through the kit protocol, beginning with application to the shredder column.

RNA was isolated from foliage and succulent stem tissue (combined) from one seedling, or from root tissue (from root tips to the zone of lateral root elongation) from two different seedlings of Abies fraseri (Fraser fir), using the same procedure as described above.

cDNA Preparation

The SMART-PCR cDNA library system (Clontech, Mountain View, CA) was used to prepare cDNA from the four Eucalyptus RNA samples according to the instructions provided by the manufacturer, including a final amplification step using a primer with the sequence AAGCAGTGGTATCAACGCAGAGT provided as part of the SMART-PCR cDNA library kit. The PCR products were purified on silica matrix columns (Qiagen) according to the protocol supplied by the manufacturer, and submitted for DNA sequencing by an external service provider.

The same method was also used to prepare double-stranded cDNA from three Abies fraseri RNA samples, one from stem/foliage and two from roots. The amplified cDNA preparations, after purification on silica matrix columns, were then normalized using crab dsDNA-specific nuclease (DSN) (Evrogen, Russia; obtained through Axxora, San Diego, CA) as previously described [12]. The key modification to the SMART-PCR method for cDNA sequencing template preparation is the use of a biotinylated primer for the final amplification of the normalized cDNA populations. After completion of the DSN digestion and inactivation of the nuclease, the Abies fraseri cDNA samples were amplified using a biotinylated version of the SMART PCR primer. The PCR products were purified on silica matrix columns as previously described, and submitted to an external service provider for sequencing. Each sample was titered in three one-sixteenth-plate regions to determine the optimal bead library concentration; the root samples were each sequenced in two half-plate regions of a PicoTiterPlate, and the foliage sample was sequenced in a single half-plate region.

Primer Contamination Analysis

The SSAHA program [13] (http://www.sanger.ac.uk/Software/analysis/SSAHA/) was used to conduct string searches of sequence reads for the forward and reverse strands of the PCR primer sequence, using parameters that resulted in identifying any region that shared at least 18 of the 23 bases of the primer. The total number of nucleotides of primer sequence was calculated from the SSAHA search results and divided by the total number of nucleotides of all sequence reads to calculate the percent contamination for each sample.
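The calculation described above can be sketched in a few lines of Python. This is not SSAHA and does not reproduce its search parameters; it is a naive ungapped sliding-window scan, assumed here only to illustrate the 18-of-23 criterion and the ratio of primer nucleotides to total nucleotides. The function names are hypothetical.

```python
PRIMER = "AAGCAGTGGTATCAACGCAGAGT"  # 23-nt SMART PCR primer quoted in the text
MIN_MATCHES = 18                    # "at least 18 of the 23 bases"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_primer_nt(read, primer=PRIMER, min_matches=MIN_MATCHES):
    """Count read positions covered by primer-like windows on either strand.

    Assumption: ungapped comparison only; SSAHA itself works differently.
    """
    read = read.upper()
    k = len(primer)
    covered = [False] * len(read)
    for probe in (primer, reverse_complement(primer)):
        for start in range(len(read) - k + 1):
            window = read[start:start + k]
            matches = sum(a == b for a, b in zip(window, probe))
            if matches >= min_matches:
                # Mark positions rather than adding k, so overlapping
                # hits are not counted twice.
                for i in range(start, start + k):
                    covered[i] = True
    return sum(covered)


def percent_contamination(reads):
    """Primer nucleotides as a percentage of all nucleotides in the reads."""
    total_nt = sum(len(r) for r in reads)
    primer_nt = sum(count_primer_nt(r) for r in reads)
    return 100.0 * primer_nt / total_nt if total_nt else 0.0
```

A quadratic scan like this is only practical for small read sets; an indexed search such as the one the authors used would be needed for hundreds of thousands of reads.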

Sequence Read Assembly

The service laboratory that carried out sequencing of the Eucalyptus samples considered the reads of such poor quality as to not merit assembly, so no further analysis of those sequences was carried out. Sequence reads from the titering runs and the half-plate region runs of the Fraser fir samples (root cDNA of seedling one, root cDNA of seedling two, foliage of seedling two) were assembled separately by the sequencing service laboratory, using the Newbler software provided by the instrument vendor (Roche 454 Life Sciences, Branford, CT). We carried out a second assembly, also using Newbler, with all reads from all three samples together to create a single collection of contigs for comparison to other conifer expressed sequence tag collections in GenBank.

Sequence Alignment and Comparisons

The plant protein RefSeq collection (Build 30, incorporating data up to July 7, 2008; obtained from NCBI on August 11, 2008) was used as the reference collection of protein sequences. Programs from the BLAST package of software from NCBI [14] were used to create a protein database and carry out comparisons of nucleotide sequence queries to the protein RefSeq database. Unigene collections of Pinus taeda (Build 8), Picea glauca (Build 9), and Picea sitchensis (Build 11) were downloaded from NCBI on August 1, 2008, and used as nucleotide sequence queries for Blastx searches of the plant protein RefSeq database. Unigene builds are intended as a nonredundant representation of the genes in an organism rather than as consensus sequences for individual transcription units [15], so they are not directly comparable to contigs assembled by the Newbler software. The generic term “cluster” is used to refer to both contigs and Unigenes.

The criterion used for Blastx searches was an E-value of 1 × 10⁻⁸; a single result was returned for each query that met this criterion of similarity. The Blastx searches of the RefSeq plant protein collection used the complete Unigene sets for the Pinus taeda and the two Picea builds.


Two separate queries were conducted for the Abies fraseri data: all 75,180 contigs, and the subset of contigs with length equal to or greater than 500 nt. The results of each search were sorted by the subject protein accession, then by alignment length, to find the single longest alignment obtained between a subject protein and a conifer nucleotide sequence query.
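The sort-and-select step described above can be illustrated as follows. The record layout (query identifier, subject accession, alignment length) and the function name are assumptions made for the example; the authors do not describe the file format or tools they used for this step.

```python
from collections import namedtuple

# Hypothetical minimal record for one Blastx hit.
Hit = namedtuple("Hit", "query subject align_len")


def longest_hit_per_subject(hits):
    """Keep the single longest alignment for each subject protein.

    Sorting by subject accession, then by descending alignment length,
    puts the longest hit for each subject first within its group.
    """
    ordered = sorted(hits, key=lambda h: (h.subject, -h.align_len))
    best = {}
    for hit in ordered:
        best.setdefault(hit.subject, hit)  # first hit seen per subject wins
    return best
```

For example, given three hits to two made-up accessions:

```python
hits = [
    Hit("contig_a", "PROT_1", 120),
    Hit("contig_b", "PROT_1", 310),
    Hit("contig_c", "PROT_2", 95),
]
best = longest_hit_per_subject(hits)
# best["PROT_1"].query is "contig_b"; best["PROT_2"].query is "contig_c"
```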

RESULTS

Analysis of the Eucalyptus cDNA samples on an agarose gel (performed by the service laboratory prior to emulsion-PCR preparation of the bead libraries) showed a relatively even distribution of DNA fragment sizes from 100 bp to about 1500 bp in the samples (Fig. 1). A similar agarose gel analysis of the Abies cDNA samples prepared using a biotinylated primer for the final PCR step shows a range of sizes in the cDNA population comparable to that observed in the Eucalyptus samples, and suggests that the DSN normalization performed well in reducing the range of variation in abundance between the most frequent mRNAs and the rest of the population (Fig. 2).

Histograms of sequencing read lengths for some of the Eucalyptus samples showed a striking and unexpected periodic pattern (Fig. 3). The interval between the peaks in the histogram is 23 nucleotides, exactly the length of the primer used in the PCR amplification of the cDNA. Examination of the sequence reads from these four samples showed that the 23-nt primer sequence accounts for 7.1% (sample NS) up to 64.5% (sample NL) of the total number of nucleotides (Table 1A).

In contrast, the materials prepared using a biotinylated version of the SMART PCR primer to amplify cDNA preparations from Fraser fir samples show no periodic pattern in the histogram of the sequence read lengths. The 23-nt primer sequence accounts for no more than 0.20% of the total nucleotides in each of the five sequencing runs (Table 1B). Three independent preparations of Fraser fir cDNA yielded the sequencing read length histograms shown in Figure 4.
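As a consistency check, the contamination percentages in Table 1 can be recomputed from the reported nucleotide totals. The short Python sketch below uses only the values printed in Table 1; the dictionary layout and function name are choices made for this example.

```python
# (total nt, primer nt, reported % contamination), copied from Table 1.
TABLE_1 = {
    # A. Standard SMART-PCR cDNA method, Eucalyptus samples
    "GL": (12_347_343, 933_060, 7.6),
    "GS": (13_128_982, 2_085_624, 15.9),
    "NL": (22_606_088, 14_575_536, 64.5),
    "NS": (13_909_246, 987_888, 7.1),
    # B. Modified SMART-PCR cDNA method, Abies samples
    "S1-1": (46_572_108, 71_184, 0.15),
    "S1-2": (48_690_843, 84_696, 0.17),
    "S2-1": (46_638_525, 94_440, 0.20),
    "S2-2": (32_087_931, 59_100, 0.18),
    "S3-1": (47_120_501, 87_372, 0.19),
}


def percent_primer(total_nt, primer_nt):
    """Primer nucleotides as a percentage of all sequenced nucleotides."""
    return 100.0 * primer_nt / total_nt


for sample, (total_nt, primer_nt, reported) in TABLE_1.items():
    computed = percent_primer(total_nt, primer_nt)
    print(f"{sample:5s} computed {computed:6.2f}%  reported {reported}%")
```

Each computed value rounds to the figure printed in the table (one decimal place for the Eucalyptus samples, two for the Abies samples).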

Comparison of the contigs assembled from the Fraser fir cDNA sequences with Unigene builds of three other conifers available in GenBank shows that Fraser fir is approaching the amount of sequence coverage available for other conifers in contigs (or Unigenes) of length 500 nucleotides or greater (Table 2). The contigs assembled by the Newbler software are not directly comparable to Unigenes because the criteria used at NCBI to cluster sequences into Unigenes are less stringent than the criteria used by the Newbler assembly software [15]. A key disadvantage of the massively parallel pyrosequencing technology relative to Sanger sequencing is the shorter read length, which leads to shorter contigs for the Fraser fir assembly. The median size of the 75,180 contigs is 231 bp and the mean is 325 bp.
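The stated mean contig length follows directly from the totals in Table 2, and the same totals show how much of each collection lies in clusters of 500 nt or more. The sketch below is illustrative only; the tuple layout and function names are assumptions, and the numbers are copied from Table 2.

```python
# (no. all clusters, nt in all clusters, no. large clusters, nt in large clusters)
TABLE_2 = {
    "Afr": (75_180, 24_433_783, 11_814, 10_398_194),
    "Pgl": (17_808, 13_561_299, 15_900, 12_804_040),
    "Psi": (16_751, 14_425_190, 14_940, 13_716_175),
    "Pta": (18_952, 15_083_842, 17_914, 14_710_422),
}


def mean_cluster_length(species):
    """Mean length in nt over all clusters for one species."""
    n_all, nt_all, _, _ = TABLE_2[species]
    return nt_all / n_all


def large_fraction(species):
    """Fraction of all clustered nucleotides found in clusters >= 500 nt."""
    _, nt_all, _, nt_large = TABLE_2[species]
    return nt_large / nt_all


for species in TABLE_2:
    print(species, round(mean_cluster_length(species)),
          round(large_fraction(species), 3))
```

For Abies fraseri the mean rounds to 325 nt, matching the text, and well under half of the assembled nucleotides fall in large contigs, whereas the Unigene builds are dominated by large clusters.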

One basis on which to compare the relative fraction of the transcriptome identified in each of the four conifer species is by comparison of the sequence clusters to known plant proteins. Figure 5 shows the distribution of alignment lengths for each of the three conifer Unigene collections and the Abies fraseri contigs reported here. These alignment lengths are a function of the length of the query nucleotide sequence cluster, the sequence quality of the query, and the evolutionary relationship of the query and the subject. The plant protein RefSeq collection is composed predominantly of proteins from angiosperm plants, and all four conifer species are roughly equidistant from angiosperms in terms of evolutionary divergence. The longest alignment length should therefore be largely a function of cluster size and sequence quality, two key metrics of interest in comparing the results of different sequencing technologies for gene discovery projects.

The four conifer species show similar patterns of variation in the lengths of the longest alignments between plant protein RefSeq subjects and nucleotide sequence clusters (Table 3, Fig. 5). The 75,180 Abies fraseri contigs detected a total of 12,853 different plant protein accessions, of which 7063 were detected by contigs of 500 nt or greater length (called “Large Contigs” by the Newbler assembly software). Not surprisingly, including contigs less than 500 nt long as queries increases the number of alignments to plant protein subjects that cover less than 25% of the protein coding region (Fig. 5).

Figure 1

Size distributions of Eucalyptus cDNA samples. Agarose gel electrophoresis of cDNA samples was carried out by the sequencing service laboratory prior to emulsion-PCR. The size markers (lanes marked M) are a 100-bp ladder with a major band at 600 bp and the uppermost band at about 2 kb. Lanes 1–4 are samples GS, NS, GL, and NL, respectively. The major smear of cDNAs extends from about 200 bp to about 1200 bp, with some higher-molecular-mass material extending to lengths greater than 2 kb.

Figure 2

Size distributions of Abies cDNA samples. Agarose gel electrophoresis of cDNA samples S1, S2, and S3 before (lanes 1–3) and after (lanes 4–6) normalization with dsDNA-specific nuclease, prior to emulsion PCR at the sequencing service laboratory. The size marker (lane M) is a 100-bp ladder with a major band at 500 bp, and the uppermost band at about 1.5 kb.

Table 1

Primer Contamination

Sample   Total Reads   Total nt      Primer nt     % Contamination

A. Standard SMART-PCR cDNA Method, Eucalyptus Samples
GL       59,329        12,347,343    933,060       7.6
GS       66,825        13,128,982    2,085,624     15.9
NL       155,978       22,606,088    14,575,536    64.5
NS       70,948        13,909,246    987,888       7.1

B. Modified SMART-PCR cDNA Method, Abies Samples
S1-1     201,444       46,572,108    71,184        0.15
S1-2     253,130       48,690,843    84,696        0.17
S2-1     203,448       46,638,525    94,440        0.20
S2-2     172,765       32,087,931    59,100        0.18
S3-1     207,383       47,120,501    87,372        0.19

Figure 3

Sequencing read length histograms from Eucalyptus samples GL, GS, NL, and NS. The x-axis is the length of individual sequencing reads in nucleotides; the y-axis is the number of reads at each length, in one-nucleotide bins.

The use of a biotinylated PCR primer should lead to the selective removal prior to emulsion PCR of the biotinylated PCR primer and cDNA fragments produced by nebulization that contain the biotinylated primer. One concern with this strategy is the potential skewing of the distribution of reads relative to the coding capacity of the original mRNA transcripts. A test of the uniformity of distribution of Blastx alignment endpoints across the length of Arabidopsis proteins was conducted to determine if the assembled contigs have a tendency to align at positions other than the beginning or the end of the protein coding region. No such tendency was detected: the distribution of endpoints was relatively uniform across the length of the protein subjects, with a higher fraction of alignments beginning at the beginning of the protein coding region and ending at the end, as expected (Fig. 6). This is contrary to the result reported in an earlier study, which found that alignments of contigs assembled from GS-20 sequences showed an increased tendency to align with the middle of protein coding regions [16].

Figure 4

Sequencing read length histograms from three independent Abies samples. The solid lines in the S1 and S2 histograms are the results of the first half-plate runs of those samples; the dashed lines are the results of the second half-plate runs. The x-axis is the length of individual sequencing reads in nucleotides; the y-axis is the number of reads at each length, in one-nucleotide bins.

Figure 5

Size distributions of BLASTX alignments for conifer Unigene builds and Fraser fir cDNA contig assemblies. The y-axis shows the number of alignments for each of the three conifer Unigene builds in GenBank and the Abies fraseri contigs assembled as part of this work; the x-axis shows the length of the longest alignment for each subject protein as a fraction of the total protein length. Red dashed line: Abies fraseri contigs of any length. Red solid line: Abies fraseri contigs of at least 500 nt length. Blue solid line: Picea glauca Unigenes. Blue dashed line: Picea sitchensis Unigenes. Black dashed line: Pinus taeda Unigenes.

Figure 6

Distribution of start and end points of Blastx alignments between assembled Fraser fir cDNA contigs and the reference plant protein sequence from the RefSeq collection. The dashed line shows the cumulative distribution of alignment start points, expressed as the fraction of the reference plant protein sequence, and the solid line shows the cumulative distribution of alignment end points.
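A cumulative distribution of alignment endpoints of the kind plotted in Figure 6 could be computed along the following lines. This is a minimal sketch under stated assumptions: each alignment is given as 1-based inclusive start and end coordinates on the subject protein plus the protein length, and the function names are hypothetical. It is not the authors' analysis code, which the paper does not describe.

```python
from bisect import bisect_right


def fractional_endpoints(alignments):
    """Express alignment start and end points as fractions of protein length.

    alignments: iterable of (subject_start, subject_end, subject_length),
    with 1-based inclusive coordinates in amino acid residues (assumed).
    """
    starts, ends = [], []
    for s_start, s_end, s_len in alignments:
        starts.append((s_start - 1) / s_len)  # 0.0 means the first residue
        ends.append(s_end / s_len)            # 1.0 means the last residue
    return starts, ends


def empirical_cdf(values, grid):
    """Fraction of values less than or equal to each grid point."""
    ordered = sorted(values)
    n = len(ordered)
    return [bisect_right(ordered, g) / n for g in grid]
```

If endpoints are spread evenly along the proteins, both cumulative curves rise roughly linearly between 0 and 1, with a step at 0 for start points and a step at 1 for end points reflecting alignments that reach the termini.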

DISCUSSION

Parallel pyrosequencing is a powerful tool for rapid and cost-effective survey sequencing of genomes and transcriptomes of eukaryotic species, but the amount of template required has been a limiting factor for analysis of nucleic acids from cells or tissues available only in small quantities. PCR amplification of cDNA templates for transcriptome analysis from four independent Eucalyptus samples yielded sequences with varying degrees of contamination by the primer used for PCR, but use of a biotinylated primer for the final PCR amplification provided much higher quality sequences from cDNA templates prepared from three independent samples of Abies fraseri. It is not clear why the degree of contamination by primer sequences of the DNA sequence reads from the Eucalyptus samples was so highly variable. All samples were prepared in parallel, using identical methods and materials, so the variability in the outcomes may be due to extreme sensitivity of one or more steps in the sample preparation process to minor variations in reaction conditions. This intrinsic variability may also explain why a previous report of massively parallel pyrosequencing of cDNA prepared using the SMART-PCR technology did not report a problem with primer contamination of the resulting sequences [10].

Use of a biotinylated primer in the final stage of cDNA preparation prior to nebulization and ligation of sequencing adaptors is correlated with a much lower incidence of contamination of the DNA sequence reads by primer sequences. The presence of biotin on the 5′ ends of PCR-amplified cDNAs and primers should allow trapping of excess PCR primer and cDNA fragments containing the biotinylated ends of amplicons on the streptavidin-coated beads used in preparation of template for emulsion-PCR [1]. This trapping is likely to remove most of the polyA tail fragments from the nebulized cDNA preparation. Selective removal of end fragments from the cDNA preparation does not seem to have skewed the distribution of sequence reads along the length of the average coding sequence (Fig. 6) nor affected the ability to identify homologues of plant proteins by using assembled sequence contigs as Blastx queries of a protein database (Table 3).

Table 2

Comparison of Fraser Fir Sequence Assembly with Conifer Unigene Builds

Species   No. All Clusters (a)   nt in All Clusters   No. Large (b) Clusters   nt in Large Clusters
Afr       75,180                 24,433,783           11,814                   10,398,194
Pgl       17,808                 13,561,299           15,900                   12,804,040
Psi       16,751                 14,425,190           14,940                   13,716,175
Pta       18,952                 15,083,842           17,914                   14,710,422

(a) “Clusters” refers to Unigenes for Picea glauca (Pgl), Picea sitchensis (Psi), and Pinus taeda (Pta), and to Newbler assembled contigs for Abies fraseri (Afr).
(b) “Large” Unigenes or contigs are those 500 nt or longer.

Table 3

Number of Different Plant RefSeq Proteins and Fraction of Coding Sequence Covered by Alignments

               Afr-all   Afr > 499   Pgl     Psi     Pta
Total          12853     7063        8408    8454    10130
Median         0.316     0.592       0.450   0.558   0.500
Mean           0.413     0.583       0.493   0.582   0.529
1st quartile   0.145     0.330       0.266   0.319   0.316
3rd quartile   0.672     0.849       0.710   0.885   0.734


ACKNOWLEDGMENTS

The authors acknowledge the contributions of Anne-Margaret Braham for technical support, and Bioforest SA and CMPC SA for Eucalyptus tissue samples. This work was supported by the United States Department of Agriculture [2004-34458-14512].

REFERENCES

1. Margulies M, Egholm M, Altman WE, et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature 2005;437:376–380.

2. Macas J, Neumann P, Navrátilová A. Repetitive DNA in the pea (Pisum sativum L.) genome: Comprehensive characterization using 454 sequencing and comparison to soybean and Medicago truncatula. BMC Genomics 2007;8:427.

3. Wicker T, Schlagenhauf E, Graner A, et al. 454 Sequencing put to the test using the complex genome of barley. BMC Genomics 2006;7:275.

4. Moore MJ, Dhingra A, Soltis PS, et al. Rapid and accurate pyrosequencing of angiosperm plastid genomes. BMC Plant Biol 2006;6:17.

5. Jex AR, Hu M, Littlewood DT, Waeschenbach A, Gasser RB. Using 454 technology for long-PCR based sequencing of the complete mitochondrial genome from single Haemonchus contortus (Nematoda). BMC Genomics 2008;9:11.

6. Torres TT, Metta M, Ottenwälder B, Schlötterer C. Gene expression profiling by massively parallel sequencing. Genome Res 2008;18:172–177.

7. Eveland AL, McCarty DR, Koch KE. Transcript profiling by 3ʹ-untranslated region sequencing resolves expression of gene families. Plant Physiol 2008;146:32–44.

8. Vera JC, Wheat CW, Fescemyer HW, et al. Rapid transcriptome characterization for a nonmodel organism using 454 pyrosequencing. Mol Ecol 2008;17:1636–1647.

9. Emrich SJ, Barbazuk WB, Li L, Schnable PS. Gene discovery and annotation using LCM-454 transcriptome sequencing. Genome Res 2007;17:69–73.

10. Weber AP, Weber KL, Carr K, Wilkerson C, Ohlrogge JB. Sampling the Arabidopsis transcriptome with massively parallel pyrosequencing. Plant Physiol 2007;144:32–42.

11. Gubler U, Hoffman BJ. A simple and very efficient method for generating cDNA libraries. Gene 1983;25:263–269.

12. Zhulidov PA, Bogdanova EA, Shcheglov AS, et al. Simple cDNA normalization using kamchatka crab duplex-specific nuclease. Nucleic Acids Res 2004;32(3):e37.

13. Ning Z, Cox AJ, Mullikin JC. SSAHA: A fast search method for large DNA databases. Genome Res 2001;11:1725–1729.

14. McGinnis S, Madden TL. BLAST: At the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Res 2004;32:W20–W25.

15. NCBI. Build procedure, transcriptome-based. Available at the NCBI website (URL: http://www.ncbi.nlm.nih.gov/UniGene/build1.html), last viewed 17 Sept 2008.

16. Bainbridge MN, Warren RL, Hirst M, et al. Analysis of the prostate cancer cell line LNCaP transcriptome using a sequencing-by-synthesis approach. BMC Genomics 2006;7:246.