master.bioconductor.org/help/course-materials/2011/RNASeqChIPSeq/...

Short read quality assessment

Martin Morgan

June 20-23, 2011

[email protected]

Why sequence?

e.g., RNA-seq

- Expression in novel (un-annotated) regions
- Exon junction / RNA editing insights
- Allele-specific / transcript isoform quantification
- Non-model organisms
- Greater dynamic range and sensitivity?

Lessons from microarrays

- Initially: variability between manufacturers, technologies, labs
- MAQC: quality control standards and analysis protocols

Example workflow – [4]

Sample

- Purify poly(A)+ RNA with oligo(dT) magnetic beads
- cDNA synthesis primed with random hexamers

Microarray

- Dye-swap, hybridization, fluorescence, analysis

RNA-seq

- Fragment and size-select
- Illumina adapter ligation

Key issues

- Experimental design [1]
  - Replication
  - Randomization and blocking, e.g., batch effects
- Depth of coverage
  - Statistical power
  - Library complexity
- Coverage heterogeneity
  - Estimation biases
  - Legitimate comparison
- Sequencing uncertainty [2]

ROC simulation

- Replication (red vs. blue)
- Randomization and blocking (solid vs. dot)
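The blocking idea can be sketched in code. This is a minimal illustration, not the simulation behind the ROC figure: the sample names, the two-group layout, and the `blocked_lane_assignment` helper are all hypothetical. It places one sample from each treatment group on every flow cell (the block) and randomizes lane order within each block.

```python
import random


def blocked_lane_assignment(samples_by_group, n_flowcells, seed=0):
    """Assign one sample per group to each flow cell (block), in random lane order.

    samples_by_group: dict mapping treatment group -> list of sample names,
    with exactly n_flowcells samples per group (a hypothetical layout).
    Returns a list of flow cells, each a list of (lane, group, sample) tuples.
    """
    rng = random.Random(seed)
    # Shuffle within each group so that which replicate lands on which
    # flow cell is random.
    shuffled = {}
    for group, samples in samples_by_group.items():
        if len(samples) != n_flowcells:
            raise ValueError("need one sample per group per flow cell")
        pool = list(samples)
        rng.shuffle(pool)
        shuffled[group] = pool

    design = []
    for block in range(n_flowcells):
        members = [(group, shuffled[group][block]) for group in shuffled]
        # Randomize lane order within the block to avoid lane-position effects.
        rng.shuffle(members)
        design.append([(lane + 1, g, s) for lane, (g, s) in enumerate(members)])
    return design


if __name__ == "__main__":
    samples = {
        "treated": ["T1", "T2", "T3"],
        "control": ["C1", "C2", "C3"],
    }
    for i, flowcell in enumerate(blocked_lane_assignment(samples, 3), start=1):
        print("flow cell", i, flowcell)
```

Because every flow cell carries both groups, a flow-cell batch effect shifts treated and control samples together instead of being confounded with the treatment.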

[Figure: Cumulative proportion of reads occurring 0, 1, ... times. Panels 1 to 8; x-axis: number of occurrences of each read (log10); y-axis: cumulative proportion of reads.]
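A curve of this kind can be computed directly from the read sequences. The sketch below is one reading of the figure, assuming each distinct sequence is weighted by its number of copies; the function name `cumulative_occurrence_curve` is hypothetical and is not taken from the slides.

```python
from collections import Counter


def cumulative_occurrence_curve(reads):
    """Return [(k, proportion of reads whose sequence occurs <= k times)]."""
    per_sequence = Counter(reads)   # sequence -> number of occurrences
    reads_at_k = Counter()          # k -> reads belonging to sequences seen k times
    for count in per_sequence.values():
        reads_at_k[count] += count
    total = len(reads)
    curve, running = [], 0
    for k in sorted(reads_at_k):
        running += reads_at_k[k]
        curve.append((k, running / total))
    return curve


if __name__ == "__main__":
    reads = ["ACGT", "ACGT", "ACGT", "TTTT", "GGGG", "GGGG"]
    for k, proportion in cumulative_occurrence_curve(reads):
        print(k, round(proportion, 3))
```

A library dominated by a few highly duplicated sequences rises late on this curve, while a complex library reaches a high proportion at small k.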

[Figure: Actual versus uniform φX174 coverage. x-axis: copies per read (log10); y-axis: cumulative proportion.]
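The slide does not say how the uniform reference curve was produced. One way to build such a baseline, offered here only as an assumption, is to draw read start positions uniformly along the genome and compare their count distribution with the observed one. The helper names below are hypothetical; 5,386 bp is the length of the φX174 genome.

```python
import random
from collections import Counter
from statistics import mean, pvariance


def simulate_uniform_starts(genome_length, n_reads, seed=0):
    """Count reads starting at each position when starts are drawn uniformly."""
    rng = random.Random(seed)
    starts = Counter(rng.randrange(genome_length) for _ in range(n_reads))
    return [starts.get(pos, 0) for pos in range(genome_length)]


def dispersion_index(counts):
    """Variance-to-mean ratio; close to 1 when counts look Poisson."""
    return pvariance(counts) / mean(counts)


def ecdf(values):
    """Empirical cumulative distribution as sorted (value, proportion <= value)."""
    ordered = sorted(values)
    n = len(ordered)
    # Later indices overwrite earlier ones, so ties collapse to a single step.
    steps = {v: (i + 1) / n for i, v in enumerate(ordered)}
    return sorted(steps.items())


if __name__ == "__main__":
    PHIX_LENGTH = 5386  # φX174 genome length in bp
    uniform = simulate_uniform_starts(PHIX_LENGTH, 200_000)
    print("dispersion under uniform sampling:", round(dispersion_index(uniform), 2))
    # Per-position counts from a real lane would be passed to ecdf() and
    # dispersion_index() in the same way for comparison.
```

Observed counts whose dispersion index is far above 1, or whose ECDF is much wider than the simulated one, indicate coverage that is more heterogeneous than uniform sampling would give.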

Read count increases with gene length
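A common way to account for this length dependence, named here as an illustration and not as the method used in the slides, is to scale counts by gene length and sequencing depth, as in RPKM (reads per kilobase per million mapped reads). The gene counts and lengths below are made-up numbers.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


if __name__ == "__main__":
    total = 1_000_000
    # Two hypothetical genes: the longer one collects four times the reads
    # purely because it is four times as long.
    short_gene = rpkm(read_count=100, gene_length_bp=1000, total_mapped_reads=total)
    long_gene = rpkm(read_count=400, gene_length_bp=4000, total_mapped_reads=total)
    print(short_gene, long_gene)
```

After scaling, the two hypothetical genes receive the same value even though their raw counts differ fourfold.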

Reads, stratified by cycle, supporting a spurious SNP call in φX174
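Stratifying supporting reads by cycle can be sketched as follows. This is a simplified illustration under stated assumptions: reads are ungapped, on the forward strand, and given as hypothetical `(start, sequence)` pairs with 0-based starts; `supporting_cycles` is not a function from any library.

```python
from collections import Counter


def supporting_cycles(reads, snp_pos, alt_base):
    """Tabulate the sequencing cycle at which each supporting read shows alt_base.

    reads: iterable of (start, sequence), forward strand, ungapped, 0-based start.
    Returns a Counter mapping 1-based cycle -> number of supporting reads.
    """
    cycles = Counter()
    for start, seq in reads:
        offset = snp_pos - start
        # Only reads that cover the position and carry the alternate base count.
        if 0 <= offset < len(seq) and seq[offset] == alt_base:
            cycles[offset + 1] += 1
    return cycles


if __name__ == "__main__":
    reads = [(0, "ACGTA"), (2, "GTTAC"), (3, "TTACG"), (4, "AACGT")]
    print(sorted(supporting_cycles(reads, snp_pos=4, alt_base="T").items()))
```

Support that piles up at a narrow range of cycles, typically late ones where base quality drops, is a warning sign that the call reflects sequencing error and not a true variant.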

Case study

Subset of Brooks et al. [3]

- RNAi and mRNA-seq to identify pasilla-regulated alternative splicing
- Purified polyA, random hexamer primed
- Single- and paired-end sequences
- Alignment to reference genome and curated splice junctions

[1] P. L. Auer and R. W. Doerge. Statistical design and analysis of RNA sequencing data. Genetics, 185:405–416, Jun 2010.

[2] H. C. Bravo and R. A. Irizarry. Model-based quality assessment and base-calling for second-generation sequencing data. Biometrics, 66:665–674, Sep 2010.

[3] A. N. Brooks, L. Yang, M. O. Duff, K. D. Hansen, J. W. Park, S. Dudoit, S. E. Brenner, and B. R. Graveley. Conservation of an RNA regulatory map between Drosophila and mammals. Genome Res., 21:193–202, Feb 2011.

[4] J. H. Malone and B. Oliver. Microarrays, deep sequencing and the true measure of the transcriptome. BMC Biol., 9:34, 2011.