CS/BioE 598AGB: Genome Assembly, part II Tandy Warnow.

Post on 13-Jan-2016


CS/BioE 598AGB: Genome Assembly, part II

Tandy Warnow

Nature Biotechnology, volume 29, number 11, November 2011

Supplementary Figure 1. De Bruijn graph from reads with sequencing errors. (a) A de Bruijn graph E on our set of reads with k = 4. Finding an Eulerian cycle is already a straightforward task, but for this value of k, it is trivial. (b) If TGGAGTG is incorrectly sequenced as a sixth read (in addition to the correct TGGCGTG read), then the result is a bulge in the de Bruijn graph, which complicates assembly.

(Supplementary materials from the Compeau, Pevzner, and Tesler paper, Nature Biotech, 2011)
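The bulge described in panel (b) can be reproduced with a small sketch. It assumes the convention in which each k-mer is an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and it uses only the two reads named in the caption (the figure's full read set is not reproduced here). The function names `de_bruijn_edges` and `branching_nodes` are illustrative, not taken from the paper.

```python
from collections import defaultdict

def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer node to the list of nodes it points to.

    Each distinct k-mer found in the reads contributes one edge,
    from its (k-1)-length prefix to its (k-1)-length suffix.
    """
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    graph = defaultdict(list)
    for kmer in sorted(kmers):  # sorted only to make the output deterministic
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)

def branching_nodes(graph):
    """Nodes with more than one outgoing edge, i.e. where a bulge can open."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)

correct = de_bruijn_edges(["TGGCGTG"], 4)
with_error = de_bruijn_edges(["TGGCGTG", "TGGAGTG"], 4)

print(branching_nodes(correct))     # []
print(branching_nodes(with_error))  # ['TGG']
print(with_error["TGG"])            # ['GGA', 'GGC']
```

With the erroneous read added, node TGG gains a second outgoing edge, and the two branches (through GGC and through GGA) rejoin at GTG. That pair of parallel paths is the bulge.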

(c) An illustration of a de Bruijn graph E with many bulges. The process of bulge removal should leave only the red edges remaining, yielding an Eulerian path in the resulting graph.

(Supplementary materials from the Compeau, Pevzner, and Tesler paper, Nature Biotech, 2011)
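Once only the correct edges remain, spelling out the sequence amounts to walking an Eulerian path. The following is a minimal sketch using Hierholzer's algorithm, one standard way to find such a path. It performs no bulge removal itself: it assumes the input k-mers are already error-free and that an Eulerian path exists. The names `eulerian_path` and `spell` are illustrative.

```python
from collections import defaultdict

def eulerian_path(edges):
    """Return the nodes of a path that uses every edge exactly once.

    Hierholzer's algorithm; assumes an Eulerian path exists.
    """
    graph = defaultdict(list)
    balance = defaultdict(int)  # out-degree minus in-degree
    for src, dst in edges:
        graph[src].append(dst)
        balance[src] += 1
        balance[dst] -= 1
    # Start at the node with one surplus outgoing edge, if there is one;
    # otherwise (Eulerian cycle) any node with an outgoing edge will do.
    start = next((n for n, b in balance.items() if b == 1), edges[0][0])
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    return path[::-1]

def spell(path):
    """Concatenate the first node with the last base of each later node."""
    return path[0] + "".join(node[-1] for node in path[1:])

# The 4-mers of the correct read TGGCGTG from the figure caption.
kmers = ["TGGC", "GGCG", "GCGT", "CGTG"]
edges = [(kmer[:-1], kmer[1:]) for kmer in kmers]
genome = spell(eulerian_path(edges))
print(genome)  # TGGCGTG
```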


N50

• The N50 value is the size of the smallest contig (or scaffold) such that 50% of the genome is contained in contigs of size N50 or larger. This is the standard metric used to evaluate the quality of an assembly.

• Salzberg et al. computed “corrected N50” values by splitting contigs (or scaffolds) where errors were identified.
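The two definitions above can be sketched in a few lines. This is an illustration only, not the evaluation code used by Salzberg et al.: the contig lengths and the breakpoint position are made up, the helper names `n50` and `split_at` are hypothetical, and when no genome size is supplied the total assembly length is used in its place (an assumption).

```python
def n50(lengths, genome_size=None):
    """Smallest contig length L such that contigs of length >= L
    cover at least half of the genome.

    If genome_size is omitted, the sum of the contig lengths is used
    instead (an assumption for when the true genome size is unknown).
    """
    total = genome_size if genome_size is not None else sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # the contigs cover less than half of genome_size

def split_at(length, breakpoints):
    """Split a contig of the given length at each error position."""
    pieces, previous = [], 0
    for position in sorted(breakpoints):
        pieces.append(position - previous)
        previous = position
    pieces.append(length - previous)
    return pieces

# Made-up contig lengths; the first contig has one hypothetical error at position 30.
contigs = [80, 70, 50, 40, 30, 20]
errors = [[30], [], [], [], [], []]

corrected = [piece
             for length, breakpoints in zip(contigs, errors)
             for piece in split_at(length, breakpoints)]

print(n50(contigs))    # 70
print(n50(corrected))  # 50
```

In this toy case a single misassembly in the longest contig lowers N50 from 70 to 50, which is why the corrected value is the more conservative measure.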

From Mihai Pop’s paper

Differing Conclusions

• Compeau et al.: “De Bruijn graphs are not a cure-all … Short read sequencing technologies … favor the use of de Bruijn graphs … and are also well suited to representing genomes with repeats. However, if a future sequencing technology produces high quality reads with tens of thousands of bases, …, the pendulum could swing back toward favoring overlap-based approaches for assembly.”

Mihai Pop’s conclusion

Salzberg’s conclusions
