Learning to love de Bruijn graphs
Ben Woodcroft,
Australian Centre for Ecogenomics (ACE)
Winter School in Bioinformatics, 2015
A slide from Torsten Seemann
K‐mers and assembly
• For next‐generation sequencing, comparing each read with every other read is impractical.
  – E.g. 10 million reads → $10^7 \times 10^7 = 10^{14}$ read‐read comparisons. Slowww...
• K‐mers and de Bruijn graphs help make things tractable
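To make the contrast concrete, here is a minimal Python sketch (not taken from the talk) of how k‐mers turn read‐versus‐read comparison into set lookups. It assumes error‐free reads on a single strand (reverse complements are ignored), and the helper names `kmers` and `build_de_bruijn` are hypothetical. Every distinct k‐mer becomes a node, and an edge joins two k‐mers that overlap by k − 1 bases, so the work grows with the total number of k‐mers rather than with the number of read pairs.

```python
def kmers(read: str, k: int):
    """Yield every overlapping k-mer in a read (len(read) - k + 1 of them)."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def build_de_bruijn(reads, k: int) -> dict[str, list[str]]:
    """Build a node-centric de Bruijn graph (sketch: single strand, no errors).

    Nodes are the distinct k-mers seen in the reads; an edge X -> Y exists
    when Y equals X minus its first base plus one new base, and Y was also seen.
    """
    # One pass over the reads: hash every k-mer into a set.
    # No read is ever compared against another read.
    kmer_set = {kmer for read in reads for kmer in kmers(read, k)}
    graph = {}
    for kmer in kmer_set:
        suffix = kmer[1:]
        # At most four candidate successors per node, each an O(1) set lookup.
        graph[kmer] = sorted(suffix + base for base in "ACGT" if suffix + base in kmer_set)
    return graph


reads = ["ACGTAC", "GTACGG"]
graph = build_de_bruijn(reads, k=4)
for node in sorted(graph):
    print(node, "->", graph[node])
```

The two reads are joined through their shared k‐mer GTAC without ever being compared directly, and TACG ends up with two successors (ACGG and ACGT): a fork in the graph.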
K‐mers and assembly
Forks
K‐mer too small
K‐mer too large
My favourite k‐mer size
My favourite k‐mer size
With a 100bp read, this can never happen with a k‐mer size of 51
Fewer tips, more bubbles
As read lengths get longer, assemblers must shift from handling dead ends (tips) in the graph to handling bubbles.
Tips and bubbles
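The difference between the two structures can be illustrated with a small Python sketch. It is a simplified illustration, not any particular assembler's algorithm: a "tip" is taken to be a short dead‐end chain hanging off a fork, and a "bubble" two short branches that leave the same fork and rejoin at the same node. The node names, the `max_len` thresholds and the helper functions are all hypothetical, and real assemblers also weigh k‐mer coverage before removing anything.

```python
def predecessors(graph: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert the successor lists (assumes every node is a key of graph)."""
    preds = {node: [] for node in graph}
    for node, succs in graph.items():
        for succ in succs:
            preds[succ].append(node)
    return preds


def find_tips(graph, max_len):
    """Return short dead-end chains that hang off a fork (candidate error tips)."""
    preds = predecessors(graph)
    tips = []
    for node, succs in graph.items():
        if succs:
            continue  # a tip must end in a dead end
        chain = [node]
        # Walk backwards while there is exactly one way in.
        while len(preds[chain[-1]]) == 1:
            parent = preds[chain[-1]][0]
            if len(graph[parent]) > 1:
                # The chain branches off a fork: report it if it is short.
                if len(chain) <= max_len:
                    tips.append(chain[::-1])
                break
            chain.append(parent)
    return tips


def find_bubbles(graph, max_len):
    """Return (fork, merge, branches) where short branches from a fork reconverge."""
    preds = predecessors(graph)
    bubbles = []
    for node, succs in graph.items():
        if len(succs) < 2:
            continue  # bubbles open at a fork
        branches_by_end = {}
        for succ in succs:
            branch = [succ]
            # Follow the branch while it is a simple one-in, one-out path.
            while (len(branch) < max_len
                   and len(graph[branch[-1]]) == 1
                   and len(preds[branch[-1]]) == 1):
                branch.append(graph[branch[-1]][0])
            branches_by_end.setdefault(branch[-1], []).append(branch)
        for end, branches in branches_by_end.items():
            # Two or more branches stopping at the same merge node form a bubble.
            if len(branches) > 1 and len(preds[end]) > 1:
                bubbles.append((node, end, branches))
    return bubbles


graph = {
    "S": ["A"],
    "A": ["B", "X"],    # fork: X -> Y is a short dead end (a tip)
    "B": ["C"],
    "C": ["D"],
    "X": ["Y"],
    "Y": [],
    "D": ["E1", "E2"],  # fork: E1 and E2 reconverge at F (a bubble)
    "E1": ["F"],
    "E2": ["F"],
    "F": ["G"],
    "G": [],
}
print(find_tips(graph, max_len=3))     # [['X', 'Y']]
print(find_bubbles(graph, max_len=5))  # [('D', 'F', [['E1', 'F'], ['E2', 'F']])]
```

Note that G, the genuine end of the sequence, is not reported as a tip because it does not hang directly off a fork.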
Metagenome assembly
Me: “I know, why don’t I just assemble all my data together?”
Run assembly
Wait 4 days
Out of memory allocating 18.4 million terabytes of RAM.
Solutions to RAM issues
• Quality trimming
• Hard trimming
• Throwing away a proportion of reads randomly
• Sequencing something else
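The first three remedies all shrink the input before any k‐mers are counted, and fewer or shorter reads can only mean fewer distinct k‐mers for the assembler to hold in RAM. A minimal Python sketch of each follows; the Phred threshold of 20, the trimmed length, the sampling seed and the helper names are illustrative assumptions rather than recommendations from the talk, and in practice a dedicated trimming tool would be used.

```python
import random


def quality_trim(seq: str, quals: list[int], min_q: int = 20) -> tuple[str, list[int]]:
    """Trim low-quality bases from the 3' end (quals are decoded Phred scores)."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


def hard_trim(seq: str, length: int) -> str:
    """Keep only the first `length` bases of a read, regardless of quality."""
    return seq[:length]


def subsample(reads: list[str], fraction: float, seed: int = 0) -> list[str]:
    """Keep each read independently with probability `fraction`."""
    rng = random.Random(seed)
    return [read for read in reads if rng.random() < fraction]


print(quality_trim("ACGTAC", [30, 30, 30, 10, 5, 2]))  # ('ACG', [30, 30, 30])
print(hard_trim("ACGTACGTAC", 6))                       # ACGTAC
```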
Lossy de Bruijn graphs
The number of k‐mers observed is vanishingly small relative to the total number of possible k‐mers
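The human genome: ~3 Gbp, i.e. ~$3 \times 10^9$ k‐mers
Total possible 51‐mers: $4^{51} \approx 5 \times 10^{30}$
Fraction observed: $(3 \times 10^9) / (5 \times 10^{30}) \approx 6 \times 10^{-22}$, i.e. about $6 \times 10^{-20}$%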
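When building the list of k‐mers, accidentally including some extra (false) ones probably has little effect on assembly, because the space of possible k‐mers is so sparsely occupied that a false k‐mer will rarely connect up with genuine ones.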
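The sparsity figures are easy to check. A short Python calculation, assuming (as the slide does) roughly one distinct k‐mer per base of a ~3 Gbp genome and ignoring reverse complements:

```python
genome_kmers = 3 * 10**9      # ~one k-mer per position of a ~3 Gbp genome
possible_51mers = 4**51       # 4 choices of base at each of 51 positions
fraction = genome_kmers / possible_51mers

print(f"possible 51-mers: {possible_51mers:.2e}")                  # 5.07e+30
print(f"fraction observed: {fraction:.1e} ({fraction * 100:.1e}%)")  # 5.9e-22 (5.9e-20%)
```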
Bloom filters
A low‐memory k‐mer “store”
Is my k‐mer in these reads?
From a Bloom filter, the answer is either “no” or “probably”
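A minimal Python sketch of the idea, not the data structure of any particular assembler: each k‐mer sets a few bits chosen by hash functions, and a query checks whether all of those bits are set. A cleared bit proves the k‐mer was never added (“no”); all bits set could be a collision (“probably”). The class name, the use of salted BLAKE2b as the hash family and the sizes are assumptions for illustration. Because only bits are stored, memory is fixed by `num_bits` rather than by k, at the cost of a tunable false‐positive rate, estimated here with the standard approximation $(1 - e^{-hn/m})^h$ for $h$ hashes, $n$ items and $m$ bits.

```python
import hashlib
import math


class BloomFilter:
    """A bit array plus several hash functions; records membership, not the k-mers."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, kmer: str):
        # Derive num_hashes bit positions from BLAKE2b digests with different salts.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(kmer.encode(), digest_size=8,
                                     salt=i.to_bytes(16, "little")).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, kmer: str) -> None:
        for pos in self._positions(kmer):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, kmer: str) -> bool:
        # False means "definitely not added"; True means "probably added".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(kmer))


def false_positive_rate(num_bits: int, num_items: int, num_hashes: int) -> float:
    """Approximate false-positive rate (1 - e^(-h*n/m))^h."""
    return (1 - math.exp(-num_hashes * num_items / num_bits)) ** num_hashes


read = "ACGTACGGTTCA"
k = 5
bloom = BloomFilter(num_bits=10_000, num_hashes=3)
for i in range(len(read) - k + 1):
    bloom.add(read[i:i + k])

print("GTACG" in bloom)  # a k-mer from the read: always True (no false negatives)
print("TTTTT" in bloom)  # never added: almost certainly False
print(false_positive_rate(num_bits=1000, num_items=100, num_hashes=7))
```

The hash count is called `num_hashes` rather than k to avoid confusion with the k‐mer size.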
A finishing approach to assembly
A central assumption of this method is that the genome is “mostly” complete
Scaffolding without mate pair data
Gap filling vs. assembly
• Regular assembly ain’t easy
• Re‐assembly is more straightforward because you know where you are trying to get to: the next contig
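Gap filling can be pictured as a search through the k‐mer graph that has a known destination. A minimal Python sketch under strong simplifying assumptions (error‐free reads, one strand only, no coverage checks), with hypothetical function names; it is not how finishm itself is implemented. It starts from the last k‐mer of one contig and searches breadth‐first for the first k‐mer of the next.

```python
from collections import deque


def kmer_set(reads: list[str], k: int) -> set[str]:
    """Every k-mer seen in the reads (single strand, no error correction)."""
    return {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}


def fill_gap(left: str, right: str, reads: list[str], k: int, max_kmers: int = 10_000):
    """Breadth-first search from the last k-mer of `left` to the first k-mer
    of `right`; returns the joined sequence, or None if no path is found."""
    kmers = kmer_set(reads, k)
    start, goal = left[-k:], right[:k]
    parent = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Trace the path back, then spell it out one new base per k-mer.
            path = []
            while current is not None:
                path.append(current)
                current = parent[current]
            path.reverse()
            bridge = "".join(kmer[-1] for kmer in path[1:])
            return left + bridge + right[k:]
        for base in "ACGT":
            nxt = current[1:] + base
            if nxt in kmers and nxt not in parent and len(parent) < max_kmers:
                parent[nxt] = current
                queue.append(nxt)
    return None


genome = "ATGGCGTACCTTAGCAAGT"
reads = [genome[0:10], genome[5:15], genome[9:19]]
print(fill_gap(genome[:8], genome[12:], reads, k=5))  # ATGGCGTACCTTAGCAAGT
```

Knowing the goal k‐mer is what keeps the search bounded: it stops as soon as the next contig is reached, and `max_kmers` caps the work if it never is.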
Gap filling can correct assembly errors
• Contigs often contain errors right at their ends
• By starting the search a little way back from the end of the contig (e.g. 200 bp), these errors can be overcome
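For the search, this amounts to choosing the seed k‐mers further inside each contig, so that the possibly erroneous final bases are re‐derived from the reads instead of being trusted. A hypothetical Python helper illustrating that step. Trimming both contigs (rather than only one) and raising an error for contigs too short to trim are assumptions of this sketch, and the 200 bp default simply mirrors the example on the slide.

```python
def gap_fill_seeds(left: str, right: str, k: int, trim: int = 200):
    """Discard `trim` bases from each contig end facing the gap and return the
    trimmed contigs plus the k-mers to start and finish the search from."""
    if len(left) - trim < k or len(right) - trim < k:
        raise ValueError("contig too short to trim and still leave a k-mer seed")
    left_kept = left[:len(left) - trim]   # drop the suspect 3' end of the left contig
    right_kept = right[trim:]             # drop the suspect 5' end of the right contig
    return left_kept, right_kept, left_kept[-k:], right_kept[:k]


print(gap_fill_seeds("ATGGCGTACG", "CCTTAGCAAG", k=4, trim=3))
# ('ATGGCGT', 'TAGCAAG', 'GCGT', 'TAGC')
```

Whatever path the search finds between the two seeds then replaces the discarded bases, so an error in them neither blocks nor corrupts the join.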
Gap filling can account for strain variation
github.com/wwood/finishm
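When two strains differ inside the gap, the reads support more than one route between the same pair of contig ends, i.e. a bubble. One way to account for this, sketched here in Python, is to enumerate every path up to a length limit rather than stopping at the first one. This is a hypothetical illustration, not the algorithm used in the finishm repository; exhaustive enumeration can grow exponentially with the number of bubbles, hence the `max_kmers` cap.

```python
def kmer_set(reads: list[str], k: int) -> set[str]:
    """Every k-mer seen in the reads (single strand, no error correction)."""
    return {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}


def all_gap_fills(start: str, goal: str, kmers: set[str], max_kmers: int) -> list[str]:
    """Depth-first enumeration of every simple path from start to goal that uses
    at most max_kmers k-mers; each path is returned as a spelled-out sequence."""
    results = []

    def extend(path: list[str]) -> None:
        current = path[-1]
        if current == goal:
            results.append(path[0] + "".join(kmer[-1] for kmer in path[1:]))
            return
        if len(path) >= max_kmers:
            return  # give up on paths that are too long
        for base in "ACGT":
            nxt = current[1:] + base
            if nxt in kmers and nxt not in path:  # no revisits, so no infinite loops
                extend(path + [nxt])

    extend([start])
    return results


strain_a = "AACGTAGCTT"
strain_b = "AACGTCGCTT"  # differs from strain_a at one position
kmers = kmer_set([strain_a, strain_b], k=4)
for variant in all_gap_fills("AACG", "GCTT", kmers, max_kmers=10):
    print(variant)
# AACGTAGCTT
# AACGTCGCTT
```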
Thanks!
• Slideshare.com/benjwoodcroft
• Github.com/wwood
• Ecogenomic.org