Combining de Bruijn graph, overlap graph and microassembly for de novo genome assembly
A. Alexandrov, S. Kazakov, S. Melnikov, A. Sergushichev, P. Fedotov, F. Tsarev, A. Shalyto
Genome Assembly Algorithms Laboratory
St. Petersburg National Research University of Information Technologies, Mechanics and Optics
Kazan, 23 Nov 2012
Algorithm
• Error correction
• Quasicontig assembly (de Bruijn graph)
• Initial contig assembly (overlap graph)
• Contig microassembly (de Bruijn graph)
• Scaffolding
Error correction
• K-mers – substrings of length k.
• K-mers are classified as “trusted” or “untrusted”.
• Each “untrusted” k-mer is replaced with a “trusted” one.
• If the k-mers do not all fit into memory:
  – divide them into buckets;
  – process the buckets independently.
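The slides do not say how a k-mer is judged “trusted”. A common choice, assumed here, is a frequency threshold: a k-mer seen at least `threshold` times across all reads is trusted. The minimal Python sketch below counts k-mers one hash bucket at a time, so only one bucket's counts are held in memory, and then fixes a read with the single-base substitution that makes all of its k-mers trusted. The function names, `threshold`, `n_buckets`, and the one-substitution correction rule are illustrative assumptions, not the authors' method.

```python
import zlib
from collections import Counter


def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def bucket_of(kmer, n_buckets):
    # Deterministic hash, so every pass assigns a k-mer to the same bucket.
    return zlib.crc32(kmer.encode()) % n_buckets


def trusted_kmers(reads, k, threshold, n_buckets=4):
    """Count k-mers one bucket at a time; keep those seen >= threshold times.
    (Frequency threshold is an assumed definition of "trusted".)"""
    trusted = set()
    for b in range(n_buckets):
        counts = Counter()  # only this bucket's k-mers are counted in this pass
        for read in reads:
            for km in kmers(read, k):
                if bucket_of(km, n_buckets) == b:
                    counts[km] += 1
        trusted.update(km for km, c in counts.items() if c >= threshold)
    return trusted


def correct_read(read, k, trusted):
    """Return the read unchanged if all its k-mers are trusted; otherwise try
    every single-base substitution (an assumed, simplified correction rule)."""
    if all(km in trusted for km in kmers(read, k)):
        return read
    fixes = []
    for i, old in enumerate(read):
        for base in "ACGT":
            if base == old:
                continue
            cand = read[:i] + base + read[i + 1:]
            if all(km in trusted for km in kmers(cand, k)):
                fixes.append(cand)
    # Accept a correction only if it is unambiguous; otherwise keep the read.
    return fixes[0] if len(fixes) == 1 else read
```

Re-reading all reads once per bucket trades extra passes for lower memory; an implementation could instead write k-mers to per-bucket files in a single pass and then count each file separately.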
Quasicontig assembly
[Figure: a read pair, ATGCATGCAGTG and GTCCATGC, with the unknown sequence between the mates shown as ???]
De Bruijn graph
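De Bruijn graph for a set of strings S:
• V = the set of all k-mers occurring in the strings of S
• E = the pairs of k-mers (u, v) that occur consecutively in some string of S, so that the last k − 1 characters of u equal the first k − 1 characters of v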
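A minimal Python sketch of this construction, assuming that vertices are k-mers and that an edge joins two k-mers occurring next to each other in some input string. Reverse complements, which a real assembler would also have to handle, are ignored; `de_bruijn_graph` is a hypothetical helper name.

```python
from collections import defaultdict


def de_bruijn_graph(strings, k):
    """Vertices: k-mers of the strings. Edges: k-mers adjacent in some string
    (assumed edge rule; reverse complements are not added)."""
    vertices = set()
    edges = defaultdict(set)
    for s in strings:
        prev = None
        for i in range(len(s) - k + 1):
            km = s[i:i + k]
            vertices.add(km)
            if prev is not None:
                # prev[1:] == km[:-1] holds by construction.
                edges[prev].add(km)
            prev = km
    return vertices, edges
```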
De Bruijn graph example (1)
De Bruijn graph example (2)
[Figure: de Bruijn graph for k = 3 with vertices AGT, GTG, GTC, TCA, CAT, ATC, TCC, CCA, CAA, AAC, ACA, CAC, CAG, AGG, GGA, GAG]
Quasicontig assembly
• Build the de Bruijn graph.
• For each read pair (r1, r2), find a path from the first k-mer of r1 to the last k-mer of r2.
• The path length must be consistent with the fragment size of the library.
• The path must be unique.
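The path search can be sketched as a bounded depth-first search. This is an illustration under assumptions, not the authors' implementation: `edges` is an adjacency dict from k-mer to successor k-mers, r2 is assumed to be already reverse-complemented onto r1's strand, and `min_len`/`max_len` are a hypothetical window around the fragment size, measured as the length of the sequence the path spells. The search stops as soon as a second qualifying path appears, since only a unique path is accepted.

```python
def unique_path(edges, start, end, k, min_len, max_len):
    """Return the only path start -> end whose spelled length lies in
    [min_len, max_len], or None if there are zero or several such paths."""
    found = []

    def dfs(node, path):
        if len(found) > 1:
            return  # already ambiguous: stop exploring
        seq_len = k + len(path) - 1  # length of the sequence the path spells
        if seq_len > max_len:
            return
        if node == end and seq_len >= min_len:
            found.append(list(path))
        for nxt in edges.get(node, ()):
            path.append(nxt)
            dfs(nxt, path)
            path.pop()

    dfs(start, [start])
    return found[0] if len(found) == 1 else None
```

Because paths may revisit vertices, the `max_len` bound is what guarantees termination; the search is exponential in the worst case, so a practical version would also cap the number of explored nodes.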
De Bruijn graph example (3)
De Bruijn graph example (4)
Unique paths correspond to quasicontigs
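Turning a unique path into its quasicontig means spelling it out: consecutive k-mers overlap by k − 1 characters, so each k-mer after the first contributes one new character. A short Python sketch (`spell_path` is a hypothetical name):

```python
def spell_path(path):
    """Concatenate a path of overlapping k-mers into the sequence it represents."""
    if not path:
        return ""
    for a, b in zip(path, path[1:]):
        if a[1:] != b[:-1]:
            raise ValueError(f"{a} and {b} do not overlap by k - 1 characters")
    # First k-mer in full, then the last character of every following k-mer.
    return path[0] + "".join(km[-1] for km in path[1:])
```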
Initial contig assembly
• Overlap
  – Suffix array
  – Inexact overlaps
• Layout
  – Overlap graph
• Consensus
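The overlap step can be illustrated with a suffix array over all reads joined by a separator `$`. A suffix of read a equals a prefix p of read b exactly when p followed by `$` occurs in the joined text, so each prefix becomes one binary search over the suffix array. This Python sketch finds exact suffix-prefix overlaps only; the inexact overlaps mentioned on the slide, as well as layout and consensus, are not covered. The naive `sorted` construction and all function names are illustrative assumptions.

```python
from bisect import bisect_right


def build_suffix_array(text):
    # Naive O(n^2 log n) construction; adequate for a toy example only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_range(text, sa, pattern):
    """Return [lo, hi) of suffix-array entries whose suffix starts with pattern."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first entry whose m-prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first entry whose m-prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


def find_overlaps(reads, min_overlap):
    """Exact overlaps (a, b, length): the suffix of reads[a] equals the prefix of reads[b]."""
    text = "$".join(reads) + "$"
    sa = build_suffix_array(text)
    starts, pos = [], 0
    for r in reads:
        starts.append(pos)
        pos += len(r) + 1
    overlaps = []
    for b, read in enumerate(reads):
        for length in range(min_overlap, len(read)):
            # A read suffix equal to read[:length] is immediately followed by '$'.
            lo, hi = find_range(text, sa, read[:length] + "$")
            for i in range(lo, hi):
                a = bisect_right(starts, sa[i]) - 1  # read containing this suffix
                if a != b:
                    overlaps.append((a, b, length))
    return overlaps
```

The returned (a, b, length) triples are the edges of an overlap graph that a layout step could then traverse.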
Contig microassembly
• Some read pairs have mates that map to different contigs.
• Some read pairs have one mate mapped to a contig and the other mate falling in the gap between contigs.
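The two kinds of read pairs described above can be separated once each read has been mapped. In this Python sketch, `contig_of` is a hypothetical dict from read id to the contig the read maps to (absent if unmapped), such as could be derived from aligner output. An unmapped mate is only a candidate for the gap: it may also be an erroneous or repetitive read.

```python
def classify_pairs(pairs, contig_of):
    """Split read pairs into linking pairs (mates on different contigs) and
    anchored pairs (exactly one mate mapped). contig_of is a hypothetical
    read-id -> contig mapping; unmapped reads are simply absent."""
    linking, anchored = [], []
    for m1, m2 in pairs:
        c1, c2 = contig_of.get(m1), contig_of.get(m2)
        if c1 is not None and c2 is not None and c1 != c2:
            linking.append((m1, m2, c1, c2))
        elif (c1 is None) != (c2 is None):
            # One mate is placed; the other may lie in a gap.
            anchored.append((m1, m2, c1 if c1 is not None else c2))
    return linking, anchored
```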
Contig microassembly algorithm
• Use Bowtie to find the positions of reads in the contigs.
• Find all pairs of contigs connected by many read pairs.
• Build a de Bruijn graph from the reads that map to at least one of the chosen contigs.
• Use the quasicontig assembly algorithm to fill the gap between the two contigs.
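The contig-pair selection and read gathering can be sketched as follows. Everything here is an assumption for illustration: `min_links` stands in for the unspecified “many”, `contig_of` is a hypothetical read-to-contig dict rather than actual Bowtie output, and both mates of a pair are kept when either one lands on a chosen contig, so that unmapped mates lying in the gap reach the de Bruijn graph.

```python
from collections import defaultdict


def select_gap_reads(pairs, contig_of, min_links):
    """For each contig pair joined by >= min_links read pairs (min_links is a
    hypothetical threshold), collect both mates of every pair with at least
    one mate on either contig."""
    links = defaultdict(int)
    for m1, m2 in pairs:
        c1, c2 = contig_of.get(m1), contig_of.get(m2)
        if c1 is not None and c2 is not None and c1 != c2:
            links[tuple(sorted((c1, c2)))] += 1
    selected = {}
    for (a, b), n in links.items():
        if n < min_links:
            continue
        reads = set()
        for m1, m2 in pairs:
            if contig_of.get(m1) in (a, b) or contig_of.get(m2) in (a, b):
                reads.update((m1, m2))  # keep both mates, mapped or not
        selected[(a, b)] = reads
    return selected
```

The inner loop rescans every pair for each selected contig pair; indexing pairs by contig would avoid that on real data.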
Results
• E. coli genome – about 4.5 million nucleotides.
• SRR001665 library: fragment size 200 bp, read length 36 bp, coverage 160×.
• Before microassembly – 525 contigs, N50 = 17,804 bp.
• After microassembly – 247 contigs, N50 = 53,720 bp.
• ABySS – 632 contigs, N50 = 64,280 bp.
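The slides report N50 without defining it. Under the usual definition, assumed here, N50 is the largest length L such that contigs of length at least L together cover at least half of the total assembly length. A Python sketch, exercised on toy lengths rather than the E. coli data:

```python
def n50(lengths):
    """Largest L such that contigs of length >= L hold at least half of the
    total assembly length (common definition, assumed here)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly
```

For example, `n50([2, 3, 4, 5, 6])` returns 5, because the contigs of lengths 6 and 5 already hold 11 of the 20 bases. Measuring against the genome size instead of the assembly size gives the related NG50 statistic.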
Web service
• http://genome.ifmo.ru/cloud
Acknowledgements
• K. Skryabin and E. Prokhorchuk of the “Bioengineering” Center, for introducing us to bioinformatics.
• D. Alexeev of NRI PCM, for the invitation to this conference.