Bng presentation draft



  • Date posted: 05-Jul-2015
  • Category: Education



Description: Algorithms and filters used to improve the Tribolium draft assembly with physical maps based on imaging ultra-long single DNA molecules.

Transcript of Bng presentation draft

Slide 1
Algorithms and filters used to improve the Tribolium draft assembly with physical maps based on imaging ultra-long single DNA molecules
Jennifer Shelton, 2014

Slide 2
Assembly Pipeline
3) Use the sequence reference to adjust molecule stretch for each scan.

Slide 3
Assembly Pipeline
In recent datasets, when SNR is low and alignment is good, we see a spike in bases per pixel (bpp) in the first scan, then a plateau, and then a lower plateau.
[Figure label: First scan in a flow cell]

Slide 4
Assembly Pipeline
5) Use the sequence reference to determine assembly noise parameters. The estimated genome size is used to set the p-value threshold.

Slide 5
Assembly Pipeline
6/7) Variants of the starting p-value and the default minimum molecule length are explored in nine assemblies.

Slide 6
Current Tribolium sequence-based assembly

Input file                                      N50 (Mb)   Number of contigs   Cumulative length (Mb)
Genome FASTA                                    1.16       2240                160.74
In silico CMAP from FASTA                       1.20       223                 152.53

The 223 scaffolds from the sequence-based assembly that were longer than 20 kb and had more than 5 labels were converted into in silico CMAPs.

Slide 7
Assembly Results

Input file                                      N50 (Mb)   Number of contigs   Cumulative length (Mb)
Genome FASTA                                    1.16       2240                160.74
In silico CMAP from FASTA                       1.20       223                 152.53
CMAP from assembled BNG molecules (BNG CMAP)    1.35       216                 200.47

The assembled BNG molecules had a higher N50 and a longer cumulative length than the sequence assembly. The estimated size of the Tribolium genome is ~200 Mb.

Slide 8
Simplest XMAP alignment description
[Figure: in silico CMAPs from the genome FASTA (in silico CMAP 1, 1 Mb; in silico CMAP 2, 1.1 Mb) aligned to CMAPs from assembled molecules (BNG CMAP 1, 1.1 Mb; BNG CMAP 2, 1.3 Mb)]
Breadth of alignment coverage for in silico CMAP: 2.1 Mb
Total alignment length for in silico CMAP: 2.1 Mb
Breadth of alignment coverage for BNG CMAP: 2.4 Mb
Total alignment length for BNG CMAP: 2.4 Mb

Slide 9
Complex XMAP alignment description
[Figure: in silico CMAP 1 (1 Mb) from the genome FASTA aligned to two CMAPs from assembled molecules, BNG CMAP 1 (1.1 Mb) and BNG CMAP 2 (1.3 Mb)]
Breadth of alignment coverage for in silico CMAP: 1 Mb
Total alignment length for in silico CMAP: 2 Mb
Breadth of alignment coverage for BNG CMAP: 2.4 Mb
Total alignment length for BNG CMAP: 2.4 Mb
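The two summary steps behind slides 6 and 7 (keeping only scaffolds longer than 20 kb with more than 5 labels as in silico CMAPs, and reporting N50) can be sketched in Python. This is a minimal sketch under stated assumptions, not the pipeline's own code: the slides do not name the nicking enzyme, so the `GCTCTTC` motif (the Nt.BspQI site) is an assumption, and `label_positions`, `passes_cmap_filter`, and `n50` are hypothetical names.

```python
import re

# Assumption: labels come from the Nt.BspQI nicking site; the slides do not name the enzyme.
MOTIF = "GCTCTTC"
MOTIF_REVCOMP = "GAAGAGC"  # the same site read on the opposite strand


def label_positions(seq: str) -> list[int]:
    """Return sorted 0-based positions where the motif occurs on either strand."""
    seq = seq.upper()
    hits: set[int] = set()
    for motif in (MOTIF, MOTIF_REVCOMP):
        # A zero-width lookahead also reports overlapping matches.
        for match in re.finditer(f"(?={motif})", seq):
            hits.add(match.start())
    return sorted(hits)


def passes_cmap_filter(seq: str, longer_than_bp: int = 20_000, more_than_labels: int = 5) -> bool:
    """Hypothetical filter mirroring slide 6: longer than 20 kb and more than 5 labels."""
    return len(seq) > longer_than_bp and len(label_positions(seq)) > more_than_labels


def n50(lengths: list[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


unit = "GCTCTTC" + "A" * 100
print(passes_cmap_filter(unit * 6 + "A" * 20_000))  # True: 20,642 bp with 6 labels
print(n50([2, 3, 4, 5, 6]))  # 5
```

Counting both strands matters because a nicking site can be read on either strand of a scaffold; the thresholds are exposed as parameters so the 20 kb and 5-label cut-offs can be varied.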
Slide 10
Alignment of CMAPs
[Figure: in silico CMAP 1 (1 Mb) from the genome FASTA aligned to BNG CMAP 1 (1.1 Mb) and BNG CMAP 2 (1.3 Mb) from assembled molecules]
Comparing the breadth of alignment coverage with the total aligned length can reveal relevant relationships between assemblies. In this example, the difference between the "breadth" and "total" lengths could be due to:
  • duplications in the sample the molecules were extracted from
  • assembly of alternate haplotypes
  • mis-assembly creating redundant contigs
  • a collapsed repeat in the sequence assembly

Slide 11
Alignment of the BNG assembly to the reference genome

CMAP name                                       Breadth of coverage (Mb)   Total alignment length (Mb)   Percent of CMAP aligned
In silico CMAP from FASTA                       124.04                     132.40                        81
CMAP from assembled BNG molecules (BNG CMAP)    131.64                     132.34                        67

Close to 4% of the alignment of the in silico CMAP appears to be redundant. Overall, 81% of the in silico CMAP aligns to the BNG consensus maps.

Slide 12
ChLG 9 super-scaffold: alignment of the BNG assembly to the reference genome
[Figure: BNG consensus maps aligned to ChLG 9 scaffolds 130, 131, 133, 134, 132, 129, 135, 127, 136, 137]
Typically, where redundant alignments occur, two BNG consensus maps aligned to the same region, suggesting that they represent haplotypes, although this has not been verified.

Slide 13
Tribolium super-scaffolds overlapping BNG CMAPs
[Figure: BNG consensus maps aligned to the ChLG 9 super-scaffold, scaffolds 128, 130, 131, 133, 134, 132]

Slide 14
Alignment of the BNG assembly to the reference genome was used to super-scaffold the Tribolium scaffolds
[Figure: in silico CMAPs (+ in silico CMAP 1, + in silico CMAP 4, + in silico CMAP 2, - in silico CMAP 3) aligned as the reference to BNG CMAP 1 and BNG CMAP 2]
Stitch.pl estimates super-scaffolds from alignments between the scaffolds and the assembled BNG molecules, made with BNG RefAligner.
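The distinction drawn on slides 8 through 11 between breadth of alignment coverage and total alignment length can be sketched as follows: the total sums every aligned interval, while the breadth first merges overlapping intervals on the same map, so the difference between the two is the redundantly aligned length. The function names and the `(map_id, start, end)` input layout are hypothetical; the example values reproduce the slide 9 case, assuming each BNG CMAP covers the 1 Mb in silico CMAP end to end.

```python
from collections import defaultdict


def merge_intervals(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping or touching [start, end) intervals on a single map."""
    merged: list[list[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def breadth_and_total(alignments: list[tuple[str, int, int]]) -> tuple[int, int]:
    """Return (breadth, total) for aligned intervals given as (map_id, start, end).

    total   = sum of every aligned interval, counting overlaps repeatedly
    breadth = sum over maps of the union of that map's aligned intervals
    """
    per_map: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for map_id, start, end in alignments:
        per_map[map_id].append((start, end))
    total = sum(end - start for ivs in per_map.values() for start, end in ivs)
    breadth = sum(
        end - start for ivs in per_map.values() for start, end in merge_intervals(ivs)
    )
    return breadth, total


# Slide 9: in silico CMAP 1 (1 Mb) is covered end to end by BNG CMAP 1 and BNG CMAP 2.
in_silico_side = [("in_silico_1", 0, 1_000_000), ("in_silico_1", 0, 1_000_000)]
bng_side = [("bng_1", 0, 1_100_000), ("bng_2", 0, 1_300_000)]
print(breadth_and_total(in_silico_side))  # (1000000, 2000000)
print(breadth_and_total(bng_side))        # (2400000, 2400000)
```

Merging per map, rather than across maps, keeps two different CMAPs from being treated as one coordinate space.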
Slide 15
[Figure: the alignment made with the in silico CMAPs as the reference, and the same alignment inverted; labels: + in silico CMAP 1, + in silico CMAP 4, + in silico CMAP 2, - in silico CMAP 3, BNG CMAP 1, BNG CMAP 2]
With the in silico CMAP aligned as the reference, the alignment is inverted and used as input for stitch.

Slide 16
[Figure: as above, with a further copy of the inverted alignment]
Alignments are filtered based on alignment length relative to the total possible alignment length, and on confidence.

Slide 17
[Figure labels for this step: BNG CMAP 1, + in silico CMAP 1]
Stitch.pl checks alignment length against potential alignment length to find relevant global, rather than local, alignments. This alignment passes because the alignment length is greater than 30% of the potential alignment length.
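Slide 15 notes that the alignment made with the in silico CMAP as the reference is inverted before it is passed to stitch. A minimal sketch of that inversion follows. It assumes an XMAP-like convention in which reference coordinates always increase and the query start is greater than the query end for a `-` orientation; the `Alignment` class and `invert` function are hypothetical names, not part of Stitch.pl or RefAligner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """One alignment record (hypothetical layout loosely modeled on XMAP columns)."""
    qry_id: str
    ref_id: str
    qry_start: int  # for '-' orientation, qry_start > qry_end (assumed convention)
    qry_end: int
    ref_start: int  # reference coordinates always increase: ref_start < ref_end
    ref_end: int
    orientation: str  # '+' or '-'
    confidence: float


def invert(aln: Alignment) -> Alignment:
    """Swap query and reference so the former query becomes the reference."""
    if aln.orientation == "+":
        # Same direction on both maps: the old reference span becomes the query span.
        new_qry_start, new_qry_end = aln.ref_start, aln.ref_end
    else:
        # Opposite directions: the low end of the new reference (old qry_end)
        # pairs with the old ref_end, so the new query span runs backwards.
        new_qry_start, new_qry_end = aln.ref_end, aln.ref_start
    return Alignment(
        qry_id=aln.ref_id,
        ref_id=aln.qry_id,
        qry_start=new_qry_start,
        qry_end=new_qry_end,
        ref_start=min(aln.qry_start, aln.qry_end),
        ref_end=max(aln.qry_start, aln.qry_end),
        orientation=aln.orientation,  # relative orientation is symmetric
        confidence=aln.confidence,
    )


# BNG CMAP 2 aligned as the query against in silico CMAP 3 as the reference, reversed.
aln = Alignment("bng_2", "in_silico_3", 2_800_000, 2_000_000, 100_000, 900_000, "-", 25.0)
print(invert(aln))
```

Inverting twice returns the original record, which is a quick sanity check that the coordinate pairing is consistent for both orientations.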
Slide 18
[Figure labels for this step: BNG CMAP 1, + in silico CMAP 2]
The alignment passes because the alignment length is greater than 30% of the potential alignment length.

Slide 19
[Figure labels for this step: - in silico CMAP 2, BNG CMAP 2]
The alignment passes because the alignment length is greater than 30% of the potential alignment length.

Slide 20
[Figure labels for this step: - in silico CMAP 2, BNG CMAP 2]
The alignment fails because the alignment length is less than 30% of the potential alignment length.

Slide 21
[Figure labels for this step: + in silico CMAP 2, BNG CMAP 2]
The alignment fails because the alignment length is less than 30% of the potential alignment length.
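Slides 17 through 24 apply one rule: an alignment passes only when its length is greater than 30% of the potential alignment length, which separates global alignments from local ones. The slides do not define "potential alignment length", so the sketch below assumes one plausible definition: the aligned reference span plus however far the alignment could extend on each side before either map runs out. The function names and coordinate conventions (bp, reference coordinates increasing, query reversed for `-`) are assumptions, not Stitch.pl's actual implementation.

```python
def potential_alignment_length(
    qry_start: int, qry_end: int, ref_start: int, ref_end: int,
    qry_len: int, ref_len: int, orientation: str,
) -> int:
    """Aligned reference span plus the largest possible extension on each side.

    Assumed definition: the alignment could grow left or right until either map ends.
    Reference coordinates increase; for '-' the query runs backwards (qry_start > qry_end).
    """
    aligned = ref_end - ref_start
    if orientation == "+":
        left = min(qry_start, ref_start)
        right = min(qry_len - qry_end, ref_len - ref_end)
    else:
        left = min(qry_len - qry_start, ref_start)
        right = min(qry_end, ref_len - ref_end)
    return aligned + left + right


def passes_global_filter(
    qry_start: int, qry_end: int, ref_start: int, ref_end: int,
    qry_len: int, ref_len: int, orientation: str, min_fraction: float = 0.30,
) -> bool:
    """True when the aligned length is greater than min_fraction of the potential length."""
    potential = potential_alignment_length(
        qry_start, qry_end, ref_start, ref_end, qry_len, ref_len, orientation
    )
    return (ref_end - ref_start) > min_fraction * potential


# A 200 kb alignment inside a 1 Mb query that could have extended much further.
print(passes_global_filter(100_000, 300_000, 500_000, 700_000, 1_000_000, 2_000_000, "+"))  # False
# The whole 1 Mb query aligned end to end.
print(passes_global_filter(0, 1_000_000, 500_000, 1_500_000, 1_000_000, 2_000_000, "+"))  # True
```

Measuring against the potential length rather than the full map length means a short map that aligns completely still passes, while a short local match inside two long maps does not.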
Slide 22
[Figure labels for this step: BNG CMAP 2, - in silico CMAP 3]
The alignment passes because the alignment length is greater than 30% of the potential alignment length.

Slide 23
[Figure labels for this step: BNG CMAP 2, - in silico CMAP 3]
The alignment fails because the alignment length is less than 30% of the potential alignment length.

Slide 24
[Figure labels for this step: BNG CMAP 2, + in silico CMAP 4]
The alignment passes because the alignment length is greater than 30% of the potential alignment length.

Slide 25
[Figure labels: BNG CMAP 1, BNG CMAP 2, + in silico CMAP 1, + in silico CMAP 2, - in silico CMAP 3, + in silico CMAP 4]
High-quality scaffolding alignments...

Slide 26
[Figure: the same alignments, shown before and after filtering]
...are filtered for the longest and highest-confidence alignment for each in silico CMAP.
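Slides 25 and 26 describe keeping, for each in silico CMAP, only its longest and highest-confidence alignment. A minimal sketch follows, assuming alignments are ranked first by aligned length and then by confidence as a tie-breaker; the slides do not say how the two criteria are combined, and the `Hit` type and `best_per_in_silico` function are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """A passing alignment between one in silico CMAP and one BNG CMAP (hypothetical)."""
    in_silico_id: str
    bng_id: str
    aligned_length: int
    confidence: float


def best_per_in_silico(hits: list[Hit]) -> dict[str, Hit]:
    """Keep the top-ranked hit per in silico CMAP: longest first, then most confident."""
    best: dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.in_silico_id)
        rank = (hit.aligned_length, hit.confidence)
        if current is None or rank > (current.aligned_length, current.confidence):
            best[hit.in_silico_id] = hit
    return best


hits = [
    Hit("in_silico_2", "bng_1", 500_000, 20.0),
    Hit("in_silico_2", "bng_2", 300_000, 40.0),  # more confident but shorter
    Hit("in_silico_3", "bng_2", 400_000, 15.0),
    Hit("in_silico_3", "bng_2", 400_000, 30.0),  # same length, higher confidence wins
]
for in_silico_id, hit in sorted(best_per_in_silico(hits).items()):
    print(in_silico_id, hit.bng_id, hit.confidence)
# in_silico_2 bng_1 20.0
# in_silico_3 bng_2 30.0
```

Ranking by a `(length, confidence)` tuple is one simple way to combine the two criteria; a weighted score would be an equally plausible reading of the slide.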
Slide 27
[Figure labels: BNG CMAP 1, BNG CMAP 2, + in silico CMAP 1, + in silico CMAP 2, - in silico CMAP 3, + in silico CMAP 4]
Passing alignments are used to super-scaffold.

Slide 28
Stitch is iterated and additional super-scaffolding alignments are found.
[Figure labels: BNG CMAP 1, BNG CMAP 2, + in silico CMAP 2, - in silico CMAP 3, + in silico CMAP 1, + in silico CMAP 4]
Iteration takes advantage of alignments where sequence-based scaffoldss