MCB3895-004 Lecture #15 Oct 23/14 De novo assemblies using PacBio.

PacBio

• Long read sequencing technology

• High error rate (~13%) threw people at first
  • What would this be good for?

• Scaffolding was an early focus

• Also: correcting PacBio reads using Illumina data (now obsolete)

HGAP

• "Hierarchical Genome Assembly Process"

1. Preassembly - corrects the longest reads by mapping shorter reads to them, then quality trims

2. Assembly - OLC (overlap-layout-consensus) approach

3. Polishing - Quiver software derives a consensus from mapped reads and uses it to correct the assembly
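
The idea of deriving a consensus from mapped reads can be illustrated with a toy column-wise majority vote. This is only a sketch of the general concept: it assumes the reads are already aligned and padded to equal length, and it is not how Quiver itself works (Quiver uses a statistical model of the raw signal). The function name and the example reads are made up for illustration.

```python
from collections import Counter


def majority_consensus(aligned_reads):
    """Column-wise majority vote over pre-aligned, equal-length reads.

    '-' marks a gap in the alignment; '.' marks a position the read
    does not cover. Both conventions are assumptions of this sketch.
    """
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be pre-aligned to the same length")

    consensus = []
    for column in zip(*aligned_reads):
        # Skip reads that do not cover this column.
        bases = [b for b in column if b != "."]
        if not bases:
            consensus.append("N")  # no evidence at this position
            continue
        base, _count = Counter(bases).most_common(1)[0]
        if base != "-":  # a majority gap means the column is dropped
            consensus.append(base)
    return "".join(consensus)


# Hypothetical reads: each carries at most one random error.
reads = [
    "ACGTTAGC",
    "ACGATAGC",
    "ACGTTAGC",
    "ACGTTCGC",
]
print(majority_consensus(reads))  # prints ACGTTAGC
```

Because the errors fall at different positions in different reads, the vote recovers the underlying sequence, which is why a high per-read error rate is tolerable at sufficient coverage.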

Results

• My test gave an impressive 1 contig!
  • High ~60X coverage, tame dataset

• Known problem: still some SNP errors
  • Can run Quiver again
    1. Import assembly as a reference sequence
    2. Perform reference mapping using same reads vs. new reference
    3. Will output a new consensus fasta file incorporating the variants it finds
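
For reference, mean coverage is the total number of sequenced bases divided by the genome size. The sketch below shows the arithmetic; the read count, read length, and the 4.6 Mb genome size are illustrative assumptions, not the numbers from the test dataset.

```python
def mean_coverage(read_lengths, genome_size):
    """Mean depth of coverage: total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


# Hypothetical run: 30,000 reads of 9,200 bp each against a
# roughly 4.6 Mb genome (a size typical of E. coli).
read_lengths = [9_200] * 30_000
genome_size = 4_600_000

coverage = mean_coverage(read_lengths, genome_size)
print(f"{coverage:.1f}X")  # prints 60.0X
```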

PacBio chemistries

• PacBio has continually updated both its polymerases and detection chemistry

• Current test data uses P4-C2 chemistry

• P5-C3 gave slightly better length, maybe a bit more error

• Fastq available for this E. coli: SRR1284073

• Brand new: P6-C4

P6-C4

• As per last week

• 10-15kb read N50

• Slightly better accuracy?

• http://blog.pacificbiosciences.com/2014/10/new-chemistry-boosts-average-read.html
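
As a reminder of what "read N50" means: it is the length L such that reads of length L or longer contain at least half of all sequenced bases. A minimal sketch of the calculation, with made-up read lengths:

```python
def n50(lengths):
    """Return the N50 of a collection of read (or contig) lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest read down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical read lengths (in kb for readability).
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # prints 8
print(n50([10, 20, 30, 40]))          # prints 30
```

N50 is weighted toward long reads, so it is usually larger than the mean read length for the same run.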

Other options: hybrid assembly

• It is possible to combine multiple data types

• Goal: cover the respective strengths of each
  • (of course, could confound too!)

• SPAdes is one of the most flexible assemblers in this regard

  • Must have some Illumina
  • Will accept corrected, uncorrected PacBio (and many more, including Oxford Nanopore)
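
A hybrid SPAdes run might be assembled along the following lines. This is a sketch, not the exact command used in class: the helper function and all file and directory names are hypothetical, and only the `spades.py` options `-1`, `-2`, `--pacbio`, `--nanopore`, and `-o` are used. The script only builds and prints the command; it does not run SPAdes.

```python
import shlex


def build_spades_command(illumina_r1, illumina_r2, out_dir,
                         pacbio=None, nanopore=None):
    """Build a spades.py argument list for a hybrid assembly.

    Paired-end Illumina reads are required; long reads are optional.
    """
    cmd = ["spades.py", "-1", illumina_r1, "-2", illumina_r2]
    if pacbio:
        cmd += ["--pacbio", pacbio]      # PacBio long reads
    if nanopore:
        cmd += ["--nanopore", nanopore]  # Oxford Nanopore long reads
    cmd += ["-o", out_dir]
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_spades_command(
    "illumina_R1.fastq",
    "illumina_R2.fastq",
    "hybrid_out",
    pacbio="pacbio_P4C2.fastq",
)
print(shlex.join(cmd))
```

To actually launch the assembly, the list could be passed to `subprocess.run(cmd, check=True)` on a machine where SPAdes is installed.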

Assignment #7

• Create 2 E. coli assemblies using PacBio data
  • Use P4-C2 alone and HGAP
  • Use Illumina + P5-C3 uncorrected
  • Use Illumina + P4-C2 uncorrected
  • Use Illumina + P4-C2 corrected
  • Multiple Quiver steps to correct
  • Some other option!

• Hand in:
  • 2 genome assemblies
  • Lab notebook file detailing exact commands