![Page 1: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/1.jpg)
De novo genome assembly
Dr Torsten Seemann
IMB Winter School - Brisbane – Mon 1 July 2013
![Page 2: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/2.jpg)
Introduction
![Page 3: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/3.jpg)
Ideal world
I would not need to give this talk!
AGTCTAGGATTCGCTATAGATTCAGGCTCTGATATATTTCGCGGGATTAGCTAGATCGCTATGCTATGATCTAGATCTCGAGATTCGTATAAGTCTAGGATTCGCTATAGATTCAGGCTCTGATATATTTCGCGGGATTAGCTA
Human DNA → Non-existent USB3 device → 46 complete haplotype chromosome sequences
![Page 4: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/4.jpg)
Real world
• Can’t sequence full-length native DNA – no instrument exists (yet)
• But we can sequence short fragments
– 100 at a time (Sanger)
– 100,000 at a time (Roche 454)
– 1,000,000 at a time (PGM)
– 10,000,000 at a time (Proton, MiSeq)
– 100,000,000 at a time (HiSeq)
![Page 5: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/5.jpg)
Make a DNA library
• DNA preparation – depends on sequencing platform being used
• Typical steps
– Shearing: chop DNA into smaller fragments
– Size selection: choose the size range you need
– Adaptor ligation: add special sequence to ends
• Now ready to sequence!
![Page 6: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/6.jpg)
Instruments

| Platform | Method | Read length (bp) | Yield | Quality | Value |
|---|---|---|---|---|---|
| Illumina | synthesis + fluorescence | 250 | ++++ | +++++ | ++++ |
| SOLiD | ligation + fluorescence | 75 | ++++ | +++ | +++ |
| PGM | non-term NTP + pH wells | 300 | ++ | +++ | +++ |
| Proton | non-term NTP + pH wells | 400 | +++ | ++ | +++ |
| Roche 454 | non-term NTP + luminescence | 600 | + | +++ | ++ |
| PacBio | synthesis + ZMW | 12000 | ++ | + | ++ |
![Page 7: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/7.jpg)
Which sequencing platform?
• Long reads
• Low cost
• High yield
• High quality
Pick any 3
![Page 8: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/8.jpg)
De novo assembly
The process of reconstructing the original DNA sequence from the fragment reads alone.
• Instinctively like a jigsaw puzzle
– Find reads which “fit together” (overlap)
– Could be missing pieces (sequencing bias)
– Some pieces will be dirty (sequencing errors)
![Page 9: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/9.jpg)
An example
![Page 10: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/10.jpg)
A small “genome”
Friends, Romans, countrymen, lend me your ears;
I’ll return them
tomorrow!
![Page 11: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/11.jpg)
Shakespearomics
• Reads
– ds, Romans, count
– ns, countrymen, le
– Friends, Rom
– send me your ears;
– crymen, lend me
Oops! I dropped
them.
![Page 12: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/12.jpg)
Shakespearomics
• Reads
– ds, Romans, count
– ns, countrymen, le
– Friends, Rom
– send me your ears;
– crymen, lend me
• Overlaps
Friends, Rom
     ds, Romans, count
             ns, countrymen, le
                     crymen, lend me
                             send me your ears;
I’m good with words.
![Page 13: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/13.jpg)
Shakespearomics
• Reads
– ds, Romans, count
– ns, countrymen, le
– Friends, Rom
– send me your ears;
– crymen, lend me
• Overlaps
Friends, Rom
     ds, Romans, count
             ns, countrymen, le
                     crymen, lend me
                             send me your ears;
• Majority consensus
Friends, Romans, countrymen, lend me your ears;
We have a consensus!
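The majority vote on the slide can be sketched in a few lines of Python. This is only an illustration: the start offsets in `PLACED_READS` are assumptions worked out by hand from the layout (a real assembler would derive them from the overlap alignments), and `majority_consensus` is a hypothetical helper name.

```python
from collections import Counter

# Reads from the slide, each paired with an assumed start offset
# (hypothetical: normally produced by the overlap/layout step).
PLACED_READS = [
    (0, "Friends, Rom"),
    (5, "ds, Romans, count"),
    (13, "ns, countrymen, le"),
    (21, "crymen, lend me"),      # first letter is a "sequencing error"
    (29, "send me your ears;"),   # first letter is a "sequencing error"
]


def majority_consensus(placed_reads):
    """Return the most common character in each column of the layout."""
    length = max(offset + len(read) for offset, read in placed_reads)
    columns = [Counter() for _ in range(length)]
    for offset, read in placed_reads:
        for i, char in enumerate(read):
            columns[offset + i][char] += 1
    # Columns with no coverage are reported as "?" rather than guessed.
    return "".join(c.most_common(1)[0][0] if c else "?" for c in columns)


consensus = majority_consensus(PLACED_READS)
print(consensus)
```

The two erroneous letters are outvoted because each of those columns is covered by two other reads that agree with each other.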
![Page 14: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/14.jpg)
So far, so good.
![Page 15: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/15.jpg)
The awful truth
“Genome assembly is impossible.”
A/Prof. Mihai Pop
World leader in de novo assembly research.
He wears glasses so he must be smart
![Page 16: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/16.jpg)
Methods
![Page 17: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/17.jpg)
Approaches
• greedy assembly
• overlap :: layout :: consensus
• de Bruijn graphs
• string graphs
• seed and extend
… all essentially doing the same thing, but taking different short cuts.
![Page 18: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/18.jpg)
Assembly recipe
• Find all overlaps between reads – hmm, sounds like a lot of work…
• Build a graph – a picture of read connections
• Simplify the graph – sequencing errors will mess it up a lot
• Traverse the graph – trace a sensible path to produce a consensus
![Page 19: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/19.jpg)
![Page 20: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/20.jpg)
Find read overlaps
• If we have N reads of length L
– we have to do ½N(N-1) ~ O(N²) comparisons
– each comparison is an ~ O(L²) alignment
– use special tricks/heuristics to reduce these!
• What counts as “overlapping”?
– minimum overlap length eg. 20bp
– minimum %identity across overlap eg. 95%
– choice depends on L and expected error rate
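A brute-force version of this step might look like the sketch below. It is not how any particular assembler does it: `suffix_prefix_overlap` and `all_overlaps` are hypothetical names, the comparison is ungapped (mismatches only, no indels) rather than a full O(L²) alignment, and the defaults simply reuse the 20bp and 95% figures from the slide.

```python
from itertools import combinations


def suffix_prefix_overlap(a, b, min_len=20, min_identity=0.95):
    """Length of the longest suffix of `a` that matches a prefix of `b`.

    Mismatches are tolerated while identity stays >= min_identity.
    Returns 0 when no overlap of at least min_len qualifies.
    """
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        matches = sum(x == y for x, y in zip(a[-length:], b[:length]))
        if matches / length >= min_identity:
            return length
    return 0


def all_overlaps(reads, **kwargs):
    """All-vs-all search: N(N-1)/2 unordered pairs, each tried in both directions."""
    found = {}
    for i, j in combinations(range(len(reads)), 2):
        for a, b in ((i, j), (j, i)):
            length = suffix_prefix_overlap(reads[a], reads[b], **kwargs)
            if length:
                found[(a, b)] = length
    return found


# Toy reads, so the minimum overlap is lowered to 3 for the demonstration.
toy_reads = ["ATCGGA", "GGATTC", "TTCAAG"]
print(all_overlaps(toy_reads, min_len=3))
```

Keys are `(from_read, to_read)` index pairs, so the toy result chains read 0 into read 1 into read 2. Real tools avoid the quadratic loop with indexing heuristics.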
![Page 21: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/21.jpg)
N=6 → 15 alignment scores
Read# 1 2 3 4 5 6
1 - - - - - -
2 80 - - - - -
3 95 85 - - - -
4 0 30 20 - - -
5 0 0 25 70 - -
6 0 35 25 60 50 -
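As a rough illustration of how a score table like this becomes a graph, the sketch below keeps only pairs whose score clears a cut-off. The scores are copied from the table; the cut-off of 50 and the name `build_overlap_graph` are assumptions made for the example.

```python
# Pairwise scores copied from the lower-triangular table (reads 1 to 6).
SCORES = {
    (2, 1): 80,
    (3, 1): 95, (3, 2): 85,
    (4, 1): 0, (4, 2): 30, (4, 3): 20,
    (5, 1): 0, (5, 2): 0, (5, 3): 25, (5, 4): 70,
    (6, 1): 0, (6, 2): 35, (6, 3): 25, (6, 4): 60, (6, 5): 50,
}


def build_overlap_graph(scores, min_score):
    """Return an undirected weighted graph as {read: {neighbour: score}}."""
    graph = {}
    for (a, b), score in scores.items():
        if score >= min_score:
            graph.setdefault(a, {})[b] = score
            graph.setdefault(b, {})[a] = score
    return graph


# min_score=50 is an arbitrary, hypothetical threshold.
graph = build_overlap_graph(SCORES, min_score=50)
for read in sorted(graph):
    print(read, graph[read])
```

With this threshold the six reads fall into two clusters, reads 1 to 3 and reads 4 to 6; the weak scores between the clusters are dropped.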
![Page 22: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/22.jpg)
Graph construction
Thicker lines mean
stronger evidence for
overlap
Node/Vertex Edge/Arc
![Page 23: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/23.jpg)
A more realistic graph
![Page 24: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/24.jpg)
What ruins the graph?
• Read errors
– introduce false edges and nodes
• Non-haploid organisms
– heterozygosity causes lots of detours
• Repeats
– if longer than read length
– causes nodes to be shared, locality confusion
![Page 25: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/25.jpg)
Graph simplification
• Squash small bubbles – collapse small errors (or minor heterozygosity)
• Remove spurs
– short “dead end” hairs on the graph
• Join unambiguously connected nodes – reliable stretches of unique DNA
![Page 26: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/26.jpg)
Graph traversal
• For each connected component of the graph
– at least one per replicon in original sample
• Find a path which visits each node once
– Hamiltonian path/cycle is NP-hard (this is bad)
– solution will be a set of paths which terminate at decision points
• Form a consensus sequence from each path
– use all the overlap alignments
– each of these collapsed paths is a contig
![Page 27: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/27.jpg)
Contigs
Contiguous, unambiguous stretches of assembled DNA sequence
• Contig ends correspond to
– Real ends (for linear DNA molecules)
– Dead ends (missing sequence)
– Decision points (forks in the road)
![Page 28: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/28.jpg)
Repeats
![Page 29: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/29.jpg)
What is a repeat?
A segment of DNA which occurs more than once in the genome sequence
• Very common
– Transposons (self replicating genes)
– Satellites (repetitive adjacent patterns)
– Gene duplications (paralogs)
![Page 30: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/30.jpg)
Dot plots
Self similarity plot, genome versus itself
![Page 31: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/31.jpg)
Effect on assembly
The repeated element is collapsed into a single contig
![Page 32: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/32.jpg)
Repeat mis-assembly
[Figure: three kinds of repeat mis-assembly, each drawn as segments a, b, c, … around repeat copies I, II, III, …: collapsed tandem, excision, rearrangement]
![Page 33: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/33.jpg)
The law of repeats
• It is impossible to resolve repeats of length S unless you have reads longer than S.
![Page 34: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/34.jpg)
Scaffolding
![Page 35: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/35.jpg)
Beyond contigs
Contig sizes are limited by:
• the length of repeats in your genome
– Can’t change this!
• the length (or “span”) of the reads
– Wait for new technology
– Use “tricks” with existing technology
![Page 36: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/36.jpg)
Types of reads
• Example fragment
– atcgtatgatcttgagattctctcttcccttatagctgctata
• “Single-end” read
– atcgtatgatcttgagattctctcttcccttatagctgctata
– Sequence one end of the fragment
• “Paired-end” read
– atcgtatgatcttgagattctctcttcccttatagctgctata
– Sequence both ends of same fragment
– we can exploit this information!
![Page 37: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/37.jpg)
Scaffolding
• Paired-end reads
– known sequences at either end
– roughly known distance between ends
– unknown sequence between ends
• Most ends will occur in same contig
– if our contigs are longer than pair distance
• Some ends will be in different contigs
– evidence that these contigs are linked!
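The linking evidence described above can be tallied with a short sketch. It assumes the reads have already been placed on contigs by some other step; `contig_links`, the `min_pairs` cut-off, and the contig names are all hypothetical.

```python
from collections import Counter


def contig_links(pair_placements, min_pairs=2):
    """Count read pairs whose two ends landed in different contigs.

    pair_placements: iterable of (contig_of_end1, contig_of_end2).
    Returns {(contig_a, contig_b): supporting_pairs} for links backed by
    at least min_pairs pairs (an assumed noise filter).
    """
    links = Counter()
    for c1, c2 in pair_placements:
        if c1 != c2:  # ends in the same contig carry no linking evidence
            links[tuple(sorted((c1, c2)))] += 1
    return {pair: n for pair, n in links.items() if n >= min_pairs}


placements = [
    ("contig1", "contig1"),
    ("contig1", "contig2"),
    ("contig2", "contig1"),
    ("contig2", "contig3"),
    ("contig3", "contig3"),
]
print(contig_links(placements))
```

Here contig1 and contig2 are supported by two pairs and would be joined into a scaffold; the single pair between contig2 and contig3 falls below the cut-off. A real scaffolder would also use read orientation and the expected pair distance to order contigs and size the gaps.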
![Page 38: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/38.jpg)
Contigs to scaffolds
[Figure: paired-end reads whose ends fall in different contigs link those contigs into a scaffold, leaving gaps between them]
![Page 39: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/39.jpg)
Assumptions
![Page 40: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/40.jpg)
What can we assemble?
• Genomes – A single organism eg. its chromosomal DNA
• Meta-genomes – Genomic DNA from a mixture of organisms
• Transcriptomes – A single organism’s RNA inc. mRNA, ncRNA
• Meta-transcriptomes – RNA from a mixture of organisms
![Page 41: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/41.jpg)
Genomes
• Expect uniformity
– Each part of genome represented by roughly equal number of reads
• Average depth of coverage
– Genome: 4 Mbp
– Yield: 4 million x 50 bp reads = 200 Mbp
– Coverage: 200 ÷ 4 = 50x (reads per bp)
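The coverage arithmetic generalises to any run; a minimal sketch, with `average_coverage` as a hypothetical helper name:

```python
def average_coverage(num_reads, read_length_bp, genome_size_bp):
    """Average depth of coverage = total sequenced bases / genome size."""
    return num_reads * read_length_bp / genome_size_bp


# The slide's example: 4 million x 50 bp reads over a 4 Mbp genome.
print(average_coverage(4_000_000, 50, 4_000_000))  # 50.0
```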
![Page 42: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/42.jpg)
Meta-genomes
• Expect proportionality & uniformity
– Each genome represented by proportion of reads similar to their proportion in mixture
• Example
– Mix of 3 species: ¼ Staph, ¼ Clost, ½ Ecoli
– Say we get 4M reads
– Then we expect about: 1M from Staph, 1M from Clost, 2M from Ecoli
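The same expectation as a small sketch. It takes the proportions at face value, exactly as the slide does, and `expected_reads` is a hypothetical name.

```python
from fractions import Fraction


def expected_reads(total_reads, proportions):
    """Split a read count according to each genome's share of the mixture."""
    return {name: int(total_reads * share) for name, share in proportions.items()}


mix = {"Staph": Fraction(1, 4), "Clost": Fraction(1, 4), "Ecoli": Fraction(1, 2)}
print(expected_reads(4_000_000, mix))
```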
![Page 43: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/43.jpg)
Meta-genome issues
• Closely related species
– will have very similar reads
– lots of shared nodes in the graph
• Conserved sequence
– bits of DNA common to lots of organisms
– “hub” nodes in the graph
• Untangling is difficult
– need longer reads
![Page 44: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/44.jpg)
Assessment
![Page 45: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/45.jpg)
Assessing assemblies
• We desire
– Total length similar to genome size
– Fewer, larger contigs
– Correct contigs
• Metrics
– No generally useful objective measure
– Longest contig, total bp, N50, …
![Page 46: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/46.jpg)
The “N50”
The contig length at which the running sum of sorted contig lengths first reaches 50% of all assembled bases
• Imagine we got 7 contigs with lengths:
– 1,1,3,5,8,12,20
• Total
– 1+1+3+5+8+12+20 = 50
• The “halfway sum” is 50 ÷ 2 = 25
– 1+1+3+5+8+12 = 30 (≥ 25) so N50 is 12
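A minimal sketch of the calculation, with `n50` as a hypothetical function name. It accumulates from the longest contig downwards, which is the more common convention; for the example lengths it gives the same answer as summing from the shortest upwards.

```python
def n50(contig_lengths):
    """Length of the contig at which the running sum, taken from the
    longest contig downwards, first reaches half of the total bases."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contigs given")


print(n50([1, 1, 3, 5, 8, 12, 20]))  # 12
```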
![Page 47: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/47.jpg)
N50 concerns
• Optimizing for N50
– encourages mis-assemblies!
• An aggressive assembler may over-join:
– 1,1,3,5,8,12,20 (previous)
– 1,1,3,5,20,20 (now)
– 1+1+3+5+20+20 = 50 (unchanged)
• The “halfway sum” is still 25
– 1+1+3+5+20 = 30 (≥ 25) so N50 is 20 (was 12)
![Page 48: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/48.jpg)
Validation
• Self consistency
– Align reads back to contigs
– Check for errors or discordant pairs
• Second opinion
– Use two complementary sequencing methods
– Target troublesome areas for PCR
– Use a genome wide “optical map”
![Page 49: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/49.jpg)
How do I do it?
![Page 50: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/50.jpg)
Example
• Culture your bacterium
• Extract your genomic DNA
• Send it to AGRF for Illumina sequencing
– 100bp paired end
• Get back two files:
– MRSA_R1.fastq.gz
– MRSA_R2.fastq.gz
• Now what?
![Page 51: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/51.jpg)
Assembly tools
• Genome
– Velvet, Abyss, Mira, Newbler, SGA, AllPaths, Ray, SOAPdenovo, Spades, Masurca, …
• Meta-genome
– MetaVelvet, SGA, custom scripts + above
• Transcriptome
– Trans-Abyss, Oases, Trinity
• Meta-Transcriptome
– custom scripts + above
![Page 52: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/52.jpg)
Online tutorial
• The GVL
– Genomics Virtual Laboratory
– http://genome.edu.au
• Protocols
– Microbial de novo assembly for Illumina data
– Written by Simon Gladman (VBC/LSCC)
– https://genome.edu.au/wiki/Protocols
![Page 53: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/53.jpg)
Velvet: hash reads
velveth MyFolder 71 -shortPaired -fastq.gz -separate MRSA_R1.fastq.gz MRSA_R2.fastq.gz
– K-mer size: 71
– Read type: -shortPaired
– Read files: MRSA_R1.fastq.gz MRSA_R2.fastq.gz
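To give a feel for what the k-mer size controls, the sketch below splits reads into overlapping k-mers and counts them. It only illustrates the idea of k-mers; it is not Velvet's implementation, and `kmers` and `count_kmers` are hypothetical names.

```python
from collections import Counter


def kmers(read, k):
    """All overlapping substrings of length k, in read order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def count_kmers(reads, k):
    """Tally how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


print(kmers("ATCGT", 3))             # ['ATC', 'TCG', 'CGT']
print(len(kmers("A" * 100, 71)))     # 30 k-mers from one 100bp read at k=71
```

A larger k gives fewer k-mers per read, each of them more specific; a smaller k gives more k-mers that are more likely to be shared between unrelated parts of the genome.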
![Page 54: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/54.jpg)
Velvet: assembly
velvetg MyFolder -exp_cov auto -cov_cutoff auto
– “Signal” level: -exp_cov
– “Noise” level: -cov_cutoff
![Page 55: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/55.jpg)
Velvet: examine results
less MyFolder/contigs.fa
>NODE_1_length_43211_cov_27.36569
AGTCGATGCTTAGAGAGTATGACCTTCTATACAAAA
ATCTTATATTAGCGCTAGTCTGATAGCTCCCTAGAT
CTGATCTGATATGATCTTAGAGTATCGGCTATTGCT
AGTCTCGCGTATAATAAATAATATATTTTTCTAATG
ATCTTATATTAGCGCTAGTCTGATAGCTCCCTAGAT
CTGATCTGATATGATCTTAGAGTATCGGCTATTGCT
AGTCTCGCGTATAATAAATAATATATTTAGTAGTCT
…
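The header line carries the node number, a length, and the coverage, so a quick summary can be pulled out with a short script. This is a sketch that assumes every header follows the `NODE_<n>_length_<len>_cov_<cov>` pattern shown above; `parse_velvet_headers` is a hypothetical name. As far as the Velvet manual describes it, the length in the header is counted in k-mers rather than base pairs, so treat it as approximate.

```python
import re

# Pattern taken from the header shown on the slide.
HEADER = re.compile(r"^>NODE_(\d+)_length_(\d+)_cov_([\d.]+)$")


def parse_velvet_headers(fasta_text):
    """Extract node id, length and coverage from each matching header line."""
    records = []
    for line in fasta_text.splitlines():
        match = HEADER.match(line.strip())
        if match:
            node, length, cov = match.groups()
            records.append(
                {"node": int(node), "length": int(length), "cov": float(cov)}
            )
    return records


example = ">NODE_1_length_43211_cov_27.36569\nAGTCGATGCTTAGAGAGTATGACCTTCTATACAAAA\n"
print(parse_velvet_headers(example))
```

In practice the text would come from reading `MyFolder/contigs.fa`, and the resulting lengths could be passed to an N50 calculation.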
![Page 56: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/56.jpg)
Velvet: GUI
Velvet Assembler Graphical User Environment
[Screenshot with three callouts: Add your reads, Where to save, Click run]
![Page 57: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/57.jpg)
Contact
• Email
– [email protected]
• Blog
– TheGenomeFactory.blogspot.com
• Web
– vicbioinformatics.com
– vlsci.org.au/lscc
Torst!
5½!
![Page 58: De novo genome assembly - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · De novo genome assembly Dr Torsten Seemann ... World leader in de novo](https://reader030.fdocuments.us/reader030/viewer/2022021710/5bfe7de509d3f295268bec8e/html5/thumbnails/58.jpg)