Reaching the Parts Other Methods Can’t: Long Reads for Microbial Genomics and Metagenomics
Prof Nick Loman
Institute of Microbiology and Infection
University of Birmingham
COI Disclaimer:
Oxford Nanopore Technologies provided reagents free of charge in support of some of the studies presented here and paid NJL an honorarium to speak at a company meeting.
Why bother with long reads?
• De novo assembly
– Generate finished references for resequencing projects
– Detection of large-scale structural variation
– Plasmid and IS element reconstruction
– Metagenomics
• Metabarcoding
– Full-length amplicons (16S, 18S, …)
• Haplotyping
– Amplicons
– Strain reconstruction in complex mixtures
Long read technologies
• Single molecule
– Contigs
– Noisy (5-15% error)
– Best for assembly
• Synthetic/linked
– Scaffolds
– Very accurate
– Best for haplotyping
10X Genomics
Chromium: 1M GemCodes
GemCode: 100k GemCodes
James Hadfield, CRUK
Very low input requirement (1ng)
Long fragments needed (>50kb)
5 minute processing time
Patchy coverage within single molecule
Bryan Wee, James Hadfield
Long reads for reference genomes
How long is long enough?
• 7kb reads will bridge the rRNA operon
• Typically the longest bacterial repeat sequence
Serge Koren, Adam Phillippy
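The "how long is long enough" question can be put as simple arithmetic: a read only resolves a repeat if it covers the whole repeat plus some unique sequence on both sides. A minimal sketch, assuming a hypothetical helper name and illustrative values (the 5 kb repeat length and 500-base anchors are assumptions, not figures from the slides):

```python
def bridging_start_positions(read_len, repeat_len, anchor_len):
    """Count read start offsets at which a read covers the whole repeat
    plus at least anchor_len unique bases on each side (0 if too short)."""
    return max(0, read_len - repeat_len - 2 * anchor_len + 1)


# Illustrative values only: a 5 kb repeat flanked by 500-base unique anchors.
REPEAT_LEN = 5_000
ANCHOR_LEN = 500

for read_len in (250, 5_000, 7_000, 20_000):
    count = bridging_start_positions(read_len, REPEAT_LEN, ANCHOR_LEN)
    print(f"{read_len:>6} b read: {count} bridging start positions")
```

Reads shorter than the repeat plus both anchors can never bridge it, however deep the coverage; beyond that threshold, each extra base of read length adds one more bridging placement.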
Quick et al. BMJ Open 2014 http://bmjopen.bmj.com/content/4/11/e006278.long
Forensic source tracking of Pseudomonas aeruginosa in a burns unit
• ST395 representative sequence by PacBio (P4-C2)
• Resolve to two contigs
With thanks to Lex Nederbragt
Reference          SNPs   Indels   Mapped
PAO1 reference     23     4        77%
PacBio reference   40     5        97%
Choice of reference informs phylogenetic resolution
[Figure labels: Bed 11; Environment, Water, Patient; ΔoprD, ΔlysR-like regulator]
Rapid emergence of meropenem resistance
Long reads for metagenomics
Noisy reads for metagenomics
• PacBio – Even HMP mock community
Courtesy of Pacific Biosciences
Long read technologies
• Oxford Nanopore MinION
– Portable, fieldable
– Real-time
– Low/no instrument cost
– Reads up to 1Mb
– 1-10Gb per run
– Homopolymeric tract errors
– 1D: 15% error
– 1D^2: 5% error
Nanopore WGS Consortium, https://github.com/nanopore-wgs-consortium/NA12878
R9.4 chemistry enables higher yields
In field whole-genome sequencing
MRSA
S. pyogenes
900Mb 1D rapid
P. aeruginosa
E. faecium
Bandage by Ryan Wick
4.4Gb 1D ligation
Whale watching with nanopore
• Sambrook DNA extraction protocol
• Excess of DNA (>10 µg)
• Very careful pipetting
• Aim for <1 transposase cut per molecule
Picture from Phil Zuzerte
Total bases: 5,014,576,373 (5Gb)
Number of reads: 150,604
N50: 63,747
Mean: 33,296.44
http://lab.loman.net/2017/03/09/ultrareads-for-nanopore/
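For readers unfamiliar with the run statistics quoted here, the sketch below shows how N50 and mean read length are conventionally computed from a list of read lengths. The function name and toy lengths are illustrative assumptions; only the total base count and read count are taken from the slide.

```python
def n50(lengths):
    """Length L such that reads of length >= L contain at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50() needs at least one read length")


# Toy example (made-up lengths): 30 bases in total, half reached at the 6-base read.
toy_lengths = [2, 3, 4, 5, 6, 10]
print("toy N50:", n50(toy_lengths))
print("toy mean:", sum(toy_lengths) / len(toy_lengths))

# Mean read length recomputed from the totals reported on the slide.
TOTAL_BASES = 5_014_576_373
NUMBER_OF_READS = 150_604
slide_mean = TOTAL_BASES / NUMBER_OF_READS
print(f"mean from slide totals: {slide_mean:.2f}")
```

Because N50 weights reads by the bases they contain, a run with a long tail of very long reads has an N50 well above its mean, as in the figures above.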
Whale watching: E. coli
1113805 916705 790987 778219 771232 671130 646480 629747 614903 603565
Whale watching: E. coli
“Squiggles are vanity, alignments are sanity!”
E. coli: genome assembly in 8 reads
Read   Length   Ref start   Ref end   Time (m)
1      876991   4398844     634183    32.48
2      696402   470003      1166405   25.79
3      799047   1137438     1936485   29.59
4      642071   1759431     2401502   23.78
5      826662   2106227     2932889   30.61
6      883962   2699626     3583588   32.73
7      825191   3285196     4110387   30.56
8      463341   3995967     4459308   17.16
miniasm
N50: 4Mb
Time: 1.5s (1 CPU)
1x coverage!
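The eight-read table can be sanity-checked with a few lines of arithmetic. The sketch below assumes a circular reference of 4,641,652 bases (the E. coli K-12 MG1655 length; the slides do not name the reference, so this is an assumption) and uses hypothetical helper names. It checks that each read overlaps the next around the circle, and derives coverage and the implied sequencing speed.

```python
GENOME_LENGTH = 4_641_652  # assumption: E. coli K-12 MG1655 reference length

# (length, ref_start, ref_end, minutes), copied from the eight-read table
READS = [
    (876991, 4398844, 634183, 32.48),
    (696402, 470003, 1166405, 25.79),
    (799047, 1137438, 1936485, 29.59),
    (642071, 1759431, 2401502, 23.78),
    (826662, 2106227, 2932889, 30.61),
    (883962, 2699626, 3583588, 32.73),
    (825191, 3285196, 4110387, 30.56),
    (463341, 3995967, 4459308, 17.16),
]


def circular_span(start, end, genome_length=GENOME_LENGTH):
    """Bases covered going forward from start to end on a circular reference."""
    return (end - start) % genome_length


def tiles_circle(reads, genome_length=GENOME_LENGTH):
    """True if each read overlaps the next in list order (the last wrapping
    round to the first) and the chain goes round the circle exactly once."""
    advance_total = 0
    for (_, start, end, _), (_, next_start, _, _) in zip(reads, reads[1:] + reads[:1]):
        advance = (next_start - start) % genome_length
        if advance > circular_span(start, end, genome_length):
            return False  # gap between this read and the next
        advance_total += advance
    return advance_total == genome_length


coverage = sum(length for length, _, _, _ in READS) / GENOME_LENGTH
speeds = [length / (minutes * 60) for length, _, _, minutes in READS]

print("tiles the circular genome:", tiles_circle(READS))
print(f"coverage: {coverage:.2f}x")
print("bases per second:", [round(s) for s in speeds])
```

Under the assumed genome length, read 1 wraps across the origin and its span matches its reported length, which is a useful consistency check on the assumption itself.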
PFGE: The future
What’s coming next?
• Combined single molecule and linked read hybrid metagenomic assemblies?
• New methods for long fragment extraction from microbiome studies
• Hi-C to link chromosomes with plasmids, phage
• Much higher outputs
Conclusions
• Long reads and synthetic long reads are promising for whole-genome reconstruction from metagenomes
• Synthetic long reads (10X and/or hybrid) will be useful for detecting low-abundance members of a population
• Issues:
– Very high input requirements may be challenging for certain applications, e.g. diagnostics from clinical samples
– Extraction of high molecular weight DNA from mixed samples is tricky; there is no one-size-fits-all solution
– Cost remains prohibitive for very deep sequencing
Acknowledgements
– Josh Quick (Birmingham)
– 10X:
• Pablo Fuentes-Utrilla (Birmingham)
• Bryan Wee, Ross Fitzgerald (Roslin)
• James Hadfield (CRUK)
– Nanopore human genome sequencing consortium
– Pseudomonas
• Nicola Cumley, Beryl Oppenheim
Synthetic long reads
• Moleculo
– Long range amplification
– “Nextera on steroids”
– 500ng input
– Size selection required
http://www.illumina.com/products/truseq-synthetic-long-read-kit.ilmn