Sequencing technology› Roche/454 GS-FLX (‘454’)› Illumina
Prokaryotic profiling› De novo genome sequencing› Metagenomics› SNP profiling› Species quantification
Viral profiling› De novo genome sequencing
Classic chain-terminator sequencing
Dye chain-terminator sequencing
Next-generation sequencing
Next-gen sequencing principle› Massively parallel› Add ACTGs› Catch a signal
Roche/454 GS-FLX+ (‘454’)› Pyrosequencing
Problems with homopolymers (e.g. AAAAAA)
Long-read sequencing: 500-1000 bp› Variable sequencing length› 1 million reads/run
~1 Gb/run
Sequencing speed: ~1 day/run› Next-next generation: IonTorrent PGM/Proton
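Pyrosequencing adds one nucleotide species at a time and records a light signal roughly proportional to the number of bases incorporated. The sketch below is a noise-free idealisation (the cyclic flow order `TACG` is an assumption). It converts a read into its ideal flowgram and calls bases back from measured intensities by rounding. This shows why homopolymers are a problem: a run of six A's produces one flow of intensity ~6, and noise that pulls that signal down to 5.4 silently drops a base.

```python
# Minimal sketch of 454-style pyrosequencing signals (idealised, no noise model).
# Assumption: nucleotides are flowed in the fixed cyclic order "TACG".
FLOW_ORDER = "TACG"


def flowgram(read: str, flow_order: str = FLOW_ORDER) -> list[int]:
    """Ideal signal per flow = number of bases incorporated (homopolymer length)."""
    if set(read) - set(flow_order):
        raise ValueError("read contains bases that are never flowed")
    signals = []
    pos = 0
    flow = 0
    while pos < len(read):
        base = flow_order[flow % len(flow_order)]
        run = 0
        while pos < len(read) and read[pos] == base:
            run += 1
            pos += 1
        signals.append(run)  # 0 = this nucleotide was not incorporated
        flow += 1
    return signals


def call_bases(intensities: list[float], flow_order: str = FLOW_ORDER) -> str:
    """Turn measured intensities back into bases by rounding to the nearest run length."""
    bases = []
    for flow, intensity in enumerate(intensities):
        run = max(0, round(intensity))
        bases.append(flow_order[flow % len(flow_order)] * run)
    return "".join(bases)


if __name__ == "__main__":
    print(flowgram("TTAAAAAAGC"))  # [2, 6, 0, 1, 0, 0, 1]
    # A slightly weak signal for the AAAAAA run is read as 5 A's: one base is lost.
    print(call_bases([2.1, 5.4, 0.1, 0.9, 0.2, 0.0, 1.1]))  # TTAAAAAGC
```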
Illumina› Sequence by synthesis› Short-read sequencing: 36, 72, …, 150 bp› Fixed sequencing length› 1 billion reads/run
100 Gb/run (≈ 33 × the human genome!)› Sequencing speed: 3-10 days, depending on read length
SOLiD› Short-read sequencing (similar to Illumina)
454 / Illumina› Price per run: ~$10,000/run› Price per machine: $200,000-500,000
Supporting IT hardware› Peripheral devices such as fragmentation instruments, PCR equipment …› Negotiating power…
Use service centers!› Nxtgnt (BE), GATC (EU), Baseclear (NL), BGI …› No overhead costs, no maintenance, etc.› Cheaper
Next-generation sequencing has become 2nd generation sequencing
Next-next-generation sequencing is almost there: 3rd generation sequencing› Helicos: True Single Molecule Sequencing› IonTorrent/Life: Cheap and fast› Nanopore: Unlimited read size› …
The evolution of sequencing technology goes hand in hand with the evolution of› IT infrastructure/hardware› Analysis software
Hardware› 1 Illumina run ≈ 100 Gb text file ≈ a 5-million-page book› Processing power/storage are an issue!
Software› Mapping reads to the human genome: ‘a couple of hours’
Prokaryotic genomics 101› Prokaryotes = bacteria + archaea› Prokaryotic genomes
One large circular genome (0.5-10 Mb): the ‘chromosome’
Small plasmids (1-1000 kb) (virulence factors, antibiotic resistance …)
(Almost) no introns› Easy ORF annotation
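Because prokaryotic genes are (almost) intron-free, a first-pass gene search can simply scan for open reading frames: an ATG start codon followed in the same frame by a stop codon. The sketch below does that on one strand at a time. It is a simplification: it ignores alternative bacterial start codons such as GTG/TTG and has none of the ribosome-binding-site or coding-potential models that dedicated gene finders use. The `min_codons` cut-off is an arbitrary illustrative value.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """Return (start, end, frame) of ATG...stop ORFs on the given strand.

    `end` is exclusive and includes the stop codon; `min_codons` counts the
    codons before the stop. ATGs nested inside a reported ORF are skipped.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                j = i + 3
                # walk codon by codon until an in-frame stop codon
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 > len(seq):
                    break  # no stop before the end of the sequence: incomplete ORF
                if (j - i) // 3 >= min_codons:
                    orfs.append((i, j + 3, frame))
                i = j + 3
            else:
                i += 3
    return orfs


if __name__ == "__main__":
    genome = "CCATGAAATTTTAGCC"
    print(find_orfs(genome, min_codons=3))  # [(2, 14, 2)]
    # Genes on the other strand: scan the reverse complement as well.
    print(find_orfs(reverse_complement(genome), min_codons=3))  # []
```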
1953: Watson/Crick discover the DNA double helix
1977: First complete genome: bacteriophage φX174
1995: First genome of a free-living organism: H. influenzae
2001: First draft of the human genome
2006: >200 complete bacterial genomes
2012: An uncountable number of bacterial genomes have been sequenced using next-gen sequencing
Complete bacterial genomes used to be› Expensive› Difficult to obtain› ‘Nature’ or ‘Science’ work› Remained complex until the invention of next-generation sequencing
Using next-generation sequencing, de novo sequencing has become› Relatively easy› Relatively cheap› Routine research
Already >10 complete bacterial genomes published in 2012› More than just an assembly!
Practical
1. Get some DNA from an isolated species of interest
2. Sequence: long or short reads (1-10 days)
3. Obtain your sequences
4. Assemble (~1 h): pure de novo assembly or guided assembly
5. Annotate the genome (days-weeks)
Assembly: multiple ‘short’ reads → 1 long sequence› Existing software› Velvet› SSAKE› Newbler› …
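To make "multiple short reads → 1 long sequence" concrete, here is a toy greedy overlap assembler. It repeatedly merges the pair of reads with the longest suffix/prefix overlap until no overlap of at least `min_len` bases remains. It only illustrates the idea and assumes error-free reads taken from one strand. It is not how the listed tools work internally (Velvet, for instance, builds a de Bruijn graph), and greedy merging is known to mis-assemble repeats.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b` (0 if < min_len)."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of the overlap in `a`
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair until no overlap >= min_len is left."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # drop reads fully contained in another read
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # remaining contigs cannot be joined
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    reads = ["ATGGCGT", "GCGTACG", "TACGTTA"]
    print(greedy_assemble(reads))  # ['ATGGCGTACGTTA']
```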
Source: Nature 2009, MacLean et al.
Relatively cheap› Sequencing cost: depending on coverage
Illumina, 30x, 5 Mb genome: $10-$100› 454, 30x, 5 Mb genome: $1000-$5000
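Per-genome costs like these follow from simple arithmetic: bases needed = genome size × coverage, and cost = the fraction of a run those bases occupy × the run price. The sketch below takes the figures quoted in these slides as assumptions (~$10,000/run, ~100 Gb/run for Illumina, ~1 Gb/run for 454) and a 5 Mb bacterial genome. It counts sequencing reagents only, and it assumes a fraction of a run can be bought, which in practice means pooling samples.

```python
def bases_needed(genome_size_bp: float, coverage: float) -> float:
    """Total sequenced bases required for the requested mean coverage."""
    return genome_size_bp * coverage


def reads_needed(genome_size_bp: float, coverage: float, read_length_bp: float) -> float:
    return bases_needed(genome_size_bp, coverage) / read_length_bp


def cost_per_genome(genome_size_bp: float, coverage: float,
                    bases_per_run: float, price_per_run: float) -> float:
    """Sequencing cost as the fraction of a run the genome uses (reagents only)."""
    return bases_needed(genome_size_bp, coverage) / bases_per_run * price_per_run


if __name__ == "__main__":
    genome = 5e6  # 5 Mb bacterial genome
    print(f"Illumina: ${cost_per_genome(genome, 30, 100e9, 10_000):.0f}")  # Illumina: $15
    print(f"454:      ${cost_per_genome(genome, 30, 1e9, 10_000):.0f}")    # 454:      $1500
    print(f"reads at 30x, 500 bp: {reads_needed(genome, 30, 500):,.0f}")   # reads at 30x, 500 bp: 300,000
```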
Equipment: IT infrastructure, sequencing equipment, people …
Relatively easy› Need for IT support› No out-of-the-box standard solution for everything› Several different software packages for assembly
De novo genome assembly› Study of a single species› Need for species isolation
Metagenomics analysis› Study of a community of species› No need for isolation (culturing bias!)› Study the collective gene pool and function of the community/ecology› No need to assign functions to individual species
Practical
1. Get bacterial DNA or RNA from a sample: soil, gut/fecal, ocean water (e.g. Craig Venter) …
2. Sequence: long or short reads (1-10 days)
3. Obtain your sequences
4. Map to a database of known genes (1 day)
5. Annotate/analyse the community (weeks)
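Steps 4-5 amount to assigning each read to a known gene and then summing reads per function, which gives a profile of the community's collective gene pool. The sketch below shows that idea with exact k-mer matching. The gene database, gene IDs and function labels are all hypothetical toy data. Real pipelines use proper alignment tools and curated gene/pathway databases instead of this shortcut.

```python
from collections import Counter, defaultdict
from typing import Optional

# Hypothetical toy gene database: gene id -> sequence, and gene id -> function.
GENES = {
    "geneA": "ATGCGTACGTTAGC",
    "geneB": "TTTTGGGGCCCCAAAA",
}
GENE_FUNCTION = {
    "geneA": "cellulose degradation",
    "geneB": "nitrogen fixation",
}


def build_kmer_index(genes: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map every k-mer to the set of genes that contain it."""
    index = defaultdict(set)
    for gene_id, seq in genes.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(gene_id)
    return index


def assign_read(read: str, index: dict[str, set[str]], k: int) -> Optional[str]:
    """Assign a read to the gene sharing the most k-mers with it (None if no hit)."""
    hits = Counter()
    for i in range(len(read) - k + 1):
        for gene_id in index.get(read[i:i + k], ()):
            hits[gene_id] += 1
    return hits.most_common(1)[0][0] if hits else None


def function_profile(reads: list[str], genes: dict[str, str],
                     gene_function: dict[str, str], k: int = 5) -> Counter:
    """Count reads per function across the whole sample."""
    index = build_kmer_index(genes, k)
    profile = Counter()
    for read in reads:
        gene_id = assign_read(read, index, k)
        profile[gene_function[gene_id] if gene_id is not None else "unassigned"] += 1
    return profile


if __name__ == "__main__":
    reads = ["GCGTACGT", "GGGGCCCC", "ACACACAC", "CGTACGTT"]
    print(dict(function_profile(reads, GENES, GENE_FUNCTION)))
```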
2010: Giant panda genome (2nd carnivore genome)› No umami taste receptor → no affinity for meat› The panda is more a dog than a bear› The panda is a carnivore eating bamboo!
Still 2010!: Panda ‘microbiome’› The gut microbiome of the panda reveals bamboo/cellulose-degrading pathways
A clinical example: the gut microbiome can predict diabetes and malnourishment
PLoS One (2011), Brown et al.› PLoS One (2010), Valladares et al.› Gut Pathology (2011), Gupta et al.
Classical SNP analysis - practical
1. Design PCR primers
2. Generate amplicons
3. Re-sequence using long-read sequencing› Conserves ‘SNP blocks’ (linked SNPs stay on the same read)
4. Detect SNPs
5. Correlate SNPs to drug resistance, severity of symptoms …
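Step 4 (detect SNPs) reduces to counting, at each amplicon position, how often the reads disagree with the reference. The sketch below assumes the reads are already aligned to the amplicon without insertions or deletions, so `read[i]` covers `reference[i]`. The depth and allele-frequency thresholds are illustrative assumptions: a low frequency threshold is what lets minor variants in a mixed (e.g. viral) population show up.

```python
from collections import Counter


def call_snps(reference: str, aligned_reads: list[str], min_depth: int = 10,
              min_alt_freq: float = 0.2) -> list[tuple[int, str, str, float]]:
    """Report (position, ref base, alt base, alt frequency) for variant positions.

    Assumes gapless alignments: read[i] covers reference[i]; '-' marks an
    uncovered position.
    """
    snps = []
    for pos, ref_base in enumerate(reference):
        counts = Counter(r[pos] for r in aligned_reads if pos < len(r) and r[pos] != "-")
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to call anything reliably
        for base, n in counts.most_common():
            if base != ref_base and n / depth >= min_alt_freq:
                snps.append((pos, ref_base, base, n / depth))
    return snps


if __name__ == "__main__":
    reference = "ACGTACGT"
    reads = ["ACGTACGT"] * 7 + ["ACGAACGT"] * 3  # 30% of reads carry T→A at position 3
    print(call_snps(reference, reads))  # [(3, 'T', 'A', 0.3)]
```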
Amplicon resequencing is the same for human, prokaryotic and viral analyses
Many standardized out-of-the-box solutions available
Very simple analysis› Watch out for overkill…› Don’t use a bazooka to kill a fly!› Throughput can be too high
Profile the coding region of hepatitis C
Lauck et al. 2012
Use next-generation sequencing to predict the optimal HIV therapy
Thielen et al. 2012
Imagine the following research questions› Which (known) species/groups are present in a certain sample?› Does this composition change with a certain treatment, change of conditions, patient, etc.?
No need for de novo genome sequencing
No metagenomics: species instead of functions
Prokaryotes carry the 16S rDNA gene, which codes for the 16S ribosomal RNA
The 16S rDNA region is ~1.5 kb long› 16S rDNA is specific for each species/strain› Theoretically: 4^1500 ≈ 10^903 possibilities
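The 10^903 figure comes from taking logarithms: a sequence of length L over 4 bases has 4^L variants, so its size in powers of ten is L × log10(4). A quick check, cross-checked against Python's exact big-integer arithmetic:

```python
import math


def log10_sequence_space(length: int, alphabet_size: int = 4) -> float:
    """log10 of the number of possible sequences, i.e. of alphabet_size ** length."""
    return length * math.log10(alphabet_size)


if __name__ == "__main__":
    exponent = log10_sequence_space(1500)
    print(f"4^1500 ≈ 10^{exponent:.0f}")  # 4^1500 ≈ 10^903
    # Exact integer check: 4**1500 has 904 decimal digits, i.e. it lies between 10^903 and 10^904.
    print(len(str(4 ** 1500)))  # 904
```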
In practice: databases contain known 16S rDNA sequences for a very large number of organisms (millions of sequences)
16S rDNA can be amplified from different species using universal PCR primers› Isolate/amplify different regions using the same primers› Compare the isolated sequences against a database of known sequences
Practical procedure
1. Sample an environment and isolate DNA
2. Do a universal PCR amplification
3. Sequence using long-read sequencing: the longer the better!
4. Obtain sequences
5. Map sequences against a reference database
6. Annotate the data
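Steps 5-6 (map and annotate) can be pictured as: for each read, find the reference 16S sequence that shares the most short words (k-mers) with it. The sketch below is a simple nearest-reference classifier of that kind, scoring the fraction of the read's k-mers found in each reference. The reference sequences and taxon names are hypothetical toy data. The RDP Classifier itself uses a naive Bayesian model over 8-mers, which is more elaborate than this.

```python
from collections import Counter
from typing import Optional

# Hypothetical toy reference database (taxon -> 16S fragment); real work would use
# a curated 16S database such as RDP.
REFERENCE_DB = {
    "Taxon_A": "ACGTACGGTCAGTCCA",
    "Taxon_B": "TTGACCATGGATCGAT",
}


def kmers(seq: str, k: int) -> set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify(read: str, db: dict[str, str], k: int = 8,
             min_fraction: float = 0.5) -> Optional[str]:
    """Return the taxon whose reference contains the largest fraction of the read's k-mers."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return None  # read shorter than k
    best_taxon, best_score = None, 0.0
    for taxon, ref_seq in db.items():
        score = len(read_kmers & kmers(ref_seq, k)) / len(read_kmers)
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon if best_score >= min_fraction else None


if __name__ == "__main__":
    reads = ["ACGGTCAG", "CCATGGAT", "GGGGGGGG"]
    print(classify(reads[0], REFERENCE_DB, k=4))  # Taxon_A
    # Species counts for one sample (None = no confident match).
    print(Counter(classify(r, REFERENCE_DB, k=4) for r in reads))
```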
Example: The Antarctica project› Which parameters determine the composition of bacterial communities in Antarctic lakes?
20 different samples/lakes› Sequence 16S rDNA genes› 1 × 454 run (1 million 500 bp sequences)› Map all sequences back to the RDP database
Analyse the data using computing power› Compare different locations
Is species A present in location 1, location 2, …?› Assess the distribution in a single location
How dominant is the most dominant species in location 1?
How many species are in location 1? …
Visualize!
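The questions above map onto standard summary statistics of per-location species counts: presence/absence across locations, richness (how many species), Berger-Parker dominance (share of the most dominant species) and, as a common extra, Shannon diversity. The sketch below computes them from hypothetical read counts. Note that read counts are only a proxy for abundance, since 16S copy number differs between species.

```python
import math


def richness(counts: dict[str, int]) -> int:
    """Number of species observed (count > 0)."""
    return sum(1 for n in counts.values() if n > 0)


def berger_parker(counts: dict[str, int]) -> float:
    """Share of reads belonging to the most dominant species."""
    return max(counts.values()) / sum(counts.values())


def shannon(counts: dict[str, int]) -> float:
    """Shannon diversity H = -sum(p * ln p) over observed species."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


def presence(species: str, samples: dict[str, dict[str, int]]) -> dict[str, bool]:
    """Is `species` present in each location?"""
    return {loc: counts.get(species, 0) > 0 for loc, counts in samples.items()}


if __name__ == "__main__":
    # Hypothetical read counts per species for two lakes.
    samples = {
        "lake1": {"A": 50, "B": 30, "C": 20},
        "lake2": {"A": 90, "D": 10},
    }
    for lake, counts in samples.items():
        print(lake, richness(counts), f"{berger_parker(counts):.2f}", f"{shannon(counts):.3f}")
    print(presence("C", samples))
```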
Analyse different samples at different taxonomic levels› Include the taxonomic tree of life of bacteria› Use a ‘taxonomy browser’
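Looking at a sample "on different taxonomic levels" means summing read counts over all taxa that share the same ancestor at the chosen rank. The sketch below assumes a hypothetical, simplified lineage string `domain;phylum;genus` per taxon. Real taxonomies (e.g. the one used by RDP) have more ranks but aggregate the same way. Reads whose lineage stops above the requested rank are kept as "unclassified" rather than dropped.

```python
from collections import Counter

# Hypothetical, simplified lineage format "domain;phylum;genus".
RANKS = ("domain", "phylum", "genus")


def aggregate(counts_by_lineage: dict[str, int], rank: str) -> Counter:
    """Sum read counts per taxon at the requested rank.

    Lineages that stop above that rank are reported as '<lineage>;unclassified'.
    """
    depth = RANKS.index(rank) + 1
    totals = Counter()
    for lineage, n in counts_by_lineage.items():
        parts = lineage.split(";")
        if len(parts) >= depth:
            key = ";".join(parts[:depth])
        else:
            key = ";".join(parts) + ";unclassified"
        totals[key] += n
    return totals


if __name__ == "__main__":
    lake_counts = {
        "Bacteria;Proteobacteria;Escherichia": 40,
        "Bacteria;Proteobacteria;Pseudomonas": 10,
        "Bacteria;Cyanobacteria;Synechococcus": 25,
        "Bacteria;Firmicutes": 5,  # genus could not be assigned
    }
    for rank in RANKS:
        print(rank, dict(aggregate(lake_counts, rank)))
```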
Analyse a single location
Compare different locations
Analysis                | Lab work difficulty     | Analysis difficulty
De novo genome          | ++ (isolate)            | +
Metagenomics            | +                       | +++ (pathways etc.)
SNP                     | +++ (design primers)    | ++ (correlate)
Species quantification  | ++ (universal primers)  | ++
Viral profiling› Viral profiling = prokaryotic profiling, but…
Cheaper› Faster› Easier
De novo genome sequencing = OK› Don’t spend $10,000 on a 100 kb genome!› Multiplexing/pooling capacity is limited!
Watch out for overkill› An Illumina run can be split into 8 lanes› >20 samples per lane can be combined
Still >100 Mb per sample…
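The "still >100 Mb per sample" point is again simple division. The sketch below takes the slide's figures as assumptions (~100 Gb per Illumina run, 8 lanes, 20 barcoded samples per lane), assumes an even split between samples, and shows what that yield means for a 100 kb viral genome.

```python
def yield_per_sample(bases_per_run: float, lanes: int, samples_per_lane: int) -> float:
    """Bases per sample when a run is split evenly over lanes and barcoded samples."""
    return bases_per_run / (lanes * samples_per_lane)


def mean_coverage(bases: float, genome_size_bp: float) -> float:
    return bases / genome_size_bp


if __name__ == "__main__":
    per_sample = yield_per_sample(100e9, lanes=8, samples_per_lane=20)
    print(f"{per_sample / 1e6:.0f} Mb per sample")                       # 625 Mb per sample
    print(f"{mean_coverage(per_sample, 100e3):.0f}x coverage of 100 kb")  # 6250x coverage of 100 kb
```

Even at this level of pooling, a small viral genome is covered thousands of times over, far beyond what is needed.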
Thanks for your attention!