Sequencing technology › Roche/454 GS-FLX (‘454’) › Illumina Prokaryotic profiling › De...

Post on 14-Jan-2016

Sequencing technology
› Roche/454 GS-FLX (‘454’)
› Illumina

Prokaryotic profiling
› De novo genome sequencing
› Metagenomics
› SNP profiling
› Species quantification

Viral profiling
› De novo genome sequencing

Classic chain-terminator sequencing

Dye chain-terminator sequencing

Next-generation sequencing

Next-gen sequencing principle
› Massively parallel
› Add ACTGs
› Catch a signal

Roche/454 GS-FLX+ (‘454’)
› Pyrosequencing
    Problems with homopolymers (e.g. AAAAAA)
› Long-read sequencing: 500-1000 bp
› Variable sequencing length
› 1 million reads/run
    1 Gb/run
› Sequencing speed: ~1 day/run
› Next-next generation: IonTorrent PGM/Proton
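To see where the homopolymer problem mentioned above would matter, a small sketch can flag runs of identical bases in a sequence. The function name `homopolymer_runs` and the threshold of 4 bases are illustrative assumptions, not part of any 454 software.

```python
from itertools import groupby

def homopolymer_runs(seq, min_len=4):
    """Return (start, base, length) for every run of identical bases >= min_len.

    Positions are 0-based. min_len=4 is an arbitrary illustrative threshold.
    """
    runs = []
    pos = 0
    for base, group in groupby(seq):
        length = len(list(group))
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs

print(homopolymer_runs("ACGTAAAAAAGGCTTTTTC"))
# [(4, 'A', 6), (13, 'T', 5)]
```

Stretches like these are the places where the length of the run is hard to read from a pyrosequencing signal.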

Illumina
› Sequencing by synthesis
› Short-read sequencing: 36, 72, …, 150 bp
› Fixed sequencing length
› 1 billion reads/run
    100 Gb/run (= 33 x human genome!)
› Sequencing speed: 3-10 days, depending on read length
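The per-run yields quoted for both platforms follow from reads per run times read length. A minimal sketch of that arithmetic, assuming ~100 bp Illumina reads, 1000 bp 454 reads and a ~3 Gb human genome (the variable names are illustrative):

```python
HUMAN_GENOME_BP = 3e9  # assumption: approximate haploid human genome size

def run_yield_bp(reads_per_run, read_length_bp):
    """Total bases produced by one run."""
    return reads_per_run * read_length_bp

illumina = run_yield_bp(1_000_000_000, 100)  # 1 billion reads, assumed 100 bp each
roche454 = run_yield_bp(1_000_000, 1000)     # 1 million reads, assumed 1000 bp each

print(f"Illumina: {illumina / 1e9:.0f} Gb/run")                       # 100 Gb/run
print(f"454: {roche454 / 1e9:.0f} Gb/run")                            # 1 Gb/run
print(f"Illumina run = {illumina / HUMAN_GENOME_BP:.0f} x human genome")  # 33 x
```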

SOLiD
› Short-read sequencing (similar to Illumina)

454 / Illumina
Price per run: $10,000/run
Price per machine: $200,000-500,000
› Supporting IT hardware
› Peripheral devices such as fragmentation instruments, PCR equipment …
› Negotiating power…

Use service centers!
› Nxtgnt (BE), GATC (EU), Baseclear (NL), BGI …
› No overhead cost, no maintenance etc.
› Cheaper

Next-generation sequencing has become 2nd generation sequencing

Next-next-generation sequencing is almost there: 3rd generation sequencing
› Helicos: True Single Molecule Sequencing
› IonTorrent/Life: Cheap and fast
› Nanopore: Unlimited read size
› …

Evolution of sequencing technology goes hand in hand with evolution of
› IT infrastructure/hardware
› Analysis software

Hardware
› 1 Illumina run ~ 100 Gb text file ~ 5 million page book
› Processing power/storage are an issue!

Software
› Mapping to a human genome: ‘couple of hours’

Prokaryotic genomics 101
› Prokaryotes = bacteria + archaea
› Prokaryotic genomes
    Large circular genome (0.5-10 Mb): ‘chromosome’
    Small plasmids (1-1000 kb) (virulence factors, antibiotic resistance …)
    (Almost) no introns -> easy ORF annotation

1953: Watson/Crick describe the DNA double helix
1977: First complete genome: bacteriophage φX174
1995: First genome of a free-living organism: H. influenzae
2001: First draft of the human genome
2006: >200 complete bacterial genomes
2012: An uncountable number of bacterial genomes have been sequenced using next-gen sequencing

Complete bacterial genomes used to be
› Expensive
› Difficult to obtain
› ‘Nature’ or ‘Science’ work
› Remained complex until the invention of next-generation sequencing

Using next-generation sequencing, de novo sequencing has become
› Relatively easy
› Relatively cheap
› Routine research

Already >10 complete bacterial genomes published in 2012
› More than just an assembly!

Practical
1. Get some DNA from an isolated species of interest
2. Sequence: long or short reads (1-10 days)
3. Obtain your sequences
4. Assemble (1h)
    Pure de novo assembly
    Guided assembly
5. Annotate the genome (days-weeks)

Assembly: multiple ‘short’ reads -> 1 long sequence
Existing software
› Velvet
› SSAKE
› Newbler
› …

Source: Nature 2009, MacLean et al.
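The idea of turning multiple short reads into one long sequence can be sketched with a toy greedy overlap merge. This is only an illustration under simplifying assumptions (error-free reads, one strand, no repeats, no read contained in another); it is not how Velvet, SSAKE or Newbler are implemented, and the function names are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (>= min_len), else 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest suffix/prefix overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no overlaps left: several contigs remain
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
    return contigs

reads = ["ATGGCGT", "GCGTACG", "TACGTTA", "GTTAGC"]  # toy reads
print(greedy_assemble(reads))
# ['ATGGCGTACGTTAGC']
```

Real data adds sequencing errors, both strands and repeats, which is why dedicated assemblers are used in practice.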

Relatively cheap
› Sequencing cost: depending on coverage
    Illumina, 30x, 5 Mb genome: $10-$100
    454, 30x, 5 Mb genome: $1000-$5000
› Equipment
    IT infrastructure, sequencing equipment, people …

Relatively easy
› Need for IT support
› No out-of-the-box standard solution for everything
› Several different software packages for assembly
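The coverage-dependent sequencing cost above can be approximated as the fraction of a run that a sample needs, times the price of the run. A rough sketch, assuming a 5 Mb bacterial genome (the genome size range given earlier is 0.5-10 Mb), the $10,000 per-run price and the 100 Gb (Illumina) and 1 Gb (454) run yields quoted earlier, and assuming a run can be shared perfectly between samples:

```python
def sequencing_cost(genome_bp, coverage, run_yield_bp, run_price_usd):
    """Cost of the share of one run needed to reach the requested coverage."""
    bases_needed = genome_bp * coverage
    return run_price_usd * bases_needed / run_yield_bp

GENOME_BP = 5_000_000  # assumption: 5 Mb bacterial genome
COVERAGE = 30          # 30x coverage
RUN_PRICE = 10_000     # USD per run, assumed equal for both platforms

illumina_cost = sequencing_cost(GENOME_BP, COVERAGE, 100e9, RUN_PRICE)
roche454_cost = sequencing_cost(GENOME_BP, COVERAGE, 1e9, RUN_PRICE)

print(f"Illumina: ${illumina_cost:.0f}")  # Illumina: $15
print(f"454: ${roche454_cost:.0f}")       # 454: $1500
```

Both figures fall inside the ranges on the slide; library preparation and service overhead are not included.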

De novo genome assembly
› Study of 1 single species
› Need for species isolation

Metagenomics analysis
› Study of a community of species
› No need for isolation (culturing bias!)
› Study the collective gene pool and function of the community/ecology
› No need for individual functions

Practical
1. Get bacterial DNA or RNA from a sample
    Soil
    Gut/fecal
    Ocean water (e.g. Craig Venter)
    …
2. Sequence: long or short reads (1-10 days)
3. Obtain your sequences
4. Map on a database of known genes (1 day)
5. Annotate/analyse the community (weeks)

2010: Giant Panda genome (2nd carnivore)
› No umami taste receptor -> no meat affinity
› The panda is more a dog than a bear
› The panda is a carnivore eating bamboo!

Still 2010!: Panda ‘microbiome’
Gut microbiome of the panda reveals the presence of bamboo/cellulose-degrading pathways

A clinical example: gut microbiome can predict diabetes and malnourishment

PLoS ONE (2011), Brown et al.
PLoS ONE (2010), Valladares et al.
Gut Pathogens (2011), Gupta et al.

Classical SNP analysis - practical
1. Design PCR primers
2. Generate amplicons
3. Re-sequence using long read sequencing
    Conserve ‘SNP blocks’
4. Detect SNPs
5. Correlate SNPs to drug resistance, severity of symptoms …
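Step 4 above (detect SNPs) can be sketched as a per-position majority vote over aligned amplicon reads. This is a minimal illustration, assuming the reads are already aligned to the reference without insertions or deletions; the function name `call_snps` and the 80% threshold are assumptions, not a standard tool.

```python
from collections import Counter

def call_snps(reference, aligned_reads, min_fraction=0.8):
    """Report 0-based positions where the majority base of the reads differs
    from the reference and is supported by at least min_fraction of the reads.

    Assumes every read has the same length as the reference (no indels).
    """
    snps = []
    for pos, ref_base in enumerate(reference):
        counts = Counter(read[pos] for read in aligned_reads)
        base, n = counts.most_common(1)[0]
        if base != ref_base and n / len(aligned_reads) >= min_fraction:
            snps.append((pos, ref_base, base))
    return snps

reference = "ACGTACGTAC"
reads = [
    "ACGCACGTAC",
    "ACGCACGTAC",
    "ACGCACGAAC",  # single-read mismatch at position 7: treated as sequencing error
    "ACGCACGTAC",
    "ACGCACGTAC",
]
print(call_snps(reference, reads))
# [(3, 'T', 'C')]
```

The threshold separates consistent differences from isolated sequencing errors; the resulting SNP list is what step 5 would correlate to drug resistance or symptoms.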

Amplicon resequencing is the same for human, prokaryotic, viral analyses

Many standardized out-of-the-box solutions available

Very simple analysis
Watch out for the overkill…
› Don’t use a bazooka to kill a fly!
› Throughput can be too high

Profile the coding region of hepatitis C

Lauck et al. 2012

Use next-generation sequencing to predict the optimal HIV therapy

Thielen et al. 2012

Imagine the following research questions
› Which (known) species/groups are present in a certain sample?
› Does this composition alter given a certain treatment, change of conditions, patients etc.?

No need for de novo genome sequencing

No metagenomics: species instead of functions

Prokaryotes have the gene 16S rDNA, coding for ribosomal RNA

The 16S rDNA region is 1.5 kb long
16S rDNA is specific for each species/strain
Theoretical: 4^1,500 ≈ 10^903 possibilities

In practice: 16S rDNA sequence known for millions of species
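The theoretical number of distinct 1,500 bp sequences can be checked quickly: four possible bases per position gives 4^1500, and 1500 x log10(4) is about 903. A short verification sketch:

```python
import math

LENGTH_BP = 1500  # approximate length of the 16S rDNA region

# Exponent of the equivalent power of ten: 4^1500 = 10^(1500 * log10(4))
exponent = LENGTH_BP * math.log10(4)
print(f"4^{LENGTH_BP} ~ 10^{exponent:.1f}")  # 4^1500 ~ 10^903.1

# Exact check with Python's arbitrary-precision integers
digits = len(str(4 ** LENGTH_BP))
print(digits)  # 904 decimal digits, i.e. between 10^903 and 10^904
```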

16S rDNA can be isolated in different species using universal PCR primers
› Isolate/amplify different regions using the same primers
Compare the isolated sequences against a database of known sequences

Practical procedure
1. Sample an environment and isolate DNA
2. Do a universal PCR amplification
3. Sequence using long read sequencing: the longer the better!
4. Obtain sequences
5. Map sequences against a reference database
6. Annotate the data
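Step 5 of the procedure above (map sequences against a reference database) can be illustrated with a toy classifier that assigns each read to the reference sharing the most k-mers with it. The sequences, species names and k=4 are made up for illustration; real pipelines use dedicated aligners or classifiers and far longer references.

```python
def kmers(seq, k=4):
    """Set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def classify(read, references, k=4):
    """Name of the reference sharing the most k-mers with the read, or None."""
    read_kmers = kmers(read, k)
    best_name, best_score = None, 0
    for name, ref_seq in references.items():
        score = len(read_kmers & kmers(ref_seq, k))
        if score > best_score:
            best_name, best_score = name, score
    return best_name

references = {  # hypothetical toy 'database'
    "species_A": "ACGTTGCAAGGCTTAC",
    "species_B": "TTGACCGGTATCGGAA",
}
reads = ["GTTGCAAGG", "CCGGTATCG", "AAAAAAAAA"]
print([classify(r, references) for r in reads])
# ['species_A', 'species_B', None]
```

Reads that match nothing in the database stay unassigned, which mirrors the limitation that only known sequences can be identified this way.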

Example: The Antarctica project
› Which parameters determine the composition of bacterial communities in Antarctic lakes?
› 20 different samples/lakes
› Sequence 16S rDNA genes
› 1 x 454 run (1 million 500 bp sequences)
› Map all sequences back to the RDP database

Analyse the data using computing power
› Compare different locations
    Is species A present in location 1, location 2, …?
› Assess the distribution in a single location
    How dominant is the most dominant species in location 1?
    How many species are in location 1?
    …
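Once every read has a taxon label, the questions above reduce to simple counting. A minimal sketch on made-up data (the location names, taxon labels and the `summarize` helper are hypothetical):

```python
from collections import Counter

# Hypothetical result of the mapping step: one taxon label per mapped read.
assignments = {
    "location1": ["A", "A", "A", "B", "C", "A"],
    "location2": ["B", "B", "D", "B"],
}

def summarize(labels):
    """Richness (number of taxa), dominant taxon and its share of the reads."""
    counts = Counter(labels)
    taxon, n = counts.most_common(1)[0]
    return {"richness": len(counts), "dominant": taxon, "dominance": n / len(labels)}

summary = {loc: summarize(labels) for loc, labels in assignments.items()}
presence_of_A = {loc: "A" in labels for loc, labels in assignments.items()}

for loc, stats in summary.items():
    print(loc, stats["richness"], stats["dominant"], round(stats["dominance"], 2))
# location1 3 A 0.67
# location2 2 B 0.75
print(presence_of_A)
# {'location1': True, 'location2': False}
```

Read counts are only a proxy for abundance, so comparisons between locations are best made on relative shares rather than raw counts.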

Visualize !

Analyse different samples on different taxonomic levels
› Include taxonomic tree of life of bacteria
› Use a ‘taxonomy browser’

Analyse a single location

Compare different locations

Analysis                 Lab work difficulty        Analysis difficulty
De novo genome           ++ (isolate)               +
Metagenomics             +                          +++ (pathways etc.)
SNP                      +++ (design primers)       ++ (correlate)
Species quantification   ++ (universal primers)     ++

Viral profiling
› Viral profiling = prokaryotic profiling, but…
    Cheaper
    Faster
    Easier
› De novo genome sequencing = OK
› Don’t spend $10,000 on a 100 kb genome!
› Multiplexing/pooling capacity is limited!

Watch out for the overkill
› An Illumina run can be split into 8 lanes
› >20 samples per lane can be combined
    Still >100 Mb per sample…
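The overkill can be put in numbers by dividing one run over lanes and pooled samples. A back-of-the-envelope sketch, assuming the 100 Gb/run Illumina yield quoted earlier, exactly 20 samples per lane, an even split between samples and a 100 kb viral genome:

```python
RUN_YIELD_BP = 100e9       # assumed Illumina yield per run (100 Gb)
LANES = 8
SAMPLES_PER_LANE = 20      # lower bound given on the slide (>20)
VIRAL_GENOME_BP = 100_000  # 100 kb viral genome

per_sample_bp = RUN_YIELD_BP / (LANES * SAMPLES_PER_LANE)
coverage = per_sample_bp / VIRAL_GENOME_BP

print(f"{per_sample_bp / 1e6:.0f} Mb per sample")  # 625 Mb per sample
print(f"{coverage:.0f}x coverage")                 # 6250x coverage
```

Even with a full set of pooled samples, each viral genome would be covered thousands of times, far more than the ~30x used for assembly earlier.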

Thanks for your attention !

joachim.deschrijver@ugent.be