High-Throughput Sequencing

Post on 03-Dec-2014


Talk on High-Throughput Sequencing: Overview and Selected Applications for Masters students. Nov 9th 2011

PROFESSOR MARK PALLEN
UNIVERSITY OF BIRMINGHAM

High-throughput sequencing

Overview and selected applications

Outline

What is high-throughput sequencing?
  How it works
  Key considerations
Applications
  Clinical Microbiology
  Cancer Biology

Conventional Sequencing

Sanger dideoxy chemistry: 1970s
Bacterial genome sequencing: 1990s
  Whole-genome shotgun
  Clonal populations of template molecules in vector/cloning host
  Read lengths >500 bp
  De novo assembly
Drawbacks
  Time-consuming, expensive, onerous
  Beyond average project grant
  Out of reach of university infrastructure
  Relies on colony propagation and picking
  Some sequences cannot be cloned

High-throughput Sequencing

>100x faster, >100x cheaper!
A disruptive technology
Three “second-generation” technologies in the marketplace
  454 (Roche)
  Solexa (Illumina)
  SOLiD (ABI)
Fundamentally new approaches
  Solid-phase amplification of clonal templates in “molecular colonies”
  Massive increase in number of “clones” compensates for shorter read length
New chemistries for sequence reading
  Pyrophosphate detection (PPi release upon base addition): 454
  Reversible addition of fluorescent nucleotides (reversible terminators): Solexa
  Sequencing by ligation: SOLiD

Recent Developments

Single-molecule sequencing
  Pacific Biosciences (PacBio)
  Nanopore
Benchtop sequencers
  Ion Torrent
  MiSeq

Sequencing in Birmingham

454 Life Sciences (Roche)

Solexa/Illumina Sequencing

SOLiD Sequencing

Requires emPCR
Long run-times
Short read-lengths (stuck at 50 bp)
Sequences in colour space
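"Colour space" refers to SOLiD's two-base encoding, in which each colour call reports a pair of adjacent bases rather than a single base. The sketch below assumes the usual four-colour scheme, where the colour is the XOR of two-bit base codes (A=0, C=1, G=2, T=3). The function names are illustrative and do not come from any vendor software.

```python
# Two-base (colour-space) encoding sketch; names are illustrative only.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"

def to_colour_space(seq):
    """Encode a base sequence as its first base plus one colour per adjacent pair."""
    colours = [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(seq, seq[1:])]
    return seq[0] + "".join(str(c) for c in colours)

def from_colour_space(encoded):
    """Decode by walking the colours forward from the known first base."""
    bases = [encoded[0]]
    for colour in encoded[1:]:
        bases.append(CODE_BASE[BASE_CODE[bases[-1]] ^ int(colour)])
    return "".join(bases)

encoded = to_colour_space("ATGGC")
print(encoded, from_colour_space(encoded))
```

Because each decoded base depends on the previous one, a single miscalled colour corrupts every base after it, which is why colour-space reads are normally aligned in colour space rather than decoded first.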

Vendor:               Roche                   Illumina                ABI
Technology:           454                     Solexa GA               SOLiD
Platform:             GS20    FLX     Ti      I       II      IIx     1       2       3
Reads (M):            0.5     0.5     1.25    28      100     150     40      115     320

Fragment
Read length:          100     200     400*    35      50      100     25      35      50
Run time (d):         0.25    0.3     0.4     3       3       5       6       5       8
Yield (Gb#):          0.05    0.1     0.5     1       5       15      1       4       16
Rate (Gb/d):          0.2     0.33    1.25    0.33    1.67    3       0.34    1.6     2
Images (TB#):         0.01    0.01    0.03    0.5     1.1     2.8     1.8     2.5     1.9

Paired-end
Read length:          -       200     400     2×35    2×50    2×100   2×25    2×35    2×50
Insert (kb):          -       3.5     3.5*    0.2     0.2     0.2     3       3       3
Run time (d):         -       0.3     0.4     6       10      10      12      10      16
Yield (Gb):           -       0.1     0.5     2       9       30      2       8       32

Source: http://www.politigenomics.com/next-generation-sequencing-informatics
* Now improved to 1 kb reads and choice of 3, 8 or 20 kb inserts
# b = bases, B = bytes
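The fragment-run rows of the table are related by simple arithmetic: yield is roughly reads times read length, and rate is yield divided by run time. The sketch below recomputes both from the figures given for the 454 and Illumina platforms. It is restricted to those six columns for brevity, and the variable names are illustrative.

```python
# Fragment-run figures copied from the table (454 and Illumina columns only).
platforms = ["GS20", "FLX", "Ti", "GA I", "GA II", "GA IIx"]
reads_m   = [0.5, 0.5, 1.25, 28, 100, 150]   # millions of reads
read_len  = [100, 200, 400, 35, 50, 100]     # bases per read
yield_gb  = [0.05, 0.1, 0.5, 1, 5, 15]       # gigabases, as tabulated
run_days  = [0.25, 0.3, 0.4, 3, 3, 5]

# Yield (Gb) = reads (millions) x read length (bases) / 1000
computed_yield = [round(n * length / 1000, 2) for n, length in zip(reads_m, read_len)]

# Rate (Gb/d) = tabulated yield / run time
rates = [round(y / d, 2) for y, d in zip(yield_gb, run_days)]

for name, y, r in zip(platforms, computed_yield, rates):
    print(f"{name}: yield ~{y} Gb, rate {r} Gb/d")
```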

Moore’s law applies!

The Sequencing Singularity!

Everything published is out of date!

Modes and Applications

For some applications, 454 read length is essential, e.g.
  amplicon sequencing (otherwise assembly will create chimeras)
  differential splicing; translocations
For other applications, read number is more important and read length less so
  Transcriptomics, where a 35-base read will identify the transcript
  SNP discovery/screening

Modes and Applications

Modes
  Basic shotgun ‘library’
  Paired-end or mate-pair shotgun
  Amplicon sequencing
Applications
  Whole genome
  Metagenome, phylogenetic profiling
  Transcriptome
  SNP analysis; splice variants; methylation
  Targeted sequence capture by microarray; small RNAs

Modes and Applications

Sequencing run is the basic unit
  Basic cost of 454 or Illumina: ~several £1000s per run in consumables and essential on-costs
  Additions for consumables and/or staff time for multiple library preparation, some modes (e.g. paired end), data analysis, etc.
Run can be subdivided
  Plate-dividing gaskets (loss of wells)
  Multiplex identifiers (MIDs or sequence barcodes)
So cost per sample may be ~£10s, not £1000s!
But logistics of filling a plate may incur delays
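Multiplexing with MIDs can be sketched as follows: each read is assigned to a sample by its leading barcode, and the run cost is then shared across the samples. This is a minimal illustration that assumes exact barcode matches at the start of each read. The barcodes, reads, run cost and sample count are all hypothetical.

```python
from collections import defaultdict

def demultiplex(reads, barcodes):
    """Assign each read to a sample by its leading barcode (MID) and trim the barcode."""
    bins = defaultdict(list)
    for read in reads:
        for sample, mid in barcodes.items():
            if read.startswith(mid):
                bins[sample].append(read[len(mid):])
                break
        else:
            bins["unassigned"].append(read)
    return dict(bins)

# Hypothetical barcodes and reads, for illustration only.
barcodes = {"sample_1": "ACGT", "sample_2": "TGCA"}
reads = ["ACGTGGGATTC", "TGCACCCTTAG", "ACGTTTTGGCA", "GGGGAAAACCC"]
bins = demultiplex(reads, barcodes)

run_cost_gbp = 5000     # assumed figure within "several £1000s"
samples_per_run = 96    # assumed number of barcoded samples
cost_per_sample = round(run_cost_gbp / samples_per_run, 2)
print(bins)
print(cost_per_sample)
```

Real demultiplexers usually tolerate one or two barcode mismatches; exact matching is used here only to keep the sketch short.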

“De novo assembly” versus “alignment against template” (aka “re-sequencing”)

Bacterial Genomic Epidemiology

Genome sequencing brings the advantages of
  open-endedness (revealing the “unknown unknowns”)
  universal applicability
  ultimate in resolution
High-throughput platforms: 454, Illumina, PacBio
  Expense and set-up put them beyond the average lab
Bench-top sequencing platforms generate data sufficiently quickly and cheaply to have an impact on real-world clinical and epidemiological problems

The Birth of Genomic Epidemiology for Bacteria

Sequencing in Birmingham

@mjpallen @pathogenomenick

#AAMTHI

Case Study Acinetobacter baumannii

Gram-negative bacillus
Multi-drug resistant
  colistin and tigecycline as reserve agents
  moving towards pan-resistance
Associated with
  wound infections and ventilator-associated pneumonia
  bloodstream infections
  returning military personnel from Iraq and Afghanistan
  transmission from military to civilian patients

Acinetobacter baumannii: problems

Hard to identify in clinical laboratory
  Two related genomospecies, 3 and 13TU (now A. pittii and A. nosocomialis), impossible to distinguish phenotypically
Outbreak strains can be identified by PFGE, VNTR and gene-specific assays
  BUT mode of spread and transmission chains often uncertain, hindering optimal management of outbreaks and rational design of policies

Mechanism of resistance hard to identify in individual cases

Poor understanding of pathogen biology

Applications and Questions

Epidemiology
  Q1: Can whole-genome sequencing detect differences between isolates within an outbreak?
  Q2: Can these differences be used to help determine chains of transmission?
Emergence of Resistance
  Q3: Can it reveal how resistance emerges?
Taxonomy and Identification
  Q4: Can it tell us what defines a species within a genus?

Acinetobacter Genomic Epidemiology

Outbreak in Birmingham Hospital in 2008
Isolates indistinguishable by current typing methods

Acinetobacter Genomic Epidemiology

454 whole-genome sequencing of 6 isolates
SNP detection by mapping reads against draft reference assembly
SNP filtering for false positives
SNP validation with Sanger sequencing of PCR amplicons
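The slide does not say which filters were applied, so the following is only a sketch of what a false-positive filter of this kind might look like. The fields, thresholds and candidate sites are hypothetical. The homopolymer check reflects the known tendency of 454 reads to miscall homopolymer lengths.

```python
# Candidate SNPs from read mapping; fields and thresholds are hypothetical.
candidates = [
    {"pos": 1021, "depth": 42, "alt_reads": 41, "homopolymer": False},
    {"pos": 5570, "depth": 6,  "alt_reads": 6,  "homopolymer": False},  # low coverage
    {"pos": 8842, "depth": 38, "alt_reads": 12, "homopolymer": False},  # mixed signal
    {"pos": 9913, "depth": 40, "alt_reads": 39, "homopolymer": True},   # homopolymer run
]

def passes_filters(snp, min_depth=10, min_alt_fraction=0.9):
    """Keep well-covered, near-unanimous calls outside homopolymer runs."""
    if snp["depth"] < min_depth:
        return False
    if snp["alt_reads"] / snp["depth"] < min_alt_fraction:
        return False
    return not snp["homopolymer"]

kept = [snp["pos"] for snp in candidates if passes_filters(snp)]
print(kept)
```

Sites that survive filtering would then go forward to Sanger validation, as on the slide.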

Outbreak isolates distinguishable at only three loci

Isolate   SNP 1   SNP 2   SNP 3
AB0057    C       A       G
M1        C       A       G
M2        T       A       G
M3        T       A       T
M4        T       A       G
C1        T       T       G
C2        T       A       G
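One way to read the three-locus table is to group isolates by identical SNP profile and count pairwise differences, which is the raw material for inferring transmission chains. The sketch below uses the alleles exactly as tabulated. The helper names are illustrative.

```python
from itertools import combinations

# Alleles at SNP 1, SNP 2 and SNP 3, copied from the table.
snps = {
    "AB0057": "CAG",
    "M1": "CAG",
    "M2": "TAG",
    "M3": "TAT",
    "M4": "TAG",
    "C1": "TTG",
    "C2": "TAG",
}

def snp_distance(a, b):
    """Number of loci at which two allele profiles differ."""
    return sum(x != y for x, y in zip(a, b))

# Group isolates that share an identical three-locus profile.
profiles = {}
for isolate, alleles in snps.items():
    profiles.setdefault(alleles, []).append(isolate)

# Pairwise SNP distances between all isolates.
distances = {
    (i, j): snp_distance(snps[i], snps[j]) for i, j in combinations(snps, 2)
}

print(profiles)
print(distances[("M1", "M3")])
```

With only three variable loci, several isolates share a profile, so these data can rule some transmission links in or out but cannot order isolates within a group.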

Before and after tigecycline therapy

Genomes of two Acinetobacter baumannii isolates from a single patient sequenced
  AB210 before tigecycline therapy (susceptible); 454-sequenced
  AB211 after therapy (resistant); Illumina-sequenced

Before and after tigecycline therapy

Eighteen SNPs detected between AB210 and AB211
  nine non-synonymous, including a SNP in adeS which accounts for resistance phenotype
Three contigs in AB210 not covered by reads in AB211, representing three deletions of ~15, 44 and 17 kb
  mutS truncated; likely increase in mutation rate

Ion Torrent

Millions of wells reading sequences
Microchip detects release of protons
~3 hour run-time
~£500 cost per run

Applications: Cancer Biology

Malignant Darwinism

Mutational frequency heterogeneity analysis to become an integral component of molecular pathology

Cancer is an evolutionary process

Applications: Cancer Biology

Genome versus exome versus transcriptome
Even a transcriptome provides
  abundance of RNAs
  expressed mutations (point mutations, indels, inversions), alternative and novel splicing, gene fusions, RNA editing

Applications: Cancer Biology

Deep precision measurements of mutation frequency in a tissue can be made using next generation sequencing of PCR amplicons spanning the mutation
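In practice, the mutation frequency from amplicon sequencing is the fraction of reads carrying the variant, and read depth determines how tightly that fraction is pinned down. The sketch below computes the frequency with a Wilson score interval. The read counts are hypothetical, and sequencing error is ignored, which matters for very rare variants.

```python
import math

def allele_frequency(variant_reads, total_reads, z=1.96):
    """Return the variant frequency and a Wilson score interval (z=1.96 for ~95%)."""
    p = variant_reads / total_reads
    denom = 1 + z**2 / total_reads
    centre = (p + z**2 / (2 * total_reads)) / denom
    half = z * math.sqrt(
        p * (1 - p) / total_reads + z**2 / (4 * total_reads**2)
    ) / denom
    return p, centre - half, centre + half

# Hypothetical counts: 150 of 10,000 amplicon reads carry the mutation.
freq, low, high = allele_frequency(150, 10_000)
print(f"{freq:.3%} (95% CI {low:.3%} to {high:.3%})")
```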

Challenges

In recent cancer genomes ~50% of predicted SNVs from the primary sequence data could not be revalidated.

Many private germline polymorphisms still exist in every individual, so additional qualification against germline DNA is always necessary to distinguish somatic variants
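The germline qualification step amounts to subtracting the variants seen in the patient's normal DNA from those called in the tumour. The sketch below shows the idea with Python sets. All variants are hypothetical, and a real pipeline would also account for coverage in the normal sample before calling a variant somatic.

```python
# Variants as (chromosome, position, ref, alt); all values are hypothetical.
tumour_calls = {
    ("chr1", 12345, "C", "T"),
    ("chr3", 67890, "G", "A"),
    ("chr7", 13579, "A", "G"),
}
germline_calls = {
    ("chr1", 12345, "C", "T"),  # private germline polymorphism, also seen in tumour
    ("chr9", 24680, "T", "C"),
}

def somatic_candidates(tumour, germline):
    """Keep tumour variants that are absent from the matched germline sample."""
    return tumour - germline

somatic = somatic_candidates(tumour_calls, germline_calls)
print(sorted(somatic))
```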

Applications: Cancer Biology

Coding SNPs
  dominated by a few frequently mutated loci (oncogenes or tumour suppressors)
  long tail of population-infrequent SNPs
  driver/passenger distinction
  regulatory sequence mutations yet to be explored
Hundreds of genomes for each cancer type required to make sense of the mutations seen?
BUT driver mutations in some cancer subtypes found with much smaller studies
  C134W FOXL2 mutation in adult-type granulosa cell tumours, found from the transcriptomes of four granulosa cell cases

Multiple Displacement Amplification

Single-cell Genomics

(Or FACS or dilution or microfluidics)

What will you do when you can sequence everything?

Further Information

High-throughput sequencing technology
  http://pathogenomics.bham.ac.uk/blog
  http://www.nature.com/nrg/journal/v11/n1/pdf/nrg2626.pdf
  http://onlinelibrary.wiley.com/doi/10.1002/smll.200900976/pdf
  http://dx.doi.org/10.1016/j.tibtech.2008.07.003
Clinical Microbiology
  Pallen, Loman, Penn. High-throughput sequencing and clinical microbiology: progress, opportunities and challenges. Current Opinion in Infectious Disease
  http://www.sciencedirect.com/science/journal/13695274

Further Information

Cancer genomics
  http://www.ncbi.nlm.nih.gov/pubmed/19921711,19918804,20016485,20164919,20016488,20200521,20371490
  http://www.nature.com/nature/journal/v458/n7239/pdf/nature07943.pdf
  http://www.nature.com/news/2010/100414/pdf/464972a.pdf
  http://www.nature.com/nature/journal/v464/n7289/pdf/464678a.pdf
  http://www.nature.com/nature/journal/v464/n7289/pdf/464679a.pdf
  http://omicsomics.blogspot.com/2010/04/value-of-cancer-genomics.html
  http://cancergenome.nih.gov/
  http://www.sanger.ac.uk/genetics/CGP/
  http://scienceonline.org/cgi/content/full/sci;327/5969/1074
  http://app2.capitalreach.com/esp1204/servlet/tc?cn=aacr&c=10165&s=20435&e=12623&&m=1&br=80&audio=false
  http://app2.capitalreach.com/esp1204/servlet/tc?cn=aacr&c=10165&s=20435&e=12624&&m=1&br=80&audio=false