6/23/11 Technologies&for& metagenomic&data …statweb.stanford.edu/~susan/Summer11/Lect2-NGS.pdf ·...

2
6/23/11 1 Technologies for metagenomic data collec7on Overview “Old school techniques” Next genera7on sequencing Microarrays: PhyloChip Old school techniques Pla7ng Can only work with culturable strains Very laborious Flow cytometry (unconven7onal) Davey, HM. and Kell, DB. Flow cytometry and cell sor7ng of heterogeneous microbial popula7ons: the importance of singlecell analysis. Microbiological reviews, Dec. 1996, 641696. qPCR Need primers for every species Cannot iden7fy previously unknown species Laborious “Old genera7on sequencing”: Sanger sequencing DNA is fragmented Cloned to a plasmid vector Cyclic sequencing reac7on Separa7on by electrophoresis Readout with fluorescent tags Current genera7on next genera7on sequencing pla[orm comparison Read length * Gb per run * Technology Illumina www.illumina.com GA II / HiSeq 2 x 100 bp 20+ Gb Bridge amplifica7on 454 www.454.com GS FLX Titanium 1x400600 bp 2x140200 bp ~0.5 Gb Emul7on PCR amplifica7on Pyrosequencing by extension Applied Biosystems www.appliedbiosystem s.com SOLiD 3 2 x 50 bp 20+ Gb Emulsion PCR amplifica7on Liga7onbased sequencing Alignment in color space Helicos www.helicosbio.com Single molecule 2 x 2555 bp 21–28Gb No amplifica7on Single molecule sequencing * Instrument vendors are constantly upda7ng their technology, chemistry, and QA in order to increase these parameters Emulsion PCR Fragments, with adaptors, are PCR amplified within a water drop in oil. One primer is aiached to the surface of a bead. Used by 454, Polonator and SOLiD. Bridge PCR DNA fragments are flanked with adaptors. A flat surface coated with two types of primers, corresponding to the adaptors. Amplifica7on proceeds in cycles, with one end of each bridge tethered to the surface. Used by Solexa.

Transcript of 6/23/11 Technologies&for& metagenomic&data …statweb.stanford.edu/~susan/Summer11/Lect2-NGS.pdf ·...

Page 1: 6/23/11 Technologies&for& metagenomic&data …statweb.stanford.edu/~susan/Summer11/Lect2-NGS.pdf · 2011-06-24 · 3(!,!-./0.1$2 Most variance in the data is attributable to the 16S

6/23/11  

1  

Technologies  for  metagenomic  data  collec7on  Overview  

•  “Old  school  techniques”  •  Next  genera7on  sequencing  •  Microarrays:  PhyloChip  

Old  school  techniques  

•  Pla7ng  – Can  only  work  with  culturable  strains  – Very  laborious  

•  Flow  cytometry  (unconven7onal)  –  Davey,  HM.  and  Kell,  DB.  Flow  cytometry  and  cell  sor7ng  of  heterogeneous  microbial  popula7ons:  

the  importance  of  single-­‐cell  analysis.  Microbiological  reviews,  Dec.  1996,  641-­‐696.  

•  qPCR  – Need  primers  for  every  species  – Cannot  iden7fy  previously  unknown  species  – Laborious  

“Old  genera7on  sequencing”:  Sanger  sequencing  

•  DNA  is  fragmented  

•  Cloned  to  a  plasmid  vector  

•  Cyclic  sequencing  reac7on  

•  Separa7on  by  electrophoresis  

•  Readout  with  fluorescent  tags  

Current  genera7on  next  genera7on  sequencing  pla[orm  comparison  

Read  length*   Gb  per  run*  

Technology  

Illumina  www.illumina.com    

GA  II  /HiSeq  

2  x  100  bp   20+  Gb   Bridge  amplifica7on  

454  www.454.com      

GS  FLX  Titanium  

1x400-­‐600  bp  2x140-­‐200  bp  

~0.5  Gb   Emul7on  PCR  amplifica7on  Pyrosequencing  by  extension  

Applied  Biosystems  www.appliedbiosystems.com    

SOLiD  3   2  x  50  bp   20+  Gb   Emulsion  PCR  amplifica7on    Liga7on-­‐based  sequencing    Alignment  in  color  space  

Helicos  www.helicosbio.com    

Single  molecule  

2  x  25-­‐55  bp   21–28Gb   No  amplifica7on  Single  molecule  sequencing  

*  Instrument  vendors  are  constantly  upda7ng  their  technology,  chemistry,  and  QA  in  order  to  increase  these  parameters  

Emulsion  PCR  

•  Fragments,  with  adaptors,  are  PCR  amplified  within  a  water  drop  in  oil.  

•  One  primer  is  aiached  to  the  surface  of  a  bead.    

•  Used  by  454,  Polonator  and  SOLiD.  

Bridge  PCR  

•  DNA  fragments  are  flanked  with  adaptors.  •  A  flat  surface  coated  with  two  types  of  primers,  corresponding  

to  the  adaptors.  •  Amplifica7on  proceeds  in  cycles,  with  one  end  of  each  bridge  

tethered  to  the  surface.  •  Used  by  Solexa.  

Page 2: 6/23/11 Technologies&for& metagenomic&data …statweb.stanford.edu/~susan/Summer11/Lect2-NGS.pdf · 2011-06-24 · 3(!,!-./0.1$2 Most variance in the data is attributable to the 16S

6/23/11  

2  

Roche  454  GS  FLX  Sequencing  

•  1  million  reads  per  run  

•  400-­‐500  base  pares  per  read    

•  Ability  to  mul7plex  (sequence  many  specimens  in  a  single  run)  with  bar  codes  

Marcel  Margulies,  et.  al.  Nature  437,  376-­‐380  (15  September  2005)    

How  does  Roche  454  sequencing  work?  

•  Nucleo7des  are  cycled  in  order  T,  A,  C,  G  (“flows”)  

•  When  incorporated  onto  a  template  a  fluorescent  signal  is  emiied  

•  The  amount  of  fluorescence  is  recorded  

•  The  reagents  are  washed  and  the  flow  cycle  restarts  

hip://www.youtube.com/watch?v=kYAGFrbGl6E    

454  base  calling  

>ACGCTCTTGGCCAGTCTAGTCGTGCG  >CCAGTCTAGTCGTGCGACGCTCTTGG  >GGCCAGTCTAGTCGTGCGACGCTCTT  >ACGCTCTTAGTCTAGTCGTGCGGGCC  >TCTTGGCCAGTCTAGTCGTGCGACGC  

T:0!A:1!C:1!G:0!T:1!A:1!C:0!G:2!T:1!A:1!

ACTAGGTA  

General  strategy  for    

•  Select  a  marker  genomic  region  – Must  be  present  in  all  bacteria  – Must  have  regions  constant  for  all  bacteria  for  primer  design  

– Must  have  regions  of  variability  so  that  individual  species/taxa  can  be  dis7nguished  

•  Extract  DNA  from  biological  specimens  •  Amplify  the  marker  from  extracted  DNA  •  Sequence  mul7ple  samples  in  mul7plex  

Targeted  sequencing  of  16S  rRNA  gene  V2  V1   V4  V3   V6  V5   V8  V7   V9  

A   B  454  adaptors  

Barcodes  (10  nt)  

Forward/reverse  primers  

5’  5’  

Loca7on  of  the  hyper-­‐variable  regions  of  the  16S  rRNA  and  a  typical  primer  construct  for  amplicon  library  prepara7on  

500   1000   1500  

16S  rRNA  small  subunit  is  commonly  used  •   Present  in  “all”  bacteria  •   Contains  variable  regions  that  (may)  carry  enough  phylogene7c  informa7on  to  dis7nguish  different  bacteria  on  a  “deep-­‐enough”  taxonomic/phylogene7c  level  

Limita7ons  

•  Unique  sequences  represent  unique  phylotypes,  not  species,  due  to  16S  gene  mul7plicity  

•  Abundances  computed  with  different  primers  are  not  comparable  

!"#$%&!

!!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

! !

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!'()'*!

!'*)'+!

((,!-./0.1$2

3(!,!-./0.1$2

Most variance in the data is attributable to the 16S rRNA locus at which the microbiomes are assessed. !

Similarity of phyla (left) and genera (right) identified using different primers.!

!"#$%&'()'*+,-.'/0'1(##(0'

!"#$""%&'()'*&+%,&'*)'-

!"

#!

#"

!"#!$$%&&%'!%#!"

!"#!$

!"##"$

!%#!"

!"#$%&'()'*%+%&,'-+'.(##(+'

!"#$""%&'()'*&+%,&'*)'-

%$&

&%%

%$&

'()*+,-.-$$

#'()*$+,+-.

#'()*$+,+/0