Genome&Sequencing:&Introduc2on&...

47
Genome Sequencing: Introduc2on to Fragment Assembly Lecture 5: September 4, 2012

Transcript of Genome&Sequencing:&Introduc2on&...

Page 1: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

Genome  Sequencing:  Introduc2on  to  Fragment  Assembly  

Lecture  5:  September  4,  2012    

Page 2: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

Review  from  Last  Lecture  

Page 3: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

Sample  Prepara2on  

Fragments  

Page 4: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

Sample  Prepara2on  

Sequencing  

ACGTAGAATCGACCATG

GGGACGTAGAATACGAC

ACGTAGAATACGTAGAA

Reads  

Fragments  

Next  Genera2on  Sequencing  (NGS)  

Page 5: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

Sample  Prepara2on  

Sequencing  

Assembly  

ACGTAGAATACGTAGAAACAGATTAGAGAG…

Con2gs  

Fragments  

Reads  

ACGTAGAATCGACCATG GGGACGTAGAATACGAC

ACGTAGAATACGTAGAA

Page 6: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

6  

Sample  Prepara2on  

Sequencing  

Assembly  

Analysis  

Fragments  

Reads  

Con2gs  

“…the  ability  to  determine  DNA  sequences  is  star2ng  to  outrun  the  ability  of  researchers  to  store,  transmit  and  especially  to  analyze  the  data.”  

 -­‐  New  York  Times,  November  30,  2011  

Page 7: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

Sample  Prepara2on  

Sequencing  

Assembly  

Analysis  

Fragments  

Reads  

Con2gs  

Page 8: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

Algorithms  for  Fragment  Assembly  

Page 9: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

Whole  Genome  Shotgun  Sequencing  

Genome  amplified  and  sliced  into  smaller  fragments  (>=600bp)  

Genome  

Build  consensus  sequence  from  overlap  

Page 10: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

Tradi2onal  (“Sanger”)  Sequencing  •  Sequence  shotgun  fragments  of  length  600  bp  using  Sanger  sequencing.  

•  Fragment  Assembly  is  accomplished  using  “overlap-­‐layout-­‐consensus”  approach:    •  overlap:  matching  all  possible  reads  and  finding  any  overlapping.  

•  layout:  finding  order  of  reads  along  DNA  and  pu_ng  them  together.  

•  consensus:  deriving  how  sequence  will  appear  based  on  layout.  

Page 11: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

Overlap-­‐Layout-­‐Consensus  Approach  •  Build  an  overlap  graph  where  each  node  represents  a  read.  An  edge  exists  between  two  reads  if  they  overlap  

•  Traverse  the  graph  to  find  unambiguous  paths  which  form  the  con2gs  

Page 12: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

Problems!  •  The  main  problem  with  this  approach  is  that  it  is  very,  very,  very  slow  and  will  only  work  on  small  genomes  or  low  coverage.  

•  Not  commonly  used  for  complete  assembly,  however,  some  sobware  tools  s2ll  use  this  approach:  –  Celera:  genome  assembler  for  454,  PacBio,  and  Illumina  data    

–  LOCAS:  Resequencing  genomes.  – HapAssembler:  for  sequencing  highly  polymorphic  genomes  

Page 13: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

Problems!  

Unfortunately,  overlap-­‐layout-­‐consensus  approach  will  not  work  for  NGS  data  or  significantly  large  genomes:  – There  is  too  much  data.    Calcula2ng  the  overlap  for  each  pair  of  reads  would  take  way  to  much  2me.    – There  has  to  be  a  new  method  for  fragment  assembly.  

Page 14: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

De  Bruijn  Graph  Approach  to  Assembly  

Page 15: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

De  Bruijn  Graph  for  Assembly  •  Introduced  in  1989.    

   •  Adapted  for  next  genera2on  sequencing  data.  

 

Pevzner.  J  Biomol  Struct  Dyn  (1989)  7:63—73.  Iduly  &  Waterman.  J.  Comput  Biol  (1995)  2:291—306.  

Euler-­‐SR:  Chaisson  &  Pevzner.  Genome  Res.  (2008)  18:324—30.  Velvet:  Zerbino  &  Birney.  Genome  Res.  (2008)  18:821—29.  ALLPATHS:  Butler  et  al.  Genome  Res.  (2008)  18(5):810—20.  ABySS:  Simpson  et  al.  Genome  Res  (2009)  19:1117—1123.    

Page 16: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

De  Bruijn  Graph  Construc2on  

I.  Choose  a  value  of  !.  II.  For  each  !-­‐mer  that  exists  in  any  sequence  

create  an  edge  with  one  vertex    labeled  as  the  prefix  and  one  vertex  labeled  as  the  suffix.  

III.  Glue  all  ver2ces  that  have  the  same  label.                  (Pevzner,  Tang  &  Tesler,  2004)  

Page 17: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

GTCTATTCGCTAATTCACTA

ATTCG  

ATTCA   TTCA  ATTC  

ATTC   TTCG  

(Pevzner,  Tang  &  Tesler,  2004)  

De  Bruijn  Graph  Construc2on  

Page 18: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

GTCTATTCGCTAATTCACTA

ATTCG  

ATTCA   TTCA  ATTC  

ATTC   TTCG  

(Pevzner,  Tang  &  Tesler,  2004)  

De  Bruijn  Graph  Construc2on  

Page 19: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

GTCTATTCGCTAATTCACTA

ATTCG  

ATTCA   TTCA  

ATTC  

TTCG  

(Pevzner,  Tang  &  Tesler,  2004)  

De  Bruijn  Graph  Construc2on  

Page 20: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

Challenges  in  Fragment  Assembly    •  Repeats  in  the  genome.      •  Sequencing  errors,  which  vary  by  plamorm.    

•  Size  of  the  data,  e.g.  1.5  billion  reads.  

ACCAGTTGACTGGGATCCTTTTTAAAGACTGGGATTTTAACGCG CAGTTGACTG

TGGGATCC TGGGATTT

TGGGAATT TGGGACTT TGGGA--T

TGGGAACTTATT

Subs2tu2on    Dele2on  Inser2on  

Page 21: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

De  Bruijn  Graph  of  a  Genome  

Example  Genome:      ABCDEFGHICDEFGKL  Example  Genome:      ABCDEFGHICDEFGKL

(!  −1)-­‐mers  !-­‐mers  ABC BCD CDE DEF EFG GHI

HIC ICD FGK GKL  

ABCD BCDE CDEF DEFG EFGH GHIC

HICD ICDE EFGK FGKL  

Page 22: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

De  Bruijn  Graph  of  a  Genome  

ABC BCD CDE DEF EFG FGK GKL

FGH

GHI HIC

ICD

Example  Genome:      ABCDEFGHICDEFGKL  Example  Genome:      ABCDEFGHICDEFGKL

Page 23: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

De  Bruijn  Graph  of  a  Genome  

Example  Genome:      ABCDEFGHICDEFGKL  Bulges  (undirected  cycles)  and  whirls  (directed  cycles)  occur  because  of  sequencing  errors  or  repeats  in  the  genome.  

ABC BCD CDE DEF EFG FGK GKL

FGH

GHI HIC

ICD

Page 24: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

De  Bruijn  Graph  of  a  Genome  

Example  Genome:      ABCDEFGHICDEFGKL  Example  Genome:      ABCDEFGHICDEFGKL

1   3  

2  

ABC BCD CDE DEF EFG FGK GKL

FGH

GHI HIC

ICD

Page 25: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

Typical  De  Bruijn  Graph  

…  However,  this  is  over  a  billion  ver2ces  (for  a  very  small  bacteria  genome).  

Page 26: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

De  Bruijn  Graph  of  a  Genome  

Example  Genome:      ABCDEFGHICDEFGKL  Example  Genome:      ABCDEFGHICDEFGKL

1   3  

2  

ABC BCD CDE DEF EFG FGK GKL

FGH

GHI HIC

ICD

Page 27: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

De  Bruijn  Graph  of  a  Genome  

ABC BCD CDE DEF EFG FGK GKL

Example  Genome:      ABCDEFGHICDEFGKL  Example  Genome:      ABCDEFGHICDEFGKL

Page 28: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

De  Bruijn  Graph  of  a  Genome  

ABC BCD CDE DEF EFG FGK GKL

Example  Genome:      ABCDEFGHICDEFGKL  Resul2ng  Erroneous  Genome:      ABCDEFGKL

1  

Page 29: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

Paired-­‐end  Reads  

•  Random  fragment  with  an  approximately  known  size.  

•  Both  ends  are  sequenced.  •  Specified  prior  to  data  acquisi2on.  

ACTATAAT   ACCGCGAT  Insert  Size  

Page 30: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

Standard  (Mul2-­‐cell)  Data    

Coverage  

(Chitsaz  et  al.,  2011)  

Page 31: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

Coverage  

Coverage  

(Chitsaz  et  al.,  2011)  

Single-­‐cell  Data    

Page 32: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

Detangling  the  de  Bruijn  Graph  

Even  using  mate-­‐pair  informa2on,  the  de  Bruijn  graph  is  highly  tangled.      There  are  the  following  op2ons  for  detangling  the  de  Bruijn  graph:  1.  Error  correc2on  of  reads.  2.  Bulge  and  whirl  removal.  

32  

Page 33: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

Detangling  the  de  Bruijn  Graph  

Even  using  mate-­‐pair  informa2on,  the  de  Bruijn  graph  is  highly  tangled.      There  are  the  following  op2ons  for  detangling  the  de  Bruijn  graph:  1.  Error  correc2on  of  reads  2.  Bulge  and  whirl  removal  

33  

PROBLEM!    Both  inevitably    end-­‐up  causing  errors  rather  than  correcHng  then.  

Page 34: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

Assembly  Demonstra2on  

Page 35: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

Notes  About  Demo  •  We  used  Velvet  because  it’s  the  simplest  to  use.  •  Write  a  shell  script  to  run  the  assembler  to  keep  track  of  the  parameters  you  used  and  to  avoid  wri2ng  out  the  command  each  2me.  

•  Almost  all  assemblers  require  you  to  specify  the  following:  – value  of  k,  whether  the  data  is  paired  end,  insert  length,  minimum  con2g  length  

– And  some2mes,  whether  it  is  single-­‐cell  data.  

Page 36: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

•  How  you  specify  the  mate-­‐pair  informa2on  varies  from  assembler  to  assembler.    You  have  to  read  the  manual  and  write  a  (perl)  script  to  specify  the  data  in  the  correct  format!  

Notes  About  Demo  

Page 37: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

•  Assemblers  can  be  challenging  programs  to  run.    All  of  them  have  intricacies  even  in  the  installa2on  of  the  program.  

•  Therefore,  running  an  assembler  requires:  1.  Some  knowledge  about  Unix/Linux  commands.  2.  Access  to  a  server  with  large  amounts  of  

memory  (64G  for  small  bacteria  genomes,  512G  for  larger  genomes).  

Notes  About  Demo  

Page 38: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

•  Be  aware  that  your  assembler  may  not  always  produce  decent  results.    Can  you  tell  if  you  did?    Yes.    

Notes  About  Demo  

Page 39: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

Assembly  Evalua2on  

Page 40: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

What  has  been  sequence?  

•  We’ve  sequenced  a  number  of  genomes  but  several  genomes  remain  difficult  

•  Plant  genomes  are  very  hard  because  they  are  extremely  long,  contain  huge  repeat  regions,  and  are  polyploid      

•  Note:  we  do  not  dis2nguish  between  genotypes…  that  is  a  separate  problem  

Page 41: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

Organism   Type   Genome  Size   No.  of  predicted  genes  

Homo  Sapiens   Human   3.2Gb   20,251  Takifugu  rubripes  

Puffer  fish   390Mb     22-­‐29,000  

Oryza  sa2va   Rice   420Mb     32-­‐50,000  Anopheles  gambiae  

Mosquito   278Mb   13,700  

Saccharomyces  cerevisiae  

Baker’s  yeast  

12.1Mb   6,200  

Cucumis  sa2vus   cucumber   367Mb   27,000  

•  kb  (=  kbp)  =  kilo  base  pairs  =  1,000  bp  •  Mb  =  mega  base  pairs  =  1,000,000  bp  •  Gb  =  giga  base  pairs  =  1,000,000,000  bp.  

Page 42: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

Assembly  Evalua2on    •  How  can  we  tell  the  difference  between  a  good  assembly  and  a  bad  assembly?  – Answer:  N50  staHsHc,  which  is  a  metric  of  the  length  of  a  set  of  sequences,  with  greater  weight  given  to  longer  sequences.  

– Given  a  set  of  sequences  of  varying  lengths,  the  N50  length  is  defined  as  the  length  N  for  which  half  of  all  bases  in  the  sequences  are  in  a  sequence  of  length  L  <  N.  

– There  are  some  contradictory  in  the  defini2on(s)  of  the  N50  value.    

Page 43: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

Calcula2ng  N50  

Alterna2ve  defini2on:  the  largest  en2ty  E  such  that  at  least  half  of  the  total  size  of  the  en22es  is  contained  in  en22es  larger  than  E.    1.  Read  Fasta  file  and  calculate  sequence  length.  2.  Sort  length  on  reverse  order.  3.  Calculate  Total  size.  4.  Calculate  N50.  

Page 44: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

Other  Evalua2ons  

•  Number  of  inser2ons,  dele2ons,  and  subs2tu2on  errors  in  an  assembly  

•  misassembly  of  con2gs  (chimeric  indels)  

44  

>=500  bp  

Page 45: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

Other  Evalua2ons  

•  Number  of  inser2ons,  dele2ons,  and  subs2tu2on  errors  in  an  assembly  

•  misassembly  of  con2gs  (chimeric  indels)  

45  

>=500  bp  

Page 46: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

Next  Lecture  

Page 47: Genome&Sequencing:&Introduc2on& to&FragmentAssembly&cs680/Slides/lecture5.pdfTradi2onal&(“Sanger”)&Sequencing& • Sequence&shotgun&fragments&of&length&600&bp& using&Sanger&sequencing.&

Detangling  the  de  Bruijn  Graph  

Even  using  mate-­‐pair  informa2on,  the  de  Bruijn  graph  is  highly  tangled.      There  are  the  following  op2ons  for  detangling  the  de  Bruijn  graph:  1.  Error  correc2on  of  reads  2.  Bulge  and  whirl  removal