Surya Saha Cornell University & Boyce Thompson Institute
suryasaha@cornell.edu // Twitter:@SahaSurya
IIT Indore
May 29, 2014
Slides: http://bit.ly/IITIndoreSeq
http://www.acgt.me/blog/2014/3/7/next-generation-sequencing-must-die
5/29/2014 IIT Indore 2
You are free to:
Copy, share, adapt, or re-mix;
Photograph, film, or broadcast;
Blog, live-blog, or post video of
this presentation, provided that:
You attribute the work to its author and respect the rights and licenses associated with its components.
Slide Concept by Cameron Neylon, who has waived all copyright and related or neighbouring rights. This slide only ccZero.
Social Media Icons adapted with permission from originals by Christopher Ross. Original images are available under GPL at
http://www.thisismyurl.com/free-downloads/15-free-speech-bubble-icons-for-popular-websites
Sequencing over the Ages
[Figure: timeline of sequencing milestones, 1953-2014]
1953: DNA structure discovery
1977: Sanger DNA sequencing by chain-terminating inhibitors
1984: Epstein-Barr virus (170 Kb)
1987: ABI 370 sequencer
1995: Haemophilus influenzae (1.83 Mb)
2001: Homo sapiens (3.0 Gb)
2005, 2007, 2011, 2012, 2013 (platforms): 454; Solexa; SOLiD; Ion Torrent; PacBio; Illumina; Illumina HiSeq X
2014: Pinus taeda (24 Gb); MinION
Slide credit: Aureliano Bombarely
It's all about the $£€¥
http://www.genome.gov/sequencingcosts/
First generation sequencing
Sanger method
Frederick Sanger (13 Aug 1918 – 19 Nov 2013) won the Nobel Prize in Chemistry in 1958 and 1980.
He published the dideoxy chain-termination method, or “Sanger method”, in 1977.
http://dailym.ai/1f1XeTB
Sanger method
http://bit.ly/1g6Cudq
http://bit.ly/1lcQO4J
Maxam-Gilbert method
Maxam-Gilbert method
http://bit.ly/1noY0fu http://bit.ly/1lGvJCA
First generation sequencing
• Very high quality sequences (99.999% per-base accuracy)
• Very low throughput
Platform                           Run time   Read length   Reads / run   Total nucleotides sequenced   Cost / Mb
Capillary sequencing (ABI3730xl)   20m-3h     400-900 bp    96 or 384     1.9-84 Kb                     $2400
http://bit.ly/1clLps3 http://1.usa.gov/1cLqIRd
Next generation sequencing
http://bit.ly/1keDtZQ
• Second generation
• Third generation
• Fourth generation
• Next-next-generation
• Next-next-next generation
http://www.acgt.me/blog/2014/3/10/next-generation-sequencing-must-diepart-2
Name the specific technology used to generate the data
– Illumina HiSeq/MiSeq/NextSeq
– Pacific Biosciences RS I/RS II
– Ion Torrent Proton/PGM
– SOLiD
– 454
http://www.acgt.me/blog/2014/3/10/next-generation-sequencing-must-diepart-2
454 Pyrosequencing
One purified DNA fragment, to one bead, to one read.
http://bit.ly/1ehwxWN
GS FLX Titanium
http://bit.ly/1ehAcEh
Illumina
Output            15 Gb        120 Gb        1000 Gb                            1800 Gb
Number of reads   25 million   400 million   4 billion                          6 billion
Read length       2x300 bp     2x150 bp      2x125 bp (2x250 update mid-2014)   2x150 bp
Cost              $99K         $250K         $740K                              $10M
Source: Illumina
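The Output row appears to follow from the other two rows: number of reads times read length, counted for both ends of a pair. The sketch below checks that arithmetic. It assumes "Number of reads" counts read pairs, which the slide does not state, and the function name `paired_end_yield_gb` is made up for illustration.

```python
def paired_end_yield_gb(read_pairs: int, read_length_bp: int) -> float:
    """Total bases from a paired-end run, in gigabases (1 Gb = 1e9 bases).

    Assumption: `read_pairs` counts pairs, so each contributes two reads.
    """
    return read_pairs * 2 * read_length_bp / 1e9


# (read pairs, read length in bp) for the four columns of the table
runs = [
    (25_000_000, 300),
    (400_000_000, 150),
    (4_000_000_000, 125),
    (6_000_000_000, 150),
]

for read_pairs, read_length in runs:
    total = paired_end_yield_gb(read_pairs, read_length)
    print(f"{read_pairs:,} pairs at 2x{read_length} bp = {total:.0f} Gb")
```

Under that assumption the four results are 15, 120, 1000 and 1800 Gb, matching the Output row.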
$1000 human genome??
Illumina: Moleculo
http://bit.ly/1aEPOBn
Pacific Biosciences SMRT sequencing
Single Molecule Real Time sequencing
http://bit.ly/1naxgTe
Pacific Biosciences SMRT sequencing: Error correction methods
Hierarchical genome-assembly process (HGAP)
PBJelly
English et al., PLoS ONE, 2012
Pacific Biosciences SMRT sequencing: Read Lengths
http://www.igs.umaryland.edu/labs/grc/
Mean Read Length: 8391 bp
Maximum Subread Length: 24585 bp
Oxford Nanopore
https://www.nanoporetech.com/
• No data yet
• Error model
http://erlichya.tumblr.com/post/66376172948/hands-on-experience-with-oxford-nanopore-minion
Others
• Ion Torrent Proton/PGM
• Nabsys
• SOLiD
Comparison
Next generation sequencing
Platform              Run time   Read length   Quality                       Total nucleotides sequenced   Cost / Mb
454 Pyrosequencing    24h        700 bp        Q20-Q30                       0.7 Gb                        $10
Illumina MiSeq        27h        2x250 bp      >Q30                          15 Gb                         $0.15
Illumina HiSeq 2500   11 days    2x125 bp      >Q30                          1000 Gb                       $0.05
Ion Torrent           2h         400 bp        >Q20                          50 Mb-1 Gb                    $1
Pacific Biosciences   2h         5.5-8.5 kb    >Q30 consensus, >Q10 single   400-800 Mb / SMRT cell        $0.33-$1
http://bit.ly/1clLps3 http://1.usa.gov/1cLqIRd
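To make the cost-per-megabase column concrete, here is a rough sketch that scales it to a whole genome. The 3.0 Gb genome size is the Homo sapiens figure from the timeline. The 30x coverage depth is an assumption chosen for illustration, not a number from the slides, and `sequencing_cost_usd` is a made-up helper name. Pacific Biosciences is left out because its cost is given only as a range. The estimate covers raw sequencing only.

```python
# Cost per megabase (USD), copied from the comparison table
COST_PER_MB_USD = {
    "454 Pyrosequencing": 10.0,
    "Illumina MiSeq": 0.15,
    "Illumina HiSeq 2500": 0.05,
    "Ion Torrent": 1.0,
}

GENOME_SIZE_MB = 3000  # Homo sapiens, 3.0 Gb
COVERAGE = 30          # assumed depth, for illustration only


def sequencing_cost_usd(genome_size_mb: float, coverage: float, cost_per_mb: float) -> float:
    """Raw sequencing cost: bases needed (genome size x coverage) x price per Mb."""
    return genome_size_mb * coverage * cost_per_mb


for platform, cost_per_mb in COST_PER_MB_USD.items():
    total = sequencing_cost_usd(GENOME_SIZE_MB, COVERAGE, cost_per_mb)
    print(f"{platform}: about ${total:,.0f} for {COVERAGE}x of a {GENOME_SIZE_MB} Mb genome")
```

Library preparation, labour, storage and analysis are not included, so this is a lower bound.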
http://omicsmaps.com/
Next Generation Genomics: World Map of High-throughput Sequencers
Real cost of Sequencing!!
Sboner et al., Genome Biology, 2011
Library Types
Single end
Paired end (PE, 150-800 bp, Fwd: /1, Rev: /2)
Mate pair (MP, 2Kb to 20 Kb)
[Figure: forward (F) and reverse (R) read layout for each library type; platform labels: 454/Roche, Illumina]
Slide credit: Aureliano Bombarely
Implications of Choice of Library
Slide credit: Aureliano Bombarely
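[Figure: assembly hierarchy]
Reads are assembled into a consensus sequence (contig).
Pair read information links contigs into a scaffold (or supercontig), with gaps shown as NNNNN.
Genetic information (markers) orders scaffolds into a pseudomolecule (or ultracontig).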
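As a toy illustration of the contig-to-scaffold step, the sketch below joins ordered contigs with runs of N. It assumes the contig order and approximate gap sizes are already known (in practice they are estimated from paired-read or mate-pair distances). The function `build_scaffold` is hypothetical and not taken from any assembler.

```python
from typing import List


def build_scaffold(contigs: List[str], gap_sizes: List[int], min_gap: int = 1) -> str:
    """Join ordered contigs into one scaffold, filling each gap with 'N'.

    gap_sizes[i] is the estimated gap between contigs[i] and contigs[i + 1].
    Gaps smaller than min_gap are padded to min_gap so the join stays visible.
    """
    if not contigs:
        raise ValueError("need at least one contig")
    if len(gap_sizes) != len(contigs) - 1:
        raise ValueError("need exactly one gap size between each pair of contigs")

    parts = [contigs[0]]
    for contig, gap in zip(contigs[1:], gap_sizes):
        parts.append("N" * max(gap, min_gap))  # unknown bases between contigs
        parts.append(contig)
    return "".join(parts)


scaffold = build_scaffold(["ACGT", "GGCC", "TTAA"], [5, 2])
print(scaffold)  # ACGTNNNNNGGCCNNTTAA
```

Real scaffolders also have to resolve contig orientation and conflicting links, which this sketch ignores.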
Quality control: Encoding
http://bit.ly/N28yUd
The Phred score of a base is: Qphred = -10 × log10(e),
where e is the estimated probability that the base call is incorrect.
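The formula above can be sketched in a few lines, along with its inverse and the decoding of a FASTQ quality string. The ASCII offset of 33 (the Sanger / Illumina 1.8+ convention) is an assumption here; older Illumina pipelines used an offset of 64, so check which encoding your data uses. The function names are illustrative only.

```python
import math
from typing import List


def phred_from_error(e: float) -> float:
    """Phred quality for an estimated error probability e (0 < e <= 1)."""
    return -10 * math.log10(e)


def error_from_phred(q: float) -> float:
    """Inverse of the formula: error probability for a Phred quality q."""
    return 10 ** (-q / 10)


def decode_quality(quality_string: str, offset: int = 33) -> List[int]:
    """Turn a FASTQ quality string into Phred scores.

    Assumption: Phred+33 encoding; pass offset=64 for old Illumina data.
    """
    return [ord(ch) - offset for ch in quality_string]


print(phred_from_error(0.001))   # about 30: 1 error in 1000 base calls
print(error_from_phred(20))      # about 0.01: 1 error in 100 base calls
print(decode_quality("I5!"))     # [40, 20, 0]
```

So Q20 corresponds to 99% accuracy per base and Q30 to 99.9%.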
Which technology to use??
• Microbial genomes
• Eukaryotic genomes
• Resequencing genomes
• RNAseq and other XXXseq methods
http://bit.ly/1ko9Kgh
Looking into the Crystal ball
• Desktop sequencing
• Diagnostics in the clinic
• Large scale environmental sequencing of microbes
• But challenges remain...
• International Society for Computational Biology (ISCB)
• ISCB SC RSG India
• > 1500 members
• Contact
– rsg-india@googlegroups.com
– http://www.iscbsc.org/rsg/rsg-india
– https://groups.google.com/forum/#!forum/compbio_discussion
• Collaborate with student organizations
• Organize workshops and journal clubs
• Attend international meetings
Position available at Solgenomics
Cassavabase project
Plant Breeding + Bioinformatician
● Familiar with breeding
● Programming in Perl, R, SQL, Hadoop
● Linux
● Africa
● Genius
http://www.cassavabase.org/forum/posts.pl?topic_id=9
Thank you!! Questions??