Sequencing techniques and genome assembly
![Page 1: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/1.jpg)
Sequencing techniques and genome assembly
Yuzhen Ye
School of Informatics & Computing, IUB
I519 Introduction to Bioinformatics, 2011
![Page 2: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/2.jpg)
Start with reads
>read1
aatgcatgcggctatgctaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaat
>read2
gctaagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaatatgctaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctatgctaatgcatgcggctatgctaagctgggatccgatgacaatgca
>read3
tgcggctatgctaatgcatgcggctatgcaagctgggatcctgcggctatgctaatgaatggtcttgggatttaccttggaatgctaagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaatatgctaatgcatgcggctatgcta
……
![Page 3: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/3.jpg)
What can be done
• Assemble the short reads into a genome (hopefully a complete genome)
  – Assembly problem
• Comparative analysis
  – Whole genome level: whole genome comparison
  – Individual gene level
  – Genome variation & SNP
• Annotate the genome
  – What are the genes (gene structure prediction)
  – What are the functions of the genes
![Page 4: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/4.jpg)
How are genome sequences generated?
• Limitation on read length (new sequencers produce even shorter reads than Sanger sequencing machines)
• Sequencing of long DNA sequences (a chromosome or a whole genome) relies on sequencing of short segments (carried in cloning vectors)
• Two approaches to sequencing large pieces of DNA
  – Chromosome walking / primer walking: progresses through the entire strand, piece by piece
  – Shotgun sequencing: cut the DNA randomly into smaller pieces; with sufficient oversampling (?), the sequence of the target can be inferred by piecing the sequence reads together into an assembly
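The random cutting step of shotgun sequencing can be imitated in a few lines. This is only a toy sketch: the function name `shotgun_reads`, the fixed read length, the error-free reads, and the seeded random generator are all assumptions made for illustration, not part of the slides.

```python
import random

def shotgun_reads(genome, read_len, coverage, seed=0):
    """Sample error-free reads of fixed length at uniformly random positions.

    The number of reads is chosen so that reads * read_len / genome length
    is approximately the requested coverage (oversampling factor).
    """
    rng = random.Random(seed)  # seeded so the toy example is repeatable
    num_reads = round(coverage * len(genome) / read_len)
    last_start = len(genome) - read_len
    starts = [rng.randrange(last_start + 1) for _ in range(num_reads)]
    return [genome[s:s + read_len] for s in starts]

# Hypothetical 1,000 bp "genome" sampled at 8-fold coverage with 50 bp reads.
toy_genome = "ACGT" * 250
toy_reads = shotgun_reads(toy_genome, read_len=50, coverage=8)
```

Real reads vary in length, contain errors, and come from both strands; the sketch ignores all of that.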
![Page 5: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/5.jpg)
Cloning vectors
• Cloning vectors: DNA vehicles into which foreign DNA can be inserted and stably maintained
• Various types
  – Cosmid (plasmid, containing 37-52 kbp of DNA)
  – BAC (Bacterial Artificial Chromosome; takes in 100-300 kbp of foreign DNA)
  – YAC (Yeast Artificial Chromosome)
![Page 6: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/6.jpg)
Shotgun sequencing
[Figure: DNA that is too long to be sequenced is cut randomly (shotgun) into short reads, each of which can be sequenced; fragment assembly (an inverse problem) reconstructs the original DNA.]
![Page 7: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/7.jpg)
Shotgun sequencing: from small viral genomes to larger genomes
• Early applications of the shotgun approach
  – Small viral genomes (e.g., lambda virus; 1982)
  – 30- to 40-kbp segments of larger genomes that could be manipulated and amplified in cosmids or other clones (physical mapping): hierarchical genome sequencing (divide-and-conquer sequencing)
• 1995, Haemophilus influenzae: whole-genome shotgun (WGS) sequencing
  – Critical to this accomplishment: use of pairs of reads, called mates, from the ends of 2-kbp and 16-kbp inserts randomly sampled from the genome (which were used for ordering the contigs)
• 2001: whole-genome shotgun sequencing of the human genome
![Page 8: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/8.jpg)
DNA sequencing technology
• Sanger sequencing
  – The main method for sequencing DNA for the past thirty years!
• 2nd generation sequencing techniques (next generation sequencing)
  – Differ from Sanger sequencing in their basic chemistry
  – Massively increased throughput
  – Smaller DNA concentration
  – 454 pyrosequencing, Illumina/Solexa, SOLiD
• 3rd generation? (single-molecule)
![Page 9: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/9.jpg)
DNA sequencing: history
• Sanger method (1977): labeled ddNTPs terminate DNA copying at random points.
• Gilbert method (1977): chemical method to cleave DNA at specific points (G, G+A, T+C, C).
• Both methods generate labeled fragments of varying lengths that are then separated by electrophoresis (electrophoretic separation).
![Page 10: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/10.jpg)
Sanger method: generating reads
1. Start at primer (restriction site)
2. Grow DNA chain
3. Include ddNTPs
4. Reaction stops at all possible points
5. Separate products by length, using gel electrophoresis
Chain terminators: dideoxynucleotide triphosphates (ddNTPs)
![Page 11: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/11.jpg)
Radioactive sequencing versus dye-terminator sequencing
ddNTPs (chain terminators) are labeled with different fluorescent dyes, each fluorescing at a different wavelength.
![Page 12: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/12.jpg)
Automatic DNA sequencing
Output: chromatograms (fluorescent peak trace)
![Page 13: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/13.jpg)
Trace archive
NCBI trace archive: TI# 422835669
(http://www.ncbi.nlm.nih.gov/Traces/trace.cgi?)
![Page 14: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/14.jpg)
New sequencing techniques
• Next Generation Sequencing (NGS) (Second Generation)
  – Pyrosequencing
  – Illumina
  – SOLiD
• Third generation sequencing
  – single-molecule sequencing technologies
• NHGRI funds development of third generation DNA sequencing technologies
  – “More than $18 million in grants to spur the development of a third generation of DNA sequencing technologies was announced today by the National Human Genome Research Institute (NHGRI). …The cost to sequence a human genome has now dipped below $40,000. Ultimately, NHGRI's vision is to cut the cost of whole-genome sequencing of an individual's genome to $1,000 or less, which will enable sequencing to be a part of routine medical care.”
  – http://www.nih.gov/news/health/sep2010/nhgri-14.htm
![Page 15: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/15.jpg)
Next-generation sequencing transforms today's biology
Sanger sequencers
454 sequencer
Ref: Nature Methods - 5, 16 - 18 (2008)
![Page 16: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/16.jpg)
Next-generation sequencing transforms today's biology
• Genome re-sequencing
• Metagenomics
• Transcriptomics (RNA-seq)
• Personal genomics ($1000 for sequencing a person’s genome)
![Page 17: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/17.jpg)
Pyrosequencing
• Pyrosequencing principles
  – The polymerase reaction is modified to emit light as each base gets incorporated.
![Page 18: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/18.jpg)
Roche (454) GS FLX sequencer
![Page 19: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/19.jpg)
Solexa/Illumina sequencing
• Ultrahigh-throughput sequencing
• Keys
  – attachment of randomly fragmented genomic DNA to a planar, optically transparent surface
  – solid phase amplification to create an ultra-high density sequencing flow cell with > 10 million clusters, each containing ~1,000 copies of template per sq. cm.
• Short reads
• Used for gene expression, small RNA discovery, etc.
![Page 20: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/20.jpg)
Solexa/Illumina sequencing
More details at http://www.illumina.com/pages.ilmn?ID=203
![Page 21: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/21.jpg)
Applied Biosystems SOLiD sequencer
• Commercial release in October 2007
• Sequencing by Oligo Ligation and Detection
• ~5 days to run / produces 3-4 Gb
• The chemistry is based on template-directed ligation of short, “dinucleotide-encoding”, 8-mer oligonucleotides.
• Dinucleotide-encoding permits discrimination of SNPs from most chemistry and imaging errors, and subsequent in silico correction of those errors.
Ref: http://appliedbiosystems.cnpg.com/Video/flatFiles/699/index.aspx
![Page 22: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/22.jpg)
Comparison of new sequencing techniques

| | Applied Biosystems 3730xl | 454 GS FLX Pyrosequencer | Solexa 1G Genome Analyzer | Applied Biosystems 1G SOLiD Analyzer |
| --- | --- | --- | --- | --- |
| Throughput | 1-2 Mbp per day/machine | 100 Mbp per day/machine | 800 Mbp per run/machine | 1200 Mbp per run/machine |
| Read length | 600-900 bp | 200-300 bp | 25-40 bp | 25-30 bp |
| Mate pairs | Mate pair libraries | No mate pair libraries | No mate pair libraries | Mate pair libraries |

(“The new science of metagenomics” Table 4-2)
Slide annotations on the table: “Increased!!”, “Yes now!”
![Page 23: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/23.jpg)
Next generation sequencing (NGS) techniques

| | 454 Sequencing | Illumina/Solexa | ABI SOLiD |
| --- | --- | --- | --- |
| Sequencing chemistry | Pyrosequencing | Polymerase-based sequence-by-synthesis | Ligation-based sequencing |
| Amplification approach | Emulsion PCR | Bridge amplification | Emulsion PCR |
| Paired end (PED) separation | 3 kb | 200-500 bp | 3 kb |
| Mb per run | 100 Mb | 1300 Mb | 3000 Mb |
| Time per PED run | <0.5 day | 4 days | 5 days |
| Read length (update) | 250-400 bp | 35, 75 and 100 bp | 35 and 50 bp |
| Cost per run | $8,438 USD | $8,950 USD | $17,447 USD |
| Cost per Mb | $84.39 USD | $5.97 USD | $5.81 USD |
![Page 24: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/24.jpg)
Base calling
• Determine the sequence of nucleotides from chromatograms or flowgrams (trace files often in SCF format)
• Peak detection
• Phred quality score: $Q = -10 \log_{10}(P_e)$, where $P_e$ is the probability of an incorrect base call
![Page 25: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/25.jpg)
Phred quality score

| Phred Quality Score | Probability of incorrect base call | Base call accuracy |
| --- | --- | --- |
| 10 | 1/10 | 90% |
| 20 | 1/100 | 99% |
(for high values the two scores are asymptotically equal)
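The relation between a Phred score and the error probability can be checked with a short sketch. The function names are illustrative choices made here and do not correspond to any particular base-calling package.

```python
import math

def phred_from_error(p_error):
    """Phred quality score Q = -10 * log10(Pe)."""
    return -10 * math.log10(p_error)

def error_from_phred(q):
    """Inverse relation: Pe = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)

def accuracy_from_phred(q):
    """Base call accuracy is 1 - Pe."""
    return 1 - error_from_phred(q)
```

For instance, `error_from_phred(20)` corresponds to the 1/100 row of the table and `accuracy_from_phred(10)` to the 90% row.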
![Page 26: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/26.jpg)
Fragment assembly (Genome assembly)
DNA
?
![Page 27: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/27.jpg)
Assembly
• Comparative assembly
  – comparative (re-sequencing) approaches that use the sequence of a closely related organism as a guide during the assembly process
• De novo assembly
  – reconstructing genomes that are not similar to any organisms previously sequenced
  – proven to be difficult, falling within the class of NP-hard problems
  – main strategies: greedy, overlap-layout-consensus, and Eulerian
![Page 28: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/28.jpg)
Fragment assembly: based on the overlap between reads
reads
![Page 29: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/29.jpg)
Fragment assembly: overlap-layout-consensus
Assemblers: ARACHNE, PHRAP, CAP, TIGR, CELERA
Overlap: find potentially overlapping reads
Layout: merge reads into contigs
Consensus: derive the DNA sequence and correct read errors ..ACGATTACAATAGGTT..
![Page 30: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/30.jpg)
Overlap
• Find the best match between the suffix of one read and the prefix of another
• Due to sequencing errors, need to use dynamic programming to find the optimal overlap alignment
• Apply a filtration method to filter out pairs of fragments that do not share a significantly long common substring
![Page 31: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/31.jpg)
Overlapping reads
TAGATTACACAGATTAC
|||||||||||||||||
TAGATTACACAGATTAC
• Sort all k-mers in reads (k ~ 24)
• Find pairs of reads sharing a k-mer
• Extend to full alignment – throw away if not >95% similar
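The filter-then-extend idea for the overlap step can be sketched as follows. This is a simplified illustration, not how any of the assemblers named earlier is implemented: the function names are made up here, the extension step uses an ungapped comparison in place of the dynamic-programming alignment the slides call for, and the toy example uses k = 4 instead of k ~ 24 because the reads are tiny.

```python
from collections import defaultdict
from itertools import combinations

def candidate_pairs(reads, k):
    """Filtration: return index pairs of reads that share at least one k-mer."""
    index = defaultdict(set)
    for i, read in enumerate(reads):
        for pos in range(len(read) - k + 1):
            index[read[pos:pos + k]].add(i)
    pairs = set()
    for ids in index.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs

def suffix_prefix_overlap(a, b, min_len, min_identity=0.95):
    """Length of the longest suffix of a that matches a prefix of b.

    Simplification: positions are compared without gaps, so insertion and
    deletion errors are not handled the way a full alignment would handle them.
    """
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        matches = sum(x == y for x, y in zip(a[-length:], b[:length]))
        if matches / length >= min_identity:
            return length
    return 0

# Hypothetical toy reads; only the first two come from the same region.
toy = ["TAGATTACACAG", "ACACAGATTACT", "GGGGCCCCAAAA"]
pairs = candidate_pairs(toy, k=4)
overlap = suffix_prefix_overlap(toy[0], toy[1], min_len=3)
```

Only pairs that survive the k-mer filter need the more expensive extension step, which is the point of the filtration.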
![Page 32: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/32.jpg)
Layout
Create local multiple alignments from the overlapping reads
TAGATTACACAGATTACTGA
TAGATTACACAGATTACTGA
TAG-TTACACAGATTATTGA
TAGATTACACAGATTACTGA
TAGATTACACAGATTACTGA
TAGATTACACAGATTACTGA
TAG-TTACACAGATTATTGA
TAGATTACACAGATTACTGA
(gaps shown as -)
![Page 33: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/33.jpg)
Derive consensus sequence
Derive multiple alignment from pairwise read alignments
Derive each consensus base by weighted voting
TAGATTACACAGATTACTGA-TTGATGGCGTAA-CTA
TAGATTACACAGATTACTGACTTGATGGCGTAAACTA
TAG-TTACACAGATTATTGACTTCATGGCGTAA-CTA
TAGATTACACAGATTACTGACTTGATGGCGTAA-CTA
TAGATTACACAGATTACTGACTTGATGGGGTAA-CTA
Consensus (gaps shown as -):
TAGATTACACAGATTACTGACTTGATGGCGTAA-CTA
![Page 34: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/34.jpg)
Consensus
• A consensus sequence is derived from a profile of the assembled fragments
• A sufficient number of reads is required to ensure a statistically significant consensus
• Reading errors are corrected
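Column-wise voting over an existing multiple alignment can be sketched as below. The rows are the five aligned reads from the consensus slide, with `-` standing in for alignment gaps (an assumption about how the blanks in the transcript should be read). The equal default weights and the function name are also assumptions; a real assembler would typically weight votes by base quality.

```python
from collections import defaultdict

def consensus(rows, weights=None):
    """Pick, for each alignment column, the symbol with the largest total weight."""
    if weights is None:
        weights = [1.0] * len(rows)  # unweighted majority vote by default
    result = []
    for column in zip(*rows):
        votes = defaultdict(float)
        for symbol, weight in zip(column, weights):
            votes[symbol] += weight
        result.append(max(votes, key=votes.get))
    return "".join(result)

aligned_reads = [
    "TAGATTACACAGATTACTGA-TTGATGGCGTAA-CTA",
    "TAGATTACACAGATTACTGACTTGATGGCGTAAACTA",
    "TAG-TTACACAGATTATTGACTTCATGGCGTAA-CTA",
    "TAGATTACACAGATTACTGACTTGATGGCGTAA-CTA",
    "TAGATTACACAGATTACTGACTTGATGGGGTAA-CTA",
]
consensus_seq = consensus(aligned_reads)
```

Each isolated disagreement (a gap or a substituted base in a single read) is outvoted by the other four reads, which is how reading errors get corrected.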
![Page 35: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/35.jpg)
Gaps and contigs
Gap
Contig 1 Contig 2
Filling gaps: fill up the gaps by further experiments
Mates for ordering the contigs
![Page 36: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/36.jpg)
Read coverage
Assuming uniform distribution of reads:
• Length of genomic segment: $L$
• Number of reads: $n$
• Length of each read: $l$
• Coverage: $c = n l / L$
How much coverage is enough (or what is sufficient oversampling)?
Lander-Waterman model: $P(x) = \dfrac{c^x e^{-c}}{x!}$, $P(x = 0) = e^{-c}$, where $c$ is the coverage
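As a numeric illustration of the coverage formula and the Poisson model, the sketch below computes the coverage for a hypothetical experiment and the probability that a given base is covered by x reads. The read count, read length, and function names are assumptions chosen for the example.

```python
import math

def coverage(num_reads, read_len, genome_len):
    """Coverage = n * l / L."""
    return num_reads * read_len / genome_len

def poisson_pmf(x, c):
    """P(x) = c**x * exp(-c) / x!: chance that a base is covered by exactly x reads."""
    return c ** x * math.exp(-c) / math.factorial(x)

# Hypothetical experiment: 10,000 reads of 700 bp from a 1 Mbp genome.
c = coverage(10_000, 700, 1_000_000)
fraction_uncovered = poisson_pmf(0, c)  # equals exp(-c)
```

Under these assumed numbers the coverage is 7-fold, and the uncovered fraction `exp(-7)` is below one base in a thousand.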
![Page 37: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/37.jpg)
Poisson distribution
![Page 38: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/38.jpg)
Contig numbers vs read coverage
Using a genome of 1Mbp
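A curve of this kind can be reproduced from the Lander-Waterman result that the expected number of contigs is $n e^{-c\sigma}$, where $n$ is the number of reads and $\sigma = 1 - T/l$ accounts for a minimum detectable overlap $T$. The sketch below assumes a read length of 500 bp and no minimum overlap; both are illustrative assumptions, since the slide gives only the 1 Mbp genome size.

```python
import math

def expected_contigs(genome_len, read_len, cov, min_overlap=0):
    """Expected number of contigs under the Lander-Waterman model.

    n = cov * L / l reads are needed for the given coverage;
    sigma = 1 - T / l shrinks the usable read length by the overlap T
    that is required to detect that two reads overlap.
    """
    num_reads = cov * genome_len / read_len
    sigma = 1 - min_overlap / read_len
    return num_reads * math.exp(-cov * sigma)

# Contig counts for a 1 Mbp genome at increasing coverage (assumed 500 bp reads).
curve = {cov: expected_contigs(1_000_000, 500, cov) for cov in (1, 2, 4, 8)}
```

The count rises while reads are too sparse to overlap, peaks near 1-fold coverage, and then falls off exponentially as coverage increases.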
![Page 39: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/39.jpg)
How much coverage is needed
Cover region with >7-fold redundancy
Overlap reads and extend to reconstruct the original DNA sequence
reads
![Page 40: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/40.jpg)
Repeats complicate fragment assembly
True overlap
Repeat overlap
![Page 41: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/41.jpg)
Challenges in fragment assembly
• Repeats: a major problem for fragment assembly
• More than 50% of the human genome consists of repeats:
  – over 1 million Alu repeats (about 300 bp)
  – about 200,000 LINE repeats (1000 bp and longer)
• Green and blue fragments are interchangeable when assembling repetitive DNA
![Page 42: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/42.jpg)
Repeat types
• Low-complexity DNA (e.g., ATATATATACATA…)
• Microsatellite repeats: (a1…ak)^N where k ~ 3-6 (e.g., CAGCAGTAGCAGCACCAG)
• Transposons/retrotransposons
  – SINE: Short Interspersed Nuclear Elements (e.g., Alu: ~300 bp long, 10^6 copies)
  – LINE: Long Interspersed Nuclear Elements (~500 - 5,000 bp long, 200,000 copies)
  – LTR retroposons: Long Terminal Repeats (~700 bp) at each end
• Gene families: genes duplicate and then diverge
• Segmental duplications: very long, very similar copies
![Page 43: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/43.jpg)
Celera assembler
• “The key to not being confused by repeats is the exploitation of mate pair information to circumnavigate and to fill them”
• A mate pair is two reads from the same clone; we know the distance between the two reads
• Myers et al. 2000 “A Whole-Genome Assembly of Drosophila”. Science, 287:2196 - 2204
![Page 44: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/44.jpg)
Celera assembler: unitig
• Unitig: a maximal interval subgraph of the graph of all fragment overlaps for which there are no conflicting overlaps to an interior vertex
• A-statistic: log-odds ratio of the probability that the distribution of fragment start points is representative of a “correct” unitig versus an overcollapsed unitig of two repeat copies.
![Page 45: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/45.jpg)
Celera Assembler: scaffold
Contigs that are ordered and oriented into scaffolds with approximately known distances between them (using mate pairs or BAC ends)
![Page 46: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/46.jpg)
Finishing: filling in gaps
![Page 47: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/47.jpg)
Human genome
• 2001: two assemblies of initial human genome sequences published
  – International Human Genome Project (hierarchical sequencing; BAC shotgun)
  – Celera Genomics: WGS approach
• Initial impact of the sequencing of the human genome (Nature 470:187–197, 2011)
![Page 48: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/48.jpg)
Assembly of human genome
J. C. Venter et al., Science 291, 1304 -1351 (2001)
sequence tagged site (STS) markers
![Page 49: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/49.jpg)
Fragment assembly: two alternative choices
• Find a path visiting every VERTEX exactly once in the OVERLAP graph: Hamiltonian path problem
  – NP-complete problem: no efficient algorithm is known
• Find a path visiting every EDGE exactly once in the REPEAT graph: Eulerian path problem
  – Linear time algorithms are known
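One well-known linear-time method for the Eulerian path problem is Hierholzer's algorithm; the slides do not say which algorithm they have in mind, so the sketch below is an illustration rather than the lecture's method. The function name, the toy sequence `ATGGCGTGCA`, and the choice k = 3 are assumptions made for the example, and the code assumes an Eulerian path exists.

```python
from collections import defaultdict

def eulerian_path(edges):
    """Hierholzer's algorithm for a directed multigraph given as (u, v) pairs.

    Assumes the graph has an Eulerian path; no validity check is performed.
    """
    graph = defaultdict(list)
    balance = defaultdict(int)  # out-degree minus in-degree per vertex
    for u, v in edges:
        graph[u].append(v)
        balance[u] += 1
        balance[v] -= 1
    # Start where there is one extra outgoing edge; otherwise anywhere (a cycle).
    start = next((v for v, b in balance.items() if b == 1), edges[0][0])
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: vertex is final in the path
    return path[::-1]

# Edges are the 3-mers of a hypothetical sequence, joining 2-mer vertices.
toy_seq = "ATGGCGTGCA"
toy_edges = [(toy_seq[i:i + 2], toy_seq[i + 1:i + 3]) for i in range(len(toy_seq) - 2)]
path = eulerian_path(toy_edges)
```

Each edge is pushed and popped once, so the running time is linear in the number of edges. When a vertex is revisited, more than one Eulerian path may exist, so the path found need not spell the original sequence.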
![Page 50: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/50.jpg)
Overlap graph
False overlaps induced by repeats
thick edges (a Hamiltonian cycle) correspond to the correct layout of the reads along the genome
![Page 51: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/51.jpg)
Eulerian path approach
Pairwise overlaps between reads are never explicitly computed, hence no expensive overlap step is necessary
Overlap between two reads (bold) that can be inferred from the corresponding paths through the de Bruijn graph
![Page 52: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/52.jpg)
De Bruijn graph / repeat graph (no sequencing errors)
Sequence: ABCDEFCGHBCDIFCGJ
Vertices: (k-1)-mers from the sequence: AB BC CD DE EF FC CG GH HB DI IF GJ
Edges: k-mers from the sequence
Every sub-repeat is represented as a repeat edge in the graph: BCD, FCG
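The construction can be reproduced on the slide's example sequence with k = 3, so that vertices are 2-mers and edges are 3-mers. The function name and the use of edge multiplicities to flag repeat edges are choices made for this sketch.

```python
from collections import Counter

def de_bruijn_edges(seq, k):
    """Count each k-mer as an edge from its (k-1)-mer prefix to its (k-1)-mer suffix."""
    edges = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        edges[(kmer[:-1], kmer[1:])] += 1
    return edges

edges = de_bruijn_edges("ABCDEFCGHBCDIFCGJ", 3)
vertices = {vertex for edge in edges for vertex in edge}
# Edges traversed more than once correspond to repeated k-mers.
repeat_edges = sorted(u + v[-1] for (u, v), count in edges.items() if count > 1)
```

With these inputs the graph has the 12 two-letter vertices listed on the slide, and the only edges with multiplicity above one are BCD and FCG.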
![Page 53: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/53.jpg)
Repeat graph
[Figure: A-Bruijn graph and repeat graph; removing bulges and whirls.]
Pevzner, Tang and Waterman. “A New Approach to Fragment Assembly in DNA Sequencing”. RECOMB01
![Page 54: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/54.jpg)
Genome assembly viewer
EagleView
![Page 55: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/55.jpg)
Assembly quality metrics
• Number of contigs, the longest contig
• N50: the contig length such that contigs of equal or greater length contain half the bases of the genome (or of all the contigs)
  – sort all contigs from largest to smallest
  – contig sizes: 2M, 1M, 0.5M, 0.3M, 0.2M, … 500bp with total bases = 4M; then N50 = 2M (the 2M contig alone already holds half of the 4M total)
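The N50 definition translates directly into a short function. This sketch uses the total length of all contigs as the reference (the "or all the contigs" variant of the definition), and the contig lengths in the example are hypothetical.

```python
def n50(contig_lengths):
    """Length of the contig at which the running total, taken from the
    largest contig downwards, first reaches half of the total bases."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("contig_lengths must not be empty")

# Hypothetical assembly: total 290, half is 145, and 80 + 70 = 150 reaches it.
example_n50 = n50([80, 70, 50, 40, 30, 20])
```

When the genome size is known, the same loop can compare the running total against half of the genome size instead of half of the assembled bases.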
![Page 56: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/56.jpg)
Genome assembly reborn
• Genome assembly reborn: recent computational challenges (Briefings in Bioinformatics 2009 10(4):354-366)
• Hybrid assembler (?)
![Page 57: Sequencing techniques and genome assembly](https://reader035.fdocuments.us/reader035/viewer/2022062222/56815869550346895dc5c794/html5/thumbnails/57.jpg)
Sequencing wars
• “Ion Torrent’s Fast and Cheap DNA Sequencer Catches On, Even as Biologists Tighten Belts”
  – semiconductor-based and almost works like a pH meter in some respects; Personal Genome Machine in December 2010
  – Jonathan Rothberg founded 454 Life Sciences, sold to Roche in 2007
  – Carlsbad, CA-based Life Technologies
• Sequencing Wars: The Third Generation