Bioinformatics and Sequencing - Michigan State University CSS 451... · 2010-04-13
Bioinformatics and Sequencing
from C. Robin Buell and Dave Douches
Michigan State University, East Lansing, MI 48824
Prokaryotic DNA
Plasmid
http://en.wikipedia.org/wiki/Image:Prokaryote_cell_diagram.svg
Eukaryotic DNA
http://en.wikipedia.org/wiki/Image:Plant_cell_structure_svg.svg
DNA Structure
The two strands of a DNA molecule are held together by weak bonds (hydrogen bonds) between the nitrogenous bases, which are paired in the interior of the double helix.
The two strands of DNA are antiparallel; they run in opposite directions. The carbon atoms of the deoxyribose sugars are numbered for orientation.
http://en.wikipedia.org/wiki/Image:DNA_chemical_structure.png
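Because the two strands are complementary and antiparallel, the sequence of one strand read 5'→3' is the reverse complement of the other. A minimal sketch, assuming uppercase A/C/G/T input; the function name `reverse_complement` is illustrative:

```python
# Base-pairing rules: A pairs with T, C pairs with G
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5'->3'.

    Complementing gives the partner bases; reversing accounts for the
    strands running in opposite (antiparallel) directions.
    """
    return seq.upper().translate(COMPLEMENT)[::-1]

print(reverse_complement("AGCTAGGCTC"))  # GAGCCTAGCT
```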
Sequencing DNA
The goal of sequencing DNA is to tell the order of the bases, or nucleotides, that form the inside of the double-helix molecule.
High-throughput sequencing methods:
-Sanger/Dideoxy
-Next Generation (NextGen)
Whole Genome Shotgun Sequencing
• Start with a whole genome
• Shear the DNA into many different, random segments.
• Sequence each of the random segments.
• Then, put the pieces back together again in their original order using a computer
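The "put the pieces back together" step is the assembly problem. As a toy illustration only, the sketch below repeatedly merges the pair of reads with the longest exact suffix-prefix overlap. This greedy merge is a swapped-in teaching technique, not the method used for any genome in these slides; real assemblers must also handle sequencing errors, repeats and both strands. The reads and `min_overlap` value are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (>= min_len), else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Merge the best-overlapping pair until no overlaps remain; return contigs."""
    # Drop reads wholly contained in another read, then exact duplicates
    contigs = [r for r in reads if not any(r != o and r in o for o in reads)]
    contigs = list(dict.fromkeys(contigs))
    while True:
        best = (0, -1, -1)  # (overlap length, left contig index, right contig index)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            return contigs
        merged = contigs[i] + contigs[j][k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (i, j)] + [merged]

print(greedy_assemble(["ATTAGACC", "GACCTGCC", "TGCCGGAA"]))  # ['ATTAGACCTGCCGGAA']
```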
Figure: sequencer output (many short, overlapping reads such as AGCTAGGCTC and CTCGCTAGCTAG) → assemble fragments → fill in any gaps → annotate genes (Gene 1, Gene 2, Gene 3, ...).
Theory Behind Shotgun Sequencing
Figure: number of gaps vs. number of sequences for Haemophilus influenzae (1.83 Mb genome).
Coverage vs. genome unsequenced (%): 1X 37%; 2X 13%; 5X 0.67%; 6X 0.25%; 7X 0.09%
For a 1.83 Mb genome, 6X coverage is 10.98 Mb of sequence, or about 22,000 sequencing reactions (500 bp average read) from 11,000 clones (1.5-2.0 kb inserts), i.e. two reads per clone.
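The percentages in the figure follow the Poisson expectation from Lander-Waterman theory: at c-fold coverage, the fraction of bases never sequenced is about e^(-c). A small sketch reproducing those values and the read arithmetic above; the function names are illustrative, and the model assumes reads land uniformly at random:

```python
import math

def fraction_unsequenced(coverage: float) -> float:
    """Poisson (Lander-Waterman) estimate of the fraction of the genome left uncovered."""
    return math.exp(-coverage)

def reads_needed(genome_bp: float, coverage: float, read_len_bp: float) -> int:
    """Number of reads of a given length needed to reach the target coverage."""
    return math.ceil(genome_bp * coverage / read_len_bp)

for c in (1, 2, 5, 6, 7):
    print(f"{c}X: {100 * fraction_unsequenced(c):.2f}% unsequenced")

# H. influenzae example: 1.83 Mb genome, 6X coverage, 500 bp reads
n = reads_needed(1.83e6, 6, 500)
print(n, "reads from about", n // 2, "clones (two end reads per clone)")
```

The computed 21,960 reads and 10,980 clones round to the ~22,000 reactions and 11,000 clones quoted on the slide.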
Sanger Dideoxy Sequencing Reactions
-Initial dideoxy sequencing used radioactive dATP and 4 separate reactions (ddATP, ddTTP, ddCTP, ddGTP), with separation on 4 separate lanes of an acrylamide gel and detection by autoradiography
-Newer technologies use 4 fluorescently labeled bases, separation in capillaries, and detection with a CCD camera
Sanger Dideoxy DNA sequencing
Data Analysis
• A chromatogram is produced and the bases are called
• Software such as Phred & TraceTuner assigns a quality value to each base. These programs:
  • Read DNA sequencer traces
  • Call bases
  • Assign base quality values
  • Write base calls and quality values to output files
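Phred-style quality values are logarithmic: Q = -10·log10(P), where P is the estimated probability that the base call is wrong, so Q20 means a 1 in 100 chance of error and Q30 a 1 in 1,000 chance. A minimal conversion sketch; the ASCII offset of 33 is the common FASTQ ("Phred+33") convention and is an assumption here, since the slides do not name an output format:

```python
import math

def phred_from_error(p_error: float) -> float:
    """Phred quality Q = -10 * log10(P), P = probability the base call is wrong."""
    return -10 * math.log10(p_error)

def error_from_phred(q: float) -> float:
    """Inverse conversion: estimated error probability for quality Q."""
    return 10 ** (-q / 10)

def decode_quals(qual_string: str, offset: int = 33) -> list[int]:
    """ASCII-encoded quality string -> integer Q values (Phred+33 offset assumed)."""
    return [ord(ch) - offset for ch in qual_string]

print(phred_from_error(0.01))   # Q20: 1 error expected per 100 calls
print(error_from_phred(30))     # Q30: 1 error expected per 1,000 calls
print(decode_quals("II5+"))     # [40, 40, 20, 10]
```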
Figure: examples of GOOD and BAD sequencing traces.
Next Generation Sequencing Technology
Main Differences from Sanger (NextGen vs Sanger):
-Sequencing by synthesis vs chain termination
-PCR amplification of template vs E. coli cloning
-Pennies vs dollars
-Hundreds of thousands/millions vs 96 reads per run
-36 bp vs 700 bp read length
-1-2% vs 0.01% error rate
Fundamental differences in Sanger vs Next Gen Sequencing Approaches
454 Genome Sequencing System
• Library prep, amplification and sequencing: 2-4 days
• Single sample preparation for anything from bacterial to human genomic DNA
• Single amplification per genome, with no cloning or cloning artifacts
• Picoliter-volume molecular biology
• 400 Mb per run (4-5 hr); less than $15,000 per run
• Read lengths 200-230 bases; new Titanium platform: 400 Mb per run, 400-500 bases per read
• Massively parallel imaging, fluidics and data analysis
• Requires high genome coverage for good assembly
• Error rate of 1-2%
• Problem with homopolymers
Two types of DNA amplification for NextGen: Clonal amplification or Bridge PCR
Clonal amplification by emulsion PCR (454, Polonator, SOLiD)
454 Sequencing: Pyrosequencing
454: The sequencing reaction is coupled to a second reaction that generates light from the released pyrophosphate (PPi): ATP sulfurylase converts PPi to ATP, and luciferase uses that ATP to oxidize luciferin. Light emission is measured as each nucleotide flows across the PicoTiterPlate; the intensity is proportional to the number of bases incorporated.
The most common errors are insertions/deletions, especially in homopolymer runs.
Current platform output (maximal): GS XLR70 ('Titanium') = ~1,000,000 reads @ ~400 bp => 400 Mb per run
454-Pyrosequencing
-Construct single-stranded, adaptor-ligated DNA
-Perform emulsion PCR
-Deposit DNA beads into the PicoTiter™Plate
-Sequencing by synthesis: simultaneous sequencing of the entire genome in hundreds of thousands of picoliter-size wells
-Pyrophosphate signal generation
Software: Flowgrams and base calling
Key sequence = TCAG, used for identifying wells and calibration. The individual bases (TCAG) are flowed repeatedly, e.g. 100 times, to obtain 100 bp reads.
Figure: flowgram of the sequence TTCTGCGAA (x-axis: base flow, in the order TACG; y-axis: signal strength). The height of each peak shows the number of bases in a homopolymer.
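The flowgram idea can be sketched as naive base calling: each flow adds one base type, and rounding the signal gives the homopolymer length. This is a simplification (real 454 software also corrects for incomplete extension and noise), the flow order TACG is taken from the figure, and the signal values are hypothetical:

```python
FLOW_ORDER = "TACG"  # flow order shown in the figure; runs may use a different cycle

def call_bases(signals: list[float], flow_order: str = FLOW_ORDER) -> str:
    """Naive flowgram base calling: round each flow's signal to a homopolymer length."""
    bases = []
    for i, signal in enumerate(signals):
        base = flow_order[i % len(flow_order)]
        n = round(signal)  # peak height ~ number of identical bases incorporated
        bases.append(base * max(n, 0))
    return "".join(bases)

# Hypothetical, slightly noisy signals for flows T,A,C,G,T,A,C,G,T,A,C,G,T,A
signals = [2.1, 0.1, 0.9, 0.05, 1.1, 0.0, 0.2, 0.95, 0.1, 0.0, 1.05, 0.9, 0.1, 1.9]
print(call_bases(signals))  # TTCTGCGAA
```

Rounding also shows why homopolymers are error-prone: the signal for 6 vs 7 identical bases differs proportionally much less than for 1 vs 2, so noise more easily shifts the call by one base.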
Accuracy of Homopolymers in E. coli: Individual Reads and 22x Consensus
Figure: accuracy (%) vs. homopolymer length (1-9) for individual reads and the 22x consensus.
Software
Observed Individual Read Accuracy
Figure: cumulative read error (%) vs. base position (0-250) for all filtered reads (includes all error modes): E. coli runs #1-#4, T. thermophilus and C. jejuni, compared with results reported in Nature 2005, GS20 (Q2 2006) and GS FLX.
Lower quality sequence at 3' end of read
The PicoTiterPlate can be divided into 2, 4, 8 or 16 sectors (a gasket separates the sectors) to run different samples or different libraries in each sector
GS FLX Titanium MID (Multiplex Identifiers) Protocol
• MID Adaptor = Shotgun Adaptor + specially encoded 10-base sequence after the key
Figure: read structure and sequencing primer location for each adaptor.
Normal read (Shotgun Adaptor): Primer A - Key - Library fragment - Primer B
MID read (MID Adaptor): Primer A - Key - MID - Library fragment - Primer B
Use Physical and Coded Multiplexing together to maximize run efficiency and reduce costs
12 MIDs x 16 sectors = 192 different samples per run
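Coded multiplexing implies a demultiplexing step after the run: read the 10-base MID that follows the key and bin each read by sample. A minimal sketch with exact matching only; the MID sequences and sample names below are hypothetical placeholders, not Roche's MID set, and production tools typically tolerate sequencing errors in the MID:

```python
KEY = "TCAG"  # 454 key sequence, read immediately before the MID
MID_LEN = 10

# Hypothetical 10-base MIDs for illustration only; use the kit's documented MIDs
MIDS = {"sample_1": "ACACGTGTCA", "sample_2": "TGTGCACAGT"}

def demultiplex(reads: list[str], mids: dict[str, str], key: str = KEY) -> dict[str, list[str]]:
    """Bin reads by the exact MID following the key, trimming key + MID from each read."""
    lookup = {seq: name for name, seq in mids.items()}
    bins: dict[str, list[str]] = {name: [] for name in mids}
    bins["unassigned"] = []
    for read in reads:
        if read.startswith(key):
            name = lookup.get(read[len(key):len(key) + MID_LEN])
            if name is not None:
                bins[name].append(read[len(key) + MID_LEN:])
                continue
        bins["unassigned"].append(read)
    return bins

reads = [KEY + MIDS["sample_1"] + "GGATCCTT", KEY + MIDS["sample_2"] + "AAGCTT", "TCAGNNNNNNNNNNACGT"]
print(demultiplex(reads, MIDS))
print(12 * 16, "possible samples: 12 MIDs x 16 sectors")
```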
GS FLX Titanium Multi Span Paired End Overview
• High molecular weight genomic DNA is sheared to the desired span distance of 20 kb, 8 kb, or 3 kb
• Circularization adaptors containing a loxP target sequence are ligated onto the fragment ends
• Cre recombinase-mediated intramolecular recombination circularizes the fragments
GS FLX Titanium Multi Span Paired End Overview
• Ligation of 454 sequencing adaptors
• Adaptors required for emPCR and sequencing
• Amplification and sequencing with the GS FLX Titanium series kits
De Novo Sequencing of Microbial Genomes: One Genome, One Scaffold
Figure: S. pneumoniae, E. coli, T. thermophilus and C. jejuni, each from an 8 kb library. Four microbial genomes. One run.
Assemblies were generated using one 8 kb library per genome on a 4 region gasket.
Microbial Genome Assembly

| | S. pneumoniae | E. coli K-12 | T. thermophilus | C. jejuni |
|---|---|---|---|---|
| Number of Chromosomes | 1 | 1 | 2 | 1 |
| Large Scaffolds | 1 | 1 | 2 | 1 |
| Genome Size | 2.2 Mb | 4.6 Mb | 2.1 Mb | 1.6 Mb |
| N50 Scaffold Size | 2.2 Mb | 4.6 Mb | 1.9 Mb | 1.6 Mb |
| N50 Contig Size | 26.1 kb | 57.1 kb | 10.5 kb | 153.8 kb |
| Genome Coverage | 99.6% | 100% | 100% | 99.3% |
| Oversampling | 25x | 15x | 33x | 33x |
| Number of Runs | ¼ | ¼ | ¼ | ¼ |
Estimate cost at $3-5K per genome
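N50, used for the contig and scaffold rows above, is the length L such that contigs (or scaffolds) of length L or longer together contain at least half of the assembled bases. A minimal sketch with hypothetical contig lengths:

```python
def n50(lengths: list[int]) -> int:
    """Smallest length L such that pieces >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached half of the total assembled length
            return length
    raise ValueError("empty list of contig lengths")

print(n50([100, 80, 70, 50, 30, 20]))  # hypothetical lengths in kb -> 80
```

Here the total is 350 kb; the two largest contigs (100 + 80 = 180 kb) pass the halfway mark, so N50 = 80 kb.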
De Novo Sequencing of Complex Genomes
Complex Genome Assembly

| | Drosophila melanogaster | Arabidopsis thaliana | Cucumber |
|---|---|---|---|
| Genome Size | 175 Mb | 157 Mb | 376 Mb |
| N50 Contig Size | 33 kb | 37 kb | 30 kb |
| N50 Scaffold Size | 5.4 Mb | 4.6 Mb | 1.1 Mb |
| Largest Scaffold | 10.9 Mb | 9.3 Mb | 5.6 Mb |
| Oversampling: Shotgun | 12x | 17x | 16x |
| Oversampling: 3 kb Paired End | 1.6x | 1.8x | 8x |
| Oversampling: 20 kb Paired End | 1.6x | 2x | 1.5x |
Solexa/Illumina Sequencing
• Sequencing by synthesis (not chain termination)
• Generates up to 12 Gb per run
• Bridge or cluster PCR
Illumina/Solexa Genome Analyzer II
Illumina sequencing reaction
Paired end reads
Multiplexing via 96 barcodes per lane
Substitution errors at 3’ end of read
ABI SOLiD:
•Sequencing by Oligonucleotide Ligation and Detection
•Two-base encoding (higher accuracy)
•Ligation of octamers (4 fluors, 4 two-base combinations per fluor)
•Do one set of ligations for X cycles
•Reset with a one-base offset to read the other base
•Paired end reads
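In two-base encoding, each color reports a pair of adjacent bases, and each of the 4 colors stands for 4 of the 16 dinucleotides. One compact way to express that code is the XOR of 2-bit base codes (A=0, C=1, G=2, T=3); given the known first base, the colors decode back to bases. A sketch of the idea, not ABI software:

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"

def encode(seq: str) -> tuple[str, list[int]]:
    """Return the leading base plus one color (0-3) per adjacent base pair."""
    colors = [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]
    return seq[0], colors

def decode(first_base: str, colors: list[int]) -> str:
    """Rebuild bases from the known first base and the color calls."""
    out = [first_base]
    for c in colors:
        out.append(BASES[CODE[out[-1]] ^ c])
    return "".join(out)

first, colors = encode("ATGGCA")
print(first, colors)           # A [3, 1, 0, 3, 1]
print(decode(first, colors))   # ATGGCA
print(encode("ATGCCA")[1])     # [3, 1, 3, 0, 1]: one base change alters two adjacent colors
```

A true SNP changes two adjacent colors, while a single miscalled color corrupts every downstream base on decoding; that difference is what lets color-space analysis separate real SNPs from measurement errors.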
HAPLOID !!!! SMALL GENOME, LITTLE REPETITIVE SEQUENCE
NextGen sequencing data requires new algorithms:
-due to size of datasets (Terabytes, and RAM for assembly)
-short reads
-error models
Other “Next Generation” Sequencing Technologies
SOLiD by Applied Biosystems - short reads (~25-75 nucleotides)
Helicos - short reads (<50 nucleotides)
Pacific Biosciences - LONG reads (several kilobases)
http://www.pacificbiosciences.com/video_lg.html
How much sequence is needed to assemble a eukaryotic genome?
-Depends on the genome size of the organism (genes plus repeats), ploidy level, heterozygosity, desired quality
Figure: genome sizes: Arabidopsis 130 Mb; Rice 430 Mb; Potato 850 Mb; John Doe 2,500 Mb; Wheat 16,000 Mb (an additional, unlabeled 5 Mb genome is also shown).
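The first-order arithmetic is raw sequence = genome size × fold coverage. A sketch for the genomes in the figure, assuming a hypothetical 20X target and the 400 Mb per 454 Titanium run quoted earlier; it ignores the ploidy, heterozygosity, repeat content and quality factors listed above, all of which raise the real requirement:

```python
RUN_YIELD_MB = 400     # 454 Titanium output per run, as quoted earlier in these slides
TARGET_COVERAGE = 20   # hypothetical target; real projects vary

GENOME_MB = {"Arabidopsis": 130, "Rice": 430, "Potato": 850,
             "John Doe": 2500, "Wheat": 16000}

def sequence_needed_mb(genome_mb: float, coverage: float) -> float:
    """Raw sequence needed (Mb) = genome size x fold coverage."""
    return genome_mb * coverage

for name, size in GENOME_MB.items():
    mb = sequence_needed_mb(size, TARGET_COVERAGE)
    print(f"{name}: {mb:,.0f} Mb of sequence, about {mb / RUN_YIELD_MB:,.1f} runs")
```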
Eukaryotic Genomes and Gene Structures
Figure: genes separated by intergenic regions.
Expressed Sequence Tags (ESTs): Sampling the Transcriptome and Genic Regions
What is an EST? A single-pass sequence from a cDNA of a specific tissue, stage, environment, etc.
Figure: cDNA library in E. coli (inserts in pBluescript) → pick individual clones → template prep → sequence from the T7 and T3 ends.
Multiple tissues, states... with enough sequences, can ask quantitative questions
Uses of EST sequencing:
-Gene discovery
-Digital northerns/insights into the transcriptome
-Genome analyses, especially annotation of genomic DNA
-SNP discovery in genic regions
Issues with EST sequencing:
-Inherent low quality due to single-pass nature
-Not 100% full-length cDNA clones
-Redundant sequencing of abundant transcripts
Address through clustering/assembly to build consensus sequences = Gene Index, Unigene Set, Transcript Assembly
Figure: a locus/gene with its gene models, full-length cDNAs and expressed sequence tags.
Types of Genomic/DNA-based Diagnostic Markers
1. Restriction Fragment Length Polymorphisms (RFLPs)
2. Random Amplified Polymorphic DNA (RAPDs)
3. Cleaved Amplified Polymorphic Sequences (CAPS)
4. Amplified Fragment Length Polymorphisms (AFLPs)
5. Simple Sequence Repeats (SSRs; microsatellites)
6. Single Nucleotide Polymorphisms (SNPs)
SSRs
-Specific primers flank a simple sequence repeat (mono-, di-, tri-, tetranucleotide, etc.), which has a higher likelihood of being polymorphic
-Amplify genomic DNA
-Separate on a gel
-Look for size polymorphisms
http://www.nal.usda.gov/pgdic/Probe/v2n1/chart.gif
http://cropandsoil.oregonstate.edu/classes/css430/images/0902.jpg
SSRs
-Computational prediction of SSRs in potato transcriptome data
-http://solanaceae.plantbiology.msu.edu/analyses_ssr_query.php
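Computational SSR prediction amounts to scanning sequences for short motifs repeated in tandem. A minimal regex sketch; it is not the pipeline behind the URL above, and the motif range (1-4 bases) and minimum of 5 copies are hypothetical thresholds (tools often use different minimums per motif length):

```python
import re

def find_ssrs(seq: str, min_copies: int = 5) -> list[tuple[int, str, int]]:
    """Find simple sequence repeats (motif length 1-4) as (0-based start, motif, copies)."""
    # Lazy {1,4}? prefers the shortest motif, so (AT)n is reported as AT, not ATAT
    pattern = re.compile(r"([ACGT]{1,4}?)\1{%d,}" % (min_copies - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        hits.append((m.start(), motif, len(m.group(0)) // len(motif)))
    return hits

example = "GGC" + "AT" * 6 + "CC" + "GA" * 5 + "GTT"  # hypothetical transcript fragment
print(find_ssrs(example))  # [(3, 'AT', 6), (17, 'GA', 5)]
```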
SNPs
-Specific primers
-Amplify genomic DNA
-Detect mismatch (many methods for this)
http://cmbi.bjmu.edu.cn/cmbidata/snp/images/SNP.gif
What is a SNP?
Single Nucleotide Polymorphism
• A polymorphism describes the existence of different alleles of the same gene in plants or a population of plants. These differences are tracked as molecular markers to identify desired genes and the resulting trait.
• SNPs are the result of point mutations
• Deletions, insertions, or substitutions
How are SNPs found?
1. The sequence of the organism is obtained from a database or BAC library
2. PCR primers are designed to flank potential SNP-containing DNA segments
3. Amplify DNA of diverse individuals by PCR
4. Sequence the amplified fragments for each individual
5. Compare the sequences and look for SNPs
Individual 1: TAGCAATGCCTAATGCCAT
Individual 2: TAGCAATGCCTACTGCCAT
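Step 5 ("compare the sequences") can be sketched for the two example sequences. This assumes the fragments are already aligned to the same length and reports substitutions only; insertions and deletions would need a proper alignment first:

```python
def find_snps(seq1: str, seq2: str) -> list[tuple[int, str, str]]:
    """Report (1-based position, base in seq1, base in seq2) where aligned sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(seq1, seq2)) if a != b]

print(find_snps("TAGCAATGCCTAATGCCAT", "TAGCAATGCCTACTGCCAT"))  # [(13, 'A', 'C')]
```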
Different SNP detection methods
• High Resolution Melting
• Single base extension
• Allele specific priming
High Resolution Melting (HRM)
• Uses high temperatures to denature dsDNA, similarly to PCR
• Intercalating fluorescent dyes (which fit between bases in the DNA strands) are used to monitor the DNA fragments
• When bound to dsDNA there is strong fluorescence
• As the DNA is denatured, the fluorescence decreases
• A camera monitors the change in fluorescence, creating a melting curve
High Resolution Melting (HRM)
• If a SNP is present, the changed base will change the melting curve slightly
• This change is small, but because the camera is high resolution, it is detectable
www.wikipedia.org
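The melting-curve idea can be sketched numerically: fluorescence falls as DNA denatures, and the temperature of the steepest drop (the peak of -dF/dT) is the apparent melting temperature. The curves below are idealized sigmoids with hypothetical Tm values; real HRM software normalizes whole curves and compares their shapes, which this sketch does not attempt:

```python
import math

def melt_curve(tm: float, temps: list[float], width: float = 0.5) -> list[float]:
    """Idealized fluorescence: high while double-stranded, dropping sigmoidally around Tm."""
    return [1 / (1 + math.exp((t - tm) / width)) for t in temps]

def melting_peak(temps: list[float], fluor: list[float]) -> float:
    """Temperature of the largest -dF/dT (central differences), i.e. the apparent Tm."""
    best_t, best_slope = temps[1], float("-inf")
    for i in range(1, len(temps) - 1):
        slope = -(fluor[i + 1] - fluor[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_t, best_slope = temps[i], slope
    return best_t

temps = [75 + 0.1 * i for i in range(101)]  # 75.0-85.0 C in 0.1 C steps
allele_1 = melt_curve(80.0, temps)          # hypothetical reference allele
allele_2 = melt_curve(80.4, temps)          # hypothetical SNP allele with a slightly shifted Tm
print(round(melting_peak(temps, allele_1), 1), round(melting_peak(temps, allele_2), 1))
```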
High Resolution Melting (HRM)
• Benefits:
  • Cost effective: good for large-scale projects
  • Fast and accurate
  • Simple: can be done in any lab with an HRM-capable real-time PCR machine
Single Base Extension
• In the PCR reaction, following amplification of the SNP fragment, a single-base sequencing reaction occurs
• It uses a primer designed to anneal one base short of the polymorphic site
• The single dideoxynucleotide added to the primer is usually labeled with a fluorescent dye
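The single-base step can be sketched on the two example sequences from the SNP discovery slide. The model assumes the primer (written 5'→3') matches the sense strand and ends one base before the SNP, so the one base the polymerase adds identifies the allele; the primer below is hypothetical:

```python
from typing import Optional

def extend_one_base(sense: str, primer: str) -> Optional[str]:
    """Name the single ddNTP added to a primer that ends one base short of the SNP.

    Model: the primer matches the sense strand and anneals to the complementary
    template strand, so the one base added is the sense-strand base at the next
    position (the complement of the template base).
    """
    start = sense.find(primer)
    if start == -1:
        return None  # primer does not match this target
    end = start + len(primer)
    if end >= len(sense):
        return None  # no base left to extend into
    return "dd" + sense[end] + "TP"

primer = "TAGCAATGCCTA"  # hypothetical primer ending just before the SNP (position 13)
print(extend_one_base("TAGCAATGCCTAATGCCAT", primer))  # ddATP
print(extend_one_base("TAGCAATGCCTACTGCCAT", primer))  # ddCTP
```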
Single Base Extension
• As the PCR reaction continues:
• If there is a SNP, the DNA polymerase will not extend the primer, resulting in a shorter fragment when visualized on a gel
http://las.perkinelmer.com/content/snps/protocol.asp
Allele Specific Priming
• When designing the primer, the SNP should be at the 3' end of the primer and the label/tag sequence at the 5' end
• Different labels for different alleles
• The SNP must be known, and a primer must be designed for each allele of this SNP
• The different allele primers will be detected only if they anneal to the template
Illumina GoldenGate Assay Overview and GoldenGate Assay Workflow
Potato SNPs: Intra-varietal and inter-varietal
- Bulk of sequence data from ESTs (Sanger derived)
- Use computational methods to identify SNPs within existing potato ESTs
- http://solanaceae.plantbiology.msu.edu/analyses_snp.php
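The slide does not say which computational method was used on the ESTs. As one illustration of the general idea, the sketch below assumes the ESTs have already been aligned into equal-length rows (with '-' for gaps or no coverage) and flags alignment columns where at least two different bases are each seen in a minimum number of reads. The function name and the `min_allele_count` threshold are hypothetical.

```python
from collections import Counter


def candidate_snps(aligned_reads: list[str], min_allele_count: int = 2) -> list[tuple[int, dict[str, int]]]:
    """Report alignment columns where >= 2 distinct bases each occur in >= min_allele_count reads."""
    if not aligned_reads:
        return []
    if len({len(read) for read in aligned_reads}) != 1:
        raise ValueError("aligned reads must all have the same length")
    snps = []
    for col in range(len(aligned_reads[0])):
        # Count only real bases; '-' marks a gap or a read that does not cover this column.
        counts = Counter(read[col] for read in aligned_reads if read[col] in "ACGT")
        supported = {base: n for base, n in counts.items() if n >= min_allele_count}
        if len(supported) >= 2:
            snps.append((col, dict(sorted(supported.items()))))
    return snps


reads = [  # hypothetical aligned ESTs
    "ACGTACGT",
    "ACGTACGT",
    "ACGAACGT",
    "ACGAACG-",
    "--GTACGT",
]
print(candidate_snps(reads))  # [(3, {'A': 2, 'T': 3})]
```

Requiring more than one read per base is a simple guard against single-read sequencing errors being called as SNPs.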
Illumina Paired End RNA-Seq
• Potato Varieties: Atlantic, Premier, Snowden
• Two Paired End RNA-Seq runs were performed.
• Reads are 61bp long
• Insert sizes:
  • Atlantic: 350bp
  • Premier: 300bp
  • Snowden: 300bp
• Paired End Sequencing is carried out by an Illumina module that regenerates the clusters after the first run and sequences the clusters from the other end.
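With 61bp reads sequenced from both ends of each fragment, part of each insert is never read. Assuming "insert size" means the full fragment length spanned by both reads (definitions vary between pipelines), the unsequenced gap between mates is the insert size minus twice the read length; a negative value would mean the mates overlap. The helper name is hypothetical.

```python
READ_LENGTH = 61  # bp, from the slide
INSERT_SIZES = {"Atlantic": 350, "Premier": 300, "Snowden": 300}  # bp, from the slide


def unsequenced_gap(insert_size: int, read_length: int) -> int:
    """Bases between the two mates; negative means the mates overlap."""
    # Assumes insert_size is the whole fragment spanned by both reads.
    return insert_size - 2 * read_length


for variety, insert_size in INSERT_SIZES.items():
    print(variety, unsequenced_gap(insert_size, READ_LENGTH))
# Atlantic 228
# Premier 178
# Snowden 178
```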
Velvet Assemblies of Potato Illumina Sequences
• With a read length of 31 and a minimum contig length of 150bp:
• Atlantic:
  • 45214 contigs
  • N50: 666bp
  • max contig length: 11.2kb
  • Transcriptome size: 38.4Mb
• Premier:
  • 54917 contigs
  • N50: 408bp
  • max contig length: 6.6kb
  • Transcriptome size: 38.2Mb
• Snowden:
  • 58754 contigs
  • N50: 358bp
  • max contig length: 6.9kb
  • Transcriptome size: 39.1Mb
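N50 is the contig length at which contigs of that length or longer contain at least half of the assembled bases. The sketch below computes the statistics reported above from a list of contig lengths, applying the 150bp minimum contig length first. It assumes "transcriptome size" is the total length of the retained contigs, and the contig lengths in the usage example are made up for illustration (they are not Velvet output).

```python
def assembly_stats(contig_lengths: list[int], min_length: int = 150) -> dict[str, int]:
    """Contig count, N50, longest contig and total length after a minimum-length filter."""
    kept = sorted((n for n in contig_lengths if n >= min_length), reverse=True)
    if not kept:
        raise ValueError("no contigs pass the minimum length")
    total = sum(kept)
    running = 0
    n50 = kept[0]
    for length in kept:  # walk from the longest contig down
        running += length
        if 2 * running >= total:  # at least half of all bases are now covered
            n50 = length
            break
    return {"contigs": len(kept), "n50": n50, "max": kept[0], "total": total}


# Made-up contig lengths; the 120bp contig is dropped by the 150bp cutoff.
print(assembly_stats([120, 200, 300, 400, 500, 600]))
# {'contigs': 5, 'n50': 500, 'max': 600, 'total': 2000}
```

A higher N50 with a similar total size, as for Atlantic versus Snowden, means the same amount of sequence is held in fewer, longer contigs.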
Sequence quality: Viewing an Atlantic potato contig from the Velvet assembly
Identify intra-varietal SNPs

| Query | SNPs | Filtered SNPs |
|---|---|---|
| Atlantic Asm | 224748 | 150669 |
| Premier Asm | 265673 | 181800 |
| Snowden Asm | 258872 | 166253 |
A/C SNP
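The slide does not state the filtering criteria, but the counts from the intra-varietal SNP slide show what fraction of query SNPs survived filtering in each assembly. A short calculation using only those numbers (the helper name is hypothetical):

```python
SNP_COUNTS = {  # assembly: (query SNPs, filtered SNPs), from the intra-varietal SNP slide
    "Atlantic": (224748, 150669),
    "Premier": (265673, 181800),
    "Snowden": (258872, 166253),
}


def percent_retained(query: int, filtered: int) -> float:
    """Share of query SNPs that passed filtering, as a percentage."""
    return 100 * filtered / query


for assembly, (query, filtered) in SNP_COUNTS.items():
    print(f"{assembly}: {percent_retained(query, filtered):.1f}% retained")
# Atlantic: 67.0% retained
# Premier: 68.4% retained
# Snowden: 64.2% retained
```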
Hawkeye Viewer – Visualizing SNPs
G/T SNP
Analyses in progress
SNP Identification:
- Identify inter-varietal SNPs using draft genome sequence from S. phureja
- Identify only biallelic SNPs
- Identify high confidence SNPs
- Identify SNPs that meet Infinium design requirements
SNP Selection:
- Annotate transcripts for gene function
- Identify candidate genes within SNP set
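Two of the planned steps, keeping only biallelic SNPs and keeping only high-confidence SNPs, can be sketched as a filter over per-site base counts. The thresholds (`min_depth`, `min_minor_count`) are hypothetical stand-ins for whatever "high confidence" criteria the project used, and the sketch does not attempt the Infinium design requirements or the S. phureja comparison, which the slide does not specify.

```python
def filter_snps(
    sites: dict[int, dict[str, int]], min_depth: int = 10, min_minor_count: int = 3
) -> dict[int, dict[str, int]]:
    """Keep biallelic sites with enough total reads and enough reads for the minor allele."""
    kept = {}
    for pos, counts in sites.items():
        observed = {base: n for base, n in counts.items() if n > 0}
        if len(observed) != 2:
            continue  # monomorphic or more than two alleles: not biallelic
        depth = sum(observed.values())
        minor = min(observed.values())
        if depth >= min_depth and minor >= min_minor_count:
            kept[pos] = observed
    return kept


sites = {  # hypothetical per-site base counts
    101: {"A": 12, "C": 5},
    202: {"G": 8, "T": 1},
    303: {"A": 6, "C": 4, "T": 3},
    404: {"C": 4, "T": 3},
}
print(filter_snps(sites))  # {101: {'A': 12, 'C': 5}}
```

Site 202 fails on depth, 303 has three alleles, and 404 has too few reads overall, so only 101 survives with the default thresholds.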
Clonal amplification by emulsion PCR (454, Polonator, SOLiD)
Bridge or Cluster PCR (Illumina/Solexa)