Bioinformatics and Sequencing - Michigan State University CSS 451... · 2010-04-13 · NtG ti S i...

Bioinformatics and Sequencing, from C. Robin Buell and Dave Douches, Michigan State University, East Lansing, MI 48824

Page 1

Bioinformatics and Sequencing

from C. Robin Buell and Dave Douches

Michigan State University, East Lansing, MI 48824

Page 2

Prokaryotic DNA

Plasmid

http://en.wikipedia.org/wiki/Image:Prokaryote_cell_diagram.svg

Page 3

Eukaryotic DNA

http://en.wikipedia.org/wiki/Image:Plant_cell_structure_svg.svg

Page 4

DNA Structure

The two strands of a DNA molecule are held together by weak bonds (hydrogen bonds) between the nitrogenous bases, which are paired in the interior of the double helix.

The two strands of DNA are antiparallel; they run in opposite directions. The carbon atoms of the deoxyribose sugars are numbered for orientation.

http://en.wikipedia.org/wiki/Image:DNA_chemical_structure.png
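Because the strands are antiparallel and the bases pair A with T and G with C, the sequence of one strand determines the other. A minimal Python sketch of that relationship follows; the function name is illustrative and not part of the slides.

```python
# Map each base to its pairing partner (A-T, G-C).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    # Complement every base, then reverse because the strands are antiparallel.
    return seq.upper().translate(COMPLEMENT)[::-1]

print(reverse_complement("ATGC"))
```

Reversing after complementing is what keeps both strands written in the conventional 5' to 3' direction.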

Page 5

Sequencing DNA

The goal of sequencing DNA is to tell the order of the bases, or nucleotides, that form the inside of the double-helix molecule.

High throughput sequencing methods

-Sanger/Dideoxy

-Next Generation (NextGen)

Page 6

Whole Genome Shotgun Sequencing

• Start with a whole genome

• Shear the DNA into many different, random segments.

• Sequence each of the random segments.

• Then, put the pieces back together again in their original order using a computer.
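The four steps above can be sketched as a toy program. This is only an illustration under stated assumptions: the 30-base "genome" is made up, reads are cut at regular offsets (not randomly) so the result is reproducible, reads are error-free, and the greedy overlap merge is a simple stand-in for what real assemblers do.

```python
# Hypothetical toy genome (chosen so that no 4-base word repeats).
GENOME = "ATGGCGTACCTTAGGCAATCGGATTCAGCA"

def tile_reads(genome, read_len, step):
    """Cut overlapping reads at regular offsets (stand-in for random shearing)."""
    return [genome[s:s + read_len]
            for s in range(0, len(genome) - read_len + 1, step)]

def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b, or 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=4):
    """Repeatedly merge the pair of contigs with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates
    contigs = [r for r in contigs
               if not any(r != o and r in o for o in contigs)]
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            return contigs  # nothing left to merge
        merged = contigs[i] + contigs[j][k:]
        rest = [c for n, c in enumerate(contigs) if n not in (i, j)]
        contigs = [c for c in rest if c not in merged] + [merged]

reads = tile_reads(GENOME, read_len=12, step=6)
print(greedy_assemble(reads))
```

Real genomes contain repeats and sequencing errors, which is why this naive merge is not sufficient in practice and why coverage and paired-end information matter.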

Page 7

Page 8

[Figure: shotgun sequencing workflow. SEQUENCER OUTPUT (overlapping reads such as AGCTAGGCTC, AGCTCGCTAGCTAGCTAGCT, CTAGCTAGCTAGGCTC), then ASSEMBLE FRAGMENTS, then Fill in any gaps, then Annotate genes (Gene 1, Gene 2, Gene 3, ...)]

Page 9

Theory Behind Shotgun Sequencing

Haemophilus influenzae, 1.83 Mb genome

Coverage    Unsequenced (%)
1X          37%
2X          13%
5X          0.67%
6X          0.25%
7X          0.09%

[Plot: Gaps (y-axis, 0 to 3000) vs. Sequences (x-axis, 0 to 80000)]

For a 1.83 Mb genome, 6X coverage is 10.98 Mb of sequence, or 22,000 sequencing reactions, 11,000 clones (1.5-2.0 kb insert), 500 bp average read.
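The percentages on this slide are consistent with the standard Poisson estimate that a fraction e^(-c) of the genome is left unsequenced at coverage c, and the read count follows from genome size × coverage / read length. A small sketch, with function names chosen here for illustration:

```python
import math

def unsequenced_fraction(coverage):
    """Expected fraction of bases hit by no read, assuming Poisson sampling."""
    return math.exp(-coverage)

def reads_needed(genome_bp, coverage, read_len_bp):
    """Number of reads required to reach the requested coverage."""
    return math.ceil(genome_bp * coverage / read_len_bp)

for c in (1, 2, 5, 6, 7):
    print(f"{c}X: {unsequenced_fraction(c) * 100:.2f}% unsequenced")

# H. influenzae example from the slide: 1.83 Mb genome, 6X, 500 bp reads.
print(reads_needed(1_830_000, 6, 500))
```

For the slide's example this gives 21,960 reads, which the slide rounds to 22,000 sequencing reactions.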

Page 10

Sanger Dideoxy Sequencing reactions

-Initial dideoxy sequencing involved use of radioactive dATP and 4 separate reactions (ddATP, ddTTP, ddCTP, ddGTP), separation on 4 separate lanes on an acrylamide gel, and detection through an autoradiogram

-New technologies use 4 fluorescently labeled bases, separation on capillaries, and detection through a CCD camera

Page 11

Sanger Dideoxy DNA sequencing

Page 12

Data Analysis

• A chromatogram is produced and the bases are called

• Software assigns a quality value to each base
• Phred & TraceTuner:
  • Read DNA sequencer traces
  • Call bases
  • Assign base quality values
  • Write basecalls and quality values to output files
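Phred-style quality values relate to the estimated probability P that a base call is wrong by Q = -10 · log10(P). The sketch below shows the conversion in both directions; the function names are illustrative and are not Phred or TraceTuner APIs.

```python
import math

def phred_to_error_prob(q):
    """Probability that a base call with quality q is wrong."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Quality value for an estimated error probability p."""
    return -10 * math.log10(p)

for q in (10, 20, 30, 40):
    print(q, phred_to_error_prob(q))
```

A quality of 20 therefore corresponds to roughly 1 error in 100 calls, and 30 to roughly 1 in 1,000.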

Page 13

GOOD

BAD

Page 14

Next Generation Sequencing Technology

Main Differences from Sanger (NextGen vs Sanger):

• Sequencing by synthesis vs chain termination
• PCR amplification of template vs E. coli cloning
• Cost: pennies vs dollars
• Reads per run: hundreds of thousands/millions vs 96
• Read length: 36 bp vs 700 bp
• Error rate: 1-2% vs 0.01%

Page 15

Fundamental differences in Sanger vs Next Gen Sequencing Approaches

Page 16

454 Genome Sequencing System

• Library prep, amplification and sequencing: 2-4 days
• Single sample preparation from bacterial to human genomic DNA
• Single amplification per genome with no cloning or cloning artifacts
• Picoliter volume molecular biology
• 400 Mb per run (4-5 hr); less than $15,000 per run
• Read lengths 200-230 bases; new Titanium platform, 400 Mb per run, 400-500 bases per read
• Massively parallel imaging, fluidics and data analysis
• Requires high genome coverage for good assembly
• Error rate of 1-2%
• Problem with homopolymers

Page 17

Two types of DNA amplification for NextGen: Clonal amplification or Bridge PCR

Clonal amplification by emulsion PCR (454, Polonator, SOLiD)

Page 18

454 Sequencing: Pyrosequencing

454: The sequencing reaction is coupled to another reaction that generates light from PPi using ATP and luciferin (via ATP sulfurylase and luciferase). Light emission is measured as each nucleotide flows across the picotiter plate; intensity is proportional to the number of bases incorporated.

The most common errors are insertions/deletions, especially at homopolymer runs.

Current platform output (maximal): GS XLR70 ('Titanium') = ~1,000,000 reads @ ~400 bp => 400 Mb per run

Page 19

454-Pyrosequencing

Construct single-stranded adaptor-ligated DNA

Perform emulsion PCR

Depositing DNA Beads into the PicoTiter™Plate

Sequencing by Synthesis: simultaneous sequencing of the entire genome in hundreds of thousands of picoliter-size wells

Pyrophosphate signal generation

Page 20

Software: Flowgrams and base calling

Key sequence = TCAG for identifying wells and calibration

Flow of individual bases (TCAG) is repeated 100 times to get 100 bp reads.

Height of peak shows # of bases for homopolymer

[Flowgram figure: x-axis Base flow (TACG), y-axis Signal strength; example read TTCTGCGAA]
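The idea that peak height encodes homopolymer length can be sketched as follows. This is a simplified illustration, not the vendor's base caller: the flow order TACG is taken from the figure label, the signal values are invented, and each signal is simply rounded to the nearest whole number of bases.

```python
def call_flowgram(signals, flow_order="TACG"):
    """Turn per-flow light intensities into a base sequence."""
    bases = []
    for i, signal in enumerate(signals):
        n = int(signal + 0.5)  # nearest non-negative integer = homopolymer length
        bases.append(flow_order[i % len(flow_order)] * n)
    return "".join(bases)

# Made-up, slightly noisy intensities for 14 flows (T, A, C, G, T, A, ...).
signals = [2.05, 0.08, 0.97, 0.10, 1.02, 0.05, 0.12,
           0.93, 0.04, 0.10, 1.10, 0.96, 0.07, 1.90]
print(call_flowgram(signals))
```

With these values the call reproduces the example read on the slide, TTCTGCGAA. Long homopolymers give signals such as 5.4 or 5.6 that round ambiguously, which is one way to picture the insertion/deletion errors at homopolymers.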

Page 21

Accuracy of Homopolymers in E. coli: Individual Reads and 22x Consensus

[Plot: Accuracy (y-axis, 0% to 100%) vs. Homopolymer Length (x-axis, 1 to 9). Series: Consensus, Reads.]

Page 22

Software

Observed Individual Read Accuracy

[Plot: Cumulative Read Error (y-axis, 0.0% to 4.0%) vs. Base Position (x-axis, 0 to 250). Series: E. coli runs #1 to #4, T. thermophilus, C. jejuni (run labels 09_29A, 09_29B, 09_14 + 09_18A, 09_18B+09_25). Annotations: Reported in Nature 2005; GS20 Q2 2006; GS FLX.]

All filtered reads – includes all error modes

Lower quality sequence at 3’ of read

Page 23

The picotiter plate can be divided into 2, 4, 8, or 16 sectors so that different samples or libraries can be run in each sector (a gasket separates the sectors).

Page 24

GS FLX Titanium MID (Multiplex Identifiers) Protocol

• MID Adaptor = Shotgun Adaptor + specially encoded 10-base sequence after key

Normal Read (Shotgun Adaptor): Primer A | Key | Library fragment | Primer B

MID Read (MID Adaptor): Primer A | Key | MID | Library fragment | Primer B

[Figure labels: Seq Primer, Read, MID Location]

Use Physical and Coded Multiplexing together to maximize run efficiency and reduce costs

12 MID x 16 sectors = 192 different samples per run
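Coded multiplexing amounts to reading the 10 bases that follow the key and sorting each read into a per-sample bin. The sketch below assumes reads still begin with the key TCAG; the two MID sequences and the sample names are hypothetical placeholders, not the vendor's MID set.

```python
KEY = "TCAG"      # key sequence from the flowgram slide
MID_LENGTH = 10   # "specially encoded 10-base sequence after key"

# Hypothetical MID tags, invented for this example only.
MIDS = {"ACGTACGTAC": "sample_1", "TGCATGCATG": "sample_2"}

def demultiplex(reads, mids=MIDS, key=KEY):
    """Group library fragments by the MID found right after the key."""
    bins = {}
    for read in reads:
        if not read.startswith(key):
            bins.setdefault("no_key", []).append(read)
            continue
        body = read[len(key):]
        tag = body[:MID_LENGTH]
        if tag in mids:
            bins.setdefault(mids[tag], []).append(body[MID_LENGTH:])
        else:
            bins.setdefault("unknown_mid", []).append(body)
    return bins

def samples_per_run(n_mids, n_sectors):
    """Physical sectors times coded MIDs."""
    return n_mids * n_sectors

reads = ["TCAGACGTACGTACGGGTTT", "TCAGTGCATGCATGCCCAAA"]
print(demultiplex(reads))
print(samples_per_run(12, 16))
```

Combining the two levels is what gives the 192 samples per run quoted on the slide.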

Page 25

GS FLX Titanium Multi Span Paired End Overview

• High molecular weight genomic DNA is sheared to desired size of either 20 kb, 8 kb, or 3 kb span distance
• Circularization adaptors containing a loxP target sequence are ligated onto fragment ends.
• Cre recombinase mediated intra-molecular recombination circularizes fragments

Page 26

GS FLX Titanium Multi Span Paired End Overview

• Ligation of 454 Sequencing adaptors
• Adaptors required for emPCR and sequencing
• Amplification and sequencing with the GS FLX Titanium series kits

Page 27

De Novo Sequencing of Microbial Genomes: One Genome, One Scaffold

Four microbial genomes. One Run.

Assemblies were generated using one 8 kb library per genome on a 4 region gasket.

[Figure panels: S. pneumoniae 8 kb, E. coli 8 kb, T. thermophilus 8 kb, C. jejuni 8 kb]

Microbial Genome Assembly

                        S. pneumoniae   E. coli K-12   T. thermophilus   C. jejuni
Number of Chromosomes   1               1              2                 1
Large Scaffolds         1               1              2                 1
Genome Size             2.2 Mb          4.6 Mb         2.1 Mb            1.6 Mb
N50 Scaffold Size       2.2 Mb          4.6 Mb         1.9 Mb            1.6 Mb
N50 Contig Size         26.1 kb         57.1 kb        10.5 kb           153.8 kb
Genome Coverage         99.6%           100%           100%              99.3%
Oversampling            25x             15x            33x               33x
Number of Runs          ¼               ¼              ¼                 ¼

Estimated cost: $3-5K per genome
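The N50 values in the table summarize assembly contiguity: N50 is the length L such that contigs (or scaffolds) of length L or longer contain at least half of the assembled bases. A short sketch follows; the example lengths are invented, except that the two-scaffold case mirrors the T. thermophilus column, where the 0.2 Mb second scaffold is inferred here as 2.1 Mb minus 1.9 Mb.

```python
def n50(lengths):
    """Length at which the running total, longest first, reaches half the sum."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no lengths given")

print(n50([80, 70, 50, 40, 30, 20]))  # toy contig lengths
print(n50([1900, 200]))               # two scaffolds in kb (second size inferred)
```

Because the statistic is weighted by length, a few long scaffolds dominate it, which is why scaffold N50 can be megabases while contig N50 stays in the tens of kilobases.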

Page 28

De Novo Sequencing of Complex Genomes

Complex Genome Assembly

                     Drosophila melanogaster   Arabidopsis thaliana   Cucumber
Genome Size          175 Mb                    157 Mb                 376 Mb
N50 Contig Size      33 kb                     37 kb                  30 kb
N50 Scaffold Size    5.4 Mb                    4.6 Mb                 1.1 Mb
Largest Scaffold     10.9 Mb                   9.3 Mb                 5.6 Mb
Oversampling
  Shotgun            12x                       17x                    16x
  3 kb Paired End    1.6x                      1.8x                   8x
  20 kb Paired End   1.6x                      2x                     1.5x

Page 29

Solexa/Illumina Sequencing

• Sequencing by synthesis (not chain termination)
• Generate up to 12 Gb per run
• Bridge or Cluster PCR

Page 30

Page 31

Illumina/Solexa Genome Analyzer II

Page 32

Illumina sequencing reaction

Paired end reads

Multiplexing via 96 barcodes per lane

Substitution errors at 3’ end of read

Page 33

Page 34

ABI SOLiD:
• Sequencing by Oligonucleotide Ligation and Detection
• Two base pair encoding (higher accuracy)
• Ligation of octamers (4 fluors, 4 2-base combinations per fluor)
• Do one set of ligations for X cycles
• Reset with one base offset to read the other base
• Paired end reads
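Two-base encoding can be illustrated with the usual color-space scheme, in which each of 4 colors stands for 4 of the 16 possible dinucleotides. In the sketch below the colors are numbered 0 to 3 (an assumption for illustration; the instrument uses fluorescent dyes), and coding the bases as A=0, C=1, G=2, T=3 makes the color the XOR of adjacent base codes.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"

def encode_colors(seq):
    """One color per overlapping pair of adjacent bases."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]

def decode_colors(first_base, colors):
    """Rebuild the bases from a known first base and the color calls."""
    seq = [first_base]
    for color in colors:
        seq.append(BASE[CODE[seq[-1]] ^ color])
    return "".join(seq)

colors = encode_colors("ATGGCA")
print(colors)
print(decode_colors("A", colors))
```

Since every base is covered by two overlapping colors, a true single-base change alters two adjacent colors, while an isolated color change is more likely a measurement error. That is one way to read the "higher accuracy" claim on the slide.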

Page 35

HAPLOID !!!! SMALL GENOME, LITTLE REPETITIVE SEQUENCE

Page 36

Page 37

NextGen Sequencing Data requires new algorithms:
-due to size of datasets (Terabytes and RAM for assembly)
-short reads
-error models

Page 38

Other “Next Generation” Sequencing Technologies

SOLiD by Applied Biosystems - short reads (~25-75 nucleotides)

Helicos - short reads (<50 nucleotides)

Pacific Biosciences - LONG reads (several kilobases)

http://www.pacificbiosciences.com/video_lg.html

Page 39

How much sequence is needed to assemble a eukaryotic genome?

-Depends on the genome size of the organism (genes plus repeats), ploidy level, heterozygosity, desired quality

[Figure labels, genome sizes]
Wheat: 16,000 Mb
Rice: 430 Mb
Potato: 850 Mb
Arabidopsis: 130 Mb
John Doe: 2,500 Mb
(unlabeled): 5 Mb

Page 40

Eukaryotic Genomes and Gene Structures

[Figure: Gene | Intergenic Region | Gene | Intergenic Region | Gene]

Page 41

Expressed Sequence Tags (ESTs): Sampling the Transcriptome and Genic Regions

What is an EST? A single pass sequence from cDNA of a specific tissue, stage, environment, etc.

[Figure labels: cDNA library in E. coli; pick individual clones; template prep; insert in pBluescript; T7, T3]

Multiple tissues, states... With enough sequences, can ask quantitative questions

Page 42

Uses of EST sequencing:
-Gene discovery
-Digital northerns/insights into transcriptome
-Genome analyses, especially annotation of genomic DNA
-SNP discovery in genic regions

Issues with EST sequencing:
-Inherent low quality due to single pass nature
-Not 100% full length cDNA clones
-Redundant sequencing of abundant transcripts

Address through clustering/assembly to build consensus sequences = Gene Index, Unigene Set, Transcript Assembly

Page 43

Locus/Gene

Gene models

Full length cDNAs

Expressed Sequence Tags

Page 44

Types of Genomic/DNA-based Diagnostic Markers

1. Restriction Fragment Length Polymorphisms (RFLPs)
2. Random Amplification of Polymorphic DNA (RAPDs)
3. Cleaved Amplified Polymorphic Sequences (CAPS)
4. Amplified Fragment Length Polymorphisms (AFLPs)
5. Simple Sequence Repeats (SSRs; microsatellites)
6. Single Nucleotide Polymorphisms (SNPs)

Page 45

SSRs
-Specific primers that flank a simple sequence repeat (mono-, di-, tri-, tetra-, etc.), which has a higher likelihood of a polymorphism
-Amplify genomic DNA
-Separate on gel
-Look for size polymorphisms

http://www.nal.usda.gov/pgdic/Probe/v2n1/chart.gif

http://cropandsoil.oregonstate.edu/classes/css430/images/0902.jpg

Page 46

SSRs
-Computational prediction of SSRs in potato transcriptome data
-http://solanaceae.plantbiology.msu.edu/analyses_ssr_query.php
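Computational SSR prediction can be approximated with a regular expression that looks for a short motif repeated in tandem. This is a minimal sketch, not the method behind the linked resource: the thresholds (motifs of 1 to 4 bases, at least 4 copies) and the test sequence are assumptions chosen for illustration.

```python
import re

def find_ssrs(seq, min_repeats=4, max_unit=4):
    """Return (0-based start, motif, copy number) for each tandem repeat."""
    hits = []
    for unit_len in range(1, max_unit + 1):
        # A motif of unit_len bases followed by at least min_repeats-1 more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip motifs that are just a shorter motif repeated (e.g. "ATAT").
            if any(unit == unit[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((m.start(), unit, len(m.group(0)) // unit_len))
    return sorted(hits)

seq = "GGC" + "AT" * 5 + "CCGTT" + "AGC" * 4 + "TT"
print(find_ssrs(seq))
```

Primers would then be designed in the unique sequence flanking each hit, as described on the previous slide.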

Page 47

SNPs
-Specific primers
-Amplify genomic DNA
-Detect mismatch (many methods for this)

http://cmbi.bjmu.edu.cn/cmbidata/snp/images/SNP.gif

Page 48

What is a SNP?

Single Nucleotide Polymorphism

• A polymorphism describes the existence of different alleles of the same gene in plants or a population of plants. These differences are tracked as molecular markers to identify desired genes and the resulting trait.

• SNPs are the result of point mutations
  • Deletions, insertions, or substitutions

Page 49

How are SNPs found?

1. The sequence of the organism is obtained from a database or BAC library
2. PCR primers are designed to flank potential SNP-containing DNA segments
3. Amplify DNA of diverse individuals by PCR
4. Sequence the amplified fragments for each individual
5. Compare the sequences and look for SNPs

Example:
1. TAGCAATGCCTAATGCCAT
2. TAGCAATGCCTACTGCCAT
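Step 5 can be sketched as a position-by-position comparison of the two example sequences. The sketch assumes the sequences are already aligned and of equal length; the function name is illustrative.

```python
def find_snps(seq_a, seq_b):
    """Return (1-based position, base in a, base in b) for each mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i + 1, a, b)
            for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]

print(find_snps("TAGCAATGCCTAATGCCAT", "TAGCAATGCCTACTGCCAT"))
```

For the slide's pair this reports a single A/C difference at position 13. Real comparisons start from a sequence alignment so that insertions and deletions do not shift every downstream position.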

Page 50

Different SNP detection methods

• High Resolution Melting

• Single base extension

• Allele specific priming

Page 51

High Resolution Melting (HRM)

• Uses high temps to denature dsDNA, similarly to PCR

• Intercalating fluorescent dyes (dyes that can fit between bases in DNA strands) are used to monitor the DNA fragments
  • When bound to dsDNA there is a strong fluorescence
  • As the DNA is denatured the fluorescence decreases
  • A camera monitors the change in fluorescence, creating a melting curve

Page 52

High Resolution Melting (HRM)

• If there is a SNP present, the change in the base will change the melting curve slightly
  • This change is small, but because the camera is high resolution, it is detectable.

www.wikipedia.org
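One common way to summarize a melting curve is to take the temperature at which fluorescence drops fastest as the melting temperature, then compare that value between samples. The sketch below uses synthetic sigmoid curves; the curve shape, the 1 °C step, and the 1 °C shift between the two "alleles" are assumptions for illustration, not measured data.

```python
import math

def melting_temperature(temps, fluorescence):
    """Midpoint of the interval with the steepest fluorescence drop."""
    best_mid, best_drop = None, float("-inf")
    for i in range(1, len(temps)):
        drop = (fluorescence[i - 1] - fluorescence[i]) / (temps[i] - temps[i - 1])
        if drop > best_drop:
            best_mid, best_drop = (temps[i - 1] + temps[i]) / 2, drop
    return best_mid

def synthetic_curve(tm, temps, width=1.0):
    """Made-up sigmoid melt curve centered on tm (illustration only)."""
    return [1.0 / (1.0 + math.exp((t - tm) / width)) for t in temps]

TEMPS = list(range(70, 91))  # 70 to 90 degrees C in 1 degree steps
reference = synthetic_curve(80.5, TEMPS)
variant = synthetic_curve(81.5, TEMPS)
print(melting_temperature(TEMPS, reference), melting_temperature(TEMPS, variant))
```

Real SNP-induced shifts are often smaller than the 1 °C used here, which is why the slide stresses the need for high-resolution detection.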

Page 53

High Resolution Melting (HRM)

• Benefits:
  • Cost effective – good for large scale projects
  • Fast and accurate
  • Simple – can be done in any lab with an HRM-capable real-time PCR machine

Page 54

Single Base Extension

• In the PCR reaction, following the amplification of the SNP fragment, a single base sequencing reaction occurs
  • Contains primer designed to anneal one base short of the polymorphic site
  • Primer is usually labeled with fluorescent dye

Page 55

Single Base Extension

• As the PCR reaction continues:
  • If there is a SNP, the DNA polymerase will not extend the primer, resulting in a shorter fragment when visualized on a gel

http://las.perkinelmer.com/content/snps/protocol.asp

Page 56

Allele Specific Priming

• When designing the primer, the SNP should be at the 3’ end of the primer and the labeled/TAG sequence at the 5’ end
  • Different labels for different alleles

• The SNP must be known, and primers for each allele of this SNP need to be designed

• The different allele primers will be detected only if they are attached to the template
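The design rule above (SNP base at the 3’ end, one primer per allele) can be sketched in a few lines. The template and SNP position reuse the A/C example from the "How are SNPs found?" slide; the primer length of 10 and the function name are assumptions, and the 5’ label/TAG sequence is left out.

```python
def allele_specific_primers(template, snp_index, alleles, length=10):
    """Build one forward primer per allele, each ending on the SNP base."""
    start = snp_index - length + 1
    if start < 0:
        raise ValueError("not enough upstream sequence for this primer length")
    upstream = template[start:snp_index]  # bases just 5' of the SNP
    return {allele: upstream + allele for allele in alleles}

# SNP at 0-based index 12 (position 13) of the example sequence, alleles A and C.
primers = allele_specific_primers("TAGCAATGCCTAATGCCAT", 12, "AC")
print(primers)
```

Only the primer whose final base matches the template is extended efficiently, which is what makes the two differently labeled primers distinguishable.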

Page 57

Illumina GoldenGate Assay Overview and GoldenGate Assay Workflow

Page 58

Potato SNPs: Intra-varietal and inter-varietal
-Bulk of sequence data from ESTs (Sanger derived)
-Use computational methods to identify SNPs within existing potato ESTs
-http://solanaceae.plantbiology.msu.edu/analyses_snp.php

Page 59

Illumina Paired End RNA-Seq

• Potato Varieties: Atlantic, Premier, Snowden
• Two Paired End RNA-Seq runs were performed.
• Reads are 61 bp long
• Insert sizes:
  • Atlantic: 350 bp
  • Premier: 300 bp
  • Snowden: 300 bp

• Paired End Sequencing is carried out by an Illumina module that regenerates the clusters after the first run and sequences the clusters from the other end.

Page 60

Velvet Assemblies of Potato Illumina Sequences

• With a read length of 31 and a minimum contig length of 150 bp:

• Atlantic:
  • 45214 contigs
  • N50: 666 bp
  • max contig length: 11.2 kb
  • Transcriptome size: 38.4 Mb
• Premier:
  • 54917 contigs
  • N50: 408 bp
  • max contig length: 6.6 kb
  • Transcriptome size: 38.2 Mb
• Snowden:
  • 58754 contigs
  • N50: 358 bp
  • max contig length: 6.9 kb
  • Transcriptome size: 39.1 Mb

Page 61

Sequence quality: Viewing an Atlantic potato contig from the Velvet assembly

Page 62

Identify intra-varietal SNPs

               Query SNPs   Filtered SNPs
Atlantic Asm   224748       150669
Premier Asm    265673       181800
Snowden Asm    258872       166253

A/C SNP

Page 63

Hawkeye Viewer – Visualizing SNPs

G/T SNP

Page 64

Analyses in progress

SNP Identification:
-Identify inter-varietal SNPs using draft genome sequence from S. phureja
-Identify only biallelic SNPs
-Identify high confidence SNPs
-Identify SNPs that meet Infinium design requirements

SNP Selection:
-Annotate transcripts for gene function
-Identify candidate genes within SNP set
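A biallelic filter of the kind listed above can be sketched as a count of the bases observed at one position across the aligned reads. This is only an illustration: the minimum-support threshold of 2 reads per allele is a hypothetical choice, not the criterion used in the analysis described here.

```python
from collections import Counter

def is_biallelic(bases, min_count=2):
    """True if exactly two alleles are each supported by min_count reads."""
    counts = Counter(bases)
    # Alleles seen fewer than min_count times are treated as likely errors.
    supported = [base for base, n in counts.items() if n >= min_count]
    return len(supported) == 2

print(is_biallelic("AAAACCCC"))  # two well-supported alleles
print(is_biallelic("AACCGG"))    # three supported alleles
```

Requiring a minimum count per allele is one simple way to combine the "biallelic" and "high confidence" criteria in a single pass.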

Page 65

Clonal amplification by emulsion PCR (454, Polonator, SOLiD)

Bridge or Cluster PCR (Illumina/Solexa)

Page 66

Page 67

Page 68