Applications of Illumina/Solexa sequencing technology for ... · Applications of Illumina/Solexa...
Transcript of Applications of Illumina/Solexa sequencing technology for ... · Applications of Illumina/Solexa...
![Page 1: Applications of Illumina/Solexa sequencing technology for ... · Applications of Illumina/Solexa sequencing technology for grape genomics ... dispensable genome 25% 25%50% core genome](https://reader034.fdocuments.us/reader034/viewer/2022051407/5ad9b8c97f8b9a137f8c8ad5/html5/thumbnails/1.jpg)
Applications of Applications of Illumina/SolexaIllumina/Solexasequencing technology for grape sequencing technology for grape
genomicsgenomics
Federica CattonaroIllumina Seminar
Milano, 13 giugno 2009
![Page 2: Applications of Illumina/Solexa sequencing technology for ... · Applications of Illumina/Solexa sequencing technology for grape genomics ... dispensable genome 25% 25%50% core genome](https://reader034.fdocuments.us/reader034/viewer/2022051407/5ad9b8c97f8b9a137f8c8ad5/html5/thumbnails/2.jpg)
THE FRENCH-ITALIAN PUBLIC CONSORTIUM FOR THE SEQUENCING OF THE GRAPEVINE NUCLEAR
GENOME
Genoscope(INRA)6Mio€
IGA 5Mio€
MIPAF-VIGNA:12 partners tra cui DiSA6Mio€
Genoscope(INRA)6Mio€
IGA 5Mio€
MIPAF-VIGNA:12 partners tra cui DiSA6Mio€
![Page 3: Applications of Illumina/Solexa sequencing technology for ... · Applications of Illumina/Solexa sequencing technology for grape genomics ... dispensable genome 25% 25%50% core genome](https://reader034.fdocuments.us/reader034/viewer/2022051407/5ad9b8c97f8b9a137f8c8ad5/html5/thumbnails/3.jpg)
THE PLANT TO BE SEQUENCED
PN40024Cultivated varieties
PN40024 is a nearly homozygous clone (~ 93%) derived from Pinot noir and Helfesteiner after 6 cycles of selfing (Colmar, Fr)
![Page 4: Applications of Illumina/Solexa sequencing technology for ... · Applications of Illumina/Solexa sequencing technology for grape genomics ... dispensable genome 25% 25%50% core genome](https://reader034.fdocuments.us/reader034/viewer/2022051407/5ad9b8c97f8b9a137f8c8ad5/html5/thumbnails/4.jpg)
WGS sequencing
480 Mbp
2n = 38Whole genome shotgun
Final coverage: 12X genome equivalents
Final assembly size: 481 Mbp; N50=42; N50 size= 3,4 MB
![Page 5: Applications of Illumina/Solexa sequencing technology for ... · Applications of Illumina/Solexa sequencing technology for grape genomics ... dispensable genome 25% 25%50% core genome](https://reader034.fdocuments.us/reader034/viewer/2022051407/5ad9b8c97f8b9a137f8c8ad5/html5/thumbnails/5.jpg)
Anchoring the sequence to the genome
>93% of genome sequence is anchored in 208 scaffolds
![Page 6: Applications of Illumina/Solexa sequencing technology for ... · Applications of Illumina/Solexa sequencing technology for grape genomics ... dispensable genome 25% 25%50% core genome](https://reader034.fdocuments.us/reader034/viewer/2022051407/5ad9b8c97f8b9a137f8c8ad5/html5/thumbnails/6.jpg)
Repetitive DNA and Transposable Element annotation
Three different approaches used:- ReAS software- Search for TE encoded proteins- Manual annotation of TEEstimated repetitive fraction: 42 %
TRANSPOSABLEELEMENTSSINGLE COPY DNA
REPETITIVEDNA
57% 30%
12%
![Page 7: Applications of Illumina/Solexa sequencing technology for ... · Applications of Illumina/Solexa sequencing technology for grape genomics ... dispensable genome 25% 25%50% core genome](https://reader034.fdocuments.us/reader034/viewer/2022051407/5ad9b8c97f8b9a137f8c8ad5/html5/thumbnails/7.jpg)
Grapevine Gene Annotation
Combination of evidences- Proteins (Uniprot database) - Available grapevine EST 317,000- Newly sequences FL cDNAs 38,500- Eudicotyledon ESTs 2,181,790- Ab initio prediction (Geneid, SNAP, Exofish, Genewise, Est2genome) - Gene model building using GAZE
Predicted Genes: 30,434- 372 codons and 5 exons per gene- Exon CDS ~ 7% of the genome- Introns ~ 36.7 % of the genome
![Page 8: Applications of Illumina/Solexa sequencing technology for ... · Applications of Illumina/Solexa sequencing technology for grape genomics ... dispensable genome 25% 25%50% core genome](https://reader034.fdocuments.us/reader034/viewer/2022051407/5ad9b8c97f8b9a137f8c8ad5/html5/thumbnails/8.jpg)
IMPROVING GENE AND REPEAT ANNOTATION
• Use new high throughput sequencing technologies (Illumina RNA-Seq)
• Whole transcriptome shotgun sequencing• RNA-Seq• 4 different tissues, same strain sequenced• Improve gene annotation
• Massive sequencing of smallRNAs• microRNA• siRNA: repeat annotation
![Page 9: Applications of Illumina/Solexa sequencing technology for ... · Applications of Illumina/Solexa sequencing technology for grape genomics ... dispensable genome 25% 25%50% core genome](https://reader034.fdocuments.us/reader034/viewer/2022051407/5ad9b8c97f8b9a137f8c8ad5/html5/thumbnails/9.jpg)
IMPROVING GENE PREDICTIONS
Stem reads within exons
Stem reads spanning predicted exon-exon boundaries
Missed intron 5’UTR
Confirmed exon-exon junctions
New exon-exon junction
![Page 10: Applications of Illumina/Solexa sequencing technology for ... · Applications of Illumina/Solexa sequencing technology for grape genomics ... dispensable genome 25% 25%50% core genome](https://reader034.fdocuments.us/reader034/viewer/2022051407/5ad9b8c97f8b9a137f8c8ad5/html5/thumbnails/10.jpg)
GRAPE TRANSCRIPTIONAL LANDSCAPE
• No pervasive transcription throughout genome
• Rare alternative splicing events• Very rare antisense transcripts• LINE, Copia-LTR elements are
transcriptionally active
![Page 11: Applications of Illumina/Solexa sequencing technology for ... · Applications of Illumina/Solexa sequencing technology for grape genomics ... dispensable genome 25% 25%50% core genome](https://reader034.fdocuments.us/reader034/viewer/2022051407/5ad9b8c97f8b9a137f8c8ad5/html5/thumbnails/11.jpg)
RESEQUENCING: GOING AFTER VARIATION
600000 Sanger paired reads (0.9X-2.9X) from Tocai cv. plasmid library (4.2 kb)
Nucleotide variation (SNPs)Structural variation (large indels)
![Page 12: Applications of Illumina/Solexa sequencing technology for ... · Applications of Illumina/Solexa sequencing technology for grape genomics ... dispensable genome 25% 25%50% core genome](https://reader034.fdocuments.us/reader034/viewer/2022051407/5ad9b8c97f8b9a137f8c8ad5/html5/thumbnails/12.jpg)
DETECTING STRUCTURAL VARIATION BI PAIR-END MAPPING
1
Modified from Korbel et al., Science Oct 2007
![Page 13: Applications of Illumina/Solexa sequencing technology for ... · Applications of Illumina/Solexa sequencing technology for grape genomics ... dispensable genome 25% 25%50% core genome](https://reader034.fdocuments.us/reader034/viewer/2022051407/5ad9b8c97f8b9a137f8c8ad5/html5/thumbnails/13.jpg)
TE INSERTIONS AS CAUSES FOR SV
LINE
LINE
COPIA LTR
hAT
![Page 14: Applications of Illumina/Solexa sequencing technology for ... · Applications of Illumina/Solexa sequencing technology for grape genomics ... dispensable genome 25% 25%50% core genome](https://reader034.fdocuments.us/reader034/viewer/2022051407/5ad9b8c97f8b9a137f8c8ad5/html5/thumbnails/14.jpg)
Tocai and PN40024 genome differ to a greater extent in SVs than in SNPs in terms of affected nucleotides
• At least 1800 unique insertions in PN40024 with respect to Tocai
• ~2000 unique insertions detected in Tocai with respect to PN40024
• At least 150 unique inversions
• Preliminary count of at least 1 SV event every 133 Kbp
• Many insertions due to transposition events• Polymorphic LINE insertions in introns
![Page 15: Applications of Illumina/Solexa sequencing technology for ... · Applications of Illumina/Solexa sequencing technology for grape genomics ... dispensable genome 25% 25%50% core genome](https://reader034.fdocuments.us/reader034/viewer/2022051407/5ad9b8c97f8b9a137f8c8ad5/html5/thumbnails/15.jpg)
Gret1
VvmybA1
Old and rare mutation
Frequentmutation/ reversion
Gret1solo LTR
VvmybA1a
VvmybA1b
CREATION OF NEW PHENOTYPIC VARIATION IN GRAPE BREEDING
From Kobayashi et al., Science 2006
![Page 16: Applications of Illumina/Solexa sequencing technology for ... · Applications of Illumina/Solexa sequencing technology for grape genomics ... dispensable genome 25% 25%50% core genome](https://reader034.fdocuments.us/reader034/viewer/2022051407/5ad9b8c97f8b9a137f8c8ad5/html5/thumbnails/16.jpg)
a:2.72
b:1.37
c:1.322A B C D E F SHARED SEQUENCE: d:
2.54
l:0.70
h:1.39
30kb
m:1.52
e:2.83
f:0.11
i:4.15
k:0.17
j: 0.47
O:2.01
n:0.31
p:0.99
g:2.181 3Mo17-specific:
s:1.20 0.21q:
0.61r:
1.69
0.43
y:0.80
u:1.32
t:1.58
v: 0.72
w:0.54
x:0.224
S
B73-specific:
aa:1.69 0.63
2.62
NOP
z: 2.04
R 5TG H I J K L M
UM2
giepumRepetitive elements:
1) Transposons:
3) Retrotransposons:Genes: A) – T): geneA9002 – geneT9002
ji
opie
jaws
prem
huck
xilon
zeon
non-LTR rire
dagaf
shadowspawn
ruda
raider2) MITE: M2
Brunner et al., Plant Cell 2005locus9002 (bin 1.08; chromosome 1L)
THE HYPERVARIABLE MAIZE GENOME: HYPER STRUCTURAL VARIATION (HSV)
Mo17
B73
NOPQ
![Page 17: Applications of Illumina/Solexa sequencing technology for ... · Applications of Illumina/Solexa sequencing technology for grape genomics ... dispensable genome 25% 25%50% core genome](https://reader034.fdocuments.us/reader034/viewer/2022051407/5ad9b8c97f8b9a137f8c8ad5/html5/thumbnails/17.jpg)
THE MAIZE PAN-GENOME
Mo17 B73
dispensablegenome
25% 25%50%core
genome
Morgante et al. Curr. Opin. Pl. Biol. 2007
![Page 18: Applications of Illumina/Solexa sequencing technology for ... · Applications of Illumina/Solexa sequencing technology for grape genomics ... dispensable genome 25% 25%50% core genome](https://reader034.fdocuments.us/reader034/viewer/2022051407/5ad9b8c97f8b9a137f8c8ad5/html5/thumbnails/18.jpg)
THE PAN-GENOME CONCEPT
PN40024 Tocai
dispensablegenome
core genome
![Page 19: Applications of Illumina/Solexa sequencing technology for ... · Applications of Illumina/Solexa sequencing technology for grape genomics ... dispensable genome 25% 25%50% core genome](https://reader034.fdocuments.us/reader034/viewer/2022051407/5ad9b8c97f8b9a137f8c8ad5/html5/thumbnails/19.jpg)
DNA shearing
Next generation sequencing(Illumina Genome Analyzer II)
AGCTGCTAGCTAGCTTGAGATCGATCGTTCGATCGATCGCATTTATTCGGATGATGCATCGTACTATCGAT…
Alignment to reference sequence using FAST and allowing up to 2 mismatchesAGCTGCTAGCTAGCTTGAGATCGATCGTTCGATCGATCGCATTTATTCGGATGATGCATCGTACTATCGAT…AGCTGCTAGCTAGCTTGAGATCGATCGTTCGATCGATCGCATTTATTCGGATGATGCATCGTACTATCGAT…AGCTGCTAGCTAGCTTGAGATCGATCGTTCGATCGATCGCATTTATTCGGATGATGCATCGTACTATCGAT…AGCTGCTAGCTAGCTTGAGATCGATCGTTCGATCGATCGCATTTATTCGGATGATGCATCGTACTATCGAT…AGCTGCTAGCTAGCTTGAGATCGATCGTTCGATCGATCGCATTTATTCGGATGATGCATCGTACTATCGAT…
Re-sequencing
Grapevine genome = 19 chromosomes = 487 Mbp
![Page 20: Applications of Illumina/Solexa sequencing technology for ... · Applications of Illumina/Solexa sequencing technology for grape genomics ... dispensable genome 25% 25%50% core genome](https://reader034.fdocuments.us/reader034/viewer/2022051407/5ad9b8c97f8b9a137f8c8ad5/html5/thumbnails/20.jpg)
Re-sequencing of grapevine varieties and clones
36 x 2 bp PE-reads 3 pairs of grape clones
![Page 21: Applications of Illumina/Solexa sequencing technology for ... · Applications of Illumina/Solexa sequencing technology for grape genomics ... dispensable genome 25% 25%50% core genome](https://reader034.fdocuments.us/reader034/viewer/2022051407/5ad9b8c97f8b9a137f8c8ad5/html5/thumbnails/21.jpg)
CLONES• Derive from somatic mutations
(originate from vegetative propagation)
• Genetically close (same variety)
• Difficult to differentiate with genetic analysis
Resequencing of somatic mutants i.e. clones
Pinot clones
VARIETIES• Derive from crosses
(originate from sexual reproduction)
• Genetically distinct
• Easy to differentiate with genetic analysis
Sangioveseclones
![Page 22: Applications of Illumina/Solexa sequencing technology for ... · Applications of Illumina/Solexa sequencing technology for grape genomics ... dispensable genome 25% 25%50% core genome](https://reader034.fdocuments.us/reader034/viewer/2022051407/5ad9b8c97f8b9a137f8c8ad5/html5/thumbnails/22.jpg)
RESEQUENCING FOR WHOLE GENOME ANALYSIS
• Resequence multiple grapevine varieties using Illumina paired-end reads
• Align reads from each resequenced individual to reference genome (PN40024)
• Identify large structural variants from read coverage• Identify short structural variants (TE-related) from
paired-end reads• Identify SNPs from uniquely placed reads
![Page 23: Applications of Illumina/Solexa sequencing technology for ... · Applications of Illumina/Solexa sequencing technology for grape genomics ... dispensable genome 25% 25%50% core genome](https://reader034.fdocuments.us/reader034/viewer/2022051407/5ad9b8c97f8b9a137f8c8ad5/html5/thumbnails/23.jpg)
RESEQUENCING FOR LARGE SV IDENTIFICATION (CNV-seq)
• Large SV (>50 kb) = a.k.a. CNVs in humans• Easy to do
• Align reads from each resequenced individual to reference genome (PN40024)
• Identify Large SVs from read coverage along reference sequence
• Requires low sequence coverage: 1-2X
• Asimmetry in information
![Page 24: Applications of Illumina/Solexa sequencing technology for ... · Applications of Illumina/Solexa sequencing technology for grape genomics ... dispensable genome 25% 25%50% core genome](https://reader034.fdocuments.us/reader034/viewer/2022051407/5ad9b8c97f8b9a137f8c8ad5/html5/thumbnails/24.jpg)
From Xie and Tammi, BMC Bioinformatics 2009
A comparison of the conceptual steps in aCGHand CNV-seq methods
![Page 25: Applications of Illumina/Solexa sequencing technology for ... · Applications of Illumina/Solexa sequencing technology for grape genomics ... dispensable genome 25% 25%50% core genome](https://reader034.fdocuments.us/reader034/viewer/2022051407/5ad9b8c97f8b9a137f8c8ad5/html5/thumbnails/25.jpg)
CNVs (>30 Kbp) can be identified by deep coverage variations
reference
genome 1
genome 2
> 30 Kbp
DELETIONGenome 2= no reads coverage
reference
genome 1
genome 2
> 30 Kbp
AMPLIFICATIONGenome 2=anomalous readscoverage
![Page 26: Applications of Illumina/Solexa sequencing technology for ... · Applications of Illumina/Solexa sequencing technology for grape genomics ... dispensable genome 25% 25%50% core genome](https://reader034.fdocuments.us/reader034/viewer/2022051407/5ad9b8c97f8b9a137f8c8ad5/html5/thumbnails/26.jpg)
LARGE SV COMPARISON AMONG VARIETIES
Deletions compared to PN40024
Insertions compared to PN40024
No SV compared to PN40024
Tocai 34.9 Mbp 691 regions
3.1 Mbp106 regions
447.4 Mbp1839 regions
Pinot Noir 31.5 Mbp 523 regions
3.5 Mbp110 regions
450.3 Mbp1732 regions
Corvina 69.6 Mbp 1478 regions
14.7 Mbp520 regions
399.8 M2020 regions
![Page 27: Applications of Illumina/Solexa sequencing technology for ... · Applications of Illumina/Solexa sequencing technology for grape genomics ... dispensable genome 25% 25%50% core genome](https://reader034.fdocuments.us/reader034/viewer/2022051407/5ad9b8c97f8b9a137f8c8ad5/html5/thumbnails/27.jpg)
Gene predictions
Transcription profile
Resequencing read density profiles
Resequencing read density ratios
CGH signal intensity ratios
PN40024
Pinot Noir
Tocai
PN40024/TocaiPinot Noir/ PN40024
Pinot Noir/ Tocai
PN40024/TocaiPinot Noir/ PN40024
Pinot Noir/ Tocai
RESEQUENCING FOR STRUCTURAL VARIATION DETECTION
![Page 28: Applications of Illumina/Solexa sequencing technology for ... · Applications of Illumina/Solexa sequencing technology for grape genomics ... dispensable genome 25% 25%50% core genome](https://reader034.fdocuments.us/reader034/viewer/2022051407/5ad9b8c97f8b9a137f8c8ad5/html5/thumbnails/28.jpg)
Resequencing by pair-end approach to detect retrotransposons insertions
Inversions(broken mate-paired)
< <
Insertion
> <
Deletion
> <
Concordant
> <
Genomic DNA
Illumina Paired-end Library
Sequence ends of genomic inserts &uniquely map to reference genome
Large Fragment mate pairs (Illumina: 2-3Kb) overcome the problem of placing shorter reads in repeats and detect large insertions and translocations
Small Fragment mate pairs (Illumina: 200bp) provide sensitivity to detect small indels and resolution detecting break points
> = pair-end tagsReference genome
![Page 29: Applications of Illumina/Solexa sequencing technology for ... · Applications of Illumina/Solexa sequencing technology for grape genomics ... dispensable genome 25% 25%50% core genome](https://reader034.fdocuments.us/reader034/viewer/2022051407/5ad9b8c97f8b9a137f8c8ad5/html5/thumbnails/29.jpg)
RESEQUENCING FOR SNP IDENTIFICATION
• Align reads from each resequenced individual to reference genome (PN40024)
• Identify SNPs from uniquely placed reads• Requires high sequence coverage: >20X
![Page 30: Applications of Illumina/Solexa sequencing technology for ... · Applications of Illumina/Solexa sequencing technology for grape genomics ... dispensable genome 25% 25%50% core genome](https://reader034.fdocuments.us/reader034/viewer/2022051407/5ad9b8c97f8b9a137f8c8ad5/html5/thumbnails/30.jpg)
MAPPING ON REFERENCE GENOME(FAST software)
PINOT TOCAI R5 SANGIOVESE R23 SAUVIGNON 297
![Page 31: Applications of Illumina/Solexa sequencing technology for ... · Applications of Illumina/Solexa sequencing technology for grape genomics ... dispensable genome 25% 25%50% core genome](https://reader034.fdocuments.us/reader034/viewer/2022051407/5ad9b8c97f8b9a137f8c8ad5/html5/thumbnails/31.jpg)
RESEQUENCING FOR SNP IDENTIFICATION
Total reads Mappable reads
Unique reads Unique reads, no mismatch
PN40024 83,850,498
3.3 Gbp
77,803,552
3.06 Gbp
56,290,146
2.2 Gbp
45,565,450
1.8 Gbp
Tocai 205,401,113
9.6 Gbp
175,587,385
8.2 Gbp
116,144,877
5.4 Gbp
72,737,376
3.4 Gbp
Corvina213,739,444
8.5 Gbp
155,782,830
6.2 Gbp
89,577,017
3.6 Gbp
51,816,364
2.1 Gbp
![Page 32: Applications of Illumina/Solexa sequencing technology for ... · Applications of Illumina/Solexa sequencing technology for grape genomics ... dispensable genome 25% 25%50% core genome](https://reader034.fdocuments.us/reader034/viewer/2022051407/5ad9b8c97f8b9a137f8c8ad5/html5/thumbnails/32.jpg)
RESEQUENCING FOR SNP IDENTIFICATION
Homozygous SNPs
Heterozygous SNPs Total SNPs
PN40024 19,396 36,568 55,964
Tocai 1,271,135 1,322,992 2,594,127
Corvina 864,360 730,926 1,595,286
False positive rate estimated from SNP identified from Sanger resequencing of PCR fragments: 2.1%
![Page 33: Applications of Illumina/Solexa sequencing technology for ... · Applications of Illumina/Solexa sequencing technology for grape genomics ... dispensable genome 25% 25%50% core genome](https://reader034.fdocuments.us/reader034/viewer/2022051407/5ad9b8c97f8b9a137f8c8ad5/html5/thumbnails/33.jpg)
IGA laboratories (300mq)
![Page 34: Applications of Illumina/Solexa sequencing technology for ... · Applications of Illumina/Solexa sequencing technology for grape genomics ... dispensable genome 25% 25%50% core genome](https://reader034.fdocuments.us/reader034/viewer/2022051407/5ad9b8c97f8b9a137f8c8ad5/html5/thumbnails/34.jpg)
ILLUMINA SEQUENCING PLATFORM AT IGA
• Last run: 16Gbp (2x75bp reads)
• Tested applications:DNA-seq, RNA-seq, PCR products multiplex sequencing (indexing system)
• Upgrade from GAII to GAIIx (July 2009)
• Second GAIIx (December 2009?)
![Page 35: Applications of Illumina/Solexa sequencing technology for ... · Applications of Illumina/Solexa sequencing technology for grape genomics ... dispensable genome 25% 25%50% core genome](https://reader034.fdocuments.us/reader034/viewer/2022051407/5ad9b8c97f8b9a137f8c8ad5/html5/thumbnails/35.jpg)
IT INFRASTRUCTURESGROWTH AT IGA
![Page 36: Applications of Illumina/Solexa sequencing technology for ... · Applications of Illumina/Solexa sequencing technology for grape genomics ... dispensable genome 25% 25%50% core genome](https://reader034.fdocuments.us/reader034/viewer/2022051407/5ad9b8c97f8b9a137f8c8ad5/html5/thumbnails/36.jpg)
Federica CattonaroDaniele TrebbiGabriele Di GasperoIrena JurmanNicoletta FeliceVera Vendramin
Cristian DelfabbroSimone ScalabrinFrancesco VezziAlberto PolicritiAlberto StefanAlberto Casagrande
Genomics Bioinformatics
IGA Scientific DirectorMichele Morgante