Illumin8er: Software for the Illumina GAII
Ian Carr, Joanne Morgan, Phil Chambers, Alex Markham, David Bonthron & Graham Taylor
Leeds Institute of Molecular Medicine, Leeds Teaching Hospitals & Cancer Research UK
Sipping from the hosepipe
The cost of DNA sequencing is plummeting
Current sequence output from an Illumina GAII is over 1 Gigabase per day
Managing the data is the single biggest challenge to bringing the benefits to patients and cost savings to the healthcare budget
The next biggest challenge is optimising the workflow to achieve cost efficiency
What should the software do?
Scan for and report mutations against a defined reference sequence.
Be able to handle bar-code sequence tags
Be easy to use
Report on data quality
Export to a database
Why Illumina?
Cost: 0002p per base
Capacity: 3.5 Gigabase per run
Simplicity: library>cluster station>sequence>data
500,000,000 bases per channel
Software requirements
Runs in MS Windows
User definable reference sequence
Quality scores
Automatic mutation calling: SNPs, indels
Speed
Initial data manipulation
Illuminator can transform data in prb.txt or seq.txt files into FASTA files
If tagged data is used, the reads for each tag are separated into an individual file.
The prb.txt files can be filtered for low quality data
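A minimal Python sketch of the tag-splitting step described above. It assumes each seq.txt line is tab-separated with the read in the last column, and that the tag is the first few bases of the read. The function name, the 4-base tag length and the FASTA header layout are illustrative assumptions, not Illuminator's actual behaviour.

```python
from collections import defaultdict


def split_seq_by_tag(lines, tag_length=4):
    """Group reads by their leading tag and return FASTA text per tag.

    Assumptions (hypothetical): tab-separated lines, read in the last
    column, tag occupying the first `tag_length` bases of the read.
    """
    records = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if not fields[-1]:
            continue  # skip blank lines
        read = fields[-1].upper()
        tag, insert = read[:tag_length], read[tag_length:]
        # Use the leading columns as the read name, or a running number.
        name = "_".join(fields[:-1]) or f"read{number}"
        records[tag].append(f">{name}\n{insert}\n")
    return {tag: "".join(entries) for tag, entries in records.items()}


example_lines = [
    "1\t1\t10\t20\tACGTTTGACC\n",
    "1\t1\t11\t21\tACGTGGGAAA\n",
    "1\t1\t12\t22\tTTTTCCCCGG\n",
]
fasta_by_tag = split_seq_by_tag(example_lines)
```

Each value in `fasta_by_tag` could then be written to its own file, for example one named after the tag.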
Reference files
Reference files are created from a plain text file of the genomic sequence and a cDNA sequence taken from either a plain text file or a GenBank web page.
If a GenBank page is used, the SNP data in the page is also imported with the cDNA sequence.
The reference file contains the position of the exons and ORF relative to the genomic sequence to aid mutation annotation.
Indexing the reference sequence
Each octamer in the reference sequence is mapped to an array of 65,537 entries: one for each of the 4^8 = 65,536 possible octamers, plus an extra one for unmappable sequence such as ‘nnnnnnnn’.
Some octamers have no positions in the reference while others have several.
Reference: GCTGGTGAGGGGTGGGGCAGGAGTGCTTGGGTTGTGGTGAAACATTGG
[Figure: the octamer array, running from aaaaaaaa, aaaaaaac, aaaaaaag, aaaaaaat, ... to tttttttc, tttttttg, tttttttt (~65,000 entries), followed by the extra nnnnnnnn entry.]
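The indexing step can be sketched in Python as follows. This is an illustration under stated assumptions rather than the program's own code: the base-4 encoding (A=0, C=1, G=2, T=3) and the names `octamer_slot` and `index_reference` are hypothetical. The array has 4^8 + 1 = 65,537 entries, with the last one reserved for octamers containing anything other than A, C, G or T.

```python
K = 8                               # octamer length
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
UNMAPPED = 4 ** K                   # entry 65536, e.g. for 'nnnnnnnn'


def octamer_slot(octamer):
    """Convert an octamer to its array entry using base-4 encoding."""
    slot = 0
    for base in octamer.upper():
        code = BASE_CODE.get(base)
        if code is None:
            return UNMAPPED         # ambiguous base: use the extra entry
        slot = slot * 4 + code
    return slot


def index_reference(reference):
    """Return a 65,537-entry list; each entry holds the positions of that octamer."""
    table = [[] for _ in range(4 ** K + 1)]
    for pos in range(len(reference) - K + 1):
        table[octamer_slot(reference[pos:pos + K])].append(pos)
    return table


# The reference fragment shown on the slide.
REFERENCE = "GCTGGTGAGGGGTGGGGCAGGAGTGCTTGGGTTGTGGTGAAACATTGG"
table = index_reference(REFERENCE)
```

Most entries stay empty for a short reference, while a repeated octamer collects several positions in the same entry.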
Mapping reads with 3’ mismatches
Read: TGAGGGGTGGGGCAGGAGTGCTTGGGTTGTGGGAAA
Reference: GCTGGTGAGGGGTGGGGCAGGAGTGCTTGGGTTGTGGTGAAACATTGG
[Figure: for each octamer of the read, the positions where that octamer is found in the reference sequence are listed. Positions are then matched up where consecutive octamers increase by 8 bp: the first three octamers are in phase (+8 bp, +8 bp) and the last is not (NA).]
3’ mismatches have a run of 3 footprints with the last octamer missing. These reads go into array 2 (phase 2).
Mapping reads with 5’ mismatches
Read: GTGAGGGGGGGGCAGGAGTGCTTGGGTTGTGGTGAA
Reference: GCTGGTGAGGGGTGGGGCAGGAGTGCTTGGGTTGTGGTGAAACATTGG
[Figure: for each octamer of the read, the positions where that octamer is found in the reference sequence are listed. Positions are then matched up where consecutive octamers increase by 8 bp: the first octamer is not in phase (NA) and the last three are (+8 bp, +8 bp).]
5’ mismatches have a run of 3 footprints with the first octamer missing. These reads go into array 3 (phase 3).
Mapping reads with internal mismatches
Read: TGAGGGGTGGGGCAGAAGTGCTTGGGTTGTGGTGAA
Reference: GCTGGTGAGGGGTGGGGCAGGAGTGCTTGGGTTGTGGTGAAACATTGG
[Figure: for each octamer of the read, the positions where that octamer is found in the reference sequence are listed. Positions are then matched up where consecutive octamers increase by 8 bp: the second octamer is not in phase, so the first and third octamers are 16 bp apart, and the fourth follows the third by 8 bp.]
Internal mismatches have a run of 3 footprints with either the second or third octamer out of phase. These reads go into array 4 (phase 4).
What each phase is used for
Phase 1 = perfect matches
Phase 2 = indels and small mutations at end of a read
Phase 3 = indels and small mutations at start of a read
Phase 4 = small mutations in the middle of read
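The four phases can be sketched in Python as below. This is a simplified illustration, not the program's implementation: it assumes a 36-base read is sampled as four octamers at offsets 0, 8, 16 and 24, uses a dictionary in place of the 65,537-entry array, and the names `build_index` and `classify_read` are hypothetical. A read is assigned to a phase according to which of its octamers land in register (8 bp apart) on the reference.

```python
from collections import defaultdict

K = 8        # octamer length
SLOTS = 4    # octamers at read offsets 0, 8, 16 and 24 (assumption)

# Set of in-register octamers -> phase number.
PHASE_BY_PATTERN = {
    frozenset({0, 1, 2, 3}): 1,  # perfect match
    frozenset({0, 1, 2}): 2,     # last octamer missing (3' mismatch)
    frozenset({1, 2, 3}): 3,     # first octamer missing (5' mismatch)
    frozenset({0, 2, 3}): 4,     # second octamer out of phase
    frozenset({0, 1, 3}): 4,     # third octamer out of phase
}


def build_index(reference):
    """Map every octamer in the reference to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - K + 1):
        index[reference[pos:pos + K]].append(pos)
    return index


def classify_read(read, index):
    """Return (phase, reference_start), or (None, None) if the read cannot be placed."""
    in_register = defaultdict(set)
    for slot in range(SLOTS):
        octamer = read[slot * K:(slot + 1) * K]
        for pos in index.get(octamer, ()):
            start = pos - slot * K   # implied start of the read on the reference
            if start >= 0:
                in_register[start].add(slot)
    calls = [
        (PHASE_BY_PATTERN[frozenset(slots)], start)
        for start, slots in in_register.items()
        if frozenset(slots) in PHASE_BY_PATTERN
    ]
    # Prefer the lowest phase number, i.e. the cleanest placement.
    return min(calls) if calls else (None, None)


REFERENCE = "GCTGGTGAGGGGTGGGGCAGGAGTGCTTGGGTTGTGGTGAAACATTGG"
index = build_index(REFERENCE)
perfect = classify_read("TGAGGGGTGGGGCAGGAGTGCTTGGGTTGTGGTGAA", index)
three_prime = classify_read("TGAGGGGTGGGGCAGGAGTGCTTGGGTTGTCGTGAA", index)
five_prime = classify_read("GTGAGGGGGGGGCAGGAGTGCTTGGGTTGTGGTGAA", index)
internal = classify_read("TGAGGGGTGGGGCAGAAGTGCTTGGGTTGTGGTGAA", index)
```

The `five_prime` and `internal` reads are the ones shown on the earlier slides; the `three_prime` read is a made-up example with a substitution inside the fourth octamer.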
Small changes
These are found by looking at Phase 4 data.
Homozygous mutations are in Phase 4 but not Phase 1 (seen as a hole in the Phase 1 data).
Heterozygous variants are seen in Phase 4, with the wild type (wt) seen in Phase 1 data.
[Figure labels: WT in Phase 1 data; Mut in Phase 4 data. (The wt allele is present due to sequencing errors elsewhere in the read.)]
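The rule above can be written as a small Python sketch. The function name `call_site`, the minimum depth and the allele-fraction cut-off are hypothetical values chosen for illustration; the slides do not state the thresholds the program uses.

```python
def call_site(wt_depth, variant_depth, min_depth=10, min_fraction=0.2):
    """Classify one position from wild-type and variant read depths.

    wt_depth: reads supporting the reference base (e.g. from Phase 1 data).
    variant_depth: reads supporting the variant base (e.g. from Phase 4 data).
    min_depth and min_fraction are illustrative thresholds (assumptions).
    """
    total = wt_depth + variant_depth
    if total < min_depth:
        return "low coverage"
    if variant_depth / total < min_fraction:
        return "reference"
    if wt_depth / total < min_fraction:
        return "homozygous variant"   # the "hole" in the wild-type data
    return "heterozygous"
```

With these assumed thresholds, a site covered by 40 wild-type and 38 variant reads would be reported as heterozygous, while one covered only by variant reads would be reported as homozygous.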
InDels
Phase 2 data gets indels from end of the read while Phase 3 gets them from the start of the read.
In a perfect world Phase 2 and 3 data should mirror each other.
Global view
Data for a PCR product containing two exons; blue = exonic DNA, pink = protein-coding DNA
The red and blue lines show the read depth of forward and reverse reads.
The lower panel shows the reference and deduced sequences around a point on the upper panel, selected by clicking on the panel with the mouse
Data view
Forward and Reverse sequences
Patient sequence
Patient’s other allele sequence
Score for each nucleotide
Reference genomic, cDNA and protein sequence
Read depth
Heterozygous base
Indel interface
Forward and Reverse sequences
Reference sequence
Patient sequences with indel at start and end of read
Consensus sequence of patient reads across indel
Alignment of patient and reference sequence to identify indel
Data export
The program can both export and import the alignment data as a plain text file
Create an updatable library of sequence variants
Export sequence variants as a text file
Create a LOVD import file for the sequence variants
Validation: BRCA1 & BRCA2
Illuminator detected all the mutations previously identified by dye-terminator Sanger sequencing of the exons of BRCA1 and BRCA2 in 10 individuals. Each nucleotide had a read depth of at least 75 reads (approximately 6.6 × 10³ sequences per gene). The alignment and mutation annotation took ~50 seconds per gene per person.
Conclusions
Illumin8er is
Easy to use
Rapid
Runs on Windows desktop
Uses standard Illumina output files
Reports mutations in a sensitive and specific manner
Next steps
Make freely available by download
http://dna.leeds.ac.uk/illumin8er/
Design compatible LOVD
Large scale validation trial