Targeted Data Introduction Many mapping, alignment and variant calling algorithms Most of these...
Omixon Workshops
Considerations for Analyzing Targeted NGS Data - Introduction
Tim Hague, CEO
Targeted Data
Introduction
There are many mapping, alignment and variant calling algorithms
Most of these have been developed for whole genome sequencing and, to some extent, population genetic studies
Premise
In contrast, NGS-based diagnostics deals with particular genes or mutations of an individual
Different diagnostic targets present specific challenges
Goal
Present analysis issues related to differences in:
Sequencing technologies
Targeting technologies
Target specifics
Pseudogenes and segmental duplication
NGS Sequencers
Illumina
Ion Torrent
Roche 454
(SOLiD)
Mind The Gap
Moore B, Hu H, Singleton M, De La Vega FM, Reese MG, Yandell M. Genet Med. 2011 Mar;13(3):210-7.
Sequencing Technology
Differences:
Homopolymer error rates
G/C content errors
Read length
Sequencing protocols (single vs paired reads)
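Two of the sequence properties named above, G/C content and homopolymer length, are easy to measure on a read or a target region. The following is a minimal sketch, not part of the original slides; the function names and the example sequence are made up for illustration.

```python
from itertools import groupby
from typing import Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases among the A/C/G/T bases of seq (0.0 if there are none)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def longest_homopolymer(seq: str) -> Tuple[str, int]:
    """Return (base, length) of the longest run of a single repeated base."""
    best_base, best_len = "", 0
    for base, run in groupby(seq.upper()):
        run_len = sum(1 for _ in run)
        if run_len > best_len:
            best_base, best_len = base, run_len
    return best_base, best_len


# Hypothetical sequence, for illustration only.
read = "ACGTTTTTTGCCGGA"
print(round(gc_fraction(read), 3))   # 7 of 15 bases are G or C
print(longest_homopolymer(read))     # ('T', 6)
```

Regions with long runs or extreme G/C fractions are the places where platform-specific error profiles are most likely to matter.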
Targeting Methods
PCR primers (e.g. amplicons)
Hybridization probes (e.g. exome kits)
Targeting Technology
Differences:
Exact matching regions vs regions with SNPs
Results in:
Need for mapping against whole chromosomes to avoid false positives
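A toy illustration of why the choice of mapping reference matters. This is a sketch under strong assumptions: ungapped brute-force placement stands in for a real aligner, and the two short sequences (`target` and a near-identical `pseudogene`) are invented. A read that truly comes from the off-target copy looks like it carries two SNPs when only the target is available as a reference.

```python
from typing import Dict, List, Optional, Tuple

Placement = Tuple[str, int, List[int]]  # (reference name, offset, mismatch positions)


def mismatches(read: str, ref: str, offset: int) -> List[int]:
    """Reference positions where the read differs from ref when placed at offset."""
    return [offset + i for i, base in enumerate(read) if ref[offset + i] != base]


def best_placement(read: str, references: Dict[str, str]) -> Optional[Placement]:
    """Try every ungapped placement in every reference; keep the one with fewest mismatches."""
    best: Optional[Placement] = None
    for name, ref in references.items():
        for offset in range(len(ref) - len(read) + 1):
            mm = mismatches(read, ref, offset)
            if best is None or len(mm) < len(best[2]):
                best = (name, offset, mm)
    return best


# Invented sequences: the pseudogene differs from the target at positions 7 and 12.
target = "ACGTACGGTCAGTTCAGGCA"
pseudogene = "ACGTACGATCAGCTCAGGCA"
read = pseudogene[4:16]  # a read that really originates from the pseudogene

# Mapping against the target only: two mismatches that could be called as SNPs.
print(best_placement(read, {"target": target}))
# ('target', 4, [7, 12])

# Mapping against both copies: the read is placed on the pseudogene with no mismatches.
print(best_placement(read, {"target": target, "pseudogene": pseudogene}))
# ('pseudogene', 4, [])
```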
Analysis Targets
Differences:
Rate of polymorphism
Repetitive structures
Mutation profiles
G/C content
Single genes vs multi-gene complexes
BRCA1/2: 1/2000
HLA: 1/29
CFTR: 1/2000
Distributions of insertions and deletions
Distribution of repeat elements
Segmental Duplications
Sometimes called Low Copy Repeats (LCRs)
Highly homologous, >95% sequence identity
Rare in most mammals
Comprise a large portion of the human genome (and other primate genomes)
Important for understanding HLA
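The ">95% sequence identity" criterion above can be made concrete with a small calculation. This is a minimal sketch that assumes the two sequences are already aligned and of equal length; the function name and the example sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of identical columns in two pre-aligned, equal-length sequences.

    Gap characters ('-') are simply compared like any other character, so a
    gap opposite a base counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y)
    return 100.0 * matches / len(a)


# Invented 40 bp sequences that differ at a single position.
copy_a = "ACGTACGGTCAGTTCAGGCA" * 2
copy_b = copy_a[:7] + "A" + copy_a[8:]

identity = percent_identity(copy_a, copy_b)
print(identity)          # 97.5
print(identity > 95.0)   # True: above the identity threshold quoted above
```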
Segmental Duplications
Many LCRs are concentrated in "hotspots"
Recombinations in these regions are responsible for a wide range of disorders, including:
– Charcot-Marie-Tooth disease type 1A
– Hereditary neuropathy with liability to pressure palsies
– Smith-Magenis syndrome
– Potocki-Lupski syndrome
Data analysis shouldn’t be like this!
Data Analysis Tools
Differences:
Detection rates of complex variants (sensitivity)
False positive rates (accuracy)
Speed
Ease of use
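Detection rate and false positives can be quantified by comparing a tool's calls with a trusted set of variants for the same sample. The sketch below is an illustration only: the variant tuples are invented, and the false positive side is reported as precision (the fraction of calls that are real), which is one of several possible ways to summarise it.

```python
from typing import Dict, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref allele, alt allele)


def compare_calls(truth: Set[Variant], calls: Set[Variant]) -> Dict[str, float]:
    """Compare a call set with a truth set using exact matching of variant tuples."""
    tp = len(truth & calls)   # called and real
    fp = len(calls - truth)   # called but not real
    fn = len(truth - calls)   # real but missed
    return {
        "true_positives": tp,
        "false_positives": fp,
        "false_negatives": fn,
        "sensitivity": tp / (tp + fn) if truth else 0.0,
        "precision": tp / (tp + fp) if calls else 0.0,
    }


# Hypothetical truth set and tool output, for illustration only.
truth = {
    ("chr17", 100, "A", "G"),
    ("chr17", 250, "C", "T"),
    ("chr17", 400, "GA", "G"),
    ("chr17", 900, "T", "C"),
}
calls = {
    ("chr17", 100, "A", "G"),
    ("chr17", 250, "C", "T"),
    ("chr17", 900, "T", "C"),
    ("chr17", 777, "G", "A"),  # not in the truth set: a false positive
}

print(compare_calls(truth, calls))
```

Exact tuple matching is a simplification: the same indel can be written in more than one way, so a real comparison would normalise variant representations first.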
“Depending upon which tool you use, you can see pretty big differences between even the same genome called with different tools—nearly as big as the two Life Tech/Illumina genomes.”
Mark Yandell in BioIT-World.com, June 8, 2011
Examples
Missing variants
SNPs, a DNP and deletions
Identify More Valid Variants
Find Homopolymer Indels
Examples
Coverage differences
Four Times Exon Coverage (coverage tracks, data ranges [0-432] vs [0-96])
Higher Exome Coverage (coverage tracks, data ranges [0-24] vs [0-10])
First Conclusion
Read accuracy is not the limiting factor in accurate variant analysis
Example - Dense Region of SNPs
Second Conclusion
As variant density increases, the performance of most tools goes down
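Variant density can be estimated by counting calls in fixed windows along the target, which gives a quick way to flag regions where results may deserve a closer look. This is a minimal sketch: the window size, the threshold and the positions are arbitrary assumptions, not values from the slides.

```python
from collections import Counter
from typing import Dict, Iterable, List


def variants_per_window(positions: Iterable[int], window: int = 100) -> Dict[int, int]:
    """Count variants in fixed, non-overlapping windows, keyed by window start."""
    counts = Counter((pos // window) * window for pos in positions)
    return dict(sorted(counts.items()))


def dense_windows(positions: Iterable[int], window: int = 100,
                  min_variants: int = 3) -> List[int]:
    """Start coordinates of windows holding at least min_variants variants."""
    per_window = variants_per_window(positions, window)
    return [start for start, n in per_window.items() if n >= min_variants]


# Hypothetical variant positions on one target region.
positions = [105, 130, 142, 171, 560, 1203, 1288]

print(variants_per_window(positions))  # {100: 4, 500: 1, 1200: 2}
print(dense_windows(positions))        # [100]
```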
Variant Calling
There are a few popular variant callers: GATK, SAMtools mpileup, VarScan
The most comprehensive (GATK) has a whole pipeline, including a base quality recalibration step and an indel realignment step
It is highly recommended to run these recalibration and realignment steps before any variant calling
Deduplication and removing non-primary alignments may also be required
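The filtering mentioned in the last point can be expressed in terms of the FLAG field of SAM records. The sketch below assumes plain-text SAM input and assumes that duplicates were already marked by an upstream tool (so the 0x400 bit is set); treating supplementary alignments as non-primary is also an assumption of this example. The bit values themselves come from the SAM specification.

```python
from typing import Iterable, Iterator

FLAG_UNMAPPED = 0x4          # read is unmapped
FLAG_SECONDARY = 0x100       # secondary (not primary) alignment
FLAG_DUPLICATE = 0x400       # PCR or optical duplicate, as marked upstream
FLAG_SUPPLEMENTARY = 0x800   # supplementary alignment (assumed unwanted here)

EXCLUDED = FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_DUPLICATE | FLAG_SUPPLEMENTARY


def keep_for_calling(flag: int) -> bool:
    """True if none of the excluded FLAG bits are set."""
    return (flag & EXCLUDED) == 0


def filter_sam_lines(lines: Iterable[str]) -> Iterator[str]:
    """Pass header lines through and keep only alignments that survive the filter."""
    for line in lines:
        if line.startswith("@"):
            yield line
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if keep_for_calling(flag):
            yield line


# Hypothetical records: r1 is a normal primary alignment, r2 a marked
# duplicate (99 + 1024), r3 a secondary alignment (99 + 256).
sam = [
    "@HD\tVN:1.6",
    "r1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",
    "r2\t1123\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",
    "r3\t355\tchr2\t500\t0\t4M\t=\t600\t104\tACGT\tIIII",
]

kept = list(filter_sam_lines(sam))
print([line.split("\t")[0] for line in kept])  # ['@HD', 'r1']
```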
Indel Realigner Problem
Variants That Can be Hard to Find
DNPs
TNPs
Small indels next to SNPs
30+ bp indels
Homopolymer indels
Homopolymer indel and SNP together
Indels in palindromes
Dense regions of variants
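DNPs and TNPs are substitutions of two or three adjacent bases. If a caller reports them as separate single-base changes, they can be regrouped afterwards. The following is a minimal sketch of that grouping, assuming all input substitutions sit on the same haplotype (the code does not check phase); the function names and the example calls are hypothetical.

```python
from typing import List, Tuple

Substitution = Tuple[int, str, str]  # (position, ref bases, alt bases)


def merge_adjacent_snps(snps: List[Substitution]) -> List[Substitution]:
    """Merge substitutions at consecutive positions into one multi-base event.

    Assumes every input substitution is on the same haplotype.
    """
    merged: List[Substitution] = []
    for pos, ref, alt in sorted(snps):
        if merged:
            last_pos, last_ref, last_alt = merged[-1]
            if last_pos + len(last_ref) == pos:  # directly adjacent to previous event
                merged[-1] = (last_pos, last_ref + ref, last_alt + alt)
                continue
        merged.append((pos, ref, alt))
    return merged


def classify(ref: str) -> str:
    """Name a substitution by the number of reference bases it replaces."""
    return {1: "SNP", 2: "DNP", 3: "TNP"}.get(len(ref), "MNP")


# Hypothetical single-base calls from one sample.
calls = [
    (100, "A", "G"), (101, "C", "T"),                    # adjacent pair
    (250, "G", "A"),                                     # isolated
    (300, "T", "C"), (301, "A", "G"), (302, "G", "T"),   # adjacent triple
]

for pos, ref, alt in merge_adjacent_snps(calls):
    print(pos, ref, alt, classify(ref))
# 100 AC GT DNP
# 250 G A SNP
# 300 TAG CGT TNP
```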
Download our Omixon Target™ Evaluation Version Today
OMIXON.COM