Ashg sedlazeck grc_share
![Page 1: Ashg sedlazeck grc_share](https://reader031.fdocuments.us/reader031/viewer/2022022415/5a654b2f7f8b9a5b558b6343/html5/thumbnails/1.jpg)
Structural Variation Characterization Across the Human Genome and Populations
Fritz Sedlazeck
October 17, 2017
![Page 2: Ashg sedlazeck grc_share](https://reader031.fdocuments.us/reader031/viewer/2022022415/5a654b2f7f8b9a5b558b6343/html5/thumbnails/2.jpg)
Scientific interests
Detection of Variants
• Sniffles (in bioRxiv)
• SURVIVOR: Jeffares et al. (2017)
• BOD-Score: Sedlazeck et al. (2013)
Mapping / Assembly reads
• NextGenMap-LR (in bioRxiv)
• Falcon Unzip: Chin et al. (2016)
• NextGenMap: Sedlazeck et al. (2013)
Benchmarking / Biases
• DangerTrack: Dolgalev et al. (2017)
• Teaser: Smolka et al. (2015)
• Sequencing: Jünemann et al. (2013)
Applications
Model organisms:
• Cancer (SKBR3) (in bioRxiv)
• miRNA editing (Vesely et al. 2012)
Non-model organisms:
• Cottus transposons (Dennenmoser et al. 2017)
• Clunio (Kaiser et al. 2016)
• Seabass (Vij et al. 2016)
• Pineapple (Ming et al. 2015)
![Page 3: Ashg sedlazeck grc_share](https://reader031.fdocuments.us/reader031/viewer/2022022415/5a654b2f7f8b9a5b558b6343/html5/thumbnails/3.jpg)
Structural Variations
• Genomic Disorders
• Evolution
• Impact on regulation
• Impact on phenotypes
[Figure: heatmap of regulatory state by cell line. Rows: regulatory feature and state (CTCF binding site, enhancer, open chromatin region, promoter, promoter flanking region, TF binding site; each ACTIVE, INACTIVE, POISED, REPRESSED or NA). Columns: cell lines and tissues (e.g. A549, Aorta, GM12878, HeLa_S3, K562, Thymus). Color scale: # affected.]
![Page 4: Ashg sedlazeck grc_share](https://reader031.fdocuments.us/reader031/viewer/2022022415/5a654b2f7f8b9a5b558b6343/html5/thumbnails/4.jpg)
Diploid genome
• Impact on Regulation
• Variability of genes
• Need to understand the full structure
![Page 5: Ashg sedlazeck grc_share](https://reader031.fdocuments.us/reader031/viewer/2022022415/5a654b2f7f8b9a5b558b6343/html5/thumbnails/5.jpg)
Challenges: Pursuing the diploid genome
1. Accurate prediction of SVs
2. Comparison of SVs
3. Annotation and interpretation of SVs
4. Population analysis
5. Diploid Genome
Layer et al. (2014)
![Page 6: Ashg sedlazeck grc_share](https://reader031.fdocuments.us/reader031/viewer/2022022415/5a654b2f7f8b9a5b558b6343/html5/thumbnails/6.jpg)
1.1 How to detect Structural Variations (SVs)
![Page 7: Ashg sedlazeck grc_share](https://reader031.fdocuments.us/reader031/viewer/2022022415/5a654b2f7f8b9a5b558b6343/html5/thumbnails/7.jpg)
1.1 Long Read Technologies
• (+) SVs in repetitive regions
• (+) Span SVs
• (+) Uniform coverage
• (+) Can identify more complex SVs
• (-) Higher seq. error rate
• (-) Hard to align
![Page 8: Ashg sedlazeck grc_share](https://reader031.fdocuments.us/reader031/viewer/2022022415/5a654b2f7f8b9a5b558b6343/html5/thumbnails/8.jpg)
1.1 Accurate mapping and SV calling
NextGenMap-LR (NGMLR):
• Long read mapper
• Convex gap costs
• Faster than BWA-MEM
Sniffles:
• SV caller for long reads
• All types of SVs
• Phasing of SVs
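To make the "convex gap costs" bullet concrete, here is a minimal Python sketch contrasting an affine gap model with a convex one. The logarithmic form and the parameter values are assumptions chosen for illustration only; they are not NGMLR's actual scoring function. The point is that under a convex model each additional gap base costs less than the previous one, so one long gap (as expected at an SV) is cheaper than under an affine model.

```python
import math


def affine_gap_cost(length, gap_open=5.0, gap_extend=1.0):
    """Affine model: every additional gap base adds the same penalty."""
    return gap_open + gap_extend * length


def convex_gap_cost(length, gap_open=5.0, gap_extend=1.0):
    """Convex model (illustrative log form, not NGMLR's real function):
    the penalty for extending the gap shrinks as the gap grows."""
    return gap_open + gap_extend * math.log1p(length)


def marginal_cost(cost_fn, length):
    """Extra penalty for growing a gap from `length` to `length + 1` bases."""
    return cost_fn(length + 1) - cost_fn(length)


for length in (1, 10, 100, 1000):
    print(length, affine_gap_cost(length), round(convex_gap_cost(length), 2))
```

With these hypothetical parameters, a 1,000 bp gap is far cheaper under the convex model than under the affine one, which is why a convex penalty tends to keep a large indel as a single gap rather than scattering it into many small ones.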
![Page 9: Ashg sedlazeck grc_share](https://reader031.fdocuments.us/reader031/viewer/2022022415/5a654b2f7f8b9a5b558b6343/html5/thumbnails/9.jpg)
1.2 NA12878: SV calling

| Tech. | Coverage | Avg read len | Method | SVs | TRA |
|---|---|---|---|---|---|
| PacBio | 55x | 4,334 | Sniffles | 22,877 | 119 |
| Oxford Nanopore @Baylor | 34x | 4,982 | Sniffles | 12,596 | 46 |
| Illumina | 50x | 2 x 101 | Manta, Delly, Lumpy | 7,275 | 2,247 |

Sedlazeck et al. (2017)
![Page 10: Ashg sedlazeck grc_share](https://reader031.fdocuments.us/reader031/viewer/2022022415/5a654b2f7f8b9a5b558b6343/html5/thumbnails/10.jpg)
1.1 NA12878: SV calling

| Tech. | Coverage | Avg read len | Method | SVs | TRA | DEL | INS |
|---|---|---|---|---|---|---|---|
| PacBio | 55x | 4,334 | Sniffles | 22,877 | 119 | 9,933 | 12,052 |
| Oxford Nanopore @Baylor | 34x | 4,982 | Sniffles | 12,596 | 46 | 7,102 | 5,166 |
| Illumina | 50x | 2 x 101 | Manta, Delly, Lumpy | 7,275 | 2,247 | 3,744 | 0 |

Sedlazeck et al. (2017)
![Page 11: Ashg sedlazeck grc_share](https://reader031.fdocuments.us/reader031/viewer/2022022415/5a654b2f7f8b9a5b558b6343/html5/thumbnails/11.jpg)
1.1 NA12878: check 2,247 vs 119 TRA
[Figure: the same locus in Illumina, PacBio and ONT data. Illumina data: translocation. PacBio and ONT data: truncated reads, insertion in rep. region.]

| Overlap | Illumina TRA (%) |
|---|---|
| Insertions | 53.05 |
| Deletions | 12.06 |
| Duplications | 0.57 |
| Nested | 0.31 |
| High coverage | 1.87 |
| Low complexity | 9.79 |
| Explained | 77.65 |

Sedlazeck et al. (2017)
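The "Explained" figure on this slide is the sum of the individual overlap categories. The short Python check below reproduces that arithmetic. The last step, converting the percentage into an approximate number of calls, assumes the percentages are relative to the 2,247 Illumina TRA calls named in the slide title; the slide does not state this explicitly.

```python
from decimal import Decimal

# Overlap of Illumina TRA calls with other event categories, in percent
# (values copied from the slide).
overlap_pct = {
    "Insertions": Decimal("53.05"),
    "Deletions": Decimal("12.06"),
    "Duplications": Decimal("0.57"),
    "Nested": Decimal("0.31"),
    "High coverage": Decimal("1.87"),
    "Low complexity": Decimal("9.79"),
}

explained = sum(overlap_pct.values(), Decimal("0"))
print(explained)  # 77.65

# Assumption: percentages refer to the 2,247 Illumina TRA calls.
illumina_tra_calls = 2247
approx_explained_calls = explained / 100 * illumina_tra_calls
print(round(approx_explained_calls))  # 1745
```

Under that assumption, roughly 1,745 of the 2,247 short-read translocation calls coincide with another event type or a problematic region, with insertions alone accounting for more than half.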
![Page 12: Ashg sedlazeck grc_share](https://reader031.fdocuments.us/reader031/viewer/2022022415/5a654b2f7f8b9a5b558b6343/html5/thumbnails/12.jpg)
1.1 NA12878: check 2,247 vs 119 TRA
[Figure: ONT, PacBio and Illumina data for the same calls. Panel labels: Inversion; Translocation; Truncated reads; Insertion in rep. region.]
Sedlazeck et al. (2017)
![Page 13: Ashg sedlazeck grc_share](https://reader031.fdocuments.us/reader031/viewer/2022022415/5a654b2f7f8b9a5b558b6343/html5/thumbnails/13.jpg)
1.2 More complex SVs
Inverted tandem duplication:
• Pelizaeus-Merzbacher disease
• MECP2
• VIPR2
Sedlazeck et al. (2017)
[Figure: PacBio data vs. Illumina data]
![Page 14: Ashg sedlazeck grc_share](https://reader031.fdocuments.us/reader031/viewer/2022022415/5a654b2f7f8b9a5b558b6343/html5/thumbnails/14.jpg)
1.2 More complex SVs
Inversion flanked by deletions:
• Haemophilia A
• Only found via long-range PCR! (2007)
Sedlazeck et al. (2017)
[Figure: Illumina data vs. PacBio data]
![Page 15: Ashg sedlazeck grc_share](https://reader031.fdocuments.us/reader031/viewer/2022022415/5a654b2f7f8b9a5b558b6343/html5/thumbnails/15.jpg)
Challenges
1. Accurate prediction of SVs: Sniffles (talk on Thursday!)
2. Comparison of SVs
3. Annotation and interpretation of SVs
4. Population analysis
5. Diploid Genome
Layer et al. (2014)
![Page 16: Ashg sedlazeck grc_share](https://reader031.fdocuments.us/reader031/viewer/2022022415/5a654b2f7f8b9a5b558b6343/html5/thumbnails/16.jpg)
2. Comparison of SVs
SURVIVOR Framework:
• Compare SVs
• GiaB: 95 VCF files: 1 minute
• Simulate SVs
• Simulate long reads
• Summarize SV results
Jeffares et al. (2017)
[Figure: new SVs vs. observed SVs]
![Page 17: Ashg sedlazeck grc_share](https://reader031.fdocuments.us/reader031/viewer/2022022415/5a654b2f7f8b9a5b558b6343/html5/thumbnails/17.jpg)
2. Genome in a Bottle: merging 95 vcfs (1 min)
10x Genomics
BioNano
Complete Genomics
Illumina
PacBio
Minimum 2 callers:
SV Caller Comparison:
Using PCR + Sanger to validate SVs from multiple categories.
Join CSHL + Baylor to help with validations!
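The idea of merging call sets and keeping SVs supported by a minimum of 2 callers can be sketched in a few lines of Python. This is a simplified illustration, not SURVIVOR's implementation: the `SVCall` record, the 1,000 bp breakpoint tolerance and the first-match clustering are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SVCall:
    caller: str
    chrom: str
    start: int
    end: int
    svtype: str


def same_event(a, b, max_dist=1000):
    """Two calls describe the same SV if type and chromosome agree and
    both breakpoints lie within max_dist bp (hypothetical threshold)."""
    return (
        a.chrom == b.chrom
        and a.svtype == b.svtype
        and abs(a.start - b.start) <= max_dist
        and abs(a.end - b.end) <= max_dist
    )


def merge_calls(calls, max_dist=1000, min_callers=2):
    """Cluster calls against the first member of each cluster, then keep
    clusters supported by at least min_callers distinct callers."""
    clusters = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.start)):
        for cluster in clusters:
            if same_event(cluster[0], call, max_dist):
                cluster.append(call)
                break
        else:
            clusters.append([call])
    return [c for c in clusters if len({x.caller for x in c}) >= min_callers]


example_calls = [
    SVCall("callerA", "chr1", 1000, 2000, "DEL"),
    SVCall("callerB", "chr1", 1100, 2050, "DEL"),
    SVCall("callerA", "chr1", 50000, 50500, "INS"),
]
for cluster in merge_calls(example_calls):
    print(cluster[0].chrom, cluster[0].svtype, sorted(x.caller for x in cluster))
```

Requiring agreement between callers trades sensitivity for precision: the single-caller insertion in the example is dropped, while the deletion seen by both hypothetical callers is kept.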
![Page 18: Ashg sedlazeck grc_share](https://reader031.fdocuments.us/reader031/viewer/2022022415/5a654b2f7f8b9a5b558b6343/html5/thumbnails/18.jpg)
Challenges
1. Accurate prediction of SVs: Sniffles (talk on Thursday!)
2. Comparison of SVs: SURVIVOR
3. Annotation and interpretation of SVs
4. Population analysis
5. Diploid Genome
![Page 19: Ashg sedlazeck grc_share](https://reader031.fdocuments.us/reader031/viewer/2022022415/5a654b2f7f8b9a5b558b6343/html5/thumbnails/19.jpg)
3. Annotation: SURVIVOR_ant
Annotating SVs with:
• Multiple GTF, BED, VCF
Genome in a Bottle:
• 63,677 genes (GTF)
• 1,733,686 regions (3 BED files)
• 22 seconds
• 8,314 genes impacted
Sedlazeck et al. (2017)
[Figure: Genes impacted by SVs. Histogram; x-axis: # SVs hitting a gene (0 to 80), y-axis: # genes (0 to 6000).]
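Annotating SVs against a gene list comes down to interval overlap. The Python sketch below counts how many SVs hit each gene and derives the kind of histogram shown on this slide. It is a brute-force illustration and not how SURVIVOR_ant is implemented; the gene names and coordinates are hypothetical, and a real tool would use an interval tree or sorted sweep to handle tens of thousands of genes.

```python
from collections import Counter


def overlaps(a_start, a_end, b_start, b_end):
    """True if the closed intervals [a_start, a_end] and [b_start, b_end] intersect."""
    return a_start <= b_end and b_start <= a_end


def count_sv_hits(svs, genes):
    """Return a Counter mapping gene name -> number of SVs overlapping it.

    svs:   iterable of (chrom, start, end)
    genes: dict of name -> (chrom, start, end)
    """
    hits = Counter()
    for sv_chrom, sv_start, sv_end in svs:
        for name, (g_chrom, g_start, g_end) in genes.items():
            if sv_chrom == g_chrom and overlaps(sv_start, sv_end, g_start, g_end):
                hits[name] += 1
    return hits


# Hypothetical annotation and call set, for illustration only.
genes = {
    "GENE_A": ("chr1", 100, 500),
    "GENE_B": ("chr1", 1000, 2000),
    "GENE_C": ("chr2", 100, 500),
}
svs = [("chr1", 400, 1200), ("chr1", 1500, 1600), ("chr3", 1, 10)]

hits = count_sv_hits(svs, genes)
# Histogram: number of SVs per gene -> number of genes with that count.
histogram = Counter(hits.values())
print(dict(hits), dict(histogram))
```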
![Page 20: Ashg sedlazeck grc_share](https://reader031.fdocuments.us/reader031/viewer/2022022415/5a654b2f7f8b9a5b558b6343/html5/thumbnails/20.jpg)
Challenges
1. Accurate prediction of SVs: Sniffles (talk on Thursday!)
2. Comparison of SVs: SURVIVOR
3. Annotation and interpretation of SVs: SURVIVOR_ANT
4. Population analysis
5. Diploid Genome
![Page 21: Ashg sedlazeck grc_share](https://reader031.fdocuments.us/reader031/viewer/2022022415/5a654b2f7f8b9a5b558b6343/html5/thumbnails/21.jpg)
4. SVs in Population: SURVIVOR
• Birth defect study (Karyn Meltz Steinberg, WashU: Wed. 9am: Room 310A)
• 4 callers, 114 samples
• CCDG (William Salerno, HGSC: poster on Friday, #1281)
• 5 callers, 22,600 samples
• Non-human:
• S. pombe: 3 callers, 161 samples
• Tomato: 3 callers, 846 samples
![Page 22: Ashg sedlazeck grc_share](https://reader031.fdocuments.us/reader031/viewer/2022022415/5a654b2f7f8b9a5b558b6343/html5/thumbnails/22.jpg)
4. SVs in 22,600 Individuals
We need large SV studies:
• Common vs. rare SVs
• Inform GWAS studies
• Ethnicity specific SVs
• Catalog variability of regions
• MHC, LPA, etc.
[Figure: CHR6: Average SV Allele Frequency per 100kb. x-axis: position (0 to 1.5e+08), y-axis: allele frequency (0.00 to 0.20); MHC and LPA are marked.]
[Figure: # SVs shared across individuals, by position.]
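The "average SV allele frequency per 100kb" track can be computed by binning SV positions into fixed windows and averaging the allele frequencies in each window. The Python sketch below shows the idea under simple assumptions: each SV is reduced to a single position and a precomputed allele frequency, and the function name and input layout are hypothetical rather than taken from the study's pipeline.

```python
from collections import defaultdict


def mean_af_per_window(svs, window=100_000):
    """Average allele frequency of SVs per fixed-size window.

    svs: iterable of (position, allele_frequency) on one chromosome.
    Returns a dict mapping window start coordinate -> mean allele frequency.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pos, af in svs:
        w = pos // window  # index of the window containing this SV
        sums[w] += af
        counts[w] += 1
    return {w * window: sums[w] / counts[w] for w in sorted(sums)}


# Hypothetical SVs: two in the first 100 kb window, one in the second.
example = [(10_000, 0.1), (90_000, 0.3), (150_000, 0.5)]
for start, af in mean_af_per_window(example).items():
    print(start, af)
```

Windows with unusually high average frequency or many shared SVs, such as those labelled MHC and LPA on the slide, stand out directly in such a track.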
![Page 23: Ashg sedlazeck grc_share](https://reader031.fdocuments.us/reader031/viewer/2022022415/5a654b2f7f8b9a5b558b6343/html5/thumbnails/23.jpg)
Challenges
1. Accurate prediction of SVs: Sniffles (talk on Thursday!)
2. Comparison of SVs: SURVIVOR
3. Annotation and interpretation of SVs: SURVIVOR_ANT
4. Population analysis: SURVIVOR
5. Diploid Genome
![Page 24: Ashg sedlazeck grc_share](https://reader031.fdocuments.us/reader031/viewer/2022022415/5a654b2f7f8b9a5b558b6343/html5/thumbnails/24.jpg)
5.1 Diploid Genome
Challenges:
• Sequencing technology
• Computational methods
• Money
HGSC Approach: GADGET
1. Sequence 100 individuals: PacBio + 10x Genomics
2. SV detection/genotyping
3. Phasing of SVs + SNPs
4. Population-based genotyping of SVs with short reads.
![Page 25: Ashg sedlazeck grc_share](https://reader031.fdocuments.us/reader031/viewer/2022022415/5a654b2f7f8b9a5b558b6343/html5/thumbnails/25.jpg)
5.2 Diploid Genome
Selecting 100 samples
• We want to maximize the outcome per $ spent
• Selection of samples (red)
• Select top 100 (red)
• Random selection of samples (boxplot)
[Figure: histogram of # SVs per patient. x-axis: # SVs (2e+04 to 1e+05), y-axis: # patients (0 to 250).]
[Figure: Random vs. informed choice of samples (CCDG). x-axis: number of chosen samples (1 to 96), y-axis: SVs in population (%) (0 to 100). Series: Informed, Top100, Random.]
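The comparison of informed, top-100 and random sample selection can be sketched as a coverage problem: each sample carries a set of SVs, and we track what percentage of all SVs in the population is captured as samples are added. The slides do not say how the informed selection was made; the Python sketch below assumes a greedy choice (always add the sample contributing the most SVs not yet seen), and all function names and the toy data are hypothetical.

```python
import random


def coverage_curve(order, sample_svs):
    """Percent of all population SVs seen after each sample in `order`."""
    total = len(set().union(*sample_svs.values()))
    seen, curve = set(), []
    for sample in order:
        seen |= sample_svs[sample]
        curve.append(100.0 * len(seen) / total)
    return curve


def greedy_order(sample_svs, k):
    """Assumed 'informed' strategy: pick the sample adding the most unseen SVs."""
    remaining, seen, order = dict(sample_svs), set(), []
    while remaining and len(order) < k:
        best = max(remaining, key=lambda s: len(remaining[s] - seen))
        order.append(best)
        seen |= remaining.pop(best)
    return order


def top_order(sample_svs, k):
    """'Top k' strategy: samples with the most SVs, ignoring redundancy."""
    return sorted(sample_svs, key=lambda s: len(sample_svs[s]), reverse=True)[:k]


def random_order(sample_svs, k, seed=0):
    """Random baseline with a fixed seed for reproducibility."""
    return random.Random(seed).sample(sorted(sample_svs), k)


# Toy data: s2 has many SVs, but all of them are already present in s1.
toy = {"s1": {1, 2, 3, 4}, "s2": {1, 2, 3}, "s3": {5, 6}, "s4": {7}}
for name, picker in (("informed", greedy_order), ("top", top_order), ("random", random_order)):
    order = picker(toy, 2)
    print(name, order, [round(x, 1) for x in coverage_curve(order, toy)])
```

The toy data illustrates why picking the samples with the most SVs is not the same as maximizing coverage: a sample with many SVs adds nothing if its SVs are already represented.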
![Page 26: Ashg sedlazeck grc_share](https://reader031.fdocuments.us/reader031/viewer/2022022415/5a654b2f7f8b9a5b558b6343/html5/thumbnails/26.jpg)
5.3 Diploid Genome (Prototype)
![Page 27: Ashg sedlazeck grc_share](https://reader031.fdocuments.us/reader031/viewer/2022022415/5a654b2f7f8b9a5b558b6343/html5/thumbnails/27.jpg)
Challenges/ Summary
1. Accurate prediction of SVs: Sniffles (Talk on Thursday!)
2. Comparison of SVs: SURVIVOR
3. Annotation and interpretation of SVs: SURVIVOR_ANT
4. Population analysis: SURVIVOR
5. Diploid Genome: GADGET
All methods are available:
https://github.com/fritzsedlazeck
https://fritzsedlazeck.github.io/
![Page 28: Ashg sedlazeck grc_share](https://reader031.fdocuments.us/reader031/viewer/2022022415/5a654b2f7f8b9a5b558b6343/html5/thumbnails/28.jpg)
Acknowledgments
William Salerno
Stephen Richards
Richard Gibbs
Michael Schatz
Schatz lab
Daniel Jeffares
Jürg Bähler
Christophe Dessimoz
Justin Zook
GiaB consortium