Review: Methodologies for SVs detection
![Page 1: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/1.jpg)
Review: Methodologies for SVs detection
Fritz Sedlazeck
Nov 16, 2018
![Page 2: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/2.jpg)
My group / interests
Detection of Variants
Sniffles: Sedlazeck et al. (2018)
SURVIVOR: Jeffares et al. (2017)
BOD-Score: Sedlazeck et al. (2013)
Mapping / Assembly reads
NextGenMap-LR: Sedlazeck et al. (2018)
FalconUnzip: Chin et al. (2016)
NextGenMap: Sedlazeck et al. (2013)
Benchmarking
SVgenotyper: Chander et al. (in prep.)
Teaser: Smolka et al. (2015)
Sequencing: Jünemann et al. (2013)
Applications
Model organisms:
- Cancer (SKBR3) (Nattestad et al. 2018)
- miRNA editing (Vesely et al. 2012)
Non-model organisms:
- Cottus transposons (Dennenmoser et al. 2017)
- Clunio (Kaiser et al. 2016)
- Seabass (Vij et al. 2016)
- Pineapple (Ming et al. 2015)
![Page 3: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/3.jpg)
Early 2000s dogma: SNPs account for most human genetic variation
https://hapmap.ncbi.nlm.nih.gov
![Page 4: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/4.jpg)
Segmental duplications (a.k.a. low copy repeats)
Bailey et al., 2002: ~5% of the human genome is duplicated!
Self dot plot: 10 megabases of Chr 15 (dot = 1 kb exact match)
![Page 5: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/5.jpg)
Variation in genome structure: so-called "structural variation" (SV)
[Figure: schematic of SV types relative to a reference made of segments A, B, C and D: duplication, inversion, deletion, insertion (new segment X) and translocation (segments Q and R). The figure marks some events as CNV and the others as SV.]
SV is a superset of copy number variation (CNV). Not all structural changes affect copy number (e.g., inversions)!
![Page 6: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/6.jpg)
Our understanding of structural variation is driven by technology
1940s - 1980s: Cytogenetics / Karyotyping
1990s: CGH / FISH / SKY / COBRA
2000s: Genomic microarrays (BAC-aCGH / oligo-aCGH)
Today: High throughput DNA sequencing
![Page 7: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/7.jpg)
Why are structural variations relevant / important?
• They are common and affect a large fraction of the genome
• They are a major driver of genome evolution
[Figure panels: Genomic Disorders; Evolution]
![Page 8: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/8.jpg)
Why are structural variations relevant / important?
• Genetic basis of traits
Impact on regulation / Impact on phenotypes
[Figure: heatmap with axes "Regulatory State" and "Cell Line". Regulatory features (CTCF binding site, enhancer, open chromatin region, promoter, promoter flanking region, TF binding site) are each split by state (ACTIVE, INACTIVE, POISED, REPRESSED, NA) and shown across cell lines and tissues such as A549, GM12878, H1ESC, HepG2, HUVEC, IMR90, K562 and fetal and adult tissue samples. Colour scale: number affected, 0 to 2000.]
![Page 9: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/9.jpg)
Outline
1. CNV analysis
2. SVs analysis
   1. Assembly based
   2. Short reads
   3. Long reads
3. Review plan
![Page 10: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/10.jpg)
Humans differ by roughly 3,000 deletions (>= 500 bp)
![Page 11: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/11.jpg)
Humans differ by a few hundred duplications
![Page 12: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/12.jpg)
Copy-number Profiles
![Page 13: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/13.jpg)
Ginkgo http://qb.cshl.edu/ginkgo
Interactive Single Cell CNV analysis & clustering
• Easy-to-use web interface, parameterized for binning, segmentation, clustering, etc.
• Per cell through project-wide analysis in any species
Compare MDA, DOP-PCR, and MALBAC
• DOP-PCR shows superior resolution and consistency
Available for collaboration
• Analyzing CNVs with respect to different clinical outcomes
• Extending clustering methods, prototyping scRNA
Interactive analysis and assessment of single-cell copy-number variations. Garvin T, Aboukhalil R, Kendall J, Baslan T, Atwal GS, Hicks J, Wigler M, Schatz MC (2015) Nature Methods doi:10.1038/nmeth.3578
![Page 14: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/14.jpg)
Data are noisy
Potential for biases at every step
• WGA (whole-genome amplification): Non-uniform amplification
• Library preparation: Low complexity, read duplications, barcoding
• Sequencing: GC artifacts, short reads
• Computation: mappability, GC correction, segmentation, tree building
Coverage is too sparse and noisy for SNP analysis -> Requires special processing
![Page 15: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/15.jpg)
1. Binning
CNV analysis
• Divide the genome into “bins” with ~50 to 100 reads/bin
• Map the reads and count reads per bin
Use uniquely mappable bases to establish bins
![Page 17: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/17.jpg)
1. Binning
Example read counts per bin: 5 4 5 10 11 5 2 5
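The binning step can be sketched in a few lines of Python. This is a minimal illustration, not the code of any particular tool: the function names `variable_bins` and `count_reads`, the boolean mappability mask and the toy coordinates are all assumptions made for the example. Bins are sized so that each one holds the same number of uniquely mappable bases, and reads are then counted by their start position.

```python
from bisect import bisect_right

def variable_bins(mappable, bases_per_bin):
    """Return exclusive bin end coordinates so that each bin holds
    `bases_per_bin` uniquely mappable bases (the last bin may hold fewer)."""
    ends, seen = [], 0
    for pos, ok in enumerate(mappable):
        if ok:
            seen += 1
            if seen == bases_per_bin:
                ends.append(pos + 1)
                seen = 0
    if seen > 0 or not ends:
        ends.append(len(mappable))   # leftover mappable bases form a final bin
    else:
        ends[-1] = len(mappable)     # trailing unmappable bases join the last bin
    return ends

def count_reads(read_starts, bin_ends):
    """Count reads per bin using the start coordinate of each read."""
    counts = [0] * len(bin_ends)
    for start in read_starts:
        idx = bisect_right(bin_ends, start)
        if idx < len(counts):
            counts[idx] += 1
    return counts

# Toy genome of 10 bases with an unmappable stretch at positions 4-5.
mask = [True] * 4 + [False] * 2 + [True] * 4
ends = variable_bins(mask, bases_per_bin=4)
counts = count_reads([0, 1, 3, 4, 6, 9, 9], ends)
```

Because bin boundaries follow mappable bases rather than raw coordinates, bins over repetitive regions become physically longer, which keeps the expected read count per bin comparable.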
![Page 18: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/18.jpg)
2. Normalization
Also correct for mappability, GC content, amplification biases
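A minimal sketch of the normalization step, assuming per-bin read counts and a per-bin GC fraction are already available. The function names and the choice of a per-stratum median are assumptions for illustration; a smooth fit such as LOWESS is a common alternative for the GC correction.

```python
from statistics import mean, median

def normalize(counts):
    """Scale counts so that the average bin has a value of 1."""
    m = mean(counts)
    return [c / m for c in counts]

def gc_correct(ratios, gc, n_strata=10):
    """Divide each bin by the median ratio of bins with similar GC content."""
    def stratum(g):
        return min(int(g * n_strata), n_strata - 1)

    groups = {}
    for r, g in zip(ratios, gc):
        groups.setdefault(stratum(g), []).append(r)
    medians = {k: median(v) for k, v in groups.items()}
    # Leave a bin unchanged if its stratum median is zero.
    return [r / medians[stratum(g)] if medians[stratum(g)] > 0 else r
            for r, g in zip(ratios, gc)]

norm = normalize([5, 4, 5, 10, 11, 5, 2, 5])
corrected = gc_correct([1.0, 1.0, 2.0, 2.0], [0.31, 0.35, 0.62, 0.68])
```

In the second call the two GC-rich bins are systematically twice as deep as the others, and the correction brings all four bins back to 1.0.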
![Page 19: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/19.jpg)
3. Segmentation
Circular Binary Segmentation (CBS)
[Figure: candidate breakpoint pairs i and j along the profile]
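The idea behind CBS can be sketched as follows. This is a simplified illustration and not the published algorithm: it assumes unit-variance noise and uses a fixed, hypothetical `threshold` where CBS proper assesses significance with a permutation test. For every pair of breakpoints i < j it compares the mean inside the arc with the mean outside, splits at the best pair if the statistic is large enough, and recurses on the pieces.

```python
import math

def best_arc(x):
    """Return (stat, i, j) for the arc x[i:j] whose mean differs most from the rest."""
    n = len(x)
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
    total = prefix[-1]
    best = (0.0, 0, n)
    for i in range(n):
        for j in range(i + 1, n + 1):
            k = j - i
            if k == n:          # the arc must leave something outside
                continue
            inside = prefix[j] - prefix[i]
            diff = inside / k - (total - inside) / (n - k)
            stat = abs(diff) / math.sqrt(1.0 / k + 1.0 / (n - k))
            if stat > best[0]:
                best = (stat, i, j)
    return best

def segment(x, threshold, offset=0):
    """Recursively split x and return half-open (start, end) segments."""
    stat, i, j = best_arc(x)
    if len(x) < 2 or stat <= threshold:
        return [(offset, offset + len(x))]
    pieces = []
    for a, b in ((0, i), (i, j), (j, len(x))):
        if b > a:
            pieces.extend(segment(x[a:b], threshold, offset + a))
    return pieces

# A flat profile with a four-bin gain in the middle.
profile = [1.0] * 5 + [2.0] * 4 + [1.0] * 5
segments = segment(profile, threshold=1.0)
```

The search is quadratic in the number of bins, which is acceptable for a sketch but is one reason production implementations use faster approximations.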
![Page 20: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/20.jpg)
4. Estimating Copy Number
$$\mathrm{CN} = \arg\min \Big\{ \sum_{i,j} \big(\hat{Y}_{i,j} - Y_{i,j}\big)^2 \Big\}$$
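One way to read the least-squares criterion above is as a search over candidate scaling factors: scale the segmented profile, round it to integers, and keep the factor with the smallest squared rounding error. The sketch below follows that reading. The function name, the candidate range of 1.5 to 6.0 in steps of 0.05, and the interpretation of the two Y terms as the scaled and rounded profiles are assumptions, not details taken from the slide.

```python
def estimate_copy_number(segment_ratios, candidates=None):
    """Return (multiplier, integer_profile) minimising the squared rounding error."""
    if candidates is None:
        # Assumed search range: 1.5, 1.55, ..., 6.0
        candidates = [k / 20 for k in range(30, 121)]
    best_m, best_sse = None, float("inf")
    for m in candidates:
        scaled = [m * r for r in segment_ratios]
        sse = sum((s - round(s)) ** 2 for s in scaled)
        if sse < best_sse:       # strict '<' keeps the smallest multiplier on ties
            best_m, best_sse = m, sse
    return best_m, [round(best_m * r) for r in segment_ratios]

# Normalised segment means for a mostly diploid sample with one loss and one gain.
multiplier, profile = estimate_copy_number([1.0, 1.0, 0.5, 1.5])
```

Multiples of the true factor fit equally well (4.0 would also give zero error here), so the search keeps the smallest one; real data usually needs an independent ploidy estimate to break such ties.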
![Page 21: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/21.jpg)
Using Nanopore MinION: CNV karyotyping.
![Page 22: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/22.jpg)
Nanopore sequencing for CNV detection
[Figure: copy-number profile along the chromosome axis, labelled 1 to 23, X and Y]
![Page 23: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/23.jpg)
SKBR3 cell line CNV Analysis
![Page 24: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/24.jpg)
SID97277: partial chromosomal deletions
MinION data
~60k reads
MiSeq data
5q deletion indicates poor prognosis
Chr 11 abnormalities indicate poor prognosis
![Page 25: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/25.jpg)
SID97277 karyotype
![Page 26: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/26.jpg)
SID97279: trisomy 6, 15, 22 and deletions in chr 11
MinION data
~73k reads
MiSeq data
Trisomy 6 correlated with intermediate prognosis
Abnormalities on 11 indicate poor prognosis
![Page 27: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/27.jpg)
CNV detection summary
• Advantages
  • Less coverage is required
  • -> Applications such as single cell sequencing
• Disadvantages
  • Resolution of events: usually in the multi-kbp range
  • Only deletions and duplications
  • Coverage biases in short reads
![Page 28: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/28.jpg)
Assembly based
1. De novo assembly
2. Genomic alignment (whole genome alignment, WGA)
3. Detangle the genomic alignment to identify variants.
![Page 29: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/29.jpg)
Ingredients for a good assembly
Current challenges in de novo plant genome sequencing and assembly. Schatz MC, Witkowski J, McCombie WR (2012) Genome Biology 13:243
Coverage
High coverage is required
– Oversample the genome to ensure every base is sequenced with long overlaps between reads
– Biased coverage will also fragment assembly
[Figure: Lander Waterman Expected Contig Length vs Coverage. x-axis: read coverage (0 to 40); y-axis: expected contig length in bp (log scale, 100 to 1M); curves for read lengths of 1000, 710, 250, 100, 52 and 30 bp; markers for dog mean, dog N50, panda mean and panda N50.]
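The relationship between coverage, read length and expected contig length can be computed directly from the Lander-Waterman model. The sketch below uses the standard closed form, L((e^{cσ} − 1)/c + (1 − σ)) with σ = 1 − T/L, where L is the read length, c the coverage and T the minimum detectable overlap. The function name and the example values are assumptions, and the model itself is idealized: it ignores repeats, sequencing errors and coverage bias.

```python
import math

def expected_contig_length(read_len, coverage, min_overlap=0):
    """Lander-Waterman expected contig length in bp (idealized, repeat-free genome)."""
    sigma = 1.0 - min_overlap / read_len
    return read_len * ((math.exp(coverage * sigma) - 1.0) / coverage + (1.0 - sigma))

# Expected contig length grows roughly exponentially with coverage.
for cov in (5, 10, 20):
    print(cov, round(expected_contig_length(100, cov, min_overlap=20)))
```

The exponential growth explains why modest increases in coverage help so much at low depth, while the read-length factor shows why longer reads shift the whole curve upward.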
Read Length
Reads & mates must be longer than the repeats
– Short reads will have false overlaps forming hairball assembly graphs
– With long enough reads, assemble entire chromosomes into contigs
Quality
Errors obscure overlaps
– Reads are assembled by finding kmers shared in pair of reads
– High error rate requires very short seeds, increasing complexity and forming assembly hairballs
![Page 30: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/30.jpg)
Goal of WGA
• For two genomes, A and B, find a mapping from each position in A to its corresponding position in B
CCGGTAGGCTATTAAACGGGGTGAGGAGCGTTGGCATAGCA
CCGGTAGGCTATTAAACGGGGTGAGGAGCGTTGGCATAGCA
![Page 31: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/31.jpg)
Not so fast...
• Genome A may have insertions, deletions, translocations, inversions, duplications or SNPs with respect to B (sometimes all of the above)
CCGGTAGGATATTAAACGGGGTGAGGAGCGTTGGCATAGCA
CCGCTAGGCTATTAAAACCCCGGAGGAG....GGCTGAGCA
![Page 32: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/32.jpg)
WGA visualization
• How can we visualize whole genome alignments?
• With an alignment dot plot
  • N x M matrix
    • Let i = position in genome A
    • Let j = position in genome B
    • Fill cell (i,j) if Ai shows similarity to Bj
• A perfect alignment between A and B would completely fill the positive diagonal
[Figure: small example dot plot with the bases A, C, C, T along one axis and A, C, G, T along the other]
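The dot plot definition above translates almost directly into code. This is a minimal sketch with exact k-mer matching standing in for "shows similarity"; the function names and the text rendering are assumptions for illustration, and real tools work from seed matches or alignments instead of a full matrix.

```python
def dot_plot(a, b, k=1):
    """Cell (i, j) is True when the k-mers starting at A[i] and B[j] match exactly."""
    rows = len(a) - k + 1
    cols = len(b) - k + 1
    return [[a[i:i + k] == b[j:j + k] for j in range(cols)] for i in range(rows)]

def render(matrix):
    """Draw the matrix as text, one row per position in A."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in matrix)

plot = dot_plot("ACCT", "ACCT")
print(render(plot))
```

Comparing a sequence with itself fills the main diagonal, and the repeated C adds off-diagonal dots; raising `k` removes such short spurious matches, which is the same reason the self dot plot earlier in the deck uses 1 kb exact matches.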
![Page 33: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/33.jpg)
[Figure: dot plots of genome A against genome B illustrating a translocation, an inversion and an insertion]
![Page 34: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/34.jpg)
• Different structural variation types / misassemblies will be apparent by their pattern of breakpoints
• Most breakpoints will be at or near repeats
• Things quickly get complicated in real genomes
http://mummer.sf.net/manual/AlignmentTypes.pdf
![Page 35: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/35.jpg)
Assembly based detection summary
• Advantages
  • Enables the detection of every event
  • Good quality for insertions
• Disadvantages
  • Genomic alignment is challenging.
  • Heterozygous events are likely missed.
![Page 36: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/36.jpg)
How to detect Structural Variations
![Page 37: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/37.jpg)
Sequence alignment “signals” for structural variation
1. Align DNA sequences from sample to human reference genome
2. Look for evidence of structural differences
[Figure: Ref. vs Exp., showing four signals ordered from low to high resolution: (a) depth of coverage, (b) paired-end mapping, (c) split-read mapping, (d) de novo assembly]
![Page 38: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/38.jpg)
Looking for "discordant" paired-end fragments
Paired-end sequencing
Ref
Sample
paired-ends map farther away than expected
2000 bp
Slide in collaboration with Ira Hall
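A minimal sketch of how "farther away than expected" can be turned into a rule, assuming the library's expected fragment size and standard deviation are known. The function name, the labels and the cutoff of three standard deviations are assumptions for illustration; real callers also check read orientation and mapping quality, and require several supporting pairs per event.

```python
def classify_pair(span, expected_mean, expected_sd, k=3.0):
    """Label a read pair by comparing its mapped span with the library distribution."""
    if span > expected_mean + k * expected_sd:
        return "too_far"      # consistent with a deletion in the sample
    if span < expected_mean - k * expected_sd:
        return "too_close"    # consistent with an insertion in the sample
    return "concordant"

# A library with 500 bp fragments: a pair spanning 2000 bp on the reference is discordant.
labels = [classify_pair(s, expected_mean=500, expected_sd=50) for s in (510, 2000, 200)]
```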
![Page 39: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/39.jpg)
A probabilistic framework for SV discovery
Layer et al, 2014
Ryan Layer
Lumpy integrates paired-end mapping, split-read mapping, and depth of coverage for better SV discovery accuracy
![Page 40: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/40.jpg)
Problem #1: Often many false positives
- Short reads + heuristic alignment + repetitive genome = systematic alignment artifacts (false calls)
- Chimeras and duplicate molecules
- Ref. genome errors (e.g., gaps, mis-assemblies)
- ALL SV mapping studies use strict filters for above
![Page 41: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/41.jpg)
Problem #2: The false negative rate is also typically high
- Most current datasets have low to moderate physical coverage due to small insert size (~10-20X)
- Breakpoints are enriched in repetitive genomic regions that pose problems for sensitive read alignment
- FILTERING!
- The false negative rate is usually hard to measure, but is thought to be extremely high for most paired-end mapping studies (>30%)
- When searching for spontaneous mutations in a family or a tumor/normal comparison, a false negative call in one sample can be a false positive somatic or de novo call in another.
![Page 42: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/42.jpg)
How to filter / choose the SV caller?
• Each method applies its own heuristics.

| Method | # Sim. SV | avg FDR | avg Sensitivity |
|---|---|---|---|
| DELLY | 33-198 | 0.13 | 0.75 |
| LUMPY | 33-198 | 0.06 | 0.62 |
| Pindel | 33-198 | 0.04 | 0.55 |
| SURVIVOR | 33-198 | 0.01 | 0.70 |
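For reference, the two metrics in the comparison above follow from true positive, false positive and false negative counts. The sketch below uses their standard definitions; the function names and the example counts are hypothetical and are not taken from the benchmark.

```python
def fdr(tp, fp):
    """False discovery rate: fraction of reported calls that are wrong."""
    return fp / (tp + fp) if (tp + fp) else 0.0

def sensitivity(tp, fn):
    """Sensitivity (recall): fraction of true events that were reported."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical run: 100 simulated SVs, 70 recovered, 1 spurious call.
example_fdr = fdr(tp=70, fp=1)
example_sens = sensitivity(tp=70, fn=30)
```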
![Page 43: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/43.jpg)
PacBio / ONT sequencer
Advantage:
• Long reads
Disadvantage:
• Throughput / yield
• Costs
• High error rates
![Page 44: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/44.jpg)
Long Read Technologies
• (+) SVs in repetitive regions
• (+) Span SVs
• (+) Uniform coverage
• (+) Can identify more complex SVs
• (-) Higher seq. error rate
• (-) Hard to align
![Page 45: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/45.jpg)
Mapping challenges
[Figure: read alignments from BWA-MEM compared with NGMLR]
![Page 47: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/47.jpg)
NGMLR + Sniffles
• NGMLR
  • Convex gap cost model to better distinguish seq. error vs. signal
  • Novel method for split read alignment.
• Sniffles
  • Includes multiple statistical models to distinguish noise vs. signal
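To illustrate why a convex gap cost helps separate sequencing error from real variants, the sketch below compares an affine gap penalty with one in which each additional gap base costs less than the previous one. It is not NGMLR's actual scoring function: the function names and all parameter values are hypothetical and chosen only to show the shape of the two models.

```python
def affine_gap(length, gap_open=5.0, gap_extend=5.0):
    """Affine model: every gap base costs the same after the opening penalty."""
    return gap_open + gap_extend * length

def convex_gap(length, gap_open=5.0, extend_max=5.0, extend_min=0.5, decay=0.15):
    """Convex model: the per-base extension cost shrinks as the gap grows."""
    cost = gap_open
    for pos in range(length):
        cost += max(extend_min, extend_max - decay * pos)
    return cost

# Short gaps (typical of sequencing error) cost about the same under both models,
# while one long gap (typical of a real indel) is far cheaper under the convex model.
for n in (1, 10, 100):
    print(n, affine_gap(n), round(convex_gap(n), 2))
```

With such a model an aligner can keep a 100 bp deletion as a single long gap instead of scattering it into many small indels and mismatches.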
![Page 48: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/48.jpg)
1.3 Long read mapping
[Figure: grid of stacked bar charts, one row per aligner (BLASR, BWA, GraphMap, NGMLR) and one column per SV type (Indels, Duplication, Translocation, Inversion). x-axis ticks: 100, 250, 500, 1k, 5k, 10k, 50k; y-axis: 0 to 100. Legend: Precise, Indicated, Wrong, Alignment stopped prior, Not aligned.]
![Page 49: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/49.jpg)
More complex types
![Page 50: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/50.jpg)
2.4 Long read SV calling
[Figure: grid of stacked bar charts, one row per caller (SURVIVOR, PBHoney, Sniffles + BWA, Sniffles + NGM-LR) and one column per SV type (Indels, Duplication, Translocation, Inversion). x-axis ticks: 100, 250, 500, 1k, 5k, 10k, 50k; y-axis: 0 to 100. Legend: Precise, Indicated, Not found, Additional events.]
![Page 51: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/51.jpg)
2.4 Long read SV calling
[Figure: grid of stacked bar charts, one row per caller (SURVIVOR, PBHoney, Sniffles + BWA, Sniffles + NGM-LR) and one column per SV type (Dup, Indel, Inv, Tra, InvDel, InvDup). x-axis ticks: 100, 250, 500, 1k, 5k, 10k, 50k; y-axis: 0 to 100. Legend: Precise, Indicated, Not found, Additional events.]
INVDEL: inversion flanked by deletions
• Haemophilia A
INVDUP: inverted tandem duplication
• Pelizaeus-Merzbacher disease
• MECP2
• VIPR2
![Page 52: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/52.jpg)
3.2 NA12878
• Healthy female
• Gold standard in genomics
• Sequenced with many technologies independently:
  • Illumina, PacBio, Oxford Nanopore
![Page 53: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/53.jpg)
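3.2 NA12878: Deletion calling

| Tech. | Cov. | Avg len | SVs | DEL | DUP | INV | INS | TRA |
|---|---|---|---|---|---|---|---|---|
| PacBio | 55x | 4,334 | 22,877 | 9,933 | 162 | 611 | 12,052 | 119 |
| Oxford Nanopore | 28x | 6,432 | 32,409 | 27,147 | 87 | 323 | 4,809 | 43 |
| Illumina | 50x | 2x101 | 7,275 | 3,744 | 731 | 553 | 0 | 2,247 |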
![Page 55: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/55.jpg)
3.2 Oxford Nanopore deletions
[Figure: read alignment tracks labelled Illumina, PacBio, Oxford Nanopore]
![Page 56: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/56.jpg)
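3.2 NA12878: Deletion calling

| Tech. | Cov. | Avg len | SVs | DEL | DUP | INV | INS | TRA |
|---|---|---|---|---|---|---|---|---|
| PacBio | 55x | 4,334 | 22,877 | 9,933 | 162 | 611 | 12,052 | 119 |
| Oxford Nanopore | 28x | 6,432 | 32,409 | 27,147 | 87 | 323 | 4,809 | 43 |
| Oxford Nanopore @ Baylor | 34x | 4,982 | 12,596 | 7,102 | 169 | 113 | 5,166 | 46 |
| Illumina | 50x | 2x101 | 7,275 | 3,744 | 731 | 553 | 0 | 2,247 |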
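The per-technology differences in the table above are easier to see as shares of each call set. The following is a minimal sketch, not part of the original slides: the counts are copied from the table, while the names `CALLS`, `totals_consistent`, and `type_shares` are hypothetical helpers introduced only for illustration.

```python
# SV counts per technology, copied from the NA12878 table.
# All names below are illustrative and not taken from the slides.
SV_TYPES = ("DEL", "DUP", "INV", "INS", "TRA")

CALLS = {
    "PacBio": {"SVs": 22877, "DEL": 9933, "DUP": 162, "INV": 611, "INS": 12052, "TRA": 119},
    "Oxford Nanopore": {"SVs": 32409, "DEL": 27147, "DUP": 87, "INV": 323, "INS": 4809, "TRA": 43},
    "Oxford Nanopore @ Baylor": {"SVs": 12596, "DEL": 7102, "DUP": 169, "INV": 113, "INS": 5166, "TRA": 46},
    "Illumina": {"SVs": 7275, "DEL": 3744, "DUP": 731, "INV": 553, "INS": 0, "TRA": 2247},
}


def totals_consistent(row):
    """Check that the per-type counts add up to the reported SV total."""
    return sum(row[t] for t in SV_TYPES) == row["SVs"]


def type_shares(row):
    """Return the fraction of all SV calls contributed by each SV type."""
    return {t: row[t] / row["SVs"] for t in SV_TYPES}


if __name__ == "__main__":
    for tech, row in CALLS.items():
        shares = ", ".join(f"{t} {s:.1%}" for t, s in type_shares(row).items())
        print(f"{tech}: consistent={totals_consistent(row)}; {shares}")
```

Read this way, the first Oxford Nanopore call set is dominated by deletions, while roughly a third of the Illumina calls are translocations and none are insertions.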
![Page 57: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/57.jpg)
![Page 58: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/58.jpg)
3.2 NA12878: check 2,247 vs 119 TRA
Illumina data
Translocation:
PacBio data
ONT data
Truncated reads:
Insertion in rep. region

| Overlap | Illumina TRA (%) |
|---|---|
| Translocations | 7.74 |
| Insertions | 53.05 |
| Deletions | 12.06 |
| Duplications | 0.57 |
| Nested | 0.31 |
| High coverage | 1.87 |
| Low complexity | 9.79 |
| Explained | 85.40 |
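As a quick arithmetic check on the overlap figures, the sketch below sums the categories and converts them to approximate call counts. It assumes, which the slide does not state explicitly, that the percentages are relative to the 2,247 Illumina TRA calls and that the categories are mutually exclusive. The helper names are hypothetical.

```python
# Overlap of Illumina TRA calls with other evidence, in percent (from the slide).
# Assumption: percentages refer to the 2,247 Illumina TRA calls and the
# categories do not overlap each other.
ILLUMINA_TRA_CALLS = 2247

OVERLAP_PERCENT = {
    "Translocations": 7.74,
    "Insertions": 53.05,
    "Deletions": 12.06,
    "Duplications": 0.57,
    "Nested": 0.31,
    "High coverage": 1.87,
    "Low complexity": 9.79,
}

REPORTED_EXPLAINED = 85.40  # "Explained" row on the slide


def explained_percent(overlap):
    """Sum the per-category percentages."""
    return sum(overlap.values())


def approx_calls(percent, total=ILLUMINA_TRA_CALLS):
    """Convert a percentage into an approximate number of calls."""
    return round(total * percent / 100)


if __name__ == "__main__":
    for category, pct in OVERLAP_PERCENT.items():
        print(f"{category}: {pct}% (about {approx_calls(pct)} calls)")
    print(f"Sum of categories: {explained_percent(OVERLAP_PERCENT):.2f}%")
    print(f"Reported as explained: {REPORTED_EXPLAINED}%")
```

The category sum agrees with the reported "Explained" value up to rounding. Under the same assumptions, only a small minority of the Illumina TRA calls coincide with a translocation, and more than half coincide with an insertion.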
![Page 59: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/59.jpg)
NA12878: check 2,247 TRA
ONT data
PacBio data
Illumina data
Insertion in rep. region
Inversion:
Translocation:
Truncated reads:
Insertion in rep. region
![Page 60: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/60.jpg)
SKBR-3 using PacBio
(Davidson et al., 2000)
Often used for pre-clinical research on Her2-targeting therapeutics such as Herceptin (Trastuzumab) and resistance to these therapies.
Most commonly used Her2-amplified breast cancer cell line
80 chromosomes instead of 46
![Page 61: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/61.jpg)
Her2
GSDMB
TATDN1
8 Mb
RARA
PKIA
Inversion was only found by Sniffles
![Page 62: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/62.jpg)
Her2
Chr 17, Chr 8
1. Healthy chromosome 17 & 8
2. Translocation into chromosome 8
3. Translocation within chromosome 8
4. Complex variant and inverted duplication within chromosome 8
5. Translocation within chromosome 8
![Page 63: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/63.jpg)
Medical approach: Using Nanopore MinION
![Page 64: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/64.jpg)
GBA mutations in Parkinson's and Gaucher disease
![Page 65: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/65.jpg)
Review on SV methodologies
• Which methods exist per methodology?
  • Assembly vs. short read mapping vs. long read mapping
• What are the advantages/disadvantages per methodology?
  • Accuracy
  • Costs
  • Limitations, remaining challenges, complex alleles, polyploidy, etc.
• Where is the field at?
  • Diploid assemblies
  • Phasing of SVs + SNPs
• We have an outline and a journal that is interested in working with us.
![Page 66: Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through project-wide analysis in any species Compare MDA, DOP-PCR, and MALBAC • DOP-PCR shows](https://reader035.fdocuments.us/reader035/viewer/2022081408/605b44399c0436383d28ce57/html5/thumbnails/66.jpg)
Thank you
• SV calling is the SNP calling of 2008
• Reads are typically shorter than the allele.
• A lot of noise in the data