Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant...
![Page 1: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/1.jpg)
Next-generation sequencing: from basics to future diagnostics
PART II: NGS analysis to find variant
Sangwoo Kim, Ph.D.
Assistant Professor, Severance Biomedical Research Institute, Yonsei University College of Medicine
![Page 2: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/2.jpg)
Overview
• PART I: NGS technologies and standard workflow
  – Next generation sequencing
    • History and technology
  – Data and its meaning; process workflow
  – Discussion
• PART II: NGS analysis to find variants
  – NGS analysis to find variants
    • Single nucleotide variants (SNVs)
    • Copy number variations (CNVs)
    • Structural variations (SVs)
• PART III: NGS application to diagnostics
  – NGS in genomic medicine
  – Potential application to forensic science
![Page 3: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/3.jpg)
FROM PREVIOUS SESSION
Conventional variant calling
Variant calling in minor subgroups
![Page 4: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/4.jpg)
Next-generation sequencing
Metzker et al, Nat Rev Genet, 2010
Massively Parallel Sequencing (a.k.a. Next-generation sequencing)
Illumina HiSeq2500
5500 SOLiD system
Ion Torrent PGM
via spatially separated, clonally amplified DNA templates or single DNA molecules
![Page 5: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/5.jpg)
The human genome project
Began in 1990. The consortium comprised the U.S., U.K., France, Australia, Japan, etc.
“Rough draft” in 2000
“Complete genome” published in 2003
13 years, $3 billion.
The Human Genome Project (1990~2003)
![Page 6: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/6.jpg)
FASTQ format (NGS raw data)
one read
sequence
quality
A format for NGS reads (FASTA sequence + quality)
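The read/sequence/quality layout above can be sketched in a few lines of Python. This is a minimal illustration, not a production parser: it assumes the common four-line record layout and Phred+33 quality encoding, and the example record (`read1`) is hypothetical.

```python
def parse_fastq(lines):
    """Yield (read_id, sequence, phred_scores) from an iterable of FASTQ lines.

    Assumes four lines per record and Phred+33 quality encoding.
    """
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # skip blank lines between records
        sequence = next(it).strip()
        next(it)  # the '+' separator line
        quality = next(it).strip()
        # Each quality character encodes one Phred score (ASCII code minus 33)
        phred = [ord(ch) - 33 for ch in quality]
        yield header[1:], sequence, phred


def error_probability(phred_score):
    """Convert a Phred score Q to a base-call error probability 10^(-Q/10)."""
    return 10 ** (-phred_score / 10)


# Hypothetical record: one read of 8 bases
record = ["@read1", "GATTCAAA", "+", "IIIIIII#"]
reads = list(parse_fastq(record))
```

Here `'I'` decodes to Q40 (error probability 0.0001) and `'#'` to Q2, so the last base of the hypothetical read is a low-confidence call.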
![Page 7: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/7.jpg)
Kim S and Paik S, in preparation
[Figure: NGS analysis workflow.
A. Data Generation: sequencing of disease and control samples, raw reads (FASTQ files), quality control, short read alignment (BAM files).
B. Variant Finding: germline mutation, somatic mutation, copy number variation (CNV), structural variation (SV), xenogeneic sequence.
C. Variant Analysis: mutation filtration/selection, recurrence analysis, variant impact prediction, tumor heterogeneity inference.
D. Validation and functional assessment: variant confirmation, pathway analysis, functional study.]
Box 1. Sequencing types and platforms. Depending on the sequencing purpose, various platforms can be considered for optimization. Whole genome sequencing (WGS) allows an inspection of all genomic areas and is applicable to CNV and SV analysis. Whole exome sequencing (WES) only interrogates coding regions (1~2% of the genome), at lower cost and throughput. WGS and WES are frequently used for novel causative variant discovery, and control sample sequencing is generally mandatory. When only limited regions are to be tested (as in a diagnosis kit), a set of targeted genes is amplified and fed for sequencing (targeted/panel sequencing). In this case, the control is usually omitted when the target sites (hotspots) are clear.
![Page 8: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/8.jpg)
DATA PREPROCESSING
Short Read Alignment
![Page 9: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/9.jpg)
Mapping back to genome
TAACACCTGGGAAATTCATCACAAAAAGATCTTAGCCTAGGCACATTGTCATTAGGTTATCCAAAGTTAAGACAAAGGAAAGAATCTTAAGAGCTGTGAGA
Where is this sequence in the human genome?
Do this as fast as possible!
![Page 10: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/10.jpg)
Brute-force way
Find “GATTCAAA” in the human genome
The reference genome (chr1, start). This is very long (3 billion):
T G A C G T G T G A T T C A A A A A A G C
Your query, compared at each position in turn:
G A T T C A A A
  G A T T C A A A
    G A T T C A A A
      G A T T C A A A
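The sliding comparison on this slide can be written as a short Python sketch. The function name `naive_find` is illustrative; the reference fragment and query are the ones shown on the slide.

```python
def naive_find(reference, query):
    """Return every 0-based start position where query matches reference exactly.

    Compares the query against each window of the reference in turn,
    so the cost grows with len(reference) * len(query).
    """
    hits = []
    for start in range(len(reference) - len(query) + 1):
        if reference[start:start + len(query)] == query:
            hits.append(start)
    return hits


reference = "TGACGTGTGATTCAAAAAAGC"  # the fragment shown on the slide
positions = naive_find(reference, "GATTCAAA")
```

On a 3-billion-base reference this loop visits every position for every read, which is why the following slides look for an index instead.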
![Page 11: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/11.jpg)
How fast should it be?

| | time per 1 read (sec) | time per 80x WGS (sec) | is equal to |
|---|---|---|---|
| eyeballing | 3×10^9 | 3.6×10^18 | 1×10^11 yrs |
| naïve matching | 2400 | 1.2×10^9 | 7,608 yrs |
| improved algorithm | 3 | 3.6×10^8 | 10 yrs |
| minimum required | 0.01 | 1.2×10^7 | 11.5 days |
| desired | 0.001 | 1.2×10^6 | 1.2 days |

based on 200 bp read length, 80x single-end WGS
![Page 12: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/12.jpg)
Searching with index
• Assume you’re searching for “genome” in an English dictionary
  – You don’t search every line on every page
  – You first find the page range of “g” in the dictionary
  – In the above range (of ‘g’), you find the page range of “ge”
  – In the above range (of ‘ge’), you find the page range of “gen”
  – ...
  – until you find “genome”
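The dictionary lookup described above can be sketched with Python's standard `bisect` module: each extra letter narrows the range found for the previous prefix. The word list is hypothetical, and using `"\uffff"` as an upper sentinel is an assumption that holds for ordinary text.

```python
from bisect import bisect_left


def prefix_range(sorted_words, prefix, lo, hi):
    """Return the half-open range [start, end) of words starting with prefix,
    searching only inside the range [lo, hi) found for the shorter prefix."""
    start = bisect_left(sorted_words, prefix, lo, hi)
    end = bisect_left(sorted_words, prefix + "\uffff", lo, hi)
    return start, end


def lookup(sorted_words, word):
    """Narrow the range one letter at a time: 'g', 'ge', 'gen', ..."""
    lo, hi = 0, len(sorted_words)
    for k in range(1, len(word) + 1):
        lo, hi = prefix_range(sorted_words, word[:k], lo, hi)
        if lo == hi:
            return None  # no word has this prefix
    return lo if sorted_words[lo] == word else None


# Hypothetical mini-dictionary
words = sorted(["apple", "gel", "gene", "genome", "genus", "giraffe", "zebra"])
index_of_genome = lookup(words, "genome")
```

Each step is a binary search over an already narrowed range, so the work depends on the word length rather than on the size of the dictionary.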
![Page 13: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/13.jpg)
How can we build an index for the genome?
![Page 14: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/14.jpg)
Burrows-Wheeler Transform
![Page 15: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/15.jpg)
Burrows-Wheeler Transformation
BANANA
![Page 16: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/16.jpg)
Burrows-Wheeler Transformation
BANANA$
(‘$’ is the lexicographically smallest symbol)
![Page 17: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/17.jpg)
Burrows-Wheeler Transformation
BANANA$
ANANA$B
![Page 18: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/18.jpg)
Burrows-Wheeler Transformation
BANANA$
ANANA$B
NANA$BA
![Page 19: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/19.jpg)
Burrows-Wheeler Transformation
BANANA$
ANANA$B
NANA$BA
ANA$BAN
NA$BANA
A$BANAN
$BANANA
![Page 20: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/20.jpg)
Burrows-Wheeler Transformation
0 BANANA$
1 ANANA$B
2 NANA$BA
3 ANA$BAN
4 NA$BANA
5 A$BANAN
6 $BANANA
![Page 21: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/21.jpg)
Burrows-Wheeler Transformation
Sort the rotations (columns: sorted row, original rotation index, rotation):
0 6 $BANANA
1 5 A$BANAN
2 3 ANA$BAN
3 1 ANANA$B
4 0 BANANA$
5 4 NA$BANA
6 2 NANA$BA
![Page 22: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/22.jpg)
Burrows-Wheeler Transformation
Last column of the sorted rotations: ANNB$AA
![Page 23: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/23.jpg)
Burrows-Wheeler Transformation
BWT(“BANANA$”) = “ANNB$AA”
![Page 24: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/24.jpg)
Burrows-Wheeler Transformation
BWT(“BANANA$”) = “ANNB$AA”
1. BWT just changes the order of the characters in the string
2. BWT tends to collect similar characters together
3. With only the transformed string, we can easily get the original string
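The rotate, sort, take-the-last-column procedure from these slides, and property 3 (recovering the original), can be sketched as follows. This is a naive illustration that materialises every rotation; it is not how real aligners build the transform, and the function names are illustrative.

```python
def bwt(text):
    """Burrows-Wheeler transform of a text that ends with the sentinel '$'."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column):
    """Rebuild the original text from the transformed string alone.

    Repeatedly prepend the last column to the table and re-sort; after
    len(last_column) rounds the table holds all sorted rotations, and the
    original text is the row that ends with '$'.
    """
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(ch + row for ch, row in zip(last_column, table))
    return next(row for row in table if row.endswith("$"))


transformed = bwt("BANANA$")
restored = inverse_bwt(transformed)
```

The sort works as on the slides because `'$'` compares lower than any uppercase letter in Python's default string ordering.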
![Page 25: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/25.jpg)
LF Search
Sorted rotations (columns: sorted row, original rotation index, rotation):
0 6 $BANANA
1 5 A$BANAN
2 3 ANA$BAN
3 1 ANANA$B
4 0 BANANA$
5 4 NA$BANA
6 2 NANA$BA
Question: Find “NAN” in BANANA
![Page 26: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/26.jpg)
![Page 27: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/27.jpg)
LF Search
Question: Find “NAN” in BANANA
NAN (matched from the last character backward)
The range of rows that start with “N” can be calculated from:
• the number of symbols that are lexicographically less than ‘N’
  – to determine the start point
• the number of ‘N’
  – to determine the end point
![Page 28: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/28.jpg)
LF Search
Question: Find “NAN” in BANANA
NAN
The range of rows that start with “N” can be calculated from:
• the number of symbols that are lexicographically less than ‘N’
  – to determine the start point
  – = 5
• the number of ‘N’
  – to determine the end point
  – = 2
So the rows that start with “N” are rows 5 to 6.
![Page 29: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/29.jpg)
![Page 30: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/30.jpg)
LF Search
Question: Find “NAN” in BANANA
NAN
The range of rows that start with “AN” can be calculated from:
• the number of symbols that are lexicographically less than ‘A’
  – to determine the start point
  – = 1
• the number of ‘A’
  – to determine the end point
  – = 3
![Page 31: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/31.jpg)
LF Search
This is a range for ‘A’ not ‘AN’!!
![Page 32: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/32.jpg)
![Page 33: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/33.jpg)
LF Search
Question: Find “NAN” in BANANA
Count of ‘A’ in the last column before the previous start point (row 5) = 1
![Page 34: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/34.jpg)
LF Search
Question: Find “NAN” in BANANA
NAN
The range of rows that start with “AN” can be calculated from:
• the number of symbols that are lexicographically less than ‘A’ + the number of ‘A’ in the last column before the start point
  – to determine the start point
  – = 1 + 1 = 2
• the number of ‘A’ in the last column before the end point
  – to determine the end point
  – = 3
Count of ‘A’ before the start point = 1: one row starts with “Ax”, which is not “AN” and sorts before “AN”, so it is skipped.
![Page 35: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/35.jpg)
LF Search
Question: Find “NAN” in BANANA
NAN
The range of rows that start with “NAN” can be calculated from:
• the number of symbols that are lexicographically less than ‘N’ + the number of ‘N’ in the last column before the start point
  – to determine the start point
  – = 5 + 1 = 6
• the number of ‘N’ in the last column before the end point
  – to determine the end point
  – = 2
![Page 36: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/36.jpg)
LF Search
Question: Find “NAN” in BANANA
start = end = row 6 (NANA$BA)
Row 6 is row 2 of the original (unsorted) rotations
= number of rotations of the original string
= “NAN” exists at the 3rd position of “BANANA”
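The range narrowing walked through on the LF Search slides can be sketched as a backward search over the last column. Two things are assumptions of this sketch rather than the slides: it uses half-open ranges `[lo, hi)` where the slides use inclusive start and end rows, and it counts characters with `str.count` on every step, where real aligners use precomputed occurrence tables. Function and variable names are illustrative.

```python
def build_index(text):
    """Return (order, last_column, less_than) for a text ending with '$'.

    order[r]     = original rotation index of sorted row r
    last_column  = the BWT string
    less_than[c] = number of symbols in text lexicographically less than c
    """
    order = sorted(range(len(text)), key=lambda i: text[i:] + text[:i])
    last_column = "".join(text[i - 1] for i in order)
    less_than, total = {}, 0
    for symbol in sorted(set(text)):
        less_than[symbol] = total
        total += text.count(symbol)
    return order, last_column, less_than


def backward_search(query, order, last_column, less_than):
    """Return sorted 0-based positions of query, matching from its last character."""
    lo, hi = 0, len(last_column)
    for symbol in reversed(query):
        if symbol not in less_than:
            return []
        # symbols less than `symbol` + occurrences of `symbol` before each bound
        lo = less_than[symbol] + last_column[:lo].count(symbol)
        hi = less_than[symbol] + last_column[:hi].count(symbol)
        if lo >= hi:
            return []
    return sorted(order[lo:hi])


order, last_column, less_than = build_index("BANANA$")
hits = backward_search("NAN", order, last_column, less_than)
```

For “NAN” the half-open ranges go `[5, 7)`, `[2, 4)`, `[6, 7)`, matching the slide's rows 5 to 6, 2 to 3, and 6; row 6 maps back to 0-based offset 2, the 3rd position of “BANANA”.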
![Page 37: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/37.jpg)
Genome Informatics I (2015 Spring)
Genome query
imported from Mike Schatz’s slides: http://schatzlab.cshl.edu/teaching/2010/Lecture%202%20-%20Sequence%20Alignment.pdf
![Page 38: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/38.jpg)
![Page 39: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/39.jpg)
![Page 40: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/40.jpg)
![Page 41: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/41.jpg)
![Page 42: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/42.jpg)
![Page 43: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/43.jpg)
![Page 44: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/44.jpg)
Inexact matching
T G A C G T G T G A T T C A A A A A A G C
G A T T G A A A
When an exact match does not exist:
• continue with the other possible candidates (G -> A, C, T) and increase the mismatch count
• if another mismatch occurs, branch out again
• so the allowed edit distance is critical to alignment speed
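To see why the allowed number of mismatches drives the cost, the sketch below counts how many candidate strings the branching has to consider and runs a simple mismatch-tolerant scan on the slide's example. It assumes substitutions only (no insertions or deletions), and it scans the reference directly rather than branching inside an index; the function names are illustrative.

```python
from math import comb


def candidates_within(length, max_mismatches):
    """Number of DNA strings within max_mismatches substitutions of a query.

    Choose which positions differ, then one of 3 other bases at each.
    """
    return sum(comb(length, k) * 3 ** k for k in range(max_mismatches + 1))


def find_with_mismatches(reference, query, max_mismatches):
    """Return 0-based start positions where query matches with few enough mismatches."""
    hits = []
    for start in range(len(reference) - len(query) + 1):
        window = reference[start:start + len(query)]
        mismatches = sum(1 for a, b in zip(window, query) if a != b)
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


reference = "TGACGTGTGATTCAAAAAAGC"
exact_hits = find_with_mismatches(reference, "GATTGAAA", 0)
one_mismatch_hits = find_with_mismatches(reference, "GATTGAAA", 1)
```

For the 8-base query, one allowed mismatch already means 25 candidate strings and two mean 277, so the search space grows quickly with the tolerated edit distance.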
![Page 45: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/45.jpg)
Goal achieved

| | time per 1 read (sec) | time per 80x WGS (sec) | is equal to |
|---|---|---|---|
| eyeballing | 3×10^9 | 3.6×10^18 | 1×10^11 yrs |
| naïve matching | 2400 | 1.2×10^9 | 7,608 yrs |
| improved algorithm | 3 | 3.6×10^8 | 10 yrs |
| minimum required | 0.01 | 1.2×10^7 | 11.5 days |
| desired | 0.001 | 1.2×10^6 | 1.2 days |
![Page 46: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/46.jpg)
VARIANT CALLING – SNV CALLING
SNV calling
![Page 47: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/47.jpg)
Detailed View
one read = one DNA fragment aligned to a specific genomic region
= observation of our sample in this region (1 time)
A genome region
![Page 48: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/48.jpg)
Detailed View
A | AAAACAAAAC
A certain genomic position (in bp)
![Page 49: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/49.jpg)
Detailed View
A | AAAACAAAAC
A certain genomic position (in bp)
• A (left of the bar): reference allele
• 1st base right of the bar: observation of our sample at this position from read 1
• 2nd base: observation of our sample at this position from read 2
• 10th base: observation of our sample at this position from read 10
![Page 50: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/50.jpg)
Why multiple observations?
• Observations contain errors
  – errors from machine
    • basecall error
  – errors from mapping
    • mapping error
  – errors from others
    • library prep error
• With accuracy of 99%...
  – 1% error from whole region
  – leads to
    • ~30 million false SNPs for whole genome
    • ~500k false SNPs for whole exome
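The two figures above follow from multiplying the number of inspected sites by the error rate. In the sketch below, the 3-billion-base genome size comes from the earlier slides; the 50 Mb exome size is an assumption chosen to be consistent with the “1~2% of the genome” mentioned in Box 1.

```python
def expected_false_calls(n_sites, accuracy):
    """Expected number of wrong calls when each site is observed once."""
    return n_sites * (1 - accuracy)


GENOME_SITES = 3_000_000_000  # ~3 billion bases, as on the earlier slides
EXOME_SITES = 50_000_000      # assumed ~50 Mb exome (not stated on the slide)

genome_false = expected_false_calls(GENOME_SITES, 0.99)
exome_false = expected_false_calls(EXOME_SITES, 0.99)
```

With a single observation per site, the false calls would far outnumber real variants, which is the motivation for reading each position many times.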
![Page 51: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/51.jpg)
Human diploid genome
[Figure: read pileups at one site for three genotypes. Homozygous reference (G/G): reads show G, with an occasional A from sequencing error / mapping error. Heterozygous alternative (G/A): reads show a mix of G and A. Homozygous alternative (A/A): reads show A. Somatic mutations are also marked.]
![Page 52: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/52.jpg)
Allele fraction distribution (binomial)
Normal approximation of B(100, 0.5):
$\Pr(\mu - 3\sigma \le x \le \mu + 3\sigma) \approx 0.9973$
$\Pr(35 \le x \le 65) \approx 0.9973$
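The interval 35 to 65 is the mean plus or minus three standard deviations of B(100, 0.5). The sketch below derives those two numbers and also sums the exact binomial probability of the same interval, using only the standard library; the function name is illustrative.

```python
from math import comb, sqrt


def binomial_interval_probability(n, p, low, high):
    """Exact Pr(low <= X <= high) for X ~ B(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(low, high + 1))


n, p = 100, 0.5                # 100 reads at a heterozygous site
mu = n * p                     # mean number of reads carrying one allele
sigma = sqrt(n * p * (1 - p))  # standard deviation
low, high = int(mu - 3 * sigma), int(mu + 3 * sigma)
exact = binomial_interval_probability(n, p, low, high)
```

The exact binomial sum comes out slightly above the 0.9973 given by the normal approximation, since the discrete interval includes both endpoints.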
![Page 53: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/53.jpg)
Allele fraction distribution (binomial)
[Figure: the read pileups for the three genotypes (mostly G reads; mixed G and A reads; all A reads), as on the “Human diploid genome” slide]
![Page 54: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/54.jpg)
Inferring mutations
GAGAGGGGGAAAGAGA (observation of donor genome)
reference allele
Probability of observing “G” at the site of “G” (‘A’ = reference allele, ‘B’ = alternative allele):
• True genotype = “AA” and no sequencing error
• True genotype = “AB” and
  – Read was generated from ‘A’ allele and no sequencing error
  – Read was generated from ‘B’ allele and sequencing error and ‘A’ was generated by chance
• True genotype = “BB” and sequencing error
![Page 55: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/55.jpg)
Inferring mutations
GAGAGGGGGAAAGAGA (observation of donor genome)
reference allele
Probability of observing “A” at the site of “G” (‘A’ in the genotypes = reference allele, ‘B’ = alternative allele):
• True genotype = “AA” and sequencing error: P(e)
• True genotype = “AB” and
  – Read was generated from ‘A’ allele and sequencing error and ‘B’ was generated by chance
  – Read was generated from ‘B’ allele and no sequencing error
• True genotype = “BB” and no sequencing error
![Page 56: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/56.jpg)
Genotype determination
• L(g=AA|D): likelihood that the genotype is wild-type given the observation!
• L(g=AB|D), L(g=BB|D): likelihood that the genotype is mutant given the observation!
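The three likelihoods can be computed with a simple per-read model built from the cases listed on the “Inferring mutations” slides. The sketch below is one possible formulation, not the method of any particular tool: it assumes independent reads, a fixed error rate of 0.01, and that an erroneous read shows each of the three other bases with equal chance. It also treats the whole string from the slide as observed bases, which is an assumption.

```python
from math import log


def genotype_log_likelihoods(bases, ref, alt, error=0.01):
    """Return log L(g | D) for g in AA, AB, BB (A = reference, B = alternative)."""
    wrong = error / 3  # chance that an error produces one specific other base
    het = 0.5 * (1 - error) + 0.5 * wrong  # either allele can be observed at a het site
    p_ref = {"AA": 1 - error, "AB": het, "BB": wrong}
    p_alt = {"AA": wrong, "AB": het, "BB": 1 - error}
    totals = {"AA": 0.0, "AB": 0.0, "BB": 0.0}
    for base in bases:
        for genotype in totals:
            if base == ref:
                totals[genotype] += log(p_ref[genotype])
            elif base == alt:
                totals[genotype] += log(p_alt[genotype])
            else:
                totals[genotype] += log(wrong)  # a third base is an error under any genotype
    return totals


observed = "GAGAGGGGGAAAGAGA"  # bases from the slide, reference G, alternative A
likelihoods = genotype_log_likelihoods(observed, ref="G", alt="A")
best = max(likelihoods, key=likelihoods.get)
```

With 9 G and 7 A reads, the heterozygous genotype has by far the highest likelihood; a pileup of only G reads would favour “AA”.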
![Page 57: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/57.jpg)
Tools
![Page 58: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/58.jpg)
SOMATIC MUTATIONS
![Page 59: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/59.jpg)
Germline vs. Somatic mutation
sample from non-disease site
sample from disease site
reference sequence (e.g. hg19)
• UnifiedGenotyper
• VarScan2
• SomaticSniper
• …
![Page 60: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/60.jpg)
Easy way to find somatic mutations
sample from non-disease site
sample from disease site
GN=AA
GT=AB
![Page 61: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/61.jpg)
Joint Probabilities
![Page 62: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/62.jpg)
Joint Probabilities
• P(GT=AB|GN=AA) ≠ P(GT=AB|GN=AB) ≠ P(GT=AB|GN=BB)
Tumor genotype is dependent on normal genotype!
G: Joint Genotype Matrix
![Page 63: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/63.jpg)
WHEN SAMPLE IS NOT PURE
63
![Page 64: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/64.jpg)
Heterogeneous Sample
[Figure: a heterogeneous sample in which Normal Cells (G/G) are mixed with Tumor Cells (some carrying an A allele); the pooled reads are mostly G with a few A]
64/123
![Page 65: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/65.jpg)
65/123
![Page 66: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/66.jpg)
66/123
![Page 67: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/67.jpg)
67/123
![Page 68: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/68.jpg)
Causes of low-frequency
• Sample contamination (e.g. stromal cells)
• Tumor heterogeneity
• Extreme environments
• Somatic mosaicism
68/123
![Page 69: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/69.jpg)
Heterogeneous Sample
[Figure: 15 reads covering one site, 13 with G and 2 with A]
“2/15: No mutation. Two ‘A’s are from sequencing errors”
VS
“2/15: Heterozygous somatic mutation!! The sample is certainly heterogeneous!”
69/123
![Page 70: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/70.jpg)
Heterogeneous Sample
“How do we know this?”
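One hedged way to weigh the two readings of "2/15" is to ask how often sequencing error alone would produce two or more A reads. The sketch assumes independent errors and a 1% per-base error rate toward A; both are assumptions, not values from the slides.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - below


# Chance of seeing 2 or more erroneous A reads among 15 reads.
p_error_only = prob_at_least(2, 15, 0.01)
print(f"P(>=2 A reads from error alone) = {p_error_only:.4f}")
```

A small tail probability argues against the pure-error explanation, but at this depth the answer is sensitive to the assumed error rate, which is why an estimate of sample purity is needed as well.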
70/123
![Page 71: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/71.jpg)
Estimating Cellularity
• It is “easy” only if we already know where to look (sites where the disease genotype is AB or BB)
But how do we know the genotype (even without knowing α)?
1. Use SNP array - ONCOSNP (Yau et al, Genome Biol, 2009), Absolute (Carter et al, Nature Biotech, 2012)
2. SNP Calling - Snyder et al, PNAS, 2010, PurityEst (Su et al, Bioinformatics, 2012)
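A sketch of the "easy" case, where the informative sites are already known. It assumes that α denotes the fraction of normal (control) cells in the disease sample, that the normal genotype is AA at every site used, and that copy number is 2 throughout; under those assumptions the expected B-allele frequency is (1-α)/2 for a disease genotype of AB and (1-α) for BB. The function name and input layout are hypothetical.

```python
def estimate_alpha(sites, disease_genotype="AB"):
    """Estimate alpha from pooled B-allele frequency (BAF).

    sites: list of (b_allele_reads, total_reads) at positions where the
    normal genotype is AA and the disease genotype is as given.
    """
    b_reads = sum(b for b, _ in sites)
    depth = sum(n for _, n in sites)
    if depth == 0:
        raise ValueError("no reads at the selected sites")
    baf = b_reads / depth
    # Expected BAF is (1 - alpha) / 2 for AB and (1 - alpha) for BB.
    scale = {"AB": 2.0, "BB": 1.0}[disease_genotype]
    alpha = 1.0 - scale * baf
    return min(1.0, max(0.0, alpha))


sites = [(10, 40), (12, 40), (8, 40)]
print(estimate_alpha(sites))
```

The circularity is visible in the signature: the estimate requires the disease genotype, which is what the caller is trying to infer in the first place.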
71/75
![Page 72: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/72.jpg)
Accurate inference in Virmid
Estimate global within-individual contamination for accurate detection of somatic mutations
72/123
![Page 73: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/73.jpg)
Bias 1 - Loss of Reads (Virmid)
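[Figure: reads r1 and r2 aligned to the reference (ref) at sites g1 and g2; allele labels A, B, A, AB]

$x_a = p(\text{a read that passes } g_1 \text{ being unmapped}) = p(r_1 \text{ has } d+1 \text{ or more variants in the remaining sites})$

$x_b = p(\text{a read that passes } g_2 \text{ being unmapped}) = p(r_2 \text{ has } d \text{ or more variants in the remaining sites})$

$$x_a = 1 - \sum_{i=0}^{d} \binom{l-1}{i} p^i (1-p)^{l-1-i}, \qquad x_b = 1 - \sum_{i=0}^{d-1} \binom{l-1}{i} p^i (1-p)^{l-1-i}$$

where $d$ = maximum edit distance, $l$ = read length, and $p$ = frequency of variation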
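The two tail probabilities can be evaluated directly. This is a small sketch of the formulas as stated on the slide, not Virmid's own code; the function names and the example parameter values are assumptions.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - below


def unmapped_probabilities(d, l, p):
    """Return (x_a, x_b) for maximum edit distance d, read length l and
    per-site variation frequency p.

    x_a: d + 1 or more variants among the remaining l - 1 sites.
    x_b: d or more variants among the remaining l - 1 sites.
    """
    x_a = prob_at_least(d + 1, l - 1, p)
    x_b = prob_at_least(d, l - 1, p)
    return x_a, x_b


# Illustrative values only: 100 bp reads, up to 4 edits, p = 0.01.
x_a, x_b = unmapped_probabilities(d=4, l=100, p=0.01)
print(x_a, x_b)
```

Since x_b is never smaller than x_a, reads carrying the second allele are lost at least as often, which is the source of the bias.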
73/123
![Page 74: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/74.jpg)
Bias 2 - Loss of variants (Virmid)
[Figure labels: reads from normal (α); reads from disease (1-α); B-allele]
overestimate BAF
underestimate α
74/123
![Page 75: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/75.jpg)
Estimated α
underestimated α
overestimated α
75/123
![Page 76: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/76.jpg)
Calling low-fraction somatic mutations in Virmid
Kim S et al, Genome Biology 2013
76/123
![Page 77: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/77.jpg)
Low-frequency mutations in disease
Identification of de novo somatic mutations in AKT-MTOR-PIK3CA in hemimegalencephaly
Lee J et al, Nature Genetics, 2012
77/123
![Page 78: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/78.jpg)
Low-frequency mutations in disease
Identification of MTOR driver mutations in focal cortical dysplasia
Lim J et al, Nature Medicine 2015
78/123
![Page 79: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/79.jpg)
COPY NUMBER VARIATION (CNV)
79
![Page 80: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/80.jpg)
Copy Number Variation
Changes in copy number of a large DNA segment
- usually in terms of genes
- e.g. HER2 amplification
Types of CNVs
- Copy number gain (CN > 2):
  - Increase of copy number due to genomic rearrangement like insertion/duplication
- Copy number loss (CN < 2):
  - Decrease of copy number due to genomic rearrangement like deletion
Copy number aberration (CNA)
- refers to CNV particularly when the events are associated with disease phenotype
![Page 81: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/81.jpg)
Comparative Genomic Hybridization (CGH)
500kb-1500kb fragment for optimal hybridization
![Page 82: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/82.jpg)
Array CGH
![Page 83: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/83.jpg)
Resolution
![Page 84: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/84.jpg)
Benefits of NGS-based CNV detection
• High resolution (< 50 bp) in size
• Data reuse (multi-purpose)
– One NGS (whole-genome) sequencing run can be used for SNV, CNV, and SV detection
• Can be improved with additional NGS information
– Discordant reads in paired-end sequencing
![Page 85: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/85.jpg)
Inferring CNVs from NGS
• Principle:
– Samples with copy number gain (or loss) will generate more (or fewer) reads in the region
[Figure: reads over a gene at 3 copies (gain), 2 copies (normal), and 1 copy (loss)]
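The principle reduces to a depth ratio. The sketch below assumes a diploid baseline and an already-known expected depth for the region; the function name and the example depths are hypothetical.

```python
def estimate_copy_number(observed_depth, expected_diploid_depth, ploidy=2):
    """Scale the baseline ploidy by the observed-to-expected depth ratio."""
    if expected_diploid_depth <= 0:
        raise ValueError("expected depth must be positive")
    return ploidy * observed_depth / expected_diploid_depth


# With 30x expected for two copies: 45x suggests 3 copies, 15x suggests 1.
for depth in (45, 30, 15):
    print(depth, estimate_copy_number(depth, 30))
```

In real data the ratio is noisy, so it is interpreted through a statistical model rather than read off directly.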
![Page 86: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/86.jpg)
Genome Informatics I (2015 Spring)
The signal
[Figure: reads mapped to reference for 3 copies (gain), 2 copies (normal), and 1 copy (loss)]
![Page 87: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/87.jpg)
The signal
[Figure: reads mapped to reference for 3 copies (gain), 2 copies (normal), and 1 copy (loss)]
Catching these needs a systematic approach!
![Page 88: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/88.jpg)
Catching the signal
• Problems
– Read depth is not uniform even without copy number changes
  • GC bias
  • Mapping bias in repeat regions
  • Natural variance (Poisson distribution)
Poisson distribution:
- Gives the probability of a given number of events occurring in a fixed interval of time and/or space, when events occur independently at a constant average rate.
Example:
- You receive 120 phone calls a day. What is the best way to describe the number of phone calls in an hour?
- Similarly, you generated 100,000,000 NGS reads from a whole genome. How many reads are expected within chr1:12781718-12782228?
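Both examples come down to choosing the Poisson rate λ for the interval. The sketch below works them through under stated assumptions: calls arrive evenly over 24 hours, reads are counted by start position and fall uniformly across the genome, and the genome length is taken as roughly 3.1 Gb (an assumed round figure, not a value from the slides).

```python
from math import exp, lgamma, log


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), computed in log space."""
    return exp(k * log(lam) - lam - lgamma(k + 1))


# Phone calls: 120 per day spread over 24 hours.
calls_per_hour = 120 / 24

# Reads: expected count inside chr1:12781718-12782228 (inclusive).
GENOME_LENGTH = 3.1e9  # assumed approximate human genome length
TOTAL_READS = 100_000_000
region_start, region_end = 12_781_718, 12_782_228
region_length = region_end - region_start + 1
reads_expected = TOTAL_READS * region_length / GENOME_LENGTH

print(f"lambda (calls/hour) = {calls_per_hour}")
print(f"lambda (reads in {region_length} bp region) = {reads_expected:.2f}")
print(f"P(exactly 5 calls in an hour) = {poisson_pmf(5, calls_per_hour):.3f}")
```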
![Page 89: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/89.jpg)
Significantly deviated read-depth
• Null hypothesis (H0):
– copy number of a given region is unchanged
– we assume the read-depth follows a Poisson distribution
• Alternative hypothesis (Ha):
– copy number of a given region is changed
• If H0 is right:
– The read-depth (calculated from the number of reads) within a specific genomic region does not deviate significantly from the Poisson expectation
• If the read-depth deviates too far to be explained by natural variance (Poisson distribution):
– Copy number has changed
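The test above can be sketched as a two-sided Poisson tail probability. The doubling of the smaller tail is one common convention and an assumption here, as are the function names and the example rate of 16.5 reads.

```python
from math import exp, lgamma, log


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), computed in log space."""
    return exp(k * log(lam) - lam - lgamma(k + 1))


def poisson_two_sided_p(k, lam):
    """Two-sided p-value for observing k reads when lam are expected."""
    lower = sum(poisson_pmf(i, lam) for i in range(k + 1))  # P(X <= k)
    upper = max(0.0, 1.0 - (lower - poisson_pmf(k, lam)))   # P(X >= k)
    return min(1.0, 2.0 * min(lower, upper))


expected = 16.5  # illustrative expected read count under H0
for observed in (3, 16, 40):
    p = poisson_two_sided_p(observed, expected)
    verdict = "reject H0" if p < 0.001 else "keep H0"
    print(observed, f"{p:.2e}", verdict)
```

Real read depth is usually over-dispersed relative to Poisson, so a threshold chosen this way tends to be optimistic.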
![Page 90: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/90.jpg)
Practically, we should consider
• Bias correction from sequence context (GC-bias, etc.)
• Event detection method
– Decide whether a significant rise (or drop) of read-depth constitutes an event
  • mean-shift technique (CNVnator, Abyzov et al 2013)
  • event-wise testing (Yoon et al, 2009)
  • paired-end signal (CNVer, Medvedev et al 2010)
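As an illustration of the bias-correction step, the sketch below rescales each window's depth by the median depth of windows with similar GC content. This is one simple median-based scheme, shown under the assumption that per-window depth and GC fraction are already available; it is not claimed to be the exact procedure of any tool listed above.

```python
from collections import defaultdict
from statistics import median


def gc_correct(depths, gc_fractions, gc_step=0.05):
    """Rescale window depths so every GC group shares the global median."""
    global_median = median(depths)
    groups = defaultdict(list)
    for depth, gc in zip(depths, gc_fractions):
        groups[round(gc / gc_step)].append(depth)

    corrected = []
    for depth, gc in zip(depths, gc_fractions):
        group_median = median(groups[round(gc / gc_step)])
        if group_median == 0:
            corrected.append(0.0)  # no usable signal in this GC group
        else:
            corrected.append(depth * global_median / group_median)
    return corrected


depths = [10, 10, 20, 20, 30, 30]
gc = [0.3, 0.3, 0.5, 0.5, 0.7, 0.7]
print(gc_correct(depths, gc))
```

In this toy input the depth differences are explained entirely by GC content, so correction flattens them; a true copy number change would remain as a deviation within its GC group.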
![Page 91: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/91.jpg)
CNVnator
91/123
![Page 92: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/92.jpg)
STRUCTURAL VARIATION (SV)
92
![Page 93: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/93.jpg)
Beyond the SNVs
![Page 94: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/94.jpg)
Beyond the SNVs
![Page 95: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/95.jpg)
Beyond the SNVs
TFE3-KHSRP Translocation in Renal Cell Carcinoma
![Page 96: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/96.jpg)
Structural Variations (SVs)
• Genomic rearrangements that affect >50bp of sequence
Alkan et al, Nat. Rev. Genetics 12, 363-376, 2011
![Page 97: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/97.jpg)
List of structural variations
![Page 98: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/98.jpg)
98/123
List of structural variations
![Page 99: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/99.jpg)
Paired-end sequencing
![Page 100: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/100.jpg)
Bix Seminar UCSD 100/123
Paired end reads for SV finding
Donor
Reference
Donor
Reference
![Page 101: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/101.jpg)
Methods for SV detection
• Read depth
o Assume a random distribution in mapping depth
o Significantly higher depth for duplicated regions
o Significantly reduced depth for deleted regions
• Read pair
o Assess the span and orientation of paired end reads
• Split Read
o Define breakpoints of SVs using split-sequence-read signature (broken alignment)
• Assembly
o Assemble and reconstruct the whole genome of sample DNA
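The read-pair method can be sketched as a classifier on mapped span and orientation. The three-standard-deviation cutoff, the function name and the labels are assumptions for illustration; real callers also cluster many supporting pairs before reporting an event.

```python
def classify_read_pair(span, proper_orientation, mean_insert, sd_insert, n_sd=3.0):
    """Label one read pair by how it maps to the reference.

    span: distance between the outer ends of the two mapped reads.
    proper_orientation: True if the reads face each other as expected.
    """
    if not proper_orientation:
        return "inversion"   # orientation differs from the library design
    if span > mean_insert + n_sd * sd_insert:
        return "deletion"    # donor lacks sequence present in the reference
    if span < mean_insert - n_sd * sd_insert:
        return "insertion"   # donor has extra sequence between the reads
    return "concordant"


# Library with mean insert size 300 bp and standard deviation 30 bp.
for span, proper in [(310, True), (900, True), (100, True), (300, False)]:
    print(span, proper, classify_read_pair(span, proper, 300, 30))
```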
![Page 102: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/102.jpg)
![Page 103: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/103.jpg)
![Page 104: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/104.jpg)
![Page 105: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/105.jpg)
![Page 106: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/106.jpg)
![Page 107: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/107.jpg)
![Page 108: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/108.jpg)
![Page 109: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/109.jpg)
Methods for Deletion Detection
![Page 110: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/110.jpg)
Methods for Deletion Detection
![Page 111: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/111.jpg)
Methods for Deletion Detection
![Page 112: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/112.jpg)
Methods for Deletion Detection
![Page 113: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/113.jpg)
Methods for Deletion Detection
![Page 114: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/114.jpg)
Methods for Deletion Detection
![Page 115: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/115.jpg)
Problem 1. Judgment of discordance
![Page 116: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/116.jpg)
Problem 1. Judgment of discordance
![Page 117: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/117.jpg)
Problem 2. Size of insertion
![Page 118: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/118.jpg)
Problem 2. Large indels
Novel Sequence Insertion
![Page 119: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/119.jpg)
Problem 2. Large Indels
Existing Sequence Insertion
![Page 120: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/120.jpg)
Problem 3. Nonspecific Mappings
![Page 121: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/121.jpg)
Problem 3. Nonspecific Mappings
![Page 122: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/122.jpg)
DISCUSSION
122/123
![Page 123: Next-generation sequencing: from basics to future diagnostics PART II: NGS analysis to find variant Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f275503460f94c3f698/html5/thumbnails/123.jpg)
THANK YOU
123/123