Lecture 3. Topics in High-Throughput Sequencing (Identification of Genetic Variations)
The Chinese University of Hong Kong
CSCI5050 Bioinformatics and Computational Biology
CSCI5050 Bioinformatics and Computational Biology | Kevin Yip-cse-cuhk | Fall 2015 2
Lecture outline
1. Types of genetic variation
2. Single nucleotide variants and small insertions/deletions
3. Large insertions/deletions and translocations
4. Repeats and copy number variations
5. Inversions
Last update: 20-Sep-2015
TYPES OF GENETIC VARIATION
Part 1
Genetic “variation”
• Two main definitions:
1. Differences in DNA among different individuals in a population
2. Differences in DNA between an individual and a reference (focus of this lecture)
• Sometimes, it is easy to define the reference
– The human reference sequence (http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/ https://en.wikipedia.org/wiki/Human_Genome_Project)
– “Normal” genome (e.g., blood from the same cancer patient)
• Sometimes, it is not easy to define
– A’s insertion with respect to B is B’s deletion with respect to A
– Which one is more “normal”?
Types of genetic variation
• Single nucleotide variants (SNVs)
– Single nucleotide polymorphisms (SNPs) if found in >1% of individuals in a population
• Small insertions/deletions (indels)
– Several nucleotides long
• Structural variations (SVs)
– Larger variations
Some proposed definitions
Term | Definition
Structural variant (SV) | A genomic alteration (e.g., a CNV or an inversion) that involves segments of DNA >1kb
Copy number variant (CNV) | A duplication or deletion event involving >1kb of DNA
Duplicon | A duplicated genomic segment >1kb in length with >90% similarity between copies
Indel | Variation from insertion or deletion event involving <1kb of DNA
Intermediate-sized structural variant (ISV) | A structural variant that is ~8kb to 40kb in size. This can refer to a CNV or a balanced structural rearrangement (e.g., an inversion)
Low copy repeat (LCR) | Similar to segmental duplication
Multisite variant (MSV) | Complex polymorphic variation that is neither a PSV nor a SNP
Paralogous sequence variant (PSV) | Sequence difference between duplicated copies (paralogs)
Segmental duplication | Duplicated region ranging from 1kb upward with a sequence identity of >90%
– Interchromosomal | Duplications distributed among nonhomologous chromosomes
– Intrachromosomal | Duplications restricted to a single chromosome
Single nucleotide polymorphism (SNP) | Base substitution involving only a single nucleotide; ~10 million are thought to be present in the human genome at >1% frequency, leading to an average of one SNP difference per 1250 bases between randomly chosen individuals
Table source: Freeman et al., Genome Research 16(8):949-961, (2006)
More commonly used
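As a rough illustration of how the size thresholds in the table could be applied, the sketch below labels a variant from its reference and alternative alleles. The function name `classify_variant`, the "substitution" label for equal-length multi-base changes, and the treatment of events of exactly 1kb (which the table leaves undefined) are assumptions made for this example only.

```python
def classify_variant(ref: str, alt: str, sv_threshold: int = 1000) -> str:
    """Label a variant by size, loosely following the proposed definitions.

    Assumption: an event of exactly `sv_threshold` bases is treated as an SV,
    because the table only defines <1kb (indel) and >1kb (SV).
    """
    if len(ref) == 1 and len(alt) == 1:
        return "SNV" if ref != alt else "no variant"
    size = abs(len(alt) - len(ref))  # number of inserted or deleted bases
    if size == 0:
        return "substitution"  # equal-length multi-base change (label is hypothetical)
    return "indel" if size < sv_threshold else "SV"


print(classify_variant("G", "A"))               # single-base change
print(classify_variant("GTC", "G"))             # 2-base deletion
print(classify_variant("A", "A" + "T" * 1500))  # 1.5kb insertion
```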
Origin of genetic variations
• SNVs: Errors during DNA replication that survive the proof-reading and mismatch-repair mechanisms
Image credit: Wikipedia; Martin and D. Scharff, Nature Reviews Immunology 2(8):605-614, (2002)
Origin of genetic variations
• SVs: Various mechanisms
– FoSTeS: Fork stalling and template switching
– MEI: Mobile element insertion
– NAHR: Non-allelic homologous recombination
– NHEJ: Non-homologous end-joining
Image credit: Bickhart and Liu, Frontiers in Genetics 10.3389, (2014); Xing et al., Genome Research 19(9):1516-1526, (2009)
Origin of genetic variations
• SVs: More figures
Image credit: Gu et al., PathoGenetics 1(1):4, (2008)
Consequence of genetic variants
• Hitting genes:
Image source: http://www.nbs.csudh.edu/chemistry/faculty/nsturm/CHEMXL153/DNAMutationRepair.htm
Consequence of genetic variants
• Hitting genes:
– Synonymous (silent) mutation (no change in protein sequence)
• May still affect translational efficiency
– Nonsense mutation (premature stop codon)
– Read-through (removal of the stop codon)
– Missense mutation (change of one/a few amino acids)
– Frameshift (shifting the reading frame)
– Affecting splicing (removal/new acceptor site or donor site)
– Deletion of whole exon/gene
– Changing gene copy number
– Gene fusion
– ...
• Others (more difficult to determine):
– Disrupting protein binding sites
– Affecting gene regulation
– Affecting DNA 3D structure
– ...
• See “Effect prediction details” section of SnpEff manual (http://snpeff.sourceforge.net/SnpEff_manual.html) for more details
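The first few gene-level consequences listed above can be told apart by translating the affected codon before and after a single-base change. The sketch below is a minimal illustration that uses the standard genetic code; the function name `classify_codon_change` is hypothetical, and frameshifts, splicing effects and larger events are deliberately out of scope.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Classify a substitution within one codon by its effect on the protein."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"       # premature stop codon
    if ref_aa == "*":
        return "read-through"   # stop codon removed
    return "missense"


for ref, alt in [("GAA", "GAG"), ("TGG", "TGA"), ("TAA", "CAA"), ("GAG", "GTG")]:
    print(ref, alt, classify_codon_change(ref, alt))
```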
Using NGS to identify genetic variations
• General steps:
1. Align sequencing reads to reference
OR
Construct sequence assembly from sequencing reads
2. Look for differences
• The alignment strategy only works when accurate and efficient read alignment is possible.
– Cannot detect sequences that are completely absent from the reference
• The assembly strategy only works for genomic regions that can be accurately assembled.
• In both strategies, it is also required to distinguish between sequencing errors/biases and variants.
DNA-seq vs. RNA-seq in calling variants
• Using DNA to identify genetic variants could identify variants not functionally significant
– Example: Fused gene due to translocation not actually expressed
• Using RNA to identify genetic variants could falsely treat post-transcriptional modifications as genetic variants
– Example: RNA editing
• In general, good to have support from both DNA and RNA data
SINGLE NUCLEOTIDE VARIANTS AND SMALL INSERTIONS/DELETIONS
Part 2
A typical pipeline
• The Genome Analysis Toolkit (GATK) workflow for calling variants in RNA-seq data (similar for DNA-seq)
Image credit: Broad Institute, https://www.broadinstitute.org/gatk/guide/tagged?tag=rnaseq
More details of pipeline
• The Genome Analysis Toolkit (GATK) workflow for calling variants in RNA-seq data
Image credit: Broad Institute, https://www.broadinstitute.org/gatk/guide/tagged?tag=rnaseq
Read re-alignment
• In standard sequence alignment, each read is aligned to reference independently.
• To discover indels accurately, re-alignment by combining information from multiple reads is recommended.
– Usually fix mis-alignments at read ends
• Example:
Reference: CGACCGT
Read 1: ACCAGT (more likely to be one insertion than two SNVs)
Read 2: CGACCA (not sure whether it is insertion or SNV by itself, more likely to be an insertion after considering read 1)
Read re-alignment
Figure panels: Before re-alignment; After re-alignment
Image credit: DePristo et al., Nature Genetics 43(5):491-498, (2011)
Re-calibration of base quality scores
1. Assuming the observed quality score is affected by:
– Actual quality score
– Machine cycle (i.e., base position on the read)
– Di-nucleotide context (the base itself and the one before)
2. Estimating the weight of each factor using mismatches at loci not known to vary in the dbSNP database of genetic variants
– All these mismatches are assumed to be due to sequencing errors
3. Adjusting the quality scores accordingly
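The three steps above can be sketched in a deliberately simplified form. Instead of estimating a separate weight for each factor, this example (an assumption, not the GATK procedure) pools observations by the full combination of reported quality, machine cycle and di-nucleotide context, and converts the mismatch rate of each bin into an empirical Phred score. The function name `recalibrate` and the +1/+2 pseudocounts are choices made for this sketch.

```python
from collections import defaultdict
from math import log10


def recalibrate(observations):
    """Return an empirical Phred quality per covariate combination.

    `observations` is an iterable of (reported_q, cycle, dinucleotide, is_mismatch)
    tuples, assumed to come only from positions not listed as known variants,
    so that every mismatch is treated as a sequencing error.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [mismatches, total bases]
    for reported_q, cycle, dinuc, is_mismatch in observations:
        entry = counts[(reported_q, cycle, dinuc)]
        entry[0] += int(is_mismatch)
        entry[1] += 1
    # Pseudocounts (+1 / +2) avoid log10(0) when a bin has no mismatches.
    return {
        key: -10 * log10((mismatches + 1) / (total + 2))
        for key, (mismatches, total) in counts.items()
    }


# Toy data: bases reported as Q30 that mismatch about 1% of the time.
toy = [(30, 2, "GT", True)] * 9 + [(30, 2, "GT", False)] * 989
print(recalibrate(toy))  # empirical quality is near 20, lower than the reported 30
```

A bin whose empirical quality is well below the reported value suggests that the machine was over-confident for that cycle and context, which is the adjustment that step 3 refers to.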
Re-calibration of base quality scores
Image credit: DePristo et al., Nature Genetics 43(5):491-498, (2011)
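Calling SNVs
• Notations:
– D: data (all bases aligned to a position)
– D_i: the i-th aligned base (i.e., the base aligned to the position on the i-th read)
– G_j, G_k: genotypes
– H_{j1}, H_{j2}: alleles (haplotypes) of G_j
– ε_i: base calling error rate of the i-th aligned base
• Bayesian formulation:
$$\Pr(G_j \mid D) = \frac{\Pr(G_j)\,\Pr(D \mid G_j)}{\sum_{G_k} \Pr(G_k)\,\Pr(D \mid G_k)}$$
where the sum runs over the three candidate diploid genotypes G_k (the two homozygotes and the heterozygote).
$$\Pr(D \mid G_j) = \prod_i \left[ \frac{\Pr(D_i \mid H_{j1})}{2} + \frac{\Pr(D_i \mid H_{j2})}{2} \right]$$
$$\Pr(D_i \mid H_{j1}) = \begin{cases} 1 - \varepsilon_i & \text{if } D_i = H_{j1} \\ \varepsilon_i & \text{if } D_i \neq H_{j1} \end{cases}$$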
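To make the Bayesian formulation concrete, the following sketch evaluates the genotype posterior for a small pileup of bases. The function names, the uniform priors and the example bases are illustrative assumptions; a real caller would take priors from population data and error rates from the (re-calibrated) base qualities.

```python
from math import prod  # available in Python 3.8+


def base_likelihood(base, allele, eps):
    """Pr(D_i | H): 1 - eps if the read base matches the allele, eps otherwise."""
    return 1 - eps if base == allele else eps


def genotype_likelihood(bases, errors, genotype):
    """Pr(D | G_j): each read comes from either allele with probability 1/2."""
    h1, h2 = genotype
    return prod(
        0.5 * base_likelihood(b, h1, e) + 0.5 * base_likelihood(b, h2, e)
        for b, e in zip(bases, errors)
    )


def genotype_posteriors(bases, errors, priors):
    """Pr(G_j | D) for every genotype in `priors` via Bayes' rule."""
    joint = {g: p * genotype_likelihood(bases, errors, g) for g, p in priors.items()}
    total = sum(joint.values())
    return {g: value / total for g, value in joint.items()}


# Hypothetical pileup: two reads show A, two show G, all with 1% error rate.
priors = {"AA": 1 / 3, "AG": 1 / 3, "GG": 1 / 3}  # uniform priors are an assumption
posteriors = genotype_posteriors("AAGG", [0.01] * 4, priors)
print(max(posteriors, key=posteriors.get), posteriors)
```

With an even split of the two bases, the heterozygous genotype dominates because each homozygous genotype has to explain two of the four reads as sequencing errors.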
Calling indels
• For indels, Pr(D_i | H_{j1}) is computed based on a hidden Markov model:
– I_x, I_y: The two indel haplotypes
– Gap opening penalty
– Gap extension penalty
– p_{x_i, y_j}: likelihood of aligning x_i and y_j
– q_{x_i}: likelihood of aligning x_i and a gap
Image source: https://www.broadinstitute.org/gatk/events/slides/1307/GATKwh1-BP-5-Variant_calling.pdf
LARGE INSERTIONS/DELETIONS AND TRANSLOCATIONS
Part 3
Useful types of information
• Split reads: One single read aligned to two different locations on reference
– Precisely define break points
– Could be difficult to align
– Relatively rare
• Paired-end reads: The two reads in a mate pair aligned to the reference with an unexpected distance, or one read cannot be aligned
– More common than split reads
– Reads easier to align
– Cannot determine precise break points
– Could be hard to judge if it is an SV due to inexact insert size
• Read depth/alignment quality: Drop of read depth/alignment quality around break points due to difficulty of alignment, and lack of aligned reads in deleted regions
– Can be observed even in standard alignment pipelines
– The drop is not always clear
– Some drops could be due to other reasons
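The paired-end signal can be illustrated with a small classifier that compares the distance between the aligned mates on the reference with the expected insert size. This is a sketch under stated assumptions: the function name `classify_pair`, the three-standard-deviation cut-off and the example numbers are all hypothetical, and the cut-off is exactly where the "inexact insert size" problem shows up.

```python
def classify_pair(observed_span, expected_mean, expected_sd, k=3.0):
    """Flag a read pair whose span on the reference deviates from the library.

    observed_span: distance between the aligned mates on the reference,
                   or None if one mate could not be aligned.
    k:             number of standard deviations tolerated (an assumed cut-off).
    """
    if observed_span is None:
        return "one end unaligned"
    deviation = observed_span - expected_mean
    if abs(deviation) <= k * expected_sd:
        return "concordant"
    # Mates farther apart on the reference suggest sequence missing in the sample;
    # mates closer together suggest extra sequence in the sample.
    return "possible deletion" if deviation > 0 else "possible insertion"


# Hypothetical library with insert size 300 +/- 20 bases.
for span in (310, 900, 100, None):
    print(span, classify_pair(span, expected_mean=300, expected_sd=20))
```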
Useful types of information
Image credit: Keane et al., Frontiers in Genetics 10.3389, (2014)
Figure labels: Expected insert size; Distance of the aligned locations on the reference
Alignment strategies
• Split mapping
– Need to try different possible ways to split a read, or use specialized alignment algorithms
– If the split is too imbalanced, the shorter part may not be aligned (uniquely)
• Constructing junction library (also used in aligning RNA-seq reads), then aligning reads onto the putative junction sequences
– Need to first have a rough idea of the break points
– Need to try different possible junctions
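A brute-force version of split mapping can be sketched as follows: try every split point, require each part to match the reference exactly and uniquely, and require a gap between the two matches. The toy reference, the read, the `min_part` threshold and the function names are assumptions for illustration; practical aligners tolerate mismatches and use indexes rather than exhaustive search.

```python
def find_all(text, pattern):
    """Return every start index of `pattern` in `text` (overlaps allowed)."""
    hits, start = [], text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits


def split_map(reference, read, min_part=4):
    """Return (split, left_start, right_start) for each usable split of the read.

    A split is kept only if both parts match uniquely and the right part starts
    beyond the end of the left part (i.e., the read skips reference sequence).
    """
    results = []
    for split in range(min_part, len(read) - min_part + 1):
        left_hits = find_all(reference, read[:split])
        right_hits = find_all(reference, read[split:])
        if len(left_hits) == 1 and len(right_hits) == 1:
            if left_hits[0] + split < right_hits[0]:
                results.append((split, left_hits[0], right_hits[0]))
    return results


reference = "ACGAGATACTGACAGATTACTGATGCAGTA"  # toy reference
read = "CGAGATGCAGT"                          # toy read spanning a deletion
print(split_map(reference, read))
```

Setting `min_part` too low makes the shorter part match in many places, which is the imbalance problem noted above; several split points may also be equally valid when the sequence around the break points is similar.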
Junction library
• Suppose we have the following rough estimate of the break points of a deletion (e.g., based on alignment of paired-end reads):
A C G A G A T A C T G A C A G A T T A C T G A T G C A G T A
• Possible junctions:
A C G A G A T G A T G C A G T A
A C G A G A T A T G C A G T A
A C G A G A T T G C A G T A
A C G A G A T G C A G T A
A C G A G A G A T G C A G T A
A C G A G A A T G C A G T A
A C G A G A T G C A G T A
A C G A G A G C A G T A
A C G A G G A T G C A G T A
A C G A G A T G C A G T A
A C G A G T G C A G T A
A C G A G G C A G T A
A C G A G A T G C A G T A
A C G A A T G C A G T A
A C G A T G C A G T A
A C G A G C A G T A
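The list of possible junctions can be generated by pairing every candidate left break point with every candidate right break point. In the sketch below, the function name `junction_library` and the index conventions (left flank ends before index `l`, right flank starts at index `r`, both 0-based) are assumptions; the candidate positions are chosen to reproduce the 16 junctions of this example.

```python
def junction_library(reference, left_ends, right_starts):
    """Join each left flank reference[:l] to each right flank reference[r:]."""
    return [reference[:l] + reference[r:] for l in left_ends for r in right_starts]


reference = "ACGAGATACTGACAGATTACTGATGCAGTA"
left_ends = [7, 6, 5, 4]         # left flanks ACGAGAT, ACGAGA, ACGAG, ACGA
right_starts = [21, 22, 23, 24]  # right flanks GATGCAGTA, ATGCAGTA, TGCAGTA, GCAGTA

library = junction_library(reference, left_ends, right_starts)
for junction in library:
    print(" ".join(junction))

# Different break point pairs can yield the same junction sequence,
# so the library can be de-duplicated before aligning reads to it.
print(len(library), "junctions,", len(set(library)), "distinct")
```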
Real SVs vs. sequencing/alignment errors
• Real SVs are usually indicated by:
– Even coverage around break points (ladder)
– Good base quality and alignment scores
Real SVs vs. sequencing/alignment errors
• A good case:
Last update: 22-Sep-2015
Figure labels: Putative junction sequence; Break points; Paired-end reads; Split reads
Real SVs vs. sequencing/alignment errors
• A bad case (gray portions of reads are aligned perfectly; colored portions are mismatches, reads marked in dark red have unexpected insert sizes):
Figure labels: Break points; Putative junction sequence
Break point confusion
• SVs could be due to micro-homology at the break points:
Paternal: A C G A G A T A C T G A C A G A T T A C T G A T G C A G T A
Maternal: A C G A G A T A C T G A C A G A T T A C T G A T G C A G T A
Result: A C G A G A T G C A G T A
– Does the GAT come from the paternal or maternal copy?
• Does it matter?
• It matters more if we want to know what happens to the other ends of the breaks
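The ambiguity can be made explicit by listing every pair of break points that would produce the observed junction from the original sequence. This sketch treats the event as a simple deletion within one sequence; the function name `consistent_breakpoints` and the 0-based index convention (keep `reference[:l]` and `reference[r:]`) are assumptions.

```python
def consistent_breakpoints(reference, junction):
    """Return all (l, r) such that reference[:l] + reference[r:] == junction."""
    hits = []
    for l in range(len(junction) + 1):
        r = len(reference) - (len(junction) - l)  # right flank must fill the rest
        if r > l and reference[:l] == junction[:l] and reference[r:] == junction[l:]:
            hits.append((l, r))
    return hits


reference = "ACGAGATACTGACAGATTACTGATGCAGTA"
junction = "ACGAGATGCAGTA"
print(consistent_breakpoints(reference, junction))
```

All the reported pairs delete the same number of bases; they differ only in how much of the shared GAT is attributed to the left flank versus the right flank, which is why the sequence alone cannot place the break points more precisely.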
REPEATS AND COPY NUMBER VARIATIONS
Part 4
Copy number variation
• For a diploid organism, each cell contains two copies of the same chromosome.
– If a gene is unique, there are exactly two copies of it.
• Sometimes, the copy number is not 2:
– Paralogs (gene duplication – various mechanisms)
– Retro-transcription
– Aneuploidy (not exactly 2 copies of each chromosome)
• Whole genome
• Whole chromosome
Copy number variation
• In general, DNA regions can have a copy number other than 2 for many reasons
• Copy numbers can have significant consequences. For example,
– Haploinsufficiency (having only one copy cannot maintain function)
– Gene dosage (amount of transcripts/proteins)
– Complex phenotypic consequences (e.g., copy number of DUF1220 domain related to human brain size and diseases)
Smaller-scale repeats
• Genomes contain many types of repeats
– By size
• Tandem repeats: one immediately after another
– E.g., TTAGGG at telomeres: related to protection
• Short interspersed nuclear elements (SINEs)
– E.g., Alu elements: ~280bp, GC rich
• Long interspersed nuclear elements (LINEs)
– E.g., L1 elements: ~6-8kbp, AT rich
– By number of occurrences
– By mechanism: transposable elements (TEs)
• Retrotransposons (transcription → reverse transcription): copy and paste
• DNA transposons: cut and paste
• Some regions are defined as low complexity regions (LCRs) – regions with low information content
Identifying CNVs
• Useful information:
– For determining boundaries:
• Split reads
• Paired-end reads
• Loss of heterozygosity (LOH)
– For determining both boundaries and copy number:
• Read depth, relative to “normal”
– Could be hard to define the “normal” line
• B-allele frequency (BAF)
• Long reads, if long enough
LOH
• Typically, heterozygous variants are found throughout the genome
• A large region without heterozygous variants may indicate occurrence of CNV
– Note: Having only one copy leads to LOH, but LOH can also happen in regions with other copy numbers
BAF
• LOH only indicates regions with one allele completely disappeared
• B-allele frequency is a more general concept that asks for the count of reads that support the B allele (defined arbitrarily) as a ratio of the total number of reads aligned to the location (that support either the A or B allele)
– The concept was originally defined for microarray data
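For sequencing data, the definition above reduces to a ratio of read counts. The sketch below is a minimal illustration; the function name `b_allele_frequency` and the choice to return `None` for uncovered sites are assumptions.

```python
def b_allele_frequency(a_reads, b_reads):
    """BAF = reads supporting B / reads supporting either A or B."""
    total = a_reads + b_reads
    if total == 0:
        return None  # no informative reads at this site
    return b_reads / total


# Hypothetical read counts at four sites: (A-supporting, B-supporting)
for counts in [(10, 10), (30, 0), (20, 10), (0, 0)]:
    print(counts, b_allele_frequency(*counts))
```

A site with BAF near 0.5 looks heterozygous with balanced copies, values near 0 or 1 look homozygous (or reflect LOH), and intermediate values such as 1/3 hint at unequal copy numbers of the two alleles.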
BAF, LOH and LRR
• LRR: log2(observed signal / expected signal)
An illustration of log R Ratio (LRR) and B Allele Freq (BAF) values for the chromosome 15 q-arm of an individual. A normal chromosome region has three BAF genotype clusters, as represented as AA, AB, and BB genotypes in boxes, and with LRR values centered around zero. The copy-neutral LOH region has normal LRR values, but without the AB genotype cluster. The increased copy number for a CNV region can be detected based on an increased number of peaks in the BAF distribution, as well as increased LRR values. The patterns of LRR and BAF for different CNV regions, normal regions, and copy-neutral LOH regions are distinct from each other, thus the combination of LRR and BAF can be used to generate CNV calls.
Image credit: Wang et al., Genome Research 17(11):1665-1674, (2007)
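The patterns described in the caption can be reproduced with two small functions: one for LRR, and one listing the BAF values expected for a given total copy number when every allele combination is possible. Both function names are hypothetical, and the idealization that the signal is proportional to copy number is an assumption of this sketch.

```python
from fractions import Fraction
from math import log2


def log_r_ratio(observed_signal, expected_signal):
    """LRR = log2(observed signal / expected signal)."""
    return log2(observed_signal / expected_signal)


def expected_baf_clusters(copy_number):
    """Ideal BAF values k/n for k = 0..n copies of the B allele out of n."""
    if copy_number == 0:
        return []
    return [Fraction(k, copy_number) for k in range(copy_number + 1)]


# Assuming signal proportional to copy number, with 2 copies as the expectation.
for copies in (1, 2, 3):
    lrr = log_r_ratio(copies, 2)
    clusters = [str(f) for f in expected_baf_clusters(copies)]
    print(copies, round(lrr, 3), clusters)
```

Two copies give three BAF clusters and an LRR of zero, while three copies give four clusters and a positive LRR. A copy-neutral LOH region would keep LRR at zero but lose the middle cluster, which this simple enumeration does not model.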
INVERSIONS
Part 5
Balanced mutation
• Insertion, deletion and CNV result in copy number changes
• In contrast, translocations and inversions usually do not
– They are called “balanced mutations”
• Balanced mutations cannot be detected by checking read depth
Inversions: A closer look
• Suppose we have the following sequence:
– ACGCAT
• What would it look like if the CGCA part is inverted?
– AACGCT?
– AGCGTT?
– ATGCGT?
• Even with both strands sequenced and inversions, we do not try to align a 3’-5’ sequence with a 5’-3’ sequence
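The three candidate answers correspond to reversing the segment, complementing it, and reverse-complementing it. Because an inverted segment is still read in the 5’-3’ direction, now from what was the opposite strand, the reverse complement is the one that appears in the sequence. The sketch below computes all three; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def invert_segment(seq, start, end):
    """Replace seq[start:end] (0-based, end exclusive) by its reverse complement."""
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]


seq, start, end = "ACGCAT", 1, 5  # the CGCA part
segment = seq[start:end]
print("reverse only:     ", seq[:start] + segment[::-1] + seq[end:])
print("complement only:  ", seq[:start] + segment.translate(COMPLEMENT) + seq[end:])
print("reverse complement:", invert_segment(seq, start, end))
```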
Inversion: Strand and read orientation
Image credit: Okamura et al., BMC Genomics 8:160, (2007)
Reference
Sequenced DNA
Fragment 1 AACTTG
Alignments 1 AAC TTG
Fragment 2 AACGTT
Alignments 2 AAC TTG
Fragment 3 CTTTTG
Alignments 3 TTC TTG
Assuming perfect alignments
Fragment 4 CTTGTT
Alignments 4 TTCTTG
More on read orientations
• Some SVs are complex
Image credit: Medvedev et al., Nature Methods 6(11S):S13-S20, (2009); Pevzner, PNAS 100(13):7672-7677, (2003)
Even more on read orientations
• If a fragment is too long, one can circularize it, segment the circularized DNA again, and sequence the segment with the junction
Image source: Illumina Nextera technical note, http://www.illumina.com/documents/products/technotes/technote_nextera_matepair_data_processing.pdf
VCF files
• There is a file format defined for genetic variants called VCF (Variant Call Format).
– Specification available at http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41
– Two main sections: header and content
– Header provides basic information of the file, and defines content attributes and filters
– Each line in the content section represents one variant in one or more samples
An example
Example source: http://samtools.github.io/hts-specs/VCFv4.2.pdf
##fileformat=VCFv4.2
##fileDate=20090805
##source=myImputationProgramV3.1
##reference=file:///seq/references/1000GenomesPilot-NCBI36.fasta
##contig=<ID=20,length=62435964,assembly=B36,md5=f126cdf8a6e0c7f379d618ff66beb2da,species="Homo sapiens",taxonomy=x>
##phasing=partial
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
##FILTER=<ID=q10,Description="Quality below 10">
##FILTER=<ID=s50,Description="Less than 50% of samples have data">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003
20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.
20 17330 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3 0/0:41:3
20 1110696 rs6040355 A G,T 67 PASS NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2 2/2:35:4
20 1230237 . T . 47 PASS NS=3;DP=13;AA=T GT:GQ:DP:HQ 0|0:54:7:56,60 0|0:48:4:51,51 0/0:61:2
20 1234567 microsat1 GTC G,GTCT 50 PASS NS=3;DP=9;AA=G GT:GQ:DP 0/1:35:4 0/2:17:2 1/1:40:3
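To show how one content line maps onto the columns named in the `#CHROM` header, here is a minimal parser written with plain string operations. It is a sketch, not a replacement for a VCF library: the function name `parse_vcf_line` is hypothetical, values are left as strings, and splitting on any whitespace is an assumption that suits this transcript (real VCF files are tab-delimited).

```python
def parse_vcf_line(line, sample_names):
    """Parse one VCF content line into a dictionary of its columns."""
    fields = line.split()  # assumption: whitespace-separated columns
    chrom, pos, var_id, ref, alt, qual, filt, info, fmt = fields[:9]

    info_dict = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        info_dict[key] = value if value else True  # flags such as DB have no value

    format_keys = fmt.split(":")
    samples = {
        name: dict(zip(format_keys, column.split(":")))
        for name, column in zip(sample_names, fields[9:])
    }
    return {
        "chrom": chrom, "pos": int(pos), "id": var_id, "ref": ref,
        "alt": alt.split(","), "qual": qual, "filter": filt,
        "info": info_dict, "samples": samples,
    }


line = ("20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 "
        "GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.")
record = parse_vcf_line(line, ["NA00001", "NA00002", "NA00003"])
print(record["pos"], record["alt"], record["info"]["AF"])
print(record["samples"]["NA00002"])
```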
Final remarks
• Some types of genetic variation take more time and need more complex methods to detect → Detect the easy ones first
1. Use standard alignment results to:
• Detect SNVs and small indels
• Get rough information of large indels, translocations, CNVs and inversions
2. Use unaligned reads and additional procedures to determine detailed information of the SVs
Final remarks
• Some methods call genetic variants by combining the information from multiple samples.
– Consistency among samples
– Contrast among samples (e.g., tumor vs. non-tumor from the same patient – somatic variants) [lecture]
• To study the relationships among multiple variants, one may further construct haplotypes [project] or identify epistatic interactions among variants [project].
Summary
• Genetic variations
– SNVs
– Small indels
– SVs: Large indels, translocations, CNVs, inversions
• Methods for detecting genetic variants
– Split read
– Paired-end read
• Orientations
– Depth of coverage
– Allele ratios and frequencies