Post on 26-Dec-2015
Introduction to Bioinformatics
Dot Plots
• One of the simplest and oldest methods for sequence alignment
• Visualization of regions of similarity
– Assign one sequence to the horizontal axis
– Assign the other to the vertical axis
– Place a dot wherever the horizontal and vertical symbols match
– Diagonal lines indicate adjacent regions of identity
Simple Example
• Construct a simple dot plot for GCTGAA and GCTAA
– One sequence goes horizontally, the other vertically
– Mark boxes with matched horizontal and vertical symbols
– Look for diagonal(s)
• Alignment:
GCTGAA
GCT-AA
  G C T G A A
G * . . * . .
C . * . . . .
T . . * . . .
A . . . . * *
A . . . . * *
Another Example
• Construct a simple dot plot for
GCTAGTCAGATCTGACGCTA
GATGGTCACATCTGCCGC
• A long stretch of nearly identical residues is revealed, starting at the fifth nucleotide of each sequence (GTCA-ATCTG-CGC).
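The diagonal described here can be checked with a short Python sketch. It assumes the example string splits into a 20-nucleotide sequence and an 18-nucleotide sequence, which is the split that puts the similar stretch at the fifth nucleotide of both. The helper name `diagonal_trace` is hypothetical.

```python
def diagonal_trace(a, b, offset=0):
    """Spell out one diagonal of the dot plot of a against b.

    A match contributes the shared symbol, a mismatch contributes '-'.
    offset (assumed non-negative) shifts the start within a;
    offset=0 is the main diagonal.
    """
    return "".join(x if x == y else "-" for x, y in zip(a[offset:], b))


seq1 = "GCTAGTCAGATCTGACGCTA"  # assumed first sequence (20 nt)
seq2 = "GATGGTCACATCTGCCGC"    # assumed second sequence (18 nt)

trace = diagonal_trace(seq1, seq2)
print(trace)      # G-T-GTCA-ATCTG-CGC
print(trace[4:])  # GTCA-ATCTG-CGC, i.e. from the fifth nucleotide on
```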
Sliding Window and Cutoff
• Problem
– The plot becomes noisy when comparing large, similar sequences
• Solution
– Sliding window (size = w)
– Cutoff (value = v)
– Consider w nucleotides at a time
– When at least v of the w positions in a window match, place a dot at the position where the window starts
Example
• Same example with w = 4 and v = 3
• Compare to the previous plot. You make the call!
Worksheet
• w = 4 and v = 3
What else can it do (and how)?
• Gaps
• Inverse subsequence
• Repeats
• Palindrome
• Genome rearrangement
• Exon identification
• RNA structure prediction
• Nice tool for conceptualizing sequence-related algorithms
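As one illustration of the "how", repeats can be found by plotting a sequence against itself: the main diagonal is trivially full, and a repeat appears as a second diagonal run away from it. The sketch below is an assumption-laden example, not a standard tool: `repeat_runs`, its `min_len` parameter, and the toy sequence are all invented here.

```python
def repeat_runs(seq, min_len=3):
    """Find off-diagonal runs in the self dot plot of seq.

    Returns (i, j, length) triples with i < j, meaning that
    seq[i:i+length] equals seq[j:j+length]. Only runs of at least
    min_len consecutive matches are reported.
    """
    n = len(seq)
    runs = []
    for offset in range(1, n):  # each diagonal above the main one
        i = 0
        while i + offset < n:
            length = 0
            while (i + length + offset < n
                   and seq[i + length] == seq[i + length + offset]):
                length += 1
            if length >= min_len:
                runs.append((i, i + offset, length))
            i += max(length, 1)  # jump past the run, or advance by one
    return runs


# Toy sequence (hypothetical): ACGT occurs at positions 0 and 6.
print(repeat_runs("ACGTTTACGT"))  # [(0, 6, 4)]
```

The same idea extends to inverse subsequences by plotting the sequence against its reverse, where the matching run appears on the opposite diagonal.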