Post on 17-Jan-2016
Strings and Patterns in Perl
Ellen Walker
Bioinformatics
Hiram College
Finding a Fixed Pattern
• my $string = "ATAAGCTTATCG";
• my $pattern = "GCT";
• print index($string, $pattern);
• print index(reverse($string), $pattern);
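A minimal sketch of how index reports its result, using the slide's example string. The helper name describe_hit is hypothetical. index returns a zero-based position, or -1 when the pattern is absent; rindex searches from the right-hand end instead.

```perl
use strict;
use warnings;

# Hypothetical helper: turn index's return value into a readable message.
sub describe_hit {
    my ($string, $pattern) = @_;
    my $position = index($string, $pattern);   # zero-based, or -1 if absent
    return $position == -1
        ? "$pattern not found"
        : "$pattern found at $position";
}

my $string = "ATAAGCTTATCG";
print describe_hit($string, "GCT"), "\n";   # prints "GCT found at 4"
print describe_hit($string, "GGG"), "\n";   # prints "GGG not found"
print rindex($string, "T"), "\n";           # prints "9" (position of the last T)
```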
Finding multiple occurrences
• my $start = 0;
• print index($string, $pattern, $start);
• $start = index($string, $pattern, $start) + length($pattern);
• print index($string, $pattern, $start);
• $start = index($string, $pattern, $start) + length($pattern);
When do you stop searching?
Finding all (non-overlapping) occurrences
my $start = 0;
my $found;
$found = index($string, $pattern, $start);
while ($found > -1) {
    print "$pattern found at $found\n";
    $start = $found + length($pattern);
    $found = index($string, $pattern, $start);
}
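The loop above skips past each hit by length($pattern), which is why it only reports non-overlapping occurrences. As a sketch of the alternative, the hypothetical helper below advances by one character so that overlapping hits are also reported. The guard against an empty pattern is an added assumption, since an empty pattern would otherwise never return -1.

```perl
use strict;
use warnings;

# Hypothetical helper: return every position where $pattern occurs in
# $string, including occurrences that overlap each other.
sub find_all_overlapping {
    my ($string, $pattern) = @_;
    return () if $pattern eq '';    # avoid looping forever on an empty pattern
    my @positions;
    my $found = index($string, $pattern, 0);
    while ($found > -1) {
        push @positions, $found;
        # Advance by one character, not by length($pattern), so a hit
        # that starts inside the current one is still seen.
        $found = index($string, $pattern, $found + 1);
    }
    return @positions;
}

my @hits = find_all_overlapping("AAAA", "AA");
print "@hits\n";    # prints "0 1 2"
```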
Pattern Matching Operators
• Three types of operators (so far)
– Translation: tr
– Substitution: s (with the optional g modifier)
– Matching: m
• Used with =~ to bind the operator to a string (tr and s modify the string; m only tests it)
• Example:
– $complement =~ tr/ACGT/TGCA/;
Translation
• The tr operator takes two lists of characters, normally of the same length
• Every character of the target string that appears in the first list is replaced by the character at the same position in the second list
• This is destructive; save the old string before you use it!
Translation examples
• my $string = "actgTGCA";
• my $capitalizedString = $string;
• $capitalizedString =~ tr/actg/ACTG/;
• my $lowerCaseString = $string;
• $lowerCaseString =~ tr/ACTG/actg/;
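As a sketch of tr on sequence data, the two hypothetical helpers below build a reverse complement and count G/C bases. Each works on a copy of its argument, so the caller's string is not destroyed. The second relies on the fact that tr returns the number of characters it matched.

```perl
use strict;
use warnings;

# Hypothetical helper: reverse complement of a DNA string.
# The argument is copied into $dna, so the caller's string is untouched.
sub reverse_complement {
    my ($dna) = @_;
    my $rc = reverse $dna;           # scalar context: reverses the characters
    $rc =~ tr/ACGTacgt/TGCAtgca/;    # complement each base in place
    return $rc;
}

# Hypothetical helper: tr returns the number of characters it matched;
# with an empty replacement list it counts without changing the string.
sub gc_count {
    my ($dna) = @_;
    return ($dna =~ tr/GCgc//);
}

print reverse_complement("ATAAGCTTATCG"), "\n";   # prints "CGATAAGCTTAT"
print gc_count("ATAAGCTTATCG"), "\n";             # prints "4"
```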
Substitution
• Replaces an entire pattern with another pattern
• Patterns need not be the same length
• s changes only the first occurrence
• Add g to change all occurrences
• Example:
– $string =~ s/T/U/g;
Substitution Examples
• my $aminoAcids = $rna;
• $aminoAcids =~ s/AUG/Met/g;
• $aminoAcids =~ s/GGU/Gly/g;
• $aminoAcids =~ s/GGG/Gly/g;
A sequence of these substitutions will not really work to translate RNA (why not?)
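One reason the chain of substitutions can mislead is that s///g matches a codon anywhere in the string, with no notion of reading frame. The sketch below contrasts it with a frame-by-frame lookup. The codon table is deliberately partial (only the three codons from the slide), and translate_in_frame and the '???' placeholder are hypothetical names chosen for illustration. The // operator assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# Deliberately partial codon table: just the three codons from the slide.
# A real table has 64 entries.
my %codon_to_aa = (AUG => 'Met', GGU => 'Gly', GGG => 'Gly');

# Hypothetical helper: read the RNA three bases at a time from position 0
# and look each codon up; codons missing from the table become '???'.
sub translate_in_frame {
    my ($rna) = @_;
    my @amino_acids;
    for (my $i = 0; $i + 3 <= length($rna); $i += 3) {
        my $codon = substr($rna, $i, 3);
        push @amino_acids, $codon_to_aa{$codon} // '???';
    }
    return join('-', @amino_acids);
}

my $rna = "GAUGGU";                        # codons in frame: GAU, GGU
(my $by_substitution = $rna) =~ s/AUG/Met/g;
print "$by_substitution\n";                # prints "GMetGU" (AUG matched out of frame)
print translate_in_frame($rna), "\n";      # prints "???-Gly"
```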
Matching
• Not destructive to the string
• Tests whether the string matches the pattern (can be used as the condition in an if statement)
• Example:
if ($string =~ m/T/) {
    print "String is DNA, not RNA\n";
}
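A small sketch of matching used as a condition, extending the DNA/RNA test above. The helper name classify_sequence and the 'unknown' label are hypothetical. It also shows that the string is unchanged after the match.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a sequence by which bases it contains.
sub classify_sequence {
    my ($seq) = @_;
    return 'DNA' if $seq =~ m/T/;
    return 'RNA' if $seq =~ m/U/;
    return 'unknown';    # neither T nor U present
}

my $seq = "ATAAGCTTATCG";
print classify_sequence($seq), "\n";   # prints "DNA"
print "$seq\n";                        # prints "ATAAGCTTATCG": matching did not modify it
```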
Non-Exact Patterns
• Can be used with s or m
• Include
– wildcard characters
– multiple option matches
– capturing
Wildcard characters
. matches any single character (except a newline, by default)
* matches 0 or more repetitions of the preceding item
+ matches 1 or more repetitions of the preceding item
^ matches at the beginning of the string
$ matches at the end of the string
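The symbols above can be tried out with a short sketch. The helper name matches and the test strings are made up for illustration. Patterns are kept in single quotes so that $ is not treated as the start of a variable name.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if $string matches $pattern, 0 otherwise.
sub matches {
    my ($pattern, $string) = @_;
    return $string =~ m/$pattern/ ? 1 : 0;
}

print matches('A.G',  'ACG'),    "\n";   # prints "1": . stands for any one character
print matches('AT*G', 'AG'),     "\n";   # prints "1": * allows zero T's
print matches('AT+G', 'AG'),     "\n";   # prints "0": + needs at least one T
print matches('^ATG', 'CATG'),   "\n";   # prints "0": ATG is not at the beginning
print matches('TAA$', 'ATGTAA'), "\n";   # prints "1": TAA is at the end
```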
Multiple option matches
[actg] Matches one character in the set a, c, t, g
[^A-Z] Matches one character that is not A-Z
TAG|TGA|TAA Matches either TAG, TGA or TAA
– Example: my $Rpattern = 'A|G';
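A sketch of both forms in use: an alternation stored in a variable to locate a stop codon, and a negated character class to spot unexpected characters. The helper names are hypothetical, and the search ignores reading frame. $-[0] is Perl's built-in offset of the start of the last successful match.

```perl
use strict;
use warnings;

my $stop_pattern = 'TAG|TGA|TAA';

# Hypothetical helper: position of the leftmost stop codon (any frame),
# or -1 if there is none.
sub first_stop_position {
    my ($dna) = @_;
    return $dna =~ m/$stop_pattern/ ? $-[0] : -1;
}

# Hypothetical helper: 1 if the string contains anything other than A, C, G, T.
sub has_invalid_base {
    my ($dna) = @_;
    return $dna =~ m/[^ACGT]/ ? 1 : 0;
}

print first_stop_position("ATGCCCTAAGG"), "\n";   # prints "6"
print has_invalid_base("ATGNCC"), "\n";           # prints "1"
print has_invalid_base("ATGCC"), "\n";            # prints "0"
```

When an alternation held in a variable is combined with more pattern text, wrapping it in parentheses keeps the | from applying to the surrounding text as well.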
Capturing Patterns
• Any pattern in parentheses is “captured”
• The captured text can be recovered with \1, \2, etc. inside the pattern, and with $1, $2, etc. in the replacement or in later code
• Example:
• s/(...)(...)/$2$1/ switches the first two codons in the string.
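A minimal sketch of capturing in both a substitution and a match. The helper names are hypothetical, and each (...) is taken to mean exactly three characters, that is, one codon.

```perl
use strict;
use warnings;

# Hypothetical helper: swap the first two codons of a copy of the sequence.
sub swap_first_two_codons {
    my ($seq) = @_;
    $seq =~ s/(...)(...)/$2$1/;    # the leftmost match starts at position 0
    return $seq;
}

# Hypothetical helper: return the first codon, or '' if the string is too short.
sub first_codon {
    my ($seq) = @_;
    return $seq =~ m/^(...)/ ? $1 : '';
}

print swap_first_two_codons("ATGGCCTAA"), "\n";   # prints "GCCATGTAA"
print first_codon("ATGGCCTAA"), "\n";             # prints "ATG"
```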
Slides are not Complete!
• Pages 56-57 of the Perl book have an extensive list of regular expression examples.
Examples
• 6-mer palindrome: (.)(.)(.)\3\2\1
• Pair of nucleotides repeated at least three times: (.)(.).*\1\2.*\1\2
• Strings that end with GGA: GGA$
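The three patterns can be checked against a few made-up strings, as in the sketch below. The helper names are hypothetical. Note that the palindrome pattern finds strings that read the same backwards as text. It does not find reverse-complement palindromes such as AAGCTT.

```perl
use strict;
use warnings;

# Hypothetical helpers, one per pattern; each returns 1 or 0.
sub has_palindrome_6mer {
    my ($seq) = @_;
    return $seq =~ m/(.)(.)(.)\3\2\1/ ? 1 : 0;
}

sub has_pair_three_times {
    my ($seq) = @_;
    return $seq =~ m/(.)(.).*\1\2.*\1\2/ ? 1 : 0;
}

sub ends_with_gga {
    my ($seq) = @_;
    return $seq =~ m/GGA$/ ? 1 : 0;
}

print has_palindrome_6mer("TTGACCAGTT"), "\n";    # prints "1": contains GACCAG
print has_palindrome_6mer("AAGCTT"), "\n";        # prints "0": not a text palindrome
print has_pair_three_times("ACGGACTTAC"), "\n";   # prints "1": AC occurs three times
print has_pair_three_times("ACGGACTT"), "\n";     # prints "0": no pair occurs three times
print ends_with_gga("ATGGGA"), "\n";              # prints "1"
print ends_with_gga("GGAT"), "\n";                # prints "0"
```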