Strings and Patterns in Perl Ellen Walker Bioinformatics Hiram College.
![Page 1: Strings and Patterns in Perl Ellen Walker Bioinformatics Hiram College.](https://reader036.fdocuments.us/reader036/viewer/2022082820/5697bf8f1a28abf838c8d21f/html5/thumbnails/1.jpg)
Strings and Patterns in Perl
Ellen Walker
Bioinformatics
Hiram College
![Page 2: Strings and Patterns in Perl Ellen Walker Bioinformatics Hiram College.](https://reader036.fdocuments.us/reader036/viewer/2022082820/5697bf8f1a28abf838c8d21f/html5/thumbnails/2.jpg)
Finding a Fixed Pattern
• my $string = "ATAAGCTTATCG";
• my $pattern = "GCT";
• print index($string,$pattern);
• print index (reverse($string), $pattern);
![Page 3: Strings and Patterns in Perl Ellen Walker Bioinformatics Hiram College.](https://reader036.fdocuments.us/reader036/viewer/2022082820/5697bf8f1a28abf838c8d21f/html5/thumbnails/3.jpg)
Finding multiple occurrences
• my $start = 0;
• print index($string, $pattern, $start);
• $start = index($string, $pattern, $start) + length($pattern);
• print index($string, $pattern, $start);
• $start = index($string, $pattern, $start) + length($pattern);
When do you stop searching?
![Page 4: Strings and Patterns in Perl Ellen Walker Bioinformatics Hiram College.](https://reader036.fdocuments.us/reader036/viewer/2022082820/5697bf8f1a28abf838c8d21f/html5/thumbnails/4.jpg)
Finding all (non-overlapping) occurrences
my $start = 0;
my $found;
$found = index($string, $pattern, $start);
while ($found > -1) {
    print "$pattern found at $found\n";
$start = $found + length($pattern);
$found = index($string, $pattern, $start);
}
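The loop above can be wrapped in a reusable subroutine. This is a minimal sketch, not part of the original slides: the name find_all and its optional third argument (which allows overlapping hits by advancing one character at a time) are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the slides): return every position where
# $pattern occurs in $string. If $overlap is true, the search resumes one
# character after each hit instead of skipping past the whole match.
sub find_all {
    my ($string, $pattern, $overlap) = @_;
    return () if $pattern eq '';          # an empty pattern would never advance
    my @positions;
    my $step  = $overlap ? 1 : length($pattern);
    my $found = index($string, $pattern, 0);
    while ($found > -1) {                 # index returns -1 when nothing is found
        push @positions, $found;
        $found = index($string, $pattern, $found + $step);
    }
    return @positions;
}

my @hits = find_all("ATAAGCTTATCG", "AT");
print "AT found at: @hits\n";
```

Calling find_all("AAAA", "AA", 1) would also report the overlapping hit that the non-overlapping search skips.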
![Page 5: Strings and Patterns in Perl Ellen Walker Bioinformatics Hiram College.](https://reader036.fdocuments.us/reader036/viewer/2022082820/5697bf8f1a28abf838c8d21f/html5/thumbnails/5.jpg)
Pattern Matching Operators
• Three types of operators (so far)
– Translation: tr
– Substitution: s (add the g modifier to replace every occurrence)
– Matching: m
• Used with =~ to bind the operator to a string (tr and s modify it; m only tests it)
• Example:
– $complement =~ tr/ACGT/TGCA/;
![Page 6: Strings and Patterns in Perl Ellen Walker Bioinformatics Hiram College.](https://reader036.fdocuments.us/reader036/viewer/2022082820/5697bf8f1a28abf838c8d21f/html5/thumbnails/6.jpg)
Translation
• The tr operator takes two sequences of characters of the same length
• Every character of the target string that appears in the first sequence is changed to the character at the same position in the second sequence
• This is destructive; save the old string before you use it!
![Page 7: Strings and Patterns in Perl Ellen Walker Bioinformatics Hiram College.](https://reader036.fdocuments.us/reader036/viewer/2022082820/5697bf8f1a28abf838c8d21f/html5/thumbnails/7.jpg)
Translation examples
• my $string = "actgTGCA";
• my $capitalizedString = $string;
• $capitalizedString =~ tr/actg/ACTG/;
• my $lowerCaseString = $string;
• $lowerCaseString =~ tr/ACTG/actg/;
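As a further illustration, here is a small sketch of building a reverse complement with tr and of using tr to count characters. The variable names and the sample sequence are chosen for this example only.

```perl
use strict;
use warnings;

my $dna = "ATAAGCTTATCG";

# Copy first, because tr modifies the variable it is bound to.
my $complement = $dna;
$complement =~ tr/ACGT/TGCA/;

# reverse in scalar context reverses the characters of a string.
my $reverse_complement = scalar reverse $complement;

# tr returns the number of characters it matched. With an empty
# replacement list the string is left unchanged, so this only counts.
my $gc_count = ($dna =~ tr/GC//);

print "$complement\n$reverse_complement\n$gc_count\n";
```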
![Page 8: Strings and Patterns in Perl Ellen Walker Bioinformatics Hiram College.](https://reader036.fdocuments.us/reader036/viewer/2022082820/5697bf8f1a28abf838c8d21f/html5/thumbnails/8.jpg)
Substitution
• Replaces an entire pattern with another pattern
• Patterns need not be the same length
• s changes only the first occurrence
• Add g to change all occurrences
• Example:
– $string =~ s/T/U/g;
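The difference between s with and without g can be seen in a short sketch. The sample string and variable names here are illustrative assumptions, not taken from the slides.

```perl
use strict;
use warnings;

my $dna = "ATTGCT";

# Without g, only the leftmost T is replaced.
my $first_only = $dna;
my $n_first = ($first_only =~ s/T/U/);

# With g, every T is replaced; s returns the number of substitutions made.
my $all = $dna;
my $n_all = ($all =~ s/T/U/g);

print "$first_only ($n_first change), $all ($n_all changes)\n";
```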
![Page 9: Strings and Patterns in Perl Ellen Walker Bioinformatics Hiram College.](https://reader036.fdocuments.us/reader036/viewer/2022082820/5697bf8f1a28abf838c8d21f/html5/thumbnails/9.jpg)
Substitution Examples
• my $aminoAcids = $dna;
• $aminoAcids =~ s/AUG/Met/g;
• $aminoAcids =~ s/GGU/Gly/g;
• $aminoAcids =~ s/GGG/Gly/g;
A sequence of these substitutions will not really work to translate RNA (why not?)
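One reason is that global substitution ignores codon boundaries. The sketch below contrasts the substitution approach with a frame-aware lookup. The partial codon table, the sample sequence, and the '???' placeholder are assumptions made for this illustration; CAU (His) is added only to give the example an in-frame codon that overlaps an out-of-frame AUG.

```perl
use strict;
use warnings;

# Partial codon table: the three entries from the slide plus CAU (His).
my %codon_to_aa = (AUG => 'Met', GGU => 'Gly', GGG => 'Gly', CAU => 'His');

my $rna = "CAUGGUAUG";    # reading frame 0: CAU GGU AUG

# Substitution approach from the slide: each s///g ignores codon
# boundaries and rewrites the string that the next step sees.
my $naive = $rna;
$naive =~ s/AUG/Met/g;
$naive =~ s/GGU/Gly/g;
$naive =~ s/GGG/Gly/g;

# Frame-aware sketch: cut the string into consecutive triplets first,
# then look each one up in the table.
my $protein = '';
for my $codon ($rna =~ /(...)/g) {
    # '???' marks codons missing from this partial table.
    $protein .= exists $codon_to_aa{$codon} ? $codon_to_aa{$codon} : '???';
}

print "substitution: $naive\nframe-aware:  $protein\n";
```

Here the substitution approach finds the AUG that straddles the first two codons, so the real first codon (CAU) is destroyed and GGU is never seen.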
![Page 10: Strings and Patterns in Perl Ellen Walker Bioinformatics Hiram College.](https://reader036.fdocuments.us/reader036/viewer/2022082820/5697bf8f1a28abf838c8d21f/html5/thumbnails/10.jpg)
Matching
• Not destructive to the string
• Tests whether the string matches (can be used as the condition in an if statement)
• Example:
if ($string =~ m/T/) {
    print "String is DNA, not RNA\n";
}
![Page 11: Strings and Patterns in Perl Ellen Walker Bioinformatics Hiram College.](https://reader036.fdocuments.us/reader036/viewer/2022082820/5697bf8f1a28abf838c8d21f/html5/thumbnails/11.jpg)
Non-Exact Patterns
• Can be used with s or m
• Include
– wildcard characters
– multiple option matches
– capturing
![Page 12: Strings and Patterns in Perl Ellen Walker Bioinformatics Hiram College.](https://reader036.fdocuments.us/reader036/viewer/2022082820/5697bf8f1a28abf838c8d21f/html5/thumbnails/12.jpg)
Wildcard characters
. Matches any single character (except a newline, by default)
* Matches 0 or more repetitions of the preceding item
+ Matches 1 or more repetitions of the preceding item
^ Matches at the beginning of the string
$ Matches at the end of the string
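Each of these wildcard characters can be tried on a short sequence. This is a sketch with an illustrative sample string; the %matched hash and its keys are hypothetical names used only to collect the results.

```perl
use strict;
use warnings;

my $seq = "ATGAAATAG";

# Each entry records 1 if the pattern matched and 0 otherwise.
my %matched = (
    'A.G'   => ($seq =~ m/A.G/   ? 1 : 0),   # . stands for any one character (ATG)
    'TGC*A' => ($seq =~ m/TGC*A/ ? 1 : 0),   # C* allows zero Cs (TGA)
    'GA+T'  => ($seq =~ m/GA+T/  ? 1 : 0),   # A+ needs at least one A (GAAAT)
    '^ATG'  => ($seq =~ m/^ATG/  ? 1 : 0),   # ^ anchors at the beginning
    'TAG$'  => ($seq =~ m/TAG$/  ? 1 : 0),   # $ anchors at the end
    '^TAG'  => ($seq =~ m/^TAG/  ? 1 : 0),   # TAG occurs, but not at the beginning
);

for my $pattern (sort keys %matched) {
    print "$pattern: ", ($matched{$pattern} ? "match" : "no match"), "\n";
}
```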
![Page 13: Strings and Patterns in Perl Ellen Walker Bioinformatics Hiram College.](https://reader036.fdocuments.us/reader036/viewer/2022082820/5697bf8f1a28abf838c8d21f/html5/thumbnails/13.jpg)
Multiple option matches
[actg] Matches one character in the set a, c, t, g
[^A-Z] Matches one character that is not A-Z
TAG|TGA|TAA Matches either TAG, TGA or TAA
– Example: my $Rpattern = 'A|G';
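A pattern stored in a variable can be interpolated into a match. The following sketch uses alternation to locate a stop codon and a character class to detect unexpected characters; the sample sequences and the names other than $Rpattern are assumptions for illustration.

```perl
use strict;
use warnings;

my $seq = "ATGCCCTGAAAA";

# Alternation: find the first stop codon scanning left to right.
# Note that this search ignores the reading frame.
my $stop = 'TAG|TGA|TAA';
my ($codon, $pos);
if ($seq =~ m/($stop)/) {
    ($codon, $pos) = ($1, $-[0]);   # $-[0] is the offset where the match starts
    print "stop codon $codon at position $pos\n";
}

# Character class: [^ACGT] matches any one character that is not a DNA base.
my $has_invalid = ("ATGNCC" =~ m/[^ACGT]/) ? 1 : 0;

# The slide's purine pattern; the class [AG] would match the same characters.
my $Rpattern = 'A|G';
my $purines = () = $seq =~ m/$Rpattern/g;   # count of all matches

print "invalid character present: $has_invalid, purines: $purines\n";
```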
![Page 14: Strings and Patterns in Perl Ellen Walker Bioinformatics Hiram College.](https://reader036.fdocuments.us/reader036/viewer/2022082820/5697bf8f1a28abf838c8d21f/html5/thumbnails/14.jpg)
Capturing Patterns
• Any pattern in parentheses is “captured”
• The captured text can be recovered with \1, \2, etc. inside the pattern, and with $1, $2, etc. in the replacement
• Example:
• s/(...)(...)/$2$1/ switches the first two codons in the string.
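A short sketch of capturing in practice, using an illustrative sequence and variable names that are not from the slides. It shows captures used in a replacement, after a match, and as a backreference inside the pattern.

```perl
use strict;
use warnings;

my $seq = "ATGCCCTGA";

# Swap the first two codons on a copy, leaving $seq untouched.
my $swapped = $seq;
$swapped =~ s/(...)(...)/$2$1/;

# After a successful match, $1 holds what the first group captured.
my $first_codon = ($seq =~ m/^(...)/) ? $1 : '';

# Inside the pattern itself, \1 refers back to the first group:
# (.)\1 matches any character followed by the same character.
my $doubled = ($seq =~ m/(.)\1/) ? $1 : '';

print "$swapped $first_codon $doubled\n";
```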
![Page 15: Strings and Patterns in Perl Ellen Walker Bioinformatics Hiram College.](https://reader036.fdocuments.us/reader036/viewer/2022082820/5697bf8f1a28abf838c8d21f/html5/thumbnails/15.jpg)
Slides are not Complete!
• Pages 56-57 of the Perl book have an extensive list of regular expression examples.
![Page 16: Strings and Patterns in Perl Ellen Walker Bioinformatics Hiram College.](https://reader036.fdocuments.us/reader036/viewer/2022082820/5697bf8f1a28abf838c8d21f/html5/thumbnails/16.jpg)
Examples
• 6-mer palindrome (.)(.)(.)\3\2\1
• Pair of nucleotides repeated at least three times (.)(.).*\1\2.*\1\2
• Strings that end with GGA GGA$
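These patterns can be tried against a few test strings. This is a sketch: the test sequences and the %result hash are invented for illustration, and the last pattern assumes the final bullet means the end-anchored pattern GGA$.

```perl
use strict;
use warnings;

# The palindrome pattern matches a textual palindrome such as ACGGCA;
# a reverse-complement palindrome such as GAATTC would need a different test.
my $palindrome = qr/(.)(.)(.)\3\2\1/;
my $pair_x3    = qr/(.)(.).*\1\2.*\1\2/;
my $ends_gga   = qr/GGA$/;    # assumed reading of the last bullet

my %result;
for my $seq ("TTACGGCATT", "ACGGACTTAC", "ATGGGA", "ACGTACGG") {
    my @hits;
    push @hits, "palindrome"    if $seq =~ $palindrome;
    push @hits, "pair x3"       if $seq =~ $pair_x3;
    push @hits, "ends with GGA" if $seq =~ $ends_gga;
    $result{$seq} = @hits ? join(", ", @hits) : "no match";
    print "$seq: $result{$seq}\n";
}
```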