Strings and Patterns in Perl

Ellen Walker, Bioinformatics, Hiram College

Posted on 17-Jan-2016




Finding a Fixed Pattern

• my $string = "ATAAGCTTATCG";

• my $pattern = "GCT";

• print index($string,$pattern);

• print index (reverse($string), $pattern);
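A minimal runnable version of the fixed-pattern search above, using the same $string and $pattern. The extra search for GGG and the variable names $forward, $missing and $reversed are added only for illustration. It shows that index counts positions from 0, returns -1 when the pattern is absent, and that reverse() reverses the characters without complementing them.

```perl
use strict;
use warnings;

my $string  = "ATAAGCTTATCG";
my $pattern = "GCT";

# index returns the 0-based offset of the first match, or -1 if there is none.
my $forward = index($string, $pattern);    # GCT starts at offset 4
my $missing = index($string, "GGG");       # GGG does not occur, so -1

# reverse() only reverses the characters; it does not complement them,
# so this searches "GCTATTCGAATA", not the opposite DNA strand.
my $reversed = index(reverse($string), $pattern);

print "forward: $forward, missing: $missing, reversed: $reversed\n";
```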

Finding multiple occurrences

• my $start = 0;

• print index($string, $pattern, $start);

• $start = index($string, $pattern, $start) + length($pattern);

• print index($string, $pattern, $start);

• $start = index($string, $pattern, $start) + length($pattern);

When do you stop searching?

Finding all (non-overlapping) occurrences

my $start = 0;
my $found = index($string, $pattern, $start);   # index returns -1 when there is no match
while ($found > -1) {
    print "$pattern found at $found\n";
    $start = $found + length($pattern);         # resume just past this match (no overlaps)
    $found = index($string, $pattern, $start);
}
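The loop above skips length($pattern) characters after each hit, so overlapping occurrences are missed. A minimal sketch of the overlapping variant, wrapped in a hypothetical helper find_all_overlapping (not a built-in), advances by only one position instead. In "AAAA" it reports AA at 0, 1 and 2, where the non-overlapping loop would report only 0 and 2.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the start offsets of every occurrence of
# $pattern in $string, including overlapping ones.
sub find_all_overlapping {
    my ($string, $pattern) = @_;
    return () if $pattern eq '';    # an empty pattern would never advance
    my @positions;
    my $found = index($string, $pattern, 0);
    while ($found > -1) {
        push @positions, $found;
        # Advance by 1 instead of length($pattern) so overlaps are found.
        $found = index($string, $pattern, $found + 1);
    }
    return @positions;
}

my @hits = find_all_overlapping("AAAA", "AA");
print "AA found at: @hits\n";    # AA found at: 0 1 2
```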

Pattern Matching Operators

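• Three types of operators (so far)
– Translation: tr
– Substitution: s (add the g modifier to replace every match)
– Matching: m

• Used with =~, which binds the operator to a string (tr and s modify the string; m only tests it)

• Example:

– $complement =~ tr/ACGT/TGCA/;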
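The tr example above produces the complement of a DNA string. A short sketch, assuming an uppercase ACGT sequence, shows how combining it with reverse gives the reverse complement, i.e. the opposite strand read in its own 5' to 3' direction.

```perl
use strict;
use warnings;

my $dna = "ATAAGCTTATCG";

# Copy first, because tr changes the string it is bound to.
my $complement = $dna;
$complement =~ tr/ACGT/TGCA/;

# The opposite strand runs the other way, so reverse the complement too.
my $reverse_complement = reverse $complement;

print "$complement\n$reverse_complement\n";
```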

Translation

• The tr (transliteration) operator takes two lists of characters, normally of the same length

• Every character of the string that appears in the first list is changed to the character at the same position in the second list

• This is destructive: tr modifies the string in place, so copy the original first if you still need it!
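Two tr features that are handy for sequences, shown as a small sketch on a made-up sequence: tr returns the number of characters it matched (with an empty replacement list nothing is changed, so it just counts), and the /r modifier, available since Perl 5.14, returns a translated copy and leaves the original alone, which avoids the destructive behavior described above.

```perl
use strict;
use warnings;

my $seq = "ATGCGCGATAA";

# tr returns the number of characters it matched. With an empty
# replacement list, the characters are left unchanged, so this only counts.
my $gc_count = ($seq =~ tr/GC//);

# The /r modifier (Perl 5.14 and later) returns a modified copy
# and does not change $seq.
my $lower = $seq =~ tr/ACGT/acgt/r;

printf "GC: %d of %d\n", $gc_count, length $seq;
print "$seq -> $lower\n";
```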

Translation examples

• my $string = "actgTGCA";

• my $capitalizedString = $string;

• $capitalizedString =~ tr/actg/ACTG/;

• my $lowerCaseString = $string;

• $lowerCaseString =~ tr/ACTG/actg/;

Substitution

• Replaces text that matches a pattern with a replacement string

• The matched text and the replacement need not be the same length

• s alone changes only the first occurrence

• Add the g (global) modifier to change all occurrences

• Example:
– $string =~ s/T/U/g;
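A small sketch of the s/T/U/g example used for transcription, on a made-up sequence. Besides changing the string, s///g returns the number of substitutions it made, which is a cheap way to count occurrences while editing.

```perl
use strict;
use warnings;

my $dna = "ATGGGTTAA";
my $rna = $dna;    # copy, since s/// changes the string it is bound to

# s///g returns the number of substitutions made.
my $changes = ($rna =~ s/T/U/g);

print "$dna -> $rna ($changes substitutions)\n";
```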

Substitution Examples

• my $aminoAcids = $dna;

• $aminoAcids =~ s/AUG/Met/g;

• $aminoAcids =~ s/GGU/Gly/g;

• $aminoAcids =~ s/GGG/Gly/g;

A sequence of these substitutions will not really work to translate RNA (why not?)
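One reason the chained substitutions fail is that s///g ignores the reading frame: it replaces a match wherever it starts, even if it straddles two codons. The sketch below contrasts the slide's approach with reading three bases at a time. The helpers chained_substitutions and translate_rna are hypothetical names, and the codon table is deliberately partial (only the three codons from the slide); any other codon is shown as "???".

```perl
use strict;
use warnings;

# Partial codon table: only the codons used on the slide.
# A real translator needs all 64 codons.
my %codon_to_amino = (
    AUG => 'Met',
    GGU => 'Gly',
    GGG => 'Gly',
);

# The slide's approach: global substitutions, ignoring the reading frame.
sub chained_substitutions {
    my ($rna) = @_;
    (my $result = $rna) =~ s/AUG/Met/g;
    $result =~ s/GGU/Gly/g;
    $result =~ s/GGG/Gly/g;
    return $result;
}

# Frame-aware: look up one codon at a time, starting at offset 0.
sub translate_rna {
    my ($rna) = @_;
    my @aminos;
    for (my $i = 0; $i + 3 <= length($rna); $i += 3) {
        my $codon = substr($rna, $i, 3);
        push @aminos, $codon_to_amino{$codon} // '???';
    }
    return join '', @aminos;
}

# Codons are CAU GGG, but s/AUG/Met/ matches across the codon boundary.
my $rna = "CAUGGG";
print chained_substitutions($rna), "\n";    # CMetGG
print translate_rna($rna), "\n";            # ???Gly
```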

Matching

• Not destructive: the string is not changed

• Tests whether the string matches (can be used as the condition of an if statement)

• Example:

if ($string =~ m/T/) {
    print "String is DNA, not RNA\n";
}
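Matching can also replace the index loop from earlier. In this sketch on a made-up sequence, m//g in a while loop resumes after the previous match each time, and pos() gives the offset just past the current match; assigning m//g through an empty list is a common idiom for counting matches. Like the index loop, this finds non-overlapping occurrences, and the string itself is never modified.

```perl
use strict;
use warnings;

my $string  = "ATAAGCTTATCGGCTA";
my $pattern = "GCT";

# Each successful m//g in scalar context moves pos($string) past the match.
my @starts;
while ($string =~ m/$pattern/g) {
    push @starts, pos($string) - length($pattern);
}

# Counting idiom: list assignment in scalar context yields the match count.
my $count = () = $string =~ m/$pattern/g;

print "$pattern found at: @starts ($count matches)\n";
```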

Non-Exact Patterns

• Can be used with s or m

• Include:
– wildcard characters
– multiple-option matches
– capturing

Wildcard characters

. Matches any single character (except a newline)

* Matches 0 or more repetitions of the preceding character (or group)

+ Matches 1 or more…

^ Matches at the beginning of the string

$ Matches at the end of the string (or just before a final newline)
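A small sketch that tries each symbol from the table above on made-up test strings. Note how GC*T matches "GT" (zero C's allowed) while GC+T does not (at least one C required).

```perl
use strict;
use warnings;

# Each case: a compiled pattern, a test string, and what the pattern means.
my @cases = (
    [ qr/^ATG/, 'ATGCCCTAA', 'starts with ATG' ],
    [ qr/TAA$/, 'ATGCCCTAA', 'ends with TAA' ],
    [ qr/AT.G/, 'CATCGA',    'AT, any one character, then G' ],
    [ qr/GC*T/, 'GTA',       'G, zero or more C, then T' ],
    [ qr/GC+T/, 'GTA',       'G, one or more C, then T' ],
);

my @results;
for my $case (@cases) {
    my ($regex, $string, $description) = @$case;
    my $matched = $string =~ $regex ? 1 : 0;
    push @results, $matched;
    printf "%-10s %-9s %s\n", $string, ($matched ? 'matches' : 'no match'), $description;
}
```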

Multiple option matches

[actg] Matches one character in the set a, c, t, g

[^A-Z] Matches one character that is not A-Z

TAG|TGA|TAA Matches TAG, TGA, or TAA

– Example: my $Rpattern = 'A|G';   (R is the IUPAC code for a purine, A or G)
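A sketch applying the three kinds of multiple-option match to one made-up sequence: a negated class to spot characters other than A, C, G, T; the stop-codon alternation with m//g to collect every match; and a pattern stored in a variable, like $Rpattern, to count purines. As with s///g, the stop-codon search does not respect the reading frame.

```perl
use strict;
use warnings;

my $seq = "ATGCCCTAGNGTAA";

# [^ACGT] matches any one character that is not A, C, G or T.
my $has_bad_char = $seq =~ m/[^ACGT]/ ? 1 : 0;

# Alternation with m//g in list context returns every (non-overlapping) match.
my @stops = $seq =~ m/TAG|TGA|TAA/g;

# A pattern stored in a variable is interpolated into the match.
my $Rpattern = 'A|G';
my $purines = () = $seq =~ m/$Rpattern/g;

print "non-ACGT present: $has_bad_char; stops: @stops; purines: $purines\n";
```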

Capturing Patterns

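• The text matched by any part of the pattern in parentheses is "captured"

• The captured text can be recovered with \1, \2, etc. inside the pattern, and with $1, $2, etc. in the replacement and in later code

• Example:

• s/(...)(...)/$2$1/ switches the first two codons in the string.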
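A sketch of three uses of capturing on made-up strings. It writes the codon swap with $2$1, since Perl's warnings pragma flags \1 in a replacement ("better written as $1"); uses \1 inside a pattern to find a pair of characters immediately repeated; and uses captures in list context to pull out the first three codons. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $seq = "ATGCCCTAA";

# Swap the first two codons: each (...) captures three characters.
(my $swapped = $seq) =~ s/(...)(...)/$2$1/;

# \1 inside the pattern refers back to the first capture,
# so (..)\1 finds a pair of characters repeated immediately.
my $pair;
if ("GGCACACA" =~ m/(..)\1/) {
    $pair = $1;
}

# In list context, a match returns its captures.
my ($c1, $c2, $c3) = $seq =~ m/^(...)(...)(...)/;

print "$swapped; repeated pair: $pair; codons: $c1 $c2 $c3\n";
```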

Slides are not complete!

• Pages 56-57 of the Perl book have an extensive list of regular expression examples.

Examples

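• 6-mer that reads the same forwards and backwards: (.)(.)(.)\3\2\1 (a mirror palindrome; it does not find reverse-complement palindromes such as GAATTC)

• Pair of nucleotides repeated at least three times: (.)(.).*\1\2.*\1\2

• Strings that end with GGAGGA: GGAGGA$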
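A quick way to check the three example patterns is to run each against one string that should match and one that should not. The test strings and the helper name matches are made up for this sketch. GAATTC shows that the first pattern finds mirror palindromes only.

```perl
use strict;
use warnings;

# The three example patterns, compiled with qr//.
my $mirror_6mer  = qr/(.)(.)(.)\3\2\1/;     # e.g. ACGGCA
my $pair_3_times = qr/(.)(.).*\1\2.*\1\2/;  # e.g. AT ... AT ... AT
my $ends_ggagga  = qr/GGAGGA$/;

# Hypothetical helper: 1 if $string matches $regex, else 0.
sub matches {
    my ($string, $regex) = @_;
    return $string =~ $regex ? 1 : 0;
}

my @cases = (
    [ 'TTACGGCATT', $mirror_6mer  ],   # contains ACGGCA
    [ 'GAATTC',     $mirror_6mer  ],   # reverse-complement palindrome, not a mirror
    [ 'ATCCATGGAT', $pair_3_times ],   # AT appears three times
    [ 'ATCCAT',     $pair_3_times ],   # no pair appears three times
    [ 'TTGGAGGA',   $ends_ggagga  ],
    [ 'GGAGGAT',    $ends_ggagga  ],   # GGAGGA present, but not at the end
);

my @results = map { matches($_->[0], $_->[1]) } @cases;
print "@results\n";    # 1 0 1 0 1 0
```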