Regular Expressions in Perl Part I Alan Gold. Basic syntax =~ is the matching operator !~ is the...


Regular Expressions in Perl, Part I

Alan Gold


Basic syntax

• =~ is the matching operator
• !~ is the negated matching operator
• // are the default delimiters
• Prefixing the expression with “m” allows arbitrary delimiters, e.g. m%Don’t use this%
• Modifiers follow the closing delimiter
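A minimal Perl sketch of the basic syntax above. The sample strings and the path are made up for illustration; each match is turned into 1 or 0 so the results are easy to print.

```perl
use strict;
use warnings;

my $text = "Hello World";

# =~ binds the string on the left to the match on the right;
# the expression is true when the pattern is found.
my $found = ($text =~ /World/) ? 1 : 0;

# !~ negates the match: true when the pattern is NOT found.
my $absent = ($text !~ /Kal-El/) ? 1 : 0;

# With the "m" prefix any delimiter works, so "/" in a path needs no escaping.
my $path = "/usr/local/bin";
my $is_local = ($path =~ m{^/usr/local}) ? 1 : 0;

# Modifiers go after the closing delimiter; "i" ignores case.
my $shout = ("HELLO WORLD" =~ m%hello%i) ? 1 : 0;

print "$found $absent $is_local $shout\n";   # prints "1 1 1 1"
```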


Simple matching

• "Hello World" =~ /Hello/
• Matches the literal string “Hello”
• "Superman" =~ /Kal-El/
• Unfortunately does not match
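A small sketch of literal matching, using the slide's two examples plus one extra string (an assumption, added to show that plain matching is case-sensitive).

```perl
use strict;
use warnings;

# A pattern made only of ordinary characters matches that exact
# substring anywhere in the string.
my @results = (
    ("Hello World" =~ /Hello/)  ? 1 : 0,  # literal "Hello" appears: match
    ("Superman"    =~ /Kal-El/) ? 1 : 0,  # "Kal-El" does not appear: no match
    ("hello world" =~ /Hello/)  ? 1 : 0,  # case differs, so no match
);
print "@results\n";   # prints "1 0 0"
```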


Metacharacters

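• Metacharacters are {}[]()^$.|*+?\
• These must be escaped with a “\” to match their literal characters
• "Spoon+fork" =~ /Spoon+/ will match, but not how you want it to: “+” means “one or more of the preceding character”, so the pattern matches “Spoon” and ignores the plus sign
• "Spoonnnnnn" =~ /Spoon+/ will also match
• "Spoon+fork" =~ /Spoon\+/ matches the literal “+” properly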
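A sketch contrasting the unescaped and escaped “+”. The “Spoon fork” string (with a space instead of a plus) is an added example showing that /Spoon+/ never required the plus sign at all.

```perl
use strict;
use warnings;

my @checks = (
    ("Spoon+fork" =~ /Spoon+/)  ? 1 : 0,  # matches, but only via "Spoon"
    ("Spoonnnnnn" =~ /Spoon+/)  ? 1 : 0,  # "n+" = one or more "n": matches
    ("Spoon fork" =~ /Spoon+/)  ? 1 : 0,  # no plus sign needed: matches
    ("Spoon+fork" =~ /Spoon\+/) ? 1 : 0,  # escaped: a literal "+" is required
    ("Spoon fork" =~ /Spoon\+/) ? 1 : 0,  # no literal "+": no match
);
print "@checks\n";   # prints "1 1 1 1 0"
```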


Escape sequences

• Several characters can’t be typed directly into a pattern
• They are matched using an escape sequence
• \t is a tab character (ASCII code 9)
• \n is a newline character (ASCII code 10)
• \r is a carriage return (ASCII code 13)
• \0.. is an octal character code, e.g. \033
• \x.. is a hexadecimal character code, e.g. \x1B
• \033 and \x1B both describe the same character, ESC (ASCII code 27)
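A sketch of the escape sequences in use. The sample line and the terminal escape string "\033[1m" are illustrative assumptions; the point is that \t, \x09, \033 and \x1B each stand for one specific character.

```perl
use strict;
use warnings;

my $line = "name\tvalue\r\n";

my $has_tab  = ($line =~ /\t/)   ? 1 : 0;  # tab, ASCII 9
my $has_crlf = ($line =~ /\r\n/) ? 1 : 0;  # carriage return followed by newline
my $tab_hex  = ($line =~ /\x09/) ? 1 : 0;  # the same tab, written in hex

# ESC (ASCII 27) written in octal and in hexadecimal.
my $esc   = "\033[1m";                     # a terminal escape sequence
my $octal = ($esc =~ /^\033/) ? 1 : 0;
my $hex   = ($esc =~ /^\x1B/) ? 1 : 0;

print "$has_tab $has_crlf $tab_hex $octal $hex\n";   # prints "1 1 1 1 1"
```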


Variables

• Variables are interpolated into regular expressions just as they are in double-quoted strings
• $something = "cool";
• 'cool cruel pool' =~ /$something/
• Will match just fine
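A sketch of interpolation. The second pattern and word list are hypothetical, added to show that the variable's contents are treated as a pattern, so a “.” stored in the variable still means “any character”.

```perl
use strict;
use warnings;

my $something = "cool";
my $matched = ('cool cruel pool' =~ /$something/) ? 1 : 0;

# The interpolated text is a pattern, not literal text:
# "c..l" means "c", any two characters, then "l".
my $pattern = "c..l";
my @words = grep { /^$pattern$/ } qw(cool cruel curl pool);

print "$matched @words\n";   # prints "1 cool curl"
```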


Anchors

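• ^ anchors the pattern to the beginning of the string
• $ anchors the pattern to the end of the string (or just before a final newline)
• "Speaker" =~ /^peak/
• Will not match: “peak” occurs, but not at the start
• "Rabbit" =~ /bit$/
• Will match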
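A sketch of the two anchors, extending the slide's examples with a few unanchored and near-miss strings (added for comparison).

```perl
use strict;
use warnings;

my @r = (
    ("Speaker"  =~ /^peak/) ? 1 : 0,  # "peak" is not at the start: no match
    ("Speaker"  =~ /peak/)  ? 1 : 0,  # unanchored, found inside: match
    ("Rabbit"   =~ /bit$/)  ? 1 : 0,  # "bit" ends the string: match
    ("Rabbits"  =~ /bit$/)  ? 1 : 0,  # "bit" followed by "s": no match
    ("Rabbit\n" =~ /bit$/)  ? 1 : 0,  # $ also matches before a final newline
);
print "@r\n";   # prints "0 1 1 0 1"
```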


Character classes

• Character classes match any one character contained in [brackets]
• /tin[yas]/ will match tiny, tina, and tins
• “-” can be used to represent a range
• /[a-zA-Z0-9]/ will match a single alphanumeric character
• The literal “-” character can be matched if it is the first or last character in the class, e.g. /[-0-9]/
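A sketch of character classes. The extra words “tine” and “tin” are illustrative additions showing that the class must match exactly one of the listed characters.

```perl
use strict;
use warnings;

# [yas] accepts exactly one of "y", "a" or "s" after "tin".
my @words = qw(tiny tina tins tine tin);
my @hits  = grep { /tin[yas]/ } @words;
print "@hits\n";   # prints "tiny tina tins"

my $alnum  = ("#x" =~ /[a-zA-Z0-9]/) ? 1 : 0;  # one alphanumeric anywhere
my $dash   = ("-"  =~ /^[-0-9]$/)    ? 1 : 0;  # leading "-" is literal
my $range  = ("5"  =~ /^[-0-9]$/)    ? 1 : 0;  # 0-9 is still a range
my $letter = ("b"  =~ /^[-0-9]$/)    ? 1 : 0;  # not in the class
print "$alnum $dash $range $letter\n";         # prints "1 1 1 0"
```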


Negated character classes

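• The “^” character, placed first inside the brackets, negates a character class
• /200[^7]/ will not match 2007 but will match 2008, 200q, etc.
• A negated class still matches one character, so /200[^7]/ does not match “200” on its own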
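A sketch of the negated class on a few sample strings (the list is made up for illustration).

```perl
use strict;
use warnings;

# [^7] matches any single character except "7"; it cannot match "nothing".
my @years = ("2007", "2008", "200q", "200");
my @hits  = grep { /200[^7]/ } @years;
print "@hits\n";   # prints "2008 200q"
```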


Shortcut character classes

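• \d is a digit, equivalent to [0-9]
• \s is any whitespace, equivalent to [\ \t\r\n\f]
• \w is a word character, equivalent to [0-9a-zA-Z_]
• \D is any non-digit, equivalent to [^0-9]
• \S is any non-whitespace, equivalent to [^\s]
• \W is any non-word character, equivalent to [^\w]
• The period ‘.’ matches any character except ‘\n’
• These equivalents are approximate: on Unicode strings, \d, \s and \w can also match non-ASCII characters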
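A sketch exercising the shortcut classes and the period on short ASCII sample strings (chosen for illustration).

```perl
use strict;
use warnings;

my @r = (
    ("42"    =~ /^\d\d$/) ? 1 : 0,  # two digits: match
    ("a b"   =~ /a\sb/)   ? 1 : 0,  # space is whitespace: match
    ("a\tb"  =~ /a\sb/)   ? 1 : 0,  # tab is whitespace too: match
    ("id_42" =~ /^\w+$/)  ? 1 : 0,  # letters, digits, underscore: match
    ("id-42" =~ /^\w+$/)  ? 1 : 0,  # "-" is not a word character: no match
    ("abc"   =~ /\D/)     ? 1 : 0,  # contains a non-digit: match
    ("a\nb"  =~ /a.b/)    ? 1 : 0,  # "." does not match "\n": no match
    ("a!b"   =~ /a.b/)    ? 1 : 0,  # "." matches "!": match
);
print "@r\n";   # prints "1 1 1 1 0 1 0 1"
```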


Word anchors

• The word anchor ‘\b’ matches the boundary between a word character and a non-word character, or between a word character and the start or end of the string
• /\bpen/ matches “penitentiary”, not “open”
• /\bpen\b/ only matches “pen” as a whole word, e.g. “this pen is blue”
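A sketch of \b using the slide's strings plus two added cases (“this pencil” and a bare “pen”).

```perl
use strict;
use warnings;

my @r = (
    ("penitentiary"     =~ /\bpen/)   ? 1 : 0,  # "pen" starts a word: match
    ("open"             =~ /\bpen/)   ? 1 : 0,  # "o" precedes "pen": no match
    ("this pen is blue" =~ /\bpen\b/) ? 1 : 0,  # standalone word: match
    ("this pencil"      =~ /\bpen\b/) ? 1 : 0,  # "c" follows: no match
    ("pen"              =~ /\bpen\b/) ? 1 : 0,  # string edges count: match
);
print "@r\n";   # prints "1 0 1 0 1"
```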


Modifiers

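• Modifiers change the behavior of the engine
• // is the default: ‘.’ doesn’t match newlines, and ^ and $ match only at the start and end of the whole string
• //s causes ‘.’ to match newlines
• //m makes ^ and $ match at the start and end of every line, treating each line as its own string
• //i matches case-insensitively
• Modifiers can be combined, e.g. //sim
• /^car.$/im matches "not a car\nCAR!": /m lets ^ match right after the newline, and /i lets “car” match “CAR”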
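A sketch comparing the modifiers on the slide's string. The /car.CAR/ patterns are added examples isolating what /s changes.

```perl
use strict;
use warnings;

my $text = "not a car\nCAR!";

my @r = (
    ($text =~ /^car.$/)    ? 1 : 0,  # default: ^ only at string start: no match
    ($text =~ /^car.$/im)  ? 1 : 0,  # m: ^ after "\n"; i: "CAR" matches
    ($text =~ /car.CAR/i)  ? 1 : 0,  # "." refuses "\n" without s: no match
    ($text =~ /car.CAR/is) ? 1 : 0,  # s lets "." match "\n": match
);
print "@r\n";   # prints "0 1 0 1"
```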


Or

• The pipe character ‘|’ can be used to match any one of the given choices

• /lumber|wood/ will match “My desk is made of spare lumber” and “My desk is made of 100,000 year old petrified wood”

• /0|1|2/ is equivalent to /[0-2]/
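A sketch of alternation. The third sentence and the list of test strings are made up; the loop checks that /0|1|2/ and /[0-2]/ agree on each of them.

```perl
use strict;
use warnings;

my @sentences = (
    "My desk is made of spare lumber",
    "My desk is made of 100,000 year old petrified wood",
    "My desk is made of steel",
);
my @wooden = grep { /lumber|wood/ } @sentences;
print scalar(@wooden), "\n";   # prints 2

# Alternation of single characters accepts the same strings as a class.
my $same = 1;
for my $s (qw(0 1 2 3 9 x 12)) {
    my $alt   = ($s =~ /0|1|2/) ? 1 : 0;
    my $class = ($s =~ /[0-2]/) ? 1 : 0;
    $same = 0 if $alt != $class;
}
print "$same\n";   # prints 1
```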

