Regular Expressions for PHP

Adding magic to your programming.

Geoffrey Dunn (geoff@warmage.com)

What are Regular Expressions

• Regular expressions are a syntax to match text.

• They date back to mathematical notation made in the 1950s.

• They became embedded in Unix systems through tools like ed and grep.

What are RE

• Perl in particular promoted the use of very complex regular expressions.

• They are now available in all popular programming languages.

• They allow much more complex matching than strpos(), which only finds a fixed substring.

Why use RE

• You can use RE to enforce rules on formats like phone numbers, email addresses or URLs.

• You can use them to find key data within logs, configuration files or webpages.

Why use RE

• They can quickly make replacements that would otherwise be complex, like finding all email addresses in a page and rewriting them as address [AT] site [dot] com.

• You can make your code really hard to understand.

Syntax basics

• In PHP's preg functions the entire regular expression is written between two delimiters, conventionally forward slashes (/). Modifiers such as i (ignore case) can follow the closing delimiter.

• abc - most characters are normal character matches. This is looking for the exact character sequence a, b and then c.

• . - a period will match any single character (except a newline, unless the s modifier is set).

• [abc] - square brackets will match any of the characters inside. Here: a, b or c.

Syntax basics

• ? - marks the previous item as optional, so a? means there might be an a.

• (abc)* - parentheses group patterns and the asterisk marks zero or more of the previous item, here the whole group. So this would match an empty string or abcabcabcabc.

• \.+ - the backslash is an all-purpose escape character, so \. is a literal period. The + marks one or more of the previous item. So this would match ......
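The basic syntax above can be tried out with preg_match(), which returns 1 when the pattern matches and 0 when it does not. This is a minimal sketch: the sample strings are made up, and the ^ and $ anchors (start and end of the string) are added in a few patterns so that the whole string has to match.

```php
<?php
// Literal characters: the sequence "abc" must appear somewhere in the subject.
$literal = preg_match('/abc/', 'xxabcxx');

// A period matches any single character except a newline...
$dot = preg_match('/a.c/', 'a-c');
$dotNewline = preg_match('/a.c/', "a\nc");
// ...unless the s modifier is added after the closing delimiter.
$dotAll = preg_match('/a.c/s', "a\nc");

// Square brackets match exactly one of the listed characters.
$class = preg_match('/gr[ae]y/', 'grey');

// ? makes the previous item optional, so both spellings match.
$optional = preg_match('/^colou?r$/', 'color');

// (abc)* allows the whole group to repeat zero or more times.
$star = preg_match('/^(abc)*$/', 'abcabc');
$starEmpty = preg_match('/^(abc)*$/', '');
$starPartial = preg_match('/^(abc)*$/', 'abca');

// \. is a literal period and + requires at least one of them.
$dots = preg_match('/^\.+$/', '......');
```

Single-quoted PHP strings are used here so that backslashes reach the regular expression engine unchanged.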

More syntax tricks

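• [0-4] - matches any single digit from 0 to 4.

• [^0-4] - matches any single character that is not a digit from 0 to 4.

• \sword\s - matches word only where there is white space before and after it.

• \bword\b - \b marks a word boundary: the position between a word character (a letter, digit or underscore) and anything else, such as white space, punctuation, a new line or the start or end of the string.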
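The difference between ranges, negated ranges, \s and \b is easiest to see side by side. The following is a small sketch with made-up sample strings; the ^ and $ anchors are added so that the single-character patterns have to cover the whole string.

```php
<?php
// [0-4] accepts one digit in the range, [^0-4] accepts anything else.
$inRange = preg_match('/^[0-4]$/', '3');
$outOfRange = preg_match('/^[0-4]$/', '7');
$negatedDigit = preg_match('/^[^0-4]$/', '7');
$negatedLetter = preg_match('/^[^0-4]$/', 'x');

// \sword\s needs a real white space character on both sides.
$spacedMiddle = preg_match('/\sword\s/', 'a word here');
$spacedAtEnd = preg_match('/\sword\s/', 'last word');

// \bword\b only needs a boundary, so the end of the string
// or punctuation is enough.
$boundaryAtEnd = preg_match('/\bword\b/', 'last word');
$boundaryPunct = preg_match('/\bword\b/', 'word, then');

// Inside a longer word there is no boundary, so this fails.
$boundaryInside = preg_match('/\bword\b/', 'swordfish');
```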

More syntax tricks

• \d{3,12} - \d matches any digit ([0-9]) while the braces mark the minimum and maximum count of the previous item. In this case 3 to 12 digits.

• [a-z]{8,} - with no maximum given, this must be at least 8 lowercase letters.
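Counted repetition is often used for simple format rules. The sketch below wraps the two patterns above in helper functions; the function names isDigitRun() and isLongLowercase() are hypothetical and only there for illustration, and the ^ and $ anchors are added so that the counts apply to the whole string.

```php
<?php
// Hypothetical helper: true when the string is 3 to 12 digits and nothing else.
function isDigitRun(string $value): bool
{
    return preg_match('/^\d{3,12}$/', $value) === 1;
}

// Hypothetical helper: true when the string is 8 or more lowercase letters.
function isLongLowercase(string $value): bool
{
    return preg_match('/^[a-z]{8,}$/', $value) === 1;
}

$tooShort = isDigitRun('12');
$justRight = isDigitRun('123');
$tooLong = isDigitRun('1234567890123'); // 13 digits

$eightLetters = isLongLowercase('password');
$fiveLetters = isLongLowercase('short');
$hasUppercase = isLongLowercase('Password');
```

Without the anchors, \d{3,12} would also succeed on a longer run of digits, because it only has to find 3 to 12 of them somewhere in the string.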

Matching Text

• Simple check: preg_match('/^[a-z0-9]+@([a-z0-9]+\.)*[a-z0-9]+$/i', $email_address) > 0

• Finding: preg_match('/\bcolou?r:\s+([a-zA-Z]+)\b/', $text, $matches); echo $matches[1];

• Find all: preg_match_all('/<([^>]+)>/', $html, $tags); echo $tags[1][1]; (the pattern has one group, so everything it captured is in $tags[1]; this prints the second tag)
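The three calls above can be run against some sample input to see what ends up in the match arrays. This is a sketch only: the sample $text and $html values are made up, and the email pattern is the deliberately simple one from the slide rather than a complete address validator.

```php
<?php
// Simple check: preg_match() returns 1 on a match and 0 otherwise.
$email_pattern = '/^[a-z0-9]+@([a-z0-9]+\.)*[a-z0-9]+$/i';
$valid = preg_match($email_pattern, 'Geoff@Example.com') > 0;
$invalid = preg_match($email_pattern, 'not an email') > 0;

// Finding: $matches[0] is the whole match, $matches[1] the first group.
$text = 'shape: round, colour: red, color:   Blue';
$first = null;
if (preg_match('/\bcolou?r:\s+([a-zA-Z]+)\b/', $text, $matches)) {
    $first = $matches[1];
}

// Find all: with the default ordering, $all[1] lists what the first
// group captured in every match.
preg_match_all('/\bcolou?r:\s+([a-zA-Z]+)\b/', $text, $all);
$colours = $all[1];

$html = '<p>Some <b>bold</b> text</p>';
preg_match_all('/<([^>]+)>/', $html, $tags);
$tag_names = $tags[1];
$second_tag = $tags[1][1];
```

Checking the return value before reading $matches avoids notices about undefined indexes when nothing matched.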

Matching Lines

• This is more for looking through files but could be for any array of text.

• $new_lines = preg_grep('/Jan[a-z]*[\s\/\-](20)?07/', $old_lines);

• To get the lines that do not match, pass PREG_GREP_INVERT as the third parameter rather than complicating your regular expression into something like /^[^\/]|(\/[^p])|(\/p[^r]) etc...
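As a quick illustration of both forms of preg_grep(), the sketch below filters a small array of made-up log lines. Note that preg_grep() keeps the original array keys, so the results are not renumbered.

```php
<?php
// Made-up sample lines standing in for the contents of a log file.
$old_lines = [
    '12 Jan 07 server started',
    '03-Feb-2007 disk check',
    'January/2007 backup complete',
    '15 Jan 08 server stopped',
];

$pattern = '/Jan[a-z]*[\s\/\-](20)?07/';

// Lines mentioning January 2007 in one of the accepted spellings.
$new_lines = preg_grep($pattern, $old_lines);

// Everything else, using the same pattern with the invert flag.
$other_lines = preg_grep($pattern, $old_lines, PREG_GREP_INVERT);
```

Wrapping a result in array_values() gives a list indexed from 0 again if the original keys are not needed.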

Replacing text

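preg_replace(
    '/\b([\w.+-]+)@([a-zA-Z\d_-]+)\.([a-zA-Z\d_.-]+)\b/',
    '$1 [AT] $2 [dot] $3', $post);

• A single pattern takes a single replacement string. $1, $2 and $3 in the replacement refer to the text captured by the first, second and third groups.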
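Here is one way the email-masking replacement could be written out in full. It is a sketch under assumptions: the function names obfuscateEmails() and obfuscateEmailsFully() are hypothetical, the pattern is a loose approximation of an email address, and the second variant swaps in preg_replace_callback() so that every period in the address is rewritten, not just the first one after the @.

```php
<?php
// Hypothetical helper: rewrites the @ and the first period after it
// using backreferences to the three captured groups.
function obfuscateEmails(string $post): string
{
    $result = preg_replace(
        '/\b([\w.+-]+)@([a-zA-Z\d_-]+)\.([a-zA-Z\d_.-]+)\b/',
        '$1 [AT] $2 [dot] $3',
        $post
    );
    return $result ?? $post; // preg_replace() returns null on error
}

// Hypothetical variant: matches the whole address, then rewrites
// every @ and period inside it with str_replace().
function obfuscateEmailsFully(string $post): string
{
    $result = preg_replace_callback(
        '/\b[\w.+-]+@[a-zA-Z\d-]+(?:\.[a-zA-Z\d-]+)+\b/',
        function (array $match): string {
            return str_replace(['@', '.'], [' [AT] ', ' [dot] '], $match[0]);
        },
        $post
    );
    return $result ?? $post;
}

$simple = obfuscateEmails('Mail geoff@example.com today.');
$partial = obfuscateEmails('Mail a@mail.example.org.');
$full = obfuscateEmailsFully('Mail a@mail.example.org.');
```

The backreference version is closest to the slide; the callback version is worth the extra lines when addresses with several domain levels are expected.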

Splitting text

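• $date_parts = preg_split('/[-.,\/\\\\\s]+/', $date_string); This splits on any run of hyphens, periods, commas, slashes, backslashes or white space. Inside a PHP string a literal backslash in the pattern has to be written as four backslashes, otherwise the class ends up matching the letter s instead of white space.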
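To see the split in action, the sketch below runs one pattern over a few made-up date strings. The splitDate() wrapper is a hypothetical name, and the PREG_SPLIT_NO_EMPTY flag is an addition that drops empty pieces when the string starts or ends with a separator.

```php
<?php
// Hypothetical helper: split a date on runs of hyphens, periods, commas,
// slashes, backslashes or white space.
function splitDate(string $date_string): array
{
    // In single quotes, \\\\ becomes \\ in the pattern (a literal backslash)
    // and \s is passed through as the white space class.
    return preg_split('/[-.,\/\\\\\s]+/', $date_string, -1, PREG_SPLIT_NO_EMPTY);
}

$iso = splitDate('2007-01-15');
$slashes = splitDate('15/01/2007');
$written = splitDate('Jan 15, 2007');
$backslashes = splitDate('15\\01\\2007');
```

Because the class is followed by +, a comma and a space together count as one separator, so 'Jan 15, 2007' yields three parts rather than four.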

Tips

• Comment what your regular expression is doing.

• Test your regular expression for speed. Some can cause a noticeable slowdown.

• There are plenty of simple uses like /Width: (\d+)/

• Watch out for greedy expressions. E.g. /(<(.+)>)/ will not pull out "b" and "/b" from "<b>test</b>" but instead will pull "b>test</b". An easy way to change this behaviour is to make the quantifier lazy by adding a question mark: /(<(.+?)>)/
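Two of these tips can be shown in a few lines. The sketch below compares the greedy and lazy patterns on the sample string from the slide (with the outer group dropped for brevity), and then uses the x modifier, which ignores white space in the pattern and allows # comments, as one possible way to comment a regular expression in place.

```php
<?php
$html = '<b>test</b>';

// Greedy: .+ grabs as much as it can, so only one long match is found.
preg_match_all('/<(.+)>/', $html, $greedy);

// Lazy: .+? stops at the first > it can, so each tag is found separately.
preg_match_all('/<(.+?)>/', $html, $lazy);

// A negated class gives the same result without relying on laziness.
preg_match_all('/<([^>]+)>/', $html, $negated);

// The x modifier lets the pattern carry its own comments. Literal spaces
// are ignored in this mode, so \s* stands in for the space after the colon.
$width_pattern = '/
    Width:   # the literal label
    \s*      # optional white space
    (\d+)    # capture the number
/x';
preg_match($width_pattern, 'Width: 640', $width);
```

When using the x modifier, the delimiter character must not appear inside the comments, since it would end the pattern early.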

References

• http://en.wikipedia.org/wiki/Regular_expressions

• http://php.net/manual/en/ref.pcre.php

• Thank you