Regular Expressions for PHP: Adding magic to your programming. Geoffrey Dunn ([email protected])
What are Regular Expressions
• Regular expressions are a syntax to match text.
• They date back to a mathematical notation developed in the 1950s.
• They became embedded in Unix systems through tools like ed and grep.
What are RE
• Perl in particular promoted the use of very complex regular expressions.
• They are now available in all popular programming languages.
• They allow much more complex matching than plain substring functions like strpos().
Why use RE
• You can use RE to enforce rules on formats like phone numbers, email addresses or URLs.
• You can use them to find key data within logs, configuration files or webpages.
Why use RE
• They can quickly make replacements that would otherwise be complex, like finding all email addresses in a page and rewriting them as address [AT] site [dot] com.
• You can make your code really hard to understand.
Syntax basics
• In PHP the entire regular expression is a string whose pattern sits between two delimiters, conventionally forward slashes (/).
• abc - most characters are normal character matches. This is looking for the exact character sequence a, b and then c.
• . - a period will match any character (except a newline, unless the s modifier is set).
• [abc] - square brackets will match any of the characters inside. Here: a, b or c.
Syntax basics
• ? - marks the previous item as optional. So a? means there might be an a.
• (abc)* - parentheses group patterns and the asterisk marks zero or more of the previous item, here the whole group. So this would match an empty string or abcabcabcabc.
• \.+ - the backslash is an all-purpose escape character, so \. is a literal period. The + marks one or more of the previous item. So this would match ......
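The basics above can be tried out directly with preg_match(), which returns 1 when the pattern matches and 0 when it does not. This is a minimal sketch: the subject strings are made up for illustration, and the ^ and $ anchors (start and end of the string) are added so that each check has to match the whole subject.

```php
<?php
// Patterns come from the bullets above; subject strings are illustrative.
$checks = [
    'literal'  => preg_match('/abc/', 'xxabcxx'),     // a, b, c in sequence somewhere
    'dot'      => preg_match('/a.c/', 'a-c'),         // . matches the hyphen
    'class'    => preg_match('/^[abc]$/', 'd'),       // d is not in the class
    'optional' => preg_match('/^colou?r$/', 'color'), // the u may be absent
    'star'     => preg_match('/^(abc)*$/', ''),       // zero repetitions is allowed
    'plus'     => preg_match('/^\.+$/', '......'),    // one or more literal periods
];
var_dump($checks);
```

Single-quoted strings are used here so that PHP leaves the backslashes and dollar signs alone before the pattern reaches the regex engine.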
More syntax tricks
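• [0-4] - match any digit from 0 to 4
• [^0-4] - match any character that is not a digit from 0 to 4
• \sword\s - match word where there is white space before and after
• \bword\b - \b marks a word boundary: the position between a word character (letter, digit or underscore) and anything else. That could be white space, punctuation, a new line, or the start or end of the string.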
More syntax tricks
• \d{3,12} - \d matches any digit ([0-9]) while the braces mark the min and max count of the previous item. In this case 3 to 12 digits.
• [a-z]{8,} - with no max given, this matches a run of at least 8 lowercase letters.
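A short sketch of how these pieces behave in practice. The subject strings are invented for the example, and the ^ and $ anchors are added where a check is meant to cover the whole string.

```php
<?php
// Patterns follow the bullets above; subjects are illustrative only.
$results = [
    'range'      => preg_match('/^[0-4]$/', '3'),            // 3 is within 0-4
    'negated'    => preg_match('/^[^0-4]$/', '3'),           // so the negated class rejects it
    'whitespace' => preg_match('/\sword\s/', 'word'),        // no white space on either side
    'boundary'   => preg_match('/\bword\b/', 'word'),        // string edges count as boundaries
    'inside'     => preg_match('/\bword\b/', 'swordfish'),   // letters on both sides, no boundary
    'digits'     => preg_match('/^\d{3,12}$/', '12'),        // only two digits, minimum is three
    'letters'    => preg_match('/^[a-z]{8,}$/', 'password'), // exactly eight lowercase letters
];
var_dump($results);
```

The 'whitespace' and 'boundary' rows show why \b is usually the better choice for whole-word matching: \s needs an actual white space character to consume, while \b also succeeds at the very start or end of the text.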
Matching Text
• Simple check: preg_match("/^[a-z0-9]+@([a-z0-9]+\.)*[a-z0-9]+$/i", $email_address) > 0
• Finding: preg_match("/\bcolou?r:\s+([a-zA-Z]+)\b/", $text, $matches); echo $matches[1]; // the captured colour name
• Find all: preg_match_all("/<([^>]+)>/", $html, $tags); echo $tags[1][1]; // $tags[1] holds the first group of every match, so this is the contents of the second tag
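The three calls above can be put together into one runnable sketch. The sample values for $email_address, $text and $html are assumptions made up for the example; the patterns are the ones from the slide.

```php
<?php
// Simple check: does the whole string look like an address?
$email_address = 'user@mail.example.com';
$is_valid = preg_match('/^[a-z0-9]+@([a-z0-9]+\.)*[a-z0-9]+$/i', $email_address) === 1;

// Finding: capture the first colour name; group 1 lands in $matches[1].
$text = 'width: 10px; color: red; colour:  blue';
preg_match('/\bcolou?r:\s+([a-zA-Z]+)\b/', $text, $matches);
$first_colour = $matches[1];

// Find all: $tags[0] holds the full matches, $tags[1] holds group 1 of each match.
$html = '<p>Hello <b>world</b></p>';
preg_match_all('/<([^>]+)>/', $html, $tags);
$second_tag = $tags[1][1];

var_dump($is_valid, $first_colour, $tags[1], $second_tag);
```

Note that the email pattern is deliberately simple: it rejects valid addresses that contain periods, hyphens or plus signs before the @.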
Matching Lines
• This is more for looking through files but could be for any array of text.
• $new_lines = preg_grep("/Jan[a-z]*[\s\/\-](20)?07/", $old_lines);
• Or get the lines that do not match by adding a third parameter of PREG_GREP_INVERT, rather than complicating your regular expression into something like /^[^\/]|(\/[^p])|(\/p[^r]) etc...
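As a rough illustration, here is preg_grep() run both ways over a small array. The log lines in $old_lines are invented for the example; the pattern is the one from the slide.

```php
<?php
// Hypothetical log lines; only the first two mention January 2007.
$old_lines = [
    '12 Jan 2007 server started',
    'January-07 backup ok',
    '03 Feb 2007 disk full',
    'Jan 2008 rotated logs',
];

$pattern = '/Jan[a-z]*[\s\/\-](20)?07/';

// Lines that match the pattern.
$matching = preg_grep($pattern, $old_lines);

// Lines that do NOT match, using the flag instead of a negated pattern.
$others = preg_grep($pattern, $old_lines, PREG_GREP_INVERT);

var_dump($matching, $others);
```

preg_grep() keeps the original array keys, so the two results here carry keys 0, 1 and 2, 3 respectively. Pass the result through array_values() if a re-indexed list is needed.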
Replacing text
// Capture the parts around the @ and the first period, then rebuild the address from them.
$post = preg_replace(
"/\b([^@\s]+)@([a-zA-Z\d_-]+)\.([a-zA-Z\d_.-]+)\b/",
'$1 [AT] $2 [dot] $3', $post);
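One way to get the address [AT] site [dot] com result is to capture the pieces of each address and reassemble them with backreferences ($1, $2, $3) in a single replacement string. The sketch below assumes that approach; the contents of $post are made up, and the pattern is a loose match for addresses, not a validator.

```php
<?php
// Hypothetical post text containing two addresses.
$post = 'Mail bob@example.com or alice@site.org today.';

// Group 1: local part, group 2: first domain label, group 3: the rest of the domain.
$pattern = '/\b([^@\s]+)@([a-zA-Z\d_-]+)\.([a-zA-Z\d_.-]+)\b/';

// $1, $2 and $3 in the replacement refer back to the captured groups.
$masked = preg_replace($pattern, '$1 [AT] $2 [dot] $3', $post);

echo $masked, "\n";
```

Only the @ and the first period after it are rewritten; any further periods in a longer domain are left as they are.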
Splitting text
• $date_parts = preg_split("/[-.,\/\\\\\s]+/", $date_string); // four backslashes in the PHP string give one literal backslash in the character class
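A small sketch of splitting dates written with different separators. The date strings are invented for the example. Matching a literal backslash takes four backslashes in the PHP string, because PHP and the regex engine each consume one level of escaping.

```php
<?php
// Split on runs of hyphens, periods, commas, slashes, backslashes or white space.
$pattern = '/[-.,\/\\\\\s]+/';

// Hypothetical inputs using different separators.
$date_strings = ['2007-01-15', '15/01/2007', '15. 01, 2007', '15\\01\\2007'];

$all_parts = [];
foreach ($date_strings as $date_string) {
    $all_parts[] = preg_split($pattern, $date_string);
}

var_dump($all_parts);
```

Because the character class is followed by +, a separator run such as a period followed by a space counts as a single split point and produces no empty parts.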
Tips
• Comment what your regular expression is doing.
• Test your regular expression for speed. Some can cause a noticeable slowdown.
• There are plenty of simple uses like /Width: (\d+)/
• Watch out for greedy expressions. E.g. /(<(.+)>)/ will not pull out "b" and "/b" from "<b>test</b>" but instead will pull "b>test</b". An easy way to change this behaviour is to make the quantifier lazy by adding a ?, like this: /(<(.+?)>)/
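The greedy and lazy patterns from the last tip can be compared side by side, and the first tip can be followed with the x modifier, which lets a pattern span several lines and carry # comments. This is a sketch with made-up input strings; the variable names are illustrative.

```php
<?php
$html = '<b>test</b>';

// Greedy: .+ grabs as much as it can, so there is one long match.
preg_match_all('/(<(.+)>)/', $html, $greedy);

// Lazy: .+? stops at the first > it can, so each tag is matched separately.
preg_match_all('/(<(.+?)>)/', $html, $lazy);

// The x modifier ignores white space in the pattern and allows # comments.
// (A comment must not contain the delimiter character.)
$width_pattern = '/
    Width:\s   # literal label followed by one white space character
    (\d+)      # capture the number itself
/x';
preg_match($width_pattern, 'Width: 640', $width);

var_dump($greedy[2], $lazy[2], $width[1]);
```

A negated class such as [^>]+, as used in the Find all example, is another common way to avoid the greedy overrun without relying on lazy quantifiers.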
References
• http://en.wikipedia.org/wiki/Regular_expressions
• http://php.net/manual/en/ref.pcre.php
• Thank you