Regular Expressions in ColdFusion Applications

Dave Fauth
DOMAIN technologies

d.fauth@domain-tech.com

Knowledge Engineering : Systems Integration : Web Development : Training

Regular Expressions

• A small language in itself for pattern matching and text manipulation
• Used for client-side validation, server-side manipulation, and virtually any other task requiring string matching and manipulation
• Enhanced in CF 4.0 to include REFindNoCase and REReplaceNoCase
• Available in CF Studio, CF 4.x, and JavaScript

CF 4.x supported functions

• REFind
• REFindNoCase
• REReplace
• REReplaceNoCase

REFind

• Returns the position of the regular expression's first occurrence in a block of text
• Case sensitive
• REFind(reg_expression, string [, start] [, returnsubexpressions])

<CFSET tmpLoc = REFind("[\?&]", "display.cfm?a=3")>

<CFOUTPUT>

#tmpLoc#

</CFOUTPUT>

REFindNoCase

• Returns the position of the regular expression's first occurrence in a block of text
• Case insensitive
• REFindNoCase(reg_expression, string [, start] [, returnsubexpressions])

<CFSET myPath = "c:\reportFinder.cfm">

<CFSET tmpLoc = REFindNoCase("(\.cfm)", myPath)>

<CFOUTPUT>

#tmpLoc#

</CFOUTPUT>

REReplace

• Returns a string in which the text matched by the regular expression is replaced with a substring, within the specified scope
• Case sensitive
• REReplace(string, reg_expression, substring [, scope])

<CFSET myPath = "c:\reportFinder.cfm">

<CFSET tmpLoc = REReplace(myPath, "[a-z]:", "d:")>

<CFOUTPUT>

#tmpLoc#

</CFOUTPUT>

REReplaceNoCase

• Returns a string in which the text matched by the regular expression is replaced with a substring, within the specified scope
• Case insensitive
• REReplaceNoCase(string, reg_expression, substring [, scope])

<CFSET myPath = "c:\reportFinder.cfm">

<CFSET tmpLoc = REReplaceNoCase(myPath, "[A-Z]:", "d:")>

<CFOUTPUT>

#tmpLoc#

</CFOUTPUT>
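
The effect of the optional scope argument and of the NoCase variant can be seen side by side. The following is a minimal sketch; the variable names and the sample value are hypothetical and not taken from the slides.

```cfml
<!--- Hypothetical sample value, used only for illustration --->
<CFSET csv = "Red,green,RED">

<!--- No scope given: only the first match is replaced, giving Red;green,RED --->
<CFSET firstOnly = REReplace(csv, ",", ";")>

<!--- Scope ALL: every match is replaced, giving Red;green;RED --->
<CFSET everyComma = REReplace(csv, ",", ";", "ALL")>

<!--- Case sensitive: "red" matches neither "Red" nor "RED", so the string is unchanged --->
<CFSET sensitive = REReplace(csv, "red", "blue", "ALL")>

<!--- Case insensitive: both "Red" and "RED" are replaced, giving blue,green,blue --->
<CFSET insensitive = REReplaceNoCase(csv, "red", "blue", "ALL")>

<CFOUTPUT>
#firstOnly# #everyComma# #sensitive# #insensitive#
</CFOUTPUT>
```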

Single Character Matching

• Match a single character
• Extensive set of rules for doing single character matching
• Rules include:
   • Special characters are: + * ? . [ ^ $ ( ) { | \
   • Any character that is not a special character matches itself
   • A backslash escapes a special character
   • A period matches any character except the newline
   • A set of characters in brackets [] is a one-character RE that matches any of the characters in the set. [AKM] matches A or K or M

Single Character Matching Cont.

• Rules cont.
   • Any regular expression can be followed by {m,n}, which forces a match of m through n occurrences of the preceding regular expression. Example: a{2,4} matches aa, aaa, or aaaa
   • A range of characters can be indicated with a dash. Example: [A-Z] matches all uppercase letters
   • If the first character of the set is a ^, the RE matches any character except those in the set. For example, [^AEIOU] matches any character that is not an uppercase vowel (consonants, but also lowercase letters, digits, and punctuation)
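
The single-character rules can be tried out with REFind, which returns the position of the first match. This is a small sketch; the sample strings and variable names are hypothetical.

```cfml
<!--- Hypothetical sample value, used only for illustration --->
<CFSET sample = "Kaaab.cfm">

<!--- [AKM] matches one A, K, or M: the K at position 1 --->
<CFSET setPos = REFind("[AKM]", sample)>

<!--- a{2,4} needs two to four consecutive a's: the run starts at position 2 --->
<CFSET runPos = REFind("a{2,4}", sample)>

<!--- The backslash escapes the period, so only a literal "." matches: position 6 --->
<CFSET dotPos = REFind("\.", sample)>

<!--- [^AEIOU] skips the uppercase vowels A and E and matches the b at position 3 --->
<CFSET notVowelPos = REFind("[^AEIOU]", "AEb")>

<CFOUTPUT>
#setPos# #runPos# #dotPos# #notVowelPos#
</CFOUTPUT>
```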

Multi-Character Regular Expressions

• You can use the following rules to build multi-character regular expressions:
   • Parentheses group parts of regular expressions together into grouped sub-expressions that can be treated as a single unit. For example, (ha)+ matches one or more occurrences of "ha".
   • A one-character regular expression or grouped sub-expression followed by an asterisk (*) matches zero or more occurrences of the regular expression. For example, [a-z]* matches zero or more lower-case characters.
   • A one-character regular expression or grouped sub-expression followed by a question mark (?) matches zero or one occurrence of the regular expression. For example, xy?z matches either "xyz" or "xz".

Multi-Character cont.

• The concatenation of regular expressions creates a regular expression that matches the corresponding concatenation of strings. For example, [A-Z][a-z]* matches any capitalized word.
• The OR character (|) allows a choice between two regular expressions. For example, jell(y|ies) matches either "jelly" or "jellies".
• Braces ({}) are used to indicate a range of occurrences of a regular expression, in the form {m,n}, where m is an integer greater than or equal to zero indicating the start of the range and n is greater than or equal to m, indicating the end of the range. For example, (ba){0,3} matches up to three consecutive occurrences of "ba".
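
The multi-character rules combine in the same way inside REFind and REReplace. The snippet below is a sketch only; the sample strings and variable names are made up for illustration.

```cfml
<!--- Hypothetical sample value, used only for illustration --->
<CFSET phrase = "two jellies and a Jelly roll">

<!--- Alternation: matches "jellies" at position 5; REFind is case sensitive, so "Jelly" is not considered --->
<CFSET altPos = REFind("jell(y|ies)", phrase)>

<!--- Concatenation: an uppercase letter followed by lowercase letters finds "Jelly" at position 19 --->
<CFSET capPos = REFind("[A-Z][a-z]*", phrase)>

<!--- Optional character: xy?z matches "xz" at position 2 of "axz" --->
<CFSET optPos = REFind("xy?z", "axz")>

<!--- Grouping with +: the whole run "hahaha" is one match, so the result is LOL! --->
<CFSET laugh = REReplace("hahaha!", "(ha)+", "LOL")>

<CFOUTPUT>
#altPos# #capPos# #optPos# #laugh#
</CFOUTPUT>
```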

Character Classes

• Special commands that can take the place of character ranges
• CF uses double brackets with colons, for example [[:alpha:]]
• ColdFusion supports the following character classes:
   • alpha - Matches any letter. Same as [A-Za-z].
   • upper - Matches any upper-case letter. Same as [A-Z].
   • lower - Matches any lower-case letter. Same as [a-z].
   • digit - Matches any digit. Same as [0-9].
   • alnum - Matches any alphanumeric character. Same as [A-Za-z0-9].

Character Classes cont.

• xdigit - Matches any hexadecimal digit. Same as [0-9A-Fa-f].
• space - Matches a tab, new line, vertical tab, form feed, carriage return, or space.
• print - Matches any printable character.
• punct - Matches any punctuation character, that is, one of ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~
• graph - Matches any of the characters defined as a printable character except those defined to be part of the space character class.
• cntrl - Matches any character not part of the character classes [:upper:], [:lower:], [:alpha:], [:digit:], [:punct:], [:graph:], [:print:], or [:xdigit:].

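Character Classes example

<cfset thistext="<hr>Here is some text<hr>
<b>here is some bold text</b>
<i>Here is italic text.</i>">

<!--- [[:print:]] matches exactly one character, so only tags with a one-character name (<b>, </b>, <i>, </i>) are removed; <hr> is left in place --->
<cfset mynewtext = REReplaceNoCase(thistext, "<[/]*[[:print:]]>", "", "ALL")>

<!--- Removes every tag: a "<", any run of characters other than ">", then the closing ">" --->
<cfset mynewtext2 = REReplace(thistext, "<[^>]*>", "", "ALL")>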

Back Referencing

• Capability of regular expressions to remember a section of text and refer to it later
• Parentheses provide grouping for back references
• Groups are referred to using '\1' through '\9'
• Sub-expressions are counted from left to right, by opening parenthesis
   ex. "(a(bc)(d))"
       \1 = a(bc)(d)
       \2 = bc
       \3 = d
• Powerful for search and replace functions

Back Referencing example

<cfset secondstring = "here is my email address [email protected] ">

<CFSET NewString = REReplaceNoCase( secondstring,'([[:space:]])([a-z0-9\.]+@([[:print:]]+\.)+[a-z]{2,3})([[:space:]])', '\1<A HREF="mailto:\2">\2</A>\4', "ALL")>
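
A smaller case may make the numbering easier to follow. The sketch below, which uses a hypothetical name as sample data, swaps two captured groups in the replacement string.

```cfml
<!--- Hypothetical sample value in "Last, First" form --->
<CFSET fullName = "Smith, Pat">

<!--- \1 holds the first parenthesized group (the last name), \2 the second (the first name).
      The replacement emits them in the opposite order, giving Pat Smith --->
<CFSET swapped = REReplace(fullName, "([A-Za-z]+), ([A-Za-z]+)", "\2 \1")>

<CFOUTPUT>
#swapped#
</CFOUTPUT>
```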

Using Regular Expressions in Studio

• Extended Find and Extended Replace in Studio and HomeSite support regular expressions

• Open the Extended Find or the Extended Replace dialog box. Check the regular expressions box. Type in your regular expression. The Studio RE engine evaluates the selected files and returns each matching pattern.

Example Uses of Regular Expressions

• Removing HTML Tags from Text

<cfset text = "Our Price: $14.98 ">
<cfset amazonPrice = REFindNoCase('\$[[:digit:]]{1,4}\.[[:digit:]]{2}', text, 1, 1)>

• Retrieving Information from a page

REFindNoCase("<body[^>]*>(.*)</body>", pagetext, 1, "TRUE")

REFindNoCase("[[:upper:]]{6}-[[:digit:]]{2}-[[:digit:]]{4,6}", Body)
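
When the fourth argument is true, REFindNoCase returns a structure with pos and len arrays instead of a single position, and Mid can then pull out the matched text. The following is a sketch under that assumption; the page fragment and variable names are hypothetical.

```cfml
<!--- Hypothetical page fragment, used only for illustration --->
<CFSET pagetext = "<html><body bgcolor='white'>Hello</body></html>">

<CFSET match = REFindNoCase("<body[^>]*>(.*)</body>", pagetext, 1, "TRUE")>

<!--- pos[1] and len[1] describe the whole match; pos[2] and len[2] describe the
      first parenthesized sub-expression. pos[1] is 0 when nothing matched. --->
<CFIF match.pos[1] GT 0>
    <CFSET bodyText = Mid(pagetext, match.pos[2], match.len[2])>
<CFELSE>
    <CFSET bodyText = "">
</CFIF>

<!--- For the sample above this outputs Hello --->
<CFOUTPUT>
#bodyText#
</CFOUTPUT>
```

Checking pos[1] before calling Mid avoids passing a zero start position when the pattern is not found.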

When Not To Use Regular Expressions

• When it is easier to use something else

Example: <cfset myuser = "engr\dbrown">

Rather than write:

<cfset testname = "engr\dbrown">
<cfset myUsername = REFindNoCase(".*\\(.*)", testname, 1, 1)>
<cfset myUsername = Mid(testname, myUsername.pos[2], myUsername.len[2])>

Write:

<cfset myUsername = ListLast(testname, "\")>

Cold Fusion RE Limitation

• Limiting input string size

In CFML RegExp functions such as REFind and REReplace, large input strings (greater than approximately 20,000 characters) will cause a debug assertion failure, and a regular expression error will be reported. To avoid this, break up your input into smaller chunks as illustrated in the following example. Here the variable input has a size greater than 50000.

<CFSET test = mid(input, 1, 20000)>
<CFSET out1 = REReplace(test, "[ #Chr(9)##Chr(13)##Chr(10)#]+#Chr(13)##Chr(10)#", "#chr(10)#", "ALL")>
<CFSET test = mid(input, 20001, 20000)>
<CFSET out2 = REReplace(test, "[ #Chr(9)##Chr(13)##Chr(10)#]+#Chr(13)##Chr(10)#", "#chr(10)#", "ALL")>
<CFSET test = mid(input, 40001, len(input) - 40000)>
<CFSET out3 = REReplace(test, "[ #Chr(9)##Chr(13)##Chr(10)#]+#Chr(13)##Chr(10)#", "#chr(10)#", "ALL")>
<CFSET result = out1 & out2 & out3>
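
The three fixed slices only cover inputs of up to 60,000 characters. One way to generalize the same idea is a CFLOOP over the input; this is a sketch, not part of the original slides, and chunkSize, startPos, chunk, and the generated sample input are all hypothetical names and values.

```cfml
<!--- Hypothetical sample input: 3,000 lines with trailing spaces, 39,000 characters in total --->
<CFSET input = RepeatString("line one   #Chr(13)##Chr(10)#", 3000)>

<!--- Assumed safe chunk size, taken from the approximate limit quoted above --->
<CFSET chunkSize = 20000>
<CFSET result = "">

<!--- Walk the input in chunkSize steps; Mid returns a shorter final chunk when fewer characters remain --->
<CFLOOP INDEX="startPos" FROM="1" TO="#Len(input)#" STEP="#chunkSize#">
    <CFSET chunk = Mid(input, startPos, chunkSize)>
    <CFSET result = result & REReplace(chunk, "[ #Chr(9)##Chr(13)##Chr(10)#]+#Chr(13)##Chr(10)#", "#Chr(10)#", "ALL")>
</CFLOOP>
```

As with the fixed slices, a match that straddles a chunk boundary is not seen as a whole, so results near the boundaries can differ from a single pass over the full string.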

Resources

• JavaScript
   http://developer.netscape.com/library/documentation/communicator/jsguide/regexp.htm
   http://developer.netscape.com/docs/examples/javascript/regexp/overview.htm
   JavaScript Bible, 3rd Edition, by Danny Goodman
• CF Studio
   file:///C|/PROGRAM FILES/ALLAIRE/COLDFUSION STUDIO4/Help/Developing_Web_Applications_with_ColdFusion/08_Regular_Expressions
• Cold Fusion
   Advanced Cold Fusion 4.0 Application Development by Ben Forta
   CF-Talk Mailing List: [email protected]
• General
   An excellent reference on regular expressions is Mastering Regular Expressions, Jeffrey E. F. Friedl. O'Reilly & Associates, Inc., 1997. ISBN: 1-56592-257-3, http://www.oreilly.com.