CIS 451: Regular Expressions Dr. Ralph D. Westfall January, 2009.

Regular Expressions

find patterns in text strings
based on the idea of wild cards
    e.g., dir s*.doc in DOS/Windows finds any .doc file whose name starts with s, followed by any other character(s)
can find much more complicated patterns
    e.g., any one of a specified list of characters, rather than just one specific character or any character
    e.g., any character except ones in a list
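The ideas on this slide can be sketched as a small VB.NET console program. This is only an illustration: the module name, the file names, and the gray/grey sample words are made up for the example, and the first pattern is a rough regex equivalent of the DOS wildcard, not an exact one (DOS ignores case, this pattern does not).

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module WildcardDemo
    Sub Main()
        ' Rough regex equivalent of the DOS wildcard s*.doc:
        ' "s" at the start, then any characters, then a literal ".doc" at the end.
        Dim docFile As New Regex("^s.*\.doc$")
        Console.WriteLine(docFile.IsMatch("sales.doc"))   ' True
        Console.WriteLine(docFile.IsMatch("notes.doc"))   ' False

        ' Any one of a specified list of characters: a or e.
        Dim listed As New Regex("gr[ae]y")
        Console.WriteLine(listed.IsMatch("grey"))         ' True
        Console.WriteLine(listed.IsMatch("groy"))         ' False

        ' Any character except the ones in the list.
        Dim excluded As New Regex("gr[^ae]y")
        Console.WriteLine(excluded.IsMatch("groy"))       ' True
        Console.WriteLine(excluded.IsMatch("gray"))       ' False
    End Sub
End Module
```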

RegEx Class

implements the concept of regular expressions in .NET languages
could be used in e-commerce for input validation
    checking user inputs such as e-mail addresses, credit card #s, telephone #s, etc. to see if they match the typical patterns

Regular Expression Object

create a regular expression object
    Dim reg As Regex
    reg = New Regex("[pattern]")
could also skip creating a Regex object, and provide the pattern as an argument to a Shared Regex method e.g., Regex.IsMatch([string to search], "[pattern]")
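Both ways of supplying a pattern can be sketched as follows. The module name, the sample strings, and the digit pattern are assumptions chosen for illustration; Regex.IsMatch(input, pattern) is the Shared form from System.Text.RegularExpressions.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RegexObjectDemo
    Sub Main()
        ' Option 1: create the object with its pattern, then reuse it.
        Dim reg As Regex
        reg = New Regex("[0-9]+")
        Console.WriteLine(reg.IsMatch("Room 451"))               ' True

        ' Option 2: no object; pass the pattern to the Shared method.
        Console.WriteLine(Regex.IsMatch("Room 451", "[0-9]+"))   ' True
        Console.WriteLine(Regex.IsMatch("Room ABC", "[0-9]+"))   ' False
    End Sub
End Module
```

Creating the object once is usually the better fit when the same pattern is applied to many inputs.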

.NET RegEx Patterns

.       ba. finds bat, bal, bac (matches any single character)
?       bottles? finds bottle and bottles (0 or 1 of the character before ?)
[3-9]   finds 3, 4, … 9 (any 1 character in specified range)
+       to+ finds to, too (1 or more of the character before +)
{#}     b{2} finds any bb (# of the preceding character in a row)
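The patterns above can be tried out with a short console sketch. ShowMatches is a hypothetical helper written for this example (it is not part of .NET), and the sample texts are made up; it relies on the Shared Regex.Matches method to list every match.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module PatternDemo
    ' Hypothetical helper: prints every match of a pattern in some text.
    Sub ShowMatches(ByVal pattern As String, ByVal text As String)
        Console.Write(pattern & " in """ & text & """:")
        For Each m As Match In Regex.Matches(text, pattern)
            Console.Write(" " & m.Value)
        Next
        Console.WriteLine()
    End Sub

    Sub Main()
        ShowMatches("ba.", "bat bal bac")               ' bat bal bac
        ShowMatches("bottles?", "1 bottle, 2 bottles")  ' bottle bottles
        ShowMatches("[3-9]", "1 2 3 9")                 ' 3 9
        ShowMatches("to+", "to too")                    ' to too
        ShowMatches("b{2}", "rabbit")                   ' bb
    End Sub
End Module
```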

.NET RegEx Patterns - 2

|       or operator e.g., a|b finds a or b
\       escape character to identify literals e.g., \. finds a period
can combine different parameters
    [a-z]{3} finds any 3 lowercase letters in a row e.g., a 3-letter word
    [0-9]{4}( |-)? finds any 4 #s, followed by 0 or 1 occurrence of space or dash (could use in credit card validation)
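A minimal sketch of the or operator, the escape character, and the two combined patterns. The module name and sample strings are assumptions for illustration only.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CombinedPatternDemo
    Sub Main()
        ' | is "or": cat|dog matches either word.
        Console.WriteLine(Regex.IsMatch("hot dog", "cat|dog"))      ' True

        ' An unescaped . matches any character; \. matches only a literal period.
        Console.WriteLine(Regex.IsMatch("fileXdoc", "file.doc"))    ' True
        Console.WriteLine(Regex.IsMatch("fileXdoc", "file\.doc"))   ' False

        ' [a-z]{3} matches 3 lowercase letters in a row, even inside a longer word.
        Console.WriteLine(Regex.Match("hello", "[a-z]{3}").Value)   ' hel

        ' 4 digits followed by an optional space or dash.
        Console.WriteLine(Regex.Match("1234-5678", "[0-9]{4}( |-)?").Value)   ' 1234-
    End Sub
End Module
```

The "hello" case shows why [a-z]{3} alone does not prove that the input is a 3-letter word.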

Regular Expression Object - 2

methods
    reg.IsMatch([string to search])
        returns True if pattern is found in string
    reg.Match([string to search])
        returns a Match object holding the first part of the string that matches the pattern (reg.Matches returns all of them)
    reg.Replace([string to search], [replacement string])
        replaces every match of the pattern in the string with the replacement string
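The methods can be compared side by side in a short sketch. The module name, the sample sentence, and the "#" replacement are made up for the example; NextMatch and Matches are included to show how matches after the first one can be reached.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RegexMethodsDemo
    Sub Main()
        Dim reg As New Regex("[0-9]+")
        Dim sample As String = "Order 66 ships in 3 days"

        ' IsMatch: is the pattern found anywhere in the string?
        Console.WriteLine(reg.IsMatch(sample))        ' True

        ' Match: the first match; NextMatch moves on to the following one.
        Dim first As Match = reg.Match(sample)
        Console.WriteLine(first.Value)                ' 66
        Console.WriteLine(first.NextMatch().Value)    ' 3

        ' Matches: all matches as a collection.
        Console.WriteLine(reg.Matches(sample).Count)  ' 2

        ' Replace: every match is swapped for the replacement string.
        Console.WriteLine(reg.Replace(sample, "#"))   ' Order # ships in # days
    End Sub
End Module
```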

Using Regular Expressions

    Dim ok As Boolean
    Dim reg As New Regex("[0-9]{4}( |-)?" & _
        "[0-9]{4}( |-)?[0-9]{4}( |-)?[0-9]{4}", _
        RegexOptions.Compiled)
    ok = reg.IsMatch("1234-4323-9876-6543")
    If ok Then
        Response.Write("String is OK")
    End If

Warning

a regular expression may say a string is OK even if there are other characters around it e.g.,
    Dim input As String
    Dim ok As Boolean
    input = "XYZ1234-4323-9876-6543XYZ"
    ok = reg.IsMatch(input)   'credit card pattern
    If ok Then
        MsgBox(input & " is OK")
    End If
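One common way to avoid this problem, not shown on the slides, is to anchor the pattern with ^ (start of string) and $ (end of string) so that the whole input has to match. The sketch below assumes the same credit card pattern; the module and variable names are made up for the example.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module AnchorDemo
    Sub Main()
        Dim card As String = "[0-9]{4}( |-)?[0-9]{4}( |-)?[0-9]{4}( |-)?[0-9]{4}"
        Dim loose As New Regex(card)                ' pattern may match anywhere
        Dim strict As New Regex("^" & card & "$")   ' pattern must span the whole string

        Dim candidate As String = "XYZ1234-4323-9876-6543XYZ"
        Console.WriteLine(loose.IsMatch(candidate))               ' True
        Console.WriteLine(strict.IsMatch(candidate))              ' False
        Console.WriteLine(strict.IsMatch("1234-4323-9876-6543"))  ' True
    End Sub
End Module
```

In .NET, $ also matches just before a final newline, so an input ending in a line break would still pass the anchored pattern; \z is the stricter alternative.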

Extracting Matches from Strings

can use .Match() function to separate matching part from other characters around it e.g.,
    Dim input As String
    input = "XYZ1234-4323-9876-6543XYZ"
    If reg.IsMatch(input) Then
        MsgBox(reg.Match(input).Value & " is OK")
    End If
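The same extraction can be written with a single call, as a variation on the slide's approach: Match returns a Match object whose Success property says whether anything was found, so IsMatch does not need to run first. The module name is an assumption, and Console.WriteLine stands in for MsgBox so the sketch runs as a console program.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ExtractDemo
    Sub Main()
        Dim reg As New Regex("[0-9]{4}( |-)?[0-9]{4}( |-)?[0-9]{4}( |-)?[0-9]{4}")
        Dim m As Match = reg.Match("XYZ1234-4323-9876-6543XYZ")

        If m.Success Then
            ' Value is the matching text; Index is where it starts (0-based).
            Console.WriteLine(m.Value & " found at position " & m.Index.ToString())
        Else
            Console.WriteLine("no match")
        End If
        ' Prints: 1234-4323-9876-6543 found at position 3
    End Sub
End Module
```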

More Regular Expressions Info

Using Regular Expressions with .NET
Regular Expressions Reference
Google search (5+ million occurrences)
history of regular expressions
    concepts date back to 1940s (before computers)
T-shirt

RegEx Exercises

1 explain the following e-mail address pattern
    [a-z]+@[a-z]+\.com
2 extend it to handle following endings: com, edu, org, net, gov, mil, int (must appear at least once)
3 modify it to allow numbers, dashes [-] and periods [.] after the 1st character
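A small test harness can help with these exercises. The sketch below is only a starting point: the module name and the sample addresses are made up, and the pattern is the unmodified one from exercise 1. Swap in a revised pattern and rerun to see which samples match and which part of each sample matched.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module EmailPatternTester
    Sub Main()
        ' Pattern from exercise 1; replace it with your own for exercises 2 and 3.
        Dim pattern As String = "[a-z]+@[a-z]+\.com"

        ' Hypothetical sample inputs.
        Dim samples() As String = {"joe@example.com", "joe@example.edu", "j.smith@example.com"}

        For Each sample As String In samples
            Dim m As Match = Regex.Match(sample, pattern)
            If m.Success Then
                ' Show the exact text that matched, which may be only part of the sample.
                Console.WriteLine(sample & " -> matched """ & m.Value & """")
            Else
                Console.WriteLine(sample & " -> no match")
            End If
        Next
    End Sub
End Module
```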

RegEx Exercises - 2

create patterns to validate
    zip codes both as 5-digit and Zip + 4 e.g., 90702 or 90702-7934
    phone #s (intnl, long distance, and local)
    credit card numbers
    names (including middle initial, von, de, etc.)
    course #s (CIS, EBZ, CS; 1xx-4xx, etc.)

RegEx Exercises - 3

test text or file samples against regular expressions at Regular Expression Library's tester page and report back
test regular expressions
    from previous pages in this PowerPoint
    others that you make up
    from Regular Expression Library

RegEx Exercises - 4

use Regular Expression Library's tester page to Load a Data Source from a URL and find data in the web page using a regular expression that you specify
OR find free (trial) software that will do the same thing
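The idea behind this exercise can also be sketched in VB.NET. To stay self-contained, the example searches a hard-coded HTML string (a made-up stand-in for a downloaded page) instead of loading a URL; in a real program the text could come from something like System.Net.WebClient.DownloadString. The parentheses form a capture group, which goes slightly beyond the patterns covered in these slides.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LinkFinder
    Sub Main()
        ' Stand-in for a downloaded page (hypothetical sample HTML).
        Dim html As String = "<a href=""http://example.com/a"">A</a> <a href=""http://example.com/b"">B</a>"

        ' Capture whatever sits between the quotes of each href attribute.
        Dim linkPattern As New Regex("href=""([^""]*)""")

        For Each m As Match In linkPattern.Matches(html)
            ' Groups(1) is the text captured by the parentheses.
            Console.WriteLine(m.Groups(1).Value)
        Next
        ' Prints:
        ' http://example.com/a
        ' http://example.com/b
    End Sub
End Module
```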