
Perl regular expressions

This Powerpoint file can be found at:

http://www.ku.edu/pri/ksdata/sashttp/kcasug2004-10

Kansas City Area SAS User Group (KCASUG)

October 5, 2004

Larry Hoyle

Policy Research Institute, The University of Kansas

Regular expressions

• A regular expression is a pattern to be matched against some text (a string)

• originally from neurophysiology
• then in QED and grep

• see: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnaspp/html/regexnet.asp

Perl regular expressions

• Perl (Practical Extraction and Report Language) implements a version of regular expressions that has become something of a de facto standard

• see: http://www.perldoc.com/perl5.6.1/pod/perlre.html

SAS Documentation

Short syntax description

Some simple examples

/Baa/ matches the string "Baa"

/Baa\d/ matches "Baa" followed by any numeric digit

Using Perl Regular Expressions in SAS 9.1 and above

data cc;
   input c $;
   prxNum=prxParse('/Baa\d/');   /* compile the pattern */
   start=prxMatch(prxNum,c);     /* position of the match, 0 if none */
   if start then put c= 'is a match';
   else put c= 'does not match';
datalines;
Baa
Baa2
baa3
aaaa
Baa3
;
run;

proc sql;
   select * from cc where prxmatch('/Baa\d/',c);
quit;

Documentation for PRX Functions and Call Routines in SAS HELP

CALL PRXCHANGE: Performs a pattern-matching replacement

CALL PRXDEBUG: Enables Perl regular expressions in a DATA step to send debug output to the SAS log

CALL PRXFREE: Frees unneeded memory that was allocated for a Perl regular expression

CALL PRXNEXT: Returns the position and length of a substring that matches a pattern and iterates over multiple matches within one string

CALL PRXPOSN: Returns the start position and length for a capture buffer

CALL PRXSUBSTR: Returns the position and length of a substring that matches a pattern

PRXCHANGE Function: Performs a pattern-matching replacement

PRXMATCH Function: Searches for a pattern match and returns the position at which the pattern is found

PRXPAREN Function: Returns the last bracket match for which there is a match in a pattern

PRXPARSE Function: Compiles a Perl regular expression (PRX) that can be used for pattern matching of a character value

PRXPOSN Function: Returns the value for a capture buffer
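As a rough illustration of how several of these routines fit together, the sketch below uses PRXPARSE, CALL PRXNEXT, and CALL PRXFREE to list every run of digits in one string. The data set name (digits), the variable names, and the sample text are made up for this example.

```sas
/* Sketch: iterate over all matches in one string with CALL PRXNEXT.   */
/* Data set name, variable names and sample text are illustrative.     */
data digits;
   length text $ 40 found $ 10;
   keep found pos len;
   text = 'Baa2 has 3 bags and 15 sheep';
   prxNum = prxParse('/\d+/');          /* compile the pattern once    */
   start = 1;                           /* where the search begins     */
   stop = length(text);                 /* where the search ends       */
   call prxNext(prxNum, start, stop, text, pos, len);
   do while (pos > 0);                  /* pos is 0 when nothing left  */
      found = substr(text, pos, len);
      put found= pos= len=;
      output;
      call prxNext(prxNum, start, stop, text, pos, len);
   end;
   call prxFree(prxNum);                /* release the compiled pattern */
run;
```

Each call to PRXNEXT advances start past the previous match, which is what lets the loop walk through the string.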

single character "wildcards"

. matches any character

\d matches a numeric character

\D matches a non-numeric character

\w matches a "word character" (letter, digit, or underscore)

\W matches a non-word character

\s matches white space (spaces, tabs, and line breaks)

\S matches non-white space
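A quick way to see what each single-character wildcard does is to apply all of them to the same string and compare the positions returned by PRXMATCH. This is only a sketch; the data set name (wildcards), the variable names, and the sample text are assumptions made for the example.

```sas
/* Sketch: first match position of each wildcard in 'Baa3 sheep'.      */
data wildcards;
   text = 'Baa3 sheep';
   anyChar  = prxMatch('/./',  text);   /* 1: the "B"                  */
   digit    = prxMatch('/\d/', text);   /* 4: the "3"                  */
   nonDigit = prxMatch('/\D/', text);   /* 1: the "B"                  */
   wordChar = prxMatch('/\w/', text);   /* 1: the "B"                  */
   nonWord  = prxMatch('/\W/', text);   /* 5: the blank                */
   space    = prxMatch('/\s/', text);   /* 5: the blank                */
   nonSpace = prxMatch('/\S/', text);   /* 1: the "B"                  */
   put anyChar= digit= nonDigit= wordChar= nonWord= space= nonSpace=;
run;
```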

Try a different pattern for expr

data myturn;
   retain expr '/Whatever/';   /* put your own expression here */
   retain prxNum;
   length c $ 80;
   input c $80.;
   if _n_=1 then do;
      prxNum=prxParse(expr);
      /* prxNum is not a positive id when the expression fails to compile */
      if not (prxNum > 0) then put 'bad expression ' expr= ;
   end;
   start=prxMatch(prxNum,c);
   put start= c= ;
datalines;
Whatever floats your boat
Now is the time
for all-good
men 2
come to the aid of their country.
the quick brown fox jumped over the lazy dog
The quick red fox jumped over the 3 lazy dogs
You could replace this with whatever text you wanted.
;
run;

find all the numbers
find the first space on each line
find any non word characters

sample expressions

find all the numbers /\d/
find the first space on each line /\s/
find any non word characters /\W/

Anchors

^ beginning of the string

$ end of the string
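One point worth checking when using anchors in SAS is that character variables are padded with trailing blanks, so $ is compared against the padded value. The sketch below, with a made-up data set name (anchors) and sample text, contrasts an unanchored variable with a trimmed one.

```sas
/* Sketch: anchors against a blank-padded character variable.          */
data anchors;
   length line $ 30;
   line = 'the lazy dog';
   atStart  = prxMatch('/^the/',     line);        /* 1                        */
   notStart = prxMatch('/^lazy/',    line);        /* 0: "lazy" is not first   */
   padded   = prxMatch('/dog$/',     line);        /* 0: blanks follow "dog"   */
   trimmed  = prxMatch('/dog$/',     trim(line));  /* 10                       */
   allowPad = prxMatch('/dog\s*$/',  line);        /* 10: pattern allows blanks */
   put atStart= notStart= padded= trimmed= allowPad=;
run;
```

Either trimming the source or allowing \s* before $ works; which one reads better is a matter of taste.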

Character Classes

[acB] matches "a", "c" or "B"

[D-G] matches "D", "E", "F", or "G"

[^aeiouyAEIOUY] matches any non vowel

Search for words

data mywords;
   /* words starting with a-d */
   retain expr '/^[a-dA-D]/';
   retain prxNum;
   length word $ 50;
   input word $50.;
   if _n_=1 then do;
      prxNum=prxParse(expr);
      /* prxNum is not a positive id when the expression fails to compile */
      if not (prxNum > 0) then put 'bad expression ' expr= ;
   end;
   start=prxMatch(prxNum,word);
   put start= word= ;
   if start>0;   /* keep only the words that match */
datalines;
a
boo
cwm
Dublin
oocyte
pneumonoultramicroscopicsilicovolcanoconiosis
qat
Washington
;
run;

find all the proper names
find words with a "q" not followed by a "u"


How about?

find all the proper names /[A-Z]/

find words with a "q" not followed by a "u" /q[^u]/

Multipliers

{n} previous expression n times e.g. {3}

{n,} previous expression n or more times

{n,m} previous expression from n to m times

{0,m} previous expression m or fewer times

* previous expression 0 or more times {0,}

+ previous expression 1 or more times {1,}

? previous expression 0 or 1 times {0,1}
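The multipliers can be tried out on short literal strings, where there is no blank padding to worry about. The following is a small sketch; the data set name (multipliers), the variable names, and the test strings are invented for illustration.

```sas
/* Sketch: each multiplier applied to a short literal string.          */
data multipliers;
   exactly3   = prxMatch('/^\d{3}$/',   '123');    /* 1: exactly three digits */
   tooFew     = prxMatch('/^\d{3}$/',   '12');     /* 0: only two digits      */
   twoToFour  = prxMatch('/^\d{2,4}$/', '1234');   /* 1: between 2 and 4      */
   zeroOrMore = prxMatch('/^Ba*$/',     'B');      /* 1: zero "a" is allowed  */
   oneOrMore  = prxMatch('/^Ba+$/',     'B');      /* 0: needs at least one   */
   optional   = prxMatch('/^colou?r$/', 'color');  /* 1: the "u" is optional  */
   put exactly3= tooFew= twoToFour= zeroOrMore= oneOrMore= optional=;
run;
```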

from the word list

find words without vowels

from the word list

find words without vowels /^[^aeiouyAEIOUY]+$/

"write only"? document your expressions

find words without vowels /^[^aeiouyAEIOUY]+$/

/*
^                   beginning of string
[^aeiouyAEIOUY]+    one or more non-vowels
$                   end of string
*/

Hangman Example

• Suppose we want to code the sequence of guesses in the game of hangman by the use of inferred strategies
– e.g. did the person guess the most frequently used letters first?
– did the person guess vowels first?

Coding the strategies

%let ns=4;   /* number of strategies */
data HangmanGuesses;
   drop i prxNum1-prxNum&ns;
   array expr{&ns} $ 80 ex1-ex&ns
      ( '/^[aeiou]{3}/'
        '/^[etaoin]{6}/'
        '/^qwerty/'
        '/^[zqxjkv]{6}/' );
   array used{&ns} used1-used&ns;
   label used1= '3 vowels first'
         used2= 'letter frequency'
         used3= 'qwerty'
         used4= 'unusuals' ;
   array prx{&ns} prxNum1-prxNum&ns;
   retain used1-used&ns;       /* strategy used */
   retain ex1-ex&ns;           /* strategy expression */
   retain prxNum1-prxNum&ns;   /* prx number */
   length guess $ 13;
   /* guess occupies columns 1-13, success follows */
   input guess $13. success;
   guess=lowcase(guess);
   if _n_=1 then do i=1 to &ns;
      prx{i}=prxParse(expr{i});
      /* report the index of the expression that failed, not &ns */
      if not (prx{i} > 0) then put 'expression ' i 'is bad ' expr{i}= ;
   end;
   do i=1 to &ns;
      used{i}=prxMatch(prx{i},guess);
   end;
datalines;
eaotwhnrbg    1
etaoinshrdlcu 0
etaoinshrdluc 0
qwertyuiopasd 0
vkjxqznmasdfg 0
asdfghjklzxcv 0
argbe         1
efghijklmnopq 0
abcdefghijklm 0
;
run;

We get dummy variables

Looking at expression 2

Memory within match

(pattern) treat the pattern as a unit and remember the part of the string matched

\n inside the match recall substring n

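example /(\d{3})X\1/ matches 123X123, not 123X456

(the multiplier goes inside the parentheses; with /(\d){3}X\1/ the buffer would hold only the last digit)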
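Where the multiplier sits relative to the parentheses decides what the capture buffer remembers. The sketch below compares the two placements with PRXMATCH; the data set name (backref) and variable names are made up for the example.

```sas
/* Sketch: what \1 recalls depends on what the parentheses enclose.    */
data backref;
   prxNum = prxParse('/(\d{3})X\1/');     /* buffer 1 = all three digits */
   same = prxMatch(prxNum, '123X123');    /* 1: "123" is repeated        */
   diff = prxMatch(prxNum, '123X456');    /* 0: "456" is not "123"       */
   /* multiplier outside the parentheses: buffer 1 = last digit only    */
   lastDigit = prxMatch('/(\d){3}X\1/', '123X3');   /* 1                 */
   put same= diff= lastDigit=;
run;
```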

Memory outside match

(pattern) treat the pattern as a unit and remember the part of the string matched

$n outside the match recall substring n

example s/(\w+),(\w+)/$2 $1/ substitutes Doe,John with John Doe
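A substitution of this kind can be tried with the PRXCHANGE function, which takes the expression, the number of replacements to make, and the source string. This is a minimal sketch; the data set name (swap) and variable names are assumptions for the example.

```sas
/* Sketch: swap two comma-separated words using $1 and $2.             */
data swap;
   length name swapped $ 20;
   name = 'Doe,John';
   /* second argument: 1 = replace the first match only                */
   swapped = prxChange('s/(\w+),(\w+)/$2 $1/', 1, name);
   put swapped=;      /* swapped=John Doe */
run;
```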

Call log example

datalines;

I called Fred at 9:17 am at 785-555-1234

10:12 Called George - (913)-555-3213

816-555-9876 was Irving the time was 1:22 pm

751 555 1212 8384 3:33 Bob

Get the time

retain expTime '/\d{1,2}:\d{2}\s?(pm|am)?/';
/* \d{1,2}:   one or two digits followed by a colon
   \d{2}\s?   two digits and optional space
   (pm|am)?   optional am or pm */

Get the phone number
define 3 capture buffers

retain expPhone '/\(?([2-9]\d\d)\)?[ -](\d\d\d)[ -](\d{4})/';
/* \(?          optional left paren
   ([2-9]\d\d)  3 digit area code (buffer 1)
   \)?          optional right paren
   [ -]         space or hyphen
   (\d\d\d)     3 digit exchange (buffer 2)
   [ -]         space or hyphen
   (\d{4})      4 digit line number (buffer 3) */

Use the expressions

retain prxTime prxPhone;
if _n_=1 then do;
   prxTime=prxParse(expTime);
   if not (prxTime > 0) then put 'bad expression ' expTime= ;
   prxPhone=prxParse(expPhone);
   if not (prxPhone > 0) then put 'bad expression ' expPhone= ;
end;
sequence=_n_;
call prxsubstr(prxTime, note, position, length);
time=substr(note,position,length);
call prxsubstr(prxPhone, note, position, length);
phone=substr(note,position,length);
call prxposn(prxPhone, 1, position, length);
ac=substr(note,position,length);
call prxposn(prxPhone, 2, position, length);
exchange=substr(note,position,length);
call prxposn(prxPhone, 3, position, length);
last4=substr(note,position,length);
local=catx('-',exchange,last4);   /* catx strips blank padding before joining */

Result

The time and phone number have been extracted.
The phone number is standardized.
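The fragments above can be assembled into one runnable step. The version below is a sketch rather than the original program: the data set name (callLog), the variable lengths, the pos and len variable names, and the use of the PRXPOSN function in place of CALL PRXPOSN are all assumptions made for this example.

```sas
/* Sketch: extract the time and phone number from each call note.      */
data callLog;
   length note $ 60 time $ 8 ac $ 3 exchange $ 3 last4 $ 4 local $ 8;
   retain prxTime prxPhone;
   drop prxTime prxPhone pos len;
   infile datalines truncover;
   input note $60.;
   if _n_ = 1 then do;                  /* compile both patterns once  */
      prxTime  = prxParse('/\d{1,2}:\d{2}\s?(pm|am)?/');
      prxPhone = prxParse('/\(?([2-9]\d\d)\)?[ -](\d\d\d)[ -](\d{4})/');
   end;
   call prxSubstr(prxTime, note, pos, len);
   if pos > 0 then time = substr(note, pos, len);
   if prxMatch(prxPhone, note) then do; /* buffers are set by a match  */
      ac       = prxPosn(prxPhone, 1, note);
      exchange = prxPosn(prxPhone, 2, note);
      last4    = prxPosn(prxPhone, 3, note);
      local    = catx('-', exchange, last4);
   end;
datalines;
I called Fred at 9:17 am at 785-555-1234
10:12 Called George - (913)-555-3213
816-555-9876 was Irving the time was 1:22 pm
751 555 1212 8384 3:33 Bob
;
run;
```

Guarding on the match position avoids calling SUBSTR with a zero position when a note has no time or phone number in it.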

Substitution expressions

s/match expression/replacement/

s/cat/hat/ changes cat to hat

s/([a-zA-Z\-]+),([a-zA-Z\-]+)/$2 $1/

changes Doe-Roe,John to John Doe-Roe

CALL PRXCHANGE (DATA step only)

CALL PRXCHANGE (regular-expression-id, times, old-string <, new-string <, result-length <, truncation-value <, number-of-changes>>>>);

PRXCHANGE (DATA step, SQL, WHERE clauses)

PRXCHANGE(perl-regular-expression | regular-expression-id, times, source)
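Because the function form accepts the expression directly, it can be used inside PROC SQL without a separate PRXPARSE call. The sketch below assumes a small input table (names) and an output table (swapped), both invented for this example.

```sas
/* Sketch: PRXCHANGE as a function inside PROC SQL.                    */
data names;
   length c $ 60;
   infile datalines truncover;
   input c $60.;
datalines;
Doe-Roe,John
BlackSheep, BaaBaa
Prince
;
run;

proc sql;
   create table swapped as
   select c,
          /* 1 = change the first match only; unmatched rows pass through */
          prxchange('s/([a-zA-Z\-]+),[ ]*([a-zA-Z\-]+)/$2 $1/', 1, c) as changed
   from names;
quit;
```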

data cc;
   length c $ 60 changedString $ 60;
   input c $60.;
   prxNum=prxParse('s/([a-zA-Z\-]+),[ ]*([a-zA-Z\-]+)/$2 $1/');
   CALL prxChange (prxNum, 1, c, changedString, newLength, wasTruncated, numberChanges);
datalines;
Doe-Roe,John
BlackSheep, BaaBaa
Prince
;
run;

PRXCHANGE example

s/([a-zA-Z\-]+) first word

, comma

[ ]* zero or more blanks

([a-zA-Z\-]+) second word

/$2 $1/ switch words

PRXCHANGE example results
