Mastering Regex in Perl
edureka
Slide 2 www.edureka.co/mastering-perl-scripting
Objectives
At the end of this module, you will be able to explain:
» What Perl is
» Benefits of Perl
» Advantages of using Perl scripting
» How to start with Perl by writing a first script
» Uses of regular expressions
» grep functions
Slide 3
Meet Mr. Jose
Hi there! My name is Jose. I'm a computer consultant, techie and trainer. Students usually come to me and ask which computer language they should use in their project, and why. I'm here to help.
Slide 4
Meet Mr. Han
Hi there! My name is Han. I'm a Quality Analyst, and my manager has asked me to automate my tasks. I'm confused about which language to use, as I have tight deadlines and want to make the automation generic. I'm here to meet Mr. Jose and find out which language I should use for automation.
Slide 5
Han is Confused!
Han: Hi Jose, I work for an investment bank. My manager asked me to automate all my tasks. On a daily basis I work with data on millions of shares. I'm confused about which language I should use.
Jose: Hi Han, it seems you need to work with large amounts of data. Whenever large-scale data processing is involved, Perl is a very suitable language.
Slide 6
What is Perl?
Perl is one of the most popular open source, interpreted programming languages, with a huge number of programmers, libraries and resources
Perl has very powerful built-in regular expressions, which is often the main reason people choose Perl for bulk text processing
Perl is platform independent and is also used to generate HTML pages
It is similar to Python and PHP, but with very powerful and flexible features
Built-in regular expressions provide data filtering and data transformation
Perl is nicknamed "the Swiss Army chainsaw of scripting languages" because of its flexibility and power
Slide 7
What are the Benefits of Using Perl?
Easy to learn
» Perl has relatively few keywords, a simple structure, and a clearly defined syntax
Portable
» Perl runs on a wide variety of hardware platforms and has the same interface on all of them
Databases
» Perl provides interfaces to all major commercial databases
» CPAN, an archive of Perl libraries, contains more than 20,000 modules
Standard Library
» One of Perl's greatest strengths is that the bulk of its library is portable and cross-platform compatible on UNIX, Windows and Mac OS
Memory Management
» Automatic memory management
» Automatic garbage collection
Other Benefits
» High-level data types and operations
» Object-oriented programming
» Easy debugging techniques
» Scalability
Slide 8
About Perl
Perl was originally developed by Larry Wall in 1987 as a general-purpose Unix scripting language to make report processing easier. Since then, it has undergone many changes and revisions
Perl is not officially an acronym, but it is commonly expanded as Practical Extraction and Report Language
It is said that frustrations with Unix shell programming led directly to the creation of Perl
It is an open source, interpreted language
It is considered a scripting language, but it is much more than that
Scalable, object oriented and functional
Used by many Fortune 500 organizations
Put simply, Perl is a general-purpose language that can handle almost any programming task
Slide 9
Why Perl?
Less Restrictions
Developer Productivity
Program Portability
Support Libraries
Component Integration
Enjoyment
» Less Restrictions: Perl has relatively few keywords, and there are many ways to do the same thing, which is a core philosophy of Perl
Slide 10
Why Perl? (Contd.)
» Developer Productivity: Perl code is typically one-third to one-fifth the size of equivalent C++ or Java code. That means there is less to type, less to debug, and less to maintain
Slide 11
Why Perl? (Contd.)
» Program Portability: Perl programs run unchanged on all major computer platforms, for example Windows, Linux and Mac OS
Slide 12
Why Perl? (Contd.)
» Support Libraries: Perl comes with a large collection of prebuilt and portable functionality, known as the standard modules. These modules support an array of application-level programming tasks, from text pattern matching to network scripting
Slide 13
Why Perl? (Contd.)
» Component Integration: Perl scripts can easily communicate with other parts of an application, using a variety of integration mechanisms
Slide 14
Why Perl? (Contd.)
» Enjoyment: Because of its ease of use and built-in toolset, Perl makes programming more pleasurable
Slide 15
Users and Perl Projects
» Yahoo uses Perl for much of its website development and data processing
» SpamAssassin is a well-known spam filter. It is part of the Apache Software Foundation
» CiderWebmail is an open source product written in Perl and AJAX
Slide 16
Users and Perl Projects (Contd.)
» TWiki is one of the best-known wiki packages, oriented towards use inside companies. It is built primarily by the company of the same name, which also provides a cloud-based hosted TWiki service
» Bugzilla is a well-known bug-tracking system developed by and for Mozilla. It is used in quite a lot of companies
Slide 17
Traditional Uses of Perl
» Internet Scripting
» System Utilities
» Web Scraping
» Database Programming
» Ad Targeting
» Text Processing
Slide 18
Traditional Uses of Perl (Contd.)
» ETL Processing (diagram labels: CRM, LOB, ERP, ETL, Data Warehouse)
» Network Programming (diagram labels: Request, Result, Network)
Slide 19
Write a First Program
We can use any text editor to create scripts on Windows, or an editor such as vi on Linux
The extension of the script is .pl
Perl executable statements end with a semicolon (;)
Perl is case-sensitive
Free form: whitespace between tokens is ignored
Comments begin with # (pound sign) and may start anywhere on a line, not just at the beginning
Perl also supports multi-line comments through POD (Plain Old Documentation)
Using POD we can add documentation inside the script; these lines are not treated as executable statements
__END__ is a special literal that marks the logical end of the program
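The points above can be sketched as a first script. This is only an illustrative example: the variable name and the greeting text are assumptions, not taken from the slides.

```perl
use strict;
use warnings;

# A single-line comment starts with '#'.
my $greeting = "Hello, World!";   # a comment may also follow a statement

=pod

This POD block works as a multi-line comment.
Perl skips everything from =pod to =cut when the script runs.

=cut

print "$greeting\n";
```

Anything placed after a line containing only __END__ would likewise be ignored by the interpreter.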
Slide 20
Write a First Program (Contd.)
To execute the script, invoke it with perl <script name>
For Linux users: add the shebang line (the path of the interpreter, on the very first line of the script) and make the script executable, so that it can be run directly
Example
D:\Edureka > perl helloWorldDemo.pl
D:\Edureka > helloWorldDemo.pl
Slide 21
What is a Regular Expression?
A regular expression is a sequence of characters that together form a search pattern
The main use of regular expressions is to match patterns in strings
The other main use is the find-and-replace feature
A regular expression forms a generic pattern for string matching with the help of predefined wildcard characters
Many languages provide regular expression capabilities; some have them built in, and others provide them through libraries
A regular expression is also known as a regex or regexp
In Perl, regex support is built into the language, which makes it convenient to use and generally fast
Slide 22
Real World – Regular Expression
I wish I had software that could filter all phone calls starting with +140
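A minimal sketch of such a filter, assuming the call log is a plain list of number strings (the numbers below are made up for illustration). It uses Perl's built-in grep function with a regex; ^ anchors the pattern at the start of the string and \+ matches a literal plus sign.

```perl
use strict;
use warnings;

# Hypothetical call log; these numbers are invented for the example.
my @calls = ('+14025550101', '+919800000001', '+14055550102', '+442079460000');

# Keep only the numbers that start with +140.
my @filtered = grep { /^\+140/ } @calls;

print "$_\n" for @filtered;
```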
Slide 23
Match Operator
The match operator checks whether a regex matches somewhere in a string
=~ is the binding operator: it applies the regex on its right to the string on its left and is true when the regex matches
!~ is the negated binding operator: it is true when the regex does not match
Regexes are case sensitive by default, and the m in m/.../ is optional when the delimiters are slashes
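As a short illustration of the match operator, the sketch below applies a few patterns to one sample string. The string and variable names are assumptions chosen for the example.

```perl
use strict;
use warnings;

my $text = "Mastering Regex in Perl";

my $has_perl  = ($text =~ m/Perl/)   ? 1 : 0;  # explicit m: matches
my $has_regex = ($text =~ /Regex/)   ? 1 : 0;  # m omitted: matches
my $has_lower = ($text =~ /perl/)    ? 1 : 0;  # case sensitive: no match
my $no_python = ($text !~ /Python/)  ? 1 : 0;  # !~ is true when there is no match

print "Perl: $has_perl, Regex: $has_regex, perl: $has_lower, no Python: $no_python\n";
```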
Slide 25
The First Wildcard
Wildcards (often called metacharacters; the wildcards that express repetition are called quantifiers) are symbols that have a special meaning inside a regular expression
For example: . (dot or period) matches any single character, whether a letter, a digit or another symbol, except the newline character (\n).
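A small sketch of the dot wildcard, using a made-up word list: the pattern c.t needs exactly one character between c and t, and that character must not be a newline.

```perl
use strict;
use warnings;

# Sample strings invented for the example; the fourth contains a newline.
my @words = ('cat', 'cot', 'c9t', "c\nt", 'ct');

# '.' stands for any single character except newline.
my @hits = grep { /c.t/ } @words;

print join(', ', @hits), "\n";
```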
Slide 26
Matching the Operator Symbol Itself
In many cases, the user may want to match a wildcard symbol literally in the regular expression. A backslash (\) placed before a wildcard or special character suppresses its special meaning
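The effect of the backslash can be sketched as follows; the sample strings are assumptions made up for this illustration.

```perl
use strict;
use warnings;

# Unescaped, the dot matches any character, so "3x50" matches.
my $loose  = ("3x50" =~ /3.50/)  ? 1 : 0;

# Escaped, \. matches only a literal dot, so "3x50" no longer matches.
my $strict = ("3x50" =~ /3\.50/) ? 1 : 0;

# The literal text "3.50" still matches the escaped pattern.
my $exact  = ("Total: 3.50 USD" =~ /3\.50/) ? 1 : 0;

print "loose=$loose strict=$strict exact=$exact\n";
```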
Slide 27
Capturing and Grouping
Perl remembers the part of the string matched by each pair of parentheses in the regular expression
Inside the regex, these groups are referred to by backreferences: \1, \2, \3 and so on
Outside the regex, these groups are referred to by the special variables $1, $2, $3 and so on
The groups can also be fetched by assigning the match to variables in list context; this is called capturing
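The three ways of using groups can be sketched like this. The date string and sentence are invented sample data; \d (a digit), \w (a word character) and \b (a word boundary) are shortcuts used here for brevity.

```perl
use strict;
use warnings;

my $date = "2015-06-30";

# Capturing in list context: each group lands in one variable.
my ($year, $month, $day) = $date =~ /(\d+)-(\d+)-(\d+)/;

# Outside the regex, the first group is available as $1 after a successful match.
my $first = '';
if ($date =~ /(\d+)-/) {
    $first = $1;
}

# Inside the regex, \1 refers back to whatever group 1 matched,
# which makes it possible to find a word typed twice in a row.
my $doubled = ("this is is a test" =~ /\b(\w+) \1\b/) ? $1 : '';

print "$year / $month / $day, first=$first, doubled=$doubled\n";
```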
Slide 29
Substitution
Another Perl operator that uses regular expressions provides the find-and-replace feature
This is called substitution, written as s/pattern/replacement/
Regexes are greedy, which means a pattern will try to match as much as it can
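A minimal substitution sketch, with an invented sample sentence. Without any modifier, only the first match is replaced.

```perl
use strict;
use warnings;

my $line = "I like Java. Java is fine.";

# Copy the string, then substitute on the copy so $line stays unchanged.
(my $once = $line) =~ s/Java/Perl/;

print "$once\n";
```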
Slide 30
Modifiers 'i' and 'g'
The 'i' modifier makes the regex case insensitive
The 'g' modifier makes the search global: all matches are found or replaced, not just the first
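A short sketch of both modifiers; the sample text and the replacement string XXXX are assumptions for illustration.

```perl
use strict;
use warnings;

my $text = "Perl, perl and PERL";

my $sensitive   = ("PERL" =~ /perl/)  ? 1 : 0;  # no match: case differs
my $insensitive = ("PERL" =~ /perl/i) ? 1 : 0;  # match: /i ignores case

# Without /g only the first match is replaced; with /g all of them are.
(my $first_only = $text) =~ s/perl/XXXX/i;
(my $all        = $text) =~ s/perl/XXXX/gi;

print "$first_only\n$all\n";
```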
Slide 31
Modifiers 's' and 'm'
With the 'm' modifier, ^ and $ match at the start and end of every line inside the string, not only at the start and end of the whole string
With the 's' modifier, . matches \n as well
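The difference is easiest to see on a two-line string. The sketch below uses an invented sample; each flag is 1 when the pattern matches.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line";

# ^ normally matches only at the start of the whole string.
my $without_m = ($text =~ /^second/)  ? 1 : 0;
# With /m it also matches right after each newline.
my $with_m    = ($text =~ /^second/m) ? 1 : 0;

# '.' normally stops at a newline, so .* cannot span two lines.
my $without_s = ($text =~ /first.*second/)  ? 1 : 0;
# With /s the dot matches newline as well.
my $with_s    = ($text =~ /first.*second/s) ? 1 : 0;

print "m: $without_m -> $with_m, s: $without_s -> $with_s\n";
```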
Slide 32
Modifier 'x'
With the 'x' modifier, whitespace inside the regex is ignored. This modifier is used to lay out a pattern in a clean, readable way
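A sketch of a pattern laid out with /x, assuming a made-up phone-number format. Under /x, a # inside the pattern also starts a comment that runs to the end of the line.

```perl
use strict;
use warnings;

# The same pattern as /(\d\d\d)-(\d\d\d\d)/, spread over several lines.
my $phone = qr/
    ( \d\d\d )      # first group: three digits
    -               # a literal hyphen
    ( \d\d\d\d )    # second group: four digits
/x;

my ($prefix, $number) = "555-1234" =~ $phone;

print "prefix=$prefix number=$number\n";
```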
Slide 33
Greedy Property of Regex Wildcards
Whenever a Perl regex contains '*', '+', '?' or {a,b}, that wildcard matches as much as it can
This is the greedy property of regex wildcards
It is sometimes a problem, because a substitution replaces the entire matched string
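The problem can be sketched with an invented HTML fragment. The second substitution is one possible workaround, chosen for this example: a negated character class [^>] that cannot run past the closing bracket.

```perl
use strict;
use warnings;

my $html = "<b>bold</b> text";

# Greedy: .* runs from the first '<' to the LAST '>', so the word 'bold' is lost.
(my $greedy = $html) =~ s/<.*>//;

# [^>]* stops at the first '>', so only the tags themselves are removed.
(my $careful = $html) =~ s/<[^>]*>//g;

print "greedy: '$greedy'\ncareful: '$careful'\n";
```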
Slide 34
Other Wildcards
These wildcard characters do not match themselves unless they are escaped with a backslash
The other wildcards are:
Wildcard   Meaning
*          Matches zero or more occurrences of the preceding character or group
+          Matches one or more occurrences of the preceding character or group
?          Matches zero or one occurrence of the preceding character or group
Slide 35
Wildcards Examples
REGEX      Matches
AbC*       A followed by b followed by zero or more occurrences of C, e.g. Ab, AbC, AbCCCC, AbCCCCCCCCCCCC
AbC+       A followed by b followed by one or more occurrences of C, e.g. AbC, AbCCC, AbCCCCCCCC
AbC?       A followed by b followed by zero or one occurrence of C, e.g. Ab, AbC
Ab(cd)*    A followed by b followed by zero or more occurrences of cd, e.g. Ab, Abcd, Abcdcd
Ab(cd)+    A followed by b followed by one or more occurrences of cd, e.g. Abcd, Abcdcd
Ab(cd)?    A followed by b followed by zero or one occurrence of cd, e.g. Ab, Abcd
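A few of these patterns can be checked with a short script. The sample list is an assumption; note that an unanchored pattern matches when it is found anywhere in the string, which is why AbC* also accepts Abcd.

```perl
use strict;
use warnings;

my @samples = ('Ab', 'AbC', 'AbCCC', 'Abcd', 'Abcdcd', 'AC');

my @star  = grep { /AbC*/ }    @samples;  # zero or more C after Ab
my @plus  = grep { /AbC+/ }    @samples;  # at least one C after Ab
my @group = grep { /Ab(cd)+/ } @samples;  # at least one 'cd' after Ab

print "star:  @star\nplus:  @plus\ngroup: @group\n";
```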
Slide 36
Combine Multiple Wildcards
REGEX      Matches
Ab+C*      A followed by one or more b followed by zero or more C, e.g. Ab, AbC, AbbCCC, AbbCCCCCCCCCC
A.C+       A followed by any character followed by one or more C, e.g. AZC, AzCCC, AECCCCCCCCCCC
..C?       Any two characters followed by zero or one occurrence of C, e.g. Ab, AbC
<.*>       Anything inside angle brackets, e.g. <HTML>, <TAGS>
\(.+\)     At least one character inside parentheses, e.g. (Abcd), (a)
ab+c?      a followed by one or more b followed by zero or one c, e.g. "abbbbc" or "abc", but not "ac"
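As a quick check of some combined patterns, the sketch below runs them over a made-up sample list.

```perl
use strict;
use warnings;

my @samples = ('Ab', 'AbbCCC', 'AC', 'AZC', '(a)', '()');

my @b_then_c = grep { /Ab+C*/ }  @samples;  # needs at least one b after A
my @any_mid  = grep { /A.C+/ }   @samples;  # needs one character between A and C
my @parens   = grep { /\(.+\)/ } @samples;  # needs something inside the parentheses

print "Ab+C*: @b_then_c\nA.C+: @any_mid\n\\(.+\\): @parens\n";
```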
Slide 37
Character Class in Regex
A character class is a set of characters (letters, digits or other symbols)
Used in a regex, a character class matches any single character from the set
The characters of the set are listed inside square brackets:
REGEX                          Matches
[abc]                          Any single character that is 'a', 'b' or 'c'
[abcdefghijklmnopqrstuvwxyz]   Any single character from 'a' to 'z'
[a-z]                          Any single character from 'a' to 'z' (the same set, written as a range)
[0-9]                          Any single digit from 0 to 9
[a-zA-Z0-9]                    Any single character from a-z, A-Z or 0-9
[a-z_]                         Any single character from a-z, or _ (underscore)
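A brief sketch of character classes in use, on an invented list of identifiers. The anchors ^ and $ and the + quantifier are added so that every character of the string has to come from the class.

```perl
use strict;
use warnings;

my @ids = ('user_1', 'User', 'abc', '1234');

my @with_digit = grep { /[0-9]/ }           @ids;  # contains at least one digit
my @lower_only = grep { /^[a-z_]+$/ }       @ids;  # only a-z or underscore
my @alnum_only = grep { /^[a-zA-Z0-9]+$/ }  @ids;  # only letters and digits

print "digit: @with_digit\nlower: @lower_only\nalnum: @alnum_only\n";
```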
Slide 38
Negate the Character Class
The ^ (caret) symbol at the start of a character class negates the class
A negated class matches any single character that is not in the set
Note that an unanchored pattern such as [^abc] matches any string that contains at least one such character, not only strings that are free of a, b and c
Here are a few examples:
REGEX                           Matches
[^abc]                          Any single character other than 'a', 'b' or 'c'
[^abcdefghijklmnopqrstuvwxyz]   Any single character other than 'a' to 'z'
[^a-z]                          Any single character other than 'a' to 'z' (the same set, written as a range)
[^aeiou]                        Any single character that is not a lowercase vowel
[lL][^abc]                      'l' or 'L' followed by a character other than 'a', 'b' or 'c'
[^a-z_]                         Any single character other than a-z or _ (underscore)
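The difference between "contains a character outside the set" and "contains no character from the set" can be sketched as follows, with a made-up word list.

```perl
use strict;
use warnings;

my @words = ('abc', 'abcd', 'aeiou', 'sky');

# [^abc] needs just ONE character that is not a, b or c.
my @has_other = grep { /[^abc]/ } @words;

# To keep only strings with no a, b or c at all, negate the match instead.
my @none_of_abc = grep { !/[abc]/ } @words;

print "has other: @has_other\nnone of abc: @none_of_abc\n";
```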
Slide 39
Combine Character Class with Wildcards
REGEX         Matches
[aA][0-9]+    'a' or 'A' followed by one or more digits
A+.[.?]       'A' one or more times, followed by any character, followed by either '.' or '?'
a[bc]         'a' followed by either 'b' or 'c'
A[abc]?       'A' followed by zero or one occurrence of 'a', 'b' or 'c'
[a-z_.]\@     A character from 'a' to 'z', '_' or '.', followed by '@'
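Some of these combinations, tried on an invented token list. Inside a Perl regex the @ sign is escaped as \@ so that it is not taken as the start of an array name.

```perl
use strict;
use warnings;

my @tokens = ('a42', 'A7', 'ab', 'b9', 'x@y');

my @a_digits  = grep { /[aA][0-9]+/ } @tokens;  # a or A, then digits
my @a_then_bc = grep { /a[bc]/ }      @tokens;  # a, then b or c
my @before_at = grep { /[a-z_.]\@/ }  @tokens;  # allowed character, then @

print "@a_digits | @a_then_bc | @before_at\n";
```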
Slide 40
Character Class - Shortcuts
Character classes can also be represented by shortcuts
The following are examples:
Shortcut   Meaning                                            Roughly equivalent to
\s         Any whitespace character (space, tab, newline)     [ \t\n\r\f]
\S         Any character other than whitespace                [^ \t\n\r\f]
\d         Any digit                                          [0-9]
\D         Any character other than a digit                   [^0-9]
\w         Any letter, digit or _ (underscore)                [a-zA-Z0-9_]
\W         Any character other than a letter, digit or _      [^a-zA-Z0-9_]
Slide 41
Shortcuts with Wildcards
Shortcuts can also be used with wildcards
The following are examples:
Shortcut   Meaning                                                 Roughly equivalent to
\s+        One or more whitespace characters                       [ \t\n\r\f]+
\S+        One or more characters other than whitespace            [^ \t\n\r\f]+
\d+        One or more digits                                      [0-9]+
\D+        One or more characters other than digits                [^0-9]+
\w+        One or more letters, digits or _ (underscore)           [a-zA-Z0-9_]+
\W+        One or more characters other than letter, digit or _    [^a-zA-Z0-9_]+
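The shortcuts combined with + are handy for pulling pieces out of a line. A minimal sketch, using an invented sample line that mixes spaces and a tab:

```perl
use strict;
use warnings;

my $line = "Order 66 shipped\tto Room_12";

my @numbers = $line =~ /\d+/g;      # every run of digits
my @words   = $line =~ /\w+/g;      # every run of word characters
my @fields  = split /\s+/, $line;   # split on any run of whitespace

print "numbers: @numbers\nwords: @words\nfields: @fields\n";
```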
Slide 42
Meta Characters
Metacharacter   Example       Meaning
^               ^ful          The string must start with the pattern, i.e. matches 'ful' but not 'wonderful'
$               ful$          The string must end with the pattern, i.e. matches 'wonderful' but not 'Fultron'
{a,b}           Abc{1,2}      'A' followed by 'b' followed by at least one and at most two occurrences of 'c'
{a}             Abc{1}        'A' followed by 'b' followed by exactly one 'c'
(…)             (\w+)         Grouping and capturing
\               \?            Backslash: suppresses the special meaning of a wildcard
|               Black|white   Alternation: matches 'Black' or 'white' in the string
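The anchors, the {a,b} quantifier and alternation can be tried out with a short script. The word lists below are assumptions made up for this sketch.

```perl
use strict;
use warnings;

my @words = ('ful', 'wonderful', 'fulfil', 'Abc', 'Abcc', 'Abccc');

my @starts     = grep { /^ful/ }        @words;  # begins with 'ful'
my @ends       = grep { /ful$/ }        @words;  # ends with 'ful'
my @one_or_two = grep { /^Abc{1,2}$/ }  @words;  # Ab followed by one or two c

my @animals = ('Black cat', 'white dog', 'grey fox');
my @colours = grep { /Black|white/ } @animals;   # either alternative

print "starts: @starts\nends: @ends\nc{1,2}: @one_or_two\n";
print "colours: ", join(', ', @colours), "\n";
```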
Slide 43
Meta Symbols
Meta symbol   Example                      Meaning
\A            \Aful                        The string must start with the pattern, i.e. matches 'ful' but not 'wonderful' (not affected by the 'm' modifier)
\Z            ful\Z                        The string must end with the pattern, i.e. matches 'wonderful' but not 'Fulltron' (not affected by the 'm' modifier)
\cA           \cA                          Matches control-A; \cB matches control-B, and so on
\Q and \E     \QWhat is your name?\E       Quotes the metacharacters up to \E (here ? is a literal question mark, not the quantifier)
\b            \bful\b                      Word boundary: looks for the exact word 'ful'
\B            \Bful\B                      Opposite of \b: matches 'ful' only inside a longer word, as in 'awfully'
\n and \t     \t                           Match the newline (\n) and tab (\t) characters
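A sketch of some of these meta symbols in action; all sample strings are invented for illustration.

```perl
use strict;
use warnings;

# \b...\b finds 'ful' as a whole word; \B...\B finds it only inside a word.
my @exact  = grep { /\bful\b/ } ('ful', 'wonderful', 'so ful of it');
my @inside = grep { /\Bful\B/ } ('ful', 'awfully', 'wonderful');

# \Q...\E makes '?' a literal question mark instead of a quantifier.
my $quantifier = ("What is your nam" =~ /name?/)      ? 1 : 0;  # 'e' is optional
my $literal    = ("What is your nam" =~ /\Qname?\E/)  ? 1 : 0;  # needs "name?"

# With /m, ^ matches after a newline, but \A still means start of the string.
my $two_lines = "one\nful";
my $caret = ($two_lines =~ /^ful/m)  ? 1 : 0;
my $big_a = ($two_lines =~ /\Aful/m) ? 1 : 0;

print "exact: ", join(', ', @exact), "\ninside: @inside\n";
print "quantifier=$quantifier literal=$literal caret=$caret A=$big_a\n";
```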
Slide 44
All in One Example
REGEX             Matches
/ful/i            Matches 'full', 'Wonderful' and 'Fultron'
/Ao+/             Matches Ao, Aoo, Aoooo
/A(oh)*/          Matches A, Aoh, Aohoh
/Yahoo{1,3}/      Matches Yahoo, Yahooo, Yahoooo
/Edurekas?/       Matches Edureka, Edurekas
/Check\s+mates/   Matches Check followed by one or more whitespace characters followed by mates
/\$10/            Matches $10, $100, $101
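Several rows of this table can be verified with a small table-driven check. The structure below (a list of pattern and sample pairs) is just one possible way to organise such a test, and the sample strings are assumptions.

```perl
use strict;
use warnings;

# Each entry pairs a compiled pattern with a string it is expected to match.
my @cases = (
    [ qr/Ao+/,           'Aoooo'          ],
    [ qr/A(oh)*/,        'Aohoh'          ],
    [ qr/Yahoo{1,3}/,    'Yahoooo'        ],
    [ qr/Edurekas?/,     'Edureka'        ],
    [ qr/Check\s+mates/, "Check \t mates" ],
    [ qr/\$10/,          'Price: $101'    ],
);

# Collect every case whose sample does NOT match its pattern.
my @failed = grep { $_->[1] !~ $_->[0] } @cases;

print scalar(@failed), " case(s) failed\n";
```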
Slide 46
Survey
Your feedback is important to us, be it a compliment, a suggestion or a complaint. It helps us to make the course better!
Please spare a few minutes to take the survey after the webinar.