09 string processing_with_regex copy

String Processing with REGEX

Regular Expressions

• A regular expression is a means of defining text patterns. It is widely used in implementations of automated computing tasks.

• Regular expressions can be used with programming languages such as Perl, AWK and Tcl, and with Linux power tools such as sed, grep, awk, expr and vi

• There are two main uses of regular expressions:
– Matching
– Substitution

Note: A regular expression is also referred to as a regexp or REGEX

Regular Expressions

• REGEX are different from the shell’s meta-characters, even though they make use of similar characters. They should always be quoted in order to protect them from the shell.
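A minimal sketch of why the quoting matters. It assumes a throwaway directory and the hypothetical file names notes.txt and ab.txt. Without quotes, the shell expands a* as a filename wildcard before grep ever sees it.

```shell
# Work in a throwaway directory so the shell's filename expansion is predictable.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

printf 'abc\nxyz\n' > notes.txt   # hypothetical sample data
: > ab.txt                         # empty file whose name matches the glob a*

# Quoted: grep receives the regex a* itself, which matches every line
# (zero or more "a" characters always match).
quoted=$(grep -c 'a*' notes.txt)

# Unquoted: the shell first rewrites a* to ab.txt, so grep searches
# notes.txt for the pattern "ab.txt" and finds nothing.
unquoted=$(grep -c a* notes.txt || true)

echo "quoted: $quoted, unquoted: $unquoted"

# Clean up the scratch directory.
cd "$OLDPWD" && rm -rf "$tmpdir"
```

The same pattern text gives two different results, which is why the REGEX should always be wrapped in single quotes.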

Note: The REGEX syntax has some variants and additions from one command to another; if something does not work, or if in doubt, consult the man page of that program.

Regular Expressions

• Below is a list of some of the common REGEX and their meanings:
. - matches any single character
[list] - matches any single character in the list
[range] - matches any single character in the range
[^range] - matches any single character not in the list or range
* - matches the previous character 0 or more times
\{n\} - matches the previous character exactly n times
\{n,\} - matches the previous character at least n times
\{n,m\} - matches the previous character between n and m times
^ - matches the regex only at the start of the line
$ - matches the regex only at the end of the line
\ - quote; cancels the special meaning of a meta-character
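The meta-characters listed here can be tried with a short sketch like the following. The word list and the helper function match are hypothetical, made up for illustration, and plain grep (basic regular expressions) is assumed.

```shell
# Hypothetical sample input: one word per line.
words='cat
cot
coat
ct
caat
scat'

# match PATTERN: print the words that match, joined on one line.
match() {
    printf '%s\n' "$words" | grep "$1" | paste -sd ' ' -
}

match 'c.t'        # .      any single character
match 'c[ao]t'     # [list] one character from the list
match 'c[^a]t'     # [^...] one character NOT in the list
match 'ca*t'       # *      zero or more of the previous character
match 'ca\{2\}t'   # \{n\}  exactly n of the previous character
match '^c'         # ^      pattern anchored at the start of the line
match 'at$'        # $      pattern anchored at the end of the line
```

Note that 'ca*t' also matches "ct", since * allows zero occurrences, and that '^c' skips "scat" because the c is not at the start of the line.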

Regular Expressions

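| - Logical OR
& - Logical AND
! - Logical NOT

Note: Only | (alternation) is part of the extended REGEX syntax itself; AND and NOT are supplied by the individual tools, for example && and ! between patterns in awk.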
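A small sketch of the three logical operators. The slide does not name a tool, so this assumes grep -E (equivalent to egrep) for OR and awk for AND and NOT; the sample lines are hypothetical.

```shell
# Hypothetical sample input.
lines='red apple
green apple
red car'

# OR: alternation inside an extended regular expression.
or_out=$(printf '%s\n' "$lines" | grep -E 'green|car')

# AND: awk combines two regex matches with &&.
and_out=$(printf '%s\n' "$lines" | awk '/red/ && /apple/')

# NOT: awk negates a regex match with !.
not_out=$(printf '%s\n' "$lines" | awk '!/red/')

printf 'OR:\n%s\nAND:\n%s\nNOT:\n%s\n' "$or_out" "$and_out" "$not_out"
```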

• Regular Expression parsing is done simply by interpreting each character, from left to right. When matching, each text line is tested for a match against the Regular Expression every time a new character is parsed

• Each character matches itself, unless it is a meta-character.
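To illustrate the difference between a literal character and a meta-character, here is a minimal sketch with hypothetical version strings as input.

```shell
# Hypothetical sample input.
versions='v1.5
v1x5
v125'

# "." is a meta-character: it matches any single character,
# so all three lines match.
any_char=$(printf '%s\n' "$versions" | grep -c 'v1.5')

# "\." cancels the special meaning, so only the literal dot matches.
literal_dot=$(printf '%s\n' "$versions" | grep -c 'v1\.5')

echo "unescaped: $any_char line(s), escaped: $literal_dot line(s)"
```

The letters v and the digits match only themselves in both patterns; only the dot changes meaning when it is escaped.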

Regular Expressions

• Some examples of REGEX matching:

# egrep '^u' /etc/passwd
uucp:x:10:14:uucp:/var/spool/uucp:/sbin/nologin
user1:x:500:500::/home/user1:/bin/bash

# egrep '^[^a-v]' /etc/passwd
webalizer:x:67:67:Webalizer:/var/www/usage:/sbin/nologin

sed

• ‘sed’ is a stream editor; it parses and edits text according to a predefined set of commands

• Syntax: sed [options] ‘command(s)’ [file]

• Options:
-i modify the file itself (in place)
-e add a command (allows multiple commands)
-n do not output lines by default

By default, the “sed” command does not change the contents of files; the safer way to make changes is to redirect the output of “sed” into a new file.
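A short sketch of the redirect approach, together with the -e and -n options. The temporary directory and the file names input.txt and output.txt are hypothetical.

```shell
tmpdir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$tmpdir/input.txt"   # hypothetical file names

# Without -i the original file is untouched; the edited text goes to
# standard output, which is redirected into a new file here.
sed 's/two/2/' "$tmpdir/input.txt" > "$tmpdir/output.txt"

# -e chains several commands in one invocation.
multi=$(sed -e 's/one/1/' -e 's/three/3/' "$tmpdir/input.txt")

# -n suppresses the default output; only the explicit p command prints.
quiet=$(sed -n '/two/p' "$tmpdir/input.txt")

original=$(cat "$tmpdir/input.txt")
edited=$(cat "$tmpdir/output.txt")
rm -rf "$tmpdir"

printf 'original:\n%s\nedited:\n%s\n' "$original" "$edited"
```

Writing to a separate file keeps the original available for comparison if the command turns out to be wrong.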

sed

• ‘sed’ uses Regular Expression commands to do both matching and text manipulation
• Regular Expression commands can be pretty confusing, as the command declaration can be on both sides of the regexp declaration

• Syntax: [command]/regexp/[command][arguments]

– ‘/regexp/p’ Print lines matching regexp (to be used with ‘-n’)
– ‘/regexp/d’ Delete lines matching regexp
– ‘s/regexp/string/[g]’ Substitute matched text with string (with ‘g’, every match on the line)
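The three commands can be compared side by side in a minimal sketch; the sample text is hypothetical.

```shell
# Hypothetical sample input.
text='alpha beta
gamma
alpha alpha'

# /regexp/p with -n: print only the lines that match.
printed=$(printf '%s\n' "$text" | sed -n '/alpha/p')

# /regexp/d: delete the lines that match and print the rest.
deleted=$(printf '%s\n' "$text" | sed '/alpha/d')

# s/regexp/string/: replace the first match on each line.
first_only=$(printf '%s\n' "$text" | sed 's/alpha/A/')

# s/regexp/string/g: replace every match on each line.
all=$(printf '%s\n' "$text" | sed 's/alpha/A/g')

printf 'p:\n%s\nd:\n%s\ns:\n%s\ns with g:\n%s\n' \
    "$printed" "$deleted" "$first_only" "$all"
```

The last line of the input shows the effect of the g flag: without it only the first "alpha" is replaced.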

sed

• ‘sed’ is one of the more complex Linux power tools. For most advanced uses, it has two main competitors: Perl and ‘awk’. Both are fully featured programming languages.

• Example

# cat file
one two three
four five six
# sed 's/\([a-z]*\) \([a-z]*\) \([a-z]*\)/\1 SECRET \3/g' file
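A runnable sketch of the back-reference substitution. It assumes the example groups each word with \( and \), and that the file holds the two lines shown; a temporary file stands in for "file".

```shell
tmpfile=$(mktemp)
printf 'one two three\nfour five six\n' > "$tmpfile"

# \( ... \) captures a group; \1 and \3 in the replacement refer back
# to the first and third captured groups, so the middle word is dropped.
result=$(sed 's/\([a-z]*\) \([a-z]*\) \([a-z]*\)/\1 SECRET \3/g' "$tmpfile")
rm -f "$tmpfile"

printf '%s\n' "$result"
```

Under these assumptions, the middle word of each line is replaced by SECRET while the first and third words are kept.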
