Awk essentials


Page 1: Awk essentials

awk – Essentials and Examples

Logan Palanisamy

Page 2: Awk essentials

Agenda

Elements of awk
Optional Bio Break
Examples and one-liners
Q & A

Page 3: Awk essentials

What is awk

An acronym of the last names of its three authors (Aho, Weinberger, Kernighan)

General-purpose pattern-scanning and processing language

Used for filtering, transforming and reporting

More advanced than sed, but less complicated than C; less cryptic than Perl.

Common implementations: gawk (GNU awk), nawk (new awk)

Page 4: Awk essentials

awk syntax

awk [-Ffield_sep] 'cmd' infile(s)
awk [-Ffield_sep] -f cmd_file infile(s)
infile can be the output of a pipeline.
Whitespace (runs of spaces or tabs) is the default field_sep
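A minimal sketch of the two invocation styles described above. The sample data and the helper name sample_users are made up for illustration.

```shell
# Hypothetical colon-separated sample data (passwd style).
sample_users() {
    printf 'root:x:0\nalice:x:1000\nbob:x:1001\n'
}

# -F: sets the field separator; the input arrives from a pipeline.
sample_users | awk -F: '{print $1}'

# Without -F, fields are split on runs of whitespace.
echo 'alpha   beta gamma' | awk '{print $2}'
```

The first command prints the three user names, the second prints beta.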

Page 5: Awk essentials

awk mechanics

[pattern] [{action} …]
Input files are processed a line at a time
Every line is processed if there is no pattern
Lines are split into fields based on field-sep
Print is the default action.
Input files are not affected in any way
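To make the pattern/action defaults concrete, here is a small sketch. The helper name lines and its three records are invented for the example.

```shell
# Hypothetical three-line input used for both commands.
lines() { printf 'apple 3\nbanana 7\ncherry 12\n'; }

# Pattern only: the default action prints the whole matching line.
lines | awk '/an/'

# Action only: with no pattern, the action runs on every line.
lines | awk '{print $2}'
```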

Page 6: Awk essentials

Field Names

Lines are split into fields based on field-sep
$0 represents the whole line
$1, $2, … $n represent the individual fields
Field names can be used as variables
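As a small illustration of fields behaving like variables, the sketch below swaps the first two fields of a made-up line. Assigning to a field makes awk rebuild $0 with the output field separator.

```shell
# Swap the first two fields, then print the rebuilt line.
echo 'john smith 42' | awk '{tmp = $1; $1 = $2; $2 = tmp; print $0}'
# → smith john 42
```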

Page 7: Awk essentials

Built-in variables

Variable          Explanation
FS                Field separator for input lines. Defaults to space or tab
NR                Number of input lines processed so far
NF                Number of fields in the current input line
FILENAME          Name of the current input file
OFMT              Default format for output numbers
OFS               Output field separator. Defaults to space
ORS               Output record separator. Defaults to the newline character
RS                Input record separator. Defaults to the newline character
FNR               Same as NR, but reset at the start of each input file
RSTART, RLENGTH   Set by the match() function: where the match starts and how long it is
SUBSEP            Subscript separator. Used in multi-dimensional arrays
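A short sketch of a few of these variables in use, on invented two-line input. OFS is set in BEGIN so that the commas in print expand to a dash.

```shell
# Print the line number, the field count and the first field of each line.
printf 'a b c\nd e\n' | awk 'BEGIN {OFS = "-"} {print NR, NF, $1}'
```

This prints 1-3-a and then 2-2-d.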

Page 8: Awk essentials

Operators

Operator             Explanation
+, -, *, /           Addition, subtraction, multiplication, division
%                    Remainder/modulo operation
++                   Unary increment (var++ same as var=var+1)
--                   Unary decrement
^ or **              Exponentiation (^ is the portable form)
+=, -=, *=, /=, %=   Assignment preceded by an arithmetic operation (var+=5 same as var=var+5)
No operator          String concatenation (newstr="new" $3)
?:                   Ternary operator (expr1 ? expr2 : expr3)
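The operators above can be tried without any input file by putting everything in a BEGIN block. The variable names here are illustrative only.

```shell
awk 'BEGIN {
    n = 7
    n += 5                                   # n is now 12
    label = "n=" n                           # juxtaposition concatenates strings
    parity = (n % 2 == 0) ? "even" : "odd"   # ternary operator
    print label, parity, 2 ^ 10
}'
# → n=12 even 1024
```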

Page 9: Awk essentials

Relational Operators

Operator   Explanation
==         Equality operator
!=         Not equal to
<          Less than
<=         Less than or equal to
>          Greater than
>=         Greater than or equal to
~          Matches (contains) regular expression
!~         Doesn't match (doesn't contain) regular expression

Page 10: Awk essentials

awk patterns

Can match either particular lines or ranges of lines

Regular expression patterns
Relational expression patterns
BEGIN and END patterns

Page 11: Awk essentials

Regular Expressions

Meta character   Meaning
.                Matches any single character except newline
*                Matches zero or more of the character preceding it. e.g.: bugs*, table.*
^                Denotes the beginning of the line. ^A denotes lines starting with A
$                Denotes the end of the line. :$ denotes lines ending with :
\                Escape character (\., \*, \[, \\, etc)
[ ]              Matches any one of the characters within the brackets. e.g. [aeiou], [a-z], [a-zA-Z], [0-9], [[:alpha:]], [a-z?,!]
[^ ]             Matches any one character other than the ones inside the brackets. e.g. ^[^13579] denotes lines that start with a character other than an odd digit, [^02468]$ denotes lines that end with a character other than an even digit
\<, \>           Match the beginning or end of a word (gawk extension)
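A brief sketch of anchors, negated brackets and escaping, on made-up input (the helper name words is hypothetical). Note that ^[^13579] needs some first character to match, so an empty line would not be printed.

```shell
# Hypothetical sample lines.
words() { printf '1abc\n2abc\nabc\nx.y\nxzy\n'; }

# Lines whose first character is not an odd digit.
words | awk '/^[^13579]/'

# An escaped dot matches only a literal "."; unescaped it would also match xzy.
words | awk '/x\.y/'
```

The first command prints every line except 1abc; the second prints only x.y.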

Page 12: Awk essentials

Extended Regular Expressions

Meta character   Meaning
|                Alternation. e.g.: ho(use|me), the(y|m), (they|them)
+                One or more occurrences of the previous character. a+ is the same as aa*
?                Zero or one occurrence of the previous character
{n}              Exactly n repetitions of the previous char or group
{n,}             n or more repetitions of the previous char or group
{n,m}            n to m repetitions of the previous char or group
(....)           Used for grouping

For the {n}, {n,} and {n,m} interval forms, older versions of gawk need the --re-interval option

Page 13: Awk essentials

Regular Expressions – Examples

Example                               Meaning
.{10,}                                10 or more characters. Curly braces have to be escaped in basic regular expressions (grep, sed), but not in awk
[0-9]{3}-[0-9]{2}-[0-9]{4}            Social Security number
\([2-9][0-9]{2}\)[0-9]{3}-[0-9]{4}    Phone number (xxx)yyy-zzzz; the literal parentheses are escaped
[0-9]{3}[ ]*[0-9]{3}                  Postal code in India
[0-9]{5}(-[0-9]{4})?                  US ZIP Code with optional four-digit extension

Page 14: Awk essentials

Regular Expression Patterns

Example                       Explanation
awk '/pat1/' infile           Same as grep 'pat1' infile
awk '/pat1/, /pat2/' infile   Print every block of lines that starts at a line matching pat1 and ends at the next line matching pat2 (both inclusive)
awk '/pat1|pat2/' infile      Print lines that have either pat1 or pat2
awk '/pat1.*pat2/' infile     Print lines that have pat1 followed by pat2 with something or nothing in between
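The range pattern restarts each time its first pattern matches again. A small sketch with invented START/END markers (the helper name log is hypothetical):

```shell
# Hypothetical input with two marked blocks.
log() { printf 'a\nSTART\nb\nEND\nc\nSTART\nd\nEND\ne\n'; }

# Prints both blocks, markers included; a, c and e are skipped.
log | awk '/START/, /END/'
```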

Page 15: Awk essentials

Relational Expression Patterns

Example                         Explanation
awk '$1=="USA"' infile          Print the line if the first field is USA
awk '$2 !="xyz"' infile         Print all lines whose second field is not "xyz"
awk '$2 < $3' infile            Print all lines whose third field is greater than the second
awk '$5 ~ /USA/' infile         Print if the fifth field contains USA
awk '$5 !~ /USA/' infile        Print if the fifth field doesn't contain USA
awk 'NF == 5' infile            Print lines that have five fields
awk 'NR == 5, NR==10' infile    Print lines 5 to 10
awk 'NR%5==0' infile            Print every fifth line (% is the modulo operator)
awk 'NR%5' infile               Print everything other than every fifth line
awk '$NF ~ /pat1/' infile       Print if the last field contains pat1
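A few of these relational patterns applied to made-up data (country, city, a number); the helper name cities is hypothetical.

```shell
# Hypothetical sample records.
cities() { printf 'USA Boston 4\nIndia Chennai 11\nUSA Austin 2\n'; }

cities | awk '$1 == "USA"'          # lines whose first field is USA
cities | awk '$3 > 3 {print $2}'    # numeric comparison on the third field
cities | awk 'NR % 2'               # odd-numbered lines: NR%2 is non-zero, hence true
```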

Page 16: Awk essentials

awk compound-patterns

Compound patterns are formed with Boolean operations (&&, ||, !) and range patterns

pat1 && pat2 (compound AND)
pat1 || pat2 (compound OR)
!pat1 (Negation)
pat1, pat2 (range pattern)

Page 17: Awk essentials

Compound Pattern Examples

Example                                     Explanation
awk '/pat1/ && $1=="str1"' infile           Print lines that have pat1 and whose first field equals str1
awk '/pat1/ || $2 >= 10' infile             Print lines that have pat1 OR whose second field is greater than or equal to 10
awk '!/pat1/' infile                        Same as grep -v "pat1" infile
awk 'NF >=3 && NF <=6' infile               Print lines that have between three and six fields
awk '/pat1/ || /pat2/' infile               Same as awk '/pat1|pat2/' infile
awk '/pat1/, /pat2/' infile                 Print every block of lines from a pat1 line through the next pat2 line
awk '!/pat1|pat2/' infile                   Print lines that have neither pat1 nor pat2
awk 'NR > 30 && $1 ~ /pat1|pat2/' infile    Print lines beyond line 30 whose first field contains either pat1 or pat2

Page 18: Awk essentials

Compound Pattern Examples

Example                               Explanation
awk '/pat1/&&/pat2/' infile           Print lines that have both pat1 and pat2.
awk '/pat1.*pat2/' infile             How is this different from the one above?
awk 'NR<10 || NR>20' infile           Print all lines except lines 10 to 20
awk '!(NR >=10 && NR<=20)' infile     Also prints all lines except lines 10 to 20. Without the leading !, it prints lines 10 to 20, same as awk 'NR==10, NR==20' infile
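One way to see the difference between the first two rows: the && form ignores order, while the .* form requires the first pattern to come before the second. The sample lines and the helper name pairs are invented.

```shell
# Hypothetical input.
pairs() { printf 'foo then bar\nbar then foo\nonly foo\n'; }

pairs | awk '/foo/ && /bar/'    # both patterns, in either order: first two lines
pairs | awk '/foo.*bar/'        # foo must come before bar: first line only
```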

Page 19: Awk essentials

BEGIN and END patterns

BEGIN allows actions before any lines are processed.

END allows actions after all lines have been processed

Either or both are optional
BEGIN {action}
[Pattern] {action}
END {action}

Page 20: Awk essentials

BEGIN

Use BEGIN to:
Set initial values for variables
Print headings
Set the internal field separator (same as -F on the command line)

awk 'BEGIN {FS=":"; print "File name", FILENAME}' file2 file2
(FILENAME is normally still empty inside BEGIN, because no input file has been opened yet)

Page 21: Awk essentials

END

Use END to:
Perform any final calculations
Print report footers.
Do anything that must be done after all lines have been processed.

awk 'END {print NR}' file2 file2
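BEGIN and END are often combined into a small report: a heading first, per-line work in the middle, a footer last. A sketch on invented name:score data (the helper name scores is hypothetical):

```shell
# Hypothetical input: name:score.
scores() { printf 'ann:90\nbob:70\ncat:80\n'; }

scores | awk 'BEGIN {FS = ":"; print "Name Score"}
              {total += $2; print $1, $2}
              END {print "Average", total / NR}'
```

The last line printed is Average 80.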

Page 22: Awk essentials

Creating Actions

Actions consist of one or more statements separated by a semicolon, a newline, or a right brace.

Types of statements:
Assignment statement (e.g. var1=1)
Flow-control statements
Print-control statements

Page 23: Awk essentials

Flow-control statements

Statement                                                       Explanation
if (conditional) {statement_list1} [else {statement_list2}]     Perform statement_list1 if conditional is true. Otherwise perform statement_list2, if specified
while (conditional) {statement_list}                            Perform statement_list while conditional is true
for (init_expr; conditional_expr; ctrl_expr) {statement_list}   Perform init_expr first. While conditional_expr is true, perform statement_list and then execute ctrl_expr
break                                                           Break out of the containing loop and continue with the next statement
continue                                                        Go to the next iteration of the containing loop without executing the remaining statements in the loop
next                                                            Skip the remaining pattern-action rules for this line and read the next line
exit                                                            Skip the rest of the input and go to the END pattern if one exists; otherwise exit
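The sketch below exercises several of these statements at once on made-up input: it sums the even numbers from 1 up to each value, ignores negative values, and stops at a line reading stop. The helper name nums is hypothetical.

```shell
# Hypothetical input.
nums() { printf '5\n-1\n8\nstop\n9\n'; }

nums | awk '
    $1 == "stop" {exit}              # skip the rest of the input and run END
    $1 < 0       {next}              # skip the remaining rules for this line
    {
        for (i = 1; i <= $1; i++) {
            if (i % 2) continue      # odd i: go to the next iteration
            sum += i
        }
    }
    END {print sum}'
# → 26
```

The 5 contributes 2+4 and the 8 contributes 2+4+6+8; the 9 is never read.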

Page 24: Awk essentials

Print-control statements

Statement                                       Explanation
print [expression_list] [>filename]             Print the expressions on stdout unless redirected to filename.
printf format [, expression_list] [>filename]   Print the output as specified in format (like printf in C). Has a rich set of format specifiers.
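A small printf sketch on invented item/quantity/price data. The | characters in the format are only there to make the column widths visible.

```shell
# Left-justified name, right-justified quantity, total with two decimals.
printf 'widget 4 2.5\ngear 10 0.75\n' |
awk '{printf "%-8s|%3d|%7.2f\n", $1, $2, $2 * $3}'
```

Unlike print, printf does not append a newline, so the format ends with \n.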

Page 25: Awk essentials

Variables

Provide power and flexibility
Formed with letters, numbers and the underscore character; cannot start with a number
Can be of either string or numeric type
No need to declare or initialize. Type is implied by the assignment.
No $ in front of variables (e.g. var1=10; job_type="clerk")

Field names ($1, $2, … $n) are a special form of variables. They can be used like any other variable.

Page 26: Awk essentials

Arrays

One-dimensional arrays: array_name[index]
Index can be either numeric or string. Numeric indexes conventionally start with 1
No special declaration needed. Simply assign values to an array element.
No set size. Limited only by the amount of memory on the machine.
Examples: phone["home"], phone["mobile"], phone[var1], phone[$1], ranks[1]
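Because any string can be an index, arrays work well for per-key totals. A sketch on invented name/amount pairs, with the array name total chosen for illustration:

```shell
# Accumulate the second field per name, then report in END.
printf 'ann 3\nbob 5\nann 4\n' |
awk '{total[$1] += $2}
     END {print "ann", total["ann"]; print "bob", total["bob"]}'
```

The elements spring into existence on first use; no declaration or sizing is needed.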

Page 27: Awk essentials

Multi-Dimensional arrays

Arrays are one-dimensional; true multi-dimensional arrays are not supported
Concatenate the subscripts to form a string which can be used as the index: array_name[1","2]
Placing values next to each other concatenates them, so "1,2", a three-character string, is the index.
Writing array_name[1,2] makes awk join the subscripts with SUBSEP, the subscript separator variable, which eliminates the need for the quoted comma.
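A sketch of how the SUBSEP mechanism behaves in practice; the array name grid and the variable key are illustrative.

```shell
awk 'BEGIN {
    grid[1, 2] = "x"                     # stored under the single key 1 SUBSEP 2
    key = 1 SUBSEP 2                     # build the same key by hand
    if (key in grid) print "found", grid[key]
    if ((1, 2) in grid) print "found again"
    n = split(key, parts, SUBSEP)        # recover the two subscripts
    print n, parts[1], parts[2]
}'
```

This prints found x, found again, and 2 1 2, showing that the two-subscript form is only a convenience over one string index.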

Page 28: Awk essentials

Built-in functions

Function                         Explanation
cos(awk_expr)                    Cosine of awk_expr
exp(awk_expr)                    Returns the exponential of awk_expr (as in e raised to the power of awk_expr)
index(str1, str2)                Returns the position of str2 in str1 (0 if not found)
length(str)                      Returns the length of str
log(awk_expr)                    Base-e log of awk_expr
sin(awk_expr)                    Sine of awk_expr
sprintf(frmt, awk_expr)          Returns the value of awk_expr formatted as per frmt
sqrt(awk_expr)                   Square root of awk_expr
split(str, array, [field_sep])   Splits a string into its elements and stores them in an array
substr(str, start, length)       Returns a substring of str starting at position "start" for "length" characters
toupper(), tolower()             Useful when doing case-insensitive searches
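A quick tour of several of these functions on a made-up string; string positions in awk start at 1.

```shell
awk 'BEGIN {
    s = "hello,world,again"
    print index(s, "world")            # 7
    print substr(s, 7, 5)              # world
    print length(s)                    # 17
    n = split(s, parts, ",")
    print n, parts[3]                  # 3 again
    print toupper(substr(s, 1, 5))     # HELLO
    print sprintf("%.3f", sqrt(2))     # 1.414
}'
```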

Page 29: Awk essentials

Built-in functions contd.

Function                       Explanation
sub(pat1, "pat2", [string])    Substitute the first occurrence of pat1 with pat2 in string. String by default is the entire line
gsub(pat1, "pat2", [string])   Same as above, but replace all occurrences of pat1 with pat2.
match(string, pat1)            Finds the regular expression pat1 in string and sets two special variables: RSTART, where the match begins, and RLENGTH, how long it is
systime()                      Returns the current time of day as the number of seconds since midnight UTC, January 1, 1970 (not part of POSIX awk; available in gawk)
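A sketch of sub, gsub and match on an invented line. Both sub and gsub modify their target in place and return the number of replacements made.

```shell
echo 'cat bat cat' | awk '{
    line = $0
    sub(/cat/, "dog", line)        # first occurrence only, in a copy
    print line                     # dog bat cat
    n = gsub(/cat/, "dog")         # every occurrence, in $0
    print n, $0                    # 2 dog bat dog
    if (match($0, /b[a-z]+/)) print RSTART, RLENGTH    # 5 3
}'
```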

Page 30: Awk essentials

Case Insensitive Match

Case insensitive match:
awk 'BEGIN {IGNORECASE=1} /PAT1/' (gawk only; the variable name must be upper case)
awk 'tolower($0) ~ /pat1/ …'
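The tolower() form should work in any awk. A sketch on made-up log lines:

```shell
# Lower-case a copy of the line for the comparison; the original line is printed.
printf 'Error: disk\nok\nERROR: net\n' | awk 'tolower($0) ~ /error/'
```

Both error lines are printed with their original capitalization.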

Page 31: Awk essentials

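User-Defined functions

Gawk allows user-defined functions (as do nawk and other POSIX awks)

    #!/usr/bin/gawk -f
    # Print lines that have exactly four fields; report the others.
    {
        if (NF != 4) {
            error("Expected 4 fields");
        } else {
            print;
        }
    }

    # Write a diagnostic to the terminal, prefixed with the file name
    # unless the input comes from standard input ("-").
    function error(message) {
        if (FILENAME != "-") {
            printf("%s: ", FILENAME) > "/dev/tty";
        }
        printf("line # %d, %s, line: %s\n", NR, message, $0) >> "/dev/tty";
    }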

Page 32: Awk essentials

Very Simple Examples

Find the average file size in a directory
Find the users without a password
Convert String to Word (string2word.awk)
List the file count and size for each user (cnt_and_size.awk)

Page 33: Awk essentials

Awk one-liners

Example                                          Explanation
awk '{print $NF}' infile                         Print the last field in each line
awk '{print $(NF-1)}' infile                     Print the field before the last field. What would happen if () are removed? What happens if there is only one field?
awk 'NF' infile                                  Print only non-blank lines. Same as awk '/./' except that lines holding only spaces or tabs are also skipped
awk '{print length, $0}' infile                  Print each line preceded by its length.
awk 'BEGIN {while (++x<11) print x}'             Print 1 to 10
awk 'BEGIN {for (i=10; i<=50; i+=4) print i}'    Print 10 to 50 in increments of 4
awk '{print; print ""}' infile                   Add a blank line after every line
awk '{print; if (NF>0) print ""}' infile         Add a blank line after every non-blank line
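The two questions about $(NF-1) can be settled by trying them on made-up input:

```shell
echo '10 20 30' | awk '{print $(NF-1)}'   # → 20, the next-to-last field
echo '10 20 30' | awk '{print $NF-1}'     # → 29, the last field minus one
echo '10' | awk '{print $(NF-1)}'         # → 10, because $(1-1) is $0, the whole line
```

Without the parentheses, $ binds more tightly than the subtraction.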

Page 34: Awk essentials

Awk one-liners

Example                                                       Explanation
awk 'NF !=0 {++cnt} END {print cnt}' infile                   Count the number of non-blank lines
ls -l | awk 'NR>1 {s+=$5} END {print "Average:" s/(NR-1)}'    Return the average file size in a directory
awk '/pat1/?/pat2/:/pat3/' infile                             Uses the ternary operator ?: Equivalent to awk '/pat1/ && /pat2/ || /pat3/' except for lines containing pat1 and pat3 but not pat2
awk 'NF<10?/pat1/:/pat2/' infile                              Use pat1 if the number of fields is less than 10; otherwise use pat2
awk 'ORS=NR%3?" ":"\n"' infile                                Join three adjacent lines. ORS is the output record separator
awk 'ORS=NR%3?"\t":"\n" {print $1}' infile                    Print the first field, three to a row. ORS is the output record separator
awk 'FNR < 11' f1 f2 f3                                       Concatenate the first 10 lines of f1, f2, and f3.
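The ORS one-liner works because the assignment is itself the pattern: its value, the new ORS string, is non-empty and therefore true, so every line is printed with the separator just chosen. A sketch on six invented lines (the helper name seq_six is hypothetical):

```shell
# Hypothetical six-line input.
seq_six() { printf '1\n2\n3\n4\n5\n6\n'; }

# A space follows lines 1, 2, 4 and 5; a newline follows lines 3 and 6.
seq_six | awk 'ORS = NR % 3 ? " " : "\n"'
```

This prints 1 2 3 on the first row and 4 5 6 on the second.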

Page 35: Awk essentials

Awk one-liners

Example                                         Explanation
awk 'length < 81'                               Print lines that are shorter than 81 characters
awk '/pat1/, 0'                                 Print all lines between the line containing pat1 and the end of file
awk 'NR==10, 0'                                 Print lines 10 to the end of file. The end condition "0" represents "false".
awk '{ sub(/^[ \t]+/, ""); print }'             Trim the leading tabs or spaces. Called ltrim
awk '{ sub(/[ \t]+$/, ""); print }'             Trim the trailing tabs or spaces. Called rtrim
awk '{ gsub(/^[ \t]+|[ \t]+$/, ""); print }'    Trim the white space on both sides
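To check the trimming one-liner, the sketch below wraps the result in brackets so that any leftover white space would be visible. The input line is made up.

```shell
# Two leading spaces, then a trailing space and tab, are removed.
printf '  padded text \t\n' | awk '{ gsub(/^[ \t]+|[ \t]+$/, ""); print "[" $0 "]" }'
# → [padded text]
```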

Page 36: Awk essentials

Awk one-liners

Example                                            Explanation
awk '/pat1/ { gsub(/pat2/, "str") }; { print }'    Replace pat2 with "str" on lines containing pat1
awk '{ $NF = ""; print }'                          Delete the last field on each line (a trailing field separator is left behind)

Page 37: Awk essentials

Awk one-liners

http://www.catonmat.net/blog/awk-one-liners-explained-part-one/

Page 38: Awk essentials

Translators

awk2c – Translates awk programs to C
awk2p – Translates awk programs to Perl

Page 39: Awk essentials

References

sed & awk by Dale Dougherty & Arnold Robbins

http://www.grymoire.com/Unix/Awk.html
http://www.vectorsite.net/tsawk.html
http://snap.nlc.dcccd.edu/reference/awkref/gawk_toc.html

Page 40: Awk essentials

Q & A


Page 41: Awk essentials

Unanswered questions

How to print lines that are outside a block of lines? (print lines that are not enclosed by /pat1/,/pat2/)

Does awk support grouping and back-referencing (e.g. identify adjacent duplicate words)?
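Possible approaches to both questions, offered as sketches rather than definitive answers. For the first, a flag variable (here called inside, a made-up name) can track whether the current line falls within a block. For the second, awk regular expressions do support grouping with ( ), but as far as I know POSIX awk patterns have no back-references, so the duplicate-word check below compares neighbouring fields instead. The marker names and the helper name doc are invented.

```shell
# Hypothetical input with one marked block.
doc() { printf 'keep1\nBEGIN_BLK\ndrop\nEND_BLK\nkeep2\n'; }

# Print only the lines outside the BEGIN_BLK..END_BLK block (markers excluded).
doc | awk '/BEGIN_BLK/ {inside = 1}
           !inside
           /END_BLK/   {inside = 0}'

# Report adjacent duplicate words by comparing each field with the one before it.
echo 'this is is a test' |
awk '{for (i = 2; i <= NF; i++) if ($i == $(i-1)) print "duplicate:", $i}'
```

The order of the three rules matters: the flag is set before the print test and cleared after it, so both marker lines are suppressed.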