Lecture 10
Introduction to AWK
COP 3344 Introduction to UNIX


What is AWK

• Important early text manipulation language
– Created by Al Aho, Peter Weinberger & Brian Kernighan
• This Unix utility manipulates text files that are viewed as arranged in columns
• awk splits each line of input (from standard input or a set of files) into fields, using whitespace by default, and processes the lines one at a time. The field separator need not be whitespace; it can also be a specified character
• There are also other flavors of awk such as nawk and gawk


Awk Command Structure

awk [options] 'program' [file(s)]
awk [options] -f programfile [file(s)]

• A program can be one or more pairs of the following:
pattern { procedure }
• BEGIN and END constructs can also be used
• An important option is -Fc, where c is the field separator to use. For example, awk -F: . . . indicates that the separator is ":"
• Example
awk -F: '/this/ { print $2 }' file1
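The two invocation forms above can be tried side by side. This is a minimal sketch: the colon-separated sample records and the variable names (data, prog, inline_out, file_out) are made up for illustration, and the program file is a temporary file created with mktemp.

```shell
# Hypothetical sample input: colon-separated records.
data='this:alpha:1
that:beta:2
this:gamma:3'

# Form 1: the program is given inline on the command line.
inline_out=$(printf '%s\n' "$data" | awk -F: '/this/ { print $2 }')

# Form 2: the same program is read from a file with -f.
prog=$(mktemp)
printf '%s\n' '/this/ { print $2 }' > "$prog"
file_out=$(printf '%s\n' "$data" | awk -F: -f "$prog")
rm -f "$prog"

echo "$inline_out"
```

Both forms should print the second field of the two lines that contain "this".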


Awk Program Processing

• awk scans each input line for each pattern, and when a match occurs the associated actions defined by the procedure are executed. The general form of a program is:

BEGIN { initial statements }
pattern { procedure }
pattern { procedure }
END { final statements }

– If the pattern is missing, the procedure is applied to each line
– If the procedure is missing, the matched lines are written to standard output
• Fields are referred to by the variables $1, $2, …, $n. $0 refers to the entire record (the line).
• Statements following BEGIN are executed before any input line is read; statements following END are executed after all input lines have been processed.
• In most programs there is only one pattern { procedure } pair
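The processing order described above can be sketched with a small run. The three-line input and the variable names (data, full, no_pattern, no_proc) are assumptions chosen for the illustration.

```shell
# Hypothetical sample input: a name and a number on each line.
data='alpha 1
beta 2
alpha 3'

# BEGIN runs once before any input, the pattern-procedure pair runs for
# each matching line, and END runs once after the last line.
full=$(printf '%s\n' "$data" | awk '
BEGIN   { print "start" }
/alpha/ { n++; print $2 }
END     { print "matched", n }')

# Pattern missing: the procedure is applied to every line.
no_pattern=$(printf '%s\n' "$data" | awk '{ print $1 }')

# Procedure missing: matching lines are written to standard output unchanged.
no_proc=$(printf '%s\n' "$data" | awk '/beta/')

echo "$full"
```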


awk patterns

• awk patterns can take any of the following forms:
/regular expression/
relational expression
field-matching expression

• Example patterns
/this/
/^alpha*/
NF > 2
$1 == $2
$1 ~ /m$/
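To see what each kind of pattern selects, the example patterns can be used on their own as filters (with no procedure, matching lines are printed). The sample lines are invented for this sketch. Note that in /^alpha*/ the * applies only to the final a, so the pattern matches any line starting with "alph".

```shell
# Hypothetical sample input.
data='alph one
alpha beta gamma
same same
item 7'

# Regular expression: lines starting with "alph" followed by zero or more "a".
regex_out=$(printf '%s\n' "$data" | awk '/^alpha*/')

# Relational expressions: more than two fields; first field equals second.
nf_out=$(printf '%s\n' "$data" | awk 'NF > 2')
eq_out=$(printf '%s\n' "$data" | awk '$1 == $2')

# Field-matching expression: first field ends in "m".
field_out=$(printf '%s\n' "$data" | awk '$1 ~ /m$/')

echo "$regex_out"
```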


Example pattern-procedures

• Print the second field of each line
{ print $2 }
• Print the first field of all lines that contain the pattern alpha
/alpha/ { print $1 }
• Print all records containing more than two fields
NF > 2
• Add numbers in second column if first field matches the word "add"
$1 ~ /^add$/ { total += $2 } END { print "total is", total }
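The last pattern-procedure above can be run as follows. The input lines are hypothetical; the "addendum" line is included to show that the anchored pattern /^add$/ requires the first field to be exactly "add".

```shell
# Hypothetical sample input: a keyword and a number on each line.
data='add 10
skip 99
add 5
addendum 1'

# Accumulate the second field only where the first field is exactly "add",
# then report the sum once all input has been read.
total_out=$(printf '%s\n' "$data" |
    awk '$1 ~ /^add$/ { total += $2 } END { print "total is", total }')

echo "$total_out"
```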


awk Regular Expressions

• Regular expressions are formed in the same way as they are for extended grep (egrep), and the same extended operators are available

• Note that regular expressions must be placed between slashes: /<regular expression>/

• Examples
/D[Rr]\./ # matches any line containing DR. or Dr.
/^alpha/ # matches any line starting with alpha
/^[a-zA-Z]+/ # matches any line starting with a sequence of letters (one or more)
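The three example expressions can be checked against a few lines. The input below is invented for the sketch; "Driver" is included to show that the escaped \. matches only a literal period.

```shell
# Hypothetical sample input.
data='DR. Smith
Dr. Jones
Driver
alphabet soup
42 alpha'

# Lines containing "DR." or "Dr." (the backslash makes the dot literal).
title_out=$(printf '%s\n' "$data" | awk '/D[Rr]\./')

# Lines starting with "alpha".
alpha_out=$(printf '%s\n' "$data" | awk '/^alpha/')

# Lines starting with one or more letters.
letters_out=$(printf '%s\n' "$data" | awk '/^[a-zA-Z]+/')

echo "$letters_out"
```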


awk Relational Expressions

• Relational expressions can consist of strings, numbers, arithmetic / string operators, relational operators, defined variables, and predefined variables.
– $1, …, $n are the fields of the record
– $0 is the entire line
– NF is the number of fields in the current line
– NR is the number of the current line
– FS is the field separator
– FILENAME is the current filename

• Many relational operators are available
NF > 5 && $1 == $2
/while/ || /do/

• Note: variables can be assigned with the "=" operator
FS = ","
total = 5
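A short sketch of the predefined variables in use. The comma-separated records, the temporary file, and the variable names (tmp, wide, both, name_out) are assumptions for illustration. FS is assigned in a BEGIN block so that it is already in effect when the first line is split.

```shell
# Hypothetical comma-separated input, written to a temporary file so that
# FILENAME has a value to report.
tmp=$(mktemp)
printf '%s\n' 'a,b,c' 'd,e' 'f,f,h,i' > "$tmp"

# Line number, field count, and first field of lines with more than two fields.
wide=$(awk 'BEGIN { FS = "," } NF > 2 { print NR, NF, $1 }' "$tmp")

# Two relational tests combined with &&.
both=$(awk 'BEGIN { FS = "," } NF > 2 && $1 == $2 { print $0 }' "$tmp")

# FILENAME holds the name of the file currently being read.
name_out=$(awk 'NR == 1 { print FILENAME }' "$tmp")

rm -f "$tmp"
echo "$wide"
```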


awk field matching expressions

• Field matching expressions check whether a field matches ("~") or does not match ("!~") a regular expression.

• Examples
$1 ~ /D[Rr]\./ # does the first field match DR. or Dr.?
$1 !~ /From/ # does the first field not match From?
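The difference between matching a single field and matching the whole line can be sketched like this. The input lines are hypothetical and are chosen so that "From" and "Dr." also appear outside the first field.

```shell
# Hypothetical sample input.
data='From alice
Reply From bob
Dr. Who
Call Dr. Who'

# Only lines whose FIRST field matches DR. or Dr.
first_dr=$(printf '%s\n' "$data" | awk '$1 ~ /D[Rr]\./')

# Lines whose first field does NOT match From.
not_from=$(printf '%s\n' "$data" | awk '$1 !~ /From/')

# For comparison: a bare /From/ is tested against the whole line ($0).
any_from=$(printf '%s\n' "$data" | awk '/From/')

echo "$not_from"
```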


awk procedures

• An awk procedure specifies the processing of a line that matches a given pattern. An awk procedure is enclosed in "{" and "}" and consists of statements separated by semicolons or newlines.

• awk is a full programming language, and contains control statements (such as: do-while, for, if, break, continue, etc.)

• Note that BEGIN can be used to initialize variables and END can be used to do post processing after all records have been processed
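A procedure that uses several statements and control structures might look like the sketch below. The numeric input, the threshold of 10, and the variable names (big, sum, i) are made up for the illustration.

```shell
# Hypothetical sample input: a few numbers on each line.
data='3 1 4
1 5
9 2 6 5'

out=$(printf '%s\n' "$data" | awk '
BEGIN { big = 0 }                 # initialize a counter before any input
{
    sum = 0
    for (i = 1; i <= NF; i++)     # loop over every field of the line
        sum += $i
    if (sum > 10) {
        big++
        print NR ": " sum " (large)"
    } else
        print NR ": " sum
}
END { print big " large line(s)" }   # post-processing after all records
')

echo "$out"
```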


awk examples

• # print the first two fields, in reverse order, of each line that matches /this/
awk '/this/ { print $2, $1 }' file1

• # sum the values in the second column of the lines that match add, and print the final sum
awk 'BEGIN { sum=0 } /add/ { sum += $2 } \
END { print sum }' file2

• # illustrating if statements and the or operator
awk '/green/ || /yellow/ \
{ if ($1 == "green") print $1; \
else if ($1 == "yellow") print "SLOW DOWN"; }' \
file3