LIN 6932 Unix Lecture 7, Hana Filip. Text Processing Command Line Utility Programs (cont.)


LIN 6932 1

Unix Lecture 7

Hana Filip

LIN 6932 2

Text Processing Command Line Utility Programs (cont.)

LAST WEEK

• sed

• wc

• sort

• tr

• uniq

TODAY

• awk

• join

• paste

Others you may want to check out:

• comm

• cut

• ex

• iconv

• xargs

LIN 6932 3

Text Processing Command Line Utility Programs

awk

• named after the last names of its inventors: Alfred Aho, Peter Weinberger and Brian Kernighan

• a pattern scanning and processing language

• a programming language that makes extensive use of the string datatype, associative arrays* and regular expressions

• AWK programs and sed scripts inspired Larry Wall to write Perl

*associative array: also called a map, hash, dictionary or lookup table (and, in query processing, an index or index file). It is an abstract data type composed of a collection of keys and a collection of values, where each key is associated with one value. The operation of finding the value associated with a key is called a lookup or indexing. The relationship between a key and its value is sometimes called a mapping or binding. Hence, associative arrays are very closely related to the mathematical concept of a function.
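To make the key/value idea concrete, here is a minimal sketch of an awk associative array used as a word tally. The sample text and the names count, w and counts are invented for this illustration; the output is piped through sort because awk does not guarantee the order in which "for (w in count)" visits the keys.

```shell
# Count how often each word occurs, using an awk associative array.
# The two-line sample text is made up for this sketch.
counts=$(printf 'the cat saw the dog\nthe dog ran\n' |
  awk '{ for (i = 1; i <= NF; i++) count[$i]++ }   # key: the word, value: its tally
       END { for (w in count) print w, count[w] }  # look up each key in turn
  ' | sort)
echo "$counts"
```

Each distinct word becomes a key, and the lookup count[w] returns the number stored under that key.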

LIN 6932 4

Text Processing Command Line Utility Programs

awk

• An AWK program is a series of

pattern { action }

pairs, where pattern is typically an expression and action is a series of commands.

• Each line of input is tested against all the patterns in turn, and the action is executed if the pattern matches or the relevant expression is true.

• Either the pattern or the action may be omitted.

• The pattern defaults to matching every line of input.

• The default action is to print the line of input.
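The two defaults can be seen side by side in a small sketch. The three-line input and the variable names are assumptions made only for this example.

```shell
# Hypothetical three-line input used only for this sketch.
input='alpha 1
beta 2
gamma 3'

# Pattern only: no action is given, so matching lines are printed as they are.
pattern_only=$(printf '%s\n' "$input" | awk '$2 >= 2')

# Action only: no pattern is given, so the action runs for every line.
action_only=$(printf '%s\n' "$input" | awk '{ print $2 }')

echo "$pattern_only"
echo "$action_only"
```

The first command prints the whole lines whose second field is at least 2; the second prints the second field of every line.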

LIN 6932 5

Text Processing Command Line Utility Programs

awk - how to run it

• If the program is short, it is easiest to include it in the command that runs awk:

% awk 'program' input-file1 input-file2 ...

where 'program' consists of a series of patterns and actions

• When the program is long, it is usually more convenient to put it in a file and run it with a command like this:

% awk -f program-file input-file1 input-file2 ...
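As a rough sketch of the two invocation styles, the following runs the same one-rule program both ways. The file names sample.txt and count.awk and the temporary directory are assumptions for this example, and it is written for a Bourne-style shell rather than csh.

```shell
workdir=$(mktemp -d)
printf 'one two\nthree four five\n' > "$workdir/sample.txt"

# Short program given directly on the command line.
inline=$(awk '{ print NF }' "$workdir/sample.txt")

# The same program stored in a file and run with -f.
printf '{ print NF }\n' > "$workdir/count.awk"
from_file=$(awk -f "$workdir/count.awk" "$workdir/sample.txt")

echo "$inline"
echo "$from_file"
rm -rf "$workdir"
```

Both commands print the number of fields on each input line, so the two results should be identical.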

LIN 6932 6

Text Processing Command Line Utility Programs

awk - how to run it

% awk 'program' input-file1 input-file2 ...

The single quotes around 'program' make the shell treat all of 'program' as a single argument for awk, and they allow the program to be more than one line long.

LIN 6932 7

Text Processing Command Line Utility Programs

awk - how to run it

% awk '/foo/ { print $0 }' list

fooey 555-1234 2400/1200/300 B

foot 555-6699 1200/300 B

macfoo 555-6480 1200/300 A

sabafoo 555-2127 1200/300 C

when lines containing ‘foo’ are found in the file list, they are printed

PATTERN: /foo/ the slashes indicate that ‘foo’ is a pattern ( = regular expression) to search for

ACTION: print $0 action to print the current line

LIN 6932 8

Text Processing Command Line Utility Programs

awk - how to run it

% awk '/foo/ { print $0 }' list
fooey 555-1234 2400/1200/300 B
foot 555-6699 1200/300 B
macfoo 555-6480 1200/300 A
sabafoo 555-2127 1200/300 C

% egrep 'foo' list
fooey 555-1234 2400/1200/300 B
foot 555-6699 1200/300 B
macfoo 555-6480 1200/300 A
sabafoo 555-2127 1200/300 C

LIN 6932 9

Text Processing Command Line Utility Programs

awk - how to run it

% awk '/12/ { print $0 } /21/ { print $0}' list
aardvark 555-5553 1200/300 B
alpo-net 555-3412 2400/1200/300 A
barfly 555-7685 1200/300 A
bites 555-1675 2400/1200/300 A
core 555-2912 1200/300 C
fooey 555-1234 2400/1200/300 B
foot 555-6699 1200/300 B
macfoo 555-6480 1200/300 A
sdace 555-3430 2400/1200/300 A
sabafoo 555-2127 1200/300 C
sabafoo 555-2127 1200/300 C

The sabafoo line is printed twice because it matches both patterns, once for /12/ and once for /21/.
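A smaller sketch of the same behaviour, using three made-up records in which only the last one contains both "12" and "21":

```shell
# Invented records: only the last one matches both patterns.
twice=$(printf 'aaa 555-1200\nbbb 555-2100\nccc 555-2127 1200\n' |
  awk '/12/ { print $0 }
       /21/ { print $0 }')
echo "$twice"
```

Every rule is tested against every line, so a line that satisfies two rules is printed once per rule.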

LIN 6932 10

Text Processing Command Line Utility Programs

awk - how to run it

The awk language is very useful for producing reports from large amounts of raw data, such as summarizing information from the output of other utility programs like ls.

% ls -la | awk '$6 == "Apr" { sum += $5 } END { print sum }'

692947

This command prints the total number of bytes in all the files in the current directory that were last modified in April. It relies on the listing having the file size in field 5 and the month in field 6.
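Because the column layout of ls output differs between systems, the sketch below applies the same awk program to a canned listing instead of live ls output. The file names, owner and sizes are invented for the illustration.

```shell
# Canned listing in the style of "ls -l" output; all values are invented.
listing='-rw-r--r-- 1 hana staff 1200 Apr  3 10:15 notes.txt
-rw-r--r-- 1 hana staff  300 Mar 28 09:00 old.txt
-rw-r--r-- 1 hana staff 2500 Apr 12 16:40 data.csv'

total=$(printf '%s\n' "$listing" |
  awk '$6 == "Apr" { sum += $5 }   # add the size field of each April entry
       END { print sum }           # print the total after the last line
  ')
echo "$total"
```

Only the two April entries contribute, so the total is 1200 + 2500.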

LIN 6932 11

Text Processing Command Line Utility Programs

awk - how to run it - executable awk Programs

An awk script can have three types of blocks. At least one of them must be present.

The BEGIN{} block is processed before any input is read.

The {} block runs for every line of input.

The END{} block is processed after the final line of input has been read.

LIN 6932 12

Text Processing Command Line Utility Programs

awk - how to run it

• BEGIN and END are special patterns.

• They are not used to match input records.

• They are used for supplying start-up or clean-up information to your awk script.

• A BEGIN rule is executed, once, before the first input record has been read.

• An END rule is executed, once, after all the input has been read.
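The order in which the three kinds of rule run can be sketched with a three-line input. The input lines and the variable names report and n are made up for this example.

```shell
report=$(printf 'x\ny\nz\n' |
  awk 'BEGIN { print "start" }      # runs once, before any input is read
       { n++ }                      # runs for every input line
       END { print "lines:", n }    # runs once, after all input has been read
  ')
echo "$report"
```

The BEGIN output appears first, and the END rule sees the final value of n.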

LIN 6932 13

Text Processing Command Line Utility Programs

awk - how to run it - executable awk Programs

You can write self-contained awk scripts using the ‘#!’ script mechanism:

#! /usr/bin/awk -f

You may want to check with

% whereis awk

to see what to put into the first line of your awk script.

LIN 6932 14

Text Processing Command Line Utility Programs

awk - how to run it - executable awk Programs

% vi whowhat

#! /usr/bin/awk -f
/[Uu]npaid/ { print $1, "owes", $2 > "deadbeats" }

% chmod +x whowhat
% whowhat debts
% vi deadbeats

Dick owes 3.87
Harry owes 56.00
Tom owes 36.03
Harry owes 22.60
Tom owes 11.44
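The whole cycle (write the script, make it executable, run it) can be sketched in a Bourne-style shell as follows. The contents of debts are invented here, since the slide does not show that file, and the awk path is taken from whatever "command -v awk" reports on the current system rather than assumed to be /usr/bin/awk.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Invented input file: name, amount, status.
cat > debts <<'EOF'
Dick 3.87 unpaid
Anne 10.00 paid
Tom 36.03 Unpaid
EOF

# Build the script, using the awk path found on this system for the #! line.
awk_path=$(command -v awk)
{
  echo "#!$awk_path -f"
  echo '/[Uu]npaid/ { print $1, "owes", $2 > "deadbeats" }'
} > whowhat
chmod +x whowhat

# Run the script; its output goes to the file deadbeats, not to the screen.
./whowhat debts
result=$(cat deadbeats)
echo "$result"
```

The script is started as ./whowhat so that it also works when the current directory is not on the search path.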

LIN 6932 15

Text Processing Command Line Utility Programs

awk - how to run it - executable awk Programs

% vi awklw

#! /usr/bin/awk -f

BEGIN { nl = 0; nw = 0 }

{ nl++ ; nw += NF }

END { print "Lines:", nl, "words:", nw }

% chmod +x awklw

% awklw machen.txt

Lines: 2538 words: 21853

LIN 6932 16

Text Processing Command Line Utility Programs

awk - how to run it - executable awk Programs

++: the increment operator means “add one to a variable” or “make a variable's value one more than it was before.”

NF: built-in awk variable that holds the number of fields in the current input line (record), so adding up NF over all lines counts the words in the file
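A short sketch showing how NF changes from line to line while nl and nw accumulate. The three-line input (the last line is empty) is made up for this example.

```shell
summary=$(printf 'to be or not\nto be\n\n' |
  awk '{ nl++; nw += NF; print NF }               # NF is recomputed for each line
       END { print "Lines:", nl, "words:", nw }')
echo "$summary"
```

The per-line values are 4, 2 and 0, and the END rule reports their count and their sum.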

Useful reference for awk usage:

Linux and Unix Shell Programming By D. S. W. Tansley

books.google.com/

LIN 6932 17

Text Processing Command Line Utility Programs

paste

prints lines consisting of the sequentially corresponding lines of each specified file. In the output, the original lines are separated by TABs, and each output line is terminated with a newline.

% paste file1 file2 … filen > filen+1

% vi numbers

1
2
3
4

% vi letters

a
b
c
d

% paste numbers letters > numbers.letters

1	a
2	b
3	c
4	d
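The following sketch repeats the numbers/letters example in a temporary directory and makes the TAB separator visible by translating it to a vertical bar. The directory and the variable names are assumptions for this illustration.

```shell
workdir=$(mktemp -d)
printf '1\n2\n3\n' > "$workdir/numbers"
printf 'a\nb\nc\n' > "$workdir/letters"

# paste joins line 1 with line 1, line 2 with line 2, and so on.
pasted=$(paste "$workdir/numbers" "$workdir/letters")

# Translate each TAB to "|" so the separator can be seen.
visible=$(printf '%s\n' "$pasted" | tr '\t' '|')
echo "$visible"
rm -rf "$workdir"
```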

LIN 6932 18

Text Processing Command Line Utility Programs

join

merges the lines of two text files that share a common field (by default the first field of each line); both files must already be sorted on that field

% join file1 file2 > file3

% vi person1

Smith john
Carpenter mary

% vi person2

newman bill
Smith betty

% join person1 person2 > person3

Smith john betty
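Since join depends on sorted input, a cautious way to reproduce the example is to sort each file as it is created. This is a sketch in a Bourne-style shell; the temporary directory and the variable names are assumptions.

```shell
workdir=$(mktemp -d)

# Sort each file on its first field before joining.
printf 'Smith john\nCarpenter mary\n' | sort > "$workdir/person1"
printf 'newman bill\nSmith betty\n'   | sort > "$workdir/person2"

# Only keys present in both files produce an output line.
joined=$(join "$workdir/person1" "$workdir/person2")
echo "$joined"
rm -rf "$workdir"
```

Carpenter and newman each occur in only one file, so they do not appear in the result.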

LIN 6932 19

Looping Logic

• In looping logic, a control structure (or loop) repeats until some condition exists or some action occurs

• You already know the foreach loop:

foreach var ( wordlist )

command(s)

end

It loops through a range of values.

It makes a variable take on each value in a specified set, one at a time, and performs some action for each value.

LIN 6932 20

Looping Logic

foreach var ( wordlist )

command(s)

end

while ( expr )

command(s)

end

LIN 6932 21

Looping Logic

#!/bin/csh
foreach person (Bob Susan Joe Gerry)
    echo Hello $person
end

Output:
Hello Bob
Hello Susan
Hello Joe
Hello Gerry
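For comparison only, the same loop can be sketched in a Bourne-style shell (sh or bash), where the construct is spelled for ... in ... do ... done. This is not csh syntax; the variable greetings is introduced just to capture the output.

```shell
# Bourne-shell counterpart of the csh foreach loop above.
greetings=$(for person in Bob Susan Joe Gerry; do
  echo "Hello $person"
done)
echo "$greetings"
```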

LIN 6932 22

The while Loop

• A different pattern for looping is created using the while loop:

while ( condition )
    command(s)
end

• The while statement best illustrates how to set up a loop to test repeatedly for a matching condition

• The while loop tests a condition in a manner similar to the if statement

• As long as the condition is true, the command(s) repeat(s)

LIN 6932 23

Looping Logic

Adding integers from 1 to 10

#!/bin/csh
set i=1
set sum=0
while ($i <= 10)
    echo Adding $i to the sum.
    set sum=`expr $sum + $i`
    set i=`expr $i + 1`
end
echo The sum is $sum.
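As a cross-check of the result, here is a sketch of the same computation in a Bourne-style shell, which uses [ ... ] for the test and $(( ... )) for arithmetic instead of csh's ( ... ) and expr. It is an equivalent under those assumptions, not csh code.

```shell
# Bourne-shell counterpart of the csh while loop above.
i=1
sum=0
while [ "$i" -le 10 ]; do
  sum=$((sum + i))   # add the current value of i
  i=$((i + 1))       # move on to the next integer
done
echo "The sum is $sum."
```

The integers from 1 to 10 add up to 55, which is what both versions should report.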