Chapter 11: Perl Scripting Off Larry’s Wall. In this chapter … Background Terminology Syntax...

Post on 17-Jan-2016


Chapter 11: Perl Scripting

Off Larry’s Wall

In this chapter …

• Background

• Terminology

• Syntax

• Variables

• Control Structures

• File Manipulation

• Regular Expressions

Perl

• Practical Extraction and Report Language

• Developed by Larry Wall in 1987

• Originally created for data processing and report generation

• Elements of C, AWK, sed, scripting

• Add-on modules and third party code make it a more general programming language

Features

• C-derived syntax

• Ambiguous variables & dynamic typing

• Singular and plural variables

• Informal, easy to use

• Many paradigms – procedural, functional, object-oriented

• Extensive third party modules

Features, cont’d

• As elegant as you make it

• Do What I Mean intelligence

• Fast, easy, down and dirty coding

• Interpreted, not compiled

• perldoc – man pages for Perl modules

Terminology

• Module – one standalone piece of code

• Distribution – set of one or more modules

• Package – a namespace; a module typically defines one package

• Package variable – declared in package, accessible between modules

• Lexical variable – local variable (scope)

Terminology, cont’d

• Scalar – variable that contains only one value (number, string, etc.)

• Composite – variable made of one or more scalars

• List – series of one or more scalars
  – e.g. (2, 4, 'Zach')

• Array – composite variable containing a list

Invoking Perl

• perl -e 'text of perl program'

• perl perl_script

• Make the Perl script executable and you can execute the script itself
  – e.g. ./my_script.pl

• The common file extension .pl is not required

• Like other scripts, start with a #! line to specify the program that executes it

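Invoking Perl, cont’d

• Use perl -w to display warnings
  – Warns about likely mistakes, such as using uninitialized values or names that appear only once
  – Instead of -w, put use warnings; in your script
    • Much the same effect, but limited to the enclosing file or block

• Usually you’ll find perl in /usr/bin/perl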
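As a rough sketch of how the invocation pieces fit together, a script file might begin like this. The path on the #! line follows the location mentioned above and may differ on your system; the variable name is only for illustration.

```perl
#!/usr/bin/perl
# Minimal script skeleton (illustrative only).
use strict;      # refuse undeclared variables
use warnings;    # like perl -w, but scoped to this file

my $greeting = "Hello from Perl";   # hypothetical variable, just for the demo
print "$greeting\n";
```

Saved as my_script.pl, this can be run as perl my_script.pl, or made executable (chmod +x my_script.pl) and run as ./my_script.pl.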

Syntax

• Each Perl statement ends with a semicolon (;)

• Can have multiple statements per line

• Whitespace is largely ignored
  – Except within quoted strings

• Double quotes allow interpretation of variables and special characters (like \n)

• Single quotes don’t (just like the shell)

Syntax, cont’d

• Forward slashes delimit regular expressions (e.g. /.*sh?/)

• Backslash used for escape characters
  – e.g. \n – newline, \t – tab

• Everything from a # to the end of the line is ignored as a comment
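The quoting rules above can be seen in a small sketch. The variable names and sample value are made up for the example.

```perl
use strict;
use warnings;

my $name = "Zach";                    # sample value

my $double = "Hi $name\tdone";        # double quotes: $name and \t are interpreted
my $single = 'Hi $name\tdone';        # single quotes: everything is literal

print "$double\n"; print "$single\n"; # two statements on one line are fine
```

The first line printed contains the name and a real tab; the second shows the characters $name and \t exactly as typed.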

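Output

• Old way
  – print what_to_print;
  – Print several items by passing a comma-separated list
    • print item_1, item_2;
  – Want a newline?
    • print what_to_print, "\n";

• New way
  – say what_to_print;
    • Automatically adds newline
    • Available in Perl 5.10 and later; enable it with use feature 'say';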

Output, cont’d

• what_to_print can be many things
  – Quoted string – "Here's some text"
  – Variables – $myvar
  – Result of a function – uc($myvar)
  – A combination
    • print "Sub Tot: $total\n", "Tax: ", $total*$tax, "\n";
      (arithmetic is not evaluated inside double quotes, so keep it outside)

• Want to display an error and exit?
  – die "Uh-oh!\n";
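A short sketch of the output forms above, using made-up totals. It assumes Perl 5.10 or later so that say can be enabled.

```perl
use strict;
use warnings;
use feature 'say';    # say must be enabled explicitly

my $total = 100;      # sample values, not from any real data
my $tax   = 0.07;

die "Uh-oh!\n" if $total < 0;         # prints the message to STDERR and exits

my $tax_due = $total * $tax;          # compute outside the quotes
print "Sub Tot: $total\n";            # variable interpolated in a string
print "Tax: ", $tax_due, "\n";        # print takes a list of items
say "Total: ", $total + $tax_due;     # say appends the newline itself
say uc("done");                       # result of a function
```

Because print and say accept a list, mixing strings, variables and expressions is only a matter of separating them with commas.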

Variables

• Perl variables can be singular or plural

• Data typing done dynamically at runtime

• Three types
  – Scalar (singular)
  – Array (plural)
  – Hash, a.k.a. associative array (plural)

• Variable names are case sensitive

• Can contain letters, numbers, underscore

Variables, cont’d

• Each type of variable starts with a different special character to mark its type

• By default, all variables have package scope

• To make a variable lexical, preface its declaration with the my keyword

• Lexical variables override package variables of the same name

• Include use strict; to disallow use of undeclared variables
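To illustrate the difference between package and lexical scope, here is a minimal sketch. The our keyword (not covered on the slide) is the standard way to declare a package variable under use strict; the names are hypothetical.

```perl
use strict;
use warnings;

our $color = "red";          # package variable, also reachable as $main::color

sub show_color {
    my $color = "blue";      # lexical: hides the package variable inside this sub
    return $color;
}

my $inner = show_color();    # "blue", the lexical copy
my $outer = $color;          # "red", the package variable is untouched
print "$inner $outer $main::color\n";
```

The lexical $color exists only inside the sub, so the package variable keeps its value everywhere else.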

Variables, cont’d

• We’ve already covered use warnings;

• Variables that have not been assigned a value, if referenced, have a default value of undef
  – Equates to 0 or the null string, depending on context
  – Can check by using the defined() function

• $. is the line number of the line most recently read from an input handle

• $_ is the default operand – ‘it’
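A small sketch of undef, defined() and the default operand $_ follows. The variable names are illustrative.

```perl
use strict;
use warnings;

my $count;                                        # declared, never assigned: undef
my $state = defined($count) ? "set" : "undef";
print "count is $state\n";

my @seen;
for ("a", "b") {          # no loop variable named, so each item lands in $_
    push @seen, uc;       # uc with no argument operates on $_
}
print "@seen\n";
```

Many built-in functions fall back to $_ when called without an argument, which is what makes the terse loop body work.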

Scalars

• Singular, holds one value, either string or number

• Must be preceded with $, e.g. $myvar

• Perl will automatically cast between strings and numbers

• Will treat as a number or string, whichever is appropriate in context
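The automatic conversion between strings and numbers can be sketched as follows. The operator decides the context; the values are arbitrary.

```perl
use strict;
use warnings;

my $n = "10";            # a string that looks like a number

my $sum  = $n + 5;       # + is numeric, so $n is treated as 10
my $text = $n . 5;       # . is string concatenation, so 5 becomes "5"
my $rep  = $n x 2;       # x repeats a string

print "$sum $text $rep\n";   # prints: 15 105 1010
```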

Arrays

• Plural, containing an ordered list of scalars

• Zero-based indexing

• Dynamic size and allocation

• Begin with @ e.g. @myarray

• @variable references entire array

• To reference a single element (which would be a scalar, right?) $variable[index]

Arrays, cont’d

• $#array returns the index of the last element
  – Zero based – this means it’s one less than the size of the array

• @array[x..y] returns a ‘slice’ or sublist

• Printing arrays
  – Array enclosed in double quotes prints a space-delimited list
  – Not in quotes, all entries are concatenated

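Arrays, cont’d

• Arrays can be treated like FIFO queues
  – shift(@array) – remove and return the first element
  – push(@array, scalar) – add an element at the end

• Use splice to insert one array into another
  – splice(@array, offset, length, @otherarray)
  – Replaces length elements starting at offset with the contents of @otherarray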

Hashes

• Plural, contain an array of key-value pairs

• Prefix with %, e.g. %myhash

• Keys are strings, act as indexes to array

• Each key must be unique, returns one value

• Unordered

• Optimized for random access

• Keys don’t need quotes when they are simple words (letters, digits, underscores); quote anything else, such as keys containing spaces

Hashes, cont’d

• Element access
  – $hashvar{index} = value;
    • e.g. $myvar{boat} = "tuna"; print $myvar{boat};
  – %hashvar = ( key => value, … );
    • e.g. %myvar = ( boat => "tuna", 4 => "fish" );
  – Get array of keys or values
    • keys(%hashvar)
    • values(%hashvar)
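A brief sketch of hash creation, access and key listing, built on the slide's %myvar example. The extra key and the sort are additions for illustration; sorting is used only because hash order is not predictable.

```perl
use strict;
use warnings;

my %myvar = ( boat => "tuna", 4 => "fish" );

$myvar{"two words"} = "quoted key";   # a key containing a space must be quoted
$myvar{boat} = "salmon";              # keys are unique: this replaces "tuna"

my @keys  = sort keys %myvar;         # sorted for a stable listing
my $pairs = scalar(keys %myvar);      # number of key-value pairs

print "$_ => $myvar{$_}\n" for @keys;
```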

Evaluating Expressions

• Most control structures use an expression to evaluate whether they are run

• Perl uses different comparison operators for strings and numbers

• Also uses many of the same file test operators (existence, access, etc.) that bash uses

Expressions

• Numeric operators
  – ==, !=, <, >, <=, >=
  – <=> returns 0 if equal, 1 if >, -1 if <

• String operators
  – eq, ne, lt, gt, le, ge
  – cmp returns the same values as <=>, but compares as strings
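Why the two operator families matter can be seen in a short sketch: the same values compare differently as numbers and as strings. The sort blocks are a common use of <=> and cmp, added here for illustration.

```perl
use strict;
use warnings;

my $num_cmp = (10 <=> 9);        # 1: as numbers, 10 is greater than 9
my $str_cmp = ("10" cmp "9");    # -1: as strings, "1" sorts before "9"

my @by_number = sort { $a <=> $b } (10, 9, 100);   # 9 10 100
my @by_string = sort { $a cmp $b } (10, 9, 100);   # 10 100 9

print "$num_cmp $str_cmp\n";
print "@by_number\n@by_string\n";
```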

Control Structures

• if (expr) {…}

• unless (expr) {…}

• if (expr) {…} else {…}

• if (expr) {…} elsif (expr) {…} … else {…}

• while (expr) {…}

• until (expr) {…}

Control Structures, cont’d

• for and foreach are interchangeable

• Syntax 1
  – Similar to bash for…in structure
  – foreach [var] (list) {…}
  – If var is not given, $_ is assumed
  – For each loop iteration, the next value from list is placed in var

Control Structures, cont’d

• for/foreach Syntax 2
  – Similar to C’s for loop
  – foreach (expr1; expr2; expr3) {…}
  – expr1 sets the initial condition
  – expr2 is the test; the loop continues while it is true
  – expr3 is the incrementor

Control Structures, cont’d

• Short-circuiting loops
  – Use last to break out of loop altogether
    • Same as bash’s break
  – Use next to skip to the next iteration of the loop
    • Same as bash’s continue

Handles

• A handle is essentially a variable linked to a file or process

• Perl automatically opens handles for the default streams
  – STDIN, STDOUT, STDERR

• You can open additional handles
  – To a file for input/output/appending
  – To a process for input/output

Handles, cont’d

• Basic syntax
  – open(handle, ['mode'], "ref");
  – handle is a variable to reference the handle
  – mode can be many things
    • Simple cases: <, >, >>, |
    • Input (<) implied if omitted
  – ref is what to open – file or process
  – mode and ref can be combined as one string

Handles, cont’d

• Once open, access via the handle variable

• Output
  – print handle "what to print";

• Input
  – $var = <handle> gets one line of input
  – Use <handle> as a loop condition to read input one line at a time, populating $_

Handles, cont’d

• <> – magic handle, reads from the files named as command line arguments to the script, or from STDIN if there are none

• Line of input contains EOL character
  – Use chomp($var) to remove it
  – Use chop($var) to remove the last character, whatever it is

• When done, close(handle);
  – Housekeeping, good coding practice
  – Perl closes any handles still open when the program exits

Handles, cont’d

• Examples
  – open(my $INPUT, "/path/to/file");
  – open(my $ERRLOG, ">>/var/log/errors");
  – open(my $SORT, "| sort -n");
  – open(my $ALIST, "grep '^[Aa]' /usr/share/dict/words |");
  – while (<$INPUT>) { print $ERRLOG $_; }
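For a version that can be run anywhere, the following sketch writes, appends to, and reads back a scratch file. It uses the three-argument form of open with the mode as a separate string, and checks each open with die. The scratch file comes from the core File::Temp module, which is an assumption of this example and not part of the slides.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);   # core module; supplies a throwaway file name

my ($tmp_fh, $path) = tempfile(UNLINK => 1);   # removed when the program exits
close($tmp_fh);

# Output: > truncates and writes
open(my $OUT, '>', $path) or die "Cannot write $path: $!\n";
print $OUT "first\n", "second\n";
close($OUT);

# Appending: >> adds to the end
open(my $APPEND, '>>', $path) or die "Cannot append to $path: $!\n";
print $APPEND "third\n";
close($APPEND);

# Input: read one line at a time; $. holds the current line number
my @lines;
open(my $IN, '<', $path) or die "Cannot read $path: $!\n";
while (my $line = <$IN>) {
    chomp($line);                  # strip the trailing newline
    push @lines, "$.:$line";
}
close($IN);

print "@lines\n";   # prints: 1:first 2:second 3:third
```

Checking the return value of open is worthwhile because a failed open otherwise goes unnoticed until the first read or write.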

Regular Expressions

• Recall Appendix A

• Perl has a few unique features and caveats

• Regular Expressions (RE) delimited by forward slash

• Perl uses the =~ operator for RE matching
  – Ex. if ($myvar =~ /^T/) { … } # if myvar starts w/ T

• To negate RE matching use !~ operator

RE, cont’d

• =~ operator can also be used to do replacement
  – Ex. $result =~ s/old/new/;
  – The first ‘old’ in $result is replaced with ‘new’ if matched

• Remember, REs (esp. in Perl) are greedy
  – Will match longest possible match

• Grouping parentheses (bracketed expressions) don’t need to be escaped; just write ( and )
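The matching, replacement and greediness points can be sketched as follows. The sample strings are made up, and the non-greedy form .*? is an addition not mentioned on the slides.

```perl
use strict;
use warnings;

my $myvar = "Tuna and trout";
my $starts_with_T = ($myvar =~ /^T/) ? 1 : 0;   # 1: matches
my $has_no_digit  = ($myvar !~ /\d/) ? 1 : 0;   # 1: !~ negates the match

my $result = "old hat, old coat";
$result =~ s/old/new/;              # only the first 'old' is replaced

my $tag = "<b>bold</b>";
my ($greedy) = $tag =~ /(<.*>)/;    # unescaped parentheses capture; .* takes the longest match
my ($lazy)   = $tag =~ /(<.*?>)/;   # adding ? makes the quantifier match as little as possible

print "$result\n$greedy\n$lazy\n";
```

The greedy pattern captures the whole string <b>bold</b>, while the non-greedy one stops at the first closing bracket and captures <b>.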