  • date post

    20-Dec-2015

More CGI programming ...

Back to CGI programming ...

• Now that we know how to use conditional expressions, we can write a CGI program that determines whether the environment variables set by the HTTP server daemon include one of interest

CGI program which checks for a particular env var

#!/usr/local/bin/perl

print <<EOF;
Content-type: text/html

<HTML>

<HEAD><TITLE> Environment checking program </TITLE></HEAD>

<BODY>

<H1> Environment Variables </H1>

<p>

EOF

# next line checks if a certain key/value pair in %ENV is defined

if ( $ENV{"HTTP_GROCERY_ORDER"} )

{ print "Request includes Grocery-Order header" }

else { print "Request does not include Grocery-Order header" };

print <<EOF;

</p>

</BODY>

</HTML>

EOF

• CS4320 got here on 4 feb 2005

More Perl ...

Defining subroutines in Perl

• A subroutine definition in Perl is of the form

sub <subroutine-name>

{ <sequence-of-statements> }

• Example

sub greetTheWorld

{ print "Hello, world!\n";

print "Have a nice day"

}

• In a main program, this would be called as follows:

greetTheWorld();

Defining subroutines in Perl (contd.)

• Another example

sub printEnvironmentVariables

{ foreach my $key ( sort( keys(%ENV) ) )

{ print "<LI> $key = $ENV{$key}</LI>" }

}

• This is used on the next two slides in a new version of the CGI program which prints out its environment variables

Another CGI example ...

A revised CGI program to report env vars (Part 1)

#!/usr/local/bin/perl

print <<EOF;
Content-type: text/html

<HTML>

<HEAD>

<TITLE> Environment reporting program </TITLE>

</HEAD>

<BODY>

<H1> Environment Variables </H1>

<UL>

EOF

printEnvironmentVariables();

print <<EOF;

</UL>

</BODY>

</HTML>

EOF

A revised CGI program to report env vars (Part 2)

sub printEnvironmentVariables

{ foreach my $key ( sort( keys(%ENV) ) )

{ print "<LI> $key = $ENV{$key}</LI>" }

}

Some more Perl ...

Passing Arguments to subroutines

• The subroutines which we have defined so far have not taken any arguments

• Pre-defined Perl subroutines can take arguments, as in this program fragment:

%mothers = (Tom=>"May", Bob=>"Ann", Tim=>"Una");

delete( $mothers{Bob} )

• Can programmer-defined subroutines take arguments?

• Yes, although the way in which they handle arguments is a little different from what you are used to

Passing Arguments to subroutines (contd.)

• Suppose we want a subroutine called greetPerson which

– takes one argument, a string, and

– prints a message greeting the person whose name is the string

• An example call might be

greetPerson("Eamonn de Valera")

which should produce the output

Hello, Eamonn de Valera

• The following program fragment should produce the same output:

my $person = "Eamonn de Valera";

greetPerson($person)

• How would we define such a subroutine?

Passing Arguments to subroutines (contd.)

• Your first instinct might be to write something like this:

sub greetPerson($formalArgument)

{ print "Hello, $formalArgument" }

but that would be WRONG

• A subroutine in Perl must access its actual argument(s) through a special array variable called

@_

• Since our subroutine takes only one argument, this would be in the first element of @_, so our definition would be:

sub greetPerson

{ print "Hello, $_[0]" }

Passing Arguments to subroutines (contd.)

• Suppose we want a subroutine called greetTwoPeople which

– takes two string arguments and

– prints a message greeting the people whose names are the strings

• An example call might be

greetTwoPeople("Eamonn", "Michael")

which should produce the output

Hello, Eamonn and Michael

• Since our subroutine takes two arguments, these would be in the first two elements of @_, so our definition would be:

sub greetTwoPeople

{print "Hello, $_[0] and $_[1]"}
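The way @_ carries the actual arguments can be checked with a small runnable sketch. The subroutine names below are hypothetical; unlike the slide versions they return the greeting instead of printing it, so that the result can be inspected.

```perl
use strict;
use warnings;

# Hypothetical helpers (not from the slides): they build the greeting
# from the elements of @_ and return it.
sub greetingForOne { return "Hello, $_[0]" }
sub greetingForTwo { return "Hello, $_[0] and $_[1]" }

# scalar(@_) is the number of actual arguments received.
sub countArguments { return scalar(@_) }

print greetingForOne("Eamonn de Valera"), "\n";   # Hello, Eamonn de Valera
print greetingForTwo("Eamonn", "Michael"), "\n";  # Hello, Eamonn and Michael
print countArguments("a", "b", "c"), "\n";        # 3
```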

Passing Arguments to subroutines (contd.)

• Suppose we want a subroutine called greetMember which

– takes two arguments

• an array of strings

• an integer pointing to one member of this array

– and prints a message greeting the person whose name is the indicated string

• An example use is:

@club = ("Eamonn", "Michael", "Harry");

greetMember(2, @club)

which should produce the output

Hello, Michael

• This introduces a further complication ...

Passing Arguments to subroutines (contd.)

• All actual arguments to a subroutine are collapsed into one flat array, the special array @_

• Thus, the program fragment

@club = ("Eamonn", "Michael", "Harry");

greetMember(2, @club)

causes the subroutine greetMember to receive an @_ whose value is

(2, "Eamonn", "Michael", "Harry")

• So our definition would be:

sub greetMember

{ print "Hello, $_[$_[0]]" }
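The flattening of arguments can be made visible with a short sketch. The helper showArguments is hypothetical; it simply joins whatever arrives in @_.

```perl
use strict;
use warnings;

# Hypothetical helper: returns its argument list joined with commas,
# which shows exactly what the subroutine received in @_.
sub showArguments { return join(", ", @_) }

my @club = ("Eamonn", "Michael", "Harry");

print showArguments(2, @club), "\n";      # 2, Eamonn, Michael, Harry
print showArguments(@club, @club), "\n";  # six elements, array boundaries are lost
```

Because the boundaries between arrays are lost, a subroutine that takes a scalar and an array is usually easiest to write with the scalar first.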

Using local variables in subroutines

• Local variables can be defined in subroutines using the my construct

• Indeed, doing so enables us to write subroutines which are easier to understand

• The subroutine greetMember on the last slide is clearer if it is written using local variables, as follows:

sub greetMember

{my ($position, @strings);

$position = $_[0]-1;

@strings = @_[1..scalar(@_)-1];

print "Hello, $strings[$position]"

}

• CS 4400 got to here on 1 February 2002

Using local variables in subroutines

• We don’t have to declare the local variables in a separate line

• We can just use the my construct in the statements where the vars first appear

• The subroutine greetMember on the last slide could also be written as follows:

sub greetMember

{my $position = $_[0]-1;

my @strings = @_[1..scalar(@_)-1];

print "Hello, $strings[$position]"

}

Using local variables in subroutines

• We can also use a subroutine called shift() to remove the first element from @_

• Since shift() also returns, as its value, the value of the removed element, we can use it in an assignment statement

• Since we have removed the first element, we can then assign the new value of @_ to @strings

• The subroutine greetMember on the last slide could also be written as follows:

sub greetMember

{my $position = shift(@_) - 1;

my @strings = @_;

print "Hello, $strings[$position]"

}

Using local variables in subroutines

• What I regard as an unfortunate feature of Perl is that it allows a lot of abbreviations

• I present one here, simply because you will often see it in script archives

– if no explicit argument is given to shift() in a subroutine, it is assumed to be @_

• Thus, in a script archive, you might find subroutine greetMember on the last slide written as follows:

sub greetMember

{my $position = shift() - 1;

my @strings = @_;

print "Hello, $strings[$position]"

}
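As a rough check that the indexing style and the shift style agree, the sketch below uses two hypothetical subroutines that return the selected member instead of printing it. Both assume, as in the greetMember example, that the position argument counts from 1.

```perl
use strict;
use warnings;

# Variant using explicit indexing into @_ (position counts from 1).
sub memberByIndex {
    my $position = $_[0] - 1;
    my @strings  = @_[1 .. $#_];   # $#_ is the last index of @_
    return $strings[$position];
}

# Variant using shift, which removes and returns the first element of @_.
sub memberByShift {
    my $position = shift(@_) - 1;
    my @strings  = @_;
    return $strings[$position];
}

my @club = ("Eamonn", "Michael", "Harry");
print "Hello, ", memberByIndex(2, @club), "\n";   # Hello, Michael
print "Hello, ", memberByShift(2, @club), "\n";   # Hello, Michael
```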

subroutines which return values

• We often need to define subroutines which return values, as in the following program fragment:

my @numbers = (1, 2, 3, 4, 5);

my $average = sum( @numbers ) / scalar( @numbers );

print $average

• The subroutine sum can be defined as follows:

sub sum

{ my @numbers = @_;

my $sum = 0;

foreach my $value ( @numbers)

{ $sum = $sum + $value }

return $sum

}

• The value returned is specified with a return statement
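A returned value can be used anywhere an expression is allowed, including inside another subroutine. The sketch below repeats sum and adds a hypothetical average helper (not in the slides) that guards against an empty list.

```perl
use strict;
use warnings;

sub sum {
    my $sum = 0;
    foreach my $value (@_) { $sum = $sum + $value }
    return $sum;
}

# Hypothetical helper built on sum(); a bare return gives undef in
# scalar context, which avoids dividing by zero for an empty list.
sub average {
    my @numbers = @_;
    return unless @numbers;
    return sum(@numbers) / scalar(@numbers);
}

print average(1, 2, 3, 4, 5), "\n";   # 3
```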

subroutines which return values (contd.)

• A subroutine can contain more than one return statement

• The following program fragment defines and uses a boolean subroutine which checks for the existence of the argument passed to it

if ( present( $ENV{"EDITOR"} ) )

{ print "\n The envVar EDITOR exists" }

else { print "\n The envVar EDITOR does not exist" };

sub present

{ my $varInQuestion = $_[0];

if ( $varInQuestion ) { return 1 } else { return 0 }

}

• It enables us to write a cleaner version of a CGI program we wrote earlier

Revised CGI program which checks for an env var (part 1)

#!/usr/local/bin/perl

print <<EOF;
Content-type: text/html

<HTML>

<HEAD><TITLE> Environment checking program </TITLE></HEAD>

<BODY>

<H1> Environment Variables </H1>

<p>

EOF

if ( present($ENV{"HTTP_GROCERY_ORDER"}) )

{ print "Request includes Grocery-Order header" }

else { print "Request does not include Grocery-Order header" };

print <<EOF;

</p>

</BODY>

</HTML>

EOF

• Cs 4320 got here on 8 february 2005

Another CGI example ...

Program reporting GET method data

• We will use much of what we have learned to write a CGI program which

– is called by a HTML FORM

– and sends back to the browser a HTML page which lists the data it received from the form

Program reporting GET method data (part 1)

#!/usr/local/bin/perl

print <<EOF;
Content-type: text/html

<HTML>

<HEAD>

<TITLE> Program reporting GET method data </TITLE>

</HEAD>

<BODY>

<H1> Form Data sent by the GET method </H1>

<UL>

EOF

printFormData();

print <<EOF;

</UL>

</BODY>

</HTML>

EOF

Program reporting GET method data (part 2)

sub printFormData

{

my $queryString = $ENV{'QUERY_STRING'};

separateAndPrintDataIn($queryString)

}

sub separateAndPrintDataIn

{my (@equations, $name, $value);

@equations = split("&",$_[0]);

foreach my $equation (@equations)

{ ($name,$value) = split("=",$equation);

print "<LI>$name = $value </LI>";

}

}

• Cs 4320 got here on 24 feb 2004

Decoding query strings

• The previous program was pretty good but it would not work in all cases

• Suppose the program is called by a HTML FORM which contains two text input elements:

– one asks for the user’s name

– one asks for the company for which he works

Decoding query strings

• Suppose the user’s name is Sean Croke and his company’s name is Black&Decker

• The QUERY_STRING received by the program will be

name=Sean+Croke&company=Black%26Decker

because the space in the user’s name and the ampersand in the company’s name must be encoded for safe transmission

• separateAndPrintDataIn must be improved to cater for this

• We must learn more about string processing in Perl to do this

• CS 4320 got here on 18 Feb 2003

Some more Perl ...

String Processing in Perl

• Perl contains a facility for reasoning about regular expressions, expressions that describe classes of strings

• Since dynamic web page generation is all about text processing, Perl’s regular expression tools are probably the most important reason why the language has become so widely used in server-side web programming

• We will not have time in this course to cover all of Perl’s regular expression facilities

• We will consider only a subset, including those facilities that are required by the form-data processing task we have set out to achieve

Retrieving encoded SP characters

• To retrieve the SP characters that are encoded in the QUERY_STRING, we need to learn about only two operators

– the translation operator tr///

– the binding operator =~

• Consider the following Perl statement

$stringVar =~ tr/+/ /;

• The binding operator =~ says that the translation expression tr/+/ / should be applied to the contents of $stringVar

• The translation expression tr/+/ / specifies that every instance of the + character should be replaced by a SP

The tr/// operator

• In general, an application of the tr/// operator is of the form

tr/<list1>/<list2>/

where <list1> and <list2> are ordered character lists, normally of equal length (they are not regular expressions, although ranges such as a-z are allowed)

• It specifies that each instance of the Nth character in <list1> should be replaced by the Nth character in <list2>

• Example:

tr/abc/cab/

replaces any instance of a by c, any instance of b by a and any instance of c by b

• Example

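tr/A-Z/ZYXWVUTSRQPONMLKJIHGFEDCBA/

replaces uppercase letters with the corresponding letters in a reverse-order alphabet; the second list is written out in full because Perl rejects a descending range such as Z-A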
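The tr/// examples can be tried with the following sketch. The wrapper subroutines are hypothetical; each works on a copy of its argument and returns the translated string. The last helper relies on the fact that tr/// returns the number of characters it translated.

```perl
use strict;
use warnings;

# Hypothetical wrappers around tr/// so the results can be compared.
sub plusToSpace { my $s = $_[0]; $s =~ tr/+/ /;     return $s }
sub rotateAbc   { my $s = $_[0]; $s =~ tr/abc/cab/; return $s }
sub reverseAlphabet {
    my $s = $_[0];
    $s =~ tr/A-Z/ZYXWVUTSRQPONMLKJIHGFEDCBA/;
    return $s;
}

# tr/// returns the number of characters it translated.
sub countPluses { my $s = $_[0]; my $n = ($s =~ tr/+/+/); return $n }

print plusToSpace("Sean+Croke"), "\n";   # Sean Croke
print rotateAbc("aabbcc"), "\n";         # ccaabb
print reverseAlphabet("ABC"), "\n";      # ZYX
print countPluses("a+b+c"), "\n";        # 2
```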

Back to CGI programming ...

Retrieving encoded SP characters (finally!)

• This is the revised definition of separateAndPrintDataIn

sub separateAndPrintDataIn

{my (@equations, $name, $value);

@equations = split("&",$_[0]);

foreach my $equation (@equations)

{ ($name,$value) = split("=",$equation);

$value =~ tr/+/ /;

print "<LI>$name = $value </LI>";

}

}

Decoding URL-encodings

• The revised definition of separateAndPrintDataIn on the previous slide will handle the + char in a QUERY_STRING like

name=Sean+Croke&company=Black%26Decker

but it will not decode the URL-encoding in %26

• We need to modify the subroutine still further so that, whenever it finds a % followed by two hexadecimal digits, it will replace these three characters by the single character whose URL-encoding this three-character sequence represents

• We need to learn some more Perl

Yet more Perl ...

The s/// operator

• A basic application of the s/// operator is of the form

s/<pattern>/<replacement>/

where <pattern> is a regular expression and <replacement> is treated as if it were a double-quoted string

• which means that <replacement> can contain variables, some of which may be assigned values while <pattern> is matched with the target string

• The operator specifies that the first instance of <pattern> should be replaced by the corresponding interpretation of <replacement>

The s/// operator (contd.)

• Example s/// expression:

s/ab*c/ac/

this replaces the first substring of the target string that comprises “an a followed by zero or more instances of b followed by a c” with the substring “ac”

• Example application of the above s/// expression:

$myString = "adabbbbcabbcabceee";

print "myString is $myString\n";

$myString =~ s/ab*c/ac/;

print "myString is $myString"

• This produces the following output

myString is adabbbbcabbcabceee

myString is adacabbcabceee

The s/// operator (contd.)

• We have seen that certain characters have a special meaning in regular expressions:

– the example on the last slide used the * character which means “0 or more instances of the preceding character or pattern”

• These are called meta-characters

• Other meta-characters are listed on the next slide

The s/// operator (contd.)

• The meta-characters include:

• the * character which means “0 or more instances of preceding”

• the + character, which means “1 or more instances of preceding”

• the ? character, which means “0 or 1 instances of preceding”

• the { and } characters delimit an expression specifying a range of acceptable occurrences of the preceding character

• Examples:

{m} means exactly m occurrences of preceding character/pattern

{m,} means at least m occurrences of preceding char/pattern

{m,n} means at least m, but not more than n, occurrences of preceding char/pattern

• Thus,

{0,} is equivalent to *

{1,} is equivalent to +

{0,1} is equivalent to ?
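The equivalences between the brace forms and *, + and ? can be checked with a small sketch. The helper replaceFirst is hypothetical: it takes a pattern (as a single-quoted string) and a target string, applies s/// once to a copy, and returns the result.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the first match of $pattern with "x".
sub replaceFirst {
    my ($pattern, $string) = @_;
    $string =~ s/$pattern/x/;
    return $string;
}

print replaceFirst('ab*c',    "abbbc"), "\n";   # x
print replaceFirst('ab{0,}c', "abbbc"), "\n";   # x
print replaceFirst('ab+c',    "ac"),    "\n";   # ac   (+ needs at least one b)
print replaceFirst('ab{1,}c', "ac"),    "\n";   # ac
print replaceFirst('ab?c',    "abbc"),  "\n";   # abbc (? allows at most one b)
print replaceFirst('ab{2}c',  "abbc"),  "\n";   # x
```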

The s/// operator (contd.)

• Further meta-characters are:

• the ^ character, which matches the start of a string

• the $ character, which matches the end of a string

• the . character, which matches any character except a newline

• the [ and ] characters delimit an equivalence class of characters, any of which can match one character in the target string

• the ( and ) characters delimit a group of sub-patterns

• the | character separates alternative patterns

The s/// operator (contd.)

• Example s/// expression:

s/^a.*d$/x/

this replaces the entire target string with “x”, provided the target string starts with an a, followed by zero or more non-newline characters, and ends with a d

• An example application is on the next slide

The s/// operator (contd.)

• Example application of the s/// expression on the last slide:

$myString1 = "adabbbbcabbcabcede";

print "myString1 is $myString1\n";

$myString1 =~ s/^a.*d$/x/;

print "myString1 is $myString1\n";

print "\n";

$myString2 = "adabbbbcabbcabceed";

print "myString2 is $myString2\n";

$myString2 =~ s/^a.*d$/x/;

print "myString2 is $myString2"

• This produces the following output

myString1 is adabbbbcabbcabcede

myString1 is adabbbbcabbcabcede

myString2 is adabbbbcabbcabceed

myString2 is x

The s/// operator (contd.)

• Example s/// expression:

s/^a.{2,5}d$/x/

this replaces the entire target string with “x”, provided the target string starts with an a, followed by between two and five non-newline characters, and ends with a d

• An example application is on the next slide

The s/// operator (contd.)

• Example application of the s/// expression on the last slide:

$myString1 = "adabbbbcabbcabced";

print "myString1 is $myString1\n";

$myString1 =~ s/^a.{2,5}d$/x/;

print "myString1 is $myString1\n";

print "\n";

$myString2 = "afghd";

print "myString2 is $myString2\n";

$myString2 =~ s/^a.{2,5}d$/x/;

print "myString2 is $myString2"

• This produces the following output

myString1 is adabbbbcabbcabced

myString1 is adabbbbcabbcabced

myString2 is afghd

myString2 is x

The s/// operator (contd.)

• Example s/// expression:

s/(abc){2,5}d/x/

this replaces the first sub-string in the target that comprises “between 2 and 5 repeats of the pattern abc, followed by the letter d” with “x”

• An example application is on the next slide

The s/// operator (contd.)

• Example application of the s/// expression on the last slide:

$myString = "abcdefabcabcabcabcdefgh";

print "myString is $myString\n";

$myString =~ s/(abc){2,5}d/x/;

print "myString is $myString\n"

• This produces the following output

myString is abcdefabcabcabcabcdefgh

myString is abcdefxefgh

The s/// operator (contd.)

• Example s/// expression:

s/(foo|bar)/x/

this replaces the first sub-string that matches either foo or bar with x

• An example application is on the next slide

The s/// operator (contd.)

• Example application of the s/// expression on the last slide:

$myString1 = "abcfoodefbar";

print "myString1 is $myString1\n";

$myString1 =~ s/(foo|bar)/x/;

print "myString1 is $myString1\n";

print "\n";

$myString2 = "abcbar";

print "myString2 is $myString2\n";

$myString2 =~ s/(foo|bar)/x/;

print "myString2 is $myString2"

• This produces the following output

myString1 is abcfoodefbar

myString1 is abcxdefbar

myString2 is abcbar

myString2 is abcx

The s/// operator (contd.)

• Although some characters have special meanings in regular expressions, we may, sometimes, just want to use them to match themselves in the target string

• We do this by escaping them in the regular expression, by preceding them with a backslash \

• Example s/// expression:

s/^a\^+.*d$/x/

this replaces the entire target string with “x”, provided the target string starts with an a, followed by one or more caret characters, followed by zero or more non-newline characters, and ends with a d

• An example application is on the next slide

The s/// operator (contd.)

• Example application of the s/// expression on the last slide:

$myString1 = "adabbbbcabbcabced";

print "myString1 is $myString1\n";

$myString1 =~ s/^a\^+.*d$/x/;

print "myString1 is $myString1\n";

print "\n";

$myString2 = "a^^^abbbbcabbcabceed";

print "myString2 is $myString2\n";

$myString2 =~ s/^a\^+.*d$/x/;

print "myString2 is $myString2"

• This produces the following output

myString1 is adabbbbcabbcabced

myString1 is adabbbbcabbcabced

myString2 is a^^^abbbbcabbcabceed

myString2 is x

The s/// operator (contd.)

• As mentioned earlier, the [ and ] characters have a special meaning in regular expressions

– they delimit an equivalence class of characters, any one of which may be used to match one character in the target string

• Example s/// expression:

s/a[KLM]b/x/

replaces the first substring comprising “the letter a followed by one of the three letters KLM, followed by the letter b” with the substring “x”

The s/// operator (contd.)

• The ^ character has a special meaning when used as the first character between [ and ] characters; this meaning is different from its special meaning when used outside the [ and ] characters

– when used as the first character between the [ and ] characters, the ^ character specifies the complement of the equivalence class that would have been specified if it were absent

• Example s/// expression:

s/a[^KLM]b/x/

replaces the first substring comprising “the letter a followed by any single character that is not one of K, L or M, followed by the letter b” with the substring “x”

The s/// operator (contd.)

• The - character also has a special meaning when used between [ and ] characters:

– it is used to join the start and end of a sequence of characters, any one of which may be used to match one character in the target string

• Example s/// expression:

s/a[0-9]b/x/

replaces the first substring comprising “the letter a followed by one digit, followed by the letter b” with the substring “x”
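The three bracket forms seen so far can be compared side by side. In this sketch the helper firstMatchReplaced is hypothetical; it applies one s/// with the given pattern to a copy of the target string.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the first match of $pattern with "x".
sub firstMatchReplaced {
    my ($pattern, $string) = @_;
    $string =~ s/$pattern/x/;
    return $string;
}

print firstMatchReplaced('a[KLM]b',  "aXb aKb"), "\n";   # aXb x
print firstMatchReplaced('a[^KLM]b', "aKb aXb"), "\n";   # aKb x
print firstMatchReplaced('a[^KLM]b', "a-b"),     "\n";   # x  (any character other than K, L, M)
print firstMatchReplaced('a[0-9]b',  "aKb a7b"), "\n";   # aKb x
```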

The s/// operator (contd.)

• Example s/// expression:

s/%[a-fA-F0-9]/x/

replaces the first substring comprising “a % followed by a hexadecimal digit” with the substring “x”

• An example application is on the next slide

The s/// operator (contd.)

• Example application of the s/// expression on the last slide:

$myString = "a%klm%Abbb%Cyyy";

print "myString is $myString\n";

$myString =~ s/%[a-fA-F0-9]/x/;

print "myString is $myString"

• This produces the following output

myString is a%klm%Abbb%Cyyy

myString is a%klmxbbb%Cyyy

The s/// operator (contd.)

• Certain escape sequences also have a special meaning in regular expressions. They define certain commonly used equivalence classes of characters:

\w is equivalent to [a-zA-Z0-9_]

\W is equivalent to [^a-zA-Z0-9_]

\d is equivalent to [0-9]

\D is equivalent to [^0-9]

\s is equivalent to [ \n\t\f\r]

\S is equivalent to [^ \n\t\f\r]

\b denotes a word boundary

\B denotes a non-word boundary

• Note the SP character in the definitions of \s and \S; that is, the white-space equivalence class includes SP

• By the way, \f is form feed and \r is carriage return

The s/// operator (contd.)

• Example s/// expression:

s/%\d\d\d\D/x/

replaces the first substring comprising “a % followed by three decimal digits, followed by a non-digit” with the substring “x”

• Example s/// expression:

s/\s\w\w\s/x/

replaces the first substring comprising “a white-space character, followed by two word characters, followed by another white-space character” with the substring “x”
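A short sketch of the two expressions above applied to sample strings. The helper substituteOnce and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the first match of $pattern with "x".
sub substituteOnce {
    my ($pattern, $string) = @_;
    $string =~ s/$pattern/x/;
    return $string;
}

# "%123x" is a % followed by three digits and a non-digit.
print substituteOnce('%\d\d\d\D', "rate %123x"), "\n";    # rate x

# " to " is white space, two word characters, white space.
print substituteOnce('\s\w\w\s', "go to it now"), "\n";   # goxit now
```

Note that the matched white-space characters are consumed along with the two word characters, which is why the words on either side end up joined.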

The s/// operator (contd.)

• The standard quantifiers are all “greedy”

– they match as many occurrences as possible without causing the pattern to fail.

• It is possible to make them “frugal”

– that is, make them match the minimum number of times necessary

• We do this by following the quantifier with a "?"

• *? Match 0 or more times, preferably only 0

• +? Match 1 or more times, preferably only 1 time

• ?? Match 0 or 1 time, preferably only 0

• {n}? Match exactly n times

• {n,}? Match at least n times, preferably only n times

• {n,m}? Match at least n but not more than m times, preferably only n times

• Consider the effect of this quantifier modification below:

$myString1 = "abcabcabcabc";

print "myString1 is $myString1\n";

$myString1 =~ s/(abc){2,5}/x/;

print "myString1 is $myString1\n";

$myString2 = "abcabcabcabc";

print "myString2 is $myString2\n";

$myString2 =~ s/(abc){2,5}?/x/;

print "myString2 is $myString2\n"

• This produces the following output

myString1 is abcabcabcabc

myString1 is x

myString2 is abcabcabcabc

myString2 is xabcabc

• CS4400 got to here on 8 February 2002

The s/// operator (contd.) -- remembering subpattern matches

• When a <pattern> is being matched with a target string, substrings that match sub-patterns can be remembered and re-used later in the same pattern

• Sub-patterns whose matching substrings are to be remembered are enclosed in parentheses

• The sub-patterns are implicitly numbered, starting from 1 and their matching substrings can then be re-used later in the pattern by preceding the appropriate integer with a backslash \

The s/// operator -- remembering subpattern matches (contd.)

• Example s/// expression:

s/%([a-fA-F0-9])\1/x/

replaces the first substring comprising “a % followed by two identical hexadecimal digits” with the substring “x”

• Example application of the above s/// expression:

$myString = "a%klm%Abb%CCbb%DDbbb%Cyyy";

print "myString is $myString\n";

$myString =~ s/%([a-fA-F0-9])\1/x/;

print "myString is $myString"

• This produces the following output

myString is a%klm%Abb%CCbb%DDbbb%Cyyy

myString is a%klm%Abbxbb%DDbbb%Cyyy

• CS 4320 got here on 28 Feb 2003

The s/// operator (contd.) -- using subpattern matches in replacements

• We saw that, within a <pattern>, substrings that matched sub-patterns can be re-used later in the pattern by preceding the appropriate integer with a backslash \

• Within a <replacement>, substrings that matched sub-patterns in the <pattern> can be used by preceding the appropriate integer with a dollar $

Using subpattern matches in replacements (contd.)

• Example:

s/%([a-fA-F0-9])\1/x$1$1/

replaces the first instance of “a % followed by two identical hexadecimal digits” with “x followed by two instances of the hexadecimal digit”

• Example application of the above s/// expression:

$myString = "a%klm%Abb%CCbb%DDbbb%Cyyy";

print "myString is $myString\n";

$myString =~ s/%([a-fA-F0-9])\1/x$1$1/;

print "myString is $myString"

• This produces the following output

myString is a%klm%Abb%CCbb%DDbbb%Cyyy

myString is a%klm%AbbxCCbb%DDbbb%Cyyy

The s/// operator (contd.) -- the g modifier

• Normally, an application of the s/// operator replaces only the first instance of the <pattern> regular expression

• When the g (short for global) modifier is used, all instances are replaced

• Example:

s/%[a-fA-F0-9]/x/

replaces the first substring comprising “a % followed by a hexadecimal digit” with the substring “x”

• Example:

s/%[a-fA-F0-9]/x/g

replaces all substrings comprising “a % followed by a hexadecimal digit” with the substring “x”
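The difference the g modifier makes can be seen by applying both forms to the same string. The two wrapper subroutines are hypothetical; each returns a modified copy of its argument.

```perl
use strict;
use warnings;

# Hypothetical wrappers: the same substitution without and with /g.
sub replaceOnce { my $s = $_[0]; $s =~ s/%[a-fA-F0-9]/x/;  return $s }
sub replaceAll  { my $s = $_[0]; $s =~ s/%[a-fA-F0-9]/x/g; return $s }

my $target = "a%klm%Abbb%Cyyy";
print replaceOnce($target), "\n";   # a%klmxbbb%Cyyy
print replaceAll($target),  "\n";   # a%klmxbbbxyyy
```

In both cases %k is left alone, because k is not a hexadecimal digit.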

The s/// operator (contd.) -- the e modifier

• In a normal application of the s/// operator, the <replacement> is treated as if it were a double-quoted string

• When the e (short for execute) modifier is used, the <replacement> is executed as if it were standard Perl code

– which means that it can involve subroutine calls using any variables as arguments

• Example:

s/%([a-fA-F0-9])/foo($1)/e

replaces the first substring comprising “a % followed by a hexadecimal digit” with “the result of applying the subroutine foo to a scalar variable containing this hexadecimal digit”
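To see what the e modifier changes, the sketch below applies the same substitution with and without it. The subroutine bracket stands in for foo and is purely hypothetical; it wraps its argument in square brackets.

```perl
use strict;
use warnings;

# Hypothetical stand-in for foo: wraps its argument in square brackets.
sub bracket { return "[" . $_[0] . "]" }

# Without /e the replacement is just a double-quoted string, so the
# text "bracket(...)" is inserted literally (with $1 interpolated).
sub withoutE { my $s = $_[0]; $s =~ s/%([a-fA-F0-9])/bracket($1)/;  return $s }

# With /e the replacement is executed as Perl code, so bracket is called.
sub withE    { my $s = $_[0]; $s =~ s/%([a-fA-F0-9])/bracket($1)/e; return $s }

print withoutE("a%klm%Abbb"), "\n";   # a%klmbracket(A)bbb
print withE("a%klm%Abbb"),    "\n";   # a%klm[A]bbb
```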

The s/// operator (contd.) -- another example

• Example:

s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg

This replaces all substrings comprising “a % followed by a pair of hexadecimal digits” with “the result of evaluating the expression pack("C",hex($1)) where $1 is a scalar variable containing this pair of hexadecimal digits”

Some more Perl subroutines

• hex() takes one, string, argument, interprets it as a hexadecimal number and returns the corresponding value.

• Example application

$string = "aB";

$number = hex($string);

print $number

• The above program fragment would produce this output

171

Some more Perl subroutines (contd.)

• pack() and unpack()

These subroutines are used to encode and decode data to/from various formats

• pack() takes a list of data values and packs them into a binary structure

• unpack() takes a string representation of a structure and expands it into a list of data values

pack()

• A call to this subroutine has the form

pack( <format>,<list> )

• The subroutine encodes data provided in <list> into the form specified by the characters in <format>

• <format> is a string whose constituent characters specify both the type of data to be packed into the structure and the order in which it is to be packed.

• <format> can contain a wide variety of characters, only one of which we will consider here:

C

which specifies an unsigned char value

pack() (contd.)

• Example application:

$str = pack( "CCCCC",100,101,102,103,104);

print $str

would produce this output

defgh

• Example application:

$str = pack( "CC",hex("64"),hex("65"));

print $str

would produce this output

de

unpack()

• unpack() does the reverse of pack():

• A call to this subroutine is of the form

unpack( <format>, <string-expression> );

• The subroutine unpacks the <string-expression>, which is a representation of some structure, into a list of items

• The form of the unpacking is driven by the characters in <format>

unpack() (contd.)

• Example application:

@list = unpack("CCCCC", "defgh");

foreach my $member (@list)

{ print "$member " }

• This will produce the following output: 100 101 102 103 104

The s/// operator -- another example (contd.)

• Example application:

$myString = "Black%26Decker%3AIreland";

print "myString is $myString\n";

$myString =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg;

print "myString is $myString"

• This produces the following output

myString is Black%26Decker%3AIreland

myString is Black&Decker:Ireland

More on s/// expressions

• All the s/// expressions we have written so far have consumed all the characters that matched the <pattern> specified by the regular expression and substituted whatever was specified by the <replacement>

• There was no notion of examining the context surrounding the consumed characters

– any characters that were matched were consumed

• We need some way of matching characters without removing them from the target string

• Perl provides two kinds of meta-expression for doing this: look-ahead checks and look-behind checks

Look-ahead checks

(?=regexp)

This is a non-consuming positive lookahead check

It matches characters in the target string against the pattern specified by the embedded regular expression regexp without consuming them from the target string

• Example

s/\w+(?=\t)/X/g

This replaces words that are followed by tabs with the character X, without removing the tabs from the target string

• Example applications are on the next slide

Look-ahead checks (contd.)

• Program fragment:

$myString1 = "fred\t is a brave\t man";

print "myString1 is $myString1\n";

$myString1 =~ s/\w+\t/X/g;

print "myString1 is $myString1\n";

$myString2 = "fred\t is a brave\t man";

print "myString2 is $myString2\n";

$myString2 =~ s/\w+(?=\t)/X/g;

print "myString2 is $myString2\n"

• Output produced (<TAB> marks a tab character)

myString1 is fred<TAB> is a brave<TAB> man

myString1 is X is a X man

myString2 is fred<TAB> is a brave<TAB> man

myString2 is X<TAB> is a X<TAB> man

Look-ahead checks (contd.)

(?!regexp)

This is a non-consuming negative lookahead check

It ensures that characters in the target string do not match the pattern specified by the embedded regular expression regexp

• Example

s/cow(?!boy)/X/g

This replaces all sub-strings “cow” with “X”, provided these sub-strings are not followed by the sub-string “boy”
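A quick sketch of the negative look-ahead in action. The wrapper hideCows and the sample sentence are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical wrapper: replaces every "cow" that is not followed by "boy".
sub hideCows {
    my $s = $_[0];
    $s =~ s/cow(?!boy)/X/g;
    return $s;
}

print hideCows("cow cowboy cows"), "\n";   # X cowboy Xs
```

The "cow" inside "cowboy" is left alone, and the characters examined by the look-ahead are not consumed, so the "s" after the last "cow" stays in the string.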

• CS4400 got to here on 12 February 2002

Look-behind checks

(?<=regexp)

This is a non-consuming positive look-behind check

It ensures that preceding characters in the target string match the pattern specified by the embedded regular expression regexp

• Example

s/(?<=cow)boy/girl/g

This replaces all sub-strings “boy” with “girl”, provided these sub-strings are preceded by the sub-string “cow”

Look-behind checks (contd.)

(?<!regexp)

This is a non-consuming negative look-behind check

It ensures that preceding characters in the target string do not match the pattern specified by the embedded regular expression regexp

• Example

s/(?<!cow)boy/girl/g

This replaces all sub-strings “boy” with “girl”, provided these sub-strings are not preceded by the sub-string “cow”
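The two look-behind forms can be compared on the same input. The wrapper names and the sample string below are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical wrappers around the two look-behind expressions.
sub afterCow    { my $s = $_[0]; $s =~ s/(?<=cow)boy/girl/g; return $s }
sub notAfterCow { my $s = $_[0]; $s =~ s/(?<!cow)boy/girl/g; return $s }

my $text = "cowboy boy";
print afterCow($text),    "\n";   # cowgirl boy
print notAfterCow($text), "\n";   # cowboy girl
```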

Back to CGI programming ...

(At last!) back to decoding URL-encodings

This is the revised definition of separateAndPrintDataIn

sub separateAndPrintDataIn

{my (@equations, $name, $value);

@equations = split("&",$_[0]);

foreach my $equation (@equations)

{ ($name,$value) = split("=",$equation);

$value =~ tr/+/ /;

$value =~

s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg;

print "<LI>$name = $value </LI>";

}

}
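The decoding steps can also be tried outside a CGI environment. The following is only a sketch under stated assumptions: decodeComponent and decodeQueryString are hypothetical names, the decoded pairs are returned as a flat list rather than printed as HTML, and the third argument to split (a limit of 2) is an addition so that a value containing an = sign is not cut short.

```perl
use strict;
use warnings;

# Hypothetical helper: decodes one URL-encoded name or value.
sub decodeComponent {
    my $text = $_[0];
    $text =~ tr/+/ /;
    $text =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg;
    return $text;
}

# Hypothetical helper: returns a flat list (name1, value1, name2, value2, ...).
sub decodeQueryString {
    my @pairs;
    foreach my $equation ( split("&", $_[0]) ) {
        my ($name, $value) = split("=", $equation, 2);
        $value = "" unless defined $value;   # a name with no "=value" part
        push(@pairs, decodeComponent($name), decodeComponent($value));
    }
    return @pairs;
}

print join(" | ", decodeQueryString("name=Sean+Croke&company=Black%26Decker")), "\n";
# name | Sean Croke | company | Black&Decker
```

Splitting on & before decoding matters: if %26 were decoded first, the ampersand in the company name would be mistaken for a separator.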

• The following material on Perl will not be subject to examination

Watching out for hackers

• The revised definition of separateAndPrintDataIn on the previous slide will handle the + char and the %26 URL-encoding in a QUERY_STRING like

name=Sean+Croke&company=Black%26Decker

but it could expose our server to hackers

• One trick of hackers is to send Server Side Include commands in form data

• We need to modify the subroutine still further so that, whenever it finds anything which looks even remotely like an SSI in form data, it eliminates the offending piece of data

• We need to learn some more Perl

Yet more Perl ...

The m// operator

• A basic application of the m// operator is of the form

m/<pattern>/

where <pattern> is a regular expression

• This expression checks whether any instance of <pattern> can be found in the target expression

• The m// operator is frequently used in conditional expressions, as part of if and while statements

The m// operator (contd.)

• Generic application

$targetString =~ m/<pattern>/

evaluates to true if $targetString contains at least one instance of <pattern>

• Generic application

$targetString !~ m/<pattern>/

evaluates to true if $targetString contains no instance of <pattern>

• Specific application

$targetString =~ m/<!--(.|\n)*-->/

evaluates to true if $targetString contains at least one instance of a sub-string which looks like a HTML comment (and, therefore, might contain a Server Side Include)

The m// operator (contd.)

• The m// operator can be used in the condition part of an if statement

• Example

if ( $value =~ m/<!--(.|\n)*-->/ )

{ $value =~ s/<!--(.|\n)*-->//g };

removes from $value all sub-strings which look like HTML comments (and which, therefore, might contain Server Side Includes)
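The effect of this substitution can be checked on a couple of sample strings. This is a sketch only; the helper name removeComments and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: removes anything that looks like a HTML comment.
sub removeComments
{ my ($value) = @_;
  $value =~ s/<!--(.|\n)*-->//g;
  return $value
}

print removeComments('Hi <!--#exec cmd="ls" --> there'), "\n";   # Hi  there
# Because * is greedy, everything from the first <!-- to the last --> goes:
print removeComments('a <!-- x --> b <!-- y --> c'), "\n";       # a  c
```

Note that the pattern errs on the side of caution: when a value contains two comments, the harmless text between them is removed as well.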

Back to CGI programming ...

Watching out for hackers (contd.)

This is the final definition of separateAndPrintDataIn()

sub separateAndPrintDataIn
{ my (@equations, $name, $value);
  @equations = split("&",$_[0]);
  foreach my $equation (@equations)
  { ($name,$value) = split("=",$equation);
    $value =~ tr/+/ /;
    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg;
    if ( $value =~ m/<!--(.|\n)*-->/ )
    { print "<EM>SSI removed from following:</EM> ";
      $value =~ s/<!--(.|\n)*-->//g };
    print "<LI>$name = $value </LI>";
  }
}

Improved program reporting GET method data

• Using this new definition of separateAndPrintDataIn() we have an improved version of the CGI program which

– is called by a HTML FORM

– and sends back to the browser a HTML page which lists the data it received from the form

Improved GET data program (part 1)

#!/usr/local/bin/perl

print <<EOF;

Content-type: text/html

<HTML>

<HEAD>

<TITLE> Form Data reporting program </TITLE>

</HEAD>

<BODY>

<H1> Form Data </H1>

<UL>

EOF

printFormData();

print <<EOF;

</UL>

</BODY>

</HTML>

EOF

Improved GET data program (part 2)

sub printFormData

{ print "<P>Your form sent these data:</P>\n";

my $queryString = $ENV{'QUERY_STRING'};

separateAndPrintDataIn($queryString)

}

Improved GET data program (part 3)

sub separateAndPrintDataIn
{ my (@equations, $name, $value);
  @equations = split("&",$_[0]);
  foreach my $equation (@equations)
  { ($name,$value) = split("=",$equation);
    $value =~ tr/+/ /;
    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg;
    if ( $value =~ m/<!--(.|\n)*-->/ )
    { print "<EM>SSI removed from following:</EM> ";
      $value =~ s/<!--(.|\n)*-->//g };
    print "<LI>$name = $value </LI>";
  }
}

Needs further refinement

• This program needs further refinement.

• It will not properly handle forms in which there are fields which allow multiple selections

• A better version of this program is available on my ftp site

• The program is

lectureExample2.cgi
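To give an idea of what handling multiple selections involves, here is one possible approach. It is a sketch only and is not the code in lectureExample2.cgi: the helper name collectFormData is hypothetical, and it uses references (the \, -> and @{ } notation), which have not been covered in these lectures.

```perl
use strict;
use warnings;

# Hypothetical helper: collects every value sent for each field name,
# so that a name which appears several times keeps all of its values.
sub collectFormData
{ my ($queryString) = @_;
  my %valuesFor;
  foreach my $equation ( split("&",$queryString) )
  { my ($name,$value) = split("=",$equation,2);
    $value = "" if not defined($value);
    $value =~ tr/+/ /;
    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg;
    push( @{ $valuesFor{$name} }, $value )
  }
  return \%valuesFor
}

my $data = collectFormData("fruit=apple&fruit=pear&name=Sean+Croke");
foreach my $name ( sort( keys(%$data) ) )
{ print "$name = ", join(", ", @{ $data->{$name} }), "\n" }
```

A field with multiple selections arrives as several name=value pairs with the same name, which is why each name is mapped to a list of values rather than to a single value.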

File Processing in Perl

File Processing

• File handling in Perl is based on the notion of a file handle, a token which is associated with a disk file or an input/output device

• Before a file is used, a handle must be created for it;

– subsequently, all operations on the file refer to the handle, not the file name

• However, Perl provides three pre-defined file handles

– STDIN, which is the handle for standard input;

– STDOUT, the handle for standard output;

– STDERR, the handle for the output channel where error messages should be sent

File Processing (contd.)

• In normal execution mode, the pre-defined file handles have the following associations:

– STDIN is attached to the keyboard;

– STDOUT is attached to the console;

– STDERR is attached to the console

• In CGI mode, however,

– STDIN receives data from the HTTP server demon;

– STDOUT sends data to the HTTP server demon, for onward transmission to the client;

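– STDERR sends data to the HTTP server demon, which typically writes it to the server's error log rather than passing it on to the client (the exact destination depends on how the server is configured)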

File Processing (contd.)

• We have already been using STDOUT implicitly

– the print() subroutine, by default, sends its output there

• The statement

print("Hello world")

is implicitly the same as

print(STDOUT "Hello world")

• We can re-direct the output of the print() subroutine to any file handle

• If, for example, we have already defined myHandle as a file handle, we could direct output there, as follows:

print(myHandle "Hello world")

File Processing (contd.)

• We define a file handle when we open a file and, at the same time, associate the file handle with the file

• The open() subroutine is used to open a file

• Its syntax is as follows:

open( <handle>, <access-and-name> )

where

<handle> is the token we wish to use as the handle for the file

and

<access-and-name> is a string which specifies the operating system’s name for the file and, also, the type of access we want to the file

• read-only, (the default form of access)

• write-only or

• append-only

File Processing (contd.)

• Example usage:

open(Customers, "customerFile.txt")

opens a file called customerFile.txt and associates with it the handle Customers; because we have said nothing about the form of access, it is read-only by default

• Example usage:

open(Customers, "<customerFile.txt")

opens a file called customerFile.txt and associates with it the handle Customers; usage of < explicitly states that we want read-only access

File Processing (contd.)

• Example usage:

open(Customers, ">customerFile.txt")

opens a file called customerFile.txt and associates with it the handle Customers; usage of > explicitly states that we want write-only access (any existing content of the file is discarded)

• Example usage:

open(Customers, ">>customerFile.txt")

opens a file called customerFile.txt and associates with it the handle Customers; usage of >> explicitly states that we want append-only access (new data is added after the existing content of the file)

File Processing (contd.)

• When a program is finished reading from or writing to the device associated with a file handle, the channel to the device should be closed

• This is done by using the close() subroutine; this takes only one argument, a file handle

• Example usage:

close( Customers )

Example Program

• Consider this program fragment:

open(handle1, ">output.txt");

print(handle1 "Hello, world!\n");

print(handle1 "How are you?");

close(handle1)

• It places the following content in file output.txt:

Hello, world!

How are you?

Reading from a file

• We already know how to write to a file

– we use the print() subroutine, quoting the file handle

• To read from a file,

– we apply the <> input operator to the file handle

• Example usage:

$line = <myHandle99>

This reads the next available line from the file which is associated with the handle myHandle99 and copies it into the scalar variable $line

• The input operator returns the special value undef at the end of a file

Example Program

• Consider this program fragment:

open(myHandle,"<output.txt");

$line1 = <myHandle>;

$line2 = <myHandle>;

close(myHandle);

print $line1;

print $line2

• It assumes that the file contains at least two lines

• If these are

Hello, world!

How are you?

the program fragment prints the following output

Hello, world!

How are you?

Reading from a file of unknown length

• If we do not know how many lines are in a file, we should use the while construct and a boolean subroutine called defined, which returns true unless its single argument has the special value undef

• Consider this program fragment:

open(myHandle,"<output.txt");

$line = <myHandle>;

while ( defined($line) )

{ print $line;

$line = <myHandle> };

close(myHandle)

• This program fragment makes no assumptions about how many lines are in the file being read
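The write, append and read operations seen so far can be combined into one small program which can be run outside the web server. This is a sketch under some assumptions: it works in a scratch directory created with the standard File::Temp module so that no real file is disturbed, and it uses "or die" (see the glossary later) to stop if a file cannot be opened.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work in a scratch directory which is removed when the program ends.
my $fileName = tempdir(CLEANUP => 1) . "/output.txt";

# > creates the file (or empties it, if it already exists).
open(OUT, ">$fileName") or die "Cannot write $fileName: $!";
print(OUT "Hello, world!\n");
close(OUT);

# >> keeps what is already in the file and adds to the end of it.
open(OUT, ">>$fileName") or die "Cannot append to $fileName: $!";
print(OUT "How are you?\n");
close(OUT);

# Read the file back without assuming how many lines it holds.
my $lineCount = 0;
open(IN, "<$fileName") or die "Cannot read $fileName: $!";
my $line = <IN>;
while ( defined($line) )
{ print $line;
  $lineCount = $lineCount + 1;
  $line = <IN> };
close(IN);
print "Lines read: $lineCount\n";
```

The handles are written in upper case here, as in the FILE examples elsewhere in these slides; this avoids a clash between a handle name and any Perl reserved word.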


Reading a datum of known length

• If we know exactly the length of the piece of data we wish to input, we can use the read() subroutine

• Syntax:

read( <HANDLE>,<SCALAR>,<LENGTH> )

• This subroutine attempts to read <LENGTH> bytes of data into variable <SCALAR> from the device attached to <HANDLE>.

• If <LENGTH> bytes are not actually available, <SCALAR> will be assigned the bytes actually read

Reading a datum of known length (contd.)

• Consider this program fragment:

open(myHandle,"<output.txt");

read(myHandle,$line,8);

print $line;

close(myHandle)

• If the file actually contains

Hello, world!

How are you?

the program fragment prints the following eight characters to STDOUT

Hello, w
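The read() subroutine also returns the number of bytes it actually read, which lets a program detect the case where fewer bytes were available than it asked for. The following sketch illustrates this; the scratch file and the variable names are assumptions made for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create a scratch file holding the two lines used on the slide.
my $fileName = tempdir(CLEANUP => 1) . "/output.txt";
open(OUT, ">$fileName") or die "Cannot write $fileName: $!";
print(OUT "Hello, world!\nHow are you?\n");
close(OUT);

open(IN, "<$fileName") or die "Cannot read $fileName: $!";
my ($first, $rest);
my $count1 = read(IN, $first, 8);      # asks for 8 bytes and gets 8
my $count2 = read(IN, $rest, 1000);    # asks for 1000 but gets only what is left
close(IN);

print "Read $count1 bytes: $first\n";
print "Then asked for 1000 bytes and received $count2\n";
```

The second call does not fail; it simply places the remaining bytes in $rest and reports how many there were.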

Checking that a file is available

• Consider this program fragment:

open(myHandle,"<output.txt");

$line = <myHandle>;

if ( defined($line) )

{ ... }

else { ... }

• Although it checks whether the file contains any data, it assumes that the file can be opened for reading

• This is not a safe assumption in all circumstances

• A user on the web will not be happy if a CGI resource that he has requested crashes in mid-execution

• We should always check whether a file can be opened and, if not, send a useful piece of HTML to the user’s browser

Checking that a file is available (contd.)

• Consider this program fragment:

if ( not (open(myHandle,"<output.txt") ) )

{ print "<P>File unavailable</P>" }

else { $line = <myHandle>;

...

...

}

• If the file cannot be opened, it sends a warning to the user’s browser; otherwise, it proceeds to process the file

File Locking in Perl

• A CGI program should lock a data file while it is using it

• Why?

– Several copies of the program may be running at the same time

• this will happen, for example, if different users simultaneously send HTTP requests for the same CGI program resource

– Different CGI programs may require access to the same data file at the same time

• for example, one program may wish to read from the file while another may wish to add data to the file

• Perl provides an flock operator which we use

– to lock a file immediately before we use it;

– to unlock the file immediately after we have finished using it

File Locking in Perl (contd.)

• Syntax of usage:

flock(<file-handle>,<lock-option>)

• The lock-options include:

– 1 which requests a shared lock, which is usually adequate for reading from a file

– 2 which requests an exclusive lock, which is required when writing to a file

– 8 which releases a previously requested lock

File Locking in Perl (contd.)

• Requesting and receiving a shared lock means

– we are happy to allow other programs to read from the file at the same time as our program is doing so;

– but a program which wants to write to the file will be delayed until we release the lock, since a write program should request an exclusive lock, something it cannot receive if our shared lock request has been granted

• Requesting and receiving an exclusive lock means no other program can read from, or write to, the file until we release the lock

File Locking in Perl (contd.)

• Example fragment of file-reading program:

if ( not (open(myHandle,"<output.txt") ) )

{ print "<P>File unavailable</P>" }

else { flock(myHandle,1);

$line = <myHandle>;

while ( defined($line) )

{ ...

$line = <myHandle>

};

flock(myHandle,8)

}

File Locking in Perl (contd.)

• Example fragment of file-writing program:

if ( not (open(myHandle,">output.txt") ) )

{ print "<P>File unavailable</P>" }

else { flock(myHandle,2);

... Write stuff to the file ...

flock(myHandle,8)

}
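The numbers 1, 2 and 8 also have symbolic names, LOCK_SH, LOCK_EX and LOCK_UN, which are supplied by the standard Fcntl module. The sketch below uses them; the file name visitors.txt and the helper appendLine are made up for illustration. It opens the file with >> rather than >, since > would empty the file before the lock had been obtained.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);          # supplies LOCK_SH (1), LOCK_EX (2) and LOCK_UN (8)
use File::Temp qw(tempdir);

my $fileName = tempdir(CLEANUP => 1) . "/visitors.txt";   # hypothetical data file

# Hypothetical helper: appends one line to the file under an exclusive lock.
sub appendLine
{ my ($line) = @_;
  if ( not ( open(LOG, ">>$fileName") ) )
  { return 0 }
  else
  { flock(LOG, LOCK_EX);
    print(LOG "$line\n");
    close(LOG);                # closing the handle also releases the lock
    return 1 }
}

appendLine("first visitor");
appendLine("second visitor");

# Read the file back under a shared lock.
open(IN, "<$fileName") or die "Cannot read $fileName: $!";
flock(IN, LOCK_SH);
my @lines;
my $line = <IN>;
while ( defined($line) )
{ push(@lines, $line);
  $line = <IN> };
flock(IN, LOCK_UN);
close(IN);
print @lines;
```

Remember that these locks are advisory: they only protect the file against other programs which also call flock before using it.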

File Access Permissions for CGI programs

• Remember that, in a multi-user operating system, different users have differing permissions to access a data file

– some users may be able to write to the file

– other users may be able only to read from the file

– other users may not be permitted to access the file in any way

• In Unix, for example, these permissions appear in the output provided by the ls command

• Example ls output

-rw-r--r-- 1 fred admin 1234 Feb 19 08:24 customers.txt

File Access Permissions for CGI programs (contd.)

• Programs typically inherit the file access permissions of the users which execute the programs

• Thus, only a program executed by the user “fred” could write to the following data file:

-rw-r--r-- 1 fred admin 1234 Feb 19 08:24 customers.txt

• What permissions do CGI programs have to access data files?

• A CGI program is executed by the HTTP server demon

• In Unix systems, the HTTP server demon is treated as if it were an ordinary system user, typically with a username like “nobody” or “httpd”

• Therefore, CGI programs on Unix systems usually have whatever access permissions are possessed by the HTTP server demon

File Access Permissions for CGI programs (contd.)


• This is a security hole:

– any user of a multi-user Unix system could write a CGI program which could modify the contents of a data file owned by any other user

• There is a way around this problem:

– it involves a notion called setuid

– a treatment of this is beyond the scope of this lecture

– just remember, when you are doing CGI programming in the real world, that you need to consider file access permissions

Back to CGI form handling

Back to FORM data handling

• Now that we know how to read data from files, including STDIN, we can write a CGI program which reads data from a FORM that uses the POST request method

• In fact, we can easily write a CGI program that can accept data sent by either the GET or POST method

General Form Data reporting program (part 1)

#!/usr/local/bin/perl

print <<EOF;

Content-type: text/html

<HTML>

<HEAD>

<TITLE> Form Data reporting program </TITLE>

</HEAD>

<BODY>

<H1> Form Data </H1>

<UL>

EOF

printFormData();

print <<EOF;

</UL>

</BODY>

</HTML>

EOF

General Form Data reporting program (part 2)

sub printFormData

{ my ($requestMethod, $buffer);

$requestMethod = $ENV{'REQUEST_METHOD'};

print "<P> Your form used $requestMethod and ";

print "it sent the following data: </P>\n";

if ($requestMethod eq 'POST')

{read(STDIN,$buffer,$ENV{'CONTENT_LENGTH'})}

else

{$buffer = $ENV{'QUERY_STRING'}};

separateAndPrintDataIn($buffer)

}
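The choice between STDIN and QUERY_STRING can be tested without a web server by setting the environment variables ourselves and attaching STDIN to a file. The following is only a sketch: the helper rawFormData is a hypothetical variant of printFormData which returns the data instead of printing it, and the scratch file stands in for the request body which the server would normally supply.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical variant of printFormData: returns the raw form data.
sub rawFormData
{ my $buffer = "";
  if ( $ENV{'REQUEST_METHOD'} eq 'POST' )
  { read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'}) }
  else
  { $buffer = $ENV{'QUERY_STRING'} };
  return $buffer
}

# Simulate a GET request by setting the variables the server would set.
$ENV{'REQUEST_METHOD'} = 'GET';
$ENV{'QUERY_STRING'}   = 'name=Sean+Croke';
my $fromGet = rawFormData();

# Simulate a POST request: the data arrives on STDIN, so we put it
# in a scratch file and attach STDIN to that file.
my $body     = 'name=Black%26Decker';
my $fileName = tempdir(CLEANUP => 1) . "/body.txt";
open(OUT, ">$fileName") or die "Cannot write $fileName: $!";
print(OUT $body);
close(OUT);
open(STDIN, "<$fileName") or die "Cannot read $fileName: $!";
$ENV{'REQUEST_METHOD'} = 'POST';
$ENV{'CONTENT_LENGTH'} = length($body);
my $fromPost = rawFormData();

print "GET gave:  $fromGet\n";
print "POST gave: $fromPost\n";
```

In both cases the result is still URL-encoded; decoding is the job of separateAndPrintDataIn.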

General Form Data reporting program (part 3)

sub separateAndPrintDataIn
{ my (@equations, $name, $value);
  @equations = split("&",$_[0]);
  foreach my $equation (@equations)
  { ($name,$value) = split("=",$equation);
    $value =~ tr/+/ /;
    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg;
    if ( $value =~ m/<!--(.|\n)*-->/ )
    { print "<EM>SSI removed from following:</EM> ";
      $value =~ s/<!--(.|\n)*-->//g };
    print "<LI>$name = $value </LI>";
  }
}

Note

• This program does not handle forms which include fields that allow multiple selections

• You will have to modify this program using the techniques used in lectureExample2.cgi (available at my ftp site, as said earlier) in order to make it handle multiple selections

Some more example programs

Printing files referenced in a GET request

• Remember that the following is a well-formed HTTP request:

GET /cs4400/jabowen/cgi-bin/print.cgi/extra/path/info.txt HTTP/1.1

Host: student.cs.ucc.ie

The server recognizes that the application program is print.cgi so it passes the string /extra/path/info.txt in the environment variable PATH_INFO

PATH_INFO /extra/path/info.txt

Printing files referenced in a GET request (contd.)

• Remember that this information is also passed in

PATH_TRANSLATED

• For example, if

DOCUMENT_ROOT /usr/local/www/docs

and

PATH_INFO /extra/path/info.txt

then

PATH_TRANSLATED /usr/local/www/docs/extra/path/info.txt

• We will now write a CGI program which prints a file whose path and name are passed in PATH_TRANSLATED

A CGI script that displays text files

#!/usr/local/bin/perl

print "Content-Type: text/plain \n\n";

$fileName = $ENV{'PATH_TRANSLATED'};

if ( open(FILE,"<$fileName") )

{ $line = <FILE>;

while ( defined($line) )

{ print $line;

$line=<FILE>

}; close(FILE)

}

else { print "File cannot be opened\n"}

A CGI script that displays text files (contd.)

• The script on the last slide is insecure because,

– although we may think the client is restricted to files in the DOCUMENT_ROOT hierarchy,

– the user sending the request could use “..” to go up the directory structure

• For example, he could send this request

GET /cs4400/jabowen/cgi-bin/print.cgi/../../passwords.txt HTTP/1.1

Host: student.cs.ucc.ie

resulting in PATH_TRANSLATED having this value:

PATH_TRANSLATED /usr/local/www/docs/../../passwords.txt

which is equivalent to:

PATH_TRANSLATED /usr/local/passwords.txt

A CGI script that displays only text files located in the document root hierarchy

#!/usr/local/bin/perl

print "Content-Type: text/plain \n\n";

$fileName = $ENV{'PATH_TRANSLATED'};

if ($fileName =~ m/\.\./)

{ print "Bad chars in file name.\n" }

else { if ( open(FILE,"<$fileName") )

{ $line = <FILE>;

while ( defined($line) )

{ print $line;

$line = <FILE>

}; close(FILE)

}

else {print "File cannot be opened\n"} }
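The test for ".." can be separated out and tried on the two paths discussed on the previous slides. This is a sketch only; the helper name hasBadChars is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the path contains "..", as tested in the script.
sub hasBadChars
{ my ($fileName) = @_;
  if ( $fileName =~ m/\.\./ ) { return 1 }
  else { return 0 }
}

foreach my $path ( "/usr/local/www/docs/extra/path/info.txt",
                   "/usr/local/www/docs/../../passwords.txt" )
{ if ( hasBadChars($path) ) { print "rejected: $path\n" }
  else { print "accepted: $path\n" }
}
```

The check is deliberately conservative: it also rejects a harmless name such as notes..txt, which is a small price to pay for keeping clients inside the document root hierarchy.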

Modular programming in Perl

Packages

Modular Programming in Perl

• One advantage of programming in Perl is that we can easily access and use code written by others

• This code is available on the web in the form of modules

• One source of these modules is the Comprehensive Perl Archive Network (CPAN)

• This can be found at www.cpan.org but it is also mirrored at more than a hundred sites around the world

Using Modules in Perl

• Before a module can be used, it must be installed

– either in the directory where the program using the module resides

– or in a special system directory

• The installation of new modules in the special system directory is easy but is beyond the scope of this course

• Most installations of Perl already include a large number of modules in the special systems directory and it is unlikely that you will need to install anything there for a long time

• If you wish to install your own module in your own directory, this is as simple as placing the text file containing the module in the directory

Using Modules in Perl (contd.)

• By convention, the name of a Perl module starts with an upper-case letter and is stored in a file which has the same name as the name of the module, but which has a

.pm

suffix instead of the usual .cgi or .pl suffix

• Thus, a module called MyOwnUtilities is stored in a file called MyOwnUtilities.pm

Using Modules in Perl (contd.)

• To use a module, we must specify it in a use statement near the top of our program

• There are several forms of use statement

• We will just use the simplest form, which has the following format:

use <module-name>;

• Thus, if we wish to use subroutine(s) implemented in MyOwnUtilities we must use the statement

use MyOwnUtilities;

Using Modules in Perl (contd.)

• If we wish to use a subroutine implemented by a module, we should preface the name of the desired subroutine, whenever we invoke it in our program, with

<module-name>::

• Thus, if we wish to use a subroutine called myAverageOf() which is implemented in MyOwnUtilities we must invoke it as is done in the following statement:

$average = MyOwnUtilities::myAverageOf($n1,$n2);

Using Modules in Perl (contd.)

• Example Program:

#!/usr/local/bin/perl

use MyOwnUtilities;

my ($n1, $n2) = (12, 24);

my $average = MyOwnUtilities::myAverageOf($n1,$n2);

print "The average of $n1 and $n2 is $average"

• Output Produced by Example program:

The average of 12 and 24 is 18

Writing Modules in Perl

• We will not consider all possible details of writing modules in Perl

• However, let us consider the source code of module MyOwnUtilities

• Source code:

package MyOwnUtilities;

sub myAverageOf

{ return ( $_[0] + $_[1] )/2

}

1;

Writing Modules in Perl (contd.)

• Structure of a module:

package <module-name> ;

<definitions-of-resources-implemented-by-the-module>

1;

• The first statement in a module file consists of the keyword package followed by the name of the module

• The last statement of a module must evaluate to a value which Perl regards as true so, by convention, most programmers use the statement 1;

• In between these statements, we place the definitions of the resources implemented by the module
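For experimenting, a package and the program which uses it can be placed in one file. In that case no use statement and no closing 1; are needed, because nothing has to be loaded from a .pm file. The sketch below does this; the subroutine myLargerOf is hypothetical and is added only so that the package implements more than one resource.

```perl
use strict;
use warnings;

# --- what would normally be the content of MyOwnUtilities.pm ---
package MyOwnUtilities;

sub myAverageOf
{ return ( $_[0] + $_[1] )/2
}

# Hypothetical second resource, added only for illustration.
sub myLargerOf
{ if ( $_[0] > $_[1] ) { return $_[0] }
  else { return $_[1] }
}

# --- what would normally be the program which uses the module ---
package main;

my ($n1, $n2) = (12, 24);
print "Average: ", MyOwnUtilities::myAverageOf($n1,$n2), "\n";   # Average: 18
print "Larger: ",  MyOwnUtilities::myLargerOf($n1,$n2), "\n";    # Larger: 24
```

The statement package main; switches back to the default package, in which ordinary programs run.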

Object-oriented programming in Perl

• Perl is a very large language

• As well as the functional/procedural style of programming that we have seen so far, it also supports object-oriented programming

• We will not, in this course, have time to go into the details of OOP in Perl

• In fact, I mention it only because many of the packages which make Perl a powerful web programming tool are written in an OOP style

• One hint that you have found an OO script is the appearance of the arrow operator, ->, which is usually used to access a method or attribute of an object;

– if you see it, you can be fairly certain you have an OO script

• The web contains many powerful OO-based Perl libraries, modules and packages which provide utilities for useful tasks, including the easy construction of CGI programs

• Indeed, there exists a module called CGI.pm which provides a host of features that make it easy to write CGI programs

– if you ever have to write serious CGI programs in your future jobs, then you should use CGI.pm

– indeed, the only reason I have shown you how to use basic Perl for CGI programming is so that you will understand what is done by CGI.pm resources

• CPAN contains a huge number of modules apart from CGI.pm

• Among the most important are:

LWP::UserAgent which provides facilities that can be used to write special-purpose web clients;

XML::Parser which provides facilities that can be used to parse XML documents;

XML::DOM which provides facilities that can be used to manipulate XML document object models

• We may refer to some of the XML modules later in this course when we discuss XML

Some last thoughts on Perl ...

... for now, at least ..

Warning: variant syntax

• I have mixed feelings about Perl

• It is quite a useful language

• However,

– there are too many pieces of syntactic sugar

– these make the language bigger, without adding any functionality

Warning: variant syntax (contd.)

• For example, consider the until construct

• Example usage

$x=0;

until ( $x > 10 )

{ print "$x\n";

$x = $x+1 }

• But this is equivalent to the following!!!!!

$x=0;

while ( $x <= 10 )

{ print "$x\n";

$x = $x+1 }

• A whole new construct has been added for nothing. It simply means more to remember, more to forget!

Warning: variant syntax (contd.)

• Another example

open(FILE, "<customers.txt");

while ( <FILE> )

{ print $_ };

close(FILE)

• This is short-hand for the following

open(FILE, "<customers.txt");

$line = <FILE>;

while ( defined($line) )

{ print $line;

$line = <FILE> };

close(FILE)

• Yes, it’s shorter, but it’s another piece of syntax which adds no functionality to the language.

Warning: variant syntax (contd.)

• There are many sites on the web which offer repositories of CGI scripts written in Perl

• Some of these scripts are useful; some are not

• Some of these scripts are badly written; some are not

• You will find many variants of the core Perl syntax floating around these script repositories

• Try not to get what has been called “cancer of the semi-colon” from all this unnecessary syntactic sugar

• You may often find scripts which use syntactic variants that I have not covered in these lectures

• If you do and you want to use them, search the on-line Perl documentation (see reference later) until you get an adequate explanation

More documentation on Perl

• O’Reilly & Associates, Inc. sponsor the following web-site, which contains a lot of information about Perl

http://www.perl.com/pub

• Included in this documentation is the following page which lists the pre-defined subroutines and gives brief explanations of them

http://www.perl.com/pub/doc/manual/html/pod/perlfunc/

• For your convenience, a list of subroutines, adapted from the above site, is provided on the next few slides

Glossary of pre-defined Perl subroutines

• abs

– absolute value subroutine

• accept

– accept an incoming socket connect

• alarm

– schedule a SIGALRM

• atan2

– arctangent of Y/X

• bind

– binds an address to a socket

• binmode

– prepare binary files on old systems

• bless

– create an object

• caller

– get context of the current subroutine call

• chdir

– change your current working directory

• chmod

– changes the permissions on a list of files

• chomp

– remove a trailing record separator from a string

• chop

– remove the last character from a string

• chown

– change the ownership on a list of files

• chr

– get character this number represents

• chroot

– make directory new root for path lookups

• close

– close file (or pipe or socket) handle

• closedir

– close directory handle

• connect

– connect to a remote socket

• continue

– optional trailing block in a while or foreach

• cos

– cosine function

• crypt

– one-way passwd-style encryption

• dbmclose

– breaks binding on a tied dbm file

• dbmopen

– create binding on a tied dbm file

• defined

– test whether a value, variable, or subroutine is defined

• delete

– deletes a value from a hash

• die

– raise an exception or bail out

• do

– turn a BLOCK into a TERM

• dump

– create an immediate core dump

• each

– retrieve the next key/value pair from a hash

• endgrent

– be done using group file

• endhostent

– be done using hosts file

• endnetent

– be done using networks file

• endprotoent

– be done using protocols file

• endpwent

– be done using passwd file

• endservent

– be done using services file

• eof

– test a filehandle for its end

• eval

– catch exceptions or compile code

• exec

– abandon this program to run another

• exists

– test whether a hash key is present

• exit

– terminate this program

• exp

– raise e to a power

• fcntl

– file control system call

• fileno

– return file descriptor from filehandle

• flock

– lock an entire file with an advisory lock

• fork

– create a new process just like this one

• format

– declare a picture format for use by the write() subroutine

• formline

– internal subroutine used for formats

• getc

– get the next character from the filehandle

• getgrent

– get next group record

• getgrgid

– get group record given group user ID

• getgrnam

– get group record given group name

• gethostbyaddr

– get host record given its address

• gethostbyname

– get host record given name

• gethostent

– get next hosts record

• getlogin

– return who logged in at this tty

• getnetbyaddr

– get network record given its address

• getnetbyname

– get networks record given name

• getnetent

– get next networks record

• getpeername

– find the other end of a socket connection

• getpgrp

– get process group

• getppid

– get parent process ID

• getpriority

– get current nice value

• getprotobyname

– get protocol record given name

• getprotobynumber

– get protocol record given numeric protocol

• getprotoent

– get next protocols record

• getpwent

– get next passwd record

• getpwnam

– get passwd record given user login name

• getpwuid

– get passwd record given user ID

• getservbyname

– get services record given its name

• getservbyport

– get services record given numeric port

• getservent

– get next services record

• getsockname

– retrieve the sockaddr for a given socket

• getsockopt

– get socket options on a given socket

• glob

– expand filenames using wildcards

• gmtime

– convert UNIX time into record or string using Greenwich time

• goto

– create spaghetti code

• grep

– locate elements in a list that test true against a given criterion

• hex

– convert a string of hexadecimal digits to a number

• import

– patch a module's namespace into your own

• int

– get the integer portion of a number

• ioctl

– system-dependent device control system call

• join

– join a list into a string using a separator

• keys

– retrieve list of indices from a hash

• kill

– send a signal to a process or process group

• last

– exit a block prematurely

• lc

– return lower-case version of a string

• lcfirst

– return a string with just the first letter in lower case

• length

– return the number of bytes in a string

• link

– create a hard link in the filesystem

• listen

– register your socket as a server

• local

– create a temporary value for a global variable (dynamic scoping)

• localtime

– convert UNIX time into record or string using local time

• log

– retrieve the natural logarithm for a number

• lstat

– stat a symbolic link

• m//

– match a string with a regular expression pattern

• map

– apply a change to a list to get back a new list with the changes

• mkdir

– create a directory

• msgctl

– SysV IPC message control operations

• msgget

– get SysV IPC message queue

• msgrcv

– receive a SysV IPC message from a message queue

• msgsnd

– send a SysV IPC message to a message queue

• my

– declare and assign a local variable (lexical scoping)

• next

– iterate a block prematurely

• no

– unimport some module symbols or semantics at compile time

• oct

– convert a string of octal digits to a number

• open

– open a file, pipe, or descriptor

• opendir

– open a directory

• ord

– find a character's numeric representation

• pack

– convert a list into a binary representation

• package

– declare a separate global namespace

• pipe

– open a pair of connected filehandles

• pop

– remove the last element from an array and return it

• pos

– find or set the offset for the last/next m//g search

• print

– output a list to a filehandle

• printf

– output a formatted list to a filehandle

• prototype

– get the prototype (if any) of a subroutine

• push

– append one or more elements to an array

• q/STRING/

– singly quote a string

• qq/STRING/

– doubly quote a string

• quotemeta

– quote regular expression magic characters

• qw/STRING/

– quote a list of words

• qx/STRING/

– backquote quote a string

• rand

– retrieve the next pseudorandom number

• read

– fixed-length buffered input from a filehandle

• readdir

– get a directory entry from a directory handle

• readlink

– determine where a symbolic link is pointing

• recv

– receive a message over a Socket

• redo

– start this loop iteration over again

• ref

– find out the type of thing being referenced

• rename

– change a filename

• require

– load in external subroutines from a library at runtime

• reset

– clear all variables of a given name

• return

– get out of a subroutine early

• reverse

– flip a string or a list

• rewinddir

– reset directory handle

• rindex

– right-to-left substring search

• rmdir

– remove a directory

• s///

– replace a pattern with a string

• scalar

– force a scalar context

• seek

– reposition file pointer for random-access I/O

• seekdir

– reposition directory pointer

• select

– reset default output or do I/O multiplexing

• semctl

– SysV semaphore control operations

• semget

– get set of SysV semaphores

• semop

– SysV semaphore operations

• send

– send a message over a socket

• setgrent

– prepare group file for use

• sethostent

– prepare hosts file for use

• setnetent

– prepare networks file for use

• setpgrp

– set the process group of a process

• setpriority

– set a process's nice value

• setprotoent

– prepare protocols file for use

• setpwent

– prepare passwd file for use

• setservent

– prepare services file for use

• setsockopt

– set some socket options

• shift

– remove the first element of an array, and return it

• shmctl

– SysV shared memory operations

• shmget

– get SysV shared memory segment identifier

• shmread

– read SysV shared memory

• shmwrite

– write SysV shared memory

• shutdown

– close down just half of a socket connection

• sin

– return the sine of a number

• sleep

– block for some number of seconds

• socket

– create a socket

• socketpair

– create a pair of sockets

• sort

– sort a list of values

• splice

– add or remove elements anywhere in an array

• split

– split up a string using a regexp delimiter

• sprintf

– formatted print into a string
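As a rough illustration, the sketch below chains split, sort, splice and sprintf from the entries above. The input string and variable names are assumptions chosen for the example:

```perl
use strict;
use warnings;

my $line = "pear,apple,fig";

# split: the first argument is a pattern, here a literal comma.
my @fruits = split /,/, $line;

# sort: with no comparison block, the ordering is by string comparison.
my @sorted = sort @fruits;

# splice: remove 1 element starting at index 1 and return what was removed.
my @removed = splice(@sorted, 1, 1);

# sprintf: build a formatted string instead of printing it directly.
my $report = sprintf("%d left, removed %s", scalar(@sorted), $removed[0]);

print "$report\n";
```

To sort numerically rather than as strings, a comparison block such as sort { $a <=> $b } @list would be needed.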

• sqrt

– square root function

• srand

– seed the random number generator

• stat

– get a file's status information

• study

– optimize input data for repeated searches

• sub

– declare a subroutine, possibly anonymously

• substr

– get or alter a portion of a string

• symlink

– create a symbolic link to a file

• syscall

– execute an arbitrary system call

• sysread

– fixed-length unbuffered input from a filehandle

• system

– run a separate program

• syswrite

– fixed-length unbuffered output to a filehandle

• tell

– get current seekpointer on a filehandle

• telldir

– get current seekpointer on a directory handle

• tie

– bind a variable to an object class

• time

– return number of seconds since 1970

• times

– return elapsed time for self and child processes

• tr///

– transliterate a string

• truncate

– shorten a file

• uc

– return upper-case version of a string

• ucfirst

– return a string with just the first letter in upper case

• umask

– set file creation mode mask

• undef

– remove a variable or subroutine definition

• unlink

– remove one link to a file

• unpack

– convert binary structure into normal perl variables

• unshift

– prepend more elements to the beginning of an array
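A short sketch of unshift together with its counterpart shift (listed earlier), assuming a small array named @queue invented for the example:

```perl
use strict;
use warnings;

my @queue = (2, 3);

# unshift: add elements at the front; @queue becomes (0, 1, 2, 3).
unshift @queue, 0, 1;

# shift: remove the first element and return it.
my $first = shift @queue;

print "first = $first, rest = @queue\n";
```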

• untie

– break a tie binding to a variable

• use

– load in a module at compile time

• utime

– set a file's last access and modify times

• values

– return a list of the values in a hash

• vec

– test or set particular bits in a string

• wait

– wait for any child process to die

• waitpid

– wait for a particular child process to die

• wantarray

– get void vs scalar vs list context of current subroutine call
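One way to see what wantarray reports is a small subroutine that names its calling context. The subroutine name context is hypothetical, used only for this sketch:

```perl
use strict;
use warnings;

# wantarray returns true in list context, false (but defined) in
# scalar context, and undef in void context.
sub context {
    my $w = wantarray;
    return "void" unless defined $w;
    return $w ? "list" : "scalar";
}

my @in_list   = context();   # called in list context
my $in_scalar = context();   # called in scalar context

print "$in_list[0] $in_scalar\n";
```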

• warn

– print debugging info

• write

– print a picture record

• y///

– transliterate a string
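A minimal sketch of transliteration with tr/// and its synonym y///. The sample string and variable names are assumptions for illustration:

```perl
use strict;
use warnings;

my $text = "hello world";

# With an empty replacement list, tr/// leaves the string unchanged
# and returns the number of matching characters.
my $count_o = ($text =~ tr/o//);

# Copy the string, then map lower-case letters to upper-case in the copy.
(my $upper = $text) =~ tr/a-z/A-Z/;

# y/// behaves exactly like tr///: here l becomes 0 and o becomes 1.
(my $swapped = $text) =~ y/lo/01/;

print "$count_o\n$upper\n$swapped\n";
```

Unlike s///, tr/// works on individual characters rather than regular expressions, so a-z here is a character range, not a pattern.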