Command Line Basics



Command line options

Safety net options

There are three options I like to think of as a "safety net," as they can stop you from making a fool of yourself when you're doing something particularly clever (or stupid!). And while they aren't ever necessary, it's rare that you'll find an experienced Perl programmer working without them.

The first of these is -c. This option compiles your program without running it. This is a great way to ensure that you haven't introduced any syntax errors while you've been editing a program. When I'm working on a program I never go more than a few minutes without saving the file and running:

    perl -c <program>

This makes sure that the program still compiles. It's far easier to fix problems when you've only made a few changes than it is to type in a couple of hundred lines of code and then try to debug that.

The next safety net is the -w option. This turns on warnings that Perl will then give you if it finds any of a number of problems in your code. Each of these warnings is a potential bug in your program and should be investigated. In modern versions of Perl (since 5.6.0) the -w option has been superseded by the use warnings pragma, which is lexically scoped and therefore more flexible than the command-line option, so you shouldn't use -w in new code.

The final safety net is the -T option. This option puts Perl into "taint mode." In this mode, Perl inherently distrusts any data that it receives from outside the program's source -- for example, data passed in on the command line, read from a file, or taken from CGI parameters.

Tainted data cannot be used in an expression that interacts with the outside world -- for example, you can't use it in a call to system or as the name of a file to open. The full list of restrictions is given in the perlsec manual page.

In order to use this data in any of these potentially dangerous operations you need to untaint it. You do this by checking it against a regular expression. A detailed discussion of taint mode would fill an article all by itself so I won't go into any more details here, but using taint mode is a very good habit to get into -- particularly if you are writing programs (like CGI programs) that take unknown input from users.
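As a rough illustration of untainting, the sketch below checks a value against a regular expression and keeps only the captured text; under -T, text captured from a successful match is treated as clean. The helper name untaint_filename and the set of allowed characters are assumptions chosen for this example, not something the perlsec page prescribes.

```perl
use strict;
use warnings;

# Hypothetical helper: accept only simple file names (word characters,
# dots and hyphens, starting with a word character) and return the
# captured copy, which taint mode regards as clean.
sub untaint_filename {
    my ($name) = @_;
    return $name =~ /\A(\w[\w.-]*)\z/ ? $1 : undef;
}

# Anything taken from @ARGV would be tainted under -T.
my $input = @ARGV ? $ARGV[0] : 'report.txt';
my $clean = untaint_filename($input);

print defined $clean ? "OK: $clean\n" : "Rejected\n";
```

The pattern is deliberately strict: it is usually safer to list what is allowed than to try to list everything that is dangerous.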

Actually there's one other option that belongs in this set and that's -d. This option puts you into the Perl debugger. This is also a subject that's too big for this article, but I recommend you look at "perldoc perldebug" or Richard Foley's Perl Debugger Pocket Reference.

Some basics

The next few options make it easy to run short Perl programs on the command line. The first one, -e, allows you to define Perl code to be executed by the compiler. For example, it's not necessary to write a "Hello World" program in Perl when you can just type this at the command line:

    perl -e 'print "Hello World\n"'

You can have as many -e options as you like and they will be run in the order that they appear on the command line:

    perl -e 'print "Hello ";' -e 'print "World\n"'

Notice that like a normal Perl program, all but the last line of code needs to end with a ; character.

The following deal with modules:

Although it is possible to use a -e option to load a module, Perl gives you the -M option to make that easier:

    perl -MLWP::Simple -e'print head "http://www.example.com"'

So -Mmodule is the same as use module.

If the module has default imports you don't want imported then you can use -m instead. Using -mmodule is the equivalent of use module(), which turns off any default imports. For example, the following command displays nothing as the head function won't have been imported into your main package:

    perl -mLWP::Simple -e'print head "http://www.example.com"'

The -M and -m options implement various nice pieces of syntactic sugar to make using them as easy as possible. Any arguments you would normally pass to the use statement can be listed following an = sign:

    perl -MCGI=:standard -e'print header'

This command imports the ":standard" export set from CGI.pm and therefore the header function becomes available to your program. Multiple arguments can be listed using quotes and commas as separators:

    perl -MCGI='header,start_html' -e'print header, start_html'

In this example we've just imported the two methods header and start_html as those are the only ones we are using.

Implicit loops

Two other command-line options, -n and -p, add loops around your -e code. They are both very useful for processing files a line at a time.

The -n option

If you type:

    perl -n -e 'some code' file1

Then Perl will interpret that as:

    LINE:
    while (<>) {
        # your code goes here
    }

Notice the use of the empty file input operator, which will read all of the files given on the command line a line at a time. Each line of the input files will be put, in turn, into $_ so that you can process it. As an example, try:

    perl -n -e 'print "$. - $_"' dna.seq

This gets converted to:

    LINE:
    while (<>) {
        print "$. - $_"
    }

This code prints each line of the file together with the current line number:

    1 - ACGTTCTCTGG TTAAAATGGC GCACCCACAA GCAGCCACAC CAATCCCAAA GTTTCATTTA
    2 - ACGGTTCCAAA TTCCCCCCCC GGGGGGGGGG AAAAAAAAAA TTTTTTTTTT CCCCCCCCCC

The -p option

The -p option makes that even easier. This option always prints the contents of $_ each time around the loop. It creates code like this:

    LINE:
    while (<>) {
        # your code goes here
    } continue {
        print or die "-p destination: $!\n";
    }

This uses the little-used continue block on a while loop to ensure that the print statement is always called. Using this option, our line number generator becomes:

    perl -p -e '$_ = "$. - $_"' dna.seq

The result is the same as in the -n case above. There is no need for the explicit call to print, as -p calls print for us.
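To make the implicit loop concrete, here is a minimal sketch that writes out by hand roughly what -p builds around the -e code. It assumes a two-line stand-in for dna.seq held in a string and collects the output in a second string; both in-memory handles are conveniences for this example only, since the real option reads the files named on the command line and prints to standard output.

```perl
use strict;
use warnings;

# Hypothetical two-line stand-in for dna.seq, read through an in-memory handle.
my $input = "ACGTT\nACGGT\n";
open my $in, '<', \$input or die "Cannot open input: $!";

# Collect the output in a string so it can be inspected afterwards.
my $output = '';
open my $out, '>', \$output or die "Cannot open output: $!";

LINE:
while (<$in>) {
    $_ = "$. - $_";    # the code that would be given with -e
}
continue {
    # The part that -p adds: print $_ after every pass through the loop.
    print {$out} $_ or die "-p destination: $!\n";
}
close $out;

print $output;
```

Because the print lives in the continue block, it still runs when the loop body ends early with next.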

Numbering lines

Number all lines in a file:

    perl -pe '$_ = "$. $_"' quotes.txt

    1 This is the definition of my life
    2 %%
    3 We are far too young and clever
    4 %%
    5 Stab a sorry heart
    6 With your favorite finger
    7

"-p" causes Perl to assume a loop around the program (specified by "-e") that reads each line of input into the $_ variable, executes the program and then prints the $_ variable. In this one-liner I simply modify $_ and prepend the $. variable to it. The special variable $. contains the current line number of input. The result is that each line gets its line number prepended. Note: the empty last line also gets a number.

Number only non-empty lines in a file:

    perl -pe '$_ = ++$a." $_" if /./' quotes.txt

    1 This is the definition of my life
    2 %%
    3 We are far too young and clever
    4 %%
    5 Stab a sorry heart
    6 With your favorite finger

Here we employ the "action if condition" statement that executes "action" only if "condition" is true. In this case the condition is the regular expression /./, which matches any character except newline (that is, it matches a non-empty line), and the action is $_ = ++$a." $_", which prepends the variable $a, incremented by one, to the current line. As we didn't use the strict pragma, $a was created automatically.

The result is that at each non-empty line $a gets incremented by one and prepended to that line. At each empty line nothing gets modified and the empty line gets printed as is. Note: the empty line 7 is still printed, but without a number.

Similarly:

Number and print only non-empty lines in a file:

    perl -ne 'print ++$a." $_" if /./'

Number all lines but print line numbers only for non-empty lines:

    perl -pe '$_ = "$. $_" if /./'

The LINE label and the unless pattern

Notice that the LINE: label is there so that you can easily move to the next input record no matter how deep in embedded loops you are. You do this using next LINE:

    perl -n -e 'next LINE unless /pattern/; print $_'

Of course, that example would more simply be written as:

    perl -n -e 'print if /pattern/'

The opposite test, printing only the lines that do not match, uses unless:

    perl -n -e 'print unless /ACGGT/' dna.seq

You get:

    ACGTTCTCTGG TTAAAATGGC GCACCCACAA GCAGCCACAC CAATCCCAAA GTTTCATTTA

This is line 1 of the dna.seq file. The pattern ACGGT occurs in line 2, and unless only executes its statement when the condition is false. For line 2 the condition is true, as ACGGT is present, so line 2 is not printed.

If the pattern is set to ACGTT, only line 2 is printed, because line 1 now matches. If the pattern is set to CGTTTT, both lines are printed, as this pattern is absent from both.

But in a more complex example, the next LINE construct could potentially make your code easier to understand.
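A minimal sketch of such a case, written as a script rather than a one-liner: each record is checked against several patterns in an inner loop, and next LINE abandons the record from inside that loop. The sample records and the list of unwanted patterns are assumptions made up for this example.

```perl
use strict;
use warnings;

# Hypothetical sample records, read through an in-memory handle.
my $input = "ACGTTCTCTGG\nACGGTTCCAAA\nTTAAAATGGC\n";
open my $in, '<', \$input or die "Cannot open input: $!";

my @unwanted = (qr/ACGGT/, qr/GGGG/);   # hypothetical patterns to filter out
my @kept;

LINE:
while (my $line = <$in>) {
    for my $pattern (@unwanted) {
        # A plain "next" would only advance the inner for loop;
        # "next LINE" abandons this record and reads the next one.
        next LINE if $line =~ $pattern;
    }
    push @kept, $line;
}

print @kept;
```

Without the label, the same logic would need a flag variable that is set in the inner loop and tested after it.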

Counting words in a file

If you need to have processing carried out either before or after the main code loop, you can use a BEGIN or END block. Here's a pretty basic way to count the words in a text file:

    perl -ne 'END { print $t } @w = /(\w+)/g; $t += @w' file.txt

You get: 12 (The file.txt content is: If there is nothing in a name, why do we have names?)

Each time round the loop we extract all of the words (defined as contiguous runs of \w characters) into @w and add the number of elements in @w to our total variable $t. The END block runs after the loop has completed and prints out the final value in $t.

Of course, people's definition of what constitutes a valid word can vary. The definition used by the Unix wc (word count) program is a string of characters delimited by whitespace. We can simulate that by changing our program slightly, like this:

    perl -ne 'END { print $x } @w = split; $x += @w' file.txt

But there are a couple of command-line options that will make that even simpler. Firstly the -a option turns on autosplit mode. In this mode, each input record is split and the resulting list of elements is stored in an array called @F. This means that we can write our word-count program like this (this also gives 12):

    perl -ane 'END {print $x} $x += @F' file.txt

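The default value used to split the record is one or more whitespace characters. It is, of course, possible that you might want to split the input record on another character and you can control this with the -F option, which takes a regular expression to use as the separator in place of whitespace. So if we wanted to change our program to split on all non-word characters we could do something like this:

    perl -F'\W' -ane 'END {print $x} $x += @F' file.txt

Note that \W matches a single non-word character, so a comma followed by a space counts as two separators with an empty field between them. For the sample file.txt this gives 13 rather than 12; using -F'\W+' treats each run of non-word characters as one separator and gives 12 again.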
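The effect of the separator pattern is easier to see with split called directly, which is what -a and -F arrange behind the scenes. The sketch below uses the sample sentence from file.txt; the variable names are chosen for this example only.

```perl
use strict;
use warnings;

my $line = "If there is nothing in a name, why do we have names?\n";

my @by_space    = split ' ',   $line;   # the default used by -a
my @by_nonword  = split /\W/,  $line;   # what -F'\W' asks for
my @by_nonwords = split /\W+/, $line;   # what -F'\W+' asks for

printf "%-12s %d fields\n", 'whitespace:', scalar @by_space;
printf "%-12s %d fields\n", '\W:',         scalar @by_nonword;
printf "%-12s %d fields\n", '\W+:',        scalar @by_nonwords;

# Show the empty field that the ", " after "name" produces with a single \W.
print join('|', @by_nonword), "\n";
```

Trailing empty fields are dropped by split, which is why the "?" and the newline at the end of the line do not add to the count.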

In-place editing

With the options that we have already seen, it's very easy to build up some powerful command-line programs. It's very common to see command-line programs that use Unix I/O redirection like this:

    perl -pe 'some code' < input.txt > output.txt

This takes records from input.txt, carries out some kind of transformation, and writes the transformed record to output.txt. In some cases you don't want to write the changed data to a different file; it's often more convenient if the altered data is written back to the same file.

You can get the appearance of this using the -i option. Actually, Perl renames the input file and reads from this renamed version while writing to a new file with the original name. If -i is given a string argument, then that string is appended to the name of the original version of the file. For example, to change all occurrences of "Mahesh" to "Ramesh" in a data file you could write:

    perl -i -pe 's/\bMahesh/Ramesh/g' file2.txt

This was the original:

    My name is Mahesh. Your name is Ramesh. Mahesh goes to Ramesh.

This is the result:

    My name is Ramesh. Your name is Ramesh. Ramesh goes to Ramesh.

Perl reads the input file a line at a time, making the substitution, and then writing the results back to a new file that has the same name as the original file -- effectively overwriting it. If you're not so confident of your Perl abilities you might take a backup of the original file, like this:

    perl -i.bak -pe 's/\bMahesh/Ramesh/g' file2.txt

You'll end up with the transformed data in file2.txt and the original file backed up in file2.txt.bak. If you're a fan of vi then you might like to use -i~ instead.
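The same in-place edit can be done from inside a script by setting the special variable $^I, which holds the backup extension that -i would have been given. The sketch below is one way to do it under a few assumptions: it works on a throwaway copy of the sample data in a temporary directory (created with File::Temp) so that no real file is touched, and the file name is only illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create a scratch copy of the sample data (hypothetical location).
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/file2.txt";
open my $out, '>', $file or die "Cannot write $file: $!";
print {$out} "My name is Mahesh. Your name is Ramesh. Mahesh goes to Ramesh.\n";
close $out;

{
    local $^I   = '.bak';      # same effect as -i.bak
    local @ARGV = ($file);     # files that <> should process
    while (<>) {
        s/\bMahesh/Ramesh/g;
        print;                 # goes to the rewritten file, not the screen
    }
}

# Read the file back to show the result.
open my $in, '<', $file or die "Cannot read $file: $!";
my $result = <$in>;
close $in;

print $result;
print -e "$file.bak" ? "backup kept\n" : "no backup\n";
```

Wrapping the loop in a block with local keeps the in-place behaviour from leaking into the rest of the script.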

The ARGV array

In Perl, command-line arguments are stored in a special array named @ARGV. So you just need to read from that array to access your script's command-line arguments.

ARGV array elements: In the ARGV array, $ARGV[0] contains the first argument, $ARGV[1] contains the second argument, etc. So if you're just looking for one command-line argument you can test for $ARGV[0], and if you're looking for two you can also test for $ARGV[1], and so on.

ARGV array size: The variable $#ARGV is the subscript of the last element of the @ARGV array, and because the array is zero-based, the number of arguments given on the command line is $#ARGV + 1.
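A short sketch of the two ways of getting the argument count. The argument list is assigned inside the script purely so the example runs without real command-line arguments; in a real script @ARGV is filled in by Perl.

```perl
use strict;
use warnings;

# Hypothetical arguments, assigned here only for illustration.
@ARGV = qw(Mahesh Vaishnav);

my $last_index = $#ARGV;         # subscript of the last element
my $count      = scalar @ARGV;   # an array in scalar context gives its size

print "last index: $last_index\n";
print "count:      $count\n";
print "same thing\n" if $count == $#ARGV + 1;
```

When no arguments are given, $#ARGV is -1, so $#ARGV + 1 correctly comes out as 0.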

A typical Perl script that uses command-line arguments will (a) test for the number of command-line arguments the user supplied and then (b) attempt to use them. Here's a simple Perl script named "name.pl" that expects to see two command-line arguments, a person's first name and last name, and then prints them:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # command line arguments basics

    # 1 - quit unless we have the correct number of command-line args
    my $num_args = $#ARGV + 1;
    if ($num_args != 2) {
        print "\nUsage: name.pl first_name last_name\n";
        exit;
    }

    # 2 - we got two command line args, so assume they are the first name
    #     and last name
    my $first_name = $ARGV[0];
    my $last_name  = $ARGV[1];

    print "Hello, $first_name $last_name\n";

To test this script on a Unix/Linux system, just create a file named name.pl, then issue this command to make the script executable:

1) First type: chmod +x name.pl (this step is not required on Windows 7)
2) Then type: ./name.pl Mahesh Vaishnav (in Windows 7, type: perl ./name.pl Mahesh Vaishnav)

The result is: Hello, Mahesh Vaishnav

Or, if you want to see the usage statement, run the script without any command-line arguments, like this:

    ./name.pl

You get:

    Usage: name.pl first_name last_name

The @_ variable and subroutine

Perl also comes with a variable named @_, which contains arguments passed to a subroutine, and which is available to the subroutine when it is invoked. The value of each element of the array can be accessed using standard scalar notation - $_[0] for the first element, $_[1] for the second element, and so on.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # @_ contains arguments passed to sub
    sub add_two_numbers {
        my $sum = $_[0] + $_[1];
        return $sum;
    }

    my $total = add_two_numbers(3, 5);
    print "The sum of the numbers is $total\n";

    exit;

Result: The sum of the numbers is 8

In the example above, once the &add_two_numbers subroutine is invoked with the numbers 3 and 5, the numbers are transferred to the @_ variable, and are then accessed using standard scalar notation within the subroutine. Once the addition has been performed, the result is returned to the main program, and displayed on the screen via the print() statement.
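In practice, many Perl programmers copy @_ into named variables at the top of the subroutine rather than indexing it directly. The sketch below shows that idiom for the same addition, plus a variant (add_all, a name made up for this example) that loops over @_ to accept any number of arguments.

```perl
use strict;
use warnings;

sub add_two_numbers {
    my ($first, $second) = @_;   # copy the arguments into named variables
    return $first + $second;
}

# Hypothetical variant: sum however many arguments were passed.
sub add_all {
    my $sum = 0;
    $sum += $_ for @_;
    return $sum;
}

print "The sum of the numbers is ", add_two_numbers(3, 5), "\n";
print "The sum of 1 to 4 is ", add_all(1 .. 4), "\n";
```

Copying into lexical variables also avoids accidentally modifying the caller's data, since the elements of @_ are aliases to the original arguments.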

Perl command line arguments in a for loop

    #!/usr/bin/perl
    use strict;
    use warnings;

    #-------------------#
    # PROGRAM: argv.pl  #
    #-------------------#

    my $numArgs = $#ARGV + 1;
    print "thanks, you gave me $numArgs command-line arguments:\n";

    # print each argument on its own line
    foreach my $argnum (0 .. $#ARGV) {
        print "$ARGV[$argnum]\n";
    }

Run the script as follows from a DOS command-line: perl argv.pl 1 2 3 4

You get:

    thanks, you gave me 4 command-line arguments:
    1
    2
    3
    4