Creating dynamic content for the Web using Perl

4
focus software 68 Volume 2, Issue 2 Summer 2001 Copyright © 2001 John Wiley & Sons, Ltd. Introduction Software engineering as described by academics and authors such as Sommerville [1], is the process of creating quality software solutions to well-defined problems. The software engineering process involves consultation with users, the creation of detailed designs, extensive testing and an iterative approach to the development process. Engineering led software development is the dominant philosophy in the industry today. An alternative craft view of software development dominated the industry into the 1970s, although it was rarely characterized as such. Treating software development as a craft means building code around primitive, or non-existent, design documents, with the code often regarded as providing its own documentation. Development as craft involves being adaptable during the development process and attempting to use the best tool for the job rather than deciding upon tools and techniques before writing any code. Development as craft is often called hacking, although that term has unfortunate connotations outside of programming. Software engineering grew out of the many failures of the industry in the 1960s and 1970s when large systems regularly failed to meet their design goals. It is clear that much software and many systems continue to fail when outcomes are compared to original goals, but the amount of successful software available today far exceeds the amount that fails. Developing software as a craft has a long and illustrious history exemplified by the ‘hacker’ tradition which has given us the GNU tools [2], the Linux kernel [3], and the Emacs editor [4]. Web development sometimes happens in an engineering context. Large E-commerce Web sites tend to be designed and tested thoroughly and are often built using industry standard technologies. The combination of plentiful UML documentation [5], Microsoft Active Server Pages [6],and an Oracle database has much to recommend it to conservative IT managers. Such technologies have a reputation for stolid reliability and good support from their manufacturers, although that clearly comes with a hefty price tag attached. Much Web development is undertaken by non-programmers. These people may be designers who need to add some code to a Creating dynamic content for the Web using Perl Chris Bates School of Computing and Management Sciences, Sheffield Hallam University, City Campus, Pond Street, Sheffield S1 1WB, UK [email protected] focus opinion Keywords: software; computer languages; Web page design Web development is one of the growth areas of programming. The creators of E-commerce and small hobby sites use many of the same tools and techniques. Development languages become fashionable, wildly popular then fade away to be replaced by the next big thing. Some languages have such power and suitability that they deserve to be taken seriously and used widely rather than discarded when an alternative comes along. In this paper CHRIS BATES presents the programming language Perl which is widely used on Web sites that create and present dynamic content. He focuses on the features of Perl that make it so well suited for Web development and worth considering for both experienced programmers and beginners alike. There sometimes appears to be an expectation that using the Web presents a series of radical problems that are not encountered when, for instance, building traditional desktop applications. Programming for the Web tends to be discussed and taught as if it is somehow different to other types of programming. There sometimes appears to be an expectation that using the Web presents a series of radical problems that are not encountered when, for instance, building traditional desktop applications. In reality, many modern programs include some of the features that we associate with Web-based applications. Perhaps the myths about the difficulties associated with Web development have arisen because the Web has never been the exclusive development environment of traditional software engineers.

Transcript of Creating dynamic content for the Web using Perl

Page 1: Creating dynamic content for the Web using Perl

focussoftware68 Volume 2, Issue 2

Summer 2001

Copyright © 2001 John Wiley & Sons, Ltd.

IntroductionSoftware engineering as described byacademics and authors such asSommerville [1], is the process of creatingquality software solutions to well-definedproblems. The software engineeringprocess involves consultation with users,the creation of detailed designs, extensive

testing and an iterative approach to thedevelopment process. Engineering ledsoftware development is the dominantphilosophy in the industry today. Analternative craft view of softwaredevelopment dominated the industry intothe 1970s, although it was rarelycharacterized as such. Treating softwaredevelopment as a craft means buildingcode around primitive, or non-existent,design documents, with the code oftenregarded as providing its owndocumentation. Development as craftinvolves being adaptable during thedevelopment process and attempting touse the best tool for the job rather thandeciding upon tools and techniques before

writing any code. Development as craft isoften called hacking, although that termhas unfortunate connotations outside ofprogramming.

Software engineering grew out of themany failures of the industry in the 1960sand 1970s when large systems regularlyfailed to meet their design goals. It is clearthat much software and many systemscontinue to fail when outcomes arecompared to original goals, but theamount of successful software availabletoday far exceeds the amount that fails.Developing software as a craft has a longand illustrious history exemplified by the‘hacker’ tradition which has given us theGNU tools [2], the Linux kernel [3], andthe Emacs editor [4].

Web development sometimes happens inan engineering context. Large E-commerceWeb sites tend to be designed and testedthoroughly and are often built usingindustry standard technologies. Thecombination of plentiful UMLdocumentation [5], Microsoft Active ServerPages [6],and an Oracle database hasmuch to recommend it to conservative ITmanagers. Such technologies have areputation for stolid reliability and goodsupport from their manufacturers,although that clearly comes with a heftyprice tag attached.

Much Web development is undertaken bynon-programmers. These people may bedesigners who need to add some code to a

Creating dynamic content for the Web using Perl

Chris BatesSchool of Computing and Management Sciences, Sheffield Hallam University, City Campus, Pond Street, Sheffield S1 1WB, [email protected]

focusopinion

Keywords: software; computer languages; Web page designWeb development is one of the

growth areas of programming. The

creators of E-commerce and small

hobby sites use many of the same

tools and techniques. Development

languages become fashionable,

wildly popular then fade away to

be replaced by the next big thing.

Some languages have such power

and suitability that they deserve to

be taken seriously and used widely

rather than discarded when an

alternative comes along. In this

paper CHRIS BATES presents the

programming language Perl which

is widely used on Web sites that

create and present dynamic

content. He focuses on the features

of Perl that make it so well suited

for Web development and worth

considering for both experienced

programmers and beginners alike.

There sometimes appears to be anexpectation that using the Web

presents a series of radicalproblems that are not encountered

when, for instance, buildingtraditional desktop applications.

Programming for the Web tends to be discussed and taught as if it is somehowdifferent to other types of programming. There sometimes appears to be anexpectation that using the Web presents a series of radical problems that are notencountered when, for instance, building traditional desktop applications. Inreality, many modern programs include some of the features that we associate withWeb-based applications. Perhaps the myths about the difficulties associated withWeb development have arisen because the Web has never been the exclusivedevelopment environment of traditional software engineers.

swf2.2_v2 8/8/01 6:55 am Page 68

Page 2: Creating dynamic content for the Web using Perl

focussoftware 69Volume 2, Issue 2Copyright © 2001 John Wiley & Sons, Ltd.

Summer 2001 focus opinion

site, the nominated person within a smallcompany or dedicated hobbyists with agood idea and some free Web space.Developers that are not professionalprogrammers are likely to look widely forsolutions. They may be less likely to hidebehind a brand-name than industrymanagers will, and are more open to theuse of free software [7], or open-sourcesolutions. A plethora of tools have beencreated for, and by, this particular group.

Even non-specialists will be aware of theJava programming language from SunMicrosystems [8], and its use on the Web.Many other languages have beendeveloped or adapted for use on Webservers. These languages are oftenpleasingly free of the compromises thatthe designers of more industrial languagesmust accept. Perl is just such a language.This paper presents those features of Perlthat make it ideal for Web development.The intention is to show those who havenever looked at it what the language has tooffer, and to demonstrate that Perl isworth looking at again if the admittedlycryptic syntax has proved off-putting inthe past.

Section 2 demonstrates that scriptinglanguages are appropriate even on high-load systems, Section 3 briefly describesthe Perl language. In Section 4 theCGI.pm module is described, and finallySection 5 shows that Perl can be used tobuild complex template-based sites.

ScriptingProgramming for Web sites usuallyinvolves either adding dynamic aspects toHTML pages using JavaScript or creatingentire dynamic Web sites using programson the server. The processing of data on aWeb server is now commonplace in manylarge intranet and E-commerce Web sites.Data is collected from users via an HTMLform, some characters such as spaces arereplaced with escape sequences, and thedata returned to the server. The data isextracted from the return messages andpassed to a program that replaces theescape sequences with the original charactersthen processes the data.

Server-side programs can be written in justabout any language, the only restrictionbeing that the server is actually able toexecute the finished program. The mainfacility that a good Web programminglanguage needs is the ability to handle textstrings. Data received from Web formscomes as strings and the Web pages thatare sent back to the browser are primarily

textual. Choosing a development languagethat has good facilities for the handling oftext is therefore important, but sometimesdevelopers and managers worry aboutperformance and suggest that a fastlanguage such as C should be used. Todaythis is rarely an issue as technologies existwhich give rapid execution speeds even whenusing interpreted languages such as Perl.

Why Perl?Perl is probably the dominant language inInternet development today. Perl wascreated in the late 1980s by Larry Wall,who also co-authored Programming Perl [9],which is the standard textbook on thelanguage. Right from its inception Perlwas a language with excellent textmanipulation facilities, in fact that was itsoriginal purpose and remains central to itsuse and development today.

Perl is ideally suited to the creation of Webapplications. The language handles textvery well; its regular expression enginealone is worth the effort of learning thelanguage. If you want to search textstrings, make changes and then get thosechanges back out to a user, Perl andregular expressions make the idealcombination. In Mastering RegularExpressions [10], Jeffrey Friedl describesPerl and other languages that implement

regular expressions, but only Perl makesthem a central feature.

Documentation is yet another strongfeature of Perl. The standard distributioncomes with thousands of pages of onlinedocuments that can be accessed using acommand-line facility called perldoc.

Unix distributions have much of thismaterial available as man pages, in thedistribution from ActiveState [11] whichruns on Microsoft Windows it is providedas HTML pages.

Many introductory Web programmingtexts such as Web Programming [12] by theauthor, Gundvaram’s CGI Programming onthe World Wide Web [13] and Fast Track WebProgramming [14] by Cintron includetutorial material on the use of thelanguage. Slightly older texts such as

Instant WebScripts [15] and CGI Developer’sResource [16] still contain some usefulinformation but the pace of Webdevelopment is so fast that books caneasily get left behind. There are also anumber of good Perl programming textsavailable including Learning Perl [17],Learning Perl on Win32 Systems [18] andProgramming Perl [9]. The Perl Cookbook[19] is a vital source of hints, tips anduseful code for anyone who is seriousabout their Perl programming.

The comprehensive Perlarchive networkMany important modules are part of thecore Perl distribution, updates areavailable from the Comprehensive PerlArchive Network [20], CPAN, as arenumerous other useful libraries. Perl reallydemonstrates many of the benefits of co-operative development using the Internet.Programmers around the world work tocreate not only the core language, but vastlibraries of code modules which performmyriad complex tasks. Installing and usingthese is generally not an onerous taskalthough if a module is being activelydeveloped, handling regular updates canbe time consuming. Fortunately many ofthe most important modules for Webdevelopment are relatively stable and willonly need updating occasionally.

CPAN is a worldwide network of linkedFTP sites. The CPAN archives can beaccessed centrally via the Web [20], andare available via anonymous FTP. Once arunning installation of Perl exists on amachine the most effective way ofupdating it and adding new modules is viaits in-built facilities. Typing

perl -MCPAN -e shell

at a command prompt brings up aninteractive utility that can be used to accessthe archive. There’s not even a need tofind a suitable mirror, as this is doneautomatically by CPAN.

CGI.pmExtracting the data returned from anHTML form and presenting it to anapplication is a relatively trivial task inPerl. It is also something that almost everyWeb script needs to do at some point.Whether a user is uploading data onto aserver, purchasing from an online store orsimply requesting a page, this process liesat the heart of Web programming. Ratherthan write the code to perform theseprocesses in each script, library routinesare used. These have many benefits, not

Perl is ideally suited to thecreation of Web applications. Thelanguage handles text very well;

its regular expression engine aloneis worth the effort of learning the

language.

swf2.2_v2 8/8/01 6:55 am Page 69

Page 3: Creating dynamic content for the Web using Perl

focussoftware70 Volume 2, Issue 2 Copyright © 2001 John Wiley & Sons, Ltd.

Summer 2001 focus opinion

least of which is robustness. Even the bestprogrammers make mistakes but in a well-tested library those mistakes will have beendiscovered and removed. The library ofchoice for Web developers using Perl isCGI.pm written by Lincoln Stein, [21] and[22]. It handles all of the complexity ofWeb development and presents anextremely usable interface to applicationdevelopers.

Many people would use Perl on Webservers even if CGI.pm did not exist, butits easy availability and excellentfunctionality go some way to compensating

for the obscure syntax of the language.Like many Perl modules, CGI.pm isavailable either through its object-orientedor procedural interfaces. This makesslotting it into existing code very simple,but also means that developers can choosethe approach with which they are mostcomfortable.

CGI.pm has two purposes: the creation ofWeb pages, especially those which containforms at run-time, and the parsing of datareturned from Web forms. The creation ofHTML pages, especially the forms withinthem, is accomplished by shortcuts that aresimple function calls. The intention wasthat by providing shortcuts coding errorswould be reduced. In reality, using theshortcuts is often more complex thanwriting the HTML code out in full.

Many developers end up using only asubset of the facilities that the moduleprovides. Its ability to seamlessly extractdata from forms, including files and othermulti-part data, is always used. Themodule includes excellent boilerplatefunctions that create the headers andfooters for both HTTP message andHTML page when building a Web page toreturn to the browser.

Few scripts work perfectly out of the box.Most have some errors which preventthem from working properly. The CGI.pmmodule has a feature that enables scriptsto be tested and debugged on theworkstation without the need for a Webserver. This is such a useful facility that itshould be used in most developmentsituations. Scripts that create errors whenrunning on a Web server tend to writethose errors straight to the server logwhere they are inaccessible to most

developers who do not have systemadministrator privileges. All Perl scriptsshould be developed with the warning flagset, the first line of every script should be:

#!/usr/bin/perl -Tw

The w flag turns on warnings aboutpotential syntax errors, the T flag initiatestaint checking which looks for potentiallydangerous uses of data received from aform. The warnings flag can create manyerrors even from a script that appears torun perfectly. Debugging is needed toremove those errors before labelling thescript as production code. If a script is runfrom the command line, CGI.pm gives theopportunity to enter parameter names andvalues in pairs. When all parameters havebeen entered, Control-D runs the scriptwith errors displayed on the terminal.

Handling error messagesA running script can create errors as partof its normal operation. As previouslynoted these tend to be written into theserver logs that are not accessible bynormal users on most systems. In addition,the server will not usually return amessage back to the browser. The user hasno way of knowing that the Web server justfailed to handle their request and mayassume that all is well. One way of dealingwith this problem is to redirect the errormessage back to the originating browser.This is done by using the CGI::Carpmodule. The module is used by placingthe following code near the top of a script:

use CGI::Carp(fatalsToBrowser);

Perl and databasesLarge Web sites, especially those which arerun as commercial endeavours, create andstore massive amounts of data. Thisstorage can be done in a variety of waysincluding flat files, XML files or relationaldatabases. The use of databases hasnumerous advantages but accessing themcan be difficult. The database will often berunning on a different server to the Webapplication, database commands are oftenwritten in SQL. Ideally a Web applicationwill be able to talk to any databasebackend although changing the backend isunlikely. Perl has a database neutralsystem called Perl::DBI.

DBI is an API that provides a library ofroutines which provide a consistentinterface for accessing any relationaldatabase. The actual database access isprovided by a database-specific driver.Therefore the only databases that can beused with DBI are those for whichsomeone has taken the time to develop adriver. It is worth noting that DBI is not a

Web-only technology, it can be used in anyPerl application which requires databaseaccess, however it is an importanttechnology on many commercial Web sites.

Unix systems have numerous databasesand require myriad drivers; underWindows the whole process is simplifiedsomewhat. Microsoft has a technologycalled ODBC which gives a consistentinterface to any database system. UsingDBI on Windows is therefore very simple:simply use the solid and reliable ODBCbackend, which will talk to the operatingsystem’s ODBC manager.

HTML::Mason, Web templatesin PerlWeb site development has moved from apurely scripted model in recent years.Scripted sites tend to embed HTML codeinside scripts but for many non-programmers this can seem overlycomplex. These developers tend to thinkin terms of documents rather than scriptsand prefer to use templating systems. Thebest known templating systems are probablyActive Server Pages [23], ASP, and PHP

[24], both of which are widely used. Thelarge number of sites using ASPs is notsurprising given the close integration thattechnology has with Microsoft’s serverproducts; PHP is much more of a grassroots movement whose use has spreadalmost by word-of-mouth. Both show thatplenty of people want to embed smallscripts inside HTML pages.

Perl developers are usually going to behappier working with a scripting model ofdevelopment but this may not always be anappropriate solution. HTML::Mason,available from Swart [25], is a recentdevelopment that combines a template-based model with the power of Perl. Ofcourse this can be done using ASP andPerlscript, but HTML::Mason has theadvantage of being a pure Perl solutionwhich is hosted, and can be closelyintegrated with the Apache Web server[26]. The documentation supplied withHTML::Mason states that it:

“is a tool for building,serving and maintaininglarge Websites...serving

dynamic content”.

Perl really demonstrates many of the benefits of

co-operative development using the Internet.

Web site development hasmoved from a purely scripted

model in recent years.

swf2.2_v2 8/8/01 6:55 am Page 70

Page 4: Creating dynamic content for the Web using Perl

focussoftware 71Volume 2, Issue 2Copyright © 2001 John Wiley & Sons, Ltd.

ConclusionsProgrammers are as susceptible to thevagaries of fashion as anyone else. Newtechnologies supersede old ones throughhype or boredom. Often the new saviour isno better than the old and after a fewyears developers drift back to theirprevious language or system. Thisdynamic can be seen in the ongoingmovements between C++ and Java ormainframe and client-server development.

This paper has shown a tiny subset of thefeatures of Perl. As a language Perl hasmuch to recommend it for manydevelopment tasks, its suitability as adevelopment language for the Web isclear. Newer technologies such as PHP,

Java servlets or ASP may be morefashionable or more widely hyped but they

lack the range of applications, extensionsand facilities that Perl provides. Anyonewho needs to develop a major Web-basedapplication would benefit from using Perlas their server-side language in reduceddevelopment times and increasedrobustness.

Summer 2001 focus opinion

Mason bases sites around components whichare files containing a mixture of HTML,Perl and special Mason commands.Components may represent completepages but more commonly they aresmaller objects that can be embeddedwithin each other to create a completepage. This idea is powerful and intuitivelysensible to anyone who is used to object-based design or programming. Breaking acomplex page into its constituent piecesmeans that navigation bars, headers orpage footers can be changed withoutaffecting the rest of the page. It alsomeans that a developer can create acomponent that represents the structure ofthe whole site in a single file then embedthat content into every page on the site.Whilst Mason has not yet been describedin any Perl programming texts, a shortseries of articles presented in Linux Journal[27] and [28] gives a good introduction tothe system.

Mason was designed to integrate closelywith the Apache Web server. It does thisthrough Apache’s commonly usedmod_perl extension [29] which is widelyused to improve the performance of CGIscripts when developing with Perl. Themodule answers one of the commonestobjections raised to the use of scriptinglanguages on Web servers: poor runtimeperformance. In traditional CGI work,each time that a script is called, theinterpreter and script must be loadedbefore execution. With mod_perl theinterpreter is only loaded the first time itis needed; it then remains resident.Similarly scripts are loaded andinterpreted only once, and theinterpreted, usually bytecode, version isexecuted thereafter. The HTML::MasonWeb site [25] claims that speedup from400 to 2000 per cent is possible usingmod_perl. Given the relatively small sizeof most scripts this speedup brings runtimes towards the level of compiled code.

Programmers are as susceptible to the vagaries of

fashion as anyone else. New technologies supersede oldones through hype or boredom.

References[1] Sommerville I. Software Engineering, 6th edn.Addison-Wesley, 2000.[2] GNU. The GNU Project. http://www.gnu.org.[3] The Linux Kernel Archives. http://www.kernel.org.[4] Free Software Foundation. The Emacs Editor.http://www.gnu.org/software/emacs.[5] Booch G, Rumbaugh J & Jacobson I. The UnifiedModeling Language UserGuide. Addison-Wesley,1999.[6] Weissinger AK. ASP in a Nutshell. O’Reilly andAssociates, 1999.[7] Free Software Foundation. Free SoftwareFoundation. http://www.fsf.org.[8] Sun Microsystems Servlet Page.http://www.java.sun.com/products/servlet/index.html.[9] Wall L, Christiansen T & Schwartz RL.Programming Perl, 3rd edn. O’Reilly and Associates,2000.[10] Friedl JEF. Mastering Regular Expressions.O’Reilly and Associates, 1998.[11] The ActiveState Web Site.http://www.activestate.com.[12] Bates C. Web Programing: Building InternetApplications. John Wiley and Sons, 2000.[13] Gundavaram S. CGI Programing on the WorldWide Web. O’Reilly and Associates, 1996.[14] Cintron D. Fast Track Web Programming. JohnWiley and Sons, 1999.[15] Sol S & Birzniecks G. Instant Web Scripts With

CGI Perl. M and T, 1996.

[16] Ivler JM & Husain K. CGI Developer’s Resource.Prentice Hall, 1997.

[17] Schwartz RL. Learning Perl. O’Reilly andAssociates, 1997.

[18] Schwartz RL, Olsen E & Christiansen T.Learning Perl on Win32 Systems. O’Reilly andAssociates, 1997.

[19] Christiansen T & Torkington N. Perl Cookbook.O’Reilly and Associates, 1998.

[20] The Comprehensive Perl Archive Network.http://www.perl.com/CPAN.

[21] Stein LD. The Home of CGI.pm.

[22] Stein LD. How to Set-up and Maintain a WebSite. Addison-Wesley, 1997.

[23] About Active Server Pages.http://msdn.microsoft.com/library/psdk/cdo/_olemsg_about_active_server_%pages.htm.5

[24] PHP Hypertext Processor. http://www.php.net.

[25] Swartz J. Mason HQ. http://www.masonhq.org.

[26] The Apache Group. http://www.apache.org.

[27] Lerner RM. Building sites with Mason. TheLinux Journal 2000; 74.

[28] Lerner RM. Session management with Mason.The Linux Journal 2000; 76.

http://www.genome.wi.mit.edu/ftp/pub/software/WWW/cgi_docs.html.

[29] The Apache Group. The Apache Mod PerlExtension. http://perl.apache.org.

We welcome letters on matters raised in this issue or any otherissue of the magazine or on topics of interest to softwarescientists and engineers.

Please write to: The Editor, Software Focus, John Wiley & Sons, LtdBaffins Lane, Chichester, PO19 1UD, UK

or email: [email protected]

Letters to the Editor

2531

swf2.2_v2 8/8/01 6:55 am Page 71