CGI Programming in c


Transcript of CGI Programming in c

Page 1: CGI Programming in c

Getting Started with CGI Programming in C

Content
Why CGI programming?
A basic example
Analysis of the example
So what is CGI programming?
Using a C program as a CGI script
The Hello world test
How to process a simple form
Using METHOD="POST"
Further reading

This is an introduction to writing CGI programs in the C language. The reader is assumed to know the basics of C as well as how to write simple forms in HTML and to be able to install CGI scripts on a Web server. The principles are illustrated with very simple examples.

Two important warnings:

To avoid wasting your time, please check (from applicable local documents or by contacting the local webmaster) whether you can install and run CGI scripts written in C on the server. At the same time, please check how to do that in detail, specifically where you need to put your CGI scripts.

This document was written to illustrate the idea of CGI scripting to C programmers. In practice, CGI programs are usually written in other languages, such as Perl, and for good reasons: except for very simple cases, CGI programming in C is clumsy and error-prone.

Why CGI programming?

As my document How to write HTML forms briefly explains, you need a server-side script in order to use HTML forms reliably. Typically, there are simple server-side scripts available for simple, common ways of processing form submissions, such as sending the data in text format by E-mail to a specified address.

However, for more advanced processing, such as collecting data into a file or database, or retrieving information and sending it back, or doing some calculations with the submitted data, you will probably need to write a server-side script of your own.

CGI is simply an interface between HTML forms and server-side scripts. It is not the only possibility; see the excellent tutorial How the web works: HTTP and CGI explained by Lars Marius Garshol for both an introduction to the concepts of CGI and notes on other possibilities.


If someone suggests using JavaScript as an alternative to CGI, ask him to read my JavaScript and HTML: possibilities and caveats. Briefly, JavaScript is inherently unreliable at least if not “backed up” with server-side scripting.

A basic example

The above-mentioned How the web works: HTTP and CGI explained is a great tutorial. The following introduction of mine is just another attempt to present the basics; please consult other sources if you get confused or need more information.

Let us consider the following simple HTML form:

<form action="http://www.cs.tut.fi/cgi-bin/run/~jkorpela/mult.cgi">
<div><label>Multiplicand 1: <input name="m" size="5"></label></div>
<div><label>Multiplicand 2: <input name="n" size="5"></label></div>
<div><input type="submit" value="Multiply!"></div>
</form>

It will look like the following on your current browser:

Multiplicand 1: 

Multiplicand 2: 

You can try it if you like. Just in case the server used isn’t running and accessible when you try it, here’s what you would get as the result:

Multiplication results
The product of 4 and 9 is 36.

Analysis of the example

We will now analyze how the example above works.

Assume that you type 4 into one input field and 9 into another and then invoke submission, typically by clicking on a submit button. Your browser will send, by the HTTP protocol, a request to the server at www.cs.tut.fi. The browser picks up this server name from the value of the ACTION attribute, where it occurs as the host name part of a URL. (Quite often, the ACTION attribute refers, often using a relative URL, to a script on the same server as the document resides on, but this is not necessary, as this example shows.)

When sending the request, the browser provides additional information, specifying a relative URL, in this case
/cgi-bin/run/~jkorpela/mult.cgi?m=4&n=9


This was constructed from that part of the ACTION value that follows the host name, by appending a question mark “?” and the form data in a specifically encoded format.

The server to which the request was sent (in this case, www.cs.tut.fi) will then process it according to its own rules. Typically, the server’s configuration defines how the relative URLs are mapped to file names and which directories/folders are interpreted as containing CGI scripts. As you may guess, the part cgi-bin/ in the URL causes such interpretation in this case. This means that instead of just picking up and sending back (to the browser that sent the request) an HTML document or some other file, the server invokes a script or a program specified in the URL (mult.cgi in this case) and passes some data to it (the data m=4&n=9 in this case).

It depends on the server how this really happens. In this particular case, the server actually runs the (executable) program in the file mult.cgi in the subdirectory cgi-bin of user jkorpela’s home directory. It could be something quite different, depending on server configuration.
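To make the data flow concrete, here is a minimal sketch of what a program like mult.cgi might do with the data m=4&n=9. It is an illustration, not the actual program behind the example: the function names parse_factors and write_product_page and the wording of the error message are assumptions, and the sketch relies on the fields arriving in the order m, n as in the form above.

```c
#include <stdio.h>

/* Hypothetical helper: extracts m and n from a query string such as
   "m=4&n=9". Returns 1 on success, 0 if the data is missing or malformed. */
int parse_factors(const char *query, long *m, long *n)
{
    if (query == NULL)
        return 0;
    return sscanf(query, "m=%ld&n=%ld", m, n) == 2;
}

/* Writes a complete CGI response (header line, empty line, body) to out.
   The product is computed without any overflow check. */
void write_product_page(FILE *out, const char *query)
{
    long m, n;

    fprintf(out, "Content-Type: text/html;charset=us-ascii\n\n");
    fprintf(out, "<title>Multiplication results</title>\n");
    fprintf(out, "<h1>Multiplication results</h1>\n");
    if (parse_factors(query, &m, &n))
        fprintf(out, "<p>The product of %ld and %ld is %ld.</p>\n",
                m, n, m * n);
    else
        fprintf(out, "<p>Error: the data must be two integers.</p>\n");
}
```

A main function for the CGI script would then only need to call write_product_page(stdout, getenv("QUERY_STRING")), with stdlib.h included for getenv.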

Using a C program as a CGI script

In order to set up a C program as a CGI script, it needs to be turned into a binary executable program. This is often problematic, since people largely work on Windows whereas servers often run some version of UNIX or Linux. The system where you develop your program and the server where it should be installed as a CGI script may have quite different architectures, so that the same executable does not run on both of them.

This may create an unsolvable problem. If you are not allowed to log on to the server and you cannot use a binary-compatible system (or a cross-compiler) either, you are out of luck. Many servers, however, allow you to log on and use the server in interactive mode, as a “shell user,” and contain a C compiler.

You need to compile and load your C program on the server (or, in principle, on a system with the same architecture, so that binaries produced for it are executable on the server too).

Normally, you would proceed as follows:

1. Compile and test the C program in normal interactive use.

2. Make any changes that might be needed for use as a CGI script. The program should read its input according to the intended form submission method. Using the default GET method, the input is to be read from the environment variable QUERY_STRING. (The program may also read data from files, but these must then reside on the server.) It should generate output on the standard output stream (stdout) so that it starts with suitable HTTP headers. Often, the output is in HTML format.

3. Compile and test again. In this testing phase, you might set the environment variable QUERY_STRING so that it contains the test data as it will be sent as form data. E.g., if you intend to use a form where a field named foo contains the input data, you can give the command
setenv QUERY_STRING "foo=42"
(when using the tcsh shell) or
export QUERY_STRING="foo=42"
(when using the bash shell; without export, the variable is not passed on to the program you run).

4. Check that the compiled version is in a format that works on the server. This may require a recompilation. You may need to log on to the server computer (using Telnet, SSH, or some other terminal emulator) so that you can use a compiler there.

5. Upload the compiled and loaded program, i.e. the executable binary program (and any data files needed), to the server.

6. Set up a simple HTML document that contains a form for testing the script, etc.
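Steps 2 and 3 can be tried out with a very small test program along the following lines. This is only a sketch under stated assumptions: the function names get_query_string and echo_query are made up for illustration, and the program does nothing but echo the raw form data as plain text.

```c
#include <stdio.h>
#include <stdlib.h>

/* Returns the raw form data of a GET submission, or an empty string
   when the environment variable QUERY_STRING is not set. */
const char *get_query_string(void)
{
    const char *data = getenv("QUERY_STRING");
    return data != NULL ? data : "";
}

/* Writes a plain-text CGI response that echoes the form data to out.
   The header line is followed by an empty line, as CGI requires. */
void echo_query(FILE *out)
{
    fprintf(out, "Content-Type: text/plain;charset=us-ascii\n\n");
    fprintf(out, "QUERY_STRING is: %s\n", get_query_string());
}
```

With a main function that calls echo_query(stdout), running the compiled program from the shell after setting QUERY_STRING as described in step 3 should show the header, an empty line, and the test data.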

You need to put the executable into a suitable directory and name it according to server-specific conventions. Even the compilation commands needed here might differ from what you are used to on your workstation. For example, if the server runs some flavor of Unix and has the GNU C compiler available, you would typically use a compilation command like
gcc -o mult.cgi mult.c
and then move (mv) mult.cgi to a directory with a name like cgi-bin. Instead of gcc, you might need to use cc. You really need to check local instructions for such issues.

The filename extension .cgi has no fixed meaning in general. However, there can be server-dependent (and operating system dependent) rules for naming executable files. Typical extensions for executables are .cgi and .exe.

The Hello world test

As usual when starting work with some new programming technology, you should probably first make a trivial program work. This avoids fighting with many potential problems at a time and lets you concentrate first on the issues specific to the environment, here CGI.

You could use the following program that just prints Hello world but preceded by HTTP headers as required by the CGI interface. Here the header specifies that the data is plain ASCII text.

#include <stdio.h>

int main(void)
{
  printf("Content-Type: text/plain;charset=us-ascii\n\n");
  printf("Hello world\n\n");
  return 0;
}

After compiling, loading, and uploading, you should be able to test the script simply by entering the URL in the browser’s address bar. You could also make it the destination of a normal link in an HTML document. The URL of course depends on how you set things up; the URL for my installed Hello world script is the following:
http://www.cs.tut.fi/cgi-bin/run/~jkorpela/hellow.cgi



Using C for CGI Programming

From Issue #132, April 2005

Mar 01, 2005, by Clay Dowling, in Software

You can speed up complex Web tasks while retaining the simplicity of CGI. With many useful libraries available, the jump from a scripting language to C isn't as big as you might think.

Perl, Python and PHP are the holy trinity of CGI application programming. Stores have shelves full of books about these languages, they're covered well in the computer press and there's plenty on the Internet about them. A distinct lack of information exists, however, on using C to write CGI applications. In this article, I show how to use C for CGI programming and lay out some situations in which it provides significant advantages.

I use C in my applications for three reasons: speed, features and stability. Although conventional wisdom says otherwise, my own benchmarks have found that C and PHP are equivalent in speed when the processing to be done is simple. When there is any complexity to the processing, C wins hands-down.

In addition, C provides an excellent feature set. The language itself comes with a bare-bones set of features, but a staggering number of libraries are available for nearly any job for which a computer is used. Perl, of course, is no slouch in this area, and I don't contend that C offers more extensibility, but both can fill nearly any bill.

Furthermore, CGI programs written in C are stable. Because the program is compiled, it is not as susceptible to changes in the operating environment as PHP is. Also, because the language is stable, it does not experience the dramatic changes to which PHP users have been subjected over the past few years.

The Application

My application is a simple event listing suitable for a business to list upcoming events, say, the meeting schedule for a day or the events at a church. It provides an administrative interface intended to be password-protected and a public interface that lists all upcoming events (but only upcoming events). This application also provides for runtime configuration and interface independence.

I use a database, rather than write my own data store, and a configuration file contains the database connection information. A collection of files is used to provide interface/code separation.

The administrative interface allows events to be listed, edited, saved and deleted. Listing events is the default action if no other action is provided. Both new and existing events can be saved. The interface consists of a grid screen that displays the list of events and a detail screen that contains the full record of a single event.
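As a rough sketch of how the "listing is the default" rule could be expressed in C (this is not the article's actual source): the form field name "action" and its values "edit", "save" and "delete" are assumptions made for illustration, as are the enum and the function name choose_action.

```c
#include <string.h>

/* The four things the administrative interface can do. */
enum admin_action { ACTION_LIST, ACTION_EDIT, ACTION_SAVE, ACTION_DELETE };

/* Maps the value of a hypothetical "action" form field to an action.
   A missing or unrecognized value falls back to listing events. */
enum admin_action choose_action(const char *name)
{
    if (name == NULL)
        return ACTION_LIST;
    if (strcmp(name, "edit") == 0)
        return ACTION_EDIT;
    if (strcmp(name, "save") == 0)
        return ACTION_SAVE;
    if (strcmp(name, "delete") == 0)
        return ACTION_DELETE;
    return ACTION_LIST;
}
```

Falling back to the list for unknown values means a mistyped or tampered request shows the harmless grid screen rather than modifying data.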

The database schema for this application consists of a single table, defined in Listing 1. This schema is MySQL-specific, but an equivalent schema can be created for any database engine.

Listing 1. MySQL Schema

CREATE TABLE event (
  event_no int(11) NOT NULL auto_increment,
  event_begin date NOT NULL default '0000-00-00',
  name varchar(80) NOT NULL default '',
  location varchar(80) NOT NULL default '',
  begin_hour varchar(10) default NULL,
  end_hour varchar(10) default NULL,
  event_end date NOT NULL default '0000-00-00',
  PRIMARY KEY (event_no),
  KEY event_date (event_begin)
)

The following functions are the minimum necessary to implement the functionality of the administrative interface: list_events(), show_event(), save_event() and delete_event(). I also am going to abstract the reading and writing of database data into their own group of functions. This keeps each function simpler, which makes debugging easier. The functions that I need for the data-storage interface are event_create(), event_destroy(), event_read(), event_write() and event_delete(). To make my life easier, I'm also going to add event_fetch_range(), so I can choose a range of events, something I need to do in at least two places.

Next, I need to abstract my records to C structures and abstract database result sets to linked lists. Abstraction lets me change database engines or data representation with relatively little expense, because only a little part of my code deals directly with the data store.
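The abstraction might look roughly like the sketch below. The field names follow the schema in Listing 1, but everything else is an assumption: the article names event_create() and event_destroy() without showing their signatures, so the signatures here are guesses, and event_list_push() and event_list_length() are hypothetical helpers added for illustration.

```c
#include <stdlib.h>

/* One row of the event table. Buffer sizes mirror the schema
   (varchar(80), varchar(10)); dates are kept as "YYYY-MM-DD" strings. */
struct event {
    int  event_no;
    char event_begin[11];
    char event_end[11];
    char name[81];
    char location[81];
    char begin_hour[11];
    char end_hour[11];
    struct event *next;   /* next record in a result set, or NULL */
};

/* Allocates a zero-filled record; returns NULL if memory runs out. */
struct event *event_create(void)
{
    return calloc(1, sizeof(struct event));
}

/* Frees a record together with every record linked after it. */
void event_destroy(struct event *list)
{
    while (list != NULL) {
        struct event *next = list->next;
        free(list);
        list = next;
    }
}

/* Hypothetical helper: puts e at the front of list, returns the new head. */
struct event *event_list_push(struct event *list, struct event *e)
{
    e->next = list;
    return e;
}

/* Hypothetical helper: counts the records in a result set. */
size_t event_list_length(const struct event *list)
{
    size_t count = 0;
    for (; list != NULL; list = list->next)
        count++;
    return count;
}
```

Because the rest of the program only sees struct event and these functions, the code that talks to the database can change without touching the screens that display events.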

There isn't room here to print all of my source code. Complete source code and my Makefile can be downloaded from my Web site (see the on-line Resources).

Tools

The first hurdle to overcome when using C is acquiring the set of tools you need. At bare minimum, you need a CGI parser to break out the CGI information for you. Chances are good that you're also looking for some database connectivity. A little bit of logic/interface independence is good too, so you aren't rewriting code every time the site needs a makeover.

For CGI parsing, I recommend the cgic library from Thomas Boutell (see Resources). It's shockingly easy to use and provides access to all parts of the CGI interface. If you're a C++ person, the cgicc libraries also are suitable (see Resources), although I found the Boutell library to be easier to use.


MySQL is pretty much the standard for UNIX Web development, so I stick with it for my sample application. Every significant database engine has a functional C interface library, though, so you can use whatever database you like.

I'm going to provide my own interface-independence routines, but you could use libxml and libxslt to do the same thing with a good deal more sophistication.

CGI Programming in Java


These examples cover using Java for both the client and the server side of the CGI process. The client-side part covers using GET and POST from applets to talk to CGI programs (regardless of what language the CGI programs are written in). The server-side part covers implementing CGI programs in Java that handle GET and POST (regardless of whether the client uses HTML forms or applets), and also includes a URL decoder and CGI form parser in Java (and a similar parser for cookie values). The examples are extracted from Chapters 17 and 18 of Core Web Programming from Prentice Hall. 1996-99 Marty Hall.

If you are new to CGI, I very strongly recommend that you consider Java servlets and JavaServer Pages (JSP) instead of "regular" CGI. Please see my tutorial on Java servlets and JSP 1.0 for more detail.

For more info on CGI, see the CGI book shelf.

CGI Client-Side Examples

These examples are taken from chapter 17. The chapter also includes complete coverage of all the HTML FORM elements, ISINDEX, ISMAP, and methods for sending GET and POST data from Java applets. Chapter 16 covers HTTP, cookies, and public-key cryptography. Chapters 19 and 20 cover JavaScript, with specific examples on using JavaScript to validate FORM values before they are submitted, processing cookies on the client instead of on the server, calling Java from JavaScript, and calling JavaScript from Java. See the source code archive for example code.

Java Source: SearchYahoo.java. Requires SearchService.java.
Description: A simple example of an applet as a CGI client that works like an HTML form, sending GET data and having the browser display the results. This one talks to the Yahoo! search engine.
On-Line Example: SearchYahoo.html

Java Source: SearchExcite.java. Requires SearchService.java.
Description: Another example of an applet as a CGI client. This one talks to the Excite search engine, demonstrating one advantage of using Java instead of forms: you can more easily reuse code in other applications (this one extends the same class that the Yahoo applet did).
On-Line Example: SearchExcite.html

Java Source: ShowFile.java
Description: An applet that sends data via GET and then reads the results itself (rather than having the browser display results).
On-Line Example: ShowFile.html

Java Source: Weather.java. The client-side (applet) requires CityChooser.java and WeatherPanel.java. The server-side is answered by the WeatherInfo script, which then invokes WeatherInfo.java.
Description: An applet that sends data via POST and then reads the result.
On-Line Example: Weather.html

CGI Server-Side Examples


These examples come from chapter 18. The chapter also includes complete coverage of all the CGI environment variables, a URL decoder, CGI parser, and Cookie parser in Java, server-side Java and the Servlet API, and a quick overview of non-CGI alternatives such as NSAPI, ISAPI, LiveWire, and JDBC. JDBC is covered in depth in chapter 15. Servlets are a particularly good alternative for people who are installing/customizing their own server, or whose employer's or ISP's server already supports them. You should very seriously consider using them instead of "standard" CGI with Java if you fall in this category. Please see my tutorial on Java servlets and JSP 1.0 for more detail.

Shell Script Interface (Unix Specific): CgiHello
Java Source (Portable): N/A
Description: An extremely simple CGI Script that outputs "Hello, World".
On-Line Example: CgiHello

Shell Script Interface (Unix Specific): ShowData
Java Source (Portable): N/A
Description: A simple CGI script that shows any attached data.
On-Line Example: ShowData

Shell Script Interface (Unix Specific): CgiGet
Java Source (Portable): CgiGet.java. Requires CgiShow.java. Note that some browsers will try to interpret the HTML strings in the print statements and the result may be formatted strangely when viewed in the browser. But you can save the file to disk to edit it normally.
Description: A simple script that passes the query data from the QUERY_STRING variable to a Java program that builds a page showing the data supplied.
On-Line Example: CgiGet

Shell Script Interface (Unix Specific): CgiCommandLine
Java Source (Portable): CgiCommandLine.java
Description: A simple script that passes the command-line data to a Java program that builds a page showing the data supplied. Arguments are separated by plus signs ("+") and cannot contain an equals sign ("=").
On-Line Example: CgiCommandLine

Shell Script Interface (Unix Specific): IsIndex
Java Source (Portable): IsIndex.java
Description: A script that passes the data to a Java program that builds an HTML document that uses ISINDEX. Data-entry page or results page is built depending on whether any data is supplied.
On-Line Example: IsIndex

Shell Script Interface (Unix Specific): CgiPost
Java Source (Portable): CgiPost.java
Description: A script that invokes a Java program without passing any data to it. The program reads data from standard input.
On-Line Example: CgiPost.html

Shell Script Interface (Unix Specific): ShowParse
Java Source (Portable): ShowParse.java. Requires QueryStringParser.java, CgiParser.java, LookupTable.java, URLDecoder.java, and StringVector.java.
Description: A script that passes data to a Java program that separates the various attached variables, URL decodes their values, and produces a table of the results. Can accept GET or POST.
On-Line Example: ShowParse version using GET. See below for a version that sends POST data.

Shell Script Interface (Unix Specific): ShowParse
Java Source (Portable): ShowParse.java. Requires QueryStringParser.java, CgiParser.java, LookupTable.java, URLDecoder.java, and StringVector.java.
Description: A script that passes data to a Java program that separates the various attached variables, URL decodes their values, and produces a table of the results. Can accept GET or POST.
On-Line Example: AdBuilder.html; sends POST data to ShowParse. See above for a version that sends GET data.

Shell Script Interface (Unix Specific): CssTest
Java Source (Portable): CssTest.java. Requires CssChoices.java, CookieParser.java, CgiParser.java, LookupTable.java, and URLDecoder.java.
Description: A script that passes data to a Java program that builds a page to let you test out cascading style sheet properties. Properties selected are stored as cookies, which are parsed and used as defaults in later sessions.
On-Line Example: CssTest

Java is a trademark of Sun Microsystems. The original of this document can be found at http://www.apl.jhu.edu/~hall/java/CGI-with-Java. 1996-1998 Marty Hall.


Web-Database Programming: CGI and Java Servlets

NOTE: This document assumes a basic knowledge of HTML. We will not be providing documentation for HTML coding apart from the creation of forms. There are dozens of tutorials available online. You might check out the NCSA Beginner's Guide to HTML.

Overview
Retrieving Input from the User
  o Forms
  o Server-Side Input Handling - CGI
  o Server-Side Input Handling - Java
Returning Output to the User
  o CGI Output
  o Java Output
Sample Code and Coding Tips
  o CGI Sample Code
  o CGI Setup
  o CGI Debugging
  o Java Sample Code
  o Java Compilation in Unix
  o Servlet Setup
  o Handling Special Characters

Overview

CGI or Common Gateway Interface is a means for providing server-side services over the web by dynamically producing HTML documents, other kinds of documents, or performing other computations in response to communication from the user. In this assignment, students who want to interface with the Oracle database using Oracle's Pro*C precompiled language will be using CGI.


Java Servlets are the Java solution for providing web-based services. They provide a very similar interface for interacting with client queries and providing server responses. As such, discussion of much of the input and output in terms of HTML will overlap. Students who plan to interface with Oracle using JDBC will be working with Java Servlets.

Both CGI and Java Servlets interact with the user through HTML forms. CGI programs reside in a special directory, or in our case, a special computer on the network (cgi-courses.stanford.edu), and provide service through a regular web server. Java Servlets are separate network objects altogether, and you'll have to run a special Servlet program on a specific port on a Unix machine.

Retrieving Input from the User

Input to CGI and Servlet programs is passed to the program using web forms. Forms include text fields, radio buttons, check boxes, popup boxes, scroll tables, and the like.

Thus retrieving input is a two-step process: you must create an HTML document that provides forms to allow users to pass information to the server, and your CGI or Servlet program must have a means for parsing the input data and determining the action to take. This mechanism is provided for you in Java Servlets. For CGI, you can either code it yourself, find libraries on the internet that handle CGI input, or use the following example code that we put together for you: cgiparse.c.

Forms

Forms are designated within an HTML document by the fill-out form tag:

<FORM METHOD = "POST" ACTION = "http://form.url.com/cgi-bin/cgiprogram">
... Contents of the form ...
</FORM>

The URL given after ACTION is the URL of the CGI program (your program). The METHOD is the means of transferring data from the form to the CGI program. In this example, we have used the "POST" method, which is the recommended method. There is another method called "GET", but there are common problems associated with this method. Both will be discussed in the next section.


Within the form you may have anything except another form. The tags used to create user interface objects are INPUT, SELECT, and TEXTAREA.

The INPUT tag specifies a simple input interface:

<INPUT TYPE="text" NAME="thisinput" VALUE="default" SIZE=10 MAXLENGTH=20>

<INPUT TYPE="checkbox" NAME="thisbox" VALUE="on" CHECKED>

<INPUT TYPE="radio" NAME="radio1" VALUE="1">

<INPUT TYPE="submit" VALUE="done">

<INPUT TYPE="radio" NAME="radio1" VALUE="2" CHECKED>

<INPUT TYPE="hidden" NAME="notvisible" VALUE="5">

Which would produce the following form:

The different attributes are mostly self-explanatory. The TYPE is the variety of input object that you are presenting. Valid types include "text", "password", "checkbox", "radio", "submit", "reset", and "hidden". Every input but "submit" and "reset" has a NAME which will be associated with the value returned in the input to the CGI program. This will not be visible to the user (unless they read the HTML source). The other fields will be explained with the types:

"text" - refers to a simple text entry field. The VALUE refers to the default text within the text field, the SIZE represents the visual length of the field, and the MAXLENGTH indicates the maximum number of characters the text field will allow. There are defaults to all of these (nothing, 20, unlimited).

"password" - the same as a normal text entry field, but characters entered are obscured.

"checkbox" - refers to a toggle button that is independently either on or off. The VALUE refers to the string sent to the CGI server when the button is checked (unchecked boxes are disregarded). The default value is "on".

"radio" - refers to a toggle button that may be grouped with other toggle buttons such that only one in the group can be on. It's essentially the same as the checkbox, but any radio button with the same NAME attribute will be grouped with this one.

"submit" and "reset" - these are the pushbuttons on the bottom of most forms you'll see that submit the form or clear it. These are not required to have a NAME, and the VALUE refers to the label on the button. The default labels are "Submit Query" and "Reset" respectively.

"hidden" - this input is invisible as far as the user interface is concerned (though don't be fooled into thinking this is some kind of security feature -- it's easy to find "hidden" fields by perusing a document source or examining the URL for a GET method). It simply creates an attribute/value binding without need for user action that gets passed transparently along when the form is submitted.

The second type of interface is the SELECT interface, which includes popup menus and scrolling tables. Here are examples of both:

<SELECT NAME="menu">
  <OPTION>option 1
  <OPTION>option 2
  <OPTION>option 3
  <OPTION SELECTED>option 4
  <OPTION>option 5
  <OPTION>option 6
  <OPTION>option 7
</SELECT>

<SELECT NAME="scroller" MULTIPLE SIZE=7>
  <OPTION SELECTED>option 1
  <OPTION SELECTED>option 2
  <OPTION>option 3
  <OPTION>option 4
  <OPTION>option 5
  <OPTION>option 6
  <OPTION>option 7
</SELECT>

Which will give us:

The SIZE attribute determines whether it is a menu or a scrolled list. If it is 1 or it is absent, the default is a popup menu. If it is greater than 1, then you will see a scrolled list with SIZE elements. The MULTIPLE option, which forces the select to be a scrolled list, signifies that more than one value may be selected (by default only one value can be selected in a scrolled list).


OPTION is more or less self-explanatory -- it gives the names and values of each field in the menu or scrolled table, and you can specify which are SELECTED by default.

The final type of interface is the TEXTAREA interface:

<TEXTAREA NAME="area" ROWS=5 COLS=30>
Mary had a little lamb.
A little lamb?
A little lamb!
Mary had a little lamb.
Its fleece was white as snow.
</TEXTAREA>

As usual, the NAME is the symbolic reference to which the input will be bound when submitted to the CGI program. The ROWS and COLS values are the visible size of the field. Any number of characters can be entered into a text area.

The default text of the text area is entered between the tags. Whitespace is supposedly respected (as between <PRE> HTML tags), including the newline after the first tag and before the last tag.

Server-Side Input Handling -- CGI

The form contents will be assembled into an encoded query string. Using the GET method, this string is available in the environment variable QUERY_STRING. It is actually passed to the program through the URL -- examine the URL for the first of the forms above:

http://asdf.asdf.asdf/asdf?thisinput=default&thisbox=on&radio1=2

Everything after the '?' is the query string. You'll see that a number of expressions appear concatenated by & symbols -- each expression assigns a string value to each form object. In this case, the text field named "thisinput" has the value "default", which is what was typed into the field, the checkbox "thisbox" has the value "on", and the radio button group "radio1" has the value "2" (the second button is checked -- note that this is the value I gave it, not a default value. The default is "on").

Let's look at another example from the second form:


http://zxcv.zxcv.zxcv/zxcv?menu=option+4&scroller=option+1&scroller=option+2

The menu has option 4 selected, and the scroller has option 1 and option 2 selected. Note that spaces are converted to '+' symbols in the URL string. The character '+' is converted to its hex value %2B. Other characters similarly converted are & (to %26), % (to %25), and $ (to %24). This conversion is automatic.
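The conversion is automatic on the browser side, but the CGI program has to reverse it before using the values. A minimal decoding sketch follows; the function name url_decode is an assumption (it is not taken from cgiparse.c), and a '%' that is not followed by two hexadecimal digits is simply copied through unchanged.

```c
#include <ctype.h>
#include <stdlib.h>

/* Decodes a URL-encoded string from src into dst: '+' becomes a space
   and %XX becomes the character with hexadecimal code XX.
   dst must have room for at least strlen(src) + 1 bytes. */
void url_decode(const char *src, char *dst)
{
    while (*src != '\0') {
        if (*src == '+') {
            *dst++ = ' ';
            src++;
        } else if (src[0] == '%' && isxdigit((unsigned char)src[1])
                                 && isxdigit((unsigned char)src[2])) {
            char hex[3] = { src[1], src[2], '\0' };
            *dst++ = (char)strtol(hex, NULL, 16);
            src += 3;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

Decoding should happen after the string has been split at '&' and '=', since a decoded value may itself contain those characters.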

Using GET is not recommended, however. Some systems will truncate the URL before passing it to the CGI program, and thus the QUERY_STRING environment variable will contain only a prefix of the actual query string. Instead, you should use the POST method.

The POST query string is encoded in precisely the same form as the GET query string, but instead of being passed in the URL and read into the QUERY_STRING variable, it is given to the CGI program as standard input, which you can thus read using ANSI functions or regular character reading functions. The only quirk is that the server will not send EOF at the end of the data. Instead, the size of the string is passed in the environment variable CONTENT_LENGTH, which can be accessed using the normal stdlib.h function:

char *value;
int length = 0;

value = getenv("CONTENT_LENGTH");  /* NULL if the variable is not set */
if (value != NULL)
    sscanf(value, "%d", &length);

Decoding the data is thus just a question of walking through the input and picking out the values. These values can then be used to determine what the user wants to see.
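Because no EOF arrives, the safest approach is to read exactly CONTENT_LENGTH bytes and no more. The following is one way to sketch that, under stated assumptions: the function name read_form_data and the MAX_FORM_DATA limit are made up for illustration, and the limit of 65536 bytes is an arbitrary choice.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_FORM_DATA 65536L   /* arbitrary upper bound, an assumption */

/* Reads the POST data from in, given the text of CONTENT_LENGTH.
   Returns a malloc'ed, NUL-terminated string that the caller must
   free, or NULL if the length is missing, negative, or too large. */
char *read_form_data(FILE *in, const char *content_length)
{
    long length;
    size_t got;
    char *buffer;

    if (content_length == NULL)
        return NULL;
    length = strtol(content_length, NULL, 10);
    if (length < 0 || length > MAX_FORM_DATA)
        return NULL;
    buffer = malloc((size_t)length + 1);
    if (buffer == NULL)
        return NULL;
    /* Read at most length bytes; never wait for an end of file. */
    got = fread(buffer, 1, (size_t)length, in);
    buffer[got] = '\0';
    return buffer;
}
```

In a CGI program the call would be read_form_data(stdin, getenv("CONTENT_LENGTH")). Capping the length guards against a client that announces an enormous body.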

We have written a very simple, linear-search-based mechanism for parsing the input string. The routines are located, as mentioned above, in cgiparse.c. You might want to cut and paste them into your own code or use the .h file provided. To use them in your CGI programs, call Initialize() at the beginning of your code, and then call GetFirstValue(key) and GetNextValue(key) to retrieve the bindings for each of the FORM parameters. See the comments in the file for more details.
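To give an idea of what a linear search over the input string involves, here is a small independent sketch. It is not the code in cgiparse.c and does not reproduce its interface; find_value is a hypothetical helper that returns only the first binding for a key, with the value still in its encoded form.

```c
#include <stddef.h>
#include <string.h>

/*
 * Linear search through "name=value&name=value..." for the first pair
 * whose name equals `key`. The still-encoded value is copied into `out`
 * (truncated to fit) and 1 is returned; 0 means the key is absent.
 * find_value is a hypothetical helper, not the cgiparse.c interface.
 */
int find_value(const char *query, const char *key, char *out, size_t out_size)
{
    size_t key_len = strlen(key);
    const char *pair = query;

    if (out_size == 0)
        return 0;

    while (*pair != '\0') {
        size_t pair_len = strcspn(pair, "&");   /* length of this pair */

        if (pair_len > key_len && strncmp(pair, key, key_len) == 0
                               && pair[key_len] == '=') {
            size_t value_len = pair_len - key_len - 1;

            if (value_len >= out_size)
                value_len = out_size - 1;
            memcpy(out, pair + key_len + 1, value_len);
            out[value_len] = '\0';
            return 1;
        }
        pair += pair_len;
        if (*pair == '&')
            pair++;                             /* skip the separator */
    }
    return 0;
}
```

The same function works for GET and POST, since both deliver the data in the same encoding: pass it either the QUERY_STRING value or the text read from standard input.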

Server-Side Input Handling -- Java

Java handles GET and POST slightly differently. The parsing of the input is done for you by Java, so you are separated from the actual format of the input data completely. Your program will be an object subclassed from HttpServlet, the generalized Java Servlet class for handling web services.

Servlet programs must override the doGet() or doPost() methods, which are executed in response to a client request. These methods take two arguments, HttpServletRequest request and HttpServletResponse response. Let's take a look at a very simple servlet program, the traditional HelloWorld (this time with a doGet method):

import java.io.*;
import java.text.*;
import java.util.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class Hello extends HttpServlet {
    public void doGet(HttpServletRequest request, HttpServletResponse response)
            throws IOException, ServletException {
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        out.println("<html>");
        out.println("<head>");
        String title = "Hello World";
        out.println("<title>" + title + "</title>");
        out.println("</head>");
        out.println("<body bgcolor=white>");
        out.println("<h1>" + title + "</h1>");
        String param = request.getParameter("param");
        if (param != null)
            out.println("Thanks for the lovely param='" + param + "' binding.");

        out.println("</body>");
        out.println("</html>");
    }
}

We'll discuss points in this code again in the section on Java Output, but for now, we will focus on the input side. The argument HttpServletRequest request represents the client request, and the values of the parameters passed from the HTML FORM can be retrieved by calling the HttpServletRequest getParameter method. This method takes as its argument the name of the parameter (the name of the HTML INPUT object), and returns as a Java String the value assigned to the parameter. In cases where the parameter may have multiple bindings, the method getParameterValues can be used to retrieve the values in an array of Java Strings -- note that getParameter will return the first value of this array. It is through these mechanisms that you can retrieve any of the values entered or implicit in the form.

As might be inferred from the example above, Java returns null if the parameter whose name you request does not have a value. Recall that the bindings of unchecked buttons are not passed in a POST message -- you can check for null to determine when buttons are off.

Returning Output to the User

In your project, you are going to be concerned with returning HTML documents to the user. The documents will be dynamically created based on the output of the query. You can format them however you like, using ordinary HTML formatting routines.

CGI Output

The only work you have to do apart from constructing an HTML document on the fly with the output from the query is to add a short header at the top of the file. Your header will represent the MIME type for HTML, and consists of a single line of text followed by a blank line:

content-type: text/html

<HTML> ... file ... </HTML>

There are, of course, many other types that you can return, but this is all you'll need to return your database queries.

CGI returns the HTML document to the user through standard output from the program, so you can just use a regular printf function in your C programs. The format for setting the content type is just:

printf("content-type: text/html\n\n");
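Putting the header and the document together, a complete response could be produced along the following lines. This is only a sketch: write_html_page is a name invented for the example, the output stream is a parameter so that the function can be exercised outside a Web server, and the title and body text are assumed to be HTML-safe already.

```c
#include <stdio.h>

/*
 * Write a complete CGI response to `out` (normally stdout): the
 * content-type header, the blank line that ends the header, and
 * then a small HTML document. write_html_page is an illustrative
 * name, and `title` and `body` are assumed to be HTML-safe already.
 */
void write_html_page(FILE *out, const char *title, const char *body)
{
    fprintf(out, "content-type: text/html\n\n");
    fprintf(out, "<HTML>\n<HEAD><TITLE>%s</TITLE></HEAD>\n", title);
    fprintf(out, "<BODY>\n<H1>%s</H1>\n%s\n</BODY>\n</HTML>\n", title, body);
}
```

In a CGI program the call would be write_html_page(stdout, ...). The header must be the very first thing written; anything printed before it ends up in the header area and usually causes a server error.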

Java Output

Let's look back at our Java code example. You'll see a number of differences between the Servlet code and the CGI approach. Output is all handled by the HttpServletResponse object, which allows you to set the content type through the setContentType method. Instead of printing the HTTP header yourself, you tell the HttpServletResponse object that you want the content type to be "text/html" explicitly.

All HTML is returned to the user through a PrintWriter object, which is retrieved from the response object using the getWriter method. HTML code is then returned line by line using the println method.

We assume that you all have a basic background in Java, so we won't provide a detailed treatment of exceptions here, but do note that IOException and ServletException must both be either handled or thrown.

Sample Code and Coding Tips

I recommend that everyone attempt to play around a little bit with both of the methods, Java Servlet and CGI, if you have the time and inclination (though you only have to implement your database interface in one of them, of course).

CGI Sample Code

Here is a demonstration of a PRO*C CGI program. You can also check out the source code. The HTML page demonstrates a few input features, though the only ones that do anything are the username and password fields. These are used to log onto your Oracle account when the CGI program is executed, create a table, do some insertions, demonstrate the production of HTML formatting through queries on the data (including a demonstration of constructing a new form, which may provide some of you with ideas of how to make a really cool interface), and then drop the table from your database. You may freely cannibalize whatever portions you find useful.

CGI Setup

Your CGI script will be run from cgi-courses.stanford.edu. The URL for your CGI executable will be: http://cgi-courses.stanford.edu/~username/cgi-bin/scriptname

You will need to perform the following actions before a CGI program will run:

Get an account on cgi-courses.stanford.edu.

Create a directory in your home folder to hold your CGI binary executables:

mkdir cgi-bin

Set access levels on your cgi-bin directory for your cgi-courses.stanford.edu account ("username" should be replaced with your username):

fs setacl cgi-bin username.cgi write

Make sure that your executables correctly set all environment variables (normally this is done by /usr/class/cs145/all.env, but this is not available on the cgi machine, so you have to do it explicitly). Here is an example function that you should run before attempting to connect to the database (this function is in C, but you can pretty much just lift the settings and paste them into Perl or PHP, which also need them to connect to the database):

#include <stdlib.h>   /* putenv */

void SetEnvs(void) {
    putenv("ORACLE_SID=SHR1_PRD");
    putenv("ORACLE_HOME=/usr/pubsw/apps/oracle/8.1.7");
    putenv("ORACLE_TERM=xsun5");
    putenv("TNS_ADMIN=/usr/class/cs145/sqlnet");
    putenv("TWO_TASK=SHR1_PRD");
}

Move the executable into your cgi-bin folder.

Change the permissions on your cgi executable:

chmod 701 scriptname

Use HTML forms to access your new program at http://cgi-courses.stanford.edu/~username/cgi-bin/scriptname.

Here is the homepage of the leland CGI service, which has a FAQ and gives some information about the capabilities of the system. Please check here first if your CGI programs are giving you errors.

CGI Debugging

Due to popular demand, a new CGI debugging feature was just added to the CGI service. It's not in the leland CGI docs yet. To use it, access your script like so:

http://cgi-courses/cgi-bin/sboxd/~username/scriptname

The script will execute with extra debug info:

All STDERR goes to the browser

A header is included, so lack of any output or lack of Content Type will not cause Internal Server Error.

If you still receive an Internal Server Error, consult the cgi FAQ or look in the server log: http://cgi-courses/logs/error_log.

Note that the log shows only a few recent entries, due to system issues.

An alternative method is to run your CGI program from the command line, without using the web browser. Put your CGI input into the environment variable QUERY_STRING and run your program. For example (assuming your program is called cgiprog and expects two parameters name1 and name2):

cd ~/cgi-bin
setenv QUERY_STRING 'name1=abc&name2=def'
cgiprog

Note: If you want to use debugging tools such as dbx or gdb, you need to modify the Makefile to add the flag -g after cc or g++.

Java Sample Code

You can provide your HTML FORMs on permanent webpages in your personal WWW directory -- though this isn't recommended because you then have to hard code the Servlet addresses -- or in the webpages subdirectory where you run your Servlet (see below in the Servlet setup section). Alternatively (or additionally) you can integrate FORMs into Servlets by creating a FORM on the fly in your Servlet program, which will be invoked when doPost() or doGet() are invoked by the client. An example of a program that creates a FORM on the fly can be found at RequestParamExample.java.

An example that uses JDBC to implement an interface for querying information about a certain US state (based on the JDBC example programs provided in PDA assignment 5) can be found at StateQuerier.java.

The following two examples implement the state query, but separate the query form from the answer form, providing these services with two different Servlets: StateQueryForm.java and StateQueryAns.java.

You can find the very simple example given above in the text at Hello.java.

One last example, which can be found at HelloSession.java, demonstrates the concept of a Session. We do not cover Sessions in this handout, but you can use them to liven up your interface.

Java Compilation in Unix

Compiling Servlets in UNIX requires a few changes to your PATH and CLASSPATH environment variables. These changes have been made for you in the source file /afs/ir/class/cs145/all.env. They include the following additions:

setenv PATH /afs/ir/class/cs145/jsdk2.1:/usr/pubsw/apps/jdk1.2/bin:${PATH}
setenv CLASSPATH /afs/ir/class/cs145/jsdk2.1/servlet.jar:$CLASSPATH

If there are any difficulties, let us know. These have been tested on the elaine machines and are assumed to be operational on the leland Sparc machines (elaine, myth, epic, saga).

You also have to set up a specific directory structure to provide Servlets. The directory structure required by Servlets is essentially:

[anydir]
    [servletdir]
        webpages
            WEB-INF
                servlets

A shell script to build this hierarchy is provided at /afs/ir/class/cs145/code/bin/buildServletDirectory. After you run source /afs/ir/class/cs145/all.env (which you should probably just add to your .cshrc file), you can run buildServletDirectory by just typing the command.

You can store .html documents in your webpages directory, and they will be accessible at your Servlet address (see below), while all Servlets you write have to be located in the servlets directory to be recognized.

Further information on the Java Servlet API can be found on the Servlet Package Documentation page.

Servlet Setup

The directory structure for your servlets and HTML documents was outlined in the previous section. Static HTML documents may be placed in the webpages directory and are accessible from the web at the address http://machinexx:portnum/page.html, where machinexx refers to the machine from which you're running the webserver (e.g. elaine12, saga22, myth7, etc.), portnum is a specific port (see below), and page.html is the name of the HTML page that you are serving. You may find it useful to create a static HTML document or a hierarchy of static documents to serve as the jumping off point for your Servlets, where your HTML FORMs that start the interaction with the database are found.

Servlets will be found in the directory servletdir/webpages/WEB-INF/servlets, and will just be the .class files that you compile from your .java files using javac. These may be reached on the web using the URL http://machinexx:portnum/servlet/servletname. Note that the servlet directory is singular in the URL but plural in Unix, while the Servlet itself loses its .class in the URL. HTML and other documents contained in the servlets directory cannot be accessed over the web.

Once you have your directory set up and your Servlets compiled, you have to run the Java JSDK 2.1 webserver manually on a specific leland machine in order to provide these documents over the web. The steps involved in starting the server are as follows:

Choose a port number in the range 5000-65000. This will bind your server application to that port for the machine on which you're running your server. Try to choose a random number and remember it -- you will be the only person on that machine who can use that port, and you will need it to have access over the web.

From the root of your servlet directory (if you run our buildServletDirectory script, then it will be called servletdir), start the server by calling startserver -port portnum from the Unix command line, where portnum is the port number you chose above. The server will begin in the background, and you can see it using the ps command. If you do not enter a port number, the default port number, 8080, will be chosen for you (you can actually set the default yourself -- after you've run the server once, it will create a configuration file called "default.cfg" for you -- it finds the default port number here).

From your browser, enter the URL of a webpage or servlet contained in your servletdir hierarchy using the address structure mentioned above. Now you can play with your interface.

If you would like to stop the server, issue the command stopserver. If you want to recompile your servlets, you have to stop the server and restart it. Static HTML pages that you are hosting from the webpages directory, however, can be changed at will.

Handling Special Characters

The special characters &, <, and > need to be escaped as &amp;, &lt;, and &gt;, respectively, in HTML text (see NCSA Beginner's Guide to HTML). Moreover, special characters appearing in URL's need to be escaped differently than when they appear in HTML text. For example, if your links carry text with special characters as parameter values in extended URLs, you need to escape them: convert space to + or %20, convert & to %26, convert = to %3D, convert % to %25, etc. (In general, any special character can be escaped by a percent sign followed by the character's hexadecimal ASCII value.) Important: Do NOT escape the & that actually separates parameters! For example, if you want two parameters p1 and p2 to have the values 3 and M&M, you should write something like:

http://cgi-courses.stanford.edu/~username/cgi-bin/cgiprog?p1=3&p2=M%26M

Be careful not to confuse the escape strings for HTML text with those for URL's.
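To make the difference concrete, the sketch below shows one possible pair of C helpers, one for each kind of escaping. The names html_escape, url_escape, and append are assumptions made for this example, not functions from the course code. For simplicity url_escape escapes every byte that is not a letter or digit, which is more than strictly required but always safe for a parameter value.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Append `piece` to the NUL-terminated string in `out` if it fits in
 * out_size bytes; returns 0 when there is not enough room.
 */
static int append(char *out, size_t out_size, const char *piece)
{
    size_t used = strlen(out);
    size_t need = strlen(piece);

    if (used + need + 1 > out_size)
        return 0;
    memcpy(out + used, piece, need + 1);
    return 1;
}

/* Escape &, < and > for use in HTML text. Returns 0 if `out` is too small. */
int html_escape(const char *in, char *out, size_t out_size)
{
    if (out_size == 0)
        return 0;
    out[0] = '\0';
    for (; *in != '\0'; in++) {
        char plain[2] = { *in, '\0' };
        const char *piece = plain;

        if (*in == '&')      piece = "&amp;";
        else if (*in == '<') piece = "&lt;";
        else if (*in == '>') piece = "&gt;";
        if (!append(out, out_size, piece))
            return 0;
    }
    return 1;
}

/*
 * Escape one parameter value for use in a URL: letters and digits are
 * kept, a space becomes '+', and every other byte becomes %XX.
 * Returns 0 if `out` is too small.
 */
int url_escape(const char *in, char *out, size_t out_size)
{
    static const char hex[] = "0123456789ABCDEF";

    if (out_size == 0)
        return 0;
    out[0] = '\0';
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        char piece[4];

        if (isalnum(c)) {
            piece[0] = (char)c;
            piece[1] = '\0';
        } else if (c == ' ') {
            piece[0] = '+';
            piece[1] = '\0';
        } else {
            piece[0] = '%';
            piece[1] = hex[c >> 4];
            piece[2] = hex[c & 15];
            piece[3] = '\0';
        }
        if (!append(out, out_size, piece))
            return 0;
    }
    return 1;
}
```

Apply url_escape to each parameter value separately and join the results with unescaped & and = characters, so that the separators themselves stay intact.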