CGIscriptor 2.4: An implementation of integrated server side CGI scripts

HYPE

CGIscriptor merges plain ASCII HTML files transparently and safely with CGI variables, in-line PERL code, shell commands, and executable scripts in many languages (on-line and real-time). It combines the "ease of use" of HTML files with the versatility of specialized scripts and PERL programs. It hides all the specifics and idiosyncrasies of correct output and CGI coding and naming. Scripts do not have to be aware of HTML, HTTP, or CGI conventions just as HTML files can be ignorant of scripts and the associated values. CGIscriptor complies with the W3C HTML 4.0 recommendations.

In addition to its use as a WWW embedded CGI processor, it can be used as a command-line document preprocessor (text-filter).

THIS IS HOW IT WORKS

The aim of CGIscriptor is to execute "plain" scripts inside a text file using any required CGI parameters and environment variables. It is optimized to transparently process HTML files inside a WWW server. The native language is Perl, but many other scripting languages can be used.

CGIscriptor reads text files from the requested input file (i.e., from $YOUR_HTML_FILES$PATH_INFO) and writes them to <STDOUT> (i.e., the client requesting the service) preceded by the obligatory "Content-type: text/html\n\n" or "Content-type: text/plain\n\n" string (except for "raw" files which supply their own Content-type message and only if the SERVER_PROTOCOL contains HTTP, FTP, GOPHER, MAIL, or MIME).

When CGIscriptor encounters an embedded script, indicated by an HTML4 tag

<SCRIPT TYPE="text/ssperl" [CGI="$name='default value'"] [SRC="ScriptSource"]>
PERL script
</SCRIPT>
or
<SCRIPT TYPE="text/osshell" [CGI="$name='default value'"] [SRC="ScriptSource"]>
OS Shell script
</SCRIPT>

construct (anything between []-brackets is optional, other MIME-types are supported), the embedded script is removed and both the contents of the source file (i.e., "do 'ScriptSource'") AND the script are evaluated as a PERL program (i.e., by eval()), a shell script (i.e., by a "safe" version of `Command`, qx) or an external interpreter. The output of the eval() function takes the place of the original <SCRIPT></SCRIPT> construct in the output string. Any CGI parameters declared by the CGI attribute are available as simple perl variables, and can subsequently be made available as variables to other scripting languages (e.g., bash, python, or lisp).

Example: printing "Hello World"

<HTML><HEAD><TITLE>Hello World</TITLE>
<BODY>
<H1><SCRIPT TYPE="text/ssperl">"Hello World"</SCRIPT></H1>
</BODY></HTML>

Save this in a file, hello.html, in the directory you indicated with $YOUR_HTML_FILES and access http://your_server/SHTML/hello.html (or whatever name you use as an alias for CGIscriptor.pl). This is really ALL you need to do to get going.

You can use any values that are delivered in CGI-compliant form (i.e., the "?name=value" type URL additions) transparently as "$name" variables in your scripts IFF you have declared them beforehand in a META or SCRIPT tag, e.g.:

<META CONTENT="text/ssperl; CGI='$name = `default value`' 
[SRC='ScriptSource']">
or
<SCRIPT TYPE=text/ssperl CGI="$name = 'default value'" 
[SRC='ScriptSource']>

After such a 'CGI' attribute, you can use $name as an ordinary PERL variable (the ScriptSource file is immediately evaluated with "do 'ScriptSource'"). The CGIscriptor script allows you to write ordinary HTML files which will include dynamic CGI aware (run time) features, such as on-line answers to specific CGI requests, queries, or the results of calculations.

For example, if you wanted to answer questions of clients, you could write a Perl program called "Answer.pl" with a function "AnswerQuestion()" that prints out the answer to requests given as arguments. You then write an HTML page "Respond.html" containing the following fragment:


<CENTER>
The Answer to your question
<META CONTENT="text/ssperl; CGI='$Question'">
<h3><SCRIPT TYPE="text/ssperl">$Question</SCRIPT></h3>
is
<h3><SCRIPT TYPE="text/ssperl" SRC="./PATH/Answer.pl">
AnswerQuestion($Question);
</SCRIPT></h3>
</CENTER>
<FORM ACTION=Respond.html METHOD=GET>
Next question: <INPUT NAME="Question" TYPE=TEXT SIZE=40><br>
<INPUT TYPE=SUBMIT VALUE="Ask">
</FORM>

The output could look like the following (in HTML-speak):


The Answer to your question

What is the capital of the Netherlands?

is

Amsterdam

Next question:

Note that the script "Answer.pl" knows nothing about CGI or HTML; it just prints out answers to arguments. Likewise, the text has no provisions for scripts or CGI-like constructs. Also, it is completely trivial to extend this "program" to use the "Answer" later in the page to call up other information or pictures/sounds. The final text never shows any cue as to what the original "source" looked like, i.e., where you store your scripts and how they are called.

There are some extras. The files called in a SRC= attribute can access the CGI variables declared in the preceding META tag from the @ARGV array. Executable files are called as: `file '$ARGV[0]' ... ` (e.g., `Answer.pl \'$Question\'`;). The files called from SRC can even be (CGIscriptor) html files which are processed in-line. Furthermore, the SRC= attribute can contain a perl block that is evaluated. That is,

<META CONTENT="text/ssperl; CGI='$Question' SRC='{$Question}'">

will result in the evaluation of "print do {$Question};" and the VALUE of $Question will be printed. Note that these "SRC-blocks" can be preceded and followed by other file names, but only a single block is allowed in a SRC= tag.

One of the major hassles of dynamic WWW pages is the fact that several mutually incompatible browsers and platforms must be supported. For example, the way sound is played automatically is different for Netscape and Internet Explorer, and for each browser it is different again on Unix, MacOS, and Windows. Really dangerous is processing user-supplied (form-) values to construct email addresses, file names, or database queries. All Apache WWW-server exploits reported in the media are based on faulty CGI-scripts that didn't check their user-data properly.

There is no panacea for these problems, but a lot of work and problems can be saved by allowing easy and transparent control over which <SCRIPT></SCRIPT> blocks are executed on what CGI-data. CGIscriptor supplies such a method in the form of a pair of attributes: IF='...condition..' and UNLESS='...condition...'. When added to a script tag, the whole block (including the SRC attribute) will be ignored if the condition is false (IF) or true (UNLESS). For example, the following block will NOT be evaluated if the value of the CGI variable FILENAME is NOT a valid filename:

<SCRIPT TYPE='text/ssperl' CGI='$FILENAME' IF='CGIscriptor::CGIsafeFileName($FILENAME)'>
.....
</SCRIPT>

(the function CGIsafeFileName(String) returns an empty string ("") if the String argument is not a valid filename). The UNLESS attribute is the mirror image of IF.

A user manual follows the HTML 4 and security paragraphs below.

HTML 4 COMPLIANCE

In general, CGIscriptor.pl complies with the HTML 4 recommendations of the W3C. This means that any software to manage Web sites will be able to handle CGIscriptor files, as will web agents.

All script code should be placed between <SCRIPT></SCRIPT> tags, the script type is indicated with TYPE="mime-type", the LANGUAGE feature is ignored, and a SRC feature is implemented. All CGI specific features are delegated to the CGI attribute.

However, the behavior deviates from the W3C recommendations at some points. Most notably:

0- The scripts are executed at the server side, invisible to the client (i.e., the browser)
1- The mime-types are personal and idiosyncratic, but can be adapted.
2- Code in the body of a <SCRIPT></SCRIPT> tag-pair is still evaluated when a SRC feature is present.
3- The SRC feature reads a list of files.
4- The files in a SRC feature are processed according to file type.
5- The SRC feature evaluates inline Perl code.
6- Processed META, INS, and DIV tags are removed from the output document.
7- All attributes of the processed META tags, except CONTENT, are ignored (i.e., deleted from the output).
8- META tags can be placed ANYWHERE in the document.
9- Through the SRC feature, META tags can have visible output in the document.
10- The CGI attribute that declares CGI parameters, can be used inside the <SCRIPT> tag.
11- Use of an extended quote set, i.e., '', "", ``, (), {}, [] and their \-slashed combinations: \'\', \"\", \`\`, \(\), \{\}, \[\].
12- IF and UNLESS attributes to <SCRIPT>, <META>, <INS>, and <DIV> tags.
13- <DIV> tags cannot be nested, <DIV> tags are not rendered with new-lines.
14- The XML style <TAG .... /> is recognized and handled correctly. (i.e., no content is processed)

The reasons for these choices are:

You can still write completely HTML4 compliant documents. CGIscriptor will not force you to write "deviant" code. However, it allows you to do so (which is, in fact, just as bad). The prime design principle was to allow users to include plain Perl code. The code itself should be "enhancement free". Therefore, extra features were needed to supply easy access to CGI and Web site components. For security reasons these have to be declared explicitly. The SRC feature transparently manages access to external files, especially the safe use of executable files.

The CGI attribute handles the declarations of external (CGI) variables in the SCRIPT and META tags.
EVERYTHING THE CGI ATTRIBUTE AND THE META TAG DO CAN BE DONE INSIDE A <SCRIPT></SCRIPT> TAG CONSTRUCT.

The reason the IF, UNLESS, and SRC attributes (and their Perl code evaluation) were built into the META and SCRIPT tags is part laziness, part security. The SRC blocks allow more compact documents and easier debugging. The values of the CGI variables can be immediately screened for security by IF or UNLESS conditions, and even SRC attributes (e.g., email addresses and file names), and a few commands can be called without having to add another Perl TAG pair. This is especially important for documents that require the use of other (restricted) "scripting" languages that lack transparent control structures.

SECURITY

Your WWW site is a few keystrokes away from a few hundred million internet users. A fair percentage of these users knows more about your computer than you do. And some of these just might have bad intentions.

To ensure uncompromised operation of your server and platform, several features are incorporated in CGIscriptor.pl to enhance security. First of all, you should check the source of this program. No security measures will help you when you download programs from anonymous sources. If you want to use THIS file, please make sure that it is uncompromised. The best way to do this is to contact the source and try to determine whether s/he is reliable (and accountable).

BE AWARE THAT ANY PROGRAMMER CAN CHANGE THIS PROGRAM IN SUCH A WAY THAT IT WILL SET THE DOORS TO YOUR SYSTEM WIDE OPEN

I would like to ask any user who finds bugs that could compromise security to report them to me (and any other bug too, Email: R.J.J.H.vanSon@gmail.com or ifa@hum.uva.nl).

Security features

1 Invisibility
The inner workings of the HTML source files are completely hidden from the client. Only the HTTP header and the ever changing content of the output distinguish it from the output of a plain, fixed HTML file. Names, structures, and arguments of the "embedded" scripts are invisible to the client. Error output is suppressed except during debugging (user configurable).
2 Separate directory trees
Directories containing Inline text and script files can reside on separate trees, distinct from those of the HTTP server. This means that NEITHER the text files, NOR the script files can be read by clients other than through CGIscriptor.pl, UNLESS they are EXPLICITLY made available.
3 Requests are NEVER "evaluated"
All client supplied values are used as literal values (''-quoted). Client supplied ''-quotes are ALWAYS removed. Therefore, as long as the embedded scripts do NOT themselves evaluate these values, clients CANNOT supply executable commands. Be sure to AVOID scripts like:
<META CONTENT="text/ssperl; CGI='$UserValue'">
<SCRIPT TYPE="text/ssperl">$dir = `ls -1 $UserValue`;</SCRIPT>

These are a recipe for disaster. However, the following quoted form should be safe (but is still not advised):

<SCRIPT TYPE="text/ssperl">$dir = `ls -1 \'$UserValue\'`;</SCRIPT>

A special function, SAFEqx(), will automatically do exactly this, e.g., SAFEqx('ls -1 $UserValue') will execute `ls -1 \'$UserValue\'` with $UserValue interpolated. I recommend using SAFEqx() instead of backticks whenever you can. The OS shell scripts inside

<SCRIPT TYPE="text/osshell">ls -1 $UserValue</SCRIPT>

are handled by SAFEqx and automatically ''-quoted.
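
The quoting idea described above can be sketched in plain Perl. This is only an illustration, not the actual SAFEqx() code: the helper name quote_for_shell() is hypothetical, and it merely shows the two steps named in the text (remove client supplied ''-quotes, then ''-quote the value).

```perl
use strict;
use warnings;

# Hypothetical helper, NOT the actual SAFEqx() implementation:
# remove every single quote from a client supplied value and wrap the
# result in single quotes, so a shell treats it as one literal argument.
sub quote_for_shell {
    my ($value) = @_;
    $value =~ s/'//g;            # client supplied ''-quotes are removed
    return "'" . $value . "'";   # nothing inside ''-quotes is expanded
}

my $UserValue = "docs; rm -rf /";
my $command   = 'ls -1 ' . quote_for_shell($UserValue);
print "$command\n";   # prints: ls -1 'docs; rm -rf /'
```

The command string is only printed here, not executed. The point is that the dangerous "; rm -rf /" part ends up inside the quoted argument instead of becoming a second command.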

4 Logging of requests
All requests can be logged separately from the Host server. The level of detail is user configurable: Including or excluding the actual queries. This allows for the inspection of (im-) proper use.
5 Access control: Clients
The Remote addresses can be checked against a list of authorized (i.e., accepted) or non-authorized (i.e., rejected) clients. Both REMOTE_HOST and REMOTE_ADDR are tested so clients without a proper HOST name can be (in-) excluded by their IP-address. Client patterns containing all numbers and dots are considered IP-addresses, all others domain names. No wild-cards or regexps are allowed, only partial addresses.
Matching of names is done from the back to the front (domain first, i.e., $REMOTE_HOST =~ /\Q$pattern\E$/is), so including ".edu" will accept or reject all clients from the domain EDU. Matching of IP-addresses is done from the front to the back (network first, i.e., $REMOTE_ADDR =~ /^\Q$pattern\E/is), so including "128." will (in-) exclude all clients whose IP-address starts with 128. There are two special symbols: "-" matches HOSTs with no name and "*" matches ALL HOSTS/clients.

For those needing more expressional power, lines starting with "-e" are evaluated by the perl eval() function. E.g., '-e $REMOTE_HOST =~ /\.edu$/is;' will accept/reject clients from the domain '.edu'.
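
The client matching rules above can be sketched as a small stand-alone Perl function. The function name client_matches() and its argument order are hypothetical and not taken from CGIscriptor.pl; the two regular expressions are the ones quoted in the text, and "-e" lines are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the matching rules described above.
sub client_matches {
    my ($pattern, $remote_host, $remote_addr) = @_;
    $remote_host = '' unless defined $remote_host;
    return 1 if $pattern eq '*';                              # ALL clients
    return ($remote_host eq '' ? 1 : 0) if $pattern eq '-';   # no host name
    if ($pattern =~ /^[0-9.]+$/) {
        # Only numbers and dots: an IP-address, matched from the front
        return $remote_addr =~ /^\Q$pattern\E/is ? 1 : 0;
    }
    # Anything else is a domain name, matched from the back
    return $remote_host =~ /\Q$pattern\E$/is ? 1 : 0;
}

print client_matches('.edu', 'www.mit.edu', '18.9.22.69'), "\n";        # 1
print client_matches('128.', '', '128.16.5.31'), "\n";                  # 1
print client_matches('128.', 'host.example.com', '12.8.1.1'), "\n";     # 0
```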

6 Access control: Files
In principle, CGIscriptor could read ANY file in the directory tree as discussed in 2. However, for security reasons this is restricted to text files. It can be made more restricted by entering a global file pattern (e.g., ".html"). This is done by default. For each client requesting access, the file pattern(s) can be made more restrictive than the global pattern by entering client specific file patterns in the Access Control files (see 5). For example: if the ACCEPT file contained the lines
*           DEMO
.hum.uva.nl LET 
145.18.230.

Then all clients could request paths containing "DEMO" or "demo", e.g. "/my/demo/file.html" ($PATH_INFO =~ /\Q$pattern\E/). Clients from *.hum.uva.nl could also request paths containing "LET" or "let", e.g. "/my/let/file.html", and clients from the local cluster 145.18.230.[0-9]+ could access ALL files. Again, for those needing more expressional power, lines starting with "-e" are evaluated. For instance:
'-e $REMOTE_HOST =~ /\.edu$/is && $PATH_INFO =~ m@/DEMO/@is;'
will accept/reject requests for files from the directory "/demo/" from clients from the domain '.edu'.
Path selections starting with ! or 'not' will be inverted. That is:

*           not .wav

Will match all file and path names that do NOT contain '.wav'
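
A minimal sketch of how a single path selection could be tested, including the inversion with ! or 'not'. The helper path_selected() is hypothetical, and the case-insensitive literal match is an assumption based on the "DEMO" or "demo" example above; the real code may differ.

```perl
use strict;
use warnings;

# Hypothetical helper, not the actual CGIscriptor.pl code: test a requested
# path against one path selection from an access control file.
sub path_selected {
    my ($selection, $path_info) = @_;
    # A leading '!' or 'not' inverts the selection
    my $invert = ($selection =~ s/^\s*(?:!|not\s+)\s*//i) ? 1 : 0;
    # Assumption: literal, case-insensitive match anywhere in the path
    my $hit = ($path_info =~ /\Q$selection\E/i) ? 1 : 0;
    return $invert ? 1 - $hit : $hit;
}

print path_selected('DEMO', '/my/demo/file.html'), "\n";       # 1
print path_selected('not .wav', '/sounds/hello.wav'), "\n";    # 0
print path_selected('not .wav', '/my/demo/file.html'), "\n";   # 1
```

An empty selection matches every path in this sketch, which corresponds to the "145.18.230." line above that grants access to ALL files.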

7 Access control: Server side session tickets
Specific paths can be controlled by Session Tickets which must be present as a CGI or Cookie value in the request. These paths are defined in %TicketRequiredPatterns as pairs of:
('regexp' => 'SessionPath\tPasswordPath\tLogin.html\tExpiration').
Session Tickets are stored in a separate directory (SessionPath, e.g., "Private/.Session") as files with the exact same name of the TICKET variable value. The following is an example of a SESSION ticket:
Type: SESSION
IPaddress: 127.0.0.1
AllowedPaths: ^/Private/Name/
DeniedPaths: ^/Private/CreateUser\.
Expires: +3600
Username: test
...
Other content can follow.

It is advised that Session Tickets should expire and be deleted after some (idle) time. The IP address should be the IP number at login, and the ticket will be rejected if it is presented from another IP address. AllowedPaths and DeniedPaths are perl regexps. Be careful how they match. Make sure to delimit the names to prevent access to overlapping names, e.g., "^/Private/Rob" will also match "/Private/Robert", however, "^/Private/Rob/" will not. Expires is the time the ticket will remain valid after creation (file ctime). Time can be given in s[econds] (default), m[inutes], h[ours], or d[ays], e.g., "24h" means 24 hours. Only the Type: field needs to be present.

Next to Session Tickets, there are four other types of ticket files:
- LOGIN tickets store information about a current login request
- PASSWORD tickets store account information to authorize login requests
- IPADDRESS tickets for IP address-only checks
- CHALLENGE tickets for challenge tasks for every request
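
The time notation of the Expires: field in a Session Ticket (seconds by default, or a number followed by s, m, h, or d) can be converted to seconds along the following lines. The function expires_to_seconds() is a hypothetical illustration; treating a leading "+" as optional is an assumption based on the "+3600" example above.

```perl
use strict;
use warnings;

# Hypothetical helper: convert an Expires value such as "24h" or "+3600"
# into a number of seconds. Returns undef for values it cannot parse.
sub expires_to_seconds {
    my ($spec) = @_;
    my %unit = (s => 1, m => 60, h => 3600, d => 86400);
    return unless $spec =~ /^\s*\+?(\d+)\s*([smhd]?)/i;
    my ($amount, $letter) = ($1, lc($2) || 's');   # seconds are the default
    return $amount * $unit{$letter};
}

print expires_to_seconds('24h'), "\n";     # 86400
print expires_to_seconds('+3600'), "\n";   # 3600
print expires_to_seconds('10m'), "\n";     # 600
```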

8 Query length limiting
The length of the Query string can be limited. If CONTENT_LENGTH is larger than this limit, the request is rejected. The combined length of the Query string and the POST input is checked before any processing is done. This will prevent clients from overloading the scripts. The actual, combined, Query Size is accessible as a variable through $CGI_Content_Length.

9 Illegal filenames, paths, and protected directories
One of the primary security concerns in handling CGI-scripts is the use of "funny" characters in the requests that con scripts into executing malicious commands. Examples are inserting ';', null bytes, or <newline> characters in URLs and filenames, followed by executable commands. A special variable $FileAllowedChars stores a string of all allowed characters. Any request that translates to a filename with a character OUTSIDE this set will be rejected.
In general, all (readable) files in the ServerRoot tree are accessible. This might not be what you want. For instance, your ServerRoot directory might be the working directory of a CVS project and contain sensitive information (e.g., the password to get to the repository). You can block access to these subdirectories by adding the corresponding patterns to the $BlockPathAccess variable. For instance, $BlockPathAccess = '/CVS/' will block any request that contains '/CVS/' or:
 
die if $BlockPathAccess && $ENV{'PATH_INFO'} =~ m@$BlockPathAccess@;
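
The two checks of this item can be combined in a small stand-alone sketch. The value given to $FileAllowedChars here is only an assumed example (written as the inside of a character class), and request_is_acceptable() is a hypothetical helper; only the $BlockPathAccess test is taken from the line above.

```perl
use strict;
use warnings;

# Assumed example values; the real defaults in CGIscriptor.pl may differ.
my $FileAllowedChars = 'a-zA-Z0-9_/.~\-';   # inside of a character class
my $BlockPathAccess  = '/CVS/';

# Hypothetical helper: returns 1 if the path passes both checks, else 0.
sub request_is_acceptable {
    my ($path_info) = @_;
    # Reject any character OUTSIDE the allowed set (';', null bytes, newlines, ...)
    return 0 if $path_info =~ m@[^$FileAllowedChars]@;
    # Reject protected directories
    return 0 if $BlockPathAccess && $path_info =~ m@$BlockPathAccess@;
    return 1;
}

print request_is_acceptable('/my/demo/file.html'), "\n";    # 1
print request_is_acceptable('/my/file.html;ls'), "\n";      # 0
print request_is_acceptable('/project/CVS/Root'), "\n";     # 0
```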

10 The execution of code blocks can be controlled in a transparent way by adding IF or UNLESS conditions in the tags themselves.
That is, a simple check of the validity of filenames or email addresses can be done before any code is executed.


USER MANUAL

INTRODUCTION

CGIscriptor removes embedded scripts, indicated by an HTML 4 type <SCRIPT TYPE='text/ssperl'> </SCRIPT> or <SCRIPT TYPE='text/osshell'> </SCRIPT> constructs. The contents of the directive are executed by the PERL eval() and `` functions (in a separate name space). The result of the eval() function replaces the <SCRIPT> </SCRIPT> construct in the output file. You can use the values that are delivered in CGI-compliant form (i.e., the "?name=value&.." type URL additions) transparently as "$name" variables in your directives after they are defined in a <META> or <SCRIPT> tag. If you define the variable "$CGIscriptorResults" in a CGI attribute, all subsequent <SCRIPT> and <META> results (including the defining tag) will also be pushed onto a stack: @CGIscriptorResults. This list behaves like any other, ordinary list and can be manipulated.

Both GET and POST requests are accepted. These two methods are treated equally. Variables, i.e., those values that are determined when a file is processed, are indicated in the CGI attribute by $<name> or $<name>=<default> in which <name> is the name of the variable and <default> is the value used when there is NO current CGI value for <name> (you can use white-spaces in $<name>=<default> but really DO make sure that the default value is followed by white space or is quoted). Names can contain any alphanumeric characters and _ (i.e., names match /[\w]+/).
If the Content-type: is 'multipart/*', the input is treated as a MIME multipart message and automatically delimited. CGI variables get the "raw" (i.e., undecoded) body of the corresponding message part.

Variables can be CGI variables, i.e., those from the QUERY_STRING, environment variables, e.g., REMOTE_USER, REMOTE_HOST, or REMOTE_ADDR, or predefined values, e.g., CGI_Decoded_QS (The complete, decoded, query string), CGI_Content_Length (the length of the decoded query string), CGI_Year, CGI_Month, CGI_Time, and CGI_Hour (the current date and time).

All these are available when defined in a CGI attribute. All environment variables are accessible as $ENV{'name'}. So, to access the REMOTE_HOST and the REMOTE_USER, use, e.g.:

<SCRIPT TYPE='text/ssperl'>
($ENV{'REMOTE_HOST'}||"-")." $ENV{'REMOTE_USER'}"
</SCRIPT>

(This will print a "-" if REMOTE_HOST is not known) Another way to do this is:

<META CONTENT="text/ssperl; CGI='$REMOTE_HOST = - $REMOTE_USER'">
<SCRIPT TYPE='text/ssperl'>"$REMOTE_HOST $REMOTE_USER"</SCRIPT>
or
<META CONTENT='text/ssperl; CGI="$REMOTE_HOST = - $REMOTE_USER"
SRC={"$REMOTE_HOST $REMOTE_USER\n"}'>

This is possible because ALL environment variables are available as CGI variables. The environment variables take precedence over CGI names in case of a "name clash". For instance:

<META CONTENT="text/ssperl; CGI='$HOME' SRC={$HOME}">

Will print the current HOME directory (environment) irrespective of whether there is a CGI variable from the query (e.g., Where do you live? <INPUT TYPE="TEXT" NAME="HOME">). THIS IS A SECURITY FEATURE. It prevents clients from changing the values of defined environment variables (e.g., by supplying a bogus $REMOTE_ADDR). Although $ENV{} is not changed by the META tags, it would make the use of declared variables insecure. You can still access CGI variables after a name clash with CGIscriptor::CGIparseValue(<name>).
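
The precedence described here (environment first, then the query, then the declared default) can be pictured with a small stand-alone sketch. The function declared_value() and the two hashes are hypothetical stand-ins for the real environment and query string; this is not how CGIscriptor.pl is actually coded.

```perl
use strict;
use warnings;

# Hypothetical helper: pick the value of a declared variable.
sub declared_value {
    my ($name, $default, $query, $env) = @_;
    return $env->{$name}   if exists $env->{$name};     # environment wins
    return $query->{$name} if exists $query->{$name};   # then the query
    return $default;                                    # else the default
}

my %env   = (HOME => '/home/www');
my %query = (HOME => 'Amsterdam', Question => 'Why?');

print declared_value('HOME', '', \%query, \%env), "\n";        # /home/www
print declared_value('Question', '-', \%query, \%env), "\n";   # Why?
print declared_value('Missing', '-', \%query, \%env), "\n";    # -
```

The client supplied HOME=Amsterdam is ignored because an environment variable of the same name exists.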

Some CGI variables are present several times in the query string (e.g., from multiple selections). These should be defined as @VARIABLENAME=default in the CGI attribute. The list @VARIABLENAME will contain ALL VARIABLENAME values from the query, or a single default value. If there is an ENVIRONMENT variable of the same name, it will be used instead of the default AND the query values. The corresponding function is CGIscriptor::CGIparseValueList(<name>)

CGI variables collected in a @VARIABLENAME list are unordered. When more structured variables are needed, a hash table can be used. A variable defined as %VARIABLE=default will collect all CGI-parameter values whose names start with 'VARIABLE' in a hash table with the remainder of the name as a key. For instance, %PERSON will collect PERSONname='John Doe', PERSONbirthdate='01 Jan 00', and PERSONspouse='Alice' into a hash table %PERSON such that $PERSON{'spouse'} equals 'Alice'. Any default value or environment value will be stored under the "" key. If there is an ENVIRONMENT variable of the same name, it will be used instead of the default AND the query values. The corresponding function is CGIscriptor::CGIparseValueHash(<name>)
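
The way a %VARIABLE declaration gathers parameters can be sketched as follows. The helper collect_hash() and the %query hash (standing in for an already parsed query string) are hypothetical; defaults and environment values are left out of this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: collect all parameters whose names start with
# $prefix, using the remainder of each name as the hash key.
sub collect_hash {
    my ($prefix, $query) = @_;
    my %collected;
    for my $name (keys %$query) {
        next unless $name =~ /^\Q$prefix\E(.*)$/s;
        $collected{$1} = $query->{$name};
    }
    return %collected;
}

my %query = (
    PERSONname      => 'John Doe',
    PERSONbirthdate => '01 Jan 00',
    PERSONspouse    => 'Alice',
    Other           => 'ignored',
);
my %PERSON = collect_hash('PERSON', \%query);
print $PERSON{'spouse'}, "\n";   # Alice
```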

This method of first declaring your environment and CGI variables before being able to use them in the scripts might seem somewhat clumsy, but it protects you from inadvertently printing out the values of system environment variables when their names coincide with those used in the CGI forms. It also prevents "clients" from supplying CGI parameter values for your private variables. THIS IS A SECURITY FEATURE!

NON-HTML CONTENT TYPES

Normally, CGIscriptor prints the standard "Content-type: text/html\n\n" message before anything is printed. This has been extended to include plain text (.txt) files, for which the Content-type (MIME type) 'text/plain' is printed. In all other respects, text files are treated as HTML files (this can be switched off by removing '.txt' from the $FilePattern variable). When the content type should be something else, e.g., with multipart files, use the $RawFilePattern (.xmr, see also next item). CGIscriptor will not print a Content-type message for this file type (which must supply its OWN Content-type message). Raw files must still conform to the <SCRIPT></SCRIPT> and <META> tag specifications.

NON-HTML FILES

CGIscriptor is intended to process HTML and text files only. You can create documents of any mime-type on-the-fly using "raw" text files, e.g., with the .xmr extension. However, CGIscriptor will not process binary files of any type, e.g., pictures or sounds. Given the sheer number of formats, I do not have any intention to do so. However, an escape route has been provided. You can construct a genuine raw (.xmr) text file that contains the perl code to service any file type you want. If the global $BinaryMapFile variable contains the path to this file (e.g., /BinaryMapFile.xmr), this file will be called whenever an unsupported (non-HTML) file type is requested. The path to the requested binary file is stored in $ENV{'CGI_BINARY_FILE'} and can be used like any other CGI-variable. Servicing binary files then becomes supplying the correct Content-type (e.g., print "Content-type: image/jpeg\n\n";) and reading the file and writing it to STDOUT (e.g., using sysread() and syswrite()).
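
The Perl code inside such a raw map file could look roughly like the sketch below. The function send_binary() is hypothetical, the content type has to be chosen by your own code, and buffered read()/print are used here in place of sysread()/syswrite() so that the header and the data go through the same output buffer.

```perl
use strict;
use warnings;

# Hypothetical helper: write a Content-type header followed by the raw
# bytes of $path to the filehandle $out. Returns 1 on success, 0 otherwise.
sub send_binary {
    my ($path, $content_type, $out) = @_;
    open(my $in, '<', $path) or return 0;
    binmode($in);
    binmode($out);
    print {$out} "Content-type: $content_type\n\n";
    my $buffer;
    while (read($in, $buffer, 8192)) {   # copy in 8 kB chunks
        print {$out} $buffer;
    }
    close($in);
    return 1;
}

# Only does something when a binary file path is actually present
send_binary($ENV{'CGI_BINARY_FILE'}, 'image/jpeg', \*STDOUT)
    if defined $ENV{'CGI_BINARY_FILE'};
```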

THE META TAG

All attributes of a META tag are ignored, except the CONTENT='text/ssperl; CGI=" ... " [SRC=" ... "]' attribute. The string inside the quotes following the CONTENT= indication (white-space is ignored, "'` (){}[]-quotes are allowed, plus their \ versions) MUST start with any of the CGIscriptor mime-types (e.g.: text/ssperl or text/osshell) and a comma or semicolon. The quoted string following CGI= contains a white-space separated list of declarations of the CGI (and Environment) values and default values used when no CGI values are supplied by the query string.

If the default value is a longer string containing special characters, possibly spanning several lines, the string must be enclosed in quotes. You may use any pair of quotes or brackets from the list '', "", ``, (), [], or {} to distinguish default values (or preceded by \, e.g., \(...\) is different from (...)). The outermost pair will always be used and any other quotes inside the string are considered to be part of the string value, e.g.,

$Value = {['this'
"and" (this)]}

will result in $Value getting the default value

['this'
"and" (this)]

(NOTE that the newline is part of the default value!).

Internally, for defining and initializing CGI (ENV) values, the META and SCRIPT tags use the function "defineCGIvariable($name, $default)" (scalars) and "defineCGIvariableList($name, $default)" (lists). These functions can be used inside scripts as "CGIscriptor::defineCGIvariable($name, $default)" and "CGIscriptor::defineCGIvariableList($name, $default)".

The CGI attribute will be processed identically when used inside the <SCRIPT> tag. However, this use is not according to the HTML 4.0 specifications of the W3C.

THE DIV/INS TAG

There is a problem when constructing html files containing server-side perl scripts with standard HTML tools. These tools will refuse to process any text between <SCRIPT></SCRIPT> tags. This is quite annoying when you want to use large HTML templates where you will fill in values.

For this purpose, CGIscriptor will read the neutral <DIV CLASS="ssperl" ID="varname"></DIV> or <INS CLASS="ssperl" ID="varname"></INS> tag (in Cascading Style Sheet manner). Note that "varname" has NO '$' before it, it is a bare name. Any text between these <DIV ...></DIV> or <INS ...></INS> tags will be assigned to '$varname' as is (e.g., as a literal). No processing or interpolation will be performed. There is also NO nesting possible. Do NOT nest </DIV> inside a <DIV></DIV>! Moreover, DIV tags do NOT ensure a block structure in the final rendering (i.e., no empty lines).

Note that <DIV CLASS="ssperl" ID="varname"/> is handled the XML way. No content is processed, but varname is defined, and any SRC directives are processed.

You can use $varname like any other variable name. However, $varname is NOT a CGI variable and will be completely internal to your script. There is NO interaction between $varname and the outside world.

To interpolate a DIV derived text, you can use:

$varname =~ s/([\]])/\\$1/g; # Mark ']'-quotes
$varname = eval("qq[$varname]"); # Interpolate all values

The DIV tag will process IF, UNLESS, CGI and SRC attributes. The SRC files will be pre-pended to the body text of the tag.

CONDITIONAL PROCESSING: THE 'IF' AND 'UNLESS' ATTRIBUTES

It is often necessary to include code-blocks that should be executed conditionally, e.g., only for certain browsers or operating systems. Furthermore, quite often sanity and security checks are necessary before user (form) data can be processed, e.g., with respect to email addresses and filenames.

Checks added to the code are often difficult to find, interpret or maintain and in general mess up the code flow. This kind of confusion is dangerous. Also, for many of the supported "foreign" scripting languages, adding these checks is cumbersome or even impossible.

As a uniform method for asserting the correctness of "context", two attributes are added to all supported tags: IF and UNLESS. They both evaluate their value and block execution when the result is <FALSE> (IF) or <TRUE> (UNLESS) in Perl, e.g., UNLESS='$NUMBER \> 100;' blocks execution if $NUMBER > 100. Note that the backslash in the '\>' is removed and only used to differentiate this conditional '>' from the tag-closing '>'. For symmetry, the backslash in '\<' is also removed. Inside these conditionals, ~/ and ./ are expanded to their respective directory root paths.
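
The decision logic of IF and UNLESS can be sketched in a few lines of stand-alone Perl. The function block_is_executed() is hypothetical and leaves out the ~/ and ./ expansion; it only shows the backslash removal and the evaluation of the (author written, never client supplied) condition.

```perl
use strict;
use warnings;

our $NUMBER = 150;   # stands in for a declared CGI variable

# Hypothetical helper: returns 1 if the tag would be executed, 0 if blocked.
sub block_is_executed {
    my ($attribute, $condition) = @_;   # $attribute is 'IF' or 'UNLESS'
    $condition =~ s/\\([<>])/$1/g;      # '\>' becomes '>', '\<' becomes '<'
    my $result = eval $condition;       # evaluate the condition as Perl
    return $attribute eq 'IF' ? ($result ? 1 : 0) : ($result ? 0 : 1);
}

print block_is_executed('UNLESS', '$NUMBER \> 100;'), "\n";   # 0 (blocked)
print block_is_executed('IF',     '$NUMBER \> 100;'), "\n";   # 1 (executed)
```

With $NUMBER set to 150, the condition is true, so the UNLESS tag is blocked and the IF tag is executed.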

For example, the following tag will be ignored when the filename is invalid:

<SCRIPT TYPE='text/ssperl' CGI='$FILENAME' 
IF='CGIscriptor::CGIsafeFileName($FILENAME);'>
...
</SCRIPT>

The IF and UNLESS values must be quoted. The same quotes are supported as with the other attributes. The SRC attribute is ignored when IF and UNLESS block execution.

THE MAGIC SOURCE ATTRIBUTE (SRC=)

The SRC attribute inside tags accepts a list of filenames and URLs separated by commas (",") or semicolons (";").

ALL the variable values defined in the CGI attribute are available in @ARGV as if the file was executed from the command line, in the exact order in which they were declared in the preceding CGI attribute.

First, a SRC={}-block will be evaluated as if the code inside the block was part of a <SCRIPT></SCRIPT> construct, i.e., "print do { code };'';" or `code` (i.e., SAFEqx('code')). Only a single block is evaluated. Note that this is processed less efficiently than <SCRIPT> </SCRIPT> blocks. Type of evaluation depends on the content-type: Perl for text/ssperl and OS shell for text/osshell. For other mime types (scripting languages), anything in the source block is put in front of the code block "inside" the tag.

Second, executable files (i.e., -x filename != 0) are evaluated as: print `filename \'$ARGV[0]\' \'$ARGV[1]\' ...` That is, you can actually call executables safely from the SRC tag.

Third, text files that match the file pattern, used by CGIscriptor to check whether files should be processed ($FilePattern), are processed in-line (i.e., recursively) by CGIscriptor as if the code was inserted in the original source file. Recursions, i.e., calling a file inside itself, are blocked. If you need them, you have to code them explicitly using "main::ProcessFile($file_path)".

Fourth, Perl text files (i.e., -T filename != 0) are evaluated as: "do FileName;'';".

Last, URLs (i.e., starting with 'HTTP://', 'FTP://', 'GOPHER://', 'TELNET://', 'WHOIS://' etc.) are loaded and printed. The loading and handling of <BASE> and document header is done by main::GET_URL($URL [, 0]). You can enter your own code (default is curl, snarf, or wget and some post-processing to add a <BASE> tag).

There are two pseudo-file names: PREFIX and POSTFIX. These implement a switch from prefixing the SRC code/files (PREFIX, default) before the content of the tag to appending the code after the content of the tag (POSTFIX). The switches are done in the order in which the PREFIX and POSTFIX labels are encountered. You can mix PREFIX and POSTFIX labels in any order with the SRC files. Note that the ORDER of file execution is determined for prefixed and postfixed files separately.

File paths can be preceded by the URL protocol prefix "file://". This is simply STRIPPED from the name.
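
A minimal sketch of how such a SRC list could be sorted into prefixed and postfixed entries. The helper name split_src_list is hypothetical and not part of CGIscriptor; only the separators, the PREFIX and POSTFIX labels, and the stripping of "file://" are taken from the description above. For simplicity it ignores {}-blocks and URLs that themselves contain commas or semicolons.

```perl
use strict;
use warnings;

# Hypothetical helper: split a SRC attribute value into the entries that
# go before (PREFIX) and after (POSTFIX) the content of the tag.
sub split_src_list {
    my ($src) = @_;
    my (@prefix, @postfix);
    my $target = \@prefix;                  # PREFIX is the default
    foreach my $entry (split /\s*[,;]\s*/, $src) {
        $entry =~ s/^\s+|\s+$//g;           # trim surrounding white space
        next if $entry eq '';
        if    ($entry eq 'PREFIX')  { $target = \@prefix;  next; }
        elsif ($entry eq 'POSTFIX') { $target = \@postfix; next; }
        $entry =~ s!^file://!!i;            # "file://" is simply stripped
        push @$target, $entry;
    }
    return (\@prefix, \@postfix);
}

my ($pre, $post) = split_src_list('./Header.pl, POSTFIX; file://./Footer.pl, PREFIX, ~/Menu.html');
print "prefix:  @$pre\n";    # prefix:  ./Header.pl ~/Menu.html
print "postfix: @$post\n";   # postfix: ./Footer.pl
```

Within each of the two lists the original order is kept, which is what "determined for prefixed and postfixed files separately" amounts to in this sketch.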

Example:

The request "http://cgi-bin/Action_Forms.pl/Statistics/Sign_Test.html?positive=8&negative=22" will result in printing "${SS_PUB}/Statistics/Sign_Test.html" with QUERY_STRING = "positive=8&negative=22"

on encountering the lines:

<META CONTENT="text/osshell; CGI='$positive=11 $negative=3'">
<b><SCRIPT TYPE="text/ssperl" SRC="./Statistics/SignTest.pl">
</SCRIPT></b><p>
This line will be processed as:
"<b>`${SS_SCRIPT}/Statistics/SignTest.pl '8' '22'`</b><p>"

In which "${SS_SCRIPT}/Statistics/SignTest.pl" is an executable script. This line will end up printed as:

"<b>p <= 0.0161</b><p>"

Note that the META tag itself will never be printed, and is invisible to the outside world.

The SRC files in a DIV/INS tag will be added (pre-pended) to the body of the <DIV></DIV> tag. Blocks are NOT executed!

THE CGISCRIPTOR ROOT DIRECTORIES ~/ AND ./

Inside <SCRIPT></SCRIPT> tags, filepaths starting with "~/" are replaced by "$YOUR_HTML_FILES/". This way, files in the public directories can be accessed without direct reference to the actual paths. Filepaths starting with "./" are replaced by "$YOUR_SCRIPTS/"; this should only be used for scripts. The "$YOUR_SCRIPTS" directory is added to @INC so, e.g., the 'require' command will load from the "$YOUR_SCRIPTS" directory.

Note: this replacement can seriously affect Perl scripts. Watch out for constructs like $a =~ s/aap\./noot./g, use $a =~ s@aap\.@noot.@g instead.

CGIscriptor.pl will assign the values of $SS_PUB and $SS_SCRIPT (i.e., $YOUR_HTML_FILES and $YOUR_SCRIPTS) to the environment variables $SS_PUB and $SS_SCRIPT. These can be accessed by the scripts that are executed. The "$SS_SCRIPT" ($YOUR_SCRIPTS) directory is added to @INC so, e.g., the 'require' command will load from the "$SS_SCRIPT" directory.
Values not preceded by $, ~/, or ./ are used as literals.

OS SHELL SCRIPT EVALUATION (CONTENT-TYPE=TEXT/OSSHELL)

OS scripts are executed by a "safe" version of the `` operator (i.e., SAFEqx(), see also below) and any output is printed. CGIscriptor will interpolate the script and replace all user-supplied CGI-variables by their ''-quoted values (actually, all variables defined in CGI attributes are quoted). Other Perl variables are interpolated in a simple fashion, i.e., $scalar by their value, @list by join(' ', @list), and %hash by their name=value pairs. Complex references, e.g., @$variable, are all evaluated in a scalar context. Quotes should be used with care. NOTE: the results of the shell script evaluation will appear in the @CGIscriptorResults stack just as any other result.

All occurrences of $@% that should NOT be interpolated must be preceded by a "\". Interpolation can be switched off completely by setting $CGIscriptor::NoShellScriptInterpolation = 1 (set to 0 or undef to switch interpolation on again), i.e.,

<SCRIPT TYPE="text/ssperl">
$CGIscriptor::NoShellScriptInterpolation = 1;
</SCRIPT>
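
The quoting of CGI variables can be illustrated with a small sketch. The function interpolate_shell is a hypothetical stand-in, not the routine CGIscriptor uses: it only replaces the variables handed to it by their ''-quoted values and leaves every \-protected $ (and all other Perl variables) untouched.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the interpolation step: %cgi holds the
# variables declared in CGI attributes (assumed to be cleaned already).
sub interpolate_shell {
    my ($script, %cgi) = @_;
    # Replace $name by 'value' unless the $ is preceded by a backslash
    $script =~ s/(?<!\\)\$(\w+)/exists $cgi{$1} ? "'$cgi{$1}'" : "\$$1"/ge;
    return $script;
}

print interpolate_shell('grep $word /tmp/list | awk "{print \$1}"',
                        word => 'curry'), "\n";
# grep 'curry' /tmp/list | awk "{print \$1}"
```

Because the value ends up between ''-quotes, the shell treats it as a single literal word. This only holds as long as the value itself contains no quotes, which is why the CGI values are cleaned first.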

RUN TIME TRANSLATION OF INPUT FILES

Allows general and global conversions of files using Regular Expressions. Very handy (but costly) to rewrite legacy pages to a new format. Select the files to use it on with
my $TranslationPaths = 'filepattern';
For efficiency, define
$TranslationPaths = '';
when not using translations.
Accepts general regular expressions: [$pattern, $replacement]

Define:

my $TranslationPaths = 'filepattern'; # Pattern matching PATH_INFO

push(@TranslationTable, ['pattern', 'replacement']);
# e.g. (for Ruby Rails):
push(@TranslationTable, ['<%=', '<SCRIPT TYPE="text/ssruby">']);
push(@TranslationTable, ['%>', '</SCRIPT>']);

# Runs:
my $currentRegExp;
foreach $currentRegExp (@TranslationTable)
{
    my ($pattern, $replacement) = @$currentRegExp;
    $$text =~ s!$pattern!$replacement!msg;
};
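
As a self-contained illustration, the loop above can be wrapped in a function and run on a short text. The function name translate and the sample page are made up for this sketch; the table entries are the ones shown above.

```perl
use strict;
use warnings;

my @TranslationTable;
push(@TranslationTable, ['<%=', '<SCRIPT TYPE="text/ssruby">']);
push(@TranslationTable, ['%>',  '</SCRIPT>']);

# Hypothetical wrapper around the loop shown above
sub translate {
    my ($text) = @_;                       # reference to the page text
    foreach my $currentRegExp (@TranslationTable) {
        my ($pattern, $replacement) = @$currentRegExp;
        $$text =~ s!$pattern!$replacement!msg;
    }
    return $$text;
}

my $page = 'Total: <%= 6 * 7 %> items';
print translate(\$page), "\n";
# Total: <SCRIPT TYPE="text/ssruby"> 6 * 7 </SCRIPT> items
```

Note that the pattern is used as a regular expression, but the replacement is inserted as a plain string, so capture variables like $1 inside the replacement are not expanded.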

EVALUATION OF OTHER SCRIPTING LANGUAGES

Adding a MIME-type and an interpreter command to %ScriptingLanguages automatically will catch any other scripting language in the standard <SCRIPT TYPE="[mime]"></SCRIPT> manner. E.g., adding: $ScriptingLanguages{'text/sspython'} = 'python'; will actually execute the following code in an HTML page (ignore 'REMOTE_HOST' for the moment):

<SCRIPT TYPE="text/sspython">
# A Python script
x = ["A","real","python","script","Hello","World","and", REMOTE_HOST]
print x[4:8] # Prints the list ["Hello","World","and", REMOTE_HOST]
</SCRIPT>

The script code is NOT interpolated by perl, EXCEPT for those interpreters that cannot handle variables themselves. Currently, several interpreters are pre-installed:

Perl test -  "text/testperl" => 'perl',  
Python    -  "text/sspython" => 'python', 
Ruby      -  "text/ssruby"   => 'ruby',  
Tcl       -  "text/sstcl"    => 'tcl',    
Awk       -  "text/ssawk"    => 'awk -f-',  
Gnu Lisp  -  "text/sslisp"   => 'rep | tail +5 '.
#                                 "| egrep -v '> |^rep. |^nil\\\$'",   
Gnu Prolog-  "text/ssprolog" => 'gprolog',  
M4 macro's-  "text/ssm4"     => 'm4',
Bourne shell- "text/sh"      => 'sh',        
Bash      -  "text/bash"     => 'bash',
C-shell   -  "text/csh"      => 'csh',
Korn shell-  "text/ksh"      => 'ksh',
Praat     -  "text/sspraat"    => "praat - | sed 's/Praat > //g'",            
R         -  "text/ssr" => "R --vanilla --slave | sed 's/^[\[0-9\]*] //g'",   
REBOL     -   "text/ssrebol" => 
              "rebol --quiet|egrep -v '^[> ]* == '|sed 's/^\s*\[> \]* //g'", 
PostgreSQL (psql) -  "text/postgresql" => 'psql 2>/dev/null',

Note that the "value" of $ScriptingLanguages{mime} must be a command that reads Standard Input and writes to standard output. Any extra output of interactive interpreters (banners, echoes, prompts) should be removed by piping the output through 'tail', 'grep', 'sed', or even 'awk' or 'perl'.

For access to CGI variables there is a special hashtable: %ScriptingCGIvariables. CGI variables can be accessed in three ways.

1. If the mime type is not present in %ScriptingCGIvariables, nothing is done and the script itself should parse the relevant environment variables.
2. If the mime type IS present in %ScriptingCGIvariables, but its value is empty, e.g., $ScriptingCGIvariables{"text/sspraat"} = '';, the script text is interpolated by perl. That is, all $var, @array, %hash, and \-slashes are replaced by their respective values.
3. In all other cases, the CGI and environment variables are added in front of the script according to the format stored in %ScriptingCGIvariables. That is, the following (pseudo-)code is executed for each CGI- or Environment variable defined in the CGI-tag: printf(INTERPRETER, $ScriptingCGIvariables{$mime}, $CGI_NAME, $CGI_VALUE);

For instance, "text/testperl" => '$%s = "%s";' defines variable definitions for Perl, and "text/sspython" => '%s = "%s"' for Python (note that these definitions are not safe, the real ones contain '-quotes).

THIS WILL NOT WORK FOR @VARIABLES, the (empty) $VARIABLES will be used instead.

The $CGI_VALUE parameters are "shrubbed" of all control characters and quotes (by &shrubCGIparameter($CGI_VALUE)). Control characters are replaced by \0<octal ascii value> and quotes by their HTML character value (’ -> &#8217; ‘ -> &#8216; " -> &quot;). For example: if a client would supply the string value (in standard perl)

"/dev/null';\nrm -rf *;\necho '"
it would be processed as
'/dev/null&#8217;;\015rm -rf *;\015echo &#8217;'
(e.g., sh or bash would process the latter more according to your intentions).
If your interpreter requires different protection measures, you will have to supply these in %main::SHRUBcharacterTR (string => translation), e.g.,
$SHRUBcharacterTR{"\'"} = "&#8217;";
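
The idea of the cleaning step can be sketched as follows. This is a hypothetical re-implementation for illustration, NOT the real shrubCGIparameter(): the table only contains the '-quote entry shown above and the "-quote mapping mentioned in the text, and the real function may treat individual control characters (e.g., newlines) differently.

```perl
use strict;
use warnings;

# Translation table in the style of %main::SHRUBcharacterTR
my %SHRUBcharacterTR = (
    "'"  => '&#8217;',
    '"'  => '&quot;',
);

# Hypothetical re-implementation, NOT the real shrubCGIparameter()
sub shrub {
    my ($value) = @_;
    # Replace every quote that has an entry in the table
    $value =~ s/(['"])/$SHRUBcharacterTR{$1}/g;
    # Replace control characters by \0<octal value>
    $value =~ s/([\x00-\x1f\x7f])/sprintf('\\0%o', ord($1))/ge;
    return $value;
}

print shrub(qq{it's\ta "test"}), "\n";
# it&#8217;s\011a &quot;test&quot;
```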

Currently, the following definitions are used:

%ScriptingCGIvariables = (
"text/testperl" => "\$\%s = '\%s';",    # Perl          $VAR = 'value' (for testing)
"text/sspython" => "\%s = '\%s'",       # Python        VAR = 'value'
"text/ssruby"   => '@%s = "%s"',        # Ruby          @VAR = "value"
"text/sstcl"    => 'set %s "%s"',       # TCL           set VAR "value"
"text/ssawk"    => '%s = "%s";',        # Awk           VAR = "value"; 
"text/sslisp"   => '(setq %s "%s")',   # Gnu lisp (rep) (setq VAR "value")
"text/ssprolog" => '',                 # Gnu prolog    (interpolated)
"text/ssm4"     => "define(`\%s', `\%s')", # M4 macro's define(`VAR', `value')
"text/sh"       => "\%s='\%s';",       # Bourne shell  VAR='value'; 
"text/bash"     => "\%s='\%s';",       # Bourne again shell VAR='value';
"text/csh"      => "\$\%s = '\%s';",   # C shell       $VAR = 'value';
"text/ksh"      => "\$\%s = '\%s';",   # Korn shell    $VAR = 'value';
"text/sspraat"  => '',                  # Praat         (interpolation) 
"text/ssr"      => '%s <- "%s";',       # R             VAR <- "value";
"text/ssrebol"  => '%s: copy "%s"',     # REBOL         VAR: copy "value"
"text/postgresql" => '',                # PostgreSQL    (interpolation) 
"" => ""
);
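
How such a format string is turned into variable definitions can be shown with sprintf. The helper variable_definitions is hypothetical; the two formats are the ones from the table above (written with q{} to avoid escaping).

```perl
use strict;
use warnings;

# Two formats copied from the table above
my %ScriptingCGIvariables = (
    "text/sspython" => q{%s = '%s'},
    "text/sstcl"    => q{set %s "%s"},
);

# Hypothetical helper: one definition line per variable, in the order given
sub variable_definitions {
    my ($mime, @pairs) = @_;
    my $format = $ScriptingCGIvariables{$mime};
    # Unknown mime type or empty format: nothing is put in front of the script
    return '' unless defined $format && $format ne '';
    my $lines = '';
    while (my ($name, $value) = splice(@pairs, 0, 2)) {
        $lines .= sprintf($format, $name, $value) . "\n";
    }
    return $lines;
}

print variable_definitions('text/sspython', positive => 8, negative => 22);
# positive = '8'
# negative = '22'
```

The values are assumed to be cleaned of quotes before they are inserted; otherwise a value containing a '-quote would break out of the definition.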

Four tables allow fine-tuning of the interpreters with code that should be added before and after each code block:

Code added before each script block

%ScriptingPrefix = (
"text/testperl" => "\# Prefix Code;",   # Perl script testing
"text/ssm4"     =>  'divert(0)'        # M4 macro's (open STDOUT)
);

Code added at the end of each script block

%ScriptingPostfix = (
"text/testperl" => "\# Postfix Code;",  # Perl script testing
"text/ssm4"     =>  'divert(-1)'       # M4 macro's (block STDOUT)
);

Initialization code, inserted directly after opening (NEVER interpolated)

%ScriptingInitialization = (
"text/testperl" => "\# Initialization Code;", # Perl script testing
"text/ssawk"    => 'BEGIN {',                # Server Side awk scripts
"text/sslisp"   => '(prog1 nil ',            # Lisp (rep)
"text/ssm4"     =>  'divert(-1)'             # M4 macro's (block STDOUT)
);

Cleanup code, inserted before closing (NEVER interpolated)

%ScriptingCleanup = (
"text/testperl" => "\# Cleanup Code;",  # Perl script testing
"text/sspraat" => 'Quit',
"text/ssawk"    => '};',        # Server Side awk scripts
"text/sslisp"   =>  '(princ "\n" standard-output)).',  # Closing print to rep
"text/postgresql" => '\q',
);
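
The following sketch shows one plausible order in which these pieces end up in the text sent to the interpreter. The helper assemble_script is hypothetical, and the position of the CGI variable definitions (between initialization and prefix code) is an assumption inferred from the M4 entries, where the definitions have to fall under the initial divert(-1).

```perl
use strict;
use warnings;

# Entries for M4 copied from the tables above
my %ScriptingInitialization = ("text/ssm4" => 'divert(-1)');
my %ScriptingPrefix         = ("text/ssm4" => 'divert(0)');
my %ScriptingPostfix        = ("text/ssm4" => 'divert(-1)');
my %ScriptingCleanup        = ();

# Hypothetical helper showing one plausible order of assembly
sub assemble_script {
    my ($mime, $definitions, $code) = @_;
    my @parts = (
        $ScriptingInitialization{$mime},   # directly after opening
        $definitions,                      # CGI variable definitions
        $ScriptingPrefix{$mime},           # before the script block
        $code,                             # the script block itself
        $ScriptingPostfix{$mime},          # after the script block
        $ScriptingCleanup{$mime},          # just before closing
    );
    return join("\n", grep { defined $_ && $_ ne '' } @parts) . "\n";
}

print assemble_script('text/ssm4', "define(`NAME', `World')", 'Hello NAME');
```

For M4 this keeps STDOUT blocked while the variables are defined and opens it only around the script block itself.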

The SRC attribute is NOT magical for these interpreters. In short, all code inside a source file or {} block is written verbatim to the interpreter. No (pre-)processing or executional magic is done.

A serious shortcoming of the described mechanism for handling other (scripting) languages, with respect to standard perl scripts (i.e., 'text/ssperl'), is that the code is only executed when the pipe to the interpreter is closed. So the pipe has to be closed at the end of each block. This means that the state of the interpreter (e.g., all variable values) is lost after the closing of the next </SCRIPT> tag. The standard 'text/ssperl' scripts retain all values and definitions.

APPLICATION MIME TYPES

To ease some important auxiliary functions from within the html pages I have added them as MIME types. This uses the mechanism that is also used for the evaluation of other scripting languages, with interpolation of CGI parameters (and perl-variables). Actually, these are defined exactly like any other "scripting language".

text/ssdisplay:
display some (HTML) text with interpolated variables (uses `cat`).
text/sslogfile:
write (append) the interpolated block to the file mentioned on the first, non-empty line (the filename can be preceded by 'File: ', note the space after the ':', uses `awk .... >> <filename>`).
text/ssmailto:
send email directly from within the script block. The first line of the body must contain To:Name@Valid.Email.Address (note: NO space between 'To:' and the email address). For other options see the mailto man pages. It works by directly sending the (interpolated) content of the text block to a pipe into the Linux program 'mailto'.

In these script blocks, all Perl variables will be replaced by their values. All CGI variables are cleaned before they are used. These CGI variables must be redefined with a CGI attribute to restore their original values. In general, this will be more secure than constructing, e.g., your own email command lines. For instance, Mailto will not execute any odd (forged) email address, but just stops when the email address is invalid. In contrast, awk will construct any filename you give it (e.g. '<File;rm\\\040-f' would end up as a "valid" UNIX filename). Note that it will also gladly store this file anywhere (/../../../etc/passwd will work!). Use the CGIscriptor::CGIsafeFileName() function to clean the filename.

SHELL SCRIPT PIPING

If a shell script starts with the UNIX style "#! <shell command> \n" line, the rest of the shell script is piped into the indicated command, i.e., open(COMMAND, "| command");print COMMAND $RestOfScript;
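
A minimal sketch of this mechanism, assuming a Unix system with /bin/sh. The helper split_shebang is hypothetical, and the sketch pipes directly into the command without the protection that SAFEqx() adds in CGIscriptor.

```perl
use strict;
use warnings;

# Hypothetical helper: separate the "#! command" line from the script
sub split_shebang {
    my ($script) = @_;
    if ($script =~ /\A\s*#!\s*([^\n]+?)\s*\n(.*)\z/s) {
        return ($1, $2);       # (command, rest of the script)
    }
    return (undef, $script);   # no #! line: nothing to pipe into
}

my ($command, $rest) = split_shebang("#! /bin/sh\necho Hello\n");
if (defined $command) {
    open(my $pipe, '|-', $command) or die "Cannot start $command: $!";
    print $pipe $rest;         # the command reads the script from STDIN
    close($pipe);              # output of the command goes to STDOUT
}
```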

In many ways this is equivalent to the MIME-type profiling for evaluating other scripting languages as discussed above. The difference boils down to convenience. Shell script piping is a "raw" implementation. It allows you to control all aspects of execution. Using the MIME-type profiling is easier, but has a lot of defaults built in that might get in the way. Another difference is that shell script piping uses the SAFEqx() function, and MIME-type profiling does not.

Execution of shell scripts is under the control of the Perl Script blocks in the document. The MIME-type triggered execution of blocks can be simulated easily. You can switch to a different shell, e.g. tcl, completely by executing the following Perl commands inside your document:

<SCRIPT TYPE="text/ssperl">
$main::ShellScriptContentType = "text/ssTcl";     # Yes, you can do this
CGIscriptor::RedirectShellScript('/usr/bin/tcl'); # Pipe to Tcl
$CGIscriptor::NoShellScriptInterpolation = 1;
</SCRIPT>

After this script is executed, CGIscriptor will parse scripts of TYPE="text/ssTcl" and pipe their contents into '|/usr/bin/tcl' WITHOUT interpolation (i.e., NO substitution of Perl variables). The crucial function is:

CGIscriptor::RedirectShellScript('/usr/bin/tcl')

After executing this function, all shell scripts AND all calls to SAFEqx() are piped into '|/usr/bin/tcl'. If the argument of RedirectShellScript is empty, e.g., '', the original (default) value is reset.

The standard output, STDOUT, of any pipe is sent to the client. Currently, you should be careful with quotes in such a piped script. The result of a pipe is NOT put on the @CGIscriptorResults stack. As a result, you do not have access to the output of any piped (#!) process! If you want such access, execute

<SCRIPT TYPE="text/osshell">echo "script"|command</SCRIPT>

or

<SCRIPT TYPE="text/ssperl">
$resultvar = SAFEqx('echo "script"|command');
</SCRIPT>

Safety is never complete. Although SAFEqx() prevents some of the most obvious forms of attacks and security slips, it cannot prevent them all. Especially, complex combinations of quotes and intricate variable references cannot be handled safely by SAFEqx. So be on guard.

PERL CODE EVALUATION (CONTENT-TYPE=TEXT/SSPERL)

All PERL scripts are evaluated inside a PERL package. This package has a separate name space. This isolated name space protects the CGIscriptor.pl program against interference from user code. However, some variables, e.g., $_, are global and cannot be protected. You are advised NOT to use such global variable names. You CAN write directives that directly access the variables in the main program. You do so at your own risk (there is definitely enough rope available to hang yourself). The behavior of CGIscriptor becomes undefined if you change its private variables during run time. The PERL code directives are used as in:

$Result = eval($directive); print $Result;'';

($directive contains all text between <SCRIPT></SCRIPT>). That is, the <directive> is treated as ''-quoted string and the result is treated as a scalar. To prevent the VALUE of the code block from appearing on the client's screen, end the directive with ';""</SCRIPT>'. Evaluated directives return the last value, just as eval(), blocks, and subroutines, but only as a scalar.

IMPORTANT: All PERL variables defined are persistent. Each <SCRIPT> </SCRIPT> construct is evaluated as a {}-block with associated scope (e.g., for "my $var;" declarations). This means that values assigned to a PERL variable can be used throughout the document unless they were declared with "my". The following will actually work as intended (note that the ``-quotes in this example are NOT evaluated, but used as simple quotes):

<META CONTENT="text/ssperl; CGI=`$String='abcdefg'`">
anything ...
<SCRIPT TYPE="text/ssperl">@List = split('', $String);</SCRIPT>
anything ...
<SCRIPT TYPE="text/ssperl">join(", ", @List[1..$#List]);</SCRIPT>

The first <SCRIPT TYPE="text/ssperl"></SCRIPT> construct will return the value scalar(@List), the second <SCRIPT TYPE="text/ssperl"></SCRIPT> construct will print the elements of $String separated by commas, leaving out the first element, i.e., $List[0].
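
The persistence and scoping rules can be tried out with a small stand-alone sketch. The function evaluate_directive and the package name Sketch are made up for this illustration and only mimic the behavior described above; they are not the CGIscriptor internals.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the evaluation step: every directive is
# evaluated in its own {}-block inside one shared package.
sub evaluate_directive {
    my ($directive) = @_;
    my $Result = eval("package Sketch; no strict; do { $directive }");
    die $@ if $@;
    return defined $Result ? $Result : '';
}

{ package Sketch; our $String = 'abcdefg'; }   # plays the role of the CGI variable

print evaluate_directive(q{@List = split('', $String);}), "\n";     # 7
print evaluate_directive(q{join(", ", @List[1..$#List]);}), "\n";   # b, c, d, e, f, g
print evaluate_directive(q{my $hidden = 1; $kept = 2; "";}), "\n";  # (empty line)
print evaluate_directive(q{defined($hidden) ? "my" : "global $kept"}), "\n";  # global 2
```

The third directive ends in "" and therefore prints nothing; its "my" variable is gone in the fourth directive, while the package variable $kept is still there.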

Another warning: './' and '~/' are ALWAYS replaced by the values of $YOUR_SCRIPTS and $YOUR_HTML_FILES, respectively. This can interfere with pattern matching, e.g., $a =~ s/aap\./noot\./g will result in the evaluations of $a =~ s/aap\\${YOUR_SCRIPTS}noot\./g. Use s@regexp@replacement@g instead.

SERVER SIDE SESSIONS AND ACCESS CONTROL (LOGIN)

An infrastructure for user account authorization and file access control is available. Each request is matched against a list of URL path patterns. If the request matches, a Session Ticket is required to access the URL. This Session Ticket should be present as a CGI parameter or Cookie, e.g.:

CGI: SESSIONTICKET=<value>
Cookie: CGIscriptorSESSION=<value>

The example implementation stores Session Tickets as files in a local directory. To create Session Tickets, a Login request must be given with a LOGIN=<value> CGI parameter, a user name and a (doubly hashed) password. The user name and (singly hashed) password are stored in a PASSWORD ticket with the same name as the user account (name cleaned up for security).

The example session model implements 4 functions:

  1. Login
    The password is hashed with the user name and server side salt, and then hashed with REMOTE_HOST and a random salt. Client and Server both perform these actions and the Server only grants access if the results are the same. The server side only stores the password hashed with the user name and server side salt. Neither the plain password, nor the hashed password is ever exchanged. Only values hashed with the one-time salt are exchanged.
  2. Session
    For every access to a restricted URL, the Session Ticket is checked before access is granted. There are three session modes. The first uses a fixed Session Ticket that is stored as a cookie value in the browser (actually, as a sessionStorage value). The second uses only the IP address at login to authenticate requests. The third is a Challenge mode, where the client has to calculate the value of the next one-time Session Ticket from a value derived from the password and a random string.
  3. Password Change
    A new password is hashed with the user name and server side salt, and then encrypted (XORed) with the old password hashed with the user name and salt and rehashed with the login ticket number. At the server side this operation is reversed. Again, the stored password value is never exchanged unencrypted.
  4. New Account
    The text of a new account (Type: PASSWORD) file is constructed from the new username (CGI: NEWUSERNAME, converted to lowercase) and hashed new password (CGI: NEWPASSWORD). The same process is used to encrypt the new password as is used for the Password Change function. Again, the stored password value is never exchanged unencrypted. Some default settings are encoded. For display in the browser, the new password is reencrypted (XORed) with a special key, the old password hash hashed with a session specific random hex value sent initially with the session login ticket ($RANDOMSALT).
    For example for user NewUser and password NewPassword:
    Type: PASSWORD
    Username: newuser
    Password: 19afeadfba8d5dcd252e157fafd3010859f8762b87682b6b6cdb3e565194fa91
    IPaddress: 127\.0\.0\.1
    AllowedPaths: ^/Private/[\w\-]+\.html?
    AllowedPaths: ^/Private/newuser/
    Salt: e93cf858a1d5626bf095ea5c25df990dfa969ff5a5dc908b22c9a5229b525f65
    Session: SESSION
    Date: Fri Jun 29 12:46:22 2012
    Time: 1340973982
    Signature: 676c35d3aa63540293ea5442f12872bfb0a22665b504f58f804582493b6ef04e
    
    The password is created with the commands:
    printf '%s' 'NewPasswordnewuser970e68017413fb0ea84d7fe3c463077636dd6d53486910d4a53c693dd4109b1a'|shasum -a 256
    
    If the CPAN module Digest is installed, it is used instead of the commands. However, the password account files are protected against unauthorized change. To obtain a valid Password account, the following command should be given:
    perl CGIscriptor.pl --managelogin salt=Private/.Passwords/SALT \
      masterkey='Sherlock investigates oleander curry in Bath' \
      password='NewPassword' \
      Private/.Passwords/newuser
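
    The layout of such a ticket file is simple enough to read with a few lines of Perl. The sketch below uses hypothetical helpers (parse_ticket, path_allowed) that are not the routines used by CGIscriptor, and it assumes that AllowedPaths values are regular expressions matched against the requested path, as the example file suggests.

```perl
use strict;
use warnings;

# Hypothetical parser for "Key: value" ticket text; repeated keys
# (e.g., AllowedPaths) are collected in a list.
sub parse_ticket {
    my ($text) = @_;
    my %ticket;
    foreach my $line (split /\n/, $text) {
        next unless $line =~ /^\s*(\w+):\s*(.*?)\s*$/;
        push @{ $ticket{$1} }, $2;
    }
    return \%ticket;
}

# Hypothetical check: is $path matched by one of the AllowedPaths patterns?
sub path_allowed {
    my ($ticket, $path) = @_;
    return scalar grep { $path =~ /$_/ } @{ $ticket->{AllowedPaths} || [] };
}

my $ticket = parse_ticket(<<'END');
Type: PASSWORD
Username: newuser
AllowedPaths: ^/Private/[\w\-]+\.html?
AllowedPaths: ^/Private/newuser/
END

foreach my $path ('/Private/index.html', '/Private/admin/x.txt') {
    printf "%-22s %s\n", $path, path_allowed($ticket, $path) ? 'allowed' : 'denied';
}
# The first path is allowed, the second is denied
```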
    

Implementation

The session authentication mechanism is based on the exchange of ticket identifiers. A ticket identifier is just a string of characters, a name or a random 64 character hexadecimal string. Authentication is based on a (password derived) shared secret and the ability to calculate ticket identifiers from this shared secret. Ticket identifiers should be "safe" filenames (except user names). There are four types of tickets:

All tickets can have an expiration date in the form of a time duration from creation, in seconds, minutes, hours, or days (+duration[smhd]). An absolute time can be given in seconds since the epoch of the server host. Note that expiration times of CHALLENGE authentication tokens are calculated from the last access time. Accounts can include a maximal lifetime for session tickets (MaxLifetime).
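
As an illustration, an expiration value could be converted to an absolute time as sketched below. The helper expiration_time is hypothetical, and the sketch assumes that a relative duration always carries one of the unit letters s, m, h, or d.

```perl
use strict;
use warnings;

# Hypothetical helper: turn an expiration value into seconds since the epoch.
# "+<number>[smhd]" is relative to $start, a bare number is absolute.
sub expiration_time {
    my ($expires, $start) = @_;
    my %seconds = (s => 1, m => 60, h => 3600, d => 86400);
    if ($expires =~ /^\+(\d+)([smhd])$/) {
        return $start + $1 * $seconds{$2};
    }
    return $expires =~ /^\d+$/ ? $expires : undef;
}

my $created = 1340973982;                       # creation time in seconds
print expiration_time('+2h', $created), "\n";   # 1340981182
print expiration_time('1340980000', $created), "\n";
```

For CHALLENGE tickets, $start would be the last access time instead of the creation time.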

A Login page should create a LOGIN ticket file locally and send a server specific salt, a Random salt, and a LOGIN ticket identifier. The server side compares the username and hashed password, actually hashed(hashed(password+serversalt)+Random salt) from the client with the values it calculates from the stored Random salt from the LOGIN ticket and the hashed(password+serversalt) from the PASSWORD ticket. If successful, a new SESSION ticket is generated as a (double) hash sum of the stored password and the LOGIN ticket, i.e. LoginTicket = hashed(hashed(password+serversalt)+REMOTE_HOST+Random salt) and SessionTicket = hashed(hashed(LoginTicket).LoginTicket). This SESSION ticket should also be generated by the client and stored as sessionStorage and cookie values as needed. The Username, IP address and Path are available as $LoginUsername, $LoginIPaddress, and $LoginPath, respectively.
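
The hash chain can be sketched in Perl with the core module Digest::SHA. The salts and ticket value below are made up, the order of concatenation follows the client side JavaScript definitions given under Technical matters, and REMOTE_HOST is left out for brevity, so this illustrates the principle rather than reproducing the exact server side calculation.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);

# Made-up example values; on a real page they come from the server
my $serversalt  = sha256_hex('example server salt');
my $randomsalt  = sha256_hex('example one-time salt');
my $loginticket = sha256_hex('example login ticket');

# Stored form of the password: hash(password + lowercase user name + server salt)
sub hash_password {
    my ($plaintext, $username, $salt) = @_;
    return sha256_hex($plaintext . lc($username) . $salt);
}

my $password       = hash_password('NewPassword', 'NewUser', $serversalt);
# Value sent for the login: only ever hashed with the one-time salt
my $hashedpassword = sha256_hex($randomsalt . $password);
# Session ticket derived from the LOGIN ticket and the hashed password
my $sessionticket  = sha256_hex($loginticket . $password);

print "$_\n" for $password, $hashedpassword, $sessionticket;
```

Both sides can calculate these values independently, so neither the plain password nor its stored hash has to cross the network.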

The CHALLENGE protocol stores the single hashed version of the SESSION tickets. However, this value is not exchanged, but kept secret in the JavaScript sessionStorage object. Instead, every page returned from the server will contain a one-time Challenge value ($CHALLENGETICKET) which has to be hashed with the stored value to return the current ticket id string.

In the current example implementation, all random values are created as full, 256 bit SHA256 hash values (Hex strings) of 64 bytes read from /dev/urandom.
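
A sketch of how such a random value can be generated, assuming a system that provides /dev/urandom. The function name random_hex_value is made up for this example.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);

# Sketch: read 64 bytes from /dev/urandom and return their SHA256 hex digest
sub random_hex_value {
    open(my $urandom, '<:raw', '/dev/urandom') or die "No /dev/urandom: $!";
    my $count = read($urandom, my $bytes, 64);
    close($urandom);
    die "Short read from /dev/urandom" unless defined $count && $count == 64;
    return sha256_hex($bytes);
}

print random_hex_value(), "\n";   # 64 hexadecimal characters
```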

Authorization

A limited level of authorization tuning is built into the login system. Each account file (PASSWORD ticket file) can contain a number of Capabilities lines. These control special privileges. The Capabilities can be checked inside the HTML pages as part of the ticket information. Two privileges are handled internally: CreateUser and VariableREMOTE_ADDR. CreateUser allows the logged in user to create a new user account. With VariableREMOTE_ADDR, the session of the logged in user is not limited to the Remote IP address from which the initial log-in took place. Sessions can hop from one apparent (proxy) IP address to another, e.g., when using Tor. Any IPaddress patterns given in the PASSWORD ticket file remain in effect during the session. For security reasons, the VariableREMOTE_ADDR capability is only effective if the session type is CHALLENGE.

Security considerations with Session tickets

For strong security, please use end-to-end encryption. This can be achieved using a VPN (Virtual Private Network), SSH tunnel, or an HTTPS capable server with OpenSSL. The session ticket system of CGIscriptor.pl is intended to be used as a simple authentication mechanism WITHOUT END-TO-END ENCRYPTION. The authenticating mechanism tries to use some simple means to protect the authentication process from eavesdropping. For this it uses a secure hash function, SHA256. For all practical purposes, it is impossible to "decrypt" a SHA256 sum. But this login scheme is only as secure as your browser. Which, in general, is not very secure.

One fundamental weakness of the implemented procedure is that the Client obtains the code to encrypt the passwords from the server. It is the JavaScript code in the HTML pages. An attacker who could place himself between Server and Client, a man in the middle attack (MITM), could change the code to reveal the plaintext password and other information. There is no real protection against this attack without end-to-end encryption and authentication. A simple, but rather cumbersome, way to check for such attacks would be to store known good copies of the pages (downloaded with a browser or automatically with curl or wget) and then use other tools to download new pages at random intervals and compare them to the old pages. For instance, the following line would remove the variable ticket codes and give a fixed SHA256 sum for the original Login.html page+code:

curl http://localhost:8080/Private/index.html | sed 's/=\"[a-z0-9]\{64\}\"/=""/g' | shasum -a 256
A simple diff command between old and new files should give only differences in half a dozen lines, where only hexadecimal salt values will actually differ.
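
The same check can be scripted in Perl, for instance to compare a stored copy with a freshly downloaded one. The function page_fingerprint and the sample markup are made up for this sketch; only the pattern is taken from the sed command above.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);

# Sketch: blank out the 64 character hex values, then hash what is left
sub page_fingerprint {
    my ($html) = @_;
    $html =~ s/="[a-z0-9]{64}"/=""/g;
    return sha256_hex($html);
}

# Two copies of a (made-up) page that differ only in the salt value
my $salt_a = 'a' x 64;
my $salt_b = '0123456789abcdef' x 4;
my $old = qq{<input id="SERVERSALT" value="$salt_a">};
my $new = qq{<input id="SERVERSALT" value="$salt_b">};

my $same = page_fingerprint($old) eq page_fingerprint($new);
print $same ? "same\n" : "changed\n";   # same
```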

A sort of solution for the MITM attack problem that might protect at least the plaintext password would be to run a trusted web page from local storage to handle password input. The solution would be to add a hidden iFrame tag loading the untrusted page from the URL and extract the needed ticket and salt values. Then run the stored, trusted, code with these values. It is not (yet) possible to set the required session storage inside the browser, so this method only works for IPADDRESS sessions and plain SESSION tickets. There are many security problems with this "solution".

If you are able to ascertain the integrity of the login page using any of the above methods, you can check whether the IP address seen by the login server is indeed the IP address of your computer. The IP address of the REMOTE_HOST (your visible IP address) is part of the login "password". It is stored in the login page as a CLIENTIPADDRESS. It can be inspected by clicking the "Check IP address" box. Provided the MITM attacker cannot spoof your IP address, you can ensure that the login server sees your IP address and not that of an attacker.

Humans tend to reuse passwords. A compromise of a site running CGIscriptor.pl could therefore lead to a compromise of user accounts at other sites. Therefore, plain text passwords are never stored, used, or exchanged. Instead, the plain password and user name are "encrypted" with a server site salt value. Actually, all are concatenated and hashed with a one-way secure hash function (SHA256) into a single string. Whenever the word "password" is used, this hash sum is meant. Note that the salts are generated from /dev/urandom. You should check whether the implementation of /dev/urandom on your platform is secure before relying on it. This might be a problem when running CGIscriptor under Cygwin on MS Windows.
Note: no attempt is made to slow down the password hash, so bad passwords can be cracked by brute force.

As the (hashed) passwords are all that is needed to identify at the site, these should not be stored in this form. A site specific passphrase can be entered as an environment variable ($ENV{'CGIMasterKey'}). This phrase is hashed with the server site salt and the result is hashed with the user name and then XORed with the password when it is stored. Also, to detect changes to the account (PASSWORD) and session tickets, a (HMAC) hash of some of the contents of the ticket with the server salt and CGIMasterKey is stored in each ticket.

Creating a valid (hashed) password, encrypting it with the CGIMasterKey, and constructing a signature of the ticket are non-trivial tasks. This has to be redone with every change of the ticket file or of the CGIMasterKey. CGIscriptor can do this from the command line with the command:

perl CGIscriptor.pl --managelogin salt=Private/.Passwords/SALT \
  masterkey='Sherlock investigates oleander curry in Bath' \
  password='There is no password like more password' \
  admin
CGIscriptor will exit after this command with the first option being --managelogin. Options have the form option=value. When the value of an option is an existing file path, the first line of that file is used. Options are followed by one or more paths plus names of existing ticket files. Each password option is only used for a single ticket file. It is most definitely a bad idea to use a password that is identical to an existing filepath, as the file will be read instead. Be aware that the name of the file should be a cleaned up version of the Username. This will not be checked.

For the authentication and a change of password, the (old) password is used to "encrypt" a random one-time token or the new password, respectively. For authentication, decryption is not needed, so a secure hash function (SHA256) is used to create a one-way hash sum "encryption". A new password must be decrypted. New passwords are encrypted by XORing them with the old password.
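
The XOR step is its own inverse, which is what makes decryption on the server side possible. A minimal sketch with made-up values follows; the function xor_hex is hypothetical, and the real key is derived as described under Password Change, not the bare old password hash used here.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);

# Sketch: XOR two hexadecimal strings of equal length, byte by byte
sub xor_hex {
    my ($hex1, $hex2) = @_;
    die "Length mismatch" unless length($hex1) == length($hex2);
    return unpack('H*', pack('H*', $hex1) ^ pack('H*', $hex2));
}

my $oldpassword = sha256_hex('old password' . 'newuser' . 'salt');  # made-up values
my $newpassword = sha256_hex('new password' . 'newuser' . 'salt');

my $encrypted = xor_hex($newpassword, $oldpassword);   # sent by the client
my $decrypted = xor_hex($encrypted, $oldpassword);     # recovered by the server
print(($decrypted eq $newpassword) ? "recovered\n" : "mismatch\n");   # recovered
```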

Strong Passwords: It is so easy

If you only could see what you are typing

Your password might be vulnerable to brute force guessing. Protections against such attacks are costly in terms of code complexity, bugs, and execution time. However, there is a very simple and secure counter measure. See the XKCD comic. The phrase, There is no password like more password would be both much easier to remember, and still stronger than h4]D%@m:49, at least before this phrase was pasted as an example on the Internet.
For the procedures used at this site, a basic computer setup can check in the order of a billion passwords per second. You need a password (or phrase) strength in the order of 56 bits to be a little secure (one year on a single computer). One of the largest networks in the world, Bitcoin mining, can check some 12 terahashes per second (June 2012). This corresponds to checking 6 times 10^12 passwords per second. It would take a password strength of ~68 bits to keep the equivalent of the Bitcoin computer network occupied for around a year before it found a match.
Please be so kind and add the name of your favorite flower, dish, fictional character, or small town to your password. Say, Oleander, Curry, Sherlock, or Bath, UK (each adds ~12 bits) or even the phrase Sherlock investigates oleander curry in Bath (adds > 56 bits, note that oleander is poisonous, so do not try this curry at home). That would be more effective than adding a thousand rounds of encryption. Typing long passwords without seeing what you are typing is problematic. So a button should be included to make the password visible.
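
The arithmetic behind these estimates is easy to redo. The sketch below calculates the time needed to try ALL passwords of a given strength; on average a match is found after half that time, which gives the "around a year" mentioned above. The function name years_to_exhaust is made up.

```perl
use strict;
use warnings;

# Years needed to try every password of a given strength (in bits)
sub years_to_exhaust {
    my ($bits, $guesses_per_second) = @_;
    my $seconds_per_year = 365.25 * 24 * 3600;
    return 2**$bits / $guesses_per_second / $seconds_per_year;
}

printf "56 bits at 1e9 guesses/s:  %.1f years\n", years_to_exhaust(56, 1e9);   # 2.3
printf "68 bits at 6e12 guesses/s: %.1f years\n", years_to_exhaust(68, 6e12);  # 1.6
printf "Dictionary size for 12 bits: %d words\n", 2**12;                       # 4096
```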

Technical matters

Client side JavaScript code definitions. Variable names starting with '$' are CGIscriptor CGI variables. Some of the hashes could be strengthened by switching to HMAC signatures. However, the security issues of maintaining parallel functions for HMAC in both Perl and JavaScript seem to be more serious than the attack vectors against the hashes. HMAC is, however, used for the ticket signatures.

// On Login
function HashPlaintextPassword() {
	var plaintextpassword = document.getElementById('PASSWORD');
	var serversalt = document.getElementById('SERVERSALT');
	var username = document.getElementById('CGIUSERNAME');
 	return hex_sha256(plaintextpassword.value+username.value.toLowerCase()+serversalt.value);
}
var randomsalt = $RANDOMSALT; // From CGIscriptor
var loginticket = $LOGINTICKET; // From CGIscriptor
// Hash plaintext password
var password = HashPlaintextPassword();
// Authorize login
var hashedpassword = hex_sha256(randomsalt+password);
// Sessionticket
var sessionticket = hex_sha256(loginticket+password);
sessionStorage.setItem("CGIscriptorPRIVATE", sessionticket);
// Secretkey for encrypting new passwords, acts like a one-time pad
// Is set anew with every login, i.e., also with password changes
// and for each create new user request
var secretkey = hex_sha256(randomsalt+loginticket+password);
sessionStorage.setItem("CGIscriptorSECRET", secretkey);

// For a SESSION type request
sessionticket = hex_sha256(sessionStorage.getItem("CGIscriptorPRIVATE"));
createCookie("CGIscriptorSESSION",sessionticket, 0, "");

// For a CHALLENGE type request
var sessionset = "$CHALLENGETICKET"; // From CGIscriptor
var sessionkey = sessionStorage.getItem("CGIscriptorPRIVATE");
sessionticket = hex_sha256(sessionset+sessionkey);
createCookie("CGIscriptorCHALLENGE",sessionticket, 0, "");

// For transmitting a new password
function HashPlaintextNewPassword() {
	var plaintextpassword = document.getElementById('NEWPASSWORD');
	var serversalt = document.getElementById('SERVERSALT');
	var username = document.getElementById('NEWUSERNAME');
 	return hex_sha256(plaintextpassword.value+username.value.toLowerCase()+serversalt.value);
}

var newpassword = document.getElementById('NEWPASSWORD');
var newpasswordrep = document.getElementById('NEWPASSWORDREP');
// Hash plaintext password
newpassword.value = HashPlaintextNewPassword();
var secretkey = sessionStorage.getItem("CGIscriptorSECRET");

var encrypted = XOR_hex_strings(secretkey, newpassword.value);
newpassword.value = encrypted;
newpasswordrep.value = encrypted;

// XOR of hexadecimal strings of equal length
// (a shorter string is treated as if padded with '0' at the end)
function XOR_hex_strings(hex1, hex2) {
	var resultHex = "";
	var maxlength = Math.max(hex1.length, hex2.length);

	for(var i=0; i < maxlength; ++i) {
		// charAt() returns "" past the end of a string: use '0' instead
		var h1 = hex1.charAt(i);
		if(! h1) h1 = '0';
		var h2 = hex2.charAt(i);
		if(! h2) h2 = '0';
		var d1 = parseInt(h1,16);
		var d2 = parseInt(h2,16);
		var resultD = d1^d2;
		resultHex = resultHex+resultD.toString(16);
	}
	return resultHex;
}
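
XOR "encryption" is its own inverse: applying the same key a second time restores the original value, which is how the receiving side can recover the new password hash. The following self-contained sketch shows this round trip. The function xorHex is a hypothetical re-implementation written for this example, and the key and hash values are made up and much shorter than real SHA256 sums.

```javascript
// Hypothetical re-implementation for illustration only.
// XORs two hex strings digit by digit; a shorter string is padded with '0'.
function xorHex(hex1, hex2) {
  const length = Math.max(hex1.length, hex2.length);
  let result = "";
  for (let i = 0; i < length; ++i) {
    const d1 = parseInt(hex1.charAt(i) || "0", 16);
    const d2 = parseInt(hex2.charAt(i) || "0", 16);
    result += (d1 ^ d2).toString(16);
  }
  return result;
}

const secretKey = "0f1e2d3c";       // made-up key, known to both sides
const newPasswordHash = "a1b2c3d4"; // made-up hash of the new password

const encrypted = xorHex(secretKey, newPasswordHash); // sent to the server
const decrypted = xorHex(secretKey, encrypted);       // recovered by the server

console.log(encrypted);                     // aeaceee8
console.log(decrypted === newPasswordHash); // true
```

This only behaves like a one-time pad as long as the key is as long as the message and is never reused, which is why the secret key is set anew with every login.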

Password encryption based on $ENV{'CGIMasterKey'}. Server side Perl code:

# Password encryption
my $masterkey = $ENV{'CGIMasterKey'};
my $hash1 = hash_string($masterkey.$serversalt);
my $CryptKey = hash_string($username.$hash1);
$password = XOR_hex_strings($CryptKey,$password);

# Key for HMAC signing
my $hash1 = hash_string($masterkey.$serversalt);
my $HMACKey = hash_string($username.$hash1);

USER EXTENSIONS

A CGIscriptor package is attached to the bottom of this file. With this package you can personalize your version of CGIscriptor by including often used Perl routines. These subroutines can be accessed by prefixing their names with CGIscriptor::, e.g.,

<SCRIPT TYPE="text/ssperl"> 
CGIscriptor::ListDocs("/Books/*") # List all documents in /Books
</SCRIPT>

It already contains some useful subroutines for Document Management. As it is a separate package, it has its own namespace, isolated from both the evaluator and the main program. To access variables from the document <SCRIPT></SCRIPT> blocks, use $CGIexecute::<var>.

Currently, the following functions are implemented (precede them with CGIscriptor::, see below for more information)

THE RESULTS STACK: @CGIscriptorResults

If the pseudo-variable "$CGIscriptorResults" has been defined in a META tag, all subsequent SCRIPT and META results are pushed on the @CGIscriptorResults stack. This list is just another Perl variable and can be used and manipulated like any other list. $CGIscriptorResults[-1] is always the last result. This is only of limited use, e.g., to use the results of an OS shell script inside a Perl script. The stack will NOT contain the results of Pipes or code from MIME-profiling.

USEFUL CGI PREDEFINED VARIABLES (DO NOT ASSIGN TO THESE)

USEFUL CGI ENVIRONMENT VARIABLES

Variables accessible (in APACHE) as $ENV{"<name>"} (see: "http://hoohoo.ncsa.uiuc.edu/cgi/env.html"):

INSTRUCTIONS FOR RUNNING CGIscriptor ON UNIX

CGIscriptor.pl will run on any WWW server that runs Perl scripts. Just add a line like the following to your srm.conf file (Apache example):

ScriptAlias /SHTML/ /real-path/CGIscriptor.pl/

URLs that refer to http://www.your.address/SHTML/... will now be handled by CGIscriptor.pl, which can use a private directory tree (default is the DOCUMENT_ROOT directory tree, but it can be anywhere, see manual).

If your hosting ISP won't let you add ScriptAlias lines, you can use the following "rewrite"-based "scriptalias" in .htaccess (from Gerd Franke):

RewriteEngine	On
RewriteBase	/
RewriteCond %{REQUEST_FILENAME}	\.html$
RewriteCond %{SCRIPT_FILENAME}	!cgiscriptor\.pl$
RewriteCond %{REQUEST_FILENAME}	-f
RewriteRule	^(.*)$	/cgi-bin/cgiscriptor.pl/$1?%{QUERY_STRING}

Everything with the extension ".html" that does not include "cgiscriptor.pl" in the URL, and for which the file "path/filename.html" exists, is redirected to "/cgi-bin/cgiscriptor.pl/path/filename.html?query". The user configuration should use the same path level as the .htaccess file:

# Just enter your own directory path here
$YOUR_HTML_FILES = "$ENV{'DOCUMENT_ROOT'}";     
# use DOCUMENT_ROOT only, if .htaccess lies in the root-directory.

If this .htaccess goes in a specific directory, the path to this directory must be added to $ENV{'DOCUMENT_ROOT'}.

The CGIscriptor file contains all documentation as comments. These comments can be removed to speed up loading (e.g., `egrep -v '^#' CGIscriptor.pl > leanScriptor.pl`). A bare-bones version of CGIscriptor.pl, lacking documentation, most comments, access control, example functions, etc. (but still with the copyright notice and some minimal documentation) can be obtained by calling CGIscriptor.pl on the command line with the '-slim' command line argument, e.g.,

> CGIscriptor.pl -slim > slimCGIscriptor.pl

CGIscriptor.pl can be run in three ways: from the command line with <path> and <query> as arguments, as `CGIscriptor.pl <path> <query>`; inside a Perl script with 'do CGIscriptor.pl' after setting $ENV{PATH_INFO} and $ENV{QUERY_STRING}; or loaded with 'require "/real-path/CGIscriptor.pl"'. In the latter case, requests are processed by 'Handle_Request();' (again after setting $ENV{PATH_INFO} and $ENV{QUERY_STRING}).

The --help command line switch will print the manual.

Using the command line execution option, CGIscriptor.pl can be used as a document (meta-)preprocessor. If the first argument is '-', STDIN will be read. For example:

> cat MyDynamicDocument.html | CGIscriptor.pl - '[QueryString]' > MyStaticFile.html

This command line will produce a STATIC file with the DYNAMIC content of MyDynamicDocument.html "interpolated". This option would be very dangerous if it were available over the internet. If someone could sneak a 'http://www.your.domain/-' URL past your server, CGIscriptor could EXECUTE any POSTED content. Therefore, for security reasons, STDIN will NOT be read if ANY of the HTTP server environment variables is set (e.g., SERVER_PORT, SERVER_PROTOCOL, SERVER_NAME, SERVER_SOFTWARE, HTTP_USER_AGENT, REMOTE_ADDR).
This block on processing STDIN on HTTP requests can be lifted by setting

$BLOCK_STDIN_HTTP_REQUEST = 0;
in the security configuration. But be careful when doing this. It can be very dangerous.

Running demos and more information can be found at http://www.fon.hum.uva.nl/~rob/OSS/OSS.html

A pocket-size HTTP daemon, CGIservlet.pl, is available from my web site or CPAN. It can use CGIscriptor.pl as the base of a µWWW server and demonstrates its use.

NON-UNIX PLATFORMS

CGIscriptor.pl was mainly developed and tested on UNIX. However, as I coded part of the time on an Apple Macintosh under MacPerl, I made sure CGIscriptor did run under MacPerl (with command line options), but only as an independent script, not as part of an HTTP server. I have also used it under Apache in Windows XP.

LICENSE

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

Author: Rob van Son 
        email:
        R.J.J.H.vanSon@gmail.com 
        University of Amsterdam

Date:   May 22, 2000
Ver:    2.0
Env:    Perl 5.002