About the CGI kit

At the moment this is just random notes.

Code organization

URL-space Organization

There are a small number of (typically indexed) "areas", for example /m// for members, or /projects/ for projects. A double slash (//) indicates a one- or two-level alphabetical index (/X/Xy/Xyzw). Areas and their indexing style are defined in the site.cf file for the site.
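A minimal sketch of how a name might map into an alphabetically indexed area, following the /X/Xy/Xyzw pattern. The function name indexed_path and its parameters are made up for illustration, and case folding is left out because these notes don't specify it.

```python
def indexed_path(area: str, name: str, levels: int = 2) -> str:
    """Return the URL path for `name` inside an alphabetically indexed area.

    `area` is the area root without the double slash, e.g. "/m".
    `levels` is the depth of the alphabetical index (1 or 2).
    Hypothetical helper; not part of the kit.
    """
    if levels not in (1, 2):
        raise ValueError("levels must be 1 or 2")
    if not name:
        raise ValueError("name must not be empty")
    # Index directories are successively longer prefixes of the name:
    # "X" for one level, then "Xy" for the second level.
    prefixes = [name[:i] for i in range(1, levels + 1)]
    return "/".join([area.rstrip("/")] + prefixes + [name])
```

With a two-level index, indexed_path("/m", "Xyzw") gives /m/X/Xy/Xyzw; with levels=1 it gives /m/X/Xyzw.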

We can do almost all (all?) file management with a single CGI, i.e. /file/... (of course it would be /Ccgi/file.cgi, but we could fix that with mod_rewrite). It would handle directory indices, PUT, POST, and unprocessed GET. It's basically the flipside of process.

File handling

File formatting is done at page update time, by whatever CGI does the upload or edit. If mod_dav is used (probably a good idea), there needs to be a CGI to run the processing at the end of a "session". One way would be start/end session ops. A cron job or cleanup server can handle the inevitable loose ends. Formatting will most likely be controlled by a Makefile to (eventually) allow dependencies to be handled automatically.

With a mod_perl or C implementation it would be possible to do style processing on-the-fly. At the moment, we're not doing that.

.ht files
automatic crosslinking, entity expansion, header/footer tag expansion -- there will be a limited number of extra tags (tbd). Dynamic form processing will be done using a CGI, e.g. /cgi/process/path/to/xxx.ht. (Unlike Sparrow Web, this keeps the processing code out of the actual web pages.)
.wt files (wiki text)
WikiWiki formatting codes as well as the same crosslinking and entity expansion as text in .ht files. Mail-style headers for additional control info. Not converted to <wiki> tags in .ht files, because we want to allow owners to edit them in the original format even if they're using DAV.
.html files
converted to .ht on upload by splicing in header/footer tags, unless otherwise specified. (Sometimes you need to keep the original formatting.)
Docbook, .tex, .flk, .abc, etc.
statically converted to HTML or PDF as necessary and appropriate, with the originals kept around for editing.
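As a rough illustration of the .html to .ht conversion, here is a sketch that splices header and footer tags into an uploaded page. The placement (just inside the opening and closing body tags) and the empty-tag spelling are assumptions; these notes only say the tags get spliced in. The name html_to_ht is hypothetical.

```python
import re


def html_to_ht(html: str) -> str:
    """Splice <header /> and <footer /> into an HTML page (assumed placement)."""
    # Assumption: the header goes right after the opening <body ...> tag.
    out, n_open = re.subn(
        r"(<body\b[^>]*>)", r"\1\n<header />", html, count=1, flags=re.IGNORECASE
    )
    # Assumption: the footer goes right before the closing </body> tag.
    out, n_close = re.subn(
        r"(</body\s*>)", r"<footer />\n\1", out, count=1, flags=re.IGNORECASE
    )
    if n_open != 1 or n_close != 1:
        # No recognizable body element: leave the decision to the caller.
        raise ValueError("page has no <body>...</body> to splice into")
    return out
```

Pages that need to keep their original formatting would simply skip this step.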

All conversion to HTML should be done offline when a file is uploaded; the best thing is probably to automagically generate a Makefile with the appropriate dependencies (but we won't, at first). Skinning could be done by specifying a handler for HTML -- or a stylesheet.

Note that a CGI file expander will be around for debugging purposes, but its existence won't be widely advertised. It could also be used for offline processing; if mod_perl is in use this would be faster -- possibly much faster -- than running it as a separate app. In any case the on-the-fly expander will require using a different URL path than the usual; we have to keep the real files around for updating with WebDAV.

Note that ad selection (member, nonmember, etc.) doesn't require active HTML, only a CGI for the image src that does the appropriate redirects.

Tags and Entities

Tags

For the moment we need only a limited number of tags. Apart from a very small number of active ones, most tags, including <header> and <footer>, will be defined by files (tagname.inc in the /site/include directory). What could be simpler?
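A sketch of file-defined tags, assuming the simplest case: an attribute-less tag is replaced by the contents of tagname.inc if that file exists in the include directory, and is left alone otherwise. The function name expand_file_tags is hypothetical, and attribute handling is deliberately left out.

```python
import re
from pathlib import Path

# Matches simple tags such as <header> or <footer />; tags with attributes
# and closing tags are intentionally not handled in this sketch.
SIMPLE_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\s*/?>")


def expand_file_tags(text: str, include_dir: Path) -> str:
    """Replace each simple tag with the contents of <include_dir>/<tagname>.inc."""

    def replace(match: re.Match) -> str:
        inc = include_dir / f"{match.group(1)}.inc"
        # Tags without a matching .inc file are ordinary HTML; keep them as-is.
        return inc.read_text() if inc.is_file() else match.group(0)

    return SIMPLE_TAG.sub(replace, text)
```

Because the tag name is restricted to letters and digits, it cannot escape the include directory.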

<toc />
Table of contents for the page that contains it.
<calendar />
Generates a calendar index for subdirectories of the form yyyy/mmdd
<index />
Index for the directory. Maybe an attribute to specify a subdirectory, and maybe some formatting options.
<date />
The date, optionally with a format given in Unix date(1) format.
<path />
URL path to the page, with each directory name the anchor of a link to that directory. Attributes for root name (default none) and path sep (default slash).
<wiki>
Delimits "WikiWiki" text that can be edited in a <textarea>.
<ab>
With the syntax of <a> except that, if the href attribute matches the current URL, it shows in bold rather than as a link, and if it matches a prefix of the current URL it shows as a bold link.
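The three cases for <ab> can be sketched as below. The function name render_ab and the use of plain string-prefix matching are assumptions; a real implementation might want to match on whole path components instead.

```python
from html import escape


def render_ab(href: str, text: str, current_url: str) -> str:
    """Render an <ab> element for a page whose URL is `current_url`."""
    label = escape(text)
    if href == current_url:
        # Exact match: show the text in bold, with no link.
        return f"<b>{label}</b>"
    link = f'<a href="{escape(href, quote=True)}">'
    if current_url.startswith(href):
        # href is a prefix of the current URL: show a bold link.
        return f"{link}<b>{label}</b></a>"
    # Anything else behaves like an ordinary <a>.
    return f"{link}{label}</a>"
```

This is what makes <ab> convenient for navigation bars: the same markup can be included on every page and highlights itself appropriately.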

Entities

There will also be site-wide and directory-wide entity definitions. Anything not defined (e.g. the standard HTML entities like &amp;) will be passed through. An entity called _tag_ will be expanded into the attributes of that tag when we encounter it; about the only reasonable use for this is _body_.
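A minimal sketch of the pass-through rule, assuming entities are held in a simple name-to-value map and expanded in a single pass. The name expand_entities and the exact set of characters allowed in an entity name are assumptions.

```python
import re

# Assumption: entity names are letters, digits, underscores and dots
# (dots so that names like theme.page.bg can be used).
ENTITY = re.compile(r"&([A-Za-z_][A-Za-z0-9_.]*);")


def expand_entities(text: str, entities: dict[str, str]) -> str:
    """Expand defined entities; pass anything undefined through unchanged."""

    def replace(match: re.Match) -> str:
        # Undefined names (e.g. standard HTML entities like &amp;) are kept.
        return entities.get(match.group(1), match.group(0))

    return ENTITY.sub(replace, text)
```

For example, with pageName defined as "notes", the text "&pageName; &amp; more" becomes "notes &amp; more".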

&pagePath;
URL path to the page, not including its extension
&pageName;
filename, less extension, of the page.
&dirPath;
directory path.
&;

Profiles and Property Files

Property files are just files of name=value pairs that turn into entity maps. The site configuration file and user profiles are property files.
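Reading a property file into an entity map might look like the sketch below. Skipping blank lines and '#' comment lines is an assumption, as is the function name parse_properties; these notes only specify name=value pairs.

```python
def parse_properties(text: str) -> dict[str, str]:
    """Parse name=value lines into an entity map."""
    props: dict[str, str] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and '#' comments are ignored
        # Split on the first '=' only, so values may themselves contain '='.
        name, eq, value = line.partition("=")
        if not eq:
            raise ValueError(f"line {lineno}: expected name=value")
        props[name.strip()] = value.strip()
    return props
```

The resulting map can be used directly as the set of entity definitions for a site, a user, or a directory.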

We will also use property files for directory properties such as ownership, permissions, style, color theme, and (very importantly) prototype files. In particular, the prototype property in a specialized directory (e.g. a member, contribution, or song directory) points to the prototype to use for the directory index.

Other properties that will be necessary are ones to specify directory format (long/compact, multi-column, sort order), file descriptions, and so on.

Themes

A theme is a (hopefully more-or-less coherent) set of colors, pixmaps, and possibly fonts. Maybe even CSS stylesheets.

theme.page.bg
page (body) background color
theme.page.text
page (body) text
theme.page.link
link color
theme.page.vlink
visited-link color
theme.page.alink
"active" link color: the color of a link with the mouse button held down over it.
theme.table.bg
background color for (typically borderless) tables, to mark them off from body text.
theme.weak.bg
background color for "weak accent" -- typically used for <th> elements.
theme.weak.text
text on weak accent
theme.strong.bg
background color for "strong accent"
theme.strong.text
text on strong accent
theme.stronger.bg
background color for "stronger accent"
theme.stronger.text
text on stronger accent
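One plausible use of the theme.page.* properties is to build the color attributes of the body tag (the sort of thing the _body_ entity might expand to). The mapping onto the standard HTML body attributes below is an assumption about how the kit would use them, and body_attributes is a hypothetical name.

```python
from html import escape

# Standard HTML <body> color attributes paired with the theme properties
# listed above. The pairing is an assumption, not something the kit defines.
BODY_ATTRS = [
    ("bgcolor", "theme.page.bg"),
    ("text", "theme.page.text"),
    ("link", "theme.page.link"),
    ("vlink", "theme.page.vlink"),
    ("alink", "theme.page.alink"),
]


def body_attributes(theme: dict[str, str]) -> str:
    """Return body-tag attributes for whichever page colors the theme defines."""
    return " ".join(
        f'{attr}="{escape(theme[key], quote=True)}"'
        for attr, key in BODY_ATTRS
        if key in theme
    )
```

Properties a theme leaves undefined are simply omitted, so the browser defaults apply.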

Preferences

Preferences are user-specific entities that affect how a particular user prefers to view the site. Not all styles are amenable to this sort of treatment, and a certain amount of finagling will be required to circumvent preformatted pages; style and theme preferences will, therefore, usually be limited to users who have paid for the privilege.

Styles and Prototypes

Unlike a theme, which is defined by a set of properties (accessed as entities), a style is defined by a set of template and prototype files. A full style comprises templates for all of the specialized types of directory, as well as definitions for all of the tags mentioned above. It should be written using the theme entities for colors and backgrounds.

Note that an index.ht or index.html file overrides a directory's index prototype, which is usually located in the /site/ directory.

Note:
Initially we will probably have to ignore directory prototypes and rely on /Ccgi/dir-index.cgi to do what is necessary.

Prototypes

Directory prototypes generally vary according to area. Currently, indices for alphabetical and chronological index spaces are handled specially. However, there needs to be some freedom to allow for local conditions. For example, should a chronological index default to forward (appropriate for an archive) or reverse (as in a weblog or discussion forum) order?
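The forward/reverse choice for a chronological index could be as small as a sort flag. This sketch assumes subdirectories of the form yyyy/mmdd and a hypothetical function name chronological_index; anything that doesn't match the pattern is left out of the index.

```python
import re

# Subdirectories of the form yyyy/mmdd, e.g. "2002/0927".
DATED = re.compile(r"^\d{4}/\d{4}$")


def chronological_index(paths: list[str], reverse: bool = False) -> list[str]:
    """Return the dated entries in forward (archive) or reverse (weblog) order."""
    dated = [p for p in paths if DATED.match(p)]
    # Fixed-width, zero-padded fields mean plain string order is date order.
    return sorted(dated, reverse=reverse)
```

A directory property could then supply the reverse flag, so that each area (or each directory) picks its own default.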

member-dir.ht
Prototype for a member directory.
contrib-dir.ht
Prototype for a contributor's directory.

Naturally, specialized community sites like PenguinSong.net will have their own, specialized subdirectory prototypes, e.g. for song directories, song lists, and so on.

Forms

There are a couple of different ways of doing forms:

The form-to-cf and form-to-skeleton variants are probably the easiest to implement, which is a good thing because we need them earliest.

Tracking Downloads

Note that the following discussion refers to "tracks" rather than "songs" -- any given song may be represented by multiple tracks, each corresponding to a particular performance. Note, too, that anything said about tracks applies equally well to images, books, and so on: basically any kind of download that you want to control access to.

We have several different goals for downloads:

So here's what we'll do:

The symlink URL path will probably have a sub-path of the form, e.g., mmdd/hh/xxxx where xxxx is a hash of the user's name. This will make it easy to find with minimal searching (of only the previous n+1 hours, where n is the lifetime), and minimize the number of directories that need to be searched for reaping.
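A sketch of the mmdd/hh/xxxx scheme and of the "previous n+1 hours" search. The hash (the first four hex digits of SHA-1) is an assumption, since these notes say only "a hash of the user's name"; the function names and the idea of measuring the lifetime in whole hours are likewise made up for illustration.

```python
import hashlib
from datetime import datetime, timedelta


def user_hash(username: str) -> str:
    # Assumption: four hex digits of SHA-1; the notes don't pick a hash.
    return hashlib.sha1(username.encode("utf-8")).hexdigest()[:4]


def link_subpath(username: str, created: datetime) -> str:
    """Sub-path for a temporary link created at `created`: mmdd/hh/xxxx."""
    return f"{created:%m%d}/{created:%H}/{user_hash(username)}"


def candidate_subpaths(username: str, now: datetime, lifetime_hours: int) -> list[str]:
    """Sub-paths that could still hold a live link for this user.

    A link that lives n hours can only have been created in the current
    hour or one of the n hours before it, so n+1 directories are checked.
    """
    return [
        link_subpath(username, now - timedelta(hours=back))
        for back in range(lifetime_hours + 1)
    ]
```

Reaping is the mirror image: any hour directory older than the oldest candidate can be removed without looking inside it.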

Careful work with permissions will be required to make track directories directly accessible only to rights-holders (possibly only when logged in). This might, for example, involve an appropriate <Location> block for the temporary links that overrides the directories' .htaccess files.

Note that it would be easy to redirect track links to a different server or even to multiple different servers -- the ultimate extension would be a system where each artist with sufficient capacity serves their own tracks. You could even go further, to a Napster-like system in which listeners cache and serve recently-listened tracks. Unlike Napster it would be legal and tracks would have uniform, standardized metadata, but it would give the same sort of view of who has what and what their bandwidth is.

We should be able to provide not only detailed statistics but also, for users who opt in to the feature, the ability to contact other users currently or recently listening to the same song (possibly on other tracks).


$Id: notes.html,v 1.8 2002/09/27 14:50:48 steve Exp $
Stephen R. Savitzky <steve@theStarport.org>