About the CGI kit

At the moment these are just random notes.

Code organization

The Perl modules will be accessible at a constant location relative to the CGI directory; typically ../lib/. They would, of course, be addressed as Foo::Whatever. ScriptAlias would be used to bring them into CGI space.
Avoid global variables to allow conversion to mod_perl -- globals will be used only for sitewide constants like lookup paths, shared templates, and so on.
Templates and site parameters go by default in a well-known location (/site) relative to the server root or the document root so that the code can access them. If they're not in one of the default places, we use server environment variables (COMBO_SITE_CF, COMBO_SITE_DOCS) to point to them. The /site URL location, by convention, contains such things as FAQs, policy documents, and so on. The use of environment variables allows the site information to be differentiated in a virtual-host environment.
Specifically, the master site configuration file is (conventionally) site.cf and is normally contained in the /site directory.
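
A rough sketch of the lookup just described, in Python purely for illustration (the kit itself would be Perl). The function name and the precedence are assumptions: the notes do not say whether COMBO_SITE_CF should win over a site.cf found in a default place, so this version lets the environment variable win, which is what a virtual-host setup would seem to need.

```python
import os

def find_site_cf(environ, server_root, document_root):
    """Return the path of the site configuration file, or None if not found.

    Hypothetical helper: the name and the search order are assumptions.
    """
    # Assumption: an explicit COMBO_SITE_CF overrides the default places.
    override = environ.get("COMBO_SITE_CF")
    if override:
        return override
    # Default places: <root>/site/site.cf under the server or document root.
    for root in (server_root, document_root):
        candidate = os.path.join(root, "site", "site.cf")
        if os.path.isfile(candidate):
            return candidate
    return None
```

In a CGI this would be called with os.environ and the two roots taken from the server configuration.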

URL-space Organization

There are a small number of (typically indexed) "areas", for example /m// for members, or /projects/ for projects. Double-slash (//) indicates a one- or two-level alphabetical index (for example, /X/Xy/Xyzw). Areas and their indexing style are defined in the site.cf file for the site.
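
One way to read the /X/Xy/Xyzw example is that each index level is a successively longer prefix of the entry name. The Python sketch below (illustrative only; the function name, the levels parameter, and the handling of very short names are assumptions) builds such a path.

```python
def index_path(area, name, levels=2):
    """Build the URL path of an entry inside an alphabetically indexed area.

    Hypothetical helper: each index level is assumed to be a prefix of the
    entry name, one character longer than the level before it.
    """
    if not name:
        raise ValueError("entry name must not be empty")
    # Names shorter than the number of levels simply get fewer prefix levels.
    prefixes = [name[:i] for i in range(1, levels + 1) if i <= len(name)]
    return "/".join([area.rstrip("/")] + prefixes + [name])
```

With these assumptions, index_path("/m", "Xyzw") gives /m/X/Xy/Xyzw, and levels=1 gives /m/X/Xyzw.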

We can do almost all (all?) file management with a single CGI, i.e. /file/... (of course it would be /Ccgi/file.cgi, but we could fix that with mod_rewrite). It would handle directory indices, PUT, POST, and unprocessed GET. It's basically the flip side of process.

File handling

File formatting is done at page update time, by whatever CGI does the upload or edit. If mod_dav is used (probably a good idea), there needs to be a CGI to run the processing at the end of a "session". One way would be start/end session ops. A cron job or cleanup server can handle the inevitable loose ends. Formatting will most likely be controlled by a Makefile to (eventually) allow dependencies to be handled automatically.

With a mod_perl or C implementation it would be possible to do style processing on-the-fly. At the moment, we're not doing that.

.ht files: automatic crosslinking, entity expansion, header/footer tag expansion -- there will be a limited number of extra tags (tbd). Dynamic form processing will be done using a CGI, e.g. /cgi/process/path/to/xxx.ht. (Unlike Sparrow Web, this keeps the processing code out of the actual web pages.)
.wt files (wiki text): WikiWiki formatting codes as well as the same crosslinking and entity expansion as text in .ht files. Mail-style headers for additional control info. Not converted to <wiki> tags in .ht files, because we want to allow owners to edit them in the original format even if they're using DAV.
.html files: converted to .ht on upload by splicing in header/footer tags, unless otherwise specified. (Sometimes you need to keep the original formatting.)
Docbook, .tex, .flk, .abc, etc.: statically converted to HTML or PDF as necessary and appropriate, with the originals kept around for editing.

All conversion to HTML should be done offline when a file is uploaded; the best thing is probably to automagically generate a Makefile with the appropriate dependencies (but we won't, at first). Skinning could be done by specifying a handler for HTML -- or a stylesheet.

Note that a CGI file expander will be around for debugging purposes, but its existence won't be widely advertised. It could also be used for offline processing; if mod_perl is in use this would be faster -- possibly much faster -- than running it as a separate app. In any case the on-the-fly expander will require using a different URL path than the usual; we have to keep the real files around for updating with WebDAV.

Note that ad selection (member, nonmember, etc.) doesn't require active HTML, only a CGI for the image src that does the appropriate redirects.

Tags and Entities

Entities

There will also be site-wide and directory-wide entity definitions. Anything not defined (e.g. the standard HTML entities like &amp;) will be passed through. An entity called _tag_ will be expanded into the attributes of that tag when we encounter it; about the only reasonable use for this is _body_.

&pagePath;: URL path to the page, not including its extension
&pageName;: filename, less extension, of the page.
&dirPath;: directory path.
&;
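
A minimal sketch of the pass-through rule, in Python for illustration only. The function name, the regular expression for entity names, and the lookup order (earlier maps win, so a directory-wide map would be passed before the site-wide one) are all assumptions; expansion is a single pass and is not recursive.

```python
import re

# Assumed entity-name syntax: letters, digits, underscore, and dot.
ENTITY = re.compile(r"&([A-Za-z_][A-Za-z0-9_.]*);")

def expand_entities(text, *maps):
    """Expand &name; references using the given maps, first match wins.

    Entities not defined in any map (such as the standard HTML ones)
    are left in the text exactly as written.
    """
    def replace(match):
        name = match.group(1)
        for entity_map in maps:
            if name in entity_map:
                return entity_map[name]
        return match.group(0)  # undefined: pass through unchanged
    return ENTITY.sub(replace, text)
```

For example, with a page map {"pageName": "notes"} and a site map {"dirPath": "/site"}, the text "&pageName; &amp; &dirPath;" becomes "notes &amp; /site".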

Profiles and Property Files

Property files are just files of name=value pairs that turn into entity maps. The site configuration file and user profiles are property files.
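
A minimal sketch of reading such a file into an entity map, in Python for illustration only. The notes specify nothing beyond name=value pairs, so the function name, the whitespace trimming, and the treatment of blank lines and lines starting with # as ignorable are assumptions.

```python
def parse_properties(text):
    """Turn the text of a property file into a dict of name -> value.

    Assumptions: one pair per line, whitespace around names and values is
    trimmed, blank lines and lines starting with '#' are skipped, and
    lines without '=' are ignored.
    """
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not a name=value pair
        props[name.strip()] = value.strip()
    return props
```

The resulting dict can be used directly as an entity map, so a line such as theme.page.bg = #ffffff would make &theme.page.bg; available to pages.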

We will also use property files for directory properties; in particular for such things as ownership, permissions, style, color theme, and (very importantly) prototype files. In particular, the prototype property in a specialized directory (e.g. a member, contribution, or song directory) points to the prototype to use for the directory index.

Other properties that will be necessary are ones to specify directory format (long/compact, multi-column, sort order), file descriptions, and so on.

Themes

A theme is a (hopefully more-or-less coherent) set of colors, pixmaps, and possibly fonts. Maybe even CSS stylesheets.

theme.page.bg: page (body) background color
theme.page.text: page (body) text
theme.page.link: link color
theme.page.vlink: visited-link color
theme.page.alink: "active" link color: the color of a link with the mouse button held down over it.
theme.table.bg: background color for (typically borderless) tables, to mark them off from body text.
theme.weak.bg: background color for "weak accent" -- typically used for <th> elements.
theme.weak.text: text on weak accent
theme.strong.bg: background color for "strong accent"
theme.strong.text: text on strong accent
theme.stronger.bg: background color for "stronger accent"
theme.stronger.text: text on stronger accent

Preferences

Preferences are user-specific entities that affect how a particular user prefers to view the site. Not all styles are amenable to this sort of treatment, and a certain amount of finagling will be required to circumvent preformatted pages; style and theme preferences will, therefore, usually be limited to users who have paid for the privilege.

Styles and Prototypes

Unlike a theme, which is defined by a set of properties (accessed as entities), a style is defined by a set of template and prototype files. A full style comprises templates for all of the specialized types of directory, as well as definitions for all of the tags mentioned above. It should be written using the theme entities for colors and backgrounds.

Note that an index.ht or index.html file overrides a directory's index prototype, which is usually located in the /site/ directory.

Note: Initially we will probably have to ignore directory prototypes and rely on /Ccgi/dir-index.cgi to do what is necessary.

Prototypes

Directory prototypes generally vary according to area. Currently, indices for alphabetical and chronological index spaces are handled specially. However, there needs to be some freedom to allow for local conditions. For example, does a chronological index default to forward (appropriate for an archive), or reverse (as in a weblog or discussion forum) indexing?

member-dir.ht: Prototype for a member directory.
contrib-dir.ht: Prototype for a contributor's directory.

Naturally, specialized community sites like PenguinSong.net will have their own, specialized subdirectory prototypes, e.g. for song directories, song lists, and so on.

Forms

There are a couple of different ways of doing forms:

Inline, SparrowWeb style: Basically, clicking on an "edit icon" lets you add or edit an "item" -- table row, table entry, text block, etc. Requires a .ht file with extra attributes (not tags) defining field names and types. Requires a form prototype for every tag type to be edited.
Form-to-table: The data in the form makes an entry in a table (entries may have multiple lines, of course). The form may either be in a separate page or in the same page as the table.
Table-to-form: The form is actually derived from a table in the target page. Really something of a variant on the SparrowWeb style.
Form-to-cf: The data in the form ends up as a .cf property file. The form is basically just a page containing form tags; it can have documentation and so on.
Form-to-skeleton: The form is contained in a skeleton directory and generates the variable bindings that are used to process the skeleton. Keeping the resulting bindings in a skel.cf file in the result directory means that you can re-run it, but you lose edits if you do that.

The form-to-cf and form-to-skeleton variants are probably the easiest to implement, which is a good thing because we need them earliest.

Tracking Downloads

Note that the following discussion refers to "tracks" rather than "songs" -- any given song may be represented by multiple tracks, each corresponding to a particular performance. Note, too, that anything said about tracks applies equally well to images, books, and so on: basically any kind of download that you want to control access to.

We have several different goals for downloads:

Keep a count of the number of tracks downloaded, both by track (so that we can pay royalties) and by user (so that we can enforce quotas, if any, and pay dividends).
Treat rights-holders differently, allowing them unlimited access to their own tracks.
Allow a user to download multiple copies, possibly in multiple bitrates or formats, for a limited time. Typically a user will listen to a track once or twice, then (possibly) save it to local disk.
Give the rights-holders full DAV access to their tracks. Note that a song may have multiple rights-holders (songwriter, composer, performer, and their publishers).
Make it easy to browse collections of tracks, using DAV, without actually downloading them. Ideally it should be possible not only to browse the collection but to listen to short preview clips.

So here's what we'll do:

Each track will be contained in its own directory. The directory will contain the per-track statistics file (or else this will be in a database -- not clear), the preview clip, and the multiple versions. It will also contain a properties file with metadata, including pointers to the song page, albums, and so on.
Tracks will be represented in the rights-holders' song directories, not by the song directory, but by the preview clip (which will thus be DAV-browsable) and a parallel symlink to the track directory, except in the performer's or recording company's album directory, which is the appropriate place for the actual directory. (Alternatively, the name of the preview clip might be derived from the track ID.)
In HTML track lists, both the preview and the download directory will of course have links.
The directory symlink will have a distinctive extension, allowing it to have an explicit handler. (With luck, the handler will take control even if there's path information after it.) An alternative would be to have a specialized handler for the directory's index file.
The track handler, when a download is requested, will log the download and then create a temporary, unique symlink to the track directory, to which the user's browser is redirected.
The temporary symlink's URL path will be constructed deterministically from the track ID, the user name, and the date. Determinism means that all references to the track during the link's lifetime will go to the same place; the fact that the link already exists will keep the track from being counted again for accounting purposes. (Statistics may well be kept on a finer-grained basis on the track side; it's useful to know, for example, the typical number of downloads users make and what bitrates they prefer.)
A cron job will reap temporary symlinks after their lifetime (somewhere between an hour and a day) has expired.
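
The count-once behaviour in the steps above can be sketched as follows, in Python for illustration only. The function name, the list used as a stand-in for the download log, and the choice to create the link before logging (so that the existence check and the creation are a single atomic step) are assumptions, not part of the plan.

```python
import os

def grant_download(track_dir, link_path, log):
    """Create the temporary symlink for a download request.

    Returns True if the link was newly created (the download is counted),
    False if it already existed (a repeat request within its lifetime).
    Hypothetical helper: 'log' is any list-like download log.
    """
    parent = os.path.dirname(link_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    try:
        # symlink() fails if the link exists, so checking and creating
        # happen in one step and concurrent requests cannot both count.
        os.symlink(track_dir, link_path)
    except FileExistsError:
        return False
    log.append(link_path)
    return True
```

Either way the caller would then redirect the user's browser to the link's URL; only the accounting differs between the first request and later ones.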

The symlink URL path will probably have a sub-path of the form, e.g., mmdd/hh/xxxx where xxxx is a hash of the user's name. This will make it easy to find with minimal searching (of only the previous n+1 hours, where n is the lifetime), and minimize the number of directories that need to be searched for reaping.
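
A sketch of the mmdd/hh/xxxx sub-path and of the directories a lookup or the reaper would need to visit, in Python for illustration only. The function names, the choice of SHA-1 truncated to eight hex digits as the hash, and the assumption that the lifetime is a whole number of hours are all hypothetical; how the track ID enters the full path is not settled in the notes and is left out here.

```python
import hashlib
from datetime import datetime, timedelta

def link_subpath(user, when):
    """Return the mmdd/hh/xxxx sub-path for a user at a given time.

    Assumption: xxxx is the first 8 hex digits of the SHA-1 of the name.
    """
    digest = hashlib.sha1(user.encode("utf-8")).hexdigest()[:8]
    return "{}/{}".format(when.strftime("%m%d/%H"), digest)

def hours_to_search(now, lifetime_hours):
    """List the mmdd/hh directories covering the previous n+1 hours,
    newest first, where n is the link lifetime in hours."""
    return [
        (now - timedelta(hours=h)).strftime("%m%d/%H")
        for h in range(lifetime_hours + 1)
    ]
```

A lookup would try each directory from hours_to_search in turn with the user's hash appended; the reaper can remove any mmdd/hh directory that falls outside that list.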

Careful work with permissions will be required to make track directories directly accessible only to rights-holders (possibly only when logged in). This might, for example, involve an appropriate <Location> block for the temporary links that overrides the directories' .htaccess files.

Note that it would be easy to redirect track links to a different server or even to multiple different servers -- the ultimate extension would be a system where each artist with sufficient capacity serves their own tracks. You could even go further, to a Napster-like system in which listeners cache and serve recently-listened tracks. Unlike Napster it would be legal and tracks would have uniform, standardized metadata, but it would give the same sort of view of who has what and what their bandwidth is.

Some of the things we should be able to do include not only detailed statistics but, for users who opt in to the feature, the ability to contact other users currently or recently listening to the same song (possibly on other tracks).

$Id: notes.html,v 1.8 2002/09/27 14:50:48 steve Exp $

Stephen R. Savitzky <steve@theStarport.org>