Jack is a small program that traverses the local file system and collects keywords from the
hypertext files that the related Web server can access. Currently it takes keywords from the meta tags
of the .html pages and writes them out in the appropriate index format. Jack builds both the Lindex and the Cindex, in
that order, and can be run at a set time using crontab(1).
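
The keyword collection step described above can be sketched in Python as follows. This is only an illustrative sketch, not Jack's actual code: it assumes the keywords live in tags of the form <meta name="keywords" content="a, b, c"> and are separated by commas, which the text above does not state. The names MetaKeywordParser and extract_keywords are hypothetical.

```python
from html.parser import HTMLParser


class MetaKeywordParser(HTMLParser):
    """Collect keywords from <meta name="keywords" content="..."> tags.

    Assumption: keywords are comma-separated inside the content attribute.
    """

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "keywords":
            return
        for word in attr.get("content", "").split(","):
            word = word.strip()
            if word:
                self.keywords.append(word)


def extract_keywords(html_text):
    """Return the list of meta keywords found in one HTML document."""
    parser = MetaKeywordParser()
    parser.feed(html_text)
    parser.close()
    return parser.keywords
```

Writing the collected keywords in the Lindex and Cindex formats is left out here, because those formats are not described in this text.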
Jack has the following parameters:
-ht_dir <directory name> - specify the htdocs directory the
related Web server is using
-lindex <filename> - specify the lindex filename
-cindex <filename> - specify the cindex filename
-server <servername> - specify the name of the related server
-tmp <filename> - specify the temp. filename lindex can use
-logfile <filename> - specify the log filename lindex should use
-I <filename> - specify a file that lists additional directories
that jack should go through
-i <filename> - specify a file that lists the only directories Jack
should go through (not implemented yet)
+tilde - specify that paths should not be simplified to the ~ form
-h - help page
Jack traverses the file system in two ways. It can take directories specified by the user,
or it can traverse the Web directory of each user on the system by getting the usernames and
home directories from the password file.
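
As a rough illustration of the second mode, the sketch below reads text in the usual colon-separated password file layout and derives one candidate Web directory per user. It is an assumption-laden sketch rather than Jack's implementation: the function name user_web_dirs is hypothetical, and the sub-directory name public_html is taken from the path example later in this text and may differ on a given server.

```python
import posixpath


def user_web_dirs(passwd_text, web_subdir="public_html"):
    """Return (username, web directory) pairs from passwd-style text.

    Assumes the usual layout
    name:password:uid:gid:gecos:home:shell
    so the home directory is the sixth field.
    """
    dirs = []
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(":")
        if len(fields) < 6:
            continue  # malformed entry, ignore it
        username, home = fields[0], fields[5]
        if home:
            dirs.append((username, posixpath.join(home, web_subdir)))
    return dirs
```

On a live system the text could come from reading /etc/passwd, although systems that use a network directory service may not list every user there.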
In each directory, Jack first parses the files with a .htm or .html suffix and then descends into the
sub-directories to repeat the process. As a safeguard, Jack stops after descending 3 levels of
sub-directories, so that it cannot run out of control.
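
The traversal order and the depth limit can be sketched as below. This is a minimal sketch under stated assumptions, not Jack's own code: the name find_html_files is hypothetical, the starting directory is counted as level 0, and symbolic links to directories are not followed (the text above does not say how Jack treats them).

```python
import os

MAX_DEPTH = 3  # levels of sub-directories below the starting directory


def find_html_files(top, max_depth=MAX_DEPTH, _depth=0):
    """Yield .htm/.html files in `top` first, then those in sub-directories."""
    try:
        entries = sorted(os.scandir(top), key=lambda e: e.name)
    except OSError:
        return  # unreadable or missing directory: skip it
    subdirs = []
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            subdirs.append(entry.path)
        elif entry.is_file() and entry.name.lower().endswith((".htm", ".html")):
            yield entry.path
    # Only descend while fewer than max_depth levels have been entered.
    if _depth < max_depth:
        for subdir in subdirs:
            yield from find_html_files(subdir, max_depth, _depth + 1)
```

Files in the current directory are yielded before any sub-directory is entered, which matches the order described above.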
A problem arose with recording URLs while Jack was being written: some Web servers do not allow
the path of a document to be shortened (for example, /u/hon/twyt/public_html/ shortens to
~twyt/), so the +tilde option has been added to stop Jack from simplifying the path.
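
The path simplification, and the effect of switching it off, might look like the sketch below. The function name simplify_path, the home_dirs mapping and the simplify parameter are hypothetical; the sketch also assumes that each user's Web directory is the public_html sub-directory of the home directory, as in the example above.

```python
def simplify_path(path, home_dirs, web_subdir="public_html", simplify=True):
    """Shorten <home>/<web_subdir>/rest to ~<user>/rest.

    home_dirs maps usernames to home directories. Passing simplify=False
    models running Jack with +tilde: the path is returned unchanged.
    """
    if not simplify:
        return path
    for username, home in home_dirs.items():
        prefix = home.rstrip("/") + "/" + web_subdir + "/"
        if path.startswith(prefix):
            return "~" + username + "/" + path[len(prefix):]
    return path  # not under any user's Web directory


homes = {"twyt": "/u/hon/twyt"}
print(simplify_path("/u/hon/twyt/public_html/", homes))
print(simplify_path("/u/hon/twyt/public_html/", homes, simplify=False))
```

The first call prints the shortened form ~twyt/ and the second prints the full path, which is the behaviour wanted for servers that do not accept the shortened form.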