
			The Introducing AUB Document


	1.	What is aub?

	More and more people are posting binary files to usenet these days.
Some of these binaries are executables and audio data; a majority seem to
be pictures of various things, typically landscapes, movie stars and naked
people.  Because of limitations in the type of data that usenet can accommodate,
binaries must be encoded into text, and because binary files are commonly very
large relative to the text files usenet was designed to handle, they frequently
must be broken up into pieces.  Programs have been developed which take a
given binary, encode it, and automatically post it in pieces with descriptive
subject lines.

	When this data arrives at a remote site, users see subject lines
that look something like this:

		12011 roadkill03.gif, part 1/4
		12012 roadkill03.gif, part 3/4
		12013 More pictures of tatooed children, please...
		12014 Re: roadkill02.gif -- I love the way the eyes bulge out
		12015 roadkill03.gif, part 4/4
		12016 roseanne_nude.jpg, part 02 of 02
		12017 Only BINARIES should be posted here, GOD DAMMIT
		12018 roadkill03.gif, part 2/4
		12019 HI, I'M BIFF!!!!  THESE PIX ARE WAY COOL!!!!
		12020 roseanne_nude.jpg, part 01 of 02

	While the process of encoding and splitting up binaries for posting
to usenet is relatively straightforward, the process of retrieving, sorting,
and decoding the pieces (which do not necessarily arrive in order) at
receiving sites is tedious, time consuming, and very prone to human error.

	aub, which stands for "assemble usenet binaries", automates this
reassembly process for you.  aub is intended for use in newsgroups to which
binaries are posted exclusively.  When run, it accesses news articles via
either a disk-based news spool directory, or via an NNTP news server,
determines whether or not any new binaries have appeared in selected
newsgroups since the last time it was run, and if so, retrieves, organizes
and decodes them, depositing them in a configurable location.  This process
requires no human intervention once aub has been configured.  aub also keeps
track of binaries for which it has seen some, but not all, of the pieces.  It
remembers how to find these old pieces, so that when new, previously missing
pieces arrive at your site, it will build the entire binary the next time it
is run.  It also remembers which binaries it has already seen all of the
pieces of, so that it does not waste time rebuilding the same binaries
over and over again.

	aub was created as a time saver; too many people at too many sites
were spending way too much time manually unpacking binary files.  Its ability
to identify and assemble binary images depends on people posting images with
subject lines that observe (loosely) established conventions.  aub's
recognition capabilities have been significantly improved since the earliest
release.


	2.	How does aub work?

	aub looks for subject lines containing strings like:

		N of N
		N / N
		N  N
		N | N

	where N is any number composed of one or more digits, and white
space is optional.  Once it sees such a line, it tries to figure out a
name for the binary by looking at the rest of the subject line.  These names
are relevant only to aub's internal functioning; when unpacked, binaries are
named according to the information they were encoded with.  However, it's
important that, whatever internal name aub decides on for the binary, this
name be recognizable in the subject lines of all pieces.

	aub ignores all news articles with null subject lines and subject
lines that begin with "Re:" regardless of other content.
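
	As a rough illustration only (aub itself is a perl script, and this
is not its code), the subject-line test described above might be sketched
in Python like this.  The pattern and the name parse_subject are assumptions
made for the example; it covers only the "of", "/" and "|" separators, and
it treats "Re:" as case sensitive, which the text does not specify.

```python
import re

# Matches "N of N", "N / N" and "N | N", with optional white space.
# Illustrative pattern only; it is not the expression aub really uses.
PART_PATTERN = re.compile(r"(\d+)\s*(?:of|/|\|)\s*(\d+)", re.IGNORECASE)


def parse_subject(subject):
    """Return (part, total) for a subject line, or None if it is ignored."""
    subject = subject.strip()
    # Null subject lines and anything starting with "Re:" are skipped.
    if not subject or subject.startswith("Re:"):
        return None
    match = PART_PATTERN.search(subject)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

	Called on the sample subject "roadkill03.gif, part 1/4", this sketch
returns the pair (1, 4); the rest of the subject line would then be used to
derive the internal name.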

	aub uses two files which are maintained in the $AUBHOME directory.
One is $AUBHOME/aubconf, which is a configuration file that allows you to
customize aub's behavior.  See section 5 for a detailed explanation of the
structure of configuration files.  The other file is $AUBHOME/aubrc.  You
should never need to modify this file; aub creates it and maintains it.  It's
used to keep track of what articles in which groups aub has resolved
already, and what articles aub believes to be pieces of binaries that it
hasn't seen all of the pieces of yet.


    	3.	What do I need on my system to run aub?

	You will need Larry Wall's perl interpreter. If you don't
already have it, head to CPAN (http://www.cpan.org) and follow links
according to your operating system/configuration.

	Your machine must also have access to news, either via the NNTP
protocol, or by being able to open raw news files on a disk somewhere.
Previous versions of aub required that your news access be NNTP-based; this
restriction has since been lifted.


	4.	How do I install aub?

	There's really only one thing that you might need to configure.
aub is a perl script.  The first line of the program looks like this:

		#!/usr/bin/perl

	This tells your shell where to find the perl interpreter.  If
the path of perl on your system is something else, you'll need to
change this line, or create a link called /usr/bin/perl which points to
where your perl executable actually resides.

	If you need to change this, you'll probably see a message like:
'aub: Bad address.' when you try to run aub.


	5.	How do I configure aub?

	Very very old versions of aub made use of a configuration file
which was normally called $HOME/.aubinit.  But few interesting
customizations could be accomplished with .aubinit files, because the
configuration language was so primitive.  The configuration language
was redesigned to allow much greater flexibility.  Old .aubinit files
will no longer work, or be recognized by aub (except inasmuch as aub
will notice them and point out to you that you need to create a new
configuration file if you don't already have one.)  The new
configuration file for aub should be called $AUBHOME/.

	Configuration files are line-oriented; each line is processed
separately.  If any line contains the '#' character, aub concludes that
the character begins a comment, and discards the comment character and
everything on the line that follows it.  If for some reason you need to
put a '#' character in your configuration file and do not want it to be
interpreted as beginning a comment, you'll have to escape it by preceding it
with a backslash character, e.g. '\#'.

	Each non-blank line in a configuration file must begin with a
keyword recognized by aub.  The case of keywords is not significant.
As far as aub is concerned, "keyword", "KEYWORD", "Keyword" and
"KeYWorD" all mean the same thing.  Some keywords require arguments;
some require that no arguments appear, and some permit variable numbers of
arguments.  If aub sees keywords it doesn't understand in your aubconf
file, it will complain to you about them.
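
	The line handling described above (comment stripping, the '\#'
escape, and case-insensitive keywords) can be sketched in Python as follows.
This is a minimal sketch, not aub's parser: the function name
parse_config_line is hypothetical, and turning an escaped '\#' into a
literal '#' in the result is an assumption about what the escape is for.

```python
import re


def parse_config_line(line):
    """Split one aubconf-style line into (KEYWORD, [args]); None if blank."""
    # Cut the line at the first '#' that is not preceded by a backslash.
    match = re.search(r"(?<!\\)#", line)
    if match:
        line = line[:match.start()]
    # Assumption: an escaped '\#' stands for a literal '#' character.
    line = line.replace("\\#", "#")
    words = line.split()
    if not words:
        return None
    # Keyword case is not significant, so normalise it to upper case.
    return words[0].upper(), words[1:]
```

	With this sketch, "DiR /tmp/aub   # Default directory" and
"dir /tmp/aub" both come out as the keyword DIR with the single argument
/tmp/aub.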

	One of the keywords aub understands is the GROUP keyword.
It's used to tell aub that you want to decode binaries from the
newsgroup(s) which appear as argument(s) to the keyword.  For example:

		GROUP alt.binaries.pictures.misc
		GROUP alt.binaries.pictures.misc alt.binaries.pictures.fractals

	Every configuration file must contain at least one GROUP keyword to
be correct.

	In general, aub understands two types of keywords.  One type is
called 'position insensitive', which means that the keyword will have the
same effect no matter where in the configuration file it appears.  The
other type is called 'position sensitive', which means that the keyword
means something different when it appears before any GROUP keywords than
it does when it appears after any given GROUP keyword.

	One such position sensitive keyword is the DIRectory keyword.
This keyword is used to tell aub what directory to put binaries it decodes
in.  ("DIRectory" is spelled the way it is because only the 'DIR' part needs
to appear in a configuration file for aub to recognize it.  In fact, aub will
interpret any keyword beginning with the letters 'DIR' as being an instance
of the DIRectory keyword.)

	When a position sensitive keyword appears _before_ any GROUP keyword,
the keyword is interpreted as being the default for all groups that appear
later.

	When a position sensitive keyword appears _after_ any GROUP keyword,
it is interpreted as applying *only* to that group, overriding any previous
default which may have been established via use of the same keyword, or
by the value of environment variables (see section 8.)

	Position sensitive keywords appearing after a GROUP keyword which
lists multiple groups are applied only to the last group listed, not to
all groups appearing on the group line.

	For example, the following three configuration files are equivalent:

	# Sample aubconf file no. 1 -- basic example
	#
	dir /tmp/aub					# Default directory
	group alt.binaries.pictures.misc		# Process these
	group alt.binaries.pictures.fractals		#  two groups

        # Sample aubconf file no. 2 -- multiple group usage, mixed case
        #
        DiR /tmp/aub                                    # Default directory
        gRoUp alt.binaries.pictures.misc alt.binaries.pictures.fractals

        # Sample aubconf file no. 3 -- does not use defaults
        #
        group alt.binaries.pictures.misc
        directory /tmp/aub
        group alt.binaries.pictures.fractals
        direct-to /tmp/aub                           	# 'dir' is all you need

	The following three configuration files are also equivalent, though
not equivalent to the previous three:

        # Sample aubconf file no. 4 -- explicit placement of binaries
        #
        group alt.binaries.pictures.misc
        dir /tmp/aub/misc
        group alt.binaries.pictures.fractals
	dir /tmp/aub/fractals

        # Sample aubconf file no. 5 -- explicit and default placement
        #
        dir /tmp/aub/misc   				# Default directory
        group alt.binaries.pictures.misc		# Use default directory
        group alt.binaries.pictures.fractals
	dir /tmp/aub/fractals				# Override default

        # Sample aubconf file no. 6 -- explicit and default placement revisited
        #
        dir /tmp/aub/fractals 				# Default directory
        group alt.binaries.pictures.misc
	dir /tmp/aub/misc				# Override default
        group alt.binaries.pictures.fractals		# Use default directory

	The configuration file:

	# Sample aubconf file no. 7 -- invalid
	#
	group alt.binaries.pictures.misc
	dir /tmp/aub
	group alt.binaries.pictures.fractals		# No good

	is invalid, because no directory for aub to place binaries decoded
from the newsgroup alt.binaries.pictures.fractals is specified.  The
DIRectory keyword is unique in this regard; there must be some use of the
keyword that enables aub to figure out where to put binaries for every
group specified, or it will refuse to run.  The easiest way to deal with
this is to always establish a default directory by using the DIRectory
keyword somewhere before any groups appear.
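
	To make the default and override rules concrete, here is a small
Python sketch that works out which directory each group ends up with.  It
is an illustration under stated assumptions, not aub's own logic: the name
resolve_directories is hypothetical, comment stripping is naive (no '\#'
handling), and environment variables are ignored.

```python
def resolve_directories(config_text):
    """Return {group: directory} for aubconf-style text, or raise ValueError."""
    default_dir = None     # set by DIR lines seen before any GROUP line
    current_group = None   # last group named on the most recent GROUP line
    directories = {}       # group -> directory (None means "not set yet")

    for raw in config_text.splitlines():
        words = raw.split("#", 1)[0].split()   # naive comment stripping
        if not words:
            continue
        keyword, args = words[0].upper(), words[1:]
        if keyword == "GROUP" and args:
            for group in args:
                directories.setdefault(group, default_dir)
            # Later position sensitive keywords apply to the last group only.
            current_group = args[-1]
        elif keyword.startswith("DIR") and args:
            if current_group is None:
                default_dir = args[0]                  # default for later groups
            else:
                directories[current_group] = args[0]   # per-group override

    missing = [group for group, path in directories.items() if path is None]
    if missing:
        raise ValueError("no directory for: " + ", ".join(missing))
    return directories
```

	Fed sample files 1 and 3, this sketch produces the same mapping for
both; fed sample file 7, it raises an error because no directory can be
found for alt.binaries.pictures.fractals.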


	Other position sensitive keywords are available.


		DESCription <file>

	This keyword causes aub to extract text from what it thinks is the 
text portion of posted articles, and append it to the file you specify.  This
is useful if you're interested in reading the text that describes what all
the binaries aub is unpacking are about.  A maximum of 60 lines per binary
extracted will be put into the file you indicate.  Each description is
prepended with the name of the decoded binary it refers to, and the group
that binary was decoded from.


		HOOK <program>

	This keyword enables you to select which binaries aub decodes
using your own software.  If the HOOK keyword is specified, aub will
invoke the argument program and supply it, via standard input, with the
subject line of the first piece of a binary that it can potentially decode.
If the program returns true (zero), aub will decode the binary.  If the program
returns false (non-zero), aub will skip decoding the binary, and continue
processing.

	It is not (yet) possible to specify arguments to the user program.

	For example, the following sample program returns true if standard
input contains the string ".gif" (case insignificant), and false otherwise.

	#!/usr/bin/perl
	#
	# /tmp/sample_aub_hook: a simple, sample hook program
	#

	$sl = <STDIN>;                   # Read the subject line from standard input
	exit(0) if ($sl =~ m/\.gif/i);   # Contains ".gif" (dot escaped to match literally)
	exit(1);                         # Didn't see ".gif"

	Suppose this program were attached to aub via the configuration line:

		hook /tmp/sample_aub_hook

	Then aub would only decode binaries whose subject lines contain the
string '.gif'.

	You can write hook programs in any language you choose.


		POSTprocess <postprocessor> <extn> ...

	This keyword enables you to postprocess binaries whose names end
in the string <extn> (you can list any number of these suffixes on a single
line in the configuration file.)  Case is not significant in <extn>.  Before
a POSTprocess keyword can appear, <postprocessor> must first be defined
using the DEFine keyword, which is position insensitive.  The format of
the DEFine keyword is

		DEFine	<postprocessor> <unix cmd>

	<postprocessor> may be any string.  It's recommended that you
stick to alphanumerics.

	<unix cmd> is any UNIX command, with arguments.  When a binary is
decoded whose filename ends in one of the <extn> suffixes listed as arguments
to a POSTprocess keyword, aub performs simple substitutions on <unix cmd>
and then executes it.  The following example should make things much clearer.

	Consider the following configuration file:

	# Sample aub configuration file demonstrating use of a postprocessor
	#
	dir /tmp/aubdir
	define jpg2gif djpeg -G $f > $h_.gif
	postprocess jpg2gif .jpg .jpeg
	group alt.binaries.pictures.misc

	The first line tells aub that it should decode binaries into the
directory /tmp/aubdir.  The second line defines a postprocessor for aub.
The name of the postprocessor is specified as "jpg2gif".  The third line
says that the postprocessor will be invoked whenever a binary with a name
ending in '.jpg' or '.jpeg' is decoded.  The fourth line specifies the
group that binaries are to be decoded from.

	Suppose the binary full_moon.jpeg is decoded from
alt.binaries.pictures.misc.  The binary name "full_moon.jpeg" can be
thought of as consisting of three parts; the head part -- everything before
the last '.' character --  the '.' character itself, and the tail part --
everything after the last '.' character.  aub uses the abbreviations
'$h', '$t', and '$f' to refer to the head part, tail part, and entire
filename, respectively.  (If no '.' character appears in the name of a
decoded binary, $h equals $f, the entire name of the binary, and $t is
empty.)

	Because the binary name "full_moon.jpeg" ends in ".jpeg", one of the
arguments specified on line three of the sample configuration file, aub
invokes the postprocessor "jpg2gif".  aub substitutes the appropriate
values for '$f' and '$h', in this case, "full_moon.jpeg" and "full_moon",
into the postprocessor definition, and executes the resulting UNIX command,
which in this case is 'djpeg -G full_moon.jpeg > full_moon_.gif'.  Assuming
that you have the djpeg program on your machine (this software is available
via anonymous FTP from ftp.uu.net under the graphics/jpeg directory), this
command will cause the .jpeg file to be automatically converted into a
similarly named .gif file when it is decoded.

	A few more examples, again based on the configuration file above:

   Filename of decoded binary        $h		$t		$f
------------------------------------------------------------------------------
	crescent_moon.jpg	crescent_moon	jpg	crescent_moon.jpg
	big.dog.gif		big.dog		gif	big.dog.gif

   Filename of decoded binary	Postprocessed         Reason
------------------------------------------------------------------------------
	crescent_moon.jpg	   yes       $f ends in '.jpg'
	big.dog.gif		   no	     $f doesn't end in '.jpg' or in
					      '.jpeg'

    Filename of decoded binary	UNIX command executed
------------------------------------------------------------------------------
	crescent_moon.jpg	djpeg -G crescent_moon.jpg > crescent_moon_.gif
	big.dog.gif		(none executed)
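
	The substitution shown in the tables above can be sketched in Python
as follows.  This is only an illustration of the '$h', '$t' and '$f' rules
as described; the function names are hypothetical, and the sketch builds the
command string without running it.  Note that it does no shell quoting, so
a filename containing shell metacharacters would need extra care before the
result was handed to a shell.

```python
import re


def split_name(filename):
    """Return (head, tail), split at the last '.'; tail is '' if none."""
    head, dot, tail = filename.rpartition(".")
    if not dot:
        return filename, ""      # no '.', so $h equals $f and $t is empty
    return head, tail


def expand_command(template, filename):
    """Substitute $h, $t and $f in a postprocessor definition."""
    head, tail = split_name(filename)
    values = {"h": head, "t": tail, "f": filename}
    return re.sub(r"\$([htf])", lambda m: values[m.group(1)], template)


def postprocess_command(template, suffixes, filename):
    """Return the expanded command, or None if no suffix matches."""
    lowered = filename.lower()   # case is not significant in <extn>
    if any(lowered.endswith(suffix.lower()) for suffix in suffixes):
        return expand_command(template, filename)
    return None
```

	For the jpg2gif definition and the suffixes '.jpg' and '.jpeg', the
sketch yields a command for crescent_moon.jpg and None for big.dog.gif,
matching the tables.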


	We could just as easily have written:

		define jpg2gif djpeg -G $f > $h_.gif ; rm -f $f

	to cause aub to remove the old .jpeg version of the binary after
converting it to .gif format.

	I've added the extra underscore character in this example to
decrease the chance that djpeg, when it runs, will clobber another
binary which aub already unpacked with the name "full_moon.gif" or
"crescent_moon.gif".

	Postprocessor definitions that can't be executed for some reason
may cause you (and aub) some problems at run time.


	The following keywords are, like DEFine, position insensitive:


		NNTP <server>

	This tells aub that your news access is NNTP-based, and that
it should use the specified host as an NNTP server. If you don't
include this line, aub will try to use the NNTPSERVER environment
variable.


		SPOOL <directory>

	This tells aub that your news access is based on access to raw news
files, and that <directory> is the root of the news spool tree.

	A single configuration file may not contain both the NNTP and SPOOL
keywords.

	If neither the NNTP keyword nor the SPOOL keyword appear in your
configuration file, aub will assume your news access is via NNTP and use
your NNTPSERVER environment variable, if it is defined, to decide what
server to connect to.  If your NNTPSERVER environment variable is not
defined, aub will try to figure out where you normally read news from.
If it can't do that, it will ask you to supply the information.

	If you ever change the mechanism by which you access news, or the
server you read news on, you'll need to remove the aubrc file that aub
maintains to keep track of what groups you have and have not read.  Otherwise,
because articles are numbered differently on different servers, aub will get
hopelessly confused.  (It's possible, though not recommended, to switch
seamlessly back and forth between NNTP and SPOOL access to news on the
same host.)  This is probably the only time you'll ever want to tamper with
an aubrc file.


		DEBUG <n>

	Sets the default debugging level aub runs at to <n>.  <n> must be a
non-negative integer.  Debugging level 0 is the default; when run at
debugging level zero, aub produces no output unless it runs into serious
problems.  Setting the debugging level to 1 will tell you about what aub is
doing.  Setting the debugging level to 2 will tell you even more about what
aub is doing.  Setting the debugging level to 3 or higher will show you
more than you ever wanted to know.


		RECognize <extn> ...

	The recognition code (the part of aub that identifies binaries)
maintains a list of common suffixes that it uses to recognize binaries
while it scans subject lines.  For example, many binaries have names ending
in ".gif", so ".gif" is on aub's internal list of hints.  The RECognize
keyword allows you to add suffixes to this internal list of hints.

	Use this capability sparingly.  You can really give aub a coronary
by saying something like 'rec a b c d e f g ...'.  Doing something foolish
like that will cause your aub to lose the ability to assemble things that it
would otherwise have been able to.

	The current list of common suffixes aub maintains is:

	".gif", ".jpg", ".jpeg", ".gl", ".zip", ".au", ".zoo", ".exe", ".dl",
	".snd", ".mpg", ".mpeg", ".tiff", ".lzh", ".wav"


		ONLYRECognize <extn> ...

	Works like RECognize, except that this list will not add to,
but will replace the current list of extensions, and aub will only
extract files that match one of the given extensions. Example:
'ONLYRECOGNIZE .zip .mp3 .gif' will only retrieve an article if its
subject line contains at least one of the strings ".zip", ".mp3" or
".gif". All other articles will be ignored.
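
	The difference between RECognize and ONLYRECognize can be pictured
with the short Python sketch below.  It is a guess at the behaviour from the
descriptions above, not aub's code: the function names are hypothetical,
matching is done case-insensitively as an assumption, and letting
ONLYRECognize win when both keywords are given is also an assumption.

```python
# The built-in hint list quoted in the RECognize section.
DEFAULT_SUFFIXES = [
    ".gif", ".jpg", ".jpeg", ".gl", ".zip", ".au", ".zoo", ".exe", ".dl",
    ".snd", ".mpg", ".mpeg", ".tiff", ".lzh", ".wav",
]


def build_suffix_list(recognize=(), only_recognize=()):
    """Return the suffix list in effect after both keywords are applied."""
    if only_recognize:
        # ONLYRECognize replaces the built-in list outright.
        return list(only_recognize)
    # RECognize adds to the built-in list, skipping duplicates.
    extra = [s for s in recognize if s not in DEFAULT_SUFFIXES]
    return DEFAULT_SUFFIXES + extra


def subject_matches(subject, suffixes):
    """True if the subject line contains at least one of the suffixes."""
    lowered = subject.lower()
    return any(suffix.lower() in lowered for suffix in suffixes)
```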


		SAMple <n>

	For each group, examine at most <n> messages.  Note that fewer
than <n> messages may be loaded if the newsgroup is missing articles.


		NOXHDR

	This keyword is meaningful only if your news access is NNTP-based.
It will cause aub to not use the XHDR command to access the subject lines
of news articles, even if the NNTP server you're using has XHDR capability.


		USER <name>

	This keyword is meaningful only if your news access is
NNTP-based. It enables NNTP authentication and provides the name you
will use to log in.


		PASS <password>

	The companion to USER, this keyword is used to provide the
password for NNTP authentication.


		SKIPunresolved

	This keyword will have aub completely ignore any unresolved
articles. On large groups, this can be a huge speed-up, but it will
also result in aub ignoring any file that is incomplete at the time
you run aub but may be completed later. Be careful when you use
this, as it will update the aubrc file as if the unresolved articles had
all been checked and come up negative. In other words, if you run this
once, any information you had about unresolved articles before running
it is thrown away. Using this keyword is not recommended, but it's
here per a user request.

	If the same keyword appears multiple times, and the second
appearance is not a position sensitive override of some established default,
then aub ignores the second instance of the keyword.


	7.	How do I use aub?

	After you've built your configuration file, just run 'aub'.

	If this is the first time you've run aub since v2.1.1, you may
want to undefine any AUB-related environment variables you had set.  These
variables are interpreted differently now.  See section 8.  You will not
need to remove your aubrc file, but your .aubinit file is no longer useful
and you'll probably want to get rid of it once you've created aubconf.

	If this is the first time you've run any version of aub, ever, you
may want to use the '-c' command line option.  Or you may not...see section 9.


	8.	Environment variables used by aub.

	$AUBHOME	Sets the directory containing configuration
			files, rc file, etc.

	$AUBDIR		Sets the default directory binaries are unpacked into.
			Equivalent to specifying a DIRectory keyword before
			any GROUP keywords.  Will override any DIRectory
			keyword appearing before any GROUP keyword, but not
			those appearing after a GROUP keyword.

	$AUBDESC	Analogous to $AUBDIR

	$AUBHOOK	Analogous to $AUBDIR

	$NNTPSERVER	Specifies an NNTP server to use for news access if
			no NNTP keyword appears in the configuration file.
			If an NNTP keyword does appear, $NNTPSERVER is
			ignored.

	Note that $AUBGROUPS is no longer used as of version 2.1.2.

	If aub doesn't seem to be doing what you'd expect it to do based
on your aubconf file, it could be because your environment variables
are causing defaults you've established there to be ignored.


	9.	Command line options supported by aub:

	-c		'Catch-up' mode; aub will bring its internal
			pointers (and your aubrc file) up to date, but will
			not actually generate any binaries.  This is useful
			when you run aub for the first time; it keeps it
			from generating megabytes and megabytes, as it scans
			old news articles.

	-n		'No-checkpoint' mode; prohibits aub from updating
			its internal pointers (your aubrc file).  This option
			is primarily useful only during debugging.

	-dn		'Debug' mode; sets the debugging level to N.  This
			overrides the debugging level set in the configuration
			file, except that 'aub -d0' does not work...this is a
			bug.

	-M		Causes aub to print the long form of the documentation
			(this document.)

	-m		Causes aub to print a summary of the documentation.

	-C		Lists significant changes since the last major
			release of aub.

	-v		Print version number.


	10.	What do I do if I have problems installing or configuring aub?

	See if you can figure out what the problem is.  I've only set aub
up on my local system, so it's possible you could have problems I haven't
foreseen.  If you really can't get it to work, try talking to a friend who
knows systems programming and administration type stuff.  Offer your friend
food -- systems people especially like dim sum and Heineken.

	You could also send me mail.  Whether or not I answer your mail will
depend a lot on how busy I am.  Sorry, but I have an obligation to get work
done promptly for my client, who's paying me for my time.  I can't really deal
with supporting aub on the side for the entire net.  Also, if your problem
has to do with peculiarities of your local site, there may not be a lot I
can do about it.


	11.	What else do I need to know?

	In order to guarantee proper administration of the aubrc file,
you can only run one instance of aub at a time.  In this respect aub is
similar to most newsreaders.

	The first time you run aub over a given group, if you choose not to
use the -c option, it may take a long time to run.  This is because it's
looking at all of the articles in the group, and building lots of binaries.
After you run it for the first time, it only needs to look at new stuff in
the group.  Things will go much faster after that.

	If aub assembles two binaries with the same name, and wants to store
them in the same place, it will compare them to see whether or not they're
identical.  If they are identical, it will discard the newer copy.  If
they're not identical, it will append '+' characters as necessary to the
name of the second binary until the name is unique.
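
	A minimal Python sketch of this naming rule follows.  It is not
aub's implementation: the function name store_binary is hypothetical, and
comparing the new data against every existing candidate name (not just the
first one) is an assumption about how "as necessary" plays out.

```python
import os


def store_binary(directory, name, data):
    """Write data under a unique name; return the path, or None if duplicate."""
    path = os.path.join(directory, name)
    while os.path.exists(path):
        with open(path, "rb") as existing:
            if existing.read() == data:
                return None      # identical copy already stored; discard
        path += "+"              # names differ in content; try the next name
    with open(path, "wb") as out:
        out.write(data)
    return path
```

	Storing three different binaries that all claim the name "a.gif"
would, under this sketch, leave files called "a.gif", "a.gif+" and
"a.gif++" in the target directory.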

	aub checkpoints its progress in the aubrc file after processing
each group.  This keeps it from having to start all over again if it dies
of a signal, expired CPU time limit, etc...

	aub takes liberties with changing around the names of binaries
that it doesn't particularly like.  It may rename binaries to be called
"Mangled" if people post things that are supposed to be unpacked to "." or
"..", or something equally obnoxious, for instance.  It will drop the
leading "." off of binaries called ".something", and relativize pathnames
so that your binaries always wind up in the directories you want them in.
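
	The kind of renaming described above might look like the Python
sketch below.  Treat it as a guess: the function name sanitize_name is
hypothetical, "relativizing" is modelled here simply as keeping the final
path component, and stripping every leading '.' (rather than exactly one)
is an assumption.

```python
import posixpath


def sanitize_name(name):
    """Return a safe file name for a decoded binary."""
    # Assumption: keep only the last path component, so that the file
    # cannot land outside the configured directory.
    base = posixpath.basename(name.rstrip("/"))
    if base in ("", ".", ".."):
        return "Mangled"
    # Drop leading dots so that ".something" becomes "something".
    return base.lstrip(".") or "Mangled"
```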

	It's unfriendly to run aub so often that you occupy too much of your
news server's time.

	It's pronounced "oww-buh", as in "S(au)di", not "awe-buh", as in
"sl(aw)".

	This software is offered as-is, with no guarantees or promises made
by me whatsoever.  I disclaim all responsibility for loss or damage caused
by the program.


	Original Author (08/1992):	Mark Stantz <stantz@sgi.com>
        Contributing Authors:           Bradon Long <blong@fiction.net>
                                        Mako Hill <mako@debian.org>
	[ Current Version (11/2001):	Avinash Chopde <avinash@acm.org> ]

