AMANDA DUMPER API

Last modified $Date: 1998/10/06 17:17:00 $

by Alexandre Oliva <oliva@dcc.unicamp.br>

1. INTRODUCTION

This is a proposal for a mechanism that lets Amanda support arbitrary
backup programs.  It relies on a generic backup driver and on scripts
or programs that interface with backup programs such as dump, tar,
smbclient, and others.  It can also be used to introduce pre- and
post-backup commands.

The interface is simple, but supports everything that is currently
supported by Amanda, and it can be consistently extended to support
new abstractions that may be introduced in the backup driver in the
future.

This proposal does not imply any modification in the Amanda protocol
or in Amanda servers; only Amanda clients have to be modified.  By
Amanda clients, we refer to hosts whose disks are to be backed up;
an Amanda server is a host connected to a tape unit.

Currently (as of release 2.4.1 of Amanda), Amanda clients support
three operations: selfcheck, estimate and backup.

Selfcheck is used by the server program amcheck, to check whether a
client is responding or if there are configuration or permission
problems in the client that might prevent the backup from taking
place.

Estimates are requested by the Amanda planner, which runs on the server
and collects information about the expected sizes of backups of each
disk at several levels.  Given this information and the amount of
available tape space, the planner can select which disks and which
levels it should tell dumper to run.

Dumper is yet another server-side program; it requests clients to
perform dumps, as determined by planner, and stores these dumps in
holding disks or sends them directly to the taper program.  The
interaction between dumper and taper is beyond the scope of this text.

We are going to focus on the interaction between the Amanda client
program and wrappers of dump programs.  These wrappers must implement
the DUMPER API.  The dumptype option `program' should name the wrapper
that will be used to back up filesystems of that dumptype.  One
wrapper may call another, so as to extend its functionality.

2. THE PROBLEM

Different backup programs present distinct requirements; some must be
run as super-user, whereas others can be run under other user-ids.
Some require a directory name, the root of the tree to be backed up;
others prefer a raw device name; some don't even refer to local disks
(SAMBA).  Some wrappers may need to know a filesystem type in order to
decide which particular backup program to use (dump, vdump, vxdump,
xfsdump, backup).

Some provide special options for estimates, whereas others must be
started as if a complete dump were to be performed, and must be killed
as soon as they print an estimate.

Furthermore, the output formats of these backup programs vary wildly.
Some will print estimates and total sizes in bytes, in 512-byte tape
blocks units, in Kbytes, Mbytes, Gbytes, and possibly Tbytes in the
near future.  Some will print a timestamp for the backup; some won't.

There are also restrictions related to possible scheduling policies.
For example, some backup programs only support full backups or
incrementals based on the last full backup (0-1).  Some support full
backups or incrementals based on the last backup, be it a full or an
incremental backup (0-inf++).  Some support incrementals based on a
timestamp (incr/date); whereas others are based on a limited number of
incremental levels, but incrementals of the same level can be
repeated, such as dump (0-9).

Amanda was originally built upon DUMP incremental levels, so this is
the only model it currently supports.  Backup programs that use other
incremental management mechanisms had to be adapted to this policy.
Wrapper scripts are responsible for this adaptation.

Another important issue has to do with index generation.  Some backup
programs can generate indexes, but each one lists files in its own
particular format.  Indexes must be stored in a common format, so that
the Amanda server can manipulate them.

The DUMPER API must accommodate all these variations.

3. OVERVIEW OF THE API

We are going to define a standard format of argument lists that the
backup driver will provide to wrapper programs, and the expected
result of the execution of these wrappers.

The first argument to a wrapper should always be a command name.  If
no arguments are given, or an unsupported command is requested, an
error message should be printed to stderr, and the program should
terminate with exit status 1.
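
The dispatch rule above can be sketched in Python as follows.  This is
only an illustration: the function name `run_wrapper' and the handler
table are assumptions made for the example, not part of the proposal.
Only the message on stderr and the exit status 1 come from the text.

```python
import sys

def run_wrapper(argv, handlers, err=sys.stderr):
    """Dispatch argv[1] to a handler and return the wrapper's exit status.

    `handlers` maps command names to callables that take the remaining
    arguments and return an exit status (hypothetical structure).
    """
    if len(argv) < 2:
        # No command name given.
        print("usage: %s command [args...]" % argv[0], file=err)
        return 1
    command, args = argv[1], argv[2:]
    handler = handlers.get(command)
    if handler is None:
        # Unsupported command requested.
        print("%s: unsupported command `%s'" % (argv[0], command), file=err)
        return 1
    return handler(args)
```

A real wrapper would end with something like
`sys.exit(run_wrapper(sys.argv, handlers))'.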

3.1.  The `support' command

As a general mechanism for Amanda to probe for features provided by a
backup program, a wrapper script must support at least the `support'
command.  Some features must be supported, and Amanda won't ever ask
about them.  Others will be considered as extensions, and Amanda will
ask the wrapper whether they are supported before issuing the
corresponding commands.

3.1.1. The `level-incrementals' subcommand

For example, before requesting an incremental backup of a given
level, Amanda should ask the wrapper whether the backup program
supports level-based incrementals.  We don't currently support backup
programs that don't, but we may in the future, so it would be nice if
wrappers already implemented the command `support level-incrementals'
by returning a 0 exit status and printing the maximum incremental
level they support, e.g., 9.  A sample session would be:

% /usr/local/amanda/libexec/wrappers/DUMP support level-incrementals hda0
9

Note that the result of this support command may depend on filesystem
information, so the disklist filesystem entry should be specified as a
command line argument.  In the next examples, we are not going to use
full pathnames to wrapper scripts any more.

We could have defined a `support' command for full backups, but I
can't think of a backup program that does not support full backups...

3.1.2. The `index' subcommand

The ability to produce index files is also probed through the
`support' command.  When the support subcommand is `index', as in
the invocation below, the wrapper must print a list of valid indexing
mechanisms, one per line, most preferred first.  If indexing is not
supported, nothing should be printed, and the exit status should be 1.

	DUMP support index hda0

The currently known indexing mechanisms are:

output: implies that the command `index-from-output' generates an
index file from the output produced by the backup program (for
example, from `tar -cv').

image: implies that the command `index-from-image' generates an index
file from a backup image (for example, `tar -t').

direct: implies that the `backup' command can produce an index file as
it generates the backup image.

parse: implies that the `backup-parse' command can produce an index
file as it generates the backup formatted output.

The indexing mechanisms will be explicitly requested with the additional
option `index-<mode>' in the `backup' and `backup-parse' command invocation.

`index-from-image' should be supported, if possible, even if other
index commands are not, since it can be used in the future to create
index files from previously backed up filesystems.  
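
A wrapper might answer `support index' along the lines of the sketch
below.  The mechanism names are those listed above; the function name,
its parameters, and the exit status 0 on success (by analogy with
`support level-incrementals') are assumptions of this example.

```python
import sys

# Mechanism names taken from the list above.
KNOWN_MECHANISMS = ("output", "image", "direct", "parse")

def support_index(mechanisms, out=sys.stdout):
    """Print the supported indexing mechanisms, most preferred first.

    `mechanisms` is the wrapper's own preference-ordered list
    (hypothetical parameter).  Returns the exit status.
    """
    valid = [m for m in mechanisms if m in KNOWN_MECHANISMS]
    if not valid:
        return 1  # indexing not supported: print nothing
    for name in valid:
        print(name, file=out)
    return 0
```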

3.1.3. The `parse-estimate' subcommand

The `parse-estimate' support subcommand prints a list of valid mechanisms
for parsing the estimate output and writing the estimated size to standard
output.  The two mechanisms are:

direct: implies that the `estimate' command can produce the estimate output.

parse: implies that the `estimate-parse' command can produce the estimate
output when fed with the `estimate' output.

The estimate parsing mechanisms will be explicitly requested with the
additional option `estimate-<mode>' in the `estimate' and
`estimate-parse' command invocation.

3.1.4. The `parse-backup' subcommand

The `parse-backup' support subcommand prints a list of valid mechanisms
for parsing the backup stderr.  The two mechanisms are:

direct: implies that the `backup' command can produce the
backup-formatted-output.

parse: implies that the `backup-parse' command can produce the
backup-formatted-output when fed with the `backup' stderr.

The backup parsing mechanisms will be explicitly requested with the
additional option `backup-<mode>' in the `backup' and `backup-parse'
command invocation.

3.1.5. Other subcommands

Some other standard `support' sub-commands are `exclude' and
`exclude-list'.

3.1.6. Why not a single `support' command

One may think (and several people did :-) that there should be only
one support command, that would print information about all supported
commands.  The main arguments against this proposal have to do with
extensibility:

1) the availability of commands might vary from filesystem to
filesystem.  No, I don't have an example, I just want to keep it as
open as possible :-)

2) one support subcommand may require command line arguments that
others don't, and we can't know in advance what these command line
arguments are going to be

3) the output format and exit status conventions of a support command
may vary from command to command; the only pre-defined convention is
that, if a wrapper does not know about a support subcommand, it should
return exit status 1, implying that the inquired feature is not
supported.
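
As a rough sketch of that one pre-defined convention, a `support'
handler can answer the subcommands it knows and return 1 for
everything else.  The function name and the constant are assumptions
of this example; the value 9 is the one used in the sample session of
section 3.1.1.

```python
import sys

MAX_LEVEL = 9  # highest incremental level, as in the sample session

def support(args, out=sys.stdout):
    """Handle `support <subcommand> <disk> ...'; return the exit status."""
    if not args:
        return 1
    subcommand = args[0]
    if subcommand == "level-incrementals":
        print(MAX_LEVEL, file=out)
        return 0
    # Unknown subcommand: the inquired feature is not supported.
    return 1
```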

3.2. The `selfcheck' command

We should support commands to perform self-checks, run estimates,
backups and restores (for future extensions of the Amanda protocol
so as to support restores).

A selfcheck request would go like this:

	DUMP selfcheck hda0 option option=value ...

The options specified as command-line arguments are dumptype options
enabled for that disk, such as `index', `norecord', etc.  Unknown
options should be ignored.  For each successful check, a message such
as one of the following should be printed:

OK [/dev/hda0 is readable]
OK [/usr/sbin/dump is executable]

Errors should be printed as:

ERROR [/etc/dumpdates is not writable]

If selfcheck needs super-user (or some other user, for that matter)
access to perform some tests, it should print to the standard output
either or both of:

USER root
GROUP operator

The backup driver should then arrange to re-run the script as the
specified user/group.  Security concerns may impose restrictions on
privileges that can be given to wrapper scripts.  For example, we may
require that, in order to run a wrapper script as any other user or
group, the wrapper script must be in a separate directory, say
/usr/local/amanda/libexec/wrappers-protected, and that the script, its
containing directory and all its parents must only be writable by
root.

The need for starting programs as other users requires amandad (that
will incorporate all the functionality from selfcheck, sendsize and
sendbackup) to be setuid-root.  However, it will fork a child process
and drop to the amanda user privileges as soon as possible.  This
child process will be driven through a pipe, and it will be able to
start services as other users, in a way that no other user, not even
the backup operator, will be able to run arbitrary commands.


A wrapper script will certainly have to figure out either the disk
device name or its mount point, given a filesystem name such as
`hda0', as specified in the disklist.  In order to help these scripts,
Amanda provides a helper program that can guess device names, mount
points and filesystem types, when given disklist entries.

The filesystem type can be useful on some operating systems, in which
more than one dump program is available; this information can help
automatically selecting the appropriate dump program.


The exit status of selfcheck and of this alternate script are probably
going to be disregarded.  Anyway, for consistency, selfcheck should
return exit status 0 for complete success, 1 if any failures have
occurred and 2 if it needs additional permissions (USER/GROUP).  Note
that, if the wrapper needs a special permission to perform a test, it
should not report a failure for that test.
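
The reporting format and exit statuses of selfcheck could be
implemented roughly as below.  The function signature and the shape of
the `checks' list are assumptions of this sketch, as is the choice to
skip all checks when other privileges are requested; the proposal only
fixes the OK/ERROR/USER/GROUP lines and the statuses 0, 1 and 2.

```python
import os
import sys

def selfcheck(checks, out=sys.stdout, need_user=None, need_group=None):
    """Report checks in OK/ERROR form and return the exit status.

    `checks` is a hypothetical list of (path, mode, description)
    tuples, where `mode` is an os.access flag such as os.R_OK.
    """
    if need_user or need_group:
        # Ask the backup driver to re-run us with other privileges.
        if need_user:
            print("USER %s" % need_user, file=out)
        if need_group:
            print("GROUP %s" % need_group, file=out)
        return 2
    failed = False
    for path, mode, description in checks:
        if os.access(path, mode):
            print("OK [%s is %s]" % (path, description), file=out)
        else:
            print("ERROR [%s is not %s]" % (path, description), file=out)
            failed = True
    return 1 if failed else 0
```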

3.3. The `estimate' and `estimate-parse' commands

Estimate requests can take several different forms.  An estimate of a
full backup may be requested, or estimates for level- or
timestamp-based incrementals:

  DUMP estimate full hda0 option ...
  DUMP estimate level 1 hda0 option ...
  DUMP estimate diff 1998:09:24:01:02:03 hda0 option ...


If the backup program needs privileged access to obtain estimates, it
should just print:

USER root
GROUP operator

and exit, with exit status 2.  If the requested estimate type is not
supported, exit status 3 should be returned.
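
The three request forms shown above could be parsed with a helper such
as the one below.  The function name and the returned tuple layout are
assumptions of this sketch; a wrapper would exit with status 3 when
the helper returns None or when it does not implement the returned
request type.

```python
def parse_request(args):
    """Split estimate arguments into (kind, value, disk, options).

    Returns None for a request form this sketch does not recognize.
    """
    if len(args) >= 2 and args[0] == "full":
        return ("full", None, args[1], args[2:])
    if len(args) >= 3 and args[0] == "level" and args[1].isdigit():
        return ("level", int(args[1]), args[2], args[3:])
    if len(args) >= 3 and args[0] == "diff":
        # The timestamp is kept as the string given on the command line.
        return ("diff", args[1], args[2], args[3:])
    return None
```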

If the option `estimate-direct' is set, then the `estimate' command
should write to stdout the estimated size, either in bytes or as a
pair of numbers that, multiplied by one another, yield the estimated
size in bytes.
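
On the driver side, reading such a size could look like the sketch
below.  It accepts either a single byte count or a pair of factors, as
with the `# ' line of the backup output in section 3.4.  The function
name is hypothetical, and whitespace as the separator between the two
factors is an assumption, since the proposal does not specify one.

```python
def parse_size(text):
    """Return a size in bytes from 'N' or from 'A B' (meaning A * B)."""
    fields = text.split()
    if len(fields) == 1:
        return int(fields[0])
    if len(fields) == 2:
        return int(fields[0]) * int(fields[1])
    raise ValueError("expected one or two numbers: %r" % text)
```

The pair form lets a wrapper report, for instance, a block count and a
block size without doing the multiplication itself.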

If the option `estimate-parse' is set, then the `estimate' command
should write to stdout the information needed by the
`estimate-parse' command, which should extract from its input the
estimated size.

The syntax of `estimate-parse' is identical to that of `estimate'.

Both `estimate' and `estimate-parse' can output the word `KILL', after
printing the estimate.  In this case, Amanda will send a SIGTERM
signal to the process group of the `estimate' process.  If it does not
die within a few seconds, a SIGKILL will be issued.
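
The termination sequence could be sketched as follows.  This is not
Amanda's implementation: the function name is hypothetical, the
5-second default stands in for "a few seconds", and the sketch assumes
the `estimate' process was started as the leader of its own process
group.

```python
import os
import signal
import subprocess

def stop_process_group(proc, grace=5.0):
    """Send SIGTERM to proc's process group, then SIGKILL if needed.

    Assumes `proc` is a subprocess.Popen started with
    start_new_session=True, so its process group id equals its pid.
    Returns the child's return code.
    """
    os.killpg(proc.pid, signal.SIGTERM)
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        # The process ignored SIGTERM; force termination.
        os.killpg(proc.pid, signal.SIGKILL)
        return proc.wait()
```

Signalling the whole group matters because a wrapper typically runs
the real backup program as a child, and killing only the wrapper would
leave that child running.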

If `estimate' or `estimate-parse' succeeds, it should exit 0;
otherwise it should exit 1, except for the already listed cases of
exit status 2 and 3.

3.4. The `backup' and `backup-parse' commands

The syntax of `backup' is the same as that of `estimate'.  The backup
image should be written to standard output, whereas stderr should be
used for the user-oriented output of the backup program and other
messages.

If the option `backup-direct' is set, then the `backup' command should 
write to stderr a formatted-output-backup.

If the option `backup-parse' is set, then the `backup' command
should write to stderr the information needed by the `backup-parse'
command, which should edit its input so that it prints to standard
output a formatted-output-backup.

If the option `no-record' is set, then the `backup' command should
not modify its state file (e.g., dump should not modify /etc/dumpdates).

The syntax of `backup-parse' is identical to that of `backup'.

The syntax of the formatted-output-backup is as follows:
All lines should start with either `| ' for normal output, `? ' for
strange output or `& ' for error output.  If the wrapper can determine
the total backup size from the output of the backup program, it should
print a line starting with `# ', followed by the total backup size in
bytes or by a pair of numbers that, multiplied, yield the total backup
size; this number will be used for a consistency check.
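
A consumer of this format might classify lines and extract the total
size as in the sketch below.  The function names and the kind labels
are assumptions of this example, as is whitespace as the separator
between the two factors of a size pair.

```python
# Line prefixes from the description above, mapped to illustrative labels.
PREFIXES = {"| ": "normal", "? ": "strange", "& ": "error", "# ": "size"}

def classify(line):
    """Return (kind, text) for one formatted line, or None if malformed."""
    kind = PREFIXES.get(line[:2])
    if kind is None:
        return None
    return (kind, line[2:].rstrip("\n"))

def total_size(lines):
    """Return the size in bytes from the first `# ' line, or None."""
    for line in lines:
        parsed = classify(line)
        if parsed is None or parsed[0] != "size":
            continue
        factors = [int(field) for field in parsed[1].split()]
        if len(factors) == 1:
            return factors[0]
        if len(factors) == 2:
            return factors[0] * factors[1]
        return None
    return None
```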

The option `index-direct' should cause the `backup' command to output
the index directly to file descriptor 3.  The option `index-parse'
should cause the `backup-parse' command to output the index directly
to file descriptor 3.  The syntax of the index file is described in
the next section.

3.5. The `index-from-output' and `index-from-image' commands

The syntax of the `index-from-output' and `index-from-image' commands
is identical to that of `backup'.  They are fed the backup output
or image, and they must produce a list of files and directories, one
per line, to the standard output.  Directories must be identified by
the `/' termination.

After the file name and a blank space, any additional information
about the file or directory, such as permission data, size, etc., can
be added.  For this reason, blanks and backslashes within filenames
should be quoted with backslashes.  Linefeeds should be represented as
`\n', although it is not always possible to distinguish linefeeds in
the middle of filenames from ones that separate one file from another,
in the output of, say `restore -t'.  It is not clear whether we should
also support quoting mechanisms such as `\xHH', `\OOO' or `\uXXXX'.
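
The quoting rule can be sketched as a pair of functions, one to quote
a file name and one to split an index line back into the name and the
additional information.  The function names are hypothetical, "blank"
is taken to mean the space character only, and none of the still
undecided `\xHH', `\OOO' or `\uXXXX' forms is handled.

```python
def quote_name(name):
    """Quote backslashes and blanks with backslashes; linefeeds become \\n."""
    out = []
    for ch in name:
        if ch == "\\" or ch == " ":
            out.append("\\" + ch)
        elif ch == "\n":
            out.append("\\n")
        else:
            out.append(ch)
    return "".join(out)

def split_entry(line):
    """Split an index line into (file name, additional information)."""
    name = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            # A backslash quotes the next character; `\n' is a linefeed.
            nxt = line[i + 1]
            name.append("\n" if nxt == "n" else nxt)
            i += 2
        elif ch == " ":
            # First unquoted blank ends the file name.
            return "".join(name), line[i + 1:]
        else:
            name.append(ch)
            i += 1
    return "".join(name), ""
```

Because a literal `n' is never quoted, `\n' in an index line can only
stand for a linefeed, so the two functions round-trip.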

3.6. The `restore' command

Yet to be specified.

3.7. The `print-command' command

This command must be followed by a valid backup or restore command,
and it should print a shell-command that would produce an equivalent
result, i.e., that would perform the backup to standard output, or
that would restore the whole filesystem reading from standard input.
This command is to be included in the header of backup images, to ease
crash-recovery.

4. CONCLUSION

Well, that's all.  Drop us a note at the amanda-hackers mailing list
if you have suggestions to improve this document and/or the API.  Some
help on its implementation would be welcome too.
