README for L5 vers 1.1, 950207, *Hobbit*

After examining Tripwire and deciding that it was *way* overkill for my own
purposes, I decided to cobble together my own minimalist solution to the unix
file integrity problem.  I call it "L5", for a variety of reasons, and have
decided to present it to the community as a Useful Hack.  For all I know it
may have already been done elsewhere, but I haven't yet seen such a thing
mentioned, despite the simple underlying concept.

L5 simply walks down Unix or DOS filesystems, sort of like "ls -R" or "find"
would, generating listings of anything it finds there.  It tells you everything
it can about a file's status, and adds on the MD5 hash of it.  Its output is
rather "numeric", but it is a very simple format and is designed to be
post-treated by scripts that call L5.  Here are some of its other features:

	Filenames come first, making sorting easier.

	Filenames are delimited in a non-[unix]-spoofable way: they end in
	"//".  A single character after "//" indicates the file type.

	Scanning stops at device boundaries, so L5 doesn't go slogging
	through random NFS trees or "tmpfs"es unless you tell it to.

	You can tell it not to walk any directories lower than the
	one[s] you handed it as arguments.  [It always walks one level
	of its given arguments.]

	You can tell it to only print the filenames.

	You can specify a file to use as a "timestamp", and L5 will only
	print files that have been changed since the timestamp.  Useful
	for automated backup scripts; see below.

	If a file looks like a script of some kind, it is shown as type
	"K" instead of "F".  Useful for finding those setuid shell scripts...

	MD5 hashing can be output in hex, Tripwire's radix64 format, or
	not at all, as you specify.  The hex hash for a given file is the
	same as that of the CERT "md5check".

	You can feed it a list of files or directories to check as its
	standard input.

	You can have it do its hash *on* standard input.  This feature is
	useful for doing things like "l5 /critical/files | l5" to get a
	small but secure summary hash.

	It is small and reasonably fast.
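
The "walk, but stop at device boundaries" behavior can be sketched roughly as
follows.  This is Python rather than the C in l5.c, the name walk_one_device
is made up for illustration, and comparing st_dev against the starting point
is an assumption about the approach, not a transcription of what L5 does.
The sketch still lists a directory that lives on another device; it just
doesn't descend into it.

```python
import os
import stat


def walk_one_device(top, cross_devices=False):
    """Yield paths under top, not descending into other devices.

    Hypothetical helper: a directory whose st_dev differs from that of
    the starting point is treated as a mount point.
    """
    top_dev = os.lstat(top).st_dev
    stack = [top]
    while stack:
        path = stack.pop()
        st = os.lstat(path)          # lstat: never follow symlinks
        yield path
        if not stat.S_ISDIR(st.st_mode):
            continue
        if st.st_dev != top_dev and not cross_devices:
            continue                 # listed, but not descended into
        try:
            names = sorted(os.listdir(path))
        except OSError:
            continue                 # unreadable directory: skip it
        # Push in reverse so the names come back off in sorted order.
        for name in reversed(names):
            stack.append(os.path.join(path, name))
```

Passing cross_devices=True corresponds to the "unless you tell it to" case.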

Building is straightforward.  The Makefile's primary targets are system types,
so "make" one of those.  If you don't see your system listed, try "generic" or
check out generic.h to see how to build an appropriate section for your own. 
If you do build a section for a new system, please send me a copy to add to my
collection!  And yes, I know about GNU autoconf already.

After building, you should try these two tests.  First, doing
"echo foofoo | ./l5" [on unix] should produce

   -STANDARD INPUT-//X - - -/- 7 - 1vnGam6fDhYM5zgofZB2Ei

Next, do "./l5 /dev/ttyp0" and make sure it shows the same major and minor
device numbers as "ls -l /dev/ttyp0" does.  [It may be /dev/pts/something or
/devices/pseudo/pts@god-knows-what on your system.  Do the right thing.]

Some of it is based on code from Tripwire, but it doesn't use a DBM database
and only offers one hash method.  The MD5 code, in particular, is the
endian-independent version from Tripwire, which builds almost anywhere.  L5 is
undoubtedly less versatile than Tripwire about ignoring certain changes in
selected files, but you can always filter the output through further scripts
before, for example, diffing your "old" system snapshot against your "new"
system snapshot.
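
As a rough illustration of comparing an old snapshot against a new one, the
following Python sketch keys each output line on its filename and reports what
was added, removed, or changed.  The function names are invented here, and it
assumes the first "//" in a line always marks the end of the filename.

```python
def load_snapshot(lines):
    """Map filename -> rest of the line, splitting at the first "//"."""
    snap = {}
    for line in lines:
        line = line.rstrip("\n")
        name, sep, rest = line.partition("//")
        if sep:                      # ignore lines without the delimiter
            snap[name] = rest
    return snap


def compare_snapshots(old_lines, new_lines):
    """Return (added, removed, changed) filename lists, each sorted."""
    old, new = load_snapshot(old_lines), load_snapshot(new_lines)
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    # "changed" means any field differs: mode, owner, size, mtime, hash...
    changed = sorted(n for n in old.keys() & new.keys() if old[n] != new[n])
    return added, removed, changed
```

Any filtering of always-changing files, such as logs, would happen before the
lines reach compare_snapshots.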

Unlike Tripwire itself, this is NOT a complete toolkit -- one is expected to
use it as a small, reliable part of a larger system.  I generally run the
output of L5 through "sed" scripts -- here are some example regexes that match
lines meeting certain criteria, for use with "p" or "d".

	# set[ug]id files; also handles extra "tcb" mode bits on AIX
	/\/\/. [0-9]* [10]*[2-7]... /

	# world-writable plain files
	/\/\/. [0-9]* 10...[2367] /

	# world-writable directories
	/\/\/D [0-9]* 4...[2367] /

	# low-numbered major devices, such as mem/kmem
	/\/\/[CB].* [0-9],/

	# logfiles in /usr/adm, /var/log, etc that are always changing anyway
	/\/[al][do][mg]\/.*log\/\/F /

	# You get the idea...
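
The first three selections can also be made on the numeric mode rather than
by pattern.  This is a minimal Python sketch, not part of L5; mode_flags is a
made-up name, and restricting the set[ug]id check to types F and K is a choice
made here for clarity.

```python
import re

# Matches "//T inode mode ", the same anchor the sed patterns rely on.
_FIELDS = re.compile(r"//(.) (\d+) ([0-7]+) ")


def mode_flags(line):
    """Return a set of findings for one L5 output line."""
    m = _FIELDS.search(line)
    if m is None:
        return set()
    ftype, mode = m.group(1), int(m.group(3), 8)
    found = set()
    if ftype in "FK" and mode & 0o6000:
        found.add("setid")            # setuid and/or setgid bit set
    if ftype in "FKD" and mode & 0o0002:
        found.add("world-writable")   # write bit set for "other"
    return found
```

Masking the bits sidesteps the question of how many leading digits the mode
has, which is what the "[10]*" in the first pattern is coping with.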

Output is in the following format:

/file/name//T inode mode links uid/gid size mtime extra

where:	T	is the file type [see below]
	inode	inode number, a la "ls -i"
	mode	whole mode in octal
	links	number of hardlinks
	uid	owner UID
	gid	owner GID
	size	in bytes, decimal
	mtime	modification time [st_mtime], in long hex
	extra	varies

File types are as follows:

	F	plain file.  "extra" is the MD5 hash, or "-".
	K	plain file that looks like a script  [#!/bin/foo].
	L	symlink.  "extra" is where it points to.
	D	directory.  "extra" is the device it lives on.
	C	character device special.  "extra" is major,minor.
	B	block device special.  "extra" is major,minor.
	P	FIFO.  No "extra".
	S	socket.  No "extra".
	X	unknown, or standard-input.
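
For illustration, here is one way to map a path onto these type letters in
Python.  The name l5_type is hypothetical, and treating "looks like a script"
as "starts with #!" is an assumption based on the example above, not a
description of the actual test in l5.c.

```python
import os
import stat


def l5_type(path):
    """Guess the one-letter L5 type for path (illustrative only)."""
    mode = os.lstat(path).st_mode    # lstat so symlinks report as "L"
    if stat.S_ISLNK(mode):
        return "L"
    if stat.S_ISDIR(mode):
        return "D"
    if stat.S_ISCHR(mode):
        return "C"
    if stat.S_ISBLK(mode):
        return "B"
    if stat.S_ISFIFO(mode):
        return "P"
    if stat.S_ISSOCK(mode):
        return "S"
    if stat.S_ISREG(mode):
        try:
            with open(path, "rb") as fh:
                # Assumption: a script is anything starting with "#!".
                return "K" if fh.read(2) == b"#!" else "F"
        except OSError:
            return "F"               # unreadable: report as plain file
    return "X"
```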

Some generic examples:

/tmp//D 2 43777 6 0/0 512 2e71fccc 703
/etc/passwd//F 194 100644 1 0/10 10927 2e70ea30 0LOyVbfQFCUvq64c3XePV5
/dev/console//C 3876 20622 1 0/0 0 2e71fc14 0,0
/diskless/root/dev/fd0b//B 169025 60666 1 0/1 0 2e0a2ea3 16,1
/dev/rst8//C 1693 20666 1 0/1 0 2e0a2eea 18,8
/tools/src/localbuild.csh//K 134417 100644 1 433/100 288 2d90c589 -
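
Lines like these can be pulled apart with a short Python sketch such as the
one below.  Entry and parse_line are names invented for this example.  It
assumes the first "//T " in the line ends the filename, and it only handles
the eight-field layout described above; the shorter -STANDARD INPUT- line
shown earlier will come back as None.

```python
import re
from collections import namedtuple

Entry = namedtuple(
    "Entry", "name type inode mode links uid gid size mtime extra")

# Non-greedy name, so the first "//T " in the line ends the filename.
_LINE = re.compile(
    r"^(.*?)//([FKLDCBPSX]) (\S+) (\S+) (\S+) (\S+)/(\S+) (\S+) (\S+) ?(.*)$")


def _num(text, base=10):
    """Convert a numeric field, mapping the "-" placeholder to None."""
    return None if text == "-" else int(text, base)


def parse_line(line):
    """Parse one L5 output line into an Entry, or return None."""
    m = _LINE.match(line.rstrip("\n"))
    if m is None:
        return None
    name, ftype, inode, mode, links, uid, gid, size, mtime, extra = m.groups()
    # mode is octal and mtime is hex; everything else is decimal.
    return Entry(name, ftype, _num(inode), _num(mode, 8), _num(links),
                 _num(uid), _num(gid), _num(size), _num(mtime, 16), extra)
```

The "extra" field is left as a string, since its meaning depends on the type.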

L5 should build on DOS machines unmodified.  The creaky old compiler I have
doesn't have a simple readdir() equivalent, so I've supplied one.  Newer
compilers may have library functions to handle this.  L5 still has a little
trouble with pathnames involving "\" and "C:", and stat() on DOS is somewhat
incomplete, but it's still useful for detecting changes in a list of paths you
give it.  The MD5 algorithm is, however, dismally slow in x86 real mode,
especially if you have disabled your machine's memory caching.

It could conceivably be ported to other system types like VMS, if appropriate
directory-walking handlers were supplied.  There's probably already plenty
of example code for that in the GNU stuff, for instance.  I don't have ready
access to a VMS machine at the moment; if you go to do a VMS port please try
to keep the output format the same, i.e. translate from [FOO.BAR]BAZ.XXX to
/FOO/BAR/BAZ.XXX format, and make sure you send me the diffs and extra code!!

L5 prints the mtime rather than the ctime, which some may consider to be
insufficiently "sensitive" due to the ease with which mtimes can be changed.
However, I've run across some systems where a simple file access [like
"cat"ing it] updates the ctime, and if someone has run a backup or something,
you're going to get a *lot* of new output.  That one was really disturbing
until I figured out what was going on.  Besides, if you observe a critical
system file change in no way but the hash value, that's MUCH more suspicious
than if other attributes changed too.  If you want ctimes anyway, change

	statp->st_size, statp->st_mtime, op);
to
	statp->st_size, statp->st_ctime, op);

in the "big printf" around line 380 in l5.c [and beware the "offconv" hack].

If L5 is given a timestamp file, its functionality changes somewhat.  First, it
takes the timestamp from the *mtime* of the given file, and compares it against
the *ctime* of any files encountered in its travels.  Second, only newer files
are printed, directories are always printed, and everything else, like devices
and links, is ignored.  This is designed to pipe into "cpio -pvadm" as an
automatic backup handler that duplicates a tree into another place.  Always
printing directories works around a problem with cpio: if cpio is only handed
files, with the -d switch meaning "create needed directories", the resulting
dirs are all owned by root, and possibly mode 700.  If directory names are fed
into cpio too, their ownership and modes are preserved.
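
The selection rule just described can be sketched in Python as below.  This
is an approximation for illustration only: newer_than is a made-up name, it
uses os.walk rather than L5's own directory walker, and it does not attempt
the device-boundary check.

```python
import os
import stat


def newer_than(stamp_path, top):
    """Yield what a timestamp-mode walk would print under top.

    Directories are always yielded; plain files only when their ctime is
    later than the mtime of stamp_path; everything else is skipped.
    """
    cutoff = os.lstat(stamp_path).st_mtime
    for dirpath, dirnames, filenames in os.walk(top):
        dirnames.sort()              # deterministic traversal order
        yield dirpath
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            # Devices, FIFOs, sockets and symlinks fail S_ISREG and are
            # silently dropped, as in the description above.
            if stat.S_ISREG(st.st_mode) and st.st_ctime > cutoff:
                yield path
```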

Thus, if you want users to be able to grab their own files out of the backup
tree, L5 needs to print directories.  If you want to be fascist about the
backup tree, by all means, take out the directory check around line 420 in
l5.c.  You could always use something other than "cpio", too, although nothing
immediately comes to mind that's any better.
  
An effective "online backup" script can be as simple as

	touch timestampfile.NEW
	l5 -q -t timestampfile.OLD /various /file /systems | \
	    cpio -pvadm /backuptree 2>> backup.log
	mv timestampfile.NEW timestampfile.OLD

Note carefully how L5 is invoked, so that only new-file and directory names get
printed; otherwise cpio will be *most* confused.  The ctime is compared, rather
than the mtime, so the file gets backed up again if someone changed its modes,
or name, or whatever.  The standard error output from cpio is collected to
backup.log so one can see that it ran correctly.

The name "L5" seemed appropriate for a number of slightly silly reasons:

	It does most of what "ls" does, but more.

	It does MD *5* checksums.

	Shady characters might spell "ls" as "l5" to be |<00L.

	The "L5" point in space is a point of gravitational stability -- if
	you're there, you don't have to worry about drifting away.  I think
	that's what it is, anyway...

You are hereby WARNED that I sometimes tend to write rather, um, expressive
code and comments.  If you don't like observing the occasional obscenity,
don't read it.  This is supplied AS IS, by a hacker for hackers, and you're
expected to be able to deal with any quirks or deficiencies.  I have tried to
make it as solid and portable as I could, while retaining the freedom to take
swipes at well-known vendor stupidities.  If you make improvements, send them
to me so I can update the "master copy".

The major/minor bitshift appears to be 16 on AIX and 18 on Solaris, versus 8 in
most other places.  This is gross, and I've sort of sleazed around it so that
AIX and Solaris work now.  Search l5.c for "sysmacros" if you want to gaze at
the wreckage.
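
To show what the shift means, here is the arithmetic in Python.  It is a
simplified sketch with an invented name, split_dev; it assumes the minor
number sits in the low "shift" bits with the major number directly above it,
and says nothing about how l5.c actually copes with each system.

```python
def split_dev(rdev, shift=8):
    """Split a raw device number into (major, minor).

    shift is the number of low-order bits holding the minor number:
    8 in most places, 16 or 18 on the systems mentioned above.
    """
    minor_mask = (1 << shift) - 1
    return rdev >> shift, rdev & minor_mask
```

With the default shift, split_dev(0x1001) gives (16, 1), the "16,1" style
shown for block and character devices.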

You cannot use this in any commercial product; it contains bits of code from
Tripwire, which is free in the first place.  To avoid having myself, Spaf,
*and* PKP hunting you down to make sure you never work in the field again,
don't go trying to sell this.

_H*
