1 - Purpose of this document
============================

This document describes how to debug parts of the Postfix mail
system, either by making the software log a lot of detail to the
syslog daemon, or by running some daemon processes under control
of an interactive debugger.

2 - Verbose logging for specific SMTP connections
=================================================

In /etc/postfix/main.cf, list the remote site name or address in
the "debug_peer_list" parameter. For example, in order to make the
software log a lot of information to the syslog daemon for connections
from or to the loopback interface:

    debug_peer_list = 127.0.0.1

You can specify one or more hosts, domains, addresses or net/masks.
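
As a sketch of a longer list, the fragment below combines a host
address, a domain name and a network block. The names example.com
and 192.168.0.0/24 are placeholders, not recommendations. The
debug_peer_level parameter, which is not discussed above, controls
by how much the logging level is raised for matching peers; the
value shown is its usual default.

```
# /etc/postfix/main.cf
debug_peer_list = 127.0.0.1, example.com, 192.168.0.0/24
debug_peer_level = 2
```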

2b - Record the SMTP connection with a sniffer
==============================================

This example uses tcpdump. In order to record a conversation you
need to specify a large enough snapshot length (the -s option) or
else you will miss some or all of the packet payload.

    tcpdump -w /file/name -s 2000 host hostname and port 25

Run this for a while, and stop with Ctrl-C when done. To view the
data, use a binary viewer or my tcpdumpx utility, which is available
from ftp://ftp.porcupine.org/pub/debugging.
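
If no binary viewer is at hand, tcpdump itself can usually print
a saved capture. The following is a minimal sketch and not part of
the procedure above: the function name show_capture and the TCPDUMP
override variable are made up for illustration, and the available
options may differ between tcpdump versions.

```shell
# Print a saved capture with the packet payload shown as text.
#   -n  do not convert addresses to names
#   -A  print each packet's payload in ASCII
#   -r  read packets from a file instead of from the network
# TCPDUMP is a hypothetical override for the tcpdump location.
show_capture() {
    "${TCPDUMP:-tcpdump}" -n -A -r "$1"
}
```

For example, "show_capture /file/name | more" would page through
the conversation recorded with the command shown earlier.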

3 - Making Postfix daemon programs more verbose
===============================================

Append one or more -v options to selected daemon definitions in
/etc/postfix/master.cf and type "postfix reload". This will cause
a lot of activity to be logged to the syslog daemon.
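
For example, to make only the SMTP server more verbose, its entry
could look as follows. This is a sketch that reuses the field layout
of the other master.cf examples in this document; the field values
in your own file may differ, and only the trailing -v matters here.

```
# /etc/postfix/master.cf
smtp      inet  n       -       n       -       -       smtpd -v
```

Repeating the option, as in "smtpd -v -v", should increase the
level of detail further.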

4 - Manually tracing a Postfix daemon process
=============================================

Some systems allow you to inspect a running process with a system
call tracer. For example:

    # trace -p process-id (SunOS 4)
    # strace -p process-id (Linux and many others)
    # truss -p process-id (Solaris, FreeBSD)
    # ktrace -p process-id (generic 4.4BSD)

Even more informative are traces of system library calls. Examples:

    # ltrace -p process-id (Linux, also ported to FreeBSD and BSD/OS)
    # sotruss -p process-id (Solaris)

See your system documentation for details.

Tracing a running process can give valuable information about what
a process is attempting to do. This is as much information as you
can get without running an interactive debugger program, as described
in a later section.
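
The choice of system call tracer can be sketched as a small shell
function. The function name pick_tracer is made up for illustration,
and the mapping merely restates the examples above; it is an
assumption that "uname -s" and "uname -r" identify your system in
the way shown, so check your system documentation.

```shell
# Print the name of a system call tracer for the given system name
# and release, as reported by "uname -s" and "uname -r".
pick_tracer() {
    case "$1" in
    Linux)   echo strace ;;
    SunOS)   case "$2" in
             4.*) echo trace ;;     # SunOS 4
             *)   echo truss ;;     # Solaris
             esac ;;
    FreeBSD) echo truss ;;
    *BSD)    echo ktrace ;;         # other 4.4BSD descendants
    *)       echo "no known tracer for $1" >&2; return 1 ;;
    esac
}

# Show the command to use on the local system, if one is known.
tracer=$(pick_tracer "$(uname -s)" "$(uname -r)") &&
    echo "attach with: $tracer -p process-id"
```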

5 - Automatically tracing a Postfix daemon process
==================================================

Postfix can attach a call tracer whenever a daemon process starts.

Append a -D option to the suspect command in /etc/postfix/master.cf,
for example:

    smtp      inet  n       -       n       -       -       smtpd -D

Edit the debugger_command definition in /etc/postfix/main.cf so
that it invokes the call tracer of your choice, for example:

    debugger_command =
         PATH=/bin:/usr/bin:/usr/local/bin; export PATH;
         (truss -p $process_id 2>&1 | logger -p mail.info) & sleep 5

Instead of truss you can use trace or strace, whichever your system
provides.

Type "postfix reload" and watch the logfile.
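
On a Linux system the same setup with strace might look like the
sketch below. It assumes that strace and logger are found in the
listed directories. The tracer output goes through logger to the
syslog daemon with facility mail and level info, and the "sleep 5"
gives the tracer some time to attach before the daemon proceeds.

```
debugger_command =
     PATH=/bin:/usr/bin:/usr/local/bin; export PATH;
     (strace -p $process_id 2>&1 | logger -p mail.info) & sleep 5
```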

6 - Running daemon programs under an interactive debugger
=========================================================

Append a -D option to the suspect command in /etc/postfix/master.cf,
for example:

    smtp      inet  n       -       n       -       -       smtpd -D

Edit the debugger_command definition in /etc/postfix/main.cf so
that it invokes the debugger of your choice, for example:

    debugger_command =
         PATH=/usr/bin:/usr/X11R6/bin
         xxgdb $daemon_directory/$process_name $process_id & sleep 5

If you use xxgdb, be sure that gdb is in the command search path.

Export XAUTHORITY so that X access control works, for example:

    % setenv XAUTHORITY ~/.Xauthority
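
The setenv command above is csh syntax. With a Bourne-style shell
(sh, ksh, bash) the equivalent would be the following, again
assuming that the X authority file is in its usual place in your
home directory:

```shell
XAUTHORITY=$HOME/.Xauthority
export XAUTHORITY
```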

Stop and start the Postfix system. 

Whenever the suspect daemon process is started, a debugger window
pops up and you can watch in detail what happens.
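
When no X display is available, a debugger can also be run without
a window. The following is only a sketch under several assumptions:
that gdb is installed in one of the listed directories, that it
accepts commands on standard input, and that the Postfix
configuration directory is an acceptable place for the output file.
It tells gdb to let the process continue and, once the process
stops because of a fault, to print a stack trace into a per-process
logfile.

```
debugger_command =
     PATH=/bin:/usr/bin:/usr/local/bin; export PATH;
     (echo cont; echo where; sleep 8640000) |
     gdb $daemon_directory/$process_name $process_id
     >$config_directory/$process_name.$process_id.log 2>&1 & sleep 5
```

The long sleep keeps the debugger's input open so that gdb stays
attached for as long as the daemon process runs.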

7 - Unreasonable behavior
=========================

Sometimes the behavior exhibited by Postfix just does not match the
source code. Why can a program deviate from the instructions given
by its author? There are two possibilities.

1 - The compiler has messed up.

2 - The hardware has messed up.

In both cases, the program being executed is not the program that
was supposed to be executed, so anything can happen.

There is a third possibility:

3 - Bugs in system software (kernel or libraries).

Hardware-related failures happen erratically, and they usually do
not reproduce after power cycling and rebooting the system.  There's
little I can do about bad hardware.  Be sure to use hardware that
at the very least can detect memory errors. Otherwise, Postfix will
just be a sitting duck waiting to be hit by a bit error. Critical
systems deserve real hardware.

When a compiler messes up, the problem can be reproduced whenever
the resulting program is run. Compiler errors are most likely to
happen in the code optimizer. If a problem is reproducible across
power cycles and system reboots, it can be worthwhile to rebuild
Postfix with optimization disabled, and to see if optimization
makes a difference.

In order to compile Postfix with optimizations turned off:

    % make tidy
    % make makefiles OPT=

This produces a set of Makefiles that do not request compiler
optimization. 

Once the makefiles are set up, build the software:

    % make
    % su
    # make install

Then see if the problem reproduces. If the problem goes away with
optimization turned off, the compiler's optimizer is the likely
culprit; talk to your vendor.
