How to write filters for the AAFID(tm) system
$Revision: 1.3 $

 ======================================================================
 This file is Copyright 1998,1999 by the Purdue Research Foundation and
 may only be used under license.  For terms of the license, see the
 file named COPYRIGHT included with this software release.
 AAFID is a trademark of the Purdue Research Foundation.
 All rights reserved.
 ======================================================================

NOTE: Please see doc/papers/users_guide_draft.ps for an up-to-date
      tutorial on writing filters. The information in this file is
      slightly outdated.

Filters
=======

A Filter is a program used in the AAFID system to pass information
from a data source to an Agent. The purpose of this mechanism is to
simplify getting the appropriate information to the agent and to
reduce the time needed to process it. See the file Filters.txt for a
more detailed description of the general mechanism and its objectives.

Composition of the implementation of a filter
=============================================

A filter implementation is composed of two classes. The first is the
AAFID::Filter class, which we refer to as the base class because it
specifies the basic, general structure. Each specific filter is
implemented as a subclass of AAFID::Filter that defines the
functionality specific to that particular filter.

The base class defines, among other things, the basic functionality
for reading data from a log file and splitting it into fields. It also
defines the SETPATTERN command (by means of the command_SETPATTERN
subroutine), which takes care of receiving pattern specifications from
the agents and storing them for future use. Finally, it implements the
generic main loop of a filter, which processes input from other
agents, reads data from its data source, and sends the appropriate
data, as specified by the patterns, to the entities that have
requested it. This loop is implemented in the ``run'' subroutine.

The second part of a filter implementation is the subclass that
implements the specific subroutines to manage the required
information. For instance, this part of the filter specifies how the
information is to be obtained. In the general case, the information is
extracted from a log file, but there are many other ways to obtain it:
for example, from the output of a program or command, from two or more
files or commands, etc. Because this is specific to each kind of
information, it has to be implemented in the subclass. The implementer
of the subclass can provide specific initialization code in the
``Init_log'' subroutine, and she has to implement a ``getline''
subroutine that returns a single line of text containing the next
record of information.
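The getline contract described above can be sketched as follows. This
is a standalone illustration, not the real AAFID::Filter API; the
commands and variable names are assumptions made for the example.
Here the "data source" is the output of a command, captured once;
getline returns one record per call, and undef once nothing is left.

```perl
#!/usr/bin/perl -w
use strict;

# Standalone sketch of the getline contract (illustrative only, not
# the real AAFID::Filter API).  The data source is the output of a
# command, captured once into a buffer of pending lines.
my @pending = `echo one; echo two`;

sub getline {
    my $line = shift @pending;
    return undef unless defined $line;   # no data available right now
    chomp $line;
    return $line;
}
```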

Both Init_log and getline are defined in the base class in default
forms that open and read data from the file specified in the
LogFile_Name parameter. So if the filter that is being implemented
reads from a single log file, the author of the filter only has to
specify the file name in the LogFile_Name parameter, and the rest will
be done automatically.

Another part of the functionality of a filter is the organization of
each information record into fields, which allow for more meaningful
specification of the selection patterns that the agents need. The
choice of number and names of fields is done by each specific filter
and may be arbitrary.

Because this is another filter-specific functionality, the next
subroutines that the subclass has to define are ``makefield'' and
``makeline''. The first cuts the line into fields and returns them as
a hash reference in which the keys are the field names and each
element contains the corresponding information. The second takes a
hash reference as produced by makefield and reconstructs the line,
returning it as a string.

Again, the base class provides default behavior for both of these
subroutines. The default version of makefield uses the DataFields
parameter, if defined, to determine the number and names of
blank-space-separated fields to generate. The DataFields parameter
should be an array in which each element contains a field name. The
fields will be extracted in the order they appear in the array. If the
DataFields parameter is not defined, generic fields called "Field0",
"Field1", etc. will be created, as many as there are blank-separated
items in the line read.

The default version of makeline simply concatenates all the fields (in
the order given by DataFields, or in numeric order in the case of
generic field names) separated by spaces.
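As an illustration, the default makefield/makeline behavior can be
approximated by this standalone sketch. It is an assumed
reimplementation written for clarity, not the actual AAFID::Filter
code, and it uses the example field names that appear later in this
document.

```perl
#!/usr/bin/perl -w
use strict;

# Approximation of the default makefield/makeline behavior.  This is
# a sketch, not the actual AAFID::Filter code.
my @DataFields = qw(Month Day Time Host Message);

sub makefield {
    my $line = shift;
    # Split into at most as many pieces as there are field names;
    # the last field swallows the rest of the line.
    my %rec;
    @rec{@DataFields} = split ' ', $line, scalar @DataFields;
    return \%rec;
}

sub makeline {
    my $rec = shift;
    return join ' ', map { $rec->{$_} } @DataFields;
}
```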

Of course, the filter author may override one or both of makefield and
makeline to handle the specific format of the data that the filter
handles.

Generic algorithm of a filter
=============================

Do initialization
loop
  Process input from other entities
  Get input from data source
  If input was received then
    Process the information received from the data source
    Send the processed information to the appropriate agents
  End if
  Sleep for a certain amount of time
end loop
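The algorithm above can be sketched in Perl as follows. Every
subroutine here is an illustrative stand-in, not the real
AAFID::Filter API, and the bounded for-loop stands in for the
filter's endless main loop.

```perl
#!/usr/bin/perl -w
use strict;

# Standalone sketch of the generic filter loop.  All names here are
# illustrative stand-ins, not the real AAFID::Filter API.
my @source = ('line one', 'line two');   # fake data source
my @sent;                                # records "sent" to agents

sub process_entity_input { }             # would handle agent commands
sub getline  { shift @source }           # undef when source is empty
sub makefield { my $line = shift; return { Field0 => $line } }
sub makeline  { my $rec = shift; return $rec->{Field0} }
sub send_to_agents { push @sent, shift }

for (1 .. 3) {                           # bounded stand-in for "loop"
    process_entity_input();
    my $line = getline();
    if (defined $line) {
        my $rec = makefield($line);
        send_to_agents(makeline($rec));
    }
    # sleep 1;  # a real filter sleeps between iterations
}
```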


How to write filters: specific instructions
===========================================

- Decide where to get the data from. It can be a log file, several log
  files, the output of a command, the result of some system call, or
  any combination of them.

  The initialization for the mechanism that gets the data goes in
  the Init_log subroutine.

  The code that reads one record of data and returns it as a single
  string (it may combine data from several sources, as mentioned
  above) goes in the getline subroutine. Notice that getline has to
  return undef when there is no data to read, and the filter
  infrastructure expects this to happen on a regular basis. So you
  have to make sure that getline returns undef every once in a while
  (for example, if it runs a command periodically, return undef
  between executions of the command).

  SPECIAL CASE: If the data has to be read from a single file, and
  each record is contained in a single line, then the built-in
  functionality can be used. In this case, you only have to define in
  the subclass the LogFile_Name parameter with the path of the file to
  read. For example, if the data is going to be read from file
  /usr/adm/messages, then something like this at the beginning of the
  subclass will suffice:

  %PARAMETERS=(
		LogFile_Name => '/usr/adm/messages'
	      );

  (note that some other parameters may also go in %PARAMETERS, which
  is the standard way of defining entity parameters).
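To make the undef requirement described above concrete, here is a
standalone sketch of a getline that re-runs a command periodically.
The command, the 60-second interval, and all names are illustrative
assumptions, not part of AAFID.

```perl
#!/usr/bin/perl -w
use strict;

# Sketch of a getline that re-runs a command periodically.  Between
# runs it returns undef, which lets the filter's main loop keep
# servicing requests from agents.  The command and the interval are
# illustrative assumptions.
my @pending;
my $next_run = 0;

sub getline {
    if (!@pending && time() >= $next_run) {
        @pending = `echo hello`;     # stand-in for the real command
        chomp @pending;
        $next_run = time() + 60;     # wait before running it again
    }
    return shift @pending;           # undef once the batch is drained
}
```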

- Decide on the fields in which the data is going to be split
  internally, and the format in which it is going to be sent to the
  agents.

  The code that takes a string (as generated by getline) and returns a
  hash reference where the keys are the field names and the elements
  contain the appropriate pieces of information goes in the makefield
  subroutine.

  The code that takes a hash reference (as generated by makefield) and
  returns a string in the format that will be passed to the agents
  (which may or may not be the same format generated by getline) goes
  in the makeline subroutine.

  SPECIAL CASE: If the fields can be obtained by splitting the line
  into blank-space-separated tokens, then the built-in functionality
  can be used. In this case, you only need to define the DataFields
  parameter as an array reference that contains the field names, in
  the order they appear in the string. The last field will
  "swallow" the rest of the line from where it starts to the end. If a
  line contains fewer tokens than fields, the extra fields will be
  filled in with undef values.

  For example, if the fields are "Month", "Day", "Time", "Host",
  "Message" and they are blank-space-separated, the following
  definition may suffice:

  %PARAMETERS=(
		DataFields => [qw(Month Day Time Host Message)]
	      );

- Decide on a name for the filter.

- Store the code for the filter in a file called <Filtername>.pm
  (where <Filtername> is the name you decided for your filter), using
  the following template:

#!/p/perl/perl -w

package Filter::<Filtername>;

use AAFID::Filter;
use AAFID::Common;
use AAFID::Entity;

use vars qw(%PARAMETERS @ISA);

@ISA=qw(AAFID::Filter);

%PARAMETERS=(
	<any parameters that you need to define, including
	LogFile_Name and DataFields>
	    );

<all the subroutines you need to define, including Init_log, getline,
makefield and makeline>

_EndOfEntity;
